Re: CI failure due to offline llvm.org

2018-01-11 Thread kellen sunderland
Doing a few searches, I see that llvm.org doesn't
appear to be stable enough for CI.  I'm going to write something today to
hopefully make it a little more stable, while still allowing those at
home to have easily reproducible build steps through Docker.  What I'd
propose is that we cache the 15 or so deb packages that get installed with
clang in S3 in the CI environment.  For home users who can't reach the cached
S3 bucket, we fall back to the apt.llvm.org installation.  Sound like a
reasonable plan, Marco?

On Fri, Jan 12, 2018 at 8:21 AM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Aah I understand, you're right, we should revisit our decisions. I'll put
> it into the backlog so I don't forget it.
>
> -Marco
>
> On Jan 12, 2018, 2:48 AM, "Chris Olivier" wrote:
>
> Yeah, I'm just saying the whole delete was done as a drastic measure at the
> time. It may not be necessary to re-pull everything. Instead of deleting
> everything, you could delete everything *except* the .git dir. and then
> checkout the commit you want and it'll regenerate the sources from the .git
> database.
>
> This, of course, assuming the .git database is never wrong...  If something
> goes wrong, you can nuke the whole dir.
>
>
> On Thu, Jan 11, 2018 at 5:42 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > Exactly
> >
> > -Marco
> >
> > On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier 
> > wrote:
> >
> > > Actually, this is the commit related to it.
> > > https://github.com/cjolivier01/mxnet/commit/
> > 573a010879583885a0193e30dc0b8c
> > > 848d80869b
> > >
> > > Before, the workspace directory wasn't being deleted.  Now it is,
> > correct?
> > > Everything under the top directory, right?
> > >
> > > So a git clone re-pulls everything?
> > >
> > > On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com> wrote:
> > >
> > > > deleteDir() deletes the content of the current workspace
> > > >
> > > > Okay, I haven't seen any errors related to lua-package not being
> > deleted.
> > > > Do you have a CI-link by any chance?
> > > >
> > > > -Marco
> > > >
> > > > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier <
> cjolivie...@gmail.com>
> > > > wrote:
> > > >
> > > > > what is deleteDir() call doing in Jenkinsfile?
> > > > > Yes, I mentioned the case where it wasn't getting cleaned.
> > > > >
> > > > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > During git_init: First we're just using git clean, if checkout
> > fails,
> > > > > we're
> > > > > > deleting the entire workspace and retrying.
> > > > > >
> > > > > > During build: First we're using regular make. If build fails,
> we're
> > > > using
> > > > > > make clean before executing make again.
> > > > > >
> > > > > > During test: No cleanup happening in case of failure.
> > > > > >
> > > > > > So far, I haven't noticed any files not being deleted in the
> > > workspace.
> > > > > Do
> > > > > > you know an example?
> > > > > >
> > > > > > -Marco
> > > > > >
> > > > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier <
> > > cjolivie...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > What approach is used now?  I see in Jenkinsfile() that
> > deleteDir()
> > > > is
> > > > > > > called at the top of init_git() and init_git_win().  That
> > deletes
> > > > the
> > > > > > > whole directory, correct?
> > > > > > >
> > > > > > > Before there were problems with 'git clean -d -f' *not*
> deleting
> > > some
> > > > > > > directories which were tracked on one branch and not on
> another,
> > > > which
> > > > > I
> > > > > > > believe is why deleteDir() was put there. The directory I recall
> > was
> > > > > > > something like lua-package or something that was in someone's
> > > private
> > > > > > repo
> > > > > > > or something like that...
> > > > > > >
> > > > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > > >
> > > > > > > > While it's a quite harsh solution to delete the entire
> > > workspace, I
> > > > > > think
> > > > > > > > that it's a good way. Git checkout takes between 2 and 10
> > > seconds,
> > > > > so I
> > > > > > > > don't think we need to optimize in that regard.
> > > > > > > >
> > > > > > > > git clean is our 'soft' approach to clean up. Deleting the
> > > > workspace
> > > > > is
> > > > > > > the
> > > > > > > > 'hard' approach, so this shouldn't be an issue.
> > > > > > > >
> > > > > > > > But there is one catch: Windows builds are not containerized
> > and
> > > > > while
> > > > > > we
> > > > > > > > delete the workspace, there could still be a lot of files
> which
> > > are
> > > > > not
> > > > > > > > being tracked. In future I'd like to have at least a
> > > > > file-system-layer
> > > > > > in
> > > > > > > > between our tests and the host, but we will have to analyze
> if
> > > > > > something
> > > > > > > > 

Re: CI failure due to offline llvm.org

2018-01-11 Thread Marco de Abreu
Aah I understand, you're right, we should revisit our decisions. I'll put
it into the backlog so I don't forget it.

-Marco

On Jan 12, 2018, 2:48 AM, "Chris Olivier" wrote:

Yeah, I'm just saying the whole delete was done as a drastic measure at the
time. It may not be necessary to re-pull everything. Instead of deleting
everything, you could delete everything *except* the .git dir. and then
checkout the commit you want and it'll regenerate the sources from the .git
database.

This, of course, assuming the .git database is never wrong...  If something
goes wrong, you can nuke the whole dir.


On Thu, Jan 11, 2018 at 5:42 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Exactly
>
> -Marco
>
> On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier 
> wrote:
>
> > Actually, this is the commit related to it.
> > https://github.com/cjolivier01/mxnet/commit/
> 573a010879583885a0193e30dc0b8c
> > 848d80869b
> >
> > Before, the workspace directory wasn't being deleted.  Now it is,
> correct?
> > Everything under the top directory, right?
> >
> > So a git clone re-pulls everything?
> >
> > On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > deleteDir() deletes the content of the current workspace
> > >
> > > Okay, I haven't seen any errors related to lua-package not being
> deleted.
> > > Do you have a CI-link by any chance?
> > >
> > > -Marco
> > >
> > > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier 
> > > wrote:
> > >
> > > > what is deleteDir() call doing in Jenkinsfile?
> > > > Yes, I mentioned the case where it wasn't getting cleaned.
> > > >
> > > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > > > marco.g.ab...@googlemail.com> wrote:
> > > >
> > > > > During git_init: First we're just using git clean, if checkout
> fails,
> > > > we're
> > > > > deleting the entire workspace and retrying.
> > > > >
> > > > > During build: First we're using regular make. If build fails,
we're
> > > using
> > > > > make clean before executing make again.
> > > > >
> > > > > During test: No cleanup happening in case of failure.
> > > > >
> > > > > So far, I haven't noticed any files not being deleted in the
> > workspace.
> > > > Do
> > > > > you know an example?
> > > > >
> > > > > -Marco
> > > > >
> > > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier <
> > cjolivie...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > What approach is used now?  I see in Jenkinsfile() that
> deleteDir()
> > > is
> > > > > > called at the top of init_git() and init_git_win().  That
> deletes
> > > the
> > > > > > whole directory, correct?
> > > > > >
> > > > > > Before there were problems with 'git clean -d -f' *not* deleting
> > some
> > > > > > directories which were tracked on one branch and not on another,
> > > which
> > > > I
> > > > > > believe is why deleteDir() was put there. The directory I recall
> was
> > > > > > something like lua-package or something that was in someone's
> > private
> > > > > repo
> > > > > > or something like that...
> > > > > >
> > > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > >
> > > > > > > While it's a quite harsh solution to delete the entire
> > workspace, I
> > > > > think
> > > > > > > that it's a good way. Git checkout takes between 2 and 10
> > seconds,
> > > > so I
> > > > > > > don't think we need to optimize in that regard.
> > > > > > >
> > > > > > > git clean is our 'soft' approach to clean up. Deleting the
> > > workspace
> > > > is
> > > > > > the
> > > > > > > 'hard' approach, so this shouldn't be an issue.
> > > > > > >
> > > > > > > But there is one catch: Windows builds are not containerized
> and
> > > > while
> > > > > we
> > > > > > > delete the workspace, there could still be a lot of files
which
> > are
> > > > not
> > > > > > > being tracked. In future I'd like to have at least a
> > > > file-system-layer
> > > > > in
> > > > > > > between our tests and the host, but we will have to analyze if
> > > > > something
> > > > > > > like this exists. At the moment, we even got tests writing to
> > > > system32.
> > > > > > >
> > > > > > > -Marco
> > > > > > >
> > > > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <
> > > > cjolivie...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Ok, but still on that note. I remember before that when some
> > > > problems
> > > > > > > were
> > > > > > > > being fixed in CI (before your time), they switched to
> deleting
> > > the
> > > > > > > entire
> > > > > > > > source directory, ".git" subdirectory and all.  At the time,
> > the
> > > CI
> > > > > was
> > > > > > > in
> > > > > > > > such a chaotic state that I didn't make an issue of it, but
> > now
> > > > that
> > > > > > it
> > > > > > > > has stabilized (for the most part, today's incident
> > > > > notwithstanding), I
> > > > > > > > think that we may want 

Re: Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread Marco de Abreu
We'd like to add automated Coverity static analysis to CI, but there's no ETA
yet.

-Marco

On Jan 12, 2018, 4:36 AM, "Chris Olivier" wrote:

> I think there's tons of books on "best practices" already, so I wouldn't
> want to trouble you :)
>
> Are we running coverity static analysis?  It catches those kinds of things.
>
> On Thu, Jan 11, 2018 at 7:04 PM, Bhavin Thaker 
> wrote:
>
> > Would it make sense to have a developer best practices section on the
> > Apache wiki where such guidance can be documented for future reference?
> >
> > Bhavin Thaker.
> >
> > On Thu, Jan 11, 2018 at 9:56 AM Anirudh  wrote:
> >
> > > Hi,
> > >
> > >
> > > I have been thinking about exception handling specifically inside
> spawned
> > > threads.
> > >
> > > As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or
> > CHECK
> > > for exception handling inside the main
> > >
> > > thread. For exception handling inside spawned threads I see two places:
> > > iterators and operators.
> > >
> > >
> > >
> > > For iterators, we can use exception_ptr to transport the exceptions
> from
> > > child thread to the main thread.
> > >
> > > This can be implemented in the threadediter class template. Since
> > > PrefetchingIter is used by most iterators in MXNet,
> > >
> > > and this uses threadediter, we should be able to cover most of our use
> > > cases.
> > >
> > >
> > >
> > > For operators, I was thinking that we can transport the exception down
> > the
> > > dependency path.
> > >
> > > For example, when an exception is caught inside ExecuteOprBlock for a
> > > single operator,
> > >
> > > We store the exception_ptr in the operator. We then propagate the
> > > exception_ptr down to all the vars that the
> > >
> > > operator writes to. Similarly, if an operator’s read vars have an
> > exception_ptr
> > > attached, we propagate it down to the operator itself.
> > >
> > >
> > >
> > > We can then check if the var has an associated exception_ptr in
> > > wait_to_read.
> > >
> > > One problem I see with the approach is that even if an operator fails
> we
> > > may need to run subsequent operators. One way to avoid this
> > >
> > > would be an onstart callback, which would mark the operator to not
> > execute
> > > if any of the read vars have an exception_ptr attached to it.
> > >
> > >
> > >
> > > Anirudh
> > >
> > > On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen  >
> > > wrote:
> > >
> > > > I am all for RAII when possible in most of the code. The only reason
> > some
> > > > of the raw ptr occur in dmlc codebase was legacy-issue, and that can
> be
> > > > resolved by wrapping returning ptr via unique_ptr or shared_ptr. One
> > > > notable property of RAII is exception safety, which makes the code
> handle
> > > > resources correctly when it throws in the middle. There are cases
> where
> > > > memory allocation needs to be explicitly handled(e.g. GPU memory
> > > > management) and reused where we need to do explicit management when
> > > needed.
> > > >
> > > >
> > > > As for exception handling, we do have a mechanism for handling
> > > exceptions.
> > > > When you do LOG(FATAL) or a CHECK fails, it is caught at the C API boundary,
> which
> > > > translates to return code  -1 and an error is thrown on the python
> > side.
> > > > Throwing exception from another thread is a more tricky thing, which
> > > > involves catching them in the engine, and usually, the state is not
> > > correct
> > > > in such case. But most of the cases when we need exception handling
> are
> > > the
> > > > simple case of opening a file and use CHECK should suffice.
> > > >
> > > > A better approach might be defining a new macro for errors intended
> to
> > > > throw to a user and handled correctly, something like DMLC_EXPECT.
> But
> > I
> > > > find it might be a burden to ask developers to distinguish what should
> > be
> > > a
> > > > user error and a normal check, so we just use CHECK for now
> > > >
> > > > Tianqi
> > > >
> > > > On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > I would like to encourage contributors to use RAII idioms in C++
> > > > > whenever possible to avoid resource leaks.
> > > > >
> > > > > RAII is an ugly acronym that stands for Resource Acquisition Is
> > > > > Initialization, which basically means that you should almost never
> > use
> > > > > explicit new and delete operators and instead use std::make_shared,
> > > > > std::make_unique and std::vector  and .data() for raw
> > > > > buffers. Also, always allocate OS resources such as file descriptors
> > > > > in constructors and release them in destructors.
> > > > >
> > > > > Aside from forgetting to call delete on an allocation, explicit
> > > > > deletes are bad because an exception thrown in the middle prevents
> > > > > delete from running entirely.
> > > > >
> > > > > This helps a lot 

Re: Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread Chris Olivier
I think there's tons of books on "best practices" already, so I wouldn't
want to trouble you :)

Are we running coverity static analysis?  It catches those kinds of things.

On Thu, Jan 11, 2018 at 7:04 PM, Bhavin Thaker 
wrote:

> Would it make sense to have a developer best practices section on the
> Apache wiki where such guidance can be documented for future reference?
>
> Bhavin Thaker.
>
> On Thu, Jan 11, 2018 at 9:56 AM Anirudh  wrote:
>
> > Hi,
> >
> >
> > I have been thinking about exception handling specifically inside spawned
> > threads.
> >
> > As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or
> CHECK
> > for exception handling inside the main
> >
> > thread. For exception handling inside spawned threads I see two places:
> > iterators and operators.
> >
> >
> >
> > For iterators, we can use exception_ptr to transport the exceptions from
> > child thread to the main thread.
> >
> > This can be implemented in the threadediter class template. Since
> > PrefetchingIter is used by most iterators in MXNet,
> >
> > and this uses threadediter, we should be able to cover most of our use
> > cases.
> >
> >
> >
> > For operators, I was thinking that we can transport the exception down
> the
> > dependency path.
> >
> > For example, when an exception is caught inside ExecuteOprBlock for a
> > single operator,
> >
> > We store the exception_ptr in the operator. We then propagate the
> > exception_ptr down to all the vars that the
> >
> > operator writes to. Similarly, if an operator’s read vars have an
> exception_ptr
> > attached, we propagate it down to the operator itself.
> >
> >
> >
> > We can then check if the var has an associated exception_ptr in
> > wait_to_read.
> >
> > One problem I see with the approach is that even if an operator fails we
> > may need to run subsequent operators. One way to avoid this
> >
> > would be an onstart callback, which would mark the operator to not
> execute
> > if any of the read vars have an exception_ptr attached to it.
> >
> >
> >
> > Anirudh
> >
> > On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen 
> > wrote:
> >
> > > I am all for RAII when possible in most of the code. The only reason
> some
> > > of the raw ptr occur in dmlc codebase was legacy-issue, and that can be
> > > resolved by wrapping returning ptr via unique_ptr or shared_ptr. One
> > > notable property of RAII is exception safety, which makes the code handle
> > > resources correctly when it throws in the middle. There are cases where
> > > memory allocation needs to be explicitly handled(e.g. GPU memory
> > > management) and reused where we need to do explicit management when
> > needed.
> > >
> > >
> > > As for exception handling, we do have a mechanism for handling
> > exceptions.
> > > When you do LOG(FATAL) or a CHECK fails, it is caught at the C API boundary, which
> > > translates to return code  -1 and an error is thrown on the python
> side.
> > > Throwing exception from another thread is a more tricky thing, which
> > > involves catching them in the engine, and usually, the state is not
> > correct
> > > in such case. But most of the cases when we need exception handling are
> > the
> > > simple case of opening a file and use CHECK should suffice.
> > >
> > > A better approach might be defining a new macro for errors intended to
> > > throw to a user and handled correctly, something like DMLC_EXPECT. But
> I
> > > find it might be a burden to ask developers to distinguish what should
> be
> > a
> > > user error and a normal check, so we just use CHECK for now
> > >
> > > Tianqi
> > >
> > > On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > I would like to encourage contributors to use RAII idioms in C++
> > > > whenever possible to avoid resource leaks.
> > > >
> > > > RAII is an ugly acronym that stands for Resource Acquisition Is
> > > > Initialization, which basically means that you should almost never
> use
> > > > explicit new and delete operators and instead use std::make_shared,
> > > > std::make_unique and std::vector  and .data() for raw
> > > > buffers. Also, always allocate OS resources such as file descriptors
> > > > in constructors and release them in destructors.
> > > >
> > > > Aside from forgetting to call delete on an allocation, explicit
> > > > deletes are bad because an exception thrown in the middle prevents
> > > > delete from running entirely.
> > > >
> > > > This helps a lot writing correct, secure and exception safe code
> > > > without memory leaks.
> > > >
> > > > Another problem that I think is worth a discussion, is how to handle
> > > > exceptions and errors. Right now, I don't think there's a good way to
> > > > throw an exception in some functions without crashing the python
> > > > interpreter. I think we should come with a smart way to propagate
> > > > exceptions from the library up to the user runtime 

Re: Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread Bhavin Thaker
Would it make sense to have a developer best practices section on the
Apache wiki where such guidance can be documented for future reference?

Bhavin Thaker.

On Thu, Jan 11, 2018 at 9:56 AM Anirudh  wrote:

> Hi,
>
>
> I have been thinking about exception handling specifically inside spawned
> threads.
>
> As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or CHECK
> for exception handling inside the main
>
> thread. For exception handling inside spawned threads I see two places:
> iterators and operators.
>
>
>
> For iterators, we can use exception_ptr to transport the exceptions from
> child thread to the main thread.
>
> This can be implemented in the threadediter class template. Since
> PrefetchingIter is used by most iterators in MXNet,
>
> and this uses threadediter, we should be able to cover most of our use
> cases.
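
The transport itself only needs std::exception_ptr; below is a minimal, self-contained sketch of the pattern (hypothetical producer/consumer naming, not the actual threadediter code):

    #include <exception>
    #include <iostream>
    #include <stdexcept>
    #include <thread>

    int main() {
      std::exception_ptr eptr;  // stays null unless the producer thread throws

      // Stand-in for the producer loop of a threaded iterator: anything thrown
      // on the worker side is captured instead of terminating the process.
      std::thread producer([&eptr]() {
        try {
          throw std::runtime_error("decode failed in producer thread");
        } catch (...) {
          eptr = std::current_exception();
        }
      });
      producer.join();

      // Consumer side (e.g. the iterator's Next() on the main thread): rethrow
      // so the caller sees an ordinary exception in its own thread.
      if (eptr) {
        try {
          std::rethrow_exception(eptr);
        } catch (const std::exception& e) {
          std::cout << "caught in main thread: " << e.what() << "\n";
        }
      }
      return 0;
    }

In an actual threadediter-style implementation the exception_ptr would presumably live next to the queue and be rethrown from the consumer-facing call rather than in main() as in this toy.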
>
>
>
> For operators, I was thinking that we can transport the exception down the
> dependency path.
>
> For example, when an exception is caught inside ExecuteOprBlock for a
> single operator,
>
> We store the exception_ptr in the operator. We then propagate the
> exception_ptr down to all the vars that the
>
> operator writes to. Similarly, if an operator’s read vars have an exception_ptr
> attached, we propagate it down to the operator itself.
>
>
>
> We can then check if the var has an associated exception_ptr in
> wait_to_read.
>
> One problem I see with the approach is that even if an operator fails we
> may need to run subsequent operators. One way to avoid this
>
> would be an onstart callback, which would mark the operator to not execute
> if any of the read vars have an exception_ptr attached to it.
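
To make that propagation concrete, here is a small self-contained sketch with hypothetical Var/Opr types (the real engine's data structures and callback names will differ):

    #include <exception>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    // Hypothetical stand-ins for engine variables and operator blocks.
    struct Var {
      std::exception_ptr exc;  // set once some producer of this var has failed
    };

    struct Opr {
      std::vector<Var*> read_vars, write_vars;
      std::exception_ptr exc;

      // The "onstart" idea: skip execution if any input already carries an error.
      bool ShouldSkip() const {
        for (auto* v : read_vars)
          if (v->exc) return true;
        return false;
      }

      std::exception_ptr FirstInputError() const {
        for (auto* v : read_vars)
          if (v->exc) return v->exc;
        return nullptr;
      }

      void Propagate(std::exception_ptr e) {
        exc = e;
        for (auto* v : write_vars) v->exc = e;  // downstream readers will see it
      }

      void Run() {
        if (ShouldSkip()) {  // don't execute, just forward the input error
          Propagate(FirstInputError());
          return;
        }
        try {
          throw std::runtime_error("bad shape in operator");  // pretend the kernel failed
        } catch (...) {
          Propagate(std::current_exception());
        }
      }
    };

    // wait_to_read-style check on the frontend thread.
    void WaitToRead(const Var& v) {
      if (v.exc) std::rethrow_exception(v.exc);
    }

    int main() {
      Var a, out;
      Opr op;
      op.read_vars = {&a};
      op.write_vars = {&out};
      op.Run();
      try {
        WaitToRead(out);
      } catch (const std::exception& e) {
        std::cout << "surfaced to the user: " << e.what() << "\n";
      }
      return 0;
    }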
>
>
>
> Anirudh
>
> On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen 
> wrote:
>
> > I am all for RAII when possible in most of the code. The only reason some
> > of the raw ptr occur in dmlc codebase was legacy-issue, and that can be
> > resolved by wrapping returning ptr via unique_ptr or shared_ptr. One
> > notable property of RAII is exception safety, which makes the code handle
> > resources correctly when it throws in the middle. There are cases where
> > memory allocation needs to be explicitly handled(e.g. GPU memory
> > management) and reused where we need to do explicit management when
> needed.
> >
> >
> > As for exception handling, we do have a mechanism for handling
> exceptions.
> > When you do LOG(FATAL) or a CHECK fails, it is caught at the C API boundary, which
> > translates to return code  -1 and an error is thrown on the python side.
> > Throwing exception from another thread is a more tricky thing, which
> > involves catching them in the engine, and usually, the state is not
> correct
> > in such case. But most of the cases when we need exception handling are
> the
> > simple case of opening a file and use CHECK should suffice.
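
The boundary pattern itself is small; a sketch with made-up names follows (the real C API wraps this in macros and exposes the stored message through its own error-query call):

    #include <exception>
    #include <fstream>
    #include <iostream>
    #include <stdexcept>
    #include <string>

    static thread_local std::string last_error;  // per-thread error message store

    const char* HypotheticalGetLastError() { return last_error.c_str(); }

    // A C-style entry point: anything thrown inside is caught at the boundary
    // and translated into return code -1, so the frontend can raise a normal
    // error instead of the interpreter crashing.
    extern "C" int HypotheticalLoadFile(const char* path) {
      try {
        std::ifstream in(path);
        if (!in) throw std::runtime_error(std::string("cannot open file ") + path);
        return 0;
      } catch (const std::exception& e) {
        last_error = e.what();
        return -1;
      }
    }

    int main() {
      if (HypotheticalLoadFile("/no/such/file") != 0)
        std::cout << "frontend would raise: " << HypotheticalGetLastError() << "\n";
      return 0;
    }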
> >
> > A better approach might be defining a new macro for errors intended to
> > throw to a user and handled correctly, something like DMLC_EXPECT. But I
> > find it might be a burden to ask developers to distinguish what should be
> a
> > user error and a normal check, so we just use CHECK for now
> >
> > Tianqi
> >
> > On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > I would like to encourage contributors to use RAII idioms in C++
> > > whenever possible to avoid resource leaks.
> > >
> > > RAII is an ugly acronym that stands for Resource Acquisition Is
> > > Initialization, which basically means that you should almost never use
> > > explicit new and delete operators and instead use std::make_shared,
> > > std::make_unique and std::vector  and .data() for raw
> > > buffers. Also, always allocate OS resources such as file descriptors
> > > in constructors and release them in destructors.
> > >
> > > Aside from forgetting to call delete on an allocation, explicit
> > > deletes are bad because an exception thrown in the middle prevents
> > > delete from running entirely.
> > >
> > > This helps a lot writing correct, secure and exception safe code
> > > without memory leaks.
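
A minimal illustration of that point with made-up functions (not MXNet code): if process() throws, the manual version leaks, while the RAII version releases the buffer on every path.

    #include <cstddef>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    // Stand-in for work that can fail part-way through.
    void process(const float* data, std::size_t n) {
      if (n == 0) throw std::runtime_error("empty buffer");
      (void)data;
    }

    // Manual management: if process() throws, delete[] never runs and the
    // allocation leaks.
    void manual_style(std::size_t n) {
      float* buf = new float[n];
      process(buf, n);
      delete[] buf;
    }

    // RAII: the vector owns the storage and frees it on every exit path,
    // including the one taken when process() throws.
    void raii_style(std::size_t n) {
      std::vector<float> buf(n);
      process(buf.data(), n);  // .data() exposes the raw buffer where one is needed
    }

    int main() {
      manual_style(16);  // fine only because nothing throws here
      raii_style(16);
      try {
        raii_style(0);   // throws; the vector is still destroyed, nothing leaks
      } catch (const std::exception&) {
      }
      return 0;
    }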
> > >
> > > Another problem that I think is worth a discussion, is how to handle
> > > exceptions and errors. Right now, I don't think there's a good way to
> > > throw an exception in some functions without crashing the python
> > > interpreter. I think we should come with a smart way to propagate
> > > exceptions from the library up to the user runtime (python, scala...)
> > >
> > > As an example of what I'm talking about is this suspicious code that I
> > > saw in a PR, which has several bugs in a few lines of code related to
> > > what I'm discussing in this thread, crashing Python when trying to
> > > open a file that doesn't exist. (How to propagate an exception in this
> > > case?)
> > >
> > > https://github.com/apache/incubator-mxnet/pull/9370/files
> > >
> > > Please excuse the clickbait subject, just trying to grab your
> > > attention in a 

Re: CI failure due to offline llvm.org

2018-01-11 Thread Chris Olivier
Yeah, I'm just saying the whole delete was done as a drastic measure at the
time. It may not be necessary to re-pull everything. Instead of deleting
everything, you could delete everything *except* the .git dir. and then
checkout the commit you want and it'll regenerate the sources from the .git
database.

This, of course, assuming the .git database is never wrong...  If something
goes wrong, you can nuke the whole dir.


On Thu, Jan 11, 2018 at 5:42 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Exactly
>
> -Marco
>
> On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier 
> wrote:
>
> > Actually, this is the commit related to it.
> > https://github.com/cjolivier01/mxnet/commit/
> 573a010879583885a0193e30dc0b8c
> > 848d80869b
> >
> > Before, the workspace directory wasn't being deleted.  Now it is,
> correct?
> > Everything under the top directory, right?
> >
> > So a git clone re-pulls everything?
> >
> > On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > deleteDir() deletes the content of the current workspace
> > >
> > > Okay, I haven't seen any errors related to lua-package not being
> deleted.
> > > Do you have a CI-link by any chance?
> > >
> > > -Marco
> > >
> > > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier 
> > > wrote:
> > >
> > > > what is deleteDir() call doing in Jenkinsfile?
> > > > Yes, I mentioned the case where it wasn't getting cleaned.
> > > >
> > > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > > > marco.g.ab...@googlemail.com> wrote:
> > > >
> > > > > During git_init: First we're just using git clean, if checkout
> fails,
> > > > we're
> > > > > deleting the entire workspace and retrying.
> > > > >
> > > > > During build: First we're using regular make. If build fails, we're
> > > using
> > > > > make clean before executing make again.
> > > > >
> > > > > During test: No cleanup happening in case of failure.
> > > > >
> > > > > So far, I haven't noticed any files not being deleted in the
> > workspace.
> > > > Do
> > > > > you know an example?
> > > > >
> > > > > -Marco
> > > > >
> > > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier <
> > cjolivie...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > What approach is used now?  I see in Jenkinsfile() that
> deleteDir()
> > > is
> > > > > > called at the top of init_git() and init_git_win().  That
> deletes
> > > the
> > > > > > whole directory, correct?
> > > > > >
> > > > > > Before there were problems with 'git clean -d -f' *not* deleting
> > some
> > > > > > directories which were tracked on one branch and not on another,
> > > which
> > > > I
> > > > > > believe is why deleteDir() was put there. The directory I recall
> was
> > > > > > something like lua-package or something that was in someone's
> > private
> > > > > repo
> > > > > > or something like that...
> > > > > >
> > > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > >
> > > > > > > While it's a quite harsh solution to delete the entire
> > workspace, I
> > > > > think
> > > > > > > that it's a good way. Git checkout takes between 2 and 10
> > seconds,
> > > > so I
> > > > > > > don't think we need to optimize in that regard.
> > > > > > >
> > > > > > > git clean is our 'soft' approach to clean up. Deleting the
> > > workspace
> > > > is
> > > > > > the
> > > > > > > 'hard' approach, so this shouldn't be an issue.
> > > > > > >
> > > > > > > But there is one catch: Windows builds are not containerized
> and
> > > > while
> > > > > we
> > > > > > > delete the workspace, there could still be a lot of files which
> > are
> > > > not
> > > > > > > being tracked. In future I'd like to have at least a
> > > > file-system-layer
> > > > > in
> > > > > > > between our tests and the host, but we will have to analyze if
> > > > > something
> > > > > > > like this exists. At the moment, we even got tests writing to
> > > > system32.
> > > > > > >
> > > > > > > -Marco
> > > > > > >
> > > > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <
> > > > cjolivie...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Ok, but still on that note. I remember before that when some
> > > > problems
> > > > > > > were
> > > > > > > > being fixed in CI (before your time), they switched to
> deleting
> > > the
> > > > > > > entire
> > > > > > > > source directory, ".git" subdirectory and all.  At the time,
> > the
> > > CI
> > > > > was
> > > > > > > in
> > > > > > > > such a chaotic state that I didn't make an issue of it, but
> > now
> > > > that
> > > > > > it
> > > > > > > > has stabilized (for the most part, today's incident
> > > > > notwithstanding), I
> > > > > > > > think that we may want to revisit it if it is still doing
> that.
> > > > you
> > > > > > > could,
> > > > > > > > for example, just delete everything except the .git directory
> > and
> > > > > then
> > > > > > > do a
> > 

Re: CI failure due to offline llvm.org

2018-01-11 Thread Marco de Abreu
Exactly

-Marco

On Fri, Jan 12, 2018 at 2:40 AM, Chris Olivier 
wrote:

> Actually, this is the commit related to it.
> https://github.com/cjolivier01/mxnet/commit/573a010879583885a0193e30dc0b8c
> 848d80869b
>
> Before, the workspace directory wasn't being deleted.  Now it is, correct?
> Everything under the top directory, right?
>
> So a git clone re-pulls everything?
>
> On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > deleteDir() deletes the content of the current workspace
> >
> > Okay, I haven't seen any errors related to lua-package not being deleted.
> > Do you have a CI-link by any chance?
> >
> > -Marco
> >
> > On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier 
> > wrote:
> >
> > > what is deleteDir() call doing in Jenkinsfile?
> > > Yes, I mentioned the case where it wasn't getting cleaned.
> > >
> > > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com> wrote:
> > >
> > > > During git_init: First we're just using git clean, if checkout fails,
> > > we're
> > > > deleting the entire workspace and retrying.
> > > >
> > > > During build: First we're using regular make. If build fails, we're
> > using
> > > > make clean before executing make again.
> > > >
> > > > During test: No cleanup happening in case of failure.
> > > >
> > > > So far, I haven't noticed any files not being deleted in the
> workspace.
> > > Do
> > > > you know an example?
> > > >
> > > > -Marco
> > > >
> > > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier <
> cjolivie...@gmail.com>
> > > > wrote:
> > > >
> > > > > What approach is used now?  I see in Jenkinsfile() that deleteDir()
> > is
> > > > > called at the top of init_git() and init_git_win().  That deletes
> > the
> > > > > whole directory, correct?
> > > > >
> > > > > Before there were problems with 'git clean -d -f' *not* deleting
> some
> > > > > directories which were tracked on one branch and not on another,
> > which
> > > I
> > > > > believe is why deleteDir() was put there. The directory I recall was
> > > > > something like lua-package or something that was in someone's
> private
> > > > repo
> > > > > or something like that...
> > > > >
> > > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > While it's a quite harsh solution to delete the entire
> workspace, I
> > > > think
> > > > > > that it's a good way. Git checkout takes between 2 and 10
> seconds,
> > > so I
> > > > > > don't think we need to optimize in that regard.
> > > > > >
> > > > > > git clean is our 'soft' approach to clean up. Deleting the
> > workspace
> > > is
> > > > > the
> > > > > > 'hard' approach, so this shouldn't be an issue.
> > > > > >
> > > > > > But there is one catch: Windows builds are not containerized and
> > > while
> > > > we
> > > > > > delete the workspace, there could still be a lot of files which
> are
> > > not
> > > > > > being tracked. In future I'd like to have at least a
> > > file-system-layer
> > > > in
> > > > > > between our tests and the host, but we will have to analyze if
> > > > something
> > > > > > like this exists. At the moment, we even got tests writing to
> > > system32.
> > > > > >
> > > > > > -Marco
> > > > > >
> > > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <
> > > cjolivie...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Ok, but still on that note. I remember before that when some
> > > problems
> > > > > > were
> > > > > > > being fixed in CI (before your time), they switched to deleting
> > the
> > > > > > entire
> > > > > > > source directory, ".git" subdirectory and all.  At the time,
> the
> > CI
> > > > was
> > > > > > in
> > > > > > > such an chaotic state that I didn't make an issue of it, but
> now
> > > that
> > > > > it
> > > > > > > has stabilized (for the most part, today's incident
> > > > notwithstanding), I
> > > > > > > think that we may want to revisit it if it is still doing that.
> > > you
> > > > > > could,
> > > > > > > for example, just delete everything except the .git directory
> and
> > > > then
> > > > > > do a
> > > > > > > 'git reset --hard' to get back a baseline before having to
> > > > re-download
> > > > > > > everything every time (also should speed up the builds).
> > > > > > >
> > > > > > > Note that 'git clean' was not working as it doesn't delete
> > > 'unknown'
> > > > > > > directories, which was the problem.
> > > > > > >
> > > > > > > WDYT?
> > > > > > >
> > > > > > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > > >
> > > > > > > > This happens because we just merged the clang compilation
> > > > > > > > https://github.com/apache/incubator-mxnet/commit/
> > > > > > > > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > > > > > > > This means that clang has to get installed on all slaves and
> > > after
> > > > > some
> > > > > 

Re: CI failure due to offline llvm.org

2018-01-11 Thread Chris Olivier
Actually, this is the commit related to it.
https://github.com/cjolivier01/mxnet/commit/573a010879583885a0193e30dc0b8c848d80869b

Before, the workspace directory wasn't being deleted.  Now it is, correct?
Everything under the top directory, right?

So a git clone re-pulls everything?

On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> deleteDir() deletes the content of the current workspace
>
> Okay, I haven't seen any errors related to lua-package not being deleted.
> Do you have a CI-link by any chance?
>
> -Marco
>
> On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier 
> wrote:
>
> > what is deleteDir() call doing in Jenkinsfile?
> > Yes, I mentioned the case where it wasn't getting cleaned.
> >
> > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > During git_init: First we're just using git clean, if checkout fails,
> > we're
> > > deleting the entire workspace and retrying.
> > >
> > > During build: First we're using regular make. If build fails, we're
> using
> > > make clean before executing make again.
> > >
> > > During test: No cleanup happening in case of failure.
> > >
> > > So far, I haven't noticed any files not being deleted in the workspace.
> > Do
> > > you know an example?
> > >
> > > -Marco
> > >
> > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier 
> > > wrote:
> > >
> > > > What approach is used now?  I see in Jenkinsfile() that deleteDir()
> is
> > > > called at the top of init_git() and init_git_win().  That deletes
> the
> > > > whole directory, correct?
> > > >
> > > > Before there were problems with 'git clean -d -f' *not* deleting some
> > > > directories which were tracked on one branch and not on another,
> which
> > I
> > > > believe is why deleteDir() was put there. The directory I recall was
> > > > something like lua-package or something that was in someone's private
> > > repo
> > > > or something like that...
> > > >
> > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > > marco.g.ab...@googlemail.com> wrote:
> > > >
> > > > > While it's a quite harsh solution to delete the entire workspace, I
> > > think
> > > > > that it's a good way. Git checkout takes between 2 and 10 seconds,
> > so I
> > > > > don't think we need to optimize in that regard.
> > > > >
> > > > > git clean is our 'soft' approach to clean up. Deleting the
> workspace
> > is
> > > > the
> > > > > 'hard' approach, so this shouldn't be an issue.
> > > > >
> > > > > But there is one catch: Windows builds are not containerized and
> > while
> > > we
> > > > > delete the workspace, there could still be a lot of files which are
> > not
> > > > > being tracked. In future I'd like to have at least a
> > file-system-layer
> > > in
> > > > > between our tests and the host, but we will have to analyze if
> > > something
> > > > > like this exists. At the moment, we even got tests writing to
> > system32.
> > > > >
> > > > > -Marco
> > > > >
> > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <
> > cjolivie...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Ok, but still on that note. I remember before that when some
> > problems
> > > > > were
> > > > > > being fixed in CI (before your time), they switched to deleting
> the
> > > > > entire
> > > > > > source directory, ".git" subdirectory and all.  At the time, the
> CI
> > > was
> > > > > in
> > > > > > such a chaotic state that I didn't make an issue of it, but now
> > that
> > > > it
> > > > > > has stabilized (for the most part, today's incident
> > > notwithstanding), I
> > > > > > think that we may want to revisit it if it is still doing that.
> > you
> > > > > could,
> > > > > > for example, just delete everything except the .git directory and
> > > then
> > > > > do a
> > > > > > 'git reset --hard' to get back a baseline before having to
> > > re-download
> > > > > > everything every time (also should speed up the builds).
> > > > > >
> > > > > > Note that 'git clean' was not working as it doesn't delete
> > 'unknown'
> > > > > > directories, which was the problem.
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > >
> > > > > > > This happens because we just merged the clang compilation
> > > > > > > https://github.com/apache/incubator-mxnet/commit/
> > > > > > > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > > > > > > This means that clang has to get installed on all slaves and
> > after
> > > > some
> > > > > > > time, the docker images will be cached. The problem right now
> is
> > > that
> > > > > > their
> > > > > > > apt-server is unavailable, which means the initial installation to
> > create
> > > > the
> > > > > > > docker cache doesn't succeed. In future, this will be cached.
> > > > > > >
> > > > > > > -Marco
> > > > > > >
> > > > > > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier <

Re: CI failure due to offline llvm.org

2018-01-11 Thread Chris Olivier
I will send you a link to the ticket offline since it's from my day job

On Thu, Jan 11, 2018 at 4:51 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> deleteDir() deletes the content of the current workspace
>
> Okay, I haven't seen any errors related to lua-package not being deleted.
> Do you have a CI-link by any chance?
>
> -Marco
>
> On Fri, Jan 12, 2018 at 1:49 AM, Chris Olivier 
> wrote:
>
> > what is deleteDir() call doing in Jenkinsfile?
> > Yes, I mentioned the case where it wasn't getting cleaned.
> >
> > On Thu, Jan 11, 2018 at 4:41 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > During git_init: First we're just using git clean, if checkout fails,
> > we're
> > > deleting the entire workspace and retrying.
> > >
> > > During build: First we're using regular make. If build fails, we're
> using
> > > make clean before executing make again.
> > >
> > > During test: No cleanup happening in case of failure.
> > >
> > > So far, I haven't noticed any files not being deleted in the workspace.
> > Do
> > > you know an example?
> > >
> > > -Marco
> > >
> > > On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier 
> > > wrote:
> > >
> > > > What approach is used now?  I see in Jenkinsfile() that deleteDir()
> is
> > > > called at the top of init_git() and init_git_win().  That deletes
> the
> > > > whole directory, correct?
> > > >
> > > > Before there were problems with 'git clean -d -f' *not* deleting some
> > > > directories which were tracked on one branch and not on another,
> which
> > I
> > > > believe is why deleteDir() was put there. The directory I recall was
> > > > something like lua-package or something that was in someone's private
> > > repo
> > > > or something like that...
> > > >
> > > > On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> > > > marco.g.ab...@googlemail.com> wrote:
> > > >
> > > > > While it's a quite harsh solution to delete the entire workspace, I
> > > think
> > > > > that it's a good way. Git checkout takes between 2 and 10 seconds,
> > so I
> > > > > don't think we need to optimize in that regard.
> > > > >
> > > > > git clean is our 'soft' approach to clean up. Deleting the
> workspace
> > is
> > > > the
> > > > > 'hard' approach, so this shouldn't be an issue.
> > > > >
> > > > > But there is one catch: Windows builds are not containerized and
> > while
> > > we
> > > > > delete the workspace, there could still be a lot of files which are
> > not
> > > > > being tracked. In future I'd like to have at least a
> > file-system-layer
> > > in
> > > > > between our tests and the host, but we will have to analyze if
> > > something
> > > > > like this exists. At the moment, we even got tests writing to
> > system32.
> > > > >
> > > > > -Marco
> > > > >
> > > > > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier <
> > cjolivie...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Ok, but still on that note. I remember before that when some
> > problems
> > > > > were
> > > > > > being fixed in CI (before your time), they switched to deleting
> the
> > > > > entire
> > > > > > source directory, ".git" subdirectory and all.  At the time, the
> CI
> > > was
> > > > > in
> > > > > > such a chaotic state that I didn't make an issue of it, but now
> > that
> > > > it
> > > > > > has stabilized (for the most part, today's incident
> > > notwithstanding), I
> > > > > > think that we may want to revisit it if it is still doing that.
> > you
> > > > > could,
> > > > > > for example, just delete everything except the .git directory and
> > > then
> > > > > do a
> > > > > > 'git reset --hard' to get back a baseline before having to
> > > re-download
> > > > > > everything every time (also should speed up the builds).
> > > > > >
> > > > > > Note that 'git clean' was not working as it doesn't delete
> > 'unknown'
> > > > > > directories, which was the problem.
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > >
> > > > > > > This happens because we just merged the clang compilation
> > > > > > > https://github.com/apache/incubator-mxnet/commit/
> > > > > > > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > > > > > > This means that clang has to get installed on all slaves and
> > after
> > > > some
> > > > > > > time, the docker images will be cached. The problem right now
> is
> > > that
> > > > > > their
> > > > > > > apt-server is unavailable, which means the initial installation to
> > create
> > > > the
> > > > > > > docker cache doesn't succeed. In future, this will be cached.
> > > > > > >
> > > > > > > -Marco
> > > > > > >
> > > > > > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier <
> > > > cjolivie...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > >  do we download all submodules from scratch every build?  if
> we
> > > do
> > > > > then
> > > > > > > we
> > > > > > > 

Re: CI failure due to offline llvm.org

2018-01-11 Thread Marco de Abreu
During git_init: First we're just using git clean, if checkout fails, we're
deleting the entire workspace and retrying.

During build: First we're using regular make. If build fails, we're using
make clean before executing make again.

During test: No cleanup happening in case of failure.

So far, I haven't noticed any files not being deleted in the workspace. Do
you know an example?

-Marco

On Fri, Jan 12, 2018 at 1:34 AM, Chris Olivier 
wrote:

> What approach is used now?  I see in Jenkinsfile() that deleteDir() is
> called at the top of init_git() and init_git_win().  That deletes the
> whole directory, correct?
>
> Before there were problems with 'git clean -d -f' *not* deleting some
> directories which were tracked on one branch and not on another, which I
> believe is why deleteDir() was put there. The directory I recall was
> something like lua-package or something that was in someone's private repo
> or something like that...
>
> On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > While it's a quite harsh solution to delete the entire workspace, I think
> > that it's a good way. Git checkout takes between 2 and 10 seconds, so I
> > don't think we need to optimize in that regard.
> >
> > git clean is our 'soft' approach to clean up. Deleting the workspace is
> the
> > 'hard' approach, so this shouldn't be an issue.
> >
> > But there is one catch: Windows builds are not containerized and while we
> > delete the workspace, there could still be a lot of files which are not
> > being tracked. In future I'd like to have at least a file-system-layer in
> > between our tests and the host, but we will have to analyze if something
> > like this exists. At the moment, we even got tests writing to system32.
> >
> > -Marco
> >
> > On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier 
> > wrote:
> >
> > > Ok, but still on that note. I remember before that when some problems
> > were
> > > being fixed in CI (before your time), they switched to deleting the
> > entire
> > > source directory, ".git" subdirectory and all.  At the time, the CI was
> > in
> > > such a chaotic state that I didn't make an issue of it, but now that
> it
> > > has stabilized (for the most part, today's incident notwithstanding), I
> > > think that we may want to revisit it if it is still doing that.  you
> > could,
> > > for example, just delete everything except the .git directory and then
> > do a
> > > 'git reset --hard' to get back a baseline before having to re-download
> > > everything every time (also should speed up the builds).
> > >
> > > Note that 'git clean' was not working as it doesn't delete 'unknown'
> > > directories, which was the problem.
> > >
> > > WDYT?
> > >
> > > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com> wrote:
> > >
> > > > This happens because we just merged the clang compilation
> > > > https://github.com/apache/incubator-mxnet/commit/
> > > > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > > > This means that clang has to get installed on all slaves and after
> some
> > > > time, the docker images will be cached. The problem right now is that
> > > their
> > > > apt-server is unavailable, which means the initial installation to create
> the
> > > > docker cache doesn't succeed. In future, this will be cached.
> > > >
> > > > -Marco
> > > >
> > > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier <
> cjolivie...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > >  do we download all submodules from scratch every build?  if we do
> > then
> > > > we
> > > > > should probably find a way not to suggest just doing git reset or
> > > > something
> > > > > like that
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > we're currently experiencing a CI outage caused by
> > > http://apt.llvm.org
> > > > > not
> > > > > > being reachable.
> > > > > >
> > > > > > Best regards,
> > > > > > Marco
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: CI failure due to offline llvm.org

2018-01-11 Thread Chris Olivier
What approach is used now?  I see in Jenkinsfile() that deleteDir() is
called at the top of init_git() and init_git_win().  That deletes the
whole directory, correct?

Before there were problems with 'git clean -d -f' *not* deleting some
directories which were tracked on one branch and not on another, which I
believe is why deleteDir() was put there. The directory I recall was
something like lua-package or something that was in someone's private repo
or something like that...

On Thu, Jan 11, 2018 at 4:02 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> While it's a quite harsh solution to delete the entire workspace, I think
> that it's a good way. Git checkout takes between 2 and 10 seconds, so I
> don't think we need to optimize in that regard.
>
> git clean is our 'soft' approach to clean up. Deleting the workspace is the
> 'hard' approach, so this shouldn't be an issue.
>
> But there is one catch: Windows builds are not containerized and while we
> delete the workspace, there could still be a lot of files which are not
> being tracked. In future I'd like to have at least a file-system-layer in
> between our tests and the host, but we will have to analyze if something
> like this exists. At the moment, we even got tests writing to system32.
>
> -Marco
>
> On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier 
> wrote:
>
> > Ok, but still on that note. I remember before that when some problems
> were
> > being fixed in CI (before your time), they switched to deleting the
> entire
> > source directory, ".git" subdirectory and all.  At the time, the CI was
> in
> > such a chaotic state that I didn't make an issue of it, but now that it
> > has stabilized (for the most part, today's incident notwithstanding), I
> > think that we may want to revisit it if it is still doing that.  you
> could,
> > for example, just delete everything except the .git directory and then
> do a
> > 'git reset --hard' to get back a baseline before having to re-download
> > everything every time (also should speed up the builds).
> >
> > Note that 'git clean' was not working as it doesn't delete 'unknown'
> > directories, which was the problem.
> >
> > WDYT?
> >
> > On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > This happens because we just merged the clang compilation
> > > https://github.com/apache/incubator-mxnet/commit/
> > > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > > This means that clang has to get installed on all slaves and after some
> > > time, the docker images will be cached. The problem right now is that
> > their
> > > apt-server is unavailable, which means the initial installation to create the
> > > docker cache doesn't succeed. In future, this will be cached.
> > >
> > > -Marco
> > >
> > > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier  >
> > > wrote:
> > >
> > > >  do we download all submodules from scratch every build?  if we do
> then
> > > we
> > > > should probably find a way not to suggest just doing git reset or
> > > something
> > > > like that
> > > >
> > > >
> > > >
> > > > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
> > > > marco.g.ab...@googlemail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > we're currently experiencing a CI outage caused by
> > http://apt.llvm.org
> > > > not
> > > > > being reachable.
> > > > >
> > > > > Best regards,
> > > > > Marco
> > > > >
> > > >
> > >
> >
>


Re: CI failure due to offline llvm.org

2018-01-11 Thread Marco de Abreu
Good news: The server is back up. The installation and docker cache
regeneration should succeed now.

-Marco

On Fri, Jan 12, 2018 at 1:02 AM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> While it's a quite harsh solution to delete the entire workspace, I think
> that it's a good way. Git checkout takes between 2 and 10 seconds, so I
> don't think we need to optimize in that regard.
>
> git clean is our 'soft' approach to clean up. Deleting the workspace is
> the 'hard' approach, so this shouldn't be an issue.
>
> But there is one catch: Windows builds are not containerized and while we
> delete the workspace, there could still be a lot of files which are not
> being tracked. In future I'd like to have at least a file-system-layer in
> between our tests and the host, but we will have to analyze if something
> like this exists. At the moment, we even got tests writing to system32.
>
> -Marco
>
> On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier 
> wrote:
>
>> Ok, but still on that note. I remember before that when some problems were
>> being fixed in CI (before your time), they switched to deleting the entire
>> source directory, ".git" subdirectory and all.  At the time, the CI was in
>> such a chaotic state that I didn't make an issue of it, but now that it
>> has stabilized (for the most part, today's incident notwithstanding), I
>> think that we may want to revisit it if it is still doing that.  you
>> could,
>> for example, just delete everything except the .git directory and then do
>> a
>> 'git reset --hard' to get back a baseline before having to re-download
>> everything every time (also should speed up the builds).
>>
>> Note that 'git clean' was not working as it doesn't delete 'unknown'
>> directories, which was the problem.
>>
>> WDYT?
>>
>> On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
>> marco.g.ab...@googlemail.com> wrote:
>>
>> > This happens because we just merged the clang compilation
>> > https://github.com/apache/incubator-mxnet/commit/
>> > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
>> > This means that clang has to get installed on all slaves and after some
>> > time, the docker images will be cached. The problem right now is that
>> their
>> > apt-server is unavailable, which means the initial installation to create the
>> > docker cache doesn't succeed. In future, this will be cached.
>> >
>> > -Marco
>> >
>> > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier 
>> > wrote:
>> >
>> > >  do we download all submodules from scratch every build?  if we do
>> then
>> > we
>> > > should probably find a way not to suggest just doing git reset or
>> > something
>> > > like that
>> > >
>> > >
>> > >
>> > > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
>> > > marco.g.ab...@googlemail.com>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > we're currently experiencing a CI outage caused by
>> http://apt.llvm.org
>> > > not
>> > > > being reachable.
>> > > >
>> > > > Best regards,
>> > > > Marco
>> > > >
>> > >
>> >
>>
>
>


Re: CI failure due to offline llvm.org

2018-01-11 Thread Marco de Abreu
While it's a quite harsh solution to delete the entire workspace, I think
that it's a good way. Git checkout takes between 2 and 10 seconds, so I
don't think we need to optimize in that regard.

git clean is our 'soft' approach to clean up. Deleting the workspace is the
'hard' approach, so this shouldn't be an issue.

But there is one catch: Windows builds are not containerized and while we
delete the workspace, there could still be a lot of files which are not
being tracked. In future I'd like to have at least a file-system-layer in
between our tests and the host, but we will have to analyze if something
like this exists. At the moment, we even got tests writing to system32.

-Marco

On Fri, Jan 12, 2018 at 12:44 AM, Chris Olivier 
wrote:

> Ok, but still on that note. I remember before that when some problems were
> being fixed in CI (before your time), they switched to deleting the entire
> source directory, ".git" subdirectory and all.  At the time, the CI was in
> such a chaotic state that I didn't make an issue of it, but now that it
> has stabilized (for the most part, today's incident notwithstanding), I
> think that we may want to revisit it if it is still doing that.  You could,
> for example, just delete everything except the .git directory and then do a
> 'git reset --hard' to get back to a baseline before having to re-download
> everything every time (this should also speed up the builds).
>
> Note that 'git clean' was not working as it doesn't delete 'unknown'
> directories, which was the problem.
>
> WDYT?
>
> On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > This happens because we just merged the clang compilation
> > https://github.com/apache/incubator-mxnet/commit/
> > 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> > This means that clang has to get installed on all slaves, and after some
> > time, the docker images will be cached. The problem right now is that their
> > apt-server is unavailable, which means the initial installation to create
> > the docker cache doesn't succeed. In the future, this will be cached.
> >
> > -Marco
> >
> > On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier 
> > wrote:
> >
> > >  do we download all submodules from scratch every build?  if we do then
> > we
> > > should probably find a way not to suggest just doing git reset or
> > something
> > > like that
> > >
> > >
> > >
> > > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
> > > marco.g.ab...@googlemail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > we're currently experiencing a CI outage caused by
> http://apt.llvm.org
> > > not
> > > > being reachable.
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > >
> >
>


Re: CI failure due to offline llvm.org

2018-01-11 Thread Chris Olivier
Ok, but still on that note. I remember before that when some problems were
being fixed in CI (before your time), they switched to deleting the entire
source directory, ".git" subdirectory and all.  At the time, the CI was in
such a chaotic state that I didn't make an issue of it, but now that it
has stabilized (for the most part, today's incident notwithstanding), I
think that we may want to revisit it if it is still doing that.  You could,
for example, just delete everything except the .git directory and then do a
'git reset --hard' to get back to a baseline before having to re-download
everything every time (this should also speed up the builds).

Note that 'git clean' was not working as it doesn't delete 'unknown'
directories, which was the problem.

WDYT?

On Thu, Jan 11, 2018 at 3:26 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> This happens because we just merged the clang compilation
> https://github.com/apache/incubator-mxnet/commit/
> 2b73aac527a3439ec0dc9b1e76c6df09ea347eb1.
> This means that clang has to get installed on all slaves, and after some
> time, the docker images will be cached. The problem right now is that their
> apt-server is unavailable, which means the initial installation to create
> the docker cache doesn't succeed. In the future, this will be cached.
>
> -Marco
>
> On Thu, Jan 11, 2018 at 11:45 PM, Chris Olivier 
> wrote:
>
> >  do we download all submodules from scratch every build?  if we do then
> we
> > should probably find a way not to suggest just doing git reset or
> something
> > like that
> >
> >
> >
> > On Thu, Jan 11, 2018 at 1:47 PM Marco de Abreu <
> > marco.g.ab...@googlemail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > we're currently experiencing a CI outage caused by http://apt.llvm.org
> > not
> > > being reachable.
> > >
> > > Best regards,
> > > Marco
> > >
> >
>


Re: R Build failure

2018-01-11 Thread Haibin Lin
+1 for using free datasets, or datasets without license issues, and hosting
them on S3 buckets to reduce external dependencies.

On 2018-01-06 15:26, kellen sunderland  wrote: 
> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
> is failing to download this dataset: http://files.grouplens.org/datasets/
> movielens/ml-100k.zip .  The site https://grouplens.org/ appears to be down.
> 
> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> PR to skip the test here:
> https://github.com/apache/incubator-mxnet/pull/9333
> 
> -Kellen
> 


Re: Test failures due to mxnet.text

2018-01-11 Thread Haibin Lin
I noticed that, too. I pinged the contributor to investigate the cause of the 
failure. Thanks for reporting this, Marco.

Best,
Haibin


On 2018-01-11 13:45, Marco de Abreu  wrote: 
> Hello,
> 
> Apparently, the recently introduced mxnet.text API
> (https://github.com/apache/incubator-mxnet/pull/8763) causes test failures.
> It would be great if the following two issues could be investigated:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/master/runs/175/nodes/336/steps/629/log/?start=0
> 
> test_text.test_glove ... FAIL
> Traceback (most recent call last):
> 
>   File "C:\Anaconda3\envs\py2\lib\site-packages\nose\case.py", line 197, in
> runTest
> 
> self.test(*self.arg)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_text.py",
> line 125, in test_glove
> 
> 'glove', pretrained_file_name='glove.6B.50d.txt')
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
> line 371, in create
> 
> return create_text_embedding(embedding_name, **kwargs)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\registry.py",
> line 163, in create
> 
> return registry[name](*args, **kwargs)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
> line 538, in __init__
> 
> self._load_embedding(pretrained_file_path, ' ', init_unknown_vec)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
> line 201, in _load_embedding
> 
> % (line_num, token, len(elems), vec_len)
> 
> AssertionError: At line 321803 of the pre-trained token embedding file: the
> dimension of token nonslip is 7 but the dimension of previous tokens is 50.
> Dimensions of all the tokens must be the same.
> 
>  >> begin captured logging << 
> 
> root: INFO: Loading pre-trained token embedding vectors from
> C:\Windows\system32\config\systemprofile\.mxnet\embeddings\glove\glove.6B.50d.txt
> 
> - >> end captured logging << -
> 
> 
> Also, we got a skipped test:
> test_text.test_fasttext ...
> C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py:188:
> UserWarning: At line 1 of the pre-trained text embedding file: token 111051
> with 1-dimensional vector [300.0] is likely a header and is skipped.
> 
>   'skipped.' % (line_num, token, elems))
> 
> 
> 
> Thank you
> 
> -Marco
> 


Re: Refactoring MXNet scala code to use "org.apache.mxnet"

2018-01-11 Thread YiZhi Liu
Now that we have changed the package prefix to org.apache, does someone have
guidance on publishing it to Apache's Maven repository?

ml.dmlc.mxnet was previously published to oss.sonatype.org.

2018-01-04 17:14 GMT-08:00 Lupesko, Hagay :

> Suneel,
>
> I tend to think for this issue, GitHub issue is good enough and we do not
> need JIRA.
> Can you clarify what is the advantage you see in using JIRA over GitHub
> issue for this specific case?
>
> Thanks!
> Hagay
>
> On 1/4/18, 16:34, "Suneel Marthi"  wrote:
>
> Jira has been around for a while -
> https://issues.apache.org/jira/projects/MXNET/
>
> switch to using jira
>
> On Thu, Jan 4, 2018 at 7:31 PM, Roshani Nagmote <
> roshaninagmo...@gmail.com>
> wrote:
>
> >  Hi,
> >
> > As MXNet currently does not have a Jira project, I have created a GitHub
> > issue for now.
> > https://github.com/apache/incubator-mxnet/issues/9315
> >
> > Will create the PR and link the issue there.
> >
> > Thanks,
> > Roshani
> >
> > On Thu, Jan 4, 2018 at 3:08 PM, Naveen Swamy 
> wrote:
> >
> > > Hi Suneel,
> > >
> > > Did we decide that we will be using Jira going forward? If not, can someone
> > > summarize the consensus on the improvement email thread, and let's make it
> > > universal: how to use it, what is expected, etc.
> > >
> > > For the record, I like the idea of using Jira for more openness.
> > >
> > > Also, MXNet does not have a Jira project; can you help create one?
> > >
> > > Thanks, Naveen
> > >
> > >
> > > On Thu, Jan 4, 2018 at 2:35 PM, Suneel Marthi 
> > wrote:
> > >
> > > > Is there a Jira for this? Please create a Jira and reference
> that in
> > the
> > > PR
> > > > for this.
> > > >
> > > > On Thu, Jan 4, 2018 at 5:16 PM, Roshani Nagmote <
> > > roshaninagmo...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I am working on publishing mxnet-scala release to maven
> repository
> > and
> > > > as a
> > > > > part of that, I will also be refactoring mxnet-scala
> > code/tests/example
> > > > and
> > > > > docs to use "org.apache.mxnet" instead of "ml.dmlc.mxnet".
> > > > >
> > > > > Currently, MXNet-Scala
> > > > > 
> library
> > uses
> > > > > "ml.dmlc.mxnet" packages. This work will change the way to
> import
> > > modules
> > > > > when using mxnet-scala package.
> > > > >
> > > > > *Old way:*
> > > > >
> > > > > scala> import ml.dmlc.mxnet._
> > > > >import ml.dmlc.mxnet._
> > > > > scala> val arr = NDArray.ones(2, 3)
> > > > >arr: ml.dmlc.mxnet.NDArray = ml.dmlc.mxnet.NDArray@f5e74790
> > > > >
> > > > > *New way:*
> > > > >
> > > > > scala> import org.apache.mxnet._
> > > > >import org.apache.mxnet._
> > > > > scala> val arr = NDArray.ones(2, 3)
> > > > >arr: org.apache.mxnet.NDArray = org.apache.mxnet.NDArray@f5e74790
> > > > >
> > > > >
> > > > > Please let me know if anyone has any thoughts or issues with
> this
> > > change.
> > > > >
> > > > > Thanks,
> > > > > Roshani
> > > > >
> > > >
> > >
> >
>
>
>
>


-- 
Yizhi Liu
DMLC member
Amazon Web Services
Vancouver, Canada


CI failure due to offline llvm.org

2018-01-11 Thread Marco de Abreu
Hello,

we're currently experiencing a CI outage caused by http://apt.llvm.org not
being reachable.

Best regards,
Marco


Test failures due to mxnet.text

2018-01-11 Thread Marco de Abreu
Hello,

Apparently, the recently introduced mxnet.text API
(https://github.com/apache/incubator-mxnet/pull/8763) causes test failures.
It would be great if the following two issues could be investigated:
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/master/runs/175/nodes/336/steps/629/log/?start=0

test_text.test_glove ... FAIL
Traceback (most recent call last):

  File "C:\Anaconda3\envs\py2\lib\site-packages\nose\case.py", line 197, in
runTest

self.test(*self.arg)

  File
"C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_text.py",
line 125, in test_glove

'glove', pretrained_file_name='glove.6B.50d.txt')

  File
"C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
line 371, in create

return create_text_embedding(embedding_name, **kwargs)

  File
"C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\registry.py",
line 163, in create

return registry[name](*args, **kwargs)

  File
"C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
line 538, in __init__

self._load_embedding(pretrained_file_path, ' ', init_unknown_vec)

  File
"C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
line 201, in _load_embedding

% (line_num, token, len(elems), vec_len)

AssertionError: At line 321803 of the pre-trained token embedding file: the
dimension of token nonslip is 7 but the dimension of previous tokens is 50.
Dimensions of all the tokens must be the same.

 >> begin captured logging << 

root: INFO: Loading pre-trained token embedding vectors from
C:\Windows\system32\config\systemprofile\.mxnet\embeddings\glove\glove.6B.50d.txt

- >> end captured logging << -


Also, we got a skipped test:
test_text.test_fasttext ...
C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py:188:
UserWarning: At line 1 of the pre-trained text embedding file: token 111051
with 1-dimensional vector [300.0] is likely a header and is skipped.

  'skipped.' % (line_num, token, elems))



Thank you

-Marco


Re: Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread Anirudh
Hi,


I have been thinking about exception handling, specifically inside spawned
threads.

As Tianqi mentioned, there is already a mechanism with LOG(FATAL) or CHECK
for exception handling inside the main thread. For exception handling inside
spawned threads I see two places: iterators and operators.

For iterators, we can use exception_ptr to transport exceptions from the
child thread to the main thread. This can be implemented in the threadediter
class template. Since PrefetchingIter is used by most iterators in MXNet,
and PrefetchingIter uses threadediter, we should be able to cover most of
our use cases.
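
As a minimal, self-contained illustration of the idea (ToyThreadedIter below
is a toy stand-in, not MXNet's actual threadediter or PrefetchingIter), the
producer thread captures any exception with std::current_exception() and the
consumer rethrows it on the main thread:

    // Toy sketch: transport an exception from a producer thread to the consumer.
    #include <condition_variable>
    #include <exception>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <stdexcept>
    #include <thread>

    class ToyThreadedIter {
     public:
      void Start() {
        worker_ = std::thread([this] {
          try {
            for (int i = 0; i < 3; ++i) Push(i);
            throw std::runtime_error("failed to read batch");  // simulate a failure
          } catch (...) {
            std::lock_guard<std::mutex> lk(mu_);
            eptr_ = std::current_exception();  // capture instead of crashing
          }
          { std::lock_guard<std::mutex> lk(mu_); done_ = true; }
          cv_.notify_all();
        });
      }
      // Returns false when the producer is done; rethrows on the caller's
      // thread if the producer failed.
      bool Next(int* out) {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return !queue_.empty() || done_; });
        if (!queue_.empty()) { *out = queue_.front(); queue_.pop(); return true; }
        if (eptr_) std::rethrow_exception(eptr_);  // surfaces in the main thread
        return false;
      }
      ~ToyThreadedIter() { if (worker_.joinable()) worker_.join(); }

     private:
      void Push(int v) {
        { std::lock_guard<std::mutex> lk(mu_); queue_.push(v); }
        cv_.notify_all();
      }
      std::thread worker_;
      std::mutex mu_;
      std::condition_variable cv_;
      std::queue<int> queue_;
      std::exception_ptr eptr_;
      bool done_ = false;
    };

    int main() {
      ToyThreadedIter it;
      it.Start();
      try {
        int v;
        while (it.Next(&v)) std::cout << "batch " << v << "\n";
      } catch (const std::exception& e) {
        std::cout << "caught in main thread: " << e.what() << "\n";
      }
      return 0;
    }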



For operators, I was thinking that we can transport the exception down the
dependency path. For example, when an exception is caught inside
ExecuteOprBlock for a single operator, we store the exception_ptr in the
operator. We then propagate the exception_ptr down to all the vars that the
operator writes to. Similarly, if an operator's read vars have an
exception_ptr attached to them, we propagate it down to the operator itself.

We can then check whether a var has an associated exception_ptr in
wait_to_read.

One problem I see with this approach is that even if an operator fails, we
may still need to run subsequent operators. One way to avoid this would be
an onstart callback, which would mark the operator to not execute if any of
its read vars have an exception_ptr attached to them.
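
To make that concrete, here is a heavily simplified, self-contained sketch of
the propagation idea; Var, Opr, ExecuteOprBlock and WaitToRead below are toy
stand-ins for illustration, not MXNet's real engine types:

    // Toy model of propagating exception_ptr along the dependency chain.
    #include <exception>
    #include <functional>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    struct Var { std::exception_ptr eptr; };

    struct Opr {
      std::vector<Var*> read_vars, write_vars;
      std::function<void()> fn;
    };

    void ExecuteOprBlock(const Opr& opr) {
      // "onstart" check: skip execution if any dependency already failed,
      // and forward the failure to this operator's outputs.
      for (Var* r : opr.read_vars) {
        if (r->eptr) {
          for (Var* w : opr.write_vars) w->eptr = r->eptr;
          return;
        }
      }
      try {
        opr.fn();
      } catch (...) {
        for (Var* w : opr.write_vars) w->eptr = std::current_exception();
      }
    }

    void WaitToRead(const Var& v) {
      if (v.eptr) std::rethrow_exception(v.eptr);  // surface the error to the caller
    }

    int main() {
      Var a, b, c;
      Opr op1{{&a}, {&b}, [] { throw std::runtime_error("op1 failed"); }};
      Opr op2{{&b}, {&c}, [] { std::cout << "op2 ran\n"; }};  // gets skipped
      ExecuteOprBlock(op1);
      ExecuteOprBlock(op2);
      try {
        WaitToRead(c);
      } catch (const std::exception& e) {
        std::cout << "caught: " << e.what() << "\n";
      }
      return 0;
    }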



Anirudh

On Thu, Jan 11, 2018 at 9:02 AM, Tianqi Chen 
wrote:

> I am all for RAII when possible in most of the code. The only reason some
> raw pointers occur in the dmlc codebase is a legacy issue, and that can be
> resolved by wrapping the returned pointer in unique_ptr or shared_ptr. One
> notable property of RAII is exception safety, which makes the code handle
> resources correctly when something throws in the middle. There are cases
> where memory allocation needs to be explicitly handled (e.g. GPU memory
> management) and reused, where we need to do explicit management when needed.
>
>
> As for exception handling, we do have a mechanism for handling exceptions:
> what you raise with LOG(FATAL) or CHECK is caught at the C API boundary,
> which translates to return code -1, and an error is thrown on the Python
> side. Throwing an exception from another thread is a trickier thing, which
> involves catching it in the engine, and usually the state is not correct in
> such a case. But most of the cases where we need exception handling are the
> simple case of opening a file, and using CHECK should suffice.
>
> A better approach might be defining a new macro for errors intended to be
> thrown to the user and handled correctly, something like DMLC_EXPECT. But I
> find it might be a burden to ask developers to distinguish what should be a
> user error versus a normal check, so we just use CHECK for now.
>
> Tianqi
>
> On Thu, Jan 11, 2018 at 3:09 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> wrote:
>
> > Hi
> >
> > I would like to encourage contributors to use RAII idioms in C++
> > whenever possible to avoid resource leaks.
> >
> > RAII is an ugly acronym that stands for Resource Acquisition Is
> > Initialization, which basically means that you should almost never use
> > explicit new and delete operators and instead use std::make_shared,
> > std::make_unique and std::vector  and .data() for raw
> > buffers. Also always allocating OS resources in constructors releasing
> > them in destructors such as file descriptors.
> >
> > Asides from forgetting to call delete on an allocation, explicit
> > deletes are bad because an exception thrown in the middle prevents
> > delete from running entirely.
> >
> > This helps a lot writing correct, secure and exception safe code
> > without memory leaks.
> >
> > Another problem that I think is worth a discussion, is how to handle
> > exceptions and errors. Right now, I don't think there's a good way to
> > throw an exception in some functions without crashing the python
> > interpreter. I think we should come with a smart way to propagate
> > exceptions from the library up to the user runtime (python, scala...)
> >
> > As an example of what I'm talking about is this suspicious code that I
> > saw in a PR, which has several bugs in a few lines of code related to
> > what I'm discussing in this thread, crashing Python when trying to
> > open a file that doesn't exist. (How to propagate an exception in this
> > case?)
> >
> > https://github.com/apache/incubator-mxnet/pull/9370/files
> >
> > Please excuse the clickbait subject, just trying to grab your
> > attention in a humorous way now that the weekend is approaching.
> >
> > Pedro.
> >
>


Re: Release plan - MXNET 1.0.1

2018-01-11 Thread Asmus Hetzel
Hello Haibin,

We have the following in the release notes under performance improvements:
"Integrated MKLDNN for CPU training and inference acceleration"

My impression is that this is what PR 8302 is about. I browsed through the code
and understand and agree with what this PR is trying to achieve. But I must
admit that I would feel uncomfortable wrapping this up into a release schedule
this early. This PR touches 115 files in a sometimes intrusive way and is not
yet fully tested nor integrated into the master branch. Wrapping such a PR in
this late puts a big risk on the release that we should only take when
absolutely unavoidable. I personally would either remove this from the release
or otherwise move the release date.

Let me know if I misunderstood anything.

Regards,
Asmus



 

On Thursday, 11 January 2018, 00:34:04 CET, Haibin Lin wrote:
 
I am starting the process to prepare for the MXNet 1.0.1 release. I have
drafted release notes
(https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.0.1+Release+Notes)
to cover the tasks under this release.

A release candidate will be cut on Monday 22nd Jan, 2018, and voting will
commence from then until Thursday 25th Jan, 2018. If you have any additional
features in progress and would like to include them in this release, please
ensure they have been merged by Thursday 18th Jan, 2018, and leave a comment
so I can update the release notes.

Feel free to add any other comments/suggestions.

Thanks,
Haibin
  

Re: Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread McCollum, Cliff
+1 to this.

I once worked on a 500k LOC C++ distributed system for the telecom industry
that only had new() and delete() calls on six lines. Everything else was done
via the RAII pattern with the std:: facilities. The net effect was that even
in such a large system, we recorded only three memory leaks in six years of
development and production - and two of them weren't even in our code.

Banning all use of manual memory management is absolutely possible with current 
C++. Anyone calling new (unless you are using auto_ptr) should pause and look 
for a better way. 

Cliff


Sent from my iPad

> On 11 Jan 2018, at 11:10, Pedro Larroy  wrote:
> 
> Hi
> 
> I would like to encourage contributors to use RAII idioms in C++
> whenever possible to avoid resource leaks.
> 
> RAII is an ugly acronym that stands for Resource Acquisition Is
> Initialization, which basically means that you should almost never use
> explicit new and delete operators, and instead use std::make_shared,
> std::make_unique, and std::vector (with .data() for raw buffers). Also,
> always allocate OS resources, such as file descriptors, in constructors
> and release them in destructors.
>
> Aside from forgetting to call delete on an allocation, explicit deletes
> are bad because an exception thrown in the middle prevents the delete
> from running at all.
>
> This helps a lot with writing correct, secure and exception-safe code
> without memory leaks.
>
> Another problem that I think is worth discussing is how to handle
> exceptions and errors. Right now, I don't think there's a good way to
> throw an exception in some functions without crashing the Python
> interpreter. I think we should come up with a smart way to propagate
> exceptions from the library up to the user runtime (Python, Scala, ...).
>
> As an example of what I'm talking about, here is some suspicious code
> that I saw in a PR, which has several bugs in a few lines related to what
> I'm discussing in this thread: it crashes Python when trying to open a
> file that doesn't exist. (How do we propagate an exception in this case?)
> 
> https://github.com/apache/incubator-mxnet/pull/9370/files
> 
> Please excuse the clickbait subject, just trying to grab your
> attention in a humorous way now that the weekend is approaching.
> 
> Pedro.


Reduce 99% of your memory leaks with this simple trick!

2018-01-11 Thread Pedro Larroy
Hi

I would like to encourage contributors to use RAII idioms in C++
whenever possible to avoid resource leaks.

RAII is an ugly acronym that stands for Resource Acquisition Is
Initialization, which basically means that you should almost never use
explicit new and delete operators, and instead use std::make_shared,
std::make_unique, and std::vector (with .data() for raw buffers). Also,
always allocate OS resources, such as file descriptors, in constructors and
release them in destructors.

Aside from forgetting to call delete on an allocation, explicit deletes are
bad because an exception thrown in the middle prevents the delete from
running at all.

This helps a lot with writing correct, secure and exception-safe code
without memory leaks.
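
As a generic illustration (not code from the MXNet codebase; the file name and
helper names are made up), this is roughly what the RAII style looks like for
a C file handle and a raw buffer; every resource is released automatically
even when an exception is thrown halfway through:

    // Generic RAII illustration: resources are released on every exit path,
    // including when an exception is thrown mid-function.
    #include <cstdio>
    #include <memory>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // FILE* wrapped in unique_ptr with a custom deleter: fclose always runs.
    struct FileCloser {
      void operator()(std::FILE* f) const { if (f) std::fclose(f); }
    };
    using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

    FilePtr OpenFile(const char* path) {
      FilePtr f(std::fopen(path, "rb"));
      if (!f) throw std::runtime_error(std::string("cannot open ") + path);
      return f;
    }

    void Process(const char* path) {
      FilePtr f = OpenFile(path);               // closed on every exit path
      std::vector<unsigned char> buf(1 << 20);  // heap buffer, no new/delete
      auto state = std::make_unique<int>(0);    // instead of `new int(0)`
      if (std::fread(buf.data(), 1, buf.size(), f.get()) == 0)
        throw std::runtime_error("read failed");  // no leak: f and buf clean up
      // ... use buf.data() and *state ...
    }

    int main() {
      try {
        Process("example.dat");  // hypothetical file name
      } catch (const std::exception& e) {
        std::fprintf(stderr, "error: %s\n", e.what());
      }
      return 0;
    }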

Another problem that I think is worth discussing is how to handle exceptions
and errors. Right now, I don't think there's a good way to throw an exception
in some functions without crashing the Python interpreter. I think we should
come up with a smart way to propagate exceptions from the library up to the
user runtime (Python, Scala, ...).
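
One common way to do this is to catch everything at the C API boundary and
convert it into an error code plus a message the frontend can fetch and
re-raise. A stripped-down, hypothetical sketch of that pattern (the names
below are made up, not MXNet's actual C API) could look like:

    // Hypothetical sketch of the "catch at the C API boundary" pattern.
    #include <cstdio>
    #include <exception>
    #include <stdexcept>
    #include <string>

    namespace {
    thread_local std::string last_error;  // fetched by the frontend after -1
    }

    extern "C" const char* ToyAPIGetLastError() { return last_error.c_str(); }

    // Every entry point wraps its C++ body in try/catch so no exception ever
    // crosses the language boundary; the frontend raises an error instead.
    extern "C" int ToyNDArrayLoad(const char* path) {
      try {
        if (path == nullptr || *path == '\0')
          throw std::runtime_error("empty path");  // e.g. what a CHECK would raise
        // ... real work would go here ...
        return 0;
      } catch (const std::exception& e) {
        last_error = e.what();
        return -1;
      } catch (...) {
        last_error = "unknown C++ exception";
        return -1;
      }
    }

    int main() {
      if (ToyNDArrayLoad("") != 0) {
        std::printf("frontend would raise: %s\n", ToyAPIGetLastError());
      }
      return 0;
    }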

As an example of what I'm talking about, here is some suspicious code that
I saw in a PR, which has several bugs in a few lines related to what I'm
discussing in this thread: it crashes Python when trying to open a file that
doesn't exist. (How do we propagate an exception in this case?)

https://github.com/apache/incubator-mxnet/pull/9370/files

Please excuse the clickbait subject, just trying to grab your
attention in a humorous way now that the weekend is approaching.

Pedro.