Re: [DISCUSS] Apache MXNet: Path to graduation

2019-08-30 Thread Anton Chernov
As a physicist I would like to point out that "Gluon" means: an elementary
particle that acts as the exchange particle for the strong force between
quarks [1].
As a general scientific term it can hardly be seen as a candidate for
trademark registration.

[1] https://en.wikipedia.org/wiki/Gluon

On Fri, 30 Aug 2019 at 10:27, Leonard Lausen  wrote:

> Carin recently noted that gluonhq.com already uses the Gluon brand for
> end-to-end enterprise mobile solutions and Marco found that they have
> apparently done so since at least 2015. Do you see any impact on the Gluon brand
> for deep learning models?
>
> The MXNet brand is currently also unregistered by Apache (but registered
> by various other companies), whereas for example Tensorflow is
> registered by Google LLC in a variety of jurisdictions. Trademarks
> registered under the Madrid System can be found at
> https://www3.wipo.int/branddb/en/
>
> Best regards
> Leonard
>
> Hen  writes:
>
> > Amazon. Amazon created the brand. They own the first repository to use
> the
> > term in this context ( https://github.com/gluon-api ). There was some
> > involvement from Microsoft, so Microsoft's opinion may also be relevant.
> > Gluon is not an Apache Software Foundation nor Apache MXNet brand.
> >
> > Unless it was very recent, I don't believe there have been any trademark
> > registrations. If Amazon would prefer Apache control the Gluon naming, I
> > think the simplest 'act' to make that so would be to move the gluon-api
> > repository over to ASF control.
> >
> > Hen
> >
> > On Thu, Aug 29, 2019 at 8:27 AM Chris Olivier 
> wrote:
> >
> >> Who is the gluon “Brand Owner”?
> >>
> >> On Tue, Aug 27, 2019 at 10:43 AM Chris Olivier 
> >> wrote:
> >>
> >> > Who is the gluon "brand owner"?
> >> >
> >> > On Tue, Aug 27, 2019 at 10:13 AM Qing Lan 
> wrote:
> >> >
> >> >> Hi Lieven,
> >> >>
> >> >> Thanks for your comments. After the discussion with several
> committers
> >> >> and contributors offline, we agreed that there is room for
> >> improvement.
> >> >>
> >> >>
> >> >>   1.  About the Gluon naming
> >> >>
> >> >> As we know, Gluon was born with a unique API design pattern and
> >> >> gradually became the dominant Python front end for MXNet. I would
> >> >> suggest discussing further with the brand owner whether there could
> >> >> be a deeper integration with MXNet. MXNet itself has become more
> >> >> popular through this frontend. We lean on the strong community and
> >> >> improve our product by consuming its feedback.
> >> >>
> >> >>  2. Diversity of the PMC
> >> >> Currently, we have 40 PMC members from different companies, like
> >> >> Amazon, Uber, NVIDIA, ByteDance and more. We are trying to grow that
> >> >> number and invite individuals from different companies as well as
> >> >> research institutes.
> >> >>
> >> >> 3. Release rotation
> >> >> Historically, most of the releases were done by the Amazon side. We
> >> >> are now rotating this responsibility so that contributors/committers
> >> >> not from Amazon start working on them.
> >> >>
> >> >> 4. Committers from different firms/institutions should do real work
> >> >> on MXNet
> >> >> I can tell from the issues/PRs/RFCs they submitted, and indeed we
> >> >> should encourage the committers who are less active to get involved
> >> >> in contributing to MXNet.
> >> >>
> >> >> Thanks,
> >> >> Qing
> >> >>
> >> >> 
> >> >> From: Lieven Govaerts 
> >> >> Sent: Saturday, August 10, 2019 5:59
> >> >> To: dev@mxnet.incubator.apache.org 
> >> >> Cc: d...@mxnet.apache.org 
> >> >> Subject: Re: [DISCUSS] Apache MXNet: Path to graduation
> >> >>
> >> >> Hi Qing,
> >> >>
> >> >> as a user and ASF member observing this project:
> >> >>
> >> >> On Sat, 10 Aug 2019 at 01:44, Qing Lan  wrote:
> >> >>
> >> >> > Hi All,
> >> >> >
> >> >> > I would like to start a thread to discuss the graduation of Apache
> >> >> > MXNet. From my time working in the community, I have seen great
> >> >> > improvement in most of the areas that make MXNet a better place. We
> >> >> > keep track of all the issues users raise and review PRs. We follow
> >> >> > the Apache Way to release the package in the official repository.
> >> >> >
> >> >> >
> >> >> In terms of code, documentation and visibility this project is
> >> >> certainly in a healthy state. I see a lot of interest from companies
> >> >> and people, and the community is growing... As a user that gives me
> >> >> confidence that my time invested in this product is well spent.
> >> >>
> >> >>
> >> >> > In 2017, Apache MXNet joined the Apache Incubator. I think now is
> >> >> > a good time to review the path to graduation for MXNet and move
> >> >> > forward on it. Please feel free to share your thoughts on graduation
> >> >> > and room for improvement.
> >> >> >
> >> >> >
> >> >> If I may share one 

[INVITATION] Berlin MXNet Recurring User Group Meeting

2019-06-25 Thread Anton Chernov
Dear MXNet Community,

This is a friendly reminder that the Berlin MXNet Recurring User Group
Meeting will
be held today at 6pm-7pm (CEST) / 9am-10am (PST). More info here:

https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings

https://chime.aws/9346098196

Best
Anton


Re: [VOTE] Remove conflicting OpenMP from CMake builds

2019-06-18 Thread Anton Chernov
 > than this, so I don't really have much else to add.
> > >
> > > ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
> > > linux-vdso.so.1 (0x7ffc989cf000)
> > > libmklml_intel.so =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libmklml_intel.so
> > > (0x7f0afb7c1000)
> > >* libiomp5.so =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libiomp5.so
> > > (0x7f0afb3e5000)*
> > > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1
> (0x7f0afb1dd000)
> > > libmkldnn.so.0 =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libmkldnn.so.0
> > > (0x7f0afa7ba000)
> > > libgfortran.so.3 =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libgfortran.so.3
> > > (0x7f0afa493000)
> > > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
> (0x7f0afa28f000)
> > > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> > > (0x7f0af9f06000)
> > > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
> (0x7f0af9b68000)
> > > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
> > > (0x7f0af995)
> > > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
> > > (0x7f0af9731000)
> > > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
> (0x7f0af934)
> > > /lib64/ld-linux-x86-64.so.2 (0x7f0b073f4000)
> > > libquadmath.so.0 =>
> > > /usr/local/lib/python3.6/dist-packages/mxnet/libquadmath.so.0
> > > (0x7f0af910)
> > >
> > >
> > > On Mon, Jun 17, 2019 at 10:58 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > > wrote:
> > >
> > > > I had read the "Apache Voting Process" guide here:
> > > > https://www.apache.org/foundation/voting.html  and I thought code
> > > > changes could be discussed on the mailing list in cases where the PR
> > > > is stuck or there's no response for a long time, also I understood
> > > > that -1's have to be justified.  Could you, or someone more familiar
> > > > with the Apache way, enlighten us on how to move this issue forward in a
> > > > constructive way?
> > > >
> > > > Thanks a lot.
> > > >
> > > > Pedro.
> > > >
> > > > On Mon, Jun 17, 2019 at 10:46 AM Pedro Larroy
> > > >  wrote:
> > > > >
> > > > > Thanks.
> > > > >
> > > > > How do we go about advancing this PR then? All the questions have
> > > > > been answered, performance numbers provided and more. How long can
> > > > > a veto stand, especially without replies to contributors?
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Fri, Jun 14, 2019 at 5:44 PM Sheng Zha 
> wrote:
> > > > > >
> > > > > > This vote is invalid as the original PR has been vetoed by a
> > > > committer. A vote on dev@ won't help you circumvent a veto.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > On 2019/06/14 23:59:33, Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > > > > > Hi all
> > > > > > >
> > > > > > > This is a 5-day vote to act on and wrap up an outstanding PR
> > > > > > > that removes linkage with multiple OpenMP runtimes from 3rdparty
> > > > > > > and uses the system-provided one, which might resolve a number
> > > > > > > of difficult-to-debug issues and possible undefined behaviour.
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > > > > >
> > > > > > > See the comments in the thread for more details, but in short,
> > > > > > > linking with multiple OpenMP versions seems to lead to undefined
> > > > > > > behaviour; in addition, not having to deal with a custom OpenMP
> > > > > > > version and relying on the platform-provided one would simplify
> > > > > > > things.
> > > > > > >
> > > > > > > This is expected to simplify builds and address a number of
> > > > > > > problems. It doesn't seem to cause any performance degradation
> > > > > > > (the Gluon tests run almost 4x faster on my 64-core machine).
> > > > > > >
> > > > > > > There has been an in-depth study of the performance implications
> > > > > > > by contributors like Stanislav Tsukrov and Anton Chernov. All the
> > > > > > > concerns and comments from the reviewers have been addressed, and
> > > > > > > we can't keep asking open-ended questions to block PRs. Reviewers
> > > > > > > are expected to be proactive and responsive to contributors so
> > > > > > > that we keep encouraging active contributors.
> > > > > > >
> > > > > > > Please vote to merge this PR accordingly:
> > > > > > >
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > >
> > > > > > > If we observe regressions reported by any internal performance
> > > > > > > systems or by contributors, the PR can be reverted easily, so
> > > > > > > it's not a one-way door. But it will be useful to try this in
> > > > > > > master for a while.
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > >
>


Re: [Discussion] Remove bundled llvm OpenMP

2019-05-22 Thread Anton Chernov
We are now waiting for a committer's review and merge.

Wed, 22 May 2019 at 22:14, Pedro Larroy :

> Thanks Aaron and Anton! Can we rebase to update the PR? Let me know
> how I can help further if you find any problems.
>
> On Wed, May 22, 2019 at 6:49 AM Aaron Markham 
> wrote:
> >
> > I reopened it for you.
> >
> > On Wed, May 22, 2019, 05:25 Anton Chernov  wrote:
> >
> > > I don't have the necessary rights to reopen this PR.
> > >
> > > Mon, 20 May 2019 at 08:00, Pedro Larroy  >:
> > >
> > > > Hi Anton, Stas.
> > > >
> > > > Can we reopen this PR and get it merged as per the data collected by
> > > Stas?
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > >
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
> > > >
> > > > There are multiple issues that will be fixed by solving this problem.
> > > >
> > > >
> > > > Pedro
> > > >
> > > > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov 
> > > wrote:
> > > > >
> > > > > I would like to propose a possible alternative solution for
> > > > consideration.
> > > > >
> > > > > If keeping llvm OpenMP as a submodule is inevitable, one could make
> > > > > the following adjustments:
> > > > >
> > > > > Since compilers try to find their own OpenMP library implicitly,
> MXNet
> > > > > needs to ensure that only the bundled version is found. Therefore
> > > during
> > > > > the build and also during deployment this library has to provide
> > > symlinks
> > > > > for each possible compiler that would link to the built artifact
> ie.
> > > > >
> > > > > libiomp.so -> libgomp.so -> libomp.so
> > > > >
> > > > > The MKLML iomp would need to be hidden and removed as well.
> > > > >
> > > > > On Windows it would be a different story, but as can be seen [1]
> > > bundled
> > > > > OpenMP was not included in the Windows build anyway.
> > > > >
> > > > > Alternatively: always use iomp (with the same symlinking trick though)
> > > > provided
> > > > > by MKLML distribution [2]. This potentially could work on Windows
> as
> > > > well.
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > > > > [2] https://github.com/intel/mkl-dnn/releases
> > > > >
> > > > > Tue, 12 Feb 2019 at 11:22, Anton Chernov :
> > > > >
> > > > > > Recent benchmarking results have been published here [1].
> Experiments
> > > > > > compare different OpenMP implementations as well as binaries
> compiled
> > > > with
> > > > > > different compilers including GCC, Clang and ICC.
> > > > > >
> > > > > > During experimentation another issue with mixing up libraries
> was
> > > > > > identified and described here [2].
> > > > > >
> > > > > > Best
> > > > > > Anton
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > > > > [2]
> > > > > >
> > > >
> > >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > > > > >
> > > > > >
> > > > > > Sun, 9 Dec 2018 at 16:28, Anton Chernov :
> > > > > >
> > > > > >> Hi Chris,
> > > > > >>
> > > > > >> Following up on the issue, are all things resolved in the
> > > discussion?
> > > > > >>
> > > > > >> If yes, I kindly ask you to reopen this PR and remove
> ‘requesting
> > > > > >> changes’ status:
> > > > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > > > >>
> > > > > >> Thank you.
> > > > > >>
> > > > > >>
> > > > > >> B

Re: [Discussion] Remove bundled llvm OpenMP

2019-05-22 Thread Anton Chernov
Great! Thank you, Aaron. I have rebased it.

Wed, 22 May 2019 at 15:49, Aaron Markham :

> I reopened it for you.
>
> On Wed, May 22, 2019, 05:25 Anton Chernov  wrote:
>
> > I don't have the necessary rights to reopen this PR.
> >
> > Mon, 20 May 2019 at 08:00, Pedro Larroy :
> >
> > > Hi Anton, Stas.
> > >
> > > Can we reopen this PR and get it merged as per the data collected by
> > Stas?
> > >
> > > https://github.com/apache/incubator-mxnet/pull/12160
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
> > >
> > > There are multiple issues that will be fixed by solving this problem.
> > >
> > >
> > > Pedro
> > >
> > > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov 
> > wrote:
> > > >
> > > > I would like to propose a possible alternative solution for
> > > consideration.
> > > >
> > > > If keeping llvm OpenMP as a submodule is inevitable, one could make
> > > > the following adjustments:
> > > >
> > > > Since compilers try to find their own OpenMP library implicitly,
> MXNet
> > > > needs to ensure that only the bundled version is found. Therefore
> > during
> > > > the build and also during deployment this library has to provide
> > symlinks
> > > > for each possible compiler that would link to the built artifact, i.e.
> > > >
> > > > libiomp.so -> libgomp.so -> libomp.so
> > > >
> > > > The MKLML iomp would need to be hidden and removed as well.
> > > >
> > > > On Windows it would be a different story, but as can be seen [1]
> > bundled
> > > > OpenMP was not included in the Windows build anyway.
> > > >
> > > > Alternatively: always use iomp (with the same symlinking trick though)
> > > provided
> > > > by MKLML distribution [2]. This potentially could work on Windows as
> > > well.
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > > > [2] https://github.com/intel/mkl-dnn/releases
> > > >
> > > > Tue, 12 Feb 2019 at 11:22, Anton Chernov :
> > > >
> > > > > Recent benchmarking results have been published here [1].
> Experiments
> > > > > compare different OpenMP implementations as well as binaries
> compiled
> > > with
> > > > > different compilers including GCC, Clang and ICC.
> > > > >
> > > > > During experimentation another issue with mixing up libraries was
> > > > > identified and described here [2].
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > > > [2]
> > > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > > > >
> > > > >
> > > > > Sun, 9 Dec 2018 at 16:28, Anton Chernov :
> > > > >
> > > > >> Hi Chris,
> > > > >>
> > > > >> Following up on the issue, are all things resolved in the
> > discussion?
> > > > >>
> > > > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> > > > >> changes’ status:
> > > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > > >>
> > > > >> Thank you.
> > > > >>
> > > > >>
> > > > >> Best
> > > > >> Anton
> > > > >>
> > > > >>
> > > > >> Tue, 27 Nov 2018 at 17:15, Anton Chernov  >:
> > > > >>
> > > > >>> Another thing to take into consideration:
> > > > >>>
> > > > >>> All python artefacts that are created (PyPi) are built with make
> > and
> > > are
> > > > >>> not using the bundled OpenMP library.
> > > > >>>
> > > > >>> One step for the switch to CMake to happen is the approval and
> > > merging
> > > > >>> of the m

Re: [Discussion] Remove bundled llvm OpenMP

2019-05-22 Thread Anton Chernov
I don't have the necessary rights to reopen this PR.

Mon, 20 May 2019 at 08:00, Pedro Larroy :

> Hi Anton, Stas.
>
> Can we reopen this PR and get it merged as per the data collected by Stas?
>
> https://github.com/apache/incubator-mxnet/pull/12160
>
>
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
>
> There are multiple issues that will be fixed by solving this problem.
>
>
> Pedro
>
> On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov  wrote:
> >
> > I would like to propose a possible alternative solution for
> consideration.
> >
> > If keeping llvm OpenMP as a submodule is inevitable, one could make
> > the following adjustments:
> >
> > Since compilers try to find their own OpenMP library implicitly, MXNet
> > needs to ensure that only the bundled version is found. Therefore during
> > the build and also during deployment this library has to provide symlinks
> > for each possible compiler that would link to the built artifact, i.e.
> >
> > libiomp.so -> libgomp.so -> libomp.so
> >
> > The MKLML iomp would need to be hidden and removed as well.
> >
> > On Windows it would be a different story, but as can be seen [1] bundled
> > OpenMP was not included in the Windows build anyway.
> >
> > Alternatively: always use iomp (with the same symlinking trick though)
> provided
> > by MKLML distribution [2]. This potentially could work on Windows as
> well.
> >
> > Best
> > Anton
> >
> > [1]
> >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > [2] https://github.com/intel/mkl-dnn/releases
> >
> > Tue, 12 Feb 2019 at 11:22, Anton Chernov :
> >
> > > Recent benchmarking results have been published here [1]. Experiments
> > > compare different OpenMP implementations as well as binaries compiled
> with
> > > different compilers including GCC, Clang and ICC.
> > >
> > > During experimentation another issue with mixing up libraries was
> > > identified and described here [2].
> > >
> > > Best
> > > Anton
> > >
> > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > [2]
> > >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > >
> > >
> > > Sun, 9 Dec 2018 at 16:28, Anton Chernov :
> > >
> > >> Hi Chris,
> > >>
> > >> Following up on the issue, are all things resolved in the discussion?
> > >>
> > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> > >> changes’ status:
> > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>
> > >> Thank you.
> > >>
> > >>
> > >> Best
> > >> Anton
> > >>
> > >>
> > >> Tue, 27 Nov 2018 at 17:15, Anton Chernov :
> > >>
> > >>> Another thing to take into consideration:
> > >>>
> > >>> All python artefacts that are created (PyPi) are built with make and
> are
> > >>> not using the bundled OpenMP library.
> > >>>
> > >>> One step for the switch to CMake to happen is the approval and
> merging
> > >>> of the mentioned PR:
> > >>>
> > >>> https://github.com/apache/incubator-mxnet/pull/12160
> > >>>
> > >>> If there are no other objections I kindly ask Chris Olivier to remove
> > >>> his 'requesting changes' veto on it to unblock the CMake overhaul
> work.
> > >>>
> > >>> Thank you.
> > >>>
> > >>> Best
> > >>> Anton
> > >>>
> > >>> Thu, 22 Nov 2018 at 17:11, Anton Chernov :
> > >>>
> > >>>>
> > >>>> Thank you for your answer, Chris.
> > >>>>
> > >>>> > The whole “mixing omp libraries” is something that occurs in
> > >>>> production
> > >>>> every day and certainly in everything that uses mkl.
> > >>>>
> > >>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures
> that
> > >>>> this mixture is not happening:
> > >>>>
> > >>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> > >>>> runtime library to work. As different OpenMP runtim

[INVITATION] 7th of May 2019 / Berlin MXNet Recurring User Group Meeting

2019-05-07 Thread Anton Chernov
Dear MXNet community,

I would like to invite you to the regular Apache MXNet (Incubating) User
Group meeting on the 7th of May 2019 [1].

As usual, the meeting will have a remote VC, powered by Amazon Chime [2].
It will be held from 6pm-7pm (CEST) / 9am-10am (PST).

Join the meeting:

https://chime.aws/1980729377
Meeting ID: 1980729377

Looking forward to meeting you there.

Best
Anton

[1]
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings
[2] https://chime.aws/


Re: The Learning Robot

2019-03-29 Thread Anton Chernov
Thank you!

Maybe you can retweet mine:

https://twitter.com/lebegus/status/556984824885249

And ApacheMXNet could do that afterwards as well?

Anton

Fri, 29 Mar 2019 at 14:06, Carin Meier :

> Great story!
>
> I would love to retweet it on Twitter. Is there any way to get it out on the
> https://twitter.com/ApacheMXNet account?
>
> - Carin
>
> On Fri, Mar 29, 2019 at 6:37 AM Anton Chernov  wrote:
>
> > Dear MXNet Community,
> >
> >
> > Read the development story of a robotics demo powered by deep learning
> with
> > Apache MXNet on an embedded platform.
> >
> >
> > The Learning Robot
> >
> > Humans and machines, hand in hand.
> >
> > https://medium.com/apache-mxnet/the-learning-robot-1c2deab8f375
> >
> >
> > Best
> >
> > Anton
> >
>


The Learning Robot

2019-03-29 Thread Anton Chernov
Dear MXNet Community,


Read the development story of a robotics demo powered by deep learning with
Apache MXNet on an embedded platform.


The Learning Robot

Humans and machines, hand in hand.

https://medium.com/apache-mxnet/the-learning-robot-1c2deab8f375


Best

Anton


Re: Call for Ideas and Approaches to Community Building

2019-03-26 Thread Anton Chernov
Here is a demo with some impressions:

https://youtu.be/UwJxLztoI1o

The MXNet blog post will follow soon.

We wanted to show it at GTC as well, but couldn't allocate the needed time.

You can see the code in Thomas repository:

https://github.com/ThomasDelteil/RobotTracker_MXNet

But it's far from being readily reusable and lacks documentation.

I could see, though, that if we get enough time, we would wrap most things
into Docker containers, write proper instructions and give the community
the opportunity to contribute and show it on their own.

Best
Anton


Wed, 20 Mar 2019 at 17:08, Aaron Markham :

> Anton, can you share the design and specs and code for the robot arm demo?
> I wish that was being shown at GTC now. It would be great to let people
> borrow it for West coast events. Maybe I can get one built here in Palo
> Alto.
>
> On Tue, Mar 19, 2019, 05:54 Anton Chernov  wrote:
>
> > I don't know whether that is enough, but here are a few efforts we make
> to
> > promote MXNet:
> >
> > * The robotic arms demo at the embedded world
> > We promoted MXNet as the framework to go on embedded devices with our
> > robotic arms demo. We've got a lot of attention from different people
> > including professors from multiple universities. A blog post about the
> demo
> > will be posted in the next few days on the MXNet Medium blog [1].
> >
> > Here again some impressions from twitter:
> > https://twitter.com/lebegus/status/1100839414228500485
> >
> > * MLPerf results
> > We intend to publish more benchmark results to MLPerf [2], showing proof
> of
> > the performance advantages of MXNet.
> >
> > * Recurring user group meetings
> > We offer recurring VC meetings [3], free for everyone. We dedicate our
> time
> > to anyone that would like to know more about MXNet or to ask any other
> > related question.
> >
> > * Collaborative meetups
> > We organize meetups with attendants from various companies [4], sharing
> > their interesting use cases and best practices with ML and MXNet.
> >
> > Tracking works and papers at popular scientific conferences is a valid
> > metric, but it's focused on research. More and more people who don't
> > write papers use ML and MXNet in production without knowing all the
> > scientific details. How to measure how many are out there is an open
> > question.
> >
> > Best
> > Anton
> >
> > [1] https://medium.com/apache-mxnet
> > [2] https://mlperf.org/
> > [3] https://cwiki.apache.org/confluence/x/7BY0BQ
> > [4] https://www.meetup.com/Deep-Learning-with-Apache-MXNet-Berlin
> >
> >
> > Tue, 19 Mar 2019 at 07:23, Isabel Drost-Fromm :
> >
> > >
> > >
> > > On 19 March 2019 02:49:23 CET, "Zhao, Patric" <
> > > patric.z...@intel.com> wrote:
> > > >I suggest encouraging and funding the students/researchers to present
> > > >their work at popular conferences.
> > > >I know talking is easy but maybe the decision maker can allocate more
> > > >resources for marketing.
> > >
> > > Just for clarity, who exactly do you mean by "the decision maker"?
> > > Decision maker for what?
> > >
> > > On another note, beyond that one conference, which other channels do
> > > people here follow? How did you first hear about mxnet?
> > >
> > >
> > > Isabel
> > >
> > >
> > > --
> > > This message was sent from my Android device with K-9 Mail.
> > >
> >
>


Re: CI unstable

2019-03-26 Thread Anton Chernov
The fix for the CI system has been merged and the system should be stable
again. You can now rebase all stale PRs.

I have ported the fixes to the latest release branches as well:

Fixes for CI downloads (v1.4.x)
https://github.com/apache/incubator-mxnet/pull/14526

Fixes for CI downloads (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/14525

I would appreciate a review and merge on those.

Best
Anton

Fri, 22 Mar 2019 at 20:57, Anton Chernov :

> Yes, you can see the changes in this PR:
>
> https://github.com/apache/incubator-mxnet/pull/14504
>
> Unfortunately, there are still a few issues left to fix.
>
> Best
> Anton
>
> Fri, 22 Mar 2019 at 18:10, Mu Li :
>
>> I saw CI is downloading from data.dmlc.ml. Changing it to data.mxnet.io
>> should
>> fix this issue. Say
>>
>> http://data.dmlc.ml/models/imagenet/inception-bn/Inception-BN-0126.params
>> ->
>> http://data.mxnet.io/models/imagenet/inception-bn/Inception-BN-0126.params
>>
>> On Thu, Mar 21, 2019 at 11:57 AM Anton Chernov 
>> wrote:
>>
>> > Dear MXNet Community,
>> >
>> > For the past few days we have been experiencing problems with CI PR
>> > verification builds. For some reason unix-cpu builds get aborted.
>> > Potentially there is a problem with gitlab.com, from where dependencies
>> > are downloaded for static MXNet builds.
>> >
>> > We are working hard on finding and fixing the issue. Please excuse the
>> > inconvenience.
>> >
>> > Best
>> > Anton
>> >
>>
>


Re: CI unstable

2019-03-22 Thread Anton Chernov
Yes, you can see the changes in this PR:

https://github.com/apache/incubator-mxnet/pull/14504

Unfortunately, there are still a few issues left to fix.

Best
Anton

Fri, 22 Mar 2019 at 18:10, Mu Li :

> I saw CI is downloading from data.dmlc.ml. Changing it to data.mxnet.io
> should
> fix this issue. Say
>
> http://data.dmlc.ml/models/imagenet/inception-bn/Inception-BN-0126.params
> ->
> http://data.mxnet.io/models/imagenet/inception-bn/Inception-BN-0126.params
>
> On Thu, Mar 21, 2019 at 11:57 AM Anton Chernov 
> wrote:
>
> > Dear MXNet Community,
> >
> > For the past few days we have been experiencing problems with CI PR
> > verification builds. For some reason unix-cpu builds get aborted.
> > Potentially there is a problem with gitlab.com, from where dependencies
> > are downloaded for static MXNet builds.
> >
> > We are working hard on finding and fixing the issue. Please excuse the
> > inconvenience.
> >
> > Best
> > Anton
> >
>


CI unstable

2019-03-21 Thread Anton Chernov
Dear MXNet Community,

For the past few days we have been experiencing problems with CI PR
verification builds. For some reason unix-cpu builds get aborted. Potentially
there is a problem with gitlab.com, from where dependencies are downloaded for static
MXNet builds.

We are working hard on finding and fixing the issue. Please excuse the
inconvenience.

Best
Anton


Re: Call for Ideas and Approaches to Community Building

2019-03-19 Thread Anton Chernov
I don't know whether that is enough, but here are a few efforts we make to
promote MXNet:

* The robotic arms demo at the embedded world
We promoted MXNet as the framework to go on embedded devices with our
robotic arms demo. We've got a lot of attention from different people
including professors from multiple universities. A blog post about the demo
will be posted in the next few days on the MXNet Medium blog [1].

Here again some impressions from twitter:
https://twitter.com/lebegus/status/1100839414228500485

* MLPerf results
We intend to publish more benchmark results to MLPerf [2], showing proof of
the performance advantages of MXNet.

* Recurring user group meetings
We offer recurring VC meetings [3], free for everyone. We dedicate our time
to anyone that would like to know more about MXNet or to ask any other
related question.

* Collaborative meetups
We organize meetups with attendants from various companies [4], sharing
their interesting use cases and best practices with ML and MXNet.

Tracking works and papers at popular scientific conferences is a valid metric,
but it's focused on research. More and more people who don't write papers
use ML and MXNet in production without knowing all the scientific details.
How to measure how many are out there is an open question.

Best
Anton

[1] https://medium.com/apache-mxnet
[2] https://mlperf.org/
[3] https://cwiki.apache.org/confluence/x/7BY0BQ
[4] https://www.meetup.com/Deep-Learning-with-Apache-MXNet-Berlin


Tue, 19 Mar 2019 at 07:23, Isabel Drost-Fromm :

>
>
> On 19 March 2019 02:49:23 CET, "Zhao, Patric" <
> patric.z...@intel.com> wrote:
> >I suggest encouraging and funding the students/researchers to present
> >their work at popular conferences.
> >I know talking is easy but maybe the decision maker can allocate more
> >resources for marketing.
>
> Just for clarity, who exactly do you mean by "the decision maker"?
> Decision maker for what?
>
> On another note, beyond that one conference, which other channels do
> people here follow? How did you first hear about mxnet?
>
>
> Isabel
>
>
> --
> This message was sent from my Android device with K-9 Mail.
>


[INVITATION] 19th of March 2019 / Berlin MXNet Recurring User Group Meeting

2019-03-19 Thread Anton Chernov
Dear MXNet community,

I would like to invite you to the regular Apache MXNet (Incubating) User
Group meeting on the 19th of March 2019 [1].

As usual, the meeting will have a remote VC, powered by Amazon Chime [2].

Join the meeting:

https://chime.aws/4899512091
Meeting ID: 4899 51 2091

Best
Anton

[1]
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings
[2] https://chime.aws/


Re: [RFC] Integrating the new MXNet website

2019-03-11 Thread Anton Chernov
The link is not working:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089084



Page Not Found
We can't find that page. This could be because:

The page doesn't exist.
The page exists, but you don't have view permission for that space.



I have checked that I'm logged in.

Best
Anton

Thu, 7 Mar 2019 at 20:09, Aaron Markham :

> Thanks Aston.
>
> For contributors interested in helping improve the website's UX and design:
> I spent some time learning the Material Design ecosystem as the beta site
> uses this as its template. From there I refactored the wireframes using
> Sketch + Material Design plugin and added them to gallery.io. These new
> ones are for a typical user flow: they go from the Docs main page to Python
> to Gluon (mxnet.gluon) and finally to mxnet.gluon.data. I put these
> screenshots in the Website Redesign Information Architecture & Wireframe
> wiki article (that I moved to the Proposals section):
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089084
> Anyone that is interested can contribute directly to commenting on the
> Material Design wireframe revisions here (you can also see drafts for
> Ecosystem and Features):
>
> https://gallery.io/projects/MCHbtQVoQ2HCZfrqGyueCUJz/files/MCEJu8Y2hyDScX1HssyoxcEAyxZ6gioeNE8
>
> Cheers,
> Aaron
>
>
> On Mon, Mar 4, 2019 at 6:03 PM Aston Zhang  wrote:
>
> > Dear Community,
> >
> > We have published a post at
> > https://github.com/apache/incubator-mxnet/issues/14330 requesting for
> > comments on our proposal of integrating the new MXNet website. We'd like
> > to hear
> > the thoughts and suggestions from the community and welcome any form of
> > contribution to improve contents and UX of the MXNet website.
> >
> > Thanks,
> > MXNet developers from AWS
> >
>


Re: direction for documentation across various APIs that share common doc source

2019-03-04 Thread Anton Chernov
Hi Aaron,

Here is an idea: the main documentation is the one in the .cc files. In theory
the language bindings should just override some parts of it, like examples. If
I understand correctly, there is a Sphinx script that generates the
documentation. If it is run first for the core src folder and then from a
language-binding folder, it could use the -f, --force flag [1] to override the
needed parts. That would make it possible to provide a 'default' version of
the documentation that could be adjusted where needed.
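
To make this concrete, here is a minimal sketch of the two-pass idea, driven
from Python via sphinx.ext.apidoc. The folder names 'src/', 'python/mxnet/'
and 'docs/api' are hypothetical placeholders, and the sketch assumes both
folders contain importable modules:

from sphinx.ext.apidoc import main as apidoc

# First pass: generate the 'default' documentation stubs from the core folder.
apidoc(["-o", "docs/api", "src/"])

# Second pass: run again from the language-binding folder with --force,
# overwriting the defaults wherever the binding provides its own docs.
apidoc(["--force", "-o", "docs/api", "python/mxnet/"])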

Best
Anton

[1]
http://www.sphinx-doc.org/en/stable/man/sphinx-apidoc.html#sphinx-apidoc-manual-page

Tue, 26 Feb 2019 at 02:20, Aaron Markham :

> Hi everyone,
> A recent issue and pending PR has brought a thorny docs situation to
> my attention again and I'd like to hear from the community on how to
> proceed.
> We currently get some of the docs for the Python API pulled out of .cc
> files. Other APIs also get docs from there, or pull the Python docs to
> autogenerate their docs. This presents some problems:
> 1. (Some of) The code examples provided don't run when you copy and
> paste them. [1]
> 2. The code examples that show up in other APIs won't work as the code
> is Python and for (many/complicated) statements the syntax can be
> wrong.
>
> When I try out something new and go for the hello world example or
> browse around I do expect the docs' code examples to work. If they
> don't, well, that's a bad sign and I move on to another project. I'd
> like for new users to have a great experience no matter what language
> they use.
>
> One fix is to go ahead and be "Python 1st" and make sure the code
> executes. This route is proposed in a PR for some NDArray operators.
> [2] As I mention in the PR comments, this has the drawback of being
> very specific to Python and the psuedo-code, for what its worth,
> showing up in Scala docs (for example) will be much more obviously out
> of place. If I were a Scala person, I'd probably find this
> irritating. The same goes for R.
>
> So... what should we do? Here are some ideas:
> a) I thought about providing different examples in the .cc code, one
> for each language and then making sure those are parsed out properly
> when the APIs are generating their docs. I'm not sure how feasible
> this is.
> b) I thought that it would be nice if each operator had a wrapper for
> each language API, and this is where the example payload resides.
> Maybe docstrings go here too or the common docstrings just bubble up
> from the cc file. The benefit is that changes for a specific language
> remain in those packages and don't touch the shared core files.
> c) Another route is to keep the examples in the .cc files pseudo-code,
> but then also make sure each language has real examples in their docs.
> Then, any code block that's in the docs now that won't execute should
> be changed to a preformatted text block so people don't confuse it
> with functional code.
>
> I really don't like any of these options as they each sound like a ton
> of work and difficult to maintain. Are there any projects that solve
> this problem in some elegant and efficient way?
>
> Cheers,
> Aaron
>
> [1] https://github.com/apache/incubator-mxnet/issues/14232
> [2] https://github.com/apache/incubator-mxnet/pull/14243
>


Embedded World 2019 Robotics Demo

2019-02-27 Thread Anton Chernov
Dear MXNet Community,

If you happen to be at the Embedded World exhibition in Nürnberg, drop by
our booth at the Qt stand in hall 4 to see an MXNet robotics demo.

Looking forward to seeing you!

Best
Anton


Re: Benchmarking MXNet with different compilers and different OpenMP implementations (results)

2019-02-14 Thread Anton Chernov
Thank you, Aaron, for your interest in the topic.

My main previous proposal still stands: remove the bundled OpenMP submodule and
use OpenMP provided by the environment [1]. This might lead to performance
degradation in some cases where an old OpenMP library is used or thread
affinity wasn't set properly. But that would be a problem of the
environment, not MXNet.

I described some alternative solutions in [2] as part of this [3] thread.
Tricking the linker with symlinks in both cases should make it possible to
avoid multiple OpenMP implementations being linked simultaneously into MXNet.
Windows questions would still be open.
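
To illustrate what 'multiple OpenMP implementations linked simultaneously'
means in practice, here is a minimal diagnostic sketch (Linux-only; the
library path is just an example taken from the ldd output quoted earlier in
this digest):

import subprocess

def linked_openmp_runtimes(lib):
    """List the OpenMP runtimes that `lib` is dynamically linked against."""
    out = subprocess.run(["ldd", lib], capture_output=True,
                         text=True, check=True).stdout
    prefixes = ("libgomp", "libiomp", "libomp")
    return sorted({line.split()[0] for line in out.splitlines()
                   if line.split() and line.split()[0].startswith(prefixes)})

runtimes = linked_openmp_runtimes(
    "/usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so")
if len(runtimes) > 1:
    print("Warning: multiple OpenMP runtimes linked:", runtimes)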

Best
Anton

[1] https://github.com/apache/incubator-mxnet/pull/12160
[2]
https://lists.apache.org/thread.html/007d8db15a1782e1b20896a4050b62710d4ff0908c67b94af7cb0f8b@%3Cdev.mxnet.apache.org%3E
[3]
https://lists.apache.org/thread.html/4827f0f742b6e7e070da350ea81226d059401527f3072ce8b33c1fdf@%3Cdev.mxnet.apache.org%3E


Tue, 12 Feb 2019 at 16:39, Aaron Markham :

> This is really great research. I've often wondered what the difference
> really is, and why it has to be so complicated. It seems the answer is
> there isn't much difference and it shouldn't be as complex.
> As for your next steps, would you propose that cmake be brought up to
> parity? It seems strange that it causes slowness and if so, it shouldn't be
> recommended for now.
> Also, testing for Windows compilers might be quite important, as install
> stats suggest a significant portion of users are on Windows. Wouldn't this nudge
> the decision of what to use as a rule going forward?
> I ran into this submodule openmp issue on windows myself. How does that get
> fixed? Do we have to repackage all of the submodules to make sure they use
> the recommended implementation, or that they use what the system expects?
>
> Cheers,
> Aaron
>
> On Tue, Feb 12, 2019, 04:37 Anton Chernov  wrote:
>
> > Dear MXNet community,
> >
> > Due to multiple problems related to OpenMP and a stale proposed change [1]
> we
> > have been working on gathering performance data on the impact of using
> > different OpenMP implementations with MXNet (great thanks to Stanislav
> > Tsukrov for the hard work). The results can be found here [2].
> >
> > As a short summary of the investigation: The difference between different
> > compilers is insignificant. Native OpenMP implementations (more or less
> > recent) perform equally (<5% difference). See more details in the
> document.
> >
> > Please review the document and share your thoughts on the topic.
> >
> > Thanks!
> >
> > Best
> > Anton
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/4827f0f742b6e7e070da350ea81226d059401527f3072ce8b33c1fdf@%3Cdev.mxnet.apache.org%3E
> > 
> > [2] https://cwiki.apache.org/confluence/x/2wclBg
> >
>


Re: [Discussion] Remove bundled llvm OpenMP

2019-02-12 Thread Anton Chernov
I would like to propose a possible alternative solution for consideration.

If keeping llvm OpenMP as a submodule is inevitable, one could make the
following adjustments:

Since compilers try to find their own OpenMP library implicitly, MXNet
needs to ensure that only the bundled version is found. Therefore during
the build and also during deployment this library has to provide symlinks
for each possible compiler that would link to the built artifact, i.e.

libiomp.so -> libgomp.so -> libomp.so

The MKLML iomp would need to be hidden and removed as well.
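
As a rough sketch of what such a deployment step could look like (all paths
here are hypothetical; it assumes the bundled build produces libomp.so in a
known directory):

import os

# Hypothetical directory containing the bundled OpenMP build artifact.
lib_dir = "build/3rdparty/openmp/runtime/src"
artifact = "libomp.so"

# Aliases under which other compilers' link steps would otherwise pick up
# their own runtimes (GNU gomp, Intel iomp5).
for alias in ("libgomp.so", "libiomp5.so"):
    link_path = os.path.join(lib_dir, alias)
    if os.path.lexists(link_path):
        os.remove(link_path)
    # Point every alias at the single bundled implementation.
    os.symlink(artifact, link_path)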

On Windows it would be a different story, but as can be seen [1] bundled
OpenMP was not included in the Windows build anyway.

Alternatively: always use iomp (with the same symlinking trick though) provided
by MKLML distribution [2]. This potentially could work on Windows as well.

Best
Anton

[1]
https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
[2] https://github.com/intel/mkl-dnn/releases

Tue, 12 Feb 2019 at 11:22, Anton Chernov :

> Recent benchmarking results have been published here [1]. Experiments
> compare different OpenMP implementations as well as binaries compiled with
> different compilers including GCC, Clang and ICC.
>
> During experimentation another issue with mixing up libraries was
> identified and described here [2].
>
> Best
> Anton
>
> [1] https://cwiki.apache.org/confluence/x/2wclBg
> [2]
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
>
>
> Sun, 9 Dec 2018 at 16:28, Anton Chernov :
>
>> Hi Chris,
>>
>> Following up on the issue, are all things resolved in the discussion?
>>
>> If yes, I kindly ask you to reopen this PR and remove ‘requesting
>> changes’ status:
>> https://github.com/apache/incubator-mxnet/pull/12160
>>
>> Thank you.
>>
>>
>> Best
>> Anton
>>
>>
> >> Tue, 27 Nov 2018 at 17:15, Anton Chernov :
>>
>>> Another thing to take into consideration:
>>>
>>> All python artefacts that are created (PyPi) are built with make and are
>>> not using the bundled OpenMP library.
>>>
>>> One step for the switch to CMake to happen is the approval and merging
>>> of the mentioned PR:
>>>
>>> https://github.com/apache/incubator-mxnet/pull/12160
>>>
>>> If there are no other objections I kindly ask Chris Olivier to remove
>>> his 'requesting changes' veto on it to unblock the CMake overhaul work.
>>>
>>> Thank you.
>>>
>>> Best
>>> Anton
>>>
> >>> Thu, 22 Nov 2018 at 17:11, Anton Chernov :
>>>
>>>>
> >>>> Thank you for your answer, Chris.
>>>>
>>>> > The whole “mixing omp libraries” is something that occurs in
>>>> production
>>>> every day and certainly in everything that uses mkl.
>>>>
>>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
>>>> this mixture is not happening:
>>>>
>>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>>>> runtime library to work. As different OpenMP runtimes may not be binary
>>>> compatible it's important to ensure that only one OpenMP runtime is used
>>>> throughout the application. Having more than one OpenMP runtime initialized
>>>> may lead to undefined behavior resulting in incorrect results or crashes."
>>>> [1]
>>>>
>>>> That is why 2 different MKLML libraries are provided:
>>>>
>>>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
>>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
>>>> runtime
>>>>
>>>> > is the suggestion that libiomp be removed from mkl?
>>>>
>>>> That is certainly not my suggestion.
>>>>
>>>> > have you spoken with intel? have you consulted Intel at all?
>>>>
>>>> Yes, I have asked for comments on the issue.
>>>>
>>>> > “hard to debug random crash”. you’re seeing an assertion which is
>>>> probably ...
>>>>
>>>> I'm seeing the result of undefined behaviour. And I want to put
>>>> emphasis on the following statement:
>>>>
> >>>> Regardless of whether there is a particular reason for the assert, it
> >>>> is a result of behaviour that should not happen. There are valid ways
> >>>> to use llvm OpenMP in MXNet and the current way is not one of them.
>>>>
>>>>

Benchmarking MXNet with different compilers and different OpenMP implementations (results)

2019-02-12 Thread Anton Chernov
Dear MXNet community,

Due to multiple problems related to OpenMP and a stale proposed change [1] we
have been working on gathering performance data on the impact of using
different OpenMP implementations with MXNet (great thanks to Stanislav
Tsukrov for the hard work). The results can be found here [2].

As a short summary of the investigation: The difference between different
compilers is insignificant. Native OpenMP implementations (more or less
recent) perform equally (<5% difference). See more details in the document.

Please review the document and share your thoughts on the topic.

Thanks!

Best
Anton

[1]
https://lists.apache.org/thread.html/4827f0f742b6e7e070da350ea81226d059401527f3072ce8b33c1fdf@%3Cdev.mxnet.apache.org%3E
[2] https://cwiki.apache.org/confluence/x/2wclBg


Re: [Discussion] Remove bundled llvm OpenMP

2019-02-12 Thread Anton Chernov
Recent benchmarking results have been published here [1]. Experiments
compare different OpenMP implementations as well as binaries compiled with
different compilers including GCC, Clang and ICC.

During experimentation another issue with mixing up libraries was
identified and described here [2].

Best
Anton

[1] https://cwiki.apache.org/confluence/x/2wclBg
[2]
https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041


Sun, 9 Dec 2018 at 16:28, Anton Chernov :

> Hi Chris,
>
> Following up on the issue, are all things resolved in the discussion?
>
> If yes, I kindly ask you to reopen this PR and remove ‘requesting changes’
> status:
> https://github.com/apache/incubator-mxnet/pull/12160
>
> Thank you.
>
>
> Best
> Anton
>
>
> Tue, 27 Nov 2018 at 17:15, Anton Chernov :
>
>> Another thing to take into consideration:
>>
>> All python artefacts that are created (PyPi) are built with make and are
>> not using the bundled OpenMP library.
>>
>> One step for the switch to CMake to happen is the approval and merging of
>> the mentioned PR:
>>
>> https://github.com/apache/incubator-mxnet/pull/12160
>>
>> If there are no other objections I kindly ask Chris Olivier to remove his
>> 'requesting changes' veto on it to unblock the CMake overhaul work.
>>
>> Thank you.
>>
>> Best
>> Anton
>>
> >> Thu, 22 Nov 2018 at 17:11, Anton Chernov :
>>
>>>
> >>> Thank you for your answer, Chris.
>>>
>>> > The whole “mixing omp libraries” is something that occurs in production
>>> every day and certainly in everything that uses mkl.
>>>
>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
>>> this mixture is not happening:
>>>
>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>>> runtime library to work. As different OpenMP runtimes may not be binary
>>> compatible it's important to ensure that only one OpenMP runtime is used
>>> throughout the application. Having more than one OpenMP runtime initialized
>>> may lead to undefined behavior resulting in incorrect results or crashes."
>>> [1]
>>>
>>> That is why 2 different MKLML libraries are provided:
>>>
>>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
>>> runtime
>>>
>>> > is the suggestion that libiomp be removed from mkl?
>>>
>>> That is certainly not my suggestion.
>>>
>>> > have you spoken with intel? have you consulted Intel at all?
>>>
>>> Yes, I have asked for comments on the issue.
>>>
>>> > “hard to debug random crash”. you’re seeing an assertion which is
>>> probably ...
>>>
>>> I'm seeing the result of undefined behaviour. And I want to put emphasis
>>> on the following statement:
>>>
> >>> Regardless of whether there is a particular reason for the assert, it
> >>> is a result of behaviour that should not happen. There are valid ways
> >>> to use llvm OpenMP in MXNet and the current way is not one of them.
>>>
>>> > The lack of root-causing the problem and knee-jerk solution here makes
>>> me
>>> uncomfortable.
>>>
>>> I hope that my efforts highlighting the problems reach you to mitigate
> >>> your discomfort.
>>>
>>> > if you want to see performance differences there’s an environment
>>> variable
>>> you can set in the mxnet omp tuning code that will print overhead and
>>> execution times for the current omp library.
>>>
>>> I don't want to see performance differences in the current OpenMP
>>> library. I want to remove the current OpenMP library and use the one
>>> provided by the compiler.
>>>
>>>
>>>
>>> Best
>>> Anton
>>>
>>> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>>>
> >>> Thu, 22 Nov 2018 at 16:50, Chris Olivier :
>>>
>>>> Do you not work on CI mostly? My apologies for thinking that was some
>>>> sort
>>>> of team effort between you and a few others that were passionate about
>>>> CI
>>>> keeping the CI system running smoothly.
>>>>
>>>> You have source code, you have the line the assertion is on. If you
>>>> can’t
>>>> describe what’s going wrong that causes the assertion, then I don’t
>

Re: Taxonomy on our cwiki

2019-01-21 Thread Anton Chernov
A quick tip about links to wiki pages; note the difference between these links:

* https://cwiki.apache.org/confluence/display/MXNET/Release+Process (1)
* https://cwiki.apache.org/confluence/x/BINjB (2)

If sharing is done via the 'Share' menu, link (2) will persist after
any structural movements.

Best
Anton


Sat, 19 Jan 2019 at 16:49, Pedro Larroy :

> +1
>
> On Sat, Jan 19, 2019 at 2:51 PM Zhao, Patric 
> wrote:
> >
> > +1, Good idea.
> >
> > It's not very easy to find the related content since there are lots of
> > folders on the website.
> >
> >
> > > -Original Message-
> > > From: Sheng Zha [mailto:zhash...@apache.org]
> > > Sent: Saturday, January 19, 2019 3:28 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Taxonomy on our cwiki
> > >
> > > Hi MXNet,
> > >
> > > Given that currently cwiki is the only place other than mxnet website
> for
> > > mxnet-related documentation, I'd like to request your attention to the
> > > (slightly disorganized) cwiki page of MXNet. The top level folders
> (and their
> > > contents) currently looks like this:
> > > - Design Proposals* (bag of proposals, not in order)
> > > - Development* (mixture of guides, roadmaps, processes)
> > > - Release Process (release notes)
> > > - Website (guides and proposals)
> > > - MXNet Clojure (call for contribution, guides)
> > > - MXNet Keras Integration (design)
> > > - MXNet-ONNX Integration (design, dev status)
> > > - MXNet R Package (guide, backlog)
> > > - MXNet-Scala (design, dev status, guide)
> > > - Content Formatting Templates (not a folder but link to two docs)
> > > - How-to articles (1 guide)
> > > - Community (guide on apache-related processes)
> > > - Data IO (designs)
> > > - Continuous Integration (guides, designs)
> > > - Meetups and Hangouts (events)
> > >
> > > And here are two good examples from successful Apache projects:
> > > - Apache Flink: an **audience-oriented** structure [1]
> > >   Users (Presentations and How-to)
> > >   Contributors (Dev processes and How-to)
> > >   Committers (Infra, Dev processes, Release processes, Releases)
> > >   Roadmaps and Feature Designs (archive)
> > > - Apache OpenNLP: a **content-oriented** structure [2]
> > >   Guides
> > >   External Resources
> > >   Proposals
> > >   Releasing
> > >
> > > Clean organization helps content discovery and saves time on locating
> useful
> > > content. Given that we have a good amount of content on the wiki page, I
> > > suggest that we decide on a cleaner taxonomy, re-organize contents
> > > accordingly, and add future contents accordingly. To provide a
> starting point
> > > for the discussion, I suggest:
> > > - Given the state we are in, start with content-oriented organization,
> use
> > > these top-level categories: Guides (including processes and how-tos),
> > > Development (including designs, proposals, notes, roadmaps), Community
> > > (including events, activities, external resources and contents)
> > > - If people strongly prefer audience-oriented structure, later we can
> adopt a
> > > structure similar to Flink's.
> > >
> > > Feel free to share your thoughts and preferences here. Thanks.
> > >
> > > -sz
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> > > [2] https://cwiki.apache.org/confluence/display/OPENNLP/Index
>


[INVITATION] 8 January 2019 / Apache MXNet (Incubating) User Group meeting

2019-01-08 Thread Anton Chernov
Dear MXNet community,

I would like to invite you to the regular Apache MXNet (Incubating) User
Group meeting on the 8th of January 2019 [1].

As usual, the meeting will have a remote VC, powered by Amazon Chime.

Join the meeting:

https://chime.aws/4899512091
Meeting ID: 4899 51 2091

Best
Anton

[1]
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings


Re: [DISCUSS] About the PR merging policy

2018-12-12 Thread Anton Chernov
There was no policy set up for this before, as far as I know.

Personally, I don't see any value in introducing another blocker for the
already slow process of PR merges. The best value/burden ratio comes from
a lightweight 'another pair of eyes' approach.

Anton

Wed, 12 Dec 2018 at 01:24, Tianqi Chen :

> I think it is fine as long as we act in good faith. I will normally respect
> code review comments from anyone who might be able to give reasonable
> comments, and beg to differ with good technical reasoning. Normally
> contributions happen in a way that things won't get blocked on small
> features.
>
> For major changes, an RFC discussion would be helpful to resolve the case.
>
> Tianqi
>
> On Tue, Dec 11, 2018 at 4:18 PM Qing Lan  wrote:
>
> > Hi all,
> >
> > Recently I self-merged my PR without getting approval from other
> > committers, https://github.com/apache/incubator-mxnet/pull/13617, with
> > only a contributor's approval. I apologize to the community and thank
> > Marco for pointing out the problem. I learned the lesson that we should
> > have at least one committer's approval to merge code. However, I just
> > found that this section is missing in the CWiki:
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member
> .
> > So I would like to discuss it here:
> >
> > How should we conduct PR reviewing/merging? How many approvals (from
> > committers and contributors) should we get in order to merge?
> >
> > How do we deal with disagreement in the discussion (e.g. a
> > contributor/committer requests a change)?
> >
> > Please don’t hesitate to share your thoughts!
> >
> > Thanks,
> > Qing
> >
>


GitHub CI status update not stable

2018-12-11 Thread Anton Chernov
Dear MXNet community,

Currently the CI system is experiencing issues with GitHub verification
status updates. If your PR is stuck with a status that looks like:

ci/jenkins/mxnet-validation/centos-cpu Expected — Waiting for status to be
reported
ci/jenkins/mxnet-validation/centos-gpu Expected — Waiting for status to be
reported

Try to make an empty commit to retrigger the build.
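
For reference, a minimal sketch of that retrigger step, scripted in Python
(it simply shells out to git and assumes the PR branch is checked out and
pushed to a remote named 'origin'):

import subprocess

# An empty commit changes the branch head without touching any files,
# which is enough to make GitHub re-report the CI statuses.
subprocess.run(["git", "commit", "--allow-empty", "-m", "Retrigger CI"],
               check=True)
subprocess.run(["git", "push", "origin", "HEAD"], check=True)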

You can find the current status of your PR in Jenkins if you look for the
PR id:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-13576/5/pipeline

We are working on resolving the issue and will update you once it is
resolved.

Best regards,
Anton Chernov


Re: [Discussion] Remove bundled llvm OpenMP

2018-12-09 Thread Anton Chernov
Hi Chris,

Following up on the issue, are all things resolved in the discussion?

If yes, I kindly ask you to reopen this PR and remove ‘requesting changes’
status:
https://github.com/apache/incubator-mxnet/pull/12160

Thank you.


Best
Anton


Tue, 27 Nov 2018 at 17:15, Anton Chernov :

> Another thing to take into consideration:
>
> All Python artefacts that are created (PyPI) are built with make and do
> not use the bundled OpenMP library.
>
> One step for the switch to CMake to happen is the approval and merging of
> the mentioned PR:
>
> https://github.com/apache/incubator-mxnet/pull/12160
>
> If there are no other objections I kindly ask Chris Olivier to remove his
> 'requesting changes' veto on it to unblock the CMake overhaul work.
>
> Thank you.
>
> Best
> Anton
>
> чт, 22 нояб. 2018 г. в 17:11, Anton Chernov :
>
>>
>> Thank you for your answer, Chris.
>>
>> > The whole “mixing omp libraries” is something that occurs in production
>> every day and certainly in everything that uses mkl.
>>
>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
>> this mixture is not happening:
>>
>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>> runtime library to work. As different OpenMP runtimes may not be binary
>> compatible it's important to ensure that only one OpenMP runtime is used
>> throughout the application. Having more than one OpenMP runtime initialized
>> may lead to undefined behavior resulting in incorrect results or crashes."
>> [1]
>>
>> That is why 2 different MKLML libraries are provided:
>>
>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
>> runtime
>>
>> > is the suggestion that libiomp be removed from mkl?
>>
>> That is certainly not my suggestion.
>>
>> > have you spoken with intel? have you consulted Intel at all?
>>
>> Yes, I have asked for comments on the issue.
>>
>> > “hard to debug random crash”. you’re seeing an assertion which is
>> probably ...
>>
>> I'm seeing the result of undefined behaviour. And I want to put emphasis
>> on the following statement:
>>
>> Regardless of whether there is a particular reason for the assert, it is
>> a result of behaviour that should not happen. There are valid ways to use
>> llvm OpenMP in MXNet, and the current way is not one of them.
>>
>> > The lack of root-causing the problem and knee-jerk solution here makes
>> me
>> uncomfortable.
>>
>> I hope that my efforts highlighting the problems help mitigate your
>> discomfort.
>>
>> > if you want to see performance differences there’s an environment
>> variable
>> you can set in the mxnet omp tuning code that will print overhead and
>> execution times for the current omp library.
>>
>> I don't want to see performance differences in the current OpenMP
>> library. I want to remove the current OpenMP library and use the one
>> provided by the compiler.
>>
>>
>>
>> Best
>> Anton
>>
>> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>>
>> чт, 22 нояб. 2018 г. в 16:50, Chris Olivier :
>>
>>> Do you not work on CI mostly? My apologies for thinking that was some
>>> sort of team effort between you and a few others who were passionate
>>> about keeping the CI system running smoothly.
>>>
>>> You have source code, you have the line the assertion is on. If you can’t
>>> describe what’s going wrong that causes the assertion, then I don’t
>>> really
>>> have anything more to add to this conversation beyond what’s below:
>>>
>>> The whole “mixing omp libraries” is something that occurs in production
>>> every day and certainly in everything that uses mkl.  It may occasionally
>>> cause problems for some edge cases when there are super-complex linking
>>> strategies and dynamic loading.  But this is not one of those edge cases.
>>> Mostly blaming this is a red herring for other thread-related problems:
>>> people switch omp library, the timing of their code changes, and they
>>> stop seeing the problem. I’ve spent my entire career doing heavily
>>> multithreaded c++ development and i’ve seen that a million times.  is the
>>> suggestion that libiomp be removed from mkl? have you spoken with intel?
>>> have you consulted Intel at all?
>>>
>>> an

Re: using conan to manage Apache Incubator MXNet project dependencies

2018-12-05 Thread Anton Chernov
What I rather meant was a 'sandboxed build', for example one without an
internet connection. Would this be possible?

Or is it possible to define and maintain one's own conan package registry?
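
For illustration, such a sandboxed workflow might look like the following
with stock conan commands (the remote name and URL are hypothetical):

$ conan install . --build missing    # populate the local cache while online
$ conan remote add mycompany https://conan.example.com/api/conan/conan-local
$ conan upload "*" -r mycompany --all --confirm

Once the local cache (~/.conan/data) is populated, subsequent installs
resolve from it without network access.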

ср, 28 нояб. 2018 г. в 16:33, Konstantin Ivlev :

> > Is it possible to have an offline build somehow? For example, if all
> > dependencies would be stored locally. Probably, that would require some
> > modifications to the conan file, right?
> as soon as all dependencies are stored locally,
> conan will use binaries from the local cache.
> no modifications are required to conan file or workflow in this case.
>
> ср, 28 нояб. 2018 г. в 22:24, Anton Chernov :
>
> > Great, thank you for your answers.
> >
> > Is it possible to have an offline build somehow? For example, if all
> > dependencies would be stored locally. Probably, that would require some
> > modifications to the conan file, right?
> >
> >
> > вт, 27 нояб. 2018 г. в 16:59, Konstantin Ivlev :
> >
> > > >  Would it be possible to define cmake itself as a conan dependency
> for
> > > MXNet?
> > > yes, it is definitely possible
> > > for instance, to declare CMake 3.13.0 as a dependency, the following
> line
> > > has to be added to the conanfile.py:
> > >
> > > build_requires = "cmake_installer/3.13.0@conan/stable"
> > >
> > > other build tools might be added in a similar manner.
> > >
> > > вт, 27 нояб. 2018 г. в 22:54, Anton Chernov :
> > >
> > > > I think I asked this already, but want to confirm in regards to the
> > > > following discussion:
> > > >
> > > > MXNet CMake build - raise minimal required version
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/09772f9447ebff72cdfea5512ae70e295db8a21ba0ddb39359cd0a77@%3Cdev.mxnet.apache.org%3E
> > > >
> > > > Would it be possible to define cmake itself as a conan dependency for
> > > > MXNet?
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > вт, 27 нояб. 2018 г. в 15:44, Konstantin Ivlev  >:
> > > >
> > > > > both questions are totally valid.
> > > > > > Is it easy to create a build which will build dependencies from
> > > source?
> > > > > 1. yep, it's very easy, just add `--build` argument to the `conan
> > > > install`
> > > > > command line
> > > > > >  What guarantees you get with conan with regards to ABI / C++
> > stdlib
> > > > > binary compatibility of the pulled dependencies?
> > > > > 2. the default binary compatibility model may be found in
> > settings.yml
> > > (
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/conan-io/conan/blob/develop/conans/client/conf/__init__.py#L14
> > > > > )
> > > > > it includes the operating system, architecture, compiler, its
> > > > > version and C++ stdlib.
> > > > > while it's true that it won't cover 100% of use-cases (e.g. for
> older
> > > > > distros or older glibc), it's very good default for most users.
> > > > > you may always define custom binary compatibility attributes in
> your
> > > > > settings.yml (e.g. distro, microarchitecture, sanitizers, etc.).
> > > > >
> > > > >
> > > > > вт, 27 нояб. 2018 г. в 21:17, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com
> > > > >:
> > > > >
> > > > > > Thanks both for the detailed explanations. A couple more
> > > > > > questions:
> > > > > >
> > > > > > Is it easy to create a build which will build dependencies from
> > > source?
> > > > > > What guarantees you get with conan with regards to ABI / C++
> stdlib
> > > > > > binary compatibility of the pulled dependencies?
> > > > > >
> > > > > > Just to clarify: My concerns are in terms of reproducible builds
> /
> > > > > > source only distribution and undefined behaviour due to different
> > > > > > compiler / stdlib versions. Are these valid or is it outdated
> > > > > > knowledge?
> > > > > >
> > > > > > Pedro.
> > > > > > On Tue, Nov 27, 2018 at 2:34 PM Diego Rodriguez-Losada
> > > > > >  wrote:
> > > > > > >
> > &

[ANNOUNCE] Release Apache MXNet (incubating) version 1.3.1

2018-11-29 Thread Anton Chernov
Dear all,

The Apache MXNet (incubating) community is happy to announce Apache MXNet
(incubating) version 1.3.1!

Apache MXNet (incubating) is a deep learning framework designed for both
efficiency and flexibility. It allows you to mix symbolic and imperative
programming to maximize efficiency and productivity.

1.3.1 is a maintenance release incorporating important bug fixes and
performance improvements.

A full list of the changes in this release can be found in the release
notes:
https://cwiki.apache.org/confluence/x/eZGzBQ

A link to the download can be found here:
http://mxnet.incubator.apache.org/install/download.html

If you prefer to build from source and experiment with various compile-time
configuration options, use this link to get the instructions:
http://mxnet.incubator.apache.org/install/index.html

Or you can download and play with MXNet easily using one of the options
below:

1. The Pip packages can be found here:
https://pypi.python.org/pypi/mxnet

2. The Docker Images can be found here:
https://hub.docker.com/r/mxnet/python/
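
For example, options 1 and 2 above typically boil down to the following
commands (assuming the 1.3.1 artifacts are published under these names):

$ pip install mxnet==1.3.1
$ docker pull mxnet/python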

Links in Maven to the published Scala packages:

https://repository.apache.org/content/repositories/releases/org/apache/mxnet/
https://repository.apache.org/#nexus-search;quick~org.apache.mxnet

and to the experimental Clojure packages:
https://repository.apache.org/content/repositories/releases/org/apache/mxnet/contrib/clojure/

The Docker images:
https://hub.docker.com/u/mxnet/

The Pip package:
https://pypi.python.org/pypi/mxnet

The Release Tag:
https://github.com/apache/incubator-mxnet/tree/1.3.1

MXNet Resources
- Our discussion forum (https://discuss.mxnet.io)
- MXNet user mailing list (
https://lists.apache.org/list.html?u...@mxnet.apache.org)
- MXNet dev mailing list (
https://lists.apache.org/list.html?d...@mxnet.apache.org)
- StackOverflow mxnet tag (https://stackoverflow.com/questions/tagged/mxnet)
- MXNet website (https://mxnet.incubator.apache.org/faq/)
- Github issues (https://github.com/apache/incubator-mxnet/issues)
- Wiki (https://cwiki.apache.org/confluence/display/MXNET)

Attend one of the regular user groups meetings:
https://cwiki.apache.org/confluence/x/7BY0BQ

For more information on Apache MXNet (incubating), please see:
https://mxnet.io


Best regards,
Apache MXNet (incubating) Team

___

DISCLAIMER:

Apache MXNet (incubating) is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other successful
ASF projects. While incubation status is not necessarily a reflection of
the completeness or stability of the code, it does indicate that the
project has yet to be fully endorsed by the ASF.

https://cwiki.apache.org/confluence/x/BINjB


Re: using conan to manage Apache Incubator MXNet project dependencies

2018-11-28 Thread Anton Chernov
Great, thank you for your answers.

Is it possible to have an offline build somehow? For example, if all
dependencies were stored locally. Probably that would require some
modifications to the conan file, right?


вт, 27 нояб. 2018 г. в 16:59, Konstantin Ivlev :

> >  Would it be possible to define cmake itself as a conan dependency for
> MXNet?
> yes, it is definitely possible
> for instance, to declare CMake 3.13.0 as a dependency, the following line
> has to be added to the conanfile.py:
>
> build_requires = "cmake_installer/3.13.0@conan/stable"
>
> other build tools might be added in a similar manner.
>
> вт, 27 нояб. 2018 г. в 22:54, Anton Chernov :
>
> > I think I asked this already, but want to confirm in regards to the
> > following discussion:
> >
> > MXNet CMake build - raise minimal required version
> >
> >
> https://lists.apache.org/thread.html/09772f9447ebff72cdfea5512ae70e295db8a21ba0ddb39359cd0a77@%3Cdev.mxnet.apache.org%3E
> >
> > Would it be possible to define cmake itself as a conan dependency for
> > MXNet?
> >
> > Best
> > Anton
> >
> > вт, 27 нояб. 2018 г. в 15:44, Konstantin Ivlev :
> >
> > > both questions are totally valid.
> > > > Is it easy to create a build which will build dependencies from
> source?
> > > 1. yep, it's very easy, just add `--build` argument to the `conan
> > install`
> > > command line
> > > >  What guarantees you get with conan with regards to ABI / C++ stdlib
> > > binary compatibility of the pulled dependencies?
> > > 2. the default binary compatibility model may be found in settings.yml
> (
> > >
> > >
> >
> https://github.com/conan-io/conan/blob/develop/conans/client/conf/__init__.py#L14
> > > )
> > > it includes the operating system, architecture, compiler, its version
> > > and C++ stdlib.
> > > while it's true that it won't cover 100% of use-cases (e.g. for older
> > > distros or older glibc), it's very good default for most users.
> > > you may always define custom binary compatibility attributes in your
> > > settings.yml (e.g. distro, microarchitecture, sanitizers, etc.).
> > >
> > >
> > > вт, 27 нояб. 2018 г. в 21:17, Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > >:
> > >
> > > > Thanks both for the detailed explanations. A couple more questions:
> > > >
> > > > Is it easy to create a build which will build dependencies from
> source?
> > > > What guarantees you get with conan with regards to ABI / C++ stdlib
> > > > binary compatibility of the pulled dependencies?
> > > >
> > > > Just to clarify: My concerns are in terms of reproducible builds /
> > > > source only distribution and undefined behaviour due to different
> > > > compiler / stdlib versions. Are these valid or is it outdated
> > > > knowledge?
> > > >
> > > > Pedro.
> > > > On Tue, Nov 27, 2018 at 2:34 PM Diego Rodriguez-Losada
> > > >  wrote:
> > > > >
> > > > > Hi Pedro,
> > > > >
> > > > > Conan is distributed. So besides building from sources the
> > > dependencies,
> > > > it
> > > > > is also possible to create binaries yourself for those dependencies
> > > (with
> > > > > the existing recipes, or your own recipes), and host them in your
> own
> > > > repo
> > > > > (Bintray OSS repo, or Artifactory).
> > > > >
> > > > > This will provide both the security that you own the dependencies
> > > > binaries
> > > > > and the convenience and speed of not having to build from sources.
> > Even
> > > > if
> > > > > you provide the binaries, consumers can always fallback to build
> from
> > > > > sources too.
> > > > >
> > > > > Kind regards,
> > > > > Diego
> > > > >
> > > > > El mar., 27 nov. 2018 a las 13:34, Konstantin Ivlev (<
> > > > tomsks...@gmail.com>)
> > > > > escribió:
> > > > >
> > > > > > Hi Pedro,
> > > > > >
> > > > > > yes, you're absolutely right, by default, conan will be pulling
> > > > prebuilt
> > > > > > binaries for the libraries from the bintray.
> > > > > > however, if prebuilt binaries are not available (e.g. because you
> > use
> > > &

Re: [Discussion] Remove bundled llvm OpenMP

2018-11-27 Thread Anton Chernov
Another thing to take into consideration:

All Python artefacts that are created (PyPI) are built with make and do not
use the bundled OpenMP library.

One step for the switch to CMake to happen is the approval and merging of
the mentioned PR:

https://github.com/apache/incubator-mxnet/pull/12160

If there are no other objections I kindly ask Chris Olivier to remove his
'requesting changes' veto on it to unblock the CMake overhaul work.

Thank you.

Best
Anton

чт, 22 нояб. 2018 г. в 17:11, Anton Chernov :

>
> Thank you for your answer, Chris.
>
> > The whole “mixing omp libraries” is something that occurs in production
> every day and certainly in everything that uses mkl.
>
> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
> this mixture is not happening:
>
> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
> library to work. As different OpenMP runtimes may not be binary compatible
> it's important to ensure that only one OpenMP runtime is used throughout
> the application. Having more than one OpenMP runtime initialized may lead
> to undefined behavior resulting in incorrect results or crashes." [1]
>
> That is why 2 different MKLML libraries are provided:
>
> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP runtime
>
> > is the suggestion that libiomp be removed from mkl?
>
> That is certainly not my suggestion.
>
> > have you spoken with intel? have you consulted Intel at all?
>
> Yes, I have asked for comments on the issue.
>
> > “hard to debug random crash”. you’re seeing an assertion which is
> probably ...
>
> I'm seeing the result of undefined behaviour. And I want to put emphasis
> on the following statement:
>
> Regardless of whether there is a particular reason for the assert, it is a
> result of behaviour that should not happen. There are valid ways to use
> llvm OpenMP in MXNet, and the current way is not one of them.
>
> > The lack of root-causing the problem and knee-jerk solution here makes me
> uncomfortable.
>
> I hope that my efforts highlighting the problems help mitigate your
> discomfort.
>
> > if you want to see performance differences there’s an environment
> variable
> you can set in the mxnet omp tuning code that will print overhead and
> execution times for the current omp library.
>
> I don't want to see performance differences in the current OpenMP library.
> I want to remove the current OpenMP library and use the one provided by the
> compiler.
>
>
>
> Best
> Anton
>
> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>
> чт, 22 нояб. 2018 г. в 16:50, Chris Olivier :
>
>> Do you not work on CI mostly? My apologies for thinking that was some sort
>> of team effort between you and a few others who were passionate about
>> keeping the CI system running smoothly.
>>
>> You have source code, you have the line the assertion is on. If you can’t
>> describe what’s going wrong that causes the assertion, then I don’t really
>> have anything more to add to this conversation beyond what’s below:
>>
>> The whole “mixing omp libraries” is something that occurs in production
>> every day and certainly in everything that uses mkl.  It may occasionally
>> cause problems for some edge cases when there are super-complex linking
>> strategies and dynamic loading.  But this is not one of those edge cases.
>> Mostly blaming this is a red herring for other thread-related problems:
>> people switch omp library, the timing of their code changes, and they
>> stop seeing the problem. I’ve spent my entire career doing heavily
>> multithreaded c++ development and i’ve seen that a million times.  is the
>> suggestion that libiomp be removed from mkl? have you spoken with intel?
>> have you consulted Intel at all?
>>
>> and what you are seeing isn’t some “hard to debug random crash”. you’re
>> seeing an assertion which is probably related to omp trying to create a
>> thread pool after a fork, and something was done in the mxnet code to make
>> that sketchy to do. I’d suggest filing an issue with the llvm openmp
>> project, just like you’d file for any other not-well-understood behavior
>> in mxnet.
>>
>> The lack of root-causing the problem and knee-jerk solution here makes me
>> uncomfortable.
>>
>> if you want to see performance differences there’s an environment variable
>> you can set in the mxnet omp tuning code that will print overhead and
>> execution times for the current omp library.
>>
>>
>>
>>
>&

Re: using conan to manage Apache Incubator MXNet project dependencies

2018-11-27 Thread Anton Chernov
I think I asked this already, but want to confirm in regards to the
following discussion:

MXNet CMake build - raise minimal required version
https://lists.apache.org/thread.html/09772f9447ebff72cdfea5512ae70e295db8a21ba0ddb39359cd0a77@%3Cdev.mxnet.apache.org%3E

Would it be possible to define cmake itself as a conan dependency for MXNet?

Best
Anton

вт, 27 нояб. 2018 г. в 15:44, Konstantin Ivlev :

> both questions are totally valid.
> > Is it easy to create a build which will build dependencies from source?
> 1. yep, it's very easy, just add `--build` argument to the `conan install`
> command line
> >  What guarantees you get with conan with regards to ABI / C++ stdlib
> binary compatibility of the pulled dependencies?
> 2. the default binary compatibility model may be found in settings.yml (
>
> https://github.com/conan-io/conan/blob/develop/conans/client/conf/__init__.py#L14
> )
> it includes the operating system, architecture, compiler, its version and
> C++ stdlib.
> while it's true that it won't cover 100% of use-cases (e.g. for older
> distros or older glibc), it's very good default for most users.
> you may always define custom binary compatibility attributes in your
> settings.yml (e.g. distro, microarchitecture, sanitizers, etc.).
>
>
> вт, 27 нояб. 2018 г. в 21:17, Pedro Larroy :
>
> > Thanks both for the detailed explanations. A couple more questions:
> >
> > Is it easy to create a build which will build dependencies from source?
> > What guarantees you get with conan with regards to ABI / C++ stdlib
> > binary compatibility of the pulled dependencies?
> >
> > Just to clarify: My concerns are in terms of reproducible builds /
> > source only distribution and undefined behaviour due to different
> > compiler / stdlib versions. Are these valid or is it outdated
> > knowledge?
> >
> > Pedro.
> > On Tue, Nov 27, 2018 at 2:34 PM Diego Rodriguez-Losada
> >  wrote:
> > >
> > > Hi Pedro,
> > >
> > > Conan is distributed. So besides building from sources the
> dependencies,
> > it
> > > is also possible to create binaries yourself for those dependencies
> (with
> > > the existing recipes, or your own recipes), and host them in your own
> > repo
> > > (Bintray OSS repo, or Artifactory).
> > >
> > > This will provide both the security that you own the dependencies
> > binaries
> > > and the convenience and speed of not having to build from sources. Even
> > if
> > > you provide the binaries, consumers can always fallback to build from
> > > sources too.
> > >
> > > Kind regards,
> > > Diego
> > >
> > > El mar., 27 nov. 2018 a las 13:34, Konstantin Ivlev (<
> > tomsks...@gmail.com>)
> > > escribió:
> > >
> > > > Hi Pedro,
> > > >
> > > > yes, you're absolutely right, by default, conan will be pulling
> > prebuilt
> > > > binaries for the libraries from the bintray.
> > > > however, if prebuilt binaries are not available (e.g. because you use
> > some
> > > > different compiler for which we don't have prebuilt binaries),
> > > > > > or if you want to build binaries yourself for some other reason,
> > > > then libraries always might be built from source (by passing e.g.
> > "--build
> > > > always", "--build missing" or "--build " to the conan
> install
> > > > command line).
> > > >
> > > > yours sincerely, Konstantin
> > > >
> > > > вт, 27 нояб. 2018 г. в 19:27, Pedro Larroy <
> > pedro.larroy.li...@gmail.com>:
> > > >
> > > > > Hi Konstantin
> > > > >
> > > > > Thanks for this contribution. With your proposed changes, when
> > > > > building MXNet we will be pulling binaries for the libraries
> managed
> > > > > by conan?
> > > > >
> > > > >
> > > > > Pedro.
> > > > > On Mon, Nov 26, 2018 at 11:43 AM Konstantin Ivlev <
> > tomsks...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hello, Ivan,
> > > > > >
> > > > > > could you possibly clarify your question (may be explaining the
> > > > use-case
> > > > > > behind it)?
> > > > > > Gradle appears to be build system, AFAIK more popular in Java
> > world.
> > > > > > meanwhile, Apache Incubator MXNet project uses CMake as its build
> > > > system.
> > > > > > please correct me if I am wrong.
> > > > > > in general, conan, as a package manager, is pretty
> > > > build-system-agnostic,
> > > > > > and it may work with arbitrary build systems (to count a few,
> > CMake,
> > > > > > premake, qmake, full list:
> > > > > > https://docs.conan.io/en/latest/reference/generators.html). I
> > don't
> > > > > think
> > > > > > Gradle is exception here.
> > > > > > also, for instance, Android Studio also uses Gradle for Android
> C++
> > > > > > projects, and conan works flawlessly in this particular case (
> > > > > >
> > > >
> > https://blog.conan.io/2018/02/13/Android-Studio-project-Conan-Boost.html
> > > > > ).
> > > > > >
> > > > > > yours sincerely, Konstantin
> > > > > >
> > > > > > пн, 26 нояб. 2018 г. в 16:43, Ivan Serdyuk <
> > > > local.tourist.k...@gmail.com
> > > > > >:
> > > > > >
> > > > > > > Kostantin, and what (overall) option with using Gradle? Does

Re: [Discussion] Remove bundled llvm OpenMP

2018-11-22 Thread Anton Chernov
Thank you for your answer, Chris.

> The whole “mixing omp libraries” is something that occurs in production
every day and certainly in everything that uses mkl.

I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
this mixture is not happening:

"Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
library to work. As different OpenMP runtimes may not be binary compatible
it's important to ensure that only one OpenMP runtime is used throughout
the application. Having more than one OpenMP runtime initialized may lead
to undefined behavior resulting in incorrect results or crashes." [1]

That is why 2 different MKLML libraries are provided:

lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP runtime

> is the suggestion that libiomp be removed from mkl?

That is certainly not my suggestion.

> have you spoken with intel? have you consulted Intel at all?

Yes, I have asked for comments on the issue.

> “hard to debug random crash”. you’re seeing an assertion which is
probably ...

I'm seeing the result of undefined behaviour. And I want to put emphasis on
the following statement:

Regardless of whether there is a particular reason for the assert, it is a
result of behaviour that should not happen. There are valid ways to use
llvm OpenMP in MXNet, and the current way is not one of them.

> The lack of root-causing the problem and knee-jerk solution here makes me
uncomfortable.

I hope that my efforts highlighting the problems help mitigate your
discomfort.

> if you want to see performance differences there’s an environment variable
you can set in the mxnet omp tuning code that will print overhead and
execution times for the current omp library.

I don't want to see performance differences in the current OpenMP library.
I want to remove the current OpenMP library and use the one provided by the
compiler.



Best
Anton

[1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265

чт, 22 нояб. 2018 г. в 16:50, Chris Olivier :

> Do you not work on CI mostly? My apologies for thinking that was some sort
> of team effort between you and a few others who were passionate about
> keeping the CI system running smoothly.
>
> You have source code, you have the line the assertion is on. If you can’t
> describe what’s going wrong that causes the assertion, then I don’t really
> have anything more to add to this conversation beyond what’s below:
>
> The whole “mixing omp libraries” is something that occurs in production
> every day and certainly in everything that uses mkl.  It may occasionally
> cause problems for some edge cases when there are super-complex linking
> strategies and dynamic loading.  But this is not one of those edge cases.
> Mostly blaming this is a red herring for other thread-related problems:
> people switch omp library, the timing of their code changes, and they
> stop seeing the problem. I’ve spent my entire career doing heavily
> multithreaded c++ development and i’ve seen that a million times.  is the
> suggestion that libiomp be removed from mkl? have you spoken with intel?
> have you consulted Intel at all?
>
> and what you are seeing isn’t some “hard to debug random crash”. you’re
> seeing an assertion which is probably related to omp trying to create a
> thread pool after a fork, and something was done in the mxnet code to make
> that sketchy to do. I’d suggest filing an issue with the llvm openmp
> project, just like you’d file for any other not-well-understood behavior
> in mxnet.
>
> The lack of root-causing the problem and knee-jerk solution here makes me
> uncomfortable.
>
> if you want to see performance differences there’s an environment variable
> you can set in the mxnet omp tuning code that will print overhead and
> execution times for the current omp library.
>
>
>
>
>
>
>
> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov  wrote:
>
> > Hi Chris,
> >
> > Thank you for your answer. As you may have noticed, the initial email
> > comes from me, Anton Chernov (@lebeg on GitHub), and thus the proposal is
> > not from any 'CI' team that you've mentioned, but from me personally.
> >
> > You are writing:
> >
> > > someone is doing something unhealthy when they fork ...
> >
> > I'm missing the context to understand what you mean.
> >
> > > we get a lot of performance gain from OMP ...
> >
> > There is no data that would prove this statement and therefore it is a
> > random guess.
> >
> > > in many months, no investigation has occurred as to WHY the assertion
> is
> > failing.
> >
> > The investigation has concluded that this is happening due to undefined

Re: [Discussion] MXNet CMake build - raise minimal required version

2018-11-22 Thread Anton Chernov
You can find relevant information regarding the profiling flag here:

https://github.com/apache/incubator-mxnet/issues/11563


чт, 22 нояб. 2018 г. в 16:06, Chris Olivier :

> what is meant by:
>
>
> *Profiling*
> The profiler is always on even for production release builds, because MXNet
> cannot be built without it [2].  ?
>
> Do you mean it is always built, or that it is turned on (i.e. recording and
> saving profiling information)?  I am not aware of it being turned on by
> default.
>
>
> The profiler has no overhead when built in but not turned on.
>
>
> On Thu, Nov 22, 2018 at 2:35 AM Anton Chernov  wrote:
>
> > Dear MXNet community,
> >
> > I propose to raise the minimal required cmake version that is needed to
> > build MXNet to 3.10, which was tagged on March 16, 2018 [1].
> >
> > The overall effort of repairing the cmake scripts aims to deprecate make
> > and maintain only one build system.
> >
> > *Need*
> >
> > The build system is the foundation of every software project. Its
> > quality directly impacts the quality of the project. The MXNet build
> > system is fragile, partially broken and not maintained.
> >
> > Users of MXNet and developers are confused by the fact that two build
> > systems exist at the same time: make and CMake.
> >
> > The main functional areas which are impacted by the current state of the
> > cmake files are:
> >
> > *OpenMP*
> > The current CMake files mix OpenMP libraries from different compilers,
> > which is undefined behaviour. It leads to non-deterministic crashes on
> > some platforms. Build and deployment are very hard. No evidence exists
> > that proves any benefit of having the llvm OpenMP library as a submodule
> > in MXNet.
> >
> > *BLAS and LAPACK*
> > Basic math library usage is mixed up. It is hard and confusing to
> > configure, and logic for choosing the most optimal library is not
> > present. MKL and OpenBLAS are intermixed in an unpredictable manner.
> >
> > *Profiling*
> > The profiler is always on, even for production release builds, because
> > MXNet cannot be built without it [2].
> >
> > *CUDA*
> > CUDA is detected by 3 different files in the current cmake scripts, and
> > the choice among them is based on obscure logic involving different
> > cmake versions and the platform being built on:
> >
> > * CMakeLists.txt
> > * cmake/FirstClassLangCuda.cmake
> > * 3rdparty/mshadow/cmake/Cuda.cmake
> >
> >
> > *Confusing and misleading cmake user options*
> > For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will or will not
> > do what they are supposed to, depending on the cmake generator and cmake
> > version [3].
> > There are currently more than 30 build parameters for MXNet, none of
> > them documented. Some are not even located in the main CMakeLists.txt
> > file, for example 'BLAS'.
> >
> >
> > *Issues*
> > There is a significant number of GitHub issues related to cmake or the
> > build in general. New tickets are filed frequently.
> >
> > * #8702 (https://github.com/apache/incubator-mxnet/issues/8702)
> >  [DISCUSSION] Should we deprecate Makefile and only use CMake?
> > * #5079 (https://github.com/apache/incubator-mxnet/issues/5079)
>  troubles
> > building python interface on raspberry pi 3
> > * #1722 (https://github.com/apache/incubator-mxnet/issues/1722)
>  problem:
> > compile mxnet with hdfs
> > * #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip
> > package can be much faster (OpenCV version?)
> > * #11417 (https://github.com/apache/incubator-mxnet/issues/11417)
> > libomp.so
> > dependency (need REAL fix)
> > * #8532 (https://github.com/apache/incubator-mxnet/issues/8532)
> >  mxnet-mkl
> > (v0.12.0) crash when using (conda-installed) numpy with MKL //
> (indirectly)
> > * #11131 (https://github.com/apache/incubator-mxnet/issues/11131)
> > mxnet-cu92 low efficiency  // (indirectly)
> > * #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA
> > 9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile
> or
> > Ninja generator
> > * #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo
> in
> > cpp-package/CMakeLists.txt
> > * #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake
> is
> > running again when execute make install
> > * #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed
> > to
> > bui

Re: [Discussion] Remove bundled llvm OpenMP

2018-11-22 Thread Anton Chernov
Hi Chris,

Thank you for your answer. As you may have noticed, the initial email comes
from me, Anton Chernov (@lebeg on GitHub), and thus the proposal is not from
any 'CI' team that you've mentioned, but from me personally.

You are writing:

> someone is doing something unhealthy when they fork ...

I'm missing the context to understand what you mean.

> we get a lot of performance gain from OMP ...

There is no data that would prove this statement and therefore it is a
random guess.

> in many months, no investigation has occurred as to WHY the assertion is
failing.

The investigation has concluded that this is happening due to undefined
behaviour, which is, in my opinion, a sufficient answer that does not
require going any deeper.

> the pr is vetoed until such a time that the actual root cause of the
problem is known.

And considering the statements above, there is no valid reason to veto the
PR.


Best
Anton

чт, 22 нояб. 2018 г. в 15:38, Chris Olivier :

> 3x less overhead*
>
> On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier 
> wrote:
>
> > someone is doing something unhealthy when they fork, which is causing an
> > assertion in the openmp library. the same assertion that would fire in
> > mkl, which is linked to libiomp5 (exact same omp library). this is new
> > behavior and most likely due to an error or suboptimal approach in the
> > forking logic in mxnet.
> >
> > in order to circumvent the assert, the CI team is proposing to remove the
> > library completely, which is equivalent to cutting off your leg to make
> > the pain from stubbing your toe go away.
> >
> > we get a lot of performance gain from OMP. it has about 1/3 less
> > overhead for entering omp regions and also supports omp regions after a
> > fork, which libgomp does not.
> >
> > in many months, no investigation has occurred as to WHY the assertion is
> > failing.
> >
> > the pr is vetoed until such a time that the actual root cause of the
> > problem is known.
> >
> >
> > thanks,
> >
> > -Chris.
> >
> >
> >
> >
> > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov 
> wrote:
> >
> >> Dear MXNet community,
> >>
>> I would like to draw attention to an important issue that is present in
>> the MXNet CMake build: the usage of a bundled llvm OpenMP library.
> >>
> >> I have opened a PR to remove it:
> >> https://github.com/apache/incubator-mxnet/pull/12160
> >>
>> The issue was closed, but I remain firm in my opinion that it's the right
>> thing to do.
> >>
> >> *Background*
> >> If you want to use OpenMP pragmas in your code for parallelization you
> >> would supply a special flag to the compiler:
> >>
> >> - Clang / -fopenmp
> >> https://openmp.llvm.org/
> >>
> >> - GCC / -fopenmp
> >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> >>
> >> - Intel / [Q]openmp
> >>
> >>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> >>
> >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> >>
>> Each of the compilers would enable the '#pragma omp' directive during
>> C/C++ compilation and arrange for automatic linking of the OpenMP runtime
>> library supplied by each compiler separately.
> >>
> >> Thus, to use the advantages of an OpenMP implementation one has to
> compile
> >> the code with the corresponding compiler.
> >>
> >> Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP
> >> is
> >> used ([1] and [2]) to replace the OpenMP library supplied by the
> compiler.
> >>
> >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
> >> Library
> >> for Deep Neural Networks):
> >>
> >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> runtime
> >> library to work. As different OpenMP runtimes may not be binary
> compatible
> >> it's important to ensure that only one OpenMP runtime is used throughout
> >> the application. Having more than one OpenMP runtime initialized may
> lead
> >> to undefined behavior resulting in incorrect results or crashes." [3]
> >>
> >> And:
> >>
> >> "Using GNU compiler with -fopenmp and -liomp5 options will link the
> >> application with both Intel and GNU OpenMP runtime libraries. This will
> >>

[Discussion] Remove bundled llvm OpenMP

2018-11-22 Thread Anton Chernov
Dear MXNet community,

I would like to draw attention to an important issue that is present in
the MXNet CMake build: the usage of a bundled llvm OpenMP library.

I have opened a PR to remove it:
https://github.com/apache/incubator-mxnet/pull/12160

The issue was closed, but I remain firm in my opinion that it's the right
thing to do.

*Background*
If you want to use OpenMP pragmas in your code for parallelization you
would supply a special flag to the compiler:

- Clang / -fopenmp
https://openmp.llvm.org/

- GCC / -fopenmp
https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html

- Intel / [Q]openmp
https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1

- Visual Studio: /openmp (Enable OpenMP 2.0 Support)
https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx

Each of the compilers would enable the '#pragma omp' directive during C/C++
compilation and arrange for automatic linking of the OpenMP runtime library
supplied by each compiler separately.

Thus, to use the advantages of an OpenMP implementation one has to compile
the code with the corresponding compiler.
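
As a minimal illustration, assuming GCC on Linux, building against the
compiler's own runtime and verifying the result could look like this
(omp_demo.cpp is a hypothetical file using '#pragma omp'):

$ g++ -fopenmp -O2 -o omp_demo omp_demo.cpp
$ ldd omp_demo | grep -i omp    # expect a single libgomp entry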

Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP is
used ([1] and [2]) to replace the OpenMP library supplied by the compiler.

I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library
for Deep Neural Networks):

"Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
library to work. As different OpenMP runtimes may not be binary compatible
it's important to ensure that only one OpenMP runtime is used throughout
the application. Having more than one OpenMP runtime initialized may lead
to undefined behavior resulting in incorrect results or crashes." [3]

And:

"Using GNU compiler with -fopenmp and -liomp5 options will link the
application with both Intel and GNU OpenMP runtime libraries. This will
lead to undefined behavior of the application." [4]

As can be seen from ldd for MXNet:

$ ldd build/tests/mxnet_unit_tests | grep omp
libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
(0x7f697bc55000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
(0x7f69660cd000)

*Performance*

The only performance data related to OpenMP in MXNet I was able to find is
here:
https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172

Which, in my understanding, is testing the impact of different environment
variables for the same setup (using the same bundled OpenMP library).

The libraries may differ in implementation, and the Thread Affinity
Interface [5] may have a significant impact on performance.

All compilers support it:

- Clang / KMP_AFFINITY
https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp

- GCC / GOMP_CPU_AFFINITY
https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html

- Intel / KMP_AFFINITY
https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1

- Visual Studio / SetThreadAffinityMask
https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
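
For example, the same workload could be pinned at launch via these
variables (the values and the script name are illustrative):

$ GOMP_CPU_AFFINITY="0-7" OMP_NUM_THREADS=8 python train.py    # GCC / libgomp
$ KMP_AFFINITY="granularity=fine,compact" OMP_NUM_THREADS=8 python train.py    # llvm / Intel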

*Issues*

Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
https://github.com/apache/incubator-mxnet/issues/10856

libomp.so dependency (need REAL fix)
https://github.com/apache/incubator-mxnet/issues/11417

mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
https://github.com/apache/incubator-mxnet/issues/8532

Performance regression when OMP_NUM_THREADS environment variable is not set
https://github.com/apache/incubator-mxnet/issues/9744

Poor concat CPU performance on CUDA builds
https://github.com/apache/incubator-mxnet/issues/11905

I would appreciate hearing your thoughts.


Best
Anton

[1]
https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
[2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
[3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
[4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
[5] https://software.intel.com/en-us/node/522691


[Discussion] MXNet CMake build - raise minimal required version

2018-11-22 Thread Anton Chernov
Dear MXNet community,

I propose to raise the minimal required cmake version that is needed to
build MXNet to 3.10, which was tagged on March 16, 2018 [1].
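
For users on older distributions, a newer CMake does not have to come from
system packages; one possible route (assuming pip is available) is the
cmake wheels published on PyPI:

$ pip install --user "cmake>=3.10"
$ cmake --version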

The overall effort of repairing the cmake scripts aims to deprecate make
and maintain only one build system.

*Need*

The build system is the foundation of every software project. Its quality
directly impacts the quality of the project. The MXNet build system is
fragile, partially broken and not maintained.

Users of MXNet and developers are confused by the fact that two build
systems exist at the same time: make and CMake.

The main functional areas which are impacted by the current state of the
cmake files are:

*OpenMP*
The current CMake files mix OpenMP libraries from different compilers,
which is undefined behaviour. It leads to non-deterministic crashes on some
platforms. Build and deployment are very hard. No evidence exists that
proves any benefit of having the llvm OpenMP library as a submodule in
MXNet.

*BLAS and LAPACK*
Basic math library usage is mixed up. It is hard and confusing to
configure, and logic for choosing the most optimal library is not present.
MKL and OpenBLAS are intermixed in an unpredictable manner.

*Profiling*
The profiler is always on, even for production release builds, because
MXNet cannot be built without it [2].

*CUDA*
CUDA is detected by 3 different files in the current cmake scripts, and the
choice among them is based on obscure logic involving different cmake
versions and the platform being built on:

* CMakeLists.txt
* cmake/FirstClassLangCuda.cmake
* 3rdparty/mshadow/cmake/Cuda.cmake


*Confusing and misleading cmake user options*
For example, USE_CUDA / USE_OLDCMAKECUDA. Some of them will or will not do
what they are supposed to, depending on the cmake generator and cmake
version [3].
There are currently more than 30 build parameters for MXNet, none of them
documented. Some are not even located in the main CMakeLists.txt file, for
example 'BLAS'.


*Issues*
There is a significant number of GitHub issues related to cmake or the
build in general. New tickets are filed frequently.

* #8702 (https://github.com/apache/incubator-mxnet/issues/8702)
 [DISCUSSION] Should we deprecate Makefile and only use CMake?
* #5079 (https://github.com/apache/incubator-mxnet/issues/5079)   troubles
building python interface on raspberry pi 3
* #1722 (https://github.com/apache/incubator-mxnet/issues/1722)   problem:
compile mxnet with hdfs
* #11549 (https://github.com/apache/incubator-mxnet/issues/11549) Pip
package can be much faster (OpenCV version?)
* #11417 (https://github.com/apache/incubator-mxnet/issues/11417) libomp.so
dependency (need REAL fix)
* #8532 (https://github.com/apache/incubator-mxnet/issues/8532)   mxnet-mkl
(v0.12.0) crash when using (conda-installed) numpy with MKL // (indirectly)
* #11131 (https://github.com/apache/incubator-mxnet/issues/11131)
mxnet-cu92 low efficiency  // (indirectly)
* #10743 (https://github.com/apache/incubator-mxnet/issues/10743) CUDA
9.1.xx failed if not set OLDCMAKECUDA on cmake 3.10.3 with unix makefile or
Ninja generator
* #10742 (https://github.com/apache/incubator-mxnet/issues/10742) typo in
cpp-package/CMakeLists.txt
* #10737 (https://github.com/apache/incubator-mxnet/issues/10737) Cmake is
running again when execute make install
* #10543 (https://github.com/apache/incubator-mxnet/issues/10543) Failed to
build from source when set USE_CPP_PACKAGE = 1, fatal error C1083: unable
to open file: “mxnet-cpp/op.h”: No such file or directory
* #10217 (https://github.com/apache/incubator-mxnet/issues/10217) Building
with OpenCV causes link errors
* #10175 (https://github.com/apache/incubator-mxnet/issues/10175) MXNet
MKLDNN build dependency/flow discussion
* #10009 (https://github.com/apache/incubator-mxnet/issues/10009)
[CMAKE][IoT] Remove pthread from android_arm64 build
* #9944 (https://github.com/apache/incubator-mxnet/issues/9944)   MXNet
MinGW-w64 build error // (indirectly)
* #9868 (https://github.com/apache/incubator-mxnet/issues/9868)   MKL and
CMake
* #9516 (https://github.com/apache/incubator-mxnet/issues/9516)   cmake
cuda arch issues
* #9105 (https://github.com/apache/incubator-mxnet/issues/9105)
 libmxnet.so load path error
* #9096 (https://github.com/apache/incubator-mxnet/issues/9096)   MXNet
built with GPerftools crashes
* #8786 (https://github.com/apache/incubator-mxnet/issues/8786)   Link
failure on DEBUG=1 (static member symbol not defined) // (indirectly)
* #8729 (https://github.com/apache/incubator-mxnet/issues/8729)   Build
amalgamation using a docker // (indirectly)
* #8667 (https://github.com/apache/incubator-mxnet/issues/8667)
 Compiler/linker error while trying to build from source on Mac OSX Sierra
10.12.6
* #8295 (https://github.com/apache/incubator-mxnet/issues/8295)   Building
with cmake - error
* #7852 (https://github.com/apache/incubator-mxnet/issues/7852)   Trouble
installing MXNet on Raspberry Pi 3
* #13303 

Re: [RESULTS] [VOTE] Release Apache MXNet (incubating) version 1.3.1.rc0

2018-11-21 Thread Anton Chernov
The vote on the @general list of incubator.apache.org has been started:

https://lists.apache.org/thread.html/adf86e1c3332559ad91880a412aa1063dc72cd6f4f3d6c4c0d91a2dd@%3Cgeneral.incubator.apache.org%3E

The vote closes on 24th of November 2018 14:30 CET.


Best
Anton


вт, 20 нояб. 2018 г. в 20:34, Hagay Lupesko :

> Great - congrats!
>
> On Tue, Nov 20, 2018 at 8:51 AM Anton Chernov  wrote:
>
> > Dear MXNet community,
> >
> > I'm happy to announce the results of the vote.
> >
> > This vote passes with 8 +1 votes (4 binding) and no 0 or -1 votes.
> >
> > +1 votes
> >
> > * Carin / binding
> > * Indhu / binding
> > * Sandeep / binding
> > * Jim / binding
> > * Kellen
> > * Steffen
> > * Roshani
> > * Aaron
> >
> > 0 votes
> > * No votes
> >
> > -1 votes
> > * No votes
> >
> > Vote thread can be found here [1]. The list of members can be found here
> > [2].
> >
> > I'll continue with the release process and the release announcement will
> > follow in the next few days.
> >
> >
> > Best
> > Anton
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/32ab13b6d2d80fd75dbc2ec62151d12d09f6e0ca89799ae0aa26894b@%3Cdev.mxnet.apache.org%3E
> > [2] http://incubator.apache.org/projects/mxnet.html
> >
>


[RESULTS] [VOTE] Release Apache MXNet (incubating) version 1.3.1.rc0

2018-11-20 Thread Anton Chernov
Dear MXNet community,

I'm happy to announce the results of the vote.

This vote passes with 8 +1 votes (4 binding) and no 0 or -1 votes.

+1 votes

* Carin / binding
* Indhu / binding
* Sandeep / binding
* Jim / binding
* Kellen
* Steffen
* Roshani
* Aaron

0 votes
* No votes

-1 votes
* No votes

Vote thread can be found here [1]. The list of members can be found here
[2].

I'll continue with the release process and the release announcement will
follow in the next few days.


Best
Anton

[1]
https://lists.apache.org/thread.html/32ab13b6d2d80fd75dbc2ec62151d12d09f6e0ca89799ae0aa26894b@%3Cdev.mxnet.apache.org%3E
[2] http://incubator.apache.org/projects/mxnet.html


Re: [VOTE] Release Apache MXNet (incubating) version 1.3.1.rc0

2018-11-20 Thread Anton Chernov
Thank you everyone, the vote is closed. I will send the results in a
separate announcement.

Best
Anton

пн, 19 нояб. 2018 г. в 15:44, Jim Jagielski :

> +1 from me (macOS)
>
> > On Nov 16, 2018, at 2:52 AM, kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
> >
> > Thanks for organizing the release Anton and for testing Carin and
> Steffen.
> > Lots of great fixes in this release.  As we don't have the required 3
> > committers I'd suggest extending the vote for a few days.
> >
> > I tested the following on MacOS 10.13, High Sierra:
> >
> > INCUBATING IN RELEASE FILE: check.
> > LICENSE check.
> > NOTICE check.
> > SIGNATURE check.
> > HASH check.
> > DISCLAIMER check.
> > SOURCE COMPILES VIA MAKEFILE check.
> > SOURCE COMPILES VIA CMAKE check.
> > C++ TESTS PASS fail
> > Two tests failing for me.
> > Build with flags: cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_OPENMP=0
> > -DUSE_OPENCV=0 ..
> > Ran c++ tests with exclusions: ./tests/mxnet_unit_tests
> > --gtest_filter=-GpuTopology.*
> > Result:
> > [  FAILED  ] 2 tests, listed below:
> > [  FAILED  ] ACTIVATION_PERF.ExecuteBidirectional
> > [  FAILED  ] ACTIVATION_PERF.TimingCPU
> >
> > PYTHON UNIT TESTS PASS check.
> >
> > Not sure if the test failures are a regression so I'm +0 (non-binding)
> >
> > On Thu, Nov 15, 2018 at 5:43 PM Steffen Rochel 
> > wrote:
> >
> >> +1 build on MacOS Sierra following instructions on
> >>
> >>
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Developer+Setup+on+Mac
> >> and run one training test.
> >>
> >> On Tue, Nov 13, 2018 at 2:34 PM Carin Meier 
> wrote:
> >>
> >>> +1 - Clojure package tested fine with Scala jars
> >>>
> >>> On Mon, Nov 12, 2018 at 6:53 PM Anton Chernov 
> >> wrote:
> >>>
> >>>> Dear MXNet community,
> >>>>
> >>>> This is the vote to release Apache MXNet (incubating) version 1.3.1.
> >>> Voting
> >>>> will start now, on Monday the 12th of November 2018 and close on 14:00
> >>>> Thursday the 15th of November 2018, Pacific Time (PT).
> >>>>
> >>>> Link to release notes:
> >>>> https://cwiki.apache.org/confluence/x/eZGzBQ
> >>>>
> >>>> Link to release candidate 1.3.1.rc0:
> >>>> https://github.com/apache/incubator-mxnet/releases/tag/1.3.1.rc0
> >>>>
> >>>> Link to source and signatures on apache dist server:
> >>>> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.1.rc0/
> >>>>
> >>>> Link to scala packages on the staging repo:
> >>>>
> >>>> * CPU
> >>>>
> >>>>
> >>>
> >>
> https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-osx-x86_64-cpu/1.3.1-SNAPSHOT/
> >>>>
> >>>> * GPU
> >>>>
> >>>>
> >>>
> >>
> https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-linux-x86_64-gpu/1.3.1-SNAPSHOT/
> >>>>
> >>>> Please remember to TEST first before voting accordingly:
> >>>> +1 = approve
> >>>> +0 = no opinion
> >>>> -1 = disapprove (provide reason)
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Anton
> >>>>
> >>>
> >>
>
>


Re: Should PR-860 (Use modernized range loops where possible) be reverted?

2018-11-20 Thread Anton Chernov
Hi Carin,

The discussion [1] was about whether to enable automatic checks against the
old style in new PRs. Kellen's PR [2] was about modernizing the actual code
itself and was not up for voting, thus it could not receive any technical
veto votes.

Per the discussion (as I understood it), we won't get veto votes if we
enable the check on CI, as long as it is treated as a warning.
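
For reference, the kind of automated check discussed in [1] can be tried
locally with clang-tidy (a sketch; the build directory and the file path
are assumptions):

$ clang-tidy -p build -checks='-*,modernize-loop-convert' src/operator/tensor/matrix_op.cc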

Thank you for merging the PR in the first place. I see no reason for
reverting it.

Best
Anton

[1]
https://lists.apache.org/thread.html/b47f285a80bef47c5ead6c361614e338a0661f6c0c76196c1e3719c5@%3Cdev.mxnet.apache.org%3E
[2] https://github.com/apache/incubator-mxnet/pull/12356


вт, 20 нояб. 2018 г. в 15:24, Pedro Larroy :

> Hi all
>
> I think we have to make the clear separation between the thread votes
> on "uniformly adopting C++11 range loops in the MXNet project" and a
> PR which refactored code to be more legible and with improved variable
> names.
> Merging that PR doesn't imply that we have to uniformly adopt the
> previous proposal.  The PR was reviewed and approved by several
> people. I would keep the two topics separate, merging this PR doesn't
> prescribe any particular idiom for future commits or reviews.
>
> Pedro.
>
> On Tue, Nov 20, 2018 at 2:58 PM Carin Meier  wrote:
> >
> > My intent was to be helpful, but I think I may have merged this PR
> > yesterday too soon thinking it was approved and ready to merge
> > https://github.com/apache/incubator-mxnet/pull/12356
> >
> > I didn't see the connected dev discussion
> >
> https://lists.apache.org/thread.html/b47f285a80bef47c5ead6c361614e338a0661f6c0c76196c1e3719c5@%3Cdev.mxnet.apache.org%3E
> > where there were -1 votes, which I believe are vetoes?
> >
> > So the question to confirm is: should the PR be reverted?
> >
> > Sorry for any confusion,
> > Carin
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.3.1.rc0

2018-11-16 Thread Anton Chernov
Thank you Carin, Steffen and Kellen for your votes.

The results so far:

* Binding *

+1 votes:
  - Carin

* Non-Binding *

+1 votes:
  - Kellen
  - Steffen

So far, we've got only positive votes, but unfortunately not enough to
conclude a result from the vote. I would like to remind everyone that we
need at least 3 binding +1 votes. Therefore, the vote is extended until
Tuesday 20th of November 2018, 5pm CET (9am PT).

I kindly ask the community to participate in voting for this patch release.

Best regards
Anton


пт, 16 нояб. 2018 г. в 9:10, kellen sunderland :

> Just tested with 1.3.0 and those tests were failing for that release as
> well.  Given it's not a regression I'm +1 (non-binding).
>
> On Thu, Nov 15, 2018 at 11:52 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Thanks for organizing the release Anton and for testing Carin and
> > Steffen.  Lots of great fixes in this release.  As we don't have the
> > required 3 committers I'd suggest extending the vote for a few days.
> >
> > I tested the following on MacOS 10.13, High Sierra:
> >
> > INCUBATING IN RELEASE FILE: check.
> > LICENSE check.
> > NOTICE check.
> > SIGNATURE check.
> > HASH check.
> > DISCLAIMER check.
> > SOURCE COMPILES VIA MAKEFILE check.
> > SOURCE COMPILES VIA CMAKE check.
> > C++ TESTS PASS fail
> > Two tests failing for me.
> > Build with flags: cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_OPENMP=0
> > -DUSE_OPENCV=0 ..
> > Ran c++ tests with exclusions: ./tests/mxnet_unit_tests
> > --gtest_filter=-GpuTopology.*
> > Result:
> > [  FAILED  ] 2 tests, listed below:
> > [  FAILED  ] ACTIVATION_PERF.ExecuteBidirectional
> > [  FAILED  ] ACTIVATION_PERF.TimingCPU
> >
> > PYTHON UNIT TESTS PASS check.
> >
> > Not sure if the test failures are a regression so I'm +0 (non-binding)
> >
> > On Thu, Nov 15, 2018 at 5:43 PM Steffen Rochel 
> > wrote:
> >
> >> +1 build on MacOS Sierra following instructions on
> >>
> >>
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Developer+Setup+on+Mac
> >> and run one training test.
> >>
> >> On Tue, Nov 13, 2018 at 2:34 PM Carin Meier 
> wrote:
> >>
> >> > +1 - Clojure package tested fine with Scala jars
> >> >
> >> > On Mon, Nov 12, 2018 at 6:53 PM Anton Chernov 
> >> wrote:
> >> >
> >> > > Dear MXNet community,
> >> > >
> >> > > This is the vote to release Apache MXNet (incubating) version 1.3.1.
> >> > Voting
> >> > > will start now, on Monday the 12th of November 2018 and close on
> 14:00
> >> > > Thursday the 15th of November 2018, Pacific Time (PT).
> >> > >
> >> > > Link to release notes:
> >> > > https://cwiki.apache.org/confluence/x/eZGzBQ
> >> > >
> >> > > Link to release candidate 1.3.1.rc0:
> >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.1.rc0
> >> > >
> >> > > Link to source and signatures on apache dist server:
> >> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.1.rc0/
> >> > >
> >> > > Link to scala packages on the staging repo:
> >> > >
> >> > > * CPU
> >> > >
> >> > >
> >> >
> >>
> https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-osx-x86_64-cpu/1.3.1-SNAPSHOT/
> >> > >
> >> > > * GPU
> >> > >
> >> > >
> >> >
> >>
> https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-linux-x86_64-gpu/1.3.1-SNAPSHOT/
> >> > >
> >> > > Please remember to TEST first before voting accordingly:
> >> > > +1 = approve
> >> > > +0 = no opinion
> >> > > -1 = disapprove (provide reason)
> >> > >
> >> > >
> >> > > Best regards,
> >> > > Anton
> >> > >
> >> >
> >>
> >
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-15 Thread Anton Chernov
I'd like to remind everyone that 'code freeze' would mean cutting a v1.4.x
release branch; all subsequent fixes would need to be backported.
Development on master can continue as usual.

Best
Anton

Wed, 14 Nov 2018 at 6:04, Steffen Rochel :

> Dear MXNet community,
> the agreed plan was to establish the code freeze for the 1.4.0 release
> today. As the 1.3.1 patch release is still ongoing, I suggest postponing
> the code freeze to Friday, 16th November 2018.
>
> Sergey Kolychev has agreed to act as co-release manager for all tasks which
> require committer privileges. If anybody is interested in volunteering as
> release manager, now is the time to speak up. Otherwise I will manage the
> release.
>
> Regards,
> Steffen
>


[VOTE] Release Apache MXNet (incubating) version 1.3.1.rc0

2018-11-12 Thread Anton Chernov
Dear MXNet community,

This is the vote to release Apache MXNet (incubating) version 1.3.1. Voting
will start now, on Monday the 12th of November 2018 and close on 14:00
Thursday the 15th of November 2018, Pacific Time (PT).

Link to release notes:
https://cwiki.apache.org/confluence/x/eZGzBQ

Link to release candidate 1.3.1.rc0:
https://github.com/apache/incubator-mxnet/releases/tag/1.3.1.rc0

Link to source and signatures on apache dist server:
https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.1.rc0/

Link to scala packages on the staging repo:

* CPU
https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-osx-x86_64-cpu/1.3.1-SNAPSHOT/

* GPU
https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-linux-x86_64-gpu/1.3.1-SNAPSHOT/

Please remember to TEST first before voting accordingly:
+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)


Best regards,
Anton


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-12 Thread Anton Chernov
Unfortunately, merging the following PR

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

broke the `dist-kvstore tests CPU` test stage:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/82/pipeline

A revert PR has been opened:

Revert "Set correct update on kvstore flag in dist_device_sync mode
(v1.3.x) (#13121)
https://github.com/apache/incubator-mxnet/pull/13228

The test already passed, so the PR is good to go. The initial fix will not
be considered for the release and will be noted in the known issues
section.

Added a version bump to the release branch:

news, readme update for v1.3.1 release
https://github.com/apache/incubator-mxnet/pull/13225

Since patch releases are now done on branches, the master branch needs a
version update. The following PR introduces the change:

Bumped minor version to 1.4.0 as 1.3.1 will be continued in the v1.3x branch
https://github.com/apache/incubator-mxnet/pull/13231


The confluence page 'Apache MXNet (incubating) 1.3.1 Release Notes' has
been updated:
https://cwiki.apache.org/confluence/x/eZGzBQ


Best
Anton

Sat, 10 Nov 2018 at 11:59, Anton Chernov :

> Due to various problems we had to postpone the tagging and vote for the
> release till Monday, the 12th of November 2018.
>
> The following change has been updated and is waiting to be merged:
>
> Disable flaky test test_operator.test_dropout (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13200
>
> Indeed, the MacOS tests timed out for the branch as well. The proposed
> change thus contains only the build stage:
>
> [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13179
>
>
> Best
> Anton
>
> Fri, 9 Nov 2018 at 13:11, Anton Chernov :
>
>> I created the following PR to disable the test:
>>
>> Disable flaky test test_operator.test_dropout (v1.3.x)
>> https://github.com/apache/incubator-mxnet/pull/13200
>>
>> The second failure, I suppose, is related to:
>>
>> distributed kvstore bug in MXNet
>> https://github.com/apache/incubator-mxnet/issues/12713
>>
>> Which was partially fixed by
>>
>> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
>> https://github.com/apache/incubator-mxnet/pull/13121
>>
>> But another part of the issue is still open and does not have a fix yet:
>>
>> "When distributed kvstore is used, by default gluon.Trainer doesn't work
>> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
>> specific, the trainer updates once per GPU, the LRScheduler object is
>> shared across GPUs and get a wrong update count."
>>
>>
>> Best
>> Anton
>>
>>
>> Fri, 9 Nov 2018 at 11:48, Anton Chernov :
>>
>>> In case the MacOS tests time out as well, we can disable them
>>> and keep at least the build stage, as in:
>>>
>>> Disable travis tests
>>> https://github.com/apache/incubator-mxnet/pull/13137
>>>
>>> Best
>>> Anton
>>>
>>> Fri, 9 Nov 2018 at 11:17, Anton Chernov :
>>>
>>>>
>>>> Hi Naveen,
>>>>
>>>> I believe that the timeout is not an issue for the branch, and I see
>>>> great benefit in having tests for MacOS on the release branch. The Travis
>>>> build is not blocking anyway, so I don't see any risk in adding it.
>>>>
>>>> * test_dropout
>>>>
>>>> Currently, there is a problem with test_dropout that fails consistently
>>>> on the branch:
>>>>
>>>>
>>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>>>>
>>>> Error reported:
>>>>
>>>> ==
>>>> FAIL: test_operator.test_dropout
>>>> --
>>>> Traceback (most recent call last):
>>>>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line
>>>> 197, in runTest
>>>> self.test(*self.arg)
>>>>   File
>>>> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py",
>>>> line 173, in test_new
>>>> orig_test(*args, **kwargs)
>>>>   File
>>>> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
>>>> line 5853, in test_dropout
>>>> check_dropout_ratio(0.0, shape)

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-10 Thread Anton Chernov
Due to various problems we had to postpone the tagging and vote for the
release till Monday, the 12th of November 2018.

The following change has been updated and is waiting to be merged:

Disable flaky test test_operator.test_dropout (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13200

Indeed, the MacOS tests timed out for the branch as well. The proposed
change thus contains only the build stage:

[MXNET-908] Enable minimal OSX Travis build (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13179


Best
Anton

Fri, 9 Nov 2018 at 13:11, Anton Chernov :

> I created the following PR to disable the test:
>
> Disable flaky test test_operator.test_dropout (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13200
>
> The second failure, I suppose, is related to:
>
> distributed kvstore bug in MXNet
> https://github.com/apache/incubator-mxnet/issues/12713
>
> Which was partially fixed by
>
> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13121
>
> But another part of the issue is still open and does not have a fix yet:
>
> "When distributed kvstore is used, by default gluon.Trainer doesn't work
> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
> specific, the trainer updates once per GPU, the LRScheduler object is
> shared across GPUs and get a wrong update count."
>
>
> Best
> Anton
>
>
> Fri, 9 Nov 2018 at 11:48, Anton Chernov :
>
>> In case the MacOS tests time out as well, we can disable them and
>> keep at least the build stage, as in:
>>
>> Disable travis tests
>> https://github.com/apache/incubator-mxnet/pull/13137
>>
>> Best
>> Anton
>>
>> Fri, 9 Nov 2018 at 11:17, Anton Chernov :
>>
>>>
>>> Hi Naveen,
>>>
>>> I believe that the timeout is not an issue for the branch, and I see
>>> great benefit in having tests for MacOS on the release branch. The Travis
>>> build is not blocking anyway, so I don't see any risk in adding it.
>>>
>>> * test_dropout
>>>
>>> Currently, there is a problem with test_dropout that fails consistently
>>> on the branch:
>>>
>>>
>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>>>
>>> Error reported:
>>>
>>> ==
>>> FAIL: test_operator.test_dropout
>>> --
>>> Traceback (most recent call last):
>>>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197,
>>> in runTest
>>> self.test(*self.arg)
>>>   File
>>> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py",
>>> line 173, in test_new
>>> orig_test(*args, **kwargs)
>>>   File
>>> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
>>> line 5853, in test_dropout
>>> check_dropout_ratio(0.0, shape)
>>>   File
>>> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
>>> line 5797, in check_dropout_ratio
>>> assert exe.outputs[0].asnumpy().min() == min_value
>>> AssertionError:
>>>  >> begin captured logging << 
>>> common: INFO: Setting test np/mx/python random seeds, use
>>> MXNET_TEST_SEED=428273587 to reproduce.
>>> - >> end captured logging << -
>>>
>>> The test is enabled on master:
>>>
>>> Re-enables test_operator.test_dropout
>>> https://github.com/apache/incubator-mxnet/pull/12717
>>>
>>> And there are no failures for it [1].
>>>
>>> * KVStore tests
>>>
>>> Unfortunately, KVStore tests fail as well.
>>>
>>>
>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline
>>>
>>> Error reported:
>>>
>>> AssertionError
>>> test_gluon_trainer_type()
>>> assert trainer._update_on_kvstore is update_on_kv\
>>>   File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type
>>>
>>> If nobody has a fix for these issues, I will disable the tests and add
>>> information to the known issues section.
>>>
>>> Best
>>> Anton
>>>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-09 Thread Anton Chernov
In case the MacOS tests time out as well, we can disable them and
keep at least the build stage, as in:

Disable travis tests
https://github.com/apache/incubator-mxnet/pull/13137

Best
Anton

Fri, 9 Nov 2018 at 11:17, Anton Chernov :

>
> Hi Naveen,
>
> I believe that the timeout is not an issue for the branch, and I see great
> benefit in having tests for MacOS on the release branch. The Travis build
> is not blocking anyway, so I don't see any risk in adding it.
>
> * test_dropout
>
> Currently, there is a problem with test_dropout that fails consistently on
> the branch:
>
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline
>
> Error reported:
>
> ==
> FAIL: test_operator.test_dropout
> --
> Traceback (most recent call last):
>   File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197,
> in runTest
> self.test(*self.arg)
>   File
> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py",
> line 173, in test_new
> orig_test(*args, **kwargs)
>   File
> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
> line 5853, in test_dropout
> check_dropout_ratio(0.0, shape)
>   File
> "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
> line 5797, in check_dropout_ratio
> assert exe.outputs[0].asnumpy().min() == min_value
> AssertionError:
>  >> begin captured logging << 
> common: INFO: Setting test np/mx/python random seeds, use
> MXNET_TEST_SEED=428273587 to reproduce.
> - >> end captured logging << -
>
> The test is enabled on master:
>
> Re-enables test_operator.test_dropout
> https://github.com/apache/incubator-mxnet/pull/12717
>
> And there are no failures for it [1].
>
> * KVStore tests
>
> Unfortunately, KVStore tests fail as well.
>
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline
>
> Error reported:
>
> AssertionError
> test_gluon_trainer_type()
> assert trainer._update_on_kvstore is update_on_kv\
>   File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type
>
> If nobody has a fix for these issues, I will disable the tests and add
> information to the known issues section.
>
> Best
> Anton
>
> [1] http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>
> Thu, 8 Nov 2018 at 21:44, Naveen Swamy :
>
>> Anton, I don't think we need to add the Mac OS tests for the 1.3.1 branch,
>> since Travis CI is timing out and creates blockers; they also did not
>> exist for v1.3.0.
>>
>>
>> On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov 
>> wrote:
>>
>> > A PR to fix the tests:
>> >
>> > Remove test for non existing index copy operator (v1.3.x)
>> > https://github.com/apache/incubator-mxnet/pull/13180
>> >
>> >
>> > Best
>> > Anton
>> >
>> > Thu, 8 Nov 2018 at 10:05, Anton Chernov :
>> >
>> > > An addition has been made to include MacOS tests for the v1.3.x
>> branch:
>> > >
>> > > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
>> > > https://github.com/apache/incubator-mxnet/pull/13179
>> > >
>> > > It includes the following PR's for master:
>> > >
>> > > [MXNET-908] Enable minimal OSX Travis build
>> > > https://github.com/apache/incubator-mxnet/pull/12462
>> > >
>> > > [MXNET-908] Enable python tests in Travis
>> > > https://github.com/apache/incubator-mxnet/pull/12550
>> > >
>> > > [MXNET-968] Fix MacOS python tests
>> > > https://github.com/apache/incubator-mxnet/pull/12590
>> > >
>> > >
>> > > Best
>> > > Anton
>> > >
>> > >
>> > > Thu, 8 Nov 2018 at 9:38, Anton Chernov :
>> > >
>> > >> Thank you everyone for your support and suggestions. All proposed
>> PR's
>> > >> have been merged. We will tag the release candidate and start the
>> vote
>> > on
>> > >> Friday, the 9th of November 2018.
>> > >>
>> > >> Unfortunately after the merges the tests started to fail:
>> > >>
>> > >>
>> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-09 Thread Anton Chernov
Hi Naveen,

I believe that the timeout is not an issue for the branch, and I see great
benefit in having tests for MacOS on the release branch. The Travis build
is not blocking anyway, so I don't see any risk in adding it.

* test_dropout

Currently, there is a problem with test_dropout that fails consistently on
the branch:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/97/pipeline

Error reported:

==
FAIL: test_operator.test_dropout
--
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in
runTest
self.test(*self.arg)
  File
"C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py",
line 173, in test_new
orig_test(*args, **kwargs)
  File
"C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
line 5853, in test_dropout
check_dropout_ratio(0.0, shape)
  File
"C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py",
line 5797, in check_dropout_ratio
assert exe.outputs[0].asnumpy().min() == min_value
AssertionError:
 >> begin captured logging << 
common: INFO: Setting test np/mx/python random seeds, use
MXNET_TEST_SEED=428273587 to reproduce.
- >> end captured logging << -
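
For reference, the failing assertion checks the identity property of dropout
with ratio 0.0: nothing is masked, (inverted) dropout is an exact no-op, and
the output minimum must equal the input minimum. A simplified numpy sketch of
that invariant (an illustration only, not the MXNet operator or the actual
test code; the dropout helper below is hypothetical):

import numpy as np

def dropout(x, p, rng):
    # Simplified inverted dropout: zero an element with probability p and
    # scale the survivors by 1/(1-p). With p == 0.0 the mask is all ones
    # and the operation is an exact identity.
    if p == 0.0:
        return x.copy()
    mask = (rng.uniform(size=x.shape) >= p) / (1.0 - p)
    return x * mask

rng = np.random.RandomState(428273587)  # seed taken from the captured log
x = rng.uniform(low=1.0, high=2.0, size=(10, 10))
out = dropout(x, 0.0, rng)
assert out.min() == x.min()  # the invariant behind the failing assert

A flaky failure of this assert would suggest that the masking path is taken
even for ratio 0.0, or that the identity path deviates numerically.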

The test is enabled on master:

Re-enables test_operator.test_dropout
https://github.com/apache/incubator-mxnet/pull/12717

And there are no failures for it [1].

* KVStore tests

Unfortunately, KVStore tests fail as well.

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.3.x/96/pipeline

Error reported:

AssertionError
test_gluon_trainer_type()
assert trainer._update_on_kvstore is update_on_kv\
  File "dist_sync_kvstore.py", line 388, in test_gluon_trainer_type

If nobody has a fix for these issues, I will disable the tests and add
information to the known issues section.

Best
Anton

[1] http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/

Thu, 8 Nov 2018 at 21:44, Naveen Swamy :

> Anton, I don't think we need to add the Mac OS tests for the 1.3.1 branch,
> since Travis CI is timing out and creates blockers; they also did not exist
> for v1.3.0.
>
>
> On Thu, Nov 8, 2018 at 10:04 AM Anton Chernov  wrote:
>
> > A PR to fix the tests:
> >
> > Remove test for non existing index copy operator (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13180
> >
> >
> > Best
> > Anton
> >
> > Thu, 8 Nov 2018 at 10:05, Anton Chernov :
> >
> > > An addition has been made to include MacOS tests for the v1.3.x branch:
> > >
> > > [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13179
> > >
> > > It includes the following PR's for master:
> > >
> > > [MXNET-908] Enable minimal OSX Travis build
> > > https://github.com/apache/incubator-mxnet/pull/12462
> > >
> > > [MXNET-908] Enable python tests in Travis
> > > https://github.com/apache/incubator-mxnet/pull/12550
> > >
> > > [MXNET-968] Fix MacOS python tests
> > > https://github.com/apache/incubator-mxnet/pull/12590
> > >
> > >
> > > Best
> > > Anton
> > >
> > >
> > > Thu, 8 Nov 2018 at 9:38, Anton Chernov :
> > >
> > >> Thank you everyone for your support and suggestions. All proposed PR's
> > >> have been merged. We will tag the release candidate and start the vote
> > on
> > >> Friday, the 9th of November 2018.
> > >>
> > >> Unfortunately after the merges the tests started to fail:
> > >>
> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
> > >>
> > >> I will look into the failures, but any help as usual is very
> > appreciated.
> > >>
> > >> The nightly tests are fine:
> > >> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
> > >>
> > >>
> > >> Best
> > >> Anton
> > >>
> > >>
> > >>
> > >>
> > >> Wed, 7 Nov 2018 at 17:19, Anton Chernov :
> > >>
> > >>> Yes, you are right about the versions wording, thanks for the
> > >>> clarification.
> > >>>
> > >>> A performance improvement can be considered a bugfix as well. I see
> > >>> no big risks in including PR's by Haibin and Lin into the patch
> > >>> release.

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-08 Thread Anton Chernov
Thank you everyone for your support and suggestions. All proposed PR's have
been merged. We will tag the release candidate and start the vote on
Friday, the 9th of November 2018.

Unfortunately after the merges the tests started to fail:

http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/

I will look into the failures, but any help as usual is very appreciated.

The nightly tests are fine:
http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/


Best
Anton




Wed, 7 Nov 2018 at 17:19, Anton Chernov :

> Yes, you are right about the versions wording, thanks for the clarification.
>
> A performance improvement can be considered a bugfix as well. I see no big
> risks in including PR's by Haibin and Lin into the patch release.
>
> @Haibin, if you can reopen the PR's they should be good to go for the
> release, considering the importance of the improvements.
>
> I propose the following bugfixes for the release as well (already created
> corresponding PR's):
>
> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13157
>
> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13158
>
> We will be starting to merge the PR's shortly. If there are no more
> proposals for backporting I would consider the list as set.
>
> Best
> Anton
>
> Wed, 7 Nov 2018 at 17:01, Sheng Zha :
>
>> Hi Anton,
>>
>> I hear your concern about a simultaneous 1.4.0 release and it certainly
>> is a valid one.
>>
>> Regarding the release, let’s agree on the language first. According to
>> semver.org, the 1.3.1 release is considered a patch release, which is for
>> backward-compatible bug fixes, while the 1.4.0 release is considered a
>> minor release, which is for backward-compatible new features. A major
>> release would mean 2.0.
>>
>> The three PRs suggested by Haibin and Lin all introduce new features.
>> If they go into a patch release, it would require an exception accepted
>> by the community. Also, if another violation happens it could be grounds
>> for declining a release during votes.
>>
>> -sz
>>
>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov  wrote:
>> >
>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
>>
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-08 Thread Anton Chernov
A PR to fix the tests:

Remove test for non existing index copy operator (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13180


Best
Anton

Thu, 8 Nov 2018 at 10:05, Anton Chernov :

> An addition has been made to include MacOS tests for the v1.3.x branch:
>
> [MXNET-908] Enable minimal OSX Travis build (v1.3.x)
> https://github.com/apache/incubator-mxnet/pull/13179
>
> It includes the following PR's for master:
>
> [MXNET-908] Enable minimal OSX Travis build
> https://github.com/apache/incubator-mxnet/pull/12462
>
> [MXNET-908] Enable python tests in Travis
> https://github.com/apache/incubator-mxnet/pull/12550
>
> [MXNET-968] Fix MacOS python tests
> https://github.com/apache/incubator-mxnet/pull/12590
>
>
> Best
> Anton
>
>
> Thu, 8 Nov 2018 at 9:38, Anton Chernov :
>
>> Thank you everyone for your support and suggestions. All proposed PR's
>> have been merged. We will tag the release candidate and start the vote on
>> Friday, the 9th of November 2018.
>>
>> Unfortunately after the merges the tests started to fail:
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
>>
>> I will look into the failures, but any help as usual is very appreciated.
>>
>> The nightly tests are fine:
>> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
>>
>>
>> Best
>> Anton
>>
>>
>>
>>
>> Wed, 7 Nov 2018 at 17:19, Anton Chernov :
>>
>>> Yes, you are right about the versions wording, thanks for the clarification.
>>>
>>> A performance improvement can be considered a bugfix as well. I see no
>>> big risks in including PR's by Haibin and Lin into the patch release.
>>>
>>> @Haibin, if you can reopen the PR's they should be good to go for the
>>> release, considering the importance of the improvements.
>>>
>>> I propose the following bugfixes for the release as well (already
>>> created corresponding PR's):
>>>
>>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
>>> https://github.com/apache/incubator-mxnet/pull/13157
>>>
>>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
>>> https://github.com/apache/incubator-mxnet/pull/13158
>>>
>>> We will be starting to merge the PR's shortly. If there are no more
>>> proposals for backporting I would consider the list as set.
>>>
>>> Best
>>> Anton
>>>
>>> Wed, 7 Nov 2018 at 17:01, Sheng Zha :
>>>
>>>> Hi Anton,
>>>>
>>>> I hear your concern about a simultaneous 1.4.0 release and it certainly
>>>> is a valid one.
>>>>
>>>> Regarding the release, let’s agree on the language first. According to
>>>> semver.org, the 1.3.1 release is considered a patch release, which is for
>>>> backward-compatible bug fixes, while the 1.4.0 release is considered a
>>>> minor release, which is for backward-compatible new features. A major
>>>> release would mean 2.0.
>>>>
>>>> The three PRs suggested by Haibin and Lin all introduce new features.
>>>> If they go into a patch release, it would require an exception accepted
>>>> by the community. Also, if another violation happens it could be grounds
>>>> for declining a release during votes.
>>>>
>>>> -sz
>>>>
>>>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov 
>>>> wrote:
>>>> >
>>>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
>>>>
>>>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-08 Thread Anton Chernov
An addition has been made to include MacOS tests for the v1.3.x branch:

[MXNET-908] Enable minimal OSX Travis build (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13179

It includes the following PR's for master:

[MXNET-908] Enable minimal OSX Travis build
https://github.com/apache/incubator-mxnet/pull/12462

[MXNET-908] Enable python tests in Travis
https://github.com/apache/incubator-mxnet/pull/12550

[MXNET-968] Fix MacOS python tests
https://github.com/apache/incubator-mxnet/pull/12590


Best
Anton


Thu, 8 Nov 2018 at 9:38, Anton Chernov :

> Thank you everyone for your support and suggestions. All proposed PR's
> have been merged. We will tag the release candidate and start the vote on
> Friday, the 9th of November 2018.
>
> Unfortunately after the merges the tests started to fail:
>
> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.3.x/
>
> I will look into the failures, but any help as usual is very appreciated.
>
> The nightly tests are fine:
> http://jenkins.mxnet-ci.amazon-ml.com/job/NightlyTests/job/v1.3.x/
>
>
> Best
> Anton
>
>
>
>
> Wed, 7 Nov 2018 at 17:19, Anton Chernov :
>
>> Yes, you are right about the versions wording, thanks for the clarification.
>>
>> A performance improvement can be considered a bugfix as well. I see no
>> big risks in including PR's by Haibin and Lin into the patch release.
>>
>> @Haibin, if you can reopen the PR's they should be good to go for the
>> release, considering the importance of the improvements.
>>
>> I propose the following bugfixes for the release as well (already created
>> corresponding PR's):
>>
>> Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
>> https://github.com/apache/incubator-mxnet/pull/13157
>>
>> fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
>> https://github.com/apache/incubator-mxnet/pull/13158
>>
>> We will be starting to merge the PR's shortly. If there are no more
>> proposals for backporting I would consider the list as set.
>>
>> Best
>> Anton
>>
>> Wed, 7 Nov 2018 at 17:01, Sheng Zha :
>>
>>> Hi Anton,
>>>
>>> I hear your concern about a simultaneous 1.4.0 release and it certainly
>>> is a valid one.
>>>
>>> Regarding the release, let’s agree on the language first. According to
>>> semver.org, the 1.3.1 release is considered a patch release, which is for
>>> backward-compatible bug fixes, while the 1.4.0 release is considered a
>>> minor release, which is for backward-compatible new features. A major
>>> release would mean 2.0.
>>>
>>> The three PRs suggested by Haibin and Lin all introduce new features.
>>> If they go into a patch release, it would require an exception accepted
>>> by the community. Also, if another violation happens it could be grounds
>>> for declining a release during votes.
>>>
>>> -sz
>>>
>>> > On Nov 7, 2018, at 2:25 AM, Anton Chernov  wrote:
>>> >
>>> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
>>>
>>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-07 Thread Anton Chernov
Yes, you are right about the versions wording, thanks for the clarification.

A performance improvement can be considered a bugfix as well. I see no big
risks in including PR's by Haibin and Lin into the patch release.

@Haibin, if you can reopen the PR's they should be good to go for the
release, considering the importance of the improvements.

I propose the following bugfixes for the release as well (already created
corresponding PR's):

Fixed __setattr__ method of _MXClassPropertyMetaClass (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13157

fixed symbols naming in RNNCell, LSTMCell, GRUCell (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13158

We will be starting to merge the PR's shortly. If there are no more
proposals for backporting I would consider the list as set.

Best
Anton

Wed, 7 Nov 2018 at 17:01, Sheng Zha :

> Hi Anton,
>
> I hear your concern about a simultaneous 1.4.0 release and it certainly is
> a valid one.
>
> Regarding the release, let’s agree on the language first. According to
> semver.org, the 1.3.1 release is considered a patch release, which is for
> backward-compatible bug fixes, while the 1.4.0 release is considered a
> minor release, which is for backward-compatible new features. A major
> release would mean 2.0.
>
> The three PRs suggested by Haibin and Lin all introduce new features.
> If they go into a patch release, it would require an exception accepted
> by the community. Also, if another violation happens it could be grounds
> for declining a release during votes.
>
> -sz
>
> > On Nov 7, 2018, at 2:25 AM, Anton Chernov  wrote:
> >
> > [MXNET-1179] Enforce deterministic algorithms in convolution layers
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-07 Thread Anton Chernov
@Sheng: Sorry, never mind. It was already suggested by Lin.

The following backport PR's have been created:

allow foreach on input with 0 length (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13151

[MXNET-1179] Enforce deterministic algorithms in convolution layers (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13152

Document the newly added env variable (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13156

add/update infer_range docs (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13153

fix broken Python IO API docs (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13154

fix broken links (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13155


Best
Anton


Wed, 7 Nov 2018 at 10:49, Anton Chernov :

> Hi Sheng,
>
> thanks for your suggestions. Personally, I would not rush a new major
> release, as this breaks the pace and creates unnecessary pressure, in my
> opinion.
>
> If the changes suggested by Haibin are really important then I think we
> can consider them for the minor release, even if they are not strictly
> speaking *bugfixes*. Do you think that might be an option?
>
> And did I understand correctly, you are suggesting:
>
> [MXNET-1179] Enforce deterministic algorithms in convolution layers
> https://github.com/apache/incubator-mxnet/pull/12992
>
> for the 1.3.1 release?
>
> Best
> Anton
>
>
> Wed, 7 Nov 2018 at 0:59, Sheng Zha :
>
>> Similar to the two PRs that Haibin suggested, 12992 introduces new
>> interface for controlling determinism, which is better suited for minor
>> release.
>>
>> I think other than the lack of a release manager to drive the 1.4.0
>> release, there’s no reason we cannot do two releases (1.4.0 & 1.3.1) at the
>> same time. I’m willing to help with the 1.4.0 release to make these new
>> features available one month sooner, if there’s no other concern.
>>
>> -sz
>>
>> > On Nov 6, 2018, at 3:30 PM, Lin Yuan  wrote:
>> >
>> > Hi Anton,
>> >
>> > Thanks for helping the release.
>> > The following PRs are needed by customers who want to use deterministic
>> > CUDNN convolution algorithms:
>> >
>> > https://github.com/apache/incubator-mxnet/pull/12992
>> > https://github.com/apache/incubator-mxnet/pull/13049
>> >
>> > Thanks!
>> >
>> > Lin
>> >
>> >
>> > On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham
>> > wrote:
>> >
>> >> Hi Anton,
>> >> I have the following suggestions for fixes to include in 1.3.1. These
>> each
>> >> have updates to files that will impact docs generation for the 1.3.x
>> >> version of the website's Python API docs:
>> >>
>> >> https://github.com/apache/incubator-mxnet/pull/12879
>> >> https://github.com/apache/incubator-mxnet/pull/12871
>> >> https://github.com/apache/incubator-mxnet/pull/12856
>> >>
>> >> Thanks,
>> >> Aaron
>> >>
>> >>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei  wrote:
>> >>>
>> >>> Hi Anton,
>> >>>
>> >>> Thanks for driving this, I would like to include the following fix in
>> >>> 1.3.1:
>> >>> Allow infer shape partial on foreach operator:
>> >>> https://github.com/apache/incubator-mxnet/pull/12471
>> >>>
>> >>> Keras-MXNet needs this functionality to infer shape partially
>> >>> on foreach operator. (Used in RNN operators)
>> >>>
>> >>> Thanks a lot!
>> >>>
>> >>>
>> >>> Best Regards
>> >>> Lai Wei
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin 
>> >>> wrote:
>> >>>
>> >>>> Hi Naveen and Anton,
>> >>>>
>> >>>> Thanks for pointing that out. You are right that these are not
>> critical
>> >>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
>> >>>>
>> >>>> Best,
>> >>>> Haibin
>> >>>>
>> >>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy 
>> >> wrote:
>> >>>>
>> >>>>> Please note that this is a patch release (1.3.1) to address critical
>> >>>>> bugs! For everything else please wait for 1.4.0, which is planned
>> >>>>> very shortly after 1.3.1.
>

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-07 Thread Anton Chernov
Hi Sheng,

thanks for your suggestions. Personally, I would not rush a new major
release, as this breaks the pace and creates unnecessary pressure, in my
opinion.

If the changes suggested by Haibin are really important then I think we can
consider them for the minor release, even if they are not strictly speaking
*bugfixes*. Do you think that might be an option?

And did I understand correctly, you are suggesting:

[MXNET-1179] Enforce deterministic algorithms in convolution layers
https://github.com/apache/incubator-mxnet/pull/12992

for the 1.3.1 release?

Best
Anton


Wed, 7 Nov 2018 at 0:59, Sheng Zha :

> Similar to the two PRs that Haibin suggested, 12992 introduces new
> interface for controlling determinism, which is better suited for minor
> release.
>
> I think other than the lack of a release manager to drive the 1.4.0
> release, there’s no reason we cannot do two releases (1.4.0 & 1.3.1) at the
> same time. I’m willing to help with the 1.4.0 release to make these new
> features available one month sooner, if there’s no other concern.
>
> -sz
>
> > On Nov 6, 2018, at 3:30 PM, Lin Yuan  wrote:
> >
> > Hi Anton,
> >
> > Thanks for helping the release.
> > The following PRs are needed by customers who want to use deterministic
> > CUDNN convolution algorithms:
> >
> > https://github.com/apache/incubator-mxnet/pull/12992
> > https://github.com/apache/incubator-mxnet/pull/13049
> >
> > Thanks!
> >
> > Lin
> >
> >
> > On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham 
> > wrote:
> >
> >> Hi Anton,
> >> I have the following suggestions for fixes to include in 1.3.1. These
> each
> >> have updates to files that will impact docs generation for the 1.3.x
> >> version of the website's Python API docs:
> >>
> >> https://github.com/apache/incubator-mxnet/pull/12879
> >> https://github.com/apache/incubator-mxnet/pull/12871
> >> https://github.com/apache/incubator-mxnet/pull/12856
> >>
> >> Thanks,
> >> Aaron
> >>
> >>> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei  wrote:
> >>>
> >>> Hi Anton,
> >>>
> >>> Thanks for driving this, I would like to include the following fix in
> >>> 1.3.1:
> >>> Allow infer shape partial on foreach operator:
> >>> https://github.com/apache/incubator-mxnet/pull/12471
> >>>
> >>> Keras-MXNet needs this functionality to infer shape partially
> >>> on foreach operator. (Used in RNN operators)
> >>>
> >>> Thanks a lot!
> >>>
> >>>
> >>> Best Regards
> >>> Lai Wei
> >>>
> >>>
> >>>
> >>> On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin 
> >>> wrote:
> >>>
> >>>> Hi Naveen and Anton,
> >>>>
> >>>> Thanks for pointing that out. You are right that these are not
> critical
> >>>> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
> >>>>
> >>>> Best,
> >>>> Haibin
> >>>>
> >>>> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy 
> >> wrote:
> >>>>
> >>>>> Please note that this is a patch release (1.3.1) to address critical
> >>>>> bugs! For everything else please wait for 1.4.0, which is planned
> >>>>> very shortly after 1.3.1.
> >>>>>
> >>>>>> On Nov 6, 2018, at 7:17 AM, Anton Chernov 
> >>> wrote:
> >>>>>>
> >>>>>> The following PR's have been created so far:
> >>>>>>
> >>>>>> Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13117
> >>>>>>
> >>>>>> [MXNET-953] Fix oob memory read (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13118
> >>>>>>
> >>>>>> [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13119
> >>>>>>
> >>>>>> [MXNET-922] Fix memleak in profiler (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13120
> >>>>>>
> >>>>>> Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> >>>>>> https://github.com/apache/incubator-mxnet/pull/13121

Re: [DISCUSS] Speedup non-code PR in CI

2018-11-07 Thread Anton Chernov
Hi Lin,

thanks for your suggestion. I think it makes total sense. The triggering
logic of the verification build is in the Jenkinsfile [1]. We would be
happy if you could drive this and introduce a PR that implements this check.
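
For illustration, such a check could be a small helper script invoked from
the pipeline before the build stages. A hypothetical sketch only (the script
name, the base branch and the *.md-only rule are assumptions, not the actual
Jenkinsfile logic):

# docs_only.py -- exit 0 if the change set touches only markdown files.
import subprocess
import sys

def changed_files(base="origin/master"):
    # Files changed between the PR head and the target branch.
    out = subprocess.check_output(
        ["git", "diff", "--name-only", base, "HEAD"], text=True)
    return [f for f in out.splitlines() if f]

def docs_only(files):
    # True only if there is at least one change and all changes are *.md.
    return bool(files) and all(f.endswith(".md") for f in files)

if __name__ == "__main__":
    sys.exit(0 if docs_only(changed_files()) else 1)

The Jenkinsfile could then gate the expensive build and test stages on the
exit code of such a script and run only a lightweight docs job for
markdown-only changes.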

Best
Anton

[1] https://github.com/apache/incubator-mxnet/blob/master/Jenkinsfile

Tue, 6 Nov 2018 at 20:55, Lin Yuan :

> Kellen and Pedro,
>
> Thanks for your pointers. I am not an expert in CI, but one naive speedup I
> can see is that if the PR only contains *.md files, we skip the build and
> testing cycles. This would make documentation corrections easier and save
> computation resources for other needed tests. Any side effects there?
>
> Thanks,
>
> Lin
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Anton Chernov
The following PR's have been created so far:

Infer dtype in SymbolBlock import from input symbol (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13117

[MXNET-953] Fix oob memory read (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13118

[MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13119

[MXNET-922] Fix memleak in profiler (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13120

Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13121

update mshadow (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13122

CudnnFind() usage improvements (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13123

Fix lazy record io when used with dataloader and multi_worker > 0 (v1.3.x)
https://github.com/apache/incubator-mxnet/pull/13124


As stated previously, I would be rather opposed to having the following
PR's in the patch release:

Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
https://github.com/apache/incubator-mxnet/pull/13129

sample_like operators (#13034) v1.3.x
https://github.com/apache/incubator-mxnet/pull/13130


Best
Anton

Tue, 6 Nov 2018 at 16:06, Anton Chernov :

> Hi Haibin,
>
> I have a few comments regarding the proposed performance improvement
> changes.
>
> CUDNN support for LSTM with projection & clipping
> https://github.com/apache/incubator-mxnet/pull/13056
>
> There is no doubt that this change brings value, but I don't see it as a
> critical bug fix. I would rather leave it for the next major release.
>
> sample_like operators
> https://github.com/apache/incubator-mxnet/pull/13034
>
> Even if it's related to performance, this is an addition of functionality,
> and I would also defer it to the next major release.
>
>
> Best
> Anton
>
>
> Tue, 6 Nov 2018 at 15:55, Anton Chernov :
>
>> Hi Patric,
>>
>> This change was listed in the 'PR candidates suggested for consideration
>> for v1.3.1 patch release' section [1].
>>
>> You are right, I also think that this is not a critical hotfix change
>> that should be included into the 1.3.1 patch release.
>>
>> Thus I'm not making any further efforts to bring it in.
>>
>> Best
>> Anton
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>>
>>
>> Tue, 6 Nov 2018 at 1:14, Zhao, Patric :
>>
>>> Hi Anton,
>>>
>>> Thanks for looking into the MKL-DNN PR.
>>>
>>> As I understand from the cwiki (
>>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>>> ),
>>> these features will go into 1.4 rather than the 1.3.1 patch release.
>>>
>>> Feel free to correct me :)
>>>
>>> Thanks,
>>>
>>> --Patric
>>>
>>> > -Original Message-
>>> > From: Anton Chernov [mailto:mecher...@gmail.com]
>>> > Sent: Tuesday, November 6, 2018 3:11 AM
>>> > To: d...@mxnet.apache.org
>>> > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
>>> > release
>>> >
>>> > It seems that there is a problem porting the following changes to the
>>> > v1.3.x release branch:
>>> >
>>> > Implement mkldnn convolution fusion and quantization
>>> > https://github.com/apache/incubator-mxnet/pull/12530
>>> >
>>> > MKL-DNN Quantization Examples and README
>>> > https://github.com/apache/incubator-mxnet/pull/12808
>>> >
>>> > The bases are different.
>>> >
>>> > I would need help from the authors of these changes to make a backport PR.
>>> >
>>> > @ZhennanQin, @xinyu-intel would you be able to assist me and create the
>>> > corresponding PR's?
>>> >
>>> > Without proper history and domain knowledge I would not be able to
>>> > create them on my own in a reasonable amount of time, I'm afraid.
>>> >
>>> > Best regards,
>>> > Anton
>>> >
>>> > Mon, 5 Nov 2018 at 19:45, Anton Chernov :
>>> >
>>> > >
>>> > > As part of:
>>> > >
>>> > > Implement mkldnn convolution fusion and quantization
>>> > > https://github.com/apache/incubator-mxnet/pull/12530
>>> > >
>>> > > I propose to add the examples and documentation PR as well:
>>> > >
>>> > > MKL-DNN Quantization Examples 

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Anton Chernov
Hi Haibin,

I have a few comments regarding the proposed performance improvement
changes.

CUDNN support for LSTM with projection & clipping
https://github.com/apache/incubator-mxnet/pull/13056

There is no doubt that this change brings value, but I don't see it as a
critical bug fix. I would rather leave it for the next major release.

sample_like operators
https://github.com/apache/incubator-mxnet/pull/13034

Even if it's related to performance, this is an addition of functionality,
and I would also defer it to the next major release.


Best
Anton


Tue, 6 Nov 2018 at 15:55, Anton Chernov :

> Hi Patric,
>
> This change was listed in the 'PR candidates suggested for consideration
> for v1.3.1 patch release' section [1].
>
> You are right, I also think that this is not a critical hotfix change that
> should be included into the 1.3.1 patch release.
>
> Thus I'm not making any further efforts to bring it in.
>
> Best
> Anton
>
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
>
>
> Tue, 6 Nov 2018 at 1:14, Zhao, Patric :
>
>> Hi Anton,
>>
>> Thanks for looking into the MKL-DNN PR.
>>
>> As I understand from the cwiki (
>> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>> ),
>> these features will go into 1.4 rather than the 1.3.1 patch release.
>>
>> Feel free to correct me :)
>>
>> Thanks,
>>
>> --Patric
>>
>> > -Original Message-
>> > From: Anton Chernov [mailto:mecher...@gmail.com]
>> > Sent: Tuesday, November 6, 2018 3:11 AM
>> > To: d...@mxnet.apache.org
>> > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
>> > release
>> >
>> > It seems that there is a problem porting the following changes to the
>> > v1.3.x release branch:
>> >
>> > Implement mkldnn convolution fusion and quantization
>> > https://github.com/apache/incubator-mxnet/pull/12530
>> >
>> > MKL-DNN Quantization Examples and README
>> > https://github.com/apache/incubator-mxnet/pull/12808
>> >
>> > The bases are different.
>> >
>> > I would need help from the authors of these changes to make a backport PR.
>> >
>> > @ZhennanQin, @xinyu-intel would you be able to assist me and create the
>> > corresponding PR's?
>> >
>> > Without proper history and domain knowledge I would not be able to
>> > create them on my own in a reasonable amount of time, I'm afraid.
>> >
>> > Best regards,
>> > Anton
>> >
>> > Mon, 5 Nov 2018 at 19:45, Anton Chernov :
>> >
>> > >
>> > > As part of:
>> > >
>> > > Implement mkldnn convolution fusion and quantization
>> > > https://github.com/apache/incubator-mxnet/pull/12530
>> > >
>> > > I propose to add the examples and documentation PR as well:
>> > >
>> > > MKL-DNN Quantization Examples and README
>> > > https://github.com/apache/incubator-mxnet/pull/12808
>> > >
>> > >
>> > > Best regards,
>> > > Anton
>> > >
>> > > Mon, 5 Nov 2018 at 19:02, Anton Chernov :
>> > >
>> > >> Dear MXNet community,
>> > >>
>> > >> I will be the release manager for the upcoming 1.3.1 patch release.
>> > >> Naveen will be co-managing the release and providing help from the
>> committers' side.
>> > >>
>> > >> The following dates have been set:
>> > >>
>> > >> Code Freeze: 31st October 2018
>> > >> Release published: 13th November 2018
>> > >>
>> > >> Release notes have been drafted here [1].
>> > >>
>> > >>
>> > >> * Known issues
>> > >>
>> > >> Update MKL-DNN dependency
>> > >> https://github.com/apache/incubator-mxnet/pull/12953
>> > >>
>> > >> This PR hasn't even been merged to master yet. It requires additional
>> > >> discussion and a merge.
>> > >>
>> > >> distributed kvstore bug in MXNet
>> > >> https://github.com/apache/incubator-mxnet/issues/12713
>> > >>
>> > >> > When distributed kvstore is used, by default gluon.Trainer doesn't
>> > >> > work
>> > >> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be
>> > >> more specific, the trainer updates once per GPU, the LRScheduler
>> > >> object is shared across GPUs and get a wrong update count.

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Anton Chernov
Hi Patric,

This change was listed in the 'PR candidates suggested for consideration
for v1.3.1 patch release' section [1].

You are right, I also think that this is not a critical hotfix change that
should be included into the 1.3.1 patch release.

Thus I'm not making any further efforts to bring it in.

Best
Anton

[1]
https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates


Tue, 6 Nov 2018 at 1:14, Zhao, Patric :

> Hi Anton,
>
> Thanks for looking into the MKL-DNN PR.
>
> As I understand from the cwiki (
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> ),
> these features will go into 1.4 rather than the 1.3.1 patch release.
>
> Feel free to correct me :)
>
> Thanks,
>
> --Patric
>
> > -Original Message-
> > From: Anton Chernov [mailto:mecher...@gmail.com]
> > Sent: Tuesday, November 6, 2018 3:11 AM
> > To: d...@mxnet.apache.org
> > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
> > release
> >
> > It seems that there is a problem porting the following changes to the
> > v1.3.x release branch:
> >
> > Implement mkldnn convolution fusion and quantization
> > https://github.com/apache/incubator-mxnet/pull/12530
> >
> > MKL-DNN Quantization Examples and README
> > https://github.com/apache/incubator-mxnet/pull/12808
> >
> > The bases are different.
> >
> > I would need help from the authors of these changes to make a backport PR.
> >
> > @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> > corresponding PR's?
> >
> > Without proper history and domain knowledge I would not be able to create
> > them on my own in a reasonable amount of time, I'm afraid.
> >
> > Best regards,
> > Anton
> >
> > Mon, 5 Nov 2018 at 19:45, Anton Chernov :
> >
> > >
> > > As part of:
> > >
> > > Implement mkldnn convolution fusion and quantization
> > > https://github.com/apache/incubator-mxnet/pull/12530
> > >
> > > I propose to add the examples and documentation PR as well:
> > >
> > > MKL-DNN Quantization Examples and README
> > > https://github.com/apache/incubator-mxnet/pull/12808
> > >
> > >
> > > Best regards,
> > > Anton
> > >
> > > Mon, 5 Nov 2018 at 19:02, Anton Chernov :
> > >
> > >> Dear MXNet community,
> > >>
> > >> I will be the release manager for the upcoming 1.3.1 patch release.
> > >> Naveen will be co-managing the release and providing help from the
> > >> committers' side.
> > >>
> > >> The following dates have been set:
> > >>
> > >> Code Freeze: 31st October 2018
> > >> Release published: 13th November 2018
> > >>
> > >> Release notes have been drafted here [1].
> > >>
> > >>
> > >> * Known issues
> > >>
> > >> Update MKL-DNN dependency
> > >> https://github.com/apache/incubator-mxnet/pull/12953
> > >>
> > >> This PR hasn't even been merged to master yet. It requires additional
> > >> discussion and a merge.
> > >>
> > >> distributed kvstore bug in MXNet
> > >> https://github.com/apache/incubator-mxnet/issues/12713
> > >>
> > >> > When distributed kvstore is used, by default gluon.Trainer doesn't
> > >> > work
> > >> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be
> > >> more specific, the trainer updates once per GPU, the LRScheduler
> > >> object is shared across GPUs and get a wrong update count.
> > >>
> > >> This needs to be fixed. [6]
> > >>
> > >>
> > >> * Changes
> > >>
> > >> The following changes will be ported to the release branch, per [2]:
> > >>
> > >> Infer dtype in SymbolBlock import from input symbol [3]
> > >> https://github.com/apache/incubator-mxnet/pull/12412
> > >>
> > >> [MXNET-953] Fix oob memory read
> > >> https://github.com/apache/incubator-mxnet/pull/12631
> > >>
> > >> [MXNET-969] Fix buffer overflow in RNNOp
> > >> https://github.com/apache/incubator-mxnet/pull/12603
> > >>
> > >> [MXNET-922] Fix memleak in profiler
> > >> https://github.com/apache/incubator-mxnet/pull/12499
> > >>
> > >> Implement mkldnn convolution fusion and quantization (MXNet Graph
> > >> Optimization and Quantization based on subgraph and MKL-DNN proposal
> > >> [4])
> > >> https://github.com/apache/incubator-mxnet/pull/12530

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-05 Thread Anton Chernov
It seems that there is a problem porting the following changes to the v1.3.x
release branch:

Implement mkldnn convolution fusion and quantization
https://github.com/apache/incubator-mxnet/pull/12530

MKL-DNN Quantization Examples and README
https://github.com/apache/incubator-mxnet/pull/12808

The bases are different.

I would need help from the authors of these changes to make a backport PR.

@ZhennanQin, @xinyu-intel would you be able to assist me and create the
corresponding PR's?

Without proper history and domain knowledge I would not be able to create
them on my own in a reasonable amount of time, I'm afraid.

Best regards,
Anton

Mon, 5 Nov 2018 at 19:45, Anton Chernov :

>
> As part of:
>
> Implement mkldnn convolution fusion and quantization
> https://github.com/apache/incubator-mxnet/pull/12530
>
> I propose to add the examples and documentation PR as well:
>
> MKL-DNN Quantization Examples and README
> https://github.com/apache/incubator-mxnet/pull/12808
>
>
> Best regards,
> Anton
>
> Mon, 5 Nov 2018 at 19:02, Anton Chernov :
>
>> Dear MXNet community,
>>
>> I will be the release manager for the upcoming 1.3.1 patch release.
>> Naveen will be co-managing the release and providing help from the
>> committers' side.
>>
>> The following dates have been set:
>>
>> Code Freeze: 31st October 2018
>> Release published: 13th November 2018
>>
>> Release notes have been drafted here [1].
>>
>>
>> * Known issues
>>
>> Update MKL-DNN dependency
>> https://github.com/apache/incubator-mxnet/pull/12953
>>
>> This PR hasn't even been merged to master yet. It requires additional
>> discussion and a merge.
>>
>> distributed kvstore bug in MXNet
>> https://github.com/apache/incubator-mxnet/issues/12713
>>
>> > When distributed kvstore is used, by default gluon.Trainer doesn't work
>> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
>> specific, the trainer updates once per GPU, the LRScheduler object is
>> shared across GPUs and get a wrong update count.
>>
>> This needs to be fixed. [6]
>>
>>
>> * Changes
>>
>> The following changes will be ported to the release branch, per [2]:
>>
>> Infer dtype in SymbolBlock import from input symbol [3]
>> https://github.com/apache/incubator-mxnet/pull/12412
>>
>> [MXNET-953] Fix oob memory read
>> https://github.com/apache/incubator-mxnet/pull/12631
>>
>> [MXNET-969] Fix buffer overflow in RNNOp
>> https://github.com/apache/incubator-mxnet/pull/12603
>>
>> [MXNET-922] Fix memleak in profiler
>> https://github.com/apache/incubator-mxnet/pull/12499
>>
>> Implement mkldnn convolution fusion and quantization (MXNet Graph
>> Optimization and Quantization based on subgraph and MKL-DNN proposal [4])
>> https://github.com/apache/incubator-mxnet/pull/12530
>>
>> The following items (test cases) should already be part of 1.3.0:
>>
>> [MXNET-486] Create CPP test for concat MKLDNN operator
>> https://github.com/apache/incubator-mxnet/pull/11371
>>
>> [MXNET-489] MKLDNN Pool test
>> https://github.com/apache/incubator-mxnet/pull/11608
>>
>> [MXNET-484] MKLDNN C++ test for LRN operator
>> https://github.com/apache/incubator-mxnet/pull/11831
>>
>> [MXNET-546] Add unit test for MKLDNNSum
>> https://github.com/apache/incubator-mxnet/pull/11272
>>
>> [MXNET-498] Test MKLDNN backward operators
>> https://github.com/apache/incubator-mxnet/pull/11232
>>
>> [MXNET-500] Test cases improvement for MKLDNN on Gluon
>> https://github.com/apache/incubator-mxnet/pull/10921
>>
>> Set correct update on kvstore flag in dist_device_sync mode (as part of
>> fixing [5])
>> https://github.com/apache/incubator-mxnet/pull/12786
>>
>> upgrade mshadow version
>> https://github.com/apache/incubator-mxnet/pull/12692
>> But another PR will be used instead:
>> update mshadow
>> https://github.com/apache/incubator-mxnet/pull/12674
>>
>> CudnnFind() usage improvements
>> https://github.com/apache/incubator-mxnet/pull/12804
>> A critical CUDNN fix that reduces GPU memory consumption and addresses
>> this memory leak issue. This is an important fix to include in 1.3.1
>>
>>
>> From discussion about gluon toolkits:
>>
>> disable opencv threading for forked process
>> https://github.com/apache/incubator-mxnet/pull/12025
>>
>> Fix lazy record io when used with dataloader and multi_worker > 0
>> https://github.com/apache/incubator-mxnet/pull/12554
>>
>> fix potential floating number overflow, enable float16
>> https://github.com/apache/incubator-mxnet/pull/12118

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-05 Thread Anton Chernov
As part of:

Implement mkldnn convolution fusion and quantization
https://github.com/apache/incubator-mxnet/pull/12530

I propose to add the examples and documentation PR as well:

MKL-DNN Quantization Examples and README
https://github.com/apache/incubator-mxnet/pull/12808


Best regards,
Anton

Mon, 5 Nov 2018 at 19:02, Anton Chernov :

> Dear MXNet community,
>
> I will be the release manager for the upcoming 1.3.1 patch release. Naveen
> will be co-managing the release and providing help from the committers' side.
>
> The following dates have been set:
>
> Code Freeze: 31st October 2018
> Release published: 13th November 2018
>
> Release notes have been drafted here [1].
>
>
> * Known issues
>
> Update MKL-DNN dependency
> https://github.com/apache/incubator-mxnet/pull/12953
>
> This PR hasn't even been merged to master yet. It requires additional
> discussion and a merge.
>
> distributed kvstore bug in MXNet
> https://github.com/apache/incubator-mxnet/issues/12713
>
> > When distributed kvstore is used, by default gluon.Trainer doesn't work
> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
> specific, the trainer updates once per GPU, the LRScheduler object is
> shared across GPUs and get a wrong update count.
>
> This needs to be fixed. [6]
>
>
> * Changes
>
> The following changes will be ported to the release branch, per [2]:
>
> Infer dtype in SymbolBlock import from input symbol [3]
> https://github.com/apache/incubator-mxnet/pull/12412
>
> [MXNET-953] Fix oob memory read
> https://github.com/apache/incubator-mxnet/pull/12631
>
> [MXNET-969] Fix buffer overflow in RNNOp
> https://github.com/apache/incubator-mxnet/pull/12603
>
> [MXNET-922] Fix memleak in profiler
> https://github.com/apache/incubator-mxnet/pull/12499
>
> Implement mkldnn convolution fusion and quantization (MXNet Graph
> Optimization and Quantization based on subgraph and MKL-DNN proposal [4])
> https://github.com/apache/incubator-mxnet/pull/12530
>
> The following items (test cases) should already be part of 1.3.0:
>
> [MXNET-486] Create CPP test for concat MKLDNN operator
> https://github.com/apache/incubator-mxnet/pull/11371
>
> [MXNET-489] MKLDNN Pool test
> https://github.com/apache/incubator-mxnet/pull/11608
>
> [MXNET-484] MKLDNN C++ test for LRN operator
> https://github.com/apache/incubator-mxnet/pull/11831
>
> [MXNET-546] Add unit test for MKLDNNSum
> https://github.com/apache/incubator-mxnet/pull/11272
>
> [MXNET-498] Test MKLDNN backward operators
> https://github.com/apache/incubator-mxnet/pull/11232
>
> [MXNET-500] Test cases improvement for MKLDNN on Gluon
> https://github.com/apache/incubator-mxnet/pull/10921
>
> Set correct update on kvstore flag in dist_device_sync mode (as part of
> fixing [5])
> https://github.com/apache/incubator-mxnet/pull/12786
>
> upgrade mshadow version
> https://github.com/apache/incubator-mxnet/pull/12692
> But another PR will be used instead:
> update mshadow
> https://github.com/apache/incubator-mxnet/pull/12674
>
> CudnnFind() usage improvements
> https://github.com/apache/incubator-mxnet/pull/12804
> A critical CUDNN fix that reduces GPU memory consumption and addresses
> this memory leak issue. This is an important fix to include in 1.3.1
>
>
> From discussion about gluon toolkits:
>
> disable opencv threading for forked process
> https://github.com/apache/incubator-mxnet/pull/12025
>
> Fix lazy record io when used with dataloader and multi_worker > 0
> https://github.com/apache/incubator-mxnet/pull/12554
>
> fix potential floating number overflow, enable float16
> https://github.com/apache/incubator-mxnet/pull/12118
>
>
>
> * Resolved issues
>
> MxNet 1.2.1–module get_outputs()
> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>
> As far as I can see from the comments, the issue has been resolved; no
> actions need to be taken for this release. [7] is mentioned in this
> regard, but I don't see any action points here either.
>
>
> With the help of Naveen, I will start porting the mentioned PRs to the
> 1.3.x branch.
>
>
> Best regards,
> Anton
>
> [1] https://cwiki.apache.org/confluence/x/eZGzBQ
> [2]
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> [3] https://github.com/apache/incubator-mxnet/issues/11849
> [4]
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
> [5] https://github.com/apache/incubator-mxnet/issues/12713
> [6]
> https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
> [7] https://github.com/apache/incubator-mxnet/pull/11005
>
>


[Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-05 Thread Anton Chernov
Dear MXNet community,

I will be the release manager for the upcoming 1.3.1 patch release. Naveen
will be co-managing the release and providing help from the committers side.

The following dates have been set:

Code Freeze: 31st October 2018
Release published: 13th November 2018

Release notes have been drafted here [1].


* Known issues

Update MKL-DNN dependency
https://github.com/apache/incubator-mxnet/pull/12953

This PR hasn't even been merged to master yet. It requires additional
discussion before it can be merged.

distributed kvstore bug in MXNet
https://github.com/apache/incubator-mxnet/issues/12713

> When distributed kvstore is used, by default gluon.Trainer doesn't work
with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be more
specific, the trainer updates once per GPU, the LRScheduler object is
shared across GPUs and gets a wrong update count.

This needs to be fixed. [6]
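
For illustration, here is a minimal Python sketch of the reported
behaviour. This is a simplified toy scheduler, not actual MXNet
internals; the class and the numbers are assumptions chosen to make the
effect visible:

class ToyScheduler:
    """Toy stand-in for an LR scheduler: halve the learning rate
    every `step` updates."""
    def __init__(self, base_lr=0.1, step=100, factor=0.5):
        self.base_lr, self.step, self.factor = base_lr, step, factor

    def __call__(self, num_update):
        return self.base_lr * self.factor ** (num_update // self.step)

scheduler = ToyScheduler()
num_gpus, num_batches = 4, 100
num_update = 0
for _ in range(num_batches):
    for _ in range(num_gpus):    # the trainer steps once per GPU,
        num_update += 1          # so the shared count grows 4x too fast

# Intended after 100 batches: 0.1 * 0.5 ** (100 // 100) = 0.05
# Actual with the shared count: 0.1 * 0.5 ** (400 // 100) = 0.00625
print(scheduler(num_update))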


* Changes

The following changes will be ported to the release branch, per [2]:

Infer dtype in SymbolBlock import from input symbol [3]
https://github.com/apache/incubator-mxnet/pull/12412

[MXNET-953] Fix oob memory read
https://github.com/apache/incubator-mxnet/pull/12631

[MXNET-969] Fix buffer overflow in RNNOp
https://github.com/apache/incubator-mxnet/pull/12603

[MXNET-922] Fix memleak in profiler
https://github.com/apache/incubator-mxnet/pull/12499

Implement mkldnn convolution fusion and quantization (MXNet Graph
Optimization and Quantization based on subgraph and MKL-DNN proposal [4])
https://github.com/apache/incubator-mxnet/pull/12530

The following items (test cases) should already be part of 1.3.0:

[MXNET-486] Create CPP test for concat MKLDNN operator
https://github.com/apache/incubator-mxnet/pull/11371

[MXNET-489] MKLDNN Pool test
https://github.com/apache/incubator-mxnet/pull/11608

[MXNET-484] MKLDNN C++ test for LRN operator
https://github.com/apache/incubator-mxnet/pull/11831

[MXNET-546] Add unit test for MKLDNNSum
https://github.com/apache/incubator-mxnet/pull/11272

[MXNET-498] Test MKLDNN backward operators
https://github.com/apache/incubator-mxnet/pull/11232

[MXNET-500] Test cases improvement for MKLDNN on Gluon
https://github.com/apache/incubator-mxnet/pull/10921

Set correct update on kvstore flag in dist_device_sync mode (as part of
fixing [5])
https://github.com/apache/incubator-mxnet/pull/12786

upgrade mshadow version
https://github.com/apache/incubator-mxnet/pull/12692
But another PR will be used instead:
update mshadow
https://github.com/apache/incubator-mxnet/pull/12674

CudnnFind() usage improvements
https://github.com/apache/incubator-mxnet/pull/12804
A critical CUDNN fix that reduces GPU memory consumption and addresses this
memory leak issue. This is an important fix to include in 1.3.1


From discussion about gluon toolkits:

disable opencv threading for forked process
https://github.com/apache/incubator-mxnet/pull/12025

Fix lazy record io when used with dataloader and multi_worker > 0
https://github.com/apache/incubator-mxnet/pull/12554

fix potential floating number overflow, enable float16
https://github.com/apache/incubator-mxnet/pull/12118



* Resolved issues

MxNet 1.2.1–module get_outputs()
https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882

As far as I can see from the comments, the issue has been resolved; no
actions need to be taken for this release. [7] is mentioned in this
regard, but I don't see any action points here either.


With the help of Naveen, I will start porting the mentioned PRs to the
1.3.x branch.


Best regards,
Anton

[1] https://cwiki.apache.org/confluence/x/eZGzBQ
[2]
https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
[3] https://github.com/apache/incubator-mxnet/issues/11849
[4]
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
[5] https://github.com/apache/incubator-mxnet/issues/12713
[6]
https://github.com/apache/incubator-mxnet/issues/12713#issuecomment-435773777
[7] https://github.com/apache/incubator-mxnet/pull/11005


Re: [Discuss] Feature detection at runtime / test skipping depending on features

2018-11-01 Thread Anton Chernov
Great idea. I could see the following checks being added:

* Is OpenMP enabled?
* Is CUDA enabled? (though already available as 0 for GPU count)
* Is NCCL enabled?
* CuDNN?
* What BLAS / LAPACK math library is used?
* F16 support enabled?
* KVStore enabled?
* TensorRT?

It would help to structure the tests better.
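
For illustration, a rough Python sketch of how such a check could be
used for test skipping. The helper name is hypothetical (no such MXNet
API existed at the time); it only probes for the error message shown in
the traceback quoted below:

import unittest

def mxnet_built_with_opencv():
    """Hypothetical probe: was MXNet built with USE_OPENCV=1?"""
    import mxnet as mx
    try:
        mx.image.imdecode(b'\x00')  # intentionally invalid image data
    except mx.base.MXNetError as e:
        # Without OpenCV the backend raises "Build with USE_OPENCV=1
        # for image io."; with OpenCV we get a plain decode failure.
        return 'USE_OPENCV' not in str(e)
    return True

@unittest.skipIf(not mxnet_built_with_opencv(),
                 "requires MXNet built with USE_OPENCV=1")
def test_recordimage_dataset():
    pass  # real test body elided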

Best
Anton


чт, 1 нояб. 2018 г. в 15:30, Pedro Larroy :

> Hi
>
> There are some tests that fail when some features are not compiled in, such
> as Opencv.
>
> In some cases we skip the test according to some precondition such as:
>
> @unittest.skipIf(not graphviz_exists(),
>
>
> I would propose that we have a Python module that exports a set of methods
> to check what features are compiled in to skip tests which need this
> feature.
>
>
>
> test_gluon_data.test_recordimage_dataset ... [INFO] Setting test
> np/mx/python random seeds, use MXNET_TEST_SEED=1883419283 to reproduce.
> ERROR
> test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker ...
> Process Process-1:
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in
> _bootstrap
> self.run()
>   File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
> self._target(*self._args, **self._kwargs)
>   File
> "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/data/dataloader.py",
> line 189, in worker_loop
> batch = batchify_fn([dataset[i] for i in samples])
>   File
> "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/data/dataloader.py",
> line 189, in 
> batch = batchify_fn([dataset[i] for i in samples])
>   File
>
> "/usr/local/lib/python3.5/dist-packages/mxnet/gluon/data/vision/datasets.py",
> line 261, in __getitem__
> return image.imdecode(img, self._flag), header.label
>   File "/usr/local/lib/python3.5/dist-packages/mxnet/image/image.py", line
> 147, in imdecode
> return _internal._cvimdecode(buf, *args, **kwargs)
>   File "", line 36, in _cvimdecode
>   File "/usr/local/lib/python3.5/dist-packages/mxnet/_ctypes/ndarray.py",
> line 92, in _imperative_invoke
> ctypes.byref(out_stypes)))
>   File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 252, in
> check_call
> raise MXNetError(py_str(_LIB.MXGetLastError()))
> mxnet.base.MXNetError: [19:21:42] /work/mxnet/src/io/image_io.cc:211: Build
> with USE_OPENCV=1 for image io.
>
>
> Pedro
>


Re: Reproducing test failures on CI

2018-10-23 Thread Anton Chernov
Dear MXNet community,

Unfortunately, due to various reasons, we need to reschedule the demo to
next week's user group meeting, on the 30th of October.

Best regards,
Anton




вт, 16 окт. 2018 г. в 18:03, Pedro Larroy :

> These are two separate events. The London meetup is not related to Anton's
> original email.
>
> Regarding reproducing CI failures I would suggest that we create some easy
> to use scripts and templates to launch instances rather than lengthy
> documentation or materials. If the process is complex, automation is always
> better than lengthy instructions.
>
> It should be a couple of instructions to reproduce test failures locally or
> in EC2.
> I have a personal terraform file and scripts which I use to provision
> instances to do MXNet work in which does all the tedious configuration. I
> could polish them up a bit and create a PR. Another script would be needed
> to launch build & test easily as now with the complexity of the
> JenkinsFiles is too convoluted to reverse engineer for somebody not
> familiar with CI.
>
> There's this nice guide that Marco created:
> https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results
>
> But seems not many people read it, also it doesn't solve provisioning the
> instance and installing the initial dependencies.
>
>
> Pedro.
>
> On Mon, Oct 15, 2018 at 8:58 PM Naveen Swamy  wrote:
>
> > Timur,
> > Here is a meetup Scheduled for 23rd October in London, where Pedro Larroy
> > will talk about Deep Learning using MXNet!
> >
> >
> >
> https://www.meetup.com/Deep-Learning-with-Apache-MXNet-London/events/255280739/
> >
> >
> > -Naveen
> >
> > On Mon, Oct 15, 2018 at 11:18 AM Anton Chernov 
> > wrote:
> >
> > > Sorry, Timur, I've missed that part.
> > >
> > > It will be during the regular user group meeting that is conducted in
> > > Berlin and is streamed via Chime. You can find more information on the
> > > wiki:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28Incubating%29+User+Groups+recurring+meetings
> > >
> > > Best
> > > Anton
> > >
> > >
> > > пн, 15 окт. 2018 г. в 18:45, Timur Shenkao :
> > >
> > > > Is it London meeting?
> > > > Or some other location?
> > > >
> > > > On Monday, October 15, 2018, Anton Chernov 
> > wrote:
> > > >
> > > > > Dear MXNet community,
> > > > >
> > > > > We've noticed that there have been some difficulties setting up
> > > > > environments and reproducing test results from failed builds on the
> > > > > CI. We would like to offer some help to the community on that, and we
> > > > > are therefore holding a small live-stream demo session during our
> > > > > User Group Meeting on the 23rd of October.
> > > > > We will be:
> > > > >
> > > > > * Reviewing a failure and make an initial guess on the cause
> > > > > * Setting up environment
> > > > > * Reproducing the build step from the CI
> > > > > * Reproducing a failure step
> > > > > * Making and submitting a fix back to the community
> > > > >
> > > > > Feel free to propose some additional topic for the streaming.
> > > > >
> > > > > Best regards
> > > > > Anton
> > > > >
> > > >
> > >
> >
>


Re: Creating branch for Java_API

2018-10-15 Thread Anton Chernov
We could create a special job for testing it, maybe with a tweaked
Jenkinsfile so you could run only the tests you are interested in. What do
you think?

Best
Anton

пт, 12 окт. 2018 г. в 20:24, Naveen Swamy :

> Hi All,
>
> Just wanted to inform you that I am going to create a branch on GitHub
> for the Java API work that Andrew/Qing and a few others are doing. This is
> only temporary; I realize this will not have testing.
> There seems to be continued disagreement in the approaches we are
> taking (which is fine), so I am going to create a branch and provide the
> code to a few interested users (within Amazon) and get concrete
> feedback from them.
>
> Thanks, Naveen
>


Re: [DISCUSS] Use modernized C++11 range loops uniformly throughout the project

2018-09-29 Thread Anton Chernov
And if you want a more authoritative opinion on that, check out what the
C++ Core Guidelines say [1]:

> ES.71: Prefer a range-for-statement to a for-statement when there is a
choice
> Reason
> Readability. Error prevention. Efficiency.

Best regards
Anton

[1]
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Res-for-range


сб, 29 сент. 2018 г. в 16:13, Anton Chernov :

> +1
>
> Maybe it's not necessary to enforce usage of range-based for, but I would
> highly encourage doing it due to the already-named advantages. If code were
> introduced using the old style, there could be a comment suggesting the new
> way. But why do the manual work and not leave that to the automated tool?
>
> And since it's already automated - wouldn't it be better to keep a unified
> modern style?
>
> Just to make this a trend - C++ evolves quickly, and this will not be the
> only upgrade that will need to be made. And the easier such upgrades get
> accepted, the easier it is in general to upgrade the codebase.
>
> Soon the standard will get ranges and concepts, and this will change the
> way C++ applications get written significantly. It is a good habit to be
> open to changes and keep up with the trends. By using the new
> possibilities the language offers, you prepare yourself for further
> changes and are more likely to accept them, evolving your programming style.
>
> Take a look at some new examples of modern usage (taken from [1]):
>
> // since C++17
> for (auto&& [first,second] : mymap) {
> // use first and second
> }
>
> // since C++20
> for (auto& x : foo().items()) { /* .. */ } // undefined behavior if foo()
> returns by value
> for (T thing = foo(); auto& x : thing.items()) { /* ... */ } // OK
>
> // since C++11
> struct cow_string { /* ... */ }; // a copy-on-write string
> cow_string str = /* ... */;
> // for(auto x : str) { /* ... */ } // may cause deep copy
> for(auto x : std::as_const(str)) { /* ... */ }
>
> Regarding performance: it's really easy to prove that generated assembly
> is not changing at all. There is a really handy tool for that [2]. You can
> check online the assembly for different language constructs and different
> compilers.
>
> Best regards,
> Anton
>
> [1] https://en.cppreference.com/w/cpp/language/range-for
> [2] https://gcc.godbolt.org
>
> сб, 29 сент. 2018 г. в 13:15, kellen sunderland <
> kellen.sunderl...@gmail.com>:
>
>> It's more readable because it's concise and it's consistent for many types
>> you're looping over (i.e. primitive arrays, stl iterators, etc all work
>> the
>> same way).  It's also useful because it's consistent with other
>> programming
>> languages, making C++ codebases much easier to read for novice and
>> intermediate developers.  IMO it also leads to better naming in loop
>> bodies
>> as the concise style means you're less likely to have important 1 letter
>> variable names describing loop elements (e.g. no int i =0 or it ...).
>> More
>> motivation can be found in the cpp standards proposals for C++11
>> http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1868.html and
>> http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3853.htm.
>>
>>
>>
>> On Sat, Sep 29, 2018 at 6:38 PM Naveen Swamy  wrote:
>>
>> > Kellen,
>> >
>> > Could you please explain why you think range loops are better and how it
>> > improves readability?  this is a relatively new feature, many of them
>> are
>> > used to the old syntax, shouldn't we leave it for the developers to
>> choose
>> > the one that best suits the need and their familiarity.
>> > In general I support the notion of standardizing where necessary,
>> enforcing
>> > rules on loops seems little bit like micro-managing how you should write
>> > C++ code for MXNet.
>> >
>> > -1(open to change based on new information)
>> >
>> >
>> >
>> > On Fri, Sep 28, 2018 at 5:20 PM Chris Olivier 
>> > wrote:
>> >
>> > > ok then, my vote is still -1, however, because it’s just adding
>> needless
>> > > friction for developers imho.
>> > >
>> > > On Fri, Sep 28, 2018 at 7:42 AM kellen sunderland <
>> > > kellen.sunderl...@gmail.com> wrote:
>> > >
>> > > > "Range loops aren’t always the most performant way" Do you have an
>> > > example
>> > > > where there's a perf difference?
>> > > >
>> > > > "In addition, sometimes you want the index. Or maybe you want to
>> > > > > iterate backwards, or not start from the first, etc. Maybe you want
>> > > > > the iterator because you remove it from the list at the bottom of the
>> > > > > loop ... Seems like a rule for the sake of having a rule."

Re: [DISCUSS] Use modernized C++11 range loops uniformly throughout the project

2018-09-29 Thread Anton Chernov
+1

Maybe it's not necessary to enforce usage of range-based for, but I would
highly encourage doing it due to the already-named advantages. If code were
introduced using the old style, there could be a comment suggesting the new
way. But why do the manual work and not leave that to the automated tool?

And since it's already automated - wouldn't it be better to keep a unified
modern style?

Just to make this a trend - C++ evolves quickly, and this will not be the
only upgrade that will need to be made. And the easier such upgrades get
accepted, the easier it is in general to upgrade the codebase.

Soon the standard will get ranges and concepts, and this will change the way
C++ applications get written significantly. It is a good habit to be open
to changes and keep up with the trends. By using the new possibilities the
language offers, you prepare yourself for further changes and are more
likely to accept them, evolving your programming style.

Take a look at some new examples of modern usage (taken from [1]):

// since C++17
for (auto&& [first,second] : mymap) {
// use first and second
}

// since C++20
for (auto& x : foo().items()) { /* .. */ } // undefined behavior if foo()
returns by value
for (T thing = foo(); auto& x : thing.items()) { /* ... */ } // OK

// since C++11
struct cow_string { /* ... */ }; // a copy-on-write string
cow_string str = /* ... */;
// for(auto x : str) { /* ... */ } // may cause deep copy
for(auto x : std::as_const(str)) { /* ... */ }

Regarding performance: it's really easy to prove that generated assembly is
not changing at all. There is a really handy tool for that [2]. You can
check online the assembly for different language constructs and different
compilers.

Best regards,
Anton

[1] https://en.cppreference.com/w/cpp/language/range-for
[2] https://gcc.godbolt.org

сб, 29 сент. 2018 г. в 13:15, kellen sunderland :

> It's more readable because it's concise and it's consistent for many types
> you're looping over (i.e. primitive arrays, stl iterators, etc all work the
> same way).  It's also useful because it's consistent with other programming
> languages, making C++ codebases much easier to read for novice and
> intermediate developers.  IMO it also leads to better naming in loop bodies
> as the concise style means you're less likely to have important 1 letter
> variable names describing loop elements (e.g. no int i =0 or it ...).  More
> motivation can be found in the cpp standards proposals for C++11
> http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1868.html and
> http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3853.htm.
>
>
>
> On Sat, Sep 29, 2018 at 6:38 PM Naveen Swamy  wrote:
>
> > Kellen,
> >
> > Could you please explain why you think range loops are better and how it
> > improves readability?  this is a relatively new feature, many of them are
> > used to the old syntax, shouldn't we leave it for the developers to
> choose
> > the one that best suits the need and their familiarity.
> > In general I support the notion of standardizing where necessary,
> enforcing
> > rules on loops seems little bit like micro-managing how you should write
> > C++ code for MXNet.
> >
> > -1(open to change based on new information)
> >
> >
> >
> > On Fri, Sep 28, 2018 at 5:20 PM Chris Olivier 
> > wrote:
> >
> > > ok then, my vote is still -1, however, because it’s just adding
> needless
> > > friction for developers imho.
> > >
> > > On Fri, Sep 28, 2018 at 7:42 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > "Range loops aren’t always the most performant way" Do you have an
> > > example
> > > > where there's a perf difference?
> > > >
> > > > "In addition, sometimes you want the index. Or maybe you want to
> > iterate
> > > > backwards, or not start from the first, etc. Maybe you want the
> > iterator
> > > > because you remove it from the list at the bottom of the loop
> Seems
> > > > like a rule for the sake of having a rule."
> > > >
> > > > I should have been more clear about this point.  If you're using the
> > > index
> > > > in the loop, doing reverse iteration, or not iterating from
> > start-to-end
> > > > this inspection is smart enough to realize it and will not suggest
> > > > optimizing that type of loop.  The loops that would be changes are
> > _only_
> > > > the loops which are detected as equivalent to range-loops.  Examples
> > can
> > > be
> > > > found here:
> > > >
> > >
> >
> https://clang.llvm.org/extra/clang-tidy/checks/modernize-loop-convert.html
> > > > or you can look at what's been changed in the ref PR.  I've initially
> > set
> > > > our confidence level at 'reasonable' but we could also set to 'safe'
> > > which
> > > > would further reduce the number of loops the check would apply to.
> > > >
> > > > -Kellen
> > > >
> > > > On Fri, Sep 28, 2018 at 3:54 PM Chris Olivier <
> cjolivie...@apache.org>
> > > > wrote:
> > > >
> > > > > -1
> > > > >
> > > > > Range loops aren’t always the most performant way. 

Re: Reformulating to a more efficient design of Mxnet-label-Bot

2018-09-27 Thread Anton Chernov
I mean that you don't have to be a code owner to review a PR. If code you
are familiar with is touched, or the code is similar to some you've
submitted before, then you could be a good reviewer. The bot could pick a
number of reviewers based on this.
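
For illustration, a rough Python sketch of that heuristic using the
standard GitHub REST commits endpoint. The function and the ranking are
assumptions, not part of any existing bot:

import collections
import requests

def suggest_reviewers(owner, repo, touched_files, top_n=3):
    """Rank people who previously committed to the touched files."""
    counts = collections.Counter()
    for path in touched_files:
        resp = requests.get(
            "https://api.github.com/repos/%s/%s/commits" % (owner, repo),
            params={"path": path, "per_page": 30})
        resp.raise_for_status()
        for commit in resp.json():
            author = (commit.get("author") or {}).get("login")
            if author:
                counts[author] += 1
    return [login for login, _ in counts.most_common(top_n)]

# e.g. suggest_reviewers("apache", "incubator-mxnet", ["src/io/image_io.cc"])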

Best
Anton

чт, 27 сент. 2018 г. в 17:28, Qing Lan :

> Great work Harsh! I like your webhook design. This would allow us to do a
> great more for the label bot and speed up the response time.
>
> -Marco: I think Anton means the "Assignees" field in issues and PRs
>
> Thanks,
> Qing
> On 9/27/18, 5:06 PM, "Marco de Abreu" 
> wrote:
>
> You mean like a replacement for the codeowners feature?
>
> Anton Chernov  schrieb am Fr., 28. Sep. 2018,
> 01:39:
>
> > As a feature request: Could we include detection and proposal of
> reviewers
> > to the bot as well?
> >
> > Anton
> >
> > чт, 27 сент. 2018 г. в 15:27, Harsh Patel <
> harshpatel081...@gmail.com>:
> >
> > > Hey,
> > > I'm Harsh Patel, and I am looking to contribute to MXNet. I wanted
> to get
> > > some feedback to improvements that could be made with the current
> > structure
> > > that we have for automatically labelling issues and pull requests.
> I have
> > > linked my proposed design structure on the bottom of this wiki
> page (
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Machine+Learning+Based+GitHub+Bot
> > > )
> > > - it should be under 7. Overall, users will benefit from this
> design
> > since
> > > it will allow adding, updating, and deleting of labels freely.
> Label
> > > creation will be faster since this model focuses on labelling an
> issue as
> > > soon as it is made. Another key benefit is that we minimize the
> number of
> > > total GitHub API calls that need to be made. Feedback would be much
> > > appreciated - I would like to hear what the developers have to say!
> > Thanks.
> > >
> >
>
>
>


Re: Reformulating to a more efficient design of Mxnet-label-Bot

2018-09-27 Thread Anton Chernov
As a feature request: Could we include detection and proposal of reviewers
to the bot as well?

Anton

чт, 27 сент. 2018 г. в 15:27, Harsh Patel :

> Hey,
> I'm Harsh Patel, and I am looking to contribute to MXNet. I wanted to get
> some feedback to improvements that could be made with the current structure
> that we have for automatically labelling issues and pull requests. I have
> linked my proposed design structure on the bottom of this wiki page (
>
> https://cwiki.apache.org/confluence/display/MXNET/Machine+Learning+Based+GitHub+Bot
> )
> - it should be under 7. Overall, users will benefit from this design since
> it will allow adding, updating, and deleting of labels freely. Label
> creation will be faster since this model focuses on labelling an issue as
> soon as it is made. Another key benefit is that we minimize the number of
> total GitHub API calls that need to be made. Feedback would be much
> appreciated - I would like to hear what the developers have to say! Thanks.
>


Re: Remove MKLML as dependency

2018-09-19 Thread Anton Chernov
MKLML is super easy to install since it's distributed with the MKL-DNN
package on GitHub [1], and this holds for all desktop platforms (Linux,
Windows and macOS). Currently, I don't see how MKL could be automatically
installed on a Windows CI host, for example. MKLML also has the advantage
of being smaller than the whole MKL library, which is good for
distribution, while still having enough functionality in it.

I would rather be in favour of keeping it.

The unfortunate situation that MKLML is downloaded for every cmake build
will hopefully be resolved when PR #11148 [2] is merged.

Best regards,
Anton

[1] https://github.com/intel/mkl-dnn/releases
[2] https://github.com/apache/incubator-mxnet/pull/11148


ср, 19 сент. 2018 г. в 8:31, Lv, Tao A :

> If you just want to test the performance, I think you need to link MKL for
> BLAS and MKL-DNN for NN. Also, MKL-DNN should link MKL for better
> performance.
>
> Here are some ways for you to install full MKL library if you don't have
> one:
> 1. Register and download from intel website:
> https://software.intel.com/en-us/mkl
> 2. Apt-get/yum: currently it need configure Intel’s repositories.
> a.
> https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo
> b.
> https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo
> 3. pip install mkl / mkl-devel: ‘mkl’ package has the runtime and
> ‘mkl-devel’ includes everything with the headers
> a.
> https://software.intel.com/en-us/articles/installing-the-intel-distribution-for-python-and-intel-performance-libraries-with-pip-and
> 4. conda install: also has mkl and mkl-devel
> a. https://anaconda.org/intel/mkl
> b. https://anaconda.org/intel/mkl-devel
>
> If you want to redistribute MKL with MXNet, you may need take care of the
> license issue. Currently, MKL is using ISSL (
> https://software.intel.com/en-us/license/intel-simplified-software-license
> ).
>
> -Original Message-
> From: Zai, Alexander [mailto:alex...@amazon.com.INVALID]
> Sent: Wednesday, September 19, 2018 12:49 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Remove MKLML as dependency
>
> Will test it out tomorrow.
>
> On the side, what is the best way to test MKL build for MXnet. MKL is
> licensed?
>
> Best,
> Alex
>
> On 9/18/18, 7:50 PM, "Lv, Tao A"  wrote:
>
> Hi Alex,
>
> Thanks for bringing this up.
>
> The original intention of MKLML is to provide a light and
> easy-to-access library for ML/DL community. It's released with MKL-DNN
> under Apache-2.0 license.
>
> AFAIK, MKL-DNN still relies on it for better performance. So I'm
> afraid there will be a performance regression in MKL pip packages if MKLML
> is simply removed.
>
> Have you ever tried the build without MKLML, and what does the
> performance look like?
>
> -tao
>
> -Original Message-
> From: Alex Zai [mailto:aza...@gmail.com]
> Sent: Wednesday, September 19, 2018 4:49 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Remove MKLML as dependency
>
> On our build from source page we have a list of blas libraries that
> are recommended:
> https://mxnet.incubator.apache.org/install/build_from_source.html
>
> MKL-DNN
> MKL
> MKLML
> Apple Accelerate
> OpenBlas
>
> MKLML is a subset of MKL (https://github.com/intel/mkl-dnn/issues/102)
> and therefore MKLML users can just use MKL instead. Does anyone see an
> issue with me removing this? It would simplify out doc page and build file.
>
> Alex
>
>
>


Re: Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Anton Chernov
Thank you for noticing!

We are working on automating the process, but currently publishing to PyPI
is a manual effort. We are experiencing some problems with the publishing,
but the issue should be resolved soon.

Best
Anton

пт, 31 авг. 2018 г. в 23:29, Alfredo Luque :

> See here:
> https://pypi.org/project/mxnet-cu90mkl/#history
>
> No builds show up since 8/22. From what I can tell, other variants (eg;
> mxnet-mkl) are up to date.
>
> On August 31, 2018 at 2:24:30 PM, Anton Chernov (mecher...@gmail.com)
> wrote:
>
> Hi Alfredo!
>
> Could you provide more info on this? Where do you get the information?
>
> Best
> Anton
>
> пт, 31 авг. 2018 г. в 22:49, Alfredo Luque
>  >:
>
> > Just curious why the latest build is 2018-08-22 while the other variants
> > are up to date.
> >
> > Thanks,
> >
> > —
> > Alfredo Luque
> > Software Engineer
> > Machine Learning Infrastructure
> > Airbnb
> > San Francisco, CA
> >
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>


Re: Nightly Builds Not Working for Cu90MKL?

2018-08-31 Thread Anton Chernov
Hi Alfredo!

Could you provide more info on this? Where do you get the information?

Best
Anton

пт, 31 авг. 2018 г. в 22:49, Alfredo Luque :

> Just curious why the latest build is 2018-08-22 while the other variants
> are up to date.
>
> Thanks,
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>


Apache MXNet (Incubating) Recurring User Group Meeting Notes: 28th August 2018

2018-08-29 Thread Anton Chernov
Dear Apache MXNet (Incubating) community,

Please find some meeting notes from the user group meeting on 28th August
2018.

# MXNet contribution

Per joined the group to see and discuss how he can contribute to MXNet. He
already had a PR open [1] and was looking for the next steps. As a
suggestion, work on flaky tests was proposed.

# Architecture review

Hagen joined the group in person to review and discuss the architecture
that he and his company are using for categorizing documents uploaded by
users. We had an interesting discussion in this regard, and some broader
information on the topic came up. Here are some things worth checking out:

* Best Practices for ML Engineering [2]
* The powerful open-source seq2seq framework Sockeye [3], which is used for
machine translation
* Attention-based methods, like the Transformer [4] and "Attention Is All
You Need" [5].

Join us for the next sessions!

Best
Anton

[1] https://github.com/apache/incubator-mxnet/pull/12331
[2] https://developers.google.com/machine-learning/guides/rules-of-ml/
[3] https://github.com/awslabs/sockeye
[4] https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
[5] https://arxiv.org/abs/1706.03762


Re: MXNet Berlin Office Hours

2018-08-08 Thread Anton Chernov
Well, we are a group of people physically located in Berlin and ready to
provide help in person. A proposal to improve the process is described here
[1]. Any feedback will be greatly appreciated.

And I just wanted to mention that we have already had a few sessions, with
a lot of people being happy about the help they got. Sometimes a discussion
in person makes a problem so much easier to solve.

Best
Anton

[1]
https://cwiki.apache.org/confluence/display/MXNET/PROPOSAL%3A+Apache+MXNet%28Incubating%29+Office+Hours

вт, 24 июл. 2018 г. в 4:32, Hen :

> Noting that I find "MXNet Berlin team" a very confusing concept.
>
> Does that mean "Apache MXNet committers who happen to live in Berlin?"
>
> On Mon, Jul 16, 2018 at 2:27 AM, Anton Chernov 
> wrote:
>
> > Dear MXNet community,
> >
> > As part of our customer support the MXNet Berlin team is offering office
> > hours on Tuesdays 6pm-7pm (CEST) | 9:00am-10am (PST).
> >
> > They happen onsite in the Amazon Berlin office:
> > Krausenstraße 38, 10117 Berlin in BER12 01.501
> >
> > Conference Bridge Information
> >
> > Chime meeting ID: 5461650798
> > Join via browser screen share: https://chime.aws/5461650798
> > Join via phone (US): +1-929-432-4463,,5461650798#
> > Join via phone (US toll-free): +1-855-552-4463,,5461650798#
> > International dial-in: https://chime.aws/dialinnumbers/
> > In-room video system: Ext: 62000, Meeting PIN: 5461650798#
> >
> > How can we help you?
> >
> > The following are a few examples of the types of consultations we
> provide:
> >
> > * CI and infrastructure questions
> > * Build system
> > * Benchmarking
> > * Edge devices (for example Raspberry Pi, Jetson)
> > * C++
> > * General questions
> >
> > Before attending
> >
> > Try finding answers on:
> >
> > * Our discussion forum (https://discuss.mxnet.io)
> > * StackOverflow mxnet tag (https://stackoverflow.com/
> > questions/tagged/mxnet)
> > * MXNet website (https://mxnet.incubator.apache.org/faq/)
> > * Github issues (https://github.com/apache/incubator-mxnet/issues)
> >
> > If this does not help:
> >
> > In advance fill out a github issue (
> > https://github.com/apache/incubator-mxnet/issues/new) at least a few
> days
> > before so that the team member who will help with the issue gets a chance
> > to prepare.
> >
> > Main point of contact through email: mxnet-edge-oncall-primary[at]a
> > mazon.com
> >
> > Best regards
> > Anton Chernov
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/MXNET/
> > MXNet+Berlin+Office+Hours
> >
>


Re: MXNet developer setup on Mac with VSCode for develop, test and debug

2018-07-19 Thread Anton Chernov
I hope that the instructions will use cmake soon. After this PR [1] is
merged, there should not be too many problems with the build (considering
it's without CUDA).

Best
Anton

[1] https://github.com/apache/incubator-mxnet/pull/11148

чт, 19 июл. 2018 г. в 18:59, Pedro Larroy :

> Have you guys tried CLion, works like a charm for me. (Requires license).
>
> On Wed, Jul 18, 2018 at 10:09 PM Naveen Swamy  wrote:
>
> > Thanks Sandeep for putting this together, it would make it easy for
> people
> > who prefer to IDEs to get started with MXNet easily.
> >
> > On Wed, Jul 18, 2018 at 1:04 PM, Lin Yuan  wrote:
> >
> > > Hi Aaron,
> > >
> > > This doc is for development on Mac. It is not intended for Windows
> users.
> > > Maybe we can start a different thread to discuss about MXNet build on
> > > Windows? I have tried it myself on a GPU instances built on Windows
> DLAMI
> > > 10.0. I would love to share with you my setup steps.
> > >
> > > Lin
> > >
> > > On Wed, Jul 18, 2018 at 11:43 AM Markham, Aaron
> > > 
> > > wrote:
> > >
> > > > This is tangential, but Lin, I noticed during the RC1 tests you said
> > you
> > > > tried it out on Windows and it worked for you. I'd like to get VS2017
> > or
> > > VS
> > > > Code working, take Sandeep's setup content and possibly your Windows
> > > > experience, and improve the MXNet Windows setup guide. I've tried it
> > and
> > > > failed. Multiple times. I also tried the MKLDNN instructions and
> > failed.
> > > I
> > > > tried the setup tools batch file and was hit with a lot of dependency
> > > > errors. Some of the problem isn't in the MXNet docs, but in the
> > > > dependencies' documentation, but I'm left to go figure that out on my
> > > own.
> > > > Anyway, any help you can provide here would be great. Also, if any of
> > you
> > > > reading this has a sort of checklist or guide for Windows, I'd love
> to
> > > see
> > > > it.
> > > >
> > > > BTW, I'm using Windows 10 with an NVIDIA GeForce GTX 980, and was
> > trying
> > > > to use VS2017 Community Edition and MKL. I went to MKL after OpenBLAS
> > > > wasn't installing/building.
> > > >
> > > > On 7/18/18, 10:59 AM, "Lin Yuan"  wrote:
> > > >
> > > > Thanks for the well-written document! As a new MXNet developer, I
> > > have
> > > > found it very helpful.
> > > >
> > > > Lin
> > > >
> > > > On Wed, Jul 18, 2018 at 10:50 AM sandeep krishnamurthy <
> > > s...@apache.org
> > > > >
> > > > wrote:
> > > >
> > > > > Hello Community,
> > > > >
> > > > >
> > > > >
> > > > > As an MXNet contributor, I had issues, and it took me some time to
> > > > > get hands-on with the MXNet codebase and be able to code, test and
> > > > > DEBUG the Python/CPP combination. I have documented the steps for
> > > > > the MXNet development setup using VSCode on Mac. The document starts
> > > > > with installing all required tools/packages/IDEs/extensions and then
> > > > > provides steps for debugging a mix of Python/CPP code, which is most
> > > > > likely the case for any MXNet developer, all in a single IDE window.
> > > > > By the end of this document, anyone should be able to walk through
> > > > > the MXNet code, debug, and make their first code change.
> > > > >
> > > > >
> > > > >
> > > > > Please feel free to add comments, make changes as necessary.
> > > > >
> > > > >
> > > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/
> > > MXNet+Developer+Setup+on+Mac
> > > > >
> > > > > Best,
> > > > > Sandeep
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>


Remove Caffe functionality

2018-07-19 Thread Anton Chernov
Dear community,

Currently MXNet has a Caffe framework integration (translator and
converter) [1].

There were some issues discovered with it; for example, some tests were
failing [2]. Since we decided to remove flaky tests and work on making them
stable, I propose completely removing this functionality instead.

There are multiple reasons to this:

* Mind that this is Caffe 1 (not 2)
* Some people mentioned: "Caffe is soo 2015."
* Keeping functionality that is both unstable and old is a burden for
maintenance.
* Keeping functionality that nobody needs is not necessary overall

Please let me know your thoughts.

Best
Anton

[1]
https://github.com/apache/incubator-mxnet/commits/master/tools/caffe_converter/convert_caffe_modelzoo.py
[2]
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1207/pipeline/


MXNet Berlin Office Hours

2018-07-16 Thread Anton Chernov
Dear MXNet community,

As part of our customer support the MXNet Berlin team is offering office
hours on Tuesdays 6pm-7pm (CEST) | 9:00am-10am (PST).

They happen onsite in the Amazon Berlin office:
Krausenstraße 38, 10117 Berlin in BER12 01.501

Conference Bridge Information

Chime meeting ID: 5461650798
Join via browser screen share: https://chime.aws/5461650798
Join via phone (US): +1-929-432-4463,,5461650798#
Join via phone (US toll-free): +1-855-552-4463,,5461650798#
International dial-in: https://chime.aws/dialinnumbers/
In-room video system: Ext: 62000, Meeting PIN: 5461650798#

How can we help you?

The following are a few examples of the types of consultations we provide:

* CI and infrastructure questions
* Build system
* Benchmarking
* Edge devices (for example Raspberry Pi, Jetson)
* C++
* General questions

Before attending

Try finding answers on:

* Our discussion forum (https://discuss.mxnet.io)
* StackOverflow mxnet tag (https://stackoverflow.com/questions/tagged/mxnet)
* MXNet website (https://mxnet.incubator.apache.org/faq/)
* Github issues (https://github.com/apache/incubator-mxnet/issues)

If this does not help:

In advance fill out a github issue (
https://github.com/apache/incubator-mxnet/issues/new) at least a few days
before so that the team member who will help with the issue gets a chance
to prepare.

Main point of contact through email: mxnet-edge-oncall-primary[at]amazon.com

Best regards
Anton Chernov

[1]
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Berlin+Office+Hours


Re: CI is experiencing issues at the moment

2018-07-12 Thread Anton Chernov
The issue has been resolved. We have retriggered the failed builds and
there is no need for any actions on your side. We will continue to monitor
the builds for any further failures.

Apologies for any inconvenience.

Best regards,
Anton

чт, 12 июл. 2018 г. в 11:31, Anton Chernov :

> Dear MXNet community,
>
> We are currently experiencing some issues with the CI system [1] (the disk
> space run full on the master).
>
> We will update you shortly when the issue was resolved.
>
> Best regards,
> Anton Chernov
>
> [1] https://github.com/apache/incubator-mxnet/issues/11654
>


CI is experiencing issues at the moment

2018-07-12 Thread Anton Chernov
Dear MXNet community,

We are currently experiencing some issues with the CI system [1] (the disk
space run full on the master).

We will update you shortly when the issue was resolved.

Best regards,
Anton Chernov

[1] https://github.com/apache/incubator-mxnet/issues/11654


CI is experiencing issues at the moment

2018-07-12 Thread Anton Chernov
Dear MXNet community,

We are currently experiencing some issues with the CI system [1] (the disk
space run full on the master).

We will update you shortly when the issue was resolved.

Best regards,
Anton Chernov

[1] https://github.com/apache/incubator-mxnet/issues/11654


Re: Feature branches for ARM and Android

2018-06-14 Thread Anton Chernov
Thank you, Thomas, for your suggestion; we already did exactly that and now
even have CI verification for PRs to these branches in a public fork. The
main problem, already mentioned by Pedro, is that the changes we are making
are already big and they are not going to get smaller over time. The merge
back to origin is going to be not only full of conflicts, but also
challenging to review.

In addition, such a PR could not be named starting with [MXNET-xxx],
referencing a specific JIRA ticket, since it incorporates a batch of things
tightly bound to each other. It would be a big, inseparable list of tickets
and changes, both specific to the problem and general improvements that
need to be made. These are impossible to cherry-pick or revert separately,
and a completely different branch needs to be maintained for release
changes, general improvements and specific task development.

Some general improvements require such an amount of work (for example some
cmake improvements) that the initial issue is no longer solvable in a
reasonable amount of time, burying both the potential added value and the
WIP improvements.

In general, I don't understand the reason for such hard blocking of
contributions. None of the iterative changes proposed have an "unstable
state"; they all bring value in a series of improvements that build on the
progress already made.

Anton

ср, 13 июн. 2018 г. в 22:43, Pedro Larroy :

> The problem is that the process of porting is incremental and requires
> several patches from different collaborators to advance in different areas,
> like build system, infrastructure, code fixes, virtualization This gets
> difficult when having multiple scattered PRs open. We lost track of which
> changes where in which PR fixing the ARMv7 port with Anton.
>
> The normal way to operate in these cases in my experience is either use a
> feature branch and collaborate and share patches there, or integrate the
> patches to move towards the goal in the master branch. The latter is not
> always possible. I think going forwards we will try using an integration
> branch in our org:  MXNetEdge/incubator-mxnet which is a public fork. The
> downside is that we should be wary of merging back large patches to master,
> I think often we have problems in large patches that touch too many things.
> Happy to hear different suggestions, as is always good to find better
> branching patterns and ways of working.
>
> Pedro.
>
> On Wed, Jun 13, 2018 at 8:30 PM Thomas DELTEIL 
> wrote:
>
> > Hi Pedro,
> >
> > Is there a problem in working off a branch in your own fork and issue a
> > [WIP] PR ? This is a pattern I have seen a lot and personally I think it
> > works well, since it also gives some visibility if someone is interested
> in
> > looking at the progress of the work. You can add people collaborating
> with
> > you as collaborator to your own fork and that way your commits will be
> run
> > against the CI. Make sure to merge from apache/master and not
> larroy/master
> > if you have conflicts? Not sure why you got these conflicts otherwise.
> >
> > All the best,
> >
> > Thomas
> >
> > 2018-06-12 23:39 GMT-07:00 Pedro Larroy :
> >
> > > Thanks a lot for creating these branches and proposing the idea, for
> the
> > > reasons you listed.
> > >
> > >
> > >  We tried during this week to work with these branches with @lebeg for
> > > Android and Arm support, for the reasons listed below these branches
> are
> > > not useful for us, so you can delete them.
> > >
> > > 1. We don't have permissions to commit to these development branches,
> > > 2. they show merge conflicts that have been solved locally before
> running
> > > CI (?). I'm pretty sure I merged and resolved conflicts locally. 3. It
> > > would also pollute the repository history with continuous merges to and
> > > from these branches. I prefer to have a linear history in master so
> > > changes, regressions and bisecting can be less painful when dealing
> with
> > > issues.
> > >
> > > I think is important to share development and integrate small,
> > incremental
> > > patches towards architecture support, unfortunately these branches
> can't
> > > help us at this stage. We will share our work through a different means
> > and
> > > without polluting the project with additional branches which are not
> > meant
> > > for production or general use.
> > >
> > >
> > >
> > >
> > > On Mon, Jun 11, 2018 at 6:20 AM Marco de Abreu <
> > > marco.g.ab...@googlemail.com>
> > > wrote:
> > >
> > > > The problem with regular reviews here is that we might want to keep
> > > > temporary code or hacks as a temporary solution before we finalize
> it.
> > A
> > > > regular review would have problems with that.
> > > >
> > > > The reason against a fork is the requirement of CI. Since multiple
> > people
> > > > are working on the same branch and we have to file PRs against each
> > > other,
> > > > it would cause problems if CI is only triggered after the fact.
> > > >
> > > > Ideally, the branch 

Re: Make scalapkg fails if USE_BLAS is set to openblas/mkl/apple

2018-06-06 Thread Anton Chernov
The problem still persists with make build, so no correlation with the
build system.

2018-06-06 15:49 GMT+02:00 Naveen Swamy :

> I am using make to build
>
> On Wed, Jun 6, 2018 at 6:47 AM, Anton Chernov  wrote:
>
> > Yes, we have checked that as well, but it did not help in our case. I've
> > checked out a commit close to 1.1 where it was still working and built it
> > with the new CI scripts (using the cmake build for armv7), and it failed
> > very similarly. It seems that the problem might be in the way we are
> > building the library.
> >
> > Are you using cmake or make for builds?
> >
> > 2018-06-06 14:47 GMT+02:00 Naveen Swamy :
> >
> > > By the way it turned out the problem was when we used
> USE_SIGNAL_HANDLER
> > > along with a combination of flags, try removing signal handler and see
> if
> > > it works
> > >
> > > > On Jun 6, 2018, at 12:28 AM, Anton Chernov 
> > wrote:
> > > >
> > > > Unfortunately, I think this is the same behaviour that we're
> observing
> > on
> > > > Raspberry Pi's. Currently we are bisecting the release to find the
> > > breaking
> > > > commit to have an idea what exactly is broken.
> > > >
> > > > What I can say for now is that this failure is not deterministic (on
> > > > RPis): the library import into Python passes in 1 of 4 times. The
> > > > creation of NDArrays, though, fails in all cases with a similar
> > > > message that the stack is corrupted.
> > > >
> > > > Will update on findings.
> > > >
> > > > -- Anton
> > > >
> > > >
> > > > 2018-06-05 16:19 GMT+02:00 Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > :
> > > >
> > > >> Could you compile with debug symbols or get a core file? From this
> > > output
> > > >> is not clear why the crash is happening.
> > > >>
> > > >>> On Sun, May 27, 2018 at 10:04 AM, Naveen Swamy  >
> > > wrote:
> > > >>>
> > > >>> Hi,
> > > >>> I am working to publish MXNet-Scala package to maven and
> encountering
> > > an
> > > >>> issue when trying to build with openblas/mkl/apple. This is on both
> > the
> > > >>> master and the 1.2.0 branch? Can some one help with this.
> > > >>> make scalapkg fails when it calls the MXNet backend to get all the
> > > APIs ?
> > > >>> can someone help here? should I publish with blas disabled? I have
> > > >> already
> > > >>> quite a bit of time on this/
> > > >>>
> > > >>> [INFO]
> > > >>> [INFO] Segmentation fault: 11
> > > >>> [INFO]
> > > >>> [INFO] Stack trace returned 10 entries:
> > > >>> [INFO] [bt] (0)
> > > >>> /home/ubuntu/mxnet-master/scala-package/init-native/
> > > >>> linux-x86_64/target/libmxnet-init-scala-linux-x86_64.so(
> > > >>> dmlc::StackTrace[abi:cxx11]()+0x1bc)
> > > >>> [0x7f2f04ca58ec]
> > > >>> [INFO] [bt] (1)
> > > >>> /home/ubuntu/mxnet-master/scala-package/init-native/
> > > >>> linux-x86_64/target/libmxnet-init-scala-linux-x86_64.so(+
> 0x31d7a4f)
> > > >>> [0x7f2f07971a4f]
> > > >>> [INFO] [bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)
> > > >> [0x7f3096cd24b0]
> > > >>> [INFO] [bt] (3)
> > > >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> > > >>> libjvm.so(+0x3e4afc)
> > > >>> [0x7f3093e0aafc]
> > > >>> [INFO] [bt] (4)
> > > >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> > > >>> libjvm.so(+0xa239d6)
> > > >>> [0x7f30944499d6]
> > > >>> [INFO] [bt] (5)
> > > >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> > > >>> libjvm.so(+0xa24cdc)
> > > >>> [0x7f309444acdc]
> > > >>> [INFO] [bt] (6)
> > > >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> > > >>> libjvm.so(+0xa24e4c)
> > > >>> [0x7f309444ae4c]
> > > >>> [INFO] [bt] (7)
> > > >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> > > >>> libjvm.so(+0x7c3

Re: Make scalapkg fails if USE_BLAS is set to openblas/mkl/apple

2018-06-06 Thread Anton Chernov
Yes, we have checked that as well, but it did not help in our case. I've
checked out a commit close to 1.1 where it was still working and built it
with the new CI scripts (using the cmake build for armv7), and it failed
very similarly. It seems that the problem might be in the way we are
building the library.

Are you using cmake or make for builds?

2018-06-06 14:47 GMT+02:00 Naveen Swamy :

> By the way it turned out the problem was when we used USE_SIGNAL_HANDLER
> along with a combination of flags, try removing signal handler and see if
> it works
>
> > On Jun 6, 2018, at 12:28 AM, Anton Chernov  wrote:
> >
> > Unfortunately, I think this is the same behaviour that we're observing on
> > Raspberry Pi's. Currently we are bisecting the release to find the
> breaking
> > commit to have an idea what exactly is broken.
> >
> > What I can say for now is that this failure is not deterministic (on
> > RPis): the library import into Python passes in 1 of 4 times. The
> > creation of NDArrays, though, fails in all cases with a similar message
> > that the stack is corrupted.
> >
> > Will update on findings.
> >
> > -- Anton
> >
> >
> > 2018-06-05 16:19 GMT+02:00 Pedro Larroy :
> >
> >> Could you compile with debug symbols or get a core file? From this
> output
> >> is not clear why the crash is happening.
> >>
> >>> On Sun, May 27, 2018 at 10:04 AM, Naveen Swamy 
> wrote:
> >>>
> >>> Hi,
> >>> I am working to publish MXNet-Scala package to maven and encountering
> an
> >>> issue when trying to build with openblas/mkl/apple. This is on both the
> >>> master and the 1.2.0 branch? Can some one help with this.
> >>> make scalapkg fails when it calls the MXNet backend to get all the
> APIs ?
> >>> can someone help here? should I publish with blas disabled? I have
> >> already
> >>> quite a bit of time on this/
> >>>
> >>> [INFO]
> >>> [INFO] Segmentation fault: 11
> >>> [INFO]
> >>> [INFO] Stack trace returned 10 entries:
> >>> [INFO] [bt] (0)
> >>> /home/ubuntu/mxnet-master/scala-package/init-native/
> >>> linux-x86_64/target/libmxnet-init-scala-linux-x86_64.so(
> >>> dmlc::StackTrace[abi:cxx11]()+0x1bc)
> >>> [0x7f2f04ca58ec]
> >>> [INFO] [bt] (1)
> >>> /home/ubuntu/mxnet-master/scala-package/init-native/
> >>> linux-x86_64/target/libmxnet-init-scala-linux-x86_64.so(+0x31d7a4f)
> >>> [0x7f2f07971a4f]
> >>> [INFO] [bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)
> >> [0x7f3096cd24b0]
> >>> [INFO] [bt] (3)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0x3e4afc)
> >>> [0x7f3093e0aafc]
> >>> [INFO] [bt] (4)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0xa239d6)
> >>> [0x7f30944499d6]
> >>> [INFO] [bt] (5)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0xa24cdc)
> >>> [0x7f309444acdc]
> >>> [INFO] [bt] (6)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0xa24e4c)
> >>> [0x7f309444ae4c]
> >>> [INFO] [bt] (7)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0x7c3252)
> >>> [0x7f30941e9252]
> >>> [INFO] [bt] (8)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0x5b00d6)
> >>> [0x7f3093fd60d6]
> >>> [INFO] [bt] (9)
> >>> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> >>> libjvm.so(+0x5b2c44)
> >>> [0x7f3093fd8c44]
> >>> [INFO]
> >>> 
> 
> >>>
> >>> [INFO] MXNet Scala Package - Parent ... SUCCESS [
> >>> 1.265 s]
> >>> [INFO] MXNet Scala Package - Initializer .. SUCCESS [
> >>> 2.215 s]
> >>> [INFO] MXNet Scala Package - Initializer Native Parent  SUCCESS [
> >>> 0.017 s]
> >>> [INFO] MXNet Scala Package - Initializer Native Linux-x86_64 SUCCESS [
> >>> 4.417 s]
> >>> [INFO] MXNet Scala Package - Macros ... SUCCESS [
> >>> 7.083 s]
> >>> [INFO] MXNet Scala Package - Core . FAILURE [
> >>> 4.341 s]
> >>> [INFO] MXNet Scala Package - Native Parent  SKIPPED
> >>> [INFO] MXNet Scala Package - Native Linux-x86_64 CPU-only . SKIPPED
> >>> [INFO] MXNet Scala Package - Inference  SKIPPED
> >>> [INFO] MXNet Scala Package - Examples . SKIPPED
> >>> [INFO] MXNet Scala Package - Spark ML . SKIPPED
> >>> [INFO] MXNet Scala Package - Full Parent .. SKIPPED
> >>> [INFO] MXNet Scala Package - Full Linux-x86_64 CPU-only ... SKIPPED
> >>>
> >>> -Naveen
> >>>
> >>
>


Re: Make scalapkg fails if USE_BLAS is set to openblas/mkl/apple

2018-06-06 Thread Anton Chernov
Unfortunately, I think this is the same behaviour that we're observing on
Raspberry Pi's. Currently we are bisecting the release to find the breaking
commit to have an idea what exactly is broken.

What I can say for now is that this failure is not deterministic (on RPis):
the library import into Python passes in 1 of 4 times. The creation of
NDArrays, though, fails in all cases with a similar message that the stack
is corrupted.

Will update on findings.

-- Anton


2018-06-05 16:19 GMT+02:00 Pedro Larroy :

> Could you compile with debug symbols or get a core file? From this output
> it is not clear why the crash is happening.
>
> On Sun, May 27, 2018 at 10:04 AM, Naveen Swamy  wrote:
>
> > Hi,
> > I am working to publish the MXNet-Scala package to Maven and am
> > encountering an issue when trying to build with openblas/mkl/apple. This
> > happens on both the master and the 1.2.0 branches. Can someone help with
> > this?
> > make scalapkg fails when it calls the MXNet backend to get all the APIs.
> > Can someone help here? Should I publish with BLAS disabled? I have
> > already spent quite a bit of time on this.
> >
> > [INFO]
> > [INFO] Segmentation fault: 11
> > [INFO]
> > [INFO] Stack trace returned 10 entries:
> > [INFO] [bt] (0) /home/ubuntu/mxnet-master/scala-package/init-native/linux-x86_64/target/libmxnet-init-scala-linux-x86_64.so(dmlc::StackTrace[abi:cxx11]()+0x1bc) [0x7f2f04ca58ec]
> > [INFO] [bt] (1) /home/ubuntu/mxnet-master/scala-package/init-native/linux-x86_64/target/libmxnet-init-scala-linux-x86_64.so(+0x31d7a4f) [0x7f2f07971a4f]
> > [INFO] [bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f3096cd24b0]
> > [INFO] [bt] (3) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x3e4afc) [0x7f3093e0aafc]
> > [INFO] [bt] (4) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0xa239d6) [0x7f30944499d6]
> > [INFO] [bt] (5) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0xa24cdc) [0x7f309444acdc]
> > [INFO] [bt] (6) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0xa24e4c) [0x7f309444ae4c]
> > [INFO] [bt] (7) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x7c3252) [0x7f30941e9252]
> > [INFO] [bt] (8) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x5b00d6) [0x7f3093fd60d6]
> > [INFO] [bt] (9) /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so(+0x5b2c44) [0x7f3093fd8c44]
> > [INFO]
> > 
> >
> > [INFO] MXNet Scala Package - Parent ... SUCCESS [ 1.265 s]
> > [INFO] MXNet Scala Package - Initializer .. SUCCESS [ 2.215 s]
> > [INFO] MXNet Scala Package - Initializer Native Parent  SUCCESS [ 0.017 s]
> > [INFO] MXNet Scala Package - Initializer Native Linux-x86_64 SUCCESS [ 4.417 s]
> > [INFO] MXNet Scala Package - Macros ... SUCCESS [ 7.083 s]
> > [INFO] MXNet Scala Package - Core . FAILURE [ 4.341 s]
> > [INFO] MXNet Scala Package - Native Parent  SKIPPED
> > [INFO] MXNet Scala Package - Native Linux-x86_64 CPU-only . SKIPPED
> > [INFO] MXNet Scala Package - Inference  SKIPPED
> > [INFO] MXNet Scala Package - Examples . SKIPPED
> > [INFO] MXNet Scala Package - Spark ML . SKIPPED
> > [INFO] MXNet Scala Package - Full Parent .. SKIPPED
> > [INFO] MXNet Scala Package - Full Linux-x86_64 CPU-only ... SKIPPED
> >
> > -Naveen
> >
>


Re: Make cmake default

2018-06-05 Thread Anton Chernov
Here [1] you can find a work-in-progress PR regarding BLAS library
handling with cmake in MXNet.
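
The intended usage would be to select the BLAS implementation at configure
time, roughly as sketched below; the option name is an assumption, not
necessarily the final interface of [1]:

    cd build
    cmake -DUSE_BLAS=openblas ..   # or mkl / apple / atlas; name is an assumption
    make -j$(nproc)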

-- Anton

[1] https://github.com/apache/incubator-mxnet/pull/11148


Re: Make cmake default

2018-06-04 Thread Anton Chernov
+1



The cmake build scripts currently have some limitations (CUDA, lapack, F16
etc.), especially for cross-compilation.

I am currently working on those [1]; a lapack and BLAS cmake module is
coming soon.

Once this is done, all CI builds can be ported to cmake.
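
As a sketch, a ported CI build would then boil down to a plain out-of-source
cmake invocation; the option values and the toolchain file below are
illustrative assumptions, not existing CI code:

    mkdir -p build && cd build
    cmake -DUSE_CUDA=OFF ..    # plain CPU build; options are illustrative
    # cross-compilation would additionally pass a toolchain file, e.g.:
    #   cmake -DCMAKE_TOOLCHAIN_FILE=../cmake/arm.toolchain.cmake ..
    make -j$(nproc)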



Regarding amalgamation:

It would certainly be beneficial to remove amalgamation ASAP since it's
misleading customers.



-- Anton

[1] CUDA, F16 https://github.com/apache/incubator-mxnet/pull/10564


2018-06-04 10:17 GMT+02:00 Chen HY :

> Glad to hear that mxnet.js is back again.
>
> 2018-06-04 8:43 GMT+01:00 Asmus Hetzel :
>
> >  +1
> >
> > I have dealt with the make/cmake stuff when integrating lapack/cusolver.
> > Having a single cmake build would have made things far easier.
> >
> > Asmus
> >
> >
> >
> > On Friday, 1 June 2018 at 23:58:17 CEST, Alex Zai <aza...@gmail.com>
> > wrote:
> >
> > Just realized that the email list strips away all hyperlinks. Attached
> > is a copy of my previous email with the links pasted in.
> >
> > What are people's thoughts on requiring cmake when building from source?
> > Currently we have to maintain two independent build files (CMakeLists and
> > Makefile), which makes development more difficult (each is 600+ lines).
> > Also, our current build system (in Makefile) requires that 3rdparty
> > dependencies have binaries present (or a Makefile to generate binaries)
> > in the repo, which is not always the case.
> > Generating a makefile with cmake will make our Makefile very simple, like
> > PyTorch's Makefile (20 lines of code -
> > https://github.com/pytorch/pytorch/blob/master/Makefile). Also, not all
> > 3rdparty dependencies have binaries or Makefiles. For 3rdparty/mkldnn we
> > end up calling cmake
> > (https://github.com/apache/incubator-mxnet/blob/master/prepare_mkldnn.sh#L96)
> > to generate binaries (this does not violate our 'no cmake dependency' as
> > USE_MKLDNN is OFF by default). If we encounter any library in the future
> > that requires us to generate artifacts with cmake, it would be better to
> > make the switch now. Lastly, we already require cmake as a dependency for
> > Windows developers
> > (https://www.dropbox.com/s/9sfnderg58z4j1l/Screenshot%202018-06-01%2013.43.08.png?dl=0),
> > so this would only affect Linux / Mac developers who do not have cmake
> > already.
> > I currently have a pending PR
> > (https://github.com/apache/incubator-mxnet/pull/8/) that depends on
> > this change. The library does not have a Makefile or binaries present.
> > Unlike mkldnn, we would want this library included by default, so I
> > cannot generate artifacts with cmake. The alternative would be to strip
> > out only the relevant parts of the code we need from the library. I did
> > this in a previous version of my PR
> > (https://github.com/apache/incubator-mxnet/compare/dfdfd1ad15de8bb1b899effb0860a4e834093cfc...a4267eb80488804a7f74ff01f5627c47dd46bd78)
> > but it is incredibly messy.
> > Please let me know your thoughts.
> > Best,
> > Alex
> >
>
>
>
> --
> Chen Hanyang 陈涵洋
> Software School Fudan University
> +86-138-1881-7745
>


Re: MXNet C++ package improvements

2018-03-21 Thread Anton Chernov
The document has been synced to JIRA
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75976112>.



Re: MXNet C++ package improvements

2018-03-15 Thread Anton Chernov
I've improved the document based on recent comments; please have another
look. What has changed so far:

* Fixed unclear formulations
* Added details about the current state of the library and why it is
complicated to use
* Improved the explanation of the pimpl idiom and provided a code example
* Added thread safety requirements

Thanks @chris and @naveen for your reviewing efforts!


2018-03-14 21:44 GMT+01:00 Anton Chernov <mecher...@gmail.com>:

> Sure, here we go:
> https://docs.google.com/document/d/1Xi0aU9Nks7-GcsJIfcXoEpgPeYnSiEtBpJlhZvMaIb8/edit?usp=sharing
>
>
> 2018-03-14 19:15 GMT+01:00 Chris Olivier <cjolivie...@gmail.com>:
>
>> Can you put that on google docs so that it can be commented/edited?
>>
>> On Wed, Mar 14, 2018 at 11:07 AM, Anton Chernov <mecher...@gmail.com>
>> wrote:
>>
>> > Dear MXNet Community,
>> >
>> > please find here
>> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75976112>
>> > the design document for the proposed MXNet C++ package improvements
>> > for review and consideration.
>> >
>> > Feedback is welcome and highly appreciated. Thank you!
>> >
>> > BR
>> > Anton
>> >
>>
>
>


MXNet C++ package improvements

2018-03-14 Thread Anton Chernov
Dear MXNet Community,

please find here
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75976112>
the design document for the proposed MXNet C++ package improvements for
review and consideration.

Feedback is welcome and highly appreciated. Thank you!

BR
Anton


Wiki Access

2018-03-14 Thread Anton Chernov
Hi!

Can somebody please give me write access to the MXNet wiki? In particular,
this section:

https://cwiki.apache.org/confluence/display/MXNET/Design+Proposals

Thanks!

Anton


Join MXNet Development Discussion

2017-10-24 Thread Anton Chernov
Hi!

I would like to join the MXNet Development Discussion on Slack. Could you
grant me access?

Thanks!

Kind regards,
Anton Chernov