Re: Changes to MPI-operator

2019-04-16 Thread Roshani Nagmote
Sounds good. We (Pinar, Vandana, and I) are currently prototyping, and we
plan to start a discussion on the dev list once we reach a logical
conclusion.
We will share more details soon and seek feedback from the community.

Thanks,
Roshani

On Mon, Apr 15, 2019 at 5:30 PM Yuan Tang  wrote:

> I am cc’ing MXNet dev mailing list here.
>
> Thanks for the note Roshani. Look forward to seeing your contribution!
> Though let’s also discuss this in MXNet dev mailing list since other people
> (e.g. Carl and Lin) might be working on this as well to avoid duplicate
> work.
>
> Best,
> Yuan
>
> On Mon, Apr 15, 2019 at 5:51 PM Rong Ou  wrote:
>
>> Sounds great! Yes it would be nice to have some examples for MXNet.
>>
>> On Mon, Apr 15, 2019 at 3:36 PM Roshani Nagmote <
>> roshaninagmo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I work on Apache MXNet and recently I used MPI-Operator to run
>>> distributed training with MXNet and horovod on Kubernetes.
>>> A few other folks and I tried to adjust the capacity of a training job
>>> based on the available workers and to restart the training job from where it
>>> left off if any worker goes away in between.
>>>
>>> To do this, we had to make a few modifications to MPI-operator, for
>>> example, updating workerReplicas and launcherRole. Currently, the changes are
>>> in my repo, and I will be making a PR on MPI-operator with these changes.
>>> I am also planning to contribute a few examples. I wanted to reach out to you
>>> first before creating a PR.
>>>
>>> Please let me know what your thoughts are on this.
>>>
>>> Thanks,
>>> Roshani
>>>
>>


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc3

2019-02-18 Thread Roshani Nagmote
+1 Downloaded, installed on Ubuntu 16.04. Verified signatures.
Built from source with cuda enabled. Ran train_mnist.py test successfully.

Thanks,
Roshani

On Sun, Feb 17, 2019 at 12:13 PM Carin Meier  wrote:

> +1 Downloaded and verified the signature on the tar. Built and tested the
> Scala/Clojure package
>
> On Sun, Feb 17, 2019 at 2:13 PM Qing Lan  wrote:
>
> > +1 (binding) on the release. Checked Mac + Linux (Ubuntu 16.04) build
> from
> > source successfully. Checked Scala build with no errors.
> >
> > On 2/15/19, 6:08 PM, "Piyush Ghai"  wrote:
> >
> > Dear MXNet community,
> >
> > I would like to propose a vote to release Apache MXNet (incubating)
> > version v1.4.0.
> > Voting will start today, Friday February 15th 6pm PST and will close
> > on Monday,
> > February 18th 6pm PST.
> >
> > Link to release notes:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > <
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+(incubating)+1.4.0+Release+Notes
> > >
> >
> > Link to release candidate 1.4.0.rc3:
> >  
> > https://github.com/apache/incubator-mxnet/releases/tag/1.4.0.rc3 <
> > https://github.com/apache/incubator-mxnet/releases/tag/1.4.0.rc3>/
> >
> > Link to source and signatures on apache dist server:
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.4.0.rc3/ <
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.4.0.rc3/>
> >
> >
> > Please remember to TEST first before voting accordingly:
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> >
> > Best regards,
> > Piyush
> >
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.3.1.rc0

2018-11-16 Thread Roshani Nagmote
+1
Installed the release package on Ubuntu with CUDA, CUDNN enabled and ran
train_cifar10.py example successfully.

Thanks,
Roshani

On Fri, Nov 16, 2018 at 5:21 AM Anton Chernov  wrote:

> Thank you Carin, Steffen and Kellen for your votes.
>
> The results so far:
>
> * Binding *
>
> +1 votes:
>   - Carin
>
> * Non-Binding *
>
> +1 votes:
>   - Kellen
>   - Steffen
>
> So far, we've got only positive votes, but unfortunately not enough to
> conclude a result from the vote. I would like to remind everyone that we
> need at least 3 binding +1 votes. Therefore, the vote is extended until
> Tuesday 20th of November 2018, 5pm CET (9am PT).
>
> I kindly ask the community to participate in voting for this patch release.
>
> Best regards
> Anton
>
>
> On Fri, Nov 16, 2018 at 9:10, kellen sunderland <
> kellen.sunderl...@gmail.com
> >:
>
> > Just tested with 1.3.0 and those tests were failing for that release as
> > well.  Given it's not a regression I'm +1 (non-binding).
> >
> > On Thu, Nov 15, 2018 at 11:52 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Thanks for organizing the release Anton and for testing Carin and
> > > Steffen.  Lots of great fixes in this release.  As we don't have the
> > > required 3 committers I'd suggest extending the vote for a few days.
> > >
> > > I tested the following on MacOS 10.13, High Sierra:
> > >
> > > INCUBATING IN RELEASE FILE: check.
> > > LICENSE check.
> > > NOTICE check.
> > > SIGNATURE check.
> > > HASH check.
> > > DISCLAIMER check.
> > > SOURCE COMPILES VIA MAKEFILE check.
> > > SOURCE COMPILES VIA CMAKE check.
> > > C++ TESTS PASS fail
> > > Two tests failing for me.
> > > Build with flags: cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_OPENMP=0
> > > -DUSE_OPENCV=0 ..
> > > Ran c++ tests with exclusions: ./tests/mxnet_unit_tests
> > > --gtest_filter=-GpuTopology.*
> > > Result:
> > > [  FAILED  ] 2 tests, listed below:
> > > [  FAILED  ] ACTIVATION_PERF.ExecuteBidirectional
> > > [  FAILED  ] ACTIVATION_PERF.TimingCPU
> > >
> > > PYTHON UNIT TESTS PASS check.
> > >
> > > Not sure if the test failures are a regression so I'm +0 (non-binding)
> > >
> > > On Thu, Nov 15, 2018 at 5:43 PM Steffen Rochel <
> steffenroc...@gmail.com>
> > > wrote:
> > >
> > >> +1 build on MacOS Sierra following instructions on
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Developer+Setup+on+Mac
> > >> and run one training test.
> > >>
> > >> On Tue, Nov 13, 2018 at 2:34 PM Carin Meier 
> > wrote:
> > >>
> > >> > +1 - Clojure package tested fine with Scala jars
> > >> >
> > >> > On Mon, Nov 12, 2018 at 6:53 PM Anton Chernov 
> > >> wrote:
> > >> >
> > >> > > Dear MXNet community,
> > >> > >
> > >> > > This is the vote to release Apache MXNet (incubating) version
> 1.3.1.
> > >> > Voting
> > >> > > will start now, on Monday the 12th of November 2018 and close on
> > 14:00
> > >> > > Thursday the 15th of November 2018, Pacific Time (PT).
> > >> > >
> > >> > > Link to release notes:
> > >> > > https://cwiki.apache.org/confluence/x/eZGzBQ
> > >> > >
> > >> > > Link to release candidate 1.3.1.rc0:
> > >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.1.rc0
> > >> > >
> > >> > > Link to source and signatures on apache dist server:
> > >> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.1.rc0/
> > >> > >
> > >> > > Link to scala packages on the staging repo:
> > >> > >
> > >> > > * CPU
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-osx-x86_64-cpu/1.3.1-SNAPSHOT/
> > >> > >
> > >> > > * GPU
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://repository.apache.org/content/repositories/snapshots/org/apache/mxnet/mxnet-full_2.11-linux-x86_64-gpu/1.3.1-SNAPSHOT/
> > >> > >
> > >> > > Please remember to TEST first before voting accordingly:
> > >> > > +1 = approve
> > >> > > +0 = no opinion
> > >> > > -1 = disapprove (provide reason)
> > >> > >
> > >> > >
> > >> > > Best regards,
> > >> > > Anton
> > >> > >
> > >> >
> > >>
> > >
> >
>


Re: Call for participation: evaluate Java API

2018-10-16 Thread Roshani Nagmote
The Java API SSD example code seems to be in this PR:
https://github.com/apache/incubator-mxnet/pull/12830

Thanks,
Roshani

On Tue, Oct 16, 2018 at 1:41 PM Pedro Larroy 
wrote:

> To play around we use the java api branch?  is there a link to some example
> code?
>
> Thanks.
>
> On Fri, Oct 12, 2018 at 9:16 PM Davydenko, Denis <
> dzianis.davydze...@gmail.com> wrote:
>
> > Not so long ago there was a design shared for MXNet Java API: [1]
> >
> > In a couple of days we are going to have initial version of its
> > implementation. We are looking for users who would like to get this
> initial
> > version and evaluate how well it suits their use cases or just play
> around
> > with it and provide feedback on its usability and performance. This
> initial
> > version includes:
> > - Predictor
> > - ObjectDetector
> > - NDArray, Context, Shape, DataDesc
> > - Reference implementation of SSD
> >
> > If you or someone you know is interested - please do not hesitate to
> reach
> > out!
> >
> > [1]:
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API
> >
> >
> >
> >
>


Re: [RESULT][VOTE] Release MXNet version 1.3.0

2018-09-12 Thread Roshani Nagmote
Thanks Chris. Will keep it in mind next time. :)

Regards,
Roshani

On Fri, Sep 7, 2018 at 12:07 PM Chris Olivier  wrote:

> nit: using the "-" before peoples' names makes this kind of hard to read
> for me, since "-" is part of "-1"
>
> On Fri, Sep 7, 2018 at 11:18 AM Roshani Nagmote  >
> wrote:
>
> > Hi All,
> >
> > So, this vote passes with *seven* +1, *two* 0  and *three* -1 votes.
> >
> > *+1 votes*
> > *Committers:*
> > - Joshua Zhang
> > - Carin
> > - Naveen
> > - Indu
> > - Haibin
> >
> > *Community:*
> > - Pigeon Lucky
> > - Steffen
> > *0 votes:*
> > *Community:*
> > - Thomas
> > - Aaron
> > *-1 votes:*
> > *Committers:*
> > - Sandeep
> > - Anirudh
> >
> > *Community:*
> > - Hagay
> >
> > *Vote Thread:*
> >
> >
> >
> https://lists.apache.org/thread.html/8ad6f14811be465cdf663d6962980fd95e12193626292631a21ec6f1@%3Cdev.mxnet.apache.org%3E
> >
> >
> > I will continue with the release process on general@ and the release
> > announcement will follow in the next few days.
> >
> > Thanks,
> > Roshani
> >
>


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-12 Thread Roshani Nagmote
Thanks everyone for testing and voting for the release. I am working with
Sheng to finalize and post the release. Announcement will follow soon.

Regards,
Roshani

On Mon, Sep 10, 2018 at 7:03 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Tracked down the issue referred to above and it's not a bug.   I'll update
> the ticket.
>
> Changing to +1.
>
> On Mon, Sep 10, 2018 at 3:00 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > -0.1
> >
> > There's one test failure I've run into (details below).  Following
> Indhu's
> > logic I don't think this should block the release as it's not relating
> to a
> > release feature introduced in this version.
> >
> > I'm trying to use the cpp-package examples as reference code for how to
> > run MXNet models from a native context. I'd like to run them with ASAN
> as a
> > sanity check for memory leaks and pointer errors.  I was continually
> > running into segfaults and crashes w/ and w/o ASAN.  A little googling
> > shows me that this issue has already been reported, and is related to
> > running tests on CPU, not to any changes I made:
> > https://github.com/apache/incubator-mxnet/issues/9814  Having what are
> > effectively our reference examples crash is not good practice IMO.
> >
> > I also share some concerns around the fp16 failures.  I know developers
> > who are currently porting their models to Gluon who use fp16.  They'll be
> > disappointed with the error.
> >
> > In general though, release looks good.  Big thanks to Sheng and Roshani
> > for putting it together (and sorry for the late testing).
> >
> > -Kellen
> >
> >
> > On Fri, Sep 7, 2018 at 4:31 AM Anirudh  wrote:
> >
> >> -1 Considering that using fp16 with gluon is much easier than the
> >> alternative where you need access to the model code, this fix is really
> >> useful. I understand the pain of doing mxnet release and appreciate
> >> Roshani
> >> and Sheng's efforts, but this seems like something we should fix.
> >>
> >> On Thu, Sep 6, 2018, 4:57 PM Haibin Lin 
> wrote:
> >>
> >> > +1 built from source and passes dist_sync_kvstore test on Ubuntu.
> >> >
> >> > Best,
> >> > Haibin
> >> >
> >> > On Thu, Sep 6, 2018 at 1:32 PM Indhu  wrote:
> >> >
> >> > > +1
> >> > >
> >> > > The release candidate looks good. I'm able to build and run basic
> >> models.
> >> > >
> >> > > On the FP16 issue:
> >> > >
> >> > > Like others have pointed out, releases are expensive in terms of time
> >> and
> >> > > effort. There needs to be a high and more objective bar on what
> >> qualifies
> >> > > as a release blocker to make sure we are not setting precedence for
> a
> >> lot
> >> > > of release blockers in future.
> >> > >
> >> > > I think a release blocker is justified only if there is a serious
> bug
> >> > > discovered in one of the features included in the release or if
> there
> >> is
> >> > a
> >> > > regression. Given FP16 support is not a new feature claimed in this
> >> > > release and this is not a regression in this release candidate, I'm
> >> > > inclined to release this candidate and include the FP16 fix in a
> >> > subsequent
> >> > > release.
> >> > >
> >> > > Thanks,
> >> > > Indu
> >> > >
> >> > > On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <
> >> aaron.s.mark...@gmail.com
> >> > >
> >> > > wrote:
> >> > >
> >> > > > 0 (non-binding) If we have a problem that blocks users, and a
> >> solution
> >> > in
> >> > > > hand... then we should fix it, but not at the expense of starting
> >> the
> >> > > > release cycle again just for one fix. Users can cherry pick or
> build
> >> > from
> >> > > > master if they want the fix right away, right? I'd change my mind
> >> to -1
> >> > > if
> >> > > > this wasn't the case, with good reason, and if the user impact was
> >> > > critical
> >> > > > to adoption or risks abandonment.
> >> > > >
> >> > > >
> >> > > > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> >

[RESULT][VOTE] Release MXNet version 1.3.0

2018-09-07 Thread Roshani Nagmote
Hi All,

So, this vote passes with *seven* +1, *two* 0  and *three* -1 votes.

*+1 votes*
*Committers:*
- Joshua Zhang
- Carin
- Naveen
- Indu
- Haibin

*Community:*
- Pigeon Lucky
- Steffen
*0 votes:*
*Community:*
- Thomas
- Aaron
*-1 votes:*
*Committers:*
- Sandeep
- Anirudh

*Community:*
- Hagay

*Vote Thread:*

https://lists.apache.org/thread.html/8ad6f14811be465cdf663d6962980fd95e12193626292631a21ec6f1@%3Cdev.mxnet.apache.org%3E


I will continue with the release process on general@ and the release
announcement will follow in the next few days.

Thanks,
Roshani
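
For readers who want to re-check the tally, here is a minimal sketch in Python
that counts the votes exactly as listed in this email. It is an illustration,
not part of any release tooling. Two assumptions are made: votes from the
"Committers" lists are treated as binding, and the vote passes when there are
at least three binding +1 votes (the threshold mentioned in the 1.3.1 vote
thread) and more binding +1 than binding -1 votes. The names VOTES, tally,
and passes are hypothetical.

```python
from collections import Counter

# Votes as listed in the result email: (voter, group, vote).
# Assumption: "committer" votes are the binding ones.
VOTES = [
    ("Joshua Zhang", "committer", 1),
    ("Carin", "committer", 1),
    ("Naveen", "committer", 1),
    ("Indu", "committer", 1),
    ("Haibin", "committer", 1),
    ("Pigeon Lucky", "community", 1),
    ("Steffen", "community", 1),
    ("Thomas", "community", 0),
    ("Aaron", "community", 0),
    ("Sandeep", "committer", -1),
    ("Anirudh", "committer", -1),
    ("Hagay", "community", -1),
]


def tally(votes):
    """Return (all votes, binding votes) as Counters keyed by +1, 0, -1."""
    totals = Counter(vote for _, _, vote in votes)
    binding = Counter(vote for _, group, vote in votes if group == "committer")
    return totals, binding


def passes(votes, min_binding_plus_one=3):
    """Assumed rule: enough binding +1 votes, and more binding +1 than -1."""
    _, binding = tally(votes)
    return binding[1] >= min_binding_plus_one and binding[1] > binding[-1]


if __name__ == "__main__":
    totals, binding = tally(VOTES)
    print("all votes:", dict(totals))
    print("binding votes:", dict(binding))
    print("passes:", passes(VOTES))
```

Keeping the voter's group next to each vote makes the binding and non-binding
counts fall out of the same list, which matches how the result is reported
above.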


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-06 Thread Roshani Nagmote
Thanks Kellen and Naveen for pointing it out.
Now we have 3 committers +1 votes to move forward with the release. But it
will be great if more people can test the release.

I am extending the timeline for voting till 7 pm today. Please test and
vote.

Thanks,
Roshani

On Thu, Sep 6, 2018 at 5:46 AM Naveen Swamy  wrote:

> +1
>
>
> Roshani/Sheng,
>
> Thanks for putting this release together, I was able to test the release
> only now. As Kellen indicated this release does not have enough committer
> votes, I suggest you extend the timeline.
>
> I downloaded the source code from
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.3.0.rc0/.
>
> I verified the signature of the release and built the Scala package from
> this source, I was able to run Scala Unit Tests and Integration tests
> successfully.
>
> Also, IMO, although Sandeep thought the issue would be good to include in the
> release, I would not consider it a release blocker since it has a workaround;
> you can add it to the release notes as a link to the github issue with the
> workaround.
>
> Other notes (consider adding to retrospective):
>
> On running  gpg --verify, I received a message that the signature is Good
> from Sheng Zha along with a WARNING(gpg: WARNING: This key is not certified
> with a trusted signature!), On researching I found this is fine[1] and the
> fingerprint matches with Sheng's Key here
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/KEYS.
>
> Next time, please send a link to the source and signatures on apache dist
> server
>
> I am currently working with Qing to create and test a maven package for
> Scala, please wait and add that to the Announcement email.
>
> Next time, please give a day or two after the RC is cut so we can create
> packages for various language bindings(Scala, Clojure, R) --(currently this
> is manual), so we can get the packages that users use tested during the RC
> phase.
>
> During the release, I suggest the release manager communicate
> regularly(daily) on dev@ until an announcement is made so everyone is
> aware
> of the status and can plan their work to accommodate building packages,
> testing RC, etc.,
>
> 1.
>
> http://www.apache.org/dev/release-signing.html#valid-untrusted-vs-invalid-trusted
>
>
> Thanks, Naveen
>
>
>
> On Wed, Sep 5, 2018 at 10:20 AM, Aaron Markham 
> wrote:
>
> > 0 (non-binding) If we have a problem that blocks users, and a solution in
> > hand... then we should fix it, but not at the expense of starting the
> > release cycle again just for one fix. Users can cherry pick or build from
> > master if they want the fix right away, right? I'd change my mind to -1
> if
> > this wasn't the case, with good reason, and if the user impact was
> critical
> > to adoption or risks abandonment.
> >
> >
> > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> roshaninagmo...@gmail.com>
> > wrote:
> >
> > > I believe everyone here is working hard to make MXNet a better
> framework
> > > for users. It's completely okay to have different opinions, we can
> decide
> > > together if this issue is a blocker or not after voting time is over.
> > >
> > > As I mentioned before, voting will end at 7 pm today. So there is still
> > > time to test the release. If there are any other issues anyone finds, I
> > > will be happy to start the process again and work on RC1. For now, I
> want
> > > to encourage everyone to utilize this time and vote. :)
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > sandeep.krishn...@gmail.com> wrote:
> > >
> > > > >1. As an Apache MXNet community member, I raised the concern of
> > broken
> > > >functionality for the user. I explained and provided the data
> points
> > > on
> > > > the
> > > >issue, workaround and why I think it is important. If after all
> > this,
> > > > you
> > > >think my vote is biased on my employer just because a user I
> quoted
> > is
> > > > from
> > > >Amazon, this is more concerning to me on my voting abilities.
> > > > >2. My -1 in no way undermines the huge amount of effort that goes
> > > behind
> > > >the scene for a release to happen. Great respect and recognition
> for
> > > >everyone involved in all the releases of MXNet in the past and
> > this. I
> > > >voted on my judgement of what may be good for the users of MXNet.
> > > >3. As pointed

[RESULT][VOTE] Release MXNet version 1.3.0

2018-09-05 Thread Roshani Nagmote
Hi All,

Thank you for spending the time to test MXNet 1.3.0.RC0 release.
Sandeep mentioned the issue when the user tries to load model params
trained/saved as FP16 in gluon. The fix will go into the master branch, and
users who want to use it can build MXNet from master. We will not block the
release for this.

So, this vote passes with *four* +1, *two* 0  and *two* -1 votes.

*+1 votes*

*Committers:*

- Joshua Zhang

- Carin


*Community:*

- Pigeon Lucky

- Steffen


*0 votes:*

*Community:*

- Thomas

- Aaron


*-1 votes:*

*Committers:*

- Sandeep


*Community:*

- Hagay



*Vote Thread:*

https://lists.apache.org/thread.html/8ad6f14811be465cdf663d6962980fd95e12193626292631a21ec6f1@%3Cdev.mxnet.apache.org%3E


I will continue with the release process on general@ and the release
announcement will follow in the next few days.


Thanks,
Roshani


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-05 Thread Roshani Nagmote
ase-blocker nor its fix made it for the
> 8/3
> > > code
> > > > > > > freeze."
> > > > > > > You are right, would have been better to have this identified
> and
> > > > fixed
> > > > > > > earlier and included before code freeze.
> > > > > > >
> > > > > > > - "The PR is still not ready yet as it doesn't have approval."
> > > > > > > I think it is waiting for your review.
> > > > > > >
> > > > > > > - "it would be great if you could provide some additional
> > reasoning
> > > > > > besides
> > > > > > > "X mentions the issue" or "fix was done by X""
> > > > > > > I have. Repeating what I wrote in my previous email for
> clarity:
> > > > Basic
> > > > > > > > functionality broken: loading a model (albeit one that was
> > > saved
> > > > > as
> > > > > > > non FP32)
> > > > > > >
> > > > > > > So, yes - this issue seems to have been out there for a while,
> > > > somehow
> > > > > > went
> > > > > > > under the radar... but I think the key question is whether this
> > > > blocks
> > > > > a
> > > > > > > basic functionality in MXNet. I believe so, hence my -1 vote.
> > > > > > >
> > > > > > > Hagay
> > > > > > >
> > > > > > > On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha 
> > > wrote:
> > > > > > >
> > > > > > > > Hi Hagay and Sandeep,
> > > > > > > >
> > > > > > > > Could you help us understand why this specific bug is more
> > > > important
> > > > > > than
> > > > > > > > all the other known bugs, that this becomes a release
> blocker?
> > > > > > > >
> > > > > > > > Some facts to consider:
> > > > > > > > - The bug exists since SymbolBlock was introduced a year ago
> > and
> > > > has
> > > > > > > > survived at least three releases, so this is not a
> regression.
> > > > > > > > - Timeline-wise, this bug was reported on 7/21, but was not
> > > > reported
> > > > > as
> > > > > > > > release-blocker in the release discussion thread until 8/31
> > [1].
> > > > > > Neither
> > > > > > > > its reporting as release-blocker nor its fix made it for the
> > 8/3
> > > > code
> > > > > > > > freeze.
> > > > > > > > - The PR is still not ready yet as it doesn't have approval.
> > > > > > > >
> > > > > > > > Hagay, it would be great if you could provide some additional
> > > > > reasoning
> > > > > > > > besides "X mentions the issue" or "fix was done by X".
> Thanks.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <
> > lupe...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Sandeep mentions the issue of an error when user tries to
> > load
> > > > > model
> > > > > > > > params
> > > > > > > > > trained/saved as FP16.
> > > > > > > > > https://github.com/apache/incubator-mxnet/issues/11849
> > > > > > > > > The fix was done by Sandeep:
> > > > > > > > > https://github.com/apache/incubator-mxnet/pull/12412 and
> is
> > > > ready
> > > > > to
> > > > > > > be
> > > > > > > > > cherry picked into the release branch.
> > > > > > > > >
> > > > > > > > > This seems like a release blocker to me:
> > > > > > > > > - Basic functionality broken: loading a model (albeit one
> > that
> > > > that
> > > > > > was
> > > > > > > > > saved as non FP32)
> > > > > > > > > - Reported by 3 users (wgchang@, nicklhy@ and
> ThomasDelteil@
> > )
> > > > > > > > >
> > > > > > > > > -1 (non binding)
> > > > > > > > >
> > > > > > > > > Hagay
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> > > > > > > > > sandeep.krishn...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > "- 0"
> > > > > > > > > >
> > > > > > > > > > I believe the bug #11849
> > > > > > > > > > <https://github.com/apache/incubator-mxnet/issues/11849
> >,
> > > > unable
> > > > > > to
> > > > > > > > > import
> > > > > > > > > > non-fp32 models into Gluon, fixed in this PR #12412
> > > > > > > > > > <https://github.com/apache/incubator-mxnet/pull/12412>
> is
> > > > > > important
> > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > users. I would rather pick this fix in this release than
> > > plan a
> > > > > > minor
> > > > > > > > > > release later.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Sandeep
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <
> > > > > > > chohy...@cs.washington.edu>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Actually, the command "git clone --recursive
> > > > > > > > > > > https://github.com/apache/incubator-mxnet -b
> 1.3.0.rc0"
> > > > works
> > > > > > fine
> > > > > > > > > now,
> > > > > > > > > > > never mind.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <
> > > > > > > > chohy...@cs.washington.edu>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Unfortunately, MXNet was depending on a branch of TVM
> > > that
> > > > is
> > > > > > now
> > > > > > > > > > > deleted.
> > > > > > > > > > > > We will have to merge #12448
> > > > > > > > > > > > <
> https://github.com/apache/incubator-mxnet/pull/12448>
> > > > > before
> > > > > > > the
> > > > > > > > > > > release.
> > > > > > > > > > > >
> > > > > > > > > > > > Background: See dmlc/tvm#1394 <
> > > > > > > > > https://github.com/dmlc/tvm/issues/1394
> > > > > > > > > > >.
> > > > > > > > > > > >
> > > > > > > > > > > > Philip.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <
> > > > > > carinme...@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Checked out the tag, built and tested the Clojure
> > > package.
> > > > > +1
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > > > > > > > > > >> roshaninagmo...@gmail.com>
> > > > > > > > > > > >> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Hi all,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > I would like to propose a vote to release Apache
> > MXNet
> > > > > > > > > (incubating)
> > > > > > > > > > > >> version
> > > > > > > > > > > >> > 1.3.0.RC0. Voting will start now (Friday, Aug
> 31st)
> > > and
> > > > > end
> > > > > > at
> > > > > > > > > 7:00
> > > > > > > > > > PM
> > > > > > > > > > > >> > PDT, Wednesday, Sept 5th.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Link to release notes:
> > > > > > > > > > > >> >
> https://github.com/apache/incubator-mxnet/releases
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Link to release candidate 1.3.0.rc0:
> > > > > > > > > > > >> > *
> > > > > > > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc
> > > > > > > > > > > >> > <
> > > > > > > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > > > > > > > > >0*
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > View this page, click on "Build from Source", and
> > use
> > > > the
> > > > > > > source
> > > > > > > > > > code
> > > > > > > > > > > >> > obtained from 1.3.0.rc0 tag:
> > > > > > > > > > > >> >
> > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Please remember to TEST first before voting
> > > accordingly:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > +1 = approve
> > > > > > > > > > > >> > +0 = no opinion
> > > > > > > > > > > >> > -1 = disapprove (provide reason)
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > >> > Roshani
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Sandeep Krishnamurthy
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sandeep Krishnamurthy
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>


[VOTE] Release MXNet version 1.3.0.RC0

2018-08-31 Thread Roshani Nagmote
Hi all,

I would like to propose a vote to release Apache MXNet (incubating) version
1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM
PDT, Wednesday, Sept 5th.

Link to release notes:
https://github.com/apache/incubator-mxnet/releases

Link to release candidate 1.3.0.rc0:
https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0

View this page, click on "Build from Source", and use the source code
obtained from 1.3.0.rc0 tag:
https://mxnet.incubator.apache.org/install/index.html

Please remember to TEST first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)

Thanks,
Roshani


Re: Release plan - MXNET 1.3

2018-08-31 Thread Roshani Nagmote
Thanks Pedro for bringing up this issue. Will add it in release notes.

@Sandeep, thanks for working on this bug. For now, we are not prioritizing the
fix in the release branch. We can address it in the next releases.

As we have not added Scala maven packages in previous release testing, we
will not be delaying the voting process anymore.
I will send a separate release voting email.

Thanks,
Roshani

On Fri, Aug 31, 2018 at 10:34 AM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Hello Roshani/Sheng/MXNet community,
>
> Thanks for handling 1.3 release.
>
> Here is an issue - https://github.com/apache/incubator-mxnet/issues/11849,
> where you cannot import non-fp32 models into Gluon Symbol block. I think
> this is an important feature for our users who import FP16 or FP64 models
> into Gluon.
> I have created a PR for the fix -
> https://github.com/apache/incubator-mxnet/pull/12412 reviewed by one
> community member (Lin) and I would like to work towards getting this
> merged, please chime in for review.
>
> Since we have not started the vote yet, I suggest we cherry pick this bug
> fix in to the RC.
>
> Let me know your suggestions.
>
> Best,
> Sandeep
>
> On Fri, Aug 31, 2018 at 2:38 AM Pedro Larroy  >
> wrote:
>
> > Hi
> >
> > The armv7 docker builds are broken due to a problem with dockcross. Shall
> > we add this to the release notes?  I'm working on a fix. Best would be to
> > get the fix on the release branch.
> >
> > https://github.com/apache/incubator-mxnet/issues/12421
> >
> > Pedro
> >
> > On Fri, Aug 31, 2018 at 2:38 AM Roshani Nagmote <
> roshaninagmo...@gmail.com
> > >
> > wrote:
> >
> > > Hi all,
> > >
> > > I was waiting to run nightly builds to pass on the release branch. So
> RC0
> > > was a bit delayed.
> > > Now, we have tagged the release candidate! Thanks a lot Sheng for
> helping
> > > with this!
> > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > >
> > > Voting process to test RC0 will start after Scala package gets
> published
> > to
> > > maven repositories.
> > > I had an offline discussion with Naveen and Qing who will be working on
> > > publishing Scala packages. They want to include Scala packages in
> testing
> > > as well.
> > >
> > > I will communicate timelines for voting soon.
> > >
> > > Thanks,
> > > Roshani
> > >
> > >
> > >
> > > On Thu, Aug 23, 2018 at 10:24 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Great news!  Thanks for the efforts Roshani + Sheng.
> > > >
> > > > On Thu, Aug 23, 2018 at 6:58 PM Roshani Nagmote <
> > > roshaninagmo...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Release branch v1.3.x was cut yesterday night. Thanks, @Sheng for
> > > helping
> > > > > with this and merging a bunch of PRs. I will be running tests on
> the
> > > > branch
> > > > > and move forward with the release steps now. :)
> > > > > Thanks,
> > > > > Roshani
> > > > >
> > > > > On Wed, Aug 22, 2018 at 11:12 AM Roshani Nagmote <
> > > > > roshaninagmo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Patric for reviewing the notes. Updated the doc with
> MKL-DNN
> > > > > points
> > > > > > you mentioned accordingly.
> > > > > >
> > > > > > Regards,
> > > > > > Roshani
> > > > > >
> > > > > > On Tue, Aug 21, 2018 at 8:03 PM Zhao, Patric <
> > patric.z...@intel.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Roshani,
> > > > > >>
> > > > > >> Good notes :)
> > > > > >>
> > > > > >> Several items about the performance and MKL-DNN in the below,
> > please
> > > > > help
> > > > > >> take a review.
> > > > > >>
> > > > > >> @Da, Alex, if anything about MKL-DNN is missed, feel free to
> add.
> > > > > >>
> > > > > >> *Performance improvement
> > > > > >> +Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on
> CPU
> > > > > >> htt

Re: Release plan - MXNET 1.3

2018-08-30 Thread Roshani Nagmote
Hi all,

I was waiting for the nightly builds to pass on the release branch, so RC0
was a bit delayed.
Now, we have tagged the release candidate! Thanks a lot Sheng for helping
with this!

https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0

Voting process to test RC0 will start after Scala package gets published to
maven repositories.
I had an offline discussion with Naveen and Qing who will be working on
publishing Scala packages. They want to include Scala packages in testing
as well.

I will communicate timelines for voting soon.

Thanks,
Roshani



On Thu, Aug 23, 2018 at 10:24 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Great news!  Thanks for the efforts Roshani + Sheng.
>
> On Thu, Aug 23, 2018 at 6:58 PM Roshani Nagmote wrote:
>
> > Release branch v1.3.x was cut yesterday night. Thanks, @Sheng for helping
> > with this and merging a bunch of PRs. I will be running tests on the
> branch
> > and move forward with the release steps now. :)
> > Thanks,
> > Roshani
> >
> > On Wed, Aug 22, 2018 at 11:12 AM Roshani Nagmote <
> > roshaninagmo...@gmail.com>
> > wrote:
> >
> > > Thanks Patric for reviewing the notes. Updated the doc with MKL-DNN
> > points
> > > you mentioned accordingly.
> > >
> > > Regards,
> > > Roshani
> > >
> > > On Tue, Aug 21, 2018 at 8:03 PM Zhao, Patric 
> > > wrote:
> > >
> > >> Hi Roshani,
> > >>
> > >> Good notes :)
> > >>
> > >> Several items about the performance and MKL-DNN in the below, please
> > help
> > >> take a review.
> > >>
> > >> @Da, Alex, if anything about MKL-DNN is missed, feel free to add.
> > >>
> > >> *Performance improvement
> > >> +Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU
> > >> https://github.com/apache/incubator-mxnet/pull/3
> > >> +Performance improvement for Batch Dot on CPU from mshadow
> > >> https://github.com/dmlc/mshadow/pull/342
> > >> -Fix the topk regression issue (#12197)
> > >> This is the bugfix rather than performance improvements
> > >>
> > >>
> > >> *MKL-DNN
> > >> More functionality supports:
> > >> +Support more activation functions, "sigmoid", "tanh", "softrelu"
> > >> https://github.com/apache/incubator-mxnet/pull/10336
> > >>
> > >> Debugging functionality:
> > >> +Result check
> > >> https://github.com/apache/incubator-mxnet/pull/12069
> > >> +Backend switch
> > >> https://github.com/apache/incubator-mxnet/pull/12058
> > >>
> > >> Thanks,
> > >>
> > >> --Patric
> > >>
> > >> > -Original Message-
> > >> > From: Roshani Nagmote [mailto:roshaninagmo...@gmail.com]
> > >> > Sent: Wednesday, August 22, 2018 1:53 AM
> > >> > To: dev@mxnet.incubator.apache.org
> > >> > Subject: Re: Release plan - MXNET 1.3
> > >> >
> > >> > Hi,
> > >> >
> > >> > Thank you everyone for helping to clear release blockers. CI tests
> > were
> > >> failing
> > >> > so we delayed RC by some time. But now the tests are passing and we
> > are
> > >> > ready to cut the release branch.
> > >> >
> > >> > I have drafted release notes here:
> > >> >
> > >> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Notes
> > >> >
> > >> >
> > >> > Please take a look and update if I have missed anything. I will be
> > >> cutting
> > >> > RC0 tomorrow.
> > >> >
> > >> > Thanks,
> > >> > Roshani
> > >> >
> > >> > On Thu, Aug 16, 2018 at 2:28 PM Roshani Nagmote
> > >> > 
> > >> > wrote:
> > >> >
> > >> > > Sure will do. thanks.
> > >> > >
> > >> > > -Roshani
> > >> > >
> > >> > > On Thu, Aug 16, 2018 at 11:53 AM Afrooze, Sina <
> sina@gmail.com>
> > >> > wrote:
> > >> > >
> > >> > >> Hi Roshani - Can you please make sure that this fix (which is
> > already
> > >> > >> merged to master) is also merged to the stable branch for 1.3.0:
> > >>

Re: Release plan - MXNET 1.3

2018-08-23 Thread Roshani Nagmote
Release branch v1.3.x was cut yesterday night. Thanks, @Sheng for helping
with this and merging a bunch of PRs. I will be running tests on the branch
and move forward with the release steps now. :)
Thanks,
Roshani

On Wed, Aug 22, 2018 at 11:12 AM Roshani Nagmote 
wrote:

> Thanks Patric for reviewing the notes. Updated the doc with MKL-DNN points
> you mentioned accordingly.
>
> Regards,
> Roshani
>
> On Tue, Aug 21, 2018 at 8:03 PM Zhao, Patric 
> wrote:
>
>> Hi Roshani,
>>
>> Good notes :)
>>
>> Several items about the performance and MKL-DNN in the below, please help
>> take a review.
>>
>> @Da, Alex, if anything about MKL-DNN is missed, feel free to add.
>>
>> *Performance improvement
>> +Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU
>> https://github.com/apache/incubator-mxnet/pull/3
>> +Performance improvement for Batch Dot on CPU from mshadow
>> https://github.com/dmlc/mshadow/pull/342
>> -Fix the topk regression issue (#12197)
>> This is the bugfix rather than performance improvements
>>
>>
>> *MKL-DNN
>> More functionality supports:
>> +Support more activation functions, "sigmoid", "tanh", "softrelu"
>> https://github.com/apache/incubator-mxnet/pull/10336
>>
>> Debugging functionality:
>> +Result check
>> https://github.com/apache/incubator-mxnet/pull/12069
>> +Backend switch
>> https://github.com/apache/incubator-mxnet/pull/12058
>>
>> Thanks,
>>
>> --Patric
>>
>> > -Original Message-
>> > From: Roshani Nagmote [mailto:roshaninagmo...@gmail.com]
>> > Sent: Wednesday, August 22, 2018 1:53 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: Re: Release plan - MXNET 1.3
>> >
>> > Hi,
>> >
>> > Thank you everyone for helping to clear release blockers. CI tests were
>> failing
>> > so we delayed RC by some time. But now the tests are passing and we are
>> > ready to cut the release branch.
>> >
>> > I have drafted release notes here:
>> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Notes
>> >
>> >
>> > Please take a look and update if I have missed anything. I will be
>> cutting
>> > RC0 tomorrow.
>> >
>> > Thanks,
>> > Roshani
>> >
>> > On Thu, Aug 16, 2018 at 2:28 PM Roshani Nagmote
>> > 
>> > wrote:
>> >
>> > > Sure will do. thanks.
>> > >
>> > > -Roshani
>> > >
>> > > On Thu, Aug 16, 2018 at 11:53 AM Afrooze, Sina 
>> > wrote:
>> > >
>> > >> Hi Roshani - Can you please make sure that this fix (which is already
>> > >> merged to master) is also merged to the stable branch for 1.3.0:
>> > >> https://github.com/apache/incubator-mxnet/pull/11493 - Thanks, Sina
>> > >>
>> > >>
>> > >> On 8/16/18, 10:51 AM, "Roshani Nagmote"
>> > 
>> > >> wrote:
>> > >>
>> > >> Hi all,
>> > >>
>> > >> Release status:
>> > >>
>> > >> Currently, for release 1.3.0 there are a couple of issues open
>> > >> which need
>> > >> to be resolved before cutting RC.
>> > >>
>> > >> The current date we are looking at for cutting RC0 is
>> 08/17(Friday).
>> > >>
>> > >>
>> > >>
>> > >> Open issues which need to be looked at before cutting RC:
>> > >>
>> > >>1. Topk regression issue
>> > >><https://github.com/apache/incubator-mxnet/issues/12197> -
>> > >> #12202 PR
>> > >>with fix <
>> https://github.com/apache/incubator-mxnet/pull/12202>
>> > >>2. Excessive memory allocation issue
>> > >><https://github.com/apache/incubator-mxnet/issues/12116> -
>> > >> #12184 PR
>> > >>with fix <
>> https://github.com/apache/incubator-mxnet/pull/12184>
>> > >>3. Test_io.test_csvIter breaks on CentOS
>> > >><https://github.com/apache/incubator-mxnet/issues/12139> -
>> > >> #12189 PR
>> > >>with fix
>> > >> <https://github.com/apache/incubator-mxnet/pull/12189>
>> > >>
>

Re: Release plan - MXNET 1.3

2018-08-22 Thread Roshani Nagmote
Thanks Patric for reviewing the notes. Updated the doc with MKL-DNN points
you mentioned accordingly.

Regards,
Roshani

On Tue, Aug 21, 2018 at 8:03 PM Zhao, Patric  wrote:

> Hi Roshani,
>
> Good notes :)
>
> Several items about the performance and MKL-DNN in the below, please help
> take a review.
>
> @Da, Alex, if anything about MKL-DNN is missed, feel free to add.
>
> *Performance improvement
> +Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU
> https://github.com/apache/incubator-mxnet/pull/3
> +Performance improvement for Batch Dot on CPU from mshadow
> https://github.com/dmlc/mshadow/pull/342
> -Fix the topk regression issue (#12197)
> This is the bugfix rather than performance improvements
>
>
> *MKL-DNN
> More functionality supports:
> +Support more activation functions, "sigmoid", "tanh", "softrelu"
> https://github.com/apache/incubator-mxnet/pull/10336
>
> Debugging functionality:
> +Result check
> https://github.com/apache/incubator-mxnet/pull/12069
> +Backend switch
> https://github.com/apache/incubator-mxnet/pull/12058
>
> Thanks,
>
> --Patric
>
> > -Original Message-
> > From: Roshani Nagmote [mailto:roshaninagmo...@gmail.com]
> > Sent: Wednesday, August 22, 2018 1:53 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Release plan - MXNET 1.3
> >
> > Hi,
> >
> > Thank you everyone for helping to clear release blockers. CI tests were
> failing
> > so we delayed RC by some time. But now the tests are passing and we are
> > ready to cut the release branch.
> >
> > I have drafted release notes here:
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Notes
> >
> >
> > Please take a look and update if I have missed anything. I will be
> cutting
> > RC0 tomorrow.
> >
> > Thanks,
> > Roshani
> >
> > On Thu, Aug 16, 2018 at 2:28 PM Roshani Nagmote
> > 
> > wrote:
> >
> > > Sure will do. thanks.
> > >
> > > -Roshani
> > >
> > > On Thu, Aug 16, 2018 at 11:53 AM Afrooze, Sina 
> > wrote:
> > >
> > >> Hi Roshani - Can you please make sure that this fix (which is already
> > >> merged to master) is also merged to the stable branch for 1.3.0:
> > >> https://github.com/apache/incubator-mxnet/pull/11493 - Thanks, Sina
> > >>
> > >>
> > >> On 8/16/18, 10:51 AM, "Roshani Nagmote"
> > 
> > >> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> Release status:
> > >>
> > >> Currently, for release 1.3.0 there are a couple of issues open
> > >> which need
> > >> to be resolved before cutting RC.
> > >>
> > >> The current date we are looking at for cutting RC0 is
> 08/17(Friday).
> > >>
> > >>
> > >>
> > >> Open issues which need to be looked at before cutting RC:
> > >>
> > >>1. Topk regression issue
> > >><https://github.com/apache/incubator-mxnet/issues/12197> -
> > >> #12202 PR
> > >>with fix <https://github.com/apache/incubator-mxnet/pull/12202
> >
> > >>2. Excessive memory allocation issue
> > >><https://github.com/apache/incubator-mxnet/issues/12116> -
> > >> #12184 PR
> > >>with fix <https://github.com/apache/incubator-mxnet/pull/12184
> >
> > >>3. Test_io.test_csvIter breaks on CentOS
> > >><https://github.com/apache/incubator-mxnet/issues/12139> -
> > >> #12189 PR
> > >>with fix
> > >> <https://github.com/apache/incubator-mxnet/pull/12189>
> > >>
> > >>
> > >>
> > >> @committers, could you please help review these PRs and get them
> > >> merged?
> > >>
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Roshani
> > >>
> > >> On Tue, Aug 14, 2018 at 12:46 PM Roshani Nagmote <
> > >> roshaninagmo...@gmail.com>
> > >> wrote:
> > >>
> > >> > Talked to the person who ran resnet50 benchmarks offline. Build
> > >> flag was
> > >> > not properly set so there was a difference in performance
> > >> numbers observed.
> > >> > There is

Re: Release plan - MXNET 1.3

2018-08-21 Thread Roshani Nagmote
Hi,

Thank you everyone for helping to clear release blockers. CI tests were
failing so we delayed RC by some time. But now the tests are passing and we
are ready to cut the release branch.

I have drafted release notes here:
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Notes


Please take a look and update if I have missed anything. I will be cutting
RC0 tomorrow.

Thanks,
Roshani

On Thu, Aug 16, 2018 at 2:28 PM Roshani Nagmote 
wrote:

> Sure will do. thanks.
>
> -Roshani
>
> On Thu, Aug 16, 2018 at 11:53 AM Afrooze, Sina  wrote:
>
>> Hi Roshani - Can you please make sure that this fix (which is already
>> merged to master) is also merged to the stable branch for 1.3.0:
>> https://github.com/apache/incubator-mxnet/pull/11493 - Thanks, Sina
>>
>>
>> On 8/16/18, 10:51 AM, "Roshani Nagmote" 
>> wrote:
>>
>> Hi all,
>>
>> Release status:
>>
>> Currently, for release 1.3.0 there are a couple of issues open which
>> need
>> to be resolved before cutting RC.
>>
>> The current date we are looking at for cutting RC0 is 08/17(Friday).
>>
>>
>>
>> Open issues which need to be looked at before cutting RC:
>>
>>1. Topk regression issue
>><https://github.com/apache/incubator-mxnet/issues/12197> - #12202
>> PR
>>with fix <https://github.com/apache/incubator-mxnet/pull/12202>
>>2. Excessive memory allocation issue
>><https://github.com/apache/incubator-mxnet/issues/12116> - #12184
>> PR
>>with fix <https://github.com/apache/incubator-mxnet/pull/12184>
>>3. Test_io.test_csvIter breaks on CentOS
>><https://github.com/apache/incubator-mxnet/issues/12139> - #12189
>> PR
>>    with fix  <https://github.com/apache/incubator-mxnet/pull/12189>
>>
>>
>>
>> @committers, could you please help review these PRs and get them
>> merged?
>>
>>
>>
>> Thanks,
>>
>> Roshani
>>
>> On Tue, Aug 14, 2018 at 12:46 PM Roshani Nagmote <
>> roshaninagmo...@gmail.com>
>> wrote:
>>
>> > Talked to the person who ran resnet50 benchmarks offline. Build
>> flag was
>> > not properly set so there was a difference in performance numbers
>> observed.
>> > There is no issue caught and he was able to get the same results as
>> > mentioned here https://mxnet.incubator.apache.org/faq/perf.html
>> > <https://mxnet.incubator.apache.org/faq/perf.html#scoring-results>
>> >
>> > We are good here.
>> >
>> > Thanks,
>> > Roshani
>> >
>> > On Mon, Aug 13, 2018 at 4:08 PM Roshani Nagmote <
>> roshaninagmo...@gmail.com>
>> > wrote:
>> >
>> >> Hi Dom,
>> >>
>> >> I verified resnet50 run on MXNet master branch. Checked on single
>> gpu
>> >> machine. Numbers match. I didn't see any performance degradation.
>> >> https://mxnet.incubator.apache.org/faq/perf.html#scoring-results
>> >>
>> >> Can you please give me more details on the instance type and
>> script you
>> >> ran exactly so that I can try to reproduce it again?
>> >>
>> >> Thanks,
>> >> Roshani
>> >>
>> >>
>> >> On Mon, Aug 13, 2018 at 12:31 PM Roshani Nagmote <
>> >> roshaninagmo...@gmail.com> wrote:
>> >>
>> >>> This is not a major feature. I meant other new feature requests
>> PR won't
>> >>> be accepted in 1.3 release now.
>> >>> Bug fixes will be accepted. I will be trying to reproduce the
>> regression
>> >>> Dom mentioned today. :)
>> >>>
>> >>> Thanks,
>> >>> Roshani
>> >>>
>> >>> On Mon, Aug 13, 2018 at 12:06 PM Naveen Swamy wrote:
>> >>>
>> >>>> Is this a major feature? This is a regression that Dom is
>> reporting
>> >>>> wrt
>> >>>> to performance
>> >>>>
>> >>>> On Mon, Aug 13, 2018 at 11:38 AM, Roshani Nagmote <
>> >>>> roshaninagmo...@gmail.com
>> >>>> > wrote:
>>

Re: Release blocker? - buggy topk Op

2018-08-16 Thread Roshani Nagmote
Thanks Leonard for raising this issue.
@ciyong Thanks for submitting the fix. I will be tracking the mentioned PR
#12202 for the release.

-Roshani

On Thu, Aug 16, 2018 at 6:45 AM Zhao, Patric  wrote:

> Hi Leonard,
>
> Thanks for raising the issue with the topk op.
>
> The root cause is the current API design, which uses a float data type
> to represent the integer index; as we know, a float type cannot
> represent large integers precisely.
> (No offense intended. I know I am missing some background, and I think the
> current design is very good.)
>
> The new CI#12085 changes the computation order and makes this issue look
> more significant. Essentially, the bug will happen when the index is large,
> with or without the new CI.
> One line example code can trigger the issue,
> 'print(mx.nd.topk(mx.nd.array(np.arange(256*300096).reshape(8, -1)), k=4))'.
>
> Thus, the real fix is to change the API and use INT for the index. But
> that might introduce compatibility issues for the current
> framework/topology due to the API change.
> I am not sure we need to make this change in the last minutes of release 1.3
> (actually, we can contribute to it).
>
> For now, we have submitted a fix (#12202) that restores the previous
> computation order and is still much faster :)
>
> Apologies for the confusion and feel free to let us know for any feedback.
>
> Thanks,
>
> --Patric
>
>
> > -Original Message-
> > From: Leonard Lausen [mailto:l-softw...@lausen.nl]
> > Sent: Thursday, August 16, 2018 9:51 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Release blocker? - buggy topk Op
> >
> > Recent changes in mxnet master introduced a bug into the topk operator.
> >  Below code example will output [ 274232. 179574. 274233. 274231.] with
> >  mxnet-cu90==1.3.0b20180810 but [ 274232. 179574. 274232. 274232.] with
> > mxnet-cu90==1.3.0b20180814. Likely #12085 is at fault.
> >
> > See https://github.com/apache/incubator-mxnet/issues/12197 for more
> info.
> >
> > I think this should be considered a release blocker for the 1.3 release.
> >
> > Note this breaks some parts of the KDD 18 MXNet / Gluon tutorial which is
> > scheduled for next Tuesday:
> > http://www.kdd.org/kdd2018/hands-on-tutorials/view/mxnet-with-a-focus-on-nlp
> > (We can work around by asking people to install the 0810 version
> > though.)
>
>
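
Patric's explanation above (a float index cannot represent large integers
exactly) can be illustrated without MXNet. The sketch below is not the topk
implementation; it only round-trips integers through a 32-bit float with
Python's struct module. The helper name float32_roundtrip is made up for
illustration, and it assumes the index is held as a 32-bit float. The element
count 256*300096 is taken from the one-line example in the thread.

```python
import struct


def float32_roundtrip(index):
    """Store an integer in a 32-bit float and read it back (hypothetical helper)."""
    packed = struct.pack("<f", float(index))
    return int(struct.unpack("<f", packed)[0])


# float32 has a 24-bit significand, so 2**24 itself is still exact.
assert float32_roundtrip(2 ** 24) == 2 ** 24

# Just above 2**24, an odd integer no longer survives the round trip.
assert float32_roundtrip(2 ** 24 + 1) != 2 ** 24 + 1

# Flattened element count from the one-line example: 256 * 300096.
total = 256 * 300096
last_four = range(total - 4, total)
collapsed = {float32_roundtrip(i) for i in last_four}

# Fewer distinct values than indices means some indices became indistinguishable.
print(len(collapsed), "distinct float32 values for", len(last_four), "indices")
```

Carrying the index as an integer type would make the round trip exact, which
is in line with the "use INT for the index" direction suggested in the thread.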


Re: Release plan - MXNET 1.3

2018-08-16 Thread Roshani Nagmote
Hi all,

Release status:

Currently, for release 1.3.0 there are a couple of open issues which need
to be resolved before cutting RC.

The current date we are looking at for cutting RC0 is 08/17(Friday).



Open issues which need to be looked at before cutting RC:

   1. Topk regression issue
   <https://github.com/apache/incubator-mxnet/issues/12197> - #12202 PR
   with fix <https://github.com/apache/incubator-mxnet/pull/12202>
   2. Excessive memory allocation issue
   <https://github.com/apache/incubator-mxnet/issues/12116> - #12184 PR
   with fix <https://github.com/apache/incubator-mxnet/pull/12184>
   3. Test_io.test_csvIter breaks on CentOS
   <https://github.com/apache/incubator-mxnet/issues/12139> - #12189 PR
   with fix  <https://github.com/apache/incubator-mxnet/pull/12189>



@committers, could you please help review these PRs and get them merged?



Thanks,

Roshani

On Tue, Aug 14, 2018 at 12:46 PM Roshani Nagmote 
wrote:

> Talked to the person who ran resnet50 benchmarks offline. Build flag was
> not properly set so there was a difference in performance numbers observed.
> There is no issue caught and he was able to get the same results as
> mentioned here https://mxnet.incubator.apache.org/faq/perf.html
> <https://mxnet.incubator.apache.org/faq/perf.html#scoring-results>
>
> We are good here.
>
> Thanks,
> Roshani
>
> On Mon, Aug 13, 2018 at 4:08 PM Roshani Nagmote 
> wrote:
>
>> Hi Dom,
>>
>> I verified resnet50 run on MXNet master branch. Checked on single gpu
>> machine. Numbers match. I didn't see any performance degradation.
>> https://mxnet.incubator.apache.org/faq/perf.html#scoring-results
>>
>> Can you please give me more details on the instance type and script you
>> ran exactly so that I can try to reproduce it again?
>>
>> Thanks,
>> Roshani
>>
>>
>> On Mon, Aug 13, 2018 at 12:31 PM Roshani Nagmote <
>> roshaninagmo...@gmail.com> wrote:
>>
>>> This is not a major feature. I meant other new feature requests PR won't
>>> be accepted in 1.3 release now.
>>> Bug fixes will be accepted. I will be trying to reproduce the regression
>>> Dom mentioned today. :)
>>>
>>> Thanks,
>>> Roshani
>>>
>>> On Mon, Aug 13, 2018 at 12:06 PM Naveen Swamy 
>>> wrote:
>>>
>>>> Is this a major feature? This is a regression that Dom is reporting
>>>> wrt
>>>> to performance
>>>>
>>>> On Mon, Aug 13, 2018 at 11:38 AM, Roshani Nagmote <
>>>> roshaninagmo...@gmail.com
>>>> > wrote:
>>>>
>>>> > Thanks for reporting this issue Dom.
>>>> > 08/10 (Friday) was the major feature freeze date. We won't be
>>>> accepting any
>>>> > new features now for MXNet 1.3 release.
>>>> > RC0 will be cut on 08/17(Friday).
>>>> >
>>>> > Will be verifying the performance degradation issue mentioned.
>>>> >
>>>> > Thanks,
>>>> > Roshani
>>>> >
>>>> > On Mon, Aug 13, 2018 at 8:45 AM Divakaruni, Dominic
>>>> >  wrote:
>>>> >
>>>> > > Hi all, We tested resnet50 on MXNet built from master branch on
>>>> Friday
>>>> > and
>>>> > > were seeing degraded performance on GPU - about 50% slower compared
>>>> to
>>>> > > these values here https://mxnet.incubator.apache.org/faq/perf.html.
>>>> FWIW
>>>> > > this slowdown was seen for both MXNet as well as the TRT integrated
>>>> > MXNet.
>>>> > >
>>>> > > Something for you all to verify before or after you cut the RC.
>>>> > >
>>>> > > Thx!
>>>> > >
>>>> > > On 8/13/18, 4:34 AM, "kellen sunderland" <
>>>> kellen.sunderl...@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > > Hey Roshani,
>>>> > >
>>>> > > Has a RC branch already been cut?  If so, a quick heads up that
>>>> I
>>>> > think
>>>> > > this commit should probably get into RC0 for 1.3.
>>>> > >
>>>> > > https://github.com/apache/incubator-mxnet/commit/ee8755a2531b322fec29c9c3d2aa3b8738da41f3
>>>> > >
>>>> > > It won't cause issues for users, but from a versioning
>>>> compatibility
>>>> > > persp

Re: Release plan - MXNET 1.3

2018-08-14 Thread Roshani Nagmote
Talked to the person who ran resnet50 benchmarks offline. Build flag was
not properly set so there was a difference in performance numbers observed.
There is no issue caught and he was able to get the same results as
mentioned here https://mxnet.incubator.apache.org/faq/perf.html
<https://mxnet.incubator.apache.org/faq/perf.html#scoring-results>

We are good here.

Thanks,
Roshani

On Mon, Aug 13, 2018 at 4:08 PM Roshani Nagmote 
wrote:

> Hi Dom,
>
> I verified resnet50 run on MXNet master branch. Checked on single gpu
> machine. Numbers match. I didn't see any performance degradation.
> https://mxnet.incubator.apache.org/faq/perf.html#scoring-results
>
> Can you please give me more details on the instance type and script you
> ran exactly so that I can try to reproduce it again?
>
> Thanks,
> Roshani
>
>
> On Mon, Aug 13, 2018 at 12:31 PM Roshani Nagmote <
> roshaninagmo...@gmail.com> wrote:
>
>> This is not a major feature. I meant other new feature requests PR won't
>> be accepted in 1.3 release now.
>> Bug fixes will be accepted. I will be trying to reproduce the regression
>> Dom mentioned today. :)
>>
>> Thanks,
>> Roshani
>>
>> On Mon, Aug 13, 2018 at 12:06 PM Naveen Swamy  wrote:
>>
>>> Is this a major feature? This is a regression that Dom is reporting
>>> wrt
>>> to performance
>>>
>>> On Mon, Aug 13, 2018 at 11:38 AM, Roshani Nagmote <
>>> roshaninagmo...@gmail.com
>>> > wrote:
>>>
>>> > Thanks for reporting this issue Dom.
>>> > 08/10 (Friday) was the major feature freeze date. We won't be
>>> accepting any
>>> > new features now for MXNet 1.3 release.
>>> > RC0 will be cut on 08/17(Friday).
>>> >
>>> > Will be verifying the performance degradation issue mentioned.
>>> >
>>> > Thanks,
>>> > Roshani
>>> >
>>> > On Mon, Aug 13, 2018 at 8:45 AM Divakaruni, Dominic
>>> >  wrote:
>>> >
>>> > > Hi all, We tested resnet50 on MXNet built from master branch on
>>> Friday
>>> > and
>>> > > were seeing degraded performance on GPU - about 50% slower compared
>>> to
>>> > > these values here https://mxnet.incubator.apache.org/faq/perf.html.
>>> FWIW
>>> > > this slowdown was seen for both MXNet as well as the TRT integrated
>>> > MXNet.
>>> > >
>>> > > Something for you all to verify before or after you cut the RC.
>>> > >
>>> > > Thx!
>>> > >
>>> > > On 8/13/18, 4:34 AM, "kellen sunderland" <
>>> kellen.sunderl...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > Hey Roshani,
>>> > >
>>> > > Has a RC branch already been cut?  If so, a quick heads up that I
>>> > think
>>> > > this commit should probably get into RC0 for 1.3.
>>> > >
>>> > > https://github.com/apache/incubator-mxnet/commit/ee8755a2531b322fec29c9c3d2aa3b8738da41f3
>>> > >
>>> > > It won't cause issues for users, but from a versioning
>>> compatibility
>>> > > perspective it's probably better that we remove these functions
>>> in
>>> > this
>>> > > release. This way we don't have to worry about major bumps in the
>>> > next
>>> > > release if they're removed.
>>> > >
>>> > > -Kellen
>>> > >
>>> > >
>>> > > On Fri, Aug 10, 2018 at 7:24 PM Roshani Nagmote <
>>> > > roshaninagmo...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Thanks Kellen and everyone else for working to get TensorRT PR
>>> > > merged!
>>> > > > @Sina, I will be keeping track of that issue and fixes to get
>>> in
>>> > the
>>> > > > release.
>>> > > >
>>> > > > We are starting code freeze for 1.3 release today. A release
>>> > > candidate will
>>> > > > be cut on 08/17.
>>> > > > Feel free to add any other comments/suggestions.
>>> > > >
>>> > > > Thanks,
>>> > > > Roshani
>>> > > >
>>> > > > On Fri, Aug 10, 2018 at 5:39 AM kellen sunderland <
>>> > 

Re: Release plan - MXNET 1.3

2018-08-13 Thread Roshani Nagmote
Hi Dom,

I verified resnet50 run on MXNet master branch. Checked on single gpu
machine. Numbers match. I didn't see any performance degradation.
https://mxnet.incubator.apache.org/faq/perf.html#scoring-results

Can you please give me more details on the instance type and script you ran
exactly so that I can try to reproduce it again?

Thanks,
Roshani


On Mon, Aug 13, 2018 at 12:31 PM Roshani Nagmote 
wrote:

> This is not a major feature. I meant other new feature requests PR won't
> be accepted in 1.3 release now.
> Bug fixes will be accepted. I will be trying to reproduce the regression
> Dom mentioned today. :)
>
> Thanks,
> Roshani
>
> On Mon, Aug 13, 2018 at 12:06 PM Naveen Swamy  wrote:
>
>> Is this a major feature? This is a regression that Dom is reporting wrt
>> to performance
>>
>> On Mon, Aug 13, 2018 at 11:38 AM, Roshani Nagmote <
>> roshaninagmo...@gmail.com
>> > wrote:
>>
>> > Thanks for reporting this issue Dom.
>> > 08/10 (Friday) was the major feature freeze date. We won't be accepting
>> any
>> > new features now for MXNet 1.3 release.
>> > RC0 will be cut on 08/17(Friday).
>> >
>> > Will be verifying the performance degradation issue mentioned.
>> >
>> > Thanks,
>> > Roshani
>> >
>> > On Mon, Aug 13, 2018 at 8:45 AM Divakaruni, Dominic
>> >  wrote:
>> >
>> > > Hi all, We tested resnet50 on MXNet built from master branch on Friday
>> > and
>> > > were seeing degraded performance on GPU - about 50% slower compared to
>> > > these values here https://mxnet.incubator.apache.org/faq/perf.html.
>> FWIW
>> > > this slowdown was seen for both MXNet as well as the TRT integrated
>> > MXNet.
>> > >
>> > > Something for you all to verify before or after you cut the RC.
>> > >
>> > > Thx!
>> > >
>> > > On 8/13/18, 4:34 AM, "kellen sunderland" <
>> kellen.sunderl...@gmail.com>
>> > > wrote:
>> > >
>> > > Hey Roshani,
>> > >
>> > > Has a RC branch already been cut?  If so, a quick heads up that I
>> > think
>> > > this commit should probably get into RC0 for 1.3.
>> > >
>> > > https://github.com/apache/incubator-mxnet/commit/ee8755a2531b322fec29c9c3d2aa3b8738da41f3
>> > >
>> > > It won't cause issues for users, but from a versioning
>> compatibility
>> > > perspective it's probably better that we remove these functions in
>> > this
>> > > release. This way we don't have to worry about major bumps in the
>> > next
>> > > release if they're removed.
>> > >
>> > > -Kellen
>> > >
>> > >
>> > > On Fri, Aug 10, 2018 at 7:24 PM Roshani Nagmote <
>> > > roshaninagmo...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Kellen and everyone else for working to get TensorRT PR
>> > > merged!
>> > > > @Sina, I will be keeping track of that issue and fixes to get in
>> > the
>> > > > release.
>> > > >
>> > > > We are starting code freeze for 1.3 release today. A release
>> > > candidate will
>> > > > be cut on 08/17.
>> > > > Feel free to add any other comments/suggestions.
>> > > >
>> > > > Thanks,
>> > > > Roshani
>> > > >
>> > > > On Fri, Aug 10, 2018 at 5:39 AM kellen sunderland <
>> > > > kellen.sunderl...@gmail.com> wrote:
>> > > >
>> > > > > All merged and ready to go from my side Roshani (the TensorRT
>> > PR).
>> > > > >
>> > > > > I agree with Sina that issue 12116 looks like it's a blocker.  I'll
>> > try
>> > > and
>> > > > > reproduce it locally to get another datapoint.
>> > > > >
>> > > > > On Fri, Aug 10, 2018 at 3:15 AM Afrooze, Sina <
>> > sina@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > Hi Roshani - I think this regression issue is a release
>> > blocker:
>> > > > > > https://github.com/apache/incubator-mxnet/issues/12116  -
>> Sina
>> > > > > >
>> > > > > >

Re: Release plan - MXNET 1.3

2018-08-13 Thread Roshani Nagmote
This is not a major feature. I meant that PRs for other new feature requests
won't be accepted in the 1.3 release now.
Bug fixes will be accepted. I will be trying to reproduce the regression
Dom mentioned today. :)

Thanks,
Roshani

On Mon, Aug 13, 2018 at 12:06 PM Naveen Swamy  wrote:

> Is this a major feature? This is a regression that Dom is reporting wrt
> to performance
>
> On Mon, Aug 13, 2018 at 11:38 AM, Roshani Nagmote <
> roshaninagmo...@gmail.com
> > wrote:
>
> > Thanks for reporting this issue Dom.
> > 08/10 (Friday) was the major feature freeze date. We won't be accepting
> any
> > new features now for MXNet 1.3 release.
> > RC0 will be cut on 08/17(Friday).
> >
> > Will be verifying the performance degradation issue mentioned.
> >
> > Thanks,
> > Roshani
> >
> > On Mon, Aug 13, 2018 at 8:45 AM Divakaruni, Dominic
> >  wrote:
> >
> > > Hi all, We tested resnet50 on MXNet built from master branch on Friday
> > and
> > > were seeing degraded performance on GPU - about 50% slower compared to
> > > these values here https://mxnet.incubator.apache.org/faq/perf.html.
> FWIW
> > > this slowdown was seen for both MXNet as well as the TRT integrated
> > MXNet.
> > >
> > > Something for you all to verify before or after you cut the RC.
> > >
> > > Thx!
> > >
> > > On 8/13/18, 4:34 AM, "kellen sunderland" wrote:
> > >
> > > Hey Roshani,
> > >
> > > Has a RC branch already been cut?  If so, a quick heads up that I
> > think
> > > this commit should probably get into RC0 for 1.3.
> > >
> > > https://github.com/apache/incubator-mxnet/commit/ee8755a2531b322fec29c9c3d2aa3b8738da41f3
> > >
> > > It won't cause issues for users, but from a versioning
> compatibility
> > > perspective it's probably better that we remove these functions in
> > this
> > > release. This way we don't have to worry about major bumps in the
> > next
> > > release if they're removed.
> > >
> > > -Kellen
> > >
> > >
> > > On Fri, Aug 10, 2018 at 7:24 PM Roshani Nagmote <
> > > roshaninagmo...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Kellen and everyone else for working to get TensorRT PR
> > > merged!
> > > > @Sina, I will be keeping track of that issue and fixes to get in
> > the
> > > > release.
> > > >
> > > > We are starting code freeze for 1.3 release today. A release
> > > candidate will
> > > > be cut on 08/17.
> > > > Feel free to add any other comments/suggestions.
> > > >
> > > > Thanks,
> > > > Roshani
> > > >
> > > > On Fri, Aug 10, 2018 at 5:39 AM kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > > >
> > > > > All merged and ready to go from my side Roshani (the TensorRT
> > PR).
> > > > >
> > > > > > I agree with Sina that issue 12116 looks like it's a blocker.  I'll
> > try
> > > and
> > > > > reproduce it locally to get another datapoint.
> > > > >
> > > > > On Fri, Aug 10, 2018 at 3:15 AM Afrooze, Sina <
> > sina@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Roshani - I think this regression issue is a release
> > blocker:
> > > > > > https://github.com/apache/incubator-mxnet/issues/12116  -
> Sina
> > > > > >
> > > > > >
> > > > > > On 8/8/18, 12:40 PM, "Roshani Nagmote" <
> > > roshaninagmo...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Thanks, Kellen for letting me know.
> > > > > >
> > > > > > On Wed, Aug 8, 2018 at 12:09 PM kellen sunderland <
> > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > >
> > > > > > > Hey Roshani, I think it should be ready by Friday.
> > > > > > >
> > > > > > > On Tue, Aug 7, 2018, 10:20 PM Roshani Nagmote <
> > > > > > roshaninagmo...@gmail.com>
> > > > > > > wrote:
> > > 

Re: Release plan - MXNET 1.3

2018-08-13 Thread Roshani Nagmote
Thanks for reporting this issue Dom.
08/10 (Friday) was the major feature freeze date. We won't be accepting any
new features now for the MXNet 1.3 release.
RC0 will be cut on 08/17 (Friday).

Will be verifying the performance degradation issue mentioned.

Thanks,
Roshani

On Mon, Aug 13, 2018 at 8:45 AM Divakaruni, Dominic
 wrote:

> Hi all, We tested resnet50 on MXNet built from master branch on Friday and
> were seeing degraded performance on GPU - about 50% slower compared to
> these values here https://mxnet.incubator.apache.org/faq/perf.html. FWIW
> this slowdown was seen for both MXNet as well as the TRT integrated MXNet.
>
> Something for you all to verify before or after you cut the RC.
>
> Thx!
>
> On 8/13/18, 4:34 AM, "kellen sunderland" 
> wrote:
>
> Hey Roshani,
>
> Has a RC branch already been cut?  If so, a quick heads up that I think
> this commit should probably get into RC0 for 1.3.
>
> https://github.com/apache/incubator-mxnet/commit/ee8755a2531b322fec29c9c3d2aa3b8738da41f3
>
> It won't cause issues for users, but from a versioning compatibility
> perspective it's probably better that we remove these functions in this
> release. This way we don't have to worry about major bumps in the next
> release if they're removed.
>
> -Kellen
>
>
> On Fri, Aug 10, 2018 at 7:24 PM Roshani Nagmote <
> roshaninagmo...@gmail.com>
> wrote:
>
> > Thanks Kellen and everyone else for working to get TensorRT PR
> merged!
> > @Sina, I will be keeping track of that issue and fixes to get in the
> > release.
> >
> > We are starting code freeze for 1.3 release today. A release
> candidate will
> > be cut on 08/17.
> > Feel free to add any other comments/suggestions.
> >
> > Thanks,
> > Roshani
> >
> > On Fri, Aug 10, 2018 at 5:39 AM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > All merged and ready to go from my side Roshani (the TensorRT PR).
> > >
> > > I agree with Sina that issue 12116 looks like it's a blocker.  I'll try
> and
> > > reproduce it locally to get another datapoint.
> > >
> > > On Fri, Aug 10, 2018 at 3:15 AM Afrooze, Sina 
> > wrote:
> > >
> > > > Hi Roshani - I think this regression issue is a release blocker:
> > > > https://github.com/apache/incubator-mxnet/issues/12116  - Sina
> > > >
> > > >
> > > > On 8/8/18, 12:40 PM, "Roshani Nagmote" <
> roshaninagmo...@gmail.com>
> > > wrote:
> > > >
> > > > Thanks, Kellen for letting me know.
> > > >
> > > > On Wed, Aug 8, 2018 at 12:09 PM kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > Hey Roshani, I think it should be ready by Friday.
> > > > >
> > > > > On Tue, Aug 7, 2018, 10:20 PM Roshani Nagmote <
> > > > roshaninagmo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Kellen. Yes, we were treating this PR as a release
> > > blocker.
> > > > Do you
> > > > > > have any ETA by which it will be completed? Approximate
> time
> > will
> > > > also
> > > > > > work.
> > > > > > @zhi, Thanks for bringing this PR into notice. I will
> keep a
> > > track
> > > > of it.
> > > > > >
> > > > > > -Roshani
> > > > > >
> > > > > > On Tue, Aug 7, 2018 at 11:30 AM Joshua Z. Zhang <
> > > > cheungc...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I strongly suggest to track this PR
> > > > > > > https://github.com/apache/incubator-mxnet/pull/11908 <
> > > > > > > https://github.com/apache/incubator-mxnet/pull/11908>
> in 1.3
> > > > release
> > > > > > > which fixed the usability issue for lower end machines
> that
> > > > don’t have
> > > > > as
> > > > > > > large shared memory space as ec2 instances.

Re: Release plan - MXNET 1.3

2018-08-10 Thread Roshani Nagmote
Thanks Kellen and everyone else for working to get TensorRT PR merged!
@Sina, I will be keeping track of that issue and fixes to get in the
release.

We are starting code freeze for 1.3 release today. A release candidate will
be cut on 08/17.
Feel free to add any other comments/suggestions.

Thanks,
Roshani

On Fri, Aug 10, 2018 at 5:39 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> All merged and ready to go from my side Roshani (the TensorRT PR).
>
> I agree with Sina that issue 12116 looks like it's a blocker.  I'll try and
> reproduce it locally to get another datapoint.
>
> On Fri, Aug 10, 2018 at 3:15 AM Afrooze, Sina  wrote:
>
> > Hi Roshani - I think this regression issue is a release blocker:
> > https://github.com/apache/incubator-mxnet/issues/12116  - Sina
> >
> >
> > On 8/8/18, 12:40 PM, "Roshani Nagmote" 
> wrote:
> >
> > Thanks, Kellen for letting me know.
> >
> > On Wed, Aug 8, 2018 at 12:09 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Hey Roshani, I think it should be ready by Friday.
> > >
> > > On Tue, Aug 7, 2018, 10:20 PM Roshani Nagmote <
> > roshaninagmo...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Kellen. Yes, we were treating this PR as a release
> blocker.
> > Do you
> > > > have any ETA by which it will be completed? Approximate time will
> > also
> > > > work.
> > > > @zhi, Thanks for bringing this PR into notice. I will keep a
> track
> > of it.
> > > >
> > > > -Roshani
> > > >
> > > > On Tue, Aug 7, 2018 at 11:30 AM Joshua Z. Zhang <
> > cheungc...@gmail.com>
> > > > wrote:
> > > >
> > > > > I strongly suggest to track this PR
> > > > > https://github.com/apache/incubator-mxnet/pull/11908 <
> > > > > https://github.com/apache/incubator-mxnet/pull/11908> in 1.3
> > release
> > > > > which fixed the usability issue for lower end machines that
> > don’t have
> > > as
> > > > > large shared memory space as ec2 instances.
> > > > >
> > > > > Best,
> > > > >
> > > > > - Zhi
> > > > >
> > > > > > On Aug 7, 2018, at 9:05 AM, Roshani Nagmote <
> > > roshaninagmo...@gmail.com
> > > > >
> > > > > wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Right now, we are delaying MXNet 1.3 release for pending
> > TensorRT PR
> > > (
> > > > > > https://github.com/apache/incubator-mxnet/pull/11325 ).
> > > > > >
> > > > > > I wanted to ask everyone for their opinions if we should
> delay
> > the
> > > > > release
> > > > > > to get tensorRT integration in or we should go ahead with the
> > release
> > > > and
> > > > > > include tensorRT in next release. Please provide suggestions.
> > > > > >
> > > > > > Thanks,
> > > > > > Roshani
> > > > > >
> > > > > > On Mon, Aug 6, 2018 at 12:45 AM Hagay Lupesko <
> > lupe...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > >> Some thoughts: why not keep it out of 1.3, and merge it into
> > master
> > > so
> > > > > it
> > > > > >> can go out with 1.4 instead?
> > > > > >> Pros:
> > > > > >> - Reduce quality risks for 1.3
> > > > > >> - More time to test and get feedback before release
> > > > > >> - Avoid further delays in 1.3 release (lots of good stuff
> > there
> > > > already
> > > > > for
> > > > > >> users)
> > > > > >> Cons:
> > > > > >> - People will need to get master to experiment with TRT (not
> > a major
> > > > > issue
> > > > > >> IMO)
> > > > > >>
> > > > > >> Besides, TRT requires a build flag anyway, so MXNet users
> > consuming
> > > > > built
> > >

Re: Release plan - MXNET 1.3

2018-08-08 Thread Roshani Nagmote
Thanks, Kellen for letting me know.

On Wed, Aug 8, 2018 at 12:09 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Hey Roshani, I think it should be ready by Friday.
>
> On Tue, Aug 7, 2018, 10:20 PM Roshani Nagmote 
> wrote:
>
> > Thanks Kellen. Yes, we were treating this PR as a release blocker. Do you
> > have any ETA by which it will be completed? Approximate time will also
> > work.
> > @zhi, thanks for bringing this PR to our attention. I will keep track of it.
> >
> > -Roshani
> >
> > On Tue, Aug 7, 2018 at 11:30 AM Joshua Z. Zhang 
> > wrote:
> >
> > > I strongly suggest to track this PR
> > > https://github.com/apache/incubator-mxnet/pull/11908 <
> > > https://github.com/apache/incubator-mxnet/pull/11908> in 1.3 release
> > > which fixed the usability issue for lower end machines that don’t have
> as
> > > large shared memory space as ec2 instances.
> > >
> > > Best,
> > >
> > > - Zhi
> > >
> > > > On Aug 7, 2018, at 9:05 AM, Roshani Nagmote <
> roshaninagmo...@gmail.com
> > >
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Right now, we are delaying MXNet 1.3 release for pending TensorRT PR
> (
> > > > https://github.com/apache/incubator-mxnet/pull/11325 ).
> > > >
> > > > I wanted to ask everyone for their opinions if we should delay the
> > > release
> > > > to get tensorRT integration in or we should go ahead with the release
> > and
> > > > include tensorRT in next release. Please provide suggestions.
> > > >
> > > > Thanks,
> > > > Roshani
> > > >
> > > > On Mon, Aug 6, 2018 at 12:45 AM Hagay Lupesko 
> > wrote:
> > > >
> > > >> Some thoughts: why not keep it out of 1.3, and merge it into master
> so
> > > it
> > > >> can go out with 1.4 instead?
> > > >> Pros:
> > > >> - Reduce quality risks for 1.3
> > > >> - More time to test and get feedback before release
> > > >> - Avoid further delays in 1.3 release (lots of good stuff there
> > already
> > > for
> > > >> users)
> > > >> Cons:
> > > >> - People will need to get master to experiment with TRT (not a major
> > > issue
> > > >> IMO)
> > > >>
> > > >> Besides, TRT requires a build flag anyway, so MXNet users consuming
> > > built
> > > >> packages (PyPi, Scala) will anyway not be able to try it out unless
> > > >> building from source...
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> On Sun, Aug 5, 2018 at 10:38 PM Steffen Rochel <
> > steffenroc...@gmail.com
> > > >
> > > >> wrote:
> > > >>
> > > >>> Marek, Kellen, Jun, Da, Eric, myself and a few other people
> discussed
> > > >>> offline about TensorRT integration PR (
> > > >>> https://github.com/apache/incubator-mxnet/pull/11325 ). We do
> agree
> > > that
> > > >>> it
> > > >>> would be good to include the PR into upcoming 1.3 release, but are
> > all
> > > >>> concerned about the risk involved and the breaking API change. The
> > > >>> discussion converged to following proposal. (1) change to contrib
> PR
> > > and
> > > >>> (2) define a different top level API to indicate that the package
> is
> > > part
> > > >>> of contrib and experimental (details of API TBD between Marek,
> Kellen
> > > and
> > > >>> Eric). This change would allow to include TRT integration with v1.3
> > to
> > > >>> enable users to try TRT with MXNet, minimize the risk and avoid
> > > breaking
> > > >>> API change.
> > > >>> To accommodate the change the request is to delay RC for a few
> days.
> > > >>>
> > > >>> Regards,
> > > >>> Steffen
> > > >>>
> > > >>> On Tue, Jul 31, 2018 at 5:08 PM Roshani Nagmote <
> > > >> roshaninagmo...@gmail.com
> > > >>>>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi,
> > > >>>>
> > > >>>> I have created a wiki for tracking MXNet 1.3 release with

Re: Release plan - MXNET 1.3

2018-08-07 Thread Roshani Nagmote
Thanks Kellen. Yes, we were treating this PR as a release blocker. Do you
have an ETA for when it will be completed? An approximate time will also work.
@zhi, thanks for bringing this PR to our attention. I will keep track of it.

-Roshani

On Tue, Aug 7, 2018 at 11:30 AM Joshua Z. Zhang 
wrote:

> I strongly suggest to track this PR
> https://github.com/apache/incubator-mxnet/pull/11908 <
> https://github.com/apache/incubator-mxnet/pull/11908> in 1.3 release
> which fixed the usability issue for lower end machines that don’t have as
> large shared memory space as ec2 instances.
>
> Best,
>
> - Zhi
>
> > On Aug 7, 2018, at 9:05 AM, Roshani Nagmote 
> wrote:
> >
> > Hi all,
> >
> > Right now, we are delaying MXNet 1.3 release for pending TensorRT PR (
> > https://github.com/apache/incubator-mxnet/pull/11325 ).
> >
> > I wanted to ask everyone for their opinions if we should delay the
> release
> > to get tensorRT integration in or we should go ahead with the release and
> > include tensorRT in next release. Please provide suggestions.
> >
> > Thanks,
> > Roshani
> >
> > On Mon, Aug 6, 2018 at 12:45 AM Hagay Lupesko  wrote:
> >
> >> Some thoughts: why not keep it out of 1.3, and merge it into master so
> it
> >> can go out with 1.4 instead?
> >> Pros:
> >> - Reduce quality risks for 1.3
> >> - More time to test and get feedback before release
> >> - Avoid further delays in 1.3 release (lots of good stuff there already
> for
> >> users)
> >> Cons:
> >> - People will need to get master to experiment with TRT (not a major
> issue
> >> IMO)
> >>
> >> Besides, TRT requires a build flag anyway, so MXNet users consuming
> built
> >> packages (PyPi, Scala) will anyway not be able to try it out unless
> >> building from source...
> >>
> >> Thoughts?
> >>
> >> On Sun, Aug 5, 2018 at 10:38 PM Steffen Rochel  >
> >> wrote:
> >>
> >>> Marek, Kellen, Jun, Da, Eric, myself and a few other people discussed
> >>> offline about TensorRT integration PR (
> >>> https://github.com/apache/incubator-mxnet/pull/11325 ). We do agree
> that
> >>> it
> >>> would be good to include the PR into upcoming 1.3 release, but are all
> >>> concerned about the risk involved and the breaking API change. The
> >>> discussion converged to following proposal. (1) change to contrib PR
> and
> >>> (2) define a different top level API to indicate that the package is
> part
> >>> of contrib and experimental (details of API TBD between Marek, Kellen
> and
> >>> Eric). This change would allow to include TRT integration with v1.3 to
> >>> enable users to try TRT with MXNet, minimize the risk and avoid
> breaking
> >>> API change.
> >>> To accommodate the change the request is to delay RC for a few days.
> >>>
> >>> Regards,
> >>> Steffen
> >>>
> >>> On Tue, Jul 31, 2018 at 5:08 PM Roshani Nagmote <
> >> roshaninagmo...@gmail.com
> >>>>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have created a wiki for tracking MXNet 1.3 release with the
> timeline.
> >>>> Please take a look here:
> >>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Status
> >>>>
> >>>> I am still waiting for following 2 PRs to get merged:
> >>>> TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
> >>>> Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
> >>>>
> >>>> *Code freeze date is 08/02(Thursday).* Kindly try to complete ongoing
> >>> work
> >>>> and get these PRs merged.
> >>>>
> >>>> Thanks,
> >>>> Roshani
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Jul 30, 2018 at 1:02 PM Roshani Nagmote <
> >>> roshaninagmo...@gmail.com
> >>>>>
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Here is an update on MXNet 1.3 release:
> >>>>> I am still waiting for following PRs to get merged:
> >>>>>
> >>>>> TRT integration:
> >> https://github.com/apache/incubator-mxnet/pull/11325

Re: Release plan - MXNET 1.3

2018-08-07 Thread Roshani Nagmote
Hi all,

Right now, we are delaying the MXNet 1.3 release for the pending TensorRT PR (
https://github.com/apache/incubator-mxnet/pull/11325 ).

I wanted to ask everyone for their opinion on whether we should delay the
release to get the TensorRT integration in, or go ahead with the release and
include TensorRT in the next release. Please provide suggestions.

Thanks,
Roshani

On Mon, Aug 6, 2018 at 12:45 AM Hagay Lupesko  wrote:

> Some thoughts: why not keep it out of 1.3, and merge it into master so it
> can go out with 1.4 instead?
> Pros:
> - Reduce quality risks for 1.3
> - More time to test and get feedback before release
> - Avoid further delays in 1.3 release (lots of good stuff there already for
> users)
> Cons:
> - People will need to get master to experiment with TRT (not a major issue
> IMO)
>
> Besides, TRT requires a build flag anyway, so MXNet users consuming built
> packages (PyPi, Scala) will anyway not be able to try it out unless
> building from source...
>
> Thoughts?
>
> On Sun, Aug 5, 2018 at 10:38 PM Steffen Rochel 
> wrote:
>
> > Marek, Kellen, Jun, Da, Eric, myself and a few other people discussed
> > offline about TensorRT integration PR (
> > https://github.com/apache/incubator-mxnet/pull/11325 ). We do agree that
> > it
> > would be good to include the PR into upcoming 1.3 release, but are all
> > concerned about the risk involved and the breaking API change. The
> > discussion converged to following proposal. (1) change to contrib PR and
> > (2) define a different top level API to indicate that the package is part
> > of contrib and experimental (details of API TBD between Marek, Kellen and
> > Eric). This change would allow to include TRT integration with v1.3 to
> > enable users to try TRT with MXNet, minimize the risk and avoid breaking
> > API change.
> > To accommodate the change the request is to delay RC for a few days.
> >
> > Regards,
> > Steffen
> >
> > On Tue, Jul 31, 2018 at 5:08 PM Roshani Nagmote <
> roshaninagmo...@gmail.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > I have created a wiki for tracking MXNet 1.3 release with the timeline.
> > > Please take a look here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Status
> > >
> > > I am still waiting for following 2 PRs to get merged:
> > > TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
> > > Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
> > >
> > > *Code freeze date is 08/02(Thursday).* Kindly try to complete ongoing
> > work
> > > and get these PRs merged.
> > >
> > > Thanks,
> > > Roshani
> > >
> > >
> > >
> > > On Mon, Jul 30, 2018 at 1:02 PM Roshani Nagmote <
> > roshaninagmo...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Here is an update on MXNet 1.3 release:
> > > > I am still waiting for following PRs to get merged:
> > > >
> > > > TRT integration:
> https://github.com/apache/incubator-mxnet/pull/11325
> > > > Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
> > > > Scala examples:
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/11753
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/11621
> > > >
> > > > *New code freeze date is: 08/03*  Please try to get your ongoing PRs
> > > > merged by then.
> > > >
> > > > @Pedro, I didn't include your PRs in tracking list as you said those
> > are
> > > > not critical for now. Please let me know if those needs to be
> included.
> > > > https://github.com/apache/incubator-mxnet/pull/11636
> > > > https://github.com/apache/incubator-mxnet/pull/11562
> > > >
> > > > I also have updated project proposal cwiki page to update the status
> of
> > > > PRs.
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> > > >
> > > >
> > > > Please let me know if I am missing something.
> > > >
> > > > Thanks,
> > > > Roshani
> > > >
> > > >
> > > > On Thu, Jul 26, 2018 at 1:34 PM Pedro Larroy <
> > > pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > >
> > > >> 

Re: Release plan - MXNET 1.3

2018-07-31 Thread Roshani Nagmote
Hi,

I have created a wiki page for tracking the MXNet 1.3 release with the timeline.
Please take a look here:
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Status

I am still waiting for the following 2 PRs to get merged:
TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482

*Code freeze date is 08/02 (Thursday).* Kindly try to complete ongoing work
and get these PRs merged.

Thanks,
Roshani



On Mon, Jul 30, 2018 at 1:02 PM Roshani Nagmote 
wrote:

> Hi all,
>
> Here is an update on MXNet 1.3 release:
> I am still waiting for following PRs to get merged:
>
> TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
> Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
> Scala examples:
>
> https://github.com/apache/incubator-mxnet/pull/11753
>
> https://github.com/apache/incubator-mxnet/pull/11621
>
> *New code freeze date is: 08/03*  Please try to get your ongoing PRs
> merged by then.
>
> @Pedro, I didn't include your PRs in the tracking list as you said those are
> not critical for now. Please let me know if those need to be included.
> https://github.com/apache/incubator-mxnet/pull/11636
> https://github.com/apache/incubator-mxnet/pull/11562
>
> I also have updated project proposal cwiki page to update the status of
> PRs.
> <https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release>
>
> Please let me know if I am missing something.
>
> Thanks,
> Roshani
>
>
> On Thu, Jul 26, 2018 at 1:34 PM Pedro Larroy 
> wrote:
>
>> I would like to get these PR merged:
>>
>> https://github.com/apache/incubator-mxnet/pull/11636
>> https://github.com/apache/incubator-mxnet/pull/11562
>>
>> How much longer until the code freeze?
>>
>> On Thu, Jul 26, 2018 at 1:44 AM Roshani Nagmote <
>> roshaninagmo...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > PRs waiting to be merged for 1.3 release:
>> > https://github.com/apache/incubator-mxnet/pull/11325
>> >
>> > Are there any other PRs waiting to get merged? Please let me know.
>> >
>> > Release blocker issue:
>> > https://github.com/apache/incubator-mxnet/issues/11853
>> >
>> > @Marco, @Kellen, Thanks for bringing up the important topic. I agree
>> with
>> > you and we(internal Amazon team) will be working on fixing the disabled
>> > tests.
>> > Currently, my colleague, Hao Jin is working on compiling the list of
>> > disabled tests and leading the effort to fix them in the next few days.
>> >
>> > Thanks,
>> > Roshani
>> >
>> > On Mon, Jul 23, 2018 at 6:39 PM kellen sunderland <
>> > kellen.sunderl...@gmail.com> wrote:
>> >
>> > > Thanks again for organizing Roshani.  I believe the TensorRT work is
>> > ready
>> > > for a merge.  Thanks to Marek and all the NVIDIA people for iterating
>> on
>> > > it.  If possible could a committer review, make sure it meets their
>> > > expectations and then merge?  PR is here:
>> > > https://github.com/apache/incubator-mxnet/pull/11325
>> > >
>> > > To Marco's point.  I'd recommend we review some of those disabled
>> tests
>> > and
>> > > see how likely they are to affect users before we cut a release.
>> Many of
>> > > them are obviously not too important from a user's point of view (e.g.
>> > > downloading a sometimes-offline image in a test).  One idea would be
>> to
>> > try
>> > > and address as many of the customer impacting issues as possible
>> between
>> > > code freeze and the RC0 vote.
>> > >
>> > > On Mon, Jul 23, 2018 at 1:23 PM Marco de Abreu
>> > >  wrote:
>> > >
>> > > > Hello Roshani,
>> > > >
>> > > > frequent releases are good and I'm supportive for this in general in
>> > > order
>> > > > to provide our users with the latest features and improvements. But
>> at
>> > > the
>> > > > moment, I'm slightly concerned about the test coverage due to [1]. I
>> > want
>> > > > us to be conscious about cutting a release even though not all tests
>> > are
>> > > > enabled (29 disabled tests [2] as of today). However, I acknowledge
>> > that
>> > > we
>> > > > have improved by a lot lately thanks to everybody participating an

Re: Release plan - MXNET 1.3

2018-07-30 Thread Roshani Nagmote
Hi all,

Here is an update on the MXNet 1.3 release:
I am still waiting for the following PRs to get merged:

TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
Scala examples:

https://github.com/apache/incubator-mxnet/pull/11753

https://github.com/apache/incubator-mxnet/pull/11621

*New code freeze date is: 08/03.* Please try to get your ongoing PRs merged
by then.

@Pedro, I didn't include your PRs in the tracking list as you said those are
not critical for now. Please let me know if those need to be included.
https://github.com/apache/incubator-mxnet/pull/11636
https://github.com/apache/incubator-mxnet/pull/11562

I have also updated the project proposal cwiki page with the status of PRs.
<https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release>

Please let me know if I am missing something.

Thanks,
Roshani


On Thu, Jul 26, 2018 at 1:34 PM Pedro Larroy 
wrote:

> I would like to get these PR merged:
>
> https://github.com/apache/incubator-mxnet/pull/11636
> https://github.com/apache/incubator-mxnet/pull/11562
>
> How much longer until the code freeze?
>
> On Thu, Jul 26, 2018 at 1:44 AM Roshani Nagmote  >
> wrote:
>
> > Hi all,
> >
> > PRs waiting to be merged for 1.3 release:
> > https://github.com/apache/incubator-mxnet/pull/11325
> >
> > Are there any other PRs waiting to get merged? Please let me know.
> >
> > Release blocker issue:
> > https://github.com/apache/incubator-mxnet/issues/11853
> >
> > @Marco, @Kellen, Thanks for bringing up the important topic. I agree with
> > you and we(internal Amazon team) will be working on fixing the disabled
> > tests.
> > Currently, my colleague, Hao Jin is working on compiling the list of
> > disabled tests and leading the effort to fix them in the next few days.
> >
> > Thanks,
> > Roshani
> >
> > On Mon, Jul 23, 2018 at 6:39 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Thanks again for organizing Roshani.  I believe the TensorRT work is
> > ready
> > > for a merge.  Thanks to Marek and all the NVIDIA people for iterating
> on
> > > it.  If possible could a committer review, make sure it meets their
> > > expectations and then merge?  PR is here:
> > > https://github.com/apache/incubator-mxnet/pull/11325
> > >
> > > To Marco's point.  I'd recommend we review some of those disabled tests
> > and
> > > see how likely they are to affect users before we cut a release.  Many
> of
> > > them are obviously not too important from a user's point of view (e.g.
> > > downloading a sometimes-offline image in a test).  One idea would be to
> > try
> > > and address as many of the customer impacting issues as possible
> between
> > > code freeze and the RC0 vote.
> > >
> > > On Mon, Jul 23, 2018 at 1:23 PM Marco de Abreu
> > >  wrote:
> > >
> > > > Hello Roshani,
> > > >
> > > > frequent releases are good and I'm supportive for this in general in
> > > order
> > > > to provide our users with the latest features and improvements. But
> at
> > > the
> > > > moment, I'm slightly concerned about the test coverage due to [1]. I
> > want
> > > > us to be conscious about cutting a release even though not all tests
> > are
> > > > enabled (29 disabled tests [2] as of today). However, I acknowledge
> > that
> > > we
> > > > have improved by a lot lately thanks to everybody participating and
> > > leading
> > > > the efforts around improving flaky tests. From a retrospective point
> of
> > > > view, we could say that these efforts have actually revealed some
> quite
> > > > interesting bugs and thus the time was well spent and yielded good
> > > results.
> > > >
> > > > What does the community think about making another sprint of
> > improvements
> > > > around tests followed up by a period of 1-2 weeks during which we
> > observe
> > > > the failures closely to ensure that no critical paths are impacted?
> If
> > we
> > > > are in a good shape by then, we could continue the release process
> and
> > at
> > > > the same time have the advantage of giving contributors more lead
> time
> > to
> > > > finish their work to ensure it gets into the release in the desired
> > > > quality.
> > > >
> > > > Again, thanks to everybody for their effort

Re: Release plan - MXNET 1.3

2018-07-25 Thread Roshani Nagmote
Hi all,

PRs waiting to be merged for 1.3 release:
https://github.com/apache/incubator-mxnet/pull/11325

Are there any other PRs waiting to get merged? Please let me know.

Release blocker issue:
https://github.com/apache/incubator-mxnet/issues/11853

@Marco, @Kellen, thanks for bringing up this important topic. I agree with
you, and we (the internal Amazon team) will be working on fixing the disabled
tests.
Currently, my colleague Hao Jin is compiling the list of
disabled tests and leading the effort to fix them in the next few days.

Thanks,
Roshani

On Mon, Jul 23, 2018 at 6:39 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Thanks again for organizing Roshani.  I believe the TensorRT work is ready
> for a merge.  Thanks to Marek and all the NVIDIA people for iterating on
> it.  If possible could a committer review, make sure it meets their
> expectations and then merge?  PR is here:
> https://github.com/apache/incubator-mxnet/pull/11325
>
> To Marco's point.  I'd recommend we review some of those disabled tests and
> see how likely they are to affect users before we cut a release.  Many of
> them are obviously not too important from a user's point of view (e.g.
> downloading a sometimes-offline image in a test).  One idea would be to try
> and address as many of the customer impacting issues as possible between
> code freeze and the RC0 vote.
>
> On Mon, Jul 23, 2018 at 1:23 PM Marco de Abreu
>  wrote:
>
> > Hello Roshani,
> >
> > frequent releases are good and I'm supportive for this in general in
> order
> > to provide our users with the latest features and improvements. But at
> the
> > moment, I'm slightly concerned about the test coverage due to [1]. I want
> > us to be conscious about cutting a release even though not all tests are
> > enabled (29 disabled tests [2] as of today). However, I acknowledge that
> we
> > have improved by a lot lately thanks to everybody participating and
> leading
> > the efforts around improving flaky tests. From a retrospective point of
> > view, we could say that these efforts have actually revealed some quite
> > interesting bugs and thus the time was well spent and yielded good
> results.
> >
> > What does the community think about making another sprint of improvements
> > around tests followed up by a period of 1-2 weeks during which we observe
> > the failures closely to ensure that no critical paths are impacted? If we
> > are in a good shape by then, we could continue the release process and at
> > the same time have the advantage of giving contributors more lead time to
> > finish their work to ensure it gets into the release in the desired
> > quality.
> >
> > Again, thanks to everybody for their efforts during the last weeks to
> > improve the usability and stability of MXNet. This is great community
> > effort and a good example of a community working together towards a
> unified
> > goal!
> >
> > Best regards,
> > Marco
> >
> > [1]:
> >
> >
> https://lists.apache.org/thread.html/d6d81401de796a96677a112d6cd0b074b01f46564194ea89b86c6a3e@%3Cdev.mxnet.apache.org%3E
> > [2]:
> >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3A%22Disabled+test%22
> >
> > On Mon, Jul 23, 2018 at 8:34 PM Roshani Nagmote <
> roshaninagmo...@gmail.com
> > >
> > wrote:
> >
> > > Hi all,
> > >
> > > As mentioned before, code freeze date is today July 23rd. Please try to
> > get
> > > your ongoing PRs merged by today.
> > > Please let me know if there are any concerns or need more time.
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Thu, Jul 19, 2018 at 1:16 PM Anirudh Acharya  >
> > > wrote:
> > >
> > > > @sandeep krishnamurthy  the bug fixes
> in
> > > the
> > > > R-package is something we have just begun, there will not be anything
> > > > significant to announce before the v1.3 code freeze.
> > > >
> > > > On Wed, Jul 18, 2018 at 10:08 PM Steffen Rochel <
> > steffenroc...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > To make it easier to find the discussions related to project
> > proposals
> > > I
> > > > > added a column with a link to the thread on dev@ for most
> projects.
> > > > > Appreciate for the project owners to fill in the blanks and to
> check
> > > > that I
> > > > > got the right threads.
> > > > >
> > > > > Regar

Re: Release plan - MXNET 1.3

2018-07-23 Thread Roshani Nagmote
Hi all,

As mentioned before, the code freeze date is today, July 23rd. Please try to get
your ongoing PRs merged by today.
Please let me know if you have any concerns or need more time.

Thanks,
Roshani

On Thu, Jul 19, 2018 at 1:16 PM Anirudh Acharya 
wrote:

> @sandeep krishnamurthy  the bug fixes in the
> R-package is something we have just begun, there will not be anything
> significant to announce before the v1.3 code freeze.
>
> On Wed, Jul 18, 2018 at 10:08 PM Steffen Rochel 
> wrote:
>
> > To make it easier to find the discussions related to project proposals I
> > added a column with a link to the thread on dev@ for most projects.
> > Appreciate for the project owners to fill in the blanks and to check
> that I
> > got the right threads.
> >
> > Regards,
> > Steffen
> >
> > On Wed, Jul 18, 2018 at 7:11 PM Roshani Nagmote <
> roshaninagmo...@gmail.com
> > >
> > wrote:
> >
> > > Hi Kellen,
> > >
> > > Sure. I will update the notes with the information.
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Wed, Jul 18, 2018 at 3:01 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Hey Roshani,
> > > >
> > > > Would you be able to add 'TensorRT Runtime Integration' to the list
> of
> > > > upcoming features?  We'll target getting the release in and polished
> by
> > > the
> > > > 23rd.  Design proposal is here:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Runtime+Integration+with+TensorRT
> > > > and the lead contributor is Marek Kolodziej.
> > > >
> > > > -Kellen
> > > >
> > > > On Wed, Jul 18, 2018 at 8:32 PM Roshani Nagmote <
> > > roshaninagmo...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I am starting the process to prepare for Apache MXNet (incubating)
> > 1.3
> > > > > Release. Please find project proposal draft for this release here:
> > > > > <*
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> > > > > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> > > > > >*
> > > > > >
> > > > >
> > > > > Target feature freeze date is July 23rd. A release candidate will
> be
> > > cut
> > > > > around Monday, August 6th and voting will commence from then until
> > > > > Thursday, August 9th. If you have any additional features in
> progress
> > > and
> > > > > would like to include it in this release, please make sure to
> comment
> > > so
> > > > I
> > > > > can update the release notes.
> > > > >
> > > > > Feel free to add any other comments/suggestions.
> > > > >
> > > > > Thanks,
> > > > > Roshani
> > > > >
> > > >
> > >
> >
>


Re: Release plan - MXNET 1.3

2018-07-18 Thread Roshani Nagmote
Hi Kellen,

Sure. I will update the notes with the information.

Thanks,
Roshani

On Wed, Jul 18, 2018 at 3:01 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Hey Roshani,
>
> Would you be able to add 'TensorRT Runtime Integration' to the list of
> upcoming features?  We'll target getting the release in and polished by the
> 23rd.  Design proposal is here:
>
> https://cwiki.apache.org/confluence/display/MXNET/Runtime+Integration+with+TensorRT
> and the lead contributor is Marek Kolodziej.
>
> -Kellen
>
> On Wed, Jul 18, 2018 at 8:32 PM Roshani Nagmote  >
> wrote:
>
> > Hi All,
> >
> > I am starting the process to prepare for Apache MXNet (incubating) 1.3
> > Release. Please find project proposal draft for this release here:
> > https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> >
> > Target feature freeze date is July 23rd. A release candidate will be cut
> > around Monday, August 6th and voting will commence from then until
> > Thursday, August 9th. If you have any additional features in progress and
> > would like to include it in this release, please make sure to comment so
> I
> > can update the release notes.
> >
> > Feel free to add any other comments/suggestions.
> >
> > Thanks,
> > Roshani
> >
>


Release plan - MXNET 1.3

2018-07-18 Thread Roshani Nagmote
Hi All,

I am starting the process to prepare for Apache MXNet (incubating) 1.3
Release. Please find project proposal draft for this release here:
https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release

Target feature freeze date is July 23rd. A release candidate will be cut
around Monday, August 6th and voting will commence from then until
Thursday, August 9th. If you have any additional features in progress and
would like to include them in this release, please make sure to comment so I
can update the release notes.

Feel free to add any other comments/suggestions.

Thanks,
Roshani


Re: [RESULT][VOTE] Release MXNet version 1.2.1.RC1

2018-07-12 Thread Roshani Nagmote
Hi All,

Updating the results.
This vote passes with 9 +1 votes (1 binding) and no 0 or -1 votes.

*+1 votes*

*IPMC Members:*
- Sergio

*Committers*:
- Haibin
- Indhu
- Sandeep
- Jun
- Zhi
- Naveen

*Contributors:*
- Lin
- Hao
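
For reference, the tally above can be reproduced with a small sketch. The
grouping mirrors the lists in this message; treating only the IPMC vote as
binding is an assumption taken from the "(1 binding)" count above, and the
variable names are illustrative.

```python
# Sketch of the 1.2.1.RC1 tally; the names mirror the lists in this message.
votes = {
    "IPMC Members": ["Sergio"],
    "Committers": ["Haibin", "Indhu", "Sandeep", "Jun", "Zhi", "Naveen"],
    "Contributors": ["Lin", "Hao"],
}

# Assumption: only IPMC member votes count as binding for this release vote.
BINDING_GROUPS = {"IPMC Members"}

# Every listed voter cast a +1, so the total is just the number of names.
total_plus_one = sum(len(names) for names in votes.values())
binding_plus_one = sum(
    len(names) for group, names in votes.items() if group in BINDING_GROUPS
)

print(f"{total_plus_one} +1 votes ({binding_plus_one} binding)")
```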

Vote thread:

https://lists.apache.org/thread.html/6a0459a54dd0b13fc89f653af91306810aa83d5d1a578a2abbac89fe@%3Cdev.mxnet.apache.org%3E

I will continue with the release process on general@ and the release
announcement
will follow in the next few days.

Thanks,
Roshani


On Thu, Jul 12, 2018 at 6:10 PM Roshani Nagmote 
wrote:

> Hi All,
>
> Thank you for spending the time to test MXNet 1.2.1.RC1 release.
> This vote passes with 9 +1 (binding votes).
>
> Binding +1:
> Haibin
> Indhu
> Sandeep
> Jun
> Zhi
> Naveen
> Lin
> Sergio
> Hao
>
> Vote thread:
>
> https://lists.apache.org/thread.html/6a0459a54dd0b13fc89f653af91306810aa83d5d1a578a2abbac89fe@%3Cdev.mxnet.apache.org%3E
>
> I will continue with the release process on general@ and the release 
> announcement
> will follow in the next few days.
>
> Thanks,
> Roshani
>


[RESULT][VOTE] Release MXNet version 1.2.1.RC1

2018-07-12 Thread Roshani Nagmote
Hi All,

Thank you for spending the time to test MXNet 1.2.1.RC1 release.
This vote passes with 9 +1 (binding votes).

Binding +1:
Haibin
Indhu
Sandeep
Jun
Zhi
Naveen
Lin
Sergio
Hao

Vote thread:

https://lists.apache.org/thread.html/6a0459a54dd0b13fc89f653af91306810aa83d5d1a578a2abbac89fe@%3Cdev.mxnet.apache.org%3E

I will continue with the release process on general@ and the release
announcement
will follow in the next few days.

Thanks,
Roshani


Re: [VOTE] Release MXNet version 1.2.1.RC1

2018-07-11 Thread Roshani Nagmote
Hi All,

Could you please test and vote for this release? Voting will end tomorrow
by 5:50 pm PDT.

Thanks,
Roshani

On Mon, Jul 9, 2018 at 4:53 PM Roshani Nagmote 
wrote:

> Hi all,
>
> I would like to propose a vote to release Apache MXNet (incubating)
> version
> 1.2.1.RC1. Voting will start now (Monday, Jul 9th) and end at 5:50 PM
> PDT, Thursday, July 12th.
>
> Link to release candidate 1.2.1.rc1:
> https://github.com/apache/incubator-mxnet/releases/tag/1.2.1.rc1
>
> View this page, click on "Build from Source", and use the source code
> obtained from 1.2.1.rc1 tag:
> https://mxnet.incubator.apache.org/install/index.html
>
> (Note: The README.md points to the 1.2.1 tag and does not work at the
> moment.)
>
> Please remember to test first before voting accordingly:
>
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
>
> Thanks,
> Roshani
>


[VOTE] Release MXNet version 1.2.1.RC1

2018-07-09 Thread Roshani Nagmote
Hi all,

I would like to propose a vote to release Apache MXNet (incubating) version
1.2.1.RC1. Voting will start now (Monday, Jul 9th) and end at 5:50 PM
PDT, Thursday, July 12th.

Link to release candidate 1.2.1.rc1:
https://github.com/apache/incubator-mxnet/releases/tag/1.2.1.rc1

View this page, click on "Build from Source", and use the source code
obtained from 1.2.1.rc1 tag:
https://mxnet.incubator.apache.org/install/index.html

(Note: The README.md points to the 1.2.1 tag and does not work at the
moment.)

Please remember to test first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)

Thanks,
Roshani


Re: MXNet 1.3 release - call for volunteers

2018-06-29 Thread Roshani Nagmote
Hi,

I would like to volunteer. Since I am not a committer, I would need help
from one.
Please let me know if any committer is interested in helping me with the release.

Thanks,
Roshani

On Wed, Jun 27, 2018 at 5:52 PM Hagay Lupesko  wrote:

> Hello community!
>
> I'd like to kickstart the process for MXNet v1.3 release - and ask for a
> volunteer to take on this release as a release manager. I am hoping the
> release process can start next week or so.
> The release scope is documented here: [1]
> The release process is documented here: [2]
>
> Some of the involved tasks require committer privileges, and I can help
> identifying a committer that will be available to help and mentor the
> release manager. This is a great opportunity for someone to contribute,
> ramp up further on the project, and help get the latest and greatest out to
> MXNet users.
>
> If you are interested - please reply and let me know!
>
> Thanks, Hagay
>
> [1]
>
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
> [2]
>
> https://cwiki.apache.org/confluence/display/MXNET/Release+Process?src=contextnavpagetreemode
>


Re: MXNet issues labeling

2018-05-22 Thread Roshani Nagmote
Sorry for the incomplete email; I sent it too fast.
Thanks for all the comments. Aaron guessed it right. We can treat it as a
multi-label classification problem. :)
We are currently working on the design document and will share it on the dev list
once it is more concrete.
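
To make "multi-label" concrete: each issue can receive zero, one, or several
labels at once, rather than exactly one. The sketch below is only a
keyword-matching stand-in for a trained classifier; the label names and
keywords are hypothetical and are not taken from the design document.

```python
# Hypothetical label names and trigger keywords, for illustration only.
LABEL_KEYWORDS = {
    "Scala": ["scala", "jvm"],
    "Build": ["cmake", "make", "compile"],
    "Question": ["how do i", "how to"],
}


def predict_labels(title, body):
    """Return every label whose keywords appear in the issue text.

    Multi-label: the result may be empty or contain several labels.
    A real model would replace the keyword test with a per-label score.
    """
    text = f"{title} {body}".lower()
    return sorted(
        label
        for label, keywords in LABEL_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


print(predict_labels("How to compile the Scala package?", "cmake fails on Ubuntu"))
```

An issue that matches no keyword gets an empty list, which would correspond
to leaving the issue for manual triage.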

@Marco, seems like a good idea too but as you said, it will still involve
manual labeling. We can do this as a temporary solution but need a
long-term solution.
@Hao, will explore that option too! Thanks!

Thanks,
Roshani


On Tue, May 22, 2018 at 9:52 AM, Roshani Nagmote <roshaninagmo...@gmail.com>
wrote:

> Thanks for all the comments. Aaron guessed it right. We can treat it as a
> multi-label classification problem. :)
> We are currently working on the design document, will share it on dev list
> once it is more concrete.
>
> @Marco, seems like a good idea too but as you said, it will still involve
> manual labeling. We can do this as a temporary solution but need a more
> @Hao, will explore that option too! Thanks!
>
> Thanks,
> Roshani
>
> On Mon, May 21, 2018 at 5:42 PM, Jin, Hao <h...@amazon.com> wrote:
>
>> Definitely a good idea. I think maybe we can also try out the new gluon
>> NLP toolkit on this task?
>>
>> On 5/21/18, 5:24 PM, "Anirudh" <anirudh2...@gmail.com> wrote:
>>
>> Yes, I guessed that :). Was looking for more details.
>>
>> On Mon, May 21, 2018 at 5:03 PM, Aaron Markham <
>> aaron.s.mark...@gmail.com>
>> wrote:
>>
>> > AI obviously.
>> >
>> > On Mon, May 21, 2018 at 5:01 PM, Anirudh <anirudh2...@gmail.com>
>> wrote:
>> >
>> > > Hi Roshani,
>> > >
>> > > Good suggestion! How will the bot decide what labels to add ?
>> > >
>> > > Anirudh
>> > >
>> > > On Mon, May 21, 2018 at 4:39 PM, Naveen Swamy <mnnav...@gmail.com
>> >
>> > wrote:
>> > >
>> > > > +1 for the proposal to triage issues, I think committers should
>> also do
>> > > > this exercise to understand the customer pain.
>> > > >
>> > > > I am also inclined to use a bot account like how Tensorflow and
>> other
>> > > repos
>> > > > do it, https://github.com/googlebot.
>> > > > https://github.com/tensorflow/tensorflow/pull/19445#event-16
>> 38027271
>> > -->
>> > > > This is auto-tagged by the bot, it would be cool if we could do
>> that as
>> > > > well.
>> > > >
>> > > >
>> > > >
>> > > > On Mon, May 21, 2018 at 4:25 PM, sandeep krishnamurthy <
>> > > > sandeep.krishn...@gmail.com> wrote:
>> > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Roshani for starting this thread.
>> > > > >
>> > > > > Yes, I think labeling the issues will help a lot in driving
>> the
>> > > attention
>> > > > > of contributors to specific areas and make it easy for new
>> > contributors
>> > > > to
>> > > > > search and pick their contribution.
>> > > > >
>> > > > > I agree manually doing it all the time is not scalable and
>> efficient.
>> > > > Your
>> > > > > proposal on bot script to auto-label, similar to the working
>> of
>> > Jenkins
>> > > > bot
>> > > > > to re-test, re-build actions, will be very useful and
>> effective.
>> > > Hence, I
>> > > > > am more inclined to your *option 1* to have a bot account to
>> add
>> > > labels.
>> > > > >
>> > > > > Best,
>> > > > > Sandeep
>> > > > >
>> > > > > On Mon, May 21, 2018 at 4:16 PM, Roshani Nagmote <
>> > > > > roshaninagmo...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > Some of us here at Amazon as a part of our day job, are
>> triaging
>> > > Github
>> > > > > > issues to find where MXNet users are experiencing
>> difficulty and
>> > help
>> > > > the

Re: MXNet issues labeling

2018-05-22 Thread Roshani Nagmote
Thanks for all the comments. Aaron guessed it right. We can treat it as a
multi-label classification problem. :)
We are currently working on the design document, will share it on dev list
once it is more concrete.

@Marco, seems like a good idea too but as you said, it will still involve
manual labeling. We can do this as a temporary solution but need a more
@Hao, will explore that option too! Thanks!

Thanks,
Roshani

On Mon, May 21, 2018 at 5:42 PM, Jin, Hao <h...@amazon.com> wrote:

> Definitely a good idea. I think maybe we can also try out the new gluon
> NLP toolkit on this task?
>
> On 5/21/18, 5:24 PM, "Anirudh" <anirudh2...@gmail.com> wrote:
>
> Yes, I guessed that :). Was looking for more details.
>
> On Mon, May 21, 2018 at 5:03 PM, Aaron Markham <
> aaron.s.mark...@gmail.com>
> wrote:
>
> > AI obviously.
> >
> > On Mon, May 21, 2018 at 5:01 PM, Anirudh <anirudh2...@gmail.com>
> wrote:
> >
> > > Hi Roshani,
> > >
> > > Good suggestion! How will the bot decide what labels to add ?
> > >
> > > Anirudh
> > >
> > > On Mon, May 21, 2018 at 4:39 PM, Naveen Swamy <mnnav...@gmail.com>
> > wrote:
> > >
> > > > +1 for the proposal to triage issues, I think committers should
> also do
> > > > this exercise to understand the customer pain.
> > > >
> > > > I am also inclined to use a bot account like how Tensorflow and
> other
> > > repos
> > > > do it, https://github.com/googlebot.
> > > > https://github.com/tensorflow/tensorflow/pull/19445#event-
> 1638027271
> > -->
> > > > This is auto-tagged by the bot, it would be cool if we could do
> that as
> > > > well.
> > > >
> > > >
> > > >
> > > > On Mon, May 21, 2018 at 4:25 PM, sandeep krishnamurthy <
> > > > sandeep.krishn...@gmail.com> wrote:
> > > >
> > > > > Thanks,
> > > > >
> > > > > Roshani for starting this thread.
> > > > >
> > > > > Yes, I think labeling the issues will help a lot in driving the
> > > attention
> > > > > of contributors to specific areas and make it easy for new
> > contributors
> > > > to
> > > > > search and pick their contribution.
> > > > >
> > > > > > I agree manually doing it all the time is not scalable and
> efficient.
> > > > Your
> > > > > proposal on bot script to auto-label, similar to the working of
> > Jenkins
> > > > bot
> > > > > to re-test, re-build actions, will be very useful and
> effective.
> > > Hence, I
> > > > > am more inclined to your *option 1* to have a bot account to
> add
> > > labels.
> > > > >
> > > > > Best,
> > > > > Sandeep
> > > > >
> > > > > On Mon, May 21, 2018 at 4:16 PM, Roshani Nagmote <
> > > > > roshaninagmo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Some of us here at Amazon as a part of our day job, are
> triaging
> > > Github
> > > > > > issues to find where MXNet users are experiencing difficulty
> and
> > help
> > > > the
> > > > > > community focus on those areas. This is done by assigning
> labels to
> > > the
> > > > > > Github issues. We do know that only labeling won’t solve the
> real
> > > > problem
> > > > > > but we will expand our scope to also attempt to resolve the
> issues.
> > > > > > Categorizing issues could also help contributors and
> maintainers
> > who
> > > > > know a
> > > > > > particular area to pick up the issue and help the user.
> > > > > >
> > > > > > Right now, we just manually go through the issues. If they
> are
> > > > questions,
> > > > > > we redirect users to start a discussion on discuss forum,
> find the
> > > > > > appropriate labels and then ask one of the committers to add
> those
>

MXNet issues labeling

2018-05-21 Thread Roshani Nagmote
Hi,

Some of us here at Amazon, as part of our day job, are triaging GitHub
issues to find where MXNet users are experiencing difficulty and help the
community focus on those areas. This is done by assigning labels to the
GitHub issues. We do know that labeling alone won’t solve the real problem,
but we will expand our scope to also attempt to resolve the issues.
Categorizing issues could also help contributors and maintainers who know a
particular area to pick up the issue and help the user.

Right now, we just manually go through the issues. If they are questions,
we redirect users to start a discussion on the discuss forum. We then find the
appropriate labels and ask one of the committers to add those labels.
This process is not very smooth, as it's completely manual and every time we
need to ask committers to add labels.

We want to be able to automate/simplify this issue labeling process.
Right now, as far as I know, there's no way for non-committers to add
labels. So, I want to propose two options:

- Using a separate account having minimum permissions to run the bot script
which will do the labeling. For this, we will need an account to be created
from Apache infrastructure with proper access and they can control the
access for the account through
https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html

- Using one of the committers' auth tokens to run the script.
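
With either option, the bot script would end up making the same kind of call.
A minimal sketch, assuming the GitHub REST endpoint for adding labels to an
issue and a token read from an environment variable; the function name, the
`GITHUB_TOKEN` variable, the issue number, and the label are hypothetical.

```python
import json
import os
import urllib.request


def build_label_request(owner, repo, issue_number, labels, token):
    """Build (but do not send) a request that adds labels to one issue."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/labels"
    payload = json.dumps({"labels": labels}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Assumption: the token is supplied via the environment, not stored in the script.
token = os.environ.get("GITHUB_TOKEN", "placeholder-token")

# Hypothetical issue number and label, for illustration only.
request = build_label_request("apache", "incubator-mxnet", 1234, ["Scala"], token)

# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
print(request.get_method(), request.full_url)
```

Whichever account owns the token needs permission to label issues on the
repository, which is the reason for the two options above.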

Please let me know if you have any other ideas to do this.

Thanks,
Roshani


Re: [VOTE] Change Scala namespace from dmlc to org.apache

2018-03-12 Thread Roshani Nagmote
+1 to change the namespace

On Mon, Mar 12, 2018 at 3:05 PM, Chris Olivier 
wrote:

> The assumption is that it would be changed more-or-less immediately.  ie.
> this is like a voted PR, I guess.
>
> On Mon, Mar 12, 2018 at 2:53 PM, Chris Olivier 
> wrote:
>
> > It is about changing the namespace.  As far as I know, the version number
> > of the next release is not defined.
> > At such point where a release is announced, one could comment, vote
> > whatever on the chosen version of that release, I suppose.  But that's
> > beyond the scope of this vote, because the "next release" is not yet
> > defined.
> >
> >
> >
> > On Mon, Mar 12, 2018 at 2:48 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> >> Just for clarification: Is this vote about changing the namespace with
> the
> >> next release?
> >>
> >> On Mon, Mar 12, 2018 at 7:16 PM, Naveen Swamy 
> wrote:
> >>
> >> > Chris, Thanks for starting this vote.
> >> > This is long pending
> >> >
> >> > +1 to change org.apache namespace
> >> >
> >> > On Mon, Mar 12, 2018 at 10:35 AM, Marco de Abreu <
> >> > marco.g.ab...@googlemail.com> wrote:
> >> >
> >> > > I gave my +1 for the code modification. The -1 was for Nan Zhus
> >> proposal
> >> > to
> >> > > get it into 1.2.
> >> > >
> >> > > On Mon, Mar 12, 2018 at 6:18 PM, Chris Olivier <
> cjolivie...@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > If you're tying this to a process issue, then it's no longer a
> code
> >> > > > modification technical vote.
> >> > > >
> >> > > >
> >> > > > On Mon, Mar 12, 2018 at 9:56 AM, Marco de Abreu <
> >> > > > marco.g.ab...@googlemail.com> wrote:
> >> > > >
> >> > > > > Right
> >> > > > >
> >> > > > > Chris Olivier  schrieb am Mo., 12. März
> >> 2018,
> >> > > > > 17:38:
> >> > > > >
> >> > > > > > Are you saying your vote is contingent upon the outcome of a
> >> > separate
> >> > > > > vote?
> >> > > > > >
> >> > > > > > On Mon, Mar 12, 2018 at 9:37 AM, Marco de Abreu <
> >> > > > > > marco.g.ab...@googlemail.com> wrote:
> >> > > > > >
> >> > > > > > > +1 for changing the namespace
> >> > > > > > > -1 for merging this change into master according to the
> >> current
> >> > > > policy
> >> > > > > > >
> >> > > > > > > Chris Olivier  schrieb am Mo., 12.
> >> März
> >> > > 2018,
> >> > > > > > > 17:34:
> >> > > > > > >
> >> > > > > > > > Release versioning is a separate issue or vote.  At
> release
> >> > time,
> >> > > > > > people
> >> > > > > > > > can "demand" version X or Y.  This vote represents "do we
> >> want
> >> > to
> >> > > > > > change
> >> > > > > > > > the namespace".
> >> > > > > > > >
> >> > > > > > > > On Mon, Mar 12, 2018 at 9:30 AM, Nan Zhu <
> >> > zhunanmcg...@gmail.com
> >> > > >
> >> > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > I think we'd specify it will change in the next version
> >> > (1.2)?
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Mon, Mar 12, 2018 at 9:26 AM, Chris Olivier <
> >> > > > > > cjolivie...@gmail.com>
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > This vote is for the code-change of altering the Scala
> >> API
> >> > > > > > namespace
> >> > > > > > > > from
> >> > > > > > > > > > dmlc to org.apache.
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Vote will conclude on Thursday, 5pm PDT.
> >> > > > > > > > > >
> >> > > > > > > > > > Thank you,
> >> > > > > > > > > >
> >> > > > > > > > > > -Chris
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


Re: [VOTE] Disconnect all non-C API's from mxnet versioning

2018-03-12 Thread Roshani Nagmote
-1 for different versioning.

I feel it's just added confusion for users.

On Mon, Mar 12, 2018 at 2:35 PM, YiZhi Liu  wrote:

> Agree.
>
> And my reply to Marco's point,
>
> > Changing namespaces is one use-case, but there will be a lot more with
> increasing activity - we have to take the bigger picture in mind.
> And you mentioned the CPP package as an example.
> > During analysis, we figured that a re-engineering of that API would be
> more appropriate and easier maintainable.
> I cannot agree as an engineer. Why not keep old API and add new ones?
> Just like in c_api.h, we added xxxEx while did not remove xxx, which
> keeps APIs compatible.
>
> > I think it is safe to say that the other APIs have not been maintained
> as actively as our Python/Gluon API.
> Are you saying, if an API is maintained actively and is widely used,
> then it should be versioned together with MXNet Core?
> Interesting, maybe instead we should have another discussion to decide
> whether to remove some of the 'inactive' frontend bindings from the
> repo.
>
> > We have to do #3 anyways, so it is just about having a different number
> set as version string.
> A release with 6 different versions and 5 mappings?
>
> > I really don't see an issue in #1 - it's a simple lookup that could be
> done on our website.
> Please be careful to say 'simple', each time we introduce an
> additional step, we lose a number of our potential users.
> And as I describe in my #5. Imagine an inverse situation. When someone
> has a model trained by gluon 1.6.0, he want to deploy it to JVM, what
> Scala API version should he use? 1.6.0? No. And which R package
> version he should use? It is still different from either Gluon version
> or Scala API version. What a nightmare.
>
> 2018-03-12 14:11 GMT-07:00 Chris Olivier :
> > Marco, you're mixing votes again.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > * This leaves us with three options: 1. Vote failed: No refactoring of
> > user-facing APIs (including namespace changes) possible OR major version
> > increase 2. Vote succeeded: Refactoring of user-facing APIs possible and
> > only users of the changed APIs are affected while major version does not
> > increase for other APIs. 3. Remove SemVer: We could introduce breaking
> > changes at any point in time, but our users would be losing trust due to
> > unexpected failures during upgrades.*
> >
> > What you're describing is not what this vote is about.  This vote is
> > whether to separate mxnet and API versioning.
> > Please try to stay on task.
> >
> >
> >
> > On Mon, Mar 12, 2018 at 1:56 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> >> Regarding #4: Changing namespaces is one use-case, but there will be a
> lot
> >> more with increasing activity - we have to take the bigger picture in
> mind.
> >> I think it is safe to say that the other APIs have not been maintained
> as
> >> actively as our Python/Gluon API (which I would say could be versioned
> >> together with MXNet Core, but it does not really make a difference).
> This
> >> results in our APIs not reflecting all features available in MXNet (#2)
> or
> >> doing it in a way that we wouldn't recommend nowadays. While it is no
> >> problem to add new features to an API using a minor version change, it
> >> limits our possibilites to do a refactor. Our team, for example, got a
> >> customer that would like to see the functionality of the Cpp package
> being
> >> increased. During analysis, we figured that a re-engineering of that API
> >> would be more appropriate and easier maintainable. If we don't pass this
> >> vote, we won't be able to make any improvements to our less maintained
> APIs
> >> without a major version increment - which the community is also heavily
> >> against. We have to do #3 anyways, so it is just about having a
> different
> >> number set as version string - right now we're making it easy for
> ourselves
> >> by basically not maintaining any other than the Python interface and
> >> declining all breaking changes or refactors to APIs. I really don't see
> an
> >> issue in #1 - it's a simple lookup that could be done on our website.
> >> Simply select the version of MXNet you would like to have and it will
> >> provide you with the appropriate installation instructions - the same
> way
> >> we're already doing it.
> >>
> >> This leaves us with three options:
> >> 1. Vote failed: No refactoring of user-facing APIs (including namespace
> >> changes) possible OR major version increase
> >> 2. Vote succeeded: Refactoring of user-facing APIs possible and only
> users
> >> of the changed APIs are affected while major version does not increase
> for
> >> other APIs.
> >> 3. Remove SemVer: We could introduce breaking changes at any point in
> time,
> >> but our users would be losing trust due to unexpected failures during
> >> upgrades.
> >>
> >> -Marco
> >>
> >>
> >> On Mon, Mar 12, 2018 at 9:22 PM, YiZhi Liu 

Re: Request for comments: Proposal for import/export model formats module into MXNet

2018-02-22 Thread Roshani Nagmote
Hi Marco,

Good question. ONNX models come with a version number in the model protobuf
file. We can make use of that field when importing into MXNet.

You can see the discussion and design of versioning policies in ONNX here:
https://github.com/onnx/onnx/issues/119
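
As a rough sketch of how the importer could use that field: compare the
model's IR version with the newest one the importer understands and fail
early otherwise. The function name and the supported-version constant are
assumptions for illustration, not part of the proposal.

```python
# Assumption: the newest ONNX IR version this (hypothetical) importer understands.
MAX_SUPPORTED_IR_VERSION = 3


def check_ir_version(model_ir_version, max_supported=MAX_SUPPORTED_IR_VERSION):
    """Return the version if importable, otherwise raise ValueError."""
    if model_ir_version > max_supported:
        raise ValueError(
            f"Model uses ONNX IR version {model_ir_version}, "
            f"but this importer supports up to {max_supported}."
        )
    return model_ir_version


# With the onnx Python package installed, the value would come from the model file:
#   import onnx
#   model = onnx.load("model.onnx")      # path is a placeholder
#   check_ir_version(model.ir_version)   # ir_version is a field of ModelProto
print(check_ir_version(3))
```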

- Roshani


On Thu, Feb 22, 2018 at 5:21 PM, Naveen Swamy <mnnav...@gmail.com> wrote:

> If you train with a newer version of MXNet and try running on a older
> version of MXNet, it might not already work today,  I am not sure if we
> want to support such use-cases. This is tangential to this piece of work
>
> If ONNX were to update their version, I think the right place to keep
> future versions of ONNX compatible should be in ONNX by providing a tool to
> move from ONNX.v0 to ONNX.v1. so that various framework converters always
> move with the latest version of ONNX.
>
> ONNX models I believe already contains the ONNX version with which it was
> built.
>
>
> On Thu, Feb 22, 2018 at 4:38 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > Hello Roshani,
> >
> > interesting document and a good step towards allowing customers and
> > developers to adopt MXNet faster.
> >
> > Just one quick question: How would your proposed design handle
> > compatibility between old and new versions of MXNet as well as other
> > frameworks? Since serde (import/export) is part of the MXNet source, we
> > won't be able to update it independently. One example I'm thinking about
> is
> > training on the latest version of MXNet and running inference on an older
> > version. Could this cause issues since the ONNX model could be of a
> higher
> > version than the import on the old MXNet version is able to load? Would
> it
> > be necessary to have some kind of compatibility mode during the export
> > process in which you define the target ONNX model version? There might
> also
> > be different operator versions etc.
> >
> > Best regards,
> > Marco
> >
> >
> >
> > On Fri, Feb 23, 2018 at 1:15 AM, Roshani Nagmote <
> > roshaninagmo...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I wanted to follow up on the proposal I sent before.
> > > https://cwiki.apache.org/confluence/display/MXNET/
> > > Proposal%3A+ImportExport+
> > > module
> > >
> > > It will be great if you can provide your feedback or suggestions.
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Thu, Jan 18, 2018 at 4:47 PM, Roshani Nagmote <
> > > roshaninagmo...@gmail.com>
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I have written an initial design proposal for a `serde`(temporary
> name)
> > > > module for importing and exporting different model formats like onnx,
> > > > coreml to and from MXNet.
> > > >
> > > > Please take a look and feel free to provide suggestions in the
> comment
> > > > section.
> > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/
> > > > Proposal%3A+ImportExport+module
> > > >
> > > > Note: I will be traveling next week with limited access to emails.
> So,
> > > > responses might be delayed.
> > > >
> > > > Thanks,
> > > > Roshani
> > > >
> > >
> >
>


Re: Request for comments: Proposal for import/export model formats module into MXNet

2018-02-22 Thread Roshani Nagmote
Hi all,

I wanted to follow up on the proposal I sent before.
https://cwiki.apache.org/confluence/display/MXNET/Proposal%3A+ImportExport+module

It would be great if you could provide your feedback or suggestions.

Thanks,
Roshani

On Thu, Jan 18, 2018 at 4:47 PM, Roshani Nagmote <roshaninagmo...@gmail.com>
wrote:

> Hello all,
>
> I have written an initial design proposal for a `serde`(temporary name)
> module for importing and exporting different model formats like onnx,
> coreml to and from MXNet.
>
> Please take a look and feel free to provide suggestions in the comment
> section.
>
> https://cwiki.apache.org/confluence/display/MXNET/Proposal%3A+ImportExport+module
>
> Note: I will be traveling next week with limited access to emails. So,
> responses might be delayed.
>
> Thanks,
> Roshani
>


Request for comments: Proposal for import/export model formats module into MXNet

2018-01-18 Thread Roshani Nagmote
Hello all,

I have written an initial design proposal for a `serde` (temporary name)
module for importing and exporting different model formats, such as ONNX and
CoreML, to and from MXNet.

Please take a look and feel free to provide suggestions in the comment
section.

https://cwiki.apache.org/confluence/display/MXNET/Proposal%3A+ImportExport+module

Note: I will be traveling next week with limited access to email, so
responses might be delayed.

Thanks,
Roshani


Re: Refactoring MXNet scala code to use "org.apache.mxnet"

2018-01-04 Thread Roshani Nagmote
Hi,

Since MXNet does not currently have a Jira project, I have created a GitHub
issue for now.
https://github.com/apache/incubator-mxnet/issues/9315

Will create the PR and link the issue there.

Thanks,
Roshani

On Thu, Jan 4, 2018 at 3:08 PM, Naveen Swamy <mnnav...@gmail.com> wrote:

> Hi Suneel,
>
> Did we decide that we will be using Jira going forward? If not, can someone
> summarize the consensus from the improvement email thread, and let's make it
> universal, including how to use it, what is expected, etc.
>
> For the record, I like the idea of using Jira for more openness.
>
> Also, MXNet does not have a Jira project; can you help create one?
>
> Thanks, Naveen
>
>
> On Thu, Jan 4, 2018 at 2:35 PM, Suneel Marthi <smar...@apache.org> wrote:
>
> > Is there a Jira for this? Please create a Jira and reference that in the
> PR
> > for this.
> >
> > On Thu, Jan 4, 2018 at 5:16 PM, Roshani Nagmote <
> roshaninagmo...@gmail.com
> > >
> > wrote:
> >
> > > Hello all,
> > >
> > > I am working on publishing mxnet-scala release to maven repository and
> > as a
> > > part of that, I will also be refactoring mxnet-scala code/tests/example
> > and
> > > docs to use "org.apache.mxnet" instead of "ml.dmlc.mxnet".
> > >
> > > Currently, MXNet-Scala
> > > <http://mxnet.incubator.apache.org/api/scala/index.html> library uses
> > > "ml.dmlc.mxnet" packages. This work will change the way to import
> modules
> > > when using mxnet-scala package.
> > >
> > > *Old way:*
> > >
> > > > scala> import ml.dmlc.mxnet._
> > > >import ml.dmlc.mxnet._
> > > > scala> val arr = NDArray.ones(2, 3)
> > > >arr: ml.dmlc.mxnet.NDArray = ml.dmlc.mxnet.NDArray@f5e74790
> > >
> > > *New way:*
> > >
> > > scala> import org.apache.mxnet._
> > >import org.apache.mxnet._
> > > scala> val arr = NDArray.ones(2, 3)
> > >arr: org.apache.mxnet.NDArray = org.apache.mxnet.NDArray@f5e74790
> > >
> > >
> > > Please let me know if anyone has any thoughts or issues with this
> change.
> > >
> > > Thanks,
> > > Roshani
> > >
> >
>


Request for suggestions- Supporting onnx in mxnet

2017-10-18 Thread Roshani Nagmote
Hi guys,


I am working on supporting ONNX pre-trained
models in Apache MXNet and would like to seek your opinion on the choice of
implementation. I have also created a GitHub issue. Supporting ONNX in
MXNet will enable users to move between frameworks with their models. It
will also enable the MXNet project to be a part of the ONNX open standard and
steer the direction of ONNX.


For those who don’t know ONNX, ONNX is an open source format for AI models
which enables models to be transferred between frameworks. Refer to
https://github.com/onnx/onnx for more details.


To implement the import/export functionality in MXNet, I propose to expose
an MXNet Python module “serde” (name taken from the Apache Hive project) with
the following methods supporting different formats:

sym, params = mxnet.serde.import(other_format_file, other_format='onnx')

other_format_file = mxnet.serde.export(mxnet_sym, mxnet_params, 'onnx')
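
As a rough illustration of the two entry points above, the sketch below
dispatches on a format name through a small registry. Everything in it is an
assumption made for illustration and not existing MXNet API: the names
`import_model` and `export_model` (Python reserves `import` as a keyword, so a
function literally named `serde.import` would not parse), the
`register_format` helper, the extra `path` argument on export, and the toy
"json" handler that stands in for a real ONNX handler.

```python
import json

# Hypothetical registry: format name -> (importer, exporter).
_FORMATS = {}


def register_format(name, importer, exporter):
    """Register the converter pair for one external model format."""
    _FORMATS[name] = (importer, exporter)


def _lookup(other_format):
    """Return the (importer, exporter) pair or fail with a clear error."""
    if other_format not in _FORMATS:
        raise ValueError("unsupported format: %r" % (other_format,))
    return _FORMATS[other_format]


def import_model(other_format_file, other_format):
    """Read a file in `other_format` and return (sym, params)."""
    importer, _ = _lookup(other_format)
    return importer(other_format_file)


def export_model(sym, params, other_format, path):
    """Write (sym, params) to `path` in `other_format` and return the path."""
    _, exporter = _lookup(other_format)
    return exporter(sym, params, path)


# Toy handler so the sketch runs end to end. A real ONNX handler would
# parse and emit protobuf instead of JSON.
def _json_import(path):
    with open(path) as f:
        data = json.load(f)
    return data["sym"], data["params"]


def _json_export(sym, params, path):
    with open(path, "w") as f:
        json.dump({"sym": sym, "params": params}, f)
    return path


register_format("json", _json_import, _json_export)
```

With a registry like this, adding CoreML or another format would mean
registering one more handler pair, and the caller-facing functions would stay
the same.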


The implementation under the hood can be done in two ways:


1) Implement at the MXNet layer by parsing the ONNX model (in protobuf
format), turning it into MXNet symbolic operators, and building the MXNet
model directly. Similarly, I can convert the MXNet model to ONNX format at
this layer.
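
The core of option 1 is an operator-by-operator translation. A minimal sketch,
assuming the ONNX graph has already been parsed into plain
(op_type, inputs, output_name) tuples: the tuple layout, the
`translate_nodes` helper, and the four-entry table are illustrative
assumptions, not a complete or authoritative mapping.

```python
# Illustrative subset of an ONNX -> MXNet operator table. Each entry gives
# the MXNet operator name plus any attributes the MXNet operator needs.
ONNX_TO_MXNET_OP = {
    "Conv": ("Convolution", {}),
    "Gemm": ("FullyConnected", {}),
    "Relu": ("Activation", {"act_type": "relu"}),
    "MaxPool": ("Pooling", {"pool_type": "max"}),
}


def translate_nodes(onnx_nodes):
    """Translate (op_type, inputs, output_name) tuples into MXNet-style records.

    Raises NotImplementedError on the first operator with no known mapping,
    so unsupported models fail loudly instead of being converted partially.
    """
    translated = []
    for op_type, inputs, output_name in onnx_nodes:
        if op_type not in ONNX_TO_MXNET_OP:
            raise NotImplementedError("no MXNet mapping for ONNX op %r" % op_type)
        mx_op, attrs = ONNX_TO_MXNET_OP[op_type]
        translated.append({
            "op": mx_op,
            "name": output_name,
            "inputs": list(inputs),
            "attrs": dict(attrs),
        })
    return translated
```

Export at this layer would use the inverse table, walking the MXNet symbol
graph and emitting the corresponding ONNX nodes.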


2) The DMLC community has released the nnvm/tvm compiler and an
intermediate representation of the models; refer to:
http://www.tvmlang.org/2017/10/06/nnvm/tvm-compiler-announcement.html


Based on the conversation on the GitHub issue I opened, Mu
mentioned that MXNet would use nnvm/tvm as the backend in the future.


We could hook into this layer to implement the import/export functionality.
nnvm/tvm has ONNX 0.1 version import implemented.

For import:

   1. I will need to enhance nnvm/tvm’s importer to support ONNX 0.2.
   2. Implement nnvm/tvm->mxnet symbolic operators.

For export:

   1. mxnet->nnvm/tvm (nnvm/tvm provides this implementation already).
   2. I will need to implement nnvm/tvm->onnx.
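
The two-hop route through an intermediate representation can be sketched as
follows. This is only a toy model of the idea: the `CONVERTERS` table, the
`converter` decorator, the `convert` function, and the dict-based "models"
are all hypothetical, and "ir" merely stands in for the nnvm/tvm graph.

```python
# Hypothetical converter table keyed by (source_format, target_format).
CONVERTERS = {}


def converter(src, dst):
    """Decorator that registers a function as the src -> dst converter."""
    def decorator(fn):
        CONVERTERS[(src, dst)] = fn
        return fn
    return decorator


def _retag(model, fmt):
    # Toy conversion: keep the graph, change only the format tag.
    return {"format": fmt, "graph": list(model["graph"])}


@converter("onnx", "ir")
def onnx_to_ir(model):
    return _retag(model, "ir")


@converter("ir", "mxnet")
def ir_to_mxnet(model):
    return _retag(model, "mxnet")


@converter("mxnet", "ir")
def mxnet_to_ir(model):
    return _retag(model, "ir")


@converter("ir", "onnx")
def ir_to_onnx(model):
    return _retag(model, "onnx")


def convert(model, src, dst, hub="ir"):
    """Use a direct converter if one exists, otherwise go through the hub."""
    if (src, dst) in CONVERTERS:
        return CONVERTERS[(src, dst)](model)
    to_hub = CONVERTERS.get((src, hub))
    from_hub = CONVERTERS.get((hub, dst))
    if to_hub is None or from_hub is None:
        raise ValueError("no conversion path from %r to %r" % (src, dst))
    return from_hub(to_hub(model))
```

With a hub, each new format needs only one converter into the hub and one out
of it, instead of a separate converter for every pair of formats.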


These are the pros and cons I see in the above approaches:

   1. Import/export at mxnet layer

Pros:

   1. Stable APIs currently used by users.
   2. Larger Apache MXNet community of contributors.
   3. CI pipeline to catch bugs.
   4. Comparatively less time to implement and put it in the hands of the
      users.

Cons:

   1. In the future we may have to reimplement at the nnvm/tvm layer, in case
      MXNet moves to the nnvm/tvm backend (assuming it will move).



   2. Import/export at nnvm/tvm layer

Pros:

   1. Less engineering work in case mxnet moves to nnvm/tvm.
   2. nnvm/tvm would become a hub to convert to different formats.
   3. nnvm operators are more in parity with mxnet’s gluon APIs; this could be
      useful in case Gluon becomes the only standard that MXNet will support.

Cons:

   1. Nascent project with few contributors.
   2. Does not support all operators that exist in the MXNet Symbolic API.
   3. No CI pipeline.
   4. Current Apache MXNet project does not use the nnvm/tvm backend.
   5. mxnet->nnvm/tvm backend needs more testing and user feedback.


Any suggestions on either of these approaches? From the user's perspective,
this will be an implementation detail that is not exposed.

Thanks,

Roshani