Requesting slack access

2018-07-30 Thread Juan Vercellone
VERCELLONE, Juan.
(also known as 1010ad1c97efb4734854b6ffd0899401)


Re: Release blocker: non-deterministic forward in gluon

2018-07-30 Thread Hagay Lupesko
Thanks Pedro.
Good to know you think it is important as well. I hope the community can
review a proposal on the CWiki soon; that would be great.

On Mon, Jul 30, 2018 at 4:26 AM Pedro Larroy 
wrote:

> Hi Hagay
>
> We are aware of this and we are working in this direction, which, as you
> point out, is more desirable.
> There's a huge amount of non-trivial work from Sheng that has gone into
> building these distribution packages, which needs to be adapted for our CI
> system and taken into consideration.
>
> Pedro.
>
>
> On Mon, Jul 30, 2018 at 9:07 AM Hagay Lupesko  wrote:
>
> > Thanks Tong for root-causing the issue!
> > Thanks Sheng for following up with an updated PyPi package.
> >
> > What worries me is that we seem to build MXNet PyPi distribution packages
> > with a build config different than the CI where all of the tests are
> > running.
> > Looking here [1
> > <
> >
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
> > >]
> > it seems that MXNet CI Ubuntu build uses libopenblas-dev v0.2.18, while
> > PyPi build for MXNet 1.2.1 used v0.3.2 (I would imagine, from the PyPi
> distribution?)
> >
> > Needless to say, if we don't make sure the PyPi distribution is aligned
> > with the CI build, similar issues can happen again with other
> dependencies.
> > I'd think we want the build configs to be the same, or better yet have
> the
> > PyPi package be built from the output produced by the CI.
> > Thoughts?
> >
> > [1]
> >
> >
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
> >
> >
> > On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha  wrote:
> >
> > > Tong,
> > >
> > > That's great news. I'm glad that OpenBLAS people are responding so
> > quickly.
> > > In that case it's probably a better idea to use that version instead.
> The
> > > latest OpenBLAS version brings many optimizations for all kinds of
> > hardware.
> > >
> > > -sz
> > >
> > > On Fri, Jul 27, 2018 at 11:10 AM, Tong He  wrote:
> > >
> > > > Hi Sheng,
> > > >
> > > > I also opened an issue on OpenBLAS repo:
> > > > https://github.com/xianyi/OpenBLAS/issues/1700 .
> > > >
> > > > As I was informed that "0.3.2 should be released this weekend", I tested
> > > > their develop branch as well, and it seems the new version has fixed
> > > > the bug.
> > > >
> > > > Since OpenBLAS 0.3.2 could also bring performance improvements, I
> > > > propose to wait for OpenBLAS 0.3.2 for our pip post release.
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Tong He
> > > >
> > > > 2018-07-27 10:54 GMT-07:00 Sheng Zha :
> > > >
> > > > > Forgot to mention, the post release version is a pip package
> version.
> > > > >
> > > > > -sz
> > > > >
> > > > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha 
> > wrote:
> > > > > >
> > > > > > In this case we can regard it as a release problem, which is
> > usually
> > > > > what post release versions are for. It’s still the same release
> with
> > > > > a different dependency, so there is no code change needed.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > >
> > > > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <
> > > steffenroc...@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >> Hi Tong - thanks for root-causing the problem.
> > > > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with the fix be
> > released
> > > as
> > > > > >> 1.2.2?
> > > > > >> Steffen
> > > > > >>
> > > > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha 
> > > > wrote:
> > > > > >>>
> > > > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > > > >>>
> > > > > >>> Thanks to Tong's dedication, the root cause for this issue was
> > > > > identified
> > > > > >>> to be instability in OpenBLAS's latest stable version 0.3.1.
> For
> > > > > details,
> > > > > >>> see Tong's comment
> > > > > >>> <
> > > > > >>> https://github.com/apache/incubator-mxnet/issues/11853#
> > > > > issuecomment-408272772
> > > > > 
> > > > > >>> .
> > > > > >>>
> > > > > >>> Since both the nightly build and the 1.2.1 wheels are affected,
> > we
> > > > > >>> recommend that we stay on OpenBLAS's last known stable version
> > 0.2.20
> > > > > that
> > > > > >>> we've been using. I will assume lazy consensus and prepare the
> > fix
> > > > > >>> (1.2.1.post0).
> > > > > >>>
> > > > > >>> -sz
> > > > > >>>
> > > > >  On Tue, Jul 24, 2018 at 3:35 PM, Tong He 
> > wrote:
> > > > > 
> > > > >  Recently there's an issue regarding the inconsistent result
> from
> > > > gluon
> > > > >  forward:
> > > > > 
> > > > >  https://github.com/apache/incubator-mxnet/issues/11853
> > > > > 
> > > > >  Given a constant input image and loaded pretrained parameters, we
> > > > >  expect a deterministic output from arbitrarily many repeated forward
> > > > >  passes. However, from the issue I see that the forwarded result is
> > > > >  non-deterministic. It is harmful as it makes the results from
> > > > >  experiments/benchmarks/inference meaningless.
> > > > > 
> > > > >  Therefore I propose to block the 1.3 release before it gets resolved.
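
[Editor's note: a minimal sketch of the determinism check described above, not
the script from the linked issue. It assumes an MXNet install with a pretrained
Gluon model-zoo model; with fixed parameters and a constant input, two forward
passes should be bit-identical.]

    import mxnet as mx
    import numpy as np
    from mxnet.gluon.model_zoo import vision

    net = vision.resnet18_v1(pretrained=True)  # any pretrained model will do
    x = mx.nd.ones((1, 3, 224, 224))           # constant "image"

    out1 = net(x).asnumpy()
    out2 = net(x).asnumpy()

    # With fixed parameters and a fixed input, repeated forwards should match exactly.
    print("deterministic:", np.array_equal(out1, out2))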

Re: Automated Flaky Test Detector

2018-07-30 Thread Carl Tsai
Minor correction:

I'm working with the *Apache* MXNet engine team at Amazon!

Carl

On Mon, Jul 30, 2018 at 5:42 PM Carl Tsai  wrote:

> Hi all,
>
>
>
> My name is Carl Tsai, I’m an intern working with the Amazon MXNet engine
> team. For the past several weeks, I’ve been working with my mentor Hao Jin
> on an automatic flaky test detection system. As you may know, there’s an
> ongoing effort to reduce the number of flaky tests in the code base, and
> this flaky test detector is meant to help ensure that these efforts will
> have a long-lasting impact on the quality of our tests.
>
>
>
> This is an automated tool that will be run on PR checks to detect flaky
> tests before we merge them. It will check PRs for new and modified test
> cases and run them through a flakiness checker so that we can have high
> confidence that our tests are not flaky. There’s a design overview on
> confluence
> —
> I’d be happy to hear any feedback!
>
>
>
> In the meantime, one of the components, the flakiness checker, has been
> merged under the tools/ folder and I’d encourage those of you who are
> working on flaky tests or writing new tests to try it out. Documentation
> can be found on the confluence page for the project as well as on the 
> Reproducing Test Results Page under tips and tricks, and I'm open to
> suggestions on how to improve it.
>
>
>
> Thanks,
>
> Carl
>


Automated Flaky Test Detector

2018-07-30 Thread Carl Tsai
Hi all,



My name is Carl Tsai, I’m an intern working with the Amazon MXNet engine
team. For the past several weeks, I’ve been working with my mentor Hao Jin
on an automatic flaky test detection system. As you may know, there’s an
ongoing effort to reduce the number of flaky tests in the code base, and
this flaky test detector is meant to help ensure that these efforts will
have a long-lasting impact on the quality of our tests.



This is an automated tool that will be run on PR checks to detect flaky
tests before we merge them. It will check PRs for new and modified test
cases and run them through a flakiness checker so that we can have high
confidence that our tests are not flaky. There’s a design overview on
confluence
—
I’d be happy to hear any feedback!



In the meantime, one of the components, the flakiness checker, has been
merged under the tools/ folder and I’d encourage those of you who are
working on flaky tests or writing new tests to try it out. Documentation
can be found on the confluence page for the project as well as on the
Reproducing Test Results Page under tips and tricks, and I'm open to
suggestions on how to improve it.



Thanks,

Carl
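
[Editor's note: for readers wondering what a flakiness check amounts to, here is
a minimal, hypothetical sketch of the idea, not the script merged under tools/
(whose interface is documented on the confluence page): run one test command
repeatedly and count the failing runs. The test path and name below are made up
for illustration.]

    import subprocess

    def count_failures(test_cmd, trials=20):
        """Run test_cmd `trials` times and return how many runs failed."""
        failures = 0
        for _ in range(trials):
            if subprocess.call(test_cmd) != 0:
                failures += 1
        return failures

    # Hypothetical example: hammer a single test case 20 times.
    cmd = ["nosetests", "tests/python/unittest/test_operator.py:test_example"]
    print("failed runs:", count_failures(cmd))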


Proposal for new merging/reshaping op

2018-07-30 Thread Taliesin Beynon
Hello all,

I’ve created a proposal for a new reshape op; it is described further at
https://discuss.mxnet.io/t/proposal-new-merge-dims-reshaping-op/1524

Please post feedback there!
Thanks,
Tali

Reminder: Office hours - Berlin

2018-07-30 Thread Marco de Abreu
Hello,

quick reminder about our office hours Tuesday (tomorrow) at 6pm-7pm (CEST)
| 9am-10am (PST):
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Berlin+Office+Hours

If you would like to participate, please follow the steps described in the
wiki.

This office hour will be held by me.

Best regards,
Marco


Re: Release plan - MXNET 1.3

2018-07-30 Thread Roshani Nagmote
Hi all,

Here is an update on MXNet 1.3 release:
I am still waiting for the following PRs to get merged:

TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
Scala examples:

https://github.com/apache/incubator-mxnet/pull/11753

https://github.com/apache/incubator-mxnet/pull/11621

*New code freeze date is 08/03.* Please try to get your ongoing PRs merged
by then.

@Pedro, I didn't include your PRs in the tracking list as you said those are
not critical for now. Please let me know if they need to be included.
https://github.com/apache/incubator-mxnet/pull/11636
https://github.com/apache/incubator-mxnet/pull/11562

I have also updated the project proposal cwiki page with the status of the PRs.


Please let me know if I am missing something.

Thanks,
Roshani


On Thu, Jul 26, 2018 at 1:34 PM Pedro Larroy 
wrote:

> I would like to get these PR merged:
>
> https://github.com/apache/incubator-mxnet/pull/11636
> https://github.com/apache/incubator-mxnet/pull/11562
>
> How much longer until the code freeze?
>
> On Thu, Jul 26, 2018 at 1:44 AM Roshani Nagmote  >
> wrote:
>
> > Hi all,
> >
> > PRs waiting to be merged for 1.3 release:
> > https://github.com/apache/incubator-mxnet/pull/11325
> >
> > Are there any other PRs waiting to get merged? Please let me know.
> >
> > Release blocker issue:
> > https://github.com/apache/incubator-mxnet/issues/11853
> >
> > @Marco, @Kellen, Thanks for bringing up the important topic. I agree with
> > you and we (the internal Amazon team) will be working on fixing the disabled
> > tests.
> > Currently, my colleague Hao Jin is working on compiling the list of
> > disabled tests and leading the effort to fix them in the next few days.
> >
> > Thanks,
> > Roshani
> >
> > On Mon, Jul 23, 2018 at 6:39 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Thanks again for organizing Roshani.  I believe the TensorRT work is
> > ready
> > > for a merge.  Thanks to Marek and all the NVIDIA people for iterating
> on
> > > it.  If possible could a committer review, make sure it meets their
> > > expectations and then merge?  PR is here:
> > > https://github.com/apache/incubator-mxnet/pull/11325
> > >
> > > To Marco's point.  I'd recommend we review some of those disabled tests
> > and
> > > see how likely they are to affect users before we cut a release.  Many
> of
> > > them are obviously not too important from a user's point of view (e.g.
> > > downloading a sometimes-offline image in a test).  One idea would be to
> > try
> > > and address as many of the customer impacting issues as possible
> between
> > > code freeze and the RC0 vote.
> > >
> > > On Mon, Jul 23, 2018 at 1:23 PM Marco de Abreu
> > >  wrote:
> > >
> > > > Hello Roshani,
> > > >
> > > > frequent releases are good and I'm supportive of this in general in
> > > order
> > > > to provide our users with the latest features and improvements. But
> at
> > > the
> > > > moment, I'm slightly concerned about the test coverage due to [1]. I
> > want
> > > > us to be conscious about cutting a release even though not all tests
> > are
> > > > enabled (29 disabled tests [2] as of today). However, I acknowledge
> > that
> > > we
> > > > have improved by a lot lately thanks to everybody participating and
> > > leading
> > > > the efforts around improving flaky tests. From a retrospective point
> of
> > > > view, we could say that these efforts have actually revealed some
> quite
> > > > interesting bugs and thus the time was well spent and yielded good
> > > results.
> > > >
> > > > What does the community think about making another sprint of
> > improvements
> > > > around tests, followed by a period of 1-2 weeks during which we
> > observe
> > > > the failures closely to ensure that no critical paths are impacted?
> If
> > we
> > > > are in a good shape by then, we could continue the release process
> and
> > at
> > > > the same time have the advantage of giving contributors more lead
> time
> > to
> > > > finish their work to ensure it gets into the release in the desired
> > > > quality.
> > > >
> > > > Again, thanks to everybody for their efforts during the last weeks to
> > > > improve the usability and stability of MXNet. This is great community
> > > > effort and a good example of a community working together towards a
> > > unified
> > > > goal!
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/d6d81401de796a96677a112d6cd0b074b01f46564194ea89b86c6a3e@%3Cdev.mxnet.apache.org%3E
> > > > [2]:
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3A%22Disabled+test%22
> > > >
> > > > On Mon, Jul 23, 2018 at 8:34 PM Roshani Nagmote <
> > > roshaninagmo...@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > >
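
[Editor's note: for anyone tracking the disabled-test count Marco mentions above
([2]), a small sketch of pulling the same list through the GitHub issues API
rather than the web UI, using the third-party requests library; the label name
is taken from the query URL quoted above.]

    import requests

    resp = requests.get(
        "https://api.github.com/repos/apache/incubator-mxnet/issues",
        params={"labels": "Disabled test", "state": "open", "per_page": 100},
    )
    resp.raise_for_status()
    issues = resp.json()
    print(len(issues), "open disabled-test issues")
    for issue in issues:
        print(issue["number"], issue["title"])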

Re: Release blocker: non-deterministic forward in gluon

2018-07-30 Thread Pedro Larroy
Hi Hagay

We are aware of this and we are working in this direction, which, as you
point out, is more desirable.
There's a huge amount of non-trivial work from Sheng that has gone into
building these distribution packages, which needs to be adapted for our CI
system and taken into consideration.

Pedro.


On Mon, Jul 30, 2018 at 9:07 AM Hagay Lupesko  wrote:

> Thanks Tong for root-causing the issue!
> Thanks Sheng for following up with an updated PyPi package.
>
> What worries me is that we seem to build MXNet PyPi distribution packages
> with a build config different than the CI where all of the tests are
> running.
> Looking here [1
> <
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
> >]
> it seems that MXNet CI Ubuntu build uses libopenblas-dev v0.2.18, while
> PyPi build for MXNet 1.2.1 used v0.3.2 (I would imagine, from the PyPi distribution?)
>
> Needless to say, if we don't make sure the PyPi distribution is aligned
> with the CI build, similar issues can happen again with other dependencies.
> I'd think we want the build configs to be the same, or better yet have the
> PyPi package be built from the output produced by the CI.
> Thoughts?
>
> [1]
>
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
>
>
> On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha  wrote:
>
> > Tong,
> >
> > That's great news. I'm glad that OpenBLAS people are responding so
> quickly.
> > In that case it's probably a better idea to use that version instead. The
> > latest OpenBLAS version brings many optimizations for all kinds of
> hardware.
> >
> > -sz
> >
> > On Fri, Jul 27, 2018 at 11:10 AM, Tong He  wrote:
> >
> > > Hi Sheng,
> > >
> > > I also opened an issue on OpenBLAS repo:
> > > https://github.com/xianyi/OpenBLAS/issues/1700 .
> > >
> > > As I was informed that "0.3.2 should be released this weekend", I tested
> > > their develop branch as well, and it seems the new version has fixed
> > > the bug.
> > >
> > > Since OpenBLAS 0.3.2 could also bring performance improvements, I
> > > propose to wait for OpenBLAS 0.3.2 for our pip post release.
> > >
> > >
> > > Best regards,
> > >
> > > Tong He
> > >
> > > 2018-07-27 10:54 GMT-07:00 Sheng Zha :
> > >
> > > > Forgot to mention, the post release version is a pip package version.
> > > >
> > > > -sz
> > > >
> > > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha 
> wrote:
> > > > >
> > > > > In this case we can regard it as a release problem, which is
> usually
> > > > what post release versions are for. It’s still the same release with
> > > > a different dependency, so there is no code change needed.
> > > > >
> > > > > -sz
> > > > >
> > > > >
> > > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <
> > steffenroc...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> Hi Tong - thanks for root-causing the problem.
> > > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with the fix be
> released
> > as
> > > > >> 1.2.2?
> > > > >> Steffen
> > > > >>
> > > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha 
> > > wrote:
> > > > >>>
> > > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > > >>>
> > > > >>> Thanks to Tong's dedication, the root cause for this issue was
> > > > identified
> > > > >>> to be instability in OpenBLAS's latest stable version 0.3.1. For
> > > > details,
> > > > >>> see Tong's comment
> > > > >>> <
> > > > >>> https://github.com/apache/incubator-mxnet/issues/11853#
> > > > issuecomment-408272772
> > > > 
> > > > >>> .
> > > > >>>
> > > > >>> Since both the nightly build and the 1.2.1 wheels are affected,
> we
> > > > >>> recommend that we stay on OpenBLAS's last known stable version
> 0.2.20
> > > > that
> > > > >>> we've been using. I will assume lazy consensus and prepare the
> fix
> > > > >>> (1.2.1.post0).
> > > > >>>
> > > > >>> -sz
> > > > >>>
> > > >  On Tue, Jul 24, 2018 at 3:35 PM, Tong He 
> wrote:
> > > > 
> > > >  Recently there's an issue regarding the inconsistent result from
> > > gluon
> > > >  forward:
> > > > 
> > > >  https://github.com/apache/incubator-mxnet/issues/11853
> > > > 
> > > >  Given a constant input image and loaded pretrained parameters,
> we
> > > > expect
> > > > >>> a
> > > >  deterministic output from arbitrary repeats of forwards. However
> > > from
> > > > the
> > > >  issue I see that the forwarded result is non-deterministic. It is
> > > > harmful
> > > > >>> as
> > > >  it makes the results from experiments/benchmarks/inference
> > > > meaningless.
> > > > 
> > > >  Therefore I propose to block the 1.3 release before it gets
> > > resolved.
> > > > 
> > > > >>>
> > > >
> > >
> >
>
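
[Editor's note: related to the version-mismatch discussion above, a hedged
sketch of one way to confirm which OpenBLAS a given machine resolves at
runtime. It assumes a shared libopenblas is discoverable on the system; a wheel
that links OpenBLAS statically would need a different check, such as inspecting
its build log.]

    import ctypes
    import ctypes.util

    path = ctypes.util.find_library("openblas")
    if path is None:
        print("no shared libopenblas found on this system")
    else:
        lib = ctypes.CDLL(path)
        lib.openblas_get_config.restype = ctypes.c_char_p
        # openblas_get_config() returns a string describing the build;
        # recent releases include the version number in it.
        print(lib.openblas_get_config().decode())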


Re: Release blocker: non-deterministic forward in gluon

2018-07-30 Thread Hagay Lupesko
Thanks Tong for root-causing the issue!
Thanks Sheng for following up with an updated PyPi package.

What worries me is that we seem to build MXNet PyPi distribution packages
with a build config different than the CI where all of the tests are
running.
Looking here [1]
it seems that MXNet CI Ubuntu build uses libopenblas-dev v0.2.18, while
PyPi build for MXNet 1.2.1 used v0.3.2 (I would imagine, from the PyPi
distribution?)

Needless to say, if we don't make sure the PyPi distribution is aligned
with the CI build, similar issues can happen again with other dependencies.
I'd think we want the build configs to be the same, or better yet have the
PyPi package be built from the output produced by the CI.
Thoughts?

[1]
https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh


On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha  wrote:

> Tong,
>
> That's great news. I'm glad that OpenBLAS people are responding so quickly.
> In that case it's probably a better idea to use that version instead. The
> latest OpenBLAS version brings many optimizations for all kinds of hardware.
>
> -sz
>
> On Fri, Jul 27, 2018 at 11:10 AM, Tong He  wrote:
>
> > Hi Sheng,
> >
> > I also opened an issue on OpenBLAS repo:
> > https://github.com/xianyi/OpenBLAS/issues/1700 .
> >
> > As I was informed that "0.3.2 should be released this weekend", I tested
> > their develop branch as well, and it seems the new version has fixed the bug.
> >
> > Since OpenBLAS 0.3.2 could also bring performance improvements, I
> > propose to wait for OpenBLAS 0.3.2 for our pip post release.
> >
> >
> > Best regards,
> >
> > Tong He
> >
> > 2018-07-27 10:54 GMT-07:00 Sheng Zha :
> >
> > > Forgot to mention, the post release version is a pip package version.
> > >
> > > -sz
> > >
> > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha  wrote:
> > > >
> > > > In this case we can regard it as a release problem, which is usually
> > > what post release versions are for. It’s still the same release with
> > > a different dependency, so there is no code change needed.
> > > >
> > > > -sz
> > > >
> > > >
> > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <
> steffenroc...@gmail.com>
> > > wrote:
> > > >>
> > > >> Hi Tong - thanks for root-causing the problem.
> > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with the fix be released
> as
> > > >> 1.2.2?
> > > >> Steffen
> > > >>
> > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha 
> > wrote:
> > > >>>
> > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > >>>
> > > >>> Thanks to Tong's dedication, the root cause for this issue was
> > > identified
> > > >>> to be instability in OpenBLAS's latest stable version 0.3.1. For
> > > details,
> > > >>> see Tong's comment
> > > >>> <
> > > >>> https://github.com/apache/incubator-mxnet/issues/11853#
> > > issuecomment-408272772
> > > 
> > > >>> .
> > > >>>
> > > >>> Since both the nightly build and the 1.2.1 wheels are affected, we
> > > >>> recommend that we stay on OpenBLAS's last known stable version 0.2.20
> > > that
> > > >>> we've been using. I will assume lazy consensus and prepare the fix
> > > >>> (1.2.1.post0).
> > > >>>
> > > >>> -sz
> > > >>>
> > >  On Tue, Jul 24, 2018 at 3:35 PM, Tong He  wrote:
> > > 
> > >  Recently there's an issue regarding the inconsistent result from
> > gluon
> > >  forward:
> > > 
> > >  https://github.com/apache/incubator-mxnet/issues/11853
> > > 
> > >  Given a constant input image and loaded pretrained parameters, we
> > > expect
> > > >>> a
> > >  deterministic output from arbitrary repeats of forwards. However
> > from
> > > the
> > >  issue I see that the forwarded result is non-deterministic. It is
> > > harmful
> > > >>> as
> > >  it makes the results from experiments/benchmarks/inference
> > > meaningless.
> > > 
> > >  Therefore I propose to block the 1.3 release before it gets
> > resolved.
> > > 
> > > >>>
> > >
> >
>
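
[Editor's note: as a footnote to the 1.2.1.post0 discussion above, a small
sketch of how pip orders a post release under PEP 440: it sorts after the base
release but before the next patch version, which is why it can carry a
packaging-only fix without implying a code change. Uses the third-party
packaging library.]

    from packaging.version import Version

    assert Version("1.2.1") < Version("1.2.1.post0") < Version("1.2.2")
    print(sorted(["1.2.2", "1.2.1.post0", "1.2.1"], key=Version))
    # ['1.2.1', '1.2.1.post0', '1.2.2']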