Re: Release blocker: non-deterministic forward in gluon

2018-07-31 Thread kellen sunderland
I'd agree that we should have a repeatable process for generating
artifacts.  It would be useful for Apache release reviewers to be able to
double check the results we get in CI, and it would help give a consistent
experience for users.

I'm a little uncomfortable with the idea of generating the actual artifacts
from the CI account.  The CI account is designed to run arbitrary code from
the internet.  Generating a native binary that gets distributed to a bunch
of computers from this account seems like an unnecessary security risk.  We
could very simply run artifact builds in a different environment for which
the entire internet does not have execute permissions.  I'm not strongly
against the idea of releasing from the current CI account, but I think we
should be careful about how tightly we want to couple these processes.
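
One way a release reviewer could double check that a rebuilt artifact matches
the one produced elsewhere is to compare cryptographic digests. The following
is only a minimal sketch, assuming the build is byte-for-byte reproducible; the
helper names and any file paths passed to them are hypothetical and are not
part of any existing MXNet tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifacts_match(path_a, path_b):
    """True if both files have identical contents (compared via SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A reviewer would call `artifacts_match` on the artifact built in the isolated
release environment and on the one built locally; a mismatch only says the
bytes differ, not why.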


Re: Release blocker: non-deterministic forward in gluon

2018-07-30 Thread Hagay Lupesko
Thanks Pedro.
Good to know you think it is important as well. I hope the community can
review a proposal on the CWiki soon; that would be great.


Re: Release blocker: non-deterministic forward in gluon

2018-07-30 Thread Pedro Larroy
Hi Hagay

We are aware of this and are working in this direction, which, as you
point out, is more desirable.
Sheng has put a huge amount of non-trivial work into building these
distribution packages, and that work needs to be taken into consideration
and adapted for our CI system.

Pedro.




Re: Release blocker: non-deterministic forward in gluon

2018-07-30 Thread Hagay Lupesko
Thanks Tong for root-causing the issue!
Thanks Sheng for following up with an updated PyPI package.

What worries me is that we seem to build the MXNet PyPI distribution
packages with a build config different from the one used in CI, where all
of the tests are running.
Looking here [1], it seems that the MXNet CI Ubuntu build uses
libopenblas-dev v0.2.18, while the PyPI build for MXNet 1.2.1 used v0.3.2
(I would imagine the PyPI distribution?)

Needless to say, if we don't make sure the PyPI distribution is aligned
with the CI build, similar issues can happen again with other dependencies.
I'd think we want the build configs to be the same, or better yet have the
PyPI package be built from the output produced by the CI.
Thoughts?

[1]
https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
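
A check along these lines could flag such a mismatch before a package is
published. This is only a sketch: the function name, the dictionary layout, and
the `examplelib` entry are hypothetical, and the OpenBLAS versions are simply
the ones mentioned above. How the two dictionaries would be populated from the
real CI and packaging scripts is left open.

```python
def find_version_drift(ci_deps, release_deps):
    """Return {name: (ci_version, release_version)} for every mismatch.

    A dependency missing from one side is reported with None on that side.
    """
    drift = {}
    for name in sorted(set(ci_deps) | set(release_deps)):
        ci_version = ci_deps.get(name)
        release_version = release_deps.get(name)
        if ci_version != release_version:
            drift[name] = (ci_version, release_version)
    return drift


if __name__ == "__main__":
    # Hypothetical pinned versions for the two build environments.
    ci = {"openblas": "0.2.18", "examplelib": "1.0"}
    release = {"openblas": "0.3.2", "examplelib": "1.0"}
    for name, (ci_v, rel_v) in find_version_drift(ci, release).items():
        print(f"{name}: CI uses {ci_v}, release build uses {rel_v}")
```

A release script could refuse to publish whenever the returned dictionary is
non-empty.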




Re: Release blocker: non-deterministic forward in gluon

2018-07-27 Thread Sheng Zha
Tong,

That's great news. I'm glad that the OpenBLAS people are responding so quickly.
In that case it's probably a better idea to use that version instead. The
latest OpenBLAS version brings many optimizations for all kinds of hardware.

-sz



Re: Release blocker: non-deterministic forward in gluon

2018-07-27 Thread Tong He
Hi Sheng,

I also opened an issue on the OpenBLAS repo:
https://github.com/xianyi/OpenBLAS/issues/1700 .

Having been informed that "0.3.2 should be released this weekend", I tested
their develop branch as well, and it seems the new version has fixed the bug.

Since OpenBLAS 0.3.2 may also bring performance improvements, I propose
that we wait for OpenBLAS 0.3.2 for our pip post release.


Best regards,

Tong He



Re: Release blocker: non-deterministic forward in gluon

2018-07-27 Thread Sheng Zha
Forgot to mention, the post release version is a pip package version.

-sz



Re: Release blocker: non-deterministic forward in gluon

2018-07-27 Thread Sheng Zha
In this case we can regard it as a release problem, which is usually what
post release versions are for. It’s still the same release with a different
dependency, so there is no code change needed.

-sz
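
For readers unfamiliar with post releases: under PEP 440, a version such as
1.2.1.post0 sorts after 1.2.1 and before 1.2.2. The sketch below illustrates
that ordering with a hand-written key function that covers only plain
`X.Y.Z` and `X.Y.Z.postN` strings. It is an illustrative subset, not a full
PEP 440 implementation, and `sort_key` is a hypothetical helper name.

```python
import re

# Matches "1.2.1" or "1.2.1.post0"; other PEP 440 forms are rejected.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:\.post(\d+))?$")


def sort_key(version):
    """Sortable key for 'X.Y.Z' or 'X.Y.Z.postN' version strings."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # A plain release sorts before any of its post releases, so it gets -1.
    post = int(match.group(2)) if match.group(2) is not None else -1
    return (release, post)


if __name__ == "__main__":
    versions = ["1.2.2", "1.2.1.post0", "1.2.1"]
    print(sorted(versions, key=sort_key))
    # prints ['1.2.1', '1.2.1.post0', '1.2.2']
```

Under this ordering, installers that pick the highest 1.2.1-series version
choose the repackaged wheel, and a later 1.2.2 would still supersede it.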




Re: Release blocker: non-deterministic forward in gluon

2018-07-27 Thread Steffen Rochel
Hi Tong - thanks for root-causing the problem.
Sheng - what is 1.2.1.post0? Shouldn't a patch with a fix be released as
1.2.2?
Steffen



Re: Release blocker: non-deterministic forward in gluon

2018-07-26 Thread Sheng Zha
Dear users and developers of Apache MXNet (Incubating),

Thanks to Tong's dedication, the root cause for this issue was identified
to be instability in OpenBLAS's latest stable version 0.3.1. For details,
see Tong's comment
<https://github.com/apache/incubator-mxnet/issues/11853#issuecomment-408272772>.

Since both the nightly build and the 1.2.1 wheels are affected, we
recommend staying on OpenBLAS's last known stable version, 0.2.20, which
we've been using. I will assume lazy consensus and prepare the fix
(1.2.1.post0).

-sz



Release blocker: non-deterministic forward in gluon

2018-07-24 Thread Tong He
An issue was recently reported regarding inconsistent results from the
Gluon forward pass:

https://github.com/apache/incubator-mxnet/issues/11853

Given a constant input image and loaded pretrained parameters, we expect a
deterministic output no matter how many times the forward pass is repeated.
However, the issue shows that the forward result is non-deterministic. This
is harmful because it makes the results of experiments, benchmarks, and
inference meaningless.

Therefore I propose to block the 1.3 release until it is resolved.
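
The property described above can be expressed as a small regression check.
The sketch below is framework-independent: `forward` is a hypothetical
stand-in for a network's forward pass, and the two toy functions merely
simulate a correct and a faulty backend. It is not the reproduction script
from the linked issue.

```python
def is_deterministic(forward, x, repeats=10):
    """Return True if `forward(x)` gives identical output on every call."""
    reference = list(forward(x))
    for _ in range(repeats - 1):
        # Exact comparison on purpose: same input and weights should give
        # bit-identical results on the same machine and build.
        if list(forward(x)) != reference:
            return False
    return True


def stable_forward(x):
    # Stand-in for a correct forward pass: a fixed linear map.
    return [2.0 * v + 1.0 for v in x]


def make_flaky_forward():
    # Stand-in for the reported bug: output depends on hidden call state.
    calls = 0

    def forward(x):
        nonlocal calls
        calls += 1
        return [2.0 * v + 1.0 + 1e-6 * (calls % 2) for v in x]

    return forward


if __name__ == "__main__":
    image = [0.1, 0.2, 0.3]
    print(is_deterministic(stable_forward, image))
    print(is_deterministic(make_flaky_forward(), image))
```

With a real model, `forward` would wrap the network call and convert its
output to a plain list before comparison.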