Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread sandeep krishnamurthy
   1. As an Apache MXNet community member, I raised the concern of broken
   functionality for the user. I explained and provided data points on the
   issue, the workaround, and why I think it is important. If, after all
   this, you think my vote is biased by my employer just because a user I
   quoted is from Amazon, that concerns me more, because it calls my voting
   into question.
   2. My -1 in no way undermines the huge amount of effort that goes on
   behind the scenes for a release to happen. Great respect and recognition
   to everyone involved in this release and in all past releases of MXNet. I
   voted based on my judgement of what may be good for the users of MXNet.
   3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free
   to decide and proceed with the release, as we already have more than
   three +1 votes in this thread.


Best,

Sandeep

Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Chris Olivier
btw, there are no vetoes on package releases:

VOTES ON PACKAGE RELEASES


Votes on whether a package is ready to be released use majority approval
-- i.e. at least three PMC members must vote affirmatively for release, and
there must be more positive than negative votes. Releases may not be
vetoed. Generally the community will cancel the release vote if anyone
identifies serious problems, but in most cases the ultimate decision, lies
with the individual serving as release manager. The specifics of the
process may vary from project to project, but the 'minimum quorum of three
+1 votes' rule is universal.
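
The majority-approval rule quoted above can be written as a small check. This is a hypothetical sketch, not an ASF tool: the Vote record and release_vote_passes function are invented for illustration, and the sketch assumes that only binding (PMC) votes count toward both the three +1 quorum and the positive-versus-negative comparison. A -1 is simply one more negative vote here; nothing in the function lets a single vote block the release.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int      # +1, 0 or -1 (a "-0" or "+0" is recorded as 0)
    binding: bool   # assumption: True if cast by a PMC member


def release_vote_passes(votes):
    """Return True if a release vote meets majority approval.

    Assumption: only binding votes are counted for both conditions.
    A -1 is treated as an ordinary negative vote, never as a veto.
    """
    binding_values = [v.value for v in votes if v.binding]
    plus = sum(1 for x in binding_values if x > 0)
    minus = sum(1 for x in binding_values if x < 0)
    # At least three binding +1 votes, and more +1 than -1 votes.
    return plus >= 3 and plus > minus


votes = [
    Vote("pmc-a", +1, True),
    Vote("pmc-b", +1, True),
    Vote("pmc-c", +1, True),
    Vote("pmc-d", -1, True),
    Vote("pmc-e", -1, True),
    Vote("contributor", 0, False),
]
print(release_vote_passes(votes))  # True
```

With three binding +1 votes against two binding -1 votes the check passes; two binding +1 votes on their own would not meet the quorum, however few -1 votes there were.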

On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha  wrote:

> Thanks for sharing your opinions, Thomas. Your recognition and respect of
> people's efforts on preparing the release candidate are certainly
> appreciated.
>
> Now that the vote is set to fail thanks to the veto, there will be plenty
> of opportunities to include those bug fixes, including the one Zhi
> mentioned [1], which was already merged in the master and yet chose not to
> block this release with [2]. I will be happy to work with Roshani to
> prepare another release candidate once ready.
>
> -sz
>
> [1]
>
> https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
> [2]
>
> https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E

Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Naveen Swamy
"Releases may not be vetoed"
http://www.apache.org/legal/release-policy.html#release-approval

I haven't tested the release yet, I'll do so tomorrow.

Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Thomas DELTEIL
-0
(non-binding)

If I may add some nuance plus a personal data point as one of the users
commenting on the bug report in question:

- Performance vs. Basic functionality => I don't think high performance
use-cases and basic functionality are two obviously opposed concepts, and I
see no contradiction between Hagay's and Sandeep's statements.
Float16 support is a feature of MXNet that provides more than twice the
performance of Float32 on supported platforms, hence the high performance
use-case. The bug is that the basic functionality of reloading a saved
float16 model is currently broken.

- This bug vs Other bugs => Contrary to the vast majority of the 140 open
bugs Sheng referred to, I would put to Sandeep's credit that this one bug
has an open PR that provides a fix for it. This would make it a better
candidate for inclusion in this release than a bug that has no fix ready.

- Personal datapoint: I recently did some experimentation with float16 [1]
and coincidentally just published a video on optimizing performance for
Gluon. Float16 conversion is one of the most, if not the most, effective
ways to get performance out of MXNet [2]. I believe there is a lot of value
in publicizing its use more, and hence in making sure that at least basic
support for normal use-cases is present.

Of course this needs to be balanced against the overhead of preparing a new
release candidate once the fix is reviewed and merged, which seems to be a
lengthy and complex process in its own right, and against the delay in
providing the other features present in 1.3 to users who are not running
off the nightly builds.

All the best,

Thomas

[1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
[2]
https://www.youtube.com/watch?v=Cqo7FPftNyo=0s=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m

Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Sheng Zha
Sandeep,

Thanks for explaining your veto. We have open bugs that impacted a lot more
than just 3 customers, judging just by the number of commenters on the
issues [1].

You said that this is for "high performance use cases", which contradicts
Hagay's assessment that this is "basic functionality broken". Given that
this is for advanced use cases of half-precision training, why is it so
much more important than any other open bug report that, for this specific
bug fix, we have to delay regular users' access to the new MXNet 1.3
release by at least another week?

Honestly, I'm concerned that your vote is biased by Amazon involvement,
given that you quoted Amazon Rekognition.

-sz

[1]
https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc

Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread sandeep krishnamurthy
My initial vote of “-0” was due to a lack of information from a user who had
said he overcame this issue for an FP16 model.


However, the suggested workaround [1] for the issue is neither
straightforward nor generally usable by all users. Also, the issue is not
simple and isolated enough to be listed in the Release Notes as a known
issue with a workaround.


Changing my vote to: "-1 (binding)" owing to the user impact [3]



@Sheng:

1. Agreed, the bug has existed for a long time. However, FP16 and similar
optimizations were added later on, followed by users [2] adopting this
feature for high performance use cases. It is not OK to measure the
severity of a bug by how long it has existed; rather, we should look at who
is impacted now, and whether it affects a small subset of users with a
simple workaround or is a large, user-impacting issue.

2. Agreed, the bug was reported on 7/21. However, I became aware of this
issue on 08/29 and submitted the fix on 08/30. I also brought this to the
notice of the community, you, and the 1.3 release manager (Roshani) on the
RC0 proposal thread. And I would rather focus on the issue and its user
impact than on who identified it and who is fixing it.


Based on my discussion with 2 users, I think it is an important feature for
them to see in Apache MXNet v1.3.0.



Best,

Sandeep


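[1] Workaround used by the user.


import mxnet as mx

ctx = mx.cpu()  # not defined in the original snippet; use mx.gpu(0) etc. as needed

net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
params_fp16 = mx.nd.load('resnet34_fp16-.params')

# Keys in the saved file look like 'arg:<name>' or 'aux:<name>'; strip the
# prefix and cast each network parameter to the dtype it was saved with.
for k, v in params_fp16.items():
    new_key = k.split(':')[1]
    net_fp16.collect_params()[new_key].cast(v.dtype)

net_fp16.collect_params().load('resnet34_fp16-.params', ctx)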


[2] Amazon Rekognition


[3] User story: Train a model -> Cast it to FP16 -> Save the model -> Load
the model back: this last step does not work. Users have to cast every
parameter with the workaround mentioned above [1].
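
The mismatch behind [3] and the cast-then-load loop in [1] can be modeled without MXNet. This is a hypothetical sketch: NumPy arrays stand in for NDArrays, plain dicts stand in for the loaded .params file and for collect_params(), and load_strict is an invented helper that only approximates the dtype check that made the plain load fail. It is not MXNet's loader.

```python
import numpy as np

# Stand-in for mx.nd.load() output: prefixed keys, values saved as float16.
saved = {
    "arg:conv0_weight": np.ones((4, 3), dtype=np.float16),
    "aux:bn0_running_mean": np.zeros(4, dtype=np.float16),
}

# Stand-in for the freshly imported network's parameters, which are
# created with the default float32 dtype.
net_params = {
    "conv0_weight": np.zeros((4, 3), dtype=np.float32),
    "bn0_running_mean": np.zeros(4, dtype=np.float32),
}


def load_strict(params, saved_arrays):
    """Copy saved arrays into params, refusing any dtype mismatch
    (a hypothetical model of the failure described in [3])."""
    for key, value in saved_arrays.items():
        name = key.split(":", 1)[1]
        if params[name].dtype != value.dtype:
            raise TypeError(f"{name}: expected {params[name].dtype}, saved {value.dtype}")
        params[name] = value.copy()


# Workaround: strip the 'arg:'/'aux:' prefix and cast each parameter to the
# saved dtype before loading, mirroring the loop over params_fp16.items().
for key, value in saved.items():
    name = key.split(":", 1)[1]
    net_params[name] = net_params[name].astype(value.dtype)

load_strict(net_params, saved)
print({name: str(a.dtype) for name, a in net_params.items()})
# {'conv0_weight': 'float16', 'bn0_running_mean': 'float16'}
```

Casting first means every parameter already has the saved dtype, so the strict copy succeeds; calling load_strict on fresh float32 parameters raises instead, which is the situation users hit without the workaround.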

On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko  wrote:

> Hi Sheng,
>
> Addressing your questions:
>
> - "why this specific bug is more important than all the other known bugs,
> that this becomes a release blocker"
> I do not consider it to be more or less important than other fixes. It can
> be fixed and included in the release alongside the rest of the release
> content, right?
> From the description of the issue it seems important since it is blocking
> users from loading models that were previously trained and saved. There is
> nothing stopping the community from including this fix into 1.3.0,
> alongside the rest of the features and fixes.
>
> - "The bug exists since SymbolBlock was introduced a year ago and has
> survived at least three releases, so this is not a regression."
> I do not think I said it is a regression. However, the fact that a bug
> existed before does not mean it is OK to release it rather than fix it.
>
> - "Timeline-wise, this bug was reported on 7/21, but was not reported as
> release-blocker in the release discussion thread until 8/31 [1]. Neither
> its reporting as release-blocker nor its fix made it for the 8/3 code
> freeze."
> You are right; it would have been better to have this identified and fixed
> earlier and included before code freeze.
>
> - "The PR is still not ready yet as it doesn't have approval."
> I think it is waiting for your review.
>
> - "it would be great if you could provide some additional reasoning besides
> "X mentions the issue" or "fix was done by X""
> I have. Repeating what I wrote in my previous email for clarity: Basic
> functionality broken: loading a model (albeit one that was saved as
> non-FP32)
>
> So, yes - this issue seems to have been out there for a while, somehow went
> under the radar... but I think the key question is whether this blocks a
> basic functionality in MXNet. I believe so, hence my -1 vote.
>
> Hagay
>
> On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha  wrote:
>
> > Hi Hagay and Sandeep,
> >
> > Could you help us understand why this specific bug is more important than
> > all the other known bugs, that this becomes a release blocker?
> >
> > Some facts to consider:
> > - The bug exists since SymbolBlock was introduced a year ago and has
> > survived at least three releases, so this is not a regression.
> > - Timeline-wise, this bug was reported on 7/21, but was not reported as
> > release-blocker in the release discussion thread until 8/31 [1]. Neither
> > its reporting as release-blocker nor its fix made it for the 8/3 code
> > freeze.
> > - The PR is still not ready yet as it doesn't have approval.
> >
> > Hagay, it would be great if you could provide some additional reasoning
> > besides "X mentions the issue" or "fix was done by X". Thanks.
> >
> > -sz
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E
> >
> > On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko  wrote:
> >
> > > Sandeep mentions the issue of an error when user tries to load model
> > params
> > > trained/saved as FP16.
> > > https://github.com/apache/incubator-mxnet/issues/11849
> > > The fix was done by Sandeep:
> > > 

Re: [LAZY VOTE] Consolidating developer guide in one place (cwiki preferred)

2018-09-04 Thread Lin Yuan
+1

On Tue, Sep 4, 2018 at 1:46 PM Aaron Markham 
wrote:

> I'd like to call for a lazy vote on this before proceeding. Already had
> some +1s but let's be sure.
>
> The vote is to move developer guide info to cwiki. User guides would remain
> on the website.
>
> On Tue, Aug 21, 2018 at 12:53 PM sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > +1
> > Thanks Lin and Aaron. I agree that the website should cover all
> > user-facing documentation, with separate, consolidated and organized
> > developer-focused docs in one place (cwiki).
> >
> >
> > Note: Permissions on cwiki are currently not well managed, with many
> > people having full admin rights to edit/create/delete pages. Should be
> > fine for now, but when we start accumulating many documents and
> > resources, we should probably revisit Delete permissions.
> >
> >
> > On Tue, Aug 21, 2018 at 11:57 AM Lin Yuan  wrote:
> >
> > > Hi Aaron,
> > >
> > > Thanks for your answer. I think it's a very worthwhile effort to move all
> > > the developer-related content from the mxnet.io website to a dedicated
> > > developer site. Would you like to initiate this effort?
> > >
> > > Best,
> > >
> > > Lin
> > >
> > > On Wed, Aug 15, 2018 at 3:47 PM Haibin Lin 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham <
> > > aaron.s.mark...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Lin, I agree with this organization. If you feel like somethings
> > > > should
> > > > > be transitioned from the website to the wiki, I can help with that,
> > but
> > > > for
> > > > > the moment I've been suggesting that new developer-focused content
> be
> > > > > placed on the wiki.
> > > > >
> > > > > On Tue, Aug 14, 2018 at 10:40 AM, Lin Yuan 
> > > wrote:
> > > > >
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > As a developer, I noticed we have some developer guides scattered
> > > > > > across different websites (mxnet.io, cwiki):
> > > > > >
> > > > > > E.g.
> > > > > >
> > > > > > How to Create New Operators (Layers): [
> > > > > > https://mxnet.incubator.apache.org/faq/new_op.html]
> > > > > > A Guide to Implementing Sparse Operators in MXNet Backend [
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/A+
> > > > > > Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
> > > > > > ]
> > > > > >
> > > > > > When searching for developer guides by keyword, only one of them
> > > > > > can be found on either site.
> > > > > >
> > > > > > It will be more convenient for developers if all the developer
> > > > > > guides reside on cwiki and all user (non-developer) guides on the
> > > > > > mxnet.io website. We can add a link on mxnet.io to refer all
> > > > > > developers to cwiki for guidance.
> > > > > >
> > > > > > Any comment is appreciated.
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Lin
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>


Re: [LAZY VOTE] Consolidating developer guide in one place (cwiki preferred)

2018-09-04 Thread Aaron Markham
I'd like to call for a lazy vote on this before proceeding. Already had
some +1s but let's be sure.

The vote is to move developer guide info to cwiki. User guides would remain
on the website.

On Tue, Aug 21, 2018 at 12:53 PM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> +1
> Thanks Lin and Aaron. I agree that the website should cover all user-facing
> documentation, with separate, consolidated, and organized developer-focused
> docs in one place (cwiki).
>
>
> Note: Permissions on cwiki are currently not well managed, with many people
> having full admin rights to edit/create/delete pages. This should be fine for
> now, but when we start accumulating many documents and resources, we
> should probably revisit Delete permissions.
>
>
> On Tue, Aug 21, 2018 at 11:57 AM Lin Yuan  wrote:
>
> > Hi Aaron,
> >
> > Thanks for your answer. I think it's a very worthwhile effort to move all
> > the developer-related content from the mxnet.io website to a dedicated
> > developer
> > site. Would you like to initiate this effort?
> >
> > Best,
> >
> > Lin
> >
> > On Wed, Aug 15, 2018 at 3:47 PM Haibin Lin 
> > wrote:
> >
> > > +1
> > >
> > > On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham <
> > aaron.s.mark...@gmail.com>
> > > wrote:
> > >
> > > > Hi Lin, I agree with this organization. If you feel like something
> > > should
> > > > be transitioned from the website to the wiki, I can help with that,
> but
> > > for
> > > > the moment I've been suggesting that new developer-focused content be
> > > > placed on the wiki.
> > > >
> > > > On Tue, Aug 14, 2018 at 10:40 AM, Lin Yuan 
> > wrote:
> > > >
> > > > > Dear MXNet community,
> > > > >
> > > > > As a developer, I noticed we have some developer guides scattered
> > > > > across different websites (mxnet.io, cwiki):
> > > > >
> > > > > E.g.
> > > > >
> > > > > How to Create New Operators (Layers): [
> > > > > https://mxnet.incubator.apache.org/faq/new_op.html]
> > > > > A Guide to Implementing Sparse Operators in MXNet Backend [
> > > > > https://cwiki.apache.org/confluence/display/MXNET/A+
> > > > > Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
> > > > > ]
> > > > >
> > > > > When searching for developer guides by keyword, only one of them
> > > > > can be found on either site.
> > > > >
> > > > > It will be more convenient for developers if all the developer
> > > > > guides reside on cwiki and all user (non-developer) guides on the
> > > > > mxnet.io website. We can add a link on mxnet.io to refer all
> > > > > developers to cwiki for guidance.
> > > > >
> > > > > Any comment is appreciated.
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Lin
> > > > >
> > > >
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Sheng Zha
Hi Hagay and Sandeep,

Could you help us understand why this specific bug is more important than
all the other known bugs, such that it becomes a release blocker?

Some facts to consider:
- The bug has existed since SymbolBlock was introduced a year ago and has
survived at least three releases, so this is not a regression.
- Timeline-wise, this bug was reported on 7/21, but was not reported as a
release blocker in the release discussion thread until 8/31 [1]. Neither
its reporting as a release blocker nor its fix made it in time for the 8/3
code freeze.
- The PR is not ready yet, as it has not been approved.

Hagay, it would be great if you could provide some additional reasoning
besides "X mentions the issue" or "fix was done by X". Thanks.

-sz

[1]
https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E

On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko  wrote:

> Sandeep mentions the issue of an error when a user tries to load model
> params trained/saved as FP16.
> https://github.com/apache/incubator-mxnet/issues/11849
> The fix was done by Sandeep:
> https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
> cherry picked into the release branch.
>
> This seems like a release blocker to me:
> - Basic functionality broken: loading a model (albeit one that was
> saved as non-FP32)
> - Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)
>
> -1 (non binding)
>
> Hagay
>
>
>
> On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > "- 0"
> >
> > I believe the bug #11849, unable to import non-FP32 models into Gluon,
> > fixed in this PR #12412, is important for the users. I would rather pick
> > this fix into this release than plan a minor release later.
> >
> > Best,
> > Sandeep
> >
> >
> >
> > On Mon, Sep 3, 2018 at 2:34 PM Philip Cho 
> > wrote:
> >
> > > Actually, the command "git clone --recursive
> > > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now,
> > > never mind.
> > >
> > > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho 
> > > wrote:
> > >
> > > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > > deleted. We will have to merge #12448 before the release.
> > > >
> > > > Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.
> > > >
> > > > Philip.
> > > >
> > > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier 
> > wrote:
> > > >
> > > >> Checked out the tag, built and tested the Clojure package. +1
> > > >>
> > > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > > >> roshaninagmo...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > I would like to propose a vote to release Apache MXNet (incubating)
> > > >> > version 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end
> > > >> > at 7:00 PM PDT, Wednesday, Sept 5th.
> > > >> >
> > > >> > Link to release notes:
> > > >> > https://github.com/apache/incubator-mxnet/releases
> > > >> >
> > > >> > Link to release candidate 1.3.0.rc0:
> > > >> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > > >> >
> > > >> > View this page, click on "Build from Source", and use the source code
> > > >> > obtained from the 1.3.0.rc0 tag:
> > > >> > https://mxnet.incubator.apache.org/install/index.html
> > > >> >
> > > >> > Please remember to TEST first before voting accordingly:
> > > >> >
> > > >> > +1 = approve
> > > >> > +0 = no opinion
> > > >> > -1 = disapprove (provide reason)
> > > >> >
> > > >> > Thanks,
> > > >> > Roshani
> > > >> >
> > > >>
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread Hagay Lupesko
Sandeep mentions the issue of an error when a user tries to load model params
trained/saved as FP16.
https://github.com/apache/incubator-mxnet/issues/11849
The fix was done by Sandeep:
https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be
cherry picked into the release branch.

This seems like a release blocker to me:
- Basic functionality broken: loading a model (albeit one that was
saved as non-FP32)
- Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)

-1 (non binding)

Hagay



On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> "- 0"
>
> I believe the bug #11849, unable to import non-FP32 models into Gluon,
> fixed in this PR #12412, is important for the users. I would rather pick
> this fix into this release than plan a minor release later.
>
> Best,
> Sandeep
>
>
>
> On Mon, Sep 3, 2018 at 2:34 PM Philip Cho 
> wrote:
>
> > Actually, the command "git clone --recursive
> > https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now,
> > never mind.
> >
> > On Mon, Sep 3, 2018 at 1:45 PM Philip Cho 
> > wrote:
> >
> > > Unfortunately, MXNet was depending on a branch of TVM that is now
> > > deleted. We will have to merge #12448 before the release.
> > >
> > > Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.
> > >
> > > Philip.
> > >
> > > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier 
> wrote:
> > >
> > >> Checked out the tag, built and tested the Clojure package. +1
> > >>
> > >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> > >> roshaninagmo...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > I would like to propose a vote to release Apache MXNet (incubating)
> > >> > version 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end
> > >> > at 7:00 PM PDT, Wednesday, Sept 5th.
> > >> >
> > >> > Link to release notes:
> > >> > https://github.com/apache/incubator-mxnet/releases
> > >> >
> > >> > Link to release candidate 1.3.0.rc0:
> > >> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> > >> >
> > >> > View this page, click on "Build from Source", and use the source code
> > >> > obtained from the 1.3.0.rc0 tag:
> > >> > https://mxnet.incubator.apache.org/install/index.html
> > >> >
> > >> > Please remember to TEST first before voting accordingly:
> > >> >
> > >> > +1 = approve
> > >> > +0 = no opinion
> > >> > -1 = disapprove (provide reason)
> > >> >
> > >> > Thanks,
> > >> > Roshani
> > >> >
> > >>
> > >
> >
>
>
> --
> Sandeep Krishnamurthy
>


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-04 Thread sandeep krishnamurthy
"- 0"

I believe the bug #11849, unable to import non-FP32 models into Gluon,
fixed in this PR #12412, is important for the users. I would rather pick
this fix into this release than plan a minor release later.

Best,
Sandeep



On Mon, Sep 3, 2018 at 2:34 PM Philip Cho 
wrote:

> Actually, the command "git clone --recursive
> https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now,
> never mind.
>
> On Mon, Sep 3, 2018 at 1:45 PM Philip Cho 
> wrote:
>
> > Unfortunately, MXNet was depending on a branch of TVM that is now
> > deleted. We will have to merge #12448 before the release.
> >
> > Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.
> >
> > Philip.
> >
> > On Mon, Sep 3, 2018 at 7:26 AM Carin Meier  wrote:
> >
> >> Checked out the tag, built and tested the Clojure package. +1
> >>
> >> On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <
> >> roshaninagmo...@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I would like to propose a vote to release Apache MXNet (incubating)
> >> > version 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end
> >> > at 7:00 PM PDT, Wednesday, Sept 5th.
> >> >
> >> > Link to release notes:
> >> > https://github.com/apache/incubator-mxnet/releases
> >> >
> >> > Link to release candidate 1.3.0.rc0:
> >> > https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0
> >> >
> >> > View this page, click on "Build from Source", and use the source code
> >> > obtained from 1.3.0.rc0 tag:
> >> > https://mxnet.incubator.apache.org/install/index.html
> >> >
> >> > Please remember to TEST first before voting accordingly:
> >> >
> >> > +1 = approve
> >> > +0 = no opinion
> >> > -1 = disapprove (provide reason)
> >> >
> >> > Thanks,
> >> > Roshani
> >> >
> >>
> >
>


-- 
Sandeep Krishnamurthy


Re: New Java Inference API

2018-09-04 Thread Naveen Swamy
This proposal is missing many of the offline discussions that happened and
the subsequent changes.

@andrewfayres: Please update the wiki (maybe you forgot to publish the
changes).

On Tue, Sep 4, 2018 at 11:11 AM Qing Lan  wrote:

> Hi All,
>
> Here is an update for the Java Inference API design doc on CWIKI:
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API.
> Currently, the MXNet Java bindings are an extension of the MXNet Scala API
> that allows users to use Java to do inference on MXNet. Users will be able
> to import a pre-trained MXNet model and do single/batch inference on it.
>
> Please take a look at the design document again and feel free to leave any
> thoughts you have.
>
> Thanks,
> Qing
>
> On 5/10/18, 11:08 AM, "Andrew Ayres"  wrote:
>
> Hi Kellen,
>
> Thanks for the feedback. You bring up an interesting idea about the
> dependencies. I'll add that to the list of things to look into.
>
> As for the threading, my current thinking is that we implement a
> dispatcher thread, as suggested in the Scala threading discussion
> https://discuss.mxnet.io/t/fixing-thread-safety-issues-in-scala-library/236.
> I would definitely like to hide such complexities from the user.
>
> Andrew
>
>
> On Thu, May 10, 2018 at 3:22 AM, kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Hey Andrew, thanks for the write-up. I think having a Java binding will be
> > very useful for enterprise users. The doc looks good, but there are two
> > things I'm curious about:
> >
> > How are you planning to handle thread-safe inference? It'll be great if
> > you can hide the complexity of dealing with dispatch threading from users.
> >
> > The other thing I think a solid Java API could provide is a limited number
> > of dependencies. There are some simple things we can do to make this happen
> > (create a statically linked, portable .so), but there's also some complexity
> > around minimizing MXNet's dependencies. For example, we'll likely want to
> > release MKL-flavoured binaries, and we should support a few versions of
> > CUDA. We could try to have one version that has an absolute minimum of
> > dependencies (maybe statically linking with OpenBLAS). It might be good to
> > document exactly the packages you're planning to release, and give some
> > more details about what the dependencies for the packages would be.
> >
> > Many thanks for looking into this; I think it'll be a big improvement for
> > many of our users.
> >
> > -Kellen
> >
> > On Thu, May 10, 2018, 12:57 AM Andrew Ayres <
> andrew.f.ay...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > There has been a lot of interest expressed in having a Java API for doing
> > > inference. The general idea is that after training a model using Python,
> > > users would like to be able to load the model for inference inside their
> > > existing production ecosystem.
> > >
> > > We've begun exploring a few options for the implementation at
> > > <https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API>
> > > and would appreciate any insights/feedback.
> > >
> > > Thanks,
> > > Andrew
> > >
> >
>
>
>


New Java Inference API

2018-09-04 Thread Qing Lan
Hi All,

Here is an update for the Java Inference API design doc on CWIKI: 
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API. 
Currently, the MXNet Java bindings are an extension of the MXNet Scala API
that allows users to use Java to do inference on MXNet. Users will be able to
import a pre-trained MXNet model and do single/batch inference on it.

Please take a look at the design document again and feel free to leave any
thoughts you have.

Thanks,
Qing

On 5/10/18, 11:08 AM, "Andrew Ayres"  wrote:

Hi Kellen,

Thanks for the feedback. You bring up an interesting idea about the
dependencies. I'll add that to the list of things to look into.

As for the threading, my current thinking is that we implement a dispatcher
thread, as suggested in the Scala threading discussion
https://discuss.mxnet.io/t/fixing-thread-safety-issues-in-scala-library/236.
I would definitely like to hide such complexities from the user.
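For readers less familiar with the dispatcher-thread idea, here is a minimal sketch of it, written in Python for brevity (the actual binding would be Scala/Java) and not taken from the design doc or the linked discussion. It assumes the underlying model handle is not thread-safe, so every request from any caller thread is handed to one worker thread that alone touches the model. `InferenceDispatcher` and `predict_fn` are hypothetical names used only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class InferenceDispatcher:
    """Hypothetical sketch: every inference call runs on one dedicated thread.

    `predict_fn` stands in for a model handle that is assumed NOT to be
    thread-safe. Callers on any thread go through `predict` and never touch
    the model directly.
    """

    def __init__(self, predict_fn: Callable[[Any], Any]) -> None:
        self._predict_fn = predict_fn
        # A single-worker pool acts as the dispatcher thread: queued requests
        # run one at a time, in submission order, on that one thread.
        self._executor = ThreadPoolExecutor(
            max_workers=1, thread_name_prefix="inference-dispatcher")

    def predict(self, inputs: Any) -> Any:
        # Blocks the calling thread until the dispatcher has run the model;
        # an exception raised by predict_fn is re-raised here.
        return self._executor.submit(self._predict_fn, inputs).result()

    def close(self) -> None:
        # Lets already-queued requests finish, then stops the thread.
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    # Toy stand-in for a model: doubles its input and reports its thread.
    def toy_model(x: int):
        return x * 2, threading.current_thread().name

    dispatcher = InferenceDispatcher(toy_model)
    results = []
    callers = [threading.Thread(target=lambda i=i: results.append(dispatcher.predict(i)))
               for i in range(4)]
    for t in callers:
        t.start()
    for t in callers:
        t.join()
    dispatcher.close()
    print(sorted(results))  # all four tuples name the same dispatcher thread
```

The trade-off in this sketch is that callers keep a plain blocking `predict` call while all the threading stays inside the dispatcher, which is the "hide such complexities from the user" goal; the cost is that requests are served strictly one at a time.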

Andrew


On Thu, May 10, 2018 at 3:22 AM, kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Hey Andrew, thanks for the write-up. I think having a Java binding will be
> very useful for enterprise users. The doc looks good, but there are two
> things I'm curious about:
>
> How are you planning to handle thread-safe inference? It'll be great if
> you can hide the complexity of dealing with dispatch threading from users.
>
> The other thing I think a solid Java API could provide is a limited number
> of dependencies. There are some simple things we can do to make this happen
> (create a statically linked, portable .so), but there's also some complexity
> around minimizing MXNet's dependencies. For example, we'll likely want to
> release MKL-flavoured binaries, and we should support a few versions of
> CUDA. We could try to have one version that has an absolute minimum of
> dependencies (maybe statically linking with OpenBLAS). It might be good
> to document exactly the packages you're planning to release, and give some
> more details about what the dependencies for the packages would be.
>
> Many thanks for looking into this, I think it'll be a big improvement for
> many of our users.
>
> -Kellen
>
> On Thu, May 10, 2018, 12:57 AM Andrew Ayres 
> wrote:
>
> > Hi all,
> >
> > There has been a lot of interest expressed in having a Java API for doing
> > inference. The general idea is that after training a model using Python,
> > users would like to be able to load the model for inference inside their
> > existing production ecosystem.
> >
> > We've begun exploring a few options for the implementation at
> > <https://cwiki.apache.org/confluence/display/MXNET/MXNet+Java+Inference+API>
> > and would appreciate any insights/feedback.
> >
> > Thanks,
> > Andrew
> >
>




Re: Propose to discontinue supporting Apache MXNet on Windows 7

2018-09-04 Thread Joshua Z. Zhang
I have contacted some friends in industry; they say that some controller PCs
are still on Win7 with no plans to upgrade in the near future, so I strongly
vote -1.

In terms of the build system on Windows 7, MS does give warnings in VS 2015, but
with compatibility mode, we can still install it without issue. So I guess it’s
a marketing strategy rather than a technical problem on the MS side.

Zhi

> On Sep 4, 2018, at 9:02 AM, sebastianb  wrote:
> 
> One more data point: Mathematica still supports Windows 7 (with Platform 
> Update), and we use MXNet as a backend for our neural net framework. So I 
> would also vote against deprecating Windows 7 support.
> 
>> On Sep 2, 2018, at 7:40 PM, Marco de Abreu 
>>  wrote:
>> 
>> Thanks for the data and these quite important points. I agree and hereby
>> change my vote to -1.
>> 
>> Barber, Christopher  schrieb am So., 2. Sep.
>> 2018, 18:56:
>> 
>>> FWIW, my company is only beginning to transition to Windows 10 now, and my
>>> past experience would lead me to believe that many enterprises stick with
>>> old versions of Windows long past when you think they would.
>>> 
>>> Seems to me that if you are unwilling to deprecate python 2.7, then
>>> continuing to support Windows 7 is a no-brainer. You are more likely to get
>>> users to switch to python 3 than you are to get them to install a new
>>> operating system.
>>> 
>>> And do you really want to drop support for platforms that your competitors
>>> still support? Given MXNet's market share, I wouldn't dream of dropping a
>>> platform until after the more popular frameworks have already done so.
>>> 
>>> I also believe that it is possible to install more recent versions of
>>> Visual Studio on Windows 7.
>>> 
>>> On 9/2/18, 1:57 AM, "kellen sunderland" 
>>> wrote:
>>> 
>>>   Google analytics are sadly probably the best numbers we're going to
>>> get.
>>>   Of course these numbers are likely to over-represent windows usage, as
>>> I'm
>>>   sure many people are looking up documentation on a windows machine
>>> while
>>>   ssh'd into a cloud instance or IoT device.
>>> 
>>>   What's the trend over time for these numbers Mu?  Is Windows 7 usage
>>>   relatively stable over the last year?
>>> 
>>>   On Sun, Sep 2, 2018 at 1:58 AM Mu Li  wrote:
>>> 
 According to Google Analytics, ~12% of users who visited mxnet's
>>> website are
 using Windows 7. It's a significant number. Even though we cannot
>>> conclude
 that all of these users will run MXNet on Windows 7, I suggest we
>>> still
 support win7.
 
 BTW, anyone who can access mxnet's google analytics report can
>>> verify this
 number by following this instruction:
 
 
>>> https://stackoverflow.com/questions/1340778/detecting-windows-7-with-google-analytics
 
 
 
 On Sat, Sep 1, 2018 at 1:55 PM Steffen Rochel <
>>> steffenroc...@gmail.com>
 wrote:
 
> I support a data-driven decision. Any suggestions on how we can obtain
 insight
> about OS usage of the MXNet user community?
> Can we get such information from pip install statistics or should
>>> we
> perform a user poll on the discussion forum?
> On the other hand the lack of data should not prevent us from
>>> moving
> forward and dropping support for outdated OS.
> In any case we would have to announce dropping support for a platform at
> least one release in advance.
> Steffen
> 
> On Thu, Aug 30, 2018 at 12:21 PM Sheng Zha 
>>> wrote:
> 
>> Hi Kellen,
>> 
>> Thanks for the explanation. Unfortunately, I don't have the
>>> usage data,
> so
>> I refrained from voting. If any of the voters have such data I'd
>>> love
 to
>> see it too.
>> 
>> -sz
>> 
>> On 2018/08/30 14:58:09, kellen sunderland <
>>> kellen.sunderl...@gmail.com
> 
>> wrote:
>>> I haven't spoken to anyone about the decision (as I'm
>>> currently on an
>>> island in the med) but to me the quick +1s are likely a result
>>> of
 this
>>> being a fairly straightforward decision.  The factors that
>>> went into
 my
>>> thinking were (1) prioritizing growing platforms rather than
 shrinking
> >>> platforms (i.e. thinking long term rather than short term) and
>>> (2)
>> earning
>>> our customers' trust.  Claiming support for a platform when we
>>> can't
>>> realistically deliver it would lose us trust.  I'd prefer to
>>> over
> deliver
> >>> and under promise when it comes to Windows 7 for this reason.
>>> 
>>> Now on the flip side one thing I would see as valuable is to
>>> try and
> get
>>> windows builds working with clang.  This could be beneficial
>>> in the
> sense
>>> that it would be easy to maintain for mxnet devs and allow us
>>> to use
>> modern
> >>> C++ on older Windows machines without using VS 2013 (which I
>>> consider
 a
>>> non-starter with our codebase).
>>> 
> >>> You have piqued my curiosity though

Re: Propose to discontinue supporting Apache MXNet on Windows 7

2018-09-04 Thread sebastianb
One more data point: Mathematica still supports Windows 7 (with Platform 
Update), and we use MXNet as a backend for our neural net framework. So I would 
also vote against deprecating Windows 7 support.

> On Sep 2, 2018, at 7:40 PM, Marco de Abreu 
>  wrote:
> 
> Thanks for the data and these quite important points. I agree and hereby
> change my vote to -1.
> 
> Barber, Christopher  schrieb am So., 2. Sep.
> 2018, 18:56:
> 
>> FWIW, my company is only beginning to transition to Windows 10 now, and my
>> past experience would lead me to believe that many enterprises stick with
>> old versions of Windows long past when you think they would.
>> 
>> Seems to me that if you are unwilling to deprecate python 2.7, then
>> continuing to support Windows 7 is a no-brainer. You are more likely to get
>> users to switch to python 3 than you are to get them to install a new
>> operating system.
>> 
>> And do you really want to drop support for platforms that your competitors
>> still support? Given MXNet's market share, I wouldn't dream of dropping a
>> platform until after the more popular frameworks have already done so.
>> 
>> I also believe that it is possible to install more recent versions of
>> Visual Studio on Windows 7.
>> 
>> On 9/2/18, 1:57 AM, "kellen sunderland" 
>> wrote:
>> 
>>Google analytics are sadly probably the best numbers we're going to
>> get.
>>Of course these numbers are likely to over-represent windows usage, as
>> I'm
>>sure many people are looking up documentation on a windows machine
>> while
>>ssh'd into a cloud instance or IoT device.
>> 
>>What's the trend over time for these numbers Mu?  Is Windows 7 usage
>>relatively stable over the last year?
>> 
>>On Sun, Sep 2, 2018 at 1:58 AM Mu Li  wrote:
>> 
> >>> According to Google Analytics, ~12% of users who visited mxnet's
>> website are
>>> using Windows 7. It's a significant number. Even though we cannot
>> conclude
>>> that all of these users will run MXNet on Windows 7, I suggest we
>> still
>>> support win7.
>>> 
>>> BTW, anyone who can access mxnet's google analytics report can
>> verify this
>>> number by following this instruction:
>>> 
>>> 
>> https://stackoverflow.com/questions/1340778/detecting-windows-7-with-google-analytics
>>> 
>>> 
>>> 
>>> On Sat, Sep 1, 2018 at 1:55 PM Steffen Rochel <
>> steffenroc...@gmail.com>
>>> wrote:
>>> 
 I support a data-driven decision. Any suggestions on how we can obtain
>>> insight
 about OS usage of the MXNet user community?
 Can we get such information from pip install statistics or should
>> we
 perform a user poll on the discussion forum?
 On the other hand the lack of data should not prevent us from
>> moving
 forward and dropping support for outdated OS.
 In any case we would have to announce dropping support for a platform at
 least one release in advance.
 Steffen
 
 On Thu, Aug 30, 2018 at 12:21 PM Sheng Zha 
>> wrote:
 
> Hi Kellen,
> 
> Thanks for the explanation. Unfortunately, I don't have the
>> usage data,
 so
> I refrained from voting. If any of the voters have such data I'd
>> love
>>> to
> see it too.
> 
> -sz
> 
> On 2018/08/30 14:58:09, kellen sunderland <
>> kellen.sunderl...@gmail.com
 
> wrote:
>> I haven't spoken to anyone about the decision (as I'm
>> currently on an
>> island in the med) but to me the quick +1s are likely a result
>> of
>>> this
>> being a fairly straightforward decision.  The factors that
>> went into
>>> my
>> thinking were (1) prioritizing growing platforms rather than
>>> shrinking
> >> platforms (i.e. thinking long term rather than short term) and
>> (2)
> earning
>> our customers' trust.  Claiming support for a platform when we
>> can't
>> realistically deliver it would lose us trust.  I'd prefer to
>> over
 deliver
> >> and under promise when it comes to Windows 7 for this reason.
>> 
>> Now on the flip side one thing I would see as valuable is to
>> try and
 get
>> windows builds working with clang.  This could be beneficial
>> in the
 sense
>> that it would be easy to maintain for mxnet devs and allow us
>> to use
> modern
> >> C++ on older Windows machines without using VS 2013 (which I
>> consider
>>> a
>> non-starter with our codebase).
>> 
> >> You have piqued my curiosity though, Sheng.  How many Win7
>> users does
> MXNet
>> have relative to macos/Linux?
>> 
>> On Thu, Aug 30, 2018, 8:51 AM Sheng Zha 
>> wrote:
>> 
>>> Hi Yuan,
>>> 
>>> No problem. This is an issue that's worth having a clear
>>> definition,
 so
> >>> there's nothing wrong with your proposal, and thanks for
>> bringing
> this up.
>>> 
>>> I'm more concerned about the seemingly unanimous votes on
>> dropping
> support
>>> on a platform without seeing the supporting evidence that
>> it's the
> right
>>> thing. It