Re: MXNet CD pipelines cost savings

2020-02-10 Thread Marco de Abreu
Thanks for bringing this to our attention.

I'm quite devastated that our two vetoes have been ignored and the
CodeBuild pipeline is actually supplying our user-facing binaries.
Suggesting to turn off the Jenkins-based CD now adds insult to injury.

I'd like to hear a plan for how to make the project compliant again. I already
announced that I will remove any mention of that publishing method (that is,
all links on our website pointing to the bucket) if the sourcing system is
not our Jenkins CD. So far I believed that this had actually been done, but
it seems we got played here.

For the sake of our users, I'm giving one week (until 2/17) to come up with
a proposal and until the end of the month (2/29) to have CodeBuild turned off
and Jenkins CD fixed. Considering we were tricked last time, I want
confirmation that CodeBuild has been turned off and a description of how
we can verify that all artifacts are now coming from Jenkins CD.

Best regards
Marco

Zha, Sheng wrote on Mon, Feb 10, 2020, 19:40:

> +dev@
>
> -sz
>
> On Feb 10, 2020, at 1:35 PM, Zha, Sheng  wrote:
>
>  As already stated in the public threads, I’ve vetoed the CodeBuild
> solution from becoming the long-term solution, as it’s not publicly
> manageable.
>
> As communicated before, the team should have put effort into maintaining
> and fixing the Jenkins CD pipeline but has neglected to do so. Promoting
> the CodeBuild solution this way is a step in the wrong direction that has
> to be stopped.
>
> -sz
>
> On Feb 10, 2020, at 1:13 PM, Davydenko, Denis  wrote:
>
> 
> Hello guys,
>
> I would like to start this discussion so that we can align on handling the CD
> pipelines we currently have. There are two of them: one in Jenkins
> <http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/> and one in
> CodeBuild. The one in Jenkins is currently functioning, but its runs are
> always failing. The one in CodeBuild is currently functioning and publishing
> artifacts to an S3 bucket <https://tiny.amazon.com/39negmk0/IsenLink>.
>
> The MXNet Engineering team's proposal is to shut down the Jenkins-based CD
> completely, as it is currently just a waste of resources, and to use the
> CodeBuild-based setup to continue publishing nightly builds to the S3 bucket,
> which provides public access to all binaries stored in it. This doesn’t
> affect the discussion of whether to publish binaries to S3 or to PyPI: once
> that concludes (if ever), we can switch the destination of the CodeBuild
> projects so that they upload MXNet nightly binaries to PyPI instead of S3.
>
> This is an effort to get alignment internally, if possible, before
> bringing this as a proposal for community discussion.
>
> --
> Thanks,
> Denis
>


Re: Join request for MXNet Swift support

2020-02-10 Thread Pedro Larroy
Welcome, Rahul! Excited to have you join us.

I was wondering how fast and effective it is, and what options there are, to
call from Python into Swift, and from Swift into C, to execute the dataflow
graph or call into operators. There was a thread before about microbenchmarking
calls into the C++ engine from Python using different methods. Not sure
if you have done some experiments in that direction.

Pedro.

On Mon, Feb 10, 2020 at 3:57 AM Tao Lv  wrote:

> Hi Rahul,
>
> Invite is sent to rahulbhal...@protonmail.com. Welcome to the community and
> looking forward to your contribution.
>
> -tao
>
> On Mon, Feb 10, 2020 at 1:10 PM Rahul wrote:
>
> > Hello,
> >
> > As per the conversation with [Pedro Larroy](https://twitter.com/plarroy)
> > on [Twitter thread](https://twitter.com/plarroy/status/1226408543621771264)
> > I would like to join this Slack channel for contributing to MXNet in Swift.
> >
> > Regards
> > Rahul Bhalley
> > [ORCID](https://orcid.org/-0002-4574-0390)
>


Re: MXNet CD pipelines cost savings

2020-02-10 Thread Zha, Sheng
+dev@

-sz

On Feb 10, 2020, at 1:35 PM, Zha, Sheng  wrote:

 As already stated in the public threads, I’ve vetoed the CodeBuild solution 
from becoming the long-term solution, as it’s not publicly manageable.

As communicated before, the team should have put effort into maintaining and
fixing the Jenkins CD pipeline but has neglected to do so. Promoting the
CodeBuild solution this way is a step in the wrong direction that has to be
stopped.

-sz

On Feb 10, 2020, at 1:13 PM, Davydenko, Denis  wrote:


Hello guys,

I would like to start this discussion so that we can align on handling the CD
pipelines we currently have. There are two of them: one in Jenkins and one in
CodeBuild. The one in Jenkins is currently functioning, but its runs are always
failing. The one in CodeBuild is currently functioning and publishing artifacts
to an S3 bucket.

The MXNet Engineering team's proposal is to shut down the Jenkins-based CD
completely, as it is currently just a waste of resources, and to use the
CodeBuild-based setup to continue publishing nightly builds to the S3 bucket,
which provides public access to all binaries stored in it. This doesn’t affect
the discussion of whether to publish binaries to S3 or to PyPI: once that
concludes (if ever), we can switch the destination of the CodeBuild projects so
that they upload MXNet nightly binaries to PyPI instead of S3.

This is an effort to get alignment internally, if possible, before bringing 
this as a proposal for community discussion.

--
Thanks,
Denis


Re: Proposal to make MKLDNN as default CPU backend

2020-02-10 Thread Lausen, Leonard
Hi,

as the respective PR [1] has been open for a while and there has been no
follow-up to Patric's mail, I suggest merging it once CI passes after Tao's
conflict resolution earlier today.

This gives community members time to test for regressions prior to the 1.7
release. If any are found, we can reconsider the decision.

Best regards
Leonard

[1]: https://github.com/apache/incubator-mxnet/pull/16899

On Wed, 2019-11-20 at 05:27 +, Zhao, Patric wrote:
> Thanks all of the great suggestions. 
> 
> Regarding the binary release, including w/o MKLDNN build, I have summarized a
> table (check attachment).
> 
> - Major changes in python packages, see attached table. 
> - Switch on MKLDNN for no mkl suffix binary in release 1.7 (Red check mark) 
> - Add new mxnet-native build w/o MKLDNN and cuDNN (Yellow background)
>   Track the usage/download in 1-2 releases and then decide if we need it for a
> long time
> - Drop all mkl suffix binary in next major release v2.x.
> 
> Thanks,
> 
> --Patric
> 
> > -Original Message-
> > From: Lin Yuan 
> > Sent: Wednesday, November 20, 2019 5:40 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Tao Lv 
> > Subject: Re: Proposal to make MKLDNN as default CPU backend
> > 
> > Also per Sam's suggestion, we could still release a build without MKLDNN
> > (name it mxnet-nomkldnn?) and track the usage/download for one or two
> > releases. If there is no usage, we could drop that build in the future.
> > 
> > Best,
> > 
> > Lin
> > 
> > On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:
> > 
> > > > Just to summarize, based on the concerns Marco raised and discussed above:
> > > > - AMD CPU (it should work with MKLDNN:
> > > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > > > )
> > > - ARM CPU (we don't have it today w/o MKLDNN either)
> > > - Windows (Windows support is there regardless of MKLDNN or not)
> > > - GPU and MKLDNN enabled (already supported)
> > > - Fully reproducible results (medical and financial sector requested
> > > that and we have some flags for cuda) (The nondeterminism exists even
> > > > today w/o MKLDNN. We should address it regardless of MKLDNN)
> > > 
> > > > Marco, please let us know whether your concerns are properly addressed.
> > > 
> > > > Given that MKLDNN gives a significant performance speedup on CPU, I am
> > > > inclined to make it the default in the pip build.
> > > 
> > > Best,
> > > 
> > > Lin
> > > 
> > > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> > > wrote:
> > > 
> > > > Thanks, Patric. I was just trying to point out that there was
> > > > currently no guarantee of deterministic results without MKL, so
> > > > > there’s not necessarily an expectation of determinism with MKL (i.e. the
> > > > > requirement isn’t relaxed).
> > > > On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> > > > wrote:
> > > > 
> > > > > > It may be a concern, but a little noise can't affect the final results
> > > > > > if the algorithm is numerically stable.
> > > > > > The MKLDNN backend with mxnet-mkl has been used for 2 years and we
> > > > > > didn't see convergence issues caused by multithreading.
> > > > > > In other words, the GPU programming model works well for training even
> > > > > > though nondeterminism from multiple threads exists there too.
> > > > > 
> > > > > > Part of the training accuracy results were pasted in the first PR, when
> > > > > > MKLDNN was integrated:
> > > > > > https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
> > > > > > In conclusion, it may happen, but with very low probability. I
> > > > > > believe we can find a solution in case it happens someday.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > --Patric
> > > > > 
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Chris Olivier 
> > > > > > Sent: Tuesday, November 19, 2019 11:51 AM
> > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > Cc: Tao Lv 
> > > > > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> > > > > > 
> > > > > > (for non mkl dropout, for instance)
> > > > > > 
> > > > > > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier wrote:
> > > > > > 
> > > > > > > > To address the deterministic item, I know for a fact that
> > > > > > > > training will not be deterministic in some cases where the
> > > > > > > > “parallel random” class is utilized in parallel threads, such
> > > > > > > > as OMP, if the number of cores is different, even with the same
> > > > > > > > seed, because threads are seeded independently and a different
> > > > > > > > number of threads will end up generating different random
> > > > > > > > number sequences. The Dropout operator is an example.
> > > > > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> > > > > > >  wrote:
> > > > > > > 
> > > > > > > > For AMD CPUs, you’d want to perform validation because now
> > > > > > > > MKL-DNN would be enabled by default. Historically, other intel
> > 

Re: Join request for MXNet Swift support

2020-02-10 Thread Tao Lv
Hi Rahul,

Invite is sent to rahulbhal...@protonmail.com. Welcome to the community and
looking forward to your contribution.

-tao

On Mon, Feb 10, 2020 at 1:10 PM Rahul wrote:

> Hello,
>
> As per the conversation with [Pedro Larroy](https://twitter.com/plarroy)
> on [Twitter thread](https://twitter.com/plarroy/status/1226408543621771264)
> I would like to join this Slack channel for contributing to MXNet in Swift.
>
> Regards
> Rahul Bhalley
> [ORCID](https://orcid.org/-0002-4574-0390)