Re: Stopping nightly releases to Pypi

2020-01-02 Thread Pedro Larroy
CD should be separate from CI for security reasons in any case.


On Sat, Dec 7, 2019 at 10:04 AM Marco de Abreu wrote:

> Could you elaborate on how a non-Amazonian is able to access, maintain and
> review the CodeBuild pipeline? How come we've deviated from the
> community-agreed standard where the public Jenkins serves the purpose of
> testing and releasing MXNet? I'd be curious about the issues you're
> encountering with Jenkins CI that led to a non-standard solution.
>
> -Marco
>
>
> Skalicky, Sam wrote on Sat, Dec 7, 2019, 18:39:
>
> > Hi MXNet Community,
> >
> > We have been working on getting nightly builds fixed and made available
> > again. We’ve made another system using AWS CodeBuild & S3 to work around
> > the problems with Jenkins CI, PyPI, etc. It is currently building all the
> > flavors and publishing to an S3 bucket here:
> >
> >
> https://us-west-2.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/?region=us-west-2
> >
> > There are folders for each set of nightly builds; try out the wheels
> > starting today, 2019-12-07. Builds start at 1:30am PT (9:30am GMT) and
> > arrive in the bucket 30min-2hours later. Inside each folder are the wheels
> > for each flavor of MXNet. Currently we’re only building for Linux; builds
> > for Windows/Mac will come later.
> >
> > If you want to download the wheels easily you can use a URL in the form of:
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/
> >
> /dist/-1.6.0b-py2.py3-none-manylinux1_x86_64.whl
> >
> > Here's a set of links for today’s builds:
> >
> > (Plain mxnet, no mkl no cuda)
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > (mxnet-mkl)
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > (mxnet-cuXXX)
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > (mxnet-cuXXXmkl)
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > You can easily install these pip wheels in your system either by
> > downloading them to your machine first and then installing by doing:
> >
> > pip install /path/to/downloaded/wheel.whl
> >
> > Or you can install directly by just giving the link to pip like this:
> >
> > pip install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > Credit goes to everyone involved (in no particular order)
> > Rakesh Vasudevan
> > Zach Kimberg
> > Manu Seth
> > Sheng Zha
> > Jun Wu
> > Pedro Larroy
> > Chaitanya Bapat
> >
> > Thanks!
> > Sam
> >
> >
> > On Dec 5, 2019, at 1:16 AM, Lausen, Leonard wrote:
> >
> > We don't lose pip by hosting on S3. We just don't host nightly releases on
> > PyPI servers and mirror them to several hundred mirrors immediately after
> > each build is published, which is very expensive for the PyPI project.
> > People can still install the nightly builds with pip by specifying the -f
> > option.
> >
> > Uploading weekly releases to PyPI will reduce the cost for PyPI by ~75%
> > [1]. It may be acceptable to PyPI, but does it make sense for us? I'm not
> > convinced a weekly release on PyPI is a good idea. If one release is buggy,
> > users will need to wait 7 days for a fix. That doesn't provide a good user
> > experience. If someone has a stronger conviction about the value
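
The complete wheel links quoted above all share one layout, and the remark about pip's -f option can be expressed in the same style. What follows is a minimal sketch, assuming the layout seen in the 2019-12-07 links stays stable. The helper names nightly_wheel_url and pip_install_args are hypothetical and not part of any MXNet tooling, and the page passed to --find-links is left to the caller because the thread does not name one.

```python
from datetime import date

# Bucket prefix copied from the links quoted in the thread.
BASE_URL = "https://apache-mxnet.s3-us-west-2.amazonaws.com/dist"
# Wheel tag shared by every link in the thread (Linux builds only at the time).
WHEEL_TAG = "py2.py3-none-manylinux1_x86_64"


def nightly_wheel_url(build_date, flavor="", version_prefix="1.6.0b"):
    """Build a nightly wheel URL; flavor is e.g. "", "mkl", "cu101", "cu101mkl"."""
    package = "mxnet_" + flavor if flavor else "mxnet"
    folder = build_date.strftime("%Y-%m-%d")   # folder name, e.g. 2019-12-07
    stamp = build_date.strftime("%Y%m%d")      # version suffix, e.g. 20191207
    return f"{BASE_URL}/{folder}/dist/{package}-{version_prefix}{stamp}-{WHEEL_TAG}.whl"


def pip_install_args(target, find_links=None):
    """Return pip arguments for a wheel URL, or for a package name plus a -f page."""
    args = ["pip", "install"]
    if find_links is not None:
        # --pre lets pip consider beta versions such as 1.6.0b20191207;
        # -f/--find-links points pip at a page or directory that lists wheels.
        args += ["--pre", "--find-links", find_links]
    return args + [target]


if __name__ == "__main__":
    url = nightly_wheel_url(date(2019, 12, 7), "cu101mkl")
    print(" ".join(pip_install_args(url)))
```

Passing the resulting list to subprocess.run would perform the install; building the list first keeps the sketch free of side effects.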

Re: windows ci, Cmake update, diverging scripts

2020-01-02 Thread Pedro Larroy
I cleaned up the Windows setup and installation scripts. Building MXNet on
Windows can now be done by executing just *2* scripts: one to set up the
dependencies and another to build.
I also modified the install instructions to reflect this simplified setup.
Please help review the PR. This also updates CMake to 3.15 as requested by the
developers.

https://github.com/apache/incubator-mxnet/pull/17206

Afterwards I will configure the Windows AMI pipeline to use this
environment so we can have CMake 3.15 in the Windows AMI.

This is a streamlined workflow for developers using MXNet on Windows who
might want to integrate with games or other commercial packages which need
deep learning.

Thanks.
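
Since the build now expects CMake 3.15, a build script could check the installed version up front. The following is a minimal sketch and is not taken from the PR; the function names are hypothetical, and it assumes `cmake --version` prints a first line of the form "cmake version X.Y.Z".

```python
import re
import subprocess

# Minimum version mentioned in this thread.
MIN_CMAKE = (3, 15)


def parse_cmake_version(output):
    """Extract (major, minor, patch) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("could not find a CMake version in: %r" % output)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def cmake_is_recent_enough(output, minimum=MIN_CMAKE):
    """Compare only as many version components as `minimum` specifies."""
    return parse_cmake_version(output)[:len(minimum)] >= tuple(minimum)


def installed_cmake_output():
    """Run the cmake found on PATH and return its version text."""
    result = subprocess.run(
        ["cmake", "--version"], check=True, capture_output=True, text=True
    )
    return result.stdout
```

A setup script could call cmake_is_recent_enough(installed_cmake_output()) and stop with a clear message when it returns False.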


On Mon, Dec 30, 2019 at 4:19 PM Pedro Larroy wrote:

> I have looked into this a bit, and it seems the open source version in
> https://github.com/apache/incubator-mxnet-ci is older than what's
> already deployed.
> The root cause of the failure in the update job seems to be a hardcoded
> AMI which is no longer available. There now seems to be a way to query for
> the latest Windows AMI:
> https://aws.amazon.com/blogs/mt/query-for-the-latest-windows-ami-using-systems-manager-parameter-store/
>
> On Mon, Dec 30, 2019 at 3:12 PM Pedro Larroy wrote:
>
>> It's automated but broken, as the execution is in a failed state. I think we
>> will need an engineer to do repairs there.
>>
>> It's using systems manager automation to produce these AMIs.
>>
>> On Mon, Dec 30, 2019 at 1:44 PM Lausen, Leonard wrote:
>>
>>> Some more background:
>>>
>>> For a few days now, CI has been downloading and installing a more recent
>>> cmake version in the Windows job based on
>>>
>>> https://github.com/leezu/mxnet/blob/230ceee5d9e0e02e58be69dad1c4ffdadbaa1bd9/ci/build_windows.py#L148-L153
>>>
>>> This ad-hoc download and installation is not ideal and is in fact a
>>> workaround until the base Windows AMI used by the CI server is updated.
>>> The script generating the base Windows AMI is tracked at
>>> https://github.com/apache/incubator-mxnet-ci and Shiwen Hu recently
>>> updated the script to include the updated cmake version:
>>> https://github.com/apache/incubator-mxnet-ci/pull/17
>>>
>>> It seems that this change needs to be deployed manually, which Pedro is
>>> attempting to do. But if I understand correctly, Pedro found that the
>>> public version of the AMI generation script and a currently used script
>>> have diverged:
>>> http://ix.io/25WQ
>>>
>>>
>>>
>>> Questions:
>>> 1) Is there a git history associated with the version of the script that
>>> diverged?
>>>
>>> 2) According to
>>>
>>> https://github.com/apache/incubator-mxnet-ci/tree/master/services/jenkins-slave-creation-windows
>>> the Windows Base AMI should be created automatically. Why is it not done
>>> automatically anymore / why does the documentation claim it happens
>>> automatically but it doesn't?
>>>
>>> On Mon, 2019-12-30 at 12:11 -0800, Pedro Larroy wrote:
>>> > Hi
>>> >
>>> > I was looking at a request from Leonard for updating CMake on Windows,
>>> > and I see that the post-install.py script which sets up the Windows
>>> > environment in CI has diverged significantly between incubator-mxnet-ci
>>> > and the private repository that is used to deploy to production CI.
>>> >
>>> > https://github.com/apache/incubator-mxnet/pull/17031
>>> >
>>> > I see quite a large patch of differences; there's also a different
>>> > directory structure which Marco committed to incubator-mxnet-ci, and MKL
>>> > seems to be removed. My question is: why has this diverged so much? I was
>>> > expecting to transplant just a single patch to update CMake.
>>> >
>>> >
>>> > http://ix.io/25WQ
>>> >
>>> >
>>> > Pedro.
>>>
>>
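
The Parameter Store lookup mentioned in this thread, as an alternative to a hardcoded AMI ID, could look roughly like the sketch below. It assumes the third-party boto3 package and working AWS credentials. The parameter name is one of the public Windows AMI parameters and is only an example, since the thread does not say which Windows image the CI uses; the function names and the default region are likewise illustrative.

```python
# Example public parameter; the image actually used by the CI is an assumption.
DEFAULT_PARAMETER = (
    "/aws/service/ami-windows-latest/Windows_Server-2019-English-Full-Base"
)


def latest_windows_ami(ssm_client, parameter_name=DEFAULT_PARAMETER):
    """Return the AMI ID currently stored under a public SSM parameter."""
    response = ssm_client.get_parameter(Name=parameter_name)
    return response["Parameter"]["Value"]


def make_ssm_client(region_name="us-west-2"):
    """Create a Systems Manager client; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the lookup logic has no hard dependency

    return boto3.client("ssm", region_name=region_name)


# Typical use, which requires network access and credentials:
#   ami_id = latest_windows_ami(make_ssm_client())
```

Taking the client as an argument keeps the lookup easy to exercise with a stub, and the returned ID can then be fed into the automation that builds the base image.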


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2020-01-02 Thread shiwen hu
https://github.com/apache/incubator-mxnet/pull/16980
https://github.com/apache/incubator-mxnet/pull/17031
should be backported to the release branch.
Without them, Windows will not build.
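
Backporting merged PRs usually means cherry-picking their commits onto the release branch. Here is a minimal sketch of the git commands involved, assuming the release branch is v1.6.x as named later in this thread. The commit hashes are not given anywhere in the thread, so the caller must supply them, and backport_commands is a hypothetical helper name.

```python
def backport_commands(commit_hashes, release_branch="v1.6.x"):
    """List the git invocations that replay the given commits on a release branch."""
    commands = [["git", "checkout", release_branch]]
    for sha in commit_hashes:
        # -x appends a "(cherry picked from commit ...)" line to the message,
        # which keeps the link back to the original commit on master.
        commands.append(["git", "cherry-pick", "-x", sha])
    return commands
```

Each returned list can be passed to subprocess.run(..., check=True) so that a conflict stops the sequence at the failing commit.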

Pedro Larroy wrote on Tue, Dec 31, 2019 at 4:16 AM:

> Agree.
>
> On Sat, Dec 28, 2019 at 12:43 PM Lausen, Leonard wrote:
>
> > When including the OMP fixes in 1.6, Chris's fix for a race condition
> > should be included as well. So it's 3 PRs:
> >
> > https://github.com/apache/incubator-mxnet/pull/17012
> > https://github.com/apache/incubator-mxnet/pull/17039
> > https://github.com/apache/incubator-mxnet/pull/17098
> >
> > While none of these affect the binary Python builds that will be
> > distributed for the 1.6 release, they do affect any users building the 1.6
> > release from source with cmake. So it's beneficial to backport the 3 PRs.
> >
> > On Fri, 2019-12-27 at 11:24 -0800, Pedro Larroy wrote:
> > > > Agree with Sheng, I think it would be good to have the nice fixes that
> > > > Leonard has done for 1.6 and not delay them to further releases since
> > > > they are beneficial to users and developers. Thanks Leonard for helping
> > > > fix these long standing issues.
> > >
> > > > On Fri, Dec 27, 2019 at 11:03 AM Lin Yuan wrote:
> > >
> > > > > No, I just wanted to call it out because the title of the issue says
> > > > > "Failed OpenMP assertion when loading MXNet compiled with DEBUG=1".
> > > > > If this is considered a release blocker, I think we should backport it
> > > > > to 1.6.
> > > >
> > > > Thanks,
> > > > Lin
> > > >
> > > > > On Fri, Dec 27, 2019 at 10:47 AM Sheng Zha wrote:
> > > > >
> > > > > > Reading these issues it’s pretty clear to me that these are fixes for
> > > > > > broken builds. I think we do consider broken builds to be release
> > > > > > blockers. Lin, am I missing something on which you base your
> > > > > > suggestion for delaying these changes?
> > > > >
> > > > > -sz
> > > > >
> > > > > > > On Dec 27, 2019, at 10:30 AM, Lin Yuan wrote:
> > > > > > >
> > > > > > > Are these release blockers? It's very risky to make such a big
> > > > > > > last-minute change after code freeze.
> > > > > >
> > > > > > Can we do this in the next release?
> > > > > >
> > > > > > Lin
> > > > > >
> > > > > > > > On Fri, Dec 27, 2019 at 7:37 AM Lausen, Leonard wrote:
> > > > > > > >
> > > > > > > > In case of backporting #17012,
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/17098 must also be
> > > > > > > > backported. The updated OpenMP added a new target which is not
> > > > > > > > used by MXNet but breaks the build on some systems with nvptx.
> > > > > > > > #17098 disables building this unused and broken feature.
> > > > > > >
> > > > > > > > > On Thu, 2019-12-26 at 12:55 -0800, Pedro Larroy wrote:
> > > > > > > > > https://github.com/apache/incubator-mxnet/pull/17012 should
> > > > > > > > > also be ported to the release branch.
> > > > > > > > >
> > > > > > > > > On Fri, Dec 20, 2019 at 1:39 PM Przemysław Trędak
> > > > > > > > > <ptre...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > That issue is now fixed in master, I am in the process of
> > > > > > > > > > cherry-picking the fix to v1.6.x branch. I will prepare the
> > > > > > > > > > RC1 once that is ready.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Przemek
> > > > > > > > >
> > > > > > > > > > On 2019/12/20 20:07:36, Lin Yuan wrote:
> > > > > > > > > > > What's the next step for the release? Should we continue
> > > > > > > > > > > testing this and vote or wait until
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/17105
> > > > > > > > > > > is fixed?
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > Lin
> > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 18, 2019 at 12:55 AM Lausen, Leonard wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks Przemysław for managing this release and everyone
> > > > > > > > > > > > who contributed to it.
> > > > > > > > > > > >
> > > > > > > > > > > > Unfortunately Zechen Wang just discovered another issue
> > > > > > > > > > > > with GPU Pointwise Fusion:
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/17105
> > > > > > > > > > > >
> > > > > > > > > > > > Thus, -1.
> > > > > > > > > > > >
> > > > > > > > > > > > Unfortunately, as the nightly release pipeline was broken
> > > > > > > > > > > > until recently (and still isn't re-set up completely
> > > > > > > > > > > > yet), the issue hasn't been discovered earlier.
> > > > > > > > > > > >
> > > > > > > > > > > > Przemysław may have a quick fix for the issue. Another
> > > > > > > > > > > > option would be