Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

2018-05-17 Thread Marco de Abreu
Hello Haibin,

I'd love to see CUDA 8 back in CI, but we currently lack the people to do
this properly (beyond just copying the job). Since we agreed on supporting
only the last 2 CUDA major versions, we don't have to verify CUDA 7.

The way forward is to have things like these in the nightly test cycle. At
the moment, we don't have the manpower to maintain and improve that suite,
so we'll have to wait until we have more people or somebody is willing to
take this on themselves. I'd be happy to support volunteers here!

Best regards,
Marco


Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

2018-05-16 Thread Haibin Lin
Is there a plan for adding those CUDA 8 tests back to CI? What about CUDA 7?

There were a few build problems in the past few weeks due to lack of CI
coverage:
- https://github.com/apache/incubator-mxnet/pull/10710 was found during
1.2 rc voting
- https://github.com/apache/incubator-mxnet/issues/10981 was reported by
a user with CUDA 7

Having these covered in CI will help catch the issues early. I don't recall
if we decided to drop CUDA 7 support for MXNet.

Best,
Haibin


Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

2018-03-21 Thread Marco de Abreu
Hello,

the migration has just been completed and we're now running our UNIX-based
slaves on CUDA 9.1 with CuDNN 7. The commit is available at
https://github.com/apache/incubator-mxnet/commit/b0a6760efa141aeca87b03ecf34dae924bd1af46

No jobs have been interrupted by this migration. If you encounter any
errors, please reach out to me.

Best regards,
Marco


[LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

2018-03-20 Thread Marco de Abreu
Hello,

the results of this vote are as follows:

+1:
Jun
Anirudh
Hao
Marco

0:
Chris

-1:
Naveen (veto recalled as of
https://lists.apache.org/thread.html/242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%3Cdev.mxnet.apache.org%3E)

Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
UNIX slaves and work on integration tests for CUDA 8 in the long term, this
vote counts as PASSED.

The PR for this change is available at
https://github.com/apache/incubator-mxnet/pull/10108. I have developed and
tested the new slaves in our test environment and everything looks
promising so far. The plan is as follows:

   1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved to
   allow self-merge – CI can’t pass until slaves have been upgraded.
   2. Replace all existing slaves with new upgraded slaves.
   3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
   merge necessary changes into master.

IMPORTANT: The migration will happen tomorrow, so please expect some delay
in job execution - the CI website will be unaffected. Ideally, no jobs
should fail - in case they do, please feel free to retrigger them by using
an empty commit. In case of any errors appearing after the upgrade, don't
hesitate to contact me!

Best regards,
Marco

On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy wrote:

> Yes, for short-term.
>
> On Monday, March 19, 2018, Chris Olivier wrote:
>
> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> > Windows CUDA 8 in order to get CUDA version coverage?
> >
> > > On 2018/03/16 21:09:09, Marco de Abreu wrote:
> > > > Thanks for your input. How would you propose to proceed in terms of a
> > > > timeline in case this vote succeeds? I don't really have time to work
> > > > on a nightly setup right now. Would anybody in the community be able
> > > > to help me out here or shall we wait with the migration until a
> > > > nightly setup for CUDA 8 is up?
> > >
> > > -Marco
> > >
> > > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker wrote:
> > > >
> > > > > +1 to the suggestion of testing CUDA8 in a few nightly instances
> > > > > and using CUDA9 for most instances in CI.
> > > >
> > > > Bhavin Thaker.
> > > >
> > > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy wrote:
> > > > >
> > > > > > I think it's best to add support for CUDA 9.0 while retaining
> > > > > > existing support for CUDA 8; the code might regress once CUDA 8
> > > > > > is removed, creating more work to add CUDA 8 support back.
> > > > >
> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > Yeah, sorry Chris, mixed up the names.
> > > > > >
> > > > > > > @Naveen: Would you be fine with doing the switch now and adding
> > > > > > > integration tests later or is this a hard constraint for you?
> > > > > >
> > > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <
> > > > > > > cjolivie...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Isn't the Titan V the Volta and not the Tesla?
> > > > > > >
> > > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <
> > > > > > > > mnnav...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Marco,
> > > > > > > > > My -1 vote is against dropping support for CUDA 8, not
> > > > > > > > > against adding CUDA 9. CUDA 9.0 support for MXNet was added
> > > > > > > > > on Oct 30, 2017; I think not all users will have switched
> > > > > > > > > to CUDA 9.0 yet.
> > > > > > > > >
> > > > > > > > > Look at the earlier discussion on the same topic:
> > > > > > > > >
> > > > > > > > > https://lists.apache.org/thread.html/27b84e4fc0e0728f2e4ad8b6827d7f996635021a5a4d47b5d3f4dbfb@%3Cdev.mxnet.apache.org%3E
> > > > > > > >
> > > > > > > > On Wed, Mar 14, 2018 at 10:14 AM, Marco de Abreu <
> > > > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > > > >
> > > > > > > > > > Right, the code changes would not be validated against
> > > > > > > > > > CUDA 8.0 as part of the PR process.
> > > > > > > > > >
> > > > > > > > > > I don't have any numbers, but it's pretty unlikely that
> > > > > > > > > > anybody is still using CUDA 8.0. According to
> > > > > > > > > > https://en.wikipedia.org/wiki/CUDA#GPUs_supported, the
> > > > > > > > > > devices which are not supported by CUDA 9 are under the
> > > > > > > > > > Fermi architecture, which was released in April 2010.
> > > > > > > > > > These GPUs are way too old, so I think we're safe with
> > > > > > > > > > not covering them specifically - this does not mean we're
> > > > > > > > > > entirely deprecating them.
> > > > > > > > > >
> > > > > > > > > > One thing to note here is that we're not testing CUDA 9
> > > > > > > > > > as of now.
> > > > > > > > >