Re: CUDNN algorithm selection failure

2018-10-04 Thread kellen sunderland
"I ran a similar test(test_slice_batchnorm) for 5K times and I couldn't
reproduce the issue."

One thing to keep in mind is that the SelectAlgo call caches its results in
a registry held in static scope, so to reproduce the failure you'd likely
have to create a new process each time you run the test.  (Apologies if this
is already how you're reproducing it.)

SelectAlgo call:
https://github.com/apache/incubator-mxnet/blob/403831ace46eab4447794df9411351e439e8983e/src/operator/nn/cudnn/cudnn_convolution-inl.h#L609

Static local / singleton registry pattern here:
https://github.com/apache/incubator-mxnet/blob/024b5a916dd3a39a39031ce5e6565cd7d9d60fe2/src/operator/nn/cudnn/cudnn_algoreg.cc#L37
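
A minimal sketch of that caching pattern (illustrative only, not MXNet's
actual code; the class and member names AlgoRegistry, CachedAlgo, Find and
Register are made up) shows why a second selection with the same convolution
signature never re-runs the search inside one process:

#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Stand-in for the cached result of algorithm selection.
struct CachedAlgo {
  int fwd_algo;           // placeholder for cudnnConvolutionFwdAlgo_t
  size_t workspace_byte;  // workspace the chosen algorithm needs
};

class AlgoRegistry {
 public:
  // Meyers singleton: one instance per process, alive until process exit.
  static AlgoRegistry& Get() {
    static AlgoRegistry instance;
    return instance;
  }

  // Returns true and fills *out if this convolution signature was seen before.
  bool Find(const std::string& signature, CachedAlgo* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(signature);
    if (it == cache_.end()) return false;
    *out = it->second;
    return true;
  }

  // Records the algorithm chosen for a signature; later lookups skip selection.
  void Register(const std::string& signature, const CachedAlgo& algo) {
    std::lock_guard<std::mutex> lock(mutex_);
    cache_[signature] = algo;
  }

 private:
  std::mutex mutex_;
  std::map<std::string, CachedAlgo> cache_;
};

Because the static registry lives until process exit, looping a test inside
one test-runner process keeps hitting the cached entry; running each
iteration in a fresh process is what forces selection to run again.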


Re: CUDNN algorithm selection failure

2018-10-04 Thread Marco de Abreu
For GPU, we don't run any tests in parallel.

-Marco


Re: CUDNN algorithm selection failure

2018-10-04 Thread Naveen Swamy
Looking at the error raised, you can see that the workspace size (GPU memory
size) of 1 GB isn't sufficient. I am wondering if it is due to tests running
in parallel on CI; if that is the case, is it possible to reduce the
parallelism?
Error:
"mxnet.base.MXNetError: [05:40:12]
src/operator/nn/./cudnn/cudnn_convolution-inl.h:870: Failed to find any
forward convolution algorithm.  with workspace size of 1073741824 bytes,
please consider reducing batch/model size or increasing the workspace size"

I ran a similar test (test_slice_batchnorm) 5K times and I couldn't
reproduce the issue. I will look into it further to see if there are other
alternatives.
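
For reference, 1073741824 bytes is exactly the 1 GB limit mentioned above.
Below is a hedged sketch of how such a byte cap constrains forward-algorithm
selection, written against the cuDNN 7-era API
(cudnnGetConvolutionForwardAlgorithm with a workspace-limit preference); the
shapes are arbitrary and this is only an illustration of the mechanism, not
the failing code path. It needs a GPU and -lcudnn to build and run.

#include <cudnn.h>
#include <cstdio>
#include <cstdlib>

#define CHECK_CUDNN(call)                                              \
  do {                                                                 \
    cudnnStatus_t s = (call);                                          \
    if (s != CUDNN_STATUS_SUCCESS) {                                   \
      std::fprintf(stderr, "cuDNN error %s at line %d\n",              \
                   cudnnGetErrorString(s), __LINE__);                  \
      std::exit(1);                                                    \
    }                                                                  \
  } while (0)

int main() {
  cudnnHandle_t handle;
  CHECK_CUDNN(cudnnCreate(&handle));

  // Arbitrary example: NCHW 32x64x56x56 input, 3x3 conv, 128 filters, pad 1.
  cudnnTensorDescriptor_t x_desc, y_desc;
  cudnnFilterDescriptor_t w_desc;
  cudnnConvolutionDescriptor_t conv_desc;
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&x_desc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(x_desc, CUDNN_TENSOR_NCHW,
                                         CUDNN_DATA_FLOAT, 32, 64, 56, 56));
  CHECK_CUDNN(cudnnCreateFilterDescriptor(&w_desc));
  CHECK_CUDNN(cudnnSetFilter4dDescriptor(w_desc, CUDNN_DATA_FLOAT,
                                         CUDNN_TENSOR_NCHW, 128, 64, 3, 3));
  CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&conv_desc));
  CHECK_CUDNN(cudnnSetConvolution2dDescriptor(conv_desc, 1, 1, 1, 1, 1, 1,
                                              CUDNN_CROSS_CORRELATION,
                                              CUDNN_DATA_FLOAT));
  int n, c, h, w;
  CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(conv_desc, x_desc, w_desc,
                                                    &n, &c, &h, &w));
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&y_desc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(y_desc, CUDNN_TENSOR_NCHW,
                                         CUDNN_DATA_FLOAT, n, c, h, w));

  // Ask only for algorithms whose workspace fits under the 1 GiB cap.
  const size_t workspace_limit = 1073741824;  // bytes, as in the CI error
  cudnnConvolutionFwdAlgo_t algo;
  CHECK_CUDNN(cudnnGetConvolutionForwardAlgorithm(
      handle, x_desc, w_desc, conv_desc, y_desc,
      CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT, workspace_limit, &algo));

  size_t needed = 0;
  CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize(
      handle, x_desc, w_desc, conv_desc, y_desc, algo, &needed));
  std::printf("selected algo %d, workspace %zu bytes\n", (int)algo, needed);

  cudnnDestroyTensorDescriptor(x_desc);
  cudnnDestroyTensorDescriptor(y_desc);
  cudnnDestroyFilterDescriptor(w_desc);
  cudnnDestroyConvolutionDescriptor(conv_desc);
  cudnnDestroy(handle);
  return 0;
}

Raising the limit, or shrinking the batch/model as the error message
suggests, widens the set of algorithms that qualify under the cap.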


On Thu, Oct 4, 2018 at 10:48 AM Piyush Ghai  wrote:

> Another build where test_slice_batchnorm_reshape_batchnorm fails:
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12721/7/pipeline
>
> —
> Piyush
>

Re: CUDNN algorithm selection failure

2018-10-03 Thread Pedro Larroy
Seems it's not the only test:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12726/5/pipeline

test_slice_batchnorm_reshape_batchnorm is also failing and hasn't been
touched for a while. It doesn't look like a problem with the test to me
(not a flaky test). It looks to me like we should find and address the root
cause instead of disabling the test in this case.

Pedro.


Re: CUDNN algorithm selection failure

2018-10-02 Thread Marco de Abreu
I have created an issue at
https://github.com/apache/incubator-mxnet/issues/12715 and a PR to disable
the test at https://github.com/apache/incubator-mxnet/pull/12716.

This test is pretty new and was submitted with a number of other
problematic (and disabled) tests:
https://github.com/apache/incubator-mxnet/issues/11164. It is possible
that the test is simply not stable enough. The PR that introduced that test
is https://github.com/apache/incubator-mxnet/pull/10921 - it was merged two
days ago.

Best regards,
Marco

On Tue, Oct 2, 2018 at 8:43 AM Pedro Larroy wrote:

> Thanks for checking, Lin. If it happens again we will have to dig deeper.
> We have just one GPU executor, so I wonder what could be the root cause of
> this.
>


Re: CUDNN algorithm selection failure

2018-10-01 Thread Lin Yuan
I could not reproduce the error on an EC2 g3x8 instance, which makes it hard
to debug. I also suspect it was due to a resource usage limit on the CI
instance.

On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy wrote:

> It doesn't look like flakiness to me at first sight. I think it might be
> related to resource usage / allocation / leak in the worst case.
>
> Could be that there was not enough GPU memory at the time of test
> execution. But I'm just speculating, hence my original question.
>
> Pedro.
>
> On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan  wrote:
>
> > Hi Pedro,
> >
> > I also got this failure in my PR
> >
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
> >
> > I was not able to identify the root cause of it from the changelist. Are you
> > suggesting there is some flakiness in the master branch too?
> >
> > Thanks,
> >
> > Lin


CUDNN algorithm selection failure

2018-10-01 Thread Pedro Larroy
Hi

I saw this failure on CI:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline

Have you seen other cases where we fail to select the best CUDNN algorithm?
Under which circumstances could this happen, and do you think it is a good
idea to have one selected by default as a last resort?


Pedro.
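
On the "selected by default as a last resort" question, here is one possible
shape for it, sketched against the cuDNN 7-era API: if nothing fits under the
workspace cap, fall back to implicit GEMM, which generally needs no extra
workspace, and accept the speed penalty instead of failing the run. The
helper name and the policy itself are hypothetical, not MXNet's current
behaviour.

#include <cstddef>
#include <cudnn.h>

// Hypothetical helper: pick the fastest forward algorithm that fits the
// workspace cap, and fall back to implicit GEMM (which generally needs no
// extra workspace) instead of failing when nothing qualifies.
cudnnConvolutionFwdAlgo_t SelectFwdAlgoOrFallback(
    cudnnHandle_t handle,
    cudnnTensorDescriptor_t x_desc, cudnnFilterDescriptor_t w_desc,
    cudnnConvolutionDescriptor_t conv_desc, cudnnTensorDescriptor_t y_desc,
    size_t workspace_limit_bytes) {
  const cudnnConvolutionFwdAlgo_t kFallback =
      CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
  cudnnConvolutionFwdAlgo_t algo = kFallback;

  // Ask cuDNN for the fastest algorithm under the cap (cuDNN 7-era call).
  cudnnStatus_t status = cudnnGetConvolutionForwardAlgorithm(
      handle, x_desc, w_desc, conv_desc, y_desc,
      CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
      workspace_limit_bytes, &algo);
  if (status != CUDNN_STATUS_SUCCESS) {
    // Selection itself failed (e.g. a transient lack of GPU memory):
    // keep the low-workspace default rather than aborting.
    return kFallback;
  }

  // Double-check that the reported workspace really fits; otherwise fall back.
  size_t needed = 0;
  status = cudnnGetConvolutionForwardWorkspaceSize(
      handle, x_desc, w_desc, conv_desc, y_desc, algo, &needed);
  if (status != CUDNN_STATUS_SUCCESS || needed > workspace_limit_bytes) {
    return kFallback;
  }
  return algo;
}

The obvious trade-off is a silent slowdown whenever the fallback kicks in,
which is presumably why the existing code prefers to fail loudly instead.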