Ok. Is random number generation working in the new GPU backend yet? I can
see some code related to it, but a call to *uniform()* produces the error
messages "context name None not defined" and "Could not infer context from
inputs". Looks like it's not possible to specify the target device to
*uniform()*.
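
For reference, this is roughly the kind of call that fails for me; the exact
code isn't included here, so treat the stream, size and seed below as
placeholders rather than my actual script:

# illustrative sketch only; sizes, seed and dtype are placeholders
# run with e.g. THEANO_FLAGS=device=cuda0 (the new back-end device names)
import theano
from theano.sandbox.rng_mrg import MRG_RandomStreams

srng = MRG_RandomStreams(seed=1234)
# there is no argument here for choosing the GPU context
sample = srng.uniform(size=(100,), dtype='float32')
f = theano.function([], sample)
print(f())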
On Monday, November 21, 2016 at 2:40:03 AM UTC+2, Pascal Lamblin wrote:
>
> Right, now I remember that the _dev20 version only works on a limited
> number of dimensions. That would explain why adding a new axis helped.
>
> It may already be fixed in the new GPU back-end (it needs libgpuarray,
> then use device=cudaX instead of gpuX); otherwise, this is where we
> should fix that.
>
> On Fri, Nov 18, 2016, Seppo Enarvi wrote:
> >
> >
> > That's interesting, because this function is not supposed to update the
> > bias. It just computes the cost and its gradient. Maybe that op is used
> > to update the gradient.
> >
> > My GPU is Quadro K2000. I don't think it's too old because the graph
> > contains other instances of GpuAdvancedIncSubtensor1_dev20.
> >
> > Anyway, I started to wonder why I don't have this problem with the weight
> > matrix. I'm selecting vectors from the weight matrix in the same manner.
> > So I tried converting the bias vector into a matrix and selecting rows
> > from the matrix (each of which contains only one element):
> >
> > bias = bias[class_ids]
> > =>
> > bias = bias[:, None]
> > bias = bias[class_ids, 0]
> >
> > It's a lot faster this way. I updated to the latest version of Theano
> > from Git and I still see the huge speed difference.
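> >
> > As a self-contained illustration of that workaround (the shared variable,
> > its size and the names below are made up for the example, not my actual
> > training code):
> >
> > import numpy
> > import theano
> > import theano.tensor as T
> >
> > bias = theano.shared(numpy.zeros(10001, dtype='float32'), name='bias')
> > class_ids = T.ivector('class_ids')
> >
> > # slow form: advanced indexing directly into the vector
> > selected = bias[class_ids]
> >
> > # faster form: view the vector as an Nx1 matrix and index its rows
> > bias_matrix = bias[:, None]
> > selected = bias_matrix[class_ids, 0]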
> >
> > Seppo
> >
> >
> >
> > On Friday, November 18, 2016 at 6:49:56 PM UTC+2, Pascal Lamblin wrote:
> > >
> > > Hi,
> > >
> > > This operation is actually the _update_ of the selected elements of
> > > the bias.
> > >
> > > There is a faster implementation (named GpuAdvancedIncSubtensor1_dev20
> > > IIRC) that uses atomic addition to speed up that operation. It has the
> > > downside of not yielding a deterministic order of summation if the
> > > same element is updated more than once in the same operation.
> > >
> > > One of the issues seems to be that this faster implementation is not
> > > selected. Could it be that you have an old GPU?
> > >
> > > Another potential issue is that your graph seems to first apply
> > > updates on a tensor of zeros, and then apply another update on the
> > > bias itself. There may be a way of simplifying that.
> > >
> > > On Fri, Nov 18, 2016, Seppo Enarvi wrote:
> > > >
> > > > I'm implementing sampling-based softmax alternatives, where I
> > > > compute the preactivations only for certain output classes. I get
> > > > very bad performance due to a GpuAdvancedIncSubtensor1 op, which
> > > > consumes 90 % of the processing time of the update function:
> > > >
> > > > <% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
> > > >   89.0%   89.0%   725.413s   2.44e-01s   2968   115
> > > > GpuAdvancedIncSubtensor1{inplace,inc}(GpuAdvancedIncSubtensor1{inplace,inc}.0, GpuFromHost.0, Elemwise{Cast{int64}}.0)
> > > >   input 0: dtype=float32, shape=(10001,), strides=(1,)
> > > >   input 1: dtype=float32, shape=(25600,), strides=(1,)
> > > >   input 2: dtype=int64, shape=(25600,), strides=c
> > > >   output 0: dtype=float32, shape=(10001,), strides=(1,)
> > > >
> > > > Looking at the computation graph of that function, I noticed it's
> > > > operating on the bias vector:
> > > >
> > > > GpuAdvancedIncSubtensor1{inplace,inc} [id FL] '' 115
> > > > |GpuAdvancedIncSubtensor1{inplace,inc} [id FM] '' 112
> > > > | |GpuAlloc{memset_0=True} [id FN] '' 17
> > > > | | |CudaNdarrayConstant{[ 0.]} [id FO]
> > > > | | |Shape_i{0} [id FP] '' 7
> > > > | | |bias [id BU]
> > > >
> > > > More precisely, the performance hit seems to come from selecting
> > > > from the bias vector those values that correspond to the output
> > > > classes (bias = bias[class_ids]). Is that a particularly expensive
> > > > operation? class_ids can be large (1,000 - 10,000). If I don't use
> > > > the bias, my speed improves tenfold. Is there a way to circumvent
> > > > that problem?
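> > > >
> > > > For context, here is a rough sketch of the kind of expression I
> > > > mean; the variable names and sizes are illustrative, not my actual
> > > > code:
> > > >
> > > > import numpy
> > > > import theano
> > > > import theano.tensor as T
> > > >
> > > > hidden = T.matrix('hidden')          # (batch, hidden_dim)
> > > > class_ids = T.ivector('class_ids')   # sampled output classes
> > > > weights = theano.shared(
> > > >     numpy.zeros((10001, 256), dtype='float32'), name='weights')
> > > > bias = theano.shared(numpy.zeros(10001, dtype='float32'), name='bias')
> > > >
> > > > # preactivations only for the sampled classes; the gradient of
> > > > # bias[class_ids] seems to be where the AdvancedIncSubtensor1 appears
> > > > preact = T.dot(hidden, weights[class_ids].T) + bias[class_ids]
> > > > cost = preact.sum()
> > > > grads = T.grad(cost, [weights, bias])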
> > >
> >
>
>
> --
> Pascal
>