You need to take the subtensor in the forward computation to save all of that computation. It is a hard problem to remove useless computation when the subtensor sits at the end of the graph; we cover very few of the optimizations that would be needed for that. So move the subtensor into the forward computation.
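Something like this (a rough, untested sketch based on the simplified example from the thread below) shows the idea: build the cost from the subtensor, so the gradient graph never touches the rest of X:

import theano
import theano.tensor as T

X = T.matrix()

# Take the subtensor first, in the forward graph ...
X0 = X[0]
# ... and build the (relevant part of the) cost from it.
Y0 = T.sum(X0 ** 2)

# The gradient is now w.r.t. X0 only, so Theano never has to
# build and then slice the gradient of the full cost w.r.t. X.
grad0 = T.grad(Y0, X0)

f = theano.function([X], grad0)
print(f([[1.0, 2.0], [3.0, 4.0]]))  # prints [2. 4.]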
Fred

On Mon., Oct. 2, 2017 at 23:35, dhern <[email protected]> wrote:

> Thanks for the reply.
>
> Right, that method however seems to address the issue for gradients with
> respect to shared variables. I am interested, as in the code above, in
> taking symbolic gradients with respect to subarrays of Theano tensors. That
> doesn't seem to be possible, correct? I will look more closely into taking
> a subtensor of the gradient, although I am not sure it reduces computation
> time in my actual code, since that is what I did to begin with and it is
> still very time consuming.
>
> On Thursday, September 28, 2017 at 3:32:19 PM UTC-4, Pascal Lamblin wrote:
>
>> Maybe the following can help you:
>>
>> http://deeplearning.net/software/theano/tutorial/faq_tutorial.html#how-to-update-a-subset-of-weights
>>
>> Also, if you take a subtensor of the gradient itself, some optimizations
>> can apply that would avoid the computation of the full gradient.
>>
>> For instance, with your example, the "subtensor" and "* 2" operations
>> are swapped:
>>
>> >>> grad0 = full_grad[0]
>> >>> g0 = theano.function([X, Y], grad0)
>> >>> theano.printing.debugprint(g0)
>> Elemwise{mul,no_inplace} [id A] ''   1
>>  |TensorConstant{(1,) of 2.0} [id B]
>>  |Subtensor{int64} [id C] ''   0
>>    |<TensorType(float64, matrix)> [id D]
>>    |Constant{0} [id E]
>>
>> On 2017-09-27 05:25 PM, Daniel Hernandez wrote:
>> > Hi,
>> >
>> > I was wondering if someone here had an answer to this unsolved question
>> > over in Stack Overflow:
>> >
>> > https://stackoverflow.com/questions/37545325/theano-gradient-of-subtensor
>> >
>> > Basically, how do you compute gradients w.r.t. a subtensor?
>> >
>> > The question arises in the context of large tensors, say Y and X, where
>> > it is known that each entry in Y depends only on a small subset of the
>> > entries of X. Taking T.grad(Y, X) is computationally expensive since it
>> > will compute every possible gradient, so one would like to be able to
>> > compute, e.g., T.grad(Y, X[i]). Here is some basic code illustrating the
>> > problem.
>> >
>> > X = T.matrix()
>> > Y = T.sum(X**2)
>> >
>> > full_grad = T.grad(Y, X)  # This works
>> >
>> > X0 = X[0]
>> > test = T.grad(Y, X0)  # This pukes a Disconnected Input error
>> >
>> > Silencing the Disconnected Input can be done in grad, but of course,
>> > that doesn't solve anything; evaluating the gradients only results in a
>> > bunch of 0s. So, is there a way of taking these gradients with respect
>> > to a subtensor?
>>
>> --
>> Pascal Lamblin
