Hi Paul, I am not quite sure what you are trying to do.
If you want batched versions of Cholesky and TriangularSolve, you will end up with nb copies of the gradients. What are you going to do with those nb copies? AFAIK, theano.grad only accepts a scalar cost. If you need a Jacobian, you can only implement it with theano.scan, and I know theano.scan is inefficient. You may be interested in this thread: https://groups.google.com/d/msg/theano-users/Rg8ZIru-pgo/DgvwY57RBwAJ

Best,
wonghang

Paul Baggenstoss <[email protected]> wrote on Mon, 10 Feb 2020 at 20:46:

> Hi Wonghang,
> So I am working toward making the Cholesky problem faster, and by that I
> include triangular solvers like GpuCublasTriangularSolve(). We typically
> do the Cholesky decomposition, then solve linear systems involving the
> Cholesky matrix, which has upper or lower triangular form.
>
> So I have started with GpuCublasTriangularSolve(). It has a lot of
> overhead from all the GPU array copying and creation, which adds to the
> overhead of theano.scan, which is necessary to work with batches. So I
> thought it would be much better to have batched versions of Cholesky()
> and GpuCublasTriangularSolve(). I have created working versions of these
> batched routines (attached). It's actually pretty simple: I just index
> into the data by batch, then call the GPU solver, potrf() for Cholesky
> and trsv() for the solve. I loop over the batch like this:
>
>     if theano.config.floatX == 'float32':
>         wordlen = 4
>     else:
>         wordlen = 8
>     for ib in range(nb):
>         trsv(ctx.cublas_handle, uplo, trans, diag, n,
>              A_ptr + ib*n*n*wordlen, lda,
>              b_ptr + ib*n*m*wordlen, 1)
>
> There are a few small gotchas, such as that the GPU routines expect
> F-ordered data, but to index as I did above, the data has to be
> C-ordered. So the data has to be C-ordered by batch, but F-ordered
> within each batch!
>
> The problem I have, where I will need help, is in the gradients.
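[A CPU-side sketch of what the batched loop computes, using NumPy as a stand-in for the cuBLAS calls — illustrative only, not the actual Theano op. It also checks that the byte stride between consecutive batch slices of a C-contiguous array matches the pointer arithmetic `A_ptr + ib*n*n*wordlen` in the loop above:]

```python
import numpy as np

# NumPy sketch (not the actual GPU op): each batch slice is factored and
# solved independently, like the potrf()/trsv() calls in the loop.
nb, n, m = 3, 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((nb, n, n))
A = A @ np.transpose(A, (0, 2, 1)) + n * np.eye(n)   # SPD per batch
B = rng.standard_normal((nb, n, m))

L = np.linalg.cholesky(A)   # batched over the leading (batch) axis
X = np.stack([np.linalg.solve(L[ib], B[ib]) for ib in range(nb)])

assert np.allclose(L @ X, B)   # per-batch triangular solves check out

# The batch stride in bytes matches A_ptr + ib*n*n*wordlen:
wordlen = A.dtype.itemsize     # 4 for float32, 8 for float64
assert A[1].ctypes.data - A[0].ctypes.data == n * n * wordlen
```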
> Although I can compute the gradient in L_op correctly (I think),
> theano.gradient is not happy with the shape of the matrices that L_op
> returns. This is probably because gradient does not understand that the
> data is batched.
>
> Do you think you can help with this? I think it is over my head.
> Paul
>
> --
> You received this message because you are subscribed to the Google
> Groups "theano-users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/theano-users/b1336e9d-9601-48e9-9665-58740618a861%40googlegroups.com
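[A finite-difference sketch of the shape contract I believe is at issue here — this reflects my assumption about what theano.grad expects, not Theano's actual code: for a batched input A of shape (nb, n, n) and a scalar cost, the gradient L_op returns must have exactly the input's shape, i.e. the per-batch gradients stacked along the batch axis:]

```python
import numpy as np

# Assumed contract: gradient of a scalar cost w.r.t. a batched input has
# the input's full (nb, n, n) shape -- per-batch gradients stacked.
nb, n = 2, 3
rng = np.random.default_rng(1)
A = rng.standard_normal((nb, n, n))
A = A @ np.transpose(A, (0, 2, 1)) + n * np.eye(n)   # SPD per batch

def cost(A):
    # scalar cost: sum over the batched Cholesky factors
    return np.linalg.cholesky(A).sum()

# central finite differences over every element of the batched input
eps = 1e-6
grad = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    dA = np.zeros_like(A)
    dA[idx] = eps
    grad[idx] = (cost(A + dA) - cost(A - dA)) / (2 * eps)

# the gradient matches the batched input shape, which is what L_op
# must return for theano.gradient to accept it
assert grad.shape == A.shape == (nb, n, n)
```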
