Hi Paul,

I am not quite sure what you are trying to do.

If you want batched versions of Cholesky and TriangularSolve, then you will
end up with "nb" copies of the gradients.
What are you going to do with those "nb" copies of the gradients?
AFAIK, theano.grad only accepts a scalar cost. If you need a Jacobian, you
can only implement it with theano.scan, and I know theano.scan is inefficient.
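
If your final cost is a scalar anyway (for example, a summed log-likelihood
over the batch), a batched op is no problem for theano.grad: just reduce the
batched output to a scalar before differentiating. Here is a minimal sketch of
that pattern; the scan only stands in for your batched op, and a real batched
Cholesky would replace it:

    import theano
    import theano.tensor as T
    from theano.tensor.slinalg import cholesky
    from theano.tensor.nlinalg import diag

    # X: a batch of nb symmetric positive-definite matrices, shape (nb, n, n).
    X = T.tensor3('X')

    # Per-batch scalar: log det(X_i) = 2 * sum(log(diag(chol(X_i)))).
    logdets, _ = theano.scan(
        lambda Xi: 2.0 * T.log(diag(cholesky(Xi))).sum(),
        sequences=X)

    # One scalar cost for the whole batch, so theano.grad applies directly
    # and returns a single (nb, n, n) gradient tensor.
    cost = logdets.sum()
    g = theano.grad(cost, X)
    f = theano.function([X], g)

(theano.gradient.jacobian is built on that kind of scan, which is why it is
slow.)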

You may be interested in this thread:
https://groups.google.com/d/msg/theano-users/Rg8ZIru-pgo/DgvwY57RBwAJ

Best,
wonghang

Paul Baggenstoss <[email protected]> 於 2020年2月10日 週一 下午8:46寫道:

> Hi Wonghang,
>     So I am working toward making Cholesky faster, and by that I also mean
> triangular solvers like GpuCublasTriangularSolve(). We typically do the
> Cholesky decomposition, then solve linear systems involving the Cholesky
> factor, which is upper or lower triangular.
>
> So I have started with GpuCublasTriangularSolve(). It has a lot of overhead
> from all the GPU array copying and creation, which adds to the overhead of
> theano.scan, which is necessary to work with batches. So I thought it would
> be much better to have batched versions of Cholesky() and
> GpuCublasTriangularSolve(). I have created working versions of these batched
> routines (attached). It's actually pretty simple: I just index into the data
> by batch, then call the GPU solver, potrf() for Cholesky and trsv() for the
> triangular solve. I loop over the batch like this:
>
>     # element size in bytes, for pointer arithmetic into the batch
>     if theano.config.floatX == 'float32':
>         wordlen = 4
>     else:
>         wordlen = 8
>     # solve one triangular system per batch element
>     for ib in range(nb):
>         trsv(ctx.cublas_handle, uplo, trans, diag, n,
>              A_ptr + ib*n*n*wordlen, lda, b_ptr + ib*n*m*wordlen, 1)
>
>
> There are a few small gotchas. For example, the GPU routines expect
> F-ordered data, but to index by batch as I did above, the batch axis has to
> be C-ordered. So the data has to be C-ordered across batches, but F-ordered
> within each batch!
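>
> For illustration only (this is not the Op code), the layout trick looks like
> this in NumPy: transposing each batch slab and making the result
> C-contiguous gives an array that is C-ordered across batches but F-ordered
> within each batch:
>
>     import numpy as np
>
>     nb, n = 4, 3
>     A = np.random.rand(nb, n, n)
>
>     # B is one C-contiguous block, but each slab B[i] holds the bytes of
>     # A[i] in column-major (Fortran) order, because the F-order bytes of
>     # A[i] are exactly the C-order bytes of A[i].T.
>     B = np.ascontiguousarray(A.transpose(0, 2, 1))
>
>     for i in range(nb):
>         assert B[i].tobytes() == A[i].tobytes(order='F')
>
> (For the symmetric inputs to Cholesky the transpose changes nothing anyway,
> but for the right-hand sides of the triangular solve it matters.)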
>
>  The problem I have, where I will need help, is in the gradients. Although I
> can compute the gradient in L_op correctly (I think), theano.gradient is not
> happy with the shape of the matrices that L_op returns. This is probably
> because gradient does not understand that the data is batched.
>
>    Do you think you can help in this matter? I think this is over my head.
> Paul
