Hi wonghang,
Sorry to pester you with emails, but I have some interesting timing information.
I ran the same process on different processors and with different ways of 
computing Cholesky(). The results are surprising.

GpuMagmaCholesky()                  9.0 sec
slinalg.Cholesky (uses cuSOLVER)    2.9 sec
CPU                                 1.9 sec

It looks like it pays to just use the CPU!

Doesn't make any sense!
Paul
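
For anyone wanting to reproduce a comparison like the one above, here is a minimal CPU-only timing sketch. It uses NumPy rather than Theano, and the helper names make_spd and best_time, the matrix size, and the repeat count are illustrative assumptions, not taken from the measurements above. A GPU measurement would additionally have to account for host-to-device transfers and for waiting until the computation has actually finished, which this sketch does not cover.

```python
import time
import numpy as np


def make_spd(n, dtype=np.float64, seed=0):
    """Return a random symmetric positive-definite n x n matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    # a @ a.T is positive semi-definite; adding n * I makes it safely positive definite.
    return (a @ a.T + n * np.eye(n)).astype(dtype)


def best_time(fn, arg, repeats=5):
    """Best wall-clock time of fn(arg) over several runs, after one warm-up call."""
    fn(arg)  # warm-up call, so one-time setup cost is not measured
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        times.append(time.perf_counter() - start)
    return min(times)


if __name__ == "__main__":
    n = 500  # assumed size; the size of the original matrix is not stated
    for dtype in (np.float32, np.float64):
        c = make_spd(n, dtype=dtype)
        seconds = best_time(np.linalg.cholesky, c)
        print(np.dtype(dtype).name, "Cholesky, best of 5:", seconds, "sec")
```

Taking the best of several runs after a warm-up call keeps one-time setup cost out of the figure, which matters when the factorization itself only takes a second or two.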


On Thursday, February 6, 2020 at 2:53:55 PM UTC+1, Paul Baggenstoss wrote:
>
>
> Hello again.  
>      So I added 64-bit support to theano/gpuarray/c_code/magma_cholesky.c 
> and to theano/gpuarray/linalg.py in the function GpuMagmaCholesky(). I 
> attached the files.
> It now works for both 32 and 64 bit and has a gradient. The numerical 
> problem is gone. 
>   But (and this is a big BUT) it seems to be at least a factor of 2 
> slower than the CPU. Any thoughts on this?
> Paul
>
>
> On Thursday, February 6, 2020 at 10:28:08 AM UTC+1, Paul Baggenstoss wrote:
>>
>> Simon,
>> I did more digging and have some more information. I tested 
>> theano.gpuarray.linalg.GpuMagmaCholesky(),  on float32 and it looks good. 
>> The result is exactly the same as for CPU.
>> So the problem seems to be in cuSOLVER. The catch is that 
>> theano.gpuarray.linalg.GpuMagmaCholesky()(Cll) does not define a gradient 
>> and works only for
>> float32. I installed the latest magma-2.5.2 and it has support for double 
>> precision Cholesky (dpotrf), but Theano seems to use its own copy of the 
>> MAGMA source.
>> I'm not sure how that works. Can I force Theano to use magma-2.5.2? If not, 
>> it seems feasible to borrow the gradient from 
>> theano.gpuarray.linalg.GpuCholesky()
>> and add support for float64 as well.  Thoughts?
>> Paul
>>
>>
>> On Wednesday, February 5, 2020 at 5:32:43 PM UTC+1, Paul Baggenstoss 
>> wrote:
>>>
>>> Hi Simon, I forgot to mention that I use the gradient of Cholesky, and 
>>> this has even more error than the Cholesky decomposition, but I assume 
>>> that this is because
>>> of a bug in Cholesky itself.
>>> Paul
>>>
>>>
>>> On Wednesday, February 5, 2020 at 5:30:10 PM UTC+1, Paul Baggenstoss 
>>> wrote:
>>>>
>>>> Hi Simon, I have uploaded a MATLAB-format file with the matrix Cll, 
>>>> which is the original matrix, R_cpu, which was produced on the CPU by 
>>>> slinalg.Cholesky(), and R_cuda, which
>>>> was produced by the same function but on the GPU (I think it uses 
>>>> theano.gpuarray.linalg.GpuCholesky()). Both used the same precision 
>>>> (float32), so they should give the same results.
>>>> But you can see that at the end of the diagonal, the values go wild. It 
>>>> appears to be numerical error. 
>>>> Thanks in advance!
>>>> Paul
>>>>
>>>>
>>>>
>>>>
>>>> On Wednesday, February 5, 2020 at 4:56:14 PM UTC+1, Wong Hang wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> The GPU Cholesky decomposition relies on cuSOLVER or MAGMA. I believe 
>>>>> NVIDIA knows their hardware well, so cuSOLVER should give the most 
>>>>> efficient result.
>>>>>
>>>>> Although Cholesky decomposition is numerically very stable, when I wrote 
>>>>> the test case I found that I ran into trouble with relatively small 
>>>>> matrices when using single precision.
>>>>>
>>>>> Are you using single precision on a big matrix? 
>>>>> If not, try computing the condition number of the matrix to see if it 
>>>>> is too big.
>>>>>
>>>>> If it is not too big, then it may be a bug. I also need to use the 
>>>>> Cholesky operator. Please send me the matrix and I am willing to fix it.
>>>>>
>>>>> Best,
>>>>>
>>>>> On Thursday, February 6, 2020 at 0:34, Paul Baggenstoss <[email protected]> wrote:
>>>>>
>>>>>> Hi Simon, I was wondering if you got anywhere with the faster 
>>>>>> Cholesky for Theano. I also use it a lot and have found it to be 
>>>>>> unstable 
>>>>>> on the GPU.
>>>>>> Paul
>>>>>>
>>>>>> On Saturday, March 7, 2015 at 11:45:36 AM UTC+1, Simon Ebner wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I want to do computations where I rely heavily on the Cholesky 
>>>>>>> decomposition. Writing a small benchmark for tensor.slinalg.Cholesky, I 
>>>>>>> noticed that the implementation is not as fast as I hoped. As far as I 
>>>>>>> can 
>>>>>>> tell it is not optimized for GPUs yet but relies on the scipy 
>>>>>>> implementation?
>>>>>>> Doing a bit of a Google search, I found several CUDA implementations 
>>>>>>> of fast Cholesky decompositions on the GPU. Before I try to include 
>>>>>>> that 
>>>>>>> code in my Theano environment, could you let me know whether you 
>>>>>>> decided 
>>>>>>> not to implement fast Cholesky decomposition on the GPU on purpose? 
>>>>>>> Furthermore, since I'm fairly new to Theano, I'm not completely 
>>>>>>> confident 
>>>>>>> how best to incorporate CUDA code into my existing Theano code. Is it 
>>>>>>> sensible to create a custom Op with optimized C code?
>>>>>>>
>>>>>>> Best,
>>>>>>> Simon
>>>>>>>
>>>>>> -- 
>>>>>>
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "theano-users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/theano-users/aca41c35-ec36-4055-bac7-e53aced30ea7%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/theano-users/aca41c35-ec36-4055-bac7-e53aced30ea7%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
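
The condition-number check suggested in the quoted reply, together with a float32 versus float64 comparison, could be sketched as follows. This is a NumPy-only illustration: cholesky_report and relative_residual are hypothetical helper names, and the random test matrix stands in for Cll, which is not reproduced here. As a rough rule of thumb, once the condition number approaches 1/eps for float32 (eps is about 1.2e-7), a single-precision factorization can be expected to lose most of its accuracy.

```python
import numpy as np


def relative_residual(c, lower_factor):
    """Relative Frobenius error ||L L^T - C|| / ||C||, evaluated in float64."""
    c64 = np.asarray(c, dtype=np.float64)
    l64 = np.asarray(lower_factor, dtype=np.float64)
    return np.linalg.norm(l64 @ l64.T - c64) / np.linalg.norm(c64)


def cholesky_report(c):
    """Condition number of c plus the Cholesky residual in float32 and float64."""
    c64 = np.asarray(c, dtype=np.float64)
    report = {"cond": float(np.linalg.cond(c64))}
    for dtype in (np.float32, np.float64):
        name = np.dtype(dtype).name
        try:
            factor = np.linalg.cholesky(c64.astype(dtype))
            report[name] = float(relative_residual(c64, factor))
        except np.linalg.LinAlgError:
            # The matrix is not numerically positive definite at this precision.
            report[name] = float("nan")
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((200, 200))
    c = a @ a.T + 200 * np.eye(200)  # well-conditioned stand-in, not the real Cll
    print("float32 eps:", np.finfo(np.float32).eps)
    for key, value in cholesky_report(c).items():
        print(key, value)
```

The same relative_residual call can be applied to factors computed elsewhere, for example R_cpu and R_cuda, provided they are lower-triangular factors of Cll; an upper-triangular factor would need to be transposed first.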
