Thanks Andreas - 

Yes, unfortunately the sparsity pattern does change (the code I run grows and 
shrinks mountain glaciers), so I think I'm out of luck with GPU processing here. 
I'll consider some other options. PyCUDA, nevertheless, is a great package - 
thanks everyone for your hard work developing and maintaining it. 

Cheers, 


-----Original Message-----
From: Andreas Kloeckner [mailto:li...@informa.tiker.net] 
Sent: Sunday, September 19, 2010 6:07 PM
To: Brian Menounos; pycuda@tiker.net
Subject: RE: [PyCUDA] SparseSolve.py example

Hi Brian,

On Tue, 7 Sep 2010 19:56:43 +0000, Brian Menounos <menou...@unbc.ca> wrote:
> Hi Andreas - I realize you're pretty busy answering emails of late, so answer 
> when you can... 

Yeah, sorry. Pretty swamped ATM. I hope things will clear out a bit during the 
fall semester, but so far I don't see when that would be happening...

Btw, I cc'd the list on my reply. Hope you don't mind. Please keep them in the 
loop (for archival purposes) unless you're discussing something confidential.

> I've attached your SparseSolve.py examples tweaked to deal with two 
> pickled numpy arrays (1D and 2D) in order to try out pycuda's 
> conjugate gradient (cg) function.
> 
> I'm typically building sparse matrices and doing iterative cg calls as 
> part of a numerical model for mountain glaciation. I was hoping to 
> speed up the cg function within scipy by sending the task to my gpu. 
> However, what is clear is that much time is spent assembling the 
> packets (your PacketedSpMV() function) before execution of 
> solve_pkt_with_cg().
> 
> I need to execute cg for each time step of my model (typically 1-1.yr 
> steps for 10,000 yr integration) and this is the part of the model 
> where most time is spent. Any speed up here would be ideal.
> 
> However, the performance is about 20 times slower than if run on a 
> single cpu using scipy's cg function. I knew there would be some 
> overhead for reading/writing to the GPU, but I wasn't expecting this 
> much time in packet assembly. Am I wasting my time trying to do this 
> on a GPU? I apologize in advance for my deficit in GPU/parallel 
> coding!
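[The per-time-step workflow Brian describes - assemble a sparse system, then
solve it with scipy's conjugate gradient - looks roughly like the sketch below.
The 1D operator, sizes, and loop count are placeholders, not the actual glacier
model.]

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Toy stand-in for the glacier operator: a 1D Laplacian in CSR format.
n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x = np.zeros(n)
for step in range(3):          # stand-in for the multi-thousand-year integration
    # Warm-start each solve from the previous time step's solution.
    x, info = cg(A, b, x0=x)
    assert info == 0           # 0 means the solver converged
```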

Does the sparsity structure of the matrix change? If not, you could simply 
scatter the new entries into the existing data structure, which would be pretty 
fast (but would still require a little additional code on top of what's there).
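[The scatter Andreas suggests can be sketched with scipy's CSR format: when the
sparsity pattern is fixed, only the value array needs updating, and the index
structure is untouched. The toy Laplacian and the halved coefficients below are
placeholders for the real operator and its updated entries.]

```python
import numpy as np
import scipy.sparse as sp

# Build the sparse structure once (toy 1D Laplacian, not the glacier operator).
n = 5
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Later time steps: same sparsity, new coefficients. Overwrite the CSR value
# array in place - indices/indptr stay valid, nothing is rebuilt or repartitioned.
new_vals = 0.5 * A.data        # hypothetical updated entries, same CSR layout
A.data[:] = new_vals
```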

If your structure *does* change and can't be predicted/generalized over 
somehow, then the present code is simply not for you. It spends a significant 
amount of (CPU!) time building, partitioning and transferring, under the 
assumption that this only happens during preprocessing. The actual CG and 
matrix-vector products are tuned to be fast. If you want to accommodate 
changing sparsity patterns, you'd have to GPU-ify assembly, but I don't think 
even cusp [1] does that.

Andreas

[1] http://code.google.com/p/cusp-library/


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
