On 9/28/2011 8:15 AM, David Mertens wrote:
Hey Rob,
I've CC'd the PDL list in case somebody there can speak more to your
concerns about PDL. I've never heard of anybody needing 10s of thousands of
dimensions, and PDL might only support 256. Anybody know? As far as sparse
matrix support, I never used it and it's not a crowning feature. Can anybody
else speak more to the matter?
I think the dimensions he is describing are the lengths
of the vectors which would correspond to the size of
dim(0).
PDL doesn't natively support sparse representations
as far as I know. One can use run-length encoding
and decoding to create a compact representation from
which sparse operations could be constructed.
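As a minimal sketch of that idea, PDL ships a run-length encode/decode pair, `rle` and `rld`, that can round-trip a mostly-zero vector through a compact form (the data here is just illustrative):

```perl
use strict;
use warnings;
use PDL;

# A mostly-zero vector, e.g. one row of a word-count matrix.
my $dense = pdl(0, 0, 0, 5, 0, 0, 2, 2, 0);

# rle() compresses runs of equal values into (run-lengths, values) pairs.
my ($lengths, $values) = rle($dense);

# rld() reverses the encoding, recovering the dense vector.
my $roundtrip = rld($lengths, $values);

print "lengths=$lengths values=$values\n";
print "roundtrip ok\n" if all($roundtrip == $dense);
```

Sparse operations would then work on the (lengths, values) pairs directly, expanding with `rld` only where a dense computation is unavoidable.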
You might wish to try the non-sparse version with PDL
to develop the approach (reducing the dimensionality
of your space if needed to meet memory requirements).
Once you have more details on the computation and its
memory requirements, you can work toward a better
parallelized implementation.
--Chris
There's no reason CUDA or OpenCL couldn't handle 10s of thousands of
dimensions, if that's what you need, although you would have to write the C
code to handle it. I'm not sure how to handle sparse matrices in CUDA,
though I believe it's possible. However, my module doesn't really help teach
CUDA, and you'll need to learn that somewhere before you'll see great gains
in performance.
David
On Sep 26, 2011 11:45 PM, "Rob Freeman" <[email protected]> wrote:
David,
GPU parallelization may not be sufficiently advanced for my purposes
yet anyway. The dimensions of my vectors are words, so they have
thousands, and even 10's of thousands of dimensions.
I'll have a look at PDL. If it handles sparse arrays efficiently it
might get me over the hump. I can almost get away with speed issues by
storing intermediate products in RAM, but my current implementation
uses hashes, and Perl hashes seem to get way too big way too fast.
The nVidia cross product routine may not matter. I need to define my
own basis vectors and their relationships. Nothing complex. It is
really just a lot of searching for combinations between vector
elements, substituting, and then collating all the substitutions. A
lot of small operations, and any can update any other, at any time. A
snap in parallel, but serially it is both really slow and requires
really enormous storage for intermediate results.
-Rob
On Mon, Sep 26, 2011 at 8:08 PM, David Mertens <[email protected]>
wrote:
Hi Rob!
Thanks for contacting me about the CUDA module. Although I know the
most about CUDA::Minimal, I expect that the PDL folks might also have
something to say, so I've CC'd them on my response. The PDL community
is a great resource for all questions related to numerical computing,
except possibly for the BioPerl modules. Also, PDL might provide a
good place to prototype your idea before moving to CUDA. Depending on
the way in which you perform your cross product, PDL may be able to
parallelize your calculation across multiple CPUs, if your machine has
them.
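For example (a sketch, with illustrative shapes), PDL's `crossp` from PDL::Primitive broadcasts a 3-vector cross product over a whole stack of vectors in one call, which is the kind of prototype meant here:

```perl
use strict;
use warnings;
use PDL;

# Two stacks of 3-vectors: dim(0) is the vector, dim(1) indexes the stack.
my $a = pdl([[1, 0, 0], [0, 1, 0]]);
my $b = pdl([[0, 1, 0], [0, 0, 1]]);

# crossp() threads over the extra dimension, computing each row's
# cross product; rows come out as [0 0 1] and [1 0 0].
my $c = crossp($a, $b);
print $c, "\n";
```

Because the per-row operations are independent, PDL's threading machinery can, where configured, spread them across CPU cores without changes to the calling code.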
A cross-product seems like it would parallelize nicely, and nVidia
even has a cross product available which you can find here:
http://http.developer.nvidia.com/Cg/cross.html. However, the module
that I wrote does not contain any Perl-callable CUDA kernels. The
module was really aimed at my own CUDA work in which I wrote all my
own kernels. If the supplied cross product does not work for you,
you'll have to write your own kernel. How much do you know about CUDA?
David
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl