Hi Eli,

Thanks for your very nice points. I certainly agree that because the input
is a list of sub-lists of strings, it's a bit more complicated to get it
into a format understood by CUDA (or at least CUDA C). Moving data to CUDA
would probably have worked better if I were passing arrays of ints, floats,
or chars; then I could make use of the tens of thousands of threads in CUDA
to get the length of each substring/subarray. In that second case (an array
of ints, floats, or chars), wouldn't it be a good idea to move the data to
CUDA, say for n >= 1000, where n is the input size? Even though building the
array is still O(n) on the host.
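
To make that second case concrete, here is a rough PyCUDA sketch (the
kernel and names are mine, purely for illustration). Note that building
the flat offsets array is already an O(n) pass on the host before anything
reaches the GPU, and that pass already calls len() on every sublist:

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void sublist_lengths(const int *offsets, int *lengths, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            lengths[i] = offsets[i + 1] - offsets[i];
    }
    """)
    sublist_lengths = mod.get_function("sublist_lengths")

    data = [[1, 2], [3, 4, 5], [6]]               # hypothetical input
    # Host-side preprocessing: O(n). It already computes every sublist
    # length just to build the offsets, which is exactly Eli's point.
    offsets = np.zeros(len(data) + 1, dtype=np.int32)
    offsets[1:] = np.cumsum([len(sub) for sub in data])

    lengths = np.empty(len(data), dtype=np.int32)
    sublist_lengths(drv.In(offsets), drv.Out(lengths), np.int32(len(data)),
                    block=(256, 1, 1),
                    grid=((len(data) + 255) // 256, 1))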


Best regards,

./francis


> Perhaps I wasn't clear.  Unless there's some magic interoperability
> between Python and CUDA that I don't know about, you *can't* send an
> entire python list to CUDA (and even if you could, you'd have to copy
> the entire thing over the PCI bus, which isn't super fast).  You'd
> have to pre-process each sublist first.  However, getting the length
> of a python list is O(1), and preprocessing the list is O(length of
> the sublist), so you're better off getting the length of each sublist
> in python code and sending a list of just the lengths to CUDA.
> However, since you can't just send a python list to CUDA, you *still*
> have to preprocess the list of sublist lengths (by sticking them into
> a numpy array, etc.) at which point you're iterating over the list of
> lengths in python anyway, and just tracking the one you want is going
> to be a performance win.
>
> Unless there's an aspect of the problem that I'm not picking up on, I
> really don't see how CUDA can improve the performance of this problem
> *at all*.  Put more succinctly: the boilerplate to prepare your data
> to send to CUDA is going to be more expensive than computing the
> answer in python for this problem.
>
> HTH,
> Eli
>
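
For reference, the one-pass pure-Python approach Eli describes would look
something like this (a minimal sketch with made-up input):

    data = [["a", "bc"], ["d", "e", "f"], ["gh"]]   # hypothetical input
    # len() on a list is O(1), so one pass over the outer list yields
    # every length while tracking the one you want (here, the maximum).
    longest = max(len(sub) for sub in data)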