Hi Eli,

Thanks for your very nice points. I certainly agree that because of the input (a list of sub-lists of strings), it's a bit more complicated to get it into a format understood by CUDA (or CUDA C at least). Moving data to CUDA would probably have worked better if I were passing arrays of ints/floats or strings; then I could make use of the tens of thousands of threads in CUDA to get the length of each substring/subarray. In this second case (an array of ints, floats, or chars), wouldn't it be a good idea to move the data to CUDA (for perhaps n >= 1000, where n is the input size), even though it is still O(n) on the host?
Best regards,
./francis

> Perhaps I wasn't clear. Unless there's some magic interoperability
> between Python and CUDA that I don't know about, you *can't* send an
> entire python list to CUDA (and even if you could, you'd have to copy
> the entire thing over the PCI bus, which isn't super fast). You'd
> have to pre-process each sublist first. However, getting the length
> of a python list is O(1), and preprocessing the list is O(length of
> the sublist), so you're better off getting the length of each sublist
> in python code and sending a list of just the lengths to CUDA.
> However, since you can't just send a python list to CUDA, you *still*
> have to preprocess the list of sublist lengths (by sticking them into
> a numpy array, etc.) at which point you're iterating over the list of
> lengths in python anyway, and just tracking the one you want is going
> to be a performance win.
>
> Unless there's an aspect of the problem that I'm not picking up on, I
> really don't see how CUDA can improve the performance of this problem
> *at all*. Put more succinctly: the boilerplate to prepare your data
> to send to CUDA is going to be more expensive than computing the
> answer in python for this problem.
>
> HTH,
> Eli
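
For reference, the point quoted above can be sketched in plain Python. This is only an illustration under assumptions: the thread does not say which sublist is "the one you want", so the sketch assumes it is the longest one, and the names `longest_sublist` and `pack_lengths` are hypothetical. The second function uses the standard-library `array` module as a stand-in for the numpy array mentioned above; it shows that building a flat buffer for a device copy already walks every length once.

```python
from array import array


def longest_sublist(sublists):
    """Return (index, length) of the longest sublist in a single pass.

    len() on a Python list is O(1), so the loop is O(number of sublists).
    Returns (-1, -1) for an empty input. "Longest" is an assumption here.
    """
    best_index, best_length = -1, -1
    for i, sub in enumerate(sublists):
        n = len(sub)
        if n > best_length:
            best_index, best_length = i, n
    return best_index, best_length


def pack_lengths(sublists):
    """Build the flat int buffer that would be needed before any copy to a GPU.

    This already iterates over every sublist, i.e. the same work as
    longest_sublist(), before any kernel could run.
    """
    return array("i", (len(sub) for sub in sublists))


data = [["a", "bb"], ["ccc"], ["d", "e", "f", "g"], []]
index, length = longest_sublist(data)
lengths = pack_lengths(data)
```

Both functions make one pass over the outer list, so the packing step alone costs about as much as computing the answer directly on the host.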
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda