The Python list structure already stores its length (it is incremented/decremented on appends, pops, etc.), so you'd be *re*computing a value you already have.
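A quick sketch of the point above: because the list object stores its own length, len() is constant time and its cost does not grow with the list. The list sizes and iteration counts here are arbitrary choices for illustration.

```python
import timeit

# len() reads the stored size field of the list object; it never
# walks the elements, so both of these calls cost about the same.
small = list(range(10))
large = list(range(1_000_000))

t_small = timeit.timeit(lambda: len(small), number=100_000)
t_large = timeit.timeit(lambda: len(large), number=100_000)

# The two timings are of the same order regardless of list size.
print(f"len(small): {t_small:.4f}s  len(large): {t_large:.4f}s")
```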
Yup, it does. I was thinking of using each thread to get the len() of each sub-list in parallel, so I don't have to go through the entire list sequentially to get the length of each sub-list.

I think it would be best at this point for you to implement both and profile the two implementations to compare runtimes. My suggestion would be to implement the Python-side wrangling first and time that against my <10-line algo above (I suspect that just the wrangling will be slower than my solution, let alone any call to CUDA), then add the CUDA code after that if it still seems like it will be a performance win.

Yes, more of empirical tests and then tweaking. Thanks again.

Best regards,
./francis
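A minimal profiling harness for the comparison suggested above: gathering sub-list lengths sequentially vs. with a thread pool. The `data` list of sub-lists is a made-up stand-in for the real input; since len() is trivial and the GIL serializes it, the threaded version is not expected to win.

```python
import timeit
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: a ragged list of sub-lists of varying lengths.
data = [list(range(n % 100)) for n in range(10_000)]

def sequential():
    # Walk the outer list once, reading each stored length.
    return [len(sub) for sub in data]

def threaded():
    # Farm the len() calls out to a small thread pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(len, data))

t_seq = timeit.timeit(sequential, number=10)
t_thr = timeit.timeit(threaded, number=10)
print(f"sequential: {t_seq:.4f}s  threaded: {t_thr:.4f}s")
```

Both functions return the same list of lengths, so the comparison is purely about runtime; thread-pool setup and GIL contention typically dominate here, which is the "wrangling overhead" worry voiced above.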
_______________________________________________ PyCUDA mailing list [email protected] http://lists.tiker.net/listinfo/pycuda
