Hi
First, thank you for the awesome work; PyCUDA is such a
nice way of working with the GPU!
As part of a neural field integrator, I am trying to use
the PacketedSpMV:

import scipy.io
from numpy import float32
from pycuda.sparse.packeted import PacketedSpMV

spm_cpu = scipy.io.mmread('add20.mtx').tocsr().astype(float32)
spm_gpu = PacketedSpMV(spm_cpu, False, spm_cpu.dtype)
This works well for matrices up to about 200k elements, but
with much larger matrices (here, one with 480k elements) I
receive the following traceback:
/usr/lib64/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/sparse/packeted.pyc in __init__(self, mat, is_symmetric, dtype)
    122         while True:
    123             cut_count, dof_to_packet_nr = part_graph(int(self.block_count),
--> 124                     xadj=adj_mat.indptr, adjncy=adj_mat.indices)
    125
    126             # build packet_nr_to_dofs

/usr/lib/python2.7/site-packages/PyMetis-2011.1.1-py2.7-linux-x86_64.egg/pymetis/__init__.pyc in part_graph(nparts, adjacency, xadj, adjncy, vweights, eweights, recursive)
     77         return 0, [0] * (len(xadj)-1)
     78
---> 79     return part_graph(nparts, xadj, adjncy, vweights, eweights, recursive)

TypeError: No registered converter was able to produce a C++ rvalue of
type int from this Python object of type numpy.int32
Any ideas? I am completely unfamiliar with the term 'registered
converter', or I would have looked into the code myself.
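For what it's worth, my (possibly wrong) reading is that the Boost.Python
wrapper wants native C++/Python ints, while the CSR index arrays that scipy
produces hold numpy.int32 values, which are a distinct type. A minimal
sketch of that distinction, and the workaround I was thinking of trying
(converting to native Python ints before the call; I have not verified
this against pymetis itself):

```python
import numpy as np

# scipy's CSR index arrays (indptr/indices) hold numpy integer
# scalars, not native Python ints
indptr = np.array([0, 2, 4], dtype=np.int32)

# numpy.int32 is not a subclass of Python's int, so a wrapper that
# only registered a converter for int may reject it
assert not isinstance(indptr[0], int)

# Workaround sketch: convert to native Python types before the call
xadj = [int(i) for i in indptr]   # equivalently: indptr.tolist()
assert all(type(i) is int for i in xadj)
```

If that is indeed the issue, a one-line cast inside packeted.py (or a
converter registration on the pymetis side) might be enough to fix it.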
Another question: does this sparse matrix-vector multiply give
performance that is close to optimal? With the small matrices I
can benchmark so far, it's up to 2x faster than the CPU (Quadro 600
vs. a 3.2 GHz Xeon with 6 MB cache), but I would expect larger
matrices to do much better.
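For context, the CPU baseline I'm comparing against is just scipy's CSR
matvec, timed roughly as below (a random matrix stands in for add20.mtx
here, and the size and density are illustrative, not the real ones):

```python
import time
import numpy as np
import scipy.sparse as sp

# Illustrative stand-in for the real matrix: random CSR, float32
n = 20000
mat = sp.random(n, n, density=1e-3, format='csr',
                dtype=np.float32, random_state=0)
vec = np.ones(n, dtype=np.float32)

# Average the cost of repeated matvecs
reps = 50
start = time.perf_counter()
for _ in range(reps):
    out = mat @ vec
elapsed = (time.perf_counter() - start) / reps
print("avg CPU matvec time: %.3g s" % elapsed)
```

The GPU side is timed the same way around PacketedSpMV's multiply, with a
synchronize before stopping the clock.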
Thanks,
Marmaduke
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda