Hi

First, thank you for the awesome work; PyCUDA is such a 
nice way of working with the GPU!

As part of a neural field integrator, I am trying to use 
the PacketedSpMV:


import scipy.io
from numpy import float32

from pycuda.sparse.packeted import PacketedSpMV

spm_cpu = scipy.io.mmread('add20.mtx').tocsr().astype(float32)
spm_gpu = PacketedSpMV(spm_cpu, False, spm_cpu.dtype)


which works well for matrices of up to about 200k elements, 
but when I try much larger matrices, here one with 480k 
elements, I receive the following traceback:


/usr/lib64/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/sparse/packeted.pyc
in __init__(self, mat, is_symmetric, dtype)
    122         while True:
    123             cut_count, dof_to_packet_nr = part_graph(int(self.block_count),
--> 124                     xadj=adj_mat.indptr, adjncy=adj_mat.indices)
    125 
    126             # build packet_nr_to_dofs


/usr/lib/python2.7/site-packages/PyMetis-2011.1.1-py2.7-linux-x86_64.egg/pymetis/__init__.pyc
in part_graph(nparts, adjacency, xadj, adjncy, vweights, eweights, recursive)
     77         return 0, [0] * (len(xadj)-1)
     78 
---> 79     return part_graph(nparts, xadj, adjncy, vweights, eweights, recursive)

TypeError: No registered converter was able to produce a C++ rvalue of
type int from this Python object of type numpy.int32


Any ideas? I am completely unfamiliar with the term 'registered 
converter', or I would have looked into the code myself.
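If I read the message right, the converter is rejecting a numpy integer scalar where a plain Python int is expected. A minimal illustration of the type mismatch (numpy only, no pymetis needed; the array here is a hypothetical stand-in for the CSR index data):

```python
import numpy as np

# scipy's CSR index arrays hold numpy integer scalars,
# not builtin Python ints:
indptr = np.array([0, 2, 3], dtype=np.int32)
assert not isinstance(indptr[0], int)

# an explicit int() cast yields the builtin type that a
# C++ 'int' converter should accept:
assert type(int(indptr[0])) is int
```

Whether casting the arrays before the part_graph call actually avoids the error, I don't know; it is just where I would start looking.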

Another question: does this sparse matrix-vector multiply give
performance close to optimal? With the small matrices I can
benchmark so far, it is up to 2x faster than the CPU (Quadro 600
vs. a 3.2 GHz Xeon with 6 MB cache), but I would expect larger
matrices to be much faster.
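For reference, this is roughly how I time the CPU side of the comparison (a sketch only: a random matrix stands in for add20.mtx, and the size and iteration count are arbitrary):

```python
import time
import numpy as np
import scipy.sparse as sp

# Hypothetical CPU baseline: time a plain scipy CSR
# matrix-vector product.
n = 20000
a = sp.random(n, n, density=1e-3, format='csr', dtype=np.float32)
x = np.ones(n, dtype=np.float32)

t0 = time.time()
for _ in range(100):
    y = a @ x
elapsed = (time.time() - t0) / 100
print("avg CSR matvec time: %.2e s" % elapsed)
```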

Thanks,
Marmaduke

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
