Hello, I'm trying to evaluate the performance of my PVFS installation over an InfiniBand network, but from time to time a server crashes with this trace in the log:

[E 03/26 21:58] Error: encourage_recv_incoming: mop_id 12952a0 in RTS_DONE message not found.
[E 03/26 21:58] [bt] /usr/sbin/pvfs2-server(error+0xca) [0x46f55a]
[E 03/26 21:58] [bt] /usr/sbin/pvfs2-server [0x46c88c]
[E 03/26 21:58] [bt] /usr/sbin/pvfs2-server [0x46e485]
[E 03/26 21:58] [bt] /usr/sbin/pvfs2-server(BMI_testunexpected+0x384) [0x421004]
[E 03/26 21:58] [bt] /usr/sbin/pvfs2-server [0x41cf4a]
[E 03/26 21:58] [bt] /lib/libpthread.so.0 [0x7f6422ff0fc7]
[E 03/26 21:58] [bt] /lib/libc.so.6(clone+0x6d) [0x7f642295164d]

I've seen in the mailing list archives that other users have reported this kind of error, but I couldn't find an answer that solves it. Any idea how to fix this?

In case it helps: I'm running 16 PVFS servers (each one acting as both I/O server and metadata server), and I'm benchmarking with the IOR program. For now, I have 648 processes each writing 8 MB to a shared file, with a transfer size equal to the strip size (64 KB).

Thank you,
Matthieu

--
Matthieu Dorier
ENS Cachan, Brittany (Computer Science dpt.)
IRISA Rennes, Office E324
http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/
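
P.S. To give a sense of scale, here is a rough back-of-the-envelope sketch of the workload described above. It assumes binary units (1 MB = 1024 * 1024 bytes) and a plain round-robin distribution of strips across all 16 servers; both are assumptions for illustration, not something taken from the server configuration, and the variable names are made up.

```python
# Workload figures quoted in the message; units are assumed to be binary.
processes = 648
bytes_per_process = 8 * 1024 * 1024   # 8 MB written by each process
transfer_size = 64 * 1024             # 64 KB, equal to the strip size
servers = 16

# Number of 64 KB transfers each process issues to write its 8 MB.
transfers_per_process = bytes_per_process // transfer_size

# Totals for the whole shared file.
total_bytes = processes * bytes_per_process
total_transfers = processes * transfers_per_process

# Assumption: strips are spread evenly (round-robin) over all servers.
strips_per_server = total_transfers // servers

print("transfers per process:", transfers_per_process)
print("shared file size (MB):", total_bytes // (1024 * 1024))
print("total transfers:", total_transfers)
print("strips per server (assumed round-robin):", strips_per_server)
```

So each run issues a large number of small writes, all arriving at the servers at roughly the same time, which may be relevant to when the crash shows up.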
