Michael Will wrote:
I had a few hours to play on a cluster with 4TB I/O nodes.
A straight dd read/write of a single large file, run locally on a software RAID0 across the two 6-drive RAID5 volumes in each I/O node, gave 345 MB/s read and 205 MB/s write throughput.

PVFS with a single client and a single server over a single gigabit Ethernet link gave 84 MB/s read and 77 MB/s write.
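
For context, here is a rough check of how close those single-link numbers come to the raw gigabit line rate. This is only a sketch: it assumes 1 Gbit/s = 125 MB/s in decimal units and ignores all protocol overhead, and the helper name link_utilization is made up for illustration.

```python
# Raw gigabit Ethernet line rate in MB/s (decimal units, no protocol overhead).
# Assumption: "MB/s" in the figures above means 10^6 bytes per second.
GIGE_MB_S = 1_000_000_000 / 8 / 1_000_000


def link_utilization(throughput_mb_s, line_rate_mb_s=GIGE_MB_S):
    """Fraction of the raw line rate that a measured throughput represents."""
    return throughput_mb_s / line_rate_mb_s


# Single client / single server figures quoted above.
for label, mb_s in (("read", 84), ("write", 77)):
    print(f"{label}: {mb_s} MB/s is {link_utilization(mb_s):.0%} of raw GigE")
```

Both figures are well below the 345/205 MB/s measured locally, which would be consistent with the single gigabit link, not the disks, being the limit in that test.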

I then set it up with 8 I/O nodes and 8 clients; the resulting PVFS2 file system was 35TB. However, during my benchmarking runs I got these errors:

pvfs2: pvfs2_get_sb -- wait timed out; aborting attempt.
pvfs2_get_sb: mount request failed with -110

/var/log/messages:Feb 9 05:59:27 10.54.1.100 n100 pvfs2-server[8620]: segfault at 0000000000000010 rip 0000003b56e6960d rsp 0000007fbffff160 error 6
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.111 .111 pvfs2_file_read: error in vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb 9 05:59:38 10.54.1.107 .107 pvfs2_file_read: error in vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb 9 05:59:38 10.54.1.111 .111 pvfs2_file_read: error in vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb 9 05:59:38 10.54.1.107 .107 pvfs2_file_read: error in vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb 9 05:59:38 10.54.1.119 .119 pvfs2_file_read: error in vectored read from handle 1048574, FILE: largefile..119.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb 9 05:59:38 10.54.1.104 .104 pvfs2_file_read: error in vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb 9 05:59:38 10.54.1.104 .104 pvfs2_file_read: error in vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb 9 05:59:38 10.54.1.103 .103 pvfs2_file_read: error in vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb 9 05:59:38 10.54.1.103 .103 pvfs2_file_read: error in vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb 9 05:59:38 10.54.1.109 .109 pvfs2_file_read: error in vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb 9 05:59:38 10.54.1.109 .109 pvfs2_file_read: error in vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb 9 06:08:18 10.54.1.118 .118 pvfs2: pvfs2_fs_umount -- wait timed out; aborting attempt.

Unfortunately I did not get to play with this any further, since these were customer systems that needed to be cleaned up and shipped. So I cannot do any additional troubleshooting or find out why pvfs2-server died with a segfault on node n100, but it could have to do with the naming scheme on the cluster (node 100 has hostname .100 and an alias n100, node 101 is .101 with alias n101, etc.).

The size of the filesystem should not have been an issue, right?
Michael

Hi Michael,

The size of the file system is fine, and the naming scheme should be OK too. Unfortunately, I don't think there is any good way to tell what happened to the server in this case. If you see this again in the future, you may need to either get a stack trace from a server core file or turn on verbose event logging in the server configuration to see what happened.
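
If it does come back, it may also help to line up the client errors against the server crash time. Below is a minimal sketch that summarizes syslog text in the format you posted. The regular expressions are written only against those lines, and the function name summarize is just an illustration, not part of PVFS2.

```python
import re
from collections import Counter

# Common prefix of the posted lines: timestamp, IP address, hostname.
PREFIX = r"(?P<time>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<ip>\d+\.\d+\.\d+\.\d+) \S+ "

# Client-side I/O errors, as they appear in the posted excerpt.
ERROR_RE = re.compile(
    PREFIX
    + r"pvfs2_file_(?P<op>read|write): error in vectored "
    + r"(?:read from|write to) handle (?P<handle>\d+)"
)
# Kernel report of the pvfs2-server segfault.
SEGFAULT_RE = re.compile(PREFIX + r"pvfs2-server\[\d+\]: segfault")


def summarize(log_text):
    """Return (segfaults, error_counts) found in syslog text.

    segfaults is a list of (time, ip) tuples; error_counts maps
    (ip, op, handle) to the number of matching error lines.
    """
    # Collapse whitespace so entries that were wrapped or run together still match.
    text = " ".join(log_text.split())
    segfaults = [(m["time"], m["ip"]) for m in SEGFAULT_RE.finditer(text)]
    errors = Counter(
        (m["ip"], m["op"], m["handle"]) for m in ERROR_RE.finditer(text)
    )
    return segfaults, errors


# A few lines copied from the excerpt in this thread.
sample = """
/var/log/messages:Feb 9 05:59:27 10.54.1.100 n100 pvfs2-server[8620]: segfault at 0000000000000010 rip 0000003b56e6960d rsp 0000007fbffff160 error 6
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in vectored write to handle 1048581, FILE: largefile..106.1
"""

segfaults, errors = summarize(sample)
for when, ip in segfaults:
    print(f"server segfault on {ip} at {when}")
for (ip, op, handle), count in sorted(errors.items()):
    print(f"{ip}: {count} {op} error(s) on handle {handle}")
```

In the excerpt you posted, every client error is stamped 05:59:38, eleven seconds after the segfault on 10.54.1.100 at 05:59:27, so the client failures look like a consequence of the server dying and not a separate problem.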

-Phil
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
