Michael Will wrote:
I had a few hours to play on a cluster with 4 TB I/O nodes.
A straight dd read/write of a single large file, done locally on a
software RAID0 across the two 6-drive RAID5 volumes in each I/O node,
gave 345 MB/s read and 205 MB/s write throughput.
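(For reference, a local dd test of that kind usually looks something
like the sketch below; the mount point and sizes are placeholders, and
the test file should be larger than RAM so the read number is not just
the page cache.)

    # write a large test file to the raid0 volume, then read it back
    dd if=/dev/zero of=/mnt/raid0/ddtest bs=1M count=16384
    sync
    dd if=/mnt/raid0/ddtest of=/dev/null bs=1M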
PVFS2 with a single client and a single server over a single gigabit
ethernet link came in at 84 MB/s read and 77 MB/s write.
I then set it up with 8 I/O nodes and 8 clients; the resulting PVFS2
filesystem was 35 TB. However, during my benchmarking runs I got these
errors:
pvfs2: pvfs2_get_sb -- wait timed out; aborting attempt.
pvfs2_get_sb: mount request failed with -110
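(Error -110 is -ETIMEDOUT, i.e. the kernel module's mount request timed
out before it got an answer. The mount being attempted is of the usual
PVFS2 form, sketched below; the hostname, port, and fs name are
placeholders rather than the actual values from this cluster.)

    # typical PVFS2 kernel mount; ionode1, 3334 and pvfs2-fs are examples
    mount -t pvfs2 tcp://ionode1:3334/pvfs2-fs /mnt/pvfs2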
/var/log/messages:Feb 9 05:59:27 10.54.1.100 n100 pvfs2-server[8620]:
segfault at 0000000000000010 rip 0000003b56e6960d rsp 0000007fbffff160
error 6
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read:
error in vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read:
error in vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.111 .111 pvfs2_file_read:
error in vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb 9 05:59:38 10.54.1.107 .107 pvfs2_file_read:
error in vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb 9 05:59:38 10.54.1.111 .111 pvfs2_file_read:
error in vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb 9 05:59:38 10.54.1.107 .107 pvfs2_file_read:
error in vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb 9 05:59:38 10.54.1.119 .119 pvfs2_file_read:
error in vectored read from handle 1048574, FILE: largefile..119.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write:
error in vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb 9 05:59:38 10.54.1.104 .104 pvfs2_file_read:
error in vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb 9 05:59:38 10.54.1.104 .104 pvfs2_file_read:
error in vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb 9 05:59:38 10.54.1.103 .103 pvfs2_file_read:
error in vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb 9 05:59:38 10.54.1.103 .103 pvfs2_file_read:
error in vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb 9 05:59:38 10.54.1.109 .109 pvfs2_file_read:
error in vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb 9 05:59:38 10.54.1.109 .109 pvfs2_file_read:
error in vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write:
error in vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb 9 06:08:18 10.54.1.118 .118 pvfs2:
pvfs2_fs_umount -- wait timed out; aborting attempt.
Unfortunately I did not get to play with this any more, since these
were customers' systems that needed to be cleaned up and shipped, so I
cannot do any additional troubleshooting or find out why pvfs2-server
died with a segfault on node n100. It could have to do with the naming
scheme on the cluster (node 100 has the hostname .100 and an alias
n100, n101 is .101, etc.).
The size of the filesystem should not have been an issue, right?
Michael
Hi Michael,
The size of the file system is fine, and the naming scheme should be OK
too. Unfortunately I don't think there is much of a way to tell what
happened to the server in this case. If you see this again in the
future, you may need to either get a stack trace from a server core
file or turn on verbose event logging in the server configuration to
see what happened.
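(A rough sketch of both approaches; the binary and config file paths,
and the exact EventLogging mask keywords, are assumptions that may
differ on your installation.)

    # 1) allow pvfs2-server to dump core, then pull a backtrace from it
    ulimit -c unlimited
    /usr/sbin/pvfs2-server /etc/pvfs2/fs.conf /etc/pvfs2/server.conf-n100  # example paths
    # ... after a crash ...
    gdb /usr/sbin/pvfs2-server core
    (gdb) bt

    # 2) raise the logging level in the fs config (Defaults section),
    #    e.g. replace "EventLogging none" with a more verbose mask:
    # EventLogging all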
-Phil
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users