I had a few hours to play on a cluster with 4TB I/O nodes.
Straight dd read/write of a single large file, locally on a software RAID0 across the two 6-drive RAID5 volumes in each I/O node, gave 345 MB/s read and 205 MB/s write throughput.
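For reference, that local baseline is just plain dd against the md device; a minimal sketch of this kind of test, with placeholder device names, file sizes and an assumed ext3 filesystem:

  # stripe the two existing 6-drive RAID5 volumes into a single RAID0 (device names are placeholders)
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1
  mkfs.ext3 /dev/md2 && mount /dev/md2 /data

  # sequential write of one large file, then read it back
  dd if=/dev/zero of=/data/bigfile bs=1M count=16384 conv=fsync
  echo 3 > /proc/sys/vm/drop_caches   # avoid reading back from the page cache
  dd if=/data/bigfile of=/dev/null bs=1M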
PVFS2 with a single client and a single server over a single gigabit Ethernet link gave 84 MB/s read and 77 MB/s write.
I then set it up with 8 I/O nodes and 8 clients; the resulting PVFS2 filesystem was 35 TB. However, during my benchmarking runs I got these errors:
pvfs2: pvfs2_get_sb -- wait timed out; aborting attempt.
pvfs2_get_sb: mount request failed with -110
/var/log/messages:Feb 9 05:59:27 10.54.1.100 n100 pvfs2-server[8620]: segfault
at 0000000000000010 rip 0000003b56e6960d rsp 0000007fbffff160 error 6
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in
vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in
vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb 9 05:59:38 10.54.1.111 .111 pvfs2_file_read: error in
vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb 9 05:59:38 10.54.1.107 .107 pvfs2_file_read: error in
vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb 9 05:59:38 10.54.1.111 .111 pvfs2_file_read: error in
vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb 9 05:59:38 10.54.1.107 .107 pvfs2_file_read: error in
vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb 9 05:59:38 10.54.1.119 .119 pvfs2_file_read: error in
vectored read from handle 1048574, FILE: largefile..119.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in
vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb 9 05:59:38 10.54.1.104 .104 pvfs2_file_read: error in
vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb 9 05:59:38 10.54.1.104 .104 pvfs2_file_read: error in
vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb 9 05:59:38 10.54.1.103 .103 pvfs2_file_read: error in
vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb 9 05:59:38 10.54.1.103 .103 pvfs2_file_read: error in
vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb 9 05:59:38 10.54.1.109 .109 pvfs2_file_read: error in
vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb 9 05:59:38 10.54.1.109 .109 pvfs2_file_read: error in
vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb 9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in
vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb 9 06:08:18 10.54.1.118 .118 pvfs2: pvfs2_fs_umount --
wait timed out; aborting attempt.
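For context, error -110 is ETIMEDOUT: the kernel client gave up waiting for a server's response to the get_sb (mount-time superblock) request. A mount of this kind goes through the pvfs2 kernel module plus the pvfs2-client helper, and the quickstart-style invocation is roughly the following (host name, port and storage name are placeholders here, and the exact syntax can differ between PVFS2 releases):

  mount -t pvfs2 tcp://n100:3334/pvfs2-fs /mnt/pvfs2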
Unfortunately I did not get to play with this anymore, since these were customer systems that needed to be cleaned up and shipped, so I cannot do any additional troubleshooting or find out why pvfs2-server died with a segfault on node n100. It could have to do with the naming scheme on the cluster (node 100 has hostname .100 and an alias n100, n101 is .101, etc.).
The size of the filesystem should not have been an issue, right?
Michael
Michael Will wrote:
Depending on what you are trying to do, this might or might not be the
right filesystem for you.
I have only tested PVFS2 in its default configuration with no fine-tuning, but so far I see PVFS2's strengths as:
1. bandwidth scaling: you get more I/O bandwidth with additional I/O nodes
2. parallelism: multiple clients reading at the same time
3. write speed over read speed: aggregate write speed scales much better than read speed
If you have only one client (say a video player or video editor) running at a time, and not enough I/O nodes to make up for the overhead of splitting the data across servers, then you might be better off running just an NFS server on a single beefy node and putting all the disks in there in a RAID10 or RAID0.
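A minimal sketch of that single-server alternative, assuming Linux md RAID and the standard NFS userland (device names, mount points and export options are placeholders):

  # on the storage node: build the array, make a filesystem, export it
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
  mkfs.ext3 /dev/md0 && mount /dev/md0 /export
  echo '/export *(rw,sync,no_root_squash)' >> /etc/exports
  exportfs -ra

  # on the single client
  mount -t nfs storagenode:/export /mnt/data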
If you plan to support multiple clients, or if you can add enough I/O nodes, then PVFS2 is very capable.
One thing to try would be to decouple the application from the I/O servers: run your application on a machine that is not also a data server, so your video/audio muxing is not competing for cycles with the data-serving processes. If two I/O servers are enough, try having only two I/O nodes and one dedicated client instead of making all three machines I/O nodes.
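Concretely, the muxing machine would then run only the PVFS2 client side (kernel module plus mount), not a pvfs2-server, and its tab-file entry would just point at one of the remaining I/O nodes. A quickstart-style fstab line, using a host name from your pvfs2-ping output (treat the exact fields as an assumption on my part):

  tcp://mmulti:3334/pvfs2-fs /mnt/pvfs2 pvfs2 defaults,noauto 0 0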
I ran some benchmarks on a small cluster with 6 clients and 4 I/O nodes, each of which had only a single SATA disk, and compared it against the NFS server running on the head node of the cluster. That NFS server was pretty slow, yet I found that a single client performed better on reads against the single NFS server. With four or six clients the NFS server caved in badly, whereas PVFS2 gave me a nice 280 MB/s aggregate write bandwidth. Aggregate read was still only 45 MB/s.
I hope to run more tests on a much larger cluster with tons of storage this week (>100 I/O nodes with 4.6 TB each; Relion 2612 2U servers with 12 SATA drives).
Michael Will
belcampo wrote:
Hi all,
New to PVFS and related stuff, so try to be kind with me ;-)
I installed according to the pvfs2-quickstart guide.
pvfs2-ping -m /mnt/pvfs2
(1) Parsing tab file...
(2) Initializing system interface...
(3) Initializing each file system found in tab file: /etc/fstab...
PVFS2 servers: tcp://server:3334
Storage name: pvfs2-fs
Local mount point: /mnt/pvfs2
/mnt/pvfs2: Ok
(4) Searching for /mnt/pvfs2 in pvfstab...
PVFS2 servers: tcp://server:3334
Storage name: pvfs2-fs
Local mount point: /mnt/pvfs2
meta servers:
tcp://mmulti:3334
data servers:
tcp://mmulti:3334
tcp://mm1:3334
tcp://server:3334
(5) Verifying that all servers are responding...
meta servers:
tcp://mmulti:3334 Ok
data servers:
tcp://mmulti:3334 Ok
tcp://mm1:3334 Ok
tcp://server:3334 Ok
(6) Verifying that fsid 533592664 is acceptable to all servers...
Ok; all servers understand fs_id 533592664
(7) Verifying that root handle is owned by one server...
Root handle: 1048576
Ok; root handle is owned by exactly one server.
=============================================================
The PVFS2 filesystem at /mnt/pvfs2 appears to be correctly configured.
Copying files to /mnt/pvfs2 is limited by the network, so that is OK.
I did a high I/O-demand muxing of audio/video, first locally and then on /mnt/pvfs2, both from the same machine, which is one of the data servers.
Local
Saving to timetest.mp4: 0.500 secs Interleaving
7.58user 19.71system 1:52.26elapsed 24%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (77major+6054minor)pagefaults 0swaps
on /mnt/pvfs2
Saving to timetest.mp4: 0.500 secs Interleaving
37.56user 61.05system 41:54.96elapsed 3%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (68major+6063minor)pagefaults 0swaps
User load went up about 5x, system load more than 20x, and the time needed more than 20x.
What could be the reason it behaves like this?
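A simple way to separate per-request latency from raw bandwidth (just a sketch; the file name and sizes are placeholders) is to compare large-block and small-block dd writes to the PVFS2 mount, since a muxer issues many small interleaved reads and writes and each of them costs a network round trip:

  dd if=/dev/zero of=/mnt/pvfs2/ddtest bs=1M count=1000     # large blocks: mostly bandwidth-bound
  dd if=/dev/zero of=/mnt/pvfs2/ddtest bs=4k count=100000   # small blocks: dominated by per-request latency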
Regards
Henk Schoneveld
Additional info:
PVFS version 2.7.0
kernel 2.6.22.9-desktop586-1mdv
x86-32, TCP/IP, Realtek 8139too NICs on all nodes
no MPI or MPI-IO
The logs only show:
Client
D 15:48:13.061859] [INFO]: Mapping pointer 0xb6769000 for I/O
Server
D 02/04 15:47] PVFS2 Server version 2.7.0 starting.
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users