I had a few hours to play on a cluster with 4TB I/O nodes.
A straight dd read/write of a single large file, run locally against the software raid0 spanning the two 6-drive raid5 volumes in each I/O node, gave 345 MB/s read and 205 MB/s write throughput.

PVFS2 with a single client and a single server over a single gigabit ethernet link came in at 84 MB/s read and 77 MB/s write.
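For anyone who wants to reproduce numbers like these, plain sequential dd runs are enough, along these lines (paths, file size and block size are illustrative, not the exact commands used here):

   # local raw throughput on the software raid0
   dd if=/dev/zero of=/mnt/raid0/testfile bs=1M count=16384
   echo 3 > /proc/sys/vm/drop_caches   # so the read comes from disk, not page cache
   dd if=/mnt/raid0/testfile of=/dev/null bs=1M

   # the same test through the PVFS2 mount from a single client
   dd if=/dev/zero of=/mnt/pvfs2/testfile bs=1M count=16384
   dd if=/mnt/pvfs2/testfile of=/dev/null bs=1M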

Now I set it up with 8 I/O nodes and 8 clients; the resulting PVFS2 filesystem was 35TB. However, during my benchmarking runs I got these errors:

pvfs2: pvfs2_get_sb -- wait timed out; aborting attempt.
pvfs2_get_sb: mount request failed with -110
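(For reference, error -110 is -ETIMEDOUT on Linux, which matches the "wait timed out" message above: the client gave up waiting for the mount to complete.)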

/var/log/messages:Feb  9 05:59:27 10.54.1.100 n100 pvfs2-server[8620]: segfault 
at 0000000000000010 rip 0000003b56e6960d rsp 0000007fbffff160 error 6
/var/log/messages:Feb  9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in 
vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb  9 05:59:38 10.54.1.117 .117 pvfs2_file_read: error in 
vectored read from handle 1048571, FILE: largefile..117.1
/var/log/messages:Feb  9 05:59:38 10.54.1.111 .111 pvfs2_file_read: error in 
vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb  9 05:59:38 10.54.1.107 .107 pvfs2_file_read: error in 
vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb  9 05:59:38 10.54.1.111 .111 pvfs2_file_read: error in 
vectored read from handle 1048570, FILE: largefile..111.1
/var/log/messages:Feb  9 05:59:38 10.54.1.107 .107 pvfs2_file_read: error in 
vectored read from handle 1048579, FILE: largefile..107.1
/var/log/messages:Feb  9 05:59:38 10.54.1.119 .119 pvfs2_file_read: error in 
vectored read from handle 1048574, FILE: largefile..119.1
/var/log/messages:Feb  9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in 
vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb  9 05:59:38 10.54.1.104 .104 pvfs2_file_read: error in 
vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb  9 05:59:38 10.54.1.104 .104 pvfs2_file_read: error in 
vectored read from handle 1048580, FILE: largefile..104.1
/var/log/messages:Feb  9 05:59:38 10.54.1.103 .103 pvfs2_file_read: error in 
vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb  9 05:59:38 10.54.1.103 .103 pvfs2_file_read: error in 
vectored read from handle 1048573, FILE: largefile..103.1
/var/log/messages:Feb  9 05:59:38 10.54.1.109 .109 pvfs2_file_read: error in 
vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb  9 05:59:38 10.54.1.109 .109 pvfs2_file_read: error in 
vectored read from handle 1048572, FILE: largefile..109.1
/var/log/messages:Feb  9 05:59:38 10.54.1.106 .106 pvfs2_file_write: error in 
vectored write to handle 1048581, FILE: largefile..106.1
/var/log/messages:Feb  9 06:08:18 10.54.1.118 .118 pvfs2: pvfs2_fs_umount -- 
wait timed out; aborting attempt.

Unfortunately I did not get to play with this any further, since these were customer systems that needed to be cleaned up and shipped, so I cannot do any additional troubleshooting or find out why pvfs2-server died with a segfault on node n100. It could have to do with the naming scheme on the cluster (node 100 has hostname .100 and an alias n100, n101 is .101, etc.).
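(To make that naming scheme concrete, the hosts file would map each node roughly like this; reconstructed from the log lines above, not copied from the machines:

   10.54.1.100   .100   n100
   10.54.1.101   .101   n101
   10.54.1.102   .102   n102
   ...

A hostname with a leading dot is unusual, and it is conceivable the server tripped over it somewhere.)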

The size of the filesystem should not have been an issue, right?
Michael

Michael Will wrote:
Depending on what you are trying to do, this might or might not be the right filesystem for you.

I have only tested pvfs2 in its default configuration with no fine-tuning, but so far I see pvfs2's strengths as:
1. bandwidth scaling: more i/o bandwidth with additional i/o nodes
2. parallelism: multiple clients reading at the same time
3. write speed over read speed: aggregate write speed scales much better than read speed

If you have only one client running at a time (say a video player or video editor), and not enough i/o nodes to make up for the overhead of splitting the data across servers, then you might be better off running just an NFS server on a single beefy node and putting all the disks in there in a raid10 or raid0.

If you plan to support multiple clients, or if you can add enough i/o nodes, then pvfs2 is very capable.

One thing to try is to decouple the application from the i/o generation: run your application on a machine that is not also a data server, so that your video/audio mixing is not competing for cycles with the data-serving nodes.

With only three machines, try making two of them i/o nodes and the third a dedicated client, instead of all three being i/o nodes.

I ran some benchmarks on a small cluster with 6 clients and 4 i/o nodes, each of which had only a single sata disk, and compared it against the nfs-server running on the headnode of the cluster. That nfs-server was pretty slow; even so, a single client performed better on reads against the single NFS server. With four or six clients the NFS server caved in badly, though, whereas PVFS2 would still give me a nice 280MB/s aggregate write bandwidth. Read was still only 45MB/s aggregate.
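(A simple way to get aggregate numbers like these is to start the same dd on every client at once and add up the per-client rates; pdsh, the host names and the sizes below are illustrative:

   # from the headnode: same sequential write on all six clients simultaneously
   pdsh -w client[1-6] 'dd if=/dev/zero of=/mnt/pvfs2/out.$(hostname) bs=1M count=4096 2>&1 | tail -1'
   # each dd prints its own transfer rate at the end; the aggregate is the sum across clients)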

I hope to run more tests on a much larger cluster with tons of storage this week (>100 i/o nodes with 4.6TB each; Relion 2612 2U servers with 12 sata drives).

Michael Will

belcampo wrote:
Hi all,

New to pvfs and related stuff, so try to be kind to me ;-)
I installed according to the pvfs2-quickstart guide.
pvfs2-ping -m /mnt/pvfs2

(1) Parsing tab file...

(2) Initializing system interface...

(3) Initializing each file system found in tab file: /etc/fstab...

   PVFS2 servers: tcp://server:3334
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2
   /mnt/pvfs2: Ok

(4) Searching for /mnt/pvfs2 in pvfstab...

   PVFS2 servers: tcp://server:3334
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2

   meta servers:
   tcp://mmulti:3334

   data servers:
   tcp://mmulti:3334
   tcp://mm1:3334
   tcp://server:3334

(5) Verifying that all servers are responding...

   meta servers:
   tcp://mmulti:3334 Ok

   data servers:
   tcp://mmulti:3334 Ok
   tcp://mm1:3334 Ok
   tcp://server:3334 Ok

(6) Verifying that fsid 533592664 is acceptable to all servers...

   Ok; all servers understand fs_id 533592664

(7) Verifying that root handle is owned by one server...

   Root handle: 1048576
     Ok; root handle is owned by exactly one server.

=============================================================

The PVFS2 filesystem at /mnt/pvfs2 appears to be correctly configured.
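(For reference, the /etc/fstab entry behind a setup like this, following the quickstart, is a single pvfs2 line; reconstructed from the pvfs2-ping output above, not copied from the machine:

   tcp://server:3334/pvfs2-fs  /mnt/pvfs2  pvfs2  defaults,noauto  0  0)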

Copying files to /mnt/pvfs2 is limited by the network, so that is OK.

I did a highly I/O-demanding muxing of audio/video, first locally and then on /mnt/pvfs2, both from the same machine, which is one of the data servers.
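(The muxer is not named here; the "Saving to ... Interleaving" line below looks like GPAC's MP4Box, so the two runs were presumably along these lines, with illustrative input names:

   cd /tmp        && time MP4Box -add clip.h264 -add clip.aac timetest.mp4
   cd /mnt/pvfs2  && time MP4Box -add clip.h264 -add clip.aac timetest.mp4)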

Local

Saving to timetest.mp4: 0.500 secs Interleaving
7.58user 19.71system 1:52.26elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (77major+6054minor)pagefaults 0swaps

on /mnt/pvfs2

Saving to timetest.mp4: 0.500 secs Interleaving
37.56user 61.05system 41:54.96elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (68major+6063minor)pagefaults 0swaps

Comparing the two runs: user time went up about 5x, system time about 3x, and elapsed time more than 20x (CPU utilization dropped from 24% to 3%).

What could be the reason it behaves like this?

Regards

Henk Schoneveld

Additional info:
PVFS2 version 2.7.0
kernel 2.6.22.9-desktop586-1mdv
x86-32, tcp/ip, realtek 8139too NICs on all machines
no MPI or MPI-IO
The logs only show:

Client:
D 15:48:13.061859] [INFO]: Mapping pointer 0xb6769000 for I/O

Server:
D 02/04 15:47] PVFS2 Server version 2.7.0 starting.

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
