Becky,

I've been looking into this problem and your suggestion. The users' apps that are causing problems are reading anywhere from 64KB per read all the way down to 1 byte per read, and they spend much of their run time reading input. Many are doing 4KB to 8KB per read. I have read the PVFS FAQ about tuning individual directories.

I confess I do not know what my options are regarding tuning of the directories to improve performance. What is the purpose of the four different types of distributions? What stripe size should I use? I realize that PVFS is not suited for reads this small, but *any* performance improvement that I can get for these particular problem jobs would be helpful.

Thanks!

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

From: Becky Ligon [mailto:[email protected]]
Sent: Monday, December 16, 2013 4:10 PM
To: Moye,Roger V
Cc: Kyle Schochenmaier; [email protected]
Subject: Re: [Pvfs2-users] how to troubleshoot performance problems

Roger:

In general, if your filesystem has x-number of servers and you have used the default 64K stripe size, then you would want to be reading or writing *at least* (64K * number of servers) bytes at a time (but preferably more) in order to take advantage of the parallelism. You also want to minimize the *number* of files that you create/delete in one job, since these operations require additional metadata accesses. These are guidelines, not rules. Looking at what kind of reads/writes and file accesses are being used is the best way to tune your filesystem for a particular purpose.

Keep in mind that directories and files can have different attributes than those specified in the config file as the defaults. So you can tune files, or files in a directory, to use a different number of servers, a different stripe size, etc.

Hope this little bit of information is helpful.

Becky

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
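With the four I/O servers described later in this thread and the default 64K stripe, that guideline works out to requests of at least 256KB per call. A minimal POSIX sketch of that access pattern, assuming 4 servers (the file path and chunk size are only placeholders):

/* Sketch: read a file in stripe-sized chunks instead of tiny requests.
 * Assumes 4 I/O servers * 64KB default stripe = 256KB per request;
 * adjust CHUNK to your own server count and stripe size. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (4 * 64 * 1024)   /* 4 servers * 64KB stripe = 256KB */

int main(void)
{
    char *buf = malloc(CHUNK);
    int fd = open("/pvfs2-mnt/somefile.dat", O_RDONLY);  /* placeholder path */
    if (fd < 0 || buf == NULL) {
        perror("setup");
        return 1;
    }

    ssize_t n;
    while ((n = read(fd, buf, CHUNK)) > 0) {
        /* process n bytes here; each read() can be served by all four servers */
    }

    free(buf);
    close(fd);
    return 0;
}

Roughly speaking, a request much smaller than one stripe unit lands on a single server, so the others sit idle for that call.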
On Mon, Dec 16, 2013 at 3:33 PM, Moye,Roger V <[email protected]> wrote:

Becky,

You nailed it:

read(7, "4", 1) = 1
read(7, "|", 1) = 1
read(7, "4", 1) = 1
read(7, "2", 1) = 1
read(7, "0", 1) = 1
read(7, "6", 1) = 1
read(7, "5", 1) = 1
read(7, "1", 1) = 1
read(7, "1", 1) = 1
read(7, "2", 1) = 1
read(7, "|", 1) = 1
read(7, "4", 1) = 1
read(7, "2", 1) = 1
read(7, "0", 1) = 1
read(7, "6", 1) = 1
read(7, "5", 1) = 1
read(7, "1", 1) = 1
read(7, "3", 1) = 1

He's doing this from multiple processes on multiple nodes.

Question for you: Is there a rule of thumb to follow for 'how small is too small'?

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

From: Becky Ligon [mailto:[email protected]]
Sent: Monday, December 16, 2013 1:25 PM
To: Kyle Schochenmaier
Cc: Moye,Roger V; [email protected]
Subject: Re: [Pvfs2-users] how to troubleshoot performance problems

Roger:

I have also seen some codes that read/write one byte at a time, which is not appropriate for a parallel filesystem.

Try this: while the user's process is running, attach to it with strace and see what kind of reads/writes are being issued.

Becky
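For a byte-at-a-time parser like the one traced above, the usual fix is application-side buffering, so that thousands of 1-byte read() calls collapse into one large request per buffer refill. A minimal sketch, assuming the code can be switched to stdio (the path and buffer size are only placeholders):

/* Sketch: wrap the input in a stdio stream with a large buffer so each
 * fgetc() is served from userspace instead of issuing a 1-byte read().
 * The 256KB buffer matches 4 servers * 64KB stripe; adjust as needed. */
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/pvfs2-mnt/input.dat", "r");   /* placeholder path */
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    /* One 256KB read() refills the buffer for the next 262144 fgetc() calls. */
    setvbuf(fp, NULL, _IOFBF, 256 * 1024);

    int c;
    long fields = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '|')            /* same '|'-delimited records as the trace */
            fields++;
    }
    printf("fields: %ld\n", fields);

    fclose(fp);
    return 0;
}

The same idea applies in any language's buffered I/O layer; the point is simply that the request size PVFS sees should be the buffer size, not one byte.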
On Mon, Dec 16, 2013 at 1:57 PM, Becky Ligon <[email protected]> wrote:

Roger:

Are all of your filesystem servers ALSO metadata servers?

Becky

On Mon, Dec 16, 2013 at 1:18 PM, Kyle Schochenmaier <[email protected]> wrote:

There are some tuning params that you can look into here. By default there is round-robin loading on the servers, and that is done in chunks of FlowBufferSize (iirc?). You can set this in your config file, but by default the size is quite small (64k). I've pushed it up to 1-2MB and seen drastic improvements in bandwidth for larger requests; if you're doing tons of small requests, though, this obviously won't help.

Can you attach your config file so we can see how things are set up?

Kyle Schochenmaier

On Mon, Dec 16, 2013 at 11:57 AM, Moye,Roger V <[email protected]> wrote:

Over the past weekend one of my users reported that his compute jobs running on a server with local disks usually take about 5 hours. However, running the same jobs on our small Linux cluster using a PVFS filesystem exceeded 24 hours.

Here is the environment we are using:

1. RHEL 6.4 on PVFS servers and clients.
2. Computations are performed on any of 16 Linux clients, all running RHEL 6.4.
3. We are running OrangeFS 2.8.7.
4. We have 4 PVFS servers, each with an XFS filesystem on a ~35TB RAID 6. The total PVFS filesystem is 146TB.
5. All components are connected via a 10GigE network.

I started looking for the source of the problem. For the user(s) showing this poor performance, I found that pvfs2-client is using about 65% of the CPU while the compute jobs themselves are using only 4% each. Thus the compute nodes are very lightly loaded and the compute jobs are hardly doing anything. The pvfs2-server process on each PVFS server is using about 140% CPU. No time is being spent in the wait state (so I assume the speed of the disks is not an issue).

While the system was exhibiting poor performance I tried to read/write some 10GB files myself and found the performance to be normal for this system (around 450MB/s). I used 'iperf' to measure the network bandwidth between the affected nodes and the PVFS servers and found it normal at 9.38Gb/s. The directories that the users are reading/writing only have a few files in each.

iostat shows that the disk system is constantly being read by something, as shown by 'iostat -d 2' on the PVFS servers:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb              19.00      4864.00         0.00       9728          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0

This is what iostat has looked like over the last 48 hours (since Saturday).
I cannot find any documentation on how to get stats directly from pvfs2, so I tried this command:

pvfs2-statfs -m /pvfs2-mnt

I received these results:

I/O server statistics:
---------------------------------------
server: tcp://dqspvfs01:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 284790784
    uptime (seconds)  : 14499577
    load averages     : 0 0 0
    handles available : 2305843009213589192
    handles total     : 2305843009213693950
    bytes available   : 31456490479616
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

server: tcp://dqspvfs02:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 217452544
    uptime (seconds)  : 14499840
    load averages     : 0 0 0
    handles available : 2305843009213589104
    handles total     : 2305843009213693950
    bytes available   : 31456971476992
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

server: tcp://dqspvfs03:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 428965888
    uptime (seconds)  : 5437269
    load averages     : 320 192 0
    handles available : 2305843009213588929
    handles total     : 2305843009213693950
    bytes available   : 31439132123136
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

server: tcp://dqspvfs04:3334
    RAM bytes total   : 33619419136
    RAM bytes free    : 223281152
    uptime (seconds)  : 10089825
    load averages     : 1664 3072 0
    handles available : 2305843009213588989
    handles total     : 2305843009213693950
    bytes available   : 31452933193728
    bytes total       : 40000112558080
    mode: serving both metadata and I/O data

Notice that the 'load averages' are 0 for servers #1 and #2 but not for #3 and #4. Earlier this morning only #4 showed a non-zero load average; the other three were 0. What does this number mean?

My two theories about the source of the problem are:

1. Someone is doing 'a lot' of tiny reads.
2. Or, based on the load averages, the PVFS filesystem is somehow not balanced and all of the load is on a single server.

How can I prove either of these? Or what other types of diagnostics can I do?

Thank you!

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------
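One way to test theory #1 from the client side is to time sequential reads of the same file at a few different request sizes on the PVFS mount and compare throughput; if tiny reads are the culprit, the small sizes should fall off sharply. A minimal sketch (the file path is only a placeholder and the sizes are just examples):

/* Sketch: time sequential reads of one file at several request sizes to
 * see how read() size affects throughput on the PVFS mount.
 * On older glibc, link with -lrt for clock_gettime. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/pvfs2-mnt/testfile";            /* placeholder */
    size_t sizes[] = { 1024, 65536, 262144, 4 * 1024 * 1024 };
    size_t nsizes = sizeof sizes / sizeof sizes[0];

    for (size_t i = 0; i < nsizes; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(sizes[i]);
        long long total = 0;
        ssize_t n;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while ((n = read(fd, buf, sizes[i])) > 0)
            total += n;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%8zu-byte reads: %.1f MB/s\n", sizes[i], total / secs / 1e6);

        free(buf);
        close(fd);
    }
    return 0;
}

Comparing the numbers against the ~450MB/s seen for the 10GB test above should make the dependence on request size obvious.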
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
