On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
> I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2 PVFS
> nodes, 28 compute nodes of 24 cores each, everything connected
> through InfiniBand but using an IP stack on top of it, so the
> protocol for PVFS is TCP), and I witness some strange performance
> behaviors with IOR (using ROMIO compiled against PVFS, no kernel
> support):
>
> IOR is started on 336 processes (14 nodes), writing 4MB/process in a
> single shared file using MPI-I/O (4MB transfer size also). It
> completes 100 iterations.

OK, so you have one PVFS client per core, and all of them are talking
to just two servers.

> First, every time I start an instance of IOR, the first I/O operation
> is extremely slow. I'm guessing this is because ROMIO has to
> initialize everything, get the list of PVFS servers, etc. Is there a
> way to speed this up?

ROMIO isn't doing a whole lot here, but there is one thing different
about ROMIO's first call vs. the Nth call.  On the first call (the
first time any PVFS2 file is opened or deleted), ROMIO calls
PVFS_util_init_defaults().  If you have 336 clients banging away on
just two servers, I bet that could explain some of the slowness.  In
the old days the PVFS server had to service these requests one at a
time, and as far as I know that restriction has not been relaxed.
Since it is a read-only operation, though, it sure seems like one
could just have the servers shovel out PVFS2 configuration
information as fast as possible.
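To make the shape of that first-touch cost concrete, here is a rough
sketch (not the actual ROMIO code, and the header names may differ
between PVFS2/OrangeFS installs) of the one-time setup every process
ends up performing before its first PVFS2 operation:

    /* sketch only: what each of the 336 MPI processes effectively does
     * the first time it touches a PVFS2 file through ROMIO */
    #include <stdio.h>
    #include <pvfs2.h>
    #include <pvfs2-util.h>   /* PVFS_util_init_defaults(); header layout may vary */

    static int first_touch_init(void)
    {
        /* parses the pvfs2tab entries and pulls the file system
         * configuration from a server; with hundreds of clients
         * starting at once, these requests pile up on two servers */
        int ret = PVFS_util_init_defaults();
        if (ret < 0) {
            PVFS_perror("PVFS_util_init_defaults", ret);
            return ret;
        }
        return 0;
    }

Every process pays this once per run, which would explain why only
the first I/O operation of each IOR instance looks so slow.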
> Then, I set some delay between each iteration, to better reflect the
> behavior of an actual scientific application.

Fun!  This is kind of like what MADNESS does: it "computes" by
sleeping for a bit.

I think Phil's questions will help us understand the highly variable
performance.

Can you experiment with IOR's collective I/O?  By default, collective
I/O will select one client per node as an "I/O aggregator".  The IOR
workload will not benefit from ROMIO's two-phase optimization, but
you've got 336 clients banging away on two servers, so funneling the
writes through one aggregator per node is worth trying.  When I last
studied PVFS scalability, 100x more clients than servers wasn't a big
deal, but 5-6 years ago nodes did not have 24-way parallelism.
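If you want to try the same thing outside of IOR, here is a minimal
sketch of a collective shared-file write with the aggregation hints
spelled out explicitly.  The file name, hint values, and sizes are
placeholders for your setup, not anything ROMIO requires:

    /* sketch: 4 MB per process into one shared file, collectively.
     * With collective buffering forced on, only the "cb_nodes"
     * aggregators (one per node here) talk to the two PVFS servers. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Info info;
        int rank;
        const int xfer = 4 * 1024 * 1024;          /* 4 MB transfer size */
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(xfer);
        memset(buf, rank & 0xff, xfer);

        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "enable"); /* force two-phase writes */
        MPI_Info_set(info, "cb_nodes", "14");           /* one aggregator per node */

        /* the "pvfs2:" prefix steers ROMIO to its PVFS2 driver;
         * adjust the path to your file system */
        MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs2-fs/testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

        /* each rank writes its own contiguous 4 MB block; the _all
         * variant lets ROMIO route the data through the aggregators */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * xfer, buf, xfer,
                              MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Info_free(&info);
        free(buf);
        MPI_Finalize();
        return 0;
    }

In IOR itself this should just be a matter of adding the collective
I/O option (-c, if I remember the flag right) to your MPIIO runs and
comparing against the independent-I/O numbers.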
==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users