Just FYI: the PVFS kernel module is not yet compatible with linux 3.4 and above. I know you are not using the kernel module, but just a "heads up" in case you decide to upgrade your kernel.
Becky

On Thu, Apr 4, 2013 at 11:15 AM, Matthieu Dorier <[email protected]> wrote:

> The kernel is linux-2.6.32. Local file system is ext3.
>
> I'm surprised that there is no caching mechanism in PVFS and that it relies on the kernel's caches.
>
> Anyway, I was expecting from the beginning to see the performance drop because of cache effects, and this was the initial goal of my experimental campaign: to evaluate this cache effect. Yet what I still don't understand is why the performance decreases over time for cached writes.
>
> Matthieu
>
> ------------------------------
>
> From: "Phil Carns" <[email protected]>
> To: [email protected]
> Sent: Thursday, April 4, 2013 16:31:15
> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>
> Thanks for following up with the extra experiments. If TroveSyncData is set to no, then the kernel on the servers is in charge of caching. What kernel version and local file system are you using? I think the options to control buffer cache behavior have changed a little in different kernels.
>
> Here is an example of documentation from a recent kernel version:
>
> http://lxr.linux.no/linux+v3.8.5/Documentation/sysctl/vm.txt
>
> The dirty_* options are the ones of interest. It might be enough to just set dirty_ratio to a low value so that the kernel starts writing once a minimal amount of data needs to be flushed from the cache.
>
> No matter what tuning options are selected, the performance is going to bottom out at 50 MB/s any time the rate of data produced by the clients outpaces the amount of RAM (and time to flush that RAM to disk) on the servers. We can probably even out some of the fluctuations in the burst tests that you originally reported, though, just by making sure that the kernel doesn't wait until the last possible moment to get the disk involved.
>
> -Phil
>
> On 04/04/2013 10:09 AM, Matthieu Dorier wrote:
>
> Hi all,
>
> Here are the answers to your earlier questions (experiments are done with PVFS on 4 nodes, IOR on 384 cores):
>
> - When IOR uses a file-per-process approach, the performance becomes very unstable, ranging from 5 MB/s to 400 MB/s depending on the iteration. No way to see whether there is a global decrease in performance or not.
>
> - Setting TroveSyncData to yes leads to all iterations having a constant 50 MB/s aggregate throughput. No performance decrease.
>
> - CPU utilization is not 100% (30% on average).
>
> So it seems the problem comes from caching. The questions are: where is the cache implemented, how to control its size, and when it is sync'ed.
>
> Matthieu
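A minimal sketch of the dirty_* tuning Phil suggests above, assuming root access on each PVFS server and a kernel that exposes the usual /proc/sys/vm knobs; the two percentages are illustrative values, not recommendations:

/* dirty_tune.c -- sketch of lowering the kernel writeback thresholds
 * so that dirty pages are flushed to disk earlier, as Phil describes.
 * Must run as root on each PVFS server; the values are illustrative. */
#include <stdio.h>

/* Write a single integer value into a /proc/sys file. */
static int write_sysctl(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%d\n", value);
    return fclose(f);
}

int main(void)
{
    /* Start background writeback once 1% of RAM is dirty ... */
    write_sysctl("/proc/sys/vm/dirty_background_ratio", 1);
    /* ... and throttle writers once 5% of RAM is dirty. */
    write_sysctl("/proc/sys/vm/dirty_ratio", 5);
    return 0;
}

The same change can of course be made with sysctl or by echoing values into /proc by hand; the point is simply to get writeback started well before the servers' RAM fills, per Phil's note about evening out the fluctuations.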
> ------------------------------
>
> From: "Becky Ligon" <[email protected]>
> To: "Matthieu Dorier" <[email protected]>
> Cc: "Rob Latham" <[email protected]>, "pvfs2-users" <[email protected]>, "ofs-support" <[email protected]>
> Sent: Tuesday, April 2, 2013 19:19:07
> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>
> Another FYI: On our cluster here at Clemson University, we have turned off hyperthreading on any machine having Intel processors. We found that MPI applications perform badly on a true multi-core system when hyperthreading is enabled.
>
> Do any of your compute nodes have hyperthreading enabled?
>
> Becky
>
> On Tue, Apr 2, 2013 at 12:44 PM, Becky Ligon <[email protected]> wrote:
>
>> Just FYI: What we have seen with the high CPU utilization is that once you have more processes running than cores per machine, the performance slows down. And we have seen this problem with the client core as well as with the PVFS library (which ROMIO accesses). We have not been able to recreate the problem systematically and thus have not been able to resolve the issue.
>>
>> On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier <[email protected]> wrote:
>>
>>> To answer Phil's question: just restarting IOR is enough, yes. Not PVFS.
>>> For the rest, I'll do some experiments when I have the chance and get back to you.
>>>
>>> Thanks all
>>>
>>> Matthieu
>>>
>>> ------------------------------
>>>
>>> From: "Becky Ligon" <[email protected]>
>>> To: "Matthieu Dorier" <[email protected]>
>>> Cc: "Rob Latham" <[email protected]>, "pvfs2-users" <[email protected]>, "ofs-support" <[email protected]>
>>> Sent: Tuesday, April 2, 2013 17:22:17
>>> Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>>
>>> Matthieu:
>>>
>>> Are you seeing any 100% CPU utilizations on the client? We have seen this with the client core (which you are not using) on a multicore system; however, both the client core and the PVFS interface do use the same request structures, etc.
>>>
>>> Becky
>>>
>>> On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon <[email protected]> wrote:
>>>
>>>> Matthieu:
>>>>
>>>> I have asked Phil Carns to help you, since he is more familiar with the benchmark and MPI-IO. I think Rob Latham or Rob Ross may be helping too. I will continue to look at your data in the meantime.
>>>>
>>>> Becky
>>>>
>>>> Phil/Rob:
>>>>
>>>> Thanks so much for helping Matthieu. I am digging into the matter, but MPI is still new to me and I'm not familiar with the PVFS interface that accompanies ROMIO.
>>>>
>>>> Becky
>>>>
>>>> PS. Can we keep this on the pvfs2-users list so I can see how things progress?
>>>>
>>>> On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier <[email protected]> wrote:
>>>>
>>>>> Hi Rob and Phil,
>>>>>
>>>>> This thread moved to the ofs-support mailing list (probably because the first person to answer was part of that team), but I didn't get much of an answer to my problem, so I'll try to summarize here what I have done.
>>>>>
>>>>> First, to answer Phil: the PVFS config file is attached, and here is the script file used for IOR:
>>>>>
>>>>> IOR START
>>>>> testFile = pvfs2:/mnt/pvfs2/testfileA
>>>>> filePerProc=0
>>>>> api=MPIIO
>>>>> repetitions=100
>>>>> verbose=2
>>>>> blockSize=4m
>>>>> transferSize=4m
>>>>> collective=1
>>>>> writeFile=1
>>>>> interTestDelay=60
>>>>> readFile=0
>>>>> RUN
>>>>> IOR STOP
>>>>>
>>>>> Besides the tests I was describing in my first mail, I also did the same experiments on another cluster, also with TCP over IB, and then on Ethernet, with 336 clients and 672 clients, and with 2, 4 and 8 storage servers. In every case, this behavior appears.
>>>>>
>>>>> I benchmarked the local disk attached to the storage servers and got 42 MB/s, so the high throughput of over 2 GB/s I get obviously benefits from some caching mechanism, and the periodic behavior observed at high output frequency could be explained by that. Yet this does not explain why, overall, the performance decreases over time.
>>>>>
>>>>> I attach a set of graphs summarizing the experiments (on the x axis is the iteration number, and on the y axis the aggregate throughput obtained for that iteration; 100 consecutive iterations are performed). It seems that the performance follows the law D = a*T + b, where D is the duration of the write, T is the wallclock time since the beginning of the experiment, and "a" and "b" are constants.
>>>>>
>>>>> When I stop IOR and immediately restart it, I get the good performance back; it does not continue at the reduced performance the previous instance finished with.
>>>>>
>>>>> I also thought it could come from the fact that the same file is rewritten at every iteration, and tried the multiFile=1 option to get one new file at every iteration instead, but this didn't help.
>>>>>
>>>>> Last thing I can mention: I'm using mpich 3.0.2, compiled with PVFS support.
>>>>>
>>>>> Matthieu
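To spell out the fit reported above: if the write duration grows linearly with elapsed wallclock time,

    D(T) = a*T + b

then the aggregate throughput plotted for an iteration started at time T is

    B(T) = V / D(T) = V / (a*T + b)

where V is the data written per iteration (336 processes x 4 MB, roughly 1.3 GB). B starts near V/b and decays hyperbolically, which is consistent with a gradual slowdown over the 100 iterations rather than a sharp drop.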
>>>>>
>>>>> ----- Original Message -----
>>>>> > From: "Rob Latham" <[email protected]>
>>>>> > To: "Matthieu Dorier" <[email protected]>
>>>>> > Cc: "pvfs2-users" <[email protected]>
>>>>> > Sent: Tuesday, April 2, 2013 15:57:54
>>>>> > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>>>> >
>>>>> > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
>>>>> > > I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2 PVFS nodes, 28 compute nodes of 24 cores each, everything connected through InfiniBand but using an IP stack on top of it, so the protocol for PVFS is TCP), and I witness some strange performance behaviors with IOR (using ROMIO compiled against PVFS, no kernel support):
>>>>> >
>>>>> > > IOR is started on 336 processes (14 nodes), writing 4 MB/process in a single shared file using MPI-I/O (4 MB transfer size also). It completes 100 iterations.
>>>>> >
>>>>> > OK, so you have one PVFS client per core. All of these are talking to two servers.
>>>>> >
>>>>> > > First, every time I start an instance of IOR, the first I/O operation is extremely slow. I'm guessing this is because ROMIO has to initialize everything, get the list of PVFS servers, etc. Is there a way to speed this up?
>>>>> >
>>>>> > ROMIO isn't doing a whole lot here, but there is one thing different about ROMIO's 1st call vs. the Nth call. On the 1st call (the first time any pvfs2 file is opened or deleted), ROMIO will call the function PVFS_util_init_defaults().
>>>>> >
>>>>> > If you have 336 clients banging away on just two servers, I bet that could explain some slowness. In the old days, the PVFS server had to service these requests one at a time.
>>>>> >
>>>>> > I don't think this restriction has been relaxed? Since it is a read-only operation, though, it sure seems like one could just have servers shovel out pvfs2 configuration information as fast as possible.
>>>>> >
>>>>> > > Then, I set some delay between each iteration, to better reflect the behavior of an actual scientific application.
>>>>> >
>>>>> > Fun! This is kind of like what MADNESS does: it "computes" by sleeping for a bit. I think Phil's questions will help us understand the highly variable performance.
>>>>> >
>>>>> > Can you experiment with IOR's collective I/O? By default, collective I/O will select one client per node as an "I/O aggregator". The IOR workload will not benefit from ROMIO's two-phase optimization, but you've got 336 clients banging away on two servers. When I last studied PVFS scalability, 100x more clients than servers wasn't a big deal, but 5-6 years ago nodes did not have 24-way parallelism.
>>>>> >
>>>>> > ==rob
>>>>> >
>>>>> > --
>>>>> > Rob Latham
>>>>> > Mathematics and Computer Science Division
>>>>> > Argonne National Lab, IL USA

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
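For anyone reproducing this outside of IOR, here is a minimal sketch (not IOR itself) of the access pattern discussed in this thread: each rank writes one 4 MB block at a rank-based offset into a shared file on PVFS through ROMIO, using a collective call so that ROMIO's per-node aggregator selection (which Rob describes above) can apply. The pvfs2: path and the 4 MB size mirror the IOR script earlier in the thread; error checking is omitted.

/* collective_write.c -- sketch of the IOR-style workload discussed above:
 * each MPI rank writes one 4 MB block, at a rank-based offset, into a
 * single shared file on PVFS through ROMIO, using a collective call.
 * Build with an MPICH compiled with PVFS/ROMIO support:
 *   mpicc collective_write.c -o collective_write
 */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK_SIZE (4 * 1024 * 1024)   /* 4 MB per process, as in the IOR script */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(BLOCK_SIZE);           /* contents don't matter for the benchmark */

    /* The pvfs2: prefix tells ROMIO to use its PVFS driver directly,
     * bypassing the kernel module (same as testFile in the IOR script). */
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfileA",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: ROMIO may route the data through one aggregator
     * per node instead of having every rank talk to the servers directly. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK_SIZE,
                          buf, BLOCK_SIZE, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

Swapping MPI_File_write_at_all for MPI_File_write_at turns this into independent I/O from all 336 ranks, which is essentially the comparison Rob is suggesting.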
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
