Just FYI: what we have seen with the high CPU utilization is that once you
have more processes running than cores on a machine, performance slows
down.  We have seen this problem with the client core as well as with the
PVFS library (which ROMIO accesses).  We have not been able to reproduce the
problem systematically and thus have not been able to resolve the issue.


On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier
<[email protected]> wrote:

> To answer Phil's question: just restarting IOR is enough, yes. Not PVFS.
> For the rest, I'll do some experiments when I have the chance and get back
> to you.
>
> Thanks all
>
> Matthieu
>
> ------------------------------
>
> *De: *"Becky Ligon" <[email protected]>
> *À: *"Matthieu Dorier" <[email protected]>
> *Cc: *"Rob Latham" <[email protected]>, "pvfs2-users" <
> [email protected]>, "ofs-support" <
> [email protected]>
> *Envoyé: *Mardi 2 Avril 2013 17:22:17
>
> *Objet: *Re: [Pvfs2-users] Strange performance behavior with IOR
>
> Matthieu:
>
> Are you seeing any 100% CPU utilization on the clients?  We have seen this
> with the client core (which you are not using) on a multicore system;
> however, both the client core and the PVFS interface do use the same
> request structures, etc.
>
> Becky
>
> On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon <[email protected]> wrote:
>
>> Matthieu:
>>
>> I have asked Phil Carns to help you since he is more familiar with the
>> benchmark and MPI-IO.  I think Rob Latham or Rob Ross may be helping too.  I
>> will continue to look at your data in the meantime.
>>
>> Becky
>>
>> Phil/Rob:
>>
>> Thanks so much for helping Matthieu.  I am digging into the matter but
>> MPI is still new to me and I'm not familiar with the PVFS interface that
>> accompanies ROMIO.
>>
>> Becky
>>
>> PS.  Can we keep this on the pvfs2-users list so I can see how things
>> progress?
>>
>>
>> On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier <
>> [email protected]> wrote:
>>
>>> Hi Rob and Phil,
>>>
>>> This thread moved to the ofs-support mailing list (probably because the
>>> first person to answer was part of that team), but I didn't get much of an
>>> answer to my problem, so I'll try to summarize here what I have done.
>>>
>>> First, to answer Phil: the PVFS config file is attached, and here is the
>>> script file used for IOR:
>>>
>>> IOR START
>>>   testFile = pvfs2:/mnt/pvfs2/testfileA
>>>   filePerProc=0
>>>   api=MPIIO
>>>   repetitions=100
>>>   verbose=2
>>>   blockSize=4m
>>>   transferSize=4m
>>>   collective=1
>>>   writeFile=1
>>>   interTestDelay=60
>>>   readFile=0
>>>   RUN
>>> IOR STOP
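>>>
>>> For reference, this is roughly the MPI-IO pattern the script exercises; a
>>> minimal sketch, not IOR's actual code, with the file name, sizes and delay
>>> taken from the script above:
>>>
>>> /* sketch: each rank collectively writes one 4MB block at offset rank*4MB
>>>  * into a single shared file, 100 times, with a 60s delay in between */
>>> #include <mpi.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>>
>>> #define XFER (4 * 1024 * 1024)   /* blockSize = transferSize = 4m */
>>> #define REPS 100                 /* repetitions = 100 */
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank;
>>>     char *buf;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     buf = malloc(XFER);
>>>
>>>     for (int i = 0; i < REPS; i++) {
>>>         MPI_File fh;
>>>         MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfileA",
>>>                       MPI_MODE_CREATE | MPI_MODE_WRONLY,
>>>                       MPI_INFO_NULL, &fh);
>>>         /* collective=1 -> collective write of one 4MB block per rank */
>>>         MPI_File_write_at_all(fh, (MPI_Offset)rank * XFER, buf, XFER,
>>>                               MPI_BYTE, MPI_STATUS_IGNORE);
>>>         MPI_File_close(&fh);
>>>         sleep(60);               /* interTestDelay = 60 */
>>>     }
>>>
>>>     free(buf);
>>>     MPI_Finalize();
>>>     return 0;
>>> }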
>>>
>>> Besides the tests I described in my first mail, I also ran the same
>>> experiments on another cluster, also with TCP over IB and then on Ethernet,
>>> with 336 clients and 672 clients, and with 2, 4 and 8 storage servers. In
>>> every case, the same behavior appears.
>>>
>>> I benchmarked the local disk attached to the storage servers and got
>>> 42MB/s, so the throughput of over 2GB/s I observe clearly benefits from
>>> some caching mechanism, and the periodic behavior observed at high output
>>> frequency could be explained by that. Yet this does not explain why,
>>> overall, the performance decreases over time.
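>>>
>>> (As rough arithmetic: 336 clients x 4MB is about 1.3GB per iteration, while
>>> 2 servers with 42MB/s disks give at most ~84MB/s of raw disk bandwidth, and
>>> even 8 servers only ~336MB/s, so a sustained 2GB/s has to be absorbed mostly
>>> by the servers' memory/page cache rather than by the disks.)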
>>>
>>> I attach a set of graphs summarizing the experiments (the x axis is the
>>> iteration number and the y axis is the aggregate throughput obtained for
>>> that iteration; 100 consecutive iterations are performed).
>>> The performance seems to follow the law D = a*T + b, where D is the
>>> duration of the write, T is the wallclock time since the beginning of the
>>> experiment, and "a" and "b" are constants.
>>>
>>> When I stop IOR and immediately restart it, I get the good performance
>>> back; it does not continue at the reduced performance level at which the
>>> previous instance finished.
>>>
>>> I also thought it could come from the fact that the same file is
>>> rewritten at every iteration, so I tried the multiFile=1 option to write
>>> a new file at every iteration instead, but this didn't help.
>>>
>>> One last thing I can mention: I'm using MPICH 3.0.2, compiled with PVFS
>>> support.
>>>
>>> Matthieu
>>>
>>> ----- Mail original -----
>>> > De: "Rob Latham" <[email protected]>
>>> > À: "Matthieu Dorier" <[email protected]>
>>> > Cc: "pvfs2-users" <[email protected]>
>>> > Envoyé: Mardi 2 Avril 2013 15:57:54
>>> > Objet: Re: [Pvfs2-users] Strange performance behavior with IOR
>>> >
>>> > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
>>> > > I've installed PVFS (OrangeFS 2.8.7) on a small cluster (2 PVFS
>>> > > nodes, 28 compute nodes of 24 cores each, everything connected
>>> > > through InfiniBand but using an IP stack on top of it, so the
>>> > > protocol for PVFS is TCP), and I see some strange performance
>>> > > behavior with IOR (using ROMIO compiled against PVFS, no kernel
>>> > > support):
>>> >
>>> > > IOR is started on 336 processes (14 nodes), writing 4MB/process in
>>> > > a single shared file using MPI-I/O (4MB transfer size also). It
>>> > > completes 100 iterations.
>>> >
>>> > OK, so you have one pvfs client per core.  All these are talking to
>>> > two servers.
>>> >
>>> > > First, every time I start an instance of IOR, the first I/O
>>> > > operation is extremely slow. I'm guessing this is because ROMIO has
>>> > > to initialize everything, get the list of PVFS servers, etc. Is
>>> > > there a way to speed this up?
>>> >
>>> > ROMIO isn't doing a whole lot here, but there is one thing different
>>> > about ROMIO's 1st call vs. the Nth call.  On the 1st call (the first
>>> > time any pvfs2 file is opened or deleted), ROMIO will call the
>>> > function PVFS_util_init_defaults().
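>>> >
>>> > A minimal standalone sketch (not part of ROMIO; it assumes the OrangeFS
>>> > headers and client library are installed, and exact header names may
>>> > vary by install) to time that initialization call in isolation:
>>> >
>>> > #include <stdio.h>
>>> > #include <mpi.h>
>>> > #include "pvfs2.h"
>>> > #include "pvfs2-util.h"
>>> >
>>> > int main(int argc, char **argv)
>>> > {
>>> >     int rank, ret;
>>> >     double t0, t1;
>>> >
>>> >     MPI_Init(&argc, &argv);
>>> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>> >
>>> >     t0 = MPI_Wtime();
>>> >     ret = PVFS_util_init_defaults();  /* the initialization step
>>> >                                          described above */
>>> >     t1 = MPI_Wtime();
>>> >
>>> >     printf("rank %d: PVFS_util_init_defaults() ret=%d, %.3f s\n",
>>> >            rank, ret, t1 - t0);
>>> >
>>> >     PVFS_sys_finalize();
>>> >     MPI_Finalize();
>>> >     return 0;
>>> > }
>>> >
>>> > Running something like this at the same 336-process scale would show
>>> > whether the cost of the first open is dominated by this call.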
>>> >
>>> > If you have 336 clients banging away on just two servers, I bet that
>>> > could explain some slowness.  In the old days, the PVFS server had to
>>> > service these requests one at a time.
>>> >
>>> > I don't think this restriction has been relaxed.  Since it is a
>>> > read-only operation, though, it sure seems like one could just have
>>> > the servers shovel out pvfs2 configuration information as fast as
>>> > possible.
>>> >
>>> >
>>> > > Then, I set some delay between each iteration, to better reflect
>>> > > the behavior of an actual scientific application.
>>> >
>>> > Fun! This is kind of like what MADNESS does: it "computes" by sleeping
>>> > for a bit.  I think Phil's questions will help us understand the
>>> > highly variable performance.
>>> >
>>> > Can you experiment with IOR's collective I/O?  By default, collective
>>> > I/O will select one client per node as an "I/O aggregator".  The IOR
>>> > workload will not benefit from ROMIO's two-phase optimization, but
>>> > you've got 336 clients banging away on two servers.  When I last
>>> > studied PVFS scalability, 100x more clients than servers wasn't a big
>>> > deal, but 5-6 years ago nodes did not have 24-way parallelism.
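>>> >
>>> > Illustrative sketch (not IOR code; the hint values here are just
>>> > examples) of how the number of aggregators can be steered through
>>> > ROMIO hints when opening the file:
>>> >
>>> > #include <mpi.h>
>>> >
>>> > int main(int argc, char **argv)
>>> > {
>>> >     MPI_File fh;
>>> >     MPI_Info info;
>>> >
>>> >     MPI_Init(&argc, &argv);
>>> >     MPI_Info_create(&info);
>>> >
>>> >     /* ROMIO's default cb_config_list is "*:1", i.e. one aggregator
>>> >        process per node; cb_nodes caps how many aggregators take part
>>> >        in collective buffering. */
>>> >     MPI_Info_set(info, "cb_config_list", "*:1");
>>> >     MPI_Info_set(info, "cb_nodes", "14");    /* e.g. one per node for
>>> >                                                 the 14-node run */
>>> >
>>> >     MPI_File_open(MPI_COMM_WORLD, "pvfs2:/mnt/pvfs2/testfileA",
>>> >                   MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
>>> >     /* ... collective writes as in the IOR run ... */
>>> >     MPI_File_close(&fh);
>>> >
>>> >     MPI_Info_free(&info);
>>> >     MPI_Finalize();
>>> >     return 0;
>>> > }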
>>> >
>>> > ==rob
>>> >
>>> > --
>>> > Rob Latham
>>> > Mathematics and Computer Science Division
>>> > Argonne National Lab, IL USA
>>> >
>>>
>>>
>>>
>>
>>
>> --
>> Becky Ligon
>> OrangeFS Support and Development
>> Omnibond Systems
>> Anderson, South Carolina
>>
>>
>
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
>
>
>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
