Just FYI:  the PVFS kernel module is not yet compatible with Linux 3.4 and
above.  I know you are not using the kernel module, but this is just a
heads-up in case you decide to upgrade your kernel.

Becky


On Thu, Apr 4, 2013 at 11:15 AM, Matthieu Dorier
<[email protected]>wrote:

> The kernel is linux-2.6.32.
> Local file system is ext3.
>
> I'm surprised that there is no caching mechanism in PVFS and that it
> relies on the kernel's caches.
>
> Anyway, from the beginning I expected to see performance drop because of
> cache effects; evaluating this cache effect was the initial goal of my
> experimental campaign. Yet what I still don't understand is why the
> performance decreases over time for cached writes.
>
> Matthieu
>
> ------------------------------
>
> *De: *"Phil Carns" <[email protected]>
> *À: *[email protected]
> *Envoyé: *Jeudi 4 Avril 2013 16:31:15
>
> *Objet: *Re: [Pvfs2-users] Strange performance behavior with IOR
>
> Thanks for following up with the extra experiments.  If TroveSyncData is
> set to no, then the kernel on the servers is in charge of caching.  What
> kernel version and local file system are you using?  I think the options to
> control buffer cache behavior have changed a little in different kernels.
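>
> As a point of reference, that option lives in the per-filesystem section
> of the server config file; the fragment below is a sketch from memory
> rather than a verified excerpt, so double-check the section names against
> your own config:
>
>   <StorageHints>
>     TroveSyncMeta yes
>     TroveSyncData no
>   </StorageHints>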
>
> Here is an example of documentation from a recent kernel version:
>
>     http://lxr.linux.no/linux+v3.8.5/Documentation/sysctl/vm.txt
>
> The dirty_* options are the ones of interest.  It might be enough to just
> set the dirty_ratio to a low value so that the kernel starts writing once a
> minimal amount of data needs to be flushed from the cache.
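>
> For example (the values here are purely illustrative; the right numbers
> depend on how much RAM the servers have):
>
>   # /etc/sysctl.conf on the PVFS servers:
>   # start background writeback early and throttle writers well before
>   # the page cache fills up
>   vm.dirty_background_ratio = 1
>   vm.dirty_ratio = 5
>
> The settings can be applied without a reboot via "sysctl -p" or by
> echoing the values into /proc/sys/vm/.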
>
> No matter what tuning options are selected, the performance is going to
> bottom out at 50 MB/s any time the rate of data produced by the clients
> outpaces the amount of RAM (and the time to flush that RAM to disk) on the
> servers.  We can probably even out some of the fluctuations in the burst
> tests that you originally reported, though, just by making sure that the
> kernel doesn't wait until the last possible moment to get the disk involved.
>
> -Phil
>
> On 04/04/2013 10:09 AM, Matthieu Dorier wrote:
>
> Hi all,
>
>  Here are the answers to your earlier questions (experiments are done
> with PVFS on 4 nodes, IOR on 384 cores):
>
>  - When IOR uses a file-per-process approach, the performance becomes
> very unstable, ranging from 5MB/s to 400MB/s depending on the iteration, so
> there is no way to tell whether performance decreases globally or not.
>
>  - Setting TroveSyncData to yes leads to all iterations having a constant
> 50MB/s aggregate throughput. No performance decrease.
>
>  - CPU utilization is not 100% (30% on average).
>
>  So it seems the problem comes from caching. The questions are: where is
> the cache implemented, how can its size be controlled, and when is it synced?
>
>  Matthieu
>
> ------------------------------
>
> *De: *"Becky Ligon" <[email protected]> <[email protected]>
> *À: *"Matthieu Dorier" <[email protected]><[email protected]>
> *Cc: *"Rob Latham" <[email protected]> <[email protected]>, "pvfs2-users"
> <[email protected]><[email protected]>,
> "ofs-support" <[email protected]> <[email protected]>
> *Envoyé: *Mardi 2 Avril 2013 19:19:07
> *Objet: *Re: [Pvfs2-users] Strange performance behavior with IOR
>
>  Another FYI:  On our cluster here at Clemson University, we have turned
> off hyperthreading on any machine with Intel processors.  We found that
> MPI applications perform badly on a true multi-core system when
> hyperthreading is enabled.
>
>  Do any of your compute nodes have hyperthreading enabled?
>
> Becky
>
>
> On Tue, Apr 2, 2013 at 12:44 PM, Becky Ligon <[email protected]> wrote:
>
>> Just FYI:  What we have seen with the high CPU utilization is that once
>> you have more processes running than cores per machine, the performance
>> slows down.  And, we have seen this problem with the client core as well as
>> the pvfs library (which ROMIO accesses).  We have not been able to recreate
>> the problem systematically and thus have not been able to resolve the issue.
>>
>>
>> On Tue, Apr 2, 2013 at 12:15 PM, Matthieu Dorier <
>> [email protected]> wrote:
>>
>>>  To answer Phil's question: just restarting IOR is enough, yes. Not
>>> PVFS.
>>> For the rest, I'll do some experiments when I have the chance and get
>>> back to you.
>>>
>>>  Thanks all
>>>
>>>  Matthieu
>>>
>>> ------------------------------
>>>
>>> *De: *"Becky Ligon" <[email protected]>
>>> *À: *"Matthieu Dorier" <[email protected]>
>>> *Cc: *"Rob Latham" <[email protected]>, "pvfs2-users" <
>>> [email protected]>, "ofs-support" <
>>> [email protected]>
>>> *Envoyé: *Mardi 2 Avril 2013 17:22:17
>>>
>>> *Objet: *Re: [Pvfs2-users] Strange performance behavior with IOR
>>>
>>> Matthieu:
>>>
>>> Are you seeing any 100% CPU utilization on the client?  We have seen
>>> this with the client core (which you are not using) on a multicore system;
>>> however, both the client core and the PVFS interface do use the same
>>> request structures, etc.
>>>
>>> Becky
>>>
>>> On Tue, Apr 2, 2013 at 11:11 AM, Becky Ligon <[email protected]> wrote:
>>>
>>>> Matthieu:
>>>>
>>>> I have asked Phil Carns to help you since he is more familiar with the
>>>> benchmark and MPI-IO.  I think Rob Latham or Rob Ross may be helping too.
>>>> I'll continue to look at your data in the meantime.
>>>>
>>>> Becky
>>>>
>>>> Phil/Rob:
>>>>
>>>> Thanks so much for helping Matthieu.  I am digging into the matter but
>>>> MPI is still new to me and I'm not familiar with the PVFS interface that
>>>> accompanies ROMIO.
>>>>
>>>> Becky
>>>>
>>>> PS.  Can we keep this on the pvfs2-users list so I can see how things
>>>> progress?
>>>>
>>>>
>>>>  On Tue, Apr 2, 2013 at 10:47 AM, Matthieu Dorier <
>>>> [email protected]> wrote:
>>>>
>>>>>  Hi Rob and Phil,
>>>>>
>>>>> This thread moved to the ofs-support mailing list (probably because
>>>>> the first person to answer was part of that team), but I didn't get much
>>>>> of an answer to my problem, so I'll try to summarize here what I have done.
>>>>>
>>>>> First, to answer Phil: the PVFS config file is attached, and here is
>>>>> the script file used for IOR:
>>>>>
>>>>> IOR START
>>>>>   testFile = pvfs2:/mnt/pvfs2/testfileA
>>>>>   filePerProc=0
>>>>>   api=MPIIO
>>>>>   repetitions=100
>>>>>   verbose=2
>>>>>   blockSize=4m
>>>>>   transferSize=4m
>>>>>   collective=1
>>>>>   writeFile=1
>>>>>   interTestDelay=60
>>>>>   readFile=0
>>>>>   RUN
>>>>> IOR STOP
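>>>>>
>>>>>  (For completeness: the script is fed to IOR with its -f option, along
>>>>> the lines of "mpiexec -n <nprocs> ./IOR -f <script_file>"; the exact
>>>>> launcher invocation depends on the cluster.)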
>>>>>
>>>>>  Besides the tests I described in my first mail, I also ran the same
>>>>> experiments on another cluster, first with TCP over IB and then over
>>>>> Ethernet, with 336 and 672 clients and with 2, 4 and 8 storage
>>>>> servers. In every case, this behavior appears.
>>>>>
>>>>> I benchmarked the local disk attached to the storage servers and got
>>>>> 42MB/s, so the throughput of over 2GB/s I observe obviously benefits from
>>>>> some caching mechanism, and the periodic behavior observed at high output
>>>>> frequency could be explained by that. Yet this does not explain why,
>>>>> overall, the performance decreases over time.
>>>>>
>>>>> I attach a set of graphs summarizing the experiments (the x axis is the
>>>>> iteration number and the y axis is the aggregate throughput obtained for
>>>>> that iteration; 100 consecutive iterations are performed).
>>>>> The write duration seems to follow the law D = a*T + b, where D is the
>>>>> duration of the write, T is the wall-clock time since the beginning of the
>>>>> experiment, and "a" and "b" are constants.
>>>>>
>>>>> When I stop IOR and immediately restart it, I get the good performance
>>>>> back; it does not continue at the reduced performance where the previous
>>>>> instance finished.
>>>>>
>>>>> I also thought it could come from the fact that the same file is
>>>>> re-written at every iteration, and tried with the multiFile=1 option to
>>>>> have one new file at every iteration instead, but this didn't help.
>>>>>
>>>>> Last thing I can mention: I'm using mpich 3.0.2, compiled with PVFS
>>>>> support.
>>>>>
>>>>> Matthieu
>>>>>
>>>>> ----- Original Message -----
>>>>> > From: "Rob Latham" <[email protected]>
>>>>> > To: "Matthieu Dorier" <[email protected]>
>>>>> > Cc: "pvfs2-users" <[email protected]>
>>>>> > Sent: Tuesday, April 2, 2013 15:57:54
>>>>> > Subject: Re: [Pvfs2-users] Strange performance behavior with IOR
>>>>>  >
>>>>> > On Sat, Mar 23, 2013 at 03:31:22PM +0100, Matthieu Dorier wrote:
>>>>> > > I've installed PVFS (orangeFS 2.8.7) on a small cluster (2 PVFS
>>>>> > > nodes, 28 compute nodes of 24 cores each, everything connected
>>>>> > > through infiniband but using an IP stack on top of it, so the
>>>>> > > protocol for PVFS is TCP), and I witness some strange performance
>>>>> > > behaviors with IOR (using ROMIO compiled against PVFS, no kernel
>>>>> > > support):
>>>>> >
>>>>> > > IOR is started on 336 processes (14 nodes), writing 4MB/process in
>>>>> > > a
>>>>> > > single shared file using MPI-I/O (4MB transfer size also). It
>>>>> > > completes 100 iterations.
>>>>> >
>>>>> > OK, so you have one pvfs client per core.  All these are talking to
>>>>> > two servers.
>>>>> >
>>>>> > > First every time I start an instance of IOR, the first I/O
>>>>> > > operation
>>>>> > > is extremely slow. I'm guessing this is because ROMIO has to
>>>>> > > initialize everything, get the list of PVFS servers, etc. Is there
>>>>> > > a
>>>>> > > way to speed this up?
>>>>> >
>>>>> > ROMIO isn't doing a whole lot here, but there is one thing different
>>>>> > about ROMIO's 1st call vs the Nth call.  On the 1st call (the first time
>>>>> > any pvfs2 file is opened or deleted), ROMIO will call the function
>>>>> > PVFS_util_init_defaults().
>>>>> >
>>>>> > If you have 336 clients banging away on just two servers, I bet that
>>>>> > could explain some slowness.  In the old days, the PVFS server had to
>>>>> > service these requests one at a time.
>>>>> >
>>>>> > I don't think this restriction has been relaxed?  Since it is a
>>>>> > read-only operation, though, it sure seems like one could just have
>>>>> > servers shovel out pvfs2 configuration information as fast as
>>>>> > possible.
>>>>> >
>>>>> >
>>>>> > > Then, I set some delay between each iteration, to better reflect
>>>>> > > the
>>>>> > > behavior of an actual scientific application.
>>>>> >
>>>>> > Fun! This is kind of like what MADNESS does: it "computes" by sleeping
>>>>> > for a bit.  I think Phil's questions will help us understand the
>>>>> > highly variable performance.
>>>>> >
>>>>> > Can you experiment with IOR's collective I/O?  By default, collective
>>>>> > I/O will select one client per node as an "I/O aggregator".  The IOR
>>>>> > workload will not benefit from ROMIO's two-phase optimization, but
>>>>> > you've got 336 clients banging away on two servers.  When I last
>>>>> > studied pvfs scalability, 100x more clients than servers wasn't a big
>>>>> > deal, but 5-6 years ago nodes did not have 24-way parallelism.
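>>>>> >
>>>>> > As a sketch of how to steer that (hint names are from memory of
>>>>> > ROMIO's documentation, so double-check them): put something like
>>>>> >
>>>>> >     romio_cb_write enable
>>>>> >     cb_nodes 14
>>>>> >
>>>>> > in a text file and point the ROMIO_HINTS environment variable at it.
>>>>> > That forces collective buffering for writes and caps the aggregators
>>>>> > at one per compute node for a 14-node run.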
>>>>> >
>>>>> > ==rob
>>>>> >
>>>>> > --
>>>>> > Rob Latham
>>>>> > Mathematics and Computer Science Division
>>>>> > Argonne National Lab, IL USA
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Becky Ligon
>>>> OrangeFS Support and Development
>>>> Omnibond Systems
>>>> Anderson, South Carolina
>>>>
>>>>
>>>
>>>
>>> --
>>> Becky Ligon
>>> OrangeFS Support and Development
>>> Omnibond Systems
>>> Anderson, South Carolina
>>>
>>>
>>>
>>
>>
>> --
>> Becky Ligon
>> OrangeFS Support and Development
>> Omnibond Systems
>> Anderson, South Carolina
>>
>>
>
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
>
>
>
>
>
>
>
>
>
>
>
>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
