Hi Jed et al.,

Just wanted to report back on the resolution of this issue. The computing support people at HLRN in Germany submitted a test case to Cray regarding performance on their XC30. Cray has finally gotten back with a solution, which is to use the run-time option -vecscatter_alltoall. Apparently this is a known issue, and according to the HLRN folks passing this command-line option to PETSc seems to work nicely.
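For anyone on the list who hits the same thing, here is a minimal sketch of how the option can be supplied (the application name, launcher arguments, and the programmatic variant below are just placeholders/assumptions, not what we actually ran; the two-argument PetscOptionsSetValue() form matches the 3.4 series):

    #include <petscsys.h>

    /* PetscInitialize() reads the options database, so the flag can simply
       be appended to the launch line, e.g.
           aprun -n 1024 ./myapp -vecscatter_alltoall
       (myapp and -n 1024 are placeholders). Alternatively, it can be set
       programmatically before any VecScatters are created. */
    int main(int argc, char **argv)
    {
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = PetscOptionsSetValue("-vecscatter_alltoall", NULL); CHKERRQ(ierr);

      /* ... set up vectors/matrices and solve as usual; scatters created
         after this point should use MPI_Alltoall-based communication ... */

      ierr = PetscFinalize();
      return ierr;
    }

Putting the flag on the command line (or in ~/.petscrc) is equivalent; the programmatic form is just for completeness.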
Thanks again for your help.

Samar

On Apr 11, 2014, at 7:44 AM, Jed Brown <[email protected]> wrote:

> Samar Khatiwala <[email protected]> writes:
>
>> Hello,
>>
>> This is a somewhat vague query but I and a colleague have been running
>> PETSc (3.4.3.0) on a Cray XC30 in Germany
>> (https://www.hlrn.de/home/view/System3/WebHome) and the system
>> administrators alerted us to some anomalies with our jobs that may or may
>> not be related to PETSc but I thought I'd ask here in case others have
>> noticed something similar.
>>
>> First, there was a large variation in run-time for identical jobs,
>> sometimes as much as 50%. We didn't really pick up on this but other
>> users complained to the IT people that their jobs were taking a
>> performance hit with a similar variation in run-time. At that point we're
>> told the IT folks started monitoring jobs and carrying out tests to see
>> what was going on. They discovered that (1) this always happened when we
>> were running our jobs and (2) the problem got worse with physical
>> proximity to the nodes on which our jobs were running (what they
>> described as a "strong interaction" between our jobs and others
>> presumably through the communication network).
>
> It sounds like you are strong scaling (smallish subdomains) so that your
> application is sensitive to network latency. I see significant
> performance variability on XC-30 with this Full Multigrid solver that is
> not using PETSc.
>
> http://59a2.org/files/hopper-vs-edison.3semilogx.png
>
> See the factor of 2 performance variability for the samples of the ~15M
> element case. This operation is limited by instruction issue rather
> than bandwidth (indeed, it is several times faster than doing the same
> operations with assembled matrices). Here the variability is within the
> same application performing repeated solves. If you get a different
> partition on a different run, you can see larger variation.
>
> If your matrices are large enough, your performance will be limited by
> memory bandwidth. (This is the typical case, but sufficiently small
> matrices can fit in cache.) I once encountered a batch system that did
> not properly reset nodes between runs, leaving a partially-filled
> ramdisk distributed asymmetrically across the memory busses. This led
> to 3x performance reduction on 4-socket nodes because much of the memory
> demanded by the application would be faulted onto one memory bus.
> Presumably your machine has a resource manager that would not allow such
> things to happen.
