On Tue, 16 Mar 2010, John Peterson wrote:
On Tue, Mar 16, 2010 at 1:27 PM, Roy Stogner wrote:
Very good point. John, you might try sticking Parallel::barrier() in
front of each of the Parallel::max() calls - if that ends up capturing
all the perflog time, then the problem isn't max() takin
On Tue, Mar 16, 2010 at 1:27 PM, Roy Stogner wrote:
>
> On Tue, 16 Mar 2010, Kirk, Benjamin (JSC-EG311) wrote:
>
>> The only other thing that comes to mind is that max effectively
>> introduces a barrier
>
> Very good point. John, you might try sticking Parallel::barrier() in
> front of each of t
John and I have been working on a model reduction framework for
parametrized PDEs (which is where the Parallel::max timing came from)
where the system does _many_ reduced order solves on a large training
set of parameters in order to "train" a reduced basis. The training set
gets split up among
Expound??
- Original Message -
From: Derek Gaston
To: Roy Stogner
Cc: Kirk, Benjamin (JSC-EG311); '[email protected]'
; '[email protected]'
Sent: Tue Mar 16 14:55:28 2010
Subject: Re: [Libmesh-devel] Parallel::max()
On Mar 16, 2010, at 12:27 PM, Roy Stogne
On Mar 16, 2010, at 12:27 PM, Roy Stogner wrote:
> In which case it's not our Parallel::max
> implementation that's screwed up, just _Y_our load-balancing. ;-)
Fixed that for you ;-)
Derek
On Tue, 16 Mar 2010, Kirk, Benjamin (JSC-EG311) wrote:
> The only other thing that comes to mind is that max effectively
> introduces a barrier
Very good point. John, you might try sticking Parallel::barrier() in
front of each of the Parallel::max() calls - if that ends up capturing
all the per
The only other thing that comes to mind is that max effectively introduces a
barrier, so a calling sequence like
Parallel::max(...);
Parallel::allgather(...);
would likely show a much longer time for max than for allgather, since the
processes will already be synchronized by the time allgather runs. (?)
-Ben
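
To make the barrier argument concrete, here is a minimal sketch that uses
threads as stand-ins for MPI ranks. It is not libMesh code: SimpleBarrier,
RankLog and run_ranks are hypothetical names invented for the illustration,
and the "max" is simulated with a shared vector rather than a real reduction.
Rank 0 is given extra work to mimic a load imbalance. Without a barrier in
front, the other ranks charge their waiting time to the max; with a barrier
in front, the same waiting time moves into the barrier and the max itself
becomes cheap.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier for a fixed number of threads.
// Hypothetical stand-in for Parallel::barrier() in this sketch.
class SimpleBarrier
{
public:
  explicit SimpleBarrier(int n) : n_(n), waiting_(0), generation_(0) {}

  void wait()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    const int my_generation = generation_;
    if (++waiting_ == n_)
      {
        // Last thread to arrive resets the barrier and releases the rest.
        waiting_ = 0;
        ++generation_;
        cv_.notify_all();
      }
    else
      cv_.wait(lock, [&] { return generation_ != my_generation; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int n_, waiting_, generation_;
};

struct RankLog
{
  double barrier_s;  // seconds spent in the explicit barrier (if any)
  double max_s;      // seconds spent in the simulated max
  double max_value;  // result of the simulated max
};

// Run n_ranks simulated ranks; rank 0 does slow_ms of extra "work".
// If barrier_first is true, each rank calls the barrier before the max.
std::vector<RankLog> run_ranks(int n_ranks, int slow_ms, bool barrier_first)
{
  assert(n_ranks > 0);
  using Clock = std::chrono::steady_clock;
  SimpleBarrier barrier(n_ranks);
  std::vector<double> values(n_ranks, 0.0);
  std::vector<RankLog> logs(n_ranks, RankLog{0.0, 0.0, 0.0});

  auto rank_main = [&](int rank)
  {
    barrier.wait();  // common starting point for all ranks
    if (rank == 0)   // the badly load-balanced rank
      std::this_thread::sleep_for(std::chrono::milliseconds(slow_ms));

    const auto t0 = Clock::now();
    if (barrier_first)
      barrier.wait();
    const auto t1 = Clock::now();

    // Simulated max: publish a value, wait for everyone, then reduce.
    values[rank] = static_cast<double>(rank);
    barrier.wait();  // all contributions are now visible
    const double m = *std::max_element(values.begin(), values.end());
    barrier.wait();  // nobody leaves before everyone has read
    const auto t2 = Clock::now();

    logs[rank].barrier_s = std::chrono::duration<double>(t1 - t0).count();
    logs[rank].max_s = std::chrono::duration<double>(t2 - t1).count();
    logs[rank].max_value = m;
  };

  std::vector<std::thread> threads;
  for (int r = 0; r < n_ranks; ++r)
    threads.emplace_back(rank_main, r);
  for (auto & t : threads)
    t.join();
  return logs;
}
```

Calling run_ranks(4, 200, false) and run_ranks(4, 200, true) and comparing
max_s and barrier_s for any rank other than 0 should show the wait moving
from the max into the barrier. In a real run the same experiment would use
Parallel::barrier() in front of Parallel::max() and read the split from the
PerfLog output.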
- Original Mes
On Tue, 16 Mar 2010, John Peterson wrote:
> Is there something up with our Parallel::max() implementation? In a
> recent code I ran on 256 processors, each call to Parallel::max
> apparently required 24 seconds, orders of magnitude longer than
> something like gather, with presumably way more co
Is there something up with our Parallel::max() implementation? In a
recent code I ran on 256 processors, each call to Parallel::max
apparently required 24 seconds, orders of magnitude longer than
something like gather, with presumably way more communication?!
(You may want to view this PerfLog ta