On Tue, 16 Mar 2010, John Peterson wrote:
> On Tue, Mar 16, 2010 at 1:27 PM, Roy Stogner wrote:
>> Very good point. John, you might try sticking Parallel::barrier() in
>> front of each of the Parallel::max() calls - if that ends up capturing
>> all the perflog time, then the problem isn't max() takin
On Tue, Mar 16, 2010 at 1:27 PM, Roy Stogner wrote:
>
> On Tue, 16 Mar 2010, Kirk, Benjamin (JSC-EG311) wrote:
>
>> The only other thing that comes to mind is that max effectively
>> introduces a barrier
>
> Very good point. John, you might try sticking Parallel::barrier() in
> front of each of t
[email protected]'
>
> Sent: Tue Mar 16 14:55:28 2010
> Subject: Re: [Libmesh-devel] Parallel::max()
>
>
> On Mar 16, 2010, at 12:27 PM, Roy Stogner wrote:
>
>> In which case it's not our Parallel
Expound??
- Original Message -
From: Derek Gaston
To: Roy Stogner
Cc: Kirk, Benjamin (JSC-EG311); '[email protected]'; '[email protected]'
Sent: Tue Mar 16 14:55:28 2010
Subject: Re: [Libmesh-devel] Parallel::max()
On Mar 16, 20
On Mar 16, 2010, at 12:27 PM, Roy Stogner wrote:
> In which case it's not our Parallel::max
> implementation that's screwed up, just _Y_our load-balancing. ;-)
Fixed that for you ;-)
Derek
On Tue, 16 Mar 2010, Kirk, Benjamin (JSC-EG311) wrote:
> The only other thing that comes to mind is that max effectively
> introduces a barrier
Very good point. John, you might try sticking Parallel::barrier() in
front of each of the Parallel::max() calls - if that ends up capturing
all the per
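To illustrate why a barrier placed in front of the max() call would soak up the measured time, here is a minimal sketch of the reasoning rather than libMesh code. It assumes a simple model in which every rank must wait at a synchronizing collective until the slowest rank arrives; the function name idle_time_at_sync and the per-rank work times are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model, not part of libMesh: work[i] is the time in seconds
// that rank i spends computing before it reaches a synchronizing collective
// (a barrier, or a reduction such as max that cannot finish early).
// Returns the time each rank then sits idle waiting for the slowest rank.
std::vector<double> idle_time_at_sync(const std::vector<double>& work)
{
  std::vector<double> idle(work.size(), 0.0);
  if (work.empty())
    return idle;

  // Nobody can leave the collective before the slowest rank has entered it.
  const double slowest = *std::max_element(work.begin(), work.end());

  for (std::size_t i = 0; i < work.size(); ++i)
    idle[i] = slowest - work[i];

  return idle;
}
```

Under this model, ranks that finish their work early are charged the whole waiting time inside whichever collective they hit first. If a barrier comes first, the wait is logged against the barrier and the reduction that follows looks cheap, which would point at load imbalance rather than at the reduction itself.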
Message -
From: Roy Stogner
To: John Peterson
Cc: libmesh-devel
Sent: Tue Mar 16 13:07:17 2010
Subject: Re: [Libmesh-devel] Parallel::max()
On Tue, 16 Mar 2010, John Peterson wrote:
> Is there something up with our Parallel::max() implementation? In a
> recent code I ran on 256 proc
On Tue, 16 Mar 2010, John Peterson wrote:
> Is there something up with our Parallel::max() implementation? In a
> recent code I ran on 256 processors, each call to Parallel::max
> apparently required 24 seconds, orders of magnitude longer than
> something like gather, with presumably way more co
Is there something up with our Parallel::max() implementation? In a
recent code I ran on 256 processors, each call to Parallel::max
apparently required 24 seconds, orders of magnitude longer than
something like gather, with presumably way more communication?!
(You may want to view this PerfLog ta