> On May 29, 2015, at 7:55 AM, Mark Adams <[email protected]> wrote:
> 
> I suspect it is catching load imbalance and just not reporting it correctly.


    The code is trivial and exactly the same as in many other places where a 
load balance other than 1.0 is reported, so something is funky:

PetscErrorCode  VecAssemblyBegin(Vec vec)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  PetscValidHeaderSpecific(vec,VEC_CLASSID,1);
  PetscValidType(vec,1);
  ierr = VecStashViewFromOptions(vec,NULL,"-vec_view_stash");CHKERRQ(ierr);
  ierr = PetscLogEventBegin(VEC_AssemblyBegin,vec,0,0,0);CHKERRQ(ierr);
  if (vec->ops->assemblybegin) {
    ierr = (*vec->ops->assemblybegin)(vec);CHKERRQ(ierr);
  }
  ierr = PetscLogEventEnd(VEC_AssemblyBegin,vec,0,0,0);CHKERRQ(ierr);
  ierr = PetscObjectStateIncrease((PetscObject)vec);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

  I cannot explain why the load balance would be 1.0 unless, by unlikely 
coincidence, on the 248 different calls to the function different processes are 
the ones waiting, so that the sum of the waits on each process matches across 
the 248 calls. Possible, but unlikely.


> I've added a barrier in the code.

   You don't need a barrier.  If you do not have a barrier, you should see all 
the "wait time" accumulate somewhere later in the code, at the next reduction 
after the VecAssemblyBegin/End.

  Barry



> 
> Here are the two log files.
> 
> On Thu, May 28, 2015 at 7:48 PM, Barry Smith <[email protected]> wrote:
> 
>    VecAssemblyBegin() serves as a barrier unless you set the vector option 
> VEC_IGNORE_OFF_PROC_ENTRIES, so I am not surprised that it "appears" to take a 
> lot of time. BUT the balance between the fastest and the slowest listed in 
> your table below is 1.0, which is very surprising; it indicates every process 
> supposedly spent the same amount of time within VecAssemblyBegin(). Note 
> that for VecAssemblyEnd() the balance is 2.3, which is what I would commonly 
> expect. Please send me ALL the output of -log_summary for these cases.  
> The version of PETSc shouldn't matter for this issue.
> 
> > On May 28, 2015, at 4:59 PM, Mark Adams <[email protected]> wrote:
> >
> > We are seeing some large times spent in VecAssemblyBegin:
> >
> > VecAssemblyBegin     242 1.0 7.9796e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.3e+02 12  0  0  0  5  76  0  0  0 10     0
> > VecAssemblyEnd       242 1.0 5.6624e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> >
> > This is with 64K cores on Edison.  On 128K cores (weak speedup) we see:
> >
> > VecAssemblyBegin     248 1.0 2.3615e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02 17  0  0  0  4  87  0  0  0 10     0
> > VecAssemblyEnd       248 1.0 6.8855e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> >
> > We are working on using older versions of PETSc to make sure this is a 
> > PETSc issue but does anyone have any thoughts on this?
> >
> > Mark
> 
> 
> <log_64K><log_128K>
