VecAssemblyBegin() acts as a barrier unless you set the vector option VEC_IGNORE_OFF_PROC_ENTRIES (with VecSetOption()), so I am not surprised that it "appears" to take a lot of time. BUT the balance between the fastest and slowest process listed in your table below is 1.0, which is very surprising: it indicates that every process supposedly spent the same amount of time within VecAssemblyBegin(). Note that for VecAssemblyEnd() the balance is 2.3, which is what I would commonly expect. Please send me ALL the output from -log_summary for these cases. The version of PETSc shouldn't matter for this issue.
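To make the balance figure concrete, here is a small Python sketch that reads the leading columns of the two 64K-core rows quoted below and estimates the fastest process's time. It assumes the usual -log_summary layout (event name, count, count ratio, max time, time ratio) and that the time ratio is max/min over processes. The helper names are made up for illustration and are not part of PETSc, and since the ratio is printed to one decimal place the result is only an estimate.

```python
# Leading columns of the 64K-core rows from the quoted message.
ROWS = [
    "VecAssemblyBegin 242 1.0 7.9796e+01 1.0",
    "VecAssemblyEnd 242 1.0 5.6624e-04 2.3",
]


def parse_event(line):
    """Return (name, count, max_time, time_ratio) from a log row.

    Assumed column order: name, count, count ratio, max time, time ratio.
    """
    fields = line.split()
    return fields[0], int(fields[1]), float(fields[3]), float(fields[4])


def estimated_min_time(max_time, ratio):
    """Estimate the fastest process's time, assuming ratio = max / min."""
    return max_time / ratio


if __name__ == "__main__":
    for row in ROWS:
        name, count, tmax, ratio = parse_event(row)
        tmin = estimated_min_time(tmax, ratio)
        print(f"{name}: max {tmax:.4e} s, estimated min {tmin:.4e} s")
```

With a ratio of 1.0 the estimated minimum equals the maximum, which is the surprising part: a barrier normally shows up as imbalance, because the processes that arrive early wait for the ones that arrive late.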

> On May 28, 2015, at 4:59 PM, Mark Adams <[email protected]> wrote:
>
> We are seeing some large times spent in VecAssemblyBegin:
>
> VecAssemblyBegin     242 1.0 7.9796e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.3e+02 12  0  0  0  5  76  0  0  0 10     0
> VecAssemblyEnd       242 1.0 5.6624e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> This is with 64K cores on Edison. On 128K cores (weak speedup) we see:
>
> VecAssemblyBegin     248 1.0 2.3615e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.4e+02 17  0  0  0  4  87  0  0  0 10     0
> VecAssemblyEnd       248 1.0 6.8855e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> We are working on using older versions of PETSc to make sure this is a PETSc
> issue but does anyone have any thoughts on this?
>
> Mark
