Thanks. This code does not have any communication in VecAssembly, so I added the IGNORE stuff and put a barrier before this section of code. I suspect that VecAssembly is catching load imbalance but, for some reason, not reporting it.
On Fri, May 29, 2015 at 10:54 AM, Jed Brown <[email protected]> wrote:

> Mark Adams <[email protected]> writes:
>
> > I am suspecting that it is catching load imbalance and just not reporting
> > it correctly. I've added a barrier in the code.
> >
> > Here are the two log files.
>
> Mark, there has always been a worst-case O(n*p) algorithm in
> VecStashScatterBegin_Private:
>
>   for (i=0; i<stash->n; i++) {
>     /* if indices are NOT locally sorted, need to start search at the beginning */
>     if (lastidx > (idx = stash->idx[i])) j = 0;
>     lastidx = idx;
>     for (; j<size; j++) {
>       if (idx >= owners[j] && idx < owners[j+1]) {
>         nprocs[2*j]++; nprocs[2*j+1] = 1; owner[i] = j; break;
>       }
>     }
>   }
>
> The branch jed/mat-assembly-perf has a scalable implementation. Can you
> try it (either in that branch or in 'next')?
>
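
For illustration only: the quoted loop scans the ranks linearly and restarts from rank 0 whenever the stash indices are not sorted, which is where the worst-case O(n*p) cost comes from. One generic way to bound the lookup is a binary search over the ownership ranges, giving O(n log p) regardless of index order. The sketch below is an assumption about how that could look, not a description of what jed/mat-assembly-perf actually does. The helper names find_owner_bsearch and count_owners are hypothetical, and plain int stands in for PetscInt. The arrays owners (size+1 entries), owner and nprocs follow the layout used in the quoted loop.

```c
#include <stddef.h>

/* Hypothetical helper (not a PETSc function): return the rank r such that
   owners[r] <= idx < owners[r+1], or -1 if idx is outside [owners[0], owners[size]).
   owners must hold size+1 nondecreasing entries; empty ranks are allowed. */
int find_owner_bsearch(const int *owners, int size, int idx)
{
  int lo = 0, hi = size;
  if (size <= 0 || idx < owners[0] || idx >= owners[size]) return -1;
  /* Invariant: owners[lo] <= idx < owners[hi] */
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (idx < owners[mid]) hi = mid;
    else lo = mid;
  }
  return lo;
}

/* Hypothetical driver mirroring the bookkeeping of the quoted loop:
   nprocs[2*r] counts the entries destined for rank r, nprocs[2*r+1] flags
   that rank r receives something, owner[i] records the owning rank.
   nprocs must be zero-initialized by the caller (2*size entries). */
void count_owners(const int *owners, int size, const int *idx, int n,
                  int *owner, int *nprocs)
{
  for (int i = 0; i < n; i++) {
    int r = find_owner_bsearch(owners, size, idx[i]);
    owner[i] = r;
    if (r >= 0) {
      nprocs[2 * r]++;
      nprocs[2 * r + 1] = 1;
    }
  }
}
```

Unlike the quoted loop, this does not depend on the indices being locally sorted, so the cost per entry stays at O(log p) even for adversarial orderings.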
