Mark Adams <[email protected]> writes:

> I am suspecting that it is catching load imbalance and just not reporting
> it correctly. I've added a barrier in the code.
>
> Here are the two log files.

Mark, there has always been a worst-case O(n*p) algorithm in
VecStashScatterBegin_Private:

  for (i=0; i<stash->n; i++) {
    /* if indices are NOT locally sorted, need to start search at the beginning */
    if (lastidx > (idx = stash->idx[i])) j = 0;
    lastidx = idx;
    for (; j<size; j++) {
      if (idx >= owners[j] && idx < owners[j+1]) {
        nprocs[2*j]++; nprocs[2*j+1] = 1; owner[i] = j; break;
      }
    }
  }

The branch jed/mat-assembly-perf has a scalable implementation. Can you
try it (either in that branch or in 'next')?
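
To illustrate why the loop above degrades: with n stashed entries and p ranks, every out-of-order index restarts the linear scan at rank 0, so unsorted input costs up to n*p comparisons. The following is only a minimal sketch of one way to avoid that, a binary search over the ownership ranges giving O(n log p) regardless of ordering. It is not the code in jed/mat-assembly-perf, and the names find_owner and assign_owners are hypothetical helpers made up for this example. It assumes owners has size+1 nondecreasing entries and that nprocs is zero-initialized by the caller.

```c
#include <stddef.h>

/* Hypothetical helper (not a PETSc function): return the rank r such that
   owners[r] <= idx < owners[r+1], or -1 if idx lies outside
   [owners[0], owners[size]).  owners must have size+1 nondecreasing entries. */
static int find_owner(const int *owners, int size, int idx)
{
  int lo = 0, hi = size;

  if (size <= 0 || idx < owners[0] || idx >= owners[size]) return -1;
  /* Invariant: owners[lo] <= idx < owners[hi] */
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (idx < owners[mid]) hi = mid;
    else lo = mid;
  }
  return lo; /* empty ranges (owners[r] == owners[r+1]) are never returned */
}

/* Hypothetical driver mirroring the quoted loop: record the owner of each
   index and the per-rank counts, without depending on the order of idx[].
   nprocs must hold 2*size entries, zeroed by the caller. */
static void assign_owners(const int *idx, int n, const int *owners, int size,
                          int *nprocs, int *owner)
{
  int i;

  for (i = 0; i < n; i++) {
    int j = find_owner(owners, size, idx[i]);
    owner[i] = j;
    if (j >= 0) {
      nprocs[2*j]++;       /* number of entries destined for rank j */
      nprocs[2*j+1] = 1;   /* flag: a message to rank j is needed */
    }
  }
}
```

When the indices are already sorted, the original resumable scan is O(n + p) and can beat the binary search, so this sketch only addresses the unsorted worst case.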
