Mark Adams <[email protected]> writes:

> I am suspecting that it is catching load imbalance and just not reporting
> it correctly. I've added a barrier in the code.
>
> Here are the two log files.

Mark, VecStashScatterBegin_Private has always had an algorithm that is
O(n*p) in the worst case (n stashed entries, p processes):

  for (i=0; i<stash->n; i++) {
    /* if indices are NOT locally sorted, need to start search at the beginning */
    if (lastidx > (idx = stash->idx[i])) j = 0;
    lastidx = idx;
    for (; j<size; j++) {
      if (idx >= owners[j] && idx < owners[j+1]) {
        nprocs[2*j]++; nprocs[2*j+1] = 1; owner[i] = j; break;
      }
    }
  }

The branch jed/mat-assembly-perf has a scalable implementation.  Can you
try it (either in that branch or in 'next')?
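
For illustration only, here is a minimal sketch of one way to avoid the
linear scan: because owners[] is sorted, each stashed index can be located
by bisection, which gives O(n log p) even when the indices are unsorted.
The helper name find_owner is hypothetical, and this is not necessarily
what the branch does.

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc function): return the rank j such that
   owners[j] <= idx < owners[j+1], or -1 if idx lies outside the global range.
   Assumes owners[] holds size+1 nondecreasing entries. */
static int find_owner(const int *owners, int size, int idx)
{
  int lo = 0, hi = size;

  assert(size > 0);
  if (idx < owners[0] || idx >= owners[size]) return -1;

  /* Invariant: owners[lo] <= idx < owners[hi] */
  while (hi - lo > 1) {
    int mid = lo + (hi - lo) / 2;
    if (idx < owners[mid]) hi = mid;
    else                   lo = mid;
  }
  return lo; /* ranks with an empty range are skipped automatically */
}
```

In the loop above, the inner "for (; j<size; j++)" search would become a
single call such as j = find_owner(owners, size, idx), with no dependence
on the previous index.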
