Mark Adams <[email protected]> writes:

> Thanks, this code does not have any communication in VecAssembly so I added
> the IGNORE stuff and added a barrier before this section of code. I am
> suspecting that VecAssembly is catching load imbalance but not reporting it
> for some reason.

If you put a barrier before it, the operation starts at about the same time on all processes. It consists of synchronization followed by (scalable) point-to-point messaging. The slowness is almost certainly caused by the non-scalable algorithm -- I've seen it with other applications. So try my branch.
