Stefano Zampini <stefano.zamp...@gmail.com> writes:

> You should swap fieldsplit and ASM
>
> -pc_type fieldsplit
> -fieldsplit_0_pc_type asm

Note that putting fieldsplit on the outside incurs separate communication for each split. If you nest them the other way (ASM outside, fieldsplit inside each subdomain), there is one heavy communication followed by a bunch of purely local work. The latency difference may not matter as much when you're already launching GPU kernels for the local work, though communication to/from device memory is also expensive.
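
For concreteness, the two nestings would look roughly like the options below. This is only a sketch: the split names 0 and 1 assume two unnamed splits (yours may be named after fields), ilu is just a placeholder for whatever local solver you want, and the ASM-outside variant assumes the subdomain fieldsplit can find its field definitions, which depends on how your application defines the splits.

```
# fieldsplit outside, ASM inside: one overlap exchange per split
-pc_type fieldsplit
-fieldsplit_0_pc_type asm
-fieldsplit_0_sub_pc_type ilu
-fieldsplit_1_pc_type asm
-fieldsplit_1_sub_pc_type ilu

# ASM outside, fieldsplit inside: one overlap exchange, then local work
-pc_type asm
-sub_pc_type fieldsplit
-sub_fieldsplit_0_pc_type ilu
-sub_fieldsplit_1_pc_type ilu
```

Running with -ksp_view shows which nesting was actually built, and -log_view lets you compare the scatter and message counts between the two.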