I just pushed some fixes for the issue in this thread. You will typically notice somewhat different performance with multiple blocks per process, because the blocks then come from a different partition. In some special cases, you can reproduce the partition through appropriate use of -mat_partitioning_type. For example:

$ mpirun -n 7 ./ex2 -pc_type asm -pc_asm_blocks 7
Norm of error 0.000458307 iterations 7
$ mpirun -n 1 ./ex2 -pc_type asm -pc_asm_blocks 7 # parmetis for me
Norm of error 0.000180024 iterations 8
$ mpirun -n 1 ./ex2 -pc_type asm -pc_asm_blocks 7 -mat_partitioning_type square
Norm of error 0.000458307 iterations 7

Jed
