Matt, I mixed things up a bit: I sent you the number of sub-iterations done with preonly. That run looks faster, but the outer iterations go up from 4 (with gmres as sub KSP) to 399, while the run time goes from 37s to 10s. I managed to cut the run time a bit by dropping the relative tolerance.
Looking at the iterations done by the sub gmres, they are around 20 for the U block and around 5 for the P block, so nothing too crazy. What do you think?
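
For a rough sense of the trade-off, here is a back-of-the-envelope sketch in Python that uses only the numbers quoted above. It assumes the whole run time is spent in outer iterations (setup cost is ignored), so it is an estimate rather than a measurement, and the names are purely illustrative.

```python
# Outer iteration counts and run times as quoted in the message.
runs = {
    "gmres sub KSP": {"outer_iters": 4, "runtime_s": 37.0},
    "preonly sub KSP": {"outer_iters": 399, "runtime_s": 10.0},
}


def seconds_per_outer_iter(run):
    """Average wall time per outer iteration (assumes setup time is negligible)."""
    return run["runtime_s"] / run["outer_iters"]


for name, run in runs.items():
    print(f"{name}: {seconds_per_outer_iter(run):.3f} s per outer iteration")
```

Under that assumption, each outer iteration with gmres sub-solves costs a few hundred times more than one with preonly, which is why the preonly run can still be faster overall despite needing 399 outer iterations.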
