On Fri, 11 Dec 2009 18:24:58 +0800, Wee-Beng Tay <zonexo at gmail.com> wrote:
> But you mention abt latency, so shouldn't minimizing the number of
> neighbor processes reduce latency and improve performance?

Perhaps, depending on your network.  But there are many tricks to hide
the latency of ghost updates; global reductions (in dot products) are
harder, especially since MPI collectives are synchronous.  The higher
iteration counts are far more painful than a marginally higher update
cost.

> For both do u mean dividing 1 big grid into 4 55x35 grids?

Yes, instead of 4 thin slices, and so on as you refine.  DA does this
automatically; just don't choose a prime number of processes (because
then it would be forced into slices).

> so whichever method I use (horizontal or vertical) doesn't matter? But
> splitting to 4 55x35 grids will be better?

Trying to send directly from some contiguous array is not a worthwhile
optimization.  My comment about latency was to guard against another
"optimization": sending some components of a vector problem separately
when not all components "need" updating (it is likely faster to do one
update of 5 values per ghost node than two separate updates of 1 value
per node).  Splitting into 4 subdomains isn't "better" than 2
subdomains, but when using many subdomains, they should not be thin
slices.  DA manages all of this.  If you have some compelling reason
*not* to use DA, you won't go far wrong by copying the design decisions
in DA.

Jed