On Jun 10, 2012, at 5:54 PM, Mark F. Adams wrote:
> Barry,
>
> Your algorithm is the classic CS algorithm. My algorithms exploits the (low
> dimensional) geometric nature of our graphs. Assuming you have a decent
> partitioning, as Jed says, you have, for instance, "interior" vertices that
> have no external dependence so coloring them is silly.
Sure absolutely in the MPI world. If you are going for vectorization it may
not be silly.
> The algorithm enforces only the (external) data dependencies that are
> required for correctness, which for normal PDE problems results in a far
> shorter dependency depth than generic coloring and leaves you some freedom to
> reorder for performance.
>
> From a theoretically point of view there is actually a coloring in the
> algorithm but it is not obvious. For instance, if there is only one vertex
> per process then it basically reverts to a coloring algorithm (although it is
> still asynchronous) where the processor IDs (in the way that I implement it
> for simplicity) are the "random" numbers that one can use for an MIS or
> coloring.
I think it would be useful to understand the algorithm exactly in terms of
"block colorings?", "dependency graphs", ..... etc?
Barry
>
> Mark
>
> On Jun 10, 2012, at 1:36 PM, Jed Brown wrote:
>
>> On Sun, Jun 10, 2012 at 12:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> For clarification it sounds like you color your matrix (with independent
>> sets) and then march through the colors, each color gets smoothed (in
>> parallel since they are independent) then on to the next color?
>>
>> Not independent sets, just by data dependency. For example, everyone relaxes
>> a block along the right boundary (implying an ordering that visits those
>> blocks in rank order, except that those strips never touch each other so
>> they can be done in parallel), then send the updated state to the proc on
>> the right. While waiting for the message, he might smooth another set that
>> does not have a data dependency. When a process receives the first message
>> (from the proc to the left), it can relax that part. There is a picture in
>> Mark's paper that should make the algorithm clear. This approach doesn't
>> suffer from the atrocious memory access pattern of classical coloring; it's
>> also asynchronous and completes with fewer messages in total.
>