Hey,
> Then add on top of this the fact that you could simply recompile
> PETSc, run it natively on the card, and still run it on your
> CPU's as MPMD.
>
> This is a good way to get terrible performance.
>
> Why? Decompose your domain to take into account the imbalance in
> computational power. The link between the card and the CPU is going to
> be faster than going to another node.

I fully second Jed. Computational scientists are already struggling to get scalable performance on a 'standard' multi-core architecture, so I doubt that one can really obtain a gain on an accelerator architecture for any real-world application just by recompiling existing code. On top of that, there is the extra issue of PCI-Express latency.

Best regards,
Karli
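
P.S. To make the "decompose your domain to take into account the imbalance" suggestion from the quoted text concrete, here is a minimal sketch of what it amounts to. It splits a number of rows among MPI ranks in proportion to a per-rank throughput weight. The function name and the weights are hypothetical and chosen only for illustration; real weights would have to be measured. Note that this balances compute only and does nothing about the PCI-Express latency mentioned above.

```python
def weighted_row_counts(n_rows, weights):
    """Split n_rows among ranks in proportion to their relative throughput.

    `weights` holds one positive number per rank (hypothetical values,
    e.g. 1.0 for each CPU rank and a larger value for the accelerator rank).
    """
    total = sum(weights)
    # Ideal (fractional) share of rows for every rank.
    shares = [n_rows * w / total for w in weights]
    # Round down first, then hand out the leftover rows one by one
    # to the ranks with the largest fractional remainder.
    counts = [int(s) for s in shares]
    leftover = n_rows - sum(counts)
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: shares[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Four CPU ranks plus one accelerator rank assumed to be 4x as fast.
local_rows = weighted_row_counts(1000, [1, 1, 1, 1, 4])
```

The resulting per-rank counts could then be passed as the local sizes when creating the distributed vectors and matrices, instead of letting the library pick an even split.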
