On Sat, 5 Dec 2009 16:02:38 -0600, Matthew Knepley <knepley at gmail.com> wrote:
> I need to understand better. You are asking about the case where we have
> many GPUs and one CPU? If its always one or two GPUs per CPU I do not
> see the problem.

Barry initially proposed one Python thread per node, then distributing
the kernels over many CPU cores on that node, or to one or more GPUs.
With some abuse of terminology, let's call them all worker threads,
perhaps dozens if running on multicore CPUs, or hundreds/thousands when
on a GPU.  The physics, such as FEM integration, has to be done by those
worker threads.  But unless every thread is its own subdomain
(i.e. Block Jacobi/ASM with very small subdomains), we still need to
assemble a small number of matrices per node.  So we would need a
lock-free concurrent MatSetValues, otherwise we'll only scale to a few
worker threads before everything is blocked on MatSetValues.

> Hmm, still not quite getting this problem. We need concurrency on the
> GPU, but why would we need it on the CPU?

Only if we were doing real work on the many CPU cores per node.

> On the GPU, triangular solve will be just as crappy as it currently
> is, but will look even worse due to large number of cores.

It could be worse because a single GPU thread is likely slower than a
CPU core.

> It is not the only smoother. For instance, polynomial smoothers would
> be more concurrent.

Yup.

> > I have trouble finding decent preconditioning algorithms suitable for
> > the fine granularity of GPUs.  Matt thinks we can get rid of all the
> > crappy sparse matrix kernels and precondition everything with FMM.
> >
>
> That is definitely my view, or at least my goal. And I would say this,
> if we are just starting out on these things, I think it makes sense to
> do the home runs first. If we just try and reproduce things, people
> might say "That is nice, but I can already do that pretty well".

Agreed, but it's also important to have something good to offer people
who aren't ready to throw out everything they know and design a new
algorithm based on a radically different approach that may or may not be
any good for their physics.

Jed
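
To make the "lock-free concurrent MatSetValues" point above concrete,
here is a minimal sketch of what such an insertion could look like on a
matrix whose sparsity pattern is fixed in advance.  Nothing here is
PETSc API: CsrMatrix, add, and assemble_laplacian are hypothetical names
made up for illustration, and the 1D Laplacian stands in for real FEM
integration.  Each worker thread adds its element contributions with a
compare-and-swap retry loop, so no thread ever holds a lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical CSR matrix with a preallocated, fixed sparsity pattern.
struct CsrMatrix {
  std::vector<int> row_ptr;                  // row i occupies [row_ptr[i], row_ptr[i+1])
  std::vector<int> col_idx;                  // column index of each stored entry
  std::vector<std::atomic<double>> values;   // one atomic per stored entry

  CsrMatrix(std::vector<int> rp, std::vector<int> ci)
      : row_ptr(std::move(rp)), col_idx(std::move(ci)), values(col_idx.size()) {
    for (auto& v : values) v.store(0.0);
  }

  // Lock-free A(row, col) += v: retry the compare-exchange until no other
  // thread has modified the entry between our read and our write.
  void add(int row, int col, double v) {
    for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
      if (col_idx[k] == col) {
        double old = values[k].load(std::memory_order_relaxed);
        while (!values[k].compare_exchange_weak(old, old + v,
                                                std::memory_order_relaxed)) {
          // On failure, 'old' now holds the current value; loop and retry.
        }
        return;
      }
    }
    assert(false && "entry is outside the preallocated pattern");
  }
};

// Assemble the 1D Laplacian on n nodes from n-1 two-node elements, with the
// elements dealt round-robin to num_threads workers.  Neighbouring elements
// share a node, so different threads contend on the same diagonal entries.
std::vector<double> assemble_laplacian(int n, int num_threads) {
  std::vector<int> row_ptr{0}, col_idx;
  for (int i = 0; i < n; ++i) {              // tridiagonal pattern
    for (int j = i - 1; j <= i + 1; ++j)
      if (j >= 0 && j < n) col_idx.push_back(j);
    row_ptr.push_back(static_cast<int>(col_idx.size()));
  }
  CsrMatrix A(std::move(row_ptr), std::move(col_idx));

  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&A, n, t, num_threads] {
      for (int e = t; e < n - 1; e += num_threads) {
        // Element matrix [[1, -1], [-1, 1]] on nodes e and e+1.
        A.add(e, e, 1.0);
        A.add(e, e + 1, -1.0);
        A.add(e + 1, e, -1.0);
        A.add(e + 1, e + 1, 1.0);
      }
    });
  }
  for (auto& w : workers) w.join();

  std::vector<double> out;                   // copy out as plain doubles
  for (auto& v : A.values) out.push_back(v.load());
  return out;
}
```

The sketch sidesteps the hard parts: it assumes the pattern is already
known, so no thread ever has to allocate, and it accepts that
floating-point sums depend on the order in which threads arrive (the
integer-valued entries here hide that).  On a GPU the retry loop would
typically be replaced by a hardware atomic add where one is available.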