Dammit, didn't reply-all. Sorry.

On Fri, Jan 21, 2022 at 8:52 AM Paul T. Bauman <ptbau...@gmail.com> wrote:

> On Fri, Jan 21, 2022 at 8:37 AM Jed Brown <j...@jedbrown.org> wrote:
>
>> "Paul T. Bauman" <ptbau...@gmail.com> writes:
>>
>> > On Fri, Jan 21, 2022 at 8:19 AM Jed Brown <j...@jedbrown.org> wrote:
>> >
>> >> Mark Adams <mfad...@lbl.gov> writes:
>> >>
>> >> > Two questions about hypre on HIP:
>> >> >
>> >> > * I am doing this now. Is this correct?
>> >> >
>> >> > '--download-hypre',
>> >> > '--download-hypre-configure-arguments=--enable-unified-memory',
>> >>
>> >
>> > Apologies for interjecting, but I want to point out here that a
>> > pretty good chunk of BoomerAMG is ported to the GPU and you may not
>> > need this unified-memory option. I point this out because you will
>> > get substantially better performance without this option, i.e.
>> > using "native" GPU memory. I do not know the intricacies of the
>> > PETSc/HYPRE/GPU interaction so maybe PETSc won't handle the
>> > CPU->GPU memcopies for you (I'm assuming vecs, mats are assembled
>> > on the CPU) in which case you might need the option. And if you do
>> > run into code paths in BoomerAMG that are not ported to the GPU and
>> > you want to use them, I'd be very interested to know what the
>> > options are that are missing a GPU port.
>>
>> We have matrices and vectors assembled on the device and logic to pass
>> the device data to Hypre. Stefano knows the details.
>>
>> Will the option --enable-unified-memory hurt performance if we provide
>> all data on the device?
>>
>
> Yes. The way HYPRE's memory model is set up is that ALL GPU allocations
> are "native" (i.e. [cuda,hip]Malloc) or, if unified memory is enabled,
> then ALL GPU allocations are unified memory (i.e.
> [cuda,hip]MallocManaged). Regarding HIP, there is an HMM implementation
> of hipMallocManaged planned, but it is not yet delivered AFAIK (and it
> will *not* support gfx906, e.g. RVII, FYI), so, today, under the covers,
> hipMallocManaged is calling hipHostMalloc. So, today, all your unified
> memory allocations in HYPRE on HIP are doing CPU-pinned memory accesses.
> And performance is just truly terrible (as you might expect).
>
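
For reference, a minimal sketch of the configure choice discussed above, written in the same Python-list form as Mark's snippet. The helper name hypre_configure_options and its unified_memory flag are hypothetical, made up for illustration and not part of PETSc or hypre; only the two option strings come from this thread. The default leaves the unified-memory argument out, which per the reply above gives "native" GPU allocations.

```python
# Hypothetical helper (not part of PETSc or hypre): assembles the hypre-related
# entries of a PETSc configure option list, using only the two option strings
# quoted in this thread.

def hypre_configure_options(unified_memory=False):
    """Return the hypre-related configure options as a list of strings.

    unified_memory=False omits --enable-unified-memory, so hypre is built
    with its default "native" device allocations. Setting it to True
    reproduces the option list quoted earlier in the thread.
    """
    opts = ['--download-hypre']
    if unified_memory:
        # Per the discussion above, this switches ALL hypre GPU allocations
        # to managed memory, so it is opt-in here.
        opts.append('--download-hypre-configure-arguments=--enable-unified-memory')
    return opts


if __name__ == '__main__':
    # Print the native-memory option list, one entry per line.
    for opt in hypre_configure_options():
        print(opt)
```

Whether the option can actually be dropped depends on every BoomerAMG code path in use having a GPU port, which is the open question in the thread.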