On Oct 8, 2020, at 11:14 AM, Hefty, Sean <[email protected]> wrote:
>
>
>> On Thu, Oct 08, 2020 at 05:45:31PM +0000, Hefty, Sean wrote:
>>>>> There have been extensive discussions on github around the MR cache,
>>>>> deadlocks, libibverbs madvise tracking, and fork. The current
>>>>> direction is to only enable the MR cache when fork is disabled.
>>>>> This was done to work-around internal libibverbs tracking. But I
>>>>> suspect that bypassing that tracking (which is possible) can still
>>>>> lead to issues when registrations are made through the MR cache.
>>>>
>>>> MADV_DONTFORK will be obsolete starting in kernel v5.9
>>>>
>>>> If you can test and confirm that everything works without it then we
>>>> can detect and disable ibv_fork_init on new kernels.
>>>
>>> Interesting. What will the behavior be for registered regions when fork is
>>> called?
>>
>> They are copied into the fork.
>>
>>> My concern is that the registrations are made and maintained without
>>> the application being aware. Will cached registrations need to be
>>> released when fork is invoked, or is there some other mechanism
>>> coming into play now?
>>
>> MRs continue to reliably point to memory owned in the parent
>> process.
>>
>> The child process will be unable to use any MRs or verbs objects, just
>> like today.
>
> Thanks - I think this means that fork becomes a non-issue.

We’ll have to figure out how to make some of these decisions at runtime. We need a fix for today’s world (even if we’re ok with it being a little more hacky, since it has a finite lifespan), and a way to know whether we need to run that hacky fix or not in the future. I don’t think our current hack around trying ibv_fork_init() will work in a world where rdma-core isn’t building the RB tree. So some way of exposing that new behavior out of rdma-core would, unfortunately, be helpful to Libfabric.

Brian
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
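
For illustration only, here is a minimal sketch of one way the runtime decision discussed above could be approximated: comparing the running kernel's release string against v5.9, the version named in the thread. The helper names `release_at_least` and `fork_hack_probably_unneeded` are hypothetical and are not part of rdma-core or libfabric. A version check is only a heuristic (a distribution kernel may carry backports, for example), which is part of why a signal exposed by rdma-core itself would be preferable.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper (not an rdma-core or libfabric API): parse a kernel
 * release string such as "5.9.0-rc1" and report whether it is at least
 * major.minor. Unparsable strings are treated as "older". */
static bool release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return false;
    return maj > major || (maj == major && min >= minor);
}

/* Hypothetical helper: true if the running kernel reports a release of
 * 5.9 or later. Assumption: the version number alone is a good enough
 * proxy for the new fork behavior, which may not hold everywhere. */
static bool fork_hack_probably_unneeded(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;   /* cannot tell: fall back to the conservative path */
    return release_at_least(u.release, 5, 9);
}
```

A caller could consult `fork_hack_probably_unneeded()` once at initialization and keep the existing workaround whenever it returns false, so that any failure to detect the kernel version falls back to today's behavior.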
