There have been extensive discussions on GitHub around the MR cache, deadlocks,
libibverbs madvise tracking, and fork.  The current direction is to only enable
the MR cache when fork is disabled.  This was done to work around internal
libibverbs tracking.  But I suspect that bypassing that tracking (which is 
possible) can still lead to issues when registrations are made through the MR 
cache.

However, madvise(MADV_DONTFORK) only *needs* to be called:

- immediately prior to calling fork()
- on memory registrations actively in use

Currently, if the app *might* call fork(), madvise() is called as part of every 
memory registration/deregistration.  This has a negative impact on performance.
If we can defer calling madvise() until it is needed, then enabling fork()
support for all apps would be possible, without impacting apps that don't call 
it.  Additionally, even if apps call fork(), we may be able to avoid calling 
madvise() for every registration.
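
For reference, the per-registration cost being discussed is roughly one
madvise() call over the page-aligned range covering the buffer.  A minimal
sketch, assuming Linux; mr_set_fork_advice() is a hypothetical helper name
used only for illustration, not a libfabric or libibverbs function:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply fork advice (MADV_DONTFORK or MADV_DOFORK)
 * to every page touched by [addr, addr + len).  madvise() requires a
 * page-aligned start address, so the range is widened to page boundaries.
 */
static int mr_set_fork_advice(void *addr, size_t len, int advice)
{
    uintptr_t page = (uintptr_t) sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t) addr & ~(page - 1);
    uintptr_t end = ((uintptr_t) addr + len + page - 1) & ~(page - 1);

    return madvise((void *) start, end - start, advice);
}
```

Deferring this call means it runs only when fork() is actually about to
happen, instead of on every registration and deregistration.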

To do this, we need:

- the ability to intercept fork()
- the ability to call madvise() from the intercept routine.

The first might be possible by using the memhooks mechanism.  I don't know if 
there would be an issue with the second.
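
As a rough illustration of the two requirements, the sketch below uses
pthread_atfork() as a stand-in for the intercept.  This is not the memhooks
mechanism; it is a standard POSIX hook swapped in for illustration, and it
only fires for fork() calls that go through libc.  The names
install_fork_intercept(), fork_prepare(), and the single tracked region are
all hypothetical:

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical state: one page-aligned region standing in for an MR. */
static void *tracked_addr;
static size_t tracked_len;
static int prepare_calls;   /* counts how often the intercept ran */

/* Runs in the parent immediately before fork() creates the child. */
static void fork_prepare(void)
{
    prepare_calls++;
    if (tracked_addr)
        (void) madvise(tracked_addr, tracked_len, MADV_DONTFORK);
}

/* Record the region and register the prepare handler. */
static int install_fork_intercept(void *page_aligned_addr, size_t len)
{
    tracked_addr = page_aligned_addr;
    tracked_len = len;
    return pthread_atfork(fork_prepare, NULL, NULL);
}
```

Whether calling madvise() from an intercept installed through memhooks is
safe remains the open question; this sketch only shows the shape of the idea.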

Assuming the above works, when fork() is called, cached registrations not in
use can simply be flushed.  Registrations with a use_cnt > 0 need
madvise(MADV_DONTFORK) called on them.  Those registrations can be flushed
once their use_cnt reaches 0, with madvise(MADV_DOFORK) called to restore
normal fork behavior for that memory.
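
The handling described above could look something like the following.  This
is a toy sketch, not the actual MR cache code: struct mr_entry, mr_cache[],
mr_flush(), mr_cache_before_fork(), and mr_release() are hypothetical names,
entries are assumed to be page-aligned, and real deregistration is left out:

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#define CACHE_SLOTS 8

/* Hypothetical cache entry; addr/len are assumed page-aligned. */
struct mr_entry {
    void   *addr;
    size_t  len;
    int     use_cnt;    /* active users of this registration */
    bool    valid;      /* entry currently holds a registration */
    bool    dontfork;   /* MADV_DONTFORK has been applied */
};

static struct mr_entry mr_cache[CACHE_SLOTS];

/* Drop an entry; a real cache would deregister the memory here. */
static void mr_flush(struct mr_entry *e)
{
    if (e->dontfork)
        (void) madvise(e->addr, e->len, MADV_DOFORK);
    e->valid = false;
    e->dontfork = false;
}

/* Called from the fork() intercept, before the child is created. */
static void mr_cache_before_fork(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct mr_entry *e = &mr_cache[i];

        if (!e->valid)
            continue;
        if (e->use_cnt == 0)
            mr_flush(e);                    /* idle: just flush it */
        else if (madvise(e->addr, e->len, MADV_DONTFORK) == 0)
            e->dontfork = true;             /* in use: protect it */
    }
}

/* Called when a user is done with a registration. */
static void mr_release(struct mr_entry *e)
{
    if (--e->use_cnt == 0 && e->dontfork)
        mr_flush(e);    /* marked at fork time: flush, re-enable fork */
}
```

Entries that were never marked stay cached after release, so apps that do
not call fork() would see no extra madvise() calls.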

Without the ability to intercept fork(), I don't see a way to enable the MR 
cache and also support fork().  Caching a registration and marking the memory 
with madvise(MADV_DONTFORK) has the potential to hide data from the forked process
that an application might expect to find.
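
The hidden-data problem can be demonstrated with a small experiment.  A
minimal sketch, assuming Linux; child_sees_page() is a hypothetical helper
that forks a child and reports whether the child could read a page written
by the parent:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a forked child can read the parent's data, 0 if the child
 * could not (it is killed by a fault on the missing mapping), and -1 on
 * a setup error.
 */
static int child_sees_page(int mark_dontfork)
{
    size_t len = (size_t) sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    volatile char *buf = mem;
    int status;
    pid_t pid;

    if (mem == MAP_FAILED)
        return -1;
    buf[0] = 'x';   /* data the app might expect the child to see */

    if (mark_dontfork && madvise(mem, len, MADV_DONTFORK) != 0) {
        munmap(mem, len);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        munmap(mem, len);
        return -1;
    }
    if (pid == 0)
        _exit(buf[0] == 'x' ? 0 : 1);   /* faults if the page is absent */

    if (waitpid(pid, &status, 0) != pid) {
        munmap(mem, len);
        return -1;
    }
    munmap(mem, len);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With mark_dontfork set, the range is simply not mapped in the child, so any
access there faults instead of returning the parent's data.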

- Sean
_______________________________________________
ofiwg mailing list
[email protected]
https://lists.openfabrics.org/mailman/listinfo/ofiwg
