> On Sept. 6, 2018, 6:18 p.m., James Peach wrote: > > 3rdparty/libprocess/src/posix/subprocess.hpp > > Lines 195 (patched) > > <https://reviews.apache.org/r/68644/diff/1/?file=2082817#file2082817line195> > > > > We need to be careful here. We are in an async-signal-safe context but > > hashmap and list allocate memory. > > > > Actually iterating over the directory is difficult to do in an > > async-signal-safe context. You can open the directory in the parent and do > > the iteration in the child, but I don't think there's any guarantee that > > the readdir in the child is safe (though AFAIK in glibc it would work). > > Qian Zhang wrote: > Good catch! > > Yeah, we could do `os::lsof()` in the parent (e.g., right before we fork > the child: > https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/subprocess.hpp#L261) > , but what if the parent opens a new fd after `os::lsof()` is called but > before the child is forked? In such case, the new fd will be leaked to the > child. > > And it seems `opendir`, `readdir` and `closedir` are not > async-signal-safe according to > http://man7.org/linux/man-pages/man7/signal-safety.7.html > > Maybe we should do something like this: > https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L257:L303 > , but I see they call `getdents64` which seems not async-signal-safe too.
> Yeah, we could do os::lsof() in the parent (e.g., right before we fork the > child: > https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/subprocess.hpp#L261) > , but what if the parent opens a new fd after os::lsof() is called but > before the child is forked? In such case, the new fd will be leaked to the > child. Yes, I agree that we should try to avoid this race. > And it seems opendir, readdir and closedir are not async-signal-safe > according to > http://man7.org/linux/man-pages/man7/signal-safety.7.html Yup. I believe that this is OK in practice on Linux; at least I see this approach being used in real code. > Maybe we should do something like this: > https://github.com/python/cpython/blob/master/Modules/_posixsubprocess.c#L257:L303 > , but I see they call getdents64 which seems not async-signal-safe too. The two approaches that I can find in common use for Linux are to scan the `fd` directory or to just close everything up to `getdtablesize`. The former is a bit less portable, and the latter probably has a small performance cost. The cpython code is probably the best implementation of the scanning approach that I've seen. I don't know of any way to implement the scanning approach that is *strictly* async-signal-safe. A slightly different approach would be to exec a helper tool that does the closing. This would have a performance drawback (cost of linking libmesos at startup), and possibly compatibility issues. - James ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68644/#review208417 ----------------------------------------------------------- On Sept. 6, 2018, 1:25 a.m., Qian Zhang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68644/ > ----------------------------------------------------------- > > (Updated Sept. 6, 2018, 1:25 a.m.) > > > Review request for mesos, Gilbert Song and James Peach. > > > Bugs: MESOS-9152 > https://issues.apache.org/jira/browse/MESOS-9152 > > > Repository: mesos > > > Description > ------- > > Closed all file descriptors except `whitelist_fds` in posix/subprocess. > > > Diffs > ----- > > 3rdparty/libprocess/src/posix/subprocess.hpp > 007058b61fdcd4716aa793516c842c3cef8c0a29 > 3rdparty/libprocess/src/subprocess.cpp > c0640de2dc4278b884282dfaad98c49c3b067a5b > > > Diff: https://reviews.apache.org/r/68644/diff/1/ > > > Testing > ------- > > > Thanks, > > Qian Zhang > >
