On Sun, 2018-02-25 at 13:22 -0500, Bill Torpey wrote:
> Hi Francesco:
>
> A few more points below …
>
> Good luck, and please post back if you find out anything interesting!
>
> Regards,
>
> Bill
>
> > On Feb 25, 2018, at 4:54 AM, Francesco <[email protected]> wrote:
> >
> > Hi Bill,
> > thanks for your answer.
> >
> > 2018-02-24 21:49 GMT+01:00 Bill Torpey <[email protected]>:
> > ...
> > If T2 and T4 were application code, this would be a clear violation
> > of ZMQ’s threading rules (assuming “legacy”, non-thread-safe socket
> > types).
> >
> > Right.
> >
> > For instance, one technique would be to perform initialization
> > (e.g., bind’s) in the main thread, and only after that is done spin
> > up other threads to do processing. In this case, TSAN wouldn't
> > have any way to know that the application guarantees that accesses
> > cannot result in a race, so TSAN would flag it. I’ve gotten in the
> > habit of using mutexes to protect code like this even though it
> > should not strictly be needed, just to keep tools like TSAN happy,
> > and also because I don’t know the ZMQ code well enough to be 100%
> > certain that the mutexes are not necessary - better safe than
> > sorry!
> >
> > Yeah, that's a possibility but it results in a lot of "clutter"
> > that decreases code readability and makes it harder to maintain in
> > the long run...
> >
> > This situation is different, though, since T4 is not an application
> > thread - it’s an internal ZMQ worker thread. So, I think in this
> > case we kind of have to accept that ZMQ is doing the right thing
> > here.
> >
> > At least, that’s the approach I’ve been taking. When I instrument
> > my apps and libraries with TSAN I specifically do NOT instrument
> > ZMQ, and I also use the “called_from_lib:libzmq.so” suppression
> > (which is listed as an example for TSAN:
> > https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions).
> >
> > Understood. I have a question though: if you use the
> > “called_from_lib:libzmq.so” suppression, are you able to spot the
> > race condition due to T2 and T4 being application threads (instead
> > of being 1 application and 1 zmq internal)?
> > I wonder if TSAN, detecting that one of the 2 threads generating
> > the data race is inside ZMQ, entirely suppresses the race warning or
> > instead will suppress only race conditions involving 2 internal zmq
> > threads...
>
> Good question. I could only find one post that discusses this
> suppression:
> https://groups.google.com/forum/#!topic/thread-sanitizer/NEcgiPEG0N8
>
>     called_from_lib suppresses only interceptors (like read or
>     memset) called directly from the lib. It's intended for
>     non-instrumented libraries.
>
> However, when I try this with my test code, enabling the suppression
> actually increases the number of false positives reported by
> TSAN. Disabling the suppression results in a smaller number of
> mostly different false positives. You may want to experiment with
> this - I plan to take another look at whether enabling this
> suppression is a good idea based on what I’ve seen in my tests.
>
> Unfortunately, it’s not possible to use the race:libzmq.so
> suppression to avoid all false positives in ZMQ, since that
> suppresses ALL warnings where libzmq.so appears ANYWHERE in the stack
> trace, and that is much too broad.
>
> So, there’s no simple answer. I’ve developed some scripts that parse
> the output of TSAN and generate MD5 hashes of the stack traces, which
> can then be used to suppress individual stack traces. Going that
> route is a lot of work, but it’s the only way I know of at this time
> to provide more granular suppressions with TSAN. It would be nice if
> the suppression mechanism in TSAN were more robust (e.g., more like
> valgrind’s), but it isn’t.
>
> > Instrumenting libzmq and/or omitting the suppression causes a LOT
> > of warnings, esp. in the ZMQ worker threads. So, unless I'm
> > willing to commit the time and effort to go through and investigate
> > each of these warnings, I feel I have little choice but to accept
> > that at this point in its lifetime ZMQ should be race-free for all
> > practical purposes.
> >
> > FWIW, I’ve done fairly extensive testing, and specifically stress
> > testing, and have yet to find anything that looks like an
> > honest-to-goodness bug in ZMQ. (Which is not to say that the docs
> > are always clear about what to expect in certain situations ;-) I
> > did have one problem which appears to have been a bug in epoll, and
> > which was resolved by upgrading Linux, but that’s it.
> >
> > I agree, I was just surprised to see so many warnings...
> >
> > BTW, there’s an excellent overview of how this all works at
> > http://zeromq.org/whitepapers:architecture (although it’s
> > somewhat old, it appears to still be relatively accurate).
> >
> > Thanks for the link, I read that and it's quite interesting.
> > However it does not mention how multi-thread safety is achieved.
> > Just out of curiosity I will take a look at the ypipe implementation.
>
> Yes, that would be interesting. Luca is saying that ZMQ is basically
> race-free, but I have never seen any documentation on how that is
> achieved or verified. From a quick look at the code, it appears that
> ZMQ uses a combination of plain old pthread mutexes along with
> knowledge of ZMQ’s internal threading architecture to know when
> mutexes are unnecessary.

The queues and pipes use atomic pointer-swapping and a lock-free
single-writer, single-reader algorithm (somewhat similar to some of the
data structures provided by liburcu, although much simpler and focused
on a single use case). You can see the implementation in ypipe.hpp and
yqueue.hpp; they are both fairly small.

-- 
Kind regards,
Luca Boccassi
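For illustration only, the general single-writer, single-reader idea
mentioned above can be sketched roughly as below. This is NOT the code
from ypipe.hpp or yqueue.hpp: it swaps in a simpler technique, a
fixed-capacity ring buffer with two atomic indices, whereas libzmq's
queue grows dynamically. The names SpscQueue, try_push, try_pop and
transfer_sum are all hypothetical, made up for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical bounded single-producer/single-consumer queue.
// Assumption: exactly one thread calls try_push and exactly one
// (other) thread calls try_pop. Not libzmq code.
template <typename T>
class SpscQueue {
public:
    // One slot is kept empty to tell "full" apart from "empty".
    explicit SpscQueue(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}

    // Producer thread only. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        // Acquire: the consumer has finished reading any slot it released.
        if (next == head_.load(std::memory_order_acquire))
            return false;
        buf_[tail] = value;
        // Release: the slot write is visible before the new tail is.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store on tail_.
        if (head == tail_.load(std::memory_order_acquire))
            return false;
        out = buf_[head];
        // Release: hands the slot back to the producer.
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // written by the consumer only
    std::atomic<std::size_t> tail_;  // written by the producer only
};

// Pushes 1..count from the calling thread, pops them on a second
// thread, and returns the sum the consumer saw.
std::uint64_t transfer_sum(std::uint64_t count) {
    SpscQueue<std::uint64_t> q(64);
    std::uint64_t sum = 0;
    std::uint64_t received = 0;
    std::thread consumer([&] {
        std::uint64_t v = 0;
        while (received < count) {
            if (q.try_pop(v)) {
                sum += v;
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });
    for (std::uint64_t i = 1; i <= count; ++i) {
        while (!q.try_push(i))
            std::this_thread::yield();
    }
    consumer.join();  // join orders the consumer's writes before our reads
    assert(received == count);
    return sum;
}
```

Because each index has exactly one writer, no mutex is needed; the
acquire/release pairs are the only synchronization. A tool like TSAN
understands std::atomic operations when the code is instrumented, so a
structure of this kind should not by itself produce race reports.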
_______________________________________________ zeromq-dev mailing list [email protected] https://lists.zeromq.org/mailman/listinfo/zeromq-dev
