Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-25 Thread Luca Boccassi
On Sun, 2018-02-25 at 13:22 -0500, Bill Torpey wrote:
> Hi Francesco:
> 
> A few more points below …
> 
> Good luck, and please post back if you find out anything interesting!
> 
> Regards,
> 
> Bill
> 
> > On Feb 25, 2018, at 4:54 AM, Francesco wrote:
> > 
> > Hi Bill,
> > thanks for your answer.
> > 
> > 
> > 
> > 2018-02-24 21:49 GMT+01:00 Bill Torpey:
> > ...
> > If T2 and T4 were application code, this would be a clear violation
> > of ZMQ’s threading rules (assuming “legacy”, non-thread-safe socket
> > types).  
> > Right. 
> > 
> >  
> > For instance, one technique would be to perform initialization
> > (e.g., bind’s) in the main thread, and only after that is done spin
> > up other threads to do processing.  In this case, TSAN wouldn't
> > have any way to know that the application guarantees that accesses
> > cannot result in a race, so TSAN would flag it.  I’ve gotten in the
> > habit of using mutexes to protect code like this even though it
> > should not strictly be needed, just to keep tools like TSAN happy,
> > and also because I don’t know the ZMQ code well enough to be 100%
> > certain that the mutexes are not necessary — better safe than
> > sorry!
> > Yeah, that's a possibility, but it results in a lot of "clutter"
> > that decreases code readability and makes it harder to maintain in
> > the long run...
> > 
> > This situation is different, though, since T4 is not an application
> > thread — it’s an internal ZMQ worker thread.   So, I think in this
> > case we kind of have to accept that ZMQ is doing the right thing
> > here.
> > At least, that’s the approach I’ve been taking.  When I instrument
> > my apps and libraries with TSAN I specifically do NOT instrument
> > ZMQ, and I also use the “called_from_lib:libzmq.so” suppression
> > (which is listed as an example for TSAN:
> > https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions).
> > Understood. I have a question though: if you use the
> > "called_from_lib:libzmq.so" suppression, are you still able to spot a
> > race condition due to T2 and T4 both being application threads
> > (instead of one application thread and one ZMQ-internal thread)?
> > I wonder whether TSAN, on detecting that one of the two threads
> > involved in the data race is inside ZMQ, suppresses the race warning
> > entirely, or only suppresses races involving two internal ZMQ
> > threads.
> 
> Good question.  I could only find one post that discusses this
> suppression:
> https://groups.google.com/forum/#!topic/thread-sanitizer/NEcgiPEG0N8
> 
>   called_from_lib suppresses only interceptors (like read or
> memset) called directly from the lib. It's intended for non-
> instrumented libraries. 
> 
> However, when I try this with my test code, enabling the suppression
> actually increases the number of false positives reported by
> TSAN.  Disabling the suppression results in a smaller number of
> mostly different false positives.  You may want to experiment with
> this — I plan to take another look at whether enabling this
> suppression is a good idea based on what I’ve seen in my tests.
> 
> Unfortunately, it’s not possible to use the race:libzmq.so
> suppression to avoid all false positives in ZMQ, since that
> suppresses ALL warnings where libzmq.so appears ANYWHERE in the stack
> trace, and that is much too broad.
> 
> So, there’s no simple answer.  I’ve developed some scripts that parse
> the output of TSAN and generate MD5 hashes of the stack traces, which
> can then be used to suppress individual stack traces.  Going that
> route is a lot of work, but it’s the only way I know of at this time
> to provide more granular suppressions with TSAN.  It would be nice if
> the suppression mechanism in TSAN were more robust (e.g., more like
> valgrind’s), but it isn’t.
> 
> > 
> > 
> > Instrumenting libzmq and/or omitting the suppression causes a LOT
> > of warnings, esp. in the ZMQ worker threads.  So, unless I'm
> > willing to commit the time and effort to go through and investigate
> > each of these warnings, I feel I have little choice but to accept
> > that at this point in its lifetime ZMQ should be race-free for all
> > practical purposes.
> > 
> > FWIW, I’ve done fairly extensive testing, and specifically stress
> > testing, and have yet to find anything that looks like an honest-
> > to-goodness bug in ZMQ.  (Which is not to say that the docs are
> > always clear about what to expect in certain situations ;-)  I did
> > have one problem which appears to have been a bug in epoll, and
> > which was resolved by upgrading Linux, but that’s it.
> > 
> > I agree, I was just surprised to see so many warnings... 
> > 
> >  
> > BTW, there's an excellent overview of how this all works at
> > http://zeromq.org/whitepapers:architecture

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-25 Thread Bill Torpey
Hi Francesco:

A few more points below …

Good luck, and please post back if you find out anything interesting!

Regards,

Bill

> On Feb 25, 2018, at 4:54 AM, Francesco  wrote:
> 
> Hi Bill,
> thanks for your answer.
> 
> 
> 
> 2018-02-24 21:49 GMT+01:00 Bill Torpey:
> ...
> If T2 and T4 were application code, this would be a clear violation of ZMQ’s 
> threading rules (assuming “legacy”, non-thread-safe socket types).  
> Right. 
> 
>  
> For instance, one technique would be to perform initialization (e.g., bind’s) 
> in the main thread, and only after that is done spin up other threads to do 
> processing.  In this case, TSAN wouldn't have any way to know that the 
> application guarantees that accesses cannot result in a race, so TSAN would 
> flag it.  I’ve gotten in the habit of using mutexes to protect code like this 
> even though it should not strictly be needed, just to keep tools like TSAN 
> happy, and also because I don’t know the ZMQ code well enough to be 100% 
> certain that the mutexes are not necessary — better safe than sorry!
> Yeah, that's a possibility, but it results in a lot of "clutter" that
> decreases code readability and makes it harder to maintain in the long run...
> 
> This situation is different, though, since T4 is not an application thread — 
> it’s an internal ZMQ worker thread.   So, I think in this case we kind of 
> have to accept that ZMQ is doing the right thing here.
> At least, that’s the approach I’ve been taking.  When I instrument my apps 
> and libraries with TSAN I specifically do NOT instrument ZMQ, and I also use 
> the “called_from_lib:libzmq.so” suppression (which is listed as an example 
> for TSAN: 
> https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions).
> Understood. I have a question though: if you use the
> "called_from_lib:libzmq.so" suppression, are you still able to spot a race
> condition due to T2 and T4 both being application threads (instead of one
> application thread and one ZMQ-internal thread)?
> I wonder whether TSAN, on detecting that one of the two threads involved in
> the data race is inside ZMQ, suppresses the race warning entirely, or only
> suppresses races involving two internal ZMQ threads.

Good question.  I could only find one post that discusses this suppression: 
https://groups.google.com/forum/#!topic/thread-sanitizer/NEcgiPEG0N8 


called_from_lib suppresses only interceptors (like read or memset) 
called directly from the lib. It's intended for non-instrumented libraries. 

However, when I try this with my test code, enabling the suppression actually 
increases the number of false positives reported by TSAN.  Disabling the 
suppression results in a smaller number of mostly different false positives.  
You may want to experiment with this — I plan to take another look at whether 
enabling this suppression is a good idea based on what I’ve seen in my tests.

Unfortunately, it’s not possible to use the race:libzmq.so suppression to avoid 
all false positives in ZMQ, since that suppresses ALL warnings where libzmq.so 
appears ANYWHERE in the stack trace, and that is much too broad.

So, there’s no simple answer.  I’ve developed some scripts that parse the 
output of TSAN and generate MD5 hashes of the stack traces, which can then be 
used to suppress individual stack traces.  Going that route is a lot of work, 
but it’s the only way I know of at this time to provide more granular 
suppressions with TSAN.  It would be nice if the suppression mechanism in TSAN 
were more robust (e.g., more like valgrind’s), but it isn’t.
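
(For reference, the basic mechanics are simple enough. A minimal suppressions
file, using the syntax documented on the TSAN wiki, might look like this; the
file name is arbitrary:

    # tsan.supp
    # Don't report races in interceptors called directly from libzmq
    # (which is left uninstrumented).
    called_from_lib:libzmq.so

    # The blunt alternative mentioned above; it matches libzmq.so anywhere
    # in the stack and is usually far too broad:
    #race:libzmq.so

and it is picked up at run time with something like:

    TSAN_OPTIONS="suppressions=tsan.supp" ./my_app

The hard part, as noted above, is writing entries that are neither too broad
nor unmanageably long.)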

> 
> 
> Instrumenting libzmq and/or omitting the suppression causes a LOT of 
> warnings, esp. in the ZMQ worker threads.  So, unless I'm willing to commit 
> the time and effort to go through and investigate each of these warnings, I 
> feel I have little choice but to accept that at this point in its lifetime 
> ZMQ should be race-free for all practical purposes.
> 
> FWIW, I’ve done fairly extensive testing, and specifically stress testing, 
> and have yet to find anything that looks like an honest-to-goodness bug in 
> ZMQ.  (Which is not to say that the docs are always clear about what to 
> expect in certain situations ;-)  I did have one problem which appears to 
> have been a bug in epoll, and which was resolved by upgrading Linux, but 
> that’s it.
> 
> I agree, I was just surprised to see so many warnings... 
> 
>  
> BTW, there’s an excellent overview of how this all works at 
> http://zeromq.org/whitepapers:architecture 
>  — although it’s somewhat old, it 
> appears to still be relatively accurate.
> 
> thanks for the link, I read that and it's quite interesting. However it does 
> not mention how multi-thread 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-25 Thread Francesco
Hi Bill,
thanks for your answer.



2018-02-24 21:49 GMT+01:00 Bill Torpey :
>
> ...
>
If T2 and T4 were application code, this would be a clear violation of
> ZMQ’s threading rules (assuming “legacy”, non-thread-safe socket types).
>
Right.



> For instance, one technique would be to perform initialization (e.g.,
> bind’s) in the main thread, and only after that is done spin up other
> threads to do processing.  In this case, TSAN wouldn't have any way to know
> that the application guarantees that accesses cannot result in a race, so
> TSAN would flag it.  I’ve gotten in the habit of using mutexes to protect
> code like this even though it should not strictly be needed, just to keep
> tools like TSAN happy, and also because I don’t know the ZMQ code well
> enough to be 100% certain that the mutexes are not necessary — better safe
> than sorry!
>
Yeah, that's a possibility, but it results in a lot of "clutter" that
decreases code readability and makes it harder to maintain in the long run...

This situation is different, though, since T4 is not an application thread
> — it’s an internal ZMQ worker thread.   So, I think in this case we kind of
> have to accept that ZMQ is doing the right thing here.
> At least, that’s the approach I’ve been taking.  When I instrument my apps
> and libraries with TSAN I specifically do NOT instrument ZMQ, and I also
> use the “called_from_lib:libzmq.so” suppression (which is listed as an
> example for TSAN:
> https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions).
>
Understood. I have a question though: if you use the
“called_from_lib:libzmq.so” suppression, are you still able to spot a race
condition due to T2 and T4 both being application threads (instead of one
application thread and one ZMQ-internal thread)?
I wonder whether TSAN, on detecting that one of the two threads involved in
the data race is inside ZMQ, suppresses the race warning entirely, or only
suppresses races involving two internal ZMQ threads.


Instrumenting libzmq and/or omitting the suppression causes a LOT of
> warnings, esp. in the ZMQ worker threads.  So, unless I'm willing to commit
> the time and effort to go through and investigate each of these warnings, I
> feel I have little choice but to accept that at this point in its lifetime
> ZMQ should be race-free for all practical purposes.
>
> FWIW, I’ve done fairly extensive testing, and specifically stress testing,
> and have yet to find anything that looks like an honest-to-goodness bug in
> ZMQ.  (Which is not to say that the docs are always clear about what to
> expect in certain situations ;-)  I did have one problem which appears to
> have been a bug in epoll, and which was resolved by upgrading Linux, but
> that’s it.
>

I agree, I was just surprised to see so many warnings...



> BTW, there’s an excellent overview of how this all works at
> http://zeromq.org/whitepapers:architecture — although it’s somewhat old,
> it appears to still be relatively accurate.
>

Thanks for the link, I read it and it's quite interesting. However, it does
not mention how multi-thread safety is achieved.
Just out of curiosity I will take a look at the ypipe implementation.



> I tried forcing use of C++11 internal atomics and all the warnings
> about  zmq::atomic_ptr_t<> disappear (I still get tons about ypipe and
> "pipe")..
> maybe it's best to prefer C++11 atomics when available?
>
>
> I would actually suggest using plain old pthread for synchronization
> primitives — TSAN’s support for pthread is pretty mature. Support for C++11
> is a bit less mature, and has certain constraints (
> https://clang.llvm.org/docs/ThreadSanitizer.html):
>

However, using plain old pthread primitives for synchronization you probably
lose a lot of performance... if C++11 atomics are supported by TSAN just
fine, why not prefer them over the GCC intrinsics?



> *ThreadSanitizer is in beta stage. It is known to work on large C++
> programs using pthreads, but we do not promise anything (yet). C++11
> threading is supported with llvm libc++. The test suite is integrated into
> CMake build and can be run with make check-tsan command.*
>
>
> In particular, to use C++11 you also need to use llvm’s libcxx, rather
> than gnu’s libstdc++.
>

I think the quoted text is referring to "C++11 threading", that is, to
http://en.cppreference.com/w/cpp/thread
As I mentioned, from a simple test I do see that at least gcc 5.3.0's TSAN
handles C++11 atomics fine (without forcing you to link LLVM's libc++).
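
(For clarity, by "C++11 internal atomics" I mean the ZMQ_ATOMIC_PTR_CXX11
code path. Conceptually it boils down to something like the sketch below,
which is a simplification for illustration, not the actual atomic_ptr.hpp
code:

    #include <atomic>

    template <typename T>
    class atomic_ptr_sketch
    {
    public:
        atomic_ptr_sketch () : ptr (nullptr) {}

        //  Store a new pointer value with release semantics.
        void set (T *ptr_) { ptr.store (ptr_, std::memory_order_release); }

        //  Unconditional swap, returning the previous value.
        T *xchg (T *ptr_) { return ptr.exchange (ptr_, std::memory_order_acq_rel); }

        //  Compare-and-swap: if the current value equals cmp_, replace it
        //  with val_. Always returns the value stored before the call.
        T *cas (T *cmp_, T *val_)
        {
            T *old = cmp_;
            ptr.compare_exchange_strong (old, val_, std::memory_order_acq_rel);
            return old;
        }

    private:
        std::atomic<T *> ptr;
    };

Since these are plain std::atomic operations, TSAN models them natively,
which would explain why the atomic_ptr_t<> warnings disappear once that path
is selected.)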




> Btw I will try to write a suppression file for ThreadSanitizer and ZMQ...
> I just start to doubt: is all ZMQ code really race-free? :)
>
>
> As mentioned above, that is likely to be a huge undertaking.  At this
> point, I think the only confirmation we have that ZMQ is race-free is based
> on “black-box” testing.  The good news is that ZMQ appears to be used
> widely enough that the code is well-tested.  (Of course, failure to find a
> bug doesn’t prove that there isn’t one, so 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Luca Boccassi
There are a couple in the use/setting of a handful of socket options,
but nothing major. The pipes and queues are fine.

On Sat, 2018-02-24 at 17:34 +0100, Francesco wrote:
> Actually even building zeromq with -fsanitize=thread I still get a
> lot of
> data races about ZMQ atomics:
> 
> WARNING: ThreadSanitizer: data race (pid=25770)
>   Atomic write of size 4 at 0x7d417e48 by thread T4:
> #0 __tsan_atomic32_fetch_add
> ../../../../gcc-5.3.0/libsanitizer/tsan/tsan_interface_atomic.cc:611
> (libtsan.so.0+0x0005860f)
> #1 zmq::atomic_counter_t::add(unsigned int)
> src/atomic_counter.hpp:116
> (libzmq.so.5+0x0001a087)
> #2 zmq::poller_base_t::adjust_load(int) src/poller_base.cpp:53
> (libzmq.so.5+0x0006d20d)
> #3 zmq::epoll_t::add_fd(int, zmq::i_poll_events*)
> src/epoll.cpp:91
> (libzmq.so.5+0x0003c229)
> #4 zmq::io_object_t::add_fd(int) src/io_object.cpp:66
> (libzmq.so.5+0x000400ef)
> #5 zmq::tcp_connecter_t::start_connecting()
> src/tcp_connecter.cpp:204
> (libzmq.so.5+0x000abf30)
> #6 zmq::tcp_connecter_t::process_plug() src/tcp_connecter.cpp:94
> (libzmq.so.5+0x000ab6cb)
> #7 zmq::object_t::process_command(zmq::command_t&)
> src/object.cpp:90
> (libzmq.so.5+0x00059e37)
> #8 zmq::io_thread_t::in_event() src/io_thread.cpp:85
> (libzmq.so.5+0x00040cb3)
> #9 zmq::epoll_t::loop() src/epoll.cpp:188
> (libzmq.so.5+0x0003ce6d)
> #10 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
> (libzmq.so.5+0x0003cfb6)
> #11 thread_routine src/thread.cpp:109
> (libzmq.so.5+0x000af00b)
> 
>   Previous read of size 4 at 0x7d417e48 by thread T2:
> #0 zmq::atomic_counter_t::get() const src/atomic_counter.hpp:210
> (libzmq.so.5+0x00061d20)
> #1 zmq::poller_base_t::get_load() src/poller_base.cpp:47
> (libzmq.so.5+0x0006d1c6)
> #2 zmq::io_thread_t::get_load() src/io_thread.cpp:72
> (libzmq.so.5+0x00040bef)
> #3 zmq::ctx_t::choose_io_thread(unsigned long) src/ctx.cpp:451
> (libzmq.so.5+0x000184ff)
> #4 zmq::object_t::choose_io_thread(unsigned long)
> src/object.cpp:199
> (libzmq.so.5+0x0005a67f)
> #5 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:610
> (libzmq.so.5+0x0008bca6)
> #6 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
>    
> 
> or about ypipe:
> 
> WARNING: ThreadSanitizer: data race (pid=25770)
>   Read of size 8 at 0x7d640001e0c0 by thread T4:
> #0 zmq::ypipe_t::read(zmq::command_t*)
> src/ypipe.hpp:170 (libzmq.so.5+0x00047e49)
> #1 zmq::mailbox_t::recv(zmq::command_t*, int) src/mailbox.cpp:73
> (libzmq.so.5+0x00047793)
> #2 zmq::io_thread_t::in_event() src/io_thread.cpp:86
> (libzmq.so.5+0x00040cd5)
> #3 zmq::epoll_t::loop() src/epoll.cpp:188
> (libzmq.so.5+0x0003ce6d)
> #4 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
> (libzmq.so.5+0x0003cfb6)
> #5 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)
> 
>   Previous write of size 8 at 0x7d640001e0c0 by thread T2 (mutexes:
> write
> M500):
> #0 zmq::ypipe_t::write(zmq::command_t const&,
> bool)
> src/ypipe.hpp:85 (libzmq.so.5+0x00047f05)
> #1 zmq::mailbox_t::send(zmq::command_t const&) src/mailbox.cpp:62
> (libzmq.so.5+0x000476dc)
> #2 zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
> src/ctx.cpp:438 (libzmq.so.5+0x00018420)
> #3 zmq::object_t::send_command(zmq::command_t&)
> src/object.cpp:474
> (libzmq.so.5+0x0005c07b)
> #4 zmq::object_t::send_plug(zmq::own_t*, bool) src/object.cpp:220
> (libzmq.so.5+0x0005a7ef)
> #5 zmq::own_t::launch_child(zmq::own_t*) src/own.cpp:87
> (libzmq.so.5+0x0006134f)
> #6 zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
> zmq::pipe_t*) src/socket_base.cpp:1006 (libzmq.so.5+0x0008e081)
> #7 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:630
> (libzmq.so.5+0x0008be89)
> #8 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
> ...
> 
> 
> I tried forcing use of C++11 internal atomics and all the warnings
> about  zmq::atomic_ptr_t<> disappear (I still get tons about ypipe
> and
> "pipe")..
> maybe it's best to prefer C++11 atomics when available?
> 
> Btw I will try to write a suppression file for ThreadSanitizer and
> ZMQ... I
> just start to doubt: is all ZMQ code really race-free? :)
> 
> 
> Thanks,
> Francesco
> 
> 
> 
> 
> 2018-02-24 16:44 GMT+01:00 Francesco :
> 
> > Hi Bill,
> > indeed I've found ThreadSanitizer to be more effective, i.e., it produces
> > far fewer false positives than Helgrind... still I'm getting
> > several
> > race condition warnings out of libzmq 4.2.3... indeed I actually
> > built only
> > my application code with -fsanitize=thread... do you know if I need
> > to
> > rebuild libzmq with that option as well? I will try to see if it
> > makes any
> > difference!
> > 
> > Thanks,
> > 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Bill Torpey
Francesco:

Please see my comments below.  Full disclaimer: I don’t pretend to understand 
ZMQ internals, and a lot of this information came via trial and error.  YMMV, 
but I hope this helps.  If you find out more, it would be much appreciated if 
you post back to this thread — thanks!

Regards,

Bill

> On Feb 24, 2018, at 11:34 AM, Francesco  wrote:
> 
> Actually even building zeromq with -fsanitize=thread I still get a lot of 
> data races about ZMQ atomics:
> 
> WARNING: ThreadSanitizer: data race (pid=25770)
>   Atomic write of size 4 at 0x7d417e48 by thread T4:
> #0 __tsan_atomic32_fetch_add 
> ../../../../gcc-5.3.0/libsanitizer/tsan/tsan_interface_atomic.cc:611 
> (libtsan.so.0+0x0005860f)
> #1 zmq::atomic_counter_t::add(unsigned int) src/atomic_counter.hpp:116 
> (libzmq.so.5+0x0001a087)
> #2 zmq::poller_base_t::adjust_load(int) src/poller_base.cpp:53 
> (libzmq.so.5+0x0006d20d)
> #3 zmq::epoll_t::add_fd(int, zmq::i_poll_events*) src/epoll.cpp:91 
> (libzmq.so.5+0x0003c229)
> #4 zmq::io_object_t::add_fd(int) src/io_object.cpp:66 
> (libzmq.so.5+0x000400ef)
> #5 zmq::tcp_connecter_t::start_connecting() src/tcp_connecter.cpp:204 
> (libzmq.so.5+0x000abf30)
> #6 zmq::tcp_connecter_t::process_plug() src/tcp_connecter.cpp:94 
> (libzmq.so.5+0x000ab6cb)
> #7 zmq::object_t::process_command(zmq::command_t&) src/object.cpp:90 
> (libzmq.so.5+0x00059e37)
> #8 zmq::io_thread_t::in_event() src/io_thread.cpp:85 
> (libzmq.so.5+0x00040cb3)
> #9 zmq::epoll_t::loop() src/epoll.cpp:188 (libzmq.so.5+0x0003ce6d)
> #10 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203 
> (libzmq.so.5+0x0003cfb6)
> #11 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)
> 
>   Previous read of size 4 at 0x7d417e48 by thread T2:
> #0 zmq::atomic_counter_t::get() const src/atomic_counter.hpp:210 
> (libzmq.so.5+0x00061d20)
> #1 zmq::poller_base_t::get_load() src/poller_base.cpp:47 
> (libzmq.so.5+0x0006d1c6)
> #2 zmq::io_thread_t::get_load() src/io_thread.cpp:72 
> (libzmq.so.5+0x00040bef)
> #3 zmq::ctx_t::choose_io_thread(unsigned long) src/ctx.cpp:451 
> (libzmq.so.5+0x000184ff)
> #4 zmq::object_t::choose_io_thread(unsigned long) src/object.cpp:199 
> (libzmq.so.5+0x0005a67f)
> #5 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:610 
> (libzmq.so.5+0x0008bca6)
> #6 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
>….


Since I don’t see any references to mutexes above, I assume that the code does 
not use mutexes.  Otherwise you would get a warning that looks more like this:

  Previous write of size 8 at 0x7b0864b8 by thread T5 (mutexes: write M88, 
write M90, write M87):

TSAN keeps track of the mutexes held at the time of each memory access — if two 
memory accesses occur on two different threads, but they have at least one 
mutex in common, TSAN accepts the access as race-free.  If the two accesses 
don’t have at least one mutex in common, then the access is considered a data 
race.

If T2 and T4 were application code, this would be a clear violation of ZMQ’s 
threading rules (assuming “legacy”, non-thread-safe socket types).  For 
instance, one technique would be to perform initialization (e.g., bind’s) in 
the main thread, and only after that is done spin up other threads to do 
processing.  In this case, TSAN wouldn't have any way to know that the 
application guarantees that accesses cannot result in a race, so TSAN would 
flag it.  I’ve gotten in the habit of using mutexes to protect code like this 
even though it should not strictly be needed, just to keep tools like TSAN 
happy, and also because I don’t know the ZMQ code well enough to be 100% 
certain that the mutexes are not necessary — better safe than sorry!
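
(To make that concrete, the belt-and-braces pattern I have in mind is roughly
the following; socket type, endpoint and names are made up for illustration:

    #include <mutex>
    #include <thread>
    #include <zmq.h>

    std::mutex g_init_mutex;   //  not strictly needed, but TSAN can see it
    void *g_ctx  = nullptr;
    void *g_sock = nullptr;

    void init_sockets ()
    {
        std::lock_guard<std::mutex> lock (g_init_mutex);
        g_ctx  = zmq_ctx_new ();
        g_sock = zmq_socket (g_ctx, ZMQ_PUB);
        zmq_bind (g_sock, "tcp://127.0.0.1:5555");   //  all bind's done up front
    }

    void worker ()
    {
        //  Taking the same mutex means the two accesses share a lock,
        //  so TSAN has no reason to report them.
        std::lock_guard<std::mutex> lock (g_init_mutex);
        zmq_send (g_sock, "hi", 2, 0);
    }

    int main ()
    {
        init_sockets ();
        std::thread t (worker);   //  workers start only after init is done
        t.join ();
        zmq_close (g_sock);
        zmq_ctx_term (g_ctx);
        return 0;
    }

With at least one mutex in common between the two accesses, the lockset check
described above is satisfied and TSAN stays quiet, whether or not the mutex is
strictly required by ZMQ's threading rules.)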

This situation is different, though, since T4 is not an application thread — 
it’s an internal ZMQ worker thread.   So, I think in this case we kind of have 
to accept that ZMQ is doing the right thing here.

At least, that’s the approach I’ve been taking.  When I instrument my apps and 
libraries with TSAN I specifically do NOT instrument ZMQ, and I also use the 
“called_from_lib:libzmq.so” suppression (which is listed as an example for 
TSAN: https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions 
).

Instrumenting libzmq and/or omitting the suppression causes a LOT of warnings, 
esp. in the ZMQ worker threads.  So, unless I'm willing to commit the time and 
effort to go through and investigate each of these warnings, I feel I have 
little choice but to accept that at this point in its lifetime ZMQ should be 
race-free for all practical purposes.

FWIW, I’ve done fairly extensive testing, and specifically stress testing, and 
have yet to find anything that looks like an honest-to-goodness bug in ZMQ.  
(Which is 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Francesco
Actually even building zeromq with -fsanitize=thread I still get a lot of
data races about ZMQ atomics:

WARNING: ThreadSanitizer: data race (pid=25770)
  Atomic write of size 4 at 0x7d417e48 by thread T4:
#0 __tsan_atomic32_fetch_add
../../../../gcc-5.3.0/libsanitizer/tsan/tsan_interface_atomic.cc:611
(libtsan.so.0+0x0005860f)
#1 zmq::atomic_counter_t::add(unsigned int) src/atomic_counter.hpp:116
(libzmq.so.5+0x0001a087)
#2 zmq::poller_base_t::adjust_load(int) src/poller_base.cpp:53
(libzmq.so.5+0x0006d20d)
#3 zmq::epoll_t::add_fd(int, zmq::i_poll_events*) src/epoll.cpp:91
(libzmq.so.5+0x0003c229)
#4 zmq::io_object_t::add_fd(int) src/io_object.cpp:66
(libzmq.so.5+0x000400ef)
#5 zmq::tcp_connecter_t::start_connecting() src/tcp_connecter.cpp:204
(libzmq.so.5+0x000abf30)
#6 zmq::tcp_connecter_t::process_plug() src/tcp_connecter.cpp:94
(libzmq.so.5+0x000ab6cb)
#7 zmq::object_t::process_command(zmq::command_t&) src/object.cpp:90
(libzmq.so.5+0x00059e37)
#8 zmq::io_thread_t::in_event() src/io_thread.cpp:85
(libzmq.so.5+0x00040cb3)
#9 zmq::epoll_t::loop() src/epoll.cpp:188 (libzmq.so.5+0x0003ce6d)
#10 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
(libzmq.so.5+0x0003cfb6)
#11 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)

  Previous read of size 4 at 0x7d417e48 by thread T2:
#0 zmq::atomic_counter_t::get() const src/atomic_counter.hpp:210
(libzmq.so.5+0x00061d20)
#1 zmq::poller_base_t::get_load() src/poller_base.cpp:47
(libzmq.so.5+0x0006d1c6)
#2 zmq::io_thread_t::get_load() src/io_thread.cpp:72
(libzmq.so.5+0x00040bef)
#3 zmq::ctx_t::choose_io_thread(unsigned long) src/ctx.cpp:451
(libzmq.so.5+0x000184ff)
#4 zmq::object_t::choose_io_thread(unsigned long) src/object.cpp:199
(libzmq.so.5+0x0005a67f)
#5 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:610
(libzmq.so.5+0x0008bca6)
#6 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
   

or about ypipe:

WARNING: ThreadSanitizer: data race (pid=25770)
  Read of size 8 at 0x7d640001e0c0 by thread T4:
#0 zmq::ypipe_t<zmq::command_t, 16>::read(zmq::command_t*)
src/ypipe.hpp:170 (libzmq.so.5+0x00047e49)
#1 zmq::mailbox_t::recv(zmq::command_t*, int) src/mailbox.cpp:73
(libzmq.so.5+0x00047793)
#2 zmq::io_thread_t::in_event() src/io_thread.cpp:86
(libzmq.so.5+0x00040cd5)
#3 zmq::epoll_t::loop() src/epoll.cpp:188 (libzmq.so.5+0x0003ce6d)
#4 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
(libzmq.so.5+0x0003cfb6)
#5 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)

  Previous write of size 8 at 0x7d640001e0c0 by thread T2 (mutexes: write
M500):
#0 zmq::ypipe_t<zmq::command_t, 16>::write(zmq::command_t const&, bool)
src/ypipe.hpp:85 (libzmq.so.5+0x00047f05)
#1 zmq::mailbox_t::send(zmq::command_t const&) src/mailbox.cpp:62
(libzmq.so.5+0x000476dc)
#2 zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
src/ctx.cpp:438 (libzmq.so.5+0x00018420)
#3 zmq::object_t::send_command(zmq::command_t&) src/object.cpp:474
(libzmq.so.5+0x0005c07b)
#4 zmq::object_t::send_plug(zmq::own_t*, bool) src/object.cpp:220
(libzmq.so.5+0x0005a7ef)
#5 zmq::own_t::launch_child(zmq::own_t*) src/own.cpp:87
(libzmq.so.5+0x0006134f)
#6 zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
zmq::pipe_t*) src/socket_base.cpp:1006 (libzmq.so.5+0x0008e081)
#7 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:630
(libzmq.so.5+0x0008be89)
#8 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
...


I tried forcing use of C++11 internal atomics and all the warnings
about  zmq::atomic_ptr_t<> disappear (I still get tons about ypipe and
"pipe")..
maybe it's best to prefer C++11 atomics when available?

Btw I will try to write a suppression file for ThreadSanitizer and ZMQ... I
just start to doubt: is all ZMQ code really race-free? :)


Thanks,
Francesco




2018-02-24 16:44 GMT+01:00 Francesco :

> Hi Bill,
> indeed I've found ThreadSanitizer to be more effective, i.e., it produces
> far fewer false positives than Helgrind... still I'm getting several
> race condition warnings out of libzmq 4.2.3... indeed I actually built only
> my application code with -fsanitize=thread... do you know if I need to
> rebuild libzmq with that option as well? I will try to see if it makes any
> difference!
>
> Thanks,
> Francesco
>
>
>
>
> 2018-02-24 15:17 GMT+01:00 Bill Torpey :
>
>> I’m using clang’s Thread Sanitizer for a similar purpose, and just
>> happened to notice that the TSAN docs use ZeroMQ as one of the example
>> suppressions:  https://github.com/google/san
>> itizers/wiki/ThreadSanitizerSuppressions
>>
>> I assume that the reason for suppressing libzmq.so has to do with
>> (legacy) sockets not being thread-safe, so 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Francesco
Hi Bill,
indeed I've found ThreadSanitizer to be more effective, i.e., it produces
far fewer false positives than Helgrind... still I'm getting several
race condition warnings out of libzmq 4.2.3... indeed I actually built only
my application code with -fsanitize=thread... do you know if I need to
rebuild libzmq with that option as well? I will try to see if it makes any
difference!

Thanks,
Francesco




2018-02-24 15:17 GMT+01:00 Bill Torpey :

> I’m using clang’s Thread Sanitizer for a similar purpose, and just
> happened to notice that the TSAN docs use ZeroMQ as one of the example
> suppressions:  https://github.com/google/sanitizers/wiki/
> ThreadSanitizerSuppressions
>
> I assume that the reason for suppressing libzmq.so has to do with (legacy)
> sockets not being thread-safe, so the code may exhibit race conditions that
> are irrelevant given that the code is not intended to be called from
> multiple threads.
>
> FWIW, you may want to check out the clang sanitizers — they have certain
> advantages over valgrind — (faster, multi-threaded, etc.) if you are able
> to instrument the code at build time.
>
>
> On Feb 23, 2018, at 6:52 AM, Luca Boccassi 
> wrote:
>
> On Fri, 2018-02-23 at 12:22 +0100, Francesco wrote:
>
> Hi all,
> I'm trying to further debug the problem I described in my earlier
> mail (
> https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.ht
> ml) so
> I decided to use Helgrind to find race conditions in my code.
>
> My problem is that apparently Helgrind 3.12.0 is reporting race
> conditions
> against zmq::atomic_ptr_t<> implementation.
> Now I know that Helgrind has troubles with C++11 atomics but by
> looking at
> the code I see that ZMQ is not using them (note: I do have
> ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
> defined, so the latter wins!).
>
> In particular Helgrind 3.12.0 tells me that:
>
>
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== *Possible data race during read of size 8
> at
> 0xB373BF0 by thread #4*
> ==00:00:00:11.885 29399== Locks held: none
> ==00:00:00:11.885 29399==at 0x6BD79AB:
> *zmq::atomic_ptr_t::cas*(zmq::command_t*,
> zmq::command_t*)
> (atomic_ptr.hpp:150)
> ==00:00:00:11.885 29399==by 0x6BD7874:
> zmq::ypipe_t 16>::check_read() (ypipe.hpp:147)
> ==00:00:00:11.885 29399==by 0x6BD7288:
> zmq::ypipe_t 16>::read(zmq::command_t*) (ypipe.hpp:165)
> ==00:00:00:11.885 29399==by 0x6BD6FE7:
> zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
> ==00:00:00:11.885 29399==by 0x6BD29FC:
> zmq::io_thread_t::in_event()
> (io_thread.cpp:81)
> ==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
> (epoll.cpp:188)
> ==00:00:00:11.885 29399==by 0x6BD06C3:
> zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
> ==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine
> (thread.cpp:109)
> ==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
> (hg_intercepts.c:389)
> ==00:00:00:11.885 29399==by 0x6E72463: start_thread
> (pthread_create.c:334)
> ==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== This conflicts with a previous write of
> size 8 by
> thread #2
> ==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
> ==00:00:00:11.885 29399==at 0x6BD77F4:
> *zmq::atomic_ptr_t::set*(zmq::command_t*)
> (atomic_ptr.hpp:90)
> ==00:00:00:11.885 29399==by 0x6BD7422:
> zmq::ypipe_t 16>::flush() (ypipe.hpp:125)
> ==00:00:00:11.885 29399==by 0x6BD6DF5:
> zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
> ==00:00:00:11.885 29399==by 0x6BB9128:
> zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
> (ctx.cpp:438)
> ==00:00:00:11.885 29399==by 0x6BE34CE:
> zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
> ==00:00:00:11.885 29399==by 0x6BE26F8:
> zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
> ==00:00:00:11.885 29399==by 0x6BE68E2:
> zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
> ==00:00:00:11.885 29399==by 0x6C03D6C:
> zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
> zmq::pipe_t*)
> (socket_base.cpp:1006)
> ==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a
> block of
> size 224 alloc'd
> ==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned
> long,
> std::nothrow_t const&) (vg_replace_malloc.c:376)
> ==00:00:00:11.885 29399==by 0x6BB8B8D:
> zmq::ctx_t::create_socket(int)
> (ctx.cpp:351)
> ==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
> ==00:00:00:11.885 29399==by 0x6143809:
> ZmqClientSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:303)
> ==00:00:00:11.885 29399==by 0x6144069:
> ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:407)
> ==00:00:00:11.885 29399==by 0x61684EF: 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Bill Torpey
I’m using clang’s Thread Sanitizer for a similar purpose, and just happened to 
notice that the TSAN docs use ZeroMQ as one of the example suppressions:  
https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions 


I assume that the reason for suppressing libzmq.so has to do with (legacy) 
sockets not being thread-safe, so the code may exhibit race conditions that are 
irrelevant given that the code is not intended to be called from multiple 
threads.

FWIW, you may want to check out the clang sanitizers — they have certain 
advantages over valgrind — (faster, multi-threaded, etc.) if you are able to 
instrument the code at build time.
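
(For an autotools-based project, instrumenting at build time is usually just a
matter of adding -fsanitize=thread to both the compile and link flags; a rough
example, with the exact flags depending on your toolchain:

    ./configure CC=clang CXX=clang++ \
                CFLAGS="-fsanitize=thread -O1 -g" \
                CXXFLAGS="-fsanitize=thread -O1 -g" \
                LDFLAGS="-fsanitize=thread"
    make -j4

gcc accepts the same -fsanitize=thread flag if you prefer it over clang.)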


> On Feb 23, 2018, at 6:52 AM, Luca Boccassi  wrote:
> 
> On Fri, 2018-02-23 at 12:22 +0100, Francesco wrote:
>> Hi all,
>> I'm trying to further debug the problem I described in my earlier
>> mail (
>> https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.ht
>> ml) so
>> I decided to use Helgrind to find race conditions in my code.
>> 
>> My problem is that apparently Helgrind 3.12.0 is reporting race
>> conditions
>> against zmq::atomic_ptr_t<> implementation.
>> Now I know that Helgrind has troubles with C++11 atomics but by
>> looking at
>> the code I see that ZMQ is not using them (note: I do have
>> ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
>> defined, so the latter wins!).
>> 
>> In particular Helgrind 3.12.0 tells me that:
>> 
>> 
>> ==00:00:00:11.885 29399==
>> ==00:00:00:11.885 29399== *Possible data race during read of size 8
>> at
>> 0xB373BF0 by thread #4*
>> ==00:00:00:11.885 29399== Locks held: none
>> ==00:00:00:11.885 29399==at 0x6BD79AB:
>> *zmq::atomic_ptr_t::cas*(zmq::command_t*,
>> zmq::command_t*)
>> (atomic_ptr.hpp:150)
>> ==00:00:00:11.885 29399==by 0x6BD7874:
>> zmq::ypipe_t> 16>::check_read() (ypipe.hpp:147)
>> ==00:00:00:11.885 29399==by 0x6BD7288:
>> zmq::ypipe_t> 16>::read(zmq::command_t*) (ypipe.hpp:165)
>> ==00:00:00:11.885 29399==by 0x6BD6FE7:
>> zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
>> ==00:00:00:11.885 29399==by 0x6BD29FC:
>> zmq::io_thread_t::in_event()
>> (io_thread.cpp:81)
>> ==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
>> (epoll.cpp:188)
>> ==00:00:00:11.885 29399==by 0x6BD06C3:
>> zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
>> ==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine
>> (thread.cpp:109)
>> ==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
>> (hg_intercepts.c:389)
>> ==00:00:00:11.885 29399==by 0x6E72463: start_thread
>> (pthread_create.c:334)
>> ==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
>> ==00:00:00:11.885 29399==
>> ==00:00:00:11.885 29399== This conflicts with a previous write of
>> size 8 by
>> thread #2
>> ==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
>> ==00:00:00:11.885 29399==at 0x6BD77F4:
>> *zmq::atomic_ptr_t::set*(zmq::command_t*)
>> (atomic_ptr.hpp:90)
>> ==00:00:00:11.885 29399==by 0x6BD7422:
>> zmq::ypipe_t> 16>::flush() (ypipe.hpp:125)
>> ==00:00:00:11.885 29399==by 0x6BD6DF5:
>> zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
>> ==00:00:00:11.885 29399==by 0x6BB9128:
>> zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
>> (ctx.cpp:438)
>> ==00:00:00:11.885 29399==by 0x6BE34CE:
>> zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
>> ==00:00:00:11.885 29399==by 0x6BE26F8:
>> zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
>> ==00:00:00:11.885 29399==by 0x6BE68E2:
>> zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
>> ==00:00:00:11.885 29399==by 0x6C03D6C:
>> zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
>> zmq::pipe_t*)
>> (socket_base.cpp:1006)
>> ==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a
>> block of
>> size 224 alloc'd
>> ==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned
>> long,
>> std::nothrow_t const&) (vg_replace_malloc.c:376)
>> ==00:00:00:11.885 29399==by 0x6BB8B8D:
>> zmq::ctx_t::create_socket(int)
>> (ctx.cpp:351)
>> ==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
>> ==00:00:00:11.885 29399==by 0x6143809:
>> ZmqClientSocket::Config(PubSubSocketConfig const&)
>> (ZmqRequestReply.cpp:303)
>> ==00:00:00:11.885 29399==by 0x6144069:
>> ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
>> (ZmqRequestReply.cpp:407)
>> ==00:00:00:11.885 29399==by 0x61684EF: client_thread_main(void*)
>> (ZmqRequestReplyUnitTests.cpp:132)
>> ==00:00:00:11.886 29399==by 0x4C2F837: mythread_wrapper
>> (hg_intercepts.c:389)
>> ==00:00:00:11.886 29399==by 0x6E72463: start_thread
>> (pthread_create.c:334)
>> ==00:00:00:11.886 29399==by 0x92F901C: clone (clone.S:109)
>> ==00:00:00:11.886 29399==  Block was alloc'd by 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-23 Thread Thomas Rodgers
I don’t know if this has changed recently, but at the time I added the
compiler intrinsics support, it was generally deemed undesirable to require
C++11.
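
(For context, the intrinsics path boils down to the GCC/clang __atomic
builtins, so it works on pre-C++11 toolchains. A simplified sketch of the
idea, not the actual atomic_ptr.hpp code:

    //  CAS on a raw pointer using the compiler builtins that predate
    //  <atomic>. Returns the value stored before the call, whether or
    //  not the swap happened.
    template <typename T>
    T *cas_intrinsic (T **ptr_, T *cmp_, T *val_)
    {
        T *expected = cmp_;
        __atomic_compare_exchange_n (ptr_, &expected, val_, false,
                                     __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
        return expected;
    }

Helgrind has no knowledge of the ordering guarantees behind these builtins,
which is presumably why it reports the resulting accesses as racy.)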

On Fri, Feb 23, 2018 at 5:22 AM Francesco 
wrote:

> Hi all,
> I'm trying to further debug the problem I described in my earlier mail (
> https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.html)
> so I decided to use Helgrind to find race conditions in my code.
>
> My problem is that apparently Helgrind 3.12.0 is reporting race conditions
> against zmq::atomic_ptr_t<> implementation.
> Now I know that Helgrind has troubles with C++11 atomics but by looking at
> the code I see that ZMQ is not using them (note: I do have
> ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
> defined, so the latter wins!).
>
> In particular Helgrind 3.12.0 tells me that:
>
>
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== *Possible data race during read of size 8 at
> 0xB373BF0 by thread #4*
> ==00:00:00:11.885 29399== Locks held: none
> ==00:00:00:11.885 29399==at 0x6BD79AB:
> *zmq::atomic_ptr_t::cas*(zmq::command_t*,
> zmq::command_t*) (atomic_ptr.hpp:150)
> ==00:00:00:11.885 29399==by 0x6BD7874: zmq::ypipe_t 16>::check_read() (ypipe.hpp:147)
> ==00:00:00:11.885 29399==by 0x6BD7288: zmq::ypipe_t 16>::read(zmq::command_t*) (ypipe.hpp:165)
> ==00:00:00:11.885 29399==by 0x6BD6FE7:
> zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
> ==00:00:00:11.885 29399==by 0x6BD29FC: zmq::io_thread_t::in_event()
> (io_thread.cpp:81)
> ==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
> (epoll.cpp:188)
> ==00:00:00:11.885 29399==by 0x6BD06C3:
> zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
> ==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine (thread.cpp:109)
> ==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
> (hg_intercepts.c:389)
> ==00:00:00:11.885 29399==by 0x6E72463: start_thread
> (pthread_create.c:334)
> ==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== This conflicts with a previous write of size 8
> by thread #2
> ==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
> ==00:00:00:11.885 29399==at 0x6BD77F4:
> *zmq::atomic_ptr_t::set*(zmq::command_t*)
> (atomic_ptr.hpp:90)
> ==00:00:00:11.885 29399==by 0x6BD7422: zmq::ypipe_t 16>::flush() (ypipe.hpp:125)
> ==00:00:00:11.885 29399==by 0x6BD6DF5:
> zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
> ==00:00:00:11.885 29399==by 0x6BB9128:
> zmq::ctx_t::send_command(unsigned int, zmq::command_t const&) (ctx.cpp:438)
> ==00:00:00:11.885 29399==by 0x6BE34CE:
> zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
> ==00:00:00:11.885 29399==by 0x6BE26F8:
> zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
> ==00:00:00:11.885 29399==by 0x6BE68E2:
> zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
> ==00:00:00:11.885 29399==by 0x6C03D6C:
> zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*, zmq::pipe_t*)
> (socket_base.cpp:1006)
> ==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a block
> of size 224 alloc'd
> ==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned long,
> std::nothrow_t const&) (vg_replace_malloc.c:376)
> ==00:00:00:11.885 29399==by 0x6BB8B8D: zmq::ctx_t::create_socket(int)
> (ctx.cpp:351)
> ==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
> ==00:00:00:11.885 29399==by 0x6143809:
> ZmqClientSocket::Config(PubSubSocketConfig const&) (ZmqRequestReply.cpp:303)
> ==00:00:00:11.885 29399==by 0x6144069:
> ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:407)
> ==00:00:00:11.885 29399==by 0x61684EF: client_thread_main(void*)
> (ZmqRequestReplyUnitTests.cpp:132)
> ==00:00:00:11.886 29399==by 0x4C2F837: mythread_wrapper
> (hg_intercepts.c:389)
> ==00:00:00:11.886 29399==by 0x6E72463: start_thread
> (pthread_create.c:334)
> ==00:00:00:11.886 29399==by 0x92F901C: clone (clone.S:109)
> ==00:00:00:11.886 29399==  Block was alloc'd by thread #2
>
>
> Is this a known (and ignorable) issue with  zmq::atomic_ptr_t<>?
>
> Thanks,
> Francesco
>
>
>
>


Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-23 Thread Luca Boccassi
On Fri, 2018-02-23 at 12:22 +0100, Francesco wrote:
> Hi all,
> I'm trying to further debug the problem I described in my earlier
> mail (
> https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.ht
> ml) so
> I decided to use Helgrind to find race conditions in my code.
> 
> My problem is that apparently Helgrind 3.12.0 is reporting race
> conditions
> against zmq::atomic_ptr_t<> implementation.
> Now I know that Helgrind has troubles with C++11 atomics but by
> looking at
> the code I see that ZMQ is not using them (note: I do have
> ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
> defined, so the latter wins!).
> 
> In particular Helgrind 3.12.0 tells me that:
> 
> 
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== *Possible data race during read of size 8
> at
> 0xB373BF0 by thread #4*
> ==00:00:00:11.885 29399== Locks held: none
> ==00:00:00:11.885 29399==at 0x6BD79AB:
> *zmq::atomic_ptr_t::cas*(zmq::command_t*,
> zmq::command_t*)
> (atomic_ptr.hpp:150)
> ==00:00:00:11.885 29399==by 0x6BD7874:
> zmq::ypipe_t 16>::check_read() (ypipe.hpp:147)
> ==00:00:00:11.885 29399==by 0x6BD7288:
> zmq::ypipe_t 16>::read(zmq::command_t*) (ypipe.hpp:165)
> ==00:00:00:11.885 29399==by 0x6BD6FE7:
> zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
> ==00:00:00:11.885 29399==by 0x6BD29FC:
> zmq::io_thread_t::in_event()
> (io_thread.cpp:81)
> ==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
> (epoll.cpp:188)
> ==00:00:00:11.885 29399==by 0x6BD06C3:
> zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
> ==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine
> (thread.cpp:109)
> ==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
> (hg_intercepts.c:389)
> ==00:00:00:11.885 29399==by 0x6E72463: start_thread
> (pthread_create.c:334)
> ==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== This conflicts with a previous write of
> size 8 by
> thread #2
> ==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
> ==00:00:00:11.885 29399==at 0x6BD77F4:
> *zmq::atomic_ptr_t::set*(zmq::command_t*)
> (atomic_ptr.hpp:90)
> ==00:00:00:11.885 29399==by 0x6BD7422:
> zmq::ypipe_t 16>::flush() (ypipe.hpp:125)
> ==00:00:00:11.885 29399==by 0x6BD6DF5:
> zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
> ==00:00:00:11.885 29399==by 0x6BB9128:
> zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
> (ctx.cpp:438)
> ==00:00:00:11.885 29399==by 0x6BE34CE:
> zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
> ==00:00:00:11.885 29399==by 0x6BE26F8:
> zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
> ==00:00:00:11.885 29399==by 0x6BE68E2:
> zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
> ==00:00:00:11.885 29399==by 0x6C03D6C:
> zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
> zmq::pipe_t*)
> (socket_base.cpp:1006)
> ==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a
> block of
> size 224 alloc'd
> ==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned
> long,
> std::nothrow_t const&) (vg_replace_malloc.c:376)
> ==00:00:00:11.885 29399==by 0x6BB8B8D:
> zmq::ctx_t::create_socket(int)
> (ctx.cpp:351)
> ==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
> ==00:00:00:11.885 29399==by 0x6143809:
> ZmqClientSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:303)
> ==00:00:00:11.885 29399==by 0x6144069:
> ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:407)
> ==00:00:00:11.885 29399==by 0x61684EF: client_thread_main(void*)
> (ZmqRequestReplyUnitTests.cpp:132)
> ==00:00:00:11.886 29399==by 0x4C2F837: mythread_wrapper
> (hg_intercepts.c:389)
> ==00:00:00:11.886 29399==by 0x6E72463: start_thread
> (pthread_create.c:334)
> ==00:00:00:11.886 29399==by 0x92F901C: clone (clone.S:109)
> ==00:00:00:11.886 29399==  Block was alloc'd by thread #2
> 
> 
> Is this a known (and ignorable) issue with  zmq::atomic_ptr_t<>?
> 
> Thanks,
> Francesco

Yeah I started trying to put together a suppression file but never
finished it:

https://github.com/bluca/libzmq/commit/fb9ee9da7631f9506cbfcd6db29a284ae6e9651e

Hope to have time to finish working on it eventually (feel free to
contribute!). It's very noisy right now, since Helgrind can't know about our
lock-free queue implementation without a custom suppression file.
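
For reference, the entries are just plain Valgrind suppressions; the bluntest
possible one would be something like:

    {
       libzmq-lockfree-noise
       Helgrind:Race
       obj:*/libzmq.so*
    }

which silences every race whose innermost frame falls inside libzmq.so. That
is far too broad, which is exactly why a proper per-function file takes time
to get right.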

-- 
Kind regards,
Luca Boccassi



[zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-23 Thread Francesco
Hi all,
I'm trying to further debug the problem I described in my earlier mail (
https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.html) so
I decided to use Helgrind to find race conditions in my code.
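
(Concretely, the invocation is just the stock one, roughly:

    valgrind --time-stamp=yes --tool=helgrind ./my_test_binary

with the binary name being a placeholder here.)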

My problem is that apparently Helgrind 3.12.0 is reporting race conditions
against zmq::atomic_ptr_t<> implementation.
Now I know that Helgrind has troubles with C++11 atomics but by looking at
the code I see that ZMQ is not using them (note: I do have
ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
defined, so the latter wins!).

In particular Helgrind 3.12.0 tells me that:


==00:00:00:11.885 29399==
==00:00:00:11.885 29399== *Possible data race during read of size 8 at
0xB373BF0 by thread #4*
==00:00:00:11.885 29399== Locks held: none
==00:00:00:11.885 29399==at 0x6BD79AB:
*zmq::atomic_ptr_t<zmq::command_t>::cas*(zmq::command_t*, zmq::command_t*)
(atomic_ptr.hpp:150)
==00:00:00:11.885 29399==by 0x6BD7874: zmq::ypipe_t<zmq::command_t, 16>::check_read() (ypipe.hpp:147)
==00:00:00:11.885 29399==by 0x6BD7288: zmq::ypipe_t<zmq::command_t, 16>::read(zmq::command_t*) (ypipe.hpp:165)
==00:00:00:11.885 29399==by 0x6BD6FE7:
zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
==00:00:00:11.885 29399==by 0x6BD29FC: zmq::io_thread_t::in_event()
(io_thread.cpp:81)
==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
(epoll.cpp:188)
==00:00:00:11.885 29399==by 0x6BD06C3:
zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine (thread.cpp:109)
==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
(hg_intercepts.c:389)
==00:00:00:11.885 29399==by 0x6E72463: start_thread
(pthread_create.c:334)
==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
==00:00:00:11.885 29399==
==00:00:00:11.885 29399== This conflicts with a previous write of size 8 by
thread #2
==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
==00:00:00:11.885 29399==at 0x6BD77F4:
*zmq::atomic_ptr_t<zmq::command_t>::set*(zmq::command_t*)
(atomic_ptr.hpp:90)
==00:00:00:11.885 29399==by 0x6BD7422: zmq::ypipe_t<zmq::command_t, 16>::flush() (ypipe.hpp:125)
==00:00:00:11.885 29399==by 0x6BD6DF5:
zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
==00:00:00:11.885 29399==by 0x6BB9128:
zmq::ctx_t::send_command(unsigned int, zmq::command_t const&) (ctx.cpp:438)
==00:00:00:11.885 29399==by 0x6BE34CE:
zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
==00:00:00:11.885 29399==by 0x6BE26F8:
zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
==00:00:00:11.885 29399==by 0x6BE68E2:
zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
==00:00:00:11.885 29399==by 0x6C03D6C:
zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*, zmq::pipe_t*)
(socket_base.cpp:1006)
==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a block of
size 224 alloc'd
==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned long,
std::nothrow_t const&) (vg_replace_malloc.c:376)
==00:00:00:11.885 29399==by 0x6BB8B8D: zmq::ctx_t::create_socket(int)
(ctx.cpp:351)
==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
==00:00:00:11.885 29399==by 0x6143809:
ZmqClientSocket::Config(PubSubSocketConfig const&) (ZmqRequestReply.cpp:303)
==00:00:00:11.885 29399==by 0x6144069:
ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
(ZmqRequestReply.cpp:407)
==00:00:00:11.885 29399==by 0x61684EF: client_thread_main(void*)
(ZmqRequestReplyUnitTests.cpp:132)
==00:00:00:11.886 29399==by 0x4C2F837: mythread_wrapper
(hg_intercepts.c:389)
==00:00:00:11.886 29399==by 0x6E72463: start_thread
(pthread_create.c:334)
==00:00:00:11.886 29399==by 0x92F901C: clone (clone.S:109)
==00:00:00:11.886 29399==  Block was alloc'd by thread #2


Is this a known (and ignorable) issue with  zmq::atomic_ptr_t<>?

Thanks,
Francesco