Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Luca Boccassi
There are a couple in the use/setting of a handful of socket options,
but nothing major. The pipes and queues are fine.

On Sat, 2018-02-24 at 17:34 +0100, Francesco wrote:
> Actually even building zeromq with -fsanitize=thread I still get a
> lot of
> data races about ZMQ atomics:
> 
> WARNING: ThreadSanitizer: data race (pid=25770)
>   Atomic write of size 4 at 0x7d417e48 by thread T4:
> #0 __tsan_atomic32_fetch_add
> ../../../../gcc-5.3.0/libsanitizer/tsan/tsan_interface_atomic.cc:611
> (libtsan.so.0+0x0005860f)
> #1 zmq::atomic_counter_t::add(unsigned int)
> src/atomic_counter.hpp:116
> (libzmq.so.5+0x0001a087)
> #2 zmq::poller_base_t::adjust_load(int) src/poller_base.cpp:53
> (libzmq.so.5+0x0006d20d)
> #3 zmq::epoll_t::add_fd(int, zmq::i_poll_events*)
> src/epoll.cpp:91
> (libzmq.so.5+0x0003c229)
> #4 zmq::io_object_t::add_fd(int) src/io_object.cpp:66
> (libzmq.so.5+0x000400ef)
> #5 zmq::tcp_connecter_t::start_connecting()
> src/tcp_connecter.cpp:204
> (libzmq.so.5+0x000abf30)
> #6 zmq::tcp_connecter_t::process_plug() src/tcp_connecter.cpp:94
> (libzmq.so.5+0x000ab6cb)
> #7 zmq::object_t::process_command(zmq::command_t&)
> src/object.cpp:90
> (libzmq.so.5+0x00059e37)
> #8 zmq::io_thread_t::in_event() src/io_thread.cpp:85
> (libzmq.so.5+0x00040cb3)
> #9 zmq::epoll_t::loop() src/epoll.cpp:188
> (libzmq.so.5+0x0003ce6d)
> #10 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
> (libzmq.so.5+0x0003cfb6)
> #11 thread_routine src/thread.cpp:109
> (libzmq.so.5+0x000af00b)
> 
>   Previous read of size 4 at 0x7d417e48 by thread T2:
> #0 zmq::atomic_counter_t::get() const src/atomic_counter.hpp:210
> (libzmq.so.5+0x00061d20)
> #1 zmq::poller_base_t::get_load() src/poller_base.cpp:47
> (libzmq.so.5+0x0006d1c6)
> #2 zmq::io_thread_t::get_load() src/io_thread.cpp:72
> (libzmq.so.5+0x00040bef)
> #3 zmq::ctx_t::choose_io_thread(unsigned long) src/ctx.cpp:451
> (libzmq.so.5+0x000184ff)
> #4 zmq::object_t::choose_io_thread(unsigned long)
> src/object.cpp:199
> (libzmq.so.5+0x0005a67f)
> #5 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:610
> (libzmq.so.5+0x0008bca6)
> #6 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
>    
> 
> or about ypipe:
> 
> WARNING: ThreadSanitizer: data race (pid=25770)
>   Read of size 8 at 0x7d640001e0c0 by thread T4:
> #0 zmq::ypipe_t::read(zmq::command_t*)
> src/ypipe.hpp:170 (libzmq.so.5+0x00047e49)
> #1 zmq::mailbox_t::recv(zmq::command_t*, int) src/mailbox.cpp:73
> (libzmq.so.5+0x00047793)
> #2 zmq::io_thread_t::in_event() src/io_thread.cpp:86
> (libzmq.so.5+0x00040cd5)
> #3 zmq::epoll_t::loop() src/epoll.cpp:188
> (libzmq.so.5+0x0003ce6d)
> #4 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
> (libzmq.so.5+0x0003cfb6)
> #5 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)
> 
>   Previous write of size 8 at 0x7d640001e0c0 by thread T2 (mutexes:
> write
> M500):
> #0 zmq::ypipe_t::write(zmq::command_t const&,
> bool)
> src/ypipe.hpp:85 (libzmq.so.5+0x00047f05)
> #1 zmq::mailbox_t::send(zmq::command_t const&) src/mailbox.cpp:62
> (libzmq.so.5+0x000476dc)
> #2 zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
> src/ctx.cpp:438 (libzmq.so.5+0x00018420)
> #3 zmq::object_t::send_command(zmq::command_t&)
> src/object.cpp:474
> (libzmq.so.5+0x0005c07b)
> #4 zmq::object_t::send_plug(zmq::own_t*, bool) src/object.cpp:220
> (libzmq.so.5+0x0005a7ef)
> #5 zmq::own_t::launch_child(zmq::own_t*) src/own.cpp:87
> (libzmq.so.5+0x0006134f)
> #6 zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
> zmq::pipe_t*) src/socket_base.cpp:1006 (libzmq.so.5+0x0008e081)
> #7 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:630
> (libzmq.so.5+0x0008be89)
> #8 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
> ...
> 
> 
> I tried forcing use of C++11 internal atomics and all the warnings
> about  zmq::atomic_ptr_t<> disappear (I still get tons about ypipe
> and
> "pipe")..
> maybe it's best to prefer C++11 atomics when available?
> 
> Btw I will try to write a suppression file for ThreadSanitizer and
> ZMQ... I
> just start to doubt: is all ZMQ code really race-free? :)
> 
> 
> Thanks,
> Francesco
> 
> 
> 
> 
> 2018-02-24 16:44 GMT+01:00 Francesco :
> 
> > Hi Bill,
> > indeed I've found ThreadSanitizer to be more effective i.e., it
> > contains
> > much less false positives compared to Helgrind... still I'm getting
> > several
> > race condition warnings out of libzmq 4.2.3... indeed I actually
> > built only
> > my application code with -fsanitize=thread... do you know if I need
> > to
> > rebuild libzmq with that option as well? I will try to see if it
> > makes any
> > difference!
> > 
> > Thanks,
> > 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Bill Torpey
Francesco:

Please see my comments below.  Full disclaimer: I don’t pretend to understand 
ZMQ internals, and a lot of this information came via trial and error.  YMMV, 
but I hope this helps.  If you find out more, it would be much appreciated if 
you post back to this thread — thanks!

Regards,

Bill

> On Feb 24, 2018, at 11:34 AM, Francesco  wrote:
> 
> Actually even building zeromq with -fsanitize=thread I still get a lot of 
> data races about ZMQ atomics:
> 
> WARNING: ThreadSanitizer: data race (pid=25770)
>   Atomic write of size 4 at 0x7d417e48 by thread T4:
> #0 __tsan_atomic32_fetch_add 
> ../../../../gcc-5.3.0/libsanitizer/tsan/tsan_interface_atomic.cc:611 
> (libtsan.so.0+0x0005860f)
> #1 zmq::atomic_counter_t::add(unsigned int) src/atomic_counter.hpp:116 
> (libzmq.so.5+0x0001a087)
> #2 zmq::poller_base_t::adjust_load(int) src/poller_base.cpp:53 
> (libzmq.so.5+0x0006d20d)
> #3 zmq::epoll_t::add_fd(int, zmq::i_poll_events*) src/epoll.cpp:91 
> (libzmq.so.5+0x0003c229)
> #4 zmq::io_object_t::add_fd(int) src/io_object.cpp:66 
> (libzmq.so.5+0x000400ef)
> #5 zmq::tcp_connecter_t::start_connecting() src/tcp_connecter.cpp:204 
> (libzmq.so.5+0x000abf30)
> #6 zmq::tcp_connecter_t::process_plug() src/tcp_connecter.cpp:94 
> (libzmq.so.5+0x000ab6cb)
> #7 zmq::object_t::process_command(zmq::command_t&) src/object.cpp:90 
> (libzmq.so.5+0x00059e37)
> #8 zmq::io_thread_t::in_event() src/io_thread.cpp:85 
> (libzmq.so.5+0x00040cb3)
> #9 zmq::epoll_t::loop() src/epoll.cpp:188 (libzmq.so.5+0x0003ce6d)
> #10 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203 
> (libzmq.so.5+0x0003cfb6)
> #11 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)
> 
>   Previous read of size 4 at 0x7d417e48 by thread T2:
> #0 zmq::atomic_counter_t::get() const src/atomic_counter.hpp:210 
> (libzmq.so.5+0x00061d20)
> #1 zmq::poller_base_t::get_load() src/poller_base.cpp:47 
> (libzmq.so.5+0x0006d1c6)
> #2 zmq::io_thread_t::get_load() src/io_thread.cpp:72 
> (libzmq.so.5+0x00040bef)
> #3 zmq::ctx_t::choose_io_thread(unsigned long) src/ctx.cpp:451 
> (libzmq.so.5+0x000184ff)
> #4 zmq::object_t::choose_io_thread(unsigned long) src/object.cpp:199 
> (libzmq.so.5+0x0005a67f)
> #5 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:610 
> (libzmq.so.5+0x0008bca6)
> #6 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
>….


Since I don’t see any references to mutexes above, I assume that the code does 
not use mutexes.  Otherwise you would get a warning that looks more like this:

  Previous write of size 8 at 0x7b0864b8 by thread T5 (mutexes: write M88, 
write M90, write M87):

TSAN keeps track of the mutexes held at the time of each memory access — if two 
memory accesses occur on two different threads, but they have at least one 
mutex in common, TSAN accepts the access as race-free.  If the two accesses 
don’t have at least one mutex in common, then the access is considered a data 
race.

If T2 and T4 were application code, this would be a clear violation of ZMQ’s 
threading rules (assuming “legacy”, non-thread-safe socket types).  For 
instance, one technique would be to perform initialization (e.g., bind’s) in 
the main thread, and only after that is done spin up other threads to do 
processing.  In this case, TSAN wouldn't have any way to know that the 
application guarantees that accesses cannot result in a race, so TSAN would 
flag it.  I’ve gotten in the habit of using mutexes to protect code like this 
even though it should not strictly be needed, just to keep tools like TSAN 
happy, and also because I don’t know the ZMQ code well enough to be 100% 
certain that the mutexes are not necessary — better safe than sorry!

This situation is different, though, since T4 is not an application thread — 
it’s an internal ZMQ worker thread.   So, I think in this case we kind of have 
to accept that ZMQ is doing the right thing here.

At least, that’s the approach I’ve been taking.  When I instrument my apps and 
libraries with TSAN I specifically do NOT instrument ZMQ, and I also use the 
“called_from_lib:libzmq.so” suppression (which is listed as an example for 
TSAN: https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions 
).

Instrumenting libzmq and/or omitting the suppression causes a LOT of warnings, 
esp. in the ZMQ worker threads.  So, unless I'm willing to commit the time and 
effort to go through and investigate each of these warnings, I feel I have 
little choice but to accept that at this point in its lifetime ZMQ should be 
race-free for all practical purposes.

FWIW, I’ve done fairly extensive testing, and specifically stress testing, and 
have yet to find anything that looks like an honest-to-goodness bug in ZMQ.  
(Which is 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Francesco
Actually even building zeromq with -fsanitize=thread I still get a lot of
data races about ZMQ atomics:

WARNING: ThreadSanitizer: data race (pid=25770)
  Atomic write of size 4 at 0x7d417e48 by thread T4:
#0 __tsan_atomic32_fetch_add
../../../../gcc-5.3.0/libsanitizer/tsan/tsan_interface_atomic.cc:611
(libtsan.so.0+0x0005860f)
#1 zmq::atomic_counter_t::add(unsigned int) src/atomic_counter.hpp:116
(libzmq.so.5+0x0001a087)
#2 zmq::poller_base_t::adjust_load(int) src/poller_base.cpp:53
(libzmq.so.5+0x0006d20d)
#3 zmq::epoll_t::add_fd(int, zmq::i_poll_events*) src/epoll.cpp:91
(libzmq.so.5+0x0003c229)
#4 zmq::io_object_t::add_fd(int) src/io_object.cpp:66
(libzmq.so.5+0x000400ef)
#5 zmq::tcp_connecter_t::start_connecting() src/tcp_connecter.cpp:204
(libzmq.so.5+0x000abf30)
#6 zmq::tcp_connecter_t::process_plug() src/tcp_connecter.cpp:94
(libzmq.so.5+0x000ab6cb)
#7 zmq::object_t::process_command(zmq::command_t&) src/object.cpp:90
(libzmq.so.5+0x00059e37)
#8 zmq::io_thread_t::in_event() src/io_thread.cpp:85
(libzmq.so.5+0x00040cb3)
#9 zmq::epoll_t::loop() src/epoll.cpp:188 (libzmq.so.5+0x0003ce6d)
#10 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
(libzmq.so.5+0x0003cfb6)
#11 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)

  Previous read of size 4 at 0x7d417e48 by thread T2:
#0 zmq::atomic_counter_t::get() const src/atomic_counter.hpp:210
(libzmq.so.5+0x00061d20)
#1 zmq::poller_base_t::get_load() src/poller_base.cpp:47
(libzmq.so.5+0x0006d1c6)
#2 zmq::io_thread_t::get_load() src/io_thread.cpp:72
(libzmq.so.5+0x00040bef)
#3 zmq::ctx_t::choose_io_thread(unsigned long) src/ctx.cpp:451
(libzmq.so.5+0x000184ff)
#4 zmq::object_t::choose_io_thread(unsigned long) src/object.cpp:199
(libzmq.so.5+0x0005a67f)
#5 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:610
(libzmq.so.5+0x0008bca6)
#6 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
   

or about ypipe:

WARNING: ThreadSanitizer: data race (pid=25770)
  Read of size 8 at 0x7d640001e0c0 by thread T4:
#0 zmq::ypipe_t::read(zmq::command_t*)
src/ypipe.hpp:170 (libzmq.so.5+0x00047e49)
#1 zmq::mailbox_t::recv(zmq::command_t*, int) src/mailbox.cpp:73
(libzmq.so.5+0x00047793)
#2 zmq::io_thread_t::in_event() src/io_thread.cpp:86
(libzmq.so.5+0x00040cd5)
#3 zmq::epoll_t::loop() src/epoll.cpp:188 (libzmq.so.5+0x0003ce6d)
#4 zmq::epoll_t::worker_routine(void*) src/epoll.cpp:203
(libzmq.so.5+0x0003cfb6)
#5 thread_routine src/thread.cpp:109 (libzmq.so.5+0x000af00b)

  Previous write of size 8 at 0x7d640001e0c0 by thread T2 (mutexes: write
M500):
#0 zmq::ypipe_t::write(zmq::command_t const&, bool)
src/ypipe.hpp:85 (libzmq.so.5+0x00047f05)
#1 zmq::mailbox_t::send(zmq::command_t const&) src/mailbox.cpp:62
(libzmq.so.5+0x000476dc)
#2 zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
src/ctx.cpp:438 (libzmq.so.5+0x00018420)
#3 zmq::object_t::send_command(zmq::command_t&) src/object.cpp:474
(libzmq.so.5+0x0005c07b)
#4 zmq::object_t::send_plug(zmq::own_t*, bool) src/object.cpp:220
(libzmq.so.5+0x0005a7ef)
#5 zmq::own_t::launch_child(zmq::own_t*) src/own.cpp:87
(libzmq.so.5+0x0006134f)
#6 zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
zmq::pipe_t*) src/socket_base.cpp:1006 (libzmq.so.5+0x0008e081)
#7 zmq::socket_base_t::bind(char const*) src/socket_base.cpp:630
(libzmq.so.5+0x0008be89)
#8 zmq_bind src/zmq.cpp:326 (libzmq.so.5+0x000c8735)
...


I tried forcing use of C++11 internal atomics and all the warnings
about  zmq::atomic_ptr_t<> disappear (I still get tons about ypipe and
"pipe")..
maybe it's best to prefer C++11 atomics when available?

Btw I will try to write a suppression file for ThreadSanitizer and ZMQ... I
just start to doubt: is all ZMQ code really race-free? :)


Thanks,
Francesco




2018-02-24 16:44 GMT+01:00 Francesco :

> Hi Bill,
> indeed I've found ThreadSanitizer to be more effective i.e., it contains
> much less false positives compared to Helgrind... still I'm getting several
> race condition warnings out of libzmq 4.2.3... indeed I actually built only
> my application code with -fsanitize=thread... do you know if I need to
> rebuild libzmq with that option as well? I will try to see if it makes any
> difference!
>
> Thanks,
> Francesco
>
>
>
>
> 2018-02-24 15:17 GMT+01:00 Bill Torpey :
>
>> I’m using clang’s Thread Sanitizer for a similar purpose, and just
>> happened to notice that the TSAN docs use ZeroMQ as one of the example
>> suppressions:  https://github.com/google/san
>> itizers/wiki/ThreadSanitizerSuppressions
>>
>> I assume that the reason for suppressing libzmq.so has to do with
>> (legacy) sockets not being thread-safe, so 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Francesco
Hi Bill,
indeed I've found ThreadSanitizer to be more effective i.e., it contains
much less false positives compared to Helgrind... still I'm getting several
race condition warnings out of libzmq 4.2.3... indeed I actually built only
my application code with -fsanitize=thread... do you know if I need to
rebuild libzmq with that option as well? I will try to see if it makes any
difference!

Thanks,
Francesco




2018-02-24 15:17 GMT+01:00 Bill Torpey :

> I’m using clang’s Thread Sanitizer for a similar purpose, and just
> happened to notice that the TSAN docs use ZeroMQ as one of the example
> suppressions:  https://github.com/google/sanitizers/wiki/
> ThreadSanitizerSuppressions
>
> I assume that the reason for suppressing libzmq.so has to do with (legacy)
> sockets not being thread-safe, so the code may exhibit race conditions that
> are irrelevant given that the code is not intended to be called from
> multiple threads.
>
> FWIW, you may want to check out the clang sanitizers — they have certain
> advantages over valgrind — (faster, multi-threaded, etc.) if you are able
> to instrument the code at build time.
>
>
> On Feb 23, 2018, at 6:52 AM, Luca Boccassi 
> wrote:
>
> On Fri, 2018-02-23 at 12:22 +0100, Francesco wrote:
>
> Hi all,
> I'm trying to further debug the problem I described in my earlier
> mail (
> https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.ht
> ml) so
> I decided to use Helgrind to find race conditions in my code.
>
> My problem is that apparently Helgrind 3.12.0 is reporting race
> conditions
> against zmq::atomic_ptr_t<> implementation.
> Now I know that Helgrind has troubles with C++11 atomics but by
> looking at
> the code I see that ZMQ is not using them (note: I do have
> ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
> defined, so the latter wins!).
>
> In particular Helgrind 3.12.0 tells me that:
>
>
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== *Possible data race during read of size 8
> at
> 0xB373BF0 by thread #4*
> ==00:00:00:11.885 29399== Locks held: none
> ==00:00:00:11.885 29399==at 0x6BD79AB:
> *zmq::atomic_ptr_t::cas*(zmq::command_t*,
> zmq::command_t*)
> (atomic_ptr.hpp:150)
> ==00:00:00:11.885 29399==by 0x6BD7874:
> zmq::ypipe_t 16>::check_read() (ypipe.hpp:147)
> ==00:00:00:11.885 29399==by 0x6BD7288:
> zmq::ypipe_t 16>::read(zmq::command_t*) (ypipe.hpp:165)
> ==00:00:00:11.885 29399==by 0x6BD6FE7:
> zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
> ==00:00:00:11.885 29399==by 0x6BD29FC:
> zmq::io_thread_t::in_event()
> (io_thread.cpp:81)
> ==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
> (epoll.cpp:188)
> ==00:00:00:11.885 29399==by 0x6BD06C3:
> zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
> ==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine
> (thread.cpp:109)
> ==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
> (hg_intercepts.c:389)
> ==00:00:00:11.885 29399==by 0x6E72463: start_thread
> (pthread_create.c:334)
> ==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
> ==00:00:00:11.885 29399==
> ==00:00:00:11.885 29399== This conflicts with a previous write of
> size 8 by
> thread #2
> ==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
> ==00:00:00:11.885 29399==at 0x6BD77F4:
> *zmq::atomic_ptr_t::set*(zmq::command_t*)
> (atomic_ptr.hpp:90)
> ==00:00:00:11.885 29399==by 0x6BD7422:
> zmq::ypipe_t 16>::flush() (ypipe.hpp:125)
> ==00:00:00:11.885 29399==by 0x6BD6DF5:
> zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
> ==00:00:00:11.885 29399==by 0x6BB9128:
> zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
> (ctx.cpp:438)
> ==00:00:00:11.885 29399==by 0x6BE34CE:
> zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
> ==00:00:00:11.885 29399==by 0x6BE26F8:
> zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
> ==00:00:00:11.885 29399==by 0x6BE68E2:
> zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
> ==00:00:00:11.885 29399==by 0x6C03D6C:
> zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
> zmq::pipe_t*)
> (socket_base.cpp:1006)
> ==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a
> block of
> size 224 alloc'd
> ==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned
> long,
> std::nothrow_t const&) (vg_replace_malloc.c:376)
> ==00:00:00:11.885 29399==by 0x6BB8B8D:
> zmq::ctx_t::create_socket(int)
> (ctx.cpp:351)
> ==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
> ==00:00:00:11.885 29399==by 0x6143809:
> ZmqClientSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:303)
> ==00:00:00:11.885 29399==by 0x6144069:
> ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
> (ZmqRequestReply.cpp:407)
> ==00:00:00:11.885 29399==by 0x61684EF: 

Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-24 Thread Bill Torpey
I’m using clang’s Thread Sanitizer for a similar purpose, and just happened to 
notice that the TSAN docs use ZeroMQ as one of the example suppressions:  
https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions 


I assume that the reason for suppressing libzmq.so has to do with (legacy) 
sockets not being thread-safe, so the code may exhibit race conditions that are 
irrelevant given that the code is not intended to be called from multiple 
threads.

FWIW, you may want to check out the clang sanitizers — they have certain 
advantages over valgrind — (faster, multi-threaded, etc.) if you are able to 
instrument the code at build time.


> On Feb 23, 2018, at 6:52 AM, Luca Boccassi  wrote:
> 
> On Fri, 2018-02-23 at 12:22 +0100, Francesco wrote:
>> Hi all,
>> I'm trying to further debug the problem I described in my earlier
>> mail (
>> https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.ht
>> ml) so
>> I decided to use Helgrind to find race conditions in my code.
>> 
>> My problem is that apparently Helgrind 3.12.0 is reporting race
>> conditions
>> against zmq::atomic_ptr_t<> implementation.
>> Now I know that Helgrind has troubles with C++11 atomics but by
>> looking at
>> the code I see that ZMQ is not using them (note: I do have
>> ZMQ_ATOMIC_PTR_CXX11 defined but I also have ZMQ_ATOMIC_PTR_INTRINSIC
>> defined, so the latter wins!).
>> 
>> In particular Helgrind 3.12.0 tells me that:
>> 
>> 
>> ==00:00:00:11.885 29399==
>> ==00:00:00:11.885 29399== *Possible data race during read of size 8
>> at
>> 0xB373BF0 by thread #4*
>> ==00:00:00:11.885 29399== Locks held: none
>> ==00:00:00:11.885 29399==at 0x6BD79AB:
>> *zmq::atomic_ptr_t::cas*(zmq::command_t*,
>> zmq::command_t*)
>> (atomic_ptr.hpp:150)
>> ==00:00:00:11.885 29399==by 0x6BD7874:
>> zmq::ypipe_t> 16>::check_read() (ypipe.hpp:147)
>> ==00:00:00:11.885 29399==by 0x6BD7288:
>> zmq::ypipe_t> 16>::read(zmq::command_t*) (ypipe.hpp:165)
>> ==00:00:00:11.885 29399==by 0x6BD6FE7:
>> zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
>> ==00:00:00:11.885 29399==by 0x6BD29FC:
>> zmq::io_thread_t::in_event()
>> (io_thread.cpp:81)
>> ==00:00:00:11.885 29399==by 0x6BD05C1: zmq::epoll_t::loop()
>> (epoll.cpp:188)
>> ==00:00:00:11.885 29399==by 0x6BD06C3:
>> zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
>> ==00:00:00:11.885 29399==by 0x6C18BA5: thread_routine
>> (thread.cpp:109)
>> ==00:00:00:11.885 29399==by 0x4C2F837: mythread_wrapper
>> (hg_intercepts.c:389)
>> ==00:00:00:11.885 29399==by 0x6E72463: start_thread
>> (pthread_create.c:334)
>> ==00:00:00:11.885 29399==by 0x92F901C: clone (clone.S:109)
>> ==00:00:00:11.885 29399==
>> ==00:00:00:11.885 29399== This conflicts with a previous write of
>> size 8 by
>> thread #2
>> ==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
>> ==00:00:00:11.885 29399==at 0x6BD77F4:
>> *zmq::atomic_ptr_t::set*(zmq::command_t*)
>> (atomic_ptr.hpp:90)
>> ==00:00:00:11.885 29399==by 0x6BD7422:
>> zmq::ypipe_t> 16>::flush() (ypipe.hpp:125)
>> ==00:00:00:11.885 29399==by 0x6BD6DF5:
>> zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
>> ==00:00:00:11.885 29399==by 0x6BB9128:
>> zmq::ctx_t::send_command(unsigned int, zmq::command_t const&)
>> (ctx.cpp:438)
>> ==00:00:00:11.885 29399==by 0x6BE34CE:
>> zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
>> ==00:00:00:11.885 29399==by 0x6BE26F8:
>> zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
>> ==00:00:00:11.885 29399==by 0x6BE68E2:
>> zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
>> ==00:00:00:11.885 29399==by 0x6C03D6C:
>> zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*,
>> zmq::pipe_t*)
>> (socket_base.cpp:1006)
>> ==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a
>> block of
>> size 224 alloc'd
>> ==00:00:00:11.885 29399==at 0x4C2A6FD: operator new(unsigned
>> long,
>> std::nothrow_t const&) (vg_replace_malloc.c:376)
>> ==00:00:00:11.885 29399==by 0x6BB8B8D:
>> zmq::ctx_t::create_socket(int)
>> (ctx.cpp:351)
>> ==00:00:00:11.885 29399==by 0x6C284D5: zmq_socket (zmq.cpp:267)
>> ==00:00:00:11.885 29399==by 0x6143809:
>> ZmqClientSocket::Config(PubSubSocketConfig const&)
>> (ZmqRequestReply.cpp:303)
>> ==00:00:00:11.885 29399==by 0x6144069:
>> ZmqClientMultiSocket::Config(PubSubSocketConfig const&)
>> (ZmqRequestReply.cpp:407)
>> ==00:00:00:11.885 29399==by 0x61684EF: client_thread_main(void*)
>> (ZmqRequestReplyUnitTests.cpp:132)
>> ==00:00:00:11.886 29399==by 0x4C2F837: mythread_wrapper
>> (hg_intercepts.c:389)
>> ==00:00:00:11.886 29399==by 0x6E72463: start_thread
>> (pthread_create.c:334)
>> ==00:00:00:11.886 29399==by 0x92F901C: clone (clone.S:109)
>> ==00:00:00:11.886 29399==  Block was alloc'd by 

Re: [zeromq-dev] zproto- terminated events

2018-02-24 Thread Wes Young
[for the search archives]

> On Feb 16, 2018, at 9:00 AM, Wes Young  wrote:
> 
> (looking at zgossip.c and zgossip_engine.inc, and using versions that are 
> in-sync with current zproto master)
> 
> in zproto should a “termination” state trigger a socket disconnect (eg: a TCP 
> FIN?) in the state machine? or is this something that needs to be handled in 
> your server code? and it’s just assumed the TCP state stays ‘open’ unless 
> you’ve implemented some lower level heart-beats or actually force a term of 
> the tcp socket itself(?)
> 
> for instance- when zgossip triggers an expired/term event- the state goes to 
> term and it appears to remove the client from it’s list(?), but it never 
> seems to clean up the TCP connection unless there’s a hard disconnect. i 
> can’t tell if this is on purpose and supposed to be done in the client, or if 
> that’s just where folks left off last.. along with PING/PONG/TTL, 
> zgossip.c:client_terminate is empty atm.

this is really more of a lower level zmq feature that i musta brain-farted on. 
the client is really responsible for the initiating a FIN, which is fine. the 
other issue seems to be, unlike other zproto protocols gossip *can* be really 
quiet at times. the zproto application server thinks the client is gone, but a 
FIN never went through (there was no lower level zmq disconnect). the other 
protocols seem to get around this issue by being more chatty, having longer 
timeouts and the correct implementation of PING/PONG which was started in 
gossip, but not quite finished (testing a PR for this).

also testing some TTL ideas that will influence the timeouts (eg: when things 
go quiet).

> i’m looking through the zproto docs and examples and think i’m just missing 
> something dumb here.. might be mixing some terms too.

--
wes
wesyoung.me



signature.asc
Description: Message signed with OpenPGP
___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
https://lists.zeromq.org/mailman/listinfo/zeromq-dev