Re: [zeromq-dev] Multiple connects with a ZMQ_REQ socket appears not to work

2018-02-23 Thread Justin Karneges
On Fri, Feb 23, 2018, at 4:53 PM, Mark via zeromq-dev wrote:
> As I mentioned previously, the docs on REQ/REP state:
> 
>    "If no services are available, then any send operation on the
>    socket shall block until at least one service becomes available."
> 
> but the send() doesn't block in this situation. As you said earlier,
> the message is queued and sent asynchronously, with send() returning
> to the caller.
> 
> I also misinterpreted the docs to imply that when "at least one
> service becomes available" any queued messages would go to that
> service, but that's not the case: the round-robin decision is only
> made once, on the call to send(), not during the async sending
> process.
> 
> Anyhoo, it's all good now that I know better.

Yeah, one thing that tripped me up when I started out with ZeroMQ was 
understanding that queues only exist in the context of known peers, and that a 
zmq socket doesn't have a master queue or anything that exists in the absence 
of known peers. This is why writing to a socket that is bound but has no peers 
will block.

I say "known peers" rather than "peer connections" because you get a queue once 
you attempt to connect to a peer, even if that connection is not yet 
established.
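
For example (an untested sketch of mine, not from the thread; the endpoint
and port are arbitrary):

    /* Sketch: connect() creates a per-peer queue immediately, even though
       nothing is listening on this port. */
    #include <assert.h>
    #include <zmq.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *req = zmq_socket(ctx, ZMQ_REQ);

        int linger = 0;  /* don't block context teardown on the undelivered message */
        zmq_setsockopt(req, ZMQ_LINGER, &linger, sizeof linger);

        zmq_connect(req, "tcp://127.0.0.1:5599");  /* no listener here */

        /* Returns 5 immediately: the message is queued for the
           not-yet-established peer, not delivered. A REQ socket that had
           only bind()ed with no peers would block here instead. */
        int rc = zmq_send(req, "hello", 5, 0);
        assert(rc == 5);

        zmq_close(req);
        zmq_ctx_destroy(ctx);
        return 0;
    }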

Justin


Re: [zeromq-dev] Multiple connects with a ZMQ_REQ socket appears not to work

2018-02-23 Thread Mark via zeromq-dev
> Have a read of the zguide, there should be plenty of information on
> various failover patterns: http://zguide.zeromq.org/

Yep, read that. I'm happy with my REQ/REP exchanger pattern now that I
have a better grasp of the underlying behaviour.

As I mentioned previously, the docs on REQ/REP state:

    "If no services are available, then any send operation on the
    socket shall block until at least one service becomes available."

but the send() doesn't block in this situation. As you said earlier,
the message is queued and sent asynchronously, with send() returning
to the caller.

I also misinterpreted the docs to imply that when "at least one
service becomes available" any queued messages would go to that
service, but that's not the case: the round-robin decision is only
made once, on the call to send(), not during the async sending
process.

Anyhoo, it's all good now that I know better.


Mark.


Re: [zeromq-dev] Multiple connects with a ZMQ_REQ socket appears not to work

2018-02-23 Thread Luca Boccassi
On Fri, 2018-02-23 at 19:18 +, Mark wrote:
> On 23Feb18, Luca Boccassi allegedly wrote:
> > That's because it's round-robin, and the connection is async - so it
> > will wait on the first server to respond, and it never does, so it's
> > blocked there. Sounds like what you really want is "fail-over" - i.e.
> > if the first does not respond, try the second. That might work if you
> > tune the TCP reconnect options to have a small timeout, so that the
> > pipe is removed - by default it's quite large. Not sure it will work -
> > try it.
> 
> The TCP (re)connection is being established just fine so I doubt
> re-connect options will help. (But on your suggestion I did try
> adjusting re-connect opts just to be sure - no effect).
> 
> What did help somewhat was setting ZMQ_IMMEDIATE.
> 
> But that doesn't handle the corner case of the connection appearing to
> be up as far as 0MQ is concerned while, unbeknownst to it, the TCP
> socket is dead or will be dead when it tries to use it next.

That's what I meant: there are socket options for keepalives and for
reconnect timeouts (which also act as connect timeouts). If you tweak
those, the first connection's pipe might get discarded after the
timeout. Needs testing, though.
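
Something like this untested sketch (the values are illustrative, not
recommendations):

    #include <zmq.h>

    static void tune_req_socket(void *sock)
    {
        int immediate = 1;          /* ZMQ_IMMEDIATE: queue only on completed connections */
        int keepalive = 1;          /* ZMQ_TCP_KEEPALIVE: enable TCP keepalive probes */
        int keepalive_idle = 30;    /* ZMQ_TCP_KEEPALIVE_IDLE: seconds before first probe */
        int reconnect_ivl = 100;    /* ZMQ_RECONNECT_IVL: ms between reconnect attempts */
        int reconnect_max = 1000;   /* ZMQ_RECONNECT_IVL_MAX: cap on the back-off */
        int connect_timeout = 2000; /* ZMQ_CONNECT_TIMEOUT: ms before a connect attempt is dropped */

        zmq_setsockopt(sock, ZMQ_IMMEDIATE, &immediate, sizeof immediate);
        zmq_setsockopt(sock, ZMQ_TCP_KEEPALIVE, &keepalive, sizeof keepalive);
        zmq_setsockopt(sock, ZMQ_TCP_KEEPALIVE_IDLE, &keepalive_idle, sizeof keepalive_idle);
        zmq_setsockopt(sock, ZMQ_RECONNECT_IVL, &reconnect_ivl, sizeof reconnect_ivl);
        zmq_setsockopt(sock, ZMQ_RECONNECT_IVL_MAX, &reconnect_max, sizeof reconnect_max);
        zmq_setsockopt(sock, ZMQ_CONNECT_TIMEOUT, &connect_timeout, sizeof connect_timeout);
    }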

-- 
Kind regards,
Luca Boccassi



Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-23 Thread Thomas Rodgers
I don’t know if this has changed recently, but at the time I added the
compiler intrinsics support, it was generally deemed undesirable to require
C++11.
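
For illustration only (my sketch, not the libzmq source): on GCC/Clang the
intrinsics path boils down to something like the builtin below, with no
C++11 <atomic> dependency.

    /* Roughly what atomic_ptr_t's CAS does when ZMQ_ATOMIC_PTR_INTRINSIC
       is selected; returns the value previously stored at *ptr. */
    void *atomic_ptr_cas(void **ptr, void *cmp, void *val)
    {
        /* On failure, cmp is rewritten with the current value, so returning
           cmp yields the old value whether or not the exchange happened. */
        __atomic_compare_exchange_n(ptr, &cmp, val, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return cmp;
    }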

On Fri, Feb 23, 2018 at 5:22 AM Francesco wrote:

> Hi all,
> I'm trying to further debug the problem I described in my earlier mail
> (https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.html),
> so I decided to use Helgrind to find race conditions in my code.
>
> My problem is that Helgrind 3.12.0 is apparently reporting race conditions
> in the zmq::atomic_ptr_t<> implementation.
> Now, I know that Helgrind has trouble with C++11 atomics, but looking at
> the code I see that ZMQ is not using them (note: I do have
> ZMQ_ATOMIC_PTR_CXX11 defined, but I also have ZMQ_ATOMIC_PTR_INTRINSIC
> defined, so the latter wins!).
>
> In particular Helgrind 3.12.0 tells me that:
>
> [Helgrind trace snipped; the full trace appears in Francesco's original message below.]
>
> Is this a known (and ignorable) issue with zmq::atomic_ptr_t<>?
>
> Thanks,
> Francesco


Re: [zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-23 Thread Luca Boccassi
On Fri, 2018-02-23 at 12:22 +0100, Francesco wrote:
> Hi all,
> I'm trying to further debug the problem I described in my earlier mail
> (https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.html),
> so I decided to use Helgrind to find race conditions in my code.
> 
> My problem is that Helgrind 3.12.0 is apparently reporting race
> conditions in the zmq::atomic_ptr_t<> implementation.
> Now, I know that Helgrind has trouble with C++11 atomics, but looking at
> the code I see that ZMQ is not using them (note: I do have
> ZMQ_ATOMIC_PTR_CXX11 defined, but I also have ZMQ_ATOMIC_PTR_INTRINSIC
> defined, so the latter wins!).
> 
> In particular Helgrind 3.12.0 tells me that:
> 
> [Helgrind trace snipped; the full trace appears in Francesco's original message below.]
> 
> Is this a known (and ignorable) issue with zmq::atomic_ptr_t<>?
> 
> Thanks,
> Francesco

Yeah I started trying to put together a suppression file but never
finished it:

https://github.com/bluca/libzmq/commit/fb9ee9da7631f9506cbfcd6db29a284ae6e9651e

I hope to have time to finish working on it eventually (feel free to
contribute!). It's very noisy right now, since Helgrind can't know about
our lock-free queue implementation without a custom suppression file.
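
For anyone experimenting before that lands, a hand-written entry would look
roughly like this (untested; the name is mine and the frame pattern is only
a guess):

    {
       zmq_lockfree_atomic_ptr_race
       Helgrind:Race
       fun:*atomic_ptr_t*
       ...
    }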

-- 
Kind regards,
Luca Boccassi



[zeromq-dev] are zmq::atomic_ptr_t<> Helgrind warnings known?

2018-02-23 Thread Francesco
Hi all,
I'm trying to further debug the problem I described in my earlier mail
(https://lists.zeromq.org/pipermail/zeromq-dev/2018-February/032303.html),
so I decided to use Helgrind to find race conditions in my code.

My problem is that Helgrind 3.12.0 is apparently reporting race conditions
in the zmq::atomic_ptr_t<> implementation.
Now, I know that Helgrind has trouble with C++11 atomics, but looking at
the code I see that ZMQ is not using them (note: I do have
ZMQ_ATOMIC_PTR_CXX11 defined, but I also have ZMQ_ATOMIC_PTR_INTRINSIC
defined, so the latter wins!).

In particular Helgrind 3.12.0 tells me that:


==00:00:00:11.885 29399==
==00:00:00:11.885 29399== Possible data race during read of size 8 at 0xB373BF0 by thread #4
==00:00:00:11.885 29399== Locks held: none
==00:00:00:11.885 29399==    at 0x6BD79AB: zmq::atomic_ptr_t<zmq::command_t>::cas(zmq::command_t*, zmq::command_t*) (atomic_ptr.hpp:150)
==00:00:00:11.885 29399==    by 0x6BD7874: zmq::ypipe_t<zmq::command_t, 16>::check_read() (ypipe.hpp:147)
==00:00:00:11.885 29399==    by 0x6BD7288: zmq::ypipe_t<zmq::command_t, 16>::read(zmq::command_t*) (ypipe.hpp:165)
==00:00:00:11.885 29399==    by 0x6BD6FE7: zmq::mailbox_t::recv(zmq::command_t*, int) (mailbox.cpp:98)
==00:00:00:11.885 29399==    by 0x6BD29FC: zmq::io_thread_t::in_event() (io_thread.cpp:81)
==00:00:00:11.885 29399==    by 0x6BD05C1: zmq::epoll_t::loop() (epoll.cpp:188)
==00:00:00:11.885 29399==    by 0x6BD06C3: zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
==00:00:00:11.885 29399==    by 0x6C18BA5: thread_routine (thread.cpp:109)
==00:00:00:11.885 29399==    by 0x4C2F837: mythread_wrapper (hg_intercepts.c:389)
==00:00:00:11.885 29399==    by 0x6E72463: start_thread (pthread_create.c:334)
==00:00:00:11.885 29399==    by 0x92F901C: clone (clone.S:109)
==00:00:00:11.885 29399==
==00:00:00:11.885 29399== This conflicts with a previous write of size 8 by thread #2
==00:00:00:11.885 29399== Locks held: 1, at address 0xB373C08
==00:00:00:11.885 29399==    at 0x6BD77F4: zmq::atomic_ptr_t<zmq::command_t>::set(zmq::command_t*) (atomic_ptr.hpp:90)
==00:00:00:11.885 29399==    by 0x6BD7422: zmq::ypipe_t<zmq::command_t, 16>::flush() (ypipe.hpp:125)
==00:00:00:11.885 29399==    by 0x6BD6DF5: zmq::mailbox_t::send(zmq::command_t const&) (mailbox.cpp:63)
==00:00:00:11.885 29399==    by 0x6BB9128: zmq::ctx_t::send_command(unsigned int, zmq::command_t const&) (ctx.cpp:438)
==00:00:00:11.885 29399==    by 0x6BE34CE: zmq::object_t::send_command(zmq::command_t&) (object.cpp:474)
==00:00:00:11.885 29399==    by 0x6BE26F8: zmq::object_t::send_plug(zmq::own_t*, bool) (object.cpp:220)
==00:00:00:11.885 29399==    by 0x6BE68E2: zmq::own_t::launch_child(zmq::own_t*) (own.cpp:87)
==00:00:00:11.885 29399==    by 0x6C03D6C: zmq::socket_base_t::add_endpoint(char const*, zmq::own_t*, zmq::pipe_t*) (socket_base.cpp:1006)
==00:00:00:11.885 29399==  Address 0xb373bf0 is 128 bytes inside a block of size 224 alloc'd
==00:00:00:11.885 29399==    at 0x4C2A6FD: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:376)
==00:00:00:11.885 29399==    by 0x6BB8B8D: zmq::ctx_t::create_socket(int) (ctx.cpp:351)
==00:00:00:11.885 29399==    by 0x6C284D5: zmq_socket (zmq.cpp:267)
==00:00:00:11.885 29399==    by 0x6143809: ZmqClientSocket::Config(PubSubSocketConfig const&) (ZmqRequestReply.cpp:303)
==00:00:00:11.885 29399==    by 0x6144069: ZmqClientMultiSocket::Config(PubSubSocketConfig const&) (ZmqRequestReply.cpp:407)
==00:00:00:11.885 29399==    by 0x61684EF: client_thread_main(void*) (ZmqRequestReplyUnitTests.cpp:132)
==00:00:00:11.886 29399==    by 0x4C2F837: mythread_wrapper (hg_intercepts.c:389)
==00:00:00:11.886 29399==    by 0x6E72463: start_thread (pthread_create.c:334)
==00:00:00:11.886 29399==    by 0x92F901C: clone (clone.S:109)
==00:00:00:11.886 29399==  Block was alloc'd by thread #2


Is this a known (and ignorable) issue with zmq::atomic_ptr_t<>?

Thanks,
Francesco


Re: [zeromq-dev] Multiple connects with a ZMQ_REQ socket appears not to work

2018-02-23 Thread Luca Boccassi
On Fri, 2018-02-23 at 04:57 +, Mark wrote:
> Hi all.
> 
> According to http://api.zeromq.org/4-2:zmq-socket, with a ZMQ_REQ
> socket I can connect to multiple endpoints, and the semantics of
> zmq_send() are:
> 
>   "Each request sent is round-robined among all services, and
>   each reply received is matched with the last issued request.
> 
>   If no services are available, then any send operation on the
>   socket shall block until at least one service becomes
>   available. The REQ socket shall not discard messages."
> 
> In other words, the message should get to one of the connected
> endpoints eventually.
> 
> But I'm not seeing that behavior. Rather, the message never arrives at
> any of the available endpoints.
> 
> To demonstrate this I hacked up the hello world examples such that the
> client connects to two endpoints: the first nominates a port that
> nothing is listening on and the second nominates a port that the
> server is listening on. What I'm seeing is that the message exchange
> never occurs.
> 
> I have attached both programs, as they are slightly different from the
> examples on the zmq website.
> 
> If I swap the connect order in the client such that the "up" server
> comes first, then the message exchange works. After further testing I
> deduce that attempting to exchange with the second and subsequent
> connect endpoints is the problem.

That's because it's round-robin, and the connection is async - so it
will wait on the first server to respond, and it never does, so it's
blocked there. Sounds like what you really want is "fail-over" - i.e.
if the first does not respond, try the second. That might work if you
tune the TCP reconnect options to have a small timeout, so that the
pipe is removed - by default it's quite large. Not sure it will work -
try it.
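
Untested sketch of the scenario as I understand it (Mark's actual test
programs are attached to his mail, not reproduced here; ports are
illustrative):

    #include <zmq.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *req = zmq_socket(ctx, ZMQ_REQ);

        zmq_connect(req, "tcp://127.0.0.1:5554"); /* dead: nothing listening */
        zmq_connect(req, "tcp://127.0.0.1:5555"); /* live: hello world server */

        /* Returns immediately (the message is queued), round-robined to the
           dead peer because its pipe was created first. */
        zmq_send(req, "Hello", 5, 0);

        char buf[10];
        zmq_recv(req, buf, sizeof buf, 0); /* hangs: the request never arrives */

        zmq_close(req);
        zmq_ctx_destroy(ctx);
        return 0;
    }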

-- 
Kind regards,
Luca Boccassi
