Hi,
After setting the following keepalive options on the ZMQ SUB sockets, the
problem seems to be fixed: I haven't seen any issue for the past few hours.
ZMQ_TCP_KEEPALIVE = 1
ZMQ_TCP_KEEPALIVE_IDLE = 30
ZMQ_TCP_KEEPALIVE_INTVL = 5
ZMQ_TCP_KEEPALIVE_CNT = 6
Thank you very much. Meanwhile, an alternative idea came to mind. Please see
the following code snippet:
---------------------------------------------------------------------------------------
while (ros::ok()) {
    zmq::message_t msg;
    int rc = 0;
    try {
        rc = zmq_socket.recv(&msg);
    } catch (zmq::error_t& e) {
    }
    if (rc) {
        // do work here
    } else {
        zmq_socket.connect(socket_address.c_str()); // re-connect
    }
}
---------------------------------------------------------------------------------------
The above code snippet is not tested; it just came to mind. Intuitively, it
looks okay to me, but I am not sure if it is acceptable in ZMQ.
What do you say?
-
Thanks
Ravi
On Sunday, 28 January 2018 4:37 PM, Justin Karneges <[email protected]>
wrote:
All that seems fine, but to get TCP
keepalives to use a shorter timeout you'll want to set those additional
options, yes.
For example, in my apps I use:
ZMQ_TCP_KEEPALIVE = 1
ZMQ_TCP_KEEPALIVE_IDLE = 30
ZMQ_TCP_KEEPALIVE_INTVL = 5
ZMQ_TCP_KEEPALIVE_CNT = 6
What this means is if there is 30 seconds of no I/O, then the peer will be
pinged every 5 seconds, up to 6 times, before closing the connection. Thus, the
connection should recover after about a minute.
If you don't set these additional options, then the OS defaults are used, which
can sometimes be hours (!) long.
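The arithmetic behind "about a minute" can be sketched as follows. This is only an illustration: KeepaliveTiming and detection_seconds are hypothetical names made up for this example, not part of ZeroMQ, and the result is an approximation of when the OS gives up on a silent peer.

```cpp
#include <cassert>

// Hypothetical helper type (not part of ZeroMQ); it mirrors the three timing options.
struct KeepaliveTiming {
    int idle_s;   // ZMQ_TCP_KEEPALIVE_IDLE: seconds without I/O before the first probe
    int intvl_s;  // ZMQ_TCP_KEEPALIVE_INTVL: seconds between probes
    int count;    // ZMQ_TCP_KEEPALIVE_CNT: unanswered probes before the connection is closed
};

// Approximate time until a dead peer is detected:
// the idle period plus one interval per unanswered probe.
int detection_seconds(const KeepaliveTiming& t) {
    assert(t.idle_s >= 0 && t.intvl_s >= 0 && t.count >= 0);
    return t.idle_s + t.intvl_s * t.count;
}
```

With the values above, detection_seconds(KeepaliveTiming{30, 5, 6}) gives 30 + 5 * 6 = 60 seconds.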
Justin
On Sat, Jan 27, 2018, at 9:02 PM, Ravi Joshi via zeromq-dev wrote:
Hi,
I am a little confused. Let me first explain the scenario again.
There are 3 publishers written in C# language running on Windows 10 OS. On the
other hand, there are 3 subscribers written in C++ language running on Ubuntu
14.04 LTS OS. The mapping from publisher to subscriber is one to one.
Now let me mention ZMQ socket configurations.
Publisher configuration-
SetOption(ZSocketOption.CONFLATE, 1);
Subscriber configuration-
socket.setsockopt(ZMQ_SUBSCRIBE, "", 0); // allow all messages
socket.setsockopt(ZMQ_RCVTIMEO, &timeout, sizeof(timeout)); // int timeout = 1000
socket.setsockopt(ZMQ_LINGER, &linger, sizeof(linger)); // int linger = 0
socket.setsockopt(ZMQ_CONFLATE, &conflate, sizeof(conflate)); // int conflate = 1
socket.setsockopt(ZMQ_TCP_KEEPALIVE, &tcp_keepalive, sizeof(tcp_keepalive)); // int tcp_keepalive = 1
Do the above configurations look fine? Or would you like me to change something
and try again?
I am confused because I cannot find how to set a HEARTBEAT in
Publisher-Subscriber. Any suggestions, please?
Regarding ZMQ_TCP_KEEPALIVE_*, I found the following three options:
ZMQ_TCP_KEEPALIVE_IDLE, ZMQ_TCP_KEEPALIVE_CNT, and ZMQ_TCP_KEEPALIVE_INTVL. The
values for these options are not clear from the documentation. Any
suggestions, please?
Thank you very much.
-
Ravi
On Sunday, 28 January 2018 3:10 AM, Justin Karneges <[email protected]> wrote:
You'd still have to wait for the TCP keepalive to time out the connection before
it will recover. On Ubuntu this might be a very long time, so be sure to set
all the ZMQ_TCP_KEEPALIVE_* options to ensure a shorter timeout.
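As far as I understand the libzmq documentation, the ZMQ_TCP_KEEPALIVE_* options override the OS-level SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options. The sketch below shows those underlying options on a plain Linux TCP socket, without ZeroMQ, to make clear what each value controls. The helper names enable_keepalive and read_tcp_option are made up for this illustration, and the code assumes Linux (TCP_KEEPIDLE is named differently on some other systems).

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: applies keepalive settings to a raw TCP socket on Linux.
// Returns true only if all four options were accepted by the kernel.
bool enable_keepalive(int fd, int idle_s, int intvl_s, int count) {
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) == 0;
}

// Hypothetical helper: reads back one integer TCP-level option, or -1 on failure.
int read_tcp_option(int fd, int option) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) != 0) {
        return -1;
    }
    return value;
}
```

Calling enable_keepalive(fd, 30, 5, 6) on a socket created with socket(AF_INET, SOCK_STREAM, 0) corresponds to the IDLE = 30, INTVL = 5, CNT = 6 settings discussed in this thread. When these are left unset, the system-wide defaults apply.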
On Sat, Jan 27, 2018, at 2:27 AM, Ravi Joshi via zeromq-dev wrote:
Hi Justin,
I will check it using netstat.
Meanwhile, ZMQ_TCP_KEEPALIVE does not seem to be working. I still see that after
some time, Windows, where the publishers are running, shows a 0 MBPS transmission
rate. After I restart the subscribers in ROS on Ubuntu, the publishers start
working again. Please note that during this process I am not restarting the
publishers at all.
Below is the code snippet added to all subscribers-
int tcp_keepalive = 1;
zmq_socket.setsockopt(ZMQ_TCP_KEEPALIVE, &tcp_keepalive, sizeof(tcp_keepalive));
-
Thanks
Ravi
On Saturday, 27 January 2018 5:36 PM, Justin Karneges <[email protected]>
wrote:
One thing you might do is run netstat on both sides to see if the connections
are still listed. In a dead connection scenario, netstat should no longer list
the connection on the PUB side, but should still list it on the SUB side.
Note that it can take time for the PUB connection to give up. On Linux, the
default is something like 20 minutes after it dies, so give the PUB side some
extra time after messages stop transmitting. If transmission hasn't worked for
over 20 minutes and netstat is still showing the connection on the PUB side,
then the problem may be something else.
On Sat, Jan 27, 2018, at 12:13 AM, Ravi Joshi via zeromq-dev wrote:
Hi Justin,
Thank you very much. How can I tell whether I am getting dead connections?
For the time being, I am enabling ZMQ_TCP_KEEPALIVE on all 3 SUB sockets.
I will let you know the status after some time.
Thanks
-
Ravi
On Saturday, January 27, 2018, 3:27 PM, Justin Karneges <[email protected]>
wrote:
Hi,
One issue with socket types that don't usually write data (such as SUB) is that
a dead connection might go unnoticed forever. You can work around this by
enabling TCP keep alives on the SUB socket. I don't know if you're getting dead
connections here but just thought I'd mention it.
Justin
On Fri, Jan 26, 2018, at 9:33 PM, Ravi Joshi via zeromq-dev wrote:
> Hi,
>
> I am using Publisher-Subscriber pattern consisting of 3 publishers to
> publish 3 different types of data. All 3 publishers are written in a
> single C# file. However, each subscriber is written in a separate C++
> file inside ROS. From the point of view of ZeroMQ, there is no difference
> between the subscribers, since context, socket initialization and message
> receiving are done in the same way for all subscribers. Hence, in order to
> make the mail shorter, I am just posting the code snippet of 1
> subscriber below.
>
> The publisher code snippet in C# is available in Pastebin
> (https://pastebin.com/S65LmwuV).
> The subscriber code snippet in C++ is available in Pastebin
> (https://pastebin.com/xb3V0n0u).
>
> The publisher works well initially for some time and successfully
> transmits data at 700MBPS rate but stops transmitting any data after 5-6
> hours.
>
> In order to make the publisher work again, I need to restart the
> subscribers. This is strange to me, since it is unexpected behavior as
> far as the Publisher-Subscriber pattern is concerned.
>
> Why such weird behavior? Is there any workaround, please?
>
> -
> Thanks
> Ravi
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev