Hi,

I've also had good experiences with Google's gperftools 
(https://github.com/gperftools/gperftools) / tcmalloc. Besides using it as a 
memory profiler, I also saw a nice speed-up from tcmalloc itself. The 
generated profile data can be viewed with kcachegrind.

Best wishes,
Jens

-----Original Message-----
From: zeromq-dev [mailto:zeromq-dev-boun...@lists.zeromq.org] On Behalf Of 
Luca Boccassi
Sent: Friday, 18 August 2017 11:41
To: ZeroMQ development list
Subject: Re: [zeromq-dev] another time problems with stream socket and lost 
clients.

On Thu, 2017-08-17 at 21:25 +0000, Juergen Gnoss wrote:
> The problem is that my program crashes after half an hour or so,
> saying:
>
> FATAL ERROR at src/zhashx.c:210
> OUT OF MEMORY (malloc returned NULL)
> Aborted
>
> Strangely, when I run it under valgrind for a few minutes and
> terminate it with Ctrl-C, everything is OK; at least valgrind
> says that everything is OK.
> 
> ==29022==
> ==29022== HEAP SUMMARY:
> ==29022== in use at exit: 0 bytes in 0 blocks
> ==29022== total heap usage: 2,665 allocs, 2,665 frees, 965,174 bytes allocated
> ==29022==
> ==29022== All heap blocks were freed -- no leaks are possible
> ==29022==
> ==29022== For counts of detected and suppressed errors, rerun with: -v
> ==29022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 
> So I rolled up my sleeves to find a way to see what's going on.
>
> Watching the memory shows a slight growth in usage over time.
>
> I tried to get familiar with wireshark ...
>
> and tracked it down: a mobile client that connects to the server
> changes its outgoing port, and sometimes even its IP, at will.
> Sometimes it terminates the old connection correctly before coming
> back from the new port or IP, sometimes not.
>
> I put an interesting logfile on pastebin for people who are interested
> in seeing what's going on in my case.
> 
> https://pastebin.com/hiRc3AfD
> 
> I tried to play with the heartbeat options on the socket.
>
> In the last few days I saw other people on the list having similar
> problems with lost clients on stream sockets. If I remember correctly,
> Luca recommended using socket options, so that the kernel takes over
> detecting lost clients and the library does the rest.
>
> In my case I'm surely not using the right combination of options,
> because the connections stay open. When my program crashes, the kernel
> is still trying to notify the clients to close.
>
> Here is what I use to create the socket.
> 
> ```c
> /* Create the ZMQ_STREAM socket for connstr. */
> self->deviceSocket = zsock_new_stream(connstr);
> if (!self->deviceSocket) {
>     zsys_error("Error getting BSD Socket socket\n%s\n", zmq_strerror(errno));
>     free_DBPool(self);
>     return -1;
> }
> zsys_info("BSD Socket bind to : '%s'\n", connstr);
>
> /* Check the heartbeat interval and adjust it if it is below 300. */
> int hbi = zsock_heartbeat_ivl(self->deviceSocket);
> if (hbi < 300) {
>     zsys_info("BSD Socket heartbeat is '%d' --> too low\n", hbi);
>     zsock_set_heartbeat_ivl(self->deviceSocket, 30);
>     hbi = zsock_heartbeat_ivl(self->deviceSocket);
>     zsys_info("BSD Socket heartbeat is now : '%d'\n", hbi);
> }
>
> /* Check the heartbeat timeout and raise it if it is below 120. */
> int hbto = zsock_heartbeat_timeout(self->deviceSocket);
> if (hbto < 120) {
>     zsys_info("BSD Socket heartbeat timeout is '%d' --> too low\n", hbto);
>     zsock_set_heartbeat_timeout(self->deviceSocket, 120);
>     hbto = zsock_heartbeat_timeout(self->deviceSocket);
>     zsys_info("BSD Socket heartbeat timeout is now : '%d'\n", hbto);
> }
>
> /* Register the socket with the actor's poller. */
> zpoller_add(self->devActor_poller, self->deviceSocket);
> ```
> 
> 
> Question one is now: what is the right option to use?
> Or should I take care of disconnecting clients myself, by sending a
> zero-length frame to the socket's client ID?
>
> Question two is: why does the program crash?
> The operating system is far from out of memory at the time the
> program crashes.
>
> Thanks,
> Ju
> 
> PS:
>
> The logfile on pastebin combines the logs from 3 sources into one file:
>
>   1.  the program running the socket
>   2.  tshark watching the port in question
>   3.  a czmq dish listening on the program's radio socket
>
> All connections come from the same mobile client (only one client in
> this case), going to a server with a public IP (replaced with
> "my.ser.vers.ip"), with no (known) firewall or NAT in the way.

Hi,

A few things:

1) When using Wireshark with ZMQ there's this great dissector for the
protocol: https://github.com/whitequark/zmtp-wireshark
I recommend using it, as it makes it so much easier to debug connections
2) If a malloc is failing, then your program IS running out of memory, as 
there's really no other reason why it would fail. It might not be the system 
memory, but only what that process is allowed to use (cgroups or other 
limitations?)
3) If you want to analyse the program's memory utilisation I suggest valgrind 
with --tool=massif, it's a great profiler
4) As the manpage for zmq_setsockopt says, the heartbeat options only apply to 
subsequent connections, so you need to create the socket first, apply the 
options, then connect it, e.g.:

s = zsock_new(ZMQ_STREAM);
zsock_set...
zsock_connect(s, connstr, NULL);
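
On point 2, one quick way to see whether a per-process limit is in play is to print the resource limits from inside the program. The following is only a minimal sketch, assuming a POSIX system; the helper names print_value, print_limit and report_memory_limits are made up for illustration. It covers rlimits only: cgroup limits are not visible through getrlimit() and have to be checked separately.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print one limit value, or "unlimited". */
void print_value(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("  %s: unlimited\n", label);
    else
        printf("  %s: %llu bytes\n", label, (unsigned long long) value);
}

/* Hypothetical helper: print the soft and hard limit of one resource.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_limit(const char *name, int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return -1;
    }
    printf("%s\n", name);
    print_value("soft", rl.rlim_cur);
    print_value("hard", rl.rlim_max);
    return 0;
}

/* Report the rlimits that can make malloc() return NULL even though
 * the machine as a whole still has free memory.
 * Returns 0 if both limits could be read, -1 otherwise. */
int report_memory_limits(void)
{
    int as_rc = print_limit("RLIMIT_AS", RLIMIT_AS);       /* address space */
    int data_rc = print_limit("RLIMIT_DATA", RLIMIT_DATA); /* data segment */

    return (as_rc == 0 && data_rc == 0) ? 0 : -1;
}
```

Calling report_memory_limits() once at startup is enough. If either soft limit is a finite value close to the process size at the time of the crash, that limit is a likely cause.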

Finally and most importantly, heartbeats are a ZMTP feature, so they won't 
work for ZMQ_STREAM, as in that case the peers are not ZMTP sockets, but plain 
TCP sockets.

So as you guessed I think you should take care of gracefully terminating the 
connections, but I don't use ZMQ_STREAM that much so there might be more.
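
Since the peers are plain TCP sockets, the "kernel takes over" approach mentioned in the question comes down to TCP keepalive. The sketch below shows what that means on a raw socket descriptor. It assumes Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not portable), and the function name enable_tcp_keepalive is made up for illustration. libzmq exposes the same knobs through zmq_setsockopt as ZMQ_TCP_KEEPALIVE, ZMQ_TCP_KEEPALIVE_IDLE, ZMQ_TCP_KEEPALIVE_INTVL and ZMQ_TCP_KEEPALIVE_CNT, so with a ZMQ socket you would set those before bind/connect rather than touching the descriptor directly.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable kernel TCP keepalive on a TCP socket.
 *   idle_s:  seconds of silence before the first probe is sent
 *   intvl_s: seconds between probes
 *   count:   unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
    return 0;
}
```

With, say, enable_tcp_keepalive(fd, 60, 10, 3), a peer that vanished without closing would be detected after roughly 60 + 3 * 10 seconds of silence. The numbers are only an example and need tuning for mobile clients.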

--
Kind regards,
Luca Boccassi

_______________________________________________
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
https://lists.zeromq.org/mailman/listinfo/zeromq-dev
