I stumbled across a potential memory leak in coroipcc_service_connect() in lib/coroipcc.c. It is present in the corosync 1.1.2 and 1.2 releases.
If the connect() call fails, the function returns without cleaning up the
hdb_handle, so the number of handles grows without bound if you loop retrying
while the connection keeps failing.
We found this when we left a client running overnight, trying to reconnect at
a high polling rate without corosync running. By morning the memory had grown
by about 300MB and the CPU was maxed out, so it got our attention.
We saw no unusual memory or CPU growth if the client successfully connected
to corosync.
At or near line 609 of lib/coroipcc.c (in the 1.2 release) the code reads:

    sys_res = connect (request_fd, (struct sockaddr *)&address,
            COROSYNC_SUN_LEN(&address));
    if (sys_res == -1) {
            close (request_fd);
            return (CS_ERR_TRY_AGAIN);
    }

It looks like the following should be added before the
"return (CS_ERR_TRY_AGAIN);":

    hdb_handle_put (&ipc_hdb, *handle);
    hdb_handle_destroy (&ipc_hdb, *handle);

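To illustrate why the missing cleanup matters in a retry loop, here is a minimal sketch. It does not use the corosync hdb API: every mock_* name, the MOCK_* constants, and the fixed-size table are hypothetical stand-ins invented for this example, and the server_up and cleanup_on_error flags exist only so that the leaking and non-leaking paths can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the corosync hdb API. */
#define MOCK_MAX_HANDLES 8
#define MOCK_OK 0
#define MOCK_ERR_TRY_AGAIN 1
#define MOCK_ERR_NO_MEMORY 2

struct mock_hdb {
	int in_use[MOCK_MAX_HANDLES];   /* 1 while the slot is allocated */
	int refcount[MOCK_MAX_HANDLES]; /* references held on the slot */
};

/* Allocate a free slot and take one reference on it. */
int mock_handle_create(struct mock_hdb *db, int *handle)
{
	for (int i = 0; i < MOCK_MAX_HANDLES; i++) {
		if (!db->in_use[i]) {
			db->in_use[i] = 1;
			db->refcount[i] = 1;
			*handle = i;
			return MOCK_OK;
		}
	}
	return MOCK_ERR_NO_MEMORY; /* table exhausted */
}

/* Drop the reference taken at creation time. */
void mock_handle_put(struct mock_hdb *db, int handle)
{
	assert(handle >= 0 && handle < MOCK_MAX_HANDLES);
	db->refcount[handle]--;
}

/* Free the slot so that it can be reused. */
void mock_handle_destroy(struct mock_hdb *db, int handle)
{
	assert(handle >= 0 && handle < MOCK_MAX_HANDLES);
	db->in_use[handle] = 0;
}

/* Count the slots that are currently allocated. */
int mock_handles_in_use(const struct mock_hdb *db)
{
	int count = 0;
	for (int i = 0; i < MOCK_MAX_HANDLES; i++) {
		count += db->in_use[i];
	}
	return count;
}

/*
 * Mock connect: a handle is created first, then the "connect" step runs.
 * server_up == 0 simulates connect() failing because nothing is listening.
 * cleanup_on_error selects between the leaking and the fixed behaviour.
 */
int mock_service_connect(struct mock_hdb *db, int server_up,
	int cleanup_on_error, int *handle)
{
	int res = mock_handle_create(db, handle);
	if (res != MOCK_OK) {
		return res;
	}
	if (!server_up) {
		if (cleanup_on_error) {
			/* Mirrors the proposed put + destroy before returning. */
			mock_handle_put(db, *handle);
			mock_handle_destroy(db, *handle);
		}
		return MOCK_ERR_TRY_AGAIN;
	}
	return MOCK_OK;
}
```

With cleanup_on_error set, repeated failed attempts leave no slots allocated. Without it, every failed attempt keeps one slot, which is the same growth pattern we saw in the real client.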
We tried adding these, and the memory leak and CPU run-up no longer occurred
in our client when corosync wasn't running.
The function coroipcc_service_connect() can also return with other failures
later on, so this cleanup, or something related, probably needs to be added
on those later error paths at the bottom of the function as well.
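One common way to cover several failure points is a single error exit that releases everything acquired so far. The sketch below only illustrates that pattern and is not a patch for coroipcc.c: the sketch_* names, the resource counters, and the fail_at parameter are all hypothetical and exist only to make the cleanup observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters used only to observe leaks in this sketch. */
struct sketch_resources {
	int open_fds;
	int live_handles;
};

/* Hypothetical failure points: the connect step and a later setup step. */
enum sketch_stage {
	SKETCH_FAIL_NONE,
	SKETCH_FAIL_CONNECT,
	SKETCH_FAIL_SETUP
};

/*
 * Acquire a handle and a descriptor, then run two steps that may fail.
 * Every failure jumps to one label that undoes both acquisitions, so a
 * failure point added later cannot forget part of the cleanup.
 */
int sketch_service_connect(struct sketch_resources *res,
	enum sketch_stage fail_at)
{
	int result = 0;

	assert(res != NULL);

	res->live_handles++; /* stands in for creating and referencing a handle */
	res->open_fds++;     /* stands in for opening the request socket */

	if (fail_at == SKETCH_FAIL_CONNECT) {
		result = -1; /* simulated connect failure */
		goto error_exit;
	}
	if (fail_at == SKETCH_FAIL_SETUP) {
		result = -2; /* simulated failure in a later step */
		goto error_exit;
	}

	return 0; /* success: the caller now owns both resources */

error_exit:
	res->open_fds--;     /* stands in for closing the socket */
	res->live_handles--; /* stands in for put + destroy on the handle */
	return result;
}
```

On either simulated failure both counters return to their starting values, while a successful call leaves one descriptor and one handle owned by the caller.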
Note: the function cpg_initialize() in lib/cpg.c, which calls
coroipcc_service_connect(), cleans up its own hdb_handles on error; that is
where the idea for the proposed fix comes from.
Hope this helps.
Courtland Chapman
In-Depth Engineering
