Hi,

I've been trying for some weeks to construct a reproducible scenario for
the memory leaks I've been observing with gnutls. In summary, what I
have been seeing is that rsyslog eventually consumes all the system
memory and hangs the machine; this only happens if gnutls is configured.

This was happening pretty quickly (and consistently) with 4.4.2. As
suggested, I upgraded to a locally-built 4.5.6; around the same time, I
checked (and corrected) certain errors in the certificate setup. And
incidentally I also tightened-up the firewall, so that random
port-scanners and general internet pollution was less likely to be
making TCP connections on the syslog server's port.

The result is that the OOM condition *has* recurred once or twice; but
not consistently, and nothing like as often or as quickly as it was
initially with 4.4.2. However I'm not sure the version of rsyslog is the
only software-version factor involved, as I have also upgraded the
version of the gnutls library on both the clients and the server.

So with the 4.5.6 configuration, I have now managed to demonstrate a
reproducible leak scenario (reproducible on my setup, anyway). I simply
have a PHP script open then close 100 non-TLS TCP connections (from
another computer) to the TLS syslog port. No data is sent. Connections
are opened then closed individually - the script has no more than one
socket open at one time. The aim was to simulate the effect of some
port-scanner bot hitting the Gnutls port repeatedly (i.e. what I have
configured my firewall to prevent). E.g.:

# valgrind --log-file=valgrind.log --leak-check=full
/usr/local/sbin/rsyslogd -c4 -f /etc/rsyslog.conf -d >rsyslog.log

Before:
# ps -p `cat /var/run/rsyslogd.pid` -o args -o pid=pid -o sz=sz -o vsz=vsz
COMMAND                       pid    sz    vsz
/usr/bin/valgrind.bin --log 11249 31142 124568

After making 100 TCP client socket connections:
# ps -p `cat /var/run/rsyslogd.pid` -o args -o pid=pid -o sz=sz -o vsz=vsz
COMMAND                       pid    sz    vsz
/usr/bin/valgrind.bin --log 11249 32214 128856

The way I read that, the amount of virtual memory (vsz) consumed by
rsyslog has increased by 4288 x 1024 bytes between these two ps
invocations, and the size of the core image (sz - I guess that means
excluding heap memory - I dunno) by 1072 pages (without valgrind, the
numbers are much smaller - the increase is smaller by a factor of about
10). It's unlikely that this memory has been consumed in the course of
normal handling of syslog messages, as the two ps runs are just about
10s apart, and the normal flow of messages on this system would normally
include only a few tens of messages maximum during such an interval.

Tests run separately, and without valgrind, indicate that memory
consumed as a result of the "unencrypted connections" test is not
returned to the system within ten minutes.

It seems that this configuration of rsyslog does not show observable
memory draining in normal operation when run for several days. With the
test, the draining that is observed is much greater than the typical
fluctuation in memory usage observed during normal operation.

Incidentally: after running the test, "netstat -ant" shows a large
number of connections to the TLS syslog port in a CLOSE_WAIT state; as I
understand it, this means the connection has been closed by the client,
but the TCP stack is waiting for the server to complete the close. These
dangling connections disappear when rsyslog is terminated.

I'm rather new at this kind of memory-tracking work on Linux; so
apologies for the low quality.

rsyslog.log and valgrind.log are being sent to Rainer under separate
cover. I haven't been able to match-up the results from ps with the data
in the valgrind logfile, but I assume that is just further evidence of
my ignorance :-)

-- 
Jack.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to