Hi, I've been trying for some weeks to construct a reproducible scenario for the memory leaks I've been observing with gnutls. In summary, what I have been seeing is that rsyslog eventually consumes all the system memory and hangs the machine; this only happens if gnutls is configured.
This was happening pretty quickly (and consistently) with 4.4.2. As suggested, I upgraded to a locally-built 4.5.6; around the same time, I checked (and corrected) certain errors in the certificate setup. And incidentally I also tightened-up the firewall, so that random port-scanners and general internet pollution was less likely to be making TCP connections on the syslog server's port. The result is that the OOM condition *has* recurred once or twice; but not consistently, and nothing like as often or as quickly as it was initially with 4.4.2. However I'm not sure the version of rsyslog is the only software-version factor involved, as I have also upgraded the version of the gnutls library on both the clients and the server. So with the 4.5.6 configuration, I have now managed to demonstrate a reproducible leak scenario (reproducible on my setup, anyway). I simply have a PHP script open then close 100 non-TLS TCP connections (from another computer) to the TLS syslog port. No data is sent. Connections are opened then closed individually - the script has no more than one socket open at one time. The aim was to simulate the effect of some port-scanner bot hitting the Gnutls port repeatedly (i.e. what I have configured my firewall to prevent). E.g.: # valgrind --log-file=valgrind.log --leak-check=full /usr/local/sbin/rsyslogd -c4 -f /etc/rsyslog.conf -d >rsyslog.log Before: # ps -p `cat /var/run/rsyslogd.pid` -o args -o pid=pid -o sz=sz -o vsz=vsz COMMAND pid sz vsz /usr/bin/valgrind.bin --log 11249 31142 124568 After making 100 TCP client socket connections: # ps -p `cat /var/run/rsyslogd.pid` -o args -o pid=pid -o sz=sz -o vsz=vsz COMMAND pid sz vsz /usr/bin/valgrind.bin --log 11249 32214 128856 The way I read that, the amount of virtual memory (vsz) consumed by rsyslog has increased by 4288 x 1024 bytes between these two ps invocations, and the size of the core image (sz - I guess that means excluding heap memory - I dunno) by 1072 pages (without valgrind, the numbers are much smaller - the increase is smaller by a factor of about 10). It's unlikely that this memory has been consumed in the course of normal handling of syslog messages, as the two ps runs are just about 10s apart, and the normal flow of messages on this system would normally include only a few tens of messages maximum during such an interval. Tests run separately, and without valgrind, indicate that memory consumed as a result of the "unencrypted connections" test is not returned to the system within ten minutes. It seems that this configuration of rsyslog does not show observable memory draining in normal operation when run for several days. With the test, the draining that is observed is much greater than the typical fluctuation in memory usage observed during normal operation. Incidentally: after running the test, "netstat -ant" shows a large number of connections to the TLS syslog port in a CLOSE_WAIT state; as I understand it, this means the connection has been closed by the client, but the TCP stack is waiting for the server to complete the close. These dangling connections disappear when rsyslog is terminated. I'm rather new at this kind of memory-tracking work on Linux; so apologies for the low quality. rsyslog.log and valgrind.log are being sent to Rainer under separate cover. I haven't been able to match-up the results from ps with the data in the valgrind logfile, but I assume that is just further evidence of my ignorance :-) -- Jack. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

