Re: [rsyslog] Qualys scan against rsyslog causes it to segfault
On 10/7/2017 5:25 AM, Rainer Gerhards wrote: 2017-10-07 7:57 GMT+02:00 deoren: As I dig more into this, I'm beginning to think the only thing the Qualys scan did was aggravate an existing problem and cause rsyslog to tip over more quickly. While I didn't observe the problem at the time, I recently made some configuration changes to introduce a lookup table to map known IPs to known hostnames (something I plan to programmatically build from a central db at some point in the future). I opted to try this in an effort to stop using $hostname directly (which gives odd results with invalid messages) for dynamic directory/filenames. A long story short, I found that systemd is now (without a Qualys scan actively running) restarting rsyslog after it dies within a minute or so of running. My guess is that when a remote client sees that the rsyslog receiver is running again it retries delivery and when the receiver is passed the message it performs the lookup. At that point the real cause for the segfault is triggered and rsyslog goes down. Looking at the output of the dmesg command I found many instances of messages like this one: [364581.440049] rs:main Q:Reg[82570]: segfault at 0 ip 7faeb1fa55f4 sp 7faeaac431d0 error 4 in libfastjson.so.4.1.0[7faeb1fa1000+a000] I stopped rsyslog and then manually started it up like so: rsyslogd -dn > rsyslog-debug.log 2>&1 This was printed to the console: Segmentation fault (core dumped) Here are the last 20 lines from the debug log file: 7587.842410212:main Q:Reg/w0 : ../action.c: action 'action 21': set suspended state to 0 7587.842412422:main Q:Reg/w0 : rainerscript.c: STOP 7587.842418787:main Q:Reg/w0 : ruleset.c: processBATCH: next msg 44: <167>2017-10-07T01:15:37.701Z vms2.example.com Rhttpproxy: verbose rhttpproxy[79FC5B70] [Originator@6876 sub=Proxy Req 829 7587.842421471:main Q:Reg/w0 : rainerscript.c: CALL [receiver-origin-hostname-lookup, queue:0] 7587.842427317:main Q:Reg/w0 : rainerscript.c: IF 7587.842433062:main Q:Reg/w0 : rainerscript.c: var 'fromhost-ip' 7587.842442260:main Q:Reg/w0 : rainerscript.c: == 7587.842449663:main Q:Reg/w0 : rainerscript.c: string '127.0.0.1' 7587.842460890:main Q:Reg/w0 : rainerscript.c: eval expr 0x22147a0, type 'CMP_EQ' 7587.842463187:main Q:Reg/w0 : rainerscript.c: eval expr 0x22138e0, type 'V' 7587.842466060:main Q:Reg/w0 : rainerscript.c: rainerscript: (string) var 8: '192.168.5.142' 7587.842468406:main Q:Reg/w0 : rainerscript.c: eval expr 0x22138e0, return datatype 'S':0 7587.842470812:main Q:Reg/w0 : rainerscript.c: eval expr 0x22147a0, return datatype 'N':0 7587.842473056:main Q:Reg/w0 : ruleset.c: if condition result is 0 7587.842475315:main Q:Reg/w0 : rainerscript.c: IF 7587.842480945:main Q:Reg/w0 : rainerscript.c: var '!origin!hostname' 7587.842490085:main Q:Reg/w0 : rainerscript.c: == 7587.842497393:main Q:Reg/w0 : rainerscript.c: string '' 7587.842508499:main Q:Reg/w0 : rainerscript.c: eval expr 0x22148f0, type 'CMP_EQ' 7587.842511262:main Q:Reg/w0 : rainerscript.c: eval expr 0x22146f0, type 'V' I checked my JSON lookup table file and found a duplicate key (but with different values). This was unintentional and was a result of a mistake when building out the file (copy, paste, modify, oops!). I was hopeful that fixing that would resolve the issue, but no such luck. Just in case it isn't readable above, here is the same output plus the ruleset I'm using: https://gist.github.com/deoren/7e8e799428d0d87c4e467fd4add51d09 Anything stand out as obviously wrong? I've gone back over the Expressions, Control Structures and Variable (Property) types docs and it doesn't appear that I'm attempting to access/set the variables improperly, but I could be missing something. The lookup table feature was added in 8.17.0 as an experimental feature. There seems to be at least one hard-to-find bug inside it, see here: https://github.com/rsyslog/rsyslog/issues/1071 In the past days, I have looked at the issue as the original contributor seems to have lost interest in it. It sounds like what you experience is related to this problem. It would be great if you could run rsyslog under valgrind and see if the problem persists and, if so, what valgrind reports. You can use valgrind simply by installing the distro-provided valgrind package and prefixing the command with "valgrind", as such: $ valgrind rsyslogd -dn > rsyslog-debug.log 2>&1 This would be very helpful. It might also be very, very useful if you could add your findings to above-mentioned github issue tracker. I am 99.999% sure it is the same root cause. Rainer Thanks in advance for your help. Sure, I can do that. I ended up eventually getting my configuration to a "stable" point where where rsyslog was no longer crashing, though I don't yet fully understand why what I changed worked. In my current "stable" configuration
Re: [rsyslog] Qualys scan against rsyslog causes it to segfault
2017-10-07 7:57 GMT+02:00 deoren: > As I dig more into this, I'm beginning to think the only thing the Qualys > scan did was aggravate an existing problem and cause rsyslog to tip over > more quickly. > > While I didn't observe the problem at the time, I recently made some > configuration changes to introduce a lookup table to map known IPs to known > hostnames (something I plan to programmatically build from a central db at > some point in the future). I opted to try this in an effort to stop using > $hostname directly (which gives odd results with invalid messages) for > dynamic directory/filenames. > > A long story short, I found that systemd is now (without a Qualys scan > actively running) restarting rsyslog after it dies within a minute or so of > running. My guess is that when a remote client sees that the rsyslog > receiver is running again it retries delivery and when the receiver is > passed the message it performs the lookup. At that point the real cause for > the segfault is triggered and rsyslog goes down. > > Looking at the output of the dmesg command I found many instances of > messages like this one: > > [364581.440049] rs:main Q:Reg[82570]: segfault at 0 ip 7faeb1fa55f4 sp > 7faeaac431d0 error 4 in libfastjson.so.4.1.0[7faeb1fa1000+a000] > > I stopped rsyslog and then manually started it up like so: > > rsyslogd -dn > rsyslog-debug.log 2>&1 > > This was printed to the console: > > Segmentation fault (core dumped) > > Here are the last 20 lines from the debug log file: > > 7587.842410212:main Q:Reg/w0 : ../action.c: action 'action 21': set > suspended state to 0 > 7587.842412422:main Q:Reg/w0 : rainerscript.c: STOP > 7587.842418787:main Q:Reg/w0 : ruleset.c: processBATCH: next msg 44: > <167>2017-10-07T01:15:37.701Z vms2.example.com Rhttpproxy: verbose > rhttpproxy[79FC5B70] [Originator@6876 sub=Proxy Req 829 > 7587.842421471:main Q:Reg/w0 : rainerscript.c: CALL > [receiver-origin-hostname-lookup, queue:0] > 7587.842427317:main Q:Reg/w0 : rainerscript.c: IF > 7587.842433062:main Q:Reg/w0 : rainerscript.c: var 'fromhost-ip' > 7587.842442260:main Q:Reg/w0 : rainerscript.c: == > 7587.842449663:main Q:Reg/w0 : rainerscript.c: string '127.0.0.1' > 7587.842460890:main Q:Reg/w0 : rainerscript.c: eval expr 0x22147a0, type > 'CMP_EQ' > 7587.842463187:main Q:Reg/w0 : rainerscript.c: eval expr 0x22138e0, type > 'V' > 7587.842466060:main Q:Reg/w0 : rainerscript.c: rainerscript: (string) var > 8: '192.168.5.142' > 7587.842468406:main Q:Reg/w0 : rainerscript.c: eval expr 0x22138e0, return > datatype 'S':0 > 7587.842470812:main Q:Reg/w0 : rainerscript.c: eval expr 0x22147a0, return > datatype 'N':0 > 7587.842473056:main Q:Reg/w0 : ruleset.c: if condition result is 0 > 7587.842475315:main Q:Reg/w0 : rainerscript.c: IF > 7587.842480945:main Q:Reg/w0 : rainerscript.c: var > '!origin!hostname' > 7587.842490085:main Q:Reg/w0 : rainerscript.c: == > 7587.842497393:main Q:Reg/w0 : rainerscript.c: string '' > 7587.842508499:main Q:Reg/w0 : rainerscript.c: eval expr 0x22148f0, type > 'CMP_EQ' > 7587.842511262:main Q:Reg/w0 : rainerscript.c: eval expr 0x22146f0, type > 'V' > > I checked my JSON lookup table file and found a duplicate key (but with > different values). This was unintentional and was a result of a mistake when > building out the file (copy, paste, modify, oops!). > > I was hopeful that fixing that would resolve the issue, but no such luck. > > Just in case it isn't readable above, here is the same output plus the > ruleset I'm using: > > https://gist.github.com/deoren/7e8e799428d0d87c4e467fd4add51d09 > > Anything stand out as obviously wrong? > > I've gone back over the Expressions, Control Structures and Variable > (Property) types docs and it doesn't appear that I'm attempting to > access/set the variables improperly, but I could be missing something. The lookup table feature was added in 8.17.0 as an experimental feature. There seems to be at least one hard-to-find bug inside it, see here: https://github.com/rsyslog/rsyslog/issues/1071 In the past days, I have looked at the issue as the original contributor seems to have lost interest in it. It sounds like what you experience is related to this problem. It would be great if you could run rsyslog under valgrind and see if the problem persists and, if so, what valgrind reports. You can use valgrind simply by installing the distro-provided valgrind package and prefixing the command with "valgrind", as such: $ valgrind rsyslogd -dn > rsyslog-debug.log 2>&1 This would be very helpful. It might also be very, very useful if you could add your findings to above-mentioned github issue tracker. I am 99.999% sure it is the same root cause. Rainer > > Thanks in advance for your help. > > > ___ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog >