Re: [rsyslog] Qualys scan against rsyslog causes it to segfault

2017-10-23 Thread deoren

On 10/23/2017 7:51 PM, deoren wrote:

On 10/23/2017 7:38 PM, deoren wrote:

On 10/23/2017 7:11 PM, David Lang wrote:
do you have a tcpdump or info from Qualys saying what it sends as 
part of the scan?


David Lang




Thankfully (for troubleshooting purposes), the problem isn't specific 
to the Qualys scan. I later learned that messages coming from our ESXi 
hosts trigger the problem as well. It may be that ANY message arriving 
on an input where I'm attempting to check for an empty 
$!origin!hostname property is enough to trigger the segfault.


That said, the messages sent by the Qualys scan along with messages 
I've seen coming from our ESXi hosts are often missing information 
(such as the hostname).


I'll do further testing and post back.


According to what I captured with tcpdump, this is what Wireshark 
translated the conversation as:


<166>2017-10-24T00:48:08.071Z vms1.example.com Hostd: [23140B70 verbose 
'Solo.VmwareCLI' opID=hostd-3963 user=root] Result (type boolean) (wsdl 
boolean) (kind 1)
<166>2017-10-24T00:48:08.071Z vms1.example.com Hostd: [22DC2B70 verbose 
'Hostsvc.SyslogConfigProvider'] Received syslog cli invalidation message
<166>2017-10-24T00:48:08.071Z vms1.example.com Hostd: [22DC2B70 verbose 
'Hostsvc.SyslogConfigProvider'] Running '/sbin/localcli system syslog 
config get'
<166>2017-10-24T00:48:08.073Z vms1.example.com Hostd: [22DC2B70 info 
'SysCommandPosix'] ForkExec(/sbin/localcli)  9164454
<166>2017-10-24T00:48:08.077Z vms1.example.com Hostd: [226B0B70 verbose 
'Default' opID=hostd-10a7 user=root] CloseSession called for session 
id=0896d7c3-f4a1-d872-7b76-a01bf0543edf
<166>2017-10-24T00:48:08.077Z vms1.example.com Hostd: [226B0B70 info 
'Vimsvc.ha-eventmgr' opID=hostd-10a7 user=root] Event 743 : User 
root@127.0.0.1 logged out (login time: Tuesday, 24 October, 2017 
00:48:07, number of API invocations: 0, user agent: )
<166>2017-10-24T00:48:08.106Z vms1.example.com Rhttpproxy: [FF9CFB70 
verbose 'Proxy Req 85506'] The client closed the stream, not unexpectedly.
<166>2017-10-24T00:48:08.408Z vms1.example.com Hostd: [22DC2B70 verbose 
'Hostsvc.SyslogConfigProvider'] Running '/sbin/localcli system syslog 
config logger list'
<166>2017-10-24T00:48:08.409Z vms1.example.com Hostd: [22DC2B70 info 
'SysCommandPosix'] ForkExec(/sbin/localcli)  916445


I'll next test with the logger command from a remote host and see where 
that goes.


logger --tcp --port 514 --server sawmill3.example.com "hello!"

That's enough to cause a segfault.
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Re: [rsyslog] Qualys scan against rsyslog causes it to segfault

2017-10-23 Thread deoren

On 10/23/2017 7:38 PM, deoren wrote:

On 10/23/2017 7:11 PM, David Lang wrote:
do you have a tcpdump or info from Qualys saying what it sends as part 
of the scan?


David Lang




Thankfully (for troubleshooting purposes), the problem isn't specific to 
the Qualys scan. I later learned that messages coming from our ESXi 
hosts trigger the problem as well. It may be that ANY message arriving 
on an input where I'm attempting to check for an empty $!origin!hostname 
property is enough to trigger the segfault.


That said, the messages sent by the Qualys scan along with messages I've 
seen coming from our ESXi hosts are often missing information (such as 
the hostname).


I'll do further testing and post back.


According to what I captured with tcpdump, this is what Wireshark 
translated the conversation as:


<166>2017-10-24T00:48:08.071Z vms1.example.com Hostd: [23140B70 verbose 
'Solo.VmwareCLI' opID=hostd-3963 user=root] Result (type boolean) (wsdl 
boolean) (kind 1)
<166>2017-10-24T00:48:08.071Z vms1.example.com Hostd: [22DC2B70 verbose 
'Hostsvc.SyslogConfigProvider'] Received syslog cli invalidation message
<166>2017-10-24T00:48:08.071Z vms1.example.com Hostd: [22DC2B70 verbose 
'Hostsvc.SyslogConfigProvider'] Running '/sbin/localcli system syslog 
config get'
<166>2017-10-24T00:48:08.073Z vms1.example.com Hostd: [22DC2B70 info 
'SysCommandPosix'] ForkExec(/sbin/localcli)  9164454
<166>2017-10-24T00:48:08.077Z vms1.example.com Hostd: [226B0B70 verbose 
'Default' opID=hostd-10a7 user=root] CloseSession called for session 
id=0896d7c3-f4a1-d872-7b76-a01bf0543edf
<166>2017-10-24T00:48:08.077Z vms1.example.com Hostd: [226B0B70 info 
'Vimsvc.ha-eventmgr' opID=hostd-10a7 user=root] Event 743 : User 
root@127.0.0.1 logged out (login time: Tuesday, 24 October, 2017 
00:48:07, number of API invocations: 0, user agent: )
<166>2017-10-24T00:48:08.106Z vms1.example.com Rhttpproxy: [FF9CFB70 
verbose 'Proxy Req 85506'] The client closed the stream, not unexpectedly.
<166>2017-10-24T00:48:08.408Z vms1.example.com Hostd: [22DC2B70 verbose 
'Hostsvc.SyslogConfigProvider'] Running '/sbin/localcli system syslog 
config logger list'
<166>2017-10-24T00:48:08.409Z vms1.example.com Hostd: [22DC2B70 info 
'SysCommandPosix'] ForkExec(/sbin/localcli)  916445


I'll next test with the logger command from a remote host and see where 
that goes.

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] Qualys scan against rsyslog causes it to segfault

2017-10-23 Thread deoren

On 10/23/2017 7:11 PM, David Lang wrote:
do you have a tcpdump or info from Qualys saying what it sends as part 
of the scan?


David Lang




Thankfully (for troubleshooting purposes), the problem isn't specific to 
the Qualys scan. I later learned that messages coming from our ESXi 
hosts trigger the problem as well. It may be that ANY message arriving 
on an input where I'm attempting to check for an empty $!origin!hostname 
property is enough to trigger the segfault.


That said, the messages sent by the Qualys scan along with messages I've 
seen coming from our ESXi hosts are often missing information (such as 
the hostname).


I'll do further testing and post back.

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] Qualys scan against rsyslog causes it to segfault

2017-10-23 Thread David Lang
do you have a tcpdump or info from Qualys saying what it sends as part of the 
scan?


David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] Qualys scan against rsyslog causes it to segfault

2017-10-23 Thread deoren

On 10/7/2017 10:44 AM, deoren wrote:

On 10/7/2017 5:25 AM, Rainer Gerhards wrote:

2017-10-07 7:57 GMT+02:00 deoren
:

As I dig more into this, I'm beginning to think the only thing the 
Qualys

scan did was aggravate an existing problem and cause rsyslog to tip over
more quickly.

While I didn't observe the problem at the time, I recently made some
configuration changes to introduce a lookup table to map known IPs to 
known
hostnames (something I plan to programmatically build from a central 
db at
some point in the future). I opted to try this in an effort to stop 
using

$hostname directly (which gives odd results with invalid messages) for
dynamic directory/filenames.

A long story short, I found that systemd is now (without a Qualys scan
actively running) restarting rsyslog after it dies within a minute or 
so of

running. My guess is that when a remote client sees that the rsyslog
receiver is running again it retries delivery and when the receiver is
passed the message it performs the lookup. At that point the real 
cause for

the segfault is triggered and rsyslog goes down.

Looking at the output of the dmesg command I found many instances of
messages like this one:

[364581.440049] rs:main Q:Reg[82570]: segfault at 0 ip 
7faeb1fa55f4 sp

7faeaac431d0 error 4 in libfastjson.so.4.1.0[7faeb1fa1000+a000]

I stopped rsyslog and then manually started it up like so:

rsyslogd -dn > rsyslog-debug.log 2>&1

This was printed to the console:

Segmentation fault (core dumped)

Here are the last 20 lines from the debug log file:

7587.842410212:main Q:Reg/w0  : ../action.c: action 'action 21': set
suspended state to 0
7587.842412422:main Q:Reg/w0  : rainerscript.c: STOP
7587.842418787:main Q:Reg/w0  : ruleset.c: processBATCH: next msg 44:
<167>2017-10-07T01:15:37.701Z vms2.example.com Rhttpproxy: verbose
rhttpproxy[79FC5B70] [Originator@6876 sub=Proxy Req 829
7587.842421471:main Q:Reg/w0  : rainerscript.c: CALL
[receiver-origin-hostname-lookup, queue:0]
7587.842427317:main Q:Reg/w0  : rainerscript.c: IF
7587.842433062:main Q:Reg/w0  : rainerscript.c: var 
'fromhost-ip'

7587.842442260:main Q:Reg/w0  : rainerscript.c:   ==
7587.842449663:main Q:Reg/w0  : rainerscript.c: string 
'127.0.0.1'
7587.842460890:main Q:Reg/w0  : rainerscript.c: eval expr 0x22147a0, 
type

'CMP_EQ'
7587.842463187:main Q:Reg/w0  : rainerscript.c: eval expr 0x22138e0, 
type

'V'
7587.842466060:main Q:Reg/w0  : rainerscript.c: rainerscript: 
(string) var

8: '192.168.5.142'
7587.842468406:main Q:Reg/w0  : rainerscript.c: eval expr 0x22138e0, 
return

datatype 'S':0
7587.842470812:main Q:Reg/w0  : rainerscript.c: eval expr 0x22147a0, 
return

datatype 'N':0
7587.842473056:main Q:Reg/w0  : ruleset.c: if condition result is 0
7587.842475315:main Q:Reg/w0  : rainerscript.c: IF
7587.842480945:main Q:Reg/w0  : rainerscript.c: var
'!origin!hostname'
7587.842490085:main Q:Reg/w0  : rainerscript.c:   ==
7587.842497393:main Q:Reg/w0  : rainerscript.c: string ''
7587.842508499:main Q:Reg/w0  : rainerscript.c: eval expr 0x22148f0, 
type

'CMP_EQ'
7587.842511262:main Q:Reg/w0  : rainerscript.c: eval expr 0x22146f0, 
type

'V'

I checked my JSON lookup table file and found a duplicate key (but with
different values). This was unintentional and was a result of a 
mistake when

building out the file (copy, paste, modify, oops!).

I was hopeful that fixing that would resolve the issue, but no such 
luck.


Just in case it isn't readable above, here is the same output plus the
ruleset I'm using:

https://gist.github.com/deoren/7e8e799428d0d87c4e467fd4add51d09

Anything stand out as obviously wrong?

I've gone back over the Expressions, Control Structures and Variable
(Property) types docs and it doesn't appear that I'm attempting to
access/set the variables improperly, but I could be missing something.


The lookup table feature was added in 8.17.0 as an experimental
feature. There seems to be at least one hard-to-find bug inside it,
see here:

https://github.com/rsyslog/rsyslog/issues/1071

In the past days, I have looked at the issue as the original
contributor seems to have lost interest in it. It sounds like what you
experience is related to this problem. It would be great if you could
run rsyslog under valgrind and see if the problem persists and, if so,
what valgrind reports. You can use valgrind simply by installing the
distro-provided valgrind package and prefixing the command with
"valgrind", as such:

$ valgrind rsyslogd -dn > rsyslog-debug.log 2>&1

This would be very helpful. It might also be very, very useful if you
could add your findings to above-mentioned github issue tracker. I am
99.999% sure it is the same root cause.

Rainer



Thanks in advance for your help.


Sure, I can do that.

I ended up eventually getting my configuration to a "stable" point where 
where rsyslog was no longer crashing, though I don't yet fully 

[rsyslog] RES: Client/server architecture with omrelp/imrelp - Lost of messages in case of server side failure

2017-10-23 Thread Jether B. Santos via rsyslog
Hello,

I realized the parameter RebindInterval was causing duplicated messages in both 
balanced servers. I removed that and apparentely the messages haven't been lost 
anymore.

Tomorrow I will stress it more.

Regards,

Jether

De: Jether B. Santos
Enviada em: segunda-feira, 23 de outubro de 2017 11:35
Para: 'rsyslog@lists.adiscon.com' 
Assunto: Client/server architecture with omrelp/imrelp - Lost of messages in 
case of server side failure

Hi everybody,

I'm testing the following rsyslog client/server architecture:

- Client side: a rsyslog server that consumes messages from a local apache log 
file through imfile module. The main worker threads save the messages in a disk 
assisted queue of an omrelp action. The action threads dequeue the messages and 
send them to rsyslog servers behind a haproxy instance. Rsyslog's client 
configuration:
---
module(load="imfile" mode="inotify")
input(type="imfile" file="/var/log/httpd/access_log" tag="apache:" 
facility="local3" severity="notice" ruleset="remote_logserver")

module(load="omrelp")
ruleset(name="remote_logserver"){
action(type="omrelp" Target="10.0.0.43" Port="514" 
queue.filename="fila-msg" queue.size="180" queue.saveonshutdown="on"
queue.type="LinkedList" action.resumeretrycount="-1" 
action.resumeInterval="1" RebindInterval="72")
}
---

- Server side: two rsyslog servers behind a haproxy load balancer instance. The 
servers listen the logs through imrelp module. Rsyslog's servers configuration:
---
module(load="imrelp")
input(type="imrelp" port="514")
local3.*/var/log/httpd-access.log
---

The problem: Everytime I stop and start the rsyslog daemon in the remote 
rsyslog server where the client is conected, despite haproxy switches the 
connection to the other rsyslog server, at least a message is lost. I tried 
configure retrycount="-1" but without success.

I read about similar problem in the mail list 
(http://lists.adiscon.net/pipermail/rsyslog/2016-March/042379.html) but I've 
not understood exaclly what's been done to solve that 
(https://github.com/rsyslog/rsyslog/issues/974?_pjax=%23js-repo-pjax-container):

- Is it really mandatory to configure batch of only one message a time 
(queue.dequeuebatchsize="1") in order to not lose any message in case of the 
remote rsyslog server fail?
- Has the "isolation" of a "corrupt message" implemented (Rainer said in the 
mail list mentioned above: I thought that if a batch failed, it pushed all the 
messages back on the queue and retried with a half size batch until it got to 
the one message that could not be processed and only did a fatal fail on that 
message)?
- What about the action retry feature? When it is enabled and you have a batch 
of many messages and just one fails, what's expected to happen?

My servers environment:
- Centos 7.3
- Rsyslog version: 8.24

Regards,

Jether



"Esta mensagem e reservada e sua divulgacao, distribuicao, reproducao ou 
qualquer forma de uso e proibida e depende de previa autorizacao desta 
instituicao. O remetente utiliza o correio eletronico no exercicio do seu 
trabalho ou em razao dele, eximindo esta instituicao de qualquer 
responsabilidade por utilizacao indevida. Se voce recebeu esta mensagem por 
engano, favor elimina-la imediatamente."

"This message is reserved and its disclosure, distribution, reproduction or any 
other form of use is prohibited and shall depend upon previous proper 
authorization. The sender uses the electronic mail in the exercise of his/her 
work or by virtue thereof, and the institution accepts no liability for its 
undue use. If you have received this e-mail by mistake, please delete it 
immediately."

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


[rsyslog] Client/server architecture with omrelp/imrelp - Lost of messages in case of server side failure

2017-10-23 Thread Jether B. Santos via rsyslog
Hi everybody,

I'm testing the following rsyslog client/server architecture:

- Client side: a rsyslog server that consumes messages from a local apache log 
file through imfile module. The main worker threads save the messages in a disk 
assisted queue of an omrelp action. The action threads dequeue the messages and 
send them to rsyslog servers behind a haproxy instance. Rsyslog's client 
configuration:
---
module(load="imfile" mode="inotify")
input(type="imfile" file="/var/log/httpd/access_log" tag="apache:" 
facility="local3" severity="notice" ruleset="remote_logserver")

module(load="omrelp")
ruleset(name="remote_logserver"){
action(type="omrelp" Target="10.0.0.43" Port="514" 
queue.filename="fila-msg" queue.size="180" queue.saveonshutdown="on"
queue.type="LinkedList" action.resumeretrycount="-1" 
action.resumeInterval="1" RebindInterval="72")
}
---

- Server side: two rsyslog servers behind a haproxy load balancer instance. The 
servers listen the logs through imrelp module. Rsyslog's servers configuration:
---
module(load="imrelp")
input(type="imrelp" port="514")
local3.*/var/log/httpd-access.log
---

The problem: Everytime I stop and start the rsyslog daemon in the remote 
rsyslog server where the client is conected, despite haproxy switches the 
connection to the other rsyslog server, at least a message is lost. I tried 
configure retrycount="-1" but without success.

I read about similar problem in the mail list 
(http://lists.adiscon.net/pipermail/rsyslog/2016-March/042379.html) but I've 
not understood exaclly what's been done to solve that 
(https://github.com/rsyslog/rsyslog/issues/974?_pjax=%23js-repo-pjax-container):

- Is it really mandatory to configure batch of only one message a time 
(queue.dequeuebatchsize="1") in order to not lose any message in case of the 
remote rsyslog server fail?
- Has the "isolation" of a "corrupt message" implemented (Rainer said in the 
mail list mentioned above: I thought that if a batch failed, it pushed all the 
messages back on the queue and retried with a half size batch until it got to 
the one message that could not be processed and only did a fatal fail on that 
message)?
- What about the action retry feature? When it is enabled and you have a batch 
of many messages and just one fails, what's expected to happen?

My servers environment:
- Centos 7.3
- Rsyslog version: 8.24

Regards,

Jether



"Esta mensagem e reservada e sua divulgacao, distribuicao, reproducao ou 
qualquer forma de uso e proibida e depende de previa autorizacao desta 
instituicao. O remetente utiliza o correio eletronico no exercicio do seu 
trabalho ou em razao dele, eximindo esta instituicao de qualquer 
responsabilidade por utilizacao indevida. Se voce recebeu esta mensagem por 
engano, favor elimina-la imediatamente."

"This message is reserved and its disclosure, distribution, reproduction or any 
other form of use is prohibited and shall depend upon previous proper 
authorization. The sender uses the electronic mail in the exercise of his/her 
work or by virtue thereof, and the institution accepts no liability for its 
undue use. If you have received this e-mail by mistake, please delete it 
immediately."

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.