On Sun, 15 Dec 2013, Erik Steffl wrote:

Looking at the messages and RELP responses being sent, I realized that there are multiple connections open that are used by the same action, as far as I can tell.

 E.g. I see these relp confirmations received:

[pid 21291] 07:24:00.744578 recvfrom(14, "1 rsp 92 200 OK\nrelp_version=0\nrelp_software=librelp,1.2.0,http://librelp.adiscon.com\ncommands=syslog\n", 32768, MSG_DONTWAIT, NULL, NULL) = 102
[pid 21291] 07:25:05.093520 recvfrom(16, "1 rsp 92 200 OK\nrelp_version=0\nrelp_software=librelp,1.2.0,http://librelp.adiscon.com\ncommands=syslog\n", 32768, MSG_DONTWAIT, NULL, NULL) = 102
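Each payload above is one complete RELP frame: transaction number, command, data length, the data itself, and a newline trailer (the byte counts add up: 9 header bytes, 92 data bytes and the trailer give the 102 returned by recvfrom). A transaction number of 1 with an rsp carrying relp_version and commands offers looks like the reply to a session open, which fits fd 14 and fd 16 being two separately opened sessions. The Python sketch below splits such a frame into its fields; the layout (TXNR SP COMMAND SP DATALEN [SP DATA] LF) is my reading of RELP framing, and parse_relp_frame, parse_rsp and RelpFrame are hypothetical helpers, not librelp API.

```python
from __future__ import annotations

from typing import NamedTuple


class RelpFrame(NamedTuple):
    txnr: int       # transaction number
    command: str    # e.g. "open", "syslog", "rsp"
    data: bytes     # exactly DATALEN bytes of payload


def parse_relp_frame(buf: bytes) -> RelpFrame:
    """Parse one complete frame: TXNR SP COMMAND SP DATALEN [SP DATA] LF."""
    txnr, command, rest = buf.split(b" ", 2)
    # DATALEN is the run of ASCII digits at the start of `rest`.
    n_digits = len(rest) - len(rest.lstrip(b"0123456789"))
    if n_digits == 0:
        raise ValueError("missing DATALEN")
    datalen = int(rest[:n_digits])
    pos = n_digits
    if datalen > 0:
        # A non-empty payload is separated from DATALEN by one space.
        if rest[pos:pos + 1] != b" ":
            raise ValueError("expected a space before DATA")
        pos += 1
    data = rest[pos:pos + datalen]
    if len(data) != datalen:
        raise ValueError("frame is truncated")
    if rest[pos + datalen:] != b"\n":
        raise ValueError("missing or misplaced trailer")
    return RelpFrame(int(txnr), command.decode("ascii"), data)


def parse_rsp(data: bytes) -> tuple[str, dict[str, str]]:
    """Split an rsp payload into its status line and key=value offers."""
    status, *lines = data.decode("ascii").split("\n")
    offers = dict(line.split("=", 1) for line in lines if line)
    return status, offers


if __name__ == "__main__":
    frame = (b"1 rsp 92 200 OK\nrelp_version=0\n"
             b"relp_software=librelp,1.2.0,http://librelp.adiscon.com\n"
             b"commands=syslog\n")
    parsed = parse_relp_frame(frame)
    print(parsed.txnr, parsed.command, len(parsed.data))  # 1 rsp 92
    print(parse_rsp(parsed.data))
```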

 Then I look at lsof:

rsyslogd 21254 syslog 14u IPv4 8699295 0t0 TCP ip-10-158-97-169.ec2.internal:52708->ec2-50-19-250-187.compute-1.amazonaws.com:5140 (ESTABLISHED)

rsyslogd 21254 syslog 16u IPv4 8699315 0t0 TCP ip-10-158-97-169.ec2.internal:52709->ec2-50-19-250-187.compute-1.amazonaws.com:5140 (ESTABLISHED)

 Where ec2-50-19-250-187.compute-1.amazonaws.com is the load balancer.

So if there are a few connections open, MARK is sent every minute over ONE of them and keeps it alive, while the other one goes bad (ELB timeout). Then somebody else comes in, tries to use the bad connection, and there it is: silence...

Yep, this makes perfect sense.

If you have a fairly low traffic volume, you can get into the situation where one thread keeps draining the queue (and sending the messages) while the other thread (with the other connection) has no traffic, the connection goes idle, and bad-things-happen(TM).
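The timing argument can be made concrete with a toy Python model (not rsyslog code): two connections, MARK always landing on one and the 15-minute application message always on the other, and a load balancer that silently forgets a flow idle for longer than its timeout. The 60-second idle timeout and the fixed message-to-connection mapping are assumptions for illustration; in practice which connection carries which traffic is a matter of luck.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

IDLE_TIMEOUT = 60  # assumed load-balancer idle timeout in seconds (hypothetical)


@dataclass
class Connection:
    name: str
    last_activity: float = 0
    stale_hits: List[float] = field(default_factory=list)

    def send(self, now: float) -> None:
        # If the flow sat idle longer than the timeout, the load balancer has
        # already forgotten it and this message is silently lost. The model
        # only records idle gaps; it does not simulate reconnects.
        if now - self.last_activity > IDLE_TIMEOUT:
            self.stale_hits.append(now)
        self.last_activity = now


def simulate(duration: int, mark_period: int,
             app_period: int) -> Tuple[Connection, Connection]:
    """Replay periodic traffic for `duration` seconds over two connections.

    Assumption: MARK always lands on one connection and the periodic
    application message always lands on the other.
    """
    mark_conn, app_conn = Connection("fd 14"), Connection("fd 16")
    events = [(k * mark_period, mark_conn)
              for k in range(1, duration // mark_period + 1)]
    events += [(k * app_period, app_conn)
               for k in range(1, duration // app_period + 1)]
    # Sort by time only, so Connection objects are never compared.
    for when, conn in sorted(events, key=lambda e: e[0]):
        conn.send(when)
    return mark_conn, app_conn


if __name__ == "__main__":
    mark, app = simulate(3600, mark_period=60, app_period=900)
    print(mark.name, mark.stale_hits)  # fd 14 []
    print(app.name, app.stale_hits)    # fd 16 [900, 1800, 2700, 3600]
```

In this model, setting app_period=60 (roughly what the once-a-minute cron job does for the /dev/log path) leaves both lists empty.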

So I guess cron-ing the messages works because they go through /dev/log, the same as the program that runs every 15 minutes and hits the bad-connection problem. However, MARK uses the other connection, so it does not help.

I doubt it's that predictable; I think it's pure luck that one was reliable and the other wasn't.

 Does that make sense?

 That would mean that the possible fixes for this are:

 - use the TCP keepalive (not released yet but hopefully soon)

this is the real fix. It's in the master source, but not in anything tagged as a release, so it would require compiling from source.
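TCP keepalive makes the kernel send probe segments on an otherwise idle connection, so, assuming the load balancer counts those probes as activity, its idle timer never expires. The sketch below shows the standard socket options a sender could set from Python on Linux; it is not how librelp implements the feature, and the 30/10/3 values are assumptions chosen to stay under an assumed 60-second idle timeout.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 30,
                     interval: int = 10, count: int = 3) -> None:
    """Switch on TCP keepalive for a TCP socket and tune it where possible.

    idle:     seconds of inactivity before the first probe; the 30 s default
              is an assumption meant to stay below a 60 s load-balancer timeout.
    interval: seconds between unanswered probes.
    count:    unanswered probes before the kernel reports the peer as dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is platform-specific (these names exist on Linux);
    # where they are missing, the system-wide defaults apply instead.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))  # True
    s.close()
```

Switching keepalive on without tuning is not enough here: the Linux system-wide default (the net.ipv4.tcp_keepalive_time sysctl) waits 7200 seconds before the first probe, far longer than a typical load-balancer idle timeout.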

- somehow make rsyslog deal with the bad connection better (not sure how yet but since there is no actual network problem I guess there must be something rsyslog can do to talk to the collector again)

the only thing that rsyslog could do is have a shorter timeout. I'd suggest looking at tweaking the source (and/or TCP stack timeouts) to see if shorter timeouts actually help in your case. They may not really be possible.
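A shorter timeout could take two forms in a sender: bounding how long it waits for the receiver's rsp, and, on Linux, capping how long sent data may stay unacknowledged with the TCP_USER_TIMEOUT socket option. This is a generic Python sketch with hypothetical helper names, not a patch to rsyslog or librelp; on either error the caller would close the socket and reconnect.

```python
import socket


def set_user_timeout(sock: socket.socket, timeout_ms: int) -> bool:
    """Abort the connection if sent data stays unacknowledged for timeout_ms.

    TCP_USER_TIMEOUT is Linux-only; returns False where it is unavailable.
    """
    if not hasattr(socket, "TCP_USER_TIMEOUT"):
        return False
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
    return True


def await_response(sock: socket.socket, timeout: float,
                   bufsize: int = 32768) -> bytes:
    """Wait at most `timeout` seconds for the peer's reply (e.g. an rsp frame)."""
    sock.settimeout(timeout)
    try:
        data = sock.recv(bufsize)
    except socket.timeout as exc:
        # No reply in time: treat the connection as dead instead of waiting.
        raise TimeoutError(f"no response within {timeout} s") from exc
    if not data:
        raise ConnectionError("peer closed the connection")
    return data
```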

the other thing that may be happening is that it's not the load balancer that's broken. It may be that one of the machines has iptables rules that open stateful holes for this traffic. If that state times out, the load balancer won't know that the machine isn't getting traffic any longer until its own timeout expires (it's possible this problem is on the load balancer as well, but I would have expected to hear far more screams about it if the AWS ELB had this sort of bug).

If you go directly to the destination machine without going through the load balancer can you duplicate the problem?

David Lang


just asking whether that makes sense; of course I'm not asking anybody to do anything. Obviously, for the second solution I'd have to do more work to figure out what can be done and maybe come up with patches...

 thanks!

        erik

On 12/15/2013 08:41 PM, David Lang wrote:
On Sun, 15 Dec 2013, Erik Steffl wrote:

 changed config so now the MARK messages are sent (and received). BTW
they use kern.info facility, not sure why it's not the same as what
Rainer found in source (syslog.info).

 This solves the simple test case in which I only have sender, load
balancer (ELB) and receiver. I strace both sender and receiver, see
the MARK messages and the connection is fine, i.e. I send a log
message using e.g. logger, then wait 5 min, then send another one and
it works. Without MARK the second message never arrives.

 However in a more complex scenario this does not help at all. Complex
scenario looks like this (ascii arrows are flows of syslog messages):

 - 6 machines -> ELB-prod -> collector-prod

 - 1 machine -> ELB-test -> collector-test

 - every 5 minutes: collector-test -> ELB-prod -> collector-prod

 - every 5 minutes: collector-prod -> ELB-prod -> collector-prod (yes,
program on collector-prod sends message to rsyslog on collector-prod
over ELB)

 that 5-minute pause causes the connection to go stale somehow, which
results in periods of silence. I configured both collector-prod and
collector-test to send MARK messages (to collector-prod since that's
where they send regular messages) but I still see the periods of
silence (on both collectors). Used strace to verify that MARK messages
arrive (I guess it's possible that I confused the MARK messages from
collector-prod and collector-test, will continue investigating on that
front)

 However, adding a cron entry to send a few log messages every minute DOES
solve the problem (there is no silence anymore).

 Any ideas why that would be? Is it possible that MARK messages are
being sent through different connections than other messages?

 As far as I can tell the only difference between MARK messages and
the cron'd messages is that the MARK messages are generated by immark
and use the kern.info facility and the cron'd messages arrive via /dev/log
and use local0.info facility.

 Any ideas why the simpler scenario would behave differently than the
complex scenario? Or why MARK messages do not solve the problem in
complex scenario while cron'd messages do?

my guess is that you have some bug in your config that is not forwarding
the mark messages in the more complex case.

There really isn't a difference as far as this problem is concerned
between cron generating the message and immark generating the message.
As you note, it's just which process does the work.

David Lang

 thanks!

    erik

On 12/12/2013 02:39 PM, David Lang wrote:
On Thu, 12 Dec 2013, Erik Steffl wrote:

 I will try to test librelp (with keepalive) but I need some
workaround in the meantime (sort of right now).

 Already tested that cron-ing logger once per minute keeps the
connection alive so that's my backup workaround.
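The cron workaround amounts to running something like logger -p local0.info once a minute, so the message enters rsyslog through /dev/log and matches the local0 filter. A Python stand-in using the standard syslog module (Unix only), which hands the message to the C library's syslog() call, could look like this; the ident and message text are hypothetical.

```python
import syslog

KEEPALIVE_TAG = "relp-keepalive"  # hypothetical ident
KEEPALIVE_TEXT = "keepalive"      # hypothetical message text


def send_keepalive(tag: str = KEEPALIVE_TAG, text: str = KEEPALIVE_TEXT) -> int:
    """Log one local0.info message via the local syslog socket.

    Returns the syslog priority value used (facility * 8 + severity).
    """
    syslog.openlog(ident=tag, facility=syslog.LOG_LOCAL0)
    syslog.syslog(syslog.LOG_INFO, text)
    syslog.closelog()
    # LOG_LOCAL0 is already shifted (16 << 3), so OR-ing in the level works.
    return syslog.LOG_LOCAL0 | syslog.LOG_INFO


if __name__ == "__main__":
    print(send_keepalive())  # 134: local0 (16) * 8 + info (6)
```

Apart from the facility, which is what decides whether the local0 to local7 filter forwards it, this message is no different from a MARK message.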

 immark would be better because then I only need to install rsyslog
config (easier deployment), plus it would be more efficient. Do you
think that what David suggested is the best option?

 if I understood David's comment something like this is what I am
looking for:

if
prifilt("local0.*") or
prifilt("local1.*") or
prifilt("local2.*") or
prifilt("local3.*") or
prifilt("local4.*") or
prifilt("local5.*") or
prifilt("local6.*") or
prifilt("local7.*") or
( prifilt("syslog.info") and ... message is --MARK--)

pretty much, I would do $msg == '--MARK--' as the second test

David Lang

then {
  action(type="mmjsonparse")
  if $parsesuccess == "OK" then {
    action(
      type="omrelp"
      target="someHost"
      port="5140"
      template="json"
      # see http://www.rsyslog.com/doc/node32.html
      # disk used if forwarding blocked
      queue.filename="json"
      queue.maxdiskspace="75161927680" # 70GB (valuable data)
      action.writeAllMarkMessages="on"
    )
  } else {
    ...
  }
}

 reasonable? Can be improved?

 thanks!

    erik

On 12/12/2013 12:44 PM, Rainer Gerhards wrote:
On Thu, Dec 12, 2013 at 9:17 PM, David Lang <[email protected]> wrote:

On Thu, 12 Dec 2013, Erik Steffl wrote:

  On 12/12/2013 08:29 AM, David Lang wrote:

what facility and severity do the immark messages show up as?

immark just generates messages; normal filtering rules determine
where they are sent, and the transport used (in this case RELP) has no
effect on whether they are sent or not. It's all in the filters.
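Put another way, forwarding is decided by a predicate over facility, severity and message text, and the original local0 to local7 test is simply false for a mark message. The Python model below mirrors the extended filter discussed in this thread (local0 through local7, or an info-level mark message matched on its text, as David suggests). It is only a model for reasoning about which messages reach the omrelp action, not rsyslog code; the mark text and the facility it arrives with are assumptions to confirm with RSYSLOG_DebugFormat, since the thread reports both kern.info and syslog.info.

```python
LOCAL_FACILITIES = {f"local{i}" for i in range(8)}
# Assumption: the thread reports both kern.info (observed) and syslog.info
# (from the immark source), so accept either.
MARK_FACILITIES = {"kern", "syslog"}
# Assumption: mark text as written in this thread; verify the exact string.
MARK_TEXT = "--MARK--"


def original_filter(facility: str, severity: str, msg: str) -> bool:
    """Models prifilt("local0.*") or ... or prifilt("local7.*")."""
    return facility in LOCAL_FACILITIES


def extended_filter(facility: str, severity: str, msg: str) -> bool:
    """Original filter, or an info-level mark message matched on its text."""
    if original_filter(facility, severity, msg):
        return True
    return (facility in MARK_FACILITIES
            and severity == "info"  # exactly info, like syslog.=info
            and msg.strip() == MARK_TEXT)
```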


  thanks, that makes my question a lot more specific. How do I
configure immark to use a specific facility?


I don't think you do. I think they are using the syslog or kernel
facility, but I'd have to set up a quick test to check. I'll try to do
it tonight if I can, but since you are seeing the messages locally, log
with RSYSLOG_DebugFormat for a couple of minutes and look at what they
are logged as.


it's syslog.=info:

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=plugins/immark/immark.c;h=0e946c0b92d555174b38de42dd236ac4432b98e7;hb=HEAD#l196





  All I found when searching is this:

$ModLoad immark.so
$MarkMessagePeriod 60

  which is what I have in my config.

  Given that I see the --MARK-- messages in /var/log/syslog and
/var/log/kern.log, I guess they are going to the kern facility. Given
the config below, I need to use e.g. the local0 facility.


no, you need to change your filtering config to send these messages,
not try to change the messages to match your current config.


you actually can't. I consider mark a legacy feature and have not
enhanced it in 8 years.

Keepalive is the better option. librelp is not yet built due to the
current workload. The code right now is actually on github only, as I
have some problems with the Adiscon repo. Easy to clone from here:

https://github.com/rgerhards/librelp



messages have the facility that they have, you don't change the
facility any more than you re-write the message to say something
different.


actually, in this case a config option would make sense. But again, I
thought this is just legacy...

Rainer



David Lang


   Unfortunately I can't find anything related to --MARK-- and
facilities (or anything else other than the two settings above).

  Any ideas/pointers? Or, if it's not possible to configure immark, can
I catch the --MARK-- message and change its facility? Or catch the
--MARK-- message and have an action that uses omrelp and the same
target (would that use the same TCP connection)?

  Thanks!

         erik


David Lang

On Thu, 12 Dec 2013, Erik Steffl wrote:

  Date: Thu, 12 Dec 2013 02:30:52 -0800
From: Erik Steffl <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: [rsyslog] immark - how to use with action(...)

  How would I use immark to send mark messages for defined actions
that use omrelp?

I have tried something like this:

$ModLoad immark.so
$MarkMessagePeriod 60

if
  prifilt("local0.*") or
  prifilt("local1.*") or
  prifilt("local2.*") or
  prifilt("local3.*") or
  prifilt("local4.*") or
  prifilt("local5.*") or
  prifilt("local6.*") or
  prifilt("local7.*")
then {
  action(type="mmjsonparse")
  if $parsesuccess == "OK" then {
    action(
      type="omrelp"
      target="someHost"
      port="5140"
      template="json"
      # see http://www.rsyslog.com/doc/node32.html
      # disk used if forwarding blocked
      queue.filename="json"
      queue.maxdiskspace="75161927680" # 70GB (valuable data)
      action.writeAllMarkMessages="on"
    )
  } else {
    ...
  }
}

I see --MARK-- messages in /var/log/syslog and /var/log/kern.log, but
they are not sent by the omrelp action (the action works fine; normal
messages are going through).

Verified where the --MARK-- messages are going using strace, so I'm
pretty sure they are only going to those two local files; nothing goes
over RELP. Also checked on the receiving side of RELP: no incoming
messages have --MARK-- in them. And the connection goes down, which is
also a very strong indicator that there are no --MARK-- messages.

How do I configure it so that the --MARK-- messages are sent over the
RELP protocol to someHost, over the same TCP connection that the given
action uses? The purpose is to keep the connection alive, since RELP
does not support KeepAlive (yet; Rainer just added it to master,
thanks!).

This is on Ubuntu 13.10 using rsyslog 7.5.6 and librelp 1.2.0 from the
adiscon repo.

Thanks!

     erik
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
POST
if you DON'T LIKE THAT.
