On Mon, Dec 16, 2013 at 8:47 AM, Erik Steffl <[email protected]> wrote:
> looking at messages and relp responses being sent I realized that there
> are multiple connections open that are used by the same action, as far as I
> can tell.
>
> E.g. I see these relp confirmations received:
>
> [pid 21291] 07:24:00.744578 recvfrom(14, "1 rsp 92 200 OK\nrelp_version=0\nrelp_software=librelp,1.2.0,http://librelp.adiscon.com\ncommands=syslog\n", 32768, MSG_DONTWAIT, NULL, NULL) = 102
> [pid 21291] 07:25:05.093520 recvfrom(16, "1 rsp 92 200 OK\nrelp_version=0\nrelp_software=librelp,1.2.0,http://librelp.adiscon.com\ncommands=syslog\n", 32768, MSG_DONTWAIT, NULL, NULL) = 102
>
> Then I look at lsof:
>
> rsyslogd 21254 syslog 14u IPv4 8699295 0t0 TCP ip-10-158-97-169.ec2.internal:52708->ec2-50-19-250-187.compute-1.amazonaws.com:5140 (ESTABLISHED)
>
> rsyslogd 21254 syslog 16u IPv4 8699315 0t0 TCP ip-10-158-97-169.ec2.internal:52709->ec2-50-19-250-187.compute-1.amazonaws.com:5140 (ESTABLISHED)
>
> Where ec2-50-19-250-187.compute-1.amazonaws.com is the load balancer.
>
> So if there are a few connections open, MARK is sent every minute over ONE
> of them and keeps it alive, while the other one goes bad (ELB timeout). Then
> somebody else comes in, tries to use the bad connection and there it is,
> silence...
>
> So I guess cron-ing the messages works because that uses /dev/log, same
> as the program that runs every 15 minutes and encounters the bad connection
> problem. However MARK uses the other connection, so it does not help.
>
> Does that make sense?
>

Definitely, if you have multiple actions, they go over different connections.

> That would mean that the possible fixes for this are:
>
> - use the TCP keepalive (not released yet but hopefully soon)
>

I won't do any official release right before I go off on vacation. Tried
librelp last week, but you know what happened ;) In any case, everything is
present and can be built from source.
> - somehow make rsyslog deal with the bad connection better (not sure how
> yet but since there is no actual network problem I guess there must be
> something rsyslog can do to talk to the collector again)
>

If we don't get an error, how should *we* improve? Fix the load balancer
that accepts messages and throws them away. IMHO rsyslog/relp work just
like they should, and I have no idea how we could fix that problem.

>
> just asking whether that makes sense, of course not asking anybody to do
> anything, obviously for the second solution I'd have to do more work to
> figure out what can be done and maybe come up with patches...
>

If someone finds a way to make it work with the broken balancer, I'll
gladly add patches. I also think you can probably get this going just by
setting the retry settings in a useful way. Have you played with them? Can
you post the relp action config (I am sure you did, but it takes time to
find it...).

Rainer

> thanks!
>
> erik
>
>
> On 12/15/2013 08:41 PM, David Lang wrote:
>
>> On Sun, 15 Dec 2013, Erik Steffl wrote:
>>
>>> changed config so now the MARK messages are sent (and received). BTW
>>> they use kern.info facility, not sure why it's not the same as what
>>> Rainer found in source (syslog.info).
>>>
>>> This solves the simple test case in which I only have sender, load
>>> balancer (ELB) and receiver. I strace both sender and receiver, see
>>> the MARK messages and the connection is fine, i.e. I send a log
>>> message using e.g. logger, then wait 5 min, then send another one and
>>> it works. Without MARK the second message never arrives.
>>>
>>> However in a more complex scenario this does not help at all.
>>> The complex scenario looks like this (ASCII arrows are flows of syslog
>>> messages):
>>>
>>> - 6 machines -> ELB-prod -> collector-prod
>>>
>>> - 1 machine -> ELB-test -> collector-test
>>>
>>> - every 5 minutes: collector-test -> ELB-prod -> collector-prod
>>>
>>> - every 5 minutes: collector-prod -> ELB-prod -> collector-prod (yes,
>>> a program on collector-prod sends messages to rsyslog on collector-prod
>>> over ELB)
>>>
>>> That 5 minute pause causes the connection to go stale somehow, which
>>> results in periods of silence. I configured both collector-prod and
>>> collector-test to send MARK messages (to collector-prod since that's
>>> where they send regular messages) but I still see the periods of
>>> silence (on both collectors). Used strace to verify that MARK messages
>>> arrive (I guess it's possible that I confused the MARK messages from
>>> collector-prod and collector-test, will continue investigating on that
>>> front).
>>>
>>> However adding a cron entry to send a few log messages every minute DOES
>>> solve the problem (there is no silence anymore).
>>>
>>> Any ideas why that would be? Is it possible that MARK messages are
>>> being sent through different connections than other messages?
>>>
>>> As far as I can tell the only difference between MARK messages and
>>> the cron'd messages is that the MARK messages are generated by immark
>>> and use kern.info facility and the cron'd messages arrive via /dev/log
>>> and use local0.info facility.
>>>
>>> Any ideas why the simpler scenario would behave differently than the
>>> complex scenario? Or why MARK messages do not solve the problem in the
>>> complex scenario while cron'd messages do?
>>>
>>
>> my guess is that you have some bug in your config that is not forwarding
>> the mark messages in the more complex case.
>>
>> There really isn't a difference as far as this problem is concerned
>> between cron generating the message and immark generating the message.
>> As you note, it's just which process does the work.
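A minimal, untested sketch (rsyslog 7.x assumed) of what "forwarding the mark messages" could look like with the config posted further down in this thread. It does two things: MARK messages bypass mmjsonparse (they are not JSON, so `$parsesuccess` would not be "OK" for them, and if the `else` branch holds another omrelp action they would leave over that action's connection), and the omrelp action is defined exactly once inside a ruleset that both paths `call`, so, per Rainer's point that separate actions mean separate connections, all traffic shares one RELP connection. Assumptions: the ruleset name `relp_out` is hypothetical, the `json` template is the one from Erik's config (not shown here), `call` is available in your rsyslog version, and the MARK text contains `-- MARK --` (check the exact text in /var/log/syslog first).

```
# Untested sketch for rsyslog 7.x. Assumed/hypothetical: ruleset name
# "relp_out", the MARK text "-- MARK --", and the "json" template
# (defined elsewhere in the original config).
$ModLoad immark.so
$MarkMessagePeriod 60
module(load="mmjsonparse")
module(load="omrelp")

# The only omrelp action. Every "call relp_out" reaches this single action
# instance, i.e. one RELP connection to the load balancer.
ruleset(name="relp_out") {
    action(
        type="omrelp"
        target="someHost"
        port="5140"
        template="json"
        # Disk-assisted queue: action queues default to Direct mode,
        # where queue.filename has no effect, so set a memory queue type.
        queue.type="LinkedList"
        queue.filename="json"
        queue.maxdiskspace="75161927680" # 70GB (valuable data)
        # Keep retrying after a failure. This only helps when librelp
        # actually sees an error, not when the ELB drops data silently.
        action.resumeRetryCount="-1"
        # Deliver every MARK, not only when the action has been idle.
        action.writeAllMarkMessages="on"
    )
}

# MARK messages carry no JSON, so hand them to the shared action directly
# instead of through mmjsonparse; matching on the text works whether they
# arrive as kern.info or syslog.info.
if $msg contains "-- MARK --" then {
    call relp_out
}

if
    prifilt("local0.*") or
    prifilt("local1.*") or
    prifilt("local2.*") or
    prifilt("local3.*") or
    prifilt("local4.*") or
    prifilt("local5.*") or
    prifilt("local6.*") or
    prifilt("local7.*")
then {
    action(type="mmjsonparse")
    if $parsesuccess == "OK" then {
        call relp_out
    }
    # Handling for non-JSON messages goes here. Any omrelp action defined
    # here is a separate action and opens its own connection.
}
```

The collector will then receive MARK messages rendered through the `json` template; whether that output is acceptable depends on how that template is defined.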
>>
>> David Lang
>>
>>> thanks!
>>>
>>> erik
>>>
>>> On 12/12/2013 02:39 PM, David Lang wrote:
>>>
>>>> On Thu, 12 Dec 2013, Erik Steffl wrote:
>>>>
>>>>> I will try to test librelp (with keepalive) but I need some
>>>>> workaround in the meantime (sort of right now).
>>>>>
>>>>> Already tested that cron-ing logger once per minute keeps the
>>>>> connection alive so that's my backup workaround.
>>>>>
>>>>> immark would be better because then I only need to install rsyslog
>>>>> config (easier deployment), plus it would be more efficient. Do you
>>>>> think that what David suggested is the best option?
>>>>>
>>>>> If I understood David's comment, something like this is what I am
>>>>> looking for:
>>>>>
>>>>> if
>>>>>     prifilt("local0.*") or
>>>>>     prifilt("local1.*") or
>>>>>     prifilt("local2.*") or
>>>>>     prifilt("local3.*") or
>>>>>     prifilt("local4.*") or
>>>>>     prifilt("local5.*") or
>>>>>     prifilt("local6.*") or
>>>>>     prifilt("local7.*") or
>>>>>     ( prifilt("syslog.info") and ... message is --MARK--)
>>>>>
>>>>
>>>> pretty much, I would do $msg == '--MARK--' as the second test
>>>>
>>>> David Lang
>>>>
>>>>> then {
>>>>>     action(type="mmjsonparse")
>>>>>     if $parsesuccess == "OK" then {
>>>>>         action(
>>>>>             type="omrelp"
>>>>>             target="someHost"
>>>>>             port="5140"
>>>>>             template="json"
>>>>>             # see http://www.rsyslog.com/doc/node32.html
>>>>>             # disk used if forwarding blocked
>>>>>             queue.filename="json"
>>>>>             queue.maxdiskspace="75161927680" # 70GB (valuable data)
>>>>>             action.writeAllMarkMessages="on"
>>>>>         )
>>>>>     } else {
>>>>>         ...
>>>>>     }
>>>>> }
>>>>>
>>>>> reasonable? Can be improved?
>>>>>
>>>>> thanks!
>>>>>
>>>>> erik
>>>>>
>>>>> On 12/12/2013 12:44 PM, Rainer Gerhards wrote:
>>>>>
>>>>>> On Thu, Dec 12, 2013 at 9:17 PM, David Lang <[email protected]> wrote:
>>>>>>
>>>>>>> On Thu, 12 Dec 2013, Erik Steffl wrote:
>>>>>>>
>>>>>>>> On 12/12/2013 08:29 AM, David Lang wrote:
>>>>>>>>
>>>>>>>>> what facility and severity do the immark messages show up as?
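One way to answer this question on a live system, sketched under assumptions: a temporary catch-all action that uses the built-in `RSYSLOG_DebugFormat` template (the one David recommends in his reply); the output file path is an arbitrary example. That template prints the PRI value, which encodes facility * 8 + severity. With kern = 0, syslog = 5 and info = 6, a MARK logged as kern.info shows PRI 0 * 8 + 6 = 6, while syslog.info (what Rainer found in the source) would show PRI 5 * 8 + 6 = 46.

```
# Temporary debugging aid (hypothetical file path): log every message with
# all of its properties, including PRI, for a few minutes, then remove
# this line again.
action(type="omfile" file="/var/log/rsyslog-debug.log" template="RSYSLOG_DebugFormat")
```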
>>>>>>>>>
>>>>>>>>> immark just generates messages; normal filtering rules determine where
>>>>>>>>> they are sent, and the transport used (in this case RELP) has no effect
>>>>>>>>> on whether they are sent or not, it's all in the filters.
>>>>>>>>>
>>>>>>>>
>>>>>>>> thanks, that makes my question a lot more specific. How do I configure
>>>>>>>> immark to use a specific facility?
>>>>>>>>
>>>>>>>
>>>>>>> I don't think you do. I think they are using the syslog or kernel
>>>>>>> facility, but I'd have to set up a quick test to check. I'll try to do
>>>>>>> it tonight if I can, but since you are seeing the messages locally, log
>>>>>>> with RSYSLOG_DebugFormat for a couple of minutes and look at what they
>>>>>>> are logged as.
>>>>>>>
>>>>>>
>>>>>> It's syslog.=info:
>>>>>>
>>>>>> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=plugins/immark/immark.c;h=0e946c0b92d555174b38de42dd236ac4432b98e7;hb=HEAD#l196
>>>>>>
>>>>>>>
>>>>>>>> All I found when searching is this:
>>>>>>>>
>>>>>>>> $ModLoad immark.so
>>>>>>>> $MarkMessagePeriod 60
>>>>>>>>
>>>>>>>> which is what I have in my config.
>>>>>>>>
>>>>>>>> Given that I see the --MARK-- messages in /var/log/syslog and
>>>>>>>> /var/log/kern.log I guess they are going to the kern facility. Given
>>>>>>>> the config below I need to use e.g. the local0 facility.
>>>>>>>>
>>>>>>>
>>>>>>> no, you need to change your filtering config to send these messages,
>>>>>>> not try to change the messages to match your current config.
>>>>>>>
>>>>>>
>>>>>> You actually can't. I considered mark a legacy feature and have not
>>>>>> enhanced it in 8 years.
>>>>>>
>>>>>> Keepalive is the better option. librelp is not yet built due to the
>>>>>> current workload. The code right now is actually on github only, as I
>>>>>> have some problems with the Adiscon repo.
>>>>>> Easy to clone from here:
>>>>>>
>>>>>> https://github.com/rgerhards/librelp
>>>>>>
>>>>>>> messages have the facility that they have; you don't change the
>>>>>>> facility any more than you re-write the message to say something
>>>>>>> different.
>>>>>>>
>>>>>>
>>>>>> Actually, in this case a config option would make sense. But again, I
>>>>>> thought this is just legacy...
>>>>>>
>>>>>> Rainer
>>>>>>
>>>>>>>
>>>>>>> David Lang
>>>>>>>
>>>>>>>> Unfortunately I can't find anything related to --MARK-- and
>>>>>>>> facilities (or anything else other than the two settings above).
>>>>>>>>
>>>>>>>> Any ideas/pointers? Or if it's not possible to configure immark, can I
>>>>>>>> catch the --MARK-- message and change its facility? Or catch the
>>>>>>>> --MARK-- message and have an action that uses omrelp and the same
>>>>>>>> target (would that use the same TCP connection)?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> erik
>>>>>>>>
>>>>>>>>
>>>>>>>>> David Lang
>>>>>>>>>
>>>>>>>>> On Thu, 12 Dec 2013, Erik Steffl wrote:
>>>>>>>>>
>>>>>>>>>> Date: Thu, 12 Dec 2013 02:30:52 -0800
>>>>>>>>>> From: Erik Steffl <[email protected]>
>>>>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>>>>> To: rsyslog-users <[email protected]>
>>>>>>>>>> Subject: [rsyslog] immark - how to use with action(...)
>>>>>>>>>>
>>>>>>>>>> How would I use immark to send mark messages for defined actions
>>>>>>>>>> that use omrelp?
>>>>>>>>>>
>>>>>>>>>> I have tried something like this:
>>>>>>>>>>
>>>>>>>>>> $ModLoad immark.so
>>>>>>>>>> $MarkMessagePeriod 60
>>>>>>>>>>
>>>>>>>>>> if(..)
>>>>>>>>>> if
>>>>>>>>>>     prifilt("local0.*") or
>>>>>>>>>>     prifilt("local1.*") or
>>>>>>>>>>     prifilt("local2.*") or
>>>>>>>>>>     prifilt("local3.*") or
>>>>>>>>>>     prifilt("local4.*") or
>>>>>>>>>>     prifilt("local5.*") or
>>>>>>>>>>     prifilt("local6.*") or
>>>>>>>>>>     prifilt("local7.*")
>>>>>>>>>> then {
>>>>>>>>>>     action(type="mmjsonparse")
>>>>>>>>>>     if $parsesuccess == "OK" then {
>>>>>>>>>>         action(
>>>>>>>>>>             type="omrelp"
>>>>>>>>>>             target="someHost"
>>>>>>>>>>             port="5140"
>>>>>>>>>>             template="json"
>>>>>>>>>>             # see http://www.rsyslog.com/doc/node32.html
>>>>>>>>>>             # disk used if forwarding blocked
>>>>>>>>>>             queue.filename="json"
>>>>>>>>>>             queue.maxdiskspace="75161927680" # 70GB (valuable data)
>>>>>>>>>>             action.writeAllMarkMessages="on"
>>>>>>>>>>         )
>>>>>>>>>>     } else {
>>>>>>>>>>         ...
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> I see --MARK-- messages in /var/log/syslog and /var/log/kern.log but
>>>>>>>>>> they are not sent by the omrelp action (the action works fine, normal
>>>>>>>>>> messages are going through).
>>>>>>>>>>
>>>>>>>>>> Verified where the --MARK-- messages are going using strace, so I'm
>>>>>>>>>> pretty sure they are only going to those two local files; nothing goes
>>>>>>>>>> over RELP. Also checked on the receiving side of RELP: no incoming
>>>>>>>>>> messages have --MARK-- in them. And the connection goes down, which is
>>>>>>>>>> also a very strong indicator that there are no --MARK-- messages.
>>>>>>>>>>
>>>>>>>>>> How do I configure it so that the --MARK-- messages are sent over the
>>>>>>>>>> RELP protocol to someHost (over the same TCP connection that the given
>>>>>>>>>> action uses)? The purpose is to keep the connection alive, since RELP
>>>>>>>>>> does not support KeepAlive (yet; Rainer just added it to master,
>>>>>>>>>> thanks!).
>>>>>>>>>>
>>>>>>>>>> This is on Ubuntu 13.10 using rsyslog 7.5.6, librelp 1.2.0 from the
>>>>>>>>>> adiscon repo.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> erik
>>>>>>>>>> _______________________________________________
>>>>>>>>>> rsyslog mailing list
>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>>>>>> POST if you DON'T LIKE THAT.

