I have checked the debug log that was on the probably related bug tracker at:
http://bugzilla.adiscon.com/show_bug.cgi?id=208 There, it looks like the client librelp version causes problems (just a hypothesis so far). That brings me to: which version of librelp do you use on client and server? Rainer On Tue, Nov 19, 2013 at 2:45 PM, Rainer Gerhards <[email protected]>wrote: > OK, I am slowly catching up and can most probably look into this problem > now. > > Erik, I think I need to create an instrumented version in any case so that > we can get a bit more insight. Are you able to deploy such a version from > git or source tarball? Also, I will need at least a partial debug log, from > before things go wrong up until the situation recovers. If the debug log > get's to large, "debug on demand" (turn on via signal) will probably be > helpful. > > Could you assist with these things? > > Thx, > Rainer > > > On Tue, Nov 19, 2013 at 9:08 AM, David Lang <[email protected]> wrote: > >> On Mon, 18 Nov 2013, Erik Steffl wrote: >> >> On 11/18/2013 08:06 PM, David Lang wrote: >>> >>>> could you run impstats with a rapid reporting cycle while you duplicate >>>> the problem? >>>> >>> >>> run impstats how? I can add it, any suggested configuration? Just log >>> into a file every minute? >>> >> >> yes, log into a file every minute or so. We may end up shrinking this to >> a faster interval >> >> When you log as fast as you can, I expect that the queue will grow, and >> then take a little bit of time to drain off. If it gets stuck while >> draining, the pstats output will show us. >> >> David Lang >> >> >> With a rapid trigger like this, a new debug log may be helpful with less >>>> 'stray noise' in it. >>>> >>> >>> will try to set that up, at least it will be easier to get close to the >>> time the problem starts (sometime between 1st and 100th burst message) >>> >>> Speculation: >>>> >>>> My guess is that something is getting 'stuck' processing the last >>>> message of the third batch, and this is stalling the system (probably >>>> something is holding a lock that's preventing RELP from finishing >>>> processing messages, which is causing the senders to pause until RELP >>>> finally acknowledges the sent messages) >>>> >>>> knowing if all the messages from the script make it to the log or if >>>> some of them don't make it until things startup again would be useful >>>> data. This may involve creating a custom output template to add extra >>>> time info (received time, now, etc) to the output line and see what >>>> happens. >>>> >>> >>> sounds reasonable, will try to figure out if that's happening, >>> >>> thanks! >>> >>> erik >>> >>> >>>> David Lang >>>> >>>> On Mon, 18 Nov 2013, Erik Steffl wrote: >>>> >>>> figured out more specific scenario that is causing the silence. >>>>> >>>>> to recap this is the current setup: >>>>> >>>>> - 6 machines sending logs to collector machine using RELP >>>>> >>>>> - 1 machine collecting all the logs on disk (20-30 files at a time, >>>>> split by origin etc.) >>>>> >>>>> There is fairly steady traffic of 200-300 messages per second, 200 - >>>>> 300 kB/s. >>>>> >>>>> In this scenario if I run a program that logs 100 messages as fast as >>>>> possible the whole system goes silent (every THIRD time this happens), >>>>> all six senders stop sending messages (or almost stop, the traffic >>>>> goes down to maybe 1kB/s) and doesn't start again until I run another >>>>> burst of messages (which is equally weird, why would it start again >>>>> after next burst of messages?). >>>>> >>>>> The program that sends 100 messages is running on the log collector >>>>> host (not sure if it's relevant, likely not). >>>>> >>>>> This produces the silences virtually 100% (every third time it runs), >>>>> as long as I cron the 100-messages-log-sender the problem is there, if >>>>> I comment it out it works fine (tested both number of times, sometime >>>>> left it running for days). >>>>> >>>>> Previously I thought this might be related to renaming of the files >>>>> or reload rsyslog (HUP signal) but that is completely irrelevant. >>>>> Regardless of renaming or HUP signal it's the burst of messages that >>>>> makes it go silent (and somehow signaling to all senders to go silent, >>>>> perhaps by not sending RELP confirmations?). >>>>> >>>>> The program that logs burst of 100 messages is a perl script that >>>>> does essentially this (some irrelevant parts left out, think it would >>>>> run as is but it might need some fixes): >>>>> >>>>> use strict; >>>>> use Log::Log4perl; >>>>> >>>>> my $logConfig = ' >>>>> log4perl.rootLogger=DEBUG, SYSLOG >>>>> log4perl.appender.SYSLOG = Log::Dispatch::Syslog >>>>> log4perl.appender.SYSLOG.min_level = debug >>>>> log4perl.appender.SYSLOG.ident = myTag >>>>> log4perl.appender.SYSLOG.facility = local0 >>>>> log4perl.appender.SYSLOG.layout = Log::Log4perl::Layout:: >>>>> PatternLayout >>>>> log4perl.appender.SYSLOG.layout.ConversionPattern=@cee:%m >>>>> '; >>>>> >>>>> Log::Log4perl::init(\$logConfig); >>>>> my $l = Log::Log4perl::get_logger(); >>>>> my $now = localtime(); >>>>> >>>>> for(my $i = 0; $i < 100; $i++) { >>>>> $l->info(JSON::encode_json({now=>$now,i=>$i})); >>>>> } >>>>> >>>>> it needs liblog-log4perl-perl and liblog-dispatch-perl ubuntu >>>>> packages (or equivalent on other distros). >>>>> >>>>> This does not seem to be related to rate-limiting (as documented) >>>>> since there are no dropped messages, didn't find any rsyslog messages >>>>> about dropping messages due to rate limiting (grep'd all files in >>>>> /var/log/). Could this be related to Queue slowdown? We do not set >>>>> dequeueslowdown for any queues. >>>>> >>>>> Any ideas why would a short burst of 100 messages cause this problem? >>>>> >>>>> thanks! >>>>> >>>>> erik >>>>> >>>>> On 11/14/2013 01:53 AM, Rainer Gerhards wrote: >>>>> >>>>>> On Thu, Nov 14, 2013 at 6:58 AM, Erik Steffl <[email protected]> wrote: >>>>>> >>>>>> did you have a chance to look at debug log? >>>>>>> >>>>>>> >>>>>>> not yet, but I am finalizing the 8.1.0 release. Bear with me another >>>>>> couple >>>>>> of days pls. >>>>>> Rainer >>>>>> >>>>>> >>>>>> did more testing in the meantime, figured out that: >>>>>>> >>>>>>> 6 senders, 1 collector version 7.5.4, 300 kB/s always works >>>>>>> >>>>>>> 6 senders, 1 collector versions 7.5.5, 7.5.6 300 kB/s do not work >>>>>>> >>>>>>> 6 senders, 2 collectors versions 7.5.5, 7.5.6 300 kB/s mostly >>>>>>> work but >>>>>>> not always (periods of silence are much more predicable if the re is >>>>>>> no >>>>>>> reload/HUP signal) >>>>>>> >>>>>>> 1 sender, one collector any version always works (this has less >>>>>>> files >>>>>>> open plusthe total traffic is around 100lB/s, not sure if that's >>>>>>> siginificant) >>>>>>> >>>>>>> Looked at strace on both sides (sender and receiver) during time >>>>>>> when >>>>>>> traffic went from 300kB/s down to virtually zero but see nothing >>>>>>> suspicious, essentially there is lot of messages sent, then one >>>>>>> message >>>>>>> confirming all of them. Then at some point there is timeout, then few >>>>>>> messages are confirmed one by one then nothing, then it starts again. >>>>>>> >>>>>>> As far as I can tell it's the rename of the files that causes the >>>>>>> problem (we rename athen remove them but if I comment out the remove >>>>>>> part >>>>>>> the problem is still there). We never rename the files that rsyslogd >>>>>>> is >>>>>>> actually writing to (even though they are still open). >>>>>>> >>>>>>> Given the behaviour in the scenarios above it seems something got >>>>>>> broken >>>>>>> between 7.5.4 and 7.5.5. Did the code that handles renamed files >>>>>>> change in >>>>>>> any way? I looked at changelog (http://www.rsyslog.com/ >>>>>>> changelog-for-7-5-5-v7-devel/) before but didn't see anything that >>>>>>> looks >>>>>>> related. >>>>>>> >>>>>>> Any ideas how to troubleshoot this further? >>>>>>> >>>>>>> thanks! >>>>>>> >>>>>>> erik >>>>>>> >>>>>>> >>>>>>> On 11/05/2013 10:04 AM, Rainer Gerhards wrote: >>>>>>> >>>>>>> On Tue, Nov 5, 2013 at 3:41 AM, Erik Steffl <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>> See the debug log attached, the first line is the last message >>>>>>>> in one >>>>>>>> >>>>>>>>> of >>>>>>>>> the files (maybe not last message overall, but all communication >>>>>>>>> stops at >>>>>>>>> that second, there's number of files so it's hard to figure out >>>>>>>>> which >>>>>>>>> message is the last one). >>>>>>>>> >>>>>>>>> Also noticed that number of messages in log with same number >>>>>>>>> at the >>>>>>>>> beginning drops right after this message, I assume the beginning >>>>>>>>> of the >>>>>>>>> line has something to do with time, these are the counts and >>>>>>>>> values at >>>>>>>>> the >>>>>>>>> beginning of the debug log line: >>>>>>>>> >>>>>>>>> >>>>>>>>> It's the last 5 digits of a Unix timestamp - so you don't get an >>>>>>>> absolute >>>>>>>> time, but an idea how things progress. After the dot I think there >>>>>>>> are ms >>>>>>>> (or so) and after the colon there is a thread ID (same ID = same >>>>>>>> thread). >>>>>>>> >>>>>>>> Hope that helps a bit, will try to look at the debug log tomorrow >>>>>>>> morning. >>>>>>>> >>>>>>>> Rainer >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 21264 9547. >>>>>>>>> >>>>>>>>> >>>>>>>> 22378 9548. >>>>>>>> >>>>>>>> 2725 9549. >>>>>>>>> 1877 9550. >>>>>>>>> ...9551 to 9969 left out, low counts around 2000 >>>>>>>>> ...9970 to 0519 left out, low counts around 6 >>>>>>>>> 6 0520. >>>>>>>>> 6 0521. >>>>>>>>> 39302 0522. >>>>>>>>> 51997 0523. >>>>>>>>> >>>>>>>>> thanks! >>>>>>>>> >>>>>>>>> erik >>>>>>>>> >>>>>>>>> >>>>>>>>> On 11/03/2013 01:05 PM, David Lang wrote: >>>>>>>>> >>>>>>>>> I would post everything in the log file starting with the last >>>>>>>>> log that >>>>>>>>> >>>>>>>>>> it successfully sent and continuing a while after that message (a >>>>>>>>>> while >>>>>>>>>> being a couple hundred lines or so to be safe) >>>>>>>>>> >>>>>>>>>> once we look at the log we may ask for more. >>>>>>>>>> >>>>>>>>>> David Lang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sun, 3 Nov 2013, Erik Steffl wrote: >>>>>>>>>> >>>>>>>>>> Date: Sun, 03 Nov 2013 11:45:03 -0800 >>>>>>>>>> >>>>>>>>>> From: Erik Steffl <[email protected]> >>>>>>>>>>> Reply-To: rsyslog-users <[email protected]> >>>>>>>>>>> To: rsyslog-users <[email protected]> >>>>>>>>>>> Subject: Re: [rsyslog] Rsyslog with RELP not sending/receiving >>>>>>>>>>> messages for >>>>>>>>>>> long intervals >>>>>>>>>>> >>>>>>>>>>> I have debug log where I was able to find last message in one >>>>>>>>>>> of the >>>>>>>>>>> logs files (right before the silence period) but it's 3 GB :) >>>>>>>>>>> Any hint >>>>>>>>>>> how much to pick the relevant part of the debug log? Is the >>>>>>>>>>> timestamp >>>>>>>>>>> part of the log message? There is something in the beginning of >>>>>>>>>>> each >>>>>>>>>>> debug log line but I don't know how to translate it to time >>>>>>>>>>> (part of >>>>>>>>>>> it is transaction id, the rest not sure about) >>>>>>>>>>> >>>>>>>>>>> thanks! (and of course I unserstand you can't drop everything >>>>>>>>>>> and >>>>>>>>>>> work on my problem, thanks for answers!) >>>>>>>>>>> >>>>>>>>>>> erik >>>>>>>>>>> >>>>>>>>>>> On 11/02/2013 08:51 AM, Rainer Gerhards wrote: >>>>>>>>>>> >>>>>>>>>>> On Fri, Nov 1, 2013 at 11:31 PM, Erik Steffl <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> running reload every 43 seconds, behaviour is exactly the >>>>>>>>>>>> same >>>>>>>>>>>> >>>>>>>>>>>> (30 min >>>>>>>>>>>>> traffic, 15 min silence) so I think it's not related to reload. >>>>>>>>>>>>> >>>>>>>>>>>>> will test the latest version and I guess re-test the older >>>>>>>>>>>>> version and >>>>>>>>>>>>> go from there... >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It would probably be a good idea to try to capture a debug >>>>>>>>>>>>> log so >>>>>>>>>>>>> >>>>>>>>>>>>> that we >>>>>>>>>>>> can see what rsyslog itself thinks what happens. Note that you >>>>>>>>>>>> can >>>>>>>>>>>> also >>>>>>>>>>>> just enable it shortly before you expect the problem to happen >>>>>>>>>>>> (google for >>>>>>>>>>>> "debug on demand"). >>>>>>>>>>>> >>>>>>>>>>>> Right now, I am totally busy with complex work and cannot look >>>>>>>>>>>> at it. >>>>>>>>>>>> If >>>>>>>>>>>> this is for a business, I would strongly suggest to purchase >>>>>>>>>>>> one of >>>>>>>>>>>> the >>>>>>>>>>>> low-priced support options. That would enable an engineer to >>>>>>>>>>>> begin >>>>>>>>>>>> working >>>>>>>>>>>> on the case within half a day or so. I would *really love* to >>>>>>>>>>>> drop >>>>>>>>>>>> this to >>>>>>>>>>>> someone else, as I really don't like the smell of this bug -- >>>>>>>>>>>> but if >>>>>>>>>>>> I do, >>>>>>>>>>>> I am pretty sure other bugs will creep up, and that would again >>>>>>>>>>>> be the >>>>>>>>>>>> death of the new work I am doing (and it took me 3 years or so >>>>>>>>>>>> to get >>>>>>>>>>>> to >>>>>>>>>>>> this point!). >>>>>>>>>>>> >>>>>>>>>>>> So it looks like non-paid work needs to stay focused on that >>>>>>>>>>>> work, >>>>>>>>>>>> which >>>>>>>>>>>> will be useful for all users in the hopefully not so distant >>>>>>>>>>>> future. >>>>>>>>>>>> All >>>>>>>>>>>> non-emergency bugs (usually security related things) or request >>>>>>>>>>>> thus >>>>>>>>>>>> need >>>>>>>>>>>> to be on hold for a couple of days or maybe till the end of >>>>>>>>>>>> month. >>>>>>>>>>>> >>>>>>>>>>>> If you get a debug log, I will have time to at least look >>>>>>>>>>>> quickly at >>>>>>>>>>>> it, >>>>>>>>>>>> but I can't say if that will be sufficient to provide a >>>>>>>>>>>> solution. >>>>>>>>>>>> >>>>>>>>>>>> sorry for that and I hope for your understanding. >>>>>>>>>>>> >>>>>>>>>>>> Rainer >>>>>>>>>>>> >>>>>>>>>>>> thanks! >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> erik >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 11/01/2013 02:29 PM, David Lang wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, this is definantly sounding like a bug in RELP, it >>>>>>>>>>>>> worked on an >>>>>>>>>>>>> >>>>>>>>>>>>> earlier version, check if it works on the latest version, and >>>>>>>>>>>>>> then I >>>>>>>>>>>>>> would suggest fileing a bug on the www.rsyslog.com bugtracker >>>>>>>>>>>>>> >>>>>>>>>>>>>> This sounds like something Rainer is going to need to look >>>>>>>>>>>>>> at. He's >>>>>>>>>>>>>> in >>>>>>>>>>>>>> Germany, so I believe his weekend has started, he may see it >>>>>>>>>>>>>> over >>>>>>>>>>>>>> the >>>>>>>>>>>>>> weekend or it may wait until his work week starts. >>>>>>>>>>>>>> >>>>>>>>>>>>>> David Lang >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 1 Nov 2013, Erik Steffl wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> pretty sure reload is sending HUP (that's upstart job as >>>>>>>>>>>>>> inlcluded in >>>>>>>>>>>>>> >>>>>>>>>>>>>> adiscon packages), the rsyslog keeps the same PID, I even >>>>>>>>>>>>>> tried to >>>>>>>>>>>>>> >>>>>>>>>>>>>>> strace it while running reload, here's syslog message: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Nov 1 10:47:25 ip-10-238-198-103 rsyslogd: [origin >>>>>>>>>>>>>>> software="rsyslogd" swVersion="7.5.5" x-pid="3865" >>>>>>>>>>>>>>> x-info="http://www.rsyslog.com"] rsyslogd was HUPed >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> but I think it must be related to something else cause >>>>>>>>>>>>>>> when I >>>>>>>>>>>>>>> removed >>>>>>>>>>>>>>> the reload (so the log mover was doing only rename, upload, >>>>>>>>>>>>>>> remove) it >>>>>>>>>>>>>>> was still exhibiting the same works for 30 min, silent for >>>>>>>>>>>>>>> 15 min >>>>>>>>>>>>>>> etc. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> running reload every 43 seconds now and will see what >>>>>>>>>>>>>>> happens... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> erik >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 11/01/2013 12:57 PM, David Lang wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> a HUP of the server should not cause any interruption in >>>>>>>>>>>>>>> processing >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> messages. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> are you sure upstart is seing HUP not doing a full >>>>>>>>>>>>>>>> stop/start? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> could you try sending HUP to rsyslog manually a few times >>>>>>>>>>>>>>>> to see >>>>>>>>>>>>>>>> if you >>>>>>>>>>>>>>>> can duplicate the problem? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> David Lang >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, 1 Nov 2013, Erik Steffl wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> sorry, it wasn't entirely clear in my previous email >>>>>>>>>>>>>>>> but we >>>>>>>>>>>>>>>> are doing >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ubuntu upstart reload which is sending HUP signal, pretty >>>>>>>>>>>>>>>> sure >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> it's >>>>>>>>>>>>>>>>> doing the right thing since logrotate scripts use it, >>>>>>>>>>>>>>>>> syslog >>>>>>>>>>>>>>>>> message >>>>>>>>>>>>>>>>> says rsyslog received HUP signal, rsyslogd keeps the same >>>>>>>>>>>>>>>>> PID >>>>>>>>>>>>>>>>> and all >>>>>>>>>>>>>>>>> files are closed (as verified using lsof). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On top of that if, during one of these quiet periods, >>>>>>>>>>>>>>>>> I try to >>>>>>>>>>>>>>>>> send >>>>>>>>>>>>>>>>> messages from another client (on which I just restarted >>>>>>>>>>>>>>>>> rsyslog >>>>>>>>>>>>>>>>> so it >>>>>>>>>>>>>>>>> has no error counters etc.) I think it's also not sending >>>>>>>>>>>>>>>>> anything, >>>>>>>>>>>>>>>>> when stracing it (the sender) I see it recieved messages >>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>> logger >>>>>>>>>>>>>>>>> but is not sending them anywhere. So I think that the >>>>>>>>>>>>>>>>> collector >>>>>>>>>>>>>>>>> rsyslog continues to send some kind of signal that it's not >>>>>>>>>>>>>>>>> ready but >>>>>>>>>>>>>>>>> not sure why or how to figure out whether it actually does >>>>>>>>>>>>>>>>> it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Any ideas how to troubleshoot this? I think I'll >>>>>>>>>>>>>>>>> record more >>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> strace output and try to reconstruct RELP messages, I guess >>>>>>>>>>>>>>>>> there is >>>>>>>>>>>>>>>>> some RELP response that has something else than RSP-CODE = >>>>>>>>>>>>>>>>> 200 >>>>>>>>>>>>>>>>> (looking at http://www.rsyslog.com/doc/relp.html) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was just about to publish the S3 upload scripts but >>>>>>>>>>>>>>>>> since >>>>>>>>>>>>>>>>> they >>>>>>>>>>>>>>>>> cause rsyslog to misbehave this way I didn't do it yet, >>>>>>>>>>>>>>>>> maybe >>>>>>>>>>>>>>>>> I'll >>>>>>>>>>>>>>>>> post them as they are and fix later... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> erik >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 11/01/2013 01:02 AM, David Lang wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> the purpose of RELP is to tell the senders if the >>>>>>>>>>>>>>>>> receiver was >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> able to >>>>>>>>>>>>>>>>>> accept the message or not, so that's working as designed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> When you do a full restart of rsyslog, it stops >>>>>>>>>>>>>>>>>> processing new >>>>>>>>>>>>>>>>>> messages >>>>>>>>>>>>>>>>>> and disconnects all senders (you are doing a full stop of >>>>>>>>>>>>>>>>>> rsyslog and >>>>>>>>>>>>>>>>>> then starting it from scratch, this takes time). My guess >>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>> that the >>>>>>>>>>>>>>>>>> senders are detecting 'too many failures' when trying to >>>>>>>>>>>>>>>>>> send >>>>>>>>>>>>>>>>>> messages, >>>>>>>>>>>>>>>>>> so they are backing off and sleeping for a while rather >>>>>>>>>>>>>>>>>> tthan >>>>>>>>>>>>>>>>>> performing >>>>>>>>>>>>>>>>>> a mini DOS attack on the network and server. Every third >>>>>>>>>>>>>>>>>> restart is >>>>>>>>>>>>>>>>>> probably triggering a threshold. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Instead of doing a full restart, just send the rsyslog >>>>>>>>>>>>>>>>>> daemon a >>>>>>>>>>>>>>>>>> HUP >>>>>>>>>>>>>>>>>> signal instead. That will tell rsyslog to flush and close >>>>>>>>>>>>>>>>>> all >>>>>>>>>>>>>>>>>> files so >>>>>>>>>>>>>>>>>> that you can rotate them (If you are doing compression on >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> files, you >>>>>>>>>>>>>>>>>> may need to sleep for a few seconds to let the fluch >>>>>>>>>>>>>>>>>> complete). >>>>>>>>>>>>>>>>>> Rsyslog >>>>>>>>>>>>>>>>>> can continue to receive new messages during this time, so >>>>>>>>>>>>>>>>>> your >>>>>>>>>>>>>>>>>> senders >>>>>>>>>>>>>>>>>> will not see an outage. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> By the way, I'm needing a script to upload rsyslog >>>>>>>>>>>>>>>>>> archives to >>>>>>>>>>>>>>>>>> S3, >>>>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>>>> you send me a copy of yours? (remove any passwords first >>>>>>>>>>>>>>>>>> please >>>>>>>>>>>>>>>>>> :-) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> David Lang >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Thu, 31 Oct 2013, Erik Steffl wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Date: Thu, 31 Oct 2013 18:44:23 -0700 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> From: Erik Steffl <[email protected]> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Reply-To: rsyslog-users <[email protected]> >>>>>>>>>>>>>>>>>>> To: rsyslog-users <[email protected]> >>>>>>>>>>>>>>>>>>> Subject: [rsyslog] Rsyslog with RELP not >>>>>>>>>>>>>>>>>>> sending/receiving >>>>>>>>>>>>>>>>>>> messages >>>>>>>>>>>>>>>>>>> for long >>>>>>>>>>>>>>>>>>> intervals >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We have a fairly simple setup of 6 hosts sending syslog >>>>>>>>>>>>>>>>>>> messages to >>>>>>>>>>>>>>>>>>> one collector host, all of these run rsyslog >>>>>>>>>>>>>>>>>>> 7.5.5-0adiscon2 >>>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>> adiscon repo and use RELP to transfer messages. There is >>>>>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>>>>> load >>>>>>>>>>>>>>>>>>> balancer in front of the collector machine but I dont' >>>>>>>>>>>>>>>>>>> think it >>>>>>>>>>>>>>>>>>> matters in this case. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Rsyslog on collector machine is configured to write to >>>>>>>>>>>>>>>>>>> files, >>>>>>>>>>>>>>>>>>> switching to a new file every 15 minute, using config >>>>>>>>>>>>>>>>>>> like this >>>>>>>>>>>>>>>>>>> (abbreviated a bit): >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> template(name="jsonFilename" type="list") { >>>>>>>>>>>>>>>>>>> constant(value="/path/") >>>>>>>>>>>>>>>>>>> property(name="$now") >>>>>>>>>>>>>>>>>>> constant(value="/") >>>>>>>>>>>>>>>>>>> property(name="$hour") >>>>>>>>>>>>>>>>>>> constant(value="/") >>>>>>>>>>>>>>>>>>> property(name="$qhour") >>>>>>>>>>>>>>>>>>> constant(value="/") >>>>>>>>>>>>>>>>>>> constant(value="log.json") >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> action(type="omfile" DynaFile="jsonFilename" >>>>>>>>>>>>>>>>>>> Template="jsonFormat") >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We run a script at every 2, 17, 32, 47 minute of the >>>>>>>>>>>>>>>>>>> hour >>>>>>>>>>>>>>>>>>> and upload >>>>>>>>>>>>>>>>>>> the just finished file to S3. The uploading works like >>>>>>>>>>>>>>>>>>> this: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - let's say it's 3:02:00, rsyslog is writing to >>>>>>>>>>>>>>>>>>> /path/2013-10-10/03/00/log.json >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - get the filename log.json (anything that's not >>>>>>>>>>>>>>>>>>> current, >>>>>>>>>>>>>>>>>>> usually >>>>>>>>>>>>>>>>>>> just one previous file which in the example would be >>>>>>>>>>>>>>>>>>> /path/2013-10-10/02/03/log.json) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - rename /path/2013-10-10/02/03/log.json to >>>>>>>>>>>>>>>>>>> /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - reload rsyslog (to make sure that even if for some >>>>>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>>>>> it was >>>>>>>>>>>>>>>>>>> writing to just renamed file it would close it and open >>>>>>>>>>>>>>>>>>> a new >>>>>>>>>>>>>>>>>>> file) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - upload /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>>>>>>>>>>>>>> to S3 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - remove /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Here's what happens every third run (yes, regularly EVERY >>>>>>>>>>>>>>>>>>> THIRD RUN) >>>>>>>>>>>>>>>>>>> of this script: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - rsyslog stops writing to the CURRENT file >>>>>>>>>>>>>>>>>>> (/path/2013-10-10/03/00/log.json, the one that is NOT >>>>>>>>>>>>>>>>>>> being >>>>>>>>>>>>>>>>>>> renamed) >>>>>>>>>>>>>>>>>>> few seconds into the run of the script (e.g. 3:02:04) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - 6 hosts that were sending syslog messages to the >>>>>>>>>>>>>>>>>>> log >>>>>>>>>>>>>>>>>>> collector >>>>>>>>>>>>>>>>>>> STOP >>>>>>>>>>>>>>>>>>> sending anything (as verified by stracing rsyslogd, >>>>>>>>>>>>>>>>>>> tcpdump >>>>>>>>>>>>>>>>>>> and in >>>>>>>>>>>>>>>>>>> amazon AWS console metric for network in) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - after this nothing is ever written into >>>>>>>>>>>>>>>>>>> /path/2013-10-10/03/00/log.json >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - the 6 clients start sending sysog messages again >>>>>>>>>>>>>>>>>>> when the >>>>>>>>>>>>>>>>>>> next >>>>>>>>>>>>>>>>>>> file >>>>>>>>>>>>>>>>>>> is created (in this example it would be >>>>>>>>>>>>>>>>>>> /path/2013-10-10/03/01/log.json) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I checked and double check the files, dates, >>>>>>>>>>>>>>>>>>> verified that >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> current file is not touched but can't figure out what's >>>>>>>>>>>>>>>>>>> going >>>>>>>>>>>>>>>>>>> on. I >>>>>>>>>>>>>>>>>>> tried the script without reload rsyslog but it didn't >>>>>>>>>>>>>>>>>>> make any >>>>>>>>>>>>>>>>>>> difference. If I don't run this script rsyslog works >>>>>>>>>>>>>>>>>>> flawlessly. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> any ideas how to troubleshoot this? What could be >>>>>>>>>>>>>>>>>>> causing >>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> rsyslog >>>>>>>>>>>>>>>>>>> to stop writing to the file and for the senders to stop >>>>>>>>>>>>>>>>>>> sending >>>>>>>>>>>>>>>>>>> syslog >>>>>>>>>>>>>>>>>>> messages? I assume the rsyslog on the collector host >>>>>>>>>>>>>>>>>>> somehow >>>>>>>>>>>>>>>>>>> signals >>>>>>>>>>>>>>>>>>> to the 6 hosts that send messages that it's not ready or >>>>>>>>>>>>>>>>>>> something... >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> thanks! >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> erik >>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>>>>>>>> What's up with rsyslog? Follow >>>>>>>>>>>>>>>>>>> https://twitter.com/rgerhards >>>>>>>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are >>>>>>>>>>>>>>>>>>> ARCHIVED >>>>>>>>>>>>>>>>>>> by a >>>>>>>>>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE >>>>>>>>>>>>>>>>>>> and DO >>>>>>>>>>>>>>>>>>> NOT >>>>>>>>>>>>>>>>>>> POST >>>>>>>>>>>>>>>>>>> if you DON'T LIKE THAT. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>>>>>>> What's up with rsyslog? Follow >>>>>>>>>>>>>>>>>> https://twitter.com/rgerhards >>>>>>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are >>>>>>>>>>>>>>>>>> ARCHIVED by >>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>> myriad >>>>>>>>>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO >>>>>>>>>>>>>>>>>> NOT POST >>>>>>>>>>>>>>>>>> if you >>>>>>>>>>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>>>>>> What's up with rsyslog? Follow >>>>>>>>>>>>>>>>> https://twitter.com/rgerhards >>>>>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are >>>>>>>>>>>>>>>>> ARCHIVED by a >>>>>>>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and >>>>>>>>>>>>>>>>> DO >>>>>>>>>>>>>>>>> NOT POST >>>>>>>>>>>>>>>>> if you DON'T LIKE THAT. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>>>>> What's up with rsyslog? Follow >>>>>>>>>>>>>>>> https://twitter.com/rgerhards >>>>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are >>>>>>>>>>>>>>>> ARCHIVED by a >>>>>>>>>>>>>>>> myriad >>>>>>>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>>>>>>>>>>>> POST >>>>>>>>>>>>>>>> if you >>>>>>>>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED >>>>>>>>>>>>>>> by a >>>>>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and >>>>>>>>>>>>>>> DO NOT >>>>>>>>>>>>>>> POST >>>>>>>>>>>>>>> if you DON'T LIKE THAT. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>>>> >>>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED >>>>>>>>>>>>>> by a >>>>>>>>>>>>>> myriad >>>>>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>>>>>>>>>> POST if >>>>>>>>>>>>>> you >>>>>>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> >>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED >>>>>>>>>>>>> by a >>>>>>>>>>>>> myriad >>>>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>>>>>>>>> POST if >>>>>>>>>>>>> you >>>>>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> >>>>>>>>>>>>> rsyslog mailing list >>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by >>>>>>>>>>>> a >>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO >>>>>>>>>>>> NOT >>>>>>>>>>>> POST if you DON'T LIKE THAT. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> >>>>>>>>>>> rsyslog mailing list >>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO >>>>>>>>>>> NOT POST >>>>>>>>>>> if you DON'T LIKE THAT. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> >>>>>>>>>>> rsyslog mailing list >>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>>> myriad >>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>>>>>>>>> if you >>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> rsyslog mailing list >>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>> myriad >>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>>>>>> you >>>>>>>>> DON'T LIKE THAT. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> >>>>>>>> rsyslog mailing list >>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>> myriad >>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>>>>> you >>>>>>>> DON'T LIKE THAT. >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>> rsyslog mailing list >>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>> myriad >>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>>>> you >>>>>>> DON'T LIKE THAT. >>>>>>> >>>>>>> _______________________________________________ >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.com/professional-services/ >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>> POST if you DON'T LIKE THAT. >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com/professional-services/ >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>>>> if you DON'T LIKE THAT. >>>>> >>>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com/professional-services/ >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>> DON'T LIKE THAT. >>>> >>> >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com/professional-services/ >>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>> DON'T LIKE THAT. >>> >>> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ >> What's up with rsyslog? Follow https://twitter.com/rgerhards >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >> DON'T LIKE THAT. >> > > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

