On Tue, Nov 5, 2013 at 3:41 AM, Erik Steffl <[email protected]> wrote: > See the debug log attached, the first line is the last message in one of > the files (maybe not last message overall, but all communication stops at > that second, there's number of files so it's hard to figure out which > message is the last one). > > Also noticed that number of messages in log with same number at the > beginning drops right after this message, I assume the beginning of the > line has something to do with time, these are the counts and values at the > beginning of the debug log line: >
It's the last 5 digits of a Unix timestamp - so you don't get an absolute time, but an idea how things progress. After the dot I think there are ms (or so) and after the colon there is a thread ID (same ID = same thread). Hope that helps a bit, will try to look at the debug log tomorrow morning. Rainer > > 21264 9547. 22378 9548. > 2725 9549. > 1877 9550. > ...9551 to 9969 left out, low counts around 2000 > ...9970 to 0519 left out, low counts around 6 > 6 0520. > 6 0521. > 39302 0522. > 51997 0523. > > thanks! > > erik > > > On 11/03/2013 01:05 PM, David Lang wrote: > >> I would post everything in the log file starting with the last log that >> it successfully sent and continuing a while after that message (a while >> being a couple hundred lines or so to be safe) >> >> once we look at the log we may ask for more. >> >> David Lang >> >> >> On Sun, 3 Nov 2013, Erik Steffl wrote: >> >> Date: Sun, 03 Nov 2013 11:45:03 -0800 >>> From: Erik Steffl <[email protected]> >>> Reply-To: rsyslog-users <[email protected]> >>> To: rsyslog-users <[email protected]> >>> Subject: Re: [rsyslog] Rsyslog with RELP not sending/receiving >>> messages for >>> long intervals >>> >>> I have debug log where I was able to find last message in one of the >>> logs files (right before the silence period) but it's 3 GB :) Any hint >>> how much to pick the relevant part of the debug log? Is the timestamp >>> part of the log message? There is something in the beginning of each >>> debug log line but I don't know how to translate it to time (part of >>> it is transaction id, the rest not sure about) >>> >>> thanks! (and of course I unserstand you can't drop everything and >>> work on my problem, thanks for answers!) >>> >>> erik >>> >>> On 11/02/2013 08:51 AM, Rainer Gerhards wrote: >>> >>>> On Fri, Nov 1, 2013 at 11:31 PM, Erik Steffl <[email protected]> wrote: >>>> >>>> running reload every 43 seconds, behaviour is exactly the same >>>>> (30 min >>>>> traffic, 15 min silence) so I think it's not related to reload. >>>>> >>>>> will test the latest version and I guess re-test the older >>>>> version and >>>>> go from there... >>>>> >>>>> >>>>> It would probably be a good idea to try to capture a debug log so >>>> that we >>>> can see what rsyslog itself thinks what happens. Note that you can also >>>> just enable it shortly before you expect the problem to happen >>>> (google for >>>> "debug on demand"). >>>> >>>> Right now, I am totally busy with complex work and cannot look at it. If >>>> this is for a business, I would strongly suggest to purchase one of the >>>> low-priced support options. That would enable an engineer to begin >>>> working >>>> on the case within half a day or so. I would *really love* to drop >>>> this to >>>> someone else, as I really don't like the smell of this bug -- but if >>>> I do, >>>> I am pretty sure other bugs will creep up, and that would again be the >>>> death of the new work I am doing (and it took me 3 years or so to get to >>>> this point!). >>>> >>>> So it looks like non-paid work needs to stay focused on that work, which >>>> will be useful for all users in the hopefully not so distant future. All >>>> non-emergency bugs (usually security related things) or request thus >>>> need >>>> to be on hold for a couple of days or maybe till the end of month. >>>> >>>> If you get a debug log, I will have time to at least look quickly at it, >>>> but I can't say if that will be sufficient to provide a solution. >>>> >>>> sorry for that and I hope for your understanding. >>>> >>>> Rainer >>>> >>>> thanks! >>>> >>>>> >>>>> erik >>>>> >>>>> >>>>> On 11/01/2013 02:29 PM, David Lang wrote: >>>>> >>>>> Ok, this is definantly sounding like a bug in RELP, it worked on an >>>>>> earlier version, check if it works on the latest version, and then I >>>>>> would suggest fileing a bug on the www.rsyslog.com bugtracker >>>>>> >>>>>> This sounds like something Rainer is going to need to look at. He's in >>>>>> Germany, so I believe his weekend has started, he may see it over the >>>>>> weekend or it may wait until his work week starts. >>>>>> >>>>>> David Lang >>>>>> >>>>>> On Fri, 1 Nov 2013, Erik Steffl wrote: >>>>>> >>>>>> pretty sure reload is sending HUP (that's upstart job as >>>>>> inlcluded in >>>>>> >>>>>>> adiscon packages), the rsyslog keeps the same PID, I even tried to >>>>>>> strace it while running reload, here's syslog message: >>>>>>> >>>>>>> Nov 1 10:47:25 ip-10-238-198-103 rsyslogd: [origin >>>>>>> software="rsyslogd" swVersion="7.5.5" x-pid="3865" >>>>>>> x-info="http://www.rsyslog.com"] rsyslogd was HUPed >>>>>>> >>>>>>> but I think it must be related to something else cause when I >>>>>>> removed >>>>>>> the reload (so the log mover was doing only rename, upload, >>>>>>> remove) it >>>>>>> was still exhibiting the same works for 30 min, silent for 15 min >>>>>>> etc. >>>>>>> >>>>>>> running reload every 43 seconds now and will see what happens... >>>>>>> >>>>>>> erik >>>>>>> >>>>>>> On 11/01/2013 12:57 PM, David Lang wrote: >>>>>>> >>>>>>> a HUP of the server should not cause any interruption in processing >>>>>>>> messages. >>>>>>>> >>>>>>>> are you sure upstart is seing HUP not doing a full stop/start? >>>>>>>> >>>>>>>> could you try sending HUP to rsyslog manually a few times to see >>>>>>>> if you >>>>>>>> can duplicate the problem? >>>>>>>> >>>>>>>> David Lang >>>>>>>> >>>>>>>> On Fri, 1 Nov 2013, Erik Steffl wrote: >>>>>>>> >>>>>>>> sorry, it wasn't entirely clear in my previous email but we >>>>>>>> are doing >>>>>>>> >>>>>>>>> ubuntu upstart reload which is sending HUP signal, pretty sure it's >>>>>>>>> doing the right thing since logrotate scripts use it, syslog >>>>>>>>> message >>>>>>>>> says rsyslog received HUP signal, rsyslogd keeps the same PID >>>>>>>>> and all >>>>>>>>> files are closed (as verified using lsof). >>>>>>>>> >>>>>>>>> On top of that if, during one of these quiet periods, I try to >>>>>>>>> send >>>>>>>>> messages from another client (on which I just restarted rsyslog >>>>>>>>> so it >>>>>>>>> has no error counters etc.) I think it's also not sending anything, >>>>>>>>> when stracing it (the sender) I see it recieved messages from >>>>>>>>> logger >>>>>>>>> but is not sending them anywhere. So I think that the collector >>>>>>>>> rsyslog continues to send some kind of signal that it's not >>>>>>>>> ready but >>>>>>>>> not sure why or how to figure out whether it actually does it. >>>>>>>>> >>>>>>>>> Any ideas how to troubleshoot this? I think I'll record more of >>>>>>>>> strace output and try to reconstruct RELP messages, I guess >>>>>>>>> there is >>>>>>>>> some RELP response that has something else than RSP-CODE = 200 >>>>>>>>> (looking at http://www.rsyslog.com/doc/relp.html) >>>>>>>>> >>>>>>>>> I was just about to publish the S3 upload scripts but since they >>>>>>>>> cause rsyslog to misbehave this way I didn't do it yet, maybe I'll >>>>>>>>> post them as they are and fix later... >>>>>>>>> >>>>>>>>> erik >>>>>>>>> >>>>>>>>> On 11/01/2013 01:02 AM, David Lang wrote: >>>>>>>>> >>>>>>>>> the purpose of RELP is to tell the senders if the receiver was >>>>>>>>>> able to >>>>>>>>>> accept the message or not, so that's working as designed. >>>>>>>>>> >>>>>>>>>> When you do a full restart of rsyslog, it stops processing new >>>>>>>>>> messages >>>>>>>>>> and disconnects all senders (you are doing a full stop of >>>>>>>>>> rsyslog and >>>>>>>>>> then starting it from scratch, this takes time). My guess is >>>>>>>>>> that the >>>>>>>>>> senders are detecting 'too many failures' when trying to send >>>>>>>>>> messages, >>>>>>>>>> so they are backing off and sleeping for a while rather tthan >>>>>>>>>> performing >>>>>>>>>> a mini DOS attack on the network and server. Every third >>>>>>>>>> restart is >>>>>>>>>> probably triggering a threshold. >>>>>>>>>> >>>>>>>>>> Instead of doing a full restart, just send the rsyslog daemon a >>>>>>>>>> HUP >>>>>>>>>> signal instead. That will tell rsyslog to flush and close all >>>>>>>>>> files so >>>>>>>>>> that you can rotate them (If you are doing compression on the >>>>>>>>>> files, you >>>>>>>>>> may need to sleep for a few seconds to let the fluch complete). >>>>>>>>>> Rsyslog >>>>>>>>>> can continue to receive new messages during this time, so your >>>>>>>>>> senders >>>>>>>>>> will not see an outage. >>>>>>>>>> >>>>>>>>>> By the way, I'm needing a script to upload rsyslog archives to S3, >>>>>>>>>> could >>>>>>>>>> you send me a copy of yours? (remove any passwords first please >>>>>>>>>> :-) >>>>>>>>>> >>>>>>>>>> David Lang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, 31 Oct 2013, Erik Steffl wrote: >>>>>>>>>> >>>>>>>>>> Date: Thu, 31 Oct 2013 18:44:23 -0700 >>>>>>>>>> >>>>>>>>>>> From: Erik Steffl <[email protected]> >>>>>>>>>>> Reply-To: rsyslog-users <[email protected]> >>>>>>>>>>> To: rsyslog-users <[email protected]> >>>>>>>>>>> Subject: [rsyslog] Rsyslog with RELP not sending/receiving >>>>>>>>>>> messages >>>>>>>>>>> for long >>>>>>>>>>> intervals >>>>>>>>>>> >>>>>>>>>>> We have a fairly simple setup of 6 hosts sending syslog >>>>>>>>>>> messages to >>>>>>>>>>> one collector host, all of these run rsyslog 7.5.5-0adiscon2 from >>>>>>>>>>> adiscon repo and use RELP to transfer messages. There is also >>>>>>>>>>> load >>>>>>>>>>> balancer in front of the collector machine but I dont' think it >>>>>>>>>>> matters in this case. >>>>>>>>>>> >>>>>>>>>>> Rsyslog on collector machine is configured to write to files, >>>>>>>>>>> switching to a new file every 15 minute, using config like this >>>>>>>>>>> (abbreviated a bit): >>>>>>>>>>> >>>>>>>>>>> template(name="jsonFilename" type="list") { >>>>>>>>>>> constant(value="/path/") >>>>>>>>>>> property(name="$now") >>>>>>>>>>> constant(value="/") >>>>>>>>>>> property(name="$hour") >>>>>>>>>>> constant(value="/") >>>>>>>>>>> property(name="$qhour") >>>>>>>>>>> constant(value="/") >>>>>>>>>>> constant(value="log.json") >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> action(type="omfile" DynaFile="jsonFilename" >>>>>>>>>>> Template="jsonFormat") >>>>>>>>>>> >>>>>>>>>>> We run a script at every 2, 17, 32, 47 minute of the hour >>>>>>>>>>> and upload >>>>>>>>>>> the just finished file to S3. The uploading works like this: >>>>>>>>>>> >>>>>>>>>>> - let's say it's 3:02:00, rsyslog is writing to >>>>>>>>>>> /path/2013-10-10/03/00/log.json >>>>>>>>>>> >>>>>>>>>>> - get the filename log.json (anything that's not current, >>>>>>>>>>> usually >>>>>>>>>>> just one previous file which in the example would be >>>>>>>>>>> /path/2013-10-10/02/03/log.json) >>>>>>>>>>> >>>>>>>>>>> - rename /path/2013-10-10/02/03/log.json to >>>>>>>>>>> /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>>>>>> >>>>>>>>>>> - reload rsyslog (to make sure that even if for some reason >>>>>>>>>>> it was >>>>>>>>>>> writing to just renamed file it would close it and open a new >>>>>>>>>>> file) >>>>>>>>>>> >>>>>>>>>>> - upload /path/2013-10-10/02/03/log.json.uploading.0 to S3 >>>>>>>>>>> >>>>>>>>>>> - remove /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>>>>>> >>>>>>>>>>> Here's what happens every third run (yes, regularly EVERY >>>>>>>>>>> THIRD RUN) >>>>>>>>>>> of this script: >>>>>>>>>>> >>>>>>>>>>> - rsyslog stops writing to the CURRENT file >>>>>>>>>>> (/path/2013-10-10/03/00/log.json, the one that is NOT being >>>>>>>>>>> renamed) >>>>>>>>>>> few seconds into the run of the script (e.g. 3:02:04) >>>>>>>>>>> >>>>>>>>>>> - 6 hosts that were sending syslog messages to the log >>>>>>>>>>> collector >>>>>>>>>>> STOP >>>>>>>>>>> sending anything (as verified by stracing rsyslogd, tcpdump >>>>>>>>>>> and in >>>>>>>>>>> amazon AWS console metric for network in) >>>>>>>>>>> >>>>>>>>>>> - after this nothing is ever written into >>>>>>>>>>> /path/2013-10-10/03/00/log.json >>>>>>>>>>> >>>>>>>>>>> - the 6 clients start sending sysog messages again when the >>>>>>>>>>> next >>>>>>>>>>> file >>>>>>>>>>> is created (in this example it would be >>>>>>>>>>> /path/2013-10-10/03/01/log.json) >>>>>>>>>>> >>>>>>>>>>> I checked and double check the files, dates, verified that the >>>>>>>>>>> current file is not touched but can't figure out what's going >>>>>>>>>>> on. I >>>>>>>>>>> tried the script without reload rsyslog but it didn't make any >>>>>>>>>>> difference. If I don't run this script rsyslog works flawlessly. >>>>>>>>>>> >>>>>>>>>>> any ideas how to troubleshoot this? What could be causing the >>>>>>>>>>> rsyslog >>>>>>>>>>> to stop writing to the file and for the senders to stop sending >>>>>>>>>>> syslog >>>>>>>>>>> messages? I assume the rsyslog on the collector host somehow >>>>>>>>>>> signals >>>>>>>>>>> to the 6 hosts that send messages that it's not ready or >>>>>>>>>>> something... >>>>>>>>>>> >>>>>>>>>>> thanks! >>>>>>>>>>> >>>>>>>>>>> erik >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> rsyslog mailing list >>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>>>>>>> POST >>>>>>>>>>> if you DON'T LIKE THAT. >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> >>>>>>>>>> rsyslog mailing list >>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>>> myriad >>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>>>>>>>>> if you >>>>>>>>>> DON'T LIKE THAT. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> rsyslog mailing list >>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO >>>>>>>>> NOT POST >>>>>>>>> if you DON'T LIKE THAT. >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> >>>>>>>> rsyslog mailing list >>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>>> myriad >>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>>>>>>> if you >>>>>>>> DON'T LIKE THAT. >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> rsyslog mailing list >>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>>> POST >>>>>>> if you DON'T LIKE THAT. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.com/professional-services/ >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>> myriad >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>>> DON'T LIKE THAT. >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com/professional-services/ >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>> myriad >>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>> DON'T LIKE THAT. >>>>> >>>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com/professional-services/ >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>> POST if you DON'T LIKE THAT. >>>> >>>> >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com/professional-services/ >>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>> if you DON'T LIKE THAT. >>> >>> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ >> What's up with rsyslog? Follow https://twitter.com/rgerhards >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >> DON'T LIKE THAT. >> > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you > DON'T LIKE THAT. > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

