On Fri, Nov 1, 2013 at 11:31 PM, Erik Steffl <[email protected]> wrote:
> running reload every 43 seconds, behaviour is exactly the same (30 min > traffic, 15 min silence) so I think it's not related to reload. > > will test the latest version and I guess re-test the older version and > go from there... > > It would probably be a good idea to try to capture a debug log so that we can see what rsyslog itself thinks what happens. Note that you can also just enable it shortly before you expect the problem to happen (google for "debug on demand"). Right now, I am totally busy with complex work and cannot look at it. If this is for a business, I would strongly suggest to purchase one of the low-priced support options. That would enable an engineer to begin working on the case within half a day or so. I would *really love* to drop this to someone else, as I really don't like the smell of this bug -- but if I do, I am pretty sure other bugs will creep up, and that would again be the death of the new work I am doing (and it took me 3 years or so to get to this point!). So it looks like non-paid work needs to stay focused on that work, which will be useful for all users in the hopefully not so distant future. All non-emergency bugs (usually security related things) or request thus need to be on hold for a couple of days or maybe till the end of month. If you get a debug log, I will have time to at least look quickly at it, but I can't say if that will be sufficient to provide a solution. sorry for that and I hope for your understanding. Rainer thanks! > > erik > > > On 11/01/2013 02:29 PM, David Lang wrote: > >> Ok, this is definantly sounding like a bug in RELP, it worked on an >> earlier version, check if it works on the latest version, and then I >> would suggest fileing a bug on the www.rsyslog.com bugtracker >> >> This sounds like something Rainer is going to need to look at. He's in >> Germany, so I believe his weekend has started, he may see it over the >> weekend or it may wait until his work week starts. >> >> David Lang >> >> On Fri, 1 Nov 2013, Erik Steffl wrote: >> >> pretty sure reload is sending HUP (that's upstart job as inlcluded in >>> adiscon packages), the rsyslog keeps the same PID, I even tried to >>> strace it while running reload, here's syslog message: >>> >>> Nov 1 10:47:25 ip-10-238-198-103 rsyslogd: [origin >>> software="rsyslogd" swVersion="7.5.5" x-pid="3865" >>> x-info="http://www.rsyslog.com"] rsyslogd was HUPed >>> >>> but I think it must be related to something else cause when I removed >>> the reload (so the log mover was doing only rename, upload, remove) it >>> was still exhibiting the same works for 30 min, silent for 15 min etc. >>> >>> running reload every 43 seconds now and will see what happens... >>> >>> erik >>> >>> On 11/01/2013 12:57 PM, David Lang wrote: >>> >>>> a HUP of the server should not cause any interruption in processing >>>> messages. >>>> >>>> are you sure upstart is seing HUP not doing a full stop/start? >>>> >>>> could you try sending HUP to rsyslog manually a few times to see if you >>>> can duplicate the problem? >>>> >>>> David Lang >>>> >>>> On Fri, 1 Nov 2013, Erik Steffl wrote: >>>> >>>> sorry, it wasn't entirely clear in my previous email but we are doing >>>>> ubuntu upstart reload which is sending HUP signal, pretty sure it's >>>>> doing the right thing since logrotate scripts use it, syslog message >>>>> says rsyslog received HUP signal, rsyslogd keeps the same PID and all >>>>> files are closed (as verified using lsof). >>>>> >>>>> On top of that if, during one of these quiet periods, I try to send >>>>> messages from another client (on which I just restarted rsyslog so it >>>>> has no error counters etc.) I think it's also not sending anything, >>>>> when stracing it (the sender) I see it recieved messages from logger >>>>> but is not sending them anywhere. So I think that the collector >>>>> rsyslog continues to send some kind of signal that it's not ready but >>>>> not sure why or how to figure out whether it actually does it. >>>>> >>>>> Any ideas how to troubleshoot this? I think I'll record more of >>>>> strace output and try to reconstruct RELP messages, I guess there is >>>>> some RELP response that has something else than RSP-CODE = 200 >>>>> (looking at http://www.rsyslog.com/doc/relp.html) >>>>> >>>>> I was just about to publish the S3 upload scripts but since they >>>>> cause rsyslog to misbehave this way I didn't do it yet, maybe I'll >>>>> post them as they are and fix later... >>>>> >>>>> erik >>>>> >>>>> On 11/01/2013 01:02 AM, David Lang wrote: >>>>> >>>>>> the purpose of RELP is to tell the senders if the receiver was able to >>>>>> accept the message or not, so that's working as designed. >>>>>> >>>>>> When you do a full restart of rsyslog, it stops processing new >>>>>> messages >>>>>> and disconnects all senders (you are doing a full stop of rsyslog and >>>>>> then starting it from scratch, this takes time). My guess is that the >>>>>> senders are detecting 'too many failures' when trying to send >>>>>> messages, >>>>>> so they are backing off and sleeping for a while rather tthan >>>>>> performing >>>>>> a mini DOS attack on the network and server. Every third restart is >>>>>> probably triggering a threshold. >>>>>> >>>>>> Instead of doing a full restart, just send the rsyslog daemon a HUP >>>>>> signal instead. That will tell rsyslog to flush and close all files so >>>>>> that you can rotate them (If you are doing compression on the >>>>>> files, you >>>>>> may need to sleep for a few seconds to let the fluch complete). >>>>>> Rsyslog >>>>>> can continue to receive new messages during this time, so your senders >>>>>> will not see an outage. >>>>>> >>>>>> By the way, I'm needing a script to upload rsyslog archives to S3, >>>>>> could >>>>>> you send me a copy of yours? (remove any passwords first please :-) >>>>>> >>>>>> David Lang >>>>>> >>>>>> >>>>>> On Thu, 31 Oct 2013, Erik Steffl wrote: >>>>>> >>>>>> Date: Thu, 31 Oct 2013 18:44:23 -0700 >>>>>>> From: Erik Steffl <[email protected]> >>>>>>> Reply-To: rsyslog-users <[email protected]> >>>>>>> To: rsyslog-users <[email protected]> >>>>>>> Subject: [rsyslog] Rsyslog with RELP not sending/receiving messages >>>>>>> for long >>>>>>> intervals >>>>>>> >>>>>>> We have a fairly simple setup of 6 hosts sending syslog messages to >>>>>>> one collector host, all of these run rsyslog 7.5.5-0adiscon2 from >>>>>>> adiscon repo and use RELP to transfer messages. There is also load >>>>>>> balancer in front of the collector machine but I dont' think it >>>>>>> matters in this case. >>>>>>> >>>>>>> Rsyslog on collector machine is configured to write to files, >>>>>>> switching to a new file every 15 minute, using config like this >>>>>>> (abbreviated a bit): >>>>>>> >>>>>>> template(name="jsonFilename" type="list") { >>>>>>> constant(value="/path/") >>>>>>> property(name="$now") >>>>>>> constant(value="/") >>>>>>> property(name="$hour") >>>>>>> constant(value="/") >>>>>>> property(name="$qhour") >>>>>>> constant(value="/") >>>>>>> constant(value="log.json") >>>>>>> } >>>>>>> >>>>>>> action(type="omfile" DynaFile="jsonFilename" Template="jsonFormat") >>>>>>> >>>>>>> We run a script at every 2, 17, 32, 47 minute of the hour and upload >>>>>>> the just finished file to S3. The uploading works like this: >>>>>>> >>>>>>> - let's say it's 3:02:00, rsyslog is writing to >>>>>>> /path/2013-10-10/03/00/log.json >>>>>>> >>>>>>> - get the filename log.json (anything that's not current, usually >>>>>>> just one previous file which in the example would be >>>>>>> /path/2013-10-10/02/03/log.json) >>>>>>> >>>>>>> - rename /path/2013-10-10/02/03/log.json to >>>>>>> /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>> >>>>>>> - reload rsyslog (to make sure that even if for some reason it was >>>>>>> writing to just renamed file it would close it and open a new file) >>>>>>> >>>>>>> - upload /path/2013-10-10/02/03/log.json.uploading.0 to S3 >>>>>>> >>>>>>> - remove /path/2013-10-10/02/03/log.json.uploading.0 >>>>>>> >>>>>>> Here's what happens every third run (yes, regularly EVERY THIRD RUN) >>>>>>> of this script: >>>>>>> >>>>>>> - rsyslog stops writing to the CURRENT file >>>>>>> (/path/2013-10-10/03/00/log.json, the one that is NOT being renamed) >>>>>>> few seconds into the run of the script (e.g. 3:02:04) >>>>>>> >>>>>>> - 6 hosts that were sending syslog messages to the log collector >>>>>>> STOP >>>>>>> sending anything (as verified by stracing rsyslogd, tcpdump and in >>>>>>> amazon AWS console metric for network in) >>>>>>> >>>>>>> - after this nothing is ever written into >>>>>>> /path/2013-10-10/03/00/log.json >>>>>>> >>>>>>> - the 6 clients start sending sysog messages again when the next >>>>>>> file >>>>>>> is created (in this example it would be >>>>>>> /path/2013-10-10/03/01/log.json) >>>>>>> >>>>>>> I checked and double check the files, dates, verified that the >>>>>>> current file is not touched but can't figure out what's going on. I >>>>>>> tried the script without reload rsyslog but it didn't make any >>>>>>> difference. If I don't run this script rsyslog works flawlessly. >>>>>>> >>>>>>> any ideas how to troubleshoot this? What could be causing the >>>>>>> rsyslog >>>>>>> to stop writing to the file and for the senders to stop sending >>>>>>> syslog >>>>>>> messages? I assume the rsyslog on the collector host somehow signals >>>>>>> to the 6 hosts that send messages that it's not ready or something... >>>>>>> >>>>>>> thanks! >>>>>>> >>>>>>> erik >>>>>>> _______________________________________________ >>>>>>> rsyslog mailing list >>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT >>>>>>> POST >>>>>>> if you DON'T LIKE THAT. >>>>>>> >>>>>>> _______________________________________________ >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.com/professional-services/ >>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>> myriad >>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>>> DON'T LIKE THAT. >>>>>> >>>>> >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com/professional-services/ >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>>>> if you DON'T LIKE THAT. >>>>> >>>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com/professional-services/ >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>> DON'T LIKE THAT. >>>> >>> >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com/professional-services/ >>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST >>> if you DON'T LIKE THAT. >>> >>> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ >> What's up with rsyslog? Follow https://twitter.com/rgerhards >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >> DON'T LIKE THAT. >> > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you > DON'T LIKE THAT. > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

