On Tue, Nov 5, 2013 at 3:41 AM, Erik Steffl <[email protected]> wrote:

>   See the debug log attached, the first line is the last message in one of
> the files (maybe not last message overall, but all communication stops at
> that second, there's number of files so it's hard to figure out which
> message is the last one).
>
>   Also noticed that number of messages in log with same number at the
> beginning drops right after this message, I assume the beginning of the
> line has something to do with time, these are the counts and values at the
> beginning of the debug log line:
>

It's the last 5 digits of a Unix timestamp - so you don't get an absolute
time, but an idea how things progress. After the dot I think there are ms
(or so) and after the colon there is a thread ID (same ID = same thread).

Hope that helps a bit, will try to look at the debug log tomorrow morning.

Rainer


>
>   21264 9547.

  22378 9548.
>    2725 9549.
>    1877 9550.
> ...9551 to 9969 left out, low counts around 2000
> ...9970 to 0519 left out, low counts around  6
>       6 0520.
>       6 0521.
>   39302 0522.
>   51997 0523.
>
>   thanks!
>
>         erik
>
>
> On 11/03/2013 01:05 PM, David Lang wrote:
>
>> I would post everything in the log file starting with the last log that
>> it successfully sent and continuing a while after that message (a while
>> being a couple hundred lines or so to be safe)
>>
>> once we look at the log we may ask for more.
>>
>> David Lang
>>
>>
>>   On Sun, 3 Nov 2013, Erik Steffl wrote:
>>
>>  Date: Sun, 03 Nov 2013 11:45:03 -0800
>>> From: Erik Steffl <[email protected]>
>>> Reply-To: rsyslog-users <[email protected]>
>>> To: rsyslog-users <[email protected]>
>>> Subject: Re: [rsyslog] Rsyslog with RELP not sending/receiving
>>> messages for
>>>     long intervals
>>>
>>>  I have debug log where I was able to find last message in one of the
>>> logs files (right before the silence period) but it's 3 GB :) Any hint
>>> how much to pick the relevant part of the debug log? Is the timestamp
>>> part of the log message? There is something in the beginning of each
>>> debug log line but I don't know how to translate it to time (part of
>>> it is transaction id, the rest not sure about)
>>>
>>>  thanks! (and of course I unserstand you can't drop everything and
>>> work on my problem, thanks for answers!)
>>>
>>>     erik
>>>
>>> On 11/02/2013 08:51 AM, Rainer Gerhards wrote:
>>>
>>>> On Fri, Nov 1, 2013 at 11:31 PM, Erik Steffl <[email protected]> wrote:
>>>>
>>>>     running reload every 43 seconds, behaviour is exactly the same
>>>>> (30 min
>>>>> traffic, 15 min silence) so I think it's not related to reload.
>>>>>
>>>>>    will test the latest version and I guess re-test the older
>>>>> version and
>>>>> go from there...
>>>>>
>>>>>
>>>>>  It would probably be a good idea to try to capture a debug log so
>>>> that we
>>>> can see what rsyslog itself thinks what happens. Note that you can also
>>>> just enable it shortly before you expect the problem to happen
>>>> (google for
>>>> "debug on demand").
>>>>
>>>> Right now, I am totally busy with complex work and cannot look at it. If
>>>> this is for a business, I would strongly suggest to purchase one of the
>>>> low-priced support options. That would enable an engineer to begin
>>>> working
>>>> on the case within half a day or so. I would *really love* to drop
>>>> this to
>>>> someone else, as I really don't like the smell of this bug -- but if
>>>> I do,
>>>> I am pretty sure other bugs will creep up, and that would again be the
>>>> death of the new work I am doing (and it took me 3 years or so to get to
>>>> this point!).
>>>>
>>>> So it looks like non-paid work needs to stay focused on that work, which
>>>> will be useful for all users in the hopefully not so distant future. All
>>>> non-emergency bugs (usually security related things) or request thus
>>>> need
>>>> to be on hold for a couple of days or maybe till the end of month.
>>>>
>>>> If you get a debug log, I will have time to at least look quickly at it,
>>>> but I can't say if that will be sufficient to provide a solution.
>>>>
>>>> sorry for that and I hope for your understanding.
>>>>
>>>> Rainer
>>>>
>>>>    thanks!
>>>>
>>>>>
>>>>>          erik
>>>>>
>>>>>
>>>>> On 11/01/2013 02:29 PM, David Lang wrote:
>>>>>
>>>>>  Ok, this is definantly sounding like a bug in RELP, it worked on an
>>>>>> earlier version, check if it works on the latest version, and then I
>>>>>> would suggest fileing a bug on the www.rsyslog.com bugtracker
>>>>>>
>>>>>> This sounds like something Rainer is going to need to look at. He's in
>>>>>> Germany, so I believe his weekend has started, he may see it over the
>>>>>> weekend or it may wait until his work week starts.
>>>>>>
>>>>>> David Lang
>>>>>>
>>>>>>    On Fri, 1 Nov 2013, Erik Steffl wrote:
>>>>>>
>>>>>>    pretty sure reload is sending HUP (that's upstart job as
>>>>>> inlcluded in
>>>>>>
>>>>>>> adiscon packages), the rsyslog keeps the same PID, I even tried to
>>>>>>> strace it while running reload, here's syslog message:
>>>>>>>
>>>>>>> Nov  1 10:47:25 ip-10-238-198-103 rsyslogd: [origin
>>>>>>> software="rsyslogd" swVersion="7.5.5" x-pid="3865"
>>>>>>> x-info="http://www.rsyslog.com";] rsyslogd was HUPed
>>>>>>>
>>>>>>>   but I think it must be related to something else cause when I
>>>>>>> removed
>>>>>>> the reload (so the log mover was doing only rename, upload,
>>>>>>> remove) it
>>>>>>> was still exhibiting the same works for 30 min, silent for 15 min
>>>>>>> etc.
>>>>>>>
>>>>>>>   running reload every 43 seconds now and will see what happens...
>>>>>>>
>>>>>>>      erik
>>>>>>>
>>>>>>> On 11/01/2013 12:57 PM, David Lang wrote:
>>>>>>>
>>>>>>>  a HUP of the server should not cause any interruption in processing
>>>>>>>> messages.
>>>>>>>>
>>>>>>>> are you sure upstart is seing HUP not doing a full stop/start?
>>>>>>>>
>>>>>>>> could you try sending HUP to rsyslog manually a few times to see
>>>>>>>> if you
>>>>>>>> can duplicate the problem?
>>>>>>>>
>>>>>>>> David Lang
>>>>>>>>
>>>>>>>> On Fri, 1 Nov 2013, Erik Steffl wrote:
>>>>>>>>
>>>>>>>>    sorry, it wasn't entirely clear in my previous email but we
>>>>>>>> are doing
>>>>>>>>
>>>>>>>>> ubuntu upstart reload which is sending HUP signal, pretty sure it's
>>>>>>>>> doing the right thing since logrotate scripts use it, syslog
>>>>>>>>> message
>>>>>>>>> says rsyslog received HUP signal, rsyslogd keeps the same PID
>>>>>>>>> and all
>>>>>>>>> files are closed (as verified using lsof).
>>>>>>>>>
>>>>>>>>>   On top of that if, during one of these quiet periods, I try to
>>>>>>>>> send
>>>>>>>>> messages from another client (on which I just restarted rsyslog
>>>>>>>>> so it
>>>>>>>>> has no error counters etc.) I think it's also not sending anything,
>>>>>>>>> when stracing it (the sender) I see it recieved messages from
>>>>>>>>> logger
>>>>>>>>> but is not sending them anywhere. So I think that the collector
>>>>>>>>> rsyslog continues to send some kind of signal that it's not
>>>>>>>>> ready but
>>>>>>>>> not sure why or how to figure out whether it actually does it.
>>>>>>>>>
>>>>>>>>>   Any ideas how to troubleshoot this? I think I'll record more of
>>>>>>>>> strace output and try to reconstruct RELP messages, I guess
>>>>>>>>> there is
>>>>>>>>> some RELP response that has something else than RSP-CODE = 200
>>>>>>>>> (looking at http://www.rsyslog.com/doc/relp.html)
>>>>>>>>>
>>>>>>>>>   I was just about to publish the S3 upload scripts but since they
>>>>>>>>> cause rsyslog to misbehave this way I didn't do it yet, maybe I'll
>>>>>>>>> post them as they are and fix later...
>>>>>>>>>
>>>>>>>>>      erik
>>>>>>>>>
>>>>>>>>> On 11/01/2013 01:02 AM, David Lang wrote:
>>>>>>>>>
>>>>>>>>>  the purpose of RELP is to tell the senders if the receiver was
>>>>>>>>>> able to
>>>>>>>>>> accept the message or not, so that's working as designed.
>>>>>>>>>>
>>>>>>>>>> When you do a full restart of rsyslog, it stops processing new
>>>>>>>>>> messages
>>>>>>>>>> and disconnects all senders (you are doing a full stop of
>>>>>>>>>> rsyslog and
>>>>>>>>>> then starting it from scratch, this takes time). My guess is
>>>>>>>>>> that the
>>>>>>>>>> senders are detecting 'too many failures' when trying to send
>>>>>>>>>> messages,
>>>>>>>>>> so they are backing off and sleeping for a while rather tthan
>>>>>>>>>> performing
>>>>>>>>>> a mini DOS attack on the network and server. Every third
>>>>>>>>>> restart is
>>>>>>>>>> probably triggering a threshold.
>>>>>>>>>>
>>>>>>>>>> Instead of doing a full restart, just send the rsyslog daemon a
>>>>>>>>>> HUP
>>>>>>>>>> signal instead. That will tell rsyslog to flush and close all
>>>>>>>>>> files so
>>>>>>>>>> that you can rotate them (If you are doing compression on the
>>>>>>>>>> files, you
>>>>>>>>>> may need to sleep for a few seconds to let the fluch complete).
>>>>>>>>>> Rsyslog
>>>>>>>>>> can continue to receive new messages during this time, so your
>>>>>>>>>> senders
>>>>>>>>>> will not see an outage.
>>>>>>>>>>
>>>>>>>>>> By the way, I'm needing a script to upload rsyslog archives to S3,
>>>>>>>>>> could
>>>>>>>>>> you send me a copy of yours? (remove any passwords first please
>>>>>>>>>> :-)
>>>>>>>>>>
>>>>>>>>>> David Lang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    On Thu, 31 Oct 2013, Erik Steffl wrote:
>>>>>>>>>>
>>>>>>>>>>   Date: Thu, 31 Oct 2013 18:44:23 -0700
>>>>>>>>>>
>>>>>>>>>>> From: Erik Steffl <[email protected]>
>>>>>>>>>>> Reply-To: rsyslog-users <[email protected]>
>>>>>>>>>>> To: rsyslog-users <[email protected]>
>>>>>>>>>>> Subject: [rsyslog] Rsyslog with RELP not sending/receiving
>>>>>>>>>>> messages
>>>>>>>>>>> for long
>>>>>>>>>>>      intervals
>>>>>>>>>>>
>>>>>>>>>>> We have a fairly simple setup of 6 hosts sending syslog
>>>>>>>>>>> messages to
>>>>>>>>>>> one collector host, all of these run rsyslog 7.5.5-0adiscon2 from
>>>>>>>>>>> adiscon repo and use RELP to transfer messages. There is also
>>>>>>>>>>> load
>>>>>>>>>>> balancer in front of the collector machine but I dont' think it
>>>>>>>>>>> matters in this case.
>>>>>>>>>>>
>>>>>>>>>>> Rsyslog on collector machine is configured to write to files,
>>>>>>>>>>> switching to a new file every 15 minute, using config like this
>>>>>>>>>>> (abbreviated a bit):
>>>>>>>>>>>
>>>>>>>>>>> template(name="jsonFilename" type="list") {
>>>>>>>>>>>   constant(value="/path/")
>>>>>>>>>>>   property(name="$now")
>>>>>>>>>>>   constant(value="/")
>>>>>>>>>>>   property(name="$hour")
>>>>>>>>>>>   constant(value="/")
>>>>>>>>>>>   property(name="$qhour")
>>>>>>>>>>>   constant(value="/")
>>>>>>>>>>>   constant(value="log.json")
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> action(type="omfile" DynaFile="jsonFilename"
>>>>>>>>>>> Template="jsonFormat")
>>>>>>>>>>>
>>>>>>>>>>>   We run a script at every 2, 17, 32, 47 minute of the hour
>>>>>>>>>>> and upload
>>>>>>>>>>> the just finished file to S3. The uploading works like this:
>>>>>>>>>>>
>>>>>>>>>>>   - let's say it's 3:02:00, rsyslog is writing to
>>>>>>>>>>> /path/2013-10-10/03/00/log.json
>>>>>>>>>>>
>>>>>>>>>>>   - get the filename log.json (anything that's not current,
>>>>>>>>>>> usually
>>>>>>>>>>> just one previous file which in the example would be
>>>>>>>>>>> /path/2013-10-10/02/03/log.json)
>>>>>>>>>>>
>>>>>>>>>>>   - rename /path/2013-10-10/02/03/log.json to
>>>>>>>>>>> /path/2013-10-10/02/03/log.json.uploading.0
>>>>>>>>>>>
>>>>>>>>>>>   - reload rsyslog (to make sure that even if for some reason
>>>>>>>>>>> it was
>>>>>>>>>>> writing to just renamed file it would close it and open a new
>>>>>>>>>>> file)
>>>>>>>>>>>
>>>>>>>>>>>   - upload /path/2013-10-10/02/03/log.json.uploading.0 to S3
>>>>>>>>>>>
>>>>>>>>>>>   - remove /path/2013-10-10/02/03/log.json.uploading.0
>>>>>>>>>>>
>>>>>>>>>>> Here's what happens every third run (yes, regularly EVERY
>>>>>>>>>>> THIRD RUN)
>>>>>>>>>>> of this script:
>>>>>>>>>>>
>>>>>>>>>>>   - rsyslog stops writing to the CURRENT file
>>>>>>>>>>> (/path/2013-10-10/03/00/log.json, the one that is NOT being
>>>>>>>>>>> renamed)
>>>>>>>>>>> few seconds into the run of the script (e.g. 3:02:04)
>>>>>>>>>>>
>>>>>>>>>>>   - 6 hosts that were sending syslog messages to the log
>>>>>>>>>>> collector
>>>>>>>>>>> STOP
>>>>>>>>>>> sending anything (as verified by stracing rsyslogd, tcpdump
>>>>>>>>>>> and in
>>>>>>>>>>> amazon AWS console metric for network in)
>>>>>>>>>>>
>>>>>>>>>>>   - after this nothing is ever written into
>>>>>>>>>>> /path/2013-10-10/03/00/log.json
>>>>>>>>>>>
>>>>>>>>>>>   - the 6 clients start sending sysog messages again when the
>>>>>>>>>>> next
>>>>>>>>>>> file
>>>>>>>>>>> is created (in this example it would be
>>>>>>>>>>> /path/2013-10-10/03/01/log.json)
>>>>>>>>>>>
>>>>>>>>>>>   I checked and double check the files, dates, verified that the
>>>>>>>>>>> current file is not touched but can't figure out what's going
>>>>>>>>>>> on. I
>>>>>>>>>>> tried the script without reload rsyslog but it didn't make any
>>>>>>>>>>> difference. If I don't run this script rsyslog works flawlessly.
>>>>>>>>>>>
>>>>>>>>>>>   any ideas how to troubleshoot this? What could be causing the
>>>>>>>>>>> rsyslog
>>>>>>>>>>> to stop writing to the file and for the senders to stop sending
>>>>>>>>>>> syslog
>>>>>>>>>>> messages? I assume the rsyslog on the collector host somehow
>>>>>>>>>>> signals
>>>>>>>>>>> to the 6 hosts that send messages that it's not ready or
>>>>>>>>>>> something...
>>>>>>>>>>>
>>>>>>>>>>>   thanks!
>>>>>>>>>>>
>>>>>>>>>>>      erik
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> rsyslog mailing list
>>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>>>>>>> POST
>>>>>>>>>>> if you DON'T LIKE THAT.
>>>>>>>>>>>
>>>>>>>>>>>   _______________________________________________
>>>>>>>>>>>
>>>>>>>>>> rsyslog mailing list
>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>>> myriad
>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>>>>>>>>> if you
>>>>>>>>>> DON'T LIKE THAT.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> rsyslog mailing list
>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO
>>>>>>>>> NOT POST
>>>>>>>>> if you DON'T LIKE THAT.
>>>>>>>>>
>>>>>>>>>   _______________________________________________
>>>>>>>>>
>>>>>>>> rsyslog mailing list
>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>> myriad
>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>>>>>>> if you
>>>>>>>> DON'T LIKE THAT.
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> rsyslog mailing list
>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>>>>> POST
>>>>>>> if you DON'T LIKE THAT.
>>>>>>>
>>>>>>>   _______________________________________________
>>>>>>>
>>>>>> rsyslog mailing list
>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>> http://www.rsyslog.com/professional-services/
>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>> myriad
>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>>> DON'T LIKE THAT.
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>> myriad
>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> DON'T LIKE THAT.
>>>>>
>>>>>  _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>>>> POST if you DON'T LIKE THAT.
>>>>
>>>>
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>> if you DON'T LIKE THAT.
>>>
>>>  _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>> DON'T LIKE THAT.
>>
>
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to