No, the sending system never ran out of disk space. It only hit the maximum disk space limit configured for the queue, and there is still plenty of disk space available.
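
For anyone who wants to check the same distinction on their own system, here is a minimal Python sketch. It is not part of rsyslog. It adds up the queue files in the work directory and compares the total against the configured cap and the free space on the filesystem. The `srvrfwd` prefix and the 100000000-byte limit come from my configuration quoted further down. The file naming (`<prefix>.<number>` plus `<prefix>.qi`), the 95% threshold, and the function name are assumptions made for illustration.

```python
import os
import shutil


def queue_usage(work_dir, prefix, limit_bytes):
    """Summarise disk usage of one rsyslog disk queue.

    work_dir    -- directory holding the queue files (site-specific)
    prefix      -- value of $ActionQueueFileName, e.g. "srvrfwd"
    limit_bytes -- value of $ActionQueueMaxDiskSpace
    """
    queue_bytes = 0
    has_qi = False
    for entry in os.scandir(work_dir):
        # Assumed naming: "<prefix>.<sequence number>" and "<prefix>.qi".
        if not entry.is_file() or not entry.name.startswith(prefix + "."):
            continue
        if entry.name == prefix + ".qi":
            has_qi = True
        queue_bytes += entry.stat().st_size
    free_bytes = shutil.disk_usage(work_dir).free
    return {
        "queue_bytes": queue_bytes,
        "limit_bytes": limit_bytes,
        "free_bytes": free_bytes,
        "has_qi": has_qi,
        # True when the queue is within 5% of its cap (arbitrary threshold)
        # while the filesystem itself still has free space.
        "hit_queue_limit_only": queue_bytes >= 0.95 * limit_bytes
        and free_bytes > 0,
    }
```

Calling `queue_usage("/var/spool/rsyslog", "srvrfwd", 100000000)` would report the numbers for my setup, where `/var/spool/rsyslog` is only a placeholder for whatever $WorkDirectory is set to locally.
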

I'll test recover_qi.pl to see whether it helps when the .qi files are lost. However, I have seen cases where the .qi file was present but the queued files still didn't get flushed, so a missing .qi file may not be the root cause.

Several other observations from my tests:
- The flushing issue happens consistently after the maximum disk space limit has been reached.
- A subsequent "service rsyslog restart" took quite some time to complete.
- Alternatively, "service rsyslog stop" followed by "service rsyslog start" didn't help either.

Could the queue file data structure be corrupted either when the disk space limit is reached or when rsyslog is restarted? It seems more likely to happen during the rsyslog restart after the configuration was changed. Rsyslog shouldn't lose or corrupt data, so this sounds like a potential bug. What do you think?

I found a similar and potentially related issue report here, and it appears to have no resolution:
http://lists.adiscon.net/pipermail/rsyslog/2013-March/032012.html

Thanks,
Fajun

On Fri, May 10, 2013 at 8:01 PM, David Lang <[email protected]> wrote:

> The tool that can recreate the .qi indexes is available at
>
> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/recover_qi.pl;h=4e2cf9d561fb06bf0efdca73b90d7875a7b4102d;hb=HEAD
>
> This is known to be needed if the disk completely fills up and .qi files
> are lost. But it should not be needed otherwise.
>
> David Lang
>
> On Fri, 10 May 2013, David Lang wrote:
>
>> Date: Fri, 10 May 2013 10:08:43 -0700 (PDT)
>> From: David Lang <[email protected]>
>> Reply-To: rsyslog-users <[email protected]>
>> To: rsyslog-users <[email protected]>
>> Subject: Re: [rsyslog] Rsyslog Disk Queue Flush Issue
>>
>> A quick question and clarification of my understanding here.
>>
>> Did this sending system ever run completely out of disk space? or did it
>> just hit the queue size limit that you set and still had more disk space
>> available?
>>
>> Bad Things (tm) are known to happen to the queue if it completely runs
>> out of space.
>>
>> David Lang
>>
>> On Thu, 9 May 2013, David Lang wrote:
>>
>>> On Thu, 9 May 2013, Fajun Chen wrote:
>>>
>>>> Saw this error message below several times, not every time though:
>>>> rsyslogd-2040: fatal error on disk queue 'action 22 queue[DA]', emergency
>>>> switch to direct mode [try http://www.rsyslog.com/e/2040 ]
>>>
>>> I think this is the problem.
>>>
>>> are there any other errors around this (something that may indicate an
>>> error reading/writing a file?)
>>>
>>>> I also noticed that the .qi file in the disk queue directory doesn't show
>>>> up sometimes. Is this file required for the disk queue to function
>>>> correctly? I did notice that disk queue flushing was working when the
>>>> file was absent in several cases.
>>>
>>> I don't believe that the queue will work properly without the .qi file.
>>> and if it's switching to direct mode and failing, the messages are not
>>> really getting queued, they are getting lost. I think that what's happening
>>> is that rsyslog has the data that's in the .qi file in memory, so it can
>>> keep going without it, but once it's restarted, it doesn't know what's what.
>>>
>>> what version are you using on the sending machine? (you may have said
>>> earlier, but I'm not remembering). I know I've seen some work done dealing
>>> with these sorts of problems since 7.0 was released, so if you are on an
>>> older version, the first thing to try is a current version.
>>>
>>> I think I remember Rainer posting a tool to recreate/fix the .qi file in
>>> cases where it's been corrupted.
>>>
>>> David Lang
>>>
>>>> Thanks,
>>>> Fajun
>>>>
>>>> On Thu, May 9, 2013 at 4:50 PM, David Lang <[email protected]> wrote:
>>>>
>>>>> On Thu, 9 May 2013, Fajun Chen wrote:
>>>>>
>>>>>> Resending original reply without debugging logs since it was blocked
>>>>>> waiting for approval (message size over 512k).
>>>>>>
>>>>>> Thanks,
>>>>>> Fajun
>>>>>>
>>>>>> On Thu, May 9, 2013 at 2:52 PM, Fajun Chen <[email protected]> wrote:
>>>>>>
>>>>>>> On Thu, May 9, 2013 at 11:34 AM, David Lang <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, 9 May 2013, Fajun Chen wrote:
>>>>>>>>
>>>>>>>>> On Wed, May 8, 2013 at 9:22 PM, David Lang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, 8 May 2013, Fajun Chen wrote:
>>>>>>>>>>
>>>>>>>>>>> iptables block setting didn't work for some reason.
>>>>>>>>>>
>>>>>>>>>> what do the iptables rules on that system look like?
>>>>>>>>>>
>>>>>>>>>> without seeing them, my guess is that there is a rule already there
>>>>>>>>>> that allows packets related to a known connection that are getting
>>>>>>>>>> applied (and therefore accepting the packets) before the deny rule
>>>>>>>>>> you are trying to put in place takes effect.
>>>>>>>>>
>>>>>>>>> The same problem exists on 5.8.6 with iptables blocking. One minor
>>>>>>>>> detail: the queued files reached the limit of 96M, it's reduced to
>>>>>>>>> 95M after the firewall was unblocked, but it stays at 95M on the
>>>>>>>>> client without flushing. I can use logger to send new log messages
>>>>>>>>> to the server, so network connection is not an issue.
>>>>>>>>>
>>>>>>>>> 7.3.14 seems to be working with iptables blocking.
>>>>>>>>
>>>>>>>> hmm, I don't understand how it could be different for different
>>>>>>>> versions of rsyslog. the iptables filtering should be happening by
>>>>>>>> the OS and wouldn't care what version of software is running.
>>>>>>>
>>>>>>> iptables filtering issue had been resolved by restarting rsyslog for the
>>>>>>> firewall changes to take effect.
>>>>>
>>>>> Ok, that is almost certainly the 'established connection' thing that I was
>>>>> speculating about.
>>>>>
>>>>>>> rsyslog version has nothing to do with iptables filtering. What I
>>>>>>> referred to was that rsyslog 5.8.6 doesn't flush queued files while
>>>>>>> 7.3.14 does when iptables filtering was changed from blocking to
>>>>>>> unblocking.
>>>>>
>>>>> ahh, Ok.
>>>>>
>>>>> At this point 5.8.6 is old enough that it's well past being supported, so
>>>>> let's work on the current version.
>>>>>
>>>>>>>>>>> As an alternative test, I stopped rsyslogd on the remote server and
>>>>>>>>>>> the logs were queued on the client as expected. I started rsyslog
>>>>>>>>>>> on the remote server once the disk queue on the client is filled
>>>>>>>>>>> up. I did see the queue files were flushed to the remote server
>>>>>>>>>>> once rsyslog is back to service. So this seems to be related to
>>>>>>>>>>> rsyslog configuration change.
>>>>>>>>>>
>>>>>>>>>> My guess (without knowing the code well) is that the queued messages
>>>>>>>>>> are somehow queued for the specific destination (IIRC you had this
>>>>>>>>>> queue setup as an action queue, not as the main queue, you posted
>>>>>>>>>> your config, but I have already deleted those messages). I'd be
>>>>>>>>>> curious to see if you have the same problem spilling the main queue
>>>>>>>>>> to disk.
>>>>>>>>>
>>>>>>>>> Just for your reference, here's my rsyslog configuration:
>>>>>>>>>
>>>>>>>>> # start forwarding rule 1 of 1
>>>>>>>>> $ActionQueueType LinkedList
>>>>>>>>> $ActionQueueFileName srvrfwd
>>>>>>>>> $ActionResumeRetryCount -1
>>>>>>>>> $ActionQueueSaveOnShutdown on
>>>>>>>>> $ActionQueueMaxDiskSpace 100000000
>>>>>>>>> $ActionQueueSize 200000 # Tried 100000 as well
>>>>>>>>> $ActionQueueHighWaterMark 600
>>>>>>>>> $ActionQueueLowWaterMark 200
>>>>>>>>> $ActionQueueTimeoutEnqueue 1
>>>>>>>>>
>>>>>>>>> #local5.* :omrelp:127.255.255.1:20514 # Invalid IP to trigger log buffering
>>>>>>>>> local5.* :omrelp:172.17.5.28:20514 # Real IP to trigger log forwarding
>>>>>>>>> # end forwarding rule 1 of 1
>>>>>>>>>
>>>>>>>>>>> On the other hand, as I noted in the first report, when I changed
>>>>>>>>>>> rsyslog configuration before disk space limit is reached, the queued
>>>>>>>>>>> files were flushed to the remote server without issues.
>>>>>>>>>>
>>>>>>>>>> very interesting, and probably a bug.
>>>>>>>>>
>>>>>>>>> Let me know if you need debugging logs to troubleshoot it.
>>>>>>>>
>>>>>>>> Rainer will probably need to get involved with this, but he's super
>>>>>>>> busy the next 2-3 weeks with a very high priority deadline (the "every
>>>>>>>> waking hour" type of project)
>>>>>>>>
>>>>>>>> It wouldn't hurt to take a look at the debug logs for the copy started
>>>>>>>> after the config change.
>>>>>>>
>>>>>>> Rsyslog debugging log is attached here. This was collected by running
>>>>>>> "rsyslogd -dn" when the remote server IP was set to valid. Please let me
>>>>>>> know if you want me to submit bug tracking item.
>>>>>
>>>>> you can e-mail it to me directly
>>>>>
>>>>>>>> by the way, are you sure you are doing a full restart after the config
>>>>>>>> change? a -HUP does not cause rsyslog to do a full restart and re-read
>>>>>>>> its config file, it just causes rsyslog to close and re-open its
>>>>>>>> outputs (a full restart takes a long time and can cause messages to be
>>>>>>>> lost)
>>>>>>>
>>>>>>> I did "service rsyslog restart" after the config change. "Kill timeout 5"
>>>>>>> is set in /etc/init/rsyslog.conf. I'm not sure if this timeout setting
>>>>>>> could make a difference.
>>>>>
>>>>> this should do it. but just to be sure, do a stop (make sure it's finished
>>>>> shutting down), then a start
>>>>>
>>>>>>>>>>> We need the initial startup logs to be queued before remote logging
>>>>>>>>>>> server is set. Switching from invalid IP to valid IP in rsyslog
>>>>>>>>>>> configuration was chosen to meet this requirement.
>>>>>>>>>>
>>>>>>>>>> Is there any chance of re-ordering the startup sequence to get the
>>>>>>>>>> config first, then start rsyslog, then start everything else? kernel
>>>>>>>>>> messages will get queued for quite a while, so they shouldn't be an
>>>>>>>>>> issue. The only issue would be any other applications that need to
>>>>>>>>>> write logs very early on.
>>>>>>>>>
>>>>>>>>> The problem is that we don't know remote logging server at startup,
>>>>>>>>> so we need the capability to buffer the logs until the remote server
>>>>>>>>> is set by user later. Understood that the logs could get lost after
>>>>>>>>> the disk space limit is reached.
>>>>>>>>> Is there any way to achieve this without rsyslog configuration change?
>>>>>>>>
>>>>>>>> one possibility would be to just write the logs to a file and then use
>>>>>>>> imfile to read this file later to send them upstream, but I'm not sure
>>>>>>>> if imfile has gained the capability to get all its data from the file
>>>>>>>> yet.
>>>>>>>>
>>>>>>>> Historically, imfile only read the message content from the file, it
>>>>>>>> generated the timestamp, hostname, priority, and severity information
>>>>>>>> itself. I know there was talk about having an option to have imfile
>>>>>>>> parse this from the file, but I don't know if it ever happened.
>>>>>>>>
>>>>>>>> If nothing else, you could write messages to a file with the
>>>>>>>> RSYSLOG_ForwardFormat and then use netcat or similar to read the file
>>>>>>>> and spit it out over the network later, but that wouldn't be able to
>>>>>>>> use RELP to send it. I guess you could use netcat to send it to a UDP
>>>>>>>> listener on localhost and then have the logs sent out via RELP from
>>>>>>>> there.
>>>>>>>>
>>>>>>>> There should be some way to feed the logs to /dev/log, but I'm not sure
>>>>>>>> exactly how to do that.
>>>>>>>
>>>>>>> Thanks for all your suggestions. Data completeness and integrity is very
>>>>>>> important in our use cases. I'm not sure how some of the logging
>>>>>>> information such as original timestamp would change when it's routed
>>>>>>> around. If this is confirmed to be a bug and can be fixed in 1-2 months,
>>>>>>> I would much rather wait for the fix.
>>>>>
>>>>> Well, if you feed the data to syslog with the timestamp, it will preserve
>>>>> the existing timestamp by default.
>>>>>
>>>>> David Lang
>>>>>
>>>>>>> Thanks,
>>>>>>> Fajun
>>>>>>>
>>>>>>>>>>> On Wed, May 8, 2013 at 11:56 AM, David Lang <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 8 May 2013, Fajun Chen wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I upgraded ubuntu rsyslog to 7.3.14 and still got the same issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My test procedure:
>>>>>>>>>>>>> Clean log file. Set remote host IP to 127.255.255.1 (invalid IP)
>>>>>>>>>>>>> in rsyslog conf. service rsyslog restart followed by logger in a
>>>>>>>>>>>>> loop. The disk queue files are buffered but are limited to 96M
>>>>>>>>>>>>> overall. Set remote host IP to valid IP. service rsyslog restart.
>>>>>>>>>>>>> I expect the queued files to be flushed to the remote host but
>>>>>>>>>>>>> these files are still in the queuing directory.
>>>>>>>>>>>>
>>>>>>>>>>>> This may be a silly thought, but the fact that you are changing the
>>>>>>>>>>>> configuration between these two steps could be part of the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> I would suggest that instead of changing the config to
>>>>>>>>>>>> enable/disable sending the logs that you instead keep the rsyslog
>>>>>>>>>>>> config the same and set iptables rules to block and unblock the
>>>>>>>>>>>> communications.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

