When you do a stop, rsyslog attempts to flush the logs it's holding in memory before it shuts down; anything still in memory once it exits is permanently lost.

If the logs can't go to their destination and can't go to the disk assist queue, they permanently fail.

Rsyslog tries _really_ hard to avoid permanently failing any message. This includes many attempts to deliver it (IIRC, you had the retry limit set to -1, i.e. retry forever), but there is a backup timeout that says "I've spent too much time trying to save data, I'm going to go ahead and exit now anyway".

I wonder if some part of the code inside rsyslog is treating "over quota" the same way it would "out of disk space" and writing a message to the queue file, but then failing to write the new .qi file, resulting in corruption. If so, I think the right answer would be to not count the .qi files against the quota, and document this clearly. My understanding is that the .qi files are tiny compared to the log data.
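
To check this hypothesis on a test box, something like the following rough sketch could report whether the .qi file is present and how much space the queue data files take compared to it. The spool directory is an assumption (the posted config does not show $WorkDirectory), and the helper name is made up for illustration; only the srvrfwd prefix and the 100000000 limit come from the config in this thread.

```python
import os
import sys

# From the posted config: $ActionQueueMaxDiskSpace 100000000
MAX_DISK_SPACE = 100_000_000


def summarize_queue_dir(spool_dir, prefix):
    """Return (has_qi, data_bytes, qi_bytes) for files named <prefix>.*.

    Hypothetical helper: it only looks at file names and sizes and does
    not parse the queue files themselves.
    """
    has_qi = False
    data_bytes = 0
    qi_bytes = 0
    for name in os.listdir(spool_dir):
        if not name.startswith(prefix + "."):
            continue  # not part of this queue
        size = os.path.getsize(os.path.join(spool_dir, name))
        if name == prefix + ".qi":
            has_qi = True
            qi_bytes = size
        else:
            data_bytes += size
    return has_qi, data_bytes, qi_bytes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_queue.py <spool_dir> <queue_file_prefix>
    has_qi, data_bytes, qi_bytes = summarize_queue_dir(sys.argv[1], sys.argv[2])
    print(".qi present:", has_qi)
    print("queue data bytes:", data_bytes, "of", MAX_DISK_SPACE)
    print(".qi bytes:", qi_bytes)
```

Running it before and after hitting the limit, e.g. with an assumed spool directory of /var/spool/rsyslog and the prefix srvrfwd, would show whether the .qi file disappears at the moment the data files reach the quota.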

I think this is going to have to wait until Rainer is available again, but that's not going to be for a couple of weeks.

In the meantime, we can look into some of the other work-arounds that I suggested if you want.

David Lang

On Fri, 10 May 2013, Fajun Chen wrote:

No, the sending system never ran out of disk space. It just hit the
configured maximum disk space limit, and there is still plenty of disk
space available.

I'll test out recover_qi.pl to see if it can help when .qi files are lost.
I have seen cases where the .qi file was present, but the queued files
still didn't get flushed, so this may not be the root cause.

Several other observations in my tests:
- This flushing issue happens consistently after the maximum disk space
limit was reached.
- A subsequent "service rsyslog restart" took quite some time to complete.
- Alternatively, "service rsyslog stop" followed by "service rsyslog start"
didn't help either.
Could the queue file data structure be corrupted either when the disk space
limit is reached or when rsyslog is restarted? More likely during the
rsyslog restart, when the configuration was changed. Rsyslog shouldn't lose
or corrupt data, so this sounds like a potential bug. What do you think?

I found a similar and potentially related issue report here, and it appears
to have no resolution:
http://lists.adiscon.net/pipermail/rsyslog/2013-March/032012.html

Thanks,
Fajun

On Fri, May 10, 2013 at 8:01 PM, David Lang <[email protected]> wrote:

The tool that can recreate the .qi indexes is available at

http://git.adiscon.com/?p=rsyslog.git;a=blob;f=tools/recover_qi.pl;h=4e2cf9d561fb06bf0efdca73b90d7875a7b4102d;hb=HEAD

This is known to be needed if the disk completely fills up and .qi files
are lost. But it should not be needed otherwise.

David Lang

On Fri, 10 May 2013, David Lang wrote:

 Date: Fri, 10 May 2013 10:08:43 -0700 (PDT)
From: David Lang <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] Rsyslog Disk Queue Flush Issue


A quick question and clarification of my understanding here.

Did this sending system ever run completely out of disk space? or did it
just hit the queue size limit that you set and still had more disk space
available?

Bad Things (tm) are known to happen to the queue if it completely runs
out of space.

David Lang

On Thu, 9 May 2013, David Lang wrote:

 On Thu, 9 May 2013, Fajun Chen wrote:

Saw this error message below several times, not every time though:
rsyslogd-2040: fatal error on disk queue 'action 22 queue[DA]', emergency switch to direct mode [try http://www.rsyslog.com/e/2040 ]


I think this is the problem.

are there any other errors around this (something that may indicate an
error reading/writing a file?)

I also noticed that the .qi file in the disk queue directory doesn't show
up sometimes. Is this file required for the disk queue to function
correctly? I did notice that disk queue flushing was working when the file
was absent in several cases.



I don't believe that the queue will work properly without the .qi file,
and if it's switching to direct mode and failing, the messages are not
really getting queued, they are getting lost. I think that what's happening
is that rsyslog has the data that's in the .qi file in memory, so it can
keep going without it, but once it's restarted, it doesn't know what's what.
what version are you using on the sending machine? (you may have said
earlier, but I'm not remembering). I know I've seen some work done dealing
with these sorts of problems since 7.0 was released, so if you are on an
older version, the first thing to try is a current version.

I think I remember Rainer posting a tool to recreate/fix the .qi file in
cases where it's been corrupted.

David Lang

 Thanks,
Fajun


On Thu, May 9, 2013 at 4:50 PM, David Lang <[email protected]> wrote:

 On Thu, 9 May 2013, Fajun Chen wrote:

Resending original reply without debugging logs since it was blocked
waiting for approval (message size over 512k).

Thanks,
Fajun

On Thu, May 9, 2013 at 2:52 PM, Fajun Chen <[email protected]> wrote:




On Thu, May 9, 2013 at 11:34 AM, David Lang <[email protected]> wrote:

 On Thu, 9 May 2013, Fajun Chen wrote:


 On Wed, May 8, 2013 at 9:22 PM, David Lang <[email protected]> wrote:


 On Wed, 8 May 2013, Fajun Chen wrote:


 iptables block setting didn't work for some reason.



 what do the iptables rules on that system look like?


without seeing them, my guess is that there is a rule already there that
allows packets related to a known connection, and that it is getting
applied (and therefore accepting the packets) before the deny rule you are
trying to put in place takes effect.



The same problem exists on 5.8.6 with iptables blocking. One minor detail:
the queued files reached the limit of 96M and were reduced to 95M after the
firewall was unblocked, but they stay at 95M on the client without
flushing. I can use logger to send new log messages to the server, so the
network connection is not an issue.

7.3.14 seems to be working with iptables blocking.


hmm, I don't understand how it could be different for different versions
of rsyslog. the iptables filtering should be done by the OS, which
wouldn't care what version of software is running.



The iptables filtering issue was resolved by restarting rsyslog so that the
firewall changes took effect.


Ok, that is almost certainly the 'established connection' thing that I was
speculating about.


The rsyslog version has nothing to do with iptables filtering. What I was
referring to is that rsyslog 5.8.6 doesn't flush queued files while 7.3.14
does when iptables filtering is changed from blocking to unblocking.


 ahh, Ok.

At this point 5.8.6 is old enough that it's well past being supported, so
let's work on the current version.




As an alternative test, I stopped rsyslogd on the remote server and the
logs were queued on the client as expected. I started rsyslog on the remote
server once the disk queue on the client had filled up. I did see that the
queue files were flushed to the remote server once rsyslog was back in
service. So this seems to be related to the rsyslog configuration change.


My guess (without knowing the code well) is that the queued messages are
somehow queued for the specific destination (IIRC you had this queue setup
as an action queue, not as the main queue, you posted your config, but I
have already deleted those messages). I'd be curious to see if you have the
same problem spilling the main queue to disk.



Just for your reference, here's my rsyslog configuration:

# start forwarding rule 1 of 1
$ActionQueueType LinkedList
$ActionQueueFileName srvrfwd
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueMaxDiskSpace  100000000
$ActionQueueSize 200000   # Tried 100000 as well
$ActionQueueHighWaterMark 600
$ActionQueueLowWaterMark  200
$ActionQueueTimeoutEnqueue 1

#local5.* :omrelp:127.255.255.1:20514   # Invalid IP to trigger log buffering
local5.* :omrelp:172.17.5.28:20514         # Real IP to trigger log forwarding
# end forwarding rule 1 of 1



On the other hand, as I noted in the first report, when I changed the
rsyslog configuration before the disk space limit was reached, the queued
files were flushed to the remote server without issues.


 very interesting, and probably a bug.




Let me know if you need debugging logs to troubleshoot it.


Rainer will probably need to get involved with this, but he's super busy
the next 2-3 weeks with a very high priority deadline (the "every waking
hour" type of project)

It wouldn't hurt to take a look at the debug logs for the copy started
after the config change.


The rsyslog debugging log is attached here. This was collected by running
"rsyslogd -dn" when the remote server IP was set to the valid one. Please
let me know if you want me to submit a bug tracking item.


 you can e-mail it to me directly



by the way, are you sure you are doing a full restart after the config
change? a -HUP does not cause rsyslog to do a full restart and re-read its
config file, it just causes rsyslog to close and re-open its outputs (a
full restart takes a long time and can cause messages to be lost)



I did "service rsyslog restart" after the config change. "Kill timeout 5"
is set in /etc/init/rsyslog.conf. I'm not sure if this timeout setting
could make a difference.


this should do it. but just to be sure, do a stop (make sure it's finished
shutting down), then a start




We need the initial startup logs to be queued before the remote logging
server is set. Switching from an invalid IP to a valid IP in the rsyslog
configuration was chosen to meet this requirement.


Is there any chance of re-ordering the startup sequence to get the config
first, then start rsyslog, then start everything else? kernel messages will
get queued for quite a while, so they shouldn't be an issue. The only issue
would be any other applications that need to write logs very early on.


The problem is that we don't know the remote logging server at startup, so
we need the capability to buffer the logs until the remote server is set by
the user later. Understood that the logs could get lost after the disk
space limit is reached. Is there any way to achieve this without an rsyslog
configuration change?


one possibility would be to just write the logs to a file and then use
imfile to read this file later to send them upstream, but I'm not sure if
imfile has gained the capability to get all its data from the file yet.

Historically, imfile only read the message content from the file, it
generated the timestamp, hostname, priority, and severity information
itself. I know there was talk about having an option to have imfile parse
this from the file, but I don't know if it ever happened.

If nothing else, you could write messages to a file with the
RSYSLOG_ForwardFormat and then use netcat or similar to read the file and
spit it out over the network later, but that wouldn't be able to use RELP
to send it. I guess you could use netcat to send it to a UDP listener on
localhost and then have the logs sent out via RELP from there.
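
A minimal sketch of the "netcat or similar" replay idea, in Python rather than netcat. It assumes the file holds one message per line, already written with RSYSLOG_ForwardFormat, and that a UDP input is listening on localhost; the function name and the default port 514 are assumptions for illustration. UDP gives no delivery guarantee, so this is only a stopgap.

```python
import socket


def replay_file(path, host="127.0.0.1", port=514):
    """Send each non-empty line of `path` as one UDP datagram.

    Lines are sent unchanged, so the <PRI>, timestamp and hostname already
    in the file are what the listener receives. Returns the number of
    datagrams sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(path, "rb") as saved:
        for line in saved:
            msg = line.rstrip(b"\r\n")
            if not msg:
                continue  # skip blank lines
            sock.sendto(msg, (host, port))
            sent += 1
    return sent
```

Calling replay_file() on the saved file once the remote server is known would hand the saved messages to the local listener, which could then forward them over RELP as described above.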

There should be some way to feed the logs to /dev/log, but I'm not sure
exactly how to do that.

Thanks for all your suggestions. Data completeness and integrity are very
important in our use cases. I'm not sure how some of the logging
information, such as the original timestamp, would change when it's routed
around. If this is confirmed to be a bug and can be fixed in 1-2 months, I
would much rather wait for the fix.


Well, if you feed the data to syslog with the timestamp, it will preserve
the existing timestamp by default.

David Lang

 Thanks,

Fajun



 On Wed, May 8, 2013 at 11:56 AM, David Lang <[email protected]> wrote:


  On Wed, 8 May 2013, Fajun Chen wrote:


I upgraded ubuntu rsyslog to 7.3.14 and still got the same issue.


My test procedure:

- Clean log file.
- Set remote host IP to 127.255.255.1 (invalid IP) in rsyslog conf.
- service rsyslog restart followed by logger in a loop. The disk queue
files are buffered but are limited to 96M overall.
- Set remote host IP to valid IP.
- service rsyslog restart.

I expect the queued files to be flushed to the remote host but these files
are still in the queuing directory.


This may be a silly thought, but the fact that you are changing the
configuration between these two steps could be part of the problem.

I would suggest that instead of changing the config to enable/disable
sending the logs that you instead keep the rsyslog config the same and set
iptables rules to block and unblock the communications.

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.


