I am working on the same case here:
https://github.com/rsyslog/rsyslog/issues/868

The bottom line is that most probably the .qi file is corrupt. This
can happen when the recorded queue size is larger than the actual
number of messages inside the disk queue.
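This condition can be checked roughly with a script. The Python sketch below handles only the extreme case reported in this thread: the .qi file records a non-zero iQueueSize, but no data files remain. It assumes that the .qi file is named `<queue.filename>.qi` and that the data files are named `<queue.filename>.<digits>` (e.g. `omelasticsearch-queue.00000001`) in the spool directory. Those naming conventions and the helper names are assumptions for illustration, not documented rsyslog behavior. Counting the messages actually stored in the data files would require parsing their serialization format, which this sketch does not attempt.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Matches the queue-size property line seen in the sample .qi file in this
# thread, e.g. "+iQueueSize:2:2:13:" (format inferred from that sample only).
QUEUE_SIZE_RE = re.compile(r"^\+iQueueSize:2:\d+:(\d+):", re.MULTILINE)


def recorded_queue_size(qi_text: str) -> Optional[int]:
    """Return the iQueueSize recorded in a .qi file, or None if absent."""
    m = QUEUE_SIZE_RE.search(qi_text)
    return int(m.group(1)) if m else None


def data_files(spool_dir: Path, prefix: str) -> list:
    """List files ASSUMED to be queue data files: <prefix>.<digits>."""
    pattern = re.compile(re.escape(prefix) + r"\.\d+$")
    return sorted(p for p in spool_dir.iterdir() if pattern.match(p.name))


def check_queue(spool_dir: str, prefix: str) -> str:
    """Hypothetical helper: flag a .qi that records messages but has no data files."""
    spool = Path(spool_dir)
    qi = spool / (prefix + ".qi")
    if not qi.exists():
        return "no .qi file"
    size = recorded_queue_size(qi.read_text(errors="replace"))
    files = data_files(spool, prefix)
    if size and not files:
        return f"suspect: .qi records {size} messages but no data files exist"
    return f"recorded size {size}, {len(files)} data file(s)"


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_queue.py /var/spool/rsyslog omelasticsearch-queue
    print(check_queue(sys.argv[1], sys.argv[2]))
```

A "suspect" result matches the situation described here: a leftover .qi file whose bookkeeping no longer agrees with what is on disk.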

I would appreciate it if comments went to the issue tracker.

Rainer

2016-03-28 23:58 GMT+02:00 Joe Blow <[email protected]>:
> Well there you go. Makes me wish the file was in json...
>
> Cheers,
>
> JB
>
>   Original Message
> From:[email protected]
> Sent:March 28, 2016 5:48 PM
> To:[email protected]
> Reply-to:[email protected]
> Subject:Re: [rsyslog] Fatal error on disk queue
>
> .qi files are text files. Ciprian posted his here:
> http://lists.adiscon.net/pipermail/rsyslog/2015-August/041020.html
>
> Here is mine:
>
> <OPB:1:qqueue:1:
> +iQueueSize:2:2:13:
> +tVars.disk.sizeOnDisk:2:5:15841:
>>End
> .
> <Obj:1:strm:1:
> +iCurrFNum:2:1:1:
> +pszFName:1:21:omelasticsearch-queue:
> +iMaxFiles:2:8:10000000:
> +bDeleteOnClose:2:1:0:
> +sType:2:1:1:
> +tOperationsMode:2:1:2:
> +tOpenMode:2:3:384:
> +iCurrOffs:2:5:15841:
> +inode:2:1:0:
> +bPrevWasNL:2:1:0:
>>End
> .
> <Obj:1:strm:1:
> +iCurrFNum:2:1:1:
> +pszFName:1:21:omelasticsearch-queue:
> +iMaxFiles:2:8:10000000:
> +bDeleteOnClose:2:1:1:
> +sType:2:1:1:
> +tOperationsMode:2:1:1:
> +tOpenMode:2:3:384:
> +iCurrOffs:2:4:1137:
> +inode:2:6:136136:
> +bPrevWasNL:2:1:0:
>>End
> .
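The sample above is regular enough to parse mechanically. A minimal Python sketch follows. The format it assumes is inferred only from this sample:

- a `<KIND:VER:CLASS:VER:` header line
- `+NAME:TYPE:LEN:VALUE:` property lines, where LEN is the length of the value
- a `>End` line closing each object

Treating TYPE 2 as a number and TYPE 1 as a string is a guess from the sample. The function name `parse_qi` and the script name `dump_qi.py` are hypothetical.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class QiObject:
    kind: str                                   # e.g. "OPB" or "Obj" from the header
    cls: str                                    # e.g. "qqueue" or "strm"
    props: dict = field(default_factory=dict)   # property name -> value


def parse_qi(text: str) -> list:
    """Parse .qi text into QiObjects (format inferred from one sample)."""
    objects = []
    current = None
    for line in text.splitlines():
        if line.startswith("<"):
            # "<OPB:1:qqueue:1:" -> kind "OPB", class "qqueue"
            parts = line[1:].split(":")
            current = QiObject(kind=parts[0], cls=parts[2])
        elif line.startswith("+") and current is not None:
            # maxsplit=3 keeps any ':' inside the value intact
            name, vtype, length, rest = line[1:].split(":", 3)
            value = rest[: int(length)]  # LEN is the value length in characters
            current.props[name] = int(value) if vtype == "2" else value
        elif line.startswith(">End") and current is not None:
            objects.append(current)
            current = None
    return objects


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python dump_qi.py /var/spool/rsyslog/omelasticsearch-queue.qi
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for obj in parse_qi(fh.read()):
            print(f"{obj.kind} {obj.cls}")
            for name, value in obj.props.items():
                print(f"    {name} = {value}")
```

On the sample above, this would report iQueueSize = 13 for the qqueue object, followed by the two strm objects and their file numbers and offsets.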
>
>
> Thanks,
>
> Alec
>
> On Mon, Mar 28, 2016 at 3:45 PM, Joe Blow <[email protected]> wrote:
>
>> From what I remember the qi file is some type of binary file. I'll have a
>> gander tomorrow and reply back.
>>
>> Since I had a pain finding the script I'll just put a github link up once
>> I get some free time and link it back here in case anyone wants it.
>>
>> Cheers,
>>
>> JB
>>
>>   Original Message
>> From:[email protected]
>> Sent:March 28, 2016 5:40 PM
>> To:[email protected]
>> Reply-to:[email protected]
>> Subject:Re: [rsyslog] Fatal error on disk queue
>>
>> On Mon, 28 Mar 2016, Joe Blow wrote:
>>
>> > Do you happen to have a large amount of data flowing through the queue?
>> > Have you thought about using the checkpoint interval
>> > (queue.checkpointInterval) to ensure the data you want queued is written
>> > to disk in a timely manner? As for processing the corrupted queue files,
>> > there should be a recover_qi.pl script kicking around to fix the queue
>> > files and send the data upstream via your existing queue.
>>
>> My understanding is that in this case all the data files are gone, and the
>> only thing left is the .qi file.
>>
>> That would indicate that all the logs were delivered, but the question is
>> why the .qi file is still around.
>>
>> I haven't looked inside a .qi file. Is it text, or is there a script to
>> parse it and dump the info from it?
>>
>> > I've been toying with the idea of creating/using a separate instance of
>> > rsyslog to deal with queue backlogs, so the queue can be emptied and get
>> > back to a healthy state without losing the backlog or dealing with the
>> > slowness that comes with emptying massive disk-backed queues in the event
>> > of upstream failure, especially with larger feeds.
>>
>> I just had fun where a log destination was offline for 2 months.
>> Periodically I needed to stop rsyslog, copy all the queue files to a new
>> directory, restart rsyslog, and then compress the files so that I didn't
>> run out of space on the relay box.
>>
>> When replaying them, I copied /etc/rsyslog.conf to /etc/rsyslog.conf.relay
>> and made a few tweaks (not listening to the network or /dev/log, writing
>> stats to different files, etc.), and used a separate copy of rsyslog with
>> this config file to replay the logs. With all the discussions about log
>> replay speed from queues, I was worried, but in my case it turned out not
>> to be the bottleneck by a large margin.
>>
>> David Lang
>>
>> > Cheers,
>> >
>> > JB
>> >
>> >   Original Message
>> > From:[email protected]
>> > Sent:March 28, 2016 4:15 PM
>> > To:[email protected]
>> > Reply-to:[email protected]
>> > Subject:Re: [rsyslog] Fatal error on disk queue
>> >
>> > Hi Alec,
>> >
>> > No, I don't have any answers here.
>> > Somehow the queue files get corrupted. On restart, only the data files are
>> > removed, but not the .qi file, so the disk queue cannot be properly
>> > initialised.
>> >
>> > Ciprian
>> >
>> > --
>> > Performance Monitoring * Log Analytics * Search Analytics
>> > Solr & Elasticsearch Support * http://sematext.com/
>> >
>> > On Mon, Mar 28, 2016 at 10:51 PM, Alec Swan <[email protected]> wrote:
>> >
>> >> Ciprian, your issue looks very similar. Have you found a work-around?
>> >>
>> >> David, permissions seem to be fine and SELinux is disabled (see below).
>> >> Any other thoughts?
>> >>
>> >> [root@m0058601 rsyslog.d]# ls -la /var/lib/ | grep rsyslog
>> >> drwx------  2 root     root     4096 Nov  3 13:00 rsyslog
>> >>
>> >> [root@m0058601 rsyslog.d]# ls -la /var/spool/ | grep rsyslog
>> >> drwxr-xr-x  2 root   root    4096 Mar 27 22:34 rsyslog
>> >>
>> >> [root@m0058601 rsyslog.d]# sestatus
>> >> SELinux status:                 disabled
>> >>
>> >> Thanks,
>> >>
>> >> Alec
>> >>
>> >> On Mon, Mar 28, 2016 at 12:59 PM, Ciprian Hacman <[email protected]> wrote:
>> >>
>> >> > This seems very similar to the following discussion. Unfortunately, I
>> >> > never got the chance to understand what happened.
>> >> > http://lists.adiscon.net/pipermail/rsyslog/2015-August/041020.html
>> >> >
>> >> > Ciprian
>> >> >
>> >> >
>> >> > On Mon, Mar 28, 2016 at 7:25 PM, David Lang <[email protected]> wrote:
>> >> >
>> >> > > On Sat, 26 Mar 2016, Alec Swan wrote:
>> >> > >
>> >> > >> Hi there,
>> >> > >>
>> >> > >> I am using the omelasticsearch module to send logs to an Elasticsearch
>> >> > >> server and started noticing the "fatal error on disk queue" error
>> >> > >> shown below. I also noticed a 560-byte .qi file created for the queue
>> >> > >> configured for the omelasticsearch action, as shown below. Once I
>> >> > >> removed the .qi file, the error went away.
>> >> > >>
>> >> > >> Is there anything wrong with the configuration? If not, how do I go
>> >> > >> about troubleshooting this issue?
>> >> > >>
>> >> > >> CONFIGURATION
>> >> > >>
>> >> > >>    action(
>> >> > >>         type = "omelasticsearch"
>> >> > >>         template = "es-payload"
>> >> > >>         dynSearchIndex = "on"
>> >> > >>         searchIndex = "logstash-index"
>> >> > >>         searchType = "syslog"
>> >> > >>         server = "127.0.0.1"
>> >> > >>         serverport = "9200"
>> >> > >>         uid = "xxx"
>> >> > >>         pwd = "yyy"
>> >> > >>         errorFile = "/var/log/rsyslog/ES-error.log"
>> >> > >>         bulkmode = "on"
>> >> > >>         action.resumeretrycount="-1"  # retry if ES is unreachable (-1 for infinite retries)
>> >> > >>         action.resumeInterval="60"
>> >> > >>         queue.dequeuebatchsize="1000"   # ES bulk size
>> >> > >>         queue.type="linkedlist"
>> >> > >>         queue.size="100000"
>> >> > >>         queue.workerthreads="5"
>> >> > >>         queue.timeoutworkerthreadshutdown="2000"
>> >> > >>         queue.spoolDirectory="/var/spool/rsyslog"
>> >> > >>         queue.filename="omelasticsearch-queue"
>> >> > >>         queue.maxfilesize="100m"
>> >> > >>         queue.maxdiskspace="1g"
>> >> > >>         queue.highwatermark="80000" # when to start spilling to disk
>> >> > >>         queue.lowwatermark="20000"  # when to stop spilling to disk
>> >> > >>         queue.saveonshutdown="on"
>> >> > >>    )
>> >> > >>
>> >> > >>
>> >> > >> ERROR
>> >> > >>
>> >> > >> Mar 27 04:02:04 m0058180 rsyslogd-2040: fatal error on disk queue 'action 4 queue[DA]', emergency switch to direct mode [v8.14.0 try http://www.rsyslog.com/e/2040 ]
>> >> > >> Mar 27 04:02:04 m0058180 rsyslogd: [origin software="rsyslogd" swVersion="8.14.0" x-pid="2648" x-info="http://www.rsyslog.com"] start
>> >> > >> Mar 27 04:02:04 m0058180 rsyslogd-2040: fatal error on disk queue 'action 2 queue[DA]', emergency switch to direct mode [v8.14.0 try http://www.rsyslog.com/e/2040 ]
>> >> > >>
>> >> > >
>> >> > > Usually this means that there is a permission problem (including
>> >> > > SELinux permissions) when trying to access the files in the work
>> >> > > directory.
>> >> > >
>> >> > > David Lang
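You can test the permission hypothesis file by file, going beyond a directory-level `ls -la`, with a short Python script that checks the spool directory and each file of one queue using `os.access`. Treat this as an illustration with several limits:

- It only evaluates classic POSIX permissions for the user running it. Run it as the account rsyslogd actually uses; as root, nearly every check passes.
- It does not look at SELinux labels.
- The helper name `check_spool_access` and the `<prefix>.*` file pattern are hypothetical.

```python
import os
import sys
from pathlib import Path


def check_spool_access(spool_dir: str, prefix: str) -> list:
    """Return access problems for the files of one queue (hypothetical helper).

    Uses os.access, i.e. POSIX permission bits for the *current* user only.
    SELinux policy is NOT evaluated, and as root almost every check succeeds.
    """
    problems = []
    spool = Path(spool_dir)
    # The directory must be readable, writable and searchable, since queue
    # files are created and deleted inside it.
    if not os.access(spool, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{spool}: directory lacks read/write/search access")
        return problems
    # Assumed naming: every file of this queue starts with "<prefix>."
    for path in sorted(spool.glob(prefix + ".*")):
        if not os.access(path, os.R_OK | os.W_OK):
            problems.append(f"{path}: not both readable and writable")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_spool.py /var/spool/rsyslog omelasticsearch-queue
    for problem in check_spool_access(sys.argv[1], sys.argv[2]) or ["no problems found"]:
        print(problem)
```

An empty result only rules out plain file-mode problems. With SELinux enabled, the file contexts still need checking separately.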
>> >> > >
>> >> > > _______________________________________________
>> >> > > rsyslog mailing list
>> >> > > http://lists.adiscon.net/mailman/listinfo/rsyslog
>> >> > > http://www.rsyslog.com/professional-services/
>> >> > > What's up with rsyslog? Follow https://twitter.com/rgerhards
>> >> > > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>> >> myriad
>> >> > > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>> you
>> >> > > DON'T LIKE THAT.
>> >> > >
>> >>
>>