25.11.2013 12:35, David Lang пишет:
On Mon, 25 Nov 2013, Pavel Levshin wrote:

25.11.2013 12:18, David Lang:\

Another possibly interesting message is:

7975.038523942:7fe2064cf700: main Q: doEnqSingleObject: LightDelay mark reached for light delayable message - blocking a bit.

which was received approximately once per second during following interval (this is also when the traffic went down to zero):

1385087975 Fri Nov 22 02:39:35 UTC 2013
1385088546 Fri Nov 22 02:49:06 UTC 2013

Does this shed any light on what's going on?

I bet this is the problem. This blocks main queue for a second.

There are possible workarounds, including increasing main queue size (this watermark is set to 70% by default).

searching through the documentation, this seems to be a tcp input option

old style

$InputTCPFlowControl defaults on set to off to disable

new style
FlowControl='off' on the module load

it still seems wrong that this triggers problems for such a long time. unless some output is blocking and causing the queue to not drain, it seems like it shoudl very quicly open up again.


I'm wrong in that it is blocking the whole queue; in reality, it is blocking only one input. In this case, it is RELP. It is using flow control, there is no option to disable it.

The question is: why queue length stays above LightDelayMark for a while?

what does pstats show during this timeframe? set the reporting frequency so that you will get several samples during the problem timeframe.


Has been posted today here, already. Some excerpts:

Fri Nov 22 02:15:45 2013: main Q: size=9533 enqueued=56272 full=0 discarded.full=0 discarded.nf=0 maxqsize=9533 Fri Nov 22 02:16:46 2013: main Q: size=22597 enqueued=69336 full=0 discarded.full=0 discarded.nf=0 maxqsize=22597 Fri Nov 22 02:17:46 2013: main Q: size=26 enqueued=82099 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:18:46 2013: main Q: size=27 enqueued=94622 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:19:46 2013: main Q: size=28 enqueued=107148 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:20:46 2013: main Q: size=9202 enqueued=119439 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:21:46 2013: main Q: size=25 enqueued=131510 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:22:46 2013: main Q: size=29 enqueued=143703 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:23:46 2013: main Q: size=25 enqueued=156219 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:24:46 2013: main Q: size=27 enqueued=169026 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:25:46 2013: main Q: size=25 enqueued=181734 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:26:46 2013: main Q: size=29 enqueued=193979 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:27:46 2013: main Q: size=27 enqueued=206103 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:28:46 2013: main Q: size=28 enqueued=218616 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:29:46 2013: main Q: size=29 enqueued=231468 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:30:46 2013: main Q: size=9396 enqueued=244058 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:31:46 2013: main Q: size=21097 enqueued=255759 full=0 discarded.full=0 discarded.nf=0 maxqsize=25655 Fri Nov 22 02:32:47 2013: main Q: size=32811 enqueued=267473 full=0 discarded.full=0 discarded.nf=0 maxqsize=32811 Fri Nov 22 02:33:47 2013: main Q: size=18451 enqueued=279993 full=0 discarded.full=0 discarded.nf=0 maxqsize=40206 Fri Nov 22 02:34:47 2013: main Q: size=6527 enqueued=292854 full=0 discarded.full=0 discarded.nf=0 maxqsize=40206 Fri Nov 22 02:35:47 2013: main Q: size=19557 enqueued=305884 full=0 discarded.full=0 discarded.nf=0 maxqsize=40206 Fri Nov 22 02:36:47 2013: main Q: size=32967 enqueued=319294 full=0 discarded.full=0 discarded.nf=0 maxqsize=40206 Fri Nov 22 02:37:47 2013: main Q: size=46424 enqueued=332751 full=0 discarded.full=0 discarded.nf=0 maxqsize=46424 Fri Nov 22 02:38:47 2013: main Q: size=59397 enqueued=345724 full=0 discarded.full=0 discarded.nf=0 maxqsize=59397 Fri Nov 22 02:39:47 2013: main Q: size=70037 enqueued=356365 full=0 discarded.full=0 discarded.nf=0 maxqsize=70037 Fri Nov 22 02:40:47 2013: main Q: size=70329 enqueued=356657 full=0 discarded.full=0 discarded.nf=0 maxqsize=70329 Fri Nov 22 02:41:47 2013: main Q: size=70416 enqueued=356744 full=0 discarded.full=0 discarded.nf=0 maxqsize=70416 Fri Nov 22 02:42:47 2013: main Q: size=70503 enqueued=356831 full=0 discarded.full=0 discarded.nf=0 maxqsize=70503 Fri Nov 22 02:43:47 2013: main Q: size=70590 enqueued=356918 full=0 discarded.full=0 discarded.nf=0 maxqsize=70590 Fri Nov 22 02:44:47 2013: main Q: size=70677 enqueued=357005 full=0 discarded.full=0 discarded.nf=0 maxqsize=70677 Fri Nov 22 02:45:47 2013: main Q: size=70969 enqueued=357297 full=0 discarded.full=0 discarded.nf=0 maxqsize=70969 Fri Nov 22 02:46:47 2013: main Q: size=69207 enqueued=367465 full=0 discarded.full=0 discarded.nf=0 maxqsize=71020 Fri Nov 22 02:47:47 2013: main Q: size=65006 enqueued=406528 full=0 discarded.full=0 discarded.nf=0 maxqsize=71020 Fri Nov 22 02:48:48 2013: main Q: size=70055 enqueued=426682 full=0 discarded.full=0 discarded.nf=0 maxqsize=71020 Fri Nov 22 02:49:48 2013: main Q: size=66943 enqueued=454801 full=0 discarded.full=0 discarded.nf=0 maxqsize=71020 Fri Nov 22 02:51:36 2013: main Q: size=10 enqueued=8284 full=0 discarded.full=0 discarded.nf=0 maxqsize=14 Fri Nov 22 02:52:36 2013: main Q: size=9 enqueued=20954 full=0 discarded.full=0 discarded.nf=0 maxqsize=15 Fri Nov 22 02:53:36 2013: main Q: size=13 enqueued=33913 full=0 discarded.full=0 discarded.nf=0 maxqsize=20 Fri Nov 22 02:54:36 2013: main Q: size=22 enqueued=46568 full=0 discarded.full=0 discarded.nf=0 maxqsize=22 Fri Nov 22 02:55:36 2013: main Q: size=6 enqueued=60148 full=0 discarded.full=0 discarded.nf=0 maxqsize=70 Fri Nov 22 02:56:36 2013: main Q: size=15 enqueued=73316 full=0 discarded.full=0 discarded.nf=0 maxqsize=70 Fri Nov 22 02:57:36 2013: main Q: size=12 enqueued=85892 full=0 discarded.full=0 discarded.nf=0 maxqsize=70 Fri Nov 22 02:58:36 2013: main Q: size=24 enqueued=98845 full=0 discarded.full=0 discarded.nf=0 maxqsize=70 Fri Nov 22 02:59:37 2013: main Q: size=14 enqueued=111903 full=0 discarded.full=0 discarded.nf=0 maxqsize=70 Fri Nov 22 03:00:37 2013: main Q: size=8096 enqueued=125043 full=0 discarded.full=0 discarded.nf=0 maxqsize=8096

The queue is almost "locked" from 02:39:47 to 02:45:47, and rsyslog is restarted short after this time.

The queue does not shrink, so the problem can be in this output, which is freezing many times:

Fri Nov 22 02:15:45 2013: action 7: processed=46613 failed=0
Fri Nov 22 02:16:46 2013: action 7: processed=46613 failed=0
Fri Nov 22 02:17:46 2013: action 7: processed=81687 failed=0
Fri Nov 22 02:18:46 2013: action 7: processed=94184 failed=0
Fri Nov 22 02:19:46 2013: action 7: processed=106683 failed=0
Fri Nov 22 02:20:46 2013: action 7: processed=109768 failed=0
Fri Nov 22 02:21:46 2013: action 7: processed=130786 failed=0
Fri Nov 22 02:22:46 2013: action 7: processed=142948 failed=0
Fri Nov 22 02:23:46 2013: action 7: processed=155441 failed=0
Fri Nov 22 02:24:46 2013: action 7: processed=168219 failed=0
Fri Nov 22 02:25:46 2013: action 7: processed=180697 failed=0
Fri Nov 22 02:26:46 2013: action 7: processed=192912 failed=0
Fri Nov 22 02:27:46 2013: action 7: processed=205010 failed=0
Fri Nov 22 02:28:46 2013: action 7: processed=217496 failed=0
Fri Nov 22 02:29:46 2013: action 7: processed=230320 failed=0
Fri Nov 22 02:30:46 2013: action 7: processed=233457 failed=0
Fri Nov 22 02:31:46 2013: action 7: processed=233457 failed=0
Fri Nov 22 02:32:47 2013: action 7: processed=233457 failed=0
Fri Nov 22 02:33:47 2013: action 7: processed=260539 failed=0
Fri Nov 22 02:34:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:35:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:36:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:37:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:38:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:39:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:40:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:41:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:42:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:43:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:44:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:45:47 2013: action 7: processed=285012 failed=0
Fri Nov 22 02:46:47 2013: action 7: processed=296930 failed=0
Fri Nov 22 02:47:47 2013: action 7: processed=340113 failed=0
Fri Nov 22 02:48:48 2013: action 7: processed=354705 failed=0
Fri Nov 22 02:49:48 2013: action 7: processed=385790 failed=0
Fri Nov 22 02:51:36 2013: action 7: processed=8246 failed=0
Fri Nov 22 02:52:36 2013: action 7: processed=20889 failed=0
Fri Nov 22 02:53:36 2013: action 7: processed=33821 failed=0
Fri Nov 22 02:54:36 2013: action 7: processed=46449 failed=0
Fri Nov 22 02:55:36 2013: action 7: processed=59797 failed=0
Fri Nov 22 02:56:36 2013: action 7: processed=72938 failed=0
Fri Nov 22 02:57:36 2013: action 7: processed=85487 failed=0
Fri Nov 22 02:58:36 2013: action 7: processed=98413 failed=0
Fri Nov 22 02:59:37 2013: action 7: processed=111444 failed=0
Fri Nov 22 03:00:37 2013: action 7: processed=116395 failed=0

It was in initial posting:

- 6 hosts that were sending syslog messages to the log collector STOP sending anything (as verified by stracing rsyslogd, tcpdump and in amazon AWS console metric for network in)

- after this nothing is ever written into /path/2013-10-10/03/00/log.json

- the 6 clients start sending sysog messages again when the next file is created (in this example it would be /path/2013-10-10/03/01/log.json)

So, most probably, rsyslog is unable to write the file.


--
Pavel Levshin

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to