If the connection fails, rsyslog *immediately* tries to re-open, but that seems to fail as well.
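
To make the retry-then-suspend behaviour described in the quoted text below easier to reason about, here is a rough Python sketch. It is a toy model, not rsyslog code: the `Action` class, the `submit` method and the `send` callable are hypothetical names. The 30-second initial suspension and the single immediate retry come from the thread. The linear growth of the suspension period is an assumption, as is the idea that a retry only happens when a new message pushes the action.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Toy model of an output action that suspends itself after failures."""

    base_interval: int = 30       # seconds; initial suspension named in the thread
    suspended_until: float = 0.0  # time before which no delivery is attempted
    failed_resumes: int = 0       # consecutive failed delivery rounds

    def submit(self, now, send):
        """Try to deliver at time `now`; `send` returns True on success."""
        if now < self.suspended_until:
            # Still suspended: the message is not sent at all.
            return "suspended"
        # One attempt plus one immediate retry.
        if send() or send():
            self.failed_resumes = 0
            return "ok"
        self.failed_resumes += 1
        # ASSUMPTION: the suspension grows linearly with each failed round.
        self.suspended_until = now + self.base_interval * self.failed_resumes
        return "suspended"


if __name__ == "__main__":
    action = Action()
    print(action.submit(0, lambda: False))   # both attempts fail
    print(action.submit(10, lambda: True))   # inside the suspension window
    print(action.submit(30, lambda: True))   # window over, delivery works
```

In this model nothing is retried between calls to `submit`, so a quiet period after a failure leaves the action stuck until the next message arrives. Whether rsyslog really behaves that way is exactly what the new suspend/resume reporting should show.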
Btw: git v7-devel and the master branch now contain the changes I made. Could you please try one of these versions?

Sent from phone, thus brief.

On 30.11.2013 00:39, "Erik Steffl" <[email protected]> wrote:

> On 11/29/2013 01:48 AM, Rainer Gerhards wrote:
>
>> On Thu, Nov 28, 2013 at 2:07 AM, Erik Steffl <[email protected]> wrote:
>>
>>> On 11/26/2013 07:06 AM, Pavel Levshin wrote:
>>>
>>>> 26.11.2013 1:20, Erik Steffl:
>>>>
>>>>> what does the above mean exactly? It does make sense, in each case
>>>>> the burst of messages gets in via /dev/log then is sent out via RELP
>>>>> (either to the same machine or a different one).
>>>>>
>>>>> Any ideas why this doesn't fix itself until the next burst of
>>>>> messages? Any suggestions on what to do or what to investigate next?
>>>>> I guess I could run these with strace and see what exactly they say
>>>>> to each other (both the sender and receiver).
>>>>
>>>> I'm not sure why (and if) this is 'fixed' by the next burst of
>>>> messages, but this can be somehow because the next portion of
>>>> messages pushes the queue. Perhaps it is unable to retry after
>>>> suspend without a push.
>>>
>>> not sure if 'fixed' is the right word but it is always unstuck right
>>> after the next (sometimes next to next) burst of messages, never at a
>>> random time. This behaviour is the same whether the period is 5 min
>>> or 15 min.
>>
>> While I have been silent, I followed the ML discussion (I did not check
>> the debug log further, though, as Pavel did excellent work here). To me,
>> it looks like the "normal" suspension code is kicking in. If an action
>> fails, rsyslog retries once (unless otherwise configured) and if it
>> fails again, the action is initially suspended for 30 seconds. Then,
>> retries happen and the suspension period is prolonged if they fail.
>>
>> It looks very much like this is the mechanism at work. However, what I
>> don't understand is why the suspension period is so long.
>>
>> Out of all this, I think it would make much sense if rsyslog had the
>> capability to report when an action is suspended and when it is
>> resumed. I am right now adding this capability. I would suggest that
>> when this change is ready, you apply it and we can then see what it
>> reports (much easier than walking the debug log, and very obvious to
>> users ;)).
>
> as I mentioned I tried two changes to the test scenario and tested
> each of these separately:
>
> - send the 200-message burst directly from collector-test to
>   collector-prod (RELP), i.e. no load balancer
>
> - keep using the load balancer but, in addition to the 200-message
>   burst every 5 min, also send a few (3) messages every minute
>
> Each of these scenarios works, as in traffic is smooth, NO silences.
> This means that something goes wrong if we use the load balancer and
> there is no traffic for 5 minutes.
>
> From our previous investigation of the Amazon Elastic Load Balancer
> (which is what we're using) it often lies about the connection, i.e. it
> has no connection on the backend but it happily accepts connects and
> data and pretends everything is fine (that's why we initially switched
> from plain TCP to RELP).
>
> Not entirely sure what the load balancer is doing in this case but it
> seems that rsyslog thinks the connection is fine and keeps sending data
> to the load balancer but the connection is actually broken (maybe the
> load balancer closed the connection between itself and the destination
> because of some timeout).
>
> So the situation is not fixed until rsyslog closes the connection and
> opens a new one.
>
> Is there any way for us to either make rsyslog keep the connection
> alive or to re-open it sooner?
>
> erik
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.

