If the connection fails, rsyslog *immediately* tries to re-open, but that
seems to fail as well.

Btw: git v7-devel and master branch now contain the changes I made.  Could
you please try one of these versions?

Sent from phone, thus brief.
On 30.11.2013 00:39, "Erik Steffl" <[email protected]> wrote:

> On 11/29/2013 01:48 AM, Rainer Gerhards wrote:
>
>> On Thu, Nov 28, 2013 at 2:07 AM, Erik Steffl <[email protected]> wrote:
>>
>>  On 11/26/2013 07:06 AM, Pavel Levshin wrote:
>>>
>>>
>>>> 26.11.2013 1:20, Erik Steffl:
>>>>
>>>>
>>>>>    what does the above mean exactly? It does make sense, in each case
>>>>> the burst of messages gets in via /dev/log then is sent out via RELP
>>>>> (either to the same machine or a different one).
>>>>>
>>>>>    Any ideas why this doesn't fix itself until the next burst of
>>>>> messages? Any suggestions on what to do or what to investigate next? I
>>>>> guess I could run these with strace and see what exactly they say to
>>>>> each other (both the sender and receiver).
>>>>>
>>>>>
>>>> I'm not sure why (and if) this is 'fixed' by the next burst of
>>>> messages, but it may be because the next batch of messages pushes the
>>>> queue. Perhaps it is unable to retry after suspend without a push.
>>>>
>>>>
>>>    Not sure if 'fixed' is the right word, but it always gets unstuck right
>>> after the next (sometimes the one after next) burst of messages, never at
>>> a random time. This behaviour is the same whether the period is 5 min or
>>> 15 min.
>>>
>>>
>> While I have been silent, I followed the ML discussion (I did not check
>> the debug log further, though, as Pavel did excellent work here). To me, it
>> looks like "normal" suspension code is kicking in. If an action fails,
>> rsyslog retries once (unless otherwise configured) and if it fails again,
>> the action is initially suspended for 30 seconds. Then, retries happen and
>> the suspension period is prolonged if they fail.
>>
>> It looks very much like this is the mechanism at work. However, what I
>> don't understand is why the suspension period is so long.
>>
>> Out of all this, I think it would make much sense if rsyslog had the
>> capability to report when an action is suspended and when it is resumed. I
>> am right now adding this capability. I would suggest that when this change
>> is ready, you apply it and we can then see what it reports (much easier
>> than walking the debug log, and very obvious to users ;)).
>>
>
>   As I mentioned, I made two changes to the test scenario and tested
> each of them separately:
>
>   - send 200 message burst directly from collector-test to collector-prod
> (RELP), i.e. no load balancer
>
>   - keep using the load balancer but, in addition to the 200-message burst
> every 5 min, also send a few (3) messages every minute
>
>   Each of these scenarios works, as in traffic is smooth, NO silences. This
> means that something goes wrong if we use the load balancer and there is no
> traffic for 5 minutes.
>
>   From our previous investigation of the Amazon Elastic Load Balancer (which
> is what we're using), it often lies about the connection, i.e. it has no
> connection on the backend but it happily accepts connects and data and
> pretends everything is fine (that's why we initially switched from plain
> TCP to RELP).
>
>   Not entirely sure what the load balancer is doing in this case, but it
> seems that rsyslog thinks the connection is fine and keeps sending data to
> the load balancer while the connection is actually broken (maybe the load
> balancer closed the connection between itself and the destination because
> of some timeout).
>
>   So the situation is not fixed until rsyslog closes the connection and
> opens a new one.
>
>   Is there any way for us to either make rsyslog keep the connection alive
> or to re-open it sooner?
>
>         erik
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>
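
For readers following the thread, the suspension behaviour described in the
quoted message above (one immediate retry, then an initial 30-second
suspension that is prolonged while retries keep failing) can be sketched
roughly as below. This is only an illustrative model, not rsyslog's actual
code: the names ActionState and try_action are hypothetical, and the linear
growth of the suspension period is an assumption, since the thread only says
the period "is prolonged".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionState:
    """Hypothetical bookkeeping for one output action (not rsyslog's real struct)."""
    base_interval: float = 30.0    # initial suspension in seconds, as stated in the thread
    failed_resumes: int = 0        # consecutive failed attempts that led to a suspension
    suspended_until: float = 0.0   # time before which no attempt is made


def try_action(state: ActionState, send: Callable[[], bool], now: float) -> bool:
    """Try to deliver once; return True on success, False if suspended or failed."""
    if now < state.suspended_until:
        return False  # still suspended: nothing is sent

    # First attempt plus the single immediate retry mentioned in the thread.
    for _ in range(2):
        if send():
            state.failed_resumes = 0
            state.suspended_until = 0.0
            return True

    # Both attempts failed: suspend, and prolong the period on repeated failure.
    # ASSUMPTION: linear growth (30 s, 60 s, 90 s, ...); the real rule may differ.
    state.failed_resumes += 1
    state.suspended_until = now + state.base_interval * state.failed_resumes
    return False
```

Note that in this sketch nothing happens unless a caller invokes try_action,
for example because a new message arrived. That mirrors Pavel's guess that a
suspended action may not retry "without a push", though whether rsyslog
really behaves this way is exactly what the thread is trying to find out.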