On Thu, Dec 19, 2013 at 2:28 AM, Erik Steffl <[email protected]> wrote:

>   maybe this was too wordy, here's simpler version.
>
>
I had finally to work on some internal things before vacation, thus the
silence. Today is meetings as well...


>   after adding:
>
>   action.resumeRetryCount="-1"
>   action.resumeInterval="5"
>
>   to action definition I see that rsyslogd tries to resend the message few
> times, where sendto succeeds but not RELP response is received. After few
> times sendto take approximately 10 minutes and fails.
>
>
You mean sendto() itself takes around 10 minutes?

Rainer

>   Afterwards the connection is closed, re-opened and everything works fine.
>
>   Is there any setting that would make the 10 minute sendto timeout
> shorter? Assuming it's some TCP level timeout so maybe it's not settable
> using rsyslog config (didn't find anything myself).
>
>   Alternatively is there anything that would make rsyslogd close and
> reopen connection after not receiving RELP response for some time (or some
> number of retries)?
>
>   I see there are some queue timeouts like $ActionQueueTimeoutActionCompletion
> (looking at http://www.rsyslog.com/doc/rsyslog_conf_global.html) but not
> sure what those actually do and if they would apply to this case.
>
>   thanks!
>
>         erik
>
>
> On 12/16/2013 05:51 PM, Erik Steffl wrote:
>
>>    chaged the config as suggested however it does not seem it retries
>> that often, here's what tcpdump shows:
>>
>> ubuntu@ip-10-158-97-169:~$ sudo tcpdump -A port 5140
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
>> 01:05:43.652804 IP ip-10-158-97-169.ec2.internal.53118 >
>> ec2-50-19-250-187.compute-1.amazonaws.com.5140: Flags [P.], seq
>> 3725470457:3725470859, ack 2328419555, win 237, options [nop,nop,TS val
>> 840645376 ecr 1017783106], length 402
>> E.....@.@...
>> .a.2....~....*................
>> 2.;.<.#B26 syslog 387 <134>2013-12-17T00:58:13.436136+00:00
>> ip-10-158-97-169 logBurst[3103]:
>> @cee:{"message-json":{"count":"0/1","now":"Tue Dec 17 00:58:13
>> 2013"},"yummlyLogOrigin":{"supportLevel":"prod","system":
>> "LOGS","cluster":"prod","role":"collectorErik","host":"ip-
>> 10-158-97-169","tag":"logBurst[3103]:","programname"
>> :"logBurst","priority":"local0.info","timestamp":"2013-12-
>> 17T00:58:13.436136+00:00"}}
>>
>>
>>
>> 01:07:43.972781 IP ip-10-158-97-169.ec2.internal.53118 >
>> ec2-50-19-250-187.compute-1.amazonaws.com.5140: Flags [P.], seq 0:402,
>> ack 1, win 237, options [nop,nop,TS val 840675456 ecr 1017783106],
>> length 402
>> E.....@[email protected]
>> .a.2....~....*................
>> 2...<.#B26 syslog 387 <134>2013-12-17T00:58:13.436136+00:00
>> ip-10-158-97-169 logBurst[3103]:
>> @cee:{"message-json":{"count":"0/1","now":"Tue Dec 17 00:58:13
>> 2013"},"yummlyLogOrigin":{"supportLevel":"prod","system":
>> "LOGS","cluster":"prod","role":"collectorErik","host":"ip-
>> 10-158-97-169","tag":"logBurst[3103]:","programname"
>> :"logBurst","priority":"local0.info","timestamp":"2013-12-
>> 17T00:58:13.436136+00:00"}}
>>
>>
>>    syslog 387 keeps repeating every two minutes.
>>
>>    Config:
>>
>> if
>>    prifilt("local0.*") or
>>    ...
>>    prifilt("local7.*")
>> then {
>>    action(type="mmjsonparse")
>>    if $parsesuccess == "OK" then {
>>      action(
>>        type="omrelp"
>>        target="elb.collector.prod.logs.ylmmuy.com"
>>        port="5140"
>>        template="json"
>>        queue.type="LinkedList"
>>        queue.filename="json"
>>        queue.maxdiskspace="75161927680" # 70GB (valuable data)
>>        action.resumeRetryCount="-1"
>>        action.resumeInterval="5"
>>      )
>>    } else {
>> ...
>>    }
>>    stop
>> }
>>
>>    Same test as before (host to load balancer to another host using
>> RELP), no MARK, no other messages, just wait for connection to go stale
>> then start sending messages every 5 seconds. It takes about 15 minutes
>> for it to recover.
>>
>>    First message (strace output):
>>
>> 3081  00:58:13.528459 sendto(13, "26 syslog 387
>> <134>2013-12-17T00:58:13.436136+00:00 ip-10-158-97-169 logBurst[3103]:
>> @cee:{\"message-json\":{\"count\":\"0/1\",\"now\":\"Tue Dec 17 00:58:13
>> 2013\"},\"yummlyLogOrigin\":{\"supportLevel\":\"prod\",\"
>> system\":\"LOGS\",\"cluster\":\"prod\",\"role\":\"
>> collectorErik\",\"host\":\"ip-10-158-97-169\",\"tag\":\"
>> logBurst[3103]:\",\"programname\":\"logBurst\",\"priority\":\"local0.info
>> \",\"timestamp\":\"2013-12-17T00:58:13.436136+00:00\"}}\n\n",
>> 402, 0, NULL, 0) = 402
>> ...
>> 3081  00:58:13.529411 setsockopt(13, SOL_TCP, TCP_CORK, [0], 4) = 0
>> ...
>> 3081  00:58:18.725420 setsockopt(13, SOL_TCP, TCP_CORK, [1], 4) = 0
>> ...
>> 3081  00:58:18.726657 sendto(13, "27 syslog 387 ... same as above ...
>>
>>    Just like tcpdump shows the message is being resent (strace output
>> just like the one above) until:
>>
>> 3081  01:02:27.982896 sendto(13, "77 syslog 387
>> <134>2013-12-17T01:02:27.893264+00:00 ip-10-158-97-169 logBurst[3257]:
>> @cee:{\"message-json\":{\"count\":\"0/1\",\"now\":\"Tue Dec 17 01:02:27
>> 2013\"},\"yummlyLogOrigin\":{\"supportLevel\":\"prod\",\"
>> system\":\"LOGS\",\"cluster\":\"prod\",\"role\":\"
>> collectorErik\",\"host\":\"ip-10-158-97-169\",\"tag\":\"
>> logBurst[3257]:\",\"programname\":\"logBurst\",\"priority\":\"local0.info
>> \",\"timestamp\":\"2013-12-17T01:02:27.893264+00:00\"}}\n\n",
>> 402, 0, NULL, 0 <unfinished ...>
>> ... other threads ...
>> 3081  01:13:44.932842 <... sendto resumed> ) = 45
>> ... writing debug info ...
>> 3081  01:13:44.934579 setsockopt(13, SOL_SOCKET, SO_LINGER, {onoff=1,
>> linger=0}, 8) = 0
>> 3081  01:13:44.934662 close(13)         = 0
>>
>>    After this it recovers. The total time is 15 minutes or so. Is there
>> any way to shorten this time?
>>
>>      erik
>>
>> On 12/16/2013 12:42 AM, Rainer Gerhards wrote:
>>
>>> On Mon, Dec 16, 2013 at 9:35 AM, Erik Steffl <[email protected]> wrote:
>>>
>>>
>>>> if
>>>>    prifilt("local0.*") or
>>>>    ...
>>>>    (prifilt("kern.info") and ($msg == '-- MARK --'))
>>>>
>>>> then {
>>>>    action(type="mmjsonparse")
>>>>    if $parsesuccess == "OK" then {
>>>>      action(
>>>>        type="omrelp"
>>>>        target="elb.collector.prod.logs.ylmmuy.com"
>>>>        port="5140"
>>>>        template="json"
>>>>      )
>>>>    } else {
>>>>      action(
>>>>        type="omrelp"
>>>>        target="elb.collector.prod.logs.ylmmuy.com"
>>>>        port="5140"
>>>>        template="text"
>>>>      )
>>>>    }
>>>>    stop
>>>> }
>>>>
>>>>
>>>>  that's what I suspected. You use the defaults, which means "disable me
>>> for
>>> 30 seconds if the connections break continuesly". Try
>>>
>>> use
>>>
>>>      action(
>>>        type="omrelp"
>>>        target="elb.collector.prod.logs.ylmmuy.com"
>>>        port="5140"
>>>        template="text"
>>>        *action.resumeRetryCount="-1"*
>>>        *action.resumeInterval="5"*
>>>      )
>>>
>>>   to get you started. It will try infinitely to send messages, but will
>>> pause 5 seconds between retries. Note that you may run into trouble if
>>> the
>>> destination is offline for an extended period of time.
>>>
>>>    http://www.rsyslog.com/doc/omrelp.html don't see the retry
>>> settings, are
>>>
>>>> these some generic action retries?
>>>>
>>>
>>>
>>> action parameters applying to all actions:
>>>
>>> http://www.rsyslog.com/doc/rsyslog_conf_actions.html
>>>
>>> (you know the doc discussion, so no need to explain it may be unintuitive
>>> to find ;-))
>>>
>>> Rainer
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>> if you DON'T LIKE THAT.
>>>
>>>
>>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to