On Thu, Dec 19, 2013 at 2:28 AM, Erik Steffl <[email protected]> wrote:
> maybe this was too wordy, here's a simpler version.
>

I finally had to work on some internal things before vacation, thus the
silence. Today is meetings as well...

> after adding:
>
>   action.resumeRetryCount="-1"
>   action.resumeInterval="5"
>
> to action definition I see that rsyslogd tries to resend the message a few
> times, where sendto succeeds but no RELP response is received. After a few
> times sendto takes approximately 10 minutes and fails.
>

You mean sendto() itself takes around 10 minutes?

Rainer

> Afterwards the connection is closed, re-opened and everything works fine.
>
> Is there any setting that would make the 10 minute sendto timeout
> shorter? Assuming it's some TCP level timeout so maybe it's not settable
> using rsyslog config (didn't find anything myself).
>
> Alternatively is there anything that would make rsyslogd close and
> reopen connection after not receiving RELP response for some time (or some
> number of retries)?
>
> I see there are some queue timeouts like $ActionQueueTimeoutActionCompletion
> (looking at http://www.rsyslog.com/doc/rsyslog_conf_global.html) but not
> sure what those actually do and if they would apply to this case.
>
> thanks!
>
> erik
>
>
> On 12/16/2013 05:51 PM, Erik Steffl wrote:
>
>> changed the config as suggested however it does not seem to retry
>> that often, here's what tcpdump shows:
>>
>> ubuntu@ip-10-158-97-169:~$ sudo tcpdump -A port 5140
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
>> 01:05:43.652804 IP ip-10-158-97-169.ec2.internal.53118 >
>> ec2-50-19-250-187.compute-1.amazonaws.com.5140: Flags [P.], seq
>> 3725470457:3725470859, ack 2328419555, win 237, options [nop,nop,TS val
>> 840645376 ecr 1017783106], length 402
>> E.....@.@...
>> .a.2....~....*................
>> 2.;.<.#B26 syslog 387 <134>2013-12-17T00:58:13.436136+00:00
>> ip-10-158-97-169 logBurst[3103]:
>> @cee:{"message-json":{"count":"0/1","now":"Tue Dec 17 00:58:13
>> 2013"},"yummlyLogOrigin":{"supportLevel":"prod","system":
>> "LOGS","cluster":"prod","role":"collectorErik","host":"ip-
>> 10-158-97-169","tag":"logBurst[3103]:","programname"
>> :"logBurst","priority":"local0.info","timestamp":"2013-12-
>> 17T00:58:13.436136+00:00"}}
>>
>>
>>
>> 01:07:43.972781 IP ip-10-158-97-169.ec2.internal.53118 >
>> ec2-50-19-250-187.compute-1.amazonaws.com.5140: Flags [P.], seq 0:402,
>> ack 1, win 237, options [nop,nop,TS val 840675456 ecr 1017783106],
>> length 402
>> E.....@[email protected]
>> .a.2....~....*................
>> 2...<.#B26 syslog 387 <134>2013-12-17T00:58:13.436136+00:00
>> ip-10-158-97-169 logBurst[3103]:
>> @cee:{"message-json":{"count":"0/1","now":"Tue Dec 17 00:58:13
>> 2013"},"yummlyLogOrigin":{"supportLevel":"prod","system":
>> "LOGS","cluster":"prod","role":"collectorErik","host":"ip-
>> 10-158-97-169","tag":"logBurst[3103]:","programname"
>> :"logBurst","priority":"local0.info","timestamp":"2013-12-
>> 17T00:58:13.436136+00:00"}}
>>
>>
>> syslog 387 keeps repeating every two minutes.
>>
>> Config:
>>
>> if
>>   prifilt("local0.*") or
>>   ...
>>   prifilt("local7.*")
>> then {
>>   action(type="mmjsonparse")
>>   if $parsesuccess == "OK" then {
>>     action(
>>       type="omrelp"
>>       target="elb.collector.prod.logs.ylmmuy.com"
>>       port="5140"
>>       template="json"
>>       queue.type="LinkedList"
>>       queue.filename="json"
>>       queue.maxdiskspace="75161927680" # 70GB (valuable data)
>>       action.resumeRetryCount="-1"
>>       action.resumeInterval="5"
>>     )
>>   } else {
>>     ...
>>   }
>>   stop
>> }
>>
>> Same test as before (host to load balancer to another host using
>> RELP), no MARK, no other messages, just wait for connection to go stale
>> then start sending messages every 5 seconds. It takes about 15 minutes
>> for it to recover.
>>
>> First message (strace output):
>>
>> 3081 00:58:13.528459 sendto(13, "26 syslog 387
>> <134>2013-12-17T00:58:13.436136+00:00 ip-10-158-97-169 logBurst[3103]:
>> @cee:{\"message-json\":{\"count\":\"0/1\",\"now\":\"Tue Dec 17 00:58:13
>> 2013\"},\"yummlyLogOrigin\":{\"supportLevel\":\"prod\",\"
>> system\":\"LOGS\",\"cluster\":\"prod\",\"role\":\"
>> collectorErik\",\"host\":\"ip-10-158-97-169\",\"tag\":\"
>> logBurst[3103]:\",\"programname\":\"logBurst\",\"priority\":\"local0.info
>> \",\"timestamp\":\"2013-12-17T00:58:13.436136+00:00\"}}\n\n",
>> 402, 0, NULL, 0) = 402
>> ...
>> 3081 00:58:13.529411 setsockopt(13, SOL_TCP, TCP_CORK, [0], 4) = 0
>> ...
>> 3081 00:58:18.725420 setsockopt(13, SOL_TCP, TCP_CORK, [1], 4) = 0
>> ...
>> 3081 00:58:18.726657 sendto(13, "27 syslog 387 ... same as above ...
>>
>> Just like tcpdump shows the message is being resent (strace output
>> just like the one above) until:
>>
>> 3081 01:02:27.982896 sendto(13, "77 syslog 387
>> <134>2013-12-17T01:02:27.893264+00:00 ip-10-158-97-169 logBurst[3257]:
>> @cee:{\"message-json\":{\"count\":\"0/1\",\"now\":\"Tue Dec 17 01:02:27
>> 2013\"},\"yummlyLogOrigin\":{\"supportLevel\":\"prod\",\"
>> system\":\"LOGS\",\"cluster\":\"prod\",\"role\":\"
>> collectorErik\",\"host\":\"ip-10-158-97-169\",\"tag\":\"
>> logBurst[3257]:\",\"programname\":\"logBurst\",\"priority\":\"local0.info
>> \",\"timestamp\":\"2013-12-17T01:02:27.893264+00:00\"}}\n\n",
>> 402, 0, NULL, 0 <unfinished ...>
>> ... other threads ...
>> 3081 01:13:44.932842 <... sendto resumed> ) = 45
>> ... writing debug info ...
>> 3081 01:13:44.934579 setsockopt(13, SOL_SOCKET, SO_LINGER, {onoff=1,
>> linger=0}, 8) = 0
>> 3081 01:13:44.934662 close(13) = 0
>>
>> After this it recovers. The total time is 15 minutes or so. Is there
>> any way to shorten this time?
>>
>> erik
>>
>> On 12/16/2013 12:42 AM, Rainer Gerhards wrote:
>>
>>> On Mon, Dec 16, 2013 at 9:35 AM, Erik Steffl <[email protected]> wrote:
>>>
>>>> if
>>>>   prifilt("local0.*") or
>>>>   ...
>>>>   (prifilt("kern.info") and ($msg == '-- MARK --'))
>>>>
>>>> then {
>>>>   action(type="mmjsonparse")
>>>>   if $parsesuccess == "OK" then {
>>>>     action(
>>>>       type="omrelp"
>>>>       target="elb.collector.prod.logs.ylmmuy.com"
>>>>       port="5140"
>>>>       template="json"
>>>>     )
>>>>   } else {
>>>>     action(
>>>>       type="omrelp"
>>>>       target="elb.collector.prod.logs.ylmmuy.com"
>>>>       port="5140"
>>>>       template="text"
>>>>     )
>>>>   }
>>>>   stop
>>>> }
>>>>
>>>
>>> that's what I suspected. You use the defaults, which means "disable me
>>> for 30 seconds if the connections break continuously". Try using
>>>
>>> action(
>>>   type="omrelp"
>>>   target="elb.collector.prod.logs.ylmmuy.com"
>>>   port="5140"
>>>   template="text"
>>>   action.resumeRetryCount="-1"
>>>   action.resumeInterval="5"
>>> )
>>>
>>> to get you started. It will try infinitely to send messages, but will
>>> pause 5 seconds between retries. Note that you may run into trouble if
>>> the destination is offline for an extended period of time.
>>>
>>>> http://www.rsyslog.com/doc/omrelp.html don't see the retry settings, are
>>>> these some generic action retries?
>>>>
>>>
>>> action parameters applying to all actions:
>>>
>>> http://www.rsyslog.com/doc/rsyslog_conf_actions.html
>>>
>>> (you know the doc discussion, so no need to explain it may be unintuitive
>>> to find ;-))
>>>
>>> Rainer
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>> if you DON'T LIKE THAT.
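
For reference, the intervals discussed in the thread can be worked out directly from the timestamps in the quoted tcpdump and strace output. The following is only a small sketch: the helper name `elapsed` is made up for illustration, and it assumes all timestamps fall on the same day (true for the traces above).

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two same-day HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Timestamps copied from the tcpdump and strace output quoted in the thread.
# Gap between the two captured packets carrying "syslog 387" (tcpdump).
retransmit_gap = elapsed("01:05:43.652804", "01:07:43.972781")
# Time the last sendto() stayed blocked before returning (strace).
sendto_blocked = elapsed("01:02:27.982896", "01:13:44.932842")
# Time from the first sendto() of the test to close() of the socket (strace).
first_send_to_close = elapsed("00:58:13.528459", "01:13:44.934662")

for label, delta in [
    ("gap between captured packets", retransmit_gap),
    ("blocked sendto()", sendto_blocked),
    ("first sendto() to close()", first_send_to_close),
]:
    print(f"{label}: {delta.total_seconds():.1f} s")
```

This gives roughly 120.3 s between the two captured packets (the "every two minutes" above), about 676.9 s (11 min 17 s) for the blocked sendto(), and about 931.4 s (15 min 31 s) from the first sendto() to close(), which matches the "15 minutes or so" reported.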

