I don't see the rest of this thread. Can you (re)post your config?

Rainer

On Wed, 2 Sept 2020 at 3:13, Adam Chalkley via rsyslog
(<[email protected]>) wrote:
>
> Unfortunately the system is still having issues.
>
> I enabled debug logging earlier, copied the debug log aside and *then*
> disabled debug-on-demand logging (last time I forgot that not copying the
> file elsewhere first completely wipes the log file).
>
> I have 18 MB worth of captured details for a short timeframe.
>
> Out of that log file I'm seeing a lot of these entries:
>
> 7316.804057780:imuxsock.c     : main Q: queue.c: EnqueueMsg advised worker 
> start
> 7316.804062774:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() 
> on 2 fds
> 7316.804232359:imuxsock.c     : imuxsock.c: Message from UNIX socket: #3, 
> size 221
> 7316.804258678:imuxsock.c     : datetime.c: ParseTIMESTAMP3339: invalid year: 
> 0, pszTS: 'e'
> 7316.804264073:imuxsock.c     : main Q: queue.c: EnqueueMsg advised worker 
> start
> 7316.804267610:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() 
> on 2 fds
> 7317.803939077:imrelp.c       : main Q: queue.c: EnqueueMsg advised worker 
> start
> 7317.803994833:imrelp.c       : imrelp.c: relpTcpSend: send data: 14916 rsp 6 
> 200 OK
>
> 7317.804025407:imrelp.c       : imrelp.c: relpTcpSend: sock 80, lenbuf 19, 
> send returned -1 [errno 32]
> 7317.804038885:imrelp.c       : imrelp.c: librelp: generic error: ecode 
> 10014, emsg 'error sending relp: Broken pipe'
> 7317.804049421:imrelp.c       : errmsg.c: Called LogMsg, msg: imrelp[2514]: 
> error 'error sending relp: Broken pipe', object  'lstn 2514: conn to clt 
> 192.168.2.172/192.168.2.172' - input may not work as intended
> 7317.804054526:imrelp.c       : operatingstate.c: osf: MSG imrelp[2514]: 
> error 'error sending relp: Broken pipe', object  'lstn 2514: conn to clt 
> 192.168.2.172/192.168.2.172' - input may not work as intended: signaling new 
> internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: 
> Broken pipe', object  'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' - 
> input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 
> ]'
> 7317.804097763:main thread    : janitor.c: janitorRun() called
>
> The pattern seems to be that a remote target is interrupted and then the main 
> queue (main Q) starts filling up. All omrelp actions have queues attached and 
> none of them ever filled (going by impstats log file which was regularly 
> receiving updates).
>
> Going by the local log file on the primary receiver it noted these details:
>
> 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high 
> activity - starting 1 additional worker thread(s), currently 1 active worker 
> threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf error 
> -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 try 
> https://www.rsyslog.com/e/2027 ]
> 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 
> 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There should 
> be messages before this one giving the reason for suspension. [v8.2006.0 try 
> https://www.rsyslog.com/e/2007 ]
> 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 
> 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try 
> https://www.rsyslog.com/e/2359 ]
> 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high 
> activity - starting 1 additional worker thread(s), currently 2 active worker 
> threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 
> forwarding to server woodchuck2.example.com:2514 - suspending  [v8.2006.0 try 
> https://www.rsyslog.com/e/2291 ]
> 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 
> 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be 
> messages before this one giving the reason for suspension. [v8.2006.0 try 
> https://www.rsyslog.com/e/2007 ]
> 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 
> 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try 
> https://www.rsyslog.com/e/2359 ]
>
> According to the impstats log, the main queue started filling up around 11:21 
> am, about the time that several of our mail relays were rebooting from 
> maintenance.
>
> This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
>
> Not sure if it is relevant, but the existing rsyslog packages were not 
> swapped out with replacement 18.04-based packages as part of the upgrade as 
> expected.
>
> dpkg -l | grep rsyslog
> ii  rsyslog                                8.2006.0-0adiscon2xenial1          
>              amd64        a rocket-fast system for log processing
> ii  rsyslog-mmjsonparse                    8.2006.0-0adiscon2xenial1          
>              amd64        Parsing/handling of CEE/Lumberjack JSON messages in 
> rsyslog
> ii  rsyslog-mmnormalize                    8.2006.0-0adiscon2xenial1          
>              amd64        The rsyslog-mmnormalize package provides log 
> normalization
> ii  rsyslog-mmrm1stspace                   8.2006.0-0adiscon2xenial1          
>              amd64        The mmrm1stspace module permits to strip the 
> leading space from
> ii  rsyslog-relp                           8.2006.0-0adiscon2xenial1          
>              amd64        RELP protocol support for rsyslog
>
> I was planning to wait for the next rsyslog packages to be released for the 
> Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" 
> based packages. I suspect this is completely unrelated to the problem 
> experienced, but wanted to make sure and note it here.
>
> Recent changes (not *exactly* flailing around, but not very confident) from a 
> few days ago (which didn't help today):
>
> * Modified forward queues (attached to actions within a ruleset)
> ** increase queue size from 10K to 100K
> ** increase worker threads from 1 to 4
> ** disable explicit high water mark setting, allow default setting to apply
> ** disable explicit low water mark setting, allow default setting to apply
>
> * Modified 'main Q' (as impstats lists it)
> ** adjust worker threads from 1 to 4
> ** adjust queue size from 10K to 500K
>
> Changes made tonight after restarting rsyslog:
>
> * Modified ruleset used to write out messages to local log files on the 
> receiver
> ** increase queue size to 250K
> ** increase worker threads to 4
>
> * Disable lightDelayMark on main queue (as discussed on this thread)
>
>
> I'm not sure what's going on. This box was quite stable up until we upgraded 
> the OS from Ubuntu 16.04 to 18.04.
>
> Any ideas/suggestions are welcome.
>
> Thanks.
>
> -----Original Message-----
> From: Adam Chalkley
> Sent: Friday, August 28, 2020 4:15 PM
> To: rsyslog-users <[email protected]>
> Cc: [email protected]
> Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, 
> connections from clients failing with a high number of CLOSE_WAIT connections 
> on receiver
>
> Hi Andre,
>
> Thank you for the additional feedback.
>
> As you suggested, the problem is likely tied back to the TCP probe. We've not 
> had it enabled since last Sunday and rsyslog has been running fine without 
> making the changes we previously discussed (I'm still interested in making 
> them, I've just been pulled in other directions).
>
> Do you happen to know of a safe way to check that the port is open remotely 
> without triggering a failure from rsyslog's perspective? I'm guessing that a 
> minimal RELP-compatible client would be the best approach. Is there such a
> tool that you're aware of that could be called periodically to confirm that
> an rsyslog receiver (RELP-enabled port) is functioning properly?
>
> Just thought I would ask.
>
> Thanks!
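
On the question of a minimal RELP-compatible check: the sketch below shows one way such a probe could look. It is not a tool from the rsyslog project. It assumes the RELP framing `TXNR SP COMMAND SP DATALEN [SP DATA] LF` and a plain `open` / `close` exchange; the offer strings, the `relp-probe` software name, and all function names are illustrative assumptions and should be checked against the RELP specification before use. In line with the advice elsewhere in this thread, it reads the server's response before dropping the connection.

```python
import socket


def build_frame(txnr: int, command: str, data: bytes = b"") -> bytes:
    """Build one RELP frame: 'TXNR COMMAND DATALEN[ DATA]' followed by LF."""
    header = f"{txnr} {command} {len(data)}".encode("ascii")
    if data:
        return header + b" " + data + b"\n"
    return header + b"\n"


def response_ok(reply: bytes, txnr: int) -> bool:
    """Return True if reply begins with an 'rsp' frame for txnr with status 200."""
    parts = reply.split(b" ", 3)
    return (
        len(parts) == 4
        and parts[0] == str(txnr).encode("ascii")
        and parts[1] == b"rsp"
        and parts[3].startswith(b"200")
    )


def probe_relp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Open a RELP session, wait for the answer, then close it politely."""
    # Offers sent with the 'open' command (assumed minimal set).
    offers = b"relp_version=0\nrelp_software=relp-probe\ncommands=syslog"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_frame(1, "open", offers))
        reply = sock.recv(4096)  # receive data before closing
        ok = response_ok(reply, 1)
        sock.sendall(build_frame(2, "close"))
        try:
            sock.recv(4096)  # give the server a chance to acknowledge
        except OSError:
            pass  # a timeout or reset here does not change the result
        return ok
```

A monitoring check could call `probe_relp("receiver.example.com", 2514)` (hypothetical host name) and alert when it returns `False` or raises `OSError`.
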
>
> -----Original Message-----
> From: Andre Lorbach <[email protected]>
> Sent: Monday, August 24, 2020 4:06 AM
> To: rsyslog-users <[email protected]>
> Cc: Adam Chalkley <[email protected]>
> Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, 
> connections from clients failing with a high number of CLOSE_WAIT connections 
> on receiver
>
> I think those errors were there all the time but were not reported in older
> librelp versions.
> I reviewed the code and we added this error output about 2 years ago in
> librelp.
> Ubuntu 16.04 most likely is using an older librelp version, so you did not
> see the error there.
>
> The problem is caused by the TCP probe; it may help if you try to receive
> data before you drop the connection.
>
> Best regards,
> Andre Lorbach
> --
> Adiscon GmbH
> Mozartstr. 21
> 97950 Großrinderfeld, Germany
> Ph. +49-9349-9298530
> Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
> 560610
> Ust.-IDNr.: DE 81 22 04 622
> Web: www.adiscon.com - Mail: [email protected]
>
>
>
> > -----Original Message-----
> > From: rsyslog <[email protected]> On Behalf Of Adam
> > Chalkley via rsyslog
> > Sent: Wednesday, 19 August 2020 18:38
> > To: rsyslog-users <[email protected]>
> > Cc: Adam Chalkley <[email protected]>
> > Subject: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
> > connections from clients failing with a high number of CLOSE_WAIT
> > connections on receiver
> >
> > Hi,
> >
> > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > (4.4 kernel) to 18.04 (4.15 kernel).
> >
> > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > endpoints.
> >
> > When we started getting reports from our Nagios instance that the rsyslog
> > forward queues on endpoints were beginning to fill, we checked our receiver
> > (sawmill1) and saw 94 open TCP connections, 40 of them in CLOSE_WAIT
> > from our Nagios server, most of them I suspect from the TCP port
> > connection test performed every 5 minutes.
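
For anyone wanting to track that CLOSE_WAIT count over time, here is a minimal sketch, assuming a Linux receiver and IPv4 connections. It reads the kernel's `/proc/net/tcp` table format, in which the socket state is a hex code and `08` means CLOSE_WAIT. The function name and the sample usage are illustrative, not part of rsyslog or Nagios.

```python
CLOSE_WAIT = "08"  # hex state code used in /proc/net/tcp for CLOSE_WAIT


def count_close_wait(lines, port):
    """Count sockets in CLOSE_WAIT whose local port equals `port`.

    `lines` is any iterable of lines in /proc/net/tcp format.
    """
    count = 0
    for line in lines:
        fields = line.split()
        # Data rows start with an index such as '0:'; the header row does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        # local_address looks like '0100007F:09D2' (hex IP, hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == CLOSE_WAIT:
            count += 1
    return count
```

On the receiver this could be used as `count_close_wait(open("/proc/net/tcp"), 2514)`. IPv6 sockets are listed separately in `/proc/net/tcp6`, which uses the same column layout.
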
> >
> > Log samples from the receiver system (which are related to port probes
> > from our Nagios instance):
> >
> > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server closed relp session, session broken', object  'lstn 2514: conn to
> > clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'error sending relp: Bad file descriptor', object  'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server closed relp session, session broken', object  'lstn 2514: conn to
> > clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'error sending relp: Bad file descriptor', object  'lstn 2514: conn to clt
> > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server closed relp session, session broken', object  'lstn 2514: conn to
> > clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try https://www.rsyslog.com/e/2353 ]
> >
> > Log samples from the Nagios instance:
> >
> > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
> > session state, session broken', object  'conn to srvr
> > lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0
> > try https://www.rsyslog.com/e/2353 ]
> > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> > omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> > remote peer', object  'conn to srvr lincoln.lib.auburn.edu:2514' - action
> > may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> >
> > Is there a setting I can apply to rsyslog to help resolve this?
> >
> > Is this a known bug?
> >
> > We didn't have the issue with v8.2006.0 on our receiver when it was
> > running Ubuntu 16.04 (the prior OS release), even though it made the same
> > complaints about the TCP port probes from Nagios.
> >
> > Thanks in advance.
> >
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> > LIKE THAT.