all is fine - maybe you start with a snippet of just some of the
affected actions.

Rainer

El jue., 3 sept. 2020 a las 15:13, Adam Chalkley
(<[email protected]>) escribió:
>
> Rainer,
>
> Thanks for the reply.
>
> I've not shared the config, but I can work on doing so. It's rather long and 
> likely over complicated, but it would be great to have other eyes on it.
>
> It may take some time to sanitize the contents before sharing, but when I'm 
> able, should I share it here as attachments, inline as plaintext or via a 
> GitHub repo? I'm leaning towards the latter if you don't have a specific 
> preference.
>
> Thank you for offering to take a look at the configuration!
>
> -----Original Message-----
> From: rsyslog <[email protected]> On Behalf Of Rainer 
> Gerhards via rsyslog
> Sent: Wednesday, September 2, 2020 3:14 AM
> To: rsyslog-users <[email protected]>
> Cc: Rainer Gerhards <[email protected]>
> Subject: Re: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, main 
> queue filling up, imrelp-related errors
>
> I don't see the rest of this thread, can you (re?) post you config?
>
> Rainer
>
> El mié., 2 sept. 2020 a las 3:13, Adam Chalkley via rsyslog
> (<[email protected]>) escribió:
> >
> > Unfortunately the system is still having issues.
> >
> > I enabled debug logging earlier, copied the debug log aside and *then* 
> > disabled debug-on-demand logging (last time I forgot that not copying the 
> > file elsewhere first complete wipes the log file).
> >
> > I have 18 MB worth of captured details for a short timeframe.
> >
> > Out of that log file I'm seeing a lot of these entries:
> >
> > 7316.804057780:imuxsock.c     : main Q: queue.c: EnqueueMsg advised worker 
> > start
> > 7316.804062774:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() 
> > on 2 fds
> > 7316.804232359:imuxsock.c     : imuxsock.c: Message from UNIX socket: #3, 
> > size 221
> > 7316.804258678:imuxsock.c     : datetime.c: ParseTIMESTAMP3339: invalid 
> > year: 0, pszTS: 'e'
> > 7316.804264073:imuxsock.c     : main Q: queue.c: EnqueueMsg advised worker 
> > start
> > 7316.804267610:imuxsock.c     : imuxsock.c: --------imuxsock calling poll() 
> > on 2 fds
> > 7317.803939077:imrelp.c       : main Q: queue.c: EnqueueMsg advised worker 
> > start
> > 7317.803994833:imrelp.c       : imrelp.c: relpTcpSend: send data: 14916 rsp 
> > 6 200 OK
> >
> > 7317.804025407:imrelp.c       : imrelp.c: relpTcpSend: sock 80, lenbuf 19, 
> > send returned -1 [errno 32]
> > 7317.804038885:imrelp.c       : imrelp.c: librelp: generic error: ecode 
> > 10014, emsg 'error sending relp: Broken pipe'
> > 7317.804049421:imrelp.c       : errmsg.c: Called LogMsg, msg: imrelp[2514]: 
> > error 'error sending relp: Broken pipe', object  'lstn 2514: conn to clt 
> > 192.168.2.172/192.168.2.172' - input may not work as intended
> > 7317.804054526:imrelp.c       : operatingstate.c: osf: MSG imrelp[2514]: 
> > error 'error sending relp: Broken pipe', object  'lstn 2514: conn to clt 
> > 192.168.2.172/192.168.2.172' - input may not work as intended: signaling 
> > new internal message via SIGTTOU: 'imrelp[2514]: error 'error sending relp: 
> > Broken pipe', object  'lstn 2514: conn to clt 192.168.2.172/192.168.2.172' 
> > - input may not work as intended [v8.2006.0 try 
> > https://www.rsyslog.com/e/2353 ]'
> > 7317.804097763:main thread    : janitor.c: janitorRun() called
> >
> > The pattern seems to be that a remote target is interrupted and then the 
> > main queue (main Q) starts filling up. All omrelp actions have queues 
> > attached and none of them ever filled (going by impstats log file which was 
> > regularly receiving updates).
> >
> > Going by the local log file on the primary receiver it noted these details:
> >
> > 2020-09-01T12:27:33.306418-05:00 woodchuck1 rsyslogd: main Q:Reg: high 
> > activity - starting 1 additional worker thread(s), currently 1 active 
> > worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> > 2020-09-01T12:27:33.429855-05:00 woodchuck1 rsyslogd: omfwd: TCPSendBuf 
> > error -2027, destruct TCP Connection to graylog1.example.com:514 [v8.2006.0 
> > try https://www.rsyslog.com/e/2027 ]
> > 2020-09-01T12:27:33.469576-05:00 woodchuck1 rsyslogd: action 
> > 'ForwardToGraylog1' suspended (module 'builtin:omfwd'), retry 0. There 
> > should be messages before this one giving the reason for suspension. 
> > [v8.2006.0 try https://www.rsyslog.com/e/2007 ]
> > 2020-09-01T12:27:33.470212-05:00 woodchuck1 rsyslogd: action 
> > 'ForwardToGraylog1' resumed (module 'builtin:omfwd') [v8.2006.0 try 
> > https://www.rsyslog.com/e/2359 ]
> > 2020-09-01T12:33:02.644015-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T12:53:02.711834-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:13:02.802190-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:33:02.856874-05:00 woodchuck1 rsyslogd: -- MARK --
> > 2020-09-01T13:45:10.206220-05:00 woodchuck1 rsyslogd: main Q:Reg: high 
> > activity - starting 1 additional worker thread(s), currently 2 active 
> > worker threads. [v8.2006.0 try https://www.rsyslog.com/e/2439 ]
> > 2020-09-01T13:45:10.222580-05:00 woodchuck1 rsyslogd: librelp error 10008 
> > forwarding to server woodchuck2.example.com:2514 - suspending  [v8.2006.0 
> > try https://www.rsyslog.com/e/2291 ]
> > 2020-09-01T13:45:10.222794-05:00 woodchuck1 rsyslogd: action 
> > 'ForwardTowoodchuck2' suspended (module 'omrelp'), retry 0. There should be 
> > messages before this one giving the reason for suspension. [v8.2006.0 try 
> > https://www.rsyslog.com/e/2007 ]
> > 2020-09-01T13:45:11.232309-05:00 woodchuck1 rsyslogd: action 
> > 'ForwardTowoodchuck2' resumed (module 'omrelp') [v8.2006.0 try 
> > https://www.rsyslog.com/e/2359 ]
> > According to the impstats log, the main queue started filling up around 
> > 11:21 am, about the time that several of our mail relays were rebooting 
> > from maintenance.
> >
> > This is an Ubuntu 18.04 box that was in-place upgraded from Ubuntu 16.04.
> >
> > Not sure if it is relevant, but the existing rsyslog packages were not 
> > swapped out with replacement 18.04-based packages as part of the upgrade as 
> > expected.
> >
> > dpkg -l | grep rsyslog
> > ii  rsyslog                                8.2006.0-0adiscon2xenial1        
> >                amd64        a rocket-fast system for log processing
> > ii  rsyslog-mmjsonparse                    8.2006.0-0adiscon2xenial1        
> >                amd64        Parsing/handling of CEE/Lumberjack JSON 
> > messages in rsyslog
> > ii  rsyslog-mmnormalize                    8.2006.0-0adiscon2xenial1        
> >                amd64        The rsyslog-mmnormalize package provides log 
> > normalization
> > ii  rsyslog-mmrm1stspace                   8.2006.0-0adiscon2xenial1        
> >                amd64        The mmrm1stspace module permits to strip the 
> > leading space from
> > ii  rsyslog-relp                           8.2006.0-0adiscon2xenial1        
> >                amd64        RELP protocol support for rsyslog
> >
> > I was planning to wait for the next rsyslog packages to be released for the 
> > Ubuntu PPA to see if that was enough to trigger a switchover to "bionic" 
> > based packages. I suspect this is completely unrelated to the problem 
> > experienced, but wanted to make sure and note it here.
> >
> > Recent changes (not *exactly* flailing around, but not very confident) from 
> > a few days ago (which didn't help today):
> >
> > * Modified forward queues (attached to actions within a ruleset)
> > ** increase queue size from 10K to 100K
> > ** increase worker threads from 1 to 4
> > ** disable explicit high water mark setting, allow default setting to apply
> > ** disable explicit low water mark setting, allow default setting to apply
> >
> > * Modified 'main Q' (as impstats lists it)
> > ** adjust worker threads from 1 to 4
> > ** adjust queue size from 10K to 500K
> >
> > Changes made tonight after restarting rsyslog:
> >
> > * Modified ruleset used to write out messages to local log files on the 
> > receiver
> > ** increase queue size to 250K
> > ** increase worker threads to 4
> >
> > * Disable lightDelayMark on main queue (as discussed on this thread)
> >
> >
> > I'm not sure what's going on. This box was quite stable up until we 
> > upgraded the OS from Ubuntu 16.04 to 18.04.
> >
> > Any ideas/suggestions are welcome.
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Adam Chalkley
> > Sent: Friday, August 28, 2020 4:15 PM
> > To: rsyslog-users <[email protected]>
> > Cc: [email protected]
> > Subject: RE: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, 
> > connections from clients failing with a high number of CLOSE_WAIT 
> > connections on receiver
> >
> > Hi Andre,
> >
> > Thank you for the additional feedback.
> >
> > As you suggested, the problem is likely tied back to the TCP probe. We've 
> > not had it enabled since last Sunday and rsyslog has been running fine 
> > without making the changes we previously discussed (I'm still interested in 
> > making them, I've just been pulled in other directions).
> >
> > Do you happen to know of a safe way to check that the port is open remotely 
> > without triggering a failure from rsyslog's perspective? I'm guessing that 
> > a minimal RELP-compatible client would be the best approach. Is there such 
> > a tool that you're aware of that could be called periodically to confirm 
> > that a rsyslog receiver (RELP-enabled port) is functioning properly?
> >
> > Just thought I would ask.
> >
> > Thanks!
> >
> > -----Original Message-----
> > From: Andre Lorbach <[email protected]>
> > Sent: Monday, August 24, 2020 4:06 AM
> > To: rsyslog-users <[email protected]>
> > Cc: Adam Chalkley <[email protected]>
> > Subject: AW: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, 
> > connections from clients failing with a high number of CLOSE_WAIT 
> > connections on receiver
> >
> > I think those errors were there all the time but not reported in older
> > librelp version.
> > I reviewed the code and we added this error output about 2 years ago in
> > librelp.
> > Ubuntu 16.04 most likely is using an older librelp version, so you did not
> > see the error there.
> >
> > The problem is caused by the TCP Probe, it may helps if you try to receive
> > data before you drop the connection.
> >
> > Best regards,
> > Andre Lorbach
> > --
> > Adiscon GmbH
> > Mozartstr. 21
> > 97950 Großrinderfeld, Germany
> > Ph. +49-9349-9298530
> > Geschäftsführer/President: Rainer Gerhards Reg.-Gericht Mannheim, HRB
> > 560610
> > Ust.-IDNr.: DE 81 22 04 622
> > Web: www.adiscon.com - Mail: [email protected]
> >
> > Informations regarding your data privacy policy can be found here:
> > https://www.adiscon.com/data-privacy-policy/
> >
> > This e-mail may contain confidential and/or privileged information. If you
> > are not the intended recipient or have received this e-mail in error
> > please notify the sender immediately and delete this e-mail. Any
> > unauthorized copying, disclosure or distribution of the material in this
> > e-mail is strictly forbidden.
> >
> > Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
> > Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
> > irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
> > vernichten Sie diese E-Mail. Das unerlaubte Kopieren und die unbefugte
> > Weitergabe dieser E-Mail sind nicht gestattet.
> >
> >
> >
> > > -----Ursprüngliche Nachricht-----
> > > Von: rsyslog <[email protected]> Im Auftrag von Adam
> > > Chalkley via rsyslog
> > > Gesendet: Mittwoch, 19. August 2020 18:38
> > > An: rsyslog-users <[email protected]>
> > > Cc: Adam Chalkley <[email protected]>
> > > Betreff: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04,
> > connections
> > > from clients failing with a high number of CLOSE_WAIT connections on
> > > receiver
> > >
> > > Hi,
> > >
> > > We upgraded the OS on our central receiver yesterday from Ubuntu 16.04
> > > (4.4 kernel) to 18.04 (4.15 kernel).
> > >
> > > We are using the upstream PPA, so running 8.2006.0 on receivers and
> > > endpoints.
> > >
> > > When we started getting reports from our Nagios instance that the
> > rsyslog
> > > forward queues endpoints were beginning to fill we checked our receiver
> > > (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT
> > > from our Nagios server, most of them I suspect from the TCP port
> > connection
> > > test performed every 5 minutes.
> > >
> > > Log samples from the receiver system (which are related to port probes
> > from
> > > our Nagios instance):
> > >
> > > 2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
> > > 2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server
> > > closed relp session, session broken', object  'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'error
> > > sending relp: Bad file descriptor', object  'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server
> > > closed relp session, session broken', object  'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'error
> > > sending relp: Bad file descriptor', object  'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error
> > 'server
> > > closed relp session, session broken', object  'lstn 2514: conn to clt
> > > 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0
> > try
> > > https://www.rsyslog.com/e/2353 ]
> > >
> > > Log samples from the Nagios instance:
> > >
> > > 2020-08-19T11:19:53.444953-05:00 nagios rsyslogd:
> > > omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required
> > session
> > > state, session broken', object  'conn to srvr
> > lincoln.lib.auburn.edu:2514' -
> > > action may not work as intended [v8.2006.0 try
> > > https://www.rsyslog.com/e/2353 ]
> > > 2020-08-19T11:19:53.445260-05:00 nagios rsyslogd:
> > > omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to
> > > remote peer', object  'conn to srvr lincoln.lib.auburn.edu:2514' -
> > action may
> > > not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
> > >
> > > Is  there a setting I can apply to rsyslog to help resolve this?
> > >
> > > Is this a known bug?
> > >
> > > We didn't have the issue with v8.2006.0 on our receiver when it was
> > running
> > > Ubuntu 16.04 (the prior OS release), even though it made the same
> > > complaints about the TCP port probes from Nagios.
> > >
> > > Thanks in advance.
> > >
> > > _______________________________________________
> > > rsyslog mailing list
> > > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > > http://www.rsyslog.com/professional-services/
> > > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> > beyond
> > > our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
> > _______________________________________________
> > rsyslog mailing list
> > https://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
> > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T 
> > LIKE THAT.
> _______________________________________________
> rsyslog mailing list
> https://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T 
> LIKE THAT.
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to