From a quick glance at your config:
You are using a lot of dynamic filename templates. Since you do not change the
default dynafile cache size (and I don't know if you can on a version that
ancient), rsyslog is spending a LOT of time syncing, closing, and opening files.
Also, you are making extensive use of if..then style filters, which are much
slower than the other filter types on versions prior to 7.x.
So if you upgrade your central servers to a current version and set a large
enough DynaFileCacheSize, your performance problems will probably disappear.
David Lang
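To illustrate why the dynafile cache size matters, here is a minimal sketch that counts file opens for a stream of messages. It assumes the cache of open files evicts the least recently used entry; the file paths, the round-robin workload, and the cache sizes of 10 and 256 are hypothetical values chosen for illustration, not taken from the configs in this thread.

```python
from collections import OrderedDict
from itertools import cycle, islice


def count_file_opens(filenames, cache_size):
    """Count how many times a file must be (re)opened for a stream of
    messages, given an LRU cache of open file handles of the given size."""
    open_files = OrderedDict()  # filename -> placeholder for an open handle
    opens = 0
    for name in filenames:
        if name in open_files:
            open_files.move_to_end(name)    # cache hit: mark as most recently used
            continue
        opens += 1                          # cache miss: the file has to be opened
        if len(open_files) >= cache_size:
            open_files.popitem(last=False)  # evict (close) the least recently used file
        open_files[name] = True
    return opens


# Hypothetical workload: 200 hosts logging round-robin, 10,000 messages in total.
hosts = [f"/var/log/remote/host{i:03d}.log" for i in range(200)]
stream = list(islice(cycle(hosts), 10_000))

small_cache_opens = count_file_opens(stream, cache_size=10)
large_cache_opens = count_file_opens(stream, cache_size=256)
print(small_cache_opens, large_cache_opens)
```

In this model, a cache smaller than the number of active files misses on every message, so each write pays for a close and an open. Once the cache holds every active file, each file is opened only once.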
On Wed, 4 Dec 2013, Dan Finn wrote:
OK, now we might be onto something. I can’t determine exactly which
remote machine the client is hitting because it’s going through the F5, so
I looked at the stats on the F5 and picked the busiest remote server.
There is an rsyslog thread on it that is hovering very close to 100% CPU.
Here’s the config from our destination servers. They all share an
identical config. http://pastebin.com/35K9gw97
DAN FINN
Linux System Administrator
Office: 801-746-7580 ext. 5381
Mobile: 801-609-4705
[email protected]
Backcountry.com <http://www.backcountry.com/>
Competitive Cyclist <http://www.competitivecyclist.com/>
RealCyclist.com <http://www.realcyclist.com/>
Dogfunk.com <http://www.dogfunk.com/>
SteepandCheap.com <http://www.steepandcheap.com/>
Chainlove.com <http://www.chainlove.com/>
WhiskeyMilitia.com <http://www.whiskeymilitia.com/>
On 12/4/13, 10:41 AM, "David Lang" <[email protected]> wrote:
OK, then the question is how fast the receiving machine is accepting
messages.
Unless you have an unusually complex template, you should be able to send
messages very fast.
But if the receiving machine is not processing messages fast enough, there
will be a buildup. If all it's doing is writing to local files (and you
aren't doing a lot of dynamic filename stuff), it's unlikely to be that slow.
You could look at what the different threads are doing using top (remember
to hit 'H' to see the threads), and if one or more threads is maxing out the
CPU, you can then look at the batching settings.
But I really don't think the sending machine is the bottleneck; if it were,
it wouldn't be able to write the queue files either.
David Lang
On Wed, 4 Dec 2013, Dan Finn wrote:
I’m thinking it’s most likely something around #3. :)
I don’t think it’s a network or F5 related problem as far as I can tell.
For example, right now we have a server that is writing logs to the local
spool. I ran tcpdump and I can see rsyslog talking to the destination
servers just fine, but the spool is slowly growing. According to netstat,
rsyslog is only making 1 TCP connection to the VIP on the F5, and it seems
to be able to pass traffic through that connection.
DAN FINN
On 12/4/13, 9:31 AM, "Dave Caplinger" <[email protected]> wrote:
Without impstats output it's hard to say for sure, but since your config
is so succinct, you are getting a lot of default buffer sizes and
watermark parameters. I see you have $ActionResumeRetryCount set to -1
for infinite retries (which is good). Note though that the default high and
low water marks are 8,000 and 2,000 messages, respectively. So once you
get into disk-assisted mode, you won't leave it until the action queue
gets all the way down to 2,000 messages. The default action queue size
will be 10,000 messages, and that's really not very much, especially in
an environment that has significant spikes in volume.
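The watermark behaviour described above can be sketched as a small simulation. This is a simplified model, not rsyslog's actual queue code: it only tracks queue depth and a disk-assisted flag, using the 8,000 and 2,000 message watermarks quoted above. The arrival and drain rates are hypothetical, and the 10,000-message queue size limit is not modelled.

```python
def simulate_queue(arrivals, drain_per_tick, high_mark=8000, low_mark=2000):
    """Return, per tick, (queue_depth, disk_assisted) for a queue that enters
    disk-assisted mode at high_mark and leaves it only at or below low_mark."""
    depth = 0
    disk_assisted = False
    history = []
    for arriving in arrivals:
        depth = max(0, depth + arriving - drain_per_tick)
        if not disk_assisted and depth >= high_mark:
            disk_assisted = True    # high watermark reached: start spilling to disk
        elif disk_assisted and depth <= low_mark:
            disk_assisted = False   # drained to the low watermark: back to memory only
        history.append((depth, disk_assisted))
    return history


# Hypothetical load: a 10-tick burst of 2,000 msgs/tick, then a steady
# 900 msgs/tick, against a receiver that drains 1,000 msgs/tick.
arrivals = [2000] * 10 + [900] * 100
history = simulate_queue(arrivals, drain_per_tick=1000)

first_da_tick = next(i for i, (_, da) in enumerate(history) if da)
ticks_in_da_mode = sum(1 for _, da in history if da)
print(first_da_tick, ticks_in_da_mode)
```

With these made-up rates the burst lasts only 10 ticks, but the queue stays in disk-assisted mode for 82 ticks, because the receiver drains only slightly faster than new messages arrive and the queue has to fall all the way to the low watermark.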
The other possibilities that come to mind are:
1) that the F5 is correctly sending to an rsyslog server that isn't
listening any more for some reason
If the receiving side's TCP session gets stuck, or something else goes
wrong but the F5 doesn't know it, the hashing algorithm will continue to
send traffic to the same (dead) destination. TCP default timeouts are 2
minutes; this can seem like an eternity when digging through packet
captures. So on the sending side, perhaps it sends a SYN trying to open
the session, and then nothing happens for 2 minutes before it tries all
over again?
2) perhaps there's something else in the network breaking the TCP
session, such as a firewall doing NAT
I've seen cases before where the NAT-ing firewall would time out
translated IP addresses after a certain period, breaking long-running
sessions. The Cisco PIX/ASA, for example, has both idle
address-translation timeouts and total duration timeouts. So
even a currently in-use session can still be affected by something like
this.
3) maybe there is some odd behavior in v4 of rsyslog pertaining to this
situation that has long since been fixed :-)
Not pointing fingers; I just don't have a lot of experience with rsyslog
that old, so I'm just speculating.
--
Dave Caplinger, Director of Architecture | 402.361.3063
Solutionary | Relevant . Intelligent . Security
On Dec 3, 2013, at 6:10 PM, Dan Finn <[email protected]> wrote:
I’ve done that and I’ve seen 2 things happen during these periods where
files are being written locally.
1) Nothing at all was attempted to be sent to the remote destination.
Using telnet I could make a connection just fine, but rsyslog wasn’t even
attempting to send or talk to the destination server over TCP 514. The
message queue was growing extremely fast. I can’t explain it, but on the
2nd or 3rd restart it started talking to the remote again and began
flushing out the queue.
2) Lots of traffic is going to the remote over TCP 514. The queue is
slowly growing, but growing at a consistent rate. This is the most common
situation; I’ve only seen situation #1 once. I don’t see any errors or
retries or anything like that.
On 12/3/13, 5:01 PM, "David Lang" <[email protected]> wrote:
On Tue, 3 Dec 2013, Erik Steffl wrote:
We have a sort of similar problem. In our case it's Amazon Elastic Load
Balancer (ELB) that somehow causes the connection to go "bad" if there is no
traffic for 5 min (not sure what the exact time is; 1 minute is OK, 5
minutes is not).
Not sure what going "bad" actually means (still investigating), but the data
is not going through: rsyslog sends data but there is no response... It
recovers eventually, but I'm not sure what exactly triggers the recovery
(sending more messages is what triggers it, but how exactly is not clear).
It's not the same case, but maybe you can look into the VIP and connections
and see what happens there, maybe use strace to see what the responses are
when rsyslogd sends data to the destination...
or use tcpdump to watch the traffic over the network.
David Lang
erik
On 12/03/2013 01:12 PM, Dan Finn wrote:
I had kind of wondered about that as well, but I have a few reasons that
make it seem like that is not the case.
The “central server” is actually a VIP on our F5 load balancer with 4
rsyslog destination servers behind it. We have about 200 servers in our
environment, and during these busy times the only servers that ever seem to
log locally are the postgres servers. The volume of logs being written on
these servers is certainly much higher than anywhere else. My theory is
that the rsyslog “client” is not keeping up with the sheer volume on these
servers during the busy times, but until I can find some concrete info that
is just a theory.
We are looking at upgrading to v7, but unfortunately that’s not going to
be a quick fix. I was hoping maybe there was an issue in my config or
something that could be tweaked, but it sounds like maybe that is not the
case?
I did capture some debug output while this was happening. Unfortunately
it was pretty large, so I don’t know if I can share the whole thing, but is
there anything in particular I should be looking for in there? I see that
it says it’s writing the files locally, but I didn’t see where it says why.
Thanks,
Dan
On 12/3/13, 3:03 AM, "David Lang" <[email protected]> wrote:
You are sending the logs via TCP, which means that if the system you are
sending logs to gets backed up, logs will queue on the sending system,
spilling to disk as needed.
The bottleneck is probably on the central server, but we have no info about
what it's doing.
The go-to tool for diagnosing this sort of problem is the impstats module,
but I don't think that existed back in the 4.x days, and tracking down the
bottleneck without it is significantly harder. Is there any way you can
upgrade to a current version?
David Lang
On Mon, 2 Dec 2013, Dan Finn wrote:
Date: Mon, 2 Dec 2013 20:53:54 +0000
From: Dan Finn <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [rsyslog] rsyslog frequently queuing to disk when it should be
 sending over the network
Hello,
I’m trying to get some insight into an issue that we have been seeing
quite a bit. We have some postgres servers that are quite verbose.
When the servers get busy we have an issue where they queue their logs
locally instead of sending over the network; however, I can’t find any
reason why that would be, at least not from an OS resource standpoint.
We are running rsyslog4-4.8.0-1.ius.el5. This is my config from the
client that was having issues: http://pastebin.com/n3XpRdMm
I watched it queue about 10k files under /var/spool/rsyslog before I
finally had to manually delete them because the disk was filling up.
What’s the best way to get some insight into why this might be
happening? Is there a way I can enable some debug logging for the
rsyslog process itself? Any settings in our config that could be
tweaked?
Thanks,
Dan
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
if you DON'T LIKE THAT.