> I personally recommend using fromhost-ip instead of fromhost. If you want
> to use a name, use the hostname that the server put in the log message, if
> you need to ignore that and see where it last came from, the IP is faster
and
> has fewer failure modes.

That's what we used to do when we were on rsyslog v5 a couple years ago, but
the name makes the data more user friendly, and sysadmins (and me) are in
here actually using the data regularly. And not all the hostnames from the
messages were consistent, so it was messy. I'll have to re-evaluate.

> 
> with your config, only traffic arriving on port 10517 will get written to
the
> testingruleset, internal syslog messages won't show up.

Yup! I was trying to eliminate all the other noise to focus on a specific
source that I know is having issues.

> 
> now, there are things that can cause you to see messages via tcpdump and
> not see them in rsyslog.
> 
> The most common is iptables rules

I tested that theory by disabling iptables for a period of time and waiting
for messages to come in - nothing was written out to file with iptables
disabled.

> 
> But since you are writing to /tmp you could be running into permission
> problems with SELinux/AppArmor (depending on your distro). try putting the
> file under /var/log and see if it appears. It should appear when the first
log
> message arrives.

The file exists in /tmp already and is written to randomly. The dynafile
template also has an omfile output to /var/log/collection, which is where
everything else we collect is written. Both the file in /tmp and
/var/log/collection always get written to at the same time, but randomly and
not always when tcpdump shows a message came in.  So I don't think it is a
permissions issue because data is written out to both locations, sometimes.

> 
> by the way, you don't need to do *.*, just list the actions directly, so
the
> following works just fine (permissions allowing)
> 
> ruleset(name="TestingRuleSet") {
>        /tmp/debug.log;RSYSLOG_DebugFormat
>        action(type="omfile" dynaFile="EduroamACSFile"
> template="EduroamACSFormat" name="writeEduroamACSFile")
>        stop
> } # end ruleset

I'll make that change.
> 
> in impstats, the /tmp/debug.log line will be action ## while the line
after it
> will have a name, check what it says about messages that are processed/etc
> 

I restarted rsyslog at 12:51: Oct  7 12:51:59 its-syslog-up1 rsyslogd:
[origin software="rsyslogd" swVersion="8.8.0" x-pid="22861"
x-info="http://www.rsyslog.com";] start

Here's what I have 5 minutes later:
Oct  7 12:56:59 its-syslog-up1 rsyslogd-pstats: action 20:
origin=core.action processed=59 failed=0 suspended=0 suspended.duration=0
resumed=0
Oct  7 12:56:59 its-syslog-up1 rsyslogd-pstats: dynafile cache
EduroamACSFile: origin=omfile requests=59 level0=58 missed=1 evicted=0
maxused=1 closetimeouts=0
Oct  7 12:56:59 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile:
origin=core.action processed=59 failed=0 suspended=0 suspended.duration=0
resumed=0
Oct  7 12:56:59 its-syslog-up1 rsyslogd-pstats: imudp(*:10517): origin=imudp
submitted=59
Oct  7 12:56:59 its-syslog-up1 rsyslogd-pstats: imudp(*:10517): origin=imudp
submitted=0

HOWEVER - the last message written out to /tmp/debug.log and the file in
/var/log/collection is from 12:01:
Oct  7 12:01:33 <hostname> CSCOacs_Failed_Attempts 0000133963 1 0 2015-10-07
12:01:3... etc etc

So rsyslog processes the messages and says it is writing it out, but they
aren't there.

> 
> Can you deliver messages to port 10517 via TCP? there is an odd case where
a
> missing route can cause UDP messages to not be delivered to the app.

Sure, I'll reconfigure it again but I had already tested TCP and saw the
same behavior a few days ago.  

> 
> > Oct  6 16:27:45 its-syslog-up1 rsyslogd-pstats: imudp(*:10517):
> origin=imudp
> submitted=1221
> > Oct  6 16:27:45 its-syslog-up1 rsyslogd-pstats: imudp(*:10517):
> origin=imudp
> submitted=0
> 
> yep, that says that messages are arriving
> 
> with impstats, do you have it configured to reset counters each time or
not?
> there is a little inaccuracy that creeps in when resetting the counters,
and the
> impstats analysis tool that Rainer pointed you at assumes the counters
aren't
> being reset.
> 
> We really need at least two sets of stats to be able to see what's
happening.

Nope, I'm not resetting counters each run. I'll include two sets from now
on. However, something is now preventing my impstats from writing out! I got
one set of stats after restarting at 12:51 and nothing since. The problem
has to be with the OS. Max open file descriptors is already massive, well
above the number of files I'm writing out:

cat /proc/sys/fs/file-max
3246061

> 
> now, looking at the impstats
> 
> I see one smoking gun:
> 
> main Q: origin=core.queue size=149 enqueued=492522977 full=51
> discarded.full=0
> discarded.nf=0 maxqsize=100000
> 
> maxqsize is such a nice round number that I'll bet it's the max you have
> configured. That means that at some point rsyslog filled up the queue
> completely
> and would have been unable to process any more messages. That means
> that UDP
> messages that arrived during that time would have been dropped.
> 
> this also says it was full 51 times, so that is at least 51 messages lost.
> 
> on a high volume server, you probably want to bump the main queue size up
> a bit
> larger as well.

This is new since removing the queues for each rule set. I had the max size
tuned for each rule set so they weren't filling up. Here's an example from
yesterday, before I removed the rule set queues:

Oct  6 08:52:08 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet:
origin=core.queue size=14 enqueued=3675661180 full=0 discarded.full=0
discarded.nf=0 maxqsize=350911
Oct  6 08:52:08 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet:
origin=core.queue size=0 enqueued=750119141 full=0 discarded.full=0
discarded.nf=0 maxqsize=156862
Oct  6 08:52:08 its-syslog-up1 rsyslogd-pstats: MailRuleSet:
origin=core.queue size=0 enqueued=341023350 full=0 discarded.full=0
discarded.nf=0 maxqsize=5169
Oct  6 08:52:08 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet:
origin=core.queue size=0 enqueued=16185906229 full=0 discarded.full=0
discarded.nf=0 maxqsize=456471
Oct  6 08:52:08 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue
size=116 enqueued=1797553 full=0 discarded.full=0 discarded.nf=0
maxqsize=1738

Never full after running for ~2 weeks. For that reason, it might make sense
to go back to ruleset queues. For now though, I can tune the main queue
size, but I haven't explicitly defined that anywhere. Seems 100,000 is the
built in default. I'll figure out how to bump that up and increase it.

> 
> another issue is that several actions show that they have been suspended
> and not
> resumed, that will cause logs to be missing from those outputs (action2,
30,
> 43,
> 67). one advantage of the new action() format is the ability to name
things.
> Even if you don't name everything, naming several of them will help figure
> out
> what the rest are.

Good point - I will go through and name everything :) I've noticed these
types of messages but the error doesn't give a lot of info into why and I
don't know how to troubleshoot it. After I name the actions, any tips to
diagnose?

> 
> as you said, there are still some dynafile cache settings that probably
need to
> be increased. They can slow everything down and cause the queue to fill
(but
> without knowing if the stats are reset, I don't know if the numbers are
since
> startup or since the last report) Since the busiest outputs are also the
ones
> with the highest miss rates, you may end up with a more dramatic I/O
savings
> once you fix them.

I just bumped the remaining two that were showing evictions again, so the
impact here should be minimal now.

> 
> I don't understand why you have multiple lines for the same thing (imudp
> and
> writeEduroamACSFile for example), unless it's one line per thread, in
which
> case
> these are showing that you really don't need multiple threads for the
inputs,
> you aren't using them :-)

I think I figured out that one and resolved it as well - I had been testing
this at various points in different rule sets (ie ports), and I left the
filters and actions defined in each of them. I've removed everything now
except for the one I'm actively testing with.
> 

Thanks again,
Dan

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to