Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread David Lang

When you said "if you are writing to a local file, you should not need to
have a separate queue", does queue equate to ruleset in this case? The main
reason I have the separate rulesets is to handle different kinds of incoming
logs on different ports, making it easier to write the rules logic for the
different types. Does that make sense?


using rulesets to segment your rules makes lots of sense, adding a queue if 
everything in the ruleset is writing to local files almost never does.


writing to disk is fast enough that the overhead of moving things in and out of 
the queue really hurts. In addition, if everything is behind a separate queue, 
the main worker threads end up not batching messages, and that _really_ hurts 
performance because the overhead of locking and unlocking the queues for each 
message is rather high.



There is a high number of dynafiles, specifically for server logs, which we
break out into files by host and programname with these templates:
template(name="LinuxProgramFile" type="string"
string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log")
template(name="WindowsProgramFile" type="string"
string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log
")

The result is about 32,000 total dynafiles which are then rotated out daily.


the key is how many different files are being written to at the same time. As 
Rainer points out, if you have 1000 files open at a time, you need to have the 
dynafilecachesize set >1000.


I have systems that write per-server, per 15-min files. for each server this 
creates 96 files a day. I don't need 96 * servercount slots in the 
dynafilecache, I need > servercount.


the penalty for having this undersized is huge. when you are thrashing, each 
message that arrives requires that the system flush it's cache, close the file 
(which triggers filesystem metadata updates), open a new file and write the 
data. make the cache size much larger than what you need.


Looking at top, rsyslogd as a whole never goes above 120%. Most of the
threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting
about 50% during peak. Although I hadn't noticed this before, watching top
for a few minutes this morning, rsyslogd hangs out around 60%, then bursts
to almost 200% for one second, then disappears from top in the next second,
before returning to 60%. It cycled through this every 10 seconds or so.
After restarting rsyslogd, the behavior went away - it's staying around 60%.


Should I try lowering the number of threads for the rulesets/imudp? This
issue does happen with both TCP and UDP, however, which further compounds my
confusion. You mention systemd - I'll look into that as I have a case open
with RH.


if you are just dealing with logs sent from remote machines, systemd isn't going 
to be involved.


For the rulesets, try eliminating the queue entirely, combined with fixing the 
dynafilecachesize issue, I'll bet cpu utilization will plummet and you won't 
miss the queues.


for imudp, try leaving the number of threads at 1. I've run systems receiving 
hundreds of thousands of logs/sec with a single thread. It's only when we start 
getting close to the throughput of Gig-E that a single thread starts having 
problems in most cases.


set all thread counts to 1 and watch top to see if any of them need help.


From what I'm seeing, you shouldn't end up needing to do anything else.


David Lang

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread Rainer Gerhards
I can't fully dig into this, but I think we must *very carefully*
evaluate the overall design. Some time ago we introduced arrays for
the limited liblognorm use case, and it hurts us every now and then
when folks want to use arrays for other use cases. It may probably
make sense to re-think how the variable engine etc behaves before
adding more functionality. And make sure that everything works smooth
in all use cases. While anything else may take care for some use
cases, I fear we may get too fragmented. At least this is what I
learned in the past months discussions.

Anyone else?

Rainer

2015-10-06 17:10 GMT+02:00 singh.janmejay :
> It is possible to use global-variables (it'll require some
> enhancements, table-support etc), but it'll be very inefficient
> compared to this approach. For instance, choice of data-structure etc
> allows making the solution a lot more efficient.
>
> Here its possible to locklessly increment counters in most cases, so
> its overhead is a lot lesser than global-variables.
>
> Recycle is precisely to allow this lockless mechanism to work. Its
> basically saying, it'll track metric-names he has seen in last 1 hour.
> If we kill tracking of it as soon as we don't see an increment
> (between 2 reporting runs of impstats), it'll lead to unnecessary
> churn when low-values are common or load is not uniform in time.
>
> Implementing it on top of global-variables is not only has very high
> performance-penalty(it'll be prohibitive for high-throughput
> scenarios), it also exposes too much complexity to the user (where
> user has to worry about reset etc).
>
> I don't plan to have a scheduler in this implementation.
> GetAllStatsLines call will purge the tree instead of reset at that
> interval. Its basically a balance between freeing-up memory occupied
> by stale-metric-names vs. performance (lockless handling of
> increment). So it will be governed by impstat schedule. May be I
> should change name to better name (equivalent of
> purge_known_keys_after_they_have_been_reported_N_times).
>
>
> On Tue, Oct 6, 2015 at 4:30 PM, David Lang  wrote:
>> On Tue, 6 Oct 2015, singh.janmejay wrote:
>>
>>> Hi,
>>>
>>> I am working on support for stats with dynamic-name. This comes handy
>>> in situations where metric-name is dependent upon value of a certain
>>> attribute of the message.
>>>
>>> Say, for a central log-aggregation service, its valuable to know what
>>> is inbound message-count distribution across application-clusters that
>>> send logs to it, or for a shared-server, its valuable to know what is
>>> the log-volume generation across users etc.
>>>
>>> Im thinking of using functions-like interface to support this. It may
>>> look similar to this:
>>>
>>> 
>>> dyn_stats("user_msg_count")
>>>
>>> ...
>>>
>>> ruleset(...) {
>>> ...
>>> dyn_inc("user_msg_count", $.user)
>>> ...
>>> }
>>> 
>>>
>>> dyn_stats signature looks like:
>>> dyn_stats(, , >> default=10k>, )
>>>
>>> dyn_inc signature looks like:
>>> dyn_inc(, )
>>>
>>>
>>> Reporting would work similar to static-metric via impstats. Mapping:
>>> statsobj_s.name = name_space
>>> statsobj_s.origin = "dyn"
>>> ctr_s.name = "foo" (say $.user had value foo)
>>>
>>>
>>> Thoughts / suggestions?
>>
>>
>> how is this different/better than global variables? (although we may need to
>> implement soem functions, atomic inc/dec copy+clear) If you have pstats
>> output in json format, you can even piggyback on it's schedule to output the
>> data.
>>
>>
>> things like stats can very easily end up being expensive in terms of locking
>> (something global variables already have figured out), and it sounds like
>> you are proposing adding a scheduler of some sort to output the data.
>>
>> variables should not need to be 'recycled', either they contain data or they
>> don't. If they contain data, you need to keep the data until you do
>> something with it, if they don't, you don't have to track them.
>>
>>
>> I am actually doing this sort of thing external to rsyslog in SEC
>>
>> I have a template in rsyslog that contains hostname, fromhost-ip,
>> programname and I output it via improg to SEC. SEC accumulates counters and
>> has scheduled outputs to files.
>>
>> before I started using SEC for this, I used the same template to output to a
>> file and then for reports, used cut + sort + uniq -c to extract the data I
>> need. When the files only contain the significant data, this is actually not
>> bad to do, even at higher volumes.
>>
>> David Lang
>> ___
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>> LIKE THAT.
>
>
>
> --
> Regards,

Re: [rsyslog] omfile fuse

2015-10-06 Thread David Lang

On Tue, 6 Oct 2015, Rainer Gerhards wrote:


2015-10-06 7:27 GMT+02:00 David Lang :

On Tue, 6 Oct 2015, Rainer Gerhards wrote:


2015-10-06 0:09 GMT+02:00 Kendall Green :


I've set asyncWriting="off" and the only difference is, that with 1000
message test, all were delivered on HUP and/or service restart, but
apparently not on flushInterval or buffer full. I'm currently running
another test with 100k messages iterating about 10 logs/sec, and watching
the output for number of lines. The output file is created on dfs-fuse
mount upon receipt of the first log, but 0 lines written.

This current test will hopefully show how many logs get processed without
intervention over the next ~3 hours, and if still no logs appear will see
how many are written upon HUP signal.

I'm unsure how the asyncWriting off impacts the rest of the config, and
if
the current config has disabled compression... or not with no zip.

This note is from omfile asyncWriting doc:
*Note that in order to enable FlushInterval, AsyncWriting must be set to
“on”. Otherwise, the flush interval will be ignored. Also note that when
FlushOnTXEnd is “on” but AsyncWriting is off, output will only be written
when the buffer is full. This may take several hours, or even require a
rsyslog shutdown. However, a buffer flush can be forced in that case by
sending rsyslogd a HUP signal.*

Please help clarify how these parameters correspond and what the settings
can achieve expedited log transport, avoiding long delays waiting for
buffering or flush intervals. This goal is why have chosen low settings
for
IOBufferSize and flushInterval.



Without async writing, everything is written sync on when it arrives
(more precisely at commit of the current batch) aka "as quickly as
possible". Thus, these params do not exist because they have no
meaning. Note that with a very large batch and a small buffer, partial
records may be written.



although without the checkpoint parameter the writes are to the OS and they
get flushed to disk at a later point.


you mean "sync", right?


I may be misunderstanding, but i thought the checkpoint (or checkpoint interval) 
parameter controleld how frequently you did the sync (so turning sync on didn't 
make it per every message)


David Lang___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Re: [rsyslog] Separation of actions based on log source - with good performance

2015-10-06 Thread Christopher . Racky
Hello,
I have never heared such a nonsense. 
Actually the number of applications that does not include features that are not 
official documented shoult be extremly limited.

The functionality is really usefull and already in big landscapes productive.
Please, please do NOT remove the lookup-table from the main branch.
The functionaltiy works fine, I'm using this since march and I did not have any 
issue since the latest patch of janmejay.

Even the "concept" is not fully implemented (e.g. smaller things like nomatch) 
the main part works fine.


My suggestion would be to document everything which is currently implemented 
and keep the "conceptual documentation" as it is.
So the Maintainer should no longer have an issue with it.


If the main issue it the time to document the already implemented features, I 
can create a patch.


Chris



> Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr
> Von: "David Lang" 
> An: rsyslog-users 
> Betreff: Re: [rsyslog] Separation of actions based on log source - with good 
> performance
>
> a CVE for something that requires manually enabling an experimental feature???
> 
> it would be one thing if a default config had the problem, or if it was 
> something entirely dependent on remote data.
> 
> I would be very tempted to respond to the CVE with "don't enable this 
> incomplete 
> feature" as the solution. It's very common for incomplete features to be 
> included in released versions
> 
> grumble, we have enough real bugs to worry about.
> 
> David Lang
> 
> On Tue, 6 Oct 2015, Rainer Gerhards wrote:
> 
> > Date: Tue, 6 Oct 2015 07:15:31 +0200
> > From: Rainer Gerhards 
> > Reply-To: rsyslog-users 
> > To: rsyslog-users 
> > Subject: Re: [rsyslog] Separation of actions based on log source - with good
> > performance
> > 
> > Sorry, folks, good intent always seems to find someone who turns it
> > into negative. I was yesterday contacted by a distro maintainer who
> > wants to turn this bug in the officially non-existant lookup table
> > feature into a CVE and insists that it is a vuln even after the
> > argument that the feature never oficially existed.
> >
> > It looks like it was a bad idea to merge potentially useful yet
> > incomplete code into the main branch (and documenting it to be not
> > present). It looks like I need to re-think my stance on experimental
> > features.
> >
> > Anyhow, I really don't want to support the argument that something
> > non-existing can be a CVE. As such, I will create a new
> > master-insecure branch, which will be a clone of the current master
> > branch. Then I'll remove the lookup table code, so that the code base
> > matches the documentation. I really don't want to create a general
> > principle here that we need to create CVEs (and patched) for something
> > that was just added as a convenience for a handful of folks who were
> > ready to take a risk.
> >
> > If there is sufficient interest, we can consider officially adding
> > this feature to the January 8.15 release iff it is ready by then.
> > @janmejay: please let me know if you would like to continue with your
> > work on lookup tables under this new situation.
> >
> > As soon as I have time, I'll check what else needs to be removed. Not
> > sure about the ./contributed branch, because the project cannot
> > guarantee at all this is bug-free. It's documented to be so, but if
> > that is not sufficient, it should probably live only in the
> > master-insecure branch.
> >
> > Rainer
> >
> > 2015-10-02 17:29 GMT+02:00 singh.janmejay :
> >> As of now it returns empty string for no-match. I guess we should go ahead
> >> with it in its current form. We can add default value any time later
> >> without breaking compatibility(it'd default to "").
> >>
> >> I'll add a test for multiple tables on Monday.
> >>
> >> On Fri, Oct 2, 2015, 7:16 PM   wrote:
> >>
> >>> Hi,
> >>> No, I didn't. I even didn't realize the patch.
> >>>
> >>> It seems to be exactly the related issue. So I don't expect any further
> >>> issues.
> >>> I will use the new version on 2 systems. If there is any other issue, I
> >>> will let you know.
> >>> Release data for next rsyslog version is quite far so enough time to
> >>> test... ;)
> >>>
> >>> The missing implementation of "nomatch" (default) entry as described at
> >>>  http://www.rsyslog.com/doc/lookup_tables.html
> >>> would from my opinion require changes:
> >>>
> >>> Arround line 132 of lookup.c file (save of value)
> >>> Arround line 243 of lookup.c file (search in lookuptable fails, so return
> >>> nomatch value.
> >>>
> >>>
> >>> regards
> >>> Chris
> >>>
> >>>
> >>> > Gesendet: Donnerstag, 01. Oktober 2015 um 16:57 Uhr
> >>> > Von: "singh.janmejay" 
> >>> > An: rsyslog-users 
> >>> > Betreff: Re: [rsyslog] Separation of 

Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread Woodruff, Dan
> -Original Message-
> From: rsyslog-boun...@lists.adiscon.com [mailto:rsyslog-
> boun...@lists.adiscon.com] On Behalf Of David Lang
> Sent: Tuesday, October 6, 2015 12:49 PM
> To: rsyslog-users 
> Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
> 
> > Thank you both David and Rainer - I really appreciate the detailed
> responses
> > and it's already helping put my box in a better state.
> 
> good to hear
> 
> > I used the impstats analyzer and started increasing dynaFileCacheSize
for
> > the actions that were problematic. CPU usage increased by about 2.5x and
> > context switches went up to 800k for about 15 minutes after I restarted
> > rsyslogd but seems to have settled back in around 60-70% and around 100k
> > context switches now. Top showed three in:imtcp threads as the busiest
> > during that time.
> 
> a backlog of inbound messages via tcp after a restart isn't unusual.
> 
> with heavy inbound tcp use you may want to look at using imptcp instead of
> imtcp.
> 
> > Cacti stats show about half the number of disk operations as before. So
it
> > seems to be a general performance gain, but it did not affect message
loss,
> > however, as I can already see that some messages have disappeared.
> 
> I'm surprised it didn't help more than that. what do the impstats look
like.

Impstats looks good - only two dynafiles that still have evictions, which
I'll tune a little more now. See bottom of message for a full recent output.

> 
> > Regarding queues, some of my rules in each ruleset do forward messages
> to
> > another host (a SIEM) over TCP. Should I keep the queue to support that,
> or
> > is TCP good enough?
> 
> it's a good idea to have a queue on outbound delivery methods that can
> block
> because if the remote system goes down it will block all message
processing.
> Consider using a disk queue for this as well (to hand an extended outage
of
> the
> remote system)
> 
> also note for the record that TCP is not as reliable as you may be
thinking.
> https://urldefense.proofpoint.com/v2/url?u=http-
> 3A__blog.gerhards.net_2008_04_on-2Dunreliability-2Dof-2Dplain-2Dtcp-
> 2Dsyslog.html=BQICAg=kbmfwr1Yojg42sGEpaQh5ofMHBeTl9EI2eaqQZ
> hHbOU=0S5vJ8-FmQy6Qk5D6_T4U9EYbkCcMc4ijDuyUem89Lk=N-
> 1ExMOcN75_9w1gY8ODemHdzRwHC_SeAxYSaAEb4P0=XiOr6GhA1NvhGT
> keIJGzrwxy7vjn6FYRasLdeBoLCvo=

Thanks. That is definitely important to me so I will look into creating a
disk queue for outbound messages once I get this missing message issue
worked out.
> 
> > RH support suggests turning of DNS lookups. Would that have any affect?
> > We've come to rely on hostname for creating directories for each server.
> 
> the %hostname% variable is what the sending system puts in the message,
> DNS
> lookups aren't needed for that. Turning off DNS will avoid delays on UDP
> messages. With DNS lookups, a slow DNS server can cause delays (without
> rsyslog
> using CPU) that can end up causing some messages to be lost if the OS
> queues end
> up running out of space.

We have several %fromhost% templates for servers. I noticed that there is a
rush of DNS requests when rsyslog first starts up but then it quiets down
due to the internal caching. I also have a local DNS cache running to point
to - not currently set - but when I did have the system pointed there, I
still had missing messages.

> 
> Now that you have addressed some of the worst problems in your config,
> let's
> talk about the log loss problem.
> 
> are you loosing messages that arrive via UDP or TCP (or both)?

Losing messages in both. Currently I'm testing with a port dedicated to just
one log source that I am continuously having problems with, and I'm sending
UDP right now. The port is 10517 and the config is:

template(name="EduroamACSFile" type="string"
string="/var/log/collection/eduroam_acs.log")
template(name="EduroamACSFormat" type="string" string="%timestamp%
%hostname% %syslogtag%%msg:::escape-cc%\n")
ruleset(name="TestingRuleSet") {

*.* {
/tmp/debug.log;RSYSLOG_DebugFormat
action(type="omfile" dynaFile="EduroamACSFile"
template="EduroamACSFormat" name="writeEduroamACSFile")
stop
}
} # end ruleset

# bind the listening ports to the ruleset
input(type="imudp" port="10517" ruleset="TestingRuleSet")
input(type="imtcp" port="10517" ruleset="TestingRuleSet")

I haven't had a message appear in /tmp/debug or in eduroam_acs since 11:01
this morning, even with several rsyslogd restarts since then. Last message
in my log file (sanitized by me):
Oct  6 11:01:06  CSCOacs_Passed_Authentications 129983 2 1
Step=11001 , Step=11017 , Step=15008 , Step=15004 , Step=15012 , Step=11358
, Step=11361 , Step=11355 , Step=11100 , Step=11101 , Step=11357 ,
Step=11002 , NetworkDeviceName=... msg continues

But tcpdump shows data coming into the box to port 10517 UDP:
16:26:36.751119 IP x.x.2.85.55422 > x.x.101.70.10517: UDP, length 1015
E.@.<..U..eF.~)4<181>Oct  6 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread David Lang

On Wed, 7 Oct 2015, singh.janmejay wrote:


--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting it's not-so-smart-assist technology.

On Oct 6, 2015 10:32 PM, "David Lang"  wrote:


On Tue, 6 Oct 2015, singh.janmejay wrote:


It is possible to use global-variables (it'll require some
enhancements, table-support etc), but it'll be very inefficient
compared to this approach. For instance, choice of data-structure etc
allows making the solution a lot more efficient.



As for the data structures, Rainer has been identifying inefficencies in

how json-c works and working to improve them




That optimizes variable system. But it still is a general propose variable
system. It can't and shouldn't understand relationship between variables.


what relationships are there between the different metrics?




Here its possible to locklessly increment counters in most cases, so
its overhead is a lot lesser than global-variables.



how can you manage counters in multiple threads without locks? Especially

when dealing with batches.




Consider a trie based implementation. With bounded fanout-factor, it's O(1)
wrt metric-names cardinality. It also has very little lock contention
involved. Usually operations work with read-locks, only when new metric is
initialized it requires a write lock on patent node. If recycles are few
and far apart, lock contention would be negligible.


if you have multiple threads that may need to update the same metric at the same 
time, a tree doesn't eliminate the locking.


The current json-c locking is being make intentionally over-broad right now 
because it appears that some json-c code is not thread-safe and we haven't 
identified it yet. Once that's tracked down and fixed (or json-c replaced), 
updating one item should not require locking any more than that item.





Recycle is precisely to allow this lockless mechanism to work. Its
basically saying, it'll track metric-names he has seen in last 1 hour.
If we kill tracking of it as soon as we don't see an increment
(between 2 reporting runs of impstats), it'll lead to unnecessary
churn when low-values are common or load is not uniform in time.



that depends on the cost of initializing a metric vs the cost of tracking

the recycle mechanism.




0 value data-points can easily be filtered out. So they don't create any
processing overhead downstream. Cost of tracking for recycle is minimal
because it's a single counter bring tracked, when it reaches zero it's
reset to orig starting value and trie is killed after reporting accumulated
stats.


actually, filtering out 0 data-points can be a very bad thing. Far too many 
monitoring tools produce stright-line graphs/estimates between reported 
data-points, so it's very important to report 0 value data-points



Implementing it on top of global-variables is not only has very high
performance-penalty(it'll be prohibitive for high-throughput
scenarios), it also exposes too much complexity to the user (where
user has to worry about reset etc).

I don't plan to have a scheduler in this implementation.
GetAllStatsLines call will purge the tree instead of reset at that
interval. Its basically a balance between freeing-up memory occupied
by stale-metric-names vs. performance (lockless handling of
increment). So it will be governed by impstat schedule. May be I
should change name to better name (equivalent of
purge_known_keys_after_they_have_been_reported_N_times).



if this is just adding additional metrics to the impstats output that

eliminates the schedular/reset issue.


I think we should have a metric configuration be fairly static, allow

configuring custom metrics and add to them, but don't use data from the
message as part of the name of the metric, and continue reporting them
forever, even if they are 0 (so no need to 'recycle' names)

Dynamic metrics are a real usecase for any shared system(utilisation across
several users, several hosts, several clusters, several-subnets etc are
easily reportable with this). The only way to report utilisation in many
scenarios is to have dyn-metric names. The alternative is to pre-declare
all keys, but that to me is a more indirect solution. It's not as
flexible/adaptive.

I think declarative static-key a useful feature on its own, for eg when
classifying reportable metric into buckets known in advance, but dyn-key
and configurable-static-key are not interchangeable.


dynamic systems will have pathalogical failure condiditions. Consider what 
happens when someone uses hostname in a dynafile template and then some system 
starts spewing malformed logs that put garbage data in that field. Creating 
hundreds or thousands of metric variables is much worse.


I agree that pre-declared keys are less flexible, but they are also going to be 
far safer and easier to deal with.


David Lang
___
rsyslog mailing list

Re: [rsyslog] Complex forwarding and spoofing question

2015-10-06 Thread David Lang
enable impstats, my guess is that there is something wrong with communications 
to the remote system so you get a bunch of messages in, send them out, but the 
are not getting acks back, so things stop trying to deliver. That's then filling 
the queue and spilling over to the disk.


also do a tcpdump on the connection to the remote system.

David Lang


On Tue, 6 Oct 2015, Randy Baca wrote:


Date: Tue, 6 Oct 2015 20:35:12 +
From: Randy Baca 
Reply-To: rsyslog-users 
To: rsyslog-users 
Subject: Re: [rsyslog] Complex forwarding and spoofing question

OK, I got everything updated and we are up and running.

However, there appears to be an issue with disk queuing.  When we fire up 
rsyslog the remotebox is only receiving a short burst of events for a few 
seconds and then about 1 or 2 events per second after that.  After the few 
seconds when the burst is over the disk queue on the local system then starts 
creating spool files.  They don't appear to be flushing.  The local system 
receives about 1k events per second.  They are all making it to the local UDP 
spoof but not the remotebox, even after playing around with the queue settings.

Here is the current conf:

module (load="imklog")
module (load="imuxsock")
module (load="imudp")
module (load="imtcp")
module (load="omudpspoof")

# Global Configuration
global (
   parser.escapeControlCharactersCStyle="on"
   workDirectory="/var/spool/rsyslog"
   localHostname="hostname"
   )

# In: 514/TCP (uses the same queue as UDP)
input(type="imtcp"
   address="10.1.2.3"
   port="514"
   ruleset="ruleset_eth0_514")

# In: 514/UDP (uses the same queue as TCP)
input(type="imudp"
   address="10.1.2.3"
   port="514"
   ruleset="ruleset_eth0_514")

# Out: UDP-spoof to the local and also forward to remote rsyslog
ruleset(name="ruleset_eth0_514"
   queue.type="LinkedList") {
   call action.local.udp515
   call action.fwd.remotebox
   stop
   }

# Fwd to localhost:udp/515
ruleset(name="action.local.udp515") {
   action(type="omudpspoof"
   name="omudpspoof.udp515"
   target="127.0.0.1"
   port="515")
   }

# Fwd to remotebox w/ compression and local disk queueing
ruleset(name="action.fwd.remotebox") {
   action(type="omfwd"
   name="omfwd.remotebox"
   queue.spoolDirectory="/var/spool/rsyslog_tcp"
#queue.lowwatermark="2000"
#queue.highwatermark="100"
   queue.workerthreads="2"
#queue.type="Disk"
   queue.type="LinkedList"
#queue.filename="omfwd_remotebox"
#queue.size="10"
   queue.maxdiskspace="20G"
   queue.saveonshutdown="on"
   action.resumeretrycount="-1"
   target="172.22.22.22"
   port="514"
   protocol="tcp"
   compression.mode="stream:always")
   }




From: rsyslog-boun...@lists.adiscon.com [rsyslog-boun...@lists.adiscon.com] on 
behalf of Dave Caplinger [davecaplin...@solutionary.com]
Sent: Monday, October 05, 2015 5:40 AM
To: rsyslog-users
Subject: Re: [rsyslog] Complex forwarding and spoofing question

David's right, it's a separate package.  Sorry about that.  Here's a list of 
all the 64-bit rsyslog packages (from 'yum search rsyslog') from a system I 
have access to; please verify you see something similar from yours.

rsyslog-gnutls.x86_64 : TLS protocol support for rsyslog
rsyslog-gssapi.x86_64 : GSSAPI authentication and encryption support for rsyslog
rsyslog-mysql.x86_64 : MySQL support for rsyslog
rsyslog-pgsql.x86_64 : PostgresSQL support for rsyslog
rsyslog-relp.x86_64 : RELP protocol support for rsyslog
rsyslog-snmp.x86_64 : SNMP protocol support for rsyslog
rsyslog.x86_64 : Enhanced system logging and kernel message trapping daemons
rsyslog-elasticsearch.x86_64 : Provides the omelasticsearch module
rsyslog-mmjsonparse.x86_64 : mmjsonparse support
rsyslog-mmutf8fix.x86_64 : mmutf8fix support
rsyslog-udpspoof.x86_64 : Provides the omudpspoof module

Then, of course: 'yum install rsyslog-udpspoof'.

--
Dave Caplinger, Director, Technical Product Management | 402.361.3063 | 
Solutionary — An NTT Group Security Company


On Oct 3, 2015, at 1:02 AM, David Lang  wrote:

It may be a separate package (I don't know how to query this through yum)

in the meantime, try addding the following

$template raw,"%rawmsg%"

and then change the forwarding via localhost from omudpspoof to udp with the 
template raw.

David Lang

On Fri, 2 Oct 2015, Randy Baca wrote:


Date: Fri, 2 Oct 2015 23:46:27 +
From: Randy Baca 
Reply-To: rsyslog-users 
To: rsyslog-users 
Subject: Re: [rsyslog] Complex forwarding and spoofing question
Upgraded to 8.13.  Looks like now the omudpspoof.so file is missing.  Does it 
not get installed automatically with the upgrade?  I checked in /lib64/rsyslog/ 
and it is not there.  It also didn't like the number I had set for the 

Re: [rsyslog] Separation of actions based on log source - with good performance

2015-10-06 Thread singh.janmejay
Let us ship a full implementation in 8.14, let us not remove it.

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting it's not-so-smart-assist technology.

On Oct 6, 2015 1:19 PM, "singh.janmejay"  wrote:

> How about this: I get to it next week and implement all that the proposal
> talks about and document it? It doesn't seem like a lot of work from where
> it is right now.
>
> I vote for not removing it.
>
> --
> Regards,
> Janmejay
>
> PS: Please blame the typos in this mail on my phone's uncivilized soft
> keyboard sporting it's not-so-smart-assist technology.
>
> On Oct 6, 2015 11:53 AM,  wrote:
>
>> Hello,
>> I have never heared such a nonsense.
>> Actually the number of applications that does not include features that
>> are not official documented shoult be extremly limited.
>>
>> The functionality is really usefull and already in big landscapes
>> productive.
>> Please, please do NOT remove the lookup-table from the main branch.
>> The functionaltiy works fine, I'm using this since march and I did not
>> have any issue since the latest patch of janmejay.
>>
>> Even the "concept" is not fully implemented (e.g. smaller things like
>> nomatch) the main part works fine.
>>
>>
>> My suggestion would be to document everything which is currently
>> implemented and keep the "conceptual documentation" as it is.
>> So the Maintainer should no longer have an issue with it.
>>
>>
>> If the main issue it the time to document the already implemented
>> features, I can create a patch.
>>
>>
>> Chris
>>
>>
>>
>> > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr
>> > Von: "David Lang" 
>> > An: rsyslog-users 
>> > Betreff: Re: [rsyslog] Separation of actions based on log source - with
>> good performance
>> >
>> > a CVE for something that requires manually enabling an experimental
>> feature???
>> >
>> > it would be one thing if a default config had the problem, or if it was
>> > something entirely dependent on remote data.
>> >
>> > I would be very tempted to respond to the CVE with "don't enable this
>> incomplete
>> > feature" as the solution. It's very common for incomplete features to be
>> > included in released versions
>> >
>> > grumble, we have enough real bugs to worry about.
>> >
>> > David Lang
>> >
>> > On Tue, 6 Oct 2015, Rainer Gerhards wrote:
>> >
>> > > Date: Tue, 6 Oct 2015 07:15:31 +0200
>> > > From: Rainer Gerhards 
>> > > Reply-To: rsyslog-users 
>> > > To: rsyslog-users 
>> > > Subject: Re: [rsyslog] Separation of actions based on log source -
>> with good
>> > > performance
>> > >
>> > > Sorry, folks, good intent always seems to find someone who turns it
>> > > into negative. I was yesterday contacted by a distro maintainer who
>> > > wants to turn this bug in the officially non-existant lookup table
>> > > feature into a CVE and insists that it is a vuln even after the
>> > > argument that the feature never oficially existed.
>> > >
>> > > It looks like it was a bad idea to merge potentially useful yet
>> > > incomplete code into the main branch (and documenting it to be not
>> > > present). It looks like I need to re-think my stance on experimental
>> > > features.
>> > >
>> > > Anyhow, I really don't want to support the argument that something
>> > > non-existing can be a CVE. As such, I will create a new
>> > > master-insecure branch, which will be a clone of the current master
>> > > branch. Then I'll remove the lookup table code, so that the code base
>> > > matches the documentation. I really don't want to create a general
>> > > principle here that we need to create CVEs (and patched) for something
>> > > that was just added as a convenience for a handful of folks who were
>> > > ready to take a risk.
>> > >
>> > > If there is sufficient interest, we can consider officially adding
>> > > this feature to the January 8.15 release iff it is ready by then.
>> > > @janmejay: please let me know if you would like to continue with your
>> > > work on lookup tables under this new situation.
>> > >
>> > > As soon as I have time, I'll check what else needs to be removed. Not
>> > > sure about the ./contributed branch, because the project cannot
>> > > guarantee at all this is bug-free. It's documented to be so, but if
>> > > that is not sufficient, it should probably live only in the
>> > > master-insecure branch.
>> > >
>> > > Rainer
>> > >
>> > > 2015-10-02 17:29 GMT+02:00 singh.janmejay :
>> > >> As of now it returns empty string for no-match. I guess we should go
>> ahead
>> > >> with it in its current form. We can add default value any time later
>> > >> without breaking compatibility(it'd default to "").
>> > >>
>> > >> I'll add a test for multiple tables on Monday.
>> > >>
>> > >> On Fri, Oct 2, 2015, 7:16 PM  

Re: [rsyslog] Separation of actions based on log source - with good performance

2015-10-06 Thread singh.janmejay
How about this: I get to it next week and implement all that the proposal
talks about and document it? It doesn't seem like a lot of work from where
it is right now.

I vote for not removing it.

--
Regards,
Janmejay

PS: Please blame the typos in this mail on my phone's uncivilized soft
keyboard sporting it's not-so-smart-assist technology.

On Oct 6, 2015 11:53 AM,  wrote:

> Hello,
> I have never heared such a nonsense.
> Actually the number of applications that does not include features that
> are not official documented shoult be extremly limited.
>
> The functionality is really usefull and already in big landscapes
> productive.
> Please, please do NOT remove the lookup-table from the main branch.
> The functionaltiy works fine, I'm using this since march and I did not
> have any issue since the latest patch of janmejay.
>
> Even the "concept" is not fully implemented (e.g. smaller things like
> nomatch) the main part works fine.
>
>
> My suggestion would be to document everything which is currently
> implemented and keep the "conceptual documentation" as it is.
> So the Maintainer should no longer have an issue with it.
>
>
> If the main issue it the time to document the already implemented
> features, I can create a patch.
>
>
> Chris
>
>
>
> > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr
> > Von: "David Lang" 
> > An: rsyslog-users 
> > Betreff: Re: [rsyslog] Separation of actions based on log source - with
> good performance
> >
> > a CVE for something that requires manually enabling an experimental
> feature???
> >
> > it would be one thing if a default config had the problem, or if it was
> > something entirely dependent on remote data.
> >
> > I would be very tempted to respond to the CVE with "don't enable this
> incomplete
> > feature" as the solution. It's very common for incomplete features to be
> > included in released versions
> >
> > grumble, we have enough real bugs to worry about.
> >
> > David Lang
> >
> > On Tue, 6 Oct 2015, Rainer Gerhards wrote:
> >
> > > Date: Tue, 6 Oct 2015 07:15:31 +0200
> > > From: Rainer Gerhards 
> > > Reply-To: rsyslog-users 
> > > To: rsyslog-users 
> > > Subject: Re: [rsyslog] Separation of actions based on log source -
> with good
> > > performance
> > >
> > > Sorry, folks, good intent always seems to find someone who turns it
> > > into negative. I was yesterday contacted by a distro maintainer who
> > > wants to turn this bug in the officially non-existant lookup table
> > > feature into a CVE and insists that it is a vuln even after the
> > > argument that the feature never oficially existed.
> > >
> > > It looks like it was a bad idea to merge potentially useful yet
> > > incomplete code into the main branch (and documenting it to be not
> > > present). It looks like I need to re-think my stance on experimental
> > > features.
> > >
> > > Anyhow, I really don't want to support the argument that something
> > > non-existing can be a CVE. As such, I will create a new
> > > master-insecure branch, which will be a clone of the current master
> > > branch. Then I'll remove the lookup table code, so that the code base
> > > matches the documentation. I really don't want to create a general
> > > principle here that we need to create CVEs (and patched) for something
> > > that was just added as a convenience for a handful of folks who were
> > > ready to take a risk.
> > >
> > > If there is sufficient interest, we can consider officially adding
> > > this feature to the January 8.15 release iff it is ready by then.
> > > @janmejay: please let me know if you would like to continue with your
> > > work on lookup tables under this new situation.
> > >
> > > As soon as I have time, I'll check what else needs to be removed. Not
> > > sure about the ./contributed branch, because the project cannot
> > > guarantee at all this is bug-free. It's documented to be so, but if
> > > that is not sufficient, it should probably live only in the
> > > master-insecure branch.
> > >
> > > Rainer
> > >
> > > 2015-10-02 17:29 GMT+02:00 singh.janmejay :
> > >> As of now it returns empty string for no-match. I guess we should go
> ahead
> > >> with it in its current form. We can add default value any time later
> > >> without breaking compatibility(it'd default to "").
> > >>
> > >> I'll add a test for multiple tables on Monday.
> > >>
> > >> On Fri, Oct 2, 2015, 7:16 PM   wrote:
> > >>
> > >>> Hi,
> > >>> No, I didn't. I even didn't realize the patch.
> > >>>
> > >>> It seems to be exactly the related issue. So I don't expect any
> further
> > >>> issues.
> > >>> I will use the new version on 2 systems. If there is any other
> issue, I
> > >>> will let you know.
> > >>> Release data for next rsyslog version is quite far so enough time to
> > >>> test... ;)
> > >>>
> > 

Re: [rsyslog] Remote syslog setup unsuccesful

2015-10-06 Thread GeertLiesveld
David,

I added a file: /etc/rsyslog.d/enableDebug.conf with the contents:
*.* /var/log/all.log;RSYSLOG_DebugFormat

And restarted rsyslog. I do not see a file all.log being created in
/var/log.

Could it be this feature was not included in rsyslog version 4.2.0 ?

Geert Liesveld.



--
View this message in context: 
http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589255.html
Sent from the rsyslog-users mailing list archive at Nabble.com.
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] Separation of actions based on log source - with good performance

2015-10-06 Thread Rainer Gerhards
2015-10-06 9:51 GMT+02:00 singh.janmejay :
> Let us ship a full implementation in 8.14, let us not remove it.

OK... I'd say let's see if this actually gets a CVE or not. If it
does, I think the proper thing to do is to remove what never was
really existing in 8.14. We can always introduce a new feature in
8.15. But for political reasons, I do not want to make this look like
a bugfix, but rather what it is: the first incarnation of a new
feature. I think those that really need this feature are in this
discussion thread and can build it themselves until 8.15 comes out.

The implications of makeing this look like a bug-fix are hard from my
PoV: I have already been asked to fix this "security vulnerability" in
older versions. The real fix is to remove the code in question, not to
spend another 2+ days on writing old stuff patches and advisories for
something that is really non-existent.

Over the past couple of month, we have done several improvements
to the build process and testing process and static code analysis and
cross-platform and... . While all of this is nice, it is also clearly
visible that the past 12 month were those with the least new features
and the least *bugfixes* since ever in rsyslog history. While I
definitely see value, I become more and more concerned if spending my
scare time on many close-to-cosmetic or quite-platform-specific issues
is really the right path forward. Sure we now know how to tell that
something is failing and we know it would be good to fix them. Two
years ago, we wouldn't know for sure, but the problems would have been
fixed already ;)

Rainer
>
> --
> Regards,
> Janmejay
>
> PS: Please blame the typos in this mail on my phone's uncivilized soft
> keyboard sporting it's not-so-smart-assist technology.
>
> On Oct 6, 2015 1:19 PM, "singh.janmejay"  wrote:
>
>> How about this: I get to it next week and implement all that the proposal
>> talks about and document it? It doesn't seem like a lot of work from where
>> it is right now.
>>
>> I vote for not removing it.
>>
>> --
>> Regards,
>> Janmejay
>>
>> PS: Please blame the typos in this mail on my phone's uncivilized soft
>> keyboard sporting it's not-so-smart-assist technology.
>>
>> On Oct 6, 2015 11:53 AM,  wrote:
>>
>>> Hello,
>>> I have never heared such a nonsense.
>>> Actually the number of applications that does not include features that
>>> are not official documented shoult be extremly limited.
>>>
>>> The functionality is really usefull and already in big landscapes
>>> productive.
>>> Please, please do NOT remove the lookup-table from the main branch.
>>> The functionaltiy works fine, I'm using this since march and I did not
>>> have any issue since the latest patch of janmejay.
>>>
>>> Even the "concept" is not fully implemented (e.g. smaller things like
>>> nomatch) the main part works fine.
>>>
>>>
>>> My suggestion would be to document everything which is currently
>>> implemented and keep the "conceptual documentation" as it is.
>>> So the Maintainer should no longer have an issue with it.
>>>
>>>
>>> If the main issue it the time to document the already implemented
>>> features, I can create a patch.
>>>
>>>
>>> Chris
>>>
>>>
>>>
>>> > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr
>>> > Von: "David Lang" 
>>> > An: rsyslog-users 
>>> > Betreff: Re: [rsyslog] Separation of actions based on log source - with
>>> good performance
>>> >
>>> > a CVE for something that requires manually enabling an experimental
>>> feature???
>>> >
>>> > it would be one thing if a default config had the problem, or if it was
>>> > something entirely dependent on remote data.
>>> >
>>> > I would be very tempted to respond to the CVE with "don't enable this
>>> incomplete
>>> > feature" as the solution. It's very common for incomplete features to be
>>> > included in released versions
>>> >
>>> > grumble, we have enough real bugs to worry about.
>>> >
>>> > David Lang
>>> >
>>> > On Tue, 6 Oct 2015, Rainer Gerhards wrote:
>>> >
>>> > > Date: Tue, 6 Oct 2015 07:15:31 +0200
>>> > > From: Rainer Gerhards 
>>> > > Reply-To: rsyslog-users 
>>> > > To: rsyslog-users 
>>> > > Subject: Re: [rsyslog] Separation of actions based on log source -
>>> with good
>>> > > performance
>>> > >
>>> > > Sorry, folks, good intent always seems to find someone who turns it
>>> > > into negative. I was yesterday contacted by a distro maintainer who
>>> > > wants to turn this bug in the officially non-existant lookup table
>>> > > feature into a CVE and insists that it is a vuln even after the
>>> > > argument that the feature never oficially existed.
>>> > >
>>> > > It looks like it was a bad idea to merge potentially useful yet
>>> > > incomplete code into the main branch (and documenting it to be not
>>> > > present). It looks like I need to 

Re: [rsyslog] Remote syslog setup unsuccesful

2015-10-06 Thread David Lang

no, that has existed since rsyslog 2.x or so
if you aren't getting anything at all in that log you either have a syntax error 
that's confusing rsyslog (run 'rsyslogd -N2' to have it do a syntax check of the 
main config file and anything included by that file) or you have something else 
in the config that's throwing the log away before you get to that line.


can you post the config file with that line in it?

David Lang



 On Tue, 6 Oct 2015, GeertLiesveld wrote:


Date: Tue, 6 Oct 2015 01:35:40 -0700 (MST)
From: GeertLiesveld 
Reply-To: rsyslog-users 
To: rsyslog@lists.adiscon.com
Subject: Re: [rsyslog] Remote syslog setup unsuccesful

David,

I added a file: /etc/rsyslog.d/enableDebug.conf with the contents:
*.* /var/log/all.log;RSYSLOG_DebugFormat

And restarted rsyslog. I do not see a file all.log being created in
/var/log.

Could it be this feature was not included in rsyslog version 4.2.0 ?

Geert Liesveld.



--
View this message in context: 
http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589255.html
Sent from the rsyslog-users mailing list archive at Nabble.com.
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread David Lang

I'm surprised it didn't help more than that. what do the impstats look
like.



Impstats looks good - only two dynafiles that still have evictions, which
I'll tune a little more now. See bottom of message for a full recent output.


at high traffic volumes increasing the batch size can also be helpful, but if 
the queue is usually empty it probably won't make much difference.



We have several %fromhost% templates for servers. I noticed that there is a
rush of DNS requests when rsyslog first starts up but then it quiets down
due to the internal caching. I also have a local DNS cache running to point
to - not currently set - but when I did have the system pointed there, I
still had missing messages.


well, you've fixed a lot of things, the dynafile thrashing was a major problem. 
DNS is probably less of an issue, and only an issue for UDP not TCP (UDP needs 
to do a lookup for each packet, TCP only for each connection)


I personally recommend using fromhost-ip instead of fromhost. If you want to use 
a name, use the hostname that the server put in the log message, if you need to 
ignore that and see where it last came from, the IP is faster and has fewer 
failure modes.


with your config, only traffic arriving on port 10517 will get written to the 
testingruleset, internal syslog messages won't show up.


now, there are things that can cause you to see messages via tcpdump and not see 
them in rsyslog.


The most common is iptables rules

But since you are writing to /tmp you could be running into permission problems 
with SELinux/AppArmor (depending on your distro). try putting the file under 
/var/log and see if it appears. It should appear when the first log message 
arrives.


by the way, you don't need to do *.*, just list the actions directly, so the 
following works just fine (permissions allowing)


ruleset(name="TestingRuleSet") {
  /tmp/debug.log;RSYSLOG_DebugFormat
  action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" 
name="writeEduroamACSFile")
  stop
} # end ruleset

in impstats, the /tmp/debug.log line will be action ## while the line after it 
will have a name, check what it says about messages that are processed/etc



Can you deliver messages to port 10517 via TCP? there is an odd case where a 
missing route can cause UDP messages to not be delivered to the app.



Oct  6 16:27:45 its-syslog-up1 rsyslogd-pstats: imudp(*:10517): origin=imudp

submitted=1221

Oct  6 16:27:45 its-syslog-up1 rsyslogd-pstats: imudp(*:10517): origin=imudp

submitted=0

yep, that says that messages are arriving

with impstats, do you have it configured to reset counters each time or not? 
there is a little inaccuracy that creeps in when resetting the counters, and the 
impstats analysis tool that Rainer pointed you at assumes the counters aren't 
being reset.


We really need at least two sets of stats to be able to see what's happening.

now, looking at the impstats

I see one smoking gun:

main Q: origin=core.queue size=149 enqueued=492522977 full=51 discarded.full=0 
discarded.nf=0 maxqsize=10


maxqsize is such a nice round number that I'll bet it's the max you have 
configured. That means that at some point rsyslog filled up the queue completely 
and would have been unable to process any more messages. That means that UDP 
messages that arrived during that time would have been dropped.


this also says it was full 51 times, so that is at least 51 messages lost.

on a high volume server, you probably want to bump the main queue size up a bit 
larger as well.


another issue is that several actions show that they have been suspended and not 
resumed, that will cause logs to be missing from those outputs (action2, 30, 43, 
67). one advantage of the new action() format is the ability to name things. 
Even if you don't name everything, naming several of them will help figure out 
what the rest are.


as you said, there are still some dynafile cache settings that probably need to 
be increased. They can slow everything down and cause the queue to fill (but 
without knowing if the stats are reset, I don't know if the numbers are since 
startup or since the last report) Since the busiest outputs are also the ones 
with the highest miss rates, you may end up with a more dramatic I/O savings 
once you fix them.


I don't understand why you have multiple lines for the same thing (imudp and 
writeEduroamACSFile for example), unless it's one line per thread, in which case 
these are showing that you really don't need multiple threads for the inputs, 
you aren't using them :-)


David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] Separation of actions based on log source - with good performance

2015-10-06 Thread singh.janmejay
Ok. But we'll have to keep merging master into master-insecure going forward.

Im ok working on it off any branch.

On Tue, Oct 6, 2015 at 3:14 PM, Rainer Gerhards
 wrote:
> 2015-10-06 9:51 GMT+02:00 singh.janmejay :
>> Let us ship a full implementation in 8.14, let us not remove it.
>
> OK... I'd say let's see if this actually gets a CVE or not. If it
> does, I think the proper thing to do is to remove what never was
> really existing in 8.14. We can always introduce a new feature in
> 8.15. But for political reasons, I do not want to make this look like
> a bugfix, but rather what it is: the first incarnation of a new
> feature. I think those that really need this feature are in this
> discussion thread and can build it themselves until 8.15 comes out.
>
> The implications of makeing this look like a bug-fix are hard from my
> PoV: I have already been asked to fix this "security vulnerability" in
> older versions. The real fix is to remove the code in question, not to
> spend another 2+ days on writing old stuff patches and advisories for
> something that is really non-existent.
>
> Over the past couple of month, we have done several improvements
> to the build process and testing process and static code analysis and
> cross-platform and... . While all of this is nice, it is also clearly
> visible that the past 12 month were those with the least new features
> and the least *bugfixes* since ever in rsyslog history. While I
> definitely see value, I become more and more concerned if spending my
> scare time on many close-to-cosmetic or quite-platform-specific issues
> is really the right path forward. Sure we now know how to tell that
> something is failing and we know it would be good to fix them. Two
> years ago, we wouldn't know for sure, but the problems would have been
> fixed already ;)
>
> Rainer
>>
>> --
>> Regards,
>> Janmejay
>>
>> PS: Please blame the typos in this mail on my phone's uncivilized soft
>> keyboard sporting it's not-so-smart-assist technology.
>>
>> On Oct 6, 2015 1:19 PM, "singh.janmejay"  wrote:
>>
>>> How about this: I get to it next week and implement all that the proposal
>>> talks about and document it? It doesn't seem like a lot of work from where
>>> it is right now.
>>>
>>> I vote for not removing it.
>>>
>>> --
>>> Regards,
>>> Janmejay
>>>
>>> PS: Please blame the typos in this mail on my phone's uncivilized soft
>>> keyboard sporting it's not-so-smart-assist technology.
>>>
>>> On Oct 6, 2015 11:53 AM,  wrote:
>>>
 Hello,
 I have never heared such a nonsense.
 Actually the number of applications that does not include features that
 are not official documented shoult be extremly limited.

 The functionality is really usefull and already in big landscapes
 productive.
 Please, please do NOT remove the lookup-table from the main branch.
 The functionaltiy works fine, I'm using this since march and I did not
 have any issue since the latest patch of janmejay.

 Even the "concept" is not fully implemented (e.g. smaller things like
 nomatch) the main part works fine.


 My suggestion would be to document everything which is currently
 implemented and keep the "conceptual documentation" as it is.
 So the Maintainer should no longer have an issue with it.


 If the main issue it the time to document the already implemented
 features, I can create a patch.


 Chris



 > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr
 > Von: "David Lang" 
 > An: rsyslog-users 
 > Betreff: Re: [rsyslog] Separation of actions based on log source - with
 good performance
 >
 > a CVE for something that requires manually enabling an experimental
 feature???
 >
 > it would be one thing if a default config had the problem, or if it was
 > something entirely dependent on remote data.
 >
 > I would be very tempted to respond to the CVE with "don't enable this
 incomplete
 > feature" as the solution. It's very common for incomplete features to be
 > included in released versions
 >
 > grumble, we have enough real bugs to worry about.
 >
 > David Lang
 >
 > On Tue, 6 Oct 2015, Rainer Gerhards wrote:
 >
 > > Date: Tue, 6 Oct 2015 07:15:31 +0200
 > > From: Rainer Gerhards 
 > > Reply-To: rsyslog-users 
 > > To: rsyslog-users 
 > > Subject: Re: [rsyslog] Separation of actions based on log source -
 with good
 > > performance
 > >
 > > Sorry, folks, good intent always seems to find someone who turns it
 > > into negative. I was yesterday contacted by a distro maintainer who
 > > wants to turn this bug in the officially non-existant 

Re: [rsyslog] Remote syslog setup unsuccesful

2015-10-06 Thread Rainer Gerhards
2015-10-06 12:16 GMT+02:00 David Lang :
> no, that has existed since rsyslog 2.x or so

As I was not sure, I checked out 4.2.0 and the RSYSLOG_DebugFormat
template is indeed available in it.

Rainer
> if you aren't getting anything at all in that log you either have a syntax
> error that's confusing rsyslog (run 'rsyslogd -N2' to have it do a syntax
> check of the main config file and anything included by that file) or you
> have something else in the config that's throwing the log away before you
> get to that line.
>
> can you post the config file with that line in it?
>
> David Lang
>
>
>
>  On Tue, 6 Oct 2015, GeertLiesveld wrote:
>
>> Date: Tue, 6 Oct 2015 01:35:40 -0700 (MST)
>> From: GeertLiesveld 
>> Reply-To: rsyslog-users 
>> To: rsyslog@lists.adiscon.com
>> Subject: Re: [rsyslog] Remote syslog setup unsuccesful
>>
>>
>> David,
>>
>> I added a file: /etc/rsyslog.d/enableDebug.conf with the contents:
>> *.* /var/log/all.log;RSYSLOG_DebugFormat
>>
>> And restarted rsyslog. I do not see a file all.log being created in
>> /var/log.
>>
>> Could it be this feature was not included in rsyslog version 4.2.0 ?
>>
>> Geert Liesveld.
>>
>>
>>
>> --
>> View this message in context:
>> http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589255.html
>> Sent from the rsyslog-users mailing list archive at Nabble.com.
>> ___
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>> LIKE THAT.
>>
> ___
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> LIKE THAT.
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


[rsyslog] RFC: dynamic-stats support

2015-10-06 Thread singh.janmejay
Hi,

I am working on support for stats with dynamic-name. This comes handy
in situations where metric-name is dependent upon value of a certain
attribute of the message.

Say, for a central log-aggregation service, its valuable to know what
is inbound message-count distribution across application-clusters that
send logs to it, or for a shared-server, its valuable to know what is
the log-volume generation across users etc.

Im thinking of using functions-like interface to support this. It may
look similar to this:


dyn_stats("user_msg_count")

...

ruleset(...) {
...
dyn_inc("user_msg_count", $.user)
...
}


dyn_stats signature looks like:
dyn_stats(, , , )

dyn_inc signature looks like:
dyn_inc(, )


Reporting would work similar to static-metric via impstats. Mapping:
statsobj_s.name = name_space
statsobj_s.origin = "dyn"
ctr_s.name = "foo" (say $.user had value foo)


Thoughts / suggestions?

-- 
Regards,
Janmejay
http://codehunk.wordpress.com
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread David Lang

On Tue, 6 Oct 2015, singh.janmejay wrote:


Hi,

I am working on support for stats with dynamic-name. This comes handy
in situations where metric-name is dependent upon value of a certain
attribute of the message.

Say, for a central log-aggregation service, its valuable to know what
is inbound message-count distribution across application-clusters that
send logs to it, or for a shared-server, its valuable to know what is
the log-volume generation across users etc.

Im thinking of using functions-like interface to support this. It may
look similar to this:


dyn_stats("user_msg_count")

...

ruleset(...) {
...
dyn_inc("user_msg_count", $.user)
...
}


dyn_stats signature looks like:
dyn_stats(, , , )

dyn_inc signature looks like:
dyn_inc(, )


Reporting would work similar to static-metric via impstats. Mapping:
statsobj_s.name = name_space
statsobj_s.origin = "dyn"
ctr_s.name = "foo" (say $.user had value foo)


Thoughts / suggestions?


how is this different/better than global variables? (although we may need to 
implement soem functions, atomic inc/dec copy+clear) If you have pstats output 
in json format, you can even piggyback on it's schedule to output the data.



things like stats can very easily end up being expensive in terms of locking 
(something global variables already have figured out), and it sounds like you 
are proposing adding a scheduler of some sort to output the data.


variables should not need to be 'recycled', either they contain data or they 
don't. If they contain data, you need to keep the data until you do something 
with it, if they don't, you don't have to track them.



I am actually doing this sort of thing external to rsyslog in SEC

I have a template in rsyslog that contains hostname, fromhost-ip, programname 
and I output it via improg to SEC. SEC accumulates counters and has scheduled 
outputs to files.


before I started using SEC for this, I used the same template to output to a 
file and then for reports, used cut + sort + uniq -c to extract the data I need. 
When the files only contain the significant data, this is actually not bad to 
do, even at higher volumes.


David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] omfile fuse

2015-10-06 Thread Rainer Gerhards
2015-10-06 8:26 GMT+02:00 David Lang :
> On Tue, 6 Oct 2015, Rainer Gerhards wrote:
>
>> 2015-10-06 7:27 GMT+02:00 David Lang :
>>>
>>> On Tue, 6 Oct 2015, Rainer Gerhards wrote:
>>>
 2015-10-06 0:09 GMT+02:00 Kendall Green :
>
>
> I've set asyncWriting="off" and the only difference is, that with 1000
> message test, all were delivered on HUP and/or service restart, but
> apparently not on flushInterval or buffer full. I'm currently running
> another test with 100k messages iterating about 10 logs/sec, and
> watching
> the output for number of lines. The output file is created on dfs-fuse
> mount upon receipt of the first log, but 0 lines written.
>
> This current test will hopefully show how many logs get processed
> without
> intervention over the next ~3 hours, and if still no logs appear will
> see
> how many are written upon HUP signal.
>
> I'm unsure how the asyncWriting off impacts the rest of the config, and
> if
> the current config has disabled compression... or not with no zip.
>
> This note is from omfile asyncWriting doc:
> *Note that in order to enable FlushInterval, AsyncWriting must be set
> to
> “on”. Otherwise, the flush interval will be ignored. Also note that
> when
> FlushOnTXEnd is “on” but AsyncWriting is off, output will only be
> written
> when the buffer is full. This may take several hours, or even require a
> rsyslog shutdown. However, a buffer flush can be forced in that case by
> sending rsyslogd a HUP signal.*
>
> Please help clarify how these parameters correspond and what the
> settings
> can achieve expedited log transport, avoiding long delays waiting for
> buffering or flush intervals. This goal is why have chosen low settings
> for
> IOBufferSize and flushInterval.



 Without async writing, everything is written sync on when it arrives
 (more precisely at commit of the current batch) aka "as quickly as
 possible". Thus, these params do not exist because they have no
 meaning. Note that with a very large batch and a small buffer, partial
 records may be written.
>>>
>>>
>>>
>>> although without the checkpoint parameter the writes are to the OS and
>>> they
>>> get flushed to disk at a later point.
>>
>>
>> you mean "sync", right?
>
>
> I may be misunderstanding, but i thought the checkpoint (or checkpoint
> interval) parameter controleld how frequently you did the sync (so turning
> sync on didn't make it per every message)

That checkopoint interval is for queues (which share the same util
object), not for omfile. There, what you say is right. For omfile (if
not set to async writing), the sync happens at the end of each batch,
which usually means after each write because usually a full batch goes
into the write buffer. If the batch has more data than the write
buffer fits, all buffer writes are done and then the flush.

I've also updated the doc to be more precise:

https://github.com/rsyslog/rsyslog-doc/commit/b2bb5d93787f9662eb41499c86f881d2bd649dce

Rainer
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread Woodruff, Dan
Thank you David. From what I see, none of the queues are getting full.

None of the actions - I have 73 total - have a 'failed' count of greater
than zero. Four have a 'suspended' count in the single digits. 

When you said "if you are writing to a local file, you should not need to
have a separate queue", does queue equate to ruleset in this case? The main
reason I have the separate rulesets is to handle different kinds of incoming
logs on different ports, making it easier to write the rules logic for the
different types. Does that make sense?

There is a high number of dynafiles, specifically for server logs, which we
break out into files by host and programname with these templates:
template(name="LinuxProgramFile" type="string"
string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log")
template(name="WindowsProgramFile" type="string"
string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log
")

The result is about 32,000 total dynafiles which are then rotated out daily.

Looking at top, rsyslogd as a whole never goes above 120%. Most of the
threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting
about 50% during peak. Although I hadn't noticed this before, watching top
for a few minutes this morning, rsyslogd hangs out around 60%, then bursts
to almost 200% for one second, then disappears from top in the next second,
before returning to 60%. It cycled through this every 10 seconds or so.
After restarting rsyslogd, the behavior went away - it's staying around 60%.

Should I try lowering the number of threads for the rulesets/imudp? This
issue does happen with both TCP and UDP, however, which further compounds my
confusion. You mention systemd - I'll look into that as I have a case open
with RH.

Dan


-Original Message-
From: rsyslog-boun...@lists.adiscon.com
[mailto:rsyslog-boun...@lists.adiscon.com] On Behalf Of David Lang
Sent: Monday, October 5, 2015 3:23 PM
To: rsyslog-users 
Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

doing a cut-n-past since this client doesn't quote signed messages properly:


David, thank you. Apologies, I had cut out the dynafile for testing. Here is
some additional config file:

template(name="EduroamACSFile" type="string"
string="/var/log/collection/eduroam_acs.log")
template(name="EduroamACSFormat" type="string" string="%timestamp%
%hostname% %syslogtag%%msg:::escape-cc%\n")

ruleset(name="ruleset1" queue.type="LinkedList" queue.size="75"
queue.workerthreads="4") {
 # at very top of ruleset - IPs obscured for privacy
 if $fromhost-ip == 'x.x.2.75'
 or $fromhost-ip == 'x.x.2.85'
 then {
 /tmp/debug.log;RSYSLOG_DebugFormat
 action(type="omfile" dynaFile="EduroamACSFile"
template="EduroamACSFormat" name="writeEduroamACSFile")
 stop
 }
 #... other rules below
} # end ruleset

I do have impstats enabled for every 5 minutes. Here's output specific to
this file for a recent sample for this file, plus the general rule set
queues:

Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache
EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342
evicted=0 maxused=1 closetimeouts=51
Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile:
origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0
resumed=0

Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage:
origin=impstats utime=217851481196 stime=87433080781 maxrss=628664
minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592
nvcsw=379912551 nivcsw=20999555
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet:
origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0
discarded.nf=0 maxqsize=350911
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet:
origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0
discarded.nf=0 maxqsize=156862
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet:
origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0
discarded.nf=0 maxqsize=4243
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet:
origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0
discarded.nf=0 maxqsize=456471
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue
size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0
maxqsize=1738
Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w1): origin=imudp
called.recvmmsg=8282193695 called.recvmsg=0 msgs.received=2833007396 Oct  5
14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w0): origin=imudp
called.recvmmsg=14154695256 called.recvmsg=0 msgs.received=8909255564

To estimate volume of logs this is receiving, I took a recent impstats
output for enqueued for all rule sets and added them, then subtracted the
values from an hour ago, divided by 3600 to get msgs/second and got 29521.
This 

Re: [rsyslog] Remote syslog setup unsuccesful

2015-10-06 Thread GeertLiesveld
Rainer, 
thanks for your reply. 
For now , I just started rsyslog in the foreground using the -d option. That
was enough to see that the messages from the remote host did not come in.
I checked tcpdump again and saw the messages were actually delivered to the
interface. 

We restored a broken install for iptables and that showed that all traffic
should come in.
So we started a test with netcat listening on port UDP 514 instead of
rsyslog (stopped).
That showed indeed nothing was coming in, which ruled out an error in the
configuration of rsyslog.

We checked the other settings for the OS. We have both ipv6 and ipv4
configured in this system. For Ipv6 there is a gateway configured. For ipv4
there was not. Actually, this was one of the things that David mentioned
earlier. Well, it seems he was exacly right! we configured a default gateway
for ipv4 as well, and after starting the rsyslog service, messages were
logged in the way they are supposed to, in a separate directory using the
%FROMHOST% option.

I want to thank you very much for your most welcome help.  

Best Regards,

Geert Liesveld.



--
View this message in context: 
http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589263.html
Sent from the rsyslog-users mailing list archive at Nabble.com.
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread Rainer Gerhards
2015-10-06 14:57 GMT+02:00 Woodruff, Dan :
> Thank you David. From what I see, none of the queues are getting full.
>
> None of the actions - I have 73 total - have a 'failed' count of greater
> than zero. Four have a 'suspended' count in the single digits.
>
> When you said "if you are writing to a local file, you should not need to
> have a separate queue", does queue equate to ruleset in this case? The main
> reason I have the separate rulesets is to handle different kinds of incoming
> logs on different ports, making it easier to write the rules logic for the
> different types. Does that make sense?
>
> There is a high number of dynafiles, specifically for server logs, which we
> break out into files by host and programname with these templates:
> template(name="LinuxProgramFile" type="string"
> string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log")
> template(name="WindowsProgramFile" type="string"
> string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log
> ")
>
> The result is about 32,000 total dynafiles which are then rotated out daily.
>
> Looking at top, rsyslogd as a whole never goes above 120%. Most of the
> threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting
> about 50% during peak. Although I hadn't noticed this before, watching top
> for a few minutes this morning, rsyslogd hangs out around 60%, then bursts
> to almost 200% for one second, then disappears from top in the next second,
> before returning to 60%. It cycled through this every 10 seconds or so.
> After restarting rsyslogd, the behavior went away - it's staying around 60%.

This could be caused by a too-small dynafile cache. I haven't located
the conf for these actions with 32,000 files nor the stats, but
depending on size it could be the problem. Please post the *full*
impstats log.

Rainer
>
> Should I try lowering the number of threads for the rulesets/imudp? This
> issue does happen with both TCP and UDP, however, which further compounds my
> confusion. You mention systemd - I'll look into that as I have a case open
> with RH.
>
> Dan
>
>
> -Original Message-
> From: rsyslog-boun...@lists.adiscon.com
> [mailto:rsyslog-boun...@lists.adiscon.com] On Behalf Of David Lang
> Sent: Monday, October 5, 2015 3:23 PM
> To: rsyslog-users 
> Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
>
> doing a cut-n-past since this client doesn't quote signed messages properly:
>
>
> David, thank you. Apologies, I had cut out the dynafile for testing. Here is
> some additional config file:
>
> template(name="EduroamACSFile" type="string"
> string="/var/log/collection/eduroam_acs.log")
> template(name="EduroamACSFormat" type="string" string="%timestamp%
> %hostname% %syslogtag%%msg:::escape-cc%\n")
>
> ruleset(name="ruleset1" queue.type="LinkedList" queue.size="75"
> queue.workerthreads="4") {
>  # at very top of ruleset - IPs obscured for privacy
>  if $fromhost-ip == 'x.x.2.75'
>  or $fromhost-ip == 'x.x.2.85'
>  then {
>  /tmp/debug.log;RSYSLOG_DebugFormat
>  action(type="omfile" dynaFile="EduroamACSFile"
> template="EduroamACSFormat" name="writeEduroamACSFile")
>  stop
>  }
>  #... other rules below
> } # end ruleset
>
> I do have impstats enabled for every 5 minutes. Here's output specific to
> this file for a recent sample for this file, plus the general rule set
> queues:
>
> Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache
> EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342
> evicted=0 maxused=1 closetimeouts=51
> Oct  5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile:
> origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0
> resumed=0
>
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage:
> origin=impstats utime=217851481196 stime=87433080781 maxrss=628664
> minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592
> nvcsw=379912551 nivcsw=20999555
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet:
> origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0
> discarded.nf=0 maxqsize=350911
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet:
> origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0
> discarded.nf=0 maxqsize=156862
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet:
> origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0
> discarded.nf=0 maxqsize=4243
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet:
> origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0
> discarded.nf=0 maxqsize=456471
> Oct  5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue
> size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0
> maxqsize=1738
> Oct  5 14:11:47 its-syslog-up1 

Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread Woodruff, Dan
Rainer, thank you. Here is a full extract of a recent impstats run:

Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: imuxsock: origin=imuxsock
submitted=282984 ratelimit.discarded=0 ratelimit.numratelimiters=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache StatsFile:
origin=omfile requests=326648 level0=325603 missed=1045 evicted=0 maxused=1
closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 1: origin=core.action
processed=326648 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 2: origin=core.action
processed=3255090916 failed=0 suspended=2 suspended.duration=240 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LDAPFile:
origin=omfile requests=3171530030 level0=3171529033 missed=997 evicted=0
maxused=1 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 3: origin=core.action
processed=3171530030 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683
missed=91303303 evicted=91292586 maxused=10 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 4: origin=core.action
processed=138877957 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxUFWFile:
origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0
closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 5: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache DHCPFile:
origin=omfile requests=3518037 level0=3516992 missed=1045 evicted=0
maxused=1 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 6: origin=core.action
processed=3518037 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
LinuxBlankProgramFile: origin=omfile requests=93283 level0=28347 missed=1932
evicted=15 maxused=10 closetimeouts=303
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 7: origin=core.action
processed=93283 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
LinuxProgramFile: origin=omfile requests=358821719 level0=245218530
missed=23914416 evicted=23903728 maxused=10 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 8: origin=core.action
processed=358821719 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
10514Catchall: origin=omfile requests=0 level0=0 missed=0 evicted=0
maxused=0 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 9: origin=core.action
processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
WindowsSecFile: origin=omfile requests=232842816 level0=189594825
missed=19555094 evicted=19544459 maxused=10 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 10:
origin=core.action processed=232842816 failed=0 suspended=0
suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
WindowsSecFile: origin=omfile requests=18059 level0=17740 missed=319
evicted=0 maxused=1 closetimeouts=68
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 11:
origin=core.action processed=18059 failed=0 suspended=0 suspended.duration=0
resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
WindowsAppFile: origin=omfile requests=14111964 level0=8603913
missed=4387715 evicted=4377380 maxused=10 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 12:
origin=core.action processed=14111964 failed=0 suspended=0
suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
WindowsSysFile: origin=omfile requests=7726972 level0=4070882 missed=3120334
evicted=3110190 maxused=10 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 13:
origin=core.action processed=7726972 failed=0 suspended=0
suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
ParkingPayStationFile: origin=omfile requests=0 level0=0 missed=0 evicted=0
maxused=0 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 14:
origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0
resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
WindowsProgramFileNoDotLog: origin=omfile requests=238047839
level0=217994829 missed=17165101 evicted=17154494 maxused=10 closetimeouts=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 15:
origin=core.action processed=238047839 failed=0 suspended=0
suspended.duration=0 resumed=0
Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
WindowsProgramFile: origin=omfile 

Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread Rainer Gerhards
There are a couple of those:

LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683
missed=91303303 evicted=91292586 maxused=10 closetimeouts=0

have a look at the evicted count! This totally kills performance.

Each time one entry needs to be evicted, the file is closed, and a new
one is opened. You should set the cache size for this action to match
roughly the number of files that you want to keep open during normal
operations (e.g. if you have 1000 hosts and include hour and hostname
in the dynafile name, the cache size should be around 1000, except if
you have some low talkers.

Also note the online tool at

http://www.rsyslog.com/impstats-analyzer/

It probably will spit out a couple of warnings if you submit the pstats.

HTH
Rainer

2015-10-06 15:29 GMT+02:00 Woodruff, Dan :
> Rainer, thank you. Here is a full extract of a recent impstats run:
>
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: imuxsock: origin=imuxsock
> submitted=282984 ratelimit.discarded=0 ratelimit.numratelimiters=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache StatsFile:
> origin=omfile requests=326648 level0=325603 missed=1045 evicted=0 maxused=1
> closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 1: origin=core.action
> processed=326648 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 2: origin=core.action
> processed=3255090916 failed=0 suspended=2 suspended.duration=240 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LDAPFile:
> origin=omfile requests=3171530030 level0=3171529033 missed=997 evicted=0
> maxused=1 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 3: origin=core.action
> processed=3171530030 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683
> missed=91303303 evicted=91292586 maxused=10 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 4: origin=core.action
> processed=138877957 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxUFWFile:
> origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0
> closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 5: origin=core.action
> processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache DHCPFile:
> origin=omfile requests=3518037 level0=3516992 missed=1045 evicted=0
> maxused=1 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 6: origin=core.action
> processed=3518037 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> LinuxBlankProgramFile: origin=omfile requests=93283 level0=28347 missed=1932
> evicted=15 maxused=10 closetimeouts=303
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 7: origin=core.action
> processed=93283 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> LinuxProgramFile: origin=omfile requests=358821719 level0=245218530
> missed=23914416 evicted=23903728 maxused=10 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 8: origin=core.action
> processed=358821719 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> 10514Catchall: origin=omfile requests=0 level0=0 missed=0 evicted=0
> maxused=0 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 9: origin=core.action
> processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> WindowsSecFile: origin=omfile requests=232842816 level0=189594825
> missed=19555094 evicted=19544459 maxused=10 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 10:
> origin=core.action processed=232842816 failed=0 suspended=0
> suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> WindowsSecFile: origin=omfile requests=18059 level0=17740 missed=319
> evicted=0 maxused=1 closetimeouts=68
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 11:
> origin=core.action processed=18059 failed=0 suspended=0 suspended.duration=0
> resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> WindowsAppFile: origin=omfile requests=14111964 level0=8603913
> missed=4387715 evicted=4377380 maxused=10 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 12:
> origin=core.action processed=14111964 failed=0 suspended=0
> suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> WindowsSysFile: origin=omfile requests=7726972 level0=4070882 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread singh.janmejay
Rainer,

I see this as something completely outside the scope of  variables.
Building stats collector over variables is possible, but then we are
then talking about a general purpose language which allows building
such complex things. This increases the scope of Rainerscript and with
larger scope comes complexity. I feel this is in-line with the other
Lua discussion where you emphasized that Rainerscript should not
become a fully-general-purpose language?

Eg. creating an atomic-increment function for variable requires that
we educate users about what can and can't be done if atomic-increment
function is used anywhere on a variable. What relationship they can
expect it to have with other atomic-incrementing variables (which gets
into memory model).



On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
 wrote:
> I can't fully dig into this, but I think we must *very carefully*
> evaluate the overall design. Some time ago we introduced arrays for
> the limited liblognorm use case, and it hurts us every now and then
> when folks want to use arrays for other use cases. It may probably
> make sense to re-think how the variable engine etc behaves before
> adding more functionality. And make sure that everything works smooth
> in all use cases. While anything else may take care for some use
> cases, I fear we may get too fragmented. At least this is what I
> learned in the past months discussions.
>
> Anyone else?
>
> Rainer
>
> 2015-10-06 17:10 GMT+02:00 singh.janmejay :
>> It is possible to use global-variables (it'll require some
>> enhancements, table-support etc), but it'll be very inefficient
>> compared to this approach. For instance, choice of data-structure etc
>> allows making the solution a lot more efficient.
>>
>> Here its possible to locklessly increment counters in most cases, so
>> its overhead is a lot lesser than global-variables.
>>
>> Recycle is precisely to allow this lockless mechanism to work. Its
>> basically saying, it'll track metric-names he has seen in last 1 hour.
>> If we kill tracking of it as soon as we don't see an increment
>> (between 2 reporting runs of impstats), it'll lead to unnecessary
>> churn when low-values are common or load is not uniform in time.
>>
>> Implementing it on top of global-variables is not only has very high
>> performance-penalty(it'll be prohibitive for high-throughput
>> scenarios), it also exposes too much complexity to the user (where
>> user has to worry about reset etc).
>>
>> I don't plan to have a scheduler in this implementation.
>> GetAllStatsLines call will purge the tree instead of reset at that
>> interval. Its basically a balance between freeing-up memory occupied
>> by stale-metric-names vs. performance (lockless handling of
>> increment). So it will be governed by impstat schedule. May be I
>> should change name to better name (equivalent of
>> purge_known_keys_after_they_have_been_reported_N_times).
>>
>>
>> On Tue, Oct 6, 2015 at 4:30 PM, David Lang  wrote:
>>> On Tue, 6 Oct 2015, singh.janmejay wrote:
>>>
 Hi,

 I am working on support for stats with dynamic-name. This comes handy
 in situations where metric-name is dependent upon value of a certain
 attribute of the message.

 Say, for a central log-aggregation service, its valuable to know what
 is inbound message-count distribution across application-clusters that
 send logs to it, or for a shared-server, its valuable to know what is
 the log-volume generation across users etc.

 Im thinking of using functions-like interface to support this. It may
 look similar to this:

 
 dyn_stats("user_msg_count")

 ...

 ruleset(...) {
 ...
 dyn_inc("user_msg_count", $.user)
 ...
 }
 

 dyn_stats signature looks like:
 dyn_stats(, , >>> default=10k>, )

 dyn_inc signature looks like:
 dyn_inc(, )


 Reporting would work similar to static-metric via impstats. Mapping:
 statsobj_s.name = name_space
 statsobj_s.origin = "dyn"
 ctr_s.name = "foo" (say $.user had value foo)


 Thoughts / suggestions?
>>>
>>>
>>> how is this different/better than global variables? (although we may need to
>>> implement soem functions, atomic inc/dec copy+clear) If you have pstats
>>> output in json format, you can even piggyback on it's schedule to output the
>>> data.
>>>
>>>
>>> things like stats can very easily end up being expensive in terms of locking
>>> (something global variables already have figured out), and it sounds like
>>> you are proposing adding a scheduler of some sort to output the data.
>>>
>>> variables should not need to be 'recycled', either they contain data or they
>>> don't. If they contain data, you need to keep the data until you do
>>> something with it, if they don't, you don't have to track them.
>>>
>>>
>>> I am actually doing 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread Brian Knox
I personally would argue that stats around the actual content of syslog
messages is outside of the domain that rsyslog should be responsible for.

impstats makes sense to me as it provides statistics around rsyslogs
operation itself.  Once I start wanting stats and counters around message
content, I would rather delegate that to a different system entirely.

On Tue, Oct 6, 2015 at 12:04 PM, singh.janmejay 
wrote:

> Rainer,
>
> I see this as something completely outside the scope of  variables.
> Building stats collector over variables is possible, but then we are
> then talking about a general purpose language which allows building
> such complex things. This increases the scope of Rainerscript and with
> larger scope comes complexity. I feel this is in-line with the other
> Lua discussion where you emphasized that Rainerscript should not
> become a fully-general-purpose language?
>
> Eg. creating an atomic-increment function for variable requires that
> we educate users about what can and can't be done if atomic-increment
> function is used anywhere on a variable. What relationship they can
> expect it to have with other atomic-incrementing variables (which gets
> into memory model).
>
>
>
> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
>  wrote:
> > I can't fully dig into this, but I think we must *very carefully*
> > evaluate the overall design. Some time ago we introduced arrays for
> > the limited liblognorm use case, and it hurts us every now and then
> > when folks want to use arrays for other use cases. It may probably
> > make sense to re-think how the variable engine etc behaves before
> > adding more functionality. And make sure that everything works smooth
> > in all use cases. While anything else may take care for some use
> > cases, I fear we may get too fragmented. At least this is what I
> > learned in the past months discussions.
> >
> > Anyone else?
> >
> > Rainer
> >
> > 2015-10-06 17:10 GMT+02:00 singh.janmejay :
> >> It is possible to use global-variables (it'll require some
> >> enhancements, table-support etc), but it'll be very inefficient
> >> compared to this approach. For instance, choice of data-structure etc
> >> allows making the solution a lot more efficient.
> >>
> >> Here its possible to locklessly increment counters in most cases, so
> >> its overhead is a lot lesser than global-variables.
> >>
> >> Recycle is precisely to allow this lockless mechanism to work. Its
> >> basically saying, it'll track metric-names he has seen in last 1 hour.
> >> If we kill tracking of it as soon as we don't see an increment
> >> (between 2 reporting runs of impstats), it'll lead to unnecessary
> >> churn when low-values are common or load is not uniform in time.
> >>
> >> Implementing it on top of global-variables is not only has very high
> >> performance-penalty(it'll be prohibitive for high-throughput
> >> scenarios), it also exposes too much complexity to the user (where
> >> user has to worry about reset etc).
> >>
> >> I don't plan to have a scheduler in this implementation.
> >> GetAllStatsLines call will purge the tree instead of reset at that
> >> interval. Its basically a balance between freeing-up memory occupied
> >> by stale-metric-names vs. performance (lockless handling of
> >> increment). So it will be governed by impstat schedule. May be I
> >> should change name to better name (equivalent of
> >> purge_known_keys_after_they_have_been_reported_N_times).
> >>
> >>
> >> On Tue, Oct 6, 2015 at 4:30 PM, David Lang  wrote:
> >>> On Tue, 6 Oct 2015, singh.janmejay wrote:
> >>>
>  Hi,
> 
>  I am working on support for stats with dynamic-name. This comes handy
>  in situations where metric-name is dependent upon value of a certain
>  attribute of the message.
> 
>  Say, for a central log-aggregation service, its valuable to know what
>  is inbound message-count distribution across application-clusters that
>  send logs to it, or for a shared-server, its valuable to know what is
>  the log-volume generation across users etc.
> 
>  Im thinking of using functions-like interface to support this. It may
>  look similar to this:
> 
>  
>  dyn_stats("user_msg_count")
> 
>  ...
> 
>  ruleset(...) {
>  ...
>  dyn_inc("user_msg_count", $.user)
>  ...
>  }
>  
> 
>  dyn_stats signature looks like:
>  dyn_stats(, ,   default=10k>, )
> 
>  dyn_inc signature looks like:
>  dyn_inc(, )
> 
> 
>  Reporting would work similar to static-metric via impstats. Mapping:
>  statsobj_s.name = name_space
>  statsobj_s.origin = "dyn"
>  ctr_s.name = "foo" (say $.user had value foo)
> 
> 
>  Thoughts / suggestions?
> >>>
> >>>
> >>> how is this different/better than global variables? (although we may
> need to
> 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread singh.janmejay
Sure, sounds good.

The ability to gather stats with dynamic-key is important. I am
willing to even help with a rewrite if at some point we feel its best
implemented in a different way in the light of variable-implementation
changes.

On Tue, Oct 6, 2015 at 9:48 PM, Rainer Gerhards
 wrote:
> 2015-10-06 18:04 GMT+02:00 singh.janmejay :
>> Rainer,
>>
>> I see this as something completely outside the scope of  variables.
>> Building stats collector over variables is possible, but then we are
>> then talking about a general purpose language which allows building
>> such complex things. This increases the scope of Rainerscript and with
>> larger scope comes complexity. I feel this is in-line with the other
>> Lua discussion where you emphasized that Rainerscript should not
>> become a fully-general-purpose language?
>>
>> Eg. creating an atomic-increment function for variable requires that
>> we educate users about what can and can't be done if atomic-increment
>> function is used anywhere on a variable. What relationship they can
>> expect it to have with other atomic-incrementing variables (which gets
>> into memory model).
>
> Maybe I just feel overwhelemed in the moment with keeping track of
> everything that is going on. How about this: we can merge it BUT flag
> it as experimental. If all works out well, I am free starting early
> next year to have a deep look at the overall design and sticking
> together all those loose edges. I suspect that I would like to change
> a couple of things in the interest of tying it all well together (like
> I currently do in liblognorm).
>
> But if I need to carry all this legacy, that's really a burden (e.g.
> liblognorm now contains the full v1 code as a copy, which means it
> also needs to be somewhat maintained). I want to avoid this. As long
> as we document this as an *interim* solution that is not necessarily
> here to stay and as such "use at your own risk and it will probably
> break next year" I am sufficiently happy with that. We just must be
> aware that things may really break and there is a big chance the
> actually will. And I don't want to hear about potential vuln or
> compatiblity issues or whatever when this code is changed/removed.
> Also keep on your mind that I probably need to totally revamp the
> variable system, as json-c has many problematic parts for our use
> (what I learned when digging deep with liblognorm). So I *know* that
> there are big changes coming up next year!
>
> And, full ack: I want to limit the scope of RainerScript. Arrays was a
> good sample of why it may be a bad idea to go to boldly forward
> without thinking about the big picture.
>
> Rainer
>>
>>
>>
>> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
>>  wrote:
>>> I can't fully dig into this, but I think we must *very carefully*
>>> evaluate the overall design. Some time ago we introduced arrays for
>>> the limited liblognorm use case, and it hurts us every now and then
>>> when folks want to use arrays for other use cases. It may probably
>>> make sense to re-think how the variable engine etc behaves before
>>> adding more functionality. And make sure that everything works smooth
>>> in all use cases. While anything else may take care for some use
>>> cases, I fear we may get too fragmented. At least this is what I
>>> learned in the past months discussions.
>>>
>>> Anyone else?
>>>
>>> Rainer
>>>
>>> 2015-10-06 17:10 GMT+02:00 singh.janmejay :
 It is possible to use global-variables (it'll require some
 enhancements, table-support etc), but it'll be very inefficient
 compared to this approach. For instance, choice of data-structure etc
 allows making the solution a lot more efficient.

 Here its possible to locklessly increment counters in most cases, so
 its overhead is a lot lesser than global-variables.

 Recycle is precisely to allow this lockless mechanism to work. Its
 basically saying, it'll track metric-names he has seen in last 1 hour.
 If we kill tracking of it as soon as we don't see an increment
 (between 2 reporting runs of impstats), it'll lead to unnecessary
 churn when low-values are common or load is not uniform in time.

 Implementing it on top of global-variables is not only has very high
 performance-penalty(it'll be prohibitive for high-throughput
 scenarios), it also exposes too much complexity to the user (where
 user has to worry about reset etc).

 I don't plan to have a scheduler in this implementation.
 GetAllStatsLines call will purge the tree instead of reset at that
 interval. Its basically a balance between freeing-up memory occupied
 by stale-metric-names vs. performance (lockless handling of
 increment). So it will be governed by impstat schedule. May be I
 should change name to better name (equivalent of
 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread Rainer Gerhards
2015-10-06 18:04 GMT+02:00 singh.janmejay :
> Rainer,
>
> I see this as something completely outside the scope of  variables.
> Building stats collector over variables is possible, but then we are
> then talking about a general purpose language which allows building
> such complex things. This increases the scope of Rainerscript and with
> larger scope comes complexity. I feel this is in-line with the other
> Lua discussion where you emphasized that Rainerscript should not
> become a fully-general-purpose language?
>
> Eg. creating an atomic-increment function for variable requires that
> we educate users about what can and can't be done if atomic-increment
> function is used anywhere on a variable. What relationship they can
> expect it to have with other atomic-incrementing variables (which gets
> into memory model).

Maybe I just feel overwhelemed in the moment with keeping track of
everything that is going on. How about this: we can merge it BUT flag
it as experimental. If all works out well, I am free starting early
next year to have a deep look at the overall design and sticking
together all those loose edges. I suspect that I would like to change
a couple of things in the interest of tying it all well together (like
I currently do in liblognorm).

But if I need to carry all this legacy, that's really a burden (e.g.
liblognorm now contains the full v1 code as a copy, which means it
also needs to be somewhat maintained). I want to avoid this. As long
as we document this as an *interim* solution that is not necessarily
here to stay and as such "use at your own risk and it will probably
break next year" I am sufficiently happy with that. We just must be
aware that things may really break and there is a big chance the
actually will. And I don't want to hear about potential vuln or
compatiblity issues or whatever when this code is changed/removed.
Also keep on your mind that I probably need to totally revamp the
variable system, as json-c has many problematic parts for our use
(what I learned when digging deep with liblognorm). So I *know* that
there are big changes coming up next year!

And, full ack: I want to limit the scope of RainerScript. Arrays was a
good sample of why it may be a bad idea to go to boldly forward
without thinking about the big picture.

Rainer
>
>
>
> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
>  wrote:
>> I can't fully dig into this, but I think we must *very carefully*
>> evaluate the overall design. Some time ago we introduced arrays for
>> the limited liblognorm use case, and it hurts us every now and then
>> when folks want to use arrays for other use cases. It may probably
>> make sense to re-think how the variable engine etc behaves before
>> adding more functionality. And make sure that everything works smooth
>> in all use cases. While anything else may take care for some use
>> cases, I fear we may get too fragmented. At least this is what I
>> learned in the past months discussions.
>>
>> Anyone else?
>>
>> Rainer
>>
>> 2015-10-06 17:10 GMT+02:00 singh.janmejay :
>>> It is possible to use global-variables (it'll require some
>>> enhancements, table-support etc), but it'll be very inefficient
>>> compared to this approach. For instance, choice of data-structure etc
>>> allows making the solution a lot more efficient.
>>>
>>> Here its possible to locklessly increment counters in most cases, so
>>> its overhead is a lot lesser than global-variables.
>>>
>>> Recycle is precisely to allow this lockless mechanism to work. Its
>>> basically saying, it'll track metric-names he has seen in last 1 hour.
>>> If we kill tracking of it as soon as we don't see an increment
>>> (between 2 reporting runs of impstats), it'll lead to unnecessary
>>> churn when low-values are common or load is not uniform in time.
>>>
>>> Implementing it on top of global-variables is not only has very high
>>> performance-penalty(it'll be prohibitive for high-throughput
>>> scenarios), it also exposes too much complexity to the user (where
>>> user has to worry about reset etc).
>>>
>>> I don't plan to have a scheduler in this implementation.
>>> GetAllStatsLines call will purge the tree instead of reset at that
>>> interval. Its basically a balance between freeing-up memory occupied
>>> by stale-metric-names vs. performance (lockless handling of
>>> increment). So it will be governed by impstat schedule. May be I
>>> should change name to better name (equivalent of
>>> purge_known_keys_after_they_have_been_reported_N_times).
>>>
>>>
>>> On Tue, Oct 6, 2015 at 4:30 PM, David Lang  wrote:
 On Tue, 6 Oct 2015, singh.janmejay wrote:

> Hi,
>
> I am working on support for stats with dynamic-name. This comes handy
> in situations where metric-name is dependent upon value of a certain
> attribute of the message.
>
> Say, for a central log-aggregation service, its valuable to know 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread Rainer Gerhards
2015-10-06 18:36 GMT+02:00 singh.janmejay :
> Sure, sounds good.
>
> The ability to gather stats with dynamic-key is important. I am
> willing to even help with a rewrite if at some point we feel its best
> implemented in a different way in the light of variable-implementation
> changes.

going a bit OT here: json-c is fine. But we have begun a bit to abuse
it as a general variable system, which it really isn't. So I think we
need to revamp the var system completely and that should, among
others, address things like concurrency and performance. Once we reach
this point, it may (may!) make sense to change some of the inner
workings of liblognorm again. I am currently modelling the changes in
a way that makes it relatively easy to do so later. The idea is that
we could later have a switch to either return json-c objects (which
would be generated just before handing the object back to the user)
... or some better representation. I am *not* saying this will
necessarily happenm but is a potential route.

I focus on liblognorm at the moment. When that is fully completed,
I'll look into what I can further improve in json-c without breaking
what it really is. Once this is done, it's the right time to look at
actual need to replace the rsyslog variable system (I am 90% sure it's
required, but that really needs to be seen...).

Just to clarify. To the current project: whatever you do, please make
sure the doc prominently says "this is an interim solution and it may
totally change". I hate to request that (and even more hate it if I
need to break it later), but that's the only path I currently see
moving forward in a situation where I see a big change is very
probable -- but don't have the time to do it right now. I really hate
potentially wasting your time.

Rainer
>
> On Tue, Oct 6, 2015 at 9:48 PM, Rainer Gerhards
>  wrote:
>> 2015-10-06 18:04 GMT+02:00 singh.janmejay :
>>> Rainer,
>>>
>>> I see this as something completely outside the scope of  variables.
>>> Building stats collector over variables is possible, but then we are
>>> then talking about a general purpose language which allows building
>>> such complex things. This increases the scope of Rainerscript and with
>>> larger scope comes complexity. I feel this is in-line with the other
>>> Lua discussion where you emphasized that Rainerscript should not
>>> become a fully-general-purpose language?
>>>
>>> Eg. creating an atomic-increment function for variable requires that
>>> we educate users about what can and can't be done if atomic-increment
>>> function is used anywhere on a variable. What relationship they can
>>> expect it to have with other atomic-incrementing variables (which gets
>>> into memory model).
>>
>> Maybe I just feel overwhelemed in the moment with keeping track of
>> everything that is going on. How about this: we can merge it BUT flag
>> it as experimental. If all works out well, I am free starting early
>> next year to have a deep look at the overall design and sticking
>> together all those loose edges. I suspect that I would like to change
>> a couple of things in the interest of tying it all well together (like
>> I currently do in liblognorm).
>>
>> But if I need to carry all this legacy, that's really a burden (e.g.
>> liblognorm now contains the full v1 code as a copy, which means it
>> also needs to be somewhat maintained). I want to avoid this. As long
>> as we document this as an *interim* solution that is not necessarily
>> here to stay and as such "use at your own risk and it will probably
>> break next year" I am sufficiently happy with that. We just must be
>> aware that things may really break and there is a big chance the
>> actually will. And I don't want to hear about potential vuln or
>> compatiblity issues or whatever when this code is changed/removed.
>> Also keep on your mind that I probably need to totally revamp the
>> variable system, as json-c has many problematic parts for our use
>> (what I learned when digging deep with liblognorm). So I *know* that
>> there are big changes coming up next year!
>>
>> And, full ack: I want to limit the scope of RainerScript. Arrays was a
>> good sample of why it may be a bad idea to go to boldly forward
>> without thinking about the big picture.
>>
>> Rainer
>>>
>>>
>>>
>>> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
>>>  wrote:
 I can't fully dig into this, but I think we must *very carefully*
 evaluate the overall design. Some time ago we introduced arrays for
 the limited liblognorm use case, and it hurts us every now and then
 when folks want to use arrays for other use cases. It may probably
 make sense to re-think how the variable engine etc behaves before
 adding more functionality. And make sure that everything works smooth
 in all use cases. While anything else may take care for some use
 cases, I fear we may get too 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread David Lang

On Tue, 6 Oct 2015, singh.janmejay wrote:


It is possible to use global-variables (it'll require some
enhancements, table-support etc), but it'll be very inefficient
compared to this approach. For instance, choice of data-structure etc
allows making the solution a lot more efficient.


As for the data structures, Rainer has been identifying inefficencies in how 
json-c works and working to improve them



Here its possible to locklessly increment counters in most cases, so
its overhead is a lot lesser than global-variables.


how can you manage counters in multiple threads without locks? Especially when 
dealing with batches.



Recycle is precisely to allow this lockless mechanism to work. Its
basically saying, it'll track metric-names he has seen in last 1 hour.
If we kill tracking of it as soon as we don't see an increment
(between 2 reporting runs of impstats), it'll lead to unnecessary
churn when low-values are common or load is not uniform in time.


that depends on the cost of initializing a metric vs the cost of tracking the 
recycle mechanism.



Implementing it on top of global-variables is not only has very high
performance-penalty(it'll be prohibitive for high-throughput
scenarios), it also exposes too much complexity to the user (where
user has to worry about reset etc).

I don't plan to have a scheduler in this implementation.
GetAllStatsLines call will purge the tree instead of reset at that
interval. Its basically a balance between freeing-up memory occupied
by stale-metric-names vs. performance (lockless handling of
increment). So it will be governed by impstat schedule. May be I
should change name to better name (equivalent of
purge_known_keys_after_they_have_been_reported_N_times).


if this is just adding additional metrics to the impstats output that eliminates 
the schedular/reset issue.


I think we should have a metric configuration be fairly static, allow 
configuring custom metrics and add to them, but don't use data from the message 
as part of the name of the metric, and continue reporting them forever, even if 
they are 0 (so no need to 'recycle' names)


David Lang
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

2015-10-06 Thread Woodruff, Dan
Thank you both David and Rainer - I really appreciate the detailed responses
and it's already helping put my box in a better state.

I used the impstats analyzer and started increasing dynaFileCacheSize for
the actions that were problematic. CPU usage increased by about 2.5x and
context switches went up to 800k for about 15 minutes after I restarted
rsyslogd but seems to have settled back in around 60-70% and around 100k
context switches now. Top showed three in:imtcp threads as the busiest
during that time.

Cacti stats show about half the number of disk operations as before. So it
seems to be a general performance gain, but it did not affect message loss,
however, as I can already see that some messages have disappeared.

Regarding queues, some of my rules in each ruleset do forward messages to
another host (a SIEM) over TCP. Should I keep the queue to support that, or
is TCP good enough?

RH support suggests turning of DNS lookups. Would that have any affect?
We've come to rely on hostname for creating directories for each server.

Dan Woodruff

-Original Message-
From: rsyslog-boun...@lists.adiscon.com
[mailto:rsyslog-boun...@lists.adiscon.com] On Behalf Of Rainer Gerhards
Sent: Tuesday, October 6, 2015 9:47 AM
To: rsyslog-users 
Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss

There are a couple of those:

LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683
missed=91303303 evicted=91292586 maxused=10 closetimeouts=0

have a look at the evicted count! This totally kills performance.

Each time one entry needs to be evicted, the file is closed, and a new
one is opened. You should set the cache size for this action to match
roughly the number of files that you want to keep open during normal
operations (e.g. if you have 1000 hosts and include hour and hostname
in the dynafile name, the cache size should be around 1000, except if
you have some low talkers.

Also note the online tool at

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rsyslog.com_impstats
-2Danalyzer_=BQICAg=kbmfwr1Yojg42sGEpaQh5ofMHBeTl9EI2eaqQZhHbOU=0S5vJ8
-FmQy6Qk5D6_T4U9EYbkCcMc4ijDuyUem89Lk=zfcoEMd6LlXQ7QdIO07SYeKV1XF06xf9-ncR
plNjcN8=sYq7lASJR370QFGO7qlbIFkrmueH01xv1J9reZc6L2Q= 

It probably will spit out a couple of warnings if you submit the pstats.

HTH
Rainer

2015-10-06 15:29 GMT+02:00 Woodruff, Dan :
> Rainer, thank you. Here is a full extract of a recent impstats run:
>
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: imuxsock: origin=imuxsock
> submitted=282984 ratelimit.discarded=0 ratelimit.numratelimiters=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache StatsFile:
> origin=omfile requests=326648 level0=325603 missed=1045 evicted=0
maxused=1
> closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 1:
origin=core.action
> processed=326648 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 2:
origin=core.action
> processed=3255090916 failed=0 suspended=2 suspended.duration=240 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LDAPFile:
> origin=omfile requests=3171530030 level0=3171529033 missed=997 evicted=0
> maxused=1 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 3:
origin=core.action
> processed=3171530030 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683
> missed=91303303 evicted=91292586 maxused=10 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 4:
origin=core.action
> processed=138877957 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
LinuxUFWFile:
> origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0
> closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 5:
origin=core.action
> processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache DHCPFile:
> origin=omfile requests=3518037 level0=3516992 missed=1045 evicted=0
> maxused=1 closetimeouts=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 6:
origin=core.action
> processed=3518037 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> LinuxBlankProgramFile: origin=omfile requests=93283 level0=28347
missed=1932
> evicted=15 maxused=10 closetimeouts=303
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 7:
origin=core.action
> processed=93283 failed=0 suspended=0 suspended.duration=0 resumed=0
> Oct  6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache
> LinuxProgramFile: origin=omfile requests=358821719 level0=245218530
> missed=23914416 evicted=23903728 maxused=10 closetimeouts=0
> Oct  6 08:27:07 

Re: [rsyslog] RFC: dynamic-stats support

2015-10-06 Thread singh.janmejay
It is possible to use global-variables (it'll require some
enhancements, table-support etc), but it'll be very inefficient
compared to this approach. For instance, choice of data-structure etc
allows making the solution a lot more efficient.

Here its possible to locklessly increment counters in most cases, so
its overhead is a lot lesser than global-variables.

Recycle is precisely to allow this lockless mechanism to work. Its
basically saying, it'll track metric-names he has seen in last 1 hour.
If we kill tracking of it as soon as we don't see an increment
(between 2 reporting runs of impstats), it'll lead to unnecessary
churn when low-values are common or load is not uniform in time.

Implementing it on top of global-variables is not only has very high
performance-penalty(it'll be prohibitive for high-throughput
scenarios), it also exposes too much complexity to the user (where
user has to worry about reset etc).

I don't plan to have a scheduler in this implementation.
GetAllStatsLines call will purge the tree instead of reset at that
interval. Its basically a balance between freeing-up memory occupied
by stale-metric-names vs. performance (lockless handling of
increment). So it will be governed by impstat schedule. May be I
should change name to better name (equivalent of
purge_known_keys_after_they_have_been_reported_N_times).


On Tue, Oct 6, 2015 at 4:30 PM, David Lang  wrote:
> On Tue, 6 Oct 2015, singh.janmejay wrote:
>
>> Hi,
>>
>> I am working on support for stats with dynamic-name. This comes handy
>> in situations where metric-name is dependent upon value of a certain
>> attribute of the message.
>>
>> Say, for a central log-aggregation service, its valuable to know what
>> is inbound message-count distribution across application-clusters that
>> send logs to it, or for a shared-server, its valuable to know what is
>> the log-volume generation across users etc.
>>
>> Im thinking of using functions-like interface to support this. It may
>> look similar to this:
>>
>> 
>> dyn_stats("user_msg_count")
>>
>> ...
>>
>> ruleset(...) {
>> ...
>> dyn_inc("user_msg_count", $.user)
>> ...
>> }
>> 
>>
>> dyn_stats signature looks like:
>> dyn_stats(, , > default=10k>, )
>>
>> dyn_inc signature looks like:
>> dyn_inc(, )
>>
>>
>> Reporting would work similar to static-metric via impstats. Mapping:
>> statsobj_s.name = name_space
>> statsobj_s.origin = "dyn"
>> ctr_s.name = "foo" (say $.user had value foo)
>>
>>
>> Thoughts / suggestions?
>
>
> how is this different/better than global variables? (although we may need to
> implement soem functions, atomic inc/dec copy+clear) If you have pstats
> output in json format, you can even piggyback on it's schedule to output the
> data.
>
>
> things like stats can very easily end up being expensive in terms of locking
> (something global variables already have figured out), and it sounds like
> you are proposing adding a scheduler of some sort to output the data.
>
> variables should not need to be 'recycled', either they contain data or they
> don't. If they contain data, you need to keep the data until you do
> something with it, if they don't, you don't have to track them.
>
>
> I am actually doing this sort of thing external to rsyslog in SEC
>
> I have a template in rsyslog that contains hostname, fromhost-ip,
> programname and I output it via improg to SEC. SEC accumulates counters and
> has scheduled outputs to files.
>
> before I started using SEC for this, I used the same template to output to a
> file and then for reports, used cut + sort + uniq -c to extract the data I
> need. When the files only contain the significant data, this is actually not
> bad to do, even at higher volumes.
>
> David Lang
> ___
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> LIKE THAT.



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.