Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
When you said "if you are writing to a local file, you should not need to have a separate queue", does queue equate to ruleset in this case? The main reason I have the separate rulesets is to handle different kinds of incoming logs on different ports, making it easier to write the rules logic for the different types. Does that make sense? using rulesets to segment your rules makes lots of sense, adding a queue if everything in the ruleset is writing to local files almost never does. writing to disk is fast enough that the overhead of moving things in and out of the queue really hurts. In addition, if everything is behind a separate queue, the main worker threads end up not batching messages, and that _really_ hurts performance because the overhead of locking and unlocking the queues for each message is rather high. There is a high number of dynafiles, specifically for server logs, which we break out into files by host and programname with these templates: template(name="LinuxProgramFile" type="string" string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log") template(name="WindowsProgramFile" type="string" string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log ") The result is about 32,000 total dynafiles which are then rotated out daily. the key is how many different files are being written to at the same time. As Rainer points out, if you have 1000 files open at a time, you need to have the dynafilecachesize set >1000. I have systems that write per-server, per 15-min files. for each server this creates 96 files a day. I don't need 96 * servercount slots in the dynafilecache, I need > servercount. the penalty for having this undersized is huge. when you are thrashing, each message that arrives requires that the system flush it's cache, close the file (which triggers filesystem metadata updates), open a new file and write the data. make the cache size much larger than what you need. Looking at top, rsyslogd as a whole never goes above 120%. Most of the threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting about 50% during peak. Although I hadn't noticed this before, watching top for a few minutes this morning, rsyslogd hangs out around 60%, then bursts to almost 200% for one second, then disappears from top in the next second, before returning to 60%. It cycled through this every 10 seconds or so. After restarting rsyslogd, the behavior went away - it's staying around 60%. Should I try lowering the number of threads for the rulesets/imudp? This issue does happen with both TCP and UDP, however, which further compounds my confusion. You mention systemd - I'll look into that as I have a case open with RH. if you are just dealing with logs sent from remote machines, systemd isn't going to be involved. For the rulesets, try eliminating the queue entirely, combined with fixing the dynafilecachesize issue, I'll bet cpu utilization will plummet and you won't miss the queues. for imudp, try leaving the number of threads at 1. I've run systems receiving hundreds of thousands of logs/sec with a single thread. It's only when we start getting close to the throughput of Gig-E that a single thread starts having problems in most cases. set all thread counts to 1 and watch top to see if any of them need help. From what I'm seeing, you shouldn't end up needing to do anything else. David Lang ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] RFC: dynamic-stats support
I can't fully dig into this, but I think we must *very carefully* evaluate the overall design. Some time ago we introduced arrays for the limited liblognorm use case, and it hurts us every now and then when folks want to use arrays for other use cases. It may probably make sense to re-think how the variable engine etc behaves before adding more functionality. And make sure that everything works smooth in all use cases. While anything else may take care for some use cases, I fear we may get too fragmented. At least this is what I learned in the past months discussions. Anyone else? Rainer 2015-10-06 17:10 GMT+02:00 singh.janmejay: > It is possible to use global-variables (it'll require some > enhancements, table-support etc), but it'll be very inefficient > compared to this approach. For instance, choice of data-structure etc > allows making the solution a lot more efficient. > > Here its possible to locklessly increment counters in most cases, so > its overhead is a lot lesser than global-variables. > > Recycle is precisely to allow this lockless mechanism to work. Its > basically saying, it'll track metric-names he has seen in last 1 hour. > If we kill tracking of it as soon as we don't see an increment > (between 2 reporting runs of impstats), it'll lead to unnecessary > churn when low-values are common or load is not uniform in time. > > Implementing it on top of global-variables is not only has very high > performance-penalty(it'll be prohibitive for high-throughput > scenarios), it also exposes too much complexity to the user (where > user has to worry about reset etc). > > I don't plan to have a scheduler in this implementation. > GetAllStatsLines call will purge the tree instead of reset at that > interval. Its basically a balance between freeing-up memory occupied > by stale-metric-names vs. performance (lockless handling of > increment). So it will be governed by impstat schedule. May be I > should change name to better name (equivalent of > purge_known_keys_after_they_have_been_reported_N_times). > > > On Tue, Oct 6, 2015 at 4:30 PM, David Lang wrote: >> On Tue, 6 Oct 2015, singh.janmejay wrote: >> >>> Hi, >>> >>> I am working on support for stats with dynamic-name. This comes handy >>> in situations where metric-name is dependent upon value of a certain >>> attribute of the message. >>> >>> Say, for a central log-aggregation service, its valuable to know what >>> is inbound message-count distribution across application-clusters that >>> send logs to it, or for a shared-server, its valuable to know what is >>> the log-volume generation across users etc. >>> >>> Im thinking of using functions-like interface to support this. It may >>> look similar to this: >>> >>> >>> dyn_stats("user_msg_count") >>> >>> ... >>> >>> ruleset(...) { >>> ... >>> dyn_inc("user_msg_count", $.user) >>> ... >>> } >>> >>> >>> dyn_stats signature looks like: >>> dyn_stats(, , >> default=10k>, ) >>> >>> dyn_inc signature looks like: >>> dyn_inc(, ) >>> >>> >>> Reporting would work similar to static-metric via impstats. Mapping: >>> statsobj_s.name = name_space >>> statsobj_s.origin = "dyn" >>> ctr_s.name = "foo" (say $.user had value foo) >>> >>> >>> Thoughts / suggestions? >> >> >> how is this different/better than global variables? (although we may need to >> implement soem functions, atomic inc/dec copy+clear) If you have pstats >> output in json format, you can even piggyback on it's schedule to output the >> data. >> >> >> things like stats can very easily end up being expensive in terms of locking >> (something global variables already have figured out), and it sounds like >> you are proposing adding a scheduler of some sort to output the data. >> >> variables should not need to be 'recycled', either they contain data or they >> don't. If they contain data, you need to keep the data until you do >> something with it, if they don't, you don't have to track them. >> >> >> I am actually doing this sort of thing external to rsyslog in SEC >> >> I have a template in rsyslog that contains hostname, fromhost-ip, >> programname and I output it via improg to SEC. SEC accumulates counters and >> has scheduled outputs to files. >> >> before I started using SEC for this, I used the same template to output to a >> file and then for reports, used cut + sort + uniq -c to extract the data I >> need. When the files only contain the significant data, this is actually not >> bad to do, even at higher volumes. >> >> David Lang >> ___ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ >> What's up with rsyslog? Follow https://twitter.com/rgerhards >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of >> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T >> LIKE THAT. > > > > -- > Regards,
Re: [rsyslog] omfile fuse
On Tue, 6 Oct 2015, Rainer Gerhards wrote: 2015-10-06 7:27 GMT+02:00 David Lang: On Tue, 6 Oct 2015, Rainer Gerhards wrote: 2015-10-06 0:09 GMT+02:00 Kendall Green : I've set asyncWriting="off" and the only difference is, that with 1000 message test, all were delivered on HUP and/or service restart, but apparently not on flushInterval or buffer full. I'm currently running another test with 100k messages iterating about 10 logs/sec, and watching the output for number of lines. The output file is created on dfs-fuse mount upon receipt of the first log, but 0 lines written. This current test will hopefully show how many logs get processed without intervention over the next ~3 hours, and if still no logs appear will see how many are written upon HUP signal. I'm unsure how the asyncWriting off impacts the rest of the config, and if the current config has disabled compression... or not with no zip. This note is from omfile asyncWriting doc: *Note that in order to enable FlushInterval, AsyncWriting must be set to “on”. Otherwise, the flush interval will be ignored. Also note that when FlushOnTXEnd is “on” but AsyncWriting is off, output will only be written when the buffer is full. This may take several hours, or even require a rsyslog shutdown. However, a buffer flush can be forced in that case by sending rsyslogd a HUP signal.* Please help clarify how these parameters correspond and what the settings can achieve expedited log transport, avoiding long delays waiting for buffering or flush intervals. This goal is why have chosen low settings for IOBufferSize and flushInterval. Without async writing, everything is written sync on when it arrives (more precisely at commit of the current batch) aka "as quickly as possible". Thus, these params do not exist because they have no meaning. Note that with a very large batch and a small buffer, partial records may be written. although without the checkpoint parameter the writes are to the OS and they get flushed to disk at a later point. you mean "sync", right? I may be misunderstanding, but i thought the checkpoint (or checkpoint interval) parameter controleld how frequently you did the sync (so turning sync on didn't make it per every message) David Lang___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] Separation of actions based on log source - with good performance
Hello, I have never heared such a nonsense. Actually the number of applications that does not include features that are not official documented shoult be extremly limited. The functionality is really usefull and already in big landscapes productive. Please, please do NOT remove the lookup-table from the main branch. The functionaltiy works fine, I'm using this since march and I did not have any issue since the latest patch of janmejay. Even the "concept" is not fully implemented (e.g. smaller things like nomatch) the main part works fine. My suggestion would be to document everything which is currently implemented and keep the "conceptual documentation" as it is. So the Maintainer should no longer have an issue with it. If the main issue it the time to document the already implemented features, I can create a patch. Chris > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr > Von: "David Lang"> An: rsyslog-users > Betreff: Re: [rsyslog] Separation of actions based on log source - with good > performance > > a CVE for something that requires manually enabling an experimental feature??? > > it would be one thing if a default config had the problem, or if it was > something entirely dependent on remote data. > > I would be very tempted to respond to the CVE with "don't enable this > incomplete > feature" as the solution. It's very common for incomplete features to be > included in released versions > > grumble, we have enough real bugs to worry about. > > David Lang > > On Tue, 6 Oct 2015, Rainer Gerhards wrote: > > > Date: Tue, 6 Oct 2015 07:15:31 +0200 > > From: Rainer Gerhards > > Reply-To: rsyslog-users > > To: rsyslog-users > > Subject: Re: [rsyslog] Separation of actions based on log source - with good > > performance > > > > Sorry, folks, good intent always seems to find someone who turns it > > into negative. I was yesterday contacted by a distro maintainer who > > wants to turn this bug in the officially non-existant lookup table > > feature into a CVE and insists that it is a vuln even after the > > argument that the feature never oficially existed. > > > > It looks like it was a bad idea to merge potentially useful yet > > incomplete code into the main branch (and documenting it to be not > > present). It looks like I need to re-think my stance on experimental > > features. > > > > Anyhow, I really don't want to support the argument that something > > non-existing can be a CVE. As such, I will create a new > > master-insecure branch, which will be a clone of the current master > > branch. Then I'll remove the lookup table code, so that the code base > > matches the documentation. I really don't want to create a general > > principle here that we need to create CVEs (and patched) for something > > that was just added as a convenience for a handful of folks who were > > ready to take a risk. > > > > If there is sufficient interest, we can consider officially adding > > this feature to the January 8.15 release iff it is ready by then. > > @janmejay: please let me know if you would like to continue with your > > work on lookup tables under this new situation. > > > > As soon as I have time, I'll check what else needs to be removed. Not > > sure about the ./contributed branch, because the project cannot > > guarantee at all this is bug-free. It's documented to be so, but if > > that is not sufficient, it should probably live only in the > > master-insecure branch. > > > > Rainer > > > > 2015-10-02 17:29 GMT+02:00 singh.janmejay : > >> As of now it returns empty string for no-match. I guess we should go ahead > >> with it in its current form. We can add default value any time later > >> without breaking compatibility(it'd default to ""). > >> > >> I'll add a test for multiple tables on Monday. > >> > >> On Fri, Oct 2, 2015, 7:16 PM wrote: > >> > >>> Hi, > >>> No, I didn't. I even didn't realize the patch. > >>> > >>> It seems to be exactly the related issue. So I don't expect any further > >>> issues. > >>> I will use the new version on 2 systems. If there is any other issue, I > >>> will let you know. > >>> Release data for next rsyslog version is quite far so enough time to > >>> test... ;) > >>> > >>> The missing implementation of "nomatch" (default) entry as described at > >>> http://www.rsyslog.com/doc/lookup_tables.html > >>> would from my opinion require changes: > >>> > >>> Arround line 132 of lookup.c file (save of value) > >>> Arround line 243 of lookup.c file (search in lookuptable fails, so return > >>> nomatch value. > >>> > >>> > >>> regards > >>> Chris > >>> > >>> > >>> > Gesendet: Donnerstag, 01. Oktober 2015 um 16:57 Uhr > >>> > Von: "singh.janmejay" > >>> > An: rsyslog-users > >>> > Betreff: Re: [rsyslog] Separation of
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
> -Original Message- > From: rsyslog-boun...@lists.adiscon.com [mailto:rsyslog- > boun...@lists.adiscon.com] On Behalf Of David Lang > Sent: Tuesday, October 6, 2015 12:49 PM > To: rsyslog-users> Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss > > > Thank you both David and Rainer - I really appreciate the detailed > responses > > and it's already helping put my box in a better state. > > good to hear > > > I used the impstats analyzer and started increasing dynaFileCacheSize for > > the actions that were problematic. CPU usage increased by about 2.5x and > > context switches went up to 800k for about 15 minutes after I restarted > > rsyslogd but seems to have settled back in around 60-70% and around 100k > > context switches now. Top showed three in:imtcp threads as the busiest > > during that time. > > a backlog of inbound messages via tcp after a restart isn't unusual. > > with heavy inbound tcp use you may want to look at using imptcp instead of > imtcp. > > > Cacti stats show about half the number of disk operations as before. So it > > seems to be a general performance gain, but it did not affect message loss, > > however, as I can already see that some messages have disappeared. > > I'm surprised it didn't help more than that. what do the impstats look like. Impstats looks good - only two dynafiles that still have evictions, which I'll tune a little more now. See bottom of message for a full recent output. > > > Regarding queues, some of my rules in each ruleset do forward messages > to > > another host (a SIEM) over TCP. Should I keep the queue to support that, > or > > is TCP good enough? > > it's a good idea to have a queue on outbound delivery methods that can > block > because if the remote system goes down it will block all message processing. > Consider using a disk queue for this as well (to hand an extended outage of > the > remote system) > > also note for the record that TCP is not as reliable as you may be thinking. > https://urldefense.proofpoint.com/v2/url?u=http- > 3A__blog.gerhards.net_2008_04_on-2Dunreliability-2Dof-2Dplain-2Dtcp- > 2Dsyslog.html=BQICAg=kbmfwr1Yojg42sGEpaQh5ofMHBeTl9EI2eaqQZ > hHbOU=0S5vJ8-FmQy6Qk5D6_T4U9EYbkCcMc4ijDuyUem89Lk=N- > 1ExMOcN75_9w1gY8ODemHdzRwHC_SeAxYSaAEb4P0=XiOr6GhA1NvhGT > keIJGzrwxy7vjn6FYRasLdeBoLCvo= Thanks. That is definitely important to me so I will look into creating a disk queue for outbound messages once I get this missing message issue worked out. > > > RH support suggests turning of DNS lookups. Would that have any affect? > > We've come to rely on hostname for creating directories for each server. > > the %hostname% variable is what the sending system puts in the message, > DNS > lookups aren't needed for that. Turning off DNS will avoid delays on UDP > messages. With DNS lookups, a slow DNS server can cause delays (without > rsyslog > using CPU) that can end up causing some messages to be lost if the OS > queues end > up running out of space. We have several %fromhost% templates for servers. I noticed that there is a rush of DNS requests when rsyslog first starts up but then it quiets down due to the internal caching. I also have a local DNS cache running to point to - not currently set - but when I did have the system pointed there, I still had missing messages. > > Now that you have addressed some of the worst problems in your config, > let's > talk about the log loss problem. > > are you loosing messages that arrive via UDP or TCP (or both)? Losing messages in both. Currently I'm testing with a port dedicated to just one log source that I am continuously having problems with, and I'm sending UDP right now. The port is 10517 and the config is: template(name="EduroamACSFile" type="string" string="/var/log/collection/eduroam_acs.log") template(name="EduroamACSFormat" type="string" string="%timestamp% %hostname% %syslogtag%%msg:::escape-cc%\n") ruleset(name="TestingRuleSet") { *.* { /tmp/debug.log;RSYSLOG_DebugFormat action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" name="writeEduroamACSFile") stop } } # end ruleset # bind the listening ports to the ruleset input(type="imudp" port="10517" ruleset="TestingRuleSet") input(type="imtcp" port="10517" ruleset="TestingRuleSet") I haven't had a message appear in /tmp/debug or in eduroam_acs since 11:01 this morning, even with several rsyslogd restarts since then. Last message in my log file (sanitized by me): Oct 6 11:01:06 CSCOacs_Passed_Authentications 129983 2 1 Step=11001 , Step=11017 , Step=15008 , Step=15004 , Step=15012 , Step=11358 , Step=11361 , Step=11355 , Step=11100 , Step=11101 , Step=11357 , Step=11002 , NetworkDeviceName=... msg continues But tcpdump shows data coming into the box to port 10517 UDP: 16:26:36.751119 IP x.x.2.85.55422 > x.x.101.70.10517: UDP, length 1015 E.@.<..U..eF.~)4<181>Oct 6
Re: [rsyslog] RFC: dynamic-stats support
On Wed, 7 Oct 2015, singh.janmejay wrote: -- Regards, Janmejay PS: Please blame the typos in this mail on my phone's uncivilized soft keyboard sporting it's not-so-smart-assist technology. On Oct 6, 2015 10:32 PM, "David Lang"wrote: On Tue, 6 Oct 2015, singh.janmejay wrote: It is possible to use global-variables (it'll require some enhancements, table-support etc), but it'll be very inefficient compared to this approach. For instance, choice of data-structure etc allows making the solution a lot more efficient. As for the data structures, Rainer has been identifying inefficencies in how json-c works and working to improve them That optimizes variable system. But it still is a general propose variable system. It can't and shouldn't understand relationship between variables. what relationships are there between the different metrics? Here its possible to locklessly increment counters in most cases, so its overhead is a lot lesser than global-variables. how can you manage counters in multiple threads without locks? Especially when dealing with batches. Consider a trie based implementation. With bounded fanout-factor, it's O(1) wrt metric-names cardinality. It also has very little lock contention involved. Usually operations work with read-locks, only when new metric is initialized it requires a write lock on patent node. If recycles are few and far apart, lock contention would be negligible. if you have multiple threads that may need to update the same metric at the same time, a tree doesn't eliminate the locking. The current json-c locking is being make intentionally over-broad right now because it appears that some json-c code is not thread-safe and we haven't identified it yet. Once that's tracked down and fixed (or json-c replaced), updating one item should not require locking any more than that item. Recycle is precisely to allow this lockless mechanism to work. Its basically saying, it'll track metric-names he has seen in last 1 hour. If we kill tracking of it as soon as we don't see an increment (between 2 reporting runs of impstats), it'll lead to unnecessary churn when low-values are common or load is not uniform in time. that depends on the cost of initializing a metric vs the cost of tracking the recycle mechanism. 0 value data-points can easily be filtered out. So they don't create any processing overhead downstream. Cost of tracking for recycle is minimal because it's a single counter bring tracked, when it reaches zero it's reset to orig starting value and trie is killed after reporting accumulated stats. actually, filtering out 0 data-points can be a very bad thing. Far too many monitoring tools produce stright-line graphs/estimates between reported data-points, so it's very important to report 0 value data-points Implementing it on top of global-variables is not only has very high performance-penalty(it'll be prohibitive for high-throughput scenarios), it also exposes too much complexity to the user (where user has to worry about reset etc). I don't plan to have a scheduler in this implementation. GetAllStatsLines call will purge the tree instead of reset at that interval. Its basically a balance between freeing-up memory occupied by stale-metric-names vs. performance (lockless handling of increment). So it will be governed by impstat schedule. May be I should change name to better name (equivalent of purge_known_keys_after_they_have_been_reported_N_times). if this is just adding additional metrics to the impstats output that eliminates the schedular/reset issue. I think we should have a metric configuration be fairly static, allow configuring custom metrics and add to them, but don't use data from the message as part of the name of the metric, and continue reporting them forever, even if they are 0 (so no need to 'recycle' names) Dynamic metrics are a real usecase for any shared system(utilisation across several users, several hosts, several clusters, several-subnets etc are easily reportable with this). The only way to report utilisation in many scenarios is to have dyn-metric names. The alternative is to pre-declare all keys, but that to me is a more indirect solution. It's not as flexible/adaptive. I think declarative static-key a useful feature on its own, for eg when classifying reportable metric into buckets known in advance, but dyn-key and configurable-static-key are not interchangeable. dynamic systems will have pathalogical failure condiditions. Consider what happens when someone uses hostname in a dynafile template and then some system starts spewing malformed logs that put garbage data in that field. Creating hundreds or thousands of metric variables is much worse. I agree that pre-declared keys are less flexible, but they are also going to be far safer and easier to deal with. David Lang ___ rsyslog mailing list
Re: [rsyslog] Complex forwarding and spoofing question
enable impstats, my guess is that there is something wrong with communications to the remote system so you get a bunch of messages in, send them out, but the are not getting acks back, so things stop trying to deliver. That's then filling the queue and spilling over to the disk. also do a tcpdump on the connection to the remote system. David Lang On Tue, 6 Oct 2015, Randy Baca wrote: Date: Tue, 6 Oct 2015 20:35:12 + From: Randy BacaReply-To: rsyslog-users To: rsyslog-users Subject: Re: [rsyslog] Complex forwarding and spoofing question OK, I got everything updated and we are up and running. However, there appears to be an issue with disk queuing. When we fire up rsyslog the remotebox is only receiving a short burst of events for a few seconds and then about 1 or 2 events per second after that. After the few seconds when the burst is over the disk queue on the local system then starts creating spool files. They don't appear to be flushing. The local system receives about 1k events per second. They are all making it to the local UDP spoof but not the remotebox, even after playing around with the queue settings. Here is the current conf: module (load="imklog") module (load="imuxsock") module (load="imudp") module (load="imtcp") module (load="omudpspoof") # Global Configuration global ( parser.escapeControlCharactersCStyle="on" workDirectory="/var/spool/rsyslog" localHostname="hostname" ) # In: 514/TCP (uses the same queue as UDP) input(type="imtcp" address="10.1.2.3" port="514" ruleset="ruleset_eth0_514") # In: 514/UDP (uses the same queue as TCP) input(type="imudp" address="10.1.2.3" port="514" ruleset="ruleset_eth0_514") # Out: UDP-spoof to the local and also forward to remote rsyslog ruleset(name="ruleset_eth0_514" queue.type="LinkedList") { call action.local.udp515 call action.fwd.remotebox stop } # Fwd to localhost:udp/515 ruleset(name="action.local.udp515") { action(type="omudpspoof" name="omudpspoof.udp515" target="127.0.0.1" port="515") } # Fwd to remotebox w/ compression and local disk queueing ruleset(name="action.fwd.remotebox") { action(type="omfwd" name="omfwd.remotebox" queue.spoolDirectory="/var/spool/rsyslog_tcp" #queue.lowwatermark="2000" #queue.highwatermark="100" queue.workerthreads="2" #queue.type="Disk" queue.type="LinkedList" #queue.filename="omfwd_remotebox" #queue.size="10" queue.maxdiskspace="20G" queue.saveonshutdown="on" action.resumeretrycount="-1" target="172.22.22.22" port="514" protocol="tcp" compression.mode="stream:always") } From: rsyslog-boun...@lists.adiscon.com [rsyslog-boun...@lists.adiscon.com] on behalf of Dave Caplinger [davecaplin...@solutionary.com] Sent: Monday, October 05, 2015 5:40 AM To: rsyslog-users Subject: Re: [rsyslog] Complex forwarding and spoofing question David's right, it's a separate package. Sorry about that. Here's a list of all the 64-bit rsyslog packages (from 'yum search rsyslog') from a system I have access to; please verify you see something similar from yours. rsyslog-gnutls.x86_64 : TLS protocol support for rsyslog rsyslog-gssapi.x86_64 : GSSAPI authentication and encryption support for rsyslog rsyslog-mysql.x86_64 : MySQL support for rsyslog rsyslog-pgsql.x86_64 : PostgresSQL support for rsyslog rsyslog-relp.x86_64 : RELP protocol support for rsyslog rsyslog-snmp.x86_64 : SNMP protocol support for rsyslog rsyslog.x86_64 : Enhanced system logging and kernel message trapping daemons rsyslog-elasticsearch.x86_64 : Provides the omelasticsearch module rsyslog-mmjsonparse.x86_64 : mmjsonparse support rsyslog-mmutf8fix.x86_64 : mmutf8fix support rsyslog-udpspoof.x86_64 : Provides the omudpspoof module Then, of course: 'yum install rsyslog-udpspoof'. -- Dave Caplinger, Director, Technical Product Management | 402.361.3063 | Solutionary — An NTT Group Security Company On Oct 3, 2015, at 1:02 AM, David Lang wrote: It may be a separate package (I don't know how to query this through yum) in the meantime, try addding the following $template raw,"%rawmsg%" and then change the forwarding via localhost from omudpspoof to udp with the template raw. David Lang On Fri, 2 Oct 2015, Randy Baca wrote: Date: Fri, 2 Oct 2015 23:46:27 + From: Randy Baca Reply-To: rsyslog-users To: rsyslog-users Subject: Re: [rsyslog] Complex forwarding and spoofing question Upgraded to 8.13. Looks like now the omudpspoof.so file is missing. Does it not get installed automatically with the upgrade? I checked in /lib64/rsyslog/ and it is not there. It also didn't like the number I had set for the
Re: [rsyslog] Separation of actions based on log source - with good performance
Let us ship a full implementation in 8.14, let us not remove it. -- Regards, Janmejay PS: Please blame the typos in this mail on my phone's uncivilized soft keyboard sporting it's not-so-smart-assist technology. On Oct 6, 2015 1:19 PM, "singh.janmejay"wrote: > How about this: I get to it next week and implement all that the proposal > talks about and document it? It doesn't seem like a lot of work from where > it is right now. > > I vote for not removing it. > > -- > Regards, > Janmejay > > PS: Please blame the typos in this mail on my phone's uncivilized soft > keyboard sporting it's not-so-smart-assist technology. > > On Oct 6, 2015 11:53 AM, wrote: > >> Hello, >> I have never heared such a nonsense. >> Actually the number of applications that does not include features that >> are not official documented shoult be extremly limited. >> >> The functionality is really usefull and already in big landscapes >> productive. >> Please, please do NOT remove the lookup-table from the main branch. >> The functionaltiy works fine, I'm using this since march and I did not >> have any issue since the latest patch of janmejay. >> >> Even the "concept" is not fully implemented (e.g. smaller things like >> nomatch) the main part works fine. >> >> >> My suggestion would be to document everything which is currently >> implemented and keep the "conceptual documentation" as it is. >> So the Maintainer should no longer have an issue with it. >> >> >> If the main issue it the time to document the already implemented >> features, I can create a patch. >> >> >> Chris >> >> >> >> > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr >> > Von: "David Lang" >> > An: rsyslog-users >> > Betreff: Re: [rsyslog] Separation of actions based on log source - with >> good performance >> > >> > a CVE for something that requires manually enabling an experimental >> feature??? >> > >> > it would be one thing if a default config had the problem, or if it was >> > something entirely dependent on remote data. >> > >> > I would be very tempted to respond to the CVE with "don't enable this >> incomplete >> > feature" as the solution. It's very common for incomplete features to be >> > included in released versions >> > >> > grumble, we have enough real bugs to worry about. >> > >> > David Lang >> > >> > On Tue, 6 Oct 2015, Rainer Gerhards wrote: >> > >> > > Date: Tue, 6 Oct 2015 07:15:31 +0200 >> > > From: Rainer Gerhards >> > > Reply-To: rsyslog-users >> > > To: rsyslog-users >> > > Subject: Re: [rsyslog] Separation of actions based on log source - >> with good >> > > performance >> > > >> > > Sorry, folks, good intent always seems to find someone who turns it >> > > into negative. I was yesterday contacted by a distro maintainer who >> > > wants to turn this bug in the officially non-existant lookup table >> > > feature into a CVE and insists that it is a vuln even after the >> > > argument that the feature never oficially existed. >> > > >> > > It looks like it was a bad idea to merge potentially useful yet >> > > incomplete code into the main branch (and documenting it to be not >> > > present). It looks like I need to re-think my stance on experimental >> > > features. >> > > >> > > Anyhow, I really don't want to support the argument that something >> > > non-existing can be a CVE. As such, I will create a new >> > > master-insecure branch, which will be a clone of the current master >> > > branch. Then I'll remove the lookup table code, so that the code base >> > > matches the documentation. I really don't want to create a general >> > > principle here that we need to create CVEs (and patched) for something >> > > that was just added as a convenience for a handful of folks who were >> > > ready to take a risk. >> > > >> > > If there is sufficient interest, we can consider officially adding >> > > this feature to the January 8.15 release iff it is ready by then. >> > > @janmejay: please let me know if you would like to continue with your >> > > work on lookup tables under this new situation. >> > > >> > > As soon as I have time, I'll check what else needs to be removed. Not >> > > sure about the ./contributed branch, because the project cannot >> > > guarantee at all this is bug-free. It's documented to be so, but if >> > > that is not sufficient, it should probably live only in the >> > > master-insecure branch. >> > > >> > > Rainer >> > > >> > > 2015-10-02 17:29 GMT+02:00 singh.janmejay : >> > >> As of now it returns empty string for no-match. I guess we should go >> ahead >> > >> with it in its current form. We can add default value any time later >> > >> without breaking compatibility(it'd default to ""). >> > >> >> > >> I'll add a test for multiple tables on Monday. >> > >> >> > >> On Fri, Oct 2, 2015, 7:16 PM
Re: [rsyslog] Separation of actions based on log source - with good performance
How about this: I get to it next week and implement all that the proposal talks about and document it? It doesn't seem like a lot of work from where it is right now. I vote for not removing it. -- Regards, Janmejay PS: Please blame the typos in this mail on my phone's uncivilized soft keyboard sporting it's not-so-smart-assist technology. On Oct 6, 2015 11:53 AM,wrote: > Hello, > I have never heared such a nonsense. > Actually the number of applications that does not include features that > are not official documented shoult be extremly limited. > > The functionality is really usefull and already in big landscapes > productive. > Please, please do NOT remove the lookup-table from the main branch. > The functionaltiy works fine, I'm using this since march and I did not > have any issue since the latest patch of janmejay. > > Even the "concept" is not fully implemented (e.g. smaller things like > nomatch) the main part works fine. > > > My suggestion would be to document everything which is currently > implemented and keep the "conceptual documentation" as it is. > So the Maintainer should no longer have an issue with it. > > > If the main issue it the time to document the already implemented > features, I can create a patch. > > > Chris > > > > > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr > > Von: "David Lang" > > An: rsyslog-users > > Betreff: Re: [rsyslog] Separation of actions based on log source - with > good performance > > > > a CVE for something that requires manually enabling an experimental > feature??? > > > > it would be one thing if a default config had the problem, or if it was > > something entirely dependent on remote data. > > > > I would be very tempted to respond to the CVE with "don't enable this > incomplete > > feature" as the solution. It's very common for incomplete features to be > > included in released versions > > > > grumble, we have enough real bugs to worry about. > > > > David Lang > > > > On Tue, 6 Oct 2015, Rainer Gerhards wrote: > > > > > Date: Tue, 6 Oct 2015 07:15:31 +0200 > > > From: Rainer Gerhards > > > Reply-To: rsyslog-users > > > To: rsyslog-users > > > Subject: Re: [rsyslog] Separation of actions based on log source - > with good > > > performance > > > > > > Sorry, folks, good intent always seems to find someone who turns it > > > into negative. I was yesterday contacted by a distro maintainer who > > > wants to turn this bug in the officially non-existant lookup table > > > feature into a CVE and insists that it is a vuln even after the > > > argument that the feature never oficially existed. > > > > > > It looks like it was a bad idea to merge potentially useful yet > > > incomplete code into the main branch (and documenting it to be not > > > present). It looks like I need to re-think my stance on experimental > > > features. > > > > > > Anyhow, I really don't want to support the argument that something > > > non-existing can be a CVE. As such, I will create a new > > > master-insecure branch, which will be a clone of the current master > > > branch. Then I'll remove the lookup table code, so that the code base > > > matches the documentation. I really don't want to create a general > > > principle here that we need to create CVEs (and patched) for something > > > that was just added as a convenience for a handful of folks who were > > > ready to take a risk. > > > > > > If there is sufficient interest, we can consider officially adding > > > this feature to the January 8.15 release iff it is ready by then. > > > @janmejay: please let me know if you would like to continue with your > > > work on lookup tables under this new situation. > > > > > > As soon as I have time, I'll check what else needs to be removed. Not > > > sure about the ./contributed branch, because the project cannot > > > guarantee at all this is bug-free. It's documented to be so, but if > > > that is not sufficient, it should probably live only in the > > > master-insecure branch. > > > > > > Rainer > > > > > > 2015-10-02 17:29 GMT+02:00 singh.janmejay : > > >> As of now it returns empty string for no-match. I guess we should go > ahead > > >> with it in its current form. We can add default value any time later > > >> without breaking compatibility(it'd default to ""). > > >> > > >> I'll add a test for multiple tables on Monday. > > >> > > >> On Fri, Oct 2, 2015, 7:16 PM wrote: > > >> > > >>> Hi, > > >>> No, I didn't. I even didn't realize the patch. > > >>> > > >>> It seems to be exactly the related issue. So I don't expect any > further > > >>> issues. > > >>> I will use the new version on 2 systems. If there is any other > issue, I > > >>> will let you know. > > >>> Release data for next rsyslog version is quite far so enough time to > > >>> test... ;) > > >>> > >
Re: [rsyslog] Remote syslog setup unsuccesful
David, I added a file: /etc/rsyslog.d/enableDebug.conf with the contents: *.* /var/log/all.log;RSYSLOG_DebugFormat And restarted rsyslog. I do not see a file all.log being created in /var/log. Could it be this feature was not included in rsyslog version 4.2.0 ? Geert Liesveld. -- View this message in context: http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589255.html Sent from the rsyslog-users mailing list archive at Nabble.com. ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] Separation of actions based on log source - with good performance
2015-10-06 9:51 GMT+02:00 singh.janmejay: > Let us ship a full implementation in 8.14, let us not remove it. OK... I'd say let's see if this actually gets a CVE or not. If it does, I think the proper thing to do is to remove what never was really existing in 8.14. We can always introduce a new feature in 8.15. But for political reasons, I do not want to make this look like a bugfix, but rather what it is: the first incarnation of a new feature. I think those that really need this feature are in this discussion thread and can build it themselves until 8.15 comes out. The implications of makeing this look like a bug-fix are hard from my PoV: I have already been asked to fix this "security vulnerability" in older versions. The real fix is to remove the code in question, not to spend another 2+ days on writing old stuff patches and advisories for something that is really non-existent. Over the past couple of month, we have done several improvements to the build process and testing process and static code analysis and cross-platform and... . While all of this is nice, it is also clearly visible that the past 12 month were those with the least new features and the least *bugfixes* since ever in rsyslog history. While I definitely see value, I become more and more concerned if spending my scare time on many close-to-cosmetic or quite-platform-specific issues is really the right path forward. Sure we now know how to tell that something is failing and we know it would be good to fix them. Two years ago, we wouldn't know for sure, but the problems would have been fixed already ;) Rainer > > -- > Regards, > Janmejay > > PS: Please blame the typos in this mail on my phone's uncivilized soft > keyboard sporting it's not-so-smart-assist technology. > > On Oct 6, 2015 1:19 PM, "singh.janmejay" wrote: > >> How about this: I get to it next week and implement all that the proposal >> talks about and document it? It doesn't seem like a lot of work from where >> it is right now. >> >> I vote for not removing it. >> >> -- >> Regards, >> Janmejay >> >> PS: Please blame the typos in this mail on my phone's uncivilized soft >> keyboard sporting it's not-so-smart-assist technology. >> >> On Oct 6, 2015 11:53 AM, wrote: >> >>> Hello, >>> I have never heared such a nonsense. >>> Actually the number of applications that does not include features that >>> are not official documented shoult be extremly limited. >>> >>> The functionality is really usefull and already in big landscapes >>> productive. >>> Please, please do NOT remove the lookup-table from the main branch. >>> The functionaltiy works fine, I'm using this since march and I did not >>> have any issue since the latest patch of janmejay. >>> >>> Even the "concept" is not fully implemented (e.g. smaller things like >>> nomatch) the main part works fine. >>> >>> >>> My suggestion would be to document everything which is currently >>> implemented and keep the "conceptual documentation" as it is. >>> So the Maintainer should no longer have an issue with it. >>> >>> >>> If the main issue it the time to document the already implemented >>> features, I can create a patch. >>> >>> >>> Chris >>> >>> >>> >>> > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr >>> > Von: "David Lang" >>> > An: rsyslog-users >>> > Betreff: Re: [rsyslog] Separation of actions based on log source - with >>> good performance >>> > >>> > a CVE for something that requires manually enabling an experimental >>> feature??? >>> > >>> > it would be one thing if a default config had the problem, or if it was >>> > something entirely dependent on remote data. >>> > >>> > I would be very tempted to respond to the CVE with "don't enable this >>> incomplete >>> > feature" as the solution. It's very common for incomplete features to be >>> > included in released versions >>> > >>> > grumble, we have enough real bugs to worry about. >>> > >>> > David Lang >>> > >>> > On Tue, 6 Oct 2015, Rainer Gerhards wrote: >>> > >>> > > Date: Tue, 6 Oct 2015 07:15:31 +0200 >>> > > From: Rainer Gerhards >>> > > Reply-To: rsyslog-users >>> > > To: rsyslog-users >>> > > Subject: Re: [rsyslog] Separation of actions based on log source - >>> with good >>> > > performance >>> > > >>> > > Sorry, folks, good intent always seems to find someone who turns it >>> > > into negative. I was yesterday contacted by a distro maintainer who >>> > > wants to turn this bug in the officially non-existant lookup table >>> > > feature into a CVE and insists that it is a vuln even after the >>> > > argument that the feature never oficially existed. >>> > > >>> > > It looks like it was a bad idea to merge potentially useful yet >>> > > incomplete code into the main branch (and documenting it to be not >>> > > present). It looks like I need to
Re: [rsyslog] Remote syslog setup unsuccesful
no, that has existed since rsyslog 2.x or so if you aren't getting anything at all in that log you either have a syntax error that's confusing rsyslog (run 'rsyslogd -N2' to have it do a syntax check of the main config file and anything included by that file) or you have something else in the config that's throwing the log away before you get to that line. can you post the config file with that line in it? David Lang On Tue, 6 Oct 2015, GeertLiesveld wrote: Date: Tue, 6 Oct 2015 01:35:40 -0700 (MST) From: GeertLiesveldReply-To: rsyslog-users To: rsyslog@lists.adiscon.com Subject: Re: [rsyslog] Remote syslog setup unsuccesful David, I added a file: /etc/rsyslog.d/enableDebug.conf with the contents: *.* /var/log/all.log;RSYSLOG_DebugFormat And restarted rsyslog. I do not see a file all.log being created in /var/log. Could it be this feature was not included in rsyslog version 4.2.0 ? Geert Liesveld. -- View this message in context: http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589255.html Sent from the rsyslog-users mailing list archive at Nabble.com. ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT. ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
I'm surprised it didn't help more than that. what do the impstats look like. Impstats looks good - only two dynafiles that still have evictions, which I'll tune a little more now. See bottom of message for a full recent output. at high traffic volumes increasing the batch size can also be helpful, but if the queue is usually empty it probably won't make much difference. We have several %fromhost% templates for servers. I noticed that there is a rush of DNS requests when rsyslog first starts up but then it quiets down due to the internal caching. I also have a local DNS cache running to point to - not currently set - but when I did have the system pointed there, I still had missing messages. well, you've fixed a lot of things, the dynafile thrashing was a major problem. DNS is probably less of an issue, and only an issue for UDP not TCP (UDP needs to do a lookup for each packet, TCP only for each connection) I personally recommend using fromhost-ip instead of fromhost. If you want to use a name, use the hostname that the server put in the log message, if you need to ignore that and see where it last came from, the IP is faster and has fewer failure modes. with your config, only traffic arriving on port 10517 will get written to the testingruleset, internal syslog messages won't show up. now, there are things that can cause you to see messages via tcpdump and not see them in rsyslog. The most common is iptables rules But since you are writing to /tmp you could be running into permission problems with SELinux/AppArmor (depending on your distro). try putting the file under /var/log and see if it appears. It should appear when the first log message arrives. by the way, you don't need to do *.*, just list the actions directly, so the following works just fine (permissions allowing) ruleset(name="TestingRuleSet") { /tmp/debug.log;RSYSLOG_DebugFormat action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" name="writeEduroamACSFile") stop } # end ruleset in impstats, the /tmp/debug.log line will be action ## while the line after it will have a name, check what it says about messages that are processed/etc Can you deliver messages to port 10517 via TCP? there is an odd case where a missing route can cause UDP messages to not be delivered to the app. Oct 6 16:27:45 its-syslog-up1 rsyslogd-pstats: imudp(*:10517): origin=imudp submitted=1221 Oct 6 16:27:45 its-syslog-up1 rsyslogd-pstats: imudp(*:10517): origin=imudp submitted=0 yep, that says that messages are arriving with impstats, do you have it configured to reset counters each time or not? there is a little inaccuracy that creeps in when resetting the counters, and the impstats analysis tool that Rainer pointed you at assumes the counters aren't being reset. We really need at least two sets of stats to be able to see what's happening. now, looking at the impstats I see one smoking gun: main Q: origin=core.queue size=149 enqueued=492522977 full=51 discarded.full=0 discarded.nf=0 maxqsize=10 maxqsize is such a nice round number that I'll bet it's the max you have configured. That means that at some point rsyslog filled up the queue completely and would have been unable to process any more messages. That means that UDP messages that arrived during that time would have been dropped. this also says it was full 51 times, so that is at least 51 messages lost. on a high volume server, you probably want to bump the main queue size up a bit larger as well. another issue is that several actions show that they have been suspended and not resumed, that will cause logs to be missing from those outputs (action2, 30, 43, 67). one advantage of the new action() format is the ability to name things. Even if you don't name everything, naming several of them will help figure out what the rest are. as you said, there are still some dynafile cache settings that probably need to be increased. They can slow everything down and cause the queue to fill (but without knowing if the stats are reset, I don't know if the numbers are since startup or since the last report) Since the busiest outputs are also the ones with the highest miss rates, you may end up with a more dramatic I/O savings once you fix them. I don't understand why you have multiple lines for the same thing (imudp and writeEduroamACSFile for example), unless it's one line per thread, in which case these are showing that you really don't need multiple threads for the inputs, you aren't using them :-) David Lang ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] Separation of actions based on log source - with good performance
Ok. But we'll have to keep merging master into master-insecure going forward. Im ok working on it off any branch. On Tue, Oct 6, 2015 at 3:14 PM, Rainer Gerhardswrote: > 2015-10-06 9:51 GMT+02:00 singh.janmejay : >> Let us ship a full implementation in 8.14, let us not remove it. > > OK... I'd say let's see if this actually gets a CVE or not. If it > does, I think the proper thing to do is to remove what never was > really existing in 8.14. We can always introduce a new feature in > 8.15. But for political reasons, I do not want to make this look like > a bugfix, but rather what it is: the first incarnation of a new > feature. I think those that really need this feature are in this > discussion thread and can build it themselves until 8.15 comes out. > > The implications of makeing this look like a bug-fix are hard from my > PoV: I have already been asked to fix this "security vulnerability" in > older versions. The real fix is to remove the code in question, not to > spend another 2+ days on writing old stuff patches and advisories for > something that is really non-existent. > > Over the past couple of month, we have done several improvements > to the build process and testing process and static code analysis and > cross-platform and... . While all of this is nice, it is also clearly > visible that the past 12 month were those with the least new features > and the least *bugfixes* since ever in rsyslog history. While I > definitely see value, I become more and more concerned if spending my > scare time on many close-to-cosmetic or quite-platform-specific issues > is really the right path forward. Sure we now know how to tell that > something is failing and we know it would be good to fix them. Two > years ago, we wouldn't know for sure, but the problems would have been > fixed already ;) > > Rainer >> >> -- >> Regards, >> Janmejay >> >> PS: Please blame the typos in this mail on my phone's uncivilized soft >> keyboard sporting it's not-so-smart-assist technology. >> >> On Oct 6, 2015 1:19 PM, "singh.janmejay" wrote: >> >>> How about this: I get to it next week and implement all that the proposal >>> talks about and document it? It doesn't seem like a lot of work from where >>> it is right now. >>> >>> I vote for not removing it. >>> >>> -- >>> Regards, >>> Janmejay >>> >>> PS: Please blame the typos in this mail on my phone's uncivilized soft >>> keyboard sporting it's not-so-smart-assist technology. >>> >>> On Oct 6, 2015 11:53 AM, wrote: >>> Hello, I have never heared such a nonsense. Actually the number of applications that does not include features that are not official documented shoult be extremly limited. The functionality is really usefull and already in big landscapes productive. Please, please do NOT remove the lookup-table from the main branch. The functionaltiy works fine, I'm using this since march and I did not have any issue since the latest patch of janmejay. Even the "concept" is not fully implemented (e.g. smaller things like nomatch) the main part works fine. My suggestion would be to document everything which is currently implemented and keep the "conceptual documentation" as it is. So the Maintainer should no longer have an issue with it. If the main issue it the time to document the already implemented features, I can create a patch. Chris > Gesendet: Dienstag, 06. Oktober 2015 um 07:36 Uhr > Von: "David Lang" > An: rsyslog-users > Betreff: Re: [rsyslog] Separation of actions based on log source - with good performance > > a CVE for something that requires manually enabling an experimental feature??? > > it would be one thing if a default config had the problem, or if it was > something entirely dependent on remote data. > > I would be very tempted to respond to the CVE with "don't enable this incomplete > feature" as the solution. It's very common for incomplete features to be > included in released versions > > grumble, we have enough real bugs to worry about. > > David Lang > > On Tue, 6 Oct 2015, Rainer Gerhards wrote: > > > Date: Tue, 6 Oct 2015 07:15:31 +0200 > > From: Rainer Gerhards > > Reply-To: rsyslog-users > > To: rsyslog-users > > Subject: Re: [rsyslog] Separation of actions based on log source - with good > > performance > > > > Sorry, folks, good intent always seems to find someone who turns it > > into negative. I was yesterday contacted by a distro maintainer who > > wants to turn this bug in the officially non-existant
Re: [rsyslog] Remote syslog setup unsuccesful
2015-10-06 12:16 GMT+02:00 David Lang: > no, that has existed since rsyslog 2.x or so As I was not sure, I checked out 4.2.0 and the RSYSLOG_DebugFormat template is indeed available in it. Rainer > if you aren't getting anything at all in that log you either have a syntax > error that's confusing rsyslog (run 'rsyslogd -N2' to have it do a syntax > check of the main config file and anything included by that file) or you > have something else in the config that's throwing the log away before you > get to that line. > > can you post the config file with that line in it? > > David Lang > > > > On Tue, 6 Oct 2015, GeertLiesveld wrote: > >> Date: Tue, 6 Oct 2015 01:35:40 -0700 (MST) >> From: GeertLiesveld >> Reply-To: rsyslog-users >> To: rsyslog@lists.adiscon.com >> Subject: Re: [rsyslog] Remote syslog setup unsuccesful >> >> >> David, >> >> I added a file: /etc/rsyslog.d/enableDebug.conf with the contents: >> *.* /var/log/all.log;RSYSLOG_DebugFormat >> >> And restarted rsyslog. I do not see a file all.log being created in >> /var/log. >> >> Could it be this feature was not included in rsyslog version 4.2.0 ? >> >> Geert Liesveld. >> >> >> >> -- >> View this message in context: >> http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589255.html >> Sent from the rsyslog-users mailing list archive at Nabble.com. >> ___ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ >> What's up with rsyslog? Follow https://twitter.com/rgerhards >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T >> LIKE THAT. >> > ___ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T > LIKE THAT. ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
[rsyslog] RFC: dynamic-stats support
Hi, I am working on support for stats with dynamic-name. This comes handy in situations where metric-name is dependent upon value of a certain attribute of the message. Say, for a central log-aggregation service, its valuable to know what is inbound message-count distribution across application-clusters that send logs to it, or for a shared-server, its valuable to know what is the log-volume generation across users etc. Im thinking of using functions-like interface to support this. It may look similar to this: dyn_stats("user_msg_count") ... ruleset(...) { ... dyn_inc("user_msg_count", $.user) ... } dyn_stats signature looks like: dyn_stats(, , , ) dyn_inc signature looks like: dyn_inc(, ) Reporting would work similar to static-metric via impstats. Mapping: statsobj_s.name = name_space statsobj_s.origin = "dyn" ctr_s.name = "foo" (say $.user had value foo) Thoughts / suggestions? -- Regards, Janmejay http://codehunk.wordpress.com ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] RFC: dynamic-stats support
On Tue, 6 Oct 2015, singh.janmejay wrote: Hi, I am working on support for stats with dynamic-name. This comes handy in situations where metric-name is dependent upon value of a certain attribute of the message. Say, for a central log-aggregation service, its valuable to know what is inbound message-count distribution across application-clusters that send logs to it, or for a shared-server, its valuable to know what is the log-volume generation across users etc. Im thinking of using functions-like interface to support this. It may look similar to this: dyn_stats("user_msg_count") ... ruleset(...) { ... dyn_inc("user_msg_count", $.user) ... } dyn_stats signature looks like: dyn_stats(, , , ) dyn_inc signature looks like: dyn_inc(, ) Reporting would work similar to static-metric via impstats. Mapping: statsobj_s.name = name_space statsobj_s.origin = "dyn" ctr_s.name = "foo" (say $.user had value foo) Thoughts / suggestions? how is this different/better than global variables? (although we may need to implement soem functions, atomic inc/dec copy+clear) If you have pstats output in json format, you can even piggyback on it's schedule to output the data. things like stats can very easily end up being expensive in terms of locking (something global variables already have figured out), and it sounds like you are proposing adding a scheduler of some sort to output the data. variables should not need to be 'recycled', either they contain data or they don't. If they contain data, you need to keep the data until you do something with it, if they don't, you don't have to track them. I am actually doing this sort of thing external to rsyslog in SEC I have a template in rsyslog that contains hostname, fromhost-ip, programname and I output it via improg to SEC. SEC accumulates counters and has scheduled outputs to files. before I started using SEC for this, I used the same template to output to a file and then for reports, used cut + sort + uniq -c to extract the data I need. When the files only contain the significant data, this is actually not bad to do, even at higher volumes. David Lang ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] omfile fuse
2015-10-06 8:26 GMT+02:00 David Lang: > On Tue, 6 Oct 2015, Rainer Gerhards wrote: > >> 2015-10-06 7:27 GMT+02:00 David Lang : >>> >>> On Tue, 6 Oct 2015, Rainer Gerhards wrote: >>> 2015-10-06 0:09 GMT+02:00 Kendall Green : > > > I've set asyncWriting="off" and the only difference is, that with 1000 > message test, all were delivered on HUP and/or service restart, but > apparently not on flushInterval or buffer full. I'm currently running > another test with 100k messages iterating about 10 logs/sec, and > watching > the output for number of lines. The output file is created on dfs-fuse > mount upon receipt of the first log, but 0 lines written. > > This current test will hopefully show how many logs get processed > without > intervention over the next ~3 hours, and if still no logs appear will > see > how many are written upon HUP signal. > > I'm unsure how the asyncWriting off impacts the rest of the config, and > if > the current config has disabled compression... or not with no zip. > > This note is from omfile asyncWriting doc: > *Note that in order to enable FlushInterval, AsyncWriting must be set > to > “on”. Otherwise, the flush interval will be ignored. Also note that > when > FlushOnTXEnd is “on” but AsyncWriting is off, output will only be > written > when the buffer is full. This may take several hours, or even require a > rsyslog shutdown. However, a buffer flush can be forced in that case by > sending rsyslogd a HUP signal.* > > Please help clarify how these parameters correspond and what the > settings > can achieve expedited log transport, avoiding long delays waiting for > buffering or flush intervals. This goal is why have chosen low settings > for > IOBufferSize and flushInterval. Without async writing, everything is written sync on when it arrives (more precisely at commit of the current batch) aka "as quickly as possible". Thus, these params do not exist because they have no meaning. Note that with a very large batch and a small buffer, partial records may be written. >>> >>> >>> >>> although without the checkpoint parameter the writes are to the OS and >>> they >>> get flushed to disk at a later point. >> >> >> you mean "sync", right? > > > I may be misunderstanding, but i thought the checkpoint (or checkpoint > interval) parameter controleld how frequently you did the sync (so turning > sync on didn't make it per every message) That checkopoint interval is for queues (which share the same util object), not for omfile. There, what you say is right. For omfile (if not set to async writing), the sync happens at the end of each batch, which usually means after each write because usually a full batch goes into the write buffer. If the batch has more data than the write buffer fits, all buffer writes are done and then the flush. I've also updated the doc to be more precise: https://github.com/rsyslog/rsyslog-doc/commit/b2bb5d93787f9662eb41499c86f881d2bd649dce Rainer ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
Thank you David. From what I see, none of the queues are getting full. None of the actions - I have 73 total - have a 'failed' count of greater than zero. Four have a 'suspended' count in the single digits. When you said "if you are writing to a local file, you should not need to have a separate queue", does queue equate to ruleset in this case? The main reason I have the separate rulesets is to handle different kinds of incoming logs on different ports, making it easier to write the rules logic for the different types. Does that make sense? There is a high number of dynafiles, specifically for server logs, which we break out into files by host and programname with these templates: template(name="LinuxProgramFile" type="string" string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log") template(name="WindowsProgramFile" type="string" string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log ") The result is about 32,000 total dynafiles which are then rotated out daily. Looking at top, rsyslogd as a whole never goes above 120%. Most of the threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting about 50% during peak. Although I hadn't noticed this before, watching top for a few minutes this morning, rsyslogd hangs out around 60%, then bursts to almost 200% for one second, then disappears from top in the next second, before returning to 60%. It cycled through this every 10 seconds or so. After restarting rsyslogd, the behavior went away - it's staying around 60%. Should I try lowering the number of threads for the rulesets/imudp? This issue does happen with both TCP and UDP, however, which further compounds my confusion. You mention systemd - I'll look into that as I have a case open with RH. Dan -Original Message- From: rsyslog-boun...@lists.adiscon.com [mailto:rsyslog-boun...@lists.adiscon.com] On Behalf Of David Lang Sent: Monday, October 5, 2015 3:23 PM To: rsyslog-usersSubject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss doing a cut-n-past since this client doesn't quote signed messages properly: David, thank you. Apologies, I had cut out the dynafile for testing. Here is some additional config file: template(name="EduroamACSFile" type="string" string="/var/log/collection/eduroam_acs.log") template(name="EduroamACSFormat" type="string" string="%timestamp% %hostname% %syslogtag%%msg:::escape-cc%\n") ruleset(name="ruleset1" queue.type="LinkedList" queue.size="75" queue.workerthreads="4") { # at very top of ruleset - IPs obscured for privacy if $fromhost-ip == 'x.x.2.75' or $fromhost-ip == 'x.x.2.85' then { /tmp/debug.log;RSYSLOG_DebugFormat action(type="omfile" dynaFile="EduroamACSFile" template="EduroamACSFormat" name="writeEduroamACSFile") stop } #... other rules below } # end ruleset I do have impstats enabled for every 5 minutes. Here's output specific to this file for a recent sample for this file, plus the general rule set queues: Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342 evicted=0 maxused=1 closetimeouts=51 Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile: origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage: origin=impstats utime=217851481196 stime=87433080781 maxrss=628664 minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592 nvcsw=379912551 nivcsw=20999555 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet: origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0 discarded.nf=0 maxqsize=350911 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet: origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0 discarded.nf=0 maxqsize=156862 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet: origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0 discarded.nf=0 maxqsize=4243 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet: origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0 discarded.nf=0 maxqsize=456471 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0 maxqsize=1738 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w1): origin=imudp called.recvmmsg=8282193695 called.recvmsg=0 msgs.received=2833007396 Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: imudp(w0): origin=imudp called.recvmmsg=14154695256 called.recvmsg=0 msgs.received=8909255564 To estimate volume of logs this is receiving, I took a recent impstats output for enqueued for all rule sets and added them, then subtracted the values from an hour ago, divided by 3600 to get msgs/second and got 29521. This
Re: [rsyslog] Remote syslog setup unsuccesful
Rainer, thanks for your reply. For now , I just started rsyslog in the foreground using the -d option. That was enough to see that the messages from the remote host did not come in. I checked tcpdump again and saw the messages were actually delivered to the interface. We restored a broken install for iptables and that showed that all traffic should come in. So we started a test with netcat listening on port UDP 514 instead of rsyslog (stopped). That showed indeed nothing was coming in, which ruled out an error in the configuration of rsyslog. We checked the other settings for the OS. We have both ipv6 and ipv4 configured in this system. For Ipv6 there is a gateway configured. For ipv4 there was not. Actually, this was one of the things that David mentioned earlier. Well, it seems he was exacly right! we configured a default gateway for ipv4 as well, and after starting the rsyslog service, messages were logged in the way they are supposed to, in a separate directory using the %FROMHOST% option. I want to thank you very much for your most welcome help. Best Regards, Geert Liesveld. -- View this message in context: http://rsyslog-users.1305293.n2.nabble.com/Remote-syslog-setup-unsuccesful-tp7589216p7589263.html Sent from the rsyslog-users mailing list archive at Nabble.com. ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
2015-10-06 14:57 GMT+02:00 Woodruff, Dan: > Thank you David. From what I see, none of the queues are getting full. > > None of the actions - I have 73 total - have a 'failed' count of greater > than zero. Four have a 'suspended' count in the single digits. > > When you said "if you are writing to a local file, you should not need to > have a separate queue", does queue equate to ruleset in this case? The main > reason I have the separate rulesets is to handle different kinds of incoming > logs on different ports, making it easier to write the rules logic for the > different types. Does that make sense? > > There is a high number of dynafiles, specifically for server logs, which we > break out into files by host and programname with these templates: > template(name="LinuxProgramFile" type="string" > string="/var/log/collection/linux/%fromhost:::lowercase%/%programname%.log") > template(name="WindowsProgramFile" type="string" > string="/var/log/collection/windows/%HOSTNAME:::lowercase%/%programname%.log > ") > > The result is about 32,000 total dynafiles which are then rotated out daily. > > Looking at top, rsyslogd as a whole never goes above 120%. Most of the > threads hang out below 15% with the busiest one - rs:NetworkDevic - hitting > about 50% during peak. Although I hadn't noticed this before, watching top > for a few minutes this morning, rsyslogd hangs out around 60%, then bursts > to almost 200% for one second, then disappears from top in the next second, > before returning to 60%. It cycled through this every 10 seconds or so. > After restarting rsyslogd, the behavior went away - it's staying around 60%. This could be caused by a too-small dynafile cache. I haven't located the conf for these actions with 32,000 files nor the stats, but depending on size it could be the problem. Please post the *full* impstats log. Rainer > > Should I try lowering the number of threads for the rulesets/imudp? This > issue does happen with both TCP and UDP, however, which further compounds my > confusion. You mention systemd - I'll look into that as I have a case open > with RH. > > Dan > > > -Original Message- > From: rsyslog-boun...@lists.adiscon.com > [mailto:rsyslog-boun...@lists.adiscon.com] On Behalf Of David Lang > Sent: Monday, October 5, 2015 3:23 PM > To: rsyslog-users > Subject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss > > doing a cut-n-past since this client doesn't quote signed messages properly: > > > David, thank you. Apologies, I had cut out the dynafile for testing. Here is > some additional config file: > > template(name="EduroamACSFile" type="string" > string="/var/log/collection/eduroam_acs.log") > template(name="EduroamACSFormat" type="string" string="%timestamp% > %hostname% %syslogtag%%msg:::escape-cc%\n") > > ruleset(name="ruleset1" queue.type="LinkedList" queue.size="75" > queue.workerthreads="4") { > # at very top of ruleset - IPs obscured for privacy > if $fromhost-ip == 'x.x.2.75' > or $fromhost-ip == 'x.x.2.85' > then { > /tmp/debug.log;RSYSLOG_DebugFormat > action(type="omfile" dynaFile="EduroamACSFile" > template="EduroamACSFormat" name="writeEduroamACSFile") > stop > } > #... other rules below > } # end ruleset > > I do have impstats enabled for every 5 minutes. Here's output specific to > this file for a recent sample for this file, plus the general rule set > queues: > > Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: dynafile cache > EduroamACSFile: origin=omfile requests=31123 level0=30781 missed=342 > evicted=0 maxused=1 closetimeouts=51 > Oct 5 14:06:47 its-syslog-up1 rsyslogd-pstats: writeEduroamACSFile: > origin=core.action processed=31123 failed=0 suspended=0 suspended.duration=0 > resumed=0 > > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: resource-usage: > origin=impstats utime=217851481196 stime=87433080781 maxrss=628664 > minflt=43999703 majflt=216 inblock=3033769 oublock=-1119976592 > nvcsw=379912551 nivcsw=20999555 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: LinuxRuleSet: > origin=core.queue size=6 enqueued=3304935974 full=0 discarded.full=0 > discarded.nf=0 maxqsize=350911 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: WindowsRuleSet: > origin=core.queue size=0 enqueued=648043087 full=0 discarded.full=0 > discarded.nf=0 maxqsize=156862 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: MailRuleSet: > origin=core.queue size=0 enqueued=307960420 full=0 discarded.full=0 > discarded.nf=0 maxqsize=4243 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: NetworkDeviceRuleSet: > origin=core.queue size=0 enqueued=11008320810 full=0 discarded.full=0 > discarded.nf=0 maxqsize=456471 > Oct 5 14:11:47 its-syslog-up1 rsyslogd-pstats: main Q: origin=core.queue > size=120 enqueued=1496835 full=0 discarded.full=0 discarded.nf=0 > maxqsize=1738 > Oct 5 14:11:47 its-syslog-up1
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
Rainer, thank you. Here is a full extract of a recent impstats run: Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: imuxsock: origin=imuxsock submitted=282984 ratelimit.discarded=0 ratelimit.numratelimiters=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache StatsFile: origin=omfile requests=326648 level0=325603 missed=1045 evicted=0 maxused=1 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 1: origin=core.action processed=326648 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 2: origin=core.action processed=3255090916 failed=0 suspended=2 suspended.duration=240 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LDAPFile: origin=omfile requests=3171530030 level0=3171529033 missed=997 evicted=0 maxused=1 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 3: origin=core.action processed=3171530030 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683 missed=91303303 evicted=91292586 maxused=10 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 4: origin=core.action processed=138877957 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxUFWFile: origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 5: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache DHCPFile: origin=omfile requests=3518037 level0=3516992 missed=1045 evicted=0 maxused=1 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 6: origin=core.action processed=3518037 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxBlankProgramFile: origin=omfile requests=93283 level0=28347 missed=1932 evicted=15 maxused=10 closetimeouts=303 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 7: origin=core.action processed=93283 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxProgramFile: origin=omfile requests=358821719 level0=245218530 missed=23914416 evicted=23903728 maxused=10 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 8: origin=core.action processed=358821719 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache 10514Catchall: origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 9: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache WindowsSecFile: origin=omfile requests=232842816 level0=189594825 missed=19555094 evicted=19544459 maxused=10 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 10: origin=core.action processed=232842816 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache WindowsSecFile: origin=omfile requests=18059 level0=17740 missed=319 evicted=0 maxused=1 closetimeouts=68 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 11: origin=core.action processed=18059 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache WindowsAppFile: origin=omfile requests=14111964 level0=8603913 missed=4387715 evicted=4377380 maxused=10 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 12: origin=core.action processed=14111964 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache WindowsSysFile: origin=omfile requests=7726972 level0=4070882 missed=3120334 evicted=3110190 maxused=10 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 13: origin=core.action processed=7726972 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache ParkingPayStationFile: origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 14: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache WindowsProgramFileNoDotLog: origin=omfile requests=238047839 level0=217994829 missed=17165101 evicted=17154494 maxused=10 closetimeouts=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 15: origin=core.action processed=238047839 failed=0 suspended=0 suspended.duration=0 resumed=0 Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache WindowsProgramFile: origin=omfile
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
There are a couple of those: LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683 missed=91303303 evicted=91292586 maxused=10 closetimeouts=0 have a look at the evicted count! This totally kills performance. Each time one entry needs to be evicted, the file is closed, and a new one is opened. You should set the cache size for this action to match roughly the number of files that you want to keep open during normal operations (e.g. if you have 1000 hosts and include hour and hostname in the dynafile name, the cache size should be around 1000, except if you have some low talkers. Also note the online tool at http://www.rsyslog.com/impstats-analyzer/ It probably will spit out a couple of warnings if you submit the pstats. HTH Rainer 2015-10-06 15:29 GMT+02:00 Woodruff, Dan: > Rainer, thank you. Here is a full extract of a recent impstats run: > > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: imuxsock: origin=imuxsock > submitted=282984 ratelimit.discarded=0 ratelimit.numratelimiters=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache StatsFile: > origin=omfile requests=326648 level0=325603 missed=1045 evicted=0 maxused=1 > closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 1: origin=core.action > processed=326648 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 2: origin=core.action > processed=3255090916 failed=0 suspended=2 suspended.duration=240 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LDAPFile: > origin=omfile requests=3171530030 level0=3171529033 missed=997 evicted=0 > maxused=1 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 3: origin=core.action > processed=3171530030 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683 > missed=91303303 evicted=91292586 maxused=10 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 4: origin=core.action > processed=138877957 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxUFWFile: > origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0 > closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 5: origin=core.action > processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache DHCPFile: > origin=omfile requests=3518037 level0=3516992 missed=1045 evicted=0 > maxused=1 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 6: origin=core.action > processed=3518037 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > LinuxBlankProgramFile: origin=omfile requests=93283 level0=28347 missed=1932 > evicted=15 maxused=10 closetimeouts=303 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 7: origin=core.action > processed=93283 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > LinuxProgramFile: origin=omfile requests=358821719 level0=245218530 > missed=23914416 evicted=23903728 maxused=10 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 8: origin=core.action > processed=358821719 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > 10514Catchall: origin=omfile requests=0 level0=0 missed=0 evicted=0 > maxused=0 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 9: origin=core.action > processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > WindowsSecFile: origin=omfile requests=232842816 level0=189594825 > missed=19555094 evicted=19544459 maxused=10 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 10: > origin=core.action processed=232842816 failed=0 suspended=0 > suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > WindowsSecFile: origin=omfile requests=18059 level0=17740 missed=319 > evicted=0 maxused=1 closetimeouts=68 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 11: > origin=core.action processed=18059 failed=0 suspended=0 suspended.duration=0 > resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > WindowsAppFile: origin=omfile requests=14111964 level0=8603913 > missed=4387715 evicted=4377380 maxused=10 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 12: > origin=core.action processed=14111964 failed=0 suspended=0 > suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > WindowsSysFile: origin=omfile requests=7726972 level0=4070882
Re: [rsyslog] RFC: dynamic-stats support
Rainer, I see this as something completely outside the scope of variables. Building stats collector over variables is possible, but then we are then talking about a general purpose language which allows building such complex things. This increases the scope of Rainerscript and with larger scope comes complexity. I feel this is in-line with the other Lua discussion where you emphasized that Rainerscript should not become a fully-general-purpose language? Eg. creating an atomic-increment function for variable requires that we educate users about what can and can't be done if atomic-increment function is used anywhere on a variable. What relationship they can expect it to have with other atomic-incrementing variables (which gets into memory model). On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhardswrote: > I can't fully dig into this, but I think we must *very carefully* > evaluate the overall design. Some time ago we introduced arrays for > the limited liblognorm use case, and it hurts us every now and then > when folks want to use arrays for other use cases. It may probably > make sense to re-think how the variable engine etc behaves before > adding more functionality. And make sure that everything works smooth > in all use cases. While anything else may take care for some use > cases, I fear we may get too fragmented. At least this is what I > learned in the past months discussions. > > Anyone else? > > Rainer > > 2015-10-06 17:10 GMT+02:00 singh.janmejay : >> It is possible to use global-variables (it'll require some >> enhancements, table-support etc), but it'll be very inefficient >> compared to this approach. For instance, choice of data-structure etc >> allows making the solution a lot more efficient. >> >> Here its possible to locklessly increment counters in most cases, so >> its overhead is a lot lesser than global-variables. >> >> Recycle is precisely to allow this lockless mechanism to work. Its >> basically saying, it'll track metric-names he has seen in last 1 hour. >> If we kill tracking of it as soon as we don't see an increment >> (between 2 reporting runs of impstats), it'll lead to unnecessary >> churn when low-values are common or load is not uniform in time. >> >> Implementing it on top of global-variables is not only has very high >> performance-penalty(it'll be prohibitive for high-throughput >> scenarios), it also exposes too much complexity to the user (where >> user has to worry about reset etc). >> >> I don't plan to have a scheduler in this implementation. >> GetAllStatsLines call will purge the tree instead of reset at that >> interval. Its basically a balance between freeing-up memory occupied >> by stale-metric-names vs. performance (lockless handling of >> increment). So it will be governed by impstat schedule. May be I >> should change name to better name (equivalent of >> purge_known_keys_after_they_have_been_reported_N_times). >> >> >> On Tue, Oct 6, 2015 at 4:30 PM, David Lang wrote: >>> On Tue, 6 Oct 2015, singh.janmejay wrote: >>> Hi, I am working on support for stats with dynamic-name. This comes handy in situations where metric-name is dependent upon value of a certain attribute of the message. Say, for a central log-aggregation service, its valuable to know what is inbound message-count distribution across application-clusters that send logs to it, or for a shared-server, its valuable to know what is the log-volume generation across users etc. Im thinking of using functions-like interface to support this. It may look similar to this: dyn_stats("user_msg_count") ... ruleset(...) { ... dyn_inc("user_msg_count", $.user) ... } dyn_stats signature looks like: dyn_stats(, , >>> default=10k>, ) dyn_inc signature looks like: dyn_inc(, ) Reporting would work similar to static-metric via impstats. Mapping: statsobj_s.name = name_space statsobj_s.origin = "dyn" ctr_s.name = "foo" (say $.user had value foo) Thoughts / suggestions? >>> >>> >>> how is this different/better than global variables? (although we may need to >>> implement soem functions, atomic inc/dec copy+clear) If you have pstats >>> output in json format, you can even piggyback on it's schedule to output the >>> data. >>> >>> >>> things like stats can very easily end up being expensive in terms of locking >>> (something global variables already have figured out), and it sounds like >>> you are proposing adding a scheduler of some sort to output the data. >>> >>> variables should not need to be 'recycled', either they contain data or they >>> don't. If they contain data, you need to keep the data until you do >>> something with it, if they don't, you don't have to track them. >>> >>> >>> I am actually doing
Re: [rsyslog] RFC: dynamic-stats support
I personally would argue that stats around the actual content of syslog messages is outside of the domain that rsyslog should be responsible for. impstats makes sense to me as it provides statistics around rsyslogs operation itself. Once I start wanting stats and counters around message content, I would rather delegate that to a different system entirely. On Tue, Oct 6, 2015 at 12:04 PM, singh.janmejaywrote: > Rainer, > > I see this as something completely outside the scope of variables. > Building stats collector over variables is possible, but then we are > then talking about a general purpose language which allows building > such complex things. This increases the scope of Rainerscript and with > larger scope comes complexity. I feel this is in-line with the other > Lua discussion where you emphasized that Rainerscript should not > become a fully-general-purpose language? > > Eg. creating an atomic-increment function for variable requires that > we educate users about what can and can't be done if atomic-increment > function is used anywhere on a variable. What relationship they can > expect it to have with other atomic-incrementing variables (which gets > into memory model). > > > > On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards > wrote: > > I can't fully dig into this, but I think we must *very carefully* > > evaluate the overall design. Some time ago we introduced arrays for > > the limited liblognorm use case, and it hurts us every now and then > > when folks want to use arrays for other use cases. It may probably > > make sense to re-think how the variable engine etc behaves before > > adding more functionality. And make sure that everything works smooth > > in all use cases. While anything else may take care for some use > > cases, I fear we may get too fragmented. At least this is what I > > learned in the past months discussions. > > > > Anyone else? > > > > Rainer > > > > 2015-10-06 17:10 GMT+02:00 singh.janmejay : > >> It is possible to use global-variables (it'll require some > >> enhancements, table-support etc), but it'll be very inefficient > >> compared to this approach. For instance, choice of data-structure etc > >> allows making the solution a lot more efficient. > >> > >> Here its possible to locklessly increment counters in most cases, so > >> its overhead is a lot lesser than global-variables. > >> > >> Recycle is precisely to allow this lockless mechanism to work. Its > >> basically saying, it'll track metric-names he has seen in last 1 hour. > >> If we kill tracking of it as soon as we don't see an increment > >> (between 2 reporting runs of impstats), it'll lead to unnecessary > >> churn when low-values are common or load is not uniform in time. > >> > >> Implementing it on top of global-variables is not only has very high > >> performance-penalty(it'll be prohibitive for high-throughput > >> scenarios), it also exposes too much complexity to the user (where > >> user has to worry about reset etc). > >> > >> I don't plan to have a scheduler in this implementation. > >> GetAllStatsLines call will purge the tree instead of reset at that > >> interval. Its basically a balance between freeing-up memory occupied > >> by stale-metric-names vs. performance (lockless handling of > >> increment). So it will be governed by impstat schedule. May be I > >> should change name to better name (equivalent of > >> purge_known_keys_after_they_have_been_reported_N_times). > >> > >> > >> On Tue, Oct 6, 2015 at 4:30 PM, David Lang wrote: > >>> On Tue, 6 Oct 2015, singh.janmejay wrote: > >>> > Hi, > > I am working on support for stats with dynamic-name. This comes handy > in situations where metric-name is dependent upon value of a certain > attribute of the message. > > Say, for a central log-aggregation service, its valuable to know what > is inbound message-count distribution across application-clusters that > send logs to it, or for a shared-server, its valuable to know what is > the log-volume generation across users etc. > > Im thinking of using functions-like interface to support this. It may > look similar to this: > > > dyn_stats("user_msg_count") > > ... > > ruleset(...) { > ... > dyn_inc("user_msg_count", $.user) > ... > } > > > dyn_stats signature looks like: > dyn_stats(, , default=10k>, ) > > dyn_inc signature looks like: > dyn_inc(, ) > > > Reporting would work similar to static-metric via impstats. Mapping: > statsobj_s.name = name_space > statsobj_s.origin = "dyn" > ctr_s.name = "foo" (say $.user had value foo) > > > Thoughts / suggestions? > >>> > >>> > >>> how is this different/better than global variables? (although we may > need to >
Re: [rsyslog] RFC: dynamic-stats support
Sure, sounds good. The ability to gather stats with dynamic-key is important. I am willing to even help with a rewrite if at some point we feel its best implemented in a different way in the light of variable-implementation changes. On Tue, Oct 6, 2015 at 9:48 PM, Rainer Gerhardswrote: > 2015-10-06 18:04 GMT+02:00 singh.janmejay : >> Rainer, >> >> I see this as something completely outside the scope of variables. >> Building stats collector over variables is possible, but then we are >> then talking about a general purpose language which allows building >> such complex things. This increases the scope of Rainerscript and with >> larger scope comes complexity. I feel this is in-line with the other >> Lua discussion where you emphasized that Rainerscript should not >> become a fully-general-purpose language? >> >> Eg. creating an atomic-increment function for variable requires that >> we educate users about what can and can't be done if atomic-increment >> function is used anywhere on a variable. What relationship they can >> expect it to have with other atomic-incrementing variables (which gets >> into memory model). > > Maybe I just feel overwhelemed in the moment with keeping track of > everything that is going on. How about this: we can merge it BUT flag > it as experimental. If all works out well, I am free starting early > next year to have a deep look at the overall design and sticking > together all those loose edges. I suspect that I would like to change > a couple of things in the interest of tying it all well together (like > I currently do in liblognorm). > > But if I need to carry all this legacy, that's really a burden (e.g. > liblognorm now contains the full v1 code as a copy, which means it > also needs to be somewhat maintained). I want to avoid this. As long > as we document this as an *interim* solution that is not necessarily > here to stay and as such "use at your own risk and it will probably > break next year" I am sufficiently happy with that. We just must be > aware that things may really break and there is a big chance the > actually will. And I don't want to hear about potential vuln or > compatiblity issues or whatever when this code is changed/removed. > Also keep on your mind that I probably need to totally revamp the > variable system, as json-c has many problematic parts for our use > (what I learned when digging deep with liblognorm). So I *know* that > there are big changes coming up next year! > > And, full ack: I want to limit the scope of RainerScript. Arrays was a > good sample of why it may be a bad idea to go to boldly forward > without thinking about the big picture. > > Rainer >> >> >> >> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards >> wrote: >>> I can't fully dig into this, but I think we must *very carefully* >>> evaluate the overall design. Some time ago we introduced arrays for >>> the limited liblognorm use case, and it hurts us every now and then >>> when folks want to use arrays for other use cases. It may probably >>> make sense to re-think how the variable engine etc behaves before >>> adding more functionality. And make sure that everything works smooth >>> in all use cases. While anything else may take care for some use >>> cases, I fear we may get too fragmented. At least this is what I >>> learned in the past months discussions. >>> >>> Anyone else? >>> >>> Rainer >>> >>> 2015-10-06 17:10 GMT+02:00 singh.janmejay : It is possible to use global-variables (it'll require some enhancements, table-support etc), but it'll be very inefficient compared to this approach. For instance, choice of data-structure etc allows making the solution a lot more efficient. Here its possible to locklessly increment counters in most cases, so its overhead is a lot lesser than global-variables. Recycle is precisely to allow this lockless mechanism to work. Its basically saying, it'll track metric-names he has seen in last 1 hour. If we kill tracking of it as soon as we don't see an increment (between 2 reporting runs of impstats), it'll lead to unnecessary churn when low-values are common or load is not uniform in time. Implementing it on top of global-variables is not only has very high performance-penalty(it'll be prohibitive for high-throughput scenarios), it also exposes too much complexity to the user (where user has to worry about reset etc). I don't plan to have a scheduler in this implementation. GetAllStatsLines call will purge the tree instead of reset at that interval. Its basically a balance between freeing-up memory occupied by stale-metric-names vs. performance (lockless handling of increment). So it will be governed by impstat schedule. May be I should change name to better name (equivalent of
Re: [rsyslog] RFC: dynamic-stats support
2015-10-06 18:04 GMT+02:00 singh.janmejay: > Rainer, > > I see this as something completely outside the scope of variables. > Building stats collector over variables is possible, but then we are > then talking about a general purpose language which allows building > such complex things. This increases the scope of Rainerscript and with > larger scope comes complexity. I feel this is in-line with the other > Lua discussion where you emphasized that Rainerscript should not > become a fully-general-purpose language? > > Eg. creating an atomic-increment function for variable requires that > we educate users about what can and can't be done if atomic-increment > function is used anywhere on a variable. What relationship they can > expect it to have with other atomic-incrementing variables (which gets > into memory model). Maybe I just feel overwhelemed in the moment with keeping track of everything that is going on. How about this: we can merge it BUT flag it as experimental. If all works out well, I am free starting early next year to have a deep look at the overall design and sticking together all those loose edges. I suspect that I would like to change a couple of things in the interest of tying it all well together (like I currently do in liblognorm). But if I need to carry all this legacy, that's really a burden (e.g. liblognorm now contains the full v1 code as a copy, which means it also needs to be somewhat maintained). I want to avoid this. As long as we document this as an *interim* solution that is not necessarily here to stay and as such "use at your own risk and it will probably break next year" I am sufficiently happy with that. We just must be aware that things may really break and there is a big chance the actually will. And I don't want to hear about potential vuln or compatiblity issues or whatever when this code is changed/removed. Also keep on your mind that I probably need to totally revamp the variable system, as json-c has many problematic parts for our use (what I learned when digging deep with liblognorm). So I *know* that there are big changes coming up next year! And, full ack: I want to limit the scope of RainerScript. Arrays was a good sample of why it may be a bad idea to go to boldly forward without thinking about the big picture. Rainer > > > > On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards > wrote: >> I can't fully dig into this, but I think we must *very carefully* >> evaluate the overall design. Some time ago we introduced arrays for >> the limited liblognorm use case, and it hurts us every now and then >> when folks want to use arrays for other use cases. It may probably >> make sense to re-think how the variable engine etc behaves before >> adding more functionality. And make sure that everything works smooth >> in all use cases. While anything else may take care for some use >> cases, I fear we may get too fragmented. At least this is what I >> learned in the past months discussions. >> >> Anyone else? >> >> Rainer >> >> 2015-10-06 17:10 GMT+02:00 singh.janmejay : >>> It is possible to use global-variables (it'll require some >>> enhancements, table-support etc), but it'll be very inefficient >>> compared to this approach. For instance, choice of data-structure etc >>> allows making the solution a lot more efficient. >>> >>> Here its possible to locklessly increment counters in most cases, so >>> its overhead is a lot lesser than global-variables. >>> >>> Recycle is precisely to allow this lockless mechanism to work. Its >>> basically saying, it'll track metric-names he has seen in last 1 hour. >>> If we kill tracking of it as soon as we don't see an increment >>> (between 2 reporting runs of impstats), it'll lead to unnecessary >>> churn when low-values are common or load is not uniform in time. >>> >>> Implementing it on top of global-variables is not only has very high >>> performance-penalty(it'll be prohibitive for high-throughput >>> scenarios), it also exposes too much complexity to the user (where >>> user has to worry about reset etc). >>> >>> I don't plan to have a scheduler in this implementation. >>> GetAllStatsLines call will purge the tree instead of reset at that >>> interval. Its basically a balance between freeing-up memory occupied >>> by stale-metric-names vs. performance (lockless handling of >>> increment). So it will be governed by impstat schedule. May be I >>> should change name to better name (equivalent of >>> purge_known_keys_after_they_have_been_reported_N_times). >>> >>> >>> On Tue, Oct 6, 2015 at 4:30 PM, David Lang wrote: On Tue, 6 Oct 2015, singh.janmejay wrote: > Hi, > > I am working on support for stats with dynamic-name. This comes handy > in situations where metric-name is dependent upon value of a certain > attribute of the message. > > Say, for a central log-aggregation service, its valuable to know
Re: [rsyslog] RFC: dynamic-stats support
2015-10-06 18:36 GMT+02:00 singh.janmejay: > Sure, sounds good. > > The ability to gather stats with dynamic-key is important. I am > willing to even help with a rewrite if at some point we feel its best > implemented in a different way in the light of variable-implementation > changes. going a bit OT here: json-c is fine. But we have begun a bit to abuse it as a general variable system, which it really isn't. So I think we need to revamp the var system completely and that should, among others, address things like concurrency and performance. Once we reach this point, it may (may!) make sense to change some of the inner workings of liblognorm again. I am currently modelling the changes in a way that makes it relatively easy to do so later. The idea is that we could later have a switch to either return json-c objects (which would be generated just before handing the object back to the user) ... or some better representation. I am *not* saying this will necessarily happenm but is a potential route. I focus on liblognorm at the moment. When that is fully completed, I'll look into what I can further improve in json-c without breaking what it really is. Once this is done, it's the right time to look at actual need to replace the rsyslog variable system (I am 90% sure it's required, but that really needs to be seen...). Just to clarify. To the current project: whatever you do, please make sure the doc prominently says "this is an interim solution and it may totally change". I hate to request that (and even more hate it if I need to break it later), but that's the only path I currently see moving forward in a situation where I see a big change is very probable -- but don't have the time to do it right now. I really hate potentially wasting your time. Rainer > > On Tue, Oct 6, 2015 at 9:48 PM, Rainer Gerhards > wrote: >> 2015-10-06 18:04 GMT+02:00 singh.janmejay : >>> Rainer, >>> >>> I see this as something completely outside the scope of variables. >>> Building stats collector over variables is possible, but then we are >>> then talking about a general purpose language which allows building >>> such complex things. This increases the scope of Rainerscript and with >>> larger scope comes complexity. I feel this is in-line with the other >>> Lua discussion where you emphasized that Rainerscript should not >>> become a fully-general-purpose language? >>> >>> Eg. creating an atomic-increment function for variable requires that >>> we educate users about what can and can't be done if atomic-increment >>> function is used anywhere on a variable. What relationship they can >>> expect it to have with other atomic-incrementing variables (which gets >>> into memory model). >> >> Maybe I just feel overwhelemed in the moment with keeping track of >> everything that is going on. How about this: we can merge it BUT flag >> it as experimental. If all works out well, I am free starting early >> next year to have a deep look at the overall design and sticking >> together all those loose edges. I suspect that I would like to change >> a couple of things in the interest of tying it all well together (like >> I currently do in liblognorm). >> >> But if I need to carry all this legacy, that's really a burden (e.g. >> liblognorm now contains the full v1 code as a copy, which means it >> also needs to be somewhat maintained). I want to avoid this. As long >> as we document this as an *interim* solution that is not necessarily >> here to stay and as such "use at your own risk and it will probably >> break next year" I am sufficiently happy with that. We just must be >> aware that things may really break and there is a big chance the >> actually will. And I don't want to hear about potential vuln or >> compatiblity issues or whatever when this code is changed/removed. >> Also keep on your mind that I probably need to totally revamp the >> variable system, as json-c has many problematic parts for our use >> (what I learned when digging deep with liblognorm). So I *know* that >> there are big changes coming up next year! >> >> And, full ack: I want to limit the scope of RainerScript. Arrays was a >> good sample of why it may be a bad idea to go to boldly forward >> without thinking about the big picture. >> >> Rainer >>> >>> >>> >>> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards >>> wrote: I can't fully dig into this, but I think we must *very carefully* evaluate the overall design. Some time ago we introduced arrays for the limited liblognorm use case, and it hurts us every now and then when folks want to use arrays for other use cases. It may probably make sense to re-think how the variable engine etc behaves before adding more functionality. And make sure that everything works smooth in all use cases. While anything else may take care for some use cases, I fear we may get too
Re: [rsyslog] RFC: dynamic-stats support
On Tue, 6 Oct 2015, singh.janmejay wrote: It is possible to use global-variables (it'll require some enhancements, table-support etc), but it'll be very inefficient compared to this approach. For instance, choice of data-structure etc allows making the solution a lot more efficient. As for the data structures, Rainer has been identifying inefficencies in how json-c works and working to improve them Here its possible to locklessly increment counters in most cases, so its overhead is a lot lesser than global-variables. how can you manage counters in multiple threads without locks? Especially when dealing with batches. Recycle is precisely to allow this lockless mechanism to work. Its basically saying, it'll track metric-names he has seen in last 1 hour. If we kill tracking of it as soon as we don't see an increment (between 2 reporting runs of impstats), it'll lead to unnecessary churn when low-values are common or load is not uniform in time. that depends on the cost of initializing a metric vs the cost of tracking the recycle mechanism. Implementing it on top of global-variables is not only has very high performance-penalty(it'll be prohibitive for high-throughput scenarios), it also exposes too much complexity to the user (where user has to worry about reset etc). I don't plan to have a scheduler in this implementation. GetAllStatsLines call will purge the tree instead of reset at that interval. Its basically a balance between freeing-up memory occupied by stale-metric-names vs. performance (lockless handling of increment). So it will be governed by impstat schedule. May be I should change name to better name (equivalent of purge_known_keys_after_they_have_been_reported_N_times). if this is just adding additional metrics to the impstats output that eliminates the schedular/reset issue. I think we should have a metric configuration be fairly static, allow configuring custom metrics and add to them, but don't use data from the message as part of the name of the metric, and continue reporting them forever, even if they are 0 (so no need to 'recycle' names) David Lang ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss
Thank you both David and Rainer - I really appreciate the detailed responses and it's already helping put my box in a better state. I used the impstats analyzer and started increasing dynaFileCacheSize for the actions that were problematic. CPU usage increased by about 2.5x and context switches went up to 800k for about 15 minutes after I restarted rsyslogd but seems to have settled back in around 60-70% and around 100k context switches now. Top showed three in:imtcp threads as the busiest during that time. Cacti stats show about half the number of disk operations as before. So it seems to be a general performance gain, but it did not affect message loss, however, as I can already see that some messages have disappeared. Regarding queues, some of my rules in each ruleset do forward messages to another host (a SIEM) over TCP. Should I keep the queue to support that, or is TCP good enough? RH support suggests turning of DNS lookups. Would that have any affect? We've come to rely on hostname for creating directories for each server. Dan Woodruff -Original Message- From: rsyslog-boun...@lists.adiscon.com [mailto:rsyslog-boun...@lists.adiscon.com] On Behalf Of Rainer Gerhards Sent: Tuesday, October 6, 2015 9:47 AM To: rsyslog-usersSubject: Re: [rsyslog] RHEL7.1 / rsyslog 8.x random message loss There are a couple of those: LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683 missed=91303303 evicted=91292586 maxused=10 closetimeouts=0 have a look at the evicted count! This totally kills performance. Each time one entry needs to be evicted, the file is closed, and a new one is opened. You should set the cache size for this action to match roughly the number of files that you want to keep open during normal operations (e.g. if you have 1000 hosts and include hour and hostname in the dynafile name, the cache size should be around 1000, except if you have some low talkers. Also note the online tool at https://urldefense.proofpoint.com/v2/url?u=http-3A__www.rsyslog.com_impstats -2Danalyzer_=BQICAg=kbmfwr1Yojg42sGEpaQh5ofMHBeTl9EI2eaqQZhHbOU=0S5vJ8 -FmQy6Qk5D6_T4U9EYbkCcMc4ijDuyUem89Lk=zfcoEMd6LlXQ7QdIO07SYeKV1XF06xf9-ncR plNjcN8=sYq7lASJR370QFGO7qlbIFkrmueH01xv1J9reZc6L2Q= It probably will spit out a couple of warnings if you submit the pstats. HTH Rainer 2015-10-06 15:29 GMT+02:00 Woodruff, Dan : > Rainer, thank you. Here is a full extract of a recent impstats run: > > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: imuxsock: origin=imuxsock > submitted=282984 ratelimit.discarded=0 ratelimit.numratelimiters=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache StatsFile: > origin=omfile requests=326648 level0=325603 missed=1045 evicted=0 maxused=1 > closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 1: origin=core.action > processed=326648 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 2: origin=core.action > processed=3255090916 failed=0 suspended=2 suspended.duration=240 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LDAPFile: > origin=omfile requests=3171530030 level0=3171529033 missed=997 evicted=0 > maxused=1 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 3: origin=core.action > processed=3171530030 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > LinuxIPTablesFile: origin=omfile requests=138877957 level0=19864683 > missed=91303303 evicted=91292586 maxused=10 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 4: origin=core.action > processed=138877957 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache LinuxUFWFile: > origin=omfile requests=0 level0=0 missed=0 evicted=0 maxused=0 > closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 5: origin=core.action > processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache DHCPFile: > origin=omfile requests=3518037 level0=3516992 missed=1045 evicted=0 > maxused=1 closetimeouts=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 6: origin=core.action > processed=3518037 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > LinuxBlankProgramFile: origin=omfile requests=93283 level0=28347 missed=1932 > evicted=15 maxused=10 closetimeouts=303 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: action 7: origin=core.action > processed=93283 failed=0 suspended=0 suspended.duration=0 resumed=0 > Oct 6 08:27:07 its-syslog-up1 rsyslogd-pstats: dynafile cache > LinuxProgramFile: origin=omfile requests=358821719 level0=245218530 > missed=23914416 evicted=23903728 maxused=10 closetimeouts=0 > Oct 6 08:27:07
Re: [rsyslog] RFC: dynamic-stats support
It is possible to use global-variables (it'll require some enhancements, table-support etc), but it'll be very inefficient compared to this approach. For instance, choice of data-structure etc allows making the solution a lot more efficient. Here its possible to locklessly increment counters in most cases, so its overhead is a lot lesser than global-variables. Recycle is precisely to allow this lockless mechanism to work. Its basically saying, it'll track metric-names he has seen in last 1 hour. If we kill tracking of it as soon as we don't see an increment (between 2 reporting runs of impstats), it'll lead to unnecessary churn when low-values are common or load is not uniform in time. Implementing it on top of global-variables is not only has very high performance-penalty(it'll be prohibitive for high-throughput scenarios), it also exposes too much complexity to the user (where user has to worry about reset etc). I don't plan to have a scheduler in this implementation. GetAllStatsLines call will purge the tree instead of reset at that interval. Its basically a balance between freeing-up memory occupied by stale-metric-names vs. performance (lockless handling of increment). So it will be governed by impstat schedule. May be I should change name to better name (equivalent of purge_known_keys_after_they_have_been_reported_N_times). On Tue, Oct 6, 2015 at 4:30 PM, David Langwrote: > On Tue, 6 Oct 2015, singh.janmejay wrote: > >> Hi, >> >> I am working on support for stats with dynamic-name. This comes handy >> in situations where metric-name is dependent upon value of a certain >> attribute of the message. >> >> Say, for a central log-aggregation service, its valuable to know what >> is inbound message-count distribution across application-clusters that >> send logs to it, or for a shared-server, its valuable to know what is >> the log-volume generation across users etc. >> >> Im thinking of using functions-like interface to support this. It may >> look similar to this: >> >> >> dyn_stats("user_msg_count") >> >> ... >> >> ruleset(...) { >> ... >> dyn_inc("user_msg_count", $.user) >> ... >> } >> >> >> dyn_stats signature looks like: >> dyn_stats(, , > default=10k>, ) >> >> dyn_inc signature looks like: >> dyn_inc(, ) >> >> >> Reporting would work similar to static-metric via impstats. Mapping: >> statsobj_s.name = name_space >> statsobj_s.origin = "dyn" >> ctr_s.name = "foo" (say $.user had value foo) >> >> >> Thoughts / suggestions? > > > how is this different/better than global variables? (although we may need to > implement soem functions, atomic inc/dec copy+clear) If you have pstats > output in json format, you can even piggyback on it's schedule to output the > data. > > > things like stats can very easily end up being expensive in terms of locking > (something global variables already have figured out), and it sounds like > you are proposing adding a scheduler of some sort to output the data. > > variables should not need to be 'recycled', either they contain data or they > don't. If they contain data, you need to keep the data until you do > something with it, if they don't, you don't have to track them. > > > I am actually doing this sort of thing external to rsyslog in SEC > > I have a template in rsyslog that contains hostname, fromhost-ip, > programname and I output it via improg to SEC. SEC accumulates counters and > has scheduled outputs to files. > > before I started using SEC for this, I used the same template to output to a > file and then for reports, used cut + sort + uniq -c to extract the data I > need. When the files only contain the significant data, this is actually not > bad to do, even at higher volumes. > > David Lang > ___ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T > LIKE THAT. -- Regards, Janmejay http://codehunk.wordpress.com ___ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.