On Wed, 3 Mar 2021, odrzen wrote:

Yes you understand my problem.
And I understand that in the end, the problem is mainly in the central machine 
where it receives all these messages.

From the remote machines, I have (probably) correctly defined the way they send 
their logs to the central machine. Now this part is very easy and cool. Indeed, 
all the messages come to the main/central machine very nicely and in real time.
But there are really a lot of messages per second. At the moment I don't seem to
have a problem, but in the future I may run into bottlenecks or something
else.

What is 'too many per second'? I've handled several hundred thousand messages per second in production settings, and others have measured (with simple configs) over a million messages per second in tests.

How can I make this more efficient so that I don't have problems in the future ?

I'd have to see the full config to begin to guess where your bottlenecks are.


Your idea about "facility" and "severity" is good, but I also realize that it
is not the best and most effective.

Using the syslogtag seems to me a very interesting idea, as does the way you described
with the "json" messages. But first, I want to ask something.
Now after the settings I did thanks to your help, I receive the messages as 
follows :

```
2021-03-04T00:13:14+02:00 example.com  apache: 192.168.1.1 - - [04/Mar/2021:00:13:14 +0200] "GET / HTTP/1.1" 301 237 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64
2021-03-04T00:14:20+02:00 example.com  apache: 192.168.1.2 - - [04/Mar/2021:00:14:20 +0200] "GET / HTTP/1.1" 302 - "-" "Mozilla/5.0 (X11; Linux x86_64; rv:78.
2021-03-04T00:14:20+02:00 example.com  apache: 192.168.1.3 - - [04/Mar/2021:00:14:20 +0200] "GET /post/ HTTP/1.1" 200 3877 "-" "Mozilla/5.0 (X11; Linux x86_64;
```

Is it not safe to try to push all messages containing the word "apache" into a separate
"ruleset"?
Although I guess this word may appear by chance in a completely unrelated
message, so then there would be a problem, right?

apache: is the syslogtag, so if you check for it only in that property, you don't have to worry about it appearing elsewhere.

Write some logs out with the template RSYSLOG_DebugFormat and it will show you how rsyslog has already parsed the message and what variables you have available to work with.
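
A minimal sketch of that suggestion, meant as a temporary debugging aid on the central machine. The output path is an assumption, not something given in this thread; any writable file will do.

```
# Temporary debugging action: writes every message together with the
# properties rsyslog parsed from it (syslogtag, programname, msg, ...).
# The file path below is a placeholder chosen for illustration.
action(type="omfile"
       file="/var/log/rsyslog-debug.log"
       template="RSYSLOG_DebugFormat")
```

Remove the action again once you have seen which properties are populated, since this format is verbose.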

So before I try the solution of converting the messages to "JSON" format,
I don't understand how, on the central rsyslog, I would define the following:
1. don't manage/handle the messages with the "x" tag at all.
2. the messages with the tag "x" will be managed/handled by this ruleset (sub
process)

if $syslogtag == "apache:" then {
  # one set of rules (which could be a call to a ruleset)
} else {
  # another set of rules that is evaluated if and only if the syslogtag is not "apache:"
}
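
Filled in with concrete actions, that skeleton might look like the sketch below. The ruleset name and the file paths are made up for illustration. If the sender ever adds a PID to the tag (for example "apache[1234]:"), comparing `$programname` against "apache" may be the more robust check.

```
# Hypothetical ruleset that handles only the apache messages.
ruleset(name="apache_logs") {
    action(type="omfile" file="/var/log/central/apache.log")
}

if $syslogtag == "apache:" then {
    # Hand the message to the dedicated ruleset.
    call apache_logs
} else {
    # Everything else follows the normal rules.
    action(type="omfile" file="/var/log/central/other.log")
}
```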

David Lang



‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, March 3, 2021 6:24 AM, David Lang <[email protected]> wrote:

rulesets only apply inside the instance of rsyslog that is running them. Once
you send the logs to a new machine, you now have a separate problem: how will
you identify the logs you want to treat differently?

If they are arriving via the same port as other logs, this can be difficult. I
deal with this on my systems by having the sending machine send the logs in a
json format that can include additional metadata (like what file the log came
from) while still being able to easily recreate the original log message.

If I am understanding your problem correctly it seems like what you want is:

you have files on machine A that you want to move to machine B in real-time

This can be done without defining any rulesets, but it will require you create
some way to identify the logs.

since you are reading files from disk that do not have any facility or severity
as part of the message being read, you could use those to encode what is what
(local0 is file1, local1 is file2, using severity is possible, but more
difficult), this gives you up to 64 combinations to work with, but is a pain to
keep straight.
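
A rough sketch of the facility-based approach, assuming two hypothetical files and the imfile module on the sending side. All paths and tags here are placeholders.

```
# Sending machine: give each file its own facility.
module(load="imfile")
input(type="imfile" File="/var/log/app/file1.log" Tag="file1:" Facility="local0")
input(type="imfile" File="/var/log/app/file2.log" Tag="file2:" Facility="local1")

# Receiving machine: decide by facility which file the message came from.
if $syslogfacility-text == "local0" then {
    action(type="omfile" file="/var/log/central/file1.log")
} else if $syslogfacility-text == "local1" then {
    action(type="omfile" file="/var/log/central/file2.log")
}
```

The mapping between facility and file exists only in these two configs, which is the "pain to keep straight" part.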

another option would be to use the syslogtag field, and then just know on the
far side that if you receive one of the special syslogtag values, you need to
format the write to disk without using that field. the syslogtag field cannot
include a / and is limited to 32 characters, so you do have some limitations
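
One way this could look, as a sketch under assumptions: the file path, tag value, and template name are invented for the example, and the receiver is assumed to get the message in the traditional syslog format.

```
# Sending machine: the tag marks which file the line came from.
module(load="imfile")
input(type="imfile" File="/var/log/apache2/access.log" Tag="apache:")

# Receiving machine: write only the message text, leaving the tag out.
# %msg:2:$% skips the leading space that usually precedes the message.
template(name="msg_only" type="string" string="%msg:2:$%\n")

if $syslogtag == "apache:" then {
    action(type="omfile" file="/var/log/central/apache-access.log" template="msg_only")
    stop
}
```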

or you could make a custom format for your output that puts the file path as the
first thing after the syslogtag and then parse it out on the receiving side.

I go a step further in making a more complex, but more flexible solution where I
create a json message to send that has a field 'msg' that is the original
message, and a tree of objects 'trusted-<company abbrev>' that I have contain
the metadata that I want to add. This includes the filename if it's read from a
file, the name and timestamps related to any relays that it goes through, what
environment this is from (dev/qa/prod/etc) for cases where people like to re-use
names, and anything else that comes up in the future. On the receiving side,
it's a json message that gets parsed, then I look at the data in $!trusted-foo!*
and can make decisions on what to do at that point.
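
The JSON approach could be sketched roughly as follows. This is not the actual config described above; the names `trusted-foo`, `environment`, `json_fwd`, `orig_msg`, the host name, and the paths are all assumptions, and the example relies on mmjsonparse's default "@cee:" cookie.

```
# Sending machine: wrap the original message plus metadata in JSON.
module(load="imfile")
input(type="imfile" File="/var/log/apache2/access.log" Tag="apache:" addMetadata="on")

template(name="json_fwd" type="list") {
    constant(value="<")
    property(name="pri")
    constant(value=">")
    property(name="timereported" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    constant(value=" @cee:{")
    property(name="msg" outname="msg" format="jsonf")
    constant(value=",\"trusted-foo\":{\"filename\":\"")
    property(name="$!metadata!filename" format="json")
    constant(value="\",\"environment\":\"prod\"}}\n")
}
action(type="omfwd" target="central.example.net" port="514" protocol="tcp"
       template="json_fwd")

# Receiving machine: parse the JSON, then decide based on the metadata.
module(load="mmjsonparse")
template(name="orig_msg" type="string" string="%$!msg%\n")

action(type="mmjsonparse")
if $parsesuccess == "OK" and $!trusted-foo!environment == "prod" then {
    # $!msg holds the original log line, so it can be written back unchanged.
    action(type="omfile" file="/var/log/central/prod.log" template="orig_msg")
}
```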

David Lang

On Tue, 2 Mar 2021, odrzen wrote:

Date: Tue, 02 Mar 2021 22:57:37 +0000
From: odrzen [email protected]
To: David Lang [email protected]
Cc: rsyslog-users [email protected]
Subject: Re: [rsyslog] The right way to include more log files?
So, as I understand it - after your very good explanation, it's very important 
to define a `ruleset` in case we want rsyslog to handle/manage additional logs.
And it needs a new `ruleset`, with a specific `action`, `template`, `queue` and
`Target`.
In your opinion, if a service writes its own logs to separate log files and 
rsyslog handles these logs by default, if this service generates a lot of logs, 
would you also still create a separate `ruleset` for it?
On the side of the machine in which I have defined in the way you describe 
which additional log files the rsyslog handles ( with its own `ruleset` ), I 
see that I actually have more information about the messages from these logs 
using the `impstats` module (at the moment I don't know how else I can get more 
information about them).
For example:

    Wed Mar  3 00:30:33 2021: global: origin=dynstats msg_per_host.ops_overflow=0 msg_per_host.new_metric_add=0 msg_per_host.no_metric=0 msg_per_host.metrics_purged=0 msg_per_host.ops_ignored=0 msg_per_host.purge_triggered=107
    Wed Mar  3 00:30:33 2021: imuxsock: origin=imuxsock submitted=0 ratelimit.discarded=0 ratelimit.numratelimiters=0
    Wed Mar  3 00:30:33 2021: action 0: origin=core.action processed=55295 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 1: origin=core.action processed=4511 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 2: origin=core.action processed=49706 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 3: origin=core.action processed=15 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 4: origin=core.action processed=1063 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 5: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 6: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0
    Wed Mar  3 00:30:33 2021: action 7: origin=core.action processed=0 failed=0 suspended=0 suspended.duration=0 resumed=0

    Wed Mar  3 00:30:33 2021: msg_per_host: origin=dynstats.bucket

    Wed Mar  3 00:30:33 2021: apache: origin=core.action processed=6405 failed=0 suspended=0 suspended.duration=0 resumed=0

    Wed Mar  3 00:30:33 2021: resource-usage: origin=impstats utime=25597640 stime=23465292 maxrss=17348 minflt=18258 majflt=0 inblock=656 oublock=125552 nvcsw=522977 nivcsw=115

    Wed Mar  3 00:30:33 2021: apache queue[DA]: origin=core.queue size=0 enqueued=0 full=0 discarded.full=0 discarded.nf=0 maxqsize=0
    Wed Mar  3 00:30:33 2021: apache queue: origin=core.queue size=0 enqueued=6405 full=0 discarded.full=0 discarded.nf=0 maxqsize=3

    Wed Mar  3 00:30:33 2021: main Q: origin=core.queue size=0 enqueued=61700 full=0 discarded.full=0 discarded.nf=0 maxqsize=10
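
For reference, statistics like these can be produced with a configuration along the following lines. This is only a sketch: the interval, file path, target host, and action name are assumptions, not the settings actually used here.

```
# Emit counters every 60 seconds into a dedicated file instead of
# feeding them back into the normal log flow.
module(load="impstats"
       interval="60"
       severity="7"
       log.syslog="off"
       log.file="/var/log/rsyslog-stats.log")

# Giving an action a name makes its counters appear under that name
# rather than under a generic "action N" label.
action(name="apache" type="omfwd"
       target="central.example.net" port="514" protocol="tcp")
```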


But now, from the side of the central machine to which I send the logs, can or should I 
set a separate "ruleset" for these messages ?
So that I can be sure that the messages were successfully processed and stored 
on the central machine as well ?
Thank you very much for the explanations and your time.
The way you describe them is very nice and simple. You helped me a lot to better 
understand why we need "rulesets".
Sorry if I should have figured this out from some page of the documentation, but I
didn't see it described that way.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, February 25, 2021 12:33 AM, David Lang [email protected] wrote:

On Wed, 24 Feb 2021, odrzen wrote:

I am mostly trying to understand what the right way is, and why, to use
rulesets, actions or queues, especially in this case of sending particular log
files to a central rsyslog.

There are a few reasons to use a ruleset:

1.  when you have an input (say a network port) that is very different from other
    inputs and you only want to have a subset of the rules processed for logs that
    arrive on this input

2.  a variant of #1: if you want to make sure that logs arriving from one input
    cannot be blocked if the queue builds up processing other inputs, you configure
    as #1 and add a queue to the ruleset

3.  if you want to put a queue on a group of actions, say sending to one of a
    couple different destinations (failover). If you put a queue on each action, it
    will 'succeed' by putting the message in the queue, even if it's not sent. But
    you can put a queue on the ruleset to buffer things at that level, then have
    actions that don't have a queue and can fail (which you can detect)

4.  avoiding duplicate writers to one destination. If you are writing to the same
    file/sending to the same remote machine and have 10 different actions in your
    rules that all have the same output, they will all be trying to output at the
    same time (opening multiple connections to remote systems). If you put the
    action in a ruleset and call it from all of those places, you only have one
    connection

5.  making the config easier to understand. Just like functions in programming
    languages, it may be easier to understand a config file that calls rulesets that
    hide the details rather than having all the statements inline.

David Lang
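
A few of these reasons can be illustrated with a short sketch. Ruleset names, ports, hosts, and file paths below are hypothetical, chosen only for the example.

```
module(load="imtcp")

# Reasons 1 and 2: an input bound to its own ruleset, with its own queue,
# so a backlog elsewhere cannot block logs arriving on this port.
ruleset(name="remote_in" queue.type="LinkedList" queue.size="100000") {
    action(type="omfile" file="/var/log/remote.log")
}
input(type="imtcp" port="10514" ruleset="remote_in")

# Reason 3: one queue on the ruleset, no queues on the actions, so a
# failed send is detected and the backup target takes over.
ruleset(name="forward_failover" queue.type="LinkedList") {
    action(type="omfwd" target="primary.example.net" port="514" protocol="tcp")
    action(type="omfwd" target="backup.example.net" port="514" protocol="tcp"
           action.execOnlyWhenPreviousIsSuspended="on")
}

# Reasons 4 and 5: several rules share the same output by calling the
# ruleset instead of repeating the forwarding action inline.
if $syslogtag == "apache:" then {
    call forward_failover
}
if $syslogfacility-text == "local0" then {
    call forward_failover
}
```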




