Re: [rsyslog] Would imhiredis make sense?

2016-11-23 Thread Rainer Gerhards
Side-note: I agree with mostolog on the advantages of componentization for
fault isolation. Just another use case...

Rainer

Sent from phone, thus brief.

On 23.11.2016 14:47, "David Lang" wrote:

> On Wed, 23 Nov 2016, mosto...@gmail.com wrote:
>
>>> However, if you really want to go this way, one thing you can do is to
>>> make use of the multicast MAC feature in Ethernet to distribute the same
>>> logs to multiple systems/containers and have each container throw away all
>>> logs except what it's configured to handle.
>>>
>>> This lets you add/remove log processing at any time and even have
>>> multiple systems processing the same logs in different ways
>>>
>>> https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david
>>>
>> Network traffic x2
>> Actually, we are using a similar environment for other things, but I
>> don't think that's the way to go.
>>
>
> This doesn't need to double the network traffic in the way you are
> thinking. The IP address that the senders deliver to is shared across all
> your processing boxes. The switch replicates the traffic on its backbone
> and delivers it to each machine.
>
> With your current approach you do
>
> sender -> rsyslog -> redis -> logstash -> ES
>
> so there are 3-4 copies of the logs (depending on whether the sender and
> rsyslog are the same box).
>
> If instead you did
>
> sender -> multicast MAC to rsyslog -> ES
>
> there would only be two copies of the logs on the wire at any point
> (although there are N copies in total going into the rsyslog boxes, that's
> only on the interfaces to those boxes).
>
> David Lang
> ___
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>


Re: [rsyslog] Would imhiredis make sense?

2016-11-23 Thread David Lang

On Wed, 23 Nov 2016, mosto...@gmail.com wrote:

>> However, if you really want to go this way, one thing you can do is to make
>> use of the multicast MAC feature in Ethernet to distribute the same logs to
>> multiple systems/containers and have each container throw away all logs
>> except what it's configured to handle.
>>
>> This lets you add/remove log processing at any time and even have multiple
>> systems processing the same logs in different ways.
>>
>> https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david
>
> Network traffic x2
> Actually, we are using a similar environment for other things, but I don't
> think that's the way to go.


This doesn't need to double the network traffic in the way you are thinking. The
IP address that the senders deliver to is shared across all your processing
boxes. The switch replicates the traffic on its backbone and delivers it to
each machine.


With your current approach you do

sender -> rsyslog -> redis -> logstash -> ES

so there are 3-4 copies of the logs (depending on whether the sender and rsyslog
are the same box).


If instead you did

sender -> multicast MAC to rsyslog -> ES

there would only be two copies of the logs on the wire at any point (although
there are N copies in total going into the rsyslog boxes, that's only on the
interfaces to those boxes).
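
The copy counts above can be checked with a rough sketch. This is only a minimal Python model: the host names (h1..h5) and the copies_on_wire helper are made up for illustration, and it assumes that a hop between two stages on the same box never touches the network.

```python
def copies_on_wire(pipeline):
    """Count how many times a log message crosses the network.

    `pipeline` is a list of (stage, host) pairs. A hop between two
    consecutive stages on the same host is assumed to stay off the wire.
    """
    return sum(
        1
        for (_, src_host), (_, dst_host) in zip(pipeline, pipeline[1:])
        if src_host != dst_host
    )


# sender -> rsyslog -> redis -> logstash -> ES, every stage on its own box
current_separate = [
    ("sender", "h1"), ("rsyslog", "h2"), ("redis", "h3"),
    ("logstash", "h4"), ("ES", "h5"),
]
# same pipeline, but sender and rsyslog share a box
current_shared = [
    ("sender", "h1"), ("rsyslog", "h1"), ("redis", "h3"),
    ("logstash", "h4"), ("ES", "h5"),
]
# sender -> multicast MAC to rsyslog -> ES
multicast = [("sender", "h1"), ("rsyslog", "h2"), ("ES", "h3")]

print(copies_on_wire(current_separate))  # 4
print(copies_on_wire(current_shared))    # 3
print(copies_on_wire(multicast))         # 2
```

The model does not count the N copies the switch delivers to the individual rsyslog boxes, since those only appear on the last link to each box.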


David Lang


Re: [rsyslog] Would imhiredis make sense?

2016-11-23 Thread mosto...@gmail.com


>>> Logstash needs something like redis because it can't do any queueing
>>> itself. Rsyslog is built around queues, and has the ability to
>>> create multiple queues and pipelines internally, so you don't need to
>>> run multiple instances.
>>
>> I want multiple instances in order to:
>>
>> * Be able to process pipelines on different containers/hosts
>
> Much less needed on rsyslog due to the higher efficiency. I've had
> rsyslog handling over a hundred thousand logs/sec on a single host.


This is our current scenario (each element deployed within a docker 
container):


   logs-->RELP-->rsyslog-->redis-->logstash_app_1/N...


This allows us to have multiple simpler configurations for logstash,
splitting traffic between multiple workers/containers on different
hosts, high availability, load balancing...





>> * Isolate pipelines to prevent problems on one affecting others
>
> Rulesets with queues on each ruleset solve this for you.

One segfault while processing one ruleset/action (actually, it happened
a lot with 8.22) crashes the whole process.




>>> All processing from that point on will take place in different
>>> threads working on different queues for each category.
>>
>> Will I be able to "reload" rsyslog configuration to add/delete new
>> rulesets/pipelines?
>
> You can stop/start rsyslog, but there is no way to change the
> config on the fly.

:(

> However, if you really want to go this way, one thing you can do is to
> make use of the multicast MAC feature in Ethernet to distribute the
> same logs to multiple systems/containers and have each container throw
> away all logs except what it's configured to handle.
>
> This lets you add/remove log processing at any time and even have
> multiple systems processing the same logs in different ways.
>
> https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david


Network traffic x2
Actually, we are using a similar environment for other things, but I 
don't think that's the way to go.


> KISS, start simple and only add complexity when you find it's actually
> needed. Have plans for how to scale out when you hit limits, but you
> usually find that you hit limits far later than expected. Yes, you may
> have to eventually do the same work, but by having a solid system now
> with less work, you can spend the time saved now to improve other things.

KISS is great, but we are looking to build a dynamic pipeline, and we
found rsyslog is close to being the right tool, with a couple of changes!



Somewhat related to Rainer's new file reader proposal: I think an
rsyslog code review/refactor would help with this.



Re: [rsyslog] Would imhiredis make sense?

2016-11-22 Thread David Lang

On Tue, 22 Nov 2016, mosto...@gmail.com wrote:

>> What sort of log volume are you talking about here? (logs/sec type of
>> thing)
>
> From 0 to thousand-thousands/sec
>
>> Logstash needs something like redis because it can't do any queueing
>> itself. Rsyslog is built around queues, and has the ability to create
>> multiple queues and pipelines internally, so you don't need to run multiple
>> instances.
>
> I want multiple instances in order to:
>
> * Be able to process pipelines on different containers/hosts

Much less needed on rsyslog due to the higher efficiency. I've had rsyslog
handling over a hundred thousand logs/sec on a single host.



> * Isolate pipelines to prevent problems on one affecting others

Rulesets with queues on each ruleset solve this for you.

> * (others)

That's hard to answer :-)



>> What you would do is create a ruleset for each application (pipeline) and
>> give that ruleset its own queue.
>
> I know it can be done, but it's not what I'm looking for. Moreover, I would
> love it to be a "dynamic" configuration.
>
>> As new logs arrive, you then sort them by application, and for each
>> application (or application category), you call the appropriate ruleset.
>
> And, if there are a lot of evt/sec, you may have a bottleneck. I'll probably
> have an rsyslog cluster based on docker swarm mode.


This is unlikely to be a bottleneck. The overhead of receiving a log message,
parsing it, and looking up what ruleset to call is very low. At anything under
several hundred thousand logs/sec it's unlikely to max out a single core.


>> All processing from that point on will take place in different threads
>> working on different queues for each category.
>
> Will I be able to "reload" rsyslog configuration to add/delete new
> rulesets/pipelines?

You can stop/start rsyslog, but there is no way to change the config on the
fly.


However, if you really want to go this way, one thing you can do is to make use
of the multicast MAC feature in Ethernet to distribute the same logs to multiple
systems/containers and have each container throw away all logs except what it's
configured to handle.


This lets you add/remove log processing at any time and even have multiple
systems processing the same logs in different ways.


https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david
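
To illustrate the "throw away everything except what it's configured to handle" part, here is a minimal Python sketch. It is not rsyslog configuration: the record layout (a dict with an "app" key), the make_filter helper and the container names are assumptions made up for the example. Every container sees the same stream and keeps only its own share.

```python
def make_filter(handled_apps):
    """Build a predicate that keeps only records for the configured apps."""
    handled = frozenset(handled_apps)

    def keep(record):
        # Anything this container is not configured for is discarded.
        return record.get("app") in handled

    return keep


# The same stream is delivered to every container (hypothetical records).
stream = [
    {"app": "app1", "msg": "a"},
    {"app": "app2", "msg": "b"},
    {"app": "app1", "msg": "c"},
]

container_a = make_filter({"app1"})
container_b = make_filter({"app2"})

kept_a = [r["msg"] for r in stream if container_a(r)]
kept_b = [r["msg"] for r in stream if container_b(r)]
print(kept_a)  # ['a', 'c']
print(kept_b)  # ['b']
```

Adding or removing a processor only means starting or stopping one more receiver with its own filter; the senders and the other receivers are not touched.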


>> Give it a try, I'll bet that you find the result much simpler and faster.
>
> I'm expecting your reply ;)


KISS, start simple and only add complexity when you find it's actually needed. 
Have plans for how to scale out when you hit limits, but you usually find that 
you hit limits far later than expected. Yes, you may have to eventually do the 
same work, but by having a solid system now with less work, you can spend the 
time saved now to improve other things.


David Lang


Re: [rsyslog] Would imhiredis make sense?

2016-11-22 Thread mosto...@gmail.com


> What sort of log volume are you talking about here? (logs/sec type of
> thing)

From 0 to thousand-thousands/sec

> Logstash needs something like redis because it can't do any queueing
> itself. Rsyslog is built around queues, and has the ability to create
> multiple queues and pipelines internally, so you don't need to run
> multiple instances.

I want multiple instances in order to:

 * Be able to process pipelines on different containers/hosts
 * Isolate pipelines to prevent problems on one affecting others
 * (others)


> What you would do is create a ruleset for each application (pipeline)
> and give that ruleset its own queue.

I know it can be done, but it's not what I'm looking for. Moreover, I would
love it to be a "dynamic" configuration.


> As new logs arrive, you then sort them by application, and for each
> application (or application category), you call the appropriate ruleset.

And, if there are a lot of evt/sec, you may have a bottleneck. I'll
probably have an rsyslog cluster based on docker swarm mode.


> All processing from that point on will take place in different threads
> working on different queues for each category.

Will I be able to "reload" rsyslog configuration to add/delete new
rulesets/pipelines?



> Give it a try, I'll bet that you find the result much simpler and faster.

I'm expecting your reply ;)



Re: [rsyslog] Would imhiredis make sense?

2016-11-22 Thread David Lang

On Tue, 22 Nov 2016, mosto...@gmail.com wrote:

> We've been playing with logstash, rsyslog and redis for a while in order to
> *index into elasticsearch a bunch of application logs*. Briefly:
> app1-file1.log, app1-file2.log...appN-fileX.log -> pipeline -> elasticsearch.
>
> So far, we are using *redis queues, and _each application's_ processing is
> done by one logstash instance* (docker container). Of course, this works with
> 5-10 applications, but it doesn't when you plan to deploy 100 apps, because
> each logstash instance requires ~512MB of RAM.
>
> We've been thinking about rsyslog since the beginning, because it takes less
> RAM, but just noticed it doesn't have a *redis input module (aka: imhiredis)*
>
> We still plan to have independent instances (one rsyslog for each
> application), but we're wondering whether you think it makes sense to
> implement this module.


What sort of log volume are you talking about here? (logs/sec type of thing)

Logstash needs something like redis because it can't do any queueing itself.
Rsyslog is built around queues, and has the ability to create multiple queues
and pipelines internally, so you don't need to run multiple instances.


What you would do is create a ruleset for each application (pipeline) and give
that ruleset its own queue.


As new logs arrive, you then sort them by application, and for each application 
(or application category), you call the appropriate ruleset. All processing from 
that point on will take place in different threads working on different queues 
for each category.
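
As a conceptual model of that layout (one queue and one worker thread per application, with a cheap dispatch step in front), here is a minimal Python sketch. It is not rsyslog configuration and says nothing about rsyslog's internals: the run_pipelines helper, the record layout and the handler functions are all hypothetical.

```python
import queue
import threading


def run_pipelines(records, handlers):
    """Route each record to a per-application queue, one worker per queue.

    `handlers` maps an application name to the function that processes
    its records. Records for unknown applications are dropped.
    """
    queues = {app: queue.Queue() for app in handlers}
    results = {app: [] for app in handlers}

    def worker(app):
        # Each worker only ever touches its own queue and result list.
        while True:
            record = queues[app].get()
            if record is None:  # sentinel: no more work for this pipeline
                break
            results[app].append(handlers[app](record))

    threads = [threading.Thread(target=worker, args=(app,)) for app in handlers]
    for t in threads:
        t.start()

    # Dispatch step: look up the application and hand the record over.
    for record in records:
        q = queues.get(record["app"])
        if q is not None:
            q.put(record)

    for q in queues.values():
        q.put(None)
    for t in threads:
        t.join()
    return results


records = [
    {"app": "app1", "msg": "login ok"},
    {"app": "app2", "msg": "disk full"},
    {"app": "app1", "msg": "logout"},
]
handlers = {
    "app1": lambda r: r["msg"].upper(),
    "app2": lambda r: len(r["msg"]),
}
out = run_pipelines(records, handlers)
print(out)
```

The dispatch loop does nothing but a dictionary lookup and a queue put, which is why the sorting step stays cheap compared with the per-application processing behind each queue.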


Give it a try, I'll bet that you find the result much simpler and faster.

David Lang


[rsyslog] Would imhiredis make sense?

2016-11-22 Thread mosto...@gmail.com

Hi


We've been playing with logstash, rsyslog and redis for a while in order 
to *index into elasticsearch a bunch of application logs*. Briefly: 
app1-file1.log, app1-file2.log...appN-fileX.log -> pipeline -> 
elasticsearch.


So far, we are using *redis queues, and _each application's_ processing is
done by one logstash instance* (docker container). Of course, this works
with 5-10 applications, but it doesn't when you plan to deploy 100 apps,
because each logstash instance requires ~512MB of RAM.
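
For scale, a quick back-of-the-envelope check of that RAM figure. This is a minimal Python sketch: it assumes the ~512MB per logstash instance mentioned above, treats MB as MiB, and ignores everything else running on the hosts. The helper name is made up.

```python
def logstash_ram_gib(instances, mib_per_instance=512):
    """Rough total RAM in GiB for `instances` logstash containers."""
    return instances * mib_per_instance / 1024


small = logstash_ram_gib(10)   # upper end of the 5-10 application case
large = logstash_ram_gib(100)  # the planned 100 applications
print(small, large)  # 5.0 50.0
```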


We've been thinking about rsyslog since the beginning, because it takes
less RAM, but just noticed it doesn't have a *redis input module (aka:
imhiredis)*


We still plan to have independent instances (one rsyslog for each
application), but we're wondering whether you think it makes sense to
implement this module.


Regards
