On Fri, 23 Jan 2015, David Lang wrote:

On Sat, 24 Jan 2015, singh.janmejay wrote:

That is alright. I wanted to float the idea and get an agreement. I'll be
happy to implement it, if we see value in it.

Let's back up here and talk about the problem rather than getting bogged down in the solution.

Rsyslog already has the ability to relay a message from one machine to another reliably. The problem you are trying to address is what happens if the box dies after the receiving machine acks the message, correct?

Currently we can (at a performance cost) configure the receiving machine to ack the message only after it's safe on redundant disks, but in the worst case the message still won't be delivered until after the machine is repaired.

You are looking for a way to deal with this problem.

Is this a reasonable problem statement?

Following up (to try and keep things in bite-size pieces), there are two parts to the problem.

1. delivering messages to multiple machines

2. combining the resulting messages again at the end.

For the moment, let's ignore #2.

Looking at #1.

There are two fundamental approaches you can take.

1. you can have the sender deliver it to multiple destinations and not consider it delivered until it's been accepted by enough destinations.

2. you can have the receiver replicate it out to multiple peers and not consider it delivered (and therefore not send the ack) until it's been accepted by enough peers.

The first option would mostly modify omrelp, while the second would mostly modify imrelp.
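
To make "accepted by enough destinations" concrete, here is a minimal Python sketch of the sender-side variant. It is only an illustration of the idea, not rsyslog or omrelp code: `send` is a hypothetical stand-in for "send over RELP and wait for the ack", and `required_acks` is an assumed setting.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for "send over RELP and wait for the ack":
# returns True if the destination acknowledged the message.
SendFn = Callable[[str, str], bool]


def deliver_with_quorum(message: str, destinations: Iterable[str],
                        send: SendFn, required_acks: int) -> bool:
    """Return True once at least `required_acks` destinations have acked."""
    acks = 0
    for dest in destinations:
        try:
            if send(dest, message):
                acks += 1
        except OSError:
            # Treat a connection failure the same as a missing ack.
            continue
        if acks >= required_acks:
            # Enough copies exist; the message counts as delivered.
            return True
    # Too few acks: the caller should keep the message queued and retry.
    return False
```

The receiver-side variant would run the same loop over its peers before sending its own ack back to the original sender.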

Since we have many cases where people want to replicate the data, sometimes to different networks, I think we would be best off doing it on the output side.

Actually, now that I think about it, if you have a ruleset with a queue on it that contains two relp actions, no message will be considered delivered until both actions have succeeded in delivering it. As long as you have some way of ensuring that both actions aren't going to deliver to the same destination, no code changes are needed. If you make the destinations two clusters of machines, at least one machine in each cluster will receive the message.

Getting fancier than this, if RELP has the receiver identify itself, you could modify the RELP output module to allow multiple destinations in some way, and if it receives the same ID that it already has for one of its existing connections, close the connection and retry. This way you could have all the machines in a single cluster and guarantee that you would deliver to at least two of them.
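
A rough Python sketch of that "close and retry on a duplicate ID" logic follows. Everything in it is hypothetical: `connect` stands in for opening a RELP session and reading whatever ID the receiver reports, which assumes the protocol is extended to carry one.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical: open a connection to an address and return the ID the
# receiver reports for itself, or None if the connection failed.
ConnectFn = Callable[[str], Optional[str]]


def connect_to_distinct_receivers(addresses: Iterable[str],
                                  connect: ConnectFn,
                                  wanted: int) -> Dict[str, str]:
    """Return {receiver_id: address} for up to `wanted` distinct receivers."""
    connections: Dict[str, str] = {}
    for address in addresses:
        if len(connections) >= wanted:
            break
        receiver_id = connect(address)
        if receiver_id is None:
            continue  # connection failed, try the next address
        if receiver_id in connections:
            # Same machine as an existing connection: drop it and retry elsewhere.
            continue
        connections[receiver_id] = address
    return connections
```

If the result holds fewer than `wanted` entries, the sender has not found enough distinct machines and should not treat the destination set as redundant.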

By the way, one other issue you will run into in modern datacenters is figuring out how to make sure that the machines you are using for redundancy don't share a single point of failure. That SPoF could be that they are both VMs on the same host machine, both are plugged in to the same power source, both need to talk through the same switch, both need to talk through the same router, etc. Then you have to worry about what happens if communication breaks down between these independent pieces but they can still talk to the outside world (this is referred to as the "split brain" problem). The complexity in ensuring that you have the machines completely independent is severe enough that I personally would be happier having two independent clusters rather than replicating to N nodes of the same cluster and trying to make sure that you have hit machines that are completely independent of each other.
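
As a small illustration of the SPoF check, the Python sketch below compares two machines against an inventory of what each one depends on. The inventory format and the names in it are made up for the example. A real check would need data from your virtualization, power and network tooling.

```python
from typing import Dict, Set


def shared_failure_domains(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Return the kinds of infrastructure that two machines have in common."""
    return {kind for kind, name in a.items() if b.get(kind) == name}


# Hypothetical inventory entries for two log receivers.
receiver1 = {"host": "vmhost-3", "power": "pdu-a", "switch": "sw-1", "router": "r-1"}
receiver2 = {"host": "vmhost-7", "power": "pdu-a", "switch": "sw-2", "router": "r-1"}

# Any non-empty result means the pair is not truly independent.
common = shared_failure_domains(receiver1, receiver2)
```

Here the two receivers sit on different VM hosts and switches but still share a power source and a router, which is exactly the kind of hidden coupling that is hard to rule out inside a single cluster.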

So it may be that this part of the problem is doable today, with a bit of ugly config, but no code changes needed (although it's very possible to make things simpler).

David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
