Re: [rsyslog] [RFC: Ingestion Relay] End-to-end reliable 'at-least-once' message delivery at large scale

David Lang Fri, 23 Jan 2015 23:33:28 -0800

On Fri, 23 Jan 2015, David Lang wrote:

On Sat, 24 Jan 2015, singh.janmejay wrote:
That is alright. I wanted to float the idea and get an agreement. I'll be
happy to implement it, if we see value in it.
Let's back up here and talk about the problem rather than getting bogged down in the solution.
Rsyslog already has the ability to relay a message from one machine to another reliably. The problem you are trying to address is what happens if the box dies after the receiving machine acks the message, correct?
Currently we can (at a performance cost) configure the receiving machine to ack the message only after it's safe on redundant disks, but the message still won't be delivered until after the machine is repaired (worst case).
You are looking for a way to deal with this problem.

Is this a reasonable problem statement?

Following up (to try and keep things in bite-size pieces), there are two parts to the problem.


1. delivering messages to multiple machines

2. combining the resulting messages again at the end.

for the moment let's ignore #2

Looking at #1.

There are two fundamental approaches you can take.

1. you can have the sender deliver it to multiple destinations and not consider it delivered until it's been accepted by enough destinations.

2. you can have the receiver replicate it out to multiple peers and not consider it delivered (and therefore not send the ack) until it's been accepted by enough peers.

the first option would mostly modify omrelp while the second would mostly modify imrelp

Since we have many cases where people want to replicate the data, sometimes to different networks, I think we would be best off doing it on the output side.
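To make the sender-side option concrete, here is a minimal Python sketch of "not delivered until enough destinations have accepted it". This is not rsyslog or omrelp code: the `Destination` type, the `deliver_with_quorum` name and the `required` parameter are all made up for illustration.

```python
from typing import Callable, Iterable

# Hypothetical type: a destination is any callable that returns True once the
# remote side has acknowledged the message, and may raise OSError on failure.
Destination = Callable[[str], bool]


def deliver_with_quorum(message: str,
                        destinations: Iterable[Destination],
                        required: int) -> bool:
    """Return True only if at least `required` destinations acked `message`."""
    acks = 0
    for send in destinations:
        try:
            if send(message):
                acks += 1
        except OSError:
            # An unreachable destination simply does not count toward the quorum.
            continue
    return acks >= required
```

The caller would keep the message queued and retry whenever this returns False, which is what gives 'at-least-once' (and possibly duplicate) delivery.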

Actually, now that I think about it: if you have a ruleset with a queue on it that contains two relp actions, no message will be considered delivered until both actions have succeeded in delivering it. As long as you have some way of ensuring that both actions aren't going to deliver to the same destination, no code changes are needed. If you make the destinations two clusters of machines, at least one machine in each cluster will receive the message.
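As a toy model of that behaviour (an illustration of the idea only, not how the rsyslog queue engine is actually implemented), the following Python sketch keeps a message at the head of the queue until every action has accepted it. The `drain` function and `max_attempts` parameter are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Sequence

Action = Callable[[str], bool]  # hypothetical: returns True once the message is acked


def drain(queue: Deque[str], actions: Sequence[Action], max_attempts: int = 3) -> bool:
    """Process messages in order; a message leaves the queue only after
    every action has accepted it. Returns False if a message is still stuck."""
    while queue:
        message = queue[0]          # peek, do not remove yet
        pending = list(actions)
        for _ in range(max_attempts):
            # Retry only the actions that have not yet accepted this message.
            pending = [act for act in pending if not act(message)]
            if not pending:
                break
        if pending:
            return False            # message stays queued for a later retry
        queue.popleft()             # all actions succeeded, safe to drop
    return True
```

With two actions pointing at two different clusters, a message is only dropped from the queue once both clusters have it.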

Getting fancier than this, if RELP has the receivers identify themselves, you could modify the RELP output module to allow multiple destinations in some way, and if it receives the same ID that it already has for one of its existing connections, close the connection and retry. This way you could have all the machines in a single cluster and guarantee that you would deliver to at least two of them.
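A rough Python sketch of that close-and-retry logic follows. It assumes a receiver ID that RELP would have to provide (the paragraph above only proposes it), and a hypothetical `connect()` callable that opens a connection to the cluster address and returns `(receiver_id, connection)`.

```python
from typing import Any, Callable, Dict, Tuple


def connect_to_distinct(connect: Callable[[], Tuple[str, Any]],
                        wanted: int,
                        max_attempts: int = 20) -> Dict[str, Any]:
    """Open connections until `wanted` distinct receivers are held.

    `connect` is a hypothetical callable returning (receiver_id, connection);
    the connection object only needs a close() method here.
    """
    connections: Dict[str, Any] = {}
    for _ in range(max_attempts):
        if len(connections) >= wanted:
            break
        receiver_id, conn = connect()
        if receiver_id in connections:
            # Same machine as an existing connection: close it and retry.
            conn.close()
            continue
        connections[receiver_id] = conn
    if len(connections) < wanted:
        raise RuntimeError("could not reach enough distinct receivers")
    return connections
```

The attempt limit is there so that a cluster with only one live node fails loudly instead of looping forever.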

by the way, one other issue you will run into in modern datacenters is figuring out how to make sure that the machines you are using for redundancy don't share a single point of failure. That SPoF could be that they are both VMs on the same host machine, both are plugged in to the same power source, both need to talk through the same switch, both need to talk through the same router, etc. Then you have to worry about what happens if communication breaks down between these independent pieces but they can still talk to the outside world (this is referred to as the "split brain" problem). The complexity in ensuring that you have the machines completely independent is severe enough that I personally would be happier having two independent clusters rather than replicating to N nodes of the same cluster and trying to make sure that you have hit machines that are completely independent of each other.
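The simplest form of that check can be sketched in Python as below. The inventory fields (`host`, `power`, `switch`, `router`) and the function name are made up for illustration; a real inventory would need far more detail, and this does nothing about split brain.

```python
from typing import Dict, Set


def shared_failure_domains(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Return the kinds of infrastructure (hypothetical inventory keys)
    on which two machines depend on the very same component."""
    return {kind for kind in a.keys() & b.keys() if a[kind] == b[kind]}


# Two VMs on different hypervisors that still share a PDU and a router.
node1 = {"host": "hv1", "power": "pdu-a", "switch": "sw1", "router": "r1"}
node2 = {"host": "hv2", "power": "pdu-a", "switch": "sw2", "router": "r1"}
shared = shared_failure_domains(node1, node2)
```

An empty result only says that the listed components differ. It says nothing about dependencies the inventory doesn't record, which is part of why two independent clusters are easier to reason about.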

So it may be that this part of the problem is doable today, with a bit of ugly config, but no code changes needed (although it's very possible to make things simpler).


David Lang

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
