On Fri, 23 Jan 2015, David Lang wrote:
On Sat, 24 Jan 2015, singh.janmejay wrote:
That is alright. I wanted to float the idea and get an agreement. I'll be
happy to implement it, if we see value in it.
Let's back up here and talk about the problem rather than getting bogged down
in the solution.
Rsyslog already has the ability to relay a message from one machine to
another reliably. The problem you are trying to address is what happens if
the receiving box dies after it has acked the message, correct?
Currently we can (at a performance cost) configure the receiving machine to
ack the message only after it's safe on redundant disks, but the message
still won't be delivered until after the machine is repaired (worst case).
You are looking for a way to deal with this problem.
Is this a reasonable problem statement?
following up (to try and keep things in bite size pieces), there are two parts
to the problem.
1. delivering messages to multiple machines
2. combining the resulting messages again at the end.
for the moment let's ignore #2
Looking at #1.
There are two fundamental approaches you can take.
1. you can have the sender deliver it to multiple destinations and not consider
it delivered until it's been accepted by enough destinations.
2. you can have the receiver replicate it out to multiple peers and not consider
it delivered (and therefore not send the ack) until it's been accepted by
enough peers.
the first option would mostly modify omrelp while the second would mostly modify
imrelp
Since we have many cases where people want to replicate the data, sometimes to
different networks, I think we would be best off doing it on the output side.
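
To make the sender-side option concrete, here is a minimal sketch in Python.
It is not rsyslog or omrelp code; `deliver_with_quorum` and the `Send`
callables are hypothetical names, and each callable stands in for "send one
message to one destination and report whether it was acked".

```python
from typing import Callable, Iterable

# Hypothetical type: sends one message, returns True if the destination acked.
Send = Callable[[str], bool]


def deliver_with_quorum(message: str, senders: Iterable[Send], required: int) -> bool:
    """Return True once at least `required` destinations have acked."""
    acks = 0
    for send in senders:
        try:
            if send(message):
                acks += 1
        except OSError:
            # A network failure counts as a missing ack, not a fatal error.
            continue
        if acks >= required:
            return True
    # Not enough acks: the caller must keep the message and retry later.
    return False
```

The sender only treats the message as delivered when the function returns
True; otherwise the message has to stay queued on the sending side.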
Actually, now that I think about it, if you have a ruleset with a queue on it
that contains two relp actions, no message will be considered delivered until
both actions have succeeded in delivering it. As long as you have some way of
ensuring that both actions aren't going to deliver to the same destination, no
code changes are needed. If you make the destinations be two clusters of
machines, at least one machine in each cluster will receive the message.
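
The behaviour described above can be modelled in a few lines of Python. This
is only a toy model of "a message leaves the queue once every action has
accepted it", not rsyslog's actual queue engine; `drain` and the action
callables are made-up names for illustration.

```python
from collections import deque


def drain(queue: deque, actions: list, max_attempts: int = 10) -> list:
    """Pop messages from `queue` only after every action accepted them."""
    delivered = []
    while queue:
        message = queue[0]          # peek, do not remove yet
        pending = list(actions)     # actions that still owe us a success
        for _ in range(max_attempts):
            # Retry only the actions that have not accepted this message.
            pending = [act for act in pending if not act(message)]
            if not pending:
                break
        if pending:
            # Some action is still failing: leave the message at the head
            # of the queue so it can be retried later.
            break
        delivered.append(queue.popleft())
    return delivered
```

If each action points at a different cluster, a message that has left the
queue has been accepted by at least one machine in each of them.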
Getting fancier than this, if RELP has the receiver identify itself, you
could modify the RELP output module to allow multiple destinations in some way,
and if it receives the same ID that it already has for one of its existing
connections, close the connection and retry. This way you could have all the
machines in a single cluster and guarantee that you would deliver to at least
two of them.
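
A rough sketch of that "close and retry on a duplicate ID" loop, in Python.
This assumes a receiver ID is available at connect time, which is exactly the
"if" above; `connect_to_distinct` and the `connect` callable are hypothetical
and do not correspond to any existing librelp or omrelp API.

```python
from typing import Callable, Set


def connect_to_distinct(connect: Callable[[], str],
                        required: int = 2,
                        max_attempts: int = 20) -> Set[str]:
    """Connect to the cluster until `required` different receivers answer.

    `connect` is assumed to open one connection to the cluster address and
    return the ID of whichever machine answered.
    """
    chosen: Set[str] = set()
    for _ in range(max_attempts):
        receiver_id = connect()
        if receiver_id in chosen:
            # Same machine as an existing connection: close it and retry.
            continue
        chosen.add(receiver_id)
        if len(chosen) == required:
            return chosen
    raise RuntimeError("could not reach enough distinct receivers")
```

With a load balancer in front of the cluster, the number of retries needed
depends on how it spreads connections, so `max_attempts` is a guess.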
by the way, one other issue you will run into in modern datacenters is figuring
out how to make sure that you don't have the machines that you are using for
redundancy sharing a single point of failure. That SPoF could be that they are
both VMs on the same host machine, both are plugged in to the same power source,
both need to talk through the same switch, both need to talk through the same
router, etc. Then you have to worry about what happens if communication breaks
down between these independent pieces but they can still talk to the outside
world (this is referred to as the "split brain" problem). The complexity in
ensuring that you have the machines completely independent is severe enough that
I personally would be happier having two independent clusters rather than
replicating to N nodes of the same cluster and trying to make sure that you have
hit machines that are completely independent of each other.
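
To illustrate the kind of check this implies, here is a small Python sketch
that compares two machines against an inventory of the things they depend on.
The inventory layout and the names in it are invented for the example; real
data would have to come from whatever tracks your hosts, power and network.

```python
def shared_failure_domains(a: dict, b: dict) -> dict:
    """Return the dependencies (by kind) that both machines have in common."""
    return {kind: a[kind] for kind in a.keys() & b.keys() if a[kind] == b[kind]}


# Hypothetical inventory entries for two log receivers.
node1 = {"vm_host": "hv-03", "power": "feed-A", "switch": "sw-12", "router": "r-1"}
node2 = {"vm_host": "hv-07", "power": "feed-A", "switch": "sw-14", "router": "r-1"}

# Anything listed here is a single point of failure for the pair.
overlap = shared_failure_domains(node1, node2)
```

For the two entries above the overlap is the power feed and the router, so
this pair would not count as independent. A check like this only covers
dependencies you know about and have recorded.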
So it may be that this part of the problem is doable today, with a bit of ugly
config, but no code changes needed (although it's very possible to make things
simpler).
David Lang