Re: [rsyslog] [RFC: Ingestion Relay] End-to-end reliable 'at-least-once' message delivery at large scale

singh.janmejay Sun, 25 Jan 2015 09:27:07 -0800

On Sun, Jan 25, 2015 at 7:20 PM, Rainer Gerhards <[email protected]> wrote:


> adding to my previous response:
>
> I think you can achieve the result by tightly coupling input and output via
> DIRECT queue mode (aka "no queue") and using RELP for the transport. But
> this may already have been discussed (sorry, I currently really have no
> time to dig into the discussion, no matter how interesting it is -- I
> really took a lot of time off my schedule to do the testbench, CI and
> similar work and I begin to get into real trouble if I distract myself any
> further).
>

No worries, I'll be happy to summarize once we have reached some
conclusion.

I'll have to summarize it anyway to confirm that we are on the same page,
regardless of whether we go ahead or drop the idea. So it's not even
additional work.


>
> No matter whether the desired result can be accomplished with current
> technology or not, I think it would be nice to have an easier approach to
> it. Thus I would be very willing to merge any result.
>

Thanks.


>
> In order to go forward, Janmejay, I suggest this:
>
> - have a deep look at the current rsyslog core code, at a minimum this
> means:
>   * understand inputs, input batches and queue submission
>   * understand the queue subsystem, including DA queues and the relation
>     of action queues and main queues
>   * understand action processing (much of that is in action.c)
>   * understand output batching
> - be sure to have a very good grip on the points above
> - then think about where you need to
>   a) add new modules (like input, output)
>   b) add new capabilities inside the rsyslog core (like new queue modes)
>   c) modify existing code/capabilities
> - once done, propose what you want to change for the b) and c) items
>
> Once we have this proposal, I would be very willing to look at whether
> anything breaks core ideas or existing use cases. As a side note, I
> wouldn't be surprised if you find out that you need to do a major rewrite
> of the core components that I explicitly mentioned above. That's my gut
> feeling because, as I said a couple of days ago on the ML, I myself would
> really like to rewrite major parts of the queue code (and thus associated
> queue users). See my past posting for details.
>

I will. If we decide to build this out in rsyslog, I'll need to do a much
deeper analysis anyway.


>
> Again, my apologies that I can't participate any further in this effort,
> but that's simply going over my head time-wise.
>

No problem at all.


>
> Rainer
>
> 2015-01-23 23:41 GMT+01:00 Rainer Gerhards <[email protected]>:
>
> > I admit I just skimmed over the messages. The Windows architecture is
> > quite a bit different; the input waits there in some cases.
> >
> > Sent from phone, thus brief.
> > On 23.01.2015 23:35, "David Lang" <[email protected]> wrote:
> >
> >> for this case it would have to feed back across the queues to the input
> >> module so that the input module could send something out.
> >>
> >> That's the really hard part.
> >>
> >> David Lang
> >>
> >> On Fri, 23 Jan 2015, Rainer Gerhards wrote:
> >>
> >>> Well, an output can block when forwarding. We do a similar thing with
> >>> the Windows products, via another protocol. But as you say David, it's
> >>> pretty problematic in many cases.
> >>>
> >>> But as I said, I really can't help develop this right now. That
> >>> includes in-depth design discussions.
> >>>
> >>> Rainer
> >>>
> >>> Sent from phone, thus brief.
> >>> On 23.01.2015 22:54, "David Lang" <[email protected]> wrote:
> >>>
> >>>  The huge problem would be how to pass data (the ack) back up from the
> >>>> final receiver to the original sender. I don't think rsyslog would be
> >>>> able
> >>>> to do this without a MAJOR rewrite (currently the earlier machines
> would
> >>>> have completely forgotten about the message by the time the ack gets
> >>>> generated)
> >>>>
> >>>> David Lang
> >>>>
> >>>> On Fri, 23 Jan 2015, Rainer Gerhards wrote:
> >>>>
> >>>>  Sorry to be that blunt, but I simply have no time to participate in
> >>>>
> >>>>> developing this. But I would be very open to merge any results.
> >>>>>
> >>>>> Rainer
> >>>>>
> >>>>> Sent from phone, thus brief.
> >>>>> On 23.01.2015 22:17, "singh.janmejay" <[email protected]> wrote:
> >>>>>
> >>>>>> On Sat, Jan 24, 2015 at 2:19 AM, David Lang <[email protected]> wrote:
> >>>>>>
> >>>>>> > RELP is the network protocol you need for this sort of reliability.
> >>>>>> > However, you would also need to not allow any message to be stored
> >>>>>> > in memory (because it would be lost if rsyslog crashes or the system
> >>>>>> > reboots unexpectedly). You would have to use disk queues (not
> >>>>>> > disk-assisted queues) everywhere and adjust some other settings (a
> >>>>>> > checkpoint interval of 1, for example).
> >>>>>> >
> >>>>>> > This would absolutely cripple your performance due to disk I/O
> >>>>>> > limitations. I did some testing of this a few years ago. I was using
> >>>>>> > a high-end PCI SSD (a 160G card cost >$5K at the time) and, depending
> >>>>>> > on the filesystem I used, I could get rsyslog to receive between 2K
> >>>>>> > and 8K messages/sec. The same hardware writing to a 7200rpm SATA
> >>>>>> > drive with memory buffering allowed could handle 380K messages/sec
> >>>>>> > (the limiting factor was the Gig-E network).
> >>>>>> >
> >>>>>> > Doing this sort of reliability on a 15Krpm SAS drive would limit you
> >>>>>> > to ~50 logs/sec. Modern SSDs would be able to do better; I would
> >>>>>> > guess a few hundred logs/sec from a good drive, but you would be
> >>>>>> > chewing through the drive lifetime several thousand times faster
> >>>>>> > than if you were allowing memory buffering.
> >>>>>> >
> >>>>>> > Very few people have logs that are critical enough to warrant this
> >>>>>> > sort of performance degradation.
> >>>>>> >
> >>>>>>
> >>>>>> I didn't particularly have disk-based queues in mind for reliability
> >>>>>> reasons. However, messages may need to overflow to disk to manage
> >>>>>> bursts (but only for burstability reasons). For a large architecture
> >>>>>> of this nature, it's generally useful to classify failures in a broad
> >>>>>> way (rather than into the very granular failure modes that we
> >>>>>> identify for transactional databases etc). The reason for this ties
> >>>>>> back to self-healing. It's easier to build self-healing mechanisms
> >>>>>> assuming only one kind of failure: node loss. It could happen for
> >>>>>> multiple reasons, but if we treat it that way, all we have to do is
> >>>>>> build room for managing the cluster when 1 (or k) nodes are lost.
> >>>>>>
> >>>>>> So thinking of it that way, an rsyslog crash, a machine crash or a
> >>>>>> disk failure are all the same to me. They are just node loss (we may
> >>>>>> be able to bring the node back with some offline procedure), but it'll
> >>>>>> come back as a fresh machine with no state.
> >>>>>>
> >>>>>> Which is why I treat K-safety as a basic design parameter. If more
> >>>>>> than K nodes disappear, data may be lost.
> >>>>>>
> >>>>>> With this kind of coarse-grained failure mode, messages can easily be
> >>>>>> kept in memory.
> >>>>>>
> >>>>>>
> >>>>>> >
> >>>>>> > In addition, this sort of reliability is saying that you would rather
> >>>>>> > have your applications freeze than have them do something and not
> >>>>>> > have it logged, and that you are willing to have your application
> >>>>>> > slow down to the speed of the logging. Very few people are willing to
> >>>>>> > do this.
> >>>>>> >
> >>>>>> > You are proposing doing the application ack across multiple hops
> >>>>>> > instead of doing it hop-by-hop. This would avoid the problem that can
> >>>>>> > happen with hop-by-hop acks, where a machine that has acked a message
> >>>>>> > then dies and needs to be recovered before the message can get
> >>>>>> > delivered (assuming you have redundant storage and enough of the
> >>>>>> > storage survives to be readable, the message would eventually get
> >>>>>> > through).
> >>>>>> >
> >>>>>> > But you now have the problem that the sender needs to know how many
> >>>>>> > destinations the logs are going to. If you have any filters to decide
> >>>>>> > what to do with the logs, the sender needs to know whether the log got
> >>>>>> > lost or a filter decided not to write it. If the rules would deliver
> >>>>>> > the logs to multiple places, the sender will need to know how many
> >>>>>> > places each log is going to be delivered to, so that it can know how
> >>>>>> > many different acks it's supposed to get back.
> >>>>>> >
> >>>>>>
> >>>>>> So the design expects clusters to be broken into multiple tiers. Let
> >>>>>> us take a 3-tier example.
> >>>>>>
> >>>>>> Say we have 100 machines; we break them into 3 tiers of 34, 33 and 33
> >>>>>> machines.
> >>>>>>
> >>>>>> Assuming every producer wants at most 2-safety, I can use 3 tiers to
> >>>>>> build this design.
> >>>>>>
> >>>>>> So first the producer discovers Tier-1 nodes and hashes its session_id
> >>>>>> to pick one of the 34 nodes. If it is not able to connect, it discards
> >>>>>> that node from the collection (now we end up with 33 nodes in Tier-1)
> >>>>>> and hashes to a different node (again a Tier-1 node, of course).
> >>>>>>
> >>>>>> Once it finds a node that it can connect to, it sends its session_id
> >>>>>> and message batch to it.
> >>>>>>
> >>>>>> The selected Tier-1 node now hashes the session_id and finds a Tier-2
> >>>>>> node (it again discovers all Tier-2 nodes via an external discovery
> >>>>>> mechanism). If it fails to connect, it discards that node and hashes
> >>>>>> again to one of the remaining 32 nodes, and so on.
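The node-selection step described above can be sketched in Python. This is a minimal illustration, not part of the proposal: the function names, the SHA-256-based hash and the `can_connect` callback are assumptions standing in for whatever hashing and connection logic a real implementation would use.

```python
import hashlib
from typing import Callable, Optional, Sequence


def stable_hash(session_id: str) -> int:
    # Python's built-in hash() is salted per process for str, so derive the
    # hash from SHA-256 to get the same answer on every node.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_tier_node(
    session_id: str,
    tier_nodes: Sequence[str],
    can_connect: Callable[[str], bool],
) -> Optional[str]:
    """Hash session_id onto one node of a tier. If that node is unreachable,
    drop it from the candidate set and hash again over the remaining nodes."""
    candidates = sorted(tier_nodes)  # identical ordering on every node
    while candidates:
        node = candidates[stable_hash(session_id) % len(candidates)]
        if can_connect(node):
            return node
        candidates.remove(node)  # e.g. 34 -> 33 candidates in Tier-1
    return None  # no node in this tier is reachable


tier1 = [f"t1-node{i:02d}" for i in range(34)]
first = pick_tier_node("session-42", tier1, lambda n: True)
# Simulate that node being down: the session lands on another Tier-1 node.
fallback = pick_tier_node("session-42", tier1, lambda n: n != first)
print(first, fallback)
```

Plain modulo hashing over the surviving nodes is the simplest reading of the text; it remaps many sessions whenever the candidate set shrinks. Rendezvous or consistent hashing would limit that churn, but the thread does not specify either.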
> >>>>>>
> >>>>>> Eventually it reaches Tier-3, which is where the ruleset has a clause
> >>>>>> that checks for replica_number == 1 and handles the message
> >>>>>> differently. It is handed over to an action which delivers it to the
> >>>>>> downstream system (which may in turn again be a syslog cluster, or a
> >>>>>> datastore etc).
> >>>>>>
> >>>>>> So each node only has to worry about the next hop that it needs to
> >>>>>> deliver to.
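A sketch of the per-tier decision, assuming (this is not spelled out in the thread) that the producer stamps replica_number = 3 in the 3-tier, 2-safety example and that each tier decrements it by one before forwarding. `Message`, `route_at_tier`, `walk_tiers` and the `downstream_replicas` re-stamp used for chaining into another cluster are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    session_id: str
    seq_no: int
    replica_number: int
    payload: str


def route_at_tier(
    msg: Message, downstream_replicas: Optional[int] = None
) -> Tuple[str, Message]:
    """replica_number == 1 marks the last replica: hand the message to the
    delivery action. Otherwise forward it to the next tier with the count
    decremented (this tier's own in-memory copy is not modelled here)."""
    if msg.replica_number < 1:
        raise ValueError("replica_number must be >= 1")
    if msg.replica_number == 1:
        if downstream_replicas is not None:
            # Chaining: re-stamp replica_number for the next cluster.
            return "deliver", replace(msg, replica_number=downstream_replicas)
        return "deliver", msg
    return "forward", replace(msg, replica_number=msg.replica_number - 1)


def walk_tiers(msg: Message) -> List[str]:
    """Follow one message through successive tiers until it is delivered."""
    trace = []
    tier = 1
    while True:
        decision, msg = route_at_tier(msg)
        trace.append(f"Tier-{tier}: {decision}")
        if decision == "deliver":
            return trace
        tier += 1


print(walk_tiers(Message("session-42", 7, 3, "hello")))
# ['Tier-1: forward', 'Tier-2: forward', 'Tier-3: deliver']
```

Each call to `route_at_tier` looks only at the message in hand, which mirrors the point that every node only needs to know its next hop.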
> >>>>>>
> >>>>>>
> >>>>>> >
> >>>>>> > These problems make it so that I don't see how you would reasonably
> >>>>>> > manage this sort of environment.
> >>>>>> >
> >>>>>> > I would suggest that you think hard about what your requirements
> >>>>>> > really are.
> >>>>>> >
> >>>>>> > It may be that you are only sending to one place, in which case you
> >>>>>> > really just want to be inserting your messages into an ACID-compliant
> >>>>>> > database.
> >>>>>> >
> >>>>>> > It may be that your requirements for absolute reliability are not
> >>>>>> > quite as severe as you initially think, and that you can then use
> >>>>>> > the existing hop-by-hop reliability. Or they are even less severe and
> >>>>>> > you can accept some amount of memory buffering to get a few orders
> >>>>>> > of magnitude better performance from your logging. Remember that we
> >>>>>> > are talking about performance differences of 10,000x on normal
> >>>>>> > hardware. A bit less, but still 100x or so, on esoteric, high-end
> >>>>>> > hardware.
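A back-of-the-envelope check using only the throughput figures quoted earlier in this message. The result (roughly 50x to 190x for the SSD, about 7,600x for the 15Krpm drive) is in line with the "100x or so" and "10,000x" orders of magnitude. Since the memory-buffered baseline was limited by the Gig-E network rather than the disk, these ratios are, if anything, lower bounds on the gap.

```python
# Throughput figures quoted in the thread, in messages/sec.
MEMORY_BUFFERED = 380_000  # 7200rpm SATA + memory buffering (Gig-E was the limit)
FULLY_SYNC = {
    "PCI SSD, best filesystem": 8_000,
    "PCI SSD, worst filesystem": 2_000,
    "15Krpm SAS (estimate)": 50,
}


def slowdown(baseline: float, rate: float) -> float:
    """How many times slower the fully synchronous setup is than the baseline."""
    return baseline / rate


for setup, rate in FULLY_SYNC.items():
    print(f"{setup:<26} {slowdown(MEMORY_BUFFERED, rate):>8.1f}x slower")
```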
> >>>>>> >
> >>>>>> >
> >>>>>> Yep, I completely agree. In most cases extreme reliability such as
> >>>>>> this is not required, and is best avoided for cost reasons.
> >>>>>>
> >>>>>> But for select applications it is a lifesaver.
> >>>>>>
> >>>>>>
> >>>>>> >
> >>>>>> > I will also say that there are messaging systems that claim to have
> >>>>>> > the properties that you are looking for (Flume, for example), but
> >>>>>> > almost nobody operates them in their full reliability mode because of
> >>>>>> > the performance issues. And they do not have the filtering and
> >>>>>> > multiple-destination capabilities that *syslog provides.
> >>>>>> >
> >>>>>>
> >>>>>> Yes, Flume is one of the best options. But it comes with some unique
> >>>>>> problems too (it's not lightweight enough for running on the producer
> >>>>>> side, and managed-environment overhead (GC etc) causes its own set of
> >>>>>> problems). There is also value in offering the same interface to
> >>>>>> producers for ingestion into both un-acked and reliable pipelines
> >>>>>> (because a lot of other things, like integration with other systems,
> >>>>>> can be reused). It also keeps things simple because producers do all
> >>>>>> operations one way, with one tool, regardless of whether the ingestion
> >>>>>> mechanism is acked/replicated etc.
> >>>>>>
> >>>>>> Reliability in this case is built end-to-end, so building stronger
> >>>>>> guarantees into the over-the-wire parts of the pipeline doesn't seem
> >>>>>> very valuable to me. Why do you feel RELP will be necessary?
> >>>>>>
> >>>>>>
> >>>>>> >
> >>>>>> > David Lang
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Sat, 24 Jan 2015, singh.janmejay wrote:
> >>>>>> >
> >>>>>> >  Date: Sat, 24 Jan 2015 01:48:18 +0530
> >>>>>> >> From: singh.janmejay <[email protected]>
> >>>>>> >> Reply-To: rsyslog-users <[email protected]>
> >>>>>> >> To: rsyslog-users <[email protected]>
> >>>>>> >> Subject: [rsyslog] [RFC: Ingestion Relay] End-to-end reliable
> >>>>>> >> 'at-least-once'
> >>>>>> >>     message delivery at large scale
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> Greetings,
> >>>>>> >>
> >>>>>> >> This is a proposal for a new feature, and I'm inviting thoughts.
> >>>>>> >>
> >>>>>> >> The aim is to use a set of rsyslog nodes (let us call it a cluster)
> >>>>>> >> to move messages reliably from source to destination.
> >>>>>> >>
> >>>>>> >> Let us make a few assumptions so we can define the expected
> >>>>>> >> properties clearly.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> Assumptions:
> >>>>>> >>
> >>>>>> >> - Data, once successfully delivered to the Destination (typically a
> >>>>>> >> datastore), is considered safe.
> >>>>>> >> - A Source crashing with incomplete message hand-off to the cluster
> >>>>>> >> is outside the scope of this. In such a case, the source must retry.
> >>>>>> >> - The cluster must be designed to support a maximum of K node
> >>>>>> >> failures without any message loss.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> Here are the properties that may be desirable in such a service (the
> >>>>>> >> cluster is an implementation of this service):
> >>>>>> >>
> >>>>>> >> - No message should ever be lost once handed over to the delivery
> >>>>>> >> network, except in a disaster scenario.
> >>>>>> >> - A disaster scenario is a condition where more than k nodes in the
> >>>>>> >> cluster fail.
> >>>>>> >> - Each source may pick a desirable value of k, where k <= K.
> >>>>>> >> - Any cluster node must re-transmit messages after a timeout T if
> >>>>>> >> downstream fails to ACK them before the timeout.
> >>>>>> >> - Such a cluster should ideally be composable, in the sense that the
> >>>>>> >> user should be able to chain multiple such clusters.
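The timeout-T retransmission property can be sketched as a per-session buffer of unacknowledged messages. This is an illustrative sketch only: the class name, the injectable clock and the cumulative `acked_up_to` call (modelled on the watermark replies the proposal describes for its implementation) are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple


class RetransmitBuffer:
    """Unacknowledged messages for one session; anything not ACKed within
    timeout_s seconds of its last send is reported for re-transmission."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._pending: Dict[int, Tuple[str, float]] = {}  # seq_no -> (payload, last_sent)

    def sent(self, seq_no: int, payload: str) -> None:
        self._pending[seq_no] = (payload, self.clock())

    def acked_up_to(self, seq_no: int) -> None:
        # Cumulative ACK: everything at or below seq_no is safe downstream.
        for s in [s for s in self._pending if s <= seq_no]:
            del self._pending[s]

    def due_for_retransmit(self) -> List[int]:
        now = self.clock()
        due = sorted(s for s, (_, last) in self._pending.items()
                     if now - last >= self.timeout_s)
        for s in due:
            payload, _ = self._pending[s]
            self._pending[s] = (payload, now)  # restart the timer on re-send
        return due


# Usage with a fake clock so the timing is deterministic.
fake_now = [0.0]
buf = RetransmitBuffer(timeout_s=5.0, clock=lambda: fake_now[0])
buf.sent(1, "a")
buf.sent(2, "b")
fake_now[0] = 5.0
print(buf.due_for_retransmit())  # [1, 2]
```

Restarting the timer on each re-send keeps a dead downstream node from being flooded every tick; the sender keeps retrying at interval T until a cumulative ACK arrives.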
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> This requires the cluster to support k-way replication of messages
> >>>>>> >> within the cluster.
> >>>>>> >>
> >>>>>> >> Implementation:
> >>>>>> >>
> >>>>>> >> High level:
> >>>>>> >> - The cluster is divided into multiple tiers (let us call them
> >>>>>> >> replication-tiers, or rep-tiers).
> >>>>>> >> - The cluster can handle multiple sessions at a time.
> >>>>>> >> - Session_ids are unique and are generated by the producer system
> >>>>>> >> when it starts producing messages.
> >>>>>> >> - Within a session, we have a notion of sequence-number (or seq_no),
> >>>>>> >> which is a monotonically increasing number (incremented by 1 per
> >>>>>> >> message). This requirement can possibly be relaxed for performance
> >>>>>> >> reasons, and gaps in seq_no may be acceptable.
> >>>>>> >> - Replication is basically managed by lower tiers sending data over
> >>>>>> >> to higher tiers within the cluster until the replica-number (an
> >>>>>> >> attribute each message carries) falls to 1.
> >>>>>> >> - When the replica-number reaches 1, we transmit the message to the
> >>>>>> >> desired destination. (This can alternatively be done at the
> >>>>>> >> earliest opportunity, i.e. in Tier-1, under special circumstances,
> >>>>>> >> but let us discuss that later if we find enough interest in doing
> >>>>>> >> so.)
> >>>>>> >> - There must be several nodes in each tier, allocated to minimize the
> >>>>>> >> possibility of all of them going down at once (across availability
> >>>>>> >> zones, different chassis etc).
> >>>>>> >> - There must be a mechanism which allows nodes from the upstream
> >>>>>> >> system to discover nodes of Tier-1 of the cluster, Tier-1 nodes to
> >>>>>> >> discover nodes in Tier-2 of the cluster, and so on. Hence nodes in
> >>>>>> >> Tier-K of the cluster should be able to discover downstream nodes.
> >>>>>> >> - Each session (or multiple sessions bundled according to arbitrary
> >>>>>> >> logic, such as hashing) must pick one node from each tier as its
> >>>>>> >> downstream-tier-node.
> >>>>>> >> - Each node must maintain 2 watermarks:
> >>>>>> >>    * Replicated till seq_no: up to which sequence number messages
> >>>>>> >>      have been k-way replicated in the cluster
> >>>>>> >>    * Delivered till seq_no: up to which sequence number messages
> >>>>>> >>      have been delivered to the downstream system
> >>>>>> >> - Each send operation (i.e. transmission of messages) from upstream to
> >>>>>> >> the cluster's Tier-1, or from a lower tier in the cluster to a higher
> >>>>>> >> tier in the cluster, will pass messages such that the highest seq_no
> >>>>>> >> of any message (per session) in the transmitted batch is known.
> >>>>>> >> - Each receive operation in the cluster's Tier-1, or in upper tiers
> >>>>>> >> within the cluster, must respond/reply to the transmitter with the
> >>>>>> >> two watermark values (i.e. Replicated till seq_no and Delivered till
> >>>>>> >> seq_no) per session.
> >>>>>> >> - Lower tiers (within the cluster) are free to discard all messages
> >>>>>> >> with seq_no <= Delivered till seq_no.
> >>>>>> >> - The upstream system is free to discard all messages with seq_no <=
> >>>>>> >> the cluster's Replicated till seq_no.
> >>>>>> >> - Upstream and downstream systems can be chained as instances of such
> >>>>>> >> clusters if need be.
> >>>>>> >> - The maximum replication factor 'K' is dictated by cluster design
> >>>>>> >> (number of tiers).
> >>>>>> >> - The desired replication factor 'k' is a per-message controllable
> >>>>>> >> attribute (decided by the upstream).
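The two watermarks and the two discard rules in the list above can be sketched as follows. Only the rules themselves come from the proposal; `Watermarks`, `SessionBuffer` and the `on_reply_*` functions are hypothetical names, and a single session is shown for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Watermarks:
    """Per-session values a receiving node replies with."""
    replicated_till: int = 0  # highest seq_no known to be k-way replicated
    delivered_till: int = 0   # highest seq_no delivered to the downstream system


@dataclass
class SessionBuffer:
    """Messages a node still holds for one session, keyed by seq_no."""
    messages: Dict[int, str] = field(default_factory=dict)

    def discard_through(self, seq_no: int) -> None:
        for s in [s for s in self.messages if s <= seq_no]:
            del self.messages[s]


def on_reply_at_upstream(buf: SessionBuffer, wm: Watermarks) -> None:
    # Upstream may drop a message once the cluster reports it k-way replicated.
    buf.discard_through(wm.replicated_till)


def on_reply_at_lower_tier(buf: SessionBuffer, wm: Watermarks) -> None:
    # A lower tier keeps its replica until downstream delivery is confirmed.
    buf.discard_through(wm.delivered_till)


producer = SessionBuffer({s: f"msg-{s}" for s in range(1, 6)})
tier1_buf = SessionBuffer({s: f"msg-{s}" for s in range(1, 6)})
reply = Watermarks(replicated_till=4, delivered_till=2)
on_reply_at_upstream(producer, reply)
on_reply_at_lower_tier(tier1_buf, reply)
print(sorted(producer.messages), sorted(tier1_buf.messages))  # [5] [3, 4, 5]
```

Because discarding is driven by "at or below the watermark", a stale or reordered reply carrying a lower watermark is harmless: it simply discards nothing new.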
> >>>>>> >>
> >>>>>> >> The sequence diagrams below explain this visually.
> >>>>>> >>
> >>>>>> >> Here is a case with an upstream sending messages with k = K:
> >>>>>> >> ingestion_relay_1_max_replication.png
> >>>>>> >> <https://docs.google.com/file/d/0B_XhUZLNFT4dN21TLTZBQjZMdUk/edit?usp=drive_web>
> >>>>>> >>
> >>>>>> >> This is a case with k < K:
> >>>>>> >> ingestion_relay_2_low_replication.png
> >>>>>> >> <https://docs.google.com/file/d/0B_XhUZLNFT4da1lKMnRKdU9JUkU/edit?usp=drive_web>
> >>>>>> >>
> >>>>>> >> The above 2 cases show only one transmission going serially from the
> >>>>>> >> upstream system to the downstream system; this one shows it
> >>>>>> >> pipelined:
> >>>>>> >> ingestion_relay_3_pipelining.png
> >>>>>> >> <https://docs.google.com/file/d/0B_XhUZLNFT4dQUpTZGRDdVVXLVU/edit?usp=drive_web>
> >>>>>> >>
> >>>>>> >> This demonstrates the failure of a node in the cluster, and how it
> >>>>>> >> recovers in the absence of continued transmission (it is recovered
> >>>>>> >> by timeout and retransmission):
> >>>>>> >> ingestion_relay_4_timeout_based_recovery.png
> >>>>>> >> <https://docs.google.com/file/d/0B_XhUZLNFT4dMm5kUWtaTlVfV1U/edit?usp=drive_web>
> >>>>>> >>
> >>>>>> >> This demonstrates the failure of a node in the cluster, and how it
> >>>>>> >> recovers due to continued transmission:
> >>>>>> >> ingestion_relay_5_broken_transmission_based_recovery.png
> >>>>>> >> <https://docs.google.com/file/d/0B_XhUZLNFT4dd3M0SXpUYjFXdlk/edit?usp=drive_web>
> >>>>>> >>
> >>>>>> >> Rsyslog-level implementation sketch:
> >>>>>> >>
> >>>>>> >> - Let us assume there is a way to identify the set of inputs, queues,
> >>>>>> >> rulesets and actions that need to participate as reliable pipeline
> >>>>>> >> components in a cluster node.
> >>>>>> >> - Each participating queue will expect messages to contain a
> >>>>>> >> session-id.
> >>>>>> >> - The consumer bound to a queue will be expected to provide values for
> >>>>>> >> both watermarks per session in order to dequeue more messages.
> >>>>>> >> - The producer bound to a queue will be provided values for both
> >>>>>> >> watermarks per session as the return value when enqueueing more
> >>>>>> >> messages.
> >>>>>> >> - The inputs will transmit (either broadcast or unicast) both
> >>>>>> >> watermark values to upstream actions (unicast is sent over the
> >>>>>> >> relevant connections, broadcast is sent across all connections).
> >>>>>> >> Please note this has nothing to do with network broadcast domains, as
> >>>>>> >> everything is over TCP.
> >>>>>> >> - Actions will receive the two watermarks and push them back to the
> >>>>>> >> queue the action is bound to, in order to dequeue more messages.
> >>>>>> >> - Rulesets will need to pick the relevant action's value across
> >>>>>> >> multiple action queues according to user-provided configuration, and
> >>>>>> >> propagate it backwards.
> >>>>>> >> - Actions must be able to set an arbitrary value for replica-number
> >>>>>> >> when passing the message to the downstream system (so that chaining
> >>>>>> >> is possible).
> >>>>>> >> - Inputs may produce the new value for Replicated till seq_no when
> >>>>>> >> receiving a message with replica_number == 1.
> >>>>>> >> - Actions may produce the new value for Delivered till seq_no after
> >>>>>> >> having successfully delivered a message with replica_number == 1.
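The proposal leaves the ruleset's combining rule to "user-provided configuration". One plausible reading, sketched here with hypothetical policy names ("all", "any") and hypothetical action names, is a choice between requiring every action queue to reach a seq_no (the minimum) and accepting any one of them (the maximum).

```python
from typing import Dict, Literal

Policy = Literal["all", "any"]


def combine_watermarks(per_action: Dict[str, int], policy: Policy) -> int:
    """Fold one watermark (e.g. Delivered till seq_no) reported by several
    action queues into the single value a ruleset propagates backwards.

    "all": a seq_no counts only once every action has reached it (minimum).
    "any": a seq_no counts as soon as one action has reached it (maximum).
    """
    if not per_action:
        raise ValueError("no action reported a watermark")
    if policy == "all":
        return min(per_action.values())
    if policy == "any":
        return max(per_action.values())
    raise ValueError(f"unknown policy: {policy!r}")


reported = {"to_tier2": 120, "to_archive": 95}  # hypothetical action names
print(combine_watermarks(reported, "all"), combine_watermarks(reported, "any"))  # 95 120
```

The "all" policy is the conservative choice when a ruleset fans out to several destinations, since it never reports a seq_no before every action has reached it.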
> >>>>>> >>
> >>>>>> >> Rsyslog configuration required (from the user):
> >>>>>> >>
> >>>>>> >> - The user will need to identify the machines that are part of the
> >>>>>> >> cluster.
> >>>>>> >> - These machines will have to be divided into multiple replication
> >>>>>> >> tiers (as replication will happen only across machines in different
> >>>>>> >> tiers).
> >>>>>> >> - The user can pass a message to the next cluster by setting
> >>>>>> >> replica_number back to a desired number and passing it to an action
> >>>>>> >> which writes it to one of the nodes in a downstream cluster.
> >>>>>> >> - The user needs to check replica_number in the ruleset and take
> >>>>>> >> special action (writing it to the downstream system) when
> >>>>>> >> replica_number == 1.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> Does this have any overlap with RELP?
> >>>>>> >>
> >>>>>> >> I haven't studied RELP in depth yet, but as far as I understand it,
> >>>>>> >> it tries to solve the problem of delivering messages reliably
> >>>>>> >> between a single producer and a single consumer without loss (it
> >>>>>> >> targets different kinds of loss scenarios specifically). In
> >>>>>> >> addition, its scope is limited to ensuring no messages are lost
> >>>>>> >> during transportation. In the event of a crash of the receiver node
> >>>>>> >> before it can handle a received message reliably, some messages may
> >>>>>> >> be lost. Someone with deeper knowledge of RELP should chime in.
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> Thoughts?
> >>>>>> >>
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> --
> >>>>>> >> Regards,
> >>>>>> >> Janmejay
> >>>>>> >> http://codehunk.wordpress.com
> >>>>>> >> _______________________________________________
> >>>>>> >> rsyslog mailing list
> >>>>>> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>>>> >> http://www.rsyslog.com/professional-services/
> >>>>>> >> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >>>>>> >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
> >>>>>> >> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
> >>>>>> >> POST if you DON'T LIKE THAT.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Regards,
> >>>>>> Janmejay
> >>>>>> http://codehunk.wordpress.com
> >>>>>
>



-- 
Regards,
Janmejay
http://codehunk.wordpress.com