Sorry if this sounds discouraging: I currently have such a large backlog
that I can't engage in this effort, and I don't think I will be able to
merge a change of this magnitude before the backlog has shrunk (Q4 2015
or later, I guess).
Sorry I have no better answer, but you can see for yourself how much is
going on, and I really need to make sure I can keep up with at least the
bare essentials.

Rainer

2015-06-04 5:53 GMT+02:00 singh.janmejay <[email protected]>:
> Yes, L4 load-balancing will work to significant scale. L7
> load-balancing will do even better in terms of evening out load, but
> I am not sure the syslog protocol is widely supported by
> load-balancers.
>
> DNS scaling and propagation delay are sometimes not acceptable, but
> BGP anycast is something that would work at data-center scale with
> very large PODs.
>
> This is an alternative to that. It has fewer moving parts (just
> producer and consumer), no LB, and it doesn't require the complexity
> of anycast.
>
> It trades off the engineering complexity of load-balancers and
> anycast for smarter clients and servers (increasing the complexity of
> clients and servers a little, but also simplifying the deployment
> topology significantly).
>
> I think all three are valid approaches, and the choice of one over
> the others (best fit) will vary across deployments.
>
>
> On Thu, Jun 4, 2015 at 8:45 AM, David Lang <[email protected]> wrote:
>> I don't see the advantage of adding all this complexity as opposed
>> to using existing load-balancing approaches. With existing tools we
>> can deliver the log stream to a cluster of systems and deal with
>> them failing. Yes, the easy approaches to doing this are limited to
>> the throughput of a single wire, but since that single wire is
>> commonly 10Gb/sec (and easily 40Gb/sec) with off-the-shelf
>> technology, and since the log stream can be compressed, this isn't
>> likely to be an issue for much of anyone below Google scale.
>>
>> There are a lot of advantages to keeping the failover logic and
>> config contained to as small an area of the network and as few
>> devices as possible. The systems accepting the logs _must_
>> participate in the process (responding to health checks if nothing
>> else); it only takes a couple of other boxes (if any) to perform TCP
>> load balancing. And having everything local increases the accuracy
>> of the detection and the speed of recovery.
>>
>> If you want to deal with larger failures (datacenter scale), then
>> existing DNS/BGP failover tools can come into play.
>>
>> What advantage do we gain by pushing the configuration and failover
>> logic to the senders?
>>
>> David Lang
>>
>>
>> On Thu, 4 Jun 2015, singh.janmejay wrote:
>>
>>> Hi,
>>>
>>> This is a proposal for first-class support for the notion of a
>>> 'cluster' as a log-forwarding destination. It describes a
>>> technology-independent service-discovery implementation.
>>>
>>> Scenario / Context:
>>> ------------------------
>>> Say an environment is supposed to relay all logs to a logical
>>> destination for aggregation/archival purposes. Such a setup at
>>> large scale would have several log-producers and several
>>> log-receivers (or listeners).
>>>
>>> Because several machines are involved, the probability of some
>>> listener nodes failing is significant, and log-producers should
>>> ideally be somehow aware of the "current" log-receiving-cluster
>>> definition (that is, the nodes that are healthy and receiving
>>> logs).
>>>
>>> This proposal covers the first-class definition and usage of a
>>> log-receiver cluster.
>>>
>>> Configuration concepts:
>>> -----------------------------
>>>
>>> action(...): gets an additional parameter called cluster, which
>>> follows a URL-like convention (the scheme indicates the technology
>>> used for service-discovery).
>>>
>>> input(...): gets an additional parameter called cluster, same as
>>> action
>>>
>>> The URL must resolve to currently-available host+port pairs, and
>>> may support notifications, polling or both (depending on the
>>> underlying service-discovery technology).
>>>
>>> The resolution of URLs and the representation of host+port pairs
>>> in the service-endpoint is left to the service-discovery
>>> technology.
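>>>
>>> As a rough illustration (names here are made up for the example,
>>> not a final API), a cluster URL could be split into scheme,
>>> ensemble hosts and a technology-specific path roughly like this:
>>>
>>>     #include <stdio.h>
>>>     #include <string.h>
>>>
>>>     struct cluster_url {
>>>         char scheme[16];  /* selects the service-discovery tech  */
>>>         char hosts[256];  /* comma-separated ensemble addresses  */
>>>         char path[256];   /* technology-specific locator (znode) */
>>>     };
>>>
>>>     /* Split "zk://10.20.30.40,10.20.30.41/foo_cluster" into its
>>>      * parts. Sketch only: real parsing needs proper validation. */
>>>     static int parse_cluster_url(const char *url,
>>>                                  struct cluster_url *out)
>>>     {
>>>         const char *sep = strstr(url, "://");
>>>         if (sep == NULL)
>>>             return -1;
>>>         snprintf(out->scheme, sizeof(out->scheme), "%.*s",
>>>                  (int)(sep - url), url);
>>>         const char *rest = sep + 3;
>>>         const char *slash = strchr(rest, '/');
>>>         if (slash == NULL)
>>>             return -1;
>>>         snprintf(out->hosts, sizeof(out->hosts), "%.*s",
>>>                  (int)(slash - rest), rest);
>>>         snprintf(out->path, sizeof(out->path), "%s", slash);
>>>         return 0;
>>>     }
>>>
>>>     int main(void)
>>>     {
>>>         struct cluster_url cu;
>>>         const char *url = "zk://10.20.30.40,10.20.30.41/foo_cluster";
>>>         if (parse_cluster_url(url, &cu) == 0)
>>>             printf("scheme=%s hosts=%s path=%s\n",
>>>                    cu.scheme, cu.hosts, cu.path);
>>>         return 0;
>>>     }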
>>>
>>> Configuration examples:
>>> ------------------------------
>>>
>>> action(type="omfwd" cluster="zk://10.20.30.40,10.20.30.41/foo_cluster"
>>> template="foo-fwd" protocol="tcp" tcp_framing="octet-counted"
>>> ResendLastMSGOnReconnect="on")
>>>
>>> input(type="imptcp" port="10514" name="foo" ruleset="foo.recv"
>>> cluster="zk://10.20.30.40,10.20.30.41/foo_cluster")
>>>
>>> The parameter "cluster" in both cases points to the URL
>>> "zk://10.20.30.40,10.20.30.41/foo_cluster", which does the
>>> following:
>>>
>>> [behaviour common to both input and action]
>>>    - scheme 'zk' resolves to the use of zookeeper-based
>>> service-discovery (it picks one of the multiple supported
>>> service-discovery technologies)
>>>    - reach the zookeeper at IP address 10.20.30.40 or 10.20.30.41
>>>    - look for the znode (zookeeper-specific): /foo_cluster
>>>
>>>
>>> [action-specific behaviour]
>>>    - enumerate the ephemeral nodes under the said znode
>>>    - pick one of the child-nodes at random
>>>    - expect it to follow the naming convention host:port
>>>    - break host and port apart and use them as the destination for
>>> forwarding logs
>>>
>>>
>>> [input-specific behaviour]
>>>    - enumerate the IP addresses this input is listening on
>>>    - create ephemeral child-znodes under the foo_cluster znode
>>> named IP:port (following this convention is enforced by the
>>> service-discovery-technology-specific implementation)
>>>    - keep the session live as long as the input is live (which
>>> means: kill the session if queues fill up, or let the
>>> heartbeat-mechanism automatically kill it if the rsyslog process
>>> dies, the host freezes up or fails, the connecting switch fails,
>>> etc.)
>>>
>>> Service-discovery technology independence:
>>> --------------------------------------------------------
>>>
>>> The URL-like naming convention allows the scheme to indicate the
>>> service-discovery technology. The cluster-url for a simple
>>> polling-based discovery technology implemented over http would
>>> look like this:
>>>
>>> http://10.20.30.40/foo_cluster
>>>
>>> A log-cabin based impl would have something similar to:
>>>
>>> logcabin://10.20.30.40,10.20.30.40/foo_cluster
>>>
>>> Implementation sketch:
>>> ----------------------------
>>>
>>> Service-discovery support can be implemented within rsyslog (the
>>> runtime), or as an independent library under the rsyslog umbrella
>>> (I think I prefer an independent library).
>>>
>>> Any input or output modules that choose to support it basically
>>> link with the library and do one of the following (see the sketch
>>> after this list):
>>>
>>>    - as an action:
>>>      when doAction fails, it picks a random endpoint from the
>>> collection of currently-available endpoints provided by service
>>> discovery (random picking is implemented inside the library,
>>> because that allows us to later take other parameters into
>>> account, such as current load, which will be better for
>>> load-balancing)
>>>
>>>    - as an input:
>>>      if listen is successful and the input is ready to take
>>> traffic, it enumerates all IP addresses it is listening on and
>>> registers those endpoints with the service-discovery technology
>>> (where they can then be discovered by senders)
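>>>
>>> A minimal sketch of what the library could do underneath for the
>>> 'zk' scheme, using the stock ZooKeeper C client (error handling
>>> trimmed; the listening address is a placeholder and the rsyslog
>>> glue is omitted):
>>>
>>>     #include <stdio.h>
>>>     #include <stdlib.h>
>>>     #include <string.h>
>>>     #include <time.h>
>>>     #include <zookeeper/zookeeper.h>
>>>
>>>     static void watcher(zhandle_t *zh, int type, int state,
>>>                         const char *path, void *ctx)
>>>     {
>>>         /* a real implementation refreshes its endpoint cache here
>>>          * whenever the children of /foo_cluster change */
>>>         (void)zh; (void)type; (void)state; (void)path; (void)ctx;
>>>     }
>>>
>>>     int main(void)
>>>     {
>>>         /* session against the ensemble named in the cluster URL;
>>>          * the session timeout is what makes registrations
>>>          * "ephemeral". A real implementation would wait for the
>>>          * session to reach connected state before issuing calls. */
>>>         zhandle_t *zh = zookeeper_init(
>>>             "10.20.30.40:2181,10.20.30.41:2181", watcher, 30000,
>>>             NULL, NULL, 0);
>>>         if (zh == NULL)
>>>             return 1;
>>>
>>>         /* input side: register one listening endpoint as an
>>>          * ephemeral child of the (persistent) /foo_cluster znode,
>>>          * named IP:port */
>>>         int rc = zoo_create(zh, "/foo_cluster/10.20.30.50:10514",
>>>                             NULL, -1, &ZOO_OPEN_ACL_UNSAFE,
>>>                             ZOO_EPHEMERAL, NULL, 0);
>>>         if (rc != ZOK)
>>>             fprintf(stderr, "register failed: %s\n", zerror(rc));
>>>
>>>         /* action side: enumerate live endpoints (watch=1 so the
>>>          * watcher fires on membership changes), pick one at random */
>>>         struct String_vector children;
>>>         rc = zoo_get_children(zh, "/foo_cluster", 1, &children);
>>>         if (rc == ZOK && children.count > 0) {
>>>             srand((unsigned)time(NULL));
>>>             const char *ep = children.data[rand() % children.count];
>>>             const char *colon = strrchr(ep, ':');
>>>             if (colon != NULL)   /* split the host:port convention */
>>>                 printf("forward to host=%.*s port=%s\n",
>>>                        (int)(colon - ep), ep, colon + 1);
>>>             deallocate_String_vector(&children);
>>>         }
>>>         zookeeper_close(zh);
>>>         return 0;
>>>     }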
>>>
>>> This can also be used at a source or sink where the producer or
>>> consumer is aware of the service-discovery-technology-specific
>>> protocol and rsyslog is only at one end of the pipe (as long as
>>> the same protocol is followed, which will be easy considering the
>>> implementation is available as a library).
>>>
>>> The host and port parameters required by a to-be-supported
>>> input/output module will become optional. Either host+port or the
>>> cluster parameter must be specified; specifying both will generate
>>> a warning, and cluster will be discarded in favour of host+port
>>> (based on specificity).
>>>
>>> The support for this can be built incrementally for each
>>> input/output module as desired; I have omfwd and imptcp in mind at
>>> this time.
>>>
>>> Thoughts?
>>>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com

