Maybe the actual code will explain what I intend: https://github.com/rsyslog/rsyslog/pull/2733

On 05/18/2018 10:52 AM, Rainer Gerhards wrote:
Just quickly chiming in; I need to catch a plane early tomorrow morning.

It's complicated. At this point, the original message is no longer available: omelasticsearch works with batches, but the rule engine needs to process message by message (we had to change that some time ago). The messages are still in batches, but modifications happen to each message, so they need to go through individually. This needs more explanation, for which I currently have no time.

So we need to either create a new rsyslog core-to-plugin interface or do something omelasticsearch-specific.

I can elaborate more at the end of May.

Rainer

Sent from phone, thus brief.

David Lang <[email protected]> wrote on Thu, 17 May 2018, 18:25:

    On Thu, 17 May 2018, Rich Megginson wrote:

    >> then you can mark the ones accepted as done and just retry the ones that
    >> fail.
    >
    > That's what I'm proposing.
    >
    >> But there's still no need for a separate ruleset and queue. In Rsyslog, if
    >> an output cannot accept a message and there's reason to think that it will
    >> in the future, then you suspend that output and try again later. If you
    >> have reason to believe that the message is never going to be able to be
    >> delivered, then you need to fail the message or you will be stuck forever.
    >> This is what the error output was made for.
    >
    > So how would that work on a per-record basis?
    >
    > Would this be something different than using MsgConstruct -> set fields in
    > msg from original request -> ratelimitAddMsg for each record to resubmit?
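
    For illustration, the flow described above might look roughly like this
    inside an output plugin. This is only a sketch: msgConstruct and
    ratelimitAddMsg are the internal rsyslog calls named in this thread, but
    the exact signatures, the wrkrInstanceData_t layout, and the
    "ratelimiter" field are assumptions that would need checking against the
    actual source tree.

        /* hypothetical sketch, not verified against the rsyslog tree */
        static rsRetVal
        resubmitRecord(wrkrInstanceData_t *pWrkrData,
                       char *rawmsg, size_t len)
        {
            smsg_t *pMsg = NULL;
            DEFiRet;

            /* build a fresh message object for the failed record */
            CHKiRet(msgConstruct(&pMsg));
            /* copy the payload from the original bulk request item */
            MsgSetRawMsg(pMsg, rawmsg, len);
            /* hand it back to the engine through the ratelimiter,
             * so it is re-queued and retried like any other message */
            CHKiRet(ratelimitAddMsg(pWrkrData->pData->ratelimiter,
                                    NULL, pMsg));
        finalize_it:
            RETiRet;
        }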

    Rainer, in a batch, is there any way to mark some of the messages as delivered
    and others as failed, as opposed to failing the entire batch?

    >>
    >>> If using the "index" (default) bulk type, this causes duplicate records to
    >>> be added.
    >>> If using the "create" type (and you have assigned a unique _id), you will
    >>> get back many 409 Duplicate errors.
    >>> This causes problems - we know because this is how the fluentd plugin used
    >>> to work, which is why we had to change it.
    >>>
    >>> https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_monitoring_individual_nodes.html#_threadpool_section

    >>> "Bulk Rejections"
    >>> "It is much better to handle queuing in your application by
    gracefully
    >>> handling the back pressure from a full queue. When you receive
    bulk
    >>> rejections, you should take these steps:
    >>>
    >>>     Pause the import thread for 3–5 seconds.
    >>>     Extract the rejected actions from the bulk response, since
    it is
    >>> probable that many of the actions were successful. The bulk
    response will
    >>> tell you which succeeded and which were rejected.
    >>>     Send a new bulk request with just the rejected actions.
    >>>     Repeat from step 1 if rejections are encountered again.
    >>>
    >>> Using this procedure, your code naturally adapts to the load
    of your
    >>> cluster and naturally backs off.
    >>> "
    >>
    >> Does it really accept some and reject some in a random manner? Or is it a
    >> matter of accepting the first X and rejecting any after that point? The
    >> first is easier to deal with.
    >
    > It appears to be random.  So you may get a failure from the first record in
    > the batch and the last record in the batch, and success for the others.  Or
    > vice versa.  There appear to be many, many factors in the tuning, hardware,
    > network, etc. that come into play.
    >
    > There isn't an easy way to deal with this :P
    >
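
    The per-item results Rich describes are visible in the _bulk response
    itself: it carries an "items" array with one HTTP-style status per
    action, which is what the guide's "extract the rejected actions" step
    relies on. A small self-contained sketch of that extraction, assuming
    the json-c library (rsyslog itself links against the near-identical
    libfastjson) and a fabricated sample response:

        /* classify items in an Elasticsearch _bulk response
         * build with: cc bulk_sketch.c -ljson-c */
        #include <json-c/json.h>
        #include <stdio.h>

        int main(void)
        {
            /* fabricated sample: one success, one rejection, one dup */
            const char *resp =
                "{\"errors\":true,\"items\":["
                "{\"create\":{\"_id\":\"1\",\"status\":201}},"
                "{\"create\":{\"_id\":\"2\",\"status\":429}},"
                "{\"create\":{\"_id\":\"3\",\"status\":409}}]}";
            json_object *root = json_tokener_parse(resp);
            json_object *items;

            if(root == NULL
               || !json_object_object_get_ex(root, "items", &items))
                return 1;

            for(size_t i = 0;
                i < json_object_array_length(items); ++i) {
                json_object *item = json_object_array_get_idx(items, i);
                json_object *op, *st;
                /* key is the bulk action type: "create" or "index" */
                if(!json_object_object_get_ex(item, "create", &op)
                   && !json_object_object_get_ex(item, "index", &op))
                    continue;
                if(!json_object_object_get_ex(op, "status", &st))
                    continue;
                int status = json_object_get_int(st);
                if(status == 429)          /* queue full: resubmit */
                    printf("item %zu: rejected, retry\n", i);
                else if(status == 409)     /* duplicate: drop */
                    printf("item %zu: duplicate, skip\n", i);
                else if(status >= 400)     /* permanent failure */
                    printf("item %zu: hard error %d\n", i, status);
                else
                    printf("item %zu: indexed ok\n", i);
            }
            json_object_put(root);
            return 0;
        }

    The back-off loop from the guide then just wraps this step: pause a few
    seconds, send a new bulk request containing only the rejected actions,
    and repeat while rejections persist.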
    >>
    >>
    >> Batch mode was created to process messages that are inserted into
    >> databases more efficiently; we then found that the reduced queue
    >> congestion was a significant advantage in itself.
    >>
    >> But unless you have a queue just for the ES action,
    >
    > That's what we had to do for the fluentd case - we have a separate "ES
    > retry queue".  One of the tricky parts is that there may be multiple
    > outputs - you may want to send each log record to Elasticsearch _and_ a
    > message bus _and_ a remote rsyslog forwarder. But you only want to retry
    > sending to Elasticsearch to avoid duplication in the other outputs.

    In Rsyslog, queues are explicitly configured by the admin (for various
    reasons, including performance and reliability trade-offs). I really don't
    like the idea of omelasticsearch creating its own queue without these
    options. Kafka does this, and it's an ongoing source of problems.

