Re: Logstash and Filebeat guaranteed delivery

2023-12-03 Thread Piotr P. Karwasz
Hi Matt,

On Fri, 1 Dec 2023 at 18:25, Matt Sicker wrote:
>
> However, that does raise an interesting point: we could link to third party 
> plugins and such to help curate it.

That is part of the plan for MS9 [1]:

1. Each JAR containing Log4j plugins will ship documentation
resources (e.g. `META-INF/log4j/plugins.xml`; a hypothetical sketch
follows below),
2. A GitHub Actions job will collect these documentation files from
multiple JARs into a single Log4j Configuration Reference.
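
As a purely hypothetical sketch (the real schema is part of the MS9
design work and may end up looking quite different), such a descriptor
could be as small as:

    <!-- META-INF/log4j/plugins.xml: hypothetical shape, for illustration
         only; the actual format is still being designed under MS9 -->
    <plugins>
      <plugin name="MyAppender" namespace="Core"
              className="com.example.MyAppender"
              description="What the appender does, for the reference docs"/>
    </plugins>

The job in item 2 would then only have to merge these fragments across
JARs into one reference document.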

The milestone is due in February, so we will have plenty of time to
discuss the criteria that third-party plugins must fulfill to be
referenced in our documentation.

Piotr

[1] https://github.com/apache/logging-log4j2/issues/1954


Re: Logstash and Filebeat guaranteed delivery

2023-12-02 Thread Matt Sicker
>>>> For one, there is a battle-tested Log4j Redis Appender
>>>> <https://github.com/vy/log4j2-redis-appender>, which enabled us to
>>>> remove `log4j-redis` in `main`.
>>>> 
>>>> Indeed, Flume can deliver what Redis+Logstash do. Though my point is
>>>> not that Redis has a magical feature set, but that there *are* several
>>>> log sink stacks one can build using modern stock components that
>>>> provide guaranteed delivery. I would like to document some of those –
>>>> if not best practices, then known-to-work solutions. This way we can
>>>> enable our users to make a well-informed decision and pick the
>>>> approach that best fits into their existing stack.
>>>> 
>>>> [Ralph’s message of 30 Nov 2023 quoted in full; trimmed – see the
>>>> original below.]
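
For context, the appender mentioned above is configured through an
ordinary log4j2.xml entry. A minimal sketch, with attribute names recalled
from the project README (verify against the project docs before relying
on it):

    <Appenders>
      <!-- pushes serialized events to a Redis list under the "app" key;
           Logstash's redis input (or any other consumer) can drain it -->
      <RedisAppender name="REDIS" key="app" host="localhost" port="6379">
        <PatternLayout pattern="%m"/>
      </RedisAppender>
    </Appenders>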



Re: Logstash and Filebeat guaranteed delivery

2023-12-01 Thread Gary Gregory
[This reply contains only quoted text from earlier messages; trimmed.]


Re: Logstash and Filebeat guaranteed delivery

2023-12-01 Thread Matt Sicker
[This reply contains only quoted text from earlier messages; trimmed.]


Re: Logstash and Filebeat guaranteed delivery

2023-12-01 Thread Gary Gregory
[This reply contains only quoted text from earlier messages; trimmed.]


Re: Logstash and Filebeat guaranteed delivery

2023-12-01 Thread Volkan Yazıcı
[This reply contains only quoted text from earlier messages; trimmed.]


Re: Logstash and Filebeat guaranteed delivery

2023-11-30 Thread Ralph Goers
Volkan, 

Notice that neither of the links you have provided uses the term “guaranteed
delivery”. That is because that is not really what they are providing. In
addition, notice that Logstash says “Input plugins that do not use a
request-response protocol cannot be protected from data loss”, and “Data may be
lost if an abnormal shutdown occurs before the checkpoint file has been
committed”. Note that Flume’s FileChannel does not face the second issue, while
the first would only be a problem if a source that doesn’t support
acknowledgements is used. However, Log4j’s FlumeAppender always gets acks.
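
For reference, the FlumeAppender is configured with a plain log4j2.xml
element; a minimal sketch along the lines of the Log4j manual (hosts,
ports, and layout are placeholders):

    <Flume name="AuditLogger" type="Avro" compress="true">
      <!-- if the first agent does not accept the event, the appender
           fails over to the next one -->
      <Agent host="flume1.example.com" port="8800"/>
      <Agent host="flume2.example.com" port="8800"/>
      <RFC5424Layout enterpriseNumber="18060" includeMDC="true"
                     appName="MyApp"/>
    </Flume>

With type="Persistent" the appender additionally writes each event to its
own local store before acking the application.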

To make this clearer, let me review the architecture of my implementation again.

First, the phone system maintains a list of IP addresses that can handle Radius
accounting records. We host 2 Flume servers in the same data center as the
phone system and configure the phone system with their IP addresses. The Radius
records are sent to those Flume servers, which accept them with our custom
Radius Source. That converts them to JSON and passes the JSON to the File
Channel. Once the File Channel has written them to disk, the source responds
back to the phone system with an ACK that the record was received. If the
record is not processed quickly enough (I believe the limit is 100ms), the
phone system will try a different IP address, assuming the record couldn’t be
delivered by the first server. Another thread reads the records from the File
Channel and sends them to a Flume in a different data center for processing.
This follows the same pattern. The Avro Sink serializes the record and sends it
to the target Flume. That Flume writes the record to a File Channel and the
Avro Source responds with an ACK that the record was received, at which point
the originating Flume will remove it from its File Channel.
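
In standard Flume properties form, the edge agent described above looks
roughly like this (agent and component names, paths, and the custom source
class are hypothetical):

    edge.sources  = radius
    edge.channels = fc
    edge.sinks    = forward

    # custom Radius source; the class name here is made up
    edge.sources.radius.type = com.example.flume.RadiusSource
    edge.sources.radius.channels = fc

    # durable File Channel: an event survives restarts once it is on disk
    edge.channels.fc.type = file
    edge.channels.fc.checkpointDir = /var/flume/checkpoint
    edge.channels.fc.dataDirs = /var/flume/data

    # Avro sink forwards to the Flume in the other data center; the event
    # leaves the channel only after the remote end ACKs it
    edge.sinks.forward.type = avro
    edge.sinks.forward.channel = fc
    edge.sinks.forward.hostname = flume.dc2.example.com
    edge.sinks.forward.port = 4141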

If you will notice, the application itself knows that delivery is guaranteed
because it gets an ACK to say so. Due to this, Filebeat cannot possibly
implement guaranteed delivery. The application will expect that once it writes
to a file or to System.out delivery is guaranteed, which really cannot be true.

As for using Google Cloud, that would defeat the whole point. If your data
center has lost contact with the outside world it won’t be able to reach
Google Cloud.

While Redis would work, it would require a) an application component that
interacts with Redis, such as a Redis Appender (which we don’t have), b) a
Redis deployment, and c) a Logstash instance (or some other Redis consumer) to
forward the event. It is a lot simpler to configure Flume than to do all of
that.

Ralph


> [Volkan’s message of 30 Nov 2023 quoted in full; trimmed – see the original
> below.]



Logstash and Filebeat guaranteed delivery

2023-11-30 Thread Volkan Yazıcı
Ralph, could you elaborate on your response, please? AFAIK, Logstash and
Filebeat provide guaranteed delivery, if configured correctly. As a matter
of fact, they have docs (here
<https://www.elastic.co/guide/en/logstash/current/persistent-queues.html>
and here
<https://www.elastic.co/guide/en/beats/filebeat/current/how-filebeat-works.html#at-least-once-delivery>)
explaining how to do it – actually, there are several ways to do it. What
makes you think they don't provide guaranteed delivery?
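
For the Logstash side, the durability knobs from the first link fit in a
few lines of logstash.yml (values illustrative):

    # logstash.yml
    queue.type: persisted
    queue.max_bytes: 4gb
    # checkpoint after every written event instead of the default 1024,
    # narrowing the abnormal-shutdown window the Logstash docs warn about
    queue.checkpoint.writes: 1

On the Filebeat side there is no single switch: per the second link,
at-least-once delivery comes from its registry file combined with output
acknowledgements.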

I have implemented two different types of logging pipelines with guaranteed
delivery:

   1. Using a Google Cloud BigQuery appender
   2. Using a Redis appender (Redis queue is ingested to Elasticsearch
   through Logstash)

I want to learn where I can potentially violate the delivery guarantee.

On Thu, Nov 30, 2023 at 5:54 AM Ralph Goers wrote:

> Fluentbit, Fluentd, Logstash, and Filebeat are the main tools used for log
> forwarding. While they all have some amount of pluggability, none of them
> are as flexible as Flume. In addition, as I have mentioned before, none of
> them provide guaranteed delivery, so I would never recommend them for
> forwarding audit logs.
>