Re: [prometheus-users] Re: Advice on where/how to write a new niche-ish blackbox exporter probe?

Stewart Webb Wed, 25 Dec 2024 15:03:08 -0800

The main thing I'm hoping to track with this exporter at the moment is the 
availability of the endpoint we send log data to, i.e. we want it recorded 
in Prometheus history if it goes down. We have a cluster of Fluentbit 
instances sitting behind an AWS NLB, which might help explain the 
motivation a bit further.
I might reconsider the "just run a script" option, but that depends on 
whether the probing source can be set up to do that (my team doesn't run 
or maintain that particular service). Standard Prometheus HTTP scraping is 
readily available, which is the main reason it has been my focus so far.


As per my writeup repo:
> [...] the [Forward] protocol specifies a "chunk" option that can be used
> to get the server to respond to a batch of messages sent up to it. This
> forms the basis of the testing performed in this repo to see if the ack can
> be caught in a way to get a more reliable availability check probe.
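
To make the "chunk"/ack exchange concrete, here is a rough Go sketch of 
such a probe. It hand-encodes a single Forward "Message mode" event with a 
"chunk" option and waits for an {"ack": ...} map carrying the same ID. It 
is a simplification under several assumptions: the tag and record are 
invented, the ack value is expected to be a msgpack fixstr, and there is 
no TLS or shared-key handshake. It is not the code from the writeup repo.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// fixstr encodes a short string (at most 31 bytes) as a msgpack fixstr.
func fixstr(s string) []byte {
	if len(s) > 31 {
		panic("fixstr: string too long for this sketch")
	}
	return append([]byte{0xa0 | byte(len(s))}, s...)
}

// buildMessage encodes one Forward "Message mode" event,
// [tag, time, record, option], with option = {"chunk": chunkID}.
func buildMessage(tag, chunkID string, ts uint32) []byte {
	b := []byte{0x94} // fixarray with 4 elements
	b = append(b, fixstr(tag)...)
	b = append(b, 0xce, byte(ts>>24), byte(ts>>16), byte(ts>>8), byte(ts)) // uint32 time
	b = append(b, 0x81) // record: fixmap with 1 entry (hypothetical content)
	b = append(b, fixstr("probe")...)
	b = append(b, fixstr("ping")...)
	b = append(b, 0x81) // option: fixmap with 1 entry
	b = append(b, fixstr("chunk")...)
	return append(b, fixstr(chunkID)...)
}

// readAck reads a msgpack {"ack": "<id>"} map and returns the id.
// Assumption: the id comes back as a fixstr; other encodings are rejected.
func readAck(r io.Reader) (string, error) {
	head := make([]byte, 6) // fixmap(1), fixstr "ack", then the value's header byte
	if _, err := io.ReadFull(r, head); err != nil {
		return "", err
	}
	if string(head[:5]) != "\x81\xa3ack" || head[5]&0xe0 != 0xa0 {
		return "", errors.New("response is not a short-string ack map")
	}
	id := make([]byte, head[5]&0x1f)
	if _, err := io.ReadFull(r, id); err != nil {
		return "", err
	}
	return string(id), nil
}

// probe sends one event and succeeds only if the matching ack comes back.
func probe(addr, chunkID string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(timeout))
	msg := buildMessage("probe.test", chunkID, uint32(time.Now().Unix()))
	if _, err := conn.Write(msg); err != nil {
		return err
	}
	got, err := readAck(conn)
	if err != nil {
		return err
	}
	if got != chunkID {
		return fmt.Errorf("ack %q does not match chunk %q", got, chunkID)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: fwdprobe <host:port>")
		os.Exit(2)
	}
	raw := make([]byte, 16) // 128-bit ID, base64-encoded to 24 characters
	if _, err := rand.Read(raw); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	id := base64.StdEncoding.EncodeToString(raw)
	if err := probe(os.Args[1], id, 5*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, "probe failed:", err)
		os.Exit(1)
	}
	fmt.Println("ack received")
}
```
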
The main issue is that to catch this "ack", the prober needs to understand 
the binary msgpack response, which isn't newline-delimited, whereas the 
blackbox_exporter tcp probe is essentially hard-coded to work with 
newline-delimited protocols/responses:
> The [blackbox_exporter] tcp probe's response checking uses a Go
> bufio.Scanner <https://pkg.go.dev/bufio#Scanner> (see
> https://github.com/prometheus/blackbox_exporter/blob/v0.25.0/prober/tcp.go#L135)
> seemingly with defaults, which means it will only ever be able to work with
> newline-separated chunks of bytes.

Fluentbit does come with an HTTP endpoint that can provide a status check 
(https://docs.fluentbit.io/manual/administration/monitoring), but this 
runs on a different port, which we'd have to route through the NLB 
separately. That starts to defeat the point of validating that the ingress 
traffic flow is working as expected, because at that point it's 
essentially testing something else.

In terms of Fluentbit sending the message elsewhere, the details of that 
are a config and policy matter (which, yes, would be valuable to test and 
track too, but would be a lot more involved, as it would probably mean 
checking one or more destination services as well).

Cheers all for the replies.

Stewart

On Monday, 16 December 2024 at 19:52:16 UTC+11 Matthias Rampke wrote:

> Given the main motion of fluentbit is to accept a message and send it 
> *elsewhere*, what failure modes can and cannot be covered by the 
> request-and-response style of the blackbox exporter? In other words, how 
> far would this protocol support get you – or how much more could you do 
> with an exporter that is specific to the system, and can e.g. receive the 
> message back from a fluentbit output?
>
> As you point out, there's a limitation to how generic an exporter can be 
> and still be sensibly configurable, or at what point the configuration 
> becomes just as complex as writing the code. Putting aside whether this 
> protocol is widely used enough to justify adding support to the exporter, I 
> wonder how much value you would get from that before you run into 
> limitations from the fundamental model of the generic exporter.
>
> /MR
>
> On Sat, Dec 14, 2024 at 7:23 PM Chris Siebenmann <
> cks.prom...@cs.toronto.edu> wrote:
>
>> > If you want to minimize your work, you can write a test as a one-shot 
>> > standalone program in any language of your choice, and either:
>> > 1. Run it from cron, write the results to a file, and pick them up by 
>> > node_exporter textfile collector; OR
>> > 2. Run it on demand from exporter_exporter 
>> > <https://github.com/QubitProducts/exporter_exporter> using the "exec" 
>> > method; OR
>> > 3. Run it as a nagios plugin under nrped, and query it from 
>> > nrpe_exporter 
>> > <https://www.robustperception.io/nagios-nrpe-prometheus-exporter/>
>>
>> Another 'run a program and provide its output as metrics' option is the
>> third party script exporter,
>> https://github.com/ricoberger/script_exporter
>>
>> The basic usage of the script exporter is very similar to the blackbox
>> exporter, but of course you have to start a program every time. We've
>> been happily using it for years for a variety of checks that require
>> more sophistication (and fine grained metrics) than the Blackbox
>> exporter can handle.
>>
>> (Another 'run it from cron' option is to have it push metrics into a
>> Pushgateway instance, but my view is that generally you want to use the
>> node_exporter textfile collector for that if it's possible. Pushgateway
>> usually has various drawbacks compared to the node_exporter approach.)
>>
>>         - cks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion visit 
>> https://groups.google.com/d/msgid/prometheus-users/3646228.1734204215%40apps0.cs.toronto.edu
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/prometheus-users/064a1ef5-5b8e-45ee-a5ef-22a1456bb149n%40googlegroups.com.
