What is this "DNSLookupDuration3s" you talk about? Is it an alerting
rule? Can you show the expr?
To me, it sounds like the opposite problem. My guess is that
blackbox_exporter is first making a UDP DNS query, and either the query or
the response is being blocked. So after 3 seconds it retries with TCP, and
that succeeds.
You can check this theory using tcpdump (especially if you can do tcpdump
on the caching resolver as well). Do you see an outbound UDP DNS query,
but no response? The resolution is to fix the underlying UDP communication
problem.
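If running tcpdump on both ends is awkward, a rough way to test the same theory from the blackbox_exporter host is to send the same A query over UDP and over TCP separately and time each one. The following is only a sketch using the Python standard library; the resolver address and hostname in the usage comment are placeholders, and the function names are my own. It also reports the TC (truncated) bit, since a truncated UDP reply would cause an immediate switch to TCP rather than a 3 second wait.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `name` (no EDNS)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # RD set, 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def is_truncated(reply: bytes) -> bool:
    """True if the TC bit is set in the DNS header flags."""
    return bool(struct.unpack("!H", reply[2:4])[0] & 0x0200)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def query_udp(resolver: str, name: str, timeout: float = 3.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (resolver, 53))
        data, _ = s.recvfrom(4096)
        return data


def query_tcp(resolver: str, name: str, timeout: float = 3.0) -> bytes:
    query = build_query(name)
    with socket.create_connection((resolver, 53), timeout=timeout) as s:
        # DNS over TCP prefixes each message with a 2-byte length
        s.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(s, 2))
        return recv_exact(s, length)


def compare(resolver: str, name: str) -> None:
    """Print how long each transport takes and whether the reply was truncated."""
    for label, fn in (("UDP", query_udp), ("TCP", query_tcp)):
        start = time.monotonic()
        try:
            reply = fn(resolver, name)
            outcome = f"{len(reply)} byte reply, truncated={is_truncated(reply)}"
        except OSError as exc:  # includes socket timeouts
            outcome = f"failed: {exc!r}"
        print(f"{label}: {outcome} after {time.monotonic() - start:.2f}s")


# Usage (placeholders, substitute your resolver and one of the slow hostnames):
# compare("192.0.2.53", "example.com")
```

If the UDP line fails after the full timeout while the TCP line answers quickly, that points at lost UDP packets rather than at blackbox_exporter.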
Are there any virtual machines involved in this? That's the one case where
I have seen this exact problem before with UDP traffic but not TCP. The
packet is sent without a correct UDP checksum, because checksum offloading
is enabled and the client expects the NIC to insert a correct one; but the
receiver doesn't know this, and just sees a packet with a bad checksum and
discards it.
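To make "bad checksum" concrete, here is a small sketch of the check a receiver performs on an IPv4 UDP packet: a one's-complement sum over a pseudo-header plus the UDP segment. The addresses, ports and helper names are made up for illustration. Note that when offloading is enabled, a capture taken on the sending VM will normally show a bad checksum anyway, because the packet is captured before the NIC fills it in, so the capture that matters is the one on the receiving side.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header covered by the UDP checksum (protocol 17 = UDP)."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, udp_length)
    )


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is currently zero."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return (~total & 0xFFFF) or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def set_checksum(segment: bytes, value: int) -> bytes:
    """Return the segment with bytes 6-7 (the checksum field) replaced."""
    return segment[:6] + struct.pack("!H", value) + segment[8:]


def checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver-side check: would this segment be accepted?"""
    (field,) = struct.unpack("!H", segment[6:8])
    if field == 0:
        return True  # over IPv4, zero means the sender computed no checksum
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF


# Illustration with placeholder addresses and a dummy payload:
# payload = b"dns query bytes"
# seg = struct.pack("!HHHH", 40000, 53, 8 + len(payload), 0) + payload
# good = set_checksum(seg, udp_checksum("192.0.2.10", "192.0.2.53", seg))
# checksum_ok("192.0.2.10", "192.0.2.53", good)
```

A segment that leaves the VM with whatever the kernel left in the checksum field, and that nothing downstream corrects, fails this check at the receiver and is dropped silently, which matches a UDP query with no response.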
The solution, or at least a workaround, is to disable UDP transmit checksum
offloading on the VM's network interface (probably just on the VM running
blackbox_exporter).
Try:
ethtool --offload eth0 tx off
and if that doesn't work, also try:
ethtool --offload eth0 gso off gro off tso off
On Thursday, 25 August 2022 at 08:34:37 UTC+1 [email protected] wrote:
> The blackbox_exporter uses the built-in Go resolver library[0]. The only
> options here are which address family you want in return.
>
> [0]: https://pkg.go.dev/net#Resolver.LookupIP
>
> On Thu, Aug 25, 2022 at 7:35 AM terrible person <[email protected]>
> wrote:
>
>> Thank you, actually I found out about this behaviour just after I posted
>> here.
>> Strangely, I don't see TCP connections with either nslookup or dig,
>> though the response is about 860 bytes, but outgoing UDP traffic is present.
>> When I probe with blackbox there is also TCP.
>>
>> How does blackbox perform such probes? In parallel or successively? Is there
>> a way to suppress this behaviour, analogous to the +notcp option of dig?
>>
>> On Thursday, August 25, 2022 at 2:03:27 PM UTC+10 [email protected] wrote:
>>
>>> DNS lookups will switch to TCP if the response is larger than can fit in
>>> a single packet. But that should happen immediately.
>>>
>>>
>>>
>>> On Thu, Aug 25, 2022 at 5:56 AM terrible person <[email protected]>
>>> wrote:
>>>
>>>> Hi. I'm currently debugging DNS lookup warnings (more than 3 sec) and
>>>> need to figure out whether our network, our DNS, or the exporter is
>>>> misbehaving. I'm checking ssh endpoints with the tcp module:
>>>>
>>>> [image: 2022-08-25_13-22-13.png]
>>>>
>>>> but I see a delay of 3+ seconds when resolving ssh hostnames, which
>>>> triggers DNSLookupDuration3s alerts.
>>>>
>>>> [image: 2022-08-25_13-26-19.png]
>>>> The problem looks much the same on different hosts: a delay of just over
>>>> 3.0 seconds, which looks very much like a generic TCP timeout.
>>>>
>>>> I checked on the DNS server and yes, after the UDP queries there is a TCP
>>>> DNS query for the A record. I don't see any UDP checksum corruption or
>>>> delays for such failover. Is this intended? Can someone help me out with this?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Prometheus Users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/prometheus-users/16d8f137-a7e3-4361-a624-6719d71b1d29n%40googlegroups.com
>>>>
>>>> <https://groups.google.com/d/msgid/prometheus-users/16d8f137-a7e3-4361-a624-6719d71b1d29n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>