1) [image: 2022-08-25_20-10-14.png]

2) I was checking with tcpdump. I'm not sure I agree with your theory, because the client (blackbox) sends a SYN immediately after receiving the "large" UDP packet. As I said, I don't see this behavior with dig, nor do I see the truncated (TC) flag. The UDP response from the server is 860 bytes. My hypothesis is that the DNS server is getting clogged by the number of TCP requests (more than 100 hosts) and resets some of them. Then there is a 3 s TCP timeout, followed by a successful retry on a new connection. Tomorrow I will check for RST flags from port 53 on the DNS server host.
3) Yep, this is something I learned today. I was reading this article <https://sandilands.info/sgordon/segmentation-offloading-with-wireshark-and-ethtool>, but I'm not convinced; are you sure about it? As I understand it, you see incorrect checksums in tcpdump because of this [image: 2022-08-25_20-27-13.png], but it has no effect on the actual traffic. I observed that tcpdump shows incorrect checksums for outgoing UDP traffic, but the receiver shows that the checksums are fine. Maybe I can attach some dumps later.

So for now I see a few ways to overcome this:
1) Somehow reduce the size of the DNS response (the AUTHORITY SECTION and ADDITIONAL SECTION), though I don't know if I can do that (I'm using FreeIPA).
2) Make changes on the client side: either custom changes to blackbox itself, or architectural changes that spread the probing load on the DNS server. I don't know; that's hard stuff.

On Thursday, August 25, 2022 at 6:40:23 PM UTC+10 Brian Candler wrote:

> What is this "DNSLookupDuration3s" you talk about? Is it an alerting
> rule? Can you show the expr?
>
> To me, it sounds like the opposite problem. My guess is that
> blackbox_exporter is first making a UDP DNS query, and either the query or
> the response is being blocked. So after 3 seconds it retries with TCP, and
> that succeeds.
>
> You can check this theory using tcpdump (especially if you can do tcpdump
> on the caching resolver as well). Do you see an outbound UDP DNS query,
> but no response? The resolution is to fix the underlying UDP communication
> problem.
>
> Are there any virtual machines involved in this? That's the one case
> where I have seen this exact problem before with UDP traffic but not TCP.
> The packet is sent without a correct UDP checksum, because checksum
> offloading is enabled and the client expects the NIC to insert a correct
> one; but the receiver doesn't know this, and just sees a packet with a bad
> checksum and discards it.
>
> The solution, or at least a workaround, is to disable UDP transmit checksum
> offloading on the VM's network interface (probably just the one running
> blackbox_exporter).
>
> Try:
> ethtool --offload eth0 tx off
>
> and if that doesn't work, also try:
> ethtool --offload eth0 gso off gro off tso off
>
> On Thursday, 25 August 2022 at 08:34:37 UTC+1 [email protected] wrote:
>
>> The blackbox_exporter uses the built-in Go resolver library[0]. The only
>> options here are which address family you want in return.
>>
>> [0]: https://pkg.go.dev/net#Resolver.LookupIP
>>
>> On Thu, Aug 25, 2022 at 7:35 AM terrible person <[email protected]>
>> wrote:
>>
>>> Thank you. Actually, I found out about this behaviour just after I posted
>>> here.
>>> Strangely, with either nslookup or dig I don't see TCP connections, only
>>> the outgoing UDP traffic, even though the response is about 860 bytes.
>>> When I probe with blackbox, there is also TCP.
>>>
>>> How does blackbox perform such probes: in parallel or sequentially? Is
>>> there a way to disable this behaviour, analogous to dig's +notcp option?
>>>
>>> On Thursday, August 25, 2022 at 2:03:27 PM UTC+10 [email protected]
>>> wrote:
>>>
>>>> DNS lookups will switch to TCP if the response is larger than can fit
>>>> in a single packet. But that should happen immediately.
>>>>
>>>> On Thu, Aug 25, 2022 at 5:56 AM terrible person <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi. I'm currently debugging DNS lookup warnings (more than 3 s) and
>>>>> need to figure out whether our network, our DNS, or the exporter is
>>>>> misbehaving. So I'm checking SSH endpoints with the tcp module:
>>>>>
>>>>> [image: 2022-08-25_13-22-13.png]
>>>>>
>>>>> but I experience a 3+ second delay when resolving the SSH hostnames,
>>>>> which triggers DNSLookupDuration3s alerts.
>>>>>
>>>>> [image: 2022-08-25_13-26-19.png]
>>>>> The problem looks something like this on different hosts: 3.0+ seconds
>>>>> of timeout, which looks very much like a generic TCP timeout.
>>>>>
>>>>> I checked on the DNS server and, yes, after the UDP queries there is a
>>>>> TCP DNS query for the A record. I don't see any UDP checksum corruption,
>>>>> or any delays for such a failover. Is this intended? Can someone help me
>>>>> out with this?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Prometheus Users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/prometheus-users/16d8f137-a7e3-4361-a624-6719d71b1d29n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/prometheus-users/16d8f137-a7e3-4361-a624-6719d71b1d29n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
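
Whether a resolver retries a query over TCP is signalled by the TC (truncated) bit in the DNS header flags (RFC 1035, section 4.1.1). A server sets this bit when the answer does not fit in the client's UDP limit: 512 bytes without EDNS(0), or the size the client advertises with EDNS(0). One way to test whether the fallback seen here is size-driven is to check that bit in the UDP response captured just before each TCP SYN.

Below is a minimal Go sketch under that assumption. `isTruncated` is a hypothetical helper, not part of blackbox_exporter or the Go standard library. It reads only the 12-byte header of a raw DNS message, for example a UDP payload exported from a capture.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// dnsHeaderLen is the fixed size of a DNS message header (RFC 1035, 4.1.1).
const dnsHeaderLen = 12

// tcMask selects the TC (truncated) bit in the 16-bit flags field.
// Flags layout, high bit first: QR(1) Opcode(4) AA(1) TC(1) RD(1) RA(1) Z(3) RCODE(4).
const tcMask = 1 << 9

// isTruncated reports whether the TC bit is set in a raw DNS message.
// It inspects only the header and does not validate the rest of the packet.
func isTruncated(msg []byte) (bool, error) {
	if len(msg) < dnsHeaderLen {
		return false, errors.New("message shorter than a DNS header")
	}
	// Bytes 0-1 hold the query ID; bytes 2-3 hold the flags in network byte order.
	flags := binary.BigEndian.Uint16(msg[2:4])
	return flags&tcMask != 0, nil
}
```

For example, the flag bytes 0x83 0x80 (QR, TC, RD and RA set) make `isTruncated` return true. The common response flags 0x81 0x80 (QR, RD, RA) make it return false.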


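Brian's explanation hinges on the UDP checksum. That field is the 16-bit one's-complement Internet checksum (RFC 768, computed as described in RFC 1071) over a pseudo-header plus the UDP header and payload. With transmit checksum offloading, the kernel leaves the final computation to the NIC. A capture on the sending host can therefore show a "bad" value even though the packet on the wire is correct, which fits the observation that the receiver sees valid checksums.

A minimal Go sketch of the core routine follows. `internetChecksum` and `checksumValid` are hypothetical names, and building the UDP pseudo-header is deliberately left out.

```go
package main

// internetChecksum computes the RFC 1071 Internet checksum: the one's
// complement of the one's-complement sum of 16-bit big-endian words.
// An odd trailing byte is treated as if padded with a zero byte.
func internetChecksum(data []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(data); i += 2 {
		sum += uint32(data[i])<<8 | uint32(data[i+1])
	}
	if len(data)%2 == 1 {
		sum += uint32(data[len(data)-1]) << 8
	}
	// Fold any carries out of the low 16 bits back in (end-around carry).
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16
	}
	return ^uint16(sum)
}

// checksumValid reports whether data, which already contains its checksum
// field at an even offset, verifies: recomputing over it must yield zero.
func checksumValid(data []byte) bool {
	return internetChecksum(data) == 0
}
```

For the example bytes in RFC 1071 (00 01 f2 03 f4 f5 f6 f7), `internetChecksum` returns 0x220d. For UDP over IPv4 specifically, a transmitted value of 0 means "no checksum", so a computed 0 is sent as 0xFFFF.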