Re: [dnsdist] DNSDIST 1.3.3-3 from standard debian buster

2019-08-14 Thread Remi Gacogne
Hi Chris,

On 8/14/19 2:58 AM, Chris wrote:
> For this issue I have not been able to make any progress yet. I have
> asked my colleagues for help, as I am a network admin by trade. Something
> they found that may be related is this kernel bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=202997
> 
> One of my colleagues also deployed a base install of Debian and did some
> testing with iperf; apparently he could reproduce the issue without
> involving PowerDNS at all. From the above link he has adjusted a few
> things but as of yet the problem hasn't been resolved.

That's very interesting, thank you! I'm a bit surprised to see the bug
entry marked as RESOLVED INVALID since clearly the sockets did not lock
up that way in earlier kernels under the same conditions.

Did you try reducing the sysctl net.core.{r,w}mem_{default,max} values
on your system to see if the issue remains?
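
For reference, a minimal Python sketch for checking the current values
before changing them. It assumes a Linux host where these sysctls are
exposed as files under /proc/sys/net/core; the function name and the
`base` parameter are illustrative and not part of dnsdist.

```python
from pathlib import Path

# The four sysctls covered by net.core.{r,w}mem_{default,max}.
SOCKET_BUFFER_SYSCTLS = ("rmem_default", "rmem_max", "wmem_default", "wmem_max")


def read_socket_buffer_sysctls(base="/proc/sys/net/core"):
    """Return {sysctl name: value in bytes} for each file found under base.

    `base` is a parameter only so the function can be pointed at a test
    directory; on a real Linux system the default path applies.
    Files that do not exist are skipped rather than reported as errors.
    """
    values = {}
    for name in SOCKET_BUFFER_SYSCTLS:
        path = Path(base) / name
        if path.is_file():
            values["net.core." + name] = int(path.read_text().strip())
    return values
```

Calling read_socket_buffer_sysctls() before and after each change gives
a record of which values were in effect when the lock-up did or did not
occur.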

I have been directed to this patch as another possible lead:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=891584f48a9084ba462f10da4c6bb28b6181b543

Do you know if your system receives a lot of fragmented UDP datagrams?
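
One rough way to check is to look at the kernel's IP reassembly
counters. The sketch below parses the "Ip:" lines of /proc/net/snmp on
Linux. Note that these counters cover all IP traffic rather than UDP
alone, so they are only an indicator. The function names are
illustrative.

```python
def parse_ip_counters(snmp_text):
    """Parse the two "Ip:" lines of /proc/net/snmp into a {name: int} dict.

    The file holds a pair of lines per protocol: a header line with the
    counter names, followed by a line with the matching values.
    """
    ip_lines = [line.split()[1:] for line in snmp_text.splitlines()
                if line.startswith("Ip:")]
    if len(ip_lines) != 2:
        raise ValueError("expected one header line and one value line for Ip:")
    names, values = ip_lines
    return dict(zip(names, (int(v) for v in values)))


def read_reassembly_counters(path="/proc/net/snmp"):
    """Return the reassembly-related counters from the given snmp file."""
    with open(path) as f:
        counters = parse_ip_counters(f.read())
    # ReasmReqds: fragments received that needed reassembly.
    # ReasmOKs / ReasmFails: datagrams reassembled / reassembly failures.
    wanted = ("ReasmReqds", "ReasmOKs", "ReasmFails")
    return {name: counters.get(name) for name in wanted}
```

Comparing the output of read_reassembly_counters() taken a few minutes
apart shows whether fragments are arriving at a significant rate.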

> I am wondering if there is a plan for official dnsdist 1.4 packages on
> Debian Stretch? I am going to need to use that for now as Buster isn't
> suitable for production in my environment yet.

We already provide 1.4.0 packages for Stretch at
https://repo.powerdns.com unless I'm missing something?

Best regards,
-- 
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/



signature.asc
Description: OpenPGP digital signature
___
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist


Re: [dnsdist] DNSDIST 1.3.3-3 from standard debian buster

2019-08-14 Thread Remi Gacogne
Hi Frank,

On 8/13/19 4:59 PM, Lichtnau Frank wrote:
> I think it has to do with 'high latencies'.
> 
> I have: 
> - 1 pool (winmls) for Windows AD DNS queries
> - 1 pool (mls) for the rest of our internal domain
> - and a DNS forwarder (with 3 listeners) for external DNS queries.
> 
> The pools work fine with latencies of 0.3 - 0.8. The single DNS forwarder has
> latencies of 40 - 56, and that is where I see the drops.
> 
> For testing I rerouted the external DNS queries through the pool (mls),
> and then I saw the drops in that pool.
> 
> I tried to install dnsdist 1.3.3 on Debian 9, but it did not work because
> some package dependencies could not be satisfied.
> And the dnsdist package in Debian 9 is too old.
> 
> Your tool is important to me because it helps me cope with the odd behaviour
> of our Windows machines.
> If the DNS server is gone, Windows doesn't switch to the second DNS server in
> its configured list of DNS servers.
> 
> BTW, as a workaround I will now build a tool that checks dnsdist periodically:
> if the ratio of drops to queries gets too bad or keeps growing, I restart the
> daemon.

So now I'm confused: are you experiencing the same issue reported by
Chris, where dnsdist stops responding to every single query sent over UDP
after a while, or are you just seeing some queries being dropped?

Queries get dropped for one of two reasons: either the backend did not
respond fast enough, for example because it is overloaded, or dnsdist is
struggling to keep up with the responses and cannot process them fast
enough.
It is quite easy to tell the two cases apart because in the second one
dnsdist will be CPU bound, which is easy to spot in top, metronome or
grafana. If the backend is not responding fast enough, then you need to
investigate the backend or possibly the network.

> I checked your API at api/v1/servers/localhost and see that the value of
> the "Drops" column is given in the field "reused".
>
> Why is it named "reused", and what does "reused" mean in this context?

I agree it's not clear, but that's the same thing. Historically dnsdist
did not actively detect timeouts, i.e. a backend not responding, but
would notice later, when picking up a state (a slot in the table dnsdist
uses to keep track of queries sent to backends) to forward a new query,
that the state was still marked as "in use", meaning that the response
never came through. It would then "reuse" that state, and so the
corresponding metric is named "reused" even though nowadays we usually
notice the timeout by regularly scanning the table.
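
For the monitoring workaround mentioned above, here is a rough Python
sketch of computing the ratio of drops to queries per backend from the
API output. It assumes that the JSON returned by
api/v1/servers/localhost contains a "servers" list whose entries carry
"name", "queries" and a drop counter. The key of that counter is passed
as a parameter; its default of "reuseds" is an assumption to verify
against your own output. The URL, the X-API-Key header handling and the
threshold are placeholders as well.

```python
import json
import urllib.request


def drop_ratios(stats, drops_field="reuseds"):
    """Return {backend name: drops / queries} from a parsed API response.

    Assumes stats["servers"] is a list of dicts with "name", "queries"
    and `drops_field` keys; backends with no queries get a ratio of 0.0.
    """
    ratios = {}
    for server in stats.get("servers", []):
        queries = server.get("queries", 0)
        drops = server.get(drops_field, 0)
        ratios[server.get("name", "?")] = drops / queries if queries else 0.0
    return ratios


def backends_over_threshold(stats, threshold=0.05, drops_field="reuseds"):
    """List the backends whose drop ratio exceeds `threshold` (hypothetical value)."""
    ratios = drop_ratios(stats, drops_field)
    return sorted(name for name, ratio in ratios.items() if ratio > threshold)


def fetch_stats(url, api_key):
    """Fetch and decode the statistics JSON; url and api_key are placeholders."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

The counters are cumulative, so a real check would more usefully
compare the difference between two polls than the totals since startup.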

Best regards,
-- 
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/





Re: [dnsdist] DNSDIST 1.3.3-3 from standard debian buster

2019-08-12 Thread Remi Gacogne
Hi Frank,

On 8/12/19 4:27 PM, Lichtnau Frank wrote:
> I can confirm that we have the same problems under Debian Buster as the
> ones Chris described in “dnsdist 1.4 and Debian buster”.
> https://mailman.powerdns.com/pipermail/dnsdist/2019-August/000601.html
> 
> The only difference is that we installed the standard Debian package 1.3.3-3.
> 
> It works fine for hours, and then all queries for non-local domain names
> are dropped. We see no ACL, dynamic block, rule or block filter drops.
> 
> The DNS queries towards the internal DNS server work fine.
> 
> I tried increasing the number of listeners for our external DNS server, and
> then also the number of sockets, but it did not help.
> 
> I activated remote logging via ProtobufLogger, but couldn't find anything
> interesting.

Thanks a lot for the feedback. I'm not surprised to read that you didn't
see anything interesting in remote logging, since it's looking more and
more like a Buster issue than a dnsdist one, especially if the issue
also manifests itself with the Auth.

Best regards,
-- 
Remi Gacogne
PowerDNS.COM BV - https://www.powerdns.com/


