Re: BIND and UDP tuning

2018-09-27 Thread Lee
On 9/27/18, Alex wrote:
> Hi,
>
>> Just a wild thought:
>> It works with a lower speed line (at least I read it that way) but has
>> problems with higher speeds.
>> Could it be that the line is so fast that it "overtakes" the host in
>> question?
>>
>> A faster incoming line will give less time between the packets for
>> processing.
>
> No, I actually upgraded from a 65/20mbit to a 165/35mbit recently,
> thinking it was too slow because it was happening at the slower speeds
> as well. I've also implemented some basic QoS to throttle outgoing
> smtp and prioritize DNS but it made no difference.

Has your provider enabled QoS? I'd bet that their dropping packets that
exceed the QoS rate limits would be considered "working as expected".

Which brings up the question of exactly what does SERVFAIL mean?  Can
no response to a query result in SERVFAIL?  Is there a way to tell the
difference between no response & getting a response indicating a
failure?
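
From the client side the two cases look different on the wire: a SERVFAIL is an actual reply whose header carries RCODE 2, while "no response" shows up as a timeout on the socket. Below is a minimal sketch of that check using only the Python standard library. The function names, the fixed query ID and the 3-second timeout are assumptions made for illustration, not anything from BIND or dig.

```python
import socket
import struct

# A few well-known RCODE values from the DNS header.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def rcode_of(response):
    """Return the RCODE (low 4 bits of the flags word) of a DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def classify(name, server="127.0.0.1", port=53, timeout=3.0):
    """Return 'TIMEOUT' if no reply arrived, otherwise the RCODE name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        try:
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
    code = rcode_of(response)
    return RCODE_NAMES.get(code, f"RCODE {code}")
```

Calling classify("example.com") against the local resolver returns "SERVFAIL" only when a reply packet actually came back. Note that a recursive resolver may itself answer SERVFAIL when its upstream queries time out, so this only separates the two cases on the hop between the client and the resolver.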

Lee
___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: BIND and UDP tuning

2018-09-27 Thread Noel Butler
Hi Alex, 

Have you tried a separate physical server, to rule out the actual
hardware as the problem?

Is this a user-grade PC with either an onboard or external Ethernet
interface, or proper server-grade equipment? How old is the equipment?
What else does that machine do?

Cheers 

On 28/09/2018 02:07, Alex wrote:

> Hi,
> 
>> Just a wild thought:
>> It works with a lower speed line (at least I read it that way) but has 
>> problems with higher speeds.
>> Could it be that the line is so fast that it "overtakes" the host in 
>> question?
>> 
>> A faster incoming line will give less time between the packets for 
>> processing.
> 
> No, I actually upgraded from a 65/20mbit to a 165/35mbit recently,
> thinking it was too slow because it was happening at the slower speeds
> as well. I've also implemented some basic QoS to throttle outgoing
> smtp and prioritize DNS but it made no difference.
> 
> Thanks,
> Alex

-- 
Kind Regards, 

Noel Butler 



Re: BIND and UDP tuning

2018-09-27 Thread Alex
Hi,

> Just a wild thought:
> It works with a lower speed line (at least I read it that way) but has 
> problems with higher speeds.
> Could it be that the line is so fast that it "overtakes" the host in question?
>
> A faster incoming line will give less time between the packets for processing.

No, I actually upgraded from a 65/20mbit to a 165/35mbit recently,
thinking it was too slow because it was happening at the slower speeds
as well. I've also implemented some basic QoS to throttle outgoing
smtp and prioritize DNS but it made no difference.

Thanks,
Alex


Re: BIND and UDP tuning

2018-09-27 Thread Ben Croswell
When we ran into UDP tuning issues on high traffic devices it presented as
silent discards rather than SERVFAIL.

On Thu, Sep 27, 2018, 12:04 PM Alex wrote:

> Hi,
>
> > On Thu, Sep 27, 2018 at 10:53:25AM -0400, Alex wrote:
> > > Many of these values I've already tweaked and have had no effect on my
> > > SERVFAIL issues :-(
> >
> > If you are getting SERVFAILs from a BIND resolver you administer, then
> > it has responded to your query. If you turn up the log level to
> > something like -d 99, it'll print the steps that led to that SERVFAIL.
> > Usually you'll find something there that directs you to next steps.
> >
> > On this topic, my home resolver is also a stock packaged BIND version as
> > you, and I too see spurious SERVFAILs sometimes. I used to think this
> > was due to too much indirection, e.g., when named starts up and you run:
> >
> > dig -x 176.9.81.50
>
> It doesn't typically happen when running from the command-line. It
> does occasionally happen, though. I usually run something like "dig
> +all +trace +nodnssec ". It sometimes times out in the
> middle, with something like "cannot resolve xyz host", which may even
> be one of the root servers.
>
> I also typically run it with "rndc trace 11" which shows me quite a
> bit of debugging info - too much to look through manually. With trace
> 99, I can imagine it being overwhelming amount of info. Do you have
> any ideas of what to look for? "query-errors"?
>
> Also, I also see other SERVFAIL errors that really are SERVFAIL errors
> - when querying the host manually, it still responds immediately with
> SERVFAIL.
>
> Thanks,
> Alex
>
>
>
> >
> > on a cold cache. However it seems to be returning SERVFAIL sometimes for
> > what should be a cached answer. I'll also turn up the debug logging and
> > watch it.
> >
> > Mukund


Re: BIND and UDP tuning

2018-09-27 Thread Alex
Hi,

> > This is also only happening on the two identical systems connected
> > to the 165/35mbit cable modem.
> > ...
> > I really hope there is someone with some additional ideas.
>
> Is it the modem?

No, it's been replaced at least once, and I've been assured by both
the cable tech that was here and the dimwits on the other end that
it's operating normally. I really wish it were that easy.

Thanks,
Alex



>
> --
>
> 73,
> Ged.


Re: BIND and UDP tuning

2018-09-27 Thread Alex
Hi,

> On Thu, Sep 27, 2018 at 10:53:25AM -0400, Alex wrote:
> > Many of these values I've already tweaked and have had no effect on my
> > SERVFAIL issues :-(
>
> If you are getting SERVFAILs from a BIND resolver you administer, then
> it has responded to your query. If you turn up the log level to
> something like -d 99, it'll print the steps that led to that SERVFAIL.
> Usually you'll find something there that directs you to next steps.
>
> On this topic, my home resolver is also a stock packaged BIND version as
> you, and I too see spurious SERVFAILs sometimes. I used to think this
> was due to too much indirection, e.g., when named starts up and you run:
>
> dig -x 176.9.81.50

It doesn't typically happen when running from the command-line. It
does occasionally happen, though. I usually run something like "dig
+all +trace +nodnssec ". It sometimes times out in the
middle, with something like "cannot resolve xyz host", which may even
be one of the root servers.

I also typically run it with "rndc trace 11", which shows me quite a
bit of debugging info, too much to look through manually. With trace
99, I can imagine it being an overwhelming amount of info. Do you have
any ideas of what to look for? "query-errors"?
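
One way to make the query-errors category manageable is to summarize it rather than read it. The sketch below assumes each log entry sits on a single line in the format of the query-errors sample quoted in this thread; the regular expression and the summarize helper are illustrative, and other BIND versions may format these lines differently.

```python
import re
from collections import Counter

# Matches lines shaped like the query-errors sample from this thread, e.g.
# "26-Sep-2018 12:42:49.743 query-errors: info: client ... query failed
#  (SERVFAIL) for 46.36.47.104.wl.mailspike.net/IN/A at ..." (on one line).
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<minute>\d{2}:\d{2}):\d{2}\.\d+ query-errors: .*?"
    r"query failed \((?P<rcode>[A-Z]+)\) "
    r"for (?P<qname>\S+)/(?P<qclass>\w+)/(?P<qtype>\w+) at "
)


def summarize(lines, zone_labels=3):
    """Count SERVFAIL entries per minute and per trailing zone suffix."""
    per_minute = Counter()
    per_zone = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group("rcode") != "SERVFAIL":
            continue
        per_minute[f"{match.group('date')} {match.group('minute')}"] += 1
        labels = match.group("qname").rstrip(".").split(".")
        per_zone[".".join(labels[-zone_labels:])] += 1
    return per_minute, per_zone
```

Feeding it the log file (for example summarize(open(path)) with whatever path the query-errors channel writes to) gives two counters. Minutes with unusually high counts show the bursts, and the zone counter shows whether the failures cluster on a few RBL zones or are spread evenly.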

I also see other SERVFAIL errors that really are SERVFAIL errors:
when querying the host manually, it still responds immediately with
SERVFAIL.

Thanks,
Alex



>
> on a cold cache. However it seems to be returning SERVFAIL sometimes for
> what should be a cached answer. I'll also turn up the debug logging and
> watch it.
>
> Mukund


Re: BIND and UDP tuning

2018-09-27 Thread G.W. Haywood via bind-users

Hi there,

On Thu, 27 Sep 2018, Alex wrote:

> This is also only happening on the two identical systems connected
> to the 165/35mbit cable modem.
> ...
> I really hope there is someone with some additional ideas.


Is it the modem?

--

73,
Ged.


Re: BIND and UDP tuning

2018-09-27 Thread Mukund Sivaraman
On Thu, Sep 27, 2018 at 10:53:25AM -0400, Alex wrote:
> Many of these values I've already tweaked and have had no effect on my
> SERVFAIL issues :-(

If you are getting SERVFAILs from a BIND resolver you administer, then
it has responded to your query. If you turn up the log level to
something like -d 99, it'll print the steps that led to that SERVFAIL.
Usually you'll find something there that directs you to next steps.

On this topic, my home resolver is also a stock packaged BIND, like
yours, and I too see spurious SERVFAILs sometimes. I used to think this
was due to too much indirection, e.g., when named starts up and you run:

dig -x 176.9.81.50

on a cold cache. However it seems to be returning SERVFAIL sometimes for
what should be a cached answer. I'll also turn up the debug logging and
watch it.

Mukund


Re: BIND and UDP tuning

2018-09-27 Thread Sten Carlsen


On 27/09/2018 16.53, Alex wrote:
> Hi,
>
>>> I reported a few weeks ago that I was experiencing a really high
>>> number of "SERVFAIL" messages in my bind-9.11.4-P1 system running on
>>> fedora28, and I haven't yet found a solution. This is all now running
>>> on a 165/35 cable system.
>>>
>>> I found a program named dropwatch which is showing a significant
>>> number of dropped UDP packets, particularly when there are bursts of
>>> email traffic:
>>>
>>> 12 drops at skb_queue_purge+13 (0x9f79a0c3)
>>> 1 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>>> 4 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>>> 5 drops at nf_hook_slow+a7 (0x9f7faff7)
>>> 3 drops at sk_stream_kill_queues+48 (0x9f7a1158)
>>> 3 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
>>> ...
>>>
>>> # netstat -us
>>> ...
>>> Udp:
>>> 23449482 packets received
>>> 1724269 packets to unknown port received
>>> 8248 packet receive errors
>>> 31394909 packets sent
>>> 8243 receive buffer errors
>>> 0 send buffer errors
>>> InCsumErrors: 5
>>> IgnoredMulti: 43247
>>>
>>> The SERVFAIL messages don't necessarily correspond to the UDP packet
>>> errors shown by netstat, but the dropwatch output is continuous. The
>>> netstat packet receive errors also don't seem to correspond to
>>> "SERVFAIL" or "Name service" errors:
>>>
>>> 26-Sep-2018 12:42:49.743 query-errors: info: client @0x7fb3c41634d0
>>> 127.0.0.1#44104 (46.36.47.104.wl.mailspike.net): query failed
>>> (SERVFAIL) for 46.36.47.104.wl.mailspike.net/IN/A at
>>> ../../../bin/named/query.c:8580
>>>
>>> Sep 26 12:47:11 mail03 postfix/dnsblog[22821]: warning: dnsblog_query:
>>> lookup error for DNS query 196.91.107.80.bl.spameatingmonkey.net: Host
>>> or domain name not found. Name service error for
>>> name=196.91.107.80.bl.spameatingmonkey.net type=A: Host not found, try
>>> again
>>>
>>> I've been following this thread from some time ago, but nothing I've
>>> done has made a difference. I really don't know what the buffer sizes
>>> should be.
>>> http://bind-users-forum.2342410.n4.nabble.com/Tuning-suggestions-for-high-core-count-Linux-servers-td3899.html
>>>
>>> Are there specific bind tunables you might recommend? edns-udp-size,
>>> perhaps?
>>>
>>> Any ideas on other tunables such as net.core.*mem_default etc?
>> *chuckles to self*
>>
>> I was just referring back to that thread myself to try remember what I did.
>>
>> I ended up tuning the following items:
>>
>>   - name: SYSCTL system tuning, basics
>>     sysctl:
>>       name: "{{ item.name }}"
>>       value: "{{ item.value }}"
>>       sysctl_set: yes
>>       state: present
>>     with_items:
>>       - { name: 'vm.swappiness', value: 0 }
>>       - { name: 'net.core.netdev_max_backlog', value: 32768 }
>>       - { name: 'net.core.netdev_budget', value: 2700 }
>>       - { name: 'net.ipv4.tcp_sack', value: 0 }
>>       - { name: 'net.core.somaxconn', value: 2048 }
>>       - { name: 'net.core.rmem_default', value: 16777216 }
>>       - { name: 'net.core.rmem_max', value: 16777216 }
>>       - { name: 'net.core.wmem_default', value: 16777216 }
>>       - { name: 'net.core.wmem_max', value: 16777216 }
> Were you troubleshooting the same problems as I'm experiencing?
>
> Many of these values I've already tweaked and have had no effect on my
> SERVFAIL issues :-(
>
> I've also been following the performance tuning variables in this RH document:
> https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf
>
> These errors appear to occur in spurts - there is typically ten or
> more in a row at a time, then any number of minutes/seconds before the
> next one.
>
> It looks like there are periods of as many as 500 queries per second,
> although the usual amount is closer to 200 per second.
>
> I don't believe this is a bind configuration problem, as the "Name
> service error" errors from postfix also occur when testing with
> unbound.
>
> This is also only happening on the two identical systems connected to
> the 165/35mbit cable modem. I've verified with Oponline, and they've
> emphatically asserted there are no problems with the circuit. The
> systems are 8-core Xeon E31240 with 16GB RAM. I've also tried other
> systems, including a 12-core i7 with 32GB.
>
> We have several other systems connected to a 10mbit DIA ethernet
> circuit where these errors don't generally occur. They are also
> similarly configured fedora systems with the same version of bind.
>
> I'm really at a loss as to what the problem(s) are, but feel like it's
> really impacting our ability to query RBLs for processing mail.
>
>> Whilst mentioned in passing on that thread, there was also poking around 
>> with TOE, pause, 

Re: NTP through DNS?

2018-09-27 Thread Bob McDonald
Having multiple CNAME records for the same hostname is a violation of
RFC 1034. (That, and BIND 9 won't allow it...)

Surely there must be some creative solution which a) doesn't violate the
DNS specs and b) doesn't suggest the use of deprecated software (BIND 8).

Regards,

Bob


Re: BIND and UDP tuning

2018-09-27 Thread Alex
Hi,

> > I reported a few weeks ago that I was experiencing a really high
> > number of "SERVFAIL" messages in my bind-9.11.4-P1 system running on
> > fedora28, and I haven't yet found a solution. This is all now running
> > on a 165/35 cable system.
> >
> > I found a program named dropwatch which is showing a significant
> > number of dropped UDP packets, particularly when there are bursts of
> > email traffic:
> >
> > 12 drops at skb_queue_purge+13 (0x9f79a0c3)
> > 1 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
> > 4 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
> > 5 drops at nf_hook_slow+a7 (0x9f7faff7)
> > 3 drops at sk_stream_kill_queues+48 (0x9f7a1158)
> > 3 drops at __udp4_lib_rcv+1e6 (0x9f83bdf6)
> > ...
> >
> > # netstat -us
> > ...
> > Udp:
> > 23449482 packets received
> > 1724269 packets to unknown port received
> > 8248 packet receive errors
> > 31394909 packets sent
> > 8243 receive buffer errors
> > 0 send buffer errors
> > InCsumErrors: 5
> > IgnoredMulti: 43247
> >
> > The SERVFAIL messages don't necessarily correspond to the UDP packet
> > errors shown by netstat, but the dropwatch output is continuous. The
> > netstat packet receive errors also don't seem to correspond to
> > "SERVFAIL" or "Name service" errors:
> >
> > 26-Sep-2018 12:42:49.743 query-errors: info: client @0x7fb3c41634d0
> > 127.0.0.1#44104 (46.36.47.104.wl.mailspike.net): query failed
> > (SERVFAIL) for 46.36.47.104.wl.mailspike.net/IN/A at
> > ../../../bin/named/query.c:8580
> >
> > Sep 26 12:47:11 mail03 postfix/dnsblog[22821]: warning: dnsblog_query:
> > lookup error for DNS query 196.91.107.80.bl.spameatingmonkey.net: Host
> > or domain name not found. Name service error for
> > name=196.91.107.80.bl.spameatingmonkey.net type=A: Host not found, try
> > again
> >
> > I've been following this thread from some time ago, but nothing I've
> > done has made a difference. I really don't know what the buffer sizes
> > should be.
> > http://bind-users-forum.2342410.n4.nabble.com/Tuning-suggestions-for-high-core-count-Linux-servers-td3899.html
> >
> > Are there specific bind tunables you might recommend? edns-udp-size,
> > perhaps?
> >
> > Any ideas on other tunables such as net.core.*mem_default etc?
>
> *chuckles to self*
>
> I was just referring back to that thread myself to try remember what I did.
>
> I ended up tuning the following items:
>
>   - name: SYSCTL system tuning, basics
>     sysctl:
>       name: "{{ item.name }}"
>       value: "{{ item.value }}"
>       sysctl_set: yes
>       state: present
>     with_items:
>       - { name: 'vm.swappiness', value: 0 }
>       - { name: 'net.core.netdev_max_backlog', value: 32768 }
>       - { name: 'net.core.netdev_budget', value: 2700 }
>       - { name: 'net.ipv4.tcp_sack', value: 0 }
>       - { name: 'net.core.somaxconn', value: 2048 }
>       - { name: 'net.core.rmem_default', value: 16777216 }
>       - { name: 'net.core.rmem_max', value: 16777216 }
>       - { name: 'net.core.wmem_default', value: 16777216 }
>       - { name: 'net.core.wmem_max', value: 16777216 }

Were you troubleshooting the same problems as I'm experiencing?

Many of these values I've already tweaked and have had no effect on my
SERVFAIL issues :-(
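
When comparing notes on which values have actually been applied, it can help to read them back from the running kernel instead of trusting the config files. The sketch below reads the settings from /proc/sys. The EXPECTED values are copied from the buffer-related entries in the list quoted above purely as an example and are not a recommendation; the helper names are made up for this sketch.

```python
from pathlib import Path

# Example values taken from the list quoted above; adjust to taste.
EXPECTED = {
    "net.core.netdev_max_backlog": 32768,
    "net.core.rmem_default": 16777216,
    "net.core.rmem_max": 16777216,
    "net.core.wmem_default": 16777216,
    "net.core.wmem_max": 16777216,
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_current(names, root="/proc/sys"):
    """Read each sysctl as stripped text; None if it cannot be read."""
    current = {}
    for name in names:
        try:
            current[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            current[name] = None
    return current


def mismatches(expected, current):
    """Return {name: (wanted, found)} for settings that differ."""
    return {
        name: (want, current.get(name))
        for name, want in expected.items()
        if current.get(name) != str(want)
    }
```

Running mismatches(EXPECTED, read_current(EXPECTED)) on each host returns an empty dict when every listed setting is in effect, which makes it easy to compare the cable-connected systems with the ones on the DIA circuit.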

I've also been following the performance tuning variables in this RH document:
https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf

These errors appear to occur in spurts: there are typically ten or
more in a row, then anywhere from seconds to minutes before the
next one.
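
To see whether those spurts line up with kernel-level UDP drops, the counters behind "netstat -us" can be sampled at intervals and only the changes printed with a timestamp. This is a rough sketch that assumes a Linux host where /proc/net/snmp has a "Udp:" header line followed by a "Udp:" value line; the 10-second interval and the function names are arbitrary choices for illustration.

```python
import time


def parse_udp_counters(snmp_text):
    """Return the Udp: counters from /proc/net/snmp text as a dict of ints."""
    rows = [
        line.split()
        for line in snmp_text.splitlines()
        if line.startswith("Udp:")  # does not match the "UdpLite:" lines
    ]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, map(int, values)))


def delta(before, after):
    """Per-counter difference between two samples."""
    return {key: after[key] - before[key] for key in after if key in before}


def watch(interval=10.0, path="/proc/net/snmp"):
    """Print a timestamped line whenever receive errors increased."""
    previous = None
    while True:
        with open(path) as handle:
            current = parse_udp_counters(handle.read())
        if previous is not None:
            change = delta(previous, current)
            if change.get("RcvbufErrors") or change.get("InErrors"):
                print(time.strftime("%H:%M:%S"), change)
        previous = current
        time.sleep(interval)
```

Leaving watch() running during a mail burst gives timestamps that can be set against the query-errors log. If RcvbufErrors stays flat while SERVFAILs pile up, the local socket buffers are unlikely to be the cause.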

It looks like there are periods of as many as 500 queries per second,
although the usual amount is closer to 200 per second.

I don't believe this is a bind configuration problem, as the "Name
service error" errors from postfix also occur when testing with
unbound.

This is also only happening on the two identical systems connected to
the 165/35mbit cable modem. I've verified with Oponline, and they've
emphatically asserted there are no problems with the circuit. The
systems are 8-core Xeon E31240 with 16GB RAM. I've also tried other
systems, including a 12-core i7 with 32GB.

We have several other systems connected to a 10mbit DIA ethernet
circuit where these errors don't generally occur. They are also
similarly configured fedora systems with the same version of bind.

I'm really at a loss as to what the problem(s) are, but feel like it's
really impacting our ability to query RBLs for processing mail.

> Whilst mentioned in passing on that thread, there was also poking around with 
> TOE, pause, coalesce adaptive and ring size settings (look at ethtool -K, 
> ethtool -A, ethtool -C and ethtool -G), but sadly 

RE: BIND and UDP tuning

2018-09-27 Thread Browne, Stuart via bind-users
> -----Original Message-----
> From: Tony Finch [mailto:d...@dotat.at]
> 
> >   - { name: 'net.ipv4.tcp_sack', value: 0 }
> 
> Why? SACK is super important for TCP performance over links that have any
> degree of lossiness, and I don't recall hearing of any caveats.
> 
> Tony.
> --
> f.anthony.n.finch  

If I recall correctly, it had to do with the fact that we were in a 
very-network-close test environment with very-small packets so it wasn't 
necessary to even consider resends. I don't recall whether it did anything at 
all to the results; it is just one of the various things I stuck into the 
blender in order to see if it made a difference and was still in at the end of 
testing. The number of test iterations I went through was in the hundreds and 
most of it was "Moar! MOAR!" rather than good arguments; more about proving a 
design could reach a theoretical limit than whether it would be 100% stable in 
production. 

The environment design that these tests were preparing for hasn't been 
implemented yet; that's what I'm working on over the next few weeks, so I'll be 
going over these settings with kid gloves and being a little gentler, as we 
don't need a single location churning out 2.5M qps; we're quite happy with 2M.

Let's hear it for overkill!

Stuart


RE: BIND and UDP tuning

2018-09-27 Thread Tony Finch
Browne, Stuart via bind-users wrote:

>   - { name: 'net.ipv4.tcp_sack', value: 0 }

Why? SACK is super important for TCP performance over links that have any
degree of lossiness, and I don't recall hearing of any caveats.

Tony.
-- 
f.anthony.n.finch  http://dotat.at/
a just distribution of the rewards of success