Re: Troubleshooting BIND stops responding

2017-03-30 Thread Alan Clegg
On 3/30/17 6:02 AM, Mark Elkins wrote:
> Stopping right here, recursive lookup and authoritative services are
> completely different services - and require different servers
> (preferably, though you could run multiple instances of nameservers on a
> single server - but that can get ugly).

Actually, no.  Running both recursive and authoritative service does not
require different servers and does not require running multiple instances
of BIND.  It's not recommended, but it's not hard, and it has worked for
lots of people for lots of years.
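A minimal sketch of the single-instance policy the OP asks for, assuming one BIND server that answers its own zones (here vtt.net) for anyone and recurses only for customers. In named.conf this is usually expressed with an allow-recursion ACL in the options block and allow-query { any; } in the zone statement; the Python below only models that decision so it can be reasoned about. The customer prefixes are hypothetical documentation ranges, not the OP's real networks.

```python
import ipaddress

# Hypothetical customer prefixes (RFC 5737 documentation ranges); an ISP would
# list its real client networks here, as it would in an allow-recursion ACL.
CUSTOMER_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Zones the server also loads locally and answers for anyone.
LOCAL_ZONES = ["vtt.net"]


def labels(name: str) -> list:
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return [part for part in name.lower().rstrip(".").split(".") if part]


def in_local_zone(qname: str) -> bool:
    """True if qname is the apex of, or below, one of LOCAL_ZONES."""
    q = labels(qname)
    for zone in LOCAL_ZONES:
        z = labels(zone)
        if len(q) >= len(z) and q[-len(z):] == z:
            return True
    return False


def decide(client: str, qname: str) -> str:
    """Local zone data for anyone; recursion only for customer addresses."""
    if in_local_zone(qname):
        return "answer from local zone"
    addr = ipaddress.ip_address(client)
    # An IPv6 client never matches an IPv4 network, so it falls through to refuse.
    if any(addr in net for net in CUSTOMER_NETS):
        return "recurse"
    return "refuse"


if __name__ == "__main__":
    for client, qname in [("203.0.113.7", "www.vtt.net"),
                          ("192.0.2.10", "www.example.com"),
                          ("203.0.113.7", "www.example.com")]:
        print(client, qname, "->", decide(client, qname))
```

The point of the model is that the zone check comes first: data the server holds authoritatively is served to everyone, and only names outside those zones fall through to the recursion ACL.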

> Your two recursive servers should remain as recursive servers, only
> giving replies to your customer base. When you start running DNSSEC,
> this becomes even more important: a recursive server running as an
> authoritative server for a zone cannot give a proper DNSSEC reply when
> asked about zones carried in its config.

Actually, the only thing it doesn't do is the validation.  It gives
responses just fine as long as you aren't validating your own data.
Trusting the "AD" bit is a great concept, but you really want to
validate as close to the end-point as possible.

> Rather keep things simple.
> 
> I would presume that you have multiple authoritative servers for your
> "vtt.net" domain. If you need more redundancy, add more authoritative
> nameservers or, better still, an anycast instance. Even your local
> authoritative nameservers should ask your recursive servers when they
> need to look up information that is not part of the zones they manage.
> Enough of the preaching.

Interesting to go from "keep things simple" to "let's use anycast" in
three sentences.

Too many people are trying to solve problems that don't exist with
additional complexity that causes additional issues elsewhere in the
network stack.  If your nameserver has issues with basic responses, good
luck debugging that while also dealing with routing problems in your
network and wondering which server you should actually be looking at.

Sorry to sound like an old grouch, but I'm really feeling like an old
grouch these days.

> If you were to run IPv6, a number of errors would disappear, otherwise
> force BIND not to do any IPv6. Adding IPv6 though would be preferable.  ;-)

Keep things simple... When your nameserver isn't responding, don't think
about running IPv6, fix the problem at hand.  And "if you run IPv6, a
number of errors disappear".  I'm just shaking my head.

> I don't think, though, that any of this is causing your problem. You
> could always upgrade your version of BIND. On my Gentoo laptop, I'm
> running BIND 9.11.0-P3, so you are a bit behind.

And there is the useful nugget.

Yes, OP, see if your problems continue once you upgrade.

AlanC



___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Re: Troubleshooting BIND stops responding

2017-03-30 Thread Mark Elkins


On 30/03/2017 06:35, i.chu...@volga.ttk.ru wrote:
> Greetings to everyone!
>
> I'm an engineer at a local ISP, and we have to provide 2 DNS servers
> running BIND for our clients. Our logs are full of various BIND errors,
> but we are unable to fully understand the problem. The main problem is
> that the BIND at 213.80.236.18 sometimes stops responding after working
> fine for about a week. BIND then simply doesn't return any responses and
> we have to restart it. We suspect a weak DoS attack (weak, because other
> services are running normally), but I don't know the right way to
> determine whether that is the case. I would be glad if anyone would be so
> kind as to help us solve this issue.
>
> The machines have the IPv4 addresses 217.23.80.4 (BIND version 9.9.4) and
> 213.80.236.18 (BIND version 9.9.5-r3). They have to resolve hostnames only
> for ISP customers (and refuse to resolve for others), BUT we want to be
> able to resolve our specific zones, like vtt.net, for anyone who asks, in
> case our authoritative nameservers fail.

Stopping right here, recursive lookup and authoritative services are
completely different services - and require different servers
(preferably, though you could run multiple instances of nameservers on a
single server - but that can get ugly).

Your two recursive servers should remain as recursive servers, only
giving replies to your customer base. When you start running DNSSEC,
this becomes even more important: a recursive server running as an
authoritative server for a zone cannot give a proper DNSSEC reply when
asked about zones carried in its config.

Rather keep things simple.

I would presume that you have multiple authoritative servers for your
"vtt.net" domain. If you need more redundancy, add more authoritative
nameservers or, better still, an anycast instance. Even your local
authoritative nameservers should ask your recursive servers when they
need to look up information that is not part of the zones they manage.
Enough of the preaching.

-oOo-

If you were to run IPv6, a number of errors would disappear, otherwise
force BIND not to do any IPv6. Adding IPv6 though would be preferable.  ;-)

I don't think, though, that any of this is causing your problem. You
could always upgrade your version of BIND. On my Gentoo laptop, I'm
running BIND 9.11.0-P3, so you are a bit behind.

-- 
Mark James ELKINS  -  Posix Systems - (South) Africa
m...@posix.co.za   Tel: +27.128070590  Cell: +27.826010496
For fast, reliable, low cost Internet in ZA: https://ftth.posix.co.za



Troubleshooting BIND stops responding

2017-03-29 Thread i.chudov
Greetings to everyone!

I'm an engineer at a local ISP, and we have to provide 2 DNS servers
running BIND for our clients. Our logs are full of various BIND errors,
but we are unable to fully understand the problem. The main problem is
that the BIND at 213.80.236.18 sometimes stops responding after working
fine for about a week. BIND then simply doesn't return any responses and
we have to restart it. We suspect a weak DoS attack (weak, because other
services are running normally), but I don't know the right way to
determine whether that is the case. I would be glad if anyone would be so
kind as to help us solve this issue.

The machines have the IPv4 addresses 217.23.80.4 (BIND version 9.9.4) and
213.80.236.18 (BIND version 9.9.5-r3). They have to resolve hostnames only
for ISP customers (and refuse to resolve for others), BUT we want to be
able to resolve our specific zones, like vtt.net, for anyone who asks, in
case our authoritative nameservers fail.

I can post the configuration files inline or as attachments if that's
appropriate.

And here are log samples from 213.80.236.18:
dns_more.log (configured as "channel enhlog/severity info;"):
30-Mar-2017 08:19:31.001 rate-limit: stop limiting NXDOMAIN responses to 
213.80.210.0/24 for .  ()
30-Mar-2017 08:19:38.822 resolver: DNS format error from 173.245.59.100#53 
resolving 82.51.18.104.in-addr.arpa/PTR for client 188.168.243.125#15693: 
Name 104.in-addr.arpa (SOA) not subdomain of zone 18.104.in-addr.arpa -- 
invalid response
30-Mar-2017 08:19:38.840 resolver: DNS format error from 173.245.58.100#53 
resolving 82.51.18.104.in-addr.arpa/PTR for client 188.168.243.125#15693: 
Name 104.in-addr.arpa (SOA) not subdomain of zone 18.104.in-addr.arpa -- 
invalid response
30-Mar-2017 08:19:51.428 resolver: clients-per-query decreased to 19
30-Mar-2017 08:19:54.725 resolver: DNS format error from 
205.251.192.232#53 resolving now.dolphin.com/ for client 
100.64.36.162#32772: Name dolphin.com (SOA) not subdomain of zone 
now.dolphin.com -- invalid response
30-Mar-2017 08:19:54.786 resolver: DNS format error from 
205.251.195.198#53 resolving now.dolphin.com/ for client 
100.64.36.162#32772: Name dolphin.com (SOA) not subdomain of zone 
now.dolphin.com -- invalid response
30-Mar-2017 08:19:54.848 resolver: DNS format error from 
2600:9000:5307:5600::1#53 resolving now.dolphin.com/ for client 
100.64.36.162#32772: Name dolphin.com (SOA) not subdomain of zone 
now.dolphin.com -- invalid response
30-Mar-2017 08:19:54.925 resolver: DNS format error from 
2600:9000:5304:6600::1#53 resolving now.dolphin.com/ for client 
100.64.36.162#32772: Name dolphin.com (SOA) not subdomain of zone 
now.dolphin.com -- invalid response
30-Mar-2017 08:19:54.998 resolver: DNS format error from 
2600:9000:5300:e800::1#53 resolving now.dolphin.com/ for client 
100.64.36.162#32772: Name dolphin.com (SOA) not subdomain of zone 
now.dolphin.com -- invalid response
30-Mar-2017 08:19:55.060 resolver: DNS format error from 
2600:9000:5303:c600::1#53 resolving now.dolphin.com/ for client 
100.64.36.162#32772: Name dolphin.com (SOA) not subdomain of zone 
now.dolphin.com -- invalid response

process.log (configured as "channel process/severity notice;"):
29-Nov-2016 07:09:28.266 xfer-in: transfer of 'rpz/IN/global' from 
217.23.80.2#53: failed while receiving responses: connection reset
15-Dec-2016 09:56:41.637 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 10:23:37.125 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 10:53:32.581 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 11:20:08.997 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 11:49:11.461 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 12:20:39.845 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 12:48:14.245 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 13:21:37.708 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
15-Dec-2016 13:55:00.133 xfer-in: transfer of './IN/root' from 
2001:500:2f::f#53: failed to connect: timed out
12-Mar-2017 09:25:09.993 xfer-in: transfer of './IN/root' from 
2620:0:2830:202::132#53: failed while receiving responses: end of file

security.log (configured as "channel security/severity info;"):
30-Mar-2017 08:21:57.558 lame-servers: error (unexpected RCODE REFUSED) 
resolving 'echo-nl03.calyptra-soft.net/A/IN': 62.212.78.199#53
30-Mar-2017 08:21:57.630 lame-servers: error (unexpected RCODE REFUSED) 
resolving 'echo-nl03.calyptra-soft.net/A/IN': 83.149.64.123#53
30-Mar-2017 08:21:57.696 lame-servers: error (unexpected RCODE REFUSED) 
resolving '22.178.87.223.in-addr.arpa/PTR/IN': 

Re: troubleshooting bind

2012-04-10 Thread Matus UHLAR - fantomas

On 09.04.12 16:55, Marseglia, Michael wrote:
> I'm troubleshooting a DNS issue we recently experienced where records
> were unresolvable (response NXDOMAIN) from the caching DNS server.
> I flushed the cache using rndc flush and I received the host's IP.
>
> There were no errors in the system log so I'm enabling debug logging
> should it occur again.  I'm still not sure what caused the NXDOMAIN
> response, so I'm reviewing my BIND config and taking a look at the
> default values.


The NXDOMAIN answer was apparently returned by one of the servers that are
authoritative for the domain, or for one of the domains above it. Check
all servers in the resolution path for the answer.
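One way to follow that advice is to ask each authoritative server directly, with recursion turned off, and note which one says NXDOMAIN; `dig +norec @server name` does exactly this. The sketch below does the same with only the Python standard library, over IPv4/UDP, and decodes just the AA bit and RCODE from the response header. It is a minimal illustration, not a full DNS client: it does not parse the answer section or retry over TCP.

```python
import random
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(qname: str, qtype: int = 1) -> bytes:
    """Minimal DNS query for qname/qtype (default A), class IN, with RD=0
    so the server must answer from its own data instead of recursing."""
    ident = random.randint(0, 0xFFFF)
    header = struct.pack("!HHHHHH", ident, 0x0000, 1, 0, 0, 0)
    question = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.rstrip(".").split(".") if part
    ) + b"\x00"
    return header + question + struct.pack("!HH", qtype, 1)


def parse_header(response: bytes) -> dict:
    """Decode the header fields that matter here: ID, AA bit, RCODE, ANCOUNT."""
    ident, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return {"id": ident, "aa": bool(flags & 0x0400),
            "rcode": RCODES.get(rcode, str(rcode)), "answers": an}


def ask(server: str, qname: str, timeout: float = 3.0) -> dict:
    """Send one UDP query to server:53 and return the decoded header."""
    query = build_query(qname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        data, _ = sock.recvfrom(4096)
    result = parse_header(data)
    if result["id"] != struct.unpack("!H", query[:2])[0]:
        raise ValueError("response ID does not match query ID")
    return result
```

Usage would be a loop such as `for ns in ["192.0.2.53", "198.51.100.53"]: print(ns, ask(ns, "host.example.com"))`, where both addresses are hypothetical stand-ins for the zone's real NS set. A server that returns `aa=True` with `NXDOMAIN` while its peers return `NOERROR` is the one to look at.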


This quite commonly happens with master/slave synchronization problems,
with multiple masters, or with a missing delegation to a subdomain.
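For the master/slave case, a quick check is to compare the SOA serial each authoritative server reports (for example, the third field of `dig +short SOA example.com @ns1.example.com`). Serials should be compared with RFC 1982 serial-number arithmetic rather than a plain `<`, because they are allowed to wrap around. A minimal sketch; the server names and serial values below are made up for illustration.

```python
HALF = 2 ** 31  # RFC 1982 comparison window for 32-bit SOA serials


def serial_older(a: int, b: int) -> bool:
    """RFC 1982: True if serial a precedes serial b (i.e. a is older)."""
    return (a < b and b - a < HALF) or (a > b and a - b > HALF)


def stale_servers(serials: dict) -> list:
    """Given {server: serial}, return the servers lagging behind the newest serial."""
    newest = None
    for serial in serials.values():
        if newest is None or serial_older(newest, serial):
            newest = serial
    return sorted(name for name, s in serials.items() if serial_older(s, newest))


if __name__ == "__main__":
    # Hypothetical serials, as collected from each server's SOA record.
    observed = {"ns1.example.com": 2012040901,
                "ns2.example.com": 2012040901,
                "ns3.example.com": 2012040302}
    print(stale_servers(observed))
```

A lagging slave can keep answering NXDOMAIN for a name the master already has, which matches the symptom described at the start of this thread.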


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
They say when you play that M$ CD backward you can hear satanic messages.
That's nothing. If you play it forward it will install Windows.


troubleshooting bind

2012-04-09 Thread Marseglia, Michael
Hello,

  I'm troubleshooting a DNS issue we recently experienced where records were 
unresolvable (response NXDOMAIN) from the caching DNS server.  I flushed the 
cache using rndc flush and I received the host's IP.

  There were no errors in the system log so I'm enabling debug logging should 
it occur again.  I'm still not sure what caused the NXDOMAIN response, so I'm 
reviewing my BIND config and taking a look at the default values.

  When configuring BIND for an internal corporate network with a thousand 
clients should any of the default values be tweaked?  I've searched for tuning 
guidance but I haven't found any yet.

  I've taken interest in the tcp-clients, max-ncache-ttl, max-cache-ttl, 
cleaning-interval and max-cache-size values.  These are all currently set to 
default.

  I'm guessing in a more volatile network with DHCP and frequent 
provisioning/deprovisioning of hosts I would want to lower the max-ncache-ttl 
and max-cache-ttl values.  Is this correct?

  Regarding the tcp-clients option, where can I find the current connection 
count and how do I know if I'm coming close to this number?  In what type of 
environment would it be expected to hit the default threshold of 100?

  Lastly, if max-cache-size is set to unlimited what happens if BIND consumes 
all the available memory?  Will the linux kernel terminate the process?  How 
can I find the value of the current cache size?



Mike Marseglia
Network Engineer, CharterCARE
p: 401-456-2331
c: 401-248-4867
e: michael.marseg...@chartercare.org
t: @mmars



Re: troubleshooting bind

2012-04-09 Thread Chuck Swiger
Hi--

On Apr 9, 2012, at 9:55 AM, Marseglia, Michael wrote:
[ ... ]
   When configuring BIND for an internal corporate network with a thousand 
 clients should any of the default values be tweaked?  I’ve searched for 
 tuning guidance but I haven’t found any yet.
  
   I’ve taken interest in the tcp-clients, max-ncache-ttl, max-cache-ttl, 
 cleaning-interval and max-cache-size values.  These are all currently set to 
 default.

These are good things to take a look at, yes, although you should also look at
clients-per-query and max-clients-per-query.

   I’m guessing in a more volatile network with DHCP and frequent 
 provisioning/deprovisioning of hosts I would want to lower the max-ncache-ttl 
 and max-cache-ttl values.  Is this correct?

That depends: if the volatile domain is your domain, and BIND is authoritative 
for it, then it will be providing authoritative answers (AA) directly from zone 
data, rather than caching responses obtained from some other nameserver.  For 
the most part, it's better for an active domain with frequently changing data 
to set the domain's TTLs to appropriate values and let named figure things out 
from there... but you can only tweak that for the domains you manage.
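To put numbers on this: per RFC 2308, a resolver may keep a negative answer (NXDOMAIN or NODATA) for the smaller of the SOA record's own TTL and its MINIMUM field, and BIND additionally caps that at max-ncache-ttl (default 10800 seconds, 3 hours). Positive answers are capped at max-cache-ttl (default 604800 seconds, one week). A small sketch of that arithmetic; the SOA values in the example are illustrative, not taken from any real zone.

```python
# BIND defaults for the two cache caps discussed in this thread.
MAX_NCACHE_TTL = 10800   # max-ncache-ttl: 3 hours
MAX_CACHE_TTL = 604800   # max-cache-ttl: 1 week


def negative_cache_seconds(soa_ttl: int, soa_minimum: int,
                           max_ncache_ttl: int = MAX_NCACHE_TTL) -> int:
    """Upper bound on how long an NXDOMAIN/NODATA answer can stay cached:
    min(SOA TTL, SOA MINIMUM) per RFC 2308, capped by max-ncache-ttl."""
    return min(soa_ttl, soa_minimum, max_ncache_ttl)


def positive_cache_seconds(record_ttl: int,
                           max_cache_ttl: int = MAX_CACHE_TTL) -> int:
    """Upper bound on how long an ordinary answer can stay cached."""
    return min(record_ttl, max_cache_ttl)


if __name__ == "__main__":
    # Illustrative zone with a one-day SOA TTL and MINIMUM.
    print(negative_cache_seconds(86400, 86400))       # 10800: capped by the default
    print(negative_cache_seconds(86400, 86400, 300))  # 300: max-ncache-ttl lowered
    print(negative_cache_seconds(86400, 900))         # 900: the zone's own value wins
```

Lowering max-ncache-ttl therefore bounds how long a bad NXDOMAIN can linger in your cache, but for zones you control, the SOA values are the more direct knob.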

   Regarding the tcp-clients option, where can I find the current connection 
 count and how do I know if I’m coming close to this number?  In what type of 
 environment would it be expected to hit the default threshold of 100?

You can see what active TCP sessions are open via something like:

  netstat -p tcp | grep 53

...and add | wc -l if you want to count them.

(You might also want to tweak that a bit to use fgrep .53\  to only match 
port 53...)
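Note that `netstat -p tcp` is BSD/macOS syntax; on Linux, `-p` instead shows the owning program. On a Linux host, a minimal alternative is to read /proc/net/tcp and /proc/net/tcp6 directly and count ESTABLISHED sockets whose local port is 53. That roughly approximates the inbound TCP clients counted against tcp-clients, since outgoing TCP queries made by named itself use an ephemeral local port. A sketch, assuming the standard Linux /proc format.

```python
DNS_PORT = 53
TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp and tcp6


def count_established(proc_text: str, port: int = DNS_PORT) -> int:
    """Count ESTABLISHED sockets in /proc/net/tcp-style text whose local port is `port`."""
    count = 0
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # "ADDR:PORT", both hex
        if local_port == port and fields[3] == TCP_ESTABLISHED:
            count += 1
    return count


def dns_tcp_clients(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> int:
    """Sum over the IPv4 and IPv6 tables; a missing file counts as zero."""
    total = 0
    for path in paths:
        try:
            with open(path) as f:
                total += count_established(f.read())
        except FileNotFoundError:
            pass
    return total
```

Calling `dns_tcp_clients()` periodically (from cron, say) and comparing the result with the configured tcp-clients value shows how close the server gets to the limit.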

I don't think it's expected that many TCP sessions would be needed, since UDP + 
EDNS0 works fine for almost all cases, although as DNSSEC becomes more widely 
adopted it might be the case that more TCP sessions will be used.

   Lastly, if max-cache-size is set to unlimited what happens if BIND consumes 
 all the available memory?  Will the linux kernel terminate the process?  How 
 can I find the value of the current cache size?

Most platforms set up a process datasize limit (commonly set to 1GB or so), 
after which malloc() and friends will fail to get more memory.  The kernel will 
only terminate processes if the entire system runs out of VM, including swap 
space, but the system will generally be in an unusable state due to heavy 
paging/swapping before the kernel OOM killer gets invoked.
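On Linux, the limits that actually apply to the running named process can be read from /proc/<pid>/limits; the "Max data size" and "Max address space" rows are the ones relevant to the malloc() failure described above. A sketch, assuming Linux and a PID file at /run/named/named.pid. That path is a hypothetical default: it differs between distributions and can be changed with the pid-file option in named.conf.

```python
def parse_limit(limits_text: str, row: str = "Max data size"):
    """Return (soft, hard) for one row of /proc/<pid>/limits; None means unlimited."""
    for line in limits_text.splitlines():
        if line.startswith(row):
            soft, hard = line[len(row):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise KeyError(row)


def named_memory_limits(pidfile: str = "/run/named/named.pid") -> dict:
    """Read named's PID (hypothetical default path) and report its memory limits."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    with open(f"/proc/{pid}/limits") as f:
        text = f.read()
    return {row: parse_limit(text, row)
            for row in ("Max data size", "Max address space")}
```

If both rows come back as `(None, None)`, nothing but total system memory stops an unlimited cache from growing, which is the situation described in the question.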

Regards,
-- 
-Chuck
