Re: dns cache issue

2019-01-11 Thread Edwardo Garcia
OK, so this happened again, with link congestion.

BIND is caching the results: tested with no congestion, a repeat lookup
drops from 78 ms to 1 ms. BUT the issue with BIND remains, and the logs
show nothing wrong.

Lookups with the link congested, tried in quick succession with a second
or less between them:
google.com (like any other host I try): timeout, no servers can be reached
lookup of the internal zone I added to BIND: replies in 7 ms
retry google.com and a few other sites: all time out, no servers can be
reached
(google.com may only have a 5 min TTL, but the other domains I'm testing,
including our mail provider etc., have a 1 day TTL)
ping to the DNS box is quick
ping to other boxes is quick too
disconnect the PC doing Windows updates, and google.com et al. respond in
1 ms, so it obviously is in the bloody cache, but because BIND can't do
something with the internet in a timely manner it just spits the dummy

Why does BIND do this? If it already knows the answer, it should give the
answer, since it holds the record, just as it does for the internal test
zone.

This all causes mail to fail, web browsing to fail, boss not happy.
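
One way to check whether named still holds a record while the link is
congested is to ask it with recursion turned off (RD bit cleared). A
resolver asked that way should answer only from what it already has, so a
reply with answers suggests the record is cached, while an empty reply or a
timeout points elsewhere. A minimal sketch using only the Python standard
library; the server address in the usage comment is a placeholder that does
not come from this thread, and the parser reads nothing beyond the header.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234, recurse=True, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A, class IN)."""
    flags = 0x0100 if recurse else 0x0000  # RD bit set only when recursion is wanted
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(packet):
    """Return (id, rcode, answer_count) from the 12-byte DNS header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return qid, flags & 0x000F, ancount


def probe(server, name, recurse, timeout=2.0):
    """Send one UDP query; return (elapsed_ms, rcode, answers) or None on timeout."""
    query = build_query(name, recurse=recurse)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        elapsed_ms = (time.monotonic() - start) * 1000
    _, rcode, answers = parse_header(reply)
    return elapsed_ms, rcode, answers


# Example (placeholder address for the BIND box):
#   probe("192.168.100.1", "google.com", recurse=True)   # normal lookup
#   probe("192.168.100.1", "google.com", recurse=False)  # cache-only lookup
```

Running both calls back to back during congestion, and again with the link
idle, would show whether the cache-only query is answered when the
recursive one times out.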




Re: dns cache issue

2019-01-10 Thread Edwardo Garcia
Kevin,
I thought of LAN saturation too, but I can ssh into the BIND server
immediately. From my other PC I also did a lookup on the local
authoritative zone rpz.lan, and BIND replied right away, or within 1 second,
during congestion. Could DNSSEC be the problem? I did not disable it to
test. It really is as if it is not caching any external results, so maybe
it needs to go out and do all the lookups again to make sure the signatures
are valid? I really don't know, I'm guessing now.

I will try your suggestion of logging again. As for link-local, yes, a
couple of years ago we saw problems.

ed


Re: dns cache issue

2019-01-10 Thread Kevin Darcy
Offhand, it sounds like your LAN is saturated, so the queries might not be
getting to BIND in the first place, or the replies aren't getting back.
QoS is unlikely to help with this: you indicated that QoS was on your
"router", which is typical, since QoS is usually found on WAN links.
(Although, on the other hand, you mentioned VoIP, and VoIP sometimes
requires applying QoS at the LAN level too).

You currently have query logging turned off. If it's not too
resource-intensive, you might want to consider turning that on, to verify
whether the queries are getting to BIND. Or, run a packet capture on the
BIND side. Packet capture on the BIND device should also help to identify
any issues talking upstream (e.g. to TLD servers or auth servers for
domains like google.com). Packet capture on the *client* side would
probably be necessary for definitive proof of whether replies are being
dropped by the LAN (compare what the server sent side-by-side with what the
client saw).
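
To line up what the client sent with what shows up in BIND's query log or
in a capture on the server, it can help to send probes that are easy to
recognise. A rough sketch, standard library only: each probe carries its
own sequential query ID and a wall-clock timestamp, and the outcome is
printed one line per probe. The starting ID, the one-second interval and
any server address passed in are made-up placeholders, not values from this
thread.

```python
import socket
import struct
import time
from datetime import datetime


def build_query(name, qid):
    """Minimal recursive A/IN query with a caller-chosen transaction ID."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def format_record(sent_at, qid, name, elapsed_ms):
    """One log line per probe; elapsed_ms is None when no reply arrived."""
    outcome = "timeout" if elapsed_ms is None else f"{elapsed_ms:.1f}ms"
    stamp = sent_at.strftime("%H:%M:%S.%f")[:-3]  # millisecond precision
    return f"{stamp} id={qid:#06x} {name} {outcome}"


def run_probes(server, name, count=10, interval=1.0, timeout=2.0):
    """Send `count` probes to `server` and print a line for each."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            qid = 0x4000 + seq  # sequential IDs are easy to spot in a capture
            sent_at = datetime.now()
            start = time.monotonic()
            sock.sendto(build_query(name, qid), (server, 53))
            elapsed_ms = None
            try:
                while True:
                    reply, _ = sock.recvfrom(4096)
                    # Skip late replies that belong to earlier probes.
                    if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == qid:
                        elapsed_ms = (time.monotonic() - start) * 1000
                        break
            except socket.timeout:
                pass
            print(format_record(sent_at, qid, name, elapsed_ms))
            time.sleep(interval)
```

A probe that prints "timeout" on the client but whose ID appears in the
server-side capture with a reply would suggest the reply was lost on the
way back; an ID that never reaches the server suggests the query itself was
dropped.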

I was intrigued by "server fe80::/16 { bogus yes; }; " in your config. Have
you had issues with IPv6 link-local addresses being associated with
delegated nameservers? I haven't noticed this, but then again, I haven't
been looking for that particular misconfiguration specifically...


- Kevin



___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


dns cache issue

2019-01-09 Thread Edwardo Garcia
With the new Windows update in the last day, we noticed something strange:
our local DNS cache server times out on lookups.

For example, I look up google.com, and 1 minute later the same lookup fails
with a timeout. Since BIND has already looked it up, it should have returned
the answer from cache, yes? google.com has a 5 min TTL, but my cache doesn't
seem to cache it for even 1 ns.

QoS on the router gives DNS (UDP and TCP) and VoIP highest priority;
everything else is default. QoS must be working, because if I do
host www.google.com $externalDNSserver
I get an answer pretty much right away. If I immediately try again on our
local DNS server, it times out: can't connect to any servers.
This continues. If I drop the LAN port on the switch that the Windows
update machine uses, it resolves google.com again; bring that port back up
and it times out again.

This only happens under congestion, with our cable link maxed out.

(Never thought I'd see the day when a Windows PC would take out an entire
network.)

Below is my named.conf. I have to be missing something?

BIND 9.11.2-P1
running on Linux i686 3.16.58 #1 SMP Sat Sep 29 11:06:24 AEST 2018
built by make with defaults

acl "trusted" { localhost; 198.162.100.0/24; };
acl "sysop" { localhost; 192.168.100.6; };

options {
directory "/var/named";
allow-query { trusted; };
allow-query-cache { trusted; };
allow-transfer { sysop; };
transfer-format many-answers;
masterfile-format text;
interface-interval 0;
response-policy {zone "rpz.lan"; };
dnssec-enable yes;
dnssec-validation auto;
empty-zones-enable yes;
};

server fe80::/16 { bogus yes; };

logging {
category lame-servers { null; };
category edns-disabled { null; };
category client { null; };
category dnssec { null; };
 //channel log_queries { file "/var/named/query.log"; print-category yes; };
 //category queries { log_queries; };
channel log-rpz { file "/var/log/rpz.log" versions 10 size 25m; severity info; };
category rpz { log-rpz; };
};

zone "." {
type hint;
file "root.cache";
};

zone "rpz.lan" {
type master;
file "rpz.lan";
allow-query { trusted; };
allow-update {none;};
notify no;
};


zone "akamai.net" {
type forward;
forward first;
forwarders { xx; xx; };
};