Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Brian J. Murrell
On Mon, 2018-01-22 at 16:10 +, Tony Finch wrote:
> 
> You should make sure it is enabled, because there are vital clues in
> those log lines :-)

But they will only occur if there is some lameness with the
ns[1-4].google.com records, and that will already be reported with lame:n
in the "fetch completed at resolver.c" lines, won't it? Or am I
completely misunderstanding something here?

> Yes, and you should track down when they occur and look for other
> error indications around that time.

So, over the last week of tracing I have only these lines which match
"fetch completed at resolver.c:[0-9]* for ns[1-4].google.com":

19-Jan-2018 09:41:53.347 fetch completed at resolver.c:7492 for ns4.google.com/ in 0.042154: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
19-Jan-2018 09:41:53.350 fetch completed at resolver.c:7492 for ns2.google.com/ in 0.042019: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
19-Jan-2018 09:41:53.356 fetch completed at resolver.c:7492 for ns3.google.com/ in 0.043881: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
19-Jan-2018 09:41:53.362 fetch completed at resolver.c:7492 for ns1.google.com/ in 0.047039: success/success [domain:google.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

None of them show any lame servers.

Wouldn't I see occurrences of those with lame:n (n > 0) if there were any
lameness?

Cheers,
b.


___
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Re: 9.11 can't validate sss.gov

2018-01-22 Thread Grant Taylor via bind-users

On 01/22/2018 09:21 AM, Warren Kumari wrote:
> http://www.sss.gov works OK, but http://sss.gov always seems to return
> "The requested service is temporarily unavailable. It is either overloaded
> or under maintenance. Please try later.".

Inconsistency between related things is annoying.

I guess props for consistently returning different things.

> There is a fair bit of disagreement over if a bare domain should resolve
> / have a web-server listening, but ISTM that if you do, you should have
> it work

I agree that this (at the very least) violates (what I consider to be)
reasonable expectations and surprises users.

I'm of the opinion that if you have www.sss.gov and sss.gov, they
should behave in very similar and related ways.  My personal
preference would be for sss.gov to 30[1267] redirect to www.sss.gov.
Ideally, any non-HTTPS request would also be redirected to HTTPS.

> I wonder how many people have tried the bare domain and never realized
> that adding in the 'www' "fixes" it.

I expect that there are a lot more than we may think.



--
Grant. . . .
unix || die




Re: 9.11 can't validate sss.gov

2018-01-22 Thread Warren Kumari
Unrelated to the DNS bit, but still silly / annoying:

http://www.sss.gov works OK, but http://sss.gov always seems to return
"The requested service is temporarily unavailable. It is either
overloaded or under maintenance. Please try later.".

There is a fair bit of disagreement over if a bare domain should
resolve / have a web-server listening, but ISTM that if you *do*, you
should have it work -- I wonder how many people have tried the bare
domain and never realized that adding in the 'www' "fixes" it.

W

On Mon, Jan 22, 2018 at 11:08 AM, Timothy A. Holtzen wrote:
> I've informed the Selective Service (sss.gov) of the issue.  They have
> supposedly passed it on to their "web support group".  We will see if
> anything happens, but I'm not holding my breath.  At least a government
> agency should have more influence to get Qwest to fix their servers than
> I do.
>
> Timothy A. Holtzen
> Campus Network Administrator
> Nebraska Wesleyan University
> Public PGP key CFB4 3AE8 B726 DEBF 00D9  CCFC 426E 76AF DABC B3D7
>
>
> On 01/19/2018 05:04 PM, Mark Andrews wrote:
>> Yes, Qwest were informed years ago that their servers are broken. Report this
>> to the .gov site operators.  The servers return BADVERS to the queries, which
>> was never part of the EDNS spec and is an invention of the server's
>> developers. FORMERR was permissible by STD13, but this was tightened when
>> the EDNS spec was revised to say ignore unknown EDNS options.
>>



-- 
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf


Re: one domain not resolving via response-policy zone

2018-01-22 Thread lists
Hey Kai,

> If I do a nslookup for one of the otto.de domains I receive "** server
> can't find somehost.ov.otto.de: SERVFAIL"

The guideline behind the response-policy is that only an actual response gets 
rewritten.
This is usually an answer from a recursive lookup.
If you don't get an answer, there is nothing to rewrite.

The SERVFAIL won't be rewritten unless you told BIND to do so.

You could try setting the 'qname-wait-recurse' option to 'no'.
I guess this isn't the original purpose of this option, but based on the
documentation this should work for you.

From https://ftp.isc.org/isc/bind9/cur/9.11/doc/arm/Bv9ARM.ch06.html
> Using this option can cause error responses such as SERVFAIL
> to appear to be rewritten, since no recursion is being done to
> discover problems at the authoritative server.

Cheers
Felix

On 22.01.2018 13:58, Kai Wiechers wrote:

> Hi List,
> I set up a response-policy zone to override some records from external
> DNS servers I can't control.
> My db.rpz Zonefile:
> $TTL 4H
> @ IN SOA localhost. kai.mydomain.com. (
> 2018012212 ; serial
> 5M ; refresh
> 5M ; retry
> 4W ; expiry
> 5M) ; minimum
> IN NS localhost.
> localhost A 127.0.0.1
> ulf.test.google.de A 192.168.0.1
> gerd.test.google.de A 192.168.0.2
> bild.de A 192.168.0.3
> somehost.ov.otto.de A 10.0.0.1
> otherhost.ov.otto.de A 10.0.0.2
> heise.de A 192.168.0.4
> In my options I just added
> response-policy { zone "rpz"; };
> What really drives me crazy is that the override of the google and
> heise domains is working, but the otto.de domains are not.
> If I do an nslookup for one of the otto.de domains I receive "** server
> can't find somehost.ov.otto.de: SERVFAIL"
> Any hints for me?
> Thanks and best regards,
> Kai


Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Tony Finch
Brian J. Murrell  wrote:
>
> Yeah.  Must be disabled by default on EL7 I would guess, just because
> it's so noisy.

You should make sure it is enabled, because there are vital clues in those
log lines :-) Other categories you should check are `edns-disabled` (which
I already mentioned) and `resolver`.

> So, if lame servers were a problem with resolving ns[1-4].google.com,
> then I would see messages like in my previous message with a lame:n tag
> where n > 0, yes?

Yes, and you should track down when they occur and look for other error
indications around that time.

Tony.
-- 
f.anthony.n.finch    http://dotat.at/  -  I xn--zr8h punycode
Bailey, Fair Isle, Faeroes: Cyclonic, becoming south or southwest, 5 to 7,
increasing gale 8 at times. Rough or very rough, occasionally high at first.
Occasional rain or showers, squally later in Bailey. Good, occasionally poor.


Re: 9.11 can't validate sss.gov

2018-01-22 Thread Timothy A. Holtzen
I've informed the Selective Service (sss.gov) of the issue.  They have
supposedly passed it on to their "web support group".  We will see if
anything happens, but I'm not holding my breath.  At least a government
agency should have more influence to get Qwest to fix their servers than
I do.

Timothy A. Holtzen
Campus Network Administrator
Nebraska Wesleyan University
Public PGP key CFB4 3AE8 B726 DEBF 00D9  CCFC 426E 76AF DABC B3D7


On 01/19/2018 05:04 PM, Mark Andrews wrote:
> Yes, Qwest were informed years ago that their servers are broken. Report this
> to the .gov site operators.  The servers return BADVERS to the queries, which
> was never part of the EDNS spec and is an invention of the server's developers.
> FORMERR was permissible by STD13, but this was tightened when the EDNS spec
> was revised to say ignore unknown EDNS options.
>





Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Brian J. Murrell
On Mon, 2018-01-22 at 12:04 +, Tony Finch wrote:
> 
> The thing to look out for is the minutes before the outage starts - see
> what kind of failures you get.

So, taking this approach, looking for the first occurrence of just any
one of the names ns[1-4].google.com prior to the A/ queries that
are in http://brian.interlinx.bc.ca/named.run.log starting at:

19-Jan-2018 18:04:50.785 createfetch: ns1.google.com A

(which end up resulting in the SERVFAIL for www.google.com/IN/A) the
first previous occurrence of just any one of those names is:

19-Jan-2018 17:48:59.122 resquery 0x7f10102ecd50 (fctx 0x7f10102e5dc0(lh4.ggpht.com/)): response
19-Jan-2018 17:48:59.122 received packet:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id:   3024
;; flags: qr cd; QUESTION: 1, ANSWER: 0, AUTHORITY: 8, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;lh4.ggpht.com. IN  

;; AUTHORITY SECTION:
ggpht.com.  172800  IN  NS  ns2.google.com.
ggpht.com.  172800  IN  NS  ns1.google.com.
ggpht.com.  172800  IN  NS  ns3.google.com.
ggpht.com.  172800  IN  NS  ns4.google.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - 
CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 
20180124054922 20180117043922 46967 com. 
pjslTFtda4UfkpJtO9rbVmzSRQ+JslWRuBl/r0tkeyX4nBA8wjOIQjCH DJl+C6CA8TMW
lO9dfx5ZHM2s59N/XfQG3fp2N68bf3rhSp5OwUEVy205 
6LMbiiW7wjp0MEQOGorvf29kS6ApuZHGOseP5HQrAIBO4XxZvomAPME+ Q1c=
FGFB71PIIJ5JUGA7GFUQ06ANFUVDRKBA.com. 86400 IN NSEC3 1 1 0 - 
FGFGQ2SH7LNK03PV0R76S8B47TPVJK59 NS DS RRSIG
FGFB71PIIJ5JUGA7GFUQ06ANFUVDRKBA.com. 86400 IN RRSIG NSEC3 8 2 86400 
20180125052147 20180118041147 46967 com. 
DkAophVbTjntmUtcj2HIiigTv5yxlNuTIAGWgXY+W9QhAJp4UUYpqxOe jmyxVEUtfYqS
3ANVWz7EI+ucYS1CE8UKuWUx4eGAz8F/YbN/KA5cvxWO 
SEqri5Lg3W2MjiB/DXXFI/WrnmuLPNIQdDZD2H1lQ56CTUAL0pPpDby9 788=

;; ADDITIONAL SECTION:
ns2.google.com. 172800  IN  A   216.239.34.10
ns1.google.com. 172800  IN  A   216.239.32.10
ns3.google.com. 172800  IN  A   216.239.36.10
ns4.google.com. 172800  IN  A   216.239.38.10

I realize this query result has nothing to do with www.google.com, but
it is the first occurrence of any of the names ns[1-4].google.com
prior to the start of the subsequent SERVFAIL processing that starts at
18:04:50.785, and it's more than 10 minutes prior to the SERVFAIL.

That seems to indicate that nothing at all to do with any of the names
ns[1-4].google.com happens for more than 10 minutes before a SERVFAIL
is returned for www.google.com, right?  Nothing at all happens that
could result in any of those names being lame, right?

Cheers,
b.



Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Brian J. Murrell
On Mon, 2018-01-22 at 12:45 +, Tony Finch wrote:
> 
> They'll have a log category of edns-disabled.

But if the problem were EDNS, would it be so intermittent and always
fixable by rndc reload?

> But, looking through the code, if this is leading to lameness you will
> also get lame-servers log messages.

So just looking for lame servers will cover EDNS issues also then,
right?

> lame-servers is also a log category, and tends to be quite noisy about
> various problems :-)

Yeah.  Must be disabled by default on EL7 I would guess, just because
it's so noisy.

> The tagged values there are various kinds of things that happened when
> resolving; the lame: tag is a count of the lame servers that were
> encountered, including both newly discovered lame servers and cached lame
> servers.

So, if lame servers were a problem with resolving ns[1-4].google.com,
then I would see messages like in my previous message with a lame:n tag
where n > 0, yes?

Cheers,
b.



one domain not resolving via response-policy zone

2018-01-22 Thread Kai Wiechers
Hi List,

I set up a response-policy zone to override some records from external
DNS servers I can't control.

My db.rpz zone file:

$TTL 4H
@   IN  SOA localhost. kai.mydomain.com. (
        2018012212  ; serial
        5M          ; refresh
        5M          ; retry
        4W          ; expiry
        5M )        ; minimum
    IN  NS  localhost.

localhost             A    127.0.0.1

ulf.test.google.de    A    192.168.0.1
gerd.test.google.de   A    192.168.0.2
bild.de               A    192.168.0.3
somehost.ov.otto.de   A    10.0.0.1
otherhost.ov.otto.de  A    10.0.0.2
heise.de              A    192.168.0.4


In my options I just added

response-policy { zone "rpz"; };

What really drives me crazy is that the override of the google and
heise domains is working, but the otto.de domains are not.
If I do an nslookup for one of the otto.de domains I receive "** server
can't find somehost.ov.otto.de: SERVFAIL"

Any hints for me?

Thanks and best regards,
Kai

Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Tony Finch
Brian J. Murrell  wrote:
>
> What do EDNS problem messages look like?  Just something to grep for I
> mean.

They'll have a log category of edns-disabled. But, looking through the
code, if this is leading to lameness you will also get lame-servers log
messages.

> > or lame-servers complaints

lame-servers is also a log category, and tends to be quite noisy about
various problems :-)

> Does the "lame:1" in this message indicate lameness:
>
> 18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for 149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success [domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,timeout:0,lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

The tagged values there are various kinds of things that happened when
resolving; the lame: tag is a count of the lame servers that were
encountered, including both newly discovered lame servers and cached lame
servers. The other error-related numbers are worth paying attention to, I
think - timeout, neterr, badresp, adberr, findfail, valfail.
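
For anyone who wants to pull those counters out of a log mechanically, here is a minimal Python sketch. It assumes the "fetch completed at resolver.c" line format shown in this thread, with the bracketed key:value list at the end of the line. The function names are made up for illustration and are not part of BIND.

```python
import re

# Matches the bracketed counter list at the end of a "fetch completed" line.
TAGS_RE = re.compile(r"\[([^\]]*)\]\s*$")

# The error-related counters named in this thread.
ERROR_TAGS = ("timeout", "lame", "neterr", "badresp", "adberr", "findfail", "valfail")


def parse_fetch_tags(line):
    """Return the bracketed key:value tags of a log line as a dict, or None."""
    if "fetch completed at resolver.c" not in line:
        return None
    match = TAGS_RE.search(line)
    if match is None:
        return None
    tags = {}
    for item in match.group(1).split(","):
        key, _, value = item.partition(":")
        # Counters become integers; "domain" stays a string.
        tags[key] = int(value) if value.isdigit() else value
    return tags


def nonzero_errors(tags):
    """Return only the error-related counters that are greater than zero."""
    return {k: tags[k] for k in ERROR_TAGS
            if isinstance(tags.get(k), int) and tags[k] > 0}


# The example line quoted above, joined back into a single line.
sample = (
    "18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for "
    "149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success "
    "[domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,timeout:0,"
    "lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]"
)
print(nonzero_errors(parse_fetch_tags(sample)))
```

Running the same two functions over every line of named.run and printing only the non-empty results should show quickly whether any fetch hit a lame server, a timeout, or a validation failure.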

Tony.
-- 
f.anthony.n.finch    http://dotat.at/  -  I xn--zr8h punycode
Tyne, Dogger: West, backing south, 5 or 6, occasionally 7 later. Slight or
moderate, occasionally rough. Occasional rain or showers. Good, occasionally
poor.


Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Brian J. Murrell
On Mon, 2018-01-22 at 12:04 +, Tony Finch wrote:
> 
> That indicates that it has already marked the servers as lame, so the
> packet trace isn't going to tell you what caused the lameness.

OK.

> The thing to look out for is the minutes before the outage starts - see
> what kind of failures you get.
> 
> Also, check the logs for EDNS

What do EDNS problem messages look like?  Just something to grep for I
mean.

> or lame-servers complaints

Does the "lame:1" in this message indicate lameness:

18-Jan-2018 11:12:47.103 fetch completed at resolver.c:3074 for 149.243.194.103.in-addr.arpa/PTR in 0.000744: failure/success [domain:243.194.103.in-addr.arpa,referral:0,restart:1,qrysent:0,timeout:0,lame:1,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

Of course, that one is irrelevant to my situation, I'm just using it as
an example of how to find lame delegations.

Cheers,
b.



Re: intermittent SERVFAIL for high visible domains such as *.google.com

2018-01-22 Thread Tony Finch
Brian J. Murrell  wrote:
>
> that demonstrates how BIND is getting .com referrals from the root
> servers when doing a query for www.google.com and then doing nothing
> with those referrals before returning a SERVFAIL.

That indicates that it has already marked the servers as lame, so the
packet trace isn't going to tell you what caused the lameness.

The thing to look out for is the minutes before the outage starts - see
what kind of failures you get.

Also, check the logs for EDNS or lame-servers complaints before an outage
starts, which I hope will give you a better idea of how long the problem
is (e.g. start off around the 10 minute mark suggested by the lame-ttl
setting).
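
A rough Python sketch of that search, in case it helps. It assumes the timestamp format seen in the log excerpts in this thread (for example "19-Jan-2018 18:04:50.785") and an English locale for the month abbreviation. The helper names and the default 10 minute window are illustrative only; adjust the window to your own lame-ttl.

```python
import re
from datetime import datetime, timedelta

TS_FORMAT = "%d-%b-%Y %H:%M:%S.%f"  # %b needs an English/C locale for "Jan"
TS_LENGTH = len("19-Jan-2018 18:04:50.785")


def parse_timestamp(line):
    """Parse the leading timestamp of a log line, or return None."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def lines_before(lines, outage_start, pattern, window=timedelta(minutes=10)):
    """Yield lines matching pattern and stamped within window before outage_start."""
    matcher = re.compile(pattern)
    earliest = outage_start - window
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # continuation lines (packet dumps etc.) carry no timestamp
        if earliest <= stamp < outage_start and matcher.search(line):
            yield line


# Two lines taken from the log excerpts posted in this thread.
log = [
    "19-Jan-2018 17:48:59.122 resquery 0x7f10102ecd50 "
    "(fctx 0x7f10102e5dc0(lh4.ggpht.com/)): response",
    "19-Jan-2018 18:04:50.785 createfetch: ns1.google.com A",
]
outage = parse_timestamp(log[1])

print(list(lines_before(log, outage, r"ggpht|google")))
print(list(lines_before(log, outage, r"ggpht|google", timedelta(minutes=20))))
```

With the default window the first call finds nothing, and widening the window to 20 minutes picks up the 17:48:59 resquery line. The pattern is an ordinary regular expression, so it can be swapped for whatever text your lame-servers or EDNS messages contain.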

Good luck :-)

Tony.
-- 
f.anthony.n.finch    http://dotat.at/  -  I xn--zr8h punycode
South Utsire, Northeast Forties: Southerly or southeasterly 5 to 7, increasing
gale 8 at times. Moderate or rough, occasionally very rough in South Utsire.
Occasional rain. Good, occasionally poor.