Stumped - SERVFAIL vs NOERROR?

2011-04-27 Thread Karl Auer
Hi all.

Well, I'm stumped.

This is causing non-delivery of mail for the affected domain because it
is blocking fallback from IPv6 to IPv4 for the domain. The problem
smells like misconfigured IPv6 somewhere along the way, but all the
servers involved (that have IPv6 addresses) seem to be answering OK.

Using our local caching, recursive BIND9 nameservers, we get SERVFAIL on
a particular domain, namely mailergoat.rsi.co.jp. But from other
places, we get NOERROR (which is the correct answer, because there is an
A record with that name). However, from some places outside our network
we also get SERVFAIL.

Traces (using the +trace option to dig) are identical regardless of
where we do them, besides some reordering of the nameserver results,
which is normal.

One oddity (at least it seems odd to me) is that a trace ends with two
nameservers, gtm1.rsi.co.jp and gtm2.rsi.co.jp, that are not present in
the nameserver list for rsi.co.jp, meaning that the domain
mailergoat.rsi.co.jp has been delegated to them. When I ask either of
those servers directly for the nameserver records for
mailergoat.rsi.co.jp, I get NOERROR, but no answer. Asking those servers
for ANY records for that name shows an A record and a TXT (SPF) record
only. That makes this a lame delegation - but why do some recursive
nameservers report it as SERVFAIL and some as NOERROR? A difference
between nameservers, or nameserver versions?

Any ideas gratefully received. See below for dig outputs demonstrating
the above statements.

Regards, K.

dmz-rz-ap:[~]$ dig mailergoat.rsi.co.jp AAAA

; <<>> DiG 9.6.1-P3 <<>> mailergoat.rsi.co.jp AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 772
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mailergoat.rsi.co.jp.  IN  AAAA

;; Query time: 582 msec
;; SERVER: 129.132.98.12#53(129.132.98.12)
;; WHEN: Wed Apr 27 13:09:43 2011
;; MSG SIZE  rcvd: 38

But from other places, we get NOERROR (which is the correct answer,
because there is an A record with that name). This via Google DNS:

dns2-rz-ap:[log]$ dig mailergoat.rsi.co.jp AAAA @8.8.8.8

; <<>> DiG 9.2.4 <<>> mailergoat.rsi.co.jp AAAA @8.8.8.8
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 518
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;mailergoat.rsi.co.jp.  IN  AAAA

;; AUTHORITY SECTION:
rsi.co.jp.  60  IN  SOA gtm1.rsi.co.jp. hostmaster.gtm1.rsi.co.jp. 31 10800 3600 604800 60

;; Query time: 523 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Apr 27 13:10:07 2011
;; MSG SIZE  rcvd: 90

Note that there *is* an A record with that name:

dmz-rz-ap:[~]$ dig mailergoat.rsi.co.jp

; <<>> DiG 9.6.1-P3 <<>> mailergoat.rsi.co.jp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1627
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;mailergoat.rsi.co.jp.  IN  A

;; ANSWER SECTION:
mailergoat.rsi.co.jp.   600 IN  A   202.214.41.103

;; AUTHORITY SECTION:
mailergoat.rsi.co.jp.   260 IN  NS  gtm2.rsi.co.jp.
mailergoat.rsi.co.jp.   260 IN  NS  gtm1.rsi.co.jp.

;; ADDITIONAL SECTION:
gtm1.rsi.co.jp. 600 IN  A   202.214.41.51
gtm2.rsi.co.jp. 600 IN  A   202.25.214.15

;; Query time: 592 msec
;; SERVER: 129.132.98.12#53(129.132.98.12)
;; WHEN: Wed Apr 27 13:14:56 2011
;; MSG SIZE  rcvd: 124


But from some places outside our network we also get SERVFAIL:

kauer@karl:~$ dig mailergoat.rsi.co.jp AAAA

; <<>> DiG 9.7.1-P2 <<>> mailergoat.rsi.co.jp AAAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 3850
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;mailergoat.rsi.co.jp.  IN  AAAA

;; Query time: 544 msec
;; SERVER: 192.168.1.35#53(192.168.1.35)
;; WHEN: Wed Apr 27 21:09:40 2011
;; MSG SIZE  rcvd: 38

The following sequence of three digs shows that when I ask the
reportedly authoritative servers directly about this name, they can and
do answer correctly. It's only when the query recurses that SERVFAIL
shows up:

kauer@karl:~$ dig @gtm1.rsi.co.jp mailergoat.rsi.co.jp AAAA

; <<>> DiG 9.7.1-P2 <<>> @gtm1.rsi.co.jp mailergoat.rsi.co.jp AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43306
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;mailergoat.rsi.co.jp.  IN  AAAA

;; AUTHORITY SECTION:
rsi.co.jp.  60  IN  SOA gtm1.rsi.co.jp. hostmaster.gtm1.rsi.co.jp. 31 10800 3600 604800 60

;; Query time: 272 msec
;; SERVER: 202.214.41.51#53(202.214.41.51)
;; WHEN: Wed Apr 27 21:40:09 2011
;; MSG SIZE  rcvd: 90

kauer@karl:~$ dig @gtm2.rsi.co.jp  

Re: Stumped - SERVFAIL vs NOERROR?

2011-04-27 Thread Mark Andrews

In message 1303906294.2246.93.camel@karl, Karl Auer writes:
 
> Hi all.
>
> Well, I'm stumped.
>
> This is causing non-delivery of mail for the affected domain because it
> is blocking fallback from IPv6 to IPv4 for the domain. The problem
> smells like misconfigured IPv6 somewhere along the way, but all the
> servers involved (that have IPv6 addresses) seem to be answering OK.

The SMTP server will be failing on the MX lookup if it is following
the RFCs.  A and AAAA should only be looked up after getting a
NODATA response to an MX query.
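
The lookup order described above can be sketched as follows. This is a minimal illustration of the MX-first rule (RFC 5321, section 5.1), not the code of any particular MTA; the `Status` enum, the `lookup` callback and the `mail_hosts` function are hypothetical names introduced only for this example.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Status(Enum):
    ANSWER = "answer"      # NOERROR with at least one record
    NODATA = "nodata"      # NOERROR, name exists, no record of that type
    NXDOMAIN = "nxdomain"  # name does not exist
    SERVFAIL = "servfail"  # resolver could not get a usable answer


# Hypothetical resolver callback: (name, rrtype) -> (status, records)
Lookup = Callable[[str, str], Tuple[Status, List[str]]]


def mail_hosts(domain: str, lookup: Lookup) -> Tuple[str, List[str]]:
    """Decide where mail for `domain` should go, asking for MX first."""
    status, records = lookup(domain, "MX")
    if status is Status.ANSWER:
        return "deliver", records
    if status is Status.NODATA:
        # Implicit MX: the domain itself is treated as the mail host,
        # and only now would its A/AAAA records be looked up.
        return "deliver", [domain]
    if status is Status.NXDOMAIN:
        return "bounce", []
    # SERVFAIL (or anything else) is a temporary failure. The A and AAAA
    # records are never consulted, even if they would resolve.
    return "defer", []
```

With a resolver that returns SERVFAIL for the MX query, the function returns "defer" without ever looking at the A record, which matches the non-delivery reported in this thread.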

> Using our local caching, recursive BIND9 nameservers, we get SERVFAIL on
> a particular domain, namely mailergoat.rsi.co.jp. But from other
> places, we get NOERROR (which is the correct answer, because there is an
> A record with that name). However, from some places outside our network
> we also get SERVFAIL.

The nameservers for mailergoat.rsi.co.jp are broken.  They return
the *wrong* SOA record in the response which can clearly be seen at
the end of a dig +trace mailergoat.rsi.co.jp mx.

mailergoat.rsi.co.jp.   600 IN  NS  gtm1.rsi.co.jp.
mailergoat.rsi.co.jp.   600 IN  NS  gtm2.rsi.co.jp.
;; Received 108 bytes from 202.248.0.34#53(ns.center.web.ad.jp) in 304 ms

rsi.co.jp.  60  IN  SOA gtm1.rsi.co.jp. hostmaster.gtm1.rsi.co.jp. 31 10800 3600 604800 60
;; Received 90 bytes from 202.25.214.15#53(gtm2.rsi.co.jp) in 395 ms

The correct SOA record would be mailergoat.rsi.co.jp 60 IN SOA
gtm1.rsi.co.jp. hostmaster.gtm1.rsi.co.jp. 31 10800 3600 604800 60
all other things being equal.

> Traces (using the +trace option to dig) are identical regardless of
> where we do them, besides some reordering of the nameserver results,
> which is normal.
>
> One oddity (at least it seems odd to me) is that a trace ends with two
> nameservers, gtm1.rsi.co.jp and gtm2.rsi.co.jp, that are not present in
> the nameserver list for rsi.co.jp, meaning that the domain
> mailergoat.rsi.co.jp has been delegated to them. When I ask either of
> those servers directly for the nameserver records for
> mailergoat.rsi.co.jp, I get NOERROR, but no answer. Asking those servers
> for ANY records for that name shows an A record and a TXT (SPF) record
> only. That makes this a lame delegation - but why do some recursive
> nameservers report it as SERVFAIL and some as NOERROR? A difference
> between nameservers, or nameserver versions?

Different tolerances for errors.
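
As a rough illustration of what "different tolerances" could mean here, the sketch below contrasts two hypothetical acceptance policies for the SOA owner name in a negative response. Neither is taken from the source of BIND or of any other resolver; `accept_nodata` and its `strict` flag are assumptions made up for this example.

```python
from typing import List


def labels(name: str) -> List[str]:
    """Split a domain name into lowercase labels; the root yields []."""
    stripped = name.rstrip(".").lower()
    return stripped.split(".") if stripped else []


def is_ancestor_or_self(ancestor: str, name: str) -> bool:
    """True if `ancestor` equals `name` or is a parent domain of it."""
    a, n = labels(ancestor), labels(name)
    return len(a) <= len(n) and n[len(n) - len(a):] == a


def accept_nodata(expected_zone: str, soa_owner: str, qname: str,
                  strict: bool) -> bool:
    """Decide whether to believe a negative response.

    strict:  the SOA owner must be exactly the zone the resolver was
             referred to.
    lenient: any SOA owner at or above the query name is accepted.
    """
    if strict:
        return labels(soa_owner) == labels(expected_zone)
    return is_ancestor_or_self(soa_owner, qname)
```

For the response in this thread (expected zone mailergoat.rsi.co.jp, SOA owner rsi.co.jp), the strict policy rejects the reply and the lenient one accepts it. A resolver that rejects the reply from every listed server has nothing left to return but SERVFAIL.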

Adding an MX record here will help.  One really shouldn't be depending
upon the implicit MX records generated from the A and AAAA records.

> Any ideas gratefully received. See below for dig outputs demonstrating
> the above statements.
>
> Regards, K.
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Stumped - SERVFAIL vs NOERROR?

2011-04-27 Thread Tony Finch
Karl Auer ka...@biplane.com.au wrote:

> Using our local caching, recursive BIND9 nameservers, we get SERVFAIL on
> a particular domain, namely mailergoat.rsi.co.jp. But from other
> places, we get NOERROR (which is the correct answer, because there is an
> A record with that name). However, from some places outside our network
> we also get SERVFAIL.

The name servers for the zone mailergoat.rsi.co.jp are broken. They return
a nodata response with the wrong authority for all non-A non-TXT queries.
The SOA record owner name in the authority section of the reply should be
mailergoat.rsi.co.jp not rsi.co.jp. BIND requires that the SOA owner name
in a nodata response matches the zone name that BIND is expecting. This is
part of the logic it uses to tell the difference between various kinds of
negative responses (as in RFC 2308).
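
To make the RFC 2308 distinction concrete, here is a simplified sketch of how a resolver might sort empty replies into referrals and NODATA responses by looking at the authority section. It follows the cases in RFC 2308 section 2.2 but ignores CNAME chains and is not BIND's actual logic; `RR`, `classify` and `soa_owner` are hypothetical names for this example.

```python
from typing import List, NamedTuple, Optional


class RR(NamedTuple):
    owner: str   # owner name of the record
    rrtype: str  # record type, e.g. "NS" or "SOA"


def classify(rcode: str, answer: List[RR], authority: List[RR]) -> str:
    """Classify a reply by its rcode and section contents."""
    if rcode == "NXDOMAIN":
        return "NXDOMAIN"
    if rcode != "NOERROR":
        return "ERROR"
    if answer:
        return "ANSWER"
    has_soa = any(rr.rrtype == "SOA" for rr in authority)
    has_ns = any(rr.rrtype == "NS" for rr in authority)
    if has_ns and not has_soa:
        # NS records without an SOA: a delegation to follow.
        return "REFERRAL"
    # SOA present (with or without NS), or an empty authority section.
    return "NODATA"


def soa_owner(authority: List[RR]) -> Optional[str]:
    """Return the owner name of the first SOA in the authority section."""
    for rr in authority:
        if rr.rrtype == "SOA":
            return rr.owner
    return None
```

Applied to the trace in this thread, the reply from ns.center.web.ad.jp (two NS records for mailergoat.rsi.co.jp) classifies as a referral, and the reply from gtm2.rsi.co.jp (a lone SOA) classifies as NODATA. The SOA owner in that second reply is rsi.co.jp, not the zone the referral named, and that mismatch is what a strict resolver objects to.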

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
Rockall, Malin, Hebrides: South 5 to 7, occasionally gale 8 at first in
Rockall and Malin, veering west or northwest 4 or 5, then backing southwest 5
or 6 later. Rough or very rough. Occasional rain. Moderate or good,
occasionally poor.