Re: named daemon hangs

2009-05-04 Thread Adam Tkac
On Sat, May 02, 2009 at 04:06:18PM +0100, Nelson Vale wrote:
 Hi all,
 
 
 I've been facing a problem in my private network which I have not been able
 to fix yet.
 
 On my gateway (a Debian-like Linux) I have BIND 9.5 installed and running,
 and I have one IPSec tunnel to another gateway over the internet. It also
 has a forward zone configured whose name server is the other gateway's
 internal address (accessible through the IPSec tunnel only).
 
 Recently the other IPSec endpoint was shut down and, of course, my queries to
 the forward domain started failing. Nothing strange here...
 
 The real problem is that I was suddenly unable to resolve any other DNS
 queries, like www.google.com, from inside my network:
 
 host www.google.com
 ;; connection timed out; no servers could be reached
 
 I took a look at the named daemon and I see that it does not respond to
 anything as long as the IPSec tunnel is down, but only if it's the other
 endpoint that is down. I've tried stopping my endpoint, and the problem does
 not occur as long as I restart named. I think this happens because, as long
 as my endpoint is up, the routes to the other endpoint are set and named
 tries to query the forward domain's name server. The problem is that the
 queries do not time out and named hangs there:

Please check this:
- https://bugzilla.redhat.com/show_bug.cgi?id=427629
- http://lkml.org/lkml/2007/12/4/260
- http://lkml.org/lkml/2008/4/17/474

$ echo 1 > /proc/sys/net/core/xfrm_larval_drop

should help you.
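
For reference, the same knob can be inspected and set from a script. The sketch below is only an illustration under assumptions, not part of the original advice: the helper names read_larval_drop and enable_larval_drop are made up here, and it assumes a Linux kernel that exposes /proc/sys/net/core/xfrm_larval_drop and a process running as root. Per the linked kernel threads, setting the value to 1 should make the kernel drop packets while the IPsec SA is still being negotiated instead of holding up the sender. The usual shell equivalent is `sysctl -w net.core.xfrm_larval_drop=1`, and a line `net.core.xfrm_larval_drop = 1` in /etc/sysctl.conf should keep the setting across reboots.

```python
from pathlib import Path

# Kernel knob named in the message above. The path is a parameter so the
# helpers can also be exercised against an ordinary file.
LARVAL_DROP = Path("/proc/sys/net/core/xfrm_larval_drop")


def read_larval_drop(path: Path = LARVAL_DROP) -> int:
    """Return the current value of the setting (0 or 1)."""
    return int(path.read_text().strip())


def enable_larval_drop(path: Path = LARVAL_DROP) -> bool:
    """Write 1 to the setting if needed; return True if a write happened."""
    if read_larval_drop(path) == 1:
        return False
    path.write_text("1\n")  # needs root when path is the real /proc file
    return True
```

Calling enable_larval_drop() as root and then read_larval_drop() should report 1 on a kernel that has this setting.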

Adam

-- 
Adam Tkac, Red Hat, Inc.
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: named daemon hangs

2009-05-04 Thread Nelson Vale
Hi,

Thank you all for your help. This fix surely made the difference :).

echo 1 > /proc/sys/net/core/xfrm_larval_drop


Nelson Vale



Re: named daemon hangs

2009-05-02 Thread Barry Margolin
I recall a thread about a similar problem a year or two ago. I suggest
you search the comp.protocols.dns.bind archives in Google Groups.

-- 
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***


Re: named daemon hangs

2009-05-02 Thread Jonathan Petersson
Could you please provide a copy of your config? I'm guessing that you
have a general forwarder in place or haven't turned on recursion.

/Jonathan

On Sat, May 2, 2009 at 8:06 AM, Nelson Vale nelsonduv...@gmail.com wrote:
 Hi all,


 I've been facing a problem in my private network which I have not been able
 to fix yet.

 On my gateway (a Debian-like Linux) I have BIND 9.5 installed and running,
 and I have one IPSec tunnel to another gateway over the internet. It also
 has a forward zone configured whose name server is the other gateway's
 internal address (accessible through the IPSec tunnel only).

 Recently the other IPSec endpoint was shut down and, of course, my queries to
 the forward domain started failing. Nothing strange here...

 The real problem is that I was suddenly unable to resolve any other DNS
 queries, like www.google.com, from inside my network:

 host www.google.com
 ;; connection timed out; no servers could be reached

 I took a look at the named daemon and I see that it does not respond to
 anything as long as the IPSec tunnel is down, but only if it's the other
 endpoint that is down. I've tried stopping my endpoint, and the problem does
 not occur as long as I restart named. I think this happens because, as long
 as my endpoint is up, the routes to the other endpoint are set and named
 tries to query the forward domain's name server. The problem is that the
 queries do not time out and named hangs there:

 The configuration I have is:

 Bind: BIND 9.5.0-P2
 IP Address (private): 192.168.9.254
 Forwarders: ADSL provider (2 forwarders)
 Forward Zone: mylan.loc
 Name Server:192.168.90.254


 After it starts, if I try to query one of the forward zone's records
 (box.mylan.loc), it displays:

 ...
 02-May-2009 14:22:21.843 socket 0xb7bd5548: dispatch_recv:  event 0xb7be3d28
 - task 0xb7b74d18
 02-May-2009 14:22:21.844 socket 0xb7bd5548: internal_recv: task 0xb7b74d18
 got event 0xb7bd559c
 02-May-2009 14:22:21.844 socket 0xb7bd5548 192.168.9.2#47869: packet
 received correctly
 02-May-2009 14:22:21.844 socket 0xb7bd5548: processing cmsg 0xb7bb2120
 02-May-2009 14:22:21.844 client 192.168.9.2#47869: UDP request
 02-May-2009 14:22:21.844 client 192.168.9.2#47869: using view '_default'
 02-May-2009 14:22:21.845 client 192.168.9.2#47869: request is not signed
 02-May-2009 14:22:21.845 client 192.168.9.2#47869: recursion available
 02-May-2009 14:22:21.845 client 192.168.9.2#47869: query
 02-May-2009 14:22:21.845 client 192.168.9.2#47869: ns_client_attach: ref = 1
 02-May-2009 14:22:21.845 client 192.168.9.2#47869: query (cache)
 'box.mylan.loc/A/IN' approved
 02-May-2009 14:22:21.845 client 192.168.9.2#47869: replace
 02-May-2009 14:22:21.845 clientmgr @0xb7baa608: createclients
 02-May-2009 14:22:21.846 clientmgr @0xb7baa608: recycle
 02-May-2009 14:22:21.846 createfetch: box.mylan.loc A
 02-May-2009 14:22:21.846 fctx 0xb7bae408(box.mylan.loc/A'): create
 02-May-2009 14:22:21.846 fctx 0xb7bae408(box.mylan.loc/A'): join
 02-May-2009 14:22:21.846 fetch 0xb7bb4148 (fctx
 0xb7bae408(box.mylan.loc/A)): created
 02-May-2009 14:22:21.846 client @0xb7bda008: udprecv
 02-May-2009 14:22:21.846 socket 0xb7bd5548: socket_recv: event 0xb7bd4b48 -
 task 0xb7bb1690
 02-May-2009 14:22:21.847 fctx 0xb7bae408(box.mylan.loc/A'): start
 02-May-2009 14:22:21.847 fctx 0xb7bae408(box.mylan.loc/A'): try
 02-May-2009 14:22:21.847 fctx 0xb7bae408(box.mylan.loc/A'): cancelqueries
 02-May-2009 14:22:21.847 fctx 0xb7bae408(box.mylan.loc/A'): getaddresses
 02-May-2009 14:22:21.847 findaddrinfo: new entry 0xb7aec4a0
 02-May-2009 14:22:21.847 fctx 0xb7bae408(box.mylan.loc/A'): query
 02-May-2009 14:22:21.848 socket 0xb7b79938: created
 02-May-2009 14:22:21.848 socket 0xb7b79938 0.0.0.0#43841: bound
 02-May-2009 14:22:21.848 dispatchmgr 0xb7bbb168: created UDP dispatcher
 0xb7b6d378
 02-May-2009 14:22:21.848 dispatch 0xb7b6d378: created task 0xb7b74d70
 02-May-2009 14:22:21.848 dispatch 0xb7b6d378: created socket 0xb7b79938
 02-May-2009 14:22:21.848 resquery 0xb7b80008 (fctx
 0xb7bae408(box.mylan.loc/A)): send
 02-May-2009 14:22:21.849 dispatch 0xb7b6d378 response 0xb7ba7848
 192.168.90.254#53: attached to task 0xb7b6f2c8
 02-May-2009 14:22:21.849 socket 0xb7b79938: socket_recv: event 0xb7b81698 -
 task 0xb7b74d70


 and it hangs here forever. Even if I restart the named server it does not
 respond to any of my queries. If I stop the named server with Ctrl + C it
 displays:

 ...
 ^C02-May-2009 14:23:46.773 socket.c:1226: unexpected error:
 02-May-2009 14:23:46.773 internal_send: 192.168.90.254#53: Interrupted
 system call should be restarted
 02-May-2009 14:23:46.774 errno2result.c:111: unexpected error:
 02-May-2009 14:23:46.774 unable to convert errno to isc_result: 85:
 Interrupted system call should be restarted
 02-May-2009 14:23:46.774 resquery 0xb7b80008 (fctx
 0xb7bae408(box.mylan.loc/A)): sent
 02-May-2009 14:23:46.774 resquery 0xb7b80008 (fctx
 0xb7bae408(box.mylan.loct/A)): senddone
 02-May-2009 14:23:46.774 fctx 

Re: named daemon hangs

2009-05-02 Thread Nelson Vale
On Sat, May 2, 2009 at 9:39 PM, Jonathan Petersson jpeters...@garnser.se wrote:

 Could you please provide a copy of your config, I'm guessing that you
 have a general forwarder in place or haven't turned on recursion.


The options and the forward zone are as follows:
acl internal {
    127.0.0.1/8;
    192.168.9.0/24;
};
options {
    directory "/etc/namedb";
    pid-file "/var/run/named.pid";
    statistics-file "/var/run/named.stats";

    forwarders {
        x.x.x.x;  // ISP DNS server
        x.x.x.x;  // ISP DNS server
    };
    forward first;
    max-transfer-time-in 120;
    max-transfer-time-out 120;
    transfer-format many-answers;
};
zone "mylan.loc" {
    type forward;
    forwarders {
        192.168.90.254;
    };
};
zone "anothernet.no-ip.org" {
    type master;
    file "anothernet.no-ip.org";

    allow-query {
        internal;
    };

    allow-transfer {
        none;
    };

    allow-update {
        none;
    };
};
zone "9.168.192.IN-ADDR.ARPA" {
    type master;
    file "another.no-ip.org.rev";

    allow-query {
        internal;
    };

    allow-transfer {
        none;
    };

    allow-update {
        none;
    };
};
...




Re: named daemon hangs

2009-05-02 Thread Mark Andrews

This is a bug in the kernel where it does not honour that
the socket is set to non-blocking mode but instead blocks.
Go complain to your OS vendor.
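
To illustrate what non-blocking mode is supposed to guarantee, here is a minimal sketch (an illustration only, not BIND's code) that sends one UDP datagram on a non-blocking socket and times the call. On a kernel that honours the flag, the call should return at once: it either queues the datagram or fails with EAGAIN/EWOULDBLOCK. The function name try_send and the payload are arbitrary placeholders.

```python
import socket
import time


def try_send(dest, payload=b"ping"):
    """Send one UDP datagram on a non-blocking socket.

    Returns (outcome, seconds), where outcome is "sent", "would_block",
    or "error: ...". When non-blocking semantics are honoured, seconds
    stays tiny whatever the outcome.
    """
    start = time.monotonic()
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setblocking(False)  # sets O_NONBLOCK on the descriptor
            sock.sendto(payload, dest)
        outcome = "sent"
    except BlockingIOError:
        outcome = "would_block"  # EAGAIN/EWOULDBLOCK: caller retries later
    except OSError as exc:
        outcome = f"error: {exc}"  # e.g. network unreachable
    return outcome, time.monotonic() - start
```

Run on the affected gateway as try_send(("192.168.90.254", 53)), a call that takes seconds or never returns would be consistent with the blocking behaviour described above.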

Mark
