Re: [Pdns-users] 3.4.8 -> 4.0.1: Exiting because communicator thread died with STL error: stou

2016-09-15 Thread Oliver Peter
On Thu, Sep 15, 2016 at 01:02:38PM +0300, cmouse wrote:
> Make sure it's a number or NULL (not "")

You are a genius, thank you very much!

mysql> select notified_serial from domains where notified_serial is not null order by notified_serial asc limit 10;
+-----------------+
| notified_serial |
+-----------------+
|       -93865895 |
|               4 |
|      2006013100 |
|      2006050100 |
|      2006061900 |
|      2006061900 |
|      2006090500 |
|      2006090500 |
|      2006090500 |
|      2006091800 |
+-----------------+
10 rows in set (1.04 sec)

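mysql> select * from domains where notified_serial = -93865895;
+--------+--------+---------------+------------+--------+-----------------+---------+
| id     | name   | master        | last_check | type   | notified_serial | account |
+--------+--------+---------------+------------+--------+-----------------+---------+
| 845349 | xxx.de | xxx.xxx.xxx.x |       NULL | MASTER |       -93865895 | NULL    |
+--------+--------+---------------+------------+--------+-----------------+---------+
1 row in set (1.71 sec)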

mysql> update domains set notified_serial = NULL where id = 845349 limit 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Which has fixed the problem.
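
For anyone hitting the same error, the check above can be generalised to list
every row whose notified_serial falls outside the unsigned 32-bit range. The
following is only a minimal sketch, not part of any PowerDNS tooling: it uses
an in-memory SQLite database as a stand-in for the real MySQL one, and the
helper names (find_bad_serials, demo) and the sample rows are made up for
illustration. Only the table and column names are taken from the queries
above.

```python
import sqlite3

# Largest value that fits in an unsigned 32-bit serial.
MAX_SERIAL = 2**32 - 1

# Table and column names follow the queries in this thread.
FIND_BAD_SERIALS = """
    SELECT id, name, notified_serial
    FROM domains
    WHERE notified_serial IS NOT NULL
      AND (notified_serial < 0 OR notified_serial > ?)
    ORDER BY id
"""


def find_bad_serials(conn):
    """Return (id, name, notified_serial) rows with a serial outside 0..2**32-1."""
    return conn.execute(FIND_BAD_SERIALS, (MAX_SERIAL,)).fetchall()


def demo():
    # Assumption: an in-memory SQLite table stands in for the MySQL database,
    # and the three rows below are invented sample data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domains ("
        "id INTEGER PRIMARY KEY, name TEXT, notified_serial INTEGER)"
    )
    conn.executemany(
        "INSERT INTO domains (id, name, notified_serial) VALUES (?, ?, ?)",
        [
            (1, "good.example", 2006013100),  # plausible serial, kept
            (2, "unset.example", None),       # NULL is fine, kept
            (3, "bad.example", -93865895),    # negative, should be reported
        ],
    )
    return find_bad_serials(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Against a real MySQL server the SELECT itself should work unchanged; only the
placeholder style depends on the driver in use (many MySQL drivers expect %s
rather than ?).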

Pieter:
I created the following issue:
https://github.com/PowerDNS/pdns/issues/4475


-- 
Oliver PETER   oli...@gfuzz.de   0x456D688F
___
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users


Re: [Pdns-users] 3.4.8 -> 4.0.1: Exiting because communicator thread died with STL error: stou

2016-09-15 Thread Oliver Peter
On Thu, Sep 15, 2016 at 12:02:39PM +0300, cmouse wrote:
> I ran into this same issue. In my case it was last_notified_serial in the
> domains table. Maybe it is a similar issue for you?

What do you mean?
last_notified_serial is neither a table nor a column in the domains table,
perhaps you mean domains.notified_serial?

https://github.com/PowerDNS/pdns/blob/master/modules/gmysqlbackend/schema.mysql.sql#L7

Anyway, what did you do exactly to fix your problem?


-- 
Oliver PETER   oli...@gfuzz.de   0x456D688F


Re: [Pdns-users] 3.4.8 -> 4.0.1: Exiting because communicator thread died with STL error: stou

2016-09-15 Thread Oliver Peter
Hi Pieter,

Thanks for your reply.

On Thu, Sep 15, 2016 at 09:12:43AM +0200, Pieter Lexis wrote:
> On Thu, 15 Sep 2016 09:05:31 +0200
> Oliver Peter  wrote:
> > During the update process from our 3.4.8 servers to 4.0.1 we encountered
> > a dying/looping pdns instance.  3.4.x has been stable for the last
> > ~6 months.
> > We already moved 4 of our 5 auth NS to 4.0.1, all of them are running
> > FreeBSD 10, all of them are working fine as expected.
> > 
> > Today we upgraded our last instance and this one showed us a strange
> > error (Murphy's law) so we had to downgrade to 3.4.8.  The service comes
> > up OK, serves a couple of requests, dies, and comes up again, etc:
> > 
> > 
> > Basically the machines are running almost the same config (except IP
> > settings of course) and serving almost the same zone database (~2mio
> > domains, ~20mio records).
> > 
> > On the same machine we have another pdns instance running, same
> > binaries, slightly fewer zones/records, different config profile - this one
> > was pretty stable.
> > 
> > Any hints appreciated.
> 
> We became a little more strict on database content in 4.0.0. I would suggest 
> running `pdnsutil check-all-zones` to see which record causes the issue. 
> Could you then send us that record in a github issue[1], because crashing on 
> something like this is bad.

I tried that:
[r...@a.ns14.net:~]# pdnsutil check-all-zones
Error: stou
[r...@a.ns14.net:~]# pdnsutil -v check-all-zones
Error: stou

truss gives me nothing helpful at the moment:
[...]
munmap(0x80340,4194304)  = 0 (0x0)
poll({5/POLLIN|POLLPRI},1,0) = 0 (0x0)
write(5,"\^E\0\0\0\^Y\^A\0\0\0",9)   = 9 (0x9)
write(5,"\^A\0\0\0\^A",5)= 5 (0x5)
shutdown(5,SHUT_RDWR)= 0 (0x0)
close(5) = 0 (0x0)
madvise(0x8097f,0x1,0x5,0xaaab,0x809405e20,0x801f0ee80) = 0 (0x0)
munmap(0x81400,4194304)  = 0 (0x0)
madvise(0x8057fc000,0x1000,0x5,0xaaab,0x7fffb9e0,0x801f0ee80) = 0 (0x0)
munmap(0x80940,4194304)  = 0 (0x0)
madvise(0x8024f4000,0x3000,0x5,0xaaab,0x7fffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x8024fa000,0x8000,0x5,0xaaab,0x7fffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x802503000,0x2000,0x5,0xaaab,0x7fffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x802528000,0x2000,0x5,0xaaab,0x7fffb9e0,0x801f0ee80) = 0 (0x0)
write(4,"\^A\0\0\0\^A",5)= 5 (0x5)
shutdown(4,SHUT_RDWR)= 0 (0x0)
close(4) = 0 (0x0)
Error: write(2,"Error: ",7)  = 7 (0x7)
stouwrite(2,"stou",4)= 4 (0x4)

write(2,"\n",1)  = 1 (0x1)
[...]

Is it possible to add more debug flags/output to the program?

Once we have found the corrupt zone(s), I will file a bug on GitHub.


-- 
Oliver PETER   oli...@gfuzz.de   0x456D688F


Re: [Pdns-users] 3.4.8 -> 4.0.1: Exiting because communicator thread died with STL error: stou

2016-09-15 Thread Pieter Lexis
Hi Oliver,

On Thu, 15 Sep 2016 09:05:31 +0200
Oliver Peter  wrote:

> Hi,
> 
> During the update process from our 3.4.8 servers to 4.0.1 we encountered
> a dying/looping pdns instance.  3.4.x has been stable for the last
> ~6 months.
> We already moved 4 of our 5 auth NS to 4.0.1, all of them are running
> FreeBSD 10, all of them are working fine as expected.
> 
> Today we upgraded our last instance and this one showed us a strange
> error (Murphy's law) so we had to downgrade to 3.4.8.  The service comes
> up OK, serves a couple of requests, dies, and comes up again, etc:
> 
> 
> Basically the machines are running almost the same config (except IP
> settings of course) and serving almost the same zone database (~2mio
> domains, ~20mio records).
> 
> On the same machine we have another pdns instance running, same
> binaries, slightly fewer zones/records, different config profile - this one
> was pretty stable.
> 
> Any hints appreciated.

We became a little more strict on database content in 4.0.0. I would suggest 
running `pdnsutil check-all-zones` to see which record causes the issue. Could 
you then send us that record in a github issue[1], because crashing on 
something like this is bad.
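
As an illustration of what stricter handling of database content can mean for
a serial-like column, here is a small sketch of a strict string-to-unsigned
conversion. It is not PowerDNS code; parse_serial is a hypothetical helper,
and reading the "stou" in the error message as a string-to-unsigned
conversion is an assumption. The point is only that a lenient parser might
silently accept a negative or empty value, while a strict one rejects it.

```python
UINT32_MAX = 2**32 - 1


def parse_serial(text):
    """Strictly parse a column value into an unsigned 32-bit integer.

    Hypothetical helper for illustration only. None (SQL NULL) is passed
    through unchanged; anything that is not a plain, non-negative decimal
    number within range raises ValueError.
    """
    if text is None:
        return None
    s = str(text).strip()
    # Only ASCII digits are allowed, so "", "-93865895" and "12abc" are rejected.
    if not (s.isascii() and s.isdigit()):
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(s)
    if value > UINT32_MAX:
        raise ValueError(f"out of range for unsigned 32-bit: {text!r}")
    return value


if __name__ == "__main__":
    for sample in ["2006013100", None, "", "-93865895", "4294967296"]:
        try:
            print(repr(sample), "->", parse_serial(sample))
        except ValueError as exc:
            print(repr(sample), "-> rejected:", exc)
```

With a parser like this, a single bad row is enough to raise an error, so the
caller has to decide whether to skip the row, log it, or abort.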

Best regards,

Pieter

1 - https://github.com/PowerDNS/pdns/issues/new

-- 
Pieter Lexis
PowerDNS.COM BV -- https://www.powerdns.com