Re: [Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

2014-09-12 Thread Martin Basti

On 11/09/14 21:58, Petr Spacek wrote:

On 11.9.2014 18:34, Martin Basti wrote:

On 11/09/14 15:57, Martin Basti wrote:

On 11/09/14 11:59, Petr Spacek wrote:

Hello,

I was fighting with random crashes for couple of days ... and 
discovered
that run_exclusive_enter()/isc_task_beginexclusive() usage was 
completely

incorrect and didn't actually lock anything.

This series of patches reworks internal locking (and related event 
system)

to work around limitations of isc_task_beginexclusive() mechanism.

It would be better to get rid of isc_task_beginexclusive() 
completely but
IMHO it is not possible because of BIND's dns_view*() functions 
have to be

guarded with it.


Testing is going to be interesting because we are speaking about race
conditions.

I used ~ 100 DNS zones, each zone had ~ 100 random domain names 
inside with

random A//TXT RRs. My LDIF is here:
http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz

I was able to randomly reproduce various crashes when BIND was 
running with

more threads than usually.

You can try to run BIND with this command (as root) and play games 
with -n

parameter:
$ export KRB5_KTNAME=/etc/named.keytab
$ named -4 -g -u named -m record -n 10

Please test also the case where BIND receives SIGINT during 
start-up. It is

possible to run BIND with commands above and wait for message:
11-Sep-2014 11:54:58.092 running

At this point send SIGINT (CTRL+C) to BIND and see what happens. It 
could

crash or deadlock.

It is necessary to send the signal before BIND prints this message:
11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304

Let me know if you need any assistance.


I need your assistance, I haven't been able to reproduce it.

Martin


I applied the patchset, and NACK


I don't understand how I could possibly miss this. I was convinced 
that the patch set was thoroughly tested ...


Anyway, attached patch should fix the problem you were facing. Please 
re-test it.


Thank you!

Petr^2 Spacek


#1
If named is running and I randomly choose few zones and delete them, 
it causes

named failure

dig @localhost A r1.z12.test

;  DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20  @localhost A 
r1.z12.test

; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

* SIGINT doesn't work
* rndc doesn't work
* DS worksSIGINT signal stops working.

Output:
snip
11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for 
zone

'z99.test'

^C^C^C^C^C^C^C^C


Process:
named29125  1.1  2.9 789972 45976 pts/0Sl+  11:26   0:02 
named -4 -g

-u named -m record -n 10

I have to kill it with kill -9

#2
same as #1 If new zone is added,

#3
same as #1 If new record is added

#4
same as #1 If record is deleted

Functional ACK

--
Martin Basti

___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


Re: [Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

2014-09-12 Thread Petr Spacek

On 12.9.2014 14:45, Martin Basti wrote:

On 11/09/14 21:58, Petr Spacek wrote:

On 11.9.2014 18:34, Martin Basti wrote:

On 11/09/14 15:57, Martin Basti wrote:

On 11/09/14 11:59, Petr Spacek wrote:

Hello,

I was fighting with random crashes for couple of days ... and discovered
that run_exclusive_enter()/isc_task_beginexclusive() usage was completely
incorrect and didn't actually lock anything.

This series of patches reworks internal locking (and related event system)
to work around limitations of isc_task_beginexclusive() mechanism.

It would be better to get rid of isc_task_beginexclusive() completely but
IMHO it is not possible because of BIND's dns_view*() functions have to be
guarded with it.


Testing is going to be interesting because we are speaking about race
conditions.

I used ~ 100 DNS zones, each zone had ~ 100 random domain names inside with
random A//TXT RRs. My LDIF is here:
http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz

I was able to randomly reproduce various crashes when BIND was running with
more threads than usually.

You can try to run BIND with this command (as root) and play games with -n
parameter:
$ export KRB5_KTNAME=/etc/named.keytab
$ named -4 -g -u named -m record -n 10

Please test also the case where BIND receives SIGINT during start-up. It is
possible to run BIND with commands above and wait for message:
11-Sep-2014 11:54:58.092 running

At this point send SIGINT (CTRL+C) to BIND and see what happens. It could
crash or deadlock.

It is necessary to send the signal before BIND prints this message:
11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304

Let me know if you need any assistance.


I need your assistance, I haven't been able to reproduce it.

Martin


...


Functional ACK


Pushed to master:
1d23e27e0198cacdbbb14ac13ff039c33546d9af
09386eb446ac8d9c12110d21f2d4da36c1b792bf
e7d8f6f175eae795c603eee00d06d5799b2d6c39
17a29e20406ad9db0b28fcb6ec4b02394385fe53

--
Petr^2 Spacek

___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


Re: [Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

2014-09-11 Thread Martin Basti

On 11/09/14 11:59, Petr Spacek wrote:

Hello,

I was fighting with random crashes for couple of days ... and 
discovered that run_exclusive_enter()/isc_task_beginexclusive() usage 
was completely incorrect and didn't actually lock anything.


This series of patches reworks internal locking (and related event 
system) to work around limitations of isc_task_beginexclusive() 
mechanism.


It would be better to get rid of isc_task_beginexclusive() completely 
but IMHO it is not possible because of BIND's dns_view*() functions 
have to be guarded with it.



Testing is going to be interesting because we are speaking about race 
conditions.


I used ~ 100 DNS zones, each zone had ~ 100 random domain names inside 
with random A//TXT RRs. My LDIF is here:

http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz

I was able to randomly reproduce various crashes when BIND was running 
with more threads than usually.


You can try to run BIND with this command (as root) and play games 
with -n parameter:

$ export KRB5_KTNAME=/etc/named.keytab
$ named -4 -g -u named -m record -n 10

Please test also the case where BIND receives SIGINT during start-up. 
It is possible to run BIND with commands above and wait for message:

11-Sep-2014 11:54:58.092 running

At this point send SIGINT (CTRL+C) to BIND and see what happens. It 
could crash or deadlock.


It is necessary to send the signal before BIND prints this message:
11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304

Let me know if you need any assistance.


I need your assistance, I haven't been able to reproduce it.

Martin

--
Martin Basti

___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


Re: [Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

2014-09-11 Thread Martin Basti

On 11/09/14 15:57, Martin Basti wrote:

On 11/09/14 11:59, Petr Spacek wrote:

Hello,

I was fighting with random crashes for couple of days ... and 
discovered that run_exclusive_enter()/isc_task_beginexclusive() usage 
was completely incorrect and didn't actually lock anything.


This series of patches reworks internal locking (and related event 
system) to work around limitations of isc_task_beginexclusive() 
mechanism.


It would be better to get rid of isc_task_beginexclusive() completely 
but IMHO it is not possible because of BIND's dns_view*() functions 
have to be guarded with it.



Testing is going to be interesting because we are speaking about race 
conditions.


I used ~ 100 DNS zones, each zone had ~ 100 random domain names 
inside with random A//TXT RRs. My LDIF is here:

http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz

I was able to randomly reproduce various crashes when BIND was 
running with more threads than usually.


You can try to run BIND with this command (as root) and play games 
with -n parameter:

$ export KRB5_KTNAME=/etc/named.keytab
$ named -4 -g -u named -m record -n 10

Please test also the case where BIND receives SIGINT during start-up. 
It is possible to run BIND with commands above and wait for message:

11-Sep-2014 11:54:58.092 running

At this point send SIGINT (CTRL+C) to BIND and see what happens. It 
could crash or deadlock.


It is necessary to send the signal before BIND prints this message:
11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304

Let me know if you need any assistance.


I need your assistance, I haven't been able to reproduce it.

Martin


I applied the patchset, and NACK


#1
If named is running and I randomly choose few zones and delete them, it 
causes named failure


dig @localhost A r1.z12.test

;  DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20  @localhost A r1.z12.test
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

* SIGINT doesn't work
* rndc doesn't work
* DS worksSIGINT signal stops working.

Output:
snip
11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for 
zone 'z99.test'


^C^C^C^C^C^C^C^C


Process:
named29125  1.1  2.9 789972 45976 pts/0Sl+  11:26   0:02 named 
-4 -g -u named -m record -n 10


I have to kill it with kill -9

#2
same as #1 If new zone is added,

#3
same as #1 If new record is added

#4
same as #1 If record is deleted

--
Martin Basti

___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


Re: [Freeipa-devel] [PATCH 0291-0294] Fix locking to prevent crashes and deadlocks

2014-09-11 Thread Petr Spacek

On 11.9.2014 18:34, Martin Basti wrote:

On 11/09/14 15:57, Martin Basti wrote:

On 11/09/14 11:59, Petr Spacek wrote:

Hello,

I was fighting with random crashes for couple of days ... and discovered
that run_exclusive_enter()/isc_task_beginexclusive() usage was completely
incorrect and didn't actually lock anything.

This series of patches reworks internal locking (and related event system)
to work around limitations of isc_task_beginexclusive() mechanism.

It would be better to get rid of isc_task_beginexclusive() completely but
IMHO it is not possible because of BIND's dns_view*() functions have to be
guarded with it.


Testing is going to be interesting because we are speaking about race
conditions.

I used ~ 100 DNS zones, each zone had ~ 100 random domain names inside with
random A//TXT RRs. My LDIF is here:
http://people.redhat.com/~pspacek/a/2014/09/11/dns-test.ldif.xz

I was able to randomly reproduce various crashes when BIND was running with
more threads than usually.

You can try to run BIND with this command (as root) and play games with -n
parameter:
$ export KRB5_KTNAME=/etc/named.keytab
$ named -4 -g -u named -m record -n 10

Please test also the case where BIND receives SIGINT during start-up. It is
possible to run BIND with commands above and wait for message:
11-Sep-2014 11:54:58.092 running

At this point send SIGINT (CTRL+C) to BIND and see what happens. It could
crash or deadlock.

It is necessary to send the signal before BIND prints this message:
11-Sep-2014 11:55:11.707 zone z1.test/IN: loaded serial 1410429304

Let me know if you need any assistance.


I need your assistance, I haven't been able to reproduce it.

Martin


I applied the patchset, and NACK


I don't understand how I could possibly miss this. I was convinced that the 
patch set was thoroughly tested ...


Anyway, attached patch should fix the problem you were facing. Please re-test 
it.

Thank you!

Petr^2 Spacek


#1
If named is running and I randomly choose few zones and delete them, it causes
named failure

dig @localhost A r1.z12.test

;  DiG 9.9.4-P2-RedHat-9.9.4-12.P2.fc20  @localhost A r1.z12.test
; (2 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

* SIGINT doesn't work
* rndc doesn't work
* DS worksSIGINT signal stops working.

Output:
snip
11-Sep-2014 11:26:37.495 client 127.0.0.1#62615: received notify for zone
'z99.test'

^C^C^C^C^C^C^C^C


Process:
named29125  1.1  2.9 789972 45976 pts/0Sl+  11:26   0:02 named -4 -g
-u named -m record -n 10

I have to kill it with kill -9

#2
same as #1 If new zone is added,

#3
same as #1 If new record is added

#4
same as #1 If record is deleted
From 1d23e27e0198cacdbbb14ac13ff039c33546d9af Mon Sep 17 00:00:00 2001
From: Petr Spacek pspa...@redhat.com
Date: Wed, 10 Sep 2014 16:48:52 +0200
Subject: [PATCH] Rework locking in settings.c module.

Locking mechanism run_exclusive_enter()/exit() works correctly only if
all calls operate on the same task.

Locking was reworked to use ordinary mutex stored in settings_set_t
structure.

Signed-off-by: Petr Spacek pspa...@redhat.com
---
 src/ldap_helper.c | 32 +---
 src/settings.c| 63 ++-
 src/settings.h| 11 +-
 3 files changed, 55 insertions(+), 51 deletions(-)

diff --git a/src/ldap_helper.c b/src/ldap_helper.c
index b3cc7f8389e52decd2f90a18eae761fbc37433a0..89d585abe1cda14fb23807e612970e9bd1e84459 100644
--- a/src/ldap_helper.c
+++ b/src/ldap_helper.c
@@ -394,7 +394,7 @@ validate_local_instance_settings(ldap_instance_t *inst, settings_set_t *set) {
 
 	if (strcmp(dir_name, str_buf(buff)) != 0)
 		CHECK(setting_set(directory, inst-local_settings,
-  str_buf(buff), inst-task));
+  str_buf(buff)));
 	str_destroy(buff);
 	dir_name = NULL;
 	CHECK(setting_get_str(directory, inst-local_settings, dir_name));
@@ -429,8 +429,7 @@ validate_local_instance_settings(ldap_instance_t *inst, settings_set_t *set) {
 		CLEANUP_WITH(ISC_R_FAILURE);
 	}
 	CHECK(isc_string_printf(print_buff, PRINT_BUFF_SIZE, %u, auth_method_enum));
-	CHECK(setting_set(auth_method_enum, inst-local_settings, print_buff,
-			  inst-task));
+	CHECK(setting_set(auth_method_enum, inst-local_settings, print_buff));
 
 	/* check we have the right data when SASL/GSSAPI is selected */
 	CHECK(setting_get_str(sasl_mech, set, sasl_mech));
@@ -485,13 +484,11 @@ validate_local_instance_settings(ldap_instance_t *inst, settings_set_t *set) {
 		  default '%s',
 		  str_buf(buff));
 	CHECK(setting_set(krb5_principal, set,
-			  str_buf(buff),
-			  inst-task));
+			  str_buf(buff)));
 }
 			} else {
 CHECK(setting_set(krb5_principal, set,
-		  sasl_user,
-		  inst-task));
+		  sasl_user));
 			}
 		}
 	} else if (auth_method_enum == AUTH_SASL) {
@@ -559,7 +556,7 @@ new_ldap_instance(isc_mem_t *mctx, const char *db_name,
 	  sizeof(settings_global_default), settings_name,