Re: [Samba] [PATCH] Do not close winbind socket during use
Hi Andrew, Am 03.07.2013 09:44, schrieb Andrew Bartlett: On Thu, 2013-06-27 at 11:42 +1000, Andrew Bartlett wrote: On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. G'Day, The original reporter has confirmed to me that this removes the segfault for him. It changes it to a 105 sec hang, (due to the winbind client trying for 5 second at at a time many times). Can I get a review on it so we can rid master and eventually 4.0 of this nasty crash? I've looked through this patches and have some improvements. The main problem is that we're not sure wbsrv_call_loop() is called again on the terminated connection, when the last pending request is finished. That's why I remember all broken connections and try to clean them up before accepting a new connection or processing any new request on any connection. This way we're sure the connection gets removed eventually. I'm currently running some autobuild with the attached patches, they might also fix the current flakey crashes, e.g. https://git.samba.org/autobuild.flakey/2013-07-08-0055/samba.stderr metze signature.asc Description: OpenPGP digital signature -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] [PATCH] Do not close winbind socket during use
Am 09.07.2013 17:33, schrieb Stefan (metze) Metzmacher: Hi Andrew, Am 03.07.2013 09:44, schrieb Andrew Bartlett: On Thu, 2013-06-27 at 11:42 +1000, Andrew Bartlett wrote: On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. G'Day, The original reporter has confirmed to me that this removes the segfault for him. It changes it to a 105 sec hang, (due to the winbind client trying for 5 second at at a time many times). Can I get a review on it so we can rid master and eventually 4.0 of this nasty crash? I've looked through this patches and have some improvements. The main problem is that we're not sure wbsrv_call_loop() is called again on the terminated connection, when the last pending request is finished. That's why I remember all broken connections and try to clean them up before accepting a new connection or processing any new request on any connection. This way we're sure the connection gets removed eventually. I'm currently running some autobuild with the attached patches, they might also fix the current flakey crashes, e.g. https://git.samba.org/autobuild.flakey/2013-07-08-0055/samba.stderr Here's the next try, which hopefully don't crash in make test :-) metze signature.asc Description: OpenPGP digital signature -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] [PATCH] Do not close winbind socket during use
Am 09.07.2013 18:03, schrieb Stefan (metze) Metzmacher: Am 09.07.2013 17:33, schrieb Stefan (metze) Metzmacher: Hi Andrew, Am 03.07.2013 09:44, schrieb Andrew Bartlett: On Thu, 2013-06-27 at 11:42 +1000, Andrew Bartlett wrote: On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. G'Day, The original reporter has confirmed to me that this removes the segfault for him. It changes it to a 105 sec hang, (due to the winbind client trying for 5 second at at a time many times). Can I get a review on it so we can rid master and eventually 4.0 of this nasty crash? I've looked through this patches and have some improvements. The main problem is that we're not sure wbsrv_call_loop() is called again on the terminated connection, when the last pending request is finished. That's why I remember all broken connections and try to clean them up before accepting a new connection or processing any new request on any connection. This way we're sure the connection gets removed eventually. I'm currently running some autobuild with the attached patches, they might also fix the current flakey crashes, e.g. https://git.samba.org/autobuild.flakey/2013-07-08-0055/samba.stderr Here's the next try, which hopefully don't crash in make test :-) Ok, it passed 4 times on master and 4 times on v4-0-test, if you're ok with it I'll squash my changes and the missing Pair-programmed-with:, Signed-off-by:, Reviewed-by: tags and push it... Are you fine with that? metze signature.asc Description: OpenPGP digital signature -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] [PATCH] Do not close winbind socket during use
On Tue, 2013-07-09 at 22:36 +0200, Stefan (metze) Metzmacher wrote: Am 09.07.2013 18:03, schrieb Stefan (metze) Metzmacher: Am 09.07.2013 17:33, schrieb Stefan (metze) Metzmacher: Hi Andrew, Am 03.07.2013 09:44, schrieb Andrew Bartlett: On Thu, 2013-06-27 at 11:42 +1000, Andrew Bartlett wrote: On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. G'Day, The original reporter has confirmed to me that this removes the segfault for him. It changes it to a 105 sec hang, (due to the winbind client trying for 5 second at at a time many times). Can I get a review on it so we can rid master and eventually 4.0 of this nasty crash? I've looked through this patches and have some improvements. The main problem is that we're not sure wbsrv_call_loop() is called again on the terminated connection, when the last pending request is finished. That's why I remember all broken connections and try to clean them up before accepting a new connection or processing any new request on any connection. This way we're sure the connection gets removed eventually. I'm currently running some autobuild with the attached patches, they might also fix the current flakey crashes, e.g. https://git.samba.org/autobuild.flakey/2013-07-08-0055/samba.stderr Here's the next try, which hopefully don't crash in make test :-) Ok, it passed 4 times on master and 4 times on v4-0-test, if you're ok with it I'll squash my changes and the missing Pair-programmed-with:, Signed-off-by:, Reviewed-by: tags and push it... Are you fine with that? Thanks, please do that. Signed-off-by: Andrew Bartlett abart...@samba.org Reviewed-by: Andrew Bartlett abart...@samba.org Andrew Bartlett -- Andrew Bartletthttp://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] [PATCH] Do not close winbind socket during use
On Thu, 2013-06-27 at 11:42 +1000, Andrew Bartlett wrote: On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. G'Day, The original reporter has confirmed to me that this removes the segfault for him. It changes it to a 105 sec hang, (due to the winbind client trying for 5 second at at a time many times). Can I get a review on it so we can rid master and eventually 4.0 of this nasty crash? Thanks, Andrew Bartlett -- Andrew Bartletthttp://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org From df7c099be9366b0439f12d0924bd2192ad4888bd Mon Sep 17 00:00:00 2001 From: Andrew Bartlett abart...@samba.org Date: Thu, 27 Jun 2013 11:27:03 +1000 Subject: [PATCH 1/2] service_stream: Log if the connection termination is deferred or not --- source4/smbd/service_stream.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/source4/smbd/service_stream.c b/source4/smbd/service_stream.c index 22c4c04..74bb477 100644 --- a/source4/smbd/service_stream.c +++ b/source4/smbd/service_stream.c @@ -60,7 +60,11 @@ void stream_terminate_connection(struct stream_connection *srv_conn, const char if (!reason) reason = unknown reason; - DEBUG(3,(Terminating connection - '%s'\n, reason)); + if (srv_conn-processing) { + DEBUG(3,(Terminating connection deferred - '%s'\n, reason)); + } else { + DEBUG(3,(Terminating connection - '%s'\n, reason)); + } srv_conn-terminate = reason; -- 1.7.11.7 From 0daf694bce47710a62f7e38aa2830bc1b40f3dfc Mon Sep 17 00:00:00 2001 From: Andrew Bartlett abart...@samba.org Date: Thu, 27 Jun 2013 11:28:03 +1000 Subject: [PATCH 2/2] s4-winbindd: Do not terminate a connection that is still pending Instead, wait until the call attempts to reply, and let it terminate then (often this happens in the attempt to then write to the broken pipe). Andrew Bartlett --- source4/winbind/wb_samba3_protocol.c | 5 + source4/winbind/wb_server.c | 14 +- source4/winbind/wb_server.h | 5 - 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/source4/winbind/wb_samba3_protocol.c b/source4/winbind/wb_samba3_protocol.c index 2846e9c..1b78c99 100644 --- a/source4/winbind/wb_samba3_protocol.c +++ b/source4/winbind/wb_samba3_protocol.c @@ -297,6 +297,8 @@ NTSTATUS wbsrv_samba3_send_reply(struct wbsrv_samba3_call *call) struct tevent_req *subreq; NTSTATUS status; + call-wbconn-pending_calls--; + status = wbsrv_samba3_push_reply(call); NT_STATUS_NOT_OK_RETURN(status); @@ -355,9 +357,12 @@ NTSTATUS wbsrv_samba3_process(struct wbsrv_samba3_call *call) return status; } + call-wbconn-pending_calls++; + status = wbsrv_samba3_handle_call(call); if (!NT_STATUS_IS_OK(status)) { + call-wbconn-pending_calls--; talloc_free(call); return status; } diff --git a/source4/winbind/wb_server.c b/source4/winbind/wb_server.c index 983f9f5..fb67d23 100644 --- a/source4/winbind/wb_server.c +++ b/source4/winbind/wb_server.c @@ -31,7 +31,14 @@ void wbsrv_terminate_connection(struct wbsrv_connection *wbconn, const char *reason) { - stream_terminate_connection(wbconn-conn, reason); + if (wbconn-pending_calls == 0) { + char *full_reason = talloc_asprintf(wbconn, wbsrv: %s, reason); + stream_terminate_connection(wbconn-conn, full_reason ? full_reason : reason); + } else { + DEBUG(3,(wbsrv: terminating connection due to '%s' defered due to %d pending calls\n, + reason, wbconn-pending_calls)); + wbconn-terminate = reason; + } } static void wbsrv_call_loop(struct tevent_req *subreq) @@ -41,6 +48,11 @@ static void wbsrv_call_loop(struct tevent_req *subreq) struct wbsrv_samba3_call *call; NTSTATUS status; + if (wbsrv_conn-terminate) { + wbsrv_terminate_connection(wbsrv_conn, wbsrv_conn-terminate); + return; + } + call = talloc_zero(wbsrv_conn, struct wbsrv_samba3_call); if (call == NULL) { wbsrv_terminate_connection(wbsrv_conn, wbsrv_call_loop: diff --git a/source4/winbind/wb_server.h b/source4/winbind/wb_server.h index 9b03004..941af68 100644 --- a/source4/winbind/wb_server.h +++ b/source4/winbind/wb_server.h @@ -94,9 +94,12 @@ struct
Re: [Samba] [PATCH] Do not close winbind socket during use
Hi Andrew, i tried your both patches (on a 'clean' 4.0.6), and the difference is that samba is not crashing anymore, but winbind seems to be blocked after a wbinfo --uid-info 300. e.g : r...@gwnois03.test.ch ~# wbinfo --uid-info 311 TEST\Guest:*:311:312::/home/TEST/Guest:/bin/false r...@gwnois03.test.ch ~# wbinfo --uid-info 300 no response, infinite timeout Philippe -Original Message- From: Andrew Bartlett [mailto:abart...@samba.org] Sent: Thursday, June 27, 2013 3:43 AM To: Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE; me...@samba.org; k...@samba.org Cc: sa...@samba.org; samba-techni...@samba.org Subject: [PATCH] Do not close winbind socket during use On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. Kai, Metze: In reading the code, I cannot see why the DNS server would not suffer the same issue, if the DNS clients closed it's socket. Should we find a more generic way to do this in service_stream, or should just duplicate this? I don't think other servers hit the same issue as they are currently 'blocking' in terms of the socket handler. Thanks, Andrew Bartlett -- Andrew Bartletthttp://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] [PATCH] Do not close winbind socket during use
On Thu, 2013-06-27 at 07:14 +, philippe.simo...@swisscom.com wrote: Hi Andrew, i tried your both patches (on a 'clean' 4.0.6), and the difference is that samba is not crashing anymore, but winbind seems to be blocked after a wbinfo --uid-info 300. e.g : r...@gwnois03.test.ch ~# wbinfo --uid-info 311 TEST\Guest:*:311:312::/home/TEST/Guest:/bin/false r...@gwnois03.test.ch ~# wbinfo --uid-info 300 no response, infinite timeout Philippe Can you run it with -d4 and mail me the log (privately), and run it under valgrind again? Thanks, Andrew Bartlett -- Andrew Bartletthttp://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
Re: [Samba] [PATCH] Do not close winbind socket during use
On Thu, 2013-06-27 at 11:42 +1000, Andrew Bartlett wrote: On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. Actually, while they might apply to 4.0, the other changes I earlier in the thread need to be applied first. (They are already in master). Kai, Metze: In reading the code, I cannot see why the DNS server would not suffer the same issue, if the DNS clients closed it's socket. Should we find a more generic way to do this in service_stream, or should just duplicate this? I don't think other servers hit the same issue as they are currently 'blocking' in terms of the socket handler. Thanks, Andrew Bartlett -- Andrew Bartletthttp://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org -- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba
[Samba] [PATCH] Do not close winbind socket during use
On Wed, 2013-06-26 at 20:39 +1000, Andrew Bartlett wrote: On Mon, 2013-06-24 at 15:26 +, philippe.simo...@swisscom.com wrote: Hi Andrew, and by putting more num-callers : valgrind --num-callers=50 samba -i -M single Thanks for getting me that. I've managed to reproduce it here, but not under valgrind, and only when I hack the code to force a timeout. At least this should help me figure out why we process the winbind socket close, which is the crux of this issue. I think I've found the cause of the issue you are hitting. There is still another issue with the nested event loop in the krb5 libs, but these two patches should help significantly. As you have had more luck than I in reproducing this in a unaltered setting, please let me know if this helps. Patches are for git master, but may apply to 4.0 as well. Kai, Metze: In reading the code, I cannot see why the DNS server would not suffer the same issue, if the DNS clients closed it's socket. Should we find a more generic way to do this in service_stream, or should just duplicate this? I don't think other servers hit the same issue as they are currently 'blocking' in terms of the socket handler. Thanks, Andrew Bartlett -- Andrew Bartletthttp://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org From df7c099be9366b0439f12d0924bd2192ad4888bd Mon Sep 17 00:00:00 2001 From: Andrew Bartlett abart...@samba.org Date: Thu, 27 Jun 2013 11:27:03 +1000 Subject: [PATCH 1/2] service_stream: Log if the connection termination is deferred or not --- source4/smbd/service_stream.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/source4/smbd/service_stream.c b/source4/smbd/service_stream.c index 22c4c04..74bb477 100644 --- a/source4/smbd/service_stream.c +++ b/source4/smbd/service_stream.c @@ -60,7 +60,11 @@ void stream_terminate_connection(struct stream_connection *srv_conn, const char if (!reason) reason = unknown reason; - DEBUG(3,(Terminating connection - '%s'\n, reason)); + if (srv_conn-processing) { + DEBUG(3,(Terminating connection deferred - '%s'\n, reason)); + } else { + DEBUG(3,(Terminating connection - '%s'\n, reason)); + } srv_conn-terminate = reason; -- 1.7.11.7 From 0daf694bce47710a62f7e38aa2830bc1b40f3dfc Mon Sep 17 00:00:00 2001 From: Andrew Bartlett abart...@samba.org Date: Thu, 27 Jun 2013 11:28:03 +1000 Subject: [PATCH 2/2] s4-winbindd: Do not terminate a connection that is still pending Instead, wait until the call attempts to reply, and let it terminate then (often this happens in the attempt to then write to the broken pipe). Andrew Bartlett --- source4/winbind/wb_samba3_protocol.c | 5 + source4/winbind/wb_server.c | 14 +- source4/winbind/wb_server.h | 5 - 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/source4/winbind/wb_samba3_protocol.c b/source4/winbind/wb_samba3_protocol.c index 2846e9c..1b78c99 100644 --- a/source4/winbind/wb_samba3_protocol.c +++ b/source4/winbind/wb_samba3_protocol.c @@ -297,6 +297,8 @@ NTSTATUS wbsrv_samba3_send_reply(struct wbsrv_samba3_call *call) struct tevent_req *subreq; NTSTATUS status; + call-wbconn-pending_calls--; + status = wbsrv_samba3_push_reply(call); NT_STATUS_NOT_OK_RETURN(status); @@ -355,9 +357,12 @@ NTSTATUS wbsrv_samba3_process(struct wbsrv_samba3_call *call) return status; } + call-wbconn-pending_calls++; + status = wbsrv_samba3_handle_call(call); if (!NT_STATUS_IS_OK(status)) { + call-wbconn-pending_calls--; talloc_free(call); return status; } diff --git a/source4/winbind/wb_server.c b/source4/winbind/wb_server.c index 983f9f5..fb67d23 100644 --- a/source4/winbind/wb_server.c +++ b/source4/winbind/wb_server.c @@ -31,7 +31,14 @@ void wbsrv_terminate_connection(struct wbsrv_connection *wbconn, const char *reason) { - stream_terminate_connection(wbconn-conn, reason); + if (wbconn-pending_calls == 0) { + char *full_reason = talloc_asprintf(wbconn, wbsrv: %s, reason); + stream_terminate_connection(wbconn-conn, full_reason ? full_reason : reason); + } else { + DEBUG(3,(wbsrv: terminating connection due to '%s' defered due to %d pending calls\n, + reason, wbconn-pending_calls)); + wbconn-terminate = reason; + } } static void wbsrv_call_loop(struct tevent_req *subreq) @@ -41,6 +48,11 @@ static void wbsrv_call_loop(struct tevent_req *subreq) struct wbsrv_samba3_call *call; NTSTATUS status; + if (wbsrv_conn-terminate) { + wbsrv_terminate_connection(wbsrv_conn, wbsrv_conn-terminate); + return; + } + call = talloc_zero(wbsrv_conn, struct wbsrv_samba3_call); if (call == NULL) { wbsrv_terminate_connection(wbsrv_conn, wbsrv_call_loop: diff --git a/source4/winbind/wb_server.h b/source4/winbind/wb_server.h index 9b03004..941af68 100644 --- a/source4/winbind/wb_server.h +++ b/source4/winbind/wb_server.h @@ -94,9 +94,12 @@ struct wbsrv_connection { /*