The branch, master has been updated via fabc3c9d382 source4 smbd prefork: Add code comments via 830a6502047 WHATSNEW: prefork restart via 2db38370e3f samba-tool processes: display pre-fork masters and workers via 0c10c0e17a1 source4 dcerpc_server: remove irpc_add_name via 46b164de676 source4 smbd prefork: Cleanup messaging on restart via 11d424e1999 source4 messaging: clean up terminated processes via 40941e98f85 source4 smbd prefork: Add backoff to process restart via 6c850b77c4a source4 smbd prefork: restart on non zero exit code via 3315a28ea92 source4 smbd process: pass the fatal flag to terminate via d4641c8f76b source4 smbd prefork: Restart failed processes via 1adb5b2122a source4 smbd prefork: Pass restart information via 5fa134dc83e source4 smbd test: prefork process restart from 6b136a8184d replmd: remove unnecessary indent
https://git.samba.org/?p=samba.git;a=shortlog;h=master - Log ----------------------------------------------------------------- commit fabc3c9d3827fd30d7cec674105218c97c04e801 Author: Tim Beale <timbe...@catalyst.net.nz> Date: Thu Nov 22 10:35:01 2018 +1300 source4 smbd prefork: Add code comments Add some comments to the prefork code explaining what's going on. Signed-off-by: Tim Beale <timbe...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> Autobuild-User(master): Andrew Bartlett <abart...@samba.org> Autobuild-Date(master): Fri Nov 23 11:40:49 CET 2018 on sn-devel-144 commit 830a65020475886af6bd1edc347c0a2b96d14a38 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Wed Sep 19 15:25:02 2018 +1200 WHATSNEW: prefork restart Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 2db38370e3f6f4c91e87c8c469f3b3c59c0f3a68 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Fri Sep 14 09:38:56 2018 +1200 samba-tool processes: display pre-fork masters and workers Tag prefork work processes with "(worker 0)", and sort the process list on server name to get a consistent order. Service: PID -------------------------------------- cldap_server 15588 ... ldap_server 15584 ldap_server(worker 0) 15627 ldap_server(worker 1) 15630 ldap_server(worker 2) 15632 ldap_server(worker 3) 15634 nbt_server 15576 notify-daemon 15638 ... samba 0 ... wrepl_server 15580 Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 0c10c0e17a11b7a33c1de20ecd8b37830af909f7 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Tue Sep 11 07:38:06 2018 +1200 source4 dcerpc_server: remove irpc_add_name Remove the irpc_add_name from dcesrv_sock_accept, as it results in two identical names being registered for a process. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 46b164de67689c57d316701557b9847a0b9e45c1 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Fri Sep 14 09:45:38 2018 +1200 source4 smbd prefork: Cleanup messaging on restart Clean up names registered in messaging for a terminated process. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 11d424e1999e50f036ce2b33411f6f607c60a487 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Fri Sep 14 09:43:59 2018 +1200 source4 messaging: clean up terminated processes Now that the smbd pre-fork process model restarts failed processes rather than terminating, we end up with names registered to defunct processes. This patch adds a function to clean up all the names registered to a process. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 40941e98f8508daf59b83e05461ebeab4bebaacd Author: Gary Lockyer <g...@catalyst.net.nz> Date: Wed Sep 5 07:31:22 2018 +1200 source4 smbd prefork: Add backoff to process restart Add new smbd.conf variables 'prefork backoff increment' and 'prefork maximum backoff' to control the rate at which failed pre-forked processes are restarted. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 6c850b77c4ae6c4aa6d914f7ccedc2bcce939217 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Tue Sep 4 12:12:49 2018 +1200 source4 smbd prefork: restart on non zero exit code Restart any pre-fork master or worker process that exits with a non zero exit code. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 3315a28ea92438eca499fb87b863cbd2db50a6a6 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Tue Sep 4 10:09:38 2018 +1200 source4 smbd process: pass the fatal flag to terminate Pass the fatal flag supplied to task_server_terminate to the process task_terminate method. It will be used by the task_terminate methods to set an appropriate exit code. The process_prefork model will use a non zero exit code to indicate that the process should be restarted. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit d4641c8f76b7764f81ce246d2a2b8103593d0dac Author: Gary Lockyer <g...@catalyst.net.nz> Date: Mon Sep 3 09:34:17 2018 +1200 source4 smbd prefork: Restart failed processes Restart any pre-forked master or worker process that terminated with SIGABRT, SIGBUS, SIGFPE, SIGILL or SIGSYS Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 1adb5b2122a2026a5910852ac33b83f656f9b4c2 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Fri Aug 31 11:40:18 2018 +1200 source4 smbd prefork: Pass restart information Pass information about the pre-fork master and worker processes that will allow them to be restarted. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> commit 5fa134dc83e0fae21c1b20d722d6040f49a152e4 Author: Gary Lockyer <g...@catalyst.net.nz> Date: Tue Sep 18 08:37:02 2018 +1200 source4 smbd test: prefork process restart Add tests for the restarting of failed/terminated process, by the pre-fork process model. Signed-off-by: Gary Lockyer <g...@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abart...@samba.org> ----------------------------------------------------------------------- Summary of changes: WHATSNEW.txt | 20 +- .../smbdotconf/base/preforkbackoffincrement.xml | 25 + .../preforkmaximumbackoff.xml} | 7 +- lib/param/loadparm.c | 2 + python/samba/netcmd/processes.py | 75 ++- python/samba/tests/prefork_restart.py | 467 ++++++++++++++ selftest/target/README | 6 + selftest/target/Samba.pm | 1 + selftest/target/Samba4.pm | 40 ++ source3/param/loadparm.c | 2 + source4/kdc/kdc-heimdal.c | 3 +- source4/lib/messaging/messaging.c | 42 ++ source4/lib/messaging/messaging.h | 2 + source4/rpc_server/dcerpc_server.c | 2 - source4/selftest/tests.py | 10 + source4/smbd/process_model.h | 16 +- source4/smbd/process_prefork.c | 677 ++++++++++++++++----- source4/smbd/process_single.c | 29 +- source4/smbd/process_standard.c | 31 +- source4/smbd/service_stream.c | 3 +- source4/smbd/service_task.c | 4 +- 21 files changed, 1274 insertions(+), 190 deletions(-) create mode 100644 docs-xml/smbdotconf/base/preforkbackoffincrement.xml copy docs-xml/smbdotconf/{winbind/idmapnegativecachetime.xml => base/preforkmaximumbackoff.xml} (55%) create mode 100644 python/samba/tests/prefork_restart.py Changeset truncated at 500 lines: diff --git a/WHATSNEW.txt b/WHATSNEW.txt index 98ada34e5c6..fc43edc8e86 100644 --- a/WHATSNEW.txt +++ b/WHATSNEW.txt @@ -63,6 +63,18 @@ information about how the users are spread across groups in your domain. The 'samba-tool group list --verbose' command has also been updated to include the number of users in each group. +prefork process restart +----------------------- + +The pre-fork process model now restarts failed processes. The delay between +restart attempts is controlled by the "prefork backoff increment" (default = 10) +and "prefork maximum backoff" (default = 120) smbd.conf parameters. A linear +back off strategy is used with "prefork backoff increment" added to the +delay between restart attempts up until it reaches "prefork maximum backoff". + +Using the default sequence the restart delays (in seconds) are: + 0, 10, 20, ..., 120, 120, ... + REMOVED FEATURES ================ @@ -75,8 +87,12 @@ The samba_backup script has been removed. This has now been replaced by the smb.conf changes ================ - Parameter Name Description Default - -------------- ----------- ------- + Parameter Name Description Default + -------------- ----------- ------- + prefork backoff increment Delay added to process restart 10 (seconds) + between attempts. + prefork maximum backoff Maximum delay for process between 120 (seconds) + process restart attempts KNOWN ISSUES diff --git a/docs-xml/smbdotconf/base/preforkbackoffincrement.xml b/docs-xml/smbdotconf/base/preforkbackoffincrement.xml new file mode 100644 index 00000000000..2cb1cc32fd8 --- /dev/null +++ b/docs-xml/smbdotconf/base/preforkbackoffincrement.xml @@ -0,0 +1,25 @@ +<samba:parameter name="prefork backoff increment" + context="G" + type="integer" + xmlns:samba="http://www.samba.org/samba/DTD/samba-doc"> +<description> + <para>This option specifies the number of seconds added to the delay + before a prefork master or worker process is restarted. The + restart is initially zero, the prefork backoff increment is + added to the delay on each restart up to the value specified by + "prefork maximum backoff". + </para> + + <para>Additionally the the backoff for an individual service by using + "prefork backoff increment: service name" + i.e. "prefork backoff increment:ldap = 2" to set the + backoff increment to 2.</para> + + <para>If the backoff increment is 2 and the maximum backoff is 5. + There will be a zero second delay for the first restart. A two + second delay for the second restart. A four second delay for the + third and any subsequent restarts</para> +</description> + +<value type="default">10</value> +</samba:parameter> diff --git a/docs-xml/smbdotconf/winbind/idmapnegativecachetime.xml b/docs-xml/smbdotconf/base/preforkmaximumbackoff.xml similarity index 55% copy from docs-xml/smbdotconf/winbind/idmapnegativecachetime.xml copy to docs-xml/smbdotconf/base/preforkmaximumbackoff.xml index 32c4e1f8308..17e530d5db8 100644 --- a/docs-xml/smbdotconf/winbind/idmapnegativecachetime.xml +++ b/docs-xml/smbdotconf/base/preforkmaximumbackoff.xml @@ -1,11 +1,12 @@ -<samba:parameter name="idmap negative cache time" +<samba:parameter name="prefork maximum backoff" context="G" type="integer" xmlns:samba="http://www.samba.org/samba/DTD/samba-doc"> <description> - <para>This parameter specifies the number of seconds that Winbind's - idmap interface will cache negative SID/uid/gid query results. + <para>This option controls the maximum delay before a failed pre-fork + process is restarted. </para> + </description> <value type="default">120</value> diff --git a/lib/param/loadparm.c b/lib/param/loadparm.c index 484c8913009..a7dbc6f8f0b 100644 --- a/lib/param/loadparm.c +++ b/lib/param/loadparm.c @@ -2997,6 +2997,8 @@ struct loadparm_context *loadparm_init(TALLOC_CTX *mem_ctx) "49152-65535"); lpcfg_do_global_parameter(lp_ctx, "prefork children", "4"); + lpcfg_do_global_parameter(lp_ctx, "prefork backoff increment", "10"); + lpcfg_do_global_parameter(lp_ctx, "prefork maximum backoff", "120"); lpcfg_do_global_parameter(lp_ctx, "check parent directory delete on close", "no"); diff --git a/python/samba/netcmd/processes.py b/python/samba/netcmd/processes.py index d04a548abd7..0406b1859ca 100644 --- a/python/samba/netcmd/processes.py +++ b/python/samba/netcmd/processes.py @@ -49,6 +49,42 @@ class cmd_processes(Command): takes_args = [] + # + # Get details of the samba services currently registered in irpc + # The prefork process model registers names in the form: + # prefork-master-<service> and prefork-worker-<service>-<instance> + # + # To allow this routine to identify pre-fork master and worker process + # + # returns a tuple (filtered, masters, workers) + # + # filtered - is a list of services with the prefork-* removed + # masters - dictionary keyed on service name of prefork master processes + # workers - dictionary keyed on service name containing an ordered list + # of worker processes. + def get_service_data(self, msg_ctx): + services = msg_ctx.irpc_all_servers() + filtered = [] + masters = {} + workers = {} + for service in services: + for id in service.ids: + if service.name.startswith("prefork-master"): + ns = service.name.split("-") + name = ns[2] + "_server" + masters[name] = service.ids[0].pid + elif service.name.startswith("prefork-worker"): + ns = service.name.split("-") + name = ns[2] + "_server" + instance = int(ns[3]) + pid = service.ids[0].pid + if name not in workers: + workers[name] = {} + workers[name][instance] = (instance, pid) + else: + filtered.append(service) + return (filtered, masters, workers) + def run(self, sambaopts, versionopts, section_name=None, name=None, pid=None): @@ -72,9 +108,36 @@ class cmd_processes(Command): if server_id.pid == int(pid): self.outf.write("%s\n" % name.name) else: - names = msg_ctx.irpc_all_servers() - self.outf.write(" Service: PID \n") - self.outf.write("-----------------------------\n") - for name in names: - for server_id in name.ids: - self.outf.write("%-16s %6d\n" % (name.name, server_id.pid)) + seen = {} # Service entries already printed, service names can + # be registered multiple times against a process + # but we should only display them once. + prefork = {} # Services running in the prefork process model + # want to ensure that the master process and workers + # are grouped to together. + (services, masters, workers) = self.get_service_data(msg_ctx) + self.outf.write(" Service: PID\n") + self.outf.write("--------------------------------------\n") + + for service in sorted(services, key=lambda x: x.name): + if service.name in masters: + # If this service is running in a pre-forked process we + # want to print the master process followed by all the + # worker processes + pid = masters[service.name] + if pid not in prefork: + prefork[pid] = True + self.outf.write("%-26s %6d\n" % + (service.name, pid)) + if service.name in workers: + ws = workers[service.name] + for w in ws: + (instance, pid) = ws[w] + sn = "{0}(worker {1})".format( + service.name, instance) + self.outf.write("%-26s %6d\n" % (sn, pid)) + else: + for server_id in service.ids: + if (service.name, server_id.pid) not in seen: + self.outf.write("%-26s %6d\n" + % (service.name, server_id.pid)) + seen[(service.name, server_id.pid)] = True diff --git a/python/samba/tests/prefork_restart.py b/python/samba/tests/prefork_restart.py new file mode 100644 index 00000000000..5233ca1f805 --- /dev/null +++ b/python/samba/tests/prefork_restart.py @@ -0,0 +1,467 @@ +# Tests for process restarting in the pre-fork process model +# +# Copyright (C) Andrew Bartlett <abart...@samba.org> 2018 +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. +# + +from __future__ import print_function +"""Tests process restarting in the pre-fork process model. + NOTE: As this test kills samba processes it won't play nicely with other + tests, so needs to be run in it's own environment. +""" + + +import os +import signal +import time + +import samba +from samba.tests import TestCase, delete_force +from samba.dcerpc import echo, netlogon +from samba.messaging import Messaging +from samba.samdb import SamDB +from samba.credentials import Credentials, DONT_USE_KERBEROS +from samba.compat import get_string +from samba.dsdb import ( + UF_WORKSTATION_TRUST_ACCOUNT, + UF_PASSWD_NOTREQD) +from samba.dcerpc.misc import SEC_CHAN_WKSTA +from samba.auth import system_session + +NUM_WORKERS = 4 +MACHINE_NAME = "PFRS" + + +class PreforkProcessRestartTests(TestCase): + + def setUp(self): + super(PreforkProcessRestartTests, self).setUp() + lp_ctx = self.get_loadparm() + self.msg_ctx = Messaging(lp_ctx=lp_ctx) + + def tearDown(self): + super(PreforkProcessRestartTests, self).tearDown() + + def get_process_data(self): + services = self.msg_ctx.irpc_all_servers() + + processes = [] + for service in services: + for id in service.ids: + processes.append((service.name, id.pid)) + return processes + + def get_process(self, name): + processes = self.get_process_data() + for pname, pid in processes: + if name == pname: + return pid + return None + + def get_worker_pids(self, name, workers): + pids = [] + for x in range(workers): + process_name = "prefork-worker-{0}-{1}".format(name, x) + pids.append(self.get_process(process_name)) + self.assertIsNotNone(pids[x]) + return pids + + def wait_for_workers(self, name, workers): + num_workers = len(workers) + for x in range(num_workers): + process_name = "prefork-worker-{0}-{1}".format(name, x) + self.wait_for_process(process_name, workers[x], 0, 1, 30) + + def wait_for_process(self, name, pid, initial_delay, wait, timeout): + time.sleep(initial_delay) + delay = initial_delay + while delay < timeout: + p = self.get_process(name) + if p is not None and p != pid: + # process has restarted + return + time.sleep(wait) + delay += wait + self.fail("Times out after {0} seconds waiting for {1} to restart". + format(delay, name)) + + def check_for_duplicate_processes(self): + processes = self.get_process_data() + process_map = {} + for name, p in processes: + if (name.startswith("prefork-") or + name.endswith("_server") or + name.endswith("srv")): + + if name in process_map: + if p != process_map[name]: + self.fail( + "Duplicate process for {0}, pids {1} and {2}". + format(name, p, process_map[name])) + + def simple_bind(self): + creds = self.insta_creds(template=self.get_credentials()) + creds.set_bind_dn("%s\\%s" % (creds.get_domain(), + creds.get_username())) + + self.samdb = SamDB(url="ldaps://%s" % os.environ["SERVER"], + lp=self.get_loadparm(), + credentials=creds) + + def rpc_echo(self): + conn = echo.rpcecho("ncalrpc:", self.get_loadparm()) + self.assertEquals([1, 2, 3], conn.EchoData([1, 2, 3])) + + def netlogon(self): + server = os.environ["SERVER"] + host = os.environ["SERVER_IP"] + lp = self.get_loadparm() + + credentials = self.get_credentials() + + session = system_session() + ldb = SamDB(url="ldap://%s" % host, + session_info=session, + credentials=credentials, + lp=lp) + machine_pass = samba.generate_random_password(32, 32) + machine_name = MACHINE_NAME + machine_dn = "cn=%s,%s" % (machine_name, ldb.domain_dn()) + + delete_force(ldb, machine_dn) + + utf16pw = ('"%s"' % get_string(machine_pass)).encode('utf-16-le') + ldb.add({ + "dn": machine_dn, + "objectclass": "computer", + "sAMAccountName": "%s$" % machine_name, + "userAccountControl": + str(UF_WORKSTATION_TRUST_ACCOUNT | UF_PASSWD_NOTREQD), + "unicodePwd": utf16pw}) + + machine_creds = Credentials() + machine_creds.guess(lp) + machine_creds.set_secure_channel_type(SEC_CHAN_WKSTA) + machine_creds.set_kerberos_state(DONT_USE_KERBEROS) + machine_creds.set_password(machine_pass) + machine_creds.set_username(machine_name + "$") + machine_creds.set_workstation(machine_name) + + netlogon.netlogon( + "ncacn_ip_tcp:%s[schannel,seal]" % server, + lp, + machine_creds) + + delete_force(ldb, machine_dn) + + def test_ldap_master_restart(self): + # check ldap connection, do a simple bind + self.simple_bind() + + # get ldap master process + pid = self.get_process("prefork-master-ldap") + self.assertIsNotNone(pid) + + # Get the worker processes + workers = self.get_worker_pids("ldap", NUM_WORKERS) + + # kill it + os.kill(pid, signal.SIGTERM) + + # wait for the process to restart + self.wait_for_process("prefork-master-ldap", pid, 1, 1, 30) + + # restarting the master restarts the workers as well, so make sure + # they have finished restarting + self.wait_for_workers("ldap", workers) + + # get ldap master process + new_pid = self.get_process("prefork-master-ldap") + self.assertIsNotNone(new_pid) + + # check that the pid has changed + self.assertNotEquals(pid, new_pid) + + # check that the worker processes have restarted + new_workers = self.get_worker_pids("ldap", NUM_WORKERS) + for x in range(NUM_WORKERS): + self.assertNotEquals(workers[x], new_workers[x]) + + # check that the previous server entries have been removed. + self.check_for_duplicate_processes() + + # check ldap connection, another simple bind + self.simple_bind() + + def test_ldap_worker_restart(self): + # check ldap connection, do a simple bind + self.simple_bind() + + # get ldap master process + pid = self.get_process("prefork-master-ldap") + self.assertIsNotNone(pid) + + # Get the worker processes + workers = self.get_worker_pids("ldap", NUM_WORKERS) + + # kill worker 0 + os.kill(workers[0], signal.SIGTERM) + + # wait for the process to restart + self.wait_for_process("prefork-worker-ldap-0", pid, 1, 1, 30) + + # get ldap master process + new_pid = self.get_process("prefork-master-ldap") + self.assertIsNotNone(new_pid) + + # check that the pid has not changed + self.assertEquals(pid, new_pid) + + # check that the worker processes have restarted + new_workers = self.get_worker_pids("ldap", NUM_WORKERS) + # process 0 should have a new pid the others should be unchanged + self.assertNotEquals(workers[0], new_workers[0]) + self.assertEquals(workers[1], new_workers[1]) + self.assertEquals(workers[2], new_workers[2]) + self.assertEquals(workers[3], new_workers[3]) + + # check that the previous server entries have been removed. + self.check_for_duplicate_processes() + + # check ldap connection, another simple bind + self.simple_bind() + + # + # Kill all the ldap worker processes and ensure that they are restarted + # correctly + # + def test_ldap_all_workers_restart(self): + # check ldap connection, do a simple bind + self.simple_bind() + + # get ldap master process + pid = self.get_process("prefork-master-ldap") + self.assertIsNotNone(pid) + + # Get the worker processes + workers = self.get_worker_pids("ldap", NUM_WORKERS) + + # kill all the worker processes + for x in workers: + os.kill(x, signal.SIGTERM) + + # wait for the worker processes to restart + self.wait_for_workers("ldap", workers) + + # get ldap master process + new_pid = self.get_process("prefork-master-ldap") + self.assertIsNotNone(new_pid) + + # check that the pid has not changed + self.assertEquals(pid, new_pid) + + # check that the worker processes have restarted + new_workers = self.get_worker_pids("ldap", NUM_WORKERS) + for x in range(NUM_WORKERS): + self.assertNotEquals(workers[x], new_workers[x]) + + # check that the previous server entries have been removed. + self.check_for_duplicate_processes() + + # check ldap connection, another simple bind + self.simple_bind() + + def test_rpc_master_restart(self): + # check rpc connection, make a rpc echo request + self.rpc_echo() + + # get rpc master process + pid = self.get_process("prefork-master-rpc") + self.assertIsNotNone(pid) + + # Get the worker processes + workers = self.get_worker_pids("rpc", NUM_WORKERS) + + # kill it + os.kill(pid, signal.SIGTERM) + -- Samba Shared Repository