Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/13/2015 08:20 AM, Petr Cech wrote: >Hi Jakub, > >it works due to your reproducer. It is really need to have >setenforce == 1 You meant setenforce 0, right? Hi Jakub, yes, of course, I meant setenforce 1. It was mistake. --^-- 1 --> 0 I did little mistake again. ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On Fri, Nov 13, 2015 at 08:20:53AM +0100, Petr Cech wrote: > I reviewed the code. All remarks were addressed. > CI tests passed: > http://sssd-ci.duckdns.org/logs/job/33/10/summary.html > > => ACK > > Regards > > Petr I changed the wording of the commit message a bit (we added an option, not a command) and pushed the patch to master: 89530c830ded58c6140cdb34c9de07bf77bb5bc0 ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/11/2015 12:25 PM, Jakub Hrozek wrote: On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote: >On 11/11/2015 09:32 AM, Jakub Hrozek wrote: > >>>Hi Jakub, > >>> > >>>I just sent the patch to the CI tests and they passed > >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html > >>> > >>>> >Then I would prefer undocumented. It matches how we (don't) document the > >>>> >"command" option. > >>>I agree with little exception. I think it could be more clear if we write > >>>little documentation to commit message or to the code near the new option. > >>>But, how everybody wrote, we could leave man page without documentation. > >Good idea, I can move the documentation that was previously in the man > >page to the commit message, would that work for you? > >Yes, it works for me. > >I am going to 1/2 PTO for now. > >The last two things are that I would like to run it due to your reproducer. >And there is a remark from Stephen Gallagher which we need resolve. > >However, CI tests passed. So if you're in a hurry with code review, please >ask someone else. > >I can continue with review tomorrow. I'm not in hurry at all. Attached is a patch that adds a better commit message. We can discuss any details related to testing over IRC if you prefer. Hi Jakub, it works due to your reproducer. It is really need to have setenforce == 1 CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html Stephen Gallagher wrote (2015-09-11 11:32 AM): There are problems inherent with passing the PID to the child process. There's no guarantee that the process still exists. In the worst-case, the PID could actually be reassigned to a new process and the output you got back from something like pstack could be reading from a different executable entirely. --- I understand, it could be dangerous. But, this option is a little bit secret, we don't write about it in our man pages and so on. I hope it will be used only for debuging some hot cases. There are only little remarks in patch. => ACK Regrds Petr PS: I accepted that we have # p = copy; not something like: # copy_ptr = copy; How I suggested previous mail. 0001-SSSD-Add-a-new-command-diag_cmd.patch From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001 From: Jakub Hrozek<jhro...@redhat.com> Date: Mon, 2 Nov 2015 11:41:31 +0100 Subject: [PATCH] SSSD: Add a new command diag_cmd This command is an optional one that is run when a sbus ping times out and before a SIGKILL commans is sent. ---^--- s -> d diag_cmd (string): A command that should be run for diagnostic purpose when an sbus timeout fails. The option value may contain %p which would be expanded for the process ID of the process that timed out Example: pstack %p This setting would print the stackstrace of the command whose ping timed out. Default: not set. --- src/confdb/confdb.h | 1 + src/config/SSSDConfig/__init__.py.in | 1 + src/config/SSSDConfigTest.py | 1 + src/config/etc/sssd.api.conf | 1 + src/monitor/monitor.c| 215 +++ 5 files changed, 197 insertions(+), 22 deletions(-) diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h index 37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf 100644 --- a/src/confdb/confdb.h +++ b/src/confdb/confdb.h @@ -71,6 +71,7 @@ #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix" #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space" #define CONFDB_MONITOR_USER_RUNAS "user" +#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd" /* Both monitor and domains */ #define CONFDB_NAME_REGEX "re_expression" diff --git a/src/config/SSSDConfig/__init__.py.in b/src/config/SSSDConfig/__init__.py.in index bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23 100644 --- a/src/config/SSSDConfig/__init__.py.in +++ b/src/config/SSSDConfig/__init__.py.in @@ -50,6 +50,7 @@ option_strings = { 'reconnection_retries' : _('Number of times to attempt connection to Data Providers'), 'fd_limit' : _('The number of file descriptors that may be opened by this responder'), 'client_idle_timeout' : _('Idle time before automatic disconnection of a client'), +'diag_cmd' : _('The command to run when a service ping times out'), # [sssd] 'services' : _('SSSD Services to start'), diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py index 45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c 100755 --- a/src/config/SSSDConfigTest.py +++ b/src/config/SSSDConfigTest.py @@ -307,6 +307,7 @@ class SSSDConfigTestSSSDServi
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/12/2015 06:30 PM, Jakub Hrozek wrote: On Thu, Nov 12, 2015 at 10:49:33AM +0100, Petr Cech wrote: >On 11/11/2015 12:25 PM, Jakub Hrozek wrote: > >On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote: > >>>On 11/11/2015 09:32 AM, Jakub Hrozek wrote: > >>>>>> >>>Hi Jakub, > >>>>>> >>> > >>>>>> >>>I just sent the patch to the CI tests and they passed > >>>>>> >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html > >>>>>> >>> > >>>>>>>> >>>> >Then I would prefer undocumented. It matches how we (don't) document the > >>>>>>>> >>>> >"command" option. > >>>>>> >>>I agree with little exception. I think it could be more clear if we write > >>>>>> >>>little documentation to commit message or to the code near the new option. > >>>>>> >>>But, how everybody wrote, we could leave man page without documentation. > >>>> >Good idea, I can move the documentation that was previously in the man > >>>> >page to the commit message, would that work for you? > >>> > >>>Yes, it works for me. > >>> > >>>I am going to 1/2 PTO for now. > >>> > >>>The last two things are that I would like to run it due to your reproducer. > >>>And there is a remark from Stephen Gallagher which we need resolve. > >>> > >>>However, CI tests passed. So if you're in a hurry with code review, please > >>>ask someone else. > >>> > >>>I can continue with review tomorrow. > >I'm not in hurry at all. Attached is a patch that adds a better commit > >message. We can discuss any details related to testing over IRC if you > >prefer. >Hi Jakub, > >it works due to your reproducer. It is really need to have >setenforce == 1 You meant setenforce 0, right? Hi Jakub, yes, of course, I meant setenforce 1. It was mistake. > >CI tests passed: >http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > >Stephen Gallagher wrote (2015-09-11 11:32 AM): >There are problems inherent with passing the PID to the child process. >There's no guarantee that the process still exists. In the worst-case, >the PID could actually be reassigned to a new process and the output >you got back from something like pstack could be reading from a >different executable entirely. >--- >I understand, it could be dangerous. But, this option is a little bit >secret, we don't write about it in our man pages and so on. I hope it will >be used only for debuging some hot cases. Yes, Stephen is right, but if this option is only used for debugging, then I think we're fine. I would really prefer to have this undocumented option rather than run blind in case services get stuck.. Yes, I agreee. We can use this way because it is only for debugging. The real solution here would be to use systemd for service management. Good point. Thank you for remark. > >There are only little remarks in patch. > >=> ACK Then it should be a nack, don't let sloppy patches through:-) OK I will be more restrictive. > >Regrds > >Petr > >PS: I accepted that we have ># p = copy; >not something like: ># copy_ptr = copy; >How I suggested previous mail. Sorry, I overlooked that previously. It's an honest mistake, I didn't want to ignore you. Feel free to just push again for changes you like next time, each suggestion should be discussed and either accepted or rejected (with good reason). OK. > > >0001-SSSD-Add-a-new-command-diag_cmd.patch > > > > > > From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001 > >From: Jakub Hrozek<jhro...@redhat.com> > >Date: Mon, 2 Nov 2015 11:41:31 +0100 > >Subject: [PATCH] SSSD: Add a new command diag_cmd > > > >This command is an optional one that is run when a sbus ping times out > >and before a SIGKILL commans is sent. > ---^--- s -> d Fixed OK. [...] > >@@ -1065,6 +1237,18 @@ static errno_t get_ping_config(struct mt_ctx *ctx, const char *path, > >"Time between service pings for [%s]: [%d]\n", > > svc->name, svc->ping_time); > > > >+ret = confdb_get_string(ctx->cdb, svc, path, > >+CONFDB_MONITOR_PRE_KILL_CMD, > >+ NULL, >diag_cmd); > >+if (ret != EOK) { > >+DEBUG(SSSDBG_CRIT_FAILURE, > >+
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On Thu, Nov 12, 2015 at 10:49:33AM +0100, Petr Cech wrote: > On 11/11/2015 12:25 PM, Jakub Hrozek wrote: > >On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote: > >>>On 11/11/2015 09:32 AM, Jakub Hrozek wrote: > >>>>>> >>>Hi Jakub, > >>>>>> >>> > >>>>>> >>>I just sent the patch to the CI tests and they passed > >>>>>> >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html > >>>>>> >>> > >>>>>>>> >>>> >Then I would prefer undocumented. It matches how we (don't) > >>>>>>>> >>>> >document the > >>>>>>>> >>>> >"command" option. > >>>>>> >>>I agree with little exception. I think it could be more clear if we > >>>>>> >>>write > >>>>>> >>>little documentation to commit message or to the code near the new > >>>>>> >>>option. > >>>>>> >>>But, how everybody wrote, we could leave man page without > >>>>>> >>>documentation. > >>>> >Good idea, I can move the documentation that was previously in the man > >>>> >page to the commit message, would that work for you? > >>> > >>>Yes, it works for me. > >>> > >>>I am going to 1/2 PTO for now. > >>> > >>>The last two things are that I would like to run it due to your reproducer. > >>>And there is a remark from Stephen Gallagher which we need resolve. > >>> > >>>However, CI tests passed. So if you're in a hurry with code review, please > >>>ask someone else. > >>> > >>>I can continue with review tomorrow. > >I'm not in hurry at all. Attached is a patch that adds a better commit > >message. We can discuss any details related to testing over IRC if you > >prefer. > Hi Jakub, > > it works due to your reproducer. It is really need to have > setenforce == 1 You meant setenforce 0, right? > > CI tests passed: > http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > > Stephen Gallagher wrote (2015-09-11 11:32 AM): > There are problems inherent with passing the PID to the child process. > There's no guarantee that the process still exists. In the worst-case, > the PID could actually be reassigned to a new process and the output > you got back from something like pstack could be reading from a > different executable entirely. > --- > I understand, it could be dangerous. But, this option is a little bit > secret, we don't write about it in our man pages and so on. I hope it will > be used only for debuging some hot cases. Yes, Stephen is right, but if this option is only used for debugging, then I think we're fine. I would really prefer to have this undocumented option rather than run blind in case services get stuck.. The real solution here would be to use systemd for service management. > > There are only little remarks in patch. > > => ACK Then it should be a nack, don't let sloppy patches through :-) > > Regrds > > Petr > > PS: I accepted that we have > # p = copy; > not something like: > # copy_ptr = copy; > How I suggested previous mail. Sorry, I overlooked that previously. It's an honest mistake, I didn't want to ignore you. Feel free to just push again for changes you like next time, each suggestion should be discussed and either accepted or rejected (with good reason). > > >0001-SSSD-Add-a-new-command-diag_cmd.patch > > > > > > From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001 > >From: Jakub Hrozek<jhro...@redhat.com> > >Date: Mon, 2 Nov 2015 11:41:31 +0100 > >Subject: [PATCH] SSSD: Add a new command diag_cmd > > > >This command is an optional one that is run when a sbus ping times out > >and before a SIGKILL commans is sent. > ---^--- s -> d Fixed [...] > >@@ -1065,6 +1237,18 @@ static errno_t get_ping_config(struct mt_ctx *ctx, > >const char *path, > >"Time between service pings for [%s]: [%d]\n", > > svc->name, svc->ping_time); > > > >+ret = confdb_get_string(ctx->cdb, svc, path, > >+CONFDB_MONITOR_PRE_KILL_CMD, > >+NULL, >diag_cmd); > >+if (ret != EOK) { > >+DEBUG(SSSDBG_CRIT_FAILURE, > >+ "Failed to get diagnostics command for %s\n", svc-&
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/11/2015 09:32 AM, Jakub Hrozek wrote: >Hi Jakub, > >I just sent the patch to the CI tests and they passed >http://sssd-ci.duckdns.org/logs/job/32/63/summary.html > > >Then I would prefer undocumented. It matches how we (don't) document the > >"command" option. >I agree with little exception. I think it could be more clear if we write >little documentation to commit message or to the code near the new option. >But, how everybody wrote, we could leave man page without documentation. Good idea, I can move the documentation that was previously in the man page to the commit message, would that work for you? Yes, it works for me. I am going to 1/2 PTO for now. The last two things are that I would like to run it due to your reproducer. And there is a remark from Stephen Gallagher which we need resolve. However, CI tests passed. So if you're in a hurry with code review, please ask someone else. I can continue with review tomorrow. Regards Petr ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On Wed, Nov 11, 2015 at 09:12:52AM +0100, Petr Cech wrote: > On 11/10/2015 04:20 PM, Jakub Hrozek wrote: > >On Tue, Nov 10, 2015 at 01:22:54PM +0100, Lukas Slebodnik wrote: > >>>On (10/11/15 13:15), Jakub Hrozek wrote: > >On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote: > > >>On 11/04/2015 11:24 AM, Jakub Hrozek wrote: > >> >> >Hi, > >> >> > > >> >> >I created this patch to try to diagnose an issue where sssd would > >> >> >randomly restart on any of machines in a VM cluster without giving > >> >> >too > >> >> >much advise why. I think it might be useful to merge in general. > > >> > > >>Hi Jakub, > > >> > > >>I reviewed the patch. Code looks good to me. > > >>CI tests passed:http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > > >> > > >>Then I tried to test new functionality. > > >> > > >>Man pages are right, I found diag_cmd in sssd.conf. > > >> > > >>And I really got the right message when I kill sss_pam: > > >># (Mon Nov 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child > > >>[25767] > > >>terminated with signal [9] > > >> > > >>I would like to see output of pstack, but I don't know, how to get > > >>the right > > >>state of SSSD. Can you help me, please? > > > >I tested the patch by setting a low 'timeout' in the 'domain' section > >and then setting the diag_cmd: > >[domain/foo] > >timeout = 2 > >diag_cmd = pstack %p > > > >then I stopped the back end: > ># kill -STOP $(pidof sssd_be) > > > >You should see the pstack output in /var/log/sssd/sssd.log, also the > >debug_level must be increased in the [sssd] section. You might also need > >to set SELinux to Permissive, otherwise sssd might not be able to fork > >an exec pstack.. > >>>So in this case I would prefer if this opton was not documented. > >>>or it should be documented issues with SELinux > >>> > Hi Jakub, > > I just sent the patch to the CI tests and they passed > http://sssd-ci.duckdns.org/logs/job/32/63/summary.html > > >Then I would prefer undocumented. It matches how we (don't) document the > >"command" option. > I agree with little exception. I think it could be more clear if we write > little documentation to commit message or to the code near the new option. > But, how everybody wrote, we could leave man page without documentation. Good idea, I can move the documentation that was previously in the man page to the commit message, would that work for you? ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote: > On 11/11/2015 09:32 AM, Jakub Hrozek wrote: > >>>Hi Jakub, > >>> > >>>I just sent the patch to the CI tests and they passed > >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html > >>> > >>>> >Then I would prefer undocumented. It matches how we (don't) document the > >>>> >"command" option. > >>>I agree with little exception. I think it could be more clear if we write > >>>little documentation to commit message or to the code near the new option. > >>>But, how everybody wrote, we could leave man page without documentation. > >Good idea, I can move the documentation that was previously in the man > >page to the commit message, would that work for you? > > Yes, it works for me. > > I am going to 1/2 PTO for now. > > The last two things are that I would like to run it due to your reproducer. > And there is a remark from Stephen Gallagher which we need resolve. > > However, CI tests passed. So if you're in a hurry with code review, please > ask someone else. > > I can continue with review tomorrow. I'm not in hurry at all. Attached is a patch that adds a better commit message. We can discuss any details related to testing over IRC if you prefer. >From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001 From: Jakub Hrozek <jhro...@redhat.com> Date: Mon, 2 Nov 2015 11:41:31 +0100 Subject: [PATCH] SSSD: Add a new command diag_cmd This command is an optional one that is run when a sbus ping times out and before a SIGKILL commans is sent. diag_cmd (string): A command that should be run for diagnostic purpose when an sbus timeout fails. The option value may contain %p which would be expanded for the process ID of the process that timed out Example: pstack %p This setting would print the stackstrace of the command whose ping timed out. Default: not set. --- src/confdb/confdb.h | 1 + src/config/SSSDConfig/__init__.py.in | 1 + src/config/SSSDConfigTest.py | 1 + src/config/etc/sssd.api.conf | 1 + src/monitor/monitor.c| 215 +++ 5 files changed, 197 insertions(+), 22 deletions(-) diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h index 37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf 100644 --- a/src/confdb/confdb.h +++ b/src/confdb/confdb.h @@ -71,6 +71,7 @@ #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix" #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space" #define CONFDB_MONITOR_USER_RUNAS "user" +#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd" /* Both monitor and domains */ #define CONFDB_NAME_REGEX "re_expression" diff --git a/src/config/SSSDConfig/__init__.py.in b/src/config/SSSDConfig/__init__.py.in index bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23 100644 --- a/src/config/SSSDConfig/__init__.py.in +++ b/src/config/SSSDConfig/__init__.py.in @@ -50,6 +50,7 @@ option_strings = { 'reconnection_retries' : _('Number of times to attempt connection to Data Providers'), 'fd_limit' : _('The number of file descriptors that may be opened by this responder'), 'client_idle_timeout' : _('Idle time before automatic disconnection of a client'), +'diag_cmd' : _('The command to run when a service ping times out'), # [sssd] 'services' : _('SSSD Services to start'), diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py index 45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c 100755 --- a/src/config/SSSDConfigTest.py +++ b/src/config/SSSDConfigTest.py @@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase): 'reconnection_retries', 'fd_limit', 'client_idle_timeout', +'diag_cmd', 'description'] self.assertTrue(type(options) == dict, diff --git a/src/config/etc/sssd.api.conf b/src/config/etc/sssd.api.conf index 72abb8b3f427e9bcf3cf3dcaa67aaf4e3e50226c..0c03625bd5f0d3fbf8948971b015d4e8255f0009 100644 --- a/src/config/etc/sssd.api.conf +++ b/src/config/etc/sssd.api.conf @@ -13,6 +13,7 @@ fd_limit = int, None, false client_idle_timeout = int, None, false force_timeout = int, None, false description = str, None, false +diag_cmd = str, None, false [sssd] # Monitor service diff --git a/src/monitor/monitor.c b/src/monitor/monitor.c index 89ac882d38652e071fdc619495c3252807527da0..5ff5f85527a6aa91e220ddd07e8c8738dba0e829 100644 --- a/src/monitor/monitor.c +++ b/src/monitor/monitor.c @@ -116,6 +116,7 @@ struct mt_svc { char *identity; pid_t pid; +char *diag_cmd; int ping_time; int kill_time; @@ -
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/10/2015 04:20 PM, Jakub Hrozek wrote: On Tue, Nov 10, 2015 at 01:22:54PM +0100, Lukas Slebodnik wrote: >On (10/11/15 13:15), Jakub Hrozek wrote: > >On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote: > >>On 11/04/2015 11:24 AM, Jakub Hrozek wrote: > >> >Hi, > >> > > >> >I created this patch to try to diagnose an issue where sssd would > >> >randomly restart on any of machines in a VM cluster without giving too > >> >much advise why. I think it might be useful to merge in general. > >> > >>Hi Jakub, > >> > >>I reviewed the patch. Code looks good to me. > >>CI tests passed:http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > >> > >>Then I tried to test new functionality. > >> > >>Man pages are right, I found diag_cmd in sssd.conf. > >> > >>And I really got the right message when I kill sss_pam: > >># (Mon Nov 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767] > >>terminated with signal [9] > >> > >>I would like to see output of pstack, but I don't know, how to get the right > >>state of SSSD. Can you help me, please? > > > >I tested the patch by setting a low 'timeout' in the 'domain' section > >and then setting the diag_cmd: > >[domain/foo] > >timeout = 2 > >diag_cmd = pstack %p > > > >then I stopped the back end: > ># kill -STOP $(pidof sssd_be) > > > >You should see the pstack output in /var/log/sssd/sssd.log, also the > >debug_level must be increased in the [sssd] section. You might also need > >to set SELinux to Permissive, otherwise sssd might not be able to fork > >an exec pstack.. >So in this case I would prefer if this opton was not documented. >or it should be documented issues with SELinux > Hi Jakub, I just sent the patch to the CI tests and they passed http://sssd-ci.duckdns.org/logs/job/32/63/summary.html Then I would prefer undocumented. It matches how we (don't) document the "command" option. I agree with little exception. I think it could be more clear if we write little documentation to commit message or to the code near the new option. But, how everybody wrote, we could leave man page without documentation. A new patch is attached. 0001-SSSD-Add-a-new-command-diag_cmd.patch From fb1b8c5fd9fbec475c036563640d7e320d526620 Mon Sep 17 00:00:00 2001 From: Jakub Hrozek<jhro...@redhat.com> Date: Mon, 2 Nov 2015 11:41:31 +0100 Subject: [PATCH] SSSD: Add a new command diag_cmd This command is an optional one that is run when a sbus ping times out and before a SIGKILL commans is sent. This command supports a single template substitution that expands to the PID of the service being signaled. --- src/confdb/confdb.h | 1 + src/config/SSSDConfig/__init__.py.in | 1 + src/config/SSSDConfigTest.py | 1 + src/config/etc/sssd.api.conf | 1 + src/monitor/monitor.c| 215 +++ 5 files changed, 197 insertions(+), 22 deletions(-) diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h index 37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf 100644 --- a/src/confdb/confdb.h +++ b/src/confdb/confdb.h @@ -71,6 +71,7 @@ #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix" #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space" #define CONFDB_MONITOR_USER_RUNAS "user" +#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd" /* Both monitor and domains */ #define CONFDB_NAME_REGEX "re_expression" diff --git a/src/config/SSSDConfig/__init__.py.in b/src/config/SSSDConfig/__init__.py.in index bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23 100644 --- a/src/config/SSSDConfig/__init__.py.in +++ b/src/config/SSSDConfig/__init__.py.in @@ -50,6 +50,7 @@ option_strings = { 'reconnection_retries' : _('Number of times to attempt connection to Data Providers'), 'fd_limit' : _('The number of file descriptors that may be opened by this responder'), 'client_idle_timeout' : _('Idle time before automatic disconnection of a client'), +'diag_cmd' : _('The command to run when a service ping times out'), This is the reason why I ask for little documentation... because there is '%p' template shadowed. # [sssd] 'services' : _('SSSD Services to start'), diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py index 45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c 100755 --- a/src/config/SSSDConfigTest.py +++ b/src/config/SSSDConfigTest.py @@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase): '
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On Tue, Nov 10, 2015 at 01:22:54PM +0100, Lukas Slebodnik wrote: > On (10/11/15 13:15), Jakub Hrozek wrote: > >On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote: > >> On 11/04/2015 11:24 AM, Jakub Hrozek wrote: > >> >Hi, > >> > > >> >I created this patch to try to diagnose an issue where sssd would > >> >randomly restart on any of machines in a VM cluster without giving too > >> >much advise why. I think it might be useful to merge in general. > >> > >> Hi Jakub, > >> > >> I reviewed the patch. Code looks good to me. > >> CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > >> > >> Then I tried to test new functionality. > >> > >> Man pages are right, I found diag_cmd in sssd.conf. > >> > >> And I really got the right message when I kill sss_pam: > >> # (Mon Nov 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child > >> [25767] > >> terminated with signal [9] > >> > >> I would like to see output of pstack, but I don't know, how to get the > >> right > >> state of SSSD. Can you help me, please? > > > >I tested the patch by setting a low 'timeout' in the 'domain' section > >and then setting the diag_cmd: > >[domain/foo] > >timeout = 2 > >diag_cmd = pstack %p > > > >then I stopped the back end: > ># kill -STOP $(pidof sssd_be) > > > >You should see the pstack output in /var/log/sssd/sssd.log, also the > >debug_level must be increased in the [sssd] section. You might also need > >to set SELinux to Permissive, otherwise sssd might not be able to fork > >an exec pstack.. > So in this case I would prefer if this opton was not documented. > or it should be documented issues with SELinux > Then I would prefer undocumented. It matches how we (don't) document the "command" option. A new patch is attached. >From fb1b8c5fd9fbec475c036563640d7e320d526620 Mon Sep 17 00:00:00 2001 From: Jakub Hrozek <jhro...@redhat.com> Date: Mon, 2 Nov 2015 11:41:31 +0100 Subject: [PATCH] SSSD: Add a new command diag_cmd This command is an optional one that is run when a sbus ping times out and before a SIGKILL commans is sent. This command supports a single template substitution that expands to the PID of the service being signaled. --- src/confdb/confdb.h | 1 + src/config/SSSDConfig/__init__.py.in | 1 + src/config/SSSDConfigTest.py | 1 + src/config/etc/sssd.api.conf | 1 + src/monitor/monitor.c| 215 +++ 5 files changed, 197 insertions(+), 22 deletions(-) diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h index 37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf 100644 --- a/src/confdb/confdb.h +++ b/src/confdb/confdb.h @@ -71,6 +71,7 @@ #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix" #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space" #define CONFDB_MONITOR_USER_RUNAS "user" +#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd" /* Both monitor and domains */ #define CONFDB_NAME_REGEX "re_expression" diff --git a/src/config/SSSDConfig/__init__.py.in b/src/config/SSSDConfig/__init__.py.in index bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23 100644 --- a/src/config/SSSDConfig/__init__.py.in +++ b/src/config/SSSDConfig/__init__.py.in @@ -50,6 +50,7 @@ option_strings = { 'reconnection_retries' : _('Number of times to attempt connection to Data Providers'), 'fd_limit' : _('The number of file descriptors that may be opened by this responder'), 'client_idle_timeout' : _('Idle time before automatic disconnection of a client'), +'diag_cmd' : _('The command to run when a service ping times out'), # [sssd] 'services' : _('SSSD Services to start'), diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py index 45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c 100755 --- a/src/config/SSSDConfigTest.py +++ b/src/config/SSSDConfigTest.py @@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase): 'reconnection_retries', 'fd_limit', 'client_idle_timeout', +'diag_cmd', 'description'] self.assertTrue(type(options) == dict, diff --git a/src/config/etc/sssd.api.conf b/src/config/etc/sssd.api.conf index 72abb8b3f427e9bcf3cf3dcaa67aaf4e3e50226c..0c03625bd5f0d3fbf8948971b015d4e8255f0009 100644 --- a/src/config/etc/sssd.api.conf +++ b/src/config/etc/sssd.api.conf @@ -13,6 +13,7 @@ fd_limit = int, None, false client_idle_timeout = int, No
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/09/2015 07:17 PM, Stephen Gallagher wrote: There are problems inherent with passing the PID to the child process. There's no guarantee that the process still exists. In the worst-case, the PID could actually be reassigned to a new process and the output you got back from something like pstack could be reading from a different executable entirely. +1 I am sorry I didn't see big picture. Petr ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote: > On 11/04/2015 11:24 AM, Jakub Hrozek wrote: > >Hi, > > > >I created this patch to try to diagnose an issue where sssd would > >randomly restart on any of machines in a VM cluster without giving too > >much advise why. I think it might be useful to merge in general. > > Hi Jakub, > > I reviewed the patch. Code looks good to me. > CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > > Then I tried to test new functionality. > > Man pages are right, I found diag_cmd in sssd.conf. > > And I really got the right message when I kill sss_pam: > # (Mon Nov 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767] > terminated with signal [9] > > I would like to see output of pstack, but I don't know, how to get the right > state of SSSD. Can you help me, please? I tested the patch by setting a low 'timeout' in the 'domain' section and then setting the diag_cmd: [domain/foo] timeout = 2 diag_cmd = pstack %p then I stopped the back end: # kill -STOP $(pidof sssd_be) You should see the pstack output in /var/log/sssd/sssd.log, also the debug_level must be increased in the [sssd] section. You might also need to set SELinux to Permissive, otherwise sssd might not be able to fork an exec pstack.. ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 11/09/2015 05:32 AM, Petr Cech wrote: > On 11/04/2015 11:24 AM, Jakub Hrozek wrote: >> Hi, >> >> I created this patch to try to diagnose an issue where sssd >> would randomly restart on any of machines in a VM cluster without >> giving too much advise why. I think it might be useful to merge >> in general. > > Hi Jakub, > > I reviewed the patch. Code looks good to me. CI tests passed: > http://sssd-ci.duckdns.org/logs/job/32/25/summary.html > > Then I tried to test new functionality. > > Man pages are right, I found diag_cmd in sssd.conf. > > And I really got the right message when I kill sss_pam: # (Mon Nov > 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767] > terminated with signal [9] > > I would like to see output of pstack, but I don't know, how to get > the right state of SSSD. Can you help me, please? > > Regards > > Petr There are problems inherent with passing the PID to the child process. There's no guarantee that the process still exists. In the worst-case, the PID could actually be reassigned to a new process and the output you got back from something like pstack could be reading from a different executable entirely. -BEGIN PGP SIGNATURE- Version: GnuPG v2 iEYEARECAAYFAlZA4zwACgkQeiVVYja6o6PjVQCgoTLJPxp46rdMzBmuTrbXi329 ZMEAn0fFVamHQugQAFjCNxQ4hCnTawmC =VOCY -END PGP SIGNATURE- ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd
On 11/04/2015 11:24 AM, Jakub Hrozek wrote: Hi, I created this patch to try to diagnose an issue where sssd would randomly restart on any of machines in a VM cluster without giving too much advise why. I think it might be useful to merge in general. Hi Jakub, I reviewed the patch. Code looks good to me. CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html Then I tried to test new functionality. Man pages are right, I found diag_cmd in sssd.conf. And I really got the right message when I kill sss_pam: # (Mon Nov 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767] terminated with signal [9] I would like to see output of pstack, but I don't know, how to get the right state of SSSD. Can you help me, please? Regards Petr ___ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
[SSSD] [PATCH] SSSD: Add a new command diag_cmd
Hi, I created this patch to try to diagnose an issue where sssd would randomly restart on any of machines in a VM cluster without giving too much advise why. I think it might be useful to merge in general. >From cf1f9d56b85387885f6211434b827fdbb9e5f4c3 Mon Sep 17 00:00:00 2001 From: Jakub Hrozek <jhro...@redhat.com> Date: Mon, 2 Nov 2015 11:41:31 +0100 Subject: [PATCH] SSSD: Add a new command diag_cmd This command is an optional one that is run when a sbus ping times out and before a SIGKILL commans is sent. This command supports a single template substitution that expands to the PID of the service being signaled. --- src/confdb/confdb.h | 1 + src/config/SSSDConfig/__init__.py.in | 1 + src/config/SSSDConfigTest.py | 1 + src/config/etc/sssd.api.conf | 1 + src/man/sssd.conf.5.xml | 17 +++ src/monitor/monitor.c| 215 +++ 6 files changed, 214 insertions(+), 22 deletions(-) diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h index 37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf 100644 --- a/src/confdb/confdb.h +++ b/src/confdb/confdb.h @@ -71,6 +71,7 @@ #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix" #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space" #define CONFDB_MONITOR_USER_RUNAS "user" +#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd" /* Both monitor and domains */ #define CONFDB_NAME_REGEX "re_expression" diff --git a/src/config/SSSDConfig/__init__.py.in b/src/config/SSSDConfig/__init__.py.in index bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23 100644 --- a/src/config/SSSDConfig/__init__.py.in +++ b/src/config/SSSDConfig/__init__.py.in @@ -50,6 +50,7 @@ option_strings = { 'reconnection_retries' : _('Number of times to attempt connection to Data Providers'), 'fd_limit' : _('The number of file descriptors that may be opened by this responder'), 'client_idle_timeout' : _('Idle time before automatic disconnection of a client'), +'diag_cmd' : _('The command to run when a service ping times out'), # [sssd] 'services' : _('SSSD Services to start'), diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py index 45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c 100755 --- a/src/config/SSSDConfigTest.py +++ b/src/config/SSSDConfigTest.py @@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase): 'reconnection_retries', 'fd_limit', 'client_idle_timeout', +'diag_cmd', 'description'] self.assertTrue(type(options) == dict, diff --git a/src/config/etc/sssd.api.conf b/src/config/etc/sssd.api.conf index 72abb8b3f427e9bcf3cf3dcaa67aaf4e3e50226c..0c03625bd5f0d3fbf8948971b015d4e8255f0009 100644 --- a/src/config/etc/sssd.api.conf +++ b/src/config/etc/sssd.api.conf @@ -13,6 +13,7 @@ fd_limit = int, None, false client_idle_timeout = int, None, false force_timeout = int, None, false description = str, None, false +diag_cmd = str, None, false [sssd] # Monitor service diff --git a/src/man/sssd.conf.5.xml b/src/man/sssd.conf.5.xml index 573f421a7d885d28d5dbc03423e6c6dd84d7b8fd..615629ad0c4bbd6bc43812cd354d8d61c2f77514 100644 --- a/src/man/sssd.conf.5.xml +++ b/src/man/sssd.conf.5.xml @@ -116,6 +116,23 @@ + +diag_cmd (string) + + +A command that should be run for diagnostic purpose +when an sbus timeout fails. The option value may +contain %p which would be expanded +for the process ID of the process that timed out. + + +Example: pstack %p + + +Default: Not set + + + diff --git a/src/monitor/monitor.c b/src/monitor/monitor.c index 89ac882d38652e071fdc619495c3252807527da0..5ff5f85527a6aa91e220ddd07e8c8738dba0e829 100644 --- a/src/monitor/monitor.c +++ b/src/monitor/monitor.c @@ -116,6 +116,7 @@ struct mt_svc { char *identity; pid_t pid; +char *diag_cmd; int ping_time; int kill_time; @@ -383,6 +384,176 @@ static int add_svc_conn_spy(struct mt_svc *svc) return EOK; } +static char *expand_diag_cmd(struct mt_svc *svc, + const char *template) +{ +TALLOC_CTX *tmp_ctx = NULL; +char *copy; +char *p; +char *n; +char *result = NULL; +char action; +char *res = NULL; + +if (template == NULL)