Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-13 Thread Petr Cech

On 11/13/2015 08:20 AM, Petr Cech wrote:

>Hi Jakub,
>
>it works due to your reproducer. It is really need to have
>setenforce == 1

You meant setenforce 0, right?


Hi Jakub,

yes, of course, I meant setenforce 1. It was mistake.

   --^-- 1 --> 0
I did little mistake again.
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-13 Thread Jakub Hrozek
On Fri, Nov 13, 2015 at 08:20:53AM +0100, Petr Cech wrote:
> I reviewed the code. All remarks were addressed.
> CI tests passed:
> http://sssd-ci.duckdns.org/logs/job/33/10/summary.html
> 
> => ACK
> 
> Regards
> 
> Petr

I changed the wording of the commit message a bit (we added an option,
not a command) and pushed the patch to master:
89530c830ded58c6140cdb34c9de07bf77bb5bc0
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-12 Thread Petr Cech

On 11/11/2015 12:25 PM, Jakub Hrozek wrote:

On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote:

>On 11/11/2015 09:32 AM, Jakub Hrozek wrote:

> >>>Hi Jakub,
> >>>
> >>>I just sent the patch to the CI tests and they passed
> >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html
> >>>

> >>>> >Then I would prefer undocumented. It matches how we (don't) document the
> >>>> >"command" option.

> >>>I agree with little exception. I think it could be more clear if we write
> >>>little documentation to commit message or to the code near the new option.
> >>>But, how everybody wrote, we could leave man page without documentation.

> >Good idea, I can move the documentation that was previously in the man
> >page to the commit message, would that work for you?

>
>Yes, it works for me.
>
>I am going to 1/2 PTO for now.
>
>The last two things are that I would like to run it due to your reproducer.
>And there is a remark from Stephen Gallagher which we need resolve.
>
>However, CI tests passed. So if you're in a hurry with code review, please
>ask someone else.
>
>I can continue with review tomorrow.

I'm not in hurry at all. Attached is a patch that adds a better commit
message. We can discuss any details related to testing over IRC if you
prefer.

Hi Jakub,

it works due to your reproducer. It is really need to have
setenforce == 1

CI tests passed:
http://sssd-ci.duckdns.org/logs/job/32/25/summary.html

Stephen Gallagher wrote (2015-09-11 11:32 AM):
There are problems inherent with passing the PID to the child process.
There's no guarantee that the process still exists. In the worst-case,
the PID could actually be reassigned to a new process and the output
you got back from something like pstack could be reading from a
different executable entirely.
---
I understand, it could be dangerous. But, this option is a little bit 
secret, we don't write about it in our man pages and so on. I hope it 
will be used only for debuging some hot cases.


There are only little remarks in patch.

=> ACK

Regrds

Petr

PS: I accepted that we have
# p = copy;
not something like:
# copy_ptr = copy;
How I suggested previous mail.


0001-SSSD-Add-a-new-command-diag_cmd.patch


 From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001
From: Jakub Hrozek<jhro...@redhat.com>
Date: Mon, 2 Nov 2015 11:41:31 +0100
Subject: [PATCH] SSSD: Add a new command diag_cmd

This command is an optional one that is run when a sbus ping times out
and before a SIGKILL commans is sent.

  ---^--- s -> d


diag_cmd (string):
A command that should be run for diagnostic purpose when an sbus timeout
fails. The option value may contain %p which would be expanded for the
process ID of the process that timed out

Example:
 pstack %p
This setting would print the stackstrace of the command whose ping timed out.

Default: not set.
---
  src/confdb/confdb.h  |   1 +
  src/config/SSSDConfig/__init__.py.in |   1 +
  src/config/SSSDConfigTest.py |   1 +
  src/config/etc/sssd.api.conf |   1 +
  src/monitor/monitor.c| 215 +++
  5 files changed, 197 insertions(+), 22 deletions(-)

diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h
index 
37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf
 100644
--- a/src/confdb/confdb.h
+++ b/src/confdb/confdb.h
@@ -71,6 +71,7 @@
  #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix"
  #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space"
  #define CONFDB_MONITOR_USER_RUNAS "user"
+#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd"

  /* Both monitor and domains */
  #define CONFDB_NAME_REGEX   "re_expression"
diff --git a/src/config/SSSDConfig/__init__.py.in 
b/src/config/SSSDConfig/__init__.py.in
index 
bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23
 100644
--- a/src/config/SSSDConfig/__init__.py.in
+++ b/src/config/SSSDConfig/__init__.py.in
@@ -50,6 +50,7 @@ option_strings = {
  'reconnection_retries' : _('Number of times to attempt connection to Data 
Providers'),
  'fd_limit' : _('The number of file descriptors that may be opened by this 
responder'),
  'client_idle_timeout' : _('Idle time before automatic disconnection of a 
client'),
+'diag_cmd' : _('The command to run when a service ping times out'),

  # [sssd]
  'services' : _('SSSD Services to start'),
diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py
index 
45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c
 100755
--- a/src/config/SSSDConfigTest.py
+++ b/src/config/SSSDConfigTest.py
@@ -307,6 +307,7 @@ class SSSDConfigTestSSSDServi

Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-12 Thread Petr Cech

On 11/12/2015 06:30 PM, Jakub Hrozek wrote:

On Thu, Nov 12, 2015 at 10:49:33AM +0100, Petr Cech wrote:

>On 11/11/2015 12:25 PM, Jakub Hrozek wrote:

> >On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote:

> >>>On 11/11/2015 09:32 AM, Jakub Hrozek wrote:

> >>>>>> >>>Hi Jakub,
> >>>>>> >>>
> >>>>>> >>>I just sent the patch to the CI tests and they passed
> >>>>>> >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html
> >>>>>> >>>

> >>>>>>>> >>>> >Then I would prefer undocumented. It matches how we (don't) 
document the
> >>>>>>>> >>>> >"command" option.

> >>>>>> >>>I agree with little exception. I think it could be more clear if we 
write
> >>>>>> >>>little documentation to commit message or to the code near the new 
option.
> >>>>>> >>>But, how everybody wrote, we could leave man page without 
documentation.

> >>>> >Good idea, I can move the documentation that was previously in the man
> >>>> >page to the commit message, would that work for you?

> >>>
> >>>Yes, it works for me.
> >>>
> >>>I am going to 1/2 PTO for now.
> >>>
> >>>The last two things are that I would like to run it due to your reproducer.
> >>>And there is a remark from Stephen Gallagher which we need resolve.
> >>>
> >>>However, CI tests passed. So if you're in a hurry with code review, please
> >>>ask someone else.
> >>>
> >>>I can continue with review tomorrow.

> >I'm not in hurry at all. Attached is a patch that adds a better commit
> >message. We can discuss any details related to testing over IRC if you
> >prefer.

>Hi Jakub,
>
>it works due to your reproducer. It is really need to have
>setenforce == 1

You meant setenforce 0, right?


Hi Jakub,

yes, of course, I meant setenforce 1. It was mistake.



>
>CI tests passed:
>http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
>
>Stephen Gallagher wrote (2015-09-11 11:32 AM):
>There are problems inherent with passing the PID to the child process.
>There's no guarantee that the process still exists. In the worst-case,
>the PID could actually be reassigned to a new process and the output
>you got back from something like pstack could be reading from a
>different executable entirely.
>---
>I understand, it could be dangerous. But, this option is a little bit
>secret, we don't write about it in our man pages and so on. I hope it will
>be used only for debuging some hot cases.

Yes, Stephen is right, but if this option is only used for debugging,
then I think we're fine. I would really prefer to have this undocumented
option rather than run blind in case services get stuck..

Yes, I agreee. We can use this way because it is only for debugging.




The real solution here would be to use systemd for service management.

Good point. Thank you for remark.





>
>There are only little remarks in patch.
>
>=> ACK

Then it should be a nack, don't let sloppy patches through:-)

OK I will be more restrictive.





>
>Regrds
>
>Petr
>
>PS: I accepted that we have
># p = copy;
>not something like:
># copy_ptr = copy;
>How I suggested previous mail.

Sorry, I overlooked that previously. It's an honest mistake, I didn't want
to ignore you. Feel free to just push again for changes you like next time,
each suggestion should be discussed and either accepted or rejected (with
good reason).

OK.





>

> >0001-SSSD-Add-a-new-command-diag_cmd.patch
> >
> >
> > From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001
> >From: Jakub Hrozek<jhro...@redhat.com>
> >Date: Mon, 2 Nov 2015 11:41:31 +0100
> >Subject: [PATCH] SSSD: Add a new command diag_cmd
> >
> >This command is an optional one that is run when a sbus ping times out
> >and before a SIGKILL commans is sent.

>   ---^--- s -> d

Fixed

OK.




[...]


> >@@ -1065,6 +1237,18 @@ static errno_t get_ping_config(struct mt_ctx *ctx, 
const char *path,
> >"Time between service pings for [%s]: [%d]\n",
> > svc->name, svc->ping_time);
> >
> >+ret = confdb_get_string(ctx->cdb, svc, path,
> >+CONFDB_MONITOR_PRE_KILL_CMD,
> >+        NULL, >diag_cmd);
> >+if (ret != EOK) {
> >+DEBUG(SSSDBG_CRIT_FAILURE,
> >+  

Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-12 Thread Jakub Hrozek
On Thu, Nov 12, 2015 at 10:49:33AM +0100, Petr Cech wrote:
> On 11/11/2015 12:25 PM, Jakub Hrozek wrote:
> >On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote:
> >>>On 11/11/2015 09:32 AM, Jakub Hrozek wrote:
> >>>>>> >>>Hi Jakub,
> >>>>>> >>>
> >>>>>> >>>I just sent the patch to the CI tests and they passed
> >>>>>> >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html
> >>>>>> >>>
> >>>>>>>> >>>> >Then I would prefer undocumented. It matches how we (don't) 
> >>>>>>>> >>>> >document the
> >>>>>>>> >>>> >"command" option.
> >>>>>> >>>I agree with little exception. I think it could be more clear if we 
> >>>>>> >>>write
> >>>>>> >>>little documentation to commit message or to the code near the new 
> >>>>>> >>>option.
> >>>>>> >>>But, how everybody wrote, we could leave man page without 
> >>>>>> >>>documentation.
> >>>> >Good idea, I can move the documentation that was previously in the man
> >>>> >page to the commit message, would that work for you?
> >>>
> >>>Yes, it works for me.
> >>>
> >>>I am going to 1/2 PTO for now.
> >>>
> >>>The last two things are that I would like to run it due to your reproducer.
> >>>And there is a remark from Stephen Gallagher which we need resolve.
> >>>
> >>>However, CI tests passed. So if you're in a hurry with code review, please
> >>>ask someone else.
> >>>
> >>>I can continue with review tomorrow.
> >I'm not in hurry at all. Attached is a patch that adds a better commit
> >message. We can discuss any details related to testing over IRC if you
> >prefer.
> Hi Jakub,
> 
> it works due to your reproducer. It is really need to have
> setenforce == 1

You meant setenforce 0, right?

> 
> CI tests passed:
> http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
> 
> Stephen Gallagher wrote (2015-09-11 11:32 AM):
> There are problems inherent with passing the PID to the child process.
> There's no guarantee that the process still exists. In the worst-case,
> the PID could actually be reassigned to a new process and the output
> you got back from something like pstack could be reading from a
> different executable entirely.
> ---
> I understand, it could be dangerous. But, this option is a little bit
> secret, we don't write about it in our man pages and so on. I hope it will
> be used only for debuging some hot cases.

Yes, Stephen is right, but if this option is only used for debugging,
then I think we're fine. I would really prefer to have this undocumented
option rather than run blind in case services get stuck..

The real solution here would be to use systemd for service management.

> 
> There are only little remarks in patch.
> 
> => ACK

Then it should be a nack, don't let sloppy patches through :-)

> 
> Regrds
> 
> Petr
> 
> PS: I accepted that we have
> # p = copy;
> not something like:
> # copy_ptr = copy;
> How I suggested previous mail.

Sorry, I overlooked that previously. It's an honest mistake, I didn't want
to ignore you. Feel free to just push again for changes you like next time,
each suggestion should be discussed and either accepted or rejected (with
good reason).

> 
> >0001-SSSD-Add-a-new-command-diag_cmd.patch
> >
> >
> > From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001
> >From: Jakub Hrozek<jhro...@redhat.com>
> >Date: Mon, 2 Nov 2015 11:41:31 +0100
> >Subject: [PATCH] SSSD: Add a new command diag_cmd
> >
> >This command is an optional one that is run when a sbus ping times out
> >and before a SIGKILL commans is sent.
>   ---^--- s -> d

Fixed

[...]

> >@@ -1065,6 +1237,18 @@ static errno_t get_ping_config(struct mt_ctx *ctx, 
> >const char *path,
> >"Time between service pings for [%s]: [%d]\n",
> > svc->name, svc->ping_time);
> >
> >+ret = confdb_get_string(ctx->cdb, svc, path,
> >+CONFDB_MONITOR_PRE_KILL_CMD,
> >+NULL, >diag_cmd);
> >+if (ret != EOK) {
> >+DEBUG(SSSDBG_CRIT_FAILURE,
> >+  "Failed to get diagnostics command for %s\n", svc-&

Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-11 Thread Petr Cech

On 11/11/2015 09:32 AM, Jakub Hrozek wrote:

>Hi Jakub,
>
>I just sent the patch to the CI tests and they passed
>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html
>

> >Then I would prefer undocumented. It matches how we (don't) document the
> >"command" option.

>I agree with little exception. I think it could be more clear if we write
>little documentation to commit message or to the code near the new option.
>But, how everybody wrote, we could leave man page without documentation.

Good idea, I can move the documentation that was previously in the man
page to the commit message, would that work for you?


Yes, it works for me.

I am going to 1/2 PTO for now.

The last two things are that I would like to run it due to your 
reproducer. And there is a remark from Stephen Gallagher which we need 
resolve.


However, CI tests passed. So if you're in a hurry with code review, 
please ask someone else.


I can continue with review tomorrow.

Regards

Petr
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-11 Thread Jakub Hrozek
On Wed, Nov 11, 2015 at 09:12:52AM +0100, Petr Cech wrote:
> On 11/10/2015 04:20 PM, Jakub Hrozek wrote:
> >On Tue, Nov 10, 2015 at 01:22:54PM +0100, Lukas Slebodnik wrote:
> >>>On (10/11/15 13:15), Jakub Hrozek wrote:
>  >On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote:
> > >>On 11/04/2015 11:24 AM, Jakub Hrozek wrote:
> >> >> >Hi,
> >> >> >
> >> >> >I created this patch to try to diagnose an issue where sssd would
> >> >> >randomly restart on any of machines in a VM cluster without giving 
> >> >> >too
> >> >> >much advise why. I think it might be useful to merge in general.
> > >>
> > >>Hi Jakub,
> > >>
> > >>I reviewed the patch. Code looks good to me.
> > >>CI tests passed:http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
> > >>
> > >>Then I tried to test new functionality.
> > >>
> > >>Man pages are right, I found diag_cmd in sssd.conf.
> > >>
> > >>And I really got the right message when I kill sss_pam:
> > >># (Mon Nov  9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child 
> > >>[25767]
> > >>terminated with signal [9]
> > >>
> > >>I would like to see output of pstack, but I don't know, how to get 
> > >>the right
> > >>state of SSSD. Can you help me, please?
>  >
>  >I tested the patch by setting a low 'timeout' in the 'domain' section
>  >and then setting the diag_cmd:
>  >[domain/foo]
>  >timeout = 2
>  >diag_cmd = pstack %p
>  >
>  >then I stopped the back end:
>  ># kill -STOP $(pidof sssd_be)
>  >
>  >You should see the pstack output in /var/log/sssd/sssd.log, also the
>  >debug_level must be increased in the [sssd] section. You might also need
>  >to set SELinux to Permissive, otherwise sssd might not be able to fork
>  >an exec pstack..
> >>>So in this case I would prefer if this opton was not documented.
> >>>or it should be documented issues with SELinux
> >>>
> Hi Jakub,
> 
> I just sent the patch to the CI tests and they passed
> http://sssd-ci.duckdns.org/logs/job/32/63/summary.html
> 
> >Then I would prefer undocumented. It matches how we (don't) document the
> >"command" option.
> I agree with little exception. I think it could be more clear if we write
> little documentation to commit message or to the code near the new option.
> But, how everybody wrote, we could leave man page without documentation.

Good idea, I can move the documentation that was previously in the man
page to the commit message, would that work for you?
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-11 Thread Jakub Hrozek
On Wed, Nov 11, 2015 at 11:07:46AM +0100, Petr Cech wrote:
> On 11/11/2015 09:32 AM, Jakub Hrozek wrote:
> >>>Hi Jakub,
> >>>
> >>>I just sent the patch to the CI tests and they passed
> >>>http://sssd-ci.duckdns.org/logs/job/32/63/summary.html
> >>>
> >>>> >Then I would prefer undocumented. It matches how we (don't) document the
> >>>> >"command" option.
> >>>I agree with little exception. I think it could be more clear if we write
> >>>little documentation to commit message or to the code near the new option.
> >>>But, how everybody wrote, we could leave man page without documentation.
> >Good idea, I can move the documentation that was previously in the man
> >page to the commit message, would that work for you?
> 
> Yes, it works for me.
> 
> I am going to 1/2 PTO for now.
> 
> The last two things are that I would like to run it due to your reproducer.
> And there is a remark from Stephen Gallagher which we need resolve.
> 
> However, CI tests passed. So if you're in a hurry with code review, please
> ask someone else.
> 
> I can continue with review tomorrow.

I'm not in hurry at all. Attached is a patch that adds a better commit
message. We can discuss any details related to testing over IRC if you
prefer.
>From ee4135adf6669221de575ebc92e7b3aabba55ba9 Mon Sep 17 00:00:00 2001
From: Jakub Hrozek <jhro...@redhat.com>
Date: Mon, 2 Nov 2015 11:41:31 +0100
Subject: [PATCH] SSSD: Add a new command diag_cmd

This command is an optional one that is run when a sbus ping times out
and before a SIGKILL commans is sent.

diag_cmd (string):
A command that should be run for diagnostic purpose when an sbus timeout
fails. The option value may contain %p which would be expanded for the
process ID of the process that timed out

Example:
pstack %p
This setting would print the stackstrace of the command whose ping timed out.

Default: not set.
---
 src/confdb/confdb.h  |   1 +
 src/config/SSSDConfig/__init__.py.in |   1 +
 src/config/SSSDConfigTest.py |   1 +
 src/config/etc/sssd.api.conf |   1 +
 src/monitor/monitor.c| 215 +++
 5 files changed, 197 insertions(+), 22 deletions(-)

diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h
index 
37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf
 100644
--- a/src/confdb/confdb.h
+++ b/src/confdb/confdb.h
@@ -71,6 +71,7 @@
 #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix"
 #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space"
 #define CONFDB_MONITOR_USER_RUNAS "user"
+#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd"
 
 /* Both monitor and domains */
 #define CONFDB_NAME_REGEX   "re_expression"
diff --git a/src/config/SSSDConfig/__init__.py.in 
b/src/config/SSSDConfig/__init__.py.in
index 
bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23
 100644
--- a/src/config/SSSDConfig/__init__.py.in
+++ b/src/config/SSSDConfig/__init__.py.in
@@ -50,6 +50,7 @@ option_strings = {
 'reconnection_retries' : _('Number of times to attempt connection to Data 
Providers'),
 'fd_limit' : _('The number of file descriptors that may be opened by this 
responder'),
 'client_idle_timeout' : _('Idle time before automatic disconnection of a 
client'),
+'diag_cmd' : _('The command to run when a service ping times out'),
 
 # [sssd]
 'services' : _('SSSD Services to start'),
diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py
index 
45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c
 100755
--- a/src/config/SSSDConfigTest.py
+++ b/src/config/SSSDConfigTest.py
@@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase):
 'reconnection_retries',
 'fd_limit',
 'client_idle_timeout',
+'diag_cmd',
 'description']
 
 self.assertTrue(type(options) == dict,
diff --git a/src/config/etc/sssd.api.conf b/src/config/etc/sssd.api.conf
index 
72abb8b3f427e9bcf3cf3dcaa67aaf4e3e50226c..0c03625bd5f0d3fbf8948971b015d4e8255f0009
 100644
--- a/src/config/etc/sssd.api.conf
+++ b/src/config/etc/sssd.api.conf
@@ -13,6 +13,7 @@ fd_limit = int, None, false
 client_idle_timeout = int, None, false
 force_timeout = int, None, false
 description = str, None, false
+diag_cmd = str, None, false
 
 [sssd]
 # Monitor service
diff --git a/src/monitor/monitor.c b/src/monitor/monitor.c
index 
89ac882d38652e071fdc619495c3252807527da0..5ff5f85527a6aa91e220ddd07e8c8738dba0e829
 100644
--- a/src/monitor/monitor.c
+++ b/src/monitor/monitor.c
@@ -116,6 +116,7 @@ struct mt_svc {
 char *identity;
 pid_t pid;
 
+char *diag_cmd;
 int ping_time;
 int kill_time;
 
@@ -

Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-11 Thread Petr Cech

On 11/10/2015 04:20 PM, Jakub Hrozek wrote:

On Tue, Nov 10, 2015 at 01:22:54PM +0100, Lukas Slebodnik wrote:

>On (10/11/15 13:15), Jakub Hrozek wrote:

> >On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote:

> >>On 11/04/2015 11:24 AM, Jakub Hrozek wrote:

> >> >Hi,
> >> >
> >> >I created this patch to try to diagnose an issue where sssd would
> >> >randomly restart on any of machines in a VM cluster without giving too
> >> >much advise why. I think it might be useful to merge in general.

> >>
> >>Hi Jakub,
> >>
> >>I reviewed the patch. Code looks good to me.
> >>CI tests passed:http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
> >>
> >>Then I tried to test new functionality.
> >>
> >>Man pages are right, I found diag_cmd in sssd.conf.
> >>
> >>And I really got the right message when I kill sss_pam:
> >># (Mon Nov  9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767]
> >>terminated with signal [9]
> >>
> >>I would like to see output of pstack, but I don't know, how to get the right
> >>state of SSSD. Can you help me, please?

> >
> >I tested the patch by setting a low 'timeout' in the 'domain' section
> >and then setting the diag_cmd:
> >[domain/foo]
> >timeout = 2
> >diag_cmd = pstack %p
> >
> >then I stopped the back end:
> ># kill -STOP $(pidof sssd_be)
> >
> >You should see the pstack output in /var/log/sssd/sssd.log, also the
> >debug_level must be increased in the [sssd] section. You might also need
> >to set SELinux to Permissive, otherwise sssd might not be able to fork
> >an exec pstack..

>So in this case I would prefer if this opton was not documented.
>or it should be documented issues with SELinux
>

Hi Jakub,

I just sent the patch to the CI tests and they passed
http://sssd-ci.duckdns.org/logs/job/32/63/summary.html


Then I would prefer undocumented. It matches how we (don't) document the
"command" option.
I agree with little exception. I think it could be more clear if we 
write little documentation to commit message or to the code near the new 
option. But, how everybody wrote, we could leave man page without 
documentation.




A new patch is attached.


0001-SSSD-Add-a-new-command-diag_cmd.patch


 From fb1b8c5fd9fbec475c036563640d7e320d526620 Mon Sep 17 00:00:00 2001
From: Jakub Hrozek<jhro...@redhat.com>
Date: Mon, 2 Nov 2015 11:41:31 +0100
Subject: [PATCH] SSSD: Add a new command diag_cmd

This command is an optional one that is run when a sbus ping times out
and before a SIGKILL commans is sent.

This command supports a single template substitution that expands to the
PID of the service being signaled.
---
  src/confdb/confdb.h  |   1 +
  src/config/SSSDConfig/__init__.py.in |   1 +
  src/config/SSSDConfigTest.py |   1 +
  src/config/etc/sssd.api.conf |   1 +
  src/monitor/monitor.c| 215 +++
  5 files changed, 197 insertions(+), 22 deletions(-)

diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h
index 
37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf
 100644
--- a/src/confdb/confdb.h
+++ b/src/confdb/confdb.h
@@ -71,6 +71,7 @@
  #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix"
  #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space"
  #define CONFDB_MONITOR_USER_RUNAS "user"
+#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd"

  /* Both monitor and domains */
  #define CONFDB_NAME_REGEX   "re_expression"
diff --git a/src/config/SSSDConfig/__init__.py.in 
b/src/config/SSSDConfig/__init__.py.in
index 
bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23
 100644
--- a/src/config/SSSDConfig/__init__.py.in
+++ b/src/config/SSSDConfig/__init__.py.in
@@ -50,6 +50,7 @@ option_strings = {
  'reconnection_retries' : _('Number of times to attempt connection to Data 
Providers'),
  'fd_limit' : _('The number of file descriptors that may be opened by this 
responder'),
  'client_idle_timeout' : _('Idle time before automatic disconnection of a 
client'),
+'diag_cmd' : _('The command to run when a service ping times out'),
This is the reason why I ask for little documentation... because there 
is '%p' template shadowed.




  # [sssd]
  'services' : _('SSSD Services to start'),
diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py
index 
45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c
 100755
--- a/src/config/SSSDConfigTest.py
+++ b/src/config/SSSDConfigTest.py
@@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase):
  '

Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-10 Thread Jakub Hrozek
On Tue, Nov 10, 2015 at 01:22:54PM +0100, Lukas Slebodnik wrote:
> On (10/11/15 13:15), Jakub Hrozek wrote:
> >On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote:
> >> On 11/04/2015 11:24 AM, Jakub Hrozek wrote:
> >> >Hi,
> >> >
> >> >I created this patch to try to diagnose an issue where sssd would
> >> >randomly restart on any of machines in a VM cluster without giving too
> >> >much advise why. I think it might be useful to merge in general.
> >> 
> >> Hi Jakub,
> >> 
> >> I reviewed the patch. Code looks good to me.
> >> CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
> >> 
> >> Then I tried to test new functionality.
> >> 
> >> Man pages are right, I found diag_cmd in sssd.conf.
> >> 
> >> And I really got the right message when I kill sss_pam:
> >> # (Mon Nov  9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child 
> >> [25767]
> >> terminated with signal [9]
> >> 
> >> I would like to see output of pstack, but I don't know, how to get the 
> >> right
> >> state of SSSD. Can you help me, please?
> >
> >I tested the patch by setting a low 'timeout' in the 'domain' section
> >and then setting the diag_cmd:
> >[domain/foo]
> >timeout = 2
> >diag_cmd = pstack %p
> >
> >then I stopped the back end:
> ># kill -STOP $(pidof sssd_be)
> >
> >You should see the pstack output in /var/log/sssd/sssd.log, also the
> >debug_level must be increased in the [sssd] section. You might also need
> >to set SELinux to Permissive, otherwise sssd might not be able to fork
> >an exec pstack..
> So in this case I would prefer if this opton was not documented.
> or it should be documented issues with SELinux
> 

Then I would prefer undocumented. It matches how we (don't) document the
"command" option.

A new patch is attached.
>From fb1b8c5fd9fbec475c036563640d7e320d526620 Mon Sep 17 00:00:00 2001
From: Jakub Hrozek <jhro...@redhat.com>
Date: Mon, 2 Nov 2015 11:41:31 +0100
Subject: [PATCH] SSSD: Add a new command diag_cmd

This command is an optional one that is run when a sbus ping times out
and before a SIGKILL commans is sent.

This command supports a single template substitution that expands to the
PID of the service being signaled.
---
 src/confdb/confdb.h  |   1 +
 src/config/SSSDConfig/__init__.py.in |   1 +
 src/config/SSSDConfigTest.py |   1 +
 src/config/etc/sssd.api.conf |   1 +
 src/monitor/monitor.c| 215 +++
 5 files changed, 197 insertions(+), 22 deletions(-)

diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h
index 
37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf
 100644
--- a/src/confdb/confdb.h
+++ b/src/confdb/confdb.h
@@ -71,6 +71,7 @@
 #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix"
 #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space"
 #define CONFDB_MONITOR_USER_RUNAS "user"
+#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd"
 
 /* Both monitor and domains */
 #define CONFDB_NAME_REGEX   "re_expression"
diff --git a/src/config/SSSDConfig/__init__.py.in 
b/src/config/SSSDConfig/__init__.py.in
index 
bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23
 100644
--- a/src/config/SSSDConfig/__init__.py.in
+++ b/src/config/SSSDConfig/__init__.py.in
@@ -50,6 +50,7 @@ option_strings = {
 'reconnection_retries' : _('Number of times to attempt connection to Data 
Providers'),
 'fd_limit' : _('The number of file descriptors that may be opened by this 
responder'),
 'client_idle_timeout' : _('Idle time before automatic disconnection of a 
client'),
+'diag_cmd' : _('The command to run when a service ping times out'),
 
 # [sssd]
 'services' : _('SSSD Services to start'),
diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py
index 
45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c
 100755
--- a/src/config/SSSDConfigTest.py
+++ b/src/config/SSSDConfigTest.py
@@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase):
 'reconnection_retries',
 'fd_limit',
 'client_idle_timeout',
+'diag_cmd',
 'description']
 
 self.assertTrue(type(options) == dict,
diff --git a/src/config/etc/sssd.api.conf b/src/config/etc/sssd.api.conf
index 
72abb8b3f427e9bcf3cf3dcaa67aaf4e3e50226c..0c03625bd5f0d3fbf8948971b015d4e8255f0009
 100644
--- a/src/config/etc/sssd.api.conf
+++ b/src/config/etc/sssd.api.conf
@@ -13,6 +13,7 @@ fd_limit = int, None, false
 client_idle_timeout = int, No

Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-10 Thread Petr Cech

On 11/09/2015 07:17 PM, Stephen Gallagher wrote:

There are problems inherent with passing the PID to the child process.
There's no guarantee that the process still exists. In the worst-case,
the PID could actually be reassigned to a new process and the output
you got back from something like pstack could be reading from a
different executable entirely.

+1

I am sorry I didn't see big picture.

Petr
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-10 Thread Jakub Hrozek
On Mon, Nov 09, 2015 at 11:32:30AM +0100, Petr Cech wrote:
> On 11/04/2015 11:24 AM, Jakub Hrozek wrote:
> >Hi,
> >
> >I created this patch to try to diagnose an issue where sssd would
> >randomly restart on any of machines in a VM cluster without giving too
> >much advise why. I think it might be useful to merge in general.
> 
> Hi Jakub,
> 
> I reviewed the patch. Code looks good to me.
> CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
> 
> Then I tried to test new functionality.
> 
> Man pages are right, I found diag_cmd in sssd.conf.
> 
> And I really got the right message when I kill sss_pam:
> # (Mon Nov  9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767]
> terminated with signal [9]
> 
> I would like to see output of pstack, but I don't know, how to get the right
> state of SSSD. Can you help me, please?

I tested the patch by setting a low 'timeout' in the 'domain' section
and then setting the diag_cmd:
[domain/foo]
timeout = 2
diag_cmd = pstack %p

then I stopped the back end:
# kill -STOP $(pidof sssd_be)

You should see the pstack output in /var/log/sssd/sssd.log, also the
debug_level must be increased in the [sssd] section. You might also need
to set SELinux to Permissive, otherwise sssd might not be able to fork
an exec pstack..
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-09 Thread Stephen Gallagher
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11/09/2015 05:32 AM, Petr Cech wrote:
> On 11/04/2015 11:24 AM, Jakub Hrozek wrote:
>> Hi,
>> 
>> I created this patch to try to diagnose an issue where sssd
>> would randomly restart on any of machines in a VM cluster without
>> giving too much advise why. I think it might be useful to merge
>> in general.
> 
> Hi Jakub,
> 
> I reviewed the patch. Code looks good to me. CI tests passed:
> http://sssd-ci.duckdns.org/logs/job/32/25/summary.html
> 
> Then I tried to test new functionality.
> 
> Man pages are right, I found diag_cmd in sssd.conf.
> 
> And I really got the right message when I kill sss_pam: # (Mon Nov
> 9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child [25767]
> terminated with signal [9]
> 
> I would like to see output of pstack, but I don't know, how to get
> the right state of SSSD. Can you help me, please?
> 
> Regards
> 
> Petr


There are problems inherent with passing the PID to the child process.
There's no guarantee that the process still exists. In the worst-case,
the PID could actually be reassigned to a new process and the output
you got back from something like pstack could be reading from a
different executable entirely.

-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iEYEARECAAYFAlZA4zwACgkQeiVVYja6o6PjVQCgoTLJPxp46rdMzBmuTrbXi329
ZMEAn0fFVamHQugQAFjCNxQ4hCnTawmC
=VOCY
-END PGP SIGNATURE-
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


Re: [SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-09 Thread Petr Cech

On 11/04/2015 11:24 AM, Jakub Hrozek wrote:

Hi,

I created this patch to try to diagnose an issue where sssd would
randomly restart on any of machines in a VM cluster without giving too
much advise why. I think it might be useful to merge in general.


Hi Jakub,

I reviewed the patch. Code looks good to me.
CI tests passed: http://sssd-ci.duckdns.org/logs/job/32/25/summary.html

Then I tried to test new functionality.

Man pages are right, I found diag_cmd in sssd.conf.

And I really got the right message when I kill sss_pam:
# (Mon Nov  9 04:30:47 2015) [sssd] [svc_child_info] (0x0040): Child 
[25767] terminated with signal [9]


I would like to see output of pstack, but I don't know, how to get the 
right state of SSSD. Can you help me, please?


Regards

Petr
___
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel


[SSSD] [PATCH] SSSD: Add a new command diag_cmd

2015-11-04 Thread Jakub Hrozek
Hi,

I created this patch to try to diagnose an issue where sssd would
randomly restart on any of machines in a VM cluster without giving too
much advise why. I think it might be useful to merge in general.
>From cf1f9d56b85387885f6211434b827fdbb9e5f4c3 Mon Sep 17 00:00:00 2001
From: Jakub Hrozek <jhro...@redhat.com>
Date: Mon, 2 Nov 2015 11:41:31 +0100
Subject: [PATCH] SSSD: Add a new command diag_cmd

This command is an optional one that is run when a sbus ping times out
and before a SIGKILL commans is sent.

This command supports a single template substitution that expands to the
PID of the service being signaled.
---
 src/confdb/confdb.h  |   1 +
 src/config/SSSDConfig/__init__.py.in |   1 +
 src/config/SSSDConfigTest.py |   1 +
 src/config/etc/sssd.api.conf |   1 +
 src/man/sssd.conf.5.xml  |  17 +++
 src/monitor/monitor.c| 215 +++
 6 files changed, 214 insertions(+), 22 deletions(-)

diff --git a/src/confdb/confdb.h b/src/confdb/confdb.h
index 
37b5fd7c7629e2618a1699e3ffd58110171db605..0ef7268f9cdc2c18482bbf7b8dbe19d3ef6b7bbf
 100644
--- a/src/confdb/confdb.h
+++ b/src/confdb/confdb.h
@@ -71,6 +71,7 @@
 #define CONFDB_MONITOR_DEFAULT_DOMAIN "default_domain_suffix"
 #define CONFDB_MONITOR_OVERRIDE_SPACE "override_space"
 #define CONFDB_MONITOR_USER_RUNAS "user"
+#define CONFDB_MONITOR_PRE_KILL_CMD "diag_cmd"
 
 /* Both monitor and domains */
 #define CONFDB_NAME_REGEX   "re_expression"
diff --git a/src/config/SSSDConfig/__init__.py.in 
b/src/config/SSSDConfig/__init__.py.in
index 
bf61c402796122050fa43cf41128faec4771c5d2..60129e6e7fbc96d11c539323346c22a7db6d7f23
 100644
--- a/src/config/SSSDConfig/__init__.py.in
+++ b/src/config/SSSDConfig/__init__.py.in
@@ -50,6 +50,7 @@ option_strings = {
 'reconnection_retries' : _('Number of times to attempt connection to Data 
Providers'),
 'fd_limit' : _('The number of file descriptors that may be opened by this 
responder'),
 'client_idle_timeout' : _('Idle time before automatic disconnection of a 
client'),
+'diag_cmd' : _('The command to run when a service ping times out'),
 
 # [sssd]
 'services' : _('SSSD Services to start'),
diff --git a/src/config/SSSDConfigTest.py b/src/config/SSSDConfigTest.py
index 
45562214da5d227b45914abbcb298e043048adf5..abd4a39258e060f27db62eb2352450b6c405930c
 100755
--- a/src/config/SSSDConfigTest.py
+++ b/src/config/SSSDConfigTest.py
@@ -307,6 +307,7 @@ class SSSDConfigTestSSSDService(unittest.TestCase):
 'reconnection_retries',
 'fd_limit',
 'client_idle_timeout',
+'diag_cmd',
 'description']
 
 self.assertTrue(type(options) == dict,
diff --git a/src/config/etc/sssd.api.conf b/src/config/etc/sssd.api.conf
index 
72abb8b3f427e9bcf3cf3dcaa67aaf4e3e50226c..0c03625bd5f0d3fbf8948971b015d4e8255f0009
 100644
--- a/src/config/etc/sssd.api.conf
+++ b/src/config/etc/sssd.api.conf
@@ -13,6 +13,7 @@ fd_limit = int, None, false
 client_idle_timeout = int, None, false
 force_timeout = int, None, false
 description = str, None, false
+diag_cmd = str, None, false
 
 [sssd]
 # Monitor service
diff --git a/src/man/sssd.conf.5.xml b/src/man/sssd.conf.5.xml
index 
573f421a7d885d28d5dbc03423e6c6dd84d7b8fd..615629ad0c4bbd6bc43812cd354d8d61c2f77514
 100644
--- a/src/man/sssd.conf.5.xml
+++ b/src/man/sssd.conf.5.xml
@@ -116,6 +116,23 @@
 
 
 
+
+diag_cmd (string)
+
+
+A command that should be run for diagnostic purpose
+when an sbus timeout fails. The option value may
+contain %p which would be expanded
+for the process ID of the process that timed out.
+
+
+Example: pstack %p
+
+
+Default: Not set
+
+
+
   
 
 
diff --git a/src/monitor/monitor.c b/src/monitor/monitor.c
index 
89ac882d38652e071fdc619495c3252807527da0..5ff5f85527a6aa91e220ddd07e8c8738dba0e829
 100644
--- a/src/monitor/monitor.c
+++ b/src/monitor/monitor.c
@@ -116,6 +116,7 @@ struct mt_svc {
 char *identity;
 pid_t pid;
 
+char *diag_cmd;
 int ping_time;
 int kill_time;
 
@@ -383,6 +384,176 @@ static int add_svc_conn_spy(struct mt_svc *svc)
 return EOK;
 }
 
+static char *expand_diag_cmd(struct mt_svc *svc,
+ const char *template)
+{
+TALLOC_CTX *tmp_ctx = NULL;
+char *copy;
+char *p;
+char *n;
+char *result = NULL;
+char action;
+char *res = NULL;
+
+if (template == NULL)