Re: [SSSD] RFC: Improving the debug messages

Pavel Březina Wed, 01 Jul 2015 04:18:03 -0700

On 07/01/2015 12:48 PM, Jakub Hrozek wrote:

On Wed, Jul 01, 2015 at 12:03:48PM +0200, Pavel Březina wrote:

On 06/29/2015 06:13 PM, Jakub Hrozek wrote:

Hi,


I spent many hours debugging SSSD in different scenarios last week and I
admit it wasn't always easy -- and I have the source code knowledge I
can use. I imagine it's considerably harder for users and admins..

So this is a brainstorm request on how can we make debugging with SSSD
easier. Maybe there are some low-hanging fruits that can be fixed
easily. Off the top of my head:

- it should be easier to see start and end of a request in the back end.
   Instead of:
     [be_get_account_info] (0x0200): Got request for [0x1001][1][name=admin]
     [acctinfo_callback] (0x0100): Request processed. Returned 0,0,Success 
(Success)
   We could make the debug messages more explicit:
     [be_get_account_info] (0x0200): Received request for 
[object=user][key=name][value=admin]
     [acctinfo_callback] (0x0200): Finished request for 
[object=user][key=name][value=admin]. Returned 0,0,Success

   Then we could document the messages in our troubleshooting document.
   Please note I'm not proposing to turn debug messages into any kind of
   API and keep them the same forever, but decorate the usual flow with
   messages that make sense without source level knowledge.

- same for authentication

- same for responder cache requests. We seem to have gotten better with
   the new cache_req code there, so this is mostly about using the new
   code in all responders. But also the commands we receive from sockets
   should be printed in human-readable form.

- Running sssd in environment where all actions complete successfully
   should emit no debug messages. Default log level should be moved to
   SSSDBG_OP_FAILURE or CRIT_FAILURE. (This basically amounts to checking
   all OP, FATAL and CRIT failure messages..)

   The reason is that sometimes sssd fails, but because logging is
   totally silent, we don't know what happened at all. Currently we have
   a couple of small bugs where we might print a loud DEBUG message just
   because we search for an entry which is not there etc.

- anything that causes SSSD to fail to start should also emit a syslog
   message. Admins don't really know about sssd debug logs.

- our man pages are not structured well, especially the LDAP man page is
   too big and contains too many options.

One reason I'm bringing this up now is that we'll have a new SSSD developer
starting soon and these might be nice tasks to start with AND they're
also needed.


+1

I think the best way to start is to look at the existing debug messages and
take advantage of the bit-mask based log levels - it's been there for a
while and there is lots of space to increase its granularity but we still
use it as 1-9 levels. That beeing said, we should create more levels so we
can really distinguish between important trace parts (event start, event
end), information (object not found but this was expected), low level stuff
(ldb traces that brings lots of noice to the highest level), etc.


The ldb traces are a really good point.

OK, I like the suggestion with newer debug levels, but we should be
careful with not adding too many.

Yes, of course, the levels needs to be thought trough. In my opinion,that requires deep analysis of existing debug messages to see how thetypes can be grouped together. I can't do it from the top of my headeven though I have some ideas.

Having a larger granularity can be also a benefit while investigatinglogs not just when obtaining them because we will be able to highlightinteresting parts (e.g. failover related messages when dealing withserver resolution).

Now, I'm just thinking loud because this would be much bigger scope thanwhat would like you to see. Maybe, we should rethink this completelyfrom the beginning. What I'd like to see in logs:


a) trace so we know where we are and what is happening
b) see all errors
c) get rid of too low level noise, unless it is a special case

a) and b) is something you always want to see, so there is actually nopoint to create a level for that. It may beneficial to create a two-partsystem (category: failover, ldap, request, ...; priority: warning,error, information, trace, ...).

The other thing to be careful about is that admins are used to just set:
     debug_level=10
Adding new debug levels to make life easier for admins but then not
enabling them if admins set what they're used to is also a bit
contradictory :0

The new levels are there for four years, or how long I am with theproject. It was the first thing I have developed. So it may be a time tostart enforcing it.

Now, I'd like to be honest but I don't mean to insult anyone. If anadministrator is able to actually read something from the logs, he willbe able to use other level format. The others just set whatever we tellthem anyway.


_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel

Re: [SSSD] RFC: Improving the debug messages

Reply via email to