On (01/07/15 12:48), Jakub Hrozek wrote:
>On Wed, Jul 01, 2015 at 12:03:48PM +0200, Pavel Březina wrote:
>> On 06/29/2015 06:13 PM, Jakub Hrozek wrote:
>> >Hi,
>> >
>> >I spent many hours debugging SSSD in different scenarios last week and I
>> >admit it wasn't always easy -- and I have the source code knowledge I
>> >can use. I imagine it's considerably harder for users and admins.
>> >
>> >So this is a brainstorm request on how we can make debugging with SSSD
>> >easier. Maybe there is some low-hanging fruit that can be fixed
>> >easily. Off the top of my head:
>> >
>> >- it should be easier to see start and end of a request in the back end.
>> >   Instead of:
>> >     [be_get_account_info] (0x0200): Got request for [0x1001][1][name=admin]
>> >     [acctinfo_callback] (0x0100): Request processed. Returned 0,0,Success (Success)
>> >   We could make the debug messages more explicit:
>> >     [be_get_account_info] (0x0200): Received request for [object=user][key=name][value=admin]
>> >     [acctinfo_callback] (0x0200): Finished request for [object=user][key=name][value=admin]. Returned 0,0,Success
>> >
>> >   Then we could document the messages in our troubleshooting document.
>> >   Please note I'm not proposing to turn debug messages into any kind of
>> >   API and keep them the same forever, but decorate the usual flow with
>> >   messages that make sense without source level knowledge.
>> >
>> >- same for authentication
>> >
>> >- same for responder cache requests. We seem to have gotten better with
>> >   the new cache_req code there, so this is mostly about using the new
>> >   code in all responders. But also the commands we receive from sockets
>> >   should be printed in human-readable form.
>> >
>> >- Running sssd in an environment where all actions complete successfully
>> >   should emit no debug messages. The default log level should be moved to
>> >   SSSDBG_OP_FAILURE or CRIT_FAILURE. (This basically amounts to checking
>> >   all OP, FATAL and CRIT failure messages.)
>> >
>> >   The reason is that sometimes sssd fails, but because logging is
>> >   totally silent, we don't know what happened at all. Currently we have
>> >   a couple of small bugs where we might print a loud DEBUG message just
>> >   because we search for an entry which is not there etc.
>> >
>> >- anything that causes SSSD to fail to start should also emit a syslog
>> >   message. Admins don't really know about sssd debug logs.
>> >
>> >- our man pages are not structured well; the LDAP man page in particular
>> >   is too big and contains too many options.
>> >
>> >One reason I'm bringing this up now is that we'll have a new SSSD developer
>> >starting soon and these might be nice tasks to start with AND they're
>> >also needed.
>> 
>> +1
>> 
>> I think the best way to start is to look at the existing debug messages and
>> take advantage of the bit-mask based log levels - they have been there for a
>> while and there is lots of space to increase their granularity, but we still
>> use them as 1-9 levels. That being said, we should create more levels so we
>> can really distinguish between important trace parts (event start, event
>> end), information (object not found but this was expected), low level stuff
>> (ldb traces that bring lots of noise to the highest level), etc.
>
>The ldb traces are a really good point.
>
>OK, I like the suggestion of newer debug levels, but we should be
>careful not to add too many.
>
>The other thing to be careful about is that admins are used to just setting:
>    debug_level=10
We needn't extend the old debug levels, just the bitmask version;
each old debug level (0-9) can then be a bitmask of several new debug levels.

So if someone sets debug_level to 9 or 10, it will still enable all debug messages.
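
To make the idea concrete, here is a rough sketch of such a mapping, written in
Python only for brevity. The flag names, the bit values and the assignment of the
new bits to old levels are all made up for illustration; they are not taken from
the SSSD source. The only assumption that matters is that every real bit is
0x0010 or higher, so any value below that can be read as an old-style level.

```python
from enum import IntFlag
from functools import reduce
from operator import or_


class DebugFlag(IntFlag):
    """Hypothetical debug bits; names and values are illustrative only."""
    FATAL_FAILURE = 0x0010
    CRIT_FAILURE = 0x0020
    OP_FAILURE = 0x0040
    MINOR_FAILURE = 0x0080
    CONF_SETTINGS = 0x0100
    FUNC_DATA = 0x0200
    TRACE_FUNC = 0x0400
    TRACE_LIBS = 0x0800
    TRACE_INTERNAL = 0x1000
    TRACE_ALL = 0x2000
    # Finer-grained bits of the kind discussed in this thread (hypothetical).
    REQUEST_BOUNDARY = 0x4000   # request start / request end
    EXPECTED_MISS = 0x8000      # object not found, but that was expected
    LDB_TRACE = 0x10000         # noisy ldb traces


# Bits that each old level 0..9 adds on top of the previous level.
LEGACY_STEPS = [
    DebugFlag.FATAL_FAILURE,                         # 0
    DebugFlag.CRIT_FAILURE,                          # 1
    DebugFlag.OP_FAILURE,                            # 2
    DebugFlag.MINOR_FAILURE,                         # 3
    DebugFlag.CONF_SETTINGS,                         # 4
    DebugFlag.FUNC_DATA | DebugFlag.REQUEST_BOUNDARY,  # 5
    DebugFlag.TRACE_FUNC | DebugFlag.EXPECTED_MISS,  # 6
    DebugFlag.TRACE_LIBS,                            # 7
    DebugFlag.TRACE_INTERNAL,                        # 8
    DebugFlag.TRACE_ALL | DebugFlag.LDB_TRACE,       # 9
]

ALL_FLAGS = reduce(or_, LEGACY_STEPS)


def parse_debug_level(value: int) -> DebugFlag:
    """Turn a debug_level setting into a bitmask of DebugFlag bits."""
    if value < 0:
        raise ValueError("debug_level must not be negative")
    if value < DebugFlag.FATAL_FAILURE:
        # Old-style level: cumulative, and anything above 9 means "everything".
        level = min(value, 9)
        return reduce(or_, LEGACY_STEPS[:level + 1])
    # New-style bitmask: keep only the bits we know about.
    return DebugFlag(value & int(ALL_FLAGS))


def should_log(mask: DebugFlag, flag: DebugFlag) -> bool:
    """True if a message tagged with `flag` is enabled by `mask`."""
    return bool(mask & flag)
```

With this, debug_level=9 and debug_level=10 both yield ALL_FLAGS, while someone
who wants everything except the ldb noise could pass the bitmask
int(ALL_FLAGS ^ DebugFlag.LDB_TRACE) instead.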

LS
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/sssd-devel
