[SSSD] Re: Design Document: Data Provider Refactoring

Jakub Hrozek Wed, 06 Apr 2016 08:46:39 -0700

On Fri, Apr 01, 2016 at 12:07:04PM +0200, Pavel Březina wrote:
> https://fedorahosted.org/sssd/wiki/DesignDocs/DataProvider
> 
> For your convenience, the text is copied below:
> 
> = Data Provider Refactoring =
> 
> Related ticket(s):
>  * https://fedorahosted.org/sssd/ticket/385


I think you can also add https://fedorahosted.org/sssd/ticket/1878 and more
importantly https://fedorahosted.org/sssd/ticket/490. #385 is something
we want to enable with this work.

> 
> == Problem statement ==
> 
> Current state of data provider interface is not extensible enough to fulfil
> needs of planned SSSD features such as
> [https://fedorahosted.org/sssd/wiki/DesignDocs/sssctl SSSD Status Tool]. The
> main flaw that we aim to solve is to simplify adding of new methods,
> properties and possibly signals using our ''sbus'' interface. As a side
> effect we will also solve the following issues that are in current code:
> 
> * encapsulate data provider from the rest of the code
> * fix poor memory hierarchy which creates occasional race condition on
> shutdown
> * convert method handlers to ''simple'' tevent requests that are not aware
> of data provider
> * handle D-Bus message reply automatically in data provider code
> 
> === Terminology ===
> 
> This section clarifies the terminology that is used in this document.
> 
> * '''Backend''': Implementation of a domain (periodic tasks, online/offline
> callbacks, online check, ...)
> * '''Data Provider''': Interface between backend and responders
> * '''Module''': library implementing data provider interface (LDAP, IPA, AD,
> KRB5, PROXY)
> * '''Target''': functionality implemented in modules (id, auth, chpass,
> selinux, autofs, sudo, hostid)

It would be nice to clarify that when this text says "D-Bus" it doesn't
mean the system bus, but only the D-Bus protocol over a local socket.


> 
> A general overview of the communication process is as follows.
> 
> 1. Responder issues a method call with Data Provider through DP D-Bus API
> 2. Data Provider calls a method handler registered by configured module
> 3. Method handler is finished
> 4. Reply is sent to responder
> 
> == Current state ==
> 
> This is just a brief summarization, please refer to the code to get the
> whole picture.
> 
> At this moment each target can have only one method specified. The method is
> defined by providing bet_ops data in sssm_$module_$target_init function.
> Structure bet_ops contains ''handler'' that defines a method handler in
> addition with ''check_online'' which defines a method that should be called
> when SSSD is trying to check if it can reestablish a connection and it is
> used only in connection with ID provider. Field ''finalize'' was probably
> introduced as a clean up function, however it is not used at all at the
> moment.
> 
> Even though it is not possible with current code to have different private
> data for different methods, it is possible to extend this structure to allow
> more methods. However, it would be nice to have it in more automated and
> controlled way and we still can't use properties and signals this way
> though.
> 
> {{{
> struct bet_ops {
>     be_req_fn_t check_online;
>     be_req_fn_t handler;
>     be_req_fn_t finalize;
> };
> }}}
> 
> Each target is defined in ''struct bet_data''.
> 
> {{{
> static struct bet_data bet_data[] = {
>     {BET_NULL, NULL, NULL},
>     {BET_ID, CONFDB_DOMAIN_ID_PROVIDER, "sssm_%s_id_init"},
>     [...]
>     {BET_MAX, NULL, NULL}
> };
> }}}
> 
> Initialization function assigns the bet_ops structure together with private
> data. The private data are attached to ''be_ctx'' in talloc memory hierarchy
> which results in race conditions during shutdown process. This is currently
> solved by ''be_spy'' which basically forces the desired order of freeing
> data, however we have seen some crashes on shutdown which we were unable to
> figure out so far even with spies.
> 
> {{{
> /* Auth Handler */
> struct bet_ops sdap_auth_ops = {
>     .handler = sdap_pam_auth_handler,
>     .finalize = sdap_shutdown
> };
> 
> int sssm_ldap_auth_init(struct be_ctx *bectx,
>                         struct bet_ops **ops,
>                         void **pvt_data)
> {
>     struct sdap_auth_ctx *ctx;
>     int ret;
> 
>     [...]
> 
>         *ops = &sdap_auth_ops;
>         *pvt_data = ctx;
>     }
> 
>     return ret;
> }
> }}}
> 
> === Goals to achieve ===
> 
> * make adding a new client automated and error proof
> * make adding a new target automated and error proof
> * make adding a new method automated and error proof
> * create a proper talloc hierarchy so we can control clean up process

We've seen some recent crashes related to sssd_be going away while a
request was pending. It would be nice to explicitly say in the design
page how the new design helps to remediate that.

> * support module's contructor and private data shared across target's
> initialization functions
> * make method handlers pure tevent requests that returns single error code

Currently we return a triple, err_min, err_maj and error string, do you
plan on changing that? (The document says you don't further down, but
this sentence is a bit conflicting IMO..)

> * make method handlers not aware of reply process
> * improve debugging capabilities
>  * keep track of active requests
>  * make each request clearly visible in logs
> * allow methods with different output parameters
> * allow D-Bus objects, properties and signals
> * properly terminate all requests on clean up
> 

(I agree with the design, but would like to see some example..)

> === Implementation steps ===
> 
> 1. (done) Implement the new data provider interface

If this is done, can we see the code in some WIP branch? I would just
like to catch any design issues as soon as possible and looking at the
actual code might help.

> 2. (wip) Convert modules init functions
> 3. (wip) Convert existing handlers into tevent requests
> 4. Switch to the new interface
> 5. Add new methods and interfaces as needed
> 
> === Responders ===
> 
> In the first stage no change to the responders needs to be done. 

Not even the responder_common part? That would be nice, because there we
have quite a lot of code that is not tested, but important, such as
chaining duplicate requests etc. I wouldn't like to change this code
without tests..

> All
> existing data provider methods will always succeed and return three output
> parameters (major error, minor error, error message) as the current code
> does. New methods that return error or some output parameters may be added
> without affecting the current responder data provider code. When the new
> code is thoroughly tested we can change the existing methods to return
> either error or success but this requires also changes in responders. I
> would like to write something similar to cache_req but I don't have any
> specific plan so far.
> 
> === Questions ===
> 
> === Configuration changes ===
> 
> No configuration changes.
> 
> === How To Test ===
> 
> All existing test must pass and no functionality is broken.

This is hard to say without seeing the code, but the existing code, at
least in the responder, as I said above, covers a number of use-cases
that ordinary tests might not catch, so we might want to work on good
unit tests..

> 
> === How To Debug ===
> 
> Each data provider request life cycle can be tracked in debug logs with a
> special message prefix: '''DP Request [$name #$index]'''. The $name is the
> name of the request (i.e. which method was called), $index is a cyclic
> number assigned to the request. When we run out of number we siply start
> from 1 again.
> 
> In the debugger we can monitor active data provider request, clients,
> modules and targets in '''be_ctx->provider'''.
> 
> === Authors ===
> 
> Pavel Březina <[email protected]>
> _______________________________________________
> sssd-devel mailing list
> [email protected]
> https://lists.fedorahosted.org/admin/lists/[email protected]
_______________________________________________
sssd-devel mailing list
[email protected]
https://lists.fedorahosted.org/admin/lists/[email protected]

[SSSD] Re: Design Document: Data Provider Refactoring

Reply via email to