On Wed, Nov 09, 2016 at 07:26:14AM -0500, Simo Sorce wrote:
> On Wed, 2016-11-09 at 13:08 +0100, Sumit Bose wrote:
> > On Tue, Nov 08, 2016 at 10:28:20AM +0100, Jakub Hrozek wrote:
> > > Hi,
> > > 
> > > I would like to ask for opinions about:
> > > https://fedorahosted.org/sssd/ticket/3126
> > > 
> > > The basic idea is that the responder would choose what kind of
> > > optimization the back end performs when saving the sysdb entries.
> > > Requests that just return information might optimize very
> > > aggressively (using modifyTimestamp), while requests that actually
> > > authenticate or authorize the user might not optimize at all, to
> > > avoid issues like the ones we saw with virtual attributes that
> > > don't bump the modifyTimestamp attribute at all.
> > > 
> > > On the responder side, this is quite easy, just send an additional flag
> > > during the responder request. It's the provider part I'm not so sure
> > > about, because there the optimizations are performed at the sysdb level.
> > > 
> > > So far I can only think of extending sysdb_transaction_start() (or
> > > providing sysdb_opt_transaction_start() and letting the old
> > > sysdb_transaction_start() default to no optimization), which would
> > > internally keep track of the active transaction and the
> > > optimization we want to perform. This works because only sssd_be is
> > > the cache writer and there is only one cache per domain.
> > > 
> > > Additionally, we would have to keep the transaction optimization
> > > level around in some context until the request bubbles from the
> > > data provider handler to actually saving the transaction. I hope
> > > this won't be too messy, but since the requests are asynchronous,
> > > so far I don't see any way around it. The only thing that might be
> > > less messy in the long term is to provide a somewhat more generic
> > > structure ("request status") that would so far only include the
> > > optimization level and later might be extended to include e.g.
> > > intermediate data. But on the other hand, I'm not sure I have
> > > thought about passing the data between requests hard enough to
> > > design this properly. Should I?
> > > 
> > > Any other opinions? Thoughts?
> > 
> > I'm not sure this wouldn't confuse users. If I understood the
> > proposal correctly, the typical nss calls like getpwnam(),
> > getgrnam() etc. will use the optimized requests with modifyTimestamp
> > checks. If there is a server which does not change modifyTimestamp
> > on updates, changes in e.g. the user's shell or in the list of group
> > members will not be visible even after calling 'sss_cache -E',
> > because modifyTimestamp didn't change on the server. But typically a
> > user does not log in again to check for such changes; they just call
> > e.g. 'getent group changed_group' again and again, waiting for the
> > changes.
> > 
> > I think if there really are such servers out there which do not
> > update modifyTimestamp, it should simply be possible to switch off
> > the time-stamp cache entirely in this case (I wonder if this is
> > already possible by setting
> > ldap_{user|group|netgroup}_modify_timestamp to a non-existing
> > attribute?).

Yes, this already works (I haven't tested it just now, but I did when I
was working on the sysdb speedups feature).
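For reference, the workaround would look roughly like this in sssd.conf, using the attribute-mapping options mentioned above (the domain name and the attribute name on the right are placeholders; the attribute is intentionally one that does not exist on the server):

```ini
# Disable the modifyTimestamp-based cache optimization by mapping the
# timestamp attributes to a non-existent LDAP attribute.
[domain/example.com]
id_provider = ldap
ldap_user_modify_timestamp = doesNotExist
ldap_group_modify_timestamp = doesNotExist
ldap_netgroup_modify_timestamp = doesNotExist
```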

> > In this case the number of cache updates can be reduced by
> > increasing the cache timeout, because during authentication and
> > authorization fresh data is requested from the backend anyway.
> 
> There are cases where things like CoS are used that can change an
> attribute value (even dynamically) but do not alter the modifyTimestamp.
> In general operational attributes can behave that way.

Right, nsAccountLock behaves this way, for example. What we actually did
to mitigate this was to never use modifyTimestamp when comparing user
entries, only group entries -- it's the user entries that typically
contain the attributes critical for access control. For users we always
fall back to a "deep" attribute value comparison. This "should" also be
OK, since the number of attributes on a user is typically low.

> 
> > I think the time would be better spend e.g. on
> > https://fedorahosted.org/sssd/ticket/3211 "Refactor the
> > sdap_async_groups.c module" and make sure all needed data is read from
> > LDAP (multiple LDAP connections might be involved here) first and
> > written to the cache in a single transaction.

Yes, this is a good enhancement, but it's not so much about writing in a
single transaction -- for plain LDAP groups we already use a single
transaction. The problem in the sdap_async_groups.c module is that we
often iterate over the attribute list several times, e.g. to look at
which members are already cached. With the refactoring, the aim would be
to iterate over the potentially large list only once and store the
intermediate data (like the list of cached members and the list of
uncached members) in memory.

The problem with multiple transactions happens when the request must
contact different provider types (for example when resolving an AD
object with IPA overrides); we currently run a full request, including a
save-to-disk transaction, because the sysdb is the only way to pass data
between different providers.

So I agree both these problems should be tackled, they are just two
different problems :-)

> 
> The problem is deciding when to forcibly reload data we already have in
> the cache, regardless of cache status.

Since we are already on the cautious side and never use modifyTimestamp
for users, only for groups, would it be a clearer engineering approach
to measure what speedup we can gain by refactoring the
sdap_async_groups.c module versus using modifyTimestamp for users, and
/then/ decide whether the additional complexity is worth it?
_______________________________________________
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org
