On Fri, Jan 04, 2019 at 09:20:20AM +0000, R Davies wrote:
> (re-sending as I initially sent to ssd-users-owners in error)
> 
> For an AD environment using service discovery.
> 
> Periodically sssd will invalidate its cache at unexpected times.  Digging
> around debug logs and sources leads me to understand the following:
> 
> Every 15 minutes (or as defined by ldap_connection_expire_timeout) sssd
> re-establishes the connection to LDAP, closing the existing connection.
> When sssd is configured to auto-discover (via DNS _srv_ records, where the
> priority is the same for each server), auto-discovery might return a
> different LDAP server, at which point sssd's stored uSNChanged values are
> invalid (as these are unique to each server), the cached values are
> cleared, and enumeration is run - essentially afresh - against the new LDAP
> server.

Thank you very much for digging into the issue.
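
To make sure we are talking about the same mechanism, here is a rough
Python model of the behaviour you describe. It is only a sketch: the names
EnumerationState, plan_enumeration and record_results are made up for
illustration and do not correspond to anything in the SSSD sources, and the
filters are simplified.

```python
# Illustrative model only; none of these names come from the SSSD sources.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnumerationState:
    server: Optional[str] = None  # server the stored USN belongs to
    max_usn: int = 0              # highest uSNChanged seen on that server


def plan_enumeration(state: EnumerationState, connected_to: str) -> str:
    """Return the (simplified) LDAP filter for the next enumeration run."""
    if state.server != connected_to:
        # uSNChanged counters are local to each domain controller, so the
        # stored value is meaningless on a different server: start afresh.
        state.server = connected_to
        state.max_usn = 0
        return "(objectClass=user)"
    # Same server as before: only ask for entries changed since the last run.
    return f"(&(objectClass=user)(uSNChanged>={state.max_usn + 1}))"


def record_results(state: EnumerationState, highest_usn_seen: int) -> None:
    """Remember the highest uSNChanged value returned by the last search."""
    state.max_usn = max(state.max_usn, highest_usn_seen)
```

As long as the reconnect lands on the same server, only the cheap
incremental search runs; as soon as discovery hands back a different
server, the full search is the only safe option.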

> 
> Is this outcome expected by design?

Honestly, I'm not sure and I would like some other developers to chime
in with their opinion.

Historically, we've said that SSSD should stick to a 'working' server as
long as it can, so on one hand I see the point in the sticky behaviour.
On the other hand, I've also seen admins relying on the TTL validity of
the SRV records, expecting that, if they change the SRV records, the
client chooses a new server after the TTL expires.

> 
> This behaviour is rather unfortunate as sssd_be will become a CPU hog as it
> rebuilds the cache again.
> 
> It is possible to work around the behaviour e.g.:
> 
> 1) by not using service discovery, i.e.

Yes, in this case, the same server will always be selected from the
list, working around the problem.

> 
> ad_server = server1
> ad_backup_server = server2
> 
> which is fairly tiresome to maintain across an estate - separate
> configurations for different sites etc, faking load balancing by swapping
> configurations.
> 
> 2) having different priorities for each AD server in a given site, losing
> load balancing - unless DNS gave out different priorities depending on the
> source of the request, but this seems messy.
> 
> A better approach might be to patch sssd's auto discovery to "stick" to the
> previously bound LDAP server, currently the first server in the list of
> primary servers returned by ad_sort_servers_by_dns().  I have a proof of
> concept patch that is straightforward and fairly well contained; the
> behaviour is controlled by an ad_sticky option in sssd.conf.
> 
> Is there a better solution to this problem?   Would a patch - as vaguely
> outlined above - likely gain acceptance?

If the behaviour is controllable by an option, my opinion is that it
would be a good approach.

Would the stickiness also persist across SRV priority levels? What I
mean is that if server1 originally had the highest priority (the lowest
priority value in the SRV record), but then the SRV record expires
and the server is suddenly in a lower priority tier, then IMO the server
should be 'forgotten' and a new one chosen.
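
Roughly, the selection rule I have in mind could be sketched like this in
Python. This is not your patch and not SSSD code: SrvRecord and
choose_server are hypothetical names, the sticky flag merely stands in for
the proposed ad_sticky option, SRV weights are ignored, and the input list
is assumed to be already ordered the way ad_sort_servers_by_dns() would
order it.

```python
# Hypothetical sketch of "sticky within the best priority tier" selection.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SrvRecord:
    host: str
    priority: int  # lower value means more preferred (RFC 2782)


def choose_server(records: List[SrvRecord],
                  previous: Optional[str],
                  sticky: bool = True) -> Optional[str]:
    """Pick a server, keeping the previous one only while it is top tier."""
    if not records:
        return None
    best = min(r.priority for r in records)
    top_tier = [r.host for r in records if r.priority == best]
    if sticky and previous in top_tier:
        # Still among the most preferred servers: stay where we are.
        return previous
    # Previous server is gone or demoted: forget it and take the first
    # top-tier entry in the order the resolver code returned.
    return top_tier[0]
```

With two servers at the same priority, the previously bound one keeps being
chosen; once it drops to a lower tier, it is forgotten and the first server
of the new top tier is used instead.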