[SSSD-users] Re: Offline caching of group names and memberships?

Simo Sorce Tue, 24 Sep 2019 10:47:26 -0700

On Tue, 2019-09-24 at 17:58 +0200, Lukas Slebodnik wrote:
> On (24/09/19 09:26), Simo Sorce wrote:
> > On Tue, 2019-09-24 at 10:56 +0200, Lukas Slebodnik wrote:
> > > On (23/09/19 18:04), Simo Sorce wrote:
> > > > On Mon, 2019-09-23 at 22:53 +0200, Lukas Slebodnik wrote:
> > > > > On (23/09/19 15:55), Simo Sorce wrote:
> > > > > > On Mon, 2019-09-23 at 14:39 -0500, Spike White wrote:
> > > > > > > All,
> > > > > > > 
> > > > > > > Our cybersecurity team doesn’t allow Linux sysadmins to directly log in
> > > > > > > as root.  (violates accountability, auditability and traceability).
> > > > > > > We log in with an ADM account, which is then eligible to become root
> > > > > > > via 'sudo su -'.
> > > > > > > 
> > > > > > > That is, all members of a particular group are allowed to sudo to root.
> > > > > > > 
> > > > > > > This is preferred because with modern sudo versions all sudo sessions
> > > > > > > are session-logged.
> > > > > > > 
> > > > > > > Anyway, if I log in with my ADM account and someone shuts down sssd, it
> > > > > > > no longer knows what groups I’m in.  That is, the session is still
> > > > > > > there – but it cannot look up the group names.
> > > > > > > 
> > > > > > > [admspike_white@zzzdmsdev06 ~]$ id
> > > > > > > uid=2025431 gid=1002 groups=1002,2284295
> > > > > > > 
> > > > > > > Because the sudo privs are based on group name, it doesn’t allow Linux
> > > > > > > sysadmins to become root and thus start sssd.
> > > > > > > 
> > > > > > > Is there a way to cache those group names and memberships?  Say with
> > > > > > > nscd?  So that if sssd is (temporarily) shut down, we can become root
> > > > > > > and start up?
> > > > > > 
> > > > > > sssd already caches user and group tables for fast lookup, but those
> > > > > > caches are not very big, so if you have very many groups you may need
> > > > > > to increase the size.
> > > > > > 
> > > > > > Also, these caches have somewhat strict timeouts; I forget if they stop
> > > > > > returning anything at all once the timeout has expired.
> > > > > > 
> > > > > > 
> > > > > 
> > > > > The fast mmap cache falls back to the daemon when an entry has
> > > > > expired, which by default happens after just 5 minutes.
> > > > > And if sssd is not running then it will not return anything.
> > > > > 
> > > > > > > Obviously, we can go look up the root password for the particular
> > > > > > > server – but that’s a painful portal.  It’d be better if we could cache
> > > > > > > group names and memberships, if sssd is temporarily down or offline.
> > > > > > 
> > > > > > Perhaps an RFE could be opened to return whatever is in the cache,
> > > > > > even if expired, when the sssd daemons are unresponsive, if it turns
> > > > > > out that the caches stop returning anything once they time out.
> > > > > > 
> > > > > > 
> > > > > 
> > > > > I do not see a reason why sssd should be temporarily down.
> > > > > If there is a crash then it should be restarted by systemd.
> > > > > If sssd is running but in offline mode then it should return even
> > > > > expired entries from the cache.
> > > > > 
> > > > > I would say the biggest problem in the description is
> > > > > "someone shuts down sssd". Only somebody with root privileges can do
> > > > > that. But if somebody has root (sudo) access, they can break anything
> > > > > there (even sshd), and then nobody can connect there. What would you
> > > > > do in such a situation?
> > > > 
> > > > Not sure what you would do with a rogue admin, but there can definitely
> > > > be cases where sssd will refuse to start, for example if an admin
> > > > fat-fingers the config file. In that case allowing the fast cache to be
> > > > used would save the day.
> > > > 
> > > 
> > > `sssctl config-check` should help.
> > > 
> > > Admins should be careful when touching critical services such as
> > > sssd/sshd and be prepared for recovery.
> > > 
> > > It is not a problem of daemons but admins.
> > 
> > We build tools for admins, not for platonic perfections though...
> > 
> 
> I thought there was an assumption that sssd will never handle root
> because it is a prerequisite to run sssd itself (a chicken-and-egg problem).
> And the issue with sudo and group membership is almost like that.


SSSD could handle root just fine; we chose not to because SSSD
initially was for network identities.

Now that we have support for the files provider, though, it is possible
that SSSD's focus can shift toward handling root accounts too.

> > > 
> > > > So I think that, regardless of how sssd ends up in a state where it
> > > > is not running, it may be useful to let it return whatever information
> > > > we have so that the system is more recoverable. After all, the
> > > > information there may be stale, but it is not incorrect.
> > > > 
> > > > That said if sudo rules are served via SSSD there may be issues there
> > > > too, but that is another story.
> > > > 
> > > 
> > > sudo rules do not have a fast memory cache, and thus relying on
> > > users and groups from the fast memory cache is not enough when sssd
> > > is not running.
> > 
> > Yes but for this case probably sudo rules are hardcoded in the sudoers
> > file.
> > 
> 
> OK, that would be reasonable. But it would be good to get info from the reporter :-)
> 
> 
> > > IMHO, there still should be a way to do disaster recovery in case of
> > > an unresponsive sshd/sssd. I cannot see any issue in sssd itself here.
> > 
> > The issue is in not using the fast cache when there is no reason not
> > to.
> > 
> 
> You cannot rely on the fast cache; it might be only half populated, and
> admins would need to be lucky to get the right group membership in case of
> an "unresponsive" sssd. The only reliable way would be to query the ldb
> cache. But then either sssd_nss is running, or the sssd nsswitch plugin
> would need to know how to get data from the ldb cache.
> 
> So it is not clear to me what you suggest.
> How would you solve such a special case in sssd?

So we have quite a few options.
One option would indeed be to link nss_sss with the ldb code so it could
do direct queries if the user has at least read access. That is not a very
interesting case, given users generally do not have access to the ldb
caches.

Another option is to allow admins to mark some groups as important and
make sure to never kick them out of the fast cache. This is actually
potentially a good performance tuning option, for setups where there
are large amounts of groups but only a few are really important to
servers. (even better if we could somehow auto-learn what groups are
critical, but an option would be the next best thing).

Setting important groups may also trigger a timer within sssd so that it
regularly refreshes the user/group fast caches. This would avoid
periodic performance hits to critical applications when the fast cache
expires.
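
To make the "important groups" idea a bit more concrete, here is a rough
sketch in Python. It is not SSSD code and says nothing about how the real
mmap cache is laid out; the class name PinnedFastCache, the capacity
parameter and the group name "linux-admins" are all made up for
illustration. It only shows the eviction rule: a bounded cache that drops
the least recently used entry, but never one that is marked important.

```python
from collections import OrderedDict


class PinnedFastCache:
    """Bounded LRU cache whose pinned keys are never evicted.

    Hypothetical illustration only; not the SSSD memory cache.
    """

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)        # groups the admin marked important
        self._entries = OrderedDict()    # key -> value, least recent first

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)   # mark as most recently used
        self._evict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def _evict(self):
        # Drop least recently used *unpinned* entries until we fit.
        while len(self._entries) > self.capacity:
            victim = next(
                (k for k in self._entries if k not in self.pinned), None
            )
            if victim is None:
                break                    # only pinned entries remain
            del self._entries[victim]
```

With a capacity of 2 and "linux-admins" pinned, inserting three further
groups pushes out the older unpinned ones while the pinned group stays
resident. A periodic refresh timer would then only need to walk the
pinned set.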

> > > But it would be good to get details from Spike about "someone shuts
> > > down sssd"
> > 
> > Or some other system issue breaks it. For example a bad upgrade that
> > breaks some library sssd depends on or other issues like that.
> > 
> 
> The same can happen to sshd and you will not be able to connect there either.

Sure, anything can happen; it is just a matter of making the system a
little more resilient where possible.

> So group membership would not solve anything. And there still should be
> a procedure for solving such a disaster.

The user said they have a procedure, but it is cumbersome, so a minor
change to allow the cache to be used when the sssd responder dies would
be beneficial.
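
As a rough sketch of what I mean (Python for brevity, not SSSD code): the
names StaleOkCache, ResponderDown and ask_responder are hypothetical, and
the 300 second default simply mirrors the 5 minute timeout mentioned
earlier in the thread. A fresh entry is served directly, an expired entry
triggers a query to the responder, and only if the responder cannot be
reached is the expired entry returned anyway.

```python
import time


class ResponderDown(Exception):
    """Raised by the (hypothetical) responder callable when it is unreachable."""


class StaleOkCache:
    """Cache that serves expired entries only when the responder is down."""

    def __init__(self, ask_responder, ttl=300, clock=time.monotonic):
        self.ask_responder = ask_responder   # callable: name -> value
        self.ttl = ttl                       # seconds an entry stays fresh
        self.clock = clock
        self._entries = {}                   # name -> (value, stored_at)

    def lookup(self, name):
        entry = self._entries.get(name)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]                  # fresh hit, no daemon needed
        try:
            value = self.ask_responder(name)
        except ResponderDown:
            # Responder is dead: stale data is better than no data.
            return entry[0] if entry is not None else None
        self._entries[name] = (value, now)   # refresh the entry
        return value
```

While the responder is healthy nothing changes from the current behavior;
the stale path is only taken when the query fails outright.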

> I am sorry to be a pessimist here.

It's a matter of degrees. Sure, a change to allow the fast cache to be
used will not solve all issues, but it can help save the day in *some*
cases. It also may avoid other issues not connected with login or sudo.

For example if the admin is performing maintenance and stopped sssd
temporarily, the fast cache can keep other applications running
smoothly even after expiration while the admin tries to fix whatever
issue sssd has. I think this aspect is quite important and alone would
justify a change in this direction as it avoids disrupting a running
process (to the point it may decide to crash or exit) during partial
maintenance windows.

Simo.

-- 
Simo Sorce
RHEL Crypto Team
Red Hat, Inc


