Hi James, I'll try to include questions/comments/suggestions in-line below.
> We have an air-gapped network of RHEL7 hosts that use sssd to perform
> PKINIT (smartcard + Kerberos) authentication against Windows Server
> 2016 domain controllers.
>
> Setting this up properly entailed setting pkinit_anchors, pkinit_pool,
> pkinit_cert_match, et. al. in the krb5.conf file, and enabling
> smartcard authentication in gdm. It also entailed adding individual
> certificates to each user object’s userCertificate property, which our
> Windows guys grumbled about.

And I'm guessing the AD servers have the root and issuing CA certificates imported and trusted, right? Since your problem is intermittent, I would guess missing CA certificates aren't your issue. But it can be a common one (at least during initial setup or during CA moves/retirements).

>
> (The way Windows performs PKINIT is to find the certificate on the
> card that has a Microsoft User Principal Name X509v3 Subject
> Alternative Name, extract that value, and then look for the AD user
> object that has the same userPrincipalName. But the version of sssd
> that shipped with RHEL7 can’t do that SAN/userPrincipalName matching.)
>
> For the most part, this has worked, and worked well. Once again, sssd
> has been an invaluable tool.
>
> But.
>
> For some accounts, smartcard authentication does not work, *even
> though* you can use kinit to perform PKINIT against the card (e.g., if
> you login via password authentication, then insert the smartcard once
> you have a shell window to play with).

When you're testing with kinit, are you running something like this?

    kinit -X X509_user_identity=PKCS11:module_name=/usr/lib64/opensc-pkcs11.so principal@REALM

Just want to make sure I'm thinking of the correct test here.
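To make the two matching steps described in the quote concrete, here is a toy model in Python: first the client-side pkinit_cert_match filter picks one certificate from the card, then a Windows-style lookup maps that certificate's UPN SAN to the AD object with the same userPrincipalName. Everything here is hypothetical: `Cert`, `AdUser` and all helper names are made up, and real certificates and LDAP lookups are out of scope. Only the `&&`/`||` operators and `<SAN>` components are modeled, assuming (as I read MIT krb5's krb5.conf documentation) that `&&` is the default relation and that `<SAN>` is an unanchored regular-expression search over the certificate's principal-name SANs such as the Microsoft UPN. Case-insensitive UPN comparison is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cert:
    """Hypothetical stand-in for one certificate on the smartcard."""
    subject_cn: str
    upn_sans: list[str] = field(default_factory=list)  # Microsoft UPN SAN values


@dataclass
class AdUser:
    """Hypothetical stand-in for an AD user object."""
    sam_account_name: str
    user_principal_name: str


def parse_cert_match(rule: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a pkinit_cert_match value into (operator, [(keyword, regex), ...])."""
    op = "&&"  # assumed default relation when no operator is given
    if rule.startswith(("&&", "||")):
        op, rule = rule[:2], rule[2:]
    # With a capture group, re.split returns ['', KEYWORD, REGEX, KEYWORD, REGEX, ...].
    parts = re.split(r"<([A-Z]+)>", rule)[1:]
    return op, list(zip(parts[0::2], parts[1::2]))


def cert_matches(cert: Cert, rule: str) -> bool:
    """Client side: evaluate a rule built only from <SAN> components."""
    op, components = parse_cert_match(rule)
    results = []
    for keyword, regex in components:
        if keyword != "SAN":
            raise ValueError(f"<{keyword}> is not modeled in this sketch")
        # Unanchored search over the principal-name SANs (here: UPN values only).
        results.append(any(re.search(regex, san) for san in cert.upn_sans))
    return all(results) if op == "&&" else any(results)


def select_cert(certs: list[Cert], rule: str) -> Cert:
    """The rule should leave exactly one candidate certificate."""
    candidates = [c for c in certs if cert_matches(c, rule)]
    if len(candidates) != 1:
        raise LookupError(f"{len(candidates)} certificates matched {rule!r}")
    return candidates[0]


def map_to_ad_user(cert: Cert, users: list[AdUser]) -> Optional[AdUser]:
    """Windows side: UPN SAN value -> AD object with the same userPrincipalName."""
    for upn in cert.upn_sans:
        for user in users:
            # Case-insensitive comparison is an assumption of this sketch.
            if user.user_principal_name.casefold() == upn.casefold():
                return user
    return None


if __name__ == "__main__":
    card = [  # hypothetical card contents
        Cert("LASTNAME.FIRSTNAME.123456789", ["123456789@example.com"]),
        Cert("LASTNAME.FIRSTNAME.123456789 (signing)"),  # no UPN SAN
    ]
    users = [AdUser("flastname", "123456789@example.com")]
    chosen = select_cert(card, "&&<SAN>.*@.*")
    user = map_to_ad_user(chosen, users)
    print(chosen.subject_cn, "->", user.sam_account_name if user else None)
```

Run as a script, this prints `LASTNAME.FIRSTNAME.123456789 -> flastname`. The point of modeling both steps separately is that kinit succeeding shows both can work, so an sssd-only failure is more likely to sit somewhere else (timeouts, certificate validation) than in the matching itself.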
>
> For the accounts where smartcard authentication works, after you enter
> your username in gdm, the card blinks for a few seconds, and then you
> are prompted to enter the PIN as follows:
>
>     <CN> PIN:
>
> …where <CN> is the value of the CN= field of the certificate Subject
> of the certificate on the smartcard that contains the Microsoft UPN
> SAN. E.g.:
>
>     LASTNAME.FIRSTNAME.123456789 PIN:
>
> For the accounts where smartcard authentication fails, after you enter
> your username in gdm, the card blinks for a few seconds, and then you
> are prompted to enter the PIN as follows:
>
>     PIN for Smartcard:
>
> That PIN prompt is the kiss of death: even if you enter the correct
> PIN, authentication will always fail.

This may indicate that SSSD is timing out during one of the steps, but I'm not 100% sure.

>
> We know that our Kerberos configuration (e.g., pkinit_cert_match)
> correctly yields one (and only one candidate certificate) from the
> smartcard, which is the correct certificate:
>
>     pkinit_cert_match = &&<SAN>.*@.*
>
> And running kinit (with PKINIT) against the smartcard works just fine.
> But logins fail for some users and not others. Which almost certainly
> means that something is derailing sssd. But it’s not obvious what it
> is. We’ve double-checked that the userCertificate objects are correct
> in AD (that is, they match the smartcard).

This makes me think that SSSD is timing out while talking to the AD server for the Kerberos exchange.

>
> Even more confusingly, the accounts for which smartcard authentication
> works versus doesn’t work can change over time. For example, a few
> weeks ago, my own account worked for smartcard login; now it doesn’t.
> But we know we made no configuration changes and applied no package
> updates to the host.
>
> I have also had the situation where I got the “PIN for Smartcard” gdm
> prompt, rebooted the host, and then got the “<CN> PIN” gdm prompt.
> That almost implies an sssd caching issue, or inconsistent
> data/behavior between our (two) domain controllers.

Can you try setting a couple of timeouts to see if this helps? I'd suggest trying the following:

1. Add a Kerberos timeout to the [domain/whatever] section of sssd.conf:

    krb5_auth_timeout = 60

2. Add a p11_child timeout to the [pam] section (less likely to be your issue, given the symptoms):

    p11_child_timeout = 60

>
> Again, these are air-gapped systems, so I can provide no logs; we are
> going to have to slog through the sssd logs and figure it out on our
> own.

Can you give version numbers, in case there are known bugs we might be able to identify here?

One other question related to being air-gapped: do the certificates on the cards have OCSP/CRL information (URLs) set? If so, SSSD may be trying to check them unless that checking is disabled. So if your certificates specify OCSP, you may need to disable OCSP checking. You can test this with something like:

3. Disable OCSP verification in the [sssd] section of sssd.conf:

    certificate_verification = no_ocsp

FYI, in RHEL8 we have "soft" fail options for OCSP/CRL, but those didn't make it into RHEL7's version of SSSD:

    certificate_verification = soft_ocsp,soft_crl

>
> Questions for the list:
>
> * Does this sound familiar to anyone? Have you already been down this
> path? If so, what did you discover?

Maybe. I'm hoping this is a simple timeout issue and the suggestions above work. From most of your symptoms, I think it may be the Kerberos timeout issue. The OCSP issue is probably not your problem, but I've heard of (not seen personally) issues with unreliable network connectivity to OCSP servers. So if you have something in your air-gapped network that is acting as an OCSP server, it may be worth looking into.

>
> * sssd logging can be quite voluminous (particularly at higher
> debugging levels), to the point where I fear I might miss the needle
> in the haystack that is indicating the problem.
> Can anyone provide some tips on specific areas where I should focus?

Yes, there is a LOT of data in the sssd logs, especially when using "debug_level = 9".

I usually start with p11_child.log to make sure that SSSD properly identified the card. IIRC, this is also where you would see an OCSP failure that prevents SSSD from using a certificate on the card.

If it finds the certificate, you might see Kerberos timeouts in krb5_child.log. After that, you can look through sssd_pam.log.

Another method I've used to find smart card related issues in the logs is to take the timestamp of a failed attempt from /var/log/secure (if set up) or the journal, grep for that timestamp in /var/log/sssd, and sort through the matching lines.

>
> Thanks in advance for any tips/advice.

I hope that helps,
Scott

_______________________________________________
sssd-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedorahosted.org/archives/list/[email protected]
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
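The sssd.conf suggestions, the OCSP question, and the log-reading order and timestamp-correlation approach described in this thread could be rolled into a small triage helper. This is a sketch under several assumptions, not an SSSD tool: the script name `triage.py` and all function names are hypothetical; it assumes /var/log/secure (or journal) failure lines contain `pam_sss` and "authentication failure" plus an `HH:MM:SS` time; it assumes SSSD log lines also carry an `HH:MM:SS` timestamp; and it expects the card certificate as the text output of `openssl x509 -noout -text`. Matching on the whole minute will pull in unrelated lines on a busy host; the goal is only to shrink the haystack.

```python
"""Hypothetical triage helper for intermittent SSSD smartcard logins."""
import configparser
import re
import sys
from pathlib import Path

SUGGESTED_TIMEOUT = 60  # seconds, the value suggested in this thread
# Logs in the order suggested in this thread; other *.log files come afterwards.
PRIORITY = ["p11_child.log", "krb5_child.log", "sssd_pam.log"]


def audit_sssd_conf(conf_text, cert_has_ocsp):
    """Return warnings about the timeout and OCSP settings discussed above."""
    cp = configparser.ConfigParser(interpolation=None, strict=False)
    cp.read_string(conf_text)
    checks = [(s, "krb5_auth_timeout") for s in cp.sections() if s.startswith("domain/")]
    checks.append(("pam", "p11_child_timeout"))
    warnings = []
    for section, option in checks:
        value = cp.get(section, option, fallback=None)  # None if section/option missing
        if value is None or int(value) < SUGGESTED_TIMEOUT:
            warnings.append(f"[{section}] {option} = {value} (suggested: {SUGGESTED_TIMEOUT})")
    raw = cp.get("sssd", "certificate_verification", fallback="")
    modes = {m.strip() for m in raw.split(",")}
    if cert_has_ocsp and "no_ocsp" not in modes:
        warnings.append("certificate has an OCSP URI but certificate_verification lacks no_ocsp")
    return warnings


def ocsp_and_crl_uris(x509_text):
    """Collect OCSP and CRL URIs from `openssl x509 -noout -text` output."""
    ocsp, crl, crl_indent = [], [], None
    for line in x509_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if crl_indent is not None and stripped and indent <= crl_indent:
            crl_indent = None  # left the CRL Distribution Points block
        if stripped.startswith("X509v3 CRL Distribution Points"):
            crl_indent = indent
            continue
        uri = re.search(r"URI:(\S+)", stripped)
        if uri and stripped.startswith("OCSP - URI:"):
            ocsp.append(uri.group(1))
        elif uri and crl_indent is not None:
            crl.append(uri.group(1))
    return ocsp, crl


def failure_minutes(secure_text):
    """HH:MM of pam_sss authentication failures in secure-log style text."""
    minutes = []
    for line in secure_text.splitlines():
        if "pam_sss" in line and "authentication failure" in line.lower():
            m = re.search(r"\b(\d{2}:\d{2}):\d{2}\b", line)
            if m and m.group(1) not in minutes:
                minutes.append(m.group(1))
    return minutes


def sssd_lines_for(minutes, log_dir="/var/log/sssd"):
    """(file, line) pairs from SSSD logs whose HH:MM:SS falls in those minutes."""
    if not minutes:
        return []
    # Not preceded by a digit or colon, so "09:14:01" does not match minute "14:01".
    stamp = re.compile(r"(?<![\d:])(?:%s):\d{2}" % "|".join(map(re.escape, minutes)))
    rank = {name: i for i, name in enumerate(PRIORITY)}
    paths = sorted(Path(log_dir).glob("*.log"),
                   key=lambda p: (rank.get(p.name, len(rank)), p.name))
    hits = []
    for path in paths:
        with path.open(errors="replace") as fh:
            hits += [(path.name, line.rstrip("\n")) for line in fh if stamp.search(line)]
    return hits


def main(conf_path, secure_path, cert_text_path):
    ocsp, crl = ocsp_and_crl_uris(Path(cert_text_path).read_text(errors="replace"))
    print("OCSP URIs:", ocsp, "CRL URIs:", crl)
    conf = Path(conf_path).read_text(errors="replace")
    for warning in audit_sssd_conf(conf, cert_has_ocsp=bool(ocsp)):
        print("WARN:", warning)
    minutes = failure_minutes(Path(secure_path).read_text(errors="replace"))
    for name, line in sssd_lines_for(minutes):
        print(f"{name}: {line}")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*sys.argv[1:])
    else:
        print("usage: triage.py SSSD_CONF SECURE_LOG CERT_TEXT", file=sys.stderr)
```

A hypothetical run would be `python3 triage.py /etc/sssd/sssd.conf /var/log/secure cert.txt`, where cert.txt was produced with `openssl x509 -noout -text -in <card certificate PEM>`. Reading sssd.conf and /var/log/secure normally requires root.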
