Re: [Freeipa-users] replica running trust-agents can't resolve AD users - which of these sssd errors should I be focusing on?
Really appreciate the high level of insight and support on this list. Very refreshing!

Alexander Bokovoy wrote:
> Can you show us ldap_child.log and krb5_child.log from /var/log/sssd on the replica?

krb5_child.log is totally empty (strange), as I thought I had debug_level = 10 set everywhere.

ldap_child.log is posted at this URL because of its length: http://chrisdag.me/ldap_child.log.sanitized.txt

> There seems to be something weird with the networking stack, because at 15:43:13 the next attempt to connect gets 'connection refused'. Maybe 389-ds is just warming up and there is not enough CPU or I/O to handle the load?
>
> So, sssd on the replica is able to retrieve information from the replica's LDAP server. It is also able to retrieve the trust topology information and the trusted domain objects to use against the forest root domains your deployment trusts. But when it tries to contact the global catalog and domain controllers of the trusted domains, it cannot reach them, so it considers them offline.
>
> Can you show us your /etc/krb5.conf on this replica, the content of the files in the /var/lib/sss/pubconf/krb5.include.d subdirectory that get included into /etc/krb5.conf, and the logs I asked for above?

Here is the sanitized krb5.conf from the replica. The CAPATH information was provided by someone on this list to fix the CAPATHs being wrong by default on v4.2 in our complex AD environment. We've since made an Ansible playbook to update the krb5.conf file on our client machines.
However, we comment out the include path, again because of our v4.2 issues:

includedir /etc/krb5.conf.d/

## Disabled due to SSSD Bug related to CA paths
## across different AD trusts
# includedir /var/lib/sss/pubconf/krb5.include.d/

## This is the manual COMPANY fix:
[capaths]
  COMPANYAWS.ORG = {
    COMPANYIDM.ORG = COMPANYAWS.ORG
  }
  COMPANYIDM.ORG = {
    COMPANYAWS.ORG = COMPANYAWS.ORG
    COMPANY.ORG = COMPANY.ORG
    EAME.COMPANY.ORG = COMPANY.ORG
    APAC.COMPANY.ORG = COMPANY.ORG
    LATAM.COMPANY.ORG = COMPANY.ORG
    NAFTA.COMPANY.ORG = COMPANY.ORG
  }
  COMPANY.ORG = {
    COMPANYIDM.ORG = COMPANY.ORG
  }
  EAME.COMPANY.ORG = {
    COMPANYIDM.ORG = COMPANY.ORG
  }
  APAC.COMPANY.ORG = {
    COMPANYIDM.ORG = COMPANY.ORG
  }
  LATAM.COMPANY.ORG = {
    COMPANYIDM.ORG = COMPANY.ORG
  }
  NAFTA.COMPANY.ORG = {
    COMPANYIDM.ORG = COMPANY.ORG
  }

[logging]
  default = FILE:/var/log/krb5libs.log
  kdc = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log

[libdefaults]
  default_realm = COMPANYIDM.ORG
  dns_lookup_realm = true
  dns_lookup_kdc = true
  rdns = false
  ticket_lifetime = 24h
  forwardable = true
  udp_preference_limit = 0
  default_ccache_name = KEYRING:persistent:%{uid}
  canonicalize = true

[realms]
  COMPANYIDM.ORG = {
    kdc = usaeilidmp002.COMPANYidm.org:88
    master_kdc = usaeilidmp002.COMPANYidm.org:88
    admin_server = usaeilidmp002.COMPANYidm.org:749
    default_domain = COMPANYidm.org
    pkinit_anchors = FILE:/etc/ipa/ca.crt
  }

[domain_realm]
  .COMPANYidm.org = COMPANYIDM.ORG
  COMPANYidm.org = COMPANYIDM.ORG
  usaeilidmp002.COMPANYidm.org = COMPANYIDM.ORG

[dbmodules]
  COMPANYIDM.ORG = {
    db_library = ipadb.so
  }

## Also from the include path we had previously commented out
[plugins]
  localauth = {
    module = sssd:/usr/lib64/sssd/modules/sssd_krb5_localauth_plugin.so
  }

> Can you make sure that the replica is actually able to reach AD DCs for the trusted domains (ports tcp/3268, tcp/389, tcp/88, udp/88, udp/53 at least)?

I'm going to see if I can come up with a new verification method.
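udp/53 is on Alexander's list, and assuming the replica locates the trusted forest's global catalog and DCs through DNS SRV records (the usual AD service-discovery mechanism), one verification method is to send an SRV query over UDP straight from the replica. The sketch below uses only the Python standard library; the record name and resolver address in the usage note are hypothetical placeholders, not values from this thread.

```python
import random
import socket
import struct

SRV_QTYPE = 33  # SRV resource record type (RFC 2782)
IN_CLASS = 1    # Internet class


def build_srv_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet asking for the SRV records of `name`."""
    # Header: ID, flags with only RD (recursion desired) set, QDCOUNT=1, AN/NS/AR=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", SRV_QTYPE, IN_CLASS)


def query_srv(name: str, dns_server: str, timeout: float = 3.0) -> tuple:
    """Send the query over udp/53 and return (rcode, answer_count).

    Raises socket.timeout if nothing comes back, which is what a blocked
    udp/53 path looks like from the replica.
    """
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_srv_query(name, query_id), (dns_server, 53))
        reply, _ = sock.recvfrom(4096)
    if len(reply) < 12:
        raise ValueError("truncated DNS reply")
    resp_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if resp_id != query_id:
        raise ValueError("DNS reply ID does not match the query ID")
    return flags & 0x000F, ancount  # RCODE lives in the low 4 bits of the flags
```

Hypothetical usage: `query_srv("_gc._tcp.COMPANY.ORG", "10.0.0.2")`. An RCODE of 0 with a non-zero answer count means the records resolve; RCODE 3 (NXDOMAIN) means the name does not exist in the resolver's view; a timeout points at udp/53 being dropped somewhere on the path.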
My normal way of "proving" connectivity in this AWS environment is to use the VPC flow logs to search for REJECT entries between the NIC on this IPA server and the remote AD domain controller. This was very effective in proving, on our IPA master, that something was blocking UDP/88 to a few remote domain controllers. Sadly, I can't find any REJECT messages for this replica server, so I've been assuming connectivity was fine. I will try to test via other methods.

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
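Since the flow logs show no REJECT entries, a direct probe from the replica is an independent check of the TCP ports Alexander lists (tcp/3268, tcp/389, tcp/88). The minimal sketch below assumes Python 3 on the replica and distinguishes "refused" (the host answered, nothing is listening) from "timeout" (packets silently dropped) and other errors such as "Network is unreachable". udp/88 and udp/53 are deliberately left out: a UDP "connect" sends no packet, so those ports need a real protocol request to be tested. The host names in the usage comment are hypothetical.

```python
import socket

# TCP ports from Alexander's list; the UDP ports need a protocol-level request instead.
TCP_PORTS = {3268: "global catalog", 389: "LDAP", 88: "Kerberos"}


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing is listening.
        return "refused"
    except socket.timeout:
        # No answer at all: typical of a filter that silently drops packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "Network is unreachable" or a DNS resolution failure.
        return f"error: {exc.strerror or exc}"


def probe_dc(host: str, timeout: float = 3.0) -> dict:
    """Probe every TCP port in TCP_PORTS on one domain controller."""
    return {
        f"tcp/{port} ({name})": probe_tcp(host, port, timeout)
        for port, name in TCP_PORTS.items()
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python probe_dc.py dc1.example.org dc2.example.org
    for dc in sys.argv[1:]:
        for label, result in probe_dc(dc).items():
            print(f"{dc:40} {label:28} {result}")
```

Running it from the replica and from the IPA master against the same DCs gives a side-by-side comparison, which is roughly what the flow-log search was being used for.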
Re: [Freeipa-users] replica running trust-agents can't resolve AD users - which of these sssd errors should I be focusing on?
Oddly enough, the keytab location on the replica is nearly empty:

ls -al /var/lib/sss/keytabs/
total 4
drwx------. 2 sssd sssd  32 Dec 23 13:58 .
drwxr-xr-x. 9 root root  94 Dec 19 17:05 ..
-rw-------  1 sssd sssd 219 Dec 20 20:40 company.org.keytab

Jakub Hrozek wrote:
> In addition, can you also see if the keytab with the trust principal is there? Probably it would be /var/lib/sss/keytabs/shanetest.org.
>
> At 15:43:11, sssd tried to fetch the keytab for this trust:
>
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [ipa_server_trusted_dom_setup_1way] (0x0400): Will re-fetch keytab for shanetest.org
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [ipa_getkeytab_send] (0x0400): Retrieving keytab for companyidm$@SHANETEST.ORG from usaeilidmp002.companyidm.org into /var/lib/sss/keytabs/shanetest.org.keytabRw7Iai using ccache /var/lib/sss/db/ccache_companyidm.ORG
>
> But it fails:
>
> SASL Bind failed Can't contact LDAP server (-1) !
> Failed to bind to server!
> Failed to get keytab
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [ipa_getkeytab_done] (0x0040): ipa-getkeytab failed with status [2304]
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [ipa_getkeytab_recv] (0x2000): ipa-getkeytab status 2304
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [ipa_server_trust_1way_kt_done] (0x0080): ipa_getkeytab_recv failed: 1432158265
>
> What I don't see in the logs, though, is whether we try to re-fetch the keytab after going online (we should).
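The two numbers in the failing keytab fetch are easier to read once decoded. Status 2304 looks like a raw waitpid()-style status, which would make it exit code 9 of ipa-getkeytab (the meaning of each exit code is listed in the EXIT STATUS section of its man page). 1432158265 is above what a plain errno value would be; the sketch below assumes SSSD numbers its own error codes from a base of 0x555D0000, which is an assumption to verify against the SSSD sources rather than a documented interface.

```python
import os

# Assumption: SSSD's own error codes start at this base; smaller values are plain errno values.
SSSD_ERR_BASE = 0x555D0000


def describe_child_status(raw_status: int) -> str:
    """Decode a raw waitpid()-style status, e.g. the 2304 logged for ipa-getkeytab."""
    if os.WIFEXITED(raw_status):
        return f"exited with code {os.WEXITSTATUS(raw_status)}"
    if os.WIFSIGNALED(raw_status):
        return f"killed by signal {os.WTERMSIG(raw_status)}"
    return f"unrecognised status {raw_status}"


def describe_sssd_error(code: int) -> str:
    """Classify an error number as SSSD-specific (under the assumed base) or a plain errno."""
    if code >= SSSD_ERR_BASE:
        return f"SSSD internal error #{code - SSSD_ERR_BASE} (0x{code:08X})"
    return f"errno {code}: {os.strerror(code)}"


print(describe_child_status(2304))      # exited with code 9
print(describe_sssd_error(1432158265))  # SSSD internal error #57 (0x555D0039)
```

The arithmetic is simply 2304 = 9 × 256 and 1432158265 = 0x555D0000 + 57; the decoded values are only a starting point for looking up the matching entries in the ipa-getkeytab and SSSD documentation.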
Re: [Freeipa-users] replica running trust-agents can't resolve AD users - which of these sssd errors should I be focusing on?
On Thu, Dec 22, 2016 at 11:34:01PM +0200, Alexander Bokovoy wrote:
> On Thu, 22 Dec 2016, Chris Dagdigian wrote:
> > Hi folks,
> >
> > Summary: Replica w/ Trust agents can't resolve AD users. Not sure which debug_level=log error I should focus on. Would appreciate extra eyeballs on this.
> >
> > I have a brand-new replica (v4.4) running, and after installing the AD trust agents I still can't recognize users who exist in the trusted AD domains.
> >
> > I'm running at debug_level=10 for logging as usual; however, deleting the logs, doing a fresh reboot and then trying to resolve a user still produces 4000+ log entries, so rather than include them here I've posted a sanitized sssd_domain.log file here:
> >
> > http://chrisdag.me/sssd_companyidm.org.log.txt
> >
> > There are two sets of messages in that massive log file that concern me, but I don't know enough yet to figure out which one to focus on.
> >
> > The first set of messages shows what appears to be a fatal error in connecting to the local ldap:// server on the replica.
> >
> > However:
> > - dirsrv logs look fine
> > - the various ldapsearch commands in the FreeIPA troubleshooting page work to query both the replica and the remote master
> > - 'ipactl status' shows directory services running
> > - no firewall blocking, and AWS VPC flow logs show no REJECT traffic whatsoever for the NIC on the replica
> Can you show us ldap_child.log and krb5_child.log from /var/log/sssd on the replica?
>
> There seems to be something weird with the networking stack, because at 15:43:13 the next attempt to connect gets 'connection refused'. Maybe 389-ds is just warming up and there is not enough CPU or I/O to handle the load?
>
> (Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [be_resolve_server_process] (0x0200): Found address for server usaeilidmp002.companyidm.org: [10.127.66.11] TTL 7200
> (Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [27] for the connection.
> (Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
> (Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_connect_done] (0x0020): connect failed [111][Connection refused].
> (Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [111]: Connection refused.
>
> This is definitely different from the result two seconds before:
>
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [21] for the connection.
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_connect_send] (0x0020): connect failed [101][Network is unreachable].
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [101]: Network is unreachable.
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_state_destructor] (0x0400): closing socket [21]
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sss_ldap_init_sys_connect_done] (0x0020): sssd_async_socket_init request failed: [101]: Network is unreachable.
> (Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [101]: Network is unreachable.
>
> A minute later, it seems to respond just fine:
>
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [27] for the connection.
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://usaeilidmp002.companyidm.org:389/??base] with fd [27].
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_get_rootdse_send] (0x4000): Getting rootdse
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_print_server] (0x2000): Searching 10.127.66.11:389
> ...
>
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range] (0x2000): No sub-attributes for [supportedSASLMechanisms]
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range] (0x2000): No sub-attributes for [defaultNamingContext]
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range] (0x2000): No sub-attributes for [lastUSN]
> (Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_process_result] (0x2000): Trace: sh[0x7f4f63ace530], connected[1], ops[0x7f4f63b283d0], ldap[0x7f4f63b28720]
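The two failures quoted above carry different errno values: 101 at 15:43:11 and 111 at 15:43:13. On Linux these are ENETUNREACH (no usable route from the local host) and ECONNREFUSED (the peer answered but nothing was listening on the port), which is why the second one fits the "389-ds still warming up" theory better. With 4000+ lines per run, a small filter helps pull these out; the regular expression below is written against the sssd_async_connect_* lines quoted in this thread and is an assumption about the log format, not a stable interface.

```python
import errno
import re

# Matches lines such as:
# (Thu Dec 22 15:43:13 2016) [sssd[be[...]]] [sssd_async_connect_done] (0x0020): connect failed [111][Connection refused].
CONNECT_FAILED = re.compile(
    r"^\((?P<ts>[^)]+)\).*\[(?P<func>sssd_async_connect_\w+)\].*connect failed \[(?P<err>\d+)\]"
)


def connect_failures(lines):
    """Yield (timestamp, sssd function, errno name) for each failed connect() in an sssd log."""
    for line in lines:
        m = CONNECT_FAILED.match(line)
        if m:
            code = int(m.group("err"))
            # errno.errorcode maps numbers to symbolic names for the running platform
            yield m.group("ts"), m.group("func"), errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python connect_failures.py /var/log/sssd/sssd_companyidm.org.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ts, func, name in connect_failures(log):
            print(ts, func, name)
```

The errno-to-name mapping comes from the machine running the script, so it should be run on Linux (like the replica) for the numbers in these logs to map correctly.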
Re: [Freeipa-users] replica running trust-agents can't resolve AD users - which of these sssd errors should I be focusing on?
On Thu, 22 Dec 2016, Chris Dagdigian wrote:
> Hi folks,
>
> Summary: Replica w/ Trust agents can't resolve AD users. Not sure which debug_level=log error I should focus on. Would appreciate extra eyeballs on this.
>
> I have a brand-new replica (v4.4) running, and after installing the AD trust agents I still can't recognize users who exist in the trusted AD domains.
>
> I'm running at debug_level=10 for logging as usual; however, deleting the logs, doing a fresh reboot and then trying to resolve a user still produces 4000+ log entries, so rather than include them here I've posted a sanitized sssd_domain.log file here:
>
> http://chrisdag.me/sssd_companyidm.org.log.txt
>
> There are two sets of messages in that massive log file that concern me, but I don't know enough yet to figure out which one to focus on.
>
> The first set of messages shows what appears to be a fatal error in connecting to the local ldap:// server on the replica.
>
> However:
> - dirsrv logs look fine
> - the various ldapsearch commands in the FreeIPA troubleshooting page work to query both the replica and the remote master
> - 'ipactl status' shows directory services running
> - no firewall blocking, and AWS VPC flow logs show no REJECT traffic whatsoever for the NIC on the replica

Can you show us ldap_child.log and krb5_child.log from /var/log/sssd on the replica?

There seems to be something weird with the networking stack, because at 15:43:13 the next attempt to connect gets 'connection refused'. Maybe 389-ds is just warming up and there is not enough CPU or I/O to handle the load?

(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [be_resolve_server_process] (0x0200): Found address for server usaeilidmp002.companyidm.org: [10.127.66.11] TTL 7200
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [27] for the connection.
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_connect_done] (0x0020): connect failed [111][Connection refused].
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [111]: Connection refused.

This is definitely different from the result two seconds before:

(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [21] for the connection.
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_connect_send] (0x0020): connect failed [101][Network is unreachable].
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [101]: Network is unreachable.
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_state_destructor] (0x0400): closing socket [21]
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sss_ldap_init_sys_connect_done] (0x0020): sssd_async_socket_init request failed: [101]: Network is unreachable.
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [101]: Network is unreachable.

A minute later, it seems to respond just fine:

(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [27] for the connection.
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://usaeilidmp002.companyidm.org:389/??base] with fd [27].
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_get_rootdse_send] (0x4000): Getting rootdse
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_print_server] (0x2000): Searching 10.127.66.11:389
...
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range] (0x2000): No sub-attributes for [supportedSASLMechanisms]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range] (0x2000): No sub-attributes for [defaultNamingContext]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range] (0x2000): No sub-attributes for [lastUSN]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_process_result] (0x2000): Trace: sh[0x7f4f63ace530], connected[1], ops[0x7f4f63b283d0], ldap[0x7f4f63b28720]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
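One way to test the warm-up theory (refused at 15:43:13, working by 15:44:28) is to measure how long after boot the LDAP port starts accepting connections. The sketch below is a hypothetical helper, not part of sssd or FreeIPA: it retries a plain TCP connect at a fixed interval and reports the elapsed time, or None if the deadline passes first.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float = 120.0,
                  interval_s: float = 2.0):
    """Retry a TCP connect until it succeeds.

    Returns the number of seconds waited, or None if the port never accepted
    a connection before the deadline.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable or timed out: not ready yet, so wait and retry.
            time.sleep(interval_s)
    return None
```

Hypothetical usage, run on the replica right after a reboot: `wait_for_port("usaeilidmp002.companyidm.org", 389)`. A result of tens of seconds would line up with the gap between the refused connection and the successful rootDSE search in the log above.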