Hi, after running some time with debugging enabled, I came across a "Too many open files" error in the logs. Shouldn't this be fixed? (https://fedorahosted.org/sssd/ticket/2792)
We have 6 AD servers in our environment which get returned when using DNS SRV records. It seems like one can't be discoverd in a timely manner: (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sdap_id_op_connect_step] (0x4000): beginning to connect (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD_GC' (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working' (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [get_port_status] (0x0100): Reseting the status of port 0 for server '(no name)' (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [get_srv_data_status] (0x0400): Changing state of SRV lookup from 'SRV_RESOLVE_ERROR' to 'SRV_NEUTRAL'. (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is neutral (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_srv_plugin_send] (0x0400): About to find domain controllers (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_get_dc_servers_send] (0x0400): Looking up domain controllers in domain some.domain.com (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'some.domain.com' (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.some.domain.com' (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_getsrv_done] (0x1000): Using TTL [600] (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [request_watch_destructor] (0x0400): Deleting request watch (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [fo_discover_srv_done] (0x0400): Got answer. Processing... (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [fo_discover_srv_done] (0x0400): Got 6 servers (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_get_dc_servers_done] (0x0400): Found 6 domain controllers in domain some.domain.com (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_srv_plugin_dcs_done] (0x0400): About to locate suitable site (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sdap_connect_host_send] (0x0400): Resolving host DC01.SOME.Domain.com (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_is_address] (0x4000): [DC01.SOME.Domain.com] does not look like an IP address (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_step] (0x2000): Querying files (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'DC01.SOME.Domain.com' in files (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_step] (0x2000): Querying files (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'DC01.SOME.Domain.com' in files (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_next] (0x0200): No more address families to retry (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_step] (0x2000): Querying DNS (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'DC01.SOME.Domain.com' in DNS (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_gethostbyname_dns_parse] (0x1000): Parsing an A reply (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [request_watch_destructor] (0x0400): Deleting request watch (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sdap_connect_host_resolv_done] (0x0400): Connecting to ldap://DC01.SOME.Domain.com:389 (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sss_ldap_init_send] (0x4000): Using file descriptor [1015] for LDAP connection. (Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sss_ldap_init_send] (0x0400): Setting 6 seconds timeout for connecting (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_dispatch] (0x4000): dbus conn: 0x1741990 (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_dispatch] (0x4000): Dispatching. (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_message_handler] (0x4000): Received SBUS method [ping] (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_handler_got_caller_id] (0x4000): Received SBUS method [ping] (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [fo_resolve_service_timeout] (0x0080): Service resolving timeout reached (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_handle_release] (0x2000): Trace: sh[0x3742310], connected[0], ops[(nil)], ldap[(nil)], destructor_lock[0], release_memory[0] (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [be_resolve_server_done] (0x1000): Server resolution failed: 14 (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_id_op_connect_done] (0x0400): Failed to connect to server, but ignore mark offline is enabled. (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1 (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_id_op_connect_step] (0x4000): beginning to connect This occurs repeatedly until sssd runs out file descriptors: (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_kinit_kdc_resolved] (0x1000): KDC resolved, attempting to get TGT... (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [create_tgt_req_send_buffer] (0x0400): buffer size: 54 (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_fork_child] (0x0020): pipe failed [24][Too many open files]. (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_get_tgt_send] (0x0020): sdap_fork_child failed. (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_kinit_done] (0x0020): child failed (24 [Too many open files]) (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_cli_kinit_done] (0x0400): Cannot get a TGT: ret [24](Too many open files) (Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'DC02.SOME.Domain.com' as 'not working' which then results in: Mar 29 08:25:51 <HOSTNAME> sshd[6975]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=<rhost> user=<user> Mar 29 08:25:51 <HOSTNAME> sshd[6975]: pam_sss(sshd:auth): received for user <user>: 4 (System error) Mar 29 08:25:53 <HOSTNAME> sshd[6975]: Failed password for <user> from <ip> port 54001 ssh2 I've now tried setting dns_resolver_timeout to 10 seconds as someone mentioned at https://fedorahosted.org/sssd/ticket/2792 as a possible workaround but it would be much appreciated if someone could provide some feedback on this issue. Many Thanks, Christoph _______________________________________________ sssd-users mailing list [email protected] https://lists.fedorahosted.org/admin/lists/[email protected]
