Hi,

After running for some time with debugging enabled, I came across a "Too many
open files" error in the logs.
Shouldn't this already be fixed? (https://fedorahosted.org/sssd/ticket/2792)

We have 6 AD servers in our environment, all of which are returned by the DNS
SRV lookup. It seems like one of them can't be discovered in a timely manner:

(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[sdap_id_op_connect_step] (0x4000): beginning to connect
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[fo_resolve_service_send] (0x0100): Trying to resolve service 'AD_GC'
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [get_port_status] 
(0x1000): Port status of port 0 for server '(no name)' is 'not working'
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [get_port_status] 
(0x0100): Reseting the status of port 0 for server '(no name)'
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [get_srv_data_status] 
(0x0400): Changing state of SRV lookup from 'SRV_RESOLVE_ERROR' to 
'SRV_NEUTRAL'.
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolve_srv_send] 
(0x0200): The status of SRV lookup is neutral
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_srv_plugin_send] 
(0x0400): About to find domain controllers
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_get_dc_servers_send] 
(0x0400): Looking up domain controllers in domain some.domain.com
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. 
Will use DNS discovery domain 'some.domain.com'
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_getsrv_send] 
(0x0100): Trying to resolve SRV record of '_ldap._tcp.some.domain.com'
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_getsrv_done] 
(0x1000): Using TTL [600]
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[request_watch_destructor] (0x0400): Deleting request watch
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [fo_discover_srv_done] 
(0x0400): Got answer. Processing...
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [fo_discover_srv_done] 
(0x0400): Got 6 servers
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_get_dc_servers_done] 
(0x0400): Found 6 domain controllers in domain some.domain.com
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [ad_srv_plugin_dcs_done] 
(0x0400): About to locate suitable site
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sdap_connect_host_send] 
(0x0400): Resolving host DC01.SOME.Domain.com
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [resolv_is_address] 
(0x4000): [DC01.SOME.Domain.com] does not look like an IP address
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_step] (0x2000): Querying files
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 
'DC01.SOME.Domain.com' in files
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_step] (0x2000): Querying files
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 
'DC01.SOME.Domain.com' in files
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_next] (0x0200): No more address families to retry
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_step] (0x2000): Querying DNS
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 
'DC01.SOME.Domain.com' in DNS
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[resolv_gethostbyname_dns_parse] (0x1000): Parsing an A reply
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[request_watch_destructor] (0x0400): Deleting request watch
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] 
[sdap_connect_host_resolv_done] (0x0400): Connecting to 
ldap://DC01.SOME.Domain.com:389
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sss_ldap_init_send] 
(0x4000): Using file descriptor [1015] for LDAP connection.
(Sat Mar 26 10:17:01 2016) [sssd[be[some.domain.com]]] [sss_ldap_init_send] 
(0x0400): Setting 6 seconds timeout for connecting
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_dispatch] 
(0x4000): dbus conn: 0x1741990
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_dispatch] 
(0x4000): Dispatching.
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sbus_message_handler] 
(0x4000): Received SBUS method [ping]
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[sbus_handler_got_caller_id] (0x4000): Received SBUS method [ping]
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[fo_resolve_service_timeout] (0x0080): Service resolving timeout reached
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_handle_release] 
(0x2000): Trace: sh[0x3742310], connected[0], ops[(nil)], ldap[(nil)], 
destructor_lock[0], release_memory[0]
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [be_resolve_server_done] 
(0x1000): Server resolution failed: 14
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[sdap_id_op_connect_done] (0x0400): Failed to connect to server, but ignore 
mark offline is enabled.
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[sdap_id_op_connect_step] (0x4000): beginning to connect


This occurs repeatedly until sssd runs out of file descriptors:

(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[sdap_kinit_kdc_resolved] (0x1000): KDC resolved, attempting to get TGT...
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] 
[create_tgt_req_send_buffer] (0x0400): buffer size: 54
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_fork_child] 
(0x0020): pipe failed [24][Too many open files].
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_get_tgt_send] 
(0x0020): sdap_fork_child failed.
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_kinit_done] 
(0x0020): child failed (24 [Too many open files])
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [sdap_cli_kinit_done] 
(0x0400): Cannot get a TGT: ret [24](Too many open files)
(Sat Mar 26 10:17:07 2016) [sssd[be[some.domain.com]]] [fo_set_port_status] 
(0x0100): Marking port 389 of server 'DC02.SOME.Domain.com' as 'not working'


which then results in:


Mar 29 08:25:51 <HOSTNAME> sshd[6975]: pam_sss(sshd:auth): authentication 
failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=<rhost> user=<user>
Mar 29 08:25:51 <HOSTNAME> sshd[6975]: pam_sss(sshd:auth): received for user 
<user>: 4 (System error)
Mar 29 08:25:53 <HOSTNAME> sshd[6975]: Failed password for <user> from <ip> 
port 54001 ssh2
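
To check whether the backend's descriptor count really keeps growing between
failed connection attempts, something like the following could be run
periodically against the backend process. This is only a rough sketch: it
assumes Linux with /proc mounted, the helper names are made up for
illustration, and the PID has to be looked up by hand (for example, the PID of
the sssd_be process for the domain).

```python
import os


def count_open_fds(pid):
    """Return the number of open file descriptors of `pid` (Linux /proc only)."""
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of `pid`, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft limit, hard limit, units
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def describe(pid):
    """Format a one-line summary that can be logged at regular intervals."""
    return f"pid {pid}: {count_open_fds(pid)} open fds, soft limit {soft_nofile_limit(pid)}"
```

Calling describe() with the backend's PID as root every few seconds should show
whether the count climbs towards the limit each time a connection attempt times
out. Reading /proc/<pid>/fd of another user's process normally needs root.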


I've now set dns_resolver_timeout to 10 seconds, which someone mentioned at
https://fedorahosted.org/sssd/ticket/2792 as a possible workaround, but I would
much appreciate it if someone could provide some feedback on this issue.

Many Thanks,
Christoph