** Description changed:

+ [Impact]
  
- When SSSD fails to connect to a provider (LDAP, for instance) it creates a 
timed event with tevent_add_timer() in order to retry in ~1 min. Tevent relies 
on CLOCK_REALTIME, using absolute epoch time, so when the time changes (e.g. 
NTP sync) the scheduled event is affected.
+  * SSSD is affected by clock shifts because it's built on libtevent, that 
doesn't use monotonic clocks. 
+  * After an event has been scheduled, if the time drifts to the past SSSD 
won't recover and the event will have to wait the shifted time to be executed.
+ 
+ [Test Case]
+ 
+  * Modify the /etc/hosts file to force a failed resolution and restart sssd.
+  * Within the first ~1 minute retry window, change the date (date --set) to a 
previous time (e.g. 2h).
+  * "Tail" the log file. The event will be executed at the original schedule, 
past those 2h.
+ 
+ [Regression Potential]
+ 
+  * None
+ 
+ [Other Info]
+ 
+  * Version 1.15 implements a watchdog that detects time shifts and resets 
itselft but doesn't reschedule the scheduled events.
+  * There's a network monitor based on netlink that detects interface changes 
and restarts the providers, rescheduling the scheduled events.
+  * Such restart can be triggered with SIGUSR2. Sending that signal after the 
watchdog restarts fixes this issue.
+  * Upstream bug: https://fedorahosted.org/sssd/ticket/3285 
+ 
+ [Original Description]
+ 
+ When SSSD fails to connect to a provider (LDAP, for instance) it creates
+ a timed event with tevent_add_timer() in order to retry in ~1 min.
+ Tevent relies on CLOCK_REALTIME, using absolute epoch time, so when the
+ time changes (e.g. NTP sync) the scheduled event is affected.
  
  Reproducer:
  
  1. Modify the /etc/hosts file to force a failed resolution and restart
  sssd
  
- (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] 
[resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ldap' 
in DNS 
- (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] 
[resolv_gethostbyname_done] (0x0040): querying hosts database failed [5]: 
Input/output error 
- (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not 
contact DNS servers 
- ... 
- (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [check_online_callback] 
(0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline] 
+ (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] 
[resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ldap' 
in DNS
+ (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] 
[resolv_gethostbyname_done] (0x0040): querying hosts database failed [5]: 
Input/output error
+ (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not 
contact DNS servers
+ ...
+ (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [check_online_callback] 
(0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]
  
  2. Within the ~1 minute window, change the date (date --set) to a
  previous time (2 hour in my example). Note: if /etc/resolv.conf or a
  network interface is modified, SSSD providers will restart, but the
  scheduled retry will be kept.
  
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] 
(0x0100): Initializing new c-ares channel 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] 
(0x0100): Destroying the old c-ares channel 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'name not 
resolved' 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] 
(0x0100): Marking port 389 of server 'ldap' as 'neutral' 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' 
- ... 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not 
contact DNS servers 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'not working' 
- ... 
- (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [check_online_callback] 
(0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline] 
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] 
(0x0100): Initializing new c-ares channel
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] 
(0x0100): Destroying the old c-ares channel
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'name not 
resolved'
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] 
(0x0100): Marking port 389 of server 'ldap' as 'neutral'
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
+ ...
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not 
contact DNS servers
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
+ ...
+ (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [check_online_callback] 
(0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]
  
  3. Once it gets to the programmed date (2 hours later), it retries:
  
- (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'name not 
resolved' 
- (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] 
(0x0100): Marking port 389 of server 'ldap' as 'neutral' 
- (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' 
- ... 
- (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not 
contact DNS servers 
- (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'not working' 
- ... 
+ (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'name not 
resolved'
+ (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] 
(0x0100): Marking port 389 of server 'ldap' as 'neutral'
+ (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
+ ...
+ (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not 
contact DNS servers
+ (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] 
[set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
+ ...
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [check_online_callback] 
(0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1641875

Title:
  Scheduled events (e.g. LDAP connection retries) are affected by clock
  adjustments

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1641875/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to