[SSSD] Re: Monotonic clock for timed events

2016-10-12 Thread Simo Sorce
On Wed, 2016-10-12 at 10:52 +0200, Pavel Březina wrote:
> On 10/11/2016 03:26 PM, Simo Sorce wrote:
> > On Mon, 2016-10-10 at 14:04 +0200, Pavel Březina wrote:
> >> On 10/10/2016 10:09 AM, Fabiano Fidêncio wrote:
> >>> Victor,
> >>>
> >>> On Mon, Oct 10, 2016 at 10:04 AM, Victor Tapia
> >>>  wrote:
> >>>> Hi list,
> >>>>
> >>>> I've faced a race condition when SSSD boots in a machine with a big
> >>>> clock drift. This is what I see:
> >>>>
> >>>> 1. SSSD starts before the network is up, queries the LDAP server without
> >>>> success and sets a retry timer (~60 secs)
> >>>> 2. NTP starts and corrects the clock, 1 hour back for example.
> >>>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> >>>> connection.
> >>>>
> >>>> In this particular scenario the credentials cache is disabled, so the
> >>>> wait time to login is noticeable. How feasible would it be to use a
> >>>> monotonic clock for this kind of timed events?
> >>>
> >>> Are you running git master? This issue is supposed to be already
> >>> solved by 
> >>> https://github.com/SSSD/sssd/commit/b8ceaeb80cffb00c26390913ea959b77f7e848b9
> >>
> >> This patch fixes the issue only in the watchdog, which would otherwise
> >> terminate sssd. Fixing it across the whole of sssd would be
> >> difficult. The fix should go to tevent.
> >
> > It also seems to fix the issue only if the time jumps backwards, not if
> > it jumps forward; in that case, if I read the code right, we'd still end
> > up killing sssd.
> 
> Yes, we don't need to care about forward jumps, since that means all
> tevent timers that fall within the time shift are fired anyway.

Well I do care if sssd kills all children just because I suspended the
laptop for a while :-)

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York
___
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org


[SSSD] Re: Monotonic clock for timed events

2016-10-12 Thread Pavel Březina

On 10/11/2016 03:26 PM, Simo Sorce wrote:
> On Mon, 2016-10-10 at 14:04 +0200, Pavel Březina wrote:
>> On 10/10/2016 10:09 AM, Fabiano Fidêncio wrote:
>>> Victor,
>>>
>>> On Mon, Oct 10, 2016 at 10:04 AM, Victor Tapia
>>>  wrote:
>>>> Hi list,
>>>>
>>>> I've faced a race condition when SSSD boots in a machine with a big
>>>> clock drift. This is what I see:
>>>>
>>>> 1. SSSD starts before the network is up, queries the LDAP server without
>>>> success and sets a retry timer (~60 secs)
>>>> 2. NTP starts and corrects the clock, 1 hour back for example.
>>>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
>>>> connection.
>>>>
>>>> In this particular scenario the credentials cache is disabled, so the
>>>> wait time to login is noticeable. How feasible would it be to use a
>>>> monotonic clock for this kind of timed events?
>>>
>>> Are you running git master? This issue is supposed to be already
>>> solved by
>>> https://github.com/SSSD/sssd/commit/b8ceaeb80cffb00c26390913ea959b77f7e848b9
>>
>> This patch fixes the issue only in the watchdog, which would otherwise
>> terminate sssd. Fixing it across the whole of sssd would be
>> difficult. The fix should go to tevent.
>
> It also seems to fix the issue only if the time jumps backwards, not if
> it jumps forward; in that case, if I read the code right, we'd still end
> up killing sssd.

Yes, we don't need to care about forward jumps, since that means all
tevent timers that fall within the time shift are fired anyway.

> Simo.




[SSSD] Re: Monotonic clock for timed events

2016-10-11 Thread Simo Sorce
On Mon, 2016-10-10 at 11:19 +0200, Jakub Hrozek wrote:
> On Mon, Oct 10, 2016 at 11:09:35AM +0200, Victor Tapia wrote:
> > El 10/10/16 a las 10:56, Ian Kent escribió:
> > > On Mon, 2016-10-10 at 10:42 +0200, Jakub Hrozek wrote:
> > >> On Mon, Oct 10, 2016 at 10:04:30AM +0200, Victor Tapia wrote:
> > >>> Hi list,
> > >>>
> > >>> I've faced a race condition when SSSD boots in a machine with a big
> > >>> clock drift. This is what I see:
> > >>>
> > >>> 1. SSSD starts before the network is up, queries the LDAP server without
> > >>> success and sets a retry timer (~60 secs)
> > >>> 2. NTP starts and corrects the clock, 1 hour back for example.
> > >>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> > >>> connection.
> > >>>
> > >>> In this particular scenario the credentials cache is disabled, so the
> > >>> wait time to login is noticeable. How feasible would it be to use a
> > >>> monotonic clock for this kind of timed events?
> > >>
> > >> I really have not tried this and I guess I don't know tevent internals
> > >> well enough if this works, but I wonder if just using:
> > >> clock_gettime()
> > > 
> > > With a CLOCK_MONOTONIC I presume?
> > > I think that's what has been suggested.
> > > 
> > 
> > I was thinking about using CLOCK_MONOTONIC_RAW:
> > 
> > CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
> > Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based
> > time that is not subject to NTP adjustments or the incremental
> > adjustments performed by adjtime(3).
> > 
> > >> and constructing struct timeval
> > >> in place of:
> > >> tevent_timeval_current_ofs()
> > >> could solve this particular issue.
> > >>
> > >> On the other hand, this is a pattern we use in SSSD all through the code
> > >> for timed events and we're just not well equipped to handle time drifts.
> > >> Did you investigate why sssd doesn't detect the networking change from
> > >> libnl messages or from resolv.conf being touched?
> > 
> > I didn't dig much into it yet (I just checked tevent to confirm it uses
> > gettimeofday()), so I'll take this as my next step.
> 
> btw the samba-technical mailing list is the best source of info about
> libtevent and the best place to ask questions about libtevent.

We should suggest that libtevent start using timerfd with epoll for
time tracking instead of its internal computations; then a monotonic
clock can be used, with a calculation at set time that converts the
requested time of day into the corresponding monotonic-clock value.

Simo.
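The timerfd suggestion can be sketched roughly as follows, assuming Linux timerfd_create(2)/timerfd_settime(2): the caller still supplies a time-of-day deadline as a struct timeval, it is converted once into a CLOCK_MONOTONIC deadline when the timer is set, and a timerfd armed with TFD_TIMER_ABSTIME does the tracking. The function names are hypothetical, this is not tevent code or API, and poll() on one fd stands in for the epoll loop an event library would run.

```c
#define _GNU_SOURCE /* keeps the clock and timerfd declarations visible under strict -std modes */
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/time.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Convert an absolute time-of-day deadline (struct timeval, the form tevent
 * timers take) into the equivalent CLOCK_MONOTONIC deadline, using the
 * offset between the two clocks at the moment the timer is set. */
static int walltime_to_monotonic(const struct timeval *wall_deadline,
                                 struct timespec *mono_deadline)
{
    struct timespec real_now, mono_now;
    int64_t delta_ns, mono_ns;

    if (clock_gettime(CLOCK_REALTIME, &real_now) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &mono_now) != 0) {
        return -1;
    }

    /* Time left until the deadline, as measured on the wall clock right now. */
    delta_ns = ((int64_t)wall_deadline->tv_sec - (int64_t)real_now.tv_sec) * 1000000000LL
             + (int64_t)wall_deadline->tv_usec * 1000 - (int64_t)real_now.tv_nsec;
    if (delta_ns < 0) {
        delta_ns = 0;   /* already due: expire as soon as possible */
    }

    /* On Linux the monotonic "now" is in practice never exactly zero, which
     * matters: an all-zero it_value would disarm the timerfd. */
    mono_ns = (int64_t)mono_now.tv_sec * 1000000000LL + mono_now.tv_nsec + delta_ns;
    mono_deadline->tv_sec = (time_t)(mono_ns / 1000000000LL);
    mono_deadline->tv_nsec = (long)(mono_ns % 1000000000LL);
    assert(mono_deadline->tv_nsec >= 0 && mono_deadline->tv_nsec < 1000000000L);
    return 0;
}

/* Arm a one-shot CLOCK_MONOTONIC timerfd for wall_deadline. The fd can be
 * added to an epoll set and becomes readable on expiry; later steps of the
 * wall clock no longer move the expiry. Returns the fd or -1. */
static int arm_monotonic_timerfd(const struct timeval *wall_deadline)
{
    struct itimerspec its;
    int fd;

    its.it_interval.tv_sec = 0;     /* one-shot: no periodic reload */
    its.it_interval.tv_nsec = 0;
    if (walltime_to_monotonic(wall_deadline, &its.it_value) != 0) {
        return -1;
    }

    fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    /* TFD_TIMER_ABSTIME: it_value is an absolute CLOCK_MONOTONIC time. */
    if (timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for expiry. Returns the expiration count (1 for a
 * one-shot timer), 0 on timeout, -1 on error. */
static int64_t wait_timerfd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    uint64_t expirations;
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0) {
        return rc;
    }
    if (read(fd, &expirations, sizeof(expirations)) != (ssize_t)sizeof(expirations)) {
        return -1;
    }
    return (int64_t)expirations;
}
```

Because the conversion happens only at set time, an NTP step that happens afterwards neither stretches nor shrinks the wait. The trade-off is that a deadline genuinely meant as a wall-clock time ("at 03:00") would no longer follow clock changes, so such timers would probably need to stay on CLOCK_REALTIME.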



[SSSD] Re: Monotonic clock for timed events

2016-10-11 Thread Simo Sorce
On Mon, 2016-10-10 at 14:04 +0200, Pavel Březina wrote:
> On 10/10/2016 10:09 AM, Fabiano Fidêncio wrote:
> > Victor,
> >
> > On Mon, Oct 10, 2016 at 10:04 AM, Victor Tapia
> >  wrote:
> >> Hi list,
> >>
> >> I've faced a race condition when SSSD boots in a machine with a big
> >> clock drift. This is what I see:
> >>
> >> 1. SSSD starts before the network is up, queries the LDAP server without
> >> success and sets a retry timer (~60 secs)
> >> 2. NTP starts and corrects the clock, 1 hour back for example.
> >> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> >> connection.
> >>
> >> In this particular scenario the credentials cache is disabled, so the
> >> wait time to login is noticeable. How feasible would it be to use a
> >> monotonic clock for this kind of timed events?
> >
> > Are you running git master? This issue is supposed to be already
> > solved by 
> > https://github.com/SSSD/sssd/commit/b8ceaeb80cffb00c26390913ea959b77f7e848b9
> 
> This patch fixes the issue only in the watchdog, which would otherwise
> terminate sssd. Fixing it across the whole of sssd would be
> difficult. The fix should go to tevent.

It also seems to fix the issue only if the time jumps backwards, not if
it jumps forward; in that case, if I read the code right, we'd still end
up killing sssd.

Simo.
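The effect of a step in either direction on a timer whose deadline is computed from the wall clock reduces to simple arithmetic. A minimal C sketch (the function is hypothetical and models a single step applied right after the timer is armed):

```c
#include <assert.h>

/* Real (elapsed) seconds until a timer fires when its deadline was computed
 * from the wall clock as "now + delay_s" and the wall clock is then stepped
 * by step_s seconds (positive = forward, negative = backward) right after
 * the timer is armed. With W0 the wall time at arming, the timer fires once
 *   (W0 + step_s) + t >= W0 + delay_s,  i.e.  t >= delay_s - step_s. */
static long wallclock_timer_real_wait(long delay_s, long step_s)
{
    long wait;

    assert(delay_s >= 0);
    wait = delay_s - step_s;
    return wait > 0 ? wait : 0;   /* a large enough forward step fires it at once */
}
```

With Victor's numbers, wallclock_timer_real_wait(60, -3600) gives 3660 seconds, the ~60 s + 1 hour he observed, while any forward step of at least the delay makes the timer fire immediately. A timer based on a monotonic clock would wait 60 seconds in both cases.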



[SSSD] Re: Monotonic clock for timed events

2016-10-11 Thread Simo Sorce
On Mon, 2016-10-10 at 10:04 +0200, Victor Tapia wrote:
> Hi list,
> 
> I've faced a race condition when SSSD boots in a machine with a big
> clock drift. This is what I see:
> 
> 1. SSSD starts before the network is up, queries the LDAP server without
> success and sets a retry timer (~60 secs)
> 2. NTP starts and corrects the clock, 1 hour back for example.
> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> connection.
> 
> In this particular scenario the credentials cache is disabled, so the
> wait time to login is noticeable. How feasible would it be to use a
> monotonic clock for this kind of timed events?

We should use a monotonic clock for most internal events.

Simo.



[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Pavel Březina

On 10/10/2016 10:09 AM, Fabiano Fidêncio wrote:
> Victor,
>
> On Mon, Oct 10, 2016 at 10:04 AM, Victor Tapia
>  wrote:
>> Hi list,
>>
>> I've faced a race condition when SSSD boots in a machine with a big
>> clock drift. This is what I see:
>>
>> 1. SSSD starts before the network is up, queries the LDAP server without
>> success and sets a retry timer (~60 secs)
>> 2. NTP starts and corrects the clock, 1 hour back for example.
>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
>> connection.
>>
>> In this particular scenario the credentials cache is disabled, so the
>> wait time to login is noticeable. How feasible would it be to use a
>> monotonic clock for this kind of timed events?
>
> Are you running git master? This issue is supposed to be already
> solved by
> https://github.com/SSSD/sssd/commit/b8ceaeb80cffb00c26390913ea959b77f7e848b9

This patch fixes the issue only in the watchdog, which would otherwise
terminate sssd. Fixing it across the whole of sssd would be
difficult. The fix should go to tevent.



[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Jakub Hrozek
On Mon, Oct 10, 2016 at 11:09:35AM +0200, Victor Tapia wrote:
> El 10/10/16 a las 10:56, Ian Kent escribió:
> > On Mon, 2016-10-10 at 10:42 +0200, Jakub Hrozek wrote:
> >> On Mon, Oct 10, 2016 at 10:04:30AM +0200, Victor Tapia wrote:
> >>> Hi list,
> >>>
> >>> I've faced a race condition when SSSD boots in a machine with a big
> >>> clock drift. This is what I see:
> >>>
> >>> 1. SSSD starts before the network is up, queries the LDAP server without
> >>> success and sets a retry timer (~60 secs)
> >>> 2. NTP starts and corrects the clock, 1 hour back for example.
> >>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> >>> connection.
> >>>
> >>> In this particular scenario the credentials cache is disabled, so the
> >>> wait time to login is noticeable. How feasible would it be to use a
> >>> monotonic clock for this kind of timed events?
> >>
> >> I really have not tried this and I guess I don't know tevent internals
> >> well enough if this works, but I wonder if just using:
> >> clock_gettime()
> > 
> > With a CLOCK_MONOTONIC I presume?
> > I think that's what has been suggested.
> > 
> 
> I was thinking about using CLOCK_MONOTONIC_RAW:
> 
> CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
> Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based
> time that is not subject to NTP adjustments or the incremental
> adjustments performed by adjtime(3).
> 
> >> and constructing struct timeval
> >> in place of:
> >> tevent_timeval_current_ofs()
> >> could solve this particular issue.
> >>
> >> On the other hand, this is a pattern we use in SSSD all through the code
> >> for timed events and we're just not well equipped to handle time drifts.
> >> Did you investigate why sssd doesn't detect the networking change from
> >> libnl messages or from resolv.conf being touched?
> 
> I didn't dig much into it yet (I just checked tevent to confirm it uses
> gettimeofday()), so I'll take this as my next step.

btw the samba-technical mailing list is the best source of info about
libtevent and the best place to ask questions about libtevent.


[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Victor Tapia
El 10/10/16 a las 10:56, Ian Kent escribió:
> On Mon, 2016-10-10 at 10:42 +0200, Jakub Hrozek wrote:
>> On Mon, Oct 10, 2016 at 10:04:30AM +0200, Victor Tapia wrote:
>>> Hi list,
>>>
>>> I've faced a race condition when SSSD boots in a machine with a big
>>> clock drift. This is what I see:
>>>
>>> 1. SSSD starts before the network is up, queries the LDAP server without
>>> success and sets a retry timer (~60 secs)
>>> 2. NTP starts and corrects the clock, 1 hour back for example.
>>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
>>> connection.
>>>
>>> In this particular scenario the credentials cache is disabled, so the
>>> wait time to login is noticeable. How feasible would it be to use a
>>> monotonic clock for this kind of timed events?
>>
>> I really have not tried this and I guess I don't know tevent internals
>> well enough if this works, but I wonder if just using:
>> clock_gettime()
> 
> With a CLOCK_MONOTONIC I presume?
> I think that's what has been suggested.
> 

I was thinking about using CLOCK_MONOTONIC_RAW:

CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based
time that is not subject to NTP adjustments or the incremental
adjustments performed by adjtime(3).

>> and constructing struct timeval
>> in place of:
>> tevent_timeval_current_ofs()
>> could solve this particular issue.
>>
>> On the other hand, this is a pattern we use in SSSD all through the code
>> for timed events and we're just not well equipped to handle time drifts.
>> Did you investigate why sssd doesn't detect the networking change from
>> libnl messages or from resolv.conf being touched?

I didn't dig much into it yet (I just checked tevent to confirm it uses
gettimeofday()), so I'll take this as my next step.

Thanks,

Victor
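For anyone wanting to compare the candidate clocks on a Linux host, a minimal C sketch that reads them with clock_gettime(2). The helper names are hypothetical; CLOCK_MONOTONIC_RAW is guarded with #ifdef because it is Linux-specific.

```c
#define _GNU_SOURCE /* keeps Linux-specific clock ids visible under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: read clock `id` in milliseconds, or -1 if this
 * kernel/libc does not support that clock. */
static long long read_clock_ms(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0) {
        return -1;
    }
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Print the candidate clocks side by side. They have different starting
 * points, so only how each one moves over time is comparable. */
static void print_clocks(void)
{
    /* Wall clock, the time gettimeofday() reports: can be stepped. */
    printf("CLOCK_REALTIME      %lld ms\n", read_clock_ms(CLOCK_REALTIME));
    /* Cannot be set, but is slewed by NTP and adjtime(3). */
    printf("CLOCK_MONOTONIC     %lld ms\n", read_clock_ms(CLOCK_MONOTONIC));
#ifdef CLOCK_MONOTONIC_RAW
    /* Linux-specific: neither stepped nor slewed. */
    printf("CLOCK_MONOTONIC_RAW %lld ms\n", read_clock_ms(CLOCK_MONOTONIC_RAW));
#endif
}
```

One design consideration: CLOCK_MONOTONIC_RAW is not corrected for the hardware oscillator's frequency error, so long intervals measured with it can drift from real seconds, and as far as I know timerfd_create(2) does not accept it. CLOCK_MONOTONIC, which is never stepped, is usually the more practical choice for timeouts.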


[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Ian Kent
On Mon, 2016-10-10 at 10:42 +0200, Jakub Hrozek wrote:
> On Mon, Oct 10, 2016 at 10:04:30AM +0200, Victor Tapia wrote:
> > Hi list,
> > 
> > I've faced a race condition when SSSD boots in a machine with a big
> > clock drift. This is what I see:
> > 
> > 1. SSSD starts before the network is up, queries the LDAP server without
> > success and sets a retry timer (~60 secs)
> > 2. NTP starts and corrects the clock, 1 hour back for example.
> > 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> > connection.
> > 
> > In this particular scenario the credentials cache is disabled, so the
> > wait time to login is noticeable. How feasible would it be to use a
> > monotonic clock for this kind of timed events?
> 
> I really have not tried this and I guess I don't know tevent internals
> well enough if this works, but I wonder if just using:
> clock_gettime()

With a CLOCK_MONOTONIC I presume?
I think that's what has been suggested.

> and constructing struct timeval
> in place of:
> tevent_timeval_current_ofs()
> could solve this particular issue.
> 
> On the other hand, this is a pattern we use in SSSD all through the code
> for timed events and we're just not well equipped to handle time drifts.
> Did you investigate why sssd doesn't detect the networking change from
> libnl messages or from resolv.conf being touched?

I recently had a patch series contributed to autofs that changed it to use
CLOCK_MONOTONIC for the same type of problem.

However, the man page says of CLOCK_MONOTONIC:

"Clock that cannot be set and represents monotonic time since some
unspecified starting point. This clock is not affected by discontinuous
jumps in the system time (e.g., if the system administrator manually
changes the clock), but is affected by the incremental adjustments
performed by adjtime(3) and NTP."

So perhaps this won't actually help when NTP adjusts the clock (although NTP
should perhaps adjust the clock slowly enough). I didn't catch that when I
reviewed the autofs patch series, mmm.

Ian
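How much slewing can matter is easy to bound. A minimal worked sketch, assuming the kernel slews CLOCK_MONOTONIC by at most 500 ppm (the limit commonly cited for Linux adjtime(3)/NTP corrections; treat it as an assumption here): a 60-second retry timer could then be off by at most 60 s x 500e-6 = 0.03 s. A one-hour step of the wall clock, on the other hand, is a discontinuous jump that CLOCK_MONOTONIC does not see at all, per the man page text quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: worst-case error, in microseconds, that slewing can
 * add to an interval of interval_usec measured with CLOCK_MONOTONIC,
 * where max_ppm is an ASSUMED upper bound on the slew rate. */
static int64_t slew_error_bound_usec(int64_t interval_usec, int64_t max_ppm)
{
    assert(interval_usec >= 0 && max_ppm >= 0);
    return interval_usec * max_ppm / 1000000;
}
```

For example, slew_error_bound_usec(60 * 1000000LL, 500) evaluates to 30000 microseconds (30 ms), which supports the intuition that slewing is harmless for retry timers of this size.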


[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Jakub Hrozek
On Mon, Oct 10, 2016 at 10:04:30AM +0200, Victor Tapia wrote:
> Hi list,
> 
> I've faced a race condition when SSSD boots in a machine with a big
> clock drift. This is what I see:
> 
> 1. SSSD starts before the network is up, queries the LDAP server without
> success and sets a retry timer (~60 secs)
> 2. NTP starts and corrects the clock, 1 hour back for example.
> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> connection.
> 
> In this particular scenario the credentials cache is disabled, so the
> wait time to login is noticeable. How feasible would it be to use a
> monotonic clock for this kind of timed events?

I really have not tried this and I guess I don't know tevent internals
well enough if this works, but I wonder if just using:
clock_gettime()
and constructing struct timeval
in place of:
tevent_timeval_current_ofs()
could solve this particular issue.

On the other hand, this is a pattern we use in SSSD all through the code
for timed events and we're just not well equipped to handle time drifts.
Did you investigate why sssd doesn't detect the networking change from
libnl messages or from resolv.conf being touched?


[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Victor Tapia
Hi Fabiano and list,

Please forgive me, I just realized that I wasn't, indeed, running from
master. I'll try again with the proper version and come back if needed.

Thanks!

Victor


El 10/10/16 a las 10:09, Fabiano Fidêncio escribió:
> Victor,
> 
> On Mon, Oct 10, 2016 at 10:04 AM, Victor Tapia
>  wrote:
>> Hi list,
>>
>> I've faced a race condition when SSSD boots in a machine with a big
>> clock drift. This is what I see:
>>
>> 1. SSSD starts before the network is up, queries the LDAP server without
>> success and sets a retry timer (~60 secs)
>> 2. NTP starts and corrects the clock, 1 hour back for example.
>> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
>> connection.
>>
>> In this particular scenario the credentials cache is disabled, so the
>> wait time to login is noticeable. How feasible would it be to use a
>> monotonic clock for this kind of timed events?
> 
> Are you running git master? This issue is supposed to be already
> solved by 
> https://github.com/SSSD/sssd/commit/b8ceaeb80cffb00c26390913ea959b77f7e848b9
> 
> Best Regards,
> 


[SSSD] Re: Monotonic clock for timed events

2016-10-10 Thread Fabiano Fidêncio
Victor,

On Mon, Oct 10, 2016 at 10:04 AM, Victor Tapia
 wrote:
> Hi list,
>
> I've faced a race condition when SSSD boots in a machine with a big
> clock drift. This is what I see:
>
> 1. SSSD starts before the network is up, queries the LDAP server without
> success and sets a retry timer (~60 secs)
> 2. NTP starts and corrects the clock, 1 hour back for example.
> 3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the
> connection.
>
> In this particular scenario the credentials cache is disabled, so the
> wait time to login is noticeable. How feasible would it be to use a
> monotonic clock for this kind of timed events?

Are you running git master? This issue is supposed to be already
solved by 
https://github.com/SSSD/sssd/commit/b8ceaeb80cffb00c26390913ea959b77f7e848b9

Best Regards,
-- 
Fabiano Fidêncio