On (04/12/15 16:02), Michal Židek wrote:
>On 12/04/2015 03:07 PM, Lukas Slebodnik wrote:
>>On (04/12/15 14:35), Michal Židek wrote:
>>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote:
>>>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote:
>>>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote:
>>>>>>On (04/12/15 12:11), Jakub Hrozek wrote:
>>>>>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote:
>>>>>>>>On (03/12/15 20:22), Jakub Hrozek wrote:
>>>>>>>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote:
>>>>>>>>>>On (02/12/15 17:10), Michal Židek wrote:
>>>>>>>>>>>Hi!
>>>>>>>>>>>
>>>>>>>>>>>I saw some integration tests failures recently,
>>>>>>>>>>>and I think there is a race condition between the
>>>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>>>after some operations that wait for this timeout.
>>>>>>>>>>>SSSD fails to populate changes from LDAP in time
>>>>>>>>>>>and some asserts can fail because of this.
>>>>>>>>>>>
>>>>>>>>>>>So far I saw 4 tests to fail like this, which
>>>>>>>>>>>is already quite a lot.
>>>>>>>>>>>
>>>>>>>>>>>The attached patch modifies the timeout values
>>>>>>>>>>>and hopefully removes the issue.
>>>>>>>>>>>
>>>>>>>>>>>Michal
>>>>>>>>>>
>>>>>>>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 
>>>>>>>>>>>2001
>>>>>>>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= <mzi...@redhat.com>
>>>>>>>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100
>>>>>>>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts
>>>>>>>>>>>
>>>>>>>>>>>There is a race condation between ldap
>>>>>>>>>>>enumeration refresh timeout and the sleeps
>>>>>>>>>>>that wait for the ldap changes to populate
>>>>>>>>>>>to SSSD if the timeout and the sleeps have
>>>>>>>>>>>the same value.
>>>>>>>>>>>---
>>>>>>>>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++-------------
>>>>>>>>>>>1 file changed, 17 insertions(+), 13 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>>diff --git a/src/tests/intg/ldap_test.py 
>>>>>>>>>>>b/src/tests/intg/ldap_test.py
>>>>>>>>>>>index 757ee20..8ec8dbe 100644
>>>>>>>>>>>--- a/src/tests/intg/ldap_test.py
>>>>>>>>>>>+++ b/src/tests/intg/ldap_test.py
>>>>>>>>>>>@@ -33,7 +33,11 @@ import ldap_ent
>>>>>>>>>>>from util import *
>>>>>>>>>>>
>>>>>>>>>>>LDAP_BASE_DN = "dc=example,dc=com"
>>>>>>>>>>>-INTERACTIVE_TIMEOUT = 4
>>>>>>>>>>>+INTERACTIVE_TIMEOUT = 2
>>>>>>>>>>>+
>>>>>>>>>>>+
>>>>>>>>>>>+def wait_for_ldap_enum_refresh():
>>>>>>>>>>>+    time.sleep(INTERACTIVE_TIMEOUT + 4)
>>>>>>>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4
>>>>>>>>>>
>>>>>>>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Regardless of the value we choose, can we move this patch forward? I 
>>>>>>>>>see
>>>>>>>>>the related failure quite often in SSSD.
>>>>>>>>Adding timeout without real explanation is not a solution.
>>>>>>>>
>>>>>>>>The main problem is with empiric values.
>>>>>>>>If they are very high then test are slow.
>>>>>>>>And there still can be slow/fast machine where lower values caused 
>>>>>>>>troubles.
>>>>>>>>
>>>>>>>>The ideal solution would be to get rid of enumeration
>>>>>>>>in ldap tests.
>>>>>>>
>>>>>>>Enumeration is a codepath that is different from non-enumeration, so it
>>>>>>>should be tested. Not as priority, not as the only ldap tests, but it's
>>>>>>>a valid case, so it should be there.
>>>>>>>
>>>>>>>>If we want to test enumeration than it should be in separate
>>>>>>>>test.
>>>>>>>
>>>>>>>Maybe, but we do have enumeration tests and we should fix them.
>>>>>>>
>>>>>>Adding sleep is not a fix. It's just a workaround
>>>>>>because all sleep timeout are just an empiric values.
>>>>>>and we should fix test and not adding workaround/hacks.
>>>>>>
>>>>>>If we cannot fix the test and don't want te rewrite test without 
>>>>>>enumeration
>>>>>>then we should remove/revert problematic tests.
>>>>>>
>>>>>>LS
>>>>>
>>>>>I will send a new patch with an explanation (sort of),
>>>>>but it still will be a guess. I am not sure what the
>>>>>real safe value should be, only that the sleep's
>>>>>after operation should be longer than the ldap
>>>>>refresh and enum cache timeouts (and that the
>>>>>current values do not cope well wit the CI load).
>>>>
>>>>Would it be more acceptable then to define the ldap refresh and enum
>>>>cache timeouts as variables in the test and sleep for (enum_timeout +
>>>>cache_timeout + 1) ?
>>>>
>>>>At least that would be more readable than a magic constant..
>>>
>>>Will do. All will be derived from INTERACTIVE_TIMEOUT
>>>so that we need to change just one value if needed in the
>>>future.
>>>
>>Will it be reliable?
>>
>>Will it work on slow(arm) machines?
>>
>>I plan to run integration test in "%check" phase
>>of rpm build. And koji/fedora has rpm machines.
>
>We can mark tests that may fail on slow machines
>due to timeouts as "unsafe" and skip them in the
>rpm build.
It's not about rpmbuild, it's a general problem.
And marking tests as unsafe does not solve anything.
We need a reliable way how to find out that enumeratin task is finished.

Even grepping log files in loop is better than using
sleep

LS
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

Reply via email to