On (04/12/15 16:02), Michal Židek wrote: >On 12/04/2015 03:07 PM, Lukas Slebodnik wrote: >>On (04/12/15 14:35), Michal Židek wrote: >>>On 12/04/2015 02:32 PM, Jakub Hrozek wrote: >>>>On Fri, Dec 04, 2015 at 02:29:49PM +0100, Michal Židek wrote: >>>>>On 12/04/2015 12:29 PM, Lukas Slebodnik wrote: >>>>>>On (04/12/15 12:11), Jakub Hrozek wrote: >>>>>>>On Fri, Dec 04, 2015 at 11:42:00AM +0100, Lukas Slebodnik wrote: >>>>>>>>On (03/12/15 20:22), Jakub Hrozek wrote: >>>>>>>>>On Wed, Dec 02, 2015 at 05:59:07PM +0100, Lukas Slebodnik wrote: >>>>>>>>>>On (02/12/15 17:10), Michal Židek wrote: >>>>>>>>>>>Hi! >>>>>>>>>>> >>>>>>>>>>>I saw some integration tests failures recently, >>>>>>>>>>>and I think there is a race condition between the >>>>>>>>>>>enumeration refresh timeout and the sleeps >>>>>>>>>>>after some operations that wait for this timeout. >>>>>>>>>>>SSSD fails to populate changes from LDAP in time >>>>>>>>>>>and some asserts can fail because of this. >>>>>>>>>>> >>>>>>>>>>>So far I saw 4 tests to fail like this, which >>>>>>>>>>>is already quite a lot. >>>>>>>>>>> >>>>>>>>>>>The attached patch modifies the timeout values >>>>>>>>>>>and hopefully removes the issue. >>>>>>>>>>> >>>>>>>>>>>Michal >>>>>>>>>> >>>>>>>>>>>From b724db15ce0c1593cfdd7b4da8e0c39e97942e8c Mon Sep 17 00:00:00 >>>>>>>>>>>2001 >>>>>>>>>>>From: =?UTF-8?q?Michal=20=C5=BDidek?= <mzi...@redhat.com> >>>>>>>>>>>Date: Wed, 2 Dec 2015 16:44:48 +0100 >>>>>>>>>>>Subject: [PATCH] ldap_test.py: Modify enum cache timeouts >>>>>>>>>>> >>>>>>>>>>>There is a race condation between ldap >>>>>>>>>>>enumeration refresh timeout and the sleeps >>>>>>>>>>>that wait for the ldap changes to populate >>>>>>>>>>>to SSSD if the timeout and the sleeps have >>>>>>>>>>>the same value. >>>>>>>>>>>--- >>>>>>>>>>>src/tests/intg/ldap_test.py | 30 +++++++++++++++++------------- >>>>>>>>>>>1 file changed, 17 insertions(+), 13 deletions(-) >>>>>>>>>>> >>>>>>>>>>>diff --git a/src/tests/intg/ldap_test.py >>>>>>>>>>>b/src/tests/intg/ldap_test.py >>>>>>>>>>>index 757ee20..8ec8dbe 100644 >>>>>>>>>>>--- a/src/tests/intg/ldap_test.py >>>>>>>>>>>+++ b/src/tests/intg/ldap_test.py >>>>>>>>>>>@@ -33,7 +33,11 @@ import ldap_ent >>>>>>>>>>>from util import * >>>>>>>>>>> >>>>>>>>>>>LDAP_BASE_DN = "dc=example,dc=com" >>>>>>>>>>>-INTERACTIVE_TIMEOUT = 4 >>>>>>>>>>>+INTERACTIVE_TIMEOUT = 2 >>>>>>>>>>>+ >>>>>>>>>>>+ >>>>>>>>>>>+def wait_for_ldap_enum_refresh(): >>>>>>>>>>>+ time.sleep(INTERACTIVE_TIMEOUT + 4) >>>>>>>>>>Why does it need to be INTERACTIVE_TIMEOUT + 4 >>>>>>>>>> >>>>>>>>>>Could it be INTERACTIVE_TIMEOUT + 3 or + 5 >>>>>>>>>> >>>>>>>>> >>>>>>>>>Regardless of the value we choose, can we move this patch forward? I >>>>>>>>>see >>>>>>>>>the related failure quite often in SSSD. >>>>>>>>Adding timeout without real explanation is not a solution. >>>>>>>> >>>>>>>>The main problem is with empiric values. >>>>>>>>If they are very high then test are slow. >>>>>>>>And there still can be slow/fast machine where lower values caused >>>>>>>>troubles. >>>>>>>> >>>>>>>>The ideal solution would be to get rid of enumeration >>>>>>>>in ldap tests. >>>>>>> >>>>>>>Enumeration is a codepath that is different from non-enumeration, so it >>>>>>>should be tested. Not as priority, not as the only ldap tests, but it's >>>>>>>a valid case, so it should be there. >>>>>>> >>>>>>>>If we want to test enumeration than it should be in separate >>>>>>>>test. >>>>>>> >>>>>>>Maybe, but we do have enumeration tests and we should fix them. >>>>>>> >>>>>>Adding sleep is not a fix. It's just a workaround >>>>>>because all sleep timeout are just an empiric values. >>>>>>and we should fix test and not adding workaround/hacks. >>>>>> >>>>>>If we cannot fix the test and don't want te rewrite test without >>>>>>enumeration >>>>>>then we should remove/revert problematic tests. >>>>>> >>>>>>LS >>>>> >>>>>I will send a new patch with an explanation (sort of), >>>>>but it still will be a guess. I am not sure what the >>>>>real safe value should be, only that the sleep's >>>>>after operation should be longer than the ldap >>>>>refresh and enum cache timeouts (and that the >>>>>current values do not cope well wit the CI load). >>>> >>>>Would it be more acceptable then to define the ldap refresh and enum >>>>cache timeouts as variables in the test and sleep for (enum_timeout + >>>>cache_timeout + 1) ? >>>> >>>>At least that would be more readable than a magic constant.. >>> >>>Will do. All will be derived from INTERACTIVE_TIMEOUT >>>so that we need to change just one value if needed in the >>>future. >>> >>Will it be reliable? >> >>Will it work on slow(arm) machines? >> >>I plan to run integration test in "%check" phase >>of rpm build. And koji/fedora has rpm machines. > >We can mark tests that may fail on slow machines >due to timeouts as "unsafe" and skip them in the >rpm build. It's not about rpmbuild, it's a general problem. And marking tests as unsafe does not solve anything. We need a reliable way how to find out that enumeratin task is finished.
Even grepping log files in loop is better than using sleep LS _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org