testTwoSameGenerator - intermittent CI failure

Sven Van Caekenberghe Sun, 11 Dec 2016 05:28:27 -0800

> On 11 Dec 2016, at 12:49, Henrik Nergaard <[email protected]> wrote:
> 
>> The timing problem might indeed be resolution dependent: 100us is probably 
>> to short sometimes. 
> Time microsecondClockValue gives the time in microseconds, but it does not 
> guarantee microsecond resolution, on windows the resolution is still in 
> milliseconds. 
> 
> ----------
> | col |
> 
> col := Array new: 30000 streamContents: [ :stream |
>       1 to: 30000 do: [ :x | stream nextPut: Time microsecondClockValue ]
> ].
> 
> col asSet  "a Set(3658909536504622 3658909536503622 3658909536502622)"'
> ---------


Cool test, very sad result ;-)

> Best regards,
> Henrik
> 
> -----Original Message-----
> From: Pharo-dev [mailto:[email protected]] On Behalf Of Sven 
> Van Caekenberghe
> Sent: Sunday, December 11, 2016 11:56 AM
> To: Pharo Development List <[email protected]>
> Subject: Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - 
> intermittent CI failure
> 
> Hi Ben,
> 
> Thanks for looking into this, it was bothering me for quite some time ;-)
> 
> https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure
> 
> Yes, I think the counter roll-over is the problem, stupid of me to to have 
> seen this.
> 
> The timing problem might indeed be resolution dependent: 100us is probably to 
> short sometimes.
> 
> I will provide a slice with fixes.
> 
> Thanks again !
> 
> Regards,
> 
> Sven
> 
>> On 11 Dec 2016, at 09:07, Ben Coman <[email protected]> wrote:
>> 
>> For Case [1] I got an unrelated CI failure [2] for
>> NeoUUIDGeneratorTests>>#testTwoSameGenerator
>> 
>> Checking this locally by running this a dozen times...
>> 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp 
>> testTwoSameGenerator ]
>> 
>> Two distinct failures are apparent...
>> 
>> 1. Takes longer than expected.
>>    self assert: (time2 - time1) abs < 100
>> last=1 ==> a time difference of 109
>> last=1 ==> a time difference of 102
>> last=1 ==> a time difference of 155
>> last=1 ==> a time difference of 105
>> Interesting that whenever this happens its on the first iteration.
>> 
>> 2. Invalid consecutive ids. 
>>    self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10).
>> last=127 ==> TestFailure: Got 256 instead of 0.
>> last=828 ==> TestFailure: Got 256 instead of 0.
>> last=128 ==> TestFailure: Got 256 instead of 0.
>> last=316 ==> TestFailure: Got 256 instead of 0.
>> Maybe a problem with rollover?
>> 
>> So instrumenting a bit:
>>  testTwoSameGenerator: n
>>    | uuid1 uuid2 consecutiveOk time1 time2 timeOk |
>>    ....
>>    "same generator, consequetive counters"
>>    consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10).
>>    "same generators, similar clock, within 100 milliseconds"
>>    time1 := self timeFromUUID: uuid1.
>>    time2 := self timeFromUUID: uuid2.
>>    timeOk := (time2 - time1) abs < 100.
>>    ( timeOk and: [ consecutiveOk ]) ifFalse: [
>>        Transcript cr;
>>           show: 'fail n=', (n printPaddedWith: $0 to: 3)  ; tab;
>>           show: ' time ', ((time2 - time1) printString) , '<100'; tab;
>>           show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 
>> 10) printString).
>>    ].
>>    self assert: consecutiveOk.
>>    self assert: timeOk.
>> 
>> From playground running...
>> 1 to: 1000 do: [:n|
>>    NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n].
>> Transcript crShow: 'x'.
>> 
>> Note the 'x' was not output due to one of the consecutiveOk or timeOk 
>> asserts triggered each run.
>> fail n=001    time 104<100    uuid 160=160
>> fail n=269    time 2<100      uuid 256=0
>> fail n=001    time 152<100    uuid 166=166
>> fail n=117    time 3<100      uuid 256=0
>> fail n=001    time 123<100    uuid 128=128
>> fail n=278    time 3<100      uuid 256=0
>> fail n=001    time 145<100    uuid 230=230
>> fail n=673    time 1340<100   uuid 141=141
>> fail n=707    time 3<100      uuid 256=0
>> fail n=001    time 132<100    uuid 73=73
>> fail n=001    time 117<100    uuid 206=206
>> fail n=063    time 2<100      uuid 256=0
>> fail n=345    time 3<100      uuid 256=0
>> 
>> Now a strange thing is that after commenting out those two asserts for 
>> consecutiveOk and timeOk, the fails at n=001 stop happening.   
>> 
>> fail n=001    time 141<100    uuid 232=232
>> x
>> x
>> fail n=201    time 3<100      uuid 256=0
>> fail n=268    time 3<100      uuid 256=0
>> fail n=359    time 3<100      uuid 256=0
>> fail n=408    time 2<100      uuid 256=0
>> fail n=559    time 3<100      uuid 256=0
>> x
>> fail n=139    time 3<100      uuid 256=0
>> fail n=854    time 7<100      uuid 256=0
>> x
>> fail n=389    time 3<100      uuid 256=0
>> fail n=669    time 3<100      uuid 256=0
>> fail n=876    time 3<100      uuid 256=0
>> fail n=916    time 3<100      uuid 256=0
>> fail n=953    time 2<100      uuid 256=0
>> fail n=954    time 3<100      uuid 256=0
>> x
>> fail n=148    time 3<100      uuid 256=0
>> fail n=741    time 3<100      uuid 256=0
>> x
>> fail n=116    time 2<100      uuid 256=0
>> fail n=480    time 3<100      uuid 256=0
>> x
>> 
>> After treating the rollover like this... 
>>    consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10).
>> running the last playground script gives...
>> 
>> (Note the n=001 fail reoccurs after waiting a while between running the 
>> playground script again.)
>> fail n=001    time 123<100    uuid 110=110
>> x
>> fail n=345    time 1287<100   uuid 24=24
>> fail n=404    time 1313<100   uuid 125=125
>> x
>> x
>> x
>> x
>> x
>> fail n=449    time 1290<100   uuid 24=24
>> x
>> x
>> x
>> x
>> x
>> 
>> 
>> Thats looking better, so upping to 10000 iterations...
>> fail n=0001   time 151<100          uuid 160=160
>> fail n=2977   time 1437<100   uuid 221=221
>> fail n=5498   time 1454<100   uuid 224=224
>> fail n=9007   time 1351<100   uuid 239=239
>> x
>> fail n=3389   time 1381<100   uuid 223=223
>> fail n=7768   time 1365<100   uuid 82=82
>> fail n=9773   time 1395<100   uuid 164=164
>> x
>> fail n=3325   time 1448<100   uuid 67=67
>> fail n=9930   time 1456<100   uuid 103=103
>> x
>> fail n=0060   time 2914<100   uuid 159=159
>> fail n=1829   time 1438<100   uuid 32=32
>> fail n=7918   time 1436<100   uuid 182=182
>> x
>> 
>> So the rollover looks cured.
>> 
>> Now adding some more instrumentation to view GC Time and Time To Run 
>> and  (see attached ST file)
>> 
>> 0:00:00:00   TTR 0:00:00:00.001      TTR 0:00:00:00  TTR 0:00:00:00  TTR 
>> 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  
>> TTR 0:00:00:00
>> x
>> fail n=0001   time 115<100    uuid 157=157    GCTime 0 TTR 0:00:00:00
>> fail n=0001   time 2<100      uuid 255=0      GCTime 0 TTR 0:00:00:00
>> x
>> fail n=0001   time 111<100    uuid 104=104    GCTime 0 TTR 0
>> x
>> fail n=0001   time 120<100    uuid 204=204    GCTime 0 TTR 0
>> fail n=0001   time 1182<100   uuid 103=103    GCTime 1 TTR 1
>> x
>> x
>> x
>> x
>> x
>> fail n=0001   time 3<100      uuid 255=0      GCTime 0 TTR 0
>> x
>> x
>> fail n=0001   time 1251<100   uuid 253=253    GCTime 2 TTR 1
>> x
>> 
>> 
>> Now I wonder at the class comment saying...
>>  "the first 8 bytes are the *millisecond* clock value" 
>> but #clockOn: is using *microsecond* clock.
>> Perhaps "self assert: (time2 - time1) abs < 100"
>> is meant to be 100ms not 100us ??
>> 
>> cheers -ben
>> 
>> [1] https://pharo.fogbugz.com/f/cases/19438
>> [2] 
>> https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifac
>> t/validationReport.html 
>> <NeoUUIDGeneratorTests-testTwoSameGenerator.st>
> 
> 
>

Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure

Reply via email to