testTwoSameGenerator - intermittent CI failure

Henrik Nergaard Sun, 11 Dec 2016 03:51:04 -0800

>The timing problem might indeed be resolution dependent: 100us is probably to 
>short sometimes. 
Time microsecondClockValue gives the time in microseconds, but it does not 
guarantee microsecond resolution, on windows the resolution is still in 
milliseconds.


----------
| col |

col := Array new: 30000 streamContents: [ :stream |
        1 to: 30000 do: [ :x | stream nextPut: Time microsecondClockValue ]
].

col asSet  "a Set(3658909536504622 3658909536503622 3658909536502622)"'
---------

Best regards,
Henrik

-----Original Message-----
From: Pharo-dev [mailto:[email protected]] On Behalf Of Sven 
Van Caekenberghe
Sent: Sunday, December 11, 2016 11:56 AM
To: Pharo Development List <[email protected]>
Subject: Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent 
CI failure

Hi Ben,

Thanks for looking into this, it was bothering me for quite some time ;-)

https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure

Yes, I think the counter roll-over is the problem, stupid of me to to have seen 
this.

The timing problem might indeed be resolution dependent: 100us is probably to 
short sometimes.

I will provide a slice with fixes.

Thanks again !

Regards,

Sven

> On 11 Dec 2016, at 09:07, Ben Coman <[email protected]> wrote:
> 
> For Case [1] I got an unrelated CI failure [2] for
> NeoUUIDGeneratorTests>>#testTwoSameGenerator
> 
> Checking this locally by running this a dozen times...
> 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp 
> testTwoSameGenerator ]
> 
> Two distinct failures are apparent...
> 
> 1. Takes longer than expected.
>     self assert: (time2 - time1) abs < 100
> last=1 ==> a time difference of 109
> last=1 ==> a time difference of 102
> last=1 ==> a time difference of 155
> last=1 ==> a time difference of 105
> Interesting that whenever this happens its on the first iteration.
> 
> 2. Invalid consecutive ids. 
>     self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10).
> last=127 ==> TestFailure: Got 256 instead of 0.
> last=828 ==> TestFailure: Got 256 instead of 0.
> last=128 ==> TestFailure: Got 256 instead of 0.
> last=316 ==> TestFailure: Got 256 instead of 0.
> Maybe a problem with rollover?
> 
> So instrumenting a bit:
>   testTwoSameGenerator: n
>     | uuid1 uuid2 consecutiveOk time1 time2 timeOk |
>     ....
>     "same generator, consequetive counters"
>     consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10).
>     "same generators, similar clock, within 100 milliseconds"
>     time1 := self timeFromUUID: uuid1.
>     time2 := self timeFromUUID: uuid2.
>     timeOk := (time2 - time1) abs < 100.
>     ( timeOk and: [ consecutiveOk ]) ifFalse: [
>         Transcript cr;
>            show: 'fail n=', (n printPaddedWith: $0 to: 3)  ; tab;
>            show: ' time ', ((time2 - time1) printString) , '<100'; tab;
>            show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: 
> 10) printString).
>     ].
>     self assert: consecutiveOk.
>     self assert: timeOk.
> 
> From playground running...
> 1 to: 1000 do: [:n|
>     NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n].
> Transcript crShow: 'x'.
> 
> Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts 
> triggered each run.
> fail n=001     time 104<100    uuid 160=160
> fail n=269     time 2<100      uuid 256=0
> fail n=001     time 152<100    uuid 166=166
> fail n=117     time 3<100      uuid 256=0
> fail n=001     time 123<100    uuid 128=128
> fail n=278     time 3<100      uuid 256=0
> fail n=001     time 145<100    uuid 230=230
> fail n=673     time 1340<100   uuid 141=141
> fail n=707     time 3<100      uuid 256=0
> fail n=001     time 132<100    uuid 73=73
> fail n=001     time 117<100    uuid 206=206
> fail n=063     time 2<100      uuid 256=0
> fail n=345     time 3<100      uuid 256=0
> 
> Now a strange thing is that after commenting out those two asserts for 
> consecutiveOk and timeOk, the fails at n=001 stop happening.   
> 
> fail n=001     time 141<100    uuid 232=232
> x
> x
> fail n=201     time 3<100      uuid 256=0
> fail n=268     time 3<100      uuid 256=0
> fail n=359     time 3<100      uuid 256=0
> fail n=408     time 2<100      uuid 256=0
> fail n=559     time 3<100      uuid 256=0
> x
> fail n=139     time 3<100      uuid 256=0
> fail n=854     time 7<100      uuid 256=0
> x
> fail n=389     time 3<100      uuid 256=0
> fail n=669     time 3<100      uuid 256=0
> fail n=876     time 3<100      uuid 256=0
> fail n=916     time 3<100      uuid 256=0
> fail n=953     time 2<100      uuid 256=0
> fail n=954     time 3<100      uuid 256=0
> x
> fail n=148     time 3<100      uuid 256=0
> fail n=741     time 3<100      uuid 256=0
> x
> fail n=116     time 2<100      uuid 256=0
> fail n=480     time 3<100      uuid 256=0
> x
> 
> After treating the rollover like this... 
>     consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10).
> running the last playground script gives...
> 
> (Note the n=001 fail reoccurs after waiting a while between running the 
> playground script again.)
> fail n=001     time 123<100    uuid 110=110
> x
> fail n=345     time 1287<100   uuid 24=24
> fail n=404     time 1313<100   uuid 125=125
> x
> x
> x
> x
> x
> fail n=449     time 1290<100   uuid 24=24
> x
> x
> x
> x
> x
> 
> 
> Thats looking better, so upping to 10000 iterations...
> fail n=0001    time 151<100          uuid 160=160
> fail n=2977    time 1437<100   uuid 221=221
> fail n=5498    time 1454<100   uuid 224=224
> fail n=9007    time 1351<100   uuid 239=239
> x
> fail n=3389    time 1381<100   uuid 223=223
> fail n=7768    time 1365<100   uuid 82=82
> fail n=9773    time 1395<100   uuid 164=164
> x
> fail n=3325    time 1448<100   uuid 67=67
> fail n=9930    time 1456<100   uuid 103=103
> x
> fail n=0060    time 2914<100   uuid 159=159
> fail n=1829    time 1438<100   uuid 32=32
> fail n=7918    time 1436<100   uuid 182=182
> x
> 
> So the rollover looks cured.
> 
> Now adding some more instrumentation to view GC Time and Time To Run 
> and  (see attached ST file)
> 
> 0:00:00:00    TTR 0:00:00:00.001      TTR 0:00:00:00  TTR 0:00:00:00  TTR 
> 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  
> TTR 0:00:00:00
> x
> fail n=0001    time 115<100    uuid 157=157    GCTime 0 TTR 0:00:00:00
> fail n=0001    time 2<100      uuid 255=0      GCTime 0 TTR 0:00:00:00
> x
> fail n=0001    time 111<100    uuid 104=104    GCTime 0 TTR 0
> x
> fail n=0001    time 120<100    uuid 204=204    GCTime 0 TTR 0
> fail n=0001    time 1182<100   uuid 103=103    GCTime 1 TTR 1
> x
> x
> x
> x
> x
> fail n=0001    time 3<100      uuid 255=0      GCTime 0 TTR 0
> x
> x
> fail n=0001    time 1251<100   uuid 253=253    GCTime 2 TTR 1
> x
> 
> 
> Now I wonder at the class comment saying...
>   "the first 8 bytes are the *millisecond* clock value" 
> but #clockOn: is using *microsecond* clock.
> Perhaps "self assert: (time2 - time1) abs < 100"
> is meant to be 100ms not 100us ??
> 
> cheers -ben
> 
> [1] https://pharo.fogbugz.com/f/cases/19438
> [2] 
> https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifac
> t/validationReport.html 
> <NeoUUIDGeneratorTests-testTwoSameGenerator.st>

Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure

Reply via email to