testTwoSameGenerator - intermittent CI failure

Ben Coman Sun, 11 Dec 2016 07:52:32 -0800

On Sun, Dec 11, 2016 at 7:49 PM, Henrik Nergaard <[email protected]>
wrote:


> >The timing problem might indeed be resolution dependent: 100us is
> probably to short sometimes.
> Time microsecondClockValue gives the time in microseconds, but it does not
> guarantee microsecond resolution, on windows the resolution is still in
> milliseconds.
>

Useful to know. But regardless of the resolution, the units are still
microseconds.
So the majority of the time you'll assert: 3622 - 3622 < 100
which passes,
but other times you'll cross the boundary and you'll assert: ...4622 -
...3622 < 100,
which fails.

The important question is, what resolution of time is important for UUIDs?
If 10ms is sufficient, then ...4622 - ...3622 < 10000 easily passes on even
the slowest computer..



> ----------
> | col |
>
> col := Array new: 30000 streamContents: [ :stream |
>         1 to: 30000 do: [ :x | stream nextPut: Time microsecondClockValue ]
> ].
>
> col asSet  "a Set(3658909536504622 3658909536503622 3658909536502622)"'
>

btw I forgot to mention I'm on 32-bit Linux. So thankfully I get...
col asSet size "==>16135"

btw2, For a challenge I'm lining up to improve the VM clock "sometime",
maybe end of Feb (if it proves within my capability no one else gets to it
first).
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/36



>
> -----Original Message-----
> From: Pharo-dev [mailto:[email protected]] On Behalf Of
> Sven Van Caekenberghe
> Sent: Sunday, December 11, 2016 11:56 AM
> To: Pharo Development List <[email protected]>
> Subject: Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator -
> intermittent CI failure
>
> Hi Ben,
>
> Thanks for looking into this, it was bothering me for quite some time ;-)
>
> https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-
> testTwoSameGenerator-intermittent-CI-failure
>
> Yes, I think the counter roll-over is the problem, stupid of me to to have
> seen this.
>
> The timing problem might indeed be resolution dependent: 100us is probably
> to short sometimes.
>
> I will provide a slice with fixes.
>

Cool. Ping me from the slice.
cheers -ben


>
> Thanks again !
>
> Regards,
>
> Sven
>
> > On 11 Dec 2016, at 09:07, Ben Coman <[email protected]> wrote:
> >
> > For Case [1] I got an unrelated CI failure [2] for
> > NeoUUIDGeneratorTests>>#testTwoSameGenerator
> >
> > Checking this locally by running this a dozen times...
> > 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp
> > testTwoSameGenerator ]
> >
> > Two distinct failures are apparent...
> >
> > 1. Takes longer than expected.
> >     self assert: (time2 - time1) abs < 100
> > last=1 ==> a time difference of 109
> > last=1 ==> a time difference of 102
> > last=1 ==> a time difference of 155
> > last=1 ==> a time difference of 105
> > Interesting that whenever this happens its on the first iteration.
> >
> > 2. Invalid consecutive ids.
> >     self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10).
> > last=127 ==> TestFailure: Got 256 instead of 0.
> > last=828 ==> TestFailure: Got 256 instead of 0.
> > last=128 ==> TestFailure: Got 256 instead of 0.
> > last=316 ==> TestFailure: Got 256 instead of 0.
> > Maybe a problem with rollover?
> >
> > So instrumenting a bit:
> >   testTwoSameGenerator: n
> >     | uuid1 uuid2 consecutiveOk time1 time2 timeOk |
> >     ....
> >     "same generator, consequetive counters"
> >     consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10).
> >     "same generators, similar clock, within 100 milliseconds"
> >     time1 := self timeFromUUID: uuid1.
> >     time2 := self timeFromUUID: uuid2.
> >     timeOk := (time2 - time1) abs < 100.
> >     ( timeOk and: [ consecutiveOk ]) ifFalse: [
> >         Transcript cr;
> >            show: 'fail n=', (n printPaddedWith: $0 to: 3)  ; tab;
> >            show: ' time ', ((time2 - time1) printString) , '<100'; tab;
> >            show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=',
> (uuid2 at: 10) printString).
> >     ].
> >     self assert: consecutiveOk.
> >     self assert: timeOk.
> >
> > From playground running...
> > 1 to: 1000 do: [:n|
> >     NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n].
> > Transcript crShow: 'x'.
> >
> > Note the 'x' was not output due to one of the consecutiveOk or timeOk
> asserts triggered each run.
> > fail n=001     time 104<100    uuid 160=160
> > fail n=269     time 2<100      uuid 256=0
> > fail n=001     time 152<100    uuid 166=166
> > fail n=117     time 3<100      uuid 256=0
> > fail n=001     time 123<100    uuid 128=128
> > fail n=278     time 3<100      uuid 256=0
> > fail n=001     time 145<100    uuid 230=230
> > fail n=673     time 1340<100   uuid 141=141
> > fail n=707     time 3<100      uuid 256=0
> > fail n=001     time 132<100    uuid 73=73
> > fail n=001     time 117<100    uuid 206=206
> > fail n=063     time 2<100      uuid 256=0
> > fail n=345     time 3<100      uuid 256=0
> >
> > Now a strange thing is that after commenting out those two asserts for
> consecutiveOk and timeOk, the fails at n=001 stop happening.
> >
> > fail n=001     time 141<100    uuid 232=232
> > x
> > x
> > fail n=201     time 3<100      uuid 256=0
> > fail n=268     time 3<100      uuid 256=0
> > fail n=359     time 3<100      uuid 256=0
> > fail n=408     time 2<100      uuid 256=0
> > fail n=559     time 3<100      uuid 256=0
> > x
> > fail n=139     time 3<100      uuid 256=0
> > fail n=854     time 7<100      uuid 256=0
> > x
> > fail n=389     time 3<100      uuid 256=0
> > fail n=669     time 3<100      uuid 256=0
> > fail n=876     time 3<100      uuid 256=0
> > fail n=916     time 3<100      uuid 256=0
> > fail n=953     time 2<100      uuid 256=0
> > fail n=954     time 3<100      uuid 256=0
> > x
> > fail n=148     time 3<100      uuid 256=0
> > fail n=741     time 3<100      uuid 256=0
> > x
> > fail n=116     time 2<100      uuid 256=0
> > fail n=480     time 3<100      uuid 256=0
> > x
> >
> > After treating the rollover like this...
> >     consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10).
> > running the last playground script gives...
> >
> > (Note the n=001 fail reoccurs after waiting a while between running the
> playground script again.)
> > fail n=001     time 123<100    uuid 110=110
> > x
> > fail n=345     time 1287<100   uuid 24=24
> > fail n=404     time 1313<100   uuid 125=125
> > x
> > x
> > x
> > x
> > x
> > fail n=449     time 1290<100   uuid 24=24
> > x
> > x
> > x
> > x
> > x
> >
> >
> > Thats looking better, so upping to 10000 iterations...
> > fail n=0001    time 151<100          uuid 160=160
> > fail n=2977    time 1437<100   uuid 221=221
> > fail n=5498    time 1454<100   uuid 224=224
> > fail n=9007    time 1351<100   uuid 239=239
> > x
> > fail n=3389    time 1381<100   uuid 223=223
> > fail n=7768    time 1365<100   uuid 82=82
> > fail n=9773    time 1395<100   uuid 164=164
> > x
> > fail n=3325    time 1448<100   uuid 67=67
> > fail n=9930    time 1456<100   uuid 103=103
> > x
> > fail n=0060    time 2914<100   uuid 159=159
> > fail n=1829    time 1438<100   uuid 32=32
> > fail n=7918    time 1436<100   uuid 182=182
> > x
> >
> > So the rollover looks cured.
> >
> > Now adding some more instrumentation to view GC Time and Time To Run
> > and  (see attached ST file)
> >
> > 0:00:00:00    TTR 0:00:00:00.001      TTR 0:00:00:00  TTR 0:00:00:00
> TTR 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  TTR 0:00:00:00  TTR
> 0:00:00:00  TTR 0:00:00:00
> > x
> > fail n=0001    time 115<100    uuid 157=157    GCTime 0 TTR 0:00:00:00
> > fail n=0001    time 2<100      uuid 255=0      GCTime 0 TTR 0:00:00:00
> > x
> > fail n=0001    time 111<100    uuid 104=104    GCTime 0 TTR 0
> > x
> > fail n=0001    time 120<100    uuid 204=204    GCTime 0 TTR 0
> > fail n=0001    time 1182<100   uuid 103=103    GCTime 1 TTR 1
> > x
> > x
> > x
> > x
> > x
> > fail n=0001    time 3<100      uuid 255=0      GCTime 0 TTR 0
> > x
> > x
> > fail n=0001    time 1251<100   uuid 253=253    GCTime 2 TTR 1
> > x
> >
> >
> > Now I wonder at the class comment saying...
> >   "the first 8 bytes are the *millisecond* clock value"
> > but #clockOn: is using *microsecond* clock.
> > Perhaps "self assert: (time2 - time1) abs < 100"
> > is meant to be 100ms not 100us ??
> >
> > cheers -ben
> >
> > [1] https://pharo.fogbugz.com/f/cases/19438
> > [2]
> > https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifac
> > t/validationReport.html
> > <NeoUUIDGeneratorTests-testTwoSameGenerator.st>
>
>
>
>

Re: [Pharo-dev] NeoUUIDGenerator>>testTwoSameGenerator - intermittent CI failure

Reply via email to