Hi Ben, Thanks for looking into this, it was bothering me for quite some time ;-)
https://pharo.fogbugz.com/f/cases/19453/NeoUUIDGenerator-testTwoSameGenerator-intermittent-CI-failure Yes, I think the counter roll-over is the problem, stupid of me to to have seen this. The timing problem might indeed be resolution dependent: 100us is probably to short sometimes. I will provide a slice with fixes. Thanks again ! Regards, Sven > On 11 Dec 2016, at 09:07, Ben Coman <[email protected]> wrote: > > For Case [1] I got an unrelated CI failure [2] for > NeoUUIDGeneratorTests>>#testTwoSameGenerator > > Checking this locally by running this a dozen times... > 1 to: 1000 do: [:n| last:=n. NeoUUIDGeneratorTests new setUp > testTwoSameGenerator ] > > Two distinct failures are apparent... > > 1. Takes longer than expected. > self assert: (time2 - time1) abs < 100 > last=1 ==> a time difference of 109 > last=1 ==> a time difference of 102 > last=1 ==> a time difference of 155 > last=1 ==> a time difference of 105 > Interesting that whenever this happens its on the first iteration. > > 2. Invalid consecutive ids. > self assert: (uuid1 at: 10) + 1 equals: (uuid2 at: 10). > last=127 ==> TestFailure: Got 256 instead of 0. > last=828 ==> TestFailure: Got 256 instead of 0. > last=128 ==> TestFailure: Got 256 instead of 0. > last=316 ==> TestFailure: Got 256 instead of 0. > Maybe a problem with rollover? > > So instrumenting a bit: > testTwoSameGenerator: n > | uuid1 uuid2 consecutiveOk time1 time2 timeOk | > .... > "same generator, consequetive counters" > consecutiveOk := ((uuid1 at: 10) + 1) = (uuid2 at: 10). > "same generators, similar clock, within 100 milliseconds" > time1 := self timeFromUUID: uuid1. > time2 := self timeFromUUID: uuid2. > timeOk := (time2 - time1) abs < 100. > ( timeOk and: [ consecutiveOk ]) ifFalse: [ > Transcript cr; > show: 'fail n=', (n printPaddedWith: $0 to: 3) ; tab; > show: ' time ', ((time2 - time1) printString) , '<100'; tab; > show: ' uuid ', (((uuid1 at: 10) + 1) printString, '=', (uuid2 at: > 10) printString). > ]. > self assert: consecutiveOk. > self assert: timeOk. > > From playground running... > 1 to: 1000 do: [:n| > NeoUUIDGeneratorTests new setUp testTwoSameGenerator: n]. > Transcript crShow: 'x'. > > Note the 'x' was not output due to one of the consecutiveOk or timeOk asserts > triggered each run. > fail n=001 time 104<100 uuid 160=160 > fail n=269 time 2<100 uuid 256=0 > fail n=001 time 152<100 uuid 166=166 > fail n=117 time 3<100 uuid 256=0 > fail n=001 time 123<100 uuid 128=128 > fail n=278 time 3<100 uuid 256=0 > fail n=001 time 145<100 uuid 230=230 > fail n=673 time 1340<100 uuid 141=141 > fail n=707 time 3<100 uuid 256=0 > fail n=001 time 132<100 uuid 73=73 > fail n=001 time 117<100 uuid 206=206 > fail n=063 time 2<100 uuid 256=0 > fail n=345 time 3<100 uuid 256=0 > > Now a strange thing is that after commenting out those two asserts for > consecutiveOk and timeOk, the fails at n=001 stop happening. > > fail n=001 time 141<100 uuid 232=232 > x > x > fail n=201 time 3<100 uuid 256=0 > fail n=268 time 3<100 uuid 256=0 > fail n=359 time 3<100 uuid 256=0 > fail n=408 time 2<100 uuid 256=0 > fail n=559 time 3<100 uuid 256=0 > x > fail n=139 time 3<100 uuid 256=0 > fail n=854 time 7<100 uuid 256=0 > x > fail n=389 time 3<100 uuid 256=0 > fail n=669 time 3<100 uuid 256=0 > fail n=876 time 3<100 uuid 256=0 > fail n=916 time 3<100 uuid 256=0 > fail n=953 time 2<100 uuid 256=0 > fail n=954 time 3<100 uuid 256=0 > x > fail n=148 time 3<100 uuid 256=0 > fail n=741 time 3<100 uuid 256=0 > x > fail n=116 time 2<100 uuid 256=0 > fail n=480 time 3<100 uuid 256=0 > x > > After treating the rollover like this... > consecutiveOk := ((uuid1 at: 10) + 1) \\ 256 = (uuid2 at: 10). > running the last playground script gives... > > (Note the n=001 fail reoccurs after waiting a while between running the > playground script again.) > fail n=001 time 123<100 uuid 110=110 > x > fail n=345 time 1287<100 uuid 24=24 > fail n=404 time 1313<100 uuid 125=125 > x > x > x > x > x > fail n=449 time 1290<100 uuid 24=24 > x > x > x > x > x > > > Thats looking better, so upping to 10000 iterations... > fail n=0001 time 151<100 uuid 160=160 > fail n=2977 time 1437<100 uuid 221=221 > fail n=5498 time 1454<100 uuid 224=224 > fail n=9007 time 1351<100 uuid 239=239 > x > fail n=3389 time 1381<100 uuid 223=223 > fail n=7768 time 1365<100 uuid 82=82 > fail n=9773 time 1395<100 uuid 164=164 > x > fail n=3325 time 1448<100 uuid 67=67 > fail n=9930 time 1456<100 uuid 103=103 > x > fail n=0060 time 2914<100 uuid 159=159 > fail n=1829 time 1438<100 uuid 32=32 > fail n=7918 time 1436<100 uuid 182=182 > x > > So the rollover looks cured. > > Now adding some more instrumentation to view GC Time and Time To Run and > (see attached ST file) > > 0:00:00:00 TTR 0:00:00:00.001 TTR 0:00:00:00 TTR 0:00:00:00 TTR > 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 TTR 0:00:00:00 > TTR 0:00:00:00 > x > fail n=0001 time 115<100 uuid 157=157 GCTime 0 TTR 0:00:00:00 > fail n=0001 time 2<100 uuid 255=0 GCTime 0 TTR 0:00:00:00 > x > fail n=0001 time 111<100 uuid 104=104 GCTime 0 TTR 0 > x > fail n=0001 time 120<100 uuid 204=204 GCTime 0 TTR 0 > fail n=0001 time 1182<100 uuid 103=103 GCTime 1 TTR 1 > x > x > x > x > x > fail n=0001 time 3<100 uuid 255=0 GCTime 0 TTR 0 > x > x > fail n=0001 time 1251<100 uuid 253=253 GCTime 2 TTR 1 > x > > > Now I wonder at the class comment saying... > "the first 8 bytes are the *millisecond* clock value" > but #clockOn: is using *microsecond* clock. > Perhaps "self assert: (time2 - time1) abs < 100" > is meant to be 100ms not 100us ?? > > cheers -ben > > [1] https://pharo.fogbugz.com/f/cases/19438 > [2] > https://ci.inria.fr/pharo/job/Pharo-6.0-Issue-Validator/19471//artifact/validationReport.html > <NeoUUIDGeneratorTests-testTwoSameGenerator.st>
