On 6 May 2018 at 04:20, Eliot Miranda <[email protected]> wrote:
> Hi Ben,
>
> On May 5, 2018, at 7:25 AM, Ben Coman <[email protected]> wrote:
>
> On 5 May 2018 at 22:10, Ben Coman <[email protected]> wrote:
>>
>> One of the key parts of Delay scheduling is setting the resumption time.
>> There are two places this could be done.
>> a. In Delay>>#schedule, running (for example) at userSchedulingPriority=40
>> b. In DelayMicrosecondScheduler>>#handleTimerEvent:, running at
>>    timingPriority=80
>>
>> When we were using the millisecond clock for delay scheduling,
>> it made sense to use (b.) since the clock might(?) roll over
>> between when the resumption time was set and when #handleTimerEvent:
>> expires delays.
>>
>> This should not be a problem now that we are using the microsecond clock,
>> so I wondered what the latency penalty might be between (a.) and (b.).
>> I ran a little experiment that indicates the cost/latency of switching
>> threads, and was curious if anyone can comment on the validity of the
>> experiment and the interpretation of its results.
>>
>> I made the following three changes...
>>
>> DelayScheduler subclass: #DelayMicrosecondScheduler
>>     instanceVariableNames: 'latencyStart countSinceLastZero'
>>     classVariableNames: 'LatencyCounts'
>>     poolDictionaries: ''
>>     category: 'Kernel-Processes'
>>
>>
>> DelayMicrosecondScheduler>>#schedule: aDelay
>>     latencyStart := Time primUTCMicrosecondsClock.  "This is position (a.)"
>>     aDelay schedulerBeingWaitedOn
>>         ifTrue: [^self error: 'This Delay has already been scheduled.'].
>>     accessProtect critical: [
>>         scheduledDelay := aDelay.
>>         timingSemaphore signal.  "#handleTimerEvent: sets scheduledDelay := nil"
>>     ].
>>
>>
>> DelayMicrosecondScheduler>>#handleTimerEvent: microsecondNowTick
>>     | microsecondNextTick |
>>     "Process any schedule requests"  "This is position (b.)"
>>     scheduledDelay ifNotNil: [
>>         | latency |
>>         latency := Time primUTCMicrosecondsClock - latencyStart.
>>         LatencyCounts ifNil: [ LatencyCounts := Bag new ].
>>         LatencyCounts add: latency.
>>         latency = 0
>>             ifTrue: [ countSinceLastZero := 1 + (countSinceLastZero ifNil: [0]) ]
>>             ifFalse: [
>>                 Transcript
>>                     crShow: 'zero latency count ', countSinceLastZero printString;
>>                     show: ' before latency ', latency printString.
>>                 countSinceLastZero := 0 ].
>>         "Schedule the given delay."
>>         scheduledDelay
>>             scheduler: self
>>             resumptionTime: microsecondNowTick + (1000 * scheduledDelay millisecondDelayDuration).
>>         self scheduleDelay: scheduledDelay.
>>         scheduledDelay := nil ].
>>     <snip> rest of method unchanged
>>
>>
>> Then I opened the Transcript and in a Playground evaluated...
>>     Delay delaySchedulerClass: DelayMicrosecondScheduler.
>>
>>
>> The Transcript results are shown below with some comments inserted.
>>
>> Now I guess the latency is affected by garbage collection.
>> But one thing I was curious about is why the latencies were quantised in 1000s.
>>
>> Another interesting thing is that the vast majority of the latencies were zero,
>> which was a nice surprise, but can it be true? Or is it a consequence of
>> the quantisation rounding down?
>>
>> It seems that the idleness of the image affected how often a non-zero
>> latency occurred. After I left the house for a while, the count of zero
>> latencies was very high, but a few non-zero ones still occurred. It would
>> make sense that there was less GC while the image was idle. What is a good
>> snippet of code to stress the GC? I presume the latency might increase.
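(To answer my own question inline: one untested way to stress the GC might be to
fork a low-priority loop that churns lots of short-lived objects while the
experiment runs, along the lines below. This is only a rough sketch; the loop
sizes are arbitrary and I haven't measured its effect yet.)

    "Churn short-lived objects at low priority to provoke frequent GC
     while the Delay latency experiment is running."
    [ 100 timesRepeat: [
        | junk |
        junk := OrderedCollection new.
        1 to: 100000 do: [ :i | junk add: (Array new: 32) ].
        junk := nil ] ] forkAt: Processor userBackgroundPriority.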
>>
>>
>> zero latency count 2273 before latency 1000
>> zero latency count 943 before latency 1000
>> zero latency count 3666 before latency 1000
>> zero latency count 1643 before latency 1000
>> zero latency count 27 before latency 1000
>> "Left house for 20 minutes"
>> zero latency count 12022 before latency 1000
>> zero latency count 15195 before latency 1000
>> zero latency count 41998 before latency 1000
>> "Returned from outing"
>> zero latency count 128 before latency 1000
>> zero latency count 116 before latency 1000
>> zero latency count 555 before latency 1000
>> zero latency count 2377 before latency 1000
>> zero latency count 5423 before latency 1000
>> zero latency count 3178 before latency 1000
>> zero latency count 47 before latency 1000
>> zero latency count 2276 before latency 1000
>> "Left house to go shopping"
>> zero latency count 6708 before latency 3000
>> zero latency count 4896 before latency 1000
>> zero latency count 433 before latency 1000
>> zero latency count 7106 before latency 1000
>> zero latency count 2195 before latency 1000
>> zero latency count 12397 before latency 1000
>> zero latency count 4815 before latency 2000
>> zero latency count 3480 before latency 1000
>> zero latency count 5690 before latency 1000
>> zero latency count 1670 before latency 1000
>> zero latency count 320 before latency 1000
>> zero latency count 1641 before latency 2000
>> zero latency count 290 before latency 1000
>> zero latency count 3835 before latency 1000
>> zero latency count 3818 before latency 1000
>> zero latency count 1132 before latency 1000
>> "Returned from shopping."
>> zero latency count 2045 before latency 1000
>> zero latency count 671 before latency 1000
>> zero latency count 167 before latency 1000
>> zero latency count 147 before latency 1000
>>
>> From the System Browser, inspecting the class variable LatencyCounts showed...
>>     0     171329
>>     3000  1
>>     2000  3
>>     1000  49
>
>
> P.S. after running a few more hours, LatencyCounts shows...
>     0      1102691
>     1000   219
>     2000   12
>     3000   29
>     4000   21
>     5000   5
>     49000  1
>     75000  1
>
>
> It's likely that the vm code is using the cheap microsecond clock which is
> updated by the heartbeat, not the expensive one that interrogates the OS at
> the instant. If not, it's the OS. I'll check soon.

I went looking and found an obvious source of quantisation: the multiplication
by MicrosecondsPerMillisecond in the Win32 currentUTCMicroseconds(), at line 109 here...
https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/e17db79411cfec767e04f3d94d12a642d920a30e/platforms/win32/vm/sqWin32Time.c#L109

Running the same experiment (for 20 mins) on Linux produced a latency
distribution in line with what I was originally expecting...

usec   count
1      135
2      992
3      3250
4      21161
5      17309
6      5341
7      1230
8      313
9      79
10     44
11     46
12     50
13     36
14     58
15     73
16     38
17     12
18     7
19     5
20     1
21     3
23     2
24     4
25     4
26     5
27     4
29     3
30     6
31     5
32     6
33     5
34     2
35     2
36     5
37     4
39     4
40     2
41     2
44     3
46     2
47     1
48     4
49     3
51     2
53     1
54     1
56     2
58     1
59     1
61     1
66     2
70     1
82     1
86     1
184    1
544    1
576    1
1212   1
1402   1
2016   1

self average.                      "4.8 microseconds"
self stdev.                        "12.9 microseconds"
self average + (5 * self stdev).   "69.1 microseconds"

So there are nine outliers outside five sigma.

@Clement, would you expect your incremental GC to have much impact on those
outliers? Or is its target pause ~1000 usecs anyway?

cheers -ben
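P.S. In case it helps anyone reproducing the summary figures: instead of
sending `self average` / `self stdev` in an inspector, an equivalent standalone
expression could flatten the LatencyCounts Bag into a plain collection and use
the same #average and #stdev collection extensions. Untested sketch only:

    | latencies |
    latencies := OrderedCollection new.
    "Expand the Bag so each recorded latency appears once per occurrence."
    LatencyCounts asSet do: [ :usec |
        (LatencyCounts occurrencesOf: usec)
            timesRepeat: [ latencies add: usec ] ].
    latencies average.                            "about 4.8 for the Linux run above"
    latencies stdev.                              "about 12.9"
    latencies average + (5 * latencies stdev).    "about 69.1, the five-sigma cutoff"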
