I love these little discussions/examples that you share... things you take for granted have subtle impacts that are worth reasoning about. Whenever I read them, it definitely improves my understanding and makes me a better programmer. Thanks Markus.
Tim On Fri, 19 May 2023, at 12:01 PM, Marcus Denker wrote: > Dictionary lookup is used a lot. Let's look at Dictionary>>#at: > > > at: key > "Answer the value associated with the key." > ^ self at: key ifAbsent: [self errorKeyNotFound: key] > > > The method looks simple and at a first glance, there are no obvious problems. > > But if we look at it again, we see that for every execution of #at:, > the block [self errorKeyNotFound: key] > has to be created. The method stores a CompiledBlock in the literals, > and a bytecode "create block" creates > the block: > > ``` > (Dictionary>>#at:) symbolic > > "'41 <4C> self > 42 <40> pushTemp: 0 > 43 <40> pushTemp: 0 > 44 <F9 00 01> fullClosure:a CompiledBlock: [self errorKeyNotFound: key] > NumCopied: 1 > 47 <A1> send: at:ifAbsent: > 48 <5C> returnTop'" > ``` > > This happens at *every* execution of #at:, even though the block that > we spend time to create will never > be executed at runtime (outside of real errors). > > If you look at the rest of the code path of #at: in a Dictionary, it is > carefully written to avoid block creation > by using optimized contructs (ifTrue: and friends). > > Could clean blocks help? Clean blocks are blocks that only use data > that the compiler knows at compile time, > thus we can create a CleanBlockClosure (which has the CompiledBlock) > and store *that* as a Literal. > > This then would mean that we could just use as fast pushLiteral: to > push the block, no creation at runtime needed. > > Clean blocks are not yet enabled by default, but we can use a compiler > option pragma to enable them just for this > method: > > ``` > <compilerOptions: #(+ optionCleanBlockClosure)> > ``` > > The problem is that the block references both "self" and the key to be > able to show a nice error message. > Referencing either one make the block not clean. > > Could we change the block to be clean? I guess we could not refer self > and inline the method #errorKeyNotFound: > (knowing the instance is not that important for logging the error in > production, for example). But we do want > to know the key that is not found, it really simplifies debugging > especially if you have to look at log files. > > If we closely look at the block: we know a bit more about it and how it > is used. We know that if it gets evaluated, > that evaluation will *always* happen with the homeContext of the block > on the stack, as it is will always be #at:ifAbsent: > that evaluates that block. > > And in that case, there is a trick that we can do: we can use > reflection to read the needed values via the stack. > > You might not be aware, but the debugger needs some quite interesting > features from the reflective layer of the system to provide > the user experience that you take for granted (and that feels trivial). > Imagine a block like that: > > ``` > tt > > | temp1 temp2 | > temp1 := 1. > temp2 := 2. > > self class methods do: [ :each | self halt. each with: temp1 ]. > ^ temp2 > ``` > > > The block does not reference temp2, so it actually is created as a > block that does not know temp2 at all. temp2 is not accessibe from the > block or it's context. > Yet, when debugging, you want to just be able to write temp2 in the > block and eval it (without saving the method), as *if* you would access > the temp in this block and recompile, the compiler would compile a > different block that would know temp2. > > So the reflective API of Context (and the infrastructure of reading > Variables reflectively), is build in a way that #readVariableNamed: > will > use the stack to find the value of temps that are not available in the > block context, but could be available if they would be referenced > statically. > > And if you think about it, we have just a case like that here, in > reverse: if we would use #readVariableNamed: on the context to read the > argument, > the compiler would not see a read of "key", thus compile a block that > does not have that temp available. And if we then read "self" via > thisContext, too, > the compiler sees a block that it can compile as a clean block: > > > at: key > "Answer the value associated with the key." > > <compilerOptions: #(+ optionCleanBlockClosure)> > > ^ self at: key ifAbsent: [ > > "this block is never executed, yet we pay runtime cost to > create it > if it is a full block. > We use instead the reflective API to read the argument and > receiver > via the stack, making > the block clean. > > We enable cleanBlocks for this method by setting > optionCleanBlockClosure as it is not yet > enabled by default" > > KeyNotFound signalFor: (thisContext readVariableNamed: #key) > in: > thisContext sender receiver > "This is equivalent to: > self errorKeyNotFound: key" > ] > > > Let's look at the printed symbolic bytecode: > > > ``` > Dictionary>>#at:) symbolic > > "'41 <4C> self > 42 <40> pushTemp: 0 > 43 <20> pushConstant: [ > > "this block is never executed, yet we pay runtime cost to > create it > if it is a full block. > We use instead the reflective API to read the argument and > receiver > via the stack, making > the block clean. > > We enable cleanBlocks for this method by setting > optionCleanBlockClosure as it is not yet > enabled by default" > > KeyNotFound signalFor: (thisContext readVariableNamed: #key) > in: > thisContext sender receiver > "This is equivalent to: > self errorKeyNotFound: key" > ] > 44 <A1> send: at:ifAbsent: > 45 <5C> returnTop'" > ``` > > you see that it now uses pushConstant: to push the pre-compiled > CleanBlockClosure, which is *much* faster. > > A very naive benchmark using a fairly large Dictionary (Smalltalk globals): > > ``` > [Smalltalk globals at: #Object] bench > > 8591996.801/7040335.266 "1.220395972119888" > ``` > > Shows a speedup of ~20%, which is qute a lot! > > Of course, this is a hack. For one, it just solves this for this one > case. And we do not want to add code like > that everywhere. And real world impact of speed will be quite limited > due to that (it just speeds up > Dictionary>#at:) > > The alternative would be to inline #at:ifAbsent: This inlining, due to > the existing subclasses, leads to quite some > needed refactorings, and has the same problem of being a solution for > this problem in this one place. > > The real solution in the long term is a JIT that removes blocks like > that by inlining automatically for the code it executes, > while not changing the image level code at all. This would then solve > this for all similar cases everywhere all at once. > > But the nice aspect of the hack is: it is local (touches just this one > method), gives us some speedup now and will be > trivially removable when we deploy a better solution in the future. > > But the code is quite ugly and indeed relies on reflection... which > maybe for Dictionary>>#at: is not that nice > (e.g. if you want to create a minimal image without reflection) > > But it is save to add this to your own image, it works in Pharo11 and > Pharo12 and I guess even in Pharo10 > (not tested there). > > Marcus