I love these little discussions/examples that you share... things you take for 
granted have subtle impacts that are worth reasoning about. Whenever I read 
them, it definitely improves my understanding and makes me a better programmer. 
Thanks Markus.

Tim

On Fri, 19 May 2023, at 12:01 PM, Marcus Denker wrote:
> Dictionary lookup is used a lot. Let's look at Dictionary>>#at:
>
>
> at: key
>       "Answer the value associated with the key."
>       ^ self at: key ifAbsent: [self errorKeyNotFound: key]
>
>
> The method looks simple and at a first glance, there are no obvious problems.
>
> But if we look at it again, we see that for every execution of #at:, 
> the block [self errorKeyNotFound: key]
> has to be created. The method stores a CompiledBlock in the literals, 
> and a bytecode "create block" creates 
> the block:
>
> ```
> (Dictionary>>#at:) symbolic 
>
> "'41 <4C> self
> 42 <40> pushTemp: 0
> 43 <40> pushTemp: 0
> 44 <F9 00 01> fullClosure:a CompiledBlock: [self errorKeyNotFound: key] 
> NumCopied: 1
> 47 <A1> send: at:ifAbsent:
> 48 <5C> returnTop'"
> ```
>
> This happens at *every* execution of #at:, even though the block that 
> we spend time to create will never
> be executed at runtime (outside of real errors).
>
> If you look at the rest of the code path of #at: in a Dictionary, it is 
> carefully written to avoid block creation
> by using optimized contructs (ifTrue: and friends).
>
> Could clean blocks help? Clean blocks are blocks that only use data 
> that the compiler knows at compile time,
> thus we can create a CleanBlockClosure (which has the CompiledBlock) 
> and store *that* as a Literal.
>
> This then would mean that we could just use as fast pushLiteral: to 
> push the block, no creation at runtime needed.
>
> Clean blocks are not yet enabled by default, but we can use a compiler 
> option pragma to enable them just for this
> method:
>
> ```
>       <compilerOptions: #(+ optionCleanBlockClosure)>
> ```
>
> The problem is that the block references both "self" and the key to be 
> able to show a nice error message.
> Referencing either one make the block not clean. 
>
> Could we change the block to be clean? I guess we could not refer self 
> and inline the method #errorKeyNotFound: 
> (knowing the instance is not that important for logging the error in 
> production, for example). But we do want 
> to know the key that is not found, it really simplifies debugging 
> especially if you have to look at log files.
> 
> If we closely look at the block: we know a bit more about it and how it 
> is used. We know that if it gets evaluated, 
> that evaluation will *always* happen with the homeContext of the block 
> on the stack, as it is will always be #at:ifAbsent: 
> that evaluates that block.
> 
> And in that case, there is a trick that we can do: we can use 
> reflection to read the needed values via the stack.
> 
> You might not be aware, but the debugger needs some quite interesting 
> features from the reflective layer of the system to provide
> the user experience that you take for granted (and that feels trivial). 
> Imagine a block like that:
>
> ```
> tt
>
>       | temp1 temp2 |
>       temp1 := 1.
>       temp2 := 2.
>
>       self class methods do: [ :each | self halt. each with: temp1 ].
>       ^ temp2
> ```
>
>
> The block does not reference temp2, so it actually is created as a 
> block that does not know temp2 at all. temp2 is not accessibe from the 
> block or it's context.
> Yet, when debugging, you want to just be able to write temp2 in the 
> block and eval it (without saving the method), as *if* you would access
> the temp in this block and recompile, the compiler would compile a 
> different block that would know temp2.
>
> So the reflective API of Context (and the infrastructure of reading 
> Variables reflectively), is build in a way that #readVariableNamed: 
> will 
> use the stack to find the value of temps that are not available in the 
> block context, but could be available if they would be referenced 
> statically.
>
> And if you think about it, we have just a case like that here, in 
> reverse: if we would use #readVariableNamed: on the context to read the 
> argument,
> the compiler would not see a read of "key", thus compile a block that 
> does not have that temp available. And if we then read "self" via 
> thisContext, too,
> the compiler sees a block that it can compile as a clean block:
>
>
> at: key
>       "Answer the value associated with the key."
>       
>       <compilerOptions: #(+ optionCleanBlockClosure)>
>       
>       ^ self at: key ifAbsent: [
>       
>               "this block is never executed, yet we pay runtime cost to 
> create it 
> if it is a full block.
>               We use instead the reflective API to read the argument and 
> receiver 
> via the stack, making
>               the block clean.
>       
>               We enable cleanBlocks for this method by setting 
> optionCleanBlockClosure as it is not yet
>               enabled by default"
>               
>               KeyNotFound signalFor: (thisContext readVariableNamed: #key) 
> in: 
> thisContext sender receiver
>               "This is equivalent to:
>                       self errorKeyNotFound: key"
>               ]
>               
>               
> Let's look at the printed symbolic bytecode:
>
>
> ```
> Dictionary>>#at:) symbolic  
>
> "'41 <4C> self
> 42 <40> pushTemp: 0
> 43 <20> pushConstant: [
>       
>               "this block is never executed, yet we pay runtime cost to 
> create it 
> if it is a full block.
>               We use instead the reflective API to read the argument and 
> receiver 
> via the stack, making
>               the block clean.
>       
>               We enable cleanBlocks for this method by setting 
> optionCleanBlockClosure as it is not yet
>               enabled by default"
>               
>               KeyNotFound signalFor: (thisContext readVariableNamed: #key) 
> in: 
> thisContext sender receiver
>               "This is equivalent to:
>                       self errorKeyNotFound: key"
>               ]
> 44 <A1> send: at:ifAbsent:
> 45 <5C> returnTop'"
> ```
>
> you see that it now uses pushConstant: to push the pre-compiled 
> CleanBlockClosure, which is *much* faster.
>
> A very naive benchmark using a fairly large Dictionary (Smalltalk globals):
>
> ```
> [Smalltalk globals at: #Object] bench 
>
> 8591996.801/7040335.266 "1.220395972119888"
> ```
>
> Shows a speedup of ~20%, which is qute a lot!
>
> Of course, this is a hack. For one, it just solves this for this one 
> case. And we do not want to add code like
> that everywhere. And real world impact of speed will be quite limited 
> due to that (it just speeds up
> Dictionary>#at:)
>
> The alternative would be to inline #at:ifAbsent: This inlining, due to 
> the existing subclasses, leads to quite some
> needed refactorings, and has the same problem of being a solution for 
> this problem in this one place.
>
> The real solution in the long term is a JIT that removes blocks like 
> that by inlining automatically for the code it executes,
> while not changing the image level code at all. This would then solve 
> this for all similar cases everywhere all at once.
>
> But the nice aspect of the hack is: it is local (touches just this one 
> method), gives us some speedup now and will be
> trivially removable when we deploy a better solution in the future.
>
> But the code is quite ugly and indeed relies on reflection... which 
> maybe for Dictionary>>#at: is not that nice
> (e.g. if you want to create a minimal image without reflection)
>
> But it is save to add this to your own image, it works in Pharo11 and 
> Pharo12 and I guess even in Pharo10
> (not tested there).
>
>       Marcus

Reply via email to