#at:

Marcus Denker Fri, 19 May 2023 04:01:44 -0700

Dictionary lookup is used a lot. Let's look at Dictionary>>#at:


at: key
        "Answer the value associated with the key."
        ^ self at: key ifAbsent: [self errorKeyNotFound: key]


The method looks simple, and at first glance there are no obvious problems.

But if we look at it again, we see that for every execution of #at:, the block
[self errorKeyNotFound: key] has to be created. The method stores a
CompiledBlock in its literals, and a "create block" bytecode creates the block:

```
(Dictionary>>#at:) symbolic 

"'41 <4C> self
42 <40> pushTemp: 0
43 <40> pushTemp: 0
44 <F9 00 01> fullClosure:a CompiledBlock: [self errorKeyNotFound: key] NumCopied: 1
47 <A1> send: at:ifAbsent:
48 <5C> returnTop'"
```

This happens at *every* execution of #at:, even though the block that we spend 
time to create will never
be executed at runtime (outside of real errors).

If you look at the rest of the code path of #at: in a Dictionary, it is
carefully written to avoid block creation by using optimized constructs
(ifTrue: and friends).

Could clean blocks help? Clean blocks are blocks that only use data that the
compiler knows at compile time, so we can create a CleanBlockClosure (which
holds the CompiledBlock) and store *that* as a literal.

This would mean that we could just use a fast pushLiteral: to push the
block, with no creation needed at runtime.
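
To make the distinction concrete, here is a small sketch of what counts as
clean. The method name cleanBlockExamples: is hypothetical (add it to any
class), and it assumes that BlockClosure>>#isClean is available in your image
and judges cleanness the same way the compiler option does:

```smalltalk
cleanBlockExamples: key
	"Hypothetical helper, not part of Pharo. Answers three Booleans."
	| onlyOwnArgument usesSelf usesOuterArgument |
	"Uses nothing but its own argument and a literal."
	onlyOwnArgument := [ :x | x + 1 ].
	"References the receiver of the enclosing method."
	usesSelf := [ self printString ].
	"References an argument of the enclosing method."
	usesOuterArgument := [ key printString ].
	^ { onlyOwnArgument isClean. usesSelf isClean. usesOuterArgument isClean }
```

Under those assumptions the first block should be reported as clean and the
other two as not clean, which is exactly the situation of the block in #at:.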

Clean blocks are not yet enabled by default, but we can use a compiler option 
pragma to enable them just for this
method:

```
        <compilerOptions: #(+ optionCleanBlockClosure)>
```

The problem is that the block references both "self" and the key to be able to
show a nice error message. Referencing either one makes the block not clean.

Could we change the block to be clean? I guess we could avoid referring to
self and inline the method #errorKeyNotFound: (knowing the instance is not that
important for logging the error in production, for example). But we do want to
know the key that was not found: it really simplifies debugging, especially if
you have to look at log files.
 
If we look closely at the block, we know a bit more about it and how it is
used. We know that if it gets evaluated, that evaluation will *always* happen
with the homeContext of the block on the stack, as it will always be
#at:ifAbsent: that evaluates that block.
 
And in that case, there is a trick that we can do: we can use reflection to 
read the needed values via the stack.
 
You might not be aware of it, but the debugger needs some quite interesting
features from the reflective layer of the system to provide the user experience
that you take for granted (and that feels trivial). Imagine a block like this:

```
tt

        | temp1 temp2 |
        temp1 := 1.
        temp2 := 2.

        self class methods do: [ :each | self halt. each with: temp1 ].
        ^ temp2
```


The block does not reference temp2, so it is actually created as a block that
does not know temp2 at all. temp2 is not accessible from the block or its
context.
Yet, when debugging, you want to be able to just write temp2 in the block and
evaluate it (without saving the method). After all, *if* you accessed the temp
in this block and recompiled, the compiler would compile a different block that
does know temp2.

So the reflective API of Context (and the infrastructure for reading variables
reflectively) is built in a way that #readVariableNamed: will use the stack to
find the value of temps that are not available in the block context, but would
be available if they were referenced statically.
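
A minimal sketch of that lookup, assuming it behaves as described above. The
method name readHiddenTemp is hypothetical; add it to any class to try it:

```smalltalk
readHiddenTemp
	"Hypothetical demo method, not part of Pharo."
	| hidden |
	hidden := 42.
	"The block never mentions hidden in its source, so the compiler has no
	reason to make the temp available to it. The value is instead looked up
	reflectively, by name, when the block runs."
	^ [ thisContext readVariableNamed: #hidden ] value
```

If the reflective lookup works as described, the method answers the value
stored in hidden even though the block does not reference it statically.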

And if you think about it, we have exactly such a case here, in reverse: if we
use #readVariableNamed: on the context to read the argument, the compiler does
not see a read of "key", and thus compiles a block that does not have that temp
available. And if we then read "self" via thisContext, too, the compiler sees a
block that it can compile as a clean block:


at: key
        "Answer the value associated with the key."
        
        <compilerOptions: #(+ optionCleanBlockClosure)>
        
        ^ self at: key ifAbsent: [
        
                "this block is never executed, yet we pay runtime cost to create it if it is a full block.
                We use instead the reflective API to read the argument and receiver via the stack, making
                the block clean.
        
                We enable cleanBlocks for this method by setting optionCleanBlockClosure as it is not yet
                enabled by default"
                
                KeyNotFound signalFor: (thisContext readVariableNamed: #key) in: thisContext sender receiver
                "This is equivalent to:
                        self errorKeyNotFound: key"
                ]
                
                
Let's look at the printed symbolic bytecode:


```
(Dictionary>>#at:) symbolic

"'41 <4C> self
42 <40> pushTemp: 0
43 <20> pushConstant: [
        
                "this block is never executed, yet we pay runtime cost to create it if it is a full block.
                We use instead the reflective API to read the argument and receiver via the stack, making
                the block clean.
        
                We enable cleanBlocks for this method by setting optionCleanBlockClosure as it is not yet
                enabled by default"
                
                KeyNotFound signalFor: (thisContext readVariableNamed: #key) in: thisContext sender receiver
                "This is equivalent to:
                        self errorKeyNotFound: key"
                ]
44 <A1> send: at:ifAbsent:
45 <5C> returnTop'"
```

You see that it now uses pushConstant: to push the pre-compiled
CleanBlockClosure, which is *much* faster.
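
Since the rewritten block is only reached on errors, it is worth checking by
hand that a missing key is still reported properly. The snippet below is a
sketch for a Playground; it relies only on Dictionary, KeyNotFound and its
key accessor, and the variable name dict is just for illustration:

```smalltalk
| dict |
dict := Dictionary new.
dict at: #present put: 1.

"A present key is answered as before."
dict at: #present.

"A missing key should still signal KeyNotFound, and the exception should
still carry the key that was asked for."
[ dict at: #missing ]
	on: KeyNotFound
	do: [ :error | error key ]
```

With the original #at: the last expression answers the missing key. The
rewritten method is meant to behave identically, so the same check applies to
it.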

A very naive benchmark using a fairly large Dictionary (Smalltalk globals):

```
[Smalltalk globals at: #Object] bench 

8591996.801/7040335.266 "1.220395972119888"
```

This shows a speedup of ~20% (a ratio of about 1.22), which is quite a lot!
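
For reference, the arithmetic behind that figure can be replayed in a
Playground. The two numbers are the runs per second quoted above. Which
number belongs to which variant is an assumption here: the larger one is taken
to be the clean-block version. The variable names are only for illustration:

```smalltalk
| withCleanBlock withFullBlock |
"Runs per second copied from the benchmark above (assignment to the two
variants is an assumption, see the text)."
withCleanBlock := 8591996.801.
withFullBlock := 7040335.266.

"Ratio of the two runs."
withCleanBlock / withFullBlock.

"The same comparison expressed as a percentage gain, roughly 22."
(withCleanBlock / withFullBlock - 1) * 100
```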

Of course, this is a hack. For one, it solves the problem for this one case
only, and we do not want to add code like that everywhere. Because of that, the
real-world speed impact will be quite limited (it just speeds up
Dictionary>>#at:).

The alternative would be to inline #at:ifAbsent:. Due to the existing
subclasses, this inlining requires quite a few refactorings, and it has the
same problem of being a solution for this problem in this one place.

The real solution in the long term is a JIT that removes blocks like that by 
inlining automatically for the code it executes,
while not changing the image level code at all. This would then solve this for 
all similar cases everywhere all at once.

But the nice aspect of the hack is: it is local (touches just this one method), 
gives us some speedup now and will be
trivially removable when we deploy a better solution in the future.

But the code is quite ugly and indeed relies on reflection... which maybe for 
Dictionary>>#at: is not that nice
(e.g. if you want to create a minimal image without reflection)

But it is safe to add this to your own image: it works in Pharo11 and Pharo12
and I guess even in Pharo10 (not tested there).

        Marcus

[Pharo-dev] Fun with CleanBlocks or: Speeding up Dictionary>>#at:
