On 05.10.2011 01:59, Nicolas Cellier wrote:
2011/10/5 Michael Roberts<[email protected]>:
I find the thread a bit confused in the sense
Dictionary *new* at: x ifAbsentPut: y
does not make sense (is academic), because the new dictionary will
never have x as a key. So why profile it and use that as reasoning?...
whereas
myDict at:x ifAbsentPut: y
is more interesting.
I guess in some examples, like when you want to serialize an object
graph with as many nodes as branches (no or few sharing), then the
case of absence is dominating.
Knowing that Mariano is working on Fuel, i guess that could have
biased his judgement.
In my experience i am biased by the use of the message in the systems
that already use it heavily...however it always feels that the use of
ifAbsentXX is intentionally deferring some code because it expresses
exceptional cases in the application logic. Especially if the block
makes an object you don't want to be doing that every time and
discarding the result irrespective of the presence of the key. It does
not have to be expensive to make, just not good style near any kind of
loop.
I know some people that actually don't like the ifAbsentPut: []
variant. and prefer to be explicit, of the form
(myDict includesKey: x) ifFalse: [
myDict at: x put: y]
Both codes are valid. But note that at:ifAbsentPut: will return the
value, or the newly put value.
So the exact equivalent is ((myDict includesKey: x)
ifTrue: [myDict at: x]
ifFalse: [myDict at: x put: y])
However, maybe Mariano is searching for tight inner loop optimization
(we can only guess, because he didn't tell, he should have).
In this case, using ifFalse: is good: because it is inlined by
Compiler, it avoids a BlockClosure creation.
But above code will cost two lookup in both branch, so it will be
pretty bad too, depending on hash evaluation cost and collision rate,
in a majority of cases, worse than block creation time.
I would suggest to Mariano to implement in the Dictionary class of his
choice the single lookup, block free creation code:
at: key ifAbsentPutValue: aValue
"Answer the value associated with the key.
if key isn't found, associate it first with aValue."
| index |
^((array at: (index := self scanFor: key))
ifNil: [ array at: index put: (key -> aValue). aValue ]
ifNotNil: [:value | value]
But, please, please, Mariano, don't change #at:ifAbsentPut:
As other said, conditional execution is useful both for speed and
because in Object world, the state can change.
aBagOfKeys do: [:key | myDict at: key ifAbsentPut: (myStream next)].
aBagOfKeys do: [:key | myDict at: key ifAbsentPut: [myStream next]].
won't behave the same if aBagOfKeys has duplicates, will they ?
Last thing, #at:ifAbsentPut: is absolutely decoupled from
Object>>#value, it does rely on Object>>#value
So don't throw #at:ifAbsentPut: because you don't like Object>>#value.
If you don't like it, don't use it, use a BlockClosure instead.
Anyway, thanks to Mariano for these questions, they were biased, but
not silly, no question is :)
Nicolas
You could make at:ifAbsentPut: do single lookup, at the cost of using
cull: (with the scanned for index) in at:ifAbsent: .
Sort of restricts what you can pass to at:ifAbsent: though, (not that I
found any not using a block in a quick scan of senders, nor did my image
crash)
and the runtime really didn't improve that much.
| dict |
dict := Dictionary new: 100000.
[1 to: 100000 do: [: i | "1 scan
(new)" "2 scans (old)"
dict at: i ifAbsentPut: i ]] timeToRun 34 34 48 38 82 34 58 35
33 42 76 57 56 40 40 57 120 48
This is numbers in a presized dictionary though, so the collision rate
might not be representative...
Let me check that with points instead, where the hash computation/lookup
cost is higher...
| dict pts|
dict := Dictionary new: 100000.
pts := (1 to: 100000) collect: [:i | i@i].
[1 to: pts size do: [: ix |
"1 scan (new)"
"2 scans (old)"
dict at: (pts at: ix) ifAbsentPut: ix ]] timeToRun 96 89 89 89 91 143
92 89 96 132 127 144 126 139 126 133 129 127 134
Still not worth it to restrict at:ifAbsent: imho.
Cheers,
Henry