Thanks everybody for the feedback/discussions, it helped a lot. I made a new
version that is simpler and better, as well as more correct.
===
Name: STON-Core-SvenVanCaekenberghe.72
Author: SvenVanCaekenberghe
Time: 14 April 2016, 2:17:40.799554 pm
UUID: 80f809ee-2693-47c7-adf5-d56a9647d78c
Ancestors: STON-Core-SvenVanCaekenberghe.71
New, better, more correct implementation of Dictionary/Set rehashing after
reference resolution made them potentially unhealthy.
See the new tests #testDictionaryWithIndirectReferenceKeys and
#testSetWithIndirectReferenceElements for example cases that were not covered
by the previous optimization.
During resolution of references we count the number of unresolved references
backwards so that we know when #stonProcessSubObjects: did actually resolve a
reference (possible deep down). If that is the case we call a new hook called
#stonPostReferenceResolution that does nothing by default. For Dictionary and
Set #stonPostReferenceResolution is implemented as
self isHealthy ifFalse: [ self rehash ]
Now, rehashing is only done when absolutely necesary (this is crucial for
performance). We even avoid the possibly expensive #isHealthy check when no
references were actually resolved and hence no hashing could possibly change.
===
Name: STON-Tests-SvenVanCaekenberghe.64
Author: SvenVanCaekenberghe
Time: 14 April 2016, 2:17:59.88328 pm
UUID: 1401dc66-e503-4da1-bc7c-30dd0a4afa06
Ancestors: STON-Tests-SvenVanCaekenberghe.63
===
Sven
> On 06 Apr 2016, at 17:58, Henrik Sperre Johansen
> <[email protected]> wrote:
>
> Not really about modifying during iteration, if an object you try to remove:
> is in the wrong slot, you'll most likely get a CannotBeFound error.
>
> Which you can work around by iterating indexes and nilling directly, doing so
> in reverse merely minimizes the amount of work needed to potentially fill the
> slot when it is still empty after re-adding.
> (Maintain index of next nil slot as well, test backwards from that, and you
> should get away with the minimal amount of work required... Just need to
> ensure it's still nil, and not the slot removed object was put in)
>
> Needless to say, it'd probably end up pretty hairy, and still have worse
> worse-case perf* than rehash, though average-case would probably be much
> better.
>
> Cheers,
> Henry
>
> * worst-case performance in linearly hashed tables is *much* worse for remove
> than add, due to the potential fixups needed when the set is dense...
> Consider at:ifAbsent: perf experienced for missing entries in default sized
> Unicode tables with hash at start of longest non-nil sequence of slots, and
> multiply by 100
>> On 6. apr. 2016, at 21.07, Sven Van Caekenberghe <[email protected]> wrote:
>>
>> Yeah, I just realised that iterating and changing a dictionary or set at the
>> same time won't work ;-)
>>
>>> On 06 Apr 2016, at 16:01, Henrik Johansen <[email protected]>
>>> wrote:
>>>
>>> If you are iterating over a Set with incorrectly placed objects, remove:
>>> calls aren't going to do you much good ;)
>>> Not to mention, even nilling the slot directly, then add:'ing, still means
>>> you have to scan subsequent entries up to the next empty slot for
>>> potentially better placement, if the object ended up being added elsewhere.
>>> (IOW, if you're gonna do it, better iterate in reverse)
>>>
>>> Cheers,
>>> Henry
>>>
>>>> On 06 Apr 2016, at 3:46 , Sven Van Caekenberghe <[email protected]> wrote:
>>>>
>>>>
>>>>> On 06 Apr 2016, at 15:34, Nicolai Hess <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> 2016-04-06 15:25 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>>>>> Hi Nicolai,
>>>>>
>>>>>> On 06 Apr 2016, at 14:56, Nicolai Hess <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2016-04-06 14:27 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>>>>>> Fix for review:
>>>>>>
>>>>>> ===
>>>>>> Name: STON-Core-SvenVanCaekenberghe.71
>>>>>> Author: SvenVanCaekenberghe
>>>>>> Time: 6 April 2016, 2:22:24.782251 pm
>>>>>> UUID: 64b8b741-365e-41fe-aa98-565e33ca5d24
>>>>>> Ancestors: STON-Core-SvenVanCaekenberghe.70
>>>>>>
>>>>>> Fix a bug where STONReferences occurring as keys in Dictionaries or
>>>>>> elements in Sets caused those to be unhealthy after materialization. Thx
>>>>>> to Peter Uhnák for reporting this issue.
>>>>>>
>>>>>> Add 3 new unit tests to STONReaderTests
>>>>>>
>>>>>> #testDictionaryWithReferenceKeys
>>>>>> #testSetWithReferenceElements
>>>>>> #testDeepStructure
>>>>>>
>>>>>> Fix Details
>>>>>>
>>>>>> change the implementation of STONReader>>#processSubObjectsOf: from
>>>>>> iterative to recursive (see version 39 of 29 November 2012, this might
>>>>>> be a functional regression, see #testDeepStructure; cleanup of stack
>>>>>> instance variable for later) so that #stonProcessSubObjects: can be
>>>>>> overwritten with code being executed before or after full reference
>>>>>> resolution
>>>>>>
>>>>>> imho, recursion stack depth will be equal during both writing and
>>>>>> reading, and should be acceptable.
>>>>>>
>>>>>> overwrite #stonProcessSubObjects: in Dictionary and Set to #rehash at
>>>>>> the end, but only when needed (minimal optimalization, see
>>>>>> Dictionary>>#containsStonReferenceAsKey and Set>>#containsStonReference)
>>>>>> ===
>>>>>> Name: STON-Tests-SvenVanCaekenberghe.63
>>>>>> Author: SvenVanCaekenberghe
>>>>>> Time: 6 April 2016, 2:22:45.01986 pm
>>>>>> UUID: 0beb2322-b81a-46ee-a0e2-6648a808774a
>>>>>> Ancestors: STON-Tests-SvenVanCaekenberghe.62
>>>>>>
>>>>>> (idem)
>>>>>> ===
>>>>>
>>>>> Thanks for looking at the code.
>>>>>
>>>>>> Hi Sven,
>>>>>> instead of rehashing the dictionary for every ston reference,
>>>>>
>>>>> (It rehashes only once after resolving all references)
>>>>>
>>>>> Ah, of course. I thought this would be called for every ston reference
>>>>> used as key.
>>>>>
>>>>>
>>>>>
>>>>>> wouldn't it work to remove and readd the value after processing the
>>>>>> subobject:
>>>>>>
>>>>>> Dictionary>>#stonProcessSubObjects: block
>>>>>> self keys do:[:key |
>>>>>> |value|
>>>>>> value := block value:(self removeKey: key ifAbsent:[ nil]).
>>>>>> self at: (block value: key) put: value].
>>>>>
>>>>> Interesting idea. I have to think about that approach.
>>>>>
>>>>> Now, Object>>#stonProcessSubObjects: is very general and looks at named
>>>>> and indexed instance variables. But this probably could be replaced by
>>>>> something more high level and specific I guess.
>>>>>
>>>>> Adding and removing each key/value has a cost too. I try to make the
>>>>> simplest case very efficient and only pay a price when really needed.
>>>>> Anyway, time for some calculations.
>>>>>
>>>>> ok, yes running over all keys isn't better than rehashing :-)
>>>>
>>>> Still, your idea might be better than you would expect:
>>>>
>>>> Now, there is 1 (partial) iteration to do the check before, 2 iterations
>>>> over the keys and values arrays, then an optional full rehash (which also
>>>> reallocates and thus generates garbage).
>>>>
>>>> Your idea does only 1 iteration over keys, with remove and add (which also
>>>> happens more efficiently on the arrays above), no check before, no rehash,
>>>> probably no garbage generation at all in any case.
>>>>
>>>> Like I said, I have to study it (especially the cost of remove/add, maybe
>>>> that can be optimised a bit as well).
>>>>
>>>>> Thanks again for the suggestion !
>>>>>
>>>>> Sven
>>>>>
>>>>>>> On 06 Apr 2016, at 14:04, Sven Van Caekenberghe <[email protected]> wrote:
>>>>>>>
>>>>>>> https://pharo.fogbugz.com/f/cases/17946/STON-materializes-unhealthy-Dictionaries-and-Sets-when-references-occur-in-its-keys-or-elements
>>>>>>>
>>>>>>> fix coming
>>>>>>>
>>>>>>>>> On 05 Apr 2016, at 13:11, Sven Van Caekenberghe <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05 Apr 2016, at 13:02, Nicolai Hess <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2016-04-05 12:32 GMT+02:00 Cyril Ferlicot Delbecque
>>>>>>>>> <[email protected]>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 05/04/2016 12:09, Sven Van Caekenberghe wrote:
>>>>>>>>>>
>>>>>>>>>> Like I said, it is a hashing issue, sometimes it will be correct by
>>>>>>>>>> accident.
>>>>>>>>>>
>>>>>>>>>> I hope you did not have to much trouble with this bug, I guess it
>>>>>>>>>> must have been hard to chase.
>>>>>>>>>>
>>>>>>>>>> Is it urgent ?
>>>>>>>>>>
>>>>>>>>>> I probably can give you a quick fix, but I would like to think a bit
>>>>>>>>>> more about this, since rehashing each materialised dictionary seems
>>>>>>>>>> expensive.
>>>>>>>>>
>>>>>>>>> Hi Sven,
>>>>>>>>>
>>>>>>>>> I got the same kind of problem in a personal application.
>>>>>>>>>
>>>>>>>>> I use Sets that I serialize and I had a lot of trouble because
>>>>>>>>> sometimes
>>>>>>>>> some action had strange behaviours.
>>>>>>>>>
>>>>>>>>> For example in a set with element `aSet remove: aSet anyOne` raised
>>>>>>>>> 'XXX
>>>>>>>>> not found in aSet'.
>>>>>>>>>
>>>>>>>>> I am glad to hear that it is a Ston issue and not me that used sets
>>>>>>>>> in a
>>>>>>>>> bad way :)
>>>>>>>>>
>>>>>>>>> For me too it is not urgent since I have a not of university work for
>>>>>>>>> the moment.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> How are hashed collections created/filled during ston-parsing ?
>>>>>>>>> If the position in a hashed collection is created by a
>>>>>>>>> ston-reference, that is later replaced by the "real" object,
>>>>>>>>> the index in the dictionary (or other hashed collections) may be
>>>>>>>>> wrong.
>>>>>>>>
>>>>>>>> Yes, that is indeed it, Nicolai.
>>>>>>>>
>>>>>>>> But I would like to try to minimise the rehashing as it seems
>>>>>>>> expensive. But first I need a more reliable failure.
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Cyril Ferlicot
>>>>>>>>>
>>>>>>>>> http://www.synectique.eu
>>>>>>>>>
>>>>>>>>> 165 Avenue Bretagne
>>>>>>>>> Lille 59000 France
>>
>>
>