Re: [Pharo-dev] [Pharo-users] STON materialization corrupts a dictionary if the keys are references

Sven Van Caekenberghe Thu, 14 Apr 2016 05:29:54 -0700

Thanks everybody for the feedback/discussions, it helped a lot. I made a new 
version that is simpler and better, as well as more correct.


===
Name: STON-Core-SvenVanCaekenberghe.72
Author: SvenVanCaekenberghe
Time: 14 April 2016, 2:17:40.799554 pm
UUID: 80f809ee-2693-47c7-adf5-d56a9647d78c
Ancestors: STON-Core-SvenVanCaekenberghe.71

New, better, more correct implementation of Dictionary/Set rehashing after 
reference resolution made them potentially unhealthy.

See the new tests #testDictionaryWithIndirectReferenceKeys and 
#testSetWithIndirectReferenceElements for example cases that were not covered 
by the previous optimization.

During resolution of references we count the number of unresolved references 
backwards so that we know when #stonProcessSubObjects: did actually resolve a 
reference (possible deep down). If that is the case we call a new hook called 
#stonPostReferenceResolution that does nothing by default. For Dictionary and 
Set #stonPostReferenceResolution is implemented as

        self isHealthy ifFalse: [ self rehash ]
        
Now, rehashing is only done when absolutely necesary (this is crucial for 
performance). We even avoid the possibly expensive #isHealthy check when no 
references were actually resolved and hence no hashing could possibly change.  
===
Name: STON-Tests-SvenVanCaekenberghe.64
Author: SvenVanCaekenberghe
Time: 14 April 2016, 2:17:59.88328 pm
UUID: 1401dc66-e503-4da1-bc7c-30dd0a4afa06
Ancestors: STON-Tests-SvenVanCaekenberghe.63
===

Sven

> On 06 Apr 2016, at 17:58, Henrik Sperre Johansen 
> <[email protected]> wrote:
> 
> Not really about modifying during iteration, if an object you try to remove: 
> is in the wrong slot, you'll most likely get a CannotBeFound error.
> 
> Which you can work around by iterating indexes and nilling directly, doing so 
> in reverse merely minimizes the amount of work needed to potentially fill the 
> slot when it is still empty after re-adding.
> (Maintain index of next nil slot as well, test backwards from that, and you 
> should get away with the minimal amount of work required... Just need to 
> ensure it's still nil, and not the slot removed object was put in)
> 
> Needless to say, it'd probably end up pretty hairy, and still have worse
> worse-case perf* than rehash, though average-case would probably be much 
> better.
> 
> Cheers,
> Henry
> 
> * worst-case performance in linearly hashed tables is *much* worse for remove 
> than add, due to the potential fixups needed when the set is dense... 
> Consider at:ifAbsent: perf experienced for missing entries in default sized 
> Unicode tables with hash at start of longest non-nil sequence of slots, and 
> multiply by 100
>> On 6. apr. 2016, at 21.07, Sven Van Caekenberghe <[email protected]> wrote:
>> 
>> Yeah, I just realised that iterating and changing a dictionary or set at the 
>> same time won't work ;-)
>> 
>>> On 06 Apr 2016, at 16:01, Henrik Johansen <[email protected]> 
>>> wrote:
>>> 
>>> If you are iterating over a Set with incorrectly placed objects, remove: 
>>> calls aren't going to do you much good ;)
>>> Not to mention, even nilling the slot directly, then add:'ing, still means 
>>> you have to scan subsequent entries up to the next empty slot for 
>>> potentially better placement, if the object ended up being added elsewhere.
>>> (IOW, if you're gonna do it, better iterate in reverse)
>>> 
>>> Cheers,
>>> Henry
>>> 
>>>> On 06 Apr 2016, at 3:46 , Sven Van Caekenberghe <[email protected]> wrote:
>>>> 
>>>> 
>>>>> On 06 Apr 2016, at 15:34, Nicolai Hess <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> 2016-04-06 15:25 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>>>>> Hi Nicolai,
>>>>> 
>>>>>> On 06 Apr 2016, at 14:56, Nicolai Hess <[email protected]> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2016-04-06 14:27 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>>>>>> Fix for review:
>>>>>> 
>>>>>> ===
>>>>>> Name: STON-Core-SvenVanCaekenberghe.71
>>>>>> Author: SvenVanCaekenberghe
>>>>>> Time: 6 April 2016, 2:22:24.782251 pm
>>>>>> UUID: 64b8b741-365e-41fe-aa98-565e33ca5d24
>>>>>> Ancestors: STON-Core-SvenVanCaekenberghe.70
>>>>>> 
>>>>>> Fix a bug where STONReferences occurring as keys in Dictionaries or 
>>>>>> elements in Sets caused those to be unhealthy after materialization. Thx 
>>>>>> to Peter Uhnák for reporting this issue.
>>>>>> 
>>>>>> Add 3 new unit tests to STONReaderTests
>>>>>> 
>>>>>> #testDictionaryWithReferenceKeys
>>>>>> #testSetWithReferenceElements
>>>>>> #testDeepStructure
>>>>>> 
>>>>>> Fix Details
>>>>>> 
>>>>>> change the implementation of STONReader>>#processSubObjectsOf: from 
>>>>>> iterative to recursive (see version 39 of 29 November 2012, this might 
>>>>>> be a functional regression, see #testDeepStructure; cleanup of stack 
>>>>>> instance variable for later) so that #stonProcessSubObjects: can be 
>>>>>> overwritten with code being executed before or after full reference 
>>>>>> resolution
>>>>>> 
>>>>>> imho, recursion stack depth will be equal during both writing and 
>>>>>> reading, and should be acceptable.
>>>>>> 
>>>>>> overwrite #stonProcessSubObjects: in Dictionary and Set to #rehash at 
>>>>>> the end, but only when needed (minimal optimalization, see 
>>>>>> Dictionary>>#containsStonReferenceAsKey and Set>>#containsStonReference)
>>>>>> ===
>>>>>> Name: STON-Tests-SvenVanCaekenberghe.63
>>>>>> Author: SvenVanCaekenberghe
>>>>>> Time: 6 April 2016, 2:22:45.01986 pm
>>>>>> UUID: 0beb2322-b81a-46ee-a0e2-6648a808774a
>>>>>> Ancestors: STON-Tests-SvenVanCaekenberghe.62
>>>>>> 
>>>>>> (idem)
>>>>>> ===
>>>>> 
>>>>> Thanks for looking at the code.
>>>>> 
>>>>>> Hi Sven,
>>>>>> instead of rehashing the dictionary for every ston reference,
>>>>> 
>>>>> (It rehashes only once after resolving all references)
>>>>> 
>>>>> Ah, of course. I thought this would be called for every ston reference 
>>>>> used as key.
>>>>> 
>>>>> 
>>>>> 
>>>>>> wouldn't it work to remove and readd the value after processing the 
>>>>>> subobject:
>>>>>> 
>>>>>> Dictionary>>#stonProcessSubObjects: block
>>>>>> self keys do:[:key |
>>>>>>     |value|
>>>>>>     value := block value:(self removeKey: key ifAbsent:[ nil]).
>>>>>>     self at: (block value: key) put: value].
>>>>> 
>>>>> Interesting idea. I have to think about that approach.
>>>>> 
>>>>> Now, Object>>#stonProcessSubObjects: is very general and looks at named 
>>>>> and indexed instance variables. But this probably could be replaced by 
>>>>> something more high level and specific I guess.
>>>>> 
>>>>> Adding and removing each key/value has a cost too. I try to make the 
>>>>> simplest case very efficient and only pay a price when really needed. 
>>>>> Anyway, time for some calculations.
>>>>> 
>>>>> ok, yes running over all keys isn't better than rehashing :-)
>>>> 
>>>> Still, your idea might be better than you would expect:
>>>> 
>>>> Now, there is 1 (partial) iteration to do the check before, 2 iterations 
>>>> over the keys and values arrays, then an optional full rehash (which also 
>>>> reallocates and thus generates garbage).
>>>> 
>>>> Your idea does only 1 iteration over keys, with remove and add (which also 
>>>> happens more efficiently on the arrays above), no check before, no rehash, 
>>>> probably no garbage generation at all in any case.
>>>> 
>>>> Like I said, I have to study it (especially the cost of remove/add, maybe 
>>>> that can be optimised a bit as well).
>>>> 
>>>>> Thanks again for the suggestion !
>>>>> 
>>>>> Sven
>>>>> 
>>>>>>> On 06 Apr 2016, at 14:04, Sven Van Caekenberghe <[email protected]> wrote:
>>>>>>> 
>>>>>>> https://pharo.fogbugz.com/f/cases/17946/STON-materializes-unhealthy-Dictionaries-and-Sets-when-references-occur-in-its-keys-or-elements
>>>>>>> 
>>>>>>> fix coming
>>>>>>> 
>>>>>>>>> On 05 Apr 2016, at 13:11, Sven Van Caekenberghe <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 05 Apr 2016, at 13:02, Nicolai Hess <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2016-04-05 12:32 GMT+02:00 Cyril Ferlicot Delbecque 
>>>>>>>>> <[email protected]>:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 05/04/2016 12:09, Sven Van Caekenberghe wrote:
>>>>>>>>>> 
>>>>>>>>>> Like I said, it is a hashing issue, sometimes it will be correct by 
>>>>>>>>>> accident.
>>>>>>>>>> 
>>>>>>>>>> I hope you did not have to much trouble with this bug, I guess it 
>>>>>>>>>> must have been hard to chase.
>>>>>>>>>> 
>>>>>>>>>> Is it urgent ?
>>>>>>>>>> 
>>>>>>>>>> I probably can give you a quick fix, but I would like to think a bit 
>>>>>>>>>> more about this, since rehashing each materialised dictionary seems 
>>>>>>>>>> expensive.
>>>>>>>>> 
>>>>>>>>> Hi Sven,
>>>>>>>>> 
>>>>>>>>> I got the same kind of problem in a personal application.
>>>>>>>>> 
>>>>>>>>> I use Sets that I serialize and I had a lot of trouble because 
>>>>>>>>> sometimes
>>>>>>>>> some action had strange behaviours.
>>>>>>>>> 
>>>>>>>>> For example in a set with element `aSet remove: aSet anyOne` raised 
>>>>>>>>> 'XXX
>>>>>>>>> not found in aSet'.
>>>>>>>>> 
>>>>>>>>> I am glad to hear that it is a Ston issue and not me that used sets 
>>>>>>>>> in a
>>>>>>>>> bad way :)
>>>>>>>>> 
>>>>>>>>> For me too it is not urgent since I have a not of university work for
>>>>>>>>> the moment.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> How are hashed collections created/filled during ston-parsing ?
>>>>>>>>> If the position in a hashed collection is created by a 
>>>>>>>>> ston-reference, that is later replaced by the "real" object,
>>>>>>>>> the index in the dictionary  (or other hashed collections) may be 
>>>>>>>>> wrong.
>>>>>>>> 
>>>>>>>> Yes, that is indeed it, Nicolai.
>>>>>>>> 
>>>>>>>> But I would like to try to minimise the rehashing as it seems 
>>>>>>>> expensive. But first I need a more reliable failure.
>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Cyril Ferlicot
>>>>>>>>> 
>>>>>>>>> http://www.synectique.eu
>>>>>>>>> 
>>>>>>>>> 165 Avenue Bretagne
>>>>>>>>> Lille 59000 France
>> 
>> 
>

Re: [Pharo-dev] [Pharo-users] STON materialization corrupts a dictionary if the keys are references

Reply via email to