Re: [Pharo-dev] [Pharo-users] STON materialization corrupts a dictionary if the keys are references

Henrik Sperre Johansen Wed, 06 Apr 2016 08:59:48 -0700

Not really about modifying during iteration, if an object you try to remove: is 
in the wrong slot, you'll most likely get a CannotBeFound error.


Which you can work around by iterating indexes and nilling directly, doing so 
in reverse merely minimizes the amount of work needed to potentially fill the 
slot when it is still empty after re-adding.
(Maintain index of next nil slot as well, test backwards from that, and you 
should get away with the minimal amount of work required... Just need to ensure 
it's still nil, and not the slot removed object was put in)

Needless to say, it'd probably end up pretty hairy, and still have worse
worse-case perf* than rehash, though average-case would probably be much better.

Cheers,
Henry

* worst-case performance in linearly hashed tables is *much* worse for remove 
than add, due to the potential fixups needed when the set is dense... Consider 
at:ifAbsent: perf experienced for missing entries in default sized Unicode 
tables with hash at start of longest non-nil sequence of slots, and multiply by 
100
> On 6. apr. 2016, at 21.07, Sven Van Caekenberghe <[email protected]> wrote:
> 
> Yeah, I just realised that iterating and changing a dictionary or set at the 
> same time won't work ;-)
> 
>> On 06 Apr 2016, at 16:01, Henrik Johansen <[email protected]> 
>> wrote:
>> 
>> If you are iterating over a Set with incorrectly placed objects, remove: 
>> calls aren't going to do you much good ;)
>> Not to mention, even nilling the slot directly, then add:'ing, still means 
>> you have to scan subsequent entries up to the next empty slot for 
>> potentially better placement, if the object ended up being added elsewhere.
>> (IOW, if you're gonna do it, better iterate in reverse)
>> 
>> Cheers,
>> Henry
>> 
>>> On 06 Apr 2016, at 3:46 , Sven Van Caekenberghe <[email protected]> wrote:
>>> 
>>> 
>>>> On 06 Apr 2016, at 15:34, Nicolai Hess <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> 2016-04-06 15:25 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>>>> Hi Nicolai,
>>>> 
>>>>> On 06 Apr 2016, at 14:56, Nicolai Hess <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> 2016-04-06 14:27 GMT+02:00 Sven Van Caekenberghe <[email protected]>:
>>>>> Fix for review:
>>>>> 
>>>>> ===
>>>>> Name: STON-Core-SvenVanCaekenberghe.71
>>>>> Author: SvenVanCaekenberghe
>>>>> Time: 6 April 2016, 2:22:24.782251 pm
>>>>> UUID: 64b8b741-365e-41fe-aa98-565e33ca5d24
>>>>> Ancestors: STON-Core-SvenVanCaekenberghe.70
>>>>> 
>>>>> Fix a bug where STONReferences occurring as keys in Dictionaries or 
>>>>> elements in Sets caused those to be unhealthy after materialization. Thx 
>>>>> to Peter Uhnák for reporting this issue.
>>>>> 
>>>>> Add 3 new unit tests to STONReaderTests
>>>>> 
>>>>> #testDictionaryWithReferenceKeys
>>>>> #testSetWithReferenceElements
>>>>> #testDeepStructure
>>>>> 
>>>>> Fix Details
>>>>> 
>>>>> change the implementation of STONReader>>#processSubObjectsOf: from 
>>>>> iterative to recursive (see version 39 of 29 November 2012, this might be 
>>>>> a functional regression, see #testDeepStructure; cleanup of stack 
>>>>> instance variable for later) so that #stonProcessSubObjects: can be 
>>>>> overwritten with code being executed before or after full reference 
>>>>> resolution
>>>>> 
>>>>> imho, recursion stack depth will be equal during both writing and 
>>>>> reading, and should be acceptable.
>>>>> 
>>>>> overwrite #stonProcessSubObjects: in Dictionary and Set to #rehash at the 
>>>>> end, but only when needed (minimal optimalization, see 
>>>>> Dictionary>>#containsStonReferenceAsKey and Set>>#containsStonReference)
>>>>> ===
>>>>> Name: STON-Tests-SvenVanCaekenberghe.63
>>>>> Author: SvenVanCaekenberghe
>>>>> Time: 6 April 2016, 2:22:45.01986 pm
>>>>> UUID: 0beb2322-b81a-46ee-a0e2-6648a808774a
>>>>> Ancestors: STON-Tests-SvenVanCaekenberghe.62
>>>>> 
>>>>> (idem)
>>>>> ===
>>>> 
>>>> Thanks for looking at the code.
>>>> 
>>>>> Hi Sven,
>>>>> instead of rehashing the dictionary for every ston reference,
>>>> 
>>>> (It rehashes only once after resolving all references)
>>>> 
>>>> Ah, of course. I thought this would be called for every ston reference 
>>>> used as key.
>>>> 
>>>> 
>>>> 
>>>>> wouldn't it work to remove and readd the value after processing the 
>>>>> subobject:
>>>>> 
>>>>> Dictionary>>#stonProcessSubObjects: block
>>>>>  self keys do:[:key |
>>>>>      |value|
>>>>>      value := block value:(self removeKey: key ifAbsent:[ nil]).
>>>>>      self at: (block value: key) put: value].
>>>> 
>>>> Interesting idea. I have to think about that approach.
>>>> 
>>>> Now, Object>>#stonProcessSubObjects: is very general and looks at named 
>>>> and indexed instance variables. But this probably could be replaced by 
>>>> something more high level and specific I guess.
>>>> 
>>>> Adding and removing each key/value has a cost too. I try to make the 
>>>> simplest case very efficient and only pay a price when really needed. 
>>>> Anyway, time for some calculations.
>>>> 
>>>> ok, yes running over all keys isn't better than rehashing :-)
>>> 
>>> Still, your idea might be better than you would expect:
>>> 
>>> Now, there is 1 (partial) iteration to do the check before, 2 iterations 
>>> over the keys and values arrays, then an optional full rehash (which also 
>>> reallocates and thus generates garbage).
>>> 
>>> Your idea does only 1 iteration over keys, with remove and add (which also 
>>> happens more efficiently on the arrays above), no check before, no rehash, 
>>> probably no garbage generation at all in any case.
>>> 
>>> Like I said, I have to study it (especially the cost of remove/add, maybe 
>>> that can be optimised a bit as well).
>>> 
>>>> Thanks again for the suggestion !
>>>> 
>>>> Sven
>>>> 
>>>>>> On 06 Apr 2016, at 14:04, Sven Van Caekenberghe <[email protected]> wrote:
>>>>>> 
>>>>>> https://pharo.fogbugz.com/f/cases/17946/STON-materializes-unhealthy-Dictionaries-and-Sets-when-references-occur-in-its-keys-or-elements
>>>>>> 
>>>>>> fix coming
>>>>>> 
>>>>>>>> On 05 Apr 2016, at 13:11, Sven Van Caekenberghe <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 05 Apr 2016, at 13:02, Nicolai Hess <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2016-04-05 12:32 GMT+02:00 Cyril Ferlicot Delbecque 
>>>>>>>> <[email protected]>:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 05/04/2016 12:09, Sven Van Caekenberghe wrote:
>>>>>>>>> 
>>>>>>>>> Like I said, it is a hashing issue, sometimes it will be correct by 
>>>>>>>>> accident.
>>>>>>>>> 
>>>>>>>>> I hope you did not have to much trouble with this bug, I guess it 
>>>>>>>>> must have been hard to chase.
>>>>>>>>> 
>>>>>>>>> Is it urgent ?
>>>>>>>>> 
>>>>>>>>> I probably can give you a quick fix, but I would like to think a bit 
>>>>>>>>> more about this, since rehashing each materialised dictionary seems 
>>>>>>>>> expensive.
>>>>>>>> 
>>>>>>>> Hi Sven,
>>>>>>>> 
>>>>>>>> I got the same kind of problem in a personal application.
>>>>>>>> 
>>>>>>>> I use Sets that I serialize and I had a lot of trouble because 
>>>>>>>> sometimes
>>>>>>>> some action had strange behaviours.
>>>>>>>> 
>>>>>>>> For example in a set with element `aSet remove: aSet anyOne` raised 
>>>>>>>> 'XXX
>>>>>>>> not found in aSet'.
>>>>>>>> 
>>>>>>>> I am glad to hear that it is a Ston issue and not me that used sets in 
>>>>>>>> a
>>>>>>>> bad way :)
>>>>>>>> 
>>>>>>>> For me too it is not urgent since I have a not of university work for
>>>>>>>> the moment.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> How are hashed collections created/filled during ston-parsing ?
>>>>>>>> If the position in a hashed collection is created by a ston-reference, 
>>>>>>>> that is later replaced by the "real" object,
>>>>>>>> the index in the dictionary  (or other hashed collections) may be 
>>>>>>>> wrong.
>>>>>>> 
>>>>>>> Yes, that is indeed it, Nicolai.
>>>>>>> 
>>>>>>> But I would like to try to minimise the rehashing as it seems 
>>>>>>> expensive. But first I need a more reliable failure.
>>>>>>> 
>>>>>>>> --
>>>>>>>> Cyril Ferlicot
>>>>>>>> 
>>>>>>>> http://www.synectique.eu
>>>>>>>> 
>>>>>>>> 165 Avenue Bretagne
>>>>>>>> Lille 59000 France
> 
>

Re: [Pharo-dev] [Pharo-users] STON materialization corrupts a dictionary if the keys are references

Reply via email to