Re: [Pharo-project] FasterSets

Henrik Johansen Thu, 02 Jul 2009 08:49:31 -0700

I submit for your consideration the use case where you want to  
initialize a Set with elements from multiple sources.
How do you do that today?
Well, you could write
set := (Set withAll: source1) addAll: source2; addAll: source3... etc,  
which will surely cause quite some grows. (and also crashes if source1  
has more elements than implementation limits)


A few other options:

set := Set new: (sources inject: 0 into: [:sub :next | sub + next  
size]).
sources do: [:each | set addAll: each ]

or:
tmpWithAllToAdd := sources inject: OrderedCollection new into: [:sub :  
next | sub addAll: next; yourself].
set := Set withAll: tmpAllToAdd

which doesn't perform excessive grows, but still would crash if total  
size of sources is bigger than implementation limits.

So how are these better than modifying addAll: so you can write
set := Set new.
sources do:  [:each | set addAll: each]
and not worry too much how much your set spends on growing/checking if  
it needs to grow?

You could always min: with current implementation limit of Set size if  
you think it's likely that's a case you'll encounter...

And my point was really, I'd rather have an error telling me "Hey,  
you're trying to do something EXTREME" (like, trying to add (in VW)  
more than 2^28 (2^56 on 64bit)
elements to a set) than have my program stop for a large amount of  
time, then finishing "correctly"...

Cheers,
Henry

On Jul 2, 2009, at 4:16 50PM, Andres Valloud wrote:

> However, handling the growth failure wouldn't crash the program...
>
> Henrik Johansen wrote:
>> It would.
>> I'm not sure I consider the current behaviour of spending 12 minutes
>> adding the same element so much better, though.
>>
>> Cheers,
>> Henry
>>
>> On Jul 2, 2009, at 12:17 14PM, Andres Valloud wrote:
>>
>>
>>> Also, what happens in these cases?
>>>
>>> | set |
>>> set := Set new.
>>> bag := Bag new.
>>> bad add: 4 withOccurrences: 1000000000 "or equivalent code".
>>> set addAll: bag
>>>
>>> I am guessing the set will fail to grow and addAll: will fail, even
>>> though it should work.  Maybe it won't happen with bags, or perhaps
>>> the
>>> example needs some massaging.  Is the failure to grow handled
>>> gracefully?  Note that I have not looked at the code because I have
>>> not
>>> had much chance to do that yet.
>>>
>>> Andres Valloud wrote:
>>>
>>>> Hmmm, I don't know though, usually the usage pattern with hashed
>>>> collections has been to presize explicitly and then add to them...
>>>> what
>>>> kind of use cases are you dealing with?
>>>>
>>>> Henrik Johansen wrote:
>>>>
>>>>
>>>>> Having spent a fair deal of time (in VisualWorks) worrying about  
>>>>> my
>>>>> intial Set sizes, writing functions to precompute sane sizes for
>>>>> arbitrary inputs and adding changeCapacityTo: calls before addAll:
>>>>> operations due to performance issues on a case-by-case basis
>>>>> (Squeak/
>>>>> Pharo doesn't even HAVE a changeCapacityTo: ), I have to strongly
>>>>> disagree.
>>>>>
>>>>> I'd rather spend my time thinking about my problem domain, and  
>>>>> know
>>>>> the Set implementation handles the nitty-gritty details in a
>>>>> sufficiently smart way.
>>>>>
>>>>> Whether that means growing once to accomedate the worst case, then
>>>>> shrinking afterwards, or doing a grow check every X added items,  
>>>>> in
>>>>> the unlikely event that what you're adding is a 3 million element
>>>>> Array with 10 unique objects, I don't really care.
>>>>>
>>>>> Cheers,
>>>>> Henry
>>>>>
>>>>> On Jul 2, 2009, at 7:38 10AM, Andres Valloud wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> (Set new: desiredCapacity) addAll: "consecutive integers"
>>>>>>
>>>>>> may be a better compromise.
>>>>>>
>>>>>> Andres.
>>>>>>
>>>>>>
>>>>>> Stéphane Ducasse wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks this is really good.
>>>>>>> We will probably integrate these changes really fast
>>>>>>>
>>>>>>> On Jul 1, 2009, at 1:03 PM, Henrik Johansen wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I took some time to check this yesterday, my comments on the  
>>>>>>>> code
>>>>>>>> itself are basically the same as what Lucas posted in the issue
>>>>>>>> tracker.
>>>>>>>> Works as advertised, but should probably clean out the "gunk"
>>>>>>>> which is
>>>>>>>> unused after it's integrated.
>>>>>>>>
>>>>>>>> Some feedback on possible improvements:
>>>>>>>> - What it does, is basically speed up the adds to the new array
>>>>>>>> in a
>>>>>>>> grow that might happen Why not take this all to the logical
>>>>>>>> conclusion, and instead optimize an addAll: method to be used  
>>>>>>>> in
>>>>>>>> grow,
>>>>>>>> that assumes elements are not in set (and themselves are not
>>>>>>>> duplicates)? F.ex. it could also add to tally just once,
>>>>>>>> instead of
>>>>>>>> one increase per element as the noCompareAdd: does. (an add
>>>>>>>> method
>>>>>>>> with no tally add, no = object check, and no growCheck).
>>>>>>>>
>>>>>>>> - The usual addAll: method can be sped up in a similar way, by
>>>>>>>> performing one grow operation before adding elements if  
>>>>>>>> necessary
>>>>>>>> (pessimistically assuming all elements are unique), then use an
>>>>>>>> add
>>>>>>>> method which does not check for grow need at all. (an add  
>>>>>>>> method
>>>>>>>> with
>>>>>>>> tally add and = object check, but no growCheck)
>>>>>>>> If desirable, you could also shrink the capacity after adds are
>>>>>>>> done, to match the real amount of new elements.
>>>>>>>>
>>>>>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method
>>>>>>>> took
>>>>>>>> 1/2 to  1/3rd the time of the regular addAll: method, even with
>>>>>>>> the
>>>>>>>> lower grow cost of FasterSets. (Lower gains if Set is presized,
>>>>>>>> and if
>>>>>>>> hash-calculation is bigger part of cost)
>>>>>>>>
>>>>>>>> Note: If you want to stay ANSI-compatible, one tricky point in
>>>>>>>> such an
>>>>>>>> optimization is adding all objects in collection up to a nil
>>>>>>>> element,
>>>>>>>> then raising an error. (ie: must produce equivalent results to
>>>>>>>> add:
>>>>>>>> performed of each of collections elements in sequence )
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Henry
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 25, 2009, at 8:55 03PM, Stéphane Ducasse wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> I would really like to assess FasterSets for Pharo1.1
>>>>>>>>>
>>>>>>>>> Begin forwarded message:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> From: Ralph Boland <[email protected]>
>>>>>>>>>> Date: June 25, 2009 8:30:13 PM CEDT
>>>>>>>>>> To: Stéphane Ducasse <[email protected]>
>>>>>>>>>> Subject: Re: about fasterTests
>>>>>>>>>>
>>>>>>>>>> 2009/6/25 Stéphane Ducasse <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> we are interest to get better libraries in pharo.
>>>>>>>>>>> If you want to join and help making sure that your code in
>>>>>>>>>>> integrated in
>>>>>>>>>>> pharo1.1
>>>>>>>>>>> please say it and may be join the pharo mailing-list.
>>>>>>>>>>>
>>>>>>>>>>> Stef
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Yes, I would like my code 'FasterSets' added to Pharo.
>>>>>>>>>> I joined the pharo mailing list.
>>>>>>>>>> Today I will mail you my MIT license release.
>>>>>>>>>> Probably will take a couple of weeks (From Calgary, Canada)
>>>>>>>>>> to get
>>>>>>>>>> there.
>>>>>>>>>> Once I am added as a contributer I will release my code to
>>>>>>>>>> Pharo
>>>>>>>>>> for
>>>>>>>>>> verification.
>>>>>>>>>> In the meantime if any Squeakers try out my code and report
>>>>>>>>>> bugs I
>>>>>>>>>> will fix them
>>>>>>>>>> before releasing to Pharo.
>>>>>>>>>>
>>>>>>>>>> Some comments:
>>>>>>>>>>
>>>>>>>>>> 1)  Since FasterSets modifies low level methods in class Set
>>>>>>>>>> and
>>>>>>>>>> its
>>>>>>>>>> subclasses it
>>>>>>>>>> may be incompatible with packages that create subclasses of  
>>>>>>>>>> Set
>>>>>>>>>> and then
>>>>>>>>>> override low level methods.  If someone could send me a  
>>>>>>>>>> list of
>>>>>>>>>> package add-ons
>>>>>>>>>> to Pharo that do this then I can download them and make any
>>>>>>>>>> modifications
>>>>>>>>>> necessary for them to run with  FasterSets and make these
>>>>>>>>>> changes
>>>>>>>>>> part of my
>>>>>>>>>> release.  Note that it is no more difficult to subclass from
>>>>>>>>>> Set
>>>>>>>>>> with FasterSets
>>>>>>>>>> than without;  it is just different and only different if you
>>>>>>>>>> do
>>>>>>>>>> low level things.
>>>>>>>>>>
>>>>>>>>>> 2)  FasterSets uses a method  'noCompareOrGrowAdd:'  which  
>>>>>>>>>> adds
>>>>>>>>>> an element to a set without doing compares or growing the  
>>>>>>>>>> set.
>>>>>>>>>> Needless to say it is a private method.
>>>>>>>>>> I also have a public method 'nastyAddNew:'  which, like
>>>>>>>>>> 'noCompareOrGrowAdd:'  adds an element to a set without
>>>>>>>>>> checking
>>>>>>>>>> if the
>>>>>>>>>> element is already there but does do a grow if needed.
>>>>>>>>>> This is useful (i.e. faster) in situations where you KNOW the
>>>>>>>>>> object
>>>>>>>>>> you are adding to a set is not there already.
>>>>>>>>>> However if the object IS already there then you now have two
>>>>>>>>>> of them in your set;  i.e.  you now have a subtle bug.
>>>>>>>>>> I use 'nastyAddNew:' in my code but never made it a part of
>>>>>>>>>> FasterSets
>>>>>>>>>> because it is questionable whether or not such a method
>>>>>>>>>> should be
>>>>>>>>>> available generally.   However, if I can decide I like the
>>>>>>>>>> 'nastyAddNew:'
>>>>>>>>>> method and am willing to use it despite its risks so can  
>>>>>>>>>> others
>>>>>>>>>> so there
>>>>>>>>>> is an argument for it being added.
>>>>>>>>>> While my expectation is that you don't want   
>>>>>>>>>> 'nastyAddNew:'  in
>>>>>>>>>> Pharo
>>>>>>>>>> I thought you should at least be aware of it.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Ralph Boland
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Pharo-project mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pharo-project mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pharo-project mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-
>>>>>>> project
>>>>>>> .
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Pharo-project mailing list
>>>>>> [email protected]
>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> .
>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Pharo-project mailing list
>>>> [email protected]
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>> .
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Pharo-project mailing list
>>> [email protected]
>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>
>>>
>>
>> .
>>
>>
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>


_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: [Pharo-project] FasterSets

Reply via email to