I submit for your consideration the use case where you want to initialize a Set with elements from multiple sources. How do you do that today? Well, you could write set := (Set withAll: source1) addAll: source2; addAll: source3... etc, which will surely cause quite some grows. (and also crashes if source1 has more elements than implementation limits)
A few other options: set := Set new: (sources inject: 0 into: [:sub :next | sub + next size]). sources do: [:each | set addAll: each ] or: tmpWithAllToAdd := sources inject: OrderedCollection new into: [:sub : next | sub addAll: next; yourself]. set := Set withAll: tmpAllToAdd which doesn't perform excessive grows, but still would crash if total size of sources is bigger than implementation limits. So how are these better than modifying addAll: so you can write set := Set new. sources do: [:each | set addAll: each] and not worry too much how much your set spends on growing/checking if it needs to grow? You could always min: with current implementation limit of Set size if you think it's likely that's a case you'll encounter... And my point was really, I'd rather have an error telling me "Hey, you're trying to do something EXTREME" (like, trying to add (in VW) more than 2^28 (2^56 on 64bit) elements to a set) than have my program stop for a large amount of time, then finishing "correctly"... Cheers, Henry On Jul 2, 2009, at 4:16 50PM, Andres Valloud wrote: > However, handling the growth failure wouldn't crash the program... > > Henrik Johansen wrote: >> It would. >> I'm not sure I consider the current behaviour of spending 12 minutes >> adding the same element so much better, though. >> >> Cheers, >> Henry >> >> On Jul 2, 2009, at 12:17 14PM, Andres Valloud wrote: >> >> >>> Also, what happens in these cases? >>> >>> | set | >>> set := Set new. >>> bag := Bag new. >>> bad add: 4 withOccurrences: 1000000000 "or equivalent code". >>> set addAll: bag >>> >>> I am guessing the set will fail to grow and addAll: will fail, even >>> though it should work. Maybe it won't happen with bags, or perhaps >>> the >>> example needs some massaging. Is the failure to grow handled >>> gracefully? Note that I have not looked at the code because I have >>> not >>> had much chance to do that yet. >>> >>> Andres Valloud wrote: >>> >>>> Hmmm, I don't know though, usually the usage pattern with hashed >>>> collections has been to presize explicitly and then add to them... >>>> what >>>> kind of use cases are you dealing with? >>>> >>>> Henrik Johansen wrote: >>>> >>>> >>>>> Having spent a fair deal of time (in VisualWorks) worrying about >>>>> my >>>>> intial Set sizes, writing functions to precompute sane sizes for >>>>> arbitrary inputs and adding changeCapacityTo: calls before addAll: >>>>> operations due to performance issues on a case-by-case basis >>>>> (Squeak/ >>>>> Pharo doesn't even HAVE a changeCapacityTo: ), I have to strongly >>>>> disagree. >>>>> >>>>> I'd rather spend my time thinking about my problem domain, and >>>>> know >>>>> the Set implementation handles the nitty-gritty details in a >>>>> sufficiently smart way. >>>>> >>>>> Whether that means growing once to accomedate the worst case, then >>>>> shrinking afterwards, or doing a grow check every X added items, >>>>> in >>>>> the unlikely event that what you're adding is a 3 million element >>>>> Array with 10 unique objects, I don't really care. >>>>> >>>>> Cheers, >>>>> Henry >>>>> >>>>> On Jul 2, 2009, at 7:38 10AM, Andres Valloud wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> (Set new: desiredCapacity) addAll: "consecutive integers" >>>>>> >>>>>> may be a better compromise. >>>>>> >>>>>> Andres. >>>>>> >>>>>> >>>>>> Stéphane Ducasse wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Thanks this is really good. >>>>>>> We will probably integrate these changes really fast >>>>>>> >>>>>>> On Jul 1, 2009, at 1:03 PM, Henrik Johansen wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> I took some time to check this yesterday, my comments on the >>>>>>>> code >>>>>>>> itself are basically the same as what Lucas posted in the issue >>>>>>>> tracker. >>>>>>>> Works as advertised, but should probably clean out the "gunk" >>>>>>>> which is >>>>>>>> unused after it's integrated. >>>>>>>> >>>>>>>> Some feedback on possible improvements: >>>>>>>> - What it does, is basically speed up the adds to the new array >>>>>>>> in a >>>>>>>> grow that might happen Why not take this all to the logical >>>>>>>> conclusion, and instead optimize an addAll: method to be used >>>>>>>> in >>>>>>>> grow, >>>>>>>> that assumes elements are not in set (and themselves are not >>>>>>>> duplicates)? F.ex. it could also add to tally just once, >>>>>>>> instead of >>>>>>>> one increase per element as the noCompareAdd: does. (an add >>>>>>>> method >>>>>>>> with no tally add, no = object check, and no growCheck). >>>>>>>> >>>>>>>> - The usual addAll: method can be sped up in a similar way, by >>>>>>>> performing one grow operation before adding elements if >>>>>>>> necessary >>>>>>>> (pessimistically assuming all elements are unique), then use an >>>>>>>> add >>>>>>>> method which does not check for grow need at all. (an add >>>>>>>> method >>>>>>>> with >>>>>>>> tally add and = object check, but no growCheck) >>>>>>>> If desirable, you could also shrink the capacity after adds are >>>>>>>> done, to match the real amount of new elements. >>>>>>>> >>>>>>>> Adding (1 to: 1200000) to a new Set with such an addAll: method >>>>>>>> took >>>>>>>> 1/2 to 1/3rd the time of the regular addAll: method, even with >>>>>>>> the >>>>>>>> lower grow cost of FasterSets. (Lower gains if Set is presized, >>>>>>>> and if >>>>>>>> hash-calculation is bigger part of cost) >>>>>>>> >>>>>>>> Note: If you want to stay ANSI-compatible, one tricky point in >>>>>>>> such an >>>>>>>> optimization is adding all objects in collection up to a nil >>>>>>>> element, >>>>>>>> then raising an error. (ie: must produce equivalent results to >>>>>>>> add: >>>>>>>> performed of each of collections elements in sequence ) >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Henry >>>>>>>> >>>>>>>> >>>>>>>> On Jun 25, 2009, at 8:55 03PM, Stéphane Ducasse wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> I would really like to assess FasterSets for Pharo1.1 >>>>>>>>> >>>>>>>>> Begin forwarded message: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> From: Ralph Boland <[email protected]> >>>>>>>>>> Date: June 25, 2009 8:30:13 PM CEDT >>>>>>>>>> To: Stéphane Ducasse <[email protected]> >>>>>>>>>> Subject: Re: about fasterTests >>>>>>>>>> >>>>>>>>>> 2009/6/25 Stéphane Ducasse <[email protected]>: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> we are interest to get better libraries in pharo. >>>>>>>>>>> If you want to join and help making sure that your code in >>>>>>>>>>> integrated in >>>>>>>>>>> pharo1.1 >>>>>>>>>>> please say it and may be join the pharo mailing-list. >>>>>>>>>>> >>>>>>>>>>> Stef >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Yes, I would like my code 'FasterSets' added to Pharo. >>>>>>>>>> I joined the pharo mailing list. >>>>>>>>>> Today I will mail you my MIT license release. >>>>>>>>>> Probably will take a couple of weeks (From Calgary, Canada) >>>>>>>>>> to get >>>>>>>>>> there. >>>>>>>>>> Once I am added as a contributer I will release my code to >>>>>>>>>> Pharo >>>>>>>>>> for >>>>>>>>>> verification. >>>>>>>>>> In the meantime if any Squeakers try out my code and report >>>>>>>>>> bugs I >>>>>>>>>> will fix them >>>>>>>>>> before releasing to Pharo. >>>>>>>>>> >>>>>>>>>> Some comments: >>>>>>>>>> >>>>>>>>>> 1) Since FasterSets modifies low level methods in class Set >>>>>>>>>> and >>>>>>>>>> its >>>>>>>>>> subclasses it >>>>>>>>>> may be incompatible with packages that create subclasses of >>>>>>>>>> Set >>>>>>>>>> and then >>>>>>>>>> override low level methods. If someone could send me a >>>>>>>>>> list of >>>>>>>>>> package add-ons >>>>>>>>>> to Pharo that do this then I can download them and make any >>>>>>>>>> modifications >>>>>>>>>> necessary for them to run with FasterSets and make these >>>>>>>>>> changes >>>>>>>>>> part of my >>>>>>>>>> release. Note that it is no more difficult to subclass from >>>>>>>>>> Set >>>>>>>>>> with FasterSets >>>>>>>>>> than without; it is just different and only different if you >>>>>>>>>> do >>>>>>>>>> low level things. >>>>>>>>>> >>>>>>>>>> 2) FasterSets uses a method 'noCompareOrGrowAdd:' which >>>>>>>>>> adds >>>>>>>>>> an element to a set without doing compares or growing the >>>>>>>>>> set. >>>>>>>>>> Needless to say it is a private method. >>>>>>>>>> I also have a public method 'nastyAddNew:' which, like >>>>>>>>>> 'noCompareOrGrowAdd:' adds an element to a set without >>>>>>>>>> checking >>>>>>>>>> if the >>>>>>>>>> element is already there but does do a grow if needed. >>>>>>>>>> This is useful (i.e. faster) in situations where you KNOW the >>>>>>>>>> object >>>>>>>>>> you are adding to a set is not there already. >>>>>>>>>> However if the object IS already there then you now have two >>>>>>>>>> of them in your set; i.e. you now have a subtle bug. >>>>>>>>>> I use 'nastyAddNew:' in my code but never made it a part of >>>>>>>>>> FasterSets >>>>>>>>>> because it is questionable whether or not such a method >>>>>>>>>> should be >>>>>>>>>> available generally. However, if I can decide I like the >>>>>>>>>> 'nastyAddNew:' >>>>>>>>>> method and am willing to use it despite its risks so can >>>>>>>>>> others >>>>>>>>>> so there >>>>>>>>>> is an argument for it being added. >>>>>>>>>> While my expectation is that you don't want >>>>>>>>>> 'nastyAddNew:' in >>>>>>>>>> Pharo >>>>>>>>>> I thought you should at least be aware of it. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Ralph Boland >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Pharo-project mailing list >>>>>>>>> [email protected] >>>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Pharo-project mailing list >>>>>>>> [email protected] >>>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Pharo-project mailing list >>>>>>> [email protected] >>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo- >>>>>>> project >>>>>>> . >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Pharo-project mailing list >>>>>> [email protected] >>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>>>>> >>>>>> >>>>>> >>>>>> >>>>> . >>>>> >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Pharo-project mailing list >>>> [email protected] >>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>>> . >>>> >>>> >>>> >>> _______________________________________________ >>> Pharo-project mailing list >>> [email protected] >>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project >>> >>> >> >> . >> >> > > _______________________________________________ > Pharo-project mailing list > [email protected] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > _______________________________________________ Pharo-project mailing list [email protected] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
