Jimmy, > On 15 Jan 2015, at 19:46, Jimmie Houchin <[email protected]> wrote: > > I do not understand what is happening here. > > oc := OrderedCollection new: 10000. > oc size. "0" > > a := Array new: 10000. > a size. "10000" > > So it seems that the #new: isn't creating an OrderedCollection of the size we > pass in. > > I discovered this when I attempted to do the identical test but with an Array > instead of an OrderedCollection. Since the Array doesn't #add: I had to > change the code to #at:put:.
This is all normal and by design. OrderedCollection is a growable collection, designed so you can add elements at either end. The thread's discussion is about growing strategies. Consider, (OrderedCollection new: 100) capacity = 100 it means it has preallocated room for a 100 elements, but the way it is managed is tricky, as people discovered. > When I noticed all of the OC code did #add: even if supposedly setting its > size to 10000(0). I tried to use #at:put: in the OrderedCollection but it > failed because their was no index 1, because the size was still 0. > > I naively would have thought that OrderedCollection new: 10000 would return > an OrderedCollection of 10000 empty/nil elements and that I could #at:put: > into any of those 10000 slots and #add if I needed to go beyond those 10000. > > Tests done on a laptop with Windows 7, Pharo 4 latest image, latest stable > vm, 3rd gen i7 processor, 12gb ram. > > I did all of the tests in Pharo 4 and added my Array test. > 0:00:00:04.37 > 0:00:00:03.183 > 0:00:00:03.674 > 0:00:00:03.753 > 0:00:00:01.647 and with an Array instead of OrderedCollection > > So we can see if we know the max size and create the collection with that > size it should improve performance. > > I also do not understand why Pharo is so slow on this? > > With the preallocated Array above it took 1.647 seconds. > > Python 3.4.2, 0.003 seconds with 10,000 ints, and 0.17 with 1,000,000 ints. > Julia 3.5, 0.005 seconds with 1,000,000 Int64. > > Now I don't expect Pharo to keep up with Julia. But we aren't even in the > neighborhood with Python. > > I really would like to use Pharo. It really is my favorite language and > environment. But these performance numbers give me pause. There are differently sized loops in this thread, what exactly did you try ? What is the Python code that you ran ? Benchmarking is always dangerous, you must make absolutely clear what you are running and comparing. Sven > Jimmie > > > On 1/14/2015 3:53 AM, Max Leske wrote: >> Hi Alex >> >> I thought the following little experiment might interest you. >> >> I was trying to find out (as per your suggestion) how big the difference >> would be between: >> - setting the array in OrderedCollection to nil and initializing it to size >> 10 on first access >> - initializing the array in OrderedCollection to size 0. >> >> Expectation: #grow will take time, ergo, using an empty array should be >> slower than using >> a preallocated array. >> >> Experiment (Pharo 1.3 on SqueakVM): >> >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 100000. >> 1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 17566 >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 10000. >> 1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 19717 >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 0. >> 1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 17598 >> >> >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 1000. >> 1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 18084 >> >> >> >> Result: growing a collection is always faster (or at least nearly equally >> fast)! >> >> Explanation: What makes preinitialized collections so slow is the fact the >> the firstIndex variable will be set to >> the first third of the array (3333 in this case). When the last >> position in the array is reached, all elements >> are copied to the beginning of the array (2n operations). This copy >> operation is really slow (which can be seen >> from the first example with an oversized collection that does not >> require copying because the last position >> of the array is never reached). >> >> >> Experiment (Pharo 4 on PharoVM): >> (I’m using bigger numbers here to compensate for the faster VM :) ) >> >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 1000000. >> 1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:01:09.225" >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 100000. >> 1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:37.653" >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 0. >> 1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:40.152" >> >> >> >> [ 10000 timesRepeat: [ | c | c := OrderedCollection new: 10000. >> 1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:42.121” >> >> >> >> Result: growing a collection is easily fast enough, although minute >> performance gains are possible with preinitialized collections. >> >> Explanation: #makeRoomAtLast has been improved. Also: firstIndex is now set >> to the beginning of the array, preventing the copy >> operation that made the former measurements so slow. >> The run with the oversized collection is only so slow because it >> apparently takes a lot of time to assign the large array (?). >> >> >> Consequence of my little experiment: My initial implementation of the idea >> in you paper set the array in OrderedCollection to nil >> under the assumption that grow operations are very costly. That choice >> leads to many changes in methods of OrderedCollection >> where it is necessary to check if the array has been initialized yet. >> This introduces overhead and makes the code more error prone. >> I see now that it makes much more sense to use an empty array, the >> performance lost by growing ist neglectable. >> >> It is also interesting to note that there almost never seems to be a >> good reason to initialize an OrderedCollection with a large array. >> >> >> Cheers, >> Max > >
