Re: [Pharo-dev] Implementing lazy loading for expandable collections (part 2)

Sven Van Caekenberghe Thu, 15 Jan 2015 11:14:14 -0800

Jimmy,

> On 15 Jan 2015, at 19:46, Jimmie Houchin <[email protected]> wrote:
> 
> I do not understand what is happening here.
> 
> oc := OrderedCollection new: 10000.
> oc size. "0"
> 
> a := Array new: 10000.
> a size. "10000"
> 
> So it seems that the #new: isn't creating an OrderedCollection of the size we 
> pass in.
> 
> I discovered this when I attempted to do the identical test but with an Array 
> instead of an OrderedCollection. Since the Array doesn't #add:   I had to 
> change the code to #at:put:.


This is all normal and by design.

OrderedCollection is a growable collection, designed so you can add elements at 
either end. The thread's discussion is about growing strategies.

Consider,

  (OrderedCollection new: 100) capacity = 100

it means it has preallocated room for a 100 elements, but the way it is managed 
is tricky, as people discovered.

> When I noticed all of the OC code did #add: even if supposedly setting its 
> size to 10000(0). I tried to use #at:put: in the OrderedCollection but it 
> failed because their was no index 1, because the size was still 0.
> 
> I naively would have thought that OrderedCollection new: 10000 would return 
> an OrderedCollection of 10000 empty/nil elements and that I could #at:put: 
> into any of those 10000 slots and #add if I needed to go beyond those 10000.
> 
> Tests done on a laptop with Windows 7, Pharo 4 latest image, latest stable 
> vm, 3rd gen i7 processor, 12gb ram.
> 
> I did all of the tests in Pharo 4 and added my Array test.
> 0:00:00:04.37
> 0:00:00:03.183
> 0:00:00:03.674
> 0:00:00:03.753
> 0:00:00:01.647 and with an Array instead of OrderedCollection
> 
> So we can see if we know the max size and create the collection with that 
> size it should improve performance.
> 
> I also do not understand why Pharo is so slow on this?
> 
> With the preallocated Array above it took 1.647 seconds.
> 
> Python 3.4.2,   0.003 seconds with 10,000 ints, and 0.17 with 1,000,000 ints.
> Julia 3.5, 0.005 seconds with 1,000,000 Int64.
> 
> Now I don't expect Pharo to keep up with Julia. But we aren't even in the 
> neighborhood with Python.
> 
> I really would like to use Pharo. It really is my favorite language and 
> environment. But these performance numbers give me pause.

There are differently sized loops in this thread, what exactly did you try ?

What is the Python code that you ran ?

Benchmarking is always dangerous, you must make absolutely clear what you are 
running and comparing.

Sven

> Jimmie
> 
> 
> On 1/14/2015 3:53 AM, Max Leske wrote:
>> Hi Alex
>> 
>> I thought the following little experiment might interest you.
>> 
>> I was trying to find out (as per your suggestion) how big the difference 
>> would be between:
>> - setting the array in OrderedCollection to nil and initializing it to size 
>> 10 on first access
>> - initializing the array in OrderedCollection to size 0.
>> 
>> Expectation: #grow will take time, ergo, using an empty array should be 
>> slower than using
>>      a preallocated array.
>> 
>> Experiment (Pharo 1.3 on SqueakVM):
>> 
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 100000.
>>  1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 17566
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 10000.
>>  1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 19717
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 0.
>>  1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 17598
>> 
>> 
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 1000.
>>  1 to: 10000 do: [ :i | c add: i ] ] ] timeToRun 18084
>> 
>> 
>> 
>> Result: growing a collection is always faster (or at least nearly equally 
>> fast)!
>> 
>> Explanation: What makes preinitialized collections so slow is the fact the 
>> the firstIndex variable will be set to
>>      the first third of the array (3333 in this case). When the last 
>> position in the array is reached, all elements
>>      are copied to the beginning of the array (2n operations). This copy 
>> operation is really slow (which can be seen
>>      from the first example with an oversized collection that does not 
>> require copying because the last position
>>      of the array is never reached).
>> 
>> 
>> Experiment (Pharo 4 on PharoVM):
>> (I’m using bigger numbers here to compensate for the faster VM :) )
>> 
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 1000000.
>>  1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:01:09.225"
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 100000.
>>  1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:37.653"
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 0.
>>  1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:40.152"
>> 
>> 
>> 
>> [ 10000 timesRepeat: [ | c | c  := OrderedCollection new: 10000.
>>  1 to: 100000 do: [ :i | c add: i ] ] ] timeToRun "0:00:00:42.121”
>> 
>> 
>> 
>> Result: growing a collection is easily fast enough, although minute 
>> performance gains are possible with preinitialized collections.
>> 
>> Explanation: #makeRoomAtLast has been improved. Also: firstIndex is now set 
>> to the beginning of the array, preventing the copy
>>      operation that made the former measurements so slow.
>>      The run with the oversized collection is only so slow because it 
>> apparently takes a lot of time to assign the large array (?).
>> 
>> 
>> Consequence of my little experiment: My initial implementation of the idea 
>> in you paper set the array in OrderedCollection to nil
>>      under the assumption that grow operations are very costly. That choice 
>> leads to many changes in methods of OrderedCollection
>>      where it is necessary to check if the array has been initialized yet. 
>> This introduces overhead and makes the code more error prone.
>>      I see now that it makes much more sense to use an empty array, the 
>> performance lost by growing ist neglectable.
>>      
>>      It is also interesting to note that there almost never seems to be a 
>> good reason to initialize an OrderedCollection with a large array.
>> 
>> 
>> Cheers,
>> Max
> 
>

Re: [Pharo-dev] Implementing lazy loading for expandable collections (part 2)

Reply via email to