Re: [Pharo-dev] Can we make this fast ?

[email protected] Wed, 19 Nov 2014 00:49:36 -0800

On Wed, Nov 19, 2014 at 12:33 AM, Eliot Miranda <[email protected]>
wrote:


>
>
> On Tue, Nov 18, 2014 at 3:24 PM, [email protected] <[email protected]>
> wrote:
>
>>
>> Le 18 nov. 2014 22:52, "Sven Van Caekenberghe" <[email protected]> a écrit :
>>
>> >
>> >
>> > > On 18 Nov 2014, at 22:28, Andreas Wacknitz <[email protected]> wrote:
>> > >
>> > >
>> > > Am 18.11.14 22:12, schrieb [email protected]:
>> > >> See
>> > >> https://news.ycombinator.com/item?id=8625280
>> > >>
>> > >> [ (1 to: 100000000) sum ] timeToRun
>> > >>
>> > >> Here we get out of the SmallInteger range...
>> > >>
>> > >> Apart FFI'ing the thing, do we have other options?
>> > >>
>> > >> Phil
>> > > Choose a better algorithm:
>> > > [100000000 * 10000001 / 2] timeToRun
>> > > should beat yours by several orders of magnitude.
>> > >
>> > > Andreas
>> >
>> > Haha, great !
>> >
>> >
>>
>> Sure cool.
>>
>> Now Pharo is slow on these loops with arithmetics and it is painful.
>>
>
> Yes, but that particular loop is low because in part it uses the sum
> convenience.  e.g.
>
> Collection>>sum
> "Compute the sum of all the elements in the receiver"
>
> ^self reduce:[:a :b| a + b]
>
> Collection>>reduce: binaryBlock
> "Apply the argument, binaryBlock cumulatively to the elements of the
> receiver.
> For sequenceable collections the elements will be used in order, for
> unordered
> collections the order is unspecified."
>
> | first nextValue |
> first := true.
> self do: [ :each |
> first
> ifTrue: [ nextValue := each. first := false ]
> ifFalse: [ nextValue := binaryBlock value: nextValue value: each ] ].
> first ifTrue: [ self errorEmptyCollection ].
> ^nextValue
>

I see that in Squeak 4.5.

Interestingly, Pharo (here 3.0) has a different implementation.

Collection>>sum

sum
"This is implemented using a variant of the normal inject:into: pattern.
The reason for this is that it is not known whether we're in the normal
number line, i.e. whether 0 is a good initial value for the sum.
Consider a collection of measurement objects, 0 would be the unitless
value and would not be appropriate to add with the unit-ed objects."
| sum sample |
sample := self anyOne.
sum := self inject: sample into: [:accum :each | accum + each].
^ sum - sample

and reduce: is:

Collection>>reduce: aBlock
"Fold the result of the receiver into aBlock. The argument aBlock must take
two or more arguments. It applies the argument, binaryBlock cumulatively to
the elements of the receiver. For sequenceable collections the elements
will be used in order, for unordered collections the order is unspecified."
"#(1 2 3) asSet reduce: [ :a :b | a + b ]
--> 1 + 2 + 3 = 6
#(1 2 3 4 5) asSet reduce: [ :a :b :c | a + b + c ]
--> 1 + 2 + 3 + 4 + 5 = 15"
 ^self asOrderedCollection reduce: aBlock

Which in SequencableCollection>>reduce: aBlock

is

reduce: aBlock
"Fold the result of the receiver into aBlock. The argument aBlock must take
two or more arguments. It applies the argument, binaryBlock cumulatively to
the elements of the receiver. For sequenceable collections the elements
will be used in order, for unordered collections the order is unspecified."
 "#(1 2 3) reduce: [ :a :b | a + b ]
--> 1 + 2 + 3 = 6
#(1 2 3 4 5) reduce: [ :a :b :c | a + b + c ]
--> 1 + 2 + 3 + 4 + 5 = 15"
 ^ self reduceLeft: aBlock


where reduceLeft: aBlock is

reduceLeft: aBlock
"Fold the result of the receiver from left to right into aBlock. The
argument aBlock must take two or more arguments."

"#(1 2 3) reduceLeft: [ :a :b | a - b ].
--> ((1 - 2) - 3) = -4
#(1 + 3 - 5) reduceLeft: [ :a :op :b | a perform: op with: b ].
--> ((1 + 3) - 5) = -1"

| arguments |
self emptyCheck.
arguments := Array new: aBlock argumentCount.
(arguments size = 0 or: [ (self size + 1) \\ (arguments size - 1) > 0 ])
ifTrue: [ self error: 'Collection size and block argument count do not
match.' ].
arguments at: 1 put: self first.
2 to: self size by: arguments size - 1 do: [ :index |
arguments
replaceFrom: 2 to: arguments size with: self startingAt: index;
at: 1 put: (aBlock valueWithArguments: arguments) ].
^ arguments first

That's quite a different way to look at things implementation wise.

Why those changes?



>
> | n |
> n := 100000000.
> {  [ (1 to: n) sum ] timeToRun. [| s | s := 0. (1 to: n) do: [:i| s := s +
> i]] timeToRun. [| s | s := 0. 1 to: n do: [:i| s := s + i]] timeToRun. [n *
> (n + 1) / 2] timeToRun}
>
> #(10112 9538 8432 0)
>

Pharo 3.0 gives:
Array(0:00:00:06.916 0:00:00:06.303 0:00:00:05.647 0:00:00:00)

Squeak 4.5 gives:
 #(8211 7844 6875 0)

Interestingly, the Pharo timeToRun is the Squeak durationToRun.

| n |
n := 100000000.
{  [ (1 to: n) sum ] durationToRun. [| s | s := 0. (1 to: n) do: [:i| s :=
s + i]] durationToRun. [| s | s := 0. 1 to: n do: [:i| s := s + i]]
durationToRun. [n * (n + 1) / 2] durationToRun}

{0:00:00:08.271 . 0:00:00:07.9 . 0:00:00:06.94 . 0:00:00:00}

Printout formats are also different :-)

So:


Pharo 3.0    (0:00:00:06.916   0:00:00:06.303 0:00:00:05.647  0:00:00:00)
Squeak 4.5 {0:00:00:08.271   0:00:00:07.900  0:00:00:06.940 . 0:00:00:00}

I tried it out on VW 7.10PUL 32bits:
| n |
n := 100000000.
"[(1 to: n) sum ] timeToRun."
[| s | s := 0. 1 to: n do: [:i| s := s + i]] timeToRun. 5.766674 seconds

I tried it out on VW 7.10PUL 64bits:

There is no sum in Collection as far as I can see.

| n |
n := 100000000.
"[(1 to: n) sum ] timeToRun."
[| s | s := 0. (1 to: n) do: [:i| s := s + i]] timeToRun.   3.171995 seconds

| n |
n := 100000000.
"[(1 to: n) sum ] timeToRun."
[| s | s := 0. 1 to: n do: [:i| s := s + i]] timeToRun.  235.993
milliseconds   Impressive!!! Hope we get that with the 64 bit VM too.

last case, 0 of course.




Eliot, why is a class variable faster?
>>
>
> The expression is a constant.  You can e.g. compute it at class
>  initialization and for descriptive purposes store the result in a class
> variable. That's usually quicker than computing anything every time.
> Standard examples are specialized dispatch tables, dictionaries, etc.
> Class variables are a convenient way of holding onto precomputed objects.
>

Thank you.

>
> Alex, I'll a look at OpenCL.
>>
>> Also, I tried with a whileTrue: loop which was 3 times faster than the
>> sum on Collection.
>>
>> How to profile these kinds of tight loops? The standard profiler is kind
>> of useless.
>> Is AndreasProfiler the way to go
>>
> IMO yes.  It is much more accurate about primitives than the standard
> profiler.
>

Thx.

Phil


> --
> best,
> Eliot
>

Re: [Pharo-dev] Can we make this fast ?

Reply via email to