[Pharo-dev] Re: Array sum. is very slow

Stéphane Ducasse Sun, 09 Jan 2022 01:15:39 -0800

On my machine the results are about the same.

Squeak 5.3:


Test with 10000000 elements
Original #sum -> Time: 196 milliseconds, Total: 5.001448710680429e6
Naive #sum -> Time: 152 milliseconds, Total: 5.001448710680429e6
Inject #sum -> Time: 143 milliseconds, Total: 5.001448710680429e6



> On 8 Jan 2022, at 21:47, stephane ducasse <stephane.duca...@inria.fr> wrote:
> 
> Thanks, Benoit, for the snippet.
> I ran it in Pharo 10 and got:
> 
> Test with 10000000 elements
> Original #sum -> Time: 195 milliseconds, Total: 4.999452880735064e6
> Naive #sum -> Time: 153 milliseconds, Total: 4.999452880735063e6
> Inject #sum -> Time: 198 milliseconds, Total: 4.999452880735063e6
> 
> 
> In Pharo 9:
> Test with 10000000 elements
> Original #sum -> Time: 182 milliseconds, Total: 4.999339450212771e6
> Naive #sum -> Time: 148 milliseconds, Total: 4.999339450212771e6
> Inject #sum -> Time: 203 milliseconds, Total: 4.999339450212771e6
> 
> I’m interested in understanding why Pharo is slower. Maybe this is the impact 
> of the new full blocks. 
> We have started to play with the idea of regression benchmarks. 
> 
> S
> 
> 
>> On 7 Jan 2022, at 16:36, Benoit St-Jean via Pharo-dev 
>> <pharo-dev@lists.pharo.org> wrote:
>> 
>> Can you come up with a simple "base case" so we can find the 
>> bottleneck/problem?
>> 
>> I'm not sure about what you're trying to do.
>> 
>> What do you get if you try this in a workspace (adjust the value of n to 
>> what you want, I tested it with 10 million items).
>> 
>> Let's get this one step at a time!
>> 
>> 
>> 
>> |  floatArray  n  rng t1 t2 t3 r1 r2 r3 |
>> 
>> n := 10000000.
>> 
>> rng := Random new.
>> 
>> floatArray := Array new: n. 
>> floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng next].
>> 
>> t1 := Time millisecondsToRun: [r1 := floatArray sum].
>> t2 := Time millisecondsToRun: [| total |
>>     total := 0.
>>     floatArray do: [:each | total := total + each].
>>     r2 := total].
>> 
>> t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into: [:total :each | total + each]].
>> 
>> Transcript cr.
>> Transcript cr; show: 'Test with ', n printString, ' elements'.
>> Transcript cr; show: 'Original #sum -> Time: ', t1 printString, ' milliseconds, Total: ', r1 printString.
>> Transcript cr; show: 'Naive #sum -> Time: ', t2 printString, ' milliseconds, Total: ', r2 printString.
>> Transcript cr; show: 'Inject #sum -> Time: ', t3 printString, ' milliseconds, Total: ', r3 printString.
>> 
>> --------------------------
>> 
>> Here are the results I get on Squeak 5.3
>> 
>> Test with 10000000 elements
>> Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
>> Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
>> Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
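>> 
>> A single timing like this can be skewed by garbage collection or other 
>> activity in the image. As a minimal sketch, assuming floatArray from the 
>> snippet above is still in scope and that runs and best (illustrative names 
>> only) are added to the temporaries declared at the top, one measurement 
>> can be repeated and the fastest run kept:
>> 
>> "Time the same expression five times; runs holds the five results."
>> runs := (1 to: 5) collect: [:i | Time millisecondsToRun: [floatArray sum]].
>> "Keep the smallest timing as the least disturbed measurement."
>> best := runs inject: runs first into: [:a :b | a min: b].
>> Transcript cr; show: 'Best of 5 #sum -> ', best printString, ' milliseconds'.
>> 
>> The same pattern would apply to the naive loop and to the inject:into: 
>> variant.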
>> 
>> 
>> 
>> ----------------- 
>> Benoît St-Jean 
>> Yahoo! Messenger: bstjean 
>> Twitter: @BenLeChialeux 
>> Pinterest: benoitstjean 
>> Instagram: Chef_Benito
>> IRC: lamneth 
>> GitHub: bstjean
>> Blogue: endormitoire.wordpress.com <http://endormitoire.wordpress.com/> 
>> "A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
>> 
>> 
>> On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin 
>> <jlhouc...@gmail.com> wrote:
>> 
>> 
>> I have written a micro benchmark which stresses a language in areas 
>> which are crucial to my application.
>> 
>> I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
>> PicoLisp, C, C++, Java and Julia.
>> 
>> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
>> amazing magic they have done.
>> 
>> Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. 
>> Pharo takes over 2 hours. :(
>> 
>> In my benchmark, if I comment out the sum and average of the array, it 
>> completes in 3.5 seconds.
>> And when I sum the array it gives the correct results, so I can verify 
>> its validity.
>> 
>> To illustrate, below is some sample code of what I am doing. I iterate 
>> over the array, do calculations on each value, update the array, and 
>> compute the sum and average at each value, simply to stress array 
>> access, sum, and average.
>> 
>> 28800 is simply derived from one-minute time series values for 5 days a 
>> week over 4 weeks (1440 * 5 * 4).
>> 
>> randarray := Array new: 28800.
>> 
>> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
>> 
>> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations here."
>>     randarray sum. randarray average ]] timeToRun.
>> 
>> randarrayttr. "0:00:00:36.135"
>> 
>> 
>> I do 2 loops with 100 iterations each.
>> 
>> randarrayttr * 200. "0:02:00:27"
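>> 
>> As a rough sketch of where that time goes: each pass of the loop above 
>> traverses the whole array once for #sum and, assuming #average is 
>> implemented as one more full pass (sum divided by size), once again for 
>> #average. The work therefore grows with the square of the array size. 
>> The names n, additions and nsPerAddition are only illustrative:
>> 
>> | n additions nsPerAddition |
>> n := 28800.
>> "n loop iterations, each doing 2 full passes over n elements"
>> additions := n * n * 2.                     "1658880000"
>> "36.135 seconds measured above, converted to nanoseconds per addition"
>> nsPerAddition := 36.135 * 1e9 / additions.  "roughly 21.8"
>> 
>> Under that assumption the timed loop performs about 1.66 billion element 
>> additions, so the measured 36 seconds works out to roughly 22 nanoseconds 
>> per addition.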
>> 
>> 
>> I learned early on in this adventure when dealing with compiled 
>> languages that if you don’t do a lot, the test may not last long enough 
>> to give any times.
>> 
>> Pharo is my preference. But this is an awful big gap in performance. 
>> When doing backtesting this is huge. Does my backtest take minutes, 
>> hours or days?
>> 
>> I am not a computer scientist nor expert in Pharo or Smalltalk. So I do 
>> not know if there is anything which can improve this.
>> 
>> 
>> However I have played around with several experiments of my #sum: method.
>> 
>> This implementation cuts the time on the above randarray in half.
>> 
>> sum: col
>> | sum |
>> sum := 0.
>> 1 to: col size do: [ :i |
>>      sum := sum + (col at: i) ].
>> ^ sum
>> 
>> randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations here."
>>     ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
>> randarrayttr2. "0:00:00:18.563"
>> 
>> And this one reduces it a little more.
>> 
>> sum10: col
>> | sum |
>> sum := 0.
>> 1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
>>      sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2))
>>          + (col at: (i + 3)) + (col at: (i + 4)) + (col at: (i + 5))
>>          + (col at: (i + 6)) + (col at: (i + 7)) + (col at: (i + 8))
>>          + (col at: (i + 9)) ].
>> ((col size quo: 10) * 10 + 1) to: col size do: [ :i |
>>      sum := sum + (col at: i)].
>> ^ sum
>> 
>> randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations here."
>>     ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
>> randarrayttr3. "0:00:00:14.592"
>> 
>> It closes the gap with plain Python 3 without NumPy. But that is a pretty low 
>> standard.
>> 
>> Any ideas, thoughts, wisdom, or directions to pursue?
>> 
>> Thanks
>> 
>> Jimmie
>> 
>
