Hi Jimmie, I made a couple more changes:
- I added SequenceableCollection>>#sum as an extension method:

    sum
        | sum |
        sum := 0.
        1 to: self size do: [ :each | sum := sum + (self at: each) ].
        ^ sum

  It is not 100% semantically the same as the original, but it works for our case here. This also optimises #average, BTW. This is the main one.

- I tried to avoid a couple of integer -> float conversions in normalize:

    normalize: n
        | nn |
        nn := n = 0
            ifTrue: [ 0.000123456789 ]
            ifFalse: [ n asFloat ].
        [ nn <= 0.0001 ] whileTrue: [ nn := nn * 10.0 ].
        [ nn >= 1.0 ] whileTrue: [ nn := nn * 0.1 ].
        ^ nn

- Avoided one assignment in loop1calc:j:n:

    loop1calc: i j: j n: n
        | v |
        v := n * (i + n) * (j - n) * 0.1234567.
        ^ self normalize: (v * v * v)

The time for 10 iterations is now halved:

===
Starting test for array size: 28800 iterations: 10

Creating array of size: 28800 timeToRun: 0:00:00:00.002

Starting loop 1 at: 2022-01-07T19:28:52.109011+01:00
Loop 1 time: nil
nsum: 11234.235001659386
navg: 0.3900776042242842

Starting loop 2 at: 2022-01-07T19:31:21.821784+01:00
Loop 2 time: 0:00:02:28.017
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test. TotalTime: 0:00:04:57.733
===

Sven

> On 7 Jan 2022, at 16:30, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>
>> On 7 Jan 2022, at 16:05, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>
>> Hello Sven,
>>
>> I went and removed the Stdouts that you mention and other timing code from the loops.
>>
>> I am running the test now, to see if that makes much difference. I do not think it will.
>>
>> The reason I put that in there is because it takes so long to run. It can be frustrating to wait and wait and not know whether your test is doing anything or not. So I put the code in to let me know.
>>
>> One of your parameters is incorrect. It is 100 iterations, not 10.
>
> Ah, I misread the Python code: at the top it says reps = 10, while at the bottom it does indeed say doit(100).
>
> So the time should be multiplied by 10.
>
> The logging, esp. the #flush, will slow things down.
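Since the whole point of the exercise is keeping the Pharo and plain-Python versions of the benchmark semantically in step, the normalize: logic above can be mirrored in Python for spot-checking (a sketch; the function name is an assumption, and, as in the Smalltalk version, a non-negative input is assumed so the loops terminate):

```python
def normalize(n):
    """Scale n into the interval (0.0001, 1.0), mirroring normalize: above."""
    # A zero input is replaced by the same small constant the Smalltalk
    # code uses, so neither while loop can run forever on zero.
    nn = 0.000123456789 if n == 0 else float(n)
    while nn <= 0.0001:  # scale tiny values up
        nn *= 10.0
    while nn >= 1.0:     # scale large values down
        nn *= 0.1
    return nn
```

Values already inside the interval pass through unchanged, which makes it easy to cross-check both implementations on a handful of inputs.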
> But removing the MessageTally spy is important too.
>
> The general implementation of #sum is not optimal in the case of a fixed array. Consider:
>
> data := Array new: 1e5 withAll: 0.5.
>
> [ data sum ] bench. "'494.503 per second'"
>
> [ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. "'680.128 per second'"
>
> [ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. sum ] bench. "'1033.180 per second'"
>
> As others have remarked: doing #average right after #sum is doing the same thing twice. But maybe that is not the point.
>
>> I learned early on in this experiment that I have to do a large number of iterations or C, C++, Java, etc. are too fast to have comprehensible results.
>>
>> I can tell if any of the implementations is incorrect by the final nsum. All implementations must produce the same result.
>>
>> Thanks for the comments.
>>
>> Jimmie
>>
>> On 1/7/22 07:40, Sven Van Caekenberghe wrote:
>>> Hi Jimmie,
>>>
>>> I loaded your code in Pharo 9 on my MacBook Pro (Intel i5), macOS 12.1.
>>>
>>> I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either), as well as the MessageTally spyOn: from #run (slows things down).
>>>
>>> Then I ran your code with:
>>>
>>> [ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
>>>
>>> which gave me "0:00:09:31.338"
>>>
>>> The console output was:
>>>
>>> ===
>>> Starting test for array size: 28800 iterations: 10
>>>
>>> Creating array of size: 28800 timeToRun: 0:00:00:00.031
>>>
>>> Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
>>> Loop 1 time: nil
>>> nsum: 11234.235001659388
>>> navg: 0.39007760422428434
>>>
>>> Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
>>> Loop 2 time: 0:00:04:44.593
>>> nsum: 11245.697629561537
>>> navg: 0.3904756121375534
>>>
>>> End of test. TotalTime: 0:00:09:31.338
>>> ===
>>>
>>> Which would be twice as fast as Python, if I got the parameters correct.
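The same three-way experiment (generic #sum vs. an explicit accumulation loop) can be sketched on the Python side with the standard timeit module (a sketch; absolute numbers are machine-dependent, and note that in CPython the C-implemented builtin sum usually beats a hand-written loop, i.e. the opposite ordering from the Pharo figures above):

```python
from timeit import timeit

data = [0.5] * 100_000

def manual_sum(xs):
    # explicit accumulation, analogous to the
    # "sum := sum + (data at: each)" variant above
    total = 0.0
    for x in xs:
        total += x
    return total

# both variants must agree before the timings mean anything
assert sum(data) == manual_sum(data)

print("builtin sum :", timeit(lambda: sum(data), number=100))
print("manual loop :", timeit(lambda: manual_sum(data), number=100))
```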
>>> Sven
>>>
>>>> On 7 Jan 2022, at 13:19, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>
>>>> As I stated, this is a micro-benchmark and very much not anything resembling a real app. Your comments are true if you are writing a real app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.
>>>>
>>>> Also, as I stated, the test has to be sufficient to stress faster languages or it is meaningless.
>>>>
>>>> If I remove the #sum and #average calls from the inner loops, this is what we get:
>>>>
>>>> Julia 0.2256 seconds
>>>> Python 5.318 seconds
>>>> Pharo 3.5 seconds
>>>>
>>>> That test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging, which is done a lot, in lots of places, in every iteration.
>>>>
>>>> If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.
>>>>
>>>> Full test:
>>>>
>>>> Julia 1.13 minutes
>>>> Python 24.02 minutes
>>>> Pharo 2:09:04
>>>>
>>>> Code for the above is now published. You can let me know if I am doing something unequal to the various languages.
>>>>
>>>> And just remember: anything you do which sufficiently changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.
>>>>
>>>> Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python 3, no NumPy, just using the default list [] for the array.
>>>>
>>>> Julia is a whole other world. It is faster than NumPy. This is their domain and they optimize, optimize, optimize all the math. In fact, they have reached the point that some pure Julia code beats pure Fortran.
>>>>
>>>> In all of this I just want Pharo to do the best it can.
>>>> With the above results, unless you already had an investment in Pharo, you wouldn't even look. :(
>>>>
>>>> Thanks for exploring this with me.
>>>>
>>>> Jimmie
>>>>
>>>> On 1/6/22 18:24, John Brant wrote:
>>>>> On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>>> No, it is an array of floats. The only integers in the test are the indexes of the loops.
>>>>>>
>>>>>> Number random. "generates a float 0.8188008774329387"
>>>>>>
>>>>>> So the randarray below is an array of 28800 floats.
>>>>>>
>>>>>> It just felt so wrong to me that Python 3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...
>>>>>>
>>>>>> I am new to Iceberg and have never shared anything on GitHub, so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app, but it does stress a language in areas important to my app.
>>>>>>
>>>>>> https://github.com/jlhouchin/LanguageTestPharo
>>>>>>
>>>>>> Let me know if there is anything else I can do to help solve this problem.
>>>>>>
>>>>>> I am a lone developer in my spare time. So my apologies for any ugly code.
>>>>>>
>>>>> Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:
>>>>>
>>>>> 1 to: nsize do: [ :j | | n |
>>>>>     n := narray at: j.
>>>>>     narray at: j put: (self loop1calc: i j: j n: n).
>>>>>     nsum := narray sum.
>>>>>     navg := narray average ]
>>>>>
>>>>> As a result, you are calculating the sum of the 28,800-element array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds.
>>>>> If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will result in almost a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs. the >1.6 billion that you are performing.
>>>>>
>>>>> John Brant
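John Brant's last suggestion, maintaining nsum incrementally instead of re-summing the array after every element update, can be sketched in Python (loop1calc here is a stand-in modelled on the Smalltalk above with the normalization step omitted; all names are assumptions). One caveat: with floats the running sum can drift a few ulps away from a full re-summation, which matters if the final nsum is used to check implementations against each other across languages.

```python
def loop1calc(i, j, n):
    # stand-in for loop1calc:j:n: above (normalization omitted)
    v = n * (i + n) * (j - n) * 0.1234567
    return v * v * v

def loop_recompute(narray, i):
    # original shape: re-sum the whole array after every update -> O(n^2)
    for j in range(len(narray)):
        narray[j] = loop1calc(i, j + 1, narray[j])
        nsum = sum(narray)
    return nsum

def loop_incremental(narray, i):
    # suggested shape: adjust the running sum by each delta -> O(n)
    nsum = sum(narray)
    for j in range(len(narray)):
        new = loop1calc(i, j + 1, narray[j])
        nsum += new - narray[j]
        narray[j] = new
    return nsum
```

On a 28,800-element array the incremental version does one initial sum plus two arithmetic updates per element, instead of 28,800 full 28,800-element sums per pass.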