[Pharo-dev] Re: Array sum. is very slow

Sven Van Caekenberghe Fri, 07 Jan 2022 07:31:16 -0800

> On 7 Jan 2022, at 16:05, Jimmie Houchin <[email protected]> wrote:
> 
> Hello Sven,
> 
> I went and removed the Stdouts that you mention and other timing code from 
> the loops.
> 
> I am running the test now, to see if that makes much difference. I do not 
> think it will.
> 
> The reason I put that in there is because it takes so long to run. It can be 
> frustrating to wait and wait and not know if your test is doing anything or 
> not. So I put the code in to let me know.
> 
> One of your parameters is incorrect. It is 100 iterations not 10.

Ah, I misread the Python code: at the top it says reps = 10, while at the 
bottom it does indeed say doit(100).

So the time should be multiplied by 10.

The logging, esp. the #flush, will slow things down. But removing the 
message tally spy is important too.

The general implementation of #sum is not optimal in the case of a fixed array. 
Consider:

data := Array new: 1e5 withAll: 0.5.

[ data sum ] bench. "'494.503 per second'"

[ | sum | sum := 0.
  data do: [ :each | sum := sum + each ].
  sum ] bench. "'680.128 per second'"

[ | sum | sum := 0.
  1 to: 1e5 do: [ :each | sum := sum + (data at: each) ].
  sum ] bench. "'1033.180 per second'"
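
For reference on the Python side, here is a minimal sketch of the same three summation styles. This is not Jimmie's benchmark code; the function names are hypothetical, and the relative speeds in CPython will not necessarily match the Pharo numbers above, since the built-in sum runs its loop in C.

```python
def sum_builtin(data):
    # Built-in sum, the counterpart of `data sum`.
    return sum(data)

def sum_foreach(data):
    # Counterpart of `data do: [ :each | sum := sum + each ]`.
    total = 0
    for each in data:
        total += each
    return total

def sum_indexed(data):
    # Counterpart of `1 to: n do: [ :i | sum := sum + (data at: i) ]`,
    # using 0-based indexes as Python lists do.
    total = 0
    for i in range(len(data)):
        total += data[i]
    return total

# Same shape as `Array new: 1e5 withAll: 0.5`.
data = [0.5] * 100_000
print(sum_builtin(data), sum_foreach(data), sum_indexed(data))
```

Each function can be timed separately with the standard timeit module to get numbers comparable to #bench.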

As others have remarked: doing #average right after #sum is doing the same 
thing twice. But maybe that is not the point.
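
A minimal Python sketch of avoiding that duplicate work, assuming a non-empty list of floats: walk the data once for the sum and derive the average from it. The name sum_and_average is hypothetical and does not appear in the benchmark code.

```python
def sum_and_average(values):
    """Return (sum, average) from a single pass over values.

    Assumes values is non-empty; an empty list would divide by zero.
    """
    total = 0.0
    for v in values:
        total += v
    # The average reuses the total instead of summing a second time.
    return total, total / len(values)

nsum, navg = sum_and_average([1.0, 2.0, 3.0, 6.0])
print(nsum, navg)
```

For the change to be fair, it would have to be made in every language under test.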

> I learned early on in this experiment that I have to do a large number of 
> iterations or C, C++, Java, etc. are too fast to give comprehensible results.
> 
> I can tell if any of the implementations is incorrect by the final nsum. All 
> implementations must produce the same result.
> 
> Thanks for the comments.
> 
> Jimmie
> 
> 
> On 1/7/22 07:40, Sven Van Caekenberghe wrote:
>> Hi Jimmie,
>> 
>> I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1
>> 
>> I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) 
>> (not done in Python either) as well as the MessageTally spyOn: from #run 
>> (slows things down).
>> 
>> Then I ran your code with:
>> 
>> [ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
>> 
>> which gave me "0:00:09:31.338"
>> 
>> The console output was:
>> 
>> ===
>> Starting test for array size: 28800  iterations: 10
>> 
>> Creating array of size: 28800   timeToRun: 0:00:00:00.031
>> 
>> Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
>> Loop 1 time: nil
>> nsum: 11234.235001659388
>> navg: 0.39007760422428434
>> 
>> Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
>> Loop 2 time: 0:00:04:44.593
>> nsum: 11245.697629561537
>> navg: 0.3904756121375534
>> 
>> End of test.  TotalTime: 0:00:09:31.338
>> ===
>> 
>> Which would be twice as fast as Python, if I got the parameters correct.
>> 
>> Sven
>> 
>>> On 7 Jan 2022, at 13:19, Jimmie Houchin <[email protected]> wrote:
>>> 
>>> As I stated, this is a micro benchmark and very much not anything 
>>> resembling a real app. Your comments are true if you are writing your app. 
>>> But if you want to stress the language, you are going to do things which 
>>> are seemingly nonsensical and abusive.
>>> 
>>> Also, as I stated, the test has to be sufficient to stress faster 
>>> languages or it is meaningless.
>>> 
>>> If I remove the #sum and the #average calls from the inner loops, this is 
>>> what we get.
>>> 
>>> Julia      0.2256 seconds
>>> Python   5.318  seconds
>>> Pharo    3.5    seconds
>>> 
>>> This test does not sufficiently stress the language. Nor does it provide 
>>> any valuable insight into summing and averaging which is done a lot, in 
>>> lots of places in every iteration.
>>> 
>>> Notice that the inner loop changes the array every iteration, so every 
>>> call to #sum and #average is getting different data.
>>> 
>>> Full Test
>>> 
>>> Julia     1.13  minutes
>>> Python   24.02 minutes
>>> Pharo    2:09:04
>>> 
>>> Code for the above is now published. You can let me know if I am doing 
>>> something unequal to the various languages.
>>> 
>>> And just remember, anything you do which sufficiently changes the test has 
>>> to be done in all the languages to give a fair test. This isn't a 
>>> "let's make Pharo look good" test. I do want Pharo to look good, but honestly.
>>> 
>>> Yes, I know that I can bind to BLAS or other external libraries. But that 
>>> is not a test of Pharo. The Python is plain Python3, no Numpy, just using 
>>> the default list [] for the array.
>>> 
>>> Julia is a whole other world. It is faster than Numpy. This is their domain 
>>> and they optimize, optimize, optimize all the math. In fact they have 
>>> reached the point that some pure Julia code beats pure Fortran.
>>> 
>>> In all of this I just want Pharo to do the best it can.
>>> 
>>> With the above results unless you already had an investment in Pharo, you 
>>> wouldn't even look. :(
>>> 
>>> Thanks for exploring this with me.
>>> 
>>> 
>>> Jimmie
>>> 
>>> 
>>> 
>>> 
>>> On 1/6/22 18:24, John Brant wrote:
>>>> On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <[email protected]> wrote:
>>>>> No, it is an array of floats. The only integers in the test are in the 
>>>>> indexes of the loops.
>>>>> 
>>>>> Number random. "generates a float  0.8188008774329387"
>>>>> 
>>>>> So in the randarray below it is an array of 28800 floats.
>>>>> 
>>>>> It just felt so wrong to me that Python3 was so much faster. I don't care 
>>>>> if Nim, Crystal, Julia are faster. But...
>>>>> 
>>>>> 
>>>>> I am new to Iceberg and have never shared anything on Github so this is 
>>>>> all new to me. I uploaded my language test so you can see what it does. 
>>>>> It is a micro-benchmark. It does things that are not realistic in an app. 
>>>>> But it does stress a language in areas important to my app.
>>>>> 
>>>>> 
>>>>> https://github.com/jlhouchin/LanguageTestPharo
>>>>> 
>>>>> 
>>>>> Let me know if there is anything else I can do to help solve this problem.
>>>>> 
>>>>> I am a lone developer in my spare time. So my apologies for any ugly code.
>>>>> 
>>>> Are you sure that you have the same algorithm in Python? You are calling 
>>>> sum and average inside the loop where you are modifying the array:
>>>> 
>>>>    1 to: nsize do: [ :j || n |
>>>>            n := narray at: j.
>>>>            narray at: j put: (self loop1calc: i j: j n: n).
>>>>            nsum := narray sum.
>>>>            navg := narray average ]
>>>> 
>>>> As a result, you are calculating the sum of the 28,800 size array 28,800 
>>>> times (plus another 28,800 times for the average). If I write a similar 
>>>> loop in Python, it looks like it would take almost 9 minutes on my machine 
>>>> without using numpy to calculate the sum. The Pharo code takes ~40 
>>>> seconds. If this is really how the code should be, then I would change it 
>>>> to not call sum twice (once for sum and once in average). This will result 
>>>> in almost a 2x speedup. You could also modify the algorithm to update the 
>>>> nsum value in the loop instead of summing the array each time. I think the 
>>>> updating would require <120,000 math ops vs the >1.6 billion that you are 
>>>> performing.
>>>> 
>>>> 
>>>> John Brant
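
A minimal Python sketch of the incremental update John describes, under stated assumptions: update_all and the calc callback are hypothetical stand-ins for the inner loop and #loop1calc:j:n:, not code from the repository. With a 28,800 element array, re-summing twice per element costs about 28,800 x 28,800 x 2, roughly 1.66 billion additions per outer iteration, while the update below does a handful of operations per element.

```python
def update_all(narray, calc):
    """Rewrite every slot with calc(j, old), keeping nsum current.

    Assumes narray is a non-empty list of floats. calc is a
    hypothetical stand-in for the per-element calculation.
    """
    size = len(narray)
    nsum = sum(narray)            # one full pass up front
    navg = nsum / size
    for j, old in enumerate(narray):
        new = calc(j, old)
        narray[j] = new
        nsum += new - old         # O(1) adjustment instead of an O(n) re-sum
        navg = nsum / size
    return nsum, navg

values = [1.0, 2.0, 3.0, 4.0]
nsum, navg = update_all(values, lambda j, old: old * 2 + j)
print(values, nsum, navg)
```

One caveat: with floats, a running update can differ from a fresh sum in the last digits because of rounding, so the final nsum may not match the other implementations bit for bit.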