Hi Jimmy,
I made a couple more changes:
- I added
SequenceableCollection>>#sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum
as an extension method. It is not 100% semantically identical to the original,
but it works for our case here. This also optimises #average, by the way. This
is the main change.
- I tried to avoid a couple of integer -> float conversions in
normalize: n
| nn |
nn := n = 0
ifTrue: [ 0.000123456789 ]
ifFalse: [ n asFloat ].
[ nn <= 0.0001 ] whileTrue: [ nn := nn * 10.0 ].
[ nn >= 1.0 ] whileTrue: [ nn := nn * 0.1 ].
^ nn
- Avoided one assignment in
loop1calc: i j: j n: n
| v |
v := n * (i+n) * (j-n) * 0.1234567.
^ self normalize: (v*v*v)
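For reference, the plain-Python counterparts of these two methods might look as
follows (a sketch for comparison only, not taken from the published benchmark
code; like the Smalltalk above, it assumes the value fed to normalize is
non-negative):

```python
def normalize(n):
    # Map a positive value into the open interval (0.0001, 1.0); zero is
    # replaced by a small constant, mirroring the Smalltalk method above.
    # Negative inputs would not terminate, exactly as in the original.
    nn = 0.000123456789 if n == 0 else float(n)
    while nn <= 0.0001:
        nn *= 10.0
    while nn >= 1.0:
        nn *= 0.1
    return nn

def loop1calc(i, j, n):
    # Same polynomial as the Smalltalk loop1calc:i:j:n: above.
    v = n * (i + n) * (j - n) * 0.1234567
    return normalize(v * v * v)
```

Whatever the (non-negative) input, the result always lands strictly between
0.0001 and 1.0; for example, normalize(12345.678) gives roughly 0.12345678.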
The time for 10 iterations is now halved:
===
Starting test for array size: 28800 iterations: 10
Creating array of size: 28800 timeToRun: 0:00:00:00.002
Starting loop 1 at: 2022-01-07T19:28:52.109011+01:00
Loop 1 time: nil
nsum: 11234.235001659386
navg: 0.3900776042242842
Starting loop 2 at: 2022-01-07T19:31:21.821784+01:00
Loop 2 time: 0:00:02:28.017
nsum: 11245.697629561537
navg: 0.3904756121375534
End of test. TotalTime: 0:00:04:57.733
===
Sven
> On 7 Jan 2022, at 16:30, Sven Van Caekenberghe <[email protected]> wrote:
>
>
>
>> On 7 Jan 2022, at 16:05, Jimmie Houchin <[email protected]> wrote:
>>
>> Hello Sven,
>>
>> I went and removed the Stdouts that you mention and other timing code from
>> the loops.
>>
>> I am running the test now, to see if that makes much difference. I do not
>> think it will.
>>
>> The reason I put that in there is because it take so long to run. It can be
>> frustrating to wait and wait and not know if your test is doing anything or
>> not. So I put the code in to let me know.
>>
>> One of your parameters is incorrect. It is 100 iterations, not 10.
>
> Ah, I misread the Python code: at the top it says reps = 10, while at the
> bottom it does indeed say doit(100).
>
> So the time should be multiplied by 10.
>
> The logging, especially the #flush, will slow things down. But removing the
> MessageTally spy is important too.
>
> The general implementation of #sum is not optimal in the case of a fixed
> array. Consider:
>
> data := Array new: 1e5 withAll: 0.5.
>
> [ data sum ] bench. "'494.503 per second'"
>
> [ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench.
> "'680.128 per second'"
>
> [ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ].
> sum ] bench. "'1033.180 per second'"
>
> As others have remarked: doing #average right after #sum is doing the same
> thing twice. But maybe that is not the point.
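[Inline note: in plain Python terms, the fused single pass looks like this -- a
sketch with a hypothetical helper name, just to illustrate deriving the average
from the one sum instead of traversing the data twice:]

```python
def sum_and_avg(data):
    # One traversal: the average reuses the sum instead of
    # walking the collection a second time.
    total = 0.0
    for each in data:
        total += each
    return total, total / len(data)

print(sum_and_avg([0.5] * 4))  # (2.0, 0.5)
```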
>
>> I learned early on in this experiment that I have to do a large number of
>> iterations, or C, C++, Java, etc. are too fast to give meaningful results.
>>
>> I can tell if any of the implementations is incorrect by the final nsum. All
>> implementations must produce the same result.
>>
>> Thanks for the comments.
>>
>> Jimmie
>>
>>
>> On 1/7/22 07:40, Sven Van Caekenberghe wrote:
>>> Hi Jimmie,
>>>
>>> I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1
>>>
>>> I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2)
>>> (not done in Python either) as well as the MessageTally spyOn: from #run
>>> (slows things down).
>>>
>>> Then I ran your code with:
>>>
>>> [ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
>>>
>>> which gave me "0:00:09:31.338"
>>>
>>> The console output was:
>>>
>>> ===
>>> Starting test for array size: 28800 iterations: 10
>>>
>>> Creating array of size: 28800 timeToRun: 0:00:00:00.031
>>>
>>> Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
>>> Loop 1 time: nil
>>> nsum: 11234.235001659388
>>> navg: 0.39007760422428434
>>>
>>> Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
>>> Loop 2 time: 0:00:04:44.593
>>> nsum: 11245.697629561537
>>> navg: 0.3904756121375534
>>>
>>> End of test. TotalTime: 0:00:09:31.338
>>> ===
>>>
>>> Which would be twice as fast as Python, if I got the parameters correct.
>>>
>>> Sven
>>>
>>>> On 7 Jan 2022, at 13:19, Jimmie Houchin <[email protected]> wrote:
>>>>
>>>> As I stated, this is a micro-benchmark and very much not anything
>>>> resembling a real app. Your comments are true if you are writing a real
>>>> app. But if you want to stress the language, you are going to do things
>>>> which seem nonsensical and abusive.
>>>>
>>>> Also, as I stated, the test has to be sufficient to stress the faster
>>>> languages or it is meaningless.
>>>>
>>>> If I remove the #sum and the #average calls from the inner loops, this is
>>>> what we get.
>>>>
>>>> Julia 0.2256 seconds
>>>> Python 5.318 seconds
>>>> Pharo 3.5 seconds
>>>>
>>>> This test does not sufficiently stress the language. Nor does it provide
>>>> any valuable insight into summing and averaging, which is done a lot, in
>>>> lots of places, in every iteration.
>>>>
>>>> Note that the inner loop changes the array every iteration, so every
>>>> call to #sum and #average is getting different data.
>>>>
>>>> Full Test
>>>>
>>>> Julia 1.13 minutes
>>>> Python 24.02 minutes
>>>> Pharo 2:09:04
>>>>
>>>> Code for the above is now published. You can let me know if I am doing
>>>> something unequal to the various languages.
>>>>
>>>> And just remember: anything you do which significantly changes the test
>>>> has to be done in all the languages to give a fair comparison. This isn't
>>>> a "let's make Pharo look good" test. I do want Pharo to look good, but
>>>> honestly.
>>>>
>>>> Yes, I know that I can bind to BLAS or other external libraries. But that
>>>> is not a test of Pharo. The Python is plain Python 3, no NumPy, just using
>>>> the default list [] for the array.
>>>>
>>>> Julia is a whole other world. It is faster than NumPy. This is their
>>>> domain and they optimize, optimize, optimize all the math. In fact they
>>>> have reached the point where some pure Julia code beats pure Fortran.
>>>>
>>>> In all of this I just want Pharo to do the best it can.
>>>>
>>>> With the above results, unless you already had an investment in Pharo,
>>>> you wouldn't even look. :(
>>>>
>>>> Thanks for exploring this with me.
>>>>
>>>>
>>>> Jimmie
>>>>
>>>>
>>>>
>>>>
>>>> On 1/6/22 18:24, John Brant wrote:
>>>>> On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <[email protected]> wrote:
>>>>>> No, it is an array of floats. The only integers in the test are in the
>>>>>> indexes of the loops.
>>>>>>
>>>>>> Number random. "generates a float 0.8188008774329387"
>>>>>>
>>>>>> So in the randarray below it is an array of 28800 floats.
>>>>>>
>>>>>> It just felt so wrong to me that Python3 was so much faster. I don't
>>>>>> care if Nim, Crystal, Julia are faster. But...
>>>>>>
>>>>>>
>>>>>> I am new to Iceberg and have never shared anything on Github so this is
>>>>>> all new to me. I uploaded my language test so you can see what it does.
>>>>>> It is a micro-benchmark. It does things that are not realistic in an
>>>>>> app. But it does stress a language in areas important to my app.
>>>>>>
>>>>>>
>>>>>> https://github.com/jlhouchin/LanguageTestPharo
>>>>>>
>>>>>>
>>>>>> Let me know if there is anything else I can do to help solve this
>>>>>> problem.
>>>>>>
>>>>>> I am a lone developer in my spare time. So my apologies for any ugly
>>>>>> code.
>>>>>>
>>>>> Are you sure that you have the same algorithm in Python? You are calling
>>>>> sum and average inside the loop where you are modifying the array:
>>>>>
>>>>> 1 to: nsize do: [ :j || n |
>>>>> n := narray at: j.
>>>>> narray at: j put: (self loop1calc: i j: j n: n).
>>>>> nsum := narray sum.
>>>>> navg := narray average ]
>>>>>
>>>>> As a result, you are calculating the sum of the 28,800 size array 28,800
>>>>> times (plus another 28,800 times for the average). If I write a similar
>>>>> loop in Python, it looks like it would take almost 9 minutes on my
>>>>> machine without using numpy to calculate the sum. The Pharo code takes
>>>>> ~40 seconds. If this is really how the code should be, then I would
>>>>> change it to not call sum twice (once for sum and once in average). This
>>>>> will give almost a 2x speedup. You could also modify the algorithm
>>>>> to update the nsum value in the loop instead of summing the array each
>>>>> time. I think the updating would require <120,000 math ops vs the >1.6
>>>>> billion that you are performing.
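[Inline note: in Python terms, the incremental-update idea could be sketched
like this; `calc` is a stand-in for the per-element computation, and the code
is illustrative, not taken from the repository:]

```python
def loop1_incremental(narray, calc):
    # O(n) total work instead of O(n^2): keep a running sum and adjust it
    # by the delta of each replaced element, rather than re-summing the
    # whole array after every update.
    nsum = sum(narray)
    navg = nsum / len(narray)
    for j, old in enumerate(narray):
        new = calc(j, old)
        narray[j] = new
        nsum += new - old          # delta update
        navg = nsum / len(narray)  # average reuses the running sum
    return nsum, navg
```

Up to floating-point rounding, the returned nsum matches a full sum(narray)
recomputed at the end, while doing only a constant amount of extra work per
element.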
>>>>>
>>>>>
>>>>> John Brant