Hi Jimmie, I made a couple more changes:
- I added SequenceableCollection>>#sum as an extension method:

    sum
        | sum |
        sum := 0.
        1 to: self size do: [ :each | sum := sum + (self at: each) ].
        ^ sum

  It is not 100% semantically the same as the original, but it works for our case here. This also optimises #average, BTW. This is the main one.

- I tried to avoid a couple of integer -> float conversions in normalize:

    normalize: n
        | nn |
        nn := n = 0
            ifTrue: [ 0.000123456789 ]
            ifFalse: [ n asFloat ].
        [ nn <= 0.0001 ] whileTrue: [ nn := nn * 10.0 ].
        [ nn >= 1.0 ] whileTrue: [ nn := nn * 0.1 ].
        ^ nn

- Avoided one assignment in loop1calc:j:n:

    loop1calc: i j: j n: n
        | v |
        v := n * (i + n) * (j - n) * 0.1234567.
        ^ self normalize: (v * v * v)

The time for 10 iterations is now halved:

===
Starting test for array size: 28800 iterations: 10

Creating array of size: 28800 timeToRun: 0:00:00:00.002

Starting loop 1 at: 2022-01-07T19:28:52.109011+01:00
Loop 1 time: nil
nsum: 11234.235001659386
navg: 0.3900776042242842

Starting loop 2 at: 2022-01-07T19:31:21.821784+01:00
Loop 2 time: 0:00:02:28.017
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test. TotalTime: 0:00:04:57.733
===

Sven

> On 7 Jan 2022, at 16:30, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>
>> On 7 Jan 2022, at 16:05, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>
>> Hello Sven,
>>
>> I went and removed the Stdouts that you mention and other timing code from the loops.
>>
>> I am running the test now, to see if that makes much difference. I do not think it will.
>>
>> The reason I put that in there is because it takes so long to run. It can be frustrating to wait and wait and not know whether your test is doing anything or not. So I put the code in to let me know.
>>
>> One of your parameters is incorrect. It is 100 iterations, not 10.
>
> Ah, I misread the Python code: at the top it says reps = 10, while at the bottom it does indeed say doit(100).
>
> So the time should be multiplied by 10.
>
> The logging, esp. the #flush, will slow things down.
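Since the whole point of the exercise is keeping the Pharo and plain-Python versions of the benchmark semantically in step, the normalize: logic above can be mirrored in Python for spot-checking (a sketch; the function name is an assumption, and, as in the Smalltalk version, a non-negative input is assumed so the loops terminate):

```python
def normalize(n):
    """Scale n into the interval (0.0001, 1.0), mirroring normalize: above."""
    # A zero input is replaced by the same small constant the Smalltalk
    # code uses, so neither while loop can run forever on zero.
    nn = 0.000123456789 if n == 0 else float(n)
    while nn <= 0.0001:  # scale tiny values up
        nn *= 10.0
    while nn >= 1.0:     # scale large values down
        nn *= 0.1
    return nn
```

Values already inside the interval pass through unchanged, which makes it easy to cross-check both implementations on a handful of inputs.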
> But removing the MessageTally spy is important too.
>
> The general implementation of #sum is not optimal in the case of a fixed array. Consider:
>
> data := Array new: 1e5 withAll: 0.5.
>
> [ data sum ] bench. "'494.503 per second'"
>
> [ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. "'680.128 per second'"
>
> [ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. sum ] bench. "'1033.180 per second'"
>
> As others have remarked: doing #average right after #sum is doing the same thing twice. But maybe that is not the point.
>
>> I learned early on in this experiment that I have to do a large number of iterations or C, C++, Java, etc. are too fast to have comprehensible results.
>>
>> I can tell if any of the implementations is incorrect by the final nsum. All implementations must produce the same result.
>>
>> Thanks for the comments.
>>
>> Jimmie
>>
>> On 1/7/22 07:40, Sven Van Caekenberghe wrote:
>>> Hi Jimmie,
>>>
>>> I loaded your code in Pharo 9 on my MacBook Pro (Intel i5), macOS 12.1.
>>>
>>> I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not done in Python either), as well as the MessageTally spyOn: from #run (slows things down).
>>>
>>> Then I ran your code with:
>>>
>>> [ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
>>>
>>> which gave me "0:00:09:31.338"
>>>
>>> The console output was:
>>>
>>> ===
>>> Starting test for array size: 28800 iterations: 10
>>>
>>> Creating array of size: 28800 timeToRun: 0:00:00:00.031
>>>
>>> Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
>>> Loop 1 time: nil
>>> nsum: 11234.235001659388
>>> navg: 0.39007760422428434
>>>
>>> Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
>>> Loop 2 time: 0:00:04:44.593
>>> nsum: 11245.697629561537
>>> navg: 0.3904756121375534
>>>
>>> End of test. TotalTime: 0:00:09:31.338
>>> ===
>>>
>>> Which would be twice as fast as Python, if I got the parameters correct.
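The same three-way experiment (generic #sum vs. an explicit accumulation loop) can be sketched on the Python side with the standard timeit module (a sketch; absolute numbers are machine-dependent, and note that in CPython the C-implemented builtin sum usually beats a hand-written loop, i.e. the opposite ordering from the Pharo figures above):

```python
from timeit import timeit

data = [0.5] * 100_000

def manual_sum(xs):
    # explicit accumulation, analogous to the
    # "sum := sum + (data at: each)" variant above
    total = 0.0
    for x in xs:
        total += x
    return total

# both variants must agree before the timings mean anything
assert sum(data) == manual_sum(data)

print("builtin sum :", timeit(lambda: sum(data), number=100))
print("manual loop :", timeit(lambda: manual_sum(data), number=100))
```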
>>> Sven
>>>
>>>> On 7 Jan 2022, at 13:19, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>
>>>> As I stated, this is a micro-benchmark and very much not anything resembling a real app. Your comments are true if you are writing a real app. But if you want to stress the language, you are going to do things which are seemingly nonsensical and abusive.
>>>>
>>>> Also, as I stated, the test has to be sufficient to stress faster languages or it is meaningless.
>>>>
>>>> If I remove the #sum and #average calls from the inner loops, this is what we get:
>>>>
>>>> Julia 0.2256 seconds
>>>> Python 5.318 seconds
>>>> Pharo 3.5 seconds
>>>>
>>>> That test does not sufficiently stress the language. Nor does it provide any valuable insight into summing and averaging, which is done a lot, in lots of places, in every iteration.
>>>>
>>>> If you notice, the inner loop changes the array every iteration. So every call to #sum and #average is getting different data.
>>>>
>>>> Full test:
>>>>
>>>> Julia 1.13 minutes
>>>> Python 24.02 minutes
>>>> Pharo 2:09:04
>>>>
>>>> Code for the above is now published. You can let me know if I am doing something unequal to the various languages.
>>>>
>>>> And just remember: anything you do which sufficiently changes the test has to be done in all the languages to give a fair test. This isn't a let's-make-Pharo-look-good test. I do want Pharo to look good, but honestly.
>>>>
>>>> Yes, I know that I can bind to BLAS or other external libraries. But that is not a test of Pharo. The Python is plain Python 3, no NumPy, just using the default list [] for the array.
>>>>
>>>> Julia is a whole other world. It is faster than NumPy. This is their domain and they optimize, optimize, optimize all the math. In fact, they have reached the point that some pure Julia code beats pure Fortran.
>>>>
>>>> In all of this I just want Pharo to do the best it can.
>>>> With the above results, unless you already had an investment in Pharo, you wouldn't even look. :(
>>>>
>>>> Thanks for exploring this with me.
>>>>
>>>> Jimmie
>>>>
>>>> On 1/6/22 18:24, John Brant wrote:
>>>>> On Jan 6, 2022, at 4:35 PM, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>>> No, it is an array of floats. The only integers in the test are the indexes of the loops.
>>>>>>
>>>>>> Number random. "generates a float 0.8188008774329387"
>>>>>>
>>>>>> So the randarray below is an array of 28800 floats.
>>>>>>
>>>>>> It just felt so wrong to me that Python 3 was so much faster. I don't care if Nim, Crystal, Julia are faster. But...
>>>>>>
>>>>>> I am new to Iceberg and have never shared anything on GitHub, so this is all new to me. I uploaded my language test so you can see what it does. It is a micro-benchmark. It does things that are not realistic in an app, but it does stress a language in areas important to my app.
>>>>>>
>>>>>> https://github.com/jlhouchin/LanguageTestPharo
>>>>>>
>>>>>> Let me know if there is anything else I can do to help solve this problem.
>>>>>>
>>>>>> I am a lone developer in my spare time. So my apologies for any ugly code.
>>>>>>
>>>>> Are you sure that you have the same algorithm in Python? You are calling sum and average inside the loop where you are modifying the array:
>>>>>
>>>>> 1 to: nsize do: [ :j | | n |
>>>>>     n := narray at: j.
>>>>>     narray at: j put: (self loop1calc: i j: j n: n).
>>>>>     nsum := narray sum.
>>>>>     navg := narray average ]
>>>>>
>>>>> As a result, you are calculating the sum of the 28,800-element array 28,800 times (plus another 28,800 times for the average). If I write a similar loop in Python, it looks like it would take almost 9 minutes on my machine without using numpy to calculate the sum. The Pharo code takes ~40 seconds.
>>>>> If this is really how the code should be, then I would change it to not call sum twice (once for sum and once in average). This will result in almost a 2x speedup. You could also modify the algorithm to update the nsum value in the loop instead of summing the array each time. I think the updating would require <120,000 math ops vs. the >1.6 billion that you are performing.
>>>>>
>>>>> John Brant
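John Brant's last suggestion, maintaining nsum incrementally instead of re-summing the array after every element update, can be sketched in Python (loop1calc here is a stand-in modelled on the Smalltalk above with the normalization step omitted; all names are assumptions). One caveat: with floats the running sum can drift a few ulps away from a full re-summation, which matters if the final nsum is used to check implementations against each other across languages.

```python
def loop1calc(i, j, n):
    # stand-in for loop1calc:j:n: above (normalization omitted)
    v = n * (i + n) * (j - n) * 0.1234567
    return v * v * v

def loop_recompute(narray, i):
    # original shape: re-sum the whole array after every update -> O(n^2)
    for j in range(len(narray)):
        narray[j] = loop1calc(i, j + 1, narray[j])
        nsum = sum(narray)
    return nsum

def loop_incremental(narray, i):
    # suggested shape: adjust the running sum by each delta -> O(n)
    nsum = sum(narray)
    for j in range(len(narray)):
        new = loop1calc(i, j + 1, narray[j])
        nsum += new - narray[j]
        narray[j] = new
    return nsum
```

On a 28,800-element array the incremental version does one initial sum plus two arithmetic updates per element, instead of 28,800 full 28,800-element sums per pass.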