[Pharo-dev] Re: Array sum. is very slow

2022-01-12 Thread Esteban Lorenzano
Nevertheless, I would like to stress that using low-level libraries to speed up 
what needs to be fast is a correct approach in any language: from old Turbo 
Pascal, which provided an "asm" keyword for writing assembly directly when 
needed, to modern languages that implement their libraries in C and then call 
them... heck, even C libraries have parts written in assembly when needed!
I do not see why we cannot gain speed in the places where it is needed through a 
primitive, a plugin or an FFI call.

cheers!
Esteban

On Jan 12 2022, at 6:24 pm, teso...@gmail.com wrote:
> I have activated the plugin in the build for Pharo 9; it will be available in 
> the next release.
>
> It will be interesting to have a Float64 extension to it.
> On Wed, Jan 12, 2022, 18:06 Marcus Denker <marcus.den...@inria.fr> wrote:
> >
> >
> > > On 12 Jan 2022, at 17:38, Henrik Sperre Johansen 
> > > <henrik.s.johan...@veloxit.no> wrote:
> > > True!
> > > It’s a little bit of a naming conundrum, since the «Float» in Pharo is 
> > > already 64-bit, but since we’re speaking «native» arrays,
> > > DoubleArray
> > > would be the best, I guess.
> > >
> > > Speaking of, the related new (… to me, anyways) 
> > > DoubleByte/DoubleWordArray classes have incorrect definitions in Pharo 9 
> > > AFAICT- variableByte/WordSubclasses, instead of 
> > > variableDoubleByte/variableDoubleWordSubclasses…
> >
> > We finally fixed this in Pharo10 2 days ago: 
> > https://github.com/pharo-project/pharo/pull/9792
> >
> > Marcus

[Pharo-dev] Re: Array sum. is very slow

2022-01-12 Thread teso...@gmail.com
I have activated the plugin in the build for Pharo 9; it will be available
in the next release.

It will be interesting to have a Float64 extension to it.

On Wed, Jan 12, 2022, 18:06 Marcus Denker  wrote:

>
>
> On 12 Jan 2022, at 17:38, Henrik Sperre Johansen <
> henrik.s.johan...@veloxit.no> wrote:
>
> True!
> It’s a little bit of a naming conundrum, since the «Float» in Pharo is
> already 64-bit, but since we’re speaking «native» arrays,
> DoubleArray
> would be the best, I guess.
>
> Speaking of, the related new (… to me, anyways) DoubleByte/DoubleWordArray
> classes have incorrect definitions in Pharo 9 AFAICT-
> variableByte/WordSubclasses, instead of
> variableDoubleByte/variableDoubleWordSubclasses…
>
>
>
> We finally fixed this in Pharo10 2 days ago:
> https://github.com/pharo-project/pharo/pull/9792
>
> Marcus
>


[Pharo-dev] Re: Array sum. is very slow

2022-01-12 Thread Marcus Denker


> On 12 Jan 2022, at 17:38, Henrik Sperre Johansen 
>  wrote:
> 
> True!
> It’s a little bit of a naming conundrum, since the «Float» in Pharo is 
> already 64-bit, but since we’re speaking «native» arrays,
> DoubleArray
> would be the best, I guess.
> 
> Speaking of, the related new (… to me, anyways) DoubleByte/DoubleWordArray 
> classes have incorrect definitions in Pharo 9 AFAICT- 
> variableByte/WordSubclasses, instead of 
> variableDoubleByte/variableDoubleWordSubclasses…


We finally fixed this in Pharo10 2 days ago: 
https://github.com/pharo-project/pharo/pull/9792 


Marcus

[Pharo-dev] Re: Array sum. is very slow

2022-01-12 Thread Henrik Sperre Johansen
True!
It’s a little bit of a naming conundrum, since the «Float» in Pharo is already 
64-bit, but since we’re speaking «native» arrays,
DoubleArray
would be the best, I guess.

Speaking of, the related new (… to me, anyways) DoubleByte/DoubleWordArray 
classes have incorrect definitions in Pharo 9 AFAICT- 
variableByte/WordSubclasses, instead of 
variableDoubleByte/variableDoubleWordSubclasses…

| dwa |
dwa := DoubleWordArray new: 1.
dwa at: 1 put: 1 << 32.

and

| dba |
dba := DoubleByteArray new: 1.
dba at: 1 put: 256.

*should* work…
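As a cross-check of the intended element widths, the analogous bounds can be sketched with Python's array module (an analogy only, not Pharo semantics; the type codes are standard Python):

```python
import array

# 'H' is an unsigned 16-bit slot, analogous to a DoubleByteArray element:
dba = array.array('H', [0])
dba[0] = 256            # needs more than one byte, fits in two

# 'Q' is an unsigned 64-bit slot, analogous to a DoubleWordArray element:
dwa = array.array('Q', [0])
dwa[0] = 1 << 32        # needs more than one 32-bit word, fits in two
```

With the incorrect variableByte/variableWord definitions, the equivalent stores would overflow the slot, which is what the snippets above test for.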

Cheers,
Henry

> On 12 Jan 2022, at 16:51, Sven Van Caekenberghe  wrote:
> 
> Yes that would certainly be useful.
> 
> But, AFAIU, FloatArray consists of 32-bit Float numbers, I think we also need 
> a DoubleFloatArray since 64-bit Floats are the default nowadays.
> 
>> On 12 Jan 2022, at 16:31, Henrik Sperre Johansen 
>>  wrote:
>> 
>> We could also try modifying Pharo to use C by reintroducing the FloatArray 
>> plugin ;)
>> 
>> | fa r |
>> fa := FloatArray new: 28800.
>> r := Random new.
>> 1 to: fa size do: [ :i | fa at: i put: r next ].
>> [ 1 to: fa size do: [ :i | fa sum ] ] timeToRun
>> 
>> Pharo 9, no plugin:
>> 0:00:01:14.777
>> Pharo 5, with plugin:
>> 0:00:00:00.526
>> 
>> Cheers,
>> Henry
>> 
>> 
 On 11 Jan 2022, at 10:08, Andrei Chis  wrote:
>>> 
>>> Hi Jimmie,
>>> 
>>> I was scanning through this thread and saw that the Python call uses
>>> the sum function. If I remember correctly, in Python the built-in sum
>>> function is directly implemented in C [1] (unless Python is compiled
>>> with SLOW_SUM set to true). In that case on large arrays the function
>>> can easily be several times faster than just iterating over the
>>> individual objects as the Pharo code does. The benchmark seems to
>>> compare summing numbers in C with summing numbers in Pharo. Would be
>>> interesting to modify the Python code to use a similar loop as in
>>> Pharo for doing the sum.
>>> 
>>> Cheers,
>>> Andrei
>>> 
>>> [1] 
>>> https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
>>> 
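Andrei's point can be checked directly: Python's built-in sum runs as one C call, while an explicit loop pays interpreter dispatch per element. A minimal sketch (the list size is arbitrary, for illustration only):

```python
import time

def loop_sum(xs):
    # Interpreter-level loop: one bytecode round-trip per element,
    # comparable to what an image-side Pharo #sum does.
    total = 0
    for x in xs:
        total += x
    return total

data = list(range(100_000))

t0 = time.perf_counter()
builtin_total = sum(data)      # single call into C
t1 = time.perf_counter()
loop_total = loop_sum(data)    # pure-Python iteration
t2 = time.perf_counter()

assert builtin_total == loop_total
# (t2 - t1) is typically several times (t1 - t0) on CPython.
```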
 On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:
 
 Some experiments and discoveries.
 
 I am running my full language test every time. It is the only way I can 
 compare results. It is also what fully stresses the language.
 
 The reason I wrote the test as I did is because I wanted to know a couple 
 of things. Is the language sufficiently performant on basic maths. I am 
 not doing any high PolyMath level math. Simple things like moving averages 
 over portions of arrays.
 
 The other is efficiency of array iteration and access. This is why #sum is 
 the best test of this attribute. #sum iterates and accesses every element 
 of the array. It will reveal if there are any problems.
 
 The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.
 
 When I comment out the #sum and #average calls, Pharo completes the test 
 in 3.5 seconds. So almost all the time is spent in those two calls.
 
 So most of this conversation has focused on why #sum is as slow as it is 
 or how to improve the performance of #sum with other implementations.
 
 
 
 So I decided to break down #sum and try some things.
 
 Starting with the initial implementation and SequenceableCollection's 
 default #sum  time of 02:04:03
 
 
 "This implementation does no work. Only iterates through the array.
 It completed in 00:10:08"
 sum
  | sum |
   sum := 1.
  1 to: self size do: [ :each | ].
  ^ sum
 
 
 "This implementation does no work, but adds to iteration, accessing the 
 value of the array.
 It completed in 00:32:32.
 Quite a bit of time for simply iterating and accessing."
 sum
  | sum |
  sum := 1.
  1 to: self size do: [ :each | self at: each ].
  ^ sum
 
 
 "This implementation I had in my initial email as an experiment and also 
 several others did the same in theirs.
 A naive simple implementation.
 It completed in 01:00:53.  Half the time of the original."
 sum
 | sum |
  sum := 0.
  1 to: self size do: [ :each |
  sum := sum + (self at: each) ].
  ^ sum
 
 
 
 "This implementation I also had in my initial email as an experiment I had 
 done.
 It completed in 00:50:18.
 It reduces the iterations and increases the accesses per iteration.
 It is the fastest implementation so far."
 sum
  | sum |
  sum := 0.
  1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
  sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + 
 (self at: (i + 3)) + (self at: (i + 4))  + (self at: (i + 5)) 
 + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: 
 (i + 9))].
 
  ((self size quo: 10) * 10 + 1) 

[Pharo-dev] Re: Array sum. is very slow

2022-01-12 Thread Sven Van Caekenberghe
Yes that would certainly be useful.

But, AFAIU, FloatArray consists of 32-bit Float numbers; I think we also need a 
DoubleFloatArray since 64-bit Floats are the default nowadays.
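The precision side of this point can be illustrated with Python's array module: a 32-bit slot rounds values that a 64-bit slot round-trips exactly (an illustration only; the type codes are standard Python):

```python
import array

x = 0.1
f32 = array.array('f', [x])[0]  # value after a round-trip through 32-bit storage
f64 = array.array('d', [x])[0]  # value after a round-trip through 64-bit storage

assert f64 == x                 # 64-bit storage preserves the Python float
assert f32 != x                 # 32-bit storage rounds it
```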

> On 12 Jan 2022, at 16:31, Henrik Sperre Johansen 
>  wrote:
> 
> We could also try modifying Pharo to use C by reintroducing the FloatArray 
> plugin ;)
> 
> | fa r |
> fa := FloatArray new: 28800.
> r := Random new.
> 1 to: fa size do: [ :i | fa at: i put: r next ].
> [ 1 to: fa size do: [ :i | fa sum ] ] timeToRun
> 
> Pharo 9, no plugin:
> 0:00:01:14.777
> Pharo 5, with plugin:
> 0:00:00:00.526
> 
> Cheers,
> Henry
> 
> 
>> On 11 Jan 2022, at 10:08, Andrei Chis  wrote:
>> 
>> Hi Jimmie,
>> 
>> I was scanning through this thread and saw that the Python call uses
>> the sum function. If I remember correctly, in Python the built-in sum
>> function is directly implemented in C [1] (unless Python is compiled
>> with SLOW_SUM set to true). In that case on large arrays the function
>> can easily be several times faster than just iterating over the
>> individual objects as the Pharo code does. The benchmark seems to
>> compare summing numbers in C with summing numbers in Pharo. Would be
>> interesting to modify the Python code to use a similar loop as in
>> Pharo for doing the sum.
>> 
>> Cheers,
>> Andrei
>> 
>> [1] 
>> https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
>> 
>>> On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:
>>> 
>>> Some experiments and discoveries.
>>> 
>>> I am running my full language test every time. It is the only way I can 
>>> compare results. It is also what fully stresses the language.
>>> 
>>> The reason I wrote the test as I did is because I wanted to know a couple 
>>> of things. Is the language sufficiently performant on basic maths. I am not 
>>> doing any high PolyMath level math. Simple things like moving averages over 
>>> portions of arrays.
>>> 
>>> The other is efficiency of array iteration and access. This is why #sum is 
>>> the 
>>> best test of this attribute. #sum iterates and accesses every element of 
>>> the array. It will reveal if there are any problems.
>>> 
>>> The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.
>>> 
>>> When I comment out the #sum and #average calls, Pharo completes the test in 
>>> 3.5 seconds. So almost all the time is spent in those two calls.
>>> 
>>> So most of this conversation has focused on why #sum is as slow as it is or 
>>> how to improve the performance of #sum with other implementations.
>>> 
>>> 
>>> 
>>> So I decided to break down #sum and try some things.
>>> 
>>> Starting with the initial implementation and SequenceableCollection's 
>>> default #sum  time of 02:04:03
>>> 
>>> 
>>> "This implementation does no work. Only iterates through the array.
>>> It completed in 00:10:08"
>>> sum
>>>   | sum |
>>>sum := 1.
>>>   1 to: self size do: [ :each | ].
>>>   ^ sum
>>> 
>>> 
>>> "This implementation does no work, but adds to iteration, accessing the 
>>> value of the array.
>>> It completed in 00:32:32.
>>> Quite a bit of time for simply iterating and accessing."
>>> sum
>>>   | sum |
>>>   sum := 1.
>>>   1 to: self size do: [ :each | self at: each ].
>>>   ^ sum
>>> 
>>> 
>>> "This implementation I had in my initial email as an experiment and also 
>>> several others did the same in theirs.
>>> A naive simple implementation.
>>> It completed in 01:00:53.  Half the time of the original."
>>> sum
>>>  | sum |
>>>   sum := 0.
>>>   1 to: self size do: [ :each |
>>>   sum := sum + (self at: each) ].
>>>   ^ sum
>>> 
>>> 
>>> 
>>> "This implementation I also had in my initial email as an experiment I had 
>>> done.
>>> It completed in 00:50:18.
>>> It reduces the iterations and increases the accesses per iteration.
>>> It is the fastest implementation so far."
>>> sum
>>>   | sum |
>>>   sum := 0.
>>>   1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
>>>   sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + 
>>> (self at: (i + 3)) + (self at: (i + 4))  + (self at: (i + 5)) + 
>>> (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i 
>>> + 9))].
>>> 
>>>   ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
>>>   sum := sum + (self at: i)].
>>> ^ sum
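For reference, the 10-way unrolled loop above translates mechanically to other languages; a Python sketch of the same technique (0-based indices, unlike the 1-based Pharo original):

```python
def unrolled_sum(xs):
    total = 0
    limit = (len(xs) // 10) * 10     # largest multiple of 10 <= len(xs)
    # Main loop: ten element accesses per iteration, fewer loop iterations.
    for i in range(0, limit, 10):
        total += (xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3] + xs[i + 4]
                  + xs[i + 5] + xs[i + 6] + xs[i + 7] + xs[i + 8] + xs[i + 9])
    # Remainder loop for the trailing len(xs) % 10 elements.
    for i in range(limit, len(xs)):
        total += xs[i]
    return total
```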
>>> 
>>> Summary
>>> 
>>> For whatever reason iterating and accessing on an Array is expensive. That 
>>> alone took longer than Python to complete the entire test.
>>> 
>>> I had allowed this knowledge of how much slower Pharo was to stop me from 
>>> using Pharo. Encouraged me to explore other options.
>>> 
>>> I have the option to use any language I want. I like Pharo. I do not like 
>>> Python at all. Julia is unexciting to me. I don't like their anti-OO 
>>> approach.
>>> 
>>> At one point I had a fairly complete Pharo implementation, which is where I 
>>> got frustrated with backtesting taking days.
>>> 
>>> That implementation is gone. I had 

[Pharo-dev] Re: Array sum. is very slow

2022-01-12 Thread Henrik Sperre Johansen
We could also try modifying Pharo to use C by reintroducing the FloatArray 
plugin ;)

| fa r |
fa := FloatArray new: 28800.
r := Random new.
1 to: fa size do: [ :i | fa at: i put: r next ].
[ 1 to: fa size do: [ :i | fa sum ] ] timeToRun

Pharo 9, no plugin:
0:00:01:14.777
Pharo 5, with plugin:
0:00:00:00.526

Cheers,
Henry
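The shape of the benchmark above, for comparison against other runtimes, can be sketched in Python (the 28800 size comes from the Smalltalk snippet; n is a parameter so the sketch stays cheap to run, and timings are machine-dependent):

```python
import array
import random
import time

def float_sum_benchmark(n=28800):
    # 32-bit float array filled with random values, like FloatArray.
    fa = array.array('f', (random.random() for _ in range(n)))
    start = time.perf_counter()
    for _ in range(len(fa)):   # re-sum the whole array n times, as above
        total = sum(fa)
    return time.perf_counter() - start
```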


> On 11 Jan 2022, at 10:08, Andrei Chis  wrote:
> 
> Hi Jimmie,
> 
> I was scanning through this thread and saw that the Python call uses
> the sum function. If I remember correctly, in Python the built-in sum
> function is directly implemented in C [1] (unless Python is compiled
> with SLOW_SUM set to true). In that case on large arrays the function
> can easily be several times faster than just iterating over the
> individual objects as the Pharo code does. The benchmark seems to
> compare summing numbers in C with summing numbers in Pharo. Would be
> interesting to modify the Python code to use a similar loop as in
> Pharo for doing the sum.
> 
> Cheers,
> Andrei
> 
> [1] 
> https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
> 
>> On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:
>> 
>> Some experiments and discoveries.
>> 
>> I am running my full language test every time. It is the only way I can 
>> compare results. It is also what fully stresses the language.
>> 
>> The reason I wrote the test as I did is because I wanted to know a couple of 
>> things. Is the language sufficiently performant on basic maths. I am not 
>> doing any high PolyMath level math. Simple things like moving averages over 
>> portions of arrays.
>> 
>> The other is efficiency of array iteration and access. This is why #sum is 
>> the 
>> best test of this attribute. #sum iterates and accesses every element of the 
>> array. It will reveal if there are any problems.
>> 
>> The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.
>> 
>> When I comment out the #sum and #average calls, Pharo completes the test in 
>> 3.5 seconds. So almost all the time is spent in those two calls.
>> 
>> So most of this conversation has focused on why #sum is as slow as it is or 
>> how to improve the performance of #sum with other implementations.
>> 
>> 
>> 
>> So I decided to break down #sum and try some things.
>> 
>> Starting with the initial implementation and SequenceableCollection's 
>> default #sum  time of 02:04:03
>> 
>> 
>> "This implementation does no work. Only iterates through the array.
>> It completed in 00:10:08"
>> sum
>>| sum |
>> sum := 1.
>>1 to: self size do: [ :each | ].
>>^ sum
>> 
>> 
>> "This implementation does no work, but adds to iteration, accessing the 
>> value of the array.
>> It completed in 00:32:32.
>> Quite a bit of time for simply iterating and accessing."
>> sum
>>| sum |
>>sum := 1.
>>1 to: self size do: [ :each | self at: each ].
>>^ sum
>> 
>> 
>> "This implementation I had in my initial email as an experiment and also 
>> several others did the same in theirs.
>> A naive simple implementation.
>> It completed in 01:00:53.  Half the time of the original."
>> sum
>>   | sum |
>>sum := 0.
>>1 to: self size do: [ :each |
>>sum := sum + (self at: each) ].
>>^ sum
>> 
>> 
>> 
>> "This implementation I also had in my initial email as an experiment I had 
>> done.
>> It completed in 00:50:18.
>> It reduces the iterations and increases the accesses per iteration.
>> It is the fastest implementation so far."
>> sum
>>| sum |
>>sum := 0.
>>1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
>>sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + 
>> (self at: (i + 3)) + (self at: (i + 4))  + (self at: (i + 5)) + 
>> (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i 
>> + 9))].
>> 
>>((self size quo: 10) * 10 + 1) to: self size do: [ :i |
>>sum := sum + (self at: i)].
>>  ^ sum
>> 
>> Summary
>> 
>> For whatever reason iterating and accessing on an Array is expensive. That 
>> alone took longer than Python to complete the entire test.
>> 
>> I had allowed this knowledge of how much slower Pharo was to stop me from 
>> using Pharo. Encouraged me to explore other options.
>> 
>> I have the option to use any language I want. I like Pharo. I do not like 
>> Python at all. Julia is unexciting to me. I don't like their anti-OO 
>> approach.
>> 
>> At one point I had a fairly complete Pharo implementation, which is where I 
>> got frustrated with backtesting taking days.
>> 
>> That implementation is gone. I had not switched to Iceberg. I had a problem 
>> with my hard drive. So I am starting over.
>> 
>> I am not a computer scientist, language expert, vm expert or anyone with the 
>> skills to discover and optimize arrays. So I will end my tilting at 
>> windmills here.
>> 
>> I value all the other things that Pharo brings, that I miss when I am using 
>> Julia or Python or Crystal, etc. 

[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Miloslav.Raus
Hi, ppl.

I kinda agree with both sides of the argument.
Whilst taken one way it _is_ comparing apples to oranges, it's deeply beneficial 
to optimize the obvious/"idiomatic" cases - especially if you can [without 
introducing friction / special cases].
- ifTrue: and/or ifFalse anyone ?

Aaaand the language /runtime/environment should be evaluated on the grounds of 
how it handles the "idiomatic cases" -- unless you wanna diverge into the 
territory of "how much assembly [or its hi-level equiv.] is too much 
optimization".

No minus points for python here. But no way they can do sane reloading while 
keeping current semantics ...

It's all a trade-off, and the only clean winners overall are Smalltalk & 
[Common] Lisp, IMNSHO.
- biased, but happy; in denial, also (?) - mostly paid for working with 
other languages/runtimes :-/

Cheers,

M.R.

-Original Message-
From: Jimmie Houchin  
Sent: Tuesday, January 11, 2022 3:37 PM
To: pharo-dev@lists.pharo.org
Subject: [Pharo-dev] Re: Array sum. is very slow

Personally I am okay with Python implementing in C. That is their 
implementation detail. It does not impose anything on the user other than 
knowing normal Python. It isn't cheating or unfair. They are under no 
obligation to handicap themselves so that we can be more comparable.

Are we going to put such requirements on C, C++, Julia, Crystal, Nim?

I expect every language to put forth its best. I would like the same for Pharo. 
And let the numbers fall where they may.

Jimmie



On 1/11/22 03:07, Andrei Chis wrote:
> Hi Jimmie,
>
> I was scanning through this thread and saw that the Python call uses 
> the sum function. If I remember correctly, in Python the built-in sum 
> function is directly implemented in C [1] (unless Python is compiled 
> with SLOW_SUM set to true). In that case on large arrays the function 
> can easily be several times faster than just iterating over the 
> individual objects as the Pharo code does. The benchmark seems to 
> compare summing numbers in C with summing numbers in Pharo. Would be 
> interesting to modify the Python code to use a similar loop as in 
> Pharo for doing the sum.
>
> Cheers,
> Andrei
>
> [1] 
> https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
>
> On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:
>> Some experiments and discoveries.
>>
>> I am running my full language test every time. It is the only way I can 
>> compare results. It is also what fully stresses the language.
>>
>> The reason I wrote the test as I did is because I wanted to know a couple of 
>> things. Is the language sufficiently performant on basic maths. I am not 
>> doing any high PolyMath level math. Simple things like moving averages over 
>> portions of arrays.
>>
>> The other is efficiency of array iteration and access. This is why #sum is 
>> the 
>> best test of this attribute. #sum iterates and accesses every element of the 
>> array. It will reveal if there are any problems.
>>
>> The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.
>>
>> When I comment out the #sum and #average calls, Pharo completes the test in 
>> 3.5 seconds. So almost all the time is spent in those two calls.
>>
>> So most of this conversation has focused on why #sum is as slow as it is or 
>> how to improve the performance of #sum with other implementations.
>>
>>
>>
>> So I decided to break down #sum and try some things.
>>
>> Starting with the initial implementation and SequenceableCollection's 
>> default #sum  time of 02:04:03
>>
>>
>> "This implementation does no work. Only iterates through the array.
>> It completed in 00:10:08"
>> sum
>>  | sum |
>>   sum := 1.
>>  1 to: self size do: [ :each | ].
>>  ^ sum
>>
>>
>> "This implementation does no work, but adds to iteration, accessing the 
>> value of the array.
>> It completed in 00:32:32.
>> Quite a bit of time for simply iterating and accessing."
>> sum
>>  | sum |
>>  sum := 1.
>>  1 to: self size do: [ :each | self at: each ].
>>  ^ sum
>>
>>
>> "This implementation I had in my initial email as an experiment and also 
>> several others did the same in theirs.
>> A naive simple implementation.
>> It completed in 01:00:53.  Half the time of the original."
>> sum
>> | sum |
>>  sum := 0.
>>  1 to: self size do: [ :each |
>>  sum := sum + (self at: each) ].
>>  ^ sum
>>
>>
>>
>> "This implementa

[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Jimmie Houchin

Thanks for the comments.

They are very true.

Jimmie


On 1/11/22 04:49, Nicolas Anquetil wrote:

Hi,

don't forget to weigh in your time too.

The ease to develop AND evolve a program is an important aspect that
the benchmarks don't show.

Nowadays, developer time often counts more than processing time because
you may easily spend days on a nasty bug or an unplanned evolution.

have a nice day

nicolas

On Mon, 2022-01-10 at 14:05 -0600, Jimmie Houchin wrote:

Some experiments and discoveries.
I am running my full language test every time. It is the only way I
can compare results. It is also what fully stresses the language.
The reason I wrote the test as I did is because I wanted to know a
couple of things. Is the language sufficiently performant on basic
maths. I am not doing any high PolyMath level math. Simple things
like moving averages over portions of arrays.
> The other is efficiency of array iteration and access. This is why #sum
is the best test of this attribute. #sum iterates and accesses every
element of the array. It will reveal if there are any problems.
> The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours
> 4 minutes.
When I comment out the #sum and #average calls, Pharo completes the
test in 3.5 seconds. So almost all the time is spent in those two
calls.
So most of this conversation has focused on why #sum is as slow as it
is or how to improve the performance of #sum with other
implementations.

  
So I decided to break down #sum and try some things.

Starting with the initial implementation and SequenceableCollection's
default #sum  time of 02:04:03

"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
  sum
  | sum |
  sum := 1.
  1 to: self size do: [ :each | ].
  ^ sum
  
  
  "This implementation does no work, but adds to iteration, accessing

the value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
  sum
  | sum |
  sum := 1.
  1 to: self size do: [ :each | self at: each ].
  ^ sum
  
  
  "This implementation I had in my initial email as an experiment and

also several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53.  Half the time of the original."
  sum
     | sum |
  sum := 0.
  1 to: self size do: [ :each |
      sum := sum + (self at: each) ].
  ^ sum
  
  
  
  "This implementation I also had in my initial email as an experiment

I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
  sum
  | sum |
  sum := 0.
  1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
      sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2))
          + (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5))
          + (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8))
          + (self at: (i + 9)) ].

  ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
      sum := sum + (self at: i) ].
  ^ sum
  
Summary

For whatever reason iterating and accessing on an Array is expensive.
That alone took longer than Python to complete the entire test.
  
  I had allowed this knowledge of how much slower Pharo was to stop me

from using Pharo. Encouraged me to explore other options.
  
  I have the option to use any language I want. I like Pharo. I do not

like Python at all. Julia is unexciting to me. I don't like their
anti-OO approach.
  
  At one point I had a fairly complete Pharo implementation, which is

where I got frustrated with backtesting taking days.
  
  That implementation is gone. I had not switched to Iceberg. I had a

problem with my hard drive. So I am starting over.
I am not a computer scientist, language expert, vm expert or anyone
with the skills to discover and optimize arrays. So I will end my
tilting at windmills here.
I value all the other things that Pharo brings, that I miss when I am
using Julia or Python or Crystal, etc. Those languages do not have
the vision to do what Pharo (or any Smalltalk) does.
Pharo may not optimize my app as much as x,y or z. But Pharo
optimized me.
That said, I have made the decision to go all in with Pharo. Set
aside all else.
  In that regard I went ahead and put my money in with my decision and
joined the Pharo Association last week.
Thanks for all of your help in exploring the problem.

Jimmie Houchin
  


[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Jimmie Houchin
Personally I am okay with Python implementing in C. That is their 
implementation detail. It does not impose anything on the user other 
than knowing normal Python. It isn't cheating or unfair. They are under 
no obligation to handicap themselves so that we can be more comparable.


Are we going to put such requirements on C, C++, Julia, Crystal, Nim?

I expect every language to put forth its best. I would like the same for 
Pharo. And let the numbers fall where they may.


Jimmie



On 1/11/22 03:07, Andrei Chis wrote:

Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] 
https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can compare 
results. It is also what fully stresses the language.

The reason I wrote the test as I did is because I wanted to know a couple of 
things. Is the language sufficiently performant on basic maths. I am not doing 
any high PolyMath level math. Simple things like moving averages over portions 
of arrays.

The other is efficiency of array iteration and access. This is why #sum is the 
best test of this attribute. #sum iterates and accesses every element of the 
array. It will reveal if there are any problems.

The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.

When I comment out the #sum and #average calls, Pharo completes the test in 3.5 
seconds. So almost all the time is spent in those two calls.

So most of this conversation has focused on why #sum is as slow as it is or how 
to improve the performance of #sum with other implementations.



So I decided to break down #sum and try some things.

Starting with the initial implementation and SequenceableCollection's default 
#sum  time of 02:04:03


"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
 | sum |
  sum := 1.
 1 to: self size do: [ :each | ].
 ^ sum


"This implementation does no work, but adds to iteration, accessing the value 
of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
 | sum |
 sum := 1.
 1 to: self size do: [ :each | self at: each ].
 ^ sum


"This implementation I had in my initial email as an experiment and also 
several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53.  Half the time of the original."
sum
| sum |
 sum := 0.
 1 to: self size do: [ :each |
 sum := sum + (self at: each) ].
 ^ sum



"This implementation I also had in my initial email as an experiment I had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
 | sum |
 sum := 0.
 1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
 sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + 
(self at: (i + 3)) + (self at: (i + 4))  + (self at: (i + 5)) + 
(self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 
9))].

 ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
 sum := sum + (self at: i)].
   ^ sum

Summary

For whatever reason iterating and accessing on an Array is expensive. That 
alone took longer than Python to complete the entire test.

I had allowed this knowledge of how much slower Pharo was to stop me from using 
Pharo. Encouraged me to explore other options.

I have the option to use any language I want. I like Pharo. I do not like 
Python at all. Julia is unexciting to me. I don't like their anti-OO approach.

At one point I had a fairly complete Pharo implementation, which is where I got 
frustrated with backtesting taking days.

That implementation is gone. I had not switched to Iceberg. I had a problem 
with my hard drive. So I am starting over.

I am not a computer scientist, language expert, vm expert or anyone with the 
skills to discover and optimize arrays. So I will end my tilting at windmills 
here.

I value all the other things that Pharo brings, that I miss when I am using 
Julia or Python or Crystal, etc. Those languages do not have the vision to do 
what Pharo (or any Smalltalk) does.

Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I 

[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Nicolas Anquetil


Hi,

don't forget to weigh in your time too.

The ease to develop AND evolve a program is an important aspect that
the benchmarks don't show.

Nowadays, developer time often counts more than processing time, because
you may easily spend days on a nasty bug or an unplanned evolution.

have a nice day

nicolas

On Mon, 2022-01-10 at 14:05 -0600, Jimmie Houchin wrote:
> Some experiments and discoveries.
> I am running my full language test every time. It is the only way I
> can compare results. It is also what fully stresses the language.
> The reason I wrote the test as I did is because I wanted to know a
> couple of things. Is the language sufficiently performant on basic
> maths. I am not doing any high PolyMath level math. Simple things
> like moving averages over portions of arrays.
> The other is efficiency of array iteration and access. This why #sum
> is the best test of this attribute. #sum iterates and accesses every
> element of the array. It will reveal if there are any problems.
> The default test  Julia 1m15s, Python 24.5 minutes, Pharo 2hour
> 4minutes.
> When I comment out the #sum and #average calls, Pharo completes the
> test in 3.5 seconds. So almost all the time is spent in those two
> calls.
> So most of this conversation has focused on why #sum is as slow as it
> is or how to improve the performance of #sum with other
> implementations.
> 
>  
> So I decided to breakdown the #sum and try some things.
> Starting with the initial implementation and SequenceableCollection's
> default #sum  time of 02:04:03
> 
> "This implementation does no work. Only iterates through the array.
> It completed in 00:10:08"
>  sum
>  | sum |
>       sum := 1.
>  1 to: self size do: [ :each | ]. 
>  ^ sum
>  
>  
>  "This implementation does no work, but adds to iteration, accessing
> the value of the array.
> It completed in 00:32:32.
> Quite a bit of time for simply iterating and accessing."
>  sum
>  | sum |
>  sum := 1.
>  1 to: self size do: [ :each | self at: each ].
>  ^ sum
>  
>  
>  "This implementation I had in my initial email as an experiment and
> also several other did the same in theirs.
> A naive simple implementation.
> It completed in 01:00:53.  Half the time of the original."
>  sum
>     | sum |
>  sum := 0.
>  1 to: self size do: [ :each |
>      sum := sum + (self at: each) ].
>  ^ sum
>  
>  
>  
>  "This implementation I also had in my initial email as an experiment
> I had done.
> It completed in 00:50:18.
> It reduces the iterations and increases the accesses per iteration.
> It is the fastest implementation so far."
>  sum
>  | sum |
>  sum := 0.
>  1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
>      sum := sum + (self at: i) + (self at: (i + 1)) + (self at:
> (i + 2)) + (self at: (i + 3)) + (self at: (i + 4))              +
> (self at: (i + 5)) + (self at: (i + 6)) + (self at: (i + 7)) + (self
> at: (i + 8)) + (self at: (i + 9))].
>  
>  ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
>      sum := sum + (self at: i)].
>        ^ sum
>  
> Summary
> For whatever reason iterating and accessing on an Array is expensive.
> That alone took longer than Python to complete the entire test.
>  
>  I had allowed this knowledge of how much slower Pharo was to stop me
> from using Pharo. Encouraged me to explore other options.
>  
>  I have the option to use any language I want. I like Pharo. I do not
> like Python at all. Julia is unexciting to me. I don't like their
> anti-OO approach.
>  
>  At one point I had a fairly complete Pharo implementation, which is
> where I got frustrated with backtesting taking days.
>  
>  That implementation is gone. I had not switched to Iceberg. I had a
> problem with my hard drive. So I am starting over.
> I am not a computer scientist, language expert, vm expert or anyone
> with the skills to discover and optimize arrays. So I will end my
> tilting at windmills here.
> I value all the other things that Pharo brings, that I miss when I am
> using Julia or Python or Crystal, etc. Those languages do not have
> the vision to do what Pharo (or any Smalltalk) does.
> Pharo may not optimize my app as much as x,y or z. But Pharo
> optimized me.
> That said, I have made the decision to go all in with Pharo. Set
> aside all else.
>  In that regard I went ahead and put my money in with my decision and
> joined the Pharo Association last week.
> Thanks for all of your help in exploring the problem.
> 
> Jimmie Houchin
>  



[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Sven Van Caekenberghe



> On 11 Jan 2022, at 11:17, Sven Van Caekenberghe  wrote:
> 
> which would seem to be 3 times faster !

And with my changes (faster #sum, message spy removed):

[ (LanguageTest newSize: 60*24*5*4 iterations: 1) run ] timeToRun. 
"0:00:00:26.612"

6 times faster.

[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Sven Van Caekenberghe
Hi Andrei,

That is a good catch, indeed, that makes all the difference and is an unfair 
comparison.

If I take Jimmie's code and add

def sum2(l):
  sum = 0
  for i in range(0,len(l)):
sum = sum + l[i]
  return sum

def average(l):
  return sum2(l)/len(l)

and replace the other calls of sum to sum2 in loop1 and loop2, I get the 
following for 1 iteration:

>>> doit(1)
Tue Jan 11 10:34:24 2022
Creating list
createList(n), na[-1]:   0.28803
reps:  1
inside at top loop1: start:  Tue Jan 11 10:34:24 2022
Loop1 time:  1.5645889163017273
nsum: 11242.949400371168
navg: 0.3903801875128878
loop2: start:  Tue Jan 11 10:35:58 2022
Loop2 time:  -27364895.977849767
nsum: 10816.16871440453
navg: 0.3755614136946017
finished:  Tue Jan 11 10:37:33 2022
start time: 1641893664.795651
end time: 1641893853.597397
total time: 1614528959.1841362
nsum: 10816.16871440453
navg: 0.3755614136946017

The total time is calculated wrongly, but doing the calculation in Pharo:

(1641893853.597397 - 1641893664.795651) seconds. "0:00:03:08.80174613"

so 3 minutes.
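The negative "Loop2 time" and the nonsense total above are what mixing clock sources (or misusing their values) produces. A minimal, hypothetical way to time such a benchmark in Python with a single monotonic clock is:

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_seconds) using one monotonic clock.
    perf_counter cannot jump backwards, so elapsed times can never go
    negative the way the mixed totals in the transcript above did."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

total, elapsed = timed(sum, range(1_000_000))
print(total, elapsed)  # prints 499999500000 and a small non-negative duration
```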

Jimmie's unmodified Pharo code give for 1 iteration:

[ (LanguageTest newSize: 60*24*5*4 iterations: 1) run ] timeToRun. 
"0:00:01:00.438"

Starting test for array size: 28800  iterations: 1

Creating array of size: 28800   timeToRun: 0:00:00:00.035

Starting loop 1 at: 2022-01-11T10:53:53.423313+01:00
1: 2022-01-11T10:53:53   innerttr: 0:00:00:30.073   averageTime: 0:00:00:30.073
Loop 1 time: nil
nsum: 11242.949400371168
navg: 0.3903801875128878

Starting loop 2 at: 2022-01-11T10:54:23.497281+01:00
1: 2022-01-11T10:54:23   innerttr: 0:00:00:30.306   averageTime: 0:00:00:30.306
Loop 2 time: 0:00:00:30.306
nsum: 10816.168714404532
navg: 0.3755614136946018

End of test.  TotalTime: 0:00:01:00.416

which would seem to be 3 times faster !

Benchmarking is a black art.

Sven

> On 11 Jan 2022, at 10:07, Andrei Chis  wrote:
> 
> Hi Jimmie,
> 
> I was scanning through this thread and saw that the Python call uses
> the sum function. If I remember correctly, in Python the built-in sum
> function is directly implemented in C [1] (unless Python is compiled
> with SLOW_SUM set to true). In that case on large arrays the function
> can easily be several times faster than just iterating over the
> individual objects as the Pharo code does. The benchmark seems to
> compare summing numbers in C with summing numbers in Pharo. Would be
> interesting to modify the Python code to use a similar loop as in
> Pharo for doing the sum.
> 
> Cheers,
> Andrei
> 
> [1] 
> https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
> 
> On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:
>> 
>> Some experiments and discoveries.
>> 
>> I am running my full language test every time. It is the only way I can 
>> compare results. It is also what fully stresses the language.
>> 
>> The reason I wrote the test as I did is because I wanted to know a couple of 
>> things. Is the language sufficiently performant on basic maths. I am not 
>> doing any high PolyMath level math. Simple things like moving averages over 
>> portions of arrays.
>> 
>> The other is efficiency of array iteration and access. This why #sum is the 
>> best test of this attribute. #sum iterates and accesses every element of the 
>> array. It will reveal if there are any problems.
>> 
>> The default test  Julia 1m15s, Python 24.5 minutes, Pharo 2hour 4minutes.
>> 
>> When I comment out the #sum and #average calls, Pharo completes the test in 
>> 3.5 seconds. So almost all the time is spent in those two calls.
>> 
>> So most of this conversation has focused on why #sum is as slow as it is or 
>> how to improve the performance of #sum with other implementations.
>> 
>> 
>> 
>> So I decided to breakdown the #sum and try some things.
>> 
>> Starting with the initial implementation and SequenceableCollection's 
>> default #sum  time of 02:04:03
>> 
>> 
>> "This implementation does no work. Only iterates through the array.
>> It completed in 00:10:08"
>> sum
>>| sum |
>> sum := 1.
>>1 to: self size do: [ :each | ].
>>^ sum
>> 
>> 
>> "This implementation does no work, but adds to iteration, accessing the 
>> value of the array.
>> It completed in 00:32:32.
>> Quite a bit of time for simply iterating and accessing."
>> sum
>>| sum |
>>sum := 1.
>>1 to: self size do: [ :each | self at: each ].
>>^ sum
>> 
>> 
>> "This implementation I had in my initial email as an experiment and also 
>> several other did the same in theirs.
>> A naive simple implementation.
>> It completed in 01:00:53.  Half the time of the original."
>> sum
>>   | sum |
>>sum := 0.
>>1 to: self size do: [ :each |
>>sum := sum + (self at: each) ].
>>^ sum
>> 
>> 
>> 
>> "This implementation I also had in my initial email as an experiment I had 
>> done.
>> It completed in 00:50:18.
>> It reduces the iterations and increases the accesses per iteration.
>> It is the 

[Pharo-dev] Re: Array sum. is very slow

2022-01-11 Thread Andrei Chis
Hi Jimmie,

I was scanning through this thread and saw that the Python call uses
the sum function. If I remember correctly, in Python the built-in sum
function is directly implemented in C [1] (unless Python is compiled
with SLOW_SUM set to true). In that case on large arrays the function
can easily be several times faster than just iterating over the
individual objects as the Pharo code does. The benchmark seems to
compare summing numbers in C with summing numbers in Pharo. Would be
interesting to modify the Python code to use a similar loop as in
Pharo for doing the sum.

Cheers,
Andrei

[1] 
https://github.com/python/cpython/blob/135cabd328504e1648d17242b42b675cdbd0193b/Python/bltinmodule.c#L2461
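Andrei's point is easy to check directly: CPython's built-in `sum` (implemented in C) is typically several times faster than an explicit Python loop over the same list. A rough, illustrative comparison (not code from the thread):

```python
import random
import timeit

data = [random.random() for _ in range(28_800)]  # same size as the thread's array

def manual_sum(xs):
    # Element-by-element loop, closer to what the Pharo code does
    total = 0.0
    for x in xs:
        total += x
    return total

builtin = timeit.timeit(lambda: sum(data), number=100)
manual = timeit.timeit(lambda: manual_sum(data), number=100)
print(f"builtin sum: {builtin:.4f}s  manual loop: {manual:.4f}s")
```

On CPython the manual loop is usually noticeably slower; the exact ratio varies by version and machine, which is why Sven's `sum2` rewrite narrows the gap with Pharo.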

On Mon, Jan 10, 2022 at 9:06 PM Jimmie Houchin  wrote:
>
> Some experiments and discoveries.
>
> I am running my full language test every time. It is the only way I can 
> compare results. It is also what fully stresses the language.
>
> The reason I wrote the test as I did is because I wanted to know a couple of 
> things. Is the language sufficiently performant on basic maths. I am not 
> doing any high PolyMath level math. Simple things like moving averages over 
> portions of arrays.
>
> The other is efficiency of array iteration and access. This why #sum is the 
> best test of this attribute. #sum iterates and accesses every element of the 
> array. It will reveal if there are any problems.
>
> The default test  Julia 1m15s, Python 24.5 minutes, Pharo 2hour 4minutes.
>
> When I comment out the #sum and #average calls, Pharo completes the test in 
> 3.5 seconds. So almost all the time is spent in those two calls.
>
> So most of this conversation has focused on why #sum is as slow as it is or 
> how to improve the performance of #sum with other implementations.
>
>
>
> So I decided to breakdown the #sum and try some things.
>
> Starting with the initial implementation and SequenceableCollection's default 
> #sum  time of 02:04:03
>
>
> "This implementation does no work. Only iterates through the array.
> It completed in 00:10:08"
> sum
> | sum |
>  sum := 1.
> 1 to: self size do: [ :each | ].
> ^ sum
>
>
> "This implementation does no work, but adds to iteration, accessing the value 
> of the array.
> It completed in 00:32:32.
> Quite a bit of time for simply iterating and accessing."
> sum
> | sum |
> sum := 1.
> 1 to: self size do: [ :each | self at: each ].
> ^ sum
>
>
> "This implementation I had in my initial email as an experiment and also 
> several other did the same in theirs.
> A naive simple implementation.
> It completed in 01:00:53.  Half the time of the original."
> sum
>| sum |
> sum := 0.
> 1 to: self size do: [ :each |
> sum := sum + (self at: each) ].
> ^ sum
>
>
>
> "This implementation I also had in my initial email as an experiment I had 
> done.
> It completed in 00:50:18.
> It reduces the iterations and increases the accesses per iteration.
> It is the fastest implementation so far."
> sum
> | sum |
> sum := 0.
> 1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
> sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) + 
> (self at: (i + 3)) + (self at: (i + 4))  + (self at: (i + 5)) + 
> (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 
> 9))].
>
> ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
> sum := sum + (self at: i)].
>   ^ sum
>
> Summary
>
> For whatever reason iterating and accessing on an Array is expensive. That 
> alone took longer than Python to complete the entire test.
>
> I had allowed this knowledge of how much slower Pharo was to stop me from 
> using Pharo. Encouraged me to explore other options.
>
> I have the option to use any language I want. I like Pharo. I do not like 
> Python at all. Julia is unexciting to me. I don't like their anti-OO approach.
>
> At one point I had a fairly complete Pharo implementation, which is where I 
> got frustrated with backtesting taking days.
>
> That implementation is gone. I had not switched to Iceberg. I had a problem 
> with my hard drive. So I am starting over.
>
> I am not a computer scientist, language expert, vm expert or anyone with the 
> skills to discover and optimize arrays. So I will end my tilting at windmills 
> here.
>
> I value all the other things that Pharo brings, that I miss when I am using 
> Julia or Python or Crystal, etc. Those languages do not have the vision to do 
> what Pharo (or any Smalltalk) does.
>
> Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.
>
> That said, I have made the decision to go all in with Pharo. Set aside all 
> else.
> In that regard I went ahead and put my money in with my decision and joined 
> the Pharo Association last week.
>
> Thanks for all of your help in exploring the problem.
>
>
> Jimmie Houchin


[Pharo-dev] Re: Array sum. is very slow

2022-01-10 Thread Jimmie Houchin

Some experiments and discoveries.

I am running my full language test every time. It is the only way I can 
compare results. It is also what fully stresses the language.


The reason I wrote the test as I did is because I wanted to know a 
couple of things. Is the language sufficiently performant on basic 
maths. I am not doing any high PolyMath level math. Simple things like 
moving averages over portions of arrays.


The other is efficiency of array iteration and access. This is why #sum is 
the best test of this attribute. #sum iterates over and accesses every 
element of the array. It will reveal if there are any problems.


The default test: Julia 1m15s, Python 24.5 minutes, Pharo 2 hours 4 minutes.

When I comment out the #sum and #average calls, Pharo completes the test 
in 3.5 seconds. So almost all the time is spent in those two calls.


So most of this conversation has focused on why #sum is as slow as it is 
or how to improve the performance of #sum with other implementations.




So I decided to breakdown the #sum and try some things.

Starting with the initial implementation and SequenceableCollection's 
default #sum  time of 02:04:03



"This implementation does no work. Only iterates through the array.
It completed in 00:10:08"
sum
    | sum |
   sum := 1.
    1 to: self size do: [ :each | ].
    ^ sum


"This implementation does no work, but adds to iteration, accessing the 
value of the array.
It completed in 00:32:32.
Quite a bit of time for simply iterating and accessing."
sum
    | sum |
    sum := 1.
    1 to: self size do: [ :each | self at: each ].
    ^ sum


"This implementation I had in my initial email as an experiment and 
also several others did the same in theirs.
A naive simple implementation.
It completed in 01:00:53.  Half the time of the original."
sum
   | sum |
    sum := 0.
    1 to: self size do: [ :each |
        sum := sum + (self at: each) ].
    ^ sum



"This implementation I also had in my initial email as an experiment I 
had done.
It completed in 00:50:18.
It reduces the iterations and increases the accesses per iteration.
It is the fastest implementation so far."
sum
    | sum |
    sum := 0.
    1 to: ((self size quo: 10) * 10) by: 10 do: [ :i |
        sum := sum + (self at: i) + (self at: (i + 1)) + (self at: (i + 2)) +
        (self at: (i + 3)) + (self at: (i + 4)) + (self at: (i + 5)) +
        (self at: (i + 6)) + (self at: (i + 7)) + (self at: (i + 8)) + (self at: (i + 9)) ].


    ((self size quo: 10) * 10 + 1) to: self size do: [ :i |
        sum := sum + (self at: i)].
    ^ sum

Summary

For whatever reason, iterating over and accessing an Array is expensive. 
That alone took longer than Python took to complete the entire test.


I had allowed this knowledge of how much slower Pharo was to stop me 
from using Pharo. It encouraged me to explore other options.


I have the option to use any language I want. I like Pharo. I do not 
like Python at all. Julia is unexciting to me. I don't like their 
anti-OO approach.


At one point I had a fairly complete Pharo implementation, which is 
where I got frustrated with backtesting taking days.


That implementation is gone. I had not switched to Iceberg. I had a 
problem with my hard drive. So I am starting over.


I am not a computer scientist, language expert, vm expert or anyone with 
the skills to discover and optimize arrays. So I will end my tilting at 
windmills here.


I value all the other things that Pharo brings, that I miss when I am 
using Julia or Python or Crystal, etc. Those languages do not have the 
vision to do what Pharo (or any Smalltalk) does.


Pharo may not optimize my app as much as x,y or z. But Pharo optimized me.

That said, I have made the decision to go all in with Pharo. Set aside 
all else.
In that regard I went ahead and put my money in with my decision and 
joined the Pharo Association last week.


Thanks for all of your help in exploring the problem.


Jimmie Houchin


[Pharo-dev] Re: Array sum. is very slow

2022-01-09 Thread Nicolas Anquetil
On Sun, 2022-01-09 at 12:05 +0100, Nicolas Anquetil wrote:
> 
> Definitly not easy to do benchmarking
> I got these strange results:
> 
> n := 1000.
> floatArray := Array new: n. 
> 
> Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
> floatArray at: idx put: Random new ] ].
> "-> 2871"
> 
> Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
> floatArray at: idx put: i ] ].
ooops, that was 'floatArray at: idx put: idx'
(-> similar time)

> "-> 86"
> 
> Time millisecondsToRun: [1 to: n do: [:i | Random new ]].
> "-> 829"
> 
> so
> - assigning 'Random new' to 1M array elements takes 2.8 seconds.
> - assigning a value to 1M array elements takes 0.08 seconds.
> - computing 'Random new' 1M times takes 0.8 seconds
> 
> 
> I wonder where the extra 2 seconds come from?
> some optimization in the background?
> 
> I did the 3 of them several times in different order and the results
> are similar.
> 
> nicolas
> 
> On Fri, 2022-01-07 at 15:36 +, Benoit St-Jean via Pharo-dev
> wrote:
> > Can you come up with a simple "base case" so we can find the
> > bottleneck/problem?
> > 
> > I'm not sure about what you're trying to do.
> > 
> > What do you get if you try this in a workspace (adjust the value of
> > n
> > to what you want, I tested it with 10 million items).
> > 
> > Let's get this one step at a time!
> > 
> > 
> > 
> > |  floatArray  n  rng t1 t2 t3 r1 r2 r3 |
> > 
> > n := 1000.
> > 
> > rng := Random new.
> > 
> > floatArray := Array new: n. 
> > floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng
> > next].
> > 
> > t1 := Time millisecondsToRun: [r1 := floatArray sum].
> > t2 := Time millisecondsToRun: [| total |
> > total := 0.
> > floatArray do: [:each | total := total + each ].
> > r2 := total].
> > 
> > t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into:  [:
> > total :each | total + each ]].
> > 
> > Transcript cr.
> > Transcript cr; show: 'Test with ', n printString, ' elements'.
> > Transcript cr;show: 'Original #sum -> Time: ', t1 printString, '
> > milliseconds, Total: ', r1 printString.
> > Transcript cr;show: 'Naive #sum -> Time: ', t2 printString, '
> > milliseconds, Total: ', r2 printString.  
> > Transcript cr;show: 'Inject #sum -> Time: ', t3 printString, '
> > milliseconds, Total: ', r3 printString.  
> > 
> > --
> > 
> > Here are the results I get on Squeak 5.3
> > 
> > Test with 1000 elements
> > Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
> > Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
> > Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
> > 
> > 
> > 
> > - 
> > Benoît St-Jean 
> > Yahoo! Messenger: bstjean 
> > Twitter: @BenLeChialeux 
> > Pinterest: benoitstjean 
> > Instagram: Chef_Benito
> > IRC: lamneth 
> > GitHub: bstjean
> > Blogue: endormitoire.wordpress.com 
> > "A standpoint is an intellectual horizon of radius zero".  (A.
> > Einstein)
> > 
> > 
> >  On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin
> >  wrote: 
> > 
> > 
> > I have written a micro benchmark which stresses a language in areas
> > which are crucial to my application.
> > 
> > I have written this micro benchmark in Pharo, Crystal, Nim, Python,
> > PicoLisp, C, C++, Java and Julia.
> > 
> > On my i7 laptop Julia completes it in about 1 minute and 15
> > seconds, 
> > amazing magic they have done.
> > 
> > Pharo takes over 2 hours. :(
> > 
> > In my benchmarks if I comment out the sum and average of the array.
> > It 
> > completes in 3.5 seconds.
> > And when I sum the array it gives the correct results. So I can
> > verify 
> > its validity.
> > 
> > To illustrate below is some sample code of what I am doing. I iterate 
> > over the array and do calculations on each value of the array and update 
> > the array and sum and average at each value simply to stress array 
> > access and sum and average.
> > 
> > 28800 is simply derived from time series one minute values for 5
> > days,
> > 4 
> > weeks.
> > 
> > randarray := Array new: 28800.
> > 
> > 1 to: randarray size do: [ :i | randarray at: i put: Number random
> > ].
> > 
> > randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
> > here." randarray sum. randarray average ]] timeToRun.
> > 
> > randarrayttr. "0:00:00:36.135"
> > 
> > 
> > I do 2 loops with 100 iterations each.
> > 
> > randarrayttr * 200. "0:02:00:27"
> > 
> > 
> > I learned early on in this adventure when dealing with compiled 
> > languages that if you don’t do a lot, the test may not last long
> > enough
> > to give any times.
> > 
> > Pharo is my preference. But this is an awful big gap in
> > performance. 
> > When doing backtesting this is huge. Does my backtest take minutes,
> > hours or days?
> > 
> > I am not a computer scientist nor expert in 

[Pharo-dev] Re: Array sum. is very slow

2022-01-09 Thread Nicolas Anquetil


Definitely not easy to do benchmarking.
I got these strange results:

n := 1000000.
floatArray := Array new: n. 

Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
floatArray at: idx put: Random new ] ].
"-> 2871"

Time millisecondsToRun: [ floatArray doWithIndex: [:each :idx |
floatArray at: idx put: i ] ].
"-> 86"

Time millisecondsToRun: [1 to: n do: [:i | Random new ]].
"-> 829"

so
- assigning 'Random new' to 1M array elements takes 2.8 seconds.
- assigning a value to 1M array elements takes 0.08 seconds.
- computing 'Random new' 1M times takes 0.8 seconds


I wonder where the extra 2 seconds come from?
some optimization in the background?

I did the 3 of them several times in different order and the results
are similar.
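A plausible (unverified) explanation for the extra two seconds: `Random new` creates and seeds a fresh generator on every iteration, which is far more expensive than drawing one value from a shared generator. The same effect can be reproduced in Python (an illustrative analogue, not the thread's code):

```python
import random
import timeit

N = 100_000

# A fresh generator per element pays the construction/seeding cost every time
fresh = timeit.timeit(lambda: [random.Random() for _ in range(N)], number=1)

# One shared generator just draws the next value
rng = random.Random(42)
shared = timeit.timeit(lambda: [rng.random() for _ in range(N)], number=1)

print(f"fresh generators: {fresh:.3f}s  shared generator: {shared:.3f}s")
```

The fresh-generator version is usually much slower, matching the pattern in the Smalltalk timings above.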

nicolas

On Fri, 2022-01-07 at 15:36 +, Benoit St-Jean via Pharo-dev wrote:
> Can you come up with a simple "base case" so we can find the
> bottleneck/problem?
> 
> I'm not sure about what you're trying to do.
> 
> What do you get if you try this in a workspace (adjust the value of n
> to what you want, I tested it with 10 million items).
> 
> Let's get this one step at a time!
> 
> 
> 
> |  floatArray  n  rng t1 t2 t3 r1 r2 r3 |
> 
> n := 1000.
> 
> rng := Random new.
> 
> floatArray := Array new: n. 
> floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng
> next].
> 
> t1 := Time millisecondsToRun: [r1 := floatArray sum].
> t2 := Time millisecondsToRun: [| total |
> total := 0.
> floatArray do: [:each | total := total + each ].
> r2 := total].
> 
> t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into:  [:
> total :each | total + each ]].
> 
> Transcript cr.
> Transcript cr; show: 'Test with ', n printString, ' elements'.
> Transcript cr;show: 'Original #sum -> Time: ', t1 printString, '
> milliseconds, Total: ', r1 printString.
> Transcript cr;show: 'Naive #sum -> Time: ', t2 printString, '
> milliseconds, Total: ', r2 printString.  
> Transcript cr;show: 'Inject #sum -> Time: ', t3 printString, '
> milliseconds, Total: ', r3 printString.  
> 
> --
> 
> Here are the results I get on Squeak 5.3
> 
> Test with 1000 elements
> Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
> Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
> Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
> 
> 
> 
> - 
> Benoît St-Jean 
> Yahoo! Messenger: bstjean 
> Twitter: @BenLeChialeux 
> Pinterest: benoitstjean 
> Instagram: Chef_Benito
> IRC: lamneth 
> GitHub: bstjean
> Blogue: endormitoire.wordpress.com 
> "A standpoint is an intellectual horizon of radius zero".  (A.
> Einstein)
> 
> 
>  On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin
>  wrote: 
> 
> 
> I have written a micro benchmark which stresses a language in areas 
> which are crucial to my application.
> 
> I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
> PicoLisp, C, C++, Java and Julia.
> 
> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
> amazing magic they have done.
> 
> Pharo takes over 2 hours. :(
> 
> In my benchmarks if I comment out the sum and average of the array. It 
> completes in 3.5 seconds.
> And when I sum the array it gives the correct results. So I can verify 
> its validity.
> 
> To illustrate below is some sample code of what I am doing. I iterate 
> over the array and do calculations on each value of the array and update 
> the array and sum and average at each value simply to stress array 
> access and sum and average.
> 
> 28800 is simply derived from time series one minute values for 5 days,
> 4 
> weeks.
> 
> randarray := Array new: 28800.
> 
> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
> 
> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
> here." randarray sum. randarray average ]] timeToRun.
> 
> randarrayttr. "0:00:00:36.135"
> 
> 
> I do 2 loops with 100 iterations each.
> 
> randarrayttr * 200. "0:02:00:27"
> 
> 
> I learned early on in this adventure when dealing with compiled 
> languages that if you don’t do a lot, the test may not last long enough
> to give any times.
> 
> Pharo is my preference. But this is an awful big gap in performance. 
> When doing backtesting this is huge. Does my backtest take minutes, 
> hours or days?
> 
> I am not a computer scientist nor expert in Pharo or Smalltalk. So I do
> not know if there is anything which can improve this.
> 
> 
> However I have played around with several experiments of my #sum:
> method.
> 
> This implementation reduces the time on the above randarray in half.
> 
> sum: col
> | sum |
> sum := 0.
> 1 to: col size do: [ :i |
>   sum := sum + (col at: i) ].
> ^ sum
> 
> randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations 
> here."
>  ltsa sum: randarray. ltsa sum: randarray ]] 

[Pharo-dev] Re: Array sum. is very slow

2022-01-09 Thread Stéphane Ducasse
On my machine so this is the same.

SQ5.3

Test with 1000 elements
Original #sum -> Time: 196 milliseconds, Total: 5.001448710680429e6
Naive #sum -> Time: 152 milliseconds, Total: 5.001448710680429e6
Inject #sum -> Time: 143 milliseconds, Total: 5.001448710680429e6



> On 8 Jan 2022, at 21:47, stephane ducasse  wrote:
> 
> Thanks benoit for the snippet
> I run it in Pharo 10 and I got
> 
> Test with 1000 elements
> Original #sum -> Time: 195 milliseconds, Total: 4.999452880735064e6
> Naive #sum -> Time: 153 milliseconds, Total: 4.999452880735063e6
> Inject #sum -> Time: 198 milliseconds, Total: 4.999452880735063e6
> 
> 
> in Pharo 9
> Test with 1000 elements
> Original #sum -> Time: 182 milliseconds, Total: 4.999339450212771e6
> Naive #sum -> Time: 148 milliseconds, Total: 4.999339450212771e6
> Inject #sum -> Time: 203 milliseconds, Total: 4.999339450212771e6
> 
> I’m interested to understand why Pharo is slower. May be this is the impact 
> of the new full blocks. 
> We started to play with the idea of regression benchmarks. 
> 
> S
> 
> 
>> On 7 Jan 2022, at 16:36, Benoit St-Jean via Pharo-dev 
>> mailto:pharo-dev@lists.pharo.org>> wrote:
>> 
>> Can you come up with a simple "base case" so we can find the 
>> bottleneck/problem?
>> 
>> I'm not sure about what you're trying to do.
>> 
>> What do you get if you try this in a workspace (adjust the value of n to 
>> what you want, I tested it with 10 million items).
>> 
>> Let's get this one step at a time!
>> 
>> 
>> 
>> |  floatArray  n  rng t1 t2 t3 r1 r2 r3 |
>> 
>> n := 1000.
>> 
>> rng := Random new.
>> 
>> floatArray := Array new: n. 
>> floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng next].
>> 
>> t1 := Time millisecondsToRun: [r1 := floatArray sum].
>> t2 := Time millisecondsToRun: [| total |
>> total := 0.
>> floatArray do: [:each | total := total + each ].
>> r2 := total].
>> 
>> t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into:  [: total 
>> :each | total + each ]].
>> 
>> Transcript cr.
>> Transcript cr; show: 'Test with ', n printString, ' elements'.
>> Transcript cr;show: 'Original #sum -> Time: ', t1 printString, ' 
>> milliseconds, Total: ', r1 printString.
>> Transcript cr;show: 'Naive #sum -> Time: ', t2 printString, ' milliseconds, 
>> Total: ', r2 printString.  
>> Transcript cr;show: 'Inject #sum -> Time: ', t3 printString, ' milliseconds, 
>> Total: ', r3 printString.  
>> 
>> --
>> 
>> Here are the results I get on Squeak 5.3
>> 
>> Test with 1000 elements
>> Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
>> Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
>> Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
>> 
>> 
>> 
>> - 
>> Benoît St-Jean 
>> Yahoo! Messenger: bstjean 
>> Twitter: @BenLeChialeux 
>> Pinterest: benoitstjean 
>> Instagram: Chef_Benito
>> IRC: lamneth 
>> GitHub: bstjean
>> Blogue: endormitoire.wordpress.com  
>> "A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
>> 
>> 
>> On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin 
>> mailto:jlhouc...@gmail.com>> wrote:
>> 
>> 
>> I have written a micro benchmark which stresses a language in areas 
>> which are crucial to my application.
>> 
>> I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
>> PicoLisp, C, C++, Java and Julia.
>> 
>> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
>> amazing magic they have done.
>> 
>> Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. 
>> Pharo takes over 2 hours. :(
>> 
>> In my benchmarks if I comment out the sum and average of the array. It 
>> completes in 3.5 seconds.
>> And when I sum the array it gives the correct results. So I can verify 
>> its validity.
>> 
>> To illustrate, below is some sample code of what I am doing. I iterate
>> over the array, do calculations on each value, update the array, and sum
>> and average at each value, simply to stress array access, summing, and
>> averaging.
>> 
>> 28800 is simply derived from time series one minute values for 5 days, 4 
>> weeks.
>> 
>> randarray := Array new: 28800.
>> 
>> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
>> 
>> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
>> here." randarray sum. randarray average ]] timeToRun.
>> 
>> randarrayttr. "0:00:00:36.135"
>> 
>> 
>> I do 2 loops with 100 iterations each.
>> 
>> randarrayttr * 200. "0:02:00:27"
>> 
>> 
>> I learned early on in this adventure when dealing with compiled 
>> languages that if you don’t do a lot, the test may not last long 

[Pharo-dev] Re: Array sum. is very slow

2022-01-08 Thread stephane ducasse
Thanks benoit for the snippet
I run it in Pharo 10 and I got

Test with 1000 elements
Original #sum -> Time: 195 milliseconds, Total: 4.999452880735064e6
Naive #sum -> Time: 153 milliseconds, Total: 4.999452880735063e6
Inject #sum -> Time: 198 milliseconds, Total: 4.999452880735063e6


in Pharo 9
Test with 1000 elements
Original #sum -> Time: 182 milliseconds, Total: 4.999339450212771e6
Naive #sum -> Time: 148 milliseconds, Total: 4.999339450212771e6
Inject #sum -> Time: 203 milliseconds, Total: 4.999339450212771e6

I’m interested to understand why Pharo is slower. Maybe this is the impact 
of the new full blocks. 
We started to play with the idea of regression benchmarks. 

S


> On 7 Jan 2022, at 16:36, Benoit St-Jean via Pharo-dev 
>  wrote:
> 
> Can you come up with a simple "base case" so we can find the 
> bottleneck/problem?
> 
> I'm not sure about what you're trying to do.
> 
> What do you get if you try this in a workspace (adjust the value of n to what 
> you want, I tested it with 10 million items).
> 
> Let's get this one step at a time!
> 
> 
> 
> |  floatArray  n  rng t1 t2 t3 r1 r2 r3 |
> 
> n := 1000.
> 
> rng := Random new.
> 
> floatArray := Array new: n. 
> floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng next].
> 
> t1 := Time millisecondsToRun: [r1 := floatArray sum].
> t2 := Time millisecondsToRun: [| total |
>   
>   total := 0.
>   floatArray do: 
> [:each | total := total + each ].
>   r2 := total].
>   
> t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into:  [: total 
> :each | total + each ]].
> 
> Transcript cr.
> Transcript cr; show: 'Test with ', n printString, ' elements'.
> Transcript cr;show: 'Original #sum -> Time: ', t1 printString, ' 
> milliseconds, Total: ', r1 printString.
> Transcript cr;show: 'Naive #sum -> Time: ', t2 printString, ' milliseconds, 
> Total: ', r2 printString.  
> Transcript cr;show: 'Inject #sum -> Time: ', t3 printString, ' milliseconds, 
> Total: ', r3 printString.  
> 
> --
> 
> Here are the results I get on Squeak 5.3
> 
> Test with 1000 elements
> Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
> Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
> Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6
> 
> 
> 
> - 
> Benoît St-Jean 
> Yahoo! Messenger: bstjean 
> Twitter: @BenLeChialeux 
> Pinterest: benoitstjean 
> Instagram: Chef_Benito
> IRC: lamneth 
> GitHub: bstjean
> Blogue: endormitoire.wordpress.com 
> "A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
> 
> 
> On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin 
>  wrote:
> 
> 
> I have written a micro benchmark which stresses a language in areas 
> which are crucial to my application.
> 
> I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
> PicoLisp, C, C++, Java and Julia.
> 
> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
> amazing magic they have done.
> 
> Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. 
> Pharo takes over 2 hours. :(
> 
> In my benchmarks, if I comment out the sum and average of the array, it 
> completes in 3.5 seconds.
> And when I sum the array it gives the correct results. So I can verify 
> its validity.
> 
> To illustrate, below is some sample code of what I am doing. I iterate 
> over the array, do calculations on each value, update the array, and sum 
> and average at each value, simply to stress array access, summing, and 
> averaging.
> 
> 28800 is simply derived from time series one minute values for 5 days, 4 
> weeks.
> 
> randarray := Array new: 28800.
> 
> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
> 
> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
> here." randarray sum. randarray average ]] timeToRun.
> 
> randarrayttr. "0:00:00:36.135"
> 
> 
> I do 2 loops with 100 iterations each.
> 
> randarrayttr * 200. "0:02:00:27"
> 
> 
> I learned early on in this adventure when dealing with compiled 
> languages that if you don’t do a lot, the test may not last long enough 
> to give any times.
> 
> Pharo is my preference. But this is an awful big gap in performance. 
> When doing backtesting this is huge. Does my backtest take minutes, 
> hours or days?
> 
> I am not a computer scientist nor expert in Pharo or Smalltalk. So I do 
> not know if there is anything which can improve this.
> 
> 
> However I have played around with several experiments of my #sum: method.
> 
> This implementation reduces the time on the above randarray in half.
> 
> sum: col
> | sum |
> sum := 0.
> 1 to: col size do: [ :i |
>  sum := sum + (col at: i) ].

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Sven Van Caekenberghe
Hi Jimmy,

I made a couple more changes:

- I added 

SequenceableCollection>>#sum
| sum |
sum := 0.
1 to: self size do: [ :each |
sum := sum + (self at: each) ].
^ sum 

as an extension method. It is not 100% semantically the same as the original, 
but it works for our case here. This also optimises #average BTW. This is the 
main one.

- I tried to avoid a couple of integer -> float conversions in 

normalize: n
| nn |

nn := n = 0
ifTrue: [ 0.000123456789 ] 
ifFalse: [ n asFloat ].

[ nn <= 0.0001 ] whileTrue: [ nn := nn * 10.0 ].
[ nn >= 1.0 ] whileTrue: [ nn := nn * 0.1 ].

^ nn

- Avoided one assignment in

loop1calc: i j: j n: n
| v |
v := n * (i+n) * (j-n) * 0.1234567.
^ self normalize: (v*v*v)
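For anyone lining this up against the Python edition of the benchmark, the two methods above translate roughly to the sketch below (hypothetical Python names mirroring the Smalltalk selectors; the constants are taken from the code above):

```python
def normalize(n):
    # Scale n into the interval (0.0001, 1.0); 0 is replaced by a fixed seed.
    # Like the Smalltalk original, this assumes n >= 0.
    nn = 0.000123456789 if n == 0 else float(n)
    while nn <= 0.0001:
        nn *= 10.0
    while nn >= 1.0:
        nn *= 0.1
    return nn

def loop1calc(i, j, n):
    # Single intermediate assignment, as in the Smalltalk version above.
    v = n * (i + n) * (j - n) * 0.1234567
    return normalize(v * v * v)
```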

the time for 10 iterations now is halved:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800   timeToRun: 0:00:00:00.002

Starting loop 1 at: 2022-01-07T19:28:52.109011+01:00
Loop 1 time: nil
nsum: 11234.235001659386
navg: 0.3900776042242842

Starting loop 2 at: 2022-01-07T19:31:21.821784+01:00
Loop 2 time: 0:00:02:28.017
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:04:57.733
===

Sven

> On 7 Jan 2022, at 16:30, Sven Van Caekenberghe  wrote:
> 
> 
> 
>> On 7 Jan 2022, at 16:05, Jimmie Houchin  wrote:
>> 
>> Hello Sven,
>> 
>> I went and removed the Stdouts that you mention and other timing code from 
>> the loops.
>> 
>> I am running the test now, to see if that makes much difference. I do not 
>> think it will.
>> 
>> The reason I put that in there is because it takes so long to run. It can be 
>> frustrating to wait and wait and not know if your test is doing anything or 
>> not. So I put the code in to let me know.
>> 
>> One of your parameters is incorrect. It is 100 iterations not 10.
> 
> Ah, I misread the Python code, on top it says, reps = 10, while at the bottom 
> it does indeed say, doit(100).
> 
> So the time should be multiplied by 10.
> 
>> The logging, esp. the #flush, will slow things down. But removing the 
>> message tally spy is important too.
> 
> The general implementation of #sum is not optimal in the case of a fixed 
> array. Consider:
> 
> data := Array new: 1e5 withAll: 0.5.
> 
> [ data sum ] bench. "'494.503 per second'"
> 
> [ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. 
> "'680.128 per second'"
> 
> [ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. 
> sum ] bench. "'1033.180 per second'"
> 
> As others have remarked: doing #average right after #sum is doing the same 
> thing twice. But maybe that is not the point.
> 
>> I learned early on in this experiment that I have to do a large number of 
>> iterations or C, C++, Java, etc are too fast to have comprehensible results.
>> 
>> I can tell if any of the implementations is incorrect by the final nsum. All 
>> implementations must produce the same result.
>> 
>> Thanks for the comments.
>> 
>> Jimmie
>> 
>> 
>> On 1/7/22 07:40, Sven Van Caekenberghe wrote:
>>> Hi Jimmie,
>>> 
>>> I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1
>>> 
>>> I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) 
>>> (not done in Python either) as well as the MessageTally spyOn: from #run 
>>> (slows things down).
>>> 
>>> Then I ran your code with:
>>> 
>>> [ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
>>> 
>>> which gave me "0:00:09:31.338"
>>> 
>>> The console output was:
>>> 
>>> ===
>>> Starting test for array size: 28800  iterations: 10
>>> 
>>> Creating array of size: 28800   timeToRun: 0:00:00:00.031
>>> 
>>> Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
>>> Loop 1 time: nil
>>> nsum: 11234.235001659388
>>> navg: 0.39007760422428434
>>> 
>>> Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
>>> Loop 2 time: 0:00:04:44.593
>>> nsum: 11245.697629561537
>>> navg: 0.3904756121375534
>>> 
>>> End of test.  TotalTime: 0:00:09:31.338
>>> ===
>>> 
>>> Which would be twice as fast as Python, if I got the parameters correct.
>>> 
>>> Sven
>>> 
 On 7 Jan 2022, at 13:19, Jimmie Houchin  wrote:
 
As I stated, this is a micro benchmark and very much not anything 
resembling a real app. Your comments are true if you are writing your app. 
But if you want to stress the language you are going to do things which 
are seemingly nonsensical and abusive.
 
 Also as I stated. The test has to be sufficient to stress faster languages 
 or it is meaningless.
 
 If I remove the #sum and the #average calls from the inner loops, this is 
 what we get.
 
 Julia  0.2256 seconds
 Python   5.318  seconds
Pharo    3.5    seconds
 
 This test does not sufficiently stress the language. Nor does it provide 
 any valuable insight into summing and averaging 

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Benoit St-Jean via Pharo-dev
Can you come up with a simple "base case" so we can find the bottleneck/problem?
I'm not sure about what you're trying to do.
What do you get if you try this in a workspace (adjust the value of n to what 
you want, I tested it with 10 million items).
Let's get this one step at a time!


|  floatArray  n  rng t1 t2 t3 r1 r2 r3 |
n := 1000.
rng := Random new.
floatArray := Array new: n.
floatArray doWithIndex: [:each :idx | floatArray at: idx put: rng next].

t1 := Time millisecondsToRun: [r1 := floatArray sum].

t2 := Time millisecondsToRun: [| total |
total := 0.
floatArray do: [:each | total := total + each ].
r2 := total].

t3 := Time millisecondsToRun: [r3 := floatArray inject: 0 into: [:total :each | total + each ]].

Transcript cr.
Transcript cr; show: 'Test with ', n printString, ' elements'.
Transcript cr; show: 'Original #sum -> Time: ', t1 printString, ' milliseconds, Total: ', r1 printString.
Transcript cr; show: 'Naive #sum -> Time: ', t2 printString, ' milliseconds, Total: ', r2 printString.
Transcript cr; show: 'Inject #sum -> Time: ', t3 printString, ' milliseconds, Total: ', r3 printString.
--
Here are the results I get on Squeak 5.3
Test with 1000 elements
Original #sum -> Time: 143 milliseconds, Total: 4.999271889099622e6
Naive #sum -> Time: 115 milliseconds, Total: 4.999271889099622e6
Inject #sum -> Time: 102 milliseconds, Total: 4.999271889099622e6


- 
Benoît St-Jean 
Yahoo! Messenger: bstjean 
Twitter: @BenLeChialeux 
Pinterest: benoitstjean 
Instagram: Chef_Benito
IRC: lamneth 
GitHub: bstjean
Blogue: endormitoire.wordpress.com 
"A standpoint is an intellectual horizon of radius zero".  (A. Einstein) 

On Thursday, January 6, 2022, 03:38:22 p.m. EST, Jimmie Houchin 
 wrote:  
 
 I have written a micro benchmark which stresses a language in areas 
which are crucial to my application.

I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
PicoLisp, C, C++, Java and Julia.

On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
amazing magic they have done.

Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. 
Pharo takes over 2 hours. :(

In my benchmarks, if I comment out the sum and average of the array, it 
completes in 3.5 seconds.
And when I sum the array it gives the correct results. So I can verify 
its validity.

To illustrate, below is some sample code of what I am doing. I iterate 
over the array, do calculations on each value, update the array, and sum 
and average at each value, simply to stress array access, summing, and 
averaging.

28800 is simply derived from time series one minute values for 5 days, 4 
weeks.

randarray := Array new: 28800.

1 to: randarray size do: [ :i | randarray at: i put: Number random ].

randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
here." randarray sum. randarray average ]] timeToRun.

randarrayttr. "0:00:00:36.135"


I do 2 loops with 100 iterations each.

randarrayttr * 200. "0:02:00:27"


I learned early on in this adventure when dealing with compiled 
languages that if you don’t do a lot, the test may not last long enough 
to give any times.

Pharo is my preference. But this is an awful big gap in performance. 
When doing backtesting this is huge. Does my backtest take minutes, 
hours or days?

I am not a computer scientist nor expert in Pharo or Smalltalk. So I do 
not know if there is anything which can improve this.


However I have played around with several experiments of my #sum: method.

This implementation reduces the time on the above randarray in half.

sum: col
| sum |
sum := 0.
1 to: col size do: [ :i |
  sum := sum + (col at: i) ].
^ sum

randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations 
here."
     ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"

And this one reduces it a little more.

sum10: col
| sum |
sum := 0.
1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
  sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) + 
(col at: (i + 3)) + (col at: (i + 4))
  + (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) + 
(col at: (i + 8)) + (col at: (i + 9))].
((col size quo: 10) * 10 + 1) to: col size do: [ :i |
  sum := sum + (col at: i)].
^ sum

randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations 
here."
     ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"
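The same partial-unrolling trick reads as follows in Python (a sketch for comparison only; in CPython it mainly trims loop bookkeeping, since there is no vectorisation to unlock):

```python
def sum10(col):
    # Sum 10 elements per iteration, then mop up the remainder,
    # mirroring the Smalltalk sum10: above.
    n = (len(col) // 10) * 10
    total = 0.0
    for i in range(0, n, 10):
        total += (col[i] + col[i + 1] + col[i + 2] + col[i + 3] + col[i + 4]
                  + col[i + 5] + col[i + 6] + col[i + 7] + col[i + 8] + col[i + 9])
    for i in range(n, len(col)):
        total += col[i]
    return total
```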

It closes the gap with plain Python3 no numpy. But that is a pretty low 
standard.

Any ideas, thoughts, wisdom, directions to pursue.

Thanks

Jimmie

  

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Sven Van Caekenberghe



> On 7 Jan 2022, at 16:05, Jimmie Houchin  wrote:
> 
> Hello Sven,
> 
> I went and removed the Stdouts that you mention and other timing code from 
> the loops.
> 
> I am running the test now, to see if that makes much difference. I do not 
> think it will.
> 
> The reason I put that in there is because it takes so long to run. It can be 
> frustrating to wait and wait and not know if your test is doing anything or 
> not. So I put the code in to let me know.
> 
> One of your parameters is incorrect. It is 100 iterations not 10.

Ah, I misread the Python code, on top it says, reps = 10, while at the bottom 
it does indeed say, doit(100).

So the time should be multiplied by 10.

The logging, esp. the #flush, will slow things down. But removing the 
message tally spy is important too.

The general implementation of #sum is not optimal in the case of a fixed array. 
Consider:

data := Array new: 1e5 withAll: 0.5.

[ data sum ] bench. "'494.503 per second'"

[ | sum | sum := 0. data do: [ :each | sum := sum + each ]. sum ] bench. 
"'680.128 per second'"

[ | sum | sum := 0. 1 to: 1e5 do: [ :each | sum := sum + (data at: each) ]. sum 
] bench. "'1033.180 per second'"
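For what it's worth, the same three styles can be timed in plain Python; note that in CPython the ranking typically reverses, because indexed access (`xs[i]`) is slower than direct iteration, the opposite of what #bench shows for Pharo Arrays above. A rough sketch (numbers vary by machine):

```python
import timeit

data = [0.5] * 100_000

def sum_do(xs):
    # analogue of: data do: [ :each | sum := sum + each ]
    total = 0.0
    for each in xs:
        total += each
    return total

def sum_at(xs):
    # analogue of: 1 to: 1e5 do: [ :i | sum := sum + (data at: i) ]
    total = 0.0
    for i in range(len(xs)):
        total += xs[i]
    return total

for f in (sum, sum_do, sum_at):
    print(f.__name__, round(timeit.timeit(lambda: f(data), number=100), 3))
```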

As others have remarked: doing #average right after #sum is doing the same 
thing twice. But maybe that is not the point.
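Since #average is just the sum divided by the size, the two can be fused into one pass; a minimal sketch:

```python
def sum_and_average(xs):
    # One pass: compute the total once and derive the average from it,
    # instead of walking the array twice (#sum, then #average).
    total = 0.0
    for x in xs:
        total += x
    return total, total / len(xs)
```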

> I learned early on in this experiment that I have to do a large number of 
> iterations or C, C++, Java, etc are too fast to have comprehensible results.
> 
> I can tell if any of the implementations is incorrect by the final nsum. All 
> implementations must produce the same result.
> 
> Thanks for the comments.
> 
> Jimmie
> 
> 
> On 1/7/22 07:40, Sven Van Caekenberghe wrote:
>> Hi Jimmie,
>> 
>> I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1
>> 
>> I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) 
>> (not done in Python either) as well as the MessageTally spyOn: from #run 
>> (slows things down).
>> 
>> Then I ran your code with:
>> 
>> [ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.
>> 
>> which gave me "0:00:09:31.338"
>> 
>> The console output was:
>> 
>> ===
>> Starting test for array size: 28800  iterations: 10
>> 
>> Creating array of size: 28800   timeToRun: 0:00:00:00.031
>> 
>> Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
>> Loop 1 time: nil
>> nsum: 11234.235001659388
>> navg: 0.39007760422428434
>> 
>> Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
>> Loop 2 time: 0:00:04:44.593
>> nsum: 11245.697629561537
>> navg: 0.3904756121375534
>> 
>> End of test.  TotalTime: 0:00:09:31.338
>> ===
>> 
>> Which would be twice as fast as Python, if I got the parameters correct.
>> 
>> Sven
>> 
>>> On 7 Jan 2022, at 13:19, Jimmie Houchin  wrote:
>>> 
>>> As I stated, this is a micro benchmark and very much not anything resembling 
>>> a real app. Your comments are true if you are writing your app. But if you 
>>> want to stress the language you are going to do things which are seemingly 
>>> nonsensical and abusive.
>>> 
>>> Also as I stated. The test has to be sufficient to stress faster languages 
>>> or it is meaningless.
>>> 
>>> If I remove the #sum and the #average calls from the inner loops, this is 
>>> what we get.
>>> 
>>> Julia  0.2256 seconds
>>> Python   5.318  seconds
>>> Pharo    3.5    seconds
>>> 
>>> This test does not sufficiently stress the language. Nor does it provide 
>>> any valuable insight into summing and averaging which is done a lot, in 
>>> lots of places in every iteration.
>>> 
>>> If you notice that inner array changes the array every iteration. So every 
>>> call to #sum and #average is getting different data.
>>> 
>>> Full Test
>>> 
>>> Julia 1.13  minutes
>>> Python   24.02 minutes
>>> Pharo    2:09:04
>>> 
>>> Code for the above is now published. You can let me know if I am doing 
>>> something unequal to the various languages.
>>> 
>>> And just remember anything you do which sufficiently changes the test has 
>>> to be done in all the languages to give a fair test. This isn't a 'let's 
>>> make Pharo look good' test. I do want Pharo to look good, but honestly.
>>> 
>>> Yes, I know that I can bind to BLAS or other external libraries. But that 
>>> is not a test of Pharo. The Python is plain Python3, no Numpy, just using 
>>> the default list [] for the array.
>>> 
>>> Julia is a whole other world. It is faster than Numpy. This is their domain 
>>> and they optimize, optimize, optimize all the math. In fact they have 
>>> reached the point that some pure Julia code beats pure Fortran.
>>> 
>>> In all of this I just want Pharo to do the best it can.
>>> 
>>> With the above results unless you already had an investment in Pharo, you 
>>> wouldn't even look. :(
>>> 
>>> Thanks for exploring this with me.
>>> 
>>> 
>>> Jimmie
>>> 
>>> 
>>> 
>>> 
>>> On 1/6/22 18:24, John Brant wrote:
 On Jan 6, 2022, at 4:35 PM, Jimmie Houchin  wrote:
> No, it is an array of floats. The only integers in the 

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Jimmie Houchin

Hello Sven,

I went and removed the Stdouts that you mention and other timing code 
from the loops.


I am running the test now, to see if that makes much difference. I do 
not think it will.


The reason I put that in there is because it takes so long to run. It can 
be frustrating to wait and wait and not know if your test is doing 
anything or not. So I put the code in to let me know.


One of your parameters is incorrect. It is 100 iterations not 10.

I learned early on in this experiment that I have to do a large number 
of iterations or C, C++, Java, etc are too fast to have comprehensible 
results.


I can tell if any of the implementations is incorrect by the final nsum. 
All implementations must produce the same result.


Thanks for the comments.

Jimmie


On 1/7/22 07:40, Sven Van Caekenberghe wrote:

Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not 
done in Python either) as well as the MessageTally spyOn: from #run (slows 
things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800   timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338
===

Which would be twice as fast as Python, if I got the parameters correct.

Sven


On 7 Jan 2022, at 13:19, Jimmie Houchin  wrote:

As I stated, this is a micro benchmark and very much not anything resembling a 
real app. Your comments are true if you are writing your app. But if you want 
to stress the language you are going to do things which are seemingly 
nonsensical and abusive.

Also as I stated. The test has to be sufficient to stress faster languages or 
it is meaningless.

If I remove the #sum and the #average calls from the inner loops, this is what 
we get.

Julia  0.2256 seconds
Python   5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide any 
valuable insight into summing and averaging which is done a lot, in lots of 
places in every iteration.

If you notice that inner array changes the array every iteration. So every call 
to #sum and #average is getting different data.

Full Test

Julia 1.13  minutes
Python   24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing 
something unequal to the various languages.

And just remember anything you do which sufficiently changes the test has to be 
done in all the languages to give a fair test. This isn't a 'let's make Pharo 
look good' test. I do want Pharo to look good, but honestly.

Yes, I know that I can bind to BLAS or other external libraries. But that is 
not a test of Pharo. The Python is plain Python3, no Numpy, just using the 
default list [] for the array.

Julia is a whole other world. It is faster than Numpy. This is their domain and 
they optimize, optimize, optimize all the math. In fact they have reached the 
point that some pure Julia code beats pure Fortran.

In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, you 
wouldn't even look. :(

Thanks for exploring this with me.


Jimmie




On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin  wrote:

No, it is an array of floats. The only integers in the test are in the indexes 
of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if 
Nim, Crystal, Julia are faster. But...


I am new to Iceberg and have never shared anything on Github so this is all new 
to me. I uploaded my language test so you can see what it does. It is a 
micro-benchmark. It does things that are not realistic in an app. But it does 
stress a language in areas important to my app.


https://github.com/jlhouchin/LanguageTestPharo


Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.


Are you sure that you have the same algorithm in Python? You are calling sum 
and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
n := narray at: j.
narray at: j put: (self loop1calc: i j: j n: n).
nsum := narray sum.
navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus 
another 28,800 times for the 

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Sven Van Caekenberghe
Hi Jimmie,

I loaded your code in Pharo 9 on my MacBook Pro (Intel i5) macOS 12.1

I commented out the Stdio logging from the 2 inner loops (#loop1, #loop2) (not 
done in Python either) as well as the MessageTally spyOn: from #run (slows 
things down).

Then I ran your code with:

[ (LanguageTest newSize: 60*24*5*4 iterations: 10) run ] timeToRun.

which gave me "0:00:09:31.338"

The console output was:

===
Starting test for array size: 28800  iterations: 10

Creating array of size: 28800   timeToRun: 0:00:00:00.031

Starting loop 1 at: 2022-01-07T14:10:35.395394+01:00
Loop 1 time: nil
nsum: 11234.235001659388
navg: 0.39007760422428434

Starting loop 2 at: 2022-01-07T14:15:22.108433+01:00
Loop 2 time: 0:00:04:44.593
nsum: 11245.697629561537
navg: 0.3904756121375534

End of test.  TotalTime: 0:00:09:31.338
===

Which would be twice as fast as Python, if I got the parameters correct.

Sven

> On 7 Jan 2022, at 13:19, Jimmie Houchin  wrote:
> 
> As I stated, this is a micro benchmark and very much not anything resembling a 
> real app. Your comments are true if you are writing your app. But if you want 
> to stress the language you are going to do things which are seemingly 
> nonsensical and abusive.
> 
> Also as I stated. The test has to be sufficient to stress faster languages or 
> it is meaningless.
> 
> If I remove the #sum and the #average calls from the inner loops, this is 
> what we get.
> 
> Julia  0.2256 seconds
> Python   5.318  seconds
> Pharo    3.5    seconds
> 
> This test does not sufficiently stress the language. Nor does it provide any 
> valuable insight into summing and averaging which is done a lot, in lots of 
> places in every iteration.
> 
> If you notice that inner array changes the array every iteration. So every 
> call to #sum and #average is getting different data.
> 
> Full Test
> 
> Julia 1.13  minutes
> Python   24.02 minutes
> Pharo    2:09:04
> 
> Code for the above is now published. You can let me know if I am doing 
> something unequal to the various languages.
> 
> And just remember anything you do which sufficiently changes the test has to 
> be done in all the languages to give a fair test. This isn't a 'let's make 
> Pharo look good' test. I do want Pharo to look good, but honestly.
> 
> Yes, I know that I can bind to BLAS or other external libraries. But that is 
> not a test of Pharo. The Python is plain Python3, no Numpy, just using the 
> default list [] for the array.
> 
> Julia is a whole other world. It is faster than Numpy. This is their domain 
> and they optimize, optimize, optimize all the math. In fact they have reached 
> the point that some pure Julia code beats pure Fortran.
> 
> In all of this I just want Pharo to do the best it can.
> 
> With the above results unless you already had an investment in Pharo, you 
> wouldn't even look. :(
> 
> Thanks for exploring this with me.
> 
> 
> Jimmie
> 
> 
> 
> 
> On 1/6/22 18:24, John Brant wrote:
>> On Jan 6, 2022, at 4:35 PM, Jimmie Houchin  wrote:
>>> No, it is an array of floats. The only integers in the test are in the 
>>> indexes of the loops.
>>> 
>>> Number random. "generates a float  0.8188008774329387"
>>> 
>>> So in the randarray below it is an array of 28800 floats.
>>> 
>>> It just felt so wrong to me that Python3 was so much faster. I don't care 
>>> if Nim, Crystal, Julia are faster. But...
>>> 
>>> 
>>> I am new to Iceberg and have never shared anything on Github so this is all 
>>> new to me. I uploaded my language test so you can see what it does. It is a 
>>> micro-benchmark. It does things that are not realistic in an app. But it 
>>> does stress a language in areas important to my app.
>>> 
>>> 
>>> https://github.com/jlhouchin/LanguageTestPharo
>>> 
>>> 
>>> Let me know if there is anything else I can do to help solve this problem.
>>> 
>>> I am a lone developer in my spare time. So my apologies for any ugly code.
>>> 
>> Are you sure that you have the same algorithm in Python? You are calling sum 
>> and average inside the loop where you are modifying the array:
>> 
>>  1 to: nsize do: [ :j || n |
>>  n := narray at: j.
>>  narray at: j put: (self loop1calc: i j: j n: n).
>>  nsum := narray sum.
>>  navg := narray average ]
>> 
>> As a result, you are calculating the sum of the 28,800 size array 28,800 
>> times (plus another 28,800 times for the average). If I write a similar loop 
>> in Python, it looks like it would take almost 9 minutes on my machine 
>> without using numpy to calculate the sum. The Pharo code takes ~40 seconds. 
>> If this is really how the code should be, then I would change it to not call 
>> sum twice (once for sum and once in average). This will almost result in a 
>> 2x speedup. You could also modify the algorithm to update the nsum value in 
>> the loop instead of summing the array each time. I think the updating would 
>> require <120,000 math ops vs the >1.6 billion that you are performing.
>> 
>> 

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Jimmie Houchin
As I stated, this is a micro benchmark and very much not anything 
resembling a real app. Your comments are true if you are writing your 
app. But if you want to stress the language you are going to do things 
which are seemingly nonsensical and abusive.


Also as I stated. The test has to be sufficient to stress faster 
languages or it is meaningless.


If I remove the #sum and the #average calls from the inner loops, this 
is what we get.


Julia      0.2256 seconds
Python   5.318  seconds
Pharo    3.5    seconds

This test does not sufficiently stress the language. Nor does it provide 
any valuable insight into summing and averaging which is done a lot, in 
lots of places in every iteration.


If you notice that inner array changes the array every iteration. So 
every call to #sum and #average is getting different data.


Full Test

Julia 1.13  minutes
Python   24.02 minutes
Pharo    2:09:04

Code for the above is now published. You can let me know if I am doing 
something unequal to the various languages.


And just remember anything you do which sufficiently changes the test 
has to be done in all the languages to give a fair test. This isn't a 
'let's make Pharo look good' test. I do want Pharo to look good, but honestly.


Yes, I know that I can bind to BLAS or other external libraries. But 
that is not a test of Pharo. The Python is plain Python3, no Numpy, just 
using the default list [] for the array.


Julia is a whole other world. It is faster than Numpy. This is their 
domain and they optimize, optimize, optimize all the math. In fact they 
have reached the point that some pure Julia code beats pure Fortran.


In all of this I just want Pharo to do the best it can.

With the above results unless you already had an investment in Pharo, 
you wouldn't even look. :(


Thanks for exploring this with me.


Jimmie




On 1/6/22 18:24, John Brant wrote:

On Jan 6, 2022, at 4:35 PM, Jimmie Houchin  wrote:

No, it is an array of floats. The only integers in the test are in the indexes 
of the loops.

Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't care if 
Nim, Crystal, Julia are faster. But...


I am new to Iceberg and have never shared anything on Github so this is all new 
to me. I uploaded my language test so you can see what it does. It is a 
micro-benchmark. It does things that are not realistic in an app. But it does 
stress a language in areas important to my app.


https://github.com/jlhouchin/LanguageTestPharo


Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.


Are you sure that you have the same algorithm in Python? You are calling sum 
and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
n := narray at: j.
narray at: j put: (self loop1calc: i j: j n: n).
nsum := narray sum.
navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times (plus 
another 28,800 times for the average). If I write a similar loop in Python, it looks 
like it would take almost 9 minutes on my machine without using numpy to calculate 
the sum. The Pharo code takes ~40 seconds. If this is really how the code should be, 
then I would change it to not call sum twice (once for sum and once in average). This 
will almost result in a 2x speedup. You could also modify the algorithm to update the 
nsum value in the loop instead of summing the array each time. I think the updating 
would require <120,000 math ops vs the >1.6 billion that you are performing.


John Brant


[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread Guillermo Polito
Yes, I just saw also that I used an interval instead of an array… I need to 
sleep more ^^

Anyways, even with a 28k large array, whether they are small integers or floats, 
I have "reasonable results" (where reasonable = not taking hours nor minutes, 
but a couple of milliseconds :P)

randarray := Array new: 28800 withAll: 0.
[randarray sum] bench "'2059.176 per second'"

randarray2 := Array new: 28800 withAll: 0.1234567.
[randarray2 sum] bench "'1771.737 per second'"
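A rough plain-Python equivalent of the two bench snippets above, using only the standard library (the variable names here are illustrative, not from any of the posted code): it measures how many times per second a 28,800-element list of floats can be summed.

```python
import timeit

# Build a 28,800-element list of floats, mirroring randarray2 above.
randarray2 = [0.1234567] * 28800

# Time `sum` over the list and report a bench-style "per second" figure.
n = 50
seconds = timeit.timeit(lambda: sum(randarray2), number=n)
print(f"{n / seconds:.3f} per second")
```

The absolute figure depends heavily on the machine; the point is only to mirror the bench protocol so the Pharo and Python numbers are measured the same way.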

I join John’s request to see the Python code…
Is that possible?
G

> On 6 Jan 2022, at 23:35, Jimmie Houchin wrote:
> 
> No, it is an array of floats. The only integers in the test are in the 
> indexes of the loops.
> 
> Number random. "generates a float  0.8188008774329387"
> 
> So in the randarray below it is an array of 28800 floats.
> 
> It just felt so wrong to me that Python3 was so much faster. I don't care if 
> Nim, Crystal, Julia are faster. But...
> 
> 
> I am new to Iceberg and have never shared anything on Github so this is all 
> new to me. I uploaded my language test so you can see what it does. It is a 
> micro-benchmark. It does things that are not realistic in an app. But it does 
> stress a language in areas important to my app.
> 
> 
> https://github.com/jlhouchin/LanguageTestPharo
> 
> 
> Let me know if there is anything else I can do to help solve this problem.
> 
> I am a lone developer in my spare time. So my apologies for any ugly code.
> 
> 
> Thanks for your help.
> 
> Jimmie
> 
> 
> On 1/6/22 15:07, Guillermo Polito wrote:
>> Hi Jimmie,
>> 
>> Is it possible that your program is computing a lot of **very** large 
>> integers?
>> 
>> I’m just trying the following with small numbers, and I don’t see the issue. 
>> #sum executes on a 28k large collection around 20 million times per second 
>> on my old 2015 i5.
>> 
>> a := (1 to: 28000).
>> [a sum] bench "'20256552.490 per second'"
>> 
>> If you could share with us more data, we could take a look.
>> Now i’m curious.
>> 
>> Thanks,
>> G
>> 
>>> On 6 Jan 2022, at 21:37, Jimmie Houchin wrote:
>>> 
>>> I have written a micro benchmark which stresses a language in areas which 
>>> are crucial to my application.
>>> 
>>> I have written this micro benchmark in Pharo, Crystal, Nim, Python, 
>>> PicoLisp, C, C++, Java and Julia.
>>> 
>>> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, 
>>> amazing magic they have done.
>>> 
>>> Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. Pharo 
>>> takes over 2 hours. :(
>>> 
>>> In my benchmarks, if I comment out the sum and average of the array, it 
>>> completes in 3.5 seconds.
>>> And when I sum the array it gives the correct results, so I can verify its 
>>> validity.
>>> 
>>> To illustrate below is some sample code of what I am doing. I iterate over 
>>> the array, do calculations on each value, update the array, and sum and 
>>> average at each value, simply to stress array access, sum, and average.
>>> 
>>> 28800 is simply derived from time series one minute values for 5 days, 4 
>>> weeks.
>>> 
>>> randarray := Array new: 28800.
>>> 
>>> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
>>> 
>>> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations 
>>> here." randarray sum. randarray average ]] timeToRun.
>>> 
>>> randarrayttr. "0:00:00:36.135"
>>> 
>>> 
>>> I do 2 loops with 100 iterations each.
>>> 
>>> randarrayttr * 200. "0:02:00:27"
>>> 
>>> 
>>> I learned early on in this adventure when dealing with compiled languages 
>>> that if you don’t do a lot, the test may not last long enough to give any 
>>> times.
>>> 
>>> Pharo is my preference. But this is an awfully big gap in performance. When 
>>> doing backtesting this is huge. Does my backtest take minutes, hours or 
>>> days?
>>> 
>>> I am not a computer scientist nor expert in Pharo or Smalltalk. So I do not 
>>> know if there is anything which can improve this.
>>> 
>>> 
>>> However I have played around with several experiments of my #sum: method.
>>> 
>>> This implementation reduces the time on the above randarray in half.
>>> 
>>> sum: col
>>> | sum |
>>> sum := 0.
>>> 1 to: col size do: [ :i |
>>>  sum := sum + (col at: i) ].
>>> ^ sum
>>> 
>>> randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations 
>>> here."
>>> ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
>>> randarrayttr2. "0:00:00:18.563"
>>> 
>>> And this one reduces it a little more.
>>> 
>>> sum10: col
>>> | sum |
>>> sum := 0.
>>> 1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
>>>  sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) + 
>>> (col at: (i + 3)) + (col at: (i + 4))
>>>  + (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) + (col 
>>> at: (i + 8)) + (col at: (i + 9))].
>>> ((col size quo: 10) * 10 + 1) to: col size do: [ :i |
>>>  sum := sum + (col at: i)].
>>> ^ sum
>>> 
>>> randarrayttr3 := [ 1 to: randarray 

[Pharo-dev] Re: Array sum. is very slow

2022-01-07 Thread stephane ducasse

Thanks John

This was an important remark :)

Another remark is that you can also call BLAS for heavy mathematical operations 
(this is what numpy is doing: just calling a large Fortran library; I do not 
know for Julia but it should be the same).
And this is easy to do in Pharo.

https://thepharo.dev/2021/10/17/binding-an-external-library-into-pharo/ 


And now you can define a new binding a lot more easily.


S
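The FFI idea can be sketched in Python with ctypes (a hypothetical illustration, not the Pharo binding the post is about). `cblas_dasum` sums absolute values, which coincides with a plain sum here because the benchmark's values are all non-negative; if no BLAS is installed, the sketch falls back to the builtin sum.

```python
import ctypes
import ctypes.util

def sum_via_blas(values):
    """Sum non-negative floats through cblas_dasum when a BLAS is
    available on this system, otherwise fall back to the builtin sum."""
    path = ctypes.util.find_library("cblas") or ctypes.util.find_library("blas")
    if path is None:
        return sum(values)
    blas = ctypes.CDLL(path)
    dasum = getattr(blas, "cblas_dasum", None)  # symbol may be absent
    if dasum is None:
        return sum(values)
    dasum.restype = ctypes.c_double
    dasum.argtypes = [ctypes.c_int,
                      ctypes.POINTER(ctypes.c_double),
                      ctypes.c_int]
    arr = (ctypes.c_double * len(values))(*values)  # copy into a C array
    return dasum(len(values), arr, 1)

print(sum_via_blas([0.5] * 28800))  # 14400.0 on either path
```

Note the copy into a C array costs O(n) per call, so a real binding would keep the data in native memory across calls, which is exactly what a DoubleArray-style class would give Pharo.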
> On 7 Jan 2022, at 01:24, John Brant  wrote:
> 
> On Jan 6, 2022, at 4:35 PM, Jimmie Houchin wrote:
>> 
>> No, it is an array of floats. The only integers in the test are in the 
>> indexes of the loops.
>> 
>> Number random. "generates a float  0.8188008774329387"
>> 
>> So in the randarray below it is an array of 28800 floats.
>> 
>> It just felt so wrong to me that Python3 was so much faster. I don't care if 
>> Nim, Crystal, Julia are faster. But...
>> 
>> 
>> I am new to Iceberg and have never shared anything on Github so this is all 
>> new to me. I uploaded my language test so you can see what it does. It is a 
>> micro-benchmark. It does things that are not realistic in an app. But it 
>> does stress a language in areas important to my app.
>> 
>> 
>> https://github.com/jlhouchin/LanguageTestPharo
>> 
>> 
>> Let me know if there is anything else I can do to help solve this problem.
>> 
>> I am a lone developer in my spare time. So my apologies for any ugly code.
>> 
> 
> Are you sure that you have the same algorithm in Python? You are calling sum 
> and average inside the loop where you are modifying the array:
> 
>   1 to: nsize do: [ :j || n |
>   n := narray at: j.
>   narray at: j put: (self loop1calc: i j: j n: n).
>   nsum := narray sum.
>   navg := narray average ]
> 
> As a result, you are calculating the sum of the 28,800 size array 28,800 
> times (plus another 28,800 times for the average). If I write a similar loop 
> in Python, it looks like it would take almost 9 minutes on my machine without 
> using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this 
> is really how the code should be, then I would change it to not call sum 
> twice (once for sum and once in average). This will almost result in a 2x 
> speedup. You could also modify the algorithm to update the nsum value in the 
> loop instead of summing the array each time. I think the updating would 
> require <120,000 math ops vs the >1.6 billion that you are performing.
> 
> 
> John Brant



[Pharo-dev] Re: Array sum. is very slow

2022-01-06 Thread John Brant
On Jan 6, 2022, at 4:35 PM, Jimmie Houchin  wrote:
> 
> No, it is an array of floats. The only integers in the test are in the 
> indexes of the loops.
> 
> Number random. "generates a float  0.8188008774329387"
> 
> So in the randarray below it is an array of 28800 floats.
> 
> It just felt so wrong to me that Python3 was so much faster. I don't care if 
> Nim, Crystal, Julia are faster. But...
> 
> 
> I am new to Iceberg and have never shared anything on Github so this is all 
> new to me. I uploaded my language test so you can see what it does. It is a 
> micro-benchmark. It does things that are not realistic in an app. But it does 
> stress a language in areas important to my app.
> 
> 
> https://github.com/jlhouchin/LanguageTestPharo
> 
> 
> Let me know if there is anything else I can do to help solve this problem.
> 
> I am a lone developer in my spare time. So my apologies for any ugly code.
> 

Are you sure that you have the same algorithm in Python? You are calling sum 
and average inside the loop where you are modifying the array:

1 to: nsize do: [ :j || n |
n := narray at: j.
narray at: j put: (self loop1calc: i j: j n: n).
nsum := narray sum.
navg := narray average ]

As a result, you are calculating the sum of the 28,800 size array 28,800 times 
(plus another 28,800 times for the average). If I write a similar loop in 
Python, it looks like it would take almost 9 minutes on my machine without 
using numpy to calculate the sum. The Pharo code takes ~40 seconds. If this is 
really how the code should be, then I would change it to not call sum twice 
(once for sum and once in average). This will almost result in a 2x speedup. 
You could also modify the algorithm to update the nsum value in the loop 
instead of summing the array each time. I think the updating would require 
<120,000 math ops vs the >1.6 billion that you are performing.


John Brant
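John's suggestion can be sketched in Python (with a hypothetical stand-in for `loop1calc:`, since the real formula is only in the repository). Re-summing inside the loop is O(n²): roughly 829 million additions for n = 28,800. Maintaining a running sum is O(n): subtract the old element, add the new one.

```python
import random

def loop1calc(i, j, n):
    # Hypothetical stand-in for the benchmark's per-element formula.
    return (n * 1.000001 + j * 1e-9) % 1.0

narray = [random.random() for _ in range(28800)]
nsum = sum(narray)            # sum the array once, before the loop

for j in range(len(narray)):
    old = narray[j]
    new = loop1calc(1, j, old)
    narray[j] = new
    nsum += new - old         # O(1) update instead of re-summing the array
    navg = nsum / len(narray)

# The running sum stays consistent with a full re-sum (up to float rounding).
assert abs(nsum - sum(narray)) < 1e-6
```

This is the "<120,000 math ops" path: one subtraction, one addition, and one division per element instead of two full passes over the array.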

[Pharo-dev] Re: Array sum. is very slow

2022-01-06 Thread Jimmie Houchin
No, it is an array of floats. The only integers in the test are in the 
indexes of the loops.


Number random. "generates a float  0.8188008774329387"

So in the randarray below it is an array of 28800 floats.

It just felt so wrong to me that Python3 was so much faster. I don't 
care if Nim, Crystal, Julia are faster. But...



I am new to Iceberg and have never shared anything on Github so this is 
all new to me. I uploaded my language test so you can see what it does. 
It is a micro-benchmark. It does things that are not realistic in an 
app. But it does stress a language in areas important to my app.



https://github.com/jlhouchin/LanguageTestPharo


Let me know if there is anything else I can do to help solve this problem.

I am a lone developer in my spare time. So my apologies for any ugly code.


Thanks for your help.

Jimmie


On 1/6/22 15:07, Guillermo Polito wrote:

Hi Jimmie,

Is it possible that your program is computing a lot of **very** large integers?

I’m just trying the following with small numbers, and I don’t see the issue. 
#sum executes on a 28k large collection around 20 million times per second on 
my old 2015 i5.

a := (1 to: 28000).
[a sum] bench "'20256552.490 per second'"

If you could share with us more data, we could take a look.
Now i’m curious.

Thanks,
G


On 6 Jan 2022, at 21:37, Jimmie Houchin wrote:

I have written a micro benchmark which stresses a language in areas which are 
crucial to my application.

I have written this micro benchmark in Pharo, Crystal, Nim, Python, PicoLisp, 
C, C++, Java and Julia.

On my i7 laptop Julia completes it in about 1 minute and 15 seconds, amazing 
magic they have done.

Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. Pharo 
takes over 2 hours. :(

In my benchmarks, if I comment out the sum and average of the array, it 
completes in 3.5 seconds.
And when I sum the array it gives the correct results, so I can verify its 
validity.

To illustrate below is some sample code of what I am doing. I iterate over the 
array, do calculations on each value, update the array, and sum and average at 
each value, simply to stress array access, sum, and average.

28800 is simply derived from one-minute time series values for 5 days, 4 weeks 
(60 × 24 × 5 × 4 = 28,800).

randarray := Array new: 28800.

1 to: randarray size do: [ :i | randarray at: i put: Number random ].

randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations here." 
randarray sum. randarray average ]] timeToRun.

randarrayttr. "0:00:00:36.135"


I do 2 loops with 100 iterations each.

randarrayttr * 200. "0:02:00:27"


I learned early on in this adventure when dealing with compiled languages that 
if you don’t do a lot, the test may not last long enough to give any times.

Pharo is my preference. But this is an awfully big gap in performance. When doing 
backtesting this is huge. Does my backtest take minutes, hours or days?

I am not a computer scientist nor expert in Pharo or Smalltalk. So I do not 
know if there is anything which can improve this.


However I have played around with several experiments of my #sum: method.

This implementation reduces the time on the above randarray in half.

sum: col
| sum |
sum := 0.
1 to: col size do: [ :i |
  sum := sum + (col at: i) ].
^ sum

randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations here."
 ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
randarrayttr2. "0:00:00:18.563"

And this one reduces it a little more.

sum10: col
| sum |
sum := 0.
1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
  sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) + (col 
at: (i + 3)) + (col at: (i + 4))
  + (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) + (col 
at: (i + 8)) + (col at: (i + 9))].
((col size quo: 10) * 10 + 1) to: col size do: [ :i |
  sum := sum + (col at: i)].
^ sum

randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations here."
 ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
randarrayttr3. "0:00:00:14.592"

It closes the gap with plain Python 3 (no NumPy). But that is a pretty low 
standard.
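For comparison, here is a hypothetical Python transliteration of the #sum10: method above, with the one-based Smalltalk indices shifted to zero-based: sum ten elements per loop iteration, then mop up the remainder.

```python
def sum10(col):
    """Sum `col` ten elements at a time, then finish the remainder."""
    total = 0.0
    limit = (len(col) // 10) * 10
    for i in range(0, limit, 10):
        total += (col[i]     + col[i + 1] + col[i + 2] + col[i + 3]
                  + col[i + 4] + col[i + 5] + col[i + 6] + col[i + 7]
                  + col[i + 8] + col[i + 9])
    for i in range(limit, len(col)):  # elements past the last block of 10
        total += col[i]
    return total

print(sum10([1.0] * 23))  # 23.0
```

In both languages the win comes from amortizing per-iteration loop overhead (bounds checks, bytecode or block dispatch) over ten element accesses, not from faster arithmetic.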

Any ideas, thoughts, wisdom, directions to pursue.

Thanks

Jimmie



[Pharo-dev] Re: Array sum. is very slow

2022-01-06 Thread Guillermo Polito
Hi Jimmie,

Is it possible that your program is computing a lot of **very** large integers?

I’m just trying the following with small numbers, and I don’t see the issue. 
#sum executes on a 28k large collection around 20 million times per second on 
my old 2015 i5.

a := (1 to: 28000).
[a sum] bench "'20256552.490 per second'"

If you could share with us more data, we could take a look.
Now i’m curious.

Thanks,
G

> On 6 Jan 2022, at 21:37, Jimmie Houchin wrote:
> 
> I have written a micro benchmark which stresses a language in areas which are 
> crucial to my application.
> 
> I have written this micro benchmark in Pharo, Crystal, Nim, Python, PicoLisp, 
> C, C++, Java and Julia.
> 
> On my i7 laptop Julia completes it in about 1 minute and 15 seconds, amazing 
> magic they have done.
> 
> Crystal and Nim do it in about 5 minutes. Python in about 25 minutes. Pharo 
> takes over 2 hours. :(
> 
> In my benchmarks, if I comment out the sum and average of the array, it 
> completes in 3.5 seconds.
> And when I sum the array it gives the correct results, so I can verify its 
> validity.
> 
> To illustrate below is some sample code of what I am doing. I iterate over 
> the array, do calculations on each value, update the array, and sum and 
> average at each value, simply to stress array access, sum, and average.
> 
> 28800 is simply derived from time series one minute values for 5 days, 4 
> weeks.
> 
> randarray := Array new: 28800.
> 
> 1 to: randarray size do: [ :i | randarray at: i put: Number random ].
> 
> randarrayttr := [ 1 to: randarray size do: [ :i | "other calculations here." 
> randarray sum. randarray average ]] timeToRun.
> 
> randarrayttr. "0:00:00:36.135"
> 
> 
> I do 2 loops with 100 iterations each.
> 
> randarrayttr * 200. "0:02:00:27"
> 
> 
> I learned early on in this adventure when dealing with compiled languages 
> that if you don’t do a lot, the test may not last long enough to give any 
> times.
> 
> Pharo is my preference. But this is an awfully big gap in performance. When 
> doing backtesting this is huge. Does my backtest take minutes, hours or days?
> 
> I am not a computer scientist nor expert in Pharo or Smalltalk. So I do not 
> know if there is anything which can improve this.
> 
> 
> However I have played around with several experiments of my #sum: method.
> 
> This implementation reduces the time on the above randarray in half.
> 
> sum: col
> | sum |
> sum := 0.
> 1 to: col size do: [ :i |
>  sum := sum + (col at: i) ].
> ^ sum
> 
> randarrayttr2 := [ 1 to: randarray size do: [ :i | "other calculations here."
> ltsa sum: randarray. ltsa sum: randarray ]] timeToRun.
> randarrayttr2. "0:00:00:18.563"
> 
> And this one reduces it a little more.
> 
> sum10: col
> | sum |
> sum := 0.
> 1 to: ((col size quo: 10) * 10) by: 10 do: [ :i |
>  sum := sum + (col at: i) + (col at: (i + 1)) + (col at: (i + 2)) + (col 
> at: (i + 3)) + (col at: (i + 4))
>  + (col at: (i + 5)) + (col at: (i + 6)) + (col at: (i + 7)) + (col 
> at: (i + 8)) + (col at: (i + 9))].
> ((col size quo: 10) * 10 + 1) to: col size do: [ :i |
>  sum := sum + (col at: i)].
> ^ sum
> 
> randarrayttr3 := [ 1 to: randarray size do: [ :i | "other calculations here."
> ltsa sum10: randarray. ltsa sum10: randarray ]] timeToRun.
> randarrayttr3. "0:00:00:14.592"
> 
> It closes the gap with plain Python3 no numpy. But that is a pretty low 
> standard.
> 
> Any ideas, thoughts, wisdom, directions to pursue.
> 
> Thanks
> 
> Jimmie
>