Re: How to make Python run as fast (or faster) than Julia

2018-03-07 Thread Python
On Thu, Mar 08, 2018 at 08:44:16AM +1100, Chris Angelico wrote:
> On Thu, Mar 8, 2018 at 8:36 AM, Python  wrote:
> > On Mon, Mar 05, 2018 at 04:09:48PM -0800, Dan Stromberg wrote:
> >> On Mon, Mar 5, 2018 at 3:53 PM, Python  wrote:
> >> > On Sat, Mar 03, 2018 at 08:18:03AM +1100, Chris Angelico wrote:
> >> >> > Python is often a preferred solution because it is often fantastic for
> >> >> > rapid implementation and maintainability.  The GIL's interference
> >> >> > with threaded code performance has, for me at least, on several
> >> >> > occasions been...  disappointing (perf costs of removing it aside)
> >> >> > because it gets in the way of choosing Python for such solutions.
> >> >> > Jython and IronPython are simply not feasible options for me, for
> >> >> > multiple reasons that have zero to do with their technical
> >> >> > suitability.
> >> >>
> >> >> Have you actually tried it and run into problems,
> >> >
> >> > Yes.  It was years ago and I forget the details, but I even posted
> >> > some sample code here and was told (quite possibly by you) that it was
> >> > the GIL that was eating my lunch.  Someone suggested writing the bits
> >> > I wanted to thread as a C extension, which largely defeated the
> >> > purpose of using Python.  In at least one case I just used C++, and in
> >> > another I just ignored the problem until it went away.
> >>
> >> So how about a little Cython?
> >
> > Nope.  As a practical matter, the only Python option I have is
> > Python proper (i.e. CPython).
> 
> So you can use C++, but only if you write the entire program in it,
> instead of using Cython and only transforming the part that's actually
> slow?

That describes the situation closely enough that I don't feel the need
to elaborate, yes.

I will say that /in theory/ I can do whatever I want; in practice
there are significant barriers to integrating new tools into the build
system I'm required to use, which make it prohibitive to do so.  There
are complexities there which I don't care to describe, but the point
remains: effectively, the only Python option I have available is
CPython.


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-07 Thread Chris Angelico
On Thu, Mar 8, 2018 at 8:36 AM, Python  wrote:
> On Mon, Mar 05, 2018 at 04:09:48PM -0800, Dan Stromberg wrote:
>> On Mon, Mar 5, 2018 at 3:53 PM, Python  wrote:
>> > On Sat, Mar 03, 2018 at 08:18:03AM +1100, Chris Angelico wrote:
>> >> > Python is often a preferred solution because it is often fantastic for
>> >> > rapid implementation and maintainability.  The GIL's interference
>> >> > with threaded code performance has, for me at least, on several
>> >> > occasions been...  disappointing (perf costs of removing it aside)
>> >> > because it gets in the way of choosing Python for such solutions.
>> >> > Jython and IronPython are simply not feasible options for me, for
>> >> > multiple reasons that have zero to do with their technical
>> >> > suitability.
>> >>
>> >> Have you actually tried it and run into problems,
>> >
>> > Yes.  It was years ago and I forget the details, but I even posted
>> > some sample code here and was told (quite possibly by you) that it was
>> > the GIL that was eating my lunch.  Someone suggested writing the bits
>> > I wanted to thread as a C extension, which largely defeated the
>> > purpose of using Python.  In at least one case I just used C++, and in
>> > another I just ignored the problem until it went away.
>>
>> So how about a little Cython?
>
> Nope.  As a practical matter, the only Python option I have is
> Python proper (i.e. CPython).

So you can use C++, but only if you write the entire program in it,
instead of using Cython and only transforming the part that's actually
slow?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-07 Thread Python
On Mon, Mar 05, 2018 at 04:09:48PM -0800, Dan Stromberg wrote:
> On Mon, Mar 5, 2018 at 3:53 PM, Python  wrote:
> > On Sat, Mar 03, 2018 at 08:18:03AM +1100, Chris Angelico wrote:
> >> > Python is often a preferred solution because it is often fantastic for
> >> > rapid implementation and maintainability.  The GIL's interference
> >> > with threaded code performance has, for me at least, on several
> >> > occasions been...  disappointing (perf costs of removing it aside)
> >> > because it gets in the way of choosing Python for such solutions.
> >> > Jython and IronPython are simply not feasible options for me, for
> >> > multiple reasons that have zero to do with their technical
> >> > suitability.
> >>
> >> Have you actually tried it and run into problems,
> >
> > Yes.  It was years ago and I forget the details, but I even posted
> > some sample code here and was told (quite possibly by you) that it was
> > the GIL that was eating my lunch.  Someone suggested writing the bits
> > I wanted to thread as a C extension, which largely defeated the
> > purpose of using Python.  In at least one case I just used C++, and in
> > another I just ignored the problem until it went away.
> 
> So how about a little Cython?

Nope.  As a practical matter, the only Python option I have is
Python proper (i.e. CPython).

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-05 Thread Dan Stromberg
On Mon, Mar 5, 2018 at 3:53 PM, Python  wrote:
> On Sat, Mar 03, 2018 at 08:18:03AM +1100, Chris Angelico wrote:
>> > Python is often a preferred solution because it is often fantastic for
>> > rapid implementation and maintainability.  The GIL's interference
>> > with threaded code performance has, for me at least, on several
>> > occasions been...  disappointing (perf costs of removing it aside)
>> > because it gets in the way of choosing Python for such solutions.
>> > Jython and IronPython are simply not feasible options for me, for
>> > multiple reasons that have zero to do with their technical
>> > suitability.
>>
>> Have you actually tried it and run into problems,
>
> Yes.  It was years ago and I forget the details, but I even posted
> some sample code here and was told (quite possibly by you) that it was
> the GIL that was eating my lunch.  Someone suggested writing the bits
> I wanted to thread as a C extension, which largely defeated the
> purpose of using Python.  In at least one case I just used C++, and in
> another I just ignored the problem until it went away.

So how about a little Cython?  It has decent GIL control, isn't much
different from Python syntactically, and can be used to create C
extension modules callable from CPython.  It allows you to pretty
freely intermix Python data types and C data types - just be careful
about implicit conversions from one to the other - they can slow
things down.
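
For a feel of what that looks like, here is a minimal sketch using
Cython's "pure Python" mode (illustrative only, not from this thread;
assumes a reasonably recent Cython is installed):

# csum.py -- valid Python as-is; compiling it with "cythonize -i csum.py"
# turns the annotated locals into C integers and the loop into C code.
import cython

def checksum(n: cython.ulonglong) -> cython.ulonglong:
    total: cython.ulonglong = 0
    i: cython.ulonglong = 0
    while i < n:
        # the mask keeps CPython's bignums in line; compiled, the
        # C integers wrap anyway and the & is effectively free
        total = (total + i * i) & 0xFFFFFFFFFFFFFFFF
        i += 1
    return total

if __name__ == "__main__":
    print(checksum(10_000_000))

Under plain CPython the annotations have no typing effect and the file
still runs (just slowly), so the same source serves both interpreters.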
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-05 Thread Python
On Sat, Mar 03, 2018 at 08:18:03AM +1100, Chris Angelico wrote:
> > Python is often a preferred solution because it is often fantastic for
> > rapid implementation and maintainability.  The GIL's interference
> > with threaded code performance has, for me at least, on several
> > occasions been...  disappointing (perf costs of removing it aside)
> > because it gets in the way of choosing Python for such solutions.
> > Jython and IronPython are simply not feasible options for me, for
> > multiple reasons that have zero to do with their technical
> > suitability.
> 
> Have you actually tried it and run into problems, 

Yes.  It was years ago and I forget the details, but I even posted
some sample code here and was told (quite possibly by you) that it was
the GIL that was eating my lunch.  Someone suggested writing the bits
I wanted to thread as a C extension, which largely defeated the
purpose of using Python.  In at least one case I just used C++, and in
another I just ignored the problem until it went away.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-02 Thread Steven D'Aprano
On Fri, 02 Mar 2018 14:45:55 -0600, Python wrote:

> On Mon, Feb 26, 2018 at 09:57:06AM +, Steven D'Aprano wrote:
>> Besides, if you want Python with no GIL so you can run threaded code,
>> why aren't you using IronPython or Jython?
> 
> But this is just another oversimplified argument.  In the real world
> there rather often exist constraints which have nothing to do with what
> is technically possible, but which make impractical or infeasible what
> may be technically possible.

Of course it is simplified. As is the argument that the GIL is the source 
of all evil, er, why Python threading is slow.

In reality, we may have good reasons for not using IronPython or Jython, 
such as the desire for the latest 3.x version, or because we rely on C 
external libraries that don't work under the CLR or JVM.


[...]
> But back to the point of the thread, I think it's ironic that you're
> basically making the benchmarks' argument:  Choose the tool which suits
> the solution you need.  You argue we should use IronPython or Jython
> when threads are needed, and the Julia benchmarks *imply* (but never
> actually argue, AFAICT) that if you need a solution which relies on
> heavy use of function calls, since Python will be slower than Julia, you
> should therefore choose Julia instead.

Indeed. While I haven't said so explicitly in this thread, I agree 
entirely with everything you say here. One of the things I'm very 
interested in is the possibility of writing number-crunching code in 
Julia and calling it from Python.

I haven't tried it yet, but it is on my bucket-list :-)
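
For anyone tempted to try, the pyjulia bridge is one way to do it (a
sketch only; assumes the 'julia' Python package and a Julia runtime
are installed, and the details may differ by version):

from julia import Main  # pyjulia's entry point to Julia's Main module

# Julia compiles the number-crunching; Python just drives it.
Main.eval("fib(n) = n <= 2 ? 1 : fib(n - 1) + fib(n - 2)")
print(Main.fib(30))     # -> 832040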


> But that's only a sensible
> conclusion if all other things be equal, which of course they are not,
> neither for Julia vs. Python, nor for Python vs. Jython or IronPython
> (though likely the latter case is slightly closer to being true).

There are always constraints on what language you might use. Foo-lang 
might be the best language for the job, but since I don't know the 
language (or even heard of it), I can't use it. You may be constrained by 
familiarity, shop politics, customer requirements, the availability of 
staff, personal preferences, legal requirements, etc.

But taking all of those things into consideration, the conclusion still 
fits: use the language that suits you and the job best. If that language 
is Python, then this is how you can get the best performance if and when 
needed.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-02 Thread Chris Angelico
On Sat, Mar 3, 2018 at 7:45 AM, Python  wrote:
> On Mon, Feb 26, 2018 at 09:57:06AM +, Steven D'Aprano wrote:
>> Besides, if you want Python with no GIL so you can run threaded code, why
>> aren't you using IronPython or Jython?
>
> But this is just another oversimplified argument.  In the real world
> there rather often exist constraints which have nothing to do with
> what is technically possible, but which make impractical or infeasible
> what may be technically possible.
>
> Python is often a preferred solution because it is often fantastic for
> rapid implementation and maintainability.  The GIL's interference
> with threaded code performance has, for me at least, on several
> occasions been...  disappointing (perf costs of removing it aside)
> because it gets in the way of choosing Python for such solutions.
> Jython and IronPython are simply not feasible options for me, for
> multiple reasons that have zero to do with their technical
> suitability.

Have you actually tried it and run into problems, or do you succumb to
the FUD and give up before implementing? I can think of only one time
when I tried to speed up a program by multithreading it and was unable
to due to locking. (And actually, it wasn't even the GIL that was the
problem - it was a non-thread-safe library I was calling on, and
*that* meant everything serialized.) "One time, people!" -- Wasabi

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-03-02 Thread Python
On Mon, Feb 26, 2018 at 09:57:06AM +, Steven D'Aprano wrote:
> Besides, if you want Python with no GIL so you can run threaded code, why 
> aren't you using IronPython or Jython?

But this is just another oversimplified argument.  In the real world
there rather often exist constraints which have nothing to do with
what is technically possible, but which make impractical or infeasible
what may be technically possible.

Python is often a preferred solution because it is often fantastic for
rapid implementation and maintainability.  The GIL's interference
with threaded code performance has, for me at least, on several
occasions been...  disappointing (perf costs of removing it aside)
because it gets in the way of choosing Python for such solutions.
Jython and IronPython are simply not feasible options for me, for
multiple reasons that have zero to do with their technical
suitability.

But back to the point of the thread, I think it's ironic that you're
basically making the benchmarks' argument:  Choose the tool which
suits the solution you need.  You argue we should use IronPython or
Jython when threads are needed, and the Julia benchmarks *imply* (but
never actually argue, AFAICT) that if you need a solution which relies
on heavy use of function calls, since Python will be slower than Julia,
you should therefore choose Julia instead.  But that's only a sensible
conclusion if all other things be equal, which of course they are not,
neither for Julia vs. Python, nor for Python vs. Jython or IronPython
(though likely the latter case is slightly closer to being true).

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-27 Thread Steven D'Aprano
On Tue, 27 Feb 2018 11:27:21 +, bartc wrote:

[...]
>> The built-in generator is using a completely different algorithm
>> though, so rate of generation isn't the entire story. How long is the
>> period of the one you're using? (How long before it loops?)
> 
> I believe it's 5*2**1320480*(2**64-1) according to the author's comment.
> 
> I haven't tested that.

What, you haven't run a full period of the generator? I thought your code 
was fast! *wink*

Sorry, I couldn't resist... :-)

But for the record, if we could generate a million million numbers per 
second, that is one every picosecond, the total time would be of the 
order of 10**397504 years. That's a one followed by three hundred 
ninety-seven thousand, five hundred and four zeroes.
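
(For anyone who wants to check that figure, the arithmetic fits in a
few lines of Python if you work in log10, since the period itself is
far too big for floats:

import math

# period = 5 * 2**1320480 * (2**64 - 1), per the author's comment
log10_period = (math.log10(5) + 1320480 * math.log10(2)
                + math.log10(2**64 - 1))
draws_per_year = 1e12 * 3600 * 24 * 365.25  # one number per picosecond
print(int(log10_period - math.log10(draws_per_year)))  # -> 397504

The exponent matches.)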


> (By looping I understand that to mean, before the same sequence starts
> again. Because as the numbers are 64 bits, individual numbers will
> inevitably be repeated from time to time.)

Yes, that's what the period means: how long it takes for the entire 
sequence to repeat, not just a single number.




-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-27 Thread bartc

On 27/02/2018 02:27, Chris Angelico wrote:

On Tue, Feb 27, 2018 at 12:57 PM, bartc  wrote:

On 27/02/2018 00:35, Chris Angelico wrote:



Anyway, even this pure Python version can deliver pseudo random numbers at
some 200,000 per second, while the built-in generator does 450,000 per
second, so it's not bad going.


The built-in generator is using a completely different algorithm
though, so rate of generation isn't the entire story. How long is the
period of the one you're using? (How long before it loops?)


I believe it's 5*2**1320480*(2**64-1) according to the author's comment.

I haven't tested that.

(By looping I understand that to mean, before the same sequence starts 
again. Because as the numbers are 64 bits, individual numbers will 
inevitably be repeated from time to time.)



--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-27 Thread Ben Bacarisse
Christian Gollwitzer  writes:

> George Marsaglia is a researcher who invented a couple of PRNGs which
> are both simple (one of the first was called KISS) yet surprisingly
> good.

s/is/was/

Sadly, he died a few years ago (2011).

-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-27 Thread Christian Gollwitzer

Am 27.02.18 um 03:27 schrieb Chris Angelico:

On Tue, Feb 27, 2018 at 12:57 PM, bartc  wrote:


Anyway, even this pure Python version can deliver pseudo random numbers at
some 200,000 per second, while the built-in generator does 450,000 per
second, so it's not bad going.


The built-in generator is using a completely different algorithm
though, so rate of generation isn't the entire story. How long is the
period of the one you're using? (How long before it loops?) If you
churn the generator to an arbitrary number and then take the next 100
values it generates, are they randomly distributed? 


George Marsaglia is a researcher who invented a couple of PRNGs which 
are both simple (one of the first was called KISS) yet surprisingly 
good. See here: https://en.wikipedia.org/wiki/KISS_(algorithm)


They possess all these desirable properties you mention above and pass 
the DIEHARD random number generator tests (also invented by Marsaglia).



Can you
reconstruct the RNG's internal state by watching it generate numbers
for a short while?


https://eprint.iacr.org/2011/007.pdf


Obviously no PRNG is going to be perfect at this, but there are
certainly degrees of quality to be compared.


It is a field of research and the generator shown by Bart is one of the 
better generators constructed by an expert in the field.
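
For reference, a KISS-style combination generator is only a few lines
of Python (a sketch: the seeds and multipliers below are the commonly
quoted 1999 values, and the three-shift register uses the canonical
xorshift32 triple (13, 17, 5); Marsaglia's postings vary in the exact
constants):

M32 = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def kiss32():
    z, w, jsr, jcong = 362436069, 521288629, 123456789, 380116160
    while True:
        z = (36969 * (z & 65535) + (z >> 16)) & M32  # multiply-with-carry
        w = (18000 * (w & 65535) + (w >> 16)) & M32  # multiply-with-carry
        mwc = ((z << 16) + w) & M32
        jcong = (69069 * jcong + 1234567) & M32      # linear congruential
        jsr ^= (jsr << 13) & M32                     # 3-shift register
        jsr ^= jsr >> 17
        jsr ^= (jsr << 5) & M32
        yield ((mwc ^ jcong) + jsr) & M32            # combine all three

g = kiss32()
print([hex(next(g)) for _ in range(4)])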


Christian
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Tue, Feb 27, 2018 at 12:57 PM, bartc  wrote:
> On 27/02/2018 00:35, Chris Angelico wrote:
>>
>> On Tue, Feb 27, 2018 at 11:17 AM, Steven D'Aprano
>>  wrote:
>>>
>>> On Tue, 27 Feb 2018 02:09:53 +1100, Chris Angelico wrote:
>>>
 You're still reimplementing the C code in Python, which is inefficient.
 Have you considered going back to the *actual algorithm* and
 implementing that idiomatically in Python? I think you'll find that (a)
 the code is more readable, and (b) the run time is much lower.
>>>
>>>
>>> Chris, I think this is incredibly optimistic, if not naive. We're talking
>>> about a PRNG by Marsaglia, so my guess is that the "original algorithm"
>>> *is* the C code. Or possibly Fortran.
>>>
>>> Even if not, even if there is actually a language-neutral algorithm, its
>>> a PRNG which means its going to be filled with bit-twiddling and number-
>>> crunching operations. Pure Python without a JIT is never going to be
>>> competitive with C or Fortran for code like that.
>>>
>>
>> I may have been a little unclear. It's highly unlikely that the run
>> time of the properly-implemented Python code will be lower than the
>> original C or Fortran. But it most certainly CAN be more efficient
>> than the Python reimplementation of the C implementation, which would
>> narrow the gap considerably. Implementing Python idiotically instead
>> of idiomatically gives you suboptimal performance.
>
>
> Nonsense. You shouldn't need to care about such things. An algorithm is an
> algorithm. And the better ones should be easily written in any language.
>
> This particular one, which was of interest because the calculations tended
> to overflow into undesirable bignums, is just a bunch of calculations.
> 'Bit-twiddling'.
>
> Anyway, even this pure Python version can deliver pseudo random numbers at
> some 200,000 per second, while the built-in generator does 450,000 per
> second, so it's not bad going.

The built-in generator is using a completely different algorithm
though, so rate of generation isn't the entire story. How long is the
period of the one you're using? (How long before it loops?) If you
churn the generator to an arbitrary number and then take the next 100
values it generates, are they randomly distributed? Can you
reconstruct the RNG's internal state by watching it generate numbers
for a short while?

Obviously no PRNG is going to be perfect at this, but there are
certainly degrees of quality to be compared.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 27/02/2018 00:35, Chris Angelico wrote:

On Tue, Feb 27, 2018 at 11:17 AM, Steven D'Aprano
 wrote:

On Tue, 27 Feb 2018 02:09:53 +1100, Chris Angelico wrote:


You're still reimplementing the C code in Python, which is inefficient.
Have you considered going back to the *actual algorithm* and
implementing that idiomatically in Python? I think you'll find that (a)
the code is more readable, and (b) the run time is much lower.


Chris, I think this is incredibly optimistic, if not naive. We're talking
about a PRNG by Marsaglia, so my guess is that the "original algorithm"
*is* the C code. Or possibly Fortran.

Even if not, even if there is actually a language-neutral algorithm, its
a PRNG which means its going to be filled with bit-twiddling and number-
crunching operations. Pure Python without a JIT is never going to be
competitive with C or Fortran for code like that.



I may have been a little unclear. It's highly unlikely that the run
time of the properly-implemented Python code will be lower than the
original C or Fortran. But it most certainly CAN be more efficient
than the Python reimplementation of the C implementation, which would
narrow the gap considerably. Implementing Python idiotically instead
of idiomatically gives you suboptimal performance.


Nonsense. You shouldn't need to care about such things. An algorithm is 
an algorithm. And the better ones should be easily written in any language.


This particular one, which was of interest because the calculations 
tended to overflow into undesirable bignums, is just a bunch of 
calculations. 'Bit-twiddling'.


Anyway, even this pure Python version can deliver pseudo random numbers 
at some 200,000 per second, while the built-in generator does 450,000 
per second, so it's not bad going.


Of course, the C version will generate them at over 100 million per second.

--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Tue, Feb 27, 2018 at 11:17 AM, Steven D'Aprano
 wrote:
> On Tue, 27 Feb 2018 02:09:53 +1100, Chris Angelico wrote:
>
>> You're still reimplementing the C code in Python, which is inefficient.
>> Have you considered going back to the *actual algorithm* and
>> implementing that idiomatically in Python? I think you'll find that (a)
>> the code is more readable, and (b) the run time is much lower.
>
> Chris, I think this is incredibly optimistic, if not naive. We're talking
> about a PRNG by Marsaglia, so my guess is that the "original algorithm"
> *is* the C code. Or possibly Fortran.
>
> Even if not, even if there is actually a language-neutral algorithm, its
> a PRNG which means its going to be filled with bit-twiddling and number-
> crunching operations. Pure Python without a JIT is never going to be
> competitive with C or Fortran for code like that.
>

I may have been a little unclear. It's highly unlikely that the run
time of the properly-implemented Python code will be lower than the
original C or Fortran. But it most certainly CAN be more efficient
than the Python reimplementation of the C implementation, which would
narrow the gap considerably. Implementing Python idiotically instead
of idiomatically gives you suboptimal performance.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Steven D'Aprano
On Tue, 27 Feb 2018 02:09:53 +1100, Chris Angelico wrote:

> You're still reimplementing the C code in Python, which is inefficient.
> Have you considered going back to the *actual algorithm* and
> implementing that idiomatically in Python? I think you'll find that (a)
> the code is more readable, and (b) the run time is much lower.

Chris, I think this is incredibly optimistic, if not naive. We're talking 
about a PRNG by Marsaglia, so my guess is that the "original algorithm" 
*is* the C code. Or possibly Fortran.

Even if not, even if there is actually a language-neutral algorithm, its 
a PRNG which means its going to be filled with bit-twiddling and number-
crunching operations. Pure Python without a JIT is never going to be 
competitive with C or Fortran for code like that.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 20:27, bartc wrote:

On 26/02/2018 19:50, Chris Angelico wrote:

On Tue, Feb 27, 2018 at 6:37 AM, Rick Johnson



So what? Latency is latency. And whether it occurs over the
course of one heavily recursive algorithm that constitutes
the depth and breadth of an entire program (a la fib()), or
it is the incremental cumulative consequence of the entire
program execution, the fact remains that function call
overhead contributes to a significant portion of the latency
inherent in some trivial, and *ALL* non-trivial, modern
software.


[Sorry, the bit of your (Chris') post I replied to got chopped by 
mistake. Here is my post again with the right context:]


CA:
> By saying "the fact remains", you're handwaving away the work of
> actually measuring that function call overhead is "significant". Can
> you show me those numbers? Steve's point is that it is NOT
> significant, because non-toy functions have non-trivial bodies. If you
> wish to disagree, you have to demonstrate that the function call is
> *itself* costly, even when there's a real body to it.
>

Take the last bit of Python I posted, which was that RNG test.

It uses this function:

  def i64(x): return x & 0xFFFFFFFFFFFFFFFF

This is a small function, but it can't be dismissed as a toy one, as 
was suggested by Ned Batchelder. And the test program wasn't at all 
recursive.



Running the program with N=10_000_000 took 46 seconds.

Replacing the i64() call with an inline 'x & m' (where m is a global 
set to 0xFFFFFFFFFFFFFFFF), it took 38 seconds.


So putting that fix into a function, which is convenient for coding, 
maintenance and readability, cost 20% in runtime.


Going the other way, having i64() call another function i64a() which 
does the work, common when using wrapper functions, it took 60 seconds. 
A 30% slowdown, even though most of the work is doing numeric shifts and 
adds.


So function calls do seem to be expensive operations in CPython, and any 
benchmarks which highlight that fact should be welcomed.


(Note that none of this seems to affect PyPy, which ran at the same 
speed with i64(), without it, or with both i64() and i64a().)
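
A rough way to reproduce that kind of measurement with the standard
library (a sketch; absolute times vary by machine and Python version):

import timeit

m = 0xFFFFFFFFFFFFFFFF

def i64(x): return x & m

inline = timeit.timeit('x & m', setup='x = 123456789',
                       number=10_000_000, globals=globals())
called = timeit.timeit('i64(x)', setup='x = 123456789',
                       number=10_000_000, globals=globals())
print('inline mask: %.2fs   via i64(): %.2fs' % (inline, called))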


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 19:50, Chris Angelico wrote:

On Tue, Feb 27, 2018 at 6:37 AM, Rick Johnson



So what? Latency is latency. And whether it occurs over the
course of one heavily recursive algorithm that constitutes
the depth and breadth of an entire program (a la fib()), or
it is the incremental cumulative consequence of the entire
program execution, the fact remains that function call
overhead contributes to a significant portion of the latency
inherent in some trivial, and *ALL* non-trivial, modern
software.


Take the last bit of Python I posted, which was that RNG test.

It uses this function:

  def i64(x): return x & 0xFFFFFFFFFFFFFFFF

This is a small function, but it can't be dismissed as a toy one, as 
was suggested by Ned Batchelder. And the test program wasn't at all 
recursive.



Running the program with N=10_000_000 took 46 seconds.

Replacing the i64() call with an inline 'x & m' (where m is a global 
set to 0xFFFFFFFFFFFFFFFF), it took 38 seconds.


So putting that fix into a function, which is convenient for coding, 
maintenance and readability, cost 20% in runtime.


Going the other way, having i64() call another function i64a() which 
does the work, common when using wrapper functions, it took 60 seconds. 
A 30% slowdown, even though most of the work is doing numeric shifts and 
adds.


So function calls do seem to be expensive operations in CPython, and any 
benchmarks which highlight that fact should be welcomed.


(Note that none of this seems to affect PyPy, which ran at the same 
speed with i64(), without it, or with both i64() and i64a().)


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Tue, Feb 27, 2018 at 6:37 AM, Rick Johnson
 wrote:
> On Monday, February 26, 2018 at 4:39:22 AM UTC-6, Steven D'Aprano wrote:
>> On Sun, 25 Feb 2018 19:26:12 -0800, Rick Johnson wrote:
>>
>> > On Friday, February 23, 2018 at 8:48:55 PM UTC-6, Steven D'Aprano wrote:
>> > [...]
>> > > Take the Fibonacci double-recursion benchmark. Okay, it
>> > > tests how well your language does at making millions of
>> > > function calls. Why?
>> >
>> > Because making "millons of function calls" is what
>> > software *DOES*!
>>
>> You're right -- I worded that badly, and ended up saying
>> something I didn't really mean. Mea culpa.  Of course over
>> the course of the entire program run, most non-trivial
>> programs will end up having millions (if not billions)
>> of function calls *in total*. But they're not all happening
>> *together*.
>
> So what? Latency is latency. And whether it occurs over the
> course of one heavily recursive algorithm that constitutes
> the depth and breadth of an entire program (a la fib()), or
> it is the incremental cumulative consequence of the entire
> program execution, the fact remains that function call
> overhead contributes to a significant portion of the latency
> inherent in some trivial, and *ALL* non-trivial, modern
> software.

By saying "the fact remains", you're handwaving away the work of
actually measuring that function call overhead is "significant". Can
you show me those numbers? Steve's point is that it is NOT
significant, because non-toy functions have non-trivial bodies. If you
wish to disagree, you have to demonstrate that the function call is
*itself* costly, even when there's a real body to it.

> I'm very glad you brought this up, because it proves my main
> point that, Python's maximum recursion error is one feature
> they got _right_. If you don't like the default recursion
> limit of 1000, or maybe 1000 recursions is breaking a valid
> recursive algorithm, you can set the recursion limit to
> whatever you like...
>
> ## Python 2 ##
> >>> import sys
> >>> sys.setrecursionlimit
> <built-in function setrecursionlimit>
> >>> sys.setrecursionlimit(10)
> >>> count = 1
> >>> def bad():
> ... global count
> ... print count
> ... count += 1
> ... bad()
> ...
> >>> bad()
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
>   File "", line 5, in bad
> RuntimeError: maximum recursion depth exceeded
>
> It is however unfortunate that we get slamed with N
> recursive exception messages :-(, but hey, at least we can
> modify the maximum recursion limit! It's a step in the right
> direction. :-)

Ah, been a while since you tried Python 3, is it? :)

  [Previous line repeated X more times]
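
For reference, the same experiment on Python 3.6 or later looks like
this (a sketch; the file name and repeat count are illustrative):

import sys
sys.setrecursionlimit(10)

def bad():
    bad()

bad()
# Traceback (most recent call last):
#   ...
#   File "demo.py", line 5, in bad
#     bad()
#   [Previous line repeated N more times]
# RecursionError: maximum recursion depth exceeded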

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Rick Johnson
On Monday, February 26, 2018 at 4:39:22 AM UTC-6, Steven D'Aprano wrote:
> On Sun, 25 Feb 2018 19:26:12 -0800, Rick Johnson wrote:
>
> > On Friday, February 23, 2018 at 8:48:55 PM UTC-6, Steven D'Aprano wrote:
> > [...]
> > > Take the Fibonacci double-recursion benchmark. Okay, it
> > > tests how well your language does at making millions of
> > > function calls. Why?
> >
> > Because making "millons of function calls" is what
> > software *DOES*!
>
> You're right -- I worded that badly, and ended up saying
> something I didn't really mean. Mea culpa.  Of course over
> the course of the entire program run, most non-trivial
> programs will end up having millions (if not billions)
> of function calls *in total*. But they're not all happening
> *together*.

So what? Latency is latency. And whether it occurs over the
course of one heavily recursive algorithm that constitutes
the depth and breadth of an entire program (a la fib()), or
it is the incremental cumulative consequence of the entire
program execution, the fact remains that function call
overhead contributes to a significant portion of the latency
inherent in some trivial, and *ALL* non-trivial, modern
software.

>
> What I tried to get across is, how often do we use millions
> of function calls to get one value?

All the time! For, in the abstract, the fundamental function
of software is to extract a "value" from one or more inputs
-- is it not? So, whether that "value" be: (1) a document in
an office productivity suite, or (2) a flight path for a
mars lander or a jumbo jet or child's quad-copter, or (3) a
modified image in a image processing program, or (4) a
design drawing or technical schematic for a mars rover bot,
or (5) an HD quality streaming porno flick, or yes, (6) even
a recursive function which unwittingly underscores some of
the less attractive aspects of the Python language (aka:
function call overhead!) -- all of these examples, and
countless others, extract a value from inputs. Of course,
much like physical traits i suppose, some "values" are more
desirable than others.

>
> By default, Python will only allow one thousand function
> calls in a stack before raising RecursionError.

By _default_, yes.

>
> py> def bad():
> ... return bad()
> ...
> py> bad()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "", line 2, in bad
>   [ previous line repeats 998 times ]
> RecursionError: maximum recursion depth exceeded
>
>
> There's nothing unique about recursion that will trigger
> this error. Any call stack of more than 1000 function calls
> will do it. So forget about code getting a million
> functions deep. The average is probably more like a dozen,
> perhaps a hundred tops.

I'm very glad you brought this up, because it proves my main
point that, Python's maximum recursion error is one feature
they got _right_. If you don't like the default recursion
limit of 1000, or maybe 1000 recursions is breaking a valid
recursive algorithm, you can set the recursion limit to
whatever you like...

## Python 2 ##
>>> import sys
>>> sys.setrecursionlimit
<built-in function setrecursionlimit>
>>> sys.setrecursionlimit(10)
>>> count = 1
>>> def bad():
... global count
... print count
... count += 1
... bad()
...
>>> bad()
1
2
3
4
5
6
7
8
9
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
  File "", line 5, in bad
RuntimeError: maximum recursion depth exceeded

It is however unfortunate that we get slamed with N
recursive exception messages :-(, but hey, at least we can
modify the maximum recursion limit! It's a step in the right
direction. :-)

> [...]
> > > For most application code, executing the function is far
> > > more costly than the overhead of calling it,
> >
> > What? Your functions are only called _once_ during the
> > lifetime of your code execution? Enough with the leg
> > pulling.
>
> Let me explain what I intended to say.  Here is a really
> poor estimate of the overhead of calling a function on my
> computer, measured in iPython.
>
> [...snip valid examples...]
>
> Now I know that this is not always the case. But I maintain
> that for *most* (but not all) practical, real-world code
> that actually does something useful, the overhead of
> calling the functions is *usually* (but not always) low
> compared to the cost of actually doing whatever it is that
> the functions do.

True. But again you failed to address my main point that:
"Lazy features (when feasible) must be offered
_optionally_".

Yes, function call overhead may be tiny in relation to the
code encapsulated within (at least for non-trivial code),
but the fact remains that function calls are inherent in
modern software, and as 

Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 17:05, Ben Bacarisse wrote:

bartc  writes:


A C version is given below. (One I may have messed around with, which
I'm not sure works properly. For an original, google for Marsaglia and
KISS64 or SUPRKISS64.)


The version I know uses unsigned integers.  Did you change them to signed?


Yeah, I don't know what I was doing with that version.


For a Python version, go back to the original C and work from there.


The original C makes confusing use of macros and comma operators.

A version without macros or comma expressions is here (tweaked from 
generated C):


  https://pastebin.com/raw/k4jFK5TN

This runs the 1-billion loop in 6 seconds.

--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Rick Johnson
On Monday, February 26, 2018 at 3:59:40 AM UTC-6, Steven D'Aprano wrote:
> On Sun, 25 Feb 2018 20:22:17 -0800, Rick Johnson wrote:

> (We tried painting Go Faster stripes on the server, and it
> didn't work.)

Well of course the server won't work after you drip water-
based paint all over the circuit boards. Way to go! In the
future i would suggest draping some old linens over the
electrical bits before throwing your paint around (just be
sure to consult with the wife before rummaging through her
linen closet, as men have been known to meet an untimely
demise for doing much less) . And might i suggest some
skulls and faux flames? Hey, it may not make the thing go
any faster, but it will surely make the other kids jealous.
There's no reason why apple and microsoft should be allowed
to suck up every last merchandising dollar.

> There's no point suggesting such major changes to Python
> that requires going back to the drawing board, to Python
> 0.1 or earlier, and changing the entire execution and
> memory model of the language.

Absolutely not. Probably too late for that now.  I'm only
suggesting that we, when at all possible, provide a switch
to disable these lazy-features in those rare cases when they
do create bottlenecks. Lazy iterators and dict-views are
great ideas, but i'm sure there are instances which prove 
them to be a PITA and even a bottleneck.
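
Case by case, that switch already exists: wrap the lazy object in
list() to pay the cost eagerly, once. A small illustration:

d = {'a': 1, 'b': 2}
live_view = d.keys()       # lazy dict view: reflects later changes
snapshot = list(d.keys())  # eager copy: immune to them
d['c'] = 3
print(sorted(live_view))   # ['a', 'b', 'c']
print(snapshot)            # ['a', 'b']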


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Ben Bacarisse
bartc  writes:

> A C version is given below. (One I may have messed around with, which
> I'm not sure works properly. For an original, google for Marsaglia and
> KISS64 or SUPRKISS64.)

The version I know uses unsigned integers.  Did you change them to signed?

For a Python version, go back to the original C and work from there.

-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 15:09, Chris Angelico wrote:

On Tue, Feb 27, 2018 at 2:02 AM, bartc  wrote:

On 26/02/2018 14:04, bartc wrote:


On 26/02/2018 13:42, Ned Batchelder wrote:




Well, once you notice that the Python code had N=1e5, and the C code 
had N=1e9 :)  If you want to experiment, with N=1e5, the final number 
should be 5255210926702073855.



OK, I'll try that.



I have that Python version working now. It's necessary to apply that masking
function to wherever numbers can get bigger.

I don't know how long a 1-billion loop will take, but a 10-million loop took
46 seconds on Python 3.6, and 21 seconds on PyPy 2.7 from a couple of years
ago. (And on Windows, which has a somewhat slower CPython than Linux.)


You're still reimplementing the C code in Python, which is
inefficient. Have you considered going back to the *actual algorithm*
and implementing that idiomatically in Python? I think you'll find
that (a) the code is more readable, and (b) the run time is much
lower.


Do you really think that?

The algorithm seems to require this sequence of calculations to be gone 
through. Otherwise anything more efficient can also be done in any language.


So how would idiomatic Python help, by seeing if such a routine already 
exists in some library? That wouldn't be playing the game.


If it helps, I remember playing with a version in Algol 68 Genie 
(interpreted). It's still buggy, as the output is different, but it 
does about the same amount of work.


A 10-million loop would take an estimated 1000 seconds, on the same 
machine that CPython took 46 seconds. So four magnitudes slower than C.


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Ned Batchelder

On 2/26/18 10:09 AM, Chris Angelico wrote:

On Tue, Feb 27, 2018 at 2:02 AM, bartc  wrote:

On 26/02/2018 14:04, bartc wrote:

On 26/02/2018 13:42, Ned Batchelder wrote:



Well, once you notice that the Python code had N=1e5, and the C code 
had N=1e9 :)  If you want to experiment, with N=1e5, the final number 
should be 5255210926702073855.


OK, I'll try that.


I have that Python version working now. It's necessary to apply that masking
function to wherever numbers can get bigger.

I don't know how long a 1-billion loop will take, but a 10-million loop took
46 seconds on Python 3.6, and 21 seconds on PyPy 2.7 from a couple of years
ago. (And on Windows, which has a somewhat slower CPython than Linux.)

You're still reimplementing the C code in Python, which is
inefficient. Have you considered going back to the *actual algorithm*
and implementing that idiomatically in Python? I think you'll find
that (a) the code is more readable, and (b) the run time is much
lower.




Don't bother: C will win this kind of race every time.

--Ned.
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Tue, Feb 27, 2018 at 2:02 AM, bartc  wrote:
> On 26/02/2018 14:04, bartc wrote:
>>
>> On 26/02/2018 13:42, Ned Batchelder wrote:
>
>
>>> Well, once you notice that the
>>> Python code had N=1e5, and the C code had N=1e9 :)   If you want to
>>> experiment, with N=1e5, the final number should be 5255210926702073855.
>>
>>
>> OK, I'll try that.
>
>
> I have that Python version working now. It's necessary to apply that masking
> function to wherever numbers can get bigger.
>
> I don't know how long a 1-billion loop will take, but a 10-million loop took
> 46 seconds on Python 3.6, and 21 seconds on PyPy 2.7 from a couple of years
> ago. (And on Windows, which has a somewhat slower CPython than Linux.)

You're still reimplementing the C code in Python, which is
inefficient. Have you considered going back to the *actual algorithm*
and implementing that idiomatically in Python? I think you'll find
that (a) the code is more readable, and (b) the run time is much
lower.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 14:04, bartc wrote:

On 26/02/2018 13:42, Ned Batchelder wrote:



  Well, once you notice that the
Python code had N=1e5, and the C code had N=1e9 :)   If you want to 
experiment, with N=1e5, the final number should be 5255210926702073855.


OK, I'll try that.


I have that Python version working now. It's necessary to apply that 
masking function to wherever numbers can get bigger.


I don't know how long a 1-billion loop will take, but a 10-million loop 
took 46 seconds on Python 3.6, and 21 seconds on PyPy 2.7 from a couple 
of years ago. (And on Windows, which has a somewhat slower CPython than 
Linux.)


Result should be x=11240129907685265998.

By comparison, the C version compiled with -O3 took 0.11 seconds.

(The C version I posted will work, if adjusted to a 10-million loop, but 
you have to change 'signed' to 'unsigned'. Apparently they weren't 
interchangeable after all. I've no idea why I used 'signed' there.


That version is rather cryptic, but it can be better written and without 
the macros, and it will run just as fast. (Marsaglia may have been hot 
with random number routines, but his C could have done with some work...)


My interpreter, using 64-bit numbers, managed 4.8 seconds. But unsigned 
arithmetic, which is uncommon, is not accelerated.)


---

Q=0
carry=36243678541
xcng=12367890123456
xs=521288629546311
indx=20632

def i64(x): return x & 0xFFFFFFFFFFFFFFFF

def refill():
    global Q, carry, indx
    for i in range(20632):
        h = carry & 1
        z = i64((i64((Q[i]<<41))>>1)+(i64((Q[i]<<39))>>1)+(carry>>1))
        carry = i64((Q[i]>>23)+(Q[i]>>25)+(z>>63))
        Q[i] = i64(~(i64(i64(z<<1)+h)))

    indx=1
    return Q[0]

def start():
    global Q, carry, xcng, xs, indx
    Q=[0,]*20632

    for i in range(20632):

        xcng=i64(6906969069 * xcng + 123)

        xs = i64(xs ^ (xs<<13))
        xs = i64(xs ^ (xs>>17))
        xs = i64(xs ^ (xs<<43))

        Q[i] = i64(xcng + xs)

    N = 1000000000
    for i in range(N):
        if indx<20632:
            s = Q[indx]
            indx+=1
        else:
            s = refill()
        xcng=i64(6906969069 * xcng + 123)
        xs = i64(xs ^ (xs<<13))
        xs = i64(xs ^ (xs>>17))
        xs = i64(xs ^ (xs<<43))
        x = i64(s+xcng+xs)
    print ("Does x= 4013566000157423768")
    print (" x=",x)

start()

--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 14:34, Chris Angelico wrote:


I'm glad _someone_ funded PyPy, anyhow. It's a great demonstration of
what can be done.


And it's good that /someone/ at least understands how PyPy works, as I 
don't think many people do.


Apparently it doesn't optimise 'hot paths' within a Python program, but 
optimises hot paths in the special Python interpreter. One written in 
[R]Python. Or something...


--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Tue, Feb 27, 2018 at 12:03 AM, Steven D'Aprano
 wrote:
> On Mon, 26 Feb 2018 22:34:00 +1100, Chris Angelico wrote:
>> Removing the GIL from CPython is not about "speeding up" the language or
>> the interpreter, but about improving parallelism.
>
> It is about speeding up threaded code which is CPU-bound. I call that
> speeding up the language :-)
>
> Hypothetically, it could also speed up even unthreaded code.

In its purest sense, no, it cannot speed up unthreaded code. Since
many Python programs are single-threaded, this restricts the benefits
to those programs which can actually parallelize, but the penalties
(mainly overhead) usually apply to all programs.

> Some language features could internally launch threads and operate using
> implicit parallelism. For instance, map(func, seq) could run in parallel
> in separate threads whenever Python knew that func had no side-effects,
> say for built-ins. There are also parallel algorithms for bignum
> arithmetic. If threads were quick enough, I'm sure we could come up with
> many more examples. Hypothetically speaking.

These are ways of converting single-threaded code into multi-threaded
code, which could then benefit from parallelization, which can reduce
wall-time for the process as a whole.

> (In practice, the old hope that parallel computing would speed everything
> up turned out to be flawed. Relatively few tasks are embarrassingly
> parallel, most have sequential bottlenecks, and many are inherently
> sequential.)

Exactly; and it goes further: many modern programmers do not think in
terms of parallelism. Even event-driven code takes a bit of getting
your head around. Why are async functions preferred over threads? It's
not JUST because today's OSes have thread count limits far below
socket limits - if that were the sole justification, we wouldn't have
language-level support for async/await - it'd be a niche technique
used only in the highest-throughput applications. No, async functions
are preferred because they *reduce the points where context switching
can occur*, thus making it easier *for the programmer*. If there were
a way for the language to automatically run things in parallel, the
biggest risk is that the programmer who wrote it can no longer
understand it.
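
A minimal illustration of that point (a sketch; asyncio.run needs
Python 3.7+):

import asyncio

async def worker(name):
    for i in range(3):
        print(name, i)          # no switch can happen between these lines...
        await asyncio.sleep(0)  # ...only here may the loop swap tasks

async def main():
    await asyncio.gather(worker('A'), worker('B'))

asyncio.run(main())  # A and B interleave, but only at the awaits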

>>> (If it weren't for the EU government funding it, PyPy would probably be
>>> languishing in oblivion. Everyone wants a faster Python so long as they
>>> don't have to pay for it, or do the work.)
>>
>> Hm, I didn't know the EU was funding PyPy. Okay. Cool.
>
> I don't know if they still receive funding, but I know that PyPy really
> only got going in a big way when they got a grant from the EU. I think it
> paid for at least one developer to work on it full time for a year.
> DuckDuckGo is probably your friend if you care about the details.

Meh, I don't care that much about what the EU does or doesn't fund.
Was just a mild case of curiosity. I had about as much interest as my
savings account accrued last year.

I'm glad _someone_ funded PyPy, anyhow. It's a great demonstration of
what can be done.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 13:42, Ned Batchelder wrote:

On 2/26/18 7:13 AM, bartc wrote:


A C version is given below. (One I may have messed around with, which 
I'm not sure works properly. For an original, google for Marsaglia and 
KISS64 or SUPRKISS64.)


Most integers are unsigned, which have well-defined overflow in C 


With proper 64-bit masking (def only64(x): return x & 
0xFFFFFFFFFFFFFFFF), the Python version produces the correct answer 
using a reasonable amount of memory.


I did try something like that, but I must have missed something because I 
didn't get quite the same results as a working version.


And with interpreted code, you tend not to test using loops of a billion 
iterations.


 Well, once you notice that the
Python code had N=1e5, and the C code had N=1e9 :)   If you want to 
experiment, with N=1e5, the final number should be 5255210926702073855.


OK, I'll try that.

Also, I note that you said, "Most integers are unsigned", but the C code 
has them all declared as signed?  It doesn't seem to have mattered to 
your result, but I'm not an expert on C portability guarantees.


The C code I first pasted used 'unsigned', but the main program logic 
wasn't right, and I found another version that looked better. That one 
used 'signed' for some reason, which I completely missed.


Even if with C it works with either, the signed version might have 
'undefined behaviour'. As said, google for the original; the ones I can 
see have 'unsigned'. But I can also see a Fortran version that just uses 
'integer*8', which I believe is signed 64-bit.


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Rhodri James

On 26/02/18 13:42, Ned Batchelder wrote:
Also, I note that you said, "Most integers are unsigned", but the C code 
has them all declared as signed?  It doesn't seem to have mattered to 
your result, but I'm not an expert on C portability guarantees.


C explicitly leaves the behaviour of signed arithmetic overflow 
undefined, so you have no portability guarantees there.


--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Ned Batchelder

On 2/26/18 7:13 AM, bartc wrote:

On 26/02/2018 11:40, Chris Angelico wrote:

On Mon, Feb 26, 2018 at 10:13 PM, bartc  wrote:
Below is the first draft of a Python port of a program to do with 
random numbers. (Ported from my language, which in turn ported it from 
a C program by George Marsaglia, the random number guy.)

However, running it quickly exhausts the memory in my machine. The 
reason is that Python unhelpfully widens all results to bignums as 
needed. The code relies on calculations being modulo 2^64.

Note that restricting integer ops to 64 bits probably still won't 
work, as I believe the numbers need to be unsigned.


No, it's because the original implementation assumed integer
wrap-around (at least, I think that's what's happening; I haven't
analyzed the code in great detail). That means all your integer
operations are doing two jobs: the one they claim to, and then a
masking to 64 bits signed. That's two abstract operations that happen,
due to the nature of the CPU, to work efficiently together. If you
don't implement both halves of that in your Python port, you have
failed at porting. What if you were porting a program from a 72-bit
chip that assumed Binary Coded Decimal? Would you complain that C's
integers are badly behaved?

And that's without even asking whether a C program that assumes
integer wrap-around counts as portable. At least with Python, you have
a guarantee that integer operations are going to behave the same way
on all compliant implementations of the language.



A C version is given below. (One I may have messed around with, which 
I'm not sure works properly. For an original, google for Marsaglia and 
KISS64 or SUPRKISS64.)


Most integers are unsigned, which have well-defined overflow in C 
(they just wrap as expected). In C, a mixed signed/unsigned op is 
performed as unsigned.


-

/*   SUPRKISS64.c, period 5*2^1320480*(2^64-1)   */
#include <stdio.h>
#include <stdlib.h>
#include "timer.c"
 static signed long long Q[20632],
 carry=36243678541LL,
 xcng=12367890123456LL,
 xs=521288629546311LL,
 indx=20632;

 #define CNG ( xcng=6906969069LL*xcng+123 )
 #define XS  ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
 #define SUPR ( indx<20632 ? Q[indx++] : refill() )
 #define KISS SUPR+CNG+XS

 signed long long refill(void) {
  int i; signed long long z,h;
  for(i=0;i<20632;i++) {
    h = (carry&1);
    z = ((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
    carry = (Q[i]>>23)+(Q[i]>>25)+(z>>63);
    Q[i] = ~((z<<1)+h);
  }
  indx=1;
  return (Q[0]);
 }

 int main() {
  int i; signed long long x;

  for(i=0;i<20632;i++) Q[i]=CNG+XS;

  for(i=0;i<1000000000;i++) x=KISS;

  printf("Does x=4013566000157423768\n x=%llu.\n",x);
}



With proper 64-bit masking (def only64(x): return x & 
0xFFFFFFFFFFFFFFFF), the Python version produces the correct answer 
using a reasonable amount of memory. Well, once you notice that the 
Python code had N=1e5, and the C code had N=1e9 :)   If you want to 
experiment, with N=1e5, the final number should be 5255210926702073855.
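
To see concretely what the mask buys you (a tiny illustration, not
from the original posts):

def only64(x):
    return x & 0xFFFFFFFFFFFFFFFF  # low 64 bits, like a C uint64_t

big = 2**64 + 5
print(big)          # 18446744073709551621 -- Python grows a bignum
print(only64(big))  # 5 -- what a wrapping 64-bit unsigned value holds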


Also, I note that you said, "Most integers are unsigned", but the C code 
has them all declared as signed?  It doesn't seem to have mattered to 
your result, but I'm not an expert on C portability guarantees.


--Ned.
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Steven D'Aprano
On Mon, 26 Feb 2018 22:34:00 +1100, Chris Angelico wrote:

[...]
>> (We tried painting Go Faster stripes on the server, and it didn't
>> work.)
> 
> Steven Middlename D'Aprano! You should know better than that. "It didn't
> work" is not good enough. What actually happened?

A truck crashed into the power pole outside our building and the power 
went off. Then the UPS ran down and the server crashed.

Clearly this is a bug in the go-faster stripes. We tried reporting it:

Expected behaviour: server goes faster.

Actual behaviour: a truck crashes into the power pole 
across the street and the lights go out.

but the maintainers closed the issue "Works for me". Very disappointed!


[...]
> Removing the GIL from CPython is not about "speeding up" the language or
> the interpreter, but about improving parallelism.

It is about speeding up threaded code which is CPU-bound. I call that 
speeding up the language :-)

Hypothetically, it could also speed up even unthreaded code.

Some language features could internally launch threads and operate using 
implicit parallelism. For instance, map(func, seq) could run in parallel 
in separate threads whenever Python knew that func had no side-effects, 
say for built-ins. There are also parallel algorithms for bignum 
arithmetic. If threads were quick enough, I'm sure we could come up with 
many more examples. Hypothetically speaking.
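
(You can already opt in to that sort of thing explicitly today; a 
minimal sketch using only the standard library, with processes rather 
than threads since the GIL still rules out thread-level CPU 
parallelism. The function and inputs here are placeholders of mine:

from concurrent.futures import ProcessPoolExecutor

def square(x):
    # stand-in for an expensive, side-effect-free function
    return x * x

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(square, range(10))))

The hypothetical above is about the interpreter doing this implicitly, 
without the programmer having to ask for it.)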

(In practice, the old hope that parallel computing would speed everything 
up turned out to be flawed. Relatively few tasks are embarrassingly 
parallel, most have sequential bottlenecks, and many are inherently 
sequential.)


>> (If it weren't for the EU government funding it, PyPy would probably be
>> languishing in oblivion. Everyone wants a faster Python so long as they
>> don't have to pay for it, or do the work.)
> 
> Hm, I didn't know the EU was funding PyPy. Okay. Cool.

I don't know if they still receive funding, but I know that PyPy really 
only got going in a big way when they got a grant from the EU. I think it 
paid for at least one developer to work on it full time for a year. 
DuckDuckGo is probably your friend if you care about the details.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 12:06, Antoon Pardon wrote:

On 23-02-18 02:27, Steven D'Aprano wrote:

Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.


Not necessarily.

David Beazley in his talks sometimes uses an inefficient algorithm for 
calculating Fibonacci numbers because he needs something that uses the 
CPU intensively; calculating the Fibonacci numbers in that context as 
efficiently as possible would defeat that purpose.

So in the context of a benchmark it is not unreasonable to assume those 
50 million calls are the purpose, and not calculating the Fibonacci 
numbers as efficiently as possible.


I don't think Steven is ever going to concede this point, because 
Python performs badly on it compared to Julia or C, and it's not 
possible to conveniently offload the task to some fast library when 
the benchmark only uses a handful of primitive byte-codes.


(I have the same trouble with my own interpreted language. Although 
somewhat brisker than CPython, it will always be much slower than a 
C-like language on such micro-benchmarks.


But I accept that; I don't have an army of people working on 
acceleration projects and tracing JIT compilers. To those people 
however, such a benchmark can be a useful yardstick of progress.)


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Antoon Pardon
On 24-02-18 06:18, Terry Reedy wrote:
> On 2/23/2018 11:32 AM, Python wrote:
>>
>> Doing that would completely fail to accomplish the task of comparing
>> the performance of recursive function calls in the two languages,
>> which is what the benchmark was designed to do.
>
> That is a non-goal because *languages* do not have clock-time speeds,
> only programs written in concrete implementations run on real hardware.
>
> Why do you think it fair to compare function call times using the
> slowest rather than the fastest implementation of Python function
> calls?  Real Python programmers who are seriously concerned about time
> try to use the fastest implementation appropriate to the situation.

If the slowest happens to be the standard implementation, I see nothing wrong 
with it.

>
> >  So, no actually, it shouldn't.
>
> To me, it is 'not using the fastest Python' that fails to make apples
> to apples comparisons.
>
> It has been said here that comparisons should use the same algorithm
> doing much the same work in both implementations.  However, a
> Python function call does *much more work* than in most languages,
> because the meaning of 'function call' is much more flexible in Python
> than most languages.

So? I don't see what is wrong with pointing out that all that extra
work, which a lot of people don't care about, is costing performance.

> The usual comparison is like lemons (other language calls) to oranges
> (Python language calls, much more flexible).  To be fair, the
> comparison should be to a Python implementation that either notices or
> accepts a declaration that, for instance, fib(n) only needs to pass an
> int by position.
>
> Comparing int addition to python object addition also compares unlike
> operations.  To compare using the same addition algorithm, one must
> use an implementation that can restrict '+' to just int addition.

I don't agree. If the performance loss comes from the standard implementation
of the language being unable to do that, then people testing languages
shouldn't be burdened with searching for some implementation that can.

-- 
Antoon Pardon


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 11:40, Chris Angelico wrote:

On Mon, Feb 26, 2018 at 10:13 PM, bartc  wrote:

Below is the first draft of a Python port of a program to do with random
numbers. (Ported from my language, which in turn ported it from a C
program by George Marsaglia, the random number guy.)

However, running it quickly exhausts the memory in my machine. The reason is
that Python unhelpfully widens all results to bignums as needed. The code
relies on calculations being modulo 2^64.

Note that restricting integer ops to 64 bits probably still won't work, as I
believe the numbers need to be unsigned.


No, it's because the original implementation assumed integer
wrap-around (at least, I think that's what's happening; I haven't
analyzed the code in great detail). That means all your integer
operations are doing two jobs: the one they claim to, and then a
masking to 64 bits signed. That's two abstract operations that happen,
due to the nature of the CPU, to work efficiently together. If you
don't implement both halves of that in your Python port, you have
failed at porting. What if you were porting a program from a 72-bit
chip that assumed Binary Coded Decimal? Would you complain that C's
integers are badly behaved?

And that's without even asking whether a C program that assumes
integer wrap-around counts as portable. At least with Python, you have
a guarantee that integer operations are going to behave the same way
on all compliant implementations of the language.



A C version is given below. (One I may have messed around with, which 
I'm not sure works properly. For an original, google for Marsaglia and 
KISS64 or SUPRKISS64.)


Most of the integers are unsigned, and unsigned overflow is well-defined 
in C (the values just wrap as expected). In C, a mixed signed/unsigned op 
is performed as unsigned.


-

/*   SUPRKISS64.c, period 5*2^1320480*(2^64-1)   */
#include <stdio.h>
#include <stdlib.h>
#include "timer.c"
 static signed long long Q[20632],
 carry=36243678541LL,
 xcng=12367890123456LL,
 xs=521288629546311LL,
 indx=20632;

 #define CNG ( xcng=6906969069LL*xcng+123 )
 #define XS  ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
 #define SUPR ( indx<20632 ? Q[indx++] : refill() )
 #define KISS SUPR+CNG+XS

 signed long long refill(void) {
  int i; signed long long z,h;
  for(i=0;i<20632;i++) {
    h = (carry&1);
    z = ((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
    carry = (Q[i]>>23)+(Q[i]>>25)+(z>>63);
    Q[i] = ~((z<<1)+h);
  }
  indx=1;
  return (Q[0]);
 }

 int main() {
  int i; signed long long x;

  for(i=0;i<20632;i++) Q[i]=CNG+XS;

  for(i=0;i<1000000000;i++) x=KISS;

  printf("Does x=4013566000157423768\n x=%llu.\n",x);
 }

--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Antoon Pardon
On 23-02-18 02:27, Steven D'Aprano wrote:
> Why do you care about the 50 million calls? That's crazy -- the important 
> thing is *calculating the Fibonacci numbers as efficiently as possible*.

Not necessarily.

David Beazley in his talks sometimes uses an inefficient algorithm 
for calculating Fibonacci numbers because he needs something that 
uses the CPU intensively; calculating the Fibonacci numbers in that 
context as efficiently as possible would defeat that purpose.

So in the context of a benchmark it is not unreasonable to assume 
those 50 million calls are the purpose, and not calculating the 
Fibonacci numbers as efficiently as possible.
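
For reference, the doubly recursive definition in question is 
essentially the following (the usual benchmark version; a sketch, 
not Beazley's exact code):

def fib(n):
    # Deliberately naive: the number of calls grows exponentially
    # with n, which is exactly the point when what you want is a
    # CPU-intensive workload.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)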

-- 
Antoon. 
 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Mon, Feb 26, 2018 at 10:13 PM, bartc  wrote:
> Below is the first draft of a Python port of a program to do with random
> numbers. (Ported from my language, which in turn ported it from a C
> program by George Marsaglia, the random number guy.)
>
> However, running it quickly exhausts the memory in my machine. The reason is
> that Python unhelpfully widens all results to bignums as needed. The code
> relies on calculations being modulo 2^64.
>
> Note that restricting integer ops to 64 bits probably still won't work, as I
> believe the numbers need to be unsigned.

No, it's because the original implementation assumed integer
wrap-around (at least, I think that's what's happening; I haven't
analyzed the code in great detail). That means all your integer
operations are doing two jobs: the one they claim to, and then a
masking to 64 bits signed. That's two abstract operations that happen,
due to the nature of the CPU, to work efficiently together. If you
don't implement both halves of that in your Python port, you have
failed at porting. What if you were porting a program from a 72-bit
chip that assumed Binary Coded Decimal? Would you complain that C's
integers are badly behaved?

And that's without even asking whether a C program that assumes
integer wrap-around counts as portable. At least with Python, you have
a guarantee that integer operations are going to behave the same way
on all compliant implementations of the language.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Chris Angelico
On Mon, Feb 26, 2018 at 8:57 PM, Steven D'Aprano
 wrote:
> On Sun, 25 Feb 2018 20:22:17 -0800, Rick Johnson wrote:
>
>> So of course, speed is not and should not be the
>> primary concern, but to say that execution speed is of _no_ concern is
>> quite absurd indeed.
>
> I'm pretty sure that nobody here has said that speed is of no concern.
>
> Rather, I would argue that the position we're taking is that *in general*
> Python is fast enough for the sorts of tasks we use it for (except, of
> course, when it isn't, in which case you have our sympathy, and if we can
> suggest some solutions we will).
>
> Of course we'd take speed optimizations if they were free, but they're
> never free:
>
> - they often require more memory to run;
>
> - they usually require more code, which adds to the maintenance burden
>   and increases the interpreter bloat;
>
> - and increases the risk of bugs;
>
> - somebody has to write, debug, document and maintain the code,
>   and developer time and effort is in short supply;
>
> - or the optimization requires changes to the way Python operates;
>
> - even if we wanted to make that change, it will break backwards
>   compatibility;
>
> - and often what people imagine is a simple optimization (because
>   they haven't tried it) isn't simple at all;
>
> - or simply doesn't work;
>
> - and most importantly, just saying "Make it go faster!" doesn't work,
>   we actually need to care about the details of *how* to make it faster.

Or it reduces memory usage, improves performance, and makes things
easier on the programmer... but might place unexpected (or unwanted)
demands on other implementations of Python. CPython, by virtue of
being the "default" Python interpreter, has to be careful of appearing
to mandate something. That's why buy-in from other Python
implementation teams (I believe Jython was the most notable here) was
important in the discussion about the recent compact dict update. Even
what looks like a no-brainer still can have unwanted consequences.

> (We tried painting Go Faster stripes on the server, and it didn't work.)

Steven Middlename D'Aprano! You should know better than that. "It
didn't work" is not good enough. What actually happened? Did the
stripes smoulder and smoke until you powered down the server? Did the
server raise NotImplementedError when you touched it with the
paintbrush? Did your boss walk up behind you and hand you a pink slip?
Be specific!

> There have been at least two (maybe three) attempts to remove the GIL
> from CPython. They've all turned out to increase complexity by a huge
> amount, and not actually provide the hoped-for speed increase. Under many
> common scenarios, the GIL-less CPython actually ran *slower*.
>
> (I say "hoped for", but it was more wishful thinking than a rational
> expectation that removing the GIL would speed things up. I don't believe
> any of the core developers were surprised that removing the GIL didn't
> increase speeds much, if at all, and sometimes slowed things down. The
> belief that the GIL slows Python down is mostly due to a really 
> simplistic understanding of how threading works.)

Removing the GIL from CPython is not about "speeding up" the language
or the interpreter, but about improving parallelism. And I think all
the core devs understand this (hence the lack of surprise there), but
people who call for it often don't.

> (If it weren't for the EU government funding it, PyPy would probably be
> languishing in oblivion. Everyone wants a faster Python so long as they
> don't have to pay for it, or do the work.)

Hm, I didn't know the EU was funding PyPy. Okay. Cool.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread bartc

On 26/02/2018 09:22, Steven D'Aprano wrote:

On Sun, 25 Feb 2018 21:19:19 -0800, Rick Johnson wrote:



I agree with your sarcasm. And that's why these types of auto
conversions should be optional. I agree that most times it's more
practical to let python handle the dirty details. But in some cases,
where you need to squeeze out a few more drops of speed juice, you won't
mind the extra trouble.


And that's where you call out to a library like numpy, or write a C
extension, or use a tool like Numba or Cython to optimise your Python
code to use native ints. (Numba is still a work in progress, but Cython
is a mature working product.)

Or to put it another way... if you want machine ints in Python, the way
you get them is something like:

from numba import jit

@jit
def myfunction(): ...



The core language doesn't have to support these things when there is a
healthy language ecosystem that can do it.


Below is the first draft of a Python port of a program to do with random 
numbers. (Ported from my language, which in turn ported it from a C 
program by George Marsaglia, the random number guy.)


However, running it quickly exhausts the memory in my machine. The 
reason is that Python unhelpfully widens all results to bignums as 
needed. The code relies on calculations being modulo 2^64.


Note that restricting integer ops to 64 bits probably still won't work, 
as I believe the numbers need to be unsigned.


--

Q = 0
carry = 36243678541
xcng = 12367890123456
xs = 521288629546311
indx = 20632

def refill():
    global Q, carry, indx
    for i in range(20632):
        h = carry & 1
        z = ((Q[i]<<41)>>1) + ((Q[i]<<39)>>1) + (carry>>1)
        carry = (Q[i]>>23) + (Q[i]>>25) + (z>>63)
        Q[i] = ~((z<<1) + h)
    indx = 1
    return Q[0]

def start():
    global Q, carry, xcng, xs, indx
    Q = [0,]*20632

    for i in range(20632):
        xcng = 6906969069 * xcng + 123
        xs ^= (xs<<13)
        xs ^= (xs>>17)
        xs ^= (xs<<43)
        Q[i] = xcng + xs

    N = 10**5   # 1e5 as posted; the quoted x=401...768 needs N = 10**9
    for i in range(N):
        if indx < 20632:
            s = Q[indx]
            indx += 1
        else:
            s = refill()
        xcng = 6906969069 * xcng + 123
        xs ^= (xs<<13)
        xs ^= (xs>>17)
        xs ^= (xs<<43)
        x = s + xcng + xs
    print("Does x= 4013566000157423768")
    print(" x=", x)

start()

--

(The code performs N iterations of a random number generator. You get 
the result expected, ie. x=401...768, when N is a billion.)


--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Steven D'Aprano
On Sun, 25 Feb 2018 19:26:12 -0800, Rick Johnson wrote:

> On Friday, February 23, 2018 at 8:48:55 PM UTC-6, Steven D'Aprano wrote:
> [...]
>> Take the Fibonacci double-recursion benchmark. Okay, it tests how well
>> your language does at making millions of function calls. Why?
> 
> Because making "millions of function calls" is what software *DOES*!

You're right -- I worded that badly, and ended up saying something I 
didn't really mean. Mea culpa.

Of course over the course of the entire program run, most non-trivial 
programs will end up making millions (if not billions) of function 
calls *in total*. But they're not all happening *together*.

What I tried to get across is, how often do we use millions of function 
calls to get one value? By default, Python will only allow one thousand 
function calls in a stack before raising RecursionError.

py> def bad():
...     return bad()
...
py> bad()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in bad
  [ previous line repeats 998 times ]
RecursionError: maximum recursion depth exceeded


There's nothing unique about recursion that will trigger this error. Any 
call stack of more than 1000 function calls will do it. So forget about 
code getting a million functions deep. The average is probably more like 
a dozen, perhaps a hundred tops.
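
(The limit is inspectable and adjustable, for what it's worth:

import sys

print(sys.getrecursionlimit())   # 1000 by default
sys.setrecursionlimit(5000)      # raise with care: overflow the real
                                 # C stack and you get a segfault

but needing to raise it is usually a hint that the recursion should 
be rewritten as iteration.)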


[...]
>> For most application code, executing the function is far more costly
>> than the overhead of calling it,
> 
> What? Your functions are only called _once_ during the lifetime of your
> code execution? Enough with the leg pulling.

Let me explain what I intended to say.

Here is a really poor estimate of the overhead of calling a function on 
my computer, measured in iPython.

In [50]: def call():
   ....:     pass
   ....:
In [51]: %timeit call()
1000000 loops, best of 3: 717 ns per loop

So, on my computer, in iPython, it takes 717ns to call a function that 
does nothing. How long does it take to do the tiniest smidgen of useful 
work?

In [52]: def do_something():
   ....:     a = 1
   ....:     b = a+1
   ....:     return b
   ....:

In [53]: %timeit do_something()
1000000 loops, best of 3: 923 ns per loop

Only an extra 200 ns! Wow, function calls are expensive!!!

(Well, yes, I knew that.) However, let's do some *real* work, not a 
Mickey Mouse function like do_something().

In [54]: import pyprimes

In [55]: %timeit pyprimes.nth_prime(100000)
1 loops, best of 3: 2.6 s per loop


I happen to know that calling nth_prime ends up making a couple of 
hundred function calls. (Because I wrote it.) Let's call it 10,000 
function calls, to be on the safe side. So 10,000 x 800 nanoseconds is 
0.008 second, let's call it 0.01 second, which is just 0.4% of the total 
cost of calculating the number I was after.

(For the record, that's 1299709.)

So just a trivial fraction of the total cost of the function.

Now I know that this is not always the case. But I maintain that for 
*most* (but not all) practical, real-world code that actually does 
something useful, the overhead of calling the functions is *usually* (but 
not always) low compared to the cost of actually doing whatever it is 
that the functions do.
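
(If you want to reproduce that kind of measurement outside of iPython, 
here is a rough sketch with the stdlib timeit module; the toy functions 
are the same as above, and absolute numbers will differ by machine:

import timeit

def call():
    pass

def do_something():
    a = 1
    b = a + 1
    return b

n = 1000000
for f in (call, do_something):
    t = timeit.timeit(f, number=n)   # total seconds for n calls
    print(f.__name__, round(t / n * 1e9), "ns per call")
)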



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Steven D'Aprano
On Sun, 25 Feb 2018 20:22:17 -0800, Rick Johnson wrote:

> So of course, speed is not and should not be the
> primary concern, but to say that execution speed is of _no_ concern is
> quite absurd indeed.

I'm pretty sure that nobody here has said that speed is of no concern.

Rather, I would argue that the position we're taking is that *in general* 
Python is fast enough for the sorts of tasks we use it for (except, of 
course, when it isn't, in which case you have our sympathy, and if we can 
suggest some solutions we will).

Of course we'd take speed optimizations if they were free, but they're 
never free:

- they often require more memory to run;

- they usually require more code, which adds to the maintenance burden
  and increases the interpreter bloat;

- and increases the risk of bugs;

- somebody has to write, debug, document and maintain the code, 
  and developer time and effort is in short supply;

- or the optimization requires changes to the way Python operates;

- even if we wanted to make that change, it will break backwards
  compatibility;

- and often what people imagine is a simple optimization (because
  they haven't tried it) isn't simple at all;

- or simply doesn't work;

- and most importantly, just saying "Make it go faster!" doesn't work,
  we actually need to care about the details of *how* to make it faster.

(We tried painting Go Faster stripes on the server, and it didn't work.)

There's no point suggesting such major changes to Python that requires 
going back to the drawing board, to Python 0.1 or earlier, and changing 
the entire execution and memory model of the language.

That would just mean we've swapped from a successful, solid, reliable 
version 3.6 of the language to an untried, unstable, unproven, bug-ridden 
version 0.1 of New Python.

And like New Coke, it won't attract new users (they're already using 
Javascript or Go or whatever...) and it will alienate existing users (if 
they wanted Javascript or Go they'd already be using it).

There have been at least two (maybe three) attempts to remove the GIL 
from CPython. They've all turned out to increase complexity by a huge 
amount, and not actually provide the hoped-for speed increase. Under many 
common scenarios, the GIL-less CPython actually ran *slower*.

(I say "hoped for", but it was more wishful thinking than a rational 
expectation that removing the GIL would speed things up. I don't believe 
any of the core developers were surprised that removing the GIL didn't 
increase speeds much, if at all, and sometimes slowed things down. The 
belief that the GIL slows Python down is mostly due to a really 
simplistic understanding of how threading works.)

Besides, if you want Python with no GIL so you can run threaded code, why 
aren't you using IronPython or Jython?

Another example: UnladenSwallow. That (mostly) failed to meet 
expectations because the compiler tool chain being used wasn't mature 
enough. If somebody were to re-do the project again now, the results 
might be different.

But it turns out that not that many people care enough to make Python 
faster to actually invest money in it. Everyone wants to just stamp their 
foot and get it for free.

(If it weren't for the EU government funding it, PyPy would probably be 
languishing in oblivion. Everyone wants a faster Python so long as they 
don't have to pay for it, or do the work.)

A third example: Victor Stinner's FatPython, which seemed really 
promising in theory, but turned out to not make enough progress fast 
enough and he lost interest.

I have mentioned FatPython here a number of times. All you people who 
complain that Python is "too slow" and that the core devs ought to do 
something about it... did any of you volunteer any time to the FatPython 
project?



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-26 Thread Steven D'Aprano
On Sun, 25 Feb 2018 21:19:19 -0800, Rick Johnson wrote:


> I agree with your sarcasm. And that's why these types of auto
> conversions should be optional. I agree that most times it's more
> practical to let python handle the dirty details. But in some cases,
> where you need to squeeze out a few more drops of speed juice, you won't
> mind the extra trouble.

And that's where you call out to a library like numpy, or write a C 
extension, or use a tool like Numba or Cython to optimise your Python 
code to use native ints. (Numba is still a work in progress, but Cython 
is a mature working product.)

Or to put it another way... if you want machine ints in Python, the way 
you get them is something like:

from numba import jit

@jit
def myfunction(): ...



The core language doesn't have to support these things when there is a 
healthy language ecosystem that can do it.
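
Fleshed out a little, and assuming numba is installed, that looks 
something like this; nopython mode is the variant that actually gets 
you native machine ints (the workload here is an illustration of mine, 
not a recommendation):

from numba import jit

@jit(nopython=True)
def total(n):
    s = 0                  # compiled as a native int64, not a bignum
    for i in range(n):
        s += i
    return s

print(total(10**8))        # fast after the first (compiling) call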


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Rick Johnson
On Sunday, February 25, 2018 at 10:35:29 PM UTC-6, Steven D'Aprano wrote:
[...]
> Ah, you mean just like the way things were in Python 1.0
> through 2.1? Hands up anyone who has seen an integer
> OverflowError in the last 10 years? Anyone?

I think Python2.1 is much older than 10 years, so yeah, of
course almost no one (save a few old timers) has seen that
error. :-)

> [...]
> I really miss having to either add, or delete, an "L"
> suffix from my long ints, and having to catch OverflowError
> to get any work done, and generally spending half my time
> worrying how my algorithms will behave when integer
> operations overflow.

I agree with your sarcasm. And that's why these types of
auto conversions should be optional. I agree that most times
it's more practical to let python handle the dirty details.
But in some cases, where you need to squeeze out a few more
drops of speed juice, you won't mind the extra trouble. And
perhaps you (meaning specifically you) will never need such
a flexible feature. But hey. The Python community is
diverse. So please do keep that in mind.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Steven D'Aprano
On Sun, 25 Feb 2018 18:33:47 -0800, Rick Johnson wrote:

> On Friday, February 23, 2018 at 10:41:45 AM UTC-6, Steven D'Aprano
> wrote: [...]
>> There are dozens of languages that have made the design choice to limit
>> their default integers to 16- 32- or 64-bit fixed size, and let the
>> user worry about overflow. Bart, why does it upset you so that Python
>> made a different choice?
> 
> A default "integer-diversity-and-inclusivity-doctrine" is all fine and
> dandy by me, (Hey, even integers need safe spaces), but i do wish we
> pythonistas had a method to turn off this (and other) cycle burning
> "features" -- you know -- in the 99.9 percent of time that we don't
> need them.

Ah, you mean just like the way things were in Python 1.0 through 2.1? 
Hands up anyone who has seen an integer OverflowError in the last 10 
years? Anyone?

[steve@ando ~]$ python1.5 -c "print 2**64"
Traceback (innermost last):
  File "", line 1, in ?
OverflowError: integer pow()


I really miss having to either add, or delete, an "L" suffix from my long 
ints, and having to catch OverflowError to get any work done, and 
generally spending half my time worrying how my algorithms will behave 
when integer operations overflow.

Good times. I miss those days. I also miss the days when everyone had 
scabies.


As someone who wrote Python code when bignums where *not* the default, I 
can tell you that:

- it was a real PITA for those who cared about their code working
  correctly and being bug-free;

- and the speed up actually wasn't that meaningful.


As is so often the case with these things, using fixed size ints looks 
good in benchmark games, but what's fast in a toy benchmark and what's 
fast in real code are not always the same.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Rick Johnson
On Sunday, February 25, 2018 at 8:45:56 PM UTC-6, Chris Angelico wrote:
> On Mon, Feb 26, 2018 at 1:33 PM, Rick Johnson
>  wrote:
[...]
> > A default "integer-diversity-and-inclusivity-doctrine" is
> > all fine and dandy by me, (Hey, even integers need safe spaces),
> 
> In Python 3.6+, integers have safe underscores instead.

You spacist!!!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Rick Johnson
On Sunday, February 25, 2018 at 8:45:56 PM UTC-6, Chris Angelico wrote:
> On Mon, Feb 26, 2018 at 1:33 PM, Rick Johnson

[...]

> > but i do wish we pythonistas had a method to turn off this
> > (and other) cycle burning "features" -- you know -- in the
> > 99.9 percent of time that we don't need them.
> 
> Definitely. We should also provide a way for people to
> manually allocate memory, for the times when that would be
> more efficient. And to switch out the syntax to include
> braces.

Now you're throwing the baby out with the bath water. 

We needn't re-implement the C language simply to achieve
reasonable and practical code-execution efficiency. Python
was never intended to be the fastest language. Heck! Python
was never intended to be the fastest _scripting_ language!
So of course, speed is not and should not be the primary
concern, but to say that execution speed is of _no_ concern
is quite absurd indeed.

As Steven mentioned, few people, even of they _could_ afford
the absurd price, will ever purchase that high-end sports
car. However, by using this example Steve is committing the
argumentation sin of comparing apples to oranges because
transportation is more a matter of practicality. Sure, you
may arrive at your destination more quickly in the high-end
sports car, however, the trade-off will be that you could be
thrown in jail for speeding or be injured or killed in an
accident.

But such existential risks are not the case for software...

Code that executes more quickly simply (wait for it...) yes,
executes more quickly! And I can assure you: no one will die,
be injured, or even become "Bubba's" cell mate should their
code commit the unforgivable "sin" of executing more
quickly[1].


--
[1] Quick! Someone please call a priest! We need to confess!!!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Rick Johnson
On Friday, February 23, 2018 at 8:48:55 PM UTC-6, Steven D'Aprano wrote:
[...]
> Take the Fibonacci double-recursion benchmark. Okay, it
> tests how well your language does at making millions of
> function calls. Why?

Because making "millions of function calls" is what software
*DOES*!

Granted, and as others have testified in this thread, i
myself have almost never implemented an algorithm like that
monstrosity of recursion that is implemented in fib(), but
whether we use recursive functions or not, a high number of
function calls is inevitable in our software. And if you
don't believe me, then use sys.settrace(myFunc) to trace all
function calls to stdout, and you shall be enlightened[1].
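
A minimal sketch of that kind of tracing:

import sys

def tracer(frame, event, arg):
    if event == 'call':
        print('call:', frame.f_code.co_name)
    return None            # None = don't trace inside this frame

def f(): g()
def g(): pass

sys.settrace(tracer)
f()
sys.settrace(None)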

> How often do you make millions of function calls?

All the time! Of course, if the extent of your code base is
a single script containing the source `print 'Hello World'`,
perhaps your experience may vary from others here.

> For most application code, executing the function is far
> more costly than the overhead of calling it,

What? Your functions are only called _once_ during the
lifetime of your code execution? Enough with the leg
pulling.

> and the call overhead is dwarfed by the rest of the
> application.

Of course, because it's calling _other_ functions.

> For many, many applications, the *entire* program run could
> take orders of magnitude fewer function calls than a single
> call to fib(38).

Hmm, let's see:

def main():
    print 'hello world'
main()

By golly, he's right! O_O


--
[1] But, in the meantime, you may want to make yourself a
taco; go play a round of golf; and possibly pay a visit to a
national monument (or three!), while all that crap is printing
to stdout.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Chris Angelico
On Mon, Feb 26, 2018 at 1:33 PM, Rick Johnson
 wrote:
> On Friday, February 23, 2018 at 10:41:45 AM UTC-6, Steven D'Aprano wrote:
> [...]
>> There are dozens of languages that have made the design
>> choice to limit their default integers to 16- 32- or 64-bit
>> fixed size, and let the user worry about overflow. Bart,
>> why does it upset you so that Python made a different
>> choice?
>
> A default "integer-diversity-and-inclusivity-doctrine" is
> all fine and dandy by me, (Hey, even integers need safe spaces),

In Python 3.6+, integers have safe underscores instead.

> but i do wish we pythonistas had a method to turn off this
> (and other) cycle burning "features" -- you know -- in the
> 99.9 percent of time that we don't need them.

Definitely. We should also provide a way for people to manually
allocate memory, for the times when that would be more efficient. And
to switch out the syntax to include braces. And all those other things
people like from other languages. Hey, here's an idea: maybe Python
shouldn't stop people from using other languages, if those other
languages are better suited to the job?

> And BTW... Am i the *ONLY* person here who feels that Python
> optimizations should be more than merely the tossing of dead
> weight overboard?

I dunno. We optimize this mailing list that way..

*ducking for cover*

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-25 Thread Rick Johnson
On Friday, February 23, 2018 at 10:41:45 AM UTC-6, Steven D'Aprano wrote:
[...]
> There are dozens of languages that have made the design
> choice to limit their default integers to 16- 32- or 64-bit
> fixed size, and let the user worry about overflow. Bart,
> why does it upset you so that Python made a different
> choice?

A default "integer-diversity-and-inclusivity-doctrine" is
all fine and dandy by me, (Hey, even integers need safe spaces),
but i do wish we pythonistas had a method to turn off this
(and other) cycle burning "features" -- you know -- in the
99.9 percent of time that we don't need them.

And BTW... Am i the *ONLY* person here who feels that Python
optimizations should be more than merely the tossing of dead
weight overboard?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-24 Thread bartc

On 24/02/2018 02:05, Steven D'Aprano wrote:

On Fri, 23 Feb 2018 19:25:35 +, bartc wrote:



Python is 10 times slower than a competitor = doesn't matter
My language is 1.5 times slower than the big boys' = matters
a great deal



As for Python's order-of-magnitude speed difference, thank you for being
generous.


Actually that comparison was with a competitor, ie. another dynamic 
language, because I understand such languages work in different fields 
from the Cs and C++s.


I'm sure there must be some that are faster (years since I've looked at 
the field), but I vaguely had in mind mine. Although since then, CPython 
has gotten faster.


Note that there are JIT-based implementations now which can give very 
good results (other than PyPy) with dynamic languages.


My own efforts are still byte-code based so are unlikely to get any 
faster. But they are also very simple.



So it is quite possible to get practical work done and be a competitive,
useful language despite being (allegedly) a thousand or more times slower
than C.


Of course. I've been using a dynamic scripting language as an adjunct to 
my compiled applications since the mid 80s. Then they were crude and 
hopelessly slow (and machines were a lot slower too), but they could 
still be tremendously useful with the right balance.


But the faster they are, the more work they can take over.


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-24 Thread bartc

On 24/02/2018 02:46, Steven D'Aprano wrote:


Take the Fibonacci double-recursion benchmark. Okay, it tests how well
your language does at making millions of function calls. Why? How often
do you make millions of function calls?


Very often. Unless you are writing code 1970s style with everything in 
one big main function.


I've done some tests with my interpreter [sorry, Ned], on one real task:

  Total number of byte-code calls: 3.2 million
  Total number of real x64 calls: 520 million

On this specific benchmark: 48 million and 580 million.

Changing the way those x64 calls were made (using a different call 
convention), made some byte-code programs take up to 50% longer to 
execute. [In this interpreter, each byte-code, no matter what it is, is 
dispatched via a function call.]


 For most application code,

executing the function is far more costly than the overhead of calling
it, and the call overhead is dwarfed by the rest of the application.


Any actual figures?

In the case of interpreters, you want to optimise each byte-code, and 
one way of doing that is to write a small program which features that 
byte-code heavily. And then you start tweaking.


It is true that when working with heavy-duty data, or offloading work to 
external, non-byte-code functions, then the byte-code execution 
overheads are small. But Python's weakness is when it /is/ executing 
actual algorithms using actual byte-code.


And actually, performance of CPython does seem to have improved 
tremendously over the years. So some people have their eye on the ball. 
Clearly not you.



If you have a language with tail recursion elimination, you can bet
that's its benchmarks will include examples of tail recursion and tail
recursion will be a favoured idiom in that language. If it doesn't, it
won't.


Benchmarks need to be honest. But Fibonacci, I think, can't use that 
optimisation (although gcc seems to have found another way of 
avoiding most of the work).


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread boB Stepp
On Fri, Feb 23, 2018 at 1:16 PM, Chris Angelico  wrote:
> On Sat, Feb 24, 2018 at 6:09 AM, Python  wrote:
>> On Sat, Feb 24, 2018 at 05:56:25AM +1100, Chris Angelico wrote:
>>> No, not satisfied. Everything you've said would still be satisfied if
>>> all versions of the benchmark used the same non-recursive algorithm.
>>> There's nothing here that says it's testing recursion, just that (for
>>> consistency) it's testing the same algorithm. There is no reason to
>>> specifically test *recursion*, unless that actually aligns with what
>>> you're doing.
>>
>> It seems abundantly clear to me that testing recursion is the point of
>> writing a benchmark implementing recursion (and very little of
>> anything else).  Again, you can decide for yourself the suitability of
>> the benchmark, but I don't think you can really claim it doesn't
>> effectively test what it means to.
>
> Where do you get that it's specifically a recursion benchmark though?
> I can't find it anywhere in the explanatory text.

I hope I am not missing something blatantly obvious, but in the page
Python linked to (https://julialang.org/benchmarks/), in the opening
paragraph it states:


These micro-benchmarks, while not comprehensive, do test compiler
performance on a range of common code patterns, such as function
calls, string parsing, sorting, numerical loops, random number
generation, recursion, and array operations.


Recursion is listed above as one code pattern to specifically target
with one of their micro-benchmarks.  I did not exhaustively go through
each language's code examples, but it does appear that the authors of
these benchmarks are implementing as exactly as possible the same
algorithm chosen for each of these code patterns in each language.
Now to me the real question is whether the Julia people were trying to
be intellectually honest in their choices of coding patterns and
algorithms or were they deliberately cherry picking these to make
Julia look better than the competition?  Myself, I would rather be
charitable than accusatory about the benchmarkers' intentions.  For
instance, the authors were aware of numpy and used it for some of the
python coding -- the array operations they were targeting IIRC.
Instead, if they were being deliberately dishonest, they could have
come up with some really contrived python code.



-- 
boB
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Terry Reedy

On 2/23/2018 11:32 AM, Python wrote:

On Fri, Feb 23, 2018 at 03:11:36AM -0500, Terry Reedy wrote:

Why do you care about the 50 million calls? That's crazy -- the important
thing is *calculating the Fibonacci numbers as efficiently as possible*.



If you are writing practical programs, that's true.  But the Julia
benchmarks are not practical programs; they are designed to compare
the performance of various language features across a range of
languages.


If that were so, then the comparison should use the fastest *Python*
implementation.


Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do.


That is a non-goal because *languages* do not have clock-time speeds, 
only programs written in concrete implementations run on real hardware.


Why do you think it fair to compare function call times using the 
slowest rather than the fastest implementation of Python function calls? 
 Real Python programmers who are seriously concerned about time try to 
use the fastest implementation appropriate to the situation.


>  So, no actually, it shouldn't.

To me, it is 'not using the fastest Python' that fails to make apples to 
apples comparisons.


It has been said here that comparisons should use the same algorithm 
doing much the same work in both implementations.  However, a Python 
function call does *much more work* than in most languages, because the 
meaning of 'function call' is much more flexible in Python than most 
languages.  The usual comparison is like lemons (other language calls) 
to oranges (Python language calls, much more flexible).  To be fair, the 
comparison should be to a Python implementation that either notices or 
accepts a declaration that, for instance, fib(n) only needs to pass an 
int by position.


Comparing int addition to python object addition also compares unlike 
operations.  To compare using the same addition algorithm, one must use 
an implementation that can restrict '+' to just int addition.
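
You can see that genericness directly in the byte-code; a sketch:

import dis
dis.dis(lambda a, b: a + b)

This shows a single generic BINARY_ADD instruction, the same op 
whether a and b are ints, floats, lists or user-defined objects, so 
the type dispatch happens at run time on every execution.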


The Julia fib benchmark uses the type of function call Julia is good at. 
 Suppose we define fib in a way that uses Python features.


def fib(n, dummy):
    if n >= 2:
        return fib(n=n-1, dummy=dummy) + fib(dummy=dummy, n=n-2)
    elif n >= 0:
        return 1
    else:
        return None, n  # or some error indicator including the bad value

If the language does not support 'parameter=value' syntax (which existed 
long before Python), use ('n', n) and ('dummy', dummy) instead.

Now let's compare call times.

Or let's try 'fib(n-1, dummy) + fib(dummy=dummy, n=n-2)' to compare 
functions that can identify argments either by position or by name.


Or f(n-1, dummy) + f(dummy=dummy, n=n-2) + f(*(n-3, dummy)), and change 
'2' to '3', to utilize another Python call feature.  If the language 
does not support *args, omit '*'.




--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 12:43:06 -0600, Python wrote:

> Even if testing optimized
> code is the point, as the article claims, it utterly fails to do that. 
> Bad science.

You've used that statement two or three times now.

*This isn't science*.

There's nothing scientific about writing benchmarks, or even objective. 
It is through and through subjective choices given a paper-thin patina of 
objectivity because the results include numbers.

But those numbers depend on the precise implementation of the benchmark. 
They depend on the machine you run them on, sometimes strongly enough 
that the order of which language is faster can swap. I remember a bug in 
Python's urllib module, I think it was, that made code using it literally 
hundreds of times slower on Windows than Linux or OS X.

The choice of algorithms used is not objective, or fair. Most of it is 
tradition: the famous "whetstone" benchmark apparently measures something 
which has little or no connection to anything software developers should 
care about. It, like the Dhrystone variant, was invented to benchmark 
CPU performance. The relevance to comparing languages is virtually zero.

"As this data reveals, Dhrystone is not a particularly 
representative sample of the kinds of instruction sequences
that are typical of today's applications. The majority of 
embedded applications make little use of the C libraries
for example, and even desktop applications are unlikely to
have such a high weighting of a very small number of
specific library calls."

http://dell.docjava.com/courses/cr346/data/papers/DhrystoneMIPS-CriticismbyARM.pdf


Take the Fibonacci double-recursion benchmark. Okay, it tests how well 
your language does at making millions of function calls. Why? How often 
do you make millions of function calls? For most application code, 
executing the function is far more costly than the overhead of calling 
it, and the call overhead is dwarfed by the rest of the application.

For many, many applications, the *entire* program run could take orders 
of magnitude fewer function calls than a single call to fib(38).
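
That's easy to check with a counter (hypothetical instrumentation, 
not part of any benchmark):

calls = 0

def fib(n):
    global calls
    calls += 1
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fib(25)
print(calls)   # 242785 for n=25; the count grows by roughly 1.6x per
               # increment of n, hence over 100 million calls at n=38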

If you have a language with tail recursion elimination, you can bet 
that's its benchmarks will include examples of tail recursion and tail 
recursion will be a favoured idiom in that language. If it doesn't, it 
won't.

I'm going to end with a quote:

"And of course, the very success of a benchmark program is
a danger in that people may tune their compilers and/or
hardware to it, and with this action make it less useful."

Reinhold P. Weicker, Siemens AG, April 1989
Author of the Dhrystone Benchmark



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Dan Stromberg
On Fri, Feb 23, 2018 at 6:06 PM, bartc  wrote:
> On 24/02/2018 00:45, Dan Stromberg wrote:
>> But again, those techniques are only infrequently relevant, and pretty
>> much never relevant to an entire application.
> That your page lists 23 different approaches to making Python faster
> suggests that its speed can be an issue. Certainly there seem to be a lot of
> projects going on trying to fix that aspect.

Of course CPU performance can be an issue. For any language.

When I was a kid, programs were written in BASIC.  If BASIC was too
slow, you'd rewrite the hotspots in assembler.  Does that mean
everything should have been written in assembler?  Not at all.
Assembler was slow to code in and tended to produce crashy solutions.

Obsessing about CPU time is outdated thinking for almost all problems.
Developer time is much more likely to matter to the bottom line.

> Perhaps it ought to have been clear that this was CPython, if it in fact
> was.

CPython != Python.  g++ != C++.  gcc != C.

Python is a language.  Not an implementation of a language.

> There are so many possibilities with Python, that using anything else
> would confuse matters.

Gosh no.  A language's success can, in significant part, be judged by
the number of compatible implementations and dialects.

You might find this interesting:
http://stromberg.dnsalias.org/~strombrg/why-is-python-slow/

It's just one microbenchmark, but it's me addressing someone asking
why Python is so much slower than C++.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 24/02/2018 00:45, Dan Stromberg wrote:

On Fri, Feb 23, 2018 at 1:32 PM, bartc  wrote:



But the difference in runtime speed between Python and other dynamic
languages, if you look at benchmarks doing actual work in the language, can
be much greater than two times, yet that doesn't appear to matter.


The best way to use Python is to:
1) Write your whole application in it.
2) IF things are too slow, profile the app.  Usually performance will
be fast enough from the start, but not always.
3) After (if) identifying a hotspot, optimize it in any of a variety
of ways, while still reaping the benefits of rapid development (which
often matters much more than CPU speed today)

I have a page about speeding up python:
http://stromberg.dnsalias.org/~strombrg/speeding-python/

But again, those techniques are only infrequently relevant, and pretty
much never relevant to an entire application.


That your page lists 23 different approaches to making Python faster 
suggests that its speed can be an issue. Certainly there seem to be a 
lot of projects going on trying to fix that aspect.


So I'll rephrase, and say that some individuals in this group are saying 
that speed doesn't matter. For many others it apparently does.


(I've just looked at the benchmark results referred to in the OP's link 
[https://julialang.org/benchmarks/] and the comparison for the recursion 
benchmark (let's just call it that) seems fair enough to me.


Perhaps it ought to have been clear that this was CPython, if it in fact 
was. There are so many possibilities with Python, that using anything 
else would confuse matters. People who use Python will be aware there 
are acceleration techniques.


(A note regarding the C timing in that table: I've looked at what gcc 
does with this benchmark with its unreasonably fast timing, and it 
appears to cheat. That is, it does not call that function (ie. enter via 
the normal prologue code) the number of times it says. I don't know if 
that was the case when they tested C for that table. But it can happen 
that a sufficiently simple benchmark can lend itself to 
super-optimisation that is much harder with real code.


For that reason I no longer take account of gcc-O3 timings for 
micro-benchmarks.))



--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 19:25:35 +, bartc wrote:

> On 23/02/2018 18:05, Steven D'Aprano wrote:
>> On Fri, 23 Feb 2018 13:51:33 +, Ben Bacarisse wrote:
> 
>> Stop writing crap code and then complaining that the language is "too
>> slow". Write better code, and then we'll take your complaints
>> seriously.
> 
> I write compilers for static languages, which are generally considered
> toys, partly because the code generated is not great.

I have no idea whether your compilers are toys, or what "not great" means 
for your generated code.

- too slow for the intended purpose?
- full of bugs?
- generated code is enormous?
- requires too much memory?
- all of the above?

If the actual answer is, "none of the above", then apart from your 
failure to attract users and build a full ecosystem around your compiler, 
I see no reason to call them toys.

If your compilers can get within 10% of best-of-breed C compilers, even 
if only on trivial problems, that's nothing to sneer at. If you can do it 
on non-trivial applications (say, the Linux kernel) without introducing a 
vast number of compiler-generated bugs, then you're doing very well 
indeed.

In another post (responding to Chris) you wrote:

> Python is 10 times slower than a competitor = doesn't matter
> My language is 1.5 times slower than the big boys' = matters 
> a great deal

Maybe it matters to *you*. You are the one who stated it was a toy 
compiler. To me, I have *no idea* whether the compiler is any good. 
Nobody but you uses it. I once tried to compile it, and couldn't get it 
to compile.

As for Python's order-of-magnitude speed difference, thank you for being 
generous. If you cherry-pick the right benchmarks, you could easily have 
said it was two orders-of-magnitude slower. On the other hand, if you use 
PyPy, and similarly cherry-pick the right benchmarks, you can justifiably 
say that *Python is faster than C*. (On carefully chosen examples which 
probably aren't representative of full application performance.)

But okay, let's take "Python is an order of magnitude (between 10 and 90 
times) slower than C" for the sake of the argument. Does it matter?

Maybe. Maybe not. Sometimes it matters. Sometimes it doesn't.

The world's fastest street-legal production car is, so I'm told, the 
Bugatti Veyron Super Sport. It has a top speed of 268 mph and the engine 
has 1200 horsepower. And yet literally *billions* of car drivers choose 
to drive something slower. They wouldn't buy a Veyron even if they could 
afford it. Why shouldn't people make choices based on factors other than 
speed, speed, speed, speed, speed?

Ruby has fallen out of favour now, but for quite a few years it was 
looking like a contender despite being an order of magnitude slower than 
Python. Which is an order of magnitude slower than Java, which is an 
order of magnitude slower than C, according to some benchmarks at least.

People write business applications using Excel spreadsheets and macros. 
And why not?

So it is quite possible to get practical work done and be a competitive, 
useful language despite being (allegedly) a thousand or more times slower 
than C.


[...]
> Have you ever written a benchmark in your life? If so, did it do
> anything useful? Other than what benchmarks are normally used for.

Yes I have, not for what benchmarks are normally used for (dick-measuring 
contests between programmers competing to see whose language is more 
macho) but for the very practical purposes of:

(1) Deciding which algorithm to use in practice;

(2) Ensuring that updates to the code don't suffer serious performance 
regressions;

(3) And checking that the code is *fast enough* (not as fast as 
conceivably possible) for its purpose.

They are the sorts of benchmarks I care about.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Dan Stromberg
On Fri, Feb 23, 2018 at 1:32 PM, bartc  wrote:
> On 23/02/2018 20:12, Chris Angelico wrote:
> So I'll keep it generic. Let's say the Tiny C compiler is not taken
> seriously because it might be a couple of times slower than gcc-O3, even
> though it's 1% of the size and compiles 1000% as fast.

What's generic about a specific example?  (that's rhetorical)

I've used tcc in preference to gcc before; I had an application where
performance didn't matter as much as security; tcc has buffer overrun
checking.

> But the difference in runtime speed between Python and other dynamic
> languages, if you look at benchmarks doing actual work in the language, can
> be much greater than two times, yet that doesn't appear to matter.

The best way to use Python is to:
1) Write your whole application in it.
2) IF things are too slow, profile the app.  Usually performance will
be fast enough from the start, but not always.
3) After (if) identifying a hotspot, optimize it in any of a variety
of ways, while still reaping the benefits of rapid development (which
often matters much more than CPU speed today)

I have a page about speeding up python:
http://stromberg.dnsalias.org/~strombrg/speeding-python/

But again, those techniques are only infrequently relevant, and pretty
much never relevant to an entire application.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 8:32 AM, bartc  wrote:
> So I'll keep it generic. Let's say the Tiny C compiler is not taken
> seriously because it might be a couple of times slower than gcc-O3, even
> though it's 1% of the size and compiles 1000% as fast.

Except that nobody has said that. You're doing an excellent job of
fighting against a strawman.

I'm done. Have fun!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 20:12, Chris Angelico wrote:

On Sat, Feb 24, 2018 at 7:02 AM, bartc  wrote:



I don't know what point you're making here. Unless it's that no software is
worth writing unless it's done on a big scale with a huge team of people,
and is very well known.

My own point should be clear:

  Python is 10 times slower than a competitor = doesn't matter
  My language is 1.5 times slower than the big boys' = matters a great deal


Most performance is a tradeoff. Python has other advantages; what are
yours? So far, all you've said is that you're "not too much worse".
That's great for a toy language!


Ned Batchelder has banned me from mentioning any of my private work here.

So I'll keep it generic. Let's say the Tiny C compiler is not taken 
seriously because it might be a couple of times slower than gcc-O3, even 
though it's 1% of the size and compiles 1000% as fast.


But the difference in runtime speed between Python and other dynamic 
languages, if you look at benchmarks doing actual work in the language, 
can be much greater than two times, yet that doesn't appear to matter.


Even defining how fast or slow Python actually is is fraught with 
complexities, making it hard to establish just what its comparative speed is:


No one seems to be able to agree on what benchmarks ought to be used. 
Some don't seem to get benchmarks. Then it's a question of whether you run 
CPython. Or PyPy. Or IronPython or Jython. Or Numba with @jit. Or 
Numpy. Or @cache or whatever it is. Or you run Cython.


Or maybe you can combine @jit, @cache, Cython, and PyPy in the same 
program; I don't know. (Have I ever mentioned Python is complicated?)
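
(To illustrate just one of those options -- a sketch of what "Numba with
@jit" looks like in practice, assuming Numba is installed:)

from numba import jit

@jit(nopython=True)     # JIT-compiled to machine code on first call
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)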


So, you were asking about the benefits of having a small, simple 
self-contained implementation of a language. I really couldn't tell you...



--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Ned Batchelder

On 2/23/18 3:02 PM, bartc wrote:

On 23/02/2018 19:47, Chris Angelico wrote:

On Sat, Feb 24, 2018 at 6:25 AM, bartc  wrote:



The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.

Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.


Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.

Remind me what's actually advantageous about your language?


I don't know what point you're making here. Unless it's that no 
software is worth writing unless it's done on a big scale with a huge 
team of people, and is very well known.


My own point should be clear:

 Python is 10 times slower than a competitor = doesn't matter
 My language is 1.5 times slower than the big boys' = matters a great deal





I've lost track of the points anyone here is making.  Can we PLEASE 
avoid getting into this tarpit of comparing Bart's language to Python 
yet again?


--Ned.
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 7:02 AM, bartc  wrote:
> On 23/02/2018 19:47, Chris Angelico wrote:
>>
>> On Sat, Feb 24, 2018 at 6:25 AM, bartc  wrote:
>
>
>>> The difference between Python and another dynamic language might be a
>>> magnitude, yet you say it doesn't matter.
>>>
>>> Thanks, that makes me feel much better about my own work! If a magnitude
>>> difference in performance can be so casually dismissed.
>>
>>
>> Yep! Your work is awesome, because your code is only a little bit
>> slower than the same algorithm with the same bugs in a better-known
>> language that's supported on more platforms and has a large team of
>> people supporting it.
>>
>> Remind me what's actually advantageous about your language?
>
>
> I don't know what point you're making here. Unless it's that no software is
> worth writing unless it's done on a big scale with a huge team of people,
> and is very well known.
>
> My own point should be clear:
>
>  Python is 10 times slower than a competitor = doesn't matter
>  My language is 1.5 times slower than the big boys' = matters a great deal

Most performance is a tradeoff. Python has other advantages; what are
yours? So far, all you've said is that you're "not too much worse".
That's great for a toy language!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 19:47, Chris Angelico wrote:

On Sat, Feb 24, 2018 at 6:25 AM, bartc  wrote:



The difference between Python and another dynamic language might be a
magnitude, yet you say it doesn't matter.

Thanks, that makes me feel much better about my own work! If a magnitude
difference in performance can be so casually dismissed.


Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.

Remind me what's actually advantageous about your language?


I don't know what point you're making here. Unless it's that no software 
is worth writing unless it's done on a big scale with a huge team of 
people, and is very well known.


My own point should be clear:

 Python is 10 times slower than a competitor = doesn't matter
 My language is 1.5 times slower than the big boys' = matters a great deal


--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Fri, Feb 23, 2018 at 01:41:17PM -0600, Python wrote:
> On Sat, Feb 24, 2018 at 06:16:38AM +1100, Chris Angelico wrote:
> > > It seems abundantly clear to me that testing recursion is the point of
> > > writing a benchmark implementing recursion (and very little of
> > > anything else).  Again, you can decide for yourself the suitability of
> > > the benchmark, but I don't think you can really claim it doesn't
> > > effectively test what it means to.
> > 
> > Where do you get that it's specifically a recursion benchmark though?
> > I can't find it anywhere in the explanatory text.
> 
> I don't know how you could conclude it could be meant to test anything
> else.

Actually I will make one small correction, which is to say that the
benchmark tests the performance of algorithms which are dominated by 
many function calls, of which recursion is an obvious subset.  But
I think that's splitting hairs.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Jack Fearnley
On Thu, 22 Feb 2018 17:13:02 -0800, Rick Johnson wrote:

> On Thursday, February 22, 2018 at 1:55:35 PM UTC-6, Jack Fearnley wrote:
> [...]
>> I realize that this thread is about benchmarking and not really about
>> generating fibonacci numbers, but I hope nobody is using this code to
>> generate them on a 'production' basis,
> 
> I've been raising the warning flags about that awful Fibonacci example
> for years!
> 
> First it was in the tutorial, and then, they plastered it on the front
> page of Python.org like it were some sort of trophy...
> 
> At this point i have concluded that it must be: (1) a sadistic noob
> hazing device, or (2) a straw-stuffed,
> anthropomorphic, crow-poop-riddled monster that sends the seasoned
> programmers off in the opposite direction.

Excellent thread!  I see I'm preaching to the converted.

Jack



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 6:25 AM, bartc  wrote:
> On 23/02/2018 18:05, Steven D'Aprano wrote:
>>
>> On Fri, 23 Feb 2018 13:51:33 +, Ben Bacarisse wrote:
>
>
>> Stop writing crap code and then complaining that the language is "too
>> slow". Write better code, and then we'll take your complaints seriously.
>
>
> I write compilers for static languages, which are generally considered toys,
> partly because the code generated is not great.

Doesn't matter what language you're writing in, if your algorithm is
flawed. Bubble sort is bubble sort no matter what you're using. This
function will calculate triangle numbers just as inefficiently in
Python as it would in C:

def triangle(n):
    if n <= 1: return n
    return triangle(n-1) + triangle(n-1) - triangle(n-2) + 1
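
(And for contrast -- underlining that the flaw is the algorithm, not the
language -- the closed form is constant time in any language:)

def triangle_fast(n):
    return n * (n + 1) // 2    # 1 + 2 + ... + n, no recursion at all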


> But in a real application, the difference between my compiler, and gcc-O3 or
> MSVC/O2 might only be 10-50%. 1.1 to 1.5 times as fast (or as slow), and
> that's an application that does few external library calls.
>
> The difference between Python and another dynamic language might be a
> magnitude, yet you say it doesn't matter.
>
> Thanks, that makes me feel much better about my own work! If a magnitude
> difference in performance can be so casually dismissed.

Yep! Your work is awesome, because your code is only a little bit
slower than the same algorithm with the same bugs in a better-known
language that's supported on more platforms and has a large team of
people supporting it.

Remind me what's actually advantageous about your language?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Sat, Feb 24, 2018 at 06:16:38AM +1100, Chris Angelico wrote:
> > It seems abundantly clear to me that testing recursion is the point of
> > writing a benchmark implementing recursion (and very little of
> > anything else).  Again, you can decide for yourself the suitability of
> > the benchmark, but I don't think you can really claim it doesn't
> > effectively test what it means to.
> 
> Where do you get that it's specifically a recursion benchmark though?
> I can't find it anywhere in the explanatory text.

I don't know how you could conclude it could be meant to test anything
else.  The code is:

## recursive fib ##
fib(n) = n < 2 ? n : fib(n-1) + fib(n-2)

What else is there of substance to test?  Trigraphs?  Addition?  The
comment even calls out that it is recursive.
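
(A direct Python transliteration of that Julia one-liner, for reference --
the same doubly-recursive shape, just in Python syntax:)

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)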

I further draw inference from the two statements:

1. "Instead, the benchmarks are written to test the performance of
   identical algorithms and code patterns implemented in each
   language."

Calling out explicitly that it is algorithms that they are testing,
and here the algorithm is a recursion.

2. "For example, the Fibonacci benchmarks all use the same
   (inefficient) doubly-recursive algorithm..."  
   
They expressly called out this algorithm as being inefficient, and
expressly called it out as being double-recursion.

I'll grant you they never explicitly said so unequivocally (that I
saw--my reading of their site was not exhaustive), but I can't see any
way an informed reader could conclude any purpose other than
benchmarking recursion.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 18:56, Chris Angelico wrote:

On Sat, Feb 24, 2018 at 5:43 AM, Python  wrote:



Satisfied?


No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.


Why? Does it matter?


There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing. For instance, when Google Chrome rolled out its new V8
JavaScript engine, it was specifically optimized for recursion,
because many Google apps used recursion heavily. If you're
implementing a new LISP interpreter, you should probably optimize for
recursion, because most LISP code is going to be written recursively.
But otherwise, there's no particular reason to stick to recursion.


You list plenty of reasons, then say there's no reason to use recursion!

The recursion is a red herring; but to get a program with lots of 
function calling, then using a recursive algorithm - any algorithm - 
will do the job.


Would you have a different opinion if Python excelled at that benchmark, 
and Julia sucked? (I can't remember what the results were.)


(I was testing one of my static compilers today; it has three call 
convention modes all operating within the Win64 ABI. I used the 
Fibonacci benchmark with fib(42) to test it, with these results:


 mode 1   5.8 seconds   (non-ABI)
 mode 2   6.5 seconds   (part ABI conformance)
 mode 3   7.4 seconds   (full ABI conformance)

Other compilers (C compilers that have to conform) ranged from 4.3 to 
6.2 seconds, including MSVC/O2. Except for gcc-O3 which managed 1.7 
seconds. How did it do that? I'll have to investigate to see how much it 
cheated.


The point of all this: I was using the recursive Fibonacci benchmark for 
these tests, as there is little else going on besides function calls and 
returns. It's perfect.


Why some people hate it here, I really don't know.)


I won't dispute that part. The correct way to do this would be to
optimize both sides fairly - either not at all, or equivalently (for
some definition of equivalence). But switching both sides to an
unoptimized iterative algorithm would be perfectly fair. Recursion is
NOT essential to the benchmark.


So, what benchmark would you use instead if you wanted an idea of how 
well each language dealt with it?


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 18:05, Steven D'Aprano wrote:

On Fri, 23 Feb 2018 13:51:33 +, Ben Bacarisse wrote:



Stop writing crap code and then complaining that the language is "too
slow". Write better code, and then we'll take your complaints seriously.


I write compilers for static languages, which are generally considered 
toys, partly because the code generated is not great.


But in a real application, the difference between my compiler, and 
gcc-O3 or MSVC/O2 might only be 10-50%. 1.1 to 1.5 times as fast (or as 
slow), and that's an application that does few external library calls.


The difference between Python and another dynamic language might be a 
magnitude, yet you say it doesn't matter.


Thanks, that makes me feel much better about my own work! If a magnitude 
difference in performance can be so casually dismissed.



"Python takes hours to sort a million integers using BubbleSort! Its too
damn slow!!!"

"Why not use the built-in sort? That's fast."

"NOO I MUST USE BUBBLESORT, BECAUSE REASONS!!!1!"

*wink*


Have you ever written a benchmark in your life? If so, did it do 
anything useful? Other than what benchmarks are normally used for.


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 6:09 AM, Python  wrote:
> On Sat, Feb 24, 2018 at 05:56:25AM +1100, Chris Angelico wrote:
>> No, not satisfied. Everything you've said would still be satisfied if
>> all versions of the benchmark used the same non-recursive algorithm.
>> There's nothing here that says it's testing recursion, just that (for
>> consistency) it's testing the same algorithm. There is no reason to
>> specifically test *recursion*, unless that actually aligns with what
>> you're doing.
>
> It seems abundantly clear to me that testing recursion is the point of
> writing a benchmark implementing recursion (and very little of
> anything else).  Again, you can decide for yourself the suitability of
> the benchmark, but I don't think you can really claim it doesn't
> effectively test what it means to.

Where do you get that it's specifically a recursion benchmark though?
I can't find it anywhere in the explanatory text.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Sat, Feb 24, 2018 at 05:56:25AM +1100, Chris Angelico wrote:
> No, not satisfied. Everything you've said would still be satisfied if
> all versions of the benchmark used the same non-recursive algorithm.
> There's nothing here that says it's testing recursion, just that (for
> consistency) it's testing the same algorithm. There is no reason to
> specifically test *recursion*, unless that actually aligns with what
> you're doing. 

It seems abundantly clear to me that testing recursion is the point of
writing a benchmark implementing recursion (and very little of
anything else).  Again, you can decide for yourself the suitability of
the benchmark, but I don't think you can really claim it doesn't
effectively test what it means to.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Fri, Feb 23, 2018 at 06:14:22PM +, Steven D'Aprano wrote:
> > I never said the benchmarks chosen were awesome...  :-)  What I'm saying
> > is this:
> > 
> > 1. Insistence that the most efficient python implementation of Fib
> >completely misses the point (and defeats the purpose) of the
> >benchmarks, and as such the entire article is lost in the weeds.
> 
> The point of the benchmarks is to highlight how great Julia is

For a particular set of common code patterns.  You left that part out, 
and it's the crux of the issue.

> and why people should choose it over Python.

I'd argue not why, but WHEN.  Or really I'd argue nothing of the
sort... and simply point out that the benchmarks do exactly what the
Julia team say they do.  It's up to you to decide how useful that is.

> The point of the article is to show how to take badly written, 
> unidiomatic, slow code written in Python, and modify it to be better, 
> faster code competitive (if not faster than) Julia, while remaining 
> within the Python ecosystem.

Even if that was the point, the article fails, because it compares
non-optimized Julia code to optimized Python code.

> We aren't limited to the two options the benchmarks suggest

It actually suggested quite a lot more than that... benchmarks
compared the performance of 12 languages, using exactly the same
programming patterns (as near as I can tell anyway, I didn't
scrutinize all of the code).


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 5:43 AM, Python  wrote:
> On Sat, Feb 24, 2018 at 03:42:43AM +1100, Chris Angelico wrote:
>> >> If that were so, then the comparison should use the fastest *Python*
>> >> implementation.
>> >
>> > Doing that would completely fail to accomplish the task of comparing
>> > the performance of recursive function calls in the two languages,
>> > which is what the benchmark was designed to do.  So, no actually, it
>> > shouldn't.
>> >
>>
>> Where does the author say that the benchmark is designed to compare
>> recursion?
>
> Chris, you're a smart guy...  Are you suggesting that the reason the
> Fibonacci sequence was selected as a benchmark is because it's such an
> amazingly useful problem that it, in and of itself, warrants having
> such a benchmark?  Or, do you think the reason it makes sense to have
> such a benchmark is because, like the reason it's presented in pretty
> much every CS program ever, it presents an opportunity to consider a
> particular class of problems and different techniques for solving
> those problems, and the performance characteristics of those
> solutions?
>
>
> But, to answer your question more directly, here:
>
>   https://julialang.org/benchmarks/
>
> "It is important to note that the benchmark codes are not written
> for absolute maximal performance (the fastest code to compute
> recursion_fibonacci(20) is the constant literal 6765). Instead,
> the benchmarks are written to test the performance of identical
> algorithms and code patterns implemented in each language. For
> example, the Fibonacci benchmarks all use the same (inefficient)
> doubly-recursive algorithm..."
>
> Satisfied?

No, not satisfied. Everything you've said would still be satisfied if
all versions of the benchmark used the same non-recursive algorithm.
There's nothing here that says it's testing recursion, just that (for
consistency) it's testing the same algorithm. There is no reason to
specifically test *recursion*, unless that actually aligns with what
you're doing. For instance, when Google Chrome rolled out its new V8
JavaScript engine, it was specifically optimized for recursion,
because many Google apps used recursion heavily. If you're
implementing a new LISP interpreter, you should probably optimize for
recursion, because most LISP code is going to be written recursively.
But otherwise, there's no particular reason to stick to recursion.

> So, by changing the algorithm, the article defeats the purpose of the
> benchmark.  It makes some fine points about code optimization, but it
> completely fails at its stated purpose (to make the benchmarks more
> fair).  The comparisons it makes are substantially less valid than the
> ones made by the Julia benchmarks, on account of optimizing only the
> algorithm used by Python, and not testing with a similarly optimized
> algorithm in Julia, but rather using its results for the intentionally
> unoptimized algorithm those benchmarks used.  Even if testing
> optimized code is the point, as the article claims, it utterly fails
> to do that.  Bad science.
>

I won't dispute that part. The correct way to do this would be to
optimize both sides fairly - either not at all, or equivalently (for
some definition of equivalence). But switching both sides to an
unoptimized iterative algorithm would be perfectly fair. Recursion is
NOT essential to the benchmark.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Sat, Feb 24, 2018 at 03:42:43AM +1100, Chris Angelico wrote:
> >> If that were so, then the comparison should use the fastest *Python*
> >> implementation.
> >
> > Doing that would completely fail to accomplish the task of comparing
> > the performance of recursive function calls in the two languages,
> > which is what the benchmark was designed to do.  So, no actually, it
> > shouldn't.
> >
> 
> Where does the author say that the benchmark is designed to compare
> recursion? 

Chris, you're a smart guy...  Are you suggesting that the reason the
Fibonacci sequence was selected as a benchmark is because it's such an
amazingly useful problem that it, in and of itself, warrants having
such a benchmark?  Or, do you think the reason it makes sense to have
such a benchmark is because, like the reason it's presented in pretty
much every CS program ever, it presents an opportunity to consider a
particular class of problems and different techniques for solving
those problems, and the performance characteristics of those
solutions?


But, to answer your question more directly, here:

  https://julialang.org/benchmarks/

"It is important to note that the benchmark codes are not written
for absolute maximal performance (the fastest code to compute
recursion_fibonacci(20) is the constant literal 6765). Instead,
the benchmarks are written to test the performance of identical
algorithms and code patterns implemented in each language. For
example, the Fibonacci benchmarks all use the same (inefficient)
doubly-recursive algorithm..."

Satisfied?

> Recursion is sometimes a good way to describe an algorithm, but
> rarely a good way to implement it at a low level.

I'm well aware, and said the equivalent elsewhere.  As I also said
elsewhere, I never claimed it was a particularly useful benchmark.  It
is, nevertheless, designed to accomplish the stated goal, and does
exactly that.  You can decide for yourself how useful that goal is,
but you can't really argue that it doesn't serve that purpose.

So, by changing the algorithm, the article defeats the purpose of the
benchmark.  It makes some fine points about code optimization, but it
completely fails at its stated purpose (to make the benchmarks more
fair).  The comparisons it makes are substantially less valid than the
ones made by the Julia benchmarks, on account of optimizing only the
algorithm used by Python, and not testing with a similarly optimized
algorithm in Julia, but rather using its results for the intentionally
unoptimized algorithm those benchmarks used.  Even if testing
optimized code is the point, as the article claims, it utterly fails
to do that.  Bad science.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 5:05 AM, Steven D'Aprano
 wrote:
> But guess what? The benchmarks are flawed. The performance of real-world
> Julia code doesn't match the performance of the benchmarks.
>
> "What’s disappointing is the striking difference between
> the claimed performance and the observed one. For example,
> a trivial hello world program in Julia runs ~27x slower
> than Python’s version and ~187x slower than the one in C."
>
> http://zverovich.net/2016/05/13/giving-up-on-julia.html
>
> Admittedly "Hello World" is not what *I* would call a real world program,
> but that's just an illustration of the discrepancy between the
> performance in artificial benchmarks and the performance in useful code.

No, but Hello World is a good test of application startup time. (A
null program doesn't always test everything, and doesn't look very
good either.) If Hello World takes half a second to run, you have a
language that is terrible for short scripts or command-line tools.
That doesn't mean it's a "slow language" necessarily - nor even a slow
interpreter, which is what you're actually testing - but it does mean
that, well...

rosuav@sikorsky:~$ time pypy -c 'print("Hello, world!")'
Hello, world!

real 0m0.429s
user 0m0.036s
sys 0m0.032s
rosuav@sikorsky:~$ time python -c 'print("Hello, world!")'
Hello, world!

real 0m0.024s
user 0m0.016s
sys 0m0.008s

... CPython does have its place still. Or maybe...

rosuav@sikorsky:~$ time pypy -c 'print("Hello, world!")'
Hello, world!

real 0m0.050s
user 0m0.016s
sys 0m0.032s
rosuav@sikorsky:~$ time python -c 'print("Hello, world!")'
Hello, world!

real 0m0.022s
user 0m0.016s
sys 0m0.004s

... that PyPy doesn't live in my disk cache the way CPython does. :)
PyPy is still slower than CPython for startup time, but not THAT much
slower. However, if you DO have something that has a ton of startup
overhead, you need to know about it. (Interestingly, I was actually
able to compile and run a C hello-world in comparable time to PyPy,
once the cache was warmed up. Curious. I expected that to be slower.
Once again, it proves that intuition can be extremely wrong.)

In terms of "which language is faster?", real-world code is both the
ONLY way to measure, and a completely unfair comparison. Life is
tough.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 4:49 AM, Steven D'Aprano
 wrote:
> On Sat, 24 Feb 2018 00:03:06 +1100, Chris Angelico wrote:
>
>>> Is numpy a general purpose C library that can also be called from any
>>> language that can use a C API? Or is it specific to Python?
>>>
>>>
>> No, it's a general purpose FORTRAN library that can also be called from
>> any language that can use a generic C, FORTRAN, COBOL, etc API.
>
> Numpy itself is a collection of Python interfaces to a number of C and
> Fortran libraries. You can't generally call numpy from other languages --
> you can't even call numpy from other Python implementations, unless they
> support calling C/Fortran. Jython, for example, can't do it without help:
>
> https://jyni.org/
>
> and while PyPy can call numpy, doing so is slow, and there's a custom
> fork of numpy specially for PyPy:
>
> https://bitbucket.org/pypy/numpy

Right. I was kinda handwaving a bit; numpy itself isn't the thing
you'd call from elsewhere, but BLAS libraries in general are intended
to be available from many languages. Point is, it's not
Python-specific.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 10:50:50 -0600, Python wrote:

> On Fri, Feb 23, 2018 at 01:38:01AM -0500, Terry Reedy wrote:
>> It is no secret that Python's interpreted function calls are slower
>> than function calls compiled to machine code and that Python's infinite
>> precision integer arithmetic is slower than machine int arithmetic. 
>> So there is almost no point in combining both in a naive drastically
>> inefficient algorithm and declaring that Python is slower.
> 
> I never said the benchmarks chosen were awesome...  :-)  What I'm saying
> is this:
> 
> 1. Insistence that the most efficient python implementation of Fib
>completely misses the point (and defeats the purpose) of the
>benchmarks, and as such the entire article is lost in the weeds.

The point of the benchmarks is to highlight how great Julia is, and why 
people should choose it over Python.

The point of the article is to show how to take badly written, 
unidiomatic, slow code written in Python, and modify it to be better, 
faster code competitive (if not faster than) Julia, while remaining 
within the Python ecosystem.

We aren't limited to the two options the benchmarks suggest: "Fast Julia! 
Slow Python!". (Three, if you include C.) We can take a third option: "we 
reject your crappy algorithms and write code like a professional".

https://allthetropes.org/wiki/Take_a_Third_Option



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 13:51:33 +, Ben Bacarisse wrote:

[...]
>> I don't know why the Julia programmers chose to use such a poor
>> algorithm:
> 
> It's odd indeed, but given that they did, what you take to be the point
> of the article -- to write a good Python algorithm as fast as the
> terrible Julia one -- seems a bit pointless.

Someone looking at the Julia benchmarks will look at the results and go, 
"Wow! Julia is so much faster than Python, I should choose to use Julia 
instead!". 

But guess what? The benchmarks are flawed. The performance of real-world 
Julia code doesn't match the performance of the benchmarks.

"What’s disappointing is the striking difference between 
the claimed performance and the observed one. For example, 
a trivial hello world program in Julia runs ~27x slower 
than Python’s version and ~187x slower than the one in C."

http://zverovich.net/2016/05/13/giving-up-on-julia.html

Admittedly "Hello World" is not what *I* would call a real world program, 
but that's just an illustration of the discrepancy between the 
performance in artificial benchmarks and the performance in useful code.

The benchmarks emphasise what Julia is good at, and use poor, unidiomatic 
Python code. People who are either too naive to know better, or who *do* 
know better but for some bizarre reason still take benchmarks at face 
value, believe them, and conclude that Julia is faster than Python. Not 
just a bit faster, but enough to conclude that Python is uncompetitive.

Okay, so Julia is fast, at least for toy benchmarks, maybe or maybe not 
for actual code you care about. Great. Python is fast too, if you stop 
writing shitty code. The *FIRST* lesson in optimization is to re-write 
your crap implementation with a better algorithm. The point of the post 
is to teach that lesson.

Stop writing crap code and then complaining that the language is "too 
slow". Write better code, and then we'll take your complaints seriously. 
Or, you never know, maybe you'll find it's fast enough and you don't 
actually have to abandon your existing code base and migrate to a whole 
new language.

"Python takes hours to sort a million integers using BubbleSort! Its too 
damn slow!!!"

"Why not use the built-in sort? That's fast."

"NOO I MUST USE BUBBLESORT, BECAUSE REASONS!!!1!"

*wink*



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 12:17:50 +, bartc wrote:

> On 23/02/2018 01:27, Steven D'Aprano wrote:
>> On Thu, 22 Feb 2018 17:53:30 +, bartc wrote:
> 
>>> The actual result is irrelevant, so long as it's correct. The important
>>> thing is those 50 million calls.
>> 
>> Why do you care about the 50 million calls?
> 
> Because we are interested in comparing call-efficiency across languages?

Are we? What on earth makes you think that?


[...]
> You're responsible for implementing the call-mechanism of a new
> language, and you want to make it efficient. What program do you test it
> with, one that contains zero calls (clearly, that will be most
> efficient!), or one that contains lots of calls?

This has nothing to do with the topic under consideration. It is 
irrelevant. This discussion has nothing, absolutely nothing with 
"implementing the call-mechanism of a new language".

Perhaps you have missed that Python is a mature language well over 20 
years old now, and even Julia is a few years old.


[...]
> OK, then, if you are at all interested in Julia vs. Python (and it was
> you who posted that article!), then compare a version of a benchmark
> that uses as few calls as possible, if you like. But what exactly is
> that benchmark then highlighting? Loops?

What on earth makes you think that the point of this is to highlight any 
one particular feature of a language? Loops, calls, addition, who cares?

The point is to show how to take badly written slow code written in 
Python, and modify it to be better, faster code while remaining within 
the Python ecosystem.


[...]
> OK, then do as I did, and keep a global tally of how many calls it
> makes, and print that as the result. Then the correct output for
> 'fibcalls(36)' should be:
> 
>48315633
> 
> If yours shows only 37, then /your/ program is wrong.

I'm not interested in such a function. The words have not yet been 
invented to explain how little I care about how fast Python or Julia can 
make millions and millions of pointless function calls for no good reason.

But if it makes you feel good about yourself, if it makes you feel that 
you've won some sort of little victory, then I concede:

If a programmer cares about writing useless, crappy, pointless code that 
just makes function call after function call for no good reason, then 
Python will be terrible for the job. I would never use Python for such 
code, it is awful at it.

If you think it is important to run an empty loop a billion times in a 
nanosecond, Python will be crap at that too.

Python is only useful for writing useful, meaningful code that gets 
useful work done, not for your "fibcalls" function. If you have a need 
for "fibcalls", then go ahead and use Julia, or C, or whatever language 
you prefer, and be glad that you have a wide choice of languages to 
choose from.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Sat, 24 Feb 2018 00:03:06 +1100, Chris Angelico wrote:

>> Is numpy a general purpose C library that can also be called from any
>> language that can use a C API? Or is it specific to Python?
>>
>>
> No, it's a general purpose FORTRAN library that can also be called from
> any language that can use a generic C, FORTRAN, COBOL, etc API.

Numpy itself is a collection of Python interfaces to a number of C and 
Fortran libraries. You can't generally call numpy from other languages -- 
you can't even call numpy from other Python implementations, unless they 
support calling C/Fortran. Jython, for example, can't do it without help:

https://jyni.org/

and while PyPy can call numpy, doing so is slow, and there's a custom 
fork of numpy specially for PyPy:

https://bitbucket.org/pypy/numpy


Remember that numpy show cases the exact reason Python was invented: to 
act as an easy to use "glue language" to join together efficient and 
useful, but not so easy to use, libraries written in C and Fortran. 
Jython was invented to support Java libraries, and IronPython to make it 
easy to integrate with the Dot-Net ecosystem.

-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Fri, Feb 23, 2018 at 01:38:01AM -0500, Terry Reedy wrote:
> It is no secret that Python's interpreted function calls are slower
> than function calls compiled to machine code and that Python's
> infinite precision integer arithmetic is slower than machine int
> arithmetic.  So there is almost no point in combining both in a
> naive drastically inefficient algorithm and declaring that Python is
> slower.

I never said the benchmarks chosen were awesome...  :-)  What I'm
saying is this:

1. Insistence that the most efficient python implementation of Fib
   completely misses the point (and defeats the purpose) of the
   benchmarks, and as such the entire article is lost in the weeds.

2. Inasmuch as someone might want to measure the performance of
   recursive function calls in a particular language, the choice
   of recursively implementing fib is a fine one, and useful for that
   purpose.

3. If you want to say that choosing the most efficient implementation
   in Python is the only way to do a fair comparison, then you must
   also choose the most efficient implementation in Julia, which this
   is not, by all appearances.  But again, this misses the point of
   the benchmark and defeats its purpose.

Now if you want to complain that the purpose is silly, I won't even
argue that...  In 30 years of writing code I don't think I've ever
implemented a recursive solution to any problem that wasn't purely
academic.  Maybe once or twice... maybe.  But the purpose is what it
is, and arguing that it must be done a different way is bogus.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 3:39 AM, Steven D'Aprano
 wrote:
> On Fri, 23 Feb 2018 23:41:44 +1100, Chris Angelico wrote:
>
> [...]
>>>  Integer pixel values
>>
>> Maybe in 64 bits for the time being, but 32 certainly won't be enough.
>> As soon as you do any sort of high DPI image manipulation, you will
>> exceed 2**32 total pixels in an image (that's just 65536x65536, or
>> 46341x46341 if you're using signed integers);
>
> I don't think there's any sensible reason to use signed ints for pixel
> offsets.

How about "my language's default integer type is signed"? That's what
gave us all manner of problems, like the way a whole lot of old (and
some not even all THAT old) programs have trouble if you have more
than 2GB of free space - because they query disk free space as a
32-bit number, then interpret that number as signed. (If, as on many
modern systems, you have hundreds or thousands of gig free, it depends
whether that number modulo 4GB is more than 2GB, which is a very fun
thing to manage.)

>> and it wouldn't surprise
>> me if some image manipulation needs that many on a single side - if not
>> today, then tomorrow. So 64 bits might not be enough once you start
>> counting total pixels.
>
> 64-bit int will limit your image size to a maximum of 4294967296 x
> 4294967296.
>
> If you stitched the images from, let's say, the Hubble Space Telescope
> together into one panoramic image of the entire sky, you'd surely exceed
> that by a couple of orders of magnitude. There's a lot of sky and the
> resolution of the Hubble is very fine.
>
> Or made a collage of all the duck-face photos on Facebook *wink*
>
> But you wouldn't be able to open such a file on your computer. Not until
> we get a 128-bit OS :-)

Hmm, yeah, I didn't actually do the math on that, heh.

> While all of this is very interesting, let's remember that listing all
> the many things which can be counted in 64 bits (or 128 bits, or 1024
> bits...) doesn't disprove the need for bignums. The argument Bart makes
> is to list all the things that don't need refrigerating:
>
> - tinned beans
> - nuts and bolts
> - DVDs
> - books
> - spoons
>
> as evidence that we don't need refrigerators. Yeah, I get it, there are
> approximately a hundred thousand million billion trillion things that
> don't need bignums. There are so many things that don't need bignums, you
> need a bignum to count them all!

Touche!

> And I don't give a rat's arse that this means that adding 1 to 10 to get
> 11 in Python takes thirty nanoseconds instead of ten nanoseconds as a
> consequence. If I cared about that, I wouldn't be using Python.

Which is why you don't do crypto in Python - because you DO care about
whether it takes thirty sometimes and ten sometimes, and various other
things.

So there *are* situations where you specifically do not want bignums;
but they're a lot rarer than situations where you don't care what the
underlying implementation is, and it's incredibly freeing to not have
to take integer magnitude limitations into account.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Sat, Feb 24, 2018 at 3:32 AM, Python  wrote:
> On Fri, Feb 23, 2018 at 03:11:36AM -0500, Terry Reedy wrote:
>> >>Why do you care about the 50 million calls? That's crazy -- the important
>> >>thing is *calculating the Fibonacci numbers as efficiently as possible*.
>>
>> >If you are writing practical programs, that's true.  But the Julia
>> >benchmarks are not practical programs; they are designed to compare
>> >the performance of various language features across a range of
>> >languages.
>>
>> If that were so, then the comparison should use the fastest *Python*
>> implementation.
>
> Doing that would completely fail to accomplish the task of comparing
> the performance of recursive function calls in the two languages,
> which is what the benchmark was designed to do.  So, no actually, it
> shouldn't.
>

Where does the author say that the benchmark is designed to compare
recursion? If you specifically test recursion, some language
interpreters will perform really well (because they convert your
recursion into iteration) and others will not (because they faithfully
execute the code you actually ask them to). Recursion is sometimes a
good way to describe an algorithm, but rarely a good way to implement
it at a low level.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 23:41:44 +1100, Chris Angelico wrote:

[...]
>>  Integer pixel values
> 
> Maybe in 64 bits for the time being, but 32 certainly won't be enough.
> As soon as you do any sort of high DPI image manipulation, you will
> exceed 2**32 total pixels in an image (that's just 65536x65536, or
> 46341x46341 if you're using signed integers); 

I don't think there's any sensible reason to use signed ints for pixel 
offsets.


> and it wouldn't surprise
> me if some image manipulation needs that many on a single side - if not
> today, then tomorrow. So 64 bits might not be enough once you start
> counting total pixels.

64-bit int will limit your image size to a maximum of 4294967296 x 
4294967296.

If you stitched the images from, let's say, the Hubble Space Telescope 
together into one panoramic image of the entire sky, you'd surely exceed 
that by a couple of orders of magnitude. There's a lot of sky and the 
resolution of the Hubble is very fine.

Or made a collage of all the duck-face photos on Facebook *wink*

But you wouldn't be able to open such a file on your computer. Not until 
we get a 128-bit OS :-)

While all of this is very interesting, let's remember that listing all 
the many things which can be counted in 64 bits (or 128 bits, or 1024 
bits...) doesn't disprove the need for bignums. The argument Bart makes 
is to list all the things that don't need refrigerating:

- tinned beans
- nuts and bolts
- DVDs
- books
- spoons

as evidence that we don't need refrigerators. Yeah, I get it, there are 
approximately a hundred thousand million billion trillion things that 
don't need bignums. There are so many things that don't need bignums, you 
need a bignum to count them all!

I'm truly happy for Bart that he never, or almost never, uses numbers 
larger than 2**64. I just wish he would be less of a prig about this and 
stop trying to deny me my bignums.

One of the things I like about Python is that I stopped needing to worry 
about integer overflow back around Python 2.2 when the default integer 
type became a bignum. I used to write code like:

try:
    calculation(n)
except OverflowError:
    calculation(long(n))

all the time back then. Now, if I have an integer calculation, I just 
calculate it without having to care whether it fits in 64 bits or not.

And I don't give a rat's arse that this means that adding 1 to 10 to get 
11 in Python takes thirty nanoseconds instead of ten nanoseconds as a 
consequence. If I cared about that, I wouldn't be using Python.

Just the other day I needed to calculate 23! (factorial) for a probability 
calculation, and Python made it trivially easy. I didn't need to import a 
special library, or use special syntax to say "use a bignum". I just 
multiplied 1 through 23.
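
(The standard library will also do it directly, if you'd rather not even
write the loop -- math.factorial has been there since Python 2.6:)

py> import math
py> math.factorial(23)
25852016738884976640000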

Why, just now, on a whim, simply because I can, I calculated 100!! 
(double-factorial):

py> reduce(operator.mul, range(100, 0, -2), 1)
34243224702511976248246432895208185975118675053719198827915654463488

(Yes, I'm a maths geek, and I love this sort of thing. So sue me.)

There are dozens of languages that have made the design choice to limit 
their default integers to 16- 32- or 64-bit fixed size, and let the user 
worry about overflow. Bart, why does it upset you so that Python made a 
different choice?


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Python
On Fri, Feb 23, 2018 at 03:11:36AM -0500, Terry Reedy wrote:
> >>Why do you care about the 50 million calls? That's crazy -- the important
> >>thing is *calculating the Fibonacci numbers as efficiently as possible*.
> 
> >If you are writing practical programs, that's true.  But the Julia
> >benchmarks are not practical programs; they are designed to compare
> >the performance of various language features across a range of
> >languages.
> 
> If that were so, then the comparison should use the fastest *Python*
> implementation.  

Doing that would completely fail to accomplish the task of comparing
the performance of recursive function calls in the two languages,
which is what the benchmark was designed to do.  So, no actually, it
shouldn't.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Ben Bacarisse
Steven D'Aprano  writes:

> On Fri, 23 Feb 2018 00:26:33 +, bartc wrote:
>
>> The point of the article was Julia vs. Python. You can't make Python
>> faster by switching to a faster algorithm; you have to use the same one
>> on both.
>
> No, the point of the article was to write Python code that is as fast as 
> the Julia code.

I did not get that impression.  In fact I was not really sure what the
point was.  At the start, the authors says

  "My take on this kind of cross language comparison is that the
  benchmarks should be defined by tasks to perform, then have language
  experts write the best code they can to perform these tasks."

but if that is the point (rather than the one you picked up on) then the
article could be, at best, only half the story.

> I don't know why the Julia programmers chose to use such a poor 
> algorithm:

It's odd indeed, but given that they did, what you take to be the point
of the article -- to write a good Python algorithm as fast as the
terrible Julia one -- seems a bit pointless.

-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 12:41, Chris Angelico wrote:

On Fri, Feb 23, 2018 at 11:17 PM, bartc  wrote:



  Integer pixel values


Maybe in 64 bits for the time being, but 32 certainly won't be enough.


Why, people's eyes will evolve to see quintillions of colours?


As soon as you do any sort of high DPI image manipulation, you will
exceed 2**32 total pixels in an image (that's just 65536x65536, or
46341x46341 if you're using signed integers)


(Huh?)

; and it wouldn't surprise

me if some image manipulation needs that many on a single side - if
not today, then tomorrow. So 64 bits might not be enough once you
start counting total pixels.


A four-billion x four-billion image would be quite something. I wonder 
how long it would take CPython to loop through all 18e18 pixels? (BTW 
this still fits into unsigned 64 bits.)


These speculations are pointless. Graphics is very hardware-oriented 
now, and this is mainly the concern of graphics processors which will 
already have wide datapaths, much wider than 64 bits. Because you can of 
course use more than one 64-bit value.



Most calculations on these can also be done in 64 bits (except multiply and
power where the ceiling might be 32 bits or lower).


Exactly. Most.


OK, good, we've agreed on something: Most. Actually most values in my 
programs fit into 32 bits. Most of /those/ are likely to fit into 16 bits.


In a program, most integers are going to represent small values.

 And then you'll run into some limitation somewhere.


Let's say you want to know how many IP addresses, on average, are
allocated to one person. You'll probably get a single digit number as
your answer. Can you calculate that using 32-bit integers? What about
64-bit? Should you use bignums just in case? Take your pick, let me
know, and then I'll tell you my answer - and why.


I'm not saying bignums are needed 0% of the time. But they don't need to 
be used for 100% of all integer values, just in case.




What sort of handicap is it? Performance?


I don't know. Wasn't that brought up in the discussion? That Python's 
fib() routine could seamlessly accommodate bigger values as needed, and 
Julia's couldn't, and maybe that was one reason that Julia had the edge.


 Or maybe, in Bartville, this flexibility is
worth paying for but large integers are useless?


In 40 years of coding I can say that I have very rarely needed arbitrary 
precision integer values. (They come up in some recreational programs, 
or for implementing arbitrary precision integers in the languages used 
to run those programs!)


But I have needed flexible strings and lists all the time.

Is my programming experience really that different from anybody else's?

--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Fri, Feb 23, 2018 at 11:57 PM, bartc  wrote:
> On 23/02/2018 08:11, Terry Reedy wrote:
>
>> * Python has an import statement.  But 'comparisons' disallow 'import
>> numpy', a quite legal Python statement, and similar others.
>
>
> If I'm duplicating a benchmark [in another language] then the last thing I
> want to see is something like 'import numpy', or other special language
> feature, as that cannot be replicated.
>
> I want to see a pure algorithm. Not the equivalent of:
>
>   os.system("C_version.exe")
>
>
>>   The ability
>> to import Python wrappers of Fortran and C libraries is part of the
>> fundamental design of cpython.  It is a distributed project.
>
>
> Is numpy a general purpose C library that can also be called from any
> language that can use a C API? Or is it specific to Python?
>

No, it's a general purpose FORTRAN library that can also be called
from any language that can use a generic C, FORTRAN, COBOL, etc API.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 08:11, Terry Reedy wrote:

* Python has an import statement.  But 'comparisons' disallow 'import 
numpy', a quite legal Python statement, and similar others.


If I'm duplicating a benchmark [in another language] then the last thing 
I want to see is something like 'import numpy', or other special 
language feature, as that cannot be replicated.


I want to see a pure algorithm. Not the equivalent of:

  os.system("C_version.exe")

The ability to import Python wrappers of Fortran and C libraries is part of the 
fundamental design of cpython.  It is a distributed project.


Is numpy a general purpose C library that can also be called from any 
language that can use a C API? Or is it specific to Python?


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Fri, Feb 23, 2018 at 11:17 PM, bartc  wrote:
>>> The fact is that the vast majority of integer calculations don't need to
>>> use big integers (pretty much 100% of mine). Probably most don't even
>>> need 64 bits, but 32 bits.
>>
>>
>> And here we have the World According To Bart again: "since *I* don't need
>> more than 32-bits, then obviously it is a FACT than nobody else could
>> need them".
>
>
> Do you have better statistics than me? If so please share. I guess the
> number of calculations (that /can/ be done in 64 bits) is less than 100%,
> but not 0%, so somewhere in between.
>
> I'm saying it's nearer to 100% than to 0%; you're saying, what . ?

I don't know what he's saying, but I'm saying that unless it is 100%,
your language should just natively support bigints. How do you know
when you'll need that extra room? Are you going to pick bigints for
everything that's affected by user input? If so, you may as well use
them for everything and not bother with the distinction; if not, some
day, your program will need larger numbers than you guessed for it,
and it'll break.

> As examples of quantities that can be represented in 64 bits:
>
>  Character codes or code points

Sure. That's why we have a special optimization for sequences of
21-bit numbers. We call it "str".

>  Integer pixel values

Maybe in 64 bits for the time being, but 32 certainly won't be enough.
As soon as you do any sort of high DPI image manipulation, you will
exceed 2**32 total pixels in an image (that's just 65536x65536, or
46341x46341 if you're using signed integers); and it wouldn't surprise
me if some image manipulation needs that many on a single side - if
not today, then tomorrow. So 64 bits might not be enough once you
start counting total pixels.

>  Most enumerations
>  For-loop counting (if it needs more than 64 bits, then we might be
>    waiting a while for it to finish)

Most loops don't need any counting. Should we implement a 0-bit
integer for those?

>  Array indexing
>  Numbers of people, cars, guns, countries, houses, books, iphones etc
>    on Earth
>  ... endless examples of numbers less than 900 used
>    in programming

As soon as you say "the number of X is always going to be less than
Y", you open yourself up to risk.

https://xkcd.com/865/

> Most calculations on these can also be done in 64 bits (except multiply and
> power where the ceiling might be 32 bits or lower).

Exactly. Most. And then you'll run into some limitation somewhere.

Let's say you want to know how many IP addresses, on average, are
allocated to one person. You'll probably get a single digit number as
your answer. Can you calculate that using 32-bit integers? What about
64-bit? Should you use bignums just in case? Take your pick, let me
know, and then I'll tell you my answer - and why.

>> Yes Bart, the reason that so many languages have support for Bignums is
>> that everyone except you is an idiot who loves writing slow code for
>> absolutely no reason at all.
>
> If bignum support is a handicap, even when they are not needed, then it is
> something to consider.

What sort of handicap is it? Performance? That's what optimization is
for. Some languages (Pike, for instance) have a single data type that
is sometimes represented internally by a native integer and sometimes
by a bignum; it's completely transparent apart from with timings.
Others (eg Python 2) have two separate data types, but will smoothly
transition to bignum any time it's needed. IMO CPython 3.x could
probably benefit from the transparent optimization... but Python (the
language) benefits hugely from the core int type simply being a
bignum. There are no semantically significant effects at the boundary
between "small numbers" and "large numbers".

Auto-growing lists are a cost, too. It's expensive to have to allocate
more memory to allow another element to be added. Should we lock in
the sizes of all lists/arrays at creation? Why should we pay the price
for that flexibility? Or maybe, in Bartville, this flexibility is
worth paying for but large integers are useless?
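
(For the record, CPython pays that price by over-allocating, so that most
appends don't reallocate - a sketch:)

import sys
lst = []
for i in range(17):
    lst.append(i)
    # getsizeof jumps only at the occasional growth step
    print(len(lst), sys.getsizeof(lst))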

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread bartc

On 23/02/2018 01:27, Steven D'Aprano wrote:

> On Thu, 22 Feb 2018 17:53:30 +0000, bartc wrote:



>> The actual result is irrelevant, so long as it's correct. The important
>> thing is those 50 million calls.


> Why do you care about the 50 million calls?


Because we are interested in comparing call-efficiency across languages? 
NOT in Fibonacci numbers.


> That's crazy -- the important
> thing is *calculating the Fibonacci numbers as efficiently as possible*.

> I don't believe for a second that you would write code like this:


>     # need a Fibonacci number, but the implementation we have


We don't. If Fibonacci causes problems for you because you can't see 
past the inefficiency, then we might use the Ackermann function instead.


(But I've found Ackermann can cause stack overflow in some languages.)


> so why on earth do you care about the 50 million calls?
>
> The only reason I care about them is to reduce the number as much as
> possible. Why do you care about them?


You're responsible for implementing the call-mechanism of a new 
language, and you want to make it efficient. What program do you test it 
with, one that contains zero calls (clearly, that will be most 
efficient!), or one that contains lots of calls?




>> arr := some array
>> for i := 1 to len(arr):
>>     value = arr[i]
>>     process(value)


> If I'm writing Ruby or Python, that means *NOT* writing loops like that,
> but writing them like this:
>
> for value in arr:
>     process(value)


Suppose it turned out that in Python, /this/ was actually faster:

  for i in range(len(arr)):
      value = arr[i]
      process(value)
?
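
That's easy enough to measure; a sketch (the element count is arbitrary):

import timeit
setup = "arr = list(range(10000))\ndef process(x): pass"
direct = "for value in arr:\n    process(value)"
indexed = "for i in range(len(arr)):\n    value = arr[i]\n    process(value)"
print(timeit.timeit(direct, setup=setup, number=1000))
print(timeit.timeit(indexed, setup=setup, number=1000))  # typically slower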


> I don't give a flying fox about how fast the compiler can do those 48
> million calls, I want to know how to get Fibonacci numbers as fast as I
> can, and if that means avoiding making the calls, then you better believe
> that I will avoid making those calls.


OK, then, if you are at all interested in Julia vs. Python (and it was 
you who posted that article!), then compare a version of a benchmark 
that uses as few calls as possible, if you like. But what exactly is 
that benchmark then highlighting? Loops?




>> By running a different program? Fine, then here's my submission for the
>> N=20 case used in the tests:
>>
>> def fib(n): return 6765


> The program has to be *correct*, and your program is not.


OK, then do as I did, and keep a global tally of how many calls it 
makes, and print that as the result. Then the correct output for 
'fibcalls(36)' should be:


  48315633

If yours shows only 37, then /your/ program is wrong. (BTW, this value 
will always fit in 64 bits, for the foreseeable future.)
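
A sketch of such a tallying version (counting invocations, not computing
anything clever):

calls = 0

def fibcalls(n):
    global calls
    calls += 1                 # tally every call
    if n < 2:
        return n
    return fibcalls(n - 1) + fibcalls(n - 2)

fibcalls(36)
print(calls)                   # 48315633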



> When comparing two compilers, you are ALWAYS comparing two different
> programs.


That's part of the art of doing it properly.



> Either the compilers generate precisely the same machine/byte code, in
> which case there is no point in comparing them[1], or they generate
> different machine/byte code, in which case you're comparing two different
> running programs.


So the people who worked for years on PyPy never once bothered to 
compare results with running the same source program with CPython, 
because there would be no point? OK, I never knew that...



>> The fact is that the vast majority of integer calculations don't need to
>> use big integers (pretty much 100% of mine). Probably most don't even
>> need 64 bits, but 32 bits.


> And here we have the World According To Bart again: "since *I* don't need
> more than 32-bits, then obviously it is a FACT that nobody else could
> need them".


Do you have better statistics than me? If so please share. I guess the 
number of calculations (that /can/ be done in 64 bits) is less than 
100%, but not 0%, so somewhere in between.


I'm saying it's nearer to 100% than to 0%; you're saying, what ... ?

As examples of quantities that can be represented in 64 bits:

 Character codes or code points
 Integer pixel values
 Most enumerations
 For-loop counting (if it needs more than 64 bits, then we might be
   waiting a while for it to finish)
 Array indexing
 Numbers of people, cars, guns, countries, houses, books, iphones etc
   on Earth
  ... endless examples of numbers less than 9000000000000000000 used
   in programming

Most calculations on these can also be done in 64 bits (except multiply 
and power where the ceiling might be 32 bits or lower).



> Yes Bart, the reason that so many languages have support for Bignums is
> that everyone except you is an idiot who loves writing slow code for
> absolutely no reason at all.


If bignum support is a handicap, even when they are not needed, then it 
is something to consider.


--
bartc
--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Steven D'Aprano
On Fri, 23 Feb 2018 11:00:28 +0100, Stefan Behnel wrote:

> I've even seen proper Cython benchmark code that a C compiler can fully
> analyse as static and replace by a constant, and then get wonder
> speedups from it that would never translate to any real-world gains.

This is one of the reasons why typical benchmarks are junk. If you're not 
benchmarking actual, real-world code you care about, you have no idea how 
the language will perform with actual, real-world code.

There is a popular meme that "Python is slow" based on the results of 
poorly-written code. And yet, in *real world code* that people actually 
use, Python's performance is not that bad -- especially when you play to 
its strengths, rather than its weaknesses, and treat it as a glue 
language for stitching together high-performance libraries written in 
high-performance languages like C, Fortran, Rust and Java.

Speaking of bad benchmarks, I recall an anecdote about a programmer who 
ran some benchmarks on Fortran code on various mainframes, and found 
that one specific machine was millions of times faster than the others.

It turns out that the code being tested was something like this 
(translated into Python):

for i in range(1, 100000000):
    pass


and the Fortran compiler on that one machine was clever enough to treat it 
as dead code and remove it, turning the entire benchmark into a single 
no-op.

Your point that we must consider benchmarks carefully is a good one.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Stefan Behnel
Steven D'Aprano schrieb am 22.02.2018 um 11:59:
> https://www.ibm.com/developerworks/community/blogs/jfp/entry/Python_Meets_Julia_Micro_Performance?lang=en

Thanks for sharing, Steven.

While it was already suggested between the lines in some of the replies,
I'd like to emphasise that the combination of timeit and result caching
(memoizing) in this post is misleading and not meaningful. It actually
shows a common mistake that easily happens when benchmarking code.

Since timeit repeats the benchmark runs and takes the minimum, it will
*always* return the time it takes to look up the final result in the cache,
and never report the actual performance of the code that is meant to be
benchmarked here. From what I read, this was probably not intended by the
author of the post.
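
To make the mistake concrete, a sketch (assuming a memoized fib along the
lines of the one in the post):

from functools import lru_cache
import timeit

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Only the very first of the 1000 runs computes anything; the rest -
# and therefore the reported time - measure a cache lookup, not fib.
print(timeit.timeit('fib(100)', globals=globals(), number=1000))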

Myself, I'm guilty as charged: I've run into this more than once and have
seen it on many occasions. I've even seen proper Cython benchmark code that a
C compiler can fully analyse as static and replace by a constant, and then
get wonder speedups from it that would never translate to any real-world
gains. These things happen, but they are mistakes. All they tell us is that
we must always be careful when evaluating benchmarks and their results, and
take good care that the benchmarks match the intended need, which is to
help us understand and solve a real-world performance problem [1].

Stefan


[1] That's also the problem in the post above and in the original
benchmarks it refers to: there is no real-world problem to be solved here.
Someone wrote slow code and then showed that one language can evaluate that
slow code faster than another. Nothing to see here, keep walking...

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread alister via Python-list
On Fri, 23 Feb 2018 03:11:36 -0500, Terry Reedy wrote:

> On 2/22/2018 10:31 PM, Python wrote:
> 
>>> Why do you care about the 50 million calls? That's crazy -- the
>>> important thing is *calculating the Fibonacci numbers as efficiently
>>> as possible*.
> 
>> If you are writing practical programs, that's true.  But the Julia
>> benchmarks are not practical programs; they are designed to compare the
>> performance of various language features across a range of languages.
> 
> If that were so, then the comparison should use the fastest *Python*
> implementation.  Which is what the article discussed here did.  But the
> comparison is a lie when the comparison is compiled machine code versus
> bytecode interpreted by crippled cpython*.  And people use this sort of
> benchmark to choose a language for *practical programs*, and don't know
> that they are not seeing *Python* times, but *crippled cpython* times.

Benchmarks are completely pointless anyway.
The question when choosing a language should not be 'How fast is it?'; it 
should always be 'Is it fast enough?'.
Once you have a choice of languages that are fast enough, other criteria 
for suitability become important (such as readability & ease of debugging).


-- 
If you're going to do something tonight that you'll be sorry for tomorrow
morning, sleep late.
-- Henny Youngman
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Chris Angelico
On Fri, Feb 23, 2018 at 5:38 PM, Terry Reedy  wrote:
> As to the vague 'class of problems implemented in a similar manner': Any
> function f of count N that depends on values of f for counts < N can be
> memoized the same way in Python as fibonacci.  Everything said about P vs J
> for fib applies to such similar problems.  The same is almost true for
> functions that depend on a pair of counts, such as the number of
> combinations of N things taken K at a time.  The time benefit per space used
> is just a bit less.

The naive recursive Fibonacci function can be memoized with maxsize=2
and it gets virtually all the benefit - in fact, it becomes
effectively iterative. (I think even maxsize=1 will do that.) With
most memoization, that kind of benefit is less blatant.
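
That claim is easy to check by counting actual executions of the function
body - a sketch:

from functools import lru_cache

def count_calls(maxsize):
    calls = 0
    @lru_cache(maxsize=maxsize)
    def fib(n):
        nonlocal calls
        calls += 1
        return n if n < 2 else fib(n - 1) + fib(n - 2)
    fib(36)
    return calls

print(count_calls(None))  # 37: every value computed exactly once
print(count_calls(2))     # a small multiple of that - nowhere near 48 million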

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-23 Thread Terry Reedy

On 2/22/2018 10:31 PM, Python wrote:


>> Why do you care about the 50 million calls? That's crazy -- the important
>> thing is *calculating the Fibonacci numbers as efficiently as possible*.



> If you are writing practical programs, that's true.  But the Julia
> benchmarks are not practical programs; they are designed to compare
> the performance of various language features across a range of
> languages.


If that were so, then the comparison should use the fastest *Python* 
implementation.  Which is what the article discussed here did.  But the 
comparison is a lie when the comparison is compiled machine code versus 
bytecode interpreted by crippled cpython*.  And people use this sort of 
benchmark to choose a language for *practical programs*, and don't know 
that they are not seeing *Python* times, but *crippled cpython* times.


* Python has an import statement.  But 'comparisons' disallow 'import 
numpy', a quite legal Python statement, and similar others.  The ability 
to import Python wrappers of Fortran and C libraries is part of the 
fundamental design of cpython.  It is a distributed project.


The fact that numerical python, numarray, and now numpy are distributed 
separately, in spite of proposals to include them, syntax additions for 
their benefit, and their overwhelming usefulness, is a matter of social 
and administrative convenience, not necessity.  There is also a 
proposal, which I consider likely to happen, to enhance the cpython 
Windows and Mac installers by making installation of numpy and other 
great 'external' modules selectable options.


It takes just 2 or 3 legal Python lines that do run with cpython as 
installed to make 'import numpy' work, without resorting to subprocess 
(though that is *also* legal Python):


from ensurepip import bootstrap; bootstrap()  # only if needed
from pip import main
main(['install', 'numpy'])

>>> import numpy
>>> dir(numpy)
['ALLOW_THREADS', 'AxisError', 'BUFSIZE', 'CLIP', 'ComplexWarning',
...
'vstack', 'warnings', 'where', 'who', 'zeros', 'zeros_like']

So, given a realistic number-crunching benchmark that takes at least 
several minutes, one could add those two or three lines at the top and be off.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-22 Thread Terry Reedy

On 2/22/2018 8:36 PM, Python wrote:

> On Thu, Feb 22, 2018 at 11:29:53PM +1100, Chris Angelico wrote:

>>> The idea of the Fibonacci benchmark is to test how effectively an
>>> implementation manages large numbers of recursive function calls. Then,
>>> fib(36) would normally involve 48,315,633 calls.
>>>
>>> This version does only 37, giving a misleading impression.


>> Not overly misleading; the point of it is to show how trivially easy
>> it is to memoize a function in Python.


> This does not appear to me at all to be the point of the article.  The
> point of the article seems to be that the Julia benchmarks compare
> unfairly the performance of Python to Julia, because they do not use
> algorithms that suit "the best way to use Python."  But I have to
> agree with bartc, that is not at all the point of the benchmarks.  The
> point of the benchmarks is to compare the performance of a particular
> algorithm or set of algorithms across a variety of languages.
>
> It's fine to say that this method of computing Fibonacci sequences is
> inefficient; but anyone who's spent any time on the problem knows that
> recursive function calls is not the most efficient way to calculate
> them in most languages.  So it must follow logically that the point of
> the benchmark is not to prove that Julia is faster than Python at
> solving Fibonacci sequences, but rather that Julia is faster than
> Python at solving the class of problems that would be implemented in a
> similar manner as this particular implementation of Fibonacci
> sequences.


It is no secret that Python's interpreted function calls are slower than 
function calls compiled to machine code and that Python's infinite 
precision integer arithmetic is slower than machine int arithmetic.  So 
there is almost no point in combining both in a naive drastically 
inefficient algorithm and declaring that Python is slower.


As to the vague 'class of problems implemented in a similar manner': 
Any function f of count N that depends on values of f for counts < N can 
be memoized the same way in Python as fibonacci.  Everything said about 
P vs J for fib applies to such similar problems.  The same is almost 
true for functions that depend on a pair of counts, such as the number 
of combinations of N things taken K at a time.  The time benefit per 
space used is just a bit less.


Now consider a general recursive problem: a graph with N nodes and E 
edges, and recursive node functions whose value at a node depends on the 
function's values at args(node) other nodes.  Example: the minimum length of a 
path from node a to node b.  In the field of graph algorithms, it is 
completely routine to save f(node) for intermediate nodes when 
calculated. The idea of naively implementing the recursive definition, 
in any non-trivial practical situation, without saving intermediate 
values, is pretty ludicrous.
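
A sketch of that routine save-as-you-go idea on a toy graph (the graph and
its weights here are made up; acyclic, to keep the example short):

from functools import lru_cache

edges = {                       # node -> list of (neighbour, edge length)
    'a': [('b', 1), ('c', 4)],
    'b': [('c', 1), ('d', 5)],
    'c': [('d', 1)],
    'd': [],
}

@lru_cache(maxsize=None)
def min_len(node):
    """Minimum path length from node to 'd'; each node solved only once."""
    if node == 'd':
        return 0
    return min(w + min_len(nxt) for nxt, w in edges[node])

print(min_len('a'))             # 3, via a -> b -> c -> d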


Fib can be viewed as a function on a directed graph where all but the 2 
base nodes have two incoming and two outgoing edges.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to make Python run as fast (or faster) than Julia

2018-02-22 Thread Steven D'Aprano
On Fri, 23 Feb 2018 00:26:33 +0000, bartc wrote:

> The point of the article was Julia vs. Python. You can't make Python
> faster by switching to a faster algorithm; you have to use the same one
> on both.

No, the point of the article was to write Python code that is as fast as 
the Julia code.

I don't know why the Julia programmers chose to use such a poor 
algorithm: maybe they are show-casing just how amazingly awesome the 
Julia compiler is that it can even make a rubbish algorithm run fast. I 
don't care -- they can choose whatever algorithm they like for their 
Julia code, but I don't really care how good the language is at running 
bad code that I would never use in real life.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list

