Terry Reedy writes:

 > So you agree that the limit of 39 is not intrinsic to the fib function 
 > or its uses, but is an after-the-fact limit imposed to mask the bug 
 > proneness of using substitutes for integers.

I don't know what the limit used in the benchmark is, but it must be
quite a bit lower than 50 for 32-bit integers (fib(47) already
overflows a signed 32-bit int) and could be as high as 92 for 64-bit
integers.  And it's not "masking bugs", it's "respecting the domain
of valid input", if you please.

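For the record, here's a quick way to pin down those limits.  This is
a back-of-the-envelope sketch (not the benchmark code itself), using
Python's arbitrary-precision ints to find the largest n for which
fib(n) still fits in a signed fixed-width integer:

    def max_fib_arg(bits):
        # Largest value representable in a signed two's-complement int.
        limit = 2 ** (bits - 1) - 1
        a, b, n = 0, 1, 0  # invariant: a == fib(n), b == fib(n + 1)
        while b <= limit:
            a, b = b, a + b
            n += 1
        return n  # fib(n) fits; fib(n + 1) would overflow

    print(max_fib_arg(32))  # -> 46
    print(max_fib_arg(64))  # -> 92

So a cutoff of 39 leaves plenty of headroom even on 32-bit builds,
which is presumably the point.
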
 > To my mind, a fairer and more useful benchmark of 'base language 
 > performance' based on fib would use a wider domain.

"Fair", maybe.  But why play this game at all?  These benchmarks are
simply not useful to users choosing languages, unless they already
know the difficulties of interpreting benchmarks and are willing to
expend the effort to account for them.

Without that knowledge and effort, choosing a programming language
based on microbenchmarks is like choosing a car based on the
leg-length of the model sitting on the hood in the TV commercial.

 > The report would say that CPython (with lru_cache disallowed) is
 > slow but works over a wide range of inputs,

No, the report would say "Use of this benchmark for cross-language
comparison of function call speed is more or less inaccurate due to
differences in representation of integers and in handling the
possibility of exceptions in 'integer' arithmetic."  You are picking
one tiny difference, but there are potentially many, some quite a bit
larger on the tested domain (for example, some languages may be able
to optimize fib() to unboxed integers, in which case they'll blow away
all those that don't).
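
For concreteness: the function being argued over is presumably the
usual doubly recursive fib, and the memoized variant Terry would
disallow is one decorator away.  A sketch, assuming the standard
benchmark formulation:

    from functools import lru_cache

    def fib(n):
        # Naive, exponential-time version: stresses function-call
        # overhead, which is what the benchmark tries to measure.
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):
        # Same recurrence, but linear time once the cache warms up,
        # which is why allowing it defeats the benchmark's purpose.
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)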

 > Users could then make a more informed pick.

My point in my reply to Nick is that users aren't making informed
picks.  If they were, we wouldn't even be thinking about having this
conversation.  I'm not sure what they are doing (maybe, as Nick
suggests, justifying their "tribal" prejudices?), but it's not
that. ;-)  Sure, other things being equal, better benchmarks will
drive improvements in runtime performance, but other things are so
far from being equal that even an economist can't say "ceteris
paribus" here.

To expand that point: I don't really see a point in users (ie,
developers in Python and other such languages) looking at these
benchmarks except for the fun of feeling like implementers, to be
honest.  Even the implementers shouldn't much care about cross-
language benchmarks, except that when a "similar"[1] language does
significantly better on a particular benchmark, it's often useful to
wonder "how dey do dat?!"  Typically the answer is "they 'cheat'" ==
"fail one of the properties we consider required", but sometimes it's
"ooh, that's cute, and I bet we could make Python work the same way"
or "urkh, we can't do *that* (yuck!) but we could FATten up Python
with similar effect".  (Let me take this opportunity to say "Thank
you, Victor!")

Of course, in the case of a controlled experiment like "configure in
Victor's changes and run the benchmarks to make sure they're not
detectably slower", benchmarks are invaluable as regression tests,
and more or less valuable (ie, YMMV) as measures of improvement to
weigh against the costs the changes may impose on other features or
(even fuzzier) on developer time.


Footnotes: 
[1]  Whatever that means....
