Terry Reedy writes:

> So you agree that the limit of 39 is not intrinsic to the fib function
> or its uses, but is an after-the-fact limit imposed to mask the bug
> proneness of using substitutes for integers.
I don't know what limit the benchmark actually uses, but it must be quite a bit lower than 50 for 32-bit integers, and it could be greater than 90 for 64-bit integers (see the sketch at the end of this message). And it's not "masking bugs", it's "respecting the domain of valid input", s'il vous plaît.

> To my mind, a fairer and more useful benchmark of 'base language
> performance' based on fib would use a wider domain.

"Fair", maybe. But why play this game at all? These benchmarks are simply not useful to users choosing languages, unless they already know the difficulties of interpreting benchmarks and are willing to expend the effort to account for them. Without that knowledge and effort, choosing a programming language based on microbenchmarks is like choosing a car based on the leg-length of the model sitting on the hood in the TV commercial.

> The report would say that CPython (with lru_cache disallowed) is
> slow but works over a wide range of inputs,

No, the report would say "Use of this benchmark for cross-language comparison of function call speed is more or less inaccurate, due to differences in the representation of integers and in the handling of possible exceptions in 'integer' arithmetic." You are picking out one tiny difference, but there are potentially many, some quite a bit larger on the tested domain (for example, some languages may be able to optimize fib() to use unboxed integers, in which case they'll blow away all those that don't).

> Users could then make a more informed pick.

My point in my reply to Nick is that users aren't making informed picks. If they were, we wouldn't even be thinking about having this conversation. I'm not sure what they are doing (maybe, as Nick suggests, justifying their "tribal" prejudices?), but it's not that. ;-) Sure, other things being equal, better benchmarks will improve runtime performance, but other things are so far from equal here that even an economist can't say "ceteris paribus".

To expand on that point: I don't really see a point in users (i.e., developers in Python and other such languages) looking at these benchmarks, except for the fun of feeling like implementers, to be honest. Even the implementers shouldn't much care about cross-language benchmarks, except that when a "similar"[1] language does significantly better on a particular benchmark, it's often useful to wonder "how dey do dat?!" Typically the answer is "they 'cheat'" == "they fail one of the properties we consider required", but sometimes it's "ooh, that's cute, and I bet we could make Python work the same way" or "urkh, we can't do *that* (yuck!), but we could FATten up Python with similar effect". (Let me take this opportunity to say "Thank you, Victor!")

Of course, in the case of a controlled experiment like "configure in Victor's changes and run the benchmarks to make sure they're not detectably slower", benchmarks are invaluable regression tests, and more or less valuable (i.e., YMMV) as measures of improvement to weigh against the costs the changes may impose on other features or (even fuzzier) on developer time.

Footnotes:
[1] Whatever that means....
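For concreteness, here is a quick check of the limits claimed above. The benchmark's source isn't quoted in this thread, so the naive doubly-recursive fib() below is only my assumption about what it looks like; the point is just to find where the results stop fitting in signed 32- and 64-bit integers, which CPython's arbitrary-precision ints let us compute exactly:

    import functools

    # Presumably the benchmark uses something like the classic naive,
    # doubly-recursive definition (an assumption; the real code may differ):
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    assert fib(20) == 6765     # sanity check, small enough to be cheap

    def fib_limit(bits):
        """Largest n such that fib(n) fits in a signed `bits`-bit integer."""
        max_val = 2 ** (bits - 1) - 1
        a, b, n = 0, 1, 0      # invariant: a == fib(n), b == fib(n + 1)
        while b <= max_val:    # iterate instead of recursing: the naive
            a, b = b, a + b    # fib() takes exponential time
            n += 1
        return n

    print(fib_limit(32))       # 46, i.e. "quite a bit lower than 50"
    print(fib_limit(64))       # 92, i.e. "greater than 90"

    # The memoized variant the hypothetical report "disallows": with
    # lru_cache the recursion is no longer a function-call benchmark at
    # all, and CPython computes far past fib(92) without breaking a sweat.
    @functools.lru_cache(maxsize=None)
    def fib_memo(n):
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    print(fib_memo(200))       # a 42-digit result, returned instantly

At fib(47) a language with fixed-width ints must wrap, trap, or promote to a wider type, while CPython just keeps going; that is the apples-to-oranges problem in a nutshell.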