[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-20 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

 I do think it would be unfortunately to not go a little further though -
 just because we can do better with little effort, we can save a few CPU
 cycles which means saving time, money and all of this can only be good
 for the planet.  ;-)

Gawain,

Programmer cycles matter too. :-)   Code clarity, especially in the Python 
core, is valued (at least by me) very highly---for all the usual reasons:  
readability, maintainability, fewer places for bugs to hide.  Verifying 
the correctness of the shorter version of int_to_decimal_string takes 
significantly less time, for me, than verifying the longer version.  Of 
course, that's probably partly because I wrote it, but I'd guess that it's 
still true for an independent viewer.

For example, Tim Peters has been heard to say that if he did this all over 
again, he probably wouldn't have added Karatsuba multplication:

http://mail.python.org/pipermail/python-dev/2008-November/083355.html

It's not easy deciding where to draw the line, but for me, this particular 
tiny speed gain (12.5% on a microbenchmark isn't really that significant) 
just isn't worth this particular (also tiny, admittedly) complexity gain.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-19 Thread Gawain Bolton

Gawain Bolton gp.bol...@computer.org added the comment:

Here's a modified version of the patch to Objects/intobject.c which
__does__ use the two-digits-at-a-time optimization.

Compared to the int_decimal_conversion_trunk.patch, my tests show a
further 12.5% improvement with two digit numbers - positive or negative
and more than 8% overall using different sizes all the way up to sys.maxint.

I admit, there is a slight increase in code complexity.  However, I
disagree that the optimization is machine/compiler dependent as there
are fundamentally half as many divisions.

I hope I don't come across as unappreciative, on the contrary the
fundamental ideas is to special case base 10 conversions and get a speed
boost by leveraging the compiler and the
int_decimal_conversion_trunk.patch does this nicely.

I do think it would be unfortunately to not go a little further though -
just because we can do better with little effort, we can save a few CPU
cycles which means saving time, money and all of this can only be good
for the planet.  ;-)

--
Added file: http://bugs.python.org/file14935/int_decimal_conversion_trunk.patch2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-18 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

The now unused arbitrary base conversion was removed in r74910 (py3k).  
I'm deliberately going to leave it in in the trunk, just in case there's 
third party code that's using _PyLong_Format.  (There shouldn't be, given 
the '_Py' prefix, but you never know.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-18 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

Here's a patch for trunk's Objects/intobject.c.  With this patch, I'm 
seeing more than 100% speed increases for str(sys.maxint) on OS X 10.6 
(64-bit) and more modest speed gains on OS X 10.5 (32-bit).

I'm leaving out the two-digits-at-a-time optimization.  It *does* give a 
small speed gain on my machine, but IMO it's not enough of a gain to 
justify the extra code complexity;  it also seems like one of those 
optimizations whose value might be highly machine/compiler-dependent.

I'll apply this in a few days' time.

--
Added file: http://bugs.python.org/file14924/int_decimal_conversion_trunk.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-16 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

Updated patch, with minor changes:
  - remove an incorrect Py_DECREF(str)
  - rename _PyLong_ToDecimal;  no need for the _Py prefix, since this
function isn't shared across files
  - absorb special case for 0 into the rest of the code
  - whitespace and indentation fixes

Not that it matters much, but it's curious that on my machine (gcc-4.2, OS 
X 10.6.1, x64-64) it's significantly faster (~6% increase in str() speed 
for large integers) to use the line:

pout[j] = z - (twodigits)hi * _PyLong_DECIMAL_BASE;

in the middle of the inner loop, rather than the line:

pout[j] = z - hi * _PyLong_DECIMAL_BASE;

I'm wondering whether this is just a quirk of my OS/compiler combination, 
or whether there's a good reason for this difference.  The lines are 
functionally equivalent, since the result is reduced modulo 2**32 either 
way, but the first line involves a 32x32-64 multiplication and a 64-bit 
subtraction, where the second involves a 32x32-32 multiplication and a 
32-bit subtraction;  the generated assembly code for the second line is 
also one instruction shorter (there's a move opcode saved somewhere).

--
Added file: 
http://bugs.python.org/file14902/long_decimal_conversion_py3k_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-16 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

Applied long_decimal_conversion_py3k_2.patch in r74851;  backported to 
trunk in r74853.

Still to do:
 - look at the 'two-digits-at-a-time' optimization.
 - rip out the non-binary-base code from _PyLong_Format

While we're at it, it would also be good to look at the decimal string - 
integer conversion,  but I think that's a separate issue.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-13 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

Barring objections, I plan to apply the 'long_decimal_conversion_py3k' 
patch to the py3k branch; I'll then backport to trunk.  This patch doesn't 
include Gawain's two-decimal-digits-at-a-time optimization;  after I've 
applied the patch, I'll have another look at that.

This would leave the non-binary-base code in _PyLong_Format unused.  I'll 
hold off on removing that code until the python-ideas thread on arbitrary-
base int - string conversions has reached a conclusion.

--
keywords: +patch
Added file: http://bugs.python.org/file14884/long_decimal_conversion_py3k.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-13 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
stage:  - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-13 Thread Gawain Bolton

Gawain Bolton gp.bol...@computer.org added the comment:

Mark,
I haven't tried your latest patch but I tried your patch3 and obtained 
the same performance improvement for long integers, namely 3.1x faster.
This is truly an excellent improvement, my hat's off to you!

As for the basic integers, I'm working on another patch which improves
performance even more but by all means, go ahead with your improvement
for long integers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-12 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
assignee:  - marketdickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-10 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

 If you're working with huge integers and care about speed, then 
 you should probably be using gmpy or similar

I disagree with you, mark. The patch is around 20 lines and does optimize all
cases, not only the huge integers. See my benchmark: conversion for small
integers (type 'int') are also faster (7 to 22%). I think that the base 10 is 
more
common than 2^k bases, and a conversion from integer to decimal string is a
very common operation in Python.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-10 Thread Collin Winter

Collin Winter coll...@gmail.com added the comment:

I ran this patch against Unladen Swallow's slowspitfire template 
benchmark, which does more int-string conversions than any of our other 
benchmarks. When run against Python trunk r74737, I get these results:

slowspitfire:
Min: 0.888772 - 0.867427: 2.46% faster
Avg: 0.891857 - 0.872461: 2.22% faster
Significant (t=45.532127, a=0.95)

(./perf.py -r -b slowspitfire ../a/python.exe ../b/python.exe)
This was an idle MacBook Pro, OS X 10.5.8, Apple gcc 4.0.1, 2.4 GHz Core 
Duo.

Other benchmarks benefit, but are only barely statistically significant.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-10 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

Thanks for those results, Collin.

 By benchmark should be reproduced on a 64 bits CPU with 2^15 and 2^30
 bases for the long type.

On OS X 10.6 (64-bit Python non-debug non-framework build with gcc 4.2 
from Apple, 30-bit long digits, straight ./configure  make), Victor's 
benchmark gives me the following results:

original = 1023.9 ms  (best of 10 runs)
patched = 1005.3 ms (best of 10 runs).

- a speedup of about 1.85%.  So it looks as though x86_64 doesn't benefit 
to the same extent that 32-bit does.  Presumably that's because gcc-4.2 is 
unable or unwilling to turn a 64-bit by 64-bit division with a constant 
dividend of 10**9 into a multiplication;  I don't know whether using a 
later gcc would make a difference.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-10 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

Sorry:  ignore that last message;  I was talking through my hat.  I 
think I must have been unwittingly building in debug mode.

Ahem.  After a 'make distclean', I get the following results (same specs 
as above, on a 2.53Ghz MacBook Pro / Core 2 Duo).

original: 783.8 ms
patched:  373.5 ms  (2.1 x faster)
patch 2:  323.7 ms  (2.4 x faster)

patch 2 (attached) is Gawain's original patch, but with the base != 
2,8,10,16 code removed and the call to _inplace_divrem1 inlined and 
slightly reorganized.  (No changes to intobject.c.)

--
Added file: 
http://bugs.python.org/file14872/base10_conversion_performance_patch2.txt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-10 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

One more iteration of the patch is attached: I rewrote the conversion
algorithm to do the base PyLong_BASE to base 10**e conversion first,
then output the base 10**e array as individual digits.  For OS
X/Intel, this seems to speed things up significantly.

(First three values below are the same as before.)

OS X 10.6, 64-bit build of Python, 30-bit digits:

original: 783.8 ms
patch 1:  373.5 ms  (2.1 x faster)
patch 2:  323.7 ms  (2.4 x faster)
patch 3:  250.1 ms  (3.1 x faster)

For OS X 10.5, 32-bit build of Python with 15-bit digits, on the same platform 
as above, 
I get the following timings:

original: 2045.1 ms
patch 1 : 1052.2 ms (1.94 x faster)
patch 2 : 1228.7 ms (1.66 x faster)
patch 3 :  725.8 ms (2.82 x faster)

--
Added file: 
http://bugs.python.org/file14875/base10_conversion_performance_patch3.txt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-10 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I don't understand the following comment in patch3:
/* convert: base 2 in pin - base 10 in pout */

I  think that pin base is 2^30 / 2^15 and pout base is 10^9 / 10 ^ 4, not 10.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-09 Thread Gawain Bolton

Gawain Bolton gp.bol...@computer.org added the comment:

Yes I agree it would be a good idea to have one definition and one
instantiation of the _decimal_digit_table[] and BitLengthTable[32] arrays.

Where do you suggest these tables could be put?  I'll be happy to
provide an updated patch if you can let me know.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-09 Thread Gregory P. Smith

Changes by Gregory P. Smith g...@krypto.org:


--
nosy: +collinwinter, gregory.p.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-09 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

I'm a bit ambivalent about this patch:  I'd like to see string to integer 
conversions improved, and on the surface it makes sense to special-case 
base 10 conversions (just as power-of-two bases are already special-
cased), but this seems like quite a lot of extra code to debug and 
maintain just to speed up one aspect of the integer implementation.  It 
would be easy to grow longobject.c by several thousand lines by adding a 
handful of optimizations of this type.

If you're working with huge integers and care about speed, then you should 
probably be using gmpy or similar (or something other than Python).  And 
this patch doesn't solve the real problem with converting huge integers to 
and from decimal strings, which is that the algorithms used have quadratic 
running time.  Amongst quadratic-time algorithms, I'm tempted to call 
Python's implementation 'good enough'.

Gawain, do you have a particular use-case in mind for this optimization?  
Are there common applications where string - int conversion times are 
critical?

--
versions: +Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-09 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

On the other hand, _PyLong_Format currently contains general machinery 
for integer - string conversion in any base in the range [2, 36], but I 
don't think that machinery is ever used for bases other than 2, 8, 10 
and 16.  So ripping _PyLong_Format out and just leaving the binary base 
and the suggested base 10 code might actually make longobject.c shorter.

I'd be interested to know how much the conversion of two digits at one 
time instead of one is contributing to the speedup by itself.  For large 
integers, I'd suspect that this makes very little difference.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-07 Thread Eric Smith

Changes by Eric Smith e...@trueblade.com:


--
nosy: +eric.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-07 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@haypocalc.com:


Added file: http://bugs.python.org/file14857/bench_long_format.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-07 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Would it be possible to share some constant data between intobject.c and
longobject.c? There is already static const unsigned char BitLengthTable[32]
(32 bytes), and the patch introduces static const char _decimal_digit_table[]
(100 bytes).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-07 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

By benchmark should be reproduced on a 64 bits CPU with 2^15 and 2^30
bases for the long type.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-09-07 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I wrote a dummy script to generate a big number (2568 decimal digits, 8530
bits) and then benchmark str(n). Results on my computer:

Python 2.7a0, Pentium4 @ 3.0 GHz (32 bits), long base=2^15

Smallest value of 5 runs:

  original = 5046.8 ms
  patched = 2032.4 ms

For huge numbers, the patch is much (60%) faster.

--

Small integer (type=int) : n=factorial(10) (22 bits, 7 decimal digits) with
10 loops.

   original = 861.7 ms
   patched = 639.2 ms

It's also faster (26%).

--

And with n=1 (1 bit, 1 decimal digit), type=int :

  original = 606.7
  patched = 561.6

It's a little bit faster (7%) with the patch.

I don't see any performance regression, only good improvements: 60% faster
to huge numbers.

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-08-29 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
nosy: +marketdickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6713] Integer Long types: Performance improvement of 1.6x to 2x for base 10 conversions

2009-08-16 Thread Gawain Bolton

New submission from Gawain Bolton gp.bol...@computer.org:

Converting integer  long types to their ASCII representation is a task
which can be quite CPU intensive due to the division  modulo
operations.  For long integers having hundreds or thousands of digits,
this can take a truly significant amount of CPU time.

I have written a special case for base 10 conversions which allows for
two improvements.
1) Two digits can be converted at a time, thus reducing the number of
div/mod operations by two.
2) An optimizing compiler can avoid performing a division operation when
the divisor is hardcoded.  The expensive division operation can be
replaced by a much faster multiplication operation.

My tests show an improvement of 1.6x to 1.8x improvement for integer
types and 2x improvement for longs.

Note that because integers are displayed using fprintf(), the
performance improvement is only seen when __repr__() is called.

Patch is provided against trunk.  It is somewhat difficult to read the
patch in one or two places due to the use of tabs.

--
components: Interpreter Core
files: base10_conversion_performance_patch.txt
messages: 91636
nosy: gawain
severity: normal
status: open
title: Integer  Long types:  Performance improvement of 1.6x to 2x for base 10 
conversions
type: performance
versions: Python 2.7
Added file: 
http://bugs.python.org/file14736/base10_conversion_performance_patch.txt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6713
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com