I unrolled the loop and reordered the bounds checking on the varint
_VarUInt64ByteSizeNoTag function which gets called quite a bit within
wire_format.py and got some interesting speedups.
What surprised me is that my first attempt to speed this function up was by
using the actual mathematical formula for how large a varint will be
encoded, using log base 2 and knowing that it gets spread out across 7-bit
pieces.
something like this:
if uint64 UINT64_MAX:
raise ValueError
if uint64 1:
uint64 = 1
uint64 += 1
size = math.ceil(math.log(uint64, 2) / 7)
return size
This code has a consistent running time for every value sent in to it. On
my simple hardware for encoded values under 2^28, this code is SLOWER than
the looping version (loop only goes around 4 times). But for values larger
than 2^28, this logarithm method was up to 2x faster! But that's the
opposite performance we want, since varints should be used less frequently
than fixed(32|64) for values greater than 2^28.
So instead, I unrolled the loop and moved the range bounds checking to a
more sensible place. I then compared the runtimes using the timeit module
and found it's from 2x-3x faster than the old method. I didn't really
expect that kind of speedup from just unrolling the loop, so I would
appreciate some feedback. Given how often this code is called
I've only tested this with python 2.5.2 on linux (fedora10), with -O and
without -O on the cmdline. I also ran the python/setup.py test tests and
got an OK for the 145 tests. I haven't tested yet on a 64bit box, or
other versions of python, so perhaps others could try it out. I've attached
the test code (named size_calc.py which is standalone code), and also the
patch, (named varintsizenotag.patch, taken from svn diff at trunk/../) which
has the unrolled loop (ATTACHED.)
Here are the measurements I got (output follows, also attached as
speedtest.txt):
-
Testing speed using 400 iterations
Value 18446744073709551615 uses (old:10 == new:10) bytes to encode
(UINT64_MAX)
Old method takes: 35.3816759586 New method takes: 10.5258858204
seconds
New method is 3.4x the old method speed. New method is FASTER.
Value 18446744073709551614 uses (old:10 == new:10) bytes to encode
(UINT64_MAX-1)
Old method takes: 34.9271941185 New method takes: 10.4822030067
seconds
New method is 3.3x the old method speed. New method is FASTER.
Value 0 uses (old:1 == new:1) bytes to encode (bit-shift is 0, offset is -1)
Old method takes: 4.06476712227 New method takes: 2.08561015129
seconds
New method is 1.9x the old method speed. New method is FASTER.
Value 1 uses (old:1 == new:1) bytes to encode (bit-shift is 0, offset is 0)
Old method takes: 4.05545282364 New method takes: 2.09449100494
seconds
New method is 1.9x the old method speed. New method is FASTER.
Value 2 uses (old:1 == new:1) bytes to encode (bit-shift is 0, offset is 1)
Old method takes: 4.31162905693 New method takes: 2.09951090813
seconds
New method is 2.1x the old method speed. New method is FASTER.
Value 127 uses (old:1 == new:1) bytes to encode (bit-shift is 7, offset is
-1)
Old method takes: 4.07322001457 New method takes: 1.99916601181
seconds
New method is 2.0x the old method speed. New method is FASTER.
Value 128 uses (old:2 == new:2) bytes to encode (bit-shift is 7, offset is
0)
Old method takes: 5.91175985336 New method takes: 2.56322216988
seconds
New method is 2.3x the old method speed. New method is FASTER.
Value 129 uses (old:2 == new:2) bytes to encode (bit-shift is 7, offset is
1)
Old method takes: 5.71182394028 New method takes: 2.66633605957
seconds
New method is 2.1x the old method speed. New method is FASTER.
Value 16383 uses (old:2 == new:2) bytes to encode (bit-shift is 14, offset
is -1)
Old method takes: 5.86016702652 New method takes: 2.5270512104 seconds
New method is 2.3x the old method speed. New method is FASTER.
Value 16384 uses (old:3 == new:3) bytes to encode (bit-shift is 14, offset
is 0)
Old method takes: 7.12100481987 New method takes: 2.98018193245
seconds
New method is 2.4x the old method speed. New method is FASTER.
Value 16385 uses (old:3 == new:3) bytes to encode (bit-shift is 14, offset
is 1)
Old method takes: 7.09731388092 New method takes: 2.95949792862
seconds
New method is 2.4x the old method speed. New method is FASTER.
Value 2097151 uses (old:3 == new:3) bytes to encode (bit-shift is 21, offset
is -1)
Old method takes: 7.49546098709 New method takes: 2.94564318657
seconds
New method is 2.5x the old method speed. New method is FASTER.
Value 2097152 uses (old:4 == new:4) bytes to encode (bit-shift is 21, offset
is 0)
Old method takes: 9.02288794518 New method takes: 3.25170016289
seconds
New method is 2.8x the old method speed. New method is FASTER.
Value