Re: Human-readable strings?

2009-01-17 Thread Vadim Peretokin
Okay, thank you.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: how to submit patches? (python performance enhancement)

2009-01-17 Thread Will Pierce
I unrolled the loop and reordered the bounds checking in the varint
_VarUInt64ByteSizeNoTag function, which gets called quite a bit within
wire_format.py, and got some interesting speedups.

What surprised me is that my first attempt to speed this function up was by
using the actual mathematical formula for how large a varint will be when
encoded, using log base 2 and knowing that the value gets spread out across
7-bit pieces.
Something like this:
  if uint64 > UINT64_MAX:
    raise ValueError
  if uint64 < 1:
    uint64 = 1
  uint64 += 1
  size = math.ceil(math.log(uint64, 2) / 7)
  return size

This code has a consistent running time for every value sent in to it.  On
my simple hardware, for encoded values under 2^28 this code is SLOWER than
the looping version (the loop only goes around 4 times).  But for values
larger than 2^28, the logarithm method was up to 2x faster!  That's the
opposite of the performance profile we want, since varints should be used
less frequently than fixed(32|64) for values greater than 2^28.
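For reference, the looping version under discussion has roughly this shape (a
sketch of what _VarUInt64ByteSizeNoTag in wire_format.py does, not the exact
library code):

```python
UINT64_MAX = (1 << 64) - 1  # largest value a uint64 varint can hold

def varint_size_loop(uint64):
    """Count how many 7-bit groups are needed to encode uint64."""
    if uint64 > UINT64_MAX:
        raise ValueError("value out of range for uint64")
    size = 1
    while uint64 > 0x7f:   # more than 7 bits still left to encode
        size += 1
        uint64 >>= 7       # strip off one 7-bit group
    return size
```

For a 10-byte value like UINT64_MAX the loop body runs 9 times, which is why
large values are the expensive case for this version.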

So instead, I unrolled the loop and moved the range bounds checking to a
more sensible place.  I then compared the runtimes using the timeit module
and found it's 2x-3x faster than the old method.  I didn't really expect
that kind of speedup from just unrolling the loop, so I would appreciate
some feedback, given how often this code is called.
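The patch itself is attached rather than inlined, but an unrolled version
along these lines is a plausible sketch of the idea (the function name and
layout here are mine, not the attached patch; the thresholds are the exact
7-bit-group boundaries):

```python
UINT64_MAX = (1 << 64) - 1

def varint_size_unrolled(uint64):
    """Unrolled varint byte-size calculation: one chained comparison per
    possible encoded length, with the cheap small-value cases first."""
    if uint64 <= 0x7f:                return 1   # fits in 7 bits
    if uint64 <= 0x3fff:              return 2   # < 2**14
    if uint64 <= 0x1fffff:            return 3   # < 2**21
    if uint64 <= 0xfffffff:           return 4   # < 2**28
    if uint64 <= 0x7ffffffff:         return 5   # < 2**35
    if uint64 <= 0x3ffffffffff:       return 6   # < 2**42
    if uint64 <= 0x1ffffffffffff:     return 7   # < 2**49
    if uint64 <= 0xffffffffffffff:    return 8   # < 2**56
    if uint64 <= 0x7fffffffffffffff:  return 9   # < 2**63
    if uint64 <= UINT64_MAX:          return 10
    raise ValueError("value out of range for uint64")
```

Note that the out-of-range check only runs after all ten in-range cases have
failed, which is one way the bounds check can move to "a more sensible place."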

I've only tested this with python 2.5.2 on linux (fedora10), with and
without -O on the command line.  I also ran the python/setup.py test suite
and got an OK for all 145 tests.  I haven't tested yet on a 64-bit box or
other versions of python, so perhaps others could try it out.  I've attached
the test code (named size_calc.py, which is standalone code), and also the
patch (named varintsizenotag.patch, taken from svn diff at trunk/../), which
has the unrolled loop (ATTACHED.)
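For anyone who wants to reproduce the measurements without the attachments,
a minimal timeit harness along these lines works (a sketch, not the attached
size_calc.py; the function names and iteration count are mine):

```python
import timeit

def varint_size(uint64):
    """Looping size calculation, standing in for the function under test."""
    size = 1
    while uint64 > 0x7f:
        size += 1
        uint64 >>= 7
    return size

def time_value(func, value, iterations=100000):
    """Return the seconds needed to call func(value) `iterations` times."""
    return timeit.timeit(lambda: func(value), number=iterations)

# Exercise the same kind of boundary values the output below uses:
for v in (0, 127, 128, 16383, 16384, (1 << 64) - 1):
    print("Value %d -> %d bytes, %.4f seconds"
          % (v, varint_size(v), time_value(varint_size, v)))
```

Timing both implementations on values just below and just above each 7-bit
boundary, as the output below does, is what exposes where each version wins.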

Here are the measurements I got (output follows, also attached as
speedtest.txt):
-
Testing speed using 400 iterations
Value 18446744073709551615 uses (old:10 == new:10) bytes to encode (UINT64_MAX)
  Old method takes: 35.3816759586 seconds; New method takes: 10.5258858204 seconds
  New method is 3.4x the old method speed.  New method is FASTER.

Value 18446744073709551614 uses (old:10 == new:10) bytes to encode (UINT64_MAX-1)
  Old method takes: 34.9271941185 seconds; New method takes: 10.4822030067 seconds
  New method is 3.3x the old method speed.  New method is FASTER.

Value 0 uses (old:1 == new:1) bytes to encode (bit-shift is 0, offset is -1)
  Old method takes: 4.06476712227 seconds; New method takes: 2.08561015129 seconds
  New method is 1.9x the old method speed.  New method is FASTER.

Value 1 uses (old:1 == new:1) bytes to encode (bit-shift is 0, offset is 0)
  Old method takes: 4.05545282364 seconds; New method takes: 2.09449100494 seconds
  New method is 1.9x the old method speed.  New method is FASTER.

Value 2 uses (old:1 == new:1) bytes to encode (bit-shift is 0, offset is 1)
  Old method takes: 4.31162905693 seconds; New method takes: 2.09951090813 seconds
  New method is 2.1x the old method speed.  New method is FASTER.

Value 127 uses (old:1 == new:1) bytes to encode (bit-shift is 7, offset is -1)
  Old method takes: 4.07322001457 seconds; New method takes: 1.99916601181 seconds
  New method is 2.0x the old method speed.  New method is FASTER.

Value 128 uses (old:2 == new:2) bytes to encode (bit-shift is 7, offset is 0)
  Old method takes: 5.91175985336 seconds; New method takes: 2.56322216988 seconds
  New method is 2.3x the old method speed.  New method is FASTER.

Value 129 uses (old:2 == new:2) bytes to encode (bit-shift is 7, offset is 1)
  Old method takes: 5.71182394028 seconds; New method takes: 2.66633605957 seconds
  New method is 2.1x the old method speed.  New method is FASTER.

Value 16383 uses (old:2 == new:2) bytes to encode (bit-shift is 14, offset is -1)
  Old method takes: 5.86016702652 seconds; New method takes: 2.5270512104 seconds
  New method is 2.3x the old method speed.  New method is FASTER.

Value 16384 uses (old:3 == new:3) bytes to encode (bit-shift is 14, offset is 0)
  Old method takes: 7.12100481987 seconds; New method takes: 2.98018193245 seconds
  New method is 2.4x the old method speed.  New method is FASTER.

Value 16385 uses (old:3 == new:3) bytes to encode (bit-shift is 14, offset is 1)
  Old method takes: 7.09731388092 seconds; New method takes: 2.95949792862 seconds
  New method is 2.4x the old method speed.  New method is FASTER.

Value 2097151 uses (old:3 == new:3) bytes to encode (bit-shift is 21, offset is -1)
  Old method takes: 7.49546098709 seconds; New method takes: 2.94564318657 seconds
  New method is 2.5x the old method speed.  New method is FASTER.

Value 2097152 uses (old:4 == new:4) bytes to encode (bit-shift is 21, offset is 0)
  Old method takes: 9.02288794518 seconds; New method takes: 3.25170016289 seconds
  New method is 2.8x the old method speed.  New method is FASTER.


Re: Generating subclasses of a protocol buffer serialized class

2009-01-17 Thread Chris

Yes, after I pressed the button I realized the error of my ways (re
"reduced maintainability").  I partially agree with you about exposing
the internal nastiness, however that is a choice to be made by the
developer on a per-use basis.  To not have it there is like saying
we don't offer you object-oriented programming because you will make
mistakes... well, kinda (not meant to be a deeply religious argument
open for internet flaming).

C

On Jan 17, 11:53 am, Henner Zeller h.zel...@acm.org wrote:
 Hi,

 On Sat, Jan 17, 2009 at 9:30 AM, Chris chrisjcoll...@gmail.com wrote:

  Thanks Steve and Alek.  As I mentioned, wrappers delegating access to a
  containing class, and simple copy to/from a business class, is what we
  do today, and it's a mess: it's error prone and it's totally manual.  Am I
  missing something?  Every new attribute needs a few more lines added to
  the business class to get and set its values.

 Typically, a business class should not expose all the internal state
 (i.e. the attributes of the underlying data) in the first place. You
 should be very specific about which attributes you expose.

 Anyway, having said that, you might consider just returning a
 reference or pointer to the protocol buffer kept in the business class
 in case you _really_ have to access the internal data:
   business_object->data().get_foobar()
 This way you don't have to modify anything when you add fields to
 the data. However, this way you always expose all properties of the
 internal data, which is a bad idea - but again, this would have
 happened with the inheritance approach as well.
 One advantage is that the access code looks sufficiently ugly
 that users will avoid using internal data fields directly :)
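In Python, where subclassing generated messages is ruled out anyway (as Alek
notes below), the containment approach might look like this (a sketch only:
BusinessObject and the foobar field are hypothetical, and a plain class stands
in for a generated protobuf message so the example is self-contained):

```python
class FooData(object):
    """Stand-in for a generated protocol buffer message."""
    def __init__(self):
        self.foobar = 0

class BusinessObject(object):
    """Holds the message by containment and exposes only chosen fields."""
    def __init__(self):
        self._data = FooData()

    @property
    def foobar(self):
        # Deliberately expose only this one field of the internal data.
        return self._data.foobar

    def data(self):
        # Escape hatch for callers who _really_ need the raw message;
        # business_object.data().foobar is ugly on purpose.
        return self._data
```

Adding a field to the message needs no wrapper change unless you choose to
expose it, which is the maintenance win being discussed.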

  In
  a prior job I did use a system where persisted classes were generated
  as *Base classes where you were expected to implement the * extending
  *Base.  This worked very well and reduced maintainability.

 You said the right thing here, but I guess you meant to say 'reduced
 maintenance' .. ;)

 -h

  I can of
  course imagine that for languages that are not OO it would prove to be
  a challenge (yes, I don't use Python).

  Thanks

  C

  On Jan 17, 12:20 am, Alek Storm alek.st...@gmail.com wrote:
  I agree with Shane.  Wrapping the data is a great way to separate the
  business logic from the data.  Actually, if you're using Python,
  you're not allowed to subclass generated message types (though there's
  no 'sealed' modifier) - it screws with the metaclass machinery.

  Cheers,
  Alek