I've had a chance to revisit this and test it against the Python Riak
library. As a result, I made some changes, and I have a much better Python
My original Python 3 protobufs code took byte data passed in as str objects.
The Riak library reads data from a socket and passes it to protobufs to
decode. I believe most libraries do the same: they read from a socket or
maybe a file, which yields 'bytes' in Python 3. I changed my code so that it
works with bytes instead of strings. I translated portions of the code until
I was able to read from Riak, then went back and got the unit tests to pass
as well, and both are working.
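
To illustrate why working with bytes matters, here is a minimal sketch of
splitting one framed message off a buffer as it would arrive from a socket
in Python 3. The frame layout (4-byte big-endian length, 1-byte message
code, then the payload) and the name split_frame() are assumptions made for
this example; they are not taken from the protobufs or Riak code.

```python
import struct


def split_frame(buf):
    """Split one length-prefixed frame off the front of buf.

    Assumed layout (illustrative only): a 4-byte big-endian length that
    counts the message code plus the payload, a 1-byte message code, and
    then the payload. Returns (msg_code, payload, rest).
    """
    (length,) = struct.unpack(">I", buf[:4])  # struct wants bytes in Python 3
    msg_code = buf[4]                         # indexing bytes yields an int
    payload = buf[5:4 + length]               # slicing bytes yields bytes
    rest = buf[4 + length:]
    return msg_code, payload, rest


# Build a frame the same way: struct.pack returns bytes, not str.
frame = struct.pack(">IB", 6, 10) + b"hello"
code, payload, rest = split_frame(frame)
```

The payload stays a bytes object all the way through, so it can be handed to
the decoder without any str conversion in between.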
My next goal is to get Python 3 to use the _pb2 suffix just like the Python
2 code does. I am currently using _py3_pb2 as a suffix because of the C++
tests, but this meant I had to go into the Riak library and change a bunch
of imports, for example import riak_pb2 --> import riak_py3_pb2 as riak_pb2.
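
Until the suffix is unified, one way to avoid editing every import by hand
is a small fallback helper that tries the Python 3 module name first. This
is only a sketch: import_first() is a hypothetical helper, and the demo
uses stdlib module names so that it runs without the generated modules.

```python
import importlib


def import_first(*names):
    """Return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (names,))


# Intended use (assumes the generated modules are on sys.path):
#     riak_pb2 = import_first("riak_py3_pb2", "riak_pb2")

# Runnable demo: the first name does not exist, so the second is used.
json_mod = import_first("no_such_module_py3", "json")
```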
On Monday, October 1, 2012 4:13:11 PM UTC-7, Charles Law wrote:
> I assumed that the type/value errors are no longer valid in Python 3, so I
> removed the 3 checks in reflection_test.testStringUTF8Encoding(). All unit
> tests now pass!
> On Tuesday, September 25, 2012 1:47:09 PM UTC-7, Charles Law wrote:
>> I thought about this a little, and realized that both unicode and str
>> type strings are passed into fields that have cpp_type CPP_STRING and
>> field_type TYPE_STRING. I know the 7-bit character limit is only imposed
>> on str type strings - all the extreme value tests use unicode strings. In
>> Python3, all strings are unicode, so should this limit only exist in Python
>> On Friday, September 21, 2012 6:16:41 PM UTC-7, Charles Law wrote:
>>> I've made an attempt to create a Python3 compatible version of
>>> protobufs. I have some code that passes pretty much all the unit tests
>>> which I've posted here:
>>> I probably won't have a chance to look at this again for a couple weeks
>>> if not longer, so I want to get it out there. In my attempt I decided to
>>> follow the advice in another post, and treat python3 as a new language. To
>>> get python3 working, you'll have to compile the C code. There are also a
>>> few issues I ran into along the way:
>>> - I decided to use strings where unicode is used in Python 2. I was
>>> originally going to try to use bytes/bytearrays, but they do not support 16
>>> bit characters, and some of the setup.py tests use "exotic" 16 bit chrs.
>>> (Warning: I might have something conceptually wrong here)
>>> - There are places where byte data is stored as strings, then
>>> converted to unicode. I ended up converting strings (I called them
>>> bytestr's) to normal strings. I'm not sure this is done correctly
>>> everywhere though.
>>> - Data is packed/unpacked using struct.pack/unpack which is done
>>> using bytes instead of strings in Python3. I have simple
>>> and bytes_to_string() functions to do this.
>>> What's left is:
>>> - There are a couple Exceptions that I don't throw. They are
>>> supposed to be where the Python2 code converts from unicode strings to
>>> regular strings. I am definitely missing something conceptually here: I
>>> haven't figured out how Python 2.x supports strings with "exotic"
>>> characters, but not strings like u'a\x80a'. If someone can solve this
>>> problem and figure out when to throw the exceptions, Python3 will be
>>> *fully* working.
>>> I might have small bits of time here or there but I don't think I can
>>> devote the time I need to get this finished for several weeks, so if
>>> someone wants to finish this up, feel free to fork this code. If anyone
>>> wants to see what I did, the best way to do this is to diff between the
>>> latest commit and commit 49ccf5d8b3b688c335dc35bcb9f219eca78c7210.