On Tue, Dec 7, 2010 at 9:40 PM, Kenton Varda <[email protected]> wrote:
> On Tue, Dec 7, 2010 at 9:19 PM, Yang Zhang <[email protected]> wrote:
>>
>> > Also, note that if you explicitly compile C++ versions of your
>> > messages and link them into the process, they'll be even faster. (If
>> > you
>> > don't, the library falls back to DynamicMessage which is not as fast as
>> > generated code.)
>>
>> I'm trying to decipher that last hint, but having some trouble - what
>> exactly do you mean / how do I do that? I'm just using protoc
>> --py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.
>
> I'm not completely sure what I mean, because I don't have much experience
> with Python C Extensions. Basically I'm saying you should additionally
generate C++ code using protoc, then compile that into a C extension (even
> with no interface), and then load it into your Python process. Simply
> having the C++ code for your message types present will make them faster.
Ah, my understanding now is that:
- Python code ordinarily (without
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp) uses pure Python
(generated code) to parse/serialize messages.
- Python code *with* PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp uses
generic C++ code that dynamically parses/serializes messages (via
DynamicMessage/reflection), as opposed to using any pre-generated C++
code.
- Python code with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp also
*searches the current process for the symbols of any pre-generated C++
code*, and uses them if available instead of DynamicMessage...? (This
is via some global DescriptorPool magic?)
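One way to confirm which implementation is actually in use is to ask the
library itself. This pokes at an internal protobuf module, so treat it as a
diagnostic sketch rather than a supported API:
<<<
def active_protobuf_implementation():
    """Report which protobuf implementation this process is using."""
    try:
        # Internal module; not a public API, so this may move in
        # future protobuf releases.
        from google.protobuf.internal import api_implementation
        return api_implementation.Type()  # e.g. 'cpp' or 'python'
    except ImportError:
        return 'unavailable'  # protobuf not installed at all

print(active_protobuf_implementation())
>>>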
Sounds like pretty weird behavior, but indeed, I now get even faster
processing. The following run shows ~68x and ~13x speedups, vs. ~15x
and ~8x without the pre-generated C++ code. (My correct original
speedups were ~15x and ~8x, not the ~12x and ~7x I reported earlier;
not sure how I got those, I was probably going off a different set of
measurements.)
$ PYTHONPATH=build/lib.linux-x86_64-2.6/:$PYTHONPATH \
    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp \
    python sandbox/pbbench.py out.ini
noop: 1.6188621521e-07
ser: 6.39575719833e-06
parse: 4.55250144005e-05
msg size: 10730
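For completeness, the speedup figures are just ratios of per-call times. A
quick sketch (the pure-Python baseline numbers here are hypothetical, since
this run only shows the cpp-implementation times):
<<<
# Per-call times reported by the run above:
cpp_ser   = 6.39575719833e-06
cpp_parse = 4.55250144005e-05

# Hypothetical pure-Python baselines (not shown in this message):
py_ser   = 8.3e-05
py_parse = 3.1e-03

print('ser speedup:   ~%dx' % round(py_ser / cpp_ser))
print('parse speedup: ~%dx' % round(py_parse / cpp_parse))
>>>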
This was simple to do. I added a C extension to my setup.py:
<<<
setup(
    ...
    ext_modules=[Extension('podpb',
        sources=['cpp/podpb.c', 'cpp/main.pb.cc'],
        libraries=['protobuf'])],
    ...
)
>>>
Generate the second source file with `protoc --cpp_out=cpp`, and
create the first one to set up an empty Python module:
<<<
#include <Python.h>

static PyMethodDef PodMethods[] = {
    {NULL, NULL, 0, NULL}  /* Sentinel */
};

PyMODINIT_FUNC
initpodpb(void)
{
    PyObject *m;

    m = Py_InitModule("podpb", PodMethods);
    if (m == NULL)
        return;
}
>>>
Now `python setup.py build` should build everything. Just import the
module (podpb in our case) and you're good.
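A minimal usage sketch, assuming the module names above (main_pb2 and the
message name Pod are my guesses from main.pb.cc; the imports are guarded so
this degrades gracefully where the modules aren't built):
<<<
# Importing the C extension first pulls the generated C++ message code
# into the process; the cpp implementation can then find it by symbol.
try:
    import podpb            # the empty C extension built above
    import main_pb2         # hypothetical module from protoc --py_out
    msg = main_pb2.Pod()    # 'Pod' is a hypothetical message name
    data = msg.SerializeToString()
except ImportError:
    data = b''              # modules not built in this environment
print(len(data))
>>>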
Awesome tip, thanks Kenton. I foresee additions to the documentation
in protobuf's near future.... :)
--
Yang Zhang
http://yz.mit.edu/