Several points:

* Some of your test cases seem to be parsing from or serializing to files.
This may be measuring file I/O performance more than it is measuring the
respective serialization libraries.  Even though you are using clock() to
measure time, simply setting up file I/O operations involves syscalls and
copying that could take some CPU time to execute.  Try parsing from and
serializing to in-memory buffers instead.  For protocol buffers you should
use ParseFromArray() and SerializeToArray() for maximum performance -- not
sure if boost has equivalents.

* Your test generates different random data for the boost test vs. the
protobuf test.  For an accurate comparison, you really should use identical
data.

* Finally, your test isn't a very interesting test case for protocol
buffers.  Parsing and serializing a lot of strings is going to be dominated
by the performance of memcpy().  You might notice that the actual
serialization step in your program takes much less time than even just
populating the message object.  It might be more interesting to try
serializing a message involving many different fields of different types.
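
A minimal sketch of the first two points, using only the standard library. MakeTestData and CpuSeconds are hypothetical helper names, not part of protobuf or boost, and the seed and sizes are arbitrary. The idea is to build the input once from a fixed seed so both libraries see identical strings, and to wrap only the in-memory serialize or parse call in the clock() measurement:

```cpp
#include <cstddef>
#include <ctime>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Build the same pseudo-random lowercase strings on every call by using a
// fixed seed. The default seed is an arbitrary illustrative value.
std::vector<std::string> MakeTestData(std::size_t count, std::size_t length,
                                      unsigned seed = 12345) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> letter('a', 'z');
  std::vector<std::string> data;
  data.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::string s(length, ' ');
    for (char& c : s) c = static_cast<char>(letter(rng));
    data.push_back(std::move(s));
  }
  return data;
}

// Return the CPU seconds consumed by a single call to fn(), using clock().
template <typename Fn>
double CpuSeconds(Fn&& fn) {
  const std::clock_t start = std::clock();
  fn();
  return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Call MakeTestData() once, hand the same vector to both the boost run and the protobuf run, and pass a lambda containing just the serialize-to-buffer or parse-from-buffer call to CpuSeconds(), so that neither data generation nor file I/O is counted.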


I think parsing ends up being much slower than serialization for you
because it spends most of its time in malloc(), allocating strings.
There are a few things you can do about this:

1) Reuse the same message object every time you parse.  It will then reuse
the same memory instead of allocating new memory.

2) Make sure you are not using a reference-counting string implementation.
Reference-counted strings are, ironically, very slow, due to the need for
atomic operations.

3) Use Google's tcmalloc in place of your system's malloc.  It is probably a
lot faster.
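
As a rough illustration of point 1, here is the same idea with only the standard library rather than the protobuf API. Record and ParseInto are hypothetical stand-ins for a message and its parser. Filling an existing object overwrites its strings in place, so their buffers can usually be reused instead of going back to malloc() on every parse:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed message that holds repeated strings.
struct Record {
  std::vector<std::string> values;
};

// Hypothetical "parser": copies the input strings into rec, overwriting the
// elements rec already holds instead of building fresh ones.
void ParseInto(const std::vector<std::string>& input, Record* rec) {
  rec->values.resize(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // assign() can reuse the string's existing buffer when it is already
    // large enough; common implementations do, but it is not guaranteed.
    rec->values[i].assign(input[i]);
  }
}
```

Declaring one Record outside the benchmark loop and passing it to every ParseInto() call means only the first iteration pays for most of the allocations. Constructing a new Record inside the loop pays for them every time.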

On Sun, Mar 29, 2009 at 9:32 PM, Yingfeng Zhang <yingfeng.zh...@gmail.com> wrote:

> Test files are attached
>
> Best
>
>
>
> On Mon, Mar 30, 2009 at 12:14 PM, Kenton Varda <ken...@google.com> wrote:
>
>> What does your .proto file look like?  And the code that uses it?
>>
>> On Sun, Mar 29, 2009 at 9:06 PM, Yingfeng <yingfeng.zh...@gmail.com> wrote:
>>
>>>
>>> Hi,
>>> We are looking for a fast mechanism for serialization/deserialization.
>>> Here is our comparison between pb and boost:
>>> We hope to serialize/deserialize data in std containers, such as:
>>>
>>> std::vector<std::string>
>>>
>>> Here is the data
>>> 10000000 strings are stored in the vector
>>>
>>> as to boost:
>>> Serialization: 3.8 s
>>> Deserialization: 6.89 s
>>>
>>> as to protocol buffers:
>>> Serialization: 4.59 s
>>> Deserialization: 0.47 s
>>>
>>> It seems pb performs much better than boost in deserialization,
>>> however it is even slower than boost in serialization. Could it be
>>> improved for serialization to be as fast as deserialization?
>>>
>>>
>>>
>>>
>>
>

