Boost supports two kinds of serialization mechanisms: text and binary archives.
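
For comparison, a minimal sketch of the two archive types (writing to in-memory streams rather than files; only the binary archive is really comparable to protobuf here):

// Sketch only: text vs. binary archives in Boost.Serialization.
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <sstream>
#include <vector>

void save_both(const std::vector<int>& v) {
  std::ostringstream text_buf;
  boost::archive::text_oarchive text_ar(text_buf);  // human-readable, slower
  text_ar << v;

  std::ostringstream bin_buf;
  boost::archive::binary_oarchive bin_ar(bin_buf);  // compact, much faster
  bin_ar << v;
}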

On Tue, Mar 31, 2009 at 11:37 AM, Kenton Varda <ken...@google.com> wrote:

> What do you mean "change boost binary"?
>
> Parsing ~500MB in 0.05 seconds sounds dubious to me.  That's 10GB/s
> throughput.
>
>
> On Mon, Mar 30, 2009 at 7:57 PM, Yingfeng Zhang 
> <yingfeng.zh...@gmail.com> wrote:
>
>> If we change boost binary, here is the result; it seems much faster.
>>
>> It takes 0.08 seconds for boost to serialize vector<int> of size 10000000!
>>
>> It takes 0.05 seconds for boost to deserialize vector<int> of size 10000000!
>>
>>
>> Best
>>
>>
>>
>>
>> On Tue, Mar 31, 2009 at 10:46 AM, Kenton Varda <ken...@google.com> wrote:
>>
>>> That's more like it.  :)
>>>
>>>
>>> On Mon, Mar 30, 2009 at 7:09 PM, Yingfeng Zhang <
>>> yingfeng.zh...@gmail.com> wrote:
>>>
>>>> Thanks for the feedback.
>>>> I agree with your points. I used vector<string> because it was already in
>>>> use on our existing platform.
>>>>
>>>> A newer test comparing vector<int> gives the following results:
>>>>
>>>> It takes 1.9 seconds for boost to serialize vector<int> of size 10000000!
>>>>
>>>> It takes 4.71 seconds for boost to deserialize vector<int> of size 10000000!
>>>>
>>>> It takes 0.47 seconds for protocol-buffer to serialize vector<int> of size 10000000!
>>>>
>>>> It takes 0.45 seconds for protocol-buffer to deserialize vector<int> of size 10000000!
>>>>
>>>>
>>>>
>>>> Best
>>>>
>>>>
>>>>
>>>> On Tue, Mar 31, 2009 at 2:07 AM, Kenton Varda <ken...@google.com> wrote:
>>>>
>>>>> Several points:
>>>>>
>>>>> * Some of your test cases seem to be parsing from or serializing to
>>>>> files.  This may be measuring file I/O performance more than it is 
>>>>> measuring
>>>>> the respective serialization libraries.  Even though you are using clock()
>>>>> to measure time, simply setting up file I/O operations involves syscalls 
>>>>> and
>>>>> copying that could take some CPU time to execute.  Try parsing from and
>>>>> serializing to in-memory buffers instead.  For protocol buffers you should
>>>>> use ParseFromArray() and SerializeToArray() for maximum performance -- not
>>>>> sure if boost has equivalents.
>>>>>
>>>>> * Your test generates different random data for the boost test vs. the
>>>>> protobuf test.  For an accurate comparison, you really should use 
>>>>> identical
>>>>> data.
>>>>>
>>>>> * Finally, your test isn't a very interesting test case for protocol
>>>>> buffers.  Parsing and serializing a lot of strings is going to be 
>>>>> dominated
>>>>> by the performance of memcpy().  You might notice that the actual
>>>>> serialization step in your program takes much less time than even just
>>>>> populating the message object.  It might be more interesting to try
>>>>> serializing a message involving many different fields of different types.
>>>>>
>>>>>
>>>>> I think the reason parsing ends up being much slower than serialization
>>>>> for you is because it spends most of the time in malloc(), allocating
>>>>> strings.  There are a few things you can do about this:
>>>>>
>>>>> 1) Reuse the same message object every time you parse.  It will then
>>>>> reuse the same memory instead of allocating new memory.
>>>>>
>>>>> 2) Make sure you are not using a reference-counting string
>>>>> implementation.  They are, ironically, very slow, due to the need for 
>>>>> atomic
>>>>> operations.
>>>>>
>>>>> 3) Use Google's tcmalloc in place of your system's malloc.  It is
>>>>> probably a lot faster.
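
Point 1 in code, roughly, reusing the hypothetical IntList message from the
sketch above; parsing into the same object lets it recycle the storage it
allocated on the previous pass:

// Sketch: parse the same serialized buffer many times with one message object,
// so repeated-field and string capacity is reused instead of re-malloc'd.
#include <vector>
#include "intlist.pb.h"  // hypothetical generated header, as above

void parse_many(const std::vector<char>& buf, int iterations) {
  IntList msg;  // allocated once, reused for every parse
  for (int i = 0; i < iterations; ++i) {
    // buf holds the serialized bytes (assumed non-empty).
    msg.ParseFromArray(&buf[0], static_cast<int>(buf.size()));
  }
}

And for point 3, tcmalloc usually needs no code changes at all: linking with
-ltcmalloc (or LD_PRELOADing libtcmalloc) is typically enough to replace the
system malloc.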
>>>>>
>>>>> On Sun, Mar 29, 2009 at 9:32 PM, Yingfeng Zhang <
>>>>> yingfeng.zh...@gmail.com> wrote:
>>>>>
>>>>>> Test files are attached
>>>>>>
>>>>>> Best
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 30, 2009 at 12:14 PM, Kenton Varda <ken...@google.com> wrote:
>>>>>>
>>>>>>> What does your .proto file look like?  And the code that uses it?
>>>>>>>
>>>>>>> On Sun, Mar 29, 2009 at 9:06 PM, Yingfeng
>>>>>>> <yingfeng.zh...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> We are looking for a fast mechanism for
>>>>>>>> serialization/deserialization.
>>>>>>>> Here is our comparison between pb and boost:
>>>>>>>> We hope to serialize/deserialize data in std containers, such as:
>>>>>>>>
>>>>>>>> std::vector<std::string>
>>>>>>>>
>>>>>>>> Here is the data:
>>>>>>>> 10000000 strings are stored in the vector.
>>>>>>>>
>>>>>>>> As for boost:
>>>>>>>> Serialization: 3.8 s
>>>>>>>> Deserialization: 6.89 s
>>>>>>>>
>>>>>>>> As for protocol buffers:
>>>>>>>> Serialization: 4.59 s
>>>>>>>> Deserialization: 0.47 s
>>>>>>>>
>>>>>>>> It seems pb performs much better than boost in deserialization, but it
>>>>>>>> is even slower than boost in serialization. Could serialization be
>>>>>>>> improved to be as fast as deserialization?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
