Thanks for the feedback; I agree with your points. I used vector<string> because it was already in use on our existing platform.
A newer test comparing vector<int> gives the following results:

It takes 1.9 seconds for boost to serialize a vector<int> of size 10000000.
It takes 4.71 seconds for boost to deserialize a vector<int> of size 10000000.
It takes 0.47 seconds for protocol-buffer to serialize a vector<int> of size 10000000.
It takes 0.45 seconds for protocol-buffer to deserialize a vector<int> of size 10000000.

Best

On Tue, Mar 31, 2009 at 2:07 AM, Kenton Varda <ken...@google.com> wrote:
> Several points:
>
> * Some of your test cases seem to be parsing from or serializing to files.
> This may be measuring file I/O performance more than it is measuring the
> respective serialization libraries. Even though you are using clock() to
> measure time, simply setting up file I/O operations involves syscalls and
> copying that could take some CPU time to execute. Try parsing from and
> serializing to in-memory buffers instead. For protocol buffers you should
> use ParseFromArray() and SerializeToArray() for maximum performance -- not
> sure if boost has equivalents.
>
> * Your test generates different random data for the boost test vs. the
> protobuf test. For an accurate comparison, you really should use identical
> data.
>
> * Finally, your test isn't a very interesting test case for protocol
> buffers. Parsing and serializing a lot of strings is going to be dominated
> by the performance of memcpy(). You might notice that the actual
> serialization step in your program takes much less time than even just
> populating the message object. It might be more interesting to try
> serializing a message involving many different fields of different types.
>
> I think the reason parsing ends up being much slower than serialization for
> you is because it spends most of the time in malloc(), allocating strings.
> There are a few things you can do about this:
>
> 1) Reuse the same message object every time you parse. It will then reuse
> the same memory instead of allocating new memory.
> 2) Make sure you are not using a reference-counting string implementation.
> They are, ironically, very slow, due to the need for atomic operations.
>
> 3) Use Google's tcmalloc in place of your system's malloc. It is probably
> a lot faster.
>
> On Sun, Mar 29, 2009 at 9:32 PM, Yingfeng Zhang <yingfeng.zh...@gmail.com> wrote:
>> Test files are attached
>>
>> Best
>>
>> On Mon, Mar 30, 2009 at 12:14 PM, Kenton Varda <ken...@google.com> wrote:
>>> What does your .proto file look like? And the code that uses it?
>>>
>>> On Sun, Mar 29, 2009 at 9:06 PM, Yingfeng <yingfeng.zh...@gmail.com> wrote:
>>>> Hi,
>>>> We are looking for a fast mechanism for serialization/deserialization.
>>>> Here is our comparison between pb and boost. We hope to
>>>> serialize/deserialize data in std containers, such as:
>>>>
>>>> std::vector<std::string>
>>>>
>>>> Here is the data: 10000000 strings are stored in the vector.
>>>>
>>>> As to boost:
>>>> Serialization: 3.8 s
>>>> Deserialization: 6.89 s
>>>>
>>>> As to protocol buffers:
>>>> Serialization: 4.59 s
>>>> Deserialization: 0.47 s
>>>>
>>>> It seems pb performs much better than boost in deserialization;
>>>> however, it is even slower than boost in serialization. Could
>>>> serialization be improved to be as fast as deserialization?
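[Editor's note: for readers following the thread, a minimal .proto that could model the std::vector<std::string> case might look like the sketch below. The message and field names are illustrative guesses, since the poster's actual attachment is not reproduced here; proto2 syntax, which was current at the time.]

```proto
// Hypothetical schema for the vector<string> benchmark.
message StringVector {
  // One entry per element of the std::vector<std::string>.
  repeated string items = 1;
}
```

With a schema like this, the in-memory round trip Kenton suggests would serialize with SerializeToArray() into a preallocated buffer and parse back with ParseFromArray(), avoiding file I/O entirely.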