On Fri, Aug 28, 2009 at 8:14 AM, Saptarshi Guha <saptarshi.g...@gmail.com> wrote:

> Hello,
> Thanks much for the answers. I did perform some tests and your
> statements hold true (though the differences are marginal),
> i.e. for small messages (~7kb), the SerializeToFileDescriptor method is
> faster than SerializeToString. For larger messages the latter is faster.


Err, I had it the other way around.  :)  SerializeToFileDescriptor() should
definitely be slower than SerializeToString() for small messages.  And
actually, for large messages it is still probably slower, but avoiding
memory fragmentation seems more important.


> Surprisingly, the output to array is much slower than the other two.


That doesn't seem right, since SerializeToString() and SerializeToArray()
share the same implementation.  The only difference is that
SerializeToString() has to allocate space first, which should make it
slower.


>
>
> Thanks for your input, it was really helpful.
> Regards
> Saptarshi
>
> On Thu, Aug 27, 2009 at 10:19 PM, Kenton Varda <ken...@google.com> wrote:
> > BTW, when I talk about one thing being more efficient than another, it's
> > really a matter of a few percent difference.  For the vast majority of apps,
> > it does not matter.  I'd suggest not worrying about it unless you're really
> > sure you need to improve your performance *and* profiling shows that you
> > spend a lot of time in protobuf code.
> >
> > On Thu, Aug 27, 2009 at 7:18 PM, Kenton Varda <ken...@google.com> wrote:
> >>
> >>
> >> On Thu, Aug 27, 2009 at 2:06 PM, Saptarshi Guha <saptarshi.g...@gmail.com> wrote:
> >>>
> >>> Hello
> >>> I was thinking about this and had some questions
> >>>
> >>> On Mon, Aug 24, 2009 at 3:29 PM, Kenton Varda <ken...@google.com> wrote:
> >>> > Generally the most efficient way to serialize a message to stdout is:
> >>> >   message.SerializeToFileDescriptor(STDOUT_FILENO);
> >>> > (If your system doesn't define STDOUT_FILENO, just use the number 1.)
> >>> > If you normally use C++'s cout, you might want to write to that
> >>> > instead:
> >>> >   message.SerializeToOstream(std::cout);
> >>>
> >>> Does the protobuf library buffer writes to the file descriptor?
> >>
> >> Yes.
> >>
> >>>
> >>> I am opening stdout in binary mode, changing the buffer size (setvbuf)
> >>> and writing to that. If I give SerializeToFileDescriptor the file
> >>> descriptor of this new FILE* object, I guess it won't use my buffer (I
> >>> know fwrite uses write, but does write care about the buffer of the
> >>> FILE* object?).
> >>
> >> That is correct.  FILE* adds a buffering layer on top of the fd.  If you
> >> wanted protobuf to write to that buffer, you could probably write an
> >> implementation of protobuf::io::CopyingOutputStream for FILE* and wrap it in
> >> a protobuf::io::CopyingOutputStreamAdaptor, then pass that to
> >> message.SerializeToZeroCopyStream().
> >>
> >>>
> >>> > For small messages, it may be slightly faster to serialize to a string
> >>> > and then write that.  But the difference there would be small, and if
> >>> > it matters to you we should probably just fix the protobuf library to do
> >>> > this optimization automatically...
> >>> I should point out that my messages will be in the KB range and
> >>> definitely less than an MB.
> >>
> >> For "small messages", I mean ~4kb or less.  The issue is that
> >> SerializeToFileDescriptor() allocates an 8k buffer internally, which is
> >> wasteful if the message is much less than 8k.  We should fix it so that it
> >> doesn't do that for small messages.
> >>
> >>>
> >>> You mention serializing to a string. However, I also see a method
> >>> "SerializeToArray".
> >>> What is the difference?
> >>
> >> With SerializeToArray() you need to make sure the array is big enough
> >> ahead of time, whereas SerializeToString() will allocate a string of the
> >> correct size.  You can call ByteSize() in order to size your array, but when
> >> you then call SerializeToArray() it will have to call ByteSize() again
> >> internally, which is wasteful.  To allocate a correctly-sized array and
> >> serialize to it with optimal efficiency you have to use ByteSize() and then
> >> call SerializeWithCachedSizesToArray(), which reuses the sizes computed by
> >> the previous ByteSize() call.  Actually, I guess that's not very hard, is
> >> it?  It used to be harder.
> >>
> >>>
> >>> To avoid repeated mallocs/frees, I intend to keep one global
> >>> array (resizing it if required)
> >>
> >> If you reuse a single std::string object, you should get the same effect.
> >> string::clear() does not free the backing array, it just sets the size to
> >> zero.  So, it will reuse that array the next time you serialize into it.
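
A standard-library-only sketch of why reusing one std::string helps. FakeSerialize() is a hypothetical stand-in for message.SerializeToString(&buf), which likewise replaces the string's contents; CountReallocations() is also a made-up helper. Keeping the allocation across clear() is what mainstream standard libraries (libstdc++, libc++, MSVC's STL) do in practice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for message.SerializeToString(&buf) producing `n` bytes.
void FakeSerialize(std::string& buf, std::size_t n) {
  buf.clear();         // size() becomes 0; the heap block is normally kept
  buf.append(n, 'x');  // grows only if n exceeds the current capacity
}

// Counts how often one reused buffer had to reallocate over a batch of sizes.
int CountReallocations(const std::size_t* sizes, std::size_t count) {
  std::string buf;
  int reallocations = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t before = buf.capacity();
    FakeSerialize(buf, sizes[i]);
    if (buf.capacity() != before) ++reallocations;
  }
  return reallocations;
}
```

With sizes {5000, 3000, 1000, 4000}, only the first call grows the buffer; the later, smaller messages fit in the block it already owns.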
> >>
> >>>
> >>> , writing to that array, keeping track of the bytes written, and
> >>> writing the array out to the stream.
> >>> Since my app is not threaded, I do not have an issue with multiple
> >>> threads writing to that single array.
> >>> However, if SerializeToFileDescriptor is still better than this
> >>> approach, there is no need for this.
> >>
> >> SerializeToFileDescriptor() is better if your messages are very large
> >> because it avoids allocating large contiguous blocks of memory, which can
> >> cause memory fragmentation.  Otherwise it has no advantage over serializing
> >> to an array and then writing it to the file.
> >>
> >>>
> >>>
> >>> > All of these methods require that you write the size first if you
> >>> > intend to write multiple messages to the stream.
> >>>
> >>> Yes, I will be writing the length first.
> >>
> >> Ah, of course, in this case you have to call ByteSize() anyway, so if
> >> you're really worried about performance then you want to call
> >> SerializeWithCachedSizes() or SerializeWithCachedSizesToArray().
> >>
> >>>
> >>> I should point out I haven't had much experience with write/fwrite, so
> >>> my understanding might be incomplete.
> >>>
> >>> Many thanks for the advice
> >>> Regards
> >>> Saptarshi
> >>
> >
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~----------~----~----~----~------~----~------~--~---
