BTW, when I talk about one thing being more efficient than another, it's
really a matter of a few percent difference. For the vast majority of apps,
it does not matter. I'd suggest not worrying about it unless you're really
sure you need to improve your performance *and* profiling shows that you
spend a lot of time in protobuf code.
On Thu, Aug 27, 2009 at 7:18 PM, Kenton Varda <ken...@google.com> wrote:
> On Thu, Aug 27, 2009 at 2:06 PM, Saptarshi Guha
>> I was thinking about this and had some questions
>> On Mon, Aug 24, 2009 at 3:29 PM, Kenton Varda<ken...@google.com> wrote:
>> > Generally the most efficient way to serialize a message to stdout is:
>> > message.SerializeToFileDescriptor(STDOUT_FILENO);
>> > (If your system doesn't define STDOUT_FILENO, just use the number 1.)
>> > If you normally use C++'s cout, you might want to write to that instead:
>> > message.SerializeToOstream(std::cout);
>> Does the protobuf library buffer on the file descriptor?
>> I am opening stdout in binary mode, changing the buffer size (setvbuf)
>> and writing to that.
>> If I give SerializeToFileDescriptor the file descriptor of this new
>> FILE* object, I guess it won't
>> use my buffer (I know fwrite uses write, but does write care for the
>> buffer of the FILE* object?).
> That is correct. FILE* adds a buffering layer on top of the fd. If you
> wanted protobuf to write to that buffer, you could probably write an
> implementation of protobuf::io::CopyingOutputStream for FILE* and wrap it in
> a protobuf::io::CopyingOutputStreamAdaptor, then pass that to
>> > For small messages, it may be slightly faster to serialize to a string
>> > then write that. But the difference there would be small, and if it
>> > matters to you we should probably just fix the protobuf library to do this
>> > optimization automatically...
>> I should point out that my messages will be in the kb and definitely
>> less than an MB.
> For "small messages", I mean ~4kb or less. The issue is that
> SerializeToFileDescriptor() allocates an 8k buffer internally, which is
> wasteful if the message is much less than 8k. We should fix it so that it
> doesn't do that for small messages.
>> You mention serializing to string. However I also see a method
>> "SerializeToArray" .
>> What is the difference?
> With SerializeToArray() you need to make sure the array is big enough ahead
> of time, whereas SerializeToString() will allocate a string of the correct
> size. You can call ByteSize() in order to size your array, but when you
> then call SerializeToArray() it will have to call ByteSize() again
> internally, which is wasteful. To allocate a correctly-sized array and
> serialize to it with optimal efficiency you have to use ByteSize() and then
> call SerializeWithCachedSizesToArray() -- which reuses the sizes computed by
> the previous ByteSize() call. Actually, I guess that's not very hard, is
> it? It used to be harder.
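
A minimal sketch of that two-step pattern, in case it helps. In the C++ API as I know it, the cached-size call is spelled SerializeWithCachedSizesToArray(uint8* target) and returns a pointer just past the last byte written. The helper name SerializeExact and the FakeMessage stand-in are hypothetical; FakeMessage exists only so the sketch compiles without protobuf, and a real generated message class would take its place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated message class, so this sketch
// compiles without protobuf. It mimics the two methods the pattern needs.
struct FakeMessage {
  std::string payload;
  mutable int cached_size = 0;

  // Computes the encoded size and caches it, as ByteSize() does.
  int ByteSize() const {
    cached_size = static_cast<int>(payload.size());
    return cached_size;
  }

  // Writes exactly cached_size bytes and returns the end pointer.
  std::uint8_t* SerializeWithCachedSizesToArray(std::uint8_t* target) const {
    std::memcpy(target, payload.data(), static_cast<std::size_t>(cached_size));
    return target + cached_size;
  }
};

// Sizes the buffer once, then serializes without recomputing the size.
template <typename Message>
std::vector<std::uint8_t> SerializeExact(const Message& message) {
  const int size = message.ByteSize();
  std::vector<std::uint8_t> buffer(static_cast<std::size_t>(size));
  if (size > 0) {
    std::uint8_t* end = message.SerializeWithCachedSizesToArray(buffer.data());
    // The message must not be modified between the two calls.
    assert(end == buffer.data() + size);
    (void)end;
  }
  return buffer;
}
```

The message must not change between ByteSize() and the serialize call, since the second call trusts the sizes cached by the first.
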
>> To avoid repeated mallocs/free, I intend to keep one global
>> array(resizing if required)
> If you reuse a single std::string object, you should get the same effect.
> string::clear() does not free the backing array, it just sets the size to
> zero. So, it will reuse that array the next time you serialize into it.
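
To illustrate the reuse, here is a small sketch that needs only the standard library. FillBuffer is a hypothetical stand-in for a SerializeToString-style call, and ReusesAllocation is a hypothetical check. The C++ standard does not strictly promise that clear() keeps the capacity, but the common implementations I am aware of behave this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a SerializeToString-style call: it replaces the
// contents of *out with n bytes. A real message would do the filling.
void FillBuffer(std::string* out, std::size_t n) {
  assert(out != nullptr);
  out->clear();         // Size becomes 0; the allocation is normally retained.
  out->append(n, 'x');  // Refills within the existing capacity if it fits.
}

// Returns true if a second, smaller fill reused the first allocation.
bool ReusesAllocation() {
  std::string buffer;
  FillBuffer(&buffer, 4096);
  const char* first_data = buffer.data();
  const std::size_t first_capacity = buffer.capacity();

  FillBuffer(&buffer, 1024);
  return buffer.data() == first_data && buffer.capacity() == first_capacity;
}
```

Once the string has grown to fit the largest message seen so far, later serializations into it should not allocate.
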
>> , writing to that array and keeping a track of the bytes written and
>> writing the array out to the stream.
>> Since my app is not threaded, I do not have an issue of multiple
>> threads writing to that single array.
>> However if SerializeToFileDescriptor is still better than this
>> approach there is no need for this.
> SerializeToFileDescriptor() is better if your messages are very large
> because it avoids allocating large contiguous blocks of memory, which can
> cause memory fragmentation. Otherwise it has no advantage over serializing
> to an array and then writing it to the file.
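
If you go the array route, the write step might look like the sketch below. WriteAll is a hypothetical helper name, and the sketch assumes a POSIX system. The one subtlety is that write() may accept fewer bytes than requested, so it has to be called in a loop.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Writes all `size` bytes of `data` to `fd`, retrying on short writes and on
// EINTR. Returns false on any other error.
bool WriteAll(int fd, const void* data, std::size_t size) {
  assert(data != nullptr || size == 0);
  const char* p = static_cast<const char*>(data);
  while (size > 0) {
    const ssize_t n = write(fd, p, size);
    if (n < 0) {
      if (errno == EINTR) continue;  // Interrupted by a signal; try again.
      return false;
    }
    p += n;
    size -= static_cast<std::size_t>(n);
  }
  return true;
}
```

Called as WriteAll(STDOUT_FILENO, buffer.data(), buffer.size()), this bypasses any FILE* buffer on stdout, so flush that first if you also use stdio.
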
>> > All of these methods require that you write the size first if you intend
>> > to write multiple messages to the stream.
>> Yes, I will be writing the length first.
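
For reference, one way to frame each message is sketched below. The thread does not say how the length is encoded; using a base-128 varint (the integer encoding protobuf's own wire format uses) is an assumption here, and a fixed 4-byte length would work just as well. AppendVarint32 and AppendFramed are hypothetical helper names, and the payload stands for an already-serialized message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 bits per byte, low bits first,
// with the high bit set on every byte except the last.
void AppendVarint32(std::uint32_t value, std::string* out) {
  assert(out != nullptr);
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Appends one length-prefixed record: the varint size, then the payload.
void AppendFramed(const std::string& payload, std::string* out) {
  AppendVarint32(static_cast<std::uint32_t>(payload.size()), out);
  out->append(payload);
}
```

The reader does the reverse: decode the varint, then read exactly that many bytes and parse them as one message.
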
> Ah, of course, in this case you have to call ByteSize() anyway, so if
> you're really worried about performance then you want to call
>> I should point out I haven't had much experience with write,fwrite so
>> my understanding might be incomplete.
>> Much thanks for advice