BTW, when I talk about one thing being more efficient than another, it's
really a matter of a few percent difference.  For the vast majority of apps,
it does not matter.  I'd suggest not worrying about it unless you're really
sure you need to improve your performance *and* profiling shows that you
spend a lot of time in protobuf code.

On Thu, Aug 27, 2009 at 7:18 PM, Kenton Varda <ken...@google.com> wrote:

>
>
> On Thu, Aug 27, 2009 at 2:06 PM, Saptarshi Guha 
> <saptarshi.g...@gmail.com> wrote:
>
>> Hello
>> I was thinking about this and had some questions
>>
>> On Mon, Aug 24, 2009 at 3:29 PM, Kenton Varda<ken...@google.com> wrote:
>> > Generally the most efficient way to serialize a message to stdout is:
>> >   message.SerializeToFileDescriptor(STDOUT_FILENO);
>> > (If your system doesn't define STDOUT_FILENO, just use the number 1.)
>> > If you normally use C++'s cout, you might want to write to that instead:
>> >   message.SerializeToOstream(&std::cout);
>>
>> Does the protobuf library buffer on the file descriptor?
>
>
> Yes.
>
>
>> I am opening stdout in binary mode, changing the buffer size (setvbuf),
>> and writing to that.
>> If I give SerializeToFileDescriptor the file descriptor of this new
>> FILE* object, I guess it won't
>> use my buffer (I know fwrite uses write, but does write care about the
>> buffer of the FILE* object?).
>
>
> That is correct.  FILE* adds a buffering layer on top of the fd.  If you
> wanted protobuf to write to that buffer, you could probably write an
> implementation of protobuf::io::CopyingOutputStream for FILE* and wrap it in
> a protobuf::io::CopyingOutputStreamAdaptor, then pass that to
> message.SerializeToZeroCopyStream().
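To make the FILE*-versus-fd point concrete, here is a small sketch that uses only stdio and POSIX write(), with no protobuf involved. The helper name WriteMixed and the scratch path in the usage note are made up for illustration, and the 64 KB buffer size is an arbitrary choice. It puts one byte into the FILE* buffer, writes another straight to the underlying descriptor, and reads the file back to show which byte reached the file first.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unistd.h>  // POSIX write()

// Hypothetical helper: puts "A" into the FILE* buffer, writes "B" straight to
// the underlying fd, then closes the stream and returns the file's contents.
std::string WriteMixed(const char* path) {
  static char stdio_buffer[1 << 16];  // must outlive the stream that uses it
  FILE* f = std::fopen(path, "wb");
  assert(f != nullptr);
  std::setvbuf(f, stdio_buffer, _IOFBF, sizeof stdio_buffer);

  std::fputs("A", f);                      // sits in stdio_buffer for now
  ssize_t n = ::write(fileno(f), "B", 1);  // bypasses stdio_buffer entirely
  assert(n == 1);
  (void)n;
  std::fclose(f);                          // only now is "A" flushed

  // Read the file back so the caller can inspect the byte order.
  std::string contents;
  f = std::fopen(path, "rb");
  assert(f != nullptr);
  for (int c; (c = std::fgetc(f)) != EOF;) {
    contents.push_back(static_cast<char>(c));
  }
  std::fclose(f);
  return contents;
}
```

Calling WriteMixed("/tmp/stdio_vs_fd_demo.bin") on a typical Linux system should give back "BA": the byte written through the descriptor lands first, because the stdio buffer is only flushed at fclose(). Anything that writes to the fd directly, SerializeToFileDescriptor() included, would bypass a setvbuf buffer in the same way.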
>
>
>> > For small messages, it may be slightly faster to serialize to a string
>> > and then write that.  But the difference there would be small, and if it
>> > matters to you we should probably just fix the protobuf library to do this
>> > optimization automatically...
>> I should point out that my messages will be in the kilobyte range and
>> definitely less than 1 MB.
>
>
> For "small messages", I mean ~4kb or less.  The issue is that
> SerializeToFileDescriptor() allocates an 8k buffer internally, which is
> wasteful if the message is much less than 8k.  We should fix it so that it
> doesn't do that for small messages.
>
>
>> You mention serializing to string. However I also see a method
>> "SerializeToArray" .
>> What is the difference?
>
>
> With SerializeToArray() you need to make sure the array is big enough ahead
> of time, whereas SerializeToString() will allocate a string of the correct
> size.  You can call ByteSize() in order to size your array, but when you
> then call SerializeToArray() it will have to call ByteSize() again
> internally, which is wasteful.  To allocate a correctly-sized array and
> serialize to it with optimal efficiency you have to use ByteSize() and then
> call SerializeWithCachedSizesToArray(), which reuses the sizes computed by
> the previous ByteSize() call.  Actually, I guess that's not very hard, is
> it?  It used to be harder.
>
>
>> To avoid repeated mallocs/frees, I intend to keep one global
>> array (resizing if required)
>
>
> If you reuse a single std::string object, you should get the same effect.
> string::clear() does not free the backing array, it just sets the size to
> zero.  So, it will reuse that array the next time you serialize into it.
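A rough way to see the buffer reuse in action, using only the standard library: FakeSerialize below is a made-up stand-in for message.SerializeToString(&buffer), and CountGrowths is likewise a hypothetical helper that counts how often the string's capacity has to grow. The standard does not strictly promise that clear() keeps the capacity, but the common implementations behave that way, so treat this as a sketch under that assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for message.SerializeToString(&out): replaces the contents of
// `out` with `n` bytes of dummy payload.
void FakeSerialize(std::size_t n, std::string* out) {
  out->clear();         // size drops to 0; capacity is normally kept
  out->append(n, 'x');  // reuses the existing allocation when it is big enough
}

// Serializes payloads of the given sizes into one reused buffer and returns
// how many times the buffer's capacity had to grow.
int CountGrowths(const std::vector<std::size_t>& sizes) {
  std::string buffer;
  int growths = 0;
  for (std::size_t n : sizes) {
    const std::size_t before = buffer.capacity();
    FakeSerialize(n, &buffer);
    if (buffer.capacity() > before) ++growths;
  }
  return growths;
}
```

With sizes such as {4096, 100, 2000, 4096}, only the first payload should force an allocation; the later ones fit into the capacity already held by the string.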
>
>
>> , writing to that array, keeping track of the bytes written, and
>> writing the array out to the stream.
>> Since my app is not threaded, I do not have an issue of multiple
>> threads writing to that single array.
>> However if SerializeToFileDescriptor is still better than this
>> approach there is no need for this.
>
>
> SerializeToFileDescriptor() is better if your messages are very large
> because it avoids allocating large contiguous blocks of memory, which can
> cause memory fragmentation.  Otherwise it has no advantage over serializing
> to an array and then writing it to the file.
>
>
>>
>>
>>
>> > All of these methods require that you write the size first if you intend
>> > to write multiple messages to the stream.
>>
>> Yes, I will be writing the length first.
>
>
> Ah, of course, in this case you have to call ByteSize() anyway, so if
> you're really worried about performance then you want to call
> SerializeWithCachedSizes() or SerializeWithCachedSizesToArray().
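For the "write the size first" part, a minimal framing sketch might look like the following. The thread does not say what format the length prefix takes, so the base-128 varint used here is an assumption, and AppendVarint32 and AppendFrame are hypothetical names. The payload string stands in for bytes a message has already been serialized to; with a real message, the length would come from ByteSize() and the payload from one of the WithCachedSizes calls.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `value` to `out` as a base-128 varint: 7 bits per byte, low-order
// group first, high bit set on every byte except the last.
void AppendVarint32(std::uint32_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Hypothetical framing helper: appends the payload length, then the payload.
// `payload` stands in for an already-serialized message.
void AppendFrame(const std::string& payload, std::string* out) {
  assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
  AppendVarint32(static_cast<std::uint32_t>(payload.size()), out);
  out->append(payload);
}
```

A reader on the other end decodes the varint, reads exactly that many bytes, and parses them as one message. Appending several frames to the same reused string and writing it out in one call keeps the number of write() calls down.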
>
>
>> I should point out I haven't had much experience with write and fwrite, so
>> my understanding might be incomplete.
>>
>> Many thanks for the advice.
>> Regards
>> Saptarshi
>>
>
>
