Hello,
Thanks very much for the answers. I ran some tests and your
statements hold true (though the differences are marginal),
i.e. for small messages (~7 KB), the SerializeToFileDescriptor method is
faster than SerializeToString. For larger messages the latter is faster.

I tried a typical case (for me): creating an R runif(N) object (once),
serializing it using protobuf, writing it out, and repeating this M
times.
For N = 125, say, the FD method is better, and for larger N (2000, about
15 KB) the string method is better. However, I did notice about a 10%
improvement (not a very rigorous experiment) for the FD method over the
string method when writing tiny messages (~1 KB) 10MM (= M) times.

Surprisingly, serializing to an array is much slower than the other two.

Thanks for your input, it was really helpful.
Regards
Saptarshi

On Thu, Aug 27, 2009 at 10:19 PM, Kenton Varda<ken...@google.com> wrote:
> BTW, when I talk about one thing being more efficient than another, it's
> really a matter of a few percent difference.  For the vast majority of apps,
> it does not matter.  I'd suggest not worrying about it unless you're really
> sure you need to improve your performance *and* profiling shows that you
> spend a lot of time in protobuf code.
>
> On Thu, Aug 27, 2009 at 7:18 PM, Kenton Varda <ken...@google.com> wrote:
>>
>>
>> On Thu, Aug 27, 2009 at 2:06 PM, Saptarshi Guha <saptarshi.g...@gmail.com>
>> wrote:
>>>
>>> Hello
>>> I was thinking about this and had some questions
>>>
>>> On Mon, Aug 24, 2009 at 3:29 PM, Kenton Varda<ken...@google.com> wrote:
>>> > Generally the most efficient way to serialize a message to stdout is:
>>> >   message.SerializeToFileDescriptor(STDOUT_FILENO);
>>> > (If your system doesn't define STDOUT_FILENO, just use the number 1.)
>>> > If you normally use C++'s cout, you might want to write to that
>>> > instead:
>>> >   message.SerializeToOstream(std::cout);
>>>
>>> Does the protobuf library buffer on the file descriptor?
>>
>> Yes.
>>
>>>
>>> I am opening stdout in binary mode, changing the buffer size (setvbuf),
>>> and writing to that.
>>> If I give SerializeToFileDescriptor the file descriptor of this new
>>> FILE* object, I guess it won't
>>> use my buffer (I know fwrite uses write, but does write care about the
>>> buffer of the FILE* object?).
>>
>> That is correct.  FILE* adds a buffering layer on top of the fd.  If you
>> wanted protobuf to write to that buffer, you could probably write an
>> implementation of protobuf::io::CopyingOutputStream for FILE* and wrap it in
>> a protobuf::io::CopyingOutputStreamAdaptor, then pass that to
>> message.SerializeToZeroCopyStream().
>>
>>>
>>> > For small messages, it may be slightly faster to serialize to a string
>>> > and
>>> > then write that.  But the difference there would be small, and if it
>>> > matters
>>> > to you we should probably just fix the protobuf library to do this
>>> > optimization automatically...
>>> I should point out that my messages will be in the kb and definitely
>>> less than an MB.
>>
>> For "small messages", I mean ~4kb or less.  The issue is that
>> SerializeToFileDescriptor() allocates an 8k buffer internally, which is
>> wasteful if the message is much less than 8k.  We should fix it so that it
>> doesn't do that for small messages.
>>
>>>
>>> You mention serializing to string. However I also see a method
>>> "SerializeToArray" .
>>> What is the difference?
>>
>> With SerializeToArray() you need to make sure the array is big enough
>> ahead of time, whereas SerializeToString() will allocate a string of the
>> correct size.  You can call ByteSize() in order to size your array, but when
>> you then call SerializeToArray() it will have to call ByteSize() again
>> internally, which is wasteful.  To allocate a correctly-sized array and
>> serialize to it with optimal efficiency you have to use ByteSize() and then
>> call SerializeWithCachedSizesToArray() -- which reuses the sizes computed by
>> the previous ByteSize() call.  Actually, I guess that's not very hard, is
>> it?  It used to be harder.
>>
>>>
>>> To avoid repeated mallocs/frees, I intend to keep one global
>>> array (resizing if required)
>>
>> If you reuse a single std::string object, you should get the same effect.
>>  string::clear() does not free the backing array, it just sets the size to
>> zero.  So, it will reuse that array the next time you serialize into it.
>>
>>>
>>> , writing to that array and keeping track of the bytes written and
>>> writing the array out to the stream.
>>> Since my app is not threaded, I do not have an issue of multiple
>>> threads writing to that single array.
>>> However if SerializeToFileDescriptor is still better than this
>>> approach there is no need for this.
>>
>> SerializeToFileDescriptor() is better if your messages are very large
>> because it avoids allocating large contiguous blocks of memory, which can
>> cause memory fragmentation.  Otherwise it has no advantage over serializing
>> to an array and then writing it to the file.
>>
>>>
>>>
>>> > All of these methods require that you write the size first if you
>>> > intend to
>>> > write multiple messages to the stream.
>>>
>>> Yes, I will be writing the length first.
>>
>> Ah, of course, in this case you have to call ByteSize() anyway, so if
>> you're really worried about performance then you want to call
>> Serialize*WithCachedSizes().
>>
>>>
>>> I should point out I haven't had much experience with write,fwrite so
>>> my understanding might be incomplete.
>>>
>>> Much thanks for advice
>>> Regards
>>> Saptarshi
>>
>
>
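
A note on two points from the quoted thread: reusing a single std::string
between messages, and writing the size before each message. The sketch
below is only an illustration under stated assumptions. FakeSerialize is a
hypothetical stand-in for message.SerializeToString(), and the base-128
varint length prefix is an assumed framing choice (the thread only says to
write the size first). Retaining capacity after clear() is what common
standard library implementations do, not a strict guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends `value` to `out` as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void AppendVarint32(uint32_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Hypothetical stand-in for message.SerializeToString(&out): fills `out`
// with `n` payload bytes. A real program would serialize a message here.
void FakeSerialize(std::size_t n, std::string* out) {
  out->clear();         // size becomes 0; the backing array is normally kept
  out->append(n, 'x');  // reuses the existing allocation when it is big enough
}

// Builds one length-prefixed frame in `frame`, reusing its storage.
void BuildFrame(const std::string& payload, std::string* frame) {
  assert(frame != nullptr);
  frame->clear();
  AppendVarint32(static_cast<uint32_t>(payload.size()), frame);
  frame->append(payload);
}
```

Keeping `payload` and `frame` alive across the write loop means that, after
the first few messages, building a frame should not need a fresh allocation
unless a message is larger than any seen before.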

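On the question of whether write() cares about the buffer of a FILE*
object: a small POSIX-only experiment can make the answer visible. This is
a sketch, not part of protobuf. MixedWriteOrder is a hypothetical helper
name, and a temporary file stands in for stdout so the result can be read
back.

```cpp
#include <cassert>
#include <stdio.h>
#include <string>
#include <unistd.h>  // write(); POSIX only

// Returns the bytes that end up in a temporary file after mixing a buffered
// stdio write with a direct write() on the same file descriptor.
std::string MixedWriteOrder() {
  FILE* f = tmpfile();
  if (f == nullptr) return "";

  // Install a large, fully buffered stdio buffer before any other I/O.
  static char buffer[1 << 16];
  if (setvbuf(f, buffer, _IOFBF, sizeof(buffer)) != 0) {
    fclose(f);
    return "";
  }

  fputs("A", f);                      // held in the stdio buffer for now
  if (write(fileno(f), "B", 1) != 1) {  // goes straight to the descriptor
    fclose(f);
    return "";
  }
  fflush(f);                          // only now does "A" reach the descriptor

  rewind(f);
  char result[2];
  size_t got = fread(result, 1, sizeof(result), f);
  assert(got <= sizeof(result));
  fclose(f);
  return std::string(result, got);
}
```

On a typical POSIX system this should return "BA": the byte passed to
write() lands in the file before the byte that was sitting in the FILE*
buffer. The same reordering can happen if SerializeToFileDescriptor is
given fileno() of a FILE* that still has unflushed data, so calling
fflush() first (or not mixing the two layers) is the safer pattern.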