Re: [protobuf] Deserialize messages from CodedInputStream without a message size prefix

2010-12-13 Thread Saptarshi Guha


You should be using toByteArray(), not getBytes(), to serialize to  
the protobuf wire format.


Oh, those variables are byte-array wrappers and contain serialized  
bytes, among other things.



You also need to delimit the messages.


Got it, thanks.

Otherwise, the first ParseFromCodedStream call will consume as many  
bytes as are available in the byte array.


Yes, confirmed with some print statements. That reminds me of  
someone I know at the dining table.


Thanks and cheers
J

On Sun, Dec 12, 2010 at 10:13 PM, Fishtank  
saptarshi.g...@gmail.com wrote:

Hello,

I have a CodedInputStream created from a byte array (in C++)

CodedInputStream cds((uint8_t*)a,tbytes);
cds.SetTotalBytesLimit(N,M);

I have written bytes to this array (a) from Java like so

b.writeRawBytes(k.getBytes(),0,k.getLength());
b.writeRawBytes(v.getBytes(),0,v.getLength());

Given the number N of k,v pairs written, I'd like to deserialize them.
Notice I haven't prepended byte sizes. I thought I could do  
something like this


for (int i = 0; i < N; i++) {
  rexp_container.ParseFromCodedStream(&cds);  // k
  // do something with rexp_container
  rexp_container.Clear();
  rexp_container.ParseFromCodedStream(&cds);  // v
  // do something with rexp_container
  rexp_container.Clear();
}

Is this the correct way to do it? I get a missing-field error (which  
is not supposed to happen).

I tried ParsePartialFromCodedStream but I get incorrect results.

Is it okay to provide a CodedInputStream and pick off the messages  
one by one?


Cheers
Joy

[1] PB ERROR[LOGLEVEL_ERROR](google/protobuf/message_lite.cc:123)  
Can't parse message of type REXP because it is missing required  
fields: rclass


--
You received this message because you are subscribed to the Google Groups Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.







Re: [protobuf] Protobuf 2.3 and compiler warnings

2010-04-15 Thread Saptarshi Guha
Ah, I saw the diff.
Thank you
Saptarshi


On Thu, Apr 15, 2010 at 3:53 PM, Jason Hsueh jas...@google.com wrote:
 Yes, these are safe to ignore. This is also addressed in
 r302: http://code.google.com/p/protobuf/source/detail?r=302

 On Thu, Apr 15, 2010 at 12:40 PM, Fishtank saptarshi.g...@gmail.com wrote:

 Hello,
 I ran protoc --cpp_out on a proto file using the 2.3.0 compiler. I
 replaced my previously generated .cc and header files with the new
 ones and compiled.
 It works, but I get some compiler warnings, which always make me
 concerned.

 Can I safely ignore these?

 g++ -I/ln/meraki/custom//lib64/R/include -I/usr/local/include -fpic -g -O2 -I. -g -DHAVE_UINTPTR_T `pkg-config --cflags protobuf` -Wall -c utility.cc -o utility.o
 /ln/meraki/custom/include/google/protobuf/io/coded_stream.h: In member
 function ‘bool

 google::protobuf::io::CodedInputStream::ReadLittleEndian32(google::protobuf::uint32*)’:
 /ln/meraki/custom/include/google/protobuf/io/coded_stream.h:776:
 warning: comparison between signed and unsigned integer expressions
 /ln/meraki/custom/include/google/protobuf/io/coded_stream.h: In member
 function ‘bool

 google::protobuf::io::CodedInputStream::ReadLittleEndian64(google::protobuf::uint64*)’:
 /ln/meraki/custom/include/google/protobuf/io/coded_stream.h:791:
 warning: comparison between signed and unsigned integer expressions

 Thank you
 Saptarshi








Re: [protobuf] protocol message was rejected because it was too big (Turn off?)

2009-12-12 Thread Saptarshi Guha
Thanks much.
Regards
Saptarshi


On Sat, Dec 12, 2009 at 4:40 PM, Kenton Varda ken...@google.com wrote:
 Actually, this is a better link:
 http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.common.html#SetLogHandler.details
 For some reason the auto-generated documentation fails to hyperlink to
 the definition of LogHandler, but it is shown among the typedefs at the top
 of that page:
 typedef void LogHandler(LogLevel level, const char* filename, int line,
 const string& message);
 On Sat, Dec 12, 2009 at 1:36 PM, Kenton Varda ken...@google.com wrote:

 All messages are written to stderr (not stdout), which is usually reserved
 for human-readable error messages.  However, you can redirect the messages
 using google::protobuf::SetLogHandler() as documented here:

 http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.common.html#SetLogHandler

 On Sat, Dec 12, 2009 at 10:50 AM, Saptarshi Guha
 saptarshi.g...@gmail.com wrote:

 Hello,
 I am using Protocol Buffers to serialize some data.
 To begin with I do realize that I shouldn't be using PB for
 serializing very large messages, but given that I am, I have to deal
 with these messages.

 E.g I have a message of 381MB, so naturally I get this error when
 parsing:

 libprotobuf ERROR google/protobuf/io/coded_stream.cc:196] A protocol
 message was rejected because it was too big (more than 67108864
 bytes).  To increase the limit (or to disable these warnings), see
 CodedInputStream::SetTotalBytesLimit() in
 google/protobuf/io/coded_stream.h.


  I viewed the header file and saw what I have to do next. I'll fix my
  code soon; till then:

  My program redirects standard error (and output) and replaces the
  functions that write to these streams. However, writing to stdout and
  stderr goes through special functions, and PB does not use my functions.
  Other libraries (and I only use PB) writing to stderr and stdout can
  adversely affect my program.

  Q. Is there a flag I can set to not display the warning, i.e. fail
  silently? I'm not using CodedInputStream; instead I use ParseFromArray
  (I have read in the bytes with my own functions).

 Regards
 Saptarshi











[protobuf] Re: On SnowLeopard, EXC_CRASH (SIGABRT)

2009-11-01 Thread Saptarshi Guha

Hello,
Your explanation sounds right. I don't have Snow Leopard either, so I will  
have to ask the user.
Personally, I would never (unless absolutely forced to) install another  
GCC. I tried it once, and so many things started failing that it was  
miserable.

I'll get back to this thread.

Regards
Saptarshi

On Nov 1, 2009, at 3:15 AM, Kenton Varda wrote:

 Sorry, I don't have access to a Snow Leopard machine to test this on.

 However, your second link looks like a very likely culprit.  They  
 seem to be saying that all C++ code on Snow Leopard needs to be  
 compiled with -D_GLIBCXX_FULLY_DYNAMIC_STRING, otherwise it will  
 likely crash.  So, I'd recommend re-compiling libprotobuf and your  
 app with this flag.

 But I'm confused.  This seems like a truly massive bug --  
 essentially, it sounds like Apple has released a C++ compiler that,  
 by default, is not compatible with their C++ standard library.  Is  
 it really possible that such a huge problem would make it through  
 basic testing, let alone be shipped and live several months with  
 fewer than 10 sites on the entire internet mentioning it?

 No, that seems unlikely to me.  My guess is that the Apple release  
 of GCC actually sets this flag correctly by default, but you are  
 actually using some other GCC, perhaps from MacPorts or some such.   
 Could this be the case?

 On Sat, Oct 31, 2009 at 5:58 PM, Saptarshi Guha saptarshi.g...@gmail.com 
  wrote:


 On Oct 30, 2009, at 7:14 PM, Kenton Varda wrote:

  There's not much we can do with this without a reproducible demo.
 


 Hello,
 I have placed a link[1] to tgz file, which can be run like

 tar zxvf test.parse.tgz
 cd testdata
 make
 ./testme testdata.bin

 If all works well, it should display an  entry of string keys and
 string values.
 It compiles and runs on Leopard 10.5.7 (Macbook), but fails with

 
 ./testme testdata.bin
 Reading in 3239 bytes
 testme(41471) malloc: *** error for object 0x100222520: pointer being
 freed was not allocated
 *** set a breakpoint in malloc_error_break to debug
 
 on Snow Leopard (i can't recall the machine type)

 On another note, I read (from an emacs blog) that Snow Leopard has
 fully dynamic strings enabled by default, and that there is an issue
 regarding freeing such strings[2]. I'm not sure if this is even related.

 Thanks
 Saptarshi


 [1] http://ml.stat.purdue.edu/rpackages/test.parse.tgz
 [2] http://www.newartisans.com/2009/10/a-c-gotcha-on-snow-leopard.html






  On Fri, Oct 30, 2009 at 4:04 PM, Saptarshi Guha saptarshi.g...@gmail.com
   wrote:
 
  Hello,
  I have  a byte array which I'd like to deserialize, it is about 3K
  bytes.
  On RHEL 5, 64 bit machine, protobuf 2.2 my deserialization works.
  On Leopard 10.5.7 on a macbook it also works. (for both 32 bit and  
 64
  bit versions)
 
 
  Above gcc: 4.0.1
 
  However, someone reported this crash on SnowLeopard (gcc4.2)
 
  I'm not sure why this happens. The crash appears to arise within the
  protobuf calls.
  Regards
  Saptarshi
 
 
  Process: R [34034]
  Path:/Applications/R64.app/Contents/MacOS/R
  Identifier:  org.R-project.R
  Version: R 2.10.0 GUI 1.30 Leopard build 64-bit (5511)
  Code Type:   X86-64 (Native)
  Parent Process:  launchd [182]
 
  Date/Time:   2009-10-28 20:11:54.353 -0700
  OS Version:  Mac OS X 10.6.1 (10B504)
  Report Version:  6
 
  Interval Since Last Report:  195786 sec
  Crashes Since Last Report:   2
  Per-App Interval Since Last Report:  1676 sec
  Per-App Crashes Since Last Report:   1
  Anonymous UUID:  0A96FBCF-6045-4A38-A8E5-619A52D23CE5
 
  Exception Type:  EXC_CRASH (SIGABRT)
  Exception Codes: 0x, 0x
  Crashed Thread:  0  Dispatch queue: com.apple.main-thread
 
  Application Specific Information:
  abort() called
 
  Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
  0   libSystem.B.dylib               0x7fff836bdff6 __kill + 10
  1   libSystem.B.dylib               0x7fff8375f072 abort + 83
  2   libSystem.B.dylib               0x7fff83676095 free + 128
  3   libstdc++.6.dylib               0x7fff87aa71e8 std::string::reserve(unsigned long) + 90
  4   libstdc++.6.dylib               0x7fff87aa742b std::string::append(char const*, unsigned long) + 127
  5   libprotobuf.4.dylib             0x000116a0989c google::protobuf::io::CodedInputStream::ReadString(std::string*, int) + 236 (coded_stream.h:761)
  6   Rhipe.so                        0x00011430909c STRING::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 396
  7   Rhipe.so                        0x0001143099d8 REXP::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 2104
  8   Rhipe.so                        0x000114309de8 REXP::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 3144

[protobuf] Re: On SnowLeopard, EXC_CRASH (SIGABRT)

2009-10-31 Thread Saptarshi Guha


On Oct 30, 2009, at 7:14 PM, Kenton Varda wrote:

 There's not much we can do with this without a reproducible demo.



Hello,
I have placed a link[1] to tgz file, which can be run like

tar zxvf test.parse.tgz
cd testdata
make
./testme testdata.bin

If all works well, it should display an  entry of string keys and  
string values.
It compiles and runs on Leopard 10.5.7 (Macbook), but fails with


./testme testdata.bin
Reading in 3239 bytes
testme(41471) malloc: *** error for object 0x100222520: pointer being  
freed was not allocated
*** set a breakpoint in malloc_error_break to debug

on Snow Leopard (i can't recall the machine type)

On another note, I read (from an emacs blog) that Snow Leopard has  
fully dynamic strings enabled by default, and that there is an issue  
regarding freeing such strings[2]. I'm not sure if this is even related.

Thanks
Saptarshi


[1] http://ml.stat.purdue.edu/rpackages/test.parse.tgz
[2] http://www.newartisans.com/2009/10/a-c-gotcha-on-snow-leopard.html






 On Fri, Oct 30, 2009 at 4:04 PM, Saptarshi Guha saptarshi.g...@gmail.com 
  wrote:

 Hello,
 I have  a byte array which I'd like to deserialize, it is about 3K
 bytes.
 On RHEL 5, 64 bit machine, protobuf 2.2 my deserialization works.
 On Leopard 10.5.7 on a macbook it also works. (for both 32 bit and 64
 bit versions)


 Above gcc: 4.0.1

 However, someone reported this crash on SnowLeopard (gcc4.2)

 I'm not sure why this happens. The crash appears to arise within the
 protobuf calls.
 Regards
 Saptarshi


 Process: R [34034]
 Path:/Applications/R64.app/Contents/MacOS/R
 Identifier:  org.R-project.R
 Version: R 2.10.0 GUI 1.30 Leopard build 64-bit (5511)
 Code Type:   X86-64 (Native)
 Parent Process:  launchd [182]

 Date/Time:   2009-10-28 20:11:54.353 -0700
 OS Version:  Mac OS X 10.6.1 (10B504)
 Report Version:  6

 Interval Since Last Report:  195786 sec
 Crashes Since Last Report:   2
 Per-App Interval Since Last Report:  1676 sec
 Per-App Crashes Since Last Report:   1
 Anonymous UUID:  0A96FBCF-6045-4A38-A8E5-619A52D23CE5

 Exception Type:  EXC_CRASH (SIGABRT)
 Exception Codes: 0x, 0x
 Crashed Thread:  0  Dispatch queue: com.apple.main-thread

 Application Specific Information:
 abort() called

 Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
 0   libSystem.B.dylib   0x7fff836bdff6 __kill + 10
 1   libSystem.B.dylib   0x7fff8375f072 abort + 83
 2   libSystem.B.dylib   0x7fff83676095 free + 128
 3   libstdc++.6.dylib               0x7fff87aa71e8 std::string::reserve(unsigned long) + 90
 4   libstdc++.6.dylib               0x7fff87aa742b std::string::append(char const*, unsigned long) + 127
 5   libprotobuf.4.dylib             0x000116a0989c google::protobuf::io::CodedInputStream::ReadString(std::string*, int) + 236 (coded_stream.h:761)
 6   Rhipe.so                        0x00011430909c STRING::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 396
 7   Rhipe.so                        0x0001143099d8 REXP::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 2104
 8   Rhipe.so                        0x000114309de8 REXP::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 3144
 9   libprotobuf.4.dylib             0x0001169f893e google::protobuf::MessageLite::ParseFromArray(void const*, int) + 62 (message_lite.cc:104)
 10  Rhipe.so                        0x00011430d5d3 unserializeUsingPB + 99







 Saptarshi Guha | saptarshi.g...@gmail.com | http://www.stat.purdue.edu/~sguha
 The use of anthropomorphic terminology when dealing with computing
 systems
 is a symptom of professional immaturity.
-- Edsger W. Dijkstra


 






[protobuf] On SnowLeopard, EXC_CRASH (SIGABRT)

2009-10-30 Thread Saptarshi Guha

Hello,
I have a byte array, about 3 KB, which I'd like to deserialize.
On a RHEL 5 64-bit machine with protobuf 2.2, my deserialization works.
On Leopard 10.5.7 on a MacBook it also works (for both 32-bit and 64-bit  
versions).


All of the above used gcc 4.0.1.

However, someone reported the crash below on Snow Leopard (gcc 4.2).

I'm not sure why this happens; the crash appears to arise within the  
protobuf calls.
Regards
Saptarshi


Process: R [34034]
Path:/Applications/R64.app/Contents/MacOS/R
Identifier:  org.R-project.R
Version: R 2.10.0 GUI 1.30 Leopard build 64-bit (5511)
Code Type:   X86-64 (Native)
Parent Process:  launchd [182]

Date/Time:   2009-10-28 20:11:54.353 -0700
OS Version:  Mac OS X 10.6.1 (10B504)
Report Version:  6

Interval Since Last Report:  195786 sec
Crashes Since Last Report:   2
Per-App Interval Since Last Report:  1676 sec
Per-App Crashes Since Last Report:   1
Anonymous UUID:  0A96FBCF-6045-4A38-A8E5-619A52D23CE5

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x, 0x
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Application Specific Information:
abort() called

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libSystem.B.dylib   0x7fff836bdff6 __kill + 10
1   libSystem.B.dylib   0x7fff8375f072 abort + 83
2   libSystem.B.dylib   0x7fff83676095 free + 128
3   libstdc++.6.dylib               0x7fff87aa71e8 std::string::reserve(unsigned long) + 90
4   libstdc++.6.dylib               0x7fff87aa742b std::string::append(char const*, unsigned long) + 127
5   libprotobuf.4.dylib             0x000116a0989c google::protobuf::io::CodedInputStream::ReadString(std::string*, int) + 236 (coded_stream.h:761)
6   Rhipe.so                        0x00011430909c STRING::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 396
7   Rhipe.so                        0x0001143099d8 REXP::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 2104
8   Rhipe.so                        0x000114309de8 REXP::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) + 3144
9   libprotobuf.4.dylib             0x0001169f893e google::protobuf::MessageLite::ParseFromArray(void const*, int) + 62 (message_lite.cc:104)
10  Rhipe.so                        0x00011430d5d3 unserializeUsingPB + 99







Saptarshi Guha | saptarshi.g...@gmail.com | http://www.stat.purdue.edu/~sguha
The use of anthropomorphic terminology when dealing with computing  
systems
is a symptom of professional immaturity.
-- Edsger W. Dijkstra





[protobuf] Re: File already exists in database: abort trap

2009-10-29 Thread Saptarshi Guha


On Oct 29, 2009, at 2:26 AM, Kenton Varda ken...@google.com wrote:

 I'm not familiar with R.  But, the error means that you've attempted  
 to load two different copies of rexp.pb.cc into the same process.   
 The protobuf runtime requires that all compiled-in .proto files have  
 unique names.

 Note that if you have two files with the same name, but in different  
 directories, then you should make sure to run protoc from the common  
 parent directory.  E.g., if you have foo/myproto.proto and bar/ 
 myproto.proto, DON'T do this:

   (cd foo && protoc myproto.proto)
   (cd bar && protoc myproto.proto)

 In this case, protoc assigns the same name to both files.

 Instead, you should do this:

   protoc foo/myproto.proto
   protoc bar/myproto.proto

 Then the directory name becomes part of each file name, making them  
 different.  Note that you can also use --proto_path to identify the  
 top-level source directory instead of actually running protoc from it.

Thanks, that is what is happening: the same file is being loaded  
twice. I'll follow your suggestion and see how things go.
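Concretely (directory and output-path names here are illustrative), the difference between the two invocation styles Kenton describes:

```shell
# DON'T: running protoc from inside each directory registers both
# files under the bare name "myproto.proto", which collides when both
# generated files are loaded into one process:
#   (cd foo && protoc --cpp_out=. myproto.proto)
#   (cd bar && protoc --cpp_out=. myproto.proto)

# DO: run from the common parent (or point --proto_path at it), so the
# directory becomes part of the registered file name:
protoc --proto_path=. --cpp_out=gen foo/myproto.proto   # registered as foo/myproto.proto
protoc --proto_path=. --cpp_out=gen bar/myproto.proto   # registered as bar/myproto.proto
```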


 

 PS.  I notice you wrote this:

 http://lists.r-forge.r-project.org/pipermail/rprotobuf-yada/2009-October/00.html

 There is a missing option (c) here:  Do like option (b), but do NOT  
 copy the protoc source code.  Instead, simply link against  
 libprotoc, which is one of the libraries included in the protobuf  
 package -- it provides the CommandLineInterface class, so all you  
 need to do is write a CodeGenerator implementation and a main()  
 function.  The .proto language DOES change over time (we add new  
 features), so you should not try to write your own parser!

Yes, after I sent that email, Romain pointed me to the C++ API,  
whereupon I found the methods you mentioned. My mistake. I saw the  
Java generator and got a rudimentary idea of how it was being done.


Thanks again for your prompt help
Regards
Saptarshi


 In the future I'm hoping to make protoc itself support plugins;  
 see this thread:
 http://groups.google.com/group/protobuf/browse_thread/thread/1010da67218d45d2

 On Wed, Oct 28, 2009 at 10:12 PM, Fishtank  
 saptarshi.g...@gmail.com wrote:

 Hello,
 I use R 2.9.2 and have written two packages which use a proto file,
 one proto is a subset of the other.
 When I load both packages(one after the other,though order does not
 matter) loading the second causes the following crash

 libprotobuf ERROR google/protobuf/descriptor_database.cc:56] File
 already exists in database: rexp.proto
 libprotobuf FATAL google/protobuf/descriptor.cc:857] CHECK failed:
 generated_database_->Add(encoded_file_descriptor, size):
 Abort trap


 What does this mean, and is there something I'm doing wrong on my
 side?
 Thank you for your time
 Regards
 Saptarshi

 





Re: A quick question regarding writing protobuf message to Stream preceded by Header

2009-08-28 Thread Saptarshi Guha

Hello,
Thanks much for the answers. I did perform some tests and your
statements hold true (with marginal differences, however):
for small messages (~7 KB), the FileDescriptor method is faster than
SerializeToString; for larger messages the latter is faster.

I tried a typical case (for me): creating an R runif(N) object (once),
serializing it using protobufs, writing this out, and repeating this M
times.
For small N (say 125) *FD is better, and for larger N (2000, about 15 KB)
*String is better. However, I did notice about a 10% improvement (not a
very rigorous experiment) for the FD method over the *String method when
it came to writing tiny messages (~1 KB) 10MM (= M) times.

Surprisingly, output to an array is much slower than the other two.

Thanks for your input, it was really helpful.
Regards
Saptarshi

On Thu, Aug 27, 2009 at 10:19 PM, Kenton Varda ken...@google.com wrote:
 BTW, when I talk about one thing being more efficient than another, it's
 really a matter of a few percent difference.  For the vast majority of apps,
 it does not matter.  I'd suggest not worrying about it unless you're really
 sure you need to improve your performance *and* profiling shows that you
 spend a lot of time in protobuf code.

 On Thu, Aug 27, 2009 at 7:18 PM, Kenton Varda ken...@google.com wrote:


 On Thu, Aug 27, 2009 at 2:06 PM, Saptarshi Guha saptarshi.g...@gmail.com
 wrote:

 Hello
 I was thinking about this and had some questions

 On Mon, Aug 24, 2009 at 3:29 PM, Kenton Varda ken...@google.com wrote:
  Generally the most efficient way to serialize a message to stdout is:
    message.SerializeToFileDescriptor(STDOUT_FILENO);
  (If your system doesn't define STDOUT_FILENO, just use the number 1.)
  If you normally use C++'s cout, you might want to write to that
  instead:
    message.SerializeToOstream(&std::cout);

 Does the protobuf library buffer on the file descriptor?

 Yes.


 I am opening stdout in binary mode, changing the buffer size (setvbuf),
 and writing to that.
 If I give SerializeToFileDescriptor the file descriptor of this new
 FILE* object, I guess it won't use my buffer (I know fwrite uses write,
 but does write care about the buffer of the FILE* object?).

 That is correct.  FILE* adds a buffering layer on top of the fd.  If you
 wanted protobuf to write to that buffer, you could probably write an
 implementation of protobuf::io::CopyingOutputStream for FILE* and wrap it in
 a protobuf::io::CopyingOutputStreamAdaptor, then pass that to
 message.SerializeToZeroCopyStream().


  For small messages, it may be slightly faster to serialize to a string
  and
  then write that.  But the difference there would be small, and if it
  matters
  to you we should probably just fix the protobuf library to do this
  optimization automatically...
 I should point out that my messages will be in the kb and definitely
 less than an MB.

 For small messages, I mean ~4kb or less.  The issue is that
 SerializeToFileDescriptor() allocates an 8k buffer internally, which is
 wasteful if the message is much less than 8k.  We should fix it so that it
 doesn't do that for small messages.


 You mention serializing to string. However I also see a method
 SerializeToArray .
 What is the difference?

 With SerializeToArray() you need to make sure the array is big enough
 ahead of time, whereas SerializeToString() will allocate a string of the
 correct size.  You can call ByteSize() in order to size your array, but when
 you then call SerializeToArray() it will have to call ByteSize() again
 internally, which is wasteful.  To allocate a correctly-sized array and
 serialize to it with optimal efficiency you have to use ByteSize() and then
 call SerializeToArrayWithCachedSizes() -- which reuses the sizes computed by
 the previous ByteSize() call.  Actually, I guess that's not very hard, is
 it?  It used to be harder.


 To avoid repeated mallocs/free, I intend to keep one  global
 array(resizing if required)

 If you reuse a single std::string object, you should get the same effect.
  string::clear() does not free the backing array, it just sets the size to
 zero.  So, it will reuse that array the next time you serialize into it.


 , writing to that array, keeping track of the bytes written, and
 writing the array out to the stream.
 Since my app is not threaded, I do not have an issue of multiple
 threads writing to that single array.
 However, if SerializeToFileDescriptor is still better than this
 approach, there is no need for this.

 SerializeToFileDescriptor() is better if your messages are very large
 because it avoids allocating large contiguous blocks of memory, which can
 cause memory fragmentation.  Otherwise it has no advantage over serializing
 to an array and then writing it to the file.



  All of these methods require that you write the size first if you
  intend to
  write multiple messages to the stream.

 Yes, I will be writing the length first.

 Ah, of course, in this case you have to call ByteSize() anyway, so if
 you're really worried