Re: [protobuf] Re: Protocol buffers and large data sets

2010-05-27 Thread Terri Kamm
Thanks, that worked!

Terri


On Mon, May 24, 2010 at 4:46 PM, Kenton Varda ken...@google.com wrote:
 My guess is that you're using a single CodedInputStream to read all your
 input, repeatedly calling message.ParseFromCodedStream().  Instead, create a
 new CodedInputStream for each message.  If you construct it on the stack,
 there is no significant overhead to doing this:
   while (true) {
     CodedInputStream stream(input);
     // read one message, or break if at EOF
   }

 On Mon, May 24, 2010 at 12:21 PM, Terri terri.k...@gmail.com wrote:

 Hi,

 I've been struggling to figure out exactly how to do the many
 smaller messages approach. I've implemented this strategy, and it is
 working except for a byte-limit problem:


 http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no

 I also raised the byte limit to maxint using SetTotalBytesLimit.

 I use a Python program to read my data from disk and package it up
 into messages of roughly 110 bytes each. Then I pipe them to a C++
 program that reads the messages and crunches them. But I still have a
 problem: the total number of bytes across all my smaller messages is
 greater than maxint, and the C++ program fails to read once it hits
 the limit.

 I like the protobuf approach to passing data; I just need to remove
 that limit.

 What can I do?

 Thanks,
 Terri

 On May 17, 7:00 pm, Jason Hsueh jas...@google.com wrote:
  There is a default byte-size limit of 64 MB when parsing protocol
  buffers - if a message is larger than that, it will fail to parse.
  This can be configured if you really need to parse larger messages,
  but it is generally not recommended. Additionally, ByteSize() returns
  a 32-bit integer, so there's an implicit limit on the size of data
  that can be serialized.

  You can certainly use protocol buffers with large data sets, but it's
  not recommended to have your entire data set be represented by a
  single message. Instead, see if you can break it up into smaller
  messages.

  On Mon, May 17, 2010 at 1:05 PM, sanikumbh saniku...@gmail.com wrote:
   I wanted to get some opinions on large data sets and protocol
   buffers. The Protocol Buffers project page by Google says that for
   data > 1 megabyte one should consider something different, but they
   don't mention what would happen if one crosses this limit. Are there
   any known failure modes when it comes to large data sets? What are
   your observations and recommendations from your experience on this
   front?
 
   --
   You received this message because you are subscribed to the Google
   Groups
   Protocol Buffers group.
   To post to this group, send email to proto...@googlegroups.com.
   To unsubscribe from this group, send email to
   protobuf+unsubscr...@googlegroups.com.
   For more options, visit this group at
   http://groups.google.com/group/protobuf?hl=en.
 



[protobuf] Re: Protocol buffers and large data sets

2010-05-24 Thread Terri
Hi,

I've been struggling to figure out exactly how to do the many
smaller messages approach. I've implemented this strategy, and it is
working except for a byte-limit problem:

http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no

I also raised the byte limit to maxint using SetTotalBytesLimit.

I use a Python program to read my data from disk and package it up
into messages of roughly 110 bytes each. Then I pipe them to a C++
program that reads the messages and crunches them. But I still have a
problem: the total number of bytes across all my smaller messages is
greater than maxint, and the C++ program fails to read once it hits
the limit.

I like the protobuf approach to passing data; I just need to remove
that limit.

What can I do?

Thanks,
Terri

On May 17, 7:00 pm, Jason Hsueh jas...@google.com wrote:
 There is a default byte-size limit of 64 MB when parsing protocol
 buffers - if a message is larger than that, it will fail to parse.
 This can be configured if you really need to parse larger messages,
 but it is generally not recommended. Additionally, ByteSize() returns
 a 32-bit integer, so there's an implicit limit on the size of data
 that can be serialized.

 You can certainly use protocol buffers with large data sets, but it's
 not recommended to have your entire data set be represented by a
 single message. Instead, see if you can break it up into smaller
 messages.

 On Mon, May 17, 2010 at 1:05 PM, sanikumbh saniku...@gmail.com wrote:
  I wanted to get some opinions on large data sets and protocol
  buffers. The Protocol Buffers project page by Google says that for
  data > 1 megabyte one should consider something different, but they
  don't mention what would happen if one crosses this limit. Are there
  any known failure modes when it comes to large data sets? What are
  your observations and recommendations from your experience on this
  front?




Re: [protobuf] Re: Protocol buffers and large data sets

2010-05-24 Thread Kenton Varda
My guess is that you're using a single CodedInputStream to read all your
input, repeatedly calling message.ParseFromCodedStream().  Instead, create a
new CodedInputStream for each message.  If you construct it on the stack,
there is no significant overhead to doing this:

  while (true) {
    CodedInputStream stream(input);
    // read one message, or break if at EOF
  }

On Mon, May 24, 2010 at 12:21 PM, Terri terri.k...@gmail.com wrote:

 Hi,

 I've been struggling to figure out exactly how to do the many
 smaller messages approach. I've implemented this strategy, and it is
 working except for a byte-limit problem:


 http://groups.google.com/group/protobuf/browse_thread/thread/038cc4ad000b4265/95981da7e07ce197?hide_quotes=no

 I also raised the byte limit to maxint using SetTotalBytesLimit.

 I use a Python program to read my data from disk and package it up
 into messages of roughly 110 bytes each. Then I pipe them to a C++
 program that reads the messages and crunches them. But I still have a
 problem: the total number of bytes across all my smaller messages is
 greater than maxint, and the C++ program fails to read once it hits
 the limit.

 I like the protobuf approach to passing data; I just need to remove
 that limit.

 What can I do?

 Thanks,
 Terri

 On May 17, 7:00 pm, Jason Hsueh jas...@google.com wrote:
  There is a default byte-size limit of 64 MB when parsing protocol
  buffers - if a message is larger than that, it will fail to parse.
  This can be configured if you really need to parse larger messages,
  but it is generally not recommended. Additionally, ByteSize() returns
  a 32-bit integer, so there's an implicit limit on the size of data
  that can be serialized.

  You can certainly use protocol buffers with large data sets, but it's
  not recommended to have your entire data set be represented by a
  single message. Instead, see if you can break it up into smaller
  messages.

  On Mon, May 17, 2010 at 1:05 PM, sanikumbh saniku...@gmail.com wrote:
   I wanted to get some opinions on large data sets and protocol
   buffers. The Protocol Buffers project page by Google says that for
   data > 1 megabyte one should consider something different, but they
   don't mention what would happen if one crosses this limit. Are there
   any known failure modes when it comes to large data sets? What are
   your observations and recommendations from your experience on this
   front?
 


