Re: how to parse a file with millions of records with protobuf

2008-12-08 Thread Jon Skeet

On Dec 7, 11:45 am, nightwalker leo [EMAIL PROTECTED] wrote:
 when I try to parse an addressbook file which has 2^20 records of
 person , my program complains like this:
 libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io
 \coded_stream.cc:459] Reading dangerously large protocol message.  If
 the message turns out to be larger than 67108864 bytes, parsing will
 be halted for security reasons.  To increase the limit (or to disable
 these warnings), see CodedInputStream::SetTotalBytesLimit() in google/
 protobuf/io/coded_stream.h.

 how to deal with the problem in an elegant way instead of increasing
 the limit or simply turning off the warning message?

In my C# port, I have code to write out messages as if they were a
repeated field #1 of a container type, and another class to read in
the same format in a stream manner, one entry at a time.

Would that be useful to you?

Jon
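Jon's actual C# code isn't shown in the thread, but the framing trick he describes can be sketched in Python. The tag byte 0x0A (field #1, wire type 2) and base-128 varint length are real protobuf wire-format details, so a file written this way parses as a container message with repeated field 1; the payloads below are raw bytes standing in for serialized Person messages.

```python
import io

# Tag for field #1, wire type 2 (length-delimited): (1 << 3) | 2 = 0x0A.
FIELD1_TAG = 0x0A

def write_varint(out, value):
    """Encode a non-negative int as a protobuf base-128 varint."""
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([bits | 0x80]))  # more bytes follow
        else:
            out.write(bytes([bits]))
            return

def read_varint(inp):
    """Decode a base-128 varint; return None at end of stream."""
    first = inp.read(1)
    if not first:
        return None
    shift, result, byte = 0, 0, first[0]
    while True:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        byte = inp.read(1)[0]

def write_delimited(out, payload):
    """Frame one serialized message as an occurrence of repeated field #1."""
    out.write(bytes([FIELD1_TAG]))
    write_varint(out, len(payload))
    out.write(payload)

def read_delimited(inp):
    """Yield one framed payload at a time, never loading the whole file."""
    while True:
        tag = read_varint(inp)
        if tag is None:
            return
        assert tag == FIELD1_TAG
        length = read_varint(inp)
        yield inp.read(length)
```

Because each record is yielded as it is read, memory use stays bounded by the largest single record rather than the whole address book.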
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: how to parse a file with millions of records with protobuf

2008-12-08 Thread David Anderson

Do you really need to have the entire file in memory at once? Reading
64M of addresses into memory seems like the wrong approach (I could
be wrong of course, since I don't know what you're doing with them).

If you need to do something with each entry individually, you could do
chunked reads: when writing, instead of serializing the whole message
at once, build several messages of ~10M each, and write them out with
a length prefix. When reading back, use the length prefix to yield
nicely sized chunks of your address book. There may even be a nice way
to do this implicitly at the input/output stream level, if it is
aware of field boundaries, but I don't have a good enough handle on
the implementation to say.
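A minimal sketch of the chunked read/write scheme described above, in Python. The fixed 4-byte big-endian length prefix is an assumption for illustration (protobuf itself doesn't mandate any particular framing); the chunks would be serialized sub-messages of ~10M each.

```python
import io
import struct

def write_chunks(out, chunks):
    """Write each serialized chunk preceded by a 4-byte big-endian length."""
    for chunk in chunks:
        out.write(struct.pack(">I", len(chunk)))
        out.write(chunk)

def read_chunks(inp):
    """Yield chunks one at a time; memory is bounded by the largest chunk."""
    while True:
        header = inp.read(4)
        if not header:
            return
        (length,) = struct.unpack(">I", header)
        yield inp.read(length)
```

Each yielded chunk can then be handed to the normal protobuf parser, which stays well under the 64M safety limit.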

If you need to find a specific entry in the address book, you should
sort the address book. You then chunk in the same manner, and add an
index message at the end of the file that lists the start offset of
all chunks. You can then do binary search over the chunks (even more
efficiently if the index includes the start and end keys of your
chunks, e.g. last names) to locate the chunk you want.
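The index-plus-binary-search idea might look like the following Python sketch. The index layout, offsets, and last-name keys are illustrative assumptions, not anything from the thread; in practice the index message would be serialized at the end of the file.

```python
import bisect

# Hypothetical index, one entry per sorted chunk:
# (first_key, last_key, start_offset_in_file).
index = [
    ("Adams", "Diaz", 0),
    ("Dixon", "Lopez", 10_485_760),
    ("Lowe", "Quinn", 20_971_520),
    ("Reed", "Zhang", 31_457_280),
]

def chunk_offset_for(last_name):
    """Binary-search the index for the chunk that may contain last_name."""
    last_keys = [entry[1] for entry in index]  # last key of each chunk
    i = bisect.bisect_left(last_keys, last_name)
    if i == len(index):
        return None  # past the last chunk
    first_key, last_key, offset = index[i]
    return offset if first_key <= last_name <= last_key else None
```

With start and end keys in the index, a lookup seeks directly to one chunk, parses only that chunk, and searches within it.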

If none of these answers are satisfactory, and you really need the
entire multi-hundred-megabyte message loaded at once, I guess you can
use SetTotalBytesLimit() to raise the safety limits to whatever you
feel is necessary. But usually, when I try to load a bunch of small
messages as one monolithic block, I find that my data format isn't
adapted to what I want to do.

Hope this helps a little
- Dave

On Sun, Dec 7, 2008 at 12:45 PM, nightwalker leo [EMAIL PROTECTED] wrote:

 when I try to parse an addressbook file which has 2^20 records of
 person , my program complains like this:
 libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io
 \coded_stream.cc:459] Reading dangerously large protocol message.  If
 the message turns out to be larger than 67108864 bytes, parsing will
 be halted for security reasons.  To increase the limit (or to disable
 these warnings), see CodedInputStream::SetTotalBytesLimit() in google/
 protobuf/io/coded_stream.h.

 how to deal with the problem in an elegant way instead of increasing
 the limit or simply turning off the warning message?





Re: how to parse a file with millions of records with protobuf

2008-12-08 Thread Kenton Varda
On Sun, Dec 7, 2008 at 3:45 AM, nightwalker leo [EMAIL PROTECTED] wrote:


 when I try to parse an addressbook file which has 2^20 records of
 person , my program complains like this:
 libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io
 \coded_stream.cc:459] Reading dangerously large protocol message.  If
 the message turns out to be larger than 67108864 bytes, parsing will
 be halted for security reasons.  To increase the limit (or to disable
 these warnings), see CodedInputStream::SetTotalBytesLimit() in google/
 protobuf/io/coded_stream.h.

 how to deal with the problem in an elegant way instead of increasing
 the limit or simply turning off the warning message?


The documentation for SetTotalBytesLimit() answers your question:

http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.io.coded_stream.html#CodedInputStream.SetTotalBytesLimit.details




Re: how to parse a file with millions of records with protobuf

2008-12-08 Thread nightwalker leo

Thanks, could you give me an example please?

On Dec 8, 4:10 pm, Jon Skeet [EMAIL PROTECTED] wrote:
 On Dec 7, 11:45 am, nightwalker leo [EMAIL PROTECTED] wrote:

  when I try to parse an addressbook file which has 2^20 records of
  person , my program complains like this:
  libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io
  \coded_stream.cc:459] Reading dangerously large protocol message.  If
  the message turns out to be larger than 67108864 bytes, parsing will
  be halted for security reasons.  To increase the limit (or to disable
  these warnings), see CodedInputStream::SetTotalBytesLimit() in google/
  protobuf/io/coded_stream.h.

  how to deal with the problem in an elegant way instead of increasing
  the limit or simply turning off the warning message?

 In my C# port, I have code to write out messages as if they were a
 repeated field #1 of a container type, and another class to read in
 the same format in a stream manner, one entry at a time.

 Would that be useful to you?

 Jon



how to parse a file with millions of records with protobuf

2008-12-07 Thread nightwalker leo

When I try to parse an addressbook file which has 2^20 records of
person, my program complains like this:
libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io
\coded_stream.cc:459] Reading dangerously large protocol message.  If
the message turns out to be larger than 67108864 bytes, parsing will
be halted for security reasons.  To increase the limit (or to disable
these warnings), see CodedInputStream::SetTotalBytesLimit() in google/
protobuf/io/coded_stream.h.

How can I deal with the problem in an elegant way, instead of increasing
the limit or simply turning off the warning message?