Re: how to parse a file with millions of records with protobuf
On Dec 7, 11:45 am, nightwalker leo [EMAIL PROTECTED] wrote:
> when I try to parse an addressbook file which has 2^20 records of person, my program complains like this:
>
> libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io\coded_stream.cc:459] Reading dangerously large protocol message. If the message turns out to be larger than 67108864 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
>
> How do I deal with this problem in an elegant way, instead of increasing the limit or simply turning off the warning message?

In my C# port, I have code to write out messages as if they were a repeated field #1 of a container type, and another class to read back the same format in a streaming manner, one entry at a time. Would that be useful to you?

Jon

You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to [EMAIL PROTECTED]. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
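For anyone wanting the same trick outside the C# port: the wire format makes this easy, because a file of back-to-back (tag, length, payload) records is byte-for-byte identical to one container message declaring `repeated Person person = 1;`. A minimal C++ sketch of that framing, using plain iostreams instead of the protobuf classes (the helper names are mine, not the library's; each `payload` stands in for a serialized `Person`):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a base-128 varint, exactly as the protobuf wire format does.
void WriteVarint(std::ostream& out, uint64_t v) {
    while (v >= 0x80) {
        out.put(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.put(static_cast<char>(v));
}

// Read a base-128 varint; returns false on end of stream.
bool ReadVarint(std::istream& in, uint64_t* v) {
    *v = 0;
    int shift = 0;
    int c;
    while ((c = in.get()) != EOF) {
        *v |= static_cast<uint64_t>(c & 0x7F) << shift;
        if (!(c & 0x80)) return true;
        shift += 7;
    }
    return false;
}

// Write one entry as if it were an element of "repeated Person person = 1;"
// in a container message: tag byte 0x0A (field 1, wire type 2 =
// length-delimited), then the payload length, then the payload bytes.
void WriteEntry(std::ostream& out, const std::string& payload) {
    out.put(0x0A);
    WriteVarint(out, payload.size());
    out.write(payload.data(), payload.size());
}

// Read the next entry in a streaming manner; returns false at end of stream.
// In real code, payload would then be handed to Person::ParseFromString().
bool ReadEntry(std::istream& in, std::string* payload) {
    int tag = in.get();
    if (tag == EOF) return false;
    assert(tag == 0x0A);  // field 1, length-delimited
    uint64_t len;
    if (!ReadVarint(in, &len)) return false;
    payload->resize(static_cast<size_t>(len));
    in.read(&(*payload)[0], static_cast<std::streamsize>(len));
    return static_cast<uint64_t>(in.gcount()) == len;
}
```

Because the reader only ever holds one entry in memory, the 64 MB per-message safety limit never comes into play, no matter how many records the file holds.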
Re: how to parse a file with millions of records with protobuf
Do you really need to have the entire file in memory at once? Reading 64 MB of addresses into memory seems like the wrong approach (I could be wrong, of course, since I don't know what you're doing with them).

If you need to do something with each entry individually, you could do chunked reads: when writing, instead of serializing the whole message at once, build several messages of ~10 MB each and write them out with a length prefix. When reading back, use the length prefix to yield nicely sized chunks of your address book. There may even be a nice way to do this implicitly at the input/output stream level, if it is aware of field boundaries, but I don't have a good enough handle on the implementation to say.

If you need to find a specific entry in the address book, you should sort the address book. You then chunk it in the same manner, and add an index message at the end of the file that lists the start offset of each chunk. You can then do a binary search over the chunks (even more efficiently if the index includes the start and end keys of your chunks, e.g. last names) to locate the chunk you want.

If none of these answers is satisfactory, and you really need the entire multi-hundred-megabyte message loaded at once, I guess you can use SetTotalBytesLimit() to raise the safety limits to whatever you feel is necessary. But usually, when I try to load a bunch of small messages as one monolithic block, I find that my data format isn't adapted to what I want to do.

Hope this helps a little
- Dave

On Sun, Dec 7, 2008 at 12:45 PM, nightwalker leo [EMAIL PROTECTED] wrote:
> [snip: original question, quoted in full below]
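Dave's chunked-read suggestion can be sketched without any protobuf machinery, since the only moving part is the length prefix. Here is a self-contained C++ version using a bare fixed 4-byte little-endian prefix (unlike the wire-format tag framing, this is a private file format of your own); in real code each `chunk` would be the `SerializeToString()` output of a ~10 MB sub-message, and the helper names are mine:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write one serialized chunk preceded by a fixed 4-byte little-endian length.
void WriteChunk(std::ostream& out, const std::string& chunk) {
    uint32_t len = static_cast<uint32_t>(chunk.size());
    char prefix[4] = {
        static_cast<char>(len & 0xFF),
        static_cast<char>((len >> 8) & 0xFF),
        static_cast<char>((len >> 16) & 0xFF),
        static_cast<char>((len >> 24) & 0xFF),
    };
    out.write(prefix, 4);
    out.write(chunk.data(), chunk.size());
}

// Read the next chunk; returns false at a clean end of stream. Each chunk
// stays far below the 64 MB safety limit, so in real code it could be
// handed to a plain Message::ParseFromString() call.
bool ReadChunk(std::istream& in, std::string* chunk) {
    char prefix[4];
    in.read(prefix, 4);
    if (in.gcount() == 0) return false;  // end of stream
    if (in.gcount() != 4) return false;  // truncated prefix
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
        len |= static_cast<uint32_t>(
                   static_cast<unsigned char>(prefix[i]))
               << (8 * i);
    }
    chunk->resize(len);
    in.read(&(*chunk)[0], static_cast<std::streamsize>(len));
    return static_cast<uint32_t>(in.gcount()) == len;
}
```

The index-at-the-end variant is the same idea one level up: record the stream offset before each `WriteChunk()` call, append an index chunk listing those offsets (and optionally each chunk's first and last key), and binary-search the index instead of scanning.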
Re: how to parse a file with millions of records with protobuf
On Sun, Dec 7, 2008 at 3:45 AM, nightwalker leo [EMAIL PROTECTED] wrote:
> [snip: original question, quoted in full below]

The documentation for SetTotalBytesLimit() answers your question:
http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.io.coded_stream.html#CodedInputStream.SetTotalBytesLimit.details
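If raising the limit really is what you want, the mechanism the linked docs describe is to wrap your input in your own CodedInputStream and call SetTotalBytesLimit(new_limit, warning_threshold) on it before parsing (in the 2.0.x C++ API both arguments are ints; 67108864 is the 64 MB default). Since that call only makes sense with the protobuf library linked in, here is a self-contained sketch of the check it controls: a reader that refuses to consume more than a configurable byte budget. All names are illustrative, not part of the protobuf API:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Mirrors the spirit of CodedInputStream::SetTotalBytesLimit(): track the
// total bytes consumed from a stream and refuse to read past a hard cap.
class LimitedReader {
public:
    LimitedReader(std::istream& in, std::size_t total_bytes_limit)
        : in_(in), limit_(total_bytes_limit), consumed_(0) {}

    // Read up to n bytes into *out; returns false (parsing "halted for
    // security reasons") once the total limit would be exceeded.
    bool Read(std::string* out, std::size_t n) {
        if (consumed_ + n > limit_) return false;
        out->resize(n);
        in_.read(&(*out)[0], static_cast<std::streamsize>(n));
        std::size_t got = static_cast<std::size_t>(in_.gcount());
        out->resize(got);
        consumed_ += got;
        return true;
    }

private:
    std::istream& in_;
    std::size_t limit_;
    std::size_t consumed_;
};
```

The limit exists because a hostile (or corrupted) length field could otherwise make the parser try to allocate gigabytes, which is why simply raising it is the last resort rather than the first.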
Re: how to parse a file with millions of records with protobuf
Thanks, could you give me an example, please?

On Dec 8, 4:10 pm, Jon Skeet [EMAIL PROTECTED] wrote:
> [snip: Jon's reply and the original question, quoted above]
how to parse a file with millions of records with protobuf
when I try to parse an addressbook file which has 2^20 records of person, my program complains like this:

libprotobuf WARNING D:\protobuf-2.0.2\src\google\protobuf\io\coded_stream.cc:459] Reading dangerously large protocol message. If the message turns out to be larger than 67108864 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.

How do I deal with this problem in an elegant way, instead of increasing the limit or simply turning off the warning message?