[protobuf] Re: how to use GzipInputStream with multiple messages?
Could someone give me a use case / example of using GzipInputStream (and GzipOutputStream) for reading a binary file with multiple object types? Regards, Alok

On Feb 2, 5:46 pm, alok alok.jad...@gmail.com wrote:

Also, what is the standard way to write to and read from a gzip stream? I am doing something like this.

To write to the stream:

code
headerMessage.SerializeToZeroCopyStream(gzip_output);
/code

To read from the stream:

code
headerMessage.ParseFromZeroCopyStream(gzip_input, headerMessage.ByteSize());
/code

Is the above approach correct? I haven't used gzip streams with protocol buffers before, and I am not able to make it work. Regards, Alok

On Feb 2, 5:18 pm, alok alok.jad...@gmail.com wrote:

How do we implement GzipInputStream to read a file with different message types? I could achieve this with a coded input stream by prepending the size of each object to the object itself. I am not sure how to get the same result using GzipInputStream. Could someone please guide me here? Regards, Alok

-- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
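For what it's worth, the part of the question that GzipInputStream does not change is the framing: GzipInputStream is just another ZeroCopyInputStream that slots between the file stream and the coded stream, so you still prefix each message with its size exactly as with a plain CodedInputStream. Below is a stdlib-only C++ sketch of that length-prefix framing, with no protobuf dependency; the file name and function names are invented for illustration. The 4-byte little-endian prefix written here is the same shape of prefix that CodedInputStream::ReadLittleEndian32 would consume.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Write one record as: 4-byte little-endian length, then the payload bytes.
void WriteDelimited(std::ofstream& out, const std::string& payload) {
    uint32_t len = static_cast<uint32_t>(payload.size());
    unsigned char hdr[4] = {
        static_cast<unsigned char>(len & 0xff),
        static_cast<unsigned char>((len >> 8) & 0xff),
        static_cast<unsigned char>((len >> 16) & 0xff),
        static_cast<unsigned char>((len >> 24) & 0xff)};
    out.write(reinterpret_cast<const char*>(hdr), 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read one record; returns false on a clean end-of-file.
bool ReadDelimited(std::ifstream& in, std::string* payload) {
    unsigned char hdr[4];
    if (!in.read(reinterpret_cast<char*>(hdr), 4)) return false;
    uint32_t len = static_cast<uint32_t>(hdr[0]) |
                   (static_cast<uint32_t>(hdr[1]) << 8) |
                   (static_cast<uint32_t>(hdr[2]) << 16) |
                   (static_cast<uint32_t>(hdr[3]) << 24);
    payload->resize(len);
    if (len == 0) return true;
    return static_cast<bool>(in.read(&(*payload)[0], static_cast<std::streamsize>(len)));
}

// Round-trip a few variable-length records through a temporary file.
std::vector<std::string> FramingDemo() {
    const char* path = "framing_demo.bin";
    {
        std::ofstream out(path, std::ios::binary);
        WriteDelimited(out, "header");
        WriteDelimited(out, "quote-1");
        WriteDelimited(out, "trade-22");
    }
    std::vector<std::string> records;
    std::ifstream in(path, std::ios::binary);
    std::string rec;
    while (ReadDelimited(in, &rec)) records.push_back(rec);
    std::remove(path);  // clean up the temporary file
    return records;
}
```

With protocol buffers the layering would typically be FileInputStream, then GzipInputStream, then CodedInputStream on top, using WriteLittleEndian32 / ReadLittleEndian32 for the size prefix and PushLimit/PopLimit around each parse; the compression layer is transparent to the framing.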
[protobuf] Re: suggestions on improving the performance?
anymore suggestions?

On Jan 16, 11:14 am, alok alok.jad...@gmail.com wrote:

google groups link: http://groups.google.com/group/protobuf/browse_thread/thread/64a07911...

I tested the code with reuse of the coded input object. Not much change in the speed.

code
void ReadAllMessages(ZeroCopyInputStream *raw_input, stdext::hash_set<std::string> instruments) {
    int item_count = 0;
    CodedInputStream* in = new CodedInputStream(raw_input);
    in->SetTotalBytesLimit(1e9, 9e8);
    while (1) {
        if (item_count % 20 == 0) {
            delete in;
            in = new CodedInputStream(raw_input);
            in->SetTotalBytesLimit(1e9, 9e8);
        }
        if (!ReadNextRecord(in, instruments))
            break;
        item_count++;
    }
    cout << "Finished reading file. Total " << item_count << " items read." << endl;
}
/code

I reset the coded input object every 200k objects; there are a total of around 650k objects in the file. I get a feeling this slowness is because of my binary file format. Is there anything I can change so that I can read it faster, e.g. removing optional fields and keeping the format as raw as possible?

regards, Alok

On Jan 16, 10:40 am, alok alok.jad...@gmail.com wrote:

here is the link to a forum post which states why I have to set the limit: http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3w...

excerpt from the link: "The problem is that CodedInputStream has an internal counter of how many bytes have been read so far with the same object. In my case, there are a lot of small messages saved in the same file. I do not read them all at once and therefore do not care about large-message limits. I am safe. So, the problem can easily be solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1e9, 9e8);

My use case is really about storing an extremely large number (up to 1e9) of small messages, ~10K each."

My problem is the same as above, so I will have to set the limits on the coded input object.
[protobuf] Re: suggestions on improving the performance?
code
(int argc, _TCHAR* argv[]) {
    GOOGLE_PROTOBUF_VERIFY_VERSION;

    ZeroCopyInputStream *raw_input;
    CodedInputStream *coded_input;
    stdext::hash_set<std::string> instruments;

    string filename = "S:/users/aaj/sandbox/tickdata/bin/hk/2011/2011.01.04.bin";
    int fd = _open(filename.c_str(), _O_BINARY | O_RDONLY);
    if (fd == -1) {
        printf("Error opening the file. \n");
        exit(1);
    }

    raw_input = new FileInputStream(fd);
    coded_input = new CodedInputStream(raw_input);

    uint32 magic_no;
    coded_input->ReadLittleEndian32(&magic_no);
    cout << "HEADER: \t" << magic_no << endl;
    cout << "Reading data objects.." << endl;
    delete coded_input;

    cout << td << '\n';
    ReadAllMessages(raw_input, instruments);
    cout << td << '\n';

    delete raw_input;
    _close(fd);
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}
/code

On Jan 14, 3:37 am, Henner Zeller henner.zel...@googlemail.com wrote:

On Fri, Jan 13, 2012 at 11:22, Daniel Wright dwri...@google.com wrote: It's extremely unlikely that text parsing is faster than binary parsing on pretty much any message. My guess is that there's something wrong in the way you're reading the binary file -- e.g. no buffering, or possibly a bug where you hand the protobuf library multiple messages concatenated together.

In particular, the "object type, object, object type, object ..." layout doesn't seem to include headers that describe the length of the following message, but such a separator is needed. (http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...) It'd be easier to comment if you post the code. Cheers Daniel

On Fri, Jan 13, 2012 at 1:22 AM, alok alok.jad...@gmail.com wrote: any suggestions? experiences?
regards, Alok

On Jan 11, 1:16 pm, alok alok.jad...@gmail.com wrote:

my point is: should I have one message, something like

code
message Record {
    required HeaderMessage header;
    optional TradeMessage trade;
    repeated QuoteMessage quotes;   // 0 or more
    repeated CustomMessage customs; // 0 or more
}
/code

or should I rather keep my file plain, as "object type, object, object type, object", without worrying about the concept of a record? Each message in the file is usually a header + any 1 type of message (trade, quote or custom), and mostly only 1 quote or custom message, not more. What would be faster to decode? Regards, Alok

On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:

Hi everyone, my program is taking more time to read binary files than text files. I think the issue is with the structure of the binary files that I have designed. (Or could it be that binary decoding is slower than text-file parsing?) The data file is a large text file with 1 record per row, up to 1.2 GB. The binary file is around 900 MB.

- Text file reading takes 3 minutes.
- Binary file reading takes 5 minutes.

I saw very strange behavior. Just to see how long it takes to skim through the binary file, I started reading the header on each message, which holds the length of the message, and then skipped that many bytes using the Skip() function of the coded_input object. After this change I was expecting that reading through the file should take less time, but it took more than 10 minutes. Is skipping not the same as adding n bytes to the file pointer? Is it slower to skip an object than to read it? Are there any guidelines on how the structure should be designed to get the best performance?
[protobuf] Re: suggestions on improving the performance?
I was actually doing that initially, but I kept getting an error about the maximum length for a message being reached (I don't have the exact error string at the moment). This was because my input binary file is large and it reaches the limit for the coded input very fast. I saw a post on the forum (or maybe on Stack Exchange) which suggested that I should create a new coded_input object for each message; otherwise I have to reset the limits on the coded input object. A user on that thread suggested that it is cheap to create and destroy a coded_input object -- these objects are not big. Anyway, I will try it again by resetting the limits on this object. But then, could this be causing the slowness? I will try it and let you know the results. Regards, Alok

On Jan 16, 9:46 am, Daniel Wright dwri...@google.com wrote: You're making a new CodedInputStream for each message -- I think that gives very poor buffering behavior. You should just pass coded_input to ReadAllMessages and keep reusing it. Cheers Daniel

On Sun, Jan 15, 2012 at 4:41 PM, alok alok.jad...@gmail.com wrote: Daniel, I am hoping that my code is incorrect, but I am not sure what is wrong or what is really causing this slowness. @Henner Zeller: sorry, I forgot to include the object length in the above example. I do store the object length for each object, and I don't have issues reading the objects -- the code is working fine. I just want to make it run faster. Attaching my code here... The file format is: file header, Record1, Record2, Record3, ... Each record contains n objects of the types defined in the proto file. The 1st object of a record has a header which contains the number of objects in that record.
code
// proto file
message HeaderMessage {
    required double timestamp = 1;
    required string ric_code = 2;
    required int32 count = 3;
    required int32 total_message_size = 4;
}

message QuoteMessage {
    enum Side {
        ASK = 0;
        BID = 1;
    }
    required Side type = 1;
    required int32 level = 2;
    optional double price = 3;
    optional int64 size = 4;
    optional int32 count = 5;
    optional HeaderMessage header = 6;
}

message CustomMessage {
    required string field_name = 1;
    required double value = 2;
    optional HeaderMessage header = 3;
}

message TradeMessage {
    optional double price = 1;
    optional int64 size = 2;
    optional int64 AccumulatedVolume = 3;
    optional HeaderMessage header = 4;
}

message AlphaMessage {
    required int32 level = 1;
    required double alpha = 2;
    optional double stddev = 3;
    optional HeaderMessage header = 4;
}
/code

code
// Reading records from binary file
bool ReadNextRecord(CodedInputStream *coded_input, stdext::hash_set<std::string> instruments) {
    uint32 count, objtype, objlen;
    int i;
    int objectsread = 0;
    HeaderMessage *hMsg = NULL;
    TradeMessage tMsg;
    QuoteMessage qMsg;
    CustomMessage cMsg;
    AlphaMessage aMsg;
    while (1) {
        if (!coded_input->ReadLittleEndian32(&objtype)) { return false; }
        if (!coded_input->ReadLittleEndian32(&objlen)) { return false; }
        CodedInputStream::Limit lim = coded_input->PushLimit(objlen);
        switch (objtype) {
        case 2:
            qMsg.ParseFromCodedStream(coded_input);
            if (qMsg.has_header()) {
                //hMsg = qMsg.mutable_header();
                hMsg = new HeaderMessage();
                hMsg->Clear();
                hMsg->Swap(qMsg.mutable_header());
            }
            objectsread++;
            break;
        case 3:
            tMsg.ParseFromCodedStream(coded_input);
            if (tMsg.has_header()) {
                //hMsg = tMsg.mutable_header();
                hMsg = new HeaderMessage();
                hMsg->Clear();
                hMsg->Swap(tMsg.mutable_header());
            }
            objectsread++;
            break;
        case 4:
            aMsg.ParseFromCodedStream(coded_input);
            if (aMsg.has_header()) {
                //hMsg = aMsg.mutable_header();
                hMsg = new HeaderMessage();
                hMsg->Clear();
                hMsg->Swap(aMsg.mutable_header());
            }
            objectsread++;
            break
[protobuf] Re: suggestions on improving the performance?
here is the link to a forum post which states why I have to set the limit: http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3wj2htwajjof+state:results

excerpt from the link: "The problem is that CodedInputStream has an internal counter of how many bytes have been read so far with the same object. In my case, there are a lot of small messages saved in the same file. I do not read them all at once and therefore do not care about large-message limits. I am safe. So, the problem can easily be solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1e9, 9e8);

My use case is really about storing an extremely large number (up to 1e9) of small messages, ~10K each."

My problem is the same as above, so I will have to set the limits on the coded input object. Regards, Alok

On Jan 16, 10:26 am, alok alok.jad...@gmail.com wrote: I was actually doing that initially, but I kept getting an error about the maximum length for a message being reached (I don't have the exact error string at the moment). This was because my input binary file is large and it reaches the limit for the coded input very fast. I saw a post on the forum (or maybe on Stack Exchange) which suggested that I should create a new coded_input object for each message; otherwise I have to reset the limits on the coded input object. A user on that thread suggested that it is cheap to create and destroy a coded_input object -- these objects are not big. Anyway, I will try it again by resetting the limits on this object. But then, could this be causing the slowness? I will try it and let you know the results. Regards, Alok

On Jan 16, 9:46 am, Daniel Wright dwri...@google.com wrote: You're making a new CodedInputStream for each message -- I think that gives very poor buffering behavior. You should just pass coded_input to ReadAllMessages and keep reusing it.
[protobuf] Re: suggestions on improving the performance?
google groups link: http://groups.google.com/group/protobuf/browse_thread/thread/64a07911e3c90cd5

I tested the code with reuse of the coded input object. Not much change in the speed.

code
void ReadAllMessages(ZeroCopyInputStream *raw_input, stdext::hash_set<std::string> instruments) {
    int item_count = 0;
    CodedInputStream* in = new CodedInputStream(raw_input);
    in->SetTotalBytesLimit(1e9, 9e8);
    while (1) {
        if (item_count % 20 == 0) {
            delete in;
            in = new CodedInputStream(raw_input);
            in->SetTotalBytesLimit(1e9, 9e8);
        }
        if (!ReadNextRecord(in, instruments))
            break;
        item_count++;
    }
    cout << "Finished reading file. Total " << item_count << " items read." << endl;
}
/code

I reset the coded input object every 200k objects; there are a total of around 650k objects in the file. I get a feeling this slowness is because of my binary file format. Is there anything I can change so that I can read it faster, e.g. removing optional fields and keeping the format as raw as possible?

regards, Alok

On Jan 16, 10:40 am, alok alok.jad...@gmail.com wrote: here is the link to a forum post which states why I have to set the limit: http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3w... excerpt from the link: "The problem is that CodedInputStream has an internal counter of how many bytes have been read so far with the same object. In my case, there are a lot of small messages saved in the same file. I do not read them all at once and therefore do not care about large-message limits. I am safe. So, the problem can easily be solved by calling: CodedInputStream input_stream(...); input_stream.SetTotalBytesLimit(1e9, 9e8); My use case is really about storing an extremely large number (up to 1e9) of small messages, ~10K each." My problem is the same as above, so I will have to set the limits on the coded input object.
[protobuf] Re: suggestions on improving the performance?
any suggestions? experiences? regards, Alok

On Jan 11, 1:16 pm, alok alok.jad...@gmail.com wrote:

my point is: should I have one message, something like

code
message Record {
    required HeaderMessage header;
    optional TradeMessage trade;
    repeated QuoteMessage quotes;   // 0 or more
    repeated CustomMessage customs; // 0 or more
}
/code

or should I rather keep my file plain, as "object type, object, object type, object", without worrying about the concept of a record? Each message in the file is usually a header + any 1 type of message (trade, quote or custom), and mostly only 1 quote or custom message, not more. What would be faster to decode? Regards, Alok

On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:

Hi everyone, my program is taking more time to read binary files than text files. I think the issue is with the structure of the binary files that I have designed. (Or could it be that binary decoding is slower than text-file parsing?) The data file is a large text file with 1 record per row, up to 1.2 GB. The binary file is around 900 MB.

- Text file reading takes 3 minutes.
- Binary file reading takes 5 minutes.

I saw very strange behavior. Just to see how long it takes to skim through the binary file, I started reading the header on each message, which holds the length of the message, and then skipped that many bytes using the Skip() function of the coded_input object. After this change I was expecting that reading through the file should take less time, but it took more than 10 minutes. Is skipping not the same as adding n bytes to the file pointer? Is it slower to skip an object than to read it? Are there any guidelines on how the structure should be designed to get the best performance?
[protobuf] suggestions on improving the performance?
Hi everyone,

My program is taking more time to read binary files than text files. I think the issue is with the structure of the binary files that I have designed. (Or could it be that binary decoding is slower than text-file parsing?) The data file is a large text file with 1 record per row, up to 1.2 GB. The binary file is around 900 MB.

- Text file reading takes 3 minutes.
- Binary file reading takes 5 minutes.

I saw very strange behavior. Just to see how long it takes to skim through the binary file, I started reading the header on each message, which holds the length of the message, and then skipped that many bytes using the Skip() function of the coded_input object. After this change I was expecting that reading through the file should take less time, but it took more than 10 minutes. Is skipping not the same as adding n bytes to the file pointer? Is it slower to skip an object than to read it? Are there any guidelines on how the structure should be designed to get the best performance?

My current structure looks as below:

code
message HeaderMessage {
    required double timestamp = 1;
    required string ric_code = 2;
    required int32 count = 3;
    required int32 total_message_size = 4;
}

message QuoteMessage {
    enum Side {
        ASK = 0;
        BID = 1;
    }
    required Side type = 1;
    required int32 level = 2;
    optional double price = 3;
    optional int64 size = 4;
    optional int32 count = 5;
    optional HeaderMessage header = 6;
}

message CustomMessage {
    required string field_name = 1;
    required double value = 2;
    optional HeaderMessage header = 3;
}

message TradeMessage {
    optional double price = 1;
    optional int64 size = 2;
    optional int64 AccumulatedVolume = 3;
    optional HeaderMessage header = 4;
}
/code

The binary file format is "object type, object, object type, object ...". The 1st object of a record holds a header with n, the number of objects in that record. The next n-1 objects do not hold a header, since they all belong to the same record (same update time).
The (n+1)th object belongs to a new record and holds the header for the next record. Any advice? Regards, Alok
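The skip-vs-read question above can be sanity-checked without protobuf at all. On an uncompressed, length-prefixed file, skipping a record costs a header read plus a seek, which touches far fewer bytes than parsing it; if a skip-only scan is slower than a full read, the overhead is likely elsewhere (stream construction per message, buffering, and so on). Here is a stdlib-only C++ sketch under that assumption -- the file name and helper names are invented, and the raw length write assumes a little-endian host:

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Append one record: 4-byte length prefix, then the payload.
static void WriteRecord(std::ofstream& out, const std::string& payload) {
    uint32_t len = static_cast<uint32_t>(payload.size());
    out.write(reinterpret_cast<const char*>(&len), 4);  // little-endian host assumed
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Count records by reading only each length prefix and seeking past the payload.
int CountBySkipping(const char* path) {
    std::ifstream in(path, std::ios::binary);
    uint32_t len = 0;
    int count = 0;
    while (in.read(reinterpret_cast<char*>(&len), 4)) {
        in.seekg(len, std::ios::cur);  // jump over the payload without reading it
        if (!in) break;                // truncated final record
        ++count;
    }
    return count;
}

// Build a small demo file, scan it by skipping, and return the record count.
int SkipDemo() {
    const char* path = "skip_demo.bin";
    {
        std::ofstream out(path, std::ios::binary);
        for (int i = 0; i < 500; ++i)
            WriteRecord(out, std::string(100, 'x'));  // 500 records of 100 bytes
    }
    int n = CountBySkipping(path);
    std::remove(path);  // clean up the demo file
    return n;
}
```

Note that this shortcut disappears once the file is gzip-compressed: a gzip stream cannot seek, so "skipping" still has to decompress every byte it passes over.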
[protobuf] Re: suggestions on improving the performance?
my point is: should I have one message, something like

code
message Record {
    required HeaderMessage header;
    optional TradeMessage trade;
    repeated QuoteMessage quotes;   // 0 or more
    repeated CustomMessage customs; // 0 or more
}
/code

or should I rather keep my file plain, as "object type, object, object type, object", without worrying about the concept of a record? Each message in the file is usually a header + any 1 type of message (trade, quote or custom), and mostly only 1 quote or custom message, not more. What would be faster to decode? Regards, Alok

On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:

Hi everyone, my program is taking more time to read binary files than text files. I think the issue is with the structure of the binary files that I have designed. (Or could it be that binary decoding is slower than text-file parsing?) The data file is a large text file with 1 record per row, up to 1.2 GB. The binary file is around 900 MB.

- Text file reading takes 3 minutes.
- Binary file reading takes 5 minutes.

I saw very strange behavior. Just to see how long it takes to skim through the binary file, I started reading the header on each message, which holds the length of the message, and then skipped that many bytes using the Skip() function of the coded_input object. After this change I was expecting that reading through the file should take less time, but it took more than 10 minutes. Is skipping not the same as adding n bytes to the file pointer? Is it slower to skip an object than to read it? Are there any guidelines on how the structure should be designed to get the best performance?
[protobuf] nested message not read properly. (_has_bits is not set?)
Hi,

I have a nested message that I am trying to read using protocol buffers. The message looks as below:

code
message HeaderMessage {
    required double timestamp = 1;
    required string ric_code = 2;
    required int32 count = 3;
    required int32 total_message_size = 4;
}

message CustomMessage {
    required string field_name = 1;
    required double value = 2;
    optional HeaderMessage header = 3;
}
/code

Here I am setting the header field in the CustomMessage and writing the custom message to a binary output file. I know for sure that the message is written properly to the binary file, because I am able to retrieve it correctly using the C# library (protobuf-net). I am trying to read the same file using the C++ protocol buffers library, but when I read the object from the coded stream with

code
cMsg.ParseFromCodedStream(coded_input);
/code

I see that the header is not set in cMsg. I looked inside the protocol buffers library. While reading the object, it checks whether the header is set using the following function:

code
inline bool CustomMessage::has_header() const {
    return (_has_bits_[0] & 0x0004u) != 0;
}
/code

This function returns false, and the header object is not read. When I write the object to the binary file, the value of _has_bits_ was 0x00fb6ff8, but when I read the custom message from the binary file, the value of _has_bits_ is unchanged before and after reading the object; this value is 0x0012fbcc. For this value, has_header() returns false. So when I call ParseFromCodedStream, _has_bits_ is not set properly, causing a problem in reading the header object. What am I doing wrong, and how can I resolve this issue? Thanks for your help. -Alok
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Re: nested message not read properly. (_has_bits is not set?)
Apologies for the incorrect values of _has_bits_ in my previous message. Those were the addresses of the variable _has_bits_, but the behavior of _has_bits_ is the same. When I initialize the header while writing, using cMsg.mutable_header(), it sets the 3rd bit in _has_bits_ and the value of _has_bits_ is 7 (= 0b0111). But when I read cMsg in my reader program, the value of _has_bits_ is 3 (= 0b0011): the 3rd bit is not set, and hence the header message is not read. I am looking further into the code to understand why this bit is not set. I would appreciate it if you could tell me what I am doing wrong in this case. Regards, Alok
[protobuf] Re: nested message not read properly. (_has_bits is not set?)
On further investigation, it looks like the issue could be due to the limits. I have the following code:

lim = coded_input->PushLimit(objlen);
cMsg.ParseFromCodedStream(coded_input);
coded_input->PopLimit(lim);
cmsgsize = cMsg.ByteSize();

Above, the value of objlen is 44. The length of cMsg should have been 44 bytes (including the header message), but when I check cMsg.ByteSize(), it returns 20. It is reading only 20 bytes instead of 44. Inside the PushLimit() function, at buffer_end_ += buffer_size_after_limit_;, the value of buffer_size_after_limit_ is 0 and buffer_end_ ends after 20 bytes, reading only a partial message. Help me investigate this issue further. Regards, Alok
[protobuf] Re: nested message not read properly. (_has_bits is not set?)
The issue is resolved now. I made the same mistake I have made in the past: I forgot to open the file in binary mode. The stream encountered an early EOF because of text mode. Now it is working fine. Thanks, Alok
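The fix above (binary mode) matters because serialized protobuf bytes routinely contain values that Windows text mode mangles: 0x0A gets rewritten as 0x0D 0x0A on write, and a 0x1A (Ctrl-Z) byte is treated as end-of-file on read. A minimal sketch of the safe round-trip, using only the standard library (the path name is mine):

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Round-trip raw bytes through a file. std::ios::binary is the important
// part: without it, on Windows, newline translation corrupts the payload
// and a stray 0x1A byte truncates the read, exactly as in this thread.
std::string roundtrip_binary(const std::string& bytes, const std::string& path) {
    {
        std::ofstream out(path, std::ios::binary);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // scope ends: the file is flushed and closed before we read it back
    std::ifstream in(path, std::ios::binary);
    std::string back((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    std::remove(path.c_str());  // clean up the temporary file
    return back;
}
```

The test payload below deliberately contains 0x0A, 0x1A, and an embedded NUL, the three bytes most likely to expose a text-mode file handle.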
[protobuf] Incorrect encoding of protocol buffer message
Hi, I am facing a strange issue when I write a binary file using the protocol buffers library. I had a hard time reading the generated binary file in my C# program: I would find an incorrect byte and unexpectedly encounter an end-of-file byte in the middle of the file. The interesting thing is that my C++ reader program was able to read the file properly while the C# program errors out, so I am very confused. After investigating the binary file further, we saw that the file had some incorrect bytes inserted during encoding (either it is incorrect, or maybe I am missing something). I can share my C++ reader, my C# reader, and the binary file to resolve this issue. We suspect there is a bug in the library which is writing the data incorrectly.

Below are the findings from investigating the data. (I am displaying only the information required to understand the issue: the actual message bytes, not the length and other header bytes associated with them.)

message TradeMessage {
  required double timestamp = 1;
  required string ric_code = 2;
  required double price = 3;
  required int64 size = 4;
  required int64 AccumulatedVolume = 5;
}

Some of the objects read from the binary file look as below:

(object 1 - good)    09 06 81 95 43 c3 27 dc 40 12 07 30 30 32 35 2e 48 4b 19 00 00 00 00 00 00 20 40 20 00 28 00
(object 2 - good)    09 25 06 81 95 c3 27 dc 40 12 07 30 30 32 34 2e 48 4b 19 00 00 00 00 00 00 00 00 20 00 28 00
(object 3 - corrupt) 09 71 3d 0d 0a d7 c3 27 dc 40 12 07 30 30 32 33 2e 48 4b 19 00 00 00 00 00 00 3b 40 20 00 28

If you look at the 3 objects above, each starts with field 1, which is timestamp, a double. It is encoded as 09, i.e. field 1, wire-type 1 (64-bit), so the next 8 bytes represent the timestamp value. If you carefully look at the first 10 bytes of each object, you will see that objects 1 and 2 encode field 1 properly, but for object 3 the actual payload of field 1 starts from byte 3: the header for this field is 2 bytes (09 71), so either there are 2 bytes of header or 9 bytes of payload. In either case, one extra byte has been written to my binary file. Why is this happening? How does the C++ reader know how to decode this data? Is this correct, or is there a bug involved here? Please advise. Regards, Alok
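A hedged observation on the hex dump above: the extra byte in object 3 sits immediately before an 0x0A, i.e. the corrupt sequence contains 0D 0A, which is exactly what Windows text-mode newline translation inserts — consistent with the binary-mode resolution reached elsewhere in these threads. The key decoding itself is simple: a protobuf key byte packs (field_number << 3) | wire_type. A small sketch (names are mine):

```cpp
#include <cstdint>

// Decompose a single-byte protobuf field key into its two components.
struct TagInfo {
    uint32_t field;      // field number from the .proto definition
    uint32_t wire_type;  // 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit
};

inline TagInfo decode_tag(uint8_t key) {
    return TagInfo{ static_cast<uint32_t>(key >> 3),
                    static_cast<uint32_t>(key & 0x07) };
}
```

Applied to the dump: 0x09 is field 1 with wire-type 1 (the 8-byte double timestamp), and 0x12 is field 2 with wire-type 2 (the length-delimited ric_code string), matching the byte layout the post walks through.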
[protobuf] how to flush coded stream buffer?
Hi, I am working on a very large file generated using protocol buffers. At a certain point during file creation, my app raises an out-of-memory exception. The binary file size at this time is around 700 MB. I suppose that if I could clear out the coded stream's buffer, this issue could be resolved. Is that the correct way to solve this? How can we resolve this issue? I tried to flush using the file descriptors, but it didn't work. Regards, Alok
[protobuf] How to check end of file stream?
Hi, I am using the following code while reading through the file stream:

while (!coded_input->ExpectAtEnd()) {
  coded_input->ReadLittleEndian32(&count);
  for (i = 0; i < count; i++) {
    coded_input->ReadLittleEndian32(&objtype);
    coded_input->ReadLittleEndian32(&objlen);
    cout << "Item = " << ++item_count << " type = " << objtype << " length = " << objlen << endl;
    coded_input->Skip(objlen);
  }
}

The above code runs in an infinite loop. My file has only 14 objects and its length is only 627 bytes, but ExpectAtEnd() never returns true in the above example. What is the right way to check for end of file? Regards, Alok
[protobuf] Why protocol buffer c++ library not reading binary objects properly?
I created a binary file using a C++ program using protocol buffers. I had issues reading the binary file in my C# program, so I decided to write a small C++ program to test the reading. My proto file is as follows:

message TradeMessage {
  required double timestamp = 1;
  required string ric_code = 2;
  required double price = 3;
  required int64 size = 4;
  required int64 AccumulatedVolume = 5;
}

When writing, I first write the object type, then the object length, and then the object itself:

coded_output->WriteLittleEndian32((int) ObjectType_Trade);
coded_output->WriteLittleEndian32(trade.ByteSize());
trade.SerializeToCodedStream(coded_output);

Now, when I try to read the same file in my C++ program, I see strange behavior. My reading code is as follows:

coded_input->ReadLittleEndian32(&objtype);
coded_input->ReadLittleEndian32(&objlen);
tMsg.ParseFromCodedStream(coded_input);
cout << "Expected Size = " << objlen << endl;
cout << "Trade message received for: " << tMsg.ric_code() << endl;
cout << "TradeMessage Size = " << tMsg.ByteSize() << endl;

In this case, I get the following output:

Expected Size = 33
Trade message received for: .CSAP0104
TradeMessage Size = 42

When I write to the file, I write trade.ByteSize() as 33 bytes, but when I read the same object back, ByteSize() is 42, i.e. it is trying to read 42 bytes when it should be reading 33. This throws off the rest of the file reading. I am not sure what is wrong here. Please advise. Just to double check, I compared the protocol-buffer-generated files in my reader and writer projects; they are identical. So I guess the file encoding is different for some reason, and I do not understand why. Regards, Alok
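A defensive pattern for this kind of framing, regardless of the underlying corruption, is to read exactly objlen bytes into a scratch buffer and parse the message from that buffer (with protobuf, ParseFromArray), so that a mis-sized parse can never run past the frame into the next object. A plain-C++ sketch of the frame extraction (the function name is mine):

```cpp
#include <cstdint>
#include <string>

// Pull one type/length-prefixed frame out of a buffer. Parsing from the
// extracted frame bounds the parser to objlen bytes, unlike parsing
// straight off the stream as in the post above.
bool next_frame(const std::string& in, size_t& pos,
                uint32_t& objtype, std::string& frame) {
    auto u32 = [&](uint32_t& v) {
        if (pos + 4 > in.size()) return false;
        v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<uint32_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
        pos += 4;
        return true;
    };
    uint32_t objlen = 0;
    if (!u32(objtype) || !u32(objlen)) return false;  // clean end of input
    if (pos + objlen > in.size()) return false;       // truncated file
    frame.assign(in, pos, objlen);
    pos += objlen;
    return true;
}
```

With this shape, a 33-byte frame yields exactly 33 bytes to the parser; a ByteSize() of 42 after parsing would then fail loudly at the frame boundary instead of silently desynchronizing the stream.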
Re: [protobuf] Maps in protobuf / 2
Haven't found a direct way to create a map, but we use the following to serialize map-like data structures:

message KeyValue {
  required string key = 1;
  required string value = 2;
}

message Map {
  repeated KeyValue items = 1;
}

message Foo {
  required string id = 1;
  optional Map map = 2;
}

Alok

On 06/16/2011 01:26 PM, Marco Mistroni wrote: Hi all, sorry I hijacked a previous thread. Is it possible to define maps in protobuf? I have some server-side code which returns a Map<String, Double>, and I was wondering if there was a way in protobuf to define a map. Could anyone help? w/kindest regards, marco
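The Map message above is just a repeated list of key/value pairs, so application code converts to and from a real map at the serialization boundary. (Later protobuf releases added a first-class map field type, but at the time of this thread the repeated-pair pattern was the standard approach.) A sketch of that boundary conversion in plain C++, independent of the generated protobuf classes:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the repeated KeyValue items of the Map message above.
using KeyValue = std::pair<std::string, std::string>;

// Map -> repeated items, ready to be copied into the serialized message.
std::vector<KeyValue> to_items(const std::map<std::string, std::string>& m) {
    return std::vector<KeyValue>(m.begin(), m.end());
}

// Repeated items -> map. Later duplicates win, which is also how most
// parsers resolve repeated entries for the same key.
std::map<std::string, std::string> from_items(const std::vector<KeyValue>& items) {
    std::map<std::string, std::string> m;
    for (const auto& kv : items) m[kv.first] = kv.second;
    return m;
}
```

For Marco's Map<String, Double>, the value field of KeyValue would simply be declared as a double instead of a string.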