[protobuf] suggestions on improving the performance?

alok Tue, 10 Jan 2012 20:41:58 -0800

Hi everyone,

My program takes longer to read binary files than text files. I think
the issue is with the structure of the binary files that I have
designed. (Or is it possible that binary decoding is slower than text
file parsing?)


The data file is a large text file with one record per row, up to 1.2 GB.
The binary file is around 900 MB.

 - Reading the text file takes 3 minutes.
 - Reading the binary file takes 5 minutes.

I also saw some very strange behavior.
 - Just to see how long it takes to skim through the binary file, I
started reading only the header of each message, which holds the length
of the message, and then skipped that many bytes using the Skip()
function of the coded_input object. After making this change, I
expected reading through the file to take less time, but it took more
than 10 minutes. Is skipping not the same as adding n bytes to the file
pointer? Is it slower to skip an object than to read it?
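
For comparison, here is a minimal sketch of what "adding n bytes to the file pointer" looks like with a plain seekable C++ stream. It does not use the protobuf library. The framing (a 4-byte little-endian length followed by the payload) and the names frame and count_messages_by_seeking are assumptions made for illustration, not the actual file format described in this post. Whether Skip() on a coded input stream behaves like this seek may depend on the underlying stream it wraps; some stream types can only skip by reading and discarding bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical framing (an assumption, not the poster's format):
// a 4-byte little-endian length, then that many payload bytes.
std::string frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 4 bytes
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((n >> (8 * i)) & 0xFF));
    }
    out += payload;
    return out;
}

// Skims a seekable stream: reads each length prefix, then moves the
// read position past the payload without copying the payload bytes.
// Truncation checks are omitted to keep the sketch short.
long count_messages_by_seeking(std::istream& in) {
    long count = 0;
    unsigned char len[4];
    while (in.read(reinterpret_cast<char*>(len), 4)) {
        const std::uint32_t n = static_cast<std::uint32_t>(len[0]) |
                                (static_cast<std::uint32_t>(len[1]) << 8) |
                                (static_cast<std::uint32_t>(len[2]) << 16) |
                                (static_cast<std::uint32_t>(len[3]) << 24);
        in.seekg(static_cast<std::streamoff>(n), std::ios::cur);
        ++count;
    }
    return count;
}
```

Timing a loop like this against the Skip()-based loop on the same file would show whether the slowdown comes from how the skip is carried out or from somewhere else.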

Are there any guidelines on how the structure should be designed to
get the best performance?

My current structure looks like this:

message HeaderMessage {
  required double timestamp = 1;
  required string ric_code = 2;
  required int32 count = 3;
  required int32 total_message_size = 4;
}

message QuoteMessage {
  enum Side {
    ASK = 0;
    BID = 1;
  }
  required Side type = 1;
  required int32 level = 2;
  optional double price = 3;
  optional int64 size = 4;
  optional int32 count = 5;
  optional HeaderMessage header = 6;
}

message CustomMessage {
  required string field_name = 1;
  required double value = 2;
  optional HeaderMessage header = 3;
}

message TradeMessage {
  optional double price = 1;
  optional int64 size = 2;
  optional int64 AccumulatedVolume = 3;
  optional HeaderMessage header = 4;
}


The binary file format is:
object type, object, object type, object, ...

The 1st object of a record holds a header giving the number of
objects, n, in that record. The next n-1 objects do not hold a header,
since they all belong to the same record (same update time).
The (n+1)th object then belongs to a new record and holds the header
for that record.
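
The grouping rule above can be sketched in a few lines of C++. This is only an illustration of the layout as described, using plain in-memory structs rather than the generated protobuf classes. The names Object, ObjectType, has_header, header_count and record_sizes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for decoded objects (illustrative names only).
enum class ObjectType { Quote, Trade, Custom };

struct Object {
    ObjectType type;
    bool has_header;   // true only on the first object of a record
    int header_count;  // n, the number of objects in the record;
                       // meaningful only when has_header is true
};

// Splits a flat object sequence into records and returns the size of
// each record. Returns an empty vector if the layout rule is violated.
std::vector<std::size_t> record_sizes(const std::vector<Object>& objects) {
    std::vector<std::size_t> sizes;
    std::size_t i = 0;
    while (i < objects.size()) {
        const Object& first = objects[i];
        // Every record must start with an object that carries a header.
        if (!first.has_header || first.header_count < 1) return {};
        const std::size_t n = static_cast<std::size_t>(first.header_count);
        if (n > objects.size() - i) return {};  // record is cut short
        // The remaining n-1 objects of the record must not carry a header.
        for (std::size_t k = 1; k < n; ++k) {
            if (objects[i + k].has_header) return {};
        }
        sizes.push_back(n);
        i += n;  // the next object starts a new record
    }
    assert(i == objects.size());
    return sizes;
}
```

For example, a header with count 3 followed by two header-less objects and then a header with count 1 yields two records, of sizes 3 and 1.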

Any advice?

Regards,
Alok
