Re: [protobuf] Re: suggestions on improving the performance?

Henner Zeller Fri, 13 Jan 2012 11:38:02 -0800

On Fri, Jan 13, 2012 at 11:22, Daniel Wright <dwri...@google.com> wrote:
> It's extremely unlikely that text parsing is faster than binary parsing on
> pretty much any message.  My guess is that there's something wrong in the
> way you're reading the binary file -- e.g. no buffering, or possibly a bug
> where you hand the protobuf library multiple messages concatenated together.


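In particular, the
   object type, object, object type, object ...
layout doesn't seem to include a header giving the length of the
following message, but such a length prefix is needed: the protobuf
wire format is not self-delimiting, so a parser handed several
concatenated messages cannot tell where one ends and the next begins.
( http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming )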
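
A rough sketch of what I mean, in plain Python with no protobuf
dependency so it runs as-is. It writes each record as a varint type
tag, a varint byte length, and then the payload. The names
(write_record, read_records) and the type ids are made up for
illustration, and the payloads are stand-in byte strings; in real code
they would be the serialized bytes of each message. This is one
possible framing under those assumptions, not the only way to do it.

```python
import io


def write_varint(out, value):
    """Write a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("value must be non-negative")
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.write(bytes([low_bits | 0x80]))  # more bytes follow
        else:
            out.write(bytes([low_bits]))
            return


def read_varint(inp):
    """Read one varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        chunk = inp.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7


def write_record(out, type_id, payload):
    """Hypothetical framing: type tag, payload length, payload bytes."""
    write_varint(out, type_id)
    write_varint(out, len(payload))
    out.write(payload)


def read_records(inp):
    """Yield (type_id, payload) pairs until the stream is exhausted."""
    while True:
        type_id = read_varint(inp)
        if type_id is None:
            return
        size = read_varint(inp)
        if size is None:
            raise EOFError("stream ended before a length prefix")
        payload = inp.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield type_id, payload


# Illustrative type ids (assumptions, not part of any real schema).
HEADER, QUOTE = 1, 2

buf = io.BytesIO()
write_record(buf, HEADER, b"header-bytes")
write_record(buf, QUOTE, b"")          # empty payloads are legal
write_record(buf, QUOTE, b"q" * 300)   # length needs a two-byte varint
buf.seek(0)
decoded = list(read_records(buf))
```

With the length in front of every payload, the reader always hands the
parser exactly one message, and a record can be skipped by reading the
length and seeking past it. When reading from a real file, opening it
with a large buffer, e.g. open(path, "rb", buffering=1 << 20), avoids
a system call for every one-byte varint read.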

>  It'd be easier to comment if you post the code.
>
> Cheers
> Daniel
>
>
> On Fri, Jan 13, 2012 at 1:22 AM, alok <alok.jad...@gmail.com> wrote:
>>
>> any suggestions? experiences?
>>
>> regards,
>> Alok
>>
>> On Jan 11, 1:16 pm, alok <alok.jad...@gmail.com> wrote:
>> > my point is ..should i have one message something like
>> >
>> > message Record {
>> >   required HeaderMessage header = 1;
>> >   optional TradeMessage trade = 2;
>> >   repeated QuoteMessage quotes = 3;   // 0 or more
>> >   repeated CustomMessage customs = 4; // 0 or more
>> >
>> > }
>> >
>> > or rather should i keep my file plain as
>> > object type, object, objecttype, object
>> > without worrying about the concept of a record.
>> >
>> > Each message in file is usually header + any 1 type of message (trade,
>> > quote or custom) ..  and mostly only 1 quote or custom message not
>> > more.
>> >
>> > what would be faster to decode?
>> >
>> > Regards,
>> > Alok
>> >
>> > On Jan 11, 12:41 pm, alok <alok.jad...@gmail.com> wrote:
>> >
>> > > Hi everyone,
>> >
>> > > My program is taking more time to read binary files than the text
>> > > files. I think the issue is with the structure of the binary files
>> > > that i have designed. (Or could it be possible that binary decoding is
>> > > slower than text files parsing? ).
>> >
>> > > The data file is a large text file with 1 record per row, up to 1.2 GB.
>> > > The binary file is around 900 MB.
>> >
>> > >  - Text file reading takes 3 minutes to read the file.
>> > >  - Binary file reading takes 5 minutes.
>> >
>> > > I saw a very strange behavior.
>> > >  - Just to see how long it takes to skim through binary file, i
>> > > started reading header on each message which holds the length of the
>> > > message and then skipped that many bytes using the Skip() function of
>> > > coded_input object. After making this change, i was expecting that
>> > > reading through file should take less time, but it took more than 10
>> > > minutes. Is skipping not same as adding n bytes to the file pointer?
>> > > is it slower to skip the object than read it?
>> >
>> > > Are there any guidelines on how the structure should be designed to
>> > > get the best performance?
>> >
>> > > My current structure looks as below
>> >
>> > > message HeaderMessage {
>> > >   required double timestamp = 1;
>> > >   required string ric_code = 2;
>> > >   required int32 count = 3;
>> > >   required int32 total_message_size = 4;
>> >
>> > > }
>> >
>> > > message QuoteMessage {
>> > >   enum Side {
>> > >     ASK = 0;
>> > >     BID = 1;
>> > >   }
>> > >   required Side type = 1;
>> > >   required int32 level = 2;
>> > >   optional double price = 3;
>> > >   optional int64 size = 4;
>> > >   optional int32 count = 5;
>> > >   optional HeaderMessage header = 6;
>> >
>> > > }
>> >
>> > > message CustomMessage {
>> > >   required string field_name = 1;
>> > >   required double value = 2;
>> > >   optional HeaderMessage header = 3;
>> >
>> > > }
>> >
>> > > message TradeMessage {
>> > >   optional double price = 1;
>> > >   optional int64 size = 2;
>> > >   optional int64 AccumulatedVolume = 3;
>> > >   optional HeaderMessage header = 4;
>> >
>> > > }
>> >
>> > > Binary file format is
>> > > object type, object, object type object ...
>> >
>> > > 1st object of a record holds header with n number of objects in that
>> > > record. next n-1 objects will not hold header since they all belong to
>> > > same record (same update time).
>> > > now n+1th object belongs to the new record and it will hold header for
>> > > next record.
>> >
>> > > Any advice?
>> >
>> > > Regards,
>> > > Alok
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Protocol Buffers" group.
>> To post to this group, send email to protobuf@googlegroups.com.
>> To unsubscribe from this group, send email to
>> protobuf+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/protobuf?hl=en.
>>

