[protobuf] Re: suggestions on improving the performance?

2012-01-16 Thread alok
Any more suggestions?

On Jan 16, 11:14 am, alok alok.jad...@gmail.com wrote:
 google groups link
 http://groups.google.com/group/protobuf/browse_thread/thread/64a07911...

 I tested the code with reusing the coded input object. Not much change
 in the speed performance.

 void ReadAllMessages(ZeroCopyInputStream *raw_input,
                      stdext::hash_set<std::string> &instruments)
 {
         int item_count = 0;

         CodedInputStream *in = new CodedInputStream(raw_input);
         in->SetTotalBytesLimit(1e9, 9e8);
         while (1)
         {
                 if (item_count % 20 == 0) {
                         delete in;
                         in = new CodedInputStream(raw_input);
                         in->SetTotalBytesLimit(1e9, 9e8);
                 }
                 if (!ReadNextRecord(in, instruments))
                         break;
                 item_count++;
         }
         cout << "Finished reading file. Total " << item_count << " items read." << endl;
 }

 I reuse the coded input object every 200k objects; there are around
 650k objects in total in the file.

 I have a feeling this slowness is because of my binary file format.
 Is there anything I can change so that I can read it faster, e.g.
 removing optional fields and keeping the format as raw as possible?

 regards,
 Alok

[protobuf] Re: suggestions on improving the performance?

2012-01-15 Thread alok
Daniel,

I am hoping that my code is incorrect, but I am not sure what is wrong
or what is really causing this slowness.

@Henner Zeller: sorry, I forgot to include the object length in the
above example. I do store the object length for each object. I don't
have issues reading the objects; the code works fine. I just want to
make the code run faster now.

Attaching my code here...

The file format is:

File header
Record1, Record2, Record3 ...

Each record contains n objects of the type defined in the proto file.
The 1st object has a header which contains the number of objects in
the record.

code
proto file

message HeaderMessage {
  required double timestamp = 1;
  required string ric_code = 2;
  required int32 count = 3;
  required int32 total_message_size = 4;
}

message QuoteMessage {
  enum Side {
    ASK = 0;
    BID = 1;
  }
  required Side type = 1;
  required int32 level = 2;
  optional double price = 3;
  optional int64 size = 4;
  optional int32 count = 5;
  optional HeaderMessage header = 6;
}

message CustomMessage {
  required string field_name = 1;
  required double value = 2;
  optional HeaderMessage header = 3;
}

message TradeMessage {
  optional double price = 1;
  optional int64 size = 2;
  optional int64 AccumulatedVolume = 3;
  optional HeaderMessage header = 4;
}

message AlphaMessage {
  required int32 level = 1;
  required double alpha = 2;
  optional double stddev = 3;
  optional HeaderMessage header = 4;
}

/code

code
Reading records from binary file

bool ReadNextRecord(CodedInputStream *coded_input,
                    stdext::hash_set<std::string> &instruments)
{
        uint32 count, objtype, objlen;
        int i;

        int objectsread = 0;
        HeaderMessage *hMsg = NULL;
        TradeMessage tMsg;
        QuoteMessage qMsg;
        CustomMessage cMsg;
        AlphaMessage aMsg;

        while (1)
        {
                if (!coded_input->ReadLittleEndian32(&objtype)) {
                        return false;
                }
                if (!coded_input->ReadLittleEndian32(&objlen)) {
                        return false;
                }
                CodedInputStream::Limit lim = coded_input->PushLimit(objlen);

                switch (objtype)
                {
                case 2:
                        qMsg.ParseFromCodedStream(coded_input);
                        if (qMsg.has_header())
                        {
                                //hMsg = qMsg.mutable_header();
                                hMsg = new HeaderMessage();
                                hMsg->Clear();
                                hMsg->Swap(qMsg.mutable_header());
                        }
                        objectsread++;
                        break;

                case 3:
                        tMsg.ParseFromCodedStream(coded_input);
                        if (tMsg.has_header())
                        {
                                //hMsg = tMsg.mutable_header();
                                hMsg = new HeaderMessage();
                                hMsg->Clear();
                                hMsg->Swap(tMsg.mutable_header());
                        }
                        objectsread++;
                        break;

                case 4:
                        aMsg.ParseFromCodedStream(coded_input);
                        if (aMsg.has_header())
                        {
                                //hMsg = aMsg.mutable_header();
                                hMsg = new HeaderMessage();
                                hMsg->Clear();
                                hMsg->Swap(aMsg.mutable_header());
                        }
                        objectsread++;
                        break;

                case 5:
                        cMsg.ParseFromCodedStream(coded_input);
                        if (cMsg.has_header())
                        {
                                //hMsg = cMsg.mutable_header();
                                hMsg = new HeaderMessage();
                                hMsg->Clear();
                                hMsg->Swap(cMsg.mutable_header());
                        }
                        objectsread++;
                        break;

                default:
                        cout << "Invalid object type " << objtype << endl;
                        return false;
                }
                coded_input->PopLimit(lim);
                if (objectsread == hMsg->count()) break;
        }
        return true;
}


void ReadAllMessages(ZeroCopyInputStream *raw_input,
                     stdext::hash_set<std::string> &instruments)
{
        int item_count = 0;
        while (1)
        {
                CodedInputStream in(raw_input);
                if (!ReadNextRecord(&in, instruments))
                        break;
                item_count++;
        }
        cout << "Finished reading file. Total " << item_count << " items read." << endl;
}
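[Editor's sketch] The framing this code expects (a little-endian uint32 object type, a little-endian uint32 length, then the serialized message bytes) can be exercised without protobuf at all. The sketch below is std-only; the function names are mine, not from the thread, and a std::string stands in for a serialized message:

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Write a 32-bit value in little-endian byte order, matching what
// CodedOutputStream::WriteLittleEndian32 produces.
void WriteLE32(std::ostream &out, uint32_t v) {
    char b[4] = {
        static_cast<char>(v & 0xff),
        static_cast<char>((v >> 8) & 0xff),
        static_cast<char>((v >> 16) & 0xff),
        static_cast<char>((v >> 24) & 0xff),
    };
    out.write(b, 4);
}

// Read a little-endian 32-bit value; false at end of stream.
bool ReadLE32(std::istream &in, uint32_t *v) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char *>(b), 4)) return false;
    *v = uint32_t(b[0]) | uint32_t(b[1]) << 8 |
         uint32_t(b[2]) << 16 | uint32_t(b[3]) << 24;
    return true;
}

// Frame one object: type, length, payload bytes.
void WriteFramed(std::ostream &out, uint32_t objtype,
                 const std::string &payload) {
    WriteLE32(out, objtype);
    WriteLE32(out, static_cast<uint32_t>(payload.size()));
    out.write(payload.data(), payload.size());
}

// Read one framed object; returns false at end of stream.
bool ReadFramed(std::istream &in, uint32_t *objtype, std::string *payload) {
    uint32_t objlen;
    if (!ReadLE32(in, objtype)) return false;
    if (!ReadLE32(in, &objlen)) return false;
    payload->resize(objlen);
    return static_cast<bool>(in.read(&(*payload)[0], objlen));
}
```

In the real code, the payload bytes would be handed to ParseFromCodedStream under a PushLimit(objlen)/PopLimit pair, exactly as ReadNextRecord above does.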



Re: [protobuf] Re: suggestions on improving the performance?

2012-01-15 Thread Daniel Wright
You're making a new CodedInputStream for each message -- I think that gives
very poor buffering behavior.  You should just pass coded_input to
ReadAllMessages and keep reusing it.

Cheers
Daniel

On Sun, Jan 15, 2012 at 4:41 PM, alok alok.jad...@gmail.com wrote:

[protobuf] Re: suggestions on improving the performance?

2012-01-15 Thread alok
I was actually doing that initially, but I kept getting an error,
"Maximum length for a message is reached" (I don't have the exact
error string at the moment). This was because my input binary file is
large, and it reaches the limit for the coded input very fast.

I saw a post on the forum (or maybe on Stack Exchange) which suggested
that I should create a new coded_input object for each message, and
that I have to reset the limits on the coded input object. A user on
that thread suggested that it's easy to create and destroy coded_input
objects; these objects are not big.

Anyway, I will try it again by resetting the limits on this object.
But then, would this be causing the slowness? I will try and let you
know the results.

Regards,
Alok

On Jan 16, 9:46 am, Daniel Wright dwri...@google.com wrote:
 You're making a new CodedInputStream for each message -- I think that gives
 very poor buffering behavior.  You should just pass coded_input to
 ReadAllMessages and keep reusing it.

 Cheers
 Daniel








[protobuf] Re: suggestions on improving the performance?

2012-01-15 Thread alok
Here is the link to a forum post that explains why I have to set the limit.

http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3wj2htwajjof+state:results

Excerpt from the link:

"The problem is that CodedInputStream has an internal counter of how many
bytes are read so far with the same object.

In my case, there are a lot of small messages saved in the same file.
I do not read them at once and therefore do not care about large
messages' limits. I am safe.

So, the problem can be easily solved by calling:

CodedInputStream input_stream(...);
input_stream.SetTotalBytesLimit(1e9, 9e8);

My use-case is really about storing an extremely large number (up to 1e9)
of small messages, ~10K each."

My problem is the same as above, so I will have to set the limits on the
coded input object.

Regards,
Alok
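[Editor's sketch] The internal counter the excerpt describes can be mimicked in a few lines: one cumulative byte budget that every read draws down, regardless of message boundaries. This is why a single long-lived CodedInputStream over a multi-gigabyte file eventually refuses to read even though each message is tiny, and why recreating the stream (or raising the limit) resets the accounting. The class below is std-only with an illustrative name, not protobuf's actual implementation:

```cpp
#include <cassert>
#include <cstddef>

// Mimics CodedInputStream's total-bytes accounting: every byte handed
// out counts against one cumulative limit, independent of message
// boundaries, until the reader is reset (or recreated).
class LimitedReader {
  public:
    explicit LimitedReader(size_t total_limit) : limit_(total_limit) {}

    // Returns false once cumulative consumption would exceed the limit.
    bool Consume(size_t n) {
        if (consumed_ + n > limit_) return false;
        consumed_ += n;
        return true;
    }

    // Analogous to constructing a fresh CodedInputStream.
    void Reset() { consumed_ = 0; }

  private:
    size_t limit_;
    size_t consumed_ = 0;
};
```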


On Jan 16, 10:26 am, alok alok.jad...@gmail.com wrote:

[protobuf] Re: suggestions on improving the performance?

2012-01-15 Thread alok
google groups link
http://groups.google.com/group/protobuf/browse_thread/thread/64a07911e3c90cd5

I tested the code with reusing the coded input object. Not much change
in the speed performance.

void ReadAllMessages(ZeroCopyInputStream *raw_input,
                     stdext::hash_set<std::string> &instruments)
{
        int item_count = 0;

        CodedInputStream *in = new CodedInputStream(raw_input);
        in->SetTotalBytesLimit(1e9, 9e8);
        while (1)
        {
                if (item_count % 20 == 0) {
                        delete in;
                        in = new CodedInputStream(raw_input);
                        in->SetTotalBytesLimit(1e9, 9e8);
                }
                if (!ReadNextRecord(in, instruments))
                        break;
                item_count++;
        }
        cout << "Finished reading file. Total " << item_count << " items read." << endl;
}

I reuse the coded input object every 200k objects; there are around
650k objects in total in the file.

I have a feeling this slowness is because of my binary file format.
Is there anything I can change so that I can read it faster, e.g.
removing optional fields and keeping the format as raw as possible?

regards,
Alok

On Jan 16, 10:40 am, alok alok.jad...@gmail.com wrote:

[protobuf] Re: suggestions on improving the performance?

2012-01-13 Thread alok
Any suggestions? Experiences?

regards,
Alok

On Jan 11, 1:16 pm, alok alok.jad...@gmail.com wrote:
 My point is: should I have one message, something like

 message Record {
   required HeaderMessage header = 1;
   optional TradeMessage trade = 2;
   repeated QuoteMessage quotes = 3;   // 0 or more
   repeated CustomMessage customs = 4; // 0 or more
 }

 Or rather, should I keep my file plain, as
 object type, object, object type, object ...
 without worrying about the concept of a record?

 Each message in the file is usually a header plus one type of message
 (trade, quote, or custom), and usually only one quote or custom
 message, not more.

 what would be faster to decode?

 Regards,
 Alok

 On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:







  Hi everyone,

  My program is taking more time to read binary files than text files.
  I think the issue is with the structure of the binary files that I
  have designed. (Or could binary decoding really be slower than
  text-file parsing?)

  The data file is a large text file with 1 record per row, up to 1.2 GB.
  The binary file is around 900 MB.

   - Text file reading takes 3 minutes.
   - Binary file reading takes 5 minutes.

  I saw very strange behavior. Just to see how long it takes to skim
  through the binary file, I started reading the header of each message,
  which holds the length of the message, and then skipped that many
  bytes using the Skip() function of the coded_input object. After
  making this change I expected reading through the file to take less
  time, but it took more than 10 minutes. Is skipping not the same as
  adding n bytes to the file pointer? Is it slower to skip an object
  than to read it?
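[Editor's sketch] The Skip() question has a stream-dependent answer: if the underlying source is seekable, a skip can be a cheap file-pointer bump; if not, the bytes have to be read and discarded buffer by buffer. Whether protobuf's ZeroCopyInputStream::Skip maps to a seek or to discarding depends on the concrete stream implementation. A std-only illustration (SkipBytes is an illustrative name, not protobuf's implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <sstream>

// Skip n bytes: try a cheap seek first; otherwise read and discard.
bool SkipBytes(std::istream &in, std::streamsize n) {
    if (in.seekg(n, std::ios::cur)) return true;  // seekable source: O(1)
    in.clear();                                   // fall back to consuming
    char buf[4096];
    while (n > 0 && in.read(buf, std::min<std::streamsize>(n, sizeof buf)))
        n -= in.gcount();
    return n == 0;  // false if the stream ended before n bytes
}
```

If skipping turns out slower than reading, it usually means the skip path is doing the discard-style loop on top of an already-buffered read, so nothing is actually saved.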

  Are there any guidelines on how the structure should be designed to
  get the best performance?

  My current structure looks as below

  message HeaderMessage {
    required double timestamp = 1;
    required string ric_code = 2;
    required int32 count = 3;
    required int32 total_message_size = 4;
  }

  message QuoteMessage {
    enum Side {
      ASK = 0;
      BID = 1;
    }
    required Side type = 1;
    required int32 level = 2;
    optional double price = 3;
    optional int64 size = 4;
    optional int32 count = 5;
    optional HeaderMessage header = 6;
  }

  message CustomMessage {
    required string field_name = 1;
    required double value = 2;
    optional HeaderMessage header = 3;
  }

  message TradeMessage {
    optional double price = 1;
    optional int64 size = 2;
    optional int64 AccumulatedVolume = 3;
    optional HeaderMessage header = 4;
  }

  The binary file format is:
  object type, object, object type, object, ...

  The 1st object of a record holds the header, with n, the number of
  objects in that record. The next n-1 objects do not hold a header,
  since they all belong to the same record (same update time). Object
  n+1 belongs to the new record and holds the header for the next
  record.

  Any advice?

  Regards,
  Alok

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: suggestions on improving the performance?

2012-01-13 Thread Daniel Wright
It's extremely unlikely that text parsing is faster than binary parsing on
pretty much any message.  My guess is that there's something wrong in the
way you're reading the binary file -- e.g. no buffering, or possibly a bug
where you hand the protobuf library multiple messages concatenated
together.  It'd be easier to comment if you post the code.

Cheers
Daniel
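[Editor's sketch] Daniel's buffering point can be made concrete without protobuf: constructing a reader per message forfeits the amortization a buffer provides. The sketch below is std-only; CountingSource and BufferedReader are illustrative names, not protobuf types. It counts how many times an underlying source is hit with and without a buffering wrapper:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// A raw source that counts how many times it is asked for bytes,
// standing in for a file descriptor or ZeroCopyInputStream.
struct CountingSource {
    std::string data;
    size_t pos = 0;
    int calls = 0;

    // Copy up to n bytes into out; returns bytes copied (0 at end).
    size_t Read(char *out, size_t n) {
        ++calls;
        size_t avail = data.size() - pos;
        if (n > avail) n = avail;
        std::memcpy(out, data.data() + pos, n);
        pos += n;
        return n;
    }
};

// A buffered reader: pulls large chunks from the source, hands out
// bytes one at a time from its internal buffer.
class BufferedReader {
  public:
    explicit BufferedReader(CountingSource *src) : src_(src) {}

    bool ReadByte(char *c) {
        if (cur_ == end_) {  // refill in big chunks
            end_ = src_->Read(buf_, sizeof buf_);
            cur_ = 0;
            if (end_ == 0) return false;
        }
        *c = buf_[cur_++];
        return true;
    }

  private:
    CountingSource *src_;
    char buf_[4096];
    size_t cur_ = 0, end_ = 0;
};
```

A fresh reader per message throws away whatever the previous one had buffered, so the source gets hit far more often; one long-lived reader touches it once per 4 KB.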

On Fri, Jan 13, 2012 at 1:22 AM, alok alok.jad...@gmail.com wrote:

 any suggestions? experiences?

 regards,
 Alok

 On Jan 11, 1:16 pm, alok alok.jad...@gmail.com wrote:
  my point is ..should i have one message something like
 
  Message Record{
required HeaderMessage header;
optional TradeMessage trade;
repeated QuoteMessage quotes; // 0 or more
repeated CustomMessage customs; // 0 or more
 
  }
 
  or rather should i keep my file plain as
  object type, object, objecttype, object
  without worrying about the concept of a record.
 
  Each message in file is usually header + any 1 type of message (trade,
  quote or custom) ..  and mostly only 1 quote or custom message not
  more.
 
  what would be faster to decode?
 
  Regards,
  Alok
 
  On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:
 
 
 
 
 
 
 
   Hi everyone,
 
   My program is taking more time to read binary files than the text
   files. I think the issue is with the structure of the binary files
   that i have designed. (Or could it be possible that binary decoding is
   slower than text files parsing? ).
 
   Data file is a large text file with 1 record per row. upto 1.2 GB.
   Binary file is around 900 MB.
 
   **
- Text file reading takes 3 minutes to read the file.
- Binary file reading takes 5 minutes.
 
   I saw very strange behavior.
- Just to see how long it takes to skim through the binary file, I
   started reading the header of each message, which holds the length of
   the message, and then skipped that many bytes using the Skip() function
   of the coded_input object. After making this change, I expected that
   reading through the file would take less time, but it took more than 10
   minutes. Is skipping not the same as advancing the file pointer by n
   bytes? Is it slower to skip an object than to read it?
 
   Are there any guidelines on how the structure should be designed to
   get the best performance?
 
   My current structure looks as below
 
   message HeaderMessage {
     required double timestamp = 1;
     required string ric_code = 2;
     required int32 count = 3;
     required int32 total_message_size = 4;
   }

   message QuoteMessage {
     enum Side {
       ASK = 0;
       BID = 1;
     }
     required Side type = 1;
     required int32 level = 2;
     optional double price = 3;
     optional int64 size = 4;
     optional int32 count = 5;
     optional HeaderMessage header = 6;
   }

   message CustomMessage {
     required string field_name = 1;
     required double value = 2;
     optional HeaderMessage header = 3;
   }

   message TradeMessage {
     optional double price = 1;
     optional int64 size = 2;
     optional int64 AccumulatedVolume = 3;
     optional HeaderMessage header = 4;
   }
 
   Binary file format is
   object type, object, object type object ...
 
   The 1st object of a record holds the header, with n = the number of
   objects in that record. The next n-1 objects do not hold a header since
   they all belong to the same record (same update time). The (n+1)th
   object belongs to the next record and holds the header for it.
 
   Any advice?
 
   Regards,
   Alok

 --
 You received this message because you are subscribed to the Google Groups
 Protocol Buffers group.
 To post to this group, send email to protobuf@googlegroups.com.
 To unsubscribe from this group, send email to
 protobuf+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/protobuf?hl=en.






Re: [protobuf] Re: suggestions on improving the performance?

2012-01-13 Thread Henner Zeller
On Fri, Jan 13, 2012 at 11:22, Daniel Wright dwri...@google.com wrote:
 It's extremely unlikely that text parsing is faster than binary parsing on
 pretty much any message.  My guess is that there's something wrong in the
 way you're reading the binary file -- e.g. no buffering, or possibly a bug
 where you hand the protobuf library multiple messages concatenated together.

In particular, the
   object type, object, object type object ..
doesn't seem to include headers that describe the length of the
following message, but such a separator is needed.
( http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming )
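The length-prefix framing that the techniques page recommends can be sketched in plain C++. The varint functions below are hand-rolled for illustration only; real code would use protobuf's CodedOutputStream::WriteVarint32 and CodedInputStream::ReadVarint32 together with message.SerializeToString() / ParseFromString() on each payload:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append a base-128 varint, least-significant group first.
void WriteVarint32(std::string* out, uint32_t value) {
    while (value >= 0x80) {
        out->push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out->push_back(static_cast<char>(value));
}

// Read a varint starting at *pos; advances *pos past it.
bool ReadVarint32(const std::string& in, size_t* pos, uint32_t* value) {
    *value = 0;
    for (int shift = 0; shift < 32; shift += 7) {
        if (*pos >= in.size()) return false;
        uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
        *value |= static_cast<uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Frame one serialized message: [varint length][payload].
void WriteDelimited(std::string* out, const std::string& payload) {
    WriteVarint32(out, static_cast<uint32_t>(payload.size()));
    out->append(payload);
}

// Split a stream of framed messages back into payloads.
std::vector<std::string> ReadAllDelimited(const std::string& stream) {
    std::vector<std::string> messages;
    size_t pos = 0;
    uint32_t len = 0;
    while (pos < stream.size() && ReadVarint32(stream, &pos, &len)) {
        messages.push_back(stream.substr(pos, len));
        pos += len;
    }
    return messages;
}
```

With this framing the parser always knows where one message ends and the next begins, which is exactly the separator Henner says is needed when messages are simply concatenated.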

  It'd be easier to comment if you post the code.

 Cheers
 Daniel





[protobuf] Re: suggestions on improving the performance?

2012-01-10 Thread alok
My point is: should I have one message, something like

message Record {
  required HeaderMessage header = 1;
  optional TradeMessage trade = 2;
  repeated QuoteMessage quotes = 3; // 0 or more
  repeated CustomMessage customs = 4; // 0 or more
}

or rather should I keep my file plain as
object type, object, object type, object
without worrying about the concept of a record?

Each message in the file is usually a header plus one type of message
(trade, quote or custom), and mostly only one quote or custom message,
not more.

what would be faster to decode?

Regards,
Alok


On Jan 11, 12:41 pm, alok alok.jad...@gmail.com wrote:
 Hi everyone,

 My program is taking more time to read binary files than text
 files. I think the issue is with the structure of the binary files
 that I have designed. (Or could it be that binary decoding is
 slower than text parsing?)

 The data file is a large text file with 1 record per row, up to 1.2 GB.
 The binary file is around 900 MB.

 **
  - Text file reading takes 3 minutes to read the file.
  - Binary file reading takes 5 minutes.

 I saw very strange behavior.
  - Just to see how long it takes to skim through the binary file, I
 started reading the header of each message, which holds the length of
 the message, and then skipped that many bytes using the Skip() function
 of the coded_input object. After making this change, I expected that
 reading through the file would take less time, but it took more than 10
 minutes. Is skipping not the same as advancing the file pointer by n
 bytes? Is it slower to skip an object than to read it?

 Are there any guidelines on how the structure should be designed to
 get the best performance?

 My current structure looks as below

 message HeaderMessage {
   required double timestamp = 1;
   required string ric_code = 2;
   required int32 count = 3;
   required int32 total_message_size = 4;
 }

 message QuoteMessage {
   enum Side {
     ASK = 0;
     BID = 1;
   }
   required Side type = 1;
   required int32 level = 2;
   optional double price = 3;
   optional int64 size = 4;
   optional int32 count = 5;
   optional HeaderMessage header = 6;
 }

 message CustomMessage {
   required string field_name = 1;
   required double value = 2;
   optional HeaderMessage header = 3;
 }

 message TradeMessage {
   optional double price = 1;
   optional int64 size = 2;
   optional int64 AccumulatedVolume = 3;
   optional HeaderMessage header = 4;
 }

 Binary file format is
 object type, object, object type object ...

 The 1st object of a record holds the header, with n = the number of
 objects in that record. The next n-1 objects do not hold a header since
 they all belong to the same record (same update time). The (n+1)th
 object belongs to the next record and holds the header for it.

 Any advice?

 Regards,
 Alok
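
The "object type, object, object type, object" layout can be made concrete with an explicit frame. The encoding below -- [1-byte type][4-byte little-endian length][payload] -- is a hypothetical choice for illustration, not what the post specifies; it shows why skipping should beat parsing: once the data is buffered, skipping an unwanted object is just advancing an offset, with no decoding at all.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical object types for the trade/quote stream described above.
enum : uint8_t { kHeader = 0, kTrade = 1, kQuote = 2, kCustom = 3 };

// Frame one serialized object: [type][uint32 little-endian length][payload].
void WriteFrame(std::string* out, uint8_t type, const std::string& payload) {
  out->push_back(static_cast<char>(type));
  uint32_t len = static_cast<uint32_t>(payload.size());
  for (int i = 0; i < 4; ++i)
    out->push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
  out->append(payload);
}

// Walk the stream, keeping only payloads of the wanted type and
// skipping everything else by pure pointer arithmetic.
std::vector<std::string> ReadFrames(const std::string& stream, uint8_t wanted) {
  std::vector<std::string> hits;
  size_t pos = 0;
  while (pos + 5 <= stream.size()) {
    uint8_t type = static_cast<uint8_t>(stream[pos++]);
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
      len |= static_cast<uint32_t>(static_cast<uint8_t>(stream[pos++])) << (8 * i);
    if (pos + len > stream.size()) break;  // truncated frame
    if (type == wanted) hits.push_back(stream.substr(pos, len));
    pos += len;  // skip = advance offset, no decode
  }
  return hits;
}
```

In the real reader the payload would be a serialized protobuf message handed to ParseFromString() only for the types the caller cares about.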
