google groups link
http://groups.google.com/group/protobuf/browse_thread/thread/64a07911e3c90cd5

I tested the code while reusing the coded input object. It made little
difference to the read speed.

void ReadAllMessages(ZeroCopyInputStream *raw_input,
                     stdext::hash_set<std::string> instruments)
{
        int item_count = 0;

        CodedInputStream* in = new CodedInputStream(raw_input);
        in->SetTotalBytesLimit(1e9, 9e8);
        while(1)
        {
                // Every 200k records, replace the stream so that its
                // byte counter starts over.
                if(item_count > 0 && item_count % 200000 == 0){
                        delete in;
                        in = new CodedInputStream(raw_input);
                        in->SetTotalBytesLimit(1e9, 9e8);
                }
                if(!ReadNextRecord(in, instruments))
                        break;
                item_count++;
        }
        delete in;  // release the last stream
        cout << "Finished reading file. Total " << item_count << " items read." << endl;
}

I reuse the coded input object for 200k objects at a time. There are
around 650k objects in total in the file.
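
Editor's note: the periodic re-creation above works around a cumulative byte limit on the stream. The following is a minimal, standard-library-only sketch of that idea. `LimitedReader` and `ReadRecords` are hypothetical names invented for this illustration; they model only a running byte counter and are not protobuf's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a reader with a cumulative byte limit (hypothetical class).
class LimitedReader {
public:
    explicit LimitedReader(std::size_t total_limit)
        : limit_(total_limit), consumed_(0) {}

    // Pretend to read n bytes; fail once the running total would pass the limit.
    bool Read(std::size_t n) {
        if (consumed_ + n > limit_) return false;
        consumed_ += n;
        return true;
    }

private:
    std::size_t limit_;
    std::size_t consumed_;
};

// Reads `records` records of `record_size` bytes each, replacing the reader
// every `reset_every` records (0 means never). Returns how many were read.
std::size_t ReadRecords(std::size_t records, std::size_t record_size,
                        std::size_t limit, std::size_t reset_every) {
    assert(record_size > 0);
    LimitedReader* reader = new LimitedReader(limit);
    std::size_t done = 0;
    while (done < records) {
        if (reset_every != 0 && done != 0 && done % reset_every == 0) {
            delete reader;                      // a fresh reader has a fresh counter
            reader = new LimitedReader(limit);
        }
        if (!reader->Read(record_size)) break;  // limit reached
        ++done;
    }
    delete reader;
    return done;
}
```

With 650 records of 10 bytes and a 2500-byte limit, `ReadRecords(650, 10, 2500, 0)` stops early, while `ReadRecords(650, 10, 2500, 200)` gets through all of them because no single reader ever sees more than 2000 bytes.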

I have a feeling that this slowness may be due to my binary file
format. Is there anything I can change so that I can read it faster,
e.g. removing optional fields and keeping the format as raw as
possible?

regards,
Alok

On Jan 16, 10:40 am, alok <alok.jad...@gmail.com> wrote:
> Here is a link to a forum post which explains why I have to set the limit.
>
> http://markmail.org/message/km7mlmj46jgfs3rx#query:+page:1+mid:5f7q3w...
>
> excerpt from the link
>
> "The problem is that CodedInputStream has internal counter of how many
> bytes are read so far with the same object.
>
> In my case, there are a lot of small messages saved in the same file.
> I do not read them at once and therefore do not care about large
> messages, limits. I am safe.
>
> So, the problem can be easily solved by calling:
>
> CodedInputStream input_stream(...);
> input_stream.SetTotalBytesLimit(1e9, 9e8);
>
> My use-case is really about storing extremely large number (up to 1e9)
> of small messages ~ 10K each. "
>
> My problem is the same as the one above, so I will have to set the limits
> on the coded input object.
>
> Regards,
> Alok
>
> On Jan 16, 10:26 am, alok <alok.jad...@gmail.com> wrote:
>
> > I was actually doing that initially, but I kept getting a "Maximum
> > length for a message is reached" error (I don't have the exact error
> > string at the moment). This was because my input binary file is large
> > and it reaches the limit for coded input very fast.
>
> > I saw a post on the forum (or maybe on Stack Exchange) which suggested
> > that I should create a new coded_input object for each message. I have
> > to reset the limits for the coded input object. The user on that thread
> > suggested that it is easy to create and destroy coded_input objects;
> > these objects are not big.
>
> > Anyway, I will try it again by resetting the limits on this object.
> > But would this be causing the slowness? I will try it and let you
> > know the results.
>
> > Regards,
> > Alok
>
> > On Jan 16, 9:46 am, Daniel Wright <dwri...@google.com> wrote:
>
> > > You're making a new CodedInputStream for each message -- I think that
> > > gives very poor buffering behavior.  You should just pass coded_input
> > > to ReadAllMessages and keep reusing it.
>
> > > Cheers
> > > Daniel
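
Editor's note: one way to picture the buffering concern raised above is to count how often a reader has to go back to its underlying source. The sketch below is a toy model using only the standard library. `ChunkSource`, `ToyReader`, and `SourceCalls` are hypothetical names, loosely patterned on a chunk-at-a-time source with `Next()` and `BackUp()` operations. It assumes a reader hands unused bytes back to the source when it is destroyed. It counts calls only and says nothing about real protobuf timings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy source that hands out data one chunk at a time (hypothetical class).
class ChunkSource {
public:
    ChunkSource(std::size_t total, std::size_t chunk)
        : total_(total), chunk_(chunk), pos_(0), calls_(0) { assert(chunk > 0); }

    // Grants up to one chunk; returns the number of bytes granted (0 at end).
    std::size_t Next() {
        ++calls_;
        std::size_t n = std::min(chunk_, total_ - pos_);
        pos_ += n;
        return n;
    }
    // Takes back n bytes that the reader did not use.
    void BackUp(std::size_t n) { ++calls_; pos_ -= n; }

    std::size_t calls() const { return calls_; }

private:
    std::size_t total_, chunk_, pos_, calls_;
};

// Toy reader that buffers one chunk and returns the unused tail on destruction.
class ToyReader {
public:
    explicit ToyReader(ChunkSource* source) : source_(source), available_(0) {}
    ~ToyReader() { if (available_ > 0) source_->BackUp(available_); }

    bool Skip(std::size_t n) {
        while (n > 0) {
            if (available_ == 0) {
                available_ = source_->Next();
                if (available_ == 0) return false;  // end of data
            }
            std::size_t take = std::min(n, available_);
            available_ -= take;
            n -= take;
        }
        return true;
    }

private:
    ChunkSource* source_;
    std::size_t available_;
};

// Counts source calls when reading `messages` messages of `message_size` bytes,
// either with one long-lived reader or with a fresh reader per message.
std::size_t SourceCalls(std::size_t messages, std::size_t message_size,
                        std::size_t chunk, bool reuse_reader) {
    ChunkSource source(messages * message_size, chunk);
    if (reuse_reader) {
        ToyReader reader(&source);
        for (std::size_t i = 0; i < messages; ++i) reader.Skip(message_size);
    } else {
        for (std::size_t i = 0; i < messages; ++i) {
            ToyReader reader(&source);  // new reader for every message
            reader.Skip(message_size);
        }
    }
    return source.calls();
}
```

In this model, 1000 messages of 10 bytes with 1000-byte chunks cost 10 source calls with a reused reader and 1999 with a reader per message. Whether that overhead explains the slowness reported in this thread is not established here.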
>
> > > On Sun, Jan 15, 2012 at 4:41 PM, alok <alok.jad...@gmail.com> wrote:
> > > > Daniel,
>
> > > > I am hoping that my code is incorrect, but I am not sure what is wrong
> > > > or what is really causing this slowness.
>
> > > > @ Henner Zeller, sorry, I forgot to include the object length in the
> > > > above example. I do store the object length for each object. I don't
> > > > have issues reading all the objects; the code is working fine. I just
> > > > want to make the code run faster now.
>
> > > > attaching my code here...
>
> > > > File format is
>
> > > > File header
> > > > Record1, Record2, Record3
>
> > > > Each record contains n objects of the types defined in the proto file.
> > > > The first object carries a header which contains the number of objects
> > > > in that record.
>
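
Editor's note: the reading code quoted further down frames each object as a little-endian 32-bit type, a little-endian 32-bit length, and then the payload. The sketch below shows only that framing, over an in-memory buffer and with the standard library alone. `Frame`, `AppendLE32`, `ReadLE32`, and `ParseFrames` are hypothetical helper names. The file header and the per-record count carried in the header message are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One framed object: a type tag plus its raw payload bytes.
struct Frame {
    uint32_t type;
    std::string payload;
};

// Appends v to buf as four little-endian bytes.
void AppendLE32(std::string* buf, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        buf->push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Reads four little-endian bytes at *pos; returns false if fewer remain.
bool ReadLE32(const std::string& buf, std::size_t* pos, uint32_t* out) {
    assert(*pos <= buf.size());
    if (buf.size() - *pos < 4) return false;
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(buf.data()) + *pos;
    *out = static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
    *pos += 4;
    return true;
}

// Splits buf into [type][length][payload] frames.
// Returns false if the buffer ends in the middle of a frame.
bool ParseFrames(const std::string& buf, std::vector<Frame>* frames) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        uint32_t type = 0, len = 0;
        if (!ReadLE32(buf, &pos, &type)) return false;
        if (!ReadLE32(buf, &pos, &len)) return false;
        if (buf.size() - pos < len) return false;  // payload is truncated
        Frame f;
        f.type = type;
        f.payload = buf.substr(pos, len);
        frames->push_back(f);
        pos += len;
    }
    return true;
}
```

The length check before taking the payload plays the same role as the PushLimit/PopLimit pair in the quoted code: it keeps one object's parse from running into the next object's bytes.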
> > > > <code>
> > > > proto file
>
> > > > message HeaderMessage {
> > > >        required double timestamp = 1;
> > > >        required string ric_code = 2;
> > > >        required int32 count = 3;
> > > >        required int32 total_message_size = 4;
> > > > }
>
> > > > message QuoteMessage {
> > > >        enum Side {
> > > >                ASK = 0;
> > > >                BID = 1;
> > > >        }
> > > >        required Side type = 1;
> > > >        required int32 level = 2;
> > > >        optional double price = 3;
> > > >        optional int64 size = 4;
> > > >        optional int32 count = 5;
> > > >        optional HeaderMessage header = 6;
> > > > }
>
> > > > message CustomMessage {
> > > >        required string field_name = 1;
> > > >        required double value = 2;
> > > >        optional HeaderMessage header = 3;
> > > > }
>
> > > > message TradeMessage {
> > > >        optional double price = 1;
> > > >        optional int64 size = 2;
> > > >        optional int64 AccumulatedVolume = 3;
> > > >        optional HeaderMessage header = 4;
> > > > }
>
> > > > message AlphaMessage {
> > > >        required int32 level = 1;
> > > >        required double alpha = 2;
> > > >        optional double stddev = 3;
> > > >        optional HeaderMessage header = 4;
> > > > }
>
> > > > </code>
>
> > > > <code>
> > > > Reading records from binary file
>
> > > > bool ReadNextRecord(CodedInputStream *coded_input,
> > > > stdext::hash_set<std::string> instruments)
> > > > {
> > > >        uint32 count, objtype, objlen;
> > > >        int i;
>
> > > >        int objectsread = 0;
> > > >        HeaderMessage *hMsg = NULL;
> > > >        TradeMessage tMsg;
> > > >        QuoteMessage qMsg;
> > > >        CustomMessage cMsg;
> > > >        AlphaMessage aMsg;
>
> > > >        while(1)
> > > >        {
> > > >                if(!coded_input->ReadLittleEndian32(&objtype)) {
> > > >                        return false;
> > > >                }
> > > >                if(!coded_input->ReadLittleEndian32(&objlen)) {
> > > >                        return false;
> > > >                }
> > > >                CodedInputStream::Limit lim = coded_input->PushLimit(objlen);
>
> > > >                switch(objtype)
> > > >                {
> > > >                case 2:
> > > >                        qMsg.ParseFromCodedStream(coded_input);
> > > >                        if(qMsg.has_header())
> > > >                        {
> > > >                                //hMsg =
> > > >                                hMsg = new HeaderMessage();
> > > >                                hMsg->Clear();
> > > >                                hMsg->Swap(qMsg.mutable_header());
> > > >                        }
> > > >                        objectsread++;
> > > >                        break;
>
> > > >                case 3:
> > > >                        tMsg.ParseFromCodedStream(coded_input);
> > > >                        if(tMsg.has_header())
> > > >                        {
> > > >                                //hMsg = tMsg.mutable_header();
> > > >                                hMsg = new HeaderMessage();
> > > >                                hMsg->Clear();
> > > >                                hMsg->Swap(tMsg.mutable_header());
> > > >                        }
> > > >                        objectsread++;
> > > >                        break;
>
> > > >                case 4:
> > > >                        aMsg.ParseFromCodedStream(coded_input);
> > > >                        if(aMsg.has_header())
> > > >                        {
> > > >                                //hMsg = aMsg.mutable_header();
> > > >                                hMsg = new HeaderMessage();
> > > >                                hMsg->Clear();
> > > >                                hMsg->Swap(aMsg.mutable_header());
> > > >                        }
> > > >                        objectsread++;
> > > >                        break;
>
> > > >                case 5:
> > > >                        cMsg.ParseFromCodedStream(coded_input);
> > > >                        if(cMsg.has_header())
> > > >                        {
> > > >                                //hMsg = cMsg.mutable_header();
> > > >                                hMsg = new HeaderMessage();
> > > >                                hMsg->Clear();
> > > >                                hMsg->Swap(cMsg.mutable_header());
> > > >                        }
> > > >                        objectsread++;
> > > >                        break;
>
> > > >                default:
> > > >                        cout << "Invalid object type " << objtype << endl;
> > > >                        return false;
> > > >                        break;
> > > >                }
> > > >                coded_input->PopLimit(lim);
> > > >                if(hMsg != NULL && objectsread == hMsg->count()) break;
> > > >        }
> > > >        return true;
> > > > }
>
> > > > void ReadAllMessages(ZeroCopyInputStream *raw_input,
> > > > stdext::hash_set<std::string> instruments)
> > > > {
> > > >        int item_count = 0;
> > > >        while(1)
> > > >        {
> > > >                CodedInputStream in(raw_input);
> > > >                if(!ReadNextRecord(&in, instruments))
> > > >                        break;
> > > >                item_count++;
> > > >        }
> > > >        cout << "Finished reading file. Total "<<item_count<<" items read."<<endl;
> > > > }
>
> > > > int _tmain(int argc, _TCHAR* argv[])
> > > > {
> > > >        GOOGLE_PROTOBUF_VERIFY_VERSION;
>
> > > >        ZeroCopyInputStream *raw_input;
> > > >        CodedInputStream *coded_input;
> > > >        stdext::hash_set<std::string> instruments;
>
> > > >        string filename = "S:/users/aaj/sandbox/tickdata/bin/hk/2011/2011.01.04.bin";
> > > >        int fd = _open(filename.c_str(), _O_BINARY | O_RDONLY);
>
> > > >        if( fd == -1 )
> > > >        {
> > > >                printf( "Error opening the file. \n" );
> > > >                exit( 1 );
> > > >        }
>
> > > >        raw_input = new FileInputStream(fd);
> > > >        coded_input = new CodedInputStream(raw_input);
>
> > > >        uint32 magic_no;
>
> > > >        coded_input->ReadLittleEndian32(&magic_no);
>
> > > >        cout << "HEADER: " << "\t" << magic_no<<endl;
> > > >        cout << "Reading data objects.." << endl;
> > > >        delete coded_input;
> > > >
>
> ...

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.
