Hi,

Can you confirm that both test runs created the same number of HFiles?

Regards
Ram

> -----Original Message-----
> From: Prakrati Agrawal [mailto:[email protected]]
> Sent: Monday, June 25, 2012 12:05 PM
> To: [email protected]
> Subject: RE: Enabling caching increasing the time of retrieval
>
> No the DB is on a fully distributed setup.
>
> I wrote the data completely and then started retrieving it and both the
> tests were done one by one. So I think the possibility of data being in
> the memstore is not there.
> Please help me
>
> Thanks and Regards
> Prakrati
>
>
> -----Original Message-----
> From: Amandeep Khurana [mailto:[email protected]]
> Sent: Monday, June 25, 2012 11:51 AM
> To: [email protected]
> Subject: Re: Enabling caching increasing the time of retrieval
>
> Is this on a standalone instance or do you have fully distributed setup
> deployed? Do you have any kind of monitoring in place?
>
> From the numbers you are giving, it looks like the data is of the order
> of a few 10 MBs, assuming this is a single threaded read. Did you write
> more data between the first run (with cache disabled) and the second
> run (with cache enabled)? It is possible that the data was in the
> memstore and not yet flushed to HFiles when you did the first test. The
> memstore flushed and now the reads had to go to disk since the cache
> was not yet warmed up with that data. Subsequent reads of the same rows
> should be faster in that case.
>
>
> On Sunday, June 24, 2012 at 11:11 PM, Prakrati Agrawal wrote:
>
> > Dear all
> >
> > I am trying to optimize the retrieval code in Java for HBase. The
> > following are the timings without cache enabled:
> > The time taken to get 175347 columns of a row key is 677 ms
> > The time taken to get rows : 99 and columns: 14888573 is 48806 ms
> > The time taken to get all data (rows: 396 and columns: 32611576) is 96469 ms
> >
> > The time taken after caching is enabled(Both block and setCaching) :
> > The time taken to get 175347 columns of a row key is 713 ms
> > The time taken to get rows : 99 and columns: 14888573 is 57649 ms
> > The time taken to get all data (rows: 396 and columns: 32611576) is 111056 ms
> >
> > As you all can see, time increases after I enable caching. I am not
> > understanding what I am doing wrong. Please help me
> >
> > Thanks and Regards
> > Prakrati
> >
> >
> > ________________________________
> > This email message may contain proprietary, private and confidential
> > information. The information transmitted is intended only for the
> > person(s) or entities to which it is addressed. Any review,
> > retransmission, dissemination or other use of, or taking of any action
> > in reliance upon, this information by persons or entities other than
> > the intended recipient is prohibited and may be illegal. If you
> > received this in error, please contact the sender and delete the
> > message from your system.
> >
> > Mu Sigma takes all reasonable steps to ensure that its electronic
> > communications are free from viruses. However, given Internet
> > accessibility, the Company cannot accept liability for any virus
> > introduced by this e-mail or any attachment and you are advised to use
> > up-to-date virus checking software.
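
The timings quoted above imply a slowdown of roughly 5%, 18% and 15% for the three reads once caching is enabled. A minimal Java sketch of that arithmetic follows. It uses only the six timings from the original message; the class name CachingSlowdown and the method slowdownPercent are made up for illustration and are not part of HBase.

```java
public class CachingSlowdown {

    // Percentage by which the run with caching enabled is slower than
    // the run without caching. A negative value would mean a speedup.
    static double slowdownPercent(long uncachedMs, long cachedMs) {
        return 100.0 * (cachedMs - uncachedMs) / uncachedMs;
    }

    public static void main(String[] args) {
        // Labels and timings are taken from the original message.
        String[] labels = {
            "175347 columns of one row key",
            "99 rows, 14888573 columns",
            "396 rows, 32611576 columns"
        };
        long[] uncachedMs = {677, 48806, 96469};
        long[] cachedMs = {713, 57649, 111056};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %d ms -> %d ms (%.1f%% slower)%n",
                    labels[i], uncachedMs[i], cachedMs[i],
                    slowdownPercent(uncachedMs[i], cachedMs[i]));
        }
    }
}
```

The single-row read changes by only 36 ms, which may well be within run-to-run noise; the two larger reads show the more noticeable difference.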
