Try to use caching for queries.

Regards,
Shashwat Shriparv
Sent from Samsung Galaxy

Jean-Marc Spaggiari <[email protected]> wrote:

You're better off organizing the data based on the way you will access it. If you always read columns A, B, C and D together, then bundle them into a single column, and all of that in a single CF...

JM

2013/1/25, Vibhav Mundra <[email protected]>:
> This is what I think; sorry for my ignorance.
>
> I wanted to use the columnar nature of HBase so that only the
> required columns are accessed. For that reason I kept a large number
> of column families.
>
> But I still don't understand what is happening, as there is no disk
> I/O, only high CPU and some network activity.
> Why does the scan take more time than it took to populate HBase?
>
> -Vibhav
>
> On Fri, Jan 25, 2013 at 11:36 PM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
>> Hi Vibhav,
>>
>> Do you really need 13 different column families? Can't you find a way
>> to bundle them into 1 or 2 CFs at most? Maybe by prefixing the column
>> name?
>>
>> That might help...
>>
>> JM
>>
>> 2013/1/25, Vibhav Mundra <[email protected]>:
>>> The number of column families I have is 13, which I guess is okay?
>>>
>>> -Vibhav
>>>
>>> On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[email protected]> wrote:
>>>
>>>> You'll have this problem if you have a large number of column families
>>>> being scanned/populated at the same time. Make sure the data you
>>>> scan/populate frequently is in the same column family (you can have many
>>>> columns in a column family). Unlike BigTable/Hypertable, which have the
>>>> concept of locality/access groups, HBase always stores column families in
>>>> separate files, essentially making a column family not only a logical
>>>> grouping mechanism but also a physical locality group.
>>>>
>>>> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
>>>>
>>>>> I am facing a very strange problem with HBase.
>>>>>
>>>>> This is what I did:
>>>>> a) Created a table using pre-partitioned splits.
>>>>> b) The column families are compressed with LZO.
>>>>> c) Using the above configuration I am able to populate 2 million rows
>>>>> per minute into HBase.
>>>>> d) I have created a table with 300-odd million rows, which took me
>>>>> roughly 3 hours to populate; the data size is 25 GB.
>>>>>
>>>>> e) But when I query the data, the performance is very bad.
>>>>> Basically this is what I am seeing: high CPU, no disk I/O, and network
>>>>> I/O happening at a rate of 6-7 MB/sec.
>>>>>
>>>>> Because of this, scanning the entries of the table using Hive takes
>>>>> ages: around 24 hours to scan the table. Any idea how to debug this?
>>>>>
>>>>> -Vibhav
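A minimal sketch of the qualifier-prefixing idea JM suggests (bundling 13 families into one CF by prefixing the old family name onto each column qualifier). The family and column names here are made up for illustration; the thread does not name Vibhav's actual families:

```python
def bundle_qualifier(old_family: str, column: str, sep: str = ":") -> bytes:
    """Map an (old_family, column) pair to a single qualifier inside one CF."""
    return f"{old_family}{sep}{column}".encode("utf-8")

def unbundle_qualifier(qualifier: bytes, sep: str = ":") -> tuple:
    """Recover the original (family, column) pair from a prefixed qualifier."""
    old_family, _, column = qualifier.decode("utf-8").partition(sep)
    return old_family, column

# A row that previously spanned several CFs now lives entirely in one CF,
# so a scan touches one store file set instead of thirteen.
row = {
    bundle_qualifier("user", "name"): b"vibhav",
    bundle_qualifier("stats", "hits"): b"42",
}
```

Because HBase stores each column family in its own set of store files, collapsing the families this way turns what were thirteen physical locality groups into one, which is the point Luke makes above.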

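Putting the thread's advice together, a single-CF, LZO-compressed, pre-split table could be created in the HBase shell roughly as below. The table name, family name, and split keys are assumptions for illustration, not values from the thread:

```
# Illustrative sketch: one column family 'd', LZO compression as in step (b),
# pre-split regions as in step (a). Names and split keys are made up.
create 'mytable',
  {NAME => 'd', COMPRESSION => 'LZO'},
  SPLITS => ['1000000', '2000000', '3000000']
```

For Shashwat's caching suggestion, the Java client's Scan#setCaching(int) raises the number of rows fetched per RPC, which typically helps full-table scans like the Hive job described above.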