Hi Vibhav,

Do you really need 13 different column families? Can't you find a way to bundle them into 1 or 2 CFs at most? Maybe by prefixing the column name?
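To illustrate the prefixing idea, here is a minimal sketch of the naming scheme only, not HBase client code. The family name "d", the ":" separator, and the helper names are all made up for the example; it just shows how a former column family name could become a prefix on the column qualifier inside a single physical family.

```python
from typing import Tuple

# Hypothetical layout: one physical column family replaces many logical ones.
SINGLE_CF = b"d"   # assumed name of the single physical column family
SEP = b":"         # assumed separator between logical group and qualifier


def to_prefixed(logical_family: bytes, qualifier: bytes) -> Tuple[bytes, bytes]:
    """Map (logical family, qualifier) to (physical family, prefixed qualifier)."""
    if SEP in logical_family:
        raise ValueError("logical family name must not contain the separator")
    return SINGLE_CF, logical_family + SEP + qualifier


def from_prefixed(prefixed_qualifier: bytes) -> Tuple[bytes, bytes]:
    """Recover (logical family, qualifier) by splitting at the first separator."""
    logical_family, _, qualifier = prefixed_qualifier.partition(SEP)
    return logical_family, qualifier


def prefix_for(logical_family: bytes) -> bytes:
    """Qualifier prefix to match when only one logical group is wanted."""
    return logical_family + SEP
```

With this scheme, a column that used to live at profile:name would be stored as qualifier "profile:name" in family "d". A scan that needs only one logical group could then match on the qualifier prefix, for example with HBase's ColumnPrefixFilter.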
That might help...

JM

2013/1/25, Vibhav Mundra <[email protected]>:
> The number of column families I have is 13, which I guess is okay?
>
> -Vibhav
>
>
> On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[email protected]> wrote:
>
>> You'll have this problem if you have a large number of column families
>> being scanned/populated at the same time. Make sure the data you
>> scan/populate frequently are in the same column family (you can have many
>> columns in a column family). Unlike BigTable/Hypertable, which have the
>> concept of locality/access groups, HBase always stores column families in
>> separate files, essentially making a column family not only a logical
>> grouping mechanism but also a physical locality group.
>>
>>
>> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
>>
>> > I am facing a very strange problem with HBase.
>> >
>> > This is what I did:
>> > a) Created a table, using pre-partitioned splits.
>> > b) The column families are compressed with LZO.
>> > c) Using the above configuration I am able to populate 2 million rows
>> > per min in HBase.
>> > d) I have created a table with 300 million odd rows, which roughly took
>> > me 3 hours to populate, and the data size is 25 GB.
>> >
>> > e) But when I query for data the performance I am getting is very bad.
>> > Basically this is what I am seeing: high CPU, no disk I/O, and
>> > network I/O is happening at the rate of 6~7 MB/sec.
>> >
>> >
>> > Because of this, if I scan the entries of the table using Hive it is
>> > taking ages.
>> > Basically it is taking around 24 hours to scan the table. Any idea how
>> > to debug this?
>> >
>> >
>> > -Vibhav
>> >
>>
