Sorry for my ignorance, but this is my thinking: I want to take advantage of HBase being a columnar DB, so that only the required columns are accessed. That is why I created a large number of column families.
But I still do not understand what is happening: there is no disk I/O, only high CPU and some network activity. Why is the scan taking more time than it took to populate HBase?

-Vibhav

On Fri, Jan 25, 2013 at 11:36 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Vibhav,
>
> Do you really need 13 different column families? Can't you find a way
> to bundle that into 1 or 2 CFs max? Maybe by prefixing the column
> name?
>
> That might help...
>
> JM
>
> 2013/1/25, Vibhav Mundra <[email protected]>:
> > The number of column families I have is 13, which I guess is okay?
> >
> > -Vibhav
> >
> >
> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[email protected]> wrote:
> >
> >> You'll have this problem if you have a large number of column families
> >> being scanned/populated at the same time. Make sure the data you
> >> scan/populate frequently are in the same column family (you can have
> >> many columns in a column family). Unlike BigTable/Hypertable, which
> >> have the concept of locality/access groups, HBase always stores column
> >> families in separate files, essentially making a column family not
> >> only a logical grouping mechanism but also a physical locality group.
> >>
> >>
> >> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
> >>
> >> > I am facing a very strange problem with HBase.
> >> >
> >> > This is what I did:
> >> > a) Created a table using pre-partitioned splits.
> >> > b) The column families are compressed with LZO.
> >> > c) Using the above configuration I am able to populate 2 million
> >> > rows per minute into HBase.
> >> > d) I have created a table with 300 million odd rows, which took
> >> > roughly 3 hours to populate; the data size is 25 GB.
> >> >
> >> > e) But when I query for data, the performance I am getting is very
> >> > bad. Basically this is what I am seeing: high CPU, no disk I/O,
> >> > and network I/O happening at a rate of 6~7 MB/sec.
> >> >
> >> >
> >> > Because of this, scanning the entries of the table using Hive takes
> >> > ages. Basically it takes around 24 hours to scan the table. Any idea
> >> > how to debug this?
> >> >
> >> >
> >> > -Vibhav
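
To put the figures quoted in this thread side by side, here is a rough back-of-the-envelope sketch in plain Java. It uses only the numbers reported above (2 million rows per minute on load, 300 million rows, a 24-hour scan, 6~7 MB/sec of network I/O). The class and method names are made up for illustration, and the 6.5 MB/sec midpoint and the assumption that this rate is sustained for the whole scan are mine, not measurements.

```java
// Back-of-the-envelope comparison of the load and scan rates quoted in the thread.
// All names here are hypothetical; nothing in this class talks to HBase.
final class ScanRateEstimate {

    // Average rows handled per second over a time window.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Minutes needed to load a number of rows at a given rows-per-minute rate.
    static double minutesToLoad(long rows, double rowsPerMinute) {
        return rows / rowsPerMinute;
    }

    // Data volume in GB moved at a steady MB/sec rate (1 GB = 1024 MB).
    static double gigabytesMoved(double mbPerSecond, double seconds) {
        return mbPerSecond * seconds / 1024.0;
    }

    public static void main(String[] args) {
        long totalRows = 300_000_000L;        // "300 million odd rows"
        double scanSeconds = 24 * 3600.0;     // "around 24 hours to scan"
        double loadRowsPerMinute = 2_000_000; // "2 million rows per minute"
        double assumedMbPerSec = 6.5;         // assumption: midpoint of "6~7 MB/sec"

        double loadRate = rowsPerSecond(2_000_000L, 60.0);
        double scanRate = rowsPerSecond(totalRows, scanSeconds);

        System.out.printf("Load rate: %.0f rows/sec%n", loadRate);
        System.out.printf("Scan rate: %.0f rows/sec%n", scanRate);
        System.out.printf("Load is about %.1fx faster than the scan%n", loadRate / scanRate);
        System.out.printf("Expected load time: %.0f minutes%n",
                minutesToLoad(totalRows, loadRowsPerMinute));
        System.out.printf("Network volume over the scan, if the rate is sustained: %.0f GB%n",
                gigabytesMoved(assumedMbPerSec, scanSeconds));
    }
}
```

If the network rate really is sustained for the full 24 hours, the volume on the wire comes out far larger than the 25 GB stored on disk. That would be worth checking against the uncompressed row size before drawing any conclusions, since the 25 GB figure is after LZO compression.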
