You'll see this problem if a large number of column families are being scanned or populated at the same time. Make sure the data you scan or populate frequently is in the same column family (a column family can hold many columns). Unlike BigTable and Hypertable, which have the concept of locality/access groups, HBase always stores each column family in separate files, which makes a column family not only a logical grouping mechanism but also a physical locality group.
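
To make the point concrete, here is a toy model in plain Python. It does not use the HBase API; it only counts how many column families (and therefore how many separate sets of store files) a scan has to open for a given set of columns. The table layouts, family names, and column names are all made up for illustration.

```python
def families_touched(layout, wanted_columns):
    """Return the set of column families a scan must open to read wanted_columns.

    layout maps a column family name to the list of columns stored in it.
    In this toy model each family stands for its own set of files on disk.
    """
    column_to_family = {
        column: family
        for family, columns in layout.items()
        for column in columns
    }
    return {column_to_family[column] for column in wanted_columns}


# Hypothetical layout 1: every column lives in its own column family.
scattered = {
    "cf_a": ["user_id"],
    "cf_b": ["clicks"],
    "cf_c": ["views"],
    "cf_d": ["raw_payload"],
}

# Hypothetical layout 2: frequently scanned columns share one family.
grouped = {
    "hot": ["user_id", "clicks", "views"],
    "cold": ["raw_payload"],
}

# The columns a typical query reads (an assumption for this example).
hot_columns = ["user_id", "clicks", "views"]

print(len(families_touched(scattered, hot_columns)))  # 3
print(len(families_touched(grouped, hot_columns)))    # 1
```

With the scattered layout the same query has to open three families; with the grouped layout it opens one, and the rarely read column stays out of the way in its own family.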
On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
> I am facing a very strange problem with HBase.
>
> This what I did:
> a) Create a table, using pre partioned splits.
> b) Also the column familes are zipped with lzo compression.
> c) Using the above configuration I am able to populate 2 million row per
> min in the Hbase.
> d) I have created a table with 300 million odd rows, which roughy took me 3
> hours to populate and the data size is of 25GB.
>
> e) But when I query for data the performance I am getting is very bad.
> Basically this is what I am seeing: High CPU, no disk I/O and network
> I/O is happening at the rate of 6~7MB secs.
>
>
> Because of this, if I scan the entries of the table using Hive it is taking
> ages.
> Basically it is taking around 24 hours to scan the table. Any idea, of how
> to debug.
>
>
> -Vibhav
>
