I am new to HBase-Hive. Am I missing something? It would be great if you could point me to some documents about caching.
-Vibhav

On Sat, Jan 26, 2013 at 1:01 AM, Shashwat Shriparv <[email protected]> wrote:

> I would suggest you look into caching techniques.
>
> Regards
> Shashwat Shriparv
>
> Sent from Samsung Galaxy
>
> Adrien Mogenet <[email protected]> wrote:
>
> Definitely not, you should keep it under 3 maximum. Keep in mind that
> 1 CF == 1 Store == at least that many big files to read.
>
> On Fri, Jan 25, 2013 at 6:59 PM, Vibhav Mundra <[email protected]> wrote:
>
> > The number of column families I have is 13, which I guess is okay?
> >
> > -Vibhav
> >
> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[email protected]> wrote:
> >
> > > You'll have this problem if you have a large number of column families
> > > being scanned/populated at the same time. Make sure the data you
> > > scan/populate frequently are in the same column family (you can have
> > > many columns in a column family). Unlike BigTable/Hypertable, which
> > > have the concept of locality/access groups, HBase always stores column
> > > families in separate files, essentially making a column family not
> > > only a logical grouping mechanism but also a physical locality group.
> > >
> > > On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
> > >
> > > > I am facing a very strange problem with HBase.
> > > >
> > > > This is what I did:
> > > > a) Created a table using pre-partitioned splits.
> > > > b) The column families are compressed with LZO.
> > > > c) Using the above configuration I am able to populate 2 million
> > > >    rows per minute in HBase.
> > > > d) I have created a table with 300 million odd rows, which took me
> > > >    roughly 3 hours to populate, and the data size is 25 GB.
> > > >
> > > > e) But when I query for data, the performance I am getting is very
> > > >    bad. Basically this is what I am seeing: high CPU, no disk I/O,
> > > >    and network I/O happening at a rate of 6~7 MB/sec.
> > > >
> > > > Because of this, if I scan the entries of the table using Hive it
> > > > is taking ages. Basically it is taking around 24 hours to scan the
> > > > table. Any idea how to debug this?
> > > >
> > > > -Vibhav
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
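
On the caching question, a rough way to see why scanner caching might matter here: in the HBase client, scanner caching (`Scan.setCaching`, or the `hbase.client.scanner.caching` property) controls how many rows come back per RPC. The sketch below is plain arithmetic on the numbers quoted in this thread, not a measurement. The function names and the caching values tried are made up for illustration, 25 GB is read as GiB, and whether round trips are really the bottleneck in this cluster is an assumption.

```python
# Back-of-the-envelope numbers for the scan described in this thread.
ROWS = 300_000_000          # "300 million odd rows" (from the thread)
DATA_BYTES = 25 * 1024**3   # "25GB" (from the thread), read here as GiB
SCAN_SECONDS = 24 * 3600    # "around 24 hours" (from the thread)


def scan_rate(rows, seconds):
    """Average rows returned per second over the whole scan."""
    return rows / seconds


def round_trips(rows, caching):
    """Number of scanner RPCs needed if each RPC returns `caching` rows."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return -(-rows // caching)  # ceiling division


if __name__ == "__main__":
    print(f"rows/sec: {scan_rate(ROWS, SCAN_SECONDS):.0f}")
    print(f"MB/sec:   {DATA_BYTES / SCAN_SECONDS / 1024**2:.2f}")
    # Illustrative caching values only, not tuning recommendations.
    for caching in (1, 100, 1000):
        print(f"caching={caching}: {round_trips(ROWS, caching):,} RPCs")
```

If the scan is running with a very small caching value, the RPC count is on the order of the row count, so raising it is a cheap thing to try before reworking the 13 column families.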
