I did use the following, but it didn't help either:

SET hbase.client.scanner.caching=30000;
SET hive.hbase.client.scanner.caching=30000;
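
As a rough sanity check on those settings, here is a small Python sketch of the arithmetic, using only the figures quoted in this thread (300 million rows, 25 GB, about 24 hours). It assumes each scanner round trip returns exactly the configured number of rows, which is a simplification. The function names are made up for illustration and are not part of any HBase or Hive API.

```python
# Back-of-the-envelope numbers, all taken from this thread.
ROWS = 300_000_000   # "300 million odd rows"
DATA_GB = 25         # reported data size
SCAN_HOURS = 24      # reported time for a full scan through Hive
CACHING = 30_000     # value set for hbase.client.scanner.caching


def scanner_round_trips(rows: int, caching: int) -> int:
    """Round trips needed if every fetch returns `caching` rows (assumption)."""
    return -(-rows // caching)  # ceiling division


def rows_per_second(rows: int, hours: float) -> float:
    """Average row rate over the whole scan."""
    return rows / (hours * 3600)


def effective_mb_per_second(data_gb: float, hours: float) -> float:
    """Average rate relative to the reported data size (1 GB = 1024 MB)."""
    return data_gb * 1024 / (hours * 3600)


if __name__ == "__main__":
    print("round trips at caching=30000:", scanner_round_trips(ROWS, CACHING))
    print("round trips at caching=1:    ", scanner_round_trips(ROWS, 1))
    print("rows per second:             ", round(rows_per_second(ROWS, SCAN_HOURS)))
    print("effective MB/s:              ", round(effective_mb_per_second(DATA_GB, SCAN_HOURS), 3))
```

If these assumptions hold, a caching value of 30000 already keeps the number of round trips small, so the remaining time is probably being spent somewhere other than RPC overhead. That would be consistent with the high CPU and low disk I/O described in the original report, but it is only an estimate.
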
-Vibhav

On Sat, Jan 26, 2013 at 12:43 AM, Shashwat Shriparv <[email protected]> wrote:

> Try to use caching for the query.
>
> Regards
> Shashwat Shriparv
>
> Sent from Samsung Galaxy
>
> Jean-Marc Spaggiari <[email protected]> wrote:
> You're better off laying out the data based on the way you will access it.
>
> If you always read data from columns A, B, C and D together, then
> bundle them in a single column. And all of that in a single CF...
>
> JM
>
> 2013/1/25, Vibhav Mundra <[email protected]>:
> > This is what I think; sorry for my ignorance.
> >
> > I want to use the columnar nature of HBase so that only the
> > required columns are accessed. For this I kept a large number of
> > column families.
> >
> > But I still don't understand what is happening, as there is no disk
> > I/O, only high CPU and some network activity.
> > Why is the scan taking more time than it took to populate HBase?
> >
> > -Vibhav
> >
> > On Fri, Jan 25, 2013 at 11:36 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> >> Hi Vibhav,
> >>
> >> Do you really need 13 different column families? Can't you find a way
> >> to bundle that into 1 or 2 CFs at most? Maybe by prefixing the column
> >> name?
> >>
> >> That might help...
> >>
> >> JM
> >>
> >> 2013/1/25, Vibhav Mundra <[email protected]>:
> >> > The number of column families I have is 13, which I guess is okay?
> >> >
> >> > -Vibhav
> >> >
> >> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[email protected]> wrote:
> >> >
> >> >> You'll have this problem if you have a large number of column families
> >> >> being scanned/populated at the same time. Make sure the data you
> >> >> scan/populate frequently are in the same column family (you can have
> >> >> many columns in a column family). Unlike BigTable/Hypertable, which
> >> >> have the concept of locality/access groups, HBase always stores column
> >> >> families in separate files, essentially making a column family not
> >> >> only a logical grouping mechanism but also a physical locality group.
> >> >>
> >> >> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
> >> >>
> >> >> > I am facing a very strange problem with HBase.
> >> >> >
> >> >> > This is what I did:
> >> >> > a) Created a table using pre-partitioned splits.
> >> >> > b) The column families are compressed with LZO.
> >> >> > c) Using the above configuration I am able to populate 2 million
> >> >> > rows per minute into HBase.
> >> >> > d) I have created a table with 300 million odd rows, which took me
> >> >> > roughly 3 hours to populate, and the data size is 25 GB.
> >> >> >
> >> >> > e) But when I query for data, the performance I am getting is very
> >> >> > bad. Basically this is what I am seeing: high CPU, no disk I/O, and
> >> >> > network I/O happening at a rate of 6~7 MB/sec.
> >> >> >
> >> >> > Because of this, if I scan the entries of the table using Hive it
> >> >> > takes ages. Basically it is taking around 24 hours to scan the
> >> >> > table. Any idea how to debug this?
> >> >> >
> >> >> > -Vibhav
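
JM's suggestion in the thread above, folding the 13 column families into one or two by prefixing the column name, can be sketched roughly as follows. This is a plain-Python model of a row as a dictionary, not HBase client code. The family name `d`, the `_` separator, and both function names are assumptions chosen purely for illustration.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (column family, column qualifier)


def bundle_into_single_cf(
    row: Dict[Cell, bytes], target_cf: str = "d", sep: str = "_"
) -> Dict[Cell, bytes]:
    """Move every cell into `target_cf`, prefixing the qualifier with the old family name."""
    bundled: Dict[Cell, bytes] = {}
    for (family, qualifier), value in row.items():
        bundled[(target_cf, f"{family}{sep}{qualifier}")] = value
    return bundled


def select_by_prefix(row: Dict[Cell, bytes], prefix: str) -> Dict[Cell, bytes]:
    """Return only the cells whose qualifier starts with `prefix`."""
    return {cell: value for cell, value in row.items() if cell[1].startswith(prefix)}


if __name__ == "__main__":
    # A hypothetical row spread over two of the original column families.
    row = {("cf1", "a"): b"1", ("cf2", "a"): b"2", ("cf2", "b"): b"3"}
    bundled = bundle_into_single_cf(row)
    print(sorted(bundled))
    print(sorted(select_by_prefix(bundled, "cf2_")))
```

The old family name survives as a qualifier prefix, so columns that used to live in one family can still be picked out together, while all of the data ends up in a single family and therefore, per Luke's explanation, in a single set of store files.
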
