Your Hive client needs to see an hbase-site.xml on its classpath, so you can set the config there. Also see this in general: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
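
As a rough sketch of what that could look like, an hbase-site.xml visible to the Hive client might carry the scanner caching property described in the book section linked further down this thread. The value 1000 is only an illustrative assumption, not a number recommended anywhere in this thread; tune it against your row size and client memory.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Number of rows fetched per scanner round trip to the region server.
       1000 is an assumed example value, not a recommendation. -->
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1000</value>
  </property>
</configuration>
```

Larger values mean fewer round trips per scan but more memory held on the client per batch, and a slow consumer can run into scanner timeouts between fetches, so it is worth raising the value in steps and re-measuring.
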
J-D

On Tue, Oct 11, 2011 at 8:04 PM, Weihua JIANG <[email protected]> wrote:

> Since I am using Hive to perform query, I don't know how to set it.
> Can you tell me how to do so?
>
> Thanks
> Weihua
>
> 2011/10/12 Jean-Daniel Cryans <[email protected]>:
> > This is one big factor and you didn't mention configuring it:
> > http://hbase.apache.org/book.html#perf.hbase.client.caching
> >
> > J-D
> >
> > On Tue, Oct 11, 2011 at 7:47 PM, Weihua JIANG <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I have made some perf test about Hive+HBase. The table is a normal 2D
> >> table with about 160M rows (each row with 7 small columns) and 32
> >> regions. There is only one column family and all regions have been
> >> major compacted to one store file before test.
> >>
> >> On a cluster with 11 task trackers (each with 4 map slots and 1 reduce
> >> slot, these servers also act as region servers), a simple SQL in Hive
> >> select count(*) from table where column3='Y';
> >> needs ~1700 seconds to finish.
> >>
> >> But, after use CTAS statement to create an internal table (stored as
> >> sequence file), this statement only needs 43 seconds to finish.
> >>
> >> So Hive+HBase is 40X slower than Hive+HDFS.
> >>
> >> Though Hive+HBase has less map tasks (32 vs 223), but since there are
> >> only 44 map slots available, I don't think it is the main cause.
> >>
> >> I studied the source code of HBase scan implementation. To me, it
> >> seems, in my case, the scan performs HFile read in a quite similar way
> >> as sequence file read (sequential reading of each key/value pair). So,
> >> in theory, the performance shall be quite similar.
> >>
> >> Can anyone explain the 40X slowdown?
> >>
> >> Thanks
> >> Weihua
> >>
