After setting this parameter to 1000, I get this result: Hive/HBase is 4X slower than Hive/HDFS.
What slowdown factor should be expected for Hive/HBase vs. Hive/HDFS?

Thanks
Weihua

2011/10/12 Akash Ashok:
> Hi,
> To set this parameter you could use "set hbase.client.scanner.caching=500;"
> before executing your Hive query.
>
> Cheers,
> Akash
>
> On Wed, Oct 12, 2011 at 8:34 AM, Weihua JIANG wrote:
>
>> Since I am using Hive to perform the query, I don't know how to set it.
>> Can you tell me how to do so?
>>
>> Thanks
>> Weihua
>>
>> 2011/10/12 Jean-Daniel Cryans:
>> > This is one big factor and you didn't mention configuring it:
>> > http://hbase.apache.org/book.html#perf.hbase.client.caching
>> >
>> > J-D
>> >
>> > On Tue, Oct 11, 2011 at 7:47 PM, Weihua JIANG wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have run some performance tests on Hive+HBase. The table is a normal
>> >> 2D table with about 160M rows (each row with 7 small columns) and 32
>> >> regions. There is only one column family, and all regions were major
>> >> compacted to one store file before the test.
>> >>
>> >> On a cluster with 11 task trackers (each with 4 map slots and 1 reduce
>> >> slot; these servers also act as region servers), a simple Hive query
>> >>   select count(*) from table where column3='Y';
>> >> needs ~1700 seconds to finish.
>> >>
>> >> But after using a CTAS statement to create an internal table (stored as
>> >> SequenceFile), the same statement needs only 43 seconds to finish.
>> >>
>> >> So Hive+HBase is 40X slower than Hive+HDFS.
>> >>
>> >> Although Hive+HBase has fewer map tasks (32 vs. 223), there are only
>> >> 44 map slots available, so I don't think that is the main cause.
>> >>
>> >> I studied the source code of the HBase scan implementation. It seems to
>> >> me that, in my case, the scan reads the HFile in much the same way as a
>> >> SequenceFile is read (sequential reading of each key/value pair). So,
>> >> in theory, the performance should be quite similar.
>> >>
>> >> Can anyone explain the 40X slowdown?
>> >>
>> >> Thanks
>> >> Weihua
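To see why J-D points at scanner caching, the round-trip arithmetic can be sketched from figures quoted in the thread (160M rows, 32 regions, 11 task trackers with 4 map slots each, 223 map tasks for the SequenceFile table). This is a rough, hypothetical model, not HBase or Hive code: the function names are illustrative, it assumes rows are spread evenly across regions and that every scanner RPC returns exactly `hbase.client.scanner.caching` rows, and it counts round trips only, ignoring per-row CPU cost, bandwidth, and block-cache effects. The `caching=1` case is an assumed baseline for an unset parameter.

```python
import math

# Figures quoted in the thread.
TOTAL_ROWS = 160_000_000   # "about 160M rows"
REGIONS = 32               # one map task per region for the HBase-backed table
MAP_SLOTS = 11 * 4         # 11 task trackers x 4 map slots = 44


def scanner_rpcs_per_region(total_rows: int, regions: int, caching: int) -> int:
    """Approximate number of scanner RPCs one mapper issues for its region.

    Hypothetical model: rows are spread evenly over regions and each RPC
    returns exactly `caching` rows (hbase.client.scanner.caching).
    """
    rows_per_region = math.ceil(total_rows / regions)
    return math.ceil(rows_per_region / caching)


def map_waves(tasks: int, slots: int) -> int:
    """Number of scheduling waves needed to run `tasks` map tasks on `slots` slots."""
    return math.ceil(tasks / slots)


if __name__ == "__main__":
    # Compare round trips for an assumed baseline and the values tried in the thread.
    for caching in (1, 500, 1000):
        rpcs = scanner_rpcs_per_region(TOTAL_ROWS, REGIONS, caching)
        print(f"caching={caching:>4}: ~{rpcs:,} RPCs per region scan")
    # Map-slot scheduling for both tables.
    print("HBase-backed table waves:", map_waves(REGIONS, MAP_SLOTS))
    print("SequenceFile table waves:", map_waves(223, MAP_SLOTS))
```

Under these assumptions, raising caching from 1 to 1000 cuts each mapper's round trips by a factor of 1000 (about 5,000,000 down to about 5,000). The slot arithmetic supports Weihua's reading: the HBase-backed job fits in a single wave, using 32 of 44 slots, while the SequenceFile job needs 6 waves. Task count alone therefore doesn't explain the gap. The model does not predict wall-clock time; it only shows why per-RPC overhead dominates when caching is low.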
