Hi,
I was testing indexes today as well and the index definitely got used.
You should be able to see this when you run two separate queries:
INSERT OVERWRITE DIRECTORY "/tmp/index-result2" ...
SELECT ...
The SELECT was faster for me than without the index. In your case the
time might be spent in the GROUP BY and maybe you have little data so
the times look the same.
What is not so good is that index can't be partitioned on different
columns than the table. E.g. I would like to partition the table on date
and the index on region (I can't partition the table on both date®ion
bc there are thousands of regions and that would create huge directory
structure in HDFS, which I read is not recommended).
Martin
On 08/06/2011 11:28, Guillaume WEILL wrote:
Hi,
I want to test the use of indexes in hive. For this I created anindex,
I launched a first query above, I changed the settings on Hive and ran
my query on my database table:
CREATE INDEX index ON TABLE table(key) as 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX index ON table REBUILD;
INSERT OVERWRITE DIRECTORY "/tmp/index-result2" SELECT `_bucketname` ,
`_offsets` FROM default__table_index__ x WHERE x.key=100;
SET hive.index.compact.file=/tmp/index_result2;
SET
hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
SELECT key, col2, sum(col3) FROM tableWHERE key=100 group BY col2;
No error but I am not sure that the index is really used. Indeed I get
the same performance with and without the index.
When I look at the logs (tasktracker, datanode, job_config), I see no
call to the directory / tmp / index_result2.
How do I know if my index has been really used?
Thanks for your help,
--
Guillaume WEILL