Re: Index

Martin Konicek Wed, 08 Jun 2011 04:27:54 -0700

Hi,

I was testing indexes today as well and the index definitely got used.You should be able to see this when you run two separate queries:

INSERT OVERWRITE DIRECTORY "/tmp/index-result2" ...
SELECT ...

The SELECT was faster for me than without the index. In your case thetime might be spent in the GROUP BY and maybe you have little data sothe times look the same.

What is not so good is that index can't be partitioned on differentcolumns than the table. E.g. I would like to partition the table on dateand the index on region (I can't partition the table on both date&regionbc there are thousands of regions and that would create huge directorystructure in HDFS, which I read is not recommended).


Martin

On 08/06/2011 11:28, Guillaume WEILL wrote:

Hi,
I want to test the use of indexes in hive. For this I created anindex,I launched a first query above, I changed the settings on Hive and ranmy query on my database table:
CREATE INDEX index ON TABLE table(key) as 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX index ON table REBUILD;
INSERT OVERWRITE DIRECTORY "/tmp/index-result2" SELECT `_bucketname` ,  
`_offsets` FROM default__table_index__ x WHERE x.key=100;

SET hive.index.compact.file=/tmp/index_result2;
SET 
hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
SELECT key, col2, sum(col3) FROM tableWHERE key=100  group BY col2;
No error but I am not sure that the index is really used. Indeed I getthe same performance with and without the index.When I look at the logs (tasktracker, datanode, job_config), I see nocall to the directory / tmp / index_result2.
How do I know if my index has been really used?

Thanks for your help,

--
Guillaume WEILL

Re: Index

Reply via email to