Thanks for your reply. Could you say me how much time have you saved with the index and the time used on a query without query? The amount of data of your table could be helpful too. This is to verify your point of view about the amount of data because I am really not impressed by performance of my index. Personally I work on 100 GB.
Guillaume 2011/6/8 Martin Konicek <[email protected]> > Hi, > > I was testing indexes today as well and the index definitely got used. You > should be able to see this when you run two separate queries: > > INSERT OVERWRITE DIRECTORY "/tmp/index-result2" ... > SELECT ... > > The SELECT was faster for me than without the index. In your case the time > might be spent in the GROUP BY and maybe you have little data so the times > look the same. > > What is not so good is that index can't be partitioned on different columns > than the table. E.g. I would like to partition the table on date and the > index on region (I can't partition the table on both date®ion bc there > are thousands of regions and that would create huge directory structure in > HDFS, which I read is not recommended). > > Martin > > > On 08/06/2011 11:28, Guillaume WEILL wrote: > >> Hi, >> >> I want to test the use of indexes in hive. For this I created anindex, I >> launched a first query above, I changed the settings on Hive and ran my >> query on my database table: >> >> CREATE INDEX index ON TABLE table(key) as 'COMPACT' WITH DEFERRED REBUILD; >> ALTER INDEX index ON table REBUILD; >> INSERT OVERWRITE DIRECTORY "/tmp/index-result2" SELECT `_bucketname` , >> `_offsets` FROM default__table_index__ x WHERE x.key=100; >> >> SET hive.index.compact.file=/tmp/index_result2; >> SET >> hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; >> SELECT key, col2, sum(col3) FROM tableWHERE key=100 group BY col2; >> >> >> >> No error but I am not sure that the index is really used. Indeed I get the >> same performance with and without the index. >> When I look at the logs (tasktracker, datanode, job_config), I see no call >> to the directory / tmp / index_result2. >> How do I know if my index has been really used? >> >> Thanks for your help, >> >> -- >> Guillaume WEILL >> >
