Hive indexes without improvement of performance

Vadim Dedkov Thu, 16 Jun 2016 13:50:56 -0700

Hello!

I use Hive 1.1.0-cdh5.5.0 and am trying to use its index support.


My index creation:
*CREATE INDEX doc_id_idx on TABLE my_schema_name.doc_t (id) AS 'COMPACT'
WITH DEFERRED REBUILD;*
*ALTER INDEX doc_id_idx ON my_schema_name.doc_t REBUILD;*
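
For anyone reproducing this setup, here is a minimal sketch of how one might confirm that the index exists and find the name of its backing index table. It was not part of the steps in this report, and it assumes the documented SHOW INDEX syntax behaves the same way on this CDH build.

```sql
-- Assumption: run after the REBUILD above has finished.
-- Lists the indexes defined on doc_t, including the index type
-- and the name of the table that stores the index data.
SHOW FORMATTED INDEX ON doc_t IN my_schema_name;
```

The idx_tab_name column of the output should name the table that the REBUILD step populates.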

Then I set these options:
*set hive.optimize.autoindex=true;*
*set hive.optimize.index.filter=true;*
*set hive.optimize.index.filter.compact.minsize=0;*
*set hive.index.compact.query.max.size=-1;*
*set hive.index.compact.query.max.entries=-1;*

And my query is:
*select count(*) from my_schema_name.doc_t WHERE id = '3723445235879';*
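
A hedged sketch of how to check whether the optimizer actually rewrites this query to use the index: EXPLAIN prints the query plan without running the query. This is an illustration only, not output from the cluster described here.

```sql
-- Run in the same session, after the set commands above.
-- If the index filter is applied, the plan would be expected to contain
-- an additional stage that reads the index table before scanning doc_t.
EXPLAIN
select count(*) from my_schema_name.doc_t WHERE id = '3723445235879';
```

If the plan shows only a plain scan of doc_t, the index is probably not being used for this query.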

Sometimes I see a performance improvement, but in most cases I do not.

In the cases where I do see an improvement:
1. My query
*select count(*) from my_schema_name.doc_t WHERE id = '3723445235879';*
fails with a NullPointerException (in the logs I see that Hive cannot find my
index table).
2. Then I run:
*USE my_schema_name;*
*select count(*) from doc_t WHERE id = '3723445235879';*
and get the result with the improvement
(172 sec).

In the cases where I see no improvement, I can run either
*select count(*) from my_schema_name.doc_t WHERE id = '3723445235879';*
(which then raises no exception) or
*USE my_schema_name;*
*select count(*) from doc_t WHERE id = '3723445235879';*
and both return the result without any speedup
(1153 sec).

My table has about 6 billion rows.
I tried various combinations of the index settings, including only these two:
*set hive.optimize.index.filter=true;*
*set hive.optimize.index.filter.compact.minsize=0;*
My Hadoop version is 2.6.0-cdh5.5.0.

What am I doing wrong?

Thank you.

-- 
_______________             _______________
Best regards,                    С уважением
Vadim Dedkov.                  Вадим Дедков.
