David Mollitor created HIVE-21792:
-------------------------------------

             Summary: Hive Indexes... Again
                 Key: HIVE-21792
                 URL: https://issues.apache.org/jira/browse/HIVE-21792
             Project: Hive
          Issue Type: New Feature
          Components: Indexing
            Reporter: David Mollitor


Hive previously had an indexing implementation, but it was made somewhat 
obsolete by the introduction of columnar file formats with their own internal 
indexing.

I propose that Hive introduce indexing again, with two index types:

# Column Index: Stored in HBase
# Full-Text Index: Stored in Solr

The basic idea is that the key in HBase is the record and the value is the 
relative file path of the data in the Hive table.

Performing an INSERT statement creates the index entry for each record.

For comparison, see MySQL's CREATE INDEX statement: 
https://dev.mysql.com/doc/refman/8.0/en/create-index.html

When the explain plan is generated, the index is consulted so that only the 
files involved in the query are considered.

This would prevent typical BI tools from having to scan large amounts of data 
when the set of matching data is known to be very small.

{code:sql}
-- Quick retrieval of small sets of records
select * from user where userid=27;

-- Full scans
select count(1) from user;
{code}
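
To make the lookup concrete, here is a minimal sketch of how the column index could prune files for the two queries above. It is only an illustration under assumptions: an in-memory dict stands in for the HBase table, and the names ColumnIndex, record_insert, files_for and files_to_scan, as well as the part-0000N file names, are hypothetical and not part of Hive or HBase.

```python
from collections import defaultdict


class ColumnIndex:
    """In-memory stand-in (assumption) for the proposed HBase-backed index."""

    def __init__(self):
        # indexed column value -> relative paths of the files containing it
        self._paths = defaultdict(set)

    def record_insert(self, value, relative_path):
        # Called once per inserted record: remember which file holds the value.
        self._paths[value].add(relative_path)

    def files_for(self, value):
        # Return a copy so callers cannot mutate the index.
        return set(self._paths.get(value, ()))


def files_to_scan(all_files, index, predicate_value=None):
    """Choose the files a query has to read.

    With an equality predicate on the indexed column, only the files listed
    in the index are read; without one, every file of the table is read.
    """
    if predicate_value is None:
        return set(all_files)
    return index.files_for(predicate_value) & set(all_files)


# Hypothetical table made of three data files.
table_files = {"part-00000", "part-00001", "part-00002"}

index = ColumnIndex()
index.record_insert(27, "part-00001")
index.record_insert(42, "part-00002")
index.record_insert(27, "part-00002")

# select * from user where userid=27;  -> reads only the indexed files
point_lookup = files_to_scan(table_files, index, predicate_value=27)

# select count(1) from user;           -> no predicate, reads everything
full_scan = files_to_scan(table_files, index)
```

In this sketch the point lookup skips part-00000 entirely, while the count still reads every file, which is the behaviour the proposal is aiming for.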



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
