Venki Korukanti created HIVE-5631:
-------------------------------------

             Summary: Index creation on a skew table fails
                 Key: HIVE-5631
                 URL: https://issues.apache.org/jira/browse/HIVE-5631
             Project: Hive
          Issue Type: Bug
          Components: Database/Schema
    Affects Versions: 0.12.0
            Reporter: Venki Korukanti
            Assignee: Venki Korukanti
             Fix For: 0.13.0


REPRO STEPS:

create database skewtest;
use skewtest;
create table skew (id bigint, acct string) skewed by (acct) on ('CC','CH');
create index skew_indx on table skew (id) as 
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED 
REBUILD;

The last DDL statement fails with the following error:
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Invalid 
skew column [acct])

When creating a table, Hive runs sanity checks to make sure the columns have 
valid names and the skewed columns are a subset of the table columns. Here the 
check fails because the index table carries skewed column info. The index 
table's skewed columns are {acct}, while its columns are {id, _bucketname, 
_offsets}. Because the skewed column {acct} is not among the table columns, 
Hive throws the exception.
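
The subset check described above can be sketched roughly as follows. This is 
an illustration only, not Hive's actual metastore code: the class 
SkewedColumnCheck and its validate method are hypothetical names, and only the 
error text is taken from the failure shown in the repro.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the metastore sanity check; not Hive's actual code.
final class SkewedColumnCheck {

    // Throws if any skewed column is missing from the table's column list.
    static void validate(List<String> tableCols, List<String> skewedCols) {
        Set<String> known = new HashSet<>(tableCols);
        for (String col : skewedCols) {
            if (!known.contains(col)) {
                throw new IllegalArgumentException("Invalid skew column [" + col + "]");
            }
        }
    }

    public static void main(String[] args) {
        // Base table 'skew': acct is a real column, so the check passes.
        validate(Arrays.asList("id", "acct"), Arrays.asList("acct"));

        // Index table: its columns do not include acct, but acct was inherited
        // as a skewed column, so the check rejects it.
        try {
            validate(Arrays.asList("id", "_bucketname", "_offsets"),
                     Arrays.asList("acct"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```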

The index table ends up with skewed column info, even though its definition 
has none, for the following reason: when the index table is created, a deep 
copy of the base table's StorageDescriptor (SD) (in this case that of 'skew') 
is made. In that copied SD, index-specific parameters are set and unrelated 
parameters are reset. The skewed column info is not reset (nor are a few other 
parameters), so the index table inherits it.

Fix: Instead of deep copying the base table's StorageDescriptor, create a new 
one from the gathered info. This keeps the index table from inheriting 
unnecessary SD properties from the base table.
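
To make the difference between the two approaches concrete, here is a minimal 
sketch. It does not use Hive's real StorageDescriptor class: the nested Sd 
class, its fields, and the methods copyThenPatch and buildFresh are all 
hypothetical simplifications, and the location value is a made-up placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; Sd is a simplified, hypothetical stand-in for
// Hive's StorageDescriptor.
final class IndexSdSketch {

    static final class Sd {
        List<String> cols = new ArrayList<>();
        List<String> skewedColNames = new ArrayList<>();
        String location;

        Sd deepCopy() {
            Sd copy = new Sd();
            copy.cols = new ArrayList<>(cols);
            copy.skewedColNames = new ArrayList<>(skewedColNames);
            copy.location = location;
            return copy;
        }
    }

    static final List<String> INDEX_COLS =
            Arrays.asList("id", "_bucketname", "_offsets");

    // Base table 'skew' from the repro steps.
    static Sd baseTable() {
        Sd sd = new Sd();
        sd.cols = new ArrayList<>(Arrays.asList("id", "acct"));
        sd.skewedColNames = new ArrayList<>(Arrays.asList("acct"));
        sd.location = "/placeholder/skew"; // made-up value
        return sd;
    }

    // Old approach: copy everything, then reset only the fields we remember.
    static Sd copyThenPatch(Sd base) {
        Sd sd = base.deepCopy();
        sd.cols = new ArrayList<>(INDEX_COLS);
        sd.location = null;
        // skewedColNames is never reset, so it leaks into the index table.
        return sd;
    }

    // Proposed approach: start empty and set only what the index table needs.
    static Sd buildFresh() {
        Sd sd = new Sd();
        sd.cols = new ArrayList<>(INDEX_COLS);
        return sd;
    }

    public static void main(String[] args) {
        Sd base = baseTable();
        System.out.println("copy-then-patch skewed cols: "
                + copyThenPatch(base).skewedColNames);
        System.out.println("fresh SD skewed cols: "
                + buildFresh().skewedColNames);
    }
}
```

With the copy-then-patch approach, every field that is added to the descriptor 
later must also be remembered in the reset logic. Building a fresh descriptor 
makes inheritance opt-in instead.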



--
This message was sent by Atlassian JIRA
(v6.1#6144)
