Re: Review Request 14890: Index creation on a skew table fails

2014-11-13 Thread Venki Korukanti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14890/
---

(Updated Nov. 14, 2014, 12:03 a.m.)


Review request for hive, Ashutosh Chauhan and Thejas Nair.


Changes
---

Rebased on latest trunk.


Bugs: HIVE-5631
https://issues.apache.org/jira/browse/HIVE-5631


Repository: hive-git


Description (updated)
---

Repro steps:
CREATE DATABASE skewtest;
USE skewtest;
CREATE TABLE skew (id bigint, acct string) SKEWED BY (acct) ON ('CC','CH');
CREATE INDEX skew_indx ON TABLE skew (id)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;

The last DDL statement fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
InvalidObjectException(message:Invalid skew column [acct])

When creating a table, Hive runs sanity checks to make sure the columns have
proper names and the skewed columns are a subset of the table columns. Here the
check fails because the index table carries skewed column info. The index
table's skewed columns include {acct}, while its columns are {id, _bucketname,
_offsets}. Because the skewed column {acct} is not among the table columns,
Hive throws the exception.
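
As a rough illustration of that subset check, here is a minimal sketch. It is
not Hive's actual validation code: the class and method names are hypothetical,
and it compares names case-sensitively for simplicity. It only shows why a
skewed column list of {acct} cannot pass against the index table's columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkewColumnCheck {
    /**
     * Returns the skewed columns that are NOT among the table's columns.
     * An empty result means the skewed columns are a valid subset.
     * (Hypothetical helper, not a Hive API.)
     */
    static List<String> invalidSkewColumns(List<String> tableCols,
                                           List<String> skewedCols) {
        List<String> invalid = new ArrayList<>(skewedCols);
        invalid.removeAll(tableCols);
        return invalid;
    }

    public static void main(String[] args) {
        // Columns of the index table from the repro steps.
        List<String> indexTableCols =
            Arrays.asList("id", "_bucketname", "_offsets");
        // Skewed column info inherited from the base table 'skew'.
        List<String> inheritedSkewedCols = Arrays.asList("acct");

        List<String> invalid =
            invalidSkewColumns(indexTableCols, inheritedSkewedCols);
        if (!invalid.isEmpty()) {
            System.out.println("Invalid skew column " + invalid);
        }
    }
}
```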

The index table ends up with skewed column info, even though its definition has
none, because index table creation makes a deep copy of the base table's
StorageDescriptor (SD), in this case that of 'skew'. In the copied SD,
index-specific parameters are set and unrelated parameters are reset. The
skewed column info is not reset (nor are a few other parameters), so the index
table inherits it.

Fix: Instead of deep copying the base table's StorageDescriptor, create a new
one from the gathered info. This keeps the index table from inheriting
unnecessary SD properties from the base table.
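
The difference between the two approaches can be sketched as follows. This is
an illustration only, not the patch itself: Sd is a hypothetical, pared-down
stand-in for the metastore StorageDescriptor, and the field and method names
(including the inputFormat parameter) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexSdSketch {
    /** Hypothetical, simplified stand-in for a storage descriptor. */
    static class Sd {
        List<String> cols = new ArrayList<>();
        List<String> skewedColNames = new ArrayList<>();
        String inputFormat;

        Sd deepCopy() {
            Sd copy = new Sd();
            copy.cols = new ArrayList<>(cols);
            copy.skewedColNames = new ArrayList<>(skewedColNames);
            copy.inputFormat = inputFormat;
            return copy;
        }
    }

    /** Copy-then-reset: any field we forget to reset leaks through. */
    static Sd indexSdByCopy(Sd base, List<String> indexCols) {
        Sd sd = base.deepCopy();
        sd.cols = new ArrayList<>(indexCols);
        // skewedColNames is not reset here, so the base table's value survives.
        return sd;
    }

    /** Build-from-scratch: only explicitly gathered values are set. */
    static Sd indexSdFresh(List<String> indexCols, String inputFormat) {
        Sd sd = new Sd();
        sd.cols = new ArrayList<>(indexCols);
        sd.inputFormat = inputFormat;
        return sd;
    }

    public static void main(String[] args) {
        Sd base = new Sd();
        base.cols = new ArrayList<>(Arrays.asList("id", "acct"));
        base.skewedColNames = new ArrayList<>(Arrays.asList("acct"));
        base.inputFormat = "example.InputFormat"; // placeholder value

        List<String> indexCols = Arrays.asList("id", "_bucketname", "_offsets");

        Sd copied = indexSdByCopy(base, indexCols);
        Sd fresh = indexSdFresh(indexCols, base.inputFormat);

        System.out.println("copied SD skewed columns: " + copied.skewedColNames);
        System.out.println("fresh SD skewed columns:  " + fresh.skewedColNames);
    }
}
```

With the copy-based variant, every field added to the descriptor later has to
be remembered and reset by hand; building a fresh descriptor makes the default
"not inherited".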


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627 
  ql/src/test/queries/clientpositive/index_skewtable.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_skewtable.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14890/diff/


Testing
---

Added a unit test and ran the index-related unit test queries.


Thanks,

Venki Korukanti



Re: Review Request 14890: Index creation on a skew table fails

2014-11-13 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14890/#review61395
---

Ship it!



- Ashutosh Chauhan



Re: Review Request 14890: Index creation on a skew table fails

2013-10-24 Thread Venki Korukanti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14890/
---

(Updated Oct. 24, 2013, 6:34 p.m.)


Review request for hive, Ashutosh Chauhan and Thejas Nair.


Changes
---

Initialize SerDeInfo object


Bugs: HIVE-5631
https://issues.apache.org/jira/browse/HIVE-5631


Repository: hive-git


Description
---

Repro steps:
CREATE DATABASE skewtest;
USE skewtest;
CREATE TABLE skew (id bigint, acct string) SKEWED BY (acct) ON ('CC','CH');
CREATE INDEX skew_indx ON TABLE skew (id)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;

The last DDL statement fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
InvalidObjectException(message:Invalid skew column [acct])

When creating a table, Hive runs sanity checks to make sure the columns have
proper names and the skewed columns are a subset of the table columns. Here the
check fails because the index table carries skewed column info. The index
table's skewed columns include {acct}, while its columns are {id, _bucketname,
_offsets}. Because the skewed column {acct} is not among the table columns,
Hive throws the exception.

The index table ends up with skewed column info, even though its definition has
none, because index table creation makes a deep copy of the base table's
StorageDescriptor (SD), in this case that of 'skew'. In the copied SD,
index-specific parameters are set and unrelated parameters are reset. The
skewed column info is not reset (nor are a few other parameters), so the index
table inherits it.

Fix: Instead of deep copying the base table's StorageDescriptor, create a new
one from the gathered info. This keeps the index table from inheriting
unnecessary SD properties from the base table.


Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b0f124b 
  ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java d0cbed6 
  ql/src/test/queries/clientpositive/index_skewtable.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_skewtable.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14890/diff/


Testing
---

Added a unit test and ran the index-related unit test queries.


Thanks,

Venki Korukanti



Review Request 14890: Index creation on a skew table fails

2013-10-23 Thread Venki Korukanti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14890/
---

Review request for hive, Ashutosh Chauhan and Thejas Nair.


Bugs: HIVE-5631
https://issues.apache.org/jira/browse/HIVE-5631


Repository: hive-git


Description
---

Repro steps:
CREATE DATABASE skewtest;
USE skewtest;
CREATE TABLE skew (id bigint, acct string) SKEWED BY (acct) ON ('CC','CH');
CREATE INDEX skew_indx ON TABLE skew (id)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;

The last DDL statement fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
InvalidObjectException(message:Invalid skew column [acct])

When creating a table, Hive runs sanity checks to make sure the columns have
proper names and the skewed columns are a subset of the table columns. Here the
check fails because the index table carries skewed column info. The index
table's skewed columns include {acct}, while its columns are {id, _bucketname,
_offsets}. Because the skewed column {acct} is not among the table columns,
Hive throws the exception.

The index table ends up with skewed column info, even though its definition has
none, because index table creation makes a deep copy of the base table's
StorageDescriptor (SD), in this case that of 'skew'. In the copied SD,
index-specific parameters are set and unrelated parameters are reset. The
skewed column info is not reset (nor are a few other parameters), so the index
table inherits it.

Fix: Instead of deep copying the base table's StorageDescriptor, create a new
one from the gathered info. This keeps the index table from inheriting
unnecessary SD properties from the base table.


Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b0f124b 
  ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java d0cbed6 
  ql/src/test/queries/clientpositive/index_skewtable.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_skewtable.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14890/diff/


Testing
---

Added a unit test and ran the index-related unit test queries.


Thanks,

Venki Korukanti