[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2013-01-09 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547981#comment-13547981
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-trunk-hadoop2 #54 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/54/])
HIVE-3424. Error by upgrading a Hive 0.7.0 database to 0.8.0 
(008-HIVE-2246.mysql.sql) (Alexander Alten-Lorenz via cws) (Revision 1380483)

 Result = ABORTED
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380483
Files : 
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


 Dedupe tables' column schemas from partitions in the metastore db
 -

 Key: HIVE-2246
 URL: https://issues.apache.org/jira/browse/HIVE-2246
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, 
 HIVE-2246.8.patch


 Note: this patch proposes a schema change, and is therefore incompatible with 
 the current metastore.
 We can re-organize the JDO models to reduce space usage to keep the metastore 
 scalable for the future.  Currently, partitions are the fastest growing 
 objects in the metastore, and the metastore keeps a separate copy of the 
 columns list for each partition.  We can normalize the metastore db by 
 decoupling Columns from Storage Descriptors and not storing duplicate lists 
 of the columns for each partition. 
 An idea is to create an additional level of indirection with a Column 
 Descriptor that has a list of columns.  A table has a reference to its 
 latest Column Descriptor (note: a table may have more than one Column 
 Descriptor in the case of schema evolution).  Partitions and Indexes can 
 reference the same Column Descriptors as their parent table.
 Currently, the COLUMNS table in the metastore has roughly (number of 
 partitions + number of tables) * (average number of columns per table) rows.  
 We can reduce this to (number of tables) * (average number of columns per 
 table) rows, while incurring a small cost proportional to the number of 
 tables to store the Column Descriptors.
 Please see the latest review board for additional implementation details.
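
To make the row-count estimate above concrete, here is a small worked example. The figures (1,000 tables, 2,000,000 partitions, 50 columns per table) are illustrative assumptions rather than measurements from a real metastore, and the function names are hypothetical.

```python
def columns_rows_before(num_tables, num_partitions, avg_columns):
    """Rows in COLUMNS when every table and every partition stores its own column list."""
    return (num_partitions + num_tables) * avg_columns


def columns_rows_after(num_tables, avg_columns):
    """Rows once partitions share their table's Column Descriptor.

    Ignores schema evolution, where a table may keep more than one descriptor.
    """
    return num_tables * avg_columns


# Illustrative figures only (assumed, not taken from the issue).
tables, partitions, avg_cols = 1_000, 2_000_000, 50
before = columns_rows_before(tables, partitions, avg_cols)
after = columns_rows_after(tables, avg_cols)
print(before, after)
```

With these assumed figures the partition term dominates the "before" count, which is why removing it accounts for nearly all of the saving.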

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2012-09-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448018#comment-13448018
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-trunk-h0.21 #1646 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1646/])
HIVE-3424. Error by upgrading a Hive 0.7.0 database to 0.8.0 
(008-HIVE-2246.mysql.sql) (Alexander Alten-Lorenz via cws) (Revision 1380483)

 Result = FAILURE
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380483
Files : 
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-12-02 Thread Namit Jain (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161747#comment-13161747
]

Namit Jain commented on HIVE-2246:
--

Note that there is a bug in the upgrade script. After running this script, the
column information for all the partitions is lost. They all inherit the columns
from the table definition. It is not a serious problem, as the
partition column information is not really used by Hive. The only command whose
results will change is:

describe table T partition P;

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-12-02 Thread Ashutosh Chauhan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161749#comment-13161749
]

Ashutosh Chauhan commented on HIVE-2246:

Thanks Namit for pointing this out. HCatalog looks into the column information
of partitions, so it will have an issue. Do you have a fix for it? Or, if you can
point out which part of the script has the bug, we can take a look.

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-12-02 Thread Ashutosh Chauhan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161753#comment-13161753
]

Ashutosh Chauhan commented on HIVE-2246:

Also, I assume this is only while upgrading an existing metastore. Partitions
added after an upgrade, or on a new install, will not lose any information.

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-11-13 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149397#comment-13149397
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-trunk-h0.21 #1082 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1082/])
HIVE-2572 HIVE-2246 upgrade script changed the COLUMNS_V2.COMMENT length
(Ning Zhang via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1201470
Files : 
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-11-11 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148987#comment-13148987
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-trunk-h0.21 #1079 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1079/])
HIVE-2568 HIVE-2246 upgrade script needs to drop foreign key in COLUMNS_OLD
(Ning Zhang via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1201091
Files : 
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-11-08 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146787#comment-13146787
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-0.8.0-SNAPSHOT-h0.21 #87 (See 
[https://builds.apache.org/job/Hive-0.8.0-SNAPSHOT-h0.21/87/])
HIVE-2556. upgrade script 008-HIVE-2246.mysql.sql contains syntax errors. 
(Ning Zhang via pauly)

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199595
Files : 
* /hive/branches/branch-0.8/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-11-08 Thread Hudson (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146794#comment-13146794
]

Hudson commented on HIVE-2246:
--

Integrated in Hive-trunk-h0.21 #1070 (See
[https://builds.apache.org/job/Hive-trunk-h0.21/1070/])
HIVE-2556. upgrade script 008-HIVE-2246.mysql.sql contains syntax errors.
(Ning Zhang via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199593
Files :
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-11-04 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1319#comment-1319
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-trunk-h0.21 #1059 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1059/])
HIVE-2366. Metastore upgrade scripts for HIVE-2246 do not migrate indexes 
nor rename the old COLUMNS table (Sohan Jain via Ning Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1197644
Files : 
* /hive/trunk/metastore/scripts/upgrade/derby/008-HIVE-2246.derby.sql
* /hive/trunk/metastore/scripts/upgrade/derby/008-REVERT-HIVE-2246.derby.sql
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-11-04 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144490#comment-13144490
 ] 

Hudson commented on HIVE-2246:
--

Integrated in Hive-0.8.0-SNAPSHOT-h0.21 #82 (See 
[https://builds.apache.org/job/Hive-0.8.0-SNAPSHOT-h0.21/82/])
HIVE-2366. Metastore upgrade scripts for HIVE-2246 do not migrate indexes 
nor rename the old COLUMNS table (Sohan Jain via Ning Zhang)

nzhang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1197646
Files : 
* /hive/branches/branch-0.8/metastore/scripts/upgrade/derby/008-HIVE-2246.derby.sql
* /hive/branches/branch-0.8/metastore/scripts/upgrade/derby/008-REVERT-HIVE-2246.derby.sql
* /hive/branches/branch-0.8/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-11 Thread Paul Yang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083751#comment-13083751
]

Paul Yang commented on HIVE-2246:
-

Some issues have been identified with this patch. We will be doing some
additional testing, but we might roll back so that we don't leave trunk in an
unstable state.

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-09 Thread Paul Yang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081956#comment-13081956
]

Paul Yang commented on HIVE-2246:
-

+1 - tests passed. Will commit.

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-08 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081188#comment-13081188
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-08-08 20:55:11.546253)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

added derby upgrade and revert-the-upgrade script

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

The new schema can be described as follows:
- CDS is a table corresponding to Column Descriptor objects. Currently, it
only stores a CD_ID.
- COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A
Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to the
CD_ID to which it belongs.
- SDS was modified to reference a Column Descriptor. So SDS now has a foreign
key to a CD_ID which describes its columns.
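
The three tables described above can be sketched as follows. This is a minimal illustration using SQLite through Python's sqlite3 module, not the MySQL or Derby DDL from the patch. Only the table names and CD_ID come from the description; the other column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
-- Column Descriptors: currently just an identifier.
CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);

-- Columns belong to exactly one Column Descriptor.
-- COLUMN_NAME, TYPE_NAME and INTEGER_IDX are assumed names for this sketch.
CREATE TABLE COLUMNS_V2 (
    CD_ID INTEGER NOT NULL REFERENCES CDS (CD_ID),
    COLUMN_NAME TEXT NOT NULL,
    TYPE_NAME TEXT,
    INTEGER_IDX INTEGER NOT NULL,
    PRIMARY KEY (CD_ID, COLUMN_NAME)
);

-- Storage Descriptors reference a Column Descriptor instead of owning columns.
CREATE TABLE SDS (
    SD_ID INTEGER PRIMARY KEY,
    CD_ID INTEGER REFERENCES CDS (CD_ID)
);
""")

# One descriptor with two columns, shared by a table and two of its partitions.
conn.execute("INSERT INTO CDS (CD_ID) VALUES (1)")
conn.executemany(
    "INSERT INTO COLUMNS_V2 VALUES (1, ?, ?, ?)",
    [("id", "int", 0), ("name", "string", 1)],
)
conn.executemany("INSERT INTO SDS (SD_ID, CD_ID) VALUES (?, 1)", [(10,), (11,), (12,)])

column_rows = conn.execute("SELECT COUNT(*) FROM COLUMNS_V2").fetchone()[0]
sharing = conn.execute("SELECT COUNT(*) FROM SDS WHERE CD_ID = 1").fetchone()[0]
print(column_rows, sharing)
```

The column list is stored once, however many storage descriptors point at it.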

During migration, we create Column Descriptors for tables in a straightforward
manner: their columns are now just wrapped inside a column descriptor. The SDS
of partitions use their parent table's column descriptor, since currently a
partition and its table share the same list of columns.
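
The migration steps above might look roughly like the sketch below. It is written against SQLite and is not the actual 008-HIVE-2246 upgrade script. The simplified table layouts, the sample rows, and the choice to reuse a table's SD_ID as its CD_ID are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified pre-migration layout (assumed): one column list per storage descriptor.
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE COLUMNS (SD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, SD_ID INTEGER);
-- New tables introduced by the schema change.
CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT);

-- Sample data: one table (SD 1) with two partitions (SD 2 and SD 3).
INSERT INTO SDS (SD_ID) VALUES (1), (2), (3);
INSERT INTO TBLS VALUES (100, 1);
INSERT INTO PARTITIONS VALUES (200, 100, 2), (201, 100, 3);
INSERT INTO COLUMNS VALUES
    (1, 'id', 'int'), (1, 'name', 'string'),
    (2, 'id', 'int'), (2, 'name', 'string'),
    (3, 'id', 'int'), (3, 'name', 'string');

-- Step 1: one Column Descriptor per table (assumption: reuse the table's SD_ID as CD_ID).
INSERT INTO CDS (CD_ID) SELECT SD_ID FROM TBLS;
-- Step 2: wrap each table's columns inside its descriptor.
INSERT INTO COLUMNS_V2 (CD_ID, COLUMN_NAME, TYPE_NAME)
    SELECT c.SD_ID, c.COLUMN_NAME, c.TYPE_NAME
    FROM COLUMNS c JOIN TBLS t ON t.SD_ID = c.SD_ID;
-- Step 3: each table's storage descriptor points at its own descriptor.
UPDATE SDS SET CD_ID = SD_ID WHERE SD_ID IN (SELECT SD_ID FROM TBLS);
-- Step 4: each partition's storage descriptor points at the parent table's descriptor.
UPDATE SDS SET CD_ID = (
    SELECT t.SD_ID FROM PARTITIONS p JOIN TBLS t ON t.TBL_ID = p.TBL_ID
    WHERE p.SD_ID = SDS.SD_ID
) WHERE SD_ID IN (SELECT SD_ID FROM PARTITIONS);
""")

old_rows = conn.execute("SELECT COUNT(*) FROM COLUMNS").fetchone()[0]
new_rows = conn.execute("SELECT COUNT(*) FROM COLUMNS_V2").fetchone()[0]
cd_ids = [r[0] for r in conn.execute("SELECT CD_ID FROM SDS ORDER BY SD_ID")]
print(old_rows, new_rows, cd_ids)
```

Because step 4 never reads the partitions' own rows in COLUMNS, any per-partition column list that differed from the table's is not carried over in this sketch.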

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.
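
The partition rule above can be sketched with plain Python data structures. The in-memory dictionary and the function names are hypothetical stand-ins for the metastore's object store, used only to show the reuse-or-create decision.

```python
from itertools import count

_next_cd_id = count(1)
column_descriptors = {}  # cd_id -> tuple of (column name, column type) pairs


def new_column_descriptor(columns):
    """Store a column list under a fresh descriptor id and return that id."""
    cd_id = next(_next_cd_id)
    column_descriptors[cd_id] = tuple(columns)
    return cd_id


def cd_for_partition(table_cd_id, partition_columns):
    """Reuse the table's descriptor if the column lists match, else create a new one."""
    if column_descriptors[table_cd_id] == tuple(partition_columns):
        return table_cd_id
    return new_column_descriptor(partition_columns)


table_cd = new_column_descriptor([("id", "int"), ("name", "string")])
same_schema = cd_for_partition(table_cd, [("id", "int"), ("name", "string")])
other_schema = cd_for_partition(table_cd, [("id", "int")])
print(table_cd, same_schema, other_schema)
```

Comparing whole column lists, including order, keeps the check simple; a partition whose columns differ in any way gets its own descriptor.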

When adding or altering a table, create a new column descriptor every time.

Whenever you drop a storage descriptor (e.g., when dropping tables or
partitions), check to see if the related column descriptor has any other
references in the table. That is, check to see if any other storage
descriptors point to that column descriptor. If none do, then delete that
column descriptor. This check is in place so we don't have unreferenced column
descriptors and columns hanging around after schema evolution for tables.
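
A sketch of that reference check, again using SQLite for illustration. The helper names, the simplified tables, and the sample rows are assumptions; the patch implements this logic in the metastore's Java object store rather than in SQL like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
-- Two storage descriptors share one column descriptor.
INSERT INTO CDS VALUES (1);
INSERT INTO COLUMNS_V2 VALUES (1, 'id'), (1, 'name');
INSERT INTO SDS VALUES (10, 1), (11, 1);
""")


def drop_storage_descriptor(conn, sd_id):
    """Delete a storage descriptor, then its column descriptor if nothing else uses it."""
    row = conn.execute("SELECT CD_ID FROM SDS WHERE SD_ID = ?", (sd_id,)).fetchone()
    if row is None:
        return  # nothing to drop
    cd_id = row[0]
    conn.execute("DELETE FROM SDS WHERE SD_ID = ?", (sd_id,))
    remaining = conn.execute(
        "SELECT COUNT(*) FROM SDS WHERE CD_ID = ?", (cd_id,)
    ).fetchone()[0]
    if remaining == 0:
        # No other storage descriptor points here: remove the columns and the descriptor.
        conn.execute("DELETE FROM COLUMNS_V2 WHERE CD_ID = ?", (cd_id,))
        conn.execute("DELETE FROM CDS WHERE CD_ID = ?", (cd_id,))


def count_rows(conn, table):
    """Row count for one of the fixed table names used in this sketch."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


drop_storage_descriptor(conn, 10)
after_first = (count_rows(conn, "CDS"), count_rows(conn, "COLUMNS_V2"))
drop_storage_descriptor(conn, 11)
after_second = (count_rows(conn, "CDS"), count_rows(conn, "COLUMNS_V2"))
print(after_first, after_second)
```

The descriptor survives the first drop because another storage descriptor still references it, and is removed together with its columns on the second.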

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
-

trunk/metastore/scripts/upgrade/derby/008-HIVE-2246.derby.sql PRE-CREATION
trunk/metastore/scripts/upgrade/derby/008-REVERT-HIVE-2246.derby.sql PRE-CREATION
trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927
trunk/metastore/src/model/package.jdo 1153927
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153927
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1153927

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Passes Facebook's regression testing and all existing test cases. In one
instance, before migration, the overhead involved with storage descriptors and
columns was ~11 GB. After migration, the overhead was ~1.5 GB.

Thanks,

Sohan

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-08 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081202#comment-13081202
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-08-08 21:19:06.999293)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

revised description for latest changes

Summary (updated)
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When creating a table, create a new column descriptor every time. When
altering a table, only construct a new column descriptor if the columns list
has changed.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs
-

trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-08 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081209#comment-13081209
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-08-08 21:29:23.722825)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

Revert the changes to describe table T partition P, so that it always shows
the table T's schema. If a table's schema has changed, we do not support
querying on the old partition's schema at the moment.

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When creating a table, create a new column descriptor every time. When
altering a table, only construct a new column descriptor if the columns list
has changed.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
-

trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927
trunk/metastore/src/model/package.jdo 1153927

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-05 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080211#comment-13080211
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

bq. On 2011-07-25 06:46:04, Ning Zhang wrote:
bq. trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql, line 76
bq. https://reviews.apache.org/r/1183/diff/2/?file=26824#file26824line76
bq.
bq. is the CHARSET (latin1) the same as SDS? This will require the
bq. user's comments to be in latin1 which prevents UTF chars.

Yes, this charset matches the one used in the official Hive schema for 0.7.0.

bq. On 2011-07-25 06:46:04, Ning Zhang wrote:
bq. trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql, line 206
bq. https://reviews.apache.org/r/1183/diff/2/?file=26824#file26824line206
bq.
bq. can you also add migration script for derby? we support derby as a
bq. default metastore RDBMS as well.

Ok, will do. I will add it in the next-next diff here.

bq. On 2011-07-25 06:46:04, Ning Zhang wrote:
bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 1752
bq. https://reviews.apache.org/r/1183/diff/2/?file=26825#file26825line1752
bq.
bq. here do you check if the 'alter table' command changes the schema
bq. (columns definition)? If it just set a table property, then you don't need to
bq. create a new ColumnDescriptor right?
bq.
bq. Also if a table's schema got changed, a new CD will be created, but
bq. the old partition will still have the old CDs. When we query the old partition,
bq. do we use the old partition's CD or the table's CD?
bq.
bq. Also in the above case, when you run 'desc table partition
bq. old_partition', do you return the old partition's CD or the table's CD?

Good point; I should check whether the table columns have changed; I do this
already when altering partitions. I added that in the next diff.

If a table's schema changes, it does not update existing partition CDs. If we
ever grab the partition object after the schema change, it will refer to its
old CD, not the table's CD. However, when querying tables on the CLI, we
almost always use the table's set of columns. E.g., if we did:
bq. create table test (a string) partitioned by (p1 string, p2 string);
bq. alter table test add partition(p1=1, p2=1);
bq. # populate the p1=1, p2=1 partition with some data now
bq. alter table test add columns (b string)
bq. select * from test where p1 = 1 and p2 = 1,

it would use the table's latest schema; i.e., return the values of column 'a'
and NULL for every row of column 'b'.
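
As a rough illustration of that behaviour (a simplified Python sketch, not Hive's actual reader or SerDe code), a row written under the older schema can be thought of as being padded out to the table's current column list. The function name read_with_table_schema is hypothetical, and the sketch assumes new columns are only appended at the end, as in the example above.

```python
def read_with_table_schema(stored_row, table_columns):
    """Interpret a row written under an older schema using the table's
    current columns; columns added later come back as None (NULL)."""
    missing = len(table_columns) - len(stored_row)
    padded = list(stored_row) + [None] * max(missing, 0)
    return dict(zip(table_columns, padded))


# Row stored in partition p1=1/p2=1 before 'b' was added to the table.
row = read_with_table_schema(["some value"], ["a", "b"])
```

Here `row["a"]` holds the stored value and `row["b"]` is None, matching the "all NULL" result described for column 'b'.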

- Sohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1176
---

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-05 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080216#comment-13080216
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-08-05 20:48:05.144312)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

-On alter table, only change the column descriptor if the columns have changed.
-Fix desc table partition... to use the partition's column schema, not the
table's

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When adding or altering a table, create a new column descriptor every time.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
-

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-05 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080218#comment-13080218
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-08-05 20:49:19.127572)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

-Forgot to add a few files. NOTE: this is only a temporary diff; I need to add
derby and postgres migration scripts.

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When adding or altering a table, create a new column descriptor every time.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
-

trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-05 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080327#comment-13080327
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-08-06 01:40:49.118616)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

-made listStorageDescriptors.. into one transaction
-renamed dropStorageDescriptorCleanly to make its functionality clearer
-indents typo

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When adding or altering a table, create a new column descriptor every time.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
-

trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-05 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080328#comment-13080328
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1313
---

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/1183/#comment2984

should read 1-N actually

- Sohan

On 2011-08-06 01:40:49, Sohan Jain wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1183/
bq. ---
bq.
bq. (Updated 2011-08-06 01:40:49)
bq.
bq.
bq. Review request for hive, Ning Zhang and Paul Yang.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. This patch tries to make minimal changes to the API while keeping
bq. migration short and somewhat easy to revert.
bq.
bq. The new schema can be described as follows:
bq. - CDS is a table corresponding to Column Descriptor objects. Currently,
bq. it only stores a CD_ID.
bq. - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns.
bq. A Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to
bq. the CD_ID to which it belongs.
bq. - SDS was modified to reference a Column Descriptor. So SDS now has a
bq. foreign key to a CD_ID which describes its columns.
bq.
bq. During migration, we create Column Descriptors for tables in a
bq. straightforward manner: their columns are now just wrapped inside a column
bq. descriptor. The SDS of partitions use their parent table's column descriptor,
bq. since currently a partition and its table share the same list of columns.
bq.
bq. When altering or adding a partition, give it its parent table's column
bq. descriptor IF the columns they describe are the same. Otherwise, create a new
bq. column descriptor for its columns.
bq.
bq. When adding or altering a table, create a new column descriptor every time.
bq.
bq. Whenever you drop a storage descriptor (e.g., when dropping tables or
bq. partitions), check to see if the related column descriptor has any other
bq. references in the table. That is, check to see if any other storage
bq. descriptors point to that column descriptor. If none do, then delete that
bq. column descriptor. This check is in place so we don't have unreferenced column
bq. descriptors and columns hanging around after schema evolution for tables.
bq.
bq.
bq. This addresses bug HIVE-2246.
bq. https://issues.apache.org/jira/browse/HIVE-2246
bq.
bq.
bq. Diffs
bq. -
bq.
bq. trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927
bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927
bq. trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
bq. trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927
bq. trunk/metastore/src/model/package.jdo 1153927
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153927
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1153927
bq.
bq. Diff: https://reviews.apache.org/r/1183/diff
bq.
bq.
bq. Testing
bq. ---
bq.
bq. Passes Facebook's regression testing and all existing test cases. In one
bq. instance, before migration, the overhead involved with storage descriptors and
bq. columns was ~11 GB. After migration, the overhead was ~1.5 GB.
bq.
bq.
bq. Thanks,
bq.
bq. Sohan
bq.
bq.
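
For readers who want to picture the schema described in the quoted summary, here is a minimal sketch using Python's sqlite3 module. The table names CDS, COLUMNS_V2 and SDS and the CD_ID key come from the summary; the other column names (SD_ID, COLUMN_NAME, TYPE_NAME, INTEGER_IDX) and the use of SQLite are assumptions made for illustration only. The real upgrade script targets MySQL and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);
CREATE TABLE COLUMNS_V2 (
    CD_ID       INTEGER NOT NULL REFERENCES CDS(CD_ID),
    COLUMN_NAME TEXT    NOT NULL,   -- assumed column name
    TYPE_NAME   TEXT,               -- assumed column name
    INTEGER_IDX INTEGER NOT NULL,   -- assumed column name (position)
    PRIMARY KEY (CD_ID, COLUMN_NAME)
);
CREATE TABLE SDS (
    SD_ID INTEGER PRIMARY KEY,      -- assumed column name
    CD_ID INTEGER REFERENCES CDS(CD_ID)
);
""")

# One column descriptor holding the table's two columns.
conn.execute("INSERT INTO CDS (CD_ID) VALUES (1)")
conn.executemany(
    "INSERT INTO COLUMNS_V2 VALUES (1, ?, ?, ?)",
    [("a", "string", 0), ("b", "string", 1)],
)

# SD 1 stands for the table; SDs 2-4 stand for three partitions that
# all point at the same column descriptor instead of copying columns.
conn.executemany(
    "INSERT INTO SDS (SD_ID, CD_ID) VALUES (?, 1)",
    [(sd_id,) for sd_id in range(1, 5)],
)

column_rows = conn.execute("SELECT COUNT(*) FROM COLUMNS_V2").fetchone()[0]
sharing_sds = conn.execute(
    "SELECT COUNT(*) FROM SDS WHERE CD_ID = 1"
).fetchone()[0]
```

In this toy database the column rows are stored once while four storage descriptors reference them, which is the deduplication the patch is after.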


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-07-25 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070337#comment-13070337
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1176
---

trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql
https://reviews.apache.org/r/1183/#comment2467

is the CHARSET (latin1) the same as SDS? This will require the user's
comments to be in latin1 which prevents UTF chars.

trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql
https://reviews.apache.org/r/1183/#comment2466

can you also add migration script for derby? we support derby as a default
metastore RDBMS as well.

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
https://reviews.apache.org/r/1183/#comment2468

here do you check if the 'alter table' command changes the schema (columns
definition)? If it just set a table property, then you don't need to create a
new ColumnDescriptor right?

Also if a table's schema got changed, a new CD will be created, but the old
partition will still have the old CDs. When we query the old partition, do we
use the old partition's CD or the table's CD?

Also in the above case, when you run 'desc table partition
old_partition', do you return the old partition's CD or the table's CD?

- Ning

On 2011-07-22 05:30:29, Sohan Jain wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1183/
bq. ---
bq.
bq. (Updated 2011-07-22 05:30:29)
bq.
bq.
bq. Review request for hive, Ning Zhang and Paul Yang.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. This patch tries to make minimal changes to the API while keeping
bq. migration short and somewhat easy to revert.
bq.
bq. The new schema can be described as follows:
bq. - CDS is a table corresponding to Column Descriptor objects. Currently,
bq. it only stores a CD_ID.
bq. - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns.
bq. A Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to
bq. the CD_ID to which it belongs.
bq. - SDS was modified to reference a Column Descriptor. So SDS now has a
bq. foreign key to a CD_ID which describes its columns.
bq.
bq. During migration, we create Column Descriptors for tables in a
bq. straightforward manner: their columns are now just wrapped inside a column
bq. descriptor. The SDS of partitions use their parent table's column descriptor,
bq. since currently a partition and its table share the same list of columns.
bq.
bq. When altering or adding a partition, give it its parent table's column
bq. descriptor IF the columns they describe are the same. Otherwise, create a new
bq. column descriptor for its columns.
bq.
bq. When adding or altering a table, create a new column descriptor every time.
bq.
bq. Whenever you drop a storage descriptor (e.g., when dropping tables or
bq. partitions), check to see if the related column descriptor has any other
bq. references in the table. That is, check to see if any other storage
bq. descriptors point to that column descriptor. If none do, then delete that
bq. column descriptor. This check is in place so we don't have unreferenced column
bq. descriptors and columns hanging around after schema evolution for tables.
bq.
bq.
bq. This addresses bug HIVE-2246.
bq. https://issues.apache.org/jira/browse/HIVE-2246
bq.
bq.
bq. Diffs
bq. -
bq.
bq. trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945
bq. trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
bq. trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945
bq. trunk/metastore/src/model/package.jdo 1148945
bq.
bq. Diff: https://reviews.apache.org/r/1183/diff
bq.
bq.
bq. Testing
bq. ---
bq.
bq. Passes Facebook's regression testing and all existing test cases. In one
bq. instance, before migration, the overhead involved with storage descriptors and
bq. columns was ~11 GB. After migration, the overhead was ~1.5 GB.
bq.
bq.
bq. Thanks,
bq.
bq. Sohan
bq.
bq.
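
The drop-time check described in the quoted summary (delete a column descriptor only when no other storage descriptor still points at it) might be sketched like this. It is a simplified in-memory Python model, not the ObjectStore.java implementation; drop_storage_descriptor and the two dictionaries are hypothetical stand-ins for the SDS and CDS/COLUMNS_V2 tables.

```python
def drop_storage_descriptor(sd_id, sd_to_cd, cd_columns):
    """Remove a storage descriptor and garbage-collect its column
    descriptor if nothing else references it.

    sd_to_cd   maps storage descriptor id -> column descriptor id
    cd_columns maps column descriptor id  -> list of column names
    Returns True if the column descriptor was deleted as well.
    """
    cd_id = sd_to_cd.pop(sd_id)
    still_referenced = cd_id in sd_to_cd.values()
    if not still_referenced:
        # Last reference is gone: drop the descriptor and its columns.
        cd_columns.pop(cd_id, None)
        return True
    return False


# Two storage descriptors share CD 10; a third uses CD 11.
sd_to_cd = {1: 10, 2: 10, 3: 11}
cd_columns = {10: ["a"], 11: ["a", "b"]}
first = drop_storage_descriptor(1, sd_to_cd, cd_columns)   # CD 10 kept
second = drop_storage_descriptor(2, sd_to_cd, cd_columns)  # CD 10 removed
```

The scan over all remaining references is fine for a sketch; a database-backed version would more likely issue a count query against SDS for the given CD_ID.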


[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-07-21 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069389#comment-13069389
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

Review request for hive, Ning Zhang and Paul Yang.

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When adding or altering a table, create a new column descriptor every time.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs
-

trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945
trunk/metastore/src/model/package.jdo 1148945

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan

Dedupe tables' column schemas from partitions in the metastore db
-

We can re-organize the JDO models to reduce space usage to keep the metastore
scalable for the future. Currently, partitions are the fastest growing
objects in the metastore, and the metastore keeps a separate copy of the
columns list for each partition. We can normalize the metastore db by
decoupling Columns from Storage Descriptors and not storing duplicate lists
of the columns for each partition.
An idea is to create an additional level of indirection with a Column
Descriptor that has a list of columns. A table has a reference to its
latest Column Descriptor (note: a table may have more than one Column
Descriptor in the case of schema evolution). Partitions and Indexes can
reference the same Column Descriptors as their parent table.
Currently, the COLUMNS table in the metastore has roughly (number of
partitions + number of tables) * (average number of columns per table) rows.
We can reduce this to (number of tables) * (average number of columns per
table) rows, while incurring a small cost proportional to the number of
tables to store the Column Descriptors.
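
To make the row-count estimate concrete, here is a small Python calculation. The table, partition and column counts are made-up illustrative figures, not measurements from any real metastore, and the "after" figure ignores extra descriptors created by schema evolution.

```python
def columns_rows_before(num_tables, num_partitions, avg_columns):
    # Every table and every partition stores its own copy of the columns.
    return (num_partitions + num_tables) * avg_columns


def columns_rows_after(num_tables, avg_columns):
    # Partitions share their table's column descriptor.
    return num_tables * avg_columns


# Hypothetical metastore: 1,000 tables, 1,000,000 partitions, 20 columns each.
before = columns_rows_before(1_000, 1_000_000, 20)  # 20,020,000 rows
after = columns_rows_after(1_000, 20)               # 20,000 rows
descriptor_rows = 1_000                             # one CD row per table
```

With these assumed numbers the column rows shrink by roughly a factor of a thousand, at the cost of one small descriptor row per table.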

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-07-21 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069392#comment-13069392
]

jirapos...@reviews.apache.org commented on HIVE-2246:
-

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
---

(Updated 2011-07-22 05:30:29.026246)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---

Adding some files I missed in the last diff.

Summary
---

This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.

When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.

When adding or altering a table, create a new column descriptor every time.

This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
-

trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945
trunk/metastore/src/model/package.jdo 1148945

Diff: https://reviews.apache.org/r/1183/diff

Testing
---

Thanks,

Sohan

