from:"Sohan Jain \(JIRA\)"

[jira] [Created] (HIVE-2366) Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table

2011-08-10 Thread Sohan Jain (JIRA)

Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the 
old COLUMNS table
---

 Key: HIVE-2366
 URL: https://issues.apache.org/jira/browse/HIVE-2366
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


The upgrade scripts for the hive metastore in HIVE-2246 do not upgrade the 
indexes.  They also need to rename the old COLUMNS table after migration so 
that old clients will not accidentally access the COLUMNS table.
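
The rename step can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for the real metastore database, the retired table name `COLUMNS_OLD` is a hypothetical choice, and the two columns shown are a simplified schema rather than the actual metastore schema.

```python
import sqlite3

def retire_old_columns_table(conn, old_name="COLUMNS", retired_name="COLUMNS_OLD"):
    """Rename the pre-migration table so that stale clients fail fast
    instead of silently reading it. Both names are illustrative."""
    conn.execute(f'ALTER TABLE "{old_name}" RENAME TO "{retired_name}"')
    conn.commit()

def table_names(conn):
    """List the tables currently present in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows)

# In-memory stand-in for a metastore that has already been migrated.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "COLUMNS" (SD_ID INTEGER, COLUMN_NAME TEXT)')
retire_old_columns_table(conn)
```

After the rename, a query against `COLUMNS` from an old client raises an error rather than returning pre-migration data.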

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2366) Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the old COLUMNS table

2011-08-10 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2366:
-

Attachment: HIVE-2366.1.patch

 Metastore upgrade scripts for HIVE-2246 do not migrate indexes nor rename the 
 old COLUMNS table
 ---

 Key: HIVE-2366
 URL: https://issues.apache.org/jira/browse/HIVE-2366
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2366.1.patch


 The upgrade scripts for the hive metastore in HIVE-2246 do not upgrade the 
 indexes.  They also need to rename the old COLUMNS table after migration so 
 that old clients will not accidentally access the COLUMNS table.


[jira] [Created] (HIVE-2367) Indexes' storage descriptors' columns are not deduped, and altering an index leaves behind an unused storage descriptor

2011-08-10 Thread Sohan Jain (JIRA)

Indexes' storage descriptors' columns are not deduped, and altering an index 
leaves behind an unused storage descriptor
---

 Key: HIVE-2367
 URL: https://issues.apache.org/jira/browse/HIVE-2367
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain


The metastore migration in HIVE-2246 does not dedupe the COLUMNS information 
for indexes.  That is, each index in the IDXS table has a Storage Descriptor 
that always points to a new Column Descriptor, which is unlikely to be shared 
by any other Storage Descriptor.

Therefore, when an index is altered, a new Storage Descriptor and Column 
Descriptor are created.  No other objects reference the old Storage 
Descriptor and Column Descriptor, but they persist in the metastore db.


[jira] [Updated] (HIVE-2368) Determining whether a Column Descriptor is unused may take too long

2011-08-10 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2368:
-

Attachment: HIVE-2368.1.patch

 Determining whether a Column Descriptor is unused may take too long
 ---

 Key: HIVE-2368
 URL: https://issues.apache.org/jira/browse/HIVE-2368
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
 Attachments: HIVE-2368.1.patch


 To determine if a column descriptor is unused, we call 
 listStorageDescriptorsWithCD(), which may return a big list of SDs.  This can 
 severely slow down dropping partitions.
 We can add a maximum number of SDs to return, and just ask for 1 SD, since we 
 are just doing an existential check.
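
 The idea of capping the result at one SD can be sketched as below. This is a minimal Python illustration, not the metastore code: `storage_descriptors_with_cd` is a hypothetical stand-in for listStorageDescriptorsWithCD(), the `max_sds` parameter is the proposed limit, and SDs are modelled as plain dicts.

```python
from itertools import islice

def storage_descriptors_with_cd(all_sds, cd_id, max_sds=None):
    """Return SDs that reference the given column descriptor.
    With max_sds set, scanning stops as soon as that many are found."""
    matches = (sd for sd in all_sds if sd["cd_id"] == cd_id)
    return list(islice(matches, max_sds))

def is_cd_unused(all_sds, cd_id):
    """Existential check: one matching SD is enough to answer 'in use'."""
    return not storage_descriptors_with_cd(all_sds, cd_id, max_sds=1)
```

 Because the generator is consumed lazily, the capped call materializes a single SD even when many partitions share the column descriptor.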


[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-08 Thread Sohan Jain (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sohan Jain updated HIVE-2246:
-

Attachment: HIVE-2246.8.patch

Dedupe tables' column schemas from partitions in the metastore db
-

Key: HIVE-2246
URL: https://issues.apache.org/jira/browse/HIVE-2246
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch,
HIVE-2246.8.patch

Note: this patch proposes a schema change, and is therefore incompatible with
the current metastore.
We can re-organize the JDO models to reduce space usage to keep the metastore
scalable for the future. Currently, partitions are the fastest growing
objects in the metastore, and the metastore keeps a separate copy of the
columns list for each partition. We can normalize the metastore db by
decoupling Columns from Storage Descriptors and not storing duplicate lists
of the columns for each partition.
An idea is to create an additional level of indirection with a Column
Descriptor that has a list of columns. A table has a reference to its
latest Column Descriptor (note: a table may have more than one Column
Descriptor in the case of schema evolution). Partitions and Indexes can
reference the same Column Descriptors as their parent table.
Currently, the COLUMNS table in the metastore has roughly (number of
partitions + number of tables) * (average number of columns per table) rows.
We can reduce this to (number of tables) * (average number of columns per
table) rows, while incurring a small cost proportional to the number of
tables to store the Column Descriptors.
Please see the latest review board for additional implementation details.
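
The indirection described above can be sketched in a few lines. This is an illustrative Python model, not the JDO classes from the patch: the class and method names are assumptions chosen to mirror the prose.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ColumnDescriptor:
    columns: List[Tuple[str, str]]  # (column name, column type)

@dataclass
class StorageDescriptor:
    location: str
    cd: ColumnDescriptor  # a shared reference, not a private copy

@dataclass
class Table:
    name: str
    sd: StorageDescriptor
    partitions: List[StorageDescriptor] = field(default_factory=list)

    def add_partition(self, location):
        # New partitions point at the table's latest column descriptor.
        part = StorageDescriptor(location, self.sd.cd)
        self.partitions.append(part)
        return part

    def evolve_schema(self, new_columns):
        # Schema evolution creates a new descriptor; partitions created
        # earlier keep referencing the one they were written with.
        self.sd = StorageDescriptor(self.sd.location, ColumnDescriptor(new_columns))
```

In this model a table with many partitions stores its column list once per schema version rather than once per partition.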


[jira] [Commented] (HIVE-2356) Fix udtf_explode.q and udf_explode.q test failures

2011-08-08 Thread Sohan Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081333#comment-13081333
 ] 

Sohan Jain commented on HIVE-2356:
--

@Carl: out of curiosity, what was the fix here?  I was trying to trace this 
error for a little while.

 Fix udtf_explode.q and udf_explode.q test failures
 --

 Key: HIVE-2356
 URL: https://issues.apache.org/jira/browse/HIVE-2356
 Project: Hive
  Issue Type: Bug
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-2356-fix-explode.1.patch.txt





[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-08-05 Thread Sohan Jain (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sohan Jain updated HIVE-2246:
-

Attachment: HIVE-2246.4.patch

Dedupe tables' column schemas from partitions in the metastore db
-


[jira] [Updated] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-08-05 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2322:
-

Attachment: HIVE-2322.4.patch

fixed the broken test cases

 Add ColumnarSerDe to the list of native SerDes
 --

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch, HIVE-2322.3.patch, 
 HIVE-2322.4.patch


 We store metadata about ColumnarSerDes in the metastore, so it should be 
 considered a native SerDe.  Then, column information can be retrieved from 
 the metastore instead of from deserialization.
 Currently, for non-native SerDes, column comments are shown only as 
 "from deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
 persist column comments.  See HIVE-2171 for persisting the column comments of 
 custom SerDes.


[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception

2011-08-04 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2319:
-

Attachment: HIVE-2319.4.patch

 Calling alter_table after changing partition comment throws an exception
 

 Key: HIVE-2319
 URL: https://issues.apache.org/jira/browse/HIVE-2319
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2319.2.patch, HIVE-2319.3.patch, HIVE-2319.4.patch


 Altering a table's partition key comments raises an 
 InvalidOperationException.  The partition key name and type should not be 
 mutable, but it should be possible to change the comment.


[jira] [Commented] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-08-04 Thread Sohan Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079537#comment-13079537
 ] 

Sohan Jain commented on HIVE-2322:
--

Yes, it looks like some of the output.q files were updated and are now 
conflicting.  I've been re-running the test suite and re-generating them.

 Add ColumnarSerDe to the list of native SerDes
 --

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2322.1.patch, HIVE-2322.2.patch


 We store metadata about ColumnarSerDes in the metastore, so it should be 
 considered a native SerDe.  Then, column information can be retrieved from 
 the metastore instead of from deserialization.
 Currently, for non-native SerDes, column comments are shown only as 
 "from deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
 persist column comments.  See HIVE-2171 for persisting the column comments of 
 custom SerDes.


[jira] [Updated] (HIVE-2338) Alter table always throws an unhelpful error on failure

2011-08-02 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2338:
-

Status: Patch Available  (was: Open)

 Alter table always throws an unhelpful error on failure
 ---

 Key: HIVE-2338
 URL: https://issues.apache.org/jira/browse/HIVE-2338
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
 Attachments: HIVE-2338.1.patch


 Every failure in an alter table function results in a MetaException.  When 
 altering tables and catching exceptions, we throw a MetaException in the 
 finally part of a try-catch-finally block, which overrides any other 
 exception thrown.
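
 The masking effect is easy to reproduce. The sketch below is a Python analogue of the pattern described, not the Hive source: the two exception classes are local stand-ins, and `_do_alter` is a hypothetical helper that simulates the real failure.

```python
class MetaException(Exception):
    """Stand-in for the metastore's generic error type."""

class InvalidOperationException(Exception):
    """Stand-in for the specific error the caller would like to see."""

def _do_alter():
    # Simulates the underlying failure (hypothetical helper).
    raise InvalidOperationException("partition key type cannot be changed")

def alter_table_masking():
    success = False
    try:
        _do_alter()
        success = True
    finally:
        if not success:
            # Raising here replaces the exception already in flight.
            raise MetaException("alter table failed")

def alter_table_preserving():
    try:
        _do_alter()
    except InvalidOperationException:
        raise  # let the specific error reach the caller
    except Exception as exc:
        raise MetaException(str(exc)) from exc

def raised_type(fn):
    """Return the name of the exception type that fn raises, if any."""
    try:
        fn()
    except Exception as exc:
        return type(exc).__name__
    return None
```

 Calling `raised_type` on each variant shows the difference: the first surfaces only the generic error, while the second lets the specific one through.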


[jira] [Created] (HIVE-2319) Calling alter_table after changing partition comment throws an exception

2011-07-28 Thread Sohan Jain (JIRA)

Calling alter_table after changing partition comment throws an exception


 Key: HIVE-2319
 URL: https://issues.apache.org/jira/browse/HIVE-2319
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain


Altering a table's partition key comments raises an InvalidOperationException.  
The partition key name and type should not be mutable, but it should be 
possible to change the comment.
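
The intended rule can be sketched as a comparison that ignores comments. This is an illustrative Python check, not the metastore's validation code: partition keys are modelled as `(name, type, comment)` tuples, and the exception class is a local stand-in.

```python
class InvalidOperationException(Exception):
    """Local stand-in for the metastore exception of the same name."""

def validate_partition_keys(old_keys, new_keys):
    """Reject changes to partition key names or types; allow comment edits.
    Each key is a (name, type, comment) tuple."""
    old_signature = [(name, type_) for name, type_, _comment in old_keys]
    new_signature = [(name, type_) for name, type_, _comment in new_keys]
    if old_signature != new_signature:
        raise InvalidOperationException(
            "partition key names and types cannot be changed")
```

Only the name and type participate in the comparison, so a comment-only edit passes while a type change is still rejected.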


[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception

2011-07-28 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2319:
-

Assignee: Sohan Jain
  Status: Patch Available  (was: Open)

 Calling alter_table after changing partition comment throws an exception
 

 Key: HIVE-2319
 URL: https://issues.apache.org/jira/browse/HIVE-2319
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2319.2.patch


 Altering a table's partition key comments raises an 
 InvalidOperationException.  The partition key name and type should not be 
 mutable, but it should be possible to change the comment.


[jira] [Updated] (HIVE-2319) Calling alter_table after changing partition comment throws an exception

2011-07-28 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2319:
-

Attachment: HIVE-2319.2.patch

 Calling alter_table after changing partition comment throws an exception
 

 Key: HIVE-2319
 URL: https://issues.apache.org/jira/browse/HIVE-2319
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sohan Jain
 Attachments: HIVE-2319.2.patch


 Altering a table's partition key comments raises an 
 InvalidOperationException.  The partition key name and type should not be 
 mutable, but it should be possible to change the comment.


[jira] [Created] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-07-28 Thread Sohan Jain (JIRA)

Add ColumnarSerDe to the list of native SerDes
--

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain


We store metadata about ColumnarSerDes in the metastore, so it should be 
considered a native SerDe.  Then, column information can be retrieved from the 
metastore instead of from deserialization.

Currently, for non-native SerDes, column comments are shown only as 
"from deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
persist column comments.  See HIVE-2171 for persisting the column comments of 
custom SerDes.
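
The effect of the native list on displayed comments can be sketched as follows. This is a simplified Python illustration, not the Hive implementation: the contents of `NATIVE_SERDES` and the function name are assumptions, although the two class names are real Hive SerDe classes.

```python
# Illustrative subset of SerDes treated as native (an assumption).
NATIVE_SERDES = {
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
}

def displayed_column_comment(serde_class, stored_comment):
    """Native SerDes read the comment from the metastore; for any other
    SerDe the schema comes from the deserializer, so the comment is lost."""
    if serde_class in NATIVE_SERDES:
        return stored_comment
    return "from deserializer"
```

With ColumnarSerDe in the set, a comment written at table creation time is what a later describe would show.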


[jira] [Updated] (HIVE-2322) Add ColumnarSerDe to the list of native SerDes

2011-07-28 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2322:
-

Attachment: HIVE-2322.1.patch

 Add ColumnarSerDe to the list of native SerDes
 --

 Key: HIVE-2322
 URL: https://issues.apache.org/jira/browse/HIVE-2322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Serializers/Deserializers
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2322.1.patch


 We store metadata about ColumnarSerDes in the metastore, so it should be 
 considered a native SerDe.  Then, column information can be retrieved from 
 the metastore instead of from deserialization.
 Currently, for non-native SerDes, column comments are shown only as 
 "from deserializer".  Adding ColumnarSerDe to the list of native SerDes will 
 persist column comments.  See HIVE-2171 for persisting the column comments of 
 custom SerDes.


[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-07-25 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2226:
-

Attachment: HIVE-2226.4.patch

include auto-gen thrift files

 Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
 retention, parameters, etc.
 ---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch, HIVE-2226.4.patch


 Create a function called get_table_names_by_filter that returns a list of 
 table names in a database that match a certain filter.  The filter should 
 operate similarly to the one in HIVE-1609.  Initially, you should be able to prune 
 the table list based on owner, retention, or table parameter key/values.  The 
 filtering should take place at the JDO level for efficiency/speed.
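
 The intended pruning semantics can be sketched in memory. This Python function is purely illustrative: it is not the get_table_names_by_filter API or its filter syntax, it assumes equality matching on every criterion, and it filters a list of dicts, whereas the proposal pushes the filter down to the JDO level.

```python
def table_names_matching(tables, owner=None, retention=None, parameters=None):
    """Return the names of tables for which every supplied criterion matches.
    Criteria left as None are ignored."""
    wanted = parameters or {}
    names = []
    for table in tables:
        if owner is not None and table["owner"] != owner:
            continue
        if retention is not None and table["retention"] != retention:
            continue
        if any(table["parameters"].get(k) != v for k, v in wanted.items()):
            continue
        names.append(table["name"])
    return names
```

 Evaluating the same predicate inside the database query avoids loading every table object just to discard most of them.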


[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-07-21 Thread Sohan Jain (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sohan Jain updated HIVE-2246:
-

Description:
Note: this patch proposes a schema change, and is therefore incompatible with
the current metastore.

We can re-organize the JDO models to reduce space usage to keep the metastore
scalable for the future. Currently, partitions are the fastest growing objects
in the metastore, and the metastore keeps a separate copy of the columns list
for each partition. We can normalize the metastore db by decoupling Columns
from Storage Descriptors and not storing duplicate lists of the columns for
each partition.

An idea is to create an additional level of indirection with a Column
Descriptor that has a list of columns. A table has a reference to its latest
Column Descriptor (note: a table may have more than one Column Descriptor in
the case of schema evolution). Partitions and Indexes can reference the same
Column Descriptors as their parent table.

Currently, the COLUMNS table in the metastore has roughly (number of partitions
+ number of tables) * (average number of columns per table) rows. We can reduce
this to (number of tables) * (average number of columns per table) rows, while
incurring a small cost proportional to the number of tables to store the Column
Descriptors.
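
The row-count estimate can be worked through with concrete numbers. The figures below are hypothetical, chosen only to show the scale of the saving; the functions encode the two formulas from the paragraph above and ignore both schema evolution and the rows needed for the Column Descriptors themselves.

```python
def columns_rows_before(num_tables, num_partitions, avg_columns):
    """One copy of the column list per table and per partition."""
    return (num_partitions + num_tables) * avg_columns

def columns_rows_after(num_tables, avg_columns):
    """One copy of the column list per table, shared by its partitions."""
    return num_tables * avg_columns

# Hypothetical warehouse: 1,000 tables, 1,000,000 partitions, 20 columns each.
before = columns_rows_before(1_000, 1_000_000, 20)
after = columns_rows_after(1_000, 20)
```

For these inputs the estimate falls from 20,020,000 rows to 20,000 rows, since the partition term drops out entirely.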

Please see the latest review board for additional implementation details.

was:
We can re-organize the JDO models to reduce space usage to keep the metastore
scalable for the future. Currently, partitions are the fastest growing objects
in the metastore, and the metastore keeps a separate copy of the columns list
for each partition. We can normalize the metastore db by decoupling Columns
from Storage Descriptors and not storing duplicate lists of the columns for
each partition.

Tags: metastore, schema, JDO

Dedupe tables' column schemas from partitions in the metastore db
-


[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-07-21 Thread Sohan Jain (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sohan Jain updated HIVE-2246:
-

Attachment: HIVE-2246.3.patch

Adding some missing files that I forgot to svn add

Dedupe tables' column schemas from partitions in the metastore db
-


[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-07-12 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2226:
-

Attachment: HIVE-2226.3.patch

 Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
 retention, parameters, etc.
 ---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2226.1.patch, HIVE-2226.3.patch


 Create a function called get_table_names_by_filter that returns a list of 
 table names in a database that match a certain filter.  The filter should 
 operate similarly to the one in HIVE-1609.  Initially, you should be able to prune 
 the table list based on owner, retention, or table parameter key/values.  The 
 filtering should take place at the JDO level for efficiency/speed.


[jira] [Created] (HIVE-2275) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions

2011-07-08 Thread Sohan Jain (JIRA)

Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping 
multiple partitions
--

 Key: HIVE-2275
 URL: https://issues.apache.org/jira/browse/HIVE-2275
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain


HIVE-2219 applied an incorrect patch that fails unit tests.  This patch reverts 
those changes and adds the intended changes to improve the efficiency of 
dropping multiple partitions.


[jira] [Updated] (HIVE-2275) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions

2011-07-08 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2275:
-

Attachment: HIVE-2275.1.patch

 Revert HIVE-2219 and apply correct patch to improve the efficiency of 
 dropping multiple partitions
 --

 Key: HIVE-2275
 URL: https://issues.apache.org/jira/browse/HIVE-2275
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2275.1.patch


 HIVE-2219 applied an incorrect patch that fails unit tests.  This patch 
 reverts those changes and adds the intended changes to improve the efficiency 
 of dropping multiple partitions.


[jira] [Updated] (HIVE-2275) Revert HIVE-2219 and apply correct patch to improve the efficiency of dropping multiple partitions

2011-07-08 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2275:
-

Status: Patch Available  (was: Open)

 Revert HIVE-2219 and apply correct patch to improve the efficiency of 
 dropping multiple partitions
 --

 Key: HIVE-2275
 URL: https://issues.apache.org/jira/browse/HIVE-2275
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2275.1.patch


 HIVE-2219 applied an incorrect patch that fails unit tests.  This patch 
 reverts those changes and adds the intended changes to improve the efficiency 
 of dropping multiple partitions.


[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient

2011-07-08 Thread Sohan Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062167#comment-13062167
 ] 

Sohan Jain commented on HIVE-2219:
--

Ok, please refer to HIVE-2275

 Make alter table drop partition more efficient
 

 Key: HIVE-2219
 URL: https://issues.apache.org/jira/browse/HIVE-2219
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2219.1.patch, HIVE-2219.2.patch


 The current function dropTable() that handles dropping multiple partitions is 
 somewhat inefficient.  For each partition you want to drop, it loops through 
 each partition in the table to see if the partition exists.  This is an 
 _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is 
 the number of partitions in the table.  The running time of this function can 
 be improved, which is useful for tables with many partitions.
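
 One common way to remove the nested scan is to index the table's partitions once and then look up each partition to drop. The sketch below is a Python illustration of that general technique, not the patch attached to this issue; partitions are modelled as plain strings and both function names are hypothetical.

```python
def find_existing_nested(to_drop, table_partitions):
    """O(m*n): for each partition to drop, scan every partition in the table."""
    found = []
    for spec in to_drop:
        for part in table_partitions:
            if part == spec:
                found.append(spec)
                break
    return found

def find_existing_indexed(to_drop, table_partitions):
    """O(m + n) on average: build a hash set once, then do constant-time lookups."""
    existing = set(table_partitions)
    return [spec for spec in to_drop if spec in existing]
```

 Both functions return the same result; the second trades O(n) extra memory for avoiding the repeated scans.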


[jira] [Updated] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners

2011-07-08 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2194:
-

Attachment: HIVE-2194.4.patch

 Add actions for alter table and alter partition events for metastore event 
 listeners
 

 Key: HIVE-2194
 URL: https://issues.apache.org/jira/browse/HIVE-2194
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Fix For: 0.8.0

 Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch, HIVE-2194.4.patch


 HIVE-2038 introduced the MetaStoreEventListener abstract class that defines 
 actions to be performed after particular events on a metastore.  Improve upon 
 that class by adding events to be performed on alter table and alter 
 partition actions.


[jira] [Created] (HIVE-2276) Fix Inconsistency between RB and JIRA patches for HIVE-2194

2011-07-08 Thread Sohan Jain (JIRA)

Fix Inconsistency between RB and JIRA patches for HIVE-2194
---

 Key: HIVE-2276
 URL: https://issues.apache.org/jira/browse/HIVE-2276
 Project: Hive
  Issue Type: Bug
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2276.1.patch

The RB and JIRA patches for HIVE-2194 were out of sync.  An outdated patch for 
HIVE-2194 was committed.  This patch updates that patch to include the changes 
from RB.


[jira] [Updated] (HIVE-2276) Fix Inconsistency between RB and JIRA patches for HIVE-2194

2011-07-08 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2276:
-

Attachment: HIVE-2276.1.patch

 Fix Inconsistency between RB and JIRA patches for HIVE-2194
 ---

 Key: HIVE-2276
 URL: https://issues.apache.org/jira/browse/HIVE-2276
 Project: Hive
  Issue Type: Bug
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2276.1.patch


 The RB and JIRA patches for HIVE-2194 were out of sync.  An outdated patch 
 for HIVE-2194 was committed.  This patch updates that patch to include the 
 changes from RB.


[jira] [Updated] (HIVE-2276) Fix Inconsistency between RB and JIRA patches for HIVE-2194

2011-07-08 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2276:
-

Status: Patch Available  (was: Open)

 Fix Inconsistency between RB and JIRA patches for HIVE-2194
 ---

 Key: HIVE-2276
 URL: https://issues.apache.org/jira/browse/HIVE-2276
 Project: Hive
  Issue Type: Bug
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2276.1.patch


 The RB and JIRA patches for HIVE-2194 were out of sync.  An outdated patch 
 for HIVE-2194 was committed.  This patch updates that patch to include the 
 changes from RB.


[jira] [Updated] (HIVE-2256) Better error message in CLI on invalid column name

2011-07-07 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2256:
-

Status: Patch Available  (was: Open)

 Better error message in CLI on invalid column name
 --

 Key: HIVE-2256
 URL: https://issues.apache.org/jira/browse/HIVE-2256
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2256.1.patch, HIVE-2256.2.patch


 In the CLI, if a user inputs an incorrect column name, we currently just 
 print the bad column name in the query.  Typically, the user needs to 
 describe the table to figure out the correct column.
 This patch prints out a list of valid column and partition names in the table 
 when a user inputs an incorrect column name.
 e.g., 
 {{Error in semantic analysis: Invalid table alias or column reference 
 'col_does_not_exist' (possible column names are: col1, col2)}}


[jira] [Created] (HIVE-2256) Better error message in CLI on invalid column name

2011-07-05 Thread Sohan Jain (JIRA)

Better error message in CLI on invalid column name
--

 Key: HIVE-2256
 URL: https://issues.apache.org/jira/browse/HIVE-2256
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Query Processor
Reporter: Sohan Jain


In the CLI, if a user inputs an incorrect column name, we currently just print 
the bad column name in the query.  Typically, the user needs to describe the 
table to figure out the correct column.

This patch prints out a list of valid column and partition names in the table 
when a user inputs an incorrect column name.

e.g., 
{{Error in semantic analysis: Invalid table alias or column reference 
'col_does_not_exist' (possible column names are: col1, col2)}}
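
The message format above can be sketched as a small formatting helper. This is an illustrative Python function, not the Hive semantic analyzer: the function name and its parameters are assumptions, and only the message text is taken from the example.

```python
def invalid_column_message(bad_name, columns, partition_columns=()):
    """Build an error message that lists the names the user could have meant."""
    candidates = ", ".join(list(columns) + list(partition_columns))
    return (
        "Error in semantic analysis: Invalid table alias or column reference "
        f"'{bad_name}' (possible column names are: {candidates})"
    )
```

Partition columns are appended after the regular columns so that both kinds of valid name appear in one list.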


[jira] [Updated] (HIVE-2256) Better error message in CLI on invalid column name

2011-07-05 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2256:
-

Attachment: HIVE-2256.1.patch

The patch will probably need to edit test cases as well to account for the new 
error messages.

 Better error message in CLI on invalid column name
 --

 Key: HIVE-2256
 URL: https://issues.apache.org/jira/browse/HIVE-2256
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Query Processor
Reporter: Sohan Jain
 Attachments: HIVE-2256.1.patch


 In the CLI, if a user inputs an incorrect column name, we currently just 
 print the bad column name in the query.  Typically, the user needs to 
 describe the table to figure out the correct column.
 This patch prints out a list of valid column and partition names in the table 
 when a user inputs an incorrect column name.
 e.g., 
 {{Error in semantic analysis: Invalid table alias or column reference 
 'col_does_not_exist' (possible column names are: col1, col2)}}


[jira] [Assigned] (HIVE-2256) Better error message in CLI on invalid column name

2011-07-05 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain reassigned HIVE-2256:


Assignee: Sohan Jain

 Better error message in CLI on invalid column name
 --

 Key: HIVE-2256
 URL: https://issues.apache.org/jira/browse/HIVE-2256
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2256.1.patch


 In the CLI, if a user inputs an incorrect column name, we currently just 
 print the bad column name in the query.  Typically, the user needs to 
 describe the table to figure out the correct column.
 This patch prints out a list of valid column and partition names in the table 
 when a user inputs an incorrect column name.
 e.g., 
 {{Error in semantic analysis: Invalid table alias or column reference 
 'col_does_not_exist' (possible column names are: col1, col2)}}


[jira] [Updated] (HIVE-2256) Better error message in CLI on invalid column name

2011-07-05 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2256:
-

Attachment: HIVE-2256.2.patch

-Updated the outputs for unit tests.
-Fixed the error message to include the line number and position in the query 
of the invalid column.

 Better error message in CLI on invalid column name
 --

 Key: HIVE-2256
 URL: https://issues.apache.org/jira/browse/HIVE-2256
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2256.1.patch, HIVE-2256.2.patch


 In the CLI, if a user inputs an incorrect column name, we currently just 
 print the bad column name in the query.  Typically, the user needs to 
 describe the table to figure out the correct column.
 This patch prints out a list of valid column and partition names in the table 
 when a user inputs an incorrect column name.
 e.g., 
 {{Error in semantic analysis: Invalid table alias or column reference 
 'col_does_not_exist' (possible column names are: col1, col2)}}


[jira] [Created] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

2011-06-30 Thread Sohan Jain (JIRA)

Dedupe tables' column schemas from partitions in the metastore db
-

 Key: HIVE-2246
 URL: https://issues.apache.org/jira/browse/HIVE-2246
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


We can re-organize the JDO models to reduce space usage to keep the metastore 
scalable for the future.  Currently, partitions are the fastest growing objects 
in the metastore, and the metastore keeps a separate copy of the columns list 
for each partition.  We can normalize the metastore db by decoupling Columns 
from Storage Descriptors and not storing duplicate lists of the columns for 
each partition. 

An idea is to create an additional level of indirection with a Column 
Descriptor that has a list of columns.  A table has a reference to its latest 
Column Descriptor (note: a table may have more than one Column Descriptor in 
the case of schema evolution).  Partitions and Indexes can reference the same 
Column Descriptors as their parent table.

Currently, the COLUMNS table in the metastore has roughly (number of partitions 
+ number of tables) * (average number of columns per table) rows.  We can reduce 
this to (number of tables) * (average number of columns per table) rows, while 
incurring a small cost proportional to the number of tables to store the Column 
Descriptors.
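
To make the estimate concrete, here is a small worked calculation.  The table, partition, and column counts are made-up numbers chosen for illustration, and the sketch ignores both schema evolution and the per-table cost of the Column Descriptors themselves.

```java
// Worked example of the row-count estimate; all input numbers are hypothetical.
class ColumnRowEstimate {

    // Current layout: every table and every partition keeps its own column list.
    static long rowsBefore(long tables, long partitions, long avgColumnsPerTable) {
        return (partitions + tables) * avgColumnsPerTable;
    }

    // Proposed layout: one shared column list per table.
    static long rowsAfter(long tables, long avgColumnsPerTable) {
        return tables * avgColumnsPerTable;
    }

    public static void main(String[] args) {
        long tables = 1_000, partitions = 1_000_000, avgColumns = 20;
        System.out.println("before: " + rowsBefore(tables, partitions, avgColumns));
        System.out.println("after:  " + rowsAfter(tables, avgColumns));
    }
}
```

With these inputs the COLUMNS table would go from 20,020,000 rows to 20,000 rows.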


[jira] [Updated] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-06-29 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2242:
-

Attachment: HIVE-2242.1.patch

- use db.getPartitions instead of db.getPartition to accommodate partial 
specifications

 DDL Semantic Analyzer does not pass partial specification partitions to 
 PreExecute hooks when dropping partitions
 -

 Key: HIVE-2242
 URL: https://issues.apache.org/jira/browse/HIVE-2242
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2242.1.patch


 Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
 partitions that have a full specification to Pre Execution hooks.  It should 
 also include all matches from partial specifications.
 E.g., suppose you have a table
 {{create table test_table (a string) partitioned by (p1 string, p2 string);}}
 {{alter table test_table add partition (p1=1, p2=1);}}
 {{alter table test_table add partition (p1=1, p2=2);}}
 {{alter table test_table add partition (p1=2, p2=2);}}
 and you run 
 {{alter table test_table drop partition(p1=1);}}
 Pre-execution hooks will not be passed any of the partitions.  The expected 
 behavior is for pre-execution hooks to get the WriteEntity's with the 
 partitions p1=1/p2=1 and p1=1/p2=2


[jira] [Updated] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-06-29 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2242:
-

Status: Patch Available  (was: Open)

 DDL Semantic Analyzer does not pass partial specification partitions to 
 PreExecute hooks when dropping partitions
 -

 Key: HIVE-2242
 URL: https://issues.apache.org/jira/browse/HIVE-2242
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2242.1.patch


 Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
 partitions that have a full specification to Pre Execution hooks.  It should 
 also include all matches from partial specifications.
 E.g., suppose you have a table
 {{create table test_table (a string) partitioned by (p1 string, p2 string);}}
 {{alter table test_table add partition (p1=1, p2=1);}}
 {{alter table test_table add partition (p1=1, p2=2);}}
 {{alter table test_table add partition (p1=2, p2=2);}}
 and you run 
 {{alter table test_table drop partition(p1=1);}}
 Pre-execution hooks will not be passed any of the partitions.  The expected 
 behavior is for pre-execution hooks to get the WriteEntity's with the 
 partitions p1=1/p2=1 and p1=1/p2=2


[jira] [Created] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-06-27 Thread Sohan Jain (JIRA)

DDL Semantic Analyzer does not pass partial specification partitions to 
PreExecute hooks when dropping partitions
-

 Key: HIVE-2242
 URL: https://issues.apache.org/jira/browse/HIVE-2242
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain


Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
partitions that have a full specification to Pre Execution hooks.  It should 
also include all matches from partial specifications.

E.g., suppose you have a table
{{create table test_table (a string) partitioned by (p1 string, p2 string);
alter table test_table add partition (p1=1, p2=1);
alter table test_table add partition (p1=1, p2=2);
alter table test_table add partition (p1=2, p2=2);
}}

and you run 
{{alter table test_table drop partition(p1=1);}}
Pre-execution hooks will not be passed any of the partitions.  The expected 
behavior is for pre-execution hooks to get the WriteEntity's with the 
partitions p1=1/p2=1 and p1=1/p2=2
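
The expected matching can be sketched as follows.  This is an in-memory illustration only, with partitions modeled as plain key/value maps; the class and method names are hypothetical and do not correspond to the Hive code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration of partial-specification matching; not Hive's implementation.
class PartialSpecMatch {

    // Returns every partition whose values agree with all keys given in the
    // (possibly partial) specification.
    static List<Map<String, String>> matching(List<Map<String, String>> partitions,
                                              Map<String, String> partialSpec) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> partition : partitions) {
            if (partition.entrySet().containsAll(partialSpec.entrySet())) {
                result.add(partition);
            }
        }
        return result;
    }

    // Convenience constructor for a two-level partition (p1, p2).
    static Map<String, String> part(String p1, String p2) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("p1", p1);
        spec.put("p2", p2);
        return spec;
    }

    public static void main(String[] args) {
        List<Map<String, String>> all = new ArrayList<>();
        all.add(part("1", "1"));
        all.add(part("1", "2"));
        all.add(part("2", "2"));
        // Dropping partition (p1=1) should surface p1=1/p2=1 and p1=1/p2=2.
        System.out.println(matching(all, Collections.singletonMap("p1", "1")));
    }
}
```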


[jira] [Updated] (HIVE-2242) DDL Semantic Analyzer does not pass partial specification partitions to PreExecute hooks when dropping partitions

2011-06-27 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2242:
-

Description: 
Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
partitions that have a full specification to Pre Execution hooks.  It should 
also include all matches from partial specifications.

E.g., suppose you have a table
{{create table test_table (a string) partitioned by (p1 string, p2 string);}}
{{alter table test_table add partition (p1=1, p2=1);}}
{{alter table test_table add partition (p1=1, p2=2);}}
{{alter table test_table add partition (p1=2, p2=2);}}

and you run 
{{alter table test_table drop partition(p1=1);}}
Pre-execution hooks will not be passed any of the partitions.  The expected 
behavior is for pre-execution hooks to get the WriteEntity's with the 
partitions p1=1/p2=1 and p1=1/p2=2

  was:
Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
partitions that have a full specification to Pre Execution hooks.  It should 
also include all matches from partial specifications.

E.g., suppose you have a table
{{create table test_table (a string) partitioned by (p1 string, p2 string);
alter table test_table add partition (p1=1, p2=1);
alter table test_table add partition (p1=1, p2=2);
alter table test_table add partition (p1=2, p2=2);
}}

and you run 
{{alter table test_table drop partition(p1=1);}}
Pre-execution hooks will not be passed any of the partitions.  The expected 
behavior is for pre-execution hooks to get the WriteEntity's with the 
partitions p1=1/p2=1 and p1=1/p2=2


 DDL Semantic Analyzer does not pass partial specification partitions to 
 PreExecute hooks when dropping partitions
 -

 Key: HIVE-2242
 URL: https://issues.apache.org/jira/browse/HIVE-2242
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain

 Currently, when dropping partitions, the DDL Semantic Analyzer only passes 
 partitions that have a full specification to Pre Execution hooks.  It should 
 also include all matches from partial specifications.
 E.g., suppose you have a table
 {{create table test_table (a string) partitioned by (p1 string, p2 string);}}
 {{alter table test_table add partition (p1=1, p2=1);}}
 {{alter table test_table add partition (p1=1, p2=2);}}
 {{alter table test_table add partition (p1=2, p2=2);}}
 and you run 
 {{alter table test_table drop partition(p1=1);}}
 Pre-execution hooks will not be passed any of the partitions.  The expected 
 behavior is for pre-execution hooks to get the WriteEntity's with the 
 partitions p1=1/p2=1 and p1=1/p2=2


[jira] [Updated] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners

2011-06-21 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2194:
-

Attachment: HIVE-2194.3.patch

 Add actions for alter table and alter partition events for metastore event 
 listeners
 

 Key: HIVE-2194
 URL: https://issues.apache.org/jira/browse/HIVE-2194
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2194.1.patch, HIVE-2194.3.patch


 HIVE-2038 introduced the MetaStoreEventListener abstract class that defines 
 actions to be performed after particular events on a metastore.  Improve upon 
 that class by adding events to be performed on alter table and alter 
 partition actions.


[jira] [Commented] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-17 Thread Sohan Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051333#comment-13051333
 ] 

Sohan Jain commented on HIVE-2213:
--

I'd also like to point one more thing out.  The previous implementation of 
get_partitions_ps_with_auth() did not actually make use of the inputted user 
name or group name, nor did it set any auth privileges on the desired 
partitions.  

This patch adds authentication privileges, which unfortunately slows down 
get_partitions_ps_with_auth(), since we have to iterate through all of the 
partitions and set privileges before returning them.  What is the desired 
behavior here?

 Optimize get_partition_names_ps()
 -

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch


 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we get all of the partition names from 
 the database.  This is not very memory efficient, and the operation can be 
 pushed down to the JDO layer without getting all of the names first.


[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-16 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2213:
-

Status: Patch Available  (was: Open)

 Optimize get_partition_names_ps()
 -

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch


 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we get all of the partition names from 
 the database.  This is not very memory efficient, and the operation can be 
 pushed down to the JDO layer without getting all of the names first.


[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-16 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2213:
-

Attachment: HIVE-2213.3.patch

-Fixed line that exceeded 100 chars

 Optimize get_partition_names_ps()
 -

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2213.1.patch, HIVE-2213.3.patch


 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we get all of the partition names from 
 the database.  This is not very memory efficient, and the operation can be 
 pushed down to the JDO layer without getting all of the names first.


[jira] [Updated] (HIVE-2219) Make alter table drop partition more efficient

2011-06-16 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2219:
-

Status: Open  (was: Patch Available)

 Make alter table drop partition more efficient
 

 Key: HIVE-2219
 URL: https://issues.apache.org/jira/browse/HIVE-2219
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2219.1.patch


 The current function dropTable() that handles dropping multiple partitions is 
 somewhat inefficient.  For each partition you want to drop, it loops through 
 each partition in the table to see if the partition exists.  This is an 
 _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is 
 the number of partitions in the table.  The running time of this function can 
 be improved, which is useful for tables with many partitions.


[jira] [Commented] (HIVE-2219) Make alter table drop partition more efficient

2011-06-16 Thread Sohan Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050874#comment-13050874
 ] 

Sohan Jain commented on HIVE-2219:
--

Ah sorry, after another round of testing, I realized this doesn't work 
correctly at all for partial partition specs!  I will re-implement it and test 
again for speed / full correctness.

 Make alter table drop partition more efficient
 

 Key: HIVE-2219
 URL: https://issues.apache.org/jira/browse/HIVE-2219
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2219.1.patch


 The current function dropTable() that handles dropping multiple partitions is 
 somewhat inefficient.  For each partition you want to drop, it loops through 
 each partition in the table to see if the partition exists.  This is an 
 _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is 
 the number of partitions in the table.  The running time of this function can 
 be improved, which is useful for tables with many partitions.


[jira] [Created] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-06-15 Thread Sohan Jain (JIRA)

Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
retention, parameters, etc.
---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


Create a function called get_table_names_by_filter that returns a list of table 
names in a database that match a certain filter.  The filter should operate 
similarly to the one in HIVE-1609.  Initially, you should be able to prune the table 
list based on owner, retention, or table parameter key/values.  The filtering 
should take place at the JDO level for efficiency/speed.


[jira] [Updated] (HIVE-2226) Add API to retrieve table names by an arbitrary filter, e.g., by owner, retention, parameters, etc.

2011-06-15 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2226:
-

Attachment: HIVE-2226.1.patch

 Add API to retrieve table names by an arbitrary filter, e.g., by owner, 
 retention, parameters, etc.
 ---

 Key: HIVE-2226
 URL: https://issues.apache.org/jira/browse/HIVE-2226
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2226.1.patch


 Create a function called get_table_names_by_filter that returns a list of 
 table names in a database that match a certain filter.  The filter should 
 operate similarly to the one in HIVE-1609.  Initially, you should be able to prune 
 the table list based on owner, retention, or table parameter key/values.  The 
 filtering should take place at the JDO level for efficiency/speed.


[jira] [Created] (HIVE-2219) Make alter table drop partition more efficient

2011-06-14 Thread Sohan Jain (JIRA)

Make alter table drop partition more efficient


 Key: HIVE-2219
 URL: https://issues.apache.org/jira/browse/HIVE-2219
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Sohan Jain


The current function dropTable() that handles dropping multiple partitions is 
somewhat inefficient.  For each partition you want to drop, it loops through 
each partition in the table to see if the partition exists.  This is an _O(mn)_ 
operation, where _m_ is the number of partitions to drop, and _n_ is the number 
of partitions in the table.  The running time of this function can be improved, 
which is useful for tables with many partitions.


[jira] [Assigned] (HIVE-2219) Make alter table drop partition more efficient

2011-06-14 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain reassigned HIVE-2219:


Assignee: Sohan Jain

 Make alter table drop partition more efficient
 

 Key: HIVE-2219
 URL: https://issues.apache.org/jira/browse/HIVE-2219
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain

 The current function dropTable() that handles dropping multiple partitions is 
 somewhat inefficient.  For each partition you want to drop, it loops through 
 each partition in the table to see if the partition exists.  This is an 
 _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is 
 the number of partitions in the table.  The running time of this function can 
 be improved, which is useful for tables with many partitions.


[jira] [Updated] (HIVE-2219) Make alter table drop partition more efficient

2011-06-14 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2219:
-

Attachment: HIVE-2219.1.patch

Improves the time it takes to check whether a partition to delete exists in the 
lists of partitions.  Overall improves the complexity to _O(m + n)_
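
As a rough illustration of the complexity change, and not the code in the patch, the existence check can be done against a hash set built once from the table's partitions.  The sketch assumes partitions are identified by their full names, so it covers fully specified partitions only; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of an O(m + n) existence check; handles exact partition names only.
class DropPartitionLookup {

    // Returns the requested partitions that actually exist in the table.
    static List<String> existing(List<String> tablePartitions, List<String> toDrop) {
        // Building the set is O(n) in the number of table partitions.
        Set<String> present = new HashSet<>(tablePartitions);
        List<String> found = new ArrayList<>();
        // Each lookup is expected O(1), so this loop is O(m).
        for (String name : toDrop) {
            if (present.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("p1=1/p2=1", "p1=1/p2=2", "p1=2/p2=2");
        List<String> drop = Arrays.asList("p1=1/p2=2", "p1=3/p2=3");
        System.out.println(existing(table, drop));
    }
}
```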

 Make alter table drop partition more efficient
 

 Key: HIVE-2219
 URL: https://issues.apache.org/jira/browse/HIVE-2219
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2219.1.patch


 The current function dropTable() that handles dropping multiple partitions is 
 somewhat inefficient.  For each partition you want to drop, it loops through 
 each partition in the table to see if the partition exists.  This is an 
 _O(mn)_ operation, where _m_ is the number of partitions to drop, and _n_ is 
 the number of partitions in the table.  The running time of this function can 
 be improved, which is useful for tables with many partitions.


[jira] [Created] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-10 Thread Sohan Jain (JIRA)

Optimize get_partition_names_ps()
-

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


If a table has a large number of partitions, get_partition_names_ps() may take 
a long time to execute, because we get all of the partition names from the 
database.  This is not very memory efficient, and the operation can be pushed 
down to the JDO layer without getting all of the names first.
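
One way to picture the pushdown is to turn the partial values into a single name pattern that the datastore query could evaluate, instead of fetching every name and filtering afterwards.  The sketch below only builds and tests such a pattern in plain Java.  Treating an empty or missing value as a wildcard, the "key=value/key=value" name layout, and all names in the code are assumptions of this example, and it does not show the JDO query itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Illustration only: builds a partition-name pattern from partial values.
class PartialNamePattern {

    // An empty or missing value is treated as "any value" for that key
    // (an assumption of this sketch).
    static Pattern forPartialValues(List<String> keys, List<String> partialValues) {
        StringJoiner name = new StringJoiner("/");
        for (int i = 0; i < keys.size(); i++) {
            String value = i < partialValues.size() ? partialValues.get(i) : "";
            String valuePattern = value.isEmpty() ? "[^/]*" : Pattern.quote(value);
            name.add(Pattern.quote(keys.get(i)) + "=" + valuePattern);
        }
        return Pattern.compile(name.toString());
    }

    public static void main(String[] args) {
        Pattern p = forPartialValues(Arrays.asList("p1", "p2"), Arrays.asList("1"));
        System.out.println(p.matcher("p1=1/p2=2").matches());
        System.out.println(p.matcher("p1=2/p2=2").matches());
    }
}
```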


[jira] [Updated] (HIVE-2213) Optimize get_partition_names_ps()

2011-06-10 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2213:
-

Attachment: HIVE-2213.1.patch

 Optimize get_partition_names_ps()
 -

 Key: HIVE-2213
 URL: https://issues.apache.org/jira/browse/HIVE-2213
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2213.1.patch


 If a table has a large number of partitions, get_partition_names_ps() may 
 take a long time to execute, because we get all of the partition names from 
 the database.  This is not very memory efficient, and the operation can be 
 pushed down to the JDO layer without getting all of the names first.


[jira] [Commented] (HIVE-2188) Add get_table_objects_by_name() to Hive MetaStore

2011-06-09 Thread Sohan Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046983#comment-13046983
 ] 

Sohan Jain commented on HIVE-2188:
--

Thank you, Carl.

 Add get_table_objects_by_name() to Hive MetaStore
 -

 Key: HIVE-2188
 URL: https://issues.apache.org/jira/browse/HIVE-2188
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2188.1.patch, HIVE-2188.3.patch


 This function would get multiple tables from the hive metastore as opposed to 
 just one at a time, saving round trip time to the metastore.


[jira] [Updated] (HIVE-2188) Add multi_get_table function in Hive Metastore

2011-06-06 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2188:
-

Attachment: HIVE-2188.3.patch

 Add multi_get_table function in Hive Metastore
 --

 Key: HIVE-2188
 URL: https://issues.apache.org/jira/browse/HIVE-2188
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
 Attachments: HIVE-2188.1.patch, HIVE-2188.3.patch


 This function would get multiple tables from the hive metastore as opposed to 
 just one at a time, saving round trip time to the metastore.


[jira] [Created] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners

2011-06-03 Thread Sohan Jain (JIRA)

Add actions for alter table and alter partition events for metastore event 
listeners


 Key: HIVE-2194
 URL: https://issues.apache.org/jira/browse/HIVE-2194
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain


HIVE-2038 introduced the MetaStoreEventListener abstract class that defines 
actions to be performed after particular events on a metastore.  Improve upon 
that class by adding events to be performed on alter table and alter 
partition actions.
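
A rough sketch of the shape this could take.  Every class, method, and parameter name below is a hypothetical stand-in written for this example; none of it is the actual MetaStoreEventListener API from HIVE-2038 or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metastore listener base class.
abstract class SketchMetaStoreListener {
    // Hook of the kind that already exists (no-op by default).
    void onCreateTable(String tableName) {}

    // Hooks of the kind this issue proposes, invoked after the alter succeeds.
    void onAlterTable(String oldTableName, String newTableName) {}
    void onAlterPartition(String tableName, String partitionName) {}
}

// Example subclass that records the alter events it receives.
class AuditListener extends SketchMetaStoreListener {
    final List<String> log = new ArrayList<>();

    @Override
    void onAlterTable(String oldTableName, String newTableName) {
        log.add("alter table " + oldTableName + " -> " + newTableName);
    }

    @Override
    void onAlterPartition(String tableName, String partitionName) {
        log.add("alter partition " + tableName + "/" + partitionName);
    }

    public static void main(String[] args) {
        AuditListener listener = new AuditListener();
        listener.onAlterTable("t", "t2");
        listener.onAlterPartition("t2", "p1=1");
        System.out.println(listener.log);
    }
}
```

Keeping the default hooks as no-ops means existing subclasses would not need to change when new events are added.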


[jira] [Updated] (HIVE-2194) Add actions for alter table and alter partition events for metastore event listeners

2011-06-03 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2194:
-

Status: Patch Available  (was: Open)

 Add actions for alter table and alter partition events for metastore event 
 listeners
 

 Key: HIVE-2194
 URL: https://issues.apache.org/jira/browse/HIVE-2194
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
 Attachments: HIVE-2194.1.patch


 HIVE-2038 introduced the MetaStoreEventListener abstract class that defines 
 actions to be performed after particular events on a metastore.  Improve upon 
 that class by adding events to be performed on alter table and alter 
 partition actions.


[jira] [Updated] (HIVE-2188) Add multi_get_table function in Hive Metastore

2011-06-01 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2188:
-

Attachment: HIVE-2188.1.patch

 - Added multiGetTable function to the interface RawStore
 - Implemented it in ObjectStore imitating the SQL IN operator in JDO
 - added multi_get_table function to HiveMetaStore that is a wrapper to the 
RawStore function.
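
For readers unfamiliar with the trick, imitating IN usually means expanding the name list into a chain of OR-ed equality tests with one query parameter per name.  The sketch below only builds such a filter string.  The field name tableName, the parameter names t0, t1, ..., and the class itself are assumptions for illustration, not the code from the patch.

```java
import java.util.StringJoiner;

// Illustration only: builds an OR-chain filter that stands in for SQL's IN.
class InFilterSketch {

    // Produces "field == t0 || field == t1 || ..." for the given number of
    // names. Returns an empty string when count is zero, which a caller
    // would need to handle separately.
    static String buildFilter(String field, int count) {
        StringJoiner or = new StringJoiner(" || ");
        for (int i = 0; i < count; i++) {
            or.add(field + " == t" + i);
        }
        return or.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildFilter("tableName", 3));
    }
}
```

Binding each name as a parameter, rather than concatenating the values into the filter, avoids quoting problems with unusual table names.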

 Add multi_get_table function in Hive Metastore
 --

 Key: HIVE-2188
 URL: https://issues.apache.org/jira/browse/HIVE-2188
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
 Attachments: HIVE-2188.1.patch


 This function would get multiple tables from the hive metastore as opposed to 
 just one at a time, saving round trip time to the metastore.


[jira] [Updated] (HIVE-2188) Add multi_get_table function in Hive Metastore

2011-06-01 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-2188:
-

Status: Patch Available  (was: Open)

 Add multi_get_table function in Hive Metastore
 --

 Key: HIVE-2188
 URL: https://issues.apache.org/jira/browse/HIVE-2188
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor
 Attachments: HIVE-2188.1.patch


 This function would get multiple tables from the hive metastore as opposed to 
 just one at a time, saving round trip time to the metastore.


[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct

2011-05-28 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-1595:
-

Status: Patch Available  (was: Open)

Removed README edit and unnecessary diffs to the import statements

 job name for alter table T archive partition P is not correct
 -

 Key: HIVE-1595
 URL: https://issues.apache.org/jira/browse/HIVE-1595
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Sohan Jain
 Attachments: Hive-1595.1.patch, Hive-1595.2.patch


 For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which 
 makes the job difficult to identify.


[jira] [Created] (HIVE-2188) Add multi_get_table function in Hive Metastore

2011-05-27 Thread Sohan Jain (JIRA)

Add multi_get_table function in Hive Metastore
--

 Key: HIVE-2188
 URL: https://issues.apache.org/jira/browse/HIVE-2188
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Sohan Jain
Assignee: Sohan Jain
Priority: Minor


This function would get multiple tables from the hive metastore as opposed to 
just one at a time, saving round trip time to the metastore.


[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct

2011-05-27 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-1595:
-

Attachment: Hive-1595.1.patch

 job name for alter table T archive partition P is not correct
 -

 Key: HIVE-1595
 URL: https://issues.apache.org/jira/browse/HIVE-1595
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Sohan Jain
 Attachments: Hive-1595.1.patch


 For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which 
 makes the job difficult to identify.


[jira] [Updated] (HIVE-1595) job name for alter table T archive partition P is not correct

2011-05-27 Thread Sohan Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sohan Jain updated HIVE-1595:
-

Status: Patch Available  (was: Open)

 job name for alter table T archive partition P is not correct
 -

 Key: HIVE-1595
 URL: https://issues.apache.org/jira/browse/HIVE-1595
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Sohan Jain
 Attachments: Hive-1595.1.patch


 For some internal runs, I saw the job name as hadoop-0.20.1-tools.jar, which 
 makes the job difficult to identify.

