[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2012-08-20 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2050:
-

Labels: PartitionPruner  (was: )

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Query Processor
Reporter: Ning Zhang
Assignee: Ning Zhang
  Labels: PartitionPruner
 Fix For: 0.8.0

 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, 
 HIVE-2050.patch


 For partition predicates that cannot be pushed down to JDO filtering 
 (HIVE-2049), we should fall back to the old approach of listing all partition 
 names first and using Hive's expression evaluation engine to select the 
 matching partitions. The partition pruner should then hand Hive the list of 
 selected partition names and get back a list of Partition objects (this call 
 should be added to the Hive API). 
 A possible optimization is for the partition pruner to give Hive a set of 
 ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]) and for the 
 JDO query to be formulated as range queries. Range queries are possible 
 because the first step lists all partition names in sorted order, so it is 
 easy to derive the ranges, and each JDO range query is guaranteed to return 
 the same partitions as a query with the explicit list of partition names. 
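
The range optimization described above can be sketched as follows. This is a minimal illustration in Python, not Hive's code: `to_ranges` and `names_in_ranges` are hypothetical helpers, the partition names are made up, and `keep` is a plain Python function standing in for Hive's expression evaluator. It assumes the full name list is sorted and that a range query returns every existing name between its bounds, inclusive.

```python
from bisect import bisect_left, bisect_right


def to_ranges(sorted_names, selected):
    """Collapse selected names into inclusive (low, high) ranges.

    A range is extended only while consecutive entries of sorted_names are
    selected, so no unselected partition ever falls inside a range.
    """
    ranges = []
    start = prev = None
    for name in sorted_names:
        if name in selected:
            if start is None:
                start = name
            prev = name
        elif start is not None:
            ranges.append((start, prev))
            start = prev = None
    if start is not None:
        ranges.append((start, prev))
    return ranges


def names_in_ranges(sorted_names, ranges):
    """Simulate the range queries against the sorted name list."""
    result = []
    for low, high in ranges:
        lo = bisect_left(sorted_names, low)
        hi = bisect_right(sorted_names, high)
        result.extend(sorted_names[lo:hi])
    return result


# Hypothetical partition names, already in sorted order.
all_names = ["ts=%02d" % i for i in range(1, 31)]


def keep(name):
    # Stand-in for a predicate that cannot be pushed down to JDO.
    ts = int(name.split("=")[1])
    return 1 <= ts <= 11 or 20 <= ts <= 24


selected = {n for n in all_names if keep(n)}
ranges = to_ranges(all_names, selected)
```

Because each range only spans names that are adjacent in the sorted listing, querying by range returns the same names as querying by the explicit list, which is the equivalence the description relies on.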

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-06-30 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2050:
-

  Component/s: Query Processor
   Metastore
Fix Version/s: 0.8.0

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Query Processor
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.8.0

 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, 
 HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2050:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks, Ning.

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, 
 HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.2.patch

There are 2 major changes from the last patch:
- Added a parameter, hive.metastore.batch.retrieve.max, to control the maximum
number of partitions that can be retrieved from the metastore in one batch
(default 300). In Hive.getPartitionsByNames(), the input partition name list is
split into sublists, and the metastore API is called once for each sublist.
- One of the most time-consuming DB operations is retrieving the sub-classes
of MPartition. In particular, the list of FieldSchema is retrieved for each
partition but never used (the table's field schema is used for all
partitions). So one of the changes here is to omit the retrieval of FieldSchema
and use the table's FieldSchema as the partition's. If later we need the
partition's FieldSchema for schema evaluation, we should add another
function/flag for that.

These changes reduce memory by 50% and CPU by 20%.

The review board is also updated with the Java-only patch.
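
The batching in the first change can be sketched as below. This is a minimal Python illustration under stated assumptions, not the patch itself: `split_into_batches`, `get_partitions_by_names`, and `fake_fetch` are hypothetical names, and `fetch_batch` stands in for the metastore call. Only the default of 300 comes from the comment above.

```python
def split_into_batches(names, batch_max=300):
    """Split names into consecutive sublists of at most batch_max items."""
    if batch_max <= 0:
        raise ValueError("batch_max must be positive")
    return [names[i:i + batch_max] for i in range(0, len(names), batch_max)]


def get_partitions_by_names(names, fetch_batch, batch_max=300):
    """Fetch partitions batch by batch and concatenate the results.

    fetch_batch is a hypothetical callable standing in for the metastore
    call; it takes a list of names and returns one object per name.
    """
    partitions = []
    for batch in split_into_batches(names, batch_max):
        partitions.extend(fetch_batch(batch))
    return partitions


# Fake metastore call that records the size of each request.
request_sizes = []


def fake_fetch(batch):
    request_sizes.append(len(batch))
    return [{"name": n} for n in batch]


names = ["ts=%04d" % i for i in range(700)]
partitions = get_partitions_by_names(names, fake_fetch)
```

Capping the batch size bounds how many partition objects are materialized per metastore round trip, at the cost of more round trips for long name lists.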

batch processing partition pruning process
--

Key: HIVE-2050
URL: https://issues.apache.org/jira/browse/HIVE-2050
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-2050.2.patch, HIVE-2050.patch


[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.3.patch

Incorporated Namit's comment. The review board is also updated. 

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-28 Thread Ning Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.2.patch, HIVE-2050.3.patch, HIVE-2050.4.patch, 
 HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-25 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2050:
-

Status: Open  (was: Patch Available)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2050:
-

Status: Patch Available  (was: Open)

 batch processing partition pruning process
 --

 Key: HIVE-2050
 URL: https://issues.apache.org/jira/browse/HIVE-2050
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2050.patch



[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-22 Thread Ning Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ning Zhang updated HIVE-2050:
-

Attachment: HIVE-2050.patch

Uploading a new patch for review. Still running tests. The review board
request: https://reviews.apache.org/r/522/

batch processing partition pruning process
--

Key: HIVE-2050
URL: https://issues.apache.org/jira/browse/HIVE-2050
Project: Hive
Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
Attachments: HIVE-2050.patch

