[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2015-02-05 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-7805:
---
Priority: Major  (was: Minor)

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Attachments: HIVE-7805.1.patch, HIVE-7805.patch
>
>
> Currently, the HiveHBaseTableInputFormat only supports running a single scan. 
> This can be less efficient than running multiple disjoint scans in certain 
> cases, particularly when using a composite row key. For instance, given a row 
> key schema of:
> {code}
> struct
> {code}
> if one wants to push down the predicate:
> {code}
> bucket IN (1, 10, 100) AND timestamp >= 1408333927 AND timestamp < 1408506670
> {code}
> it's much more efficient to run a scan for each bucket over the time range 
> (particularly if there's a large amount of data per day). With a single scan, 
> the MR job has to process all rows, across all time, for every bucket between 1
> and 100.
> Hive should allow HBaseKeyFactory implementations to decompose a predicate into
> one or more scans in order to take advantage of this fact.
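
The decomposition described above can be sketched in plain Java. This is only an illustration, not the code in the attached patches: the names BucketScanPlanner, KeyRange, plan and singleRange are hypothetical, and the key layout (a 4-byte big-endian bucket followed by an 8-byte big-endian timestamp, both non-negative) is an assumption, since the row key schema is not spelled out in this message.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plans one [start, stop) row-key range per bucket.
final class BucketScanPlanner {

    // Half-open row-key range: start is inclusive, stop is exclusive.
    static final class KeyRange {
        final byte[] start;
        final byte[] stop;

        KeyRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        boolean contains(byte[] key) {
            return compareKeys(key, start) >= 0 && compareKeys(key, stop) < 0;
        }
    }

    // Assumed key layout: 4-byte bucket, then 8-byte timestamp, big-endian.
    static byte[] encodeKey(int bucket, long timestamp) {
        return ByteBuffer.allocate(12).putInt(bucket).putLong(timestamp).array();
    }

    // Unsigned lexicographic comparison, the order in which HBase sorts row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Multi-scan plan: one disjoint range per bucket over the time window.
    static List<KeyRange> plan(int[] buckets, long fromInclusive, long toExclusive) {
        List<KeyRange> ranges = new ArrayList<>();
        for (int bucket : buckets) {
            ranges.add(new KeyRange(encodeKey(bucket, fromInclusive),
                                    encodeKey(bucket, toExclusive)));
        }
        return ranges;
    }

    // Single-scan plan: one range that also covers every bucket in between.
    static KeyRange singleRange(int minBucket, int maxBucket,
                                long fromInclusive, long toExclusive) {
        return new KeyRange(encodeKey(minBucket, fromInclusive),
                            encodeKey(maxBucket, toExclusive));
    }

    public static void main(String[] args) {
        List<KeyRange> ranges = plan(new int[] {1, 10, 100}, 1408333927L, 1408506670L);
        byte[] unwanted = encodeKey(50, 1408400000L);
        KeyRange single = singleRange(1, 100, 1408333927L, 1408506670L);
        System.out.println("ranges planned: " + ranges.size());
        System.out.println("single scan reads bucket 50: " + single.contains(unwanted));
    }
}
```

Under those assumptions, a row in bucket 50 falls inside the single range spanning buckets 1 to 100 but inside none of the three per-bucket ranges, which is the extra work the multi-scan approach avoids.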



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-22 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Status: Patch Available  (was: In Progress)

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HIVE-7805.1.patch, HIVE-7805.patch


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-22 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Priority: Minor  (was: Major)

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HIVE-7805.1.patch, HIVE-7805.patch


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-22 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Attachment: HIVE-7805.1.patch

Second go at this. I think I've fixed the tests (at least, all of the 
failures that looked related to my changes). 

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Attachments: HIVE-7805.1.patch, HIVE-7805.patch


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-22 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Status: In Progress  (was: Patch Available)

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Attachments: HIVE-7805.patch


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-20 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Assignee: Andrew Mains
  Status: Patch Available  (was: Open)

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Attachments: HIVE-7805.patch


[jira] [Updated] (HIVE-7805) Support running multiple scans in hbase-handler

2014-08-20 Thread Andrew Mains (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Mains updated HIVE-7805:
---

Attachment: HIVE-7805.patch

This patch changes HiveHBaseTableInputFormat to extend 
MultiTableInputFormatBase, and allows HBaseKeyFactory implementations to push a 
List<HBaseScanRange> instead of just a single HBaseScanRange.
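
The change of contract described above, from one range to a list of ranges, can be pictured roughly as follows. This is a simplified stand-in and not the patch itself: ScanRange, SingleRangeFactory, MultiRangeFactory, adapt and buildScans are hypothetical names, row keys are plain strings for readability, and no Hive or HBase classes are used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of moving from a single scan range to a list of ranges.
final class MultiScanSketch {

    // Stand-in for a scan range: [startRow, stopRow).
    static final class ScanRange {
        final String startRow;
        final String stopRow;

        ScanRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // Assumed shape of the old contract: one range per predicate.
    interface SingleRangeFactory {
        ScanRange toRange(String predicate);
    }

    // Assumed shape of the new contract: one or more ranges per predicate.
    interface MultiRangeFactory {
        List<ScanRange> toRanges(String predicate);
    }

    // An existing single-range factory still fits: its range becomes a list of one.
    static MultiRangeFactory adapt(SingleRangeFactory legacy) {
        return predicate -> Collections.singletonList(legacy.toRange(predicate));
    }

    // Input-format side: emit one scan description per range over the same table.
    static List<String> buildScans(String table, List<ScanRange> ranges) {
        List<String> scans = new ArrayList<>();
        for (ScanRange range : ranges) {
            scans.add(table + "[" + range.startRow + "," + range.stopRow + ")");
        }
        return scans;
    }

    public static void main(String[] args) {
        MultiRangeFactory factory = adapt(p -> new ScanRange("a", "m"));
        for (String scan : buildScans("events", factory.toRanges("key < 'm'"))) {
            System.out.println(scan);
        }
    }
}
```

The adapter shows one way key factories written against a single-range contract could keep working while new ones return several disjoint ranges.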

> Support running multiple scans in hbase-handler
> ---
>
> Key: HIVE-7805
> URL: https://issues.apache.org/jira/browse/HIVE-7805
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.14.0
> Reporter: Andrew Mains
> Attachments: HIVE-7805.patch