[jira] [Work logged] (HIVE-24263) Create an HMS endpoint to list partition locations

2020-12-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24263?focusedWorklogId=528475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528475
 ]

ASF GitHub Bot logged work on HIVE-24263:
-

Author: ASF GitHub Bot
Created on: 26/Dec/20 00:59
Start Date: 26/Dec/20 00:59
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1572:
URL: https://github.com/apache/hive/pull/1572


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 528475)
Time Spent: 40m  (was: 0.5h)

> Create an HMS endpoint to list partition locations
> --
>
> Key: HIVE-24263
> URL: https://issues.apache.org/jira/browse/HIVE-24263
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24263.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In our company, we have a use case for quickly getting a list of partition
> locations.  Currently this is done via listPartitions, which is a very heavy
> operation in terms of memory and performance.
> This JIRA proposes an API: Map<String, String> listPartitionLocations(String
> db, String table, short max) that returns a map of partition names to
> locations.
> For example, we have an integration from the output of a Hive pipeline to Spark
> jobs that consume directly from HDFS.  The Spark job scheduler needs to know
> the partition paths that are available for consumption (the partition name is
> not sufficient, as its input is an HDFS path), and so we have to call the heavy
> listPartitions() for this.
> Another use case is an HDFS data removal tool that does a nightly crawl to
> see whether there are Hive partitions mapped to a given partition path.
> The nightly crawling job could be much less resource-intensive if we had
> listPartitionLocations().
> As there is already an internal method in the ObjectStore for this, used by
> dropPartitions, it is only a matter of exposing this API through
> HiveMetaStoreClient.
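
To make the difference between the two calls concrete, here is a minimal Java sketch. It is not the patch itself: PartitionStub, MetastoreStub and InMemoryMetastore are hypothetical stand-ins written for this illustration, not Hive classes, and treating a negative max as "no limit" is an assumption of the sketch. It shows a caller deriving the name-to-location map from full partition objects, next to the proposed call that returns the map directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-ins only; none of these types are the real Hive metastore classes. */
final class PartitionLocationsSketch {
    private PartitionLocationsSketch() {}

    /** Hypothetical trimmed-down partition; a real one also carries columns, SerDe info, parameters. */
    static final class PartitionStub {
        final String name;      // e.g. "ds=2020-10-12"
        final String location;  // e.g. "hdfs://nn/warehouse/t/ds=2020-10-12"

        PartitionStub(String name, String location) {
            this.name = name;
            this.location = location;
        }
    }

    /** Hypothetical client surface: the existing heavy call next to the proposed light one. */
    interface MetastoreStub {
        List<PartitionStub> listPartitions(String db, String table, short max);

        Map<String, String> listPartitionLocations(String db, String table, short max);
    }

    /** In-memory implementation, present only so the sketch can run. */
    static final class InMemoryMetastore implements MetastoreStub {
        private final List<PartitionStub> partitions = new ArrayList<>();

        void add(String name, String location) {
            partitions.add(new PartitionStub(name, location));
        }

        @Override
        public List<PartitionStub> listPartitions(String db, String table, short max) {
            // Assumption of this sketch: a negative max means "no limit".
            int n = (max < 0) ? partitions.size() : Math.min(max, partitions.size());
            return new ArrayList<>(partitions.subList(0, n));
        }

        @Override
        public Map<String, String> listPartitionLocations(String db, String table, short max) {
            // A real server would read only the name and location columns here;
            // the stub reuses listPartitions purely for brevity.
            Map<String, String> out = new LinkedHashMap<>();
            for (PartitionStub p : listPartitions(db, table, max)) {
                out.put(p.name, p.location);
            }
            return out;
        }
    }

    /** What a caller does today: fetch whole partitions, then keep only name and location. */
    static Map<String, String> locationsViaListPartitions(
            MetastoreStub client, String db, String table, short max) {
        Map<String, String> out = new LinkedHashMap<>();
        for (PartitionStub p : client.listPartitions(db, table, max)) {
            out.put(p.name, p.location);
        }
        return out;
    }
}
```

Both paths yield the same map; the point of the proposal is that the second one would not need to materialize and ship full partition objects to get there.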



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24263) Create an HMS endpoint to list partition locations

2020-12-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24263?focusedWorklogId=526199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-526199
 ]

ASF GitHub Bot logged work on HIVE-24263:
-

Author: ASF GitHub Bot
Created on: 19/Dec/20 00:54
Start Date: 19/Dec/20 00:54
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1572:
URL: https://github.com/apache/hive/pull/1572#issuecomment-748392298


   This pull request has been automatically marked as stale because it has not
   had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in
   need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 526199)
Time Spent: 0.5h  (was: 20m)

> Create an HMS endpoint to list partition locations
> --
>
> Key: HIVE-24263
> URL: https://issues.apache.org/jira/browse/HIVE-24263
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24263.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h


[jira] [Work logged] (HIVE-24263) Create an HMS endpoint to list partition locations

2020-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24263?focusedWorklogId=502097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502097
 ]

ASF GitHub Bot logged work on HIVE-24263:
-

Author: ASF GitHub Bot
Created on: 19/Oct/20 10:43
Start Date: 19/Oct/20 10:43
Worklog Time Spent: 10m 
  Work Description: szehonCriteo commented on pull request #1572:
URL: https://github.com/apache/hive/pull/1572#issuecomment-712026766


   Hi @vihangk1 sorry to ping, do you have any thoughts on this new API?  Thanks





Issue Time Tracking
---

Worklog Id: (was: 502097)
Time Spent: 20m  (was: 10m)

> Create an HMS endpoint to list partition locations
> --
>
> Key: HIVE-24263
> URL: https://issues.apache.org/jira/browse/HIVE-24263
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24263.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h


[jira] [Work logged] (HIVE-24263) Create an HMS endpoint to list partition locations

2020-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24263?focusedWorklogId=499369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499369
 ]

ASF GitHub Bot logged work on HIVE-24263:
-

Author: ASF GitHub Bot
Created on: 12/Oct/20 12:25
Start Date: 12/Oct/20 12:25
Worklog Time Spent: 10m 
  Work Description: szehonCriteo opened a new pull request #1572:
URL: https://github.com/apache/hive/pull/1572


   
   
   ### What changes were proposed in this pull request?
   New API: List listPartitionLocations(String db, String table)
   
   
   
   ### Why are the changes needed?
   listPartitions returns this information in the partition object, but it is too
   expensive and memory-intensive to call just to get the location.
   Examples of why we need this information are listed in the JIRA.
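
One of those examples, the nightly crawl that checks whether an HDFS path still has a partition mapped to it, could consume the name-to-location map roughly as sketched below. This is only an illustration: UnmappedPathSketch, normalize and findUnmappedPaths are hypothetical names, and comparing paths after stripping trailing slashes is an assumption of the sketch rather than something the patch prescribes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for a crawl job; not part of Hive or of this patch. */
final class UnmappedPathSketch {
    private UnmappedPathSketch() {}

    /** Strips trailing slashes so ".../ds=1/" and ".../ds=1" compare equal. */
    static String normalize(String path) {
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    /**
     * Returns the crawled paths that no partition location points at.
     * nameToLocation is assumed to be a partition-name to location map,
     * such as the one the proposed API is described as returning.
     */
    static List<String> findUnmappedPaths(
            Map<String, String> nameToLocation, List<String> crawledPaths) {
        // Index the known locations once so each crawled path is a set lookup.
        Set<String> known = new HashSet<>();
        for (String location : nameToLocation.values()) {
            known.add(normalize(location));
        }
        List<String> unmapped = new ArrayList<>();
        for (String path : crawledPaths) {
            if (!known.contains(normalize(path))) {
                unmapped.add(path);
            }
        }
        return unmapped;
    }
}
```

The crawl only ever needs the location strings, which is why fetching full partition objects for it is wasteful.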
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, new documentation is needed if the API is accepted.
   
   
   
   ### How was this patch tested?
   Unit tests are added
   
   





Issue Time Tracking
---

Worklog Id: (was: 499369)
Remaining Estimate: 0h
Time Spent: 10m

> Create an HMS endpoint to list partition locations
> --
>
> Key: HIVE-24263
> URL: https://issues.apache.org/jira/browse/HIVE-24263
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Attachments: HIVE-24263.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h