[jira] [Updated] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-28 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24306:
---
Description: 
For data dumped in the staging location, we will run a single distcp at the table
level for all partitions, as the data is already present in the staging location.

For the _files case, where the data is on the source cluster and staging has only
the file list, distcp is executed at the individual file level. This is to take
care of the cm case, where we need both the full path and the encoded path (for
cm). If the table is dropped, a table-level distcp will fail.
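
The two cases above can be sketched roughly as follows. This is only an illustration of the planning difference, not Hive's actual code: `FileEntry`, `CopyTask`, `plan_staged_copy`, `plan_files_copy` and all paths are hypothetical names, and the layout of a `_files` entry is assumed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileEntry:
    """One entry of a hypothetical _files listing (field layout is assumed)."""
    source_path: str   # full path of the file on the source cluster
    cm_path: str       # encoded fallback path (for cm); format is made up here
    relative_dir: str  # partition directory relative to the table root


@dataclass(frozen=True)
class CopyTask:
    source: str
    destination: str
    fallback_source: Optional[str] = None


def plan_staged_copy(staging_table_dir: str, target_table_dir: str) -> List[CopyTask]:
    # Data already sits in staging: one table-level task covers every partition.
    return [CopyTask(staging_table_dir, target_table_dir)]


def plan_files_copy(entries: List[FileEntry], target_table_dir: str) -> List[CopyTask]:
    # Only a file list is staged: one task per file, each carrying both paths.
    tasks = []
    for entry in entries:
        destination = f"{target_table_dir}/{entry.relative_dir}"
        tasks.append(CopyTask(entry.source_path, destination, entry.cm_path))
    return tasks
```

In this sketch the staged case always yields exactly one task regardless of the partition count, while the `_files` case yields one task per listed file so that each copy can carry its own fallback path.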

This patch takes care of the single copy for staging data.
However, when running a single distcp at the table level, the file listing in
distcp might lead to an OOM if the number of files is too high. So it needs to
be fixed at the distcp level before committing this patch.
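
To illustrate the memory concern only: a listing that is materialised in full grows with the number of files, whereas a lazily consumed listing can be processed in bounded batches. The sketch below is an assumption-laden illustration in Python, not the distcp fix itself (the issue does not say how distcp should be changed); `iter_files` and `batched` are hypothetical helpers.

```python
import os
from itertools import islice
from typing import Iterable, Iterator, List


def iter_files(root: str) -> Iterator[str]:
    # Lazily yield file paths under root; only one directory's entries
    # are held at a time, nothing is accumulated across directories.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def batched(paths: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    # Hand out at most batch_size paths at a time instead of one full list.
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A caller would loop over `batched(iter_files(table_dir), n)` and process each batch before requesting the next, so peak memory depends on the batch size rather than on the total file count.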

> Launch single copy task for single batch of partitions in repl load for 
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
> Issue Type: Task
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24306.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-28 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24306:
---
Attachment: HIVE-24306.01.patch
Status: Patch Available  (was: In Progress)

> Launch single copy task for single batch of partitions in repl load for 
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
> Issue Type: Task
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24306.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24306) Launch single copy task for single batch of partitions in repl load for managed table

2020-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24306:
--
Labels: pull-request-available  (was: )

> Launch single copy task for single batch of partitions in repl load for 
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
> Issue Type: Task
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>



