[
https://issues.apache.org/jira/browse/HIVE-24306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aasha Medhi updated HIVE-24306:
---
Description:
For data dumped in the staging location, we run a single distcp at the table
level for all partitions, since the data is already present in the staging
location.
For the _files case, where the data is on the source cluster and staging only
has the file list, distcp is executed at the individual file level. This takes
care of the cm case, where we need both the full path and the encoded path
(for cm). If the table is dropped, a table-level distcp will fail.
This patch takes care of the single copy for staging data.
However, to run a single distcp at the table level, file listing in distcp
might lead to OOM if the number of files is too high. So this needs to be fixed
at the distcp level before committing this patch.
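The copy-granularity decision described above could be sketched roughly as
follows. This is only an illustration, not the code in the attached patch: the
class CopyPlanSketch, the CopyTask type, the planCopies method and all paths
are hypothetical names made up for this example, and the cm fallback path
handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Hive itself.
public class CopyPlanSketch {

    /** One planned copy: a source path and a target directory. */
    public static final class CopyTask {
        public final String source;
        public final String target;

        public CopyTask(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    /**
     * Plans the copies for one table.
     *
     * @param dataInStaging   true if the dump already holds the data files
     * @param tableStagingDir table directory inside the staging location
     * @param sourceFiles     file list from _files (used only when the data
     *                        is still on the source cluster)
     * @param targetTableDir  table directory on the target cluster
     */
    public static List<CopyTask> planCopies(boolean dataInStaging,
                                            String tableStagingDir,
                                            List<String> sourceFiles,
                                            String targetTableDir) {
        List<CopyTask> tasks = new ArrayList<>();
        if (dataInStaging) {
            // Data is already staged: one table-level copy covers all partitions.
            tasks.add(new CopyTask(tableStagingDir, targetTableDir));
        } else {
            // _files case: one copy per listed file, so each file can be
            // resolved on its own (the cm-encoded path is omitted here).
            for (String file : sourceFiles) {
                tasks.add(new CopyTask(file, targetTableDir));
            }
        }
        return tasks;
    }
}
```

With staged data the plan always holds exactly one task, however many
partitions the table has; in the _files case it holds one task per listed file.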
> Launch single copy task for single batch of partitions in repl load for
> managed table
> -
>
> Key: HIVE-24306
> URL: https://issues.apache.org/jira/browse/HIVE-24306
> Project: Hive
> Issue Type: Task
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24306.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h