[jira] [Comment Edited] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask
[ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073937#comment-16073937 ]

Sankar Hariappan edited comment on HIVE-16901 at 7/4/17 5:25 PM:
-----------------------------------------------------------------

Added 04.patch after replacing (0 != srcMap.size()) with (!srcMap.isEmpty()).
Thanks [~anishek] for the review!
Request [~thejas]/[~daijy]/[~sushanth] to please review/commit the patch!

was (Author: sankarh):
Added 04.patch after replacing (0 != srcMap.size()) with (!srcMap.isEmpty()).
Thanks [~anishek] for the review!
Request [~thejas]/[~daijy] to please review/commit the patch!

> Distcp optimization - One distcp per ReplCopyTask
> -------------------------------------------------
>
>                 Key: HIVE-16901
>                 URL: https://issues.apache.org/jira/browse/HIVE-16901
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive, repl
>   Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, replication
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16901.01.patch, HIVE-16901.02.patch,
> HIVE-16901.03.patch, HIVE-16901.04.patch
>
>
> Currently, if a ReplCopyTask is created to copy a list of files, distcp
> is invoked once for each file. Instead, the full list of source files
> should be passed to a single distcp invocation, which copies the files in
> parallel and should give a significant performance gain.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
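For readers following the thread, the idea in the issue description can be sketched roughly as below. This is only an illustration and is not taken from any of the attached patches: BulkCopier, CountingCopier, copyPerFile and copyBatched are hypothetical names, and the copier merely counts launches instead of running distcp. The isEmpty() guard mirrors the style change mentioned in the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BatchedCopySketch {

    /** Hypothetical stand-in for a distcp launcher; not a Hive or Hadoop API. */
    interface BulkCopier {
        void copy(List<String> sources, String destination);
    }

    /** Counts launches and records the copied paths instead of copying anything. */
    static class CountingCopier implements BulkCopier {
        int invocations = 0;
        final List<String> copied = new ArrayList<>();

        @Override
        public void copy(List<String> sources, String destination) {
            invocations++;
            copied.addAll(sources);
        }
    }

    /** Behaviour described as the current state: one launch per source file. */
    static void copyPerFile(List<String> sources, String destination, BulkCopier copier) {
        for (String src : sources) {
            copier.copy(Collections.singletonList(src), destination);
        }
    }

    /** Behaviour the issue asks for: one launch for the whole list of a task. */
    static void copyBatched(List<String> sources, String destination, BulkCopier copier) {
        // Skip the launch entirely when there is nothing to copy.
        if (!sources.isEmpty()) {
            copier.copy(sources, destination);
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("/src/a", "/src/b", "/src/c");

        CountingCopier perFile = new CountingCopier();
        copyPerFile(files, "/dst", perFile);

        CountingCopier batched = new CountingCopier();
        copyBatched(files, "/dst", batched);

        System.out.println("per-file launches: " + perFile.invocations);
        System.out.println("batched launches: " + batched.invocations);
    }
}
```

With three source files, the per-file variant launches the copier three times and the batched variant once, while both hand over the same set of paths. Any actual speedup depends on how distcp parallelizes the list, which this sketch does not model.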
[jira] [Comment Edited] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask
[ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073937#comment-16073937 ]

Sankar Hariappan edited comment on HIVE-16901 at 7/4/17 5:23 PM:
-----------------------------------------------------------------

Added 04.patch after replacing (0 != srcMap.size()) with (!srcMap.isEmpty()).
Thanks [~anishek] for the review!
Request [~thejas]/[~daijy] to please review/commit the patch!

was (Author: sankarh):
Added 04.patch after replacing (0 != srcMap.size()) with (!srcMap.isEmpty()).
[jira] [Comment Edited] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask
[ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073024#comment-16073024 ]

Sankar Hariappan edited comment on HIVE-16901 at 7/4/17 2:04 AM:
-----------------------------------------------------------------

Added 03.patch with fixes for Anishek's comments.
Request [~anishek] to review the updated patch!

was (Author: sankarh):
Added 03.patch with fixes for Anishek's comments.

> Distcp optimization - One distcp per ReplCopyTask
> -------------------------------------------------
>
>                 Key: HIVE-16901
>         Attachments: HIVE-16901.01.patch, HIVE-16901.02.patch,
> HIVE-16901.03.patch