[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> --
>
> Key: HIVE-20531
> URL: https://issues.apache.org/jira/browse/HIVE-20531
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 4.0.0
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch, HIVE-20531.11.patch, HIVE-20531.12.patch
>
> In replication load, both add partition and insert operations are handled
> through import. Import creates three major tasks: copy, add partition, and
> move. Copy copies the data from the source location to the staging
> directory. Add partition (which runs in parallel with copy) creates the
> partition in the metastore. For an insert it is a no-op, because by the time
> this DDL task is executed the partition is already present. The third task,
> move, moves the files from the staging directory to the actual location and,
> in the case of an insert, adds the insert event to the notification table.
> It does this for the add partition operation as well, which is redundant
> because the event for add partition has already been written by the DDL
> task. With the optimization to copy directly to the actual table location in
> S3, the move task can be avoided for add partition replay, and replay of an
> insert need not create the add partition (DDL) task.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
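The task selection described in the issue can be illustrated with a small sketch. This is not Hive code: the names `ReplEvent`, `plan_import_tasks`, and `copy_directly_to_table` are hypothetical, and the sketch assumes that both skips apply only when data is copied directly to the table location, which is one reading of the description.

```python
from enum import Enum


class ReplEvent(Enum):
    """Hypothetical labels for the two replayed operations discussed in the issue."""
    ADD_PARTITION = "add_partition"
    INSERT = "insert"


def plan_import_tasks(event, copy_directly_to_table):
    """Return the task names an import would create for one replayed event.

    Illustrative only: task names are plain strings, not Hive task classes.
    """
    tasks = ["copy"]
    if not copy_directly_to_table:
        # Staging-directory flow: all three tasks are created for both events.
        return tasks + ["add_partition", "move"]
    if event is ReplEvent.ADD_PARTITION:
        # The DDL task already writes the add partition event, and the data is
        # already in place, so the move task is redundant.
        tasks.append("add_partition")
    else:
        # For an insert the partition already exists, so the DDL task is a
        # no-op; move is kept because it records the insert event.
        tasks.append("move")
    return tasks
```

For example, `plan_import_tasks(ReplEvent.INSERT, True)` yields the copy and move tasks only, while the same event without direct copy yields all three.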
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Attachment: HIVE-20531.12.patch

> Assignee: Sankar Hariappan
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch, HIVE-20531.11.patch, HIVE-20531.12.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Assignee: mahesh kumar behera (was: Sankar Hariappan)
    Status: Patch Available (was: Open)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch, HIVE-20531.11.patch, HIVE-20531.12.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Status: Open (was: Patch Available)

> Assignee: Sankar Hariappan
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch, HIVE-20531.11.patch, HIVE-20531.12.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Attachment: HIVE-20531.11.patch

> Assignee: Sankar Hariappan
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch, HIVE-20531.11.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Assignee: mahesh kumar behera (was: Sankar Hariappan)
    Status: Patch Available (was: Open)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch, HIVE-20531.11.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Status: Open (was: Patch Available)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Assignee: mahesh kumar behera (was: Sankar Hariappan)
    Status: Patch Available (was: Open)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Attachment: HIVE-20531.10.patch

> Assignee: Sankar Hariappan
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch,
> HIVE-20531.10.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
    Assignee: Sankar Hariappan (was: mahesh kumar behera)
    Status: Open (was: Patch Available)

> Assignee: Sankar Hariappan
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Status: Patch Available (was: Open)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Attachment: HIVE-20531.09.patch

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch, HIVE-20531.09.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Status: Open (was: Patch Available)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Status: Open (was: Patch Available)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Attachment: HIVE-20531.08.patch

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Status: Patch Available (was: Open)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch, HIVE-20531.08.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
    Status: Patch Available (was: Open)

> Assignee: mahesh kumar behera
> Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch,
> HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch,
> HIVE-20531.07.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
---------------------------------------
    Attachment: HIVE-20531.07.patch

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch, HIVE-20531.07.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated HIVE-20531:
---------------------------------------
    Status: Open  (was: Patch Available)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch, HIVE-20531.07.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Status: Patch Available  (was: Open)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Attachment: HIVE-20531.06.patch

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Attachment:     (was: HIVE-20531.06.patch)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Assignee: Sankar Hariappan  (was: mahesh kumar behera)
      Status: Open  (was: Patch Available)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Status: Patch Available  (was: Reopened)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Attachment: HIVE-20531.06.patch

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch, HIVE-20531.06.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch
[jira] [Updated] (HIVE-20531) Repl load on cloud storage file system can skip redundant move or add partition tasks.
[ https://issues.apache.org/jira/browse/HIVE-20531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-20531:
------------------------------------
    Summary: Repl load on cloud storage file system can skip redundant move or add partition tasks.  (was: One of the task , either move or add partition can be avoided in repl load flow)

> Repl load on cloud storage file system can skip redundant move or add partition tasks.
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-20531
>                 URL: https://issues.apache.org/jira/browse/HIVE-20531
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20531.01.patch, HIVE-20531.02.patch, HIVE-20531.03.patch, HIVE-20531.04.patch, HIVE-20531.05.patch