[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507793#comment-17507793 ] László Bodor commented on TEZ-3363: --- this is finally merged to master, thanks [~srahman] for your work and patience! thanks [~kshukla] for the original patch (which has a lot in common with the final one), and [~sseth] for the comments back in 2017! > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 0.10.2 > > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch > > Time Spent: 10h 50m > Remaining Estimate: 0h > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411176#comment-17411176 ] Syed Shameerur Rahman commented on TEZ-3363: [~jeagles] Could you please review it? Thanks > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Syed Shameerur Rahman >Priority: Major > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch > > Time Spent: 10m > Remaining Estimate: 0h > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285046#comment-17285046 ] Hadoop QA commented on TEZ-3363:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 1s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 4m 9s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 34s | master passed |
| +1 | compile | 2m 10s | master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | compile | 2m 23s | master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | checkstyle | 2m 35s | master passed |
| +1 | javadoc | 2m 16s | master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javadoc | 2m 10s | master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| 0 | spotbugs | 0m 43s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 4m 31s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 41s | the patch passed |
| +1 | compile | 1m 15s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 1m 15s | the patch passed |
| +1 | compile | 1m 25s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 1m 25s | the patch passed |
| -0 | checkstyle | 0m 16s | tez-runtime-library: The patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) |
| -0 | checkstyle | 0m 35s | tez-dag: The patch generated 19 new + 612 unchanged - 0 fixed = 631 total (was 612) |
| -0 | checkstyle | 0m 10s | tez-plugins/tez-aux-services: The patch generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 1m 11s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javadoc | 1m 14s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | findbugs | 3m 50s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 55s | tez-api in the patch passed with JDK
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285022#comment-17285022 ] Tez CI commented on TEZ-3363:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 12m 36s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 4m 58s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 29s | master passed |
| +1 | compile | 2m 22s | master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 2m 12s | master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 2m 33s | master passed |
| +1 | javadoc | 2m 20s | master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 8s | master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 0m 41s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 4m 14s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 30s | the patch passed |
| +1 | compile | 1m 27s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 1m 27s | the patch passed |
| +1 | compile | 1m 16s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 1m 16s | the patch passed |
| -0 | checkstyle | 0m 14s | tez-runtime-library: The patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) |
| -0 | checkstyle | 0m 35s | tez-dag: The patch generated 19 new + 612 unchanged - 0 fixed = 631 total (was 612) |
| -0 | checkstyle | 0m 11s | tez-plugins/tez-aux-services: The patch generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 1m 16s | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 11s | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | findbugs | 3m 32s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 54s | tez-api in the patch passed. |
| +1 |
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047802#comment-17047802 ] Syed Shameerur Rahman commented on TEZ-3363: [~abstractdog] Can you please review? Thank You! > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Syed Shameerur Rahman >Priority: Major > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch > > Time Spent: 10m > Remaining Estimate: 0h > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043458#comment-17043458 ] Syed Shameerur Rahman commented on TEZ-3363: [~rajesh.balamohan] [~bikas] Can you please review? Thank You! > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Syed Shameerur Rahman >Priority: Major > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch > > Time Spent: 10m > Remaining Estimate: 0h > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040888#comment-17040888 ] TezQA commented on TEZ-3363:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 10m 4s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 23s | master passed |
| +1 | compile | 1m 33s | master passed |
| +1 | checkstyle | 1m 35s | master passed |
| +1 | javadoc | 1m 42s | master passed |
| 0 | spotbugs | 0m 33s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 26s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 16s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| -0 | checkstyle | 0m 14s | tez-runtime-library: The patch generated 1 new + 22 unchanged - 0 fixed = 23 total (was 22) |
| -0 | checkstyle | 0m 28s | tez-dag: The patch generated 20 new + 638 unchanged - 0 fixed = 658 total (was 638) |
| -0 | checkstyle | 0m 9s | tez-plugins/tez-aux-services: The patch generated 3 new + 84 unchanged - 0 fixed = 87 total (was 84) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 1m 9s | the patch passed |
| +1 | findbugs | 3m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 41s | tez-api in the patch passed. |
| +1 | unit | 4m 21s | tez-runtime-library in the patch passed. |
| +1 | unit | 4m 4s | tez-dag in the patch passed. |
| +1 | unit | 2m 24s | tez-aux-services in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 45m 44s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 base: https://builds.apache.org/job/PreCommit-TEZ-Build/300/artifact/out/Dockerfile |
| JIRA Issue | TEZ-3363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12993987/TEZ-3363.03.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux ed83b2551c54
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040851#comment-17040851 ] Syed Shameerur Rahman commented on TEZ-3363: [~jeagles] [~sseth] [~kshukla] Please review the patch TEZ-3363.03.patch > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch > > Time Spent: 10m > Remaining Estimate: 0h > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013347#comment-17013347 ] TezQA commented on TEZ-3363:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 8s | TEZ-3363 does not apply to master. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | TEZ-3363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886888/TEZ-3363.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-TEZ-Build/244/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

> Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013346#comment-17013346 ] Syed Shameerur Rahman commented on TEZ-3363: [~kshukla] Any update on this? Going through the initial patch, I think using List nodesList will lead to redundant HTTP calls. For example, suppose a vertex has 1000 tasks that were launched across 5 nodes. Instead of sending 1000 HTTP requests, we can make 5 HTTP requests and do some regex matching to delete the shuffle data. Since we are already matching the regex, it will be better to use Set nodesList. I can pick this up if you are not working on it. cc [~jeagles] > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Turner Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
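The List-versus-Set point in the comment above can be illustrated with a minimal sketch. This is not code from any TEZ-3363 patch: the class name `VertexDeletionTargets`, the method `distinctNodes`, and the `node-N` host names are all hypothetical, and the sketch only counts how many delete requests would be needed rather than issuing any HTTP calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from the TEZ-3363 patches.
public class VertexDeletionTargets {

    // Collapses the per-task host list into the distinct set of nodes,
    // so that one delete request per node is enough.
    static Set<String> distinctNodes(List<String> hostPerTask) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        return new LinkedHashSet<>(hostPerTask);
    }

    public static void main(String[] args) {
        // 1000 tasks spread across 5 nodes, as in the example in the comment.
        List<String> hostPerTask = new ArrayList<>();
        for (int task = 0; task < 1000; task++) {
            hostPerTask.add("node-" + (task % 5)); // made-up host names
        }
        Set<String> nodes = distinctNodes(hostPerTask);
        // prints "1000 tasks -> 5 delete requests"
        System.out.println(hostPerTask.size() + " tasks -> " + nodes.size() + " delete requests");
    }
}
```

With a list, the loop that sends requests would run once per task; with a set, it runs once per node, and the per-node handler is left to match all of that vertex's shuffle files on its side.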
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225470#comment-16225470 ] Kuhu Shukla commented on TEZ-3363:

bq. On the vertex events - does the vertex make sure that every downstream vertex at the specified depth is complete?

Yes, although looking at my current design it might be fragile in cases of duplicate vertex complete events from the same vertex. Besides that, the children data structure in VertexImpl ensures that all of them finish before vertexComplete() on the ancestor is called. You are right that this might be easier to do at the DAG level.

bq. When the data for a vertex is deleted, I think it'll be better to move it into a different state, so that in case of failures / re-runs which require data from this vertex, the vertex tasks can be re-run directly, instead of relying on failures from the source to trigger re-runs of upstream tasks (how slow/fast is this?). This can be problematic if the entire vertex ends up re-running even if all data is not required by a downstream task. Ideally, would be nice to re-run tasks when a downstream consumer requests this data.

Agreed, since we know the re-runs must happen once data has been deleted. This will help bypass fetch retries and failure detection time in the current design. Will update the patch and get back asap.

> Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
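One way to picture the DAG-level coordination discussed in the comment above, including tolerance of duplicate vertex-complete events, is the following sketch. It is an assumption-laden illustration, not the design in any TEZ-3363 patch: `VertexDeletionTracker`, `onVertexComplete`, and the string vertex names are hypothetical, and "depth" is taken to mean the number of downstream levels that must be complete before a vertex's intermediate data becomes eligible for deletion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the TEZ-3363 patches.
public class VertexDeletionTracker {
    private final Map<String, List<String>> children; // vertex -> direct downstream vertices
    private final int depth;                           // downstream levels that must be complete
    private final Set<String> completed = new HashSet<>();
    private final Set<String> deleted = new HashSet<>();

    public VertexDeletionTracker(Map<String, List<String>> children, int depth) {
        this.children = new TreeMap<>(children); // sorted keys give a deterministic order
        this.depth = depth;
    }

    // Records a completion event and returns the vertices whose data has just
    // become eligible for deletion. A duplicate event returns an empty list.
    public List<String> onVertexComplete(String vertex) {
        List<String> eligible = new ArrayList<>();
        if (!completed.add(vertex)) {
            return eligible; // already seen: duplicates are harmless
        }
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            String candidate = e.getKey();
            boolean hasConsumers = !e.getValue().isEmpty();
            if (hasConsumers && completed.contains(candidate) && !deleted.contains(candidate)
                    && downstreamComplete(candidate, depth)) {
                deleted.add(candidate); // each vertex is reported at most once
                eligible.add(candidate);
            }
        }
        return eligible;
    }

    // True when every descendant within the given number of levels is complete.
    private boolean downstreamComplete(String vertex, int levels) {
        if (levels == 0) {
            return true;
        }
        for (String child : children.getOrDefault(vertex, Collections.<String>emptyList())) {
            if (!completed.contains(child) || !downstreamComplete(child, levels - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new TreeMap<>();
        dag.put("A", Arrays.asList("B")); // A -> B -> C
        dag.put("B", Arrays.asList("C"));
        VertexDeletionTracker tracker = new VertexDeletionTracker(dag, 1);
        System.out.println(tracker.onVertexComplete("A")); // prints []
        System.out.println(tracker.onVertexComplete("B")); // prints [A]
        System.out.println(tracker.onVertexComplete("B")); // duplicate event, prints []
        System.out.println(tracker.onVertexComplete("C")); // prints [B]
    }
}
```

Keeping the completed and deleted sets in one place is what makes the duplicate event a no-op here; whether the real implementation tracks state this way is not something the thread states.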
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219468#comment-16219468 ] Siddharth Seth commented on TEZ-3363:

Couple of comments.

1. When the data for a vertex is deleted, I think it'll be better to move it into a different state, so that in case of failures / re-runs which require data from this vertex, the vertex tasks can be re-run directly, instead of relying on failures from the source to trigger re-runs of upstream tasks (how slow/fast is this?). This can be problematic if the entire vertex ends up re-running even if all data is not required by a downstream task. Ideally, would be nice to re-run tasks when a downstream consumer requests this data.

2. On the vertex events - does the vertex make sure that every downstream vertex at the specified depth is complete? May be easier to move this co-ordination / selection of vertices for which data is to be deleted into the DAG - which already gets VERTEX_COMPLETE events.

3. The configs could be collapsed into one - with negative values indicating that the feature is disabled.

> Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
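Point 3 of the comment above, a single setting whose negative values mean "disabled", might look roughly like this. The sketch uses `java.util.Properties` instead of Hadoop's `Configuration` to stay self-contained, and the key `example.vertex.deletion.depth` is a made-up name, not a real Tez property. Treating a depth of 0 as enabled is also an assumption; the comment only says that negative values disable the feature.

```java
import java.util.Properties;

// Hypothetical illustration only; the key name is not a real Tez property.
public class DeletionDepthConfig {
    static final String DEPTH_KEY = "example.vertex.deletion.depth"; // made-up key
    static final int DISABLED = -1;                                   // default: feature off

    // Reads the single setting; a missing key falls back to "disabled".
    static int deletionDepth(Properties conf) {
        String raw = conf.getProperty(DEPTH_KEY);
        return raw == null ? DISABLED : Integer.parseInt(raw.trim());
    }

    // One value answers both questions: is the feature on, and at what depth.
    static boolean isDeletionEnabled(Properties conf) {
        return deletionDepth(conf) >= 0;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isDeletionEnabled(conf)); // prints false (key not set)
        conf.setProperty(DEPTH_KEY, "2");
        System.out.println(isDeletionEnabled(conf) + " depth=" + deletionDepth(conf)); // prints true depth=2
        conf.setProperty(DEPTH_KEY, "-1");
        System.out.println(isDeletionEnabled(conf)); // prints false
    }
}
```

The design benefit is that there is no way to set an "enabled" flag and a depth that contradict each other, because there is only one value to set.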
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205888#comment-16205888 ] Kuhu Shukla commented on TEZ-3363: -- Would be nice to get some initial comments on the design and potential issues with this approach. CC: [~jeagles], [~jlowe], [~sseth]. Thank you! > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176809#comment-16176809 ] Kuhu Shukla commented on TEZ-3363: -- Request for review/comments [~jeagles]/[~sseth]. Thanks a lot! > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165054#comment-16165054 ] Kuhu Shukla commented on TEZ-3363: -- This test TestMRRJobsDAGApi does not fail locally and seems like a case of TEZ-899. > Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler
[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164955#comment-16164955 ] TezQA commented on TEZ-3363:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12886888/TEZ-3363.002.patch against master revision 7e895f5.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in : org.apache.tez.mapreduce.TestMRRJobsDAGApi

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2629//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2629//console

This message is automatically generated.

> Delete intermediate data at the vertex level for Shuffle Handler > > > Key: TEZ-3363 > URL: https://issues.apache.org/jira/browse/TEZ-3363 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla > Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch > > > For applications like pig where processing times can be very long, > applications may choose to delete intermediate data for a sub dag. For > example if a DAG has synced data to HDFS, all upstream intermediate data can > be safely deleted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)