[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2022-03-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507793#comment-17507793
 ] 

László Bodor commented on TEZ-3363:
---

This is finally merged to master. Thanks [~srahman] for your work and patience!
Thanks [~kshukla] for the original patch (which has a lot in common with the 
final one), and [~sseth] for the comments back in 2017!

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 0.10.2
>
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> For applications like pig where processing times can be very long, 
> applications may choose to delete intermediate data for a sub dag. For 
> example if a DAG has synced data to HDFS, all upstream intermediate data can 
> be safely deleted.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2021-09-07 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411176#comment-17411176
 ] 

Syed Shameerur Rahman commented on TEZ-3363:


[~jeagles] Could you please review it?
Thanks

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2021-02-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285046#comment-17285046
 ] 

Hadoop QA commented on TEZ-3363:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m  9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 10s{color} | {color:green} master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 23s{color} | {color:green} master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 16s{color} | {color:green} master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 10s{color} | {color:green} master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 43s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 25s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 16s{color} | {color:orange} tez-runtime-library: The patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 35s{color} | {color:orange} tez-dag: The patch generated 19 new + 612 unchanged - 0 fixed = 631 total (was 612) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 10s{color} | {color:orange} tez-plugins/tez-aux-services: The patch generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 11s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 14s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 55s{color} | {color:green} tez-api in the patch passed with JDK 

[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2021-02-15 Thread Tez CI (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285022#comment-17285022
 ] 

Tez CI commented on TEZ-3363:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m 58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 22s{color} | {color:green} master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 12s{color} | {color:green} master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 20s{color} | {color:green} master passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  8s{color} | {color:green} master passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 41s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 27s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 16s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 14s{color} | {color:orange} tez-runtime-library: The patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 35s{color} | {color:orange} tez-dag: The patch generated 19 new + 612 unchanged - 0 fixed = 631 total (was 612) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 11s{color} | {color:orange} tez-plugins/tez-aux-services: The patch generated 2 new + 71 unchanged - 0 fixed = 73 total (was 71) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 16s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 11s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 54s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | 

[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2020-02-28 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047802#comment-17047802
 ] 

Syed Shameerur Rahman commented on TEZ-3363:


[~abstractdog] Can you please review?

Thank You!

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2020-02-24 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043458#comment-17043458
 ] 

Syed Shameerur Rahman commented on TEZ-3363:


[~rajesh.balamohan] [~bikas] Can you please review?

Thank You!

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2020-02-20 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040888#comment-17040888
 ] 

TezQA commented on TEZ-3363:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m  4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 33s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 14s{color} | {color:orange} tez-runtime-library: The patch generated 1 new + 22 unchanged - 0 fixed = 23 total (was 22) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 28s{color} | {color:orange} tez-dag: The patch generated 20 new + 638 unchanged - 0 fixed = 658 total (was 638) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m  9s{color} | {color:orange} tez-plugins/tez-aux-services: The patch generated 3 new + 84 unchanged - 0 fixed = 87 total (was 84) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 41s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 21s{color} | {color:green} tez-runtime-library in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  4s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 24s{color} | {color:green} tez-aux-services in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 44s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 base: https://builds.apache.org/job/PreCommit-TEZ-Build/300/artifact/out/Dockerfile |
| JIRA Issue | TEZ-3363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12993987/TEZ-3363.03.patch |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux ed83b2551c54 

[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2020-02-20 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040851#comment-17040851
 ] 

Syed Shameerur Rahman commented on TEZ-3363:


[~jeagles] [~sseth] [~kshukla] Please review the patch TEZ-3363.03.patch

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch, TEZ-3363.03.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2020-01-10 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013347#comment-17013347
 ] 

TezQA commented on TEZ-3363:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} | {color:red} TEZ-3363 does not apply to master. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | TEZ-3363 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886888/TEZ-3363.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-TEZ-Build/244/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |


This message was automatically generated.



> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2020-01-10 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013346#comment-17013346
 ] 

Syed Shameerur Rahman commented on TEZ-3363:


[~kshukla] Any update on this?

Going through the initial patch, I think using List nodesList will lead 
to redundant HTTP calls. For example, if a vertex has 1000 tasks that were 
launched across 5 nodes, then instead of sending 1000 HTTP requests we can 
make 5 HTTP requests and do some regex matching to delete the shuffle data. 
Since we are already matching the regex, it would be better to use Set nodesList.
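
To illustrate the point about redundant calls, here is a minimal sketch, assuming each task records the host it ran on as a plain string. The names `VertexDeleteTargets` and `distinctNodes` are hypothetical and are not taken from any TEZ-3363 patch; the element type of nodesList is also an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Tez.
final class VertexDeleteTargets {
  private VertexDeleteTargets() {}

  /**
   * Collapses one node entry per task into the distinct nodes that
   * need to receive a delete request.
   */
  static Set<String> distinctNodes(List<String> nodePerTask) {
    // LinkedHashSet drops duplicates while keeping first-seen order.
    return new LinkedHashSet<>(nodePerTask);
  }
}
```

With 1000 task entries spread over 5 hosts, the returned set has 5 elements, so a caller would issue 5 requests rather than 1000.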

I can pick this up if you are not working on it.

cc [~jeagles]

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Turner Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2017-10-30 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225470#comment-16225470
 ] 

Kuhu Shukla commented on TEZ-3363:
--

bq. On the vertex events - does the vertex make sure that every downstream 
vertex at the specified depth is complete?
Yes, although looking at my current design it might be fragile to cases of 
duplicate vertex complete events from the same vertex. Besides that, the 
children data structure in VertexImpl takes care that all of them finish before 
vertexComplete() on the ancestor is called. You are right that this might be 
easier to do at Dag level.
bq. When the data for a vertex is deleted, I think it'll be better to move it 
into a different state, so that in case of failures / re-runs which require 
data from this vertex, the vertex tasks can be re-run directly, instead of 
relying on failures from the source to trigger re-runs of upstream tasks (how 
slow/fast is this?). This can be problematic if the entire vertex ends up 
re-running even if all data is not required by a downstream task. Ideally, 
would be nice to re-run tasks when a downstream consumer requests this data.
Agreed, since we know the re-runs must happen once data has been deleted. This 
will help bypass fetch retries and failure detection time in the current design. 
I will update the patch and get back ASAP.
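
On the concern about duplicate vertex complete events, one way to make the bookkeeping idempotent is to track outstanding children in a set rather than a counter. This is only a sketch under that assumption; `ChildCompletionTracker` and its method are hypothetical names and do not reflect how VertexImpl is actually written.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for illustration only; not part of Tez.
final class ChildCompletionTracker {
  private final Set<String> pending;

  ChildCompletionTracker(Set<String> childVertexIds) {
    // Copy so later changes to the caller's set do not affect tracking.
    this.pending = new HashSet<>(childVertexIds);
  }

  /**
   * Records that a child vertex completed. Returns true only for the
   * event that completes the last outstanding child; a duplicate event
   * for an already-completed child returns false.
   */
  boolean onChildComplete(String childVertexId) {
    boolean wasPending = pending.remove(childVertexId);
    return wasPending && pending.isEmpty();
  }
}
```

Because a repeated event removes nothing from the set, it can never trigger the deletion a second time.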

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2017-10-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219468#comment-16219468
 ] 

Siddharth Seth commented on TEZ-3363:
-

Couple of comments.
1. When the data for a vertex is deleted, I think it'll be better to move it 
into a different state, so that in case of failures / re-runs which require 
data from this vertex, the vertex tasks can be re-run directly, instead of 
relying on failures from the source to trigger re-runs of upstream tasks (how 
slow/fast is this?). This can be problematic if the entire vertex ends up 
re-running even if all data is not required by a downstream task. Ideally, 
would be nice to re-run tasks when a downstream consumer requests this data.
2. On the vertex events - does the vertex make sure that every downstream 
vertex at the specified depth is complete? It may be easier to move this 
coordination / selection of vertices for which data is to be deleted into the 
DAG - which already gets VERTEX_COMPLETE events.
3. The configs could be collapsed into one - with negative values indicating 
that the feature is disabled.
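
Point 3 could be sketched roughly as below: a single integer setting where any negative value means the feature is off. The key name `example.vertex.deletion.height` and the class `DeletionHeightSetting` are made up for this illustration, and treating 0 as an enabled value is an assumption.

```java
import java.util.Map;

// Hypothetical config helper for illustration only; not part of Tez.
final class DeletionHeightSetting {
  // Made-up key name; the real Tez configuration key may differ.
  static final String KEY = "example.vertex.deletion.height";
  static final int DISABLED = -1;

  private DeletionHeightSetting() {}

  /** Reads the setting, falling back to DISABLED when the key is absent. */
  static int read(Map<String, String> conf) {
    String raw = conf.get(KEY);
    return raw == null ? DISABLED : Integer.parseInt(raw.trim());
  }

  /** Any negative value switches the feature off (assumption: 0 is enabled). */
  static boolean isEnabled(int height) {
    return height >= 0;
  }
}
```

A separate boolean "enabled" setting then becomes unnecessary, since the sign of the one value carries that information.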


> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2017-10-16 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205888#comment-16205888
 ] 

Kuhu Shukla commented on TEZ-3363:
--

Would be nice to get some initial comments on the design and potential issues 
with this approach. CC: [~jeagles], [~jlowe], [~sseth]. Thank you! 

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2017-09-22 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176809#comment-16176809
 ] 

Kuhu Shukla commented on TEZ-3363:
--

Request for review/comments [~jeagles]/[~sseth]. Thanks a lot!

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2017-09-13 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165054#comment-16165054
 ] 

Kuhu Shukla commented on TEZ-3363:
--

The test TestMRRJobsDAGApi does not fail locally; this seems like a case of 
TEZ-899.

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>


[jira] [Commented] (TEZ-3363) Delete intermediate data at the vertex level for Shuffle Handler

2017-09-13 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164955#comment-16164955
 ] 

TezQA commented on TEZ-3363:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12886888/TEZ-3363.002.patch
  against master revision 7e895f5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.mapreduce.TestMRRJobsDAGApi

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2629//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2629//console

This message is automatically generated.

> Delete intermediate data at the vertex level for Shuffle Handler
> 
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>