[jira] [Updated] (TEZ-3910) Single node can cause Tez job to fail during shuffle

2018-07-18 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated TEZ-3910:
--
Target Version/s: 0.9.2, 0.10.0  (was: 0.9.2)

> Single node can cause Tez job to fail during shuffle
> 
>
> Key: TEZ-3910
> URL: https://issues.apache.org/jira/browse/TEZ-3910
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3910.001.patch, TEZ-3910.002.patch, 
> TEZ-3910.003.patch, TEZ-3910.004.patch, TEZ-3910.005.patch
>
>
> There is a race in which a downstream task that hits fetch failures caused by
> bad output from an upstream task can keep blaming itself for those failures
> before the AM can re-run the offending upstream task and clear the fetch
> failures. As a result, the failure of a single node can cause the DAG to fail.
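
The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Tez code: `Scenario`, `run`, the tick-based clock, and all three parameters are hypothetical names invented for illustration, and the model makes no claim about what the attached patches change. It assumes each downstream attempt fails itself after a fixed number of fetch failures, and that the DAG fails once the downstream attempts are used up.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # All fields are illustrative assumptions, not Tez configuration keys.
    rerun_ready_at: int        # tick at which the upstream re-run has produced good output
    failures_per_attempt: int  # fetch failures one downstream attempt tolerates before failing itself
    max_attempts: int          # downstream attempts allowed before the DAG is failed


def run(s: Scenario) -> str:
    """Return "SUCCEEDED" if a fetch happens after the re-run is ready, else "FAILED"."""
    tick = 0
    for _attempt in range(s.max_attempts):
        for _fetch in range(s.failures_per_attempt):
            if tick >= s.rerun_ready_at:
                # The upstream output is good again, so this fetch succeeds.
                return "SUCCEEDED"
            # Fetch against the bad output fails; one tick passes.
            tick += 1
        # The attempt has used up its failure budget and blames itself.
    # Every downstream attempt failed before the re-run was ready.
    return "FAILED"


if __name__ == "__main__":
    # The re-run needs 10 ticks, but the downstream side only lasts 2 * 4 = 8 ticks.
    print(run(Scenario(rerun_ready_at=10, failures_per_attempt=2, max_attempts=4)))
    # The re-run is ready at tick 5, inside the downstream failure budget.
    print(run(Scenario(rerun_ready_at=5, failures_per_attempt=2, max_attempts=4)))
```

In this model the outcome depends only on whether the downstream failure budget (`failures_per_attempt * max_attempts` ticks) outlasts the time the re-run needs, which is the timing dependence the report describes.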



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3910) Single node can cause Tez job to fail during shuffle

2018-07-18 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3910:
-
Attachment: TEZ-3910.005.patch






[jira] [Updated] (TEZ-3910) Single node can cause Tez job to fail during shuffle

2018-07-10 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3910:
-
Attachment: TEZ-3910.004.patch

> Single node can cause Tez job to fail during shuffle
> 
>
> Key: TEZ-3910
> URL: https://issues.apache.org/jira/browse/TEZ-3910
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3910.001.patch, TEZ-3910.002.patch, 
> TEZ-3910.003.patch, TEZ-3910.004.patch
>
>
> There is a race in which a downstream task that hits fetch failures caused by
> bad output from an upstream task can keep blaming itself for those failures
> before the AM can re-run the offending upstream task and clear the fetch
> failures. As a result, the failure of a single node can cause the DAG to fail.





[jira] [Updated] (TEZ-3910) Single node can cause Tez job to fail during shuffle

2018-05-04 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3910:
-
Attachment: TEZ-3910.003.patch

> Single node can cause Tez job to fail during shuffle
> 
>
> Key: TEZ-3910
> URL: https://issues.apache.org/jira/browse/TEZ-3910
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3910.001.patch, TEZ-3910.002.patch, 
> TEZ-3910.003.patch
>
>
> There is a race in which a downstream task that hits fetch failures caused by
> bad output from an upstream task can keep blaming itself for those failures
> before the AM can re-run the offending upstream task and clear the fetch
> failures. As a result, the failure of a single node can cause the DAG to fail.





[jira] [Updated] (TEZ-3910) Single node can cause Tez job to fail during shuffle

2018-05-01 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3910:
-
Attachment: TEZ-3910.002.patch

> Single node can cause Tez job to fail during shuffle
> 
>
> Key: TEZ-3910
> URL: https://issues.apache.org/jira/browse/TEZ-3910
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3910.001.patch, TEZ-3910.002.patch
>
>
> There is a race in which a downstream task that hits fetch failures caused by
> bad output from an upstream task can keep blaming itself for those failures
> before the AM can re-run the offending upstream task and clear the fetch
> failures. As a result, the failure of a single node can cause the DAG to fail.





[jira] [Updated] (TEZ-3910) Single node can cause Tez job to fail during shuffle

2018-04-04 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3910:
-
Attachment: TEZ-3910.001.patch

> Single node can cause Tez job to fail during shuffle
> 
>
> Key: TEZ-3910
> URL: https://issues.apache.org/jira/browse/TEZ-3910
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3910.001.patch
>
>
> There is a race in which a downstream task that hits fetch failures caused by
> bad output from an upstream task can keep blaming itself for those failures
> before the AM can re-run the offending upstream task and clear the fetch
> failures. As a result, the failure of a single node can cause the DAG to fail.


