[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-12-01 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712908#comment-15712908
 ] 

Hitesh Shah commented on TEZ-3271:
--

Assuming that the following recovery case was tested: 

  - V1 connected with V2 
  - V1 completes with failures 
  - AM killed when V2 was running with some tasks already launched 

+1 if the above test was manually done.  

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.10.patch, TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, 
> TEZ-3271.5.patch, TEZ-3271.6.patch, TEZ-3271.7.patch, TEZ-3271.8.patch, 
> TEZ-3271.9.patch
>
>
> There is a certain category of work that does not need 100% of its tasks to 
> succeed in order to be considered a success. To that end, I propose we 
> provide a Tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-11-30 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710028#comment-15710028
 ] 

Jonathan Eagles commented on TEZ-3271:
--

[~hitesh], I have verified this works manually with recovery, but the tests are 
proving hard to write. What needs to be done before we can look at this patch 
again?



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-27 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613573#comment-15613573
 ] 

TezQA commented on TEZ-3271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12835664/TEZ-3271.10.patch
  against master revision 6cf4378.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2069//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2069//console

This message is automatically generated.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612523#comment-15612523
 ] 

Hitesh Shah commented on TEZ-3271:
--

bq. The above is true, but one important thing I didn't mention above is that 
edgeManager.getNumDestinationConsumerTasks throws Exception. Declaring 
TezException, IOException, and Exception seems redundant. I will catch the 
Exception and rethrow TezException to explicitly give a throws TezException, 
IOException signature.

Sorry - my bad. Missed that. Let's stick to Exception in that case; no need to 
add an additional wrapper. 



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-27 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612432#comment-15612432
 ] 

Jonathan Eagles commented on TEZ-3271:
--

bq. The function could do explicit throws for both of the exceptions. On the 
VertexImpl, a generic Exception catch would make sense to ensure that we don't 
hit a dispatcher error.
The above is true, but one important thing I didn't mention above is that 
edgeManager.getNumDestinationConsumerTasks throws Exception. Declaring 
TezException, IOException, and Exception seems redundant. I will catch the 
Exception and rethrow TezException to explicitly give a throws TezException, 
IOException signature.
bq. Relies on TaskImpl's impl of addAndScheduleAttempt() but you could just use 
size of the map to decipher the last attempt.
Changed the logic to always ensure the highest attempt is chosen.

bq. "Assert.assertEquals(2, v6.numSuccessSourceAttemptCompletions);" - shouldn't 
this get 4 completions given that v6 expects task attempt completions for both 
tasks in v4 and v5?
Likewise for "testFailuresMaxPercentExceededSourceTaskAttemptCompletionEvents" 
- I would assume this will get 2 for tasks of v5?
Will address this.
bq. Sorry for the last minute review comment on testVertexFailuresMaxPercent - 
should we be using a 2-vertex DAG with the first vertex having a threshold and 
verifying that the second vertex ( using a shuffle edge ) completes 
successfully ?
Think the main thrust is that the shuffle is missing between these vertices and 
should be added. Will do that.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-26 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610181#comment-15610181
 ] 

Hitesh Shah commented on TEZ-3271:
--

bq. This function throws TezException and IOException. Let me know what the right 
thing to do is in this particular situation.

The function could do explicit throws for both of the exceptions. On the 
VertexImpl, a generic Exception catch would make sense to ensure that we don't 
hit a dispatcher error. 
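A rough sketch of the pattern suggested above, using hypothetical names (GuardedEventGeneration, TezLikeException, trySucceedFailedAttempt) rather than the actual VertexImpl code: the helper declares its specific checked exceptions, and the caller on the dispatch path catches Exception and records a diagnostic instead of letting the exception escape to the dispatcher.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the code in the patch.
public class GuardedEventGeneration {

  // Stand-in for a Tez-specific checked exception (assumption).
  static class TezLikeException extends Exception {
    TezLikeException(String msg) {
      super(msg);
    }
  }

  final List<String> diagnostics = new ArrayList<>();

  // Helper: explicit throws clause instead of a bare "throws Exception".
  void generateEmptyEvents(String attempt) throws TezLikeException, IOException {
    if (attempt.isEmpty()) {
      throw new TezLikeException("no attempt to generate events for");
    }
    // ... event generation would happen here ...
  }

  // Caller on the dispatch path: a broad catch keeps any failure inside
  // the state machine and turns it into a diagnostic.
  boolean trySucceedFailedAttempt(String attempt) {
    try {
      generateEmptyEvents(attempt);
      return true;
    } catch (Exception e) {
      diagnostics.add("Failed to generate events for " + attempt + ": " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    GuardedEventGeneration g = new GuardedEventGeneration();
    System.out.println(g.trySucceedFailedAttempt("attempt_0")); // true
    System.out.println(g.trySucceedFailedAttempt(""));          // false
    System.out.println(g.diagnostics);
  }
}
```

Later in the thread the helper keeps a plain throws Exception because edgeManager.getNumDestinationConsumerTasks already declares it; the broad catch on the caller side is what protects the dispatcher in either case.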

{code}
Iterator<TezTaskAttemptID> attempts = task.getAttempts().keySet().iterator();
TezTaskAttemptID attempt = null;
// walk to the last key in the map's iteration order
while (attempts.hasNext()) {
  attempt = attempts.next();
}
{code}
  - Relies on TaskImpl's impl of addAndScheduleAttempt(), but you could just use 
the size of the map to determine the last attempt. 
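A minimal sketch of the two ways to find the last attempt discussed here, assuming attempts are keyed by an integer attempt number (the real map is keyed by TezTaskAttemptID; the class and method names below are hypothetical). The size-based version only holds if attempt numbers are contiguous and start at 0; taking the maximum key makes no assumption about numbering or iteration order.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; attempts modeled as attempt number -> state label.
public class LastAttempt {

  // Reviewer's suggestion: derive the last attempt number from the map size.
  // Valid only if attempt numbers are contiguous and start at 0 (assumption).
  static int lastAttemptBySize(Map<Integer, String> attempts) {
    return attempts.size() - 1;
  }

  // Does not depend on numbering or iteration order: take the highest key.
  static int lastAttemptByMax(Map<Integer, String> attempts) {
    return Collections.max(attempts.keySet());
  }

  public static void main(String[] args) {
    Map<Integer, String> attempts = new LinkedHashMap<>();
    attempts.put(0, "FAILED");
    attempts.put(1, "FAILED");
    attempts.put(2, "FAILED");
    System.out.println(lastAttemptBySize(attempts)); // 2
    System.out.println(lastAttemptByMax(attempts));  // 2
  }
}
```

Collections.max throws NoSuchElementException on an empty collection, so a caller would need to guard against a task with no attempts.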

TestVertexImpl changes:
  - createInvalidDAGPlan not needed?
  - "Assert.assertEquals(2, v6.numSuccessSourceAttemptCompletions);" - shouldn't 
this get 4 completions given that v6 expects task attempt completions for both 
tasks in v4 and v5?
  - Likewise for 
"testFailuresMaxPercentExceededSourceTaskAttemptCompletionEvents" - I would 
assume this will get 2 for tasks of v5? 

Sorry for the last minute review comment on testVertexFailuresMaxPercent - 
should we be using a 2-vertex DAG with the first vertex having a threshold and 
verifying that the second vertex ( using a shuffle edge ) completes 
successfully ? 
 





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-26 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609288#comment-15609288
 ] 

Hitesh Shah commented on TEZ-3271:
--

[~jeagles] Will do so later today. Any luck on looking into the recovery 
aspects of this change? 



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-26 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609087#comment-15609087
 ] 

Jonathan Eagles commented on TEZ-3271:
--

[~hitesh], can you have a look at the latest (9) max percent failures patch?



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-26 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608945#comment-15608945
 ] 

TezQA commented on TEZ-3271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12835252/TEZ-3271.9.patch
  against master revision f735f48.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2065//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2065//console

This message is automatically generated.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-25 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607395#comment-15607395
 ] 

TezQA commented on TEZ-3271:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12835252/TEZ-3271.9.patch
  against master revision f735f48.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in :
 org.apache.tez.dag.app.rm.TestTaskScheduler

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2062//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2062//console

This message is automatically generated.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607276#comment-15607276
 ] 

Jonathan Eagles commented on TEZ-3271:
--

bq. the above should have parenthesis to make the code more understandable at a 
first glance.
added parentheses to the boolean logic
bq. would be useful to log count of failed tasks, total tasks, threshold to aid 
debugging.
Modified the logging to include the failed task count, total task count, and 
threshold, in line with the diagnostic message.
bq. Do the task completion events to history need to be changed to publish this 
info too? Can be done in a follow-up jira but should be done and made visible 
via the UI
Will make sure to handle this in the follow-up jira
bq. why pick the first attempt instead of the last one?
Switched logic to pick the last one. I originally picked the first one so as 
not to iterate the whole list, which the provided API requires.
bq. any particular reason for the generic exception as compared to a specific 
one being thrown?
This function throws TezException and IOException. Let me know what the right 
thing to do is in this particular situation.
bq. No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a 
good high level end to end verification but we probably need some unit tests at 
the VertexImpl level to test thresholds, event generation, etc.
Added both positive and negative tests to TestVertexImpl.




[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-25 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606447#comment-15606447
 ] 

Hitesh Shah commented on TEZ-3271:
--

Comments: 

{code}
boolean vertexFailuresBelowThreshold = vertex.succeededTaskCount + 
vertex.failedTaskCount == vertex.numTasks && vertex.failedTaskCount * 100 <= 
vertex.maxFailuresPercent * vertex.numTasks;
{code}
   - the above should have parentheses to make the code more understandable at 
a first glance.
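A minimal stand-alone sketch of the same check with the parentheses made explicit. It assumes maxFailuresPercent is a percentage (0.0-100.0) and takes the counts as parameters instead of reading VertexImpl fields; the class and method names are hypothetical.

```java
// Hypothetical sketch of the "succeeded with failures" check.
public class FailureThreshold {

  static boolean vertexFailuresBelowThreshold(int succeededTaskCount,
      int failedTaskCount, int numTasks, float maxFailuresPercent) {
    // Every task must have reached a terminal state (succeeded or failed).
    boolean allTasksCompleted = (succeededTaskCount + failedTaskCount) == numTasks;
    // Cross-multiplied form of failed / numTasks <= maxFailuresPercent / 100,
    // which avoids dividing by numTasks.
    boolean withinThreshold =
        (failedTaskCount * 100.0f) <= (maxFailuresPercent * numTasks);
    return allTasksCompleted && withinThreshold;
  }

  public static void main(String[] args) {
    // 10 tasks, 1 failed, 10% allowed: within threshold.
    System.out.println(vertexFailuresBelowThreshold(9, 1, 10, 10.0f)); // true
    // 10 tasks, 2 failed, 10% allowed: threshold exceeded.
    System.out.println(vertexFailuresBelowThreshold(8, 2, 10, 10.0f)); // false
  }
}
```

Because the comparison is <=, a vertex that fails exactly the configured percentage still succeeds, which is slightly looser than "below a configured threshold" in the issue description.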
   
{code}
  LOG.info("All tasks have completed and the number of failed tasks is within threshold, vertex:" + vertex.logIdentifier);
{code}
  - would be useful to log count of failed tasks, total tasks, threshold to aid 
debugging. 
  - Do the task completion events to history need to be changed to publish this 
info too? Can be done in a follow-up jira but should be done and made visible 
via the UI. 

{code}
TezTaskAttemptID attempt = task.getAttempts().keySet().iterator().next();
LOG.info("Succeeding failed task attempt:" + attempt);
{code}
   - why pick the first attempt instead of the last one? 

{code}
generateEmptyEventsForAttempt(TezTaskAttemptID attempt) throws Exception
{code}
  - any particular reason for the generic exception as compared to a specific 
one being thrown? 

TEZ_VERTEX_FAILURES_MAXPERCENT needs a minor doc improvement to clarify whether 
the values are meant to be 0.0-1.0f or 0.0-100.0f. 
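To illustrate why the unit matters, here is a hypothetical sketch (not the TezConfiguration code; names are made up) that computes how many failed tasks the same configured value tolerates under each reading.

```java
// Hypothetical sketch: the same number read as a percentage or a fraction.
public class MaxPercentUnits {

  // Largest failure count allowed if the value is a percentage (0.0-100.0).
  static int allowedFailuresAsPercent(float configured, int numTasks) {
    if (configured < 0.0f || configured > 100.0f) {
      throw new IllegalArgumentException("expected 0.0-100.0, got " + configured);
    }
    return (int) Math.floor(configured * numTasks / 100.0);
  }

  // Largest failure count allowed if the value is a fraction (0.0-1.0).
  static int allowedFailuresAsFraction(float configured, int numTasks) {
    if (configured < 0.0f || configured > 1.0f) {
      throw new IllegalArgumentException("expected 0.0-1.0, got " + configured);
    }
    return (int) Math.floor(configured * numTasks);
  }

  public static void main(String[] args) {
    int numTasks = 200;
    // The value 0.5 on a 200-task vertex:
    System.out.println(allowedFailuresAsPercent(0.5f, numTasks));  // 1
    System.out.println(allowedFailuresAsFraction(0.5f, numTasks)); // 100
  }
}
```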

No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a good 
high level end to end verification but we probably need some unit tests at the 
VertexImpl level to test thresholds, event generation, etc. 







[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-25 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606236#comment-15606236
 ] 

TezQA commented on TEZ-3271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12835164/TEZ-3271.8.patch
  against master revision f735f48.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2060//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2060//console

This message is automatically generated.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605992#comment-15605992
 ] 

Jonathan Eagles commented on TEZ-3271:
--

Posted a new patch implementing the changes as discussed.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-24 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603334#comment-15603334
 ] 

Jonathan Eagles commented on TEZ-3271:
--

bq. generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an 
abstract function. Given that CartesianProductEdgeManager needs changing this 
is an incompatible feature. An appropriate exception thrown could be used to 
indicate that the EM plugin in use does not support the failure threshold 
percent feature.
If we strictly limit this feature to known Tez outputs, we can avoid empty event 
generation in the edge manager plugin for now and promote that logic to the 
edge.

bq. I think we can add a fail-safe in the edge plugins to generate the events 
only for known outputs (maybe if they belong to the tez runtime package?)
I added exception throwing to the Edge to restrict this to org.apache.tez outputs 
only.
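A minimal sketch of such a fail-safe, assuming the check is done on the output's class name; the class, method, and exception choice here are hypothetical and not the code in the patch. The trailing dot in the prefix keeps a package such as org.apache.tezfoo from matching.

```java
// Hypothetical sketch of restricting empty-event generation to Tez outputs.
public class KnownOutputCheck {

  private static final String KNOWN_OUTPUT_PREFIX = "org.apache.tez.";

  // Fail loudly for outputs whose payload format is unknown.
  static void checkEmptyEventsSupported(String outputClassName) {
    if (outputClassName == null || !outputClassName.startsWith(KNOWN_OUTPUT_PREFIX)) {
      throw new UnsupportedOperationException(
          "Vertex failure threshold is not supported for output " + outputClassName);
    }
  }

  static boolean isSupported(String outputClassName) {
    try {
      checkEmptyEventsSupported(outputClassName);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isSupported(
        "org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput")); // true
    System.out.println(isSupported("com.example.CustomOutput"));               // false
  }
}
```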

bq. i.e. if someone ends up writing a new output that uses a different payload 
we would need to throw an error at least with the current impl though we do 
need to figure out how the EM plugin can invoke an empty event that the Input 
understands. One option here would be to enhance the DME meta info to indicate 
empty/null payload or invoke an api on the Output to generate the empty data 
event.
I think this is aimed at implementing this completely generically, which 
should go into a follow-up JIRA if we are using this JIRA to implement a 
stop-gap until a full-blown implementation can be finished.

bq. As for event generation, I have a doubt with respect to recovery given that 
we expect all DME events to be generated before a task completes. This might be 
something to test more carefully on recovery to see if events are generated 
correctly as needed when a failed vertex is recovered or replayed as needed.
Will see about this.

bq. Unit test could be moved to TestTezJobs. At some point we probably need to 
get rid of a lot of the TestMRR* minicluster tests.
I am assuming you mean to reimplement in a non-mr way and not to just move the 
code over and so will approach this comment from that perspective.



[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-21 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596820#comment-15596820
 ] 

Hitesh Shah commented on TEZ-3271:
--

Comments: 
  -  generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an 
abstract function. Given that CartesianProductEdgeManager needs changing this 
is an incompatible feature. An appropriate exception thrown could be used to 
indicate that the EM plugin in use does not support the failure threshold 
percent feature.  
  - I think we can add a fail-safe in the edge plugins to generate the events 
only for known outputs (maybe if they belong to the tez runtime package?), i.e. 
if someone ends up writing a new output that uses a different payload we would 
need to throw an error at least with the current impl, though we do need to 
figure out how the EM plugin can invoke an empty event that the Input 
understands. One option here would be to enhance the DME meta info to indicate 
empty/null payload or invoke an api on the Output to generate the empty data 
event.
  - As for event generation, I have a concern about recovery, given that 
we expect all DME events to be generated before a task completes. This should 
be tested carefully on recovery to verify that events are generated correctly 
when a failed vertex is recovered or replayed. 
 - Unit test could be moved to TestTezJobs. At some point we probably need to 
get rid of a lot of the TestMRR* minicluster tests. 
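The first two points above can be sketched together. This is a minimal illustration with hypothetical names (`EdgeManagerPluginSketch`, `EmptyEvent`, and an `int`-argument `generateEmptyEventsForSourceTask`), not the Tez EdgeManagerPlugin API: the hook gets a concrete default that throws, so plugins written before it existed (such as CartesianProductEdgeManager) keep compiling, and a fail-safe only synthesizes events for outputs whose payload format is known. Treating "known" as "class name under `org.apache.tez.runtime.library.`" is an assumption following the suggestion above, not an existing Tez check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical names throughout; this is not the Tez EdgeManagerPlugin API. */
public class EmptyEventSketch {

    /** Stand-in for an empty data movement event aimed at one destination task. */
    static final class EmptyEvent {
        final int sourceTaskIndex;
        final int destinationIndex;

        EmptyEvent(int sourceTaskIndex, int destinationIndex) {
            this.sourceTaskIndex = sourceTaskIndex;
            this.destinationIndex = destinationIndex;
        }
    }

    abstract static class EdgeManagerPluginSketch {
        // Concrete, not abstract: existing subclasses still compile and only
        // fail if the failure-threshold feature is actually used with them.
        List<EmptyEvent> generateEmptyEventsForSourceTask(int sourceTaskIndex) {
            throw new UnsupportedOperationException(getClass().getSimpleName()
                + " does not support the vertex failure threshold feature");
        }
    }

    /** A plugin written before the hook existed; it inherits the throwing default. */
    static final class LegacyPluginSketch extends EdgeManagerPluginSketch {}

    /** A broadcast-style plugin that sends one empty event to every destination task. */
    static final class BroadcastLikePluginSketch extends EdgeManagerPluginSketch {
        private final int numDestinationTasks;
        private final String outputClassName;

        BroadcastLikePluginSketch(int numDestinationTasks, String outputClassName) {
            this.numDestinationTasks = numDestinationTasks;
            this.outputClassName = outputClassName;
        }

        @Override
        List<EmptyEvent> generateEmptyEventsForSourceTask(int sourceTaskIndex) {
            // Fail-safe: refuse to guess the payload format of an unknown output.
            if (!isKnownOutput(outputClassName)) {
                throw new IllegalStateException(
                    "Cannot generate empty events for unknown output " + outputClassName);
            }
            List<EmptyEvent> events = new ArrayList<>();
            for (int dest = 0; dest < numDestinationTasks; dest++) {
                events.add(new EmptyEvent(sourceTaskIndex, dest));
            }
            return events;
        }
    }

    /** Assumption: "known" means the output lives in the Tez runtime library package. */
    static boolean isKnownOutput(String outputClassName) {
        return outputClassName.startsWith("org.apache.tez.runtime.library.");
    }
}
```

In this sketch a caller could catch `UnsupportedOperationException` and treat the vertex as failing in the usual way.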
 

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch, TEZ-3271.7.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-13 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573280#comment-15573280
 ] 

Jonathan Eagles commented on TEZ-3271:
--

[~hitesh], I have pushed the event generation into the edge manager. I need 
some recommendations on which events to generate from the edge manager.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch, TEZ-3271.7.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570232#comment-15570232
 ] 

TezQA commented on TEZ-3271:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12832987/TEZ-3271.7.patch
  against master revision c9b09cb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2036//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2036//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch, TEZ-3271.7.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-10-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569988#comment-15569988
 ] 

TezQA commented on TEZ-3271:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12832971/TEZ-3271.6.patch
  against master revision c9b09cb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2034//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2034//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2034//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, 
> TEZ-3271.6.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-09-09 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478304#comment-15478304
 ] 

TezQA commented on TEZ-3271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12827818/TEZ-3271.5.patch
  against master revision c07ec7b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1961//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1961//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-09-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478153#comment-15478153
 ] 

Hitesh Shah commented on TEZ-3271:
--

[~jeagles] Any thoughts on how to get the event generation to work correctly 
for all kinds of edges, including custom ones, i.e. could the event generation 
somehow be done via the vertex manager or edge manager plugins? Unfortunately, 
this also ties in to how the output and input pairs are written, so as to be 
able to generate the correct kind of event for the waiting input. 

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-03 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314216#comment-15314216
 ] 

Jonathan Eagles commented on TEZ-3271:
--

Spoke with Rohini offline and we agree that tez.vertex.failures.maxpercent is a 
good choice.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313083#comment-15313083
 ] 

Hitesh Shah commented on TEZ-3271:
--

I guess max.failures.percent provides more clarity. 

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-02 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312978#comment-15312978
 ] 

Jonathan Eagles commented on TEZ-3271:
--

[~hitesh], do you prefer tez.vertex.failures.percent or 
tez.vertex.max.failures.percent, or something similar, to indicate that there 
is a threshold which, when crossed, still produces a failure?
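Whatever the final name, the semantics under discussion are a threshold that tolerates some failed tasks and fails the vertex once it is crossed. A minimal Java sketch, assuming the value is a percentage on a 0-100 scale (as with the MapReduce `maxpercent` settings) and that a failure share exactly equal to the threshold does not count as crossing it; `VertexFailureThreshold` and `withinThreshold` are hypothetical names, not Tez code.

```java
/**
 * Hypothetical helper illustrating a per-vertex failure threshold.
 * maxFailuresPercent is assumed to be on a 0-100 scale.
 */
public final class VertexFailureThreshold {

    private VertexFailureThreshold() {}

    /**
     * Returns true if the vertex should still be treated as succeeded.
     * The threshold is "crossed" only when the failed share is strictly
     * greater than maxFailuresPercent.
     */
    public static boolean withinThreshold(int failedTasks, int totalTasks,
                                          float maxFailuresPercent) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        if (failedTasks < 0 || failedTasks > totalTasks) {
            throw new IllegalArgumentException("failedTasks out of range");
        }
        if (failedTasks == 0) {
            return true; // no failures: always within the threshold
        }
        // Compare failed*100 against percent*total instead of dividing,
        // so small task counts are compared without rounding surprises.
        return failedTasks * 100f <= maxFailuresPercent * totalTasks;
    }
}
```

For example, 1 failure out of 4 tasks is within a threshold of 25, 2 out of 4 is not, and a threshold of 0 tolerates no failures at all.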

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312974#comment-15312974
 ] 

Hitesh Shah commented on TEZ-3271:
--

Minor nit on the config name - I guess it should be tez.vertex.failures.percent 
and not tez.am.*? Would be good to add some documentation for the new config. 

Will wait for the next round of the patch related to the event generation 
handling before doing a detailed review. 


> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-01 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311448#comment-15311448
 ] 

TezQA commented on TEZ-3271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12807548/TEZ-3271.4.patch
  against master revision de8b460.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1768//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1768//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-01 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311315#comment-15311315
 ] 

TezQA commented on TEZ-3271:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12807548/TEZ-3271.4.patch
  against master revision de8b460.

{color:red}-1 patch{color}.  master compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1767//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-01 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311301#comment-15311301
 ] 

Jonathan Eagles commented on TEZ-3271:
--

This job will show off the current functionality.
{noformat}
HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$TEZ_CONF_DIR" yarn jar \
  $TEZ_HOME/tez-tests-*.jar mrrsleep -Dmrr.sleepjob.map.fatal.error=true \
  -Dmrr.sleepjob.map.error.task.ids=0 -Dtez.am.task.max.failed.attempts=1 \
  -Dtez.am.failures.percent=0.25f -m 4 -ir 1 -r 1
{noformat}

{noformat}
16/06/01 22:51:40 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
16/06/01 22:51:40 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/06/01 22:51:45 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/06/01 22:51:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 16.67% TotalTasks: 6 Succeeded: 1 Running: 3 Failed: 0 Killed: 0
16/06/01 22:51:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 50% TotalTasks: 6 Succeeded: 3 Running: 1 Failed: 1 Killed: 0 FailedTaskAttempts: 1
16/06/01 22:51:48 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 83.33% TotalTasks: 6 Succeeded: 5 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 1
16/06/01 22:51:48 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 83.33% TotalTasks: 6 Succeeded: 5 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 1
16/06/01 22:51:48 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED
{noformat}
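The numbers in this log can be checked by hand. Assuming `-m 4 -ir 1 -r 1` means 4 map tasks, 1 intermediate reduce task and 1 final reduce task (which matches TotalTasks: 6), and that only map task 0 fails:

```latex
\begin{aligned}
\text{total tasks} &= 4 + 1 + 1 = 6 \\
\text{map failure fraction} &= \tfrac{1}{4} = 0.25 \\
\text{final progress} &= \tfrac{5}{6} \approx 83.33\%
\end{aligned}
```

Since the DAG still finished as SUCCEEDED with `-Dtez.am.failures.percent=0.25f`, this prototype appears to treat the value as a fraction and to count a failure share equal to the threshold as not crossing it; that is an inference from this single run, not a documented rule.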

I have attached a screenshot of what the UI looks like. I would prefer the 
diagnostic message about the success to be more prominent and less "red".

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, 
> TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed 
> to cause the work to be considered a success. To meet that end, I propose we 
> provide a tez equivalent of mapreduce.map.failures.maxpercent and 
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered 
> a success if the number of failures is below a configured threshold.





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-01 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1536#comment-1536
 ] 

Bikas Saha commented on TEZ-3271:
-

It would help to have a bit more detail on what the objective is here.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch
>
>
> mapreduce.map.failures.maxpercent
> mapreduce.reduce.failures.maxpercent





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-06-01 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310850#comment-15310850
 ] 

Hitesh Shah commented on TEZ-3271:
--

  - The event generation for the failed tasks is the main issue, and maybe 
something that the edge manager or the VM could do? I don't think VertexImpl is 
the right place for this, as DataMovementEvent payloads are input/output 
specific. It might be better to split the two issues into separate JIRAs: one 
for the event generation and the I/O changes to handle the new no-more-data 
event, and another to address the failure threshold handling.

  - Other general comments:
    - this config should be a vertex-level config, not an AM-specific one, 
and hence named and scoped accordingly?
    - the code deciding whether or not to commit should probably be put in a 
common place? 
    - any diagnostics updates to indicate that the vertex succeeded because the 
failure threshold was not exceeded? 
    - any recovery impact? Given that the same transitions are used for 
recovery, I don't think there should be any impact, but it might be worth checking.
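The commit and diagnostics questions could both be answered at a single decision point. A hypothetical Java sketch (`VertexOutcomeSketch`, `decide`, and the message wording are illustrative, not VertexImpl code), assuming it runs once every task of the vertex is terminal and that the threshold is a 0-100 percentage where a failure share equal to the threshold does not cross it.

```java
import java.util.Locale;

/** Hypothetical sketch: decide final vertex state, commit, and diagnostics in one place. */
public final class VertexOutcomeSketch {

    enum FinalState { SUCCEEDED, FAILED }

    static final class Outcome {
        final FinalState state;
        final boolean commitOutputs;
        final String diagnostics;

        Outcome(FinalState state, boolean commitOutputs, String diagnostics) {
            this.state = state;
            this.commitOutputs = commitOutputs;
            this.diagnostics = diagnostics;
        }
    }

    private VertexOutcomeSketch() {}

    /** Assumes every task has finished, i.e. succeeded + failed == total. */
    static Outcome decide(String vertexName, int succeeded, int failed, int total,
                          float maxFailuresPercent) {
        if (total <= 0 || succeeded < 0 || failed < 0 || succeeded + failed != total) {
            throw new IllegalArgumentException("expected all tasks to be terminal");
        }
        if (failed == 0) {
            return new Outcome(FinalState.SUCCEEDED, true, "");
        }
        // The threshold is crossed only when the failed share is strictly greater.
        boolean crossed = failed * 100f > maxFailuresPercent * total;
        if (!crossed) {
            // Succeeded despite failures: commit, but say so in the diagnostics.
            String msg = String.format(Locale.ROOT,
                "Vertex %s succeeded with %d of %d tasks failed; "
                    + "failure threshold of %.2f%% was not exceeded",
                vertexName, failed, total, maxFailuresPercent);
            return new Outcome(FinalState.SUCCEEDED, true, msg);
        }
        String msg = String.format(Locale.ROOT,
            "Vertex %s failed: %d of %d tasks failed, exceeding the %.2f%% failure threshold",
            vertexName, failed, total, maxFailuresPercent);
        return new Outcome(FinalState.FAILED, false, msg);
    }
}
```

Keeping the commit flag and the diagnostics in the same return value means the "succeeded with failures" case cannot commit without also producing a message, which is one way to address the UI concern about how such vertices are reported.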

 

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch
>
>
> mapreduce.map.failures.maxpercent
> mapreduce.reduce.failures.maxpercent





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-05-31 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308982#comment-15308982
 ] 

Jonathan Eagles commented on TEZ-3271:
--

[~hitesh], [~bikassaha], [~jlowe], I have pieced together a prototype that can 
serve as a starting point for discussion. Ideas on how to generalize this 
approach would be appreciated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch
>
>
> mapreduce.map.failures.maxpercent
> mapreduce.reduce.failures.maxpercent





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-05-31 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308881#comment-15308881
 ] 

TezQA commented on TEZ-3271:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12807273/TEZ-3271.3.patch
  against master revision 18da493.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1762//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1762//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch
>
>
> mapreduce.map.failures.maxpercent
> mapreduce.reduce.failures.maxpercent





[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent

2016-05-26 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302945#comment-15302945
 ] 

TezQA commented on TEZ-3271:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12806460/TEZ-3271.2.patch
  against master revision 89802b1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.dag.impl.TestVertexImpl
  org.apache.tez.test.TestTaskErrorsUsingLocalMode
  org.apache.tez.test.TestFaultTolerance
  org.apache.tez.test.TestExceptionPropagation
  org.apache.tez.test.TestLocalMode
  org.apache.tez.history.TestHistoryParser

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1755//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1755//console

This message is automatically generated.

> Provide mapreduce failures.maxpercent equivalent
> 
>
> Key: TEZ-3271
> URL: https://issues.apache.org/jira/browse/TEZ-3271
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch
>
>
> mapreduce.map.failures.maxpercent
> mapreduce.reduce.failures.maxpercent


