[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15712908#comment-15712908 ]

Hitesh Shah commented on TEZ-3271:
----------------------------------

Assuming that the following recovery case was tested:
  - V1 connected with V2
  - V1 completes with failures
  - AM killed when V2 was running with some tasks already launched

+1 if the above test was manually done.

> Provide mapreduce failures.maxpercent equivalent
> ------------------------------------------------
>
>                 Key: TEZ-3271
>                 URL: https://issues.apache.org/jira/browse/TEZ-3271
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: Succeeded with Failures.png, TEZ-3271.1.patch,
> TEZ-3271.10.patch, TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch,
> TEZ-3271.5.patch, TEZ-3271.6.patch, TEZ-3271.7.patch, TEZ-3271.8.patch,
> TEZ-3271.9.patch
>
>
> There is a certain category of work that need not have 100% of tasks succeed
> to cause the work to be considered a success. To meet that end, I propose we
> provide a tez equivalent of mapreduce.map.failures.maxpercent and
> mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered
> a success if the number of failures is below a configured threshold.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710028#comment-15710028 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

[~hitesh], I have verified this works manually with recovery, but the tests are proving hard to write. What needs to be done before we can look at this patch again?
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613573#comment-15613573 ]

TezQA commented on TEZ-3271:
----------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12835664/TEZ-3271.10.patch
  against master revision 6cf4378.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2069//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2069//console

This message is automatically generated.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612523#comment-15612523 ]

Hitesh Shah commented on TEZ-3271:
----------------------------------

bq. The above is true, but one important thing I didn't mention above is that edgeManager.getNumDestinationConsumerTasks is declared to throw Exception. Throwing TezException, IOException, and Exception seems redundant. I will catch the Exception and rethrow TezException to explicitly give a throws TezException, IOException signature.

Sorry - my bad. Missed that. Let's stick to Exception in that case - no need to do an additional wrap around.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612432#comment-15612432 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

bq. The function could do explicit throws for both of the exceptions. On the VertexImpl, a generic Exception catch would make sense to ensure that we don't hit a dispatcher error.

The above is true, but one important thing I didn't mention above is that edgeManager.getNumDestinationConsumerTasks is declared to throw Exception. Throwing TezException, IOException, and Exception seems redundant. I will catch the Exception and rethrow TezException to explicitly give a throws TezException, IOException signature.

bq. Relies on TaskImpl's impl of addAndScheduleAttempt() but you could just use size of the map to decipher the last attempt.

Changed the logic to always ensure the highest attempt is chosen.

bq. "Assert.assertEquals(2, v6.numSuccessSourceAttemptCompletions);" - shouldn't this get 4 completions given that v6 expects task attempt completions for both tasks in v4 and v5? Likewise for "testFailuresMaxPercentExceededSourceTaskAttemptCompletionEvents" - I would assume this will get 2 for tasks of v5?

Will address this.

bq. Sorry for the last minute review comment on testVertexFailuresMaxPercent - should we be using a 2-vertex DAG with the first vertex having a threshold and verifying that the second vertex ( using a shuffle edge ) completes successfully ?

I think the main thrust is that the shuffle is missing between these vertices and should be added. Will do that.
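The catch-and-rethrow proposal in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchTezException is a local stand-in for Tez's TezException so that the snippet compiles on its own, and ConsumerCounter and countConsumers are hypothetical names, not the patch's actual code. Note that the follow-up review comment in this thread settled on keeping a plain throws Exception instead of adding this wrapper.

```java
import java.io.IOException;

// Hypothetical sketch; none of these types are taken from the TEZ-3271 patch.
final class ExceptionNarrowing {
    private ExceptionNarrowing() {}

    /** Local stand-in for Tez's TezException so the sketch is self-contained. */
    static final class SketchTezException extends Exception {
        SketchTezException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical callee whose signature declares the broad "throws Exception". */
    interface ConsumerCounter {
        int getNumDestinationConsumerTasks(int sourceTaskIndex) throws Exception;
    }

    /** Narrows the callee's "throws Exception" to two specific checked exceptions. */
    static int countConsumers(ConsumerCounter counter, int sourceTaskIndex)
            throws SketchTezException, IOException {
        try {
            return counter.getNumDestinationConsumerTasks(sourceTaskIndex);
        } catch (IOException e) {
            // Already one of the declared types, so pass it through unchanged.
            throw e;
        } catch (Exception e) {
            // Anything else is wrapped, keeping the original as the cause.
            throw new SketchTezException("Failed to count consumer tasks", e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countConsumers(index -> 3, 0));
        try {
            countConsumers(index -> { throw new IllegalStateException("boom"); }, 0);
        } catch (SketchTezException e) {
            System.out.println("wrapped cause: " + e.getCause());
        }
    }
}
```

The trade-off discussed in the thread is visible here: the wrapper gives callers a precise signature, at the cost of one more layer around a method that already throws Exception.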
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610181#comment-15610181 ]

Hitesh Shah commented on TEZ-3271:
----------------------------------

bq. This function throws TezException and IOException. Let me know what the right thing to do is in this particular situation.

The function could do explicit throws for both of the exceptions. On the VertexImpl, a generic Exception catch would make sense to ensure that we don't hit a dispatcher error.

{code}
Iterator<TezTaskAttemptID> attempts = task.getAttempts().keySet().iterator();
while (attempts.hasNext()) {
  attempt = attempts.next();
}
{code}
  - Relies on TaskImpl's impl of addAndScheduleAttempt() but you could just use size of the map to decipher the last attempt.

TestVertexImpl changes:
  - createInvalidDAGPlan not needed?
  - "Assert.assertEquals(2, v6.numSuccessSourceAttemptCompletions);" - shouldn't this get 4 completions given that v6 expects task attempt completions for both tasks in v4 and v5?
  - Likewise for "testFailuresMaxPercentExceededSourceTaskAttemptCompletionEvents" - I would assume this will get 2 for tasks of v5?

Sorry for the last minute review comment on testVertexFailuresMaxPercent - should we be using a 2-vertex DAG with the first vertex having a threshold and verifying that the second vertex ( using a shuffle edge ) completes successfully ?
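The concern about picking the last attempt can be illustrated with a small sketch. It assumes, purely for illustration, that attempts are keyed by an integer attempt number (the real code keys them by TezTaskAttemptID), and the class and method names are hypothetical. Taking the maximum key does not depend on the order in which the map happens to iterate, which is the property the review asks for.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch; integer keys stand in for TezTaskAttemptID.
final class LastAttempt {
    private LastAttempt() {}

    /** Returns the highest attempt number, regardless of the map's iteration order. */
    static int highestAttemptId(Map<Integer, String> attempts) {
        if (attempts.isEmpty()) {
            throw new NoSuchElementException("task has no attempts");
        }
        return Collections.max(attempts.keySet());
    }

    public static void main(String[] args) {
        Map<Integer, String> attempts = new LinkedHashMap<>();
        // Inserted out of order on purpose: walking the iterator to the end
        // would land on attempt 0 here, not on the newest attempt.
        attempts.put(1, "FAILED");
        attempts.put(2, "FAILED");
        attempts.put(0, "FAILED");
        System.out.println(highestAttemptId(attempts)); // prints 2
    }
}
```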
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609288#comment-15609288 ]

Hitesh Shah commented on TEZ-3271:
----------------------------------

[~jeagles] Will do so later today. Any luck on looking into the recovery aspects of this change?
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609087#comment-15609087 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

[~hitesh], can you have a look at the latest (9) max percent failures patch?
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608945#comment-15608945 ]

TezQA commented on TEZ-3271:
----------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12835252/TEZ-3271.9.patch
  against master revision f735f48.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2065//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2065//console

This message is automatically generated.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607395#comment-15607395 ]

TezQA commented on TEZ-3271:
----------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12835252/TEZ-3271.9.patch
  against master revision f735f48.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The following test timeouts occurred in :
                  org.apache.tez.dag.app.rm.TestTaskScheduler

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2062//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2062//console

This message is automatically generated.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607276#comment-15607276 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

bq. the above should have parentheses to make the code more understandable at a first glance.

Added parentheses to the boolean logic.

bq. would be useful to log count of failed tasks, total tasks, threshold to aid debugging.

Modified logging to include the failed tasks, total tasks, and threshold, in line with the diagnostic message.

bq. Do the task completion events to history need to be changed to publish this info too? Can be done in a follow-up jira but should be done and made visible via the UI

Will make sure to handle this in the follow-up jira.

bq. why pick the first attempt instead of the last one?

Switched logic to pick the last one. I had picked the first one only to avoid iterating over the whole list, which the provided API requires in order to reach the last attempt.

bq. any particular reason for the generic exception as compared to a specific one being thrown?

This function throws TezException and IOException. Let me know what the right thing to do is in this particular situation.

bq. No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a good high level end to end verification but we probably need some unit tests at the VertexImpl level to test thresholds, event generation, etc.

Added both positive and negative tests to TestVertexImpl.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606447#comment-15606447 ]

Hitesh Shah commented on TEZ-3271:
----------------------------------

Comments:

{code}
boolean vertexFailuresBelowThreshold =
    vertex.succeededTaskCount + vertex.failedTaskCount == vertex.numTasks
    && vertex.failedTaskCount * 100 <= vertex.maxFailuresPercent * vertex.numTasks;
{code}
  - the above should have parentheses to make the code more understandable at a first glance.

{code}
LOG.info("All tasks have completed and the number of failed tasks is within threshold, vertex:" + vertex.logIdentifier);
{code}
  - would be useful to log count of failed tasks, total tasks, threshold to aid debugging.
  - Do the task completion events to history need to be changed to publish this info too? Can be done in a follow-up jira but should be done and made visible via the UI.

{code}
TezTaskAttemptID attempt = task.getAttempts().keySet().iterator().next();
LOG.info("Succeeding failed task attempt:" + attempt);
{code}
  - why pick the first attempt instead of the last one?

{code}
generateEmptyEventsForAttempt(TezTaskAttemptID attempt) throws Exception
{code}
  - any particular reason for the generic exception as compared to a specific one being thrown?

TEZ_VERTEX_FAILURES_MAXPERCENT needs a minor doc improvement to clarify whether the values are meant to be 0.0-1.0f or 0.0-100.0f.

No test changes for TestVertexImpl? testVertexFailuresMaxPercent() does a good high level end to end verification but we probably need some unit tests at the VertexImpl level to test thresholds, event generation, etc.
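As a rough illustration of the threshold check quoted in the review, here is a standalone sketch with the parentheses the reviewer asked for. The parameter names mirror the fields in the quoted snippet; the class FailureThreshold and the method failuresWithinThreshold are hypothetical, and the sketch assumes the percentage is on a 0-100 scale, which is exactly the point the review says the configuration docs should spell out.

```java
// Hypothetical standalone sketch; not the actual VertexImpl code.
final class FailureThreshold {
    private FailureThreshold() {}

    /**
     * Returns true when every task has finished (succeeded or failed) and the
     * share of failed tasks does not exceed maxFailuresPercent.
     * Assumption: maxFailuresPercent is expressed on a 0-100 scale.
     */
    static boolean failuresWithinThreshold(int succeededTaskCount, int failedTaskCount,
                                           int numTasks, float maxFailuresPercent) {
        boolean allTasksCompleted = (succeededTaskCount + failedTaskCount) == numTasks;
        // Cross-multiplying avoids a division, so numTasks == 0 cannot divide by zero.
        boolean withinThreshold = (failedTaskCount * 100f) <= (maxFailuresPercent * numTasks);
        return allTasksCompleted && withinThreshold;
    }

    public static void main(String[] args) {
        // 1 failure out of 10 tasks with a 10% limit: exactly at the threshold.
        System.out.println(failuresWithinThreshold(9, 1, 10, 10f)); // prints true
        // 2 failures out of 10 tasks with a 10% limit: over the threshold.
        System.out.println(failuresWithinThreshold(8, 2, 10, 10f)); // prints false
        // One task is still outstanding, so the vertex is not finished yet.
        System.out.println(failuresWithinThreshold(8, 1, 10, 10f)); // prints false
    }
}
```

If the setting were instead interpreted on a 0.0-1.0 scale, the same inputs would give different answers, which is why the scale matters for the documentation request above.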
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606236#comment-15606236 ]

TezQA commented on TEZ-3271:
----------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12835164/TEZ-3271.8.patch
  against master revision f735f48.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2060//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2060//console

This message is automatically generated.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605992#comment-15605992 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

Posted a new patch implementing the changes as discussed.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603334#comment-15603334 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

bq. generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an abstract function. Given that CartesianProductEdgeManager needs changing this is an incompatible feature. An appropriate exception thrown could be used to indicate that the EM plugin in use does not support the failure threshold percent feature.

If we strictly limit this feature to known Tez outputs, we can avoid empty event generation in the edge manager plugin at this time and promote it to the Edge.

bq. I think we can add a fail-safe in the edge plugins to generate the events only for known outputs (maybe if they belong to the tez runtime package ? )

I added exception throwing to the Edge to restrict this to org.apache.tez outputs only.

bq. i.e. if someone ends up writing a new output that uses a different payload we would need to throw an error at least with the current impl though we do need to figure out how the EM plugin can invoke an empty event that the Input understands. One option here would be to enhance the DME meta info to indicate empty/null payload or invoke an api on the Output to generate the empty data event.

I think this is aimed at how to implement this completely generically. It should go into a follow-up JIRA if we are using this JIRA to implement a stop-gap until a full-blown implementation can be finished.

bq. As for event generation, I have a doubt with respect to recovery given that we expect all DME events to be generated before a task completes. This might be something to test more carefully on recovery to see if events are generated correctly as needed when a failed vertex is recovered or replayed as needed.

Will see about this.

bq. Unit test could be moved to TestTezJobs. At some point we probably need to get rid of a lot of the TestMRR* minicluster tests.

I am assuming you mean to reimplement the test in a non-MR way, not just to move the code over, so I will approach this comment from that perspective.
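The restriction to org.apache.tez outputs described above could look roughly like the guard below. This is a sketch under assumptions, not the patch's code: the class KnownOutputGuard, the method requireKnownOutput, and the choice of IllegalStateException are all hypothetical, and the check is a plain package-prefix test on the output's class name.

```java
// Hypothetical sketch of a package-prefix guard; names and exception type are assumptions.
final class KnownOutputGuard {
    // Trailing dot so that look-alike packages such as "org.apache.tezlike" do not match.
    private static final String KNOWN_OUTPUT_PREFIX = "org.apache.tez.";

    private KnownOutputGuard() {}

    /** Throws if the output class is not one whose empty-event payload is known. */
    static void requireKnownOutput(String outputClassName) {
        if (outputClassName == null || !outputClassName.startsWith(KNOWN_OUTPUT_PREFIX)) {
            throw new IllegalStateException(
                "Failure threshold is only supported for " + KNOWN_OUTPUT_PREFIX
                    + "* outputs, got: " + outputClassName);
        }
    }

    public static void main(String[] args) {
        // Accepted: the class name lives under the org.apache.tez package tree.
        requireKnownOutput("org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput");
        try {
            // Rejected: a custom output may use a payload the generated event cannot mimic.
            requireKnownOutput("com.example.CustomOutput");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this matches the stop-gap framing in the thread: unknown outputs are refused outright rather than being sent an empty event they might not understand.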
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596820#comment-15596820 ]

Hitesh Shah commented on TEZ-3271:
----------------------------------

Comments:

  - generateEmptyEventsForSourceTask in EdgeManagerPlugin should not be an abstract function. Given that CartesianProductEdgeManager needs changing this is an incompatible feature. An appropriate exception thrown could be used to indicate that the EM plugin in use does not support the failure threshold percent feature.
  - I think we can add a fail-safe in the edge plugins to generate the events only for known outputs (maybe if they belong to the tez runtime package ? ) i.e. if someone ends up writing a new output that uses a different payload we would need to throw an error at least with the current impl though we do need to figure out how the EM plugin can invoke an empty event that the Input understands. One option here would be to enhance the DME meta info to indicate empty/null payload or invoke an api on the Output to generate the empty data event.
  - As for event generation, I have a doubt with respect to recovery given that we expect all DME events to be generated before a task completes. This might be something to test more carefully on recovery to see if events are generated correctly as needed when a failed vertex is recovered or replayed as needed.
  - Unit test could be moved to TestTezJobs. At some point we probably need to get rid of a lot of the TestMRR* minicluster tests.
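The first review point, making generateEmptyEventsForSourceTask non-abstract so that existing plugins keep compiling, can be sketched as below. All types here are hypothetical stand-ins: SketchEdgeManagerPlugin is not the real EdgeManagerPlugin API, events are represented as placeholder strings, and UnsupportedOperationException is just one possible choice for the "appropriate exception".

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an edge manager plugin base class.
abstract class SketchEdgeManagerPlugin {
    abstract int getNumDestinationConsumerTasks(int sourceTaskIndex);

    // Non-abstract on purpose: plugins written before this method existed still
    // compile, and signal at runtime that they do not support the feature.
    List<String> generateEmptyEventsForSourceTask(int sourceTaskIndex) {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not support the failure threshold feature");
    }
}

// An older plugin that knows nothing about the new method.
final class LegacyPlugin extends SketchEdgeManagerPlugin {
    @Override
    int getNumDestinationConsumerTasks(int sourceTaskIndex) {
        return 1;
    }
}

// A plugin that opts in by overriding the new method.
final class ThresholdAwarePlugin extends SketchEdgeManagerPlugin {
    private final int consumers;

    ThresholdAwarePlugin(int consumers) {
        this.consumers = consumers;
    }

    @Override
    int getNumDestinationConsumerTasks(int sourceTaskIndex) {
        return consumers;
    }

    @Override
    List<String> generateEmptyEventsForSourceTask(int sourceTaskIndex) {
        // One placeholder "empty event" per consumer task.
        return Collections.nCopies(consumers, "EMPTY_EVENT");
    }
}

final class EdgePluginDemo {
    public static void main(String[] args) {
        System.out.println(new ThresholdAwarePlugin(2).generateEmptyEventsForSourceTask(0));
        try {
            new LegacyPlugin().generateEmptyEventsForSourceTask(0);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```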
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573280#comment-15573280 ]

Jonathan Eagles commented on TEZ-3271:
--------------------------------------

[~hitesh], I have pushed the event generation into the edge manager. I need some recommendations on which events to generate from the edge manager.
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570232#comment-15570232 ] TezQA commented on TEZ-3271: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12832987/TEZ-3271.7.patch against master revision c9b09cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2036//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2036//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, > TEZ-3271.6.patch, TEZ-3271.7.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. 
In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569988#comment-15569988 ] TezQA commented on TEZ-3271: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12832971/TEZ-3271.6.patch against master revision c9b09cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2034//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2034//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2034//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch, > TEZ-3271.6.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. 
To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478304#comment-15478304 ] TezQA commented on TEZ-3271: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12827818/TEZ-3271.5.patch against master revision c07ec7b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1961//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1961//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478153#comment-15478153 ] Hitesh Shah commented on TEZ-3271: -- [~jeagles] Any thoughts on how to get the event generation to work correctly for all kinds of edges, including custom ones, i.e. could the event generation somehow be done via the vertex manager or edge manager plugins? This unfortunately also ties into how the output and input pairs are written, so that the correct kind of event can be generated for the waiting input. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch, TEZ-3271.5.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314216#comment-15314216 ] Jonathan Eagles commented on TEZ-3271: -- Spoke with Rohini offline and we agree that tez.vertex.failures.maxpercent is a good choice. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313083#comment-15313083 ] Hitesh Shah commented on TEZ-3271: -- I guess max.failures.percent provides more clarity. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312978#comment-15312978 ] Jonathan Eagles commented on TEZ-3271: -- [~hitesh], do you prefer tez.vertex.failures.percent or tez.vertex.max.failures.percent, or something similar, to indicate that there is a threshold that, when crossed, still produces a failure? > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312974#comment-15312974 ] Hitesh Shah commented on TEZ-3271: -- Minor nit on the config name - I guess it should be tez.vertex.failures.percent and not tez.am.*? Would be good to add some documentation for the new config. Will wait for the next round of the patch related to the event generation handling before doing a detailed review. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311448#comment-15311448 ] TezQA commented on TEZ-3271: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12807548/TEZ-3271.4.patch against master revision de8b460. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1768//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1768//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311315#comment-15311315 ] TezQA commented on TEZ-3271: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12807548/TEZ-3271.4.patch against master revision de8b460. {color:red}-1 patch{color}. master compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1767//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311301#comment-15311301 ] Jonathan Eagles commented on TEZ-3271: -- This job will show off the current functionality.
{noformat}
HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$TEZ_CONF_DIR" yarn jar $TEZ_HOME/tez-tests-*.jar mrrsleep -Dmrr.sleepjob.map.fatal.error=true -Dmrr.sleepjob.map.error.task.ids=0 -Dtez.am.task.max.failed.attempts=1 -Dtez.am.failures.percent=0.25f -m 4 -ir 1 -r 1
{noformat}
{noformat}
16/06/01 22:51:40 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
16/06/01 22:51:40 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/06/01 22:51:45 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
16/06/01 22:51:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 16.67% TotalTasks: 6 Succeeded: 1 Running: 3 Failed: 0 Killed: 0
16/06/01 22:51:47 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 50% TotalTasks: 6 Succeeded: 3 Running: 1 Failed: 1 Killed: 0 FailedTaskAttempts: 1
16/06/01 22:51:48 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 83.33% TotalTasks: 6 Succeeded: 5 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 1
16/06/01 22:51:48 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 83.33% TotalTasks: 6 Succeeded: 5 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 1
16/06/01 22:51:48 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED
{noformat}
Attached a screenshot of what the UI looks like. I would prefer to have the diagnostic message regarding success more prominent and less "red".
> Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: Succeeded with Failures.png, TEZ-3271.1.patch, > TEZ-3271.2.patch, TEZ-3271.3.patch, TEZ-3271.4.patch > > > There is a certain category of work that need not have 100% of tasks succeed > to cause the work to be considered a success. To meet that end, I propose we > provide a tez equivalent of mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. In this way a vertex will be considered > a success if the number of failures is below a configured threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
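The run above can be read as a threshold check on a single vertex: the map vertex has 4 tasks (`-m 4`), one of them fails, and the job is started with `-Dtez.am.failures.percent=0.25f`. The following is only a minimal sketch of that check, not code from any of the attached patches. It assumes the configured value is a fraction between 0 and 1 and that the comparison is inclusive, which is what the SUCCEEDED result with 1 of 4 tasks failed suggests; the class and method names are hypothetical.

```java
public class FailureThreshold {
    /**
     * Hypothetical helper, not part of Tez. Returns true when the fraction of
     * failed tasks in a vertex does not exceed the allowed fraction.
     * Assumption: the limit is a fraction in [0, 1] and the check is inclusive.
     */
    public static boolean withinThreshold(int totalTasks, int failedTasks, float maxFailedFraction) {
        if (totalTasks <= 0) {
            // A vertex with no tasks can only be "within threshold" if nothing failed.
            return failedTasks == 0;
        }
        float failedFraction = (float) failedTasks / totalTasks;
        return failedFraction <= maxFailedFraction;
    }

    public static void main(String[] args) {
        // Map vertex from the demo: 4 tasks, 1 failed, limit 0.25.
        System.out.println(withinThreshold(4, 1, 0.25f)); // prints "true"
        // One more failure pushes the vertex over the limit.
        System.out.println(withinThreshold(4, 2, 0.25f)); // prints "false"
    }
}
```

With a limit of 0, any failed task fails the vertex under this sketch, which corresponds to requiring 100% of tasks to succeed.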
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1536#comment-1536 ] Bikas Saha commented on TEZ-3271: - It would help to have a bit more detail on what the objective is here. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch > > > mapreduce.map.failures.maxpercent > mapreduce.reduce.failures.maxpercent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310850#comment-15310850 ] Hitesh Shah commented on TEZ-3271: --
- The event generation for the failed tasks is the main issue, and maybe something that the edge manager or the vertex manager (VM) could do? I don't think VertexImpl is the right place for this, as DataMovementEvent payloads are input/output specific. It might be better to split the 2 issues into different jiras: one for the event generation and the I/O changes needed to handle a new "no more data" event, and another for the failure threshold handling.
- Other general comments:
- This config should be a vertex-level config and not an AM-specific one, and hence should be named and scoped accordingly?
- The code that decides whether or not to commit should probably be put in a common place?
- Any diagnostics updates to indicate that the vertex succeeded because the failure threshold was not met?
- Any recovery impact? Given that the same transitions are used for recovery, I don't think there should be any impact, but it might be worth checking.
> Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch > > > mapreduce.map.failures.maxpercent > mapreduce.reduce.failures.maxpercent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308982#comment-15308982 ] Jonathan Eagles commented on TEZ-3271: -- [~hitesh], [~bikassaha], [~jlowe], I have pieced together a prototype to serve as a starting point for discussion. Ideas on how to generalize this approach would be appreciated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch > > > mapreduce.map.failures.maxpercent > mapreduce.reduce.failures.maxpercent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308881#comment-15308881 ] TezQA commented on TEZ-3271: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12807273/TEZ-3271.3.patch against master revision 18da493. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1762//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1762//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch, TEZ-3271.3.patch > > > mapreduce.map.failures.maxpercent > mapreduce.reduce.failures.maxpercent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3271) Provide mapreduce failures.maxpercent equivalent
[ https://issues.apache.org/jira/browse/TEZ-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302945#comment-15302945 ] TezQA commented on TEZ-3271: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12806460/TEZ-3271.2.patch against master revision 89802b1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.dag.impl.TestVertexImpl org.apache.tez.test.TestTaskErrorsUsingLocalMode org.apache.tez.test.TestFaultTolerance org.apache.tez.test.TestExceptionPropagation org.apache.tez.test.TestLocalMode org.apache.tez.history.TestHistoryParser Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1755//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1755//console This message is automatically generated. > Provide mapreduce failures.maxpercent equivalent > > > Key: TEZ-3271 > URL: https://issues.apache.org/jira/browse/TEZ-3271 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-3271.1.patch, TEZ-3271.2.patch > > > mapreduce.map.failures.maxpercent > mapreduce.reduce.failures.maxpercent -- This message was sent by Atlassian JIRA (v6.3.4#6332)