[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable
[ https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521141#comment-14521141 ] Siddharth Seth commented on TEZ-1752:

Spoke to Rajesh offline about committing the patch into TEZ-2003 instead of master for now, and hardening it there. Looking at the patch, especially the TaskRunner and LogicalIORuntimeTask changes, I think it can be split into two: changes to LogicalIOProcessorRuntimeTask, RuntimeTask, TaskReporter and TezTaskRunner go into the branch, and the rest into master. Rajesh, thoughts on splitting it?

Comments (TaskReporter, LogicalIORuntimeTask etc. changes), mostly minor:
- processorClosed=true, initializedInputs.remove(), initializedOutputs.remove() should be before the actual invocation of close. Otherwise, if there's an error, we may try to invoke close twice.
- {code} LOG.info("Cleanup is complete. EventRouterThread is yet to be interrupted"); {code} Nit: This may be misleading since the eventRouter may have been interrupted during the close() finally block.
- {code}
+} catch (Throwable t) {
+ LOG.warn("Error in final cleanup of task. ", t);
{code}
  This isn't necessary. Exceptions are already being handled for individual close calls.
- public enum State - Can be an isRunning method like the existing hasInitialized method, instead of making the enum public.
- In TaskRunnerCallable, is there a need to check for InterruptedExceptions in the catch Throwable clause?
- Nit: task will never be null in TezTaskRunner.
- An exception from abortTask should probably be reported as a Failure. Can be fixed later in the branch.
- Before taskFuture = executor.submit(callable); checking the interrupt status may be useful. Otherwise the task would start, and the get is immediately interrupted.
- maybeInterruptWaitingThread - Don't think we should be interrupting the main thread at this point. That can have several consequences. If task.run() returned without an Exception, but due to an abort/interrupt invocation, and the close succeeds after this, we'll try reporting the task as successful. If there's an error from close, there's a possibility that the main thread would have deregistered the task from the Reporter, and the TaskRunnerCallable thread would fail while indicating an error. Throwing an InterruptedException here and gracefully falling off the TaskRunnerCallable thread should take care of this. After this, the main run() thread regains control and can shut down.
- For a future jira: Inputs / Outputs could check for interrupts during shutdown to prevent costly spill attempts. The rest of the patch probably addresses this though.

For the rest, which can go into master:
- Remove Exception from the abort() method signature?
- MergeManager.close - should this be throwing an InterruptedException in case of an interrupt during merge? Shuffle is changed to handle InterruptedException cleanly.
- Some logs to be removed.
- In PipelinedSorter - isThreadInterrupted - should this cause the loop to exit during the partition iteration?
- PipelinedSorter - there are a lot of changes. I think most of them are related to moving the code into a try/catch though?
- For the cleanup - can we hide this behind a configuration, which defaults to false for now? We don't support preemption of tasks without containers on master yet, so the container will get killed and cleaned up.

Inputs / Outputs in the Runtime library should be interruptable --- Key: TEZ-1752 URL: https://issues.apache.org/jira/browse/TEZ-1752 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, TEZ-1752.4.patch, TEZ-1752.5.patch
Not possible to preempt tasks without killing containers without this. There's still the problem of Processors not supporting interrupts. We may need API enhancements to either query IPOs on whether they are interruptible.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
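The first review comment (set processorClosed before the actual close call) can be sketched roughly as below. This is only an illustration under assumptions: CloseOnceSketch, FailingProcessor, closeProcessor and cleanup are hypothetical names, not the Tez classes touched by the patch. Only the processorClosed flag name comes from the comment itself.

```java
public class CloseOnceSketch {

    /** Hypothetical stand-in for a processor: counts close() calls and always fails. */
    static class FailingProcessor {
        int closeCalls = 0;

        void close() throws Exception {
            closeCalls++;
            throw new Exception("simulated failure in close()");
        }
    }

    private final FailingProcessor processor;
    private boolean processorClosed = false;

    CloseOnceSketch(FailingProcessor processor) {
        this.processor = processor;
    }

    /** Marks the processor closed BEFORE invoking close(), so a failure cannot lead to a retry. */
    void closeProcessor() throws Exception {
        if (processorClosed) {
            return;
        }
        processorClosed = true;
        processor.close();
    }

    /** Final cleanup path: does nothing for the processor if close was already attempted. */
    void cleanup() {
        try {
            closeProcessor();
        } catch (Exception e) {
            // A failure of an individual close call is swallowed here.
        }
    }

    /** Runs a failing close followed by cleanup and returns how often close() was invoked. */
    static int demo() {
        FailingProcessor p = new FailingProcessor();
        CloseOnceSketch task = new CloseOnceSketch(p);
        try {
            task.closeProcessor();
        } catch (Exception expected) {
            // The normal close path failed; cleanup still runs afterwards.
        }
        task.cleanup();
        return p.closeCalls;
    }

    public static void main(String[] args) {
        System.out.println("close() invoked " + demo() + " time(s)");
    }
}
```

If the flag were set only after processor.close() returned, the exception would leave it false and cleanup() would call close() a second time, which is the double invocation the comment warns about.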
[jira] [Commented] (TEZ-2390) tez-tools swimlane tool fails to parse large jobs 8K containers
[ https://issues.apache.org/jira/browse/TEZ-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521136#comment-14521136 ] Gopal V commented on TEZ-2390: -- [~jeagles]: the patch looks good, except for the debug print statements.
{code}
 containers = [Container(ev) for ev in self.events if ev.event == CONTAINER_LAUNCHED]
+ for container in containers:
+     print(container)
...
 if(l.find("[HISTORY]") != -1):
     m = self.MAIN_RE.match(l)
+    print(m);
{code}
The AM regexes are likely to be very fragile going forward as people switch logging on and off (my sub-second LLAP demos need AM logging to be turned off). The reason I was forced to parse AM logs was that ATS kept losing data. Now that Tez AMs hang around until ATS acks all writes, and we've got TEZ-2076 (which should work for 0.6.x as well, if backported), I'm actually contemplating throwing this whole thing away. It would be great if you can help review that, so that we can move swimlane into a first-class analysis tool. tez-tools swimlane tool fails to parse large jobs 8K containers Key: TEZ-2390 URL: https://issues.apache.org/jira/browse/TEZ-2390 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: TEZ-2390.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1988) Tez UI does not work when using file:// in a browser
[ https://issues.apache.org/jira/browse/TEZ-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-1988: -- Attachment: TEZ-1988.1.patch trivial patch [~Sreenath] please review. this will still require a build (to process the less files etc). but should be able to run from file:// once you have a build Tez UI does not work when using file:// in a browser - Key: TEZ-1988 URL: https://issues.apache.org/jira/browse/TEZ-1988 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.6.0 Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-1988.1.patch Docs mention that it defaults to using http://localhost for RM and Timeline server but it does not seem to be doing so. It uses file:///:8188 instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2389) Tez UI: Sort by attempt-no is incorrect in attempts pages.
[ https://issues.apache.org/jira/browse/TEZ-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521256#comment-14521256 ] Prakash Ramachandran commented on TEZ-2389: --- +1 LGTM. Committing shortly. Unrelated to this jira: if sorting is done on numbers, getSortValue has to return numeric values rather than their string representation. Tez UI: Sort by attempt-no is incorrect in attempts pages. -- Key: TEZ-2389 URL: https://issues.apache.org/jira/browse/TEZ-2389 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Attachments: TEZ-2389.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
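The remark about getSortValue can be illustrated with a small sketch. It is written in Java rather than in the UI's own code, and AttemptSortSketch, sortAsStrings and sortAsNumbers are hypothetical names chosen for the example. It only shows why attempt numbers sorted by their string form end up in the wrong order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AttemptSortSketch {

    /** Sorts attempt numbers by their string representation (lexicographic order). */
    static List<String> sortAsStrings(List<String> attemptNos) {
        List<String> copy = new ArrayList<>(attemptNos);
        copy.sort((a, b) -> a.compareTo(b));
        return copy;
    }

    /** Sorts attempt numbers by their numeric value. */
    static List<String> sortAsNumbers(List<String> attemptNos) {
        List<String> copy = new ArrayList<>(attemptNos);
        copy.sort((a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> attempts = Arrays.asList("2", "10", "0", "11", "1");
        System.out.println(sortAsStrings(attempts)); // [0, 1, 10, 11, 2]
        System.out.println(sortAsNumbers(attempts)); // [0, 1, 2, 10, 11]
    }
}
```

With string comparison, "10" and "11" sort ahead of "2" because the comparison goes character by character. Returning a number from the sort-value accessor avoids that.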
[jira] [Updated] (TEZ-2389) Tez UI: Sort by attempt-no is incorrect in attempts pages.
[ https://issues.apache.org/jira/browse/TEZ-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-2389: Description: Reported by [~tassapola] Tez UI: Sort by attempt-no is incorrect in attempts pages. -- Key: TEZ-2389 URL: https://issues.apache.org/jira/browse/TEZ-2389 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Fix For: 0.7.0 Attachments: TEZ-2389.1.patch Reported by [~tassapola] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1988) Tez UI does not work when using file:// in a browser
[ https://issues.apache.org/jira/browse/TEZ-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521417#comment-14521417 ] TezQA commented on TEZ-1988: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729455/TEZ-1988.1.patch against master revision 4ae87f0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/588//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/588//console This message is automatically generated. Tez UI does not work when using file:// in a browser - Key: TEZ-1988 URL: https://issues.apache.org/jira/browse/TEZ-1988 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.6.0 Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-1988.1.patch Docs mention that it defaults to using http://localhost for RM and Timeline server but it does not seem to be doing so. It uses file:///:8188 instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-1988 PreCommit Build #588
Jira: https://issues.apache.org/jira/browse/TEZ-1988 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/588/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2776 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729455/TEZ-1988.1.patch against master revision 4ae87f0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/588//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/588//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 6d3f9b299e6d456a34300277d2727484413746b5 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #587 Archived 62 artifacts Archive block size is 32768 Received 28 blocks and 175558138 bytes Compression is 0.5% Took 35 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Assigned] (TEZ-1988) Tez UI does not work when using file:// in a browser
[ https://issues.apache.org/jira/browse/TEZ-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran reassigned TEZ-1988: - Assignee: Prakash Ramachandran Tez UI does not work when using file:// in a browser - Key: TEZ-1988 URL: https://issues.apache.org/jira/browse/TEZ-1988 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.6.0 Reporter: Hitesh Shah Assignee: Prakash Ramachandran Docs mention that it defaults to using http://localhost for RM and Timeline server but it does not seem to be doing so. It uses file:///:8188 instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-1795) Tez UI - Include DAG name in vertex, task, counters pages
[ https://issues.apache.org/jira/browse/TEZ-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran resolved TEZ-1795. --- Resolution: Invalid fixed as part of TEZ-2158 Tez UI - Include DAG name in vertex, task, counters pages - Key: TEZ-1795 URL: https://issues.apache.org/jira/browse/TEZ-1795 Project: Apache Tez Issue Type: Improvement Components: UI Reporter: Rajesh Balamohan Assignee: Prakash Ramachandran It would be useful to include DAG names in Vertex, Task, Counters pages. This will be helpful when comparing different DAG details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522002#comment-14522002 ] Bikas Saha commented on TEZ-2360: -
On separate lines please to follow existing code convention
{code}+TezCounter modifiedCounter, originalCounter;{code}
Sorry I missed the sum check in the previous review. A similar non-zero check for output edges would be good.
{code}+assertTrue("At least one of the counter should be non-zero. invalid test ", nonZeroCounters > 0);{code}
Unnecessary diffs?
{code}
- job.run(tezConf, new String[] { StringUtils.join(",", inputPaths),
- StringUtils.join(",", outputPaths), "2" }, null) == 0);
+ job.run(tezConf, new String[]{StringUtils.join(",", inputPaths),
+ StringUtils.join(",", outputPaths), "2"}, null) == 0);
 for (int i=0; i<numIterations; ++i) {
 verifyOutput(outputDirs[i], remoteFs);
@@ -732,8 +837,8 @@ public class TestTezJobs {
 remoteFs.mkdirs(inputDir);
 String outputDirStr = "/tmp/owc-output";
 outputPaths[0] = outputDirStr;
- job.run(tezConf, new String[] { StringUtils.join(",", inputPaths),
- StringUtils.join(",", outputPaths), "2" }, null);
+ job.run(tezConf, new String[]{StringUtils.join(",", inputPaths),
+ StringUtils.join(",", outputPaths), "2"}, null);
{code}
The above comments are minor. Please feel free to commit after fixing them. Thanks! per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2360: -- Attachment: TEZ-2360.4.patch addressed comments from [~bikassaha] per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2188) Verify vertex with bipartite source vertex and root input in client side
[ https://issues.apache.org/jira/browse/TEZ-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521863#comment-14521863 ] Bikas Saha commented on TEZ-2188: - I am sorry. I am not clear what the issue is that is being fixed? Verify vertex with bipartite source vertex and root input in client side Key: TEZ-2188 URL: https://issues.apache.org/jira/browse/TEZ-2188 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2188-1.patch, TEZ-2188-2.patch
For a vertex that has both a bipartite source vertex and a root input, there's no clear diagnostic message on the client side; the verification should be done on the client side.
{code}
16:31:57,333 - Thread( main) - (DAGClientImpl.java:541) - Waiting for DAG to start running
16:32:00,455 - Thread( main) - (DAGClientImpl.java:541) - DAG initialized: CurrentState=Running
16:32:00,977 - Thread( main) - (DAGClientImpl.java:541) - DAG: State: FAILED Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 2
16:32:00,978 - Thread( main) - (DAGClientImpl.java:541) - DAG completed. FinalState=FAILED
16:32:00,978 - Thread( main) - (TezExampleBase.java:137) - DAG diagnostics: [Vertex failed, vertexName=v2, vertexId=vertex_142588246_0025_1_01, diagnostics=[Vertex vertex_142588246_0025_1_01 [v2] killed/failed due to:null], Vertex killed, vertexName=v1, vertexId=vertex_142588246_0025_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_142588246_0025_1_00 [v1] killed/failed due to:null], DAG failed due to vertex failure. failedVertices:1 killedVertices:1]
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran resolved TEZ-2360. --- Resolution: Fixed thanks [~bikassaha] committed to master. per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2395 PreCommit Build #590
Jira: https://issues.apache.org/jira/browse/TEZ-2395 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/590/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2774 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729573/TEZ-2395.1.patch against master revision e36f962. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/590//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/590//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. f3f5c0a4827f1ed226e6ec60f654b37831826c8a logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #589 Archived 44 artifacts Archive block size is 32768 Received 29 blocks and 1802089 bytes Compression is 34.5% Took 1.2 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2390) tez-tools swimlane tool fails to parse large jobs 8K containers
[ https://issues.apache.org/jira/browse/TEZ-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522283#comment-14522283 ] Jonathan Eagles commented on TEZ-2390: -- Thanks, [~gopalv]. I'll take a look at the TEZ-2076 to continue this type of analysis in the future. tez-tools swimlane tool fails to parse large jobs 8K containers Key: TEZ-2390 URL: https://issues.apache.org/jira/browse/TEZ-2390 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: TEZ-2390.1.patch, TEZ-2390.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2395) Tez UI: Minimum/Maximum Duration show a empty bracket next to 0 secs when you purposefully failed a job.
[ https://issues.apache.org/jira/browse/TEZ-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2395: -- Attachment: TEZ-2395.1.patch [~Sreenath] review please. Tez UI: Minimum/Maximum Duration show a empty bracket next to 0 secs when you purposefully failed a job. Key: TEZ-2395 URL: https://issues.apache.org/jira/browse/TEZ-2395 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran Attachments: TEZ-2395.1.patch I set hive.tez.java.opts=-Xmx1m in order to fail a query. Vertex Details shows an empty bracket as shown in the attached screenshot: Minimum Duration 0 secs [ ] Maximum Duration 0 secs [ ] It would look better if the empty bracket is not displayed when there are no task attempts. Reported by [~taksaito] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2390) tez-tools swimlane tool fails to parse large jobs 8K containers
[ https://issues.apache.org/jira/browse/TEZ-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522231#comment-14522231 ] TezQA commented on TEZ-2390: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729590/TEZ-2390.2.patch against master revision 765afd2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/591//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/591//console This message is automatically generated. tez-tools swimlane tool fails to parse large jobs 8K containers Key: TEZ-2390 URL: https://issues.apache.org/jira/browse/TEZ-2390 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: TEZ-2390.1.patch, TEZ-2390.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2395) Tez UI: Minimum/Maximum Duration show a empty bracket next to 0 secs when you purposefully failed a job.
Prakash Ramachandran created TEZ-2395: - Summary: Tez UI: Minimum/Maximum Duration show a empty bracket next to 0 secs when you purposefully failed a job. Key: TEZ-2395 URL: https://issues.apache.org/jira/browse/TEZ-2395 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran I set hive.tez.java.opts=-Xmx1m in order to fail a query. Vertex Details shows an empty bracket as shown in the attached screenshot: Minimum Duration 0 secs [ ] Maximum Duration 0 secs [ ] It would look better if the empty bracket is not displayed when there are no task attempts. Reported by [~taksaito] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2390) tez-tools swimlane tool fails to parse large jobs 8K containers
[ https://issues.apache.org/jira/browse/TEZ-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522123#comment-14522123 ] Gopal V commented on TEZ-2390: -- LGTM - +1. tez-tools swimlane tool fails to parse large jobs 8K containers Key: TEZ-2390 URL: https://issues.apache.org/jira/browse/TEZ-2390 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: TEZ-2390.1.patch, TEZ-2390.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522122#comment-14522122 ] Bikas Saha commented on TEZ-2360: - If the above changes were made before commit, then could you please attach the final commit patch to the jira. per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522168#comment-14522168 ] Rohini Palaniswamy commented on TEZ-2231: - [~hitesh], Did a vimdiff between by-laws.patch and by-laws.3.patch and confirmed that the new changes made are good. +1. The by-laws.3.patch contains a lot of code changes that you were working on apart from the by-laws. Please upload the final patch, containing only the by-laws changes, before checking in, for future reference. Create project by-laws -- Key: TEZ-2231 URL: https://issues.apache.org/jira/browse/TEZ-2231 Project: Apache Tez Issue Type: Task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: by-laws.2.patch, by-laws.3.patch, by-laws.patch Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2231: - Attachment: TEZ-2231.4.patch Thanks [~rohini]. Probably missed a rebase before doing a diff. Updated patch against latest master. Create project by-laws -- Key: TEZ-2231 URL: https://issues.apache.org/jira/browse/TEZ-2231 Project: Apache Tez Issue Type: Task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: TEZ-2231.4.patch, by-laws.2.patch, by-laws.3.patch, by-laws.patch Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2394) Issues when there is an error in VertexManager callbacks
Bikas Saha created TEZ-2394: --- Summary: Issues when there is an error in VertexManager callbacks Key: TEZ-2394 URL: https://issues.apache.org/jira/browse/TEZ-2394 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-2360 PreCommit Build #589
Jira: https://issues.apache.org/jira/browse/TEZ-2360 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/589/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2774 lines...] [INFO] Final Memory: 70M/927M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729554/TEZ-2360.4.patch against master revision e36f962. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/589//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/589//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 1fbde29d938d8c751173a5c10ef107d47932bbe7 logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #587 Archived 44 artifacts Archive block size is 32768 Received 8 blocks and 2487831 bytes Compression is 9.5% Took 0.82 sec Description set: TEZ-2360 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522072#comment-14522072 ] TezQA commented on TEZ-2360: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729554/TEZ-2360.4.patch against master revision e36f962. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/589//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/589//console This message is automatically generated. per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2390) tez-tools swimlane tool fails to parse large jobs 8K containers
[ https://issues.apache.org/jira/browse/TEZ-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-2390: - Attachment: TEZ-2390.2.patch tez-tools swimlane tool fails to parse large jobs 8K containers Key: TEZ-2390 URL: https://issues.apache.org/jira/browse/TEZ-2390 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: TEZ-2390.1.patch, TEZ-2390.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2395) Tez UI: Minimum/Maximum Duration show a empty bracket next to 0 secs when you purposefully failed a job.
[ https://issues.apache.org/jira/browse/TEZ-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522137#comment-14522137 ] TezQA commented on TEZ-2395: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729573/TEZ-2395.1.patch against master revision e36f962. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/590//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/590//console This message is automatically generated. Tez UI: Minimum/Maximum Duration show a empty bracket next to 0 secs when you purposefully failed a job. Key: TEZ-2395 URL: https://issues.apache.org/jira/browse/TEZ-2395 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran Attachments: TEZ-2395.1.patch I set hive.tez.java.opts=-Xmx1m in order to fail a query. Vertex Details shows an empty bracket as shown in the attached screenshot: Minimum Duration 0 secs [ ] Maximum Duration 0 secs [ ] It would look better if the empty bracket were not displayed when there are no task attempts.
reported by [~taksaito] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2390) tez-tools swimlane tool fails to parse large jobs 8K containers
[ https://issues.apache.org/jira/browse/TEZ-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-2390: - Fix Version/s: 0.6.1 tez-tools swimlane tool fails to parse large jobs 8K containers Key: TEZ-2390 URL: https://issues.apache.org/jira/browse/TEZ-2390 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Fix For: 0.6.1 Attachments: TEZ-2390.1.patch, TEZ-2390.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: (was: TEZ-776.8.patch) Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2394 PreCommit Build #594
Jira: https://issues.apache.org/jira/browse/TEZ-2394 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/594/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2774 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729695/TEZ-2394.1.patch against master revision a02a5ea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/594//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/594//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 746a785f857bb3493f7dcf7f043ff512626a0931 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #592 Archived 44 artifacts Archive block size is 32768 Received 6 blocks and 2557755 bytes Compression is 7.1% Took 2 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2397) Translation of LocalResources via Tez plan serialization can be lossy
[ https://issues.apache.org/jira/browse/TEZ-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522755#comment-14522755 ] Jonathan Eagles commented on TEZ-2397: -- I'll make sure this is in 0.6.1 before the rc. Translation of LocalResources via Tez plan serialization can be lossy - Key: TEZ-2397 URL: https://issues.apache.org/jira/browse/TEZ-2397 Project: Apache Tez Issue Type: Bug Affects Versions: 0.4.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Attachments: TEZ-2397.1.txt Happens when there's no port information. The way we serialize a YarnURL into a string causes the reconstructed path to include the port as -1, which is an invalid URL. Path/URL reconstruction from this causes the hostname to be lost. This is problematic on clusters running HDFS HA, since there's no host:port information, only a service name. I'd imagine it'll be a problem for viewfs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
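The failure mode described in TEZ-2397 can be illustrated with a small sketch. This is not the Tez or YARN code: the class and method names below (`UrlPortSketch`, `naive`, `guarded`) are hypothetical, and the only assumption taken from the issue is that an unset port is represented as -1. The sketch contrasts a serializer that always appends the port with one that omits it when unset.

```java
public class UrlPortSketch {

    /** Naive serialization: always appends the port, even the "unset" value -1. */
    public static String naive(String scheme, String host, int port, String path) {
        return scheme + "://" + host + ":" + port + path;
    }

    /** Guarded serialization: omits the port when it is not set (negative). */
    public static String guarded(String scheme, String host, int port, String path) {
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port >= 0) {
            sb.append(':').append(port);
        }
        return sb.append(path).toString();
    }

    public static void main(String[] args) {
        // "ns1" stands in for an HA service name that has no port (hypothetical value).
        System.out.println(naive("hdfs", "ns1", -1, "/apps/tez.tar.gz"));
        System.out.println(guarded("hdfs", "ns1", -1, "/apps/tez.tar.gz"));
    }
}
```

With a service name and no port, the naive form yields `hdfs://ns1:-1/apps/tez.tar.gz`, which is the kind of string the issue reports as invalid, while the guarded form yields `hdfs://ns1/apps/tez.tar.gz`.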
[jira] [Commented] (TEZ-2389) Tez UI: Sort by attempt-no is incorrect in attempts pages.
[ https://issues.apache.org/jira/browse/TEZ-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522797#comment-14522797 ] Sreenath Somarajapuram commented on TEZ-2389: - Thanks [~pramachandran], will keep the suggestion in mind. Tez UI: Sort by attempt-no is incorrect in attempts pages. -- Key: TEZ-2389 URL: https://issues.apache.org/jira/browse/TEZ-2389 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Fix For: 0.7.0 Attachments: TEZ-2389.1.patch Reported by [~tassapola] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2394) Issues when there is an error in VertexManager callbacks
[ https://issues.apache.org/jira/browse/TEZ-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522799#comment-14522799 ] TezQA commented on TEZ-2394: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729695/TEZ-2394.1.patch against master revision a02a5ea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/594//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/594//console This message is automatically generated. Issues when there is an error in VertexManager callbacks Key: TEZ-2394 URL: https://issues.apache.org/jira/browse/TEZ-2394 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Priority: Critical Attachments: TEZ-2394.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2397) Translation of LocalResources via Tez plan serialization can be lossy
[ https://issues.apache.org/jira/browse/TEZ-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522746#comment-14522746 ] Bikas Saha commented on TEZ-2397: - Good catch! +1 for the patch. 0.5.4 should wait for this. /cc [~hitesh] 0.6.1 should wait for this /cc [~jeagles] Translation of LocalResources via Tez plan serialization can be lossy - Key: TEZ-2397 URL: https://issues.apache.org/jira/browse/TEZ-2397 Project: Apache Tez Issue Type: Bug Affects Versions: 0.4.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Attachments: TEZ-2397.1.txt Happens when there's no port information. The way we serialize a YarnURL into a string causes the reconstructed path to include the port as -1, which is an invalid URL. Path/URL reconstruction from this causes the hostname to be lost. This is problematic on clusters running HDFS HA, since there's no host:port information, only a service name. I'd imagine it'll be a problem for viewfs as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2386) Tez UI: Inconsistent usage of icon colors
[ https://issues.apache.org/jira/browse/TEZ-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522747#comment-14522747 ] Sreenath Somarajapuram commented on TEZ-2386: - Please make hasFailedTasks a computed property. Other than that, things look good. Tez UI: Inconsistent usage of icon colors - Key: TEZ-2386 URL: https://issues.apache.org/jira/browse/TEZ-2386 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran Attachments: TEZ-2386.1.patch, TEZ-2386.2.patch, TEZ-2386.wip.1.patch If there are failed attempts in a DAG and it succeeds, an orange icon shows up on the DAG page. This is very useful to identify DAGs which may need some debugging. However, the color is green for the Vertex / Task views after this, so it's difficult to know which one actually had problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.8.patch Attaching patch with review comments addressed. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2396) pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez
Jonathan Eagles created TEZ-2396: Summary: pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez Key: TEZ-2396 URL: https://issues.apache.org/jira/browse/TEZ-2396 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2198) Fix sorter spill counts
[ https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2198: -- Attachment: TEZ-2198.3.patch
Thanks [~sseth]. Addressing review comments.
- Renamed TOTAL_SPILL_COUNT to SHUFFLE_CHUNK
- Setting counters as and when a spill happens so that they can be viewed in the UI.
Fix sorter spill counts --- Key: TEZ-2198 URL: https://issues.apache.org/jira/browse/TEZ-2198 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2198.1.patch, TEZ-2198.2.patch, TEZ-2198.3.patch, no_additional_spills_eg_pipelined_shuffle.png, with_additional_spills.png
Prior to pipelined shuffle, tez merged all spilled data into a single file. This ended up creating one index file and one output file. In this context, TaskCounter.ADDITIONAL_SPILL_COUNT was referred to as the number of additional spills and there was no counter needed to track the number of merges. With pipelined shuffle, there is no final merge and ADDITIONAL_SPILL_COUNT would be misleading, as these spills are direct output files which are consumed by the consumers. It would be good to have the following:
- ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task to generate the final merged output
- TOTAL_SPILLS: represents the total number of shuffle directories (index + output files) that got created at the end of processing.
For example, assume the sorter generated 5 spills in an attempt.
Without pipelining:
ADDITIONAL_SPILL_COUNT = 5 -- Additional spills involved in sorting
TOTAL_SPILLS = 1 -- Final merged output
With pipelining:
ADDITIONAL_SPILL_COUNT = 0 -- Additional spills involved in sorting
TOTAL_SPILLS = 5 -- All spills are final output
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
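The counter semantics proposed in the TEZ-2198 description can be written down as a small sketch. It only encodes the worked example from the issue (5 spills, with and without pipelining); the class `SpillCounterSketch`, the holder `SpillCounters`, and the method `compute` are hypothetical names, not part of the Tez sorter, and the counter names follow the issue description rather than the later rename mentioned in the comment.

```java
public class SpillCounterSketch {

    /** Hypothetical holder for the two counters discussed in the issue. */
    public static final class SpillCounters {
        public final int additionalSpillCount; // ADDITIONAL_SPILL_COUNT
        public final int totalSpills;          // TOTAL_SPILLS

        public SpillCounters(int additionalSpillCount, int totalSpills) {
            this.additionalSpillCount = additionalSpillCount;
            this.totalSpills = totalSpills;
        }
    }

    /**
     * Maps the number of spills produced by a sorter to the two counters,
     * following the example in the issue description.
     */
    public static SpillCounters compute(int spills, boolean pipelined) {
        if (pipelined) {
            // No final merge: every spill is itself a final output.
            return new SpillCounters(0, spills);
        }
        // All spills are merged into a single final output.
        return new SpillCounters(spills, 1);
    }

    public static void main(String[] args) {
        SpillCounters merged = compute(5, false);
        SpillCounters direct = compute(5, true);
        System.out.println(merged.additionalSpillCount + " " + merged.totalSpills);
        System.out.println(direct.additionalSpillCount + " " + direct.totalSpills);
    }
}
```

Whether the non-pipelined case should count the final spill as "additional" is a detail the issue does not settle; the sketch simply mirrors the numbers given there.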
[jira] [Resolved] (TEZ-1564) State machine error: Invalid event: T_SCHEDULE at SCHEDULED
[ https://issues.apache.org/jira/browse/TEZ-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved TEZ-1564. --- Resolution: Cannot Reproduce
Cannot reproduce. Will close this for the time being and reopen it if needed.
State machine error: Invalid event: T_SCHEDULE at SCHEDULED --- Key: TEZ-1564 URL: https://issues.apache.org/jira/browse/TEZ-1564 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Critical Attachments: applogs.txt.tar.gz, dag.dot
ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskImpl: Can't handle this event at current state for task_1409722953518_0162_1_07_00
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_SCHEDULE at SCHEDULED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:827)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:95)
    at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1604)
    at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1590)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:724)
I will attach the dag + app logs soon.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2396) pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez
[ https://issues.apache.org/jira/browse/TEZ-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2396: -- Attachment: TEZ-2396-branch-0.6.patch TEZ-2396.1.patch pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez - Key: TEZ-2396 URL: https://issues.apache.org/jira/browse/TEZ-2396 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Rajesh Balamohan Attachments: TEZ-2396-branch-0.6.patch, TEZ-2396.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2398) Flaky test: TestFaultTolerance
Rajesh Balamohan created TEZ-2398: - Summary: Flaky test: TestFaultTolerance Key: TEZ-2398 URL: https://issues.apache.org/jira/browse/TEZ-2398 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522738#comment-14522738 ] Bikas Saha commented on TEZ-776:
bq. Can the fields in DataMovementEvent be made final after the new create methods ?
The non-final ints are being set in multiple places. So this will have to wait.
bq. InputFailedEvent.makeCopy javadoc incomplete.
Fixed. Ignoring comments for the obsolete version of the API.
bq. Unrelated to this jira, but a minor enhancement to LOG the type of edge during setup
Added.
bq. This is not necessarily sufficient to determine whether ODR should be enabled for the edge
Partially implementing the API will not work since this enables on demand routing for all types of events. Like I said, the point of this check is not to identify ODR plugins but to identify legacy plugins. This is only there to allow older versions of Hive to run with newer versions of Tez transparently. This check is practically sufficient.
bq. Does this change the caching of events which made Broadcast and OneToOne efficient earlier
No. This is a bug in the current code which is being rectified. We should not change the original events received by the AM. If the same event is routed to different tasks with different indices then all of them will end up seeing the last index. It was a bug waiting to be hit by the first edge plugin that would do so.
bq. Think Hitesh already pointed this out, but a single event can explode into multiple events - bigger than the maxEvents limit. Simple fix would be to just accept the explosion and ignore the maxEvent limitation in this case.
The .7 patch addresses that issue. Ignoring the limit is not an option. If too many events are sent on the RPC then we can end up overloading the RPC and cause other issues.
bq. In Edge, CompositeDataMovementEvent has its own try catch throw AMUserCodeException
The .7 patch fixed it.
bq. VertexImpl.getTaskAttemptTezEvents - Is taskEvents.size(), and any access to taskEvents thread safe
The code is being optimistic here and it's explained in the comments. Nothing is deleted from this list and there is only one place for additions. So there are no concurrent modification issues. It's ok to read the size outside the lock since we will get a snapshot number of events to look at. Taking a lock is not going to give any better guarantees about seeing the latest event being added.
bq. ROOT_INPUT_DATA_INFORMATION_EVENTS - Sending these directly can cause the maxEventsPerHeartbeat to be exceeded
They are being guarded by the same overflow checks that guard the other events. So it is fine.
bq. This can cause the number of events returned to the task to be lower than maxEventsToGet.
The .7 patch makes sure the events are fully packed to maxEvents.
bq. Add events for the ones which support ODR to the vertex event list, hand off the rest to the task
This would require changing the index-based lookup from a task to use multiple indices instead of a single index. That is a change I plan to make in a follow up jira. This would enable separating edge routing from non-edge routing, or separate routing for different edges. Though I don't want to support the existing legacy routing in tandem with ODR. That is unnecessary complexity, and ODR offloads event processing from the central dispatcher, which is a good benefit by itself.
bq. I don't think ODR needs to be added to the OneToOne and Broadcast edges
That is not optimal. In the average small job case, the CPU is not very relevant. However, for a large vertex with mixed edges we should not fall back to legacy routing because the broadcast is legacy. In addition, it takes routing pressure off the central dispatcher, which helps get more useful work done faster on the central dispatcher. In various scenarios the central dispatcher has often shown up as a bottleneck.
A lot of the CPU overhead will go away once we stop creating new event objects in the AM. That's another follow up jira to this one.
Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the
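The shared-event problem raised in the TEZ-776 comment above (routing one event object to several tasks by overwriting its index, so that every recipient ends up seeing the last index) can be shown with a minimal sketch. The `Event` class here is a hypothetical stand-in, not the real Tez DataMovementEvent, and both routing methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class EventRoutingSketch {

    /** Hypothetical mutable event; not the real Tez DataMovementEvent. */
    public static final class Event {
        public int targetIndex;
        public final String payload;

        public Event(String payload) {
            this.payload = payload;
        }

        /** Returns a new event with the same payload and the given index. */
        public Event copyWithTargetIndex(int index) {
            Event copy = new Event(payload);
            copy.targetIndex = index;
            return copy;
        }
    }

    /** Buggy variant: mutates the shared event, so all recipients see the last index. */
    public static List<Event> routeByMutating(Event original, int[] indices) {
        List<Event> routed = new ArrayList<>();
        for (int index : indices) {
            original.targetIndex = index; // overwrites the value seen by earlier recipients
            routed.add(original);
        }
        return routed;
    }

    /** Safer variant: leaves the original untouched and gives each recipient its own copy. */
    public static List<Event> routeByCopying(Event original, int[] indices) {
        List<Event> routed = new ArrayList<>();
        for (int index : indices) {
            routed.add(original.copyWithTargetIndex(index));
        }
        return routed;
    }

    public static void main(String[] args) {
        int[] indices = {0, 1, 2};
        System.out.println(routeByMutating(new Event("p"), indices).get(0).targetIndex);
        System.out.println(routeByCopying(new Event("p"), indices).get(0).targetIndex);
    }
}
```

Copying trades extra allocations for correctness, which is consistent with the comment's note that avoiding new event objects in the AM is left to a follow-up.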
[jira] [Commented] (TEZ-2396) pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez
[ https://issues.apache.org/jira/browse/TEZ-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522585#comment-14522585 ] Rajesh Balamohan commented on TEZ-2396: --- [~hitesh], [~jeagles] - Please review when you have some time. pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez - Key: TEZ-2396 URL: https://issues.apache.org/jira/browse/TEZ-2396 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Rajesh Balamohan Attachments: TEZ-2396-branch-0.6.patch, TEZ-2396.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2393) Tez pickup PATH env from gateway machine
[ https://issues.apache.org/jira/browse/TEZ-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522692#comment-14522692 ] TezQA commented on TEZ-2393: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729661/TEZ-2393.1.patch against master revision a02a5ea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/592//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/592//console This message is automatically generated. Tez pickup PATH env from gateway machine Key: TEZ-2393 URL: https://issues.apache.org/jira/browse/TEZ-2393 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Hitesh Shah Attachments: TEZ-2393.1.patch I found this issue on Windows. When I do: set PATH=C:\dummy;%PATH% Then run a tez job. C:\dummy appears in PATH of the vertex container. This is surprising since we don't expect frontend PATH will propagate to backend. [~hitesh] tried it on Linux and found the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2198) Fix sorter spill counts
[ https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522698#comment-14522698 ] TezQA commented on TEZ-2198: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729663/TEZ-2198.3.patch against master revision a02a5ea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance org.apache.tez.test.TestPipelinedShuffle Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/593//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/593//console This message is automatically generated. Fix sorter spill counts --- Key: TEZ-2198 URL: https://issues.apache.org/jira/browse/TEZ-2198 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2198.1.patch, TEZ-2198.2.patch, TEZ-2198.3.patch, no_additional_spills_eg_pipelined_shuffle.png, with_additional_spills.png Prior to pipelined shuffle, tez merged all spilled data into a single file. This ended up creating one index file and one output file. In this context, TaskCounter.ADDITIONAL_SPILL_COUNT was referred as the number of additional spills and there was no counter needed to track the number of merges. 
With pipelined shuffle, there is no final merge and ADDITIONAL_SPILL_COUNT would be misleading, as these spills are direct output files which are consumed by the consumers. It would be good to have the following:
- ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task to generate the final merged output
- TOTAL_SPILLS: represents the total number of shuffle directories (index + output files) that got created at the end of processing.
For example, assume the sorter generated 5 spills in an attempt.
Without pipelining:
ADDITIONAL_SPILL_COUNT = 5 -- Additional spills involved in sorting
TOTAL_SPILLS = 1 -- Final merged output
With pipelining:
ADDITIONAL_SPILL_COUNT = 0 -- Additional spills involved in sorting
TOTAL_SPILLS = 5 -- All spills are final output
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522714#comment-14522714 ] Bikas Saha commented on TEZ-776: bq. That's still making more changes which belong to other jiras, and warrant a review in themselves. Not sure why this seems to be so. Let me try to clarify. Between patches A and B the proposed new API changed in this form. {code}+ public @Nullable CollectionDataMovementEvent routeCompositeDataMovementEventToDestination( + CompositeDataMovementEvent event, int sourceTaskIndex, int destinationTaskIndex) {code} to {code}+ public @Nullable EventRouteMetadata routeCompositeDataMovementEventToDestination( + int sourceTaskIndex, int destinationTaskIndex) {code} Since this API is being introduced in this jira for the first time, isn't now the right time to put in the best form possible and review it here instead of in a different jira? Other than this, the rest of the changes are essentially the same. There is no new functionality implemented. So I am struggling to understand your concerns about increasing scope or the changes belonging somewhere else. Perhaps you did not see patch B properly. I hope this code snippet and explanation helps clarify things. [~rajesh.balamohan], [~hitesh] Since you guys have been looking at the patches and have seen version A and have reviewed version B, please let me know if I am missing something. I am consciously trying to not increase the scope of this jira and have left many improvements for follow ups. E.g. separating root input event routing, composite event routing, event obsoletion etc. 
Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2198 PreCommit Build #593
Jira: https://issues.apache.org/jira/browse/TEZ-2198 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/593/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2576 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729663/TEZ-2198.3.patch against master revision a02a5ea. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance org.apache.tez.test.TestPipelinedShuffle Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/593//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/593//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 3d5742fa4cee68d2bd36465615b1f811d9dc5084 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #592 Archived 44 artifacts Archive block size is 32768 Received 6 blocks and 2570486 bytes Compression is 7.1% Took 1.1 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 7 tests failed. 
REGRESSION: org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit Error Message: TezSession has already shutdown. No cluster diagnostics found. Stack Trace: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:677) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114) at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248) REGRESSION: org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices Error Message: TezSession has already shutdown. No cluster diagnostics found. Stack Trace: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:677) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114) at org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices(TestFaultTolerance.java:672) REGRESSION: org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit Error Message: TezSession has already shutdown. No cluster diagnostics found. Stack Trace: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. 
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:677) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114) at org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit(TestFaultTolerance.java:297) REGRESSION: org.apache.tez.test.TestFaultTolerance.testCascadingInputFailureWithExitSuccess Error Message: TezSession has
[jira] [Updated] (TEZ-2198) Fix sorter spill counts
[ https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-2198:
Attachment: TEZ-2198.4.patch
- Added minor fix in TestPipelinedShuffle
- TestFaultTolerance is unrelated to this patch.

Fix sorter spill counts
Key: TEZ-2198 URL: https://issues.apache.org/jira/browse/TEZ-2198 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2198.1.patch, TEZ-2198.2.patch, TEZ-2198.3.patch, TEZ-2198.4.patch, no_additional_spills_eg_pipelined_shuffle.png, with_additional_spills.png

Prior to pipelined shuffle, Tez merged all spilled data into a single file. This ended up creating one index file and one output file. In this context, TaskCounter.ADDITIONAL_SPILL_COUNT referred to the number of additional spills, and no counter was needed to track the number of merges. With pipelined shuffle, there is no final merge and ADDITIONAL_SPILL_COUNT would be misleading, as these spills are direct output files which are consumed by the consumers. It would be good to have the following:
- ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task to generate the final merged output
- TOTAL_SPILLS: represents the total number of shuffle directories (index + output files) that got created at the end of processing.

For example, assume the sorter generated 5 spills in an attempt.
Without pipelining:
ADDITIONAL_SPILL_COUNT = 5 -- Additional spills involved in sorting
TOTAL_SPILLS = 1 -- Final merged output
With pipelining:
ADDITIONAL_SPILL_COUNT = 0 -- Additional spills involved in sorting
TOTAL_SPILLS = 5 -- All spills are final output

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
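The worked example in the TEZ-2198 description can be written out as a small calculation. This is only a sketch of the arithmetic in that example, not the code in the patch: the class SpillCounters and its method of() are hypothetical names, and the rule is inferred from the single 5-spill example (at least one spill is assumed).

```java
/**
 * Hypothetical holder for the two counter values described in TEZ-2198.
 * Not part of Tez; it only mirrors the 5-spill example in the description.
 */
final class SpillCounters {
    final int additionalSpillCount;
    final int totalSpills;

    private SpillCounters(int additionalSpillCount, int totalSpills) {
        this.additionalSpillCount = additionalSpillCount;
        this.totalSpills = totalSpills;
    }

    /**
     * @param sorterSpills number of spills the sorter generated in one attempt (assumed >= 1)
     * @param pipelined    true when pipelined shuffle is used, i.e. there is no final merge
     */
    static SpillCounters of(int sorterSpills, boolean pipelined) {
        if (sorterSpills < 1) {
            throw new IllegalArgumentException("sketch assumes at least one spill");
        }
        if (pipelined) {
            // No final merge: every spill is itself a final output.
            return new SpillCounters(0, sorterSpills);
        }
        // All spills are merged into a single final output.
        return new SpillCounters(sorterSpills, 1);
    }

    public static void main(String[] args) {
        SpillCounters merged = SpillCounters.of(5, false);
        SpillCounters pipelined = SpillCounters.of(5, true);
        System.out.println("without pipelining: ADDITIONAL_SPILL_COUNT="
                + merged.additionalSpillCount + " TOTAL_SPILLS=" + merged.totalSpills);
        System.out.println("with pipelining: ADDITIONAL_SPILL_COUNT="
                + pipelined.additionalSpillCount + " TOTAL_SPILLS=" + pipelined.totalSpills);
    }
}
```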
[jira] [Assigned] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Ramachandran reassigned TEZ-2366:
Assignee: Prakash Ramachandran

Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
Key: TEZ-2366 URL: https://issues.apache.org/jira/browse/TEZ-2366 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch

Around 20 unit tests (out of around 2000) fail intermittently after TEZ-2333. Here is a stack:
{code}
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
To reproduce this in the Pig tests, use the following commands:
svn co http://svn.apache.org/repos/asf/pig/trunk
ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test

Note that in the Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to true (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig, but that does not help.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522408#comment-14522408 ]

Hitesh Shah commented on TEZ-2366:
NodeId is host + port. Shuffle port could also be used - we are already using hostname matching in any case for checking whether this can be a local fetch.

Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
Key: TEZ-2366 URL: https://issues.apache.org/jira/browse/TEZ-2366 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Priority: Critical Attachments: TEZ-2366.test.txt, TEZ-2366.wip.1.patch

Around 20 unit tests (out of around 2000) fail intermittently after TEZ-2333. Here is a stack:
{code}
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
    at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
To reproduce this in the Pig tests, use the following commands:
svn co http://svn.apache.org/repos/asf/pig/trunk
ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test

Note that in the Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to true (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig, but that does not help.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522304#comment-14522304 ] Hitesh Shah commented on TEZ-2360: -- Might be good to bump counter num defaults here as part of this too. per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch, TEZ-2360.5.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2360: - Comment: was deleted (was: Might be good to bump counter num defaults here as part of this too. ) per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch, TEZ-2360.4.patch, TEZ-2360.5.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2393) Tez pickup PATH env from gateway machine
[ https://issues.apache.org/jira/browse/TEZ-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2393: Attachment: TEZ-2393.1.patch Tez pickup PATH env from gateway machine Key: TEZ-2393 URL: https://issues.apache.org/jira/browse/TEZ-2393 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Attachments: TEZ-2393.1.patch I found this issue on Windows. When I run: set PATH=C:\dummy;%PATH% and then run a Tez job, C:\dummy appears in the PATH of the vertex container. This is surprising, since we don't expect the frontend PATH to propagate to the backend. [~hitesh] tried it on Linux and found the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage
[ https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned TEZ-2392: - Assignee: Rajesh Balamohan Have all readers throw an Exception on incorrect next() usage - Key: TEZ-2392 URL: https://issues.apache.org/jira/browse/TEZ-2392 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Priority: Critical Follow up from TEZ-2348. Marking as critical since this is a behaviour change, and we should get it in early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
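The TEZ-2392 notification does not spell out what counts as incorrect next() usage. As a rough illustration only, the sketch below assumes it means calling next() again after it has already returned false. StrictReader is a hypothetical class written for this note, not a Tez reader, and the choice of IOException is likewise an assumption.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical reader that fails loudly when next() is called after the
 * end of input was already reported. Not a Tez class.
 */
final class StrictReader<T> {
    private final Iterator<T> source;
    private T current;
    private boolean exhausted;

    StrictReader(List<T> items) {
        this.source = items.iterator();
    }

    /** Advances to the next item; returns false exactly once, at end of input. */
    boolean next() throws IOException {
        if (exhausted) {
            // Assumed misuse: the caller ignored an earlier 'false'.
            throw new IOException("next() called after the reader reported end of input");
        }
        if (source.hasNext()) {
            current = source.next();
            return true;
        }
        exhausted = true;
        current = null;
        return false;
    }

    T getCurrent() {
        return current;
    }

    public static void main(String[] args) throws IOException {
        StrictReader<String> reader = new StrictReader<>(List.of("k1", "k2"));
        while (reader.next()) {
            System.out.println(reader.getCurrent());
        }
        try {
            reader.next(); // misuse: end of input was already reported
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Throwing instead of quietly returning false again is a behaviour change for callers that loop carelessly, which fits the note in the issue about getting the change in early.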
[jira] [Assigned] (TEZ-2396) pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez
[ https://issues.apache.org/jira/browse/TEZ-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned TEZ-2396: - Assignee: Rajesh Balamohan pig-tez-tfile-parser pom is hard coded to depend on 0.6.0-SNAPSHOT version of tez - Key: TEZ-2396 URL: https://issues.apache.org/jira/browse/TEZ-2396 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Assignee: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned TEZ-2379: Assignee: Hitesh Shah org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522300#comment-14522300 ] Siddharth Seth commented on TEZ-1897: - Rest looks good to me. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522298#comment-14522298 ]

Siddharth Seth commented on TEZ-1897:
Mostly minor comments.
- AsyncDispatcherConcurrent(String name, int numThreads) { - super(name) instead of super(dispatcher)
- final LinkedBlockingQueue<Event> queue;; - Double ;
- serviceStop / serviceStart don't need to invoke super. These will be invoked via the stop() / start() methods automatically.
- In AsyncDispatcher - the error checking code for previously registered dispatchers (/* check to see if we have a listener registered */) can be common across all register methods. registerAndCreateDispatcher (without #threads) is missing a check on ConcurrentDispatchers.
- waitForDrained.wait(1000);, LOG.info("Waiting for AsyncDispatcher to drain."); - log line before the wait?
- At the same place - do the threads need to be interrupted? Otherwise they'll always wait for the 1000ms if the queue is already empty.
- There are changes in TaskImpl, Task, Vertex which are unrelated to this. That's likely adding to perf gains. There's a separate jira for this - will see if the scope of that is no longer valid after these changes.

Major bit: There's a lot of code duplication between AsyncDispatcher and AsyncDispatcherConcurrent - GenericEventHandler, MultiAttemptListener and a bunch more. The register functions seem a little unnecessary since no one registers directly with this. Not sure if you want to fix this here or in a follow up.

Create a concurrent version of AsyncDispatcher
Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch

Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
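The two review points about waitForDrained.wait(1000) (log before the wait, and avoid sitting out the full timeout) can be illustrated with a generic drain loop. This is one possible reading of those comments, not the code in the TEZ-1897 patch: DrainingDispatcher, its pending counter and its single worker thread are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical single-threaded dispatcher used only to illustrate draining on stop. */
final class DrainingDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Object waitForDrained = new Object();
    private int pending;                 // guarded by waitForDrained
    private volatile boolean stopped;
    private final Thread worker = new Thread(this::runLoop, "dispatcher-sketch");

    void start() {
        worker.start();
    }

    void submit(Runnable event) {
        synchronized (waitForDrained) {
            pending++;
        }
        queue.add(event);
    }

    private void runLoop() {
        while (!stopped) {
            Runnable event;
            try {
                event = queue.take();
            } catch (InterruptedException e) {
                continue; // woken by stop(): loop back and re-check the stopped flag
            }
            try {
                event.run();
            } finally {
                synchronized (waitForDrained) {
                    // Wake the stopping thread as soon as the last event is handled,
                    // so it does not sit out the rest of its timed wait.
                    if (--pending == 0) {
                        waitForDrained.notifyAll();
                    }
                }
            }
        }
    }

    void stop() throws InterruptedException {
        synchronized (waitForDrained) {
            // The condition is checked first, so an already empty queue never waits.
            while (pending > 0) {
                System.out.println("Waiting for dispatcher to drain."); // log before the wait
                waitForDrained.wait(1000);
            }
        }
        stopped = true;
        worker.interrupt(); // unblock a worker parked in queue.take()
        worker.join();
    }

    boolean isAlive() {
        return worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        DrainingDispatcher dispatcher = new DrainingDispatcher();
        dispatcher.start();
        dispatcher.submit(() -> System.out.println("handled event"));
        dispatcher.stop();
    }
}
```

In this sketch the timed wait is only a safety net: the notifyAll() from the worker is what normally ends the wait, and the interrupt is needed only to release a worker blocked on an empty queue.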
[jira] [Updated] (TEZ-2363) Counters: off by 1 error for REDUCE_INPUT_GROUPS counter
[ https://issues.apache.org/jira/browse/TEZ-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated TEZ-2363: Fix Version/s: 0.7.0 Counters: off by 1 error for REDUCE_INPUT_GROUPS counter Key: TEZ-2363 URL: https://issues.apache.org/jira/browse/TEZ-2363 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.7.0 Attachments: TEZ-2363.1.patch The reduce input key groups counter is not incremented for the first key in the operation; it is incremented only from the second key onwards, in moveToNext() -> nextKey() -> inputKeyCounter.increment(1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
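The off-by-one in TEZ-2363 is the usual pattern of counting groups only on a key change. The sketch below reproduces that pattern on a plain sorted list. It is not the Tez iterator code: KeyGroupCounter and both method names are hypothetical, and non-null keys are assumed.

```java
import java.util.List;

/** Hypothetical illustration of counting key groups in a sorted key stream. */
final class KeyGroupCounter {

    /** Counts only on a change of key, so the first group is never counted. */
    static long countGroupsOffByOne(List<String> sortedKeys) {
        long groups = 0;
        String previous = null;
        for (String key : sortedKeys) {
            if (previous != null && !key.equals(previous)) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }

    /** Also counts the very first key as the start of a group. */
    static long countGroups(List<String> sortedKeys) {
        long groups = 0;
        String previous = null;
        for (String key : sortedKeys) {
            if (previous == null || !key.equals(previous)) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "a", "b", "c");
        System.out.println("off by one: " + countGroupsOffByOne(keys));
        System.out.println("corrected:  " + countGroups(keys));
    }
}
```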
Success: TEZ-2188 PreCommit Build #587
Jira: https://issues.apache.org/jira/browse/TEZ-2188 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/587/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2774 lines...] [INFO] Final Memory: 74M/1175M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729402/TEZ-2188-2.patch against master revision 2382f09. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/587//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/587//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. cb1cd318b31b74d28439df207b17769ee781e463 logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #586 Archived 44 artifacts Archive block size is 32768 Received 8 blocks and 2486487 bytes Compression is 9.5% Took 1.5 sec Description set: TEZ-2188 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2188) Verify vertex with bipartite source vertex and root input in client side
[ https://issues.apache.org/jira/browse/TEZ-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520924#comment-14520924 ] TezQA commented on TEZ-2188: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729402/TEZ-2188-2.patch against master revision 2382f09. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/587//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/587//console This message is automatically generated. Verify vertex with bipartite source vertex and root input in client side Key: TEZ-2188 URL: https://issues.apache.org/jira/browse/TEZ-2188 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2188-1.patch, TEZ-2188-2.patch When a vertex has both a bipartite source vertex and a root input, there is no clear diagnostic message on the client side. The verification should be done on the client side.
{code} 16:31:57,333 - Thread( main) - (DAGClientImpl.java:541) - Waiting for DAG to start running 16:32:00,455 - Thread( main) - (DAGClientImpl.java:541) - DAG initialized: CurrentState=Running 16:32:00,977 - Thread( main) - (DAGClientImpl.java:541) - DAG: State: FAILED Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 2 16:32:00,978 - Thread( main) - (DAGClientImpl.java:541) - DAG completed. FinalState=FAILED 16:32:00,978 - Thread( main) - (TezExampleBase.java:137) - DAG diagnostics: [Vertex failed, vertexName=v2, vertexId=vertex_142588246_0025_1_01, diagnostics=[Vertex vertex_142588246_0025_1_01 [v2] killed/failed due to:null], Vertex killed, vertexName=v1, vertexId=vertex_142588246_0025_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_142588246_0025_1_00 [v1] killed/failed due to:null], DAG failed due to vertex failure. failedVertices:1 killedVertices:1] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)