[jira] [Comment Edited] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable
[ https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516988#comment-14516988 ]

Rajesh Balamohan edited comment on TEZ-1752 at 4/28/15 1:01 PM:

- When the AM sends a shutdown signal, RuntimeTask.abort() is invoked so that processors get a chance to clean up.
- If the task is running, the thread running the processor is interrupted immediately; otherwise taskFuture.get() is interrupted.
- Inputs/outputs are added as and when they are scheduled for initialization in LogicalIOProcessorRuntimeTask.

was (Author: rajesh.balamohan):

- If the task is running, the thread running the processor is interrupted; otherwise taskFuture.get() is interrupted.
- Inputs/outputs are added as and when they are scheduled for initialization in LogicalIOProcessorRuntimeTask.

Inputs / Outputs in the Runtime library should be interruptable
---
Key: TEZ-1752
URL: https://issues.apache.org/jira/browse/TEZ-1752
Project: Apache Tez
Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, TEZ-1752.4.patch

Not possible to preempt tasks without killing containers without this. There's still the problem of Processors not supporting interrupts. We may need API enhancements to either query IPOs on whether they are interruptible.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
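The interruption behaviour described in the comment above can be sketched roughly as follows. This is a minimal illustration, assuming a plain `ExecutorService` stands in for the task runner; the class `InterruptibleTaskSketch` and its method are hypothetical and are not the Tez `RuntimeTask` API. It only shows that cancelling a `Future` with `cancel(true)` interrupts the thread running the work, and that the blocking code has to react to the interrupt for preemption to work without killing the container.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; not part of the Tez runtime library.
final class InterruptibleTaskSketch {

  /** Starts a blocking "processor", aborts it, and reports whether it saw the interrupt. */
  static boolean abortRunningTask() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicBoolean cleanedUpAfterInterrupt = new AtomicBoolean(false);

    Future<?> taskFuture = executor.submit(() -> {
      started.countDown();
      try {
        // Stand-in for an Input/Output that blocks on I/O or on a queue.
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        // An interruptible Input/Output ends up here and can release resources.
        cleanedUpAfterInterrupt.set(true);
        Thread.currentThread().interrupt();
      } finally {
        finished.countDown();
      }
    });

    try {
      started.await();          // make sure the task is actually running
      taskFuture.cancel(true);  // abort: interrupts the thread running the task
      return finished.await(5, TimeUnit.SECONDS) && cleanedUpAfterInterrupt.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("task observed interrupt: " + abortRunningTask());
  }
}
```

Code that swallows `InterruptedException` or blocks in a non-interruptible call would never reach the cleanup branch, which is the concern the issue description raises about Processors that do not support interrupts.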
[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable
[ https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517007#comment-14517007 ] TezQA commented on TEZ-1752: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728780/TEZ-1752.4.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.task.TestTaskExecution Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/562//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/562//console This message is automatically generated.
[jira] [Commented] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517927#comment-14517927 ] Hitesh Shah commented on TEZ-2231: -- This one has been open for quite some time. [~sseth] [~bikassaha] [~rohini] mind doing a final review before I commit? Create project by-laws -- Key: TEZ-2231 URL: https://issues.apache.org/jira/browse/TEZ-2231 Project: Apache Tez Issue Type: Task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: by-laws.2.patch, by-laws.3.patch, by-laws.patch Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1522) Scheduling can result in out of order execution and slowdown of upstream work
[ https://issues.apache.org/jira/browse/TEZ-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-1522:
Target Version/s: 0.8.0

Scheduling can result in out of order execution and slowdown of upstream work
-
Key: TEZ-1522
URL: https://issues.apache.org/jira/browse/TEZ-1522
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
Labels: performance
Attachments: TEZ-1522.1.wip.txt, TEZ-1522.2.wip.txt, TEZ-1522.am.log.gz, task_runtime.svg

[ASCII diagram of the DAG; layout lost in extraction. Vertices: M2, M7, R3, M5, R6. Edge labels: (sg), (b). M5 connects down to R6.]

Please refer to the attachment (task runtime SVG). In this case, M5 got scheduled much earlier than R3 (green color in the diagram) and retained lots of containers, so R3 got fewer containers to work with. Attaching the output from the status monitor when the job ran; Map_5 has taken up almost all of the cluster resources, whereas Reducer_3 got a fraction of the capacity.

Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000  Reducer_6: 0/1
...

Creating this JIRA as a placeholder for scheduler enhancement. One possibility could be to schedule fewer tasks in downstream vertices, based on the information available for the upstream vertex.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-924) InputFailedEvent handling for Shuffle
[ https://issues.apache.org/jira/browse/TEZ-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-924:
---
Target Version/s: 0.8.0 (was: 0.7.0)

InputFailedEvent handling for Shuffle
-
Key: TEZ-924
URL: https://issues.apache.org/jira/browse/TEZ-924
Project: Apache Tez
Issue Type: Bug
Reporter: Siddharth Seth
Priority: Critical

Shuffle receives batches of Events to process from the AM. Because of the way these events are sent over to the ShuffleHandlers and the way they're processed, it's possible that Shuffle will start fetching data for an Event which is subsequently marked as failed (via an InputFailedEvent):

1) The AM sends events in batches. An InputFailedEvent for a specific Input may not be part of the same batch which contained the original event that is being marked bad.
2) The ShuffleEventHandler processes the events in each batch one event at a time, so even if the InputFailedEvent follows in the same batch, it's possible for Shuffle to start fetching data from a Failed Input.

The AM needs to change to invalidate Inputs up front, so that related events don't span batches. Alternately, it needs to apply the InputFailedEvent to the original event being sent. The Shuffle itself should process a batch update as a batch; that would prevent fetchers from starting early even though there may be additional events for the same host.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
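The suggestion that Shuffle should process a batch update as a batch could look roughly like the sketch below. The types here (`BatchEventSketch`, `Kind`, `Event`) are hypothetical stand-ins and are not the Tez event classes; the sketch assumes each event carries the identifier of the input attempt it refers to. It scans the whole batch for failure markers before any fetch is scheduled, so an input that is invalidated later in the same batch is never fetched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; these types are not the Tez event classes.
final class BatchEventSketch {

  enum Kind { DATA_AVAILABLE, INPUT_FAILED }

  static final class Event {
    final Kind kind;
    final String inputAttemptId;

    Event(Kind kind, String inputAttemptId) {
      this.kind = kind;
      this.inputAttemptId = inputAttemptId;
    }
  }

  /** Returns the inputs that are safe to fetch after looking at the whole batch. */
  static List<String> fetchableInputs(List<Event> batch) {
    // Pass 1: find every input that the batch marks as failed.
    Set<String> failed = new HashSet<>();
    for (Event e : batch) {
      if (e.kind == Kind.INPUT_FAILED) {
        failed.add(e.inputAttemptId);
      }
    }
    // Pass 2: schedule fetches only for inputs that were not invalidated.
    List<String> fetchable = new ArrayList<>();
    for (Event e : batch) {
      if (e.kind == Kind.DATA_AVAILABLE && !failed.contains(e.inputAttemptId)) {
        fetchable.add(e.inputAttemptId);
      }
    }
    return fetchable;
  }

  public static void main(String[] args) {
    List<Event> batch = Arrays.asList(
        new Event(Kind.DATA_AVAILABLE, "attempt_a"),
        new Event(Kind.DATA_AVAILABLE, "attempt_b"),
        new Event(Kind.INPUT_FAILED, "attempt_a"));
    System.out.println(fetchableInputs(batch));
  }
}
```

This only covers point 2 of the description. When the failure marker arrives in a later batch (point 1), a change on the AM side is still needed, as the issue notes.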
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518321#comment-14518321 ]

Bikas Saha commented on TEZ-2379:
-
deleting log as it had private server names and ip addresses.

org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
--
Key: TEZ-2379
URL: https://issues.apache.org/jira/browse/TEZ-2379
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Blocker

{noformat}
2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
    at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
    at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

Additional notes:
Hive - latest build
Tez - master
tpch-200 gb scale
q_17 (kill the job in the middle of execution)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518337#comment-14518337 ]

Bikas Saha commented on TEZ-2379:
-
1) Client issued a DAG kill that caused all tasks to get killed.
2) Task sent a kill request to its attempt and started waiting for the attempt to finish.
3) Attempt succeeded and sent done.
4) Task got attempt success and went into killed state because all its attempts are done.
5) Attempt got the kill request. It honored that kill request in TerminatedAfterSuccessTransition and sent killed back to the task.
6) Task got attempt killed while in killed state, and that is not handled.

From what I see in the code, 5 seems to be the problem here. The attempt should ignore a kill request if it's already done. An attempt is killed when a different attempt is successful and this attempt is not needed, or when the task is killed. Task retroactive kill, in which a successful task is killed (say in order to run it again after node failure), does not use this flow. So unless we can think of any other use cases for a successful attempt transitioning to killed, we should ignore a kill request in the attempt if the attempt has already succeeded.
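The fix proposed in the comment above (ignore a kill request once the attempt has succeeded) can be sketched as a tiny state machine. This is a hypothetical model, not the Tez attempt implementation or its YARN `StateMachineFactory` wiring; the class, state, and event names are assumptions made for illustration. The counter stands in for the "killed" message that step 5 sends back to the task.

```java
// Hypothetical illustration; not the Tez task attempt state machine.
final class AttemptKillSketch {

  enum State { RUNNING, SUCCEEDED, KILLED }

  enum Event { DONE, KILL_REQUEST }

  private State state = State.RUNNING;
  private int killedNotifications = 0; // "killed" messages sent back to the task

  void handle(Event event) {
    switch (state) {
      case RUNNING:
        if (event == Event.DONE) {
          state = State.SUCCEEDED;
        } else if (event == Event.KILL_REQUEST) {
          state = State.KILLED;
          killedNotifications++; // the task is still waiting, so tell it
        }
        break;
      case SUCCEEDED:
        // Proposed behaviour: a kill request that arrives after success is
        // ignored, so the task never receives "attempt killed" after it has
        // already processed "attempt succeeded".
        break;
      case KILLED:
        // Terminal state; nothing further to do.
        break;
    }
  }

  State state() {
    return state;
  }

  int killedNotifications() {
    return killedNotifications;
  }

  public static void main(String[] args) {
    AttemptKillSketch attempt = new AttemptKillSketch();
    attempt.handle(Event.DONE);         // step 3: attempt succeeds
    attempt.handle(Event.KILL_REQUEST); // step 5: late kill request arrives
    System.out.println(attempt.state() + ", killed notifications: "
        + attempt.killedNotifications());
  }
}
```

With the late kill ignored, step 6 of the sequence never happens, so the task does not need a transition for an attempt-killed event in its own killed state.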
[jira] [Updated] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-1897: Attachment: TEZ-1897.6.patch Patch addresses comments and adds a new test. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
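One common way to get the concurrency described above while keeping related events in order is to hash each event onto one of several single-threaded lanes. The sketch below shows that idea only; it is an assumption about a possible design, not the dispatcher added by the TEZ-1897 patch, and `ConcurrentDispatcherSketch`, `dispatch`, and `drainAndStop` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration; not the dispatcher from the TEZ-1897 patch.
final class ConcurrentDispatcherSketch<E> {
  private final List<ExecutorService> lanes = new ArrayList<>();
  private final Consumer<E> handler;

  ConcurrentDispatcherSketch(int concurrency, Consumer<E> handler) {
    this.handler = handler;
    for (int i = 0; i < concurrency; i++) {
      lanes.add(Executors.newSingleThreadExecutor());
    }
  }

  /** Events sharing a key run in submission order; different keys may run in parallel. */
  void dispatch(Object orderingKey, E event) {
    int lane = Math.floorMod(orderingKey.hashCode(), lanes.size());
    lanes.get(lane).execute(() -> handler.accept(event));
  }

  /** Stops accepting events and waits for queued ones; returns false on timeout. */
  boolean drainAndStop(long timeoutMillis) {
    for (ExecutorService lane : lanes) {
      lane.shutdown();
    }
    try {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
      for (ExecutorService lane : lanes) {
        long remaining = Math.max(deadline - System.nanoTime(), 0);
        if (!lane.awaitTermination(remaining, TimeUnit.NANOSECONDS)) {
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    ConcurrentDispatcherSketch<String> dispatcher = new ConcurrentDispatcherSketch<>(
        2, e -> System.out.println(Thread.currentThread().getName() + " handled " + e));
    dispatcher.dispatch("vertex_1", "vertex_1 event A");
    dispatcher.dispatch("vertex_2", "vertex_2 event A");
    dispatcher.dispatch("vertex_1", "vertex_1 event B");
    dispatcher.drainAndStop(1000);
  }
}
```

Using the vertex as the ordering key would keep events for one vertex sequential while letting independent vertices proceed in parallel; whether that ordering guarantee is sufficient for a given event type has to be checked case by case.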
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.6.B.patch Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-1897 PreCommit Build #567
Jira: https://issues.apache.org/jira/browse/TEZ-1897 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/567/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2795 lines...] [INFO] Final Memory: 77M/960M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728962/TEZ-1897.6.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/567//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/567//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 3e708a977896cd65ef59206c2f0fca47194c7204 logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #565 Archived 44 artifacts Archive block size is 32768 Received 6 blocks and 2582782 bytes Compression is 7.1% Took 1.5 sec Description set: TEZ-1897 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518350#comment-14518350 ] Rajesh Balamohan commented on TEZ-2379: --- speculation was not enabled in this case.
Failed: TEZ-2380 PreCommit Build #566
Jira: https://issues.apache.org/jira/browse/TEZ-2380 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/566/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2772 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728963/TEZ-2380.1.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/566//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/566//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 1bc0c8a43a36e36380eecf6aec12d7409c1ebc9b logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #565 Archived 44 artifacts Archive block size is 32768 Received 22 blocks and 2032295 bytes Compression is 26.2% Took 2.1 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2256: - Assignee: Hitesh Shah (was: Cyrille Chépélov) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter Key: TEZ-2256 URL: https://issues.apache.org/jira/browse/TEZ-2256 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Cyrille Chépélov Assignee: Hitesh Shah Priority: Critical Labels: patch Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch Original Estimate: 6h Remaining Estimate: 6h UnorderedPartitionedKVWriter delegates serialization to the application, passing it a private ByteArrayOutputStream. In case the buffer is exhausted, ByteArrayOutputStream signals that with a private BufferTooSmallException, which can be seen but not dealt with by the application. As [~cwensel] pointed out, when the application is in fact a complex framework, there is no way to distinguish this exception from a real failure, which compels logging the full stack even for reasonable events such as buffer complete. Suggested approach: set a complete flag in ByteArrayOutputStream that disables any further output, and replace BufferTooSmallException (BTSE) handling by checking that flag. [~sseth] suggested checking out SortedOutput as well, as the mechanisms there should be similar. I'll give this a go this week. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
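The suggested approach in the description (a complete flag instead of an exception) could be sketched as below. This is a minimal illustration under stated assumptions, not the stream class used inside UnorderedPartitionedKVWriter: `BoundedBufferSketch`, `resetTo`, and `tryWrite` are hypothetical names, and a plain UTF-8 string stands in for application-side serialization. Once a write does not fit, the stream sets the flag and discards further output, and the caller checks the flag after serializing each record.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the stream used by UnorderedPartitionedKVWriter.
final class BoundedBufferSketch extends OutputStream {
  private final byte[] buffer;
  private int position = 0;
  private boolean complete = false; // set once a write did not fit

  BoundedBufferSketch(int capacity) {
    this.buffer = new byte[capacity];
  }

  @Override
  public void write(int b) {
    if (complete || position == buffer.length) {
      complete = true; // signal "buffer full" without throwing
      return;
    }
    buffer[position++] = (byte) b;
  }

  @Override
  public void write(byte[] src, int off, int len) {
    if (complete || len > buffer.length - position) {
      complete = true; // drop this and any further output
      return;
    }
    System.arraycopy(src, off, buffer, position, len);
    position += len;
  }

  boolean isComplete() {
    return complete;
  }

  int position() {
    return position;
  }

  /** Discards a partially written record and re-enables output, e.g. before a spill. */
  void resetTo(int newPosition) {
    position = newPosition;
    complete = false;
  }

  /** Appends one record; returns false if it did not fit, leaving the buffer unchanged. */
  static boolean tryWrite(BoundedBufferSketch out, String record) {
    int recordStart = out.position();
    byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
    out.write(bytes, 0, bytes.length); // application serialization would happen here
    if (out.isComplete()) {
      out.resetTo(recordStart); // roll back; the caller can spill and retry
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    BoundedBufferSketch out = new BoundedBufferSketch(8);
    System.out.println(tryWrite(out, "abcde"));
    System.out.println(tryWrite(out, "fghij"));
    System.out.println(out.position());
  }
}
```

Because the serializer never sees an exception for the ordinary buffer-full case, a framework layered on top no longer has to distinguish it from a real failure, which is the concern raised in the description.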
[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2256: - Issue Type: Bug (was: Improvement)
[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2256: - Issue Type: Improvement (was: Bug)
[jira] [Commented] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518457#comment-14518457 ] Hitesh Shah commented on TEZ-2256: -- Change status to patch available to trigger pre-commit.
[jira] [Commented] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518581#comment-14518581 ] TezQA commented on TEZ-2256: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723914/remove-btse-1-MASTER.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/569//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/569//console This message is automatically generated.
Failed: TEZ-2256 PreCommit Build #569
Jira: https://issues.apache.org/jira/browse/TEZ-2256 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/569/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2783 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723914/remove-btse-1-MASTER.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/569//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/569//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. ce5b82ce372f4eacfe21bc559ca53f87c76c248f logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #567 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2680043 bytes Compression is 4.7% Took 2.1 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
Success: TEZ-776 PreCommit Build #565
Jira: https://issues.apache.org/jira/browse/TEZ-776 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/565/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2797 lines...] [INFO] Final Memory: 75M/962M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728940/TEZ-776.6.A.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/565//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/565//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 0bc2b6c6d18320ff070ba2fab38acd25a6ca84af logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #563 Archived 44 artifacts Archive block size is 32768 Received 14 blocks and 2298697 bytes Compression is 16.6% Took 1 sec Description set: TEZ-776 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)
[ https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2237: - Assignee: Siddharth Seth Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers) --- Key: TEZ-2237 URL: https://issues.apache.org/jira/browse/TEZ-2237 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0 Environment: Debian Linux jessie OpenJDK Runtime Environment (build 1.8.0_40-internal-b27) OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode) 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system disk + 4*1 or 2 TiB HDD for HDFS local (on-prem, dedicated hardware) Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0 Reporter: Cyrille Chépélov Assignee: Siddharth Seth Priority: Critical Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, application_142732418_1908.red.txt.bz2, application_1427964335235_2070.txt.red.txt.bz2, appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, output-starts.txt, start_containers.png, stop_containers.png, syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png On a specific DAG with many vertices (actually part of a larger meta-DAG), after about an hour of processing, several BufferTooSmallExceptions are raised in UnorderedPartitionedKVWriter (about one every two or three spills). 
Once these exceptions are raised, the DAG remains indefinitely active, tying up memory and CPU resources as far as YARN is concerned, while little if any actual processing takes place. It seems two separate issues are at hand: 1. BufferTooSmallExceptions are raised even though, small as the allocated buffers seem to be (about a couple of megabytes were allotted whereas 100 MiB were requested), the keys and values are never bigger than 24 and 1024 bytes respectively. 2. Once BufferTooSmallExceptions are raised, the DAG fails to stop (stop requests appear to be sent 7 hours after the BTSEs are raised, but 9 hours after these stop requests the DAG was still lingering, with all containers present and tying up memory and CPU allocations). The BTSEs prevent the Cascade from completing, which in turn prevents validating the results against traditional MR1-based results. Because the DAG never terminates, the cluster queue becomes unavailable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)
[ https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2237: - Priority: Critical (was: Major) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers) --- Key: TEZ-2237 URL: https://issues.apache.org/jira/browse/TEZ-2237 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0 Reporter: Cyrille Chépélov Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)
[ https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2237: - Target Version/s: 0.6.1 Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers) --- Key: TEZ-2237 URL: https://issues.apache.org/jira/browse/TEZ-2237 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0 Reporter: Cyrille Chépélov Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518254#comment-14518254 ] Bikas Saha commented on TEZ-1897: - bq. Rename registerAndCreateDispatcher to registerAndCreateConcurrentDispatcher ? bq. Rename registerWithExistingDispatcher to registerWithExistingConcurrentDispatcher ? The types in the arguments should be enough to differentiate and also provide a compile-time check. Adding it to the names would be redundant. Also, once this code is exercised and stabilized, we may use this as the only dispatcher. bq. Javadoc for AsyncDispatcherConcurrent is the same as AsyncDispatcher Fixed. bq. Precondition numThreads should be at least 2 ? I think 1 is the same as AsyncDispatcher This should be able to replace the legacy AsyncDispatcher and run all events on a single thread in the threadpool, so it's allowed. bq. TaskAttemptImpl.java Some places still use getTask().getVertex(), can change to getVertex directly. Fixed. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
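As a rough illustration of the point about numThreads, here is a minimal sketch of a dispatcher that spreads events over several single-threaded lanes. It is not the AsyncDispatcherConcurrent from the patch: the class name, the key-based routing, and the method names are all assumptions made for this example. With numThreads set to 1, every event lands on the same lane and runs in submission order, which is the sense in which a single-threaded configuration could stand in for the legacy dispatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch only; names and routing scheme are not taken from Tez. */
final class ConcurrentDispatcherSketch<E> {
    // One single-threaded executor per lane keeps events in a lane ordered.
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final Consumer<E> handler;

    ConcurrentDispatcherSketch(int numThreads, Consumer<E> handler) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        this.handler = handler;
        for (int i = 0; i < numThreads; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Events sharing a key go to the same lane and therefore run in order. */
    void dispatch(Object key, E event) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> handler.accept(event));
    }

    /** Stops accepting events and waits briefly for queued ones to finish. */
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Routing by key is one way to keep per-entity ordering while still allowing unrelated events to run in parallel; whether the actual patch does this is not stated in the thread.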
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518380#comment-14518380 ] Bikas Saha commented on TEZ-776: TEZ-776.6.A and TEZ-776.6.B are 2 versions of the API with the rest of the code being essentially the same. Version B uses an EventRouteMetadata as the return value from plugins and thus hides the events and their expansion etc. from them. This enables further improvement down the line where events could be expanded in the tasks or events could be kept in the AM without serde and passed directly to tasks. Based on some of the ideas in the initial design doc and ideas in TEZ-2255. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
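To make the "64 bytes per event" figure concrete, the following back-of-the-envelope sketch estimates heap use under one assumed connection pattern: a fully connected edge where the AM stores one expanded event per producer/consumer pair. That pattern, the class name, and the task counts are assumptions for illustration; only the 64-byte figure comes from the issue description.

```java
/** Hypothetical estimate; only the 64-byte figure is taken from the issue. */
final class EventHeapEstimate {
    static final long BYTES_PER_EVENT = 64L;

    /**
     * Assumes one stored event per (producer, consumer) pair, i.e. a fully
     * connected edge with events expanded per destination task.
     */
    static long bytesFor(long producers, long consumers) {
        long events = Math.multiplyExact(producers, consumers);
        return Math.multiplyExact(events, BYTES_PER_EVENT);
    }

    public static void main(String[] args) {
        // Task counts below are made-up inputs, not measurements.
        System.out.println(bytesFor(1_000, 1_000));
        System.out.println(bytesFor(10_000, 10_000));
    }
}
```

Under that assumption the cost grows with the product of the two task counts, which is why hiding event expansion from plugins, as version B proposes, leaves room to avoid materializing every event in the AM.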
[jira] [Commented] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled
[ https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518386#comment-14518386 ] TezQA commented on TEZ-2380: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728963/TEZ-2380.1.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/566//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/566//console This message is automatically generated. Disable fall back to reading from timeline if timeline disabled Key: TEZ-2380 URL: https://issues.apache.org/jira/browse/TEZ-2380 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Chang Li Attachments: TEZ-2380.1.patch As part of TEZ-2226, timeline logging can get disabled in cases where domain creation fails. In this scenario, the DAG client should also not fall back to reading from timeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-2359) Deadlock in DAGAppMaster
[ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang resolved TEZ-2359. - Resolution: Invalid My mistake, it's an issue when I work on TEZ-1273, not on the master. Deadlock in DAGAppMaster Key: TEZ-2359 URL: https://issues.apache.org/jira/browse/TEZ-2359 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Priority: Blocker {code} Found one Java-level deadlock: = Timer-1: waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync), which is held by Dispatcher thread: Central Dispatcher thread: Central: waiting to lock monitor 0x7fb829866d18 (object 0x0007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService), which is held by DelayedContainerManager DelayedContainerManager: waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync), which is held by Dispatcher thread: Central Java stack information for the threads listed above: === Timer-1: at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015) - locked 0x0007cd0f2ff0 (a org.apache.tez.dag.app.DAGAppMaster) at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825) at java.util.TimerThread.mainLoop(Timer.java:555) at 
java.util.TimerThread.run(Timer.java:505) Dispatcher thread: Central: at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842) - waiting to lock 0x0007cd5ab958 (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService) at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566) at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832) at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201) at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362) at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) - locked 0x0007cd1d0208 (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine) at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510) at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879) at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) DelayedContainerManager: at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731) at
[jira] [Created] (TEZ-2382) Publish counters to timeline for failed/killed attempts
Hitesh Shah created TEZ-2382: Summary: Publish counters to timeline for failed/killed attempts Key: TEZ-2382 URL: https://issues.apache.org/jira/browse/TEZ-2382 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah This will be useful for debugging if attempts fail due to OOM or if an attempt was killed as its speculated task ran faster, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518330#comment-14518330 ] TezQA commented on TEZ-776: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728940/TEZ-776.6.A.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/565//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/565//console This message is automatically generated. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. 
A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2379: Comment: was deleted (was: I think the following code may cause the issue. There may already be some event in the event queue that causes the TA to be killed, e.g. TA_CONTAINER_TERMINATED_BY_SYSTEM. May need to ignore T_ATTEMPT_KILLED at state of KILLED. {code} if (attempt != null && !attempt.isFinished()) { eventHandler.handle(new TaskAttemptEventKillRequest(attempt.getID(), logMsg, errorCause)); } {code}) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Blocker {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build 
Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
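The deleted comment floated the idea of ignoring T_ATTEMPT_KILLED once a task is already KILLED. A minimal sketch of that idea follows, using a toy state holder rather than Tez's TaskImpl or the YARN StateMachineFactory. The class, its states, and the ignore set are hypothetical, and the thread does not say whether this approach was the one eventually adopted.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, simplified task state holder; not TaskImpl from Tez. */
final class TaskStateSketch {
    enum State { RUNNING, SUCCEEDED, KILLED }
    enum Event { T_ATTEMPT_KILLED, T_ATTEMPT_SUCCEEDED }

    // Events tolerated as no-ops once the task is already KILLED.
    private static final Set<Event> IGNORED_WHEN_KILLED =
            EnumSet.of(Event.T_ATTEMPT_KILLED);

    private State state = State.RUNNING;

    State handle(Event event) {
        switch (state) {
            case RUNNING:
                state = (event == Event.T_ATTEMPT_KILLED) ? State.KILLED : State.SUCCEEDED;
                return state;
            case KILLED:
                if (IGNORED_WHEN_KILLED.contains(event)) {
                    return state; // late duplicate kill notification: nothing to do
                }
                throw new IllegalStateException("Invalid event: " + event + " at " + state);
            default:
                throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
    }
}
```

A second kill notification then leaves the task in KILLED instead of raising an invalid-transition error, while any other event in a terminal state is still rejected.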
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518543#comment-14518543 ] TezQA commented on TEZ-776: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728973/TEZ-776.6.B.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.TestMockDAGAppMaster Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/568//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-api.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/568//console This message is automatically generated. 
Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-776 PreCommit Build #568
Jira: https://issues.apache.org/jira/browse/TEZ-776 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/568/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2441 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728973/TEZ-776.6.B.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.TestMockDAGAppMaster Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/568//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-api.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/568//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. e63f35966e3164e70d2338a99e2a81706c0896b4 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #567 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2615630 bytes Compression is 4.8% Took 2 sec [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. FAILED: org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting Error Message: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.tez.dag.api.TezUncheckedException): Unable to instantiate class with 1 arguments: org.apache.tez.dag.app.TestMockDAGAppMaster$LegacyEdgeTestEdgeManager at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:74) at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98) at org.apache.tez.dag.app.dag.impl.Edge.createEdgeManager(Edge.java:142) at org.apache.tez.dag.app.dag.impl.Edge.init(Edge.java:117) at org.apache.tez.dag.app.dag.impl.DAGImpl.createDAGEdges(DAGImpl.java:1451) at org.apache.tez.dag.app.dag.impl.DAGImpl.initializeDAG(DAGImpl.java:1403) at org.apache.tez.dag.app.dag.impl.DAGImpl.initializeDAG(DAGImpl.java:1355) at org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1665) at org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1647) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079) at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143) at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.dag.app.DAGAppMaster.startDAGExecution(DAGAppMaster.java:2173) at 
org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2140) at
[jira] [Commented] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled
[ https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518276#comment-14518276 ] Hitesh Shah commented on TEZ-2380: -- \cc [~pramachandran] Disable fall back to reading from timeline if timeline disabled Key: TEZ-2380 URL: https://issues.apache.org/jira/browse/TEZ-2380 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Chang Li Attachments: TEZ-2380.1.patch As part of TEZ-2226, timeline logging can get disabled in cases where domain creation fails. In this scenario, the DAG client should also not fall back to reading from timeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2383) release sort buffers on close
Rajesh Balamohan created TEZ-2383: - Summary: release sort buffers on close Key: TEZ-2383 URL: https://issues.apache.org/jira/browse/TEZ-2383 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Currently they get released when sorter object gets GC-ed, but it might be good to explicitly release them on close as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
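The request can be pictured with a small sketch: a sorter that drops its buffer reference in close(), so the memory becomes collectable even while the sorter object itself is still reachable. The class and field names here are hypothetical and do not correspond to the actual Tez sorter classes.

```java
/** Hypothetical sorter; field and method names are illustrative, not Tez's. */
final class SorterSketch implements AutoCloseable {
    private byte[] kvBuffer;

    SorterSketch(int bufferBytes) {
        this.kvBuffer = new byte[bufferBytes];
    }

    boolean holdsBuffer() {
        return kvBuffer != null;
    }

    @Override
    public void close() {
        // Drop the reference explicitly instead of waiting for the sorter
        // object itself to become unreachable. Safe to call more than once.
        kvBuffer = null;
    }
}
```

Implementing AutoCloseable also lets callers use try-with-resources, so the buffer is released on both the normal and the exceptional path.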
[jira] [Updated] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path
[ https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2305: Attachment: TEZ-2305-5.patch [~hitesh] Upload new patch based on TEZ-2305.2.patch. (Add unit test to simulate the behavior of translating MR job to DAG) MR compatibility sleep job fails with IOException: Undefined job output-path Key: TEZ-2305 URL: https://issues.apache.org/jira/browse/TEZ-2305 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Priority: Critical Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305-5.patch, TEZ-2305.1.patch, TEZ-2305.2.patch Running MR sleep job has an IOException. {code} 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at
[jira] [Commented] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path
[ https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518612#comment-14518612 ] Hitesh Shah commented on TEZ-2305: -- +1 pending pre-commit . MR compatibility sleep job fails with IOException: Undefined job output-path Key: TEZ-2305 URL: https://issues.apache.org/jira/browse/TEZ-2305 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Priority: Critical Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305-5.patch, TEZ-2305.1.patch, TEZ-2305.2.patch Running MR sleep job has an IOException. {code} 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 
1 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at
[jira] [Updated] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable
[ https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1752: -- Attachment: TEZ-1752.4.patch - If the task is running, the thread running the processor is interrupted; otherwise taskFuture.get() is interrupted. - Inputs/outputs are added as and when they are scheduled for initialization in LogicalIOProcessorRuntimeTask. Inputs / Outputs in the Runtime library should be interruptable --- Key: TEZ-1752 URL: https://issues.apache.org/jira/browse/TEZ-1752 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, TEZ-1752.4.patch Without this, it is not possible to preempt tasks without killing containers. There's still the problem of Processors not supporting interrupts. We may need API enhancements to query IPOs on whether they are interruptible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
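The abort behaviour described in the TEZ-1752 update above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in TEZ-1752.4.patch: the class InterruptibleTaskRunner and its methods are hypothetical, and Future.cancel(true) is used here as one plausible way to interrupt a running task or cancel one that has not started yet.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; class and method names are not taken from the patch.
class InterruptibleTaskRunner {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final CountDownLatch started = new CountDownLatch(1);
  private final CountDownLatch finished = new CountDownLatch(1);
  private final AtomicBoolean interrupted = new AtomicBoolean(false);
  private volatile Future<?> taskFuture;

  /** Starts a stand-in "processor" that blocks until it is interrupted. */
  void start() {
    taskFuture = executor.submit(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000L); // stands in for blocking work inside an Input/Output
      } catch (InterruptedException e) {
        interrupted.set(true); // processor cleanup would go here
      } finally {
        finished.countDown();
      }
    });
  }

  /** Interrupts the task thread if it is running; cancels the task if it has not started. */
  void abort() {
    Future<?> f = taskFuture;
    if (f != null) {
      f.cancel(true); // true = interrupt the thread if the task is already running
    }
    executor.shutdownNow();
  }

  boolean awaitStarted(long timeoutMs) { return await(started, timeoutMs); }

  boolean awaitFinished(long timeoutMs) { return await(finished, timeoutMs); }

  boolean wasInterrupted() { return interrupted.get(); }

  private static boolean await(CountDownLatch latch, long timeoutMs) {
    try {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The sketch only works if the blocking call inside the task responds to interrupts, which is the point of the issue: Inputs and Outputs that swallow or ignore InterruptedException would keep the task alive after abort().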
Failed: TEZ-1752 PreCommit Build #562
Jira: https://issues.apache.org/jira/browse/TEZ-1752 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/562/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 1985 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728780/TEZ-1752.4.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.task.TestTaskExecution Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/562//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/562//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 60653cb169aa978f8334d4f8376a11f554ec6029 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #558 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2594356 bytes Compression is 4.8% Took 1.6 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. 
REGRESSION: org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatShouldDie Error Message: Found a TaskAttemptCompletedvent when not expected Stack Trace: java.lang.AssertionError: Found a TaskAttemptCompletedvent when not expected at org.junit.Assert.fail(Assert.java:88) at org.apache.tez.runtime.task.TestTaskExecution$TezTaskUmbilicalForTest.verifyNoCompletionEvents(TestTaskExecution.java:596) at org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatShouldDie(TestTaskExecution.java:326)
[jira] [Commented] (TEZ-2377) RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat
[ https://issues.apache.org/jira/browse/TEZ-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516952#comment-14516952 ] Jeff Zhang commented on TEZ-2377: - I suppose this can be solved by TEZ-2305. [~rajesh.balamohan] Can you try the patch TEZ-2305.2.patch? RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat --- Key: TEZ-2377 URL: https://issues.apache.org/jira/browse/TEZ-2377 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2377.1.patch {code} yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar randomwriter -Dmapreduce.randomwriter.totalbytes=10737418 /tmp/test1 {code} This ends up generating TextOutputFormat. {code} yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar sort -Dmapreduce.framework.name=yarn-tez -r 5 /tmp/test1 /tmp/test_sorted {code} This ends up throwing error {noformat} Failure while running task:java.io.IOException: hdfs://tez-vm:56565/tmp/test1/part-0 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1851) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1760) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1774) at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:149) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.init(MRReaderMapReduce.java:78) at org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:475) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2377) RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat
[ https://issues.apache.org/jira/browse/TEZ-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516993#comment-14516993 ] Rajesh Balamohan commented on TEZ-2377: --- Thanks [~zjffdu], TEZ-2305 (.2 patch) has similar checks. I tried it out and it works fine. If you plan to commit that patch, I can mark this as a duplicate or already resolved. RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat --- Key: TEZ-2377 URL: https://issues.apache.org/jira/browse/TEZ-2377 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2377.1.patch {code} yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar randomwriter -Dmapreduce.randomwriter.totalbytes=10737418 /tmp/test1 {code} This ends up generating TextOutputFormat. {code} yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar sort -Dmapreduce.framework.name=yarn-tez -r 5 /tmp/test1 /tmp/test_sorted {code} This ends up throwing error {noformat} Failure while running task:java.io.IOException: hdfs://tez-vm:56565/tmp/test1/part-0 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1851) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1760) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1774) at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:149) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.init(MRReaderMapReduce.java:78) at org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:475) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
Rajesh Balamohan created TEZ-2379: - Summary: org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2379: -- Attachment: application_1429683757595_0479.log Attaching AM log org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Attachments: application_1429683757595_0479.log {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2374) Fix build break against hadoop-2.2 due to TEZ-2325
[ https://issues.apache.org/jira/browse/TEZ-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516848#comment-14516848 ] Prakash Ramachandran commented on TEZ-2374: --- +1 LGTM. Thanks, Hitesh. Was trying to remove the deprecated warning and missed the 2.2 branch. Fix build break against hadoop-2.2 due to TEZ-2325 -- Key: TEZ-2374 URL: https://issues.apache.org/jira/browse/TEZ-2374 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: TEZ-2374.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2378) In case Fetcher (unordered) fails to do local fetch, log in debug mode to reduce log size
Rajesh Balamohan created TEZ-2378: - Summary: In case Fetcher (unordered) fails to do local fetch, log in debug mode to reduce log size Key: TEZ-2378 URL: https://issues.apache.org/jira/browse/TEZ-2378 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan The following can be logged at DEBUG level rather than WARN level. Counters could be added later to track the number of times local fetch failed. {noformat} 2015-04-28 05:41:45,487 WARN [Fetcher [Map_5] #15] shuffle.Fetcher: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=81], attemptNumber=0, pathComponent=attempt_1429683757595_0485_1_03_81_0_10003, fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1] from cn047-10.l42scl.hortonworks.com(local fetch) org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429683757595_0485_1_03_81_0_10003/file.out.index in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164) at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:612) at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:592) at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:537) at org.apache.tez.runtime.library.common.shuffle.Fetcher.doSharedFetch(Fetcher.java:353) at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:192) at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
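The proposal in TEZ-2378 (log local-fetch failures at debug level and, possibly later, count them) could look roughly like the sketch below. It is an assumption-laden illustration, not Fetcher code: LocalFetchFailureReporter and its methods are hypothetical names, and java.util.logging (where FINE plays the role of DEBUG) is used only to keep the example self-contained.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not part of the Tez Fetcher.
class LocalFetchFailureReporter {
  private static final Logger LOG =
      Logger.getLogger(LocalFetchFailureReporter.class.getName());

  // Stand-in for a counter that tracks how often local fetch failed.
  private final AtomicLong localFetchFailures = new AtomicLong();

  /** Records a failed local fetch: always counted, logged only at debug (FINE) level. */
  void onLocalFetchFailure(String inputAttempt, Exception cause) {
    localFetchFailures.incrementAndGet();
    if (LOG.isLoggable(Level.FINE)) {
      // The full stack trace is emitted only when debug logging is enabled.
      LOG.log(Level.FINE,
          "Failed to shuffle output of " + inputAttempt + " (local fetch)", cause);
    }
  }

  long getLocalFetchFailures() {
    return localFetchFailures.get();
  }
}
```

Keeping the count independent of the log level means the failure rate stays visible even when the stack traces are suppressed.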
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517010#comment-14517010 ] Jeff Zhang commented on TEZ-2379: - I think the following code may cause the issue. There may already be an event in the event queue that causes the TA to be killed, e.g. TA_CONTAINER_TERMINATED_BY_SYSTEM. We may need to ignore T_ATTEMPT_KILLED in the KILLED state. {code} if (attempt != null && !attempt.isFinished()) { eventHandler.handle(new TaskAttemptEventKillRequest(attempt.getID(), logMsg, errorCause)); } {code} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Attachments: application_1429683757595_0479.log {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at 
java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
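The suggestion in the TEZ-2379 comment above, ignoring T_ATTEMPT_KILLED once the task is already KILLED, can be illustrated with a deliberately small state machine. This is a sketch under assumptions: SimpleTaskStateMachine is hypothetical, it does not use the YARN StateMachineFactory, and the states and events are a reduced subset chosen for illustration.

```java
// Hypothetical, heavily simplified state machine; it is not TaskImpl, and the
// state and event sets are reduced for illustration.
class SimpleTaskStateMachine {
  enum State { RUNNING, KILL_WAIT, KILLED }
  enum Event { T_TERMINATE, T_ATTEMPT_KILLED }

  private State state = State.RUNNING;

  State getState() {
    return state;
  }

  void handle(Event event) {
    if (state == State.KILLED) {
      // Terminal state: a late or duplicate notification is ignored
      // instead of being treated as an invalid transition.
      return;
    }
    if (state == State.RUNNING && event == Event.T_TERMINATE) {
      state = State.KILL_WAIT;
      return;
    }
    if (state == State.KILL_WAIT && event == Event.T_ATTEMPT_KILLED) {
      state = State.KILLED;
      return;
    }
    throw new IllegalStateException("Invalid event: " + event + " at " + state);
  }
}
```

With this shape, a second T_ATTEMPT_KILLED that was already sitting in the event queue when the task reached KILLED is dropped silently rather than surfacing as an error in the dispatcher.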
[jira] [Updated] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled
[ https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated TEZ-2380: -- Attachment: TEZ-2380.1.patch Disable fall back to reading from timeline if timeline disabled Key: TEZ-2380 URL: https://issues.apache.org/jira/browse/TEZ-2380 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Chang Li Attachments: TEZ-2380.1.patch As part of TEZ-2226, timeline logging can get disabled in cases where domain creation fails. In this scenario, the DAG client should also not fall back to reading from timeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled
[ https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518277#comment-14518277 ] Chang Li commented on TEZ-2380: --- [~hitesh], I have updated DAGClientImpl and verified that isATSEnabled is set to false when timeline is disabled, in both the session and non-session cases. Disable fall back to reading from timeline if timeline disabled Key: TEZ-2380 URL: https://issues.apache.org/jira/browse/TEZ-2380 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Chang Li Attachments: TEZ-2380.1.patch As part of TEZ-2226, timeline logging can get disabled in cases where domain creation fails. In this scenario, the DAG client should also not fall back to reading from timeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2256: -- Priority: Critical (was: Minor) Target Version/s: 0.6.1 Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter Key: TEZ-2256 URL: https://issues.apache.org/jira/browse/TEZ-2256 Project: Apache Tez Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0 Reporter: Cyrille Chépélov Assignee: Cyrille Chépélov Priority: Critical Labels: patch Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch Original Estimate: 6h Remaining Estimate: 6h UnorderedPartitionedKVWriter delegates serialization to the application, passing it a private ByteArrayOutputStream. In case the buffer is exhausted, ByteArrayOutputStream signals that with a private BufferTooSmallException, which can be seen but not dealt with by the application. As [~cwensel] pointed out, when the application is in fact a complex framework, there is no way to distinguish this exception from a real failure, which compels logging the full stack even for reasonable events such as buffer complete. Suggested approach: set a complete flag in ByteArrayOutputStream that disables any further output, and replace BufferTooSmallException (BTSE) handling by checking that flag. [~sseth] suggested checking out SortedOutput as well, as the mechanisms there should be similar. I'll give this a go this week. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
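The approach suggested in TEZ-2256 above, a "complete" flag that disables further output in place of a private exception, might be sketched as follows. This is only an illustration: FlaggedBoundedOutputStream, isComplete() and size() are hypothetical names and do not come from UnorderedPartitionedKVWriter or the attached patches.

```java
import java.io.OutputStream;

// Hypothetical sketch of a fixed-capacity stream that signals "buffer full"
// through a flag instead of throwing an exception.
class FlaggedBoundedOutputStream extends OutputStream {
  private final byte[] buffer;
  private int count;
  private boolean complete; // once set, all further writes are dropped

  FlaggedBoundedOutputStream(int capacity) {
    this.buffer = new byte[capacity];
  }

  @Override
  public void write(int b) {
    if (complete || count >= buffer.length) {
      complete = true;
      return;
    }
    buffer[count++] = (byte) b;
  }

  @Override
  public void write(byte[] src, int off, int len) {
    if (complete || len > buffer.length - count) {
      // Not enough room: mark the buffer complete and write nothing,
      // so a record is never stored partially.
      complete = true;
      return;
    }
    System.arraycopy(src, off, buffer, count, len);
    count += len;
  }

  /** True once a write has been rejected; the caller checks this after serializing a record. */
  boolean isComplete() {
    return complete;
  }

  int size() {
    return count;
  }
}
```

A writer using such a stream would serialize a record, check isComplete(), and on true roll back to the size recorded before the record, spill, and retry; the application's serializer never sees an exception it cannot interpret.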
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518493#comment-14518493 ] TezQA commented on TEZ-1897: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728962/TEZ-1897.6.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/567//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/567//console This message is automatically generated. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
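One way to picture the idea in TEZ-1897 above (processing events that can run in parallel on more than one thread) is a dispatcher that hashes each event's key to one of several single-threaded lanes. This is an assumption for illustration, not the design in the TEZ-1897 patches: KeyedConcurrentDispatcher and its methods are hypothetical, and the per-key ordering guarantee is a property of this sketch only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical dispatcher; not the class added by TEZ-1897.
class KeyedConcurrentDispatcher<E> {
  private final ExecutorService[] lanes;
  private final Consumer<E> handler;

  KeyedConcurrentDispatcher(int numLanes, Consumer<E> handler) {
    this.lanes = new ExecutorService[numLanes];
    for (int i = 0; i < numLanes; i++) {
      lanes[i] = Executors.newSingleThreadExecutor();
    }
    this.handler = handler;
  }

  /** Events with the same key run on the same lane, so their relative order is preserved. */
  void dispatch(Object key, E event) {
    int lane = Math.floorMod(key.hashCode(), lanes.length);
    lanes[lane].execute(() -> handler.accept(event));
  }

  /** Stops accepting events and waits for queued ones; returns false on timeout or interrupt. */
  boolean shutdownAndAwait(long timeoutMs) {
    for (ExecutorService lane : lanes) {
      lane.shutdown();
    }
    try {
      for (ExecutorService lane : lanes) {
        if (!lane.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Events for different keys (for example, different vertices) may run concurrently, while events for one key stay serialized, which avoids reordering state transitions for a single entity.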
[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable
[ https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517294#comment-14517294 ] TezQA commented on TEZ-1752: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728815/TEZ-1752.5.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/563//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/563//console This message is automatically generated. Inputs / Outputs in the Runtime library should be interruptable --- Key: TEZ-1752 URL: https://issues.apache.org/jira/browse/TEZ-1752 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, TEZ-1752.4.patch, TEZ-1752.5.patch Without this, it is not possible to preempt tasks without killing containers. There's still the problem of Processors not supporting interrupts. We may need API enhancements to query IPOs on whether they are interruptible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable
[ https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-1752: -- Attachment: TEZ-1752.5.patch Inputs / Outputs in the Runtime library should be interruptable --- Key: TEZ-1752 URL: https://issues.apache.org/jira/browse/TEZ-1752 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, TEZ-1752.4.patch, TEZ-1752.5.patch Without this, it is not possible to preempt tasks without killing containers. There's still the problem of Processors not supporting interrupts. We may need API enhancements to query IPOs on whether they are interruptible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2379: - Priority: Blocker (was: Major) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Blocker Attachments: application_1429683757595_0479.log {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.
[ https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517127#comment-14517127 ] Prakash Ramachandran commented on TEZ-2226: --- +1 LGTM. one minor item (separate JIRA) could be in DAGClientImpl to not read from timeline if writes are disabled. Disable writing history to timeline if domain creation fails. - Key: TEZ-2226 URL: https://issues.apache.org/jira/browse/TEZ-2226 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Chang Li Priority: Blocker Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-1752 PreCommit Build #563
Jira: https://issues.apache.org/jira/browse/TEZ-1752 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/563/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2791 lines...] [INFO] Final Memory: 69M/927M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728815/TEZ-1752.5.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/563//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/563//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 884316e5ac186e4fe11d60b806f9cf953ead2f28 logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #558 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2618861 bytes Compression is 4.8% Took 1.9 sec Description set: TEZ-1752 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-2383) release sort buffers on close
[ https://issues.apache.org/jira/browse/TEZ-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2383: -- Attachment: TEZ-2383.1.patch [~sseth] - Can you please review when you find time? release sort buffers on close - Key: TEZ-2383 URL: https://issues.apache.org/jira/browse/TEZ-2383 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Attachments: TEZ-2383.1.patch Currently they get released when sorter object gets GC-ed, but it might be good to explicitly release them on close as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2383) release sort buffers on close
[ https://issues.apache.org/jira/browse/TEZ-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned TEZ-2383: - Assignee: Rajesh Balamohan release sort buffers on close - Key: TEZ-2383 URL: https://issues.apache.org/jira/browse/TEZ-2383 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2383.1.patch Currently they get released when sorter object gets GC-ed, but it might be good to explicitly release them on close as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters
[ https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2360: -- Attachment: TEZ-2360.3.patch added Test case to check per io counters are aggregated. since task counters are not available from the client, did the check at vertex level. NOTE: some counters like TaskCounter.FIRST_EVENT_RECEIVED might not be meaningful at task level. per-io counters flag should generate both overall and per-edge counters Key: TEZ-2360 URL: https://issues.apache.org/jira/browse/TEZ-2360 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Prakash Ramachandran Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch Currently, the per-io flag disables overall per task counters and retains only per edge counters. It would be useful to have both overall and per edge counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path
[ https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2305: - Target Version/s: 0.5.4, 0.6.1 Affects Version/s: (was: 0.7.0) MR compatibility sleep job fails with IOException: Undefined job output-path Key: TEZ-2305 URL: https://issues.apache.org/jira/browse/TEZ-2305 Project: Apache Tez Issue Type: Bug Reporter: Tassapol Athiapinya Assignee: Jeff Zhang Priority: Critical Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305-5.patch, TEZ-2305.1.patch, TEZ-2305.2.patch Running MR sleep job has an IOException. {code} 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with state FAILED due to: Vertex failed, vertexName=initialmap, vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 1 failed, 
info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121) at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.io.IOException: Undefined job output-path at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248) at
[jira] [Commented] (TEZ-2383) release sort buffers on close
[ https://issues.apache.org/jira/browse/TEZ-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518728#comment-14518728 ] Gopal V commented on TEZ-2383: -- [~rajesh.balamohan]: This is likely to be a fix limited to the sort implementation. You can broaden this fix to all possible edge types, by moving up the abstraction. Since ProcessorContext is the visible opaque implementation setup for the tez-api, it would be better to clean up that chain of references - cleaning up TezTaskContextImpl::runTimeTask within the ProcessorContext impl when the logical IO processor close is called. This follows a neat contract as any user-code which holds onto a processor context object after the close of the task runtime will be holding onto an empty/unusable shell of a context. release sort buffers on close - Key: TEZ-2383 URL: https://issues.apache.org/jira/browse/TEZ-2383 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2383.1.patch Currently they get released when sorter object gets GC-ed, but it might be good to explicitly release them on close as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
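The idea in this thread (drop buffer references explicitly on close instead of waiting for the owning object to be GC-ed) can be sketched in a few lines. This is only an illustration under stated assumptions: ReleasableSorter, heldBytes and isReleased are hypothetical names, not classes or methods from the Tez codebase or from the attached patch.

```java
/** Hypothetical sketch; this is not the Tez sorter implementation. */
public class ReleasableSorter implements AutoCloseable {
    // Stand-in for a large sort buffer owned by the sorter.
    private byte[] sortBuffer;

    public ReleasableSorter(int bufferBytes) {
        this.sortBuffer = new byte[bufferBytes];
    }

    /** True once close() has dropped the buffer reference. */
    public boolean isReleased() {
        return sortBuffer == null;
    }

    /** Bytes currently held by this sorter; 0 after close(). */
    public int heldBytes() {
        return sortBuffer == null ? 0 : sortBuffer.length;
    }

    @Override
    public void close() {
        // Drop the reference so the buffer becomes collectable even if
        // other code still holds on to this sorter object.
        sortBuffer = null;
    }
}
```

After close(), any code that still references the sorter holds only an empty shell, which is the same contract suggested above for the processor context.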
[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.
[ https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517354#comment-14517354 ] Hitesh Shah commented on TEZ-2226: -- Good catch [~pramachandran]. Will file a separate jira for that. Committing shortly. Disable writing history to timeline if domain creation fails. - Key: TEZ-2226 URL: https://issues.apache.org/jira/browse/TEZ-2226 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Chang Li Priority: Blocker Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-924) InputFailedEvent handling for Shuffle
[ https://issues.apache.org/jira/browse/TEZ-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517398#comment-14517398 ] Hitesh Shah commented on TEZ-924: - [~rajesh.balamohan] [~sseth] Is this still for 0.7, or should we move it to 0.8? InputFailedEvent handling for Shuffle - Key: TEZ-924 URL: https://issues.apache.org/jira/browse/TEZ-924 Project: Apache Tez Issue Type: Bug Reporter: Siddharth Seth Priority: Critical Shuffle receives batches of Events to process from the AM. Given the way these events are sent over to the ShuffleHandlers and the way they're processed, it's possible that Shuffle will start fetching data for an Event that is subsequently marked as failed (via an InputFailedEvent): 1) The AM sends events in batches. An InputFailedEvent for a specific Input may not be part of the same batch which contained the original event which is being marked bad. 2) The ShuffleEventHandler processes the events in each batch one event at a time - so even if the InputFailedEvent follows - it's possible for Shuffle to start fetching data from a Failed Input. The AM needs to change to invalidate Inputs up front - so that related events don't span batches. Alternatively, it needs to apply the InputFailedEvent to the original event being sent. The Shuffle itself should process a batch update as a batch - that would prevent fetchers from starting early even though there may be additional events for the same host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
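The "process a batch update as a batch" suggestion above can be illustrated with a minimal sketch. It assumes a simplified event model: BatchEventFilter, Event, sourceIndex and attempt are hypothetical names for illustration only, not the Tez event classes or the ShuffleEventHandler API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; these types are not the Tez event classes. */
public class BatchEventFilter {

    /** Minimal stand-in for an event: announces an input attempt, or marks it failed. */
    public static final class Event {
        final int sourceIndex;
        final int attempt;
        final boolean failed;

        public Event(int sourceIndex, int attempt, boolean failed) {
            this.sourceIndex = sourceIndex;
            this.attempt = attempt;
            this.failed = failed;
        }

        String key() {
            return sourceIndex + ":" + attempt;
        }
    }

    /**
     * Looks at the whole batch before scheduling anything, and returns the
     * keys of inputs that are not marked failed anywhere in the same batch.
     */
    public static List<String> fetchable(List<Event> batch) {
        // First pass: collect every input attempt that the batch marks as failed.
        Set<String> failedKeys = new HashSet<>();
        for (Event e : batch) {
            if (e.failed) {
                failedKeys.add(e.key());
            }
        }
        // Second pass: keep only announcements that were not invalidated.
        List<String> result = new ArrayList<>();
        for (Event e : batch) {
            if (!e.failed && !failedKeys.contains(e.key())) {
                result.add(e.key());
            }
        }
        return result;
    }
}
```

A two-pass filter like this only covers a failure that arrives in the same batch as the original event. A failure that arrives in a later batch still needs the AM-side change described in the issue.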
[jira] [Created] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled
Hitesh Shah created TEZ-2380: Summary: Disable fall back to reading from timeline if timeline disabled Key: TEZ-2380 URL: https://issues.apache.org/jira/browse/TEZ-2380 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah As part of TEZ-2226, timeline logging can get disabled in cases where domain creation fails. In this scenario, the DAG client should also not fall back to reading from timeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-122) Finalize v1 DAGPlan and consider versioning support
[ https://issues.apache.org/jira/browse/TEZ-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-122: Target Version/s: 0.8.0 (was: 0.7.0) Finalize v1 DAGPlan and consider versioning support --- Key: TEZ-122 URL: https://issues.apache.org/jira/browse/TEZ-122 Project: Apache Tez Issue Type: Bug Reporter: Mike Liddell Priority: Critical The DAGPlan format will change over time. The current definition may be too strict with required fields and not have the desired flexibility for versioning. Options/suggestions: 1. Remove structural constraints (i.e. everything is optional) so that additive changes can succeed and so that mandatory fields in v1 can become optional without difficulty. This doesn't help with naming changes and doesn't help if semantics change. 2. Introduce a version field to the protobuf. This version field should probably be the only mandatory field in the top-level message so that it can always be parsed. Versioning may introduce additional Protobuf definitions, e.g. DAGPlanv2 and so on. A combination of 1 and 2 seems wise. If all fields are optional, then we don't get stuck with unwanted mandatory fields, and if a version number is included, the AppMaster can tell what semantics / assumptions should apply when consuming the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
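The combination of the two options in the issue above (one mandatory version field, everything else optional) can be sketched in plain Java without protobuf. This is an illustration only: VersionedPlan, its fields and describe are hypothetical names, not the DAGPlan message or any Tez API.

```java
import java.util.Optional;

/** Hypothetical sketch; not the DAGPlan protobuf message. */
public class VersionedPlan {
    // The only mandatory field: it must always be readable by a consumer.
    public final int version;
    // Everything else is optional, so fields can be added or dropped later.
    public final Optional<String> name;
    public final Optional<Integer> vertexCount;

    public VersionedPlan(int version, String name, Integer vertexCount) {
        this.version = version;
        this.name = Optional.ofNullable(name);
        this.vertexCount = Optional.ofNullable(vertexCount);
    }

    /** The consumer picks the semantics to apply from the version number. */
    public static String describe(VersionedPlan plan) {
        switch (plan.version) {
            case 1:
                return "v1 plan: " + plan.name.orElse("(unnamed)");
            case 2:
                return "v2 plan: " + plan.name.orElse("(unnamed)")
                        + ", vertices=" + plan.vertexCount.orElse(0);
            default:
                throw new IllegalArgumentException(
                        "Unsupported plan version: " + plan.version);
        }
    }
}
```

A consumer written this way can reject a plan from a newer, unknown version with a clear error instead of misreading its fields.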
[jira] [Commented] (TEZ-2309) Fix slf4j dependencies for tez modules
[ https://issues.apache.org/jira/browse/TEZ-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517403#comment-14517403 ] Hitesh Shah commented on TEZ-2309: -- [~vasanthkumar] Mind filing a separate jira for fixing the dep in tez-dag and attaching your current patch there? I will move this out to 0.8 so that we can use this as the main jira to clean up logging api dependencies and all the necessary issues with slf4j-log4j. Fix slf4j dependencies for tez modules --- Key: TEZ-2309 URL: https://issues.apache.org/jira/browse/TEZ-2309 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Priority: Critical Attachments: TEZ-2309.1.patch Most modules should depend only on slf4j-api and not slf4j-log4j12. I believe only tez-dag and tez-runtime-internals might need the log4j dependency due to log rotation related code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.
[ https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517387#comment-14517387 ] Hitesh Shah commented on TEZ-2226: -- Created TEZ-2380 for [~pramachandran]'s comment. Disable writing history to timeline if domain creation fails. - Key: TEZ-2226 URL: https://issues.apache.org/jira/browse/TEZ-2226 Project: Apache Tez Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Chang Li Priority: Blocker Fix For: 0.6.1 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1138) Access to some mapred counters is broken after TEZ-901
[ https://issues.apache.org/jira/browse/TEZ-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517407#comment-14517407 ] Hitesh Shah commented on TEZ-1138: -- [~sseth] [~rekhajoshm] Will either of you be looking at this? Or should we move this to 0.8? Access to some mapred counters is broken after TEZ-901 -- Key: TEZ-1138 URL: https://issues.apache.org/jira/browse/TEZ-1138 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Rekha Joshi Priority: Critical AbstractCounters.java maps mapred.Task$Counter counters to TaskCounter {code} private static final Map<String, String> legacyMap = Maps.newHashMap(); static { legacyMap.put("org.apache.hadoop.mapred.Task$Counter", TaskCounter.class.getName()); legacyMap.put("org.apache.hadoop.mapred.JobInProgress$Counter", JobCounter.class.getName()); legacyMap.put("FileSystemCounters", FileSystemCounter.class.getName()); }{code} But TEZ-901 removed a bunch of counters like MAP_OUTPUT_BYTES from TaskCounter. Now if MAP_OUTPUT_BYTES is retrieved from MR code, it fails with an enum-not-found exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
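The failure mode described above can be reproduced in miniature: a legacy group name is mapped to an enum, and the lookup breaks once a constant has been removed from that enum. The sketch below is an assumption-laden illustration, not the Tez or Hadoop code: LegacyCounterLookup, TrimmedTaskCounter and find are hypothetical names, and returning null is just one possible way to tolerate the missing constant, not the fix adopted by the project.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not AbstractCounters or TaskCounter. */
public class LegacyCounterLookup {

    /** Stand-in for a counter enum from which MAP_OUTPUT_BYTES was removed. */
    public enum TrimmedTaskCounter { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS }

    // Legacy group name -> name of the enum class that replaced it.
    private static final Map<String, String> LEGACY_MAP = new HashMap<>();
    static {
        LEGACY_MAP.put("org.apache.hadoop.mapred.Task$Counter",
                TrimmedTaskCounter.class.getName());
    }

    /**
     * Resolves a counter requested under a legacy group name.
     * Returns null when the group is unknown or the constant no longer exists.
     */
    public static TrimmedTaskCounter find(String legacyGroup, String counterName) {
        String mapped = LEGACY_MAP.get(legacyGroup);
        if (!TrimmedTaskCounter.class.getName().equals(mapped)) {
            return null;
        }
        try {
            return TrimmedTaskCounter.valueOf(counterName);
        } catch (IllegalArgumentException e) {
            // Enum.valueOf throws this when the constant was removed,
            // which corresponds to the "enum not found" failure above.
            return null;
        }
    }
}
```

Without the try/catch, a request for MAP_OUTPUT_BYTES would propagate the IllegalArgumentException to the MR caller, which matches the behaviour reported in the issue.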
[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch
[ https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517405#comment-14517405 ] Hitesh Shah commented on TEZ-1421: -- [~ozawa] Did you manage to get a chance to look at this? If not, we can move this out to 0.8. MRCombiner throws NPE in MapredWordCount on master branch - Key: TEZ-1421 URL: https://issues.apache.org/jira/browse/TEZ-1421 Project: Apache Tez Issue Type: Bug Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Priority: Critical I tested MapredWordCount against 70GB generated by RandomTextWriter. When a Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work correctly. {quote} Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122) at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112) at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472) at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605) at org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-2046) Make timeline server putDomain exceptions non-fatal - full feature
[ https://issues.apache.org/jira/browse/TEZ-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved TEZ-2046. -- Resolution: Fixed Make timeline server putDomain exceptions non-fatal - full feature -- Key: TEZ-2046 URL: https://issues.apache.org/jira/browse/TEZ-2046 Project: Apache Tez Issue Type: Bug Reporter: Jonathan Eagles Priority: Blocker As pointed out, this could be a problem in secure environments. This jira will provide a full fix that will disable timeline server puts for tez jobs that fail to successfully create a domain. In addition, it may trigger off of the client configuration created in YARN-3144. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2164: - Target Version/s: 0.8.0 (was: 0.7.0) Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Priority: Critical Attachments: allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez
[ https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517395#comment-14517395 ] Hitesh Shah commented on TEZ-2164: -- Moving to 0.8.0 as this work may result in some level of incompatibilities. Shade the guava version used by Tez --- Key: TEZ-2164 URL: https://issues.apache.org/jira/browse/TEZ-2164 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Priority: Critical Attachments: allow-guava-16.0.1.patch Should allow us to upgrade to a newer version without shipping a guava dependency. Would be good to do this in 0.7 so that we stop shipping guava as early as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2309) Fix slf4j dependencies for tez modules
[ https://issues.apache.org/jira/browse/TEZ-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2309: - Target Version/s: 0.8.0 (was: 0.7.0) Fix slf4j dependencies for tez modules --- Key: TEZ-2309 URL: https://issues.apache.org/jira/browse/TEZ-2309 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Priority: Critical Attachments: TEZ-2309.1.patch Most modules should depend only on slf4j-api and not slf4j-log4j12. I believe only tez-dag and tez-runtime-internals might need the log4j dependency due to log rotation related code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2381) Fixes after rebase 04/28
[ https://issues.apache.org/jira/browse/TEZ-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2381: Attachment: TEZ-2381.1.txt Fixes after rebase 04/28 Key: TEZ-2381 URL: https://issues.apache.org/jira/browse/TEZ-2381 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: TEZ-2381.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1912) Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable tez.runtime.shuffle.memory-to-memory.segments
[ https://issues.apache.org/jira/browse/TEZ-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1912: - Fix Version/s: 0.7.0 Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable tez.runtime.shuffle.memory-to-memory.segments -- Key: TEZ-1912 URL: https://issues.apache.org/jira/browse/TEZ-1912 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Fix For: 0.7.0 Attachments: TEZ-1912.1.patch Merge exceptions are thrown when running a hive query on tez with the following setting. It works fine without mem-to-mem merge setting. {code} 2015-01-04 20:04:01,371 ERROR [ShuffleAndMergeRunner [Map_1]] orderedgrouped.Shuffle: ShuffleRunner failed with error org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:364) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:327) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Rec# 22630125: Negative value-length: -1 at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:720) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.readRawKey(InMemoryReader.java:104) at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.readRawKey(TezMerger.java:329) at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.adjustPriorityQueue(TezMerger.java:500) at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.next(TezMerger.java:545) at 
org.apache.tez.runtime.library.common.sort.impl.TezMerger.writeFile(TezMerger.java:204) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:862) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:473) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:362) ... 5 more {code} {code} $HIVE_HOME/bin/hive -hiveconf tez.runtime.io.sort.factor=200 --hiveconf tez.shuffle-vertex-manager.min-src-fraction=1.0 --hiveconf tez.shuffle-vertex-manager.max-src-fraction=1.0 --hiveconf hive.tez.auto.reducer.parallelism=false --hiveconf tez.am.heartbeat.interval-ms.max=20 --hiveconf tez.runtime.io.sort.mb=1200 --hiveconf tez.runtime.sort.threads=2 --hiveconf tez.history.logging.service.class=org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService --hiveconf hive.tez.container.size=4096 --hiveconf tez.runtime.shuffle.memory-to-memory.enable=true --hiveconf tez.runtime.shuffle.memory-to-memory.segments=4 --10 TB dataset use tpcds4_bin_partitioned_orc_1; drop table testData; create table testData as select ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_quantity,ss_sold_date from store_sales distribute by ss_sold_date; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1912) Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable tez.runtime.shuffle.memory-to-memory.segments
[ https://issues.apache.org/jira/browse/TEZ-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1912: - Assignee: Rajesh Balamohan Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable tez.runtime.shuffle.memory-to-memory.segments -- Key: TEZ-1912 URL: https://issues.apache.org/jira/browse/TEZ-1912 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Fix For: 0.7.0 Attachments: TEZ-1912.1.patch Merge exceptions are thrown when running a hive query on tez with the following setting. It works fine without mem-to-mem merge setting. {code} 2015-01-04 20:04:01,371 ERROR [ShuffleAndMergeRunner [Map_1]] orderedgrouped.Shuffle: ShuffleRunner failed with error org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:364) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:327) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Rec# 22630125: Negative value-length: -1 at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:720) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.readRawKey(InMemoryReader.java:104) at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.readRawKey(TezMerger.java:329) at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.adjustPriorityQueue(TezMerger.java:500) at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.next(TezMerger.java:545) at 
org.apache.tez.runtime.library.common.sort.impl.TezMerger.writeFile(TezMerger.java:204) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:862) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:473) at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:362) ... 5 more {code} {code} $HIVE_HOME/bin/hive -hiveconf tez.runtime.io.sort.factor=200 --hiveconf tez.shuffle-vertex-manager.min-src-fraction=1.0 --hiveconf tez.shuffle-vertex-manager.max-src-fraction=1.0 --hiveconf hive.tez.auto.reducer.parallelism=false --hiveconf tez.am.heartbeat.interval-ms.max=20 --hiveconf tez.runtime.io.sort.mb=1200 --hiveconf tez.runtime.sort.threads=2 --hiveconf tez.history.logging.service.class=org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService --hiveconf hive.tez.container.size=4096 --hiveconf tez.runtime.shuffle.memory-to-memory.enable=true --hiveconf tez.runtime.shuffle.memory-to-memory.segments=4 --10 TB dataset use tpcds4_bin_partitioned_orc_1; drop table testData; create table testData as select ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_quantity,ss_sold_date from store_sales distribute by ss_sold_date; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2377) RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat
[ https://issues.apache.org/jira/browse/TEZ-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518144#comment-14518144 ] Bikas Saha commented on TEZ-2377: - Does the same fix need to be made to MRInput(Base)? RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat --- Key: TEZ-2377 URL: https://issues.apache.org/jira/browse/TEZ-2377 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2377.1.patch {code} yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar randomwriter -Dmapreduce.randomwriter.totalbytes=10737418 /tmp/test1 {code} This ends up generating TextOutputFormat. {code} yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar sort -Dmapreduce.framework.name=yarn-tez -r 5 /tmp/test1 /tmp/test_sorted {code} This ends up throwing error {noformat} Failure while running task:java.io.IOException: hdfs://tez-vm:56565/tmp/test1/part-0 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1851) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1760) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1774) at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:149) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.init(MRReaderMapReduce.java:78) at org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:475) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.6.A.patch Patch with test issues fixed. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
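The TEZ-776 description quotes 64 bytes per DataMovementEvent and notes that the connection pattern limits how many tasks can be processed. A rough back-of-the-envelope sketch of why that matters is below. It assumes, purely for illustration, a pattern in which the AM holds one event per (producer task, consumer task) pair; the class and method names are hypothetical and are not Tez APIs, and the real retained size depends on how events are actually stored.

```java
/** Hypothetical estimate of AM heap held by stored events; not a Tez class. */
final class EventHeapEstimate {

  private EventHeapEstimate() {}

  /**
   * Assumes one stored event per (producer, consumer) pair, which is an
   * illustrative assumption about the connection pattern, not a measured fact.
   * multiplyExact fails loudly instead of silently overflowing.
   */
  static long estimateBytes(long producerTasks, long consumerTasks, long bytesPerEvent) {
    long events = Math.multiplyExact(producerTasks, consumerTasks);
    return Math.multiplyExact(events, bytesPerEvent);
  }

  public static void main(String[] args) {
    // 64 bytes per event is the figure quoted in the issue description.
    long bytes = estimateBytes(10_000L, 1_000L, 64L);
    System.out.println(bytes + " bytes held by events");
  }
}
```

Under these assumptions, 10,000 producers feeding 1,000 consumers would hold 640,000,000 bytes of events, which is why the per-event size and the connection pattern together bound the number of tasks.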
[jira] [Updated] (TEZ-946) Tez loses buffer-cache performance by running interleaved vertexes
[ https://issues.apache.org/jira/browse/TEZ-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-946: Target Version/s: 0.8.0 (was: 0.7.0) Tez loses buffer-cache performance by running interleaved vertexes -- Key: TEZ-946 URL: https://issues.apache.org/jira/browse/TEZ-946 Project: Apache Tez Issue Type: Bug Reporter: Gopal V Attachments: union-10.svg For a task which has multiple reduce vertexes running to generate UNION ops, the current Tez behaviour causes bad cache performance as well as bad perf with the object registry. The map spill files get paged in and out of cache, when I was running a large query which had multiple reducers pulling data off different shuffle edges at the same time. Along with this, whenever a map-join vertex is interleaved with a reducer vertex, the map-join hashtable gets dropped in the transition. It would be beneficial to schedule the vertexes at the same level with some priority so that we finish them faster through better buffer-cache hit-rate and object-registry hit-rate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TEZ-2381) Fixes after rebase 04/28
[ https://issues.apache.org/jira/browse/TEZ-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved TEZ-2381. - Resolution: Fixed Fix Version/s: TEZ-2003 Fixes after rebase 04/28 Key: TEZ-2381 URL: https://issues.apache.org/jira/browse/TEZ-2381 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: TEZ-2003 Attachments: TEZ-2381.1.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518219#comment-14518219 ] Hitesh Shah commented on TEZ-2379: -- Additional question: speculation enabled? org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Blocker Attachments: application_1429683757595_0479.log {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
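The stack trace in TEZ-2379 shows a table-driven state machine rejecting an event that has no registered transition from the current state. The sketch below reproduces that failure mode in isolation. It is a simplified, hypothetical model: the class, the reduced set of states, and the `ignoreLateKills` switch are assumptions made for illustration, not the TaskImpl state machine or the fix adopted for this issue.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified task state machine; not Tez's TaskImpl. */
final class TaskStateMachineSketch {
  enum State { RUNNING, KILL_WAIT, KILLED }
  enum Event { T_TERMINATE, T_ATTEMPT_KILLED }

  /** Mirrors the shape of the reported error message. */
  static final class InvalidTransition extends RuntimeException {
    InvalidTransition(State state, Event event) {
      super("Invalid event: " + event + " at " + state);
    }
  }

  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  private State current = State.RUNNING;

  /**
   * @param ignoreLateKills when true, registers a self-transition so that a
   *        late T_ATTEMPT_KILLED arriving in KILLED is tolerated (one possible
   *        way to handle it, shown only for contrast).
   */
  TaskStateMachineSketch(boolean ignoreLateKills) {
    add(State.RUNNING, Event.T_TERMINATE, State.KILL_WAIT);
    add(State.KILL_WAIT, Event.T_ATTEMPT_KILLED, State.KILLED);
    if (ignoreLateKills) {
      add(State.KILLED, Event.T_ATTEMPT_KILLED, State.KILLED);
    }
  }

  private void add(State from, Event on, State to) {
    table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
  }

  /** Applies the event, or throws if no transition is registered for it. */
  State handle(Event event) {
    Map<Event, State> row = table.get(current);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new InvalidTransition(current, event);
    }
    current = next;
    return current;
  }
}
```

With `ignoreLateKills` set to false, a second T_ATTEMPT_KILLED delivered after the task has reached KILLED throws, which matches the shape of the reported error. Whether a second kill notification should arrive at all (for example when more than one attempt exists) is the open question in the thread.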
[jira] [Comment Edited] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518219#comment-14518219 ] Hitesh Shah edited comment on TEZ-2379 at 4/28/15 10:19 PM: Additional question: speculation enabled? \cc [~bikassaha] was (Author: hitesh): Additional question: speculation enabled? org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Blocker Attachments: application_1429683757595_0479.log {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1348) Setup configs required for local mode automatically, instead of relying on changes to tez-site
[ https://issues.apache.org/jira/browse/TEZ-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1348: - Target Version/s: 0.8.0 (was: 0.6.0) Setup configs required for local mode automatically, instead of relying on changes to tez-site -- Key: TEZ-1348 URL: https://issues.apache.org/jira/browse/TEZ-1348 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Priority: Critical In TEZ-717, I incorrectly thought setting fs.defaultFS programmatically in tez-site would work for local mode. Currently the requirement is that tez-site.xml must have fs.defaultFS set to file:///. While that works, it doesn't allow for seamless execution in either local-mode or on a cluster. The main issue here is that when Inputs / Outputs are configured - they use a version of configuration which reads tez-site, and do not use the configuration from the client itself (which is correct behaviour). Not sure what a good way to fix this is. 1) It may be possible to override this value each time an instance of Configuration/TezConfiguration is created. One possible way would be to statically add a default resource to Configuration the moment a local client is created. 2) Provide information in the contexts on whether this is local or not. This is fairly ugly, and would get in the way of running mixed mode tasks. Anyone have other suggestions? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1537) Expose edge information in I/P/IContexts
[ https://issues.apache.org/jira/browse/TEZ-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1537: - Target Version/s: 0.8.0 (was: 0.6.0) Expose edge information in I/P/IContexts Key: TEZ-1537 URL: https://issues.apache.org/jira/browse/TEZ-1537 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Inputs / Outputs can make use of this information for specific optimizations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-568) Allow I/P/Os to report specific time information to the AM
[ https://issues.apache.org/jira/browse/TEZ-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-568: Target Version/s: 0.8.0 (was: 0.7.0) Allow I/P/Os to report specific time information to the AM --- Key: TEZ-568 URL: https://issues.apache.org/jira/browse/TEZ-568 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical The AM can include this information in the history logs, which allows for much easier analysis of job performance. As an example, shuffle reporting completion time, Tez itself reporting initialization time etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-568) Allow I/P/Os to report specific time information to the AM
[ https://issues.apache.org/jira/browse/TEZ-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518193#comment-14518193 ] Hitesh Shah commented on TEZ-568: - Moving to 0.8 Allow I/P/Os to report specific time information to the AM --- Key: TEZ-568 URL: https://issues.apache.org/jira/browse/TEZ-568 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical The AM can include this information in the history logs, which allows for much easier analysis of job performance. As an example, shuffle reporting completion time, Tez itself reporting initialization time etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518194#comment-14518194 ] Rajesh Balamohan commented on TEZ-2379: --- [~hitesh] - Yes, this is with master codebase. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Blocker Attachments: application_1429683757595_0479.log {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1564) State machine error: Invalid event: T_SCHEDULE at SCHEDULED
[ https://issues.apache.org/jira/browse/TEZ-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1564: - Target Version/s: 0.7.0 (was: 0.6.0) State machine error: Invalid event: T_SCHEDULE at SCHEDULED --- Key: TEZ-1564 URL: https://issues.apache.org/jira/browse/TEZ-1564 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Critical Attachments: applogs.txt.tar.gz, dag.dot ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskImpl: Can't handle this event at current state for task_1409722953518_0162_1_07_00 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_SCHEDULE at SCHEDULED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:827) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:95) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1604) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1590) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:724) I will attach the dag + app logs soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1564) State machine error: Invalid event: T_SCHEDULE at SCHEDULED
[ https://issues.apache.org/jira/browse/TEZ-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-1564: - Target Version/s: 0.7.0, 0.6.1 (was: 0.7.0) State machine error: Invalid event: T_SCHEDULE at SCHEDULED --- Key: TEZ-1564 URL: https://issues.apache.org/jira/browse/TEZ-1564 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Priority: Critical Attachments: applogs.txt.tar.gz, dag.dot ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskImpl: Can't handle this event at current state for task_1409722953518_0162_1_07_00 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_SCHEDULE at SCHEDULED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:827) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:95) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1604) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1590) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:724) I will attach the dag + app logs soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517978#comment-14517978 ] Siddharth Seth commented on TEZ-1897: - Quick glance. This should be disabled by default. Will look in detail a little later. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2381) Fixes after rebase 04/28
Siddharth Seth created TEZ-2381: --- Summary: Fixes after rebase 04/28 Key: TEZ-2381 URL: https://issues.apache.org/jira/browse/TEZ-2381 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517996#comment-14517996 ] Bikas Saha commented on TEZ-1897: - the patch enabled it by default to get the jenkins run with it. It will be off by default. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.6.A.patch Re-uploading patch as the previous one had incomplete diffs. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: (was: TEZ-776.6.A.patch) Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2376) Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE
[ https://issues.apache.org/jira/browse/TEZ-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516639#comment-14516639 ] TezQA commented on TEZ-2376: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728721/TEZ-2376-1.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/561//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/561//console This message is automatically generated. Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE - Key: TEZ-2376 URL: https://issues.apache.org/jira/browse/TEZ-2376 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2376-1.patch It is never used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516643#comment-14516643 ] Jeff Zhang commented on TEZ-1897: - Comments: * Rename registerAndCreateDispatcher to registerAndCreateConcurrentDispatcher ? Rename registerWithExistingDispatcher to registerWithExistingConcurrentDispatcher ? * Javadoc for AsyncDispatcherConcurrent is the same as AsyncDispatcher * No concurrentDispatcher checking in registerAndCreateDispatcher(Class<? extends Enum> eventType, EventHandler handler, String dispatcherName) * Precondition numThreads should be at least 2 ? I think 1 is the same as AsyncDispatcher {code} public AsyncDispatcherConcurrent registerAndCreateDispatcher(Class<? extends Enum> eventType, EventHandler handler, String dispatcherName, int numThreads) {code} * TaskAttemptImpl.java Some places still use getTask().getVertex(), can change to getVertex directly. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
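The TEZ-1897 description contrasts single-threaded event processing with allowing higher concurrency, and the review comment above asks for a precondition that numThreads be at least 2. A minimal sketch of one way such a dispatcher could be structured follows. It is an assumption-laden illustration, not the AsyncDispatcherConcurrent from the patch: the class name, the routing-key parameter, and the choice of one single-threaded executor per worker are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a multi-threaded dispatcher; not the patch's class. */
final class ConcurrentDispatcherSketch<E> {
  private final List<ExecutorService> workers = new ArrayList<>();
  private final Consumer<E> handler;

  ConcurrentDispatcherSketch(int numThreads, Consumer<E> handler) {
    // With a single thread this would behave like a plain serial dispatcher.
    if (numThreads < 2) {
      throw new IllegalArgumentException("numThreads must be at least 2, got " + numThreads);
    }
    this.handler = handler;
    for (int i = 0; i < numThreads; i++) {
      workers.add(Executors.newSingleThreadExecutor());
    }
  }

  /**
   * Events sharing a routing key always land on the same worker, so they are
   * handled in submission order; events with different keys may run in parallel.
   */
  void dispatch(Object routingKey, E event) {
    int index = Math.floorMod(routingKey.hashCode(), workers.size());
    workers.get(index).execute(() -> handler.accept(event));
  }

  /** Stops accepting events and waits for queued ones; returns true if all drained. */
  boolean shutdown(long timeoutMillis) {
    for (ExecutorService worker : workers) {
      worker.shutdown();
    }
    boolean drained = true;
    try {
      for (ExecutorService worker : workers) {
        drained &= worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return false;
    }
    return drained;
  }
}
```

Routing by key is a design choice made here so that events for the same entity (for instance, the same vertex) keep their relative order while unrelated entities proceed in parallel. The handler must be thread-safe, since different workers may call it at the same time.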
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.5.patch Cancelling and re-uploading patch due to build error. Triggering build again. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517533#comment-14517533 ] TezQA commented on TEZ-776: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728848/TEZ-776.5.patch against master revision 5b2f011. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.TestTaskAttemptListenerImplTezDag org.apache.tez.dag.app.dag.impl.TestVertexImpl Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/564//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/564//console This message is automatically generated. 
Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2372) TestAMRecovery failing in latest build
[ https://issues.apache.org/jira/browse/TEZ-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517516#comment-14517516 ] Hitesh Shah commented on TEZ-2372: -- For the next failure, take a look at https://builds.apache.org/job/Tez-Build/ws/ - this should get you to the actual mini cluster dirs as well as the test output log file. TestAMRecovery failing in latest build --- Key: TEZ-2372 URL: https://issues.apache.org/jira/browse/TEZ-2372 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah https://builds.apache.org/job/Tez-Build/1018/console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: (was: TEZ-776.5.patch) Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled
[ https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned TEZ-2380: - Assignee: Chang Li Disable fall back to reading from timeline if timeline disabled Key: TEZ-2380 URL: https://issues.apache.org/jira/browse/TEZ-2380 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Chang Li As part of TEZ-2226, timeline logging can get disabled in cases where domain creation fails. In this scenario, the dag client should also not fall back to reading from timeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2376) Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE
[ https://issues.apache.org/jira/browse/TEZ-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516571#comment-14516571 ] TezQA commented on TEZ-2376: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728713/TEZ-2376-1.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/559//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/559//console This message is automatically generated. Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE - Key: TEZ-2376 URL: https://issues.apache.org/jira/browse/TEZ-2376 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2376-1.patch It is never used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2376 PreCommit Build #559
Jira: https://issues.apache.org/jira/browse/TEZ-2376 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/559/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2565 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12728713/TEZ-2376-1.patch against master revision 2522447. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/559//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/559//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 835a3cb8aff098eb2cba7623a690ab14d6481a71 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #558 Archived 44 artifacts Archive block size is 32768 Received 23 blocks and 1982656 bytes Compression is 27.5% Took 2 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 6 tests failed. 
REGRESSION: org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
expected:SUCCEEDED but was:FAILED

Stack Trace:
java.lang.AssertionError: expected:SUCCEEDED but was:FAILED
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:135)
    at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
    at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248)

REGRESSION: org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:653)
    at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118)
    at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
    at org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices(TestFaultTolerance.java:672)

REGRESSION: org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:653)
    at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:118)
    at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
    at org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit(TestFaultTolerance.java:297)

REGRESSION: org.apache.tez.test.TestFaultTolerance.testCascadingInputFailureWithExitSuccess

Error Message: