date:20150428


[ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516988#comment-14516988
 ] 

Rajesh Balamohan edited comment on TEZ-1752 at 4/28/15 1:01 PM:


- When the AM sends a shutdown signal, RuntimeTask.abort() is invoked so that 
processors get a chance to clean up.
- If the task is running, the thread running the processor is interrupted 
immediately; otherwise taskFuture.get() is interrupted.
- Inputs/outputs are added as and when they are scheduled for initialization 
in LogicalIOProcessorRuntimeTask.
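
The interrupt-on-abort behaviour described in the bullets above can be illustrated with plain java.util.concurrent. This is only a sketch: InterruptibleTaskSketch and its members are hypothetical names, not Tez classes, and the "processor" is simulated by a sleeping thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a task runner; not the Tez implementation.
class InterruptibleTaskSketch {
    final AtomicBoolean cleanedUp = new AtomicBoolean(false);
    private final CountDownLatch started = new CountDownLatch(1);
    private final CountDownLatch finished = new CountDownLatch(1);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private Future<?> taskFuture;

    // Starts a simulated processor that blocks until it is interrupted.
    void start() {
        taskFuture = executor.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(Long.MAX_VALUE);       // stands in for blocking I/O
            } catch (InterruptedException e) {
                cleanedUp.set(true);                // cleanup runs on interrupt
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            } finally {
                finished.countDown();
            }
        });
    }

    // Shutdown path: cancel(true) interrupts the worker thread if the task is running.
    boolean abort() {
        try {
            started.await();
            taskFuture.cancel(true);
            return finished.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        InterruptibleTaskSketch task = new InterruptibleTaskSketch();
        task.start();
        boolean stopped = task.abort();
        System.out.println("stopped=" + stopped + " cleanedUp=" + task.cleanedUp.get());
    }
}
```

The sketch only works because Thread.sleep responds to interrupts; a processor that never checks the interrupt flag would not stop, which is the limitation the issue description raises for Processors.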



was (Author: rajesh.balamohan):
- If the task is running, thread running the processor is interrupted; 
otherwise taskFuture.get() is interrupted.
- Adding the inputs/outputs as and when they are scheduled for initialization 
in LogicalIOProcessorRuntimeTask


 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, 
 TEZ-1752.4.patch


 Without this, it is not possible to preempt tasks without killing containers.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable


[ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517007#comment-14517007
 ] 

TezQA commented on TEZ-1752:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728780/TEZ-1752.4.patch
  against master revision 2522447.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.runtime.task.TestTaskExecution

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/562//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/562//console

This message is automatically generated.


[jira] [Commented] (TEZ-2231) Create project by-laws


[ 
https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517927#comment-14517927
 ] 

Hitesh Shah commented on TEZ-2231:
--

This one has been open for quite some time. [~sseth] [~bikassaha] [~rohini] 
mind doing a final review before I commit?

 Create project by-laws
 --

 Key: TEZ-2231
 URL: https://issues.apache.org/jira/browse/TEZ-2231
 Project: Apache Tez
  Issue Type: Task
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: by-laws.2.patch, by-laws.3.patch, by-laws.patch


 Define the Project by-laws.




[jira] [Updated] (TEZ-1522) Scheduling can result in out of order execution and slowdown of upstream work


 [ 
https://issues.apache.org/jira/browse/TEZ-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1522:

Target Version/s: 0.8.0

 Scheduling can result in out of order execution and slowdown of upstream work
 -

 Key: TEZ-1522
 URL: https://issues.apache.org/jira/browse/TEZ-1522
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: performance
 Attachments: TEZ-1522.1.wip.txt, TEZ-1522.2.wip.txt, 
 TEZ-1522.am.log.gz, task_runtime.svg


 M2              M7
  \              /
 (sg) \         /
       R3      / (b)
        \     /
     (b) \   /
          \ /
           M5
           |
           R6
 Please refer to the attachment (task runtime SVG). In this case, M5 was 
 scheduled much earlier than R3 (green in the diagram) and held on to many 
 containers, so R3 got fewer containers to work with. 
 The output below is from the status monitor while the job ran; Map_5 took up 
 almost all of the cluster's resources, whereas Reducer_3 got only a fraction 
 of the capacity.
 Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000  Reducer_6: 0/1
 
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000  Reducer_6: 0/1
 Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000  Reducer_6: 0/1
 ...
 Creating this JIRA as a placeholder for a scheduler enhancement. One 
 possibility could be to schedule fewer tasks in downstream vertices, based on 
 the information available for the upstream vertex.
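
One way to read the suggestion above is as a cap on downstream tasks that grows with upstream progress. The following is a purely hypothetical policy, not something proposed in the JIRA: DownstreamThrottleSketch and allowedDownstreamTasks are made-up names, and the proportional rule is an assumption chosen only to make the idea concrete.

```java
// Hypothetical throttle: not the Tez scheduler, only an illustration.
class DownstreamThrottleSketch {

    // Caps downstream tasks in proportion to the fraction of upstream tasks completed.
    static int allowedDownstreamTasks(int upstreamCompleted, int upstreamTotal, int downstreamTotal) {
        if (upstreamTotal <= 0) {
            return downstreamTotal; // no upstream work to wait for
        }
        long scaled = (long) downstreamTotal * upstreamCompleted / upstreamTotal;
        return (int) Math.min(downstreamTotal, Math.max(0L, scaled));
    }

    public static void main(String[] args) {
        // Figures taken from the last status line above: Reducer_3 at 308/8000, Map_5 has 1000 tasks.
        System.out.println(allowedDownstreamTasks(308, 8000, 1000));
    }
}
```

Under this assumed rule, the last status line would permit 38 Map_5 tasks rather than the 374 that were actually holding containers.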




[jira] [Updated] (TEZ-924) InputFailedEvent handling for Shuffle


 [ 
https://issues.apache.org/jira/browse/TEZ-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-924:
---
Target Version/s: 0.8.0  (was: 0.7.0)

 InputFailedEvent handling for Shuffle
 -

 Key: TEZ-924
 URL: https://issues.apache.org/jira/browse/TEZ-924
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Priority: Critical

 Shuffle receives batches of Events to process from the AM. Because of the way 
 these events are sent to the ShuffleHandlers and the way they are processed, 
 Shuffle may start fetching data for an Event that is subsequently marked as 
 failed (via an InputFailedEvent).
 1) The AM sends events in batches. An InputFailedEvent for a specific Input 
 may not be part of the same batch that contained the original event being 
 marked bad.
 2) The ShuffleEventHandler processes the events in each batch one at a time, 
 so even if the InputFailedEvent follows in the same batch, Shuffle may 
 already have started fetching data from the failed Input.
 The AM needs to change to invalidate Inputs up front, so that related events 
 don't span batches. Alternatively, it needs to apply the InputFailedEvent to 
 the original event being sent.
 The Shuffle itself should process a batch update as a batch. That would 
 prevent fetchers from starting early even though there may be additional 
 events for the same host.
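
The "process a batch as a batch" idea can be sketched as follows. The event model here is hypothetical (BatchShuffleSketch, Kind and Event are illustrative names, and real Tez events carry more fields such as attempt versions); the point is only that the whole batch is examined before any fetch is scheduled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event model; not the Tez shuffle classes.
class BatchShuffleSketch {
    enum Kind { DATA_AVAILABLE, INPUT_FAILED }

    static final class Event {
        final Kind kind;
        final int inputIndex;

        Event(Kind kind, int inputIndex) {
            this.kind = kind;
            this.inputIndex = inputIndex;
        }
    }

    // Applies every event in the batch before returning the inputs to fetch,
    // so an input that fails later in the same batch is never scheduled.
    static List<Integer> inputsToFetch(List<Event> batch) {
        Set<Integer> pending = new LinkedHashSet<>();
        for (Event e : batch) {
            if (e.kind == Kind.DATA_AVAILABLE) {
                pending.add(e.inputIndex);
            } else {
                pending.remove(e.inputIndex);
            }
        }
        return new ArrayList<>(pending);
    }

    public static void main(String[] args) {
        List<Event> batch = Arrays.asList(
                new Event(Kind.DATA_AVAILABLE, 0),
                new Event(Kind.DATA_AVAILABLE, 1),
                new Event(Kind.INPUT_FAILED, 0),
                new Event(Kind.DATA_AVAILABLE, 2));
        System.out.println(inputsToFetch(batch)); // prints [1, 2]
    }
}
```

This covers only failures that arrive in the same batch; a failure that arrives in a later batch still needs the AM-side change described in the issue.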




[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518321#comment-14518321
 ] 

Bikas Saha commented on TEZ-2379:
-

Deleting the log as it had private server names and IP addresses.

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Blocker

 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)




[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518337#comment-14518337
 ] 

Bikas Saha commented on TEZ-2379:
-

1) The client issued a DAG kill, which caused all tasks to be killed.
2) The task sent a kill request to its attempt and started waiting for the 
attempt to finish.
3) The attempt succeeded and sent done.
4) The task got the attempt success and went into the KILLED state because all 
its attempts were done.
5) The attempt got the kill request, honored it in 
TerminatedAfterSuccessTransition, and sent killed back to the task.
6) The task got attempt killed while in the KILLED state, and that is not handled.

From what I see in the code, 5 seems to be the problem here. The attempt 
should ignore a kill request if it is already done. An attempt is killed when a 
different attempt is successful and this attempt is not needed, or when the 
task is killed. A retroactive task kill, in which a successful task is killed 
(say, in order to run it again after a node failure), does not use this flow. So 
unless we can think of any other use cases for a successful attempt 
transitioning to killed, we should ignore the kill request in the attempt if 
the attempt has already succeeded.
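
The proposed fix can be shown with a toy state machine. This is a simplified sketch, not the Tez TaskAttemptImpl: AttemptStateSketch, its states and its events are hypothetical, and the counter merely stands in for the "killed" message an attempt would send back to its task.

```java
// Hypothetical, simplified attempt state machine; not the Tez implementation.
class AttemptStateSketch {
    enum State { RUNNING, SUCCEEDED, KILLED }
    enum Event { DONE, KILL_REQUEST }

    private State state = State.RUNNING;
    private int killedNotifications = 0; // stands in for "killed" messages sent to the task

    State handle(Event event) {
        switch (state) {
            case RUNNING:
                if (event == Event.DONE) {
                    state = State.SUCCEEDED;
                } else {
                    state = State.KILLED;
                    killedNotifications++;
                }
                break;
            case SUCCEEDED:
                // Proposed behaviour: a kill request after success is ignored,
                // so the task never sees "attempt killed" after "attempt succeeded".
                break;
            case KILLED:
                // Terminal state: nothing further to do.
                break;
        }
        return state;
    }

    int getKilledNotifications() {
        return killedNotifications;
    }

    public static void main(String[] args) {
        AttemptStateSketch attempt = new AttemptStateSketch();
        attempt.handle(Event.DONE);
        System.out.println(attempt.handle(Event.KILL_REQUEST)); // prints SUCCEEDED
    }
}
```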


[jira] [Updated] (TEZ-1897) Create a concurrent version of AsyncDispatcher


 [ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-1897:

Attachment: TEZ-1897.6.patch

Patch addresses comments and adds a new test.

 Create a concurrent version of AsyncDispatcher
 --

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
 TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch


 Currently, AsyncDispatcher processes events on a single thread. For events 
 that can be executed in parallel, e.g. vertex manager events, allowing higher 
 concurrency may be beneficial.
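
One possible shape for such a dispatcher is sketched below; it is not necessarily what the attached patches do. ConcurrentDispatcherSketch, the "lane" terminology and the ordering key are all hypothetical. Events that share a key are routed to the same single-threaded executor, so their relative order is kept while different keys can be handled in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical dispatcher; an illustration of the idea, not the Tez class.
class ConcurrentDispatcherSketch<E> {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final Consumer<E> handler;

    ConcurrentDispatcherSketch(int numLanes, Consumer<E> handler) {
        this.handler = handler;
        for (int i = 0; i < numLanes; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // Events with the same ordering key always land on the same lane,
    // so they are handled in submission order.
    void dispatch(Object orderingKey, E event) {
        int lane = Math.floorMod(orderingKey.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> handler.accept(event));
    }

    // Stops accepting events and waits for queued ones to be handled.
    boolean shutdownAndWait(long timeoutSeconds) {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ConcurrentDispatcherSketch<String> dispatcher =
                new ConcurrentDispatcherSketch<>(4, e -> System.out.println("handled " + e));
        dispatcher.dispatch("vertex-1", "event-a");
        dispatcher.dispatch("vertex-2", "event-b");
        dispatcher.shutdownAndWait(10);
    }
}
```

The trade-off in this design is that ordering is guaranteed only per key; handlers that touch shared state across keys would need their own synchronization.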




[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.6.B.patch

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
 TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, 
 TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, 
 TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, 
 TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.




Success: TEZ-1897 PreCommit Build #567

Jira: https://issues.apache.org/jira/browse/TEZ-1897
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/567/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2795 lines...]
[INFO] Final Memory: 77M/960M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728962/TEZ-1897.6.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/567//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/567//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
3e708a977896cd65ef59206c2f0fca47194c7204 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #565
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2582782 bytes
Compression is 7.1%
Took 1.5 sec
Description set: TEZ-1897
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518350#comment-14518350
 ] 

Rajesh Balamohan commented on TEZ-2379:
---

speculation was not enabled in this case.


Failed: TEZ-2380 PreCommit Build #566

Jira: https://issues.apache.org/jira/browse/TEZ-2380
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/566/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2772 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728963/TEZ-2380.1.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/566//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/566//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
1bc0c8a43a36e36380eecf6aec12d7409c1ebc9b logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #565
Archived 44 artifacts
Archive block size is 32768
Received 22 blocks and 2032295 bytes
Compression is 26.2%
Took 2.1 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter


 [ 
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2256:
-
Assignee: Hitesh Shah  (was: Cyrille Chépélov)

 Avoid use of BufferTooSmallException to signal end of buffer in 
 UnorderedPartitionedKVWriter
 

 Key: TEZ-2256
 URL: https://issues.apache.org/jira/browse/TEZ-2256
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Cyrille Chépélov
Assignee: Hitesh Shah
Priority: Critical
  Labels: patch
 Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch

   Original Estimate: 6h
  Remaining Estimate: 6h

 UnorderedPartitionedKVWriter delegates serialization to the application, 
 passing it a private ByteArrayOutputStream. In case the buffer is exhausted, 
 ByteArrayOutputStream signals that with a private BufferTooSmallException, 
 which can be seen but not dealt with by the application. As [~cwensel] 
 pointed out, when the application is in fact a complex framework, there is no 
 way to distinguish this exception from a real failure, which compels logging 
 the full stack even for reasonable events such as buffer complete.
 Suggested approach: set a complete flag in ByteArrayOutputStream that 
 disables any further output, and replace  BufferTooSmallException (BTSE) 
 handling by checking that flag. 
 [~sseth] suggested checking out SortedOutput as well, as the mechanisms there 
 should be similar.
 I'll give this a go this week.
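
The suggested approach can be sketched with a small bounded stream. BoundedBufferSketch is a hypothetical class written for illustration; it is not the private ByteArrayOutputStream inside UnorderedPartitionedKVWriter, and the attached patches may differ. Instead of throwing when the buffer is exhausted, it sets a flag and silently drops further output, and the caller checks the flag after serializing each record.

```java
import java.io.OutputStream;

// Hypothetical bounded buffer; illustrates the flag-based approach only.
class BoundedBufferSketch extends OutputStream {
    private final byte[] buffer;
    private int position = 0;
    private boolean complete = false;

    BoundedBufferSketch(int capacity) {
        this.buffer = new byte[capacity];
    }

    @Override
    public void write(int b) {
        if (complete) {
            return;              // further output is disabled
        }
        if (position == buffer.length) {
            complete = true;     // signal "buffer full" without an exception
            return;
        }
        buffer[position++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        if (complete) {
            return;
        }
        if (len > buffer.length - position) {
            complete = true;
            return;
        }
        System.arraycopy(src, off, buffer, position, len);
        position += len;
    }

    // The caller checks this after each record and discards a partially written record.
    boolean isComplete() {
        return complete;
    }

    int size() {
        return position;
    }

    public static void main(String[] args) {
        BoundedBufferSketch out = new BoundedBufferSketch(4);
        out.write(new byte[] {1, 2, 3}, 0, 3);
        out.write(new byte[] {4, 5}, 0, 2); // does not fit: flag is set, nothing is written
        System.out.println("complete=" + out.isComplete() + " size=" + out.size());
    }
}
```

With this shape, application serializers never see a framework-private exception, so a full buffer can no longer be mistaken for a real failure.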




[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter


 [ 
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2256:
-
Issue Type: Bug  (was: Improvement)

 Avoid use of BufferTooSmallException to signal end of buffer in 
 UnorderedPartitionedKVWriter
 

 Key: TEZ-2256
 URL: https://issues.apache.org/jira/browse/TEZ-2256
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Cyrille Chépélov
Assignee: Cyrille Chépélov
Priority: Critical
  Labels: patch
 Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch

   Original Estimate: 6h
  Remaining Estimate: 6h


[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter


 [ 
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2256:
-
Issue Type: Improvement  (was: Bug)

 Avoid use of BufferTooSmallException to signal end of buffer in 
 UnorderedPartitionedKVWriter
 

 Key: TEZ-2256
 URL: https://issues.apache.org/jira/browse/TEZ-2256
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Cyrille Chépélov
Assignee: Cyrille Chépélov
Priority: Critical
  Labels: patch
 Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch

   Original Estimate: 6h
  Remaining Estimate: 6h


[jira] [Commented] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter


[ 
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518457#comment-14518457
 ] 

Hitesh Shah commented on TEZ-2256:
--

Change status to patch available to trigger pre-commit. 


[jira] [Commented] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter


[ 
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518581#comment-14518581
 ] 

TezQA commented on TEZ-2256:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12723914/remove-btse-1-MASTER.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/569//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/569//console

This message is automatically generated.


Failed: TEZ-2256 PreCommit Build #569

Jira: https://issues.apache.org/jira/browse/TEZ-2256
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/569/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2783 lines...]



{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  
http://issues.apache.org/jira/secure/attachment/12723914/remove-btse-1-MASTER.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/569//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/569//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
ce5b82ce372f4eacfe21bc559ca53f87c76c248f logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #567
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2680043 bytes
Compression is 4.7%
Took 2.1 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed

Success: TEZ-776 PreCommit Build #565

Jira: https://issues.apache.org/jira/browse/TEZ-776
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/565/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2797 lines...]
[INFO] Final Memory: 75M/962M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728940/TEZ-776.6.A.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/565//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/565//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
0bc2b6c6d18320ff070ba2fab38acd25a6ca84af logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #563
Archived 44 artifacts
Archive block size is 32768
Received 14 blocks and 2298697 bytes
Compression is 16.6%
Took 1 sec
Description set: TEZ-776
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)


 [ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2237:
-
Assignee: Siddharth Seth

 Complex DAG freezes and fails (was BufferTooSmallException raised in 
 UnorderedPartitionedKVWriter then DAG lingers)
 ---

 Key: TEZ-2237
 URL: https://issues.apache.org/jira/browse/TEZ-2237
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: Debian Linux jessie
 OpenJDK Runtime Environment (build 1.8.0_40-internal-b27)
 OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system 
 disk + 4*1 or 2 TiB HDD for HDFS & local  (on-prem, dedicated hardware)
 Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 
 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0
Reporter: Cyrille Chépélov
Assignee: Siddharth Seth
Priority: Critical
 Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, 
 TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, 
 alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, 
 application_142732418_1908.red.txt.bz2, 
 application_1427964335235_2070.txt.red.txt.bz2, 
 appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, 
 appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, 
 gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, 
 oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, 
 output-starts.txt, start_containers.png, stop_containers.png, 
 syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, 
 syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png


 On a specific DAG with many vertices (actually part of a larger meta-DAG), 
 after about an hour of processing, several BufferTooSmallExceptions are raised 
 in UnorderedPartitionedKVWriter (about one every two or three spills).
 Once these exceptions are raised, the DAG remains active indefinitely, 
 tying up memory and CPU resources as far as YARN is concerned, while little 
 if any actual processing takes place. 
 It seems two separate issues are at hand:
   1. BufferTooSmallExceptions are raised even though the keys and values are 
 never bigger than 24 and 1024 bytes respectively, which is far smaller than 
 even the undersized buffers actually allocated (around a couple of megabytes 
 were allotted whereas 100 MiB were requested).
   2. Once BufferTooSmallExceptions are raised, the DAG fails to stop 
 (stop requests appear to be sent 7 hours after the BTSEs are raised, but 
 9 hours after these stop requests, the DAG was still lingering on 
 with all containers present, tying up memory and CPU allocations).
 The BTSEs prevent the Cascade from completing, which in turn prevents 
 validating the results against traditional MR1-based results. Because the 
 DAG never terminates, the cluster queue becomes unavailable.




[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)


 [ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2237:
-
Priority: Critical  (was: Major)


[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)


 [ 
https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2237:
-
Target Version/s: 0.6.1


[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher

[
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518254#comment-14518254
]

Bikas Saha commented on TEZ-1897:
-

bq. Rename registerAndCreateDispatcher to registerAndCreateConcurrentDispatcher?
bq. Rename registerWithExistingDispatcher to registerWithExistingConcurrentDispatcher?
The argument types should be enough to differentiate the methods and also provide
a compile-time check, so adding "Concurrent" to the names would be redundant. Also,
once this code has been exercised and stabilized, we may use it as the only dispatcher.

bq. Javadoc for AsyncDispatcherConcurrent is the same as AsyncDispatcher
Fixed

bq. Precondition numThreads should be at least 2? I think 1 is the same as AsyncDispatcher
This should be able to replace the legacy AsyncDispatcher and run all events on
a single thread in the thread pool, so a value of 1 is allowed.

bq. TaskAttemptImpl.java Some places still use getTask().getVertex(), can
change to getVertex directly.
Fixed
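
To illustrate the numThreads point above, here is a rough sketch of a
dispatcher where events that share a key run in order on one thread, while
events with different keys may run in parallel. With numThreads = 1 it
degenerates to a single-threaded dispatcher. The class KeyedDispatcherSketch
and the "serializing key" parameter are assumptions made for this example;
this is not the AsyncDispatcherConcurrent code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the dispatcher implemented in TEZ-1897. */
class KeyedDispatcherSketch {
  private final List<ExecutorService> workers = new ArrayList<>();

  KeyedDispatcherSketch(int numThreads) {
    if (numThreads < 1) {
      throw new IllegalArgumentException("numThreads must be >= 1");
    }
    for (int i = 0; i < numThreads; i++) {
      // One single-thread executor per worker keeps per-worker FIFO order.
      workers.add(Executors.newSingleThreadExecutor());
    }
  }

  /** Events with the same key always land on the same worker thread. */
  void dispatch(Object serializingKey, Runnable handler) {
    int idx = Math.floorMod(serializingKey.hashCode(), workers.size());
    workers.get(idx).execute(handler);
  }

  /** Stops accepting events and waits for the queued handlers to finish. */
  boolean shutdownAndWait(long timeoutMillis) {
    for (ExecutorService w : workers) {
      w.shutdown();
    }
    boolean done = true;
    try {
      for (ExecutorService w : workers) {
        done &= w.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
    return done;
  }
}
```

With numThreads = 1 every key maps to worker 0, so all handlers run in
submission order on one thread, which matches the legacy behaviour described
in the reply.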

Create a concurrent version of AsyncDispatcher
--

Key: TEZ-1897
URL: https://issues.apache.org/jira/browse/TEZ-1897
Project: Apache Tez
Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch,
TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch

Currently, AsyncDispatcher processes events on a single thread. For events that
can be executed in parallel, e.g. vertex manager events, allowing higher
concurrency may be beneficial.


[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518380#comment-14518380
 ] 

Bikas Saha commented on TEZ-776:


TEZ-776.6.A and TEZ-776.6.B are two versions of the API; the rest of the code 
is essentially the same in both. 
Version B uses an EventRouteMetadata as the return value from plugins and thus 
hides the events, their expansion, etc. from the plugins. This enables further 
improvements down the line, where events could be expanded in the tasks, or 
kept in the AM without serde and passed directly to tasks. This is based on 
some of the ideas in the initial design doc and in TEZ-2255.
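
A rough sketch of the "return routing metadata instead of events" idea
follows. All type and method names here (RouteMetadataSketch,
RoutingPluginSketch, LazyEventRouterSketch) are hypothetical and only loosely
modelled on the description above; they are not the API in either patch. The
point is that the plugin never sees the event payload, and the framework
stores one event per source task and expands it only when a destination asks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical routing result: the destination input indices for one event. */
final class RouteMetadataSketch {
  final int[] targetInputIndices;

  RouteMetadataSketch(int... targetInputIndices) {
    this.targetInputIndices = targetInputIndices;
  }
}

/** Hypothetical plugin API: returns routing metadata, never the event itself. */
interface RoutingPluginSketch {
  /** Returns null when the source task's event is not routed to destTask. */
  RouteMetadataSketch route(int sourceTask, int destTask);
}

/** Framework side: one stored payload per source task, expanded on demand. */
final class LazyEventRouterSketch {
  private final List<byte[]> payloadBySourceTask = new ArrayList<>();
  private final RoutingPluginSketch plugin;

  LazyEventRouterSketch(RoutingPluginSketch plugin) {
    this.plugin = plugin;
  }

  /** Stores the event of the next source task (tasks are numbered from 0). */
  void addEvent(byte[] payload) {
    payloadBySourceTask.add(payload);
  }

  int storedEvents() {
    return payloadBySourceTask.size();
  }

  /** Counts deliveries to destTask without copying any payload. */
  int countDeliveries(int destTask) {
    int deliveries = 0;
    for (int src = 0; src < payloadBySourceTask.size(); src++) {
      RouteMetadataSketch meta = plugin.route(src, destTask);
      if (meta != null) {
        deliveries += meta.targetInputIndices.length;
      }
    }
    return deliveries;
  }
}
```

Because the stored payloads are never duplicated per destination, the memory
held by the router grows with the number of source tasks rather than with the
number of (source, destination) pairs.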


 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
 TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, 
 TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, 
 TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, 
 TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.




[jira] [Commented] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled


[ 
https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518386#comment-14518386
 ] 

TezQA commented on TEZ-2380:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728963/TEZ-2380.1.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/566//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/566//console

This message is automatically generated.

 Disable fall back to reading from timeline if timeline disabled 
 

 Key: TEZ-2380
 URL: https://issues.apache.org/jira/browse/TEZ-2380
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Chang Li
 Attachments: TEZ-2380.1.patch


 As part of TEZ-2226, timeline logging can get disabled in cases where domain 
 creation fails. In this scenario, the DAG client should also not fall back to 
 reading from timeline. 
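
 The intended behaviour could look roughly like the sketch below. The class
 DagStatusSourceSketch, the fetchStatus method and the timelineEnabled flag
 are all hypothetical names chosen for this example; they are not the Tez
 client API, and the sketch assumes the client asks the AM first and only
 then considers timeline.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of the fallback rule; not the Tez DAG client code. */
class DagStatusSourceSketch {
  /**
   * Asks the AM first. Falls back to timeline only if timeline is enabled;
   * otherwise the AM failure is surfaced to the caller.
   */
  static String fetchStatus(Supplier<String> amSource,
                            Supplier<String> timelineSource,
                            boolean timelineEnabled) {
    try {
      return amSource.get();
    } catch (RuntimeException amUnavailable) {
      if (!timelineEnabled) {
        // Timeline has no data for this DAG, so falling back would mislead.
        throw new IllegalStateException(
            "AM unreachable and timeline is disabled", amUnavailable);
      }
      return timelineSource.get();
    }
  }
}
```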




[jira] [Resolved] (TEZ-2359) Deadlock in DAGAppMaster


 [ 
https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang resolved TEZ-2359.
-
Resolution: Invalid

My mistake: this is an issue I hit while working on TEZ-1273, not an issue on master. 

 Deadlock in DAGAppMaster
 

 Key: TEZ-2359
 URL: https://issues.apache.org/jira/browse/TEZ-2359
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Priority: Blocker

 {code}
 Found one Java-level deadlock:
 =
 Timer-1:
   waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by Dispatcher thread: Central
 Dispatcher thread: Central:
   waiting to lock monitor 0x7fb829866d18 (object 0x0007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
   which is held by DelayedContainerManager
 DelayedContainerManager:
   waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
   which is held by Dispatcher thread: Central
 Java stack information for the threads listed above:
 ===
 Timer-1:
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
   at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
   - locked 0x0007cd0f2ff0 (a org.apache.tez.dag.app.DAGAppMaster)
   at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
 Dispatcher thread: Central:
   at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
   - waiting to lock 0x0007cd5ab958 (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
   at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
   at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
   at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
   at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
   at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   - locked 0x0007cd1d0208 (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
   at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
   at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
   at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
   at java.lang.Thread.run(Thread.java:745)
 DelayedContainerManager:
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x0007cd0f8a30 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
   at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
   at

[jira] [Created] (TEZ-2382) Publish counters to timeline for failed/killed attempts

Hitesh Shah created TEZ-2382:


 Summary: Publish counters to timeline for failed/killed attempts 
 Key: TEZ-2382
 URL: https://issues.apache.org/jira/browse/TEZ-2382
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah


This will be useful for debugging if attempts fail due to OOM or if an attempt 
was killed as its speculated task ran faster, etc. 




[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518330#comment-14518330
 ] 

TezQA commented on TEZ-776:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728940/TEZ-776.6.A.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/565//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/565//console

This message is automatically generated.


[jira] [Issue Comment Deleted] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


 [ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2379:

Comment: was deleted

(was: I think the following code may cause the issue. There may already be 
an event in the event queue that causes the TA to be killed, e.g. 
TA_CONTAINER_TERMINATED_BY_SYSTEM.
We may need to ignore T_ATTEMPT_KILLED in the KILLED state.

{code}
if (attempt != null && !attempt.isFinished()) {
  eventHandler.handle(new TaskAttemptEventKillRequest(attempt.getID(), logMsg, errorCause));
}
{code})
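
For illustration only, ignoring a repeated T_ATTEMPT_KILLED once the task is
already KILLED could look like the standalone sketch below. The enums and the
TaskStateMachineSketch class are hypothetical and greatly simplified; the
real TaskImpl is driven by the YARN StateMachineFactory, which this sketch
does not use.

```java
import java.util.EnumSet;
import java.util.Set;

enum TaskStateSketch { RUNNING, KILLED }

enum TaskEventSketch { T_ATTEMPT_KILLED, T_ATTEMPT_SUCCEEDED }

/** Hypothetical two-state machine; not the Tez TaskImpl state machine. */
class TaskStateMachineSketch {
  // Events that are silently accepted when the task is already KILLED.
  private static final Set<TaskEventSketch> IGNORED_WHEN_KILLED =
      EnumSet.of(TaskEventSketch.T_ATTEMPT_KILLED);

  private TaskStateSketch state = TaskStateSketch.RUNNING;

  TaskStateSketch handle(TaskEventSketch event) {
    switch (state) {
      case RUNNING:
        if (event == TaskEventSketch.T_ATTEMPT_KILLED) {
          state = TaskStateSketch.KILLED;
        }
        return state; // other events leave the task RUNNING in this sketch
      case KILLED:
        if (IGNORED_WHEN_KILLED.contains(event)) {
          return state; // a late duplicate kill is a no-op, not an error
        }
        throw new IllegalStateException(
            "Invalid event: " + event + " at " + state);
      default:
        throw new IllegalStateException("Unknown state: " + state);
    }
  }
}
```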

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Blocker

 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
 at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)




[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518543#comment-14518543
 ] 

TezQA commented on TEZ-776:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728973/TEZ-776.6.B.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.TestMockDAGAppMaster

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/568//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-api.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/568//console

This message is automatically generated.


Failed: TEZ-776 PreCommit Build #568

Jira: https://issues.apache.org/jira/browse/TEZ-776
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/568/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2441 lines...]


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728973/TEZ-776.6.B.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.TestMockDAGAppMaster

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/568//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/568//artifact/patchprocess/newPatchFindbugsWarningstez-api.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/568//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
e63f35966e3164e70d2338a99e2a81706c0896b4 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #567
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2615630 bytes
Compression is 4.8%
Took 2 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.tez.dag.app.TestMockDAGAppMaster.testMixedEdgeRouting

Error Message:
com.google.protobuf.ServiceException: 
org.apache.hadoop.ipc.RemoteException(org.apache.tez.dag.api.TezUncheckedException):
 Unable to instantiate class with 1 arguments: 
org.apache.tez.dag.app.TestMockDAGAppMaster$LegacyEdgeTestEdgeManager
 at 
org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:74)
 at 
org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98)
 at org.apache.tez.dag.app.dag.impl.Edge.createEdgeManager(Edge.java:142)
 at org.apache.tez.dag.app.dag.impl.Edge.init(Edge.java:117)
 at org.apache.tez.dag.app.dag.impl.DAGImpl.createDAGEdges(DAGImpl.java:1451)
 at org.apache.tez.dag.app.dag.impl.DAGImpl.initializeDAG(DAGImpl.java:1403)
 at org.apache.tez.dag.app.dag.impl.DAGImpl.initializeDAG(DAGImpl.java:1355)
 at 
org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1665)
 at 
org.apache.tez.dag.app.dag.impl.DAGImpl$InitTransition.transition(DAGImpl.java:1647)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
 at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
 at 
org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
org.apache.tez.dag.app.DAGAppMaster.startDAGExecution(DAGAppMaster.java:2173)
 at org.apache.tez.dag.app.DAGAppMaster.startDAG(DAGAppMaster.java:2140)
 at

[jira] [Commented] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled


[ 
https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518276#comment-14518276
 ] 

Hitesh Shah commented on TEZ-2380:
--

\cc [~pramachandran]

 Disable fall back to reading from timeline if timeline disabled 
 

 Key: TEZ-2380
 URL: https://issues.apache.org/jira/browse/TEZ-2380
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Chang Li
 Attachments: TEZ-2380.1.patch


 As part of TEZ-2226, timeline logging can get disabled in cases where domain 
 creation fails. In this scenario, the dag client should also not fall back to 
 reading from timeline. 




[jira] [Created] (TEZ-2383) release sort buffers on close

Rajesh Balamohan created TEZ-2383:
-

 Summary: release sort buffers on close
 Key: TEZ-2383
 URL: https://issues.apache.org/jira/browse/TEZ-2383
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


Currently they get released when the sorter object is GC-ed, but it might be good 
to explicitly release them on close as well.
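
A minimal sketch of what explicit release on close could look like. The class `SketchSorter` and its methods are hypothetical and are not taken from the Tez sorter code; the only point illustrated is dropping the buffer reference in `close()` so the memory becomes collectable even while the sorter object itself is still reachable.

```java
import java.nio.ByteBuffer;

// Hypothetical sorter that owns one large buffer (not the Tez implementation).
class SketchSorter implements AutoCloseable {
  private ByteBuffer sortBuffer;

  SketchSorter(int bytes) {
    sortBuffer = ByteBuffer.allocate(bytes);
  }

  boolean holdsBuffer() {
    return sortBuffer != null;
  }

  void write(byte b) {
    if (sortBuffer == null) {
      throw new IllegalStateException("sorter is closed");
    }
    sortBuffer.put(b);
  }

  @Override
  public void close() {
    // Drop the reference explicitly instead of waiting for the sorter to be GC-ed.
    // Calling close() more than once is harmless.
    sortBuffer = null;
  }
}
```

With this shape, a caller that keeps the sorter around after `close()` (for example to read counters) no longer pins the buffer memory.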




[jira] [Updated] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path


 [ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated TEZ-2305:

Attachment: TEZ-2305-5.patch

[~hitesh] Uploaded a new patch based on TEZ-2305.2.patch. (Adds a unit test to 
simulate the behavior of translating an MR job to a DAG.)

 MR compatibility sleep job fails with IOException: Undefined job output-path
 

 Key: TEZ-2305
 URL: https://issues.apache.org/jira/browse/TEZ-2305
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Priority: Critical
 Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305-5.patch, 
 TEZ-2305.1.patch, TEZ-2305.2.patch


 Running the MR sleep job fails with an IOException.
 {code}
 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
 state FAILED due to: Vertex failed, vertexName=initialmap, 
 vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
 taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
 failed, info=[Error: Failure while running task:java.io.IOException: 
 Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 1 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 2 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at

[jira] [Commented] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path


[ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518612#comment-14518612
 ] 

Hitesh Shah commented on TEZ-2305:
--

+1 pending pre-commit.

 MR compatibility sleep job fails with IOException: Undefined job output-path
 

 Key: TEZ-2305
 URL: https://issues.apache.org/jira/browse/TEZ-2305
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Tassapol Athiapinya
Priority: Critical
 Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305-5.patch, 
 TEZ-2305.1.patch, TEZ-2305.2.patch


 Running the MR sleep job fails with an IOException.
 {code}
 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
 state FAILED due to: Vertex failed, vertexName=initialmap, 
 vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
 taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
 failed, info=[Error: Failure while running task:java.io.IOException: 
 Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 1 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 2 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at

[jira] [Updated] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable


 [ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1752:
--
Attachment: TEZ-1752.4.patch

- If the task is running, the thread running the processor is interrupted; 
otherwise taskFuture.get() is interrupted.
- Adding the inputs/outputs as and when they are scheduled for initialization 
in LogicalIOProcessorRuntimeTask
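
The first bullet can be illustrated with plain `java.util.concurrent` primitives. This is only a sketch under stated assumptions: the class `InterruptibleTaskSketch` and its method are hypothetical, `Thread.sleep` stands in for a blocking input/output call, and none of it is taken from the patch. It shows a running task being interrupted through its `Future` and still getting a chance to clean up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; names are illustrative only.
class InterruptibleTaskSketch {

  // Returns true if the running task observed the interrupt and finished its cleanup.
  static boolean runAndInterrupt() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch cleanedUp = new CountDownLatch(1);
    AtomicBoolean sawInterrupt = new AtomicBoolean(false);

    Future<?> taskFuture = executor.submit(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000); // stands in for a blocking read or write
      } catch (InterruptedException e) {
        sawInterrupt.set(true);
        Thread.currentThread().interrupt(); // preserve the interrupt status
      } finally {
        cleanedUp.countDown(); // cleanup hook runs on the interrupted path too
      }
    });

    try {
      started.await();          // make sure the task is actually running
      taskFuture.cancel(true);  // true = interrupt the thread running the task
      return cleanedUp.await(5, TimeUnit.SECONDS) && sawInterrupt.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

The sketch only covers the "task is running" case. The other case in the bullet, where the waiting `taskFuture.get()` call is interrupted instead, is not shown.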


 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, 
 TEZ-1752.4.patch


 Not possible to preempt tasks without killing containers without this.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.




Failed: TEZ-1752 PreCommit Build #562

Jira: https://issues.apache.org/jira/browse/TEZ-1752
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/562/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 1985 lines...]




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728780/TEZ-1752.4.patch
  against master revision 2522447.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.runtime.task.TestTaskExecution

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/562//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/562//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
60653cb169aa978f8334d4f8376a11f554ec6029 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #558
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2594356 bytes
Compression is 4.8%
Took 1.6 sec
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
REGRESSION:  
org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatShouldDie

Error Message:
Found a TaskAttemptCompletedvent when not expected

Stack Trace:
java.lang.AssertionError: Found a TaskAttemptCompletedvent when not expected
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.tez.runtime.task.TestTaskExecution$TezTaskUmbilicalForTest.verifyNoCompletionEvents(TestTaskExecution.java:596)
at 
org.apache.tez.runtime.task.TestTaskExecution.testHeartbeatShouldDie(TestTaskExecution.java:326)

[jira] [Commented] (TEZ-2377) RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat


[ 
https://issues.apache.org/jira/browse/TEZ-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516952#comment-14516952
 ] 

Jeff Zhang commented on TEZ-2377:
-

I suppose this can be solved by TEZ-2305. [~rajesh.balamohan] Can you try the patch 
TEZ-2305.2.patch?

 RandomWriter ends up using TextOutputFormat instead of 
 SequenceFileOutputFormat
 ---

 Key: TEZ-2377
 URL: https://issues.apache.org/jira/browse/TEZ-2377
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2377.1.patch


 {code}
 yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar randomwriter 
 -Dmapreduce.randomwriter.totalbytes=10737418 /tmp/test1
 {code}
 This ends up generating TextOutputFormat.  
 {code}
 yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar sort  
 -Dmapreduce.framework.name=yarn-tez -r 5 /tmp/test1 /tmp/test_sorted
 {code}
 This ends up throwing error 
 {noformat}
 Failure while running task:java.io.IOException: 
 hdfs://tez-vm:56565/tmp/test1/part-0 not a SequenceFile
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1851)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1760)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1774)
 at 
 org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:149)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.init(MRReaderMapReduce.java:78)
 at 
 org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:475)
 at 
 org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108)
 {noformat}




[jira] [Commented] (TEZ-2377) RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat


[ 
https://issues.apache.org/jira/browse/TEZ-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516993#comment-14516993
 ] 

Rajesh Balamohan commented on TEZ-2377:
---

Thanks [~zjffdu], TEZ-2305 (.2 patch) has similar checks. I tried it out and 
it works fine. If you plan to commit that patch, I can mark this as a 
duplicate or already resolved. 

 RandomWriter ends up using TextOutputFormat instead of 
 SequenceFileOutputFormat
 ---

 Key: TEZ-2377
 URL: https://issues.apache.org/jira/browse/TEZ-2377
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2377.1.patch


 {code}
 yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar randomwriter 
 -Dmapreduce.randomwriter.totalbytes=10737418 /tmp/test1
 {code}
 This ends up generating TextOutputFormat.  
 {code}
 yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar sort  
 -Dmapreduce.framework.name=yarn-tez -r 5 /tmp/test1 /tmp/test_sorted
 {code}
 This ends up throwing error 
 {noformat}
 Failure while running task:java.io.IOException: 
 hdfs://tez-vm:56565/tmp/test1/part-0 not a SequenceFile
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1851)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1760)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1774)
 at 
 org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:149)
 at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.init(MRReaderMapReduce.java:78)
 at 
 org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:475)
 at 
 org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108)
 {noformat}




[jira] [Created] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED

Rajesh Balamohan created TEZ-2379:
-

 Summary: 
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
T_ATTEMPT_KILLED at KILLED
 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


{noformat}
2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't 
handle this event at current state for task_1429683757595_0479_1_03_13
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
T_ATTEMPT_KILLED at KILLED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
at 
org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
at 
org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
at 
org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Additional notes:

Hive - latest build 
Tez - master
tpch-200 gb scale q_17 (kill the job in the middle of execution)






[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


 [ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2379:
--
Attachment: application_1429683757595_0479.log

Attaching AM log

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: application_1429683757595_0479.log


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: 
 Can't handle this event at current state for 
 task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at 
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)




[jira] [Commented] (TEZ-2374) Fix build break against hadoop-2.2 due to TEZ-2325

2015-04-28 Thread Prakash Ramachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516848#comment-14516848
 ] 

Prakash Ramachandran commented on TEZ-2374:
---

+1 LGTM 
Thanks Hitesh. I was trying to remove the deprecation warning and missed the 2.2 
branch.

 Fix build break against hadoop-2.2 due to TEZ-2325
 --

 Key: TEZ-2374
 URL: https://issues.apache.org/jira/browse/TEZ-2374
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: TEZ-2374.1.patch







[jira] [Created] (TEZ-2378) In case Fetcher (unordered) fails to do local fetch, log in debug mode to reduce log size

Rajesh Balamohan created TEZ-2378:
-

 Summary: In case Fetcher (unordered) fails to do local fetch, log 
in debug mode to reduce log size
 Key: TEZ-2378
 URL: https://issues.apache.org/jira/browse/TEZ-2378
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


The following can be logged at DEBUG level instead of WARN. Counters could perhaps 
be added later to track the number of times a local fetch failed.

{noformat}
2015-04-28 05:41:45,487 WARN [Fetcher [Map_5] #15] shuffle.Fetcher: Failed to 
shuffle output of InputAttemptIdentifier [inputIdentifier=InputIdentifier 
[inputIndex=81], attemptNumber=0, 
pathComponent=attempt_1429683757595_0485_1_03_81_0_10003, 
fetchTypeInfo=FINAL_MERGE_ENABLED, spillEventId=-1] from 
cn047-10.l42scl.hortonworks.com(local fetch)
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
output/attempt_1429683757595_0485_1_03_81_0_10003/file.out.index in any of 
the configured local directories
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
at 
org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:612)
at 
org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:592)
at 
org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:537)
at 
org.apache.tez.runtime.library.common.shuffle.Fetcher.doSharedFetch(Fetcher.java:353)
at 
org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:192)
at 
org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
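
A rough sketch of the proposal: count every local-fetch failure, but emit the message and stack trace only when debug logging is enabled. The class `LocalFetchFailureLogger` and its methods are hypothetical, and `java.util.logging` (where `FINE` plays the role of DEBUG) is used only to keep the example self-contained; it is an assumption and not the logging API used by the Fetcher.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not part of the Tez Fetcher.
class LocalFetchFailureLogger {
  private static final Logger LOG = Logger.getLogger("shuffle.Fetcher");

  // Cheap counter so failures stay visible even when the log line is suppressed.
  private final AtomicLong localFetchFailures = new AtomicLong();

  void onLocalFetchFailure(String pathComponent, Exception cause) {
    localFetchFailures.incrementAndGet();
    // Only build and emit the message (with stack trace) at debug level.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.log(Level.FINE, "Failed to do local fetch for " + pathComponent, cause);
    }
  }

  long getLocalFetchFailures() {
    return localFetchFailures.get();
  }
}
```

The counter keeps the failure rate observable after the per-failure stack traces are moved out of the default log level.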




[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517010#comment-14517010
 ] 

Jeff Zhang commented on TEZ-2379:
-

I think the following code may cause the issue. There may already be some 
event in the event queue that causes the TA to be killed, e.g. 
TA_CONTAINER_TERMINATED_BY_SYSTEM.
We may need to ignore T_ATTEMPT_KILLED in the KILLED state.

{code}
if (attempt != null && !attempt.isFinished()) {
  eventHandler.handle(new TaskAttemptEventKillRequest(attempt.getID(), 
logMsg, errorCause));
}
{code}
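
The suggestion to ignore T_ATTEMPT_KILLED in the KILLED state can be sketched with a tiny hand-rolled state machine. Everything here is hypothetical (`SketchTask`, `SketchTaskState`, `SketchTaskEvent`, and the reduced set of transitions); it does not use the YARN StateMachineFactory API and is not the TaskImpl transition table. It only shows a late duplicate kill event being dropped instead of raising an invalid-transition error.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for the task states and events named in the log.
enum SketchTaskState { RUNNING, KILL_WAIT, KILLED }

enum SketchTaskEvent { T_TERMINATE, T_ATTEMPT_KILLED }

class SketchTask {
  // Events that are dropped, not rejected, once the task is already KILLED.
  private static final Set<SketchTaskEvent> IGNORED_AT_KILLED =
      EnumSet.of(SketchTaskEvent.T_TERMINATE, SketchTaskEvent.T_ATTEMPT_KILLED);

  private SketchTaskState state = SketchTaskState.RUNNING;
  private int ignoredEvents = 0;

  SketchTaskState getState() {
    return state;
  }

  int getIgnoredEvents() {
    return ignoredEvents;
  }

  void handle(SketchTaskEvent event) {
    if (state == SketchTaskState.RUNNING && event == SketchTaskEvent.T_TERMINATE) {
      state = SketchTaskState.KILL_WAIT;
    } else if (state == SketchTaskState.KILL_WAIT
        && event == SketchTaskEvent.T_ATTEMPT_KILLED) {
      state = SketchTaskState.KILLED;
    } else if (state == SketchTaskState.KILLED && IGNORED_AT_KILLED.contains(event)) {
      ignoredEvents++; // late duplicate: count it and move on
    } else {
      throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }
  }
}
```

Counting the ignored events, rather than dropping them silently, is a design choice of this sketch so that unexpected duplicates remain visible.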

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: application_1429683757595_0479.log


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: 
 Can't handle this event at current state for 
 task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at 
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)




[jira] [Updated] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled

2015-04-28 Thread Chang Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated TEZ-2380:
--
Attachment: TEZ-2380.1.patch

 Disable fall back to reading from timeline if timeline disabled 
 

 Key: TEZ-2380
 URL: https://issues.apache.org/jira/browse/TEZ-2380
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Chang Li
 Attachments: TEZ-2380.1.patch


 As part of TEZ-2226, timeline logging can get disabled in cases where domain 
 creation fails. In this scenario, the dag client should also not fall back to 
 reading from timeline. 




[jira] [Commented] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled

2015-04-28 Thread Chang Li (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518277#comment-14518277
 ] 

Chang Li commented on TEZ-2380:
---

[~hitesh], I have updated DAGClientImpl, and have verified that isATSEnabled is 
set to false when timeline is disabled, in both the session and non-session cases.

 Disable fall back to reading from timeline if timeline disabled 
 

 Key: TEZ-2380
 URL: https://issues.apache.org/jira/browse/TEZ-2380
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Chang Li
 Attachments: TEZ-2380.1.patch


 As part of TEZ-2226, timeline logging can get disabled in cases where domain 
 creation fails. In this scenario, the dag client should also not fall back to 
 reading from timeline. 




[jira] [Updated] (TEZ-2256) Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter


 [ 
https://issues.apache.org/jira/browse/TEZ-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2256:
--
Priority: Critical  (was: Minor)
Target Version/s: 0.6.1

 Avoid use of BufferTooSmallException to signal end of buffer in 
 UnorderedPartitionedKVWriter
 

 Key: TEZ-2256
 URL: https://issues.apache.org/jira/browse/TEZ-2256
 Project: Apache Tez
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Cyrille Chépélov
Assignee: Cyrille Chépélov
Priority: Critical
  Labels: patch
 Attachments: remove-btse-1-MASTER.patch, remove-btse-1-rfc.patch

   Original Estimate: 6h
  Remaining Estimate: 6h

 UnorderedPartitionedKVWriter delegates serialization to the application, 
 passing it a private ByteArrayOutputStream. In case the buffer is exhausted, 
 ByteArrayOutputStream signals that with a private BufferTooSmallException, 
 which can be seen but not dealt with by the application. As [~cwensel] 
 pointed out, when the application is in fact a complex framework, there is no 
 way to distinguish this exception from a real failure, which compels logging 
 the full stack even for reasonable events such as buffer complete.
 Suggested approach: set a complete flag in ByteArrayOutputStream that 
 disables any further output, and replace  BufferTooSmallException (BTSE) 
 handling by checking that flag. 
 [~sseth] suggested checking out SortedOutput as well, as the mechanisms there 
 should be similar.
 I'll give this a go this week.
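
 The suggested approach might look roughly like the sketch below. It is an 
 illustration only: BoundedByteArrayOutputStream, isComplete() and the capacity 
 handling are hypothetical names chosen here, not the actual Tez classes or the 
 attached patch.

```java
import java.io.OutputStream;

/** Hypothetical fixed-capacity stream: records overflow in a flag instead of throwing. */
class BoundedByteArrayOutputStream extends OutputStream {
  private final byte[] buffer;
  private int count;
  private boolean complete; // set once a write does not fit; never reset in this sketch

  BoundedByteArrayOutputStream(int capacity) {
    this.buffer = new byte[capacity];
  }

  @Override
  public void write(int b) {
    if (complete || count == buffer.length) {
      complete = true; // disable further output, no exception raised
      return;
    }
    buffer[count++] = (byte) b;
  }

  @Override
  public void write(byte[] src, int off, int len) {
    if (complete || len > buffer.length - count) {
      complete = true; // the whole chunk is dropped, nothing is partially copied
      return;
    }
    System.arraycopy(src, off, buffer, count, len);
    count += len;
  }

  /** True once any write was rejected because the buffer was exhausted. */
  boolean isComplete() {
    return complete;
  }

  /** Number of bytes accepted so far. */
  int size() {
    return count;
  }
}
```

 Under this assumption the writer would check isComplete() after the 
 application's serializer returns, rather than catching an exception that may 
 have passed through application code on the way up.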




[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher


[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518493#comment-14518493
 ] 

TezQA commented on TEZ-1897:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728962/TEZ-1897.6.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/567//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/567//console

This message is automatically generated.

 Create a concurrent version of AsyncDispatcher
 --

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
 TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch


 Currently, it processes events on a single thread. For events that can be 
 executed in parallel, e.g. vertex manager events, allowing higher concurrency 
 may be beneficial.
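
 One possible shape for such a dispatcher is sketched below. It is not the 
 attached patch: KeyedConcurrentDispatcher and its methods are hypothetical 
 names, and the design (one single-threaded lane per hash bucket, so events 
 sharing a key keep their order) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: events sharing a key stay ordered, different keys may run in parallel. */
class KeyedConcurrentDispatcher {
  private final List<ExecutorService> lanes = new ArrayList<>();

  KeyedConcurrentDispatcher(int numLanes) {
    for (int i = 0; i < numLanes; i++) {
      lanes.add(Executors.newSingleThreadExecutor());
    }
  }

  /** Routes the handler to a lane chosen by the key's hash. */
  void dispatch(Object key, Runnable handler) {
    int lane = Math.floorMod(key.hashCode(), lanes.size());
    lanes.get(lane).execute(handler);
  }

  /** Stops accepting events and waits for queued ones; returns false on timeout or interrupt. */
  boolean shutdownAndWait(long timeoutMillis) {
    for (ExecutorService lane : lanes) {
      lane.shutdown();
    }
    try {
      boolean done = true;
      for (ExecutorService lane : lanes) {
        done &= lane.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
      }
      return done;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }
}
```

 Keying by something like a vertex identifier would let events for different 
 vertices proceed concurrently while events for one vertex are still handled 
 in arrival order.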




[jira] [Commented] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable


[ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517294#comment-14517294
 ] 

TezQA commented on TEZ-1752:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728815/TEZ-1752.5.patch
  against master revision 2522447.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/563//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/563//console

This message is automatically generated.

 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, 
 TEZ-1752.4.patch, TEZ-1752.5.patch


 Not possible to preempt tasks without killing containers without this.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.




[jira] [Updated] (TEZ-1752) Inputs / Outputs in the Runtime library should be interruptable


 [ 
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1752:
--
Attachment: TEZ-1752.5.patch

 Inputs / Outputs in the Runtime library should be interruptable
 ---

 Key: TEZ-1752
 URL: https://issues.apache.org/jira/browse/TEZ-1752
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan
 Attachments: TEZ-1752.1.patch, TEZ-1752.2.patch, TEZ-1752.3.patch, 
 TEZ-1752.4.patch, TEZ-1752.5.patch


 Not possible to preempt tasks without killing containers without this.
 There's still the problem of Processors not supporting interrupts. We may 
 need API enhancements to query IPOs on whether they are interruptible.




[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


 [ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2379:
-
Priority: Blocker  (was: Major)

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Blocker
 Attachments: application_1429683757595_0479.log


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: 
 Can't handle this event at current state for 
 task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
 at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
 at 
 org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
 at 
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
 at 
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)




[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.

2015-04-28 Thread Prakash Ramachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517127#comment-14517127
 ] 

Prakash Ramachandran commented on TEZ-2226:
---

+1 LGTM. 

One minor item (for a separate JIRA): DAGClientImpl could avoid reading from 
timeline if writes are disabled.

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, 
 TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, 
 TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch







Success: TEZ-1752 PreCommit Build #563

Jira: https://issues.apache.org/jira/browse/TEZ-1752
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/563/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2791 lines...]
[INFO] Final Memory: 69M/927M
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728815/TEZ-1752.5.patch
  against master revision 2522447.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/563//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/563//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
884316e5ac186e4fe11d60b806f9cf953ead2f28 logged out


==
==
Finished build.
==
==


Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #558
Archived 44 artifacts
Archive block size is 32768
Received 4 blocks and 2618861 bytes
Compression is 4.8%
Took 1.9 sec
Description set: TEZ-1752
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-2383) release sort buffers on close


 [ 
https://issues.apache.org/jira/browse/TEZ-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-2383:
--
Attachment: TEZ-2383.1.patch

[~sseth] - Can you please review when you find time?

 release sort buffers on close
 -

 Key: TEZ-2383
 URL: https://issues.apache.org/jira/browse/TEZ-2383
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: TEZ-2383.1.patch


 Currently they get released when sorter object gets GC-ed, but it might be 
 good to explicitly release them on close as well.




[jira] [Assigned] (TEZ-2383) release sort buffers on close


 [ 
https://issues.apache.org/jira/browse/TEZ-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-2383:
-

Assignee: Rajesh Balamohan

 release sort buffers on close
 -

 Key: TEZ-2383
 URL: https://issues.apache.org/jira/browse/TEZ-2383
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2383.1.patch


 Currently they get released when sorter object gets GC-ed, but it might be 
 good to explicitly release them on close as well.




[jira] [Updated] (TEZ-2360) per-io counters flag should generate both overall and per-edge counters

2015-04-28 Thread Prakash Ramachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-2360:
--
Attachment: TEZ-2360.3.patch

Added a test case to check that per-io counters are aggregated.
Since task counters are not available from the client, the check is done at 
the vertex level. 
NOTE: some counters like TaskCounter.FIRST_EVENT_RECEIVED might not be 
meaningful at task level.



 per-io counters flag should generate both overall and per-edge counters 
 

 Key: TEZ-2360
 URL: https://issues.apache.org/jira/browse/TEZ-2360
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Prakash Ramachandran
 Attachments: TEZ-2360.1.patch, TEZ-2360.2.patch, TEZ-2360.3.patch


 Currently, the per-io flag disables overall per task counters and retains 
 only per edge counters. It would be useful to have both overall and per edge 
 counters. 




[jira] [Updated] (TEZ-2305) MR compatibility sleep job fails with IOException: Undefined job output-path


 [ 
https://issues.apache.org/jira/browse/TEZ-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2305:
-
 Target Version/s: 0.5.4, 0.6.1
Affects Version/s: (was: 0.7.0)

 MR compatibility sleep job fails with IOException: Undefined job output-path
 

 Key: TEZ-2305
 URL: https://issues.apache.org/jira/browse/TEZ-2305
 Project: Apache Tez
  Issue Type: Bug
Reporter: Tassapol Athiapinya
Assignee: Jeff Zhang
Priority: Critical
 Attachments: TEZ-2305-3.patch, TEZ-2305-4.patch, TEZ-2305-5.patch, 
 TEZ-2305.1.patch, TEZ-2305.2.patch


 Running MR sleep job has an IOException.
 {code}
 15/04/09 20:52:25 INFO mapreduce.Job: Job job_1428612196442_0002 failed with 
 state FAILED due to: Vertex failed, vertexName=initialmap, 
 vertexId=vertex_1428612196442_0002_1_00, diagnostics=[Task failed, 
 taskId=task_1428612196442_0002_1_00_01, diagnostics=[TaskAttempt 0 
 failed, info=[Error: Failure while running task:java.io.IOException: 
 Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 1 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 2 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:121)
   at 
 org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:401)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:436)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:415)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 ], TaskAttempt 3 failed, info=[Error: Failure while running 
 task:java.io.IOException: Undefined job output-path
   at 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:248)
   at

[jira] [Commented] (TEZ-2383) release sort buffers on close

2015-04-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518728#comment-14518728
 ] 

Gopal V commented on TEZ-2383:
--

[~rajesh.balamohan]: This is likely to be a fix limited to the sort 
implementation.

You can broaden this fix to all possible edge types, by moving up the 
abstraction.

Since ProcessorContext is the visible opaque implementation setup for the 
tez-api, it would be better to clean up that chain of references - cleaning up 
TezTaskContextImpl::runTimeTask within the ProcessorContext impl when the 
logical IO processor close is called.

This follows a neat contract as any user-code which holds onto a processor 
context object after the close of the task runtime will be holding onto an 
empty/unusable shell of a context.
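
A minimal sketch of that contract follows. The class names are hypothetical 
stand-ins, not TezTaskContextImpl or the real runtime task, and the sketch 
assumes that dropping the reference on close is enough to make the buffers 
collectable.

```java
/** Hypothetical stand-in for a task runtime that owns large buffers. */
class RuntimeTaskStub {
  final byte[] sortBuffer = new byte[1024];
}

/** Hypothetical context: drops its runtime reference when the task closes. */
class ContextSketch implements AutoCloseable {
  private RuntimeTaskStub runtimeTask;

  ContextSketch(RuntimeTaskStub runtimeTask) {
    this.runtimeTask = runtimeTask;
  }

  /** Fails fast if user code keeps using the context after close. */
  RuntimeTaskStub getRuntimeTask() {
    if (runtimeTask == null) {
      throw new IllegalStateException("context is closed");
    }
    return runtimeTask;
  }

  boolean isClosed() {
    return runtimeTask == null;
  }

  @Override
  public void close() {
    runtimeTask = null; // a retained context no longer pins the buffers
  }
}
```

With this shape, user code that holds on to the context after close keeps only 
a small empty object alive, regardless of which edge type allocated the buffers.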

 release sort buffers on close
 -

 Key: TEZ-2383
 URL: https://issues.apache.org/jira/browse/TEZ-2383
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2383.1.patch


 Currently they get released when sorter object gets GC-ed, but it might be 
 good to explicitly release them on close as well.




[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.


[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517354#comment-14517354
 ] 

Hitesh Shah commented on TEZ-2226:
--

Good catch [~pramachandran]. Will file a separate jira for that. Committing 
shortly. 

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, 
 TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, 
 TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch







[jira] [Commented] (TEZ-924) InputFailedEvent handling for Shuffle

[
https://issues.apache.org/jira/browse/TEZ-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517398#comment-14517398
]

Hitesh Shah commented on TEZ-924:
-

[~rajesh.balamohan] [~sseth] Is this still for 0.7, or should we move it to 0.8?

InputFailedEvent handling for Shuffle
-

Key: TEZ-924
URL: https://issues.apache.org/jira/browse/TEZ-924
Project: Apache Tez
Issue Type: Bug
Reporter: Siddharth Seth
Priority: Critical

Shuffle receives batches of Events to process from the AM. The way these
events are sent over to the ShuffleHandlers and the way they're processed -
it's possible that Shuffle will start fetching data from an Event, which is
to be subsequently marked as failed (via an InputFailedEvent)
1) The AM sends events in batches. An InputFailedEvent for a specific Input
may not be part of the same batch which contained the original event which is
being marked bad.
2) The ShuffleEventHandler processes the events in each batch one event at a
time - so even if the InputFailedEvent follows - it's possible for Shuffle to
start fetching data from a Failed Input.
The AM needs to change to invalidate Inputs up front - so that related events
don't span batches. Alternately, it needs to apply the InputFailedEvent to
the original event being sent.
The Shuffle itself should process a batch update as a batch - that would
prevent fetchers from starting early even though there may be additional
events for the same host.
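
The last point, handling a batch as a batch, could be sketched as below. This 
is a simplified illustration and not the Tez Shuffle code: ShuffleBatchSketch, 
Event and fetchable() are hypothetical names, and an input is identified here 
by an assumed (inputIndex, attempt) pair.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of batch-at-once event handling; not the Tez Shuffle code. */
class ShuffleBatchSketch {
  /** Simplified event: announces an input attempt, or marks that attempt as failed. */
  static final class Event {
    final int inputIndex;
    final int attempt;
    final boolean failed;

    Event(int inputIndex, int attempt, boolean failed) {
      this.inputIndex = inputIndex;
      this.attempt = attempt;
      this.failed = failed;
    }

    String key() {
      return inputIndex + ":" + attempt;
    }
  }

  /** Scans the whole batch for failures first, then returns only the inputs still valid. */
  static List<Event> fetchable(List<Event> batch) {
    Set<String> failedKeys = new HashSet<>();
    for (Event e : batch) {
      if (e.failed) {
        failedKeys.add(e.key());
      }
    }
    List<Event> result = new ArrayList<>();
    for (Event e : batch) {
      if (!e.failed && !failedKeys.contains(e.key())) {
        result.add(e);
      }
    }
    return result;
  }
}
```

This only covers a failure that arrives in the same batch as the event it 
invalidates. A failure delivered in a later batch would still need the 
AM-side change described above.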


[jira] [Created] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled

Hitesh Shah created TEZ-2380:


 Summary: Disable fall back to reading from timeline if timeline 
disabled 
 Key: TEZ-2380
 URL: https://issues.apache.org/jira/browse/TEZ-2380
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah


As part of TEZ-2226, timeline logging can get disabled in cases where domain 
creation fails. In this scenario, the dag client should also not fall back to 
reading from timeline. 




[jira] [Updated] (TEZ-122) Finalize v1 DAGPlan and consider versioning support


 [ 
https://issues.apache.org/jira/browse/TEZ-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-122:

Target Version/s: 0.8.0  (was: 0.7.0)

 Finalize v1 DAGPlan and consider versioning support
 ---

 Key: TEZ-122
 URL: https://issues.apache.org/jira/browse/TEZ-122
 Project: Apache Tez
  Issue Type: Bug
Reporter: Mike Liddell
Priority: Critical

 The DAGPlan format will change over time. The current definition may be too 
 strict with required fields and not have the desired flexibility for 
 versioning.
 Options/suggestions:
 1. Remove structural constraints (i.e. everything is optional) so that additive 
 changes can succeed and so that mandatory fields in v1 can become optional 
 without difficulty.  
 This doesn't help with naming changes and doesn't help if 
 semantics change.
 2. Introduce a version field to the protobuf. This version field should 
 probably be the only mandatory field in the top-level message so that it can 
 always be parsed. Versioning may introduce additional Protobuf definitions, 
 e.g. DAGPlanv2 and so on.
 A combination of 1 and 2 seems wise. If all fields are optional, then we don't 
 get stuck with unwanted mandatory fields, and if a version number is 
 included, the AppMaster can tell what semantics / assumptions should apply 
 when consuming the plan. 
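
 The combination of both options could be consumed along the lines of the 
 sketch below. It uses a plain map as a stand-in for the parsed plan, so 
 PlanVersionSketch, the field names "version" and "name", and the supported 
 version numbers are all assumptions for illustration, not the DAGPlan 
 protobuf API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical consumer of a versioned plan; a Map stands in for the parsed message. */
class PlanVersionSketch {
  /** Reads the only mandatory field; everything else is treated as optional. */
  static int versionOf(Map<String, Object> plan) {
    Object v = plan.get("version");
    if (!(v instanceof Integer)) {
      throw new IllegalArgumentException("plan has no usable version field");
    }
    return (Integer) v;
  }

  /** Optional field lookup: an absent field does not fail parsing. */
  static Optional<String> dagName(Map<String, Object> plan) {
    Object n = plan.get("name");
    return n instanceof String ? Optional.of((String) n) : Optional.empty();
  }

  /** The consumer decides which semantics apply based on the version alone. */
  static boolean isSupported(Map<String, Object> plan) {
    int v = versionOf(plan);
    return v == 1 || v == 2; // versions this hypothetical consumer understands
  }
}
```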




[jira] [Commented] (TEZ-2309) Fix slf4j dependencies for tez modules


[ 
https://issues.apache.org/jira/browse/TEZ-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517403#comment-14517403
 ] 

Hitesh Shah commented on TEZ-2309:
--

[~vasanthkumar] Mind filing a separate jira for fixing the dep in tez-dag and 
attaching your current patch there? I will move this out to 0.8 so that we can 
use this as the main jira to clean up logging api dependencies and all the 
necessary issues with slf4j-log4j. 

 Fix slf4j dependencies for tez modules 
 ---

 Key: TEZ-2309
 URL: https://issues.apache.org/jira/browse/TEZ-2309
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2309.1.patch


 Most modules should depend only on slf4j-api and not slf4j-log4j12. 
 I believe only tez-dag and tez-runtime-internals might need the log4j 
 dependency due to log rotation related code.




[jira] [Commented] (TEZ-2226) Disable writing history to timeline if domain creation fails.


[ 
https://issues.apache.org/jira/browse/TEZ-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517387#comment-14517387
 ] 

Hitesh Shah commented on TEZ-2226:
--

Created TEZ-2380 for [~pramachandran]'s comment. 

 Disable writing history to timeline if domain creation fails.
 -

 Key: TEZ-2226
 URL: https://issues.apache.org/jira/browse/TEZ-2226
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Chang Li
Priority: Blocker
 Fix For: 0.6.1

 Attachments: TEZ-2226.10.patch, TEZ-2226.11.patch, TEZ-2226.12.patch, 
 TEZ-2226.2.patch, TEZ-2226.3.patch, TEZ-2226.4.patch, TEZ-2226.5.patch, 
 TEZ-2226.6.patch, TEZ-2226.7.patch, TEZ-2226.8.patch, TEZ-2226.9.patch, 
 TEZ-2226.addon-for-patch10, TEZ-2226.addon-for-patch10-combined.full.patch, 
 TEZ-2226.patch, TEZ-2226.wip.2.patch, TEZ-2226.wip.patch







[jira] [Commented] (TEZ-1138) Access to some mapred counters is broken after TEZ-901


[ 
https://issues.apache.org/jira/browse/TEZ-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517407#comment-14517407
 ] 

Hitesh Shah commented on TEZ-1138:
--

[~sseth] [~rekhajoshm] Will either of you be looking at this? Or should we move 
this to 0.8? 

 Access to some mapred counters is broken after TEZ-901
 --

 Key: TEZ-1138
 URL: https://issues.apache.org/jira/browse/TEZ-1138
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Rekha Joshi
Priority: Critical

 AbstractCounters.java maps mapred.Task$Counter counters to TaskCounter
 {code}
   private static final Map<String, String> legacyMap = Maps.newHashMap();
   static {
     legacyMap.put("org.apache.hadoop.mapred.Task$Counter",
                   TaskCounter.class.getName());
     legacyMap.put("org.apache.hadoop.mapred.JobInProgress$Counter",
                   JobCounter.class.getName());
     legacyMap.put("FileSystemCounters", FileSystemCounter.class.getName());
   }
 {code}
 But TEZ-901 removed a bunch of counters like MAP_OUTPUT_BYTES from 
 TaskCounter. Now if MAP_OUTPUT_BYTES is retrieved from MR code then it fails 
 with enum not found exception.
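
 The failure mode can be reproduced in isolation with the sketch below. It 
 does not use the real TaskCounter: TaskCounterSketch is a hypothetical enum 
 from which MAP_OUTPUT_BYTES is deliberately missing, and resolve() is an 
 assumed simplification of the legacy lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reproduction of a legacy group mapped onto an enum that lost a constant. */
class LegacyCounterLookup {
  /** Trimmed-down counter enum: MAP_OUTPUT_BYTES is deliberately absent. */
  enum TaskCounterSketch { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS }

  static final Map<String, String> LEGACY_MAP = new HashMap<>();
  static {
    LEGACY_MAP.put("org.apache.hadoop.mapred.Task$Counter",
                   TaskCounterSketch.class.getName());
  }

  /** Resolves a counter by name; throws IllegalArgumentException if the constant is gone. */
  static TaskCounterSketch resolve(String legacyGroup, String counterName) {
    if (!LEGACY_MAP.containsKey(legacyGroup)) {
      throw new IllegalArgumentException("unknown group: " + legacyGroup);
    }
    return Enum.valueOf(TaskCounterSketch.class, counterName);
  }
}
```

 Looking up a surviving name succeeds, while a name that was removed from the 
 enum makes Enum.valueOf throw IllegalArgumentException, which matches the 
 "enum not found" symptom described above.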




[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch


[ 
https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517405#comment-14517405
 ] 

Hitesh Shah commented on TEZ-1421:
--

[~ozawa] Did you manage to get a chance to look at this? If not, we can move 
this out to 0.8. 

 MRCombiner throws NPE in MapredWordCount on master branch
 -

 Key: TEZ-1421
 URL: https://issues.apache.org/jira/browse/TEZ-1421
 Project: Apache Tez
  Issue Type: Bug
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Priority: Critical

 I tested MapredWordCount against 70GB generated by RandomTextWriter. When a 
 Combiner runs, it throws an NPE. It looks like setCombinerClass doesn't work 
 correctly.
 {quote}
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122)
 at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112)
 at 
 org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472)
 at 
 org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605)
 at 
 org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89)
 {quote}




[jira] [Resolved] (TEZ-2046) Make timeline server putDomain exceptions non-fatal - full feature


 [ 
https://issues.apache.org/jira/browse/TEZ-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved TEZ-2046.
--
Resolution: Fixed

 Make timeline server putDomain exceptions non-fatal - full feature
 --

 Key: TEZ-2046
 URL: https://issues.apache.org/jira/browse/TEZ-2046
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Priority: Blocker

 As pointed out, this could be a problem in secure environments. This jira will 
 provide a full fix that will disable timeline server puts for tez jobs that 
 fail to successfully create a domain. In addition, it may trigger off of the 
 client configuration created in YARN-3144.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TEZ-2164) Shade the guava version used by Tez


 [ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2164:
-
Target Version/s: 0.8.0  (was: 0.7.0)

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Critical
 Attachments: allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.




[jira] [Commented] (TEZ-2164) Shade the guava version used by Tez


[ 
https://issues.apache.org/jira/browse/TEZ-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517395#comment-14517395
 ] 

Hitesh Shah commented on TEZ-2164:
--

Moving to 0.8.0 as this work may result in some level of incompatibilities. 

 Shade the guava version used by Tez
 ---

 Key: TEZ-2164
 URL: https://issues.apache.org/jira/browse/TEZ-2164
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Priority: Critical
 Attachments: allow-guava-16.0.1.patch


 Should allow us to upgrade to a newer version without shipping a guava 
 dependency.
 Would be good to do this in 0.7 so that we stop shipping guava as early as 
 possible.




[jira] [Updated] (TEZ-2309) Fix slf4j dependencies for tez modules


 [ 
https://issues.apache.org/jira/browse/TEZ-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-2309:
-
Target Version/s: 0.8.0  (was: 0.7.0)

 Fix slf4j dependencies for tez modules 
 ---

 Key: TEZ-2309
 URL: https://issues.apache.org/jira/browse/TEZ-2309
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Priority: Critical
 Attachments: TEZ-2309.1.patch


 Most modules should depend only on slf4j-api and not slf4j-log4j12. 
 I believe only tez-dag and tez-runtime-internals might need the log4j 
 dependency due to log rotation related code.




[jira] [Updated] (TEZ-2381) Fixes after rebase 04/28


 [ 
https://issues.apache.org/jira/browse/TEZ-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-2381:

Attachment: TEZ-2381.1.txt

 Fixes after rebase 04/28
 

 Key: TEZ-2381
 URL: https://issues.apache.org/jira/browse/TEZ-2381
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: TEZ-2381.1.txt







[jira] [Updated] (TEZ-1912) Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable and tez.runtime.shuffle.memory-to-memory.segments


 [ 
https://issues.apache.org/jira/browse/TEZ-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1912:
-
Fix Version/s: 0.7.0

 Merge exceptions are thrown when enabling 
 tez.runtime.shuffle.memory-to-memory.enable and 
 tez.runtime.shuffle.memory-to-memory.segments
 --

 Key: TEZ-1912
 URL: https://issues.apache.org/jira/browse/TEZ-1912
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Fix For: 0.7.0

 Attachments: TEZ-1912.1.patch


 Merge exceptions are thrown when running a Hive query on Tez with the 
 following settings. It works fine without the mem-to-mem merge settings.
 {code}
 2015-01-04 20:04:01,371 ERROR [ShuffleAndMergeRunner [Map_1]] orderedgrouped.Shuffle: ShuffleRunner failed with error
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:364)
     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:327)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Rec# 22630125: Negative value-length: -1
     at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:720)
     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.readRawKey(InMemoryReader.java:104)
     at org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.readRawKey(TezMerger.java:329)
     at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.adjustPriorityQueue(TezMerger.java:500)
     at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.next(TezMerger.java:545)
     at org.apache.tez.runtime.library.common.sort.impl.TezMerger.writeFile(TezMerger.java:204)
     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:862)
     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:473)
     at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:362)
     ... 5 more
 {code}
 {code}
 $HIVE_HOME/bin/hive -hiveconf tez.runtime.io.sort.factor=200 \
   --hiveconf tez.shuffle-vertex-manager.min-src-fraction=1.0 \
   --hiveconf tez.shuffle-vertex-manager.max-src-fraction=1.0 \
   --hiveconf hive.tez.auto.reducer.parallelism=false \
   --hiveconf tez.am.heartbeat.interval-ms.max=20 \
   --hiveconf tez.runtime.io.sort.mb=1200 \
   --hiveconf tez.runtime.sort.threads=2 \
   --hiveconf tez.history.logging.service.class=org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService \
   --hiveconf hive.tez.container.size=4096 \
   --hiveconf tez.runtime.shuffle.memory-to-memory.enable=true \
   --hiveconf tez.runtime.shuffle.memory-to-memory.segments=4
 --10 TB dataset
 use tpcds4_bin_partitioned_orc_1;
 drop table testData;
 create table testData as select
   ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_quantity,ss_sold_date
   from store_sales distribute by ss_sold_date;
 {code}




[jira] [Updated] (TEZ-1912) Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable and tez.runtime.shuffle.memory-to-memory.segments


 [ 
https://issues.apache.org/jira/browse/TEZ-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1912:
-
Assignee: Rajesh Balamohan

 Merge exceptions are thrown when enabling 
 tez.runtime.shuffle.memory-to-memory.enable and 
 tez.runtime.shuffle.memory-to-memory.segments
 --

 Key: TEZ-1912
 URL: https://issues.apache.org/jira/browse/TEZ-1912
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Fix For: 0.7.0

 Attachments: TEZ-1912.1.patch



[jira] [Commented] (TEZ-2377) RandomWriter ends up using TextOutputFormat instead of SequenceFileOutputFormat


[ 
https://issues.apache.org/jira/browse/TEZ-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518144#comment-14518144
 ] 

Bikas Saha commented on TEZ-2377:
-

Does the same fix need to be made to MRInput(Base)?

 RandomWriter ends up using TextOutputFormat instead of 
 SequenceFileOutputFormat
 ---

 Key: TEZ-2377
 URL: https://issues.apache.org/jira/browse/TEZ-2377
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-2377.1.patch


 {code}
 yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar randomwriter -Dmapreduce.randomwriter.totalbytes=10737418 /tmp/test1
 {code}
 This ends up writing its output with TextOutputFormat.
 {code}
 yarn jar ./dist/tez/tez-tests-0.7.0-SNAPSHOT.jar sort -Dmapreduce.framework.name=yarn-tez -r 5 /tmp/test1 /tmp/test_sorted
 {code}
 This then fails with the following error:
 {noformat}
 Failure while running task:java.io.IOException: hdfs://tez-vm:56565/tmp/test1/part-0 not a SequenceFile
     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1851)
     at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1760)
     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1774)
     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
     at org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:149)
     at org.apache.tez.mapreduce.lib.MRReaderMapReduce.init(MRReaderMapReduce.java:78)
     at org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:475)
     at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108)
 {noformat}
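
A note on diagnosing the "not a SequenceFile" error above: Hadoop SequenceFiles begin with the three bytes 'S', 'E', 'Q' followed by a version byte, so looking at the first bytes of part-0 shows whether the writer really produced text. The sketch below is a hypothetical stand-alone helper (SequenceFileSniffer is not part of Tez or Hadoop) and assumes the file has first been copied to the local filesystem.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not a Tez or Hadoop class. */
final class SequenceFileSniffer {
    // SequenceFile headers start with these three bytes, then a version byte.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    /** Returns true if the given leading bytes start with the SEQ magic. */
    static boolean hasSequenceFileMagic(byte[] header) {
        if (header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first three bytes of a local file and checks them (Java 11+). */
    static boolean looksLikeSequenceFile(Path localCopy) throws IOException {
        try (InputStream in = Files.newInputStream(localCopy)) {
            return hasSequenceFileMagic(in.readNBytes(MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            boolean seq = looksLikeSequenceFile(Path.of(arg));
            System.out.println(arg + ": " + (seq ? "SequenceFile header" : "not a SequenceFile"));
        }
    }
}
```

Run against a local copy of /tmp/test1/part-0, a plain-text file written through TextOutputFormat would be reported as "not a SequenceFile".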




[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.6.A.patch

Patch with test issues fixed.

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
 TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, 
 TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, 
 TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, 
 TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png


 This is open ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern - this puts limits on the number of tasks 
 that can be processed.
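
To make the 64-bytes-per-event figure concrete, here is a rough back-of-the-envelope sketch. It assumes (this is not stated in the issue) a scatter-gather style edge in which every source task ends up with one stored DataMovementEvent per destination task, so the event count grows as sources times destinations. The class and method names are hypothetical.

```java
/** Hypothetical estimate for illustration; the edge model is an assumption. */
final class EventHeapEstimate {
    // Per-event size quoted in the issue description.
    static final long BYTES_PER_EVENT = 64L;

    /** Assumed model: one stored event per (source task, destination task) pair. */
    static long scatterGatherEvents(long sourceTasks, long destinationTasks) {
        return Math.multiplyExact(sourceTasks, destinationTasks);
    }

    /** Heap taken by the events alone, ignoring any container overhead. */
    static long heapBytes(long events) {
        return Math.multiplyExact(events, BYTES_PER_EVENT);
    }

    public static void main(String[] args) {
        long events = scatterGatherEvents(10_000, 1_000);
        long bytes = heapBytes(events);
        System.out.printf("%d events -> %d bytes (~%d MiB)%n",
                events, bytes, bytes / (1024 * 1024));
    }
}
```

Under that assumption, 10,000 source tasks feeding 1,000 destination tasks give 10,000,000 events, or 640,000,000 bytes (roughly 610 MiB) for the events alone.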




[jira] [Updated] (TEZ-946) Tez loses buffer-cache performance by running interleaved vertexes


 [ 
https://issues.apache.org/jira/browse/TEZ-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-946:

Target Version/s: 0.8.0  (was: 0.7.0)

 Tez loses buffer-cache performance by running interleaved vertexes
 --

 Key: TEZ-946
 URL: https://issues.apache.org/jira/browse/TEZ-946
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
 Attachments: union-10.svg


 For a task which has multiple reduce vertexes running to generate UNION ops, 
 the current Tez behaviour causes bad cache performance as well as bad perf 
 with the object registry.
 The map spill files get paged in and out of cache, when I was running a large 
 query which had multiple reducers pulling data off different shuffle edges at 
 the same time.
 Along with this, whenever a map-join vertex is interleaved with a reducer 
 vertex, the map-join hashtable gets dropped in the transition.
 It would be beneficial to schedule the vertexes at the same level with some 
 priority so that we finish them faster through better buffer-cache hit-rate 
 and object-registry hit-rate.




[jira] [Resolved] (TEZ-2381) Fixes after rebase 04/28


 [ 
https://issues.apache.org/jira/browse/TEZ-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-2381.
-
   Resolution: Fixed
Fix Version/s: TEZ-2003

 Fixes after rebase 04/28
 

 Key: TEZ-2381
 URL: https://issues.apache.org/jira/browse/TEZ-2381
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: TEZ-2003

 Attachments: TEZ-2381.1.txt







[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518219#comment-14518219
 ] 

Hitesh Shah commented on TEZ-2379:
--

Additional question: speculation enabled?

 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 T_ATTEMPT_KILLED at KILLED
 --

 Key: TEZ-2379
 URL: https://issues.apache.org/jira/browse/TEZ-2379
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Blocker
 Attachments: application_1429683757595_0479.log


 {noformat}
 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Additional notes:
 
 Hive - latest build 
 Tez - master
 tpch-200 gb scale q_17 (kill the job in the middle of execution)
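
For readers unfamiliar with this class of error, the following is a minimal sketch of a table-driven state machine that rejects an event with no registered transition, which is the general shape of the failure in the trace above. It is not the YARN StateMachineFactory or the Tez TaskImpl transition table: the class name, the reduced set of states and events, and the use of IllegalStateException as a stand-in for InvalidStateTransitonException are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, heavily simplified task state machine for illustration only. */
final class TaskStateMachineSketch {
    enum State { RUNNING, KILL_WAIT, KILLED }
    enum Event { T_TERMINATE, T_ATTEMPT_KILLED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    /** @param tolerateLateKills if true, register a self-transition on KILLED. */
    TaskStateMachineSketch(boolean tolerateLateKills) {
        addTransition(State.RUNNING, Event.T_TERMINATE, State.KILL_WAIT);
        addTransition(State.KILL_WAIT, Event.T_ATTEMPT_KILLED, State.KILLED);
        if (tolerateLateKills) {
            // A late or duplicate kill notification leaves the task in KILLED.
            addTransition(State.KILLED, Event.T_ATTEMPT_KILLED, State.KILLED);
        }
    }

    private void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws if no transition is registered for it. */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        TaskStateMachineSketch task = new TaskStateMachineSketch(false);
        task.handle(Event.T_TERMINATE);
        task.handle(Event.T_ATTEMPT_KILLED);
        try {
            task.handle(Event.T_ATTEMPT_KILLED); // second kill notification
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch a second T_ATTEMPT_KILLED arriving after the task is already KILLED is rejected unless a self-transition is registered. Whether that is the right fix for this issue depends on why the second event is generated in the first place.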




[jira] [Comment Edited] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518219#comment-14518219
 ] 

Hitesh Shah edited comment on TEZ-2379 at 4/28/15 10:19 PM:


Additional question: speculation enabled? /cc [~bikassaha]


was (Author: hitesh):
Additional question: speculation enabled?


[jira] [Updated] (TEZ-1348) Setup configs required for local mode automatically, instead of relying on changes to tez-site


 [ 
https://issues.apache.org/jira/browse/TEZ-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1348:
-
Target Version/s: 0.8.0  (was: 0.6.0)

 Setup configs required for local mode automatically, instead of relying on 
 changes to tez-site
 --

 Key: TEZ-1348
 URL: https://issues.apache.org/jira/browse/TEZ-1348
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Priority: Critical

 In TEZ-717, I incorrectly thought that setting fs.defaultFS programmatically in 
 tez-site would work for local mode.
 Currently the requirement is that tez-site.xml must have fs.defaultFS set to 
 file:///.
 While that works, it doesn't allow for seamless execution in either 
 local mode or on a cluster.
 The main issue here is that when Inputs / Outputs are configured, they use a 
 version of the configuration which reads tez-site, and do not use the 
 configuration from the client itself (which is correct behaviour).
 Not sure what a good way to fix this is:
 1) It may be possible to override this value each time an instance of 
 Configuration/TezConfiguration is created. One possible way would be to 
 statically add a default resource to Configuration the moment a local client 
 is created.
 2) Provide information in the contexts on whether this is local or not. This 
 is fairly ugly, and would get in the way of running mixed-mode tasks.
 Does anyone have other suggestions?




[jira] [Updated] (TEZ-1537) Expose edge information in I/P/IContexts


 [ 
https://issues.apache.org/jira/browse/TEZ-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1537:
-
Target Version/s: 0.8.0  (was: 0.6.0)

 Expose edge information in I/P/IContexts
 

 Key: TEZ-1537
 URL: https://issues.apache.org/jira/browse/TEZ-1537
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth

 Inputs / Outputs can make use of this information for specific optimizations.




[jira] [Updated] (TEZ-568) Allow I/P/Os to report specific time information to the AM


 [ 
https://issues.apache.org/jira/browse/TEZ-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-568:

Target Version/s: 0.8.0  (was: 0.7.0)

 Allow I/P/Os to report specific time information to the AM
 ---

 Key: TEZ-568
 URL: https://issues.apache.org/jira/browse/TEZ-568
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical

 The AM can include this information in the history logs, which allows for 
 much easier analysis of job performance.
 For example, shuffle could report its completion time, and Tez itself could 
 report initialization time.




[jira] [Commented] (TEZ-568) Allow I/P/Os to report specific time information to the AM


[ 
https://issues.apache.org/jira/browse/TEZ-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518193#comment-14518193
 ] 

Hitesh Shah commented on TEZ-568:
-

Moving to 0.8


[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED


[ 
https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518194#comment-14518194
 ] 

Rajesh Balamohan commented on TEZ-2379:
---

[~hitesh] - Yes, this is with master codebase. 


[jira] [Updated] (TEZ-1564) State machine error: Invalid event: T_SCHEDULE at SCHEDULED


 [ 
https://issues.apache.org/jira/browse/TEZ-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1564:
-
Target Version/s: 0.7.0  (was: 0.6.0)

 State machine error: Invalid event: T_SCHEDULE at SCHEDULED
 ---

 Key: TEZ-1564
 URL: https://issues.apache.org/jira/browse/TEZ-1564
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Priority: Critical
 Attachments: applogs.txt.tar.gz, dag.dot


 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.TaskImpl: Can't handle this event at current state for task_1409722953518_0162_1_07_00
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_SCHEDULE at SCHEDULED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:827)
     at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:95)
     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1604)
     at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1590)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
     at java.lang.Thread.run(Thread.java:724)
 I will attach the dag + app logs soon.




[jira] [Updated] (TEZ-1564) State machine error: Invalid event: T_SCHEDULE at SCHEDULED


 [ 
https://issues.apache.org/jira/browse/TEZ-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1564:
-
Target Version/s: 0.7.0, 0.6.1  (was: 0.7.0)


[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher


[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517978#comment-14517978
 ] 

Siddharth Seth commented on TEZ-1897:
-

Quick glance. This should be disabled by default.
Will look in detail a little later.

 Create a concurrent version of AsyncDispatcher
 --

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
 TEZ-1897.4.patch, TEZ-1897.5.patch


 Currently, it processes events on a single thread. For events that can be 
 executed in parallel, e.g. vertex manager events, allowing higher concurrency 
 may be beneficial.
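
As an illustration of the idea in the description (not of the attached patches), one possible shape for a concurrent dispatcher is a fixed set of single-threaded lanes, with each event routed to a lane by a key. Events sharing a key stay ordered, while events for different keys may run in parallel. KeyedConcurrentDispatcher and its methods are hypothetical names; the actual AsyncDispatcherConcurrent in the patch may work quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a keyed concurrent dispatcher; not the Tez class. */
final class KeyedConcurrentDispatcher<E> implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final Consumer<E> handler;

    KeyedConcurrentDispatcher(int numLanes, Consumer<E> handler) {
        if (numLanes < 1) {
            throw new IllegalArgumentException("numLanes must be >= 1");
        }
        this.handler = handler;
        for (int i = 0; i < numLanes; i++) {
            // One thread per lane keeps events within a lane in FIFO order.
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Queues the event on the lane chosen by the key's hash. */
    void dispatch(Object key, E event) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> handler.accept(event));
    }

    /** Stops accepting events and waits up to 10 seconds per lane for queued ones. */
    @Override
    public void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a single lane this degenerates to the current single-threaded behaviour, which is one way such a feature could be kept off by default.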




[jira] [Created] (TEZ-2381) Fixes after rebase 04/28

Siddharth Seth created TEZ-2381:
---

 Summary: Fixes after rebase 04/28
 Key: TEZ-2381
 URL: https://issues.apache.org/jira/browse/TEZ-2381
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth







[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher


[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517996#comment-14517996
 ] 

Bikas Saha commented on TEZ-1897:
-

The patch enabled it by default only to get a Jenkins run with it. It will be off 
by default.


[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.6.A.patch

Re-uploading patch as the previous one had incomplete diffs.


[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: (was: TEZ-776.6.A.patch)

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Bikas Saha
 Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
 TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.ondemand.1.patch, 
 TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
 TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
 TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
 With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
 events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
 without_patch_jmc_output_of_AM.png



[jira] [Commented] (TEZ-2376) Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE


[ 
https://issues.apache.org/jira/browse/TEZ-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516639#comment-14516639
 ] 

TezQA commented on TEZ-2376:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728721/TEZ-2376-1.patch
  against master revision 2522447.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/561//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/561//console

This message is automatically generated.

 Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE
 -

 Key: TEZ-2376
 URL: https://issues.apache.org/jira/browse/TEZ-2376
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Attachments: TEZ-2376-1.patch


 It is never used. 




[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher


[ 
https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516643#comment-14516643
 ] 

Jeff Zhang commented on TEZ-1897:
-

Comments:
* Rename registerAndCreateDispatcher to registerAndCreateConcurrentDispatcher?
  Rename registerWithExistingDispatcher to 
registerWithExistingConcurrentDispatcher?

* Javadoc for AsyncDispatcherConcurrent is the same as for AsyncDispatcher.

* No concurrentDispatcher checking in registerAndCreateDispatcher(Class<? 
extends Enum> eventType,
  EventHandler handler, String dispatcherName)

* Should the precondition require numThreads to be at least 2? I think 1 is 
the same as AsyncDispatcher.
{code}
 public AsyncDispatcherConcurrent registerAndCreateDispatcher(Class<? extends 
Enum> eventType,
  EventHandler handler, String dispatcherName, int numThreads) 
{code}

* TaskAttemptImpl.java
Some places still use getTask().getVertex(); these can call getVertex() directly.



 Create a concurrent version of AsyncDispatcher
 --

 Key: TEZ-1897
 URL: https://issues.apache.org/jira/browse/TEZ-1897
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, 
 TEZ-1897.4.patch, TEZ-1897.5.patch


 Currently, it processes events on a single thread. For events that can be 
 executed in parallel, e.g. vertex manager events, allowing higher concurrency 
 may be beneficial.
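
One common way to get this kind of concurrency while keeping related events in order is to route each event to one of several single-threaded workers based on a key. The sketch below illustrates that general technique only; it is not the AsyncDispatcherConcurrent from the patch, and the class name, the serializingKey parameter and the shutdown helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: events with the same key are handled in submission
// order on one thread; events with different keys may run in parallel.
public class ConcurrentDispatcherSketch<E> {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final Consumer<E> handler;

    public ConcurrentDispatcherSketch(int numThreads, Consumer<E> handler) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be at least 1");
        }
        this.handler = handler;
        for (int i = 0; i < numThreads; i++) {
            // Each worker has its own queue and a single thread.
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // Route the event to a worker chosen from the key's hash code.
    public void dispatch(Object serializingKey, E event) {
        int index = Math.floorMod(serializingKey.hashCode(), workers.size());
        workers.get(index).execute(() -> handler.accept(event));
    }

    // Stop accepting events and wait for queued ones to finish.
    // Returns false on timeout or if the calling thread is interrupted.
    public boolean shutdownAndAwait(long timeoutMillis) {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
        try {
            boolean allDone = true;
            for (ExecutorService worker : workers) {
                allDone &= worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            return allDone;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With numThreads set to 1 this degenerates to a single-threaded dispatcher, which is the point raised in the review comment about the numThreads precondition.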




[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: TEZ-776.5.patch

Cancelling and re-uploading patch due to build error. Triggering build again.


[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


[ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517533#comment-14517533
 ] 

TezQA commented on TEZ-776:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728848/TEZ-776.5.patch
  against master revision 5b2f011.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.TestTaskAttemptListenerImplTezDag
   org.apache.tez.dag.app.dag.impl.TestVertexImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/564//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/564//console

This message is automatically generated.


[jira] [Commented] (TEZ-2372) TestAMRecovery failing in latest build


[ 
https://issues.apache.org/jira/browse/TEZ-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517516#comment-14517516
 ] 

Hitesh Shah commented on TEZ-2372:
--

For the next failure, take a look at 
https://builds.apache.org/job/Tez-Build/ws/ - this should get you to the actual 
mini cluster dirs as well as the test output log file. 

 TestAMRecovery failing in latest build 
 ---

 Key: TEZ-2372
 URL: https://issues.apache.org/jira/browse/TEZ-2372
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah

 https://builds.apache.org/job/Tez-Build/1018/console




[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents


 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-776:
---
Attachment: (was: TEZ-776.5.patch)


[jira] [Assigned] (TEZ-2380) Disable fall back to reading from timeline if timeline disabled

2015-04-28 Thread Chang Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned TEZ-2380:
-

Assignee: Chang Li

 Disable fall back to reading from timeline if timeline disabled 
 

 Key: TEZ-2380
 URL: https://issues.apache.org/jira/browse/TEZ-2380
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Chang Li

 As part of TEZ-2226, timeline logging can get disabled in cases where domain 
 creation fails. In this scenario, the DAG client should also not fall back to 
 reading from timeline. 




[jira] [Commented] (TEZ-2376) Remove TaskAttemptEventType.TA_DIAGNOSTICS_UPDATE


[ 
https://issues.apache.org/jira/browse/TEZ-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516571#comment-14516571
 ] 

TezQA commented on TEZ-2376:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12728713/TEZ-2376-1.patch
  against master revision 2522447.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/559//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/559//console

This message is automatically generated.


Failed: TEZ-2376 PreCommit Build #559