[jira] [Commented] (TEZ-3096) Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails

2016-03-07 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184580#comment-15184580
 ] 

Zhiyuan Yang commented on TEZ-3096:
---

Sorry, it may take some time for me to figure it out because I'm still a beginner 
with Tez and I'm not familiar with Hive either.

> Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails
> ---
>
> Key: TEZ-3096
> URL: https://issues.apache.org/jira/browse/TEZ-3096
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Zhiyuan Yang
>
> Tasks are failing exactly 300ms into running due to a FileSystem error.
> {code}
> 2016-02-04 05:05:56,853 [ERROR] [Dispatcher thread {Central}] 
> |impl.TaskAttemptImpl|: Can't handle this event at current state for 
> attempt_1454544113740_0027_1_00_03_3
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:795)
> at 
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:120)
> at 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:2180)
> at 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:2165)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-04 05:05:56,903 [ERROR] [Dispatcher thread {Central}] 
> |impl.TaskAttemptImpl|: Can't handle this event at current state for 
> attempt_1454544113740_0027_1_00_00_3
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:795)
> at 
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:120)
> at 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:2180)
> at 
> org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:2165)
> at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
> at java.lang.Thread.run(Thread.java:745)
> {code}
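
The failure mode in the trace above can be reproduced with a much smaller model. The following is only a sketch under stated assumptions: it is not the YARN StateMachineFactory API and not the TaskAttemptImpl transition table. Only the names TA_TEZ_EVENT_UPDATE and KILL_IN_PROGRESS come from the log; every other state, event, and class name is hypothetical. It shows that an event with no registered transition for the current state is rejected, and that registering it as a no-op for that state is one possible way to make it harmless.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a task attempt state machine.
final class AttemptStateSketch {
    enum State { RUNNING, KILL_IN_PROGRESS, KILLED }
    enum Event { TA_TEZ_EVENT_UPDATE, TA_KILL_REQUEST, TA_KILLED }

    // Events that are legal in each state; anything else is rejected.
    private final Map<State, Set<Event>> allowed = new EnumMap<>(State.class);
    private State current = State.RUNNING;

    AttemptStateSketch(boolean ignoreUpdatesWhileKilling) {
        allowed.put(State.RUNNING, EnumSet.of(Event.TA_TEZ_EVENT_UPDATE, Event.TA_KILL_REQUEST));
        allowed.put(State.KILL_IN_PROGRESS, EnumSet.of(Event.TA_KILLED));
        allowed.put(State.KILLED, EnumSet.noneOf(Event.class));
        if (ignoreUpdatesWhileKilling) {
            // One possible remedy: accept late updates and treat them as a no-op.
            allowed.get(State.KILL_IN_PROGRESS).add(Event.TA_TEZ_EVENT_UPDATE);
        }
    }

    State handle(Event event) {
        if (!allowed.get(current).contains(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        switch (event) {
            case TA_KILL_REQUEST:
                current = State.KILL_IN_PROGRESS;
                break;
            case TA_KILLED:
                current = State.KILLED;
                break;
            default:
                break; // TA_TEZ_EVENT_UPDATE leaves the state unchanged
        }
        return current;
    }
}
```

With `new AttemptStateSketch(false)`, a TA_TEZ_EVENT_UPDATE that arrives after TA_KILL_REQUEST throws; with `new AttemptStateSketch(true)` it is accepted and the attempt stays in KILL_IN_PROGRESS.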



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3096) Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails

2016-03-07 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184219#comment-15184219
 ] 

Zhiyuan Yang commented on TEZ-3096:
---

No problem. I'll look at your JIRA soon and give you feedback.






[jira] [Comment Edited] (TEZ-3145) Reduce message size when empty partitions is high

2016-03-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180722#comment-15180722
 ] 

Jonathan Eagles edited comment on TEZ-3145 at 3/8/16 12:26 AM:
---

The slim DME idea is to only send the empty partition across in the DME if the 
destination is going to read the source partition.
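
A rough illustration of that idea, assuming the empty-partition information is a bit set indexed by source partition and that the partitions a destination reads are known when the event is built. The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.BitSet;

// Hypothetical helper; not part of the Tez code base.
final class SlimEmptyPartitionSketch {
    /**
     * Keeps only the empty-partition flags for the source partitions that
     * the destination will actually read, so the event carries less data.
     */
    static BitSet slim(BitSet emptyPartitions, int[] partitionsReadByDestination) {
        BitSet slim = new BitSet();
        for (int partition : partitionsReadByDestination) {
            if (emptyPartitions.get(partition)) {
                slim.set(partition);
            }
        }
        return slim;
    }
}
```

For a source whose partitions 0, 2 and 5 are empty, a destination that reads partitions 2 and 3 would receive a bit set containing only partition 2.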


was (Author: jeagles):
The SLIM DME idea is to only send the empty partition across in the DME if the 
destination is going to read the source partition.

> Reduce message size when empty partitions is high
> -
>
> Key: TEZ-3145
> URL: https://issues.apache.org/jira/browse/TEZ-3145
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-3145.2.patch
>
>






[jira] [Updated] (TEZ-3145) Reduce message size when empty partitions is high

2016-03-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3145:
-
Attachment: TEZ-3145.2.patch







[jira] [Updated] (TEZ-3145) Reduce message size when empty partitions is high

2016-03-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3145:
-
Attachment: (was: TEZ-3145.SLIM-DME.patch)







[jira] [Commented] (TEZ-2863) Container, node, and logs not available in UI for tasks that fail to launch

2016-03-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184052#comment-15184052
 ] 

Hitesh Shah commented on TEZ-2863:
--

+1. 



> Container, node, and logs not available in UI for tasks that fail to launch
> ---
>
> Key: TEZ-2863
> URL: https://issues.apache.org/jira/browse/TEZ-2863
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2863.1.patch, TEZ-2863.2-branch-0.7.patch, 
> TEZ-2863.2.patch, TEZ-2863.3-branch-0.7.patch, 
> TEZ-2863.3-branch-0.7.patch.addendum, TEZ-2863.3.patch, 
> TEZ-2863.3.patch.addendum, TEZ-2863.4-branch-0.7.patch, TEZ-2863.4.patch, 
> TEZ-2863.5-branch-0.7.patch, TEZ-2863.5.patch
>
>
> While running a sample tez job
> {noformat}
> tez-examples-*.jar orderedwordcount -Dtez.task.resource.memory.mb=1 
> -Dtez.task.launch.cmd-opts="-Xmx1m" input output
> {noformat}
> It was noticed that the Tez UI task attempt 
> http://timelineserverhost:port/ws/v1/timeline/TEZ_TASK_ATTEMPT_ID/attempt_id 
> was missing the TEZ_ATTEMPT_STARTED event
> {noformat}
> 2015-10-01 10:03:55,344 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1443711816411_0001_1][Event:TASK_STARTED]: 
> vertexName=Tokenizer, taskId=task_1443711816411_0001_1_00_00, 
> scheduledTime=1443711835342, launchTime=1443711835342
> 2015-10-01 10:03:55,346 [INFO] [Dispatcher thread {Central}] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:55,356 [INFO] [TaskSchedulerEventHandlerThread] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:55,364 [INFO] [TaskSchedulerEventHandlerThread] 
> |rm.YarnTaskSchedulerService|: Allocation request for task: 
> attempt_1443711816411_0001_1_00_00_0 with request: Capability[ vCores:1>]Priority[2] host: localhost rack: null
> 2015-10-01 10:03:56,639 [INFO] [AMRM Heartbeater thread] 
> |impl.AMRMClientImpl|: Received new token for : localhost:57381
> 2015-10-01 10:03:56,646 [INFO] [AMRM Callback Handler Thread] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:56,648 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: Assigning container to task: 
> containerId=container_1443711816411_0001_01_02, 
> task=attempt_1443711816411_0001_1_00_00_0, containerHost=localhost:57381, 
> containerPriority= 2, containerResources=, 
> localityMatchType=NodeLocal, matchedLocation=localhost, 
> honorLocalityFlags=true, reusedContainer=false, delayedContainers=0
> 2015-10-01 10:03:56,649 [INFO] [DelayedContainerManager] |util.RackResolver|: 
> Resolved localhost to /default-rack
> 2015-10-01 10:03:56,649 [INFO] [DelayedContainerManager] |util.RackResolver|: 
> Resolved localhost to /default-rack
> 2015-10-01 10:03:56,686 [INFO] [TaskSchedulerAppCaller #0] 
> |node.AMNodeTracker|: Adding new node: localhost:57381
> 2015-10-01 10:03:56,700 [INFO] [ContainerLauncher #0] 
> |launcher.ContainerLauncherImpl|: Launching 
> container_1443711816411_0001_01_02
> 2015-10-01 10:03:56,700 [INFO] [ContainerLauncher #0] 
> |impl.ContainerManagementProtocolProxy|: Opening proxy : localhost:57381
> 2015-10-01 10:03:56,741 [INFO] [ContainerLauncher #0] 
> |history.HistoryEventHandler|: [HISTORY][DAG:N/A][Event:CONTAINER_LAUNCHED]: 
> containerId=container_1443711816411_0001_01_02, launchTime=1443711836741
> 2015-10-01 10:03:57,647 [INFO] [AMRM Callback Handler Thread] 
> |rm.YarnTaskSchedulerService|: Allocated container 
> completed:container_1443711816411_0001_01_02 last allocated to task: 
> attempt_1443711816411_0001_1_00_00_0
> 2015-10-01 10:03:57,648 [INFO] [Dispatcher thread {Central}] 
> |container.AMContainerImpl|: Container container_1443711816411_0001_01_02 
> exited with diagnostics set to Container failed, exitCode=1. Exception from 
> container-launch.
> Container id: container_1443711816411_0001_01_02
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> 

[jira] [Commented] (TEZ-3096) Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails

2016-03-07 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184011#comment-15184011
 ] 

Tsuyoshi Ozawa commented on TEZ-3096:
-

I've also uploaded a patch, so I'd appreciate it if you could take a look. Thanks!






[jira] [Commented] (TEZ-3096) Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails

2016-03-07 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184009#comment-15184009
 ] 

Tsuyoshi Ozawa commented on TEZ-3096:
-

[~gopalv] [~aplusplus] could this be a duplicate of TEZ-3148?






[jira] [Commented] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183558#comment-15183558
 ] 

Hitesh Shah commented on TEZ-3155:
--

bq. this addition seems to have no relation to the proto being modified - why 
was this needed?

Please ignore. Surprising that findbugs has not reported this earlier for other 
patches. 

> Support a way to submit DAGs to a session where the DAG plan exceeds hadoop 
> ipc limits 
> ---
>
> Key: TEZ-3155
> URL: https://issues.apache.org/jira/browse/TEZ-3155
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3155.1.patch, TEZ-3155.2.patch, TEZ-3155.3.patch, 
> TEZ-3155.4.patch
>
>
> Currently, dag submissions fail if the dag plan exceeds the hadoop ipc 
> limits. One option would be to fall back to local resources if the dag plan 
> is too large. 





[jira] [Commented] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183553#comment-15183553
 ] 

Hitesh Shah commented on TEZ-3155:
--

Thanks for addressing the previous comments. Some more comments based on patch 
4: 

{code}

47  
48  
49  
50
{code}
  - this addition seems to have no relation to the proto being modified - why 
was this needed? 

TezClient: 

{code} private FileSystem fs = null; {code}
   - rename this to something like stagingFs. Also, this should be initialized 
once in init() and re-used. 

{code}
private static final int gapToMaxIPCSize = 5 * 1024 * 1024;
private AtomicInteger serializedDAGPlanCounter = new AtomicInteger(0);
{code}
  - the above need code comments to describe what the vars are. 
   - might be good to make gapToMaxIPCSize configurable (with a default of 5 MB). 
Mark the new config property as Private though.


{code}
dagClientConf.getInt(CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH,
    CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH_DEFAULT)) {
{code}
  - this should be a class member var and initialized once. Also it should use 
the main tezconf and not dagclientconf
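
A minimal sketch of that suggestion, assuming a plain string map as a stand-in for the Hadoop/Tez configuration objects. The class name, the reserved-bytes key, and both defaults are assumptions made for illustration; the IPC key string mirrors the Hadoop constant named above but should be checked against the Hadoop version in use. Both limits are read once in the constructor and kept as members, and the head-room is configurable rather than hard-coded.

```java
import java.util.Map;

// Hypothetical stand-in for the client-side size check; not TezClient code.
final class DagPlanSizeCheck {
    // Assumed key and default for the Hadoop IPC message limit.
    static final String IPC_MAX_KEY = "ipc.maximum.data.length";
    static final int IPC_MAX_DEFAULT = 64 * 1024 * 1024;
    // Hypothetical key for the configurable head-room (default 5 MB).
    static final String RESERVED_KEY = "example.client.ipc.reserved.bytes";
    static final int RESERVED_DEFAULT = 5 * 1024 * 1024;

    // Read once at construction time instead of on every submission.
    private final int maxIpcBytes;
    private final int reservedBytes;

    DagPlanSizeCheck(Map<String, String> conf) {
        this.maxIpcBytes = Integer.parseInt(
            conf.getOrDefault(IPC_MAX_KEY, String.valueOf(IPC_MAX_DEFAULT)));
        this.reservedBytes = Integer.parseInt(
            conf.getOrDefault(RESERVED_KEY, String.valueOf(RESERVED_DEFAULT)));
    }

    /** True when the plan plus head-room no longer fits in one IPC message. */
    boolean mustSendViaFile(long serializedPlanBytes) {
        return serializedPlanBytes + reservedBytes > maxIpcBytes;
    }
}
```

With the limit set to 8 MB and the default 5 MB head-room, a 2 MB plan still goes over IPC, while the same plan against a 1 MB limit has to be written to a file.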

{code} TezConfiguration tezConf = amConfig.getTezConfiguration(); {code} 
   - no need to create an extra local var. Just use 
"amConfig.getTezConfiguration()" directly 

{code}
/* we need manually delete the serialized dagplan since staging path here won't be destroyed */
Path dagPlanPath = new Path(request.getSerializedDagPlanPath());
FileSystem fs = dagPlanPath.getFileSystem(conf);
fs.delete(dagPlanPath, false);
{code}
  - this is not reliable if there is a test failure or an exception is thrown
  - staging dir should be set to target and also use the local fs
  - Using local fs could be done by having a package private method to override 
the stagingFs in TezClient with the value of FileSystem::getLocal 
  - For the dag plan file, use deleteOnExit() 
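
To illustrate why an explicit delete at the end of a test is fragile, here is a sketch that uses java.nio.file and java.io.File.deleteOnExit() in place of the Hadoop FileSystem calls discussed above. The class and method names are hypothetical. The file is removed in a finally block, so it goes away even when the caller's code throws, and deleteOnExit() acts as a backstop if the JVM exits first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper; stands in for the staging-dir handling in the test.
final class PlanFileCleanupSketch {
    /**
     * Writes the plan into stagingDir, runs the action against the file,
     * and removes the file afterwards whether or not the action throws.
     */
    static void withPlanFile(Path stagingDir, byte[] plan, Consumer<Path> action) {
        try {
            Files.createDirectories(stagingDir);
            Path planFile = Files.createTempFile(stagingDir, "dagplan-", ".bin");
            planFile.toFile().deleteOnExit(); // backstop if the JVM exits early
            try {
                Files.write(planFile, plan);
                action.accept(planFile);
            } finally {
                Files.deleteIfExists(planFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Pointing stagingDir at a directory under the build output (for example target/) keeps any leftovers out of shared locations.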

TestTezClient:

{code}
int maxIPCMsgSize = 1024;
conf.setInt(CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH, maxIPCMsgSize);
processorDescriptor.setUserPayload(UserPayload.create(ByteBuffer.allocate(2*maxIPCMsgSize)));
{code}
   - processorDescriptor.setUserPayload() is not being invoked for the 
largeDagPlan false case? Shouldn't it always be set to, say, 2 MB in both 
scenarios, with the max limit changed to 1 MB in one scenario and, say, 8 MB 
(+5 for the overhead check) in the other? This can be played around with to 
address my following comments on the buffer and additional resources checks. 
   - how is the 5 MB buffer check being tested? 
   - Also, there is no test if additionalResources ( or a combination of dag 
plan + additional rsrcs ) exceeds ipc limits? 

DAGClientAMProtocolBlockingPBServerImpl: 

   - fs can be initialized in the ctor itself 

{code}
try (FSDataInputStream fsDataInputStream = fs.open(requestPath)) {
173   dagPlan = DAGPlan.parseFrom(fsDataInputStream);
174 } catch (IOException e) {
175   throw wrapException(e);
176 }
{code}
  - won't the exception thrown at line 173 be caught by the catch at line 186?
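
The behaviour behind this question can be checked in isolation. The sketch below assumes the outer catch handles IOException (or a supertype); the patch's actual catch type at line 186 is not shown here, and the class and method names are hypothetical. An exception thrown inside a try-with-resources block that has no catch of its own propagates to the enclosing try after the resource is closed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the server-side deserialization path.
final class NestedCatchSketch {
    // Stand-in for DAGPlan.parseFrom(...): always fails for the demonstration.
    static int parse(InputStream in) throws IOException {
        throw new IOException("simulated parse failure");
    }

    static String load() {
        try {
            int plan;
            // No inner catch: the stream is closed, then the exception propagates.
            try (InputStream in = new ByteArrayInputStream(new byte[0])) {
                plan = parse(in);
            }
            return "parsed " + plan;
        } catch (IOException e) {
            // The enclosing handler sees the failure from the inner block.
            return "outer catch: " + e.getMessage();
        }
    }
}
```

`NestedCatchSketch.load()` returns "outer catch: simulated parse failure", so under that assumption the inner catch is only needed if the failure must be wrapped differently from other errors.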

testSubmitDagInSessionWithLargeDagPlan
  - test could be enhanced to verify the payload contents after deserialization 


   










[jira] [Commented] (TEZ-2954) Container launch timeouts should count towards node blacklisting

2016-03-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183434#comment-15183434
 ] 

Siddharth Seth commented on TEZ-2954:
-

[~ozawa] - I'll try looking at the patch by the end of the week.

> Container launch timeouts should count towards node blacklisting
> 
>
> Key: TEZ-2954
> URL: https://issues.apache.org/jira/browse/TEZ-2954
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-2954.001.patch
>
>
> Currently, only task failures count towards blacklisting. A container timing 
> out should do the same.





[jira] [Commented] (TEZ-3096) Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails

2016-03-07 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183340#comment-15183340
 ] 

Zhiyuan Yang commented on TEZ-3096:
---

I would like to take this task. If there is no problem, I will assign this JIRA 
to myself.






[jira] [Assigned] (TEZ-3096) Statemachine: TA_TEZ_EVENT_UPDATE at KILL_IN_PROGRESS fails

2016-03-07 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned TEZ-3096:
-

Assignee: Zhiyuan Yang






[jira] [Updated] (TEZ-3140) Reduce AM memory usage while serialization

2016-03-07 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated TEZ-3140:

Attachment: TEZ-3140-3.patch

Committed to branch-0.7 and master. Attaching the final patch, TEZ-3140-3.patch, 
which moves the test to TestEntityDescriptor.java. Thanks [~sseth] for the 
review.

> Reduce AM memory usage while serialization
> --
>
> Key: TEZ-3140
> URL: https://issues.apache.org/jira/browse/TEZ-3140
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3140-1.patch, TEZ-3140-2.patch, TEZ-3140-3.patch
>
>
>There is an unnecessary copy of userpayload byte array during 
> serialization.


