[jira] [Commented] (HIVE-10648) LLAP: registry; Tez attempted to schedule to daemon that didn't exist

2015-11-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010040#comment-15010040
 ] 

Sergey Shelukhin commented on HIVE-10648:
-

[~gopalv] any update on this?

> LLAP: registry; Tez attempted to schedule to daemon that didn't exist
> -
>
> Key: HIVE-10648
> URL: https://issues.apache.org/jira/browse/HIVE-10648
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Gopal V
>
> I can post logs externally; for now app IDs on test cluster are 
> application_1429683757595_0784 and application_1429683757595_0783, I also 
> have logs copied over.
> AM found the node (same logs for other nodes):
> {noformat}
> 2015-05-07 12:13:28,074 INFO 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] 
> impl.LlapYarnRegistryImpl: Adding new worker 
> 342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance 
> [alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with 
> resources=memory:20480, vCores:6]
> 
> 2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker: 
> Num cluster nodes = 19
> {noformat}
> Trouble is, this node never actually existed... The cluster only had 15 
> nodes. 
> As the job was progressing, AM repeatedly tried to schedule to this node and 
> failed. There was no other LLAP cluster running at the same time.
> In fact, given that I always start a 15-node cluster I am not sure where 
> 19-node data could conceivably come from...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10648) LLAP: registry; Tez attempted to schedule to daemon that didn't exist

2015-05-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533486#comment-14533486
 ] 

Sergey Shelukhin commented on HIVE-10648:
-

Hmm, that's true actually; and the cluster had 16 nodes. It appears that one 
node didn't exist or was picked up incorrectly.

> LLAP: registry; Tez attempted to schedule to daemon that didn't exist
> -
>
> Key: HIVE-10648
> URL: https://issues.apache.org/jira/browse/HIVE-10648
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Gopal V
>
> I can post logs externally; for now app IDs on test cluster are 
> application_1429683757595_0784 and application_1429683757595_0783, I also 
> have logs copied over.
> AM found the node (same logs for other nodes):
> {noformat}
> 2015-05-07 12:13:28,074 INFO 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] 
> impl.LlapYarnRegistryImpl: Adding new worker 
> 342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance 
> [alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with 
> resources=memory:20480, vCores:6]
> 
> 2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker: 
> Num cluster nodes = 19
> {noformat}
> Trouble is, this node never actually existed... The cluster only had 15 
> nodes. 
> As the job was progressing, AM repeatedly tried to schedule to this node and 
> failed. There was no other LLAP cluster running at the same time.
> In fact, given that I always start a 15-node cluster I am not sure where 
> 19-node data could conceivably come from...





[jira] [Commented] (HIVE-10648) LLAP: registry; Tez attempted to schedule to daemon that didn't exist

2015-05-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533464#comment-14533464
 ] 

Gopal V commented on HIVE-10648:


bq. 2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] 
node.AMNodeTracker: Num cluster nodes = 19

That's the number of NodeManagers in YARN AFAIK - you do have 19 of those.
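
One way to keep the two numbers apart when reading an AM log is to count the distinct workers the LLAP registry reported and compare that with the "Num cluster nodes" value. The following is a minimal sketch, assuming log lines shaped like the two quoted in the issue description; the function name `summarize_am_log` and both regular expressions are illustrative and not part of Hive or Tez.

```python
import re

# Patterns modeled on the two log lines quoted in the issue description.
# \s+ tolerates the line wrapping that email clients introduce.
WORKER_RE = re.compile(r"Adding\s+new\s+worker\s+([0-9a-f-]{36})")
NODES_RE = re.compile(r"Num\s+cluster\s+nodes\s+=\s+(\d+)")


def summarize_am_log(text):
    """Return (set of registry worker IDs, last reported node count or None)."""
    workers = set(WORKER_RE.findall(text))
    counts = NODES_RE.findall(text)
    yarn_nodes = int(counts[-1]) if counts else None
    return workers, yarn_nodes


# Sample text copied from the log excerpt in the issue description.
sample = """
2015-05-07 12:13:28,074 INFO
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler]
impl.LlapYarnRegistryImpl: Adding new worker
342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance
[alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with
resources=memory:20480, vCores:6]

2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker:
Num cluster nodes = 19
"""

workers, yarn_nodes = summarize_am_log(sample)
print("registry workers:", len(workers))
print("reported cluster nodes:", yarn_nodes)
```

Run over a full AM log, the size of the worker set is the figure to compare with the number of LLAP daemons actually started; the "Num cluster nodes" value is a separate count.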



