[
https://issues.apache.org/jira/browse/HIVE-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HIVE-10649:
Description:
See HIVE-10648.
When the AM cannot connect to a node, it appears to stall. In the example log
below, no other log messages are interleaved with the connection retries (other
than the periodic registry refresh), even though this happens in the middle of
Map 1 on TPCH q1, i.e. while plenty of tasks are scheduled.
From the "Assigning" messages I can also see that tasks are scheduled to all
nodes before and after the pause, not just to the problematic node.
The LLAP daemons have corresponding gaps: between two fragments, nothing is run
for a long time on any daemon.
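As a rough illustration of how long such a pause could last, the sketch below reads the retry policy out of two retry records copied from the log in this issue and multiplies the retry count by the fixed sleep. It assumes the stall lasts as long as the retry loop, which this issue does not confirm, and it ignores any per-attempt connect time, so the result is only a sleep-time estimate. The helper names (`sleep_only_stall_ms`, `remaining_ms`) are hypothetical and are not part of Hive, Tez, or Hadoop.

```python
import re

# Two retry records copied from the AM log in this issue, unwrapped onto single lines.
LOG = (
    "2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying "
    "connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already "
    "tried 10 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)\n"
    "2015-05-07 12:13:59,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying "
    "connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already "
    "tried 23 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)\n"
)

# Matches the attempt counter and the policy parameters printed by ipc.Client.
RETRY = re.compile(
    r"Already tried (?P<tried>\d+) time\(s\); retry policy is "
    r"RetryUpToMaximumCountWithFixedSleep\(maxRetries=(?P<max>\d+), "
    r"sleepTime=(?P<sleep>\d+) MILLISECONDS\)"
)


def sleep_only_stall_ms(max_retries, sleep_ms):
    """Time spent sleeping if every retry is used (hypothetical helper)."""
    return max_retries * sleep_ms


def remaining_ms(tried, max_retries, sleep_ms):
    """Sleep time still ahead after `tried` attempts (hypothetical helper)."""
    return max(max_retries - tried, 0) * sleep_ms


records = [
    {key: int(value) for key, value in match.groupdict().items()}
    for match in RETRY.finditer(LOG)
]
last = records[-1]
total = sleep_only_stall_ms(last["max"], last["sleep"])
left = remaining_ms(last["tried"], last["max"], last["sleep"])
print(f"sleep-only estimate: {total} ms, still ahead after last record: {left} ms")
```

With the policy shown in the log (maxRetries=50, sleepTime=1000 MILLISECONDS), the sleep time alone adds up to about 50 seconds per unreachable node if every retry is used.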
{noformat}
2015-05-07 12:13:46,679 INFO [Dispatcher thread: Central] impl.TaskImpl:
task_1429683757595_0784_1_00_000276 Task Transitioned from SCHEDULED to RUNNING
due to event T_ATTEMPT_LAUNCHED
2015-05-07 12:13:46,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 10 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:46,955 INFO [LlapSchedulerNodeEnabler]
impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:47,811 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 11 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:48,812 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 12 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:49,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 13 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:50,813 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 14 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:51,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 15 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:52,814 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 16 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:53,815 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 17 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:54,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 18 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:55,816 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 19 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 20 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:56,971 INFO [LlapSchedulerNodeEnabler]
impl.LlapYarnRegistryImpl: Starting to refresh ServiceInstanceSet 1611673583
2015-05-07 12:13:57,817 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 21 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:58,818 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 22 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:13:59,819 INFO [TaskCommunicator # 3] ipc.Client: Retrying
connect to server: cn059-10.l42scl.hortonworks.com/172.19.128.59:15001. Already
tried 23 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-05-07 12:14:00,819 INFO