[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2022-11-22 Thread Zhiguo Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiguo Wu updated AMBARI-25604:
---
Fix Version/s: 2.8.0

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> --
>
> Key: AMBARI-25604
> URL: https://issues.apache.org/jira/browse/AMBARI-25604
> Project: Ambari
>  Issue Type: Bug
>Reporter: Andrew Onischuk
>Assignee: Andrew Onischuk
>Priority: Major
> Fix For: 2.8.0, 2.7.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During blueprint deploy we don't rely on the topology cache since AMBARI-23660;
> the correct topology is sent with the command. However, the topology from the
> topology event can be wrong, as described in AMBARI-23660.
> The problem occurs when the agent still tries to process the broken topology
> from the event. The agent needs to handle this failure with a warning.
> Currently it just fails the whole command.
> {code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
>     command = self.generate_command(command_header)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
>     command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
>     'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
>   File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
>     return f(*args, **kw)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
>     hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
> KeyError: 10{code}
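The failing expression at ClusterTopologyCache.py line 112 can be reproduced in isolation. This is a minimal sketch under stated assumptions: `Host`, `Component`, `get_hostnames_strict`, and the toy host ids and names are hypothetical stand-ins, and the layout (cluster id mapped to a dict of host id to host, with each component listing its host ids) is inferred from the expression in the traceback, not taken from the Ambari source. A host id that the topology event lists but the cache lacks (here `10`) raises `KeyError: 10`, matching the log.

```python
from collections import namedtuple

# Hypothetical stand-ins for the agent's cached topology objects;
# the real ClusterTopologyCache data structures may differ.
Host = namedtuple("Host", ["hostId", "hostName"])
Component = namedtuple("Component", ["componentName", "hostIds"])


def get_hostnames_strict(hosts_to_id, cluster_id, component):
    # Same lookup shape as the failing line in the traceback:
    # any host id missing from hosts_to_id[cluster_id] raises KeyError.
    return [hosts_to_id[cluster_id][host_id].hostName for host_id in component.hostIds]


# Toy data: the cache knows hosts 1 and 2 only ...
hosts_to_id = {"0": {1: Host(1, "host1.example.com"), 2: Host(2, "host2.example.com")}}
# ... while a stale topology event lists host 10 for a component.
component = Component("DATANODE", [1, 2, 10])

try:
    get_hostnames_strict(hosts_to_id, "0", component)
except KeyError as err:
    print("KeyError: %s" % err)  # prints "KeyError: 10"
```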



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ambari.apache.org
For additional commands, e-mail: issues-h...@ambari.apache.org



[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-18 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Description: 
During blueprint deploy we don't rely on the topology cache since AMBARI-23660;
the correct topology is sent with the command. However, the topology from the
topology event can be wrong, as described in AMBARI-23660.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.
Currently it just fails the whole command.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since AMBARI-23660;
the correct topology is sent with the command. However, the topology from the
topology event can be wrong, as described in AMBARI-23660.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}
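One way the agent could "handle this failure with a warning" is to skip host ids it cannot resolve instead of letting the `KeyError` abort the whole command. This is a minimal sketch, not the committed Ambari patch: `get_hostnames_tolerant`, `Host`, and `Component` are hypothetical names, and the data layout (cluster id mapped to host id mapped to host) is an assumption based on the expression in the traceback.

```python
import logging
from collections import namedtuple

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("topology_sketch")

# Hypothetical stand-ins for cached topology entries (assumed layout).
Host = namedtuple("Host", ["hostId", "hostName"])
Component = namedtuple("Component", ["componentName", "hostIds"])


def get_hostnames_tolerant(hosts_to_id, cluster_id, component):
    """Resolve a component's host ids to hostnames, skipping unknown ids.

    Unknown ids are logged as warnings instead of raising KeyError, so a
    stale or inconsistent topology event does not fail the whole command.
    """
    known_hosts = hosts_to_id.get(cluster_id, {})
    hostnames = []
    for host_id in component.hostIds:
        host = known_hosts.get(host_id)
        if host is None:
            logger.warning(
                "Host id %s of component %s is not in the topology cache of cluster %s; skipping it",
                host_id, component.componentName, cluster_id)
            continue
        hostnames.append(host.hostName)
    return hostnames


hosts_to_id = {"0": {1: Host(1, "host1.example.com"), 2: Host(2, "host2.example.com")}}
print(get_hostnames_tolerant(hosts_to_id, "0", Component("DATANODE", [1, 2, 10])))
# logs a warning for host id 10, then prints ['host1.example.com', 'host2.example.com']
```

Skipping the unknown host keeps the command running; the trade-off is that a host list built from the cache may be incomplete, which the description treats as acceptable because the correct topology arrives with the command during blueprint deploy.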





[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-18 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Description: 
During blueprint deploy we don't rely on topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]
So correct topology is send with
the command, however the topology from the topology event can be wrong as per 
AMBARI-23660. 

The problem occurs when we still try to process broken topology from the event 
on agent. Agent need to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the correct topology is
sent with the command. However, the topology from the topology event can be
wrong.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}
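The description also implies an ordering: when a command carries its own topology, that topology should win over whatever the agent derives from the (possibly wrong) topology event. The sketch below assumes, hypothetically, that the command is a dict with an optional `clusterHostInfo` entry and that `build_from_cache` is a zero-argument callable that may raise `KeyError`; `resolve_cluster_host_info` and `broken_cache` are illustrative names, not Ambari's API, and merging with command values taking precedence is an assumed policy.

```python
import logging

logger = logging.getLogger("topology_sketch")


def resolve_cluster_host_info(command, build_from_cache):
    """Combine cache-derived clusterHostInfo with the topology sent in the command.

    Hypothetical helper. `command` is assumed to be a dict that may contain a
    server-supplied "clusterHostInfo" mapping (as during blueprint deploy);
    `build_from_cache` is a zero-argument callable that may raise KeyError when
    the cached topology is inconsistent.
    """
    try:
        from_cache = build_from_cache()
    except KeyError as err:
        # Downgrade the failure to a warning instead of failing the command.
        logger.warning("Cannot build clusterHostInfo from the topology cache "
                       "(unknown key %s); relying on the topology sent with the command", err)
        from_cache = {}

    merged = dict(from_cache)
    # Entries sent with the command override the cache-derived ones.
    merged.update(command.get("clusterHostInfo") or {})
    return merged


def broken_cache():
    # Simulates the lookup failure from the traceback.
    raise KeyError(10)


command = {"clusterHostInfo": {"datanode_hosts": ["host1.example.com", "host2.example.com"]}}
print(resolve_cluster_host_info(command, broken_cache))
# prints {'datanode_hosts': ['host1.example.com', 'host2.example.com']}
```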





[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-18 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Description: 
During blueprint deploy we don't rely on the topology cache since AMBARI-23660;
the correct topology is sent with the command. However, the topology from the
topology event can be wrong, as described in AMBARI-23660.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the correct topology is
sent with the command. However, the topology from the topology event can be
wrong, as described in AMBARI-23660.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}





[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-18 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Description: 
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the correct topology is
sent with the command. However, the topology from the topology event can be
wrong. The problem occurs when the agent still tries to process the broken
topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the topology is sent with
the command. However, the problem occurs when we still try to generate it on
the agent, and that fails.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}





[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-18 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Description: 
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the correct topology is
sent with the command. However, the topology from the topology event can be
wrong.

The problem occurs when the agent still tries to process the broken topology
from the event. The agent needs to handle this failure with a warning.

{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the correct topology is
sent with the command. However, the topology from the topology event can be
wrong. The problem occurs when the agent still tries to process the broken
topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}





[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-18 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Description: 
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the topology is sent with
the command. However, the problem occurs when we still try to generate it on
the agent, and that fails.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}



  was:
During blueprint deploy we don't rely on the topology cache since
[https://issues.apache.org/jira/browse/AMBARI-23660]; the topology is sent with
the command. However, the problem occurs when we still try to generate it on
the agent, and that fails.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}






[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-10 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch-2.7



[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters

2020-12-10 Thread Andrew Onischuk (Jira)



Andrew Onischuk updated AMBARI-25604:
-
Status: Patch Available  (was: Open)
