[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiguo Wu updated AMBARI-25604:
-------------------------------
    Fix Version/s: 2.8.0

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.8.0, 2.7.6
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since AMBARI-23660,
> so the correct topology is sent with the command. However, the topology from the
> topology event can be wrong, as per AMBARI-23660.
> The problem occurs when the agent still tries to process the broken topology from the
> event. The agent needs to handle this failure with a warning. Currently
> it just fails the whole command.
> {code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
>     command = self.generate_command(command_header)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
>     command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
>     'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
>   File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
>     return f(*args, **kw)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
>     hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
> KeyError: 10{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ambari.apache.org
For additional commands, e-mail: issues-h...@ambari.apache.org
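The failing expression in the traceback indexes self.hosts_to_id[cluster_id] with every id in component_dict.hostIds, so a single id missing from the cache (here 10) aborts the whole command. A minimal sketch of the behaviour the description asks for, a warning instead of a failure, could look like the following. It is not the committed patch: resolve_hostnames, the plain-dict cache, and the log message are hypothetical stand-ins for the agent's ClusterTopologyCache internals.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_hostnames(hosts_by_id, host_ids):
    """Return host names for host_ids, skipping ids that are not cached.

    hosts_by_id is a hypothetical stand-in for hosts_to_id[cluster_id];
    here it maps a host id directly to a host name.
    """
    hostnames = []
    for host_id in host_ids:
        if host_id in hosts_by_id:
            hostnames.append(hosts_by_id[host_id])
        else:
            # A broken topology event can reference a host the cache
            # never received; warn and keep going instead of raising.
            logger.warning("Cannot find host_id=%s in the topology cache", host_id)
    return hostnames
```

Called as resolve_hostnames({1: "host1", 2: "host2"}, [1, 2, 10]), this returns the two known names and logs one warning for id 10 instead of raising KeyError.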
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description:
During blueprint deploy we have not relied on the topology cache since AMBARI-23660, so the correct topology is sent with the command. However, the topology from the topology event can be wrong, as per AMBARI-23660.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning. Currently it just fails the whole command.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

  was:
During blueprint deploy we have not relied on the topology cache since AMBARI-23660, so the correct topology is sent with the command. However, the topology from the topology event can be wrong, as per AMBARI-23660.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since AMBARI-23660,
> so the correct topology is sent with the command. However, the topology from the
> topology event can be wrong, as per AMBARI-23660.
> The problem occurs when the agent still tries to process the broken topology from the
> event. The agent needs to handle this failure with a warning. Currently
> it just fails the whole command.
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is sent with the command. However, the topology from the topology event can be wrong, as per AMBARI-23660.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

  was:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is sent with the command. However, the topology from the topology event can be wrong.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since
> [https://issues.apache.org/jira/browse/AMBARI-23660],
> so the correct topology is sent with the command. However, the topology from the
> topology event can be wrong, as per AMBARI-23660.
> The problem occurs when the agent still tries to process the broken topology from the
> event. The agent needs to handle this failure with a warning.
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description:
During blueprint deploy we have not relied on the topology cache since AMBARI-23660, so the correct topology is sent with the command. However, the topology from the topology event can be wrong, as per AMBARI-23660.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

  was:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is sent with the command. However, the topology from the topology event can be wrong, as per AMBARI-23660.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since AMBARI-23660,
> so the correct topology is sent with the command. However, the topology from the
> topology event can be wrong, as per AMBARI-23660.
> The problem occurs when the agent still tries to process the broken topology from the
> event. The agent needs to handle this failure with a warning.
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is sent with the command. However, the topology from the topology event can be wrong. The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

  was:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the topology is sent with the command. But the problem occurs when we still try to generate it on the agent and fail.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since
> [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is
> sent with the command. However, the topology from the topology event can be wrong.
> The problem occurs when the agent still tries to process the broken topology from the
> event. The agent needs to handle this failure with a warning.
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is sent with the command. However, the topology from the topology event can be wrong.
The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

  was:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the correct topology is sent with the command. However, the topology from the topology event can be wrong. The problem occurs when the agent still tries to process the broken topology from the event. The agent needs to handle this failure with a warning.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since
> [https://issues.apache.org/jira/browse/AMBARI-23660],
> so the correct topology is sent with the command. However, the topology from the
> topology event can be wrong.
> The problem occurs when the agent still tries to process the broken topology from the
> event. The agent needs to handle this failure with a warning.
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Description:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the topology is sent with the command. But the problem occurs when we still try to generate it on the agent and fail.
{code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
    command = self.generate_command(command_header)
  File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
    command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
    'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
  File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
    return f(*args, **kw)
  File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
    hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
KeyError: 10{code}

  was:
During blueprint deploy we have not relied on the topology cache since [https://issues.apache.org/jira/browse/AMBARI-23660], so the topology is sent with the command. But the problem occurs when we still try to generate it on the agent and fail.
{code:java} ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10 Traceback (most recent call last): File "/usr/lib /ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand command = self.generate_command(command_header) File "/usr/lib /ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp) File "/usr/lib/ambari- agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration 'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id), File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction return f(*args, **kw) File "/usr/lib/ambari- agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds] KeyError: 10{code}

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since
> [https://issues.apache.org/jira/browse/AMBARI-23660], so the topology is sent with
> the command. But the problem occurs when we still try to generate it on the agent and
> fail.
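Since the server already sends the topology with the command, one way to read this description is that a failure to regenerate clusterHostInfo on the agent need not be fatal. The sketch below illustrates that idea under stated assumptions: build_command and generate_from_cache are hypothetical names, and the shallow merge in which values from the command header override generated ones is an assumption about the agent, not something this thread states.

```python
import logging

logger = logging.getLogger(__name__)


def build_command(command_header, generate_from_cache):
    """Combine cache-generated values with the ones sent in the command.

    generate_from_cache is a hypothetical callable standing in for the
    agent-side configuration builder; it may raise KeyError when the
    cached topology is inconsistent.
    """
    try:
        generated = generate_from_cache()
    except KeyError as error:
        # Downgrade the failure to a warning and fall back to the command.
        logger.warning("Topology cache is inconsistent, missing key %s", error)
        generated = {}
    command = dict(generated)
    command.update(command_header)  # assumed: values sent with the command win
    return command
```

With this shape, a broken cache costs only the agent-generated values; whatever the server sent in the command header is still executed.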
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to branch-2.7

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> During blueprint deploy we have not relied on the topology cache since
> [https://issues.apache.org/jira/browse/AMBARI-23660], so the topology is sent with
> the command. But the problem occurs when we still try to generate it on the agent and
> fail.
> {code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
>     command = self.generate_command(command_header)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
>     command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
>     'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
>   File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
>     return f(*args, **kw)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
>     hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
> KeyError: 10{code}
[jira] [Updated] (AMBARI-25604) During blueprint deploy tasks sometimes fail due to KeyError on large clusters
[ https://issues.apache.org/jira/browse/AMBARI-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Onischuk updated AMBARI-25604:
-------------------------------------
    Status: Patch Available  (was: Open)

> During blueprint deploy tasks sometimes fail due to KeyError on large clusters
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-25604
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25604
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>             Fix For: 2.7.6
>
>
> During blueprint deploy we have not relied on the topology cache since
> [https://issues.apache.org/jira/browse/AMBARI-23660], so the topology is sent with
> the command. But the problem occurs when we still try to generate it on the agent and
> fail.
> {code:java}ERROR 2020-12-10 06:30:09,350 CustomServiceOrchestrator.py:459 - Caught an exception while executing custom service command: <type 'exceptions.KeyError'>: 10; 10
> Traceback (most recent call last):
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 324, in runCommand
>     command = self.generate_command(command_header)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/CustomServiceOrchestrator.py", line 507, in generate_command
>     command_dict = self.configuration_builder.get_configuration(cluster_id, service_name, component_name, required_config_timestamp)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ConfigurationBuilder.py", line 43, in get_configuration
>     'clusterHostInfo': self.topology_cache.get_cluster_host_info(cluster_id),
>   File "/usr/lib/ambari-agent/lib/ambari_agent/Utils.py", line 230, in newFunction
>     return f(*args, **kw)
>   File "/usr/lib/ambari-agent/lib/ambari_agent/ClusterTopologyCache.py", line 112, in get_cluster_host_info
>     hostnames = [self.hosts_to_id[cluster_id][host_id].hostName for host_id in component_dict.hostIds]
> KeyError: 10{code}