Re: [rdo-users] RHOSP 10 failed overcloud deployment

2018-02-02 Thread Anda Nicolae
Hi all,

Thank you very much for your support. I think I am getting pretty close to 
finishing my deployment.
My status now is:
'openstack stack resource list overcloud' shows all resources in 
CREATE_COMPLETE state except AllNodesDeploySteps, which is in 
CREATE_FAILED state.
resource_status_reason is:
Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6

From the JSON file in /var/lib/heat-config/deployed on the Controller node 
that has status_code 6, I have:

deploy_stdout: Dependency Service[aodh-api] has failures: true
Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Dependency Service[ceilometer-api] has failures: true
Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Dependency Service[gnocchi-api] has failures: true
Notice: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Dependency Service[aodh-api] has failures: true
Notice: /Stage[main]/Neutron::Keystone::Auth/Keystone::Resource::Service_identity[neutron]/Keystone_user[neutron]: Dependency Service[ceilometer-api] has failures: true
Notice: /Stage[main]/Neutron::Keystone::Auth/Keystone::Resource::Service_identity[neutron]/Keystone_user[neutron]: Dependency Service[gnocchi-api] has failures: true
Notice: /Stage[main]/Neutron::Keystone::Auth/Keystone::Resource::Service_identity[neutron]/Keystone_user[neutron]:

deploy_stderr: Could not look up qualified variable '::nova::api::admin_user'; class ::nova::api has not been evaluated
Warning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::admin_password'; class ::nova::api has not been evaluated
Warning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::admin_tenant_name'; class ::nova::api has not been evaluated
Warning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::auth_uri'; class ::nova::api has not been evaluated
Warning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::auth_version'; class ::nova::api has not been evaluated
Warning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::identity_uri'; class ::nova::api has not been evaluated
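The deploy_stdout and deploy_stderr values in those JSON files carry Puppet's colour codes as JSON-escaped ANSI sequences (\u001b[0m, \u001b[1;31m and so on), which makes long outputs hard to read. A minimal Python 3 sketch that strips them, assuming the file has already been loaded with json.load (so each \u001b has become a real ESC character); the function name clean_puppet_output is hypothetical:

```python
import re

# Matches ANSI SGR colour sequences such as ESC[0m, ESC[m and ESC[1;31m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def clean_puppet_output(text):
    """Return Puppet output with colour codes removed, one message per list item."""
    # Drop the colour sequences, then split on the embedded newlines.
    stripped = ANSI_SGR.sub("", text)
    return [line for line in stripped.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = ("Dependency Service[aodh-api] has failures: true\x1b[0m\n"
              "\x1b[mNotice: /Stage[main]/Keystone::Deps/"
              "Anchor[keystone::service::end]: Dependency "
              "Service[gnocchi-api] has failures: true\x1b[0m")
    for line in clean_puppet_output(sample):
        print(line)
```

In practice the input would be something like json.load(open(path))["deploy_stdout"] for the failing file.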

By my estimate, the deployment now fails after about half an hour, not after 
4 hours as it did before.

I think my deployment failed because I haven't yet defined InternalApiNetCidr 
and TenantNetCidr.
The next step will be to define these in network-environment.yaml.
I will use static IP addresses for both InternalApiNetCidr and TenantNetCidr, 
and I will add these static IP addresses to ips-from-pool-all.yaml.
I will also define ../network/ports/internal_api_from_pool.yaml and 
../network/ports/tenant_from_pool.yaml.
Please let me know if you have any other ideas about why my deployment fails.
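Before adding static addresses to ips-from-pool-all.yaml, it may be worth checking that each one actually falls inside the CIDR it belongs to and is not the network or broadcast address. A Python 3 sketch using the standard ipaddress module; the CIDRs and addresses in the example are hypothetical placeholders, not values from this deployment:

```python
import ipaddress


def check_static_ips(cidr, addresses):
    """Return a list of problems for static IPs that do not fit the given CIDR."""
    network = ipaddress.ip_network(cidr)
    problems = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip not in network:
            problems.append(f"{addr} is outside {cidr}")
        elif network.num_addresses > 2 and ip in (network.network_address,
                                                  network.broadcast_address):
            # Network and broadcast addresses cannot be assigned to a host.
            problems.append(f"{addr} is a reserved address in {cidr}")
    return problems


if __name__ == "__main__":
    # Hypothetical values standing in for InternalApiNetCidr / TenantNetCidr.
    print(check_static_ips("172.16.2.0/24", ["172.16.2.10", "172.16.3.5"]))
    print(check_static_ips("172.16.0.0/24", ["172.16.0.0", "172.16.0.20"]))
```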



To get here, I added the following lines to both controller.yaml and 
compute.yaml:
routes:
  -
    ip_netmask: 169.254.169.254/32
    next_hop: {get_param: EC2MetadataIp}
  -

and the lines:
  -
    type: interface
    name: eth1
    use_dhcp: false # This effectively disables NIC1
  -
    type: interface
    name: eth2
    use_dhcp: false # This effectively disables NIC2

On both the controller and the compute overcloud VMs, I have the following 
routing table:
Kernel IP routing table
Destination      Gateway   Genmask          Flags  Metric  Ref  Use  Iface
0.0.0.0                    0.0.0.0          UG     0       0    0    br-ex
                 0.0.0.0   255.255.255.128  U      0       0    0    br-ex
                 0.0.0.0   255.255.255.0    U      0       0    0    eth3
169.254.169.254            255.255.255.255  UGH    0       0    0    eth3
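The kernel picks a route by longest-prefix match: the most specific matching destination wins, so traffic to 169.254.169.254 should leave through the /32 host route on eth3 rather than the default route on br-ex. A small Python 3 illustration of that rule; the interface names follow the routing table above, but 192.0.2.0/24 is a hypothetical stand-in for the subnet whose destination is missing from it, and real kernel lookups also weigh metrics and policy rules, which this sketch ignores:

```python
import ipaddress

# (destination, interface) pairs modelled on the routing table above.
# 192.0.2.0/24 is a hypothetical placeholder for the elided ctlplane subnet.
ROUTES = [
    ("0.0.0.0/0", "br-ex"),
    ("192.0.2.0/24", "eth3"),
    ("169.254.169.254/32", "eth3"),
]


def pick_interface(dest, routes=ROUTES):
    """Return the interface of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(dest)
    best = None
    for cidr, iface in routes:
        net = ipaddress.ip_network(cidr)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


if __name__ == "__main__":
    print(pick_interface("169.254.169.254"))  # host route wins: eth3
    print(pick_interface("8.8.8.8"))          # only the default matches: br-ex
```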

Thanks,
Anda

From: Pedro Sousa [mailto:pgso...@gmail.com]
Sent: Friday, February 2, 2018 1:22 PM
To: Anda Nicolae
Cc: ra...@redhat.com; users@lists.rdoproject.org
Subject: Re: [rdo-users] RHOSP 10 failed overcloud deployment


Re: [rdo-users] RHOSP 10 failed overcloud deployment

2018-02-02 Thread Pedro Sousa
Hi Anda,

all the issues seem to be related: if you're using tunneled networks, you need
to configure tenant networks on both the controller and the computes.

Also, if you're using static IPs, you should have the internal networks
defined and bind them in ServiceNetMap.

On the compute nodes, if you don't use the external network, make sure you
have the default route and the 169.254.169.254/32 route on the ctlplane
network, something like this:

network_config:
  -
    type: interface
    name: nic1
    use_dhcp: false
    dns_servers: {get_param: DnsServers}
    addresses:
      -
        ip_netmask:
          list_join:
            - '/'
            - - {get_param: ControlPlaneIp}
              - {get_param: ControlPlaneSubnetCidr}
    routes:
      -
        ip_netmask: 169.254.169.254/32
        next_hop: {get_param: EC2MetadataIp}
      -
        default: true
        next_hop: {get_param: ControlPlaneDefaultRoute}
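The ip_netmask entry above uses Heat's list_join intrinsic to glue ControlPlaneIp and ControlPlaneSubnetCidr together with '/'. A Python approximation of what that yields; the parameter values are hypothetical, and ControlPlaneSubnetCidr is assumed to hold only the prefix length (e.g. '24'), which is how the stock TripleO NIC templates appear to use it:

```python
def list_join(delimiter, *lists):
    """Approximate Heat's list_join: join every element of the lists with delimiter."""
    items = [str(item) for lst in lists for item in lst]
    return delimiter.join(items)


if __name__ == "__main__":
    # Hypothetical parameter values, standing in for {get_param: ...} lookups.
    params = {"ControlPlaneIp": "192.0.2.10", "ControlPlaneSubnetCidr": "24"}
    ip_netmask = list_join("/", [params["ControlPlaneIp"],
                                 params["ControlPlaneSubnetCidr"]])
    print(ip_netmask)  # 192.0.2.10/24
```

If ControlPlaneSubnetCidr held a full CIDR instead of a prefix length, the result would be malformed, which is one thing worth checking in the environment files.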

Hope it helps.




On Fri, Feb 2, 2018 at 9:04 AM, Anda Nicolae <anico...@lenovo.com> wrote:


Re: [rdo-users] RHOSP 10 failed overcloud deployment

2018-02-02 Thread Raoul Scarazzini
On 02/02/2018 10:04 AM, Anda Nicolae wrote:
> I have 2 questions regarding the deployment:
> 1. Does any of the error messages above cause the failed deployment of
> the Compute resource?
> 2. In my network-environment.yaml, I haven't set InternalApiNetCidr,
> TenantNetCidr, InternalApiNetworkVlanID, TenantNetworkVlanID.
> Do I need to set these in order to make the overcloud deployment work?

Which step are you failing at now? Now that you can log in to the
controllers and computes, it would be really helpful if you could go into
/var/lib/heat-config/deployed and look for a status_code other than 0 in
the JSON files there. That should give you an idea of why it is failing.
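That check can be scripted. A minimal Python 3 sketch that reads the *.json files in /var/lib/heat-config/deployed and reports those whose deploy_status_code is non-zero; it assumes the relevant files are JSON objects carrying a deploy_status_code key, as the status code 6 output quoted in this thread suggests, and it skips files that do not parse or lack the key. The function names are hypothetical, and reading that directory usually needs root:

```python
import glob
import json
import os


def failed_deployments(documents):
    """Given {filename: json_text}, return [(filename, code)] for non-zero codes."""
    failures = []
    for name, text in sorted(documents.items()):
        try:
            data = json.loads(text)
        except ValueError:
            continue  # not JSON, ignore it
        if not isinstance(data, dict):
            continue
        code = data.get("deploy_status_code")
        if code not in (None, 0):
            failures.append((name, code))
    return failures


def scan_directory(path="/var/lib/heat-config/deployed"):
    """Read every *.json file under path and report the failing ones."""
    documents = {}
    for filename in glob.glob(os.path.join(path, "*.json")):
        with open(filename) as handle:
            documents[os.path.basename(filename)] = handle.read()
    return failed_deployments(documents)


if __name__ == "__main__":
    for name, code in scan_directory():
        print(f"{name}: deploy_status_code={code}")
```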

-- 
Raoul Scarazzini
ra...@redhat.com
___
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscr...@lists.rdoproject.org


Re: [rdo-users] RHOSP 10 failed overcloud deployment

2018-02-02 Thread Anda Nicolae
Hi all,

Thanks for the info about the 2 networks (external and ctlplane) that I need on 
the overcloud VMs (controller and compute).
Now br-ex on my overcloud VMs has the external IP address, and I am able to 
ping the overcloud VMs on both their external and ctlplane IP addresses.

Also, since I use static IPs for the external network, my 
ips-from-pool-all.yaml contains:
OS::TripleO::Compute::Ports::ExternalPort: ../network/ports/external_from_pool_compute.yaml

external_from_pool_compute.yaml is similar to the external_from_pool.yaml file. 
I've noticed that if I use noop.yaml, the external IP is not assigned to the 
eth0 interface on the compute node.
I hope it is correct to use it like this.

I have continued with my overcloud deployment and I've noticed that some 
progress has been made:
- Controller resource is now in CREATE_COMPLETE state
- although the deployment still fails, I can connect to the overcloud VMs via 
both ctlplane IP and external IP and check the logs after the deploy operation 
fails

The Compute resource fails with the reason "CREATE aborted". I've looked in 
/var/log/messages on the overcloud compute VM and noticed the following 
error messages, which keep repeating:
Feb  2 03:09:36 localhost os-collect-config: Source [ec2] Unavailable.
Feb  2 03:09:36 localhost os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Feb  2 03:09:36 localhost os-collect-config: No local metadata found (['/var/lib/os-collect-config/local-data'])
Feb  2 03:10:16 localhost os-collect-config: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/meta-data/ (Caused by ConnectTimeoutError(, 'Connection to 169.254.169.254 timed out. (connect timeout=10.0)'))
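The last line shows os-collect-config timing out on a TCP connection to 169.254.169.254 port 80. The same check can be reproduced by hand from the compute node. A Python 3 sketch using only the standard socket module; the function name is hypothetical and the 10-second default simply mirrors the timeout in the log:

```python
import socket


def can_connect(host="169.254.169.254", port=80, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (socket.timeout included) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the compute node, can_connect() returning False after about ten seconds would match the symptom above and point at the routing to the metadata address rather than at os-collect-config itself.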


From heat-engine.log, I have:
2018-02-01 19:26:32.253 3348 DEBUG neutronclient.v2_0.client 
[req-c27f050c-b743-4e1d-a706-e01e63a43b49 fdfcf2f659a94e57829dbefc618f3d3b 
453c1e37b83f4f8e8a49dab299e8224d - - -] Error message: {"NeutronError": 
{"message": "Port 0292b718-2c28-4b0c-a517-c481c547b711 could not be found.", 
"type": "PortNotFound", "detail": ""}} _handle_fault_response 
/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:266


I have 2 questions regarding the deployment:
1. Does any of the error messages above cause the failed deployment of the 
Compute resource?
2. In my network-environment.yaml, I haven't set InternalApiNetCidr, 
TenantNetCidr, InternalApiNetworkVlanID, TenantNetworkVlanID.
Do I need to set these in order to make the overcloud deployment work?

Thanks,
Anda


From: Anda Nicolae
Sent: Wednesday, January 31, 2018 12:40 PM
To: 'Pedro Sousa'
Cc: ra...@redhat.com; users@lists.rdoproject.org
Subject: RE: [rdo-users] RHOSP 10 failed overcloud deployment

I've just run 'neutron net-list' on the undercloud node and it shows the 2 
networks, ctlplane and external.
I believed that I didn't need the external network and only needed the 
provisioning (ctlplane) network for the deployment.
I don't have a DHCP server for my external network.

Do I need to set the external IP address for the compute node and for the 
controller node in the YAML files in the templates folder?

Thanks,
Anda

From: Pedro Sousa [mailto:pgso...@gmail.com]
Sent: Wednesday, January 31, 2018 12:32 PM
To: Anda Nicolae
Cc: ra...@redhat.com; users@lists.rdoproject.org
Subject: Re: [rdo-users] RHOSP 10 failed overcloud deployment

Hi Anda,

some things you could check:

Do you have 2 networks on director (ctlplane and external) and are they 
reachable from the overcloud nodes?

It seems to me that you have network issues, and that's why you're seeing 
those long timeouts.

For the "Message: No valid host was found. There are not enough hosts available" 
message, you could check /var/log/nova/nova-conductor.log.

Regards


On Wed, Jan 31, 2018 at 10:14 AM, Anda Nicolae <anico...@lenovo.com> wrote:

Re: [rdo-users] RHOSP 10 failed overcloud deployment

2018-01-31 Thread Anda Nicolae
I've let the deployment run overnight and it failed after almost 4 hours with 
the errors below. Do you happen to know which config file lets me decrease the 
timeout? I looked in /etc/nova/nova.conf and in the ironic config files but I 
couldn't find anything relevant.

The errors are:

[overcloud.Compute.0]: CREATE_FAILED  ResourceInError: 
resources[0].resources.NovaCompute: Went to status ERROR due to "Message: 
Unknown, Code: Unknown"
[overcloud.Controller.0]: CREATE_FAILED  Resource CREATE failed: 
ResourceInError: resources.Controller: Went to status ERROR due to "Message: No 
valid host was found. There are not enough hosts available., Code: 500"

It is unclear to me why the above errors occur, since in my instackenv.json I 
declared node capabilities for both the compute and the controller node that 
are greater than the compute and controller flavors from 'openstack flavor list'.

However, I've found this link and I am looking over it:
https://docs.openstack.org/ironic/latest/admin/troubleshooting.html#nova-returns-no-valid-host-was-found-error
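The reasoning above (node properties at least as large as the flavor) can be written down as a quick pre-check. This is a deliberately simplified Python 3 sketch, not Nova's scheduler logic: it compares the Ironic node properties cpus, memory_mb and local_gb against the flavor's vcpus, ram and disk and, assuming the TripleO profile-tagging convention (a node capability string such as 'profile:control,boot_option:local' and a flavor extra spec 'capabilities:profile'), checks that the profiles agree. Function names and sample values are hypothetical:

```python
# (Ironic node property, Nova flavor field) pairs compared by this sketch.
PAIRS = (("cpus", "vcpus"), ("memory_mb", "ram"), ("local_gb", "disk"))


def parse_capabilities(caps):
    """Turn a capability string like 'profile:control,boot_option:local' into a dict."""
    result = {}
    for pair in filter(None, caps.split(",")):
        key, _, value = pair.partition(":")
        result[key.strip()] = value.strip()
    return result


def node_fits_flavor(node, flavor):
    """Return reasons why node cannot satisfy flavor; an empty list means it fits."""
    reasons = []
    # Simplified resource check: each node value must be >= the flavor value.
    for node_key, flavor_key in PAIRS:
        have, need = int(node.get(node_key, 0)), int(flavor.get(flavor_key, 0))
        if have < need:
            reasons.append(f"{node_key}: node has {have}, flavor needs {need}")
    # Profile check, assuming TripleO-style 'capabilities:profile' extra specs.
    wanted = flavor.get("extra_specs", {}).get("capabilities:profile")
    if wanted is not None:
        have_profile = parse_capabilities(node.get("capabilities", "")).get("profile")
        if have_profile != wanted:
            reasons.append(f"profile: node has {have_profile!r}, flavor needs {wanted!r}")
    return reasons


if __name__ == "__main__":
    # Hypothetical values, not taken from this deployment.
    node = {"cpus": 4, "memory_mb": 8192, "local_gb": 40,
            "capabilities": "profile:compute,boot_option:local"}
    flavor = {"vcpus": 4, "ram": 4096, "disk": 40,
              "extra_specs": {"capabilities:profile": "control"}}
    print(node_fits_flavor(node, flavor))
```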

Thanks,
Anda

-----Original Message-----
From: Raoul Scarazzini [mailto:ra...@redhat.com] 
Sent: Tuesday, January 30, 2018 8:17 PM
To: Anda Nicolae; users@lists.rdoproject.org
Subject: Re: [rdo-users] RHOSP 10 failed overcloud deployment

On 01/30/2018 04:39 PM, Anda Nicolae wrote:
> Got it. 
> 
> I've noticed that it spends quite some time in CREATE_IN_PROGRESS state for the 
> OS::Heat::ResourceGroup resource (on the Controller node).
> The overcloud deployment fails after 4h. I will check which config file the 
> overcloud deployment timeout is configured in and decrease it.
> 
> Thanks,
> Anda

Also check your network settings. 4h is the default timeout, and that is what 
you end up hitting when something is unreachable.

--
Raoul Scarazzini
ra...@redhat.com


[rdo-users] RHOSP 10 failed overcloud deployment

2018-01-30 Thread Anda Nicolae
Hello,
As stated in my previous mail to this list, I am trying to deploy 
OpenStack 10 using OpenStack Platform Director 10.
I am using a bare-metal server with Red Hat 7.4, on which I have created 3 VMs: 
the 1st VM is the undercloud node, the 2nd VM is the overcloud controller node 
and the 3rd VM is the overcloud compute node.
The bare-metal server I am using is also my KVM hypervisor for the overcloud.

I managed to provision my overcloud nodes and now I am stuck at performing 
overcloud deployment.
The command I am running is:
openstack --debug overcloud deploy --templates ~/templates \
  --control-scale 1 --compute-scale 1 \
  --control-flavor control --compute-flavor compute \
  -e ~/templates/environments/network-isolation.yaml \
  -e ~/templates/environments/network-environment.yaml \
  --ntp-server pool.ntp.org \
  --neutron-network-type vxlan --neutron-tunnel-types vxlan

I connected via SSH as the heat-admin user to my controller and compute nodes. 
I ran the following command to gather logs:
sudo journalctl -u os-collect-config

I think the problem is on my controller node, because I've noticed the 
following messages in the output of the above command:
os-collect-config[2996]: Source [ec2] Unavailable.
os-collect-config[2996]: /var/lib/os-collect-config/local-data not found. Skipping
os-collect-config[2996]: No local metadata found (['/var/lib/os-collect-config/local-data'])

These messages repeat many times in the output of the above command.
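Because the same few os-collect-config messages repeat many times, it can help to collapse the journal into one count per distinct message. A Python 3 sketch that reads a log file saved from journalctl (e.g. sudo journalctl -u os-collect-config > occ.log); the script name count_occ.py is hypothetical, and the parsing assumes the "os-collect-config[PID]: message" layout shown above:

```python
import re
import sys
from collections import Counter

# Captures the text after "os-collect-config:" or "os-collect-config[PID]:".
MESSAGE = re.compile(r"os-collect-config(?:\[\d+\])?:\s*(.*)$")


def count_messages(lines):
    """Return a Counter mapping each distinct os-collect-config message to its count."""
    counts = Counter()
    for line in lines:
        match = MESSAGE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_occ.py occ.log
    with open(sys.argv[1]) as handle:
        for message, count in count_messages(handle).most_common():
            print(f"{count:6d}  {message}")
```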

On my undercloud VM, I've noticed that the overcloud deployment remains stuck 
while running the wait_for_stack_ready function from 
/usr/lib/python2.7/site-packages/tripleoclient/utils.py.
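Conceptually, waiting for a stack means polling its status until it reaches a terminal state or a deadline passes, which is why an unreachable node can leave the client apparently stuck until the overall timeout expires. This is a generic Python 3 sketch of that pattern, not tripleoclient's actual wait_for_stack_ready; get_status is a hypothetical callable returning a Heat status string such as CREATE_IN_PROGRESS, and the clock and sleep hooks exist only to make the loop testable:

```python
import time


def wait_for_terminal_status(get_status, timeout_s, poll_s=5.0,
                             clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it ends in _COMPLETE or _FAILED, or timeout_s elapses.

    Returns the final status string, or None if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status.endswith("_COMPLETE") or status.endswith("_FAILED"):
            return status
        if clock() >= deadline:
            return None
        sleep(poll_s)
```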

I also intend to add some logs in 
/usr/lib/python2.7/site-packages/os_collect_config/collect.py to see what 
causes the error message: Source [ec2] Unavailable

I think I have an error in my templates, but I haven't figured out which one 
yet. Do you know what may cause this?

Thanks,
Anda