Re: Unable to add template to new deployment

2021-06-26 Thread Joshua Schaeffer


On 6/24/21 5:31 AM, Andrija Panic wrote:
> LXC is nothing short of untested recently (for years) - the ones that DO
> work (used in production by people) are KVM, XenServer/XCP-ng, VMware.
> That's all.
> LXC, OVM and co, are most probably doomed, to be honest.
Thanks, I'm not surprised to hear this. I will switch to KVM.

-- 
Thanks,
Joshua Schaeffer



Re: Unable to add template to new deployment

2021-06-24 Thread Andrija Panic
LXC is nothing short of untested recently (for years) - the ones that DO
work (used in production by people) are KVM, XenServer/XCP-ng, VMware.
That's all.
LXC, OVM and co, are most probably doomed, to be honest.

Best,

On Wed, 23 Jun 2021 at 09:27, Joshua Schaeffer wrote:


Re: Unable to add template to new deployment

2021-06-23 Thread Joshua Schaeffer


 A thing that I briefly touched somewhere upstairs ^^^ - for each traffic
 type you have defined - you need to define a traffic label - my deduction
 capabilities make me believe you are using KVM, so you need to set your KVM
 traffic label for all your network traffic (traffic label, in your case =
 exact name of the bridge as visible in Linux) - I recall there are some new
 UI issues when it comes to tags, so go to your :8080/client/legacy
 - and check your traffic label there - and set it there, UI in 4.15.0.0
 doesn't allow you to update/set it after the zone is created - but old UI
 will allow you to do it.

I changed over all the bonds to the standard naming convention and that did the 
trick. I also added the storage network back as you suggested. Thanks again for 
those pointers. However, I may have discovered a bug. I'm actually trying to 
test an LXC hypervisor instead of KVM and it isn't using the network labels. 
There seem to be two problems:

1. You can't actually set the LXC network label in the new UI because there is 
no option for it. There is an option in the legacy UI; however, it doesn't 
actually update the database and it throws a warning in the management logs.
2. Even if you set the labels directly in the database, ACS doesn't seem to use 
them. I'm not 100% sure, but it looks like it defaults to the settings on the 
compute host. In my case this is causing problems with the storage network.

For the first problem, if all the labels are set to NULL:

user@dbserver:~$ sudo mysql -D cloud -e "SELECT id, traffic_type, 
lxc_network_label FROM physical_network_traffic_types;"
+----+--------------+-------------------+
| id | traffic_type | lxc_network_label |
+----+--------------+-------------------+
| 11 | Management   | NULL              |
| 12 | Public       | NULL              |
| 13 | Guest        | NULL              |
| 14 | Storage      | NULL              |
+----+--------------+-------------------+

and I attempt to set the LXC network label in the legacy UI, it remains NULL in 
the database and I see this warning in the logs:

2021-06-23 05:42:20,977 WARN  [c.c.a.d.ParamGenericValidationWorker] 
(qtp1644231115-887:ctx-a97e9424 ctx-5d6ce3c6) (logid:3e68476e) Received unknown 
parameters for command updateTrafficType. Unknown parameters : lxcnetworklabel
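
For reference, outside the UI a traffic label is normally set through the 
updateTrafficType API. A minimal CloudMonkey sketch is below, with placeholder 
UUIDs; kvmnetworklabel is the parameter used for KVM labels, and an LXC 
equivalent is exactly what appears to be missing/rejected here:

cmk list traffictypes physicalnetworkid=<physical-network-uuid>    # find the traffic type UUIDs
cmk update traffictype id=<traffic-type-uuid> kvmnetworklabel=cloudbr0   # sets the KVM label; no lxcnetworklabel parameter seems to be accepted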

In order to get the right labels I updated the database manually:

user@dbserver:~$ sudo mysql -D cloud -e "UPDATE physical_network_traffic_types 
SET lxc_network_label = 'cloudbr0' WHERE id = 11;"
user@dbserver:~$ sudo mysql -D cloud -e "UPDATE physical_network_traffic_types 
SET lxc_network_label = 'cloudbr1' WHERE id in (12,13);"
user@dbserver:~$ sudo mysql -D cloud -e "UPDATE physical_network_traffic_types 
SET lxc_network_label = 'cloudbr2' WHERE id = 14;"
user@dbserver:~$ sudo mysql -D cloud -e "SELECT id, traffic_type, 
lxc_network_label FROM physical_network_traffic_types;"
+----+--------------+-------------------+
| id | traffic_type | lxc_network_label |
+----+--------------+-------------------+
| 11 | Management   | cloudbr0          |
| 12 | Public       | cloudbr1          |
| 13 | Guest        | cloudbr1          |
| 14 | Storage      | cloudbr2          |
+----+--------------+-------------------+

However, this leads to my second problem; it doesn't seem to actually use the 
correct network interface. I think it uses the default that is set on the 
compute host (maybe as a fallback), but I could be wrong about that. This is 
what is set on my compute host in the agent.properties file:

user@cmpserver:~$ sudo cat /etc/cloudstack/agent/agent.properties | egrep '(network\.device|hypervisor\.type)'
private.network.device=cloudbr0
guest.network.device=cloudbr1
hypervisor.type=lxc
public.network.device=cloudbr1

And I can see in virsh that the management and public interfaces use cloudbr0 
and cloudbr1 respectively; however, the storage interface for the VM uses 
cloudbr0 when it should use cloudbr2:

root@s-38-VM:~# ip --brief link show eth3
eth3             UP             1e:00:ac:00:03:6a <BROADCAST,MULTICAST,UP,LOWER_UP>

root@bllcloudcmp01:~# virsh dumpxml s-38-VM | grep -B 1 -A 8 '1e:00:ac:00:03:6a'
    <interface type='bridge'>
      <mac address='1e:00:ac:00:03:6a'/>
      <source bridge='cloudbr0'/>
      ...
    </interface>

I set up another cluster and host with the exact same configuration, except 
running KVM instead of LXC, and set the KVM labels to the same as the LXC labels 
as a test. I then started the system VMs on the new host. You can see that 
virsh is using the cloudbr2 bridge for the VM:

root@s-32-VM:~# ip --brief link show eth3
eth3             UP             1e:00:f8:00:03:df <BROADCAST,MULTICAST,UP,LOWER_UP>

root@bllcloudcmp02:~# virsh dumpxml s-32-VM | grep -B 1 -A 8 '1e:00:f8:00:03:df'
    <interface type='bridge'>
      <mac address='1e:00:f8:00:03:df'/>
      <source bridge='cloudbr2'/>
      ...
    </interface>

Notice the <source bridge> is set to cloudbr2 now, with the only difference 
being the hypervisor.
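
As a cross-check that both hypervisor labels really were identical during this 
test, a query along the following lines could be run; the kvm_network_label 
column name is an assumption, mirroring the lxc_network_label column shown above:

user@dbserver:~$ sudo mysql -D cloud -e "SELECT id, traffic_type, kvm_network_label, lxc_network_label FROM physical_network_traffic_types;"   # compare the KVM and LXC label columns side by side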

Is LXC still supported (it is still mentioned in the docs)? If not then I'll 
just switch to using KVM.

-- 
Thanks,
Joshua Schaeffer

Re: Unable to add template to new deployment

2021-06-21 Thread Andrija Panic
You're most welcome!

(and apologies for the naming convention jokes - I also would name things
in a meaningful way instead of bond0/1 etc - the same way I'm switching
back from those "predictable" interface names ("ensp0p1" and similar) to
old-fashioned eth0, eth1, etc - not sure what kind of drugs the engineers
were taking when they came up with those "predictable" interface names...)

Cheers,

On Fri, 18 Jun 2021 at 07:16, jschaeffer wrote:


Re: Unable to add template to new deployment

2021-06-17 Thread jschaeffer

Andrija,

Thanks so much for all the details. I'm out of the office for the next 
couple of days so will update my cloud with your suggestions when I get 
back.


As far as the "fancy" naming, I just never found names like bondX useful 
when Linux allows naming the network device something else. It has just 
become a convention of mine. I can easily distinguish which bond carries 
cloud traffic and which carries storage traffic by looking at the bond 
name, but it is just a personal thing and I can easily switch back to using 
the standard bond names.


I was aware of the traffic labels but forgot to mention that I had set 
those up in my previous email. There were still some details that you 
provided that helped me further understand how they work though, thanks.


Again, thanks for your help.

On 2021-06-17 22:04, Andrija Panic wrote:


Re: Unable to add template to new deployment

2021-06-17 Thread Andrija Panic
BTW, once you think you have fixed all your network configuration issues -
destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup" -
so that new VMs are created.
Inside the SSVM, run the following script, which should give you results
similar to the below - confirming that your SSVM is healthy:



root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh

First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server

Good: DNS resolves cloudstack.apache.org

nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point

Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250

Good: Java process is running

Tests Complete. Look for ERROR or WARNING above.
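
The destroy-and-recreate step mentioned at the top of this message can also be
driven from CloudMonkey; a rough sketch with placeholder UUIDs (not a literal
recipe) might be:

cmk destroy systemvm id=<cpvm-or-ssvm-uuid>          # destroy the CPVM/SSVM so fresh ones are created
cmk restart network id=<network-uuid> cleanup=true   # restart a network with cleanup so its VR is rebuilt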

On Thu, 17 Jun 2021 at 23:55, Andrija Panic wrote:


Re: Unable to add template to new deployment

2021-06-17 Thread Andrija Panic
Since you really bothered to provide so very detailed inputs and help us
help you (vs what some other people tend to do) -  I think you really
deserved a decent answer (and some explanation).

The last question first - even though you don't specify/have dedicated
Storage traffic, there will be an additional interface inside the SSVM
connected to the same Management network (not to the old Storage network -
if you see the old storage network, restart your mgmt server and destroy
the SSVM - a new one should be created, with proper interfaces inside it)

bond naming issues:
- rename your "bond-services" to something industry-standard like "bond0"
or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you
specify a VLAN for a network that ACS should create - so your
"bond-services", while fancy (and unclear to me WHY you named it in that
weird way - smiley here) - is NOT something CloudStack will recognize, and
this is the reason it fails (it even says so in that error message)
- no reason to NOT have that dedicated storage network - feel free to
bring it back - the same issue you have as for the public traffic - rename
"bond-storage" to e.g. "bond1" and you will be good to go - since you are
NOT using tagging, ACS will just plug the vNIC of the VM into cloudbr2 (or
whatever bridge name you use for it); see the rename sketch right below.
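
Purely as an illustration (iproute2 only; the interface has to be down while it
is renamed, and the persistent netplan/ifupdown definition would need the same
change):

ip link set dev bond-storage down
ip link set dev bond-storage name bond1   # switch to a conventional bond name
ip link set dev bond1 up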

Now some explanation (even though your deduction capabilities certainly
made you draw some conclusions from what I wrote above ^^^)

- When you specify a VLAN id for some network in CloudStack - CloudStack
will look for the device name that is specified as the "Traffic label" for
that traffic (and you have none??? for your Public traffic - while it
should be set to the name of the bridge device "cloudbr1") - and then it
will provision a VLAN interface and create a new bridge (i.e. for a Public
network with VLAN id 48, it will extract "bond0" from "cloudbr1", create a
bond0.48 VLAN interface, AND create a brand new bridge with this bond0.48
interface (a bridge with a funny name), and plug the Public vNICs into this
new bridge)
- When you do NOT specify a VLAN id for some network in CloudStack (i.e.
your storage network doesn't use a VLAN ID in CloudStack, your switch ports
are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with the
bondYYY child interface (instead of that "bond-storage" fancy but
unrecognized child interface name) - and then ACS will NOT extract a child
interface (nor do everything I explained in the previous paragraph/bullet
point) - it will just bluntly "stick" all the vNICs into that cloudbr2 -
and hope you have a proper physical/child interface also added to
cloudbr2 that will carry the traffic down the line... (purely FYI - you
could also e.g. use trunking on Linux if you want to, and have e.g. a
"bondXXX.96" VLAN interface manually configured and added to the bridge,
while still NOT defining any VLAN in CloudStack for that Storage
network - and ACS will just stick the vNIC into this bridge). A rough
sketch of the VLAN case follows right below.
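
What the agent effectively ends up with for that VLAN case can be pictured with
plain iproute2 commands; this is only an illustration - the generated bridge
name (brbond0-48 here) is made up, and the real work is done by the agent, not
by hand:

# Public network with VLAN 48, traffic label pointing at cloudbr1 whose child interface is bond0
ip link add link bond0 name bond0.48 type vlan id 48   # VLAN sub-interface on the bond
ip link add name brbond0-48 type bridge                # the "brand new bridge with a funny name"
ip link set bond0.48 master brbond0-48
ip link set bond0.48 up
ip link set brbond0-48 up
# the Public vNICs of the system VMs / VRs are then plugged into brbond0-48, not into cloudbr1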

Public traffic/network - is the network that all systemVMs (SSVM, CPVM and
all VRs) are connected to - this network is "public" as in "external" to the
other CloudStack internal or Guest networks - this is the network to which
the "north" interface is connected - but it does NOT have to be non-RFC 1918;
it can be any private IP range from your company's internal network (one that
will eventually route traffic to the internet, IF you want your ACS to be able
to download stuff/templates from the internet - otherwise it does NOT have to
route to the internet, if you are running a private cloud and do NOT want
external access to your ACS - well, to the SSVM, CPVM and VR external
("public") interfaces/IPs). But if you are running a public cloud - then you
want to provide non-RFC 1918, i.e. really publicly routable, IP
addresses/ranges for the Public network - ACS will assign 1 IP to the SSVM,
1 IP to the CPVM, and many IPs to the many VRs you create.

A thing that I briefly touched somewhere upstairs ^^^ - for each traffic
type you have defined - you need to define a traffic label - my deduction
capabilities make me believe you are using KVM, so you need to set your KVM
traffic label for all your network traffic (traffic label, in your case =
exact name of the bridge as visible in Linux) - I recall there are some new
UI issues when it comes to tags, so go to your :8080/client/legacy
- and check your traffic label there - and set it there, UI in 4.15.0.0
doesn't allow you to update/set it after the zone is created - but old UI
will allow you to do it.

Not sure why I spent 30 minutes of my life, but there you go - hope you got
everything from my email - let me know if anything is unclear!

Cheers,

On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer wrote:


Re: Unable to add template to new deployment

2021-06-16 Thread Joshua Schaeffer
So Suresh's advice has pushed me in the right direction. The VM was up but the 
agent state was down. I was able to connect to the VM in order to continue 
investigating, and the VM is having network issues connecting to both my load 
balancer and my secondary storage server. I don't think I'm understanding how 
the public network portion is supposed to work in my zone and could use some 
clarification. First let me explain my network setup. On my compute nodes, 
ideally, I want to use 3 NICs:

1. A management NIC for management traffic. I was using cloudbr0 for this. 
cloudbr0 is a bridge I created that is connected to an access port on my 
switch. No VLAN tagging is required to use this network (it uses VLAN 20).
2. A cloud NIC for both public and guest traffic. I was using cloudbr1 for 
this. cloudbr1 is a bridge I created that is connected to a trunk port on my 
switch. Public traffic uses VLAN 48 and guest traffic should use VLANs 
400-656. As the port is trunked I have to use VLAN tagging for any traffic 
over this NIC.
3. A storage NIC for storage traffic. I use a bond called "bond-storage" for 
this. bond-storage is connected to an access port on my switch. No VLAN 
tagging is required to use this network (it uses VLAN 96). A rough sketch of 
this layout is shown right after this list.
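
Purely for illustration, that layout could be expressed with iproute2 roughly
as follows; which physical NICs go into which bond, the bond mode, and the
storage bridge name (cloudbr2) are assumptions here, and the real configuration
would live in netplan/ifupdown:

ip link add cloudbr0 type bridge            # management bridge on the access port (VLAN 20 untagged)
ip link add cloudbr1 type bridge            # cloud bridge on the trunk port
ip link set bond-services master cloudbr1   # CloudStack adds the VLAN 48 / 400-656 sub-interfaces itself
ip link add cloudbr2 type bridge            # storage bridge on the access port (VLAN 96 untagged)
ip link set bond-storage master cloudbr2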

For now I've removed the storage NIC from the configuration to simplify my 
troubleshooting, so I should only be working with cloudbr0 and cloudbr1. To me 
the public network is a *non-RFC 1918* address that should be assigned to 
tenant VMs for external internet access. Why do system VMs need/get a public 
IP address? Can't they access all the internal CloudStack servers using the 
pod's management network?

So the first problem I'm seeing is that whenever I tell CloudStack to tag 
VLAN 48 for public traffic, it uses the underlying bond under cloudbr1 and not 
the bridge. I don't know where it is even getting this name, as I never 
provided it to CloudStack.

Here is how I have it configured: 
https://drive.google.com/file/d/10PxLdp6e46_GW7oPFJwB3sQQxnvwUhvH/view?usp=sharing

Here is the message in the management logs:

2021-06-16 16:00:40,454 INFO  [c.c.v.VirtualMachineManagerImpl] 
(Work-Job-Executor-13:ctx-0f39d8e2 job-4/job-68 ctx-a4f832c5) (logid:eb82035c) 
Unable to start VM on Host[-2-Routing] due to Failed to create vnet 48: Error: 
argument "bond-services.48" is wrong: "name" not a valid ifname
Cannot find device "bond-services.48"
Failed to create vlan 48 on pif: bond-services.

This ultimately results in an error and the system VM never even starts.

If I remove the VLAN tag from the configuration 
(https://drive.google.com/file/d/11tF6YIHm9xDZvQkvphi1xvHCX_X9rDz1/view?usp=sharing)
then the VM starts and gets a public IP, but without a tagged NIC it can't 
actually connect to the network. This is from inside the system VM:

root@s-9-VM:~# ip --brief addr
lo   UNKNOWN    127.0.0.1/8
eth0 UP 169.254.91.216/16
eth1 UP 10.2.21.72/22
eth2 UP 192.41.41.162/25
eth3 UP 10.2.99.15/22
root@s-9-VM:~# ping 192.41.41.129
PING 192.41.41.129 (192.41.41.129): 56 data bytes
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
92 bytes from s-9-VM (192.41.41.162): Destination Host Unreachable
^C--- 192.41.41.129 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
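
One way to narrow this down from the host (a suggestion only, assuming the
child interface under cloudbr1 is the "bond-services" bond from the error
above) is to check what is attached to each bridge and whether any 802.1Q
frames tagged with VLAN 48 actually leave the host:

bridge link show                             # which interfaces are attached to which bridge
tcpdump -nei bond-services -c 10 'vlan 48'   # look for VLAN 48 tagged frames on the cloud bond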

Obviously if the network isn't functioning then it can't connect to my storage 
server and the agent never starts. How do I set up my public network so that it 
tags the packets going over cloudbr1? Also, can I not have a public IP address 
for the system VMs, or is this required?

I have some other issues as well, like the fact that it is creating a storage 
NIC on the system VMs even though I deleted my storage network from the zone, 
but I can tackle one problem at a time. If anyone is curious or it helps 
visualize my network, here is a little ASCII diagram of how I have the compute 
node's networking set up. Hopefully it comes across the mailing list correctly 
and not all mangled:

[ASCII diagram mangled in transit; it showed the compute node's six physical
NICs (enp3s0f0, enp3s0f1, enp65s0f0, enp65s0f1, enp71s0, enp72s0), with
enp65s0f0 and enp65s0f1 bonded into "bond-services".]

Re: Unable to add template to new deployment

2021-06-16 Thread Andrija Panic
" There is no secondary storage VM for downloading template to image store
LXC_SEC_STOR1 "

So the next step is to investigate why there is no SSVM (can hosts access the
secondary storage NFS, can they access the Primary Storage, etc - those tests
you can do manually) - and as Suresh advised - once it's up, is it all green
(Connected / Up state).
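
Those manual tests could look roughly like this from a host (server names and
export paths are placeholders):

showmount -e <secondary-storage-nfs-server>                  # can the host see the NFS export at all?
mkdir -p /mnt/nfstest && mount -t nfs <nfs-server>:/<export> /mnt/nfstest
touch /mnt/nfstest/testfile && umount /mnt/nfstest           # can it mount and write?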

Best,


Re: Unable to add template to new deployment

2021-06-16 Thread Suresh Anaparti
Hi Joshua,

What is the agent state of the secondarystoragevm in the UI? If it is not 'Up', 
you can SSH to it and check the status of the agent service there (service 
cloud status). Have you noticed any errors in the SSVM log at 
'/var/log/cloud/cloud.out' if the service is running?
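
As a sketch of that check: from the hypervisor host that runs the SSVM, the
system VM is usually reachable on its link-local address via the host's
system-VM SSH key; the key path and port below are the common KVM-host defaults
and should be treated as assumptions:

ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@<ssvm-link-local-ip>   # from the host running the SSVM
service cloud status                    # inside the SSVM: is the agent service running?
tail -n 50 /var/log/cloud/cloud.out     # inside the SSVM: recent agent log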


Regards,
Suresh

On 16/06/21, 4:11 AM, "Joshua Schaeffer" wrote:


Unable to add template to new deployment

2021-06-15 Thread Joshua Schaeffer
I've set up ACS 4.15 and created the first zone on it using the wizard through 
Primate. The zone is enabled and isn't showing any issues in the UI. I can see 
two system VMs running (secondarystoragevm and consoleproxy) and can SSH into 
the VMs from the compute node. However, when I try to add a template through 
the UI I get the following error:

There is no secondary storage VM for downloading template to image store 
LXC_SEC_STOR1

And on the controller I can see the corresponding log entries when I try to 
submit the new template:

2021-06-15 22:13:35,884 DEBUG [c.c.a.ApiServlet] 
(qtp1644231115-348:ctx-aab81632) (logid:ac328268) ===START===  172.16.44.18 -- 
GET  
name=Ubuntu+20.04=Ubuntu+20.04+(Focal)+64-bit=9f6f5b49-0e12-4af4-a13b-2ace6c47de43=LXC=TAR=aa18f3ad-cd73-11eb-b1da-5254008f72d5=true=getUploadParamsForTemplate=json
2021-06-15 22:13:35,924 DEBUG [c.c.a.ApiServer] (qtp1644231115-348:ctx-aab81632 
ctx-da7281d3) (logid:ac328268) CIDRs from which account 
'Acct[f8d6949d-cd74-11eb-b1da-5254008f72d5-admin]' is allowed to perform API 
calls: 0.0.0.0/0,::/0
2021-06-15 22:13:36,093 DEBUG [o.a.c.s.i.TemplateDataFactoryImpl] 
(qtp1644231115-348:ctx-aab81632 ctx-da7281d3) (logid:ac328268) template 203 is 
not in store:1, type:Image
2021-06-15 22:13:36,138 DEBUG [o.a.c.s.i.TemplateDataFactoryImpl] 
(qtp1644231115-348:ctx-aab81632 ctx-da7281d3) (logid:ac328268) template 203 is 
already in store:1, type:Image
2021-06-15 22:13:36,142 WARN  [c.c.t.HypervisorTemplateAdapter] 
(qtp1644231115-348:ctx-aab81632 ctx-da7281d3) (logid:ac328268) There is no 
secondary storage VM for downloading template to image store LXC_SEC_STOR1
2021-06-15 22:13:36,146 DEBUG [c.c.u.d.T.Transaction] 
(qtp1644231115-348:ctx-aab81632 ctx-da7281d3) (logid:ac328268) Rolling back the 
transaction: Time = 98 Name =  qtp1644231115-348; called by 
-TransactionLegacy.rollback:888-TransactionLegacy.removeUpTo:831-TransactionLegacy.close:655-Transaction.execute:38-Transaction.execute:47-HypervisorTemplateAdapter.createTemplateForPostUpload:298-TemplateManagerImpl.registerPostUploadInternal:361-TemplateManagerImpl.registerTemplateForPostUpload:423-NativeMethodAccessorImpl.invoke0:-2-NativeMethodAccessorImpl.invoke:62-DelegatingMethodAccessorImpl.invoke:43-Method.invoke:566
2021-06-15 22:13:36,229 ERROR [c.c.a.ApiServer] (qtp1644231115-348:ctx-aab81632 
ctx-da7281d3) (logid:ac328268) unhandled exception executing api command: 
[Ljava.lang.String;@2ad13bf5
com.cloud.utils.exception.CloudRuntimeException: There is no secondary storage 
VM for downloading template to image store LXC_SEC_STOR1
    at 
com.cloud.template.HypervisorTemplateAdapter$1.doInTransaction(HypervisorTemplateAdapter.java:363)
    at 
com.cloud.template.HypervisorTemplateAdapter$1.doInTransaction(HypervisorTemplateAdapter.java:298)
    at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:50)
    at com.cloud.utils.db.Transaction.execute(Transaction.java:40)
    at com.cloud.utils.db.Transaction.execute(Transaction.java:47)
    at 
com.cloud.template.HypervisorTemplateAdapter.createTemplateForPostUpload(HypervisorTemplateAdapter.java:298)
    at 
com.cloud.template.TemplateManagerImpl.registerPostUploadInternal(TemplateManagerImpl.java:361)
    at 
com.cloud.template.TemplateManagerImpl.registerTemplateForPostUpload(TemplateManagerImpl.java:423)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
    at 
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at 
org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)
    at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
    at 
com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)
    at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
    at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
    at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
    at com.sun.proxy.$Proxy198.registerTemplateForPostUpload(Unknown Source)
    at