Re: Unable to add template to new deployment

2021-06-17 Thread jschaeffer

Andrija,

Thanks so much for all the details. I'm out of the office for the next 
couple of days so will update my cloud with your suggestions when I get 
back.


As far as the "fancy" naming, I just never found names like bondX useful 
when Linux allows naming the network device something else. It has just 
become a convention of mine. I can easily distinguish which bond carries 
cloud traffic and which carries storage traffic by looking at the bond 
name, but it is just a personal thing and can easily switch back to 
using the standard bond names.


I was aware of the traffic labels but forgot to mention that I had set 
those up in my previous email. There were still some details that you 
provided that helped me further understand how they work though, thanks.


Again, thanks for your help.

On 2021-06-17 22:04, Andrija Panic wrote:
BTW, once you think you have fixed all your network configuration issues -
destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup" -
so that new VMs are created.
Inside the SSVM, run the following script, which should give you results
similar to those below - confirming that your SSVM is healthy



  root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh

First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server

Good: DNS resolves cloudstack.apache.org

nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point

Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250

Good: Java process is running

Tests Complete. Look for ERROR or WARNING above.

On Thu, 17 Jun 2021 at 23:55, Andrija Panic wrote:


Since you really bothered to provide so very detailed inputs and help us
help you (vs what some other people tend to do) - I think you really
deserved a decent answer (and some explanation).

The last question first - even though you don't specify/have dedicated
Storage traffic, there will be an additional interface inside the SSVM
connected to the same Management network (not to the old Storage network -
if you see the old storage network, restart your mgmt server and destroy
the SSVM - a new one should be created, with proper interfaces inside it)

bond naming issues:
- rename your "bond-services" to something industry-standard like "bond0"
or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you
specify a VLAN for a network that ACS should create - so your
"bond-services", while fancy (and unclear to me WHY you named it in that
weird way - smiley here) - is NOT something CloudStack will recognize and
this is the reason it fails (it even says so in that error message)
- no reason to NOT have that dedicated storage network - feel free to
bring it back - the same issue you have as for the public traffic - rename
"bond-storage" to e.g. "bond1" and you will be good to go - since you are
NOT using tagging, ACS will just plug the vNIC of the VM into the cloudbr2 (or
whatever bridge name you use for it).

Now some explanation (even though your deduction capabilities certainly
made you draw some conclusions from what I wrote above ^^^)

- When you specify a VLAN id for some network in CloudStack - CloudStack
will look for the device name that is specified as the "Traffic label" for
that traffic (and you have none??? for your Public traffic - while it
should be set to the name of the bridge device "cloudbr1") - and then it
will provision a VLAN interface and create a new bridge - (i.e. for Public
network with VLAN id 48, it will extract "bond0" from the "cloudbr1", and
create the bond0.48 VLAN interface - AND it will create a brand new bridge with
this bond0.48 interface (bridge with funny name), and plug Public vNICs
into this new bridge
- When you do NOT specify a VLAN id for some network in CloudStack (i.e.
your storage network doesn't use a VLAN ID in CloudStack, your switch ports
are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with the
bondYYY child interface (instead of that "bond-storage" fancy but
unrecognized child interface name) - and then ACS will NOT extract the child
interface (nor do everything I explained in the previous paragraph/bullet
point) - it will just bluntly "stick" all the vNICs into that cloudbr2 -


Cloudstack Usage --- not owner

2021-06-17 Thread Hean Seng
Hi

My MySQL server hung, so I restarted MySQL, the CloudStack management
server, and the usage server.

After that I am facing the following issue:

duration is 120 minutes)

2021-06-18 04:25:49,656 DEBUG [cloud.usage.UsageManagerImpl]
(Usage-HB-1:null) (logid:) Scheduling Usage job...

2021-06-18 04:25:49,657 INFO  [cloud.usage.UsageManagerImpl]
(Usage-Job-1:null) (logid:) starting usage job...

2021-06-18 04:25:49,669 DEBUG [cloud.usage.UsageManagerImpl]
(Usage-Job-1:null) (logid:) Not owner of usage job, skipping...

2021-06-18 04:25:49,669 INFO  [cloud.usage.UsageManagerImpl]
(Usage-Job-1:null) (logid:) usage job complete



It seems the usage PID is not updated in the DB.

Anybody know how to fix this?
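
For anyone hitting the same log line: per the diagnosis above, the usage server decides ownership by comparing its own identity (host/PID) against the owner recorded for the job in the usage database, so a stale record from the crashed process makes the restarted one skip the job. A toy sketch of that check, using an in-memory SQLite stand-in with simplified columns - NOT the real cloud_usage schema, so verify the actual tables before changing anything:

```python
import sqlite3

# Toy model of the "usage job owner" check hinted at by the log above.
# The real check lives in UsageManagerImpl against the cloud_usage database;
# the table layout here is a simplified stand-in, not the exact ACS schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE usage_job (id INTEGER PRIMARY KEY, host TEXT, pid INTEGER)")
# A stale record left behind by the process that died with MySQL:
con.execute("INSERT INTO usage_job (host, pid) VALUES ('mgmt01', 1111)")

def owns_job(host, pid):
    # The restarted usage server has a new PID, so it no longer matches
    # the recorded owner and skips the job ("Not owner of usage job").
    row = con.execute(
        "SELECT host, pid FROM usage_job ORDER BY id DESC LIMIT 1").fetchone()
    return row == (host, pid)

print(owns_job("mgmt01", 2222))  # False -> "skipping"
# Once the stale row is corrected, the new process claims ownership:
con.execute("UPDATE usage_job SET host = 'mgmt01', pid = 2222")
print(owns_job("mgmt01", 2222))  # True
```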

-- 
Regards,
Hean Seng


Re: [VOTE] Apache CloudStack 4.15.1.0 (RC2)

2021-06-17 Thread Andrija Panic
@Rohit Yadav  we might have a UI blocker, I'm
waiting for Mike to report the issue - it seems (per his separate email to
this ML) that the Traffic Labels are not persisted after the Zone
deployment (UI doesn't show traffic labels) - I do recall an issue in
4.15.0 where something similar was happening and one could not update the
Traffic label in the new UI (the old UI had to be used)

@Corey, Mike  please report here with the problem - thx.

On Wed, 16 Jun 2021 at 18:28, Rohit Yadav  wrote:

> Hi All,
>
> I've created a 4.15.1.0 release, with the following artifacts up for a
> vote:
>
> Git Branch:
> https://github.com/apache/cloudstack/tree/4.15.1.0-RC20210616T2128
> Commit SHA:
> 3afd37022b9dac52cd146dccada6012e47a80232
>
> Source release (checksums and signatures are available at the same
> location):
> https://dist.apache.org/repos/dist/dev/cloudstack/4.15.1.0/
>
> PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
> https://dist.apache.org/repos/dist/release/cloudstack/KEYS
>
> The vote will be open for the next week until 22 June 2021.
>
> For sanity in tallying the vote, can PMC members please be sure to indicate
> "(binding)" with their vote?
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> For users convenience, the packages from this release candidate and 4.15.1
> systemvmtemplates are available here:
> https://download.cloudstack.org/testing/4.15.1.0-RC2/
> https://download.cloudstack.org/systemvm/4.15/
>
> Documentation is not published yet, but the following may be referenced for
> upgrade related tests: (there's a new 4.15.1 systemvmtemplate to be
> registered prior to upgrade)
>
> https://github.com/apache/cloudstack-documentation/tree/4.15/source/upgrading/upgrade
>
> Regards.
>


-- 

Andrija Panić


Re: Centos 7.9 - cloud-init password reset?

2021-06-17 Thread Andrija Panic
Thanks Yordan, nice PR!

Best,
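
On the question further down the thread about old keys lingering in authorized_keys - one possible workaround (a hypothetical helper, not part of CloudStack or cloud-init) is to overwrite the file with the new key instead of appending:

```python
import tempfile
from pathlib import Path

# Hypothetical helper: overwrite (rather than append to) authorized_keys,
# so previously distributed keys stop working after a reset.
def replace_authorized_keys(path: Path, new_key: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_key.rstrip() + "\n")  # drops all previous keys
    path.chmod(0o600)  # sshd requires strict permissions

# Example against a temporary directory:
home = Path(tempfile.mkdtemp())
keyfile = home / ".ssh" / "authorized_keys"
keyfile.parent.mkdir()
keyfile.write_text("ssh-rsa OLDKEY user@old\n")
replace_authorized_keys(keyfile, "ssh-rsa NEWKEY user@new")
print(keyfile.read_text())  # only the new key remains
```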

On Sun, 30 May 2021 at 16:03, Yordan Kostov  wrote:

> Dear everyone,
>
> I did a draft of the "Creating a Linux template" guide; you can find it here
> - https://github.com/apache/cloudstack-documentation/pull/215.
> A separate page has been made that can be considered an addition
> to the basic Linux guide. It relates to cloud-init and the features that
> let it serve as middleware for CloudStack instances' GUI functions.
>
> Guides are based on the following scripts:
> - Centos 7 -
> https://github.com/dredknight/cloud_scripts/blob/master/CloudStack-Xen/templates/centos7_clean.bash
> - Ubuntu 20 -
> https://github.com/dredknight/cloud_scripts/blob/master/CloudStack-Xen/templates/ubuntu20_prep_clean.bash
>
> Could you take a look and let me know if anything needs to be
> changed - technical or format wise?
>
> During tests all features seem to work fine with the following
> peculiarity.
> - When SSH keys are reset in CloudStack, the public key is added to
> /home/cloud-user/.ssh/authorized_keys but the old one is not removed.
> This means that users holding the previous private keys will still be
> able to log in. Is there a way for CloudStack to delete the old key?
>
> Best regards,
> Jordan
>
> -Original Message-
> From: Alireza Eskandari 
> Sent: Sunday, May 23, 2021 1:53 AM
> To: users@cloudstack.apache.org
> Subject: Re: Centos 7.9 - cloud-init password reset?
>
>
> [X] This message came from outside your organization
>
>
> It seems cloud-init cannot execute the script, so it shows an error, but the
> script runs fine standalone.
> I'll try it on CentOS Stream.
> Notice that cloud-init can handle the password and SSH key from the user data
> server without an extra script, but it can't reset the SSH key or set the
> password from a configdrive.
> The script resolves these problems.
>
> On Fri, May 21, 2021 at 12:45 AM 조대형  wrote:
>
> > Hi,
> >
> > I have attached the logs that I execute the password script and
> cloud-init.
> >
> > # ./password.bash
> >
> >  Results : executed password reset file.
> >
> > Cloud Password Manager: Searching for ConfigDrive
> > Cloud Password Manager: ConfigDrive not found
> > Cloud Password Manager: Detecting primary network
> > Cloud Password Manager: Trying to find userdata server
> > Cloud Password Manager: Operating System is using NetworkManager
> > Cloud Password Manager: Found userdata server IP VR's IP address in NetworkManager config
> > Cloud Password Manager: Sending request to userdata server at VR's IP address to get public key
> > Cloud Password Manager: Got response from userdata server at VR's IP address
> > Cloud Password Manager: Did not receive any public keys from userdata server
> > Cloud Password Manager: Sending request to userdata server at VR's IP address to get the password
> > Cloud Password Manager: Got response from userdata server at VR's IP address
> > Cloud Password Manager: VM has already saved a password from the userdata server at VR's IP address
> >
> >
> >
> > # cloud-init init
> >
> > Cloud-init v. 20.3-10.el8 running 'init' at Fri, 21 May 2021 04:40:34
> > +. Up 268624.75 seconds.
> > ci-info: +++Net device info+++
> > ci-info: ++--+-+-++---+
> > ci-info: | Device |  Up  |   Address   |   Mask  | Scope  | Hw-Address |
> > ci-info: ++--+-+-++---+
> > ci-info: |  eth0  | True | VR'S IP address1 | 255.255.255.192 | global | 1e:00:8f:00:02:8f |
> > ci-info: |  eth0  | True | fe80::1c00:8fff:fe00:28f/64 | . |  link  | 1e:00:8f:00:02:8f |
> > ci-info: |   lo   | True |  127.0.0.1  | 255.0.0.0 |  host  | . |
> > ci-info: |   lo   | True |   ::1/128   | . |  host  | . |
> > ci-info: ++--+-+-++---+
> > ci-info: +Route IPv4 info++
> > ci-info: +---+-++-+---+---+
> > ci-info: | Route | Destination |  Gateway   | Genmask | Interface | Flags |
> > ci-info: +---+-++-+---+---+
> > ci-info: |   0   |   0.0.0.0   |  x.x.x.1   | 0.0.0.0 |    eth0   |   UG  |
> > ci-info: |   1   |   x.x.x.0   |  0.0.0.0   | 255.255.255.192 |    eth0   |   U   |
> > ci-info: +---+-++-+---+---+
> > ci-info: +++Route IPv6 info+++
> > ci-info: +---+-+-+---+---+
> > ci-info: | Route | 

Re: Boot Order XenServer

2021-06-17 Thread Andrija Panic
If you read the last few blog lines more carefully, you will notice that only
the KEY name should be in the allowed list, not the value itself - so just
"HVM-boot-params:order" - if this doesn't work, then there might be a bug
(due to the colon sign - so please test if you can use/reproduce the same
example as in our blog page)
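
For reference, the value quoted below is just the percent-encoded form of a plain colon-separated key=value pair - only the key, unencoded, belongs in the allow list. A quick sanity check of the encoding (plain Python, only illustrating the percent-encoding, not a CloudStack API call):

```python
from urllib.parse import quote, unquote

# The string Felipe used is the percent-encoded form of a key=value pair;
# %3A is the colon that may be tripping the allow-list match.
encoded = 'HVM-boot-params%3Aorder%3D%22dcn%22'
decoded = unquote(encoded)
print(decoded)  # HVM-boot-params:order="dcn"

# Round-trip: encoding the decoded form reproduces the original string.
assert quote(decoded, safe='') == encoded
```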

Best,

On Tue, 15 Jun 2021 at 01:17, Felipe  wrote:

> I put in allow.additional.vm.configuration.list.xenserver:
>
> HVM-boot-params%3Aorder%3D%22dcn%22
>
> it didn't work, do you have an example to change the order of bios on HVM
> on xenserver?
>
> Thank you!!
>
> On 2021/06/14 22:32:24, Andrija Panic  wrote:
> >
> https://www.shapeblue.com/cloudstack-feature-first-look-enable-sending-of-arbitrary-configuration-data-to-vms/
> >
> > Best,
> >
> > On Mon, 14 Jun 2021 at 21:57, Felipe  wrote:
> >
> > > Hello everyone!!!
> > >
> > > I wonder if it is possible to change the boot order on xenserver?
> > >
> > > in global settings, is it at
> > > allow.additional.vm.configuration.list.xenserver?
> > >
> > > i would like to put DVD first in boot order.
> > >
> > > thank you all!!
> > >
> > > [image: image.png]
> > >
> > >
> >
> > --
> >
> > Andrija Panić
> >
>


-- 

Andrija Panić


Re: Alter Shared Guest Network?

2021-06-17 Thread Andrija Panic
There is something wrong there; to my knowledge you should not have
issues with IDs (but I don't recall ever having checked this)

Before "cloning" the row from user_ip_address table - please make sure you
are cloning an empty record, not the one which is "used" and alter clean up
things - makes you life easier.
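
The "clone an empty record" step can be illustrated in miniature with an in-memory SQLite stand-in (simplified columns, NOT the real user_ip_address schema - check your actual database before editing anything):

```python
import sqlite3

# Miniature model of the manual range expansion discussed in this thread.
# Column names are simplified stand-ins for cloud.user_ip_address;
# verify the real schema before touching a production database.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE user_ip_address (
    id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, public_ip_address TEXT,
    allocated TEXT, state TEXT, mac_address INTEGER)""")
con.execute("INSERT INTO user_ip_address VALUES "
            "(1, 'uuid-1', '10.10.10.10', '2021-06-15', 'Allocated', 1)")
con.execute("INSERT INTO user_ip_address VALUES "
            "(2, 'uuid-2', '10.10.10.11', '2021-06-15', 'Allocated', 2)")

# Clone a row, changing exactly the columns described in this thread:
# new id, new uuid, new IP, allocated = NULL, state = 'Free', next free MAC.
next_id, next_mac = con.execute(
    "SELECT MAX(id) + 1, MAX(mac_address) + 1 FROM user_ip_address").fetchone()
con.execute("INSERT INTO user_ip_address VALUES (?, ?, ?, NULL, 'Free', ?)",
            (next_id, 'uuid-3', '10.10.10.12', next_mac))

free = con.execute(
    "SELECT public_ip_address FROM user_ip_address WHERE state = 'Free'").fetchall()
print(free)  # [('10.10.10.12',)]
```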

Sequence "problem" :

I have no idea where this mac_address is used later - but the logical place
would be the cloud.nics table - all NICs that exist (for all of your VMs,
including system VMs) are located in that table - check the network ID in
the "networks" table (shared network ID), then do select * from nics where
network_id=<ID> to show all NICs from that network - in your case
there should be 3 NICs (of the VR, VM1, VM2) - check if the MAC addresses of
VM1 and VM2 are different - if NOT, then you have a problem; otherwise, I
don't think you do - check inside your VM1 and VM2 whether they
got their respective MAC and IP addresses - they should be different from
VM1 to VM2.

(I'm pretty sure that the MAC sequence is not used anywhere, or not anymore -
as the actual sequence numbers (for different resources) are kept in the
"sequence" table - and in my env the MAC sequence for both private and public
MACs is set to "1" - which is nonsense - so it is probably not used any more.)

Best,

On Tue, 15 Jun 2021 at 13:22, Yordan Kostov  wrote:

> FYI tested this on 4.15 with specifics:
>  - Shared network with a 2-IP range, for example 10.10.10.10 - 10.10.10.11
> - created as many VMs as ACS allows me, which is 1 (the first IP gets
> assigned to the VR)
> - expanded the range of the shared network in table "VLAN" from
> 10.10.10.10-10.10.10.11 to 10.10.10.10-10.10.10.12
> - Duplicated the existing entry in table "user_ip_address" for the IP in that
> specific shared network. Changed the following columns with new entries:
> --- ID to the next unreserved
> --- UUID to a unique one for the table
> --- public_ip_address to 10.10.10.12
> --- allocated - make it NULL
> --- state - make it Free
> --- mac_address - look at the whole table and set it to the next one that
> is not used
>
> Back in the ACS GUI I can create a new VM in that network and an IP is
> assigned. But there are some hidden pitfalls created this way.
> As the IDs are created manually, the ACS DB is not updating its sequence, so
> I was wondering, if a new network is created, would it take the same MAC ID.
> After creating a new network and looking again in the table - the answer
> to this question is yes - https://imgur.com/YnGMGRE.
>
> So besides the 2 tables another one should be edited, but so far I cannot
> find where the sequence is kept.
>
> Best regards,
> Jordan
>
> -Original Message-
> From: Andrija Panic 
> Sent: Monday, June 14, 2021 10:24 PM
> To: users 
> Subject: Re: Alter Shared Guest Network?
>
>
> [X] This message came from outside your organization
>
>
> Another one is, if I'm not mistaken, the VLAN table, which will contain the
> range as x.x.x.1-x.x.x.10 - etc. - this needs to be updated as well (if
> you manually add records in the user_ip_address table)
>
> best,
>
> On Thu, 10 Jun 2021 at 18:23, Jeremy Hansen  wrote:
>
> > Thanks. I’ll take a look at the table.
> >
> > -jeremy
> >
> > > On Jun 10, 2021, at 6:57 AM, Yordan Kostov 
> wrote:
> > >
> > > Hello Jeremy,
> > >
> > >Once a shared network with a DHCP offering is created, the IPs
> > > fitting
> > into the defined range are created in a table called "user_ip_address".
> > >They are created one by one, so if a range between x.x.x.11 and
> > x.x.x.210 is created this will add 200 entries. So if you want to
> > expand that you need to add more entries manually, which is a bit
> unfortunate.
> > >
> > > Best regards,
> > > Jordan
> > >
> > > -Original Message-
> > > From: Jeremy Hansen 
> > > Sent: Thursday, June 10, 2021 12:12 AM
> > > To: users@cloudstack.apache.org
> > > Subject: Re: Alter Shared Guest Network?
> > >
> > >
> > > [X] This message came from outside your organization
> > >
> > >
> > >> On Jun 9, 2021, at 1:39 PM, Wido den Hollander 
> wrote:
> > >>
> > >> 
> > >>
> >  On 6/9/21 3:55 PM, Jeremy Hansen wrote:
> > >>> When I created my shared network config, I specified too narrow of
> > >>> an
> > IP range.
> > >>>
> > >>> I can’t seem to figure out how to alter this config via the web
> > interface. Is this possible?
> > >>>
> > >>
> > >> Not via the UI nor the API. You will need to hack this in the database.
> > >> Or remove the network and create it again. But this is only
> > >> possible if there are no VMs in the network.
> > >>
> > >> Wido
> > >
> > > Thanks, recreating it seems like the easiest option since I’m only
> > > in
> > testing phase right now, but I’m curious what it would take to alter
> > tables to fix this. Any clues as to what tables/fields would need to be
> updated?
> > >
> > >>
> > >>> -jeremy
> > >>>
> > >
> >
> >
>
> --
>
> Andrija Panić
>


-- 

Andrija Panić


Re: Unable to add template to new deployment

2021-06-17 Thread Andrija Panic
BTW, once you think you have fixed all your network configuration issues -
destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup" -
so that new VMs are created.
Inside the SSVM, run the following script, which should give you results
similar to those below - confirming that your SSVM is healthy



  root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh

First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server

Good: DNS resolves cloudstack.apache.org

nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point

Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250

Good: Java process is running

Tests Complete. Look for ERROR or WARNING above.

On Thu, 17 Jun 2021 at 23:55, Andrija Panic  wrote:

> Since you really bothered to provide so very detailed inputs and help us
> help you (vs what some other people tend to do) -  I think you really
> deserved a decent answer (and some explanation).
>
> The last question first - even though you don't specify/have dedicated
> Storage traffic, there will be an additional interface inside the SSVM
> connected to the same Management network (not to the old Storage network -
> if you see the old storage network, restart your mgmt server and destroy
> the SSVM - a new one should be created, with proper interfaces inside it)
>
> bond naming issues:
> - rename  your "bond-services" to something industry-standard like "bond0"
> or similar - cloudstack extracts "child" interfaces from cloudbr1 IF you
> specify a VLAN for a network that ACS should create - so your
> "bond-services", while fancy (and unclear to me WHY you named it in that
> weird way - smiley here) - is NOT something CloudStack will recognize and
> this is the reason it fails (it even says so in that error message)
> - no reason to NOT have that dedicated storage network -  feel free to
> bring it back - the same issue you have as for the public traffic - rename
> "bond-storage" to e.g. "bond1" and you will be good to go -  since you are
> NOT using tagging, ACS will just plug vNIC of the VM into the cloudbr2 (or
> whatever bridge name you use for it).
>
> Now some explanation (even though your deduction capabilities certainly
> made you draw some conclusions from what I wrote above ^^^)
>
> - When you specify a VLAN id for some network in CloudStack - CloudStack
> will look for the device name that is specified as the "Traffic label" for
> that traffic (and you have none??? for your Public traffic - while it
> should be set to the name of the bridge device "cloudbr1") - and then it
> will provision a VLAN interface and create a new bridge - (i.e. for Public
> network with VLAN id 48, it will extract "bond0" from the "cloudbr1", and
> create bond0.48 VLAN interface - AND it will create a brand new bridge with
> this bond0.48 interface (bridge with funny name), and plug Public vNICs
> into this new bridge
> - When you do NOT specify a VLAN id for some network in CloudStack (i.e.
> your storage network doesn't use VLAN ID in CloudStack, your switch ports
> are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with the
> bondYYY child interface (instead of that "bond-storage" fancy but
> unrecognized child interface name) - and then ACS will NOT extract child
> interface (nor do everything I explained in the previous paragraph/bullet
> point) - it will just bluntly "stick" all the vNICs into that cloudbr2 -
> and hope you have a proper physical/child interface also added to the
> cloudbr2 that will carry the traffic down the line... (purely FYI -  you
> could also e.g. use trunking on Linux if you want to, and have e.g.
> "bondXXX.96" VLAN interface manually configured and add it to the bridge,
> while still NOT defining any VLAN in the CloudStack for that Storage
> network - and ACS will just stick vNIC to this bridge)
>
> Public traffic/network - is the network that all systemVMs (SSVM, CPVM and
> all VRs) are connected to - this network is "public" like "external" to
> other CloudStack internal or Guest network - this is the network to which
> the "north" interface is connected - but does NOT have to be " non-RFC 1918
> " - it can be any private IP range from your company internal network (that
> will 

Re: Unable to add template to new deployment

2021-06-17 Thread Andrija Panic
Since you really bothered to provide so very detailed inputs and help us
help you (vs what some other people tend to do) -  I think you really
deserved a decent answer (and some explanation).

The last question first - even though you don't specify/have dedicated
Storage traffic, there will be an additional interface inside the SSVM
connected to the same Management network (not to the old Storage network -
if you see the old storage network, restart your mgmt server and destroy
the SSVM - a new one should be created, with proper interfaces inside it)

bond naming issues:
- rename  your "bond-services" to something industry-standard like "bond0"
or similar - cloudstack extracts "child" interfaces from cloudbr1 IF you
specify a VLAN for a network that ACS should create - so your
"bond-services", while fancy (and unclear to me WHY you named it in that
weird way - smiley here) - is NOT something CloudStack will recognize and
this is the reason it fails (it even says so in that error message)
- no reason to NOT have that dedicated storage network -  feel free to
bring it back - the same issue you have as for the public traffic - rename
"bond-storage" to e.g. "bond1" and you will be good to go -  since you are
NOT using tagging, ACS will just plug vNIC of the VM into the cloudbr2 (or
whatever bridge name you use for it).

Now some explanation (even though your deduction capabilities certainly
made you draw some conclusions from what I wrote above ^^^)

- When you specify a VLAN id for some network in CloudStack - CloudStack
will look for the device name that is specified as the "Traffic label" for
that traffic (and you have none??? for your Public traffic - while it
should be set to the name of the bridge device "cloudbr1") - and then it
will provision a VLAN interface and create a new bridge - (i.e. for Public
network with VLAN id 48, it will extract "bond0" from the "cloudbr1", and
create bond0.48 VLAN interface - AND it will create a brand new bridge with
this bond0.48 interface (bridge with funny name), and plug Public vNICs
into this new bridge
- When you do NOT specify a VLAN id for some network in CloudStack (i.e.
your storage network doesn't use VLAN ID in CloudStack, your switch ports
are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with the
bondYYY child interface (instead of that "bond-storage" fancy but
unrecognized child interface name) - and then ACS will NOT extract child
interface (nor do everything I explained in the previous paragraph/bullet
point) - it will just bluntly "stick" all the vNICs into that cloudbr2 -
and hope you have a proper physical/child interface also added to the
cloudbr2 that will carry the traffic down the line... (purely FYI -  you
could also e.g. use trunking on Linux if you want to, and have e.g.
"bondXXX.96" VLAN interface manually configured and add it to the bridge,
while still NOT defining any VLAN in the CloudStack for that Storage
network - and ACS will just stick vNIC to this bridge)

Public traffic/network - is the network that all systemVMs (SSVM, CPVM and
all VRs) are connected to - this network is "public" like "external" to
other CloudStack internal or Guest network - this is the network to which
the "north" interface is connected - but does NOT have to be " non-RFC 1918
" - it can be any private IP range from your company internal network (that
will eventually route traffic to internet - IF you want your ACS to be able
to download stuff/templates from Internet - otherwise it does NOT have to
route to internet - if you are using private cloud and do NOT want external
access to your ACS, well to SSVM and CPVM and VRs external ("public")
interfaces/IPs - but if you are running a public cloud - then you want to
provide a non-RFC 1918  i.e. a really Publicly routable IP addresses/range
for the Public network - ACS will assign 1 IP for the SSVM, 1 IP for the CPVM, and
many IPs to your many VRs you create.

A thing that I briefly touched somewhere upstairs ^^^ - for each traffic
type you have defined - you need to define a traffic label - my deduction
capabilities make me believe you are using KVM, so you need to set your KVM
traffic label for all your network traffic (traffic label, in your case =
exact name of the bridge as visible in Linux) - I recall there are some new
UI issues when it comes to tags, so go to your :8080/client/legacy
- and check your traffic label there - and set it there, UI in 4.15.0.0
doesn't allow you to update/set it after the zone is created - but old UI
will allow you to do it.

Not sure why I spent 30 minutes of my life, but there you go - hope you got
everything from my email - let me know if anything is unclear!

Cheers,

On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer 
wrote:

> So Suresh's advice has pushed me in the right direction. The VM was up but
> the agent state was down. I was able to connect to the VM in order to
> continue investigating and the VM is having network issues connecting to
> both my load balancer 

Re: Issues Found Apache CloudStack 4.15.1.0 (RC2)

2021-06-17 Thread Andrija Panic
@Corey, Mike 

can you please raise a GH issue with the same description, and also vote -1
on the RC2 release, with the link to that GH issue?

Thanks,
Andrija

On Thu, 17 Jun 2021 at 18:09, Corey, Mike 
wrote:

> Hi,
>
> Thanks for pushing this out.  I'm looking forward to trying the
> template/instance deployment in my VMware PILOT.
>
> A couple items I noticed off the "new" build are:
>
> 1 - During zone creation with VMware and setting up the physical networks
> - adding the traffic label to use a VDS does NOT keep/take/apply.  Once the
> zone is created and you go into the physical networks, the VDS traffic
> label is blank when it should be in this format
> "vSwtichName,VLAN,typeofswitch".  The only physical network traffic label
> that saved during zone setup wizard was for the Management stack; my
> storage and guest physical network traffic labels did not save from the
> wizard.
>
> 2 - Initial SystemVM deployment: the secondary storage permissions do not
> allow the copy of the systemvm.iso to the secondary/systemvm/ folder.  I
> had to first create a /mnt/secondary/systemvm/ folder and chmod -R for this
> copy to function.
>
> More to come...
>
> Mike
>
> -Original Message-
> From: Rohit Yadav 
> Sent: Wednesday, June 16, 2021 12:28 PM
> To: d...@cloudstack.apache.org; users@cloudstack.apache.org
> Subject: [VOTE] Apache CloudStack 4.15.1.0 (RC2)
>
> Hi All,
>
> I've created a 4.15.1.0 release, with the following artifacts up for a
> vote:
>
> Git Branch:
> https://github.com/apache/cloudstack/tree/4.15.1.0-RC20210616T2128
> Commit SHA:
> 3afd37022b9dac52cd146dccada6012e47a80232
>
> Source release (checksums and signatures are available at the same
> location):
> https://dist.apache.org/repos/dist/dev/cloudstack/4.15.1.0/
>
> PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
> https://dist.apache.org/repos/dist/release/cloudstack/KEYS
>
> The vote will be open for the next week until 22 June 2021.
>
> For sanity in tallying the vote, can PMC members please be sure to indicate
> "(binding)" with their vote?
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> For users convenience, the packages from this release candidate and 4.15.1
> systemvmtemplates are available here:
> https://download.cloudstack.org/testing/4.15.1.0-RC2/
> https://download.cloudstack.org/systemvm/4.15/
>
> Documentation is not published yet, but the following may be referenced for
> upgrade related tests: (there's a new 4.15.1 systemvmtemplate to be
> registered prior to upgrade)
>
> https://github.com/apache/cloudstack-documentation/tree/4.15/source/upgrading/upgrade
>
> Regards.
>


-- 

Andrija Panić


Issues Found Apache CloudStack 4.15.1.0 (RC2)

2021-06-17 Thread Corey, Mike
Hi,

Thanks for pushing this out.  I'm looking forward to trying the 
template/instance deployment in my VMware PILOT.

A couple items I noticed off the "new" build are:

1 - During zone creation with VMware and setting up the physical networks - 
adding the traffic label to use a VDS does NOT keep/take/apply.  Once the zone 
is created and you go into the physical networks, the VDS traffic label is 
blank when it should be in this format "vSwitchName,VLAN,typeofswitch".  The 
only physical network traffic label that saved during zone setup wizard was for 
the Management stack; my storage and guest physical network traffic labels did 
not save from the wizard.

2 - Initial SystemVM deployment: the secondary storage permissions do not allow 
the copy of the systemvm.iso to the secondary/systemvm/ folder.  I had to first 
create a /mnt/secondary/systemvm/ folder and chmod -R for this copy to function.

More to come...

Mike

-Original Message-
From: Rohit Yadav  
Sent: Wednesday, June 16, 2021 12:28 PM
To: d...@cloudstack.apache.org; users@cloudstack.apache.org
Subject: [VOTE] Apache CloudStack 4.15.1.0 (RC2)

Hi All,

I've created a 4.15.1.0 release, with the following artifacts up for a vote:

Git Branch:
https://github.com/apache/cloudstack/tree/4.15.1.0-RC20210616T2128
Commit SHA:
3afd37022b9dac52cd146dccada6012e47a80232

Source release (checksums and signatures are available at the same
location):
https://dist.apache.org/repos/dist/dev/cloudstack/4.15.1.0/

PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
https://dist.apache.org/repos/dist/release/cloudstack/KEYS

The vote will be open for the next week until 22 June 2021.

For sanity in tallying the vote, can PMC members please be sure to indicate
"(binding)" with their vote?

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

For users' convenience, the packages from this release candidate and 4.15.1
systemvmtemplates are available here:
https://download.cloudstack.org/testing/4.15.1.0-RC2/
https://download.cloudstack.org/systemvm/4.15/

Documentation is not published yet, but the following may be referenced for
upgrade related tests: (there's a new 4.15.1 systemvmtemplate to be
registered prior to upgrade)
https://github.com/apache/cloudstack-documentation/tree/4.15/source/upgrading/upgrade

Regards.


Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Daniel Augusto Veronezi Salvador

Hello Andrei,

As you noticed, ACS has a hardcoded threshold for secondary 
storages. In cases where the secondary storage has a large capacity, 10% 
can mean a lot of storage. There is an open PR 
(https://github.com/apache/cloudstack/pull/4790) that externalizes this 
threshold, allowing operators to decide how much headroom they need. Also, 
the logs of secondary storage management were improved with PR 
https://github.com/apache/cloudstack/pull/4955.
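The arithmetic behind that threshold can be illustrated with made-up figures (this is only an illustration of the 90%-used cutoff discussed in this thread, not the actual ACS code):

```shell
# Illustrative only: a secondary storage more than ~90% used is rejected
# for new snapshot backups, regardless of how many GB remain free.
used_gb=9200; capacity_gb=10000          # example figures, not real data
pct=$(( 100 * used_gb / capacity_gb ))   # integer percentage used
if [ "$pct" -gt 90 ]; then
  echo "skipped: image store ${pct}% full"   # prints: skipped: image store 92% full
else
  echo "usable: image store ${pct}% full"
fi
```

With a 10 TB store, the fixed 10% headroom means 800 GB of free space is still "not enough", which is exactly the situation PR 4790 aims to make configurable.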


With respect to KVM, volume snapshots are taken in a quite 
peculiar way. Instead of taking volume snapshots directly, ACS takes a 
full snapshot of the VM, which may freeze the VM due to the memory 
snapshot, and then extracts the disk from the VM snapshot. Because of 
this, issue https://github.com/apache/cloudstack/issues/5124 was opened 
to discuss a new snapshot workflow for KVM.


I am already implementing a solution for issue 5124 and working to 
improve this whole snapshot process for KVM; however, it is a complex 
and long-standing job. As soon as we have something, I would appreciate 
receiving some feedback from you.


Regards,
Guto


On 2021/06/16 16:15:51, Andrei Mikhailovsky wrote:
> Hello,
>
> I've done some more investigation and indeed, the snapshots were not taken
> because the secondary storage was over 90% used. I have started cleaning
> some of the older volumes and noticed another problem. After removing
> snapshots, they do not seem to be removed from the secondary storage. I've
> removed all snapshots over 24 hours ago and it looks like the disk space
> hasn't been freed up at all.
>
> Looks like there are issues with snapshotting function after all.
>
> Andrei
>
>
> - Original Message -
> > From: "Harikrishna Patnala" 
> > To: "users" 
> > Sent: Tuesday, 8 June, 2021 03:33:57
> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>
> > Hi Andrei,
> >
> > Can you check the following things and let us know?
> >
> > 1. Can you try creating a new volume and then create snapshot of that,
> > to check if this an issue with old entries
> > 2. For the snapshots which are failing can you check if you are seeing
> > any error messages like this "Can't find an image storage in zone with
> > less than". This is to check if secondary storage free space check failed.
> > 3. For the snapshots which are failing and if it is delta snapshot can
> > you check if its parent's snapshot entry exists in "snapshot_store_ref"
> > table with 'parent_snapshot_id' of the current snapshot with 'store_role'
> > "Image". This is to find the secondary storage where the parent snapshot
> > backup is located.
> >
> > Regards,
> > Harikrishna
> >
> > From: Andrei Mikhailovsky 
> > Sent: Monday, June 7, 2021 7:00 PM
> > To: users 
> > Subject: Snapshots are not working after upgrading to 4.15.0
> >
> > Hello everyone,
> >
> > I am having an issue with volume snapshots since I've upgraded to 4.15.0.
> > None of the volumes are being snapshotted regardless if the snapshot is
> > initiated manually or from the schedule. The strange thing is that if I
> > manually take the snapshot, the GUI shows Success status, but the
> > Storage>Snapshots show an Error status. Here is what I see in the
> > management server logs:
> >
> > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01)
> > Done executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01)
> > Remove job-86143 from job monitoring
> > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
> > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
> > com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
> > at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> > at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> > at com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> > at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> > at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> > at

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Rohit Yadav
Hi Andrei,

Can you test 4.15.1.0 RC2, which is up for voting/testing, and if you're able to 
reproduce the issue, please file a bug report if it's not the same as 
https://github.com/apache/cloudstack/issues/4797

#4747 is Ceph specific, which unfortunately I don't have an environment to test 
against, but PRs are welcome from any Ceph user/developer. Thanks.


Regards.


From: Slavka Peleva 
Sent: Thursday, June 17, 2021 20:50
To: users@cloudstack.apache.org 
Subject: Re: Snapshots are not working after upgrading to 4.15.0

Hi all,

I've compared the deletion of snapshots between 4.13 and 4.15.1. The main
difference is that when picking the snapshot strategy in 4.13, the deletion
is handled by XenserverSnapshotStrategy (renamed DefaultSnapshotStrategy in
the newer versions), while in 4.15.1 it is handled by
StorageSystemSnapshotStrategy. The first one deletes the snapshot chain in
secondary storage; the second deletes the snapshot only on the primary
(Ceph) storage. Gabriel, if you are aware of the problem, can you correct
me if I'm wrong?

Best regards,
Slavka

On Thu, Jun 17, 2021 at 4:23 PM Gabriel Bräscher 
wrote:

> Hi Andrei,
>
> I appreciate all the efforts and the help in narrowing down this issue. It
> looks similar and probably it is related to bug #4797 indeed.
> This bug is for some time to be fixed and I perfectly understand why you
> are not happy.
>
> I am speaking for myself here and I am not the Release Manager (RM) of
> 4.15.1.0 but In my point of view, this does not necessarily impact on
> blocking 4.15.1.0.
>
> Fixing it has been proving a bit trickier and also requires manual tests
> with different environment configurations and some time to debug and
> develop.
> I myself had no time to fix it for 4.15.1.0 thus decided to not hold
> 4.15.1.0 as it would mean that many users would not have several bug fixes
> due to this one.
>
> To give some context. I work for a hosting company that has been
> contributing to bug fixes and new features for a long time.
> We even fixed bugs that do not impact us directly (e.g. issues that affect
> storage systems we do not use, or a hypervisor we do not use, etc).
> This means that I, as a contributor, sometimes have less time for some
> tasks than other ones.
>
> With that said, I will be re-checking this issue soon(ish) but I cannot
> guarantee that I will be able to bring a fix in time for 4.15.1.0.
> If any contributor has time to fix it I would be happy to help with review
> and testing.
>
> Best regards,
> Gabriel.
>
> Em qui., 17 de jun. de 2021 às 07:31, Andrei Mikhailovsky
>  escreveu:
>
> > Hi Suresh,
> >
> > This is what I've answered on the db tables:
> >
> > The table snapshots has NULL under the removed column in all snapshots
> > that I've removed. The table snapshot_store_ref has no such column, but
> > the state shown as Destroyed.
> >
> >
> > I've done some more checking under the ssvm itself, which look ok:
> >
> >
> > root@s-2536-VM:/usr/local/cloud/systemvm#
> > /usr/local/cloud/systemvm/ssvm-check.sh
> > 
> > First DNS server is  192.168.169.254
> > PING 192.168.169.254 (192.168.169.254): 56 data bytes
> > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
> > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
> > --- 192.168.169.254 ping statistics ---
> > 2 packets transmitted, 2 packets received, 0% packet loss
> > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
> > Good: Can ping DNS server
> > 
> > Good: DNS resolves cloudstack.apache.org
> > 
> > nfs is currently mounted
> > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
> > Good: Can write to mount point
> > 
> > Management server is 192.168.169.13. Checking connectivity.
> > Good: Can connect to management server 192.168.169.13 port 8250
> > 
> > Good: Java process is running
> > 
> > Tests Complete. Look for ERROR or WARNING above.
> >
> >
> > The management server does show errors like these, without any further
> > details:
> >
> > 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> > snapshot: 55183 from storage
> > 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> > new state from Destroyed via DestroyRequested
> > 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> > snapshot: 84059 from storage
> > 2021-06-17 10:31:06,363 

Re: instance backup designs?

2021-06-17 Thread Rohit Yadav
Hi Yordan,

We do have a backup & recovery framework which can be extended to implement 
support for new solutions; the current provider/plugin is available only for 
VMware/Veeam, and it can be used as a reference to implement support for backup 
solutions on other hypervisors.

While there is no such choice right now, for XenServer/XCP-NG you can use volume 
snapshots as a way to keep backup volumes on secondary storage.
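As a minimal sketch of that volume-snapshot approach using CloudMonkey (cmk): the volume UUID below is a placeholder, and the command is only printed rather than executed so you can review it first.

```shell
# Hedged sketch: take a volume snapshot via CloudMonkey as a simple backup
# for a XenServer/XCP-NG volume. VOLUME_ID is a placeholder; look it up
# with 'cmk list volumes'. Drop the echo to actually run the command.
VOLUME_ID="replace-with-the-volume-uuid"
CMD="create snapshot volumeid=$VOLUME_ID"
echo "would run: cmk $CMD"
```

Recurring backups can then be approximated by putting such a call on a schedule, or by using snapshot policies from the UI/API instead.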


Regards.


From: Yordan Kostov 
Sent: Wednesday, June 16, 2021 18:46
To: users@cloudstack.apache.org 
Subject: instance backup designs?

Hey everyone,

        I was wondering what choices one has for backup when the 
underlying hypervisor is XenServer/XCP-NG?
        Any high-level ideas, or just sharing any doc that may exist, 
will be great!

Best regards,
Jordan

 



Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Slavka Peleva
Hi all,

I've compared the deletion of snapshots between 4.13 and 4.15.1. The main
difference is that when picking the snapshot strategy in 4.13, the deletion
is handled by XenserverSnapshotStrategy (renamed DefaultSnapshotStrategy in
the newer versions), while in 4.15.1 it is handled by
StorageSystemSnapshotStrategy. The first one deletes the snapshot chain in
secondary storage; the second deletes the snapshot only on the primary
(Ceph) storage. Gabriel, if you are aware of the problem, can you correct
me if I'm wrong?

Best regards,
Slavka

On Thu, Jun 17, 2021 at 4:23 PM Gabriel Bräscher 
wrote:

> Hi Andrei,
>
> I appreciate all the efforts and the help in narrowing down this issue. It
> looks similar and probably it is related to bug #4797 indeed.
> This bug is for some time to be fixed and I perfectly understand why you
> are not happy.
>
> I am speaking for myself here and I am not the Release Manager (RM) of
> 4.15.1.0 but In my point of view, this does not necessarily impact on
> blocking 4.15.1.0.
>
> Fixing it has been proving a bit trickier and also requires manual tests
> with different environment configurations and some time to debug and
> develop.
> I myself had no time to fix it for 4.15.1.0 thus decided to not hold
> 4.15.1.0 as it would mean that many users would not have several bug fixes
> due to this one.
>
> To give some context. I work for a hosting company that has been
> contributing to bug fixes and new features for a long time.
> We even fixed bugs that do not impact us directly (e.g. issues that affect
> storage systems we do not use, or a hypervisor we do not use, etc).
> This means that I, as a contributor, sometimes have less time for some
> tasks than other ones.
>
> With that said, I will be re-checking this issue soon(ish) but I cannot
> guarantee that I will be able to bring a fix in time for 4.15.1.0.
> If any contributor has time to fix it I would be happy to help with review
> and testing.
>
> Best regards,
> Gabriel.
>
> Em qui., 17 de jun. de 2021 às 07:31, Andrei Mikhailovsky
>  escreveu:
>
> > Hi Suresh,
> >
> > This is what I've answered on the db tables:
> >
> > The table snapshots has NULL under the removed column in all snapshots
> > that I've removed. The table snapshot_store_ref has no such column, but
> > the state shown as Destroyed.
> >
> >
> > I've done some more checking under the ssvm itself, which look ok:
> >
> >
> > root@s-2536-VM:/usr/local/cloud/systemvm#
> > /usr/local/cloud/systemvm/ssvm-check.sh
> > 
> > First DNS server is  192.168.169.254
> > PING 192.168.169.254 (192.168.169.254): 56 data bytes
> > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
> > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
> > --- 192.168.169.254 ping statistics ---
> > 2 packets transmitted, 2 packets received, 0% packet loss
> > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
> > Good: Can ping DNS server
> > 
> > Good: DNS resolves cloudstack.apache.org
> > 
> > nfs is currently mounted
> > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
> > Good: Can write to mount point
> > 
> > Management server is 192.168.169.13. Checking connectivity.
> > Good: Can connect to management server 192.168.169.13 port 8250
> > 
> > Good: Java process is running
> > 
> > Tests Complete. Look for ERROR or WARNING above.
> >
> >
> > The management server does show errors like these, without any further
> > details:
> >
> > 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> > snapshot: 55183 from storage
> > 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> > new state from Destroyed via DestroyRequested
> > 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> > snapshot: 84059 from storage
> > 2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject]
> > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> > new state from Destroyed via DestroyRequested
> >
> >
> > Regarding the bug 4797. I can't really comment as it has very little
> > technical details without the management log errors, etc. But
> essentially,
> > at the high level, the snapshots are not deleted from the backend in my
> > case, just like in the bug 4797.
> >
> >
> > TBH, I am very much 

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Gabriel Bräscher
Hi Andrei,

I appreciate all the efforts and the help in narrowing down this issue. It
looks similar and is probably related to bug #4797 indeed.
This bug has been waiting to be fixed for some time, and I perfectly
understand why you are not happy.

I am speaking for myself here, and I am not the Release Manager (RM) of
4.15.1.0, but in my view this does not necessarily block 4.15.1.0.

Fixing it has proven a bit trickier than expected and also requires manual
tests with different environment configurations, plus some time to debug
and develop.
I myself had no time to fix it for 4.15.1.0, and thus decided not to hold
4.15.1.0, as that would mean many users would not get several bug fixes
because of this one.

To give some context. I work for a hosting company that has been
contributing to bug fixes and new features for a long time.
We even fixed bugs that do not impact us directly (e.g. issues that affect
storage systems we do not use, or a hypervisor we do not use, etc).
This means that I, as a contributor, sometimes have less time for some
tasks than other ones.

With that said, I will be re-checking this issue soon(ish) but I cannot
guarantee that I will be able to bring a fix in time for 4.15.1.0.
If any contributor has time to fix it I would be happy to help with review
and testing.

Best regards,
Gabriel.

Em qui., 17 de jun. de 2021 às 07:31, Andrei Mikhailovsky
 escreveu:

> Hi Suresh,
>
> This is what I've answered on the db tables:
>
> The table snapshots has NULL under the removed column in all snapshots
> that I've
> removed. The table snapshot_store_ref has no such column, but the
> state shown
> as Destroyed.
>
>
> I've done some more checking under the ssvm itself, which look ok:
>
>
> root@s-2536-VM:/usr/local/cloud/systemvm#
> /usr/local/cloud/systemvm/ssvm-check.sh
> 
> First DNS server is  192.168.169.254
> PING 192.168.169.254 (192.168.169.254): 56 data bytes
> 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
> 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
> --- 192.168.169.254 ping statistics ---
> 2 packets transmitted, 2 packets received, 0% packet loss
> round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
> Good: Can ping DNS server
> 
> Good: DNS resolves cloudstack.apache.org
> 
> nfs is currently mounted
> Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
> Good: Can write to mount point
> 
> Management server is 192.168.169.13. Checking connectivity.
> Good: Can connect to management server 192.168.169.13 port 8250
> 
> Good: Java process is running
> 
> Tests Complete. Look for ERROR or WARNING above.
>
>
> The management server does show errors like these, without any further
> details:
>
> 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> snapshot: 55183 from storage
> 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> new state from Destroyed via DestroyRequested
> 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete
> snapshot: 84059 from storage
> 2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject]
> (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update
> state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a
> new state from Destroyed via DestroyRequested
>
>
> Regarding the bug 4797. I can't really comment as it has very little
> technical details without the management log errors, etc. But essentially,
> at the high level, the snapshots are not deleted from the backend in my
> case, just like in the bug 4797.
>
>
> TBH, I am very much surprised that a bug in such an important function of
> ACS has slipped through the testing methods for the 4.15.0 release and
> despite being discovered over 3 months ago, it hasn't been scheduled for
> the fix in 4.15.1 bug fix release. Does that sound right to you? I think
> this issue should be revisited and corrected as it will cause a fill up of
> the secondary storage and ultimately cause all sorts of issues with
> creation of snapshots.
>
> Andrei
>
>
> - Original Message -
> > From: "Suresh Anaparti" 
> > To: "users" 
> > Sent: Thursday, 17 June, 2021 11:16:59
> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>
> > Hi Andrei,
> >
> > Have you checked the 'status' and 'removed' timestamp in snapshots
> table, and
> > 'state' in snapshot_store_ref table for these snapshots.
> >
> > Similar 

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Andrei Mikhailovsky
Hi Suresh,

This is what I've answered on the db tables:

The table snapshots has NULL under the removed column in all snapshots that 
I've
removed. The table snapshot_store_ref has no such column, but the state 
shown
as Destroyed.


I've done some more checking under the ssvm itself, which look ok:


root@s-2536-VM:/usr/local/cloud/systemvm# 
/usr/local/cloud/systemvm/ssvm-check.sh

First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server

Good: DNS resolves cloudstack.apache.org

nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point

Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250

Good: Java process is running

Tests Complete. Look for ERROR or WARNING above.


The management server does show errors like these, without any further details:

2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete 
snapshot: 55183 from storage
2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update 
state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new 
state from Destroyed via DestroyRequested
2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete 
snapshot: 84059 from storage
2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update 
state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new 
state from Destroyed via DestroyRequested


Regarding bug 4797: I can't really comment, as it has very few technical 
details without the management log errors, etc. But essentially, at a high 
level, the snapshots are not deleted from the backend in my case, just like in 
bug 4797.


TBH, I am very much surprised that a bug in such an important function of ACS 
slipped through the testing for the 4.15.0 release and, despite being 
discovered over 3 months ago, hasn't been scheduled for a fix in the 4.15.1 
bug-fix release. Does that sound right to you? I think this issue should be 
revisited and corrected, as it will cause the secondary storage to fill up and 
ultimately cause all sorts of issues with the creation of snapshots.

Andrei


- Original Message -
> From: "Suresh Anaparti" 
> To: "users" 
> Sent: Thursday, 17 June, 2021 11:16:59
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Have you checked the 'status' and 'removed' timestamp in snapshots table, and
> 'state' in snapshot_store_ref table for these snapshots.
> 
> Similar issue logged (by Ed, as mentioned in his email) here:
> https://github.com/apache/cloudstack/issues/4797. Is it the same issue?
> 
> Regards,
> Suresh
> 
>On 17/06/21, 2:18 PM, "Andrei Mikhailovsky"  wrote:
> 
>Hi Suresh, Please see below the answers to your questions.
> 
>
> 
> 
> - Original Message -
>> From: "Suresh Anaparti" 
>> To: "users" 
>> Sent: Thursday, 17 June, 2021 06:36:27
>> Subject: Re: Snapshots are not working after upgrading to 4.15.0
> 
>> Hi Andrei,
>> 
>> Can you check if the storage garbage collector is enabled or not in your 
> env
>> (specified using the global setting 'storage.cleanup.enabled'). If it is
>> enabled, check the interval & delay setting: 'storage.cleanup.interval' 
> and
>> 'storage.cleanup.delay', and see the logs to confirm cleanup is 
> performed or
>> not.
> 
>storage.cleanup.enabled is true
>storage.cleanup.interval is 3600
>storage.cleanup.delay is 360086400
> 
>> 
>> Also, check the snapshot status / state in snapshots & 
> snapshot_store_ref tables
>> for the snapshots that are not deleted during the cleanup. Is 'removed'
>> timestamp set for them in snapshots table?
>> 
> 
> 
>The table snapshots has NULL under the removed column in all snapshots 
> that I've
>removed. The table snapshot_store_ref has no such column, but the state 
> shown
>as Destroyed.
> 
> 
> 
> 
>> Regards,
>> Suresh
>> 
>>On 16/06/21, 9:46 PM, "Andrei Mikhailovsky"  
> 

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Suresh Anaparti
Hi Andrei,

Have you checked the 'status' and 'removed' timestamp in the snapshots table, and 
the 'state' in the snapshot_store_ref table for these snapshots?
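For reference, those checks translate into something like the queries below against the management server's database. Table and column names follow this thread (snapshots.removed, snapshot_store_ref.state, parent_snapshot_id); the script only prints the SQL so it can be reviewed and piped into mysql.

```shell
# Print the queries suggested above; pipe them into the cloud DB to run, e.g.:
#   sh snapshot-queries.sh | mysql -u cloud -p cloud
# Column/table names are taken from this thread, not verified against a schema.
cat <<'SQL'
SELECT id, name, status, removed
  FROM snapshots
 ORDER BY id DESC LIMIT 20;
SELECT snapshot_id, store_role, state, parent_snapshot_id
  FROM snapshot_store_ref
 ORDER BY snapshot_id DESC LIMIT 20;
SQL
```

A snapshot that was deleted in the UI but still occupies secondary storage typically shows removed = NULL in snapshots while snapshot_store_ref reports state Destroyed, as Andrei describes.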

Similar issue logged (by Ed, as mentioned in his email) here: 
https://github.com/apache/cloudstack/issues/4797. Is it the same issue? 

Regards,
Suresh

On 17/06/21, 2:18 PM, "Andrei Mikhailovsky"  wrote:

Hi Suresh, Please see below the answers to your questions.


 

- Original Message -
> From: "Suresh Anaparti" 
> To: "users" 
> Sent: Thursday, 17 June, 2021 06:36:27
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Can you check if the storage garbage collector is enabled or not in your env
> (specified using the global setting 'storage.cleanup.enabled'). If it is
> enabled, check the interval & delay setting: 'storage.cleanup.interval' and
> 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or
> not.

storage.cleanup.enabled is true
storage.cleanup.interval is 3600
storage.cleanup.delay is 360086400

> 
> Also, check the snapshot status / state in snapshots & snapshot_store_ref tables
> for the snapshots that are not deleted during the cleanup. Is 'removed'
> timestamp set for them in snapshots table?
> 


The table snapshots has NULL under the removed column in all snapshots that 
I've removed. The table snapshot_store_ref has no such column, but the state 
shown as Destroyed.




> Regards,
> Suresh
> 
>On 16/06/21, 9:46 PM, "Andrei Mikhailovsky"  
wrote:
> 
>Hello,
> 
>I've done some more investigation and indeed, the snapshots were not taken
>because the secondary storage was over 90% used. I have started cleaning some
>of the older volumes and noticed another problem. After removing snapshots,
>they do not seem to be removed from the secondary storage. I've removed all
>snapshots over 24 hours ago and it looks like the disk space hasn't been freed
>up at all.
> 
>Looks like there are issues with snapshotting function after all.
> 
>Andrei
> 
> 
> 
>
> 
> 
> - Original Message -
>> From: "Harikrishna Patnala" 
>> To: "users" 
>> Sent: Tuesday, 8 June, 2021 03:33:57
>> Subject: Re: Snapshots are not working after upgrading to 4.15.0
> 
>> Hi Andrei,
>> 
>> Can you check the following things and let us know?
>> 
>> 
>>  1.  Can you try creating a new volume and then create snapshot of that, to check
>>  if this an issue with old entries
>>  2.  For the snapshots which are failing can you check if you are seeing any
>>  error messages like this "Can't find an image storage in zone with less than".
>>  This is to check if secondary storage free space check failed.
>>  3.  For the snapshots which are failing and if it is delta snapshot can you
>>  check if its parent's snapshot entry exists in "snapshot_store_ref" table with
>>  'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is
>>  to find the secondary storage where the parent snapshot backup is located.
>> 
>> Regards,
>> Harikrishna
>> 
>> From: Andrei Mikhailovsky 
>> Sent: Monday, June 7, 2021 7:00 PM
>> To: users 
>> Subject: Snapshots are not working after upgrading to 4.15.0
>> 
>> Hello everyone,
>> 
>> I am having an issue with volume snapshots since I've upgraded to 4.15.0. None
>> of the volumes are being snapshotted regardless if the snapshot is initiated
>> manually or from the schedule. The strange thing is that if I manually take the
>> snapshot, the GUI shows Success status, but the Storage>Snapshots show an Error
>> status. Here is what I see in the management server logs:
>> 
>> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done
>> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
>> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove
>> job-86143 from job monitoring
>> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
>> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
>> com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
>> at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
>> at
>> 

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Andrei Mikhailovsky
Hi Suresh, Please see below the answers to your questions.

- Original Message -
> From: "Suresh Anaparti" 
> To: "users" 
> Sent: Thursday, 17 June, 2021 06:36:27
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Can you check if the storage garbage collector is enabled or not in your env
> (specified using the global setting 'storage.cleanup.enabled'). If it is
> enabled, check the interval & delay setting: 'storage.cleanup.interval' and
> 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or
> not.

storage.cleanup.enabled is true
storage.cleanup.interval is 3600
storage.cleanup.delay is 360086400

> 
> Also, check the snapshot status / state in snapshots & snapshot_store_ref 
> tables
> for the snapshots that are not deleted during the cleanup. Is 'removed'
> timestamp set for them in snapshots table?
> 


The table snapshots has NULL under the removed column in all snapshots that 
I've removed. The table snapshot_store_ref has no such column, but the state 
shown as Destroyed.




> Regards,
> Suresh
> 
>On 16/06/21, 9:46 PM, "Andrei Mikhailovsky"  wrote:
> 
>Hello,
> 
>I've done some more investigation and indeed, the snapshots were not taken
>because the secondary storage was over 90% used. I have started cleaning some
>of the older volumes and noticed another problem. After removing snapshots,
>they do not seem to be removed from the secondary storage. I've removed all
>snapshots over 24 hours ago and it looks like the disk space hasn't been freed
>up at all.
> 
>Looks like there are issues with snapshotting function after all.
> 
>Andrei
> 
> 
> 
>
> 
> 
> - Original Message -
>> From: "Harikrishna Patnala" 
>> To: "users" 
>> Sent: Tuesday, 8 June, 2021 03:33:57
>> Subject: Re: Snapshots are not working after upgrading to 4.15.0
> 
>> Hi Andrei,
>> 
>> Can you check the following things and let us know?
>> 
>> 
>>  1.  Can you try creating a new volume and then create snapshot of that,
>>  to check if this is an issue with old entries
>>  2.  For the snapshots which are failing can you check if you are seeing
>>  any error messages like this "Can't find an image storage in zone with
>>  less than". This is to check if secondary storage free space check failed.
>>  3.  For the snapshots which are failing and if it is a delta snapshot can
>>  you check if its parent's snapshot entry exists in the "snapshot_store_ref"
>>  table with 'parent_snapshot_id' of the current snapshot with 'store_role'
>>  "Image". This is to find the secondary storage where the parent snapshot
>>  backup is located.
>> 
>> Regards,
>> Harikrishna
>> 
>> From: Andrei Mikhailovsky 
>> Sent: Monday, June 7, 2021 7:00 PM
>> To: users 
>> Subject: Snapshots are not working after upgrading to 4.15.0
>> 
>> Hello everyone,
>> 
>> I am having an issue with volume snapshots since I've upgraded to 
> 4.15.0. None
>> of the volumes are being snapshotted regardless if the snapshot is 
> initiated
>> manually or from the schedule. The strange thing is that if I manually 
> take the
>> snapshot, the GUI shows Success status, but the Storage>Snapshots show 
> an Error
>> status. Here is what I see in the management server logs:
>> 
>> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) 
> Done
>> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
>> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) 
> Remove
>> job-86143 from job monitoring
>> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
>> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy 
> snapshot
>> com.cloud.utils.exception.CloudRuntimeException: can not find an image 
> stores
>> at
>> 
> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
>> at
>> 
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
>> at
>> 
> com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
>> at
>> 
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
>> at
>> 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
>> at
>> 
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
>> at
>> 
> 

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Edward St Pierre
Hi Guys,

I have already logged this as a bug under reference: 4797

Ed


On Thu, 17 Jun 2021 at 06:37, Suresh Anaparti 
wrote:

> Hi Andrei,
>
> Can you check if the storage garbage collector is enabled or not in your
> env (specified using the global setting 'storage.cleanup.enabled'). If it
> is enabled, check the interval & delay setting: 'storage.cleanup.interval'
> and 'storage.cleanup.delay', and see the logs to confirm cleanup is
> performed or not.
>
> Also, check the snapshot status / state in snapshots & snapshot_store_ref
> tables for the snapshots that are not deleted during the cleanup. Is
> 'removed' timestamp set for them in snapshots table?
>
> Regards,
> Suresh
>
> On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" 
> wrote:
>
> Hello,
>
> I've done some more investigation and indeed, the snapshots were not
> taken because the secondary storage was over 90% used. I have started
> cleaning some of the older volumes and noticed another problem. After
> removing snapshots, they do not seem to be removed from the secondary
> storage. I've removed all snapshots over 24 hours ago and it looks like
> the disk space hasn't been freed up at all.
>
> Looks like there are issues with the snapshotting function after all.
>
> Andrei
>
>
>
>
>
>
> - Original Message -
> > From: "Harikrishna Patnala" 
> > To: "users" 
> > Sent: Tuesday, 8 June, 2021 03:33:57
> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>
> > Hi Andrei,
> >
> > Can you check the following things and let us know?
> >
> >
> >  1.  Can you try creating a new volume and then create snapshot of that,
> >  to check if this is an issue with old entries
> >  2.  For the snapshots which are failing can you check if you are seeing
> >  any error messages like this "Can't find an image storage in zone with
> >  less than". This is to check if secondary storage free space check failed.
> >  3.  For the snapshots which are failing and if it is a delta snapshot can
> >  you check if its parent's snapshot entry exists in the "snapshot_store_ref"
> >  table with 'parent_snapshot_id' of the current snapshot with 'store_role'
> >  "Image". This is to find the secondary storage where the parent snapshot
> >  backup is located.
> >
> > Regards,
> > Harikrishna
> > 
> > From: Andrei Mikhailovsky 
> > Sent: Monday, June 7, 2021 7:00 PM
> > To: users 
> > Subject: Snapshots are not working after upgrading to 4.15.0
> >
> > Hello everyone,
> >
> > I am having an issue with volume snapshots since I've upgraded to
> 4.15.0. None
> > of the volumes are being snapshotted regardless if the snapshot is
> initiated
> > manually or from the schedule. The strange thing is that if I
> manually take the
> > snapshot, the GUI shows Success status, but the Storage>Snapshots
> show an Error
> > status. Here is what I see in the management server logs:
> >
> > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143)
> (logid:be34ce01) Done
> > executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143)
> (logid:be34ce01) Remove
> > job-86143 from job monitoring
> > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
> > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy
> snapshot
> > com.cloud.utils.exception.CloudRuntimeException: can not find an
> image stores
> > at
> >
> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> > at
> >
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> > at
> >
> com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> > at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> > at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> > at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> > at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> > at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> > at
> >
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > at
> >
>