Re: Unable to add template to new deployment
Andrija, Thanks so much for all the details. I'm out of the office for the next couple of days, so I will update my cloud with your suggestions when I get back. As for the "fancy" naming, I just never found names like bondX useful when Linux allows naming the network device something else; it has simply become a convention of mine. I can easily distinguish which bond carries cloud traffic and which carries storage traffic by looking at the bond name, but it is a personal preference and I can easily switch back to the standard bond names. I was aware of the traffic labels but forgot to mention that I had set those up in my previous email. Some of the details you provided helped me further understand how they work, though - thanks. Again, thanks for your help.

On 2021-06-17 22:04, Andrija Panic wrote: BTW, once you think you have fixed all your network configuration issues, destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup", so that new VMs are created. Inside the SSVM, run the following script, which should give you results similar to those below, confirming that your SSVM is healthy:

root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh
First DNS server is 192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server
Good: DNS resolves cloudstack.apache.org
nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point
Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250
Good: Java process is running
Tests Complete. Look for ERROR or WARNING above.
On Thu, 17 Jun 2021 at 23:55, Andrija Panic wrote: Since you really bothered to provide such detailed input and help us help you (vs what some other people tend to do) - I think you really deserved a decent answer (and some explanation).

The last question first: even though you don't specify/have dedicated Storage traffic, there will be an additional interface inside the SSVM connected to the same Management network (not to the old Storage network - if you see the old storage network, restart your mgmt server and destroy the SSVM - a new one should be created, with proper interfaces inside it).

Bond naming issues:
- rename your "bond-services" to something industry-standard like "bond0" or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you specify a VLAN for a network that ACS should create - so your "bond-services", while fancy (and unclear to me WHY you named it in that weird way - smiley here), is NOT something CloudStack will recognize, and this is the reason it fails (it even says so in that error message)
- no reason NOT to have that dedicated storage network - feel free to bring it back - the same issue applies as for the public traffic - rename "bond-storage" to e.g. "bond1" and you will be good to go - since you are NOT using tagging, ACS will just plug the vNIC of the VM into cloudbr2 (or whatever bridge name you use for it).

Now some explanation (even though your deduction capabilities certainly made you draw some conclusions from what I wrote above ^^^):

- When you specify a VLAN id for some network in CloudStack, CloudStack will look for the device name that is specified as the "Traffic label" for that traffic (and you have none??? for your Public traffic - while it should be set to the name of the bridge device "cloudbr1") - and then it will provision a VLAN interface and create a new bridge - i.e. for a Public network with VLAN id 48, it will extract "bond0" from "cloudbr1", create a bond0.48 VLAN interface - AND it will create a brand new bridge with this bond0.48 interface (a bridge with a funny name), and plug Public vNICs into this new bridge
- When you do NOT specify a VLAN id for some network in CloudStack (i.e. your storage network doesn't use a VLAN ID in CloudStack; your switch ports are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with a bondYYY child interface (instead of that "bond-storage" fancy but unrecognized child interface name) - and then ACS will NOT extract a child interface (nor do everything I explained in the previous paragraph/bullet point) - it will just bluntly "stick" all the vNICs into that cloudbr2 -
Cloudstack Usage --- not owner
Hi, I had a MySQL hang, so I restarted MySQL, restarted the CloudStack management server, and restarted the usage server. After that I am facing the following issue:

duration is 120 minutes)
2021-06-18 04:25:49,656 DEBUG [cloud.usage.UsageManagerImpl] (Usage-HB-1:null) (logid:) Scheduling Usage job...
2021-06-18 04:25:49,657 INFO [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) starting usage job...
2021-06-18 04:25:49,669 DEBUG [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) Not owner of usage job, skipping...
2021-06-18 04:25:49,669 INFO [cloud.usage.UsageManagerImpl] (Usage-Job-1:null) (logid:) usage job complete

It seems the usage PID is not updated in the DB. Does anybody know how to fix this? -- Regards, Hean Seng
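A hedged sketch of where to look: the usage server records which host/process owns the scheduled job in the cloud_usage database. The table and column names below are from memory of that schema and may differ by version - verify against your own DB, and take a backup before changing anything.

```sql
-- Hypothetical diagnostic (verify table/column names against your schema).
-- After a crash, a stale host/pid row left by the old usage-server process
-- can cause the new one to log "Not owner of usage job, skipping...".
SELECT id, host, pid, job_type, scheduled, heartbeat
FROM cloud_usage.usage_job
ORDER BY id DESC
LIMIT 5;
-- If the most recent row still shows the dead process's host/pid, that row
-- is the likely culprit; how to safely reset it depends on your version.
```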
Re: [VOTE] Apache CloudStack 4.15.1.0 (RC2)
@Rohit Yadav we might have a UI blocker, I'm waiting for Mike to report the issue - it seems (per his separate email to this ML) that the Traffic Labels are not persisted after the Zone deployment (UI doesn't show traffic labels) - I do recall an issue in 4.15.0 where similar was happening and one could not update the Traffic label in new UI (old UI had to be used) @Corey, Mike please report here with the problem - thx. On Wed, 16 Jun 2021 at 18:28, Rohit Yadav wrote: > Hi All, > > I've created a 4.15.1.0 release, with the following artifacts up for a > vote: > > Git Branch: > https://github.com/apache/cloudstack/tree/4.15.1.0-RC20210616T2128 > Commit SHA: > 3afd37022b9dac52cd146dccada6012e47a80232 > > Source release (checksums and signatures are available at the same > location): > https://dist.apache.org/repos/dist/dev/cloudstack/4.15.1.0/ > > PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884): > https://dist.apache.org/repos/dist/release/cloudstack/KEYS > > The vote will be open for the next week until 22 June 2021. > > For sanity in tallying the vote, can PMC members please be sure to indicate > "(binding)" with their vote? > > [ ] +1 approve > [ ] +0 no opinion > [ ] -1 disapprove (and reason why) > > For users convenience, the packages from this release candidate and 4.15.1 > systemvmtemplates are available here: > https://download.cloudstack.org/testing/4.15.1.0-RC2/ > https://download.cloudstack.org/systemvm/4.15/ > > Documentation is not published yet, but the following may be referenced for > upgrade related tests: (there's a new 4.15.1 systemvmtemplate to be > registered prior to upgrade) > > https://github.com/apache/cloudstack-documentation/tree/4.15/source/upgrading/upgrade > > Regards. > -- Andrija Panić
Re: Centos 7.9 - cloud-init password reset?
Thanks Yordan, nice PR! Best, On Sun, 30 May 2021 at 16:03, Yordan Kostov wrote: > Dear everyone, > > I did a draft of the Creating a Linux Template guide; you can find it here > - https://github.com/apache/cloudstack-documentation/pull/215. > A separate page has been created that can be considered an addition > to the basic Linux guide. It covers cloud-init and the features that > serve as middleware for CloudStack instance GUI functions. > > The guides are based on the following scripts: > - CentOS 7 - > https://github.com/dredknight/cloud_scripts/blob/master/CloudStack-Xen/templates/centos7_clean.bash > - Ubuntu 20 - > https://github.com/dredknight/cloud_scripts/blob/master/CloudStack-Xen/templates/ubuntu20_prep_clean.bash > > Could you take a look and let me know if anything needs to be > changed - technical or format-wise? > > During tests all features seem to work fine, with the following > peculiarity: > - When SSH keys are reset in CloudStack, the public key is added to > /home/cloud-user/.ssh/authorized_keys but the old one is not removed. > This means that users holding previous private keys will still be > able to log in - is there a way for CloudStack to delete the old key? > > Best regards, > Jordan > > -Original Message- > From: Alireza Eskandari > Sent: Sunday, May 23, 2021 1:53 AM > To: users@cloudstack.apache.org > Subject: Re: Centos 7.9 - cloud-init password reset? > > > [X] This message came from outside your organization > > > It seems cloud-init cannot execute the script, so it shows an error, but the > script runs fine standalone. > I'll try it on CentOS Stream. > Notice that cloud-init can handle the password and SSH key from the user data > server without an extra script, but it can't reset the SSH key or set the password > from ConfigDrive. > The script resolves these problems. > > On Fri, May 21, 2021 at 12:45 AM 조대형 wrote: > > > Hi, > > > > I have attached the logs from when I executed the password script and > cloud-init. 
> >
> > # ./password.bash
> >
> > Results: executed password reset file.
> >
> > Cloud Password Manager: Searching for ConfigDrive
> > Cloud Password Manager: ConfigDrive not found
> > Cloud Password Manager: Detecting primary network
> > Cloud Password Manager: Trying to find userdata server
> > Cloud Password Manager: Operating System is using NetworkManager
> > Cloud Password Manager: Found userdata server IP VR's IP address in NetworkManager config
> > Cloud Password Manager: Sending request to userdata server at VR's IP address to get public key
> > Cloud Password Manager: Got response from userdata server at VR's IP address
> > Cloud Password Manager: Did not receive any public keys from userdata server
> > Cloud Password Manager: Sending request to userdata server at VR's IP address to get the password
> > Cloud Password Manager: Got response from userdata server at VR's IP address
> > Cloud Password Manager: VM has already saved a password from the userdata server at VR's IP address
> >
> > # cloud-init init
> >
> > Cloud-init v. 20.3-10.el8 running 'init' at Fri, 21 May 2021 04:40:34 +. Up 268624.75 seconds.
> > ci-info: +++ Net device info +++
> > ci-info: | Device |  Up  | Address                     | Mask            | Scope  | Hw-Address        |
> > ci-info: |  eth0  | True | VR'S IP address1            | 255.255.255.192 | global | 1e:00:8f:00:02:8f |
> > ci-info: |  eth0  | True | fe80::1c00:8fff:fe00:28f/64 | .               | link   | 1e:00:8f:00:02:8f |
> > ci-info: |   lo   | True | 127.0.0.1                   | 255.0.0.0       | host   | .                 |
> > ci-info: |   lo   | True | ::1/128                     | .               | host   | .                 |
> > ci-info: +++ Route IPv4 info +++
> > ci-info: | Route | Destination | Gateway | Genmask         | Interface | Flags |
> > ci-info: |   0   |   0.0.0.0   | x.x.x.1 | 0.0.0.0         |    eth0   |   UG  |
> > ci-info: |   1   |   x.x.x.0   | 0.0.0.0 | 255.255.255.192 |    eth0   |   U   |
> > ci-info: +++ Route IPv6 info +++
> > ci-info: | Route |
Re: Boot Order XenServer
If you read the last few blog lines more carefully, you will notice only the KEY name should be in the allowed list, not the value itself - so just "HVM-boot-params:order" - if this doesn't work, then there might be a bug (due to the colon sign - so please test whether you can use/reproduce the same example as on our blog page) Best, On Tue, 15 Jun 2021 at 01:17, Felipe wrote: > I put in allow.additional.vm.configuration.list.xenserver: > > HVM-boot-params%3Aorder%3D%22dcn%22 > > it didn't work; do you have an example of changing the BIOS boot order for HVM > on XenServer? > > Thank you!! > > On 2021/06/14 22:32:24, Andrija Panic wrote: > > > https://www.shapeblue.com/cloudstack-feature-first-look-enable-sending-of-arbitrary-configuration-data-to-vms/ > > > > Best, > > > > On Mon, 14 Jun 2021 at 21:57, Felipe wrote: > > > > > Hello everyone!!! > > > > > > I wonder if it is possible to change the boot order on XenServer? > > > > > > In global settings, is it at > > > allow.additional.vm.configuration.list.xenserver? > > > > > > I would like to put the DVD first in the boot order. > > > > > > thank you all!! > > > > > > [image: image.png] > > > > > > > > > > -- > > > > Andrija Panić > > > -- Andrija Panić
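The value Felipe pasted into the global setting is a URL-encoded key=value pair; decoding it makes the mismatch obvious (a quick Python check - only the bare key name belongs in the allowed list):

```python
from urllib.parse import unquote

# The string from the failed global-setting attempt is URL-encoded:
encoded = 'HVM-boot-params%3Aorder%3D%22dcn%22'
decoded = unquote(encoded)
print(decoded)  # HVM-boot-params:order="dcn"  <- a key=value pair, not a key

# Only the bare KEY should go into
# allow.additional.vm.configuration.list.xenserver:
key = decoded.split('=')[0]
print(key)  # HVM-boot-params:order
```

The value ("dcn") is then supplied per-VM via the extra-configuration mechanism described in the linked blog post, not in the allow-list itself.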
Re: Alter Shared Guest Network?
There is something wrong there; to my knowledge you should not have issues with IDs (but I don't recall ever having checked this). Before "cloning" a row from the user_ip_address table, please make sure you are cloning an empty record, not one which is "used", and later clean things up - it makes your life easier.

Sequence "problem": I have no idea where this mac_address is used later - but the logical place would be the cloud.nics table - all NICs that exist (for all of your VMs, including system VMs) are located in that table:
- check the network ID in the "networks" table (shared network ID), then do "select * from nics where network_id=<ID>" to show all NICs from that network - in your case there should be 3 NICs (for the VR, VM1, VM2)
- check if the MAC addresses of VM1 and VM2 are different - if NOT, then you have a problem; otherwise, I don't think you do
- check inside your VM1 and VM2 whether they got their respective MAC and IP addresses - they should be different from VM1 to VM2

(I'm pretty sure that the MAC sequence is not used anywhere, or any more - the actual sequence numbers (for different resources) are kept in the "sequence" table - and in my env the MAC sequence for both private and public MACs is set to "1", which is nonsense - probably not used any more.)

Best,

On Tue, 15 Jun 2021 at 13:22, Yordan Kostov wrote:
> FYI tested this on 4.15 with specifics:
> - Shared network with a 2-IP range, for example 10.10.10.10 - 10.10.10.11
> - created as many VMs as ACS allows me, which is 1 (the first IP gets assigned
> to the VR)
> - expanded the range of the shared network in table "vlan" from
> 10.10.10.10-10.10.10.11 to 10.10.10.10-10.10.10.12
> - Duplicated the existing entry in table "user_ip_address" for an IP in that
> specific shared network. 
Changed the following columns with new entries:
> --- ID to the next unreserved
> --- UUID to a unique one for the table
> --- public_ip_address to 10.10.10.12
> --- allocated - make it NULL
> --- state - make it Free
> --- mac_address - look at the whole table and set it to the next one that
> is not used
>
> Back in the ACS GUI I can create a new VM in that network and an IP is assigned.
> But there are some hidden pitfalls created this way.
> As the IDs are created manually, the ACS DB does not update its sequence, so I was
> wondering: if a new network is created, would it take the same MAC ID?
> After creating a new network and looking again in the table, the answer
> to this question is yes - https://imgur.com/YnGMGRE.
>
> So besides the 2 tables another one should be edited, but so far I cannot
> find where the sequence is kept.
>
> Best regards,
> Jordan
>
> -Original Message-
> From: Andrija Panic
> Sent: Monday, June 14, 2021 10:24 PM
> To: users
> Subject: Re: Alter Shared Guest Network?
>
> [X] This message came from outside your organization
>
> Another one is, if I'm not mistaken, the "vlan" table, which will contain the
> range as x.x.x.1-x.x.x.10 etc. - this needs to be updated as well (if
> you manually add records in the user_ip_address table)
>
> best,
>
> On Thu, 10 Jun 2021 at 18:23, Jeremy Hansen wrote:
> > Thanks. I'll take a look at the table.
> >
> > -jeremy
> >
> > > On Jun 10, 2021, at 6:57 AM, Yordan Kostov wrote:
> > >
> > > Hello Jeremy,
> > >
> > > Once a shared network with a DHCP offering is created, the IPs fitting
> > into the defined range are created in a table called "user_ip_address".
> > > They are created one by one, so if a range between x.x.x.11 and
> > x.x.x.210 is created this will add 200 entries. So if you want to
> > expand that, you need to add more entries manually, which is a bit
> unfortunate. 
> > >
> > > Best regards,
> > > Jordan
> > >
> > > -Original Message-
> > > From: Jeremy Hansen
> > > Sent: Thursday, June 10, 2021 12:12 AM
> > > To: users@cloudstack.apache.org
> > > Subject: Re: Alter Shared Guest Network?
> > >
> > > [X] This message came from outside your organization
> > >
> > >> On Jun 9, 2021, at 1:39 PM, Wido den Hollander wrote:
> > >>
> > >> On 6/9/21 3:55 PM, Jeremy Hansen wrote:
> > >>> When I created my shared network config, I specified too narrow of an
> > >>> IP range.
> > >>>
> > >>> I can’t seem to figure out how to alter this config via the web
> > >>> interface. Is this possible?
> > >>
> > >> Not via the UI nor the API. You will need to hack this in the database.
> > >> Or remove the network and create it again. But this is only
> > >> possible if there are no VMs in the network.
> > >>
> > >> Wido
> > >
> > > Thanks, recreating it seems like the easiest option since I’m only in
> > > the testing phase right now, but I’m curious what it would take to alter
> > > tables to fix this. Any clues as to what tables/fields would need to be
> > > updated?
> > >
> > >>> -jeremy
>
> --
> Andrija Panić
-- Andrija Panić
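The manual workflow described in this thread can be sketched as SQL. This is an illustrative sketch only - the network name and IDs below are hypothetical, column names are as quoted in the thread, schemas vary by version, and you should take a DB backup before touching anything:

```sql
-- 1. Find the shared network and its NICs (VR + VMs), per Andrija's check:
SELECT id FROM networks WHERE name = 'my-shared-net';   -- hypothetical name
SELECT id, instance_id, mac_address, ip4_address
FROM nics WHERE network_id = 204;                       -- hypothetical id

-- 2. Widen the range recorded in the "vlan" table:
UPDATE vlan SET ip4_range = '10.10.10.10-10.10.10.12' WHERE id = 3;

-- 3. Then duplicate a FREE row in user_ip_address for 10.10.10.12, with a
--    new id, a new uuid, the next unused mac_address, allocated = NULL,
--    and state = 'Free' (per Yordan's column list above).
```

As Yordan notes, this leaves the DB's own sequence counters untouched, so MAC/ID collisions with later networks remain possible - treat this as a workaround, not a supported procedure.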
Re: Unable to add template to new deployment
BTW, once you think you have fixed all your network configuration issues, destroy all system VMs (CPVM, SSVM) and restart all networks with "cleanup", so that new VMs are created. Inside the SSVM, run the following script, which should give you results similar to those below, confirming that your SSVM is healthy:

root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh
First DNS server is 192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server
Good: DNS resolves cloudstack.apache.org
nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point
Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250
Good: Java process is running
Tests Complete. Look for ERROR or WARNING above.

On Thu, 17 Jun 2021 at 23:55, Andrija Panic wrote:
> Since you really bothered to provide such detailed input and help us
> help you (vs what some other people tend to do) - I think you really
> deserved a decent answer (and some explanation). 
> The last question first: even though you don't specify/have dedicated
> Storage traffic, there will be an additional interface inside the SSVM
> connected to the same Management network (not to the old Storage network -
> if you see the old storage network, restart your mgmt server and destroy
> the SSVM - a new one should be created, with proper interfaces inside it)
>
> Bond naming issues:
> - rename your "bond-services" to something industry-standard like "bond0"
> or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you
> specify a VLAN for a network that ACS should create - so your
> "bond-services", while fancy (and unclear to me WHY you named it in that
> weird way - smiley here), is NOT something CloudStack will recognize, and
> this is the reason it fails (it even says so in that error message)
> - no reason NOT to have that dedicated storage network - feel free to
> bring it back - the same issue applies as for the public traffic - rename
> "bond-storage" to e.g. "bond1" and you will be good to go - since you are
> NOT using tagging, ACS will just plug the vNIC of the VM into cloudbr2 (or
> whatever bridge name you use for it).
>
> Now some explanation (even though your deduction capabilities certainly
> made you draw some conclusions from what I wrote above ^^^):
>
> - When you specify a VLAN id for some network in CloudStack, CloudStack
> will look for the device name that is specified as the "Traffic label" for
> that traffic (and you have none??? for your Public traffic - while it
> should be set to the name of the bridge device "cloudbr1") - and then it
> will provision a VLAN interface and create a new bridge - i.e. for a Public
> network with VLAN id 48, it will extract "bond0" from "cloudbr1", create a
> bond0.48 VLAN interface - AND it will create a brand new bridge with
> this bond0.48 interface (a bridge with a funny name), and plug Public vNICs
> into this new bridge
> - When you do NOT specify a VLAN id for some network in CloudStack (i.e.
> your storage network doesn't use a VLAN ID in CloudStack; your switch ports
> are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with a
> bondYYY child interface (instead of that "bond-storage" fancy but
> unrecognized child interface name) - and then ACS will NOT extract a child
> interface (nor do everything I explained in the previous paragraph/bullet
> point) - it will just bluntly "stick" all the vNICs into that cloudbr2 -
> and hope you have a proper physical/child interface also added to the
> cloudbr2 that will carry the traffic down the line... (purely FYI - you
> could also e.g. use trunking on Linux if you want to, and have e.g. a
> "bondXXX.96" VLAN interface manually configured and added to the bridge,
> while still NOT defining any VLAN in CloudStack for that Storage
> network - and ACS will just stick the vNIC into this bridge)
>
> Public traffic/network - this is the network that all system VMs (SSVM, CPVM
> and all VRs) are connected to - this network is "public" like "external" to
> other CloudStack internal or Guest networks - this is the network to which
> the "north" interface is connected - but it does NOT have to be non-RFC 1918
> - it can be any private IP range from your company internal network (that
> will
Re: Unable to add template to new deployment
Since you really bothered to provide such detailed input and help us help you (vs what some other people tend to do) - I think you really deserved a decent answer (and some explanation).

The last question first: even though you don't specify/have dedicated Storage traffic, there will be an additional interface inside the SSVM connected to the same Management network (not to the old Storage network - if you see the old storage network, restart your mgmt server and destroy the SSVM - a new one should be created, with proper interfaces inside it).

Bond naming issues:
- rename your "bond-services" to something industry-standard like "bond0" or similar - CloudStack extracts "child" interfaces from cloudbr1 IF you specify a VLAN for a network that ACS should create - so your "bond-services", while fancy (and unclear to me WHY you named it in that weird way - smiley here), is NOT something CloudStack will recognize, and this is the reason it fails (it even says so in that error message)
- no reason NOT to have that dedicated storage network - feel free to bring it back - the same issue applies as for the public traffic - rename "bond-storage" to e.g. "bond1" and you will be good to go - since you are NOT using tagging, ACS will just plug the vNIC of the VM into cloudbr2 (or whatever bridge name you use for it).

Now some explanation (even though your deduction capabilities certainly made you draw some conclusions from what I wrote above ^^^):

- When you specify a VLAN id for some network in CloudStack, CloudStack will look for the device name that is specified as the "Traffic label" for that traffic (and you have none??? for your Public traffic - while it should be set to the name of the bridge device "cloudbr1") - and then it will provision a VLAN interface and create a new bridge - i.e. for a Public network with VLAN id 48, it will extract "bond0" from "cloudbr1", create a bond0.48 VLAN interface - AND it will create a brand new bridge with this bond0.48 interface (a bridge with a funny name), and plug Public vNICs into this new bridge
- When you do NOT specify a VLAN id for some network in CloudStack (i.e. your storage network doesn't use a VLAN ID in CloudStack; your switch ports are in access vlan 96) - you need to have a bridge (i.e. cloudbr2) with a bondYYY child interface (instead of that "bond-storage" fancy but unrecognized child interface name) - and then ACS will NOT extract a child interface (nor do everything I explained in the previous paragraph/bullet point) - it will just bluntly "stick" all the vNICs into that cloudbr2 - and hope you have a proper physical/child interface also added to the cloudbr2 that will carry the traffic down the line... (purely FYI - you could also e.g. use trunking on Linux if you want to, and have e.g. a "bondXXX.96" VLAN interface manually configured and added to the bridge, while still NOT defining any VLAN in CloudStack for that Storage network - and ACS will just stick the vNIC into this bridge).

Public traffic/network - this is the network that all system VMs (SSVM, CPVM and all VRs) are connected to - this network is "public" like "external" to other CloudStack internal or Guest networks - this is the network to which the "north" interface is connected - but it does NOT have to be non-RFC 1918 - it can be any private IP range from your company internal network (that will eventually route traffic to the internet - IF you want your ACS to be able to download stuff/templates from the internet - otherwise it does NOT have to route to the internet - if you are running a private cloud and do NOT want external access to your ACS, i.e. to the SSVM, CPVM and VR external ("public") interfaces/IPs) - but if you are running a public cloud, then you want to provide non-RFC 1918, i.e. really publicly routable, IP addresses/ranges for the Public network - ACS will assign 1 IP to the SSVM, 1 IP to the CPVM, and many IPs to the many VRs you create.

A thing that I briefly touched on somewhere upstairs ^^^ - for each traffic type you have defined, you need to define a traffic label - my deduction capabilities make me believe you are using KVM, so you need to set your KVM traffic label for all your network traffic (traffic label, in your case = the exact name of the bridge as visible in Linux) - I recall there are some new UI issues when it comes to tags, so go to your :8080/client/legacy and check your traffic label there - and set it there; the UI in 4.15.0.0 doesn't allow you to update/set it after the zone is created, but the old UI will allow you to do it.

Not sure why I spent 30 minutes of my life, but there you go - hope you got everything from my email - let me know if anything is unclear!

Cheers,

On Wed, 16 Jun 2021 at 19:15, Joshua Schaeffer wrote:
> So Suresh's advice has pushed me in the right direction. The VM was up but
> the agent state was down. I was able to connect to the VM in order to
> continue investigating and the VM is having network issues connecting to
> both my load balancer
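The VLAN-extraction behaviour Andrija describes can be sketched in a few lines of Python. This is an illustrative model of the decision only, not CloudStack code, and the auto-generated bridge name varies by version - "brbond0-48" below is an assumed example of the "funny name", not an authoritative naming rule:

```python
def plan_guest_bridge(traffic_label, child_iface, vlan_id=None):
    """Sketch of where ACS plugs a vNIC, per the explanation above.

    traffic_label -- name of the existing bridge (e.g. "cloudbr1")
    child_iface   -- child interface ACS extracts from it (e.g. "bond0")
    vlan_id       -- VLAN id set on the CloudStack network, or None
    """
    if vlan_id is None:
        # No VLAN in ACS: vNICs go straight into the labelled bridge.
        return {"bridge": traffic_label, "vlan_iface": None}
    # VLAN set in ACS: extract the child, create <child>.<vlan>,
    # and plug vNICs into a new auto-created bridge (name is illustrative).
    return {"bridge": f"br{child_iface}-{vlan_id}",
            "vlan_iface": f"{child_iface}.{vlan_id}"}

print(plan_guest_bridge("cloudbr1", "bond0", 48))   # Public, VLAN 48
print(plan_guest_bridge("cloudbr2", "bond1"))       # Storage, untagged
```

This is why the non-standard child name "bond-services" breaks only the tagged case: in the untagged case the child interface name never enters the picture.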
Re: Issues Found Apache CloudStack 4.15.1.0 (RC2)
@Corey, Mike can you please raise a GH issue with the same description, and also vote -1 on the RC2 release, with the link to that GH issue? Thanks, Andrija On Thu, 17 Jun 2021 at 18:09, Corey, Mike wrote: > Hi, > > Thanks for pushing this out. I'm looking forward to trying the > template/instance deployment in my VMware PILOT. > > A couple of items I noticed on the "new" build are: > > 1 - During zone creation with VMware and setting up the physical networks, > adding the traffic label to use a VDS does NOT keep/take/apply. Once the > zone is created and you go into the physical networks, the VDS traffic > label is blank when it should be in the format > "vSwitchName,VLAN,typeofswitch". The only physical network traffic label > that saved during the zone setup wizard was for the Management stack; my > storage and guest physical network traffic labels did not save from the > wizard. > > 2 - On initial SystemVM deployment, the secondary storage permissions do not > allow the copy of systemvm.iso to the secondary/systemvm/ folder. I > had to first create a /mnt/secondary/systemvm/ folder and chmod -R for this > copy to function. > > More to come... > > Mike > > -Original Message- > From: Rohit Yadav > Sent: Wednesday, June 16, 2021 12:28 PM > To: d...@cloudstack.apache.org; users@cloudstack.apache.org > Subject: [VOTE] Apache CloudStack 4.15.1.0 (RC2) > > Hi All, > > I've created a 4.15.1.0 release, with the following artifacts up for a > vote: > > Git Branch: > https://github.com/apache/cloudstack/tree/4.15.1.0-RC20210616T2128 > Commit SHA: > 3afd37022b9dac52cd146dccada6012e47a80232 > > Source release (checksums and signatures are available at the same > location): > https://dist.apache.org/repos/dist/dev/cloudstack/4.15.1.0/ > > PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884): > https://dist.apache.org/repos/dist/release/cloudstack/KEYS > > The vote will be open for the next week until 22 June 2021. 
> > For sanity in tallying the vote, can PMC members please be sure to indicate > "(binding)" with their vote? > > [ ] +1 approve > [ ] +0 no opinion > [ ] -1 disapprove (and reason why) > > For users convenience, the packages from this release candidate and 4.15.1 > systemvmtemplates are available here: > https://download.cloudstack.org/testing/4.15.1.0-RC2/ > https://download.cloudstack.org/systemvm/4.15/ > > Documentation is not published yet, but the following may be referenced for > upgrade related tests: (there's a new 4.15.1 systemvmtemplate to be > registered prior to upgrade) > > https://github.com/apache/cloudstack-documentation/tree/4.15/source/upgrading/upgrade > > Regards. > -- Andrija Panić
Issues Found Apache CloudStack 4.15.1.0 (RC2)
Hi, Thanks for pushing this out. I'm looking forward to trying the template/instance deployment in my VMware PILOT.

A couple of items I noticed on the "new" build are:

1 - During zone creation with VMware and setting up the physical networks, adding the traffic label to use a VDS does NOT keep/take/apply. Once the zone is created and you go into the physical networks, the VDS traffic label is blank when it should be in the format "vSwitchName,VLAN,typeofswitch". The only physical network traffic label that saved during the zone setup wizard was for the Management stack; my storage and guest physical network traffic labels did not save from the wizard.

2 - On initial SystemVM deployment, the secondary storage permissions do not allow the copy of systemvm.iso to the secondary/systemvm/ folder. I had to first create a /mnt/secondary/systemvm/ folder and chmod -R for this copy to function.

More to come...

Mike

-Original Message-
From: Rohit Yadav
Sent: Wednesday, June 16, 2021 12:28 PM
To: d...@cloudstack.apache.org; users@cloudstack.apache.org
Subject: [VOTE] Apache CloudStack 4.15.1.0 (RC2)

Hi All,

I've created a 4.15.1.0 release, with the following artifacts up for a vote:

Git Branch: https://github.com/apache/cloudstack/tree/4.15.1.0-RC20210616T2128
Commit SHA: 3afd37022b9dac52cd146dccada6012e47a80232

Source release (checksums and signatures are available at the same location):
https://dist.apache.org/repos/dist/dev/cloudstack/4.15.1.0/

PGP release keys (signed using 5ED1E1122DC5E8A4A45112C2484248210EE3D884):
https://dist.apache.org/repos/dist/release/cloudstack/KEYS

The vote will be open for the next week until 22 June 2021. For sanity in tallying the vote, can PMC members please be sure to indicate "(binding)" with their vote? 
[ ] +1 approve [ ] +0 no opinion [ ] -1 disapprove (and reason why) For users convenience, the packages from this release candidate and 4.15.1 systemvmtemplates are available here: https://download.cloudstack.org/testing/4.15.1.0-RC2/ https://download.cloudstack.org/systemvm/4.15/ Documentation is not published yet, but the following may be referenced for upgrade related tests: (there's a new 4.15.1 systemvmtemplate to be registered prior to upgrade) https://github.com/apache/cloudstack-documentation/tree/4.15/source/upgrading/upgrade Regards.
Re: Snapshots are not working after upgrading to 4.15.0
Hello Andrei, As you noticed, ACS has a hardcoded threshold for secondary storages. When the secondary storage has a large capacity, 10% can mean a lot of storage. There is an open PR (https://github.com/apache/cloudstack/pull/4790) that externalizes this threshold to let operators decide how much they need. Also, the logs of secondary storage management were improved with PR https://github.com/apache/cloudstack/pull/4955. With respect to KVM, volume snapshots are taken in a quite peculiar way. Instead of snapshotting the volume directly, ACS takes a full snapshot of the VM, which may freeze the VM due to the memory snapshot, and then extracts the disk from the VM snapshot. Because of this, issue https://github.com/apache/cloudstack/issues/5124 was opened to discuss a new snapshot workflow for KVM. I am already implementing a solution for issue 5124 to improve the whole snapshot process on KVM; however, it is a complex and long-running job. As soon as we have something, I would appreciate some feedback from you. Regards, Guto On 2021/06/16 16:15:51, Andrei Mikhailovsky wrote: > Hello,> > > I've done some more investigation and indeed, the snapshots were not taken because the secondary storage was over 90% used. I have started cleaning some of the older volumes and noticed another problem. After removing snapshots, they do not seem to be removed from the secondary storage. I've removed all snapshots over 24 hours ago and it looks like the disk space hasn't been freed up at all.> > > Looks like there are issues with snapshotting function after all.> > > Andrei> > > > > - Original Message -> > > From: "Harikrishna Patnala" > > > To: "users" > > > Sent: Tuesday, 8 June, 2021 03:33:57> > > Subject: Re: Snapshots are not working after upgrading to 4.15.0> > > > Hi Andrei,> > > > > > Can you check the following things and let us know?> > > > > > > > > 1. 
Can you try creating a new volume and then create snapshot of that, to check> > > if this an issue with old entries> > > 2. For the snapshots which are failing can you check if you are seeing any> > > error messages like this "Can't find an image storage in zone with less than".> > > This is to check if secondary storage free space check failed.> > > 3. For the snapshots which are failing and if it is delta snapshot can you> > > check if its parent's snapshot entry exists in "snapshot_store_ref" table with> > > 'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is> > > to find the secondary storage where the parent snapshot backup is located.> > > > > > Regards,> > > Harikrishna> > > > > > From: Andrei Mikhailovsky > > > Sent: Monday, June 7, 2021 7:00 PM> > > To: users > > > Subject: Snapshots are not working after upgrading to 4.15.0> > > > > > Hello everyone,> > > > > > I am having an issue with volume snapshots since I've upgraded to 4.15.0. None> > > of the volumes are being snapshotted regardless if the snapshot is initiated> > > manually or from the schedule. The strange thing is that if I manually take the> > > snapshot, the GUI shows Success status, but the Storage>Snapshots show an Error> > > status. 
Here is what I see in the management server logs:> > > > > > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]> > > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done> > > executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143> > > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]> > > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove> > > job-86143 from job monitoring> > > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]> > > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot> > > com.cloud.utils.exception.CloudRuntimeException: can not find an image stores> > > at> > > org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)> > > at> > > org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)> > > at> > > com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)> > > at> > > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)> > > at> > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)> > > at> > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)> > > at> > > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)> > > at> > > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)> > > at> > >
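The "can not find an image stores" failure in the trace above, and the secondary-storage capacity threshold Guto mentions, can be illustrated with a small sketch. The function, store names, and numbers below are illustrative, not CloudStack's actual code; the point is that once every image store is above the threshold (hardcoded around 90% in the affected versions, made configurable by PR #4790), no store qualifies and the snapshot backup fails:

```python
# Illustrative sketch (not actual CloudStack code) of a secondary-storage
# capacity check: stores whose used fraction is at or above the threshold
# are filtered out; if none remain, the backup step has no image store.

def pick_image_store(stores, threshold=0.90):
    """Return the first store below the capacity threshold, else None."""
    for name, used_bytes, total_bytes in stores:
        if used_bytes / total_bytes < threshold:
            return name
    return None  # mirrors "can not find an image stores" in the logs

stores = [
    ("nfs-sec-1", 950, 1000),  # 95% used -> rejected
    ("nfs-sec-2", 930, 1000),  # 93% used -> rejected
]
print(pick_image_store(stores))                  # None: every store >90% full
print(pick_image_store(stores, threshold=0.96))  # raising the threshold admits nfs-sec-1
```

This also matches Andrei's observation that snapshots stopped once secondary storage crossed 90% used.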
Re: Snapshots are not working after upgrading to 4.15.0
Hi Andrei, Can you test 4.15.1.0 RC2, which is up for voting/testing, and if you're able to reproduce the issue, please file a bug report if it's not the same as https://github.com/apache/cloudstack/issues/4797. #4747 is Ceph-specific, which unfortunately I don't have an environment to test against, but PRs are welcome from any Ceph user/developer. Thanks. Regards. From: Slavka Peleva Sent: Thursday, June 17, 2021 20:50 To: users@cloudstack.apache.org Subject: Re: Snapshots are not working after upgrading to 4.15.0 Hi all, I've compared the deletion of snapshots between 4.13 and 4.15.1. The main difference is in which snapshot strategy is picked: in 4.13 the deletion is handled by XenserverSnapshotStrategy (renamed DefaultSnapshotStrategy in newer versions), while in 4.15.1 it is handled by StorageSystemSnapshotStrategy. The first one deletes the snapshot chain in secondary storage; the second deletes the snapshot only on the primary (Ceph) storage. Gabriel, if you are aware of the problem, can you correct me if I'm wrong? Best regards, Slavka On Thu, Jun 17, 2021 at 4:23 PM Gabriel Bräscher wrote: > Hi Andrei, > > I appreciate all the efforts and the help in narrowing down this issue. It > looks similar and probably it is related to bug #4797 indeed. > This bug is for some time to be fixed and I perfectly understand why you > are not happy. > > I am speaking for myself here and I am not the Release Manager (RM) of > 4.15.1.0 but In my point of view, this does not necessarily impact on > blocking 4.15.1.0. > > Fixing it has been proving a bit trickier and also requires manual tests > with different environment configurations and some time to debug and > develop. > I myself had no time to fix it for 4.15.1.0 thus decided to not hold > 4.15.1.0 as it would mean that many users would not have several bug fixes > due to this one. > > To give some context. I work for a hosting company that has been > contributing to bug fixes and new features for a long time. 
> We even fixed bugs that do not impact us directly (e.g. issues that affect > storage systems we do not use, or a hypervisor we do not use, etc). > This means that I, as a contributor, sometimes have less time for some > tasks than other ones. > > With that said, I will be re-checking this issue soon(ish) but I cannot > guarantee that I will be able to bring a fix in time for 4.15.1.0. > If any contributor has time to fix it I would be happy to help with review > and testing. > > Best regards, > Gabriel. > > Em qui., 17 de jun. de 2021 às 07:31, Andrei Mikhailovsky > escreveu: > > > Hi Suresh, > > > > This is what I've answered on the db tables: > > > > The table snapshots has NULL under the removed column in all > snapshots > > that I've > > removed. The table snapshot_store_ref has no such column, but the > > state shown > > as Destroyed. > > > > > > I've done some more checking under the ssvm itself, which look ok: > > > > > > root@s-2536-VM:/usr/local/cloud/systemvm# > > /usr/local/cloud/systemvm/ssvm-check.sh > > > > First DNS server is 192.168.169.254 > > PING 192.168.169.254 (192.168.169.254): 56 data bytes > > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms > > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms > > --- 192.168.169.254 ping statistics --- > > 2 packets transmitted, 2 packets received, 0% packet loss > > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms > > Good: Can ping DNS server > > > > Good: DNS resolves cloudstack.apache.org > > > > nfs is currently mounted > > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2 > > Good: Can write to mount point > > > > Management server is 192.168.169.13. Checking connectivity. > > Good: Can connect to management server 192.168.169.13 port 8250 > > > > Good: Java process is running > > > > Tests Complete. Look for ERROR or WARNING above. 
> > > > > > The management server does show errors like these, without any further > > details: > > > > 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > delete > > snapshot: 55183 from storage > > 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > update > > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to > a > > new state from Destroyed via DestroyRequested > > 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > delete > > snapshot: 84059 from storage > > 2021-06-17 10:31:06,363
Re: instance backup designs?
Hi Yordan, We do have a backup & recovery framework that can be extended to implement support for new solutions; the current provider/plugin is available only for VMware/Veeam, but the framework can be used to add support for other backup solutions on other hypervisors. While there is no provider for XenServer/XCP-NG yet, you can use volume snapshots as a way to keep backup volumes on secondary storage. Regards. From: Yordan Kostov Sent: Wednesday, June 16, 2021 18:46 To: users@cloudstack.apache.org Subject: instance backup designs? Hey everyone, I was wondering what choice one has for backup when the underlying hypervisor is XenServer/XCP-NG? Any high-level ideas or any doc that may exist would be great! Best regards, Jordan
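The volume-snapshot fallback Rohit suggests can be automated through the CloudStack API's createSnapshotPolicy call. A hedged sketch of building such a request follows; the endpoint, keys, and volume UUID are placeholders, and the schedule format is taken from the API docs (minute:hour for DAILY) so double-check against your version. The request-signing steps follow CloudStack's documented scheme (sorted, lower-cased query string, HMAC-SHA1, base64):

```python
# Sketch: schedule recurring volume snapshots (the XenServer/XCP-NG backup
# substitute) by signing a createSnapshotPolicy API request. Placeholder
# endpoint/keys/UUID throughout; verify parameter formats for your version.
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params, secret_key):
    """Return the base64 HMAC-SHA1 signature for a CloudStack API call."""
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1)
    return base64.b64encode(digest.digest()).decode()

params = {
    "command": "createSnapshotPolicy",
    "volumeid": "VOLUME-UUID-HERE",   # placeholder
    "intervaltype": "DAILY",
    "schedule": "30:00",              # minute:hour -> 00:30 daily (per API docs)
    "timezone": "UTC",
    "maxsnaps": 8,                    # keep the last 8 snapshots
    "apikey": "API-KEY-HERE",         # placeholder
    "response": "json",
}
params["signature"] = sign_request(params, "SECRET-KEY-HERE")
url = "https://mgmt.example.com/client/api?" + urllib.parse.urlencode(params)
```

The same call is available through CloudMonkey if you prefer a CLI over raw signed requests.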
Re: Snapshots are not working after upgrading to 4.15.0
Hi all, I've compared the deletion of snapshots between 4.13 and 4.15.1. The main difference is in which snapshot strategy is picked: in 4.13 the deletion is handled by XenserverSnapshotStrategy (renamed DefaultSnapshotStrategy in newer versions), while in 4.15.1 it is handled by StorageSystemSnapshotStrategy. The first one deletes the snapshot chain in secondary storage; the second deletes the snapshot only on the primary (Ceph) storage. Gabriel, if you are aware of the problem, can you correct me if I'm wrong? Best regards, Slavka On Thu, Jun 17, 2021 at 4:23 PM Gabriel Bräscher wrote: > Hi Andrei, > > I appreciate all the efforts and the help in narrowing down this issue. It > looks similar and probably it is related to bug #4797 indeed. > This bug is for some time to be fixed and I perfectly understand why you > are not happy. > > I am speaking for myself here and I am not the Release Manager (RM) of > 4.15.1.0 but In my point of view, this does not necessarily impact on > blocking 4.15.1.0. > > Fixing it has been proving a bit trickier and also requires manual tests > with different environment configurations and some time to debug and > develop. > I myself had no time to fix it for 4.15.1.0 thus decided to not hold > 4.15.1.0 as it would mean that many users would not have several bug fixes > due to this one. > > To give some context. I work for a hosting company that has been > contributing to bug fixes and new features for a long time. > We even fixed bugs that do not impact us directly (e.g. issues that affect > storage systems we do not use, or a hypervisor we do not use, etc). > This means that I, as a contributor, sometimes have less time for some > tasks than other ones. > > With that said, I will be re-checking this issue soon(ish) but I cannot > guarantee that I will be able to bring a fix in time for 4.15.1.0. > If any contributor has time to fix it I would be happy to help with review > and testing. 
> > Best regards, > Gabriel. > > Em qui., 17 de jun. de 2021 às 07:31, Andrei Mikhailovsky > escreveu: > > > Hi Suresh, > > > > This is what I've answered on the db tables: > > > > The table snapshots has NULL under the removed column in all > snapshots > > that I've > > removed. The table snapshot_store_ref has no such column, but the > > state shown > > as Destroyed. > > > > > > I've done some more checking under the ssvm itself, which look ok: > > > > > > root@s-2536-VM:/usr/local/cloud/systemvm# > > /usr/local/cloud/systemvm/ssvm-check.sh > > > > First DNS server is 192.168.169.254 > > PING 192.168.169.254 (192.168.169.254): 56 data bytes > > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms > > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms > > --- 192.168.169.254 ping statistics --- > > 2 packets transmitted, 2 packets received, 0% packet loss > > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms > > Good: Can ping DNS server > > > > Good: DNS resolves cloudstack.apache.org > > > > nfs is currently mounted > > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2 > > Good: Can write to mount point > > > > Management server is 192.168.169.13. Checking connectivity. > > Good: Can connect to management server 192.168.169.13 port 8250 > > > > Good: Java process is running > > > > Tests Complete. Look for ERROR or WARNING above. 
> > > > > > The management server does show errors like these, without any further > > details: > > > > 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > delete > > snapshot: 55183 from storage > > 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > update > > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to > a > > new state from Destroyed via DestroyRequested > > 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > delete > > snapshot: 84059 from storage > > 2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject] > > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to > update > > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to > a > > new state from Destroyed via DestroyRequested > > > > > > Regarding the bug 4797. I can't really comment as it has very little > > technical details without the management log errors, etc. But > essentially, > > at the high level, the snapshots are not deleted from the backend in my > > case, just like in the bug 4797. > > > > > > TBH, I am very much
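Slavka's strategy comparison earlier in this thread (chain deletion on secondary storage in 4.13 vs primary-only deletion in 4.15.1) can be modeled with a toy sketch. The function names and data are illustrative, not the actual ACS classes; the point is why deleted snapshots keep consuming secondary storage:

```python
# Toy model (illustrative names, not the real ACS strategy classes) of the
# behavior difference: the older default path also removes the backup chain
# from secondary storage, while the storage-system path removes only the
# snapshot on primary (e.g. Ceph) storage, leaving secondary copies behind.

def delete_default(snap, primary, secondary):
    """4.13-style path: delete from primary AND the chain on secondary."""
    primary.discard(snap)
    secondary.discard(snap)

def delete_storage_system(snap, primary, secondary):
    """4.15.1-style path (as observed): delete from primary only."""
    primary.discard(snap)

primary, secondary = {"snap-1"}, {"snap-1"}
delete_storage_system("snap-1", primary, secondary)
print(primary, secondary)  # set() {'snap-1'} -> the secondary copy lingers
```

This mirrors Andrei's symptom: snapshots deleted over 24 hours earlier, yet no secondary-storage space freed.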
Re: Snapshots are not working after upgrading to 4.15.0
Hi Andrei, I appreciate all the efforts and the help in narrowing down this issue. It looks similar and is probably related to bug #4797 indeed. This bug has been awaiting a fix for some time, and I perfectly understand why you are not happy. I am speaking for myself here, and I am not the Release Manager (RM) of 4.15.1.0, but in my view this does not necessarily mean blocking 4.15.1.0. Fixing it has proven a bit trickier and also requires manual tests with different environment configurations and some time to debug and develop. I myself had no time to fix it for 4.15.1.0, and thus decided not to hold 4.15.1.0, as that would mean many users would not get several bug fixes because of this one. To give some context: I work for a hosting company that has been contributing bug fixes and new features for a long time. We even fixed bugs that do not impact us directly (e.g. issues that affect storage systems we do not use, or a hypervisor we do not use, etc.). This means that I, as a contributor, sometimes have less time for some tasks than for others. With that said, I will be re-checking this issue soon(ish), but I cannot guarantee that I will be able to bring a fix in time for 4.15.1.0. If any contributor has time to fix it, I would be happy to help with review and testing. Best regards, Gabriel. On Thu, 17 Jun 2021 at 07:31, Andrei Mikhailovsky wrote: > Hi Suresh, > > This is what I've answered on the db tables: > > The table snapshots has NULL under the removed column in all snapshots > that I've > removed. The table snapshot_store_ref has no such column, but the > state shown > as Destroyed. 
> > > I've done some more checking under the ssvm itself, which look ok: > > > root@s-2536-VM:/usr/local/cloud/systemvm# > /usr/local/cloud/systemvm/ssvm-check.sh > > First DNS server is 192.168.169.254 > PING 192.168.169.254 (192.168.169.254): 56 data bytes > 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms > 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms > --- 192.168.169.254 ping statistics --- > 2 packets transmitted, 2 packets received, 0% packet loss > round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms > Good: Can ping DNS server > > Good: DNS resolves cloudstack.apache.org > > nfs is currently mounted > Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2 > Good: Can write to mount point > > Management server is 192.168.169.13. Checking connectivity. > Good: Can connect to management server 192.168.169.13 port 8250 > > Good: Java process is running > > Tests Complete. Look for ERROR or WARNING above. > > > The management server does show errors like these, without any further > details: > > 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete > snapshot: 55183 from storage > 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a > new state from Destroyed via DestroyRequested > 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete > snapshot: 84059 from storage > 2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject] > (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update > state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a > new state from Destroyed via DestroyRequested > > > Regarding the bug 4797. 
I can't really comment as it has very little > technical details without the management log errors, etc. But essentially, > at the high level, the snapshots are not deleted from the backend in my > case, just like in the bug 4797. > > > TBH, I am very much surprised that a bug in such an important function of > ACS has slipped through the testing methods for the 4.15.0 release and > despite being discovered over 3 months ago, it hasn't been scheduled for > the fix in 4.15.1 bug fix release. Does that sound right to you? I think > this issue should be revisited and corrected as it will cause a fill up of > the secondary storage and ultimately cause all sorts of issues with > creation of snapshots. > > Andrei > > > - Original Message - > > From: "Suresh Anaparti" > > To: "users" > > Sent: Thursday, 17 June, 2021 11:16:59 > > Subject: Re: Snapshots are not working after upgrading to 4.15.0 > > > Hi Andrei, > > > > Have you checked the 'status' and 'removed' timestamp in snapshots > table, and > > 'state' in snapshot_store_ref table for these snapshots. > > > > Similar
Re: Snapshots are not working after upgrading to 4.15.0
Hi Suresh, This is what I've answered on the db tables: The table snapshots has NULL under the removed column in all snapshots that I've removed. The table snapshot_store_ref has no such column, but the state shown as Destroyed. I've done some more checking under the ssvm itself, which look ok: root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh First DNS server is 192.168.169.254 PING 192.168.169.254 (192.168.169.254): 56 data bytes 64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms 64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms --- 192.168.169.254 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms Good: Can ping DNS server Good: DNS resolves cloudstack.apache.org nfs is currently mounted Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2 Good: Can write to mount point Management server is 192.168.169.13. Checking connectivity. Good: Can connect to management server 192.168.169.13 port 8250 Good: Java process is running Tests Complete. Look for ERROR or WARNING above. 
The management server does show errors like these, without any further details: 2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete snapshot: 55183 from storage 2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Destroyed via DestroyRequested 2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete snapshot: 84059 from storage 2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject] (StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Destroyed via DestroyRequested Regarding the bug 4797. I can't really comment as it has very little technical details without the management log errors, etc. But essentially, at the high level, the snapshots are not deleted from the backend in my case, just like in the bug 4797. TBH, I am very much surprised that a bug in such an important function of ACS has slipped through the testing methods for the 4.15.0 release and despite being discovered over 3 months ago, it hasn't been scheduled for the fix in 4.15.1 bug fix release. Does that sound right to you? I think this issue should be revisited and corrected as it will cause a fill up of the secondary storage and ultimately cause all sorts of issues with creation of snapshots. Andrei - Original Message - > From: "Suresh Anaparti" > To: "users" > Sent: Thursday, 17 June, 2021 11:16:59 > Subject: Re: Snapshots are not working after upgrading to 4.15.0 > Hi Andrei, > > Have you checked the 'status' and 'removed' timestamp in snapshots table, and > 'state' in snapshot_store_ref table for these snapshots. 
> > Similar issue logged (by Ed, as mentioned in his email) here: > https://github.com/apache/cloudstack/issues/4797. Is it the same issue? > > Regards, > Suresh > >On 17/06/21, 2:18 PM, "Andrei Mikhailovsky" wrote: > >Hi Suresh, Please see below the answers to your questions. > > > > > - Original Message - >> From: "Suresh Anaparti" >> To: "users" >> Sent: Thursday, 17 June, 2021 06:36:27 >> Subject: Re: Snapshots are not working after upgrading to 4.15.0 > >> Hi Andrei, >> >> Can you check if the storage garbage collector is enabled or not in your > env >> (specified using the global setting 'storage.cleanup.enabled'). If it is >> enabled, check the interval & delay setting: 'storage.cleanup.interval' > and >> 'storage.cleanup.delay', and see the logs to confirm cleanup is > performed or >> not. > >storage.cleanup.enabled is true >storage.cleanup.interval is 3600 >storage.cleanup.delay is 360086400 > >> >> Also, check the snapshot status / state in snapshots & > snapshot_store_ref tables >> for the snapshots that are not deleted during the cleanup. Is 'removed' >> timestamp set for them in snapshots table? >> > > >The table snapshots has NULL under the removed column in all snapshots > that I've >removed. The table snapshot_store_ref has no such column, but the state > shown >as Destroyed. > > > > >> Regards, >> Suresh >> >>On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" >
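The NoTransitionException in Andrei's logs ("Unable to transition to a new state from Destroyed via DestroyRequested") is a state-machine guard: the scavenger keeps asking an already-Destroyed snapshot to begin destruction again. A minimal sketch of that kind of guard follows; the transition table is illustrative, not ACS's actual FSM:

```python
# Minimal state-machine sketch of the guard behind the NoTransitionException
# in the logs. The transition table is an illustrative subset: once a
# snapshot is already "Destroyed", the "DestroyRequested" event has no
# defined transition, so the attempt is rejected instead of repeated.

class NoTransitionException(Exception):
    pass

TRANSITIONS = {  # (current_state, event) -> next_state (illustrative)
    ("BackedUp", "DestroyRequested"): "Destroying",
    ("Destroying", "OperationSucceeded"): "Destroyed",
}

def transition(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise NoTransitionException(
            f"Unable to transition to a new state from {state} via {event}")

print(transition("BackedUp", "DestroyRequested"))  # Destroying
try:
    transition("Destroyed", "DestroyRequested")    # the failure from the logs
except NoTransitionException as e:
    print(e)
```

In other words, the repeated DEBUG lines suggest the store record is already in Destroyed state while the backing files (and the snapshots-table row) were never actually cleaned up.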
Re: Snapshots are not working after upgrading to 4.15.0
Hi Andrei, Have you checked the 'status' and 'removed' timestamp in snapshots table, and 'state' in snapshot_store_ref table for these snapshots. Similar issue logged (by Ed, as mentioned in his email) here: https://github.com/apache/cloudstack/issues/4797. Is it the same issue? Regards, Suresh On 17/06/21, 2:18 PM, "Andrei Mikhailovsky" wrote: Hi Suresh, Please see below the answers to your questions. - Original Message - > From: "Suresh Anaparti" > To: "users" > Sent: Thursday, 17 June, 2021 06:36:27 > Subject: Re: Snapshots are not working after upgrading to 4.15.0 > Hi Andrei, > > Can you check if the storage garbage collector is enabled or not in your env > (specified using the global setting 'storage.cleanup.enabled'). If it is > enabled, check the interval & delay setting: 'storage.cleanup.interval' and > 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or > not. storage.cleanup.enabled is true storage.cleanup.interval is 3600 storage.cleanup.delay is 360086400 > > Also, check the snapshot status / state in snapshots & snapshot_store_ref tables > for the snapshots that are not deleted during the cleanup. Is 'removed' > timestamp set for them in snapshots table? > The table snapshots has NULL under the removed column in all snapshots that I've removed. The table snapshot_store_ref has no such column, but the state shown as Destroyed. > Regards, > Suresh > >On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" wrote: > >Hello, > >I've done some more investigation and indeed, the snapshots were not taken >because the secondary storage was over 90% used. I have started cleaning some >of the older volumes and noticed another problem. After removing snapshots, >they do not seem to be removed from the secondary storage. I've removed all >snapshots over 24 hours ago and it looks like the disk space hasn't been freed >up at all. > >Looks like there are issues with snapshotting function after all. 
> >Andrei > > > > > > > - Original Message - >> From: "Harikrishna Patnala" >> To: "users" >> Sent: Tuesday, 8 June, 2021 03:33:57 >> Subject: Re: Snapshots are not working after upgrading to 4.15.0 > >> Hi Andrei, >> >> Can you check the following things and let us know? >> >> >> 1. Can you try creating a new volume and then create snapshot of that, to check >> if this an issue with old entries >> 2. For the snapshots which are failing can you check if you are seeing any >> error messages like this "Can't find an image storage in zone with less than". >> This is to check if secondary storage free space check failed. >> 3. For the snapshots which are failing and if it is delta snapshot can you >> check if its parent's snapshot entry exists in "snapshot_store_ref" table with >> 'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is >> to find the secondary storage where the parent snapshot backup is located. >> >> Regards, >> Harikrishna >> >> From: Andrei Mikhailovsky >> Sent: Monday, June 7, 2021 7:00 PM >> To: users >> Subject: Snapshots are not working after upgrading to 4.15.0 >> >> Hello everyone, >> >> I am having an issue with volume snapshots since I've upgraded to 4.15.0. None >> of the volumes are being snapshotted regardless if the snapshot is initiated >> manually or from the schedule. The strange thing is that if I manually take the >> snapshot, the GUI shows Success status, but the Storage>Snapshots show an Error >> status. 
Here is what I see in the management server logs: >> >> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] >> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done >> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143 >> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor] >> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove >> job-86143 from job monitoring >> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl] >> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot >> com.cloud.utils.exception.CloudRuntimeException: can not find an image stores >> at >> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271) >> at >>
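The interplay of the two global settings Suresh asks about can be sketched as below: the garbage collector wakes every storage.cleanup.interval seconds, and a removed snapshot only becomes eligible once storage.cleanup.delay seconds have passed since its 'removed' timestamp. The values and function are illustrative, not ACS code, but the sketch shows why a row with removed = NULL (as in Andrei's snapshots table) is never picked up at all:

```python
# Hedged sketch (illustrative values, not ACS code) of how the cleanup
# settings combine. A snapshot whose 'removed' timestamp was never set is
# invisible to the cleanup pass regardless of interval or delay.

CLEANUP_INTERVAL = 3600   # storage.cleanup.interval: seconds between GC runs
CLEANUP_DELAY = 86400     # storage.cleanup.delay: grace period in seconds

def eligible_for_cleanup(removed_at, now, delay=CLEANUP_DELAY):
    """A removed snapshot may be purged once the delay has elapsed."""
    return removed_at is not None and now - removed_at >= delay

print(eligible_for_cleanup(None, now=1_000_000))     # False: removed is NULL
print(eligible_for_cleanup(900_000, now=1_000_000))  # True: 100000s >= delay
```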
Re: Snapshots are not working after upgrading to 4.15.0
Hi Suresh, Please see below the answers to your questions. - Original Message - > From: "Suresh Anaparti" > To: "users" > Sent: Thursday, 17 June, 2021 06:36:27 > Subject: Re: Snapshots are not working after upgrading to 4.15.0 > Hi Andrei, > > Can you check if the storage garbage collector is enabled or not in your env > (specified using the global setting 'storage.cleanup.enabled'). If it is > enabled, check the interval & delay setting: 'storage.cleanup.interval' and > 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or > not. storage.cleanup.enabled is true storage.cleanup.interval is 3600 storage.cleanup.delay is 360086400 > > Also, check the snapshot status / state in snapshots & snapshot_store_ref > tables > for the snapshots that are not deleted during the cleanup. Is 'removed' > timestamp set for them in snapshots table? > The table snapshots has NULL under the removed column in all snapshots that I've removed. The table snapshot_store_ref has no such column, but the state shown as Destroyed. > Regards, > Suresh > >On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" wrote: > >Hello, > >I've done some more investigation and indeed, the snapshots were not taken >because the secondary storage was over 90% used. I have started cleaning > some >of the older volumes and noticed another problem. After removing snapshots, >they do not seem to be removed from the secondary storage. I've removed all >snapshots over 24 hours ago and it looks like the disk space hasn't been > freed >up at all. > >Looks like there are issues with snapshotting function after all. > >Andrei > > > > > > > - Original Message - >> From: "Harikrishna Patnala" >> To: "users" >> Sent: Tuesday, 8 June, 2021 03:33:57 >> Subject: Re: Snapshots are not working after upgrading to 4.15.0 > >> Hi Andrei, >> >> Can you check the following things and let us know? >> >> >> 1. 
Can you try creating a new volume and then create snapshot of that, > to check >> if this an issue with old entries >> 2. For the snapshots which are failing can you check if you are seeing > any >> error messages like this "Can't find an image storage in zone with less > than". >> This is to check if secondary storage free space check failed. >> 3. For the snapshots which are failing and if it is delta snapshot can > you >> check if its parent's snapshot entry exists in "snapshot_store_ref" > table with >> 'parent_snapshot_id' of the current snapshot with 'store_role' "Image". > This is >> to find the secondary storage where the parent snapshot backup is > located. >> >> Regards, >> Harikrishna >> >> From: Andrei Mikhailovsky >> Sent: Monday, June 7, 2021 7:00 PM >> To: users >> Subject: Snapshots are not working after upgrading to 4.15.0 >> >> Hello everyone, >> >> I am having an issue with volume snapshots since I've upgraded to > 4.15.0. None >> of the volumes are being snapshotted regardless if the snapshot is > initiated >> manually or from the schedule. The strange thing is that if I manually > take the >> snapshot, the GUI shows Success status, but the Storage>Snapshots show > an Error >> status. 
Here is what I see in the management server logs: >> >> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] >> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) > Done >> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143 >> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor] >> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) > Remove >> job-86143 from job monitoring >> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl] >> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy > snapshot >> com.cloud.utils.exception.CloudRuntimeException: can not find an image > stores >> at >> > org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271) >> at >> > org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171) >> at >> > com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238) >> at >> > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48) >> at >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55) >> at >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102) >> at >> >
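The database symptom Andrei reports (snapshots.removed is NULL while snapshot_store_ref shows Destroyed) can be reproduced in miniature with SQLite. The table and column names follow his description, but the schema below is a simplified stand-in, not the real ACS schema:

```python
# Miniature reproduction (simplified stand-in schema, not the real ACS one)
# of the inconsistency described in the thread: snapshot_store_ref marks the
# Image-role backup Destroyed, yet snapshots.removed is still NULL, so the
# secondary-storage files are never reclaimed.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE snapshots (id INTEGER PRIMARY KEY, name TEXT, removed TEXT);
    CREATE TABLE snapshot_store_ref (
        snapshot_id INTEGER, store_role TEXT, state TEXT);
    INSERT INTO snapshots VALUES (55183, 'daily-1', NULL);
    INSERT INTO snapshot_store_ref VALUES (55183, 'Image', 'Destroyed');
""")

# Snapshots the user deleted (store state Destroyed) that were never marked
# removed -- candidates for the space that is not being freed:
rows = db.execute("""
    SELECT s.id FROM snapshots s
    JOIN snapshot_store_ref r ON r.snapshot_id = s.id
    WHERE r.state = 'Destroyed' AND s.removed IS NULL
""").fetchall()
print(rows)  # [(55183,)]
```

A query of this shape against the real cloud database (with the actual schema) would enumerate the stuck snapshots the scavenger keeps failing on.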
Re: Snapshots are not working after upgrading to 4.15.0
Hi Guys,

I have already logged this as a bug under reference: 4797

Ed

On Thu, 17 Jun 2021 at 06:37, Suresh Anaparti wrote:
> Hi Andrei,
>
> Can you check if the storage garbage collector is enabled or not in your
> env (specified using the global setting 'storage.cleanup.enabled')? If it
> is enabled, check the interval & delay settings 'storage.cleanup.interval'
> and 'storage.cleanup.delay', and see the logs to confirm whether cleanup
> is performed or not.
>
> Also, check the snapshot status / state in the snapshots &
> snapshot_store_ref tables for the snapshots that are not deleted during
> the cleanup. Is the 'removed' timestamp set for them in the snapshots
> table?
>
> Regards,
> Suresh
>
> On 16/06/21, 9:46 PM, "Andrei Mikhailovsky" wrote:
>
> Hello,
>
> I've done some more investigation and indeed, the snapshots were not
> taken because the secondary storage was over 90% used. I have started
> cleaning up some of the older volumes and noticed another problem. After
> removing snapshots, they do not seem to be removed from the secondary
> storage. I removed all snapshots over 24 hours ago and it looks like the
> disk space hasn't been freed up at all.
>
> Looks like there are issues with the snapshotting function after all.
>
> Andrei
>
> > ----- Original Message -----
> > From: "Harikrishna Patnala"
> > To: "users"
> > Sent: Tuesday, 8 June, 2021 03:33:57
> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
> >
> > [Harikrishna's checklist and Andrei's original report with the
> > management-server stack trace, quoted in full earlier in this thread,
> > trimmed]
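Suresh's two checks above map onto direct queries against the management database. A sketch, assuming the default database name `cloud` and the table/column names referenced in this thread:

```sql
-- 1) Is the storage garbage collector enabled, and how often does it run?
SELECT name, value
FROM configuration
WHERE name IN ('storage.cleanup.enabled',
               'storage.cleanup.interval',
               'storage.cleanup.delay');

-- 2) Snapshots deleted in the UI but possibly not yet cleaned up on
--    secondary storage: a non-NULL 'removed' timestamp means the row was
--    marked deleted; cleanup should then reclaim the backing files.
SELECT id, name, status, removed
FROM snapshots
WHERE removed IS NOT NULL
ORDER BY removed DESC
LIMIT 20;
```

If 'removed' is set but the files persist on secondary storage past the configured cleanup interval plus delay, that points at the cleanup thread rather than the deletion API, which matches the behaviour Andrei describes.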