Re: Host's HA failed!
Hi Li,

Can you post the stack trace of the failure from the logs?

Regards,
Nicolas Vazquez

From: li jerry
Sent: Thursday, June 6, 2019 11:50 AM
To: d...@cloudstack.apache.org; users@cloudstack.apache.org
Subject: Host's HA failed!

Hello All

We are trying to deploy VMs as CloudStack management nodes on top of the compute nodes, like a hyper-converged infrastructure, but we have found an issue with the HA switchover. The test is based on CentOS 7.6 and CloudStack 4.11.2. The primary storage is NFS. We have enabled VM HA and Host HA, and the following global settings are enabled for CloudStack:

indirect.agent.lb.algorithm=roundrobin
host=M1IP,M2IP,M3IP

We have 3 management VMs over 3 compute nodes, sharing a MariaDB Galera cluster. M1 is the first management VM, running on compute node H1; it is created directly under KVM and is not managed by CloudStack. Likewise, M2 and M3 are the second and third management VMs, running on H2 and H3.

The problem is: if the CloudStack agent of a compute node is connected to the management node hosted on that same node (e.g. the agent of H1 is connected to M1), then once H1 goes down, CloudStack only updates the status of H1 to Disconnected and never triggers AgentStatusCheck, so host HA fails. If H1 is connected to another management VM such as M2 or M3, the issue does not happen.

So, would you please give us some suggestions? Thanks.

nicolas.vazq...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue
Re: Can't start systemVM in a new advanced zone deployment
Yes, snapshots are supposed to be in the PS template copy.

On 6/6/19, 9:24 AM, "Yiping Zhang" wrote:

The NFS volume definitely allows root mount and has RW permissions, as we already see the volume mounted and the template staged on primary storage. The volume is mounted as an NFS3 datastore in vSphere. Volume snapshots are enabled; I can ask to have snapshots disabled to see if it makes any difference. I need to find out more about the NFS version and qtree mode from our storage admin.

One thing I noticed is that when CloudStack templates are staged onto primary storage, a snapshot was created which does not exist in the original OVA or on secondary storage. I suppose this is the expected behavior?

Yiping

On 6/6/19, 6:59 AM, "Sergey Levitskiy" wrote:

This option is 'vol options name_of_volume nosnapdir on'; however, if I recall right, it is supposed to work even with the .snapshot directory visible. Can you find out all vol options on your NetApp volume? I would be most concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering if ACS is able to create the ROOT-XX folder, so you might want to watch the content of the DS when ACS tries the operations.

On 6/5/19, 11:43 PM, "Paul Angus" wrote:

Hi Yiping, do you have snapshots enabled on the NetApp filer? (It used to be seen as a ".snapshot" subdirectory in each directory.) If so, try disabling snapshots - there used to be a bug where the .snapshot directory would confuse CloudStack.
paul.an...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue

-----Original Message-----
From: Yiping Zhang
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I was looking at the wrong time periods earlier). I have uploaded more logs into pastebin. From these log entries, it appears that when copying the template to a VM, it tried to open the destination VMDK file and got a "file not found" error.

In the case where CloudStack attempted to create a systemVM, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in the vCenter UI, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was determined, by CloudStack or by VMware, and how did I end up with this?

Yiping

On 6/5/19, 12:32 PM, "Sergey Levitskiy" wrote:

Some operation logs get transferred to the vCenter log vpxd.log.
It is not straightforward to trace it, but VMware will be able to help should you open a case with them.

On 6/5/19, 11:39 AM, "Yiping Zhang" wrote:

Hi, Sergey:

During the time period when I had the problem cloning the template, there are only a few unique entries in vmkernel.log, and they were repeated hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00
Re: Can't start systemVM in a new advanced zone deployment
The NFS volume definitely allows root mount and has RW permissions, as we already see the volume mounted and the template staged on primary storage. The volume is mounted as an NFS3 datastore in vSphere. Volume snapshots are enabled; I can ask to have snapshots disabled to see if it makes any difference. I need to find out more about the NFS version and qtree mode from our storage admin.

One thing I noticed is that when CloudStack templates are staged onto primary storage, a snapshot was created which does not exist in the original OVA or on secondary storage. I suppose this is the expected behavior?

Yiping

On 6/6/19, 6:59 AM, "Sergey Levitskiy" wrote:

This option is 'vol options name_of_volume nosnapdir on'; however, if I recall right, it is supposed to work even with the .snapshot directory visible. Can you find out all vol options on your NetApp volume? I would be most concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering if ACS is able to create the ROOT-XX folder, so you might want to watch the content of the DS when ACS tries the operations.

On 6/5/19, 11:43 PM, "Paul Angus" wrote:

Hi Yiping, do you have snapshots enabled on the NetApp filer? (It used to be seen as a ".snapshot" subdirectory in each directory.) If so, try disabling snapshots - there used to be a bug where the .snapshot directory would confuse CloudStack.
paul.an...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue

-----Original Message-----
From: Yiping Zhang
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I was looking at the wrong time periods earlier). I have uploaded more logs into pastebin. From these log entries, it appears that when copying the template to a VM, it tried to open the destination VMDK file and got a "file not found" error.

In the case where CloudStack attempted to create a systemVM, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in the vCenter UI, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was determined, by CloudStack or by VMware, and how did I end up with this?

Yiping

On 6/5/19, 12:32 PM, "Sergey Levitskiy" wrote:

Some operation logs get transferred to the vCenter log vpxd.log.
It is not straightforward to trace it, but VMware will be able to help should you open a case with them.

On 6/5/19, 11:39 AM, "Yiping Zhang" wrote:

Hi, Sergey:

During the time period when I had the problem cloning the template, there are only a few unique entries in vmkernel.log, and they were repeated hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev "naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. The device "
Host's HA failed!
Hello All

We are trying to deploy VMs as CloudStack management nodes on top of the compute nodes, like a hyper-converged infrastructure, but we have found an issue with the HA switchover. The test is based on CentOS 7.6 and CloudStack 4.11.2. The primary storage is NFS. We have enabled VM HA and Host HA, and the following global settings are enabled for CloudStack:

indirect.agent.lb.algorithm=roundrobin
host=M1IP,M2IP,M3IP

We have 3 management VMs over 3 compute nodes, sharing a MariaDB Galera cluster. M1 is the first management VM, running on compute node H1; it is created directly under KVM and is not managed by CloudStack. Likewise, M2 and M3 are the second and third management VMs, running on H2 and H3.

The problem is: if the CloudStack agent of a compute node is connected to the management node hosted on that same node (e.g. the agent of H1 is connected to M1), then once H1 goes down, CloudStack only updates the status of H1 to Disconnected and never triggers AgentStatusCheck, so host HA fails. If H1 is connected to another management VM such as M2 or M3, the issue does not happen.

So, would you please give us some suggestions? Thanks.
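For readers reconstructing this setup: the behaviour above hinges on the management server list each agent receives. A minimal sketch of the settings named in the report, with placeholder addresses for M1/M2/M3 (illustrative only, not a verified fix):

```properties
# Management server global settings, as in the report:
#   host=M1IP,M2IP,M3IP                      (list handed out to agents)
#   indirect.agent.lb.algorithm=roundrobin   (how the list is ordered per agent)

# Each KVM host's /etc/cloudstack/agent/agent.properties then carries a
# comma-separated management server list, e.g. (placeholder IPs):
host=10.0.0.11,10.0.0.12,10.0.0.13
```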
Re: Can't start systemVM in a new advanced zone deployment
This option is 'vol options name_of_volume nosnapdir on'; however, if I recall right, it is supposed to work even with the .snapshot directory visible. Can you find out all vol options on your NetApp volume? I would be most concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering if ACS is able to create the ROOT-XX folder, so you might want to watch the content of the DS when ACS tries the operations.

On 6/5/19, 11:43 PM, "Paul Angus" wrote:

Hi Yiping, do you have snapshots enabled on the NetApp filer? (It used to be seen as a ".snapshot" subdirectory in each directory.) If so, try disabling snapshots - there used to be a bug where the .snapshot directory would confuse CloudStack.

paul.an...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue

-----Original Message-----
From: Yiping Zhang
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I was looking at the wrong time periods earlier). I have uploaded more logs into pastebin. From these log entries, it appears that when copying the template to a VM, it tried to open the destination VMDK file and got a "file not found" error.

In the case where CloudStack attempted to create a systemVM, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in the vCenter UI, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was determined, by CloudStack or by VMware, and how did I end up with this?

Yiping

On 6/5/19, 12:32 PM, "Sergey Levitskiy" wrote:

Some operation logs get transferred to the vCenter log vpxd.log.
It is not straightforward to trace it, but VMware will be able to help should you open a case with them.

On 6/5/19, 11:39 AM, "Yiping Zhang" wrote:

Hi, Sergey:

During the time period when I had the problem cloning the template, there are only a few unique entries in vmkernel.log, and they were repeated hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev "naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

The device "naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on this host.

Yiping

On 6/5/19, 11:15 AM, "Sergey Levitskiy" wrote:

This must be specific to that environment. For full clone mode, ACS simply calls the cloneVMTask of the vSphere API, so basically until cloning of that template succeeds when attempted in the vSphere client, it would keep failing in ACS. Can you post vmkernel.log from your ESX host esx-0001-a-001?

On 6/5/19, 8:47 AM, "Yiping Zhang" wrote:

Well, I can always reproduce it in this particular vSphere setup, but in a different ACS+vSphere environment I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic" wrote:

Yiping, if you are sure you can reproduce the issue, it would be good to raise a GitHub issue and provide as much detail as possible.

Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang wrote:
> Hi, Sergey:
>
> Thanks for the tip. After setting vmware.create.full.clone=false, I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still
Re: delete UploadAbandoned template
I did not try your fix, but I will be trying to import the image again soon and will see. My fix was to create the directory template/tmpl/2/209; then, using the UI, I was able to remove the template.

Thanks again,
Jesse

On Wed, Jun 5, 2019 at 4:55 PM Nicolas Vazquez < nicolas.vazq...@shapeblue.com> wrote:

> Hi Jesse,
>
> As the upload was abandoned, it never started, so the workaround will
> be just deleting the entry from CloudStack, as there will be no files on
> secondary storage. To do this, please first execute this query on the database:
>
> update template_store_ref set destroyed = 1 where template_id = 209;
>
> After that, try again to delete the template from CloudStack.
>
> Regards,
>
> Nicolas Vazquez
>
> From: jesse.wat...@gmail.com
> Sent: Wednesday, June 5, 2019 3:47 PM
> To: users@cloudstack.apache.org
> Subject: delete UploadAbandoned template
>
> How do I remove abandoned templates?
>
> While trying to clean up after a couple of failed attempts at importing a VM
> template, its status is "UploadAbandoned".
> When I try to delete the file I get an error, "Failed to delete template".
> Looking at management.log:
> 2019-06-05 14:43:17,508 WARN [c.c.t.HypervisorTemplateAdapter]
> (API-Job-Executor-27:ctx-1bdcfb28 job-402 ctx-9643b0f7) (logid:334038d9)
> Failed to delete the template:
> Tmpl[209-QCOW2-209-2-b1f68c47-8b88-3d46-90fa-753f4e8bcc72 from the image
> store: atlas_nfs due to: Unable to delete file 207 under Template path
> template/tmpl/2/209
>
> And directory 209 does not exist due to the upload being abandoned.
>
> Any suggestions?
>
> TIA,
> Jesse
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue
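The database workaround quoted above can be scripted so the statement is reviewed before being applied. A minimal sketch; the `cloud` database name and user are assumptions matching a default management server install, so adjust for your deployment:

```shell
# Build the cleanup statement Nicolas suggested, parameterised by template id,
# so it can be inspected before piping it into mysql.
build_cleanup_sql() {
    printf 'update template_store_ref set destroyed = 1 where template_id = %s;\n' "$1"
}

build_cleanup_sql 209
# Review the output, then apply it on the management server, e.g.:
#   build_cleanup_sql 209 | mysql -u cloud -p cloud
# and finally retry the template delete from the UI/API.
```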
Re: CPU compatibility issue
To be fair, this fix is not a "fix" - it just allows "subtracting" CPU flags for the vCPU from the base model presented, if the physical CPU doesn't support them. I spent some time yesterday googling - and it seems like a possible kernel/qemu bug which presents itself on just some CPU models.

Adam, I understood you "solved" your issue by not expecting SandyBridge while having Westmere :), right? (You could, however, probably run SandyBridge on Westmere with this patch.)

Cheers

On Thu, 6 Jun 2019 at 04:29, Nicolas Vazquez wrote:

> Hi Adam,
>
> This PR seems to fix your issue:
> https://github.com/apache/cloudstack/pull/3335 and it has been merged
> into master but not into 4.11.x, which means it will be available in the next
> 4.13 release.
>
> Regards,
> Nicolas Vazquez
>
> From: Adam Witwicki
> Sent: Wednesday, June 5, 2019 9:25:14 AM
> To: users@cloudstack.apache.org
> Subject: RE: CPU compatibility issue
>
> It might be because I'm using a Westmere CPU?
>
> Doh
>
> Adam
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue
>
> -----Original Message-----
> From: Adam Witwicki
> Sent: 05 June 2019 13:08
> To: users@cloudstack.apache.org
> Subject: CPU compatibility issue
>
> ** This mail originated from OUTSIDE the Oakford corporate network. Treat
> hyperlinks and attachments in this email with caution.
** > > Hello, > > I am having an issue on a new test system, cloudstack version 4.11.00 > (shapeblue) with KVM (Ubuntu 16.04), In production and in the past I have > used these settings in the agent.properties > > guest.cpu.model=SandyBridge > guest.cpu.mode=custom > > As this was needed to allow windows 2016 to boot and allow live migration > > Yet now on a clean install I get this error when trying to start any > instance (console, router, instance) > > 2019-06-05 13:03:13,282 WARN > [resource.wrapper.LibvirtStartCommandWrapper] (agentRequest-Handler-5:null) > (logid:2df290f8) LibvirtException > org.libvirt.LibvirtException: unsupported configuration: guest and host > CPU are not compatible: Host CPU does not provide required features: avx, > xsave, aes, tsc-deadline, x2apic, pclmuldq > > If I use host-passthrough it works > > This may be a clue > https://github.com/ustcweizhou/cloudstack/commit/16e87a80d85d283d52c3d72001cd94d9fda25278 > but I am not sure. > > > Can anyone explain what is going on? > > Thanks > > Adam > > > > > Disclaimer Notice: > This email has been sent by Oakford Technology Limited, while we have > checked this e-mail and any attachments for viruses, we can not guarantee > that they are virus-free. You must therefore take full responsibility for > virus checking. > This message and any attachments are confidential and should only be read > by those to whom they are addressed. If you are not the intended recipient, > please contact us, delete the message from your computer and destroy any > copies. Any distribution or copying without our prior permission is > prohibited. > Internet communications are not always secure and therefore Oakford > Technology Limited does not accept legal responsibility for this message. > The recipient is responsible for verifying its authenticity before acting > on the contents. Any views or opinions presented are solely those of the > author and do not necessarily represent those of Oakford Technology Limited. 
> Registered address: Oakford Technology Limited, The Manor House, Potterne,
> Wiltshire. SN10 5PN.
> Registered in England and Wales No. 5971519
>
-- Andrija Panić
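As a quick way to see in advance which of the flags from the libvirt error a given host lacks, here is a small sketch. The flag list is taken from the error message above; note that /proc/cpuinfo sometimes spells flags differently from libvirt (e.g. pclmuldq appears as pclmulqdq), so treat the exact names as assumptions to adapt:

```shell
# Report which flags required by a custom guest.cpu.model are missing from a
# host CPU flag list. Pass in the contents of the 'flags' line of /proc/cpuinfo.
check_flags() {
    host_flags="$1"
    for f in avx xsave aes x2apic pclmulqdq; do
        if echo "$host_flags" | grep -qw "$f"; then
            echo "$f: present"
        else
            echo "$f: MISSING"
        fi
    done
}

# Live usage on a KVM host:
#   check_flags "$(grep -m1 '^flags' /proc/cpuinfo)"
# Illustration with a Westmere-like flag set (no AVX/XSAVE):
check_flags "fpu vme sse2 aes pclmulqdq x2apic"
```

A host missing any listed flag cannot present that CPU model without masking flags (the behaviour the patch above enables) or using host-passthrough.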
Re: [DISCUSS] Deprecate older hypervisors in 4.14
On 6/5/19 7:56 PM, Rohit Yadav wrote:
> All,
>
> Based on some conversation on a GitHub issue about moving to Python 3 today, I
> would like to propose a PR, after 4.13 is cut, deprecating the following
> hypervisors in the next major 4.14 release:
>
> - XenServer 6.2, 6.5 and older
> - VMware 5.x
> - KVM/CentOS6/RHEL6 (though we've already voted and agreed to deprecate el6
> packages in 4.14)

I would also opt for Ubuntu 14.04, as that has been EOL since April 2019 and we should/could drop that support as well.

> Note that it was mentioned in recent release notes as well, but I wanted to
> kick off a discussion thread in case our users especially have any objections or
> concerns:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Hypervisor+and+Management+Server+OS+EOL+Dates
>
> Thoughts? Anything we should add, remove? Thanks.
>
> Regards,
> Rohit Yadav
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue
Re: [DISCUSS] Deprecate older hypervisors in 4.14
Sounds like a good idea! People can still stay on an older version of CloudStack if they need to. We can't support older versions forever.

Wido

On 6/5/19 7:56 PM, Rohit Yadav wrote:
> All,
>
> Based on some conversation on a GitHub issue about moving to Python 3 today, I
> would like to propose a PR, after 4.13 is cut, deprecating the following
> hypervisors in the next major 4.14 release:
>
> - XenServer 6.2, 6.5 and older
> - VMware 5.x
> - KVM/CentOS6/RHEL6 (though we've already voted and agreed to deprecate el6
> packages in 4.14)
>
> Note that it was mentioned in recent release notes as well, but I wanted to
> kick off a discussion thread in case our users especially have any objections or
> concerns:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Hypervisor+and+Management+Server+OS+EOL+Dates
>
> Thoughts? Anything we should add, remove? Thanks.
>
> Regards,
> Rohit Yadav
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue
RE: Can't start systemVM in a new advanced zone deployment
Hi Yiping, do you have snapshots enabled on the NetApp filer? (It used to be seen as a ".snapshot" subdirectory in each directory.) If so, try disabling snapshots - there used to be a bug where the .snapshot directory would confuse CloudStack.

paul.an...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue

-----Original Message-----
From: Yiping Zhang
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I was looking at the wrong time periods earlier). I have uploaded more logs into pastebin. From these log entries, it appears that when copying the template to a VM, it tried to open the destination VMDK file and got a "file not found" error.

In the case where CloudStack attempted to create a systemVM, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in the vCenter UI, the destination VMDK file path it is looking for is "//.vmdk", see uploaded log at https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was determined, by CloudStack or by VMware, and how did I end up with this?

Yiping

On 6/5/19, 12:32 PM, "Sergey Levitskiy" wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It is not straightforward to trace it, but VMware will be able to help should you open a case with them.
On 6/5/19, 11:39 AM, "Yiping Zhang" wrote:

Hi, Sergey:

During the time period when I had the problem cloning the template, there are only a few unique entries in vmkernel.log, and they were repeated hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev "naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

The device "naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on this host.

Yiping

On 6/5/19, 11:15 AM, "Sergey Levitskiy" wrote:

This must be specific to that environment. For full clone mode, ACS simply calls the cloneVMTask of the vSphere API, so basically until cloning of that template succeeds when attempted in the vSphere client, it would keep failing in ACS. Can you post vmkernel.log from your ESX host esx-0001-a-001?

On 6/5/19, 8:47 AM, "Yiping Zhang" wrote:

Well, I can always reproduce it in this particular vSphere setup, but in a different ACS+vSphere environment I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic" wrote:

Yiping, if you are sure you can reproduce the issue, it would be good to raise a GitHub issue and provide as much detail as possible.

Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang wrote:
> Hi, Sergey:
>
> Thanks for the tip.
> After setting vmware.create.full.clone=false, I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still exists, and I am just working around it instead of
> fixing it, because in my lab CloudStack instance, with the same versions of
> ACS and vSphere, I still have vmware.create.full.clone=true and all is
> working as expected.
>
> I did some reading in the VMware docs on full clone vs. linked clone.
> It seems that the best practice is to use full clones for production,
> especially if there are high rates of change to the disks. So
> eventually I need to understand and fix the root cause of this issue.
> At least for now, I am over this hurdle and I can move on.
>
> Thanks again,
>
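For anyone applying the same workaround, the global setting can also be flipped from the command line. A sketch using CloudMonkey against the updateConfiguration/listConfigurations APIs; the `cmk` client name and configured profile are assumptions, and some global settings require a management server restart to take effect:

```shell
# Switch vSphere deployments from full clones to linked clones
# (the workaround discussed in this thread), then confirm the value.
cmk update configuration name=vmware.create.full.clone value=false
cmk list configurations name=vmware.create.full.clone filter=name,value
```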