Well, I can always reproduce it in this particular vSphere setup, but in a different ACS + vSphere environment I don't see this problem.
Yiping

On 6/5/19, 1:00 AM, "Andrija Panic" <andrija.pa...@gmail.com> wrote:

Yiping, if you are sure you can reproduce the issue, it would be good to raise a GitHub issue and provide as much detail as possible.

Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang <yipzh...@adobe.com.invalid> wrote:

Hi, Sergey:

Thanks for the tip. After setting vmware.create.full.clone=false, I was able to create and start system VM instances. However, I feel that the underlying problem still exists and that I am only working around it rather than fixing it, because in my lab CloudStack instance, with the same versions of ACS and vSphere, I still have vmware.create.full.clone=true and everything works as expected.

I did some reading in the VMware docs on full clones vs. linked clones. It seems the best practice is to use full clones for production, especially if there is a high rate of change on the disks. So eventually I need to understand and fix the root cause of this issue. At least for now I am over this hurdle and can move on.

Thanks again,

Yiping

On 6/4/19, 11:13 AM, "Sergey Levitskiy" <serg...@hotmail.com> wrote:

Everything looks good and consistent, including all references in the VMDK and its snapshot. I would try these 2 routes:

1. Figure out what the vSphere error actually means from the vmkernel log of the ESX host when ACS tries to clone the template. If the same error happens when doing it outside of ACS, then a support case with VMware can be an option.

2. Try using linked clones. This can be done with this global setting and a restart of the management server:

vmware.create.full.clone = false
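A minimal version of both routes, assuming shell access to the ESXi host and a CloudMonkey client pointed at the management server (the grep pattern is simply the template ID from the failing clone; the setting change can equally be made in the UI under Global Settings):

# Route 1: look for the underlying vSphere error on the ESXi host,
# logged around the time ACS issues the CopyCommand
grep 533b6fcf3fa6301aadcc2b168f3f999a /var/log/vmkernel.log /var/log/hostd.log

# Route 2: switch ACS to linked clones, then restart the management server
cloudmonkey update configuration name=vmware.create.full.clone value=false
systemctl restart cloudstack-management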
On 6/4/19, 9:57 AM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> wrote:

Hi, Sergey:

Thanks for the help. By now I have dropped and recreated the DB, re-deployed this zone multiple times, blown away primary and secondary storage (including all contents on them), or just deleted the template itself from primary storage, multiple times. Every time I ended up with the same error at the same place.

The full management server log, from the point I seeded the systemvmtemplate for VMware, through deploying a new advanced zone, enabling the zone so CS creates the system VMs, and finally disabling the zone to stop the infinite loop of recreating the failed system VMs, is posted at pastebin:

https://pastebin.com/c05wiQ3R

Here is the content of the relevant files for the template on primary storage:

1) /vmfs/volumes:

ls -l /vmfs/volumes/
total 2052
drwxr-xr-x 1 root root    8 Jan  1  1970 414f6a73-87cd6dac-9585-133ddd409762
lrwxr-xr-x 1 root root   17 Jun  4 16:37 42054b8459633172be231d72a52d59d4 -> afc5e946-03bfe3c2   <== this is the NFS datastore for primary storage
drwxr-xr-x 1 root root    8 Jan  1  1970 5cd4b46b-fa4fcff0-d2a1-00215a9b31c0
drwxr-xr-t 1 root root 1400 Jun  3 22:50 5cd4b471-c2318b91-8fb2-00215a9b31c0
drwxr-xr-x 1 root root    8 Jan  1  1970 5cd4b471-da49a95b-bdb6-00215a9b31c0
drwxr-xr-x 4 root root 4096 Jun  3 23:38 afc5e946-03bfe3c2
drwxr-xr-x 1 root root    8 Jan  1  1970 b70c377c-54a9d28a-6a7b-3f462a475f73

2) content of the template dir on primary storage:

ls -l /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/
total 1154596
-rw------- 1 root root       8192 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-000001-delta.vmdk
-rw------- 1 root root        366 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
-rw-r--r-- 1 root root        268 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog
-rw------- 1 root root       9711 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
-rw------- 1 root root 2097152000 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk
-rw------- 1 root root        518 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmdk
-rw-r--r-- 1 root root        471 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmsd
-rwxr-xr-x 1 root root       1402 Jun  3 23:38 533b6fcf3fa6301aadcc2b168f3f999a.vmtx

3) *.vmdk file content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=ecb01275
parentCID=ffffffff
isNativeSnapshot="no"
createType="vmfs"

# Extent description
RW 4096000 VMFS "533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk"

# The Disk Data Base
#DDB

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "4063"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.longContentID = "1c60ba48999abde959998f05ecb01275"
ddb.thinProvisioned = "1"
ddb.uuid = "60 00 C2 9b 52 6d 98 c4-1f 44 51 ce 1e 70 a9 70"
ddb.virtualHWVersion = "13"

4) *-000001.vmdk content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=ecb01275
parentCID=ecb01275
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
# Extent description
RW 4096000 VMFSSPARSE "533b6fcf3fa6301aadcc2b168f3f999a-000001-delta.vmdk"

# The Disk Data Base
#DDB

ddb.longContentID = "1c60ba48999abde959998f05ecb01275"
5) *.vmtx content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmtx
.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "8"
nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
hpet0.present = "TRUE"
floppy0.present = "FALSE"
memSize = "256"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
ide0:0.startConnected = "FALSE"
ide0:0.deviceType = "atapi-cdrom"
ide0:0.fileName = "CD/DVD drive 0"
ide0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk"
scsi0:0.present = "TRUE"
displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
annotation = "systemvmtemplate-4.11.2.0-vmware"
guestOS = "otherlinux-64"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
firmware = "bios"
migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"

6) *.vmsd file content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmsd
.encoding = "UTF-8"
snapshot.lastUID = "1"
snapshot.current = "1"
snapshot0.uid = "1"
snapshot0.filename = "533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn"
snapshot0.displayName = "cloud.template.base"
snapshot0.description = "Base snapshot"
snapshot0.createTimeHigh = "363123"
snapshot0.createTimeLow = "-679076964"
snapshot0.numDisks = "1"
snapshot0.disk0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
snapshot0.disk0.node = "scsi0:0"
snapshot.numSnapshots = "1"

7) *-Snapshot1.vmsn content:

cat /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
ҾSnapshot\?%?cfgFilet%t%.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "8"
nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
hpet0.present = "TRUE"
floppy0.present = "FALSE"
memSize = "256"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
ide0:0.startConnected = "FALSE"
ide0:0.deviceType = "atapi-cdrom"
ide0:0.fileName = "CD/DVD drive 0"
ide0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
scsi0:0.present = "TRUE"
displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
annotation = "systemvmtemplate-4.11.2.0-vmware"
guestOS = "otherlinux-64"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
firmware = "bios"
migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"
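For completeness, the chain described by 3) and 4) can be sanity-checked from the ESXi shell with standard tools only; the paths are the ones shown above, and clone-test.vmdk is just a throwaway name used for this sketch:

cd /vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a

# the child descriptor's parentCID must match the base descriptor's CID,
# and parentFileNameHint must point at the base descriptor
grep -E 'CID|parentFileNameHint' 533b6fcf3fa6301aadcc2b168f3f999a.vmdk 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk

# have vSphere itself walk the whole chain by collapsing it into a throwaway
# copy; if this fails, the real error should appear in /var/log/vmkernel.log
vmkfstools -i 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk /vmfs/volumes/afc5e946-03bfe3c2/clone-test.vmdk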
guestOS = "otherlinux-64" > toolScripts.afterPowerOn = "TRUE" > toolScripts.afterResume = "TRUE" > toolScripts.beforeSuspend = "TRUE" > toolScripts.beforePowerOff = "TRUE" > uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61" > vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3" > firmware = "bios" > migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog" > > > ------------ > > That's all the data on the template VMDK. > > Much appreciate your time! > > Yiping > > > > On 6/4/19, 9:29 AM, "Sergey Levitskiy" <serg...@hotmail.com> > wrote: > > Have you tried deleting template from PS and let ACS to recopy > it again? If the issue is reproducible we can try to look what is wrong > with VMDK. Please post content of 533b6fcf3fa6301aadcc2b168f3f999a.vmdk , > 533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk and > 533b6fcf3fa6301aadcc2b168f3f999a.vmx (their equitant after ACS finishes > copying template). Also from one of your ESX hosts output of this > ls -al /vmfs/volumes > ls -al /vmfs/volumes/*/533b6fcf3fa6301aadcc2b168f3f999a (their > equitant after ACS finishes copying template) > > Can you also post management server log starting from the > point you unregister and delete template from the vCenter. > > On 6/4/19, 8:37 AM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> > wrote: > > I have manually imported the OVA to vCenter and > successfully cloned a VM instance with it, on the same NFS datastore. > > > On 6/4/19, 8:25 AM, "Sergey Levitskiy" < > serg...@hotmail.com> wrote: > > I would suspect the template is corrupted on the > secondary storage. You can try disabling/enabling link clone feature and > see if it works the other way. > vmware.create.full.clone false > > Also systemVM template might have been generated on a > newer version of vSphere and not compatible with ESXi 6.5. What you can do > to validate this is to manually deploy OVA that is in Secondary storage and > try to spin up VM from it directly in vCenter. > > > > On 6/3/19, 5:41 PM, "Yiping Zhang" > <yipzh...@adobe.com.INVALID> wrote: > > Hi, list: > > I am struggling with deploying a new advanced zone > using ACS 4.11.2.0 + vSphere 6.5 + NetApp volumes for primary and secondary > storage devices. The initial setup of CS management server, seeding of > systemVM template, and advanced zone deployment all went smoothly. > > Once I enabled the zone in web UI and the systemVM > template gets copied/staged on to primary storage device. 
On 6/3/19, 5:41 PM, "Yiping Zhang" <yipzh...@adobe.com.INVALID> wrote:

Hi, list:

I am struggling with deploying a new advanced zone using ACS 4.11.2.0 + vSphere 6.5 + NetApp volumes for primary and secondary storage devices. The initial setup of the CS management server, seeding of the systemVM template, and the advanced zone deployment all went smoothly.

Once I enabled the zone in the web UI, the systemVM template got copied/staged onto the primary storage device, but subsequent VM creations from this template would fail with errors:

2019-06-03 18:38:15,764 INFO [c.c.h.v.m.HostMO] (DirectAgent-7:ctx-d01169cb esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VM 533b6fcf3fa6301aadcc2b168f3f999a not found in host cache
2019-06-03 18:38:17,017 INFO [c.c.h.v.r.VmwareResource] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VmwareStorageProcessor and VmwareStorageSubsystemCommandHandler successfully reconfigured
2019-06-03 18:38:17,128 INFO [c.c.s.r.VmwareStorageProcessor] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) creating full clone from template
2019-06-03 18:38:17,657 INFO [c.c.h.v.u.VmwareHelper] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) [ignored]failed toi get message for exception: Error caused by file /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk
2019-06-03 18:38:17,658 ERROR [c.c.s.r.VmwareStorageProcessor] (DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) clone volume from base image failed due to Exception: java.lang.RuntimeException
Message: Error caused by file /vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-000001.vmdk

If I try to create a “new VM from template” (533b6fcf3fa6301aadcc2b168f3f999a) in the vCenter UI manually, I receive exactly the same error message. The name of the VMDK file in the error message is a snapshot of the base disk image, but it is not part of the original template OVA on the secondary storage. So, in the process of copying the template from secondary to primary storage, a snapshot got created and the disk became corrupted/unusable.

Much later in the log file, there is another error message, “failed to fetch any free public IP address” (for the SSVM, I think). I don't know if these two errors are related or if one is the root cause of the other.

The full management server log is uploaded at https://pastebin.com/c05wiQ3R

Any help or insight on what went wrong here is much appreciated.

Thanks,

Yiping

--
Andrija Panić