Re: Host's HA failed!

2019-06-06 Thread Nicolas Vazquez
Hi Li,

Can you post the stack trace of the failure from the logs?


Regards,

Nicolas Vazquez


From: li jerry 
Sent: Thursday, June 6, 2019 11:50 AM
To: d...@cloudstack.apache.org; users@cloudstack.apache.org
Subject: Host's HA failed!

Hello All

We are trying to deploy VMs as CloudStack management nodes on top of the compute 
nodes, like a hyper-converged infrastructure, but we have found an issue with HA 
failover.
The test is based on CentOS 7.6 and CloudStack 4.11.2. The primary storage is 
NFS.
We have enabled VM HA and Host HA, and the following global settings are 
enabled for CloudStack:
indirect.agent.lb.algorithm=roundrobin
host=M1IP,M2IP,M3IP

We have 3 management VMs across 3 compute nodes, sharing a MariaDB Galera 
cluster.
M1 is the first management VM, running on compute node H1; it is created 
under KVM and not managed by CloudStack. Likewise, M2 and M3 are the second 
and third management VMs, running on H2 and H3.

The problem is:
If the CloudStack agent of a compute node is connected to the management VM 
running on that same node (e.g., the agent on H1 is connected to M1), then once 
H1 goes down, CloudStack only updates H1's status to Disconnected and never 
triggers AgentStatusCheck, which causes host HA to fail.
If H1 is connected to other management VMs like M2 or M3, this issue won’t 
happen.

So, would you please give us some suggestions? Thanks.


nicolas.vazq...@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 



Re: Can't start systemVM in a new advanced zone deployment

2019-06-06 Thread Sergey Levitskiy
Yes, snapshots are supposed to be in the PS template copy.

On 6/6/19, 9:24 AM, "Yiping Zhang"  wrote:

The NFS volume definitely allows root mount and has RW permissions, as we 
already see the volume mounted and the template staged on primary storage. The 
volume is mounted as an NFSv3 datastore in vSphere.

Volume snapshot is enabled. I can ask to have snapshots disabled to see if 
it makes any difference. I need to find out more about the NFS version and 
qtree mode from our storage admin.

One thing I noticed is that when CloudStack templates are staged onto 
primary storage, a snapshot is created which does not exist in the original 
OVA or on secondary storage. I suppose this is the expected behavior?

Yiping

On 6/6/19, 6:59 AM, "Sergey Levitskiy"  wrote:

This option is 'vol options name_of_volume nosnapdir on'; however, if I 
recall it right, it is supposed to work even with the .snapshot directory visible.
Can you find out all vol options on your netapp volume? I would be most 
concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering if ACS is able to create ROOT-XX folder so you 
might want to watch the content of the DS when ACS tries the operations.
 

On 6/5/19, 11:43 PM, "Paul Angus"  wrote:

Hi Yiping,

do you have snapshots enabled on the NetApp filer?  (it used to be 
seen as a  ".snapshot"  subdirectory in each directory)

If so try disabling snapshots - there used to be a bug where the 
.snapshot directory would confuse CloudStack.

paul.an...@shapeblue.com 

www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 


-Original Message-
From: Yiping Zhang  
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time 
zone, so I was looking at the wrong time periods earlier). I have uploaded more 
logs to pastebin.

From these log entries, it appears that when copying the template to 
the VM, it tried to open the destination VMDK file and got a file-not-found error.

In the case where CloudStack attempted to create a systemVM, the 
destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template 
in the vCenter UI, the destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was 
determined, whether by CloudStack or VMware, and how I ended up with this?

Yiping


On 6/5/19, 12:32 PM, "Sergey Levitskiy"  wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It 
is not straightforward to trace, but VMware will be able to help should you 
open a case with them.


On 6/5/19, 11:39 AM, "Yiping Zhang" 
 wrote:

Hi, Sergey:

During the time period when I had problem cloning template, 
 there are only a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the cpu cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to 
open file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], 
(Existing flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: 
hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 

Re: Can't start systemVM in a new advanced zone deployment

2019-06-06 Thread Yiping Zhang
The NFS volume definitely allows root mount and has RW permissions, as we 
already see the volume mounted and the template staged on primary storage. The 
volume is mounted as an NFSv3 datastore in vSphere.

Volume snapshot is enabled. I can ask to have snapshots disabled to see if it 
makes any difference. I need to find out more about the NFS version and qtree 
mode from our storage admin.

One thing I noticed is that when CloudStack templates are staged onto primary 
storage, a snapshot is created which does not exist in the original OVA or on 
secondary storage. I suppose this is the expected behavior?

Yiping

On 6/6/19, 6:59 AM, "Sergey Levitskiy"  wrote:

This option is 'vol options name_of_volume nosnapdir on'; however, if I 
recall it right, it is supposed to work even with the .snapshot directory visible.
Can you find out all vol options on your netapp volume? I would be most 
concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering if ACS is able to create ROOT-XX folder so you might 
want to watch the content of the DS when ACS tries the operations.
 

On 6/5/19, 11:43 PM, "Paul Angus"  wrote:

Hi Yiping,

do you have snapshots enabled on the NetApp filer?  (it used to be seen 
as a  ".snapshot"  subdirectory in each directory)

If so try disabling snapshots - there used to be a bug where the 
.snapshot directory would confuse CloudStack.

paul.an...@shapeblue.com 

www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 


-Original Message-
From: Yiping Zhang  
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, 
so I was looking at the wrong time periods earlier). I have uploaded more logs 
to pastebin.

From these log entries, it appears that when copying the template to the VM, 
it tried to open the destination VMDK file and got a file-not-found error.

In the case where CloudStack attempted to create a systemVM, the 
destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in 
the vCenter UI, the destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was 
determined, whether by CloudStack or VMware, and how I ended up with this?

Yiping


On 6/5/19, 12:32 PM, "Sergey Levitskiy"  wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It is 
not straightforward to trace, but VMware will be able to help should you open a 
case with them.


On 6/5/19, 11:39 AM, "Yiping Zhang"  
wrote:

Hi, Sergey:

During the time period when I had problem cloning template,  
there are only a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the cpu cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open 
file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing 
flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: 
hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 
00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 
0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: 
Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev 
"naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense 
data: 0x5 0x20 0x0.

The device " 

Host's HA failed!

2019-06-06 Thread li jerry
Hello All

We are trying to deploy VMs as CloudStack management nodes on top of the compute 
nodes, like a hyper-converged infrastructure, but we have found an issue with HA 
failover.
The test is based on CentOS 7.6 and CloudStack 4.11.2. The primary storage is 
NFS.
We have enabled VM HA and Host HA, and the following global settings are 
enabled for CloudStack:
indirect.agent.lb.algorithm=roundrobin
host=M1IP,M2IP,M3IP
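For illustration, the round-robin agent LB selection implied by these settings could be sketched as follows. This is only a toy model, not CloudStack's actual implementation; it just shows how, with three agents and three management servers, each agent can end up connected to the management VM hosted on its own node:

```python
# Toy sketch of round-robin indirect agent LB (illustrative only, not
# CloudStack's real code): each agent sees the management-server list
# rotated by its own index, so the agent on H1 prefers M1IP.
def roundrobin_order(ms_list, agent_index):
    start = agent_index % len(ms_list)
    return ms_list[start:] + ms_list[:start]

hosts = ["M1IP", "M2IP", "M3IP"]
for i, agent in enumerate(["H1", "H2", "H3"]):
    preferred = roundrobin_order(hosts, i)[0]
    print(agent, "->", preferred)  # H1 -> M1IP, H2 -> M2IP, H3 -> M3IP
```

In this toy model every agent lands on the co-located management VM, which is exactly the failing case described in the problem below.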

We have 3 management VMs across 3 compute nodes, sharing a MariaDB Galera 
cluster.
M1 is the first management VM, running on compute node H1; it is created 
under KVM and not managed by CloudStack. Likewise, M2 and M3 are the second 
and third management VMs, running on H2 and H3.

The problem is:
If the CloudStack agent of a compute node is connected to the management VM 
running on that same node (e.g., the agent on H1 is connected to M1), then once 
H1 goes down, CloudStack only updates H1's status to Disconnected and never 
triggers AgentStatusCheck, which causes host HA to fail.
If H1 is connected to other management VMs like M2 or M3, this issue won’t 
happen.

So, would you please give us some suggestions? Thanks.



Re: Can't start systemVM in a new advanced zone deployment

2019-06-06 Thread Sergey Levitskiy
This option is 'vol options name_of_volume nosnapdir on'; however, if I recall it 
right, it is supposed to work even with the .snapshot directory visible.
Can you find out all vol options on your netapp volume? I would be most 
concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering if ACS is able to create ROOT-XX folder so you might want 
to watch the content of the DS when ACS tries the operations.
 

On 6/5/19, 11:43 PM, "Paul Angus"  wrote:

Hi Yiping,

do you have snapshots enabled on the NetApp filer?  (it used to be seen as 
a  ".snapshot"  subdirectory in each directory)

If so try disabling snapshots - there used to be a bug where the .snapshot 
directory would confuse CloudStack.

paul.an...@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 


-Original Message-
From: Yiping Zhang  
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so 
I was looking at the wrong time periods earlier). I have uploaded more logs to 
pastebin.

From these log entries, it appears that when copying the template to the VM, it 
tried to open the destination VMDK file and got a file-not-found error.

In the case where CloudStack attempted to create a systemVM, the 
destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in 
the vCenter UI, the destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was determined, 
whether by CloudStack or VMware, and how I ended up with this?

Yiping


On 6/5/19, 12:32 PM, "Sergey Levitskiy"  wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It is not 
straightforward to trace, but VMware will be able to help should you open a case 
with them.


On 6/5/19, 11:39 AM, "Yiping Zhang"  wrote:

Hi, Sergey:

During the time period when I had problem cloning template,  there 
are only a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the cpu cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open 
file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing 
flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: 
hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 
00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 
0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: 
Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev 
"naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense 
data: 0x5 0x20 0x0.

The device " naa.600508b1001c6d77d7dd6a0cc0953df1" is the local 
disk on this host.
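For what it's worth, the sense triplet in that ScsiDeviceIO line can be decoded with the standard SCSI code tables (my reading, worth double-checking): opcode 0x85 is ATA PASS-THROUGH(16), so this looks like health/SMART polling against the local HP controller being rejected, likely unrelated noise rather than the datastore problem.

```python
# Decode the sense data "0x5 0x20 0x0" from the log line above using
# the standard SCSI sense-key and ASC/ASCQ tables (relevant subset only).
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x20, 0x0): "INVALID COMMAND OPERATION CODE"}

key, asc, ascq = 0x5, 0x20, 0x0
print(SENSE_KEYS[key], "-", ASC_ASCQ[(asc, ascq)])
# i.e., the device rejected opcode 0x85 (ATA PASS-THROUGH(16)) as unsupported.
```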

Yiping


On 6/5/19, 11:15 AM, "Sergey Levitskiy"  wrote:

This must be specific to that environment. For full clone 
mode, ACS simply calls the cloneVMTask vSphere API, so basically until cloning of 
that template succeeds when attempted in the vSphere client, it would keep failing 
in ACS. Can you post vmkernel.log from your ESX host esx-0001-a-001?


On 6/5/19, 8:47 AM, "Yiping Zhang"  
wrote:

Well,  I can always reproduce it in this particular vSphere 
set up,  but in a different ACS+vSphere environment,  I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic" 
 wrote:

Yiping,

if you are sure you can reproduce the issue, it would 
be good to raise a
GitHub issue and provide as much detail as possible.

Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang 

wrote:

> Hi, Sergey:
>
> Thanks for the tip. After setting vmware.create.full.clone=false,  I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still 

Re: delete UploadAbandoned template

2019-06-06 Thread jesse . waters
I did not try your fix, but I will be trying to import the image again soon and
will see. My fix was to create the directory template/tmpl/2/209; then, using the
UI, I was able to remove the template.

Thanks again,

 Jesse

On Wed, Jun 5, 2019 at 4:55 PM Nicolas Vazquez <
nicolas.vazq...@shapeblue.com> wrote:

> Hi Jesse,
>
> As the upload was abandoned, it never started, so the workaround is simply
> to delete the entry from CloudStack, as there will be no files on
> secondary storage. To do this, please first execute this query on the database:
>
> update template_store_ref set destroyed = 1 where template_id = 209;
>
> After that, try again to delete the template from CloudStack.
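To illustrate the effect of that workaround, here is a stand-in using an in-memory SQLite table. The column names are taken from the query above; this is not the real CloudStack schema, and on a live system you would run the UPDATE against the MySQL cloud database instead:

```python
import sqlite3

# Minimal stand-in for template_store_ref with just the two columns
# the workaround touches (a hypothetical simplification).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE template_store_ref (template_id INTEGER, destroyed INTEGER)")
conn.execute("INSERT INTO template_store_ref VALUES (209, 0)")

# The workaround from the reply above: mark the abandoned entry destroyed.
conn.execute("UPDATE template_store_ref SET destroyed = 1 WHERE template_id = 209")

row = conn.execute(
    "SELECT destroyed FROM template_store_ref WHERE template_id = 209"
).fetchone()
print(row[0])  # 1 -> the entry is now marked destroyed
```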
>
>
> Regards,
>
> Nicolas Vazquez
>
> 
> From: jesse.wat...@gmail.com 
> Sent: Wednesday, June 5, 2019 3:47 PM
> To: users@cloudstack.apache.org
> Subject: delete UploadAbandoned template
>
> How do I remove abandoned templates?
>
> While trying to clean up after couple failed attempts of importing a vm
> template status is "UploadAbandoned"
> When I try to delete file I get an error , "Failed to delete template"
> Looking at management.log
> 2019-06-05 14:43:17,508 WARN  [c.c.t.HypervisorTemplateAdapter]
> (API-Job-Executor-27:ctx-1bdcfb28 job-402 ctx-9643b0f7) (logid:334038d9)
> Failed to delete the template:
> Tmpl[209-QCOW2-209-2-b1f68c47-8b88-3d46-90fa-753f4e8bcc72 from the image
> store: atlas_nfs due to: Unable to delete file 207 under Template path
> template/tmpl/2/209
>
> And the directory 209 does not exist because the upload was abandoned.
>
> Any suggestions?
>
> TIA,
>   Jesse
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>


Re: CPU compatibility issue

2019-06-06 Thread Andrija Panic
To be fair, this fix is not a "fix" - it just allows "subtracting" CPU
flags from the base vCPU model presented, if the physical CPU
doesn't support them.

I spent some time yesterday googling - and it seems like a possible
kernel/qemu bug that presents on just some CPU models.

Adam, I understood you "solved" your issue by not expecting SandyBridge
while having Westmere :) , right?

(You could, however, probably run a SandyBridge vCPU model on Westmere with this patch)

Cheers
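A minimal sketch of what "subtracting" flags means here (illustrative only; the feature names come from the libvirt error later in this thread, and the host feature set is hypothetical, not a real Westmere flag list):

```python
# Features libvirt said the host does not provide (from the error message):
required = {"avx", "xsave", "aes", "tsc-deadline", "x2apic", "pclmuldq"}

# Hypothetical feature set of an older host CPU (not an exact Westmere list):
host_features = {"sse2", "ssse3", "sse4.2", "popcnt"}

missing = required - host_features   # what a strict custom model trips over
usable = required & host_features    # what a "subtracting" patch would keep
print(sorted(missing))
```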

On Thu, 6 Jun 2019 at 04:29, Nicolas Vazquez 
wrote:

> Hi Adam,
>
> This PR seems to fix your issue:
> https://github.com/apache/cloudstack/pull/3335 and it has been merged
> into master but not on 4.11.x, which means it will be available on the next
> 4.13 release.
>
> Regards,
> Nicolas Vazquez
> 
> From: Adam Witwicki 
> Sent: Wednesday, June 5, 2019 9:25:14 AM
> To: users@cloudstack.apache.org
> Subject: RE: CPU compatibility issue
>
> It might be because I'm using a Westmere CPU?
>
>
> Doh
>
> Adam
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Adam Witwicki 
> Sent: 05 June 2019 13:08
> To: users@cloudstack.apache.org
> Subject: CPU compatibility issue
>
> ** This mail originated from OUTSIDE the Oakford corporate network. Treat
> hyperlinks and attachments in this email with caution. **
>
> Hello,
>
> I am having an issue on a new test system, cloudstack version 4.11.00
> (shapeblue) with KVM (Ubuntu 16.04), In production and in the past I have
> used these settings in the agent.properties
>
> guest.cpu.model=SandyBridge
> guest.cpu.mode=custom
>
> As this was needed to allow Windows 2016 to boot and to allow live migration
>
> Yet now on a clean install I get this error when trying to start any
> instance (console, router, instance)
>
> 2019-06-05 13:03:13,282 WARN
> [resource.wrapper.LibvirtStartCommandWrapper] (agentRequest-Handler-5:null)
> (logid:2df290f8) LibvirtException
> org.libvirt.LibvirtException: unsupported configuration: guest and host
> CPU are not compatible: Host CPU does not provide required features: avx,
> xsave, aes, tsc-deadline, x2apic, pclmuldq
>
> If I use host-passthrough it works
>
> This may be a clue
> https://github.com/ustcweizhou/cloudstack/commit/16e87a80d85d283d52c3d72001cd94d9fda25278
> but I am not sure.
>
>
> Can anyone explain what is going on?
>
> Thanks
>
> Adam
>
>
>
>
> Disclaimer Notice:
> This email has been sent by Oakford Technology Limited, while we have
> checked this e-mail and any attachments for viruses, we can not guarantee
> that they are virus-free. You must therefore take full responsibility for
> virus checking.
> This message and any attachments are confidential and should only be read
> by those to whom they are addressed. If you are not the intended recipient,
> please contact us, delete the message from your computer and destroy any
> copies. Any distribution or copying without our prior permission is
> prohibited.
> Internet communications are not always secure and therefore Oakford
> Technology Limited does not accept legal responsibility for this message.
> The recipient is responsible for verifying its authenticity before acting
> on the contents. Any views or opinions presented are solely those of the
> author and do not necessarily represent those of Oakford Technology Limited.
> Registered address: Oakford Technology Limited, The Manor House, Potterne,
> Wiltshire. SN10 5PN.
> Registered in England and Wales No. 5971519
>
>
>

-- 

Andrija Panić


Re: [DISCUSS] Deprecate older hypervisors in 4.14

2019-06-06 Thread Wido den Hollander



On 6/5/19 7:56 PM, Rohit Yadav wrote:
> All,
> 
> Based on some conversation on a GitHub issue about moving to Python3 today, I 
> would like to propose a PR, after 4.13 is cut, to deprecate the following 
> hypervisors in the next major 4.14 release:
> 
> - XenServer 6.2, 6.5 and older
> - VMware 5.x
> - KVM/CentOS6/RHEL6 (though we've already voted and agreed to deprecate el6 
> packages in 4.14)
> 

I would also opt for Ubuntu 14.04, as that has been EOL since April 2019, and we
should/could drop that support as well.

> Note that it was mentioned in recent release notes as well, but I wanted to 
> kick off a discussion thread if esp. our users have any objections or 
> concerns: 
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Hypervisor+and+Management+Server+OS+EOL+Dates
> 
> Thoughts? Anything we should add, remove? Thanks.
> 
> Regards,
> Rohit Yadav
> 
> 
> rohit.ya...@shapeblue.com 
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>   
>  
> 


Re: [DISCUSS] Deprecate older hypervisors in 4.14

2019-06-06 Thread Wido den Hollander
Sounds like a good idea!

People can still stay on an older version of CS if needed.

We can't support older versions forever.

Wido

On 6/5/19 7:56 PM, Rohit Yadav wrote:
> All,
> 
> Based on some conversation on a GitHub issue about moving to Python3 today, I 
> would like to propose a PR, after 4.13 is cut, to deprecate the following 
> hypervisors in the next major 4.14 release:
> 
> - XenServer 6.2, 6.5 and older
> - VMware 5.x
> - KVM/CentOS6/RHEL6 (though we've already voted and agreed to deprecate el6 
> packages in 4.14)
> 
> Note that it was mentioned in recent release notes as well, but I wanted to 
> kick off a discussion thread if esp. our users have any objections or 
> concerns: 
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Hypervisor+and+Management+Server+OS+EOL+Dates
> 
> Thoughts? Anything we should add, remove? Thanks.
> 
> Regards,
> Rohit Yadav
> 
> 
> rohit.ya...@shapeblue.com 
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>   
>  
> 


RE: Can't start systemVM in a new advanced zone deployment

2019-06-06 Thread Paul Angus
Hi Yiping,

do you have snapshots enabled on the NetApp filer?  (it used to be seen as a  
".snapshot"  subdirectory in each directory)

If so try disabling snapshots - there used to be a bug where the .snapshot 
directory would confuse CloudStack.

paul.an...@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 


-Original Message-
From: Yiping Zhang  
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I 
was looking at the wrong time periods earlier). I have uploaded more logs to 
pastebin.

From these log entries, it appears that when copying the template to the VM, it tried 
to open the destination VMDK file and got a file-not-found error.

In the case where CloudStack attempted to create a systemVM, the destination 
VMDK file path it is looking for is "//.vmdk", 
see uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in the 
vCenter UI, the destination VMDK file path it is looking for is 
"//.vmdk", see uploaded log at 
https://pastebin.com/yHcsD8xB

So, I am confused as to how the path for the destination VMDK was determined, 
whether by CloudStack or VMware, and how I ended up with this?

Yiping


On 6/5/19, 12:32 PM, "Sergey Levitskiy"  wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It is not 
straightforward to trace, but VMware will be able to help should you open a case 
with them.


On 6/5/19, 11:39 AM, "Yiping Zhang"  wrote:

Hi, Sergey:

During the time period when I had problem cloning template,  there are 
only a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the cpu cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 
'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 
0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: 
Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 
, CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: 
Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev 
"naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense 
data: 0x5 0x20 0x0.

The device " naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on 
this host.

Yiping


On 6/5/19, 11:15 AM, "Sergey Levitskiy"  wrote:

This must be specific to that environment. For full clone mode, 
ACS simply calls the cloneVMTask vSphere API, so basically until cloning of that 
template succeeds when attempted in the vSphere client, it would keep failing in 
ACS. Can you post vmkernel.log from your ESX host esx-0001-a-001?


On 6/5/19, 8:47 AM, "Yiping Zhang"  
wrote:

Well,  I can always reproduce it in this particular vSphere set 
up,  but in a different ACS+vSphere environment,  I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic"  
wrote:

Yiping,

if you are sure you can reproduce the issue, it would be 
good to raise a
GitHub issue and provide as much detail as possible.

Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang 

wrote:

> Hi, Sergey:
>
> Thanks for the tip. After setting vmware.create.full.clone=false,  I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still exists, and I am just working around it instead of
> fixing it, because in my lab CloudStack instance with the same version of
> ACS and vSphere, I still have vmware.create.full.clone=true and all is
> working as expected.
>
> I did some reading on VMware docs regarding full clone vs. linked clone.
> It seems that the best practice is to use full clone for production,
> especially if there are high rates of changes to the disks. So
> eventually, I need to understand and fix the root cause for this issue.
> At least for now, I am over this hurdle and I can move on.
>
> Thanks again,
>