[jira] [Updated] (CLOUDSTACK-10397) Transient NFS access issues should not result in duplicate VMs or KVM hosts resets

2018-10-24 Thread Jean-Francois Nadeau (JIRA)


 [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Francois Nadeau updated CLOUDSTACK-10397:
--
Description: 
Under CentOS 7.x with KVM and NFS as primary storage, we expect to tolerate 
and recover from temporary disconnections from primary storage. We simulate 
this from the KVM host with iptables, using a DROP rule in the INPUT and OUTPUT 
chains for the NFS server's IP.

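The outage simulation described above can be reproduced along these lines. This is a minimal Python sketch, assuming root access on the KVM host and a single IPv4 NFS server. It only builds the `iptables` argument lists: one DROP rule in INPUT for packets from the server and one in OUTPUT for packets to it, with `-I` to start the outage and `-D` to end it. The helper name `nfs_outage_rules` and the address 192.0.2.10 are hypothetical. The report does not say which match options were used, so this version drops all traffic to and from that IP rather than only NFS ports.

```python
import ipaddress
from typing import List


def nfs_outage_rules(nfs_server_ip: str, action: str = "-I") -> List[List[str]]:
    """Build iptables commands that cut a KVM host off from one NFS server.

    action="-I" inserts the DROP rules (start of the simulated outage);
    action="-D" deletes the same rules (end of the outage).
    """
    # Validate the address; raises ValueError (AddressValueError) on bad input.
    ip = str(ipaddress.IPv4Address(nfs_server_ip))
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' (insert) or '-D' (delete)")
    return [
        # Drop everything the NFS server sends to this host.
        ["iptables", action, "INPUT", "-s", ip, "-j", "DROP"],
        # Drop everything this host sends to the NFS server.
        ["iptables", action, "OUTPUT", "-d", ip, "-j", "DROP"],
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address standing in for the NFS server.
    for cmd in nfs_outage_rules("192.0.2.10"):
        print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)`. Running the `-D` variant after more than 5 minutes reproduces the "transient issue goes away" case. Keeping command construction separate from execution lets you review the rules before changing a host's firewall.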
 

The observation under 4.11.2 is that an NFS disconnection lasting more than 5 
minutes has the following effects:

With VM HA enabled and host HA disabled: the CloudStack agent will often block 
while refreshing primary storage and go into the Down state from the 
controller's perspective. The controller will then restart the VMs on other 
hosts, creating duplicate VMs on the network and possibly corrupting VM root 
disks if the transient issue goes away while the first KVM host is still active.

With VM HA enabled and host HA enabled: the same agent issue can cause the agent 
to block and end up in either the Disconnected or Down state. The host HA 
framework will then reset the KVM hosts after kvm.ha.degraded.max.period. The 
problem here is that, yes, host HA does ensure we don't get duplicate VMs, but 
at scale it would also provoke a lot of KVM host resets (if not all of them).

On 4.9.3, the CloudStack agent simply "hangs" and the controller does not see 
the KVM host as down (for at least 60 minutes). When the network issue 
blocking NFS access is resolved, all KVM hosts and VMs just resume working, 
with no large-scale fencing.

The same resilience is expected on 4.11.x. This is a blocker for an upgrade 
from 4.9, considering we are more at risk on 4.11 with VM HA enabled, 
regardless of whether host HA is enabled.

  was:
Under CentOS 7.x with KVM and NFS as primary storage, we expect to tolerate 
and recover from temporary disconnections from primary storage. We simulate 
this from the KVM host with iptables, using a DROP rule in the INPUT and OUTPUT 
chains for the NFS server's IP.

The observation under 4.11.2 is that an NFS disconnection lasting more than 5 
minutes has the following effects:

With VM HA enabled and host HA disabled: the CloudStack agent will often block 
while refreshing primary storage and go into the Down state from the 
controller's perspective. The controller will then restart the VMs on other 
hosts, creating duplicate VMs on the network and possibly corrupting VM root 
disks if the transient issue goes away.

With VM HA enabled and host HA disabled: the same agent issue can cause the 
agent to block and end up in either the Disconnected or Down state. The host HA 
framework will then reset the KVM hosts after kvm.ha.degraded.max.period. The 
problem here is that, yes, host HA does ensure we don't get duplicate VMs, but 
at scale it would also provoke a lot of KVM host resets (if not all of them).

On 4.9.3, the CloudStack agent simply "hangs" and the controller does not see 
the KVM host as down (for at least 60 minutes). When the network issue 
blocking NFS access is resolved, all KVM hosts and VMs just resume working, 
with no large-scale fencing.

The same resilience is expected on 4.11.x. This is a blocker for an upgrade 
from 4.9, considering we are more at risk on 4.11 with VM HA enabled, 
regardless of whether host HA is enabled.


> Transient NFS access issues should not result in duplicate VMs or KVM hosts 
> resets
> --
>
> Key: CLOUDSTACK-10397
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10397
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: cloudstack-agent, Hypervisor Controller
>Affects Versions: 4.11.1.1
>Reporter: Jean-Francois Nadeau
>Priority: Blocker
>

[jira] [Created] (CLOUDSTACK-10397) Transient NFS access issues should not result in duplicate VMs or KVM hosts resets

2018-10-24 Thread Jean-Francois Nadeau (JIRA)
Jean-Francois Nadeau created CLOUDSTACK-10397:
-

 Summary: Transient NFS access issues should not result in 
duplicate VMs or KVM hosts resets
 Key: CLOUDSTACK-10397
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10397
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: cloudstack-agent, Hypervisor Controller
Affects Versions: 4.11.1.1
Reporter: Jean-Francois Nadeau


Under CentOS 7.x with KVM and NFS as primary storage, we expect to tolerate 
and recover from temporary disconnections from primary storage. We simulate 
this from the KVM host with iptables, using a DROP rule in the INPUT and OUTPUT 
chains for the NFS server's IP.

The observation under 4.11.2 is that an NFS disconnection lasting more than 5 
minutes has the following effects:

With VM HA enabled and host HA disabled: the CloudStack agent will often block 
while refreshing primary storage and go into the Down state from the 
controller's perspective. The controller will then restart the VMs on other 
hosts, creating duplicate VMs on the network and possibly corrupting VM root 
disks if the transient issue goes away.

With VM HA enabled and host HA disabled: the same agent issue can cause the 
agent to block and end up in either the Disconnected or Down state. The host HA 
framework will then reset the KVM hosts after kvm.ha.degraded.max.period. The 
problem here is that, yes, host HA does ensure we don't get duplicate VMs, but 
at scale it would also provoke a lot of KVM host resets (if not all of them).

On 4.9.3, the CloudStack agent simply "hangs" and the controller does not see 
the KVM host as down (for at least 60 minutes). When the network issue 
blocking NFS access is resolved, all KVM hosts and VMs just resume working, 
with no large-scale fencing.

The same resilience is expected on 4.11.x. This is a blocker for an upgrade 
from 4.9, considering we are more at risk on 4.11 with VM HA enabled, 
regardless of whether host HA is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CLOUDSTACK-10239) User LDAP authentication not working in UI (but works via API)

2018-01-19 Thread Jean-Francois Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332662#comment-16332662
 ] 

Jean-Francois Nadeau commented on CLOUDSTACK-10239:
---

Confirmed this is a regression from 4.9: I reinstalled the controller (on 4.9) 
and can log in with LDAP credentials from the UI.

> User LDAP authentication not working in UI (but works via API)
> --
>
> Key: CLOUDSTACK-10239
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10239
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Management Server
>Affects Versions: 4.11.0.0
> Environment: CentOS 7, KVM, MSAD
>Reporter: Jean-Francois Nadeau
>Priority: Major
>





[jira] [Created] (CLOUDSTACK-10239) User LDAP authentication not working in UI (but works via API)

2018-01-18 Thread Jean-Francois Nadeau (JIRA)
Jean-Francois Nadeau created CLOUDSTACK-10239:
-

 Summary: User LDAP authentication not working in UI (but works via 
API)
 Key: CLOUDSTACK-10239
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10239
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: Management Server
Affects Versions: 4.11.0.0
 Environment: CentOS 7, KVM, MSAD
Reporter: Jean-Francois Nadeau


Hi,

I set up LDAP authentication with the Microsoft AD LDAP provider and get 
different behaviors in the UI vs. the API (cs Python CLI).

Through the UI, I can see the list of our AD users using the "Add ldap 
account" action, but selecting a user and adding it returns an error saying 
there is no user by that name:

INFO  [c.c.a.ApiServer] (qtp510113906-20:ctx-e32d5ff4 ctx-c3c50b46) 
(logid:89c8c538) No LDAP user exists with the username of 

Doing the same thing from the CLI works fine:

$ cs ldapCreateAccount username=markp accounttype=1 account=admin

 ...

{
 "account": "admin", 
 "accountid": "0683fdb0-fbae-11e7-9574-96a9f76bb706", 
 "accounttype": 1, 
 "created": "2018-01-18T19:21:31+", 
 "domain": "ROOT", 
 "domainid": "d9bbe213-fbad-11e7-9574-96a9f76bb706", 
 "firstname": "Mark", 
 "id": "5ed90ce8-5c54-4f72-8579-639947f5c368", 
 "iscallerchilddomain": false, 
 "isdefault": false, 
 "lastname": "p", 
 "roleid": "f8a368af-fbad-11e7-9574-96a9f76bb706", 
 "rolename": "Root Admin", 
 "roletype": "Admin", 
 "state": "enabled", 
 "username": "markp", 
 "usersource": "ldap"
 }

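Since the same operation succeeds through the API, it may help to see roughly what the `cs` CLI sends. The sketch below assumes the request-signing scheme described in the CloudStack API developer guide: percent-encode the values, sort the parameters by name, lowercase the resulting string, sign it with HMAC-SHA1 using the secret key, then Base64-encode the digest. It is not the `cs` package's own code. The function name `sign_request`, the keys, and the host `cloudstack.example.com` are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params: dict, api_key: str, secret_key: str) -> str:
    """Return a query string for `params` with an HMAC-SHA1 `signature` appended."""
    # Every call carries the caller's API key; ask for JSON like the output above.
    full = dict(params, apiKey=api_key, response="json")
    # Percent-encode each value so spaces become %20 rather than '+'.
    encoded = {k: quote(str(v), safe="") for k, v in full.items()}
    # Sort by field name, join as key=value pairs, and lowercase the whole string.
    to_sign = "&".join(
        f"{k}={encoded[k]}" for k in sorted(encoded, key=str.lower)
    ).lower()
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The parameters actually sent keep their original case; only the signed copy is lowercased.
    query = "&".join(f"{k}={v}" for k, v in encoded.items())
    return f"{query}&signature={quote(signature, safe='')}"


if __name__ == "__main__":
    qs = sign_request(
        {"command": "ldapCreateAccount", "username": "markp", "accounttype": 1, "account": "admin"},
        api_key="API_KEY",        # placeholder
        secret_key="SECRET_KEY",  # placeholder
    )
    print("https://cloudstack.example.com/client/api?" + qs)
```

The resulting query string is appended to the management server's `/client/api` endpoint. Because the signature covers a sorted, lowercased copy, the order in which parameters are passed does not change it.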
 

Also, once this user is added, he cannot log in to the UI using his LDAP 
credentials; the same error appears in the management server logs. However, 
if I generate API keys for that same admin user, he can use the API without 
problems.





[jira] [Created] (CLOUDSTACK-10237) L2 networks should be allowed to be shared and used in projects

2018-01-17 Thread Jean-Francois Nadeau (JIRA)
Jean-Francois Nadeau created CLOUDSTACK-10237:
-

 Summary: L2 networks should be allowed to be shared and used in 
projects
 Key: CLOUDSTACK-10237
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10237
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: Management Server
Affects Versions: 4.11.0.0
 Environment: Centos7, KVM, advanced zones with vlan based isolation
Reporter: Jean-Francois Nadeau


Hi all,

I'm testing 4.11-rc1 and the new L2 network type feature as shown at 
[http://www.shapeblue.com/layer-2-networks-in-cloudstack/]

I want to use this as a replacement for a shared network offering with no 
DHCP, which works to support an external DHCP server but still requires 
filling in some CIDR information.

I thought the intent was for the L2 network type to replace the previous 
approach when integrating with an existing network/DHCP.

If I attempt to provision a VM in a project as the root admin using the L2 
network, I get denied, apparently because L2 networks are not shared and I 
can't make them public.

('HTTP 531 response from CloudStack', , {u'errorcode': 531, 
u'uuidList': [], u'cserrorcode': 4365, u'errortext': u'Unable to use network 
with id= 7712102b-bbdf-4c54-bdbf-9fddfa16de46, permission denied'})

Provisioning in the root project works just fine, but I really want to use 
L2 networks in user projects, even if only the admin can do so.

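For scripts that hit the same error through the `cs` client, the dictionary at the end of the tuple above is the decoded error payload. This is a minimal sketch, assuming that payload shape, for pulling out which network failed the permission check. The function name `denied_network` and the regular expression are hypothetical and were matched only to the `errortext` quoted in this report.

```python
import re
from typing import Optional

# Matches the errortext quoted in this report, e.g.
# "Unable to use network with id= <uuid>, permission denied"
DENIED = re.compile(r"network with id=\s*([0-9a-f-]{36}), permission denied")


def denied_network(error: dict) -> Optional[str]:
    """Return the UUID of the network named in a permission error, else None."""
    # 531 is the errorcode seen in this report; other codes are ignored here.
    if error.get("errorcode") != 531:
        return None
    match = DENIED.search(error.get("errortext", ""))
    return match.group(1) if match else None


if __name__ == "__main__":
    err = {
        "errorcode": 531,
        "uuidList": [],
        "cserrorcode": 4365,
        "errortext": "Unable to use network with id= "
                     "7712102b-bbdf-4c54-bdbf-9fddfa16de46, permission denied",
    }
    print(denied_network(err))
```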




[jira] [Commented] (CLOUDSTACK-5582) kvm - HA is not triggered when host is powered down since the host gets into "Disconnected" state.

2017-12-28 Thread Jean-Francois Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305694#comment-16305694
 ] 

Jean-Francois Nadeau commented on CLOUDSTACK-5582:
--

I see the same behavior on 4.10. Powering off a server puts it in the 
Disconnected state and VM HA does not fire.

> kvm - HA is not triggered when host is powered down since the host gets into 
> "Disconnected" state. 
> ---
>
> Key: CLOUDSTACK-5582
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5582
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Management Server
>Affects Versions: 4.3.0
> Environment: Build from 4.3
>Reporter: Sangeetha Hariharan
>Assignee: edison su
>Priority: Critical
> Fix For: 4.4.0
>
> Attachments: kvmhost-down-up.rar, kvmhost-down-up.rar
>
>
> kvm - HA is not triggered when host is powered down since the host gets into 
> "Disconnected" state.
> Advanced zone with 2 KVM (RHEL 6.3) hosts.
> Steps to reproduce the problem:
> Deploy a few VMs on each of the hosts.
> Power down one of the hosts (using IPMI).
> We see that the host gets into the "Disconnected" state.
> All the VMs that are running on this host continue to be in the "Up" state.
> This happens because the management server receives an explicit shutdown 
> request from the agent:
> 2013-12-19 21:06:37,262 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentManager-Handler-15:null) SeqA 2--1: Processing Seq 2--1:  { Cmd , 
> MgmtId: -1, via: 2, Ver: v1, Flags: 111, 
> [{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","wait":0}}] }
> 2013-12-19 21:06:37,263 INFO  [c.c.a.m.AgentManagerImpl] 
> (AgentManager-Handler-15:null) Host 2 has informed us that it is shutting 
> down with reason sig.kill and detail null
> 2013-12-19 21:06:37,263 INFO  [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) Host 2 is disconnecting with event 
> ShutdownRequested
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) The next status of agent 2is Disconnected, 
> current status is Up
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) Deregistering link for 2 with state 
> Disconnected
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) Remove Agent : 2
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.ConnectedAgentAttache] 
> (AgentTaskPool-1:ctx-a32ed8e2) Processing Disconnect.
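The log excerpt shows the sequence behind this bug. The agent sends a ShutdownCommand with reason sig.kill, and the management server then moves host 2 from Up to Disconnected, a state in which, per this report, HA is not triggered. A small scanner for these two messages can help check whether the same sequence happens on 4.10. This is a minimal Python sketch with several assumptions. The regular expressions are derived only from the lines quoted above, including the missing space in "2is". The function name `agent_events` is hypothetical. The message wording may differ in other releases. In the actual log file each entry sits on one line; the wrapping above comes from the mail archive.

```python
import re
from typing import List, Tuple

# Patterns derived only from the AgentManagerImpl lines quoted above.
SHUTDOWN = re.compile(
    r"Host (\d+) has informed us that it is shutting down with reason (\S+)"
)
# The log prints "agent 2is" with no space, so the whitespace is optional.
TRANSITION = re.compile(
    r"The next status of agent (\d+)\s*is (\w+), current status is (\w+)"
)


def agent_events(log_text: str) -> List[Tuple[str, ...]]:
    """Extract shutdown notices and status transitions, in log order."""
    events: List[Tuple[str, ...]] = []
    for line in log_text.splitlines():
        m = SHUTDOWN.search(line)
        if m:
            # ("shutdown", host id, reason)
            events.append(("shutdown", m.group(1), m.group(2)))
            continue
        m = TRANSITION.search(line)
        if m:
            # ("transition", host id, current status, next status)
            events.append(("transition", m.group(1), m.group(3), m.group(2)))
    return events
```

A "shutdown" event followed by a transition from Up to Disconnected for the same host id matches the pattern reported here. A host that was powered off and moved to Down instead would not show the shutdown notice first.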



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CLOUDSTACK-10102) New Network Type (L2)

2017-11-28 Thread Jean-Francois Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269493#comment-16269493
 ] 

Jean-Francois Nadeau commented on CLOUDSTACK-10102:
---

Is this feature the same as what can be achieved via the API as described here? 
http://www.shapeblue.com/using-the-api-for-advanced-network-management/

> New Network Type (L2)
> -
>
> Key: CLOUDSTACK-10102
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10102
> Project: CloudStack
>  Issue Type: Improvement
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>Reporter: Nicolas Vazquez
>Assignee: Nicolas Vazquez
>
> Feature Specification: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74680920


