Re: Experimental - Direct download for templates

2018-02-07 Thread Marc-Aurèle Brothier
That's a great feature I think, thanks Nicolas for pushing it!

Marc-Aurèle

On Wed, Feb 7, 2018 at 3:19 PM, Nicolas Vazquez <
nicolas.vazq...@shapeblue.com> wrote:

> Hi all,
>
>
> A feature has been introduced in 4.11.0 that allows registering templates
> which bypass secondary storage, using a new option, 'Direct Download'. It
> allows templates to be downloaded directly to primary storage at VM
> deployment time. It is an experimental feature and is currently supported
> on the KVM hypervisor only. PR: https://github.com/apache/cloudstack/pull/2379
>
>
> A brief description on the current implementation:
>
> - CloudStack allows registering Direct Download/Bypass Secondary Storage
> templates for the KVM hypervisor by setting the direct_download flag to true
> on registerTemplate.
>
> - Templates are not downloaded to secondary storage after they are
> registered in CloudStack; they are marked as Bypass Secondary Storage and
> as Ready for deployment.
>
> - When bypassed templates are selected for VM deployment, the download is
> delegated to the agents, which store the templates on primary storage
> instead of copying them from secondary storage.
>
> - Metalinks are supported, but the aria2 dependency has to be installed
> manually on the agents.
>
>
> There are currently some PRs in progress for 4.11.1 with improvements to
> this functionality.
>
>
> Any comments/ideas?
>
>
> Thanks,
>
> Nicolas
>
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> @shapeblue
>
>
>
>


Experimental - Direct download for templates

2018-02-07 Thread Nicolas Vazquez
Hi all,


A feature has been introduced in 4.11.0 that allows registering templates
which bypass secondary storage, using a new option, 'Direct Download'. It
allows templates to be downloaded directly to primary storage at VM deployment
time. It is an experimental feature and is currently supported on the KVM
hypervisor only. PR: https://github.com/apache/cloudstack/pull/2379


A brief description on the current implementation:

- CloudStack allows registering Direct Download/Bypass Secondary Storage
templates for the KVM hypervisor by setting the direct_download flag to true
on registerTemplate (a rough API example is sketched after this list).

- Templates are not downloaded to secondary storage after they are registered
in CloudStack; they are marked as Bypass Secondary Storage and as Ready for
deployment.

- When bypassed templates are selected for VM deployment, the download is
delegated to the agents, which store the templates on primary storage instead
of copying them from secondary storage.

- Metalinks are supported, but the aria2 dependency has to be installed
manually on the agents.
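
For anyone who wants to try it, here is a rough sketch of what the registration
call could look like from Python. This is only an illustration, not code from
the PR: the endpoint, keys, UUIDs and template URL are placeholders, and
'directdownload' is my assumption for how the direct_download flag is exposed
on registerTemplate, so please check the PR/API docs for the exact parameter
name:

  import base64
  import hashlib
  import hmac
  import urllib.parse

  import requests

  API_URL = "http://mgmt-server:8080/client/api"   # placeholder management server
  API_KEY = "your-api-key"
  SECRET_KEY = "your-secret-key"

  params = {
      "command": "registerTemplate",
      "response": "json",
      "apikey": API_KEY,
      "name": "centos7-direct",
      "displaytext": "centos7-direct",
      "format": "QCOW2",
      "hypervisor": "KVM",
      "url": "http://example.com/centos7.qcow2",   # placeholder template URL
      "ostypeid": "OS-TYPE-UUID",                  # look up with listOsTypes
      "zoneid": "ZONE-UUID",                       # look up with listZones
      "directdownload": "true",  # assumed name of the direct_download flag
  }

  # Standard CloudStack request signing: sort the parameters, URL-encode the
  # values, lowercase the whole query string, HMAC-SHA1 it with the secret key
  # and base64-encode the digest.
  query = "&".join("%s=%s" % (k, urllib.parse.quote(str(v), safe=""))
                   for k, v in sorted(params.items()))
  digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(),
                    hashlib.sha1).digest()
  signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")

  resp = requests.get("%s?%s&signature=%s" % (API_URL, query, signature))
  print(resp.json())

The signing block is just the usual CloudStack API signing scheme; an existing
client such as CloudMonkey can of course be used instead of hand-rolling the
request.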


There are currently some PRs in progress for 4.11.1 with improvements to this
functionality.


Any comments/ideas?


Thanks,

Nicolas

nicolas.vazq...@shapeblue.com 
www.shapeblue.com
@shapeblue



Re: Refusing to design this network, the physical isolation type is not BCF_SEGMENT

2018-02-07 Thread Nicolas Vazquez
I have pushed a fix for this in PR 2448. Can you please test it?
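
For context on why the network is refused, below is a small illustrative sketch
(Python pseudologic only, with made-up helper names; the real logic lives in the
Java canHandle() method discussed in the quoted mail) of the kind of guard
involved and how also accepting L2 offerings changes the outcome:

  from enum import Enum

  class GuestType(Enum):
      ISOLATED = "Isolated"
      L2 = "L2"
      SHARED = "Shared"

  def can_handle(offering_guest_type, isolation_methods, accept_l2=False):
      """Illustrative guard: the guru only designs the network if the physical
      network uses its isolation method AND the offering's guest type is one it
      accepts. If only Isolated is accepted, an L2 offering on a VXLAN physical
      network is refused ("Refusing to design this network")."""
      if "VXLAN" not in isolation_methods:
          return False
      accepted = {GuestType.ISOLATED}
      if accept_l2:   # what a fix along the lines of PR 2448 would allow
          accepted.add(GuestType.L2)
      return offering_guest_type in accepted

  # L2 offering on a VXLAN physical network:
  print(can_handle(GuestType.L2, ["VXLAN"]))                  # False -> refused
  print(can_handle(GuestType.L2, ["VXLAN"], accept_l2=True))  # True  -> designed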


From: Nux! 
Sent: Tuesday, February 6, 2018 10:25:07 AM
To: dev
Subject: Re: Refusing to design this network, the physical isolation type is 
not BCF_SEGMENT

Thanks Nicolas, much appreciated.
Once you have a patch, feel free to ping me so I can test.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro


nicolas.vazq...@shapeblue.com 
www.shapeblue.com
@shapeblue

- Original Message -
> From: "Nicolas Vazquez" 
> To: "dev" 
> Sent: Tuesday, 6 February, 2018 13:23:54
> Subject: Re: Refusing to design this network, the physical isolation type is 
> not BCF_SEGMENT

> Hi Lucian,
>
>
> Thanks for posting this issue. I have checked the canHandle() method on
> VxlanGuestNetworkGuru and it does not consider L2 network offerings, only
> Isolated ones, so it refuses to design the network. I'll make sure to include
> a fix for it in 4.11.1.
>
>
> Thanks,
>
> Nicolas
>
> 
> From: Nux! 
> Sent: Tuesday, February 6, 2018 8:30:03 AM
> To: dev
> Subject: [L2 network] [VXLAN] Refusing to design this network, the physical
> isolation type is not BCF_SEGMENT
>
> Hi,
>
> I'm trying to add an L2 network based on a VXLAN physical network and I am
> getting the error in the subject.
>
> If I use a VLAN based physical network all completes successfully and I end up
> with an L2 network in green "Setup" state.
>
> Here are some more logs:
>
> 2018-02-06 11:20:27,748 DEBUG [c.c.n.NetworkServiceImpl]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Found physical
> network id=201 based on requested tags mellanoxvxlan
> 2018-02-06 11:20:27,749 DEBUG [c.c.n.NetworkServiceImpl]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Found physical
> network id=201 based on requested tags mellanoxvxlan
> 2018-02-06 11:20:27,766 DEBUG [c.c.n.g.BigSwitchBcfGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design this network, the physical isolation type is not BCF_SEGMENT
> 2018-02-06 11:20:27,766 DEBUG [o.a.c.n.c.m.ContrailGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design this network
> 2018-02-06 11:20:27,767 DEBUG [c.c.n.g.NiciraNvpGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design this network
> 2018-02-06 11:20:27,767 DEBUG [o.a.c.n.o.OpendaylightGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design this network
> 2018-02-06 11:20:27,767 DEBUG [c.c.n.g.OvsGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design this network
> 2018-02-06 11:20:27,769 DEBUG [o.a.c.n.g.SspGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) SSP not
> configured to be active
> 2018-02-06 11:20:27,769 DEBUG [c.c.n.g.BrocadeVcsGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design this network
> 2018-02-06 11:20:27,769 DEBUG [c.c.n.g.NuageVspGuestNetworkGuru]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
> design network using network offering 19 on physical network 201
> 2018-02-06 11:20:27,770 DEBUG [o.a.c.e.o.NetworkOrchestrator]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Releasing lock
> for Acct[6af2875b-04fc-11e8-923e-002590474525-admin]
> 2018-02-06 11:20:27,789 DEBUG [c.c.u.d.T.Transaction]
> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Rolling back the
> transaction: Time = 38 Name =  qtp788117692-390; called by
> -TransactionLegacy.rollback:889-TransactionLegacy.removeUpTo:832-TransactionLegacy.close:656-Transaction.execute:43-Transaction.execute:47-NetworkOrchestrator.createGuestNetwork:2315-NetworkServiceImpl$4.doInTransaction:1383-NetworkServiceImpl$4.doInTransaction:1331-Transaction.execute:40-NetworkServiceImpl.commitNetwork:1331-NetworkServiceImpl.createGuestNetwork:1294-NativeMethodAccessorImpl.invoke0:-2
> 2018-02-06 11:20:27,798 ERROR [c.c.a.ApiServer] (qtp788117692-390:ctx-f1a980be
> ctx-61be30e8) (logid:0ca0c866) unhandled exception executing api command:
> [Ljava.lang.String;@43b9df02
> com.cloud.utils.exception.CloudRuntimeException: Unable to convert network
> offering with specified id to network profile
>at
>
> org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.setupNetwork(NetworkOrchestrator.java:726)
>at
>
> org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$10.doInTransaction(NetworkOrchestrator.java:2364)
>at
>
> org.apache.cloudstack.engine.orchestration.NetworkOrchestrator$10.doInTransaction(NetworkOrchestrator.java:2315)
>at 
> com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:50)

Re: [DISCUSS] VR upgrade downtime reduction

2018-02-07 Thread Nux!
+1 too

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Rene Moser" 
> To: "dev" 
> Sent: Wednesday, 7 February, 2018 10:11:45
> Subject: Re: [DISCUSS] VR upgrade downtime reduction

> On 02/06/2018 02:47 PM, Remi Bergsma wrote:
>> Hi Daan,
>> 
>> In my opinion the biggest issue is the fact that there are a lot of different
>> code paths: VPC versus non-VPC, VPC versus redundant-VPC, etc. That's why you
>> cannot simply switch from a single VPC to a redundant VPC for example.
>> 
>> For SBP, we mitigated that in Cosmic by converting all non-VPCs to a VPC 
>> with a
>> single tier and made sure all features are supported. Next we merged the 
>> single
>> and redundant VPC code paths. The idea here is that redundancy or not should
>> only be a difference in the number of routers. Code should be the same. A
>> single router is also "master" but there just is no "backup".
>> 
>> That simplifies things A LOT, as keepalived is now the master of the whole
>> thing. No more assigning ip addresses in Python, but leave that to keepalived
>> instead. Lots of code deleted. Easier to maintain, way more stable. We just
>> released Cosmic 6 that has this feature and are now rolling it out in
>> production. Looking good so far. This change unlocks a lot of possibilities,
>> like live upgrading from a single VPC to a redundant one (and back). In the
>> end, if the redundant VPC is rock solid, you most likely don't even want 
>> single
>> VPCs any more. But that will come.
>> 
>> As I said, we're rolling this out as we speak. In a few weeks when 
>> everything is
>> upgraded I can share what we learned and how well it works. CloudStack could
>> use a similar approach.
> 
> +1 Pretty much this.
> 
> René


RE: 4.11 Release announcement

2018-02-07 Thread Giles Sirett
Kris
My bad - I was working from a summary list 

Kind regards
Giles

giles.sir...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
  
 


-Original Message-
From: Kris Sterckx [mailto:kris.ster...@nuagenetworks.net] 
Sent: 06 February 2018 12:25
To: dev@cloudstack.apache.org
Cc: market...@cloudstack.apache.org; Rohit Yadav 
Subject: Re: 4.11 Release announcement

Hi Giles,


Impressive !


For completeness: the following features are missing from your list:

* Extra DHCP options support  (Nuage Networks)(CLOUDSTACK-9776)

* Physical network migration (CLOUDSTACK-10024) - better to take it separately
as this is generic development

* Nuage VSP 5.0 support and caching of NuageVsp IDs (CLOUDSTACK-10053)


Kris


On 6 February 2018 at 10:36, Giles Sirett 
wrote:

> Hi all
>
> Rohit and I are wording the announcement for the 4.11 release
>
> I'm trying to get a few quotes for the announcements from ACS  users
>
>
> Something along the lines of "we're excited about this new version of 
> Cloudstack because of"
>
>
> If anybody here is able to provide a quote, can you please ping 
> something over to me by Thursday 12:00 GMT
>
>
> List of what's new below
>
>
> New Features and Improvements
> *Support for XenServer 7.1 and 7.2, and improved support for
> VMware 6.5.
> *Host-HA framework and HA provider for KVM hosts with NFS as
> primary storage, and a new background polling task manager.
> *Secure agent communication: new certificate authority framework
> <http://www.shapeblue.com/cloudstack-ca-framework/> and a default
> built-in root CA provider.
> *New network type - L2.
> *CloudStack metrics exporter for Prometheus.
> *Cloudian Hyperstore
> Connector for CloudStack.
> *Annotation feature for CloudStack entities such as hosts.
> *Separation of volume snapshot creation on primary storage and the
> backing operation on secondary storage.
> *Limit admin access from specified CIDRs.
> *Expansion of Management IP Range.
> *Dedication of public IPs to SSVM and CPVM.
> *Support for separate subnet for SSVM and CPVM.
> *Bypass secondary storage template copy/transfer for KVM.
> *Support for multi-disk OVA template for VMware.
> *Storage overprovisioning for local storage.
> *LDAP mapping with domain scope, and mapping of LDAP group to an
> account.
> *Move user across accounts.
> *Support for "VSD managed" networks with Nuage Networks.
> *Extend config drive support for user data, metadata, and password
> (Nuage networks).
> *Nuage domain template selection per VPC and support for network
> migration.
> *Managed storage enhancements.
> *Support for watchdog timer to KVM Instances.
> *Support for Secondary IPv6 Addresses and Subnets.
> *IPv6 Prefix Delegation support in Basic Networking.
> *Ability to specify a MAC address while deploying a VM or adding a
> NIC to a VM.
> *VMware dvSwitch security policies configuration in network
> offering.
> *Allow more than 7 NICs to be added to a VMware VM.
> *Network rate usage for guest offering for VRs.
> *Usage metrics for VM snapshots on primary storage.
> *Enable NetScaler inline mode.
> *NCC integration in CloudStack.
> *The retirement of Midonet network plugin.
> UI Improvements
> *High precision of metrics in the dashboard.
> *Event timeline - filter related events.
> *Navigation improvements:
> * VRs to account, network, instances
> * Network and VRs to instances.
> *List view improvements:
> * As applicable, account, zone, network columns in list views.
> * States and related columns with icons in various infrastructure
> entity views.
> * Additional columns in several list views.
> *New columns for additional information.
> *Bulk operation support for stopping and destroying VMs (known
> issue: manual refresh required).
> Structural Improvements
> *Embedded Jetty and improved CloudStack management server
> configuration.
> *Improved support for Java 8 in built artifacts/modules,
> packaging, and systemvm template.
> *Debian 9 based systemvm template:
> * Patches system VMs without reboot, reducing VR/systemvm startup
> time to a few tens of seconds.
> * Faster console proxy startup and service availability.
> * Improved support for 

Re: [DISCUSS] VR upgrade downtime reduction

2018-02-07 Thread Rene Moser
On 02/06/2018 02:47 PM, Remi Bergsma wrote:
> Hi Daan,
> 
> In my opinion the biggest issue is the fact that there are a lot of different 
> code paths: VPC versus non-VPC, VPC versus redundant-VPC, etc. That's why you 
> cannot simply switch from a single VPC to a redundant VPC for example. 
> 
> For SBP, we mitigated that in Cosmic by converting all non-VPCs to a VPC with 
> a single tier and made sure all features are supported. Next we merged the 
> single and redundant VPC code paths. The idea here is that redundancy or not 
> should only be a difference in the number of routers. Code should be the 
> same. A single router is also "master" but there just is no "backup".
> 
> That simplifies things A LOT, as keepalived is now the master of the whole 
> thing. No more assigning ip addresses in Python, but leave that to keepalived 
> instead. Lots of code deleted. Easier to maintain, way more stable. We just 
> released Cosmic 6 that has this feature and are now rolling it out in 
> production. Looking good so far. This change unlocks a lot of possibilities, 
> like live upgrading from a single VPC to a redundant one (and back). In the 
> end, if the redundant VPC is rock solid, you most likely don't even want 
> single VPCs any more. But that will come.
> 
> As I said, we're rolling this out as we speak. In a few weeks when everything 
> is upgraded I can share what we learned and how well it works. CloudStack 
> could use a similar approach.

+1 Pretty much this.

René


Re: [DISCUSS] VR upgrade downtime reduction

2018-02-07 Thread Rafael Weingärtner
 ONE-VR approach in ACS 5.0. It is time to plan for a major release and
break some things...

On Wed, Feb 7, 2018 at 7:17 AM, Paul Angus  wrote:

> It seems sensible to me to have ONE VR, and I like the idea that all
> VRs are 'redundant-ready', again supporting the ONE-VR approach.
>
> The question I have is:
>
> - how do we handle the transition - does it need ACS 5.0?
> The API and the UI separate the VR and the VPC, so what is the most
> logical presentation of the proposed solution to the users/operators.
>
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Daan Hoogland [mailto:daan.hoogl...@gmail.com]
> Sent: 07 February 2018 08:58
> To: dev 
> Subject: Re: [DISCUSS] VR upgrade downtime reduction
>
> Reading all the reactions I am getting wary of all the possible solutions
> that we have.
>  We do have a fragile VR and Remi's way seems the only one to stabilise it.
> It also answers the question on which of my two tactics we should follow.
>  Wido's objection may be valid but services that are not started are not
> crashing and thus should not hinder him.
>  As for Wei's changes I think the most important one is in the PR I ported
> forward to master, using his older commit. I mentioned it in
> > [1] https://github.com/apache/cloudstack/pull/2435
> I am looking forward to any of your PRs as well Wei.
>
>  Making all VRs redundant is a bit of a hack and the biggest risk in it is
> making sure that only one will get started.
>
> ​ There is one point I'd like consensus on; We have only one system
> template and we are well served by letting it have only one form as VR. ​Do
> we agree on that?
>
> ​comments, flames, questions, ​regards,​
>
>
> On Tue, Feb 6, 2018 at 9:04 PM, Wei ZHOU  wrote:
>
> > Hi Remi,
> >
> > Actually in our fork, there are more changes than restartnetwork and
> > restart vpc, similar as your changes.
> > (1) edit networks from offering with single VR to offerings with RVR,
> > will hack VR (set new guest IP, start keepalived and conntrackd,
> > blablabla)
> > (2) restart vpc from single VR to RVR. similar changes will be made.
> > The downtime is around 5s. However, these changes are based on 4.7.1; we
> > are not sure if they still work in 4.11.
> >
> > We have lots of changes; we will port the changes to 4.11 LTS and
> > create PRs in the next months.
> >
> > -Wei
> >
> >
> > 2018-02-06 14:47 GMT+01:00 Remi Bergsma :
> >
> > > Hi Daan,
> > >
> > > In my opinion the biggest issue is the fact that there are a lot of
> > > different code paths: VPC versus non-VPC, VPC versus redundant-VPC,
> etc.
> > > That's why you cannot simply switch from a single VPC to a redundant
> > > VPC for example.
> > >
> > > For SBP, we mitigated that in Cosmic by converting all non-VPCs to a
> > > VPC with a single tier and made sure all features are supported.
> > > Next we
> > merged
> > > the single and redundant VPC code paths. The idea here is that
> > > redundancy or not should only be a difference in the number of
> > > routers. Code should
> > be
> > > the same. A single router is also "master" but there just is no
> > "backup".
> > >
> > > That simplifies things A LOT, as keepalived is now the master of the
> > whole
> > > thing. No more assigning ip addresses in Python, but leave that to
> > > keepalived instead. Lots of code deleted. Easier to maintain, way
> > > more stable. We just released Cosmic 6 that has this feature and are
> > > now
> > rolling
> > > it out in production. Looking good so far. This change unlocks a lot
> > > of possibilities, like live upgrading from a single VPC to a
> > > redundant one (and back). In the end, if the redundant VPC is rock
> > > solid, you most
> > likely
> > > don't even want single VPCs any more. But that will come.
> > >
> > > As I said, we're rolling this out as we speak. In a few weeks when
> > > everything is upgraded I can share what we learned and how well it
> works.
> > > CloudStack could use a similar approach.
> > >
> > > Kind Regards,
> > > Remi
> > >
> > >
> > >
> > > On 05/02/2018, 16:44, "Daan Hoogland" 
> wrote:
> > >
> > > Hi devs,
> > >
> > > I have recently (re-)submitted two PRs, one by Wei [1] and one
> > > by
> > Remi
> > > [2],
> > > that reduce downtime for redundant routers and redundant VPCs
> > > respectively.
> > > (please review those)
> > > Now from customers we hear that they also want to reduce downtime
> for
> > > regular VRs so as we discussed this we came to two possible
> > > solutions that
> > > we want to implement one of:
> > >
> > > 1. start and configure a new router before destroying the old
> > > one and then
> > > as a last minute action stop the old one.
> > > 2. make all routers start up 

RE: [DISCUSS] VR upgrade downtime reduction

2018-02-07 Thread Paul Angus
It seems sensible to me to have ONE VR, and I like the idea that all VRs
are 'redundant-ready', again supporting the ONE-VR approach.

The question I have is:

- how do we handle the transition - does it need ACS 5.0?
The API and the UI separate the VR and the VPC, so what is the most logical 
presentation of the proposed solution to the users/operators.


Kind regards,

Paul Angus

paul.an...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
  
 


-Original Message-
From: Daan Hoogland [mailto:daan.hoogl...@gmail.com] 
Sent: 07 February 2018 08:58
To: dev 
Subject: Re: [DISCUSS] VR upgrade downtime reduction

Reading all the reactions I am getting wary of all the possible solutions that 
we have.
 We do have a fragile VR and Remi's way seems the only one to stabilise it.
It also answers the question on which of my two tactics we should follow.
Wido's objection may be valid but services that are not started are not
crashing and thus should not hinder him.
 As for Wei's changes I think the most important one is in the PR I ported 
forward to master, using his older commit. I mentioned it in
> [1] https://github.com/apache/cloudstack/pull/2435
I am looking forward to any of your PRs as well Wei.

 Making all VRs redundant is a bit of a hack and the biggest risk in it is 
making sure that only one will get started.

​ There is one point I'd like consensus on; We have only one system template 
and we are well served by letting it have only one form as VR. ​Do we agree on 
that?

​comments, flames, questions, ​regards,​


On Tue, Feb 6, 2018 at 9:04 PM, Wei ZHOU  wrote:

> Hi Remi,
>
> Actually in our fork, there are more changes than restartnetwork and 
> restart vpc, similar as your changes.
> (1) edit networks from offering with single VR to offerings with RVR, 
> will hack VR (set new guest IP, start keepalived and conntrackd, 
> blablabla)
> (2) restart vpc from single VR to RVR. similar changes will be made.
> The downtime is around 5s. However, these changes are based on 4.7.1; we
> are not sure if they still work in 4.11.
>
> We have lots of changes; we will port the changes to 4.11 LTS and
> create PRs in the next months.
>
> -Wei
>
>
> 2018-02-06 14:47 GMT+01:00 Remi Bergsma :
>
> > Hi Daan,
> >
> > In my opinion the biggest issue is the fact that there are a lot of 
> > different code paths: VPC versus non-VPC, VPC versus redundant-VPC, etc.
> > That's why you cannot simply switch from a single VPC to a redundant 
> > VPC for example.
> >
> > For SBP, we mitigated that in Cosmic by converting all non-VPCs to a 
> > VPC with a single tier and made sure all features are supported. 
> > Next we
> merged
> > the single and redundant VPC code paths. The idea here is that 
> > redundancy or not should only be a difference in the number of 
> > routers. Code should
> be
> > the same. A single router is also "master" but there just is no
> "backup".
> >
> > That simplifies things A LOT, as keepalived is now the master of the
> whole
> > thing. No more assigning ip addresses in Python, but leave that to 
> > keepalived instead. Lots of code deleted. Easier to maintain, way 
> > more stable. We just released Cosmic 6 that has this feature and are 
> > now
> rolling
> > it out in production. Looking good so far. This change unlocks a lot 
> > of possibilities, like live upgrading from a single VPC to a 
> > redundant one (and back). In the end, if the redundant VPC is rock 
> > solid, you most
> likely
> > don't even want single VPCs any more. But that will come.
> >
> > As I said, we're rolling this out as we speak. In a few weeks when 
> > everything is upgraded I can share what we learned and how well it works.
> > CloudStack could use a similar approach.
> >
> > Kind Regards,
> > Remi
> >
> >
> >
> > On 05/02/2018, 16:44, "Daan Hoogland"  wrote:
> >
> > Hi devs,
> >
> > I have recently (re-)submitted two PRs, one by Wei [1] and one 
> > by
> Remi
> > [2],
> > that reduce downtime for redundant routers and redundant VPCs 
> > respectively.
> > (please review those)
> > Now from customers we hear that they also want to reduce downtime for
> > regular VRs so as we discussed this we came to two possible 
> > solutions that
> > we want to implement one of:
> >
> > 1. start and configure a new router before destroying the old 
> > one and then
> > as a last minute action stop the old one.
> > 2. make all routers start up redundancy services but for regular 
> > routers
> > start only one until an upgrade is required at which time a new,
> second
> > router can be started before killing the old one.​
> >
> > ​obviously both solutions have their merits, so I want to have 
> > your input
> > to make the broadest supported implementation.
> > -1 means there will be an overlap or a small 

Re: [DISCUSS] VR upgrade downtime reduction

2018-02-07 Thread Daan Hoogland
Reading all the reactions I am getting wary of all the possible solutions
that we have.
 We do have a fragile VR and Remi's way seems the only one to stabilise it.
It also answers the question on which of my two tactics we should follow.
Wido's objection may be valid but services that are not started are not
crashing and thus should not hinder him.
 As for Wei's changes I think the most important one is in the PR I ported
forward to master, using his older commit. I mentioned it in
> [1] https://github.com/apache/cloudstack/pull/2435
I am looking forward to any of your PRs as well Wei.

 Making all VRs redundant is a bit of a hack and the biggest risk in it is
making sure that only one will get started.

​ There is one point I'd like consensus on; We have only one system
template and we are well served by letting it have only one form as VR. ​Do
we agree on that?

​comments, flames, questions, ​regards,​


On Tue, Feb 6, 2018 at 9:04 PM, Wei ZHOU  wrote:

> Hi Remi,
>
> Actually in our fork, there are more changes than restartnetwork and
> restart vpc, similar as your changes.
> (1) edit networks from offering with single VR to offerings with RVR, will
> hack VR (set new guest IP, start keepalived and conntrackd, blablabla)
> (2) restart vpc from single VR to RVR. similar changes will be made.
> The downtime is around 5s. However, these changes are based on 4.7.1; we are
> not sure if they still work in 4.11.
>
> We have lots of changes; we will port the changes to 4.11 LTS and create
> PRs in the next months.
>
> -Wei
>
>
> 2018-02-06 14:47 GMT+01:00 Remi Bergsma :
>
> > Hi Daan,
> >
> > In my opinion the biggest issue is the fact that there are a lot of
> > different code paths: VPC versus non-VPC, VPC versus redundant-VPC, etc.
> > That's why you cannot simply switch from a single VPC to a redundant VPC
> > for example.
> >
> > For SBP, we mitigated that in Cosmic by converting all non-VPCs to a VPC
> > with a single tier and made sure all features are supported. Next we
> merged
> > the single and redundant VPC code paths. The idea here is that redundancy
> > or not should only be a difference in the number of routers. Code should
> be
> > the same. A single router is also "master" but there just is no
> "backup".
> >
> > That simplifies things A LOT, as keepalived is now the master of the
> whole
> > thing. No more assigning ip addresses in Python, but leave that to
> > keepalived instead. Lots of code deleted. Easier to maintain, way more
> > stable. We just released Cosmic 6 that has this feature and are now
> rolling
> > it out in production. Looking good so far. This change unlocks a lot of
> > possibilities, like live upgrading from a single VPC to a redundant one
> > (and back). In the end, if the redundant VPC is rock solid, you most
> likely
> > don't even want single VPCs any more. But that will come.
> >
> > As I said, we're rolling this out as we speak. In a few weeks when
> > everything is upgraded I can share what we learned and how well it works.
> > CloudStack could use a similar approach.
> >
> > Kind Regards,
> > Remi
> >
> >
> >
> > On 05/02/2018, 16:44, "Daan Hoogland"  wrote:
> >
> > Hi devs,
> >
> > I have recently (re-)submitted two PRs, one by Wei [1] and one by
> Remi
> > [2],
> > that reduce downtime for redundant routers and redundant VPCs
> > respectively.
> > (please review those)
> > Now from customers we hear that they also want to reduce downtime for
> > regular VRs so as we discussed this we came to two possible solutions
> > that
> > we want to implement one of:
> >
> > 1. start and configure a new router before destroying the old one and
> > then
> > as a last minute action stop the old one.
> > 2. make all routers start up redundancy services but for regular
> > routers
> > start only one until an upgrade is required at which time a new,
> second
> > router can be started before killing the old one.​
> >
> > ​obviously both solutions have their merits, so I want to have your
> > input
> > to make the broadest supported implementation.
> > -1 means there will be an overlap or a small delay and interruption
> of
> > service.
> > +1 It can be argued, "they got what they paid for".
> > -2 means an overhead in memory usage by the router by the extra
> services
> > running on it.
> > +2 the number of router-varieties will be further reduced.
> >
> > -1&-2 We have to deal with potentially large upgrade steps from way
> > before
> > the CloudStack era even, and might be stuck with option 1 because of that,
> > needing to
> > hack around it. Any dealing with older VRs, pre 4.5 and especially
> pre
> > 4.0
> > will be hard.
> >
> > I am not cross posting though this might be one of these occasions
> > where it
> > is appropriate to include users@. Just my puristic inhibitions.
> >
> > Of course I