Re: Copy Volume Failed in CloudStack 4.5 (XenServer 6.5)

2018-02-08 Thread Tutkowski, Mike
If you go to the Global Settings tab in the GUI and search for “wait”, there 
are several possible timeouts that may apply.

The backup.snapshot.wait Global Setting seems like the one that probably 
applies here (per what Pierre-Luc was noting).
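
For illustration, here is a minimal Python sketch of inspecting and raising such a setting through the API rather than the GUI. The endpoint, the keys, the example value, and the choice of setting (backup.snapshot.wait versus something like copy.volume.wait) are assumptions to verify in your own environment, and note that changing a global setting may require a management-server restart before it takes effect.

import base64
import hashlib
import hmac
import urllib.parse

import requests

API_URL = "http://mgmt-server:8080/client/api"   # placeholder endpoint
API_KEY = "your-api-key"                         # placeholder credentials
SECRET_KEY = "your-secret-key"

def call(command, **params):
    """Sign and issue a CloudStack API call (standard HMAC-SHA1 request signing)."""
    params.update({"command": command, "apiKey": API_KEY, "response": "json"})
    # Signature: sort params, URL-encode values, lowercase the string, HMAC-SHA1, base64.
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(), hashlib.sha1).digest()
    params["signature"] = base64.b64encode(digest).decode()
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# List every global setting whose name contains "wait", as suggested above.
print(call("listConfigurations", keyword="wait"))

# Raise the suspected timeout (value is in seconds; 86400 is only an example).
print(call("updateConfiguration", name="backup.snapshot.wait", value="86400"))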

On 2/8/18, 4:15 PM, "Pierre-Luc Dion"  wrote:

I think there is a timeout global setting you could change so the copy task
is allowed to run longer before it times out and fails in CloudStack. This
will not improve your performance, but it might reduce failures.

As for updating the database content: it could work, but only if the VHD
copied successfully and the mappings remain valid.

I hope this can help...



On 6 Feb 2018, at 13:28, "anillakieni"  wrote:

Dear All,

Is somebody available here to assist me on fixing my issue.

Thanks,
Anil.

On Tue, Feb 6, 2018 at 9:00 PM, anillakieni  wrote:

> Hi All,
>
> I'm facing an issue when copying larger volumes, i.e., from Secondary
> Storage to Primary Storage (attaching a DATA volume to a VM); the copy
> fails after a certain time, around 37670 seconds.
>
> Versions:
> - CloudStack 4.5.0
> - XenServer 6.5.0
> - MySQL 5.1.73
>
>
> The error and log are provided below. Could someone please advise which
> steps I should take to fix this issue? Also, is there a way to update the
> failed status to success through the database tables? Otherwise I have to
> upload the whole disk to secondary storage again and then attach it to the
> VM, which takes a long time. My environment has very slow network transfers
> (I only have a 1 Gig switch). Please let me know whether we can tweak the DB
> to update the status of the disk, or whether there is a setting we can
> change to allow more time (wait time) for updating the status.
> "
>
> 2018-02-06 03:20:42,385 DEBUG [c.c.a.t.Request] (Work-Job-Executor-31:ctx-
c1c78a5a
> job-106186/job-106187 ctx-ea1ef3e6) (logid:c59b2359) Seq
> 38-367887794560851961: Received:  { Ans: , MgmtId: 47019105324719, via:
38,
> Ver: v1, Flags: 110, { CopyCmdAnswer } }
> 2018-02-06 03:20:42,389 DEBUG [o.a.c.s.v.VolumeObject]
> (Work-Job-Executor-31:ctx-c1c78a5a job-106186/job-106187 ctx-ea1ef3e6)
> (logid:c59b2359) *Failed to update state*
> *com.cloud.utils.exception.CloudRuntimeException: DB Exception on:
> com.mysql.jdbc.JDBC4PreparedStatement@54bd3a25: SELECT volume_store_ref.id
> , volume_store_ref.store_id,
> volume_store_ref.volume_id, volume_store_ref.zone_id,
> volume_store_ref.created, volume_store_ref.last_updated,
> volume_store_ref.download_pct, volume_store_ref.size,
> volume_store_ref.physical_size, volume_store_ref.download_state,
> volume_store_ref.checksum, volume_store_ref.local_path,
> volume_store_ref.error_str, volume_store_ref.job_id,
> volume_store_ref.install_path, volume_store_ref.url,
> volume_store_ref.download_url, volume_store_ref.download_url_created,
> volume_store_ref.destroyed, volume_store_ref.update_count,
> volume_store_ref.updated, volume_store_ref.state, volume_store_ref.ref_cnt
> FROM volume_store_ref WHERE volume_store_ref.store_id = 1  AND
> volume_store_ref.volume_id = 1178  AND volume_store_ref.destroyed = 0
> ORDER BY RAND() LIMIT 1*
> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(
> GenericDaoBase.java:425)
> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(
> GenericDaoBase.java:361)
> at com.cloud.utils.db.GenericDaoBase.findOneIncludingRemovedBy(
> GenericDaoBase.java:889)
> at com.cloud.utils.db.GenericDaoBase.findOneBy(
> GenericDaoBase.java:900)
> at org.apache.cloudstack.storage.image.db.VolumeDataStoreDaoImpl.
> findByStoreVolume(VolumeDataStoreDaoImpl.java:209)
> at sun.reflect.GeneratedMethodAccessor306.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.springframework.aop.support.AopUtils.
> invokeJoinpointUsingReflection(AopUtils.java:317)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> proceed(ReflectiveMethodInvocation.java:150)
> at com.cloud.utils.db.TransactionContextInterceptor.invoke(
> TransactionContextInterceptor.java:34)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.
> proceed(ReflectiveMethodInvocation.java:161)
> at 

Re: Copy Volume Failed in CloudStack 4.5 (XenServer 6.5)

2018-02-08 Thread Pierre-Luc Dion
I think there is a timeout global setting you could change so the copy task
is allowed to run longer before it times out and fails in CloudStack. This
will not improve your performance, but it might reduce failures.

As for updating the database content: it could work, but only if the VHD
copied successfully and the mappings remain valid.

I hope this can help...



On 6 Feb 2018, at 13:28, "anillakieni"  wrote:

Dear All,

Is somebody available here to assist me on fixing my issue.

Thanks,
Anil.

On Tue, Feb 6, 2018 at 9:00 PM, anillakieni  wrote:

> Hi All,
>
> I'm facing an issue when copying larger volumes, i.e., from Secondary
> Storage to Primary Storage (attaching a DATA volume to a VM); the copy
> fails after a certain time, around 37670 seconds.
>
> Versions:
> - CloudStack 4.5.0
> - XenServer 6.5.0
> - MySQL 5.1.73
>
>
> The error and log are provided below. Could someone please advise which
> steps I should take to fix this issue? Also, is there a way to update the
> failed status to success through the database tables? Otherwise I have to
> upload the whole disk to secondary storage again and then attach it to the
> VM, which takes a long time. My environment has very slow network transfers
> (I only have a 1 Gig switch). Please let me know whether we can tweak the DB
> to update the status of the disk, or whether there is a setting we can
> change to allow more time (wait time) for updating the status.
> "
>
> 2018-02-06 03:20:42,385 DEBUG [c.c.a.t.Request] (Work-Job-Executor-31:ctx-c1c78a5a
> job-106186/job-106187 ctx-ea1ef3e6) (logid:c59b2359) Seq
> 38-367887794560851961: Received:  { Ans: , MgmtId: 47019105324719, via: 38,
> Ver: v1, Flags: 110, { CopyCmdAnswer } }
> 2018-02-06 03:20:42,389 DEBUG [o.a.c.s.v.VolumeObject]
> (Work-Job-Executor-31:ctx-c1c78a5a job-106186/job-106187 ctx-ea1ef3e6)
> (logid:c59b2359) *Failed to update state*
> *com.cloud.utils.exception.CloudRuntimeException: DB Exception on:
> com.mysql.jdbc.JDBC4PreparedStatement@54bd3a25: SELECT volume_store_ref.id
> , volume_store_ref.store_id,
> volume_store_ref.volume_id, volume_store_ref.zone_id,
> volume_store_ref.created, volume_store_ref.last_updated,
> volume_store_ref.download_pct, volume_store_ref.size,
> volume_store_ref.physical_size, volume_store_ref.download_state,
> volume_store_ref.checksum, volume_store_ref.local_path,
> volume_store_ref.error_str, volume_store_ref.job_id,
> volume_store_ref.install_path, volume_store_ref.url,
> volume_store_ref.download_url, volume_store_ref.download_url_created,
> volume_store_ref.destroyed, volume_store_ref.update_count,
> volume_store_ref.updated, volume_store_ref.state, volume_store_ref.ref_cnt
> FROM volume_store_ref WHERE volume_store_ref.store_id = 1  AND
> volume_store_ref.volume_id = 1178  AND volume_store_ref.destroyed = 0
> ORDER BY RAND() LIMIT 1*
> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:425)
> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:361)
> at com.cloud.utils.db.GenericDaoBase.findOneIncludingRemovedBy(GenericDaoBase.java:889)
> at com.cloud.utils.db.GenericDaoBase.findOneBy(GenericDaoBase.java:900)
> at org.apache.cloudstack.storage.image.db.VolumeDataStoreDaoImpl.findByStoreVolume(VolumeDataStoreDaoImpl.java:209)
> at sun.reflect.GeneratedMethodAccessor306.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
> at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
> at com.sun.proxy.$Proxy173.findByStoreVolume(Unknown Source)
> at org.apache.cloudstack.storage.datastore.ObjectInDataStoreManagerImpl.findObject(ObjectInDataStoreManagerImpl.java:353)
> at org.apache.cloudstack.storage.datastore.ObjectInDataStoreManagerImpl.findObject(ObjectInDataStoreManagerImpl.java:338)
> at 
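
On the question above about flipping the failed copy to a success state directly in the database: the Python/PyMySQL sketch below shows the kind of inspection and update that would be involved, using the volume_store_ref columns visible in the log. It is only a rough illustration under the caveat Pierre-Luc gives -- the VHD must actually have copied and the mappings must still be valid. The "success" values used here are assumptions, so back up the database, stop the management server, and compare against a row for a volume that copied cleanly before changing anything.

import pymysql  # assumes the PyMySQL package; any MySQL client would do

# Placeholder credentials: point these at the CloudStack 'cloud' database.
conn = pymysql.connect(host="localhost", user="cloud", password="secret", database="cloud")
try:
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        # 1) Inspect the exact row CloudStack was querying in the log above.
        cur.execute(
            "SELECT id, volume_id, store_id, download_state, state, download_pct, "
            "       error_str, install_path "
            "FROM volume_store_ref "
            "WHERE store_id = %s AND volume_id = %s AND destroyed = 0",
            (1, 1178),
        )
        print(cur.fetchall())

        # 2) Example correction -- assumed 'success' values; compare them with a
        #    row for a volume that copied cleanly before running this.
        cur.execute(
            "UPDATE volume_store_ref "
            "SET download_state = 'DOWNLOADED', state = 'Ready', "
            "    download_pct = 100, error_str = NULL "
            "WHERE store_id = %s AND volume_id = %s AND destroyed = 0",
            (1, 1178),
        )
    conn.commit()
finally:
    conn.close()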

Re: Refusing to design this network, the physical isolation type is not BCF_SEGMENT

2018-02-08 Thread Nux!
Tested; I can now add an L2 network on top of a VXLAN physical network, but I
have found a problem.

It doesn't seem to like VNI IDs higher than the VLAN maximum of 4096; however,
VXLAN's maximum ID is about 16 million, so this is a problem that needs a fix.
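
To make the ranges concrete, here is a small, purely illustrative Python snippet (not CloudStack code): a VLAN ID is 12 bits (usable IDs 1-4094), while a VXLAN VNI is 24 bits, so validating a VNI as if it were a VLAN ID rejects almost the entire VXLAN space.

VLAN_ID_MAX = (1 << 12) - 2    # 4094 usable VLAN IDs (0 and 4095 are reserved)
VXLAN_VNI_MAX = (1 << 24) - 1  # 16,777,215 possible VXLAN network identifiers

def vni_fits_vlan_range(vni: int) -> bool:
    """True if a VNI would also pass a VLAN-style range check."""
    return 1 <= vni <= VLAN_ID_MAX

print(VLAN_ID_MAX, VXLAN_VNI_MAX)    # 4094 16777215
print(vni_fits_vlan_range(5000))     # False: a perfectly valid VNI is rejected
                                     # if it is validated as a VLAN ID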

Here are some logs:

2018-02-08 15:50:00,738 WARN  [resource.wrapper.LibvirtStartCommandWrapper] 
(agentRequest-Handler-4:null) (logid:bd9fbea5) InternalErrorException 
com.cloud.exception.InternalErrorException: Failed to create vnet 3: 
RTNETLINK answers: Numerical result out of rangeCannot find device 
"bond0.3"Failed to create vlan 3 on pif: bond0.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Nux!" 
> To: "dev" 
> Sent: Thursday, 8 February, 2018 14:29:36
> Subject: Re: Refusing to design this network, the physical isolation type is 
> not BCF_SEGMENT

> Thanks! Any idea where blueorangutan keeps the rpms?
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> - Original Message -
>> From: "Nicolas Vazquez" 
>> To: "dev" 
>> Sent: Wednesday, 7 February, 2018 12:12:00
>> Subject: Re: Refusing to design this network, the physical isolation type is 
>> not
>> BCF_SEGMENT
> 
>> I have pushed a fix for this on PR 2448. Can you please test it?
>> 
>> 
>> From: Nux! 
>> Sent: Tuesday, February 6, 2018 10:25:07 AM
>> To: dev
>> Subject: Re: Refusing to design this network, the physical isolation type is 
>> not
>> BCF_SEGMENT
>> 
>> Thanks Nicolas, much appreciated.
>> Once you have a patch, feel free to ping me so I can test.
>> 
>> --
>> Sent from the Delta quadrant using Borg technology!
>> 
>> Nux!
>> www.nux.ro
>> 
>> 
>> nicolas.vazq...@shapeblue.com
>> www.shapeblue.com
>> @shapeblue
>>  
>> 
>> 
>> - Original Message -
>>> From: "Nicolas Vazquez" 
>>> To: "dev" 
>>> Sent: Tuesday, 6 February, 2018 13:23:54
>>> Subject: Re: Refusing to design this network, the physical isolation type 
>>> is not
>>> BCF_SEGMENT
>> 
>>> Hi Lucian,
>>>
>>>
>>> Thanks for posting this issue. I have checked the canHandle() method on
>>> VxlanGuestNetworkGuru and it is not considering L2 network offerings, only
>>> Isolated, so it refuses to design the network. I'll make sure to include a 
>>> fix
>>> for it on 4.11.1.
>>>
>>>
>>> Thanks,
>>>
>>> Nicolas
>>>
>>> 
>>> From: Nux! 
>>> Sent: Tuesday, February 6, 2018 8:30:03 AM
>>> To: dev
>>> Subject: [L2 network] [VXLAN] Refusing to design this network, the physical
>>> isolation type is not BCF_SEGMENT
>>>
>>> Hi,
>>>
>>> I'm trying to add an L2 network based on a VXLAN physical network and I am
>>> getting the error in the subject.
>>>
>>> If I use a VLAN based physical network all completes successfully and I end 
>>> up
>>> with an L2 network in green "Setup" state.
>>>
>>> Here are some more logs:
>>>
>>> 2018-02-06 11:20:27,748 DEBUG [c.c.n.NetworkServiceImpl]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Found physical
>>> network id=201 based on requested tags mellanoxvxlan
>>> 2018-02-06 11:20:27,749 DEBUG [c.c.n.NetworkServiceImpl]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Found physical
>>> network id=201 based on requested tags mellanoxvxlan
>>> 2018-02-06 11:20:27,766 DEBUG [c.c.n.g.BigSwitchBcfGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design this network, the physical isolation type is not BCF_SEGMENT
>>> 2018-02-06 11:20:27,766 DEBUG [o.a.c.n.c.m.ContrailGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design this network
>>> 2018-02-06 11:20:27,767 DEBUG [c.c.n.g.NiciraNvpGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design this network
>>> 2018-02-06 11:20:27,767 DEBUG [o.a.c.n.o.OpendaylightGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design this network
>>> 2018-02-06 11:20:27,767 DEBUG [c.c.n.g.OvsGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design this network
>>> 2018-02-06 11:20:27,769 DEBUG [o.a.c.n.g.SspGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) SSP not
>>> configured to be active
>>> 2018-02-06 11:20:27,769 DEBUG [c.c.n.g.BrocadeVcsGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design this network
>>> 2018-02-06 11:20:27,769 DEBUG [c.c.n.g.NuageVspGuestNetworkGuru]
>>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>>> design network using network offering 19 on physical network 201
>>> 

Re: Refusing to design this network, the physical isolation type is not BCF_SEGMENT

2018-02-08 Thread Nux!
Thanks! Any idea where blueorangutan keeps the rpms?

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Nicolas Vazquez" 
> To: "dev" 
> Sent: Wednesday, 7 February, 2018 12:12:00
> Subject: Re: Refusing to design this network, the physical isolation type is 
> not BCF_SEGMENT

> I have pushed a fix for this on PR 2448. Can you please test it?
> 
> 
> From: Nux! 
> Sent: Tuesday, February 6, 2018 10:25:07 AM
> To: dev
> Subject: Re: Refusing to design this network, the physical isolation type is 
> not
> BCF_SEGMENT
> 
> Thanks Nicolas, much appreciated.
> Once you have a patch, feel free to ping me so I can test.
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> 
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> @shapeblue
>  
> 
> 
> - Original Message -
>> From: "Nicolas Vazquez" 
>> To: "dev" 
>> Sent: Tuesday, 6 February, 2018 13:23:54
>> Subject: Re: Refusing to design this network, the physical isolation type is 
>> not
>> BCF_SEGMENT
> 
>> Hi Lucian,
>>
>>
>> Thanks for posting this issue. I have checked the canHandle() method on
>> VxlanGuestNetworkGuru and it is not considering L2 network offerings, only
>> Isolated, so it refuses to design the network. I'll make sure to include a 
>> fix
>> for it on 4.11.1.
>>
>>
>> Thanks,
>>
>> Nicolas
>>
>> 
>> From: Nux! 
>> Sent: Tuesday, February 6, 2018 8:30:03 AM
>> To: dev
>> Subject: [L2 network] [VXLAN] Refusing to design this network, the physical
>> isolation type is not BCF_SEGMENT
>>
>> Hi,
>>
>> I'm trying to add an L2 network based on a VXLAN physical network and I am
>> getting the error in the subject.
>>
>> If I use a VLAN based physical network all completes successfully and I end 
>> up
>> with an L2 network in green "Setup" state.
>>
>> Here are some more logs:
>>
>> 2018-02-06 11:20:27,748 DEBUG [c.c.n.NetworkServiceImpl]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Found physical
>> network id=201 based on requested tags mellanoxvxlan
>> 2018-02-06 11:20:27,749 DEBUG [c.c.n.NetworkServiceImpl]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Found physical
>> network id=201 based on requested tags mellanoxvxlan
>> 2018-02-06 11:20:27,766 DEBUG [c.c.n.g.BigSwitchBcfGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design this network, the physical isolation type is not BCF_SEGMENT
>> 2018-02-06 11:20:27,766 DEBUG [o.a.c.n.c.m.ContrailGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design this network
>> 2018-02-06 11:20:27,767 DEBUG [c.c.n.g.NiciraNvpGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design this network
>> 2018-02-06 11:20:27,767 DEBUG [o.a.c.n.o.OpendaylightGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design this network
>> 2018-02-06 11:20:27,767 DEBUG [c.c.n.g.OvsGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design this network
>> 2018-02-06 11:20:27,769 DEBUG [o.a.c.n.g.SspGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) SSP not
>> configured to be active
>> 2018-02-06 11:20:27,769 DEBUG [c.c.n.g.BrocadeVcsGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design this network
>> 2018-02-06 11:20:27,769 DEBUG [c.c.n.g.NuageVspGuestNetworkGuru]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Refusing to
>> design network using network offering 19 on physical network 201
>> 2018-02-06 11:20:27,770 DEBUG [o.a.c.e.o.NetworkOrchestrator]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Releasing lock
>> for Acct[6af2875b-04fc-11e8-923e-002590474525-admin]
>> 2018-02-06 11:20:27,789 DEBUG [c.c.u.d.T.Transaction]
>> (qtp788117692-390:ctx-f1a980be ctx-61be30e8) (logid:0ca0c866) Rolling back 
>> the
>> transaction: Time = 38 Name =  qtp788117692-390; called by
>> -TransactionLegacy.rollback:889-TransactionLegacy.removeUpTo:832-TransactionLegacy.close:656-Transaction.execute:43-Transaction.execute:47-NetworkOrchestrator.createGuestNetwork:2315-NetworkServiceImpl$4.doInTransaction:1383-NetworkServiceImpl$4.doInTransaction:1331-Transaction.execute:40-NetworkServiceImpl.commitNetwork:1331-NetworkServiceImpl.createGuestNetwork:1294-NativeMethodAccessorImpl.invoke0:-2
>> 2018-02-06 11:20:27,798 ERROR [c.c.a.ApiServer] 
>> (qtp788117692-390:ctx-f1a980be
>> ctx-61be30e8) (logid:0ca0c866) unhandled exception executing api command:
>> [Ljava.lang.String;@43b9df02
>> com.cloud.utils.exception.CloudRuntimeException: Unable to convert 

Re: Experimental - Direct download for templates

2018-02-08 Thread Nux!
Hello,

Can you clarify the following about this feature:
- Is local storage supported?
- If the above is yes, how are the templates synced between HVs?
- Will it require a permanent repository of templates to be available whenever
a new HV needs to deploy them?

Thanks,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
> From: "Nicolas Vazquez" 
> To: "dev" 
> Sent: Wednesday, 7 February, 2018 14:19:55
> Subject: Experimental - Direct download for templates

> Hi all,
> 
> 
> A feature has been introduced in 4.11.0 that allows registering templates
> while bypassing secondary storage, using a new option 'Direct Download'. It
> allows templates to be downloaded directly into primary storage at VM
> deployment time. It is an experimental feature and is currently supported on
> the KVM hypervisor only. PR:
> https://github.com/apache/cloudstack/pull/2379
> 
> 
> A brief description on the current implementation:
> 
> - CloudStack allows registering Direct Download/Bypass Secondary Storage
> templates for the KVM hypervisor by setting the direct_download flag to true
> on registerTemplate.
> 
> - Templates are not downloaded to secondary storage after they are registered
> in CloudStack; they are marked as Bypass Secondary Storage and as Ready for
> deployment.
> 
> - When bypassed templates are selected for VM deployment, the download is
> delegated to the agents, which store the templates on primary storage instead
> of copying them from secondary storage.
> 
> - Metalinks are supported, but the aria2 dependency has to be installed
> manually on the agents.
> 
> 
> There are currently some PRs in progress for 4.11.1 with some improvements for
> this functionality.
> 
> 
> Any comments/ideas?
> 
> 
> Thanks,
> 
> Nicolas
> 
> nicolas.vazq...@shapeblue.com
> www.shapeblue.com
> @shapeblue
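
As a concrete, hedged illustration of the registration flow described above, the Python sketch below registers a KVM template with the direct-download flag set, using the same CloudStack request-signing approach sketched earlier in this digest. The endpoint, keys, template URL, UUIDs, and the exact API parameter name for the flag (assumed here to be "directdownload", per the direct_download flag mentioned in the announcement) are assumptions to check against the 4.11 API documentation.

import base64
import hashlib
import hmac
import urllib.parse

import requests

API_URL = "http://mgmt-server:8080/client/api"              # placeholder endpoint
API_KEY, SECRET_KEY = "your-api-key", "your-secret-key"     # placeholder credentials

def call(command, **params):
    """Sign and issue a CloudStack API call (standard HMAC-SHA1 request signing)."""
    params.update({"command": command, "apiKey": API_KEY, "response": "json"})
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    sig = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), query.lower().encode(), hashlib.sha1).digest()
    ).decode()
    return requests.get(API_URL, params={**params, "signature": sig}, timeout=30).json()

# Register a KVM template that bypasses secondary storage; the agent fetches it
# onto primary storage only when a VM is deployed from it.
print(call(
    "registerTemplate",
    name="centos7-direct",
    displaytext="CentOS 7 (direct download)",
    url="http://repo.example.com/centos7.qcow2",   # placeholder template URL
    format="QCOW2",
    hypervisor="KVM",
    ostypeid="<os-type-uuid>",                     # placeholder
    zoneid="<zone-uuid>",                          # placeholder
    directdownload="true",                         # assumed parameter name for the flag
))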


Re: [DISCUSS] VR upgrade downtime reduction

2018-02-08 Thread Daan Hoogland
I'd like to stop the vote and continue the discussion. I personally want
unification of all router VMs: VR, 'shared network', rVR, VPC, rVPC, and
eventually the one we want to create for 'enterprise topology hand-off
points'. I think we have some level of consensus on that, but the path there
is a concern for Wido and for some of my colleagues as well, and rightly so.
One issue is upgrades from older versions.

I see the common scenario as follows:
+ redundancy is deprecated and only a number of instances remains.
+ the old VR is replicated in memory by a redundancy-enabled version, which
will be in a state of running but inactive.
- the old one will be destroyed while a ping against it is running
- as soon as the ping fails more than three times in a row (this might need a
hypervisor-specific implementation or require a helper VM)
+ the new one is activated

After this upgrade, Wei's and/or Remi's code will do the work for any
following upgrade.

flames, please
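
For what it's worth, here is a rough Python-style sketch of the sequence described above. Every callable taken as a parameter is a placeholder rather than an existing CloudStack call; this only illustrates the proposed ordering, not an implementation.

import time

CONSECUTIVE_PING_FAILURES = 3  # per the proposal: several consecutive ping failures

def cutover(old_vr, clone_inactive, destroy, ping, activate):
    """Sketch of the proposed ordering; all callables are placeholders."""
    new_vr = clone_inactive(old_vr)   # new, redundancy-enabled VR, kept passive
    destroy(old_vr)                   # tear the old VR down...
    failures = 0
    while failures < CONSECUTIVE_PING_FAILURES:
        failures = 0 if ping(old_vr) else failures + 1   # ...while watching it go away
        time.sleep(1)
    activate(new_vr)                  # only then does the new VR take over
    return new_vr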



On Wed, Feb 7, 2018 at 12:17 PM, Nux!  wrote:

> +1 too
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
> - Original Message -
> > From: "Rene Moser" 
> > To: "dev" 
> > Sent: Wednesday, 7 February, 2018 10:11:45
> > Subject: Re: [DISCUSS] VR upgrade downtime reduction
>
> > On 02/06/2018 02:47 PM, Remi Bergsma wrote:
> >> Hi Daan,
> >>
> >> In my opinion the biggest issue is the fact that there are a lot of different
> >> code paths: VPC versus non-VPC, VPC versus redundant-VPC, etc. That's why you
> >> cannot simply switch from a single VPC to a redundant VPC for example.
> >>
> >> For SBP, we mitigated that in Cosmic by converting all non-VPCs to a VPC with a
> >> single tier and made sure all features are supported. Next we merged the single
> >> and redundant VPC code paths. The idea here is that redundancy or not should
> >> only be a difference in the number of routers. Code should be the same. A
> >> single router is also "master", but there just is no "backup".
> >>
> >> That simplifies things A LOT, as keepalived is now the master of the whole
> >> thing. No more assigning IP addresses in Python, but leave that to keepalived
> >> instead. Lots of code deleted. Easier to maintain, way more stable. We just
> >> released Cosmic 6 that has this feature and are now rolling it out in
> >> production. Looking good so far. This change unlocks a lot of possibilities,
> >> like live upgrading from a single VPC to a redundant one (and back). In the
> >> end, if the redundant VPC is rock solid, you most likely don't even want single
> >> VPCs any more. But that will come.
> >>
> >> As I said, we're rolling this out as we speak. In a few weeks when everything is
> >> upgraded I can share what we learned and how well it works. CloudStack could
> >> use a similar approach.
> >
> > +1 Pretty much this.
> >
> > René
>



-- 
Daan