Re: [openstack-dev] [Cinder] Austin Design Summit Recap
Dear Sean,

Great compilation, this will help for sure!! Thank you!!

Best Regards,
Sheel Rana

On Fri, May 6, 2016 at 10:38 PM, Sean McGinnis wrote:
[openstack-dev] [Cinder] Austin Design Summit Recap
At the Design Summit in Austin, the Cinder team met over three days to go over a variety of topics. This is a general summary of the notes captured from each session.

We were also able to record most sessions. Please see the openstack-cinder YouTube channel for all its minute and tedious glory:

https://www.youtube.com/channel/UCJ8Koy4gsISMy0qW3CWZmaQ

Replication Next Steps
======================

Replication v2.1 was added in Mitaka. This was a first step in supporting a simplified use case. A few drivers were able to implement support for this in Mitaka, with a few already in the queue for support in Newton.

There is a desire to add the ability to replicate smaller groups of volumes and control them individually for failover, failback, etc. Eventually we would also want to expose this functionality to non-admin users. This will allow tenants to group their volumes by application workload or other user-specific constraint and give them control over managing that workload.

It was agreed that it is too soon to expose this at this point. We would first like to get broader vendor support for the current replication capabilities before we add anything more. We also want to improve the admin experience with handling full site failover. As it is today, there is a lot of manual work that the admin would need to do to be able to fully recover from a failover. There are ways we can make this experience better. So before we add additional things on top of replication, we want to make sure what we have is solid and at least slightly polished.

Personally, I would like to see some work done with Nova or some third-party entity like Smaug or other projects to be able to coordinate activities on the compute and storage sides in order to fail over an environment completely from a primary to a secondary location.

Related to the group replication (tiramisu) work was the idea of generic volume groups. Some sort of grouping mechanism would be required to tie in to that.
We have a grouping today with consistency groups, but that has its own set of semantics and expectations that doesn't always fully mesh with what users would want for group replication. There have also been others looking at using consistency groups to enable vendor-specific functionality not quite in line with the intent of what CGs are meant for.

We plan on creating a new concept of a group that has a set of possible types. One of these types will be consistency, with the goal that internally we can shift things around to convert our current CG concept to be a group of type consistency while still keeping the API interface that users are used to for working with them.

But beyond that we will be able to add things like a "replication" type that will allow users to group volumes that may or may not be able to be snapped in an IO-order-consistent manner, but that can be acted on as a group to be replicated. We can also expand this group type to other concepts moving forward to meet other use cases without needing to introduce a wholly new concept. The mechanisms for managing groups will already be in place, and a new type will be able to be added using the existing plumbing.

Etherpad:
https://etherpad.openstack.org/p/cinder-newton-replication

Active/Active High Availability
===============================

Work continues on HA. Gorka gave an overview of the work completed so far and the work left to do. We are still on the plan proposed at the Tokyo Summit, just with a lot of work left to get it all implemented. The biggest variations are around the host name used for the "clustered" service nodes and the idea that we will not attempt to do any sort of automatic cleanup for in-progress work that gets orphaned due to a node failure.

Etherpad:
https://etherpad.openstack.org/p/cinder-newton-activeactiveha

Mitaka Recap
============

Two sessions were devoted to going over what had changed in Mitaka.
There were a lot of things introduced that developers and code reviewers now need to be aware of, so we wanted to spend some time educating everyone on these things.

Conditional DB Updates
----------------------

To try to eliminate races (partly related to the HA work) we will now use conditional updates. This eliminates the gap between checking a value and setting it, making it one atomic DB update. It also gives better performance than locking around operations.

Microversions
-------------

API microversions were implemented in Mitaka. The new /v3 endpoint should be used. Any change in the API should now be implemented as a microversion bump. A devref was added in Cinder with details of how to use this, and more detail as to when a microversion is needed and when it is not.

Rolling Upgrades
----------------

A devref was added for rolling upgrades and versioned objects. We discussed the need to make incremental DB changes rather than doing it all in one release. First release: add the new column; write to both, read from the original. Second release: write to both, read from the new column. Third release: the original column can now be dropped.
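The conditional-update idea mentioned in the Mitaka recap can be sketched with plain SQL. The toy below uses Python's sqlite3 purely for illustration (Cinder itself goes through SQLAlchemy, and the table and function names here are made up, not Cinder's): the status check and the new value live in a single UPDATE ... WHERE statement, so there is no window between the check and the write for a racing service to slip through.

```python
import sqlite3

# Illustrative sketch only -- not Cinder code. The key point is that
# the condition (status = 'available') and the write (status =
# 'deleting') are one atomic statement, not a SELECT followed by an
# UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available')")

def begin_delete(conn, volume_id):
    """Atomically move a volume from 'available' to 'deleting'.

    Returns True only if this caller won the race. A separate
    check-then-set would leave a gap in which another service could
    also see 'available'; the conditional UPDATE closes that gap.
    """
    cur = conn.execute(
        "UPDATE volumes SET status = 'deleting' "
        "WHERE id = ? AND status = 'available'",
        (volume_id,),
    )
    # rowcount is 1 only for the caller whose condition still held.
    return cur.rowcount == 1

print(begin_delete(conn, "vol-1"))  # first caller wins: True
print(begin_delete(conn, "vol-1"))  # second caller loses: False
```

The same shape works for any state transition (attaching, extending, etc.): encode the allowed starting states in the WHERE clause and treat a zero rowcount as losing the race, rather than taking a lock around the whole operation.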