Thanks to gouthamr for doing these writeups and for recording!

We had a great turnout at the manila Fishbowl and working sessions. Important notes and action items are below:

===========================
Fishbowl 1: Race Conditions
===========================
Thursday 27th Oct / 11:00 - 11:40 / AC Hotel - Salon Barcelona - P1
Etherpad: https://etherpad.openstack.org/p/ocata-manila-race-conditions
Video: https://www.youtube.com/watch?v=__P7zQobAQw

Gist:
* We have some race conditions that have worsened over time:
  * Deleting a share while snapshotting the share
  * Two simultaneous delete-share calls
  * Two simultaneous create-snapshot calls
* Though the end result of these race conditions is not terrible, we can leave resources in untenable states, requiring administrative cleanup in the worst scenario
* Any type of resource interaction must be protected in the database with a test-and-set using the appropriate status fields
* Any test-and-set must be protected with a lock
* Locks must not be held over long-running tasks: e.g., RPC casts, driver invocations, etc.
* We need more granular state transitions: micro/transitional states must be added per resource and judiciously used for state locking
  * Ex: Shares need a 'snapshotting' state
  * Ex: Share servers need states to signify setup phases, a la nova compute instances

Discussion Item:
* Locks in the manila-api service (or specifically, extending usage of locks across all manila services)
* Desirable because:
  * Adding test-and-set logic at the database layer may render the code unmaintainably complicated, as opposed to using locking abstractions (oslo.concurrency / tooz)
  * Cinder has evolved an elegant test-and-set solution, but we may not be able to benefit from that implementation because we cannot do multi-table updates and because the code references OVO, which manila doesn't yet support
* Undesirable because:
  * Most distributors (RedHat/SUSE/Kubernetes-based/MOS) want to run more than one API service in active-active H/A
  * If a true distributed locking mechanism isn't used/supported, the current file locks would be useless in the above scenario
  * Running file locks on shared file systems is a possibility, but adds configuration/setup burden
  * Having all the locks in the share service would allow scale-out of the API service, and the share manager is really the place where things are going wrong
  * With a limited form of test-and-set, atomic state changes can still be achieved in the API service

Agreed:
* File locks will not help

Action Items:
(bswartz): Will propose a spec for the locking strategy
(volunteers): Act on the spec ^ and help add more transitional states and locks (or test-and-set, if any)
(gouthamr): State transition diagrams for shares/share instances/replicas, access rules/instance access rules
(volunteers): Review ^ and add state transition diagrams for snapshots/snapshot instances, share servers
(mkoderer): Will help with determining race conditions within manila-share with tests

=====================================
Fishbowl 2: Data Service / Jobs Table
=====================================
Thursday 27th Oct / 11:50 - 12:30 / AC Hotel - Salon Barcelona - P1
Etherpad: https://etherpad.openstack.org/p/ocata-manila-data-service-jobs-table
Video: https://www.youtube.com/watch?v=Sajy2Qjqbmk

Gist:
* Currently, a synchronous RPC call is made from the API to the share-manager/data-service performing a migration in order to get the progress of that migration
* We need a way to record the progress of long-running tasks: migration, backup, data copy, etc.
* We need to introduce a jobs table so that the service performing the long-running task can write its progress to the database, and the API reads from the database
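A minimal sketch of the proposed flow (all names hypothetical; ganso's spec will define the real table model): the service doing the long-running work periodically writes progress to the jobs table, and the API answers progress queries from the database instead of making a synchronous RPC call to a busy service.

```python
import uuid

# Hypothetical in-memory stand-in for the proposed jobs table.
jobs = {}


def start_job(resource_id, action):
    """Called by the share-manager/data-service when it starts the task."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {'resource_id': resource_id, 'action': action,
                    'state': 'running', 'progress': 0}
    return job_id


def update_progress(job_id, percent):
    """Called periodically by the worker doing the long-running task."""
    jobs[job_id]['progress'] = percent
    if percent >= 100:
        jobs[job_id]['state'] = 'done'


def get_progress(resource_id):
    """Called by the API: read the database, no RPC to the busy service."""
    for job in jobs.values():
        if job['resource_id'] == resource_id:
            return job['state'], job['progress']
    return None


job = start_job('share-1', 'migration')
update_progress(job, 40)
```

With heartbeats recorded alongside, another data service could also notice a dead peer's unfinished jobs and take them over, as discussed above for active-active mode.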

Discussion Items:
* There was a suggestion to extend the jobs table to all tasks on the share: snapshotting, creating share from snapshot, extending, shrinking, etc.
  * We agreed not to do this because the table could easily grow out of control, and there isn't a solid use case for registering all jobs. Asynchronous user messages may be a better answer to this feature request
* "restartable" jobs would benefit from the jobs table
* Service heartbeats could be used to react to services dying while running long-running jobs
* When running the data service in active-active mode, a service going down can pass on its jobs to the other data service

Action Items:
(ganso): Will determine the structure of the jobs table model in his spec
(ganso): Will determine the benefit of the data service reacting to additions in the database rather than acting upon RPC requests

=====================================
Working Sessions 1: High Availability
=====================================
Thursday 27th Oct / 14:40 - 15:20 / CCIB - Centre de Convencions Internacional de Barcelona - P1 - Room 130
Etherpad: https://etherpad.openstack.org/p/ocata-manila-high-availability
Video: https://www.youtube.com/watch?v=xFk8ShK6qxU

Gist:
* We have a patch to introduce the tooz abstraction library to manila; it currently creates a tooz coordinator for the manila-share service and demonstrates replacing oslo.concurrency locks with tooz locks: https://review.openstack.org/#/c/318336/
* The heartbeat seems to have issues and needs debugging
* The owner/committer has tested this patch with both the FileDriver and Kazoo/ZooKeeper as tooz backends. We need to test other tooz backends
* Distributors do not package dependencies for all tooz backends
* We plan to introduce leader election via tooz. We plan to use this for cleanups and to designate the service that performs polling (migration, replication of shares and snapshots, share server cleanup)
* Code needs to be written to integrate the use of tooz/dlm via the manila devstack plugin so it can be gate tested
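As a sketch of how share-service code would look against a tooz-style coordinator (the stand-in class below is NOT tooz; real tooz would be created with tooz.coordination.get_coordinator(backend_url, member_id) and backed by ZooKeeper, etcd, etc.):

```python
import threading


class FakeCoordinator:
    """In-process stand-in mimicking the small slice of a tooz coordinator
    that the manila patch exercises: named locks usable as context managers.
    With real tooz, the lock would be distributed across all services."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def get_lock(self, name):
        # tooz similarly returns a lock object usable with `with`.
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())


coordinator = FakeCoordinator()


def delete_share(share_id, state):
    """Serialize state changes for one share across manila-share workers."""
    with coordinator.get_lock('share-%s' % share_id):
        if state['status'] != 'available':
            return False  # a concurrent operation already claimed the share
        state['status'] = 'deleting'
        return True


state = {'status': 'available'}
first = delete_share('42', state)
second = delete_share('42', state)  # a simultaneous delete loses the race
```

Because the code only depends on the coordinator interface, swapping the file-lock backend for ZooKeeper (or another tooz backend) would not change the service logic, which is the point of adopting the abstraction.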

Action Items:
(gouthamr): Will document how to set up tooz with 2 or more share services
(bswartz): Will set up a sub group of contributors to code/test H/A solutions in this release

===============================
Working Session 2: Access Rules
===============================
Friday 28th Oct / 11:00 - 11:40 / CCIB - Centre de Convencions Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-high-availability
Video: https://www.youtube.com/watch?v=62EllNOZ3aw

Gist:
* We have had a number of bugs in our access rules implementation since two important design changes were made in manila: the introduction of share instances and the unification of the allow_access/deny_access driver interfaces
* The most significant of the bugs is the presence of race conditions that we have tried fixing multiple times but haven't ironed out
* https://blueprints.launchpad.net/manila/+spec/fix-and-improve-access-rules attempts to rectify the issues by reintroducing per-share-instance, per-access-rule statuses, adding transitional statuses and performing pessimistic locking around state transitions. The database models and API need cleanup with respect to the way they're currently accessed

Discussion Items:
* While applying access rules in bulk, it is useful to identify exactly which rules could not be applied. This is functionality that manila had but lost during the update_access work.
* State transitions must still be protected with locks
* IPv6 is being enabled across OpenStack, manila needs to support exporting shares with IPv6 and access control to IPv6 based clients
  * https://review.openstack.org/#/c/312321/
  * https://review.openstack.org/#/c/362786/
  * https://review.openstack.org/#/c/328932/
* No significant requirement seems to exist for access groups
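The per-rule status idea above can be sketched as follows (hypothetical names; the real design lives in the blueprint and forthcoming spec): each access rule carries its own status, including a transitional 'applying' state, so a bulk apply can report exactly which rules failed instead of failing the whole batch.

```python
def apply_access_rules(rules, driver_apply):
    """Apply rules one by one, tracking a per-rule status.

    `driver_apply` is a hypothetical driver callback that raises on failure.
    Returns the rules with their individual resulting statuses, so the API
    can show users exactly which rules could not be applied.
    """
    for rule in rules:
        rule['state'] = 'applying'      # transitional status
        try:
            driver_apply(rule)
            rule['state'] = 'active'
        except Exception:
            rule['state'] = 'error'     # only this rule is marked bad
    return rules


def fake_driver(rule):
    # Stand-in driver that rejects an overly broad rule.
    if rule['access_to'] == '0.0.0.0/0':
        raise ValueError('overly broad rule rejected')


rules = [{'access_to': '10.0.0.5', 'state': 'queued'},
         {'access_to': '0.0.0.0/0', 'state': 'queued'}]
result = apply_access_rules(rules, fake_driver)
```

In manila proper these statuses would live per share instance per access rule in the database, with the state transitions protected by locks as agreed above.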

Action Items:
(gouthamr): Add a spec for the access rules work
(ganso): Will update the devref regarding update_access being a driver required feature. (bswartz): Will start an ML discussion regarding IPv6 support across all vendor drivers. (vponomaryov): Will add scenario tests around enforcing that newly created shares are not accessible

============================================================================
Working Session 3: Tempest Direction and ways ahead for Manila tempest tests
============================================================================
Friday 28th Oct / 11:50 - 12:30 / CCIB - Centre de Convencions Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-tempest-direction
Video: https://www.youtube.com/watch?v=A5_6b369ACY

Gist:
* Manila was the first project to use and implement the tempest plugin interface
* The fact that the tests are in-tree is creating issues for future direction in tempest
* The TC proposal to split projects and tempest tests will be re-proposed for Pike: https://review.openstack.org/#/c/369749/

Discussion Items:
* Cons of the tempest tests being in-tree
  * Tempest tests can't be 'branchless'
  * Developers change tests as and when new code lands, effectively no longer testing backwards compatibility, even where it is promised and makes sense
  * Manila needs to be installed (bringing in its bulky requirements) even when only manila_tempest_tests is desired
* Pros of tempest tests being in-tree
  * Tests and code can land at the same time or in close sequence, ensuring feature quality, i.e., ease and sanity of development and code review
  * Bug fixes that change API behavior need multiple changes: first skip the relevant tests in the test project, then make the change in the project, then add the tests back in the test project: three patches instead of one today
  * Manila's share clients registration enables discovery of the clients: https://review.openstack.org/#/c/334596/
  * Manila is still using 'unstable' imports from tempest, which peg it to the tempest commit in its tree
    * Dynamic credentials are not part of the stable interface, tempest.lib

Action Items:
(mkoderer/dmellado): Will fix dependencies on tempest within manila_tempest_tests. We only need to use stuff from tempest.lib
(mkoderer/dmellado): Will work on getting more requirements for manila_tempest_tests into tempest.lib

===================
Contributors Meetup
===================
Friday 28th Oct / 14:00 - 18:30 / CCIB - Centre de Convencions Internacional de Barcelona - P1 - Room 134
Etherpad: https://etherpad.openstack.org/p/ocata-manila-contributor-meetup
Video: https://www.youtube.com/watch?v=SP10HgUGOnI (See video description for links to specific discussions)

Discussion Items:
* Exporting shares with multiple protocols:
* Many vendor drivers, including first-party drivers, can export shares with both NFS and CIFS
  * The CephFS driver can support both NFS and native CephFS
* The API and driver interactions may be too complex to standardize across all available vendors
* Almost all vendor driver developers in the room said they have the capability to support more than one protocol combination, e.g., NFS/CIFS or CephFS/NFS (in development)
  * There's an open spec: https://review.openstack.org/#/c/329392/
* We may introduce this if we can preserve API backwards compatibility and provide a common implementation across different drivers in terms of management of exports and ACLs
* Vendor drivers *cannot* implement this in a way that makes users' shares behave differently between vendors
* Discrepancy in the share protocols and access types mapping: our API is currently very inconsistent because some drivers support IP rules for CIFS shares or user rules for NFS shares. The proposal:
  * Shares exported with NFS must only support IP rules
  * Shares exported with CIFS/SMB must only support user rules
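The proposed restriction could be enforced with a simple API-side validation along these lines (a sketch only; the function and mapping names are hypothetical, and the real change would land in the allow-access API):

```python
# Proposed protocol -> allowed access type mapping from the discussion.
ALLOWED_ACCESS_TYPES = {
    'NFS': {'ip'},
    'CIFS': {'user'},
}


def validate_access(share_proto, access_type):
    """Reject access rules whose type is inconsistent with the protocol."""
    allowed = ALLOWED_ACCESS_TYPES.get(share_proto.upper(), set())
    return access_type in allowed


ok = validate_access('NFS', 'ip')
bad = validate_access('NFS', 'user')  # the discrepancy the API would reject
```

Rejecting the mismatched combinations at the API layer would make behavior uniform across drivers regardless of what an individual backend happens to support.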
* Scenario tests must be run on third party CIs
  * Scenario testing infrastructure is WIP for supporting drivers besides the generic driver
  * Scenario tests will be run with the API tests in the same jobs for upstream drivers
  * A new set of scenario tests is proposed: https://review.openstack.org/#/c/374731/
  * The goal is to define a broad set of scenarios that test behavior expected across all backends
* Alternate snapshot semantics
  * We are removing the overload on snapshot_support, which currently means two things: snapshot_support (can take snapshots) and create_share_from_snapshot_support (can create a new share from a snapshot of a given share)
  * create_share_from_snapshot_support will be added to existing share types (db migration) and to all drivers' capabilities (via detection of interface methods)
  * snapshot_support will not be a required extra-spec anymore
  * Not specifying snapshot_support when creating a share type means "don't care" behavior with respect to picking a backend, but still provides meaningful behavior to the tenant and administrator. See spec: https://review.openstack.org/#/c/391049/
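The "don't care" matching behavior can be illustrated like this (a sketch with hypothetical backend data; the real logic lives in the scheduler's capabilities filtering):

```python
def backend_matches(share_type_extra_specs, backend_capabilities):
    """Match a backend only on the extra specs actually set on the type.

    Leaving snapshot_support out of the share type means any backend
    qualifies ("don't care"), while setting it pins the behavior.
    """
    for spec in ('snapshot_support', 'create_share_from_snapshot_support'):
        if spec in share_type_extra_specs:
            if backend_capabilities.get(spec) != share_type_extra_specs[spec]:
                return False
    return True


backends = {'alpha': {'snapshot_support': True},
            'beta': {'snapshot_support': False}}

# No extra spec set: every backend qualifies.
dont_care = [n for n, c in backends.items() if backend_matches({}, c)]
# Extra spec set: only backends reporting the capability qualify.
pinned = [n for n, c in backends.items()
          if backend_matches({'snapshot_support': True}, c)]
```

The tenant still gets deterministic behavior either way: the share they receive reports whether it supports snapshots, regardless of how it was scheduled.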
* Specs deadlines and process
  * https://review.openstack.org/#/c/374883/ (now merged) details the process for proposing features in manila
  * Driver features do not require specs
  * High priority specs will be added to the specs repo, categorized by release
  * A merged spec requires code review attention across the community
* Generic Driver Enhancements
  * A new lightweight test image is proposed: https://review.openstack.org/#/c/392307
  * This will be tested in our gate for the generic driver's storage virtual machines
  * We're hoping this lightweight service image solves scalability concerns in the gate
  * Distro-specific logic in the generic driver will be consolidated and refactored
* Experimental APIs
  * Driver support for experimental features isn't catching up, because product managers (allegedly) feel that experimental features are in too much churn to devote development effort to them
  * This is not the meaning of an experimental feature. Our intention is to preserve user sanity with respect to the API
  * We want feedback from actual users who don't use manila from gerrit; this is why the APIs are experimental
  * Adoption across drivers of features with 'experimental' APIs can only be driven by the vendors themselves; the community can only make the feature better and work towards generalizing the design as much as possible
* Container Driver
  * Currently supports only CIFS because Ganesha had issues with access control
  * Ganesha has now been fixed (v2.4); the container driver needs to support NFS via Ganesha now
* Improving Ganesha in manila
  * Ganesha's bug fix for access control merged recently
  * rraja will improve the ganesha library within manila
* VMT/ Security / Vulnerability Managed tag for manila
  * We had our first security bug
  * We need a security-focused sub-team to help manila achieve the vulnerability-managed tag
  * Tom Barron will lead this effort
* Share Migration
  * The nondisruptive parameter does not default to True (along the lines of preserve-metadata or writable)
  * preserve-snapshots parameter will be added in ocata
  * Changing protocols via migration
    * access rules need to be cleared
    * This will be more appropriate as a "share modify" operation rather than being allowed via the share migration API
  * Spec for Ocata improvements: https://review.openstack.org/#/c/392291/

Action Items:
(cknight/bswartz): Start an ML discussion regarding the discrepancy in share protocols and access types mapping. Make changes to the allow-access API to disallow the discrepancies.
(volunteer/s): Document which common capabilities support being reported as lists and which do not.
(bswartz): Send ML post about deadlines for Ocata
(markstur): Determine if IBM/GPFS driver can make use of the improved ganesha library within manila

Discussion items we didn't get to:
* ensure share
* manage API requirements
* manila-ui displaying only enabled share protocols
* app-catalog use case
* share replica quotas

