Hi Daneyon,

We have been working on deploying a highly-available Barbican at Rackspace for a while now. We just recently made it publicly available through an early access program:
http://go.rackspace.com/cloud-keep.html

We don't have a full deployment of Barbican yet. Our early access deployment does not include a Rabbit queue or barbican-worker processes, for example. This means that we can't yet process /orders requests, but we do support secret storage backed by SafeNet Luna SA HSMs via the PKCS#11 Cryptographic Plugin. Our goal is to provide 99.95% availability with a minimum throughput of 100 req/sec once we move to Unlimited Availability later this year, but we still have some work to do to get there.

To give you a better idea of what our deployment looks like, here's what we have in production today:

On the front end we have two sets of VM pairs running haproxy [1] and keepalived [2], using a shared IP address per set. The two sets represent the blue and green node sets for blue-green zero-downtime deployments. [3] Our DNS entry points to the shared IP of the green lb pair. The blue lb set is only accessible from our control plane, and is used for functional testing of code before it is promoted to green. At any given time only one VM in each lb set is active; the other is a hot standby that keepalived can instantly promote if needed while keeping the same IP address. This lets us fail over haproxy faster than DNS can propagate.

Requests are then load-balanced to at least two "API Nodes". These are VMs set up as Docker hosts. We run Repose [4], the barbican-api process, and plight [5] each inside their own container. Repose is used for rate limiting, token validation, and access control. Plight is used to designate an API node as either blue or green. Each haproxy set is configured to route only to the API nodes that match its color; however, both sets constantly query all nodes for blue/green status (more on this later).

For data storage we are running a MariaDB Galera Cluster [6] in multi-master mode with 3x VM nodes.
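In case it helps to see the color-based routing concretely, here is a minimal haproxy backend sketch. This is not our actual config -- the ports, paths, addresses, and the exact shape of plight's status response are all illustrative assumptions -- but it shows the general idea of using plight as a health check so a backend only routes to nodes of its own color:

```
# Hypothetical backend for the green lb set. A node only counts as "up"
# here when its plight container reports it as green.
backend barbican_green
    balance roundrobin
    # Health check against plight's HTTP status endpoint
    # (port and path are made up for this sketch)
    option httpchk GET /status
    http-check expect string green
    server api01 10.0.0.11:9311 check port 8000
    server api02 10.0.0.12:9311 check port 8000
```

The blue lb set would carry an identical backend that expects "blue" instead, which is how a node can move between sets just by changing its plight state.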
The cluster sits behind yet another haproxy+keepalived pair, so that our DB connections are load-balanced across all three masters. This was mainly driven by our decision to host our control plane in our public cloud, since the multi-master setup gives us better fault tolerance in the likely event of losing one of the DB nodes. Prior to this cloud-based deployment we were using PostgreSQL in a master+slave configuration, but we didn't have a good solution for fully automatic failover.

Choosing the right Cryptographic Plugin/Backend is probably going to be the hardest part of planning a highly-available deployment. For our deployments we are using pairs of Luna SA HSMs in HA mode. [7] This is currently our bottleneck, and for Newton we plan to focus most of our development effort on improving the performance of the PKCS#11 Plugin.

Originally we wanted to store one key per project in the HSM itself. However, we found out early on that the amount of storage in the Lunas is very limited, and completely inadequate for the scale we want to operate at. This led to the development of the pkek-wrapping model that the PKCS#11 plugin currently uses: the HSM holds only a master key, which wraps the per-project keys (pKEKs), and the wrapped pKEKs are stored in the database. The trade-off is that a single transaction now requires more round trips to the HSM.

The KMIP Plugin does not use the pkek-wrapping model, so it is limited by the amount of storage available in the KMIP device being used. Note that when deploying Barbican with the KMIP Plugin, the database capacity is not the limiting factor, since the secrets live in the KMIP device. I'm not super familiar with DogTag, so I can't speak to the limitations of choosing the DogTag Plugin.

Lastly, since our Lunas are racked in a dedicated environment, we have physical firewalls (F5s) in front of them. The barbican-api containers on the API nodes connect to the HSMs over a VPN tunnel from our public cloud environment to the dedicated environment.
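To make the pkek-wrapping trade-off concrete, here's a purely illustrative Python sketch -- this is not Barbican's plugin code, the XOR "wrapping" is a toy stand-in for the HSM's real AES key wrap, and all the names are made up. The point it shows: the HSM stores only one master key regardless of project count, wrapped pKEKs live in the database, and every secret operation costs an extra round trip to the HSM to unwrap the project's pKEK:

```python
import secrets

class FakeHSM:
    """Toy stand-in for a Luna SA: one master key, counts round trips."""
    def __init__(self):
        self.master_kek = secrets.token_bytes(32)  # never leaves the "HSM"
        self.round_trips = 0

    def _xor(self, key, data):
        # Stand-in for AES key wrap performed inside the HSM.
        return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

    def wrap(self, pkek):
        self.round_trips += 1
        return self._xor(self.master_kek, pkek)

    def unwrap(self, wrapped_pkek):
        self.round_trips += 1
        return self._xor(self.master_kek, wrapped_pkek)

hsm = FakeHSM()
db = {}  # stands in for the MariaDB cluster

# Project onboarding: generate a pKEK, wrap it in the HSM, and store only
# the wrapped form in the database. HSM storage stays constant.
pkek = secrets.token_bytes(32)
db["project-a/pkek"] = hsm.wrap(pkek)
del pkek  # the plaintext pKEK is never persisted

def store_secret(project, name, plaintext):
    # Each operation must first unwrap the project's pKEK -- this is the
    # extra HSM hop that the pkek model introduces.
    key = hsm.unwrap(db[f"{project}/pkek"])
    db[f"{project}/{name}"] = bytes(
        d ^ key[i % len(key)] for i, d in enumerate(plaintext))

def get_secret(project, name):
    key = hsm.unwrap(db[f"{project}/pkek"])
    ct = db[f"{project}/{name}"]
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(ct))

store_secret("project-a", "db-password", b"hunter2")
print(get_secret("project-a", "db-password"))  # b'hunter2'
print(hsm.round_trips)  # 3: one wrap at onboarding + one unwrap per operation
```

In the real plugin the unwrap and the secret encryption both happen inside the HSM, but the shape of the cost is the same, which is why the HSM ends up as the bottleneck.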
We have two identical environments right now (staging and production), and we will be adding more production environments in other data centers later this year. We deploy new code often, and production usually runs only a week or two behind the barbican master branch.

For zero-downtime deployments, we've asked our community to stagger database schema changes across separate commits. The idea is that the schema change is introduced first, in its own commit. This ensures that the current codebase can continue to operate with the new schema. The actual code changes are made in a follow-up patch.

When we prepare to deploy, we first update the database schema. This is the only potentially disruptive operation we currently have. In theory the existing API nodes continue to function with the new schema. We then build up new blue API nodes with the new code to be rolled out. All the new nodes are accessible through our blue lb, and this is where we run our test suite to make sure everything is still good. If the tests all pass, the new blue set is promoted to green, and the previously-green set is slowly demoted to blue. We keep the now-blue nodes around in case something breaks and we need to quickly roll back to the previous version. If all goes well in staging, we do it all over again in prod.

The whole thing is driven through Jenkins using Ansible for configuration management. It's not fully automated in the sense that someone still has to push the button in Jenkins to get things going, but once we mature our pipeline a bit more we plan to set it on cruise control.

Next steps for us after we sort out PKCS#11 performance will be to deploy an HA RabbitMQ and N api-workers. I don't think we'll be setting up the keystone-listeners any time soon.

I hope that gives you a good starting point for planning your HA Barbican deployment. Let me know if you have any more questions.
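A toy sqlite3 sketch (the table and column names are invented, not Barbican's schema) of why the schema commit has to land first: as long as the change is additive and backward-compatible, the currently-deployed code keeps working against the new schema, and only the follow-up patch on the new blue nodes starts using the new column:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE secrets (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO secrets (name) VALUES ('db-password')")

def old_code_read(conn):
    # The currently-deployed code names its columns explicitly and never
    # touches the new one, so the migration doesn't break it.
    return conn.execute("SELECT id, name FROM secrets").fetchall()

# Step 1 (separate commit, applied before the rollout): additive,
# backward-compatible change -- a nullable column, no rewrite of rows.
db.execute("ALTER TABLE secrets ADD COLUMN expiration TEXT")

print(old_code_read(db))  # old code still works: [(1, 'db-password')]

# Step 2 (follow-up patch, running only on the new blue nodes): code
# that actually reads and writes the new column.
db.execute("UPDATE secrets SET expiration = '2017-01-01' WHERE id = 1")
print(db.execute("SELECT name, expiration FROM secrets").fetchall())
```

A dropped or renamed column would break the old nodes mid-rollout, which is exactly what the two-commit convention is meant to prevent.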
Regards,
Douglas Mendizábal

[1] http://www.haproxy.org/
[2] http://www.keepalived.org/
[3] http://martinfowler.com/bliki/BlueGreenDeployment.html
[4] http://www.openrepose.org/
[5] https://github.com/rackerlabs/plight
[6] https://downloads.mariadb.org/mariadb-galera/
[7] http://www.safenet-inc.com/data-encryption/hardware-security-modules-hsms/luna-hsms-key-management/luna-sa-network-hsm/

On 3/21/16 1:23 PM, Daneyon Hansen (danehans) wrote:
> All,
>
> Does anyone have experience deploying Barbican in a highly-available
> fashion? If so, I'm interested in learning from your experience. Any
> insight you can provide is greatly appreciated.
>
> Regards,
> Daneyon Hansen
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
