On Tue, Dec 13, 2016 at 1:49 PM, Diego Castro <[email protected]> wrote:
> 2016-12-13 15:24 GMT-03:00 Jason DeTiberus <[email protected]>:
>
>> On Tue, Dec 13, 2016 at 12:37 PM, Diego Castro
>> <[email protected]> wrote:
>>
>>> Thanks John, it's very helpful.
>>> Looking over the playbook code, it seems to replace all certificates
>>> and trigger node evacuation to update the CA for all pods. I
>>> definitely don't want that!
>>
>> It should only do that when openshift_certificates_redeploy_ca is set
>> to True; otherwise it should just redeploy certificates on the masters.
>
> Perfect!
>
>> There is also a PR for splitting out the certificate redeploy playbooks
>> to allow for more flexibility when running:
>> https://github.com/openshift/openshift-ansible/pull/2671
>>
>>> - etcd won't be a problem, since I can replace the certs, migrate the
>>> data dir, and restart the masters.
>>
>> We don't currently support automated resizing or migration of etcd, but
>> this approach should work just fine.
>>
>> That said, one *could* do the following (see the etcdctl sketch below
>> my reply):
>> - Add the new etcd hosts to the inventory
>> - Run Ansible against the hosts (I suspect it will fail on service
>>   startup)
>> - Add the newly provisioned etcd hosts manually to the cluster using
>>   etcdctl
>> - If Ansible failed on the previous step, re-run Ansible to finish
>>   landing the etcd config change
>> - Remove the old etcd hosts from the etcd cluster using etcdctl
>> - Update the inventory to remove the old etcd hosts
>> - Run Ansible to remove the old etcd hosts from the master configs
>
> I'll do it!
>
>>> - The masters are a big issue, since I had to change the public
>>> cluster hostname.
>>
>> Indeed, but there shouldn't be a huge disruption from doing a rolling
>> update of the master services to land the new certificate. The
>> controllers service will migrate (possibly multiple times), but that
>> should be mostly transparent to running apps and users.
>
> What do you mean by 'rolling update'? Is it the same process as for
> nodes, which I do by running the scaleup playbook?

For masters, this might work (a rough inventory/playbook sketch follows
below):
- If you are using named certificates:
  - Update the inventory:
    - Update openshift_master_named_certificates to add the cert for the
      new cluster name(s)
    - Add the additional master hosts to the inventory without updating
      the cluster hostname(s)
  - Run Ansible to land the new named certificate on the existing hosts
    and to install/configure the new hosts

At this point, the cluster should be up and functional with all masters,
and it should respond and serve the API/console using the new cluster
hostname, but nodes will still be configured to use the old cluster
hostname.

The certificate redeploy PR covers how to update the node kubeconfigs to
point to the new master host, which would need to be done on each host
(along with a node reboot) before the old cluster hostname/load balancer
is removed.

One other thing to keep in mind is that you will want to migrate
/etc/etcd/generated_certs and /etc/origin/generated_configs to the new
"first etcd" and "first master", respectively, after removing the old
hosts.
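For the manual etcdctl steps, something along these lines should work.
This is only a rough sketch: the hostnames, certificate paths, and the
member ID are placeholders, and the flags are the etcdctl v2 syntax
shipped with OpenShift 3.x, so adjust for your environment:

    # Client connection settings for the existing cluster
    CERTS="--ca-file /etc/etcd/ca.crt --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key"
    ENDPOINT="--endpoints https://old-etcd-1.example.com:2379"

    # Check current membership and health before changing anything
    etcdctl $CERTS $ENDPOINT member list
    etcdctl $CERTS $ENDPOINT cluster-health

    # Add a new member; this prints the ETCD_NAME / ETCD_INITIAL_CLUSTER /
    # ETCD_INITIAL_CLUSTER_STATE values the new host must start with
    etcdctl $CERTS $ENDPOINT member add new-etcd-1 https://new-etcd-1.example.com:2380

    # Once the new members report healthy, remove an old member by the
    # hex ID shown in 'member list' (this ID is made up)
    etcdctl $CERTS $ENDPOINT member remove 8e9e05c52164694d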
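For the named-certificate/master side, a minimal sketch of the inventory
changes and the playbook run. Again, hostnames, file paths, and the
inventory location are placeholders, and the scaleup playbook path should
be double-checked against your openshift-ansible checkout:

    # Hypothetical additions to the BYO inventory (e.g. /etc/ansible/hosts);
    # remember that new_masters must also be listed under [OSEv3:children]:
    #
    #   [OSEv3:vars]
    #   openshift_master_named_certificates=[{"certfile": "/root/new-cluster.crt", "keyfile": "/root/new-cluster.key", "names": ["new-console.example.com"]}]
    #
    #   [new_masters]
    #   new-master-1.example.com
    #   new-master-2.example.com

    # Land the named certificate and configure the new master hosts
    ansible-playbook -i /etc/ansible/hosts playbooks/byo/openshift-master/scaleup.yml

    # After the old hosts are removed, migrate the generation directories
    # to the new "first etcd" and "first master" (run from the old hosts)
    rsync -a /etc/etcd/generated_certs/ new-etcd-1.example.com:/etc/etcd/generated_certs/
    rsync -a /etc/origin/generated_configs/ new-master-1.example.com:/etc/origin/generated_configs/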
> Once I get the new nodes up and running, can I just shut down the old
> servers and update the inventory? Just wondering in case something goes
> wrong while replacing masters[0].
>
>>> ---
>>> Diego Castro / The CloudFather
>>> GetupCloud.com - Eliminamos a Gravidade
>>>
>>> 2016-12-13 11:17 GMT-03:00 Skarbek, John <[email protected]>:
>>>
>>>> Diego,
>>>>
>>>> We've done a similar thing in our environment. I'm not sure if the
>>>> openshift-ansible guys have a better way, but this is what we did at
>>>> the time.
>>>>
>>>> We created a custom playbook to run through all the steps as
>>>> necessary. And due to the version of openshift-ansible we were
>>>> running, we had to be careful when we did whichever server was at
>>>> index 0 in the array of hosts. (I *think* they have resolved that
>>>> problem now.)
>>>>
>>>> First we created a play that copied the necessary certificates to all
>>>> the nodes, so that it didn't matter which node was at index 0 of the
>>>> list of nodes. We had the playbook limited to operate on one node at
>>>> a time, which dealt with tearing it down. Then we'd run the deploy on
>>>> the entire cluster. For the new node, everything was installed as
>>>> necessary. For the rest of the cluster it was mostly a no-op. We use
>>>> static addresses, so the only thing that really changed was the
>>>> underlying host. Certificate regeneration was limited.
>>>>
>>>> For the master nodes, this was pretty easy. For the etcd nodes, we
>>>> had to do a bit of extra work, as the nodes being added to the
>>>> cluster had different member IDs than what the cluster thought those
>>>> nodes ought to have. Following etcd's docs on member migration should
>>>> help you out here.
>>>>
>>>> The only major part we had to be careful of was doing the work on the
>>>> node that was going to be the first node. Due to the way the
>>>> playbooks operated, it put a lot of config and certificate details
>>>> there that would get copied around. If they've addressed this, it
>>>> shouldn't be an issue, but at the time we got around it by simply
>>>> adjusting the order of the nodes defined in our inventory file.
>>>>
>>>> A wee bit laborious, but definitely doable.
>>>>
>>>> In our case we didn't experience any downtime: the master nodes
>>>> cycled through the haproxy box appropriately, and the etcd nodes were
>>>> removed from and added to the cluster without any major headaches.
>>>>
>>>> Though I'm now curious whether the team at Red Hat working on
>>>> openshift-ansible has addressed any of these sorts of issues to make
>>>> it easier.
>>>>
>>>> --
>>>> John Skarbek
>>>>
>>>> On December 13, 2016 at 08:35:54, Diego Castro
>>>> ([email protected]) wrote:
>>>>
>>>> Hello, I have to migrate my production HA masters/etcd servers to new
>>>> boxes.
>>>>
>>>> Steps intended:
>>>>
>>>> 1) Create the new masters and etcd machines using the byo/config
>>>>    playbook.
>>>> 2) Stop the old masters and move the etcd data directory to the new
>>>>    etcd servers.
>>>> 3) Start the new masters.
>>>> 4) Run byo/openshift-cluster/redeploy-certificates.yml against the
>>>>    cluster to update the CA and node configuration (see the sketch at
>>>>    the end of this message).
>>>>
>>>> Question:
>>>> - Is this the best or right way to do it, given that this is a
>>>>   production cluster and I want minimal downtime?
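Regarding step 4 of the original plan, a minimal invocation sketch (the
inventory path is a placeholder; per my earlier reply, the CA is only
regenerated, and node evacuation triggered, when
openshift_certificates_redeploy_ca is set to True):

    # Redeploy cluster certificates while keeping the existing CA
    ansible-playbook -i /etc/ansible/hosts \
        playbooks/byo/openshift-cluster/redeploy-certificates.yml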
>>>> ---
>>>> Diego Castro / The CloudFather
>>>> GetupCloud.com - Eliminamos a Gravidade

--
Jason DeTiberus
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
