Re: [openstack-dev] [Fuel] Restore OSD devices with Puppet Ceph module

2015-07-24 Thread Mykola Golub
On Wed, Jul 22, 2015 at 3:52 PM, Oleg Gelbukh ogelb...@mirantis.com wrote:

 Greetings,

 While working on an upgrade of OpenStack with the Fuel installer, I met
 a requirement to re-add OSD devices with an existing data set to a Ceph
 cluster using the Puppet module. The node is reinstalled during the
 upgrade, so the disks used for OSDs are not mounted at Puppet runtime.

 The current version of the Ceph module in fuel-library only supports
 the addition of new OSD devices. Mounted devices are skipped. Unmounted
 devices with the Ceph UUID in the GPT label are passed to the
 'ceph-deploy osd prepare' command, which formats the device and
 recreates the file system, so all existing data is lost.
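 For illustration, the current behaviour boils down to something like
 the following (a rough sketch, not the actual fuel-library manifest;
 the GUID is the GPT partition type that ceph-disk assigns to OSD data
 partitions):

     # Sketch of the current device selection logic.
     CEPH_OSD_GUID="4fbd7e29-9d25-41b8-afd0-062c0ceff05d"
     for part in /dev/sd?1; do
         mount | grep -q "^$part " && continue   # mounted devices are skipped
         guid=$(blkid -p -o value -s PART_ENTRY_TYPE "$part")
         if [ "$guid" = "$CEPH_OSD_GUID" ]; then
             # prepare re-creates the filesystem: existing data is lost
             ceph-deploy osd prepare "$(hostname):$part"
         fi
     done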

 I proposed a patch to allow support for OSD devices with an existing data set:
 https://review.openstack.org/#/c/203639/2

 However, this fix is very straightforward and doesn't account for
 various corner cases, as Mykola Golub pointed out in review. As this
 problem seems rather significant to me, I'd like to bring this
 discussion to the broader audience.

 So, here's the comment with my replies inline:

Oleg,

Sorry for the delay. I saw your message but missed that, apart from my
comments, it contained your replies. See my comments below.


 I am not sure just reactivating disks that have a filesystem is a safe
 approach:

 1) If you are deploying a mix of new and restored disks, you may end up
 with conflicting OSDs joining the cluster with the same ID.
 2) It makes sense to restore OSDs only if a monitor (cluster) is
 restored; otherwise activation of old OSDs will fail.
 3) It might happen that the partition contains a valid filesystem by
 accident (e.g. the user reused disks/hosts from another cluster) -- it
 will not join the cluster because of a wrong fsid and credentials, but
 the deployment will unexpectedly fail.

 1) As far as I can tell, OSD device IDs are assigned by the Ceph
 cluster based on the already existing devices. So, if some ID is stored
 on the device, then either a device with the given ID already exists in
 the cluster and no other new device will get the same ID, or the
 cluster doesn't know about a device with the given ID, which means the
 data placement was already lost before.
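 A toy illustration of the allocation behaviour (the cluster state here
 is hypothetical; 'ceph osd create' is the command the deploy tools
 ultimately use to allocate an id, and it hands out the lowest id not
 present in the osd map):

     ceph osd ls      # suppose the osd map contains ids 0, 1 and 3
     ceph osd create  # would then allocate (and print) id 2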

I was thinking here about the case when you are restoring the cluster
from scratch, re-adding OSD devices to the osd map. So I agree: in your
case, if the OSDs are not removed from the cluster map, it should work.

 2) This can be fixed by adding a check that ensures that the fsid
 parameter in ceph.conf on the node and the cluster-fsid on the device
 are equal. Otherwise the device is treated like a new device, i.e.
 passed to 'ceph-deploy osd prepare'.

Yes, I think after successfully mounting this device we should check
both that the cluster ID matches and that the OSD ID matches what is in
the cluster map.
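
Roughly, something like this (an illustrative sketch only, assuming the
default FileStore on-disk layout, where the OSD data directory contains
the 'ceph_fsid' and 'whoami' files):

    # Probe a candidate OSD partition and decide whether it is safe
    # to (re)activate it in this cluster.
    mkdir -p /mnt/osd-probe
    mount "$part" /mnt/osd-probe
    disk_fsid=$(cat /mnt/osd-probe/ceph_fsid)  # cluster fsid recorded on disk
    disk_id=$(cat /mnt/osd-probe/whoami)       # OSD id recorded on disk
    umount /mnt/osd-probe

    if [ "$disk_fsid" = "$(ceph fsid)" ] && ceph osd ls | grep -qx "$disk_id"; then
        echo "$part belongs to this cluster as osd.$disk_id -- safe to activate"
    else
        echo "$part does not match -- treat as a new device (prepare)"
    fi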

 3) This situation would be covered by the previous check, in my
 understanding.

Yes, if you add a check like the above, and failing the check causes a
redeploy, it should work.


 Is it possible to pass information that the cluster is restored using
 partition preservation? Because I think a much safer approach is:

 1) Pass some flag from the user indicating that we are restoring the
 cluster.
 2) Restore the controller (monitor) and abort the deployment if it
 fails.
 3) When deploying an OSD host, if the 'restore' flag is present, skip
 the prepare step and only try to activate all disks, if possible (we
 might want to ignore activation errors and continue with the other
 disks, so that we restore as many OSDs as possible), e.g. as sketched
 below.
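 A minimal sketch of step 3 (the 'RESTORE' flag and the 'osd_partitions'
 list are hypothetical names, not anything that exists in fuel-library
 today):

     # When restoring, skip 'prepare' and try to activate each disk,
     # ignoring per-disk failures so as many OSDs as possible come back.
     if [ "$RESTORE" = "true" ]; then
         for part in "${osd_partitions[@]}"; do
             ceph-deploy osd activate "$(hostname):$part" ||
                 echo "WARN: failed to activate $part, continuing" >&2
         done
     fi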

 The case I want to support with this change is not restoration of the
 whole cluster, but rather support for reinstallation of an OSD node's
 operating system. For this case, the approach you propose actually
 seems more correct than my implementation. For a node being
 reinstalled we do not expect new devices, only ones with the existing
 data set, so we don't need to check for this specifically, but can
 simply skip prepare for all devices.

If this is for the case of restoring a single OSD node, I think you can
go forward with your approach. If it were intended for the case when
the whole cluster needs to be recovered, then I would prefer mine.

I just thought about the whole-cluster restore case because recently
some people were asking me about the possibility of restoring after the
whole cluster was lost.


 We still need to check that the value of fsid on the disk is consistent
 with the cluster's fsid.

 Which issues should we anticipate with this kind of approach?

Apart from the issues already mentioned, which you agreed to address, I
think nothing. I am looking forward to reviewing your updated patch :-)


 Another question that is still unclear to me is whether someone really
 needs support for a hybrid use case, where new and existing unmounted
 OSD devices are mixed in one OSD node.

I don't think we need to support it, but it is not forbidden for
users... We don't know what state the cluster a user is trying to
restore is in; I could imagine it having both old and new OSD disks.


 --
 Best regards,
 Oleg Gelbukh
