Re: [ceph-users] Understanding incomplete PGs

2019-07-06 Thread Torben Hørup
Hi 

The "ec unable to recover when below min size" thing has very recently
been fixed for octopus. 

See https://tracker.ceph.com/issues/18749 and
https://github.com/ceph/ceph/pull/17619 

The docs have been updated with a section on this issue:
http://docs.ceph.com/docs/master/rados/operations/erasure-code/#erasure-coded-pool-recovery
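
For anyone still on a pre-Octopus release, recovery below min_size won't
happen on its own. As I understand the change, the Octopus behaviour is
governed by a new OSD option; the option name below is my best recollection
of that change, so please verify it against your version's docs:

    # check whether recovery below min_size is permitted (Octopus and later)
    ceph config get osd osd_allow_recovery_below_min_size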

/Torben 

On 05.07.2019 11:50, Paul Emmerich wrote:

> * There are virtually no use cases for EC pools with m=1; this is a bad 
> configuration as you can't have both availability and durability 
> 
> * Due to weird internal restrictions, EC pools below their min size can't 
> recover; you'll probably have to reduce min_size temporarily to recover them 
> 
> * Depending on your version it might be necessary to restart some of the OSDs 
> due to a (since fixed) bug that caused some objects to be marked as degraded 
> if you remove or restart an OSD while you have remapped objects 
> 
> * Run "ceph osd safe-to-destroy X" to check if it's safe to destroy a given 
> OSD
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90 
> 
> On Fri, Jul 5, 2019 at 1:17 AM Kyle  wrote: 
> 
>> Hello,
>> 
>> I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore
>> on lvm) and recently ran into a problem with 17 pgs marked as incomplete
>> after adding/removing OSDs.
>> 
>> Here's the sequence of events:
>> 1. 7 osds in the cluster, health is OK, all pgs are active+clean
>> 2. 3 new osds on a new host are added, lots of backfilling in progress
>> 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
>> 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
>> 5. ceph osd out 6
>> 6. systemctl stop ceph-osd@6
>> 7. the drive backing osd 6 is pulled and wiped
>> 8. backfilling has now finished; all pgs are active+clean except for 17
>> incomplete pgs
>> 
>> From reading the docs, it sounds like there has been unrecoverable data loss
>> in those 17 pgs. That raises some questions for me:
>> 
>> Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of
>> the current actual allocation?
>> 
>> Why is there data loss from a single osd being removed? Shouldn't that be
>> recoverable? All pools in the cluster are either replicated 3 or
>> erasure-coded k=2,m=1 with the default "host" failure domain. They shouldn't
>> suffer data loss with a single osd being removed even if there were no
>> reweighting beforehand. Does the backfilling temporarily reduce data
>> durability in some way?
>> 
>> Is there a way to see which pgs actually have data on a given osd?
>> 
>> I attached an example of one of the incomplete pgs.
>> 
>> Thanks for any help,
>> 
>> Kyle

 



Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Kyle
On Friday, July 5, 2019 11:50:44 AM CDT Paul Emmerich wrote:
> * There are virtually no use cases for EC pools with m=1; this is a bad
> configuration as you can't have both availability and durability

I'll have to look into this more. The cluster only has 4 hosts, so it might be 
worth switching to osd failure domain for the EC pools and using k=5,m=2.
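
In case it helps others considering the same move, a rough sketch of what
that could look like (the profile name, pool name, and pg count are
placeholders; an EC profile can't be changed on an existing pool, so this
means creating a new pool and migrating the data into it):

    # k=5,m=2 profile with osd-level failure domain
    ceph osd erasure-code-profile set ec52-osd k=5 m=2 crush-failure-domain=osd
    # new pool backed by that profile
    ceph osd pool create ecpool-new 64 64 erasure ec52-osd

One caveat with crush-failure-domain=osd on 4 hosts: several of the 7 shards
of a pg can land on the same host, so losing one host may make some pgs
unavailable.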

> 
> * Due to weird internal restrictions, EC pools below their min size can't
> recover; you'll probably have to reduce min_size temporarily to recover them

Lowering min_size to 2 did allow it to recover.

> 
> * Depending on your version it might be necessary to restart some of the
> OSDs due to a (since fixed) bug that caused some objects to be marked as
> degraded if you remove or restart an OSD while you have remapped objects
> 
> * Run "ceph osd safe-to-destroy X" to check if it's safe to destroy a given
> OSD

Excellent, thanks!

> 
> > Hello,
> > 
> > I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore
> > on lvm) and recently ran into a problem with 17 pgs marked as incomplete
> > after adding/removing OSDs.
> > 
> > Here's the sequence of events:
> > 1. 7 osds in the cluster, health is OK, all pgs are active+clean
> > 2. 3 new osds on a new host are added, lots of backfilling in progress
> > 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
> > 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
> > 5. ceph osd out 6
> > 6. systemctl stop ceph-osd@6
> > 7. the drive backing osd 6 is pulled and wiped
> > 8. backfilling has now finished; all pgs are active+clean except for 17
> > incomplete pgs
> > 
> > From reading the docs, it sounds like there has been unrecoverable data loss
> > in those 17 pgs. That raises some questions for me:
> > 
> > Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of
> > the current actual allocation?
> > 
> > Why is there data loss from a single osd being removed? Shouldn't that be
> > recoverable? All pools in the cluster are either replicated 3 or
> > erasure-coded k=2,m=1 with the default "host" failure domain. They shouldn't
> > suffer data loss with a single osd being removed even if there were no
> > reweighting beforehand. Does the backfilling temporarily reduce data
> > durability in some way?
> > 
> > Is there a way to see which pgs actually have data on a given osd?
> > 
> > I attached an example of one of the incomplete pgs.
> > 
> > Thanks for any help,
> > 
> > Kyle




Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Kyle
On Friday, July 5, 2019 11:28:32 AM CDT Caspar Smit wrote:
> Kyle,
> 
> Was the cluster still backfilling when you removed osd 6 or did you only
> check its utilization?

Yes, still backfilling.

> 
> Running an EC pool with m=1 is a bad idea. EC pool min_size = k+1, so losing
> a single OSD results in inaccessible data.
> Your incomplete PGs are probably all EC pool pgs; please verify.

Yes, also correct.

> 
> If the above statement is true, you could *temporarily* set min_size to 2
> (on your EC pools) to regain access to your data, but this is a very
> dangerous action. Losing another OSD during this period results in actual
> data loss.

This resolved the issue. I had seen reducing min_size mentioned elsewhere, but 
for some reason I thought that applied only to replicated pools. Thank you!
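
For anyone else who shared that misconception: min_size exists on both pool
types, it just defaults differently. A quick way to compare (pool names here
are placeholders):

    # replicated pool: with size=3, min_size defaults to 2
    ceph osd pool get rbd-pool min_size
    # EC pool: min_size defaults to k+1, i.e. 3 for k=2,m=1
    ceph osd pool get ecpool min_size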

> 
> Kind regards,
> Caspar Smit
> 
> On Fri, Jul 5, 2019 at 1:17 AM Kyle  wrote:
> > Hello,
> > 
> > I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore
> > on lvm) and recently ran into a problem with 17 pgs marked as incomplete
> > after adding/removing OSDs.
> > 
> > Here's the sequence of events:
> > 1. 7 osds in the cluster, health is OK, all pgs are active+clean
> > 2. 3 new osds on a new host are added, lots of backfilling in progress
> > 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
> > 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
> > 5. ceph osd out 6
> > 6. systemctl stop ceph-osd@6
> > 7. the drive backing osd 6 is pulled and wiped
> > 8. backfilling has now finished; all pgs are active+clean except for 17
> > incomplete pgs
> > 
> > From reading the docs, it sounds like there has been unrecoverable data loss
> > in those 17 pgs. That raises some questions for me:
> > 
> > Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of
> > the current actual allocation?
> > 
> > Why is there data loss from a single osd being removed? Shouldn't that be
> > recoverable? All pools in the cluster are either replicated 3 or
> > erasure-coded k=2,m=1 with the default "host" failure domain. They shouldn't
> > suffer data loss with a single osd being removed even if there were no
> > reweighting beforehand. Does the backfilling temporarily reduce data
> > durability in some way?
> > 
> > Is there a way to see which pgs actually have data on a given osd?
> > 
> > I attached an example of one of the incomplete pgs.
> > 
> > Thanks for any help,
> > 
> > Kyle




Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Paul Emmerich
* There are virtually no use cases for EC pools with m=1; this is a bad
configuration as you can't have both availability and durability

* Due to weird internal restrictions, EC pools below their min size can't
recover; you'll probably have to reduce min_size temporarily to recover them

* Depending on your version it might be necessary to restart some of the
OSDs due to a (since fixed) bug that caused some objects to be marked as
degraded if you remove or restart an OSD while you have remapped objects

* Run "ceph osd safe-to-destroy X" to check if it's safe to destroy a given
OSD
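
Putting the last two points together, a sketch of a safer removal sequence
(osd.6 as the example; the restart step is only needed if you hit the
degraded-objects bug mentioned above):

    # restart the affected OSD if objects are wrongly shown as degraded
    systemctl restart ceph-osd@6
    # drain the OSD, then confirm it holds no data before pulling the drive
    ceph osd crush reweight osd.6 0
    ceph osd safe-to-destroy 6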




-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Fri, Jul 5, 2019 at 1:17 AM Kyle  wrote:

> Hello,
>
> I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore
> on lvm) and recently ran into a problem with 17 pgs marked as incomplete
> after adding/removing OSDs.
>
> Here's the sequence of events:
> 1. 7 osds in the cluster, health is OK, all pgs are active+clean
> 2. 3 new osds on a new host are added, lots of backfilling in progress
> 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
> 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
> 5. ceph osd out 6
> 6. systemctl stop ceph-osd@6
> 7. the drive backing osd 6 is pulled and wiped
> 8. backfilling has now finished; all pgs are active+clean except for 17
> incomplete pgs
>
> From reading the docs, it sounds like there has been unrecoverable data loss
> in those 17 pgs. That raises some questions for me:
>
> Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of
> the current actual allocation?
>
> Why is there data loss from a single osd being removed? Shouldn't that be
> recoverable? All pools in the cluster are either replicated 3 or
> erasure-coded k=2,m=1 with the default "host" failure domain. They shouldn't
> suffer data loss with a single osd being removed even if there were no
> reweighting beforehand. Does the backfilling temporarily reduce data
> durability in some way?
>
> Is there a way to see which pgs actually have data on a given osd?
>
> I attached an example of one of the incomplete pgs.
>
> Thanks for any help,
>
> Kyle


Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Caspar Smit
Kyle,

Was the cluster still backfilling when you removed osd 6 or did you only
check its utilization?

Running an EC pool with m=1 is a bad idea. EC pool min_size = k+1, so losing
a single OSD results in inaccessible data.
Your incomplete PGs are probably all EC pool pgs; please verify.
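
One way to verify (the prefix of each pg id is the id of the pool it belongs
to):

    # list the incomplete pgs; a pg id like "2.1a" means pg 1a in pool 2
    ceph pg ls incomplete
    # match the pool ids against the pool list to see which are erasure coded
    ceph osd pool ls detail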

If the above statement is true, you could *temporarily* set min_size to 2
(on your EC pools) to regain access to your data, but this is a very
dangerous action. Losing another OSD during this period results in actual
data loss.
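
Concretely, something along these lines (the pool name is a placeholder; do
this one pool at a time and put the old value back as soon as recovery
finishes):

    # allow the EC pgs to go active and recover with only k=2 shards
    ceph osd pool set ecpool min_size 2
    # ... wait for the incomplete pgs to recover, then restore k+1
    ceph osd pool set ecpool min_size 3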

Kind regards,
Caspar Smit

On Fri, Jul 5, 2019 at 1:17 AM Kyle  wrote:

> Hello,
>
> I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore
> on lvm) and recently ran into a problem with 17 pgs marked as incomplete
> after adding/removing OSDs.
>
> Here's the sequence of events:
> 1. 7 osds in the cluster, health is OK, all pgs are active+clean
> 2. 3 new osds on a new host are added, lots of backfilling in progress
> 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
> 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
> 5. ceph osd out 6
> 6. systemctl stop ceph-osd@6
> 7. the drive backing osd 6 is pulled and wiped
> 8. backfilling has now finished; all pgs are active+clean except for 17
> incomplete pgs
>
> From reading the docs, it sounds like there has been unrecoverable data loss
> in those 17 pgs. That raises some questions for me:
>
> Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of
> the current actual allocation?
>
> Why is there data loss from a single osd being removed? Shouldn't that be
> recoverable? All pools in the cluster are either replicated 3 or
> erasure-coded k=2,m=1 with the default "host" failure domain. They shouldn't
> suffer data loss with a single osd being removed even if there were no
> reweighting beforehand. Does the backfilling temporarily reduce data
> durability in some way?
>
> Is there a way to see which pgs actually have data on a given osd?
>
> I attached an example of one of the incomplete pgs.
>
> Thanks for any help,
>
> Kyle