Re: [ceph-users] OSD's won't start - thread abort

2019-07-05 Thread Gregory Farnum
On Wed, Jul 3, 2019 at 11:09 AM Austin Workman wrote:
> Decided that if all the data was going to move, I should adjust my jerasure
> ec profile from k=4, m=1 -> k=5, m=1 with force (is this even recommended vs.
> just creating new pools?)
>
> Initially it unset crush-device-class=hdd to be blank
> Re-set crush-device-class
> Couldn't determine if this had any effect on the move operations.
> Changed back to k=4

You can't change the EC parameters on existing pools; Ceph has no way
of dealing with that. If it's possible to change the profile and break
the pool (which, given the striping mismatch you cite later, seems to be
what happened), we need to fix that.
Can you describe the exact commands you ran in that timeline?
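
For context, overwriting an existing erasure-code profile requires --force
and generally looks something like the following. This is a hypothetical
reconstruction -- the profile name and values are placeholders, not the
actual commands run on this cluster:

  ceph osd erasure-code-profile get ec-k4m1
  ceph osd erasure-code-profile set ec-k4m1 k=5 m=1 crush-device-class=hdd --force
  # Pools that already use the profile are not restriped, which is why
  # mutating a profile that is in use can leave PG's with the kind of
  # stripe-size mismatch discussed in this thread.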


Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
After some creative PG surgery, everything is coming back online cleanly.
I went through one at a time (80-90 PG's) on the least-filled OSD (the new
osd.5) and export-remove'd each PG that was causing the assertion failures,
testing an OSD start after each one.  Running

  # tail -f /var/log/ceph/ceph-osd.5.log | grep -A1 "unlocked"

helped identify the PG loaded right before the assertion failure.  I've
kept the exports of the flawed PG's with the striping issue in case they
are needed for anything later on.  This allowed the OSD to finally start
with only clean PG's from that pool left.  Then I started the same process
on the other down OSD (osd.0), which is going to take forever because that
one had existing data.  I paused that, identified the incomplete/inactive
PG's, exported those from the downed osd.0, and imported them into the
osd.5 that was able to come online.  Some of the imports turned out to be
split PG's that also contained objects belonging to other missing PG's.
Using the import capability while specifying the split pgid allowed those
additional objects to import and satisfied all of the missing shards for
objects whose source PG's I hadn't yet identified.  5/6 OSD's are up and
running, all of the PG's are active, and all of the data is back working.
Still undersized/backfilling/moving, but it seems there isn't any data
loss.
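
For reference, the per-PG export/remove/import steps above were done with
ceph-objectstore-tool against stopped OSD's, roughly like this -- the
pgid, paths, and file names below are placeholders, not the exact
invocations used:

  # OSDs must be stopped while ceph-objectstore-tool touches their stores.
  systemctl stop ceph-osd@0

  # Export one PG (placeholder pgid 8.c) and remove it in a single step:
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --pgid 8.c --op export-remove --file /root/pg-8.c.export

  # Import the saved PG into another (also stopped) OSD, then restart it:
  systemctl stop ceph-osd@5
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
      --op import --file /root/pg-8.c.export
  systemctl start ceph-osd@5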

Now I can either continue going through one at a time removing the
erroneous PG's from osd.0 or potentially blow it away and start a fresh
OSD.  Is there a recommended path there?
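
For reference, "blow it away and start fresh" would presumably look
roughly like the following -- the osd id and device are placeholders, and
the redeploy step depends on the tooling in use:

  ceph osd out 0
  systemctl stop ceph-osd@0
  ceph osd purge 0 --yes-i-really-mean-it   # removes the OSD, its CRUSH entry and auth key
  ceph-volume lvm zap /dev/sdX --destroy    # wipe the old data device
  # then redeploy it as a brand-new OSD via ceph-ansible or ceph-volume lvm create
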
Second question: if I bring up the original OSD after pruning all of the
flawed PG copies with the stripe issue, is it important to also remove its
leftover copies of the PG's that were successfully imported into osd.5?
I'm thinking I would want to, and I can leave the exports around just in
case.  Once data starts changing (new writes), I would imagine the exports
wouldn't work anymore (or could they potentially screw something up?).


https://gist.githubusercontent.com/arodd/c95355a7b55f3e4a94f21bc5e801943d/raw/dfce381af603306e09c634196309d95c172961a7/osd-semi-healthy

After all of this, I'm going to make a new CephFS filesystem with new
metadata/data pools using the newer EC settings and fresh PG's, and copy
all of the data over into it.  I might consider moving to k=4, m=2 instead ;)
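
A rough sketch of that rebuild, with placeholder pool/profile/filesystem
names (exact flags may differ by release):

  ceph osd erasure-code-profile set ec-k4m2 k=4 m=2 crush-device-class=hdd
  ceph osd pool create cephfs2_metadata 32 32 replicated
  ceph osd pool create cephfs2_data 128 128 erasure ec-k4m2
  ceph osd pool set cephfs2_data allow_ec_overwrites true   # needed for CephFS data on an EC pool
  ceph fs flag set enable_multiple true --yes-i-really-mean-it   # if the old fs stays around
  ceph fs new cephfs2 cephfs2_metadata cephfs2_data   # newer releases may want --force for an EC default data pool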

On Wed, Jul 3, 2019 at 2:28 PM Austin Workman  wrote:

> That makes more sense.
>
> Setting min_size = 4 on the EC pool allows data to flow again (kind of,
> but not really, because 22 other PG's are still missing).  Maybe min_size
> was automatically raised to 5 when I adjusted the EC pool originally?  The
> exceptions are the 21 unknown and 1 down PG's, which probably depend on
> the two down OSD's.  These are probably the 22 PG's that actually got
> fully moved around (maybe even converted to k=5/m=1?).  It would be great
> if I can find a way to start those other two OSD's, and just deal with
> whatever state is causing them to crash.
>
> On Wed, Jul 3, 2019 at 2:18 PM Janne Johansson wrote:
>
>> On Wed, 3 July 2019 at 20:51, Austin Workman wrote:
>>
>>>
>>> But a very strange number shows up in the active sections of the PG's
>>> that is roughly the same as 2147483648.  This seems very odd, and maybe
>>> the value got lodged somewhere it doesn't belong, which is causing an
>>> issue.
>>>
>>>
>> That pg number is "-1" or some such sentinel for a signed 32-bit int; it
>> means "I don't know which OSD it was anymore", which you can get in PG
>> lists when OSDs are gone.
>>
>> --
>> May the most significant bit of your life be positive.
>>
>


Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
That makes more sense.

Setting min_size = 4 on the EC pool allows data to flow again (kind of,
but not really, because 22 other PG's are still missing).  Maybe min_size
was automatically raised to 5 when I adjusted the EC pool originally?  The
exceptions are the 21 unknown and 1 down PG's, which probably depend on
the two down OSD's.  These are probably the 22 PG's that actually got
fully moved around (maybe even converted to k=5/m=1?).  It would be great
if I can find a way to start those other two OSD's, and just deal with
whatever state is causing them to crash.
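
For reference, the min_size change above is just the following (the pool
name is a placeholder):

  ceph osd pool get cephfs_data_ec min_size
  ceph osd pool set cephfs_data_ec min_size 4   # with k=4 this allows I/O with no surviving redundancy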

On Wed, Jul 3, 2019 at 2:18 PM Janne Johansson  wrote:

> On Wed, 3 July 2019 at 20:51, Austin Workman wrote:
>
>>
>> But a very strange number shows up in the active sections of the PG's
>> that is roughly the same as 2147483648.  This seems very odd, and maybe
>> the value got lodged somewhere it doesn't belong, which is causing an
>> issue.
>>
>>
> That pg number is "-1" or some such sentinel for a signed 32-bit int; it
> means "I don't know which OSD it was anymore", which you can get in PG
> lists when OSDs are gone.
>
> --
> May the most significant bit of your life be positive.
>


Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Janne Johansson
On Wed, 3 July 2019 at 20:51, Austin Workman wrote:

>
> But a very strange number shows up in the active sections of the PG's
> that is roughly the same as 2147483648.  This seems very odd, and maybe
> the value got lodged somewhere it doesn't belong, which is causing an
> issue.
>
>
That pg number is "-1" or some such sentinel for a signed 32-bit int; it
means "I don't know which OSD it was anymore", which you can get in PG
lists when OSDs are gone.

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] OSD's won't start - thread abort

2019-07-03 Thread Austin Workman
Something very curious is that I was adjusting the configuration for the
osd memory target via ceph-ansible and at one point had set 2147483648,
which is exactly 2 GiB.

Currently it's set to 1610612736, but strangely the value it wrote to the
config file is 1963336226.
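
For reference, roughly where that value lives and how to check what a
daemon actually picked up -- the ceph-ansible variable layout shown here
is an assumption:

  # ceph-ansible group_vars (assumed layout):
  #   ceph_conf_overrides:
  #     osd:
  #       osd_memory_target: 1610612736
  #
  # Setting and verifying on a running cluster (Mimic and later):
  ceph config set osd osd_memory_target 1610612736
  ceph daemon osd.5 config get osd_memory_target   # run on the OSD's host; shows what the daemon is using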

But a very strange number shows up in the active sections of the PG's
that is roughly the same as 2147483648.  This seems very odd, and maybe
the value got lodged somewhere it doesn't belong, which is causing an
issue.

https://gist.github.com/arodd/c95355a7b55f3e4a94f21bc5e801943d#file-8-c-pg-query-L18


On Wed, Jul 3, 2019 at 1:08 PM Austin Workman  wrote:

> So several events unfolded that may have led to this situation.  Some of
> them were, in hindsight, probably not the smartest decisions, particularly
> adjusting the EC pool and restarting the OSD's several times during these
> migrations.
>
>
>1. Added a new 6th OSD with ceph-ansible
>   1. Hung during restart of OSD's because they were set to noup and
>   one of the original OSD's wouldn't come back online because of the noup.
>   Manually unset noup and all 6 OSD's went up/in.
>2. Objects showing in degraded/misplaced
>3. Strange behavior when restarting one OSD at a time and waiting for it
>to stabilize: depending on which OSD was restarted last, different
>backfill or move operations were taking place.
>4. Adjusted recovery/backfill sleep/concurrent moves to speed up
>re-location.
>5. Decided that if all the data was going to move, I should adjust my
>jerasure ec profile from k=4, m=1 -> k=5, m=1 with force (is this even
>recommended vs. just creating new pools?)
>   1. Initially it unset crush-device-class=hdd to be blank
>   2. Re-set crush-device-class
>   3. Couldn't determine if this had any effect on the move operations.
>   4. Changed back to k=4
>6. Let some of the backfill work through; ran into "toofull" situations
>even though the OSD's had plenty of space.
>   1. Decided to add PG's to the EC pool 64->150
>7. Restarted one OSD at a time again, waiting for them to be healthy
>before moving on (probably should have been setting noout).
>8. Eventually one of the old OSD's refused to start due to a thread
>abort relating to stripe size (see below).
>9. Tried restarting the other OSD's; they all came back online fine.
>10. Some time passed and then the new OSD crashed, and it won't start
>back up due to the same stripe-size abort.
>   1. Now 2 OSD's are down, and won't start back up due to that same
>   condition, and data is no longer available.
>   2. 149 PG's showing as incomplete due to the min size 5 (shouldn't
>   it be 1 from the original EC/new EC profile settings?)
>   3. 1 pg as down
>   4. 21 unknown
>   5. Some of the PG's were still "new pg's" from increasing the PG
>   count of the pool.
>
> So yeah, somewhat of a cluster of changing too many things at once here,
> but I didn't realize the things I was doing would potentially have this
> result.
>
> The two OSD's that won't start should still have all of the data on them.
> It seems like they are having issues with at least one PG in particular
> from the EC pool that was adjusted, but presumably the rest of the data
> should be fine, and hopefully there is a way to get them to start up
> again.  I saw a similar issue posted on the list a few years ago, but
> there was never any follow-up from the user having the issue.
>
> https://gist.github.com/arodd/c95355a7b55f3e4a94f21bc5e801943d
>
>