Re: [ceph-users] OSD's won't start - thread abort
On Wed, Jul 3, 2019 at 11:09 AM Austin Workman wrote:
> Decided that if all the data was going to move, I should adjust my jerasure
> EC profile from k=4, m=1 -> k=5, m=1 with force (is this even recommended vs.
> just creating new pools???)
>
> Initially it unset crush-device-class=hdd to be blank
> Re-set crush-device-class
> Couldn't determine if this had any effect on the move operations.
> Changed back to k=4

You can't change the EC parameters on existing pools; Ceph has no way of dealing with that. If it's possible to change the profile and break the pool (which, given the striping mismatch you cite later, seems to be what happened), we need to fix that. Can you describe the exact commands you ran in that timeline?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
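[Editor's note] Since EC geometry (k/m) is fixed at pool creation, the supported route the reply hints at is a fresh profile and a fresh pool, then a data migration. A sketch, where the profile and pool names and the PG count are placeholders, not anything from this thread:

```shell
# Create a new profile rather than forcing changes to one that is in use
ceph osd erasure-code-profile set ec-k5m1 k=5 m=1 crush-device-class=hdd

# Create a new pool with that profile, then migrate data into it
ceph osd pool create cephfs_data_new 128 128 erasure ec-k5m1
```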
Re: [ceph-users] OSD's won't start - thread abort
After some creative PG surgery, everything is coming back online cleanly.

I went through one at a time (80-90 PGs) on the least filled (new osd.5) and export-remove'd each PG that was causing the assertion failures after testing starting the OSD.

# tail -f /var/log/ceph/ceph-osd.5.log | grep -A1 "unlocked"

(This helped identify the PG loaded right before the assertion failure.) I've kept the flawed PGs with the striping issue in case they are needed for anything later on. This allowed the OSD to finally start with only clean PGs from that pool left.

Then I started that process on the other OSD (0), which is going to take forever because that one had existing data. I paused that, identified the incomplete/inactive PGs, exported those from the downed osd.0, and imported them into the osd.5 that was able to come online. Some of the imports identified split PGs, where a few of the imports contained objects belonging to other missing PGs. Using the import capability while specifying the split pgid allowed those additional objects to import and satisfied all of the missing shards for objects I hadn't yet identified source PGs for.

5/6 OSDs are up and running, all of the PGs are active now, and all of the data is back working. Still undersized/backfilling/moving, but it seems there isn't any data loss.

Now I can either continue going through osd.0 one at a time removing the erroneous PGs, or potentially blow it away and start a fresh OSD. Is there a recommended path there?

Second question: if I bring up the original OSD after pruning all of the flawed PG copies with the stripe issue, is it important to remove the leftover PG copies that were successfully imported into osd.5? I'm thinking I would want to, and can leave the exports around just in case. Once data starts changing (new writes) I would imagine the exports wouldn't work (or could they potentially screw something up?).
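[Editor's note] The export-remove/import cycle described above is done with ceph-objectstore-tool against a stopped OSD. A sketch for reference; the pgid (an EC pgid includes a shard suffix like s0) and file paths are placeholders:

```shell
# The OSD must be stopped before touching its object store
systemctl stop ceph-osd@5

# Export a PG to a file and remove it from the OSD in one step
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
    --op export-remove --pgid 8.cs0 --file /root/pg-8.c.export

# Import the saved PG into another (also stopped) OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op import --file /root/pg-8.c.export
```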
https://gist.githubusercontent.com/arodd/c95355a7b55f3e4a94f21bc5e801943d/raw/dfce381af603306e09c634196309d95c172961a7/osd-semi-healthy

After all of this, I'm going to make a new CephFS filesystem with a new metadata/data pool using the newer EC settings, copy all of the data over into fresh PGs, and might consider moving to k=4, m=2 instead ;)

On Wed, Jul 3, 2019 at 2:28 PM Austin Workman wrote:
> That makes more sense.
>
> Setting min_size = 4 on the EC pool allows data to flow again (kind of not
> really, because of the still missing 22 other PGs); maybe this automatically
> raised to 5 when I adjusted the EC pool originally? That leaves the 21
> unknown and 1 down PG, which probably depend on the two OSDs. These are
> probably the 22 PGs that actually got fully moved around (maybe even
> converted to k=5/m=1?). It would be great if I can find a way to start those
> other two OSDs, and just deal with whatever state is causing the OSDs to
> crash.
>
> On Wed, Jul 3, 2019 at 2:18 PM Janne Johansson wrote:
>> On Wed, 3 Jul 2019 at 20:51, Austin Workman wrote:
>>> But a very strange number shows up in the active sections of the PGs:
>>> roughly the same number as 2147483648. This seems very odd, and maybe
>>> the value got lodged somewhere it doesn't belong, which is causing an
>>> issue.
>>
>> That pg number is "-1" or something for a signed 32-bit int, which means
>> "I don't know which one it was anymore", which you can get in PG lists
>> when OSDs are gone.
>>
>> --
>> May the most significant bit of your life be positive.
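[Editor's note] The migration plan above (new pools on a fresh profile, then copy) might look roughly like this. All names and PG counts are placeholders; note that releases of this era need the multiple-filesystems flag for a second fs, and the commonly recommended layout keeps a replicated default data pool and attaches the EC pool via a file layout rather than making the EC pool the default:

```shell
# New profile with the planned k=4, m=2 geometry
ceph osd erasure-code-profile set ec-k4m2 k=4 m=2 crush-device-class=hdd

# Pools for the new filesystem (names and PG counts are examples)
ceph osd pool create cephfs2_metadata 32
ceph osd pool create cephfs2_data 128 128 erasure ec-k4m2
ceph osd pool set cephfs2_data allow_ec_overwrites true

# A second filesystem needs this flag on 2019-era releases
ceph fs flag set enable_multiple true
ceph fs new cephfs2 cephfs2_metadata cephfs2_data
```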
Re: [ceph-users] OSD's won't start - thread abort
That makes more sense.

Setting min_size = 4 on the EC pool allows data to flow again (kind of not really, because of the still missing 22 other PGs); maybe this automatically raised to 5 when I adjusted the EC pool originally? That leaves the 21 unknown and 1 down PG, which probably depend on the two OSDs. These are probably the 22 PGs that actually got fully moved around (maybe even converted to k=5/m=1?). It would be great if I can find a way to start those other two OSDs, and just deal with whatever state is causing the OSDs to crash.

On Wed, Jul 3, 2019 at 2:18 PM Janne Johansson wrote:
> On Wed, 3 Jul 2019 at 20:51, Austin Workman wrote:
>> But a very strange number shows up in the active sections of the PGs:
>> roughly the same number as 2147483648. This seems very odd, and maybe the
>> value got lodged somewhere it doesn't belong, which is causing an issue.
>
> That pg number is "-1" or something for a signed 32-bit int, which means
> "I don't know which one it was anymore", which you can get in PG lists
> when OSDs are gone.
>
> --
> May the most significant bit of your life be positive.
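[Editor's note] For an EC pool with k=4, m=1, the default min_size is k+1 = 5, which is why I/O stopped with shards missing; dropping it to k=4 re-enables I/O at the cost of running with no redundancy margin. A sketch, with the pool name as a placeholder:

```shell
# Check the current value first
ceph osd pool get cephfs_data min_size

# Allow I/O with only k shards available (use with care)
ceph osd pool set cephfs_data min_size 4
```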
Re: [ceph-users] OSD's won't start - thread abort
On Wed, 3 Jul 2019 at 20:51, Austin Workman wrote:
> But a very strange number shows up in the active sections of the PGs:
> roughly the same number as 2147483648. This seems very odd, and maybe the
> value got lodged somewhere it doesn't belong, which is causing an issue.

That pg number is "-1" or something for a signed 32-bit int, which means "I don't know which one it was anymore", which you can get in PG lists when OSDs are gone.

--
May the most significant bit of your life be positive.
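[Editor's note] The signed 32-bit explanation can be sanity-checked in the shell. Ceph's "no OSD in this slot" sentinel, CRUSH_ITEM_NONE, is defined as 0x7fffffff (INT32_MAX) in the CRUSH source, which is essentially the ~2147483648 figure quoted above; a plain -1 read back as unsigned 32-bit would instead show as 4294967295:

```shell
# Reinterpret the signed value -1 as an unsigned 32-bit integer
echo $(( -1 & 0xFFFFFFFF ))   # -> 4294967295

# Ceph's CRUSH_ITEM_NONE sentinel for a missing OSD is INT32_MAX
echo $(( 0x7FFFFFFF ))        # -> 2147483647
```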
Re: [ceph-users] OSD's won't start - thread abort
Something very curious is that I was adjusting the configuration for osd_memory_target via ceph-ansible and had at one point set 2147483648, which is around 2 GB. Currently it's set to 1610612736, but strangely, in the config file it wrote 1963336226.

But a very strange number shows up in the active sections of the PGs: roughly the same number as 2147483648. This seems very odd, and maybe the value got lodged somewhere it doesn't belong, which is causing an issue.

https://gist.github.com/arodd/c95355a7b55f3e4a94f21bc5e801943d#file-8-c-pg-query-L18

On Wed, Jul 3, 2019 at 1:08 PM Austin Workman wrote:
> So several events unfolded that may have led to this situation. Some of
> them, in hindsight, were probably not the smartest decisions around
> adjusting the EC pool and restarting the OSDs several times during these
> migrations.
>
> 1. Added a new 6th OSD with ceph-ansible.
>    1. Hung during restart of OSDs because they were set to noup, and one
>       of the original OSDs wouldn't come back online because of the noup.
>       Manually unset noup and all 6 OSDs went up/in.
> 2. Objects showing in degraded/misplaced.
> 3. Strange behavior restarting one OSD at a time and waiting for it to
>    stabilize: depending on which OSD was restarted last, different
>    backfill or move operations were taking place.
> 4. Adjusted recovery/backfill sleep/concurrent moves to speed up
>    relocation.
> 5. Decided that if all the data was going to move, I should adjust my
>    jerasure EC profile from k=4, m=1 -> k=5, m=1 with force (is this even
>    recommended vs. just creating new pools???)
>    1. Initially it unset crush-device-class=hdd to be blank.
>    2. Re-set crush-device-class.
>    3. Couldn't determine if this had any effect on the move operations.
>    4. Changed back to k=4.
> 6. Let some of the backfill work through; ran into toofull situations
>    even though OSDs had plenty of space.
>    1. Decided to add PGs to the EC pool, 64 -> 150.
> 7. Restarted one OSD at a time again, waiting for them to be healthy
>    before moving on (probably should have been setting noout).
> 8. Eventually one of the old OSDs refused to start due to a thread abort
>    relating to stripe size (see below).
> 9. Tried restarting the other OSDs; they all came back online fine.
> 10. Some time passed and then the new OSD crashed, and won't start back
>     up with the same stripe size abort.
>     1. Now 2 OSDs are down and won't start back up due to that same
>        condition, and data is no longer available.
>     2. 149 PGs showing as incomplete due to the min size 5 (shouldn't it
>        be 1 from the original EC/new EC profile settings?)
>     3. 1 PG down.
>     4. 21 unknown.
>     5. Some of the PGs were still "new PGs" from increasing the PG count
>        of the pool.
>
> So yeah, somewhat of a cluster of changing too many things at once here,
> but I didn't realize the things I was doing could potentially have this
> result.
>
> The two OSDs that won't start should still have all of the data on them.
> It seems like they are having issues with at least one of the PGs in
> particular from the EC pool that was adjusted, but presumably the rest of
> the data should be fine, and hopefully there is a way to get them to start
> up again. I saw a similar issue posted in the list a few years ago, but
> there was never any follow-up from the user having the issue.
>
> https://gist.github.com/arodd/c95355a7b55f3e4a94f21bc5e801943d
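[Editor's note] On the memory setting mentioned at the top of this message: since Mimic, osd_memory_target can be set and inspected through the monitors' central config store, which avoids surprises from values written into ceph.conf by tooling. A sketch using the value from this thread:

```shell
# Set a ~1.5 GiB memory target for all OSDs in the central config store
ceph config set osd osd_memory_target 1610612736

# Confirm what a specific daemon will actually use
ceph config get osd.0 osd_memory_target
```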