Re: [ceph-users] Hammer: PGs stuck creating
On Thu, Jun 30, 2016 at 11:34 PM, Brian Felton wrote:
> Sure. Here's a complete query dump of one of the 30 pgs:
> http://pastebin.com/NFSYTbUP

Looking at that, something immediately stands out. There are a lot of
entries in "past_intervals" like so:

    "past_intervals": [
        {
            "first": 18522,
            "last": 18523,
            "maybe_went_rw": 1,
            "up": [
                2147483647,
    ...
            "acting": [
                2147483647,
                2147483647,
                2147483647,
                2147483647
            ],
            "primary": -1,
            "up_primary": -1

That value (2147483647, i.e. 0x7fffffff) is defined in src/crush/crush.h
like so:

#define CRUSH_ITEM_NONE 0x7fffffff  /* no result */

So it looks like this could be to do with a bad CRUSH rule (or at least a
previously unsatisfiable rule). Could you share the output from the
following?

$ ceph osd crush rule ls

And, for each rule listed by the above command:

$ ceph osd crush rule dump [rule_name]

I'd then dump out the crushmap and test it, showing any bad mappings, with
the commands listed here:

http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

That should hopefully give some insight.

HTH,
Brad

> [snip: earlier messages quoted in full]
--
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
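[Editor's note: the check Brad describes, scanning a pg query's "past_intervals" for slots that CRUSH failed to map, is easy to script when there are many pgs to inspect. The sketch below is not from the thread; the field names follow the query output quoted above, and the sentinel is the 2147483647 value seen there.]

```python
# Sentinel that appears in 'ceph pg <pgid> query' JSON where CRUSH produced
# no mapping for a slot (2147483647 == 0x7fffffff == CRUSH_ITEM_NONE).
NO_OSD = 2147483647

def unmapped_intervals(pg_query):
    """Return (first, last) epoch ranges of past intervals whose up or
    acting set contains an unmapped slot."""
    bad = []
    for interval in pg_query.get("past_intervals", []):
        if NO_OSD in interval.get("up", []) or NO_OSD in interval.get("acting", []):
            bad.append((interval["first"], interval["last"]))
    return bad

# Toy input mirroring the snippet quoted above; real input would come from
# json.load() over the saved 'ceph pg query' output.
sample = {
    "past_intervals": [
        {"first": 18522, "last": 18523, "maybe_went_rw": 1,
         "up": [NO_OSD] * 4, "acting": [NO_OSD] * 4},
        {"first": 18524, "last": 18530, "maybe_went_rw": 1,
         "up": [5, 12, 33, 41], "acting": [5, 12, 33, 41]},
    ]
}
print(unmapped_intervals(sample))  # [(18522, 18523)]
```

Any interval it reports is a window in which the pg had no live OSDs at all, which is consistent with a rule that could not be satisfied at the time.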
Re: [ceph-users] Hammer: PGs stuck creating
Sure. Here's a complete query dump of one of the 30 pgs:
http://pastebin.com/NFSYTbUP

Brian

On Wed, Jun 29, 2016 at 6:25 PM, Brad Hubbard wrote:
> [snip: original post quoted in full]
>
> Can you share some of the pg query output?
Re: [ceph-users] Hammer: PGs stuck creating
On Thu, Jun 30, 2016 at 3:22 AM, Brian Felton wrote:
> [snip: original post quoted in full]
> - Ensured those OSDs were really removed (e.g. 'ceph auth del', 'ceph osd
>   crush remove', and 'ceph osd rm')

Can you share some of the pg query output?
--
Cheers,
Brad
[ceph-users] Hammer: PGs stuck creating
Greetings,

I have a lab cluster running Hammer 0.94.6 and being used exclusively for
object storage. The cluster consists of four servers running 60 6TB OSDs
each. The main .rgw.buckets pool is using k=3 m=1 erasure coding and
contains 8192 placement groups.

Last week, one of our guys out-ed and removed one OSD from each of three of
the four servers in the cluster, which resulted in some general badness (the
disks were wiped post-removal, so the data are gone). After a proper
education in why this is a Bad Thing, we got the OSDs added back. When all
was said and done, we had 30 pgs that were stuck incomplete, and no amount
of magic has been able to get them to recover. From reviewing the data, we
knew that all of these pgs contained at least 2 of the removed OSDs; I
understand and accept that the data are gone, and that's not a concern (yay
lab).

Here are the things I've tried:

- Restarted all OSDs
- Stopped all OSDs, removed all OSDs from the crush map, and started
  everything back up
- Executed a 'ceph pg force_create_pg ' for each of the 30 stuck pgs
- Executed a 'ceph pg send_pg_creates' to get the ball rolling on creates
- Executed several 'ceph pg query' commands to ensure we were referencing
  valid OSDs after the 'force_create_pg'
- Ensured those OSDs were really removed (e.g. 'ceph auth del', 'ceph osd
  crush remove', and 'ceph osd rm')

At this point, I've got the same 30 pgs that are stuck creating. I've run
out of ideas for getting this back to a healthy state. In reviewing the
other posts on the mailing list, the overwhelming solution was a bad OSD in
the crush map, but I'm all but certain that isn't what's hitting us here.
Normally, being the lab, I'd consider nuking the .rgw.buckets pool and
starting from scratch, but we've recently spent a lot of time pulling 140TB
of data into this cluster for some performance and recovery tests, and I'd
prefer not to have to start that process again.
I am willing to entertain most any other idea, irrespective of how
destructive it is to these PGs, so long as I don't have to lose the rest of
the data in the pool.

Many thanks in advance for any assistance here.

Brian Felton
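[Editor's note: for anyone scripting the per-pg recovery attempts described above, here is a small sketch that generates one 'ceph pg force_create_pg' command per stuck pg. It is not from the thread; the dump_stuck JSON shape and the sample pg ids are assumptions for illustration, so verify against your cluster's actual output before running anything.]

```python
import json
import shlex

def force_create_commands(dump_stuck_json):
    """Build one 'ceph pg force_create_pg' command per stuck pg.

    The input is assumed to look like the output of something such as
    'ceph pg dump_stuck inactive --format json'; the exact field layout
    varies by release, so treat this as a template rather than gospel.
    """
    entries = json.loads(dump_stuck_json)
    return ["ceph pg force_create_pg " + shlex.quote(e["pgid"]) for e in entries]

# Hypothetical pg ids for illustration only.
sample = json.dumps([
    {"pgid": "11.32f", "state": "creating"},
    {"pgid": "11.51e", "state": "creating"},
])
for cmd in force_create_commands(sample):
    print(cmd)
```

Printing the commands instead of executing them lets you eyeball the list (and drop any pg you don't want touched) before piping it to a shell.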