Re: [ceph-users] pg stuck with unfound objects on non-existing OSDs
Thanks for the suggestion. Is a rolling reboot sufficient, or must all
OSDs be down at the same time? One is no problem; the other takes some
scheduling.

Ronny Aasen

On 01.11.2016 21:52, c...@elchaka.de wrote:
> Hello Ronny,
>
> if it is possible for you, try to reboot all OSD nodes. I had this
> issue on my test cluster and it became healthy after rebooting.
>
> Hth
> - Mehmet
>
> On 1 November 2016 19:55:07 MEZ, Ronny Aasen wrote:
>> Hello.
>>
>> I have a cluster stuck with 2 PGs stuck undersized+degraded, with
>> 25 unfound objects.
>>
>> # ceph health detail
>> HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded;
>> 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized;
>> recovery 294599/149522370 objects degraded (0.197%); recovery
>> 640073/149522370 objects misplaced (0.428%); recovery 25/46579241
>> unfound (0.000%); noout flag(s) set
>> pg 6.d4 is stuck unclean for 8893374.380079, current state
>> active+recovering+undersized+degraded+remapped, last acting [62]
>> pg 6.ab is stuck unclean for 8896787.249470, current state
>> active+recovering+undersized+degraded+remapped, last acting [18,12]
>> pg 6.d4 is stuck undersized for 438122.427341, current state
>> active+recovering+undersized+degraded+remapped, last acting [62]
>> pg 6.ab is stuck undersized for 416947.461950, current state
>> active+recovering+undersized+degraded+remapped, last acting [18,12]
>> pg 6.d4 is stuck degraded for 438122.427402, current state
>> active+recovering+undersized+degraded+remapped, last acting [62]
>> pg 6.ab is stuck degraded for 416947.462010, current state
>> active+recovering+undersized+degraded+remapped, last acting [18,12]
>> pg 6.d4 is active+recovering+undersized+degraded+remapped, acting
>> [62], 25 unfound
>> pg 6.ab is active+recovering+undersized+degraded+remapped, acting
>> [18,12]
>> recovery 294599/149522370 objects degraded (0.197%)
>> recovery 640073/149522370 objects misplaced (0.428%)
>> recovery 25/46579241 unfound (0.000%)
>> noout flag(s) set
>>
>> I have been following the troubleshooting guide at
>> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
>> but got stuck without a resolution.
>>
>> Luckily it is not critical data, so I wanted to mark the PG lost so
>> the cluster could become HEALTH_OK:
>>
>> # ceph pg 6.d4 mark_unfound_lost delete
>> Error EINVAL: pg has 25 unfound objects but we haven't probed all
>> sources, not marking lost
>>
>> Querying the PG, I see that it wants osd.80 and osd.36:
>>
>>     {
>>         "osd": "80",
>>         "status": "osd is down"
>>     },
>>
>> Trying to mark the OSDs lost does not work either, since the OSDs
>> were removed from the cluster a long time ago:
>>
>> # ceph osd lost 80 --yes-i-really-mean-it
>> osd.80 is not down or doesn't exist
>>
>> # ceph osd lost 36 --yes-i-really-mean-it
>> osd.36 is not down or doesn't exist
>>
>> And this is where I am stuck. I have tried stopping and starting
>> the 3 OSDs, but that did not have any effect.
>>
>> Anyone have any advice how to proceed?
>>
>> Full output at: http://paste.debian.net/hidden/be03a185/
>>
>> This is hammer 0.94.9 on Debian 8.
>>
>> Kind regards
>> Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
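The rolling-reboot question above can be sketched as a host-at-a-time
restart loop that waits for the cluster to settle between hosts, so all
OSDs are never down together. Everything here is an assumption, not a
tested procedure: the hostnames are hypothetical, and on hammer/Debian 8
the exact restart command may differ from the `service ceph restart osd`
used below. With DRY_RUN=1 (the default) the script only prints the
commands it would run.

```shell
#!/bin/sh
# Sketch of a rolling OSD restart, one host at a time. Hostnames and
# the restart command are assumptions. DRY_RUN=1 only prints commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

for host in osd-node1 osd-node2 osd-node3; do   # hypothetical hostnames
    run ssh "$host" service ceph restart osd
    # block until recovery settles before moving to the next host
    run sh -c 'until ceph health | grep -q HEALTH_OK; do sleep 30; done'
done
```

Running it once with DRY_RUN=1 and reviewing the printed commands before
setting DRY_RUN=0 is a cheap safety net.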
Re: [ceph-users] pg stuck with unfound objects on non-existing OSDs
Hello Ronny,

if it is possible for you, try to reboot all OSD nodes. I had this
issue on my test cluster and it became healthy after rebooting.

Hth
- Mehmet

On 1 November 2016 19:55:07 MEZ, Ronny Aasen wrote:
> Hello.
>
> I have a cluster stuck with 2 PGs stuck undersized+degraded, with 25
> unfound objects.
>
> # ceph health detail
> HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded;
> 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized;
> recovery 294599/149522370 objects degraded (0.197%); recovery
> 640073/149522370 objects misplaced (0.428%); recovery 25/46579241
> unfound (0.000%); noout flag(s) set
> pg 6.d4 is stuck unclean for 8893374.380079, current state
> active+recovering+undersized+degraded+remapped, last acting [62]
> pg 6.ab is stuck unclean for 8896787.249470, current state
> active+recovering+undersized+degraded+remapped, last acting [18,12]
> pg 6.d4 is stuck undersized for 438122.427341, current state
> active+recovering+undersized+degraded+remapped, last acting [62]
> pg 6.ab is stuck undersized for 416947.461950, current state
> active+recovering+undersized+degraded+remapped, last acting [18,12]
> pg 6.d4 is stuck degraded for 438122.427402, current state
> active+recovering+undersized+degraded+remapped, last acting [62]
> pg 6.ab is stuck degraded for 416947.462010, current state
> active+recovering+undersized+degraded+remapped, last acting [18,12]
> pg 6.d4 is active+recovering+undersized+degraded+remapped, acting
> [62], 25 unfound
> pg 6.ab is active+recovering+undersized+degraded+remapped, acting
> [18,12]
> recovery 294599/149522370 objects degraded (0.197%)
> recovery 640073/149522370 objects misplaced (0.428%)
> recovery 25/46579241 unfound (0.000%)
> noout flag(s) set
>
> I have been following the troubleshooting guide at
> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
> but got stuck without a resolution.
>
> Luckily it is not critical data, so I wanted to mark the PG lost so
> the cluster could become HEALTH_OK:
>
> # ceph pg 6.d4 mark_unfound_lost delete
> Error EINVAL: pg has 25 unfound objects but we haven't probed all
> sources, not marking lost
>
> Querying the PG, I see that it wants osd.80 and osd.36:
>
>     {
>         "osd": "80",
>         "status": "osd is down"
>     },
>
> Trying to mark the OSDs lost does not work either, since the OSDs
> were removed from the cluster a long time ago:
>
> # ceph osd lost 80 --yes-i-really-mean-it
> osd.80 is not down or doesn't exist
>
> # ceph osd lost 36 --yes-i-really-mean-it
> osd.36 is not down or doesn't exist
>
> And this is where I am stuck. I have tried stopping and starting the
> 3 OSDs, but that did not have any effect.
>
> Anyone have any advice how to proceed?
>
> Full output at: http://paste.debian.net/hidden/be03a185/
>
> This is hammer 0.94.9 on Debian 8.
>
> Kind regards
> Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
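Mehmet's suggestion of rebooting every OSD node could be done as a
rolling full-node reboot with the noout flag held (the health output
shows it is already set), so CRUSH does not start rebalancing while each
node is down. This is only a sketch under assumptions: the hostnames are
hypothetical and the "OSDs back up" check is simplistic. DRY_RUN=1 only
echoes the commands.

```shell
#!/bin/sh
# Sketch: reboot OSD nodes one at a time while noout is set, then
# release the flag. Hostnames are hypothetical; DRY_RUN=1 only echoes.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = "1" ] && echo "+ $*" || "$@"; }

run ceph osd set noout
for host in osd-node1 osd-node2 osd-node3; do   # hypothetical hostnames
    run ssh "$host" reboot
    # wait until the rebooted node's OSDs rejoin before the next node
    # (a real script should compare the up/in counts in "ceph osd stat")
    run sh -c 'until ceph health | grep -q HEALTH_OK; do sleep 30; done'
done
run ceph osd unset noout
```

Holding noout means the only data movement during the reboots is the
recovery you actually want, not a rebalance triggered by each node
dropping out.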
[ceph-users] pg stuck with unfound objects on non-existing OSDs
Hello.

I have a cluster stuck with 2 PGs stuck undersized+degraded, with 25
unfound objects.

# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2
pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery
294599/149522370 objects degraded (0.197%); recovery 640073/149522370
objects misplaced (0.428%); recovery 25/46579241 unfound (0.000%);
noout flag(s) set
pg 6.d4 is stuck unclean for 8893374.380079, current state
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck unclean for 8896787.249470, current state
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck undersized for 438122.427341, current state
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck undersized for 416947.461950, current state
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck degraded for 438122.427402, current state
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck degraded for 416947.462010, current state
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62],
25 unfound
pg 6.ab is active+recovering+undersized+degraded+remapped, acting
[18,12]
recovery 294599/149522370 objects degraded (0.197%)
recovery 640073/149522370 objects misplaced (0.428%)
recovery 25/46579241 unfound (0.000%)
noout flag(s) set

I have been following the troubleshooting guide at
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
but got stuck without a resolution.

Luckily it is not critical data, so I wanted to mark the PG lost so the
cluster could become HEALTH_OK:

# ceph pg 6.d4 mark_unfound_lost delete
Error EINVAL: pg has 25 unfound objects but we haven't probed all
sources, not marking lost

Querying the PG, I see that it wants osd.80 and osd.36:

    {
        "osd": "80",
        "status": "osd is down"
    },

Trying to mark the OSDs lost does not work either, since the OSDs were
removed from the cluster a long time ago:

# ceph osd lost 80 --yes-i-really-mean-it
osd.80 is not down or doesn't exist

# ceph osd lost 36 --yes-i-really-mean-it
osd.36 is not down or doesn't exist

And this is where I am stuck. I have tried stopping and starting the 3
OSDs, but that did not have any effect.

Anyone have any advice how to proceed?

Full output at: http://paste.debian.net/hidden/be03a185/

This is hammer 0.94.9 on Debian 8.

Kind regards
Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
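One workaround sometimes discussed for exactly this "osd.N is not down
or doesn't exist" dead end is to re-create entries for the removed OSD
ids so that `ceph osd lost` has something to act on, then retry
`mark_unfound_lost`. This is a heavily hedged sketch, not a verified
procedure: `ceph osd create` hands out the lowest free id, so whether it
returns 36 and 80 depends on the OSD map and must be checked, and
deleting the unfound objects is irreversible. DRY_RUN=1 only echoes the
commands.

```shell
#!/bin/sh
# Hypothetical sketch: recreate placeholder entries for the removed OSD
# ids, mark them lost, retry mark_unfound_lost, then remove the
# placeholders. Irreversible on a real cluster; DRY_RUN=1 only echoes.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = "1" ] && echo "+ $*" || "$@"; }

# "ceph osd create" returns the lowest free id; this assumes the freed
# ids 36 and 80 come back (verify the printed ids before continuing).
run ceph osd create
run ceph osd create

for id in 36 80; do
    run ceph osd lost "$id" --yes-i-really-mean-it
done

# irreversible: gives up on the 25 unfound objects in pg 6.d4
run ceph pg 6.d4 mark_unfound_lost delete

# clean up the placeholder entries again
for id in 36 80; do
    run ceph osd rm "$id"
done
```

If `ceph osd create` returns some other id, remove that placeholder with
`ceph osd rm` and stop; forcing the wrong id lost would be worse than
the stuck PG.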