Re: [ceph-users] pg stuck with unfound objects on non exsisting osd's

2016-11-01 Thread Ronny Aasen

Thanks for the suggestion.

Is a rolling reboot sufficient, or must all OSDs be down at the same 
time?

The former is no problem; the latter takes some scheduling.

Ronny Aasen


On 01.11.2016 21:52, c...@elchaka.de wrote:

Hello Ronny,

If it is possible for you, try rebooting all OSD nodes.

I had this issue on my test cluster and it became healthy after rebooting.

Hth
- Mehmet

On 1 November 2016 19:55:07 CET, Ronny Aasen wrote:


Hello.

I have a cluster with 2 PGs stuck undersized and degraded, with 25
unfound objects.

# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2 pgs 
stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 
294599/149522370 objects degraded (0.197%); recovery 640073/149522370 objects 
misplaced (0.428%); recovery 25/46579241 unfound (0.000%); noout flag(s) set
pg 6.d4 is stuck unclean for 8893374.380079, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck unclean for 8896787.249470, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck undersized for 438122.427341, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck undersized for 416947.461950, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck degraded for 438122.427402, current state
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck degraded for 416947.462010, current state
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62], 25
unfound
pg 6.ab is active+recovering+undersized+degraded+remapped, acting [18,12]
recovery 294599/149522370 objects degraded (0.197%)
recovery 640073/149522370 objects misplaced (0.428%)
recovery 25/46579241 unfound (0.000%)
noout flag(s) set

I have been following the troubleshooting guide at
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
but got stuck without a resolution.

Luckily it is not critical data, so I wanted to mark the unfound objects as
lost so the cluster could become HEALTH_OK.

# ceph pg 6.d4 mark_unfound_lost delete
Error EINVAL: pg has 25 unfound objects but we haven't probed all
sources, not marking lost

Querying the PG, I see that it still wants osd.80 and osd.36:

  {
    "osd": "80",
    "status": "osd is down"
  },

Trying to mark the OSDs lost does not work either, since the OSDs were
removed from the cluster a long time ago.

# ceph osd lost 80 --yes-i-really-mean-it
osd.80 is not down or doesn't exist

# ceph osd lost 36 --yes-i-really-mean-it
osd.36 is not down or doesn't exist

And this is where I am stuck.

I have tried stopping and starting the 3 OSDs, but that did not have any
effect.

Does anyone have any advice on how to proceed?

Full output at: http://paste.debian.net/hidden/be03a185/

This is Hammer 0.94.9 on Debian 8.

Kind regards
Ronny Aasen



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg stuck with unfound objects on non exsisting osd's

2016-11-01 Thread ceph
Hello Ronny,

If it is possible for you, try rebooting all OSD nodes.

I had this issue on my test cluster and it became healthy after rebooting.

Hth
- Mehmet

On 1 November 2016 19:55:07 CET, Ronny Aasen wrote:
>Hello.
>
>I have a cluster with 2 PGs stuck undersized and degraded, with 25
>unfound objects.
>
># ceph health detail
>HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2
>pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery
>294599/149522370 objects degraded (0.197%); recovery 640073/149522370
>objects misplaced (0.428%); recovery 25/46579241 unfound (0.000%);
>noout flag(s) set
>pg 6.d4 is stuck unclean for 8893374.380079, current state
>active+recovering+undersized+degraded+remapped, last acting [62]
>pg 6.ab is stuck unclean for 8896787.249470, current state
>active+recovering+undersized+degraded+remapped, last acting [18,12]
>pg 6.d4 is stuck undersized for 438122.427341, current state
>active+recovering+undersized+degraded+remapped, last acting [62]
>pg 6.ab is stuck undersized for 416947.461950, current state
>active+recovering+undersized+degraded+remapped, last acting [18,12]
>pg 6.d4 is stuck degraded for 438122.427402, current state
>active+recovering+undersized+degraded+remapped, last acting [62]
>pg 6.ab is stuck degraded for 416947.462010, current state
>active+recovering+undersized+degraded+remapped, last acting [18,12]
>pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62],
>25 unfound
>pg 6.ab is active+recovering+undersized+degraded+remapped, acting
>[18,12]
>recovery 294599/149522370 objects degraded (0.197%)
>recovery 640073/149522370 objects misplaced (0.428%)
>recovery 25/46579241 unfound (0.000%)
>noout flag(s) set
>
>
>I have been following the troubleshooting guide at
>http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
>but got stuck without a resolution.
>
>Luckily it is not critical data, so I wanted to mark the unfound objects as
>lost so the cluster could become HEALTH_OK.
>
>
># ceph pg 6.d4 mark_unfound_lost delete
>Error EINVAL: pg has 25 unfound objects but we haven't probed all 
>sources, not marking lost
>
>Querying the PG, I see that it still wants osd.80 and osd.36:
>
>  {
> "osd": "80",
> "status": "osd is down"
> },
>
>Trying to mark the OSDs lost does not work either, since the OSDs were
>removed from the cluster a long time ago.
>
># ceph osd lost 80 --yes-i-really-mean-it
>osd.80 is not down or doesn't exist
>
># ceph osd lost 36 --yes-i-really-mean-it
>osd.36 is not down or doesn't exist
>
>
>And this is where I am stuck.
>
>I have tried stopping and starting the 3 OSDs, but that did not have any
>effect.
>
>Does anyone have any advice on how to proceed?
>
>Full output at: http://paste.debian.net/hidden/be03a185/
>
>This is Hammer 0.94.9 on Debian 8.
>
>
>Kind regards
>
>Ronny Aasen
>
>
>


[ceph-users] pg stuck with unfound objects on non exsisting osd's

2016-11-01 Thread Ronny Aasen

Hello.

I have a cluster with 2 PGs stuck undersized and degraded, with 25 
unfound objects.


# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2 pgs stuck 
unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 294599/149522370 
objects degraded (0.197%); recovery 640073/149522370 objects misplaced 
(0.428%); recovery 25/46579241 unfound (0.000%); noout flag(s) set
pg 6.d4 is stuck unclean for 8893374.380079, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck unclean for 8896787.249470, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck undersized for 438122.427341, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck undersized for 416947.461950, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is stuck degraded for 438122.427402, current state 
active+recovering+undersized+degraded+remapped, last acting [62]
pg 6.ab is stuck degraded for 416947.462010, current state 
active+recovering+undersized+degraded+remapped, last acting [18,12]
pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62], 25 
unfound
pg 6.ab is active+recovering+undersized+degraded+remapped, acting [18,12]
recovery 294599/149522370 objects degraded (0.197%)
recovery 640073/149522370 objects misplaced (0.428%)
recovery 25/46579241 unfound (0.000%)
noout flag(s) set
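
For anyone reading the output above: the figure after "stuck ... for" appears to be a duration in seconds, which would put both PGs at roughly 103 days unclean and about 5 days undersized. A minimal Python sketch of that conversion follows; the seconds unit is an assumption, and `stuck_durations` is a hypothetical helper name, not part of any Ceph tool.

```python
import re
from datetime import timedelta

# Matches lines such as:
#   pg 6.d4 is stuck unclean for 8893374.380079, current state
STUCK_RE = re.compile(r"pg (\S+) is stuck (\w+) for ([\d.]+),")


def stuck_durations(health_detail: str) -> dict:
    """Map (pgid, condition) -> timedelta, ASSUMING the value is in seconds."""
    return {
        (pgid, cond): timedelta(seconds=float(secs))
        for pgid, cond, secs in STUCK_RE.findall(health_detail)
    }


# Lines copied from the health output in this message.
sample = """\
pg 6.d4 is stuck unclean for 8893374.380079, current state
pg 6.ab is stuck unclean for 8896787.249470, current state
pg 6.d4 is stuck undersized for 438122.427341, current state
pg 6.ab is stuck undersized for 416947.461950, current state
"""

durations = stuck_durations(sample)
for (pgid, cond), delta in sorted(durations.items()):
    print(f"pg {pgid} stuck {cond}: {delta.days} full days")
```

If the assumption holds, the PGs have been unclean far longer than they have been undersized, which may help when matching the problem against earlier cluster changes.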


I have been following the troubleshooting guide at 
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/ 
but got stuck without a resolution.


Luckily it is not critical data, so I wanted to mark the unfound objects as 
lost so the cluster could become HEALTH_OK.



# ceph pg 6.d4 mark_unfound_lost delete
Error EINVAL: pg has 25 unfound objects but we haven't probed all 
sources, not marking lost


Querying the PG, I see that it still wants osd.80 and osd.36:

  {
    "osd": "80",
    "status": "osd is down"
  },
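
The fragment above looks like an entry from the `might_have_unfound` list that `ceph pg 6.d4 query` reports under `recovery_state`. As a rough sketch, the sources that have not been probed yet could be pulled out of the query JSON like this in Python. The function name `unprobed_sources` is hypothetical, and the sample input is illustrative: only the osd.80 entry comes from this thread, while the second entry and the surrounding structure are assumed for the example.

```python
import json


def unprobed_sources(pg_query: dict) -> list:
    """Return (osd, status) pairs from might_have_unfound not yet probed."""
    pending = []
    for state in pg_query.get("recovery_state", []):
        for src in state.get("might_have_unfound", []):
            # "already probed" is the status the troubleshooting guide shows
            # for sources that have been checked; anything else is pending.
            if src.get("status") != "already probed":
                pending.append((str(src.get("osd")), src.get("status")))
    return pending


# Illustrative input only: the osd.80 entry is quoted in this thread; the
# osd.62 entry and the enclosing structure are assumptions for the example.
sample = json.loads("""
{
  "recovery_state": [
    {
      "name": "Started/Primary/Active",
      "might_have_unfound": [
        {"osd": "80", "status": "osd is down"},
        {"osd": "62", "status": "already probed"}
      ]
    }
  ]
}
""")

pending = unprobed_sources(sample)
print(pending)
```

Against a live cluster, the same function could be fed `json.load(sys.stdin)` with the output of `ceph pg 6.d4 query` piped in.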

Trying to mark the OSDs lost does not work either, since the OSDs were 
removed from the cluster a long time ago.


# ceph osd lost 80 --yes-i-really-mean-it
osd.80 is not down or doesn't exist

# ceph osd lost 36 --yes-i-really-mean-it
osd.36 is not down or doesn't exist
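
The message "is not down or doesn't exist" covers two cases, and since these OSDs were removed long ago, the second seems the likely one. A small Python sketch to confirm which IDs are absent from the OSD map, assuming the JSON from `ceph osd dump --format json` carries an `osds` list whose entries have an `osd` ID field. `missing_from_osdmap` is a hypothetical helper, and the sample map is made up apart from the OSD IDs, which are the acting OSDs listed earlier in this message.

```python
def missing_from_osdmap(osd_dump: dict, wanted: list) -> list:
    """Return the IDs in `wanted` that have no entry in the OSD map dump."""
    known = {int(entry["osd"]) for entry in osd_dump.get("osds", [])}
    return [osd_id for osd_id in wanted if osd_id not in known]


# Made-up sample: only the IDs (the acting OSDs 12, 18 and 62) come from
# this thread; the "up"/"in" values are assumptions.
sample_dump = {
    "osds": [
        {"osd": 12, "up": 1, "in": 1},
        {"osd": 18, "up": 1, "in": 1},
        {"osd": 62, "up": 1, "in": 1},
    ]
}

missing = missing_from_osdmap(sample_dump, [80, 36])
print(missing)
```

If both 80 and 36 come back as missing on the real map, `ceph osd lost` is being refused because the IDs no longer exist, not because the OSDs are still up.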


And this is where I am stuck.

I have tried stopping and starting the 3 OSDs, but that did not have any 
effect.


Does anyone have any advice on how to proceed?

Full output at: http://paste.debian.net/hidden/be03a185/

This is Hammer 0.94.9 on Debian 8.


Kind regards

Ronny Aasen


