Re: [ceph-users] pgs stuck unclean after removing OSDs

2017-06-28 Thread Jan Kasprzak
David Turner wrote: : A couple things. You didn't `ceph osd crush remove osd.21` after doing the other bits. Also you will want to remove the bucket (re: host) from the crush map as it will now be empty. Right now you have a host in the crush map with a weight, but no osds to put that

Re: [ceph-users] pgs stuck unclean after removing OSDs

2017-06-28 Thread David Turner
I would stop the service, down, out, rm, auth del, crush remove, disable service, fstab, umount. So you did remove it from your crush map, then? Could you post your `ceph osd tree`? On Wed, Jun 28, 2017, 10:12 AM Mazzystr wrote: > I've been using this procedure to remove
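
For reference, a sketch of that ordering spelled out as literal commands; osd.21 is the OSD discussed in this thread, and the mount path is assumed to be the default location, so adjust as needed:

systemctl stop ceph-osd@21.service
ceph osd down 21
ceph osd out 21
ceph osd rm 21
ceph auth del osd.21
ceph osd crush remove osd.21
systemctl disable ceph-osd@21.service
# edit /etc/fstab to drop the OSD's mount entry, then:
umount /var/lib/ceph/osd/ceph-21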

Re: [ceph-users] pgs stuck unclean after removing OSDs

2017-06-28 Thread Mazzystr
I've been using this procedure to remove OSDs...
OSD_ID=
ceph auth del osd.${OSD_ID}
ceph osd down ${OSD_ID}
ceph osd out ${OSD_ID}
ceph osd rm ${OSD_ID}
ceph osd crush remove osd.${OSD_ID}
systemctl disable ceph-osd@${OSD_ID}.service
systemctl stop ceph-osd@${OSD_ID}.service
sed -i

Re: [ceph-users] pgs stuck unclean after removing OSDs

2017-06-28 Thread David Turner
A couple things. You didn't `ceph osd crush remove osd.21` after doing the other bits. Also you will want to remove the bucket (re: host) from the crush map as it will now be empty. Right now you have a host in the crush map with a weight, but no osds to put that data on. It has a weight
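
For reference, the two cleanup steps being described would look roughly like this; "failednode" is a placeholder for the now-empty host bucket, not a name taken from the thread:

ceph osd crush remove osd.21
ceph osd crush remove failednode    # removes the empty host bucket from the CRUSH map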

[ceph-users] pgs stuck unclean after removing OSDs

2017-06-28 Thread Jan Kasprzak
Hello, TL;DR: what to do when my cluster reports stuck unclean pgs? Detailed description: One of the nodes in my cluster died. CEPH correctly rebalanced itself, and reached the HEALTH_OK state. I have looked at the failed server, and decided to take it out of the cluster permanently,
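
A minimal sketch of the usual first diagnostic steps for this situation (not quoted from the message itself): list exactly which PGs are stuck and how they map, then work from there.

ceph health detail
ceph pg dump_stuck unclean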

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
On Sat, Feb 18, 2017 at 9:03 AM, Matyas Koszik wrote: > > > Looks like you've provided me with the solution, thanks! :) > I've set the tunables to firefly, and now I only see the normal states > associated with a recovering cluster, there're no more stale pgs. > I hope it'll stay

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
Looks like you've provided me with the solution, thanks! I've set the tunables to firefly, and now I only see the normal states associated with a recovering cluster, there're no more stale pgs. I hope it'll stay like this when it's done, but that'll take quite a while. Matyas On Fri, 17 Feb
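
The tunables change referred to here is presumably the single command below; note that switching tunables profiles can trigger substantial data movement:

ceph osd crush tunables firefly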

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
I set it to 100, then restarted osd26, but after recovery everything is as it was before. On Sat, 18 Feb 2017, Shinobu Kinjo wrote: > You may need to increase ``choose_total_tries`` to more than 50 > (default) up to 100. > > - >

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
You may need to increase ``choose_total_tries`` to more than 50 (default) up to 100.
- http://docs.ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
- https://github.com/ceph/ceph/blob/master/doc/man/8/crushtool.rst
On Sat, Feb 18, 2017 at 5:25 AM, Matyas Koszik
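
A sketch of the decompile/edit/recompile cycle those links describe, assuming the tunable is edited by hand in the decompiled text file:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# change "tunable choose_total_tries 50" to 100 in crushmap.txt, then:
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new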

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
I have size=2 and 3 independent nodes. I'm happy to try firefly tunables, but a bit scared that it would make things even worse. On Fri, 17 Feb 2017, Gregory Farnum wrote: > Situations that are stable lots of undersized PGs like this generally > mean that the CRUSH map is failing to allocate

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
It's at https://atw.hu/~koszik/ceph/crushmap.txt On Sat, 18 Feb 2017, Shinobu Kinjo wrote: > Can you do? > > * ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o ./crushmap.txt > > On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum wrote: > > Situations

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
Can you run the following? * ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o ./crushmap.txt On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum wrote: > Situations that are stable lots of undersized PGs like this generally > mean that the CRUSH map is failing to allocate

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Gregory Farnum
Situations that are stable with lots of undersized PGs like this generally mean that the CRUSH map is failing to allocate enough OSDs for certain PGs. The log you have says the OSD is trying to NOTIFY the new primary that the PG exists here on this replica. I'd guess you only have 3 hosts and are
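
One way to confirm this kind of allocation failure (not taken from the thread itself) is to compare a stuck PG's up and acting sets; an up set shorter than the pool's size means CRUSH gave up before finding enough OSDs. The PG id below is a placeholder.

ceph pg dump_stuck unclean
ceph pg map 5.24    # prints the osdmap epoch plus the up and acting OSD sets for that PG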

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
I'm not sure which variable I should be looking at exactly, but after reading through all of them I don't see anything suspicious; all values are 0. I'm attaching it anyway, in case I missed something: https://atw.hu/~koszik/ceph/osd26-perf I tried debugging the ceph pg query a bit more, and it

Re: [ceph-users] pgs stuck unclean

2017-02-16 Thread Tomasz Kuzemko
If the PG cannot be queried, I would bet on the OSD message throttler. Check with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD holding this PG whether the message throttler's current value has reached its max. If it has, increase the max value in ceph.conf and restart the OSD. -- Tomasz
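
A rough sketch of that check, assuming the default admin socket path and using osd.26 from earlier in the thread; the exact throttle-* section names in perf dump vary by version:

ceph --admin-daemon /var/run/ceph/ceph-osd.26.asok perf dump | python -m json.tool | less
# inspect each "throttle-*" section and compare its "val" field against "max"

Which ceph.conf option then needs raising depends on which throttler is saturated; likely candidates are settings such as ms_dispatch_throttle_bytes or osd_client_message_size_cap.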

[ceph-users] pgs stuck unclean

2017-02-16 Thread Matyas Koszik
Hi, It seems that my ceph cluster is in an erroneous state which I cannot see how to get out of right now. The status is the following:
     health HEALTH_WARN
            25 pgs degraded
            1 pgs stale
            26 pgs stuck unclean
            25 pgs undersized
            recovery 23578/9450442 objects

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-25 Thread Christian Balzer
ian > Thanks for the help > Goncalo > > > > From: Christian Balzer [ch...@gol.com] > Sent: 20 July 2016 19:36 > To: ceph-us...@ceph.com > Cc: Goncalo Borges > Subject: Re: [ceph-users] pgs stuck unclean after reweight > > Hello, > > On Wed, 20 Jul 2

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-23 Thread Goncalo Borges
. Thanks for the help Goncalo From: Christian Balzer [ch...@gol.com] Sent: 20 July 2016 19:36 To: ceph-us...@ceph.com Cc: Goncalo Borges Subject: Re: [ceph-users] pgs stuck unclean after reweight Hello, On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-20 Thread Christian Balzer
Hello, On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote: > Hi All... > > Today we had a warning regarding 8 near full osd. Looking to the osds > occupation, 3 of them were above 90%. One would hope that this would have been picked up earlier, as in before it even reaches near-full.

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-20 Thread M Ranga Swami Reddy
OK... try the same with osd.32 and osd.13, one by one (do osd.32 first and wait to see if any rebalance happens; if nothing changes, then do it on osd.13). Thanks, Swami On Wed, Jul 20, 2016 at 11:59 AM, Goncalo Borges wrote: > Hi Swami. > > Did not make any difference. > >

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-20 Thread Goncalo Borges
Hi Swami. Did not make any difference. Cheers G. On 07/20/2016 03:31 PM, M Ranga Swami Reddy wrote: can you restart osd.32 and check the status? Thanks Swami On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges wrote: Hi All... Today we had a warning regarding

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread M Ranga Swami Reddy
can you restart osd.32 and check the status? Thanks Swami On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges wrote: > Hi All... > > Today we had a warning regarding 8 near full osd. Looking to the osds > occupation, 3 of them were above 90%. In order to solve the

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges
I think I understood the source of the problem: 1. This is the original pg mapping before reweighting: # egrep "(^6.e2\s|^6.4\s|^5.24\s|^5.306\s)" /tmp/pg_dump.1
6.e2  1273200004539146855330843084  active+clean  2016-07-19 19:06:56.622185  1005'234027

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges
Hi KK... Thanks. I did set 'sortbitwise flag' since that was mentioned in the release notes. However I do not understand how this relates to this problem. Can you give a bit more info? Cheers and Thanks Goncalo On 07/20/2016 02:10 PM, K K wrote: Hi, Goncalo. Do you set sortbitwise

[ceph-users] pgs stuck unclean after reweight

2016-07-19 Thread Goncalo Borges
Hi All... Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs' occupation, 3 of them were above 90%. In order to solve the situation, I've decided to reweight those first using:
ceph osd crush reweight osd.1 2.67719
ceph osd crush reweight osd.26 2.67719
ceph osd

[ceph-users] pgs stuck unclean on a new pool despite the pool size reconfiguration

2015-10-02 Thread Giuseppe Civitella
Hi all, I have a Firefly cluster which has been upgraded from Emperor. It has 2 OSD hosts and 3 monitors. The cluster has the default values for the pools' size and min_size. Once upgraded to Firefly, I created a new pool called bench2 (ceph osd pool create bench2 128 128) and set its
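
The reconfiguration presumably looked something like the lines below; with only 2 OSD hosts, a replicated size of 3 cannot be satisfied while CRUSH separates replicas by host, which is a common cause of the symptom described here:

ceph osd pool create bench2 128 128
ceph osd pool set bench2 size 2
ceph osd pool set bench2 min_size 1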

Re: [ceph-users] pgs stuck unclean on a new pool despite the pool size reconfiguration

2015-10-02 Thread Warren Wang - ISD
<giuseppe.civite...@gmail.com> Date: Friday, October 2, 2015 at 10:05 AM To: ceph-users <ceph-us...@ceph.com> Subject: [ceph-users] pgs stuck unclean on a new pool despite the pool size reconfiguration

Re: [ceph-users] pgs stuck unclean on a new pool despite the pool size reconfiguration

2015-10-02 Thread Giuseppe Civitella
Date: Friday, October 2, 2015 at 10:05 AM > To: ceph-users <ceph-us...@ceph.com> > Subject: [ceph-users] pgs stuck unclean on a new pool despite the pool > size reconfiguration > > Hi all, > I have a Firefly cluster which has been upgraded from

Re: [ceph-users] 'pgs stuck unclean ' problem

2015-03-20 Thread Burkhard Linke
Hi, On 03/20/2015 01:58 AM, houguanghua wrote: Dear all, Ceph 0.72.2 is deployed in three hosts. But the ceph's status is HEALTH_WARN . The status is as follows: # ceph -s cluster e25909ed-25d9-42fd-8c97-0ed31eec6194 health HEALTH_WARN 768 pgs degraded; 768 pgs stuck

[ceph-users] 'pgs stuck unclean ' problem

2015-03-19 Thread houguanghua
Dear all, Ceph 0.72.2 is deployed on three hosts, but the cluster's status is HEALTH_WARN. The status is as follows:
# ceph -s
    cluster e25909ed-25d9-42fd-8c97-0ed31eec6194
     health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery 2/3 objects degraded (66.667%)
     monmap

Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-16 Thread Gregory Farnum
On Wed, Mar 11, 2015 at 3:49 PM, Francois Lafont flafdiv...@free.fr wrote: Hi, I was always in the same situation: I couldn't remove an OSD without have some PGs definitely stuck to the active+remapped state. But I remembered I read on IRC that, before to mark out an OSD, it could be

Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-16 Thread Craig Lewis
If I remember/guess correctly, if you mark an OSD out it won't necessarily change the weight of the bucket above it (ie, the host), whereas if you change the weight of the OSD then the host bucket's weight changes. -Greg That sounds right. Marking an OSD out is a ceph osd reweight, not
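
For reference, the two weights being contrasted map to different commands (osd.3 is the OSD from the original post):

ceph osd out 3                     # roughly equivalent to "ceph osd reweight 3 0": an override weight, the host bucket's weight is unchanged
ceph osd crush reweight osd.3 0    # changes the CRUSH weight, so the host bucket's weight drops as well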

Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-16 Thread Francois Lafont
Hi, Gregory Farnum wrote: If I remember/guess correctly, if you mark an OSD out it won't necessarily change the weight of the bucket above it (ie, the host), whereas if you change the weight of the OSD then the host bucket's weight changes. I can just say that, indeed, I have noticed

Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-11 Thread Francois Lafont
Hi, I was still in the same situation: I couldn't remove an OSD without having some PGs permanently stuck in the active+remapped state. But I remembered reading on IRC that, before marking an OSD out, it can sometimes be a good idea to reweight it to 0. So, instead of doing [1]: ceph osd out
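
A sketch of the alternative sequence being described, using osd.3 from the original post; the removal steps after "out" are the usual ones and are assumed rather than quoted:

ceph osd crush reweight osd.3 0
# wait for rebalancing to finish (ceph -s shows HEALTH_OK again), then:
ceph osd out 3
ceph osd crush remove osd.3
ceph auth del osd.3
ceph osd rm 3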

Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-11 Thread Francois Lafont
On 11/03/2015 05:44, Francois Lafont wrote: PS: here is my conf. [...] I have this too:
~# ceph osd crush show-tunables
{ choose_local_tries: 0,
  choose_local_fallback_tries: 0,
  choose_total_tries: 50,
  chooseleaf_descend_once: 1,
  chooseleaf_vary_r: 0,
  straw_calc_version: 1,

[ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-10 Thread Francois Lafont
Hi, I had a ceph cluster in HEALTH_OK state with Firefly 0.80.9. I just wanted to remove an OSD (which worked well). So after: ceph osd out 3 I waited for the rebalancing but I had PGs stuck unclean: --- ~# ceph -s cluster

Re: [ceph-users] pgs stuck unclean in a pool without name

2014-04-23 Thread Cedric Lemarchand
Hello, On 18/04/2014 16:33, Jean-Charles LOPEZ wrote: use the rados command to remove the empty pool name if you need to.
rados rmpool '' '' --yes-i-really-really-mean-it
You won't be able to remove it with the ceph command. Don't know how this pool got unclean; anyway, as you said, I removed

[ceph-users] pgs stuck unclean in a pool without name

2014-04-18 Thread Cedric Lemarchand
Hi, I am facing a strange behaviour where a pool is stuck; I have no idea how this pool appeared in the cluster, given that I have not played with pool creation, *yet*.
root@node1:~# ceph -s
    cluster 1b147882-722c-43d8-8dfb-38b78d9fbec3
     health HEALTH_WARN 333 pgs degraded; 333 pgs

Re: [ceph-users] pgs stuck unclean in a pool without name

2014-04-18 Thread Ирек Фасихов
Please show the output of: ceph osd tree. 2014-04-18 14:51 GMT+04:00 Cedric Lemarchand ced...@yipikai.org: Hi, I am facing a strange behaviour where a pool is stucked, I have no idea how this pool appear in the cluster in the way I have not played with pool creation, *yet*. #

Re: [ceph-users] pgs stuck unclean in a pool without name

2014-04-18 Thread Cedric Lemarchand
Hi, On 18/04/2014 13:14, Ирек Фасихов wrote: Show command please: ceph osd tree. Sure:
root@node1:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      3       root default
-2      3       host node1
0       1       osd.0   up      1
1       1       osd.1   up      1

Re: [ceph-users] pgs stuck unclean in a pool without name

2014-04-18 Thread Ирек Фасихов
These pools are created automatically when S3 (ceph-radosgw) starts. By default, your configuration file indicates the number of PGs = 333, but that is a lot for your configuration. 2014-04-18 15:28 GMT+04:00 Cedric Lemarchand ced...@yipikai.org: Hi, On 18/04/2014 13:14, Ирек Фасихов

Re: [ceph-users] pgs stuck unclean in a pool without name

2014-04-18 Thread Jean-Charles LOPEZ
Hi Cedric, use the rados command to remove the empty pool name if you need to:
rados rmpool '' '' --yes-i-really-really-mean-it
You won't be able to remove it with the ceph command. JC On Apr 18, 2014, at 03:51, Cedric Lemarchand ced...@yipikai.org wrote: Hi, I am facing a strange
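
To double-check that the nameless pool really exists before removing it, listing the pools should show an entry with an empty name (a sketch; the exact rendering of the empty name is illustrative):

rados lspools                                       # the nameless pool shows up as an empty line
rados rmpool '' '' --yes-i-really-really-mean-it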

Re: [ceph-users] pgs stuck unclean since forever, current state active+remapped

2013-08-15 Thread Gregory Farnum
They're unclean because CRUSH isn't generating an acting set of sufficient size so the OSDs/monitors are keeping them remapped in order to maintain replication guarantees. Look in the docs for the crush tunables options for a discussion on this. -Greg Software Engineer #42 @ http://inktank.com |

Re: [ceph-users] pgs stuck unclean since forever, current state active+remapped

2013-08-15 Thread 不坏阿峰
Many thanks. I did, and resolved it by:
#ceph osd getcrushmap -o /tmp/crush
#crushtool -i /tmp/crush --enable-unsafe-tunables --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 -o /tmp/crush.new
root@ceph-admin:/etc/ceph# ceph osd setcrushmap -i

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-13 Thread Samuel Just
Cool! -Sam On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow j...@rtr.com wrote: Sam, Thanks that did it :-) health HEALTH_OK monmap e17: 5 mons at {a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0}, election epoch 9794,

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap? -Sam On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote: Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam, I've attached both files. Thanks! Jeff On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote: Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap? -Sam On Fri, Aug 9, 2013 at 4:28 AM, Jeff

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Are you using any kernel clients? Will osds 3,14,16 be coming back? -Sam On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow j...@rtr.com wrote: Sam, I've attached both files. Thanks! Jeff On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote: Can you attach the output of

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam, 3, 14 and 16 have been down for a while and I'll eventually replace those drives (I could do it now) but didn't want to introduce more variables. We are using RBD with Proxmox, so I think the answer about kernel clients is yes Jeff On Mon, Aug 12, 2013 at 02:41:11PM

[ceph-users] pgs stuck unclean since forever, current state active+remapped

2013-08-12 Thread 不坏阿峰
I have PGs that have been stuck for a long time and I do not know how to fix it. Can someone help me check? Environment: Debian 7 + ceph 0.617
root@ceph-admin:~# ceph -s
   health HEALTH_WARN 6 pgs stuck unclean
   monmap e2: 2 mons at {a=192.168.250.15:6789/0,b=192.168.250.8:6789/0}, election epoch 8,

[ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Hi, I have a 5 node ceph cluster that is running well (no problems using any of the rbd images and that's really all we use). I have replication set to 3 on all three pools (data, metadata and rbd). ceph -s reports: health HEALTH_WARN 3 pgs degraded;

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Wido den Hollander
On 08/09/2013 10:58 AM, Jeff Moskow wrote: Hi, I have a 5 node ceph cluster that is running well (no problems using any of the rbd images and that's really all we use). I have replication set to 3 on all three pools (data, metadata and rbd). ceph -s reports:

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2 minutes and then doing the next one (all OSD's eventually restarted). I tried this twice.

Re: [ceph-users] pgs stuck unclean after growing my ceph-cluster

2013-03-14 Thread Ansgar Jazdzewski
Hi, Thanks, my warning is gone now. 2013/3/13 Jeff Anderson-Lee jo...@eecs.berkeley.edu On 3/13/2013 9:31 AM, Greg Farnum wrote: Nope, it's not because you were using the cluster. The unclean PGs here are those which are in the active+remapped state. That's actually two states — active