David Turner wrote:
: A couple things. You didn't `ceph osd crush remove osd.21` after doing the
: other bits. Also you will want to remove the bucket (re: host) from the
: crush map as it will now be empty. Right now you have a host in the crush
: map with a weight, but no osds to put that
I would stop the service, down, out, rm, auth del, crush remove, disable
service, fstab, umount.
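Spelled out, that order would look roughly like this (osd.21 is borrowed from this thread; the fstab pattern and mount point are assumptions to adapt to your layout):

OSD_ID=21
systemctl stop ceph-osd@${OSD_ID}.service
ceph osd down ${OSD_ID}
ceph osd out ${OSD_ID}
ceph osd rm ${OSD_ID}
ceph auth del osd.${OSD_ID}
ceph osd crush remove osd.${OSD_ID}
systemctl disable ceph-osd@${OSD_ID}.service
# drop the OSD's data partition from fstab (pattern is a guess), then unmount
sed -i "/ceph-${OSD_ID}/d" /etc/fstab
umount /var/lib/ceph/osd/ceph-${OSD_ID}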
So you did remove it from your crush map, then? Could you post your `ceph
osd tree`?
On Wed, Jun 28, 2017, 10:12 AM Mazzystr wrote:
> I've been using this procedure to remove
I've been using this procedure to remove OSDs...
OSD_ID=
ceph auth del osd.${OSD_ID}
ceph osd down ${OSD_ID}
ceph osd out ${OSD_ID}
ceph osd rm ${OSD_ID}
ceph osd crush remove osd.${OSD_ID}
systemctl disable ceph-osd@${OSD_ID}.service
systemctl stop ceph-osd@${OSD_ID}.service
sed -i
A couple things. You didn't `ceph osd crush remove osd.21` after doing the
other bits. Also you will want to remove the bucket (re: host) from the
crush map as it will now be empty. Right now you have a host in the crush
map with a weight, but no osds to put that data on. It has a weight
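Once the stray osd entry is gone, dropping the now-empty host bucket is one more command; the host name below is a placeholder:

ceph osd crush remove osd.21
ceph osd crush remove <hostname-of-the-emptied-node>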
Hello,
TL;DR: what to do when my cluster reports stuck unclean pgs?
Detailed description:
One of the nodes in my cluster died. Ceph correctly rebalanced itself
and reached the HEALTH_OK state. I have looked at the failed server
and decided to take it out of the cluster permanently,
On Sat, Feb 18, 2017 at 9:03 AM, Matyas Koszik wrote:
>
>
> Looks like you've provided me with the solution, thanks!
:)
> I've set the tunables to firefly, and now I only see the normal states
> associated with a recovering cluster, there're no more stale pgs.
> I hope it'll stay
Looks like you've provided me with the solution, thanks!
I've set the tunables to firefly, and now I only see the normal states
associated with a recovering cluster, there're no more stale pgs.
I hope it'll stay like this when it's done, but that'll take quite a
while.
Matyas
On Fri, 17 Feb
I set it to 100, then restarted osd26, but after recovery everything is as
it was before.
On Sat, 18 Feb 2017, Shinobu Kinjo wrote:
> You may need to increase ``choose_total_tries`` to more than 50
> (default) up to 100.
>
> -
>
You may need to increase ``choose_total_tries`` to more than 50
(default) up to 100.
-
http://docs.ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
- https://github.com/ceph/ceph/blob/master/doc/man/8/crushtool.rst
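Roughly, the tunable can be bumped with the same getcrushmap/crushtool workflow used later in this thread; the /tmp file names are arbitrary:

ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --set-choose-total-tries 100 -o /tmp/crushmap.new
ceph osd setcrushmap -i /tmp/crushmap.new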
On Sat, Feb 18, 2017 at 5:25 AM, Matyas Koszik
I have size=2 and 3 independent nodes. I'm happy to try firefly tunables,
but a bit scared that it would make things even worse.
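For reference, switching the whole tunables profile (which is what ended up fixing things here) is a single command, though it does trigger data movement while the cluster rebalances:

ceph osd crush tunables firefly
ceph osd crush show-tunables    # check which profile is now in effect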
On Fri, 17 Feb 2017, Gregory Farnum wrote:
> Situations that are stable with lots of undersized PGs like this generally
> mean that the CRUSH map is failing to allocate
It's at https://atw.hu/~koszik/ceph/crushmap.txt
On Sat, 18 Feb 2017, Shinobu Kinjo wrote:
> Can you do?
>
> * ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o
> ./crushmap.txt
>
> On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum wrote:
> > Situations
Can you do?
* ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o
./crushmap.txt
On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum wrote:
> Situations that are stable with lots of undersized PGs like this generally
> mean that the CRUSH map is failing to allocate
Situations that are stable with lots of undersized PGs like this generally
mean that the CRUSH map is failing to allocate enough OSDs for certain
PGs. The log you have says the OSD is trying to NOTIFY the new primary
that the PG exists here on this replica.
I'd guess you only have 3 hosts and are
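One quick way to confirm that picture is to look at the up/acting sets of the stuck PGs; <pgid> below is a placeholder:

ceph pg dump_stuck unclean    # list stuck PGs with their up/acting sets
ceph pg <pgid> query          # detailed peering state for one of them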
I'm not sure which variable I should be looking at exactly, but after
reading through all of them I don't see anything suspicious; all values are
0. I'm attaching it anyway, in case I missed something:
https://atw.hu/~koszik/ceph/osd26-perf
I tried debugging the ceph pg query a bit more, and it
If the PG cannot be queried, I would bet on the OSD message throttler. Check with
"ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD which is holding
this PG whether the message throttler's current value equals its max. If it does,
increase the max value in ceph.conf and restart the OSD.
--
Tomasz
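A concrete form of that check, assuming the default admin socket path and osd.26 from this thread; compare "val" against "max" in every throttle-* section:

ceph --admin-daemon /var/run/ceph/ceph-osd.26.asok perf dump > /tmp/osd26-perf
grep -A 10 '"throttle-' /tmp/osd26-perf

If, for example, throttle-osd_client_messages shows val equal to max, raising the corresponding cap (osd_client_message_cap) in ceph.conf and restarting that OSD is what Tomasz describes.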
Hi,
It seems that my ceph cluster is in an erroneous state, and right now I
cannot see how to get out of it.
The status is the following:
health HEALTH_WARN
25 pgs degraded
1 pgs stale
26 pgs stuck unclean
25 pgs undersized
recovery 23578/9450442 objects
> Thanks for the help
> Goncalo
>
>
>
> From: Christian Balzer [ch...@gol.com]
> Sent: 20 July 2016 19:36
> To: ceph-us...@ceph.com
> Cc: Goncalo Borges
> Subject: Re: [ceph-users] pgs stuck unclean after reweight
>
> Hello,
>
> On Wed, 20 Jul 2
Thanks for the help
Goncalo
From: Christian Balzer [ch...@gol.com]
Sent: 20 July 2016 19:36
To: ceph-us...@ceph.com
Cc: Goncalo Borges
Subject: Re: [ceph-users] pgs stuck unclean after reweight
Hello,
On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote
Hello,
On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote:
> Hi All...
>
> Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
> occupancy, 3 of them were above 90%.
One would hope that this would have been picked up earlier, as in before
it even reaches near-full.
OK... try the same with osd.32 and osd.13, one by one (do osd.32 first and
wait to see whether any rebalance happens; if nothing changes, then do it on
osd.13).
thanks
Swami
On Wed, Jul 20, 2016 at 11:59 AM, Goncalo Borges
wrote:
> Hi Swami.
>
> Did not make any difference.
>
>
Hi Swami.
Did not make any difference.
Cheers
G.
On 07/20/2016 03:31 PM, M Ranga Swami Reddy wrote:
can you restart osd.32 and check the status?
Thanks
Swami
On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges
wrote:
Hi All...
Today we had a warning regarding
can you restart osd.32 and check the status?
Thanks
Swami
On Wed, Jul 20, 2016 at 9:12 AM, Goncalo Borges
wrote:
> Hi All...
>
> Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
> occupancy, 3 of them were above 90%. In order to solve the
I think I understood the source of the problem:
1. This is the original pg mapping before reweighing:
# egrep "(^6.e2\s|^6.4\s|^5.24\s|^5.306\s)" /tmp/pg_dump.1
6.e2  12732  0  0  0  0  45391468553  3084  3084  active+clean  2016-07-19 19:06:56.622185  1005'234027
Hi KK...
Thanks. I did set the 'sortbitwise' flag since that was mentioned in the
release notes.
However I do not understand how this relates to this problem.
Can you give a bit more info?
Cheers and Thanks
Goncalo
On 07/20/2016 02:10 PM, K K wrote:
Hi, Goncalo.
Did you set the sortbitwise
Hi All...
Today we had a warning regarding 8 near-full OSDs. Looking at the OSDs'
occupancy, 3 of them were above 90%. In order to solve the situation,
I've decided to reweight those first using
ceph osd crush reweight osd.1 2.67719
ceph osd crush reweight osd.26 2.67719
ceph osd
Hi all,
I have a Firefly cluster which has been upgraded from Emperor.
It has 2 OSD hosts and 3 monitors.
The cluster has default values for the size and min_size of the
pools.
Once upgraded to Firefly, I created a new pool called bench2:
ceph osd pool create bench2 128 128
and set its
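Presumably size and min_size, per the subject line; a sketch with placeholder values that are not taken from the original post:

ceph osd pool set bench2 size 2
ceph osd pool set bench2 min_size 1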
<giuseppe.civite...@gmail.com>
Date: Friday, October 2, 2015 at 10:05 AM
To: ceph-users <ceph-us...@ceph.com>
Subject: [ceph-users] pgs stuck unclean on a new pool despite the pool size
reconfiguration
Date: Friday, October 2, 2015 at 10:05 AM
> To: ceph-users <ceph-us...@ceph.com>
> Subject: [ceph-users] pgs stuck unclean on a new pool despite the pool
> size reconfiguration
>
> Hi all,
> I have a Firefly cluster which has been upgraded from
Hi,
On 03/20/2015 01:58 AM, houguanghua wrote:
Dear all,
Ceph 0.72.2 is deployed on three hosts, but the cluster's status is
HEALTH_WARN. The status is as follows:
# ceph -s
cluster e25909ed-25d9-42fd-8c97-0ed31eec6194
health HEALTH_WARN 768 pgs degraded; 768 pgs stuck
Dear all,
Ceph 0.72.2 is deployed on three hosts, but the cluster's status is HEALTH_WARN.
The status is as follows:
# ceph -s
cluster e25909ed-25d9-42fd-8c97-0ed31eec6194
health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery 2/3
objects degraded (66.667%)
monmap
On Wed, Mar 11, 2015 at 3:49 PM, Francois Lafont flafdiv...@free.fr wrote:
Hi,
I was always in the same situation: I couldn't remove an OSD without
having some PGs left permanently stuck in the active+remapped state.
But I remembered reading on IRC that, before marking an OSD out, it
could be
If I remember/guess correctly, if you mark an OSD out it won't
necessarily change the weight of the bucket above it (ie, the host),
whereas if you change the weight of the OSD then the host bucket's
weight changes.
-Greg
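Put as commands, the distinction looks like this (osd.5 is a placeholder; the first form only touches the override reweight, the second changes the CRUSH weight and therefore the host bucket's total):

ceph osd out 5                     # roughly equivalent to: ceph osd reweight 5 0
ceph osd crush reweight osd.5 0    # also lowers the host bucket's weight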
That sounds right. Marking an OSD out is a ceph osd reweight, not a ceph
osd crush reweight.
Hi,
Gregory Farnum wrote:
If I remember/guess correctly, if you mark an OSD out it won't
necessarily change the weight of the bucket above it (ie, the host),
whereas if you change the weight of the OSD then the host bucket's
weight changes.
I can just say that, indeed, I have noticed
Hi,
I was always in the same situation: I couldn't remove an OSD without
having some PGs left permanently stuck in the active+remapped state.
But I remembered reading on IRC that, before marking an OSD out, it
can sometimes be a good idea to reweight it to 0 first. So, instead of
doing [1]:
ceph osd out
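In other words, the reweight-first variant looks roughly like this (osd.3 matches the id used later in this thread; which reweight command you want depends on the out-vs-crush-reweight distinction discussed earlier in this thread):

ceph osd crush reweight osd.3 0    # drain the OSD and let rebalancing finish
ceph -s                            # wait until the cluster is healthy again
ceph osd out 3                     # marking it out should now move nothing further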
On 11/03/2015 05:44, Francois Lafont wrote:
PS: here is my conf.
[...]
I have this too:
~# ceph osd crush show-tunables
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "chooseleaf_vary_r": 0,
  "straw_calc_version": 1,
Hi,
I had a ceph cluster in HEALTH_OK state with Firefly 0.80.9. I just
wanted to remove an OSD (which worked well). So after:
ceph osd out 3
I waited for the rebalancing but I had PGs stuck unclean:
---
~# ceph -s
cluster
Hello,
On 18/04/2014 16:33, Jean-Charles LOPEZ wrote:
use the rados command to remove the empty pool name if you need to.
rados rmpool '' '' --yes-i-really-really-mean-it
You won't be able to remove it with the ceph command
Don't know how this pool got unclean; anyway, as you said, I removed
Hi,
I am facing strange behaviour where a pool is stuck; I have no idea
how this pool appeared in the cluster, since I have not played with
pool creation, *yet*.
root@node1:~# ceph -s
cluster 1b147882-722c-43d8-8dfb-38b78d9fbec3
health HEALTH_WARN 333 pgs degraded; 333 pgs
Please show the output of: ceph osd tree.
2014-04-18 14:51 GMT+04:00 Cedric Lemarchand ced...@yipikai.org:
Hi,
I am facing strange behaviour where a pool is stuck; I have no idea
how this pool appeared in the cluster, since I have not played with pool
creation, *yet*.
Hi,
On 18/04/2014 13:14, Ирек Фасихов wrote:
Please show the output of: ceph osd tree.
Sure :
root@node1:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      3       root default
-2      3       host node1
0       1       osd.0           up      1
1       1       osd.1           up      1
These pools are created automatically when the S3 gateway (ceph-radosgw)
starts. By default, your configuration file specifies pg num = 333 for them,
but that is a lot for your configuration.
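If that 333 comes from the generic pool-creation defaults, the relevant ceph.conf settings would be something like the following; 64 is an arbitrary placeholder value:

[global]
osd pool default pg num = 64
osd pool default pgp num = 64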
2014-04-18 15:28 GMT+04:00 Cedric Lemarchand ced...@yipikai.org:
Hi,
On 18/04/2014 13:14, Ирек Фасихов
Hi Cedric,
use the rados command to remove the empty pool name if you need to.
rados rmpool '' '' --yes-i-really-really-mean-it
You won't be able to remove it with the ceph command
JC
On Apr 18, 2014, at 03:51, Cedric Lemarchand ced...@yipikai.org wrote:
Hi,
I am facing a strange
They're unclean because CRUSH isn't generating an acting set of
sufficient size so the OSDs/monitors are keeping them remapped in
order to maintain replication guarantees. Look in the docs for the
crush tunables options for a discussion on this.
-Greg
Software Engineer #42 @ http://inktank.com |
Many thanks. I did, and resolved it by:
#ceph osd getcrushmap -o /tmp/crush
#crushtool -i /tmp/crush --enable-unsafe-tunables
--set-choose-local-tries 0 --set-choose-local-fallback-tries 0
--set-choose-total-tries 50 -o /tmp/crush.new
root@ceph-admin:/etc/ceph# ceph osd setcrushmap -i /tmp/crush.new
Cool!
-Sam
On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow j...@rtr.com wrote:
Sam,
Thanks that did it :-)
health HEALTH_OK
monmap e17: 5 mons at
{a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0},
election epoch 9794,
Can you attach the output of ceph osd tree?
Also, can you run
ceph osd getmap -o /tmp/osdmap
and attach /tmp/osdmap?
-Sam
On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote:
Thanks for the suggestion. I had tried stopping each OSD for 30 seconds,
then restarting it, waiting 2
Sam,
I've attached both files.
Thanks!
Jeff
On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
Can you attach the output of ceph osd tree?
Also, can you run
ceph osd getmap -o /tmp/osdmap
and attach /tmp/osdmap?
-Sam
On Fri, Aug 9, 2013 at 4:28 AM, Jeff
Are you using any kernel clients? Will osds 3,14,16 be coming back?
-Sam
On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow j...@rtr.com wrote:
Sam,
I've attached both files.
Thanks!
Jeff
On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
Can you attach the output of
Sam,
3, 14 and 16 have been down for a while and I'll eventually replace
those drives (I could do it now)
but didn't want to introduce more variables.
We are using RBD with Proxmox, so I think the answer about kernel
clients is yes
Jeff
On Mon, Aug 12, 2013 at 02:41:11PM
I've got PGs that have been stuck for a long time and don't know how to fix
them. Can someone help check?
Environment: Debian 7 + ceph 0.61.7
root@ceph-admin:~# ceph -s
health HEALTH_WARN 6 pgs stuck unclean
monmap e2: 2 mons at {a=192.168.250.15:6789/0,b=192.168.250.8:6789/0},
election epoch 8,
Hi,
I have a 5 node ceph cluster that is running well (no problems using
any of the
rbd images and that's really all we use).
I have replication set to 3 on all three pools (data, metadata and rbd).
ceph -s reports:
health HEALTH_WARN 3 pgs degraded;
On 08/09/2013 10:58 AM, Jeff Moskow wrote:
Hi,
I have a 5 node ceph cluster that is running well (no problems using
any of the
rbd images and that's really all we use).
I have replication set to 3 on all three pools (data, metadata and rbd).
ceph -s reports:
Thanks for the suggestion. I had tried stopping each OSD for 30
seconds, then restarting it, waiting 2 minutes and then doing the next
one (all OSDs eventually restarted). I tried this twice.
Hi,
Thanks, my warning is gone now.
2013/3/13 Jeff Anderson-Lee jo...@eecs.berkeley.edu
On 3/13/2013 9:31 AM, Greg Farnum wrote:
Nope, it's not because you were using the cluster. The unclean PGs here
are those which are in the active+remapped state. That's actually two
states — active