Re: [ceph-users] Core dump while getting a volume real size with a python script
... and this is the core dump output while executing the "rbd diff" command:
http://paste.openstack.org/show/477604/

Regards,
Giuseppe

2015-10-28 16:46 GMT+01:00 Giuseppe Civitella <giuseppe.civite...@gmail.com>:
> [...]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Core dump while getting a volume real size with a python script
Hi all,

I'm trying to get the real disk usage of a Cinder volume by converting these
bash commands to Python:
http://cephnotes.ksperis.com/blog/2013/08/28/rbd-image-real-size

I wrote a small test function which has already worked in many cases, but it
stops with a core dump while trying to calculate the real size of one
particular volume.

This is the function: http://paste.openstack.org/show/477563/
This is the error I get: http://paste.openstack.org/show/477567/
And these are the related rbd info: http://paste.openstack.org/show/477568/

Can anyone help me debug the problem?

Thanks
Giuseppe
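The bash pipeline in the linked blog post sums the second column of "rbd diff" output with awk. A minimal pure-Python sketch of that summing step follows; it assumes the plain-text offset/length/type output shown in the post, and the function name is illustrative, not part of any Ceph API:

```python
def real_size_from_diff(diff_lines):
    """Sum the Length column of `rbd diff` plain-text output.

    Mirrors the awk one-liner from the blog post:
        rbd diff rbd/myimage | awk '{SUM += $2} END {print SUM}'
    """
    total = 0
    for line in diff_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            total += int(fields[1])  # skips a header row, if present
        except ValueError:
            continue
    return total

# Example with made-up extents: two 4 MiB "data" extents -> 8388608 bytes
sample = ["Offset  Length  Type",
          "0       4194304 data",
          "8388608 4194304 data"]
print(real_size_from_diff(sample))
```

The python-rbd bindings also expose Image.diff_iterate(), which avoids shelling out entirely; if the segfault comes from the bindings, comparing both paths on the problematic volume may help isolate it.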
[ceph-users] pgs stuck unclean on a new pool despite the pool size reconfiguration
Hi all,
I have a Firefly cluster which has been upgraded from Emperor. It has 2 OSD
hosts and 3 monitors. The cluster has default values for the size and
min_size of its pools.

Once upgraded to Firefly, I created a new pool called bench2:

ceph osd pool create bench2 128 128

and set its sizes:

ceph osd pool set bench2 size 2
ceph osd pool set bench2 min_size 1

This is the state of the pools:

pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 stripe_width 0
pool 3 'volumes' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 2568 stripe_width 0 removed_snaps [1~75]
pool 4 'images' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 1895 stripe_width 0
pool 8 'bench2' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 2580 flags hashpspool stripe_width 0

Despite this, I still get a warning about 128 pgs stuck unclean.
"ceph health detail" shows me the stuck PGs, so I take one to find the
involved OSDs:

pg 8.38 is stuck unclean since forever, current state active, last acting [22,7]

If I restart the OSD with id 22, PG 8.38 reaches an active+clean state.

This is incorrect behavior, AFAIK: the cluster should pick up the new size
and min_size values without any manual intervention. So my questions are:
any idea why this happens, and how do I restore the default behavior? Do I
need to restart all of the OSDs to restore a healthy state?
Thanks a lot
Giuseppe
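The "take one stuck PG and find its acting OSDs" step can be scripted by scraping "ceph health detail" output. A hedged sketch, assuming the line format shown in this thread (newer releases may phrase it differently):

```python
import re

# Matches lines such as:
#   pg 8.38 is stuck unclean since forever, current state active, last acting [22,7]
STUCK_RE = re.compile(r"pg (\S+) is stuck (\w+).*last acting \[([\d,]+)\]")

def stuck_pgs(health_detail):
    """Map each stuck PG id to the list of its acting OSD ids."""
    result = {}
    for line in health_detail.splitlines():
        m = STUCK_RE.search(line)
        if m:
            result[m.group(1)] = [int(x) for x in m.group(3).split(",")]
    return result
```

Feeding it the full "ceph health detail" output yields a dict such as {"8.38": [22, 7]}, which makes it easy to see whether the stuck PGs concentrate on a few OSDs.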
Re: [ceph-users] pgs stuck unclean on a new pool despite the pool size reconfiguration
Hi Warren,
a simple:

ceph osd pool set bench2 hashpspool false

solved my problem.
Thanks a lot
Giuseppe

2015-10-02 16:18 GMT+02:00 Warren Wang - ISD <warren.w...@walmart.com>:
> You probably don't want hashpspool automatically set, since your clients
> may still not understand that crush map feature. You can try to unset it
> for that pool and see what happens, or create a new pool without
> hashpspool enabled from the start. Just a guess.
>
> Warren
>
> From: Giuseppe Civitella <giuseppe.civite...@gmail.com>
> Date: Friday, October 2, 2015 at 10:05 AM
> To: ceph-users <ceph-us...@ceph.com>
> Subject: [ceph-users] pgs stuck unclean on a new pool despite the pool
> size reconfiguration
>
> [...]
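Before unsetting it, a pool's flags can be checked from "ceph osd dump --format json". A sketch, assuming the per-pool "flags_names" field (a comma-separated string) emitted by Firefly-era releases; verify against your cluster's actual output:

```python
import json

def pools_with_flag(osd_dump_json, flag="hashpspool"):
    """Return the names of pools carrying a given flag.

    Expects the string output of `ceph osd dump --format json`;
    the "flags_names" field is assumed to be comma-separated.
    """
    dump = json.loads(osd_dump_json)
    return [p["pool_name"] for p in dump.get("pools", [])
            if flag in p.get("flags_names", "").split(",")]
```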
Re: [ceph-users] Binding a pool to certain OSDs
So it was a PG problem. I added a couple of OSDs per host, reconfigured the
CRUSH map and the cluster began to work properly.
Thanks
Giuseppe

2015-04-14 19:02 GMT+02:00 Saverio Proto <ziopr...@gmail.com>:
> No error message. You just exhaust the RAM and blow up the cluster
> because of too many PGs.
>
> Saverio
>
> 2015-04-14 18:52 GMT+02:00 Giuseppe Civitella
> <giuseppe.civite...@gmail.com>:
> > [...]
Re: [ceph-users] Binding a pool to certain OSDs
Hi Saverio,
I first made a test on my staging lab where I have only 4 OSDs. On my mon
servers (which run other services) I have 16GB RAM, 15GB used but 5GB
cached. On the OSD servers I have 3GB RAM, 3GB used but 2GB cached.
"ceph -s" tells me nothing about PGs; shouldn't I get an error message in
its output?
Thanks
Giuseppe

2015-04-14 18:20 GMT+02:00 Saverio Proto <ziopr...@gmail.com>:
> You only have 4 OSDs? How much RAM per server? I think you already have
> too many PGs. Check your RAM usage, and check the Ceph wiki guidelines
> to dimension the correct number of PGs. Remember that every time you
> create a new pool you add PGs to the system.
>
> Saverio
>
> 2015-04-14 17:58 GMT+02:00 Giuseppe Civitella
> <giuseppe.civite...@gmail.com>:
> > [...]
Re: [ceph-users] Binding a pool to certain OSDs
Hi all,
I've been following this tutorial to realize my setup:
http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/
Then I modified the map and uploaded it. This is the final version:
http://paste.openstack.org/show/203888/

When I applied the new CRUSH map, after some rebalancing, I got this health
status:

[- avalon1 root@controller001 Ceph -] # ceph -s
    cluster af09420b-4032-415e-93fc-6b60e9db064e
     health HEALTH_WARN crush map has legacy tunables; mon.controller001 low disk space; clock skew detected on mon.controller002
     monmap e1: 3 mons at {controller001=10.235.24.127:6789/0,controller002=10.235.24.128:6789/0,controller003=10.235.24.129:6789/0}, election epoch 314, quorum 0,1,2 controller001,controller002,controller003
     osdmap e3092: 4 osds: 4 up, 4 in
      pgmap v785873: 576 pgs, 6 pools, 71548 MB data, 18095 objects
            8842 MB used, 271 GB / 279 GB avail
                 576 active+clean

and this osd tree:

[- avalon1 root@controller001 Ceph -] # ceph osd tree
# id    weight  type name               up/down reweight
-8      2       root sed
-5      1               host ceph001-sed
2       1                       osd.2   up      1
-7      1               host ceph002-sed
3       1                       osd.3   up      1
-1      2       root default
-4      1               host ceph001-sata
0       1                       osd.0   up      1
-6      1               host ceph002-sata
1       1                       osd.1   up      1

which does not seem a bad situation. The problem arises when I try to
create a new pool: the command

ceph osd pool create sed 128 128

gets stuck and never completes, and I noticed that my Cinder installation
is no longer able to create volumes. I've been looking in the logs for
errors and found nothing.

Any hint about how to proceed to restore my ceph cluster? Is there
something wrong with the steps I take to update the CRUSH map? Is the
problem related to Emperor?

Regards,
Giuseppe

2015-04-13 18:26 GMT+02:00 Giuseppe Civitella
<giuseppe.civite...@gmail.com>:
> [...]
[ceph-users] Binding a pool to certain OSDs
Hi all,
I've got a Ceph cluster which serves volumes to a Cinder installation. It
runs Emperor.

I'd like to replace some of the disks with OPAL disks and create a new pool
which uses exclusively the latter kind of disk, so that a traditional pool
and a secure one coexist on the same ceph host. I'd then use Cinder's
multi-backend feature to serve them.

My question is: how is it possible to realize such a setup? How can I bind
a pool to certain OSDs?

Thanks
Giuseppe
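The tutorial cited elsewhere in this thread answers this by giving the special disks their own CRUSH root and rule. A sketch of a decompiled CRUSH map excerpt, reusing the "sed" root and host names that appear in this thread's osd tree; the ruleset number 3 and the 1.000 weights are assumptions:

```
# Dedicated root for the OPAL/SED OSDs (legacy, pre-Luminous syntax)
root sed {
        id -8
        alg straw
        hash 0  # rjenkins1
        item ceph001-sed weight 1.000
        item ceph002-sed weight 1.000
}

rule sed {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take sed
        step chooseleaf firstn 0 type host
        step emit
}
```

After recompiling and injecting the map (crushtool -c, then ceph osd setcrushmap -i), a pool is bound to those OSDs with: ceph osd pool set <poolname> crush_ruleset 3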
[ceph-users] Rbd image's data deletion
Hi all,
what happens to the data contained in an rbd image when the image itself is
deleted? Is the data just unlinked, or is it destroyed in a way that makes
it unreadable?
Thanks
Giuseppe
[ceph-users] Ceph, LIO, VMWARE anyone?
Hi all,
I'm working on a lab setup in which Ceph serves rbd images as iSCSI
datastores to VMware via a LIO box. Has anyone already done something
similar and is willing to share some knowledge? Any production deployments?
What about LIO's HA and LUN performance?
Thanks
Giuseppe
[ceph-users] Ceph-deploy install and pinning on Ubuntu 14.04
Hi all,
I'm using ceph-deploy on Ubuntu 14.04. When I do a "ceph-deploy install" I
see packages getting installed from the Ubuntu repositories instead of
Ceph's own. Am I missing something? Do I need to do some pinning on the
repositories?
Thanks
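A common cause is that the Ubuntu archive carries the same package names at equal or higher apt priority. A hedged sketch of an apt preferences file that pins the ceph.com repository above the distribution packages; the origin host must match the repo line ceph-deploy actually added, and the result can be checked with "apt-cache policy ceph":

```
# /etc/apt/preferences.d/ceph.pref (illustrative; adjust the origin host)
Package: *
Pin: origin ceph.com
Pin-Priority: 1001
```

A priority above 1000 also allows apt to downgrade to the pinned repository's version if a newer Ubuntu package is already installed.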
Re: [ceph-users] active+degraded on an empty new cluster
Craig, Gregory,
my disks were a bit smaller than 10GB. I replaced them with 20GB disks and
the cluster's health went OK.
Thanks a lot

2014-12-10 0:08 GMT+01:00 Craig Lewis <cle...@centraldesktop.com>:
> When I first created a test cluster, I used 1 GiB disks. That causes
> problems. Ceph gives every OSD a CRUSH weight; by default, the weight is
> the size of the disk in TiB, truncated to 2 decimal places, i.e. any
> disk smaller than 10 GiB will have a weight of 0.00. I increased all of
> my virtual disks to 10 GiB and, after rebooting the nodes (to see the
> changes), everything healed.
>
> On Tue, Dec 9, 2014 at 9:45 AM, Gregory Farnum <g...@gregs42.com> wrote:
> > It looks like your OSDs all have weight zero for some reason. I'd fix
> > that. :)
> > -Greg
> >
> > On Tue, Dec 9, 2014 at 6:24 AM Giuseppe Civitella
> > <giuseppe.civite...@gmail.com> wrote:
> > > [...]
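Craig's rule of thumb can be checked numerically: the default CRUSH weight is the device size in TiB truncated to two decimals, so small virtual disks truncate to 0.00 and never receive data. A small sketch of that arithmetic; the function name is illustrative, not a Ceph API:

```python
def default_crush_weight(size_bytes):
    """Device size in TiB, truncated to two decimal places.

    Mirrors the default weight Craig describes: disks much below
    ~10 GiB truncate to 0.00 and therefore attract no data.
    """
    tib = size_bytes / float(1 << 40)
    return int(tib * 100) / 100.0

# A 10 GiB disk truncates to 0.00; a 20 GiB disk gets weight 0.01.
print(default_crush_weight(10 * (1 << 30)), default_crush_weight(20 * (1 << 30)))
```

This also explains why replacing the disks with 20GB ones (weight 0.01) let the PGs map and the cluster heal.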
[ceph-users] active+degraded on an empty new cluster
Hi all,
last week I installed a new ceph cluster on 3 VMs running Ubuntu 14.04 with
the default kernel. There is a ceph monitor and two OSD hosts. Here are
some details:

ceph -s
    cluster c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
     health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-mon1=10.1.1.83:6789/0}, election epoch 1, quorum 0 ceph-mon1
     osdmap e83: 6 osds: 6 up, 6 in
      pgmap v231: 192 pgs, 3 pools, 0 bytes data, 0 objects
            207 MB used, 30446 MB / 30653 MB avail
                 192 active+degraded

root@ceph-mon1:/home/ceph# ceph osd dump
epoch 99
fsid c46d5b02-dab1-40bf-8a3d-f8e4a77b79da
created 2014-12-06 13:15:06.418843
modified 2014-12-09 11:38:04.353279
flags
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 18 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 19 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 20 flags hashpspool stripe_width 0
max_osd 6
osd.0 up in weight 1 up_from 90 up_thru 90 down_at 89 last_clean_interval [58,89) 10.1.1.84:6805/995 10.1.1.84:6806/4000995 10.1.1.84:6807/4000995 10.1.1.84:6808/4000995 exists,up e3895075-614d-48e2-b956-96e13dbd87fe
osd.1 up in weight 1 up_from 88 up_thru 0 down_at 87 last_clean_interval [8,87) 10.1.1.85:6800/23146 10.1.1.85:6815/7023146 10.1.1.85:6816/7023146 10.1.1.85:6817/7023146 exists,up 144bc6ee-2e3d-4118-a460-8cc2bb3ec3e8
osd.2 up in weight 1 up_from 61 up_thru 0 down_at 60 last_clean_interval [11,60) 10.1.1.85:6805/26784 10.1.1.85:6802/5026784 10.1.1.85:6811/5026784 10.1.1.85:6812/5026784 exists,up 8d5c7108-ef11-4947-b28c-8e20371d6d78
osd.3 up in weight 1 up_from 95 up_thru 0 down_at 94 last_clean_interval [57,94) 10.1.1.84:6800/810 10.1.1.84:6810/3000810 10.1.1.84:6811/3000810 10.1.1.84:6812/3000810 exists,up bd762b2d-f94c-4879-8865-cecd63895557
osd.4 up in weight 1 up_from 97 up_thru 0 down_at 96 last_clean_interval [74,96) 10.1.1.84:6801/9304 10.1.1.84:6802/2009304 10.1.1.84:6803/2009304 10.1.1.84:6813/2009304 exists,up 7d28a54b-b474-4369-b958-9e6bf6c856aa
osd.5 up in weight 1 up_from 99 up_thru 0 down_at 98 last_clean_interval [79,98) 10.1.1.85:6801/19513 10.1.1.85:6808/2019513 10.1.1.85:6810/2019513 10.1.1.85:6813/2019513 exists,up f4d76875-0e40-487c-a26d-320f8b8d60c5

root@ceph-mon1:/home/ceph# ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0               host ceph-osd1
0       0                       osd.0   up      1
3       0                       osd.3   up      1
4       0                       osd.4   up      1
-3      0               host ceph-osd2
1       0                       osd.1   up      1
2       0                       osd.2   up      1
5       0                       osd.5   up      1

The current HEALTH_WARN state has said "192 active+degraded" since I
rebooted an OSD host; previously it was incomplete. It never reached a
HEALTH_OK state. Any hint about what to do next to get a healthy cluster?
Re: [ceph-users] active+degraded on an empty new cluster
Hi,
thanks for the quick answer. I did try force_create_pg on a PG, but it is
stuck in the creating state:

root@ceph-mon1:/home/ceph# ceph pg dump | grep creating
dumped all in format plain
2.2f  0 0 0 0 0 0 0 creating 2014-12-09 13:11:37.384808 0'0 0:0 [] -1 [] -1 0'0 0.00 0'0 0.00

root@ceph-mon1:/home/ceph# ceph pg 2.2f query
{ "state": "active+degraded",
  "epoch": 105,
  "up": [0],
  "acting": [0],
  "actingbackfill": [0],
  "info": { "pgid": "2.2f",
      "last_update": "0'0",
      "last_complete": "0'0",
      "log_tail": "0'0",
      "last_user_version": 0,
      "last_backfill": "MAX",
      "purged_snaps": [],
      "last_scrub": "0'0",
      "last_scrub_stamp": "2014-12-06 14:15:11.499769",
      "last_deep_scrub": "0'0",
      "last_deep_scrub_stamp": "2014-12-06 14:15:11.499769",
      "last_clean_scrub_stamp": "0.00",
      "log_size": 0,
      "ondisk_log_size": 0,
      "stats_invalid": 0,
      "stat_sum": { "num_bytes": 0,
          "num_objects": 0,
          "num_object_clones": 0,
          "num_object_copies": 0,
          "num_objects_missing_on_primary": 0,
          "num_objects_degraded": 0,
          "num_objects_unfound": 0,
          "num_objects_dirty": 0,
          "num_whiteouts": 0,
          "num_read": 0,
          "num_read_kb": 0,
          "num_write": 0,
          "num_write_kb": 0,
          "num_scrub_errors": 0,
          "num_shallow_scrub_errors": 0,
          "num_deep_scrub_errors": 0,
          "num_objects_recovered": 0,
          "num_bytes_recovered": 0,
          "num_keys_recovered": 0,
          "num_objects_omap": 0,
          "num_objects_hit_set_archive": 0},
      "stat_cat_sum": {},
      "up": [0],
      "acting": [0],
      "up_primary": 0,
      "acting_primary": 0},
  "empty": 1,
  "dne": 0,
  "incomplete": 0,
  "last_epoch_started": 104,
  "hit_set_history": { "current_last_update": "0'0",
      "current_last_stamp": "0.00",
      "current_info": { "begin": "0.00",
          "end": "0.00",
          "version": "0'0"},
      "history": []},
  "peer_info": [],
  "recovery_state": [
        { "name": "Started\/Primary\/Active",
          "enter_time": "2014-12-09 12:12:52.760384",
          "might_have_unfound": [],
          "recovery_progress": { "backfill_targets": [],
              "waiting_on_backfill": [],
              "last_backfill_started": "0\/\/0\/\/-1",
              "backfill_info": { "begin": "0\/\/0\/\/-1",
                  "end": "0\/\/0\/\/-1",
                  "objects": []},
              "peer_backfill_info": [],
              "backfills_in_flight": [],
              "recovering": [],
              "pg_backend": { "pull_from_peer": [],
                  "pushing": []}},
          "scrub": { "scrubber.epoch_start": 0,
              "scrubber.active": 0,
              "scrubber.block_writes": 0,
              "scrubber.finalizing": 0,
              "scrubber.waiting_on": 0,
              "scrubber.waiting_on_whom": []}},
        { "name": "Started",
          "enter_time": "2014-12-09 12:12:51.845686"}],
  "agent_state": {}}
root@ceph-mon1:/home/ceph#

2014-12-09 13:01 GMT+01:00 Irek Fasikhov <malm...@gmail.com>:
> Hi.
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>
> ceph pg force_create_pg pgid
>
> 2014-12-09 14:50 GMT+03:00 Giuseppe Civitella
> <giuseppe.civite...@gmail.com>:
> > [...]