Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped
OSD tree: http://pastebin.com/3z333DP4
Crushmap: http://pastebin.com/DBd9k56m

I realize these nodes are quite large; I have plans to break them out into 12 OSDs/node.

On Thu, Aug 13, 2015 at 9:02 AM, GuangYang wrote:
> Could you share the 'ceph osd tree' dump and CRUSH map dump?
>
> Thanks,
> Guang
>
>> Date: Thu, 13 Aug 2015 08:16:09 -0700
>> From: sdain...@spd1.com
>> To: yangyongp...@bwstor.com.cn; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped
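(For anyone following along, these dumps are typically produced with the stock tools; the output file names below are arbitrary, not taken from the post:)

# ceph osd tree
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt

The first command prints the plain-text OSD tree; the other two fetch the compiled CRUSH map and decompile it to readable text.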
Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped
I decided to set OSD 76 out, let the cluster shuffle the data off that disk, and then brought the OSD back in. For the most part this seemed to be working, but then I apparently had 1 object degraded and 88xxx objects misplaced:

# ceph health detail
HEALTH_WARN 11 pgs stuck unclean; recovery 1/66089446 objects degraded (0.000%); recovery 88844/66089446 objects misplaced (0.134%)
pg 2.e7f is stuck unclean for 88398.251351, current state active+remapped, last acting [58,5]
pg 2.143 is stuck unclean for 13892.364101, current state active+remapped, last acting [16,76]
pg 2.968 is stuck unclean for 13892.363521, current state active+remapped, last acting [44,76]
pg 2.5f8 is stuck unclean for 13892.377245, current state active+remapped, last acting [17,76]
pg 2.81c is stuck unclean for 13892.363443, current state active+remapped, last acting [25,76]
pg 2.1a3 is stuck unclean for 13892.364400, current state active+remapped, last acting [16,76]
pg 2.2cb is stuck unclean for 13892.374390, current state active+remapped, last acting [14,76]
pg 2.d41 is stuck unclean for 13892.373636, current state active+remapped, last acting [27,76]
pg 2.3f9 is stuck unclean for 13892.373147, current state active+remapped, last acting [35,76]
pg 2.a62 is stuck unclean for 86283.741920, current state active+remapped, last acting [2,38]
pg 2.1b0 is stuck unclean for 13892.363268, current state active+remapped, last acting [3,76]
recovery 1/66089446 objects degraded (0.000%)
recovery 88844/66089446 objects misplaced (0.134%)

I say "apparently" because, with one object degraded, none of the PGs are showing as degraded:

# ceph pg dump_stuck degraded
ok

# ceph pg dump_stuck unclean
ok
pg_stat state up up_primary acting acting_primary
2.e7f active+remapped [58] 58 [58,5] 58
2.143 active+remapped [16] 16 [16,76] 16
2.968 active+remapped [44] 44 [44,76] 44
2.5f8 active+remapped [17] 17 [17,76] 17
2.81c active+remapped [25] 25 [25,76] 25
2.1a3 active+remapped [16] 16 [16,76] 16
2.2cb active+remapped [14] 14 [14,76] 14
2.d41 active+remapped [27] 27 [27,76] 27
2.3f9 active+remapped [35] 35 [35,76] 35
2.a62 active+remapped [2] 2 [2,38] 2
2.1b0 active+remapped [3] 3 [3,76] 3

All of the OSD filesystems are below 85% full.

I then compared against a new 0.94.2 cluster that had not been updated (the current cluster is 0.94.2 but has been updated a couple of times) and noticed its crush map had 'tunable straw_calc_version 1', so I added it to the current cluster.

After the data moved around for about 8 hours or so I'm left with this state:

# ceph health detail
HEALTH_WARN 2 pgs stuck unclean; recovery 16357/66089446 objects misplaced (0.025%)
pg 2.e7f is stuck unclean for 149422.331848, current state active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 64878.002464, current state active+remapped, last acting [76,31]
recovery 16357/66089446 objects misplaced (0.025%)

I attempted a pg repair on both of the PGs listed above, but it doesn't look like anything is happening. The docs reference an inconsistent state as the use case for the repair command, so that's likely why.

These 2 PGs have been the issue throughout this process, so how can I dig deeper to figure out what the problem is?

# ceph pg 2.e7f query: http://pastebin.com/jMMsbsjS
# ceph pg 2.782 query: http://pastebin.com/0ntBfFK5

On Wed, Aug 12, 2015 at 6:52 PM, yangyongp...@bwstor.com.cn wrote:
> You can try "ceph pg repair pg_id" to repair the unhealthy pg. The "ceph health
> detail" command is very useful to detect unhealthy pgs.
>
> ________
> yangyongp...@bwstor.com.cn
>
>
> From: Steve Dainard
> Date: 2015-08-12 23:48
> To: ceph-users
> Subject: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped
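(For reference, a sketch of how that tunable is usually applied on hammer without hand-editing the map; older point releases may not expose the subcommand, in which case decompiling the map with crushtool, adding the "tunable straw_calc_version 1" line, and re-injecting it achieves the same thing. Expect data to move once the straw weights are recalculated:)

# ceph osd crush set-tunable straw_calc_version 1
# ceph osd crush show-tunables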
[ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped
I ran a ceph osd reweight-by-utilization yesterday and partway through had a network interruption. After the network was restored the cluster continued to rebalance, but this morning the cluster has stopped rebalancing and status will not change from:

# ceph status
    cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
     health HEALTH_WARN
            1 pgs degraded
            1 pgs stuck degraded
            2 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 8163/66089054 objects degraded (0.012%)
            recovery 8194/66089054 objects misplaced (0.012%)
     monmap e24: 3 mons at {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
            election epoch 250, quorum 0,1,2 mon1,mon2,mon3
     osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
      pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
            251 TB used, 111 TB / 363 TB avail
            8163/66089054 objects degraded (0.012%)
            8194/66089054 objects misplaced (0.012%)
                4142 active+clean
                   1 active+undersized+degraded
                   1 active+remapped

# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054 objects degraded (0.012%); recovery 8194/66089054 objects misplaced (0.012%)
pg 2.e7f is stuck unclean for 65125.554509, current state active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 65140.681540, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck undersized for 60568.221461, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck degraded for 60568.221549, current state active+undersized+degraded, last acting [76]
pg 2.782 is active+undersized+degraded, acting [76]
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)

# ceph pg 2.e7f query
"recovery_state": [
    { "name": "Started\/Primary\/Active",
      "enter_time": "2015-08-11 15:43:09.190269",
      "might_have_unfound": [],
      "recovery_progress": { "backfill_targets": [],
          "waiting_on_backfill": [],
          "last_backfill_started": "0\/\/0\/\/-1",
          "backfill_info": { "begin": "0\/\/0\/\/-1",
              "end": "0\/\/0\/\/-1",
              "objects": []},
          "peer_backfill_info": [],
          "backfills_in_flight": [],
          "recovering": [],
          "pg_backend": { "pull_from_peer": [],
              "pushing": []}},
      "scrub": { "scrubber.epoch_start": "0",
          "scrubber.active": 0,
          "scrubber.waiting_on": 0,
          "scrubber.waiting_on_whom": []}},
    { "name": "Started",
      "enter_time": "2015-08-11 15:43:04.955796"}],

# ceph pg 2.782 query
"recovery_state": [
    { "name": "Started\/Primary\/Active",
      "enter_time": "2015-08-11 15:42:42.178042",
      "might_have_unfound": [
            { "osd": "5",
              "status": "not queried"}],
      "recovery_progress": { "backfill_targets": [],
          "waiting_on_backfill": [],
          "last_backfill_started": "0\/\/0\/\/-1",
          "backfill_info": { "begin": "0\/\/0\/\/-1",
              "end": "0\/\/0\/\/-1",
              "objects": []},
          "peer_backfill_info": [],
          "backfills_in_flight": [],
          "recovering": [],
          "pg_backend": { "pull_from_peer": [],
              "pushing": []}},
      "scrub": { "scrubber.epoch_start": "0",
          "scrubber.active": 0,
          "scrubber.waiting_on": 0,
          "scrubber.waiting_on_whom": []}},
    { "name": "Started",
      "enter_time": "2015-08-11 15:42:41.139709"}],
"agent_state": {}

I tried restarting osd.5/58/76 but no change.

Any suggestions?
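(When a PG sits in active+remapped or undersized like this, a useful first check is to compare where CRUSH wants the PG against where it currently is; a sketch using the PG IDs from the post above. If the "up" set comes back with only one OSD, CRUSH itself is failing to choose a second host for that PG, which points at the map or tunables rather than at unfinished backfill:)

# ceph pg map 2.e7f
# ceph pg map 2.782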
Re: [ceph-users] ceph tell not persistent through reboots?
That would make sense. Thanks!

On Thu, Aug 6, 2015 at 6:29 PM, Wang, Warren wrote:
> Injecting args into the running procs is not meant to be persistent. You'll
> need to modify /etc/ceph/ceph.conf for that.
>
> Warren
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve Dainard
> Sent: Thursday, August 06, 2015 9:16 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] ceph tell not persistent through reboots?
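(A minimal sketch of the persistent equivalent, using the values from this thread; this goes in /etc/ceph/ceph.conf on each OSD host, and the daemons pick it up on restart or via a matching injectargs. Note the running config reports the hours under osd_scrub_begin_hour/osd_scrub_end_hour, so those are the names to persist:)

[osd]
osd_scrub_begin_hour = 20
osd_scrub_end_hour = 4
osd_deep_scrub_interval = 1209600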
[ceph-users] Direct IO tests on RBD device vary significantly
Trying to get an understanding of why direct IO would be so slow on my cluster.

Ceph 0.94.1
1 Gig public network
10 Gig public network
10 Gig cluster network
100 OSDs, 4T disk sizes, 5G SSD journal.

As of this morning I had no SSD journals and was finding direct IO was sub 10MB/s, so I decided to add journals today. Afterwards I started running tests again and wasn't very impressed. Then for no apparent reason the write speeds increased significantly, but I'm finding they vary wildly.

Currently there is a bit of background ceph activity, but only my testing client has an rbd mapped/mounted:

     election epoch 144, quorum 0,1,2 mon1,mon3,mon2
     osdmap e181963: 100 osds: 100 up, 100 in
            flags noout
      pgmap v2852566: 4144 pgs, 7 pools, 113 TB data, 29179 kobjects
            227 TB used, 135 TB / 363 TB avail
                4103 active+clean
                  40 active+clean+scrubbing
                   1 active+clean+scrubbing+deep

Tests:
1M block size: http://pastebin.com/LKtsaHrd (throughput has no consistency)
4k block size: http://pastebin.com/ib6VW9eB (throughput is amazingly consistent)

Thoughts?
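(For anyone trying to reproduce this, a sketch of the kind of direct-IO runs being compared; the mount point and sizes are placeholders, not taken from the pastebins. oflag=direct bypasses the page cache, so each run shows raw RBD write behaviour at that block size:)

# dd if=/dev/zero of=/mnt/rbd-test/ddfile bs=1M count=4096 oflag=direct
# dd if=/dev/zero of=/mnt/rbd-test/ddfile bs=4k count=100000 oflag=direct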
[ceph-users] ceph tell not persistent through reboots?
Hello,

Version 0.94.1

I'm passing settings to the admin socket, i.e.:
ceph tell osd.* injectargs '--osd_deep_scrub_begin_hour 20'
ceph tell osd.* injectargs '--osd_deep_scrub_end_hour 4'
ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'

Then I check to see if they're in the configs now:
# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep -i 'scrub_interval|hour'
"osd_scrub_begin_hour": "4",
"osd_scrub_end_hour": "20",
"osd_deep_scrub_interval": "1.2096e+06",

Then I restart that host, check again, and the values have returned to default:
# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep -i 'scrub_interval|hour'
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "24",
"osd_deep_scrub_interval": "604800",

If I check on another host the values are correct:
# ceph --admin-daemon /var/run/ceph/ceph-osd.90.asok config show | egrep -i 'scrub_interval|hour'
"osd_scrub_begin_hour": "20",
"osd_scrub_end_hour": "4",
"osd_deep_scrub_interval": "1.2096e+06",

If I check on a mon the values are default:
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show | egrep -i 'scrub_interval|hour'
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "24",
"osd_deep_scrub_interval": "604800",

If I try to pass a config to mon1 via an OSD host it appears to do something:
# ceph tell mon.1 injectargs "--osd_deep_scrub_interval 1209600"
injectargs:osd_deep_scrub_interval = '1.2096e+06'

And then I check on mon1 and it's still the default value:
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show | egrep -i scrub_interval
"osd_deep_scrub_interval": "604800",

And if I pass a config on mon1 it looks like it's being updated, but the default remains:
# ceph tell mon.1 injectargs "--osd_deep_scrub_interval 1209600"
injectargs:osd_deep_scrub_interval = '1.2096e+06'
# ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config show | egrep -i scrub_interval
"osd_deep_scrub_interval": "604800",

I don't know if this is a bug, or if I'm doing something wrong here...
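(Side note: rather than grepping the full config show output, a single value can usually be read back directly from the admin socket; a sketch, assuming the same daemon names as above and a hammer build that supports 'config get':)

# ceph daemon osd.0 config get osd_deep_scrub_interval
# ceph daemon mon.mon1 config get osd_deep_scrub_interval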
Re: [ceph-users] Meanning of ceph perf dump
Hi Somnath,

Do you have a link with the definitions of all the perf counters?

Thanks,
Steve

On Sun, Jul 5, 2015 at 11:23 AM, Somnath Roy wrote:
> Hi Ray,
>
> Here is the description of the different latencies under the filestore perf counters.
>
> journal_latency:
> ----------------
> This is the latency of putting the ops in the journal. A write is acknowledged after that (well, a bit after that; there is one context switch after this).
>
> commitcycle_latency:
> --------------------
> The filestore backend, while carrying out a transaction, does a buffered write. In a separate thread it calls syncfs() to persist the data to disk and updates the persistent commit number in a separate file. This thread runs at a 5 second interval by default.
> This latency measures the time taken to carry out this job after the timer expires, i.e. the actual persisting cycle.
>
> apply_latency:
> --------------
> This is the entire latency until the transaction finishes, i.e. journal write + transaction time. It does a buffered write here.
>
> queue_transaction_latency_avg:
> ------------------------------
> This is the latency of putting the op in the journal queue. This will give you an idea of how much throttling is going on in the first place. It depends on the following two parameters if you are using XFS:
>
> filestore_queue_max_ops
> filestore_queue_max_bytes
>
> All the latency numbers are represented by avgcount (the number of ops within this range) and sum (the total latency in seconds). sum/avgcount will give you an idea of the latency per op.
>
> Hope this is helpful,
>
> Thanks & Regards
> Somnath
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ray Sun
> Sent: Sunday, July 05, 2015 7:28 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Meanning of ceph perf dump
>
> Cephers,
> Is there any document or code definition to explain ceph perf dump? I am a little confused about the output. For example, under filestore there's journal_latency and apply_latency, and each of them has avgcount and sum. I am not quite sure what the unit and meaning of the numbers are. How can I use these numbers to tune my ceph cluster? Thanks a lot.
>
> "filestore": {
>     "journal_queue_max_ops": 300,
>     "journal_queue_ops": 0,
>     "journal_ops": 35893,
>     "journal_queue_max_bytes": 33554432,
>     "journal_queue_bytes": 0,
>     "journal_bytes": 20579009432,
>     "journal_latency": {
>         "avgcount": 35893,
>         "sum": 1213.560761279
>     },
>     "journal_wr": 34228,
>     "journal_wr_bytes": {
>         "avgcount": 34228,
>         "sum": 20657713152
>     },
>     "journal_full": 0,
>     "committing": 0,
>     "commitcycle": 3207,
>     "commitcycle_interval": {
>         "avgcount": 3207,
>         "sum": 16157.379852152
>     },
>     "commitcycle_latency": {
>         "avgcount": 3207,
>         "sum": 121.892109010
>     },
>     "op_queue_max_ops": 50,
>     "op_queue_ops": 0,
>     "ops": 35893,
>     "op_queue_max_bytes": 104857600,
>     "op_queue_bytes": 0,
>     "bytes": 20578506930,
>     "apply_latency": {
>         "avgcount": 35893,
>         "sum": 1327.974596287
>     },
>     "queue_transaction_latency_avg": {
>         "avgcount": 35893,
>         "sum": 0.025993727
>     }
> },
>
> Best Regards
> -- Ray
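(For reference, applying Somnath's sum/avgcount rule to the numbers above: journal_latency is 1213.560761279 s over 35893 ops, roughly 34 ms per journal write, and apply_latency is 1327.974596287 s over 35893 ops, roughly 37 ms per op. A one-liner sketch that does the same division straight from the admin socket, assuming jq is installed and osd.0 is just an example daemon:)

# ceph daemon osd.0 perf dump | jq '.filestore.journal_latency | .sum / .avgcount'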
Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?
Other than those errors, do you find RBDs will not be unmapped on system restart/shutdown on a machine using systemd, leaving the system hanging without network connections while trying to unmap RBDs? That's been my experience thus far, so I wrote an (overly simple) systemd unit file to handle this on a per-RBD basis.

On Tue, Jul 14, 2015 at 1:15 PM, Bruce McFarland wrote:
> When starting the rbdmap.service to provide map/unmap of rbd devices across
> boot/shutdown cycles, the /etc/init.d/rbdmap script includes
> /lib/lsb/init-functions. This is not a problem except that the rbdmap script
> is making calls to the log_daemon_*, log_progress_*, and log_action_* functions
> that are included in Ubuntu 14.04 distros, but are not in the RHEL 7.1/RHCS
> 1.3 distro. Are there any recommended workarounds for boot-time startup on
> RHEL/CentOS 7.1 clients?
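(In case it helps anyone else, a minimal sketch of that kind of per-RBD unit; the pool/image/mount point are made up, and it assumes the stock ceph udev rule that creates /dev/rbd/<pool>/<image>. Because the unit is ordered after network-online.target, systemd stops it, and therefore unmaps the image, before taking the network down at shutdown:)

# /etc/systemd/system/rbd-myimage.service
[Unit]
Description=Map and mount rbd/myimage
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/rbd map rbd/myimage
ExecStart=/usr/bin/mount /dev/rbd/rbd/myimage /mnt/myimage
ExecStop=/usr/bin/umount /mnt/myimage
ExecStop=/usr/bin/rbd unmap /dev/rbd/rbd/myimage

[Install]
WantedBy=multi-user.target

Enable it with 'systemctl enable rbd-myimage.service'.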
Re: [ceph-users] Deadly slow Ceph cluster revisited
Disclaimer: I'm relatively new to ceph and haven't moved into production with it.

Did you run your bench for 30 seconds? For reference, my bench from a VM bridged to a 10Gig card, against 90x4TB OSDs, at 30 seconds is:

Total time run:         30.766596
Total writes made:      1979
Write size:             4194304
Bandwidth (MB/sec):     257.292
Stddev Bandwidth:       106.78
Max bandwidth (MB/sec): 420
Min bandwidth (MB/sec): 0
Average Latency:        0.248238
Stddev Latency:         0.723444
Max latency:            10.5275
Min latency:            0.0346015

Seems like latency is a huge factor if your 30 second test took 52 seconds.

What kind of 10Gig NICs are you using? I have Mellanox ConnectX-3 and one node was using an older driver version. I started to experience the osd in..out..in.. and "incorrectly marked out from..." behaviour mentioned by Quentin, as well as poor performance. Installing the newest version of the Mellanox driver got everything running well again.

On Fri, Jul 17, 2015 at 7:55 AM, J David wrote:
> On Fri, Jul 17, 2015 at 10:21 AM, Mark Nelson wrote:
>> rados -p 30 bench write
>>
>> just to see how it handles 4MB object writes.
>
> Here's that, from the VM host:
>
> Total time run:         52.062639
> Total writes made:      66
> Write size:             4194304
> Bandwidth (MB/sec):     5.071
>
> Stddev Bandwidth:       11.6312
> Max bandwidth (MB/sec): 80
> Min bandwidth (MB/sec): 0
> Average Latency:        12.436
> Stddev Latency:         13.6272
> Max latency:            51.6924
> Min latency:            0.073353
>
> Unfortunately I don't know much about how to parse this (other than
> 5MB/sec writes does match up with our best-case performance in the VM
> guest).
>
>> If rados bench is also terribly slow, then you might want to start
>> looking for evidence of IO getting hung up on a specific disk or node.
>
> Thus far, no evidence of that has presented itself. iostat looks good
> on every drive and the nodes are all equally loaded.
>
> Thanks!
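(For reference, the usual form of the bench being suggested; the pool name is a placeholder, and --no-cleanup keeps the benchmark objects around so a read pass can follow:)

# rados -p <pool> bench 30 write --no-cleanup
# rados -p <pool> bench 30 seq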
[ceph-users] Unsetting osd_crush_chooseleaf_type = 0
I originally built a single-node cluster and added 'osd_crush_chooseleaf_type = 0 #0 is for one node cluster' to ceph.conf (which is now commented out). I've now added a 2nd node; where can I set this value to 1?

I see in the crush map that the OSDs are under 'host' buckets and don't see any reference to leaf.

Would the cluster automatically rebalance when the 2nd host was added? How can I verify this?

The issue right now is that with two hosts, copies = 2, min copies = 1, I cannot access data from client machines when one of the two hosts goes down.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph1 {
        id -2           # do not change unnecessarily
        # weight 163.350
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.630
        item osd.1 weight 3.630
}
host ceph2 {
        id -3           # do not change unnecessarily
        # weight 163.350
        alg straw
        hash 0  # rjenkins1
        item osd.2 weight 3.630
        item osd.3 weight 3.630
}
root default {
        id -1           # do not change unnecessarily
        # weight 326.699
        alg straw
        hash 0  # rjenkins1
        item ceph1 weight 163.350
        item ceph2 weight 163.350
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd    <-- should this line be "step chooseleaf firstn 0 type host"?
        step emit
}

# end crush map
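(The flagged rule line is indeed what controls OSD-level vs host-level placement. A sketch of the usual way to change it, assuming size = 2 across the two hosts; the file names are arbitrary, and a fair amount of data movement should be expected when the new map is injected:)

# ceph osd getcrushmap -o cm.bin
# crushtool -d cm.bin -o cm.txt
  (edit cm.txt: replace "step choose firstn 0 type osd" with "step chooseleaf firstn 0 type host")
# crushtool -c cm.txt -o cm-new.bin
# crushtool -i cm-new.bin --test --rule 0 --num-rep 2 --show-statistics
# ceph osd setcrushmap -i cm-new.bin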
Re: [ceph-users] Health WARN, ceph errors looping
The error keeps coming back, with status eventually changing to OK, then back into errors. I thought it looked like a connectivity issue as well because of the "wrongly marked me down" messages, but firewall rules are allowing all traffic on the cluster network.

Syslog is being flooded with messages like:

Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.609870 7f2055192700 -1 osd.21 129936 heartbeat_check: no reply from osd.89 ever on either front or back, first ping sent 2015-07-07 10:51:50.995374 (cutoff 2015-07-07 10:51:57.609817)
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.611302 7f203ba5b700 -1 osd.21 129936 heartbeat_check: no reply from osd.50 ever on either front or back, first ping sent 2015-07-07 10:51:44.691270 (cutoff 2015-07-07 10:51:57.611297)
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.611309 7f203ba5b700 -1 osd.21 129936 heartbeat_check: no reply from osd.61 ever on either front or back, first ping sent 2015-07-07 10:51:50.995374 (cutoff 2015-07-07 10:51:57.611297)
Jul  7 10:52:17 ceph1 bash: 2015-07-07 10:52:17.611315 7f203ba5b700 -1 osd.21 129936 heartbeat_check: no reply from osd.69 ever on either front or back, first ping sent 2015-07-07 10:51:54.998259 (cutoff 2015-07-07 10:51:57.611297)

That's just a small section, but multiple OSDs are listed. Eventually the logs are rate limited because they're coming in so fast.

On Tue, Jul 7, 2015 at 10:13 AM, Abhishek L wrote:
>
> Steve Dainard writes:
>
>> Hello,
>>
>> Ceph 0.94.1
>> 2 hosts, CentOS 7
>>
>> I have two hosts, one of which ran out of / disk space, which crashed all
>> the osd daemons. After cleaning up the OS disk storage and restarting
>> ceph on that node, I'm seeing multiple errors, then health OK, then
>> back into the errors:
>>
>> # ceph -w
>> http://pastebin.com/mSKwNzYp
>
> Is the error still consistently happening? (The last lines show
> active+clean.) Wild guess, but is it possible some sort of
> iptables/firewall rules are preventing communication between the osds?
>
>> Any help is appreciated.
>>
>> Thanks,
>> Steve
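(A sketch of how to confirm whether specific OSDs can actually reach each other; osd.89 is taken from the log above, everything else is generic. OSD heartbeats travel over both the public "front" and cluster "back" networks, and the daemons listen on ports in the 6800-7300 range by default, so firewall rules on both interfaces matter:)

# ceph osd find 89
# ceph osd dump | grep '^osd\.89 '
# ping -c 3 <cluster-network IP reported for osd.89>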
[ceph-users] Health WARN, ceph errors looping
Hello,

Ceph 0.94.1
2 hosts, CentOS 7

I have two hosts, one of which ran out of / disk space, which crashed all the osd daemons. After cleaning up the OS disk storage and restarting ceph on that node, I'm seeing multiple errors, then health OK, then back into the errors:

# ceph -w
http://pastebin.com/mSKwNzYp

Any help is appreciated.

Thanks,
Steve
[ceph-users] Can't mount btrfs volume on rbd
Hello,

I'm getting an error when attempting to mount a volume on a host that was forcibly powered off:

# mount /dev/rbd4 climate-downscale-CMIP5/
mount: mount /dev/rbd4 on /mnt/climate-downscale-CMIP5 failed: Stale file handle

/var/log/messages:
Jun 10 15:31:07 node1 kernel: rbd4: unknown partition table

# parted /dev/rbd4 print
Model: Unknown (unknown)
Disk /dev/rbd4: 36.5TB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start   End     Size    File system  Flags
 1      0.00B   36.5TB  36.5TB  btrfs

# btrfs check --repair /dev/rbd4
enabling repair mode
Checking filesystem on /dev/rbd4
UUID: dfe6b0c8-2866-4318-abc2-e1e75c891a5e
checking extents
cmds-check.c:2274: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x4175cc]
btrfs[0x41b873]
btrfs[0x41c3fe]
btrfs[0x41dc1d]
btrfs[0x406922]

OS: CentOS 7.1
btrfs-progs: 3.16.2
Ceph: version 0.94.1 / CentOS 7.1

I haven't found any references to 'stale file handle' on btrfs. The underlying block device is a ceph rbd, so I've posted to both lists for any feedback.

Also, once I reformatted btrfs I didn't get a mount error. The btrfs volume has been reformatted, so I won't be able to do much post mortem, but I'm wondering if anyone has some insight.

Thanks,
Steve
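(Not a diagnosis, but on a client that went down uncleanly the cheap first step is usually to rebuild the mapping and retry a read-only mount before reaching for btrfs check --repair; the pool/image name below is a placeholder:)

# rbd showmapped
# rbd unmap /dev/rbd4
# rbd map <pool>/<image>
# mount -o ro /dev/rbd4 /mnt/climate-downscale-CMIP5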