Re: [ceph-users] Can't start osd - one osd is always down.
Hi Craig Lewis,

My pool holds 300 TB of data, so I can't recreate a new pool and copy the data across (e.g. with rados cppool) - that would take a very long time. I upgraded Ceph to Giant (0.86), but the error persists :(( I think my problem is the misplaced objects (0.320%):

# ceph pg 23.96 query
num_objects_missing_on_primary: 0, num_objects_degraded: 0, num_objects_misplaced: 79

cluster xx-x-x-x
 health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs undersized; recovery 308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%); 1/130 in osds are down; flags noout,nodeep-scrub
 pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
       206 TB used, 245 TB / 452 TB avail
       308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%)
       14708 active+clean
          38 active+remapped
         225 active+undersized+degraded
 client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s

Checking the ceph log:

2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718 pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715 n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1 lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter Started/ReplicaActive/RepNotRecovering

It then logs many failures on many objects (e.g. c03fe096/rbd_data.5348922ae8944a.306b, ...):

2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793, time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96 103718 [PushOp(c03fe096/rbd_data.5348922ae8944a.306b/head//24, version: 103622'283374, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.306b/head//24@103622'283374, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)), PushOp(4120f096/rbd_data.7a63d32ae8944a.0083/head//24, version: 103679'295624, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0083/head//24@103679'295624, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

Thanks!
--
Tuan
HaNoi-VietNam

On 2014-10-28 01:35, Craig Lewis wrote:

My experience is that once you hit this bug, those PGs are gone. I tried marking the primary OSD out, which caused the problem to move to the new primary OSD. Luckily for me, my affected PGs were only storing replication state in the secondary cluster, so I ended up deleting the whole pool and recreating it.

Which pools are 7 and 23? It's possible that they're something that's easy to replace.

On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Craig, thanks for replying. When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.)

osd.21 is still down after the following logs:
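(For readers landing on this thread with a similar mix of degraded/misplaced pgs: the usual first step is to let the cluster report which pgs are stuck and why. The pg IDs below are just the ones from this thread; any pgid works the same way.)

ceph health detail          # names each unhealthy pg and the reason
ceph pg dump_stuck unclean  # list pgs stuck unclean (also: inactive, stale)
ceph pg 23.96 query         # full peering/recovery detail for one pg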
Re: [ceph-users] Can't start osd - one osd is always down.
My experience is that once you hit this bug, those PGs are gone. I tried marking the primary OSD out, which caused the problem to move to the new primary OSD. Luckily for me, my affected PGs were only storing replication state in the secondary cluster, so I ended up deleting the whole pool and recreating it.

Which pools are 7 and 23? It's possible that they're something that's easy to replace.

On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Hi Craig, thanks for replying. When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state.

#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.)

osd.21 is still down after the following logs:
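(To answer "which pools are 7 and 23": the number before the dot in a pgid such as 7.9d8 or 23.96 is the pool ID, and the ID-to-name mapping can be read straight off the cluster. The commands below are standard Ceph CLI; the exact output shape varies by version.)

ceph osd lspools   # prints "id name" pairs for every pool
ceph df            # per-pool usage, keyed by pool name and id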
Re: [ceph-users] Can't start osd - one osd is always down.
I'm sending some related log lines (osd.21 cannot be started):

-8705 2014-10-25 14:41:04.345727 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

-1637 2014-10-25 14:41:14.326580 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

-437 2014-10-25 14:41:15.042174 7f12ba42e700 5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering

Thanks!

On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying. When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state.

#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.)

osd.21 is still down after the following logs:

2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for 54.967456 secs

2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
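(Side note on the slow-request warnings above: while osd.21 is up and accepting requests, the blocked ops can be inspected live through its admin socket. This assumes the default admin socket path and must be run on the node that hosts osd.21.)

ceph daemon osd.21 dump_ops_in_flight   # ops currently blocked or in progress
ceph daemon osd.21 dump_historic_ops    # recent slow ops with event timelines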
Re: [ceph-users] Can't start osd - one osd is always down.
My Ceph hung, and osd.21 logged:

172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and apparently lost

After I restarted all ceph-data nodes, I can't start osd.21, and there are many log entries about pg 6.9d8, such as:

-440 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects are corrupted. What should I do, please?

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I'm sending some related log lines (osd.21 cannot be started):

-8705 2014-10-25 14:41:04.345727 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

-1637 2014-10-25 14:41:14.326580 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

-437 2014-10-25 14:41:15.042174 7f12ba42e700 5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering

Thanks!

On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying. When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state.

#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.)
osd.21 is still down after the following logs:

2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first,
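(For an "N objects unfound" state like pg 6.9d8 above, Ceph can report exactly which objects are unfound and which osds it is still probing for a copy; that tells you whether the data is really gone before attempting anything destructive. The pgid is the one from this thread.)

ceph pg 6.9d8 list_missing   # lists the unfound objects and the osds being probed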
Re: [ceph-users] Can't start osd - one osd is always down.
#ceph pg 6.9d8 query
...
peer_info: [
  { peer: 49,
    pgid: 6.9d8,
    last_update: 102889'7801917,
    last_complete: 102889'7801917,
    log_tail: 102377'7792649,
    last_user_version: 7801879,
    last_backfill: MAX,
    purged_snaps: [1~7,9~44b,455~1f8,64f~63,6b3~3a,6ee~12f,81f~10,830~8,839~69b,ed7~7,edf~4,ee4~6f5,15da~f9,16d4~1f,16f5~7,16fd~4,1705~5e,1764~7,1771~78,17eb~12,1800~2,1803~d,1812~3,181a~1,181c~a,1827~3b,1863~1,1865~1,1867~1,186b~e,187a~3,1881~1,1884~7,188c~1,188f~3,1894~5,189f~2,18ab~1,18c6~1,1922~13,193d~1,1940~1,194a~1,1968~5,1975~1,1979~4,197e~4,1984~1,1987~11,199c~1,19a0~1,19a3~9,19ad~3,19b2~1,19b6~27,19de~8],
    history: {
      epoch_created: 164,
      last_epoch_started: 102888,
      last_epoch_clean: 102888,
      last_epoch_split: 0,
      parent_split_bits: 0,
      last_scrub: 91654'7460936,
      last_scrub_stamp: 2014-10-10 10:36:25.433016,
      last_deep_scrub: 81667'5815892,
      last_deep_scrub_stamp: 2014-08-29 09:44:14.012219,
      last_clean_scrub_stamp: 2014-10-10 10:36:25.433016},
    log_size: 9229,
    ondisk_log_size: 9229,
    stats_invalid: 1,
    stat_sum: {
      num_bytes: 17870536192,
      num_objects: 4327,
      num_object_clones: 29,
      num_object_copies: 12981,
      num_objects_missing_on_primary: 4,
      num_objects_degraded: 4,
      num_objects_unfound: 0,
      num_objects_dirty: 1092,
      num_whiteouts: 0,
      num_read: 4820626,
      num_read_kb: 59073045,
      num_write: 12748709,
      num_write_kb: 181630845,
      num_scrub_errors: 0,
      num_shallow_scrub_errors: 0,
      num_deep_scrub_errors: 0,
      num_objects_recovered: 135847,
      num_bytes_recovered: 562255538176,
      num_keys_recovered: 0,
      num_objects_omap: 0,
      num_objects_hit_set_archive: 0},

On 10/25/2014 07:40 PM, Ta Ba Tuan wrote:

My Ceph hung, and osd.21 logged:

172.30.5.2:6870/8047 879 : [ERR] 6.9d8 has 4 objects unfound and apparently lost

After I restarted all ceph-data nodes, I can't start osd.21, and there are many log entries about pg 6.9d8, such as:

-440 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects are corrupted. What should I do, please?

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I'm sending some related log lines (osd.21 cannot be started):

-8705 2014-10-25 14:41:04.345727 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

-1637 2014-10-25 14:41:14.326580 7f12bac2f700 5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

-437 2014-10-25 14:41:15.042174 7f12ba42e700 5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering

Thanks!

On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying. When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state.

#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.)
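(If list_missing confirms that no surviving osd holds the 4 missing objects, the documented last resort is to mark them lost. This is destructive - reverting rolls each object back to its last known-good version - so it should only be run once recovery has genuinely stalled; on Firefly/Giant-era releases, revert is the supported mode.)

ceph pg 6.9d8 mark_unfound_lost revert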
Re: [ceph-users] Can't start osd - one osd is always down.
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring; it doesn't work around or repair bad snapshots created on older versions of Ceph.

Were any of the snapshots you're removing created on older versions of Ceph? If they were all created on Firefly, then you should open a new tracker issue and try to get some help on IRC or the developers' mailing list.

On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Dear everyone,

I can't start osd.21 (log file attached), and some pgs can't be repaired. I'm using replica 3 for my data pool. It seems some objects in those pgs are corrupted. I tried deleting the data related to those objects, but osd.21 still won't start. I also removed osd.21, but then other osds went down (e.g. osd.86 is down and won't start).

Guide me in debugging this, please! Thanks!
--
Tuan
Ha Noi - VietNam
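(One generic way to see why an osd dies at startup, rather than guessing from the cluster log, is to run it in the foreground with verbose logging. The flags below are standard ceph-osd options; the osd ID is the one from this thread.)

ceph-osd -i 21 -f --debug-osd 20 --debug-filestore 20 --debug-ms 1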
Re: [ceph-users] Can't start osd - one osd is always down.
Hi Craig, thanks for replying.

When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8, 23.596, 23.9c6, 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state.

#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining pgs change to the active+recovering state.)

osd.21 is still down after the following logs:

2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for 54.967456 secs

2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 05:07 AM, Craig Lewis wrote:

It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring; it doesn't work around or repair bad snapshots created on older versions of Ceph.

Were any of the snapshots you're removing created on older versions of Ceph? If they were all created on Firefly, then you should open a new tracker issue and try to get some help on IRC or the developers' mailing list.

On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

Dear everyone,

I can't start osd.21 (log file attached), and some pgs can't be repaired. I'm using replica 3 for my data pool. It seems some objects in those pgs are corrupted. I tried deleting the data related to those objects, but osd.21 still won't start. I also removed osd.21, but then other osds went down (e.g. osd.86 is down and won't start).

Guide me in debugging this, please! Thanks!
--
Tuan
Ha Noi - VietNam
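(As a sanity check on the version discussion above, the running version of every reachable daemon can be confirmed from a monitor node; a down osd.21 will simply not answer.)

ceph tell osd.* version   # version reported by each running osd
ceph --version            # version of the local ceph client/package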