Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-28 Thread tuantb
 

Hi Craig Lewis, 

My pool holds 300 TB of data, so I can't simply recreate a new pool and copy
the data across with a pool copy (that would take a very long time).
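
(For the archives, this is roughly what such a pool copy would look like; the target pool name and pg count below are made up, and rados cppool copies objects one at a time and, as far as I know, does not carry snapshots over, so it is not realistic at 300 TB:)

ceph osd pool create volumes-new 4096 4096   # hypothetical target pool
rados cppool volumes volumes-new             # serial copy of every object; very slow at this scale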

I upgraded Ceph to Giant (0.86), but the error is still there :((

I think my problem is the misplaced objects (0.320%).

# ceph pg 23.96 query
 num_objects_missing_on_primary: 0,
 num_objects_degraded: 0,
 num_objects_misplaced: 79,
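
(To pull just those counters out of the full query output, something like this works against the pretty-printed JSON:)

ceph pg 23.96 query | grep -E 'num_objects_(missing_on_primary|degraded|misplaced|unfound)'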

 cluster xx-x-x-x
 health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck
degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs
undersized; recovery 308759/54799506 objects degraded (0.563%);
175270/54799506 objects misplaced (0.320%); 1/130 in osds are down; 
 flags noout,nodeep-scrub
 pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
 206 TB used, 245 TB / 452 TB avail
 308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%)
 14708 active+clean
 38 active+remapped
 225 active+undersized+degraded
 client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s 
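
(To see exactly which pgs are stuck, something like the following helps; I'm only using the dump_stuck state that I know Firefly/Giant accepts:)

ceph health detail | grep -E '^pg .*(degraded|undersized|unclean)'
ceph pg dump_stuck unclean     # lists the stuck pgs with their acting sets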

- Checking the ceph log:

2014-10-28 15:33:59.733177 7f6a7f1ab700 5 osd.21 pg_epoch: 103718 pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715 n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1 lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter Started/ReplicaActive/RepNotRecovering

It then logs many failures, on many objects (e.g.
c03fe096/rbd_data.5348922ae8944a.306b, ...):

2014-10-28 15:33:59.343435 7f6a7e1a9700 5 -- op tracker -- seq: 1793,
time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96
103718
[PushOp(c03fe096/rbd_data.5348922ae8944a.306b/head//24,
version: 103622'283374, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.306b/head//24@103622'283374,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:,
omap_complete:false)),PushOp(4120f096/rbd_data.7a63d32ae8944a.0083/head//24,
version: 103679'295624, data_included: [0~4194304], data_size: 0,
omap_header_size: 0, omap_entries_size: 0, attrset_size: 2,
recovery_info:
ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0083/head//24@103679'295624,
copy_subset: [0~4194304], clone_subset: {}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:4194304,
data_complete:true, omap_recovered_to:, omap_complete:true),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:, omap_complete:false))]) 
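
(To check where one of those objects is mapped versus where it actually sits, roughly the following; substitute the pool the image lives in for the hypothetical <pool-name> and use the full rbd_data object name, since the one in the log is truncated:)

ceph osd map <pool-name> rbd_data.5348922ae8944a.306b   # prints the pg plus its up and acting OSD sets
ceph pg map 23.96                                       # the same mapping, per pg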

Thanks! 

--
Tuan
HaNoi-VietNam 

On 2014-10-28 01:35, Craig Lewis wrote: 

 My experience is that once you hit this bug, those PGs are gone. I tried 
 marking the primary OSD OUT, which caused this problem to move to the new 
 primary OSD. Luckily for me, my affected PGs were using replication state in 
 the secondary cluster. I ended up deleting the whole pool and recreating it. 
 
 Which pools are 7 and 23? It's possible that it's something that's easy to 
 replace. 
 
 On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:
 
 Hi Craig, thanks for replying.
 When I started that OSD, the ceph -w log warned that pgs 7.9d8, 23.596,
 23.9c6, 23.63 can't recover, as in the pasted log.
 
 Those pgs are in active+degraded state.
 # ceph pg map 7.9d8
 osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
 (When I start osd.21, pg 7.9d8 and the three remaining pgs change to
 active+recovering.) osd.21 is still down after the following logs:

 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-27 Thread Craig Lewis
My experience is that once you hit this bug, those PGs are gone.  I tried
marking the primary OSD OUT, which caused this problem to move to the new
primary OSD.  Luckily for me, my affected PGs were using replication state
in the secondary cluster.  I ended up deleting the whole pool and
recreating it.

Which pools are 7 and 23?  It's possible that it's something that's easy to
replace.
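
(You can map the pool IDs to names with something along these lines:)

ceph osd lspools      # prints each pool as "<id> <name>,"
ceph df               # per-pool name, id, and usage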



On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

  Hi Craig, thanks for replying.
 When I started that OSD, the ceph -w log warned that pgs 7.9d8, 23.596,
 23.9c6, 23.63 can't recover, as in the pasted log.

 Those pgs are in active+degraded state.
 # ceph pg map 7.9d8
 osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]  (When I start
 osd.21, pg 7.9d8 and the three remaining pgs change to
 active+recovering.) osd.21 is still down after the following logs:



Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan

I'm sending some logs related to the bug (osd.21 cannot be started):

 -8705 2014-10-25 14:41:04.345727 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

 -1637 2014-10-25 14:41:14.326580 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

  -437 2014-10-25 14:41:15.042174 7f12ba42e700  5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering
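
(Those enter/exit peering-state lines are normal; the reason the daemon actually dies is usually printed at the very end of the OSD log, e.g. an assert or abort. Assuming the default log location, roughly:)

tail -n 200 /var/log/ceph/ceph-osd.21.log
grep -n -B5 -A30 -E 'FAILED assert|Segmentation fault|abort' /var/log/ceph/ceph-osd.21.log | tail -n 80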


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying.
When I started that OSD, the ceph -w log warned that pgs 7.9d8, 23.596,
23.9c6, 23.63 can't recover, as in the pasted log.


Those pgs are in active+degraded state.
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (When I start
osd.21, pg 7.9d8 and the three remaining pgs change to
active+recovering.) osd.21 is still down after the following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan
My Ceph cluster hung, and osd.21 logged: 172.30.5.2:6870/8047 879 : [ERR] 6.9d8
has 4 objects unfound and apparently lost.


After I restarted all ceph data nodes, I can't start osd.21; there are many
log entries about pg 6.9d8, such as:


 -440 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think there are some bad objects. What should I do, please?
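
(A rough sketch of how to look at the unfound objects, assuming every OSD that might still hold a copy has already been brought back up or written off; mark_unfound_lost throws data away, so it is a last resort:)

ceph health detail | grep unfound
ceph pg 6.9d8 list_missing              # which objects are unfound, and which OSDs were probed
ceph pg 6.9d8 query                     # check might_have_unfound in the recovery state
ceph pg 6.9d8 mark_unfound_lost revert  # last resort: roll unfound objects back to their prior version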
Thanks!
--
Tuan
HaNoi-VietNam


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I'm sending some logs related to the bug (osd.21 cannot be started):

 -8705 2014-10-25 14:41:04.345727 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

 -1637 2014-10-25 14:41:14.326580 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

  -437 2014-10-25 14:41:15.042174 7f12ba42e700  5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying.
When I started that OSD, the ceph -w log warned that pgs 7.9d8, 23.596,
23.9c6, 23.63 can't recover, as in the pasted log.


Those pgs are in active+degraded state.
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]  (When I start
osd.21, pg 7.9d8 and the three remaining pgs change to
active+recovering.) osd.21 is still down after the following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first,

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan

# ceph pg 6.9d8 query
...
  peer_info: [
{ peer: 49,
  pgid: 6.9d8,
  last_update: 102889'7801917,
  last_complete: 102889'7801917,
  log_tail: 102377'7792649,
  last_user_version: 7801879,
  last_backfill: MAX,
  purged_snaps: [1~7,9~44b,455~1f8,64f~63,6b3~3a,6ee~12f,81f~10,830~8,839~69b,ed7~7,edf~4,ee4~6f5,15da~f9,16d4~1f,16f5~7,16fd~4,1705~5e,1764~7,1771~78,17eb~12,1800~2,1803~d,1812~3,181a~1,181c~a,1827~3b,1863~1,1865~1,1867~1,186b~e,187a~3,1881~1,1884~7,188c~1,188f~3,1894~5,189f~2,18ab~1,18c6~1,1922~13,193d~1,1940~1,194a~1,1968~5,1975~1,1979~4,197e~4,1984~1,1987~11,199c~1,19a0~1,19a3~9,19ad~3,19b2~1,19b6~27,19de~8],
  history: { epoch_created: 164,
  last_epoch_started: 102888,
  last_epoch_clean: 102888,
      last_epoch_split: 0,
  parent_split_bits: 0,
  last_scrub: 91654'7460936,
  last_scrub_stamp: 2014-10-10 10:36:25.433016,
  last_deep_scrub: 81667'5815892,
  last_deep_scrub_stamp: 2014-08-29 09:44:14.012219,
  last_clean_scrub_stamp: 2014-10-10 10:36:25.433016,
  log_size: 9229,
  ondisk_log_size: 9229,
  stats_invalid: 1,
  stat_sum: { num_bytes: 17870536192,
  num_objects: 4327,
  num_object_clones: 29,
  num_object_copies: 12981,
  num_objects_missing_on_primary: 4,
  num_objects_degraded: 4,
  num_objects_unfound: 0,
  num_objects_dirty: 1092,
  num_whiteouts: 0,
  num_read: 4820626,
  num_read_kb: 59073045,
  num_write: 12748709,
  num_write_kb: 181630845,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 135847,
  num_bytes_recovered: 562255538176,
  num_keys_recovered: 0,
  num_objects_omap: 0,
  num_objects_hit_set_archive: 0},


On 10/25/2014 07:40 PM, Ta Ba Tuan wrote:
My Ceph cluster hung, and osd.21 logged: 172.30.5.2:6870/8047 879 : [ERR]
6.9d8 has 4 objects unfound and apparently lost.


After I restarted all ceph data nodes, I can't start osd.21; there are many
log entries about pg 6.9d8, such as:


 -440 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(6.9d8 102856 [PushOp(e8de59d8/rbd_data.4d091f7304c844.e871/head//6, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think there are some bad objects. What should I do, please?
Thanks!
--
Tuan
HaNoi-VietNam


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I'm sending some logs related to the bug (osd.21 cannot be started):

 -8705 2014-10-25 14:41:04.345727 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[6.5e1( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 active+remapped] exit Started/ReplicaActive/RepNotRecovering 0.000170 1 0.000296

 -1637 2014-10-25 14:41:14.326580 7f12bac2f700  5 osd.21 pg_epoch: 102843 pg[2.23b( v 102839'91984 (91680'88526,102839'91984] local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 active] enter Started/ReplicaActive/RepNotRecovering

  -437 2014-10-25 14:41:15.042174 7f12ba42e700  5 osd.21 pg_epoch: 102843 pg[27.239( v 102808'38419 (81621'35409,102808'38419] local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 active] enter Started/ReplicaActive/RepNotRecovering


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, thanks for replying.
When I started that OSD, the ceph -w log warned that pgs 7.9d8, 23.596,
23.9c6, 23.63 can't recover, as in the pasted log.


Those pgs are in active+degraded state.
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]  (When I start
osd.21, pg 7.9d8 and the three remaining pgs change to state

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Craig Lewis
It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug.  From my reading of
the code, I believe the fix only prevents the issue from occurring.  It
doesn't work around or repair bad snapshots created on older versions of
Ceph.

Were any of the snapshots you're removing created on older versions of
Ceph?  If they were all created on Firefly, then you should open a new
tracker issue, and try to get some help on IRC or the developers mailing
list.
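
(To confirm what each daemon is actually running, something like this; the tell form only answers while the daemon is up:)

ceph -v                    # version of the local packages/CLI
ceph tell osd.21 version   # version reported by the running osd.21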


On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:

 Dear everyone,

 I can't start osd.21 (log file attached).
 Some pgs can't be repaired. I'm using replica count 3 for my data pool.
 It seems some objects in those pgs are bad.

 I tried to delete the data related to those objects, but osd.21 still won't
 start. I also removed osd.21, but then other OSDs went down (e.g. osd.86
 went down and won't start).

 Guide me in debugging this, please! Thanks!

 --
 Tuan
 Ha Noi - VietNam
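
(One way to see why osd.21 dies is to run it in the foreground with verbose logging, roughly as below, and read the last lines it prints before exiting; adjust the id, user, and options to your setup. ceph pg repair only addresses scrub inconsistencies, not a crashing daemon.)

ceph osd set noout                                    # keep the cluster from rebalancing while testing
ceph-osd -i 21 -f --debug-osd 20 --debug-filestore 20 2>&1 | tee /tmp/osd.21.debug.log
ceph pg repair 7.9d8                                  # per-pg repair, for scrub errors only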














Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Ta Ba Tuan

Hi Craig, thanks for replying.
When I started that OSD, the ceph -w log warned that pgs 7.9d8, 23.596,
23.9c6, 23.63 can't recover, as in the pasted log.


Those pgs are in active+degraded state.
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (When I start
osd.21, pg 7.9d8 and the three remaining pgs change to
active+recovering.) osd.21 is still down after the following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(7.9d8 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(23.596 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(23.9c6 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(23.63c 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached
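
(While the OSD is still up and accumulating slow requests, its admin socket shows exactly which ops are blocked and at what step; assuming the default socket path:)

ceph --admin-daemon /var/run/ceph/ceph-osd.21.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.21.asok dump_historic_ops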

Thanks!
--
Tuan
HaNoi-VietNam

On 10/25/2014 05:07 AM, Craig Lewis wrote:

It looks like you're running into http://tracker.ceph.com/issues/5699

You're running 0.80.7, which has a fix for that bug.  From my reading 
of the code, I believe the fix only prevents the issue from 
occurring.  It doesn't work around or repair bad snapshots created on 
older versions of Ceph.


Were any of the snapshots you're removing created on older versions 
of Ceph?  If they were all created on Firefly, then you should open a 
new tracker issue, and try to get some help on IRC or the developers 
mailing list.


On Thu, Oct 23, 2014 at 10:21 PM, Ta Ba Tuan tua...@vccloud.vn wrote:


Dear everyone,

I can't start osd.21 (log file attached).
Some pgs can't be repaired. I'm using replica count 3 for my data pool.
It seems some objects in those pgs are bad.

I tried to delete the data related to those objects, but osd.21 still
won't start. I also removed osd.21, but then other OSDs went down
(e.g. osd.86 went down and won't start).

Guide me in debugging this, please! Thanks!

--
Tuan
Ha Noi - VietNam









