Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-28 Thread tuantb
Hi Craig Lewis, my pool holds 300TB of data, so I can't recreate a new pool and then copy the data across (ceph cp pool would take a very long time). I upgraded Ceph to Giant (0.86), but the error persists :(( I think my problem is the misplaced objects (0.320%) # ceph pg 23.96 query num_objects_missing_on_primary: 0,
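
For context on the pool-copy option discussed above: a pool-to-pool copy is normally done with rados cppool plus a rename, roughly as sketched below. The pool names and pg count are placeholders (not from the thread), rados cppool does not preserve snapshots, and on 300TB it would indeed run for a very long time.

  # ceph osd pool create mypool.new 2048
  # rados cppool mypool mypool.new
  # ceph osd pool delete mypool mypool --yes-i-really-really-mean-it
  # ceph osd pool rename mypool.new mypool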

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-27 Thread Craig Lewis
My experience is that once you hit this bug, those PGs are gone. I tried marking the primary OSD OUT, which only caused the problem to move to the new primary OSD. Luckily for me, the affected PGs were only holding replicated data in my secondary cluster. I ended up deleting the whole pool and
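
The "mark the primary OUT" step Craig describes maps onto standard commands: find the acting primary for the affected PG, then mark it out so the PG re-peers elsewhere. The pg id and osd id below are placeholders, not taken from the thread; the first OSD listed in "acting" is the primary.

  # ceph pg map 23.596
  # ceph osd out 93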

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan
I'm sending some related log entries (osd.21 cannot be started): -8705 2014-10-25 14:41:04.345727 7f12bac2f700 5 *osd.21* pg_epoch: 102843 pg[*6.5e1*( v 102843'11832159 (102377'11822991,102843'11832159] lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 local-les=101780 n=4719 ec=164 les/c
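
Traces like the one above are easier to capture by running the failing daemon in the foreground with raised debug levels. This is the generic troubleshooting approach, not a command quoted from the thread, and the debug levels are illustrative.

  # ceph-osd -i 21 -f --debug-osd 20 --debug-ms 1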

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan
My Ceph cluster hung, and osd.21 172.30.5.2:6870/8047 879 reported: [ERR] 6.9d8 has 4 objects unfound and apparently lost. After I restarted all ceph-data nodes, I can't start osd.21, and there are many log entries about pg 6.9d8 such as: -440 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq: 3083, time:
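
For the "objects unfound and apparently lost" state, the usual inspection path is sketched below. This is the generic workflow (it assumes the PG's OSDs are up, which is not the case here while osd.21 refuses to start), and mark_unfound_lost is destructive: revert rolls objects back to a prior version where possible, delete forgets them entirely.

  # ceph health detail
  # ceph pg 6.9d8 list_missing
  # ceph pg 6.9d8 mark_unfound_lost revert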

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-25 Thread Ta Ba Tuan
#ceph pg *6.9d8* query ... peer_info: [ { peer: 49, pgid: 6.9d8, last_update: 102889'7801917, last_complete: 102889'7801917, log_tail: 102377'7792649, last_user_version: 7801879, last_backfill: MAX, purged_snaps:

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Craig Lewis
It looks like you're running into http://tracker.ceph.com/issues/5699 You're running 0.80.7, which has a fix for that bug. From my reading of the code, I believe the fix only prevents the issue from occurring. It doesn't work around or repair bad snapshots created on older versions of Ceph.
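To confirm which version each OSD is actually running (relevant since the fix in 0.80.7 only prevents new occurrences of the bug), something like the following works; both commands are standard, not quoted from the thread.

  # ceph --version
  # ceph tell osd.* version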

Re: [ceph-users] Can't start osd- one osd alway be down.

2014-10-24 Thread Ta Ba Tuan
Hi Craig, thanks for replying. When I start that OSD, the Ceph log (ceph -w) warns that pgs 7.9d8, 23.596, 23.9c6, and 23.63 can't recover, as in the pasted log. Those pgs are in the active+degraded state. # ceph pg map 7.9d8 osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (When osd.21 is started, the pg
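
Rather than mapping each degraded PG one by one, the stuck PGs can be listed in one go; these are standard commands, not taken from the thread.

  # ceph health detail
  # ceph pg dump_stuck unclean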