Hi Craig Lewis,
My pool has 300 TB of data, so I can't just recreate a new pool and copy the
data across (e.g. with rados cppool); that would take a very long time.
I upgraded Ceph to Giant (0.86), but I still hit the error :((
I think my problem is the misplaced objects (0.320%).
# ceph pg 23.96 query
num_objects_missing_on_primary: 0,
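When digging through many PGs, it can help to pull the interesting counters out of the `ceph pg <pgid> query` JSON with a short script. A minimal sketch in Python; the field layout below is how I recall the query output (state at top level, counters under info.stats.stat_sum) and the sample document itself is invented, so treat both as assumptions:

```python
import json

# Invented sample resembling a fragment of `ceph pg 23.96 query` output.
sample = json.loads("""
{
  "state": "active+degraded",
  "info": {
    "stats": {
      "stat_sum": {
        "num_objects_missing_on_primary": 0,
        "num_objects_unfound": 0
      }
    }
  }
}
""")

def summarize(pg_query):
    """Return (state, missing_on_primary, unfound) from one pg query document."""
    stats = pg_query["info"]["stats"]["stat_sum"]
    return (pg_query["state"],
            stats["num_objects_missing_on_primary"],
            stats["num_objects_unfound"])

print(summarize(sample))
```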
My experience is that once you hit this bug, those PGs are gone. I tried
marking the primary OSD OUT, which just caused the problem to move to the new
primary OSD. Luckily for me, my affected PGs were being replicated to the
secondary cluster. I ended up deleting the whole pool and ...
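Before marking a primary OSD out, it helps to know which PGs it is primary for, so you can predict where the problem will move. A sketch that filters a `ceph pg dump`-style listing; the per-PG record structure here is deliberately simplified and the sample data is invented:

```python
# Invented, simplified per-PG records in the spirit of `ceph pg dump` JSON.
pg_stats = [
    {"pgid": "6.9d8", "acting": [21, 49], "state": "active+degraded"},
    {"pgid": "6.5e1", "acting": [21, 93], "state": "active+degraded"},
    {"pgid": "7.9d8", "acting": [93, 49], "state": "active+clean"},
]

def pgs_with_primary(stats, osd_id):
    """PG ids whose acting primary (first entry of 'acting') is osd_id."""
    return [pg["pgid"] for pg in stats
            if pg["acting"] and pg["acting"][0] == osd_id]

print(pgs_with_primary(pg_stats, 21))  # PGs that would be affected by osd.21
```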
Here are some related logs (osd.21 cannot be started):
-8705 2014-10-25 14:41:04.345727 7f12bac2f700 5 *osd.21* pg_epoch:
102843 pg[*6.5e1*( v 102843'11832159 (102377'11822991,102843'11832159]
lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6
local-les=101780 n=4719 ec=164 les/c
My Ceph hung, and osd.21 logged:
osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8
has 4 objects unfound and apparently lost.
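Those 4 unfound objects can be enumerated with `ceph pg 6.9d8 list_missing`. A sketch that tallies them from that command's JSON; the exact field names vary by Ceph version and the sample document (including the object names) is invented, so treat it as an assumption:

```python
import json

# Invented sample loosely following `ceph pg <pgid> list_missing` output;
# field names are an assumption, not a guaranteed schema.
sample = json.loads("""
{
  "num_unfound": 4,
  "objects": [
    {"oid": {"oid": "rbd_data.aaaa.0001"}, "need": "102843'11832159"},
    {"oid": {"oid": "rbd_data.aaaa.0002"}, "need": "102843'11832160"}
  ]
}
""")

def unfound_names(listing):
    """Object names reported unfound for this PG (output may be paginated)."""
    return [o["oid"]["oid"] for o in listing["objects"]]

print(sample["num_unfound"], unfound_names(sample))
```

Once you have the list, the (destructive) last resort is `ceph pg <pgid> mark_unfound_lost revert|delete`; only do that after confirming no peer still holds the objects.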
After I restarted all the ceph data nodes, I couldn't start osd.21; it has
many log entries about pg 6.9d8 such as:
-440 2014-10-25 19:28:17.468161 7fec5731d700 5 -- op tracker -- seq:
3083, time:
#ceph pg *6.9d8* query
...
peer_info: [
{ peer: 49,
pgid: 6.9d8,
last_update: 102889'7801917,
last_complete: 102889'7801917,
log_tail: 102377'7792649,
last_user_version: 7801879,
last_backfill: MAX,
purged_snaps:
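The peer_info entries above can be compared mechanically: a peer whose last_complete lags last_update still has log entries to recover. A minimal sketch using the eversion strings ("epoch'version") pasted above; the peer list is just the one record shown:

```python
# Peer records with the eversion strings from the `ceph pg 6.9d8 query` above.
peers = [
    {"peer": 49,
     "last_update": "102889'7801917",
     "last_complete": "102889'7801917"},
]

def eversion(s):
    """Parse an "epoch'version" string into a comparable (epoch, version) tuple."""
    epoch, version = s.split("'")
    return (int(epoch), int(version))

# Peers that have not yet completed recovery up to their last_update.
lagging = [p["peer"] for p in peers
           if eversion(p["last_complete"]) < eversion(p["last_update"])]
print(lagging)  # empty here: peer 49 is fully caught up
```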
It looks like you're running into http://tracker.ceph.com/issues/5699
You're running 0.80.7, which has a fix for that bug. From my reading of
the code, I believe the fix only prevents the issue from occurring. It
doesn't work around or repair bad snapshots created on older versions of
Ceph.
Hi Craig, thanks for replying.
When I started that osd, the Ceph log from ceph -w warned that pgs 7.9d8,
23.596, 23.9c6 and 23.63 can't recover, as in the pasted log.
Those pgs are in the active+degraded state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49] (when I start
osd.21, then pg ...