Re: [ceph-users] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Hi,

sorry for the late answer: trying to fix this, I tried to delete the
image (rbd rm XXX); the rbd rm completed without errors, but rbd ls
still displays the image.

What should I do ?
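
(For what it's worth, a minimal way to cross-check whether the image is really
gone - only a sketch, where <pool> and <image> are placeholders for the actual
pool and image name, and the <image>.rbd header object only applies to format 1
images:)

# rbd -p <pool> info <image>                        (should fail with "No such file or directory" if the removal worked)
# rados -p <pool> ls | grep "^<image>\.rbd$"        (the format 1 header object, if it still exists)
# rados -p <pool> ls | grep '^rb.0.15c26.238e1f29'  (any remaining data objects with this prefix)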


Here the files for the PG 3.6b :

# find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 
'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 19 mai   22:52 
/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 19 mai   23:00 
/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 19 mai   22:59 
/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 
'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' 
-print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3


As you can see, the OSDs don't contain any other data in those PGs for this
RBD image. Should I remove them through rados ?
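
(If you go the rados route, note that as far as I know a plain "rados rm" only
addresses the head object, so for an orphaned clone it will most likely just
return "No such file or directory". A sketch, with <pool> and <object> as
placeholders for the pool and the full object name:)

# rados -p <pool> stat <object>
# rados -p <pool> rm <object>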


In fact, I remember that some of those files were truncated (size 0), so I
manually copied the data from osd-5. It was probably a mistake to do that.
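
(Before copying anything by hand between OSDs, it is probably worth comparing
the replicas first - a rough sketch, to be run on each host holding a replica;
note that a plain file copy does not carry the object's xattrs/omap metadata,
so this is only a consistency check, not a repair:)

# md5sum /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
(run the same md5sum on the hosts carrying osd-23 and osd-5 and compare)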


Thanks,
Olivier

On Thursday, May 23, 2013 at 15:53 -0700, Samuel Just wrote:
 Can you send the filenames in the pg directories for those 4 pgs?
 -Sam
 
 On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  No :
  pg 3.7c is active+clean+inconsistent, acting [24,13,39]
  pg 3.6b is active+clean+inconsistent, acting [28,23,5]
  pg 3.d is active+clean+inconsistent, acting [29,4,11]
  pg 3.1 is active+clean+inconsistent, acting [28,19,5]
 
  But I suppose that all those PGs *were* having osd.25 as primary (on the
  same host), which is the (now disabled) buggy OSD.
 
  Question: 12d7 in the object path is the snapshot id, right? If that's the
  case, I don't have any snapshot with this id for the
  rb.0.15c26.238e1f29 image.
 
  So, which files should I remove ?
 
  Thanks for your help.
 
 
  On Thursday, May 23, 2013 at 15:17 -0700, Samuel Just wrote:
  Do all of the affected PGs share osd.28 as the primary?  I think the
  only recovery is probably to manually remove the orphaned clones.
  -Sam
 
  On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   Not yet. I keep it for now.
  
   On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
   rb.0.15c26.238e1f29
  
   Has that rbd volume been removed?
   -Sam
  
   On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet 
   ceph.l...@daevel.fr wrote:
0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
   
   
On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
What version are you running?
-Sam
   
On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
ceph.l...@daevel.fr wrote:
 Is it enough ?

 # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found 
 clone without head'
 2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub 
 ok
 2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub 
 ok
 2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub 
 ok
 2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub 
 ok
 2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub 
 ok
 2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
 ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone 
 without head
 2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
 261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone 
 without head
 2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
 b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone 
 without head
 2013-05-22 15:57:51.667085 7f707dd64700  0 log 

Re: [ceph-users] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Note that I still have scrub errors, but rados doesn't see those
objects:

root@brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29'
root@brontes:~#
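
(One way to see whether the errors are still detected after the rbd rm might
be to trigger a new deep scrub on one of the inconsistent PGs and watch the
cluster log - a sketch, using the PG ids from earlier in the thread:)

# ceph pg deep-scrub 3.6b
# ceph -w | grep 3.6b
# ceph health detail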



On Friday, May 31, 2013 at 15:36 +0200, Olivier Bonvalet wrote:
 Hi,
 
 sorry for the late answer: trying to fix this, I tried to delete the
 image (rbd rm XXX); the rbd rm completed without errors, but rbd ls
 still displays the image.
 
 What should I do ?
 
 
 Here the files for the PG 3.6b :
 
 # find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 
 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
 -rw-r--r-- 1 root root 4194304 19 mai   22:52 
 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
 -rw-r--r-- 1 root root 4194304 19 mai   23:00 
 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
 -rw-r--r-- 1 root root 4194304 19 mai   22:59 
 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3
 
 # find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 
 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
 -rw-r--r-- 1 root root 4194304 25 mars  19:18 
 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:33 
 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:34 
 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3
 
 # find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 
 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
 -rw-r--r-- 1 root root 4194304 25 mars  19:18 
 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:33 
 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:34 
 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3
 
 
 As you can see, the OSDs don't contain any other data in those PGs for this
 RBD image. Should I remove them through rados ?
 
 
 In fact, I remember that some of those files were truncated (size 0), so I
 manually copied the data from osd-5. It was probably a mistake to do that.
 
 
 Thanks,
 Olivier
 
 On Thursday, May 23, 2013 at 15:53 -0700, Samuel Just wrote:
  Can you send the filenames in the pg directories for those 4 pgs?
  -Sam
  
  On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   No :
   pg 3.7c is active+clean+inconsistent, acting [24,13,39]
   pg 3.6b is active+clean+inconsistent, acting [28,23,5]
   pg 3.d is active+clean+inconsistent, acting [29,4,11]
   pg 3.1 is active+clean+inconsistent, acting [28,19,5]
  
   But I suppose that all those PGs *were* having osd.25 as primary (on the
   same host), which is the (now disabled) buggy OSD.
  
   Question: 12d7 in the object path is the snapshot id, right? If that's the
   case, I don't have any snapshot with this id for the
   rb.0.15c26.238e1f29 image.
  
   So, which files should I remove ?
  
   Thanks for your help.
  
  
   On Thursday, May 23, 2013 at 15:17 -0700, Samuel Just wrote:
   Do all of the affected PGs share osd.28 as the primary?  I think the
   only recovery is probably to manually remove the orphaned clones.
   -Sam
  
   On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr 
   wrote:
Not yet. I keep it for now.
   
    On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
rb.0.15c26.238e1f29
   
Has that rbd volume been removed?
-Sam
   
On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet 
ceph.l...@daevel.fr wrote:
 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.


 On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
 What version are you running?
 -Sam

 On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
 ceph.l...@daevel.fr wrote:
  Is it enough ?
 
  # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found 
  clone without head'
  2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 
  scrub ok
  2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 
  scrub ok
  2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 
  scrub ok
  2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 
  scrub ok
  2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 
  scrub ok
  2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 
  3.6b ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found 
  clone without head
  

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Olivier Bonvalet
Not yet. I keep it for now.

On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
 rb.0.15c26.238e1f29
 
 Has that rbd volume been removed?
 -Sam
 
 On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
 
 
  On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
  What version are you running?
  -Sam
 
  On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   Is it enough ?
  
   # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone 
   without head'
   2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
   2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
   2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
   2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
   2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
   2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
   ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone without 
   head
   2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
   261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone without 
   head
   2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
   b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone without 
   head
   2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 
   errors
   2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
   2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
   2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 
   cs=73 l=0).fault with nothing to send, going to standby
   2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 74 vs existing 73 state standby
   --
   2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
   2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
   2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
   2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 
   l=0).fault with nothing to send, going to standby
   2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
   2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 
   b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone without 
   head
   2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 
   bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone without 
   head
   2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 
   8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone without 
   head
   2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 errors
   2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 76 vs existing 75 state standby
   2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142  
   192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 40 vs existing 39 state standby
   2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
   2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
  
  
    Note: I have 8 scrub errors like that, on 4 impacted PGs, and all the
    impacted objects belong to the same RBD image (rb.0.15c26.238e1f29).
  
  
  
    On Wednesday, May 22, 2013 at 11:01 -0700, Samuel Just wrote:
   Can you post your ceph.log with the period including all of these 
   errors?
   -Sam
  
   On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
   maha...@bspu.unibel.by wrote:
 Olivier Bonvalet writes:

 On Monday, May 20, 2013 at 00:06 +0200, Olivier Bonvalet wrote:
 On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
 I have 4 scrub errors (3 PGs - "found clone without head") on one OSD.
 They are not repairing. How can I repair this without re-creating the OSD?

 Now it is easy to clean and re-create the OSD, but in theory - in case
 there are multiple OSDs - it may cause data loss.

 I have the same problem: 8 objects (4 PGs) with the error "found clone
 without head". How can I fix that?
 Since pg repair doesn't handle that kind of error, is there a way to
 manually fix that? (it's a production cluster)

 Trying to fix it manually, I caused assertions in the trimming process
 (the OSD died), and many other troubles. So, if you want to keep the
 cluster running, wait for the developers' answer. IMHO.
   
About manual repair attempt: see issue #4937. Also 

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Samuel Just
Do all of the affected PGs share osd.28 as the primary?  I think the
only recovery is probably to manually remove the orphaned clones.
-Sam

On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 Not yet. I keep it for now.

 On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
 rb.0.15c26.238e1f29

 Has that rbd volume been removed?
 -Sam

 On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
 
 
  On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
  What version are you running?
  -Sam
 
  On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   Is it enough ?
  
   # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone 
   without head'
   2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
   2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
   2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
   2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
   2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
   2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
   ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone without 
   head
   2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
   261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone without 
   head
   2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
   b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone without 
   head
   2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 
   errors
   2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
   2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
   2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 
192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 
   cs=73 l=0).fault with nothing to send, going to standby
   2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 
192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 74 vs existing 73 state standby
   --
   2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
   2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
   2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
   2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 
192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 
   cs=75 l=0).fault with nothing to send, going to standby
   2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
   2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 
   b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone without 
   head
   2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 
   bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone without 
   head
   2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 
   8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone without 
   head
   2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 
   errors
   2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 
192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 76 vs existing 75 state standby
   2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 
192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 40 vs existing 39 state standby
   2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
   2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
  
  
    Note: I have 8 scrub errors like that, on 4 impacted PGs, and all the
    impacted objects belong to the same RBD image (rb.0.15c26.238e1f29).
  
  
  
    On Wednesday, May 22, 2013 at 11:01 -0700, Samuel Just wrote:
   Can you post your ceph.log with the period including all of these 
   errors?
   -Sam
  
   On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
   maha...@bspu.unibel.by wrote:
 Olivier Bonvalet writes:

 On Monday, May 20, 2013 at 00:06 +0200, Olivier Bonvalet wrote:
 On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
 I have 4 scrub errors (3 PGs - "found clone without head") on one OSD.
 They are not repairing. How can I repair this without re-creating the OSD?

 Now it is easy to clean and re-create the OSD, but in theory - in case
 there are multiple OSDs - it may cause data loss.

 I have the same problem: 8 objects (4 PGs) with the error "found clone
 without head". How can I fix that?
 Since pg repair doesn't handle that kind of error, is there a way to
 manually fix that? (it's a production cluster)
   
Trying to fix manually I cause 

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Olivier Bonvalet
No : 
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]

But I suppose that all those PGs *were* having osd.25 as primary (on the
same host), which is the (now disabled) buggy OSD.
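
(A quick way to check the current up/acting sets, and possibly the recent
history, for these PGs - just a sketch:)

# for pg in 3.7c 3.6b 3.d 3.1 ; do ceph pg map $pg ; done
# ceph pg 3.6b query
(the query output may contain enough recovery/history information to tell
whether osd.25 was recently in the acting set)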

Question: 12d7 in the object path is the snapshot id, right? If that's the
case, I don't have any snapshot with this id for the
rb.0.15c26.238e1f29 image.
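
(As far as I know, the snap id in the object file name and in the scrub log is
hexadecimal, so 12d7 would be 4823 in decimal, while "rbd snap ls" prints
decimal SNAPIDs - a sketch to compare, with <pool>/<image> as placeholders:)

# printf '%d\n' 0x12d7
4823
# rbd -p <pool> snap ls <image>
(look for a SNAPID of 4823 in the first column)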

So, which files should I remove ?

Thanks for your help.
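
(Not an authoritative procedure, just a sketch of what "manually removing the
orphaned clones" could look like on a filestore OSD, assuming you have verified
that no snapshot still references them; the same would have to be done on every
replica of the PG, and note Dzianis's report further down the thread that
hand-editing led to assertions in the snap trimming code, so treat this with
care and keep backups:)

# /etc/init.d/ceph stop osd.28        (init script / service name may differ)
# find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 'rb.0.15c26.238e1f29.*__12d7_*' -print
(review the list carefully, then remove those files by hand)
# /etc/init.d/ceph start osd.28
# ceph pg deep-scrub 3.6b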


On Thursday, May 23, 2013 at 15:17 -0700, Samuel Just wrote:
 Do all of the affected PGs share osd.28 as the primary?  I think the
 only recovery is probably to manually remove the orphaned clones.
 -Sam
 
 On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  Not yet. I keep it for now.
 
  On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
  rb.0.15c26.238e1f29
 
  Has that rbd volume been removed?
  -Sam
 
  On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
  
  
    On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
   What version are you running?
   -Sam
  
   On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
   ceph.l...@daevel.fr wrote:
Is it enough ?
   
# tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone 
without head'
2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone without 
head
2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone without 
head
2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone without 
head
2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 
errors
2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 
pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 
l=0).accept connect_seq 74 vs existing 73 state standby
--
2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 
cs=75 l=0).fault with nothing to send, going to standby
2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 
b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone without 
head
2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 
bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone without 
head
2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 
8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone without 
head
2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 
errors
2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 
cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 
 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 
l=0).accept connect_seq 40 vs existing 39 state standby
2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
   
   
Note: I have 8 scrub errors like that, on 4 impacted PGs, and all the
impacted objects belong to the same RBD image (rb.0.15c26.238e1f29).
   
   
   
On Wednesday, May 22, 2013 at 11:01 -0700, Samuel Just wrote:
Can you post your ceph.log with the period including all of these 
errors?
-Sam
   
On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
maha...@bspu.unibel.by wrote:
 Olivier 

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Samuel Just
Can you send the filenames in the pg directories for those 4 pgs?
-Sam

On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 No :
 pg 3.7c is active+clean+inconsistent, acting [24,13,39]
 pg 3.6b is active+clean+inconsistent, acting [28,23,5]
 pg 3.d is active+clean+inconsistent, acting [29,4,11]
 pg 3.1 is active+clean+inconsistent, acting [28,19,5]

 But I suppose that all those PGs *were* having osd.25 as primary (on the
 same host), which is the (now disabled) buggy OSD.

 Question: 12d7 in the object path is the snapshot id, right? If that's the
 case, I don't have any snapshot with this id for the
 rb.0.15c26.238e1f29 image.

 So, which files should I remove ?

 Thanks for your help.


 On Thursday, May 23, 2013 at 15:17 -0700, Samuel Just wrote:
 Do all of the affected PGs share osd.28 as the primary?  I think the
 only recovery is probably to manually remove the orphaned clones.
 -Sam

 On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  Not yet. I keep it for now.
 
  On Wednesday, May 22, 2013 at 15:50 -0700, Samuel Just wrote:
  rb.0.15c26.238e1f29
 
  Has that rbd volume been removed?
  -Sam
 
  On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
    0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.
  
  
    On Wednesday, May 22, 2013 at 12:00 -0700, Samuel Just wrote:
   What version are you running?
   -Sam
  
   On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
   ceph.l...@daevel.fr wrote:
Is it enough ?
   
# tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone 
without head'
2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone 
without head
2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone 
without head
2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone 
without head
2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 
errors
2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
2013-05-22 15:59:55.024065 7f707661a700  0 -- 
192.168.42.3:6803/12142  192.168.42.5:6828/31490 pipe(0x2a689000 
sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, 
going to standby
2013-05-22 16:01:45.542579 7f7022770700  0 -- 
192.168.42.3:6803/12142  192.168.42.5:6828/31490 pipe(0x2a689280 
sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 
state standby
--
2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
2013-05-22 16:35:12.240246 7f7022770700  0 -- 
192.168.42.3:6803/12142  192.168.42.5:6828/31490 pipe(0x2a689280 
sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, 
going to standby
2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 
b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone 
without head
2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 
bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone 
without head
2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 
8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone 
without head
2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 
errors
2013-05-22 16:46:12.385678 7f7077735700  0 -- 
192.168.42.3:6803/12142  192.168.42.5:6828/31490 pipe(0x2a689c80 
sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 
75 state standby
2013-05-22 16:58:36.079010 7f707661a700  0 -- 
192.168.42.3:6803/12142  192.168.42.3:6801/11745 pipe(0x2a689a00 
sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 
state standby
2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
   
   
Note: I have 8 scrub errors like that, on 4 impacted PGs, and all the
impacted objects belong to the same RBD image (rb.0.15c26.238e1f29).
   
   
   
On Wednesday, May 22, 2013 at 11:01 -0700, Samuel Just wrote:
Can you post 

Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Olivier Bonvalet

On Monday, May 20, 2013 at 00:06 +0200, Olivier Bonvalet wrote:
 On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
  I have 4 scrub errors (3 PGs - "found clone without head") on one OSD.
  They are not repairing. How can I repair this without re-creating the OSD?
  
  Now it is easy to clean and re-create the OSD, but in theory - in case
  there are multiple OSDs - it may cause data loss.
  
  -- 
  WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 
 Hi,
 
 I have the same problem: 8 objects (4 PGs) with the error "found clone without
 head". How can I fix that?
 
 thanks,
 Olivier
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hi,

since pg repair doesn't handle that kind of error, is there a way to
manually fix that ? (it's a production cluster)
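
(For reference, the pg repair being referred to is, per inconsistent PG,
something like:)

# ceph health detail | grep inconsistent
# ceph pg repair 3.6b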

thanks in advance,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Samuel Just
Can you post your ceph.log with the period including all of these errors?
-Sam

On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
maha...@bspu.unibel.by wrote:
 Olivier Bonvalet writes:

 On Monday, May 20, 2013 at 00:06 +0200, Olivier Bonvalet wrote:
 On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
 I have 4 scrub errors (3 PGs - "found clone without head") on one OSD.
 They are not repairing. How can I repair this without re-creating the OSD?

 Now it is easy to clean and re-create the OSD, but in theory - in case there
 are multiple OSDs - it may cause data loss.

 I have the same problem: 8 objects (4 PGs) with the error "found clone without
 head". How can I fix that?
 Since pg repair doesn't handle that kind of error, is there a way to
 manually fix that? (it's a production cluster)

 Trying to fix it manually, I caused assertions in the trimming process (the
 OSD died), and many other troubles. So, if you want to keep the cluster
 running, wait for the developers' answer. IMHO.

 About the manual repair attempt: see issue #4937. Also similar results in the
 thread with subject "Inconsistent PG's, repair ineffective".

 --
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-20 Thread Olivier Bonvalet
Great, thanks. I will follow this issue, and add information if needed.

 On Monday, May 20, 2013 at 17:22 +0300, Dzianis Kahanovich wrote:
 http://tracker.ceph.com/issues/4937
 
 For me it progressed up to a ceph reinstall, with data restored from backup (I
 helped ceph die, but it was IMHO self-provocation to force the reinstall). Now
 (at least until my summer outdoors) I keep v0.62 (3 nodes) with every pool at
 size=3 min_size=2 (it was size=2 min_size=1).
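
 (For reference, pool replication settings like those can be changed per pool
 with something like the following, <pool> being a placeholder:)

 # ceph osd pool set <pool> size 3
 # ceph osd pool set <pool> min_size 2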
 
 But first, try to do nothing and install the latest version. And add your
 vote to issue #4937 to push the developers.
 
  Olivier Bonvalet writes:
  On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
  I have 4 scrub errors (3 PGs - "found clone without head") on one OSD.
  They are not repairing. How can I repair this without re-creating the OSD?
 
  Now it is easy to clean and re-create the OSD, but in theory - in case
  there are multiple OSDs - it may cause data loss.
 
  -- 
  WBR, Dzianis Kahanovich AKA Denis Kaganovich, 
  http://mahatma.bspu.unibel.by/
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
  
  
  Hi,
  
  I have the same problem: 8 objects (4 PGs) with the error "found clone without
  head". How can I fix that?
  
  thanks,
  Olivier
  
  
  
 
 
 -- 
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-19 Thread Olivier Bonvalet
On Tuesday, May 7, 2013 at 15:51 +0300, Dzianis Kahanovich wrote:
 I have 4 scrub errors (3 PGs - "found clone without head") on one OSD.
 They are not repairing. How can I repair this without re-creating the OSD?
 
 Now it is easy to clean and re-create the OSD, but in theory - in case there
 are multiple OSDs - it may cause data loss.
 
 -- 
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


Hi,

I have the same problem: 8 objects (4 PGs) with the error "found clone without
head". How can I fix that?

thanks,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com