Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-14 Thread Brad Hubbard
You could try a 'rados get' and then a 'rados put' on the object to start with.
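A minimal sketch of that, assuming the pool is cephfs_data (as identified
later in this thread) and using a shell variable for the full object name,
which is truncated in this archive:

  # read the object out, then write it straight back; rewriting it should
  # regenerate the missing object-info ("_") xattr on every replica
  rados -p cephfs_data get "$OBJ" /tmp/obj.dat
  rados -p cephfs_data put "$OBJ" /tmp/obj.dat
  ceph pg deep-scrub 1.65   # re-scrub afterwards to confirm the error clears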
On Thu, Nov 15, 2018 at 4:07 AM K.C. Wong  wrote:
>
> So, I’ve issued the deep-scrub command (and the repair command)
> and nothing seems to happen.
> Unrelated to this issue, I have to take down some OSDs to prepare
> a host for RMA. One of them happens to be in the replication
> group for this PG. So, a scrub happened indirectly. I now have
> this from “ceph -s”:
>
> cluster 374aed9e-5fc1-47e1-8d29-4416f7425e76
>  health HEALTH_ERR
> 1 pgs inconsistent
> 18446 scrub errors
>  monmap e1: 3 mons at 
> {mgmt01=10.0.1.1:6789/0,mgmt02=10.1.1.1:6789/0,mgmt03=10.2.1.1:6789/0}
> election epoch 252, quorum 0,1,2 mgmt01,mgmt02,mgmt03
>   fsmap e346: 1/1/1 up {0=mgmt01=up:active}, 2 up:standby
>  osdmap e40248: 120 osds: 119 up, 119 in
> flags sortbitwise,require_jewel_osds
>   pgmap v22025963: 3136 pgs, 18 pools, 18975 GB data, 214 Mobjects
> 59473 GB used, 287 TB / 345 TB avail
> 3120 active+clean
>   15 active+clean+scrubbing+deep
>    1 active+clean+inconsistent
>
> That’s a lot of scrub errors:
>
> HEALTH_ERR 1 pgs inconsistent; 18446 scrub errors
> pg 1.65 is active+clean+inconsistent, acting [62,67,33]
> 18446 scrub errors
>
> Now, “rados list-inconsistent-obj 1.65” returns a *very* long JSON
> output. Here’s a very small snippet; the errors look the same throughout:
>
> {
>   "object":{
>     "name":"10ea8bb.0045",
>     "nspace":"",
>     "locator":"",
>     "snap":"head",
>     "version":59538
>   },
>   "errors":["attr_name_mismatch"],
>   "union_shard_errors":["oi_attr_missing"],
>   "selected_object_info":"1:a70dc1cc:::10ea8bb.0045:head(2897'59538
> client.4895965.0:462007 dirty|data_digest|omap_digest s 4194304 uv 59538 dd
> f437a612 od  alloc_hint [0 0])",
>   "shards":[
>     {
>       "osd":33,
>       "errors":[],
>       "size":4194304,
>       "omap_digest":"0x",
>       "data_digest":"0xf437a612",
>       "attrs":[
>         {"name":"_",
>          "value":"EAgNAQAABAM1AA...",
>          "Base64":true},
>         {"name":"snapset",
>          "value":"AgIZAQ...",
>          "Base64":true}
>       ]
>     },
>     {
>       "osd":62,
>       "errors":[],
>       "size":4194304,
>       "omap_digest":"0x",
>       "data_digest":"0xf437a612",
>       "attrs":[
>         {"name":"_",
>          "value":"EAgNAQAABAM1AA...",
>          "Base64":true},
>         {"name":"snapset",
>          "value":"AgIZAQ...",
>          "Base64":true}
>       ]
>     },
>     {
>       "osd":67,
>       "errors":["oi_attr_missing"],
>       "size":4194304,
>       "omap_digest":"0x",
>       "data_digest":"0xf437a612",
>       "attrs":[]
>     }
>   ]
> }
>
> Clearly, on osd.67, the “attrs” array is empty. The question is,
> how do I fix this?
>
> Many thanks in advance,
>
> -kc
>
> On Nov 11, 2018, at 10:58 PM, Brad Hubbard  wrote:
>
> On Mon, Nov 12, 2018 at 4:21 PM Ashley Merrick  
> wrote:
>
>
> You need to run "ceph pg deep-scrub 1.65" first
>
>
> Right, thanks Ashley. That's what the "Note that you may have to do a
> deep scrub to populate the output." part of my answer meant but
> perhaps I needed to go further?
>
> The system has a record of a scrub error on a previous scan but
> subsequent activity in the cluster has invalidated the specifics. You
> need to run another scrub to get the specific information for this pg
> at this point in time (the information does not remain valid
> indefinitely and therefore may need to be renewed depending on
> circumstances).
>
>
> On Mon, Nov 12, 2018 at 2:20 PM K.C. Wong  wrote:
>
>
> Hi Brad,
>
> I got the following:
>
> [root@mgmt01 ~]# ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> pg 1.65 is active+clean+inconsistent, acting [62,67,47]
> 1 scrub errors
> [root@mgmt01 ~]# rados list-inconsistent-obj 1.65
> No scrub information available for pg 1.65
> error 2: (2) No such file or directory
> [root@mgmt01 ~]# rados list-inconsistent-snapset 1.65
> No scrub information available for pg 1.65
> error 2: (2) No such file or directory
>
> Rather odd output, I’d say; not that I understand what
> that means. I also tried rados list-inconsistent-pg:
>
> [root@mgmt01 ~]# rados lspools
> rbd
> 

Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-14 Thread K.C. Wong
So, I’ve issued the deep-scrub command (and the repair command)
and nothing seems to happen.
Unrelated to this issue, I have to take down some OSDs to prepare
a host for RMA. One of them happens to be in the replication
group for this PG. So, a scrub happened indirectly. I now have
this from “ceph -s”:

cluster 374aed9e-5fc1-47e1-8d29-4416f7425e76
 health HEALTH_ERR
1 pgs inconsistent
18446 scrub errors
 monmap e1: 3 mons at 
{mgmt01=10.0.1.1:6789/0,mgmt02=10.1.1.1:6789/0,mgmt03=10.2.1.1:6789/0}
election epoch 252, quorum 0,1,2 mgmt01,mgmt02,mgmt03
  fsmap e346: 1/1/1 up {0=mgmt01=up:active}, 2 up:standby
 osdmap e40248: 120 osds: 119 up, 119 in
flags sortbitwise,require_jewel_osds
  pgmap v22025963: 3136 pgs, 18 pools, 18975 GB data, 214 Mobjects
59473 GB used, 287 TB / 345 TB avail
3120 active+clean
  15 active+clean+scrubbing+deep
   1 active+clean+inconsistent

That’s a lot of scrub errors:

HEALTH_ERR 1 pgs inconsistent; 18446 scrub errors
pg 1.65 is active+clean+inconsistent, acting [62,67,33]
18446 scrub errors

Now, “rados list-inconsistent-obj 1.65” returns a *very* long JSON
output. Here’s a very small snippet; the errors look the same throughout:

{
  "object":{
    "name":"10ea8bb.0045",
    "nspace":"",
    "locator":"",
    "snap":"head",
    "version":59538
  },
  "errors":["attr_name_mismatch"],
  "union_shard_errors":["oi_attr_missing"],
  "selected_object_info":"1:a70dc1cc:::10ea8bb.0045:head(2897'59538
client.4895965.0:462007 dirty|data_digest|omap_digest s 4194304 uv 59538 dd
f437a612 od  alloc_hint [0 0])",
  "shards":[
    {
      "osd":33,
      "errors":[],
      "size":4194304,
      "omap_digest":"0x",
      "data_digest":"0xf437a612",
      "attrs":[
        {"name":"_",
         "value":"EAgNAQAABAM1AA...",
         "Base64":true},
        {"name":"snapset",
         "value":"AgIZAQ...",
         "Base64":true}
      ]
    },
    {
      "osd":62,
      "errors":[],
      "size":4194304,
      "omap_digest":"0x",
      "data_digest":"0xf437a612",
      "attrs":[
        {"name":"_",
         "value":"EAgNAQAABAM1AA...",
         "Base64":true},
        {"name":"snapset",
         "value":"AgIZAQ...",
         "Base64":true}
      ]
    },
    {
      "osd":67,
      "errors":["oi_attr_missing"],
      "size":4194304,
      "omap_digest":"0x",
      "data_digest":"0xf437a612",
      "attrs":[]
    }
  ]
}

Clearly, on osd.67, the “attrs” array is empty. The question is,
how do I fix this?
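(One way to confirm this directly on osd.67 before attempting a fix, sketched
for a default filestore layout; the paths are assumptions, and the object
name above is truncated in this archive:)

  # on the host carrying osd.67
  find /var/lib/ceph/osd/ceph-67/current/1.65_head -name '*10ea8bb*'
  getfattr -d -m 'user.ceph' <path-from-find>
  # a healthy replica (osd.33, osd.62) shows the user.ceph._ (object info)
  # and user.ceph.snapset xattrs; on osd.67, user.ceph._ should be absent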

Many thanks in advance,

-kc


> On Nov 11, 2018, at 10:58 PM, Brad Hubbard  wrote:
> 
> On Mon, Nov 12, 2018 at 4:21 PM Ashley Merrick wrote:
>> 
>> You need to run "ceph pg deep-scrub 1.65" first
> 
> Right, thanks Ashley. That's what the "Note that you may have to do a
> deep scrub to populate the output." part of my answer meant but
> perhaps I needed to go further?
> 
> The system has a record of a scrub error on a previous scan but
> subsequent activity in the cluster has invalidated the specifics. You
> need to run another scrub to get the specific information for this pg
> at this point in time (the information does not remain valid
> indefinitely and therefore may need to be renewed depending on
> circumstances).
> 
>> 
>> On Mon, Nov 12, 2018 at 2:20 PM K.C. Wong  wrote:
>>> 
>>> Hi Brad,
>>> 
>>> I got the following:
>>> 
>>> [root@mgmt01 ~]# ceph health detail
>>> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> pg 1.65 is active+clean+inconsistent, acting [62,67,47]
>>> 1 scrub errors
>>> [root@mgmt01 ~]# rados list-inconsistent-obj 1.65
>>> No scrub information available for pg 1.65
>>> error 2: (2) No such file or directory
>>> [root@mgmt01 ~]# rados list-inconsistent-snapset 1.65
>>> No scrub information available for pg 1.65
>>> error 2: (2) No such file or directory
>>> 
>>> Rather odd output, I’d say; not that I understand what
>>> that means. I also tried rados list-inconsistent-pg:
>>> 
>>> [root@mgmt01 ~]# rados lspools
>>> rbd
>>> cephfs_data
>>> cephfs_metadata
>>> .rgw.root
>>> default.rgw.control
>>> default.rgw.data.root
>>> default.rgw.gc
>>> default.rgw.log
>>> ctrl-p
>>> prod
>>> corp

Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-11 Thread K.C. Wong
Thanks, Ashley.

Should I expect the deep-scrubbing to start immediately?

[root@mgmt01 ~]# ceph pg deep-scrub 1.65
instructing pg 1.65 on osd.62 to deep-scrub
[root@mgmt01 ~]# ceph pg ls deep_scrub
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
16.75  430657 0 0 0 0 30754735820 3007 3007 active+clean+scrubbing+deep 2018-11-11 11:05:11.572325 39934'549067 39934:1311893 [4,64,35] 4 [4,64,35] 4 28743'539264 2018-11-07 02:17:53.293336 28743'539264 2018-11-03 14:39:44.837702
16.86  430617 0 0 0 0 30316842298 3048 3048 active+clean+scrubbing+deep 2018-11-11 15:56:30.148527 39934'548012 39934:1038058 [18,2,62] 18 [18,2,62] 18 26347'529815 2018-10-28 01:06:55.526624 26347'529815 2018-10-28 01:06:55.526624
16.eb  432196 0 0 0 0 30612459543 3071 3071 active+clean+scrubbing+deep 2018-11-11 11:02:46.993022 39934'550340 39934:3662047 [56,44,42] 56 [56,44,42] 56 28507'540255 2018-11-02 03:28:28.013949 28507'540255 2018-11-02 03:28:28.013949
16.f3  431399 0 0 0 0 30672009253 3067 3067 active+clean+scrubbing+deep 2018-11-11 17:40:55.732162 39934'549240 39934:2212192 [69,82,6] 69 [69,82,6] 69 28743'539336 2018-11-02 17:22:05.745972 28743'539336 2018-11-02 17:22:05.745972
16.f7  430885 0 0 0 0 30796505272 3100 3100 active+clean+scrubbing+deep 2018-11-11 22:50:05.231599 39934'548910 39934:683169 [59,63,119] 59 [59,63,119] 59 28743'539167 2018-11-03 07:24:43.776341 26347'530830 2018-10-28 04:44:12.276982
16.14c 430565 0 0 0 0 31177011073 3042 3042 active+clean+scrubbing+deep 2018-11-11 20:11:31.107313 39934'550564 39934:1545200 [41,12,70] 41 [41,12,70] 41 28743'540758 2018-11-03 23:04:49.155741 28743'540758 2018-11-03 23:04:49.155741
16.156 430356 0 0 0 0 31021738479 3006 3006 active+clean+scrubbing+deep 2018-11-11 20:44:14.019537 39934'549241 39934:2958053 [83,47,1] 83 [83,47,1] 83 28743'539462 2018-11-04 14:46:56.890822 28743'539462 2018-11-04 14:46:56.890822
16.19f 431613 0 0 0 0 30746145827 3063 3063 active+clean+scrubbing+deep 2018-11-11 19:06:40.693002 39934'549429 39934:1189872 [14,54,37] 14 [14,54,37] 14 28743'539660 2018-11-04 18:25:13.225962 26347'531345 2018-10-28 20:08:45.286421
16.1b1 431225 0 0 0 0 30988996529 3048 3048 active+clean+scrubbing+deep 2018-11-11 20:12:35.367935 39934'549604 39934:778127 [34,106,11] 34 [34,106,11] 34 26347'531560 2018-10-27 16:49:46.944748 26347'531560 2018-10-27 16:49:46.944748
16.1e2 431724 0 0 0 0 30247732969 3070 3070 active+clean+scrubbing+deep 2018-11-11 20:55:17.591646 39934'550105 39934:1428341 [103,48,3] 103 [103,48,3] 103 28743'540270 2018-11-06 03:36:30.531106 28507'539840 2018-11-02 01:08:23.268409
16.1f3 430604 0 0 0 0 30633545866 3039 3039 active+clean+scrubbing+deep 2018-11-11 20:15:28.557464 39934'548804 39934:1354817 [66,102,33] 66 [66,102,33] 66 28743'538896 2018-11-04 04:59:33.118414 28743'538896 2018-11-04 04:59:33.118414
[root@mgmt01 ~]# ceph pg ls inconsistent
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
1.65   12806  0 0 0 0 30010463024 3008 3008 active+clean+inconsistent 2018-11-10 00:16:43.965966 39934'184512 39934:388820 [62,67,47] 62 [62,67,47] 62 28743'183853 2018-11-04 01:31:27.042458 28743'183853 2018-11-04 01:31:27.042458

It’s similar to when I issued “ceph pg repair 1.65”: the command reported
instructing osd.62 to repair 1.65, and then nothing seemed to happen.
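(Deep scrubs and repairs are queued rather than run immediately; osd.62
starts them once its scrub reservations allow. One way to watch for the
scrub actually running, an aside not from the original mail, is to poll
the scrub stamps:)

  ceph pg 1.65 query | grep -i scrub_stamp   # the stamps advance once the scrub has run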

-kc


Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-11 Thread Brad Hubbard
On Mon, Nov 12, 2018 at 4:21 PM Ashley Merrick  wrote:
>
> You need to run "ceph pg deep-scrub 1.65" first

Right, thanks Ashley. That's what the "Note that you may have to do a
deep scrub to populate the output." part of my answer meant but
perhaps I needed to go further?

The system has a record of a scrub error on a previous scan but
subsequent activity in the cluster has invalidated the specifics. You
need to run another scrub to get the specific information for this pg
at this point in time (the information does not remain valid
indefinitely and therefore may need to be renewed depending on
circumstances).
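In concrete terms, something like:

  ceph pg deep-scrub 1.65
  # wait for the deep scrub of pg 1.65 to finish (watch "ceph -s"), then:
  rados list-inconsistent-obj 1.65 --format=json-pretty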

>
> On Mon, Nov 12, 2018 at 2:20 PM K.C. Wong  wrote:
>>
>> Hi Brad,
>>
>> I got the following:
>>
>> [root@mgmt01 ~]# ceph health detail
>> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>> pg 1.65 is active+clean+inconsistent, acting [62,67,47]
>> 1 scrub errors
>> [root@mgmt01 ~]# rados list-inconsistent-obj 1.65
>> No scrub information available for pg 1.65
>> error 2: (2) No such file or directory
>> [root@mgmt01 ~]# rados list-inconsistent-snapset 1.65
>> No scrub information available for pg 1.65
>> error 2: (2) No such file or directory
>>
>> Rather odd output, I’d say; not that I understand what
>> that means. I also tried rados list-inconsistent-pg:
>>
>> [root@mgmt01 ~]# rados lspools
>> rbd
>> cephfs_data
>> cephfs_metadata
>> .rgw.root
>> default.rgw.control
>> default.rgw.data.root
>> default.rgw.gc
>> default.rgw.log
>> ctrl-p
>> prod
>> corp
>> camp
>> dev
>> default.rgw.users.uid
>> default.rgw.users.keys
>> default.rgw.buckets.index
>> default.rgw.buckets.data
>> default.rgw.buckets.non-ec
>> [root@mgmt01 ~]# for i in $(rados lspools); do rados list-inconsistent-pg 
>> $i; done
>> []
>> ["1.65"]
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>> []
>>
>> So, that’d put the inconsistency in the cephfs_data pool.
>>
>> Thank you for your help,
>>
>> -kc
>>
>>
>> On Nov 11, 2018, at 5:43 PM, Brad Hubbard  wrote:
>>
>> What does "rados list-inconsistent-obj " say?
>>
>> Note that you may have to do a deep scrub to populate the output.
>> On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong  wrote:
>>
>>
>> Hi folks,
>>
>> I would appreciate any pointer as to how I can resolve a
>> PG stuck in “active+clean+inconsistent” state. This has
>> resulted in HEALTH_ERR status for the last 5 days with no
>> end in sight. The state got triggered when one of the drives
>> in the PG returned I/O error. I’ve since replaced the failed
>> drive.
>>
>> I’m running Jewel (out of centos-release-ceph-jewel) on
>> CentOS 7. I’ve tried “ceph pg repair ” and it didn’t seem
>> to do anything. I’ve tried even more drastic measures such as
>> comparing all the files (using filestore) under that PG_head
>> on all 3 copies and then nuking the outlier. Nothing worked.
>>
>> Many thanks,
>>
>> -kc
>>
>>
>>
>>
>>
>>
>> --
>> Cheers,
>> Brad
>>
>>



-- 
Cheers,
Brad


Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-11 Thread Ashley Merrick
You need to run "ceph pg deep-scrub 1.65" first

On Mon, Nov 12, 2018 at 2:20 PM K.C. Wong  wrote:

> Hi Brad,
>
> I got the following:
>
> [root@mgmt01 ~]# ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> pg 1.65 is active+clean+inconsistent, acting [62,67,47]
> 1 scrub errors
> [root@mgmt01 ~]# rados list-inconsistent-obj 1.65
> No scrub information available for pg 1.65
> error 2: (2) No such file or directory
> [root@mgmt01 ~]# rados list-inconsistent-snapset 1.65
> No scrub information available for pg 1.65
> error 2: (2) No such file or directory
>
> Rather odd output, I’d say; not that I understand what
> that means. I also tried rados list-inconsistent-pg:
>
> [root@mgmt01 ~]# rados lspools
> rbd
> cephfs_data
> cephfs_metadata
> .rgw.root
> default.rgw.control
> default.rgw.data.root
> default.rgw.gc
> default.rgw.log
> ctrl-p
> prod
> corp
> camp
> dev
> default.rgw.users.uid
> default.rgw.users.keys
> default.rgw.buckets.index
> default.rgw.buckets.data
> default.rgw.buckets.non-ec
> [root@mgmt01 ~]# for i in $(rados lspools); do rados list-inconsistent-pg
> $i; done
> []
> ["1.65"]
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
> []
>
> So, that’d put the inconsistency in the cephfs_data pool.
>
> Thank you for your help,
>
> -kc
>
>
> On Nov 11, 2018, at 5:43 PM, Brad Hubbard  wrote:
>
> What does "rados list-inconsistent-obj " say?
>
> Note that you may have to do a deep scrub to populate the output.
> On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong  wrote:
>
>
> Hi folks,
>
> I would appreciate any pointer as to how I can resolve a
> PG stuck in “active+clean+inconsistent” state. This has
> resulted in HEALTH_ERR status for the last 5 days with no
> end in sight. The state got triggered when one of the drives
> in the PG returned I/O error. I’ve since replaced the failed
> drive.
>
> I’m running Jewel (out of centos-release-ceph-jewel) on
> CentOS 7. I’ve tried “ceph pg repair ” and it didn’t seem
> to do anything. I’ve tried even more drastic measures such as
> comparing all the files (using filestore) under that PG_head
> on all 3 copies and then nuking the outlier. Nothing worked.
>
> Many thanks,
>
> -kc
>
>
>
>
>
>
> --
> Cheers,
> Brad
>
>
>


Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-11 Thread K.C. Wong
Hi Brad,

I got the following:

[root@mgmt01 ~]# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 1.65 is active+clean+inconsistent, acting [62,67,47]
1 scrub errors
[root@mgmt01 ~]# rados list-inconsistent-obj 1.65
No scrub information available for pg 1.65
error 2: (2) No such file or directory
[root@mgmt01 ~]# rados list-inconsistent-snapset 1.65
No scrub information available for pg 1.65
error 2: (2) No such file or directory

Rather odd output, I’d say; not that I understand what
that means. I also tried rados list-inconsistent-pg:

[root@mgmt01 ~]# rados lspools
rbd
cephfs_data
cephfs_metadata
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
ctrl-p
prod
corp
camp
dev
default.rgw.users.uid
default.rgw.users.keys
default.rgw.buckets.index
default.rgw.buckets.data
default.rgw.buckets.non-ec
[root@mgmt01 ~]# for i in $(rados lspools); do rados list-inconsistent-pg $i; 
done
[]
["1.65"]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

So, that’d put the inconsistency in the cephfs_data pool.
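(Matching those bracketed lines back to pool names by position is
error-prone; a trivial variant of the same loop labels each line:)

  for i in $(rados lspools); do echo -n "$i: "; rados list-inconsistent-pg "$i"; done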

Thank you for your help,

-kc


> On Nov 11, 2018, at 5:43 PM, Brad Hubbard  wrote:
> 
> What does "rados list-inconsistent-obj " say?
> 
> Note that you may have to do a deep scrub to populate the output.
> On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong  wrote:
>> 
>> Hi folks,
>> 
>> I would appreciate any pointer as to how I can resolve a
>> PG stuck in “active+clean+inconsistent” state. This has
>> resulted in HEALTH_ERR status for the last 5 days with no
>> end in sight. The state got triggered when one of the drives
>> in the PG returned I/O error. I’ve since replaced the failed
>> drive.
>> 
>> I’m running Jewel (out of centos-release-ceph-jewel) on
>> CentOS 7. I’ve tried “ceph pg repair ” and it didn’t seem
>> to do anything. I’ve tried even more drastic measures such as
>> comparing all the files (using filestore) under that PG_head
>> on all 3 copies and then nuking the outlier. Nothing worked.
>> 
>> Many thanks,
>> 
>> -kc
>> 
>> 
> 
> 
> 
> --
> Cheers,
> Brad





Re: [ceph-users] How to repair active+clean+inconsistent?

2018-11-11 Thread Brad Hubbard
What does "rados list-inconsistent-obj " say?

Note that you may have to do a deep scrub to populate the output.
On Mon, Nov 12, 2018 at 5:10 AM K.C. Wong  wrote:
>
> Hi folks,
>
> I would appreciate any pointer as to how I can resolve a
> PG stuck in “active+clean+inconsistent” state. This has
> resulted in HEALTH_ERR status for the last 5 days with no
> end in sight. The state got triggered when one of the drives
> in the PG returned I/O error. I’ve since replaced the failed
> drive.
>
> I’m running Jewel (out of centos-release-ceph-jewel) on
> CentOS 7. I’ve tried “ceph pg repair ” and it didn’t seem
> to do anything. I’ve tried even more drastic measures such as
> comparing all the files (using filestore) under that PG_head
> on all 3 copies and then nuking the outlier. Nothing worked.
>
> Many thanks,
>
> -kc
>
>



-- 
Cheers,
Brad


[ceph-users] How to repair active+clean+inconsistent?

2018-11-11 Thread K.C. Wong
Hi folks,

I would appreciate any pointer as to how I can resolve a
PG stuck in “active+clean+inconsistent” state. This has
resulted in HEALTH_ERR status for the last 5 days with no
end in sight. The state got triggered when one of the drives
in the PG returned I/O error. I’ve since replaced the failed
drive.

I’m running Jewel (out of centos-release-ceph-jewel) on
CentOS 7. I’ve tried “ceph pg repair ” and it didn’t seem
to do anything. I’ve tried even more drastic measures such as
comparing all the files (using filestore) under that PG_head
on all 3 copies and then nuking the outlier. Nothing worked.
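(Roughly what that comparison looks like on a default filestore layout;
the paths are assumptions, and it would be run on each replica's host
before diffing the sorted lists:)

  cd /var/lib/ceph/osd/ceph-62/current/1.65_head
  find . -type f -exec md5sum {} + | sort -k2 > /tmp/pg1.65-osd62.md5
  # repeat on the hosts for the other two replicas, then:
  diff /tmp/pg1.65-osd62.md5 /tmp/pg1.65-osd67.md5

Note that when the inconsistency is a missing xattr, as the scrub report
later in this thread shows, the file contents compare equal on every
replica; the difference only shows up in the extended attributes
(getfattr -d).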

Many thanks,

-kc



