Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-09 Thread Bryan Stillwell

> On Apr 8, 2019, at 5:42 PM, Bryan Stillwell  wrote:
> 
> 
>> On Apr 8, 2019, at 4:38 PM, Gregory Farnum  wrote:
>> 
>> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell  
>> wrote:
>>> 
>>> There doesn't appear to be any correlation between the OSDs which would 
>>> point to a hardware issue, and since it's happening on two different 
>>> clusters I'm wondering if there's a race condition that has been fixed in a 
>>> later version?
>>> 
>>> Also, what exactly is the omap digest?  From what I can tell it appears to 
>>> be some kind of checksum for the omap data.  Is that correct?
>> 
>> Yeah; it's just a crc over the omap key-value data that's checked
>> during deep scrub. Same as the data digest.
>> 
>> I've not noticed any issues around this in Luminous but I probably
>> wouldn't have, so will have to leave it up to others if there are
>> fixes in since 12.2.8.
> 
> Thanks for adding some clarity to that Greg!
> 
> For some added information, this is what the logs reported earlier today:
> 
> 2019-04-08 11:46:15.610169 osd.504 osd.504 10.16.10.30:6804/8874 33 : cluster 
> [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
> 0x26a1241b != omap_digest 0x4c10ee76 from shard 504
> 2019-04-08 11:46:15.610190 osd.504 osd.504 10.16.10.30:6804/8874 34 : cluster 
> [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
> 0x26a1241b != omap_digest 0x4c10ee76 from shard 504
> 
> I then tried deep scrubbing it again to see whether the data was actually fine 
> and the digest calculation was simply misbehaving.  It came back with the same 
> inconsistency, just with new digest values:
> 
> 2019-04-08 15:56:21.186291 osd.504 osd.504 10.16.10.30:6804/8874 49 : cluster 
> [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
> 0x93bac8f != omap_digest 0xab1b9c6f from shard 504
> 2019-04-08 15:56:21.186313 osd.504 osd.504 10.16.10.30:6804/8874 50 : cluster 
> [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
> 0x93bac8f != omap_digest 0xab1b9c6f from shard 504
> 
> Which makes sense, but doesn’t explain why the omap data is getting out of 
> sync across multiple OSDs and clusters…
> 
> I’ll see what I can figure out tomorrow, but if anyone else has some hints I 
> would love to hear them.

I’ve dug into this more today and it appears that the omap data on the OSDs 
with the mismatched omap digests contains an extra entry.  I then searched the 
RGW logs and found that a DELETE happened shortly after the OSD booted, but the 
omap data was never updated on that OSD, which is what left it mismatched.
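
For anyone who wants to make the same comparison, one way to dump the omap keys 
from each replica is ceph-objectstore-tool with the OSD stopped (the OSD ID, 
path, and object name below are just the PG 7.3 example from earlier in the 
thread, so adjust for your own cluster; FileStore may also want --journal-path, 
and --op list can help locate the object):

# systemctl stop ceph-osd@504
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-504 --pgid 7.3 \
    '.dir.default.22333615.1861352' list-omap > /tmp/omap.osd504
# systemctl start ceph-osd@504

Repeating that on the other OSDs in the acting set and diffing the output files 
should show the extra key on the replica with the odd digest.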

Here’s a timeline of the events which caused PG 7.9 to become inconsistent:

2019-04-04 14:37:34 - osd.492 marked itself down
2019-04-04 14:40:35 - osd.492 boot
2019-04-04 14:41:55 - DELETE call happened
2019-04-08 12:06:14 - omap_digest mismatch detected (pg 7.9 is 
active+clean+inconsistent, acting [492,546,523])

Here’s the timeline for PG 7.2b:

2019-04-03 13:54:17 - osd.488 marked itself down
2019-04-03 13:59:27 - osd.488 boot
2019-04-03 14:00:54 - DELETE call happened
2019-04-08 12:42:21 - omap_digest mismatch detected (pg 7.2b is 
active+clean+inconsistent, acting [488,511,541])
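
For reference, the osd down/boot events above come from the cluster log and the 
DELETE from the RGW access logs, so greps along these lines should reproduce the 
timeline (log paths are illustrative and vary by deployment):

# grep 'osd.492' /var/log/ceph/ceph.log | grep -E 'marked itself down|boot'
# grep DELETE /var/log/ceph/ceph-client.rgw.*.log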

Bryan


Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell

> On Apr 8, 2019, at 4:38 PM, Gregory Farnum  wrote:
> 
> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell  wrote:
>> 
>> There doesn't appear to be any correlation between the OSDs which would 
>> point to a hardware issue, and since it's happening on two different 
>> clusters I'm wondering if there's a race condition that has been fixed in a 
>> later version?
>> 
>> Also, what exactly is the omap digest?  From what I can tell it appears to 
>> be some kind of checksum for the omap data.  Is that correct?
> 
> Yeah; it's just a crc over the omap key-value data that's checked
> during deep scrub. Same as the data digest.
> 
> I've not noticed any issues around this in Luminous but I probably
> wouldn't have, so will have to leave it up to others if there are
> fixes in since 12.2.8.

Thanks for adding some clarity to that Greg!

For some added information, this is what the logs reported earlier today:

2019-04-08 11:46:15.610169 osd.504 osd.504 10.16.10.30:6804/8874 33 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x26a1241b != omap_digest 0x4c10ee76 from shard 504
2019-04-08 11:46:15.610190 osd.504 osd.504 10.16.10.30:6804/8874 34 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x26a1241b != omap_digest 0x4c10ee76 from shard 504

I then tried deep scrubbing it again to see whether the data was actually fine 
and the digest calculation was simply misbehaving.  It came back with the same 
inconsistency, just with new digest values:

2019-04-08 15:56:21.186291 osd.504 osd.504 10.16.10.30:6804/8874 49 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x93bac8f != omap_digest 0xab1b9c6f from shard 504
2019-04-08 15:56:21.186313 osd.504 osd.504 10.16.10.30:6804/8874 50 : cluster 
[ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 
0x93bac8f != omap_digest 0xab1b9c6f from shard 504
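
For reference, the re-scrub was just another deep scrub along the lines of:

# ceph pg deep-scrub 7.3

so the digests above were freshly recalculated from each replica's current omap 
contents rather than being stale values from the first scrub.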

Which makes sense, but doesn’t explain why the omap data is getting out of sync 
across multiple OSDs and clusters…

I’ll see what I can figure out tomorrow, but if anyone else has some hints I 
would love to hear them.

Thanks,
Bryan


Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Gregory Farnum
On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell  wrote:
>
> We have two separate RGW clusters running Luminous (12.2.8) that have started 
> seeing an increase in PGs going active+clean+inconsistent with the reason 
> being caused by an omap_digest mismatch.  Both clusters are using FileStore 
> and the inconsistent PGs are happening on the .rgw.buckets.index pool which 
> was moved from HDDs to SSDs within the last few months.
>
> We've been repairing them by first making sure the odd omap_digest is not the 
> primary by setting the primary-affinity to 0 if needed, doing the repair, and 
> then setting the primary-affinity back to 1.
>
> For example PG 7.3 went inconsistent earlier today:
>
> # rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | 
> .errors, .shards'
> [
>   "omap_digest_mismatch"
> ]
> [
>   {
> "osd": 504,
> "primary": true,
> "errors": [],
> "size": 0,
> "omap_digest": "0x4c10ee76",
> "data_digest": "0x"
>   },
>   {
> "osd": 525,
> "primary": false,
> "errors": [],
> "size": 0,
> "omap_digest": "0x26a1241b",
> "data_digest": "0x"
>   },
>   {
> "osd": 556,
> "primary": false,
> "errors": [],
> "size": 0,
> "omap_digest": "0x26a1241b",
> "data_digest": "0x"
>   }
> ]
>
> Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would 
> set the primary-affinity to 0 with:
>
> # ceph osd primary-affinity osd.504 0
>
> Do the repair:
>
> # ceph pg repair 7.3
>
> And then once the repair is complete we would set the primary-affinity back 
> to 1 on osd.504:
>
> # ceph osd primary-affinity osd.504 1
>
> There doesn't appear to be any correlation between the OSDs which would point 
> to a hardware issue, and since it's happening on two different clusters I'm 
> wondering if there's a race condition that has been fixed in a later version?
>
> Also, what exactly is the omap digest?  From what I can tell it appears to be 
> some kind of checksum for the omap data.  Is that correct?

Yeah; it's just a crc over the omap key-value data that's checked
during deep scrub. Same as the data digest.
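
If you want to eyeball the data behind that digest, the omap contents of the 
index object can be dumped from the command line, e.g. (pool and object name 
taken from Bryan's example; this reads via the primary and won't let you 
recompute the exact crc the OSD stores, but it does show the key/value data 
being checksummed):

# rados -p .rgw.buckets.index listomapvals .dir.default.22333615.1861352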

I've not noticed any issues around this in Luminous but I probably
wouldn't have, so will have to leave it up to others if there are
fixes in since 12.2.8.
-Greg


[ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell
We have two separate RGW clusters running Luminous (12.2.8) that have started 
seeing an increase in PGs going active+clean+inconsistent with the reason being 
caused by an omap_digest mismatch.  Both clusters are using FileStore and the 
inconsistent PGs are happening on the .rgw.buckets.index pool which was moved 
from HDDs to SSDs within the last few months.

We've been repairing them by first making sure the odd omap_digest is not the 
primary by setting the primary-affinity to 0 if needed, doing the repair, and 
then setting the primary-affinity back to 1.

For example PG 7.3 went inconsistent earlier today:

# rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | 
.errors, .shards'
[
  "omap_digest_mismatch"
]
[
  {
"osd": 504,
"primary": true,
"errors": [],
"size": 0,
"omap_digest": "0x4c10ee76",
"data_digest": "0x"
  },
  {
"osd": 525,
"primary": false,
"errors": [],
"size": 0,
"omap_digest": "0x26a1241b",
"data_digest": "0x"
  },
  {
"osd": 556,
"primary": false,
"errors": [],
"size": 0,
"omap_digest": "0x26a1241b",
"data_digest": "0x"
  }
]

Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would 
set the primary-affinity to 0 with:

# ceph osd primary-affinity osd.504 0

Do the repair:

# ceph pg repair 7.3

And then once the repair is complete we would set the primary-affinity back to 
1 on osd.504:

# ceph osd primary-affinity osd.504 1
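
Once the repair completes the PG should go back to active+clean, and re-running 
the earlier check should show no inconsistent objects:

# rados list-inconsistent-obj 7.3 -f json-pretty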

There doesn't appear to be any correlation between the OSDs which would point 
to a hardware issue, and since it's happening on two different clusters I'm 
wondering if there's a race condition that has been fixed in a later version?

Also, what exactly is the omap digest?  From what I can tell it appears to be 
some kind of checksum for the omap data.  Is that correct?

Thanks,
Bryan
