Re: [ceph-users] cephfs deep scrub error:

2017-03-13 Thread Gregory Farnum
On Mon, Mar 13, 2017 at 3:28 AM, Dan van der Ster  wrote:
> Hi John,
>
> Last week we updated our prod CephFS cluster to 10.2.6 (clients and
> server side), and for the first time today we've got an object info
> size mismatch:
>
> I found this ticket you created in the tracker, which is why I've
> emailed you: http://tracker.ceph.com/issues/18240
>
> Here's the detail of our error:
>
> 2017-03-13 07:17:49.989297 osd.67 :6819/3441125 262 : cluster
> [ERR] deep-scrub 1.3da 1:5bc0e9dc:::1260f4b.0003:head on disk
> size (4187974) does not match object info size (4193094) adjusted for
> ondisk to (4193094)
>
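
For cross-checking a report like this on jewel, something along these lines
usually works (the pool name cephfs_data and the FileStore path are
assumptions here; adjust for the actual data pool and OSD layout):

# rados list-inconsistent-obj 1.3da --format=json-pretty
# ceph osd map cephfs_data 1260f4b.0003
# find /var/lib/ceph/osd/ceph-67/current/1.3da_head/ -name '1260f4b.0003*'

The first command prints what the last deep scrub recorded for the PG, the
second shows which OSDs hold the object, and the third locates the replica's
file under a FileStore OSD's data directory.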
> All three replicas have the same object size/md5sum:
>
> # ls -l 1260f4b.0003__head_3B9703DA__1
> -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50
> 1260f4b.0003__head_3B9703DA__1
> # md5sum 1260f4b.0003__head_3B9703DA__1
> db1e1bab199b33fce3ad9195832626ef 1260f4b.0003__head_3B9703DA__1
>
> And indeed the object info does not agree with the files on disk:
>
> # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
> {
>     "oid": {
>         "oid": "1260f4b.0003",
>         "key": "",
>         "snapid": -2,
>         "hash": 999752666,
>         "max": 0,
>         "pool": 1,
>         "namespace": ""
>     },
>     "version": "5262'221037",
>     "prior_version": "5262'221031",
>     "last_reqid": "osd.67.0:1180241",
>     "user_version": 221031,

The only string I can see to tug on here is that the version and
user_version have clearly diverged, which (most?) commonly means some
kind of recovery or caching op was performed. You're not using cache
tiers, are you? (A quick check is sketched below, after the quoted
object info.) Is it possible somebody ran recovery and overwrote good
objects with the bad one before you got to look at the raw files?
-Greg

> "size": 4193094,
> "mtime": "0.00",
> "local_mtime": "0.00",
> "lost": 0,
> "flags": 52,
> "snaps": [],
> "truncate_seq": 80,
> "truncate_size": 0,
> "data_digest": 2779145704,
> "omap_digest": 4294967295,
> "watchers": {}
> }
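
For reference, the /tmp/attr1 fed to ceph-dencoder above was presumably
dumped from the object's "_" xattr on one of the replicas; a minimal sketch,
assuming FileStore and that the object info fits in a single xattr:

# attr -q -g "ceph._" 1260f4b.0003__head_3B9703DA__1 > /tmp/attr1
# ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json

(getfattr -n user.ceph._ --only-values gives the same bytes; very large
attributes can spill over into ceph._@1 and beyond, in which case the pieces
have to be concatenated before decoding.)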
>
>
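On the cache-tier question, a quick way to check whether any tiering is
configured at all (hedged; the exact output format varies by release):

# ceph osd pool ls detail | grep -i tier

No output there means no pool has a tier attached, which would leave
recovery/backfill as the more likely source of the version divergence.
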
> PG repair doesn't handle this kind of corruption, but I found a
> recipe in an old thread to fix the object info with hexedit. Before
> doing this I wanted to see if we could understand exactly how this is
> possible.
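
As an aside, the same edit can in principle be made without raw hexedit by
using ceph-objectstore-tool against the stopped OSD. This is a rough,
untested sketch; the object spec and option names should be checked against
ceph-objectstore-tool --help for the running release before relying on it:

# systemctl stop ceph-osd@67
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-67 \
    --journal-path /var/lib/ceph/osd/ceph-67/journal \
    --pgid 1.3da '1260f4b.0003' get-attr _ > /tmp/oi
  (edit the size field in /tmp/oi, then write it back)
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-67 \
    --journal-path /var/lib/ceph/osd/ceph-67/journal \
    --pgid 1.3da '1260f4b.0003' set-attr _ /tmp/oi

The same would have to be repeated on each replica, followed by restarting
the OSDs and another deep scrub of the PG.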
>
> I managed to find the exact cephfs file, and asked the user how they
> created it. They said the file was the output of a make test on some
> program. The make test was taking a while, so they left their laptop,
> and when they returned to the computer, the ssh connection to their
> cephfs workstation had broken. I assume this means that the process
> writing the file had been killed while writing to cephfs. But I don't
> understand how a killed client process could result in inconsistent
> object info.
>
> Is there anything else needed to help debug this inconsistency?
>
> Cheers, Dan


Re: [ceph-users] cephfs deep scrub error:

2017-03-13 Thread Dan van der Ster
On Mon, Mar 13, 2017 at 1:35 PM, John Spray  wrote:
> On Mon, Mar 13, 2017 at 10:28 AM, Dan van der Ster  
> wrote:
>> Hi John,
>>
>> Last week we updated our prod CephFS cluster to 10.2.6 (clients and
>> server side), and for the first time today we've got an object info
>> size mismatch:
>>
>> I found this ticket you created in the tracker, which is why I've
>> emailed you: http://tracker.ceph.com/issues/18240
>>
>> Here's the detail of our error:
>>
>> 2017-03-13 07:17:49.989297 osd.67 :6819/3441125 262 : cluster
>> [ERR] deep-scrub 1.3da 1:5bc0e9dc:::1260f4b.0003:head on disk
>> size (4187974) does not match object info size (4193094) adjusted for
>> ondisk to (4193094)
>
>
> Hmm, I never investigated that particular ticket, but I did notice that
> the issue never recurred on master, or on jewel since the start of
> 2017.
>
> The last revision where it cropped up in testing was
> 5b402f8a7b5a763852e93cd0a5decd34572f4518; looking at the history
> between that commit and the tip of the jewel branch, I don't see
> anything that I recognise as a corruption fix (apart from the omap fix,
> which is clearly unrelated, as we're looking at data objects in these
> failures).
>
> John

Thanks for checking, John. In the end we deleted the inconsistent
file, rescrubbed, and will just keep a lookout for similar cases.

-- Dan
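
For the record, the re-scrub and verification amount to something like the
following (PG id as in the original report; the health output wording varies
a bit between releases):

# ceph pg deep-scrub 1.3da
# ceph health detail | grep 1.3da
# rados list-inconsistent-obj 1.3da --format=json-pretty

Once the deep scrub completes without errors the PG leaves the inconsistent
state and the corresponding HEALTH_ERR clears.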


>
>
>> All three replicas have the same object size/md5sum:
>>
>> # ls -l 1260f4b.0003__head_3B9703DA__1
>> -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50
>> 1260f4b.0003__head_3B9703DA__1
>> # md5sum 1260f4b.0003__head_3B9703DA__1
>> db1e1bab199b33fce3ad9195832626ef 1260f4b.0003__head_3B9703DA__1
>>
>> And indeed the object info does not agree with the files on disk:
>>
>> # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
>> {
>>     "oid": {
>>         "oid": "1260f4b.0003",
>>         "key": "",
>>         "snapid": -2,
>>         "hash": 999752666,
>>         "max": 0,
>>         "pool": 1,
>>         "namespace": ""
>>     },
>>     "version": "5262'221037",
>>     "prior_version": "5262'221031",
>>     "last_reqid": "osd.67.0:1180241",
>>     "user_version": 221031,
>>     "size": 4193094,
>>     "mtime": "0.00",
>>     "local_mtime": "0.00",
>>     "lost": 0,
>>     "flags": 52,
>>     "snaps": [],
>>     "truncate_seq": 80,
>>     "truncate_size": 0,
>>     "data_digest": 2779145704,
>>     "omap_digest": 4294967295,
>>     "watchers": {}
>> }
>>
>>
>> PG repair doesn't handle this kind of corruption, but I found a
>> recipe in an old thread to fix the object info with hexedit. Before
>> doing this I wanted to see if we could understand exactly how this is
>> possible.
>>
>> I managed to find the exact cephfs file, and asked the user how they
>> created it. They said the file was the output of a make test on some
>> program. The make test was taking a while, so they left their laptop,
>> and when they returned to the computer, the ssh connection to their
>> cephfs workstation had broken. I assume this means that the process
>> writing the file had been killed while writing to cephfs. But I don't
>> understand how a killed client process could result in inconsistent
>> object info.
>>
>> Is there anything else needed to help debug this inconsistency?
>>
>> Cheers, Dan


Re: [ceph-users] cephfs deep scrub error:

2017-03-13 Thread John Spray
On Mon, Mar 13, 2017 at 10:28 AM, Dan van der Ster  wrote:
> Hi John,
>
> Last week we updated our prod CephFS cluster to 10.2.6 (clients and
> server side), and for the first time today we've got an object info
> size mismatch:
>
> I found this ticket you created in the tracker, which is why I've
> emailed you: http://tracker.ceph.com/issues/18240
>
> Here's the detail of our error:
>
> 2017-03-13 07:17:49.989297 osd.67 :6819/3441125 262 : cluster
> [ERR] deep-scrub 1.3da 1:5bc0e9dc:::1260f4b.0003:head on disk
> size (4187974) does not match object info size (4193094) adjusted for
> ondisk to (4193094)


Hmm, I never investigated that particular ticket, but I did notice that
the issue never recurred on master, or on jewel since the start of
2017.

The last revision where it cropped up in testing was
5b402f8a7b5a763852e93cd0a5decd34572f4518; looking at the history
between that commit and the tip of the jewel branch, I don't see
anything that I recognise as a corruption fix (apart from the omap fix,
which is clearly unrelated, as we're looking at data objects in these
failures).

John
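
(The history check described above boils down to something like the
following in a ceph.git checkout; the path filters are just a guess at where
a relevant fix would live:)

# git log --oneline 5b402f8a7b5a763852e93cd0a5decd34572f4518..origin/jewel -- src/osd src/os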


> All three replicas have the same object size/md5sum:
>
> # ls -l 1260f4b.0003__head_3B9703DA__1
> -rw-r--r--. 1 ceph ceph 4187974 Mar 12 18:50
> 1260f4b.0003__head_3B9703DA__1
> # md5sum 1260f4b.0003__head_3B9703DA__1
> db1e1bab199b33fce3ad9195832626ef 1260f4b.0003__head_3B9703DA__1
>
> And indeed the object info does not agree with the files on disk:
>
> # ceph-dencoder type object_info_t import /tmp/attr1 decode dump_json
> {
>     "oid": {
>         "oid": "1260f4b.0003",
>         "key": "",
>         "snapid": -2,
>         "hash": 999752666,
>         "max": 0,
>         "pool": 1,
>         "namespace": ""
>     },
>     "version": "5262'221037",
>     "prior_version": "5262'221031",
>     "last_reqid": "osd.67.0:1180241",
>     "user_version": 221031,
>     "size": 4193094,
>     "mtime": "0.00",
>     "local_mtime": "0.00",
>     "lost": 0,
>     "flags": 52,
>     "snaps": [],
>     "truncate_seq": 80,
>     "truncate_size": 0,
>     "data_digest": 2779145704,
>     "omap_digest": 4294967295,
>     "watchers": {}
> }
>
>
> PG repair doesn't handle this kind of corruption, but I found a
> recipe in an old thread to fix the object info with hexedit. Before
> doing this I wanted to see if we could understand exactly how this is
> possible.
>
> I managed to find the exact cephfs file, and asked the user how they
> created it. They said the file was the output of a make test on some
> program. The make test was taking a while, so they left their laptop,
> and when they returned to the computer, the ssh connection to their
> cephfs workstation had broken. I assume this means that the process
> writing the file had been killed while writing to cephfs. But I don't
> understand how a killed client process could result in inconsistent
> object info.
>
> Is there anything else needed to help debug this inconsistency?
>
> Cheers, Dan