Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-16 Thread Massimo Sgaravatto
And I confirm that a repair is not useful. As far as I can see it simply
"cleans" the error (without modifying the big object), but the error of
course reappears when the deep scrub runs again on that PG.
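
The cycle is easy to reproduce if you want to check it yourself (pg id taken
from Liam's output below; commands as I remember them on Nautilus, so
double-check on your release):

    ceph pg repair 9.20e                                      # "cleans" the error for a while
    ceph pg deep-scrub 9.20e                                  # the next deep scrub flags it again
    rados list-inconsistent-obj 9.20e --format=json-pretty    # size_too_large is back

So the only lasting fixes seem to be raising osd_max_object_size (as Liam
did) or rewriting the object in smaller pieces.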

Cheers, Massimo

On Thu, Jan 16, 2020 at 9:35 AM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> In my cluster I saw that the problematic objects have been uploaded by a
> specific application (onedata), which I think used to upload the files
> doing something like:
>
> rados --pool <pool-name> put <object-name> <file-name>
>
> Now (since Luminous?) the default maximum object size
> (osd_max_object_size) is 128 MB, but if I am not wrong it was 100 GB
> before.
> This would explain why I have such big objects around (which indeed have
> an old timestamp).
>
> Cheers, Massimo
>
> On Wed, Jan 15, 2020 at 7:06 PM Liam Monahan  wrote:
>
>> I just changed my max object size to 256MB and scrubbed and the errors
>> went away.  I’m not sure what can be done to reduce the size of these
>> objects, though, if it really is a problem.  Our cluster has dynamic bucket
>> index resharding turned on, but that sharding process shouldn’t help it if
>> non-index objects are what is over the limit.
>>
>> I don’t think a pg repair would do anything unless the config tunables
>> are adjusted.
>>
>> On Jan 15, 2020, at 10:56 AM, Massimo Sgaravatto <
>> massimo.sgarava...@gmail.com> wrote:
>>
>> I never changed the default value for that attribute
>>
>> I still don't understand why I have such big objects around
>>
>> I am also wondering what a pg repair would do in such a case
>>
>> On Wed, Jan 15, 2020 at 16:18 Liam Monahan  wrote:
>>
>>> Thanks for that link.
>>>
>>> Do you have a default osd max object size of 128M?  I’m thinking about
>>> doubling that limit to 256MB on our cluster.  Our largest object is only
>>> about 10% over that limit.
>>>
>>> On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto <
>>> massimo.sgarava...@gmail.com> wrote:
>>>
>>> I guess this is coming from:
>>>
>>> https://github.com/ceph/ceph/pull/30783
>>>
>>> introduced in Nautilus 14.2.5
>>>
>>> On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto <
>>> massimo.sgarava...@gmail.com> wrote:
>>>
 As I wrote here:


 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html

 I saw the same after an update from Luminous to Nautilus 14.2.6

 Cheers, Massimo

 On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan 
 wrote:

> Hi,
>
> I am getting one inconsistent object on our cluster with an
> inconsistency error that I haven’t seen before.  This started happening
> during a rolling upgrade of the cluster from 14.2.3 -> 14.2.6, but I am 
> not
> sure that’s related.
>
> I was hoping to know what the error means before trying a repair.
>
> [root@objmon04 ~]# ceph health detail
> HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1
> pg inconsistent
> OSDMAP_FLAGS noout flag(s) set
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 9.20e is active+clean+inconsistent, acting [509,674,659]
>
> rados list-inconsistent-obj 9.20e --format=json-pretty
> {
> "epoch": 759019,
> "inconsistents": [
> {
> "object": {
> "name":
> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
> "nspace": "",
> "locator": "",
> "snap": "head",
> "version": 692875
> },
> "errors": [
> "size_too_large"
> ],
> "union_shard_errors": [],
> "selected_object_info": {
> "oid": {
> "oid":
> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
> "key": "",
> "snapid": -2,
> "hash": 3321413134,
> "max": 0,
> "pool": 9,
> "namespace": ""
> },
> "version": "281183'692875",
> "prior_version": "281183'692874",
> "last_reqid": "client.34042469.0:206759091",
> "user_version": 692875,
> "size": 146097278,
> "mtime": "2017-07-03 12:43:35.569986",
> "local_mtime": "2017-07-03 12:43:35.571196",
> "lost": 0,
> "flags": [
> "dirty",
> "data_digest",
> "omap_digest"
> ],
> "truncate_seq": 0,
> "truncate_size": 0,
> "data_digest": "0xf19c8035",
> "omap_digest": "0x",
> "expected_object_size": 0,

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-16 Thread Massimo Sgaravatto
In my cluster I saw that the problematic objects have been uploaded by a
specific application (onedata), which I think used to upload the files
doing something like:

rados --pool <pool-name> put <object-name> <file-name>

Now (since Luminous?) the default maximum object size (osd_max_object_size)
is 128 MB, but if I am not wrong it was 100 GB before.
This would explain why I have such big objects around (which indeed have an
old timestamp).
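
Something like this should list the objects that are over the limit, if it
helps (just a sketch: the exact "rados stat" output format can differ
between releases, 134217728 is the 128 MB default expressed in bytes, and
listing a big pool can take a long time):

    pool=<pool-name>
    rados -p "$pool" ls | while read -r obj; do
        size=$(rados -p "$pool" stat "$obj" | awk '{print $NF}')
        [ "$size" -gt 134217728 ] && echo "$size  $obj"
    done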

Cheers, Massimo

On Wed, Jan 15, 2020 at 7:06 PM Liam Monahan  wrote:

> I just changed my max object size to 256MB and scrubbed and the errors
> went away.  I’m not sure what can be done to reduce the size of these
> objects, though, if it really is a problem.  Our cluster has dynamic bucket
> index resharding turned on, but that sharding process shouldn’t help it if
> non-index objects are what is over the limit.
>
> I don’t think a pg repair would do anything unless the config tunables are
> adjusted.
>
> On Jan 15, 2020, at 10:56 AM, Massimo Sgaravatto <
> massimo.sgarava...@gmail.com> wrote:
>
> I never changed the default value for that attribute
>
> I still don't understand why I have such big objects around
>
> I am also wondering what a pg repair would do in such a case
>
> On Wed, Jan 15, 2020 at 16:18 Liam Monahan  wrote:
>
>> Thanks for that link.
>>
>> Do you have a default osd max object size of 128M?  I’m thinking about
>> doubling that limit to 256MB on our cluster.  Our largest object is only
>> about 10% over that limit.
>>
>> On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto <
>> massimo.sgarava...@gmail.com> wrote:
>>
>> I guess this is coming from:
>>
>> https://github.com/ceph/ceph/pull/30783
>>
>> introduced in Nautilus 14.2.5
>>
>> On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto <
>> massimo.sgarava...@gmail.com> wrote:
>>
>>> As I wrote here:
>>>
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html
>>>
>>> I saw the same after an update from Luminous to Nautilus 14.2.6
>>>
>>> Cheers, Massimo
>>>
>>> On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan 
>>> wrote:
>>>
 Hi,

 I am getting one inconsistent object on our cluster with an
 inconsistency error that I haven’t seen before.  This started happening
 during a rolling upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not
 sure that’s related.

 I was hoping to know what the error means before trying a repair.

 [root@objmon04 ~]# ceph health detail
 HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1
 pg inconsistent
 OSDMAP_FLAGS noout flag(s) set
 OSD_SCRUB_ERRORS 1 scrub errors
 PG_DAMAGED Possible data damage: 1 pg inconsistent
 pg 9.20e is active+clean+inconsistent, acting [509,674,659]

 rados list-inconsistent-obj 9.20e --format=json-pretty
 {
 "epoch": 759019,
 "inconsistents": [
 {
 "object": {
 "name":
 "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
 "nspace": "",
 "locator": "",
 "snap": "head",
 "version": 692875
 },
 "errors": [
 "size_too_large"
 ],
 "union_shard_errors": [],
 "selected_object_info": {
 "oid": {
 "oid":
 "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
 "key": "",
 "snapid": -2,
 "hash": 3321413134,
 "max": 0,
 "pool": 9,
 "namespace": ""
 },
 "version": "281183'692875",
 "prior_version": "281183'692874",
 "last_reqid": "client.34042469.0:206759091",
 "user_version": 692875,
 "size": 146097278,
 "mtime": "2017-07-03 12:43:35.569986",
 "local_mtime": "2017-07-03 12:43:35.571196",
 "lost": 0,
 "flags": [
 "dirty",
 "data_digest",
 "omap_digest"
 ],
 "truncate_seq": 0,
 "truncate_size": 0,
 "data_digest": "0xf19c8035",
 "omap_digest": "0x",
 "expected_object_size": 0,
 "expected_write_size": 0,
 "alloc_hint_flags": 0,
 "manifest": {
 "type": 0
 },
 "watchers": {}
 },
 "shards": [
 {
 "osd": 509,
 "primary": true,
 "errors": [],
 "size": 146097278

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Liam Monahan
I just changed my max object size to 256MB and scrubbed and the errors went 
away.  I’m not sure what can be done to reduce the size of these objects, 
though, if it really is a problem.  Our cluster has dynamic bucket index 
resharding turned on, but that sharding process shouldn’t help it if non-index 
objects are what is over the limit.

I don’t think a pg repair would do anything unless the config tunables are 
adjusted.
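
In case it helps anyone else, the change amounts to something like this
(osd_max_object_size takes a value in bytes; syntax from memory on Nautilus,
so double-check it, and the same option can also go in ceph.conf under
[osd]):

    ceph config set osd osd_max_object_size 268435456     # 256 MB
    ceph pg deep-scrub 9.20e                               # then re-scrub the inconsistent PG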

> On Jan 15, 2020, at 10:56 AM, Massimo Sgaravatto 
>  wrote:
> 
> I never changed the default value for that attribute
> 
> I still don't understand why I have such big objects around
> 
> I am also wondering what a pg repair would do in such a case
> 
> On Wed, Jan 15, 2020 at 16:18 Liam Monahan  wrote:
> Thanks for that link.
> 
> Do you have a default osd max object size of 128M?  I’m thinking about 
> doubling that limit to 256MB on our cluster.  Our largest object is only 
> about 10% over that limit.
> 
>> On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto 
>> <massimo.sgarava...@gmail.com> wrote:
>> 
>> I guess this is coming from:
>> 
>> https://github.com/ceph/ceph/pull/30783 
>> 
>> 
>> introduced in Nautilus 14.2.5
>> 
>> On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto 
>> <massimo.sgarava...@gmail.com> wrote:
>> As I wrote here:
>> 
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html 
>> 
>> 
>> I saw the same after an update from Luminous to Nautilus 14.2.6
>> 
>> Cheers, Massimo
>> 
>> On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan wrote:
>> Hi,
>> 
>> I am getting one inconsistent object on our cluster with an inconsistency 
>> error that I haven’t seen before.  This started happening during a rolling 
>> upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s 
>> related.
>> 
>> I was hoping to know what the error means before trying a repair.
>> 
>> [root@objmon04 ~]# ceph health detail
>> HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg 
>> inconsistent
>> OSDMAP_FLAGS noout flag(s) set
>> OSD_SCRUB_ERRORS 1 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>> pg 9.20e is active+clean+inconsistent, acting [509,674,659]
>> 
>> rados list-inconsistent-obj 9.20e --format=json-pretty
>> {
>> "epoch": 759019,
>> "inconsistents": [
>> {
>> "object": {
>> "name": 
>> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
>> "nspace": "",
>> "locator": "",
>> "snap": "head",
>> "version": 692875
>> },
>> "errors": [
>> "size_too_large"
>> ],
>> "union_shard_errors": [],
>> "selected_object_info": {
>> "oid": {
>> "oid": 
>> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
>> "key": "",
>> "snapid": -2,
>> "hash": 3321413134,
>> "max": 0,
>> "pool": 9,
>> "namespace": ""
>> },
>> "version": "281183'692875",
>> "prior_version": "281183'692874",
>> "last_reqid": "client.34042469.0:206759091",
>> "user_version": 692875,
>> "size": 146097278,
>> "mtime": "2017-07-03 12:43:35.569986",
>> "local_mtime": "2017-07-03 12:43:35.571196",
>> "lost": 0,
>> "flags": [
>> "dirty",
>> "data_digest",
>> "omap_digest"
>> ],
>> "truncate_seq": 0,
>> "truncate_size": 0,
>> "data_digest": "0xf19c8035",
>> "omap_digest": "0x",
>> "expected_object_size": 0,
>> "expected_write_size": 0,
>> "alloc_hint_flags": 0,
>> "manifest": {
>> "type": 0
>> },
>> "watchers": {}
>> },
>> "shards": [
>> {
>> "osd": 509,
>> "primary": true,
>> "errors": [],
>> "size": 146097278
>> },
>> {
>> "osd": 659,
>> "primary": false,
>> "errors": [],
>> "size": 146097278
>> },
>> {
>> "osd": 674,
>> "primary": false,
>> "errors": [],
>> "size": 146097278
>> }
>> ]
>> }
>> ]
>> }

Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Massimo Sgaravatto
I never changed the default value for that attribute

I still don't understand why I have such big objects around

I am also wondering what a pg repair would do in such a case
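
One way to double-check what the acting OSDs are actually enforcing (in your
case the acting set is [509,674,659], so something like the following on the
corresponding hosts; admin socket syntax from memory):

    ceph daemon osd.509 config get osd_max_object_size
    ceph daemon osd.659 config get osd_max_object_size
    ceph daemon osd.674 config get osd_max_object_size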

On Wed, Jan 15, 2020 at 16:18 Liam Monahan  wrote:

> Thanks for that link.
>
> Do you have a default osd max object size of 128M?  I’m thinking about
> doubling that limit to 256MB on our cluster.  Our largest object is only
> about 10% over that limit.
>
> On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto <
> massimo.sgarava...@gmail.com> wrote:
>
> I guess this is coming from:
>
> https://github.com/ceph/ceph/pull/30783
>
> introduced in Nautilus 14.2.5
>
> On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto <
> massimo.sgarava...@gmail.com> wrote:
>
>> As I wrote here:
>>
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html
>>
>> I saw the same after an update from Luminous to Nautilus 14.2.6
>>
>> Cheers, Massimo
>>
>> On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan  wrote:
>>
>>> Hi,
>>>
>>> I am getting one inconsistent object on our cluster with an
>>> inconsistency error that I haven’t seen before.  This started happening
>>> during a rolling upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not
>>> sure that’s related.
>>>
>>> I was hoping to know what the error means before trying a repair.
>>>
>>> [root@objmon04 ~]# ceph health detail
>>> HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg
>>> inconsistent
>>> OSDMAP_FLAGS noout flag(s) set
>>> OSD_SCRUB_ERRORS 1 scrub errors
>>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>> pg 9.20e is active+clean+inconsistent, acting [509,674,659]
>>>
>>> rados list-inconsistent-obj 9.20e --format=json-pretty
>>> {
>>> "epoch": 759019,
>>> "inconsistents": [
>>> {
>>> "object": {
>>> "name":
>>> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
>>> "nspace": "",
>>> "locator": "",
>>> "snap": "head",
>>> "version": 692875
>>> },
>>> "errors": [
>>> "size_too_large"
>>> ],
>>> "union_shard_errors": [],
>>> "selected_object_info": {
>>> "oid": {
>>> "oid":
>>> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
>>> "key": "",
>>> "snapid": -2,
>>> "hash": 3321413134,
>>> "max": 0,
>>> "pool": 9,
>>> "namespace": ""
>>> },
>>> "version": "281183'692875",
>>> "prior_version": "281183'692874",
>>> "last_reqid": "client.34042469.0:206759091",
>>> "user_version": 692875,
>>> "size": 146097278,
>>> "mtime": "2017-07-03 12:43:35.569986",
>>> "local_mtime": "2017-07-03 12:43:35.571196",
>>> "lost": 0,
>>> "flags": [
>>> "dirty",
>>> "data_digest",
>>> "omap_digest"
>>> ],
>>> "truncate_seq": 0,
>>> "truncate_size": 0,
>>> "data_digest": "0xf19c8035",
>>> "omap_digest": "0x",
>>> "expected_object_size": 0,
>>> "expected_write_size": 0,
>>> "alloc_hint_flags": 0,
>>> "manifest": {
>>> "type": 0
>>> },
>>> "watchers": {}
>>> },
>>> "shards": [
>>> {
>>> "osd": 509,
>>> "primary": true,
>>> "errors": [],
>>> "size": 146097278
>>> },
>>> {
>>> "osd": 659,
>>> "primary": false,
>>> "errors": [],
>>> "size": 146097278
>>> },
>>> {
>>> "osd": 674,
>>> "primary": false,
>>> "errors": [],
>>> "size": 146097278
>>> }
>>> ]
>>> }
>>> ]
>>> }
>>>
>>> Thanks,
>>> Liam
>>> —
>>> Senior Developer
>>> Institute for Advanced Computer Studies
>>> University of Maryland
>>>
>>
>


Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Liam Monahan
Thanks for that link.

Do you have a default osd max object size of 128M?  I’m thinking about doubling 
that limit to 256MB on our cluster.  Our largest object is only about 10% over 
that limit.
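
For reference: the object in the scrub report is 146097278 bytes, i.e. about
139.3 MiB against the 134217728-byte (128 MiB) default, so roughly 9% over
the limit; 256 MiB would leave plenty of headroom.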

> On Jan 15, 2020, at 3:51 AM, Massimo Sgaravatto 
>  wrote:
> 
> I guess this is coming from:
> 
> https://github.com/ceph/ceph/pull/30783 
> 
> 
> introduced in Nautilus 14.2.5
> 
> On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto 
> <massimo.sgarava...@gmail.com> wrote:
> As I wrote here:
> 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html 
> 
> 
> I saw the same after an update from Luminous to Nautilus 14.2.6
> 
> Cheers, Massimo
> 
> On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan wrote:
> Hi,
> 
> I am getting one inconsistent object on our cluster with an inconsistency 
> error that I haven’t seen before.  This started happening during a rolling 
> upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s 
> related.
> 
> I was hoping to know what the error means before trying a repair.
> 
> [root@objmon04 ~]# ceph health detail
> HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg 
> inconsistent
> OSDMAP_FLAGS noout flag(s) set
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 9.20e is active+clean+inconsistent, acting [509,674,659]
> 
> rados list-inconsistent-obj 9.20e --format=json-pretty
> {
> "epoch": 759019,
> "inconsistents": [
> {
> "object": {
> "name": 
> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
> "nspace": "",
> "locator": "",
> "snap": "head",
> "version": 692875
> },
> "errors": [
> "size_too_large"
> ],
> "union_shard_errors": [],
> "selected_object_info": {
> "oid": {
> "oid": 
> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
> "key": "",
> "snapid": -2,
> "hash": 3321413134,
> "max": 0,
> "pool": 9,
> "namespace": ""
> },
> "version": "281183'692875",
> "prior_version": "281183'692874",
> "last_reqid": "client.34042469.0:206759091",
> "user_version": 692875,
> "size": 146097278,
> "mtime": "2017-07-03 12:43:35.569986",
> "local_mtime": "2017-07-03 12:43:35.571196",
> "lost": 0,
> "flags": [
> "dirty",
> "data_digest",
> "omap_digest"
> ],
> "truncate_seq": 0,
> "truncate_size": 0,
> "data_digest": "0xf19c8035",
> "omap_digest": "0x",
> "expected_object_size": 0,
> "expected_write_size": 0,
> "alloc_hint_flags": 0,
> "manifest": {
> "type": 0
> },
> "watchers": {}
> },
> "shards": [
> {
> "osd": 509,
> "primary": true,
> "errors": [],
> "size": 146097278
> },
> {
> "osd": 659,
> "primary": false,
> "errors": [],
> "size": 146097278
> },
> {
> "osd": 674,
> "primary": false,
> "errors": [],
> "size": 146097278
> }
> ]
> }
> ]
> }
> 
> Thanks,
> Liam
> —
> Senior Developer
> Institute for Advanced Computer Studies
> University of Maryland



Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-15 Thread Massimo Sgaravatto
I guess this is coming from:

https://github.com/ceph/ceph/pull/30783

introduced in Nautilus 14.2.5
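
If I read that PR correctly, deep scrub now simply compares each object's
recorded size against osd_max_object_size and reports size_too_large when it
is bigger. On a recent release you should be able to see the option's
description and default with something like (syntax from memory):

    ceph config help osd_max_object_size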

On Wed, Jan 15, 2020 at 8:10 AM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> As I wrote here:
>
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html
>
> I saw the same after an update from Luminous to Nautilus 14.2.6
>
> Cheers, Massimo
>
> On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan  wrote:
>
>> Hi,
>>
>> I am getting one inconsistent object on our cluster with an inconsistency
>> error that I haven’t seen before.  This started happening during a rolling
>> upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s
>> related.
>>
>> I was hoping to know what the error means before trying a repair.
>>
>> [root@objmon04 ~]# ceph health detail
>> HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg
>> inconsistent
>> OSDMAP_FLAGS noout flag(s) set
>> OSD_SCRUB_ERRORS 1 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>> pg 9.20e is active+clean+inconsistent, acting [509,674,659]
>>
>> rados list-inconsistent-obj 9.20e --format=json-pretty
>> {
>> "epoch": 759019,
>> "inconsistents": [
>> {
>> "object": {
>> "name":
>> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
>> "nspace": "",
>> "locator": "",
>> "snap": "head",
>> "version": 692875
>> },
>> "errors": [
>> "size_too_large"
>> ],
>> "union_shard_errors": [],
>> "selected_object_info": {
>> "oid": {
>> "oid":
>> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
>> "key": "",
>> "snapid": -2,
>> "hash": 3321413134,
>> "max": 0,
>> "pool": 9,
>> "namespace": ""
>> },
>> "version": "281183'692875",
>> "prior_version": "281183'692874",
>> "last_reqid": "client.34042469.0:206759091",
>> "user_version": 692875,
>> "size": 146097278,
>> "mtime": "2017-07-03 12:43:35.569986",
>> "local_mtime": "2017-07-03 12:43:35.571196",
>> "lost": 0,
>> "flags": [
>> "dirty",
>> "data_digest",
>> "omap_digest"
>> ],
>> "truncate_seq": 0,
>> "truncate_size": 0,
>> "data_digest": "0xf19c8035",
>> "omap_digest": "0x",
>> "expected_object_size": 0,
>> "expected_write_size": 0,
>> "alloc_hint_flags": 0,
>> "manifest": {
>> "type": 0
>> },
>> "watchers": {}
>> },
>> "shards": [
>> {
>> "osd": 509,
>> "primary": true,
>> "errors": [],
>> "size": 146097278
>> },
>> {
>> "osd": 659,
>> "primary": false,
>> "errors": [],
>> "size": 146097278
>> },
>> {
>> "osd": 674,
>> "primary": false,
>> "errors": [],
>> "size": 146097278
>> }
>> ]
>> }
>> ]
>> }
>>
>> Thanks,
>> Liam
>> —
>> Senior Developer
>> Institute for Advanced Computer Studies
>> University of Maryland
>>
>


Re: [ceph-users] PG inconsistent with error "size_too_large"

2020-01-14 Thread Massimo Sgaravatto
As I wrote here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/037909.html

I saw the same after an update from Luminous to Nautilus 14.2.6

Cheers, Massimo

On Tue, Jan 14, 2020 at 7:45 PM Liam Monahan  wrote:

> Hi,
>
> I am getting one inconsistent object on our cluster with an inconsistency
> error that I haven’t seen before.  This started happening during a rolling
> upgrade of the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s
> related.
>
> I was hoping to know what the error means before trying a repair.
>
> [root@objmon04 ~]# ceph health detail
> HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg
> inconsistent
> OSDMAP_FLAGS noout flag(s) set
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 9.20e is active+clean+inconsistent, acting [509,674,659]
>
> rados list-inconsistent-obj 9.20e --format=json-pretty
> {
> "epoch": 759019,
> "inconsistents": [
> {
> "object": {
> "name":
> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
> "nspace": "",
> "locator": "",
> "snap": "head",
> "version": 692875
> },
> "errors": [
> "size_too_large"
> ],
> "union_shard_errors": [],
> "selected_object_info": {
> "oid": {
> "oid":
> "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
> "key": "",
> "snapid": -2,
> "hash": 3321413134,
> "max": 0,
> "pool": 9,
> "namespace": ""
> },
> "version": "281183'692875",
> "prior_version": "281183'692874",
> "last_reqid": "client.34042469.0:206759091",
> "user_version": 692875,
> "size": 146097278,
> "mtime": "2017-07-03 12:43:35.569986",
> "local_mtime": "2017-07-03 12:43:35.571196",
> "lost": 0,
> "flags": [
> "dirty",
> "data_digest",
> "omap_digest"
> ],
> "truncate_seq": 0,
> "truncate_size": 0,
> "data_digest": "0xf19c8035",
> "omap_digest": "0x",
> "expected_object_size": 0,
> "expected_write_size": 0,
> "alloc_hint_flags": 0,
> "manifest": {
> "type": 0
> },
> "watchers": {}
> },
> "shards": [
> {
> "osd": 509,
> "primary": true,
> "errors": [],
> "size": 146097278
> },
> {
> "osd": 659,
> "primary": false,
> "errors": [],
> "size": 146097278
> },
> {
> "osd": 674,
> "primary": false,
> "errors": [],
> "size": 146097278
> }
> ]
> }
> ]
> }
>
> Thanks,
> Liam
> —
> Senior Developer
> Institute for Advanced Computer Studies
> University of Maryland
>


[ceph-users] PG inconsistent with error "size_too_large"

2020-01-14 Thread Liam Monahan
Hi,

I am getting one inconsistent object on our cluster with an inconsistency error 
that I haven’t seen before.  This started happening during a rolling upgrade of 
the cluster from 14.2.3 -> 14.2.6, but I am not sure that’s related.

I was hoping to know what the error means before trying a repair.

[root@objmon04 ~]# ceph health detail
HEALTH_ERR noout flag(s) set; 1 scrub errors; Possible data damage: 1 pg 
inconsistent
OSDMAP_FLAGS noout flag(s) set
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 9.20e is active+clean+inconsistent, acting [509,674,659]

rados list-inconsistent-obj 9.20e --format=json-pretty
{
    "epoch": 759019,
    "inconsistents": [
        {
            "object": {
                "name": "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 692875
            },
            "errors": [
                "size_too_large"
            ],
            "union_shard_errors": [],
            "selected_object_info": {
                "oid": {
                    "oid": "2017-07-03-12-8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.31293422.4-activedns-diff",
                    "key": "",
                    "snapid": -2,
                    "hash": 3321413134,
                    "max": 0,
                    "pool": 9,
                    "namespace": ""
                },
                "version": "281183'692875",
                "prior_version": "281183'692874",
                "last_reqid": "client.34042469.0:206759091",
                "user_version": 692875,
                "size": 146097278,
                "mtime": "2017-07-03 12:43:35.569986",
                "local_mtime": "2017-07-03 12:43:35.571196",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest",
                    "omap_digest"
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xf19c8035",
                "omap_digest": "0x",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 509,
                    "primary": true,
                    "errors": [],
                    "size": 146097278
                },
                {
                    "osd": 659,
                    "primary": false,
                    "errors": [],
                    "size": 146097278
                },
                {
                    "osd": 674,
                    "primary": false,
                    "errors": [],
                    "size": 146097278
                }
            ]
        }
    ]
}

Thanks,
Liam
—
Senior Developer
Institute for Advanced Computer Studies
University of Maryland
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com