Re: [ceph-users] Snaptrim_error

2018-07-11 Thread Gregory Farnum
Ah sadly those logs don't look like they have enough debugging to be of
much use.

But from what I'm seeing here, I don't think this state should actually
hurt anything. It ought to go away the next time you delete a snapshot
(maybe only if it has data in those PGs? not sure) and otherwise be
ignorable. I've created a ticket at http://tracker.ceph.com/issues/24876
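
If you want to nudge it along and confirm the flag clears, something like the
following ought to do it (the pool/image/snapshot names are placeholders;
substitute whatever actually maps onto pool 11):

    # remove (or create and then remove) a snapshot so the PGs snaptrim again
    rbd snap rm <pool>/<image>@<snapshot>

    # then check whether the snaptrim_error flag has gone away
    ceph health detail
    ceph pg 11.9 query | grep '"state"'
    ceph pg 11.127 query | grep '"state"'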
-Greg

On Wed, Jul 11, 2018 at 9:00 AM Flash  wrote:

> I found only something like this:
> 2018-07-04 19:26:20.209791 7fc8c8ad2700 -1 log_channel(cluster) log [ERR]
> : trim_object: Can not trim
> 11:e4d50fa4:::rbd_data.4d427a238e1f29.00190c9b:6b4 repair needed
> (no obc)
>
> The full logs are in the attachment. The problem with pg 11.127 started at
> ~19:26. Later, 11.9 went into the error state too, but I don't know at what
> time exactly.
>
> On Wed, Jul 11, 2018 at 6:20 PM Gregory Farnum  wrote:
>
>> On Wed, Jul 11, 2018 at 8:07 AM Flash  wrote:
>>
>>> Hi there.
>>>
>>> Yesterday I caught that error:
>>> PG_DAMAGED Possible data damage: 2 pgs snaptrim_error
>>> pg 11.9 is active+clean+snaptrim_error, acting [196,167,32]
>>> pg 11.127 is active+clean+snaptrim_error, acting [184,138,1]
>>> Could it be because a scrub ran while the snapshots were being cleaned up?
>>>
>>
>> Hmm, the only way you can get the snaptrim_error state is if the PG gets
>> an error when it tries to trim a particular snapshotted object. And it
>> doesn't get cleared by scrubbing; only when it starts snaptrimming again.
>>
>> If you have any OSD logs of when this happened, that would be helpful.
>>
>> And, uh, Sage? Do you know what was supposed to happen here? It's a bit
>> odd as a PG state.
>> -Greg
>>
>>
>>>
>>> I tried restarting the OSDs, then I ran a deep-scrub and a repair, but that
>>> didn't solve the problem.
>>>
>>> In the documentation, the "Repairing PG inconsistencies" page is empty
>>> (http://docs.ceph.com/docs/mimic/rados/operations/pg-repair/),
>>> so I don't know what else I can do.
>>>
>>> Cluster info:
>>> version 12.2.5
>>> 25 OSD nodes
>>> 12 OSDs per node. Most of them still use filestore as the storage
>>> backend.
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Snaptrim_error

2018-07-11 Thread Gregory Farnum
On Wed, Jul 11, 2018 at 8:07 AM Flash  wrote:

> Hi there.
>
> Yesterday I caught that error:
> PG_DAMAGED Possible data damage: 2 pgs snaptrim_error
> pg 11.9 is active+clean+snaptrim_error, acting [196,167,32]
> pg 11.127 is active+clean+snaptrim_error, acting [184,138,1]
> Could it be because a scrub ran while the snapshots were being cleaned up?
>

Hmm, the only way you can get the snaptrim_error state is if the PG gets an
error when it tries to trim a particular snapshotted object. And it doesn't
get cleared by scrubbing; only when it starts snaptrimming again.
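
(If you want to double-check which PGs are carrying the flag, the cluster
health output shows it, e.g.:

    ceph health detail
    ceph pg dump pgs_brief 2>/dev/null | grep snaptrim_error

which in your case should just list 11.9 and 11.127.)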

If you have any OSD logs of when this happened, that would be helpful.
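
If it happens again, bumping the logging on the primary OSDs for those PGs
beforehand (196 for pg 11.9 and 184 for pg 11.127, going by the acting sets you
pasted) would give us a lot more to work with; the exact levels here are just a
suggestion:

    ceph tell osd.196 injectargs '--debug_osd 20 --debug_ms 1'
    ceph tell osd.184 injectargs '--debug_osd 20 --debug_ms 1'
    # ...wait for (or trigger) the snaptrim, save the OSD logs, then turn it back down:
    ceph tell osd.196 injectargs '--debug_osd 1/5 --debug_ms 0/5'
    ceph tell osd.184 injectargs '--debug_osd 1/5 --debug_ms 0/5'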

And, uh, Sage? Do you know what was supposed to happen here? It's a bit odd
as a PG state.
-Greg


>
> I tried restarting the OSDs, then I ran a deep-scrub and a repair, but that
> didn't solve the problem.
>
> In the documentation, the "Repairing PG inconsistencies" page is empty
> (http://docs.ceph.com/docs/mimic/rados/operations/pg-repair/),
> so I don't know what else I can do.
>
> Cluster info:
> version 12.2.5
> 25 OSD nodes
> 12 OSDs per node. Most of them still use filestore as the storage backend.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com