Re: [ceph-users] jewel ceph has PG mapped always to the same OSD's

2018-04-07 Thread Konstantin Danilov
Deep scrub doesn't help.
After some steps (we are not sure of the exact sequence)
ceph does remap this PG to other OSDs, but the PG doesn't actually move:
# ceph pg map 11.206
osdmap e176314 pg 11.206 (11.206) -> up [955,198,801] acting [787,697]

It hangs in this state forever, and 'ceph pg 11.206 query' hangs as well.
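For reference, this is roughly what we check while it hangs (the daemon
commands have to be run on the node hosting the acting primary, osd.787 in
the output above):

# ceph pg map 11.206
# ceph health detail | grep 11.206
# ceph daemon osd.787 dump_ops_in_flight
# ceph daemon osd.787 dump_historic_ops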

On Sat, Apr 7, 2018 at 12:42 AM, Konstantin Danilov
 wrote:
> David,
>
>> What happens when you deep-scrub this PG?
> we haven't tried to deep-scrub it yet, we will try.
>
>> What do the OSD logs show for any lines involving the problem PGs?
> Nothing special was logged about this particular OSD, except that it's
> degraded.
> However, the OSD spends quite a large portion of its CPU time in the
> snappy/leveldb/jemalloc libraries.
> The logs contain a lot of messages from leveldb about moving data between
> levels.
> Needless to mention, this PG is from the RGW bucket index pool, so it is
> metadata only and gets a relatively high load. Yet now we have 3 PGs with
> the same behavior from the RGW data pool (the cluster holds almost all of
> its data in RGW).
>
>> Was anything happening on your cluster just before this started happening
>> at first?
> The cluster got many updates in the week before the issue, but nothing
> particularly noticeable: some SSD OSDs were split in two, about 10% of the
> OSDs were removed, and some networking issues appeared.
>
> Thanks
>
> On Fri, Apr 6, 2018 at 10:07 PM, David Turner  wrote:
>>
>> What happens when you deep-scrub this PG?  What do the OSD logs show for
>> any lines involving the problem PGs?  Was anything happening on your cluster
>> just before this started happening at first?
>>
>> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov 
>> wrote:
>>>
>>> Hi all, we have a strange issue on one cluster.
>>>
>>> One PG is always mapped to a particular set of OSDs, say X, Y and Z, no
>>> matter how we change the crush map.
>>> The whole picture is as follows:
>>>
>>> * This is ceph 10.2.7; all monitors and OSDs run the same version
>>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>>> was active+clean for a long time and already holds some data. We couldn't
>>> identify the event that led it into this state; it probably happened after
>>> some OSD was removed from the cluster
>>> * This PG has all 3 required OSDs up and running, and all of them are
>>> online (pool_sz=3, min_pool_sz=2)
>>> * All requests to the PG get stuck forever; historic_ops shows they are
>>> waiting on "waiting_for_degraded_pg"
>>> * ceph pg query hangs forever
>>> * We can't copy data from another pool either - the copying process hangs
>>> and then fails with (34) Numerical result out of range
>>> * We tried restarting OSDs, nodes, and mons with no effect
>>> * Eventually we found that shutting down OSD Z (not the primary) does
>>> solve the issue, but only until ceph marks this OSD out. If we try to
>>> change the weight of this OSD or remove it from the cluster, the problem
>>> appears again. The cluster works only while OSD Z is down but not out
>>> and keeps its default weight
>>> * Then we found that no matter what we do with the crushmap,
>>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>>> OSDs - [X, Y] (in this osdmap Z is already down). We updated the crush
>>> map to remove the nodes holding OSDs X, Y and Z completely, compiled it,
>>> imported it back into the osdmap, ran osdmaptool, and always got the
>>> same result
>>> * After several node restarts and setting OSD Z down but not out, we now
>>> have 3 more PGs with the same behaviour, but 'pinned' to other OSDs
>>> * We ran osdmaptool from luminous ceph to check whether the upmap
>>> extension is somehow getting into this osdmap - it is not.
>>>
>>> So this is where we are now. Has anyone seen something like this? Any
>>> ideas are welcome. Thanks
>>>
>>>
>>> --
>>> Kostiantyn Danilov
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Kostiantyn Danilov aka koder.ua
> Principal software engineer, Mirantis
>
> skype:koder.ua
> http://koder-ua.blogspot.com/
> http://mirantis.com



-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel ceph has PG mapped always to the same OSD's

2018-04-06 Thread Konstantin Danilov
David,

> What happens when you deep-scrub this PG?
we haven't tried to deep-scrub it yet, we will try.
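Roughly, the plan is (11.206 is the problem PG mentioned elsewhere in this
thread, <id> is the acting primary):

# ceph pg deep-scrub 11.206
# tail -f /var/log/ceph/ceph-osd.<id>.log | grep 11.206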

> What do the OSD logs show for any lines involving the problem PGs?
Nothing special was logged about this particular OSD, except that it's
degraded.
However, the OSD spends quite a large portion of its CPU time in the
snappy/leveldb/jemalloc libraries.
The logs contain a lot of messages from leveldb about moving data between
levels.
Needless to mention, this PG is from the RGW bucket index pool, so it is
metadata only and gets a relatively high load. Yet now we have 3 PGs with
the same behavior from the RGW data pool (the cluster holds almost all of
its data in RGW).
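One way to see this, roughly (the pid and osd id are placeholders):

# perf top -p <ceph-osd-pid>
# grep -c -i leveldb /var/log/ceph/ceph-osd.<id>.log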

> Was anything happening on your cluster just before this started happening
> at first?
The cluster got many updates in the week before the issue, but nothing
particularly noticeable: some SSD OSDs were split in two, about 10% of the
OSDs were removed, and some networking issues appeared.
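For reference, the crushmap test loop from my first mail is roughly the
following sketch (paths and the pool id are placeholders; --pool just
restricts the dump and can be dropped if your osdmaptool doesn't accept it):

# ceph osd getmap -o osdmap.bin
# cp osdmap.bin osdmap.test
# osdmaptool osdmap.test --export-crush crush.bin
# crushtool -d crush.bin -o crush.txt
(edit crush.txt here, e.g. remove the hosts holding X, Y and Z)
# crushtool -c crush.txt -o crush.new
# osdmaptool osdmap.test --import-crush crush.new
# osdmaptool osdmap.test --test-map-pgs-dump --pool <pool-id>

No matter what we change in crush.txt, the last step keeps placing the
problem PG on [X, Y].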

Thanks

On Fri, Apr 6, 2018 at 10:07 PM, David Turner  wrote:

> What happens when you deep-scrub this PG?  What do the OSD logs show for
> any lines involving the problem PGs?  Was anything happening on your
> cluster just before this started happening at first?
>
> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov 
> wrote:
>
>> Hi all, we have a strange issue on one cluster.
>>
>> One PG is always mapped to a particular set of OSDs, say X, Y and Z, no
>> matter how we change the crush map.
>> The whole picture is as follows:
>>
>> * This is ceph 10.2.7; all monitors and OSDs run the same version
>> * One PG eventually got into the 'active+degraded+incomplete' state. It
>> was active+clean for a long time and already holds some data. We couldn't
>> identify the event that led it into this state; it probably happened after
>> some OSD was removed from the cluster
>> * This PG has all 3 required OSDs up and running, and all of them are
>> online (pool_sz=3, min_pool_sz=2)
>> * All requests to the PG get stuck forever; historic_ops shows they are
>> waiting on "waiting_for_degraded_pg"
>> * ceph pg query hangs forever
>> * We can't copy data from another pool either - the copying process hangs
>> and then fails with (34) Numerical result out of range
>> * We tried restarting OSDs, nodes, and mons with no effect
>> * Eventually we found that shutting down OSD Z (not the primary) does
>> solve the issue, but only until ceph marks this OSD out. If we try to
>> change the weight of this OSD or remove it from the cluster, the problem
>> appears again. The cluster works only while OSD Z is down but not out
>> and keeps its default weight
>> * Then we found that no matter what we do with the crushmap,
>> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
>> OSDs - [X, Y] (in this osdmap Z is already down). We updated the crush
>> map to remove the nodes holding OSDs X, Y and Z completely, compiled it,
>> imported it back into the osdmap, ran osdmaptool, and always got the
>> same result
>> * After several node restarts and setting OSD Z down but not out, we now
>> have 3 more PGs with the same behaviour, but 'pinned' to other OSDs
>> * We ran osdmaptool from luminous ceph to check whether the upmap
>> extension is somehow getting into this osdmap - it is not.
>>
>> So this is where we are now. Has anyone seen something like this? Any
>> ideas are welcome. Thanks
>>
>>
>> --
>> Kostiantyn Danilov
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>


-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel ceph has PG mapped always to the same OSD's

2018-04-06 Thread David Turner
What happens when you deep-scrub this PG?  What do the OSD logs show for
any lines involving the problem PGs?  Was anything happening on your
cluster just before this started happening at first?

On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov 
wrote:

> Hi all, we have a strange issue on one cluster.
>
> One PG is always mapped to a particular set of OSDs, say X, Y and Z, no
> matter how we change the crush map.
> The whole picture is as follows:
>
> * This is ceph 10.2.7; all monitors and OSDs run the same version
> * One PG eventually got into the 'active+degraded+incomplete' state. It
> was active+clean for a long time and already holds some data. We couldn't
> identify the event that led it into this state; it probably happened after
> some OSD was removed from the cluster
> * This PG has all 3 required OSDs up and running, and all of them are
> online (pool_sz=3, min_pool_sz=2)
> * All requests to the PG get stuck forever; historic_ops shows they are
> waiting on "waiting_for_degraded_pg"
> * ceph pg query hangs forever
> * We can't copy data from another pool either - the copying process hangs
> and then fails with (34) Numerical result out of range
> * We tried restarting OSDs, nodes, and mons with no effect
> * Eventually we found that shutting down OSD Z (not the primary) does
> solve the issue, but only until ceph marks this OSD out. If we try to
> change the weight of this OSD or remove it from the cluster, the problem
> appears again. The cluster works only while OSD Z is down but not out
> and keeps its default weight
> * Then we found that no matter what we do with the crushmap,
> osdmaptool --test-map-pgs-dump always puts this PG on the same set of
> OSDs - [X, Y] (in this osdmap Z is already down). We updated the crush
> map to remove the nodes holding OSDs X, Y and Z completely, compiled it,
> imported it back into the osdmap, ran osdmaptool, and always got the
> same result
> * After several node restarts and setting OSD Z down but not out, we now
> have 3 more PGs with the same behaviour, but 'pinned' to other OSDs
> * We ran osdmaptool from luminous ceph to check whether the upmap
> extension is somehow getting into this osdmap - it is not.
>
> So this is where we are now. Has anyone seen something like this? Any
> ideas are welcome. Thanks
>
>
> --
> Kostiantyn Danilov
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com