Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-19 Thread Stefan Priebe - Profihost AG
Hi,

we were able to solve these issues. We switched the bcache OSDs from the
ssd to the hdd device class in the ceph osd tree and lowered the maximum
concurrent recovery operations from 3 to 1.
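
For the archive, a minimal sketch of what such a change can look like on
Luminous - assuming the device-class mechanism and osd_recovery_max_active
are the knobs that were meant (osd.19 is only an example id):

  # reclassify a bcache-backed OSD from ssd to hdd in the CRUSH map
  ceph osd crush rm-device-class osd.19
  ceph osd crush set-device-class hdd osd.19

  # reduce concurrent recovery at runtime ...
  ceph tell osd.* injectargs '--osd-recovery-max-active 1'

  # ... and persist it in ceph.conf
  [osd]
  osd recovery max active = 1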

Thanks for your help!

Greets,
Stefan
Am 18.10.2018 um 15:42 schrieb David Turner:
> What are your OSD node stats?  CPU, RAM, quantity and size of OSD disks.
> You might need to modify some bluestore settings to speed up the time it
> takes to peer, or perhaps you are just underpowered for the number of
> OSD disks you're running, and your servers and OSD daemons are going
> as fast as they can.
> On Sat, Oct 13, 2018 at 4:08 PM Stefan Priebe - Profihost AG
> <s.pri...@profihost.ag> wrote:
> 
> and a 3rd one:
> 
>     health: HEALTH_WARN
>             1 MDSs report slow metadata IOs
>             1 MDSs report slow requests
> 
> 2018-10-13 21:44:08.150722 mds.cloud1-1473 [WRN] 7 slow requests, 1
> included below; oldest blocked for > 199.922552 secs
> 2018-10-13 21:44:08.150725 mds.cloud1-1473 [WRN] slow request 34.829662
> seconds old, received at 2018-10-13 21:43:33.321031:
> client_request(client.216121228:929114 lookup #0x1/.active.lock
> 2018-10-13 21:43:33.321594 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock, waiting
> 
> The relevant OSDs are bluestore again running at 100% I/O:
> 
> iostat shows:
> sdi              77,00     0,00  580,00   97,00 511032,00   972,00
> 1512,57    14,88   22,05   24,57    6,97   1,48 100,00
> 
> so it reads at 500 MB/s, which completely saturates the OSD. And it does
> so for > 10 minutes.
> 
> Greets,
> Stefan
> 
> Am 13.10.2018 um 21:29 schrieb Stefan Priebe - Profihost AG:
> >
> > osd.19 is a bluestore osd on a healthy 2TB SSD.
> >
> > Log of osd.19 is here:
> > https://pastebin.com/raw/6DWwhS0A
> >
> > Am 13.10.2018 um 21:20 schrieb Stefan Priebe - Profihost AG:
> >> Hi David,
> >>
> >> I think this should be the problem - from a new log from today:
> >>
> >> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4
> osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update:
> Reduced data
> >> availability: 3 pgs peering (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1
> osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed:
> Reduced data
> >> availability: 8 pgs inactive (PG_AVAILABILITY)
> >> 
> >> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update:
> Reduced data
> >> availability: 5 pgs inactive (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
> >> down, but it is still running
> >> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update:
> Reduced data
> >> availability: 3 pgs inactive (PG_AVAILABILITY)
> >> ...
> >>
> >> so there is a timeframe of > 90s where PGs are inactive and unavailable -
> >> this would at least explain the stalled I/O to me?
> >>
> >> Greets,
> >> Stefan
> >>
> >>
> >> Am 12.10.2018 um 15:59 schrieb David Turner:
> >>> The PGs per OSD does not change unless the OSDs are marked out.  You
> >>> have noout set, so that doesn't change at all during this test. 
> All of
> >>> your PGs peered quickly at the beginning and then were
> active+undersized
> >>> the rest of the time, you never had any blocked requests, and
> you always
> >>> had 100MB/s+ client IO.  I didn't see anything wrong with your
> cluster
> >>> to indicate that your clients had any problems whatsoever
> accessing data.
> >>>
> >>> Can you confirm that you saw the same problems while you were
> running
> >>> those commands?  The next thing would seem that possibly a
> client isn't
> >>> getting an updated OSD map to indicate that the host and its
> OSDs are
> >>> down and it's stuck trying to communicate with host7.  That would
> >>> indicate a potential problem with the client being unable to
> communicate
> >>> with the Mons maybe?  Have you completely ruled out any network
> problems
> >>> between all nodes and all of the IPs in the cluster?  What does your
> >>> client log show during these times?
> >>>
> >>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
> >>> <n.fahldi...@profihost.ag> wrote:
> >>>
> >>>     Hi, in our `ceph.conf` we have:
> >>>
> >>>       mon_max_pg_per_osd = 300
> >>>
> >>>     While the host is offline (9 OSDs down):
> >>>
> >>>       4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
> >>>
> >>>     If all OSDs are online:
> >>>
> >>>       4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
> 

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-19 Thread Konstantin Shalygin

For some time now we have been experiencing service outages in our Ceph
cluster whenever there is any change to the HEALTH status, e.g. swapping
storage devices, adding storage devices, rebooting Ceph hosts, during
backfills etc.

Just now I had a situation where several VMs hung after I rebooted one
Ceph host. We have 3 replicas for each PG, 3 mons, 3 mgrs, 3 MDSs and 71
OSDs spread over 9 hosts.

We use Ceph as a storage backend for our Proxmox VE (PVE) environment.
The outages are in the form of blocked virtual file systems of those
virtual machines running in our PVE cluster.

It feels similar to stuck and inactive PGs to me. Honestly, though, I'm
not really sure how to debug this problem or which log files to examine.

OS: Debian 9
Kernel: 4.12 based upon SLE15-SP1

# ceph version
ceph version 12.2.8-133-gded2f6836f
(ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)

Can someone guide me? I'm more than happy to provide more information
as needed.



What is your network?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-18 Thread David Turner
What are your OSD node stats?  CPU, RAM, quantity and size of OSD disks.
You might need to modify some bluestore settings to speed up the time it
takes to peer, or perhaps you are just underpowered for the number of OSD
disks you're running, and your servers and OSD daemons are going as
fast as they can.
On Sat, Oct 13, 2018 at 4:08 PM Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> and a 3rd one:
>
> health: HEALTH_WARN
> 1 MDSs report slow metadata IOs
> 1 MDSs report slow requests
>
> 2018-10-13 21:44:08.150722 mds.cloud1-1473 [WRN] 7 slow requests, 1
> included below; oldest blocked for > 199.922552 secs
> 2018-10-13 21:44:08.150725 mds.cloud1-1473 [WRN] slow request 34.829662
> seconds old, received at 2018-10-13 21:43:33.321031:
> client_request(client.216121228:929114 lookup #0x1/.active.lock
> 2018-10-13 21:43:33.321594 caller_uid=0, caller_gid=0{}) currently
> failed to rdlock, waiting
>
> The relevant OSDs are bluestore again running at 100% I/O:
>
> iostat shows:
> sdi              77,00     0,00  580,00   97,00 511032,00   972,00
> 1512,57    14,88   22,05   24,57    6,97   1,48 100,00
>
> so it reads at 500 MB/s, which completely saturates the OSD. And it does
> so for > 10 minutes.
>
> Greets,
> Stefan
>
> Am 13.10.2018 um 21:29 schrieb Stefan Priebe - Profihost AG:
> >
> > osd.19 is a bluestore osd on a healthy 2TB SSD.
> >
> > Log of osd.19 is here:
> > https://pastebin.com/raw/6DWwhS0A
> >
> > Am 13.10.2018 um 21:20 schrieb Stefan Priebe - Profihost AG:
> >> Hi David,
> >>
> >> I think this should be the problem - from a new log from today:
> >>
> >> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data
> >> availability: 3 pgs peering (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down
> >> (OSD_DOWN)
> >> ...
> >> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data
> >> availability: 8 pgs inactive (PG_AVAILABILITY)
> >> 
> >> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data
> >> availability: 5 pgs inactive (PG_AVAILABILITY)
> >> ...
> >> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
> >> down, but it is still running
> >> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data
> >> availability: 3 pgs inactive (PG_AVAILABILITY)
> >> ...
> >>
> >> so there is a timeframe of > 90s where PGs are inactive and unavailable -
> >> this would at least explain the stalled I/O to me?
> >>
> >> Greets,
> >> Stefan
> >>
> >>
> >> Am 12.10.2018 um 15:59 schrieb David Turner:
> >>> The PGs per OSD does not change unless the OSDs are marked out.  You
> >>> have noout set, so that doesn't change at all during this test.  All of
> >>> your PGs peered quickly at the beginning and then were
> active+undersized
> >>> the rest of the time, you never had any blocked requests, and you
> always
> >>> had 100MB/s+ client IO.  I didn't see anything wrong with your cluster
> >>> to indicate that your clients had any problems whatsoever accessing
> data.
> >>>
> >>> Can you confirm that you saw the same problems while you were running
> >>> those commands?  The next thing would seem that possibly a client isn't
> >>> getting an updated OSD map to indicate that the host and its OSDs are
> >>> down and it's stuck trying to communicate with host7.  That would
> >>> indicate a potential problem with the client being unable to
> communicate
> >>> with the Mons maybe?  Have you completely ruled out any network
> problems
> >>> between all nodes and all of the IPs in the cluster?  What does your
> >>> client log show during these times?
> >>>
> >>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
> >>> <n.fahldi...@profihost.ag> wrote:
> >>>
> >>> Hi, in our `ceph.conf` we have:
> >>>
> >>>   mon_max_pg_per_osd = 300
> >>>
> >>> While the host is offline (9 OSDs down):
> >>>
> >>>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
> >>>
> >>> If all OSDs are online:
> >>>
> >>>   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
> >>>
> >>> ... so this doesn't seem to be the issue.
> >>>
> >>> If I understood you right, that's what you've meant. If I got you
> wrong,
> >>> would you mind pointing to one of those threads you mentioned?
> >>>
> >>> Thanks :)
> >>>
> >>> Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> >>> > Hi,
> >>> >
> >>> >
> >>> > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> >>> >> I rebooted a Ceph host and logged `ceph status` & `ceph health
> >>> detail`
> >>> >> every 5 seconds. During this I encountered 'PG_AVAILABILITY
> >>> Reduced data
> >>> >> availability: pgs peering'. At the same time some VMs hung as
> >>> described
> >>> >> before.
> >>> >
> >>> > Just a wild guess... you have 71 

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-13 Thread Stefan Priebe - Profihost AG
and a 3rd one:

health: HEALTH_WARN
1 MDSs report slow metadata IOs
1 MDSs report slow requests

2018-10-13 21:44:08.150722 mds.cloud1-1473 [WRN] 7 slow requests, 1
included below; oldest blocked for > 199.922552 secs
2018-10-13 21:44:08.150725 mds.cloud1-1473 [WRN] slow request 34.829662
seconds old, received at 2018-10-13 21:43:33.321031:
client_request(client.216121228:929114 lookup #0x1/.active.lock
2018-10-13 21:43:33.321594 caller_uid=0, caller_gid=0{}) currently
failed to rdlock, waiting

The relevant OSDs are bluestore again running at 100% I/O:

iostat shows:
sdi              77,00     0,00  580,00   97,00 511032,00   972,00
1512,57    14,88   22,05   24,57    6,97   1,48 100,00

so it reads at 500 MB/s, which completely saturates the OSD. And it does
so for > 10 minutes.

Greets,
Stefan

Am 13.10.2018 um 21:29 schrieb Stefan Priebe - Profihost AG:
> 
> osd.19 is a bluestore osd on a healthy 2TB SSD.
> 
> Log of osd.19 is here:
> https://pastebin.com/raw/6DWwhS0A
> 
> Am 13.10.2018 um 21:20 schrieb Stefan Priebe - Profihost AG:
>> Hi David,
>>
>> I think this should be the problem - from a new log from today:
>>
>> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down
>> (OSD_DOWN)
>> ...
>> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data
>> availability: 3 pgs peering (PG_AVAILABILITY)
>> ...
>> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down
>> (OSD_DOWN)
>> ...
>> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data
>> availability: 8 pgs inactive (PG_AVAILABILITY)
>> 
>> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data
>> availability: 5 pgs inactive (PG_AVAILABILITY)
>> ...
>> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
>> down, but it is still running
>> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data
>> availability: 3 pgs inactive (PG_AVAILABILITY)
>> ...
>>
>> so there is a timeframe of > 90s where PGs are inactive and unavailable -
>> this would at least explain the stalled I/O to me?
>>
>> Greets,
>> Stefan
>>
>>
>> Am 12.10.2018 um 15:59 schrieb David Turner:
>>> The PGs per OSD does not change unless the OSDs are marked out.  You
>>> have noout set, so that doesn't change at all during this test.  All of
>>> your PGs peered quickly at the beginning and then were active+undersized
>>> the rest of the time, you never had any blocked requests, and you always
>>> had 100MB/s+ client IO.  I didn't see anything wrong with your cluster
>>> to indicate that your clients had any problems whatsoever accessing data.
>>>
>>> Can you confirm that you saw the same problems while you were running
>>> those commands?  The next thing would seem that possibly a client isn't
>>> getting an updated OSD map to indicate that the host and its OSDs are
>>> down and it's stuck trying to communicate with host7.  That would
>>> indicate a potential problem with the client being unable to communicate
>>> with the Mons maybe?  Have you completely ruled out any network problems
>>> between all nodes and all of the IPs in the cluster?  What does your
>>> client log show during these times?
>>>
>>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
>>> <n.fahldi...@profihost.ag> wrote:
>>>
>>> Hi, in our `ceph.conf` we have:
>>>
>>>   mon_max_pg_per_osd = 300
>>>
>>> While the host is offline (9 OSDs down):
>>>
>>>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
>>>
>>> If all OSDs are online:
>>>
>>>   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
>>>
>>> ... so this doesn't seem to be the issue.
>>>
>>> If I understood you right, that's what you've meant. If I got you wrong,
> >>> would you mind pointing to one of those threads you mentioned?
>>>
>>> Thanks :)
>>>
>>> Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
>>> > Hi,
>>> >
>>> >
>>> > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
>>> >> I rebooted a Ceph host and logged `ceph status` & `ceph health
>>> detail`
>>> >> every 5 seconds. During this I encountered 'PG_AVAILABILITY
>>> Reduced data
>>> >> availability: pgs peering'. At the same time some VMs hung as
>>> described
>>> >> before.
>>> >
>>> > Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
>>> > 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
>>> > circumstances.
>>> >
>>> > If one host is down and the PGs have to re-peer, you might reach the
>>> > limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
>>> >
>>> > You can try to raise this limit. There are several threads on the
>>> > mailing list about this.
>>> >
>>> > Regards,
>>> > Burkhard
>>> >
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com 

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-13 Thread Stefan Priebe - Profihost AG

osd.19 is a bluestore osd on a healthy 2TB SSD.

Log of osd.19 is here:
https://pastebin.com/raw/6DWwhS0A

Am 13.10.2018 um 21:20 schrieb Stefan Priebe - Profihost AG:
> Hi David,
> 
> I think this should be the problem - from a new log from today:
> 
> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down
> (OSD_DOWN)
> ...
> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data
> availability: 3 pgs peering (PG_AVAILABILITY)
> ...
> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down
> (OSD_DOWN)
> ...
> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data
> availability: 8 pgs inactive (PG_AVAILABILITY)
> 
> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data
> availability: 5 pgs inactive (PG_AVAILABILITY)
> ...
> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
> down, but it is still running
> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data
> availability: 3 pgs inactive (PG_AVAILABILITY)
> ...
> 
> so there is a timeframe of > 90s where PGs are inactive and unavailable -
> this would at least explain the stalled I/O to me?
> 
> Greets,
> Stefan
> 
> 
> Am 12.10.2018 um 15:59 schrieb David Turner:
>> The PGs per OSD does not change unless the OSDs are marked out.  You
>> have noout set, so that doesn't change at all during this test.  All of
>> your PGs peered quickly at the beginning and then were active+undersized
>> the rest of the time, you never had any blocked requests, and you always
>> had 100MB/s+ client IO.  I didn't see anything wrong with your cluster
>> to indicate that your clients had any problems whatsoever accessing data.
>>
>> Can you confirm that you saw the same problems while you were running
>> those commands?  The next thing would seem that possibly a client isn't
>> getting an updated OSD map to indicate that the host and its OSDs are
>> down and it's stuck trying to communicate with host7.  That would
>> indicate a potential problem with the client being unable to communicate
>> with the Mons maybe?  Have you completely ruled out any network problems
>> between all nodes and all of the IPs in the cluster?  What does your
>> client log show during these times?
>>
>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
>> <n.fahldi...@profihost.ag> wrote:
>>
>> Hi, in our `ceph.conf` we have:
>>
>>   mon_max_pg_per_osd = 300
>>
>> While the host is offline (9 OSDs down):
>>
>>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
>>
>> If all OSDs are online:
>>
>>   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
>>
>> ... so this doesn't seem to be the issue.
>>
>> If I understood you right, that's what you've meant. If I got you wrong,
>> would you mind pointing to one of those threads you mentioned?
>>
>> Thanks :)
>>
>> Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
>> > Hi,
>> >
>> >
>> > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
>> >> I rebooted a Ceph host and logged `ceph status` & `ceph health
>> detail`
>> >> every 5 seconds. During this I encountered 'PG_AVAILABILITY
>> Reduced data
>> >> availability: pgs peering'. At the same time some VMs hung as
>> described
>> >> before.
>> >
>> > Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
>> > 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
>> > circumstances.
>> >
>> > If one host is down and the PGs have to re-peer, you might reach the
>> > limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
>> >
>> > You can try to raise this limit. There are several threads on the
>> > mailing list about this.
>> >
>> > Regards,
>> > Burkhard
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-13 Thread Stefan Priebe - Profihost AG
Hi David,

I think this should be the problem - from a new log from today:

2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down
(OSD_DOWN)
...
2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data
availability: 3 pgs peering (PG_AVAILABILITY)
...
2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down
(OSD_DOWN)
...
2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data
availability: 8 pgs inactive (PG_AVAILABILITY)

2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data
availability: 5 pgs inactive (PG_AVAILABILITY)
...
2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19
down, but it is still running
2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data
availability: 3 pgs inactive (PG_AVAILABILITY)
...

so there is a timeframe of > 90s where PGs are inactive and unavailable -
this would at least explain the stalled I/O to me?

Greets,
Stefan


Am 12.10.2018 um 15:59 schrieb David Turner:
> The PGs per OSD does not change unless the OSDs are marked out.  You
> have noout set, so that doesn't change at all during this test.  All of
> your PGs peered quickly at the beginning and then were active+undersized
> the rest of the time, you never had any blocked requests, and you always
> had 100MB/s+ client IO.  I didn't see anything wrong with your cluster
> to indicate that your clients had any problems whatsoever accessing data.
> 
> Can you confirm that you saw the same problems while you were running
> those commands?  The next thing would seem that possibly a client isn't
> getting an updated OSD map to indicate that the host and its OSDs are
> down and it's stuck trying to communicate with host7.  That would
> indicate a potential problem with the client being unable to communicate
> with the Mons maybe?  Have you completely ruled out any network problems
> between all nodes and all of the IPs in the cluster?  What does your
> client log show during these times?
> 
> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
> <n.fahldi...@profihost.ag> wrote:
> 
> Hi, in our `ceph.conf` we have:
> 
>   mon_max_pg_per_osd = 300
> 
> While the host is offline (9 OSDs down):
> 
>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
> 
> If all OSDs are online:
> 
>   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
> 
> ... so this doesn't seem to be the issue.
> 
> If I understood you right, that's what you've meant. If I got you wrong,
> would you mind pointing to one of those threads you mentioned?
> 
> Thanks :)
> 
> Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> > Hi,
> >
> >
> > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> >> I rebooted a Ceph host and logged `ceph status` & `ceph health
> detail`
> >> every 5 seconds. During this I encountered 'PG_AVAILABILITY
> Reduced data
> >> availability: pgs peering'. At the same time some VMs hung as
> described
> >> before.
> >
> > Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
> > 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
> > circumstances.
> >
> > If one host is down and the PGs have to re-peer, you might reach the
> > limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
> >
> > You can try to raise this limit. There are several threads on the
> > mailing list about this.
> >
> > Regards,
> > Burkhard
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-12 Thread Paul Emmerich
PGs switching to the peering state after a failure is normal and
expected. The important thing is how long they stay in that state; it
shouldn't be longer than a few seconds. It looks like less than 5
seconds from your log.

What might help here is the ceph -w log (or mon cluster log file)
during an outage.
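
A simple way to capture both during the next host reboot (file paths are
only examples):

  # terminal 1: the cluster log as the mons see it
  ceph -w | tee /tmp/ceph-w-during-reboot.log
  # terminal 2: health transitions with timestamps
  while true; do date; ceph health detail; sleep 5; done | tee /tmp/ceph-health.log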

Also, get rid of that min_size = 1 setting; it will bite you in the long run.
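
Going back to the default is one command per pool; a sketch using the pool
names that appear elsewhere in this thread:

  ceph osd pool set cephstor1 min_size 2
  ceph osd pool set cephfs_cephstor1_data min_size 2
  ceph osd pool set cephfs_cephstor1_metadata min_size 2

With size=3/min_size=2 a PG keeps serving I/O with one replica down, but
stops accepting writes once only a single copy would remain - which is
exactly what min_size=1 allows.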

Paul
Am Fr., 12. Okt. 2018 um 23:27 Uhr schrieb Stefan Priebe - Profihost
AG :
>
> Hi David,
>
> Am 12.10.2018 um 15:59 schrieb David Turner:
> > The PGs per OSD does not change unless the OSDs are marked out.  You
> > have noout set, so that doesn't change at all during this test.  All of
> > your PGs peered quickly at the beginning and then were active+undersized
> > the rest of the time, you never had any blocked requests, and you always
> > had 100MB/s+ client IO.  I didn't see anything wrong with your cluster
> > to indicate that your clients had any problems whatsoever accessing data.
> >
> > Can you confirm that you saw the same problems while you were running
> > those commands?  The next thing would seem that possibly a client isn't
> > getting an updated OSD map to indicate that the host and its OSDs are
> > down and it's stuck trying to communicate with host7.  That would
> > indicate a potential problem with the client being unable to communicate
> > with the Mons maybe?
> Maybe, but what about this status:
> 'PG_AVAILABILITY Reduced data availability: pgs peering'?
>
> See the log here: https://pastebin.com/wxUKzhgB
>
> PG_AVAILABILITY is noted at timestamps [2018-10-12 12:16:15.403394] and
> [2018-10-12 12:17:40.072655].
>
> And why do the Ceph docs say:
>
> Data availability is reduced, meaning that the cluster is unable to
> service potential read or write requests for some data in the cluster.
> Specifically, one or more PGs is in a state that does not allow IO
> requests to be serviced. Problematic PG states include peering, stale,
> incomplete, and the lack of active (if those conditions do not clear
> quickly).
>
>
> Greets,
> Stefan
> >
> > On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
> > <n.fahldi...@profihost.ag> wrote:
> >
> > Hi, in our `ceph.conf` we have:
> >
> >   mon_max_pg_per_osd = 300
> >
> > While the host is offline (9 OSDs down):
> >
> >   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
> >
> > If all OSDs are online:
> >
> >   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
> >
> > ... so this doesn't seem to be the issue.
> >
> > If I understood you right, that's what you've meant. If I got you wrong,
> > would you mind pointing to one of those threads you mentioned?
> >
> > Thanks :)
> >
> > Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> > > Hi,
> > >
> > >
> > > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> > >> I rebooted a Ceph host and logged `ceph status` & `ceph health
> > detail`
> > >> every 5 seconds. During this I encountered 'PG_AVAILABILITY
> > Reduced data
> > >> availability: pgs peering'. At the same time some VMs hung as
> > described
> > >> before.
> > >
> > > Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
> > > 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
> > > circumstances.
> > >
> > > If one host is down and the PGs have to re-peer, you might reach the
> > > limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
> > >
> > > You can try to raise this limit. There are several threads on the
> > > mailing list about this.
> > >
> > > Regards,
> > > Burkhard
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-12 Thread David Turner
The PGs per OSD does not change unless the OSDs are marked out.  You have
noout set, so that doesn't change at all during this test.  All of your PGs
peered quickly at the beginning and then were active+undersized the rest of
the time, you never had any blocked requests, and you always had 100MB/s+
client IO.  I didn't see anything wrong with your cluster to indicate that
your clients had any problems whatsoever accessing data.

Can you confirm that you saw the same problems while you were running those
commands?  The next thing would seem that possibly a client isn't getting
an updated OSD map to indicate that the host and its OSDs are down and it's
stuck trying to communicate with host7.  That would indicate a potential
problem with the client being unable to communicate with the Mons maybe?
Have you completely ruled out any network problems between all nodes and
all of the IPs in the cluster?  What does your client log show during these
times?
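
A couple of generic checks for the client/network side (hostnames and the
admin-socket path below are only examples, and the socket requires "admin
socket" to be enabled in the client's ceph.conf):

  # current osdmap epoch according to the cluster
  ceph osd dump | head -n1

  # from a hypervisor: basic reachability of every Ceph node
  for h in ceph-host1 ceph-host2 ceph-host3; do
    ping -c1 -W1 "$h" >/dev/null && echo "$h ok" || echo "$h UNREACHABLE"
  done

  # in-flight or stuck requests as seen by one librbd client
  ceph daemon /var/run/ceph/ceph-client.admin.12345.140234.asok objecter_requests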

On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG <
n.fahldi...@profihost.ag> wrote:

> Hi, in our `ceph.conf` we have:
>
>   mon_max_pg_per_osd = 300
>
> While the host is offline (9 OSDs down):
>
>   4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD
>
> If all OSDs are online:
>
>   4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD
>
> ... so this doesn't seem to be the issue.
>
> If I understood you right, that's what you've meant. If I got you wrong,
> would you mind pointing to one of those threads you mentioned?
>
> Thanks :)
>
> Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> > Hi,
> >
> >
> > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
> >> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
> >> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
> >> availability: pgs peering'. At the same time some VMs hung as described
> >> before.
> >
> > Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
> > 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
> > circumstances.
> >
> > If one host is down and the PGs have to re-peer, you might reach the
> > limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
> >
> > You can try to raise this limit. There are several threads on the
> > mailing list about this.
> >
> > Regards,
> > Burkhard
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-12 Thread Nils Fahldieck - Profihost AG
Hi, in our `ceph.conf` we have:

  mon_max_pg_per_osd = 300

While the host is offline (9 OSDs down):

  4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD

If all OSDs are online:

  4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD

... so this doesn't seem to be the issue.
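
The estimate can also be cross-checked against what the cluster actually
reports per OSD:

  # the PGS column shows the real per-OSD placement group count;
  # compare the largest value against mon_max_pg_per_osd
  ceph osd df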

If I understood you right, that's what you've meant. If I got you wrong,
would you mind pointing to one of those threads you mentioned?

Thanks :)

Am 12.10.2018 um 14:03 schrieb Burkhard Linke:
> Hi,
> 
> 
> On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:
>> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
>> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
>> availability: pgs peering'. At the same time some VMs hung as described
>> before.
> 
> Just a wild guess... you have 71 OSDs and about 4500 PG with size=3.
> 13500 PG instance overall, resulting in ~190 PGs per OSD under normal
> circumstances.
> 
> If one host is down and the PGs have to re-peer, you might reach the
> limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.
> 
> You can try to raise this limit. There are several threads on the
> mailing list about this.
> 
> Regards,
> Burkhard
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-12 Thread Burkhard Linke

Hi,


On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:

I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
availability: pgs peering'. At the same time some VMs hung as described
before.


Just a wild guess... you have 71 OSDs and about 4500 PG with size=3. 
13500 PG instance overall, resulting in ~190 PGs per OSD under normal 
circumstances.


If one host is down and the PGs have to re-peer, you might reach the 
limit of 200 PG/OSDs on some of the OSDs, resulting in stuck peering.


You can try to raise this limit. There are several threads on the 
mailing list about this.
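
For reference, a sketch of how that limit can be raised (the values are
only examples; on Luminous the OSDs additionally enforce a hard cap of
mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio):

  # ceph.conf
  [global]
  mon max pg per osd = 300
  osd max pg per osd hard ratio = 3

  # or injected at runtime
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'
  ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 3'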


Regards,
Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-12 Thread Nils Fahldieck - Profihost AG
I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
availability: pgs peering'. At the same time some VMs hung as described
before.

See the log here: https://pastebin.com/wxUKzhgB

PG_AVAILABILITY is noted at timestamps [2018-10-12 12:16:15.403394] and
[2018-10-12 12:17:40.072655].

Ceph docs say regarding PG_AVAILABILITY:

Data availability is reduced, meaning that the cluster is unable to
service potential read or write requests for some data in the cluster.
Specifically, one or more PGs is in a state that does not allow IO
requests to be serviced. Problematic PG states include peering, stale,
incomplete, and the lack of active (if those conditions do not clear
quickly).

Do you know why those PGs are stuck peering and how I might troubleshoot
this any further?
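
For anyone else debugging this, the usual starting points for a PG that
stays in peering look roughly like this (the PG id is only an example):

  # list PGs currently stuck in a non-active state
  ceph pg dump_stuck inactive
  # query one of the reported PGs for its peering state and history
  ceph pg 5.1fc query
  # show OSDs that are blocking peering, if any
  ceph osd blocked-by
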
Am 11.10.2018 um 22:27 schrieb David Turner:
> You should definitely stop using `size 3 min_size 1` on your pools.  Go
> back to the default `min_size 2`.  I'm a little confused why you have 3
> different CRUSH rules.  They're all identical.  You only need different
> CRUSH rules if you're using Erasure Coding or targeting a different set
> of OSDs like SSD vs HDD OSDs for different pools.
> 
> All of that said, I don't see anything in those rules that would
> indicate why you're having problems with accessing your data when a node
> is being restarted.  The `ceph status` and `ceph health detail` outputs
> will be helpful while it's happening.
> 
> On Thu, Oct 11, 2018 at 3:02 PM Nils Fahldieck - Profihost AG
> <n.fahldi...@profihost.ag> wrote:
> 
> Thanks for your reply. I'll capture a `ceph status` the next time I
> encounter a non-working RBD. Here's the other output you asked for:
> 
> $ ceph osd crush rule dump
> [
>     {
>         "rule_id": 0,
>         "rule_name": "data",
>         "ruleset": 0,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "root"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     },
>     {
>         "rule_id": 1,
>         "rule_name": "metadata",
>         "ruleset": 1,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "root"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     },
>     {
>         "rule_id": 2,
>         "rule_name": "rbd",
>         "ruleset": 2,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "root"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     }
> ]
> 
> $ ceph osd pool ls detail
> pool 5 'cephstor1' replicated size 3 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 4096 pgp_num 4096 last_change 1217074 flags hashpspool
> min_read_recency_for_promote 1 min_write_recency_for_promote 1
> stripe_width 0 application rbd
>         removed_snaps
> 
> 

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-11 Thread David Turner
You should definitely stop using `size 3 min_size 1` on your pools.  Go
back to the default `min_size 2`.  I'm a little confused why you have 3
different CRUSH rules.  They're all identical.  You only need different
CRUSH rules if you're using Erasure Coding or targeting a different set of
OSDs like SSD vs HDD OSDs for different pools.
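
For contrast with the three identical rules, a sketch of what a
device-class-targeted replicated rule looks like on Luminous (rule name,
root and class are examples):

  # replicated rule that only picks OSDs with device class "ssd"
  ceph osd crush rule create-replicated replicated-ssd default host ssd
  # point a pool at it
  ceph osd pool set <poolname> crush_rule replicated-ssd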

All of that said, I don't see anything in those rules that would indicate
why you're having problems with accessing your data when a node is being
restarted.  The `ceph status` and `ceph health detail` outputs will be
helpful while it's happening.

On Thu, Oct 11, 2018 at 3:02 PM Nils Fahldieck - Profihost AG <
n.fahldi...@profihost.ag> wrote:

> Thanks for your reply. I'll capture a `ceph status` the next time I
> encounter a non-working RBD. Here's the other output you asked for:
>
> $ ceph osd crush rule dump
> [
>     {
>         "rule_id": 0,
>         "rule_name": "data",
>         "ruleset": 0,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "root"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     },
>     {
>         "rule_id": 1,
>         "rule_name": "metadata",
>         "ruleset": 1,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "root"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     },
>     {
>         "rule_id": 2,
>         "rule_name": "rbd",
>         "ruleset": 2,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "root"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             },
>             {
>                 "op": "emit"
>             }
>         ]
>     }
> ]
>
> $ ceph osd pool ls detail
> pool 5 'cephstor1' replicated size 3 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 4096 pgp_num 4096 last_change 1217074 flags hashpspool
> min_read_recency_for_promote 1 min_write_recency_for_promote 1
> stripe_width 0 application rbd
> removed_snaps
>
> [1~9,b~1,d~7d1e8,7d1f6~3d05f,ba256~4bd9,bee30~357,bf188~5531,c46ba~85b3,ccc6e~b599,d820b~1,d820d~1,d820f~1,d8211~1,d8214~1,d8216~1,d8219~2,d821d~1,d821f~1,d8221~1,d8223~1,d8226~2,d8229~1,d822b~2,d822e~2,d8231~3,d8236~1,d8238~2,d823b~1,d823d~3,d8241~1,d8243~1,d8245~1,d8247~3,d824d~1,d824f~1,d8251~1,d8253~1,d8255~2,d8258~1,d825c~1,d825e~2,d8262~1,d8264~1,d8266~1,d8268~2,d826e~2,d8272~1,d8274~1,d8276~8,d8280~1,d8282~1,d8284~1,d8286~1,d8288~1,d828a~1,d828c~1,d828e~1,d8290~1,d8292~1,d8294~3,d8298~1,d829a~2,d829d~1,d82a0~4,d82a6~1,d82a8~2,d82ac~1,d82ae~1,d82b0~1,d82b2~1,d82b5~1,d82b7~1,d82b9~1,d82bb~1,d82bd~1,d82bf~1,d82c1~1,d82c3~2,d82c6~2,d82c9~1,d82cb~1,d82ce~1,d82d0~2,d82d3~1,d82d6~4,d82db~1,d82de~1,d82e0~1,d82e2~1,d82e4~1,d82e6~1,d82e8~1,d82ea~1,d82ed~1,d82ef~1,d82f1~1,d82f3~2,d82f7~2,d82fb~2,d82ff~1,d8301~1,d8303~1,d8305~1,d8307~1,d8309~1,d830b~1,d830e~1,d8311~2,d8314~3,d8318~1,d831a~1,d831c~1,d831f~3,d8323~2,d8329~1,d832b~2,d832f~1,d8331~1,d8333~1,d8335~1,d8338~6,d833f~1,d8341~1,d8343~1,d8345~2,d8349~2,d834c~1,d834e~1,d8350~1,d8352~1,d8354~1,d8356~4,d835b~1,d835d~2,d8360~1,d8362~3,d8366~3,d836b~3,d8370~1,d8372~1,d8374~1,d8376~3,d837a~1,d837c~1,d837e~2,d8381~1,d8383~1,d8385~1,d8387~3,d838b~2,d838e~4,d8393~1,d8396~1,d8398~2,d839b~1,d839d~2,d83a0~2,d83a3~1,d83a5~2,d83a9~2,d83ad~1,d83b0~2,d83b4~2,d83b8~1,d83ba~a,d83c5~1,d83c7~1,d83ca~1,d83cc~1,d83ce~1,d83d0~1,d83d2~6,d83d9~3,d83df~1,d83e1~2,d83e5~1,d83e8~1,d83eb~4,d83f0~1,d83f2~1,d83f4~3,d83f8~3,d83fd~2,d8402~1,d8405~1,d8407~1,d840a~2,d840f~1,d8411~1,d8413~3,d8417~3,d841c~4,d8422~4,d8428~2,d842b~1,d842e~1,d8430~1,d8432~5,d843a~1,d843c~3,d8440~5,d8447~1,d844a~1,d844d~1,d844f~1,d8452~1,d8455~1,d8457~1,d8459~2,d845d~2,d8460~1,d8462~3,d8467~1,d8469~1,d846b~2,d846e~2,d8471~4,d8476~6,d847d~3,d8482~1,d8484~1,d8486~2,d8489~2,d848c~1,d848e~1,d8491~4,d8499~1,d849c~3,d84a0~1,d84a2~1,d84a4~3,d84aa~2,d84ad~2,d84b1~4,d84b6~1,d84b8~1,d84ba~1,d84bc~1,d84be~1,d84c0~5,d84c7~4,d84ce~1,d84d0~1,d84d2~2,d84d6~2,d84db~1,d84dd~2,d84e2~2,d84e6~1,d84e9~1,d84eb~4,d84f0~4]
> pool 6 'cephfs_cephstor1_data' replicated size 3 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 1214952 flags
> hashpspool stripe_width 0 

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-11 Thread Nils Fahldieck - Profihost AG
Thanks for your reply. I'll capture a `ceph status` the next time I
encounter a non-working RBD. Here's the other output you asked for:

$ ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "data",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "root"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "metadata",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "root"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "rbd",
        "ruleset": 2,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "root"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

$ ceph osd pool ls detail
pool 5 'cephstor1' replicated size 3 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 4096 pgp_num 4096 last_change 1217074 flags hashpspool
min_read_recency_for_promote 1 min_write_recency_for_promote 1
stripe_width 0 application rbd
removed_snaps
[1~9,b~1,d~7d1e8,7d1f6~3d05f,ba256~4bd9,bee30~357,bf188~5531,c46ba~85b3,ccc6e~b599,d820b~1,d820d~1,d820f~1,d8211~1,d8214~1,d8216~1,d8219~2,d821d~1,d821f~1,d8221~1,d8223~1,d8226~2,d8229~1,d822b~2,d822e~2,d8231~3,d8236~1,d8238~2,d823b~1,d823d~3,d8241~1,d8243~1,d8245~1,d8247~3,d824d~1,d824f~1,d8251~1,d8253~1,d8255~2,d8258~1,d825c~1,d825e~2,d8262~1,d8264~1,d8266~1,d8268~2,d826e~2,d8272~1,d8274~1,d8276~8,d8280~1,d8282~1,d8284~1,d8286~1,d8288~1,d828a~1,d828c~1,d828e~1,d8290~1,d8292~1,d8294~3,d8298~1,d829a~2,d829d~1,d82a0~4,d82a6~1,d82a8~2,d82ac~1,d82ae~1,d82b0~1,d82b2~1,d82b5~1,d82b7~1,d82b9~1,d82bb~1,d82bd~1,d82bf~1,d82c1~1,d82c3~2,d82c6~2,d82c9~1,d82cb~1,d82ce~1,d82d0~2,d82d3~1,d82d6~4,d82db~1,d82de~1,d82e0~1,d82e2~1,d82e4~1,d82e6~1,d82e8~1,d82ea~1,d82ed~1,d82ef~1,d82f1~1,d82f3~2,d82f7~2,d82fb~2,d82ff~1,d8301~1,d8303~1,d8305~1,d8307~1,d8309~1,d830b~1,d830e~1,d8311~2,d8314~3,d8318~1,d831a~1,d831c~1,d831f~3,d8323~2,d8329~1,d832b~2,d832f~1,d8331~1,d8333~1,d8335~1,d8338~6,d833f~1,d8341~1,d8343~1,d8345~2,d8349~2,d834c~1,d834e~1,d8350~1,d8352~1,d8354~1,d8356~4,d835b~1,d835d~2,d8360~1,d8362~3,d8366~3,d836b~3,d8370~1,d8372~1,d8374~1,d8376~3,d837a~1,d837c~1,d837e~2,d8381~1,d8383~1,d8385~1,d8387~3,d838b~2,d838e~4,d8393~1,d8396~1,d8398~2,d839b~1,d839d~2,d83a0~2,d83a3~1,d83a5~2,d83a9~2,d83ad~1,d83b0~2,d83b4~2,d83b8~1,d83ba~a,d83c5~1,d83c7~1,d83ca~1,d83cc~1,d83ce~1,d83d0~1,d83d2~6,d83d9~3,d83df~1,d83e1~2,d83e5~1,d83e8~1,d83eb~4,d83f0~1,d83f2~1,d83f4~3,d83f8~3,d83fd~2,d8402~1,d8405~1,d8407~1,d840a~2,d840f~1,d8411~1,d8413~3,d8417~3,d841c~4,d8422~4,d8428~2,d842b~1,d842e~1,d8430~1,d8432~5,d843a~1,d843c~3,d8440~5,d8447~1,d844a~1,d844d~1,d844f~1,d8452~1,d8455~1,d8457~1,d8459~2,d845d~2,d8460~1,d8462~3,d8467~1,d8469~1,d846b~2,d846e~2,d8471~4,d8476~6,d847d~3,d8482~1,d8484~1,d8486~2,d8489~2,d848c~1,d848e~1,d8491~4,d8499~1,d849c~3,d84a0~1,d84a2~1,d84a4~3,d84aa~2,d84ad~2,d84b1~4,d84b6~1,d84b8~1,d84ba~1,d84bc~1,d84be~1,d84c0~5,d84c7~4,d84ce~1,d84d0~1,d84d2~2,d84d6~2,d84db~1,d84dd~2,d84e2~2,d84e6~1,d84e9~1,d84eb~4,d84f0~4]
pool 6 'cephfs_cephstor1_data' replicated size 3 min_size 1 crush_rule 0
object_hash rjenkins pg_num 128 pgp_num 128 last_change 1214952 flags
hashpspool stripe_width 0 application cephfs
pool 7 'cephfs_cephstor1_metadata' replicated size 3 min_size 1
crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change
1214952 flags hashpspool stripe_width 0 application cephfs

Am 11.10.2018 um 20:47 schrieb David Turner:
> My first guess is to ask what your crush rules are.  `ceph osd crush
> rule dump` along with `ceph osd pool ls detail` would be helpful.  Also
> if you have a `ceph status` output from a time where the VM RBDs aren't
> working might explain something.
> 
> On Thu, Oct 11, 2018 at 1:12 PM Nils Fahldieck - Profihost AG
> <n.fahldi...@profihost.ag> wrote:
> 
> Hi everyone,
> 
> For some time now we have been experiencing service outages in our Ceph
> cluster whenever there is any change to the HEALTH status, e.g. swapping
> storage devices, adding storage devices, rebooting Ceph hosts, 

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-11 Thread David Turner
My first guess is to ask what your crush rules are.  `ceph osd crush rule
dump` along with `ceph osd pool ls detail` would be helpful.  Also if you
have a `ceph status` output from a time where the VM RBDs aren't working
might explain something.

On Thu, Oct 11, 2018 at 1:12 PM Nils Fahldieck - Profihost AG <
n.fahldi...@profihost.ag> wrote:

> Hi everyone,
>
> For some time now we have been experiencing service outages in our Ceph
> cluster whenever there is any change to the HEALTH status, e.g. swapping
> storage devices, adding storage devices, rebooting Ceph hosts, during
> backfills etc.
>
> Just now I had a situation where several VMs hung after I rebooted one
> Ceph host. We have 3 replicas for each PG, 3 mons, 3 mgrs, 3 MDSs and 71
> OSDs spread over 9 hosts.
>
> We use Ceph as a storage backend for our Proxmox VE (PVE) environment.
> The outages are in the form of blocked virtual file systems of those
> virtual machines running in our PVE cluster.
>
> It feels similar to stuck and inactive PGs to me. Honestly, though, I'm
> not really sure how to debug this problem or which log files to examine.
>
> OS: Debian 9
> Kernel: 4.12 based upon SLE15-SP1
>
> # ceph version
> ceph version 12.2.8-133-gded2f6836f
> (ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)
>
> Can someone guide me? I'm more than happy to provide more information
> as needed.
>
> Thanks in advance
> Nils
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com