Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again
Hi Bryan,

Try both of the commands that are timing out again, but with the --verbose flag, and see if we can get anything from that.

Tom

From: Bryan Banister
Sent: 17 July 2018 23:51
To: Tom W; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Hey Tom,

Ok, yeah, I've used those steps before myself for other operations like cluster software updates. I'll try them. As for the query, not sure how I missed that, and I think I've used it before! Unfortunately it just hangs, similar to the other daemon operation:

root@rook-tools:/# ceph pg 19.1fdf query

Thanks,
-Bryan

From: Tom W [mailto:to...@ukfast.co.uk]
Sent: Tuesday, July 17, 2018 5:36 PM
To: Bryan Banister <bbanis...@jumptrading.com>; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Note: External Email

Hi Bryan,

Try "ceph pg xx.xx query", with xx.xx of course being the PG number. With some luck this will give you the state of that individual PG and which OSDs or issues may be blocking the peering from completing, which can perhaps be used as a clue as to the cause. If you can find a pair of OSDs unable to peer, you then at least have a pathway to begin testing connectivity again.

For pausing your cluster, my method (never tested in an environment with more than a few OSDs, or in production) is:

ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover
ceph osd set nodown
ceph osd set norebalance
ceph osd pause

To return, just reverse the order, with "ceph osd pause" becoming "ceph osd unpause" I believe. All the above flags will stop most activity and just get things peered and not much else; once they are successfully peered, you can slowly begin to unset the above (and I do recommend going slowly). You don't have any significant misplaced/degraded objects so you won't likely see much recovery activity, but as scrubbing kicks in there may be significant amounts of inconsistent PGs and backfill/recovery going on. It might be best to limit the impact of these going from 0 to 100 with these parameters (1 backfill at a time, wait 0.1 seconds between recovery ops per OSD):

ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-sleep 0.1'

Tom

From: Bryan Banister <bbanis...@jumptrading.com>
Sent: 17 July 2018 23:22
To: Tom W <to...@ukfast.co.uk>; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Thanks Tom,

Yes, we can try pausing I/O to give the cluster time to recover. I assume that you're talking about using `ceph osd set pause` for this?
We did finally get some health output, which seems to indicate everything is basically stuck: 2018-07-17 21:00:00.000107 mon.rook-ceph-mon7 [WRN] overall HEALTH_WARN nodown,noout flag(s) set; 1/8884349 objects misplaced (0.000%); Reduced data availability: 10907 pgs inactive, 6354 pgs down, 4553 pgs peering; Degraded data redundancy: 10/8884349 objects degraded (0.000%), 6 pgs degraded, 80 pgs undersized; 135 slow requests are blocked > 32 sec 2018-07-17 22:00:00.000124 mon.rook-ceph-mon7 [WRN] overall HEALTH_WARN nodown,noout flag(s) set; 1/8884349 objects misplaced (0.000%); Reduced data availability: 10907 pgs inactive, 6354 pgs down, 4553 pgs peering; Degraded data redundancy: 10/8884349 objects degraded (0.000%), 6 pgs degraded, 80 pgs undersized; 135 slow requests are blocked > 32 sec I can't seem to find a command to run a query on a specific PG, though I'm really new to ceph so sorry if that's an obvious thing. What would I run to query the status and condition of a PG? I'll talk with our kubernetes team to see if they can also help rule out any networking related issues. Cheers, -Bryan From: Tom W [mailto:to...@ukfast.co.uk] Sent: Tuesday, July 17, 2018 5:06 PM To: Bryan Banister mailto:bbanis...@jumptrading.com>>; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com> Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again Note: External Email Hi Bryan, That's unusual, and not something I can really begin to unravel. As some other pointers, perhaps run a PG query on some of the inactive and peering PGs for any potentially useful output? I suspect from what you've put that most PGs are simply in a down and peering state, and it can't peer as they are down still. The nodown flag doesn't seem to have fixed that, but th
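For reference, one way to re-run the two hanging commands with verbose output and a hard timeout so they return even when they stall (a sketch only; the 60-second value is arbitrary, and the asok path is the one quoted elsewhere in this thread):

timeout 60 ceph --verbose pg 19.1fdf query
timeout 60 ceph --verbose --admin-daemon /var/lib/rook/osd238/rook-osd.238.asok dump_ops_in_flight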
Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again
Hi Bryan,

Try "ceph pg xx.xx query", with xx.xx of course being the PG number. With some luck this will give you the state of that individual PG and which OSDs or issues may be blocking the peering from completing, which can perhaps be used as a clue as to the cause. If you can find a pair of OSDs unable to peer, you then at least have a pathway to begin testing connectivity again.

For pausing your cluster, my method (never tested in an environment with more than a few OSDs, or in production) is:

ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover
ceph osd set nodown
ceph osd set norebalance
ceph osd pause

To return, just reverse the order, with "ceph osd pause" becoming "ceph osd unpause" I believe. All the above flags will stop most activity and just get things peered and not much else; once they are successfully peered, you can slowly begin to unset the above (and I do recommend going slowly). You don't have any significant misplaced/degraded objects so you won't likely see much recovery activity, but as scrubbing kicks in there may be significant amounts of inconsistent PGs and backfill/recovery going on. It might be best to limit the impact of these going from 0 to 100 with these parameters (1 backfill at a time, wait 0.1 seconds between recovery ops per OSD):

ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-sleep 0.1'

Tom

From: Bryan Banister
Sent: 17 July 2018 23:22
To: Tom W; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Thanks Tom,

Yes, we can try pausing I/O to give the cluster time to recover. I assume that you're talking about using `ceph osd set pause` for this?

We did finally get some health output, which seems to indicate everything is basically stuck:

2018-07-17 21:00:00.000107 mon.rook-ceph-mon7 [WRN] overall HEALTH_WARN nodown,noout flag(s) set; 1/8884349 objects misplaced (0.000%); Reduced data availability: 10907 pgs inactive, 6354 pgs down, 4553 pgs peering; Degraded data redundancy: 10/8884349 objects degraded (0.000%), 6 pgs degraded, 80 pgs undersized; 135 slow requests are blocked > 32 sec
2018-07-17 22:00:00.000124 mon.rook-ceph-mon7 [WRN] overall HEALTH_WARN nodown,noout flag(s) set; 1/8884349 objects misplaced (0.000%); Reduced data availability: 10907 pgs inactive, 6354 pgs down, 4553 pgs peering; Degraded data redundancy: 10/8884349 objects degraded (0.000%), 6 pgs degraded, 80 pgs undersized; 135 slow requests are blocked > 32 sec

I can't seem to find a command to run a query on a specific PG, though I'm really new to ceph so sorry if that's an obvious thing. What would I run to query the status and condition of a PG? I'll talk with our kubernetes team to see if they can also help rule out any networking related issues.

Cheers,
-Bryan

From: Tom W [mailto:to...@ukfast.co.uk]
Sent: Tuesday, July 17, 2018 5:06 PM
To: Bryan Banister <bbanis...@jumptrading.com>; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Note: External Email

Hi Bryan,

That's unusual, and not something I can really begin to unravel. As some other pointers, perhaps run a PG query on some of the inactive and peering PGs for any potentially useful output? I suspect from what you've put that most PGs are simply in a down and peering state, and it can't peer as they are down still.
The nodown flag doesn't seem to have fixed that, but then again it can't peer if they actually are down which nodown will mask. Is pausing all cluster IO an option for you? My thinking here is to pause all IO, completely restart and verify all OSDs are back up and operational? If they fail to come up during paused IO, it will rule out any spiking load, but this seems to be more of a network issue, as even peering would normally generate some volume of traffic as it cycles to reattempt. I'm not familiar at all with Rook or Kubernetes at this stage so I also have concern over how the networking stack there would work. MTU has been a problem in the past but this would only affect performance and not operation in my mind. Also perhaps being able to reach other nodes on the right interfaces, so can you definitely traverse the public and cluster networks successfully? Tom From: Bryan Banister mailto:bbanis...@jumptrading.com>> Sent: 17 July 2018 22:36 To: Tom W mailto:to...@ukfast.co.uk>>; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com> Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again Hi Tom, I
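Pulling Tom's pause sequence above into a single block (a sketch only, untested; the throttles are applied before anything is unset):

# stop client IO and freeze OSD state changes while things peer
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover
ceph osd set nodown
ceph osd set norebalance
ceph osd pause

# throttle backfill/recovery before backing anything out
ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-sleep 0.1'

# once PGs have peered, back out slowly in reverse order
ceph osd unpause
ceph osd unset norebalance
ceph osd unset nodown
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout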
Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again
Hi Bryan,

That's unusual, and not something I can really begin to unravel. As some other pointers, perhaps run a PG query on some of the inactive and peering PGs for any potentially useful output? I suspect from what you've put that most PGs are simply in a down and peering state, and it can't peer as they are down still. The nodown flag doesn't seem to have fixed that, but then again it can't peer if they actually are down which nodown will mask.

Is pausing all cluster IO an option for you? My thinking here is to pause all IO, completely restart and verify all OSDs are back up and operational? If they fail to come up during paused IO, it will rule out any spiking load, but this seems to be more of a network issue, as even peering would normally generate some volume of traffic as it cycles to reattempt. I'm not familiar at all with Rook or Kubernetes at this stage so I also have concern over how the networking stack there would work. MTU has been a problem in the past but this would only affect performance and not operation in my mind. Also perhaps being able to reach other nodes on the right interfaces, so can you definitely traverse the public and cluster networks successfully?

Tom

From: Bryan Banister
Sent: 17 July 2018 22:36
To: Tom W; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Hi Tom,

I tried to check out the ops in flight as you suggested but this seems to just hang:

root@rook-ceph-osd-carg-kubelet-osd02-m9rhx:/# ceph --admin-daemon /var/lib/rook/osd238/rook-osd.238.asok daemon osd.238 dump_ops_in_flight

Nothing returns and I don't get a prompt back. The cluster is somewhat new, but has been running without any major issues for more than a week or so. We're not even sure how this all started. I'm happy to provide more details of our deployment if you or others need anything. We haven't changed anything today/recently. I think you're correct that unsetting 'nodown' will just return things to the previous state.

Thanks!
-Bryan

From: Tom W [mailto:to...@ukfast.co.uk]
Sent: Tuesday, July 17, 2018 4:19 PM
To: Bryan Banister <bbanis...@jumptrading.com>; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Note: External Email

Hi Bryan,

OSDs may not truly be up; this flag merely prevents them being marked as down even if they are unresponsive. It may be worth unsetting nodown as soon as you are confident, but unsetting it before anything changes will just return to the previous state. Perhaps not harmful, but I have no oversight on your deployment nor am I an expert in any regards. Find an OSD which is up and having issues peering, and perhaps try something like this:

ceph daemon osd.x dump_ops_in_flight

Replacing x with the OSD number, I am curious to see what may be holding it up. I assume you have already done the usual tests to ensure it is traversing the right interface, correct VLANs, reachable via ICMP, perhaps even run an iperf and tcpdump to be certain the flow is as expected.
Tom

From: Bryan Banister <bbanis...@jumptrading.com>
Sent: 17 July 2018 22:03
To: Tom W <to...@ukfast.co.uk>; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Hi Tom,

Decided to try your suggestion of the 'nodown' setting and this indeed has gotten all of the OSDs up, and they haven't failed out like before. However the PGs are in bad states and Ceph doesn't seem interested in starting recovery over the last 30 minutes since the latest health message was reported:

2018-07-17 20:29:00.638398 mon.rook-ceph-mon7 [WRN] Health check update: 1/8884343 objects misplaced (0.000%) (OBJECT_MISPLACED)
2018-07-17 20:29:00.864863 mon.rook-ceph-mon7 [INF] osd.221 7.129.220.49:6957/30346 boot
2018-07-17 20:29:01.907855 mon.rook-ceph-mon7 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-07-17 20:29:02.598518 mon.rook-ceph-mon7 [INF] osd.238 7.129.220.49:6923/30330 boot
2018-07-17 20:29:02.988546 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10895 pgs inactive, 6514 pgs down, 4391 pgs peering, 2 pgs stale (PG_AVAILABILITY)
2018-07-17 20:29:04.380454 mon.rook-ceph-mon7 [WRN] Health check update: Degraded data redundancy: 10/8884349 objects degraded (0.000%), 6 pgs degraded, 80 pgs undersized (PG_DEGRADED)
2018-07-17 20:29:08.319073 mon.rook-ceph-mon7 [WRN] Health check update: 1/8884349 objects misplaced (0.000%) (OBJECT_MISPLACED)
2018-07-17 20:29:08.319103 mon.rook-ceph-mon7 [WRN] Health check update:
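For the interface/VLAN/ICMP/iperf/tcpdump checks suggested above, a rough sequence between two OSD hosts might be the following (the addresses, interface name and 9000 MTU are placeholders to adapt):

# basic reachability on both the public and the cluster network addresses
ping -c 3 <peer-public-ip>
ping -c 3 <peer-cluster-ip>

# if jumbo frames are configured, confirm large frames actually pass (8972 = 9000 minus IP/ICMP headers)
ping -c 3 -M do -s 8972 <peer-cluster-ip>

# throughput check: run "iperf3 -s" on the peer first
iperf3 -c <peer-cluster-ip>

# watch OSD traffic on the suspect interface while PGs try to peer
tcpdump -ni <interface> host <peer-cluster-ip>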
Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again
Hi Bryan,

OSDs may not truly be up; this flag merely prevents them being marked as down even if they are unresponsive. It may be worth unsetting nodown as soon as you are confident, but unsetting it before anything changes will just return to the previous state. Perhaps not harmful, but I have no oversight on your deployment nor am I an expert in any regards. Find an OSD which is up and having issues peering, and perhaps try something like this:

ceph daemon osd.x dump_ops_in_flight

Replacing x with the OSD number, I am curious to see what may be holding it up. I assume you have already done the usual tests to ensure it is traversing the right interface, correct VLANs, reachable via ICMP, perhaps even run an iperf and tcpdump to be certain the flow is as expected.

Tom

From: Bryan Banister
Sent: 17 July 2018 22:03
To: Tom W; ceph-users@lists.ceph.com
Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Hi Tom,

Decided to try your suggestion of the 'nodown' setting and this indeed has gotten all of the OSDs up, and they haven't failed out like before. However the PGs are in bad states and Ceph doesn't seem interested in starting recovery over the last 30 minutes since the latest health message was reported:

2018-07-17 20:29:00.638398 mon.rook-ceph-mon7 [WRN] Health check update: 1/8884343 objects misplaced (0.000%) (OBJECT_MISPLACED)
2018-07-17 20:29:00.864863 mon.rook-ceph-mon7 [INF] osd.221 7.129.220.49:6957/30346 boot
2018-07-17 20:29:01.907855 mon.rook-ceph-mon7 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-07-17 20:29:02.598518 mon.rook-ceph-mon7 [INF] osd.238 7.129.220.49:6923/30330 boot
2018-07-17 20:29:02.988546 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10895 pgs inactive, 6514 pgs down, 4391 pgs peering, 2 pgs stale (PG_AVAILABILITY)
2018-07-17 20:29:04.380454 mon.rook-ceph-mon7 [WRN] Health check update: Degraded data redundancy: 10/8884349 objects degraded (0.000%), 6 pgs degraded, 80 pgs undersized (PG_DEGRADED)
2018-07-17 20:29:08.319073 mon.rook-ceph-mon7 [WRN] Health check update: 1/8884349 objects misplaced (0.000%) (OBJECT_MISPLACED)
2018-07-17 20:29:08.319103 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10893 pgs inactive, 6391 pgs down, 4515 pgs peering, 1 pg stale (PG_AVAILABILITY)
2018-07-17 20:29:13.319406 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10893 pgs inactive, 6354 pgs down, 4552 pgs peering (PG_AVAILABILITY)
2018-07-17 20:29:14.044696 mon.rook-ceph-mon7 [WRN] Health check update: 123 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-07-17 20:29:20.277493 mon.rook-ceph-mon7 [WRN] Health check update: 129 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-07-17 20:29:27.344834 mon.rook-ceph-mon7 [WRN] Health check update: 135 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-07-17 20:29:54.516115 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10899 pgs inactive, 6354 pgs down, 4552 pgs peering (PG_AVAILABILITY)
2018-07-17 20:30:03.322101 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10907 pgs inactive, 6354 pgs down, 4553 pgs peering (PG_AVAILABILITY)

Nothing since then, which was 30 min ago. Hosts are basically idle. I'm thinking of unsetting 'nodown' now to see what it does, but are there any other recommendations here before I do that?

Thanks again!
-Bryan From: Tom W [mailto:to...@ukfast.co.uk] Sent: Tuesday, July 17, 2018 1:58 PM To: Bryan Banister mailto:bbanis...@jumptrading.com>>; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com> Subject: Re: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again Note: External Email Prior to the OSD being marked as down by the cluster, do you note the PGs become inactive on it? Using a flag such as nodown may prevent OSDs flapping if it helps reduce the IO load to see if things stabilise out, but be wary of this flag as I believe PGs using the OSD as the primary will not failover to another OSD while nodown is set. My thoughts here, albeit I am shooting in the dark a little with this theory, is perhaps individual OSDs being overloaded and not returning a heartbeat as a result of the load. When OSDs are marked as down and new maps are distributed this would add further load so while it keeps recalculating it may be a vicious cycle which may be alleviated if it could stabilise. With networks mainly idle, do you see any spikes at all? Perhaps an OSD coming online, OSD attempts backfill/recovery and QoS dropping the heartbeat packets if it overloads the link? Just spitballing some ideas here until somebody more qualified may have an idea. From: Bryan Ba
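Before unsetting nodown, a few read-only commands that may show what the inactive/peering PGs are actually waiting on (a sketch; I believe "ceph osd blocked-by" is available on 12.2.x, but double-check on your build):

ceph health detail
ceph pg dump_stuck inactive
ceph pg dump_stuck stale
ceph osd blocked-by            # which OSDs, if any, are blocking peering
ceph pg <pgid> query           # per-PG detail, as also suggested in this thread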
Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again
Prior to the OSD being marked as down by the cluster, do you note the PGs become inactive on it? Using a flag such as nodown may prevent OSDs flapping if it helps reduce the IO load to see if things stabilise out, but be wary of this flag as I believe PGs using the OSD as the primary will not failover to another OSD while nodown is set. My thoughts here, albeit I am shooting in the dark a little with this theory, is perhaps individual OSDs being overloaded and not returning a heartbeat as a result of the load. When OSDs are marked as down and new maps are distributed this would add further load so while it keeps recalculating it may be a vicious cycle which may be alleviated if it could stabilise. With networks mainly idle, do you see any spikes at all? Perhaps an OSD coming online, OSD attempts backfill/recovery and QoS dropping the heartbeat packets if it overloads the link? Just spitballing some ideas here until somebody more qualified may have an idea. From: Bryan Banister Sent: 17 July 2018 19:18:15 To: Bryan Banister; Tom W; ceph-users@lists.ceph.com Subject: RE: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again I didn’t find anything obvious in the release notes about this issue we see to have, but I don’t understand it really. We have seen logs indicating some kind of heartbeat issue with OSDs, but we don’t believe there is any issues with the networking between the nodes, which are mostly idle as well: 2018-07-17 17:41:32.903871 I | osd12: 2018-07-17 17:41:32.903793 7fffef198700 -1 osd.12 4296 heartbeat_check: no reply from 7.129.220.44:6866 osd.219 ever on either front or back, first ping sent 2018-07-17 17:41:09.893761 (cutoff 2018-07-17 17:41:12.903604) 2018-07-17 17:41:32.903875 I | osd12: 2018-07-17 17:41:32.903795 7fffef198700 -1 osd.12 4296 heartbeat_check: no reply from 7.129.220.44:6922 osd.220 ever on either front or back, first ping sent 2018-07-17 17:41:09.893761 (cutoff 2018-07-17 17:41:12.903604) 2018-07-17 17:41:32.903878 I | osd12: 2018-07-17 17:41:32.903798 7fffef198700 -1 osd.12 4296 heartbeat_check: no reply from 7.129.220.44:6901 osd.221 ever on either front or back, first ping sent 2018-07-17 17:41:09.893761 (cutoff 2018-07-17 17:41:12.903604) 2018-07-17 17:41:32.903880 I | osd12: 2018-07-17 17:41:32.903800 7fffef198700 -1 osd.12 4296 heartbeat_check: no reply from 7.129.220.44:6963 osd.222 ever on either front or back, first ping sent 2018-07-17 17:41:09.893761 (cutoff 2018-07-17 17:41:12.903604) 2018-07-17 17:41:32.903884 I | osd12: 2018-07-17 17:41:32.903803 7fffef198700 -1 osd.12 4296 heartbeat_check: no reply from 7.129.220.44:6907 osd.224 ever on either front or back, first ping sent 2018-07-17 17:41:09.893761 (cutoff 2018-07-17 17:41:12.903604) Is there a way to resolve this issue, which seems to be the root cause of the OSDs being marked as failed. Thanks in advance for any help, -Bryan From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bryan Banister Sent: Tuesday, July 17, 2018 12:08 PM To: Tom W ; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again Note: External Email Hi Tom, We’re apparently running ceph version 12.2.5 on a Rook based cluster. We have EC pools on large 8TB HDDs and metadata on bluestore OSDs on NVMe drives. I’ll look at the release notes. Thanks! 
-Bryan

From: Tom W [mailto:to...@ukfast.co.uk]
Sent: Tuesday, July 17, 2018 12:05 PM
To: Bryan Banister <bbanis...@jumptrading.com>; ceph-users@lists.ceph.com
Subject: Re: Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Note: External Email

Hi Bryan,

What version of Ceph are you currently running on, and do you run any erasure coded pools or bluestore OSDs? Might be worth having a quick glance over the recent changelogs: http://docs.ceph.com/docs/master/releases/luminous/

Tom

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bryan Banister <bbanis...@jumptrading.com>
Sent: 17 July 2018 18:00:05
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again

Hi all,

We’re still very new to managing Ceph and seem to have a cluster that is in an endless loop of failing OSDs, then marking them down, then booting them again. Here are some example logs:

2018-07-17 16:48:28.976673 mon.rook-ceph-mon7 [INF] osd.83 failed (root=default,host=carg-kubelet-osd04) (3 reporters from different host after 61.491973 >= grace 20.0
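Given the heartbeat_check "no reply" lines quoted above, a quick sanity check is whether the heartbeat ports named in the log are reachable at all from the complaining host (IPs and ports copied from those messages; nc and ss are generic tools, nothing Ceph-specific):

# from the host running osd.12, probe the ports osd.219/osd.220 were reported on
nc -vz 7.129.220.44 6866
nc -vz 7.129.220.44 6922

# on the peer host, confirm the OSD daemons are actually listening
ss -ltnp | grep ceph-osd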
Re: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again
Hi Bryan, What version of Ceph are you currently running on, and do you run any erasure coded pools or bluestore OSDs? Might be worth having a quick glance over the recent changelogs: http://docs.ceph.com/docs/master/releases/luminous/ Tom From: ceph-users on behalf of Bryan Banister Sent: 17 July 2018 18:00:05 To: ceph-users@lists.ceph.com Subject: [ceph-users] Cluster in bad shape, seemingly endless cycle of OSDs failed, then marked down, then booted, then failed again Hi all, We’re still very new to managing Ceph and seem to have cluster that is in an endless loop of failing OSDs, then marking them down, then booting them again: Here are some example logs: 2018-07-17 16:48:28.976673 mon.rook-ceph-mon7 [INF] osd.83 failed (root=default,host=carg-kubelet-osd04) (3 reporters from different host after 61.491973 >= grace 20.010293) 2018-07-17 16:48:28.976730 mon.rook-ceph-mon7 [INF] osd.84 failed (root=default,host=carg-kubelet-osd04) (3 reporters from different host after 61.491916 >= grace 20.010293) 2018-07-17 16:48:28.976785 mon.rook-ceph-mon7 [INF] osd.85 failed (root=default,host=carg-kubelet-osd04) (3 reporters from different host after 61.491870 >= grace 20.011151) 2018-07-17 16:48:28.976843 mon.rook-ceph-mon7 [INF] osd.86 failed (root=default,host=carg-kubelet-osd04) (3 reporters from different host after 61.491828 >= grace 20.010293) 2018-07-17 16:48:28.976890 mon.rook-ceph-mon7 [INF] Marking osd.1 out (has been down for 605 seconds) 2018-07-17 16:48:28.976913 mon.rook-ceph-mon7 [INF] Marking osd.2 out (has been down for 605 seconds) 2018-07-17 16:48:28.976933 mon.rook-ceph-mon7 [INF] Marking osd.3 out (has been down for 605 seconds) 2018-07-17 16:48:28.976954 mon.rook-ceph-mon7 [INF] Marking osd.4 out (has been down for 605 seconds) 2018-07-17 16:48:28.976979 mon.rook-ceph-mon7 [INF] Marking osd.9 out (has been down for 605 seconds) 2018-07-17 16:48:28.977000 mon.rook-ceph-mon7 [INF] Marking osd.10 out (has been down for 605 seconds) 2018-07-17 16:48:28.977020 mon.rook-ceph-mon7 [INF] Marking osd.11 out (has been down for 605 seconds) 2018-07-17 16:48:28.977040 mon.rook-ceph-mon7 [INF] Marking osd.12 out (has been down for 605 seconds) 2018-07-17 16:48:28.977059 mon.rook-ceph-mon7 [INF] Marking osd.13 out (has been down for 605 seconds) 2018-07-17 16:48:28.977079 mon.rook-ceph-mon7 [INF] Marking osd.14 out (has been down for 605 seconds) 2018-07-17 16:48:30.889316 mon.rook-ceph-mon7 [INF] osd.55 7.129.218.12:6920/90761 boot 2018-07-17 16:48:31.113052 mon.rook-ceph-mon7 [WRN] Health check update: 4946/8854434 objects misplaced (0.056%) (OBJECT_MISPLACED) 2018-07-17 16:48:31.113087 mon.rook-ceph-mon7 [WRN] Health check update: Degraded data redundancy: 7951/8854434 objects degraded (0.090%), 88 pgs degraded, 273 pgs undersized (PG_DEGRADED) 2018-07-17 16:48:32.763546 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10439 pgs inactive, 8994 pgs down, 1639 pgs peering, 88 pgs incomplete, 3430 pgs stale (PG_AVAILABILITY) 2018-07-17 16:48:32.763578 mon.rook-ceph-mon7 [WRN] Health check update: 29 slow requests are blocked > 32 sec (REQUEST_SLOW) 2018-07-17 16:48:34.096178 mon.rook-ceph-mon7 [INF] osd.88 failed (root=default,host=carg-kubelet-osd04) (3 reporters from different host after 66.612054 >= grace 20.010283) 2018-07-17 16:48:34.108020 mon.rook-ceph-mon7 [WRN] Health check update: 112 osds down (OSD_DOWN) 2018-07-17 16:48:38.736108 mon.rook-ceph-mon7 [WRN] Health check update: 4946/8843715 objects misplaced (0.056%) (OBJECT_MISPLACED) 2018-07-17 
16:48:38.736140 mon.rook-ceph-mon7 [WRN] Health check update: Reduced data availability: 10415 pgs inactive, 9000 pgs down, 1635 pgs peering, 88 pgs incomplete, 3418 pgs stale (PG_AVAILABILITY) 2018-07-17 16:48:38.736166 mon.rook-ceph-mon7 [WRN] Health check update: Degraded data redundancy: 7949/8843715 objects degraded (0.090%), 86 pgs degraded, 267 pgs undersized (PG_DEGRADED) 2018-07-17 16:48:40.430146 mon.rook-ceph-mon7 [WRN] Health check update: 111 osds down (OSD_DOWN) 2018-07-17 16:48:40.812579 mon.rook-ceph-mon7 [INF] osd.117 7.129.217.10:6833/98090 boot 2018-07-17 16:48:42.427204 mon.rook-ceph-mon7 [INF] osd.115 7.129.217.10:6940/98114 boot 2018-07-17 16:48:42.427297 mon.rook-ceph-mon7 [INF] osd.100 7.129.217.10:6899/98091 boot 2018-07-17 16:48:42.427502 mon.rook-ceph-mon7 [INF] osd.95 7.129.217.10:6901/98092 boot Not sure this is going to fix itself. Any ideas on how to handle this situation?? Thanks in advance! -Bryan Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this ema
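To watch the fail/boot cycle above in real time and see which OSDs are involved, something along these lines should work (a sketch):

# follow the cluster log, keeping only failure and boot events
ceph -w | grep -E 'osd\.[0-9]+ .*(failed|boot)'

# snapshot of which OSDs are currently marked down
ceph osd tree | grep down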
[ceph-users] Centralised Logging Strategy
Morning all, Does anybody have any advice regarding moving their Ceph clusters to centralised logging? We are presently investigating routes to undertake this (long awaited and needed) change, and any pointers or gotchas that may lay ahead that we could be advised on would be great. We are quite used to deploying ELK at this stage so that is our probable option for collection and analysis unless there is a compelling reason otherwise. We are able to write our own filters as needed, but tried and tested options on which to base our own would be fantastic too. We are debating whether it is worth doing the logging over our management network (this is a simple layer 2 network on 100Mbit links per host), or should we perhaps be looking to do this over the public network (40G in our case) instead? Kind Regards, Tom
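If the collection side ends up being syslog-based, Ceph can ship its own logs without an extra agent; a sketch of the relevant pieces (the option names are the documented ones, but the forwarding target host/port here are placeholders and the behaviour is worth verifying on your release):

# ceph.conf on each node
[global]
    log to syslog = true
    err to syslog = true
    clog to syslog = true

# /etc/rsyslog.d/60-ceph-forward.conf - forward over whichever network you choose
*.* @@logstash.example.com:5514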
[ceph-users] RGW Index rapidly expanding post tunables update (12.2.5)
Hi all, We have recently upgraded from Jewel (10.2.10) to Luminous (12.2.5) and after this we decided to update our tunables configuration to the optimals, which were previously at Firefly. During this process, we have noticed the OSDs (bluestore) rapidly filling on the RGW index and GC pool. We estimated the index to consume around 30G of space and the GC negligible, but they are now filling all 4 OSDs per host which contain 2TB SSDs in each. Does anyone have any experience with this, or how to determine why the sudden growth has been encountered during recovery after the tunables update? We have disabled resharding activity due to this issue, https://tracker.ceph.com/issues/24551 and our gc queue is only a few items at present. Kind Regards, Tom
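A few read-only checks that might help narrow down where the growth is actually going (a sketch; the pool names are the Luminous defaults and may differ on your deployment):

# per-pool and per-OSD usage - confirms whether it really is the index/gc pools growing
ceph df detail
ceph osd df

# how much garbage collection is actually outstanding
radosgw-admin gc list --include-all | wc -l

# raw object count in the index pool
rados -p default.rgw.buckets.index ls | wc -l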
Re: [ceph-users] Bucket reporting content inconsistently
Thanks for posting this for me Sean. Just to update, it seems that despite the bucket checks completing and reporting no issues, the objects continued to show in any tools that list the contents of the bucket. I put together a simple loop to upload a new file to overwrite the existing one and then trigger a delete request through the API, and this seems to be working in lieu of a cleaner solution. We will be upgrading to Luminous in the coming week, I’ll report back if we see any significant change in this issue when we do.

Kind Regards,
Tom

From: ceph-users On Behalf Of Sean Redmond
Sent: 11 May 2018 17:15
To: ceph-users
Subject: [ceph-users] Bucket reporting content inconsistently

Hi all,

We have recently upgraded to 10.2.10 in preparation for our upcoming upgrade to Luminous and I have been attempting to remove a bucket. When using tools such as s3cmd I can see files are listed, verified by checking with bi list too as shown below:

root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bi list --bucket='bucketnamehere' | grep -i "\"idx\":" | wc -l
3278

However, on attempting to delete the bucket and purge the objects, it appears not to be recognised:

root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket rm --bucket= bucketnamehere --purge-objects
2018-05-10 14:11:05.393851 7f0ab07b6a00 -1 ERROR: unable to remove bucket(2) No such file or directory

Checking the bucket stats, it does appear that the bucket is reporting no content, and repeating the above content test there has been no change to the 3278 figure:

root@ceph-rgw-1:~# radosgw-admin --id rgw.ceph-rgw-1 bucket stats --bucket="bucketnamehere"
{
    "bucket": "bucketnamehere",
    "pool": ".rgw.buckets",
    "index_pool": ".rgw.buckets.index",
    "id": "default.28142894.1",
    "marker": "default.28142894.1",
    "owner": "16355",
    "ver": "0#5463545,1#5483686,2#5483484,3#5474696,4#5479052,5#5480339,6#5469460,7#5463976",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0",
    "mtime": "2015-12-08 12:42:26.286153",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#",
    "usage": {
        "rgw.main": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        },
        "rgw.multimeta": {
            "size_kb": 0,
            "size_kb_actual": 0,
            "num_objects": 0
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

I have attempted a bucket index check and fix on this, however, it does not appear to have made a difference and no fixes or errors were reported from it. Does anyone have any advice on how to proceed with removing this content? At this stage I am not too concerned if the method needed to remove this generates orphans, as we will shortly be running a large orphan scan after our upgrade to Luminous. Cluster health otherwise reports normal.

Thanks
Sean Redmond
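The overwrite-then-delete loop described above might look roughly like this with s3cmd (a sketch, assuming the bi list output shown above and a throwaway local file; object keys with unusual characters may need extra care):

# pull the lingering keys out of the bucket index listing
radosgw-admin --id rgw.ceph-rgw-1 bi list --bucket='bucketnamehere' \
    | grep -oP '"idx":\s*"\K[^"]+' > stale_keys.txt

# overwrite each object so RGW recreates it, then delete it through the API
touch dummy
while read -r key; do
    s3cmd put dummy "s3://bucketnamehere/${key}"
    s3cmd del "s3://bucketnamehere/${key}"
done < stale_keys.txt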
[ceph-users] Test for Leo
Test for Leo, please ignore.