Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-11 Thread Maks Kowalik
Hello Daniel, I don't think you can avoid a tedious manual cleanup... The other option is to delete the whole pool (ID 18). The manual cleanup means taking every OSD listed in "probing_osds", stopping them one by one, and removing the shards of groups 18.1e and 18.c (using ceph-objectstore-tool).
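
A minimal sketch of that shard removal, assuming default OSD data paths and using osd.7, the EC shard suffix s0, and the systemd unit name purely as placeholders:

    # ceph-objectstore-tool only works against a stopped OSD
    systemctl stop ceph-osd@7

    # list which shards of the broken PGs this OSD holds (EC shards appear as 18.1es0, 18.cs2, ...)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op list-pgs | grep -E '^18\.(1e|c)s'

    # remove the shard, bring the OSD back up, then repeat on the next OSD from "probing_osds"
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --pgid 18.1es0 --op remove --force
    systemctl start ceph-osd@7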

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-05 Thread Peter Woodman
Last time I had to do this, I used the command outlined here: https://tracker.ceph.com/issues/10098
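
The exact command behind that tracker link isn't quoted in the thread; one operation ceph-objectstore-tool does offer for PGs stuck incomplete is mark-complete (an assumption about what was meant, not confirmed here), run against a stopped OSD that holds the most complete copy:

    # osd.7 and the data path are placeholders; stop the OSD before running the tool
    systemctl stop ceph-osd@7
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --pgid 18.c --op mark-complete
    systemctl start ceph-osd@7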

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-04 Thread Daniel K
Thanks for the suggestions. I've tried both -- setting osd_find_best_info_ignore_history_les = true and restarting all OSDs, as well as 'ceph osd force-create-pg' -- but both PGs still show incomplete: PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete; pg 18.c is
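
For reference, a sketch of those two attempts, assuming the flag is injected at runtime (it can equally be set in ceph.conf before restarting the OSDs) and that 18.c and 18.1e are the two incomplete PGs:

    # peering override on all OSDs; restart the OSDs afterwards and remember to revert it
    ceph tell 'osd.*' injectargs '--osd_find_best_info_ignore_history_les=true'

    # recreate the PGs as empty; some releases also require --yes-i-really-mean-it
    ceph osd force-create-pg 18.c
    ceph osd force-create-pg 18.1e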

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread Paul Emmerich
On Sat, Mar 2, 2019 at 5:49 PM Alexandre Marangone wrote: > If you have no way to recover the drives, you can try to reboot the OSDs with `osd_find_best_info_ignore_history_les = true` (revert it afterwards); you'll lose data. If after this, the PGs are down, you can mark the OSDs

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread Alexandre Marangone
If you have no way to recover the drives, you can try to reboot the OSDs with `osd_find_best_info_ignore_history_les = true` (revert it afterwards); you'll lose data. If after this, the PGs are down, you can mark the OSDs blocking the PGs from becoming active as lost.
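
A sketch of that last step, with pg 18.c and osd.12 as placeholders for whatever the query actually reports:

    # the OSDs blocking peering are listed under "peering_blocked_by" in the query output
    ceph pg 18.c query | grep -A5 peering_blocked_by

    # mark the dead OSD lost so the PG can give up on it; this accepts data loss
    ceph osd lost 12 --yes-i-really-mean-it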

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread Daniel K
They all just started having read errors, bus resets, and slow reads, which is one of the reasons the cluster didn't recover fast enough to compensate. I tried to be mindful of the drive type and specifically avoided the larger-capacity Seagates that are SMR. Used 1 SM863 for every 6 drives for the

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread jesper
Did they break, or did something go wrong while trying to replace them? Jesper

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread Daniel K
I bought the wrong drives trying to be cheap. They were 2TB WD Blue 5400rpm 2.5-inch laptop drives. They've been replaced now with HGST 10K 1.8TB SAS drives.

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-01 Thread jesper
Saturday, 2 March 2019, 04.20 +0100 from satha...@gmail.com: > 56 OSD, 6-node 12.2.5 cluster on Proxmox > We had multiple drives fail (about 30%) within a few days of each other, likely faster than the cluster could recover. How did so many drives break? Jesper

[ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-01 Thread Daniel K
56-OSD, 6-node 12.2.5 cluster on Proxmox. We had multiple drives fail (about 30%) within a few days of each other, likely faster than the cluster could recover. After the dust settled, we have 2 out of 896 pgs stuck inactive. The failed drives are completely inaccessible, so I can't mount them and
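
For anyone following along, a few read-only commands that show the state being described (the PG IDs 18.c and 18.1e come from later messages in the thread):

    # overall health plus the two stuck PGs
    ceph health detail
    ceph pg dump_stuck inactive

    # peering details for one of the incomplete PGs, including which OSDs it is still probing
    ceph pg 18.c query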