[ceph-users] Re: PGs down

2020-12-22 Thread Jeremy Austin
Hi Igor, I had taken the OSDs out already, so bringing them up allowed a full rebalance to occur. I verified that they were not exhibiting ATA or SMART-reportable errors, wiped them, and re-added them. I will deep scrub. Thanks again! Jeremy On Mon, Dec 21, 2020 at 11:39 PM Igor Fedotov wrote: >
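For reference, a deep scrub can be kicked off manually once the rebuilt OSDs are back in service. A minimal sketch, with placeholder OSD and PG ids:

    # deep-scrub every PG for which a given OSD is primary (id is a placeholder)
    ceph osd deep-scrub osd.11
    # or deep-scrub one placement group at a time (pgid is a placeholder)
    ceph pg deep-scrub 2.1f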

[ceph-users] Re: PGs down

2020-12-22 Thread Igor Fedotov
Hi Jeremy, good to know you managed to bring your OSDs up. Have you been able to reweight them to 0 and migrate data out of these "broken" OSDs? If so, I suggest redeploying them - the corruption is still in the DB and it might pop up one day. If not, please do that first - you might still
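A rough sketch of the reweight-and-redeploy sequence described here, assuming Luminous-or-later tooling; the OSD ids and device path are placeholders:

    # push the remaining data off the suspect OSDs
    ceph osd crush reweight osd.11 0
    ceph osd crush reweight osd.12 0
    # once the cluster is back to HEALTH_OK, remove and recreate each OSD
    ceph osd purge 11 --yes-i-really-mean-it
    ceph-volume lvm zap /dev/sdX --destroy
    ceph-volume lvm create --data /dev/sdX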

[ceph-users] Re: PGs down

2020-12-21 Thread Jeremy Austin
Igor, You're a bloomin' genius, as they say. Disabling auto compaction allowed OSDs 11 and 12 to spin up/out. The 7 down PGs recovered; there were a few unfound items previously, which I went ahead and deleted, given that this is EC and revert is not an option. HEALTH OK :) I'm now intending to
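For context, deleting unfound objects (when revert is not possible, as on this EC pool) is normally done per PG; a sketch with a placeholder PG id:

    # identify PGs with unfound objects
    ceph health detail
    # give up on the unfound objects in one PG and delete them (pgid is a placeholder)
    ceph pg 2.1f mark_unfound_lost delete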

[ceph-users] Re: PGs down

2020-12-21 Thread Igor Fedotov
Hi Alexander, the option you provided controls bluefs log compaction, not rocksdb compaction. Hence it doesn't apply in Jeremy's case. Thanks, Igor On 12/21/2020 6:55 AM, Alexander E. Patrakov wrote: On Mon, Dec 21, 2020 at 4:57 AM Jeremy Austin wrote: On Sun, Dec 20, 2020 at 2:25 PM

[ceph-users] Re: PGs down

2020-12-21 Thread Igor Fedotov
Hi Jeremy, you might want to try RocksDB's disable_auto_compactions option for that. To adjust rocksdb options, one should edit/insert bluestore_rocksdb_options in ceph.conf. E.g. bluestore_rocksdb_options =
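The example is cut off in the preview. Purely as an illustration (not necessarily the exact string from the original mail), and assuming Nautilus-era defaults, it might look like the following; note that setting bluestore_rocksdb_options replaces the built-in default string, so the defaults for the release in use should be copied and disable_auto_compactions=true appended:

    [osd]
    # illustrative only: copy your release's default rocksdb option string,
    # then add disable_auto_compactions=true at the end
    bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,disable_auto_compactions=true

As the rest of the thread shows, this is meant as a temporary measure to get the damaged OSDs up long enough to drain them, not as a permanent setting.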

[ceph-users] Re: PGs down

2020-12-21 Thread Jeremy Austin
On Sun, Dec 20, 2020 at 6:56 PM Alexander E. Patrakov wrote: > On Mon, Dec 21, 2020 at 4:57 AM Jeremy Austin wrote: > > > > On Sun, Dec 20, 2020 at 2:25 PM Jeremy Austin > wrote: > > > > > Will attempt to disable compaction and report. > > > > > > > Not sure I'm doing this right. In [osd]

[ceph-users] Re: PGs down

2020-12-20 Thread Alexander E. Patrakov
On Mon, Dec 21, 2020 at 4:57 AM Jeremy Austin wrote: > > On Sun, Dec 20, 2020 at 2:25 PM Jeremy Austin wrote: > > > Will attempt to disable compaction and report. > > > > Not sure I'm doing this right. In [osd] section of ceph.conf, I added > periodic_compaction_seconds=0 > > and attempted to

[ceph-users] Re: PGs down

2020-12-20 Thread Jeremy Austin
On Sun, Dec 20, 2020 at 2:25 PM Jeremy Austin wrote: > Will attempt to disable compaction and report. > Not sure I'm doing this right. In [osd] section of ceph.conf, I added periodic_compaction_seconds=0 and attempted to start the OSDs in question. Same error as before. Am I setting compaction

[ceph-users] Re: PGs down

2020-12-20 Thread Jeremy Austin
Sorry for the delay, Igor; answers inline. On Mon, Dec 14, 2020 at 2:09 AM Igor Fedotov wrote: > Hi Jeremy, > > I think you lost the data for OSD.11 & .12 I'm not aware of any reliable > enough way to recover RocksDB from this sort of errors. > > Theoretically you might want to disable auto

[ceph-users] Re: PGs down

2020-12-15 Thread Igor Fedotov
otov Sent: Monday, 14 December 2020 12:09 To: Jeremy Austin Cc: ceph-users@ceph.io Subject: [ceph-users] Re: PGs down Hi Jeremy, I think you lost the data for OSD.11 & .12 I'm not aware of any reliable enough way to recover RocksDB from this sort of errors. Theoretically you might want t

[ceph-users] Re: PGs down

2020-12-15 Thread Wout van Heeswijk
To: Jeremy Austin Cc: ceph-users@ceph.io Subject: [ceph-users] Re: PGs down Hi Jeremy, I think you lost the data for OSD.11 & .12 I'm not aware of any reliable enough way to recover RocksDB from this sort of errors. Theoretically you might want to disable auto compaction for Roc

[ceph-users] Re: PGs down

2020-12-14 Thread Igor Fedotov
Hi Jeremy, I think you lost the data for OSD.11 & .12. I'm not aware of any reliable enough way to recover RocksDB from this sort of error. Theoretically you might want to disable auto compaction for RocksDB for these daemons and try to bring them up and attempt to drain the data out of
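A minimal sketch of the drain step suggested here, assuming the OSDs start once auto compaction is disabled; the ids are placeholders:

    # mark the suspect OSDs out so their data is migrated to the rest of the cluster
    ceph osd out 11 12
    # watch recovery/backfill until the affected PGs are active+clean again
    ceph -s
    ceph pg stat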

[ceph-users] Re: PGs down

2020-12-13 Thread Jeremy Austin
OSD 12 looks much the same. I don't have logs back to the original date, but this looks very similar — db/sst corruption. The standard fsck approaches couldn't fix it. I believe it was a form of ATA failure — OSD 11 and 12, if I recall correctly, did not actually experience SMARTD-reportable
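For reference, the "standard fsck approaches" would typically be something along these lines (a sketch; the OSD must be stopped first and the data path is a placeholder):

    systemctl stop ceph-osd@11
    # consistency check of the BlueStore metadata and RocksDB
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-11
    # attempt an automatic repair (did not help with this checksum corruption)
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-11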

[ceph-users] Re: PGs down

2020-12-12 Thread Igor Fedotov
Hi Jeremy, wondering what the OSDs' logs showed when they crashed for the first time? And does OSD.12 report a similar problem now: 3> 2020-12-12 20:23:45.756 7f2d21404700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 3113305400, got 1242690251 in
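A sketch of how the original crash and the current corruption messages might be pulled from the OSD logs, assuming the default log locations:

    # current and rotated OSD log files, filtered for rocksdb corruption messages
    grep -i 'rocksdb.*corruption' /var/log/ceph/ceph-osd.12.log
    zgrep -i 'rocksdb.*corruption' /var/log/ceph/ceph-osd.12.log.*.gz
    # or via the journal on systemd-based hosts
    journalctl -u ceph-osd@12 | grep -i corruption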