[ceph-users] Re: Recover pgs from failed osds

2020-09-07 Thread Wido den Hollander
On 04/09/2020 13:50, Eugen Block wrote: Hi, Wido had an idea in a different thread [1]: you could try to advise the OSDs to compact at boot: [osd] osd_compact_on_start = true This is in master only, not yet in any release. Can you give that a shot? Wido also reported something about
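For reference, a minimal ceph.conf sketch of that suggestion, assuming a build that already ships the option (it was master-only at the time of this thread):

    [osd]
    osd_compact_on_start = true

On releases without the option, the same compaction can be done offline with ceph-kvstore-tool while the OSD is stopped, as discussed further down the thread.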

[ceph-users] Re: Recover pgs from failed osds

2020-09-05 Thread Vahideh Alinouri
osd_compact_on_start is not in the osd config reference. The [1] thread discusses the "slow rados ls" solution. Compacting the failed OSD with "ceph-kvstore-tool bluestore-kv" was done, but there was no change. After export-remove of all PGs on the failed OSD, the failed OSD started successfully! Crashing of the ceph-osd node is caused by
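For context, a hedged sketch of the two steps mentioned above, assuming the OSD daemon is stopped and the default /var/lib/ceph/osd/ceph-<id> data path; <id>, <pgid> and the export file name are placeholders:

    # offline compaction of the failed OSD's RocksDB
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact

    # list the PGs held by the OSD, then export and remove one of them
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --op list-pgs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --pgid <pgid> --op export-remove --file /backup/<pgid>.export

The export files should be kept; if the data is still needed they can be imported into a healthy OSD with --op import.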

[ceph-users] Re: Recover pgs from failed osds

2020-09-04 Thread Eugen Block
Hi, Wido had an idea in a different thread [1]: you could try to advise the OSDs to compact at boot: [osd] osd_compact_on_start = true Can you give that a shot? Wido also reported something about large OSD memory in [2], but no one has commented yet. Regards, Eugen [1]

[ceph-users] Re: Recover pgs from failed osds

2020-09-01 Thread Vahideh Alinouri
Is there any solution or advice? On Tue, Sep 1, 2020, 11:53 AM Vahideh Alinouri wrote: > One of the failed OSDs with 3G RAM started, and dump_mempools shows total RAM > usage is 18G and buffer_anon uses 17G RAM! > > On Mon, Aug 31, 2020 at 6:24 PM Vahideh Alinouri < > vahideh.alino...@gmail.com> wrote: >

[ceph-users] Re: Recover pgs from failed osds

2020-09-01 Thread Vahideh Alinouri
One of the failed OSDs with 3G RAM started, and dump_mempools shows total RAM usage is 18G and buffer_anon uses 17G RAM! On Mon, Aug 31, 2020 at 6:24 PM Vahideh Alinouri wrote: > The osd_memory_target of the failed OSD in one ceph-osd node was changed to 6G but > the other osd_memory_target is 3G; starting the failed OSD
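For reference, a sketch of how those figures are typically read, assuming access to the OSD's admin socket on the node; <id> is a placeholder:

    # per-pool memory accounting as seen by the OSD itself
    ceph daemon osd.<id> dump_mempools

    # the memory target the daemon is actually running with
    ceph daemon osd.<id> config get osd_memory_target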

[ceph-users] Re: Recover pgs from failed osds

2020-08-31 Thread Vahideh Alinouri
The osd_memory_target of the failed OSD in one ceph-osd node was changed to 6G while the other OSDs' osd_memory_target is 3G; starting the failed OSD with the 6G memory_target causes other OSDs in the ceph-osd node to go "down", and the failed OSD is still down. On Mon, Aug 31, 2020 at 2:19 PM Eugen Block wrote: > Can you try the opposite
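A minimal sketch of raising the target for only the failed OSD, assuming a release with the centralized config database (otherwise the override can go into ceph.conf under an [osd.<id>] section) and a non-containerized deployment; <id> is a placeholder and 6442450944 is 6 GiB in bytes:

    # override the memory target for a single OSD, not the whole node
    ceph config set osd.<id> osd_memory_target 6442450944

    # then start only that daemon and watch memory usage on the node
    systemctl start ceph-osd@<id>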

[ceph-users] Re: Recover pgs from failed osds

2020-08-31 Thread Eugen Block
Can you try the opposite and turn up the memory_target and only try to start a single OSD? Quoting Vahideh Alinouri: osd_memory_target was changed to 3G; starting the failed OSD causes the ceph-osd nodes to crash, and the failed OSD is still "down". On Fri, Aug 28, 2020 at 1:13 PM Vahideh Alinouri

[ceph-users] Re: Recover pgs from failed osds

2020-08-31 Thread Vahideh Alinouri
osd_memory_target was changed to 3G; starting the failed OSD causes the ceph-osd nodes to crash, and the failed OSD is still "down". On Fri, Aug 28, 2020 at 1:13 PM Vahideh Alinouri wrote: > Yes, each osd node has 7 osds with 4 GB memory_target. > > > On Fri, Aug 28, 2020, 12:48 PM Eugen Block wrote: > >> Just

[ceph-users] Re: Recover pgs from failed osds

2020-08-28 Thread Vahideh Alinouri
Yes, each OSD node has 7 OSDs with 4 GB memory_target. On Fri, Aug 28, 2020, 12:48 PM Eugen Block wrote: > Just to confirm, each OSD node has 7 OSDs with 4 GB memory_target? > That leaves only 4 GB RAM for the rest, and under heavy load the > OSDs use even more. I would suggest reducing

[ceph-users] Re: Recover pgs from failed osds

2020-08-28 Thread Eugen Block
Just to confirm, each OSD node has 7 OSDs with 4 GB memory_target? That leaves only 4 GB RAM for the rest, and under heavy load the OSDs use even more. I would suggest reducing the memory_target to 3 GB and seeing if they start successfully. Quoting Vahideh Alinouri:

[ceph-users] Re: Recover pgs from failed osds

2020-08-28 Thread Vahideh Alinouri
osd_memory_target is 4294967296. Cluster setup: 3 mons, 3 mgrs, 21 OSDs on 3 ceph-osd nodes in the lvm scenario. The ceph-osd nodes' resources are 32G RAM, 4-core CPU, 4TB OSD disks; 9 OSDs have block.wal on SSDs. The public network is 1G and the cluster network is 10G. The cluster was installed and upgraded using
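Worked out from these numbers (21 OSDs across 3 nodes, 32G RAM per node, osd_memory_target 4294967296 bytes = 4 GiB):

    21 OSDs / 3 nodes   = 7 OSDs per node
    7 x 4 GiB           = 28 GiB reserved for OSD memory targets
    32 GiB - 28 GiB     = ~4 GiB left for the OS, page cache and everything else

osd_memory_target is a target, not a hard limit, so under recovery or heavy load the OSDs can exceed it.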

[ceph-users] Re: Recover pgs from failed osds

2020-08-27 Thread Eugen Block
What is the memory_target for your OSDs? Can you share more details about your setup? You write about high memory usage; are the OSD nodes affected by the OOM killer? You could try to reduce the osd_memory_target and see if that helps bring the OSDs back up. Splitting the PGs is a very heavy
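A minimal sketch of that suggestion, assuming the centralized config database is in use; 3221225472 is 3 GiB in bytes and <id> is a placeholder:

    # lower the target for all OSDs, then restart the down OSDs
    ceph config set osd osd_memory_target 3221225472

    # verify what a given daemon will pick up
    ceph config get osd.<id> osd_memory_target

The same value can instead be set in ceph.conf under the [osd] section, followed by a daemon restart.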