[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-16 Thread duluxoz
Yeap - that was the issue: an incorrect CRUSH rule. Thanks for the help. Dulux-Oz

[ceph-users] Erasure-coded Block Device Image Creation With qemu-img - Help

2021-03-16 Thread duluxoz
Hi Guys, So, new issue (I'm gonna get the hang of this if it kills me :-) ). I have a working/healthy Ceph (Octopus) Cluster (with qemu-img, libvirt, etc. installed) and an erasure-coded pool called "my_pool". I now need to create a "my_data" image within the "my_pool" pool. As this is for a
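
For anyone following this thread: RBD images cannot keep their metadata in an erasure-coded pool, so the usual pattern is a replicated pool for the image plus --data-pool pointing at the EC pool. A minimal sketch, assuming a replicated pool named "rbd_meta" (hypothetical) alongside the thread's "my_pool":

# RBD on an EC pool requires overwrites to be enabled
ceph osd pool set my_pool allow_ec_overwrites true

# Metadata goes in the replicated pool, data in the EC pool
rbd create --size 100G --data-pool my_pool rbd_meta/my_data

# qemu-img then addresses the image via the metadata pool
qemu-img info rbd:rbd_meta/my_data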

[ceph-users] Re: Diskless boot for Ceph nodes

2021-03-16 Thread Stefan Kooman
On 3/16/21 6:37 PM, Stephen Smith6 wrote: Hey folks - thought I'd check and see if anyone has ever tried to use ephemeral (tmpfs / ramfs based) boot disks for Ceph nodes? croit.io does that quite successfully, I believe [1]. Gr. Stefan [1]: https://www.croit.io/software/features

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Tony Liu
"but you may see significant performance improvement with a second "cluster" network in a large cluster." "does not usually have a significant impact on overall performance." The above two statements look conflict to me and cause confusing. What's the purpose of "cluster" network, simply

[ceph-users] Re: Diskless boot for Ceph nodes

2021-03-16 Thread Nico Schottelius
On 2021-03-16 22:06, Stefan Kooman wrote: On 3/16/21 6:37 PM, Stephen Smith6 wrote: Hey folks - thought I'd check and see if anyone has ever tried to use ephemeral (tmpfs / ramfs based) boot disks for Ceph nodes? croit.io does that quite successfully I believe [1]. Same here at ungleich, all

[ceph-users] Re: Question about migrating from iSCSI to RBD

2021-03-16 Thread Richard Bade
Hi Justin, I did some testing with iscsi a year or so ago. It was just using standard rbd images in the backend, so yes, I think your theory of stopping iscsi to release the locks and then providing access to the rbd image would work. Rich On Wed, 17 Mar 2021 at 09:53, Justin Goetz wrote: > >
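
A rough sketch of the lock check Rich describes, with pool and image names as placeholders:

# With the iSCSI gateways stopped, confirm no locks remain
rbd lock ls rbd/my_image

# Remove a leftover lock if one survives (ids come from the ls output)
rbd lock rm rbd/my_image <lock-id> <locker>

# Then map the image directly on the new host
rbd map rbd/my_image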

[ceph-users] Question about migrating from iSCSI to RBD

2021-03-16 Thread Justin Goetz
Hello! I was hoping to inquire if anyone here has attempted similar operations, and if they ran into any issues. To give a brief overview of my situation, I have a standard octopus cluster running 15.2.2, with ceph-iscsi installed via ansible. The original scope of a project we were working

[ceph-users] Re: osd_max_backfills = 1 for one OSD

2021-03-16 Thread Frank Schilder
ceph config rm ... = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dave Hall Sent: 16 March 2021 16:41:47 To: ceph-users Subject: [ceph-users] osd_max_backfills = 1 for one OSD Hello, I've been trying to get an OSD ready for

[ceph-users] osd_max_backfills = 1 for one OSD

2021-03-16 Thread Dave Hall
Hello, I've been trying to get an OSD ready for removal for about a week. Right now I have 2 pgs backfilling and 21 in backfill_wait. Looking around I found reference to the following command (followed by my output). # ceph config dump | egrep
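
The thread converges on a per-daemon override; osd.7 below is a made-up example id:

# Raise backfill concurrency for a single OSD
ceph config set osd.7 osd_max_backfills 4

# Confirm the override and where it came from
ceph config get osd.7 osd_max_backfills
ceph config dump | grep osd_max_backfills

# Remove it again when done (Frank's "ceph config rm" above)
ceph config rm osd.7 osd_max_backfills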

[ceph-users] Re: *****SPAM***** Diskless boot for Ceph nodes

2021-03-16 Thread Marc
croit.io(?) have some solution for booting from the network. They are very active here, you will undoubtedly hear from them ;) > -Original Message- > Sent: 16 March 2021 18:37 > To: ceph-users@ceph.io > Subject: *SPAM* [ceph-users] Diskless boot for Ceph nodes > > Hey folks -

[ceph-users] Diskless boot for Ceph nodes

2021-03-16 Thread Stephen Smith6
Hey folks - thought I'd check and see if anyone has ever tried to use ephemeral (tmpfs / ramfs based) boot disks for Ceph nodes?

[ceph-users] Re: Inactive pg, how to make it active / or delete

2021-03-16 Thread Szabo, Istvan (Agoda)
Can’t bring them back; now trying to make the cluster hosts equal. The signal error is not going away when the OSD starts. I think I’ll try the ceph-objectstore-tool PG export/import on the dead OSDs and put the PGs back onto another OSD. Let’s see. > On 2021. Mar 16., at 18:54, Frank Schilder wrote: > > The PG
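
For the record, the export/import being considered looks roughly like this; the OSD ids are placeholders, the PG id is the one from this thread, and both OSDs must be stopped while the tool runs:

# On the host of the dead OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
    --op export --pgid 44.1f0 --file /tmp/pg.44.1f0.export

# On the host of a healthy (but stopped) OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --op import --file /tmp/pg.44.1f0.export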

[ceph-users] Re: Has anyone contact Data for Samsung Datacenter SSD Support ?

2021-03-16 Thread Jake Grimmett
Hi Christoph, We use the DC Toolkit to over-provision PM983 SSDs: [root@pcterm25-1g ~]# ./Samsung_SSD_DC_Toolkit_for_Linux_V2.1 -L Samsung DC Toolkit Version 2.1.L.Q.0 Copyright (C) 2017 SAMSUNG

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Andrew Walker-Brown
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/ Sent from Mail for Windows 10 From: Tony Liu Sent: 16 March 2021 16:16 To: Stefan Kooman; Dave

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Tony Liu
> -Original Message- > From: Stefan Kooman > Sent: Tuesday, March 16, 2021 4:10 AM > To: Dave Hall ; ceph-users > Subject: [ceph-users] Re: Networking Idea/Question > > On 3/15/21 5:34 PM, Dave Hall wrote: > > Hello, > > > > If anybody out there has tried this or thought about it, I'd

[ceph-users] Re: Inactive pg, how to make it active / or delete

2021-03-16 Thread Frank Schilder
The PG says blocked_by at least 2 of your down OSDs. When you look at the history (past_intervals), it needs to backfill from the down OSDs (down_osds_we_would_probe). Since it's more than 1, it can't proceed. You need to get the OSDs up. Best regards, = Frank Schilder AIT Risø
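
The fields Frank refers to can be pulled straight from the query output:

# Show which OSDs the PG is blocked by / would probe
ceph pg 44.1f0 query | grep -E 'blocked_by|down_osds_we_would_probe'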

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Stefan Kooman
On 3/15/21 5:34 PM, Dave Hall wrote: Hello, If anybody out there has tried this or thought about it, I'd like to know... I've been thinking about ways to squeeze as much performance as possible from the NICs  on a Ceph OSD node.  The nodes in our cluster (6 x OSD, 3 x MGR/MON/MDS/RGW)

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Dave Hall
Burkhard, I woke up with the same conclusion - LACP load balancing can break down when the traffic traverses a router, since the Ethernet headers then have the router as the destination address and thus carry the same two MAC addresses. (I think that in a pure layer 2 fabric the MAC
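
On Linux bonds the knob for this is the transmit hash policy; a sketch assuming an 802.3ad bond named bond0:

# Inspect the policy an existing bond is using
cat /sys/class/net/bond0/bonding/xmit_hash_policy

# Create a bond that hashes on IPs and L4 ports instead of MACs
ip link add bond0 type bond mode 802.3ad xmit_hash_policy layer3+4

With layer3+4 the TCP/UDP ports enter the hash, so flows that share the same two MAC addresses can still spread across links.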

[ceph-users] Inactive pg, how to make it active / or delete

2021-03-16 Thread Szabo, Istvan (Agoda)
Hi, I have 4 inactive PGs in my cluster; the OSDs they were on have died. How can I make them work again? Maybe just throw them away, because last_backfill=max? Based on the pg query it is totally up on other OSDs. It is an EC 3+1. This is an example inactive pg: ceph pg 44.1f0 query {

[ceph-users] Re: Unhealthy Cluster | Remove / Purge duplicate osds | Fix daemon

2021-03-16 Thread Sebastian Wagner
Hi Oliver, I don't know how you managed to remove all MGRs from the cluster, but there is the documentation to manually recover from this: > https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon Hope that helps, Sebastian Am 15.03.21 um 18:24 schrieb Oliver

[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-16 Thread duluxoz
Ah, right, that makes sense - I'll have a go at that. Thank you. On 16/03/2021 19:12, Janne Johansson wrote: pgs: 88.889% pgs not active 6/21 objects misplaced (28.571%) 256 creating+incomplete For new clusters, "creating+incomplete" sounds like you

[ceph-users] Re: should I increase the amount of PGs?

2021-03-16 Thread Boris Behrens
Hi Dan, my EC profile looks very "default" to me. [root@s3db1 ~]# ceph osd erasure-code-profile ls default [root@s3db1 ~]# ceph osd erasure-code-profile get default k=2 m=1 plugin=jerasure technique=reed_sol_van I don't understand the output, but the balancing got worse over night: [root@s3db1
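
For reference, profiles are created rather than edited, and an existing pool keeps the profile it was built with. A hypothetical wider profile next to the default k=2 m=1 shown above:

# Define a new profile (does not touch existing pools)
ceph osd erasure-code-profile set ec42 \
    k=4 m=2 crush-failure-domain=host

ceph osd erasure-code-profile get ec42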

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Burkhard Linke
Hi, On 16.03.21 03:40, Dave Hall wrote: Andrew, I agree that the choice of hash function is important for LACP. My thinking has always been to stay down in layers 2 and 3.  With enough hosts it seems likely that traffic would be split close to evenly.  Heads or tails - 50% of the time

[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-16 Thread Janne Johansson
> pgs: 88.889% pgs not active > 6/21 objects misplaced (28.571%) > 256 creating+incomplete For new clusters, "creating+incomplete" sounds like you created a pool (with 256 PGs) with some crush rule that doesn't allow it to find suitable placements, like
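
That diagnosis can be checked directly; the pool and rule names below are placeholders:

# Which CRUSH rule is the stuck pool using?
ceph osd pool get <pool> crush_rule

# What does that rule demand (failure domain, device class)?
ceph osd crush rule dump <rule>

# If the rule can't be satisfied, point the pool at one that can be
ceph osd pool set <pool> crush_rule <other-rule>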

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Dave Hall
Andrew, I agree that the choice of hash function is important for LACP. My thinking has always been to stay down in layers 2 and 3.  With enough hosts it seems likely that traffic would be split close to evenly.  Heads or tails - 50% of the time you're right.  TCP ports should also be nearly

[ceph-users] Re: Safe to remove osd or not? Which statement is correct?

2021-03-16 Thread Szabo, Istvan (Agoda)
Hi Boris, Yeah, this is the reason: -1> 2021-03-15T16:21:35.307+0100 7f8b1fd8d700 5 prioritycache tune_memory target: 4294967296 mapped: 454098944 unmapped: 8560640 heap: 462659584 old mem: 2845415832 new mem: 2845415832 0> 2021-03-15T16:21:35.311+0100 7f8b11570700 -1 *** Caught signal

[ceph-users] Re: millions slow ops on a cluster without load

2021-03-16 Thread Szabo, Istvan (Agoda)
Yeah, the MTU on the cluster network's NIC cards is 8982, and ping works with 8954-byte packets between the interfaces. On 2021. Mar 15., at 23:40, Matthew H wrote:  Might be an MTU problem, have you checked your network and MTU settings? From: Szabo, Istvan
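
The numbers line up: with a 20-byte IP header and an 8-byte ICMP header, an 8982-byte MTU leaves exactly 8954 bytes of ping payload. A quick way to verify end to end, with the peer address as a placeholder:

# Confirm the interface MTU
ip link show dev eth1 | grep -o 'mtu [0-9]*'

# Ping with Don't-Fragment set; payload = MTU - 28
ping -M do -s 8954 <cluster-peer-ip>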