12.2.5 on a Proxmox cluster.
6 nodes, about 50 OSDs, bluestore and cache tiering on an EC pool. Mostly
spinners with an SSD OSD drive and an SSD WAL/DB drive on each node. PM863
SSDs with ~75%+ endurance remaining.
It had been running relatively okay, aside from some spinner failures, until I
checked today
Hi Dan,
'noup' now makes a lot of sense - that's probably the major help that
our cluster start would have needed. Essentially this way only one map
change occurs in the cluster when all the OSDs are marked 'in' and that
gets distributed, vs hundreds or thousands of map changes as various
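The 'noup' startup sequence described above can be sketched as follows (the systemd target is how recent deployments start OSDs; adjust to your init system):

```shell
# Prevent booting OSDs from being marked 'up' one at a time,
# so the monitors publish essentially one map change at the end.
ceph osd set noup

# Start all OSDs on each node (systemd target shown; adjust as needed).
systemctl start ceph-osd.target

# Once every OSD has booted and peered, clear the flag so they
# all come up together.
ceph osd unset noup
```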
Thanks, Roman.
My RDMA is working correctly, I'm pretty sure of that for two reasons.
(1) E8 Storage agent running on all OSDs uses RDMA to communicate with our E8
Storage controller and it's working correctly at the moment. The volumes are
available and IO can be done at full line rate and
Hey Andras,
Three mons is possibly too few for such a large cluster. We've had lots of
good stable experience with 5-mon clusters. I've never tried 7, so I can't
say if that would lead to other problems (e.g. leader/peon sync
scalability).
That said, our 1-osd bigbang tests managed with only
Forgot to mention: all nodes are on Luminous 12.2.8 currently on CentOS 7.5.
On 12/19/18 5:34 PM, Andras Pataki wrote:
Dear ceph users,
We have a large-ish ceph cluster with about 3500 osds. We run 3 mons on
dedicated hosts, and the mons typically use a few percent of a core, and
generate about 50Mbits/sec network traffic. They are connected at
20Gbits/sec (bonded dual 10Gbit) and are running on 2x14 core
I would be interested in learning about the performance increase it has
compared to 10Gbit. I got the ConnectX-3 Pro, but I am not using RDMA
because support is not available by default.
sockperf ping-pong -i 192.168.2.13 -p 5001 -m 16384 -t 10 --pps=max
sockperf: Warmup stage (sending a few
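For reference, the client invocation above needs a matching sockperf server on the target host (the IP and port here just mirror the client command):

```shell
# On 192.168.2.13: listen on the same address/port the client targets.
sockperf server -i 192.168.2.13 -p 5001
```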
I’ve been struggling mightily with getting a realm/zonegroup/zone configuration
that works for me, and I find it difficult to imagine this isn’t a very common
issue. So I might assume I’m thinking about this incorrectly.
We have several clusters, geographically dispersed. I want them all to
Hi Wiko,
I would like to express my gratitude. Indeed setting tunables to optimal solved
the issue!
But it was a nightmare for 4 days, OMG! 54% of the data moving, mons losing
sync every few seconds, OSDs going down continuously, OSDs falsely marked
down, mon elections every few seconds,
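For anyone following along, "setting tunables to optimal" is the one-liner below; as the report above shows, be warned that it can trigger a massive rebalance:

```shell
# Switch CRUSH tunables to the optimal profile for the running release.
# Expect heavy data movement (the poster saw ~54% of data moving).
ceph osd crush tunables optimal
```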
Thanks for the insights Mohammad and Roman. Interesting read.
My interest in RDMA is purely from testing perspective.
Still, I would be interested if somebody who has RDMA enabled and running could
share their ceph.conf.
My RDMA related entries are taken from Mellanox blog here
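For comparison, the async+rdma settings commonly shown in such guides look roughly like this; the device name is a placeholder for your HCA (check with ibstat), so treat the values as assumptions to adapt:

```ini
[global]
# Use the RDMA-backed async messenger instead of plain TCP.
ms_type = async+rdma
# Placeholder: the RDMA device name of your HCA, e.g. from ibstat.
ms_async_rdma_device_name = mlx5_0
```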
Hi all,
we’re running a ceph hammer cluster with 3 mons and 24 osds (across 3
identical nodes) and need to migrate all servers to a new datacenter and
change the IPs of the nodes.
I found this tutorial:
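The usual approach for changing monitor IPs is to extract, edit, and re-inject the monmap; a sketch, where the mon id ("mon-a") and the new address are placeholders:

```shell
# Stop the monitor first, then rewrite its map with the new address.
ceph-mon -i mon-a --extract-monmap /tmp/monmap   # dump the current monmap
monmaptool --rm mon-a /tmp/monmap                # remove the old entry
monmaptool --add mon-a 10.0.0.1:6789 /tmp/monmap # add it back with the new IP
ceph-mon -i mon-a --inject-monmap /tmp/monmap    # write the edited map back
```

Repeat per monitor, keeping a quorum alive while you move them one at a time.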
Glad to hear it helped.
This particular option is ultra dangerous, so imho its obfuscated name is
just perfect!
Finally, since I didn't mention it earlier, don't forget to disable the
option and restart the relevant OSDs now that they're active again. And it
would be sensible to deep scrub that PG
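Once the flag is off and the OSDs are back, that final deep-scrub step is just the command below (the PG id is a placeholder for the affected one):

```shell
# Verify object consistency on the recovered placement group.
ceph pg deep-scrub 1.2f
```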
Ciao Dan,
thanks a lot for your message! :-)
Indeed, the procedure you outlined did the trick and I am now back to
healthy state.
--yes-i-really-really-love-ceph-parameter-names !!!
Ciao ciao
Fulvio
Version: Mimic 13.2.2
Lately during any kind of cluster change, particularly adding OSD in
this most recent instance, I'm seeing our mons (all of them) pegging a
single core at 100% while leaving the other available cores on the system
idle. Cluster commands are slow to respond
Thanks for your prompt reply
Volumes are being created just fine in the "volumes" pool but they are not
bootable
Also, ephemeral instances are working fine (disks are being created on the
dedicated ceph pool "instances")
Access for cinder user from compute node is fine
[root@ops-ctrl-new
Hi,
can you explain in more detail what exactly goes wrong?
In many cases it's an authentication error, can you check if your
specified user is allowed to create volumes in the respective pool?
You could try something like this (from compute node):
rbd --user -k
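Spelled out, such a check might look like this; the user name, keyring path, and pool follow the standard rbd-openstack setup, so treat them as assumptions:

```shell
# From the compute node: can the cinder user create an image in the
# volumes pool with its own keyring?
rbd --id cinder -k /etc/ceph/ceph.client.cinder.keyring \
    create volumes/auth-test --size 128

# Remove the test image afterwards.
rbd --id cinder -k /etc/ceph/ceph.client.cinder.keyring \
    rm volumes/auth-test
```

If the create fails with a permission error, the caps on the cinder key are the first thing to fix.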
Hi,
I'd appreciate it if someone could provide some guidance on troubleshooting /
setting up OpenStack (Rocky) + Ceph (Mimic) so that volumes created on
ceph are bootable.
I have followed this http://docs.ceph.com/docs/mimic/rbd/rbd-openstack/
enabled debug in both nova and cinder but still not
Hello everyone,
I am running mds + mon on 3 nodes. Recently due to increased cache pressure
and NUMA non-interleave effect, we decided to double the memory on the nodes
from 32 G to 64 G.
We wanted to upgrade a standby node first to be able to test the new memory vendor.
So without much
Hi,
I'm just getting started with the Admin Ops API and wonder how to deal with
jobs that take hours or even days.
Worst example: I have a user with tons of files. When I send the DELETE
/admin/user request with purge-data=True, the HTTPS request times out,
but the job is still being processed. I have no
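One way around the HTTP timeout is to run the equivalent operation server-side with radosgw-admin, which has no request timeout (the uid here is a placeholder):

```shell
# Same effect as DELETE /admin/user with purge-data=True, but run
# locally on the RGW host; it blocks until all objects are removed.
radosgw-admin user rm --uid=bigdata-user --purge-data
```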