[ceph-users] Understanding monitor requirements

2020-04-09 Thread Brian Topping
Hi experts, question about monitors and latency. I am setting up a new cluster and I’d like to have more than one monitor. Unfortunately, the primary site only has two chassis, so to get the third mon, I’ve been trying to bring it up remotely. So far, it’s not working and I wonder if someone
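A rough way to gauge whether a remote third monitor can keep up is to check quorum state and round-trip latency from one of the existing mon hosts. A general sketch, not taken from this thread; mon names and hosts are placeholders:

    ceph mon stat                             # which mons exist and which are in quorum
    ceph quorum_status --format json-pretty   # quorum details and election epoch
    ceph ping mon.c                           # per-mon RTT statistics, if your release supports it
    ping -c 10 <remote-mon-host>              # plain network latency to the remote site

Keep in mind that with two local mons and one remote mon, losing the primary site still loses quorum, since a majority of monitors is required.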

[ceph-users] ceph-mgr with large connections in CLOSE_WAIT state

2020-04-09 Thread Void Star Nill
Hi, I am seeing a large number of connections from ceph-mgr stuck in the CLOSE_WAIT state with data stuck in the receive queue. It looks like the ceph-mgr process is not reading the data completely off the socket buffers and not terminating the connections properly. I also notice that the access to
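A quick way to confirm the symptom and work around it, sketched with generic commands (the socket filter and mgr name are illustrative):

    ss -tanp state close-wait | grep ceph-mgr          # CLOSE_WAIT sockets held by ceph-mgr, with Recv-Q sizes
    ss -tanp state close-wait | grep ceph-mgr | wc -l  # how many there are
    ceph mgr fail <mgr-name>                           # fail over to a standby mgr as a temporary workaround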

[ceph-users] How to fix 1 pg stale+active+clean

2020-04-09 Thread Marc Roos
How to fix 1 pg marked as stale+active+clean? pg 30.4 is stuck stale for 175342.419261, current state stale+active+clean, last acting [31]
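For a PG that is stale because its only acting OSD has gone away, a generic first pass is to check the mapping and bring osd.31 back. A sketch; the systemd unit name assumes a package-based (non-containerized) install:

    ceph pg map 30.4                 # current up/acting mapping of the PG
    ceph osd tree                    # check whether osd.31 is up and in
    systemctl restart ceph-osd@31    # on the host that carries osd.31
    ceph pg 30.4 query               # note: may hang while the PG is still stale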

[ceph-users] Re: Fwd: question on rbd locks

2020-04-09 Thread Void Star Nill
Thanks Ilya, Paul. I don't have the panic traces and they are probably not related to rbd. I was merely describing our use case. On the setup that we manage, we have a software layer similar to Kubernetes CSI that orchestrates the volume map/unmap on behalf of the users. We are currently using
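For reference, stale exclusive locks left behind by crashed clients can be inspected and cleared with the rbd CLI. A general sketch; pool, image, lock id and locker are placeholders:

    rbd status mypool/myimage                             # watchers currently attached to the image
    rbd lock list mypool/myimage                          # lock id and locker (client address)
    rbd lock remove mypool/myimage <lock-id> <locker>     # break a stale lock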

[ceph-users] Re: Fwd: Question on rbd maps

2020-04-09 Thread Void Star Nill
Thanks Ilya. Is there a more deterministic way to know where the volumes are mapped? Thanks, Shridhar On Wed, 8 Apr 2020 at 03:06, Ilya Dryomov wrote: > A note of caution, though. "rbd status" just lists watches on the > image header object and a watch is not a reliable indicator of
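As the thread notes, a watch on the header object is only a hint. On the client side the mapping can be checked directly; a sketch with placeholder pool/image names:

    rbd showmapped                   # kernel rbd devices mapped on this host
    rbd status mypool/myimage        # watchers (likely mappers) as seen from the cluster side
    ls /sys/bus/rbd/devices/         # krbd sysfs entries, one per mapped device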

[ceph-users] v15.2.1 Octopus released

2020-04-09 Thread Abhishek
This is the first bugfix release of Ceph Octopus; we recommend all Octopus users upgrade. This release fixes an upgrade issue and also has two security fixes. Notable Changes: * issue#44759: Fixed luminous->nautilus->octopus upgrade asserts * CVE-2020-1759: Fixed nonce reuse in
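After upgrading, a quick way to confirm that every daemon is actually running the new release:

    ceph versions        # daemon counts per running version; all should report 15.2.1
    ceph health detail   # surfaces any leftover upgrade warnings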

[ceph-users] Re: Balancer not balancing (14.2.7, crush-compat)

2020-04-09 Thread Vladimir Brik
One possibly relevant detail: the cluster has 8 nodes, and the new pool I created uses k5 m2 erasure coding. Vlad On 4/9/20 11:28 AM, Vladimir Brik wrote: Hello I am running ceph 14.2.7 with the balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing
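That detail may matter: with k=5, m=2 and a host failure domain, each PG needs 7 of the 8 hosts, which can leave the balancer very little room to move data around. A quick way to see the per-OSD spread for the pool (pool name is a placeholder):

    ceph osd df tree                  # per-OSD utilization and PG counts
    ceph pg ls-by-pool mypool | head  # which OSDs the pool's PGs land on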

[ceph-users] Balancer not balancing (14.2.7, crush-compat)

2020-04-09 Thread Vladimir Brik
Hello I am running ceph 14.2.7 with the balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing anything. It used to work in the past. I am not sure what changed. I created a big pool, ~285TB stored, and it doesn't look like it ever got balanced: pool 43
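For reference, the mgr balancer can be inspected and driven by hand; a generic sketch (the plan name is arbitrary):

    ceph balancer status             # mode, active flag, existing plans
    ceph balancer eval               # current distribution score (lower is better)
    ceph balancer optimize myplan    # ask the balancer for a plan
    ceph balancer show myplan        # inspect the proposed changes
    ceph balancer execute myplan     # apply the plan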

[ceph-users] Bucket index entries containing unicode NULL - how to remove them?

2020-04-09 Thread Maks Kowalik
Hello, for some time I've been investigating problems causing bucket index corruption. In my case it's been because of numerous bugs related to index resharding and bucket lifecycle policies. One of those bugs, present in versions prior to 14.2.8, made the index omapkeys' names contain unicode
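For inspecting such entries, the bucket's index omap keys can be dumped directly. A sketch assuming the default index pool name and a placeholder bucket; sharded buckets have one .dir.<marker>.<shard> object per shard:

    radosgw-admin bucket stats --bucket=mybucket                    # note the bucket "id"/"marker"
    rados -p default.rgw.buckets.index listomapkeys .dir.<marker>   # raw index keys, including damaged ones
    radosgw-admin bucket check --bucket=mybucket --fix              # let rgw repair what it can

Removing individual keys with "rados rmomapkey" is possible, but keys containing a NULL byte need binary-safe handling, so check which options your rados CLI version offers before attempting it.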

[ceph-users] Re: remove S3 bucket with rados CLI

2020-04-09 Thread Dan van der Ster
On Thu, Apr 9, 2020 at 3:25 PM Robert Sander wrote: > > Hi Dan, > > On 09.04.20 at 15:08, Dan van der Ster wrote: > > > > What do you have for full_ratio? > > The cluster is running Nautilus and the ratios should still be the > default values. Currently I have no direct access to report them. >

[ceph-users] Re: remove S3 bucket with rados CLI

2020-04-09 Thread Robert Sander
Hi Dan, On 09.04.20 at 15:08, Dan van der Ster wrote: > > What do you have for full_ratio? The cluster is running Nautilus and the ratios should still be the default values. Currently I have no direct access to report them. > Maybe you can unblock by setting the full_ratio to 0.96? We will

[ceph-users] Re: remove S3 bucket with rados CLI

2020-04-09 Thread Dan van der Ster
Hi, What do you have for full_ratio? Here are the defaults: # ceph osd dump | grep full full_ratio 0.95 backfillfull_ratio 0.9 nearfull_ratio 0.85 Maybe you can unblock by setting the full_ratio to 0.96? -- Dan On Thu, Apr 9, 2020 at 3:00 PM Robert Sander wrote: > > Hi, > > is it possible
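For reference, the ratio can be raised temporarily to unblock deletions and should be set back afterwards; a sketch, not a recommendation for permanent values:

    ceph osd dump | grep full        # current ratios
    ceph osd set-full-ratio 0.96     # temporary, to let delete operations proceed
    ceph osd set-full-ratio 0.95     # restore the default once space has been freed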

[ceph-users] remove S3 bucket with rados CLI

2020-04-09 Thread Robert Sander
Hi, is it possible to remove an S3 bucket and its S3 objects with the rados CLI tool (removing the low-level Ceph objects)? The situation is a nearfull cluster with OSDs filled more than 80% and the default.rgw.buckets.data pool being reported as 100% full. Read operations are still possible but no
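Deleting the raw RADOS objects directly would bypass the bucket index and garbage collection, so the usual route is radosgw-admin once the cluster accepts deletes again. A sketch with a placeholder bucket name:

    radosgw-admin bucket rm --bucket=mybucket --purge-objects   # delete the bucket and its objects
    radosgw-admin gc process                                    # push garbage collection along

Adding --bypass-gc to bucket rm deletes the data objects directly instead of queueing them for GC, which can help on a cluster that is short on space.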

[ceph-users] Re: Using M2 SSDs as osds

2020-04-09 Thread Martin Verges
Hello Felix, the lifetime is not a matter of the connector/slot type; it's about the disk itself. Check the datasheet for TBW and make sure your drive is suitable for Ceph. If both are ok, M.2 is absolutely fine. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail:
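To judge endurance, compare the drive's rated TBW/DWPD from the datasheet with what it has already written; a sketch using placeholder device names:

    smartctl -a /dev/nvme0       # NVMe health, including Percentage Used and Data Units Written
    nvme smart-log /dev/nvme0    # the same counters via nvme-cli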

[ceph-users] Re: MDS: cache pressure warnings with Ganesha exports

2020-04-09 Thread Jeff Layton
On Tue, 2020-04-07 at 07:34 +, Stolte, Felix wrote: > Hey folks, > > I keep getting ceph health warnings about clients failing to respond to > cache pressure. They always refer to sessions from ganesha exports. I've read > all threads regarding this issue, but none of my changes resolved
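On the Ceph side, the MDS cache and the sessions behind the warning can be inspected as follows (the MDS name and cache size are placeholders; tuning Ganesha itself, e.g. lowering its Entries_HWMark, is a separate knob in ganesha.conf):

    ceph health detail                                       # which client sessions are failing to respond
    ceph tell mds.<name> client ls                           # session list, including the Ganesha client
    ceph daemon mds.<name> cache status                      # current MDS cache usage (run on the MDS host)
    ceph config set mds mds_cache_memory_limit 8589934592    # e.g. raise the MDS cache limit to 8 GiB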

[ceph-users] Re: MDS: cache pressure warnings with Ganesha exports

2020-04-09 Thread Eugen Block
Hi Felix, we've had cache pressure messages for a long time in our small production cluster without seeing any negative impact on clients (or the cluster). We don't use Ganesha, but we export CephFS directories via NFS. I guess NFS is the common denominator here. In our case we started with

[ceph-users] PGs unknown (osd down) after conversion to cephadm

2020-04-09 Thread Dr. Marco Savoca
Hi all, last week I successfully upgraded my cluster to Octopus and converted it to cephadm. The conversion process (according to the docs) went well and the cluster ran in an active+clean status. But after a reboot, all OSDs went down within a couple of minutes, and all (100%)
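To narrow down why the OSD containers do not come back after a reboot, a generic sketch (the fsid and daemon ids are placeholders):

    ceph orch ps                                     # cephadm's view of all daemons and their status
    cephadm ls                                       # run on the affected host: locally deployed daemons
    systemctl status ceph-<fsid>@osd.<id>.service    # the systemd unit cephadm created for the OSD
    journalctl -u ceph-<fsid>@osd.<id>.service -b    # daemon logs since the reboot
    ceph orch daemon restart osd.<id>                # restart a single OSD via the orchestrator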

[ceph-users] Using M2 SSDs as osds

2020-04-09 Thread Stolte, Felix
Hey guys, I am evaluating M.2 SSDs as OSDs for an all-flash pool. Is anyone using them in production who can elaborate on their experience? I am a little concerned about the lifetime of the M.2 disks. Best regards Felix IT-Services Telefon 02461 61-9243 E-Mail: f.sto...@fz-juelich.de