[ceph-users] Re: Migrating clusters (and versions)

2020-05-13 Thread Konstantin Shalygin
On 5/8/20 2:32 AM, Kees Meijs wrote: I'm in the middle of an OpenStack migration (obviously Ceph backed) and stumbled into some huge virtual machines. To ensure downtime is kept to a minimum, I'm thinking of using Ceph's snapshot features via rbd export-diff and import-diff. However, is it
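
A minimal sketch of the snapshot-based approach Kees describes, assuming hypothetical pool/image names and that the target cluster is reachable from the same host via a second cluster config named "target":

    # initial bulk copy while the VM keeps running
    rbd snap create src-pool/vm-disk@migrate1
    rbd export src-pool/vm-disk@migrate1 - | rbd --cluster target import - dst-pool/vm-disk
    rbd --cluster target snap create dst-pool/vm-disk@migrate1
    # during the short downtime window, ship only the blocks changed since migrate1
    rbd snap create src-pool/vm-disk@migrate2
    rbd export-diff --from-snap migrate1 src-pool/vm-disk@migrate2 - | rbd --cluster target import-diff - dst-pool/vm-disk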

[ceph-users] What is ceph doing after sync

2020-05-13 Thread Zhenshi Zhou
Hi, I deployed a multi-site in order to sync data from one cluster to another. The data is fully synced (I suppose) and the cluster has no traffic at present. Everything seems fine. However, the sync status is not what I expected. Is there any step after data transfer? Can I change the master zone
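
A quick way to see where the sync actually stands (run against the secondary zone's gateway; the zone name below is only a placeholder):

    radosgw-admin sync status
    # per-shard detail for data sync from the master zone
    radosgw-admin data sync status --source-zone=master-zone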

[ceph-users] Re: Memory usage of OSD

2020-05-13 Thread Amudhan P
For Ceph releases before Nautilus, osd_memory_target changes only take effect after restarting the OSD service. I had a similar issue in Mimic and did the same in my test setup. Before restarting the OSD service, ensure you set osd nodown and osd noout (and similar commands) to ensure it doesn't trigger OSD down and
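
A rough sketch of that procedure for a pre-Nautilus node (the 2 GiB value and osd.12 are only examples; on those releases the option is read from ceph.conf at startup):

    ceph osd set noout      # keep CRUSH from rebalancing while the OSD is down
    ceph osd set nodown     # keep the restart from being reported as a failure
    # in /etc/ceph/ceph.conf on the OSD host:
    #   [osd]
    #   osd_memory_target = 2147483648
    systemctl restart ceph-osd@12
    ceph osd unset nodown
    ceph osd unset noout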

[ceph-users] What is a pgmap?

2020-05-13 Thread Bryan Henderson
I'm surprised I couldn't find this explained anywhere (I did look), but ... What is the pgmap and why does it get updated every few seconds on a tiny cluster that's mostly idle? I do know what a placement group (PG) is and that when documentation talks about placement group maps, it is talking
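
For watching this yourself, two harmless read-only commands (exact output format differs a bit between releases):

    ceph pg stat            # one-line summary of the current pgmap
    ceph pg dump summary    # includes the pgmap version/stamp that keeps ticking up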

[ceph-users] Re: iscsi issues with ceph (Nautilus) + tcmu-runner

2020-05-13 Thread Mike Christie
On 5/13/20 3:21 AM, Phil Regnauld wrote: > So, we've been running with iscsi enabled (tcmu-runner) on our Nautilus ceph > cluster for a couple of weeks, and started using it with our vsphere cluster. > Things looked good so we put it in production, but yesterday morning we > experienced a freeze

[ceph-users] Re: Memory usage of OSD

2020-05-13 Thread Mark Nelson
Coincidentally Adam on our core team just reported this morning that he saw extremely high bluestore_cache_other memory usage while running compression performance tests as well.  That may indicate we have a memory leak related to the compression code.  I doubt setting the memory_target to
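
To see which mempool is actually growing, the per-OSD mempool stats can be pulled from the admin socket on the OSD host (osd.0 is an example):

    ceph daemon osd.0 dump_mempools
    # bluestore_cache_other holding most of the bytes would match Adam's observation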

[ceph-users] Re: Memory usage of OSD

2020-05-13 Thread Rafał Wądołowski
Mark, Unfortunately I closed the terminal with mempool. But there were a lot of bytes used by bluestore_cache_other. That was the highest value (about 85%). The onode cache takes about 10%. PGlog and osdmaps were okay, low values. I saw some ideas that maybe compression_mode force on a pool can make a
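
To check (and, if suspected, disable) forced compression on a pool, something along these lines, with mypool standing in for the real pool name:

    ceph osd pool get mypool compression_mode
    ceph osd pool set mypool compression_mode none   # turn it off if it was set to force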

[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-13 Thread Mark Nelson
Hi Harald, I was thinking just changing the config setting for the pglog length.  Having said that, if you only have 123 PGs per OSD max and 8.5GB of pglog memory usage that sounds like a bug to me.  Can you create a tracker ticket with the ceph version and associated info?  One of the
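
If the workaround is needed anyway, the pglog length is capped by osd_min_pg_log_entries / osd_max_pg_log_entries; a hedged sketch (the values are only illustrative, and ceph config set needs Mimic or newer, otherwise put the options in ceph.conf and restart the OSDs):

    ceph config set osd osd_max_pg_log_entries 3000
    ceph config set osd osd_min_pg_log_entries 1500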

[ceph-users] Re: Memory usage of OSD

2020-05-13 Thread Mark Nelson
On 5/13/20 12:43 AM, Rafał Wądołowski wrote: Hi, I noticed a strange situation in one of our clusters. The OSD daemons are taking too much RAM. We are running 12.2.12 and have the default configuration of osd_memory_target (4GiB). Heap dump shows: osd.2969 dumping heap profile now.
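
For the heap dumps mentioned above, the tcmalloc heap commands can be driven like this (osd.2969 is the OSD from the quoted dump; the OSD must be built with tcmalloc):

    ceph tell osd.2969 heap stats     # quick tcmalloc summary
    ceph tell osd.2969 heap dump      # write a heap profile for later pprof analysis
    ceph tell osd.2969 heap release   # ask tcmalloc to return freed memory to the OS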

[ceph-users] Re: Cluster network and public network

2020-05-13 Thread Frank Schilder
Dear all, looks like I need to be more precise: >>> I think, however, that a disappearing back network has no real >>> consequences as the heartbeats always go over both. >> >> FWIW this has not been my experience, at least through Luminous. >> >> What I’ve seen is that when the

[ceph-users] Re: Erasure coded pool queries

2020-05-13 Thread Thomas Byrne - UKRI STFC
Aleksey Gutikov wrote a detailed response to a similar question last year, maybe this will help? http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036318.html I haven't done much looking into this, but for your question, I believe the two options that control the size of objects on
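
Independent of that link, the profile actually applied to an EC pool (k, m, stripe_unit and so on) can be inspected directly; myprofile is a placeholder:

    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get myprofile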

[ceph-users] Luminous to Nautilus mon upgrade oddity - failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer

2020-05-13 Thread Thomas Byrne - UKRI STFC
Hi all, We're upgrading a cluster from luminous to nautilus. The monitors and managers are running a non-release version of luminous (12.2.12-642-g5ff3e8e) and we're upgrading them to 14.2.9. We've upgraded one monitor and it's happily in quorum as a peon. However, when a ceph status hits

[ceph-users] Re: Disproportionate Metadata Size

2020-05-13 Thread Denis Krienbühl
Sure, the db device has a size of 22.5G, the primary device has 100G. Here’s the complete ceph osd df output of one of the OSDs experiencing this issue:

ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP META AVAIL %USE VAR PGS
14 hdd   0.11960 1.0      122 GiB 118 GiB 2.4 GiB 0 B

[ceph-users] Re: Disproportionate Metadata Size

2020-05-13 Thread Eugen Block
Hi Denis, I had the exact same issue in a (virtual) Luminous cluster without much data in it. The root cause was that my OSDs were too small (10 GB only) and the rocksDB also grew until manual compaction. I had configured the small OSDs intentionally because it was never supposed to
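
The manual compaction Eugen mentions can be triggered per OSD over the admin socket (osd.3 is an example; it adds I/O load while it runs):

    ceph daemon osd.3 compact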

[ceph-users] Re: Disproportionate Metadata Size

2020-05-13 Thread Paul Emmerich
osd df is misleading when using external DB devices; they are always counted as 100% full there. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, May 13, 2020

[ceph-users] Re: Cluster network and public network

2020-05-13 Thread Stefan Kooman
On 2020-05-12 18:59, Anthony D'Atri wrote: I think, however, that a disappearing back network has no real consequences as the heartbeats always go over both. FWIW this has not been my experience, at least through Luminous. What I’ve seen is that when the cluster/replication net is

[ceph-users] Disproportionate Metadata Size

2020-05-13 Thread Denis Krienbühl
Hi On one of our Ceph clusters, some OSDs have been marked as full. Since this is a staging cluster that does not have much data on it, this is strange. Looking at the full OSDs through “ceph osd df” I figured out that the space is mostly used by metadata: SIZE: 122 GiB USE: 118 GiB
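
One way to see whether that usage is RocksDB/BlueFS rather than object data is the bluefs counters in the OSD's perf dump (osd.14 matches the osd df output quoted further down the thread; counter names come from BlueStore):

    ceph daemon osd.14 perf dump | grep -E '"db_total_bytes"|"db_used_bytes"|"slow_used_bytes"'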

[ceph-users] Ceph Nautilus packages for Ubuntu Focal

2020-05-13 Thread Stefan Kooman
Hi list, We're wondering if Ceph Nautilus packages will be provided for Ubuntu Focal Fossa (20.04)? You might wonder why one would not just use Ubuntu Bionic (18.04) instead of using the latest LTS. Here is why: a glibc bug in Ubuntu Bionic that *might* affect Open vSwitch (OVS) users [1].

[ceph-users] iscsi issues with ceph (Nautilus) + tcmu-runner

2020-05-13 Thread Phil Regnauld
So, we've been running with iscsi enabled (tcmu-runner) on our Nautilus ceph cluster for a couple of weeks, and started using it with our vsphere cluster. Things looked good so we put it in production, but yesterday morning we experienced a freeze of all iSCSI I/O on one of the ESXi nodes, and the

[ceph-users] Re: Cluster network and public network

2020-05-13 Thread Anthony D'Atri
> > I did not mean to have a back network configured but it is taken down. Of > course this won't work. What I mean is that you: > > 1. remove the cluster network definition from the cluster config (ceph.conf > and/or ceph config ...) > 2. restart OSDs to apply the change > 3. remove the
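
A hedged sketch of steps 1–2 on Mimic or newer (on older releases the line would instead be removed from ceph.conf on every host before the restart):

    ceph config rm global cluster_network     # step 1: drop the cluster network definition
    systemctl restart ceph-osd.target         # step 2: restart the OSDs, one host at a time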

[ceph-users] Re: Difficulty creating a topic for bucket notifications

2020-05-13 Thread Yuval Lifshitz
Hi Alexis, Which version are you using? There was a bug in 14.2.8 with topic creation. See: https://tracker.ceph.com/issues/44614 Also note that for topic operations we are using a different signature version (ver3) than the one used by default by the aws CLI tools. Please see here: