[ceph-users] Re: Global AVAIL vs Pool MAX AVAIL

2021-01-11 Thread Mark Johnson
Thanks Anthony. Shortly after I made that post, I found a Server Fault post where someone had asked the exact same question. The reply was this - "The 'MAX AVAIL' column represents the amount of data that can be used before the first OSD becomes full. It takes into account the projected
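
A minimal sketch of how to see which OSD is driving that limit, assuming standard Ceph CLI tooling (column names vary slightly by release):

    # Per-pool MAX AVAIL is derived from the most-full OSD reachable by the
    # pool's CRUSH rule, scaled down by the replica count or EC overhead.
    ceph df detail     # global AVAIL plus per-pool MAX AVAIL
    ceph osd df tree   # per-OSD utilisation; the fullest OSD caps MAX AVAIL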

[ceph-users] Re: Ceph 15.2.3 on Ubuntu 20.04 with odroid xu4 / python thread Problem

2021-01-11 Thread Oliver Weinmann
Hi again, it took me some time but I figured out that on Ubuntu focal there is a more recent version of Ceph (15.2.7) available. So I gave it a try and replaced the ceph_argparse.py file, but it still got stuck running the command: [2021-01-11 23:44:06,340][ceph_volume.process][INFO  ] Running
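
A hedged pointer for digging further, assuming the default log location: ceph-volume records every subprocess it runs in its own log.

    tail -f /var/log/ceph/ceph-volume.log   # follow the hung "Running ..." command and any output it produced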

[ceph-users] denied reconnect attempt for ceph fs client

2021-01-11 Thread Frank Schilder
Hi all, I'm not 100% sure, but I believe that since the update from mimic-13.2.8 to mimic-13.2.10 I have a strange issue. If a ceph fs client becomes unresponsive, it is evicted, but it cannot reconnect; see ceph.log extract below. In the past, clients would retry after the blacklist period
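
A hedged sketch of the commands for inspecting the blacklist that eviction creates (standard Ceph CLI; the client address below is a placeholder):

    ceph osd blacklist ls                        # entries added when a client is evicted
    ceph osd blacklist rm 10.0.0.15:0/123456789  # hypothetical address; clears it before the blacklist period expires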

[ceph-users] Global AVAIL vs Pool MAX AVAIL

2021-01-11 Thread Mark Johnson
Can someone please explain to me the difference between the Global "AVAIL" and the "MAX AVAIL" in the pools table when I do a "ceph df detail"? The reason being that we have a total of 14 pools, however almost all of our data exists in one pool. A "ceph df detail" shows the following:

[ceph-users] Re: bluefs_buffered_io=false performance regression

2021-01-11 Thread Robert Sander
Hi Marc and Dan, thanks for your quick responses assuring me that we did nothing totally wrong. Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin http://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Mandatory disclosures per §35a GmbHG: HRB 93818 B

[ceph-users] "ceph orch restart mgr" command creates mgr restart loop

2021-01-11 Thread Chris Read
Greetings all... I'm busy testing out Ceph and have hit this troublesome bug while following the steps outlined here: https://docs.ceph.com/en/octopus/cephadm/monitoring/#configuring-ssl-tls-for-grafana When I issue the "ceph orch restart mgr" command, it appears the command is not cleared from
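
A hedged sketch of how to watch the mgr daemons while this is happening (standard orchestrator commands in Octopus):

    ceph orch ps --daemon-type mgr   # lists each mgr daemon and its current status
    ceph -W cephadm                  # follow the cephadm log as the restart keeps being re-issued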

[ceph-users] DocuBetter Meeting This Week -- 13 Jan 2021 1730 UTC

2021-01-11 Thread John Zachary Dover
Unless an unforeseen crisis arises, the DocuBetter meetings for the next two months will focus on ensuring that we have a smooth and easy-to-understand docs suite for the release of Pacific. Meeting: https://bluejeans.com/908675367 Etherpad: https://pad.ceph.com/p/Ceph_Documentation

[ceph-users] Re: bluefs_buffered_io=false performance regression

2021-01-11 Thread Dan van der Ster
And to add some references, there is a PR on hold here: https://github.com/ceph/ceph/pull/38044 which links some relevant tracker entries. Outside of large block.db removals (e.g. from backfilling or snap trimming) we didn't notice a huge difference -- though that is not conclusive. There are

[ceph-users] Re: bluefs_buffered_io=false performance regression

2021-01-11 Thread Mark Nelson
Hi Robert, We are definitely aware of this issue. It often appears to be related to snap trimming, and we believe possibly to excessive thrashing of the rocksdb block cache. I suspect that when bluefs_buffered_io is enabled it hides the issue and people don't notice the problem, but
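
For reference, a hedged sketch of checking and re-enabling the option under discussion via the central config database (an OSD restart may be needed for it to take effect, depending on release):

    ceph config get osd bluefs_buffered_io        # current value applied to OSDs
    ceph config set osd bluefs_buffered_io true   # re-enable buffered BlueFS I/O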

[ceph-users] bluefs_buffered_io=false performance regression

2021-01-11 Thread Robert Sander
Hi, bluefs_buffered_io was disabled in Ceph version 14.2.11. The cluster started last year with 14.2.5 and got upgraded over the year, now running 14.2.16. The performance was OK at first but got abysmally bad at the end of 2020. We checked the components and HDDs and SSDs seem to be fine. Single

[ceph-users] Re: RBD Image can't be formatted - blk_error

2021-01-11 Thread Ilya Dryomov
On Mon, Jan 11, 2021 at 10:09 AM Gaël THEROND wrote: > > Hi Ilya, > > Here is additional information: > My cluster is a three OSD node cluster with each node having 24 4TB SSD > disks. > > The mkfs.xfs command fails with the following error: > https://pastebin.com/yTmMUtQs > > I'm using the

[ceph-users] Re: [cephadm] Point release minor updates block themselves infinitely

2021-01-11 Thread Paul Browne
Next thing I've tried is taking a low-impact host and purging all Ceph/Podman state from it to re-install it from scratch (a Rados GW instance, in this case). But now I'm seeing this strange error when just re-adding the host via "ceph orch host add", at the point where a disk inventory is attempted
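
For context, a hedged sketch of the re-add and the inventory step where the error appears (the hostname below is a placeholder):

    ceph orch host add rgw-host-01              # hypothetical host being re-added after the purge
    ceph orch device ls rgw-host-01 --refresh   # the disk inventory step that then errors out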

[ceph-users] [cephadm] Point release minor updates block themselves infinitely

2021-01-11 Thread Paul Browne
Hello all, I've been having some real trouble getting cephadm to apply some very minor point release updates cleanly; twice now, applying the point updates 15.2.6 -> 15.2.7 and 15.2.7 -> 15.2.8 has gotten blocked somewhere and ended up making no progress, requiring digging deep into
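
A hedged sketch of the upgrade workflow in question and where to watch it stall (standard cephadm commands):

    ceph orch upgrade start --ceph-version 15.2.8   # the point upgrade that makes no progress
    ceph orch upgrade status                        # target version/image and whether an upgrade is in progress
    ceph -W cephadm                                 # follow cephadm's own log while it runs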

[ceph-users] Re: osd gradual reweight question

2021-01-11 Thread mj
Hi Anthony and Frank, Thanks for your responses! I think you have answered my question: the impact of one complete sudden reweight to zero is bigger, because of the increased peering that is happening. By impact I meant OSDs being marked down by the cluster (and automatically coming
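
To make the comparison concrete, a hedged sketch of the two approaches (the OSD id and step sizes are placeholders; `ceph osd reweight` would be the equivalent for the override weight):

    # gradual: drain osd.12 in steps, letting recovery and peering settle in between
    ceph osd crush reweight osd.12 0.8
    ceph osd crush reweight osd.12 0.4
    ceph osd crush reweight osd.12 0
    # sudden: one jump to zero, so all affected PGs re-peer at once
    ceph osd crush reweight osd.12 0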

[ceph-users] Re: performance impact by pool deletion?

2021-01-11 Thread Scheurer François
Many thanks Eugen for sharing your experience!! Very useful information (I thought I was maybe too paranoid... thank god I asked the mailing list first!) Cheers Francois -- EveryWare AG François Scheurer Senior Systems Engineer Zurlindenstrasse 52a CH-8003 Zürich tel: +41 44 466 60 00 fax: +41

[ceph-users] Re: performance impact by pool deletion?

2021-01-11 Thread Scheurer François
Thank you Glen and Frank for sharing your experience! Cheers Francois -- EveryWare AG François Scheurer Senior Systems Engineer Zurlindenstrasse 52a CH-8003 Zürich tel: +41 44 466 60 00 fax: +41 44 466 60 10 mail: francois.scheu...@everyware.ch web: http://www.everyware.ch

[ceph-users] Re: RBD Image can't be formatted - blk_error

2021-01-11 Thread Gaël THEROND
Hi Ilya, Here is additional information: My cluster is a three OSD node cluster with each node having 24 4TB SSD disks. The mkfs.xfs command fails with the following error: https://pastebin.com/yTmMUtQs I'm using the following command to format the image: mkfs.xfs /dev/rbd// I'm facing the same
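
For anyone following along, a hedged sketch of the map-and-format sequence involved (pool and image names are placeholders; the path after /dev/rbd/ in the original mail is truncated):

    rbd create mypool/myimage --size 100G   # hypothetical pool and image
    rbd map mypool/myimage                  # creates /dev/rbdX plus the /dev/rbd/mypool/myimage symlink
    mkfs.xfs /dev/rbd/mypool/myimage        # the step that fails with block I/O errors in this report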