[ceph-users] Ceph EC PG calculation

2020-11-17 Thread Szabo, Istvan (Agoda)
Hi, I have 36 OSDs and get this error: Error ERANGE: pg_num 4096 size 6 would mean 25011 total pgs, which exceeds max 10500 (mon_max_pg_per_osd 250 * num_in_osds 42) If I want to calculate the maximum number of PGs in my cluster, how does it work if I have an EC pool? I have a 4:2 data EC pool, and the
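
How the monitor arrives at those numbers, as a sketch (the 250 and 42 come from the error text itself, and an EC pool counts k+m PG instances per PG):

  allowed PG instances = mon_max_pg_per_osd * num_in_osds = 250 * 42 = 10500
  requested pool alone = pg_num * (k+m) = 4096 * 6 = 24576
  24576 + the instances of the already existing pools = 25011 > 10500 -> ERANGE

So with a 4+2 profile, the rough ceiling for a new pool's pg_num is (10500 - existing PG instances) / 6.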

[ceph-users] Re: Accessing Ceph Storage Data via Ceph Block Storage

2020-11-17 Thread DHilsbos
Vaughan; An absolutely minimal Ceph cluster really needs to be 3 servers, and at that size, usable space should be 1/3 of raw space (see the archives of this mailing list for many discussions of why size=2 is bad). While it is possible to run other tasks on Ceph servers, memory utilization of Ceph
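
As a purely hypothetical illustration of that 1/3 figure: three nodes with 10 TB of raw disk each give 30 TB raw, and with size=3 replication every object lives on all three nodes, so only about 10 TB is usable before overhead.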

[ceph-users] Accessing Ceph Storage Data via Ceph Block Storage

2020-11-17 Thread Vaughan Beckwith
Hi All, I'm not sure if this is the correct place to ask this question; I have tried the channels, but have received very little help there. I am currently very new to Ceph and am investigating it as a possible replacement for a legacy application which used to provide us with replication.

[ceph-users] CephFS error: currently failed to rdlock, waiting. clients crashing and evicted

2020-11-17 Thread Thomas Hukkelberg
Hi all! Hopefully some of you can shed some light on this. We have big problems with samba crashing when macOS smb clients access certain/random folders/files over vfs_ceph. When browsing the cephfs folder in question directly on a ceph node where cephfs is mounted, we experience some issues like
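
A hedged sketch for chasing down which client holds the caps behind such rdlock waits (the mds name and client id are placeholders, not taken from this thread):

  ceph daemon mds.<name> dump_ops_in_flight   # stuck ops list "failed to rdlock, waiting"
  ceph daemon mds.<name> session ls           # map the client id in the op to a mount/host
  ceph tell mds.<name> client evict id=<client-id>   # last resort: evict that client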

[ceph-users] Re: MGR restart loop

2020-11-17 Thread Frank Schilder
Addition: This happens only when I stop mon.ceph-01; I can stop any other MON daemon without problems. I checked network connectivity and all hosts can see all other hosts. I already increased mon_mgr_beacon_grace to a huge value due to another bug a long time ago: global advanced
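
For reference, a minimal sketch of how that grace period is inspected and raised (the 300 s value is purely illustrative):

  ceph config dump | grep mon_mgr_beacon_grace
  ceph config set mon mon_mgr_beacon_grace 300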

[ceph-users] MGR restart loop

2020-11-17 Thread Frank Schilder
Dear cephers, I have a problem with MGR daemons, ceph version mimic-13.2.8. I'm trying to do maintenance on our MON/MGR servers and am through with 2 out of 3. I have MON and MGR collocated on a host, 3 hosts in total. So far, the procedure was to stop the daemons on the server and do the

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-17 Thread Anthony D'Atri
> > I'm probably going to get crucified for this Naw. The <> in your From: header, though …. ;)

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-17 Thread DHilsbos
Phil; I'm probably going to get crucified for this, but I put a year of testing into this before determining it was sufficient to the needs of my organization... If the primary concerns are capability and cost (not top of the line performance), then I can tell you that we have had great

[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-17 Thread Tony Liu
I am not sure any configuration tuning would help here. The bottleneck is the HDDs. In my case, I have an SSD for WAL/DB and it provides pretty good write performance. The part I don't quite understand in your case is that random read is quite fast. Due to the HDD seek latency, the random read is
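
For anyone reproducing that layout, a minimal sketch of creating an OSD with its WAL/DB on an SSD (device paths are placeholders, not from this setup):

  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1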

[ceph-users] Re: Reclassify crush map

2020-11-17 Thread Seena Fallah
Also, when I reclassify-bucket to a non-existent base bucket it says: "default parent test does not exist". But as documented in https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/ it should create it! On Tue, Nov 17, 2020 at 6:05 PM Seena Fallah wrote: > Hi all, > > I want to

[ceph-users] Module 'dashboard' has failed: '_cffi_backend.CDataGCP' object has no attribute 'type'

2020-11-17 Thread Marcelo
Hello all. I'm trying to deploy the dashboard (Nautilus 14.2.8), and after I ran ceph dashboard create-self-signed-cert, the cluster started to show this error: # ceph health detail HEALTH_ERR Module 'dashboard' has failed: '_cffi_backend.CDataGCP' object has no attribute 'type'
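
One hedged workaround sketch, assuming the failure sits in the self-signed-cert helper itself: generate the certificate outside the mgr and hand it to the dashboard module (file names are placeholders):

  openssl req -new -nodes -x509 -days 3650 -subj "/O=IT/CN=ceph-mgr-dashboard" \
      -keyout dashboard.key -out dashboard.crt
  ceph dashboard set-ssl-certificate -i dashboard.crt
  ceph dashboard set-ssl-certificate-key -i dashboard.key
  ceph mgr module disable dashboard && ceph mgr module enable dashboard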

[ceph-users] Re: Ceph RBD - High IOWait during the Writes

2020-11-17 Thread athreyavc
I have disabled CephX authentication now. Though the performance is slightly better, it is not yet where it needs to be. Are there any other recommendations for all-HDD ceph clusters? From another thread https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/DFHXXN4KKI5PS7LYPZJO4GYHU67JYVVL/ *In our

[ceph-users] Re: Bucket notification is working strange

2020-11-17 Thread Yuval Lifshitz
Hi Krasaev, Thanks for pointing out this issue! This is currently under review here: [1], and tracked here: [2]. Once merged, the fix would be available on the master development branch, and the plan is to backport the fix to Octopus in the future. Yuval [1]

[ceph-users] Reclassify crush map

2020-11-17 Thread Seena Fallah
Hi all, I want to reclassify my crush map. I have two roots, one hiops and one default. In the hiops root I have one datacenter, in that I have three racks, and in each rack I have 3 OSDs. When I run the command below it says "item -55 in bucket -54 is not also a reclassified bucket". I see the new
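
For context, the general shape of the documented reclassify workflow looks like this (device class and bucket names here are illustrative, not copied from this crush map):

  ceph osd getcrushmap -o original
  crushtool -i original --reclassify \
      --set-subtree-class default hdd \
      --reclassify-root default hdd \
      --reclassify-bucket %-hiops ssd default \
      -o adjusted
  crushtool -i original --compare adjusted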

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Kalle Happonen
Hi, > I don't think the default osd_min_pg_log_entries has changed recently. > In https://tracker.ceph.com/issues/47775 I proposed that we limit the > pg log length by memory -- if it is indeed possible for log entries to > get into several MB, then this would be necessary IMHO. I've had a

[ceph-users] CephFS: Recovering from broken Mount

2020-11-17 Thread Julian Fölsch
Hello, We are running an Octopus cluster; however, we still have some older Ubuntu 16.04 clients connecting using libcephfs2 version 14.2.13-1xenial. From time to time the network has had issues and the clients lost their connection to the cluster. But the system still thinks
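
A minimal recovery sketch for such a stale mount, assuming the session is already gone on the cluster side (mount point, monitor names and credentials are placeholders):

  umount -f /mnt/cephfs || umount -l /mnt/cephfs
  mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs -o name=myclient,secretfile=/etc/ceph/myclient.secret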

[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

2020-11-17 Thread Robert Sander
Hi Phil, thanks for the background info. On 17.11.20 at 01:51, Phil Merricks wrote: > 1: Move off the data and scrap the cluster as it stands currently. > (already under way) > 2: Group the block devices into pools of the same geometry and type (and > maybe do some tiering?) > 3. Spread the

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Mark Nelson
Hi Dan, I 100% agree with your proposal.  One of the goals I had in mind with the prioritycache framework is that pglog could end up becoming another prioritycache target that is balanced against the other caches.  The idea would be that we try to keep some amount of pglog data in memory at

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Dan van der Ster
I don't think the default osd_min_pg_log_entries has changed recently. In https://tracker.ceph.com/issues/47775 I proposed that we limit the pg log length by memory -- if it is indeed possible for log entries to get into several MB, then this would be necessary IMHO. But you said you were

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Kalle Happonen
Another idea, which I don't know has any merit: if 8 MB is a realistic log size (or has this grown for some reason?), did the enforcement (or default) of the minimum value change lately (osd_min_pg_log_entries)? If the minimum were set to 1000 entries, at 8 MB per log we would have
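
A back-of-envelope version of that concern, with an assumed (not measured) 200 PG instances per OSD:

  8 MB per pg log * 200 PGs on an OSD ≈ 1.6 GB of pg log memory per OSD
  at 25 OSDs per node, that would be roughly 40 GB per node just for pg logs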

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Dan van der Ster
On Tue, Nov 17, 2020 at 11:45 AM Kalle Happonen wrote: > > Hi Dan @ co., > Thanks for the support (moral and technical). > > That sounds like a good guess, but it seems like there is nothing alarming > here. In all our pools, some pgs are a bit over 3100, but not at any > exceptional values. >

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Kalle Happonen
Hi Dan @ co., Thanks for the support (moral and technical). That sounds like a good guess, but it seems like there is nothing alarming here. In all our pools, some pgs are a bit over 3100, but not at any exceptional values. cat pgdumpfull.txt | jq '.pg_map.pg_stats[] | select(.ondisk_log_size

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Dan van der Ster
Hi Kalle, Do you have active PGs now with huge pglogs? You can do something like this to find them: ceph pg dump -f json | jq '.pg_map.pg_stats[] | select(.ondisk_log_size > 3000)' If you find some, could you increase debug_osd to 10 and then share the osd log? I am interested in the debug
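
A small sketch of raising and restoring that debug level on one OSD (the osd id is a placeholder; 1/5 is the usual default to restore afterwards):

  ceph config set osd.123 debug_osd 10    # or: ceph tell osd.123 injectargs '--debug_osd 10'
  # reproduce, collect /var/log/ceph/ceph-osd.123.log, then:
  ceph config set osd.123 debug_osd 1/5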

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Dan van der Ster
Hi Xie, On Tue, Nov 17, 2020 at 11:14 AM wrote: > > Hi Dan, > > > > Given that it adds a case where the pg_log is not trimmed, I wonder if > > there could be an unforeseen condition where `last_update_ondisk` > > isn't being updated correctly, and therefore the osd stops trimming > > the pg_log

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread xie.xingguo

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-11-17 Thread Dan van der Ster
Hi Kalle, Strangely and luckily, in our case the memory explosion didn't reoccur after that incident. So I can mostly only offer moral support. But if this bug indeed appeared between 14.2.8 and 14.2.13, then I think this is suspicious: b670715eb4 osd/PeeringState: do not trim pg log past

[ceph-users] EC cluster cascade failures and performance problems

2020-11-17 Thread Paul Kramme
Hello, we are currently experiencing problems with a cluster used for storing RBD backups. Config: * 8 nodes, each with 6 HDD OSDs and 1 SSD used for blockdb and WAL * k=4 m=2 EC * dual 25GbE NIC * v14.2.8 ceph health detail shows the following messages: HEALTH_WARN BlueFS spillover detected
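
A small sketch for quantifying the spillover per OSD (the osd id is a placeholder):

  ceph daemon osd.0 perf dump | jq '{db: .bluefs.db_used_bytes, slow: .bluefs.slow_used_bytes}'
  ceph daemon osd.0 compact    # a RocksDB compaction can pull data back off the slow device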

[ceph-users] osd_pglog memory hoarding - another case

2020-11-17 Thread Kalle Happonen
Hello all, wrt: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7IMIWCKIHXNULEBHVUIXQQGYUDJAO2SF/ Yesterday we hit a problem with osd_pglog memory, similar to the thread above. We have a 56-node object storage (S3+SWIFT) cluster with 25 OSD disks per node. We run 8+3 EC for the
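
A quick sketch for checking how much of an OSD's memory the pg log mempool actually accounts for (the osd id is a placeholder):

  ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'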

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-17 Thread Janek Bevendorff
I have run radosgw-admin gc list (without --include-all) a few times already, but the list was always empty. I will create a cron job running it every few minutes and writing out the results. On 17/11/2020 02:22, Eric Ivancich wrote: I’m wondering if anyone experiencing this bug would mind
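
A minimal sketch of such a cron job (interval and log path are arbitrary choices, not from the thread):

  # /etc/cron.d/rgw-gc-list
  */5 * * * * root radosgw-admin gc list >> /var/log/ceph/gc-list.log 2>&1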