[ceph-users] Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-04 Thread Rainer Krienke
Hello, at the moment my ceph is still working but in a degraded state after I upgraded one (of 9) hosts from 14.2.7 to 14.2.8 and rebooted this host (node2, one monitor of 3) after the upgrade. Usually before rebooting I set ceph osd set noout ceph osd set nobackfill ceph osd set
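For reference, a minimal sketch of the flag sequence this refers to; the exact flags set are cut off above, and norecover is an assumption, not confirmed from the post:

    # before rebooting an upgraded OSD host
    ceph osd set noout
    ceph osd set nobackfill
    ceph osd set norecover      # assumption: often set alongside the two above
    # reboot the host, wait for its OSDs to rejoin, then
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout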

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Stefan Priebe - Profihost AG
On 04.03.20 at 16:02, Wido den Hollander wrote: > > > On 3/4/20 3:49 PM, Lars Marowsky-Bree wrote: >> On 2020-03-04T15:44:34, Wido den Hollander wrote: >> >>> I understand what you are trying to do, but it's a trade-off. Endless >>> snapshots are also a danger because bit-rot can sneak in

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Stefan Priebe - Profihost AG
On 04.03.20 at 15:49, Lars Marowsky-Bree wrote: > On 2020-03-04T15:44:34, Wido den Hollander wrote: > >> I understand what you are trying to do, but it's a trade-off. Endless >> snapshots are also a danger because bit-rot can sneak in somewhere which >> you might not notice. >> >> A fresh

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Stefan Priebe - Profihost AG
On 04.03.20 at 15:44, Wido den Hollander wrote: > > > On 3/3/20 8:46 PM, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> does anybody know whether there is any mechanism to make sure an image >> looks like the original after an import-diff? >> >> While doing ceph backups on another ceph

[ceph-users] Re: High memory ceph mgr 14.2.7

2020-03-04 Thread hoannv46
I disabled some modules in the mgr: influx, dashboard, prometheus. After I restarted the mgr, its RAM usage increased to 20GB within a few seconds.
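For reference, module toggling looks roughly like this (module names taken from the post; whether disabling them alone stops the memory growth is not established):

    ceph mgr module ls                    # list enabled and available modules
    ceph mgr module disable influx
    ceph mgr module disable dashboard
    ceph mgr module disable prometheus
    ceph mgr module enable dashboard      # re-enable later if wanted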

[ceph-users] Re: pg_num as power of two adjustment: only downwards?

2020-03-04 Thread John Petrini
You need to increase the pg number (number of placement groups) before you increase the pgp number (number of placement groups for placement). The former creates the PGs and the latter activates them and triggers a rebalance.
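A minimal sketch of that two-step adjustment, assuming a hypothetical pool name 'mypool' and a target of 256 PGs:

    ceph osd pool set mypool pg_num 256     # create the new placement groups
    ceph osd pool set mypool pgp_num 256    # activate them for placement; this triggers the rebalance
    ceph osd pool get mypool pg_num         # verify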

[ceph-users] MDS getting stuck on 'resolve' and 'rejoin'

2020-03-04 Thread Anastasia Belyaeva
Hello! Our CephFS MDS cluster consists of 3 ranks. We had a minor issue with the network ceph runs on, and after that CephFS became unavailable: ranks 1 and 2 are stuck in rejoin, and rank 0 can't get past the 'resolve' state and keeps getting blacklisted. I checked the logs (with debug_mds 5/5) on the
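A sketch of commands typically used to inspect a situation like this (not taken from the thread):

    ceph fs status                        # per-rank state: resolve / rejoin / active
    ceph health detail                    # MDS-related warnings
    ceph osd blacklist ls                 # see whether rank 0 really keeps getting blacklisted
    ceph config set mds debug_mds 5/5     # raise MDS log verbosity, as the poster did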

[ceph-users] Re: Need clarification on CephFS, EC Pools, and File Layouts

2020-03-04 Thread Patrick Donnelly
Hi Jake, On Wed, Mar 4, 2020 at 8:11 AM Jake Grimmett wrote: > Our cluster uses a 4PB 8:2 EC pool for cephfs data, and a 900GB > replicated NVME pool for metadata. Presumably all writes put a footprint > on the EC pool? Yes. If you don't presently have issues there may be no reason to change

[ceph-users] Re: How can I fix "object unfound" error?

2020-03-04 Thread Chad William Seys
Maybe I've marked the object as "lost" and removed the failed OSD. The cluster now is healthy, but I'd like to understand if it's likely to bother me again in the future. Yeah, I don't know. Within the last month there have been 4 separate instances of people mentioning "unfound" objects in
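The commands involved in diagnosing and marking unfound objects, sketched with a placeholder PG id:

    ceph health detail                       # lists PGs with unfound objects
    ceph pg 2.4 list_missing                 # placeholder pgid; shows which objects are unfound
    ceph pg 2.4 mark_unfound_lost revert     # or 'delete'; last resort, gives up on the unfound data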

[ceph-users] How can I fix "object unfound" error?

2020-03-04 Thread Chad William Seys
Hi Simone, Maybe you've hit this bug: https://tracker.ceph.com/issues/44286 ? Chad.

[ceph-users] Re: Radosgw dynamic sharding jewel -> luminous

2020-03-04 Thread Casey Bodley
Okay, it's sharing the log_pool so you shouldn't need any special pool permissions there. The 'failed to list reshard log entries' error message is coming from a call to cls_rgw_reshard_list(), which is a new API in cls_rgw. Have all of your osds been upgraded to support that? On 3/4/20
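A sketch of the checks this points at, assuming default zone naming:

    ceph osd versions                            # confirm every OSD runs the upgraded release with the new cls_rgw
    radosgw-admin zone get | grep log_pool       # which pool backs the reshard log
    radosgw-admin reshard list                   # should list pending reshard entries once the OSDs support it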

[ceph-users] Re: Need clarification on CephFS, EC Pools, and File Layouts

2020-03-04 Thread Jake Grimmett
Hi Patrick, many thanks for clarifying this - very useful :) I'm sure a whole lot of people would love to retrospectively move the default from an EC to replicated pool, so +1 from us too. Our cluster uses a 4PB 8:2 EC pool for cephfs data, and a 900GB replicated NVME pool for metadata.
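A sketch of how a per-directory file layout can point new files at a replicated pool; pool and path names below are hypothetical, and existing files are not moved by this:

    ceph osd pool create cephfs_data_rep 256 replicated     # hypothetical replicated data pool
    ceph fs add_data_pool cephfs cephfs_data_rep            # attach it to the filesystem
    setfattr -n ceph.dir.layout.pool -v cephfs_data_rep /mnt/cephfs/somedir
    getfattr -n ceph.dir.layout /mnt/cephfs/somedir         # verify the layout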

[ceph-users] Re: Need clarification on CephFS, EC Pools, and File Layouts

2020-03-04 Thread Patrick Donnelly
Hello Dave, On Tue, Mar 3, 2020 at 12:34 PM Dave Hall wrote: > This is for a cluster currently running at 14.2.7. Since our cluster is > still relatively small we feel a strong need to run our CephFS on an EC > Pool (8 + 2) and Crush Failure Domain = OSD to maximize capacity. > > I have read

[ceph-users] Re: Radosgw dynamic sharding jewel -> luminous

2020-03-04 Thread Robert LeBlanc
On Tue, Mar 3, 2020 at 10:31 AM Casey Bodley wrote: > The default value of this reshard pool is "default.rgw.log:reshard". You > can check 'radosgw-admin zone get' for the list of pool names/namespaces > in use. It may be that your log pool is named ".rgw.log" instead, so you > could change your

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Wido den Hollander
On 3/4/20 3:49 PM, Lars Marowsky-Bree wrote: > On 2020-03-04T15:44:34, Wido den Hollander wrote: > >> I understand what you are trying to do, but it's a trade-off. Endless >> snapshots are also a danger because bit-rot can sneak in somewhere which >> you might not notice. >> >> A fresh export

[ceph-users] Re: High memory ceph mgr 14.2.7

2020-03-04 Thread Mark Lopez
I noticed the same - https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/WS6OWKZ5NQXRJIQNGRBGQPPETWOBUOFP/. Out of curiosity, do you use the dashboard, and if so, does keeping the dashboard open cause the mgr RAM usage to increase over time? Regards, Mark

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Lars Marowsky-Bree
On 2020-03-04T15:44:34, Wido den Hollander wrote: > I understand what you are trying to do, but it's a trade-off. Endless > snapshots are also a danger because bit-rot can sneak in somewhere which > you might not notice. > > A fresh export (full copy) every X period protects you against this.

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Wido den Hollander
On 3/3/20 8:46 PM, Stefan Priebe - Profihost AG wrote: > Hello, > > does anybody know whether there is any mechanism to make sure an image > looks like the original after an import-diff? > > While doing ceph backups on another ceph cluster i currently do a fresh > import every 7 days. So i'm

[ceph-users] Re: consistency of import-diff

2020-03-04 Thread Janne Johansson
On Tue, 3 Mar 2020 at 21:48, Stefan Priebe - Profihost AG < s.pri...@profihost.ag> wrote: > > You can use a full local export, piped to some hash program (this is > > what Backurne¹ does): rbd export - | xxhsum > > Then, check the hash consistency with the original > > Thanks for the suggestion
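The suggested verification, sketched with placeholder pool/image/snapshot names (xxhsum is the xxHash CLI; any strong hash works):

    # on the source cluster
    rbd export rbd/vm-disk@backup-snap - | xxhsum
    # on the backup cluster
    rbd export backup/vm-disk@backup-snap - | xxhsum
    # the two digests should match if the import-diff chain is consistent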

[ceph-users] is ceph balancer doing anything?

2020-03-04 Thread Andrei Mikhailovsky
Hello everyone, A few weeks ago I enabled the ceph balancer on my cluster as per the instructions here: https://docs.ceph.com/docs/mimic/mgr/balancer/ I am running ceph version: ceph version 13.2.6
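A few commands that show whether the balancer is actually doing anything (a sketch; on Mimic the balancer runs inside the mgr):

    ceph balancer status        # active flag, mode, queued plans
    ceph balancer mode upmap    # usually balances better than crush-compat, if all clients are luminous+
    ceph balancer on
    ceph osd df                 # watch the %USE spread / STDDEV over time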

[ceph-users] Re: Error in Telemetry Module

2020-03-04 Thread Lenz Grimmer
On 2020-03-04 13:49, Tecnologia Charne.Net wrote: > Any thoughts? >>> I tried disable an re-enable the module, but the error remains. >>> >> The telemetry server seems to be down. People have been notified :-) >> >> Wido >> > Thanks! > > The message HEALTH_ERR, in red, on the front of the

[ceph-users] Re: Error in Telemetry Module

2020-03-04 Thread Tecnologia Charne.Net
Any thoughts? I tried disabling and re-enabling the module, but the error remains. The telemetry server seems to be down. People have been notified :-) Wido Thanks! The message HEALTH_ERR, in red, on the front of the dashboard, is an interesting way to start the day. ;) -Javier
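For reference, the module can simply be kept off while the endpoint is unreachable; a sketch:

    ceph telemetry off                     # stop the periodic reports
    ceph mgr module disable telemetry      # or disable the module entirely until telemetry.ceph.com is back
    ceph mgr module enable telemetry       # re-enable later
    ceph telemetry on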

[ceph-users] Re: Error in Telemetry Module

2020-03-04 Thread Wido den Hollander
On 3/4/20 12:35 PM, Tecnologia Charne.Net wrote: > Hello! > > Today, I started the day with > > # ceph -s >   cluster: >     health: HEALTH_ERR >     Module 'telemetry' has failed: > HTTPSConnectionPool(host='telemetry.ceph.com', port=443): Max retries > exceeded with url: /report

[ceph-users] Error in Telemetry Module

2020-03-04 Thread Tecnologia Charne.Net
Hello! Today, I started the day with # ceph -s   cluster:     health: HEALTH_ERR     Module 'telemetry' has failed: HTTPSConnectionPool(host='telemetry.ceph.com', port=443): Max retries exceeded with url: /report (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fa97e5a4f90>: Failed to establish

[ceph-users] Re: v14.2.8 Nautilus released

2020-03-04 Thread kefu chai
On Wed, Mar 4, 2020 at 3:46 AM Kaleb Keithley wrote: > > > Just FYI, 14.2.8 build fails on Fedora-32 on S390x. Other architectures build fine. > Kaleb, see https://github.com/ceph/ceph/pull/33716. Hopefully we can get it into the next release, or you could include this patch in the rpm

[ceph-users] High memory ceph mgr 14.2.7

2020-03-04 Thread hoannv46
Hi all. My cluster is on ceph version 14.2.6. The mgr process in top: 3104786 ceph 20 0 20.2g 19.4g 18696 S 315.3 62.0 41:32.74 ceph-mgr

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-04 Thread Thomas Schneider
Hi Wido, can you please share some detailed instructions on how to do this? And what do you mean by "respect your failure domain"? THX On 04.03.2020 at 11:27, Wido den Hollander wrote: > > On 3/4/20 11:15 AM, Thomas Schneider wrote: >> Hi, >> >> Ceph balancer is not working correctly; there's

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-04 Thread Thomas Schneider
Hi, I already use CRUSHMAP weight to manually control the OSD utilization. However, this results in a situation where 5-10% of my 336 OSDs have a weight < 1.0, and this would hinder the ceph balancer from working. This means I would need to modify any OSD with weight < 1.0 first before ceph

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-04 Thread Scheurer François
Hi Thomas To get the usage: ceph osd df | sort -nk8 #VAR is the ratio to avg util #WEIGHT is CRUSHMAP weight; typically the Disk capacity in TiB #REWEIGHT is temporary (until osd restart or ceph osd set noout) WEIGHT correction for manual rebalance You can use for temporary
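A sketch of the temporary-reweight approach described above; OSD id and weights are placeholders:

    ceph osd df | sort -nk8              # sort by the utilisation column, per the hint above
    ceph osd reweight 42 0.85            # temporary override: moves PGs off osd.42
    ceph osd crush reweight osd.42 3.2   # permanent CRUSH weight change, if preferred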

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-04 Thread Wido den Hollander
On 3/4/20 11:15 AM, Thomas Schneider wrote: > Hi, > > Ceph balancer is not working correctly; there's an open bug > report, too. > > Until this issue is not solved, I need a workaround because I get more > and more warnings about "nearfull osd(s)". > >

[ceph-users] Forcibly move PGs from full to empty OSD

2020-03-04 Thread Thomas Schneider
Hi, Ceph balancer is not working correctly; there's an open bug report, too. Until this issue is solved, I need a workaround because I get more and more warnings about "nearfull osd(s)". Therefore my question is: How can I forcibly move PGs from full
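One way to pin individual PGs away from a full OSD is an explicit upmap entry; a sketch with placeholder PG and OSD ids, usable only with luminous+ clients and only if the target OSD respects the failure domain:

    ceph osd set-require-min-compat-client luminous
    ceph pg ls-by-osd osd.42 | head            # find PGs sitting on the full OSD
    ceph osd pg-upmap-items 5.1f 42 17         # map PG 5.1f from osd.42 to osd.17
    ceph osd rm-pg-upmap-items 5.1f            # drop the mapping again later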

[ceph-users] Re: MIgration from weight compat to pg_upmap

2020-03-04 Thread Stefan Priebe - Profihost AG
On 04.03.20 at 11:08, Dan van der Ster wrote: > https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py > > ^^ that can help. > > Best to try it on a test cluster first to see how it works before going ahead. perfect Greets, Stefan > -- dan > > > On Wed, Mar 4,

[ceph-users] Re: MIgration from weight compat to pg_upmap

2020-03-04 Thread Dan van der Ster
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py ^^ that can help. Best to try it on a test cluster first to see how it works before going ahead. -- dan On Wed, Mar 4, 2020 at 11:05 AM Stefan Priebe - Profihost AG wrote: > > Hello, > > is there any way to

[ceph-users] MIgration from weight compat to pg_upmap

2020-03-04 Thread Stefan Priebe - Profihost AG
Hello, is there any way to switch to pg_upmap without triggering heavy rebalancing twice? 1.) happens at: ceph osd crush weight-set rm-compat 2.) happens after running the balancer in pg_upmap mode Greets, Stefan
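The rough sequence the linked script enables, sketched; the script prints ceph commands that can be piped to a shell, and as noted above it should be tried on a test cluster first:

    ceph osd set norebalance                 # hold data movement while switching
    ceph osd crush weight-set rm-compat      # would normally trigger rebalance #1
    ./upmap-remapped.py | sh                 # insert upmap entries that cancel the pending remapping
    ceph osd unset norebalance
    ceph balancer mode upmap
    ceph balancer on                         # the balancer then removes those upmaps gradually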

[ceph-users] Deleting Multiparts stuck directly from rgw.data pool

2020-03-04 Thread EDH - Manuel Rios
Hi, We're at 14.2.8 and still have problems with aborted multiparts. Last night we created a full list of objects containing the string multipart, like
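A sketch of how such a list is typically built, assuming the default data pool name (the pool in use may differ):

    rados -p default.rgw.buckets.data ls > all_objects.txt    # full object listing; can be very large
    grep multipart all_objects.txt > multipart_objects.txt    # objects belonging to multipart uploads
    # objects could then be removed one by one with 'rados -p <pool> rm <object>' -- use with great care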