[ceph-users] A basic question on failure domain

2018-10-19 Thread Cody
Hi folks, I have a rookie question. Does the number of buckets chosen as the failure domain have to be equal to or greater than the number of replicas (or k+m for erasure coding)? E.g., for an erasure code profile where k=4, m=2, failure domain=rack, does it only work when there are 6 or more racks
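For reference, a minimal sketch of how such a profile is typically created (the profile and pool names below are placeholders, not from the thread):

    ceph osd erasure-code-profile set ec42rack k=4 m=2 crush-failure-domain=rack
    ceph osd erasure-code-profile get ec42rack
    ceph osd pool create ecpool 128 128 erasure ec42rack

With crush-failure-domain=rack, CRUSH needs at least k+m = 6 racks to place every shard in a distinct rack, unless a custom CRUSH rule spreads shards differently.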

Re: [ceph-users] why set pg_num do not update pgp_num

2018-10-19 Thread Dai Xiang
On Fri, Oct 19, 2018 at 10:06:06AM +0200, Wido den Hollander wrote: > > > On 10/19/18 7:51 AM, xiang@iluvatar.ai wrote: > > Hi! > > > > I use ceph 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic > > (stable), and find that: > > > > When expanding the whole cluster, I update pg_num; all
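For context, the usual two-step expansion on pre-Nautilus releases raises pg_num and pgp_num separately (pool name is a placeholder):

    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256
    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num

Until pgp_num is raised to match pg_num, the new PGs are created but no data is rebalanced onto them.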

Re: [ceph-users] Broken CephFS stray entries?

2018-10-19 Thread Yan, Zheng
No action is required; the MDS fixes this type of error automatically. On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke wrote: > > Hi, > > > upon failover or restart, our MDS complains that something is wrong with > one of the stray directories: > > > 2018-10-19 12:56:06.442151 7fc908e2d700 -1

Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-19 Thread Igor Fedotov
Hi Frank, On 10/19/2018 2:19 PM, Frank Schilder wrote: Hi David, sorry for the slow response, we had a hell of a week at work. OK, so I had compression mode set to aggressive on some pools, but the global option was not changed, because I interpreted the documentation as "pool settings take

Re: [ceph-users] fixing another remapped+incomplete EC 4+2 pg

2018-10-19 Thread Gregory Farnum
On Thu, Oct 18, 2018 at 2:28 PM Graham Allan wrote: > Thanks Greg, > > This did get resolved though I'm not 100% certain why! > > For one of the suspect shards which caused a crash on backfill, I > attempted to delete the associated object via s3, late last week. I then > examined the filestore OSDs and

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Igor Fedotov
Hi Nick, On 10/19/2018 10:14 AM, Nick Fisk wrote: -Original Message- From: Igor Fedotov [mailto:ifedo...@suse.de] Sent: 19 October 2018 01:03 To: n...@fisk.me.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in
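The counters in question can be read from the OSD admin socket; a minimal sketch, assuming osd.0 and a Luminous/Mimic-style perf dump layout:

    ceph daemon osd.0 perf dump | grep -E '"db_used_bytes"|"db_total_bytes"|"slow_used_bytes"'

A non-zero slow_used_bytes indicates that RocksDB data has spilled onto the slow (main) device even though the SSD BlockDB may still show free space.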

Re: [ceph-users] understanding % used in ceph df

2018-10-19 Thread Jakub Jaszewski
Hi, your question is more about the MAX AVAIL value, I think; see how Ceph calculates it: http://docs.ceph.com/docs/luminous/rados/operations/monitoring/#checking-a-cluster-s-usage-stats One OSD getting full makes the pool full as well, so keep reweighting nearfull OSDs. Jakub 19 Oct 2018 16:34
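The commands commonly used to find the most-full OSD that is driving MAX AVAIL down, and to reweight it, look roughly like this (the 110% threshold is only illustrative):

    ceph osd df tree                              # per-OSD utilisation and variance
    ceph osd test-reweight-by-utilization 110     # dry run
    ceph osd reweight-by-utilization 110          # apply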

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-19 Thread Stefan Priebe - Profihost AG
Hi, we were able to solve these issues. We switched bcache OSDs from ssd to hdd in the ceph osd tree and lowered max recover from 3 to 1. Thanks for your help! Greets, Stefan On 18.10.2018 at 15:42, David Turner wrote: > What are your OSD node stats? CPU, RAM, quantity and size of OSD disks.
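A sketch of how such recovery throttling is usually applied, assuming the standard option names (the values are the ones mentioned above):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # and persisted in ceph.conf under [osd]:
    #   osd max backfills = 1
    #   osd recovery max active = 1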

Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-19 Thread David Turner
1) I don't really know about the documentation. You can always put together a PR for an update to the docs. I only know what I've tested trying to get compression working. 2) If you have passive in both places, no compression will happen; if you have aggressive globally for the OSDs and none
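For reference, the two knobs being compared here are the per-pool mode and the global BlueStore setting; a minimal sketch with placeholder names:

    # per-pool
    ceph osd pool set mypool compression_mode aggressive
    ceph osd pool set mypool compression_algorithm snappy
    # global, in ceph.conf under [osd] (bluestore_compression_mode):
    #   bluestore compression mode = aggressive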

[ceph-users] ceph-deploy error

2018-10-19 Thread Vikas Rana
Hi there, While upgrading from jewel to luminous, all packages were upgraded, but while adding MGR with cluster name CEPHDR, it fails. It works with the default cluster name CEPH. root@vtier-P-node1:~# sudo su - ceph-deploy ceph-deploy@vtier-P-node1:~$ ceph-deploy --ceph-conf /etc/ceph/cephdr.conf mgr
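ceph-deploy selects a non-default cluster name with its --cluster flag; a hedged sketch of what the failing call presumably looks like (custom cluster names were deprecated around Luminous and later ceph-deploy releases dropped support for them, which may be why only the default name works):

    ceph-deploy --cluster cephdr --ceph-conf /etc/ceph/cephdr.conf mgr create vtier-P-node1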

Re: [ceph-users] radosgw s3 bucket acls

2018-10-19 Thread Niels Denissen
Hi, I’m currently running into a similar problem. My goal is to ensure all S3 users are able to list any buckets/objects that are available within ceph. I haven’t found a way around that yet; I did find that linking buckets to users allows them to list anything, but only for the user the
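The linking referred to here is normally done with radosgw-admin; a minimal sketch with placeholder bucket and user names:

    radosgw-admin bucket link --bucket=mybucket --uid=someuser
    radosgw-admin bucket list --uid=someuser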

Re: [ceph-users] bluestore compression enabled but no data compressed

2018-10-19 Thread Frank Schilder
Hi David, sorry for the slow response, we had a hell of a week at work. OK, so I had compression mode set to aggressive on some pools, but the global option was not changed, because I interpreted the documentation as "pool settings take precedence". To check your advice, I executed ceph

Re: [ceph-users] Broken CephFS stray entries?

2018-10-19 Thread Paul Emmerich
Try to run a scrub on the MDS: ceph daemon mds. scrub_path / recursive That might yield additional information. You can then add "repair" to the command to try to fix it. Paul On Fri, Oct 19, 2018 at 12:59, Burkhard Linke wrote: > > Hi, > > > upon failover or restart, our MDS complains
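Written out with a placeholder for the elided MDS name, the suggested commands would look roughly like this:

    ceph daemon mds.<name> scrub_path / recursive
    ceph daemon mds.<name> scrub_path / recursive repair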

[ceph-users] Broken CephFS stray entries?

2018-10-19 Thread Burkhard Linke
Hi, upon failover or restart, our MDS complains that something is wrong with one of the stray directories: 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19 12:51:12.016360 -4=-5+1) 2018-10-19 12:56:06.442182

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-19 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch wrote: > > Hi all, > > > We have identified some unexpected blocking behaviour by the ceph-fs kernel > client. > > > When performing 'rm' on large files (100+GB), there appears to be a > significant delay of 10 seconds or more, before a 'stat'

Re: [ceph-users] 12.2.8: 1 node comes up (noout set), from a 6 nodes cluster -> I/O stuck (rbd usage)

2018-10-19 Thread Eugen Block
No, you do not need to set nobackfill and norecover if you only shut down one server. The guide you are referencing is about shutting down everything. It will not recover degraded PGs if you shut down one server with noout. You are right, I must have confused something in my memory with the

[ceph-users] understanding % used in ceph df

2018-10-19 Thread Florian Engelmann
Hi, Our Ceph cluster is a 6-node cluster, each node having 8 disks. The cluster is used for object storage only (right now). We do use EC 3+2 on the buckets.data pool. We had a problem with RadosGW segfaulting (12.2.5) until we upgraded to 12.2.8. We had almost 30,000 radosgw crashes leading

Re: [ceph-users] 12.2.8: 1 node comes up (noout set), from a 6 nodes cluster -> I/O stuck (rbd usage)

2018-10-19 Thread Paul Emmerich
No, you do not need to set nobackfill and norecover if you only shut down one server. The guide you are referencing is about shutting down everything. It will not recover degraded PGs if you shut down one server with noout. Paul On Fri, Oct 19, 2018 at 11:37, Eugen Block wrote: > > Hi

Re: [ceph-users] 12.2.8: 1 node comes up (noout set), from a 6 nodes cluster -> I/O stuck (rbd usage)

2018-10-19 Thread Eugen Block
Hi Denny, the recommendation for ceph maintenance is to set three flags if you need to shut down a node (or the entire cluster): ceph osd set noout ceph osd set nobackfill ceph osd set norecover Although the 'noout' flag seems to be enough for many maintenance tasks, it doesn't prevent the
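For completeness, the set/unset sequence around the maintenance window:

    ceph osd set noout
    ceph osd set nobackfill
    ceph osd set norecover
    # ... perform the maintenance, then:
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout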

Re: [ceph-users] Anyone tested Samsung 860 DCT SSDs?

2018-10-19 Thread Konstantin Shalygin
Thanks for the feedback everyone. Based on the TBW figures, it sounds like these drives are terrible for us as the idea is to NOT use them simply for archive. This will be a high read/write workload, so totally a show stopper. I’m interested in the Seagate Nytro myself. I was recommend

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Nick Fisk
> -Original Message- > From: Nick Fisk [mailto:n...@fisk.me.uk] > Sent: 19 October 2018 08:15 > To: 'Igor Fedotov' ; ceph-users@lists.ceph.com > Subject: RE: [ceph-users] slow_used_bytes - SlowDB being used despite lots of > space free in BlockDB on SSD? > > > -Original Message-

Re: [ceph-users] Apply bucket policy to bucket for LDAP user: what is the correct identifier for principal

2018-10-19 Thread Ha Son Hai
Hello, I found that the metadata of an LDAP user and a normal radosgw user differ in the "type" field. Could that be the reason the bucket policy does not work? # Normal radosgw user { "user_id": "ceph-dashboard", "display_name": "Ceph Dashboard", "email": "", "suspended": 0,
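A sketch of the kind of bucket policy being tested, applied with s3cmd; the Principal ARN and names below are only illustrative, and the correct Principal form for LDAP-backed users is exactly the open question in this thread:

    # policy.json (illustrative):
    #   {
    #     "Version": "2012-10-17",
    #     "Statement": [{
    #       "Effect": "Allow",
    #       "Principal": {"AWS": ["arn:aws:iam:::user/ceph-dashboard"]},
    #       "Action": ["s3:ListBucket", "s3:GetObject"],
    #       "Resource": ["arn:aws:s3:::mybucket", "arn:aws:s3:::mybucket/*"]
    #     }]
    #   }
    s3cmd setpolicy policy.json s3://mybucket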

Re: [ceph-users] why set pg_num do not update pgp_num

2018-10-19 Thread Wido den Hollander
On 10/19/18 7:51 AM, xiang@iluvatar.ai wrote: > Hi! > > I use ceph 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic > (stable), and find that: > > When expanding the whole cluster, I update pg_num; all succeed, but the status > is as below: >   cluster: >     id:

Re: [ceph-users] Troubleshooting hanging storage backend whenever there is any cluster change

2018-10-19 Thread Konstantin Shalygin
For some time we have experienced service outages in our Ceph cluster whenever there is any change to the HEALTH status, e.g. swapping storage devices, adding storage devices, rebooting Ceph hosts, during backfills, etc. Just now I had a recent situation, where several VMs hung after I rebooted one

Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of space free in BlockDB on SSD?

2018-10-19 Thread Nick Fisk
> -Original Message- > From: Igor Fedotov [mailto:ifedo...@suse.de] > Sent: 19 October 2018 01:03 > To: n...@fisk.me.uk; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] slow_used_bytes - SlowDB being used despite lots of > space free in BlockDB on SSD? > > > > On 10/18/2018 7:49

Re: [ceph-users] Jewel to Luminous RGW upgrade issues

2018-10-19 Thread Arvydas Opulskis
Yes, we know it now :) But it was a surprise at the moment we started the RGW upgrade, because it was not mentioned in the release notes, or I missed it somehow. On Fri, Oct 19, 2018 at 9:41 AM Konstantin Shalygin wrote: > On 10/19/18 1:37 PM, Arvydas Opulskis wrote: > > Yes, that's understandable, but

Re: [ceph-users] What is rgw.none

2018-10-19 Thread Arvydas Opulskis
Hi, we have the same question when trying to understand the output of bucket stats. Maybe you have found an explanation somewhere else? Thanks, Arvydas On Mon, Aug 6, 2018 at 10:28 AM Tomasz Płaza wrote: > Hi all, > > I have a bucket with a very big num_objects in rgw.none: > > { > "bucket": "dyna", >
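The figures in question come from bucket stats; a minimal sketch, using the bucket name from the quoted mail:

    radosgw-admin bucket stats --bucket=dyna
    radosgw-admin bucket stats --bucket=dyna | grep -A4 '"rgw.none"'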

Re: [ceph-users] Jewel to Luminous RGW upgrade issues

2018-10-19 Thread Konstantin Shalygin
On 10/19/18 1:37 PM, Arvydas Opulskis wrote: Yes, that's understandable, but the question was about the "transition period" when at some point we had part of the RGWs upgraded and some of them were still on Jewel. At that time we had a lot of complaints from S3 users, who couldn't access their buckets

Re: [ceph-users] Disabling RGW Encryption support in Luminous

2018-10-19 Thread Arvydas Opulskis
Hi, yes, we did it two days ago too. There is a PR for this, but it's not committed yet. Thanks anyway! Arvydas On Fri, Oct 19, 2018 at 7:15 AM Konstantin Shalygin wrote: > After the RGW upgrade from Jewel to Luminous, one S3 user started to receive > errors from his Postgres wal-e solution. Error is

Re: [ceph-users] Jewel to Luminous RGW upgrade issues

2018-10-19 Thread Arvydas Opulskis
Yes, that's understandable, but the question was about the "transition period" when at some point we had part of the RGWs upgraded and some of them were still on Jewel. At that time we had a lot of complaints from S3 users, who randomly couldn't access their buckets. We did several upgrades in recent years and it