Re: [ceph-users] backup ceph

2018-09-20 Thread Konstantin Shalygin
We're newbies to Ceph. Besides using incremental snapshots with RBD to back up data on one Ceph cluster to another running Ceph cluster, or using backup tools like backy2, is there any recommended way to back up Ceph data? Someone here suggested taking a snapshot of RBD daily and keeping 30
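
A minimal sketch of the incremental-snapshot approach discussed here, assuming an image rbd/vm-disk, a second cluster config named "backup", and snapshot names of your choosing (all hypothetical):

  # one-time full copy of a base snapshot to the backup cluster
  rbd snap create rbd/vm-disk@base
  rbd export rbd/vm-disk@base - | rbd --cluster backup import - rbd/vm-disk
  rbd --cluster backup snap create rbd/vm-disk@base
  # daily incremental: ship only the changes since the previous snapshot
  rbd snap create rbd/vm-disk@today
  rbd export-diff --from-snap base rbd/vm-disk@today - | rbd --cluster backup import-diff - rbd/vm-disk

Retention (e.g. keeping 30 snapshots) is then just a matter of pruning old snapshots on both sides with rbd snap rm.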

Re: [ceph-users] network architecture questions

2018-09-20 Thread Konstantin Shalygin
Hi, I read through the various documentation and had a few questions: - From what I understand, CephFS clients reach the OSDs directly; does the cluster network need to be opened up as a public network? All RADOS clients connect to OSDs and MONs directly. - Is it still necessary to have a
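
For reference, a hedged ceph.conf sketch of the usual split: clients (including CephFS) only need the public network, while the cluster network carries OSD replication and heartbeat traffic only (addresses are placeholders):

  [global]
  public network  = 192.0.2.0/24    # mons, osds, mds and all clients
  cluster network = 198.51.100.0/24 # osd-to-osd traffic only, no client access needed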

Re: [ceph-users] Ceph backfill problem

2018-09-20 Thread Konstantin Shalygin
Has anyone experienced the below? 2 OSD servers were down; after bringing the 2 servers back up, I brought 52 OSDs in with just a weight of 0.05, but it caused a huge backfilling load. I saw so many blocked requests and a number of PGs stuck inactive, and some servers were impacted, so I stopped backfilling by mark

Re: [ceph-users] Ceph backfill problem

2018-09-20 Thread Matthew H
Without knowing more about the underlying hardware, you likely are reaching some type of IO resource constraint. Are your journals colocated or non-colocated? How fast is your backend OSD storage device? You may also want to look at setting the norebalance flag. Good luck! > On Sep 20,
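
A hedged sketch of the flags and throttles commonly used to tame backfill in a situation like this (values are illustrative, not recommendations):

  ceph osd set norebalance
  ceph osd set nobackfill
  ceph tell osd.* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1'
  # bring OSDs / weights in gradually, then:
  ceph osd unset nobackfill
  ceph osd unset norebalance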

Re: [ceph-users] backup ceph

2018-09-20 Thread ST Wong (ITSC)
Hi, >> Will the RAID 6 be mirrored to another storage in a remote site for DR purposes? > >Not yet. Our goal is to have the backup ceph to which we will replicate spread >across three different buildings, with 3 replicas. May I ask if the backup ceph is a single ceph cluster spanning 3

Re: [ceph-users] ceph-ansible

2018-09-20 Thread Matthew H
Set up a python virtual environment and install the required notario package version. You'll also want to install ansible into that virtual environment along with netaddr. On Sep 20, 2018, at 18:04, solarflow99 <solarflo...@gmail.com> wrote: oh, was that all it was... git clone
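
A minimal sketch of that setup, assuming a ceph-ansible checkout in the current directory (paths and the venv name are placeholders):

  virtualenv ~/ceph-ansible-venv
  source ~/ceph-ansible-venv/bin/activate
  pip install -r ceph-ansible/requirements.txt   # pulls in notario >= 0.0.13
  pip install ansible netaddr
  ansible-playbook -i inventory site.yml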

[ceph-users] Ceph backfill problem

2018-09-20 Thread Chen Allen
Hi there, Has anyone experienced the below? 2 OSD servers were down; after bringing the 2 servers back up, I brought 52 OSDs in with just a weight of 0.05, but it caused a huge backfilling load. I saw so many blocked requests and a number of PGs stuck inactive, and some servers were impacted, so I stopped

[ceph-users] PG stuck incomplete

2018-09-20 Thread Olivier Bonvalet
Hello, on a Luminous cluster, I have a PG incomplete and I can't find how to fix that. It's an EC pool (4+2) : pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for 'incomplete') Of course, we can't reduce min_size
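
A hedged sketch of the usual first diagnostics for an incomplete PG, using the pg id from the message above:

  ceph pg 37.9c query | less   # check recovery_state, e.g. down_osds_we_would_probe
  ceph pg map 37.9c
  ceph osd tree | grep -Ew 'osd\.(0|1|32|50|59|75)'   # state of the acting-set OSDs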

Re: [ceph-users] ceph-ansible

2018-09-20 Thread solarflow99
oh, was that all it was... git clone https://github.com/ceph/ceph-ansible/ I installed the notario package from EPEL, python2-notario-0.0.11-2.el7.noarch, and that's the newest they have. On Thu, Sep 20, 2018 at 3:57 PM Alfredo Deza wrote: > Not sure how you installed ceph-ansible, the

Re: [ceph-users] ceph-ansible

2018-09-20 Thread Alfredo Deza
Not sure how you installed ceph-ansible, the requirements mention a version of a dependency (the notario module) which needs to be 0.0.13 or newer, and you seem to be using an older one. On Thu, Sep 20, 2018 at 6:53 PM solarflow99 wrote: > > Hi, tying to get this to do a simple deployment, and

[ceph-users] ceph-ansible

2018-09-20 Thread solarflow99
Hi, trying to get this to do a simple deployment, and I'm getting a strange error; has anyone seen this? I'm using CentOS 7, rel 5, ansible 2.5.3, python version = 2.7.5. I've tried with mimic, luminous and even jewel, no luck at all. TASK [ceph-validate : validate provided configuration]

Re: [ceph-users] ceph-fuse using excessive memory

2018-09-20 Thread Andras Pataki
I've done some more experiments playing with client config parameters, and it seems like the client_oc_size parameter is very correlated to how big ceph-fuse grows. With its default value of 200MB, ceph-fuse gets to about 22GB of RSS; with our previous client_oc_size value of 2GB, the
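
For reference, a hedged ceph.conf sketch of the object-cache knob being discussed (the first value is just the default, not a recommendation):

  [client]
  client oc size = 209715200    # 200 MB per ceph-fuse mount (default)
  #client oc size = 2147483648  # the 2 GB setting mentioned above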

Re: [ceph-users] Ceph Day at University of Santa Cruz - September 19

2018-09-20 Thread Mike Perez
Hi Nitin, I'm still receiving slides from the speakers but I think I will start posting them tomorrow. I will reply back when this is done. Thanks! -- Mike Perez (thingee) On 16:44 Sep 20, Kamble, Nitin A wrote: > Hi Mike, > Are the slides of presentations available anywhere? If not, can

Re: [ceph-users] total_used statistic incorrect

2018-09-20 Thread Serkan Çoban
did you read my answer? On Thu, Sep 20, 2018 at 8:21 PM Mike Cave wrote: > > I'll bump this one more time in case someone who knows why this is happening > didn't see the thread yesterday. > > Cheers, > Mike > > -Original Message- > From: ceph-users on behalf of Cave Mike > > Date:

Re: [ceph-users] macos build failing

2018-09-20 Thread Gregory Farnum
On Thu, Sep 20, 2018 at 3:46 AM Marc Roos wrote: > > > When running ./do_cmake.sh, I get > > fatal: destination path '/Users/mac/ceph/src/zstd' already exists and is > not an empty directory. > fatal: clone of 'https://github.com/facebook/zstd' into submodule path > '/Users/mac/ceph/src/zstd'

Re: [ceph-users] total_used statistic incorrect

2018-09-20 Thread Mike Cave
I'll bump this one more time in case someone who knows why this is happening didn't see the thread yesterday. Cheers, Mike -Original Message- From: ceph-users on behalf of Cave Mike Date: Wednesday, September 19, 2018 at 9:25 AM Cc: ceph-users Subject: Re: [ceph-users] total_used

Re: [ceph-users] Ceph Day at University of Santa Cruz - September 19

2018-09-20 Thread Kamble, Nitin A
Hi Mike, Are the slides of presentations available anywhere? If not, can those be shared? Thanks, Nitin On 9/11/18, 3:47 PM, "ceph-users on behalf of Mike Perez" wrote: [External Email] Hey all, Just a reminder that Ceph Day at

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread Sage Weil
On Thu, 20 Sep 2018, KEVIN MICHAEL HRPCEK wrote: > Top results when both were taken with ms_dispatch at 100%. The mon one > changes a lot so I've included 3 snapshots of those. I'll update > mon_osd_cache_size. > > After disabling auth_cluster_required and a cluster reboot I am having > less

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread KEVIN MICHAEL HRPCEK
Top results when both were taken with ms_dispatch at 100%. The mon one changes a lot so I've included 3 snapshots of those. I'll update mon_osd_cache_size. After disabling auth_cluster_required and a cluster reboot I am having fewer problems keeping OSDs in the cluster since they seem to not be

Re: [ceph-users] Slow requests blocked. No rebalancing

2018-09-20 Thread Jaime Ibar
Hi all, after increasing the mon_max_pg_per_osd number, ceph starts rebalancing as usual. However, the slow requests warnings are still there, even after setting primary-affinity to 0 beforehand. On the other hand, if I destroy the osd, ceph will start rebalancing unless the noout flag is set, am I

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread KEVIN MICHAEL HRPCEK
The mons have a 300gb raid 1 on 10k sas. The /var lv is 44% full with the /var/lib/ceph/mon directory at 6.7gb. When ms_dispatch is running 100% it is all user time, with iostat showing 0-2% utilization of the drive. I'm considering taking one of the mon's raid 1 drives and dropping them into a

Re: [ceph-users] Slow requests blocked. No rebalancing

2018-09-20 Thread Paul Emmerich
You can prevent creation of the PGs on the old filestore OSDs (which seems to be the culprit here) during replacement by replacing the disks the hard way: * ceph osd destroy osd.X * re-create with bluestore under the same id (ceph volume ... --osd-id X) it will then just backfill onto the same
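
A hedged sketch of that in-place replacement, assuming the OSD id is X and its data device is /dev/sdX (both placeholders):

  ceph osd destroy osd.X --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdX --destroy
  ceph-volume lvm create --bluestore --data /dev/sdX --osd-id X

Because the id and CRUSH position are kept, the new bluestore OSD just backfills its old PGs instead of triggering a cluster-wide rebalance.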

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
Thank you very much Paul. Kevin On Thu, 20 Sep 2018 at 15:19, Paul Emmerich <paul.emmer...@croit.io> wrote: > Hi, > > device classes are internally represented as completely independent > trees/roots; showing them in one tree is just syntactic sugar. > > For example, if you have a

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread Sage Weil
Can you try 'perf top -g -p ' to see where all of the encoding activity is coming from? I see two possibilities (the mon attempts to cache encoded maps, and the MOSDMap message itself will also reencode if/when that fails). Also: mon_osd_cache_size = 10 by default... try making that 500 or
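
A hedged sketch of those two suggestions (the mon name and the cache value are placeholders):

  perf top -g -p $(pidof ceph-mon)
  # on each mon host, raise the osdmap cache from the default of 10:
  ceph daemon mon.$(hostname -s) config set mon_osd_cache_size 500
  # and persist it in ceph.conf under [mon]: mon osd cache size = 500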

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Paul Emmerich
Hi, device classes are internally represented as completely independent trees/roots; showing them in one tree is just syntactic sugar. For example, if you have a hierarchy like root --> host1, host2, host3 --> nvme/ssd/sata OSDs, then you'll actually have 3 trees: root~ssd -> host1~ssd,
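
A hedged sketch of the per-class rules that produce those shadow trees (rule names are placeholders; the failure domain matches the rack example from the question):

  ceph osd crush rule create-replicated rack-ssd default rack ssd
  ceph osd crush rule create-replicated rack-hdd default rack hdd
  ceph osd crush tree --show-shadow   # lists root~ssd, root~hdd, host1~ssd, ...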

Re: [ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
To answer my own question: ceph osd crush tree --show-shadow Sorry for the noise... On Thu, 20 Sep 2018 at 14:54, Kevin Olbrich wrote: > Hi! > > Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host. > I also have replication rules to distinguish between HDD and SSD

Re: [ceph-users] Mimic upgrade failure

2018-09-20 Thread David Turner
Out of curiosity, what disks do you have your mons on and how does the disk usage, both utilization% and full%, look while this is going on? On Wed, Sep 19, 2018, 1:57 PM Kevin Hrpcek wrote: > Majority of the clients are luminous with a few kraken stragglers. I > looked at ceph features and

[ceph-users] Crush distribution with heterogeneous device classes and failure domain hosts

2018-09-20 Thread Kevin Olbrich
Hi! Currently I have a cluster with four hosts and 4x HDDs + 4 SSDs per host. I also have replication rules to distinguish between HDD and SSD (and failure-domain set to rack) which are mapped to pools. What happens if I add a heterogeneous host with 1x SSD and 1x NVMe (where NVMe will be a new

[ceph-users] Can't remove DeleteMarkers in rgw bucket

2018-09-20 Thread Sean Purdy
Hi, We have a bucket that we are trying to empty. Versioning and lifecycle was enabled. We deleted all the objects in the bucket. But this left a whole bunch of Delete Markers. aws s3api delete-object --bucket B --key K --version-id V is not deleting the delete markers. Any ideas? We
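
A hedged sketch of enumerating the remaining delete markers and retrying removal in bulk - essentially a scripted form of the per-object call above (bucket name and endpoint are placeholders):

  aws s3api list-object-versions --bucket B --endpoint-url "$RGW_ENDPOINT" \
      --query 'DeleteMarkers[].[Key,VersionId]' --output text |
  while read -r key version; do
      aws s3api delete-object --bucket B --endpoint-url "$RGW_ENDPOINT" \
          --key "$key" --version-id "$version"
  done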

Re: [ceph-users] Slow requests blocked. No rebalancing

2018-09-20 Thread Eugen Block
Hi, to reduce impact on clients during migration I would set the OSD's primary-affinity to 0 beforehand. This should prevent the slow requests, at least this setting has helped us a lot with problematic OSDs. Regards Eugen Zitat von Jaime Ibar : Hi all, we recently upgrade from
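
A hedged sketch of that primary-affinity trick for an OSD about to be migrated (the id is a placeholder):

  ceph osd primary-affinity osd.12 0   # stop acting as primary, reads go elsewhere
  # ... drain / rebuild the OSD ...
  ceph osd primary-affinity osd.12 1   # restore the default afterwards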

Re: [ceph-users] Slow requests blocked. No rebalancing

2018-09-20 Thread Darius Kasparavičius
Hello, 2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update: 249 PGs pending on creation (PENDING_CREATING_PGS) This error might indicate that you are hitting a PG limit per osd. Here some information on it https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
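
A hedged sketch of checking and temporarily raising that limit on a Luminous cluster (mon name and the value are placeholders, not recommendations):

  ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd   # run on a mon host
  ceph daemon mon.$(hostname -s) config set mon_max_pg_per_osd 400
  # persist in ceph.conf under [global]: mon_max_pg_per_osd = 400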

[ceph-users] Slow requests blocked. No rebalancing

2018-09-20 Thread Jaime Ibar
Hi all, we recently upgraded from Jewel 10.2.10 to Luminous 12.2.7, and now we're trying to migrate the OSDs to Bluestore following this document[0]. However, when I mark an osd as out, I'm getting warnings similar to these ones: 2018-09-20 09:32:46.079630 mon.dri-ceph01 [WRN] Health check

Re: [ceph-users] [RGWRados]librados: Objecter returned from getxattrs r=-36

2018-09-20 Thread John Spray
On Thu, Sep 20, 2018 at 9:42 AM fatkun chan wrote: > > OSD : filestore FileSystem: ext4 system: centos7.2 > > I use https://rook.io/ deploy ceph on kubernetes > > > ceph config > -- > mon keyvaluedb = rocksdb > mon_allow_pool_delete= true >

Re: [ceph-users] macos build failing

2018-09-20 Thread Marc Roos
When running ./do_cmake.sh, I get fatal: destination path '/Users/mac/ceph/src/zstd' already exists and is not an empty directory. fatal: clone of 'https://github.com/facebook/zstd' into submodule path '/Users/mac/ceph/src/zstd' failed Failed to clone 'src/zstd'. Retry scheduled fatal:
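
A hedged sketch of recovering from a half-cloned submodule like that before re-running do_cmake.sh (the checkout path comes from the error above):

  cd /Users/mac/ceph
  rm -rf src/zstd
  git submodule sync --recursive
  git submodule update --init --recursive
  ./do_cmake.sh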

[ceph-users] macos build failing

2018-09-20 Thread Marc Roos
Has anyone been able to build according to this manual? Because here it fails. http://docs.ceph.com/docs/mimic/dev/macos/ I have prepared macOS as described; it took 2h to build this llvm, is that really necessary? I do the git clone --single-branch -b mimic

Re: [ceph-users] Delay Between Writing Data and that Data being available for reading?

2018-09-20 Thread Thomas Sumpter
In case you or anyone else reading is interested, I tried using the latest fuse client instead of the kernel client and my problem seems to be gone. I think our kernel is recent enough that it should include the bug fix you mentioned? So maybe something else is going on there... Regards, Tom From:

Re: [ceph-users] [RGWRados]librados: Objecter returned from getxattrs r=-36

2018-09-20 Thread fatkun chan
OSD : filestore FileSystem: ext4 system: centos7.2 I use https://rook.io/ deploy ceph on kubernetes ceph config -- mon keyvaluedb = rocksdb mon_allow_pool_delete= true mon_max_pg_per_osd = 1000 debug default= 0 debug

Re: [ceph-users] Cluster Security

2018-09-20 Thread Florian Florensa
I was thinking of iscsi gateways colocated on the osd nodes and trying to distribute the LUNs as evenly as possible; would that setup work? Also, regarding the configuration of the iscsi target, is it stored inside the ceph cluster? On Thu, 20 Sep 2018 at 08:23, Jan Fajerski wrote: > Hi, > if

Re: [ceph-users] v12.2.8 Luminous released

2018-09-20 Thread Konstantin Shalygin
12.2.8 improves the deep scrub code to automatically repair these inconsistencies. Once the entire cluster has been upgraded and then fully deep scrubbed, and all such inconsistencies are resolved, it will be safe to disable the `osd distrust data digest = true` workaround option. Just for
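
For reference, a hedged sketch of the workaround option in question and how it would eventually be retired (the injectargs form is one possible way to flip it live):

  # during the upgrade window, in ceph.conf:
  [osd]
  osd distrust data digest = true
  # after the cluster is fully upgraded and deep scrubbed:
  ceph tell osd.* injectargs '--osd_distrust_data_digest=false'   # and drop the ceph.conf line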

Re: [ceph-users] Cluster Security

2018-09-20 Thread Jan Fajerski
Hi, if you want to isolate your HV from ceph's public network a gateway would do that (like iscsi gateway). Note however that this will also add an extra network hop and a potential bottleneck since all client traffic has to pass through the gateway node(s). HTH, Jan On Wed, Sep 19, 2018 at

Re: [ceph-users] Is luminous ceph rgw can only run with the civetweb ?

2018-09-20 Thread linghucongsong
1. It is for performance: nginx is faster than civetweb based on the cosbench test. 2. I want to use some extra functions with nginx, such as RTMP streaming data and adding watermarks to pictures, and so on. Nginx has more free and open-source, powerful plug-in units. At
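
A hedged sketch of a common middle ground: keep civetweb (or beast) as the rgw frontend and put nginx in front as a reverse proxy, which provides the nginx plug-in ecosystem without replacing the frontend (instance name and ports are placeholders):

  # ceph.conf
  [client.rgw.gateway1]
  rgw frontends = civetweb port=7480

  # nginx server block
  server {
      listen 80;
      location / {
          proxy_pass http://127.0.0.1:7480;
          proxy_set_header Host $host;
      }
  }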