Re: [ceph-users] mount cephfs on ceph servers

2019-03-06 Thread Hector Martin
On 06/03/2019 12:07, Zhenshi Zhou wrote: Hi, I'm gonna mount cephfs from my ceph servers for some reason, including monitors, metadata servers and osd servers. I know it's not a best practice. But what is the exact potential danger if I mount cephfs from its own server? As a datapoint, I have

Re: [ceph-users] 14.1.0, No dashboard module

2019-03-06 Thread Kai Wagner
Hi all, I think this change really late in the game just results in confusion. I would be in favor of making the ceph-mgr-dashboard package a dependency of ceph-mgr so that people just need to enable the dashboard without having to install another package separately. This way we could also
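
As things stand in 14.1.0, getting the dashboard up is a two-step affair once the separate package is in place; roughly the following, though package manager and user-creation details vary by distro and may change before the final release:

    # install the now-separate dashboard package
    yum install ceph-mgr-dashboard      # or: apt install ceph-mgr-dashboard
    # enable the module on the mgr
    ceph mgr module enable dashboard
    # optionally, a self-signed TLS cert and an admin account
    ceph dashboard create-self-signed-cert
    ceph dashboard ac-user-create admin <password> administrator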

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Mark Nelson
On 3/5/19 4:23 PM, Vitaliy Filippov wrote: Testing -rw=write without -sync=1 or -fsync=1 (or -fsync=32 for batch IO, or just fio -ioengine=rbd from outside a VM) is rather pointless - you're benchmarking the RBD cache, not Ceph itself. RBD cache is coalescing your writes into big sequential
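
For illustration, a random-write test that sidesteps the coalescing effect and actually stresses the OSDs could look roughly like this (pool and image names are placeholders):

    # 4k random writes against an RBD image, with periodic fsync so the
    # writeback cache cannot simply absorb the workload
    fio --name=randwrite --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=testimg \
        --rw=randwrite --bs=4k --iodepth=32 \
        --fsync=32 --runtime=60 --time_based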

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Stefan Priebe - Profihost AG
Hi Mark, On 05.03.19 at 23:12, Mark Nelson wrote: > Hi Stefan, > > > Could you try running your random write workload against bluestore and > then take a wallclock profile of an OSD using gdbpmp? It's available here: > > > https://github.com/markhpc/gdbpmp Sure, but it does not work: #
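
For reference, the usual invocation (flags as I remember them from the project README, so please double-check with --help) is roughly:

    git clone https://github.com/markhpc/gdbpmp
    # attach to one running OSD and collect ~1000 samples; needs gdb with
    # python support and the ceph debug symbols installed
    ./gdbpmp/gdbpmp.py -p <pid-of-ceph-osd> -n 1000 -o osd.gdbpmp
    # print the collected wallclock call tree
    ./gdbpmp/gdbpmp.py -i osd.gdbpmp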

Re: [ceph-users] mount cephfs on ceph servers

2019-03-06 Thread David C
The general advice has been to not use the kernel client on an osd node as you may see a deadlock under certain conditions. Using the fuse client should be fine or use the kernel client inside a VM. On Wed, 6 Mar 2019, 03:07 Zhenshi Zhou, wrote: > Hi, > > I'm gonna mount cephfs from my ceph
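
For concreteness, the two client types are mounted along these lines (monitor address, credentials and mount points are placeholders):

    # FUSE client - generally considered safe on an OSD node
    ceph-fuse -n client.admin /mnt/cephfs

    # kernel client - better kept off the OSD hosts, or run inside a VM
    mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs \
          -o name=admin,secretfile=/etc/ceph/admin.secret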

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Mark Nelson
On 3/6/19 5:12 AM, Stefan Priebe - Profihost AG wrote: Hi Mark, On 05.03.19 at 23:12, Mark Nelson wrote: Hi Stefan, Could you try running your random write workload against bluestore and then take a wallclock profile of an OSD using gdbpmp? It's available here:

Re: [ceph-users] Mounting image from erasure-coded pool without tiering in KVM

2019-03-06 Thread Vitaliy Filippov
Check if you have a recent enough librbd installed on your VM hosts. Hello, all! I have a problem with adding image volumes to my KVM VM. I prepared an erasure-coded pool (named data01) on full-bluestore OSDs and allowed ec_overwrites on it. Also I created a replicated pool for image volumes
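
As a sketch of the setup being described (pool names, PG counts and sizes are placeholders), the usual pattern is an EC data pool with overwrites enabled plus a replicated pool holding the image metadata:

    ceph osd pool create data01 64 64 erasure
    ceph osd pool set data01 allow_ec_overwrites true
    ceph osd pool create rbd-meta 64 64 replicated
    # metadata and header live in the replicated pool, data goes to the EC pool
    rbd create rbd-meta/vm-disk1 --size 100G --data-pool data01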

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Darius Kasparavičius
Hi there, it's 1.2%, not 1200%. On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside wrote: > > Hi, > > I'm still seeing this issue during failure testing of a new Mimic 13.2.4 > cluster. To reproduce: > > - Working Mimic 13.2.4 cluster > - Pull a disk > - Wait for recovery to complete (i.e. back to

Re: [ceph-users] mount cephfs on ceph servers

2019-03-06 Thread Jake Grimmett
Just to add a "+1" on this datapoint: based on one month of usage on Mimic 13.2.4, essentially "it works great for us". Prior to this, we had issues with the kernel driver on 12.2.2. This could have been due to limited RAM on the OSD nodes (128GB / 45 OSDs) and an older kernel. Upgrading the RAM to

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Simon Ironside
Hi, I'm still seeing this issue during failure testing of a new Mimic 13.2.4 cluster. To reproduce: - Working Mimic 13.2.4 cluster - Pull a disk - Wait for recovery to complete (i.e. back to HEALTH_OK) - Remove the OSD with `ceph osd crush remove` - See greater than 100% degraded objects
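
For reference, the removal sequence in question is the standard one (OSD id is a placeholder); per the report, the greater-than-100% degraded figure shows up after the crush remove step:

    ceph osd out osd.12
    # wait for recovery / HEALTH_OK, then:
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm osd.12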

Re: [ceph-users] Deploy Cehp in multisite setup

2019-03-06 Thread Daniel Gryniewicz
On 3/5/19 2:15 PM, Paul Emmerich wrote: Choose two: * POSIX filesystem with a reliable storage underneath * Multiple sites with poor or high-latency connection between them * Performance If you can get away with S3/Object access, rather than POSIX FS, you could use the RadosGW from Ceph.

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Simon Ironside
That's the misplaced objects, no problem there. Degraded objects are at 153.818%. Simon On 06/03/2019 15:26, Darius Kasparavičius wrote: Hi, there it's 1.2% not 1200%. On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside wrote: Hi, I'm still seeing this issue during failure testing of a new

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Darius Kasparavičius
For some reason I didn't notice that number. But it's most likely you are hitting this or a similar bug: https://tracker.ceph.com/issues/21803 On Wed, Mar 6, 2019, 17:30 Simon Ironside wrote: > That's the misplaced objects, no problem there. Degraded objects are at > 153.818%. > > Simon > > On

Re: [ceph-users] backfill_toofull after adding new OSDs

2019-03-06 Thread Simon Ironside
I've just seen this when *removing* an OSD too. Issue resolved itself during recovery. OSDs were not full, not even close, there's virtually nothing on this cluster. Mimic 13.2.4 on RHEL 7.6. OSDs are all Bluestore HDD with SSD DBs. Everything is otherwise default.   cluster:     id: MY

Re: [ceph-users] optimize bluestore for random write i/o

2019-03-06 Thread Stefan Priebe - Profihost AG
On 06.03.19 at 14:08, Mark Nelson wrote: > > On 3/6/19 5:12 AM, Stefan Priebe - Profihost AG wrote: >> Hi Mark, >> On 05.03.19 at 23:12, Mark Nelson wrote: >>> Hi Stefan, >>> >>> >>> Could you try running your random write workload against bluestore and >>> then take a wallclock profile of an

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Casey Bodley
Hi Trey, I think it's more likely that these stale metadata entries are from deleted buckets, rather than accidental bucket reshards. When a bucket is deleted in a multisite configuration, we don't delete its bucket instance because other zones may still need to sync the object deletes - and
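
For anyone wanting to inspect these, the leftover instances can be listed and examined with the metadata commands, and removed once it is certain every zone has caught up on the deletes (identifiers are placeholders):

    radosgw-admin metadata list bucket.instance
    radosgw-admin metadata get bucket.instance:<bucket_name>:<instance_id>
    # only once all zones have synced the object deletes
    radosgw-admin metadata rm bucket.instance:<bucket_name>:<instance_id>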

[ceph-users] MDS crashes on client connection

2019-03-06 Thread Kadiyska, Yana
Hi all, I am a new user on this list. I have a legacy production system running ceph version 0.94.7. Ceph itself appears to be functioning well; ceph -s is reporting good health. I am connecting to the filesystem via an hdfs client. Upon connection I see the client receiving messages like

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
Casey, This was the result of trying 'data sync init': root@c2-rgw1:~# radosgw-admin data sync init ERROR: source zone not specified root@c2-rgw1:~# radosgw-admin data sync init --source-zone= WARNING: cannot find source zone id for name= ERROR: sync.init_sync_status() returned ret=-2
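
For what it's worth, the command expects the zone name spelled out, along the lines of the following (zone name is a placeholder, and as far as I know the gateways need a restart afterwards so the full sync is actually rescheduled):

    radosgw-admin data sync init --source-zone=<other_zone_name>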

Re: [ceph-users] backfill_toofull after adding new OSDs

2019-03-06 Thread Paul Emmerich
Yeah, this happens all the time during backfilling since Mimic and is some kind of bug. It will always resolve itself, but it's still quite annoying. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München

Re: [ceph-users] Ceph REST API

2019-03-06 Thread Martin Verges
Hello, you could use croit to manage your cluster. We provide an extensive RESTful API that can be used to automate nearly everything in your cluster. Take a look at https://croit.io/docs/v1901 and try it yourself with our Vagrant demo or using the production guide. If you miss something, please

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
Casey, You are spot on that almost all of these are deleted buckets. At some point in the last few months we deleted and replaced buckets with underscores in their names, and those are responsible for most of these errors. Thanks very much for the reply and explanation. We’ll give ‘data sync

[ceph-users] Can CephFS Kernel Client Not Read & Write at the Same Time?

2019-03-06 Thread Andrew Richards
We recently discovered that our CephFS mount appeared to halt reads while writes were being synced to the Ceph cluster, to the point that it was affecting applications. I also posted this as a Gist with embedded graph images to help illustrate:

[ceph-users] Deploying a Ceph+NFS Server Cluster with Rook

2019-03-06 Thread Jeff Layton
I had several people ask me to put together some instructions on how to deploy a Ceph+NFS cluster from scratch, and the new functionality in Ceph and rook.io make this quite easy. I wrote a Ceph community blog post that walks the reader through the process:

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Simon Ironside
Yes, as I said that bug is marked resolved. It's also marked as only affecting jewel and luminous. I'm pointing out that it's still an issue today in mimic 13.2.4. Simon On 06/03/2019 16:04, Darius Kasparavičius wrote: For some reason I didn't notice that number. But it's most likely you are

Re: [ceph-users] radosgw sync falling behind regularly

2019-03-06 Thread Trey Palmer
It appears we eventually got 'data sync init' working. At least, it's worked on 5 of the 6 sync directions in our 3-node cluster. The sixth has not run without an error returned, although 'sync status' does say "preparing for full sync". Thanks, Trey On Wed, Mar 6, 2019 at 1:22 PM Trey Palmer

Re: [ceph-users] 14.1.0, No dashboard module

2019-03-06 Thread solarflow99
sounds right to me On Wed, Mar 6, 2019 at 7:35 AM Kai Wagner wrote: > Hi all, > > I think this change really late in the game just results into confusion. > > I would be in favor to make the ceph-mgr-dashboard package a dependency of > the ceph-mgr so that people just need to enable the

Re: [ceph-users] mount cephfs on ceph servers

2019-03-06 Thread Daniele Riccucci
Hello, is the deadlock risk still an issue in containerized deployments? For example with OSD daemons in containers and mounting the filesystem on the host machine? Thank you. Daniele On 06/03/19 16:40, Jake Grimmett wrote: Just to add "+1" on this datapoint, based on one month usage on

[ceph-users] http://tracker.ceph.com/issues/38122

2019-03-06 Thread Milanov, Radoslav Nikiforov
Can someone elaborate on the attached screenshot from http://tracker.ceph.com/issues/38122? Which package exactly is missing? And why is this happening? In Mimic all dependencies are resolved by yum? - Rado

Re: [ceph-users] http://tracker.ceph.com/issues/38122

2019-03-06 Thread Brad Hubbard
+Jos Collin On Thu, Mar 7, 2019 at 9:41 AM Milanov, Radoslav Nikiforov wrote: > Can someone elaborate on > > > > From http://tracker.ceph.com/issues/38122 > > > > Which exactly package is missing? > > And why is this happening ? In Mimic all dependencies are resolved by yum? > > - Rado > > >

Re: [ceph-users] How To Scale Ceph for Large Numbers of Clients?

2019-03-06 Thread Patrick Donnelly
Hello Zack, On Wed, Mar 6, 2019 at 1:18 PM Zack Brenton wrote: > > Hello, > > We're running Ceph on Kubernetes 1.12 using the Rook operator > (https://rook.io), but we've been struggling to scale applications mounting > CephFS volumes above 600 pods / 300 nodes. All our instances use the

[ceph-users] rados cppool Input/Output Error on RGW pool

2019-03-06 Thread Wido den Hollander
Hi, I'm trying to do a 'rados cppool' of a RGW index pool and I keep hitting this error: .rgw.buckets.index:.dir.default.20674.1 => .rgw.buckets.index.new:.dir.default.20674.1 error copying object: (0) Success error copying pool .rgw.buckets.index => .rgw.buckets.index.new: (5) Input/output

[ceph-users] How To Scale Ceph for Large Numbers of Clients?

2019-03-06 Thread Zack Brenton
Hello, We're running Ceph on Kubernetes 1.12 using the Rook operator ( https://rook.io), but we've been struggling to scale applications mounting CephFS volumes above 600 pods / 300 nodes. All our instances use the kernel client and run kernel `4.19.23-coreos-r1`. We've tried increasing the MDS
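
For context, the knobs that usually matter at this scale are the per-MDS cache size and the number of active MDS daemons, e.g. the following (values are placeholders; under Rook these normally come from the CephFilesystem CR rather than being set by hand):

    # 16 GiB of MDS cache, in bytes
    ceph config set mds mds_cache_memory_limit 17179869184
    # more than one active MDS for the filesystem
    ceph fs set <fs_name> max_mds 2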

[ceph-users] GetRole Error:405 Method Not Allowed

2019-03-06 Thread myxingkong
I created a role and attached a permission policy to it. radosgw-admin role create --role-name=S3Access --path=/application_abc/component_xyz/
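
For reference, attaching a permission policy to a role is normally done with role-policy put, roughly like this (the policy document below is a placeholder example, not the one from the original report):

    radosgw-admin role-policy put --role-name=S3Access \
        --policy-name=Policy1 \
        --policy-doc='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:*"],"Resource":"arn:aws:s3:::*"}]}'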

[ceph-users] PGs stuck in created state

2019-03-06 Thread simon falicon
Hello Ceph Users, I have an issue with my ceph cluster: after a serious failure of four SSDs (electrically dead) I have lost PGs (and their replicas) and have 14 PGs stuck. To correct this I tried to force-create these PGs (with the same IDs), but now the PGs are stuck in the creating state -_-" : ~# ceph -s
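
For anyone hitting the same situation, the force-create command being referred to is, as far as I know, the one below (pg id is a placeholder; older releases used 'ceph pg force_create_pg' instead):

    ceph osd force-create-pg <pgid>
    # then check what the cluster thinks of that pg
    ceph pg <pgid> query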

Re: [ceph-users] http://tracker.ceph.com/issues/38122

2019-03-06 Thread Jos Collin
I originally created this bug when I saw this issue on debian/stretch. But now it looks like install-deps.sh is not installing the 'colorize' package on Fedora either. I'm reopening this bug. On 07/03/19 8:32 AM, Brad Hubbard wrote: > +Jos Collin > > On Thu, Mar 7, 2019