[ceph-users] nfs-ganesha FSAL CephFS: nfs_health :DBUS :WARN :Health status is unhealthy

2018-09-10 Thread Kevin Olbrich
Hi! Today one of our nfs-ganesha gateways experienced an outage and it now crashes every time the client behind it tries to access the data. This is a Ceph Mimic cluster with nfs-ganesha from the Ceph repos: nfs-ganesha-2.6.2-0.1.el7.x86_64 nfs-ganesha-ceph-2.6.2-0.1.el7.x86_64 There were fixes for

[ceph-users] omap vs. xattr in librados

2018-09-10 Thread Benjamin Cherian
Hi, I'm interested in writing a relatively simple application that would use librados for storage. Are there recommendations for when to use the omap as opposed to an xattr? In theory, you could use either a set of xattrs or an omap as a kv store associated with a specific object. Are there
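For anyone wanting to experiment before committing to one approach, both mechanisms can be exercised from the rados CLI (pool and object names below are placeholders, not from this thread):

    rados -p testpool create obj1
    # xattrs: small, flat key/value pairs kept alongside the object metadata
    rados -p testpool setxattr obj1 myattr somevalue
    rados -p testpool getxattr obj1 myattr
    rados -p testpool listxattr obj1
    # omap: a per-object key/value store that scales to many keys and supports listing
    rados -p testpool setomapval obj1 mykey somevalue
    rados -p testpool getomapval obj1 mykey
    rados -p testpool listomapvals obj1

As a rough rule of thumb (my understanding, not a statement from the thread): xattrs suit a handful of small values, while omap is the better fit for larger or growing key sets, since on BlueStore omap keys live in RocksDB rather than with the onode.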

Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
> From: Mark Nelson > Sent: 10 September 2018 18:27 > Subject: Re: [ceph-users] Bluestore DB size and onode count > > On 09/10/2018 12:22 PM, Igor Fedotov wrote: > >

Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Igor Fedotov
On 9/10/2018 8:26 PM, Mark Nelson wrote: On 09/10/2018 12:22 PM, Igor Fedotov wrote: Just in case - is slow_used_bytes equal to 0? Some DB data might reside on the slow device if spillover has happened, which doesn't require the DB volume to be full - that's by RocksDB's design. And
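To check for spillover on a given OSD, the bluefs counters can be read from the admin socket (osd.0 is a placeholder; field names as I recall them from Luminous/Mimic):

    ceph daemon osd.0 perf dump bluefs
    # relevant fields: db_total_bytes, db_used_bytes, slow_total_bytes, slow_used_bytes
    # a non-zero slow_used_bytes means RocksDB data has spilled onto the slow device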

Re: [ceph-users] rbd-nbd on CentOS

2018-09-10 Thread Ilya Dryomov
On Mon, Sep 10, 2018 at 7:46 PM David Turner wrote: > > Now that you mention it, I remember those threads on the ML. What happens if > you use --yes-i-really-mean-it to do those things and then later you try to > map an RBD with an older kernel for CentOS 7.3 or 7.4? Will that mapping > fail

Re: [ceph-users] data corruption issue with "rbd export-diff/import-diff"

2018-09-10 Thread Patrick.Mclean
On 2018-09-10 11:04:20-07:00 Jason Dillaman wrote: On Mon, Sep 10, 2018 at 1:35 PM patrick.mcl...@sony.com wrote: > We utilize Ceph RBDs for our users' storage and need to keep data > synchronized across data centres. For this we rely on 'rbd export-diff / > import-diff'. Lately we have been

Re: [ceph-users] data corruption issue with "rbd export-diff/import-diff"

2018-09-10 Thread Jason Dillaman
On Mon, Sep 10, 2018 at 1:35 PM wrote: > > Hi, > We utilize Ceph RBDs for our users' storage and need to keep data > synchronized across data centres. For this we rely on 'rbd export-diff / > import-diff'. Lately we have been noticing cases in which the file system on > the 'destination RBD'

[ceph-users] data corruption issue with "rbd export-diff/import-diff"

2018-09-10 Thread Patrick.Mclean
Hi, We utilize Ceph RBDs for our users' storage and need to keep data synchronized across data centres. For this we rely on 'rbd export-diff / import-diff'. Lately we have been noticing cases in which the file system on the 'destination RBD' is corrupt. We have been trying to isolate the issue,
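For context, the usual snapshot-based replication loop looks roughly like this (pool, image, snapshot and host names are placeholders, not the poster's actual setup):

    # source cluster: take a new snapshot and export the delta since the previous one
    rbd snap create testpool/testimage@snap2
    rbd export-diff --from-snap snap1 testpool/testimage@snap2 - | \
        ssh backup-host rbd import-diff - testpool/testimage
    # import-diff checks that snap1 exists on the destination and creates snap2 there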

Re: [ceph-users] rbd-nbd on CentOS

2018-09-10 Thread David Turner
Now that you mention it, I remember those threads on the ML. What happens if you use --yes-i-really-mean-it to do those things and then later you try to map an RBD with an older kernel for CentOS 7.3 or 7.4? Will that mapping fail because of the min-client-version of luminous set on the cluster
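For reference, the setting in question can be inspected and changed as follows; whether an older krbd client can still map images afterwards is exactly what the thread is asking:

    # what the cluster currently requires, and which client releases are connected
    ceph osd dump | grep min_compat_client
    ceph features
    # raise the requirement; --yes-i-really-mean-it overrides the check for
    # already-connected older clients
    ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it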

Re: [ceph-users] rbd-nbd on CentOS

2018-09-10 Thread Ilya Dryomov
On Mon, Sep 10, 2018 at 7:19 PM David Turner wrote: > > I haven't found any mention of this on the ML and Google's results are all > about compiling your own kernel to use NBD on CentOS. Is everyone that's > using rbd-nbd on CentOS honestly compiling their own kernels for the clients? > This

Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Mark Nelson
On 09/10/2018 12:22 PM, Igor Fedotov wrote: Hi Nick. On 9/10/2018 1:30 PM, Nick Fisk wrote: If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there

Re: [ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Igor Fedotov
Hi Nick. On 9/10/2018 1:30 PM, Nick Fisk wrote: If anybody has 5 minutes could they just clarify a couple of things for me 1. onode count, should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the

[ceph-users] rbd-nbd on CentOS

2018-09-10 Thread David Turner
I haven't found any mention of this on the ML and Google's results are all about compiling your own kernel to use NBD on CentOS. Is everyone that's using rbd-nbd on CentOS honestly compiling their own kernels for the clients? This feels like something that shouldn't be necessary anymore. I would
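For anyone unfamiliar with the tool, basic rbd-nbd usage looks like this (image name is a placeholder); the thread's premise is that the required nbd kernel module is not shipped with stock CentOS kernels:

    modprobe nbd                 # this is the step that fails without an nbd-capable kernel
    rbd-nbd map rbd/testimage    # prints the attached device, e.g. /dev/nbd0
    rbd-nbd list-mapped
    rbd-nbd unmap /dev/nbd0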

[ceph-users] tier monitoring

2018-09-10 Thread Fyodor Ustinov
Hi! Does anyone have a recipe for monitoring a tiering pool? I am interested in parameters such as fullness, flush/evict/promote statistics and so on. WBR, Fyodor.
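A few stock commands expose most of these numbers (the pool name is a placeholder):

    ceph df detail                  # per-pool usage, including dirty objects on a cache pool
    ceph osd pool stats cachepool   # current client and cache I/O on the tier
    ceph status                     # shows "cache io" flush/evict/promote rates while active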

[ceph-users] Need a procedure for corrupted pg_log repair using ceph-kvstore-tool

2018-09-10 Thread Maks Kowalik
Can someone provide information about what to look for (and how to modify the related leveldb keys) in the case of an error like the following, which leads to an OSD crash? -5> 2018-09-10 14:46:30.896130 7efff657dd00 20 read_log_and_missing 712021'566147 (656569'562061) delete
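Not an answer to the specific key layout, but a sketch of how ceph-kvstore-tool is generally driven against a stopped FileStore OSD (the OSD id, store path and leveldb backend are assumptions; back the store up before removing anything):

    systemctl stop ceph-osd@12
    # FileStore keeps its omap (where pg_log entries live) in a leveldb under the data dir
    ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-12/current/omap list > keys.txt
    # inspect or remove a single key, using the prefix/key pair taken from the listing
    ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-12/current/omap get <prefix> <key>
    ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-12/current/omap rm <prefix> <key>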

Re: [ceph-users] Need help

2018-09-10 Thread John Spray
(adding list back) The "clients failing to respond to capability release" messages can sometimes indicate a bug in the client code, so it's a good idea to make sure you've got the most recent fixes before investigating further. It's also useful to compare kernel vs. fuse clients to see if the

Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-09-10 Thread Kees Meijs
Hi list, A little update: meanwhile we added a new node consisting of Hammer OSDs to ensure sufficient cluster capacity. The upgraded node with Infernalis OSDs is completely removed from the CRUSH map and the OSDs removed (obviously we didn't wipe the disks yet). At the moment we're still

Re: [ceph-users] Need help

2018-09-10 Thread John Spray
On Mon, Sep 10, 2018 at 1:40 PM marc-antoine desrochers wrote: > > Hi, > > > > I am currently running a ceph cluster running in CEPHFS with 3 nodes each > have 6 osd’s except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 mon. > > > > > > [root@ceph-n1 ~]# ceph -s > > cluster: > >

Re: [ceph-users] Need help

2018-09-10 Thread Burkhard Linke
Hi, On 09/10/2018 02:40 PM, marc-antoine desrochers wrote: Hi, I am currently running a ceph cluster running in CEPHFS with 3 nodes each have 6 osd's except 1 who got 5. I got 3 mds : 2 active and 1 standby, 3 mon. [root@ceph-n1 ~]# ceph -s cluster: id:

Re: [ceph-users] Need help

2018-09-10 Thread Marc Roos
I guess good luck. Maybe you can ask these guys to hurry up and get something production ready: https://github.com/ceph-dovecot/dovecot-ceph-plugin > From: marc-antoine desrochers > Sent: Monday 10 September 2018 14:40

Re: [ceph-users] upgrade jewel to luminous with ec + cache pool

2018-09-10 Thread David Turner
Yes, migrating to 12.2.8 is fine. Migrating away from the cache tier is as simple as changing the EC pool to allow EC overwrites, changing the cache tier mode to forward, flushing the tier, and removing it. Basically, once you have EC overwrites, just follow the steps in the docs for
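Spelled out, the steps described above look roughly like this (pool names are placeholders; double-check against the cache-tiering docs before running them):

    # allow overwrites on the erasure-coded base pool (requires BlueStore OSDs)
    ceph osd pool set ecpool allow_ec_overwrites true
    # stop caching new writes and drain the tier
    ceph osd tier cache-mode cachepool forward --yes-i-really-mean-it
    rados -p cachepool cache-flush-evict-all
    # detach the now-empty tier from the base pool
    ceph osd tier remove-overlay ecpool
    ceph osd tier remove ecpool cachepool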

[ceph-users] Need help

2018-09-10 Thread marc-antoine desrochers
Hi, I am currently running a Ceph cluster using CephFS with 3 nodes, each of which has 6 OSDs except one that has 5. I have 3 MDS (2 active and 1 standby) and 3 mons. [root@ceph-n1 ~]# ceph -s cluster: id: 1d97aa70-2029-463a-b6fa-20e98f3e21fb health: HEALTH_WARN 3

Re: [ceph-users] tcmu-runner could not find handler

2018-09-10 Thread Jason Dillaman
On Mon, Sep 10, 2018 at 6:36 AM 展荣臻 wrote: > > hi everyone: > > I want to export ceph rbd via iscsi. > ceph version is 10.2.11, centos 7.5 kernel 3.10.0-862.el7.x86_64, > and i also installed > tcmu-runner, targetcli-fb, python-rtslib, ceph-iscsi-config, ceph-iscsi-cli. > but when i

Re: [ceph-users] Mimic upgrade failure

2018-09-10 Thread Sage Weil
I took a look at the mon log you sent. A few things I noticed: - The frequent mon elections seem to get only 2/3 mons about half of the time. - The messages coming in are mostly osd_failure, and half of those seem to be recoveries (cancellations of the failure messages). It does smell a bit like

[ceph-users] upgrade jewel to luminous with ec + cache pool

2018-09-10 Thread Markus Hickel
Dear all, I am running a CephFS cluster (Jewel 10.2.10) with an EC + cache pool. There is a thread on the ML stating that skipping 10.2.11 and going straight to 12.2.8 is possible; does this work with an EC + cache pool as well? I also wanted to ask if there is a recommended migration path from CephFS with

[ceph-users] tcmu-runner could not find handler

2018-09-10 Thread 展荣臻
Hi everyone: I want to export a Ceph RBD via iSCSI. The Ceph version is 10.2.11 on CentOS 7.5, kernel 3.10.0-862.el7.x86_64, and I also installed tcmu-runner, targetcli-fb, python-rtslib, ceph-iscsi-config and ceph-iscsi-cli. But when I launch "create pool=rbd image=disk_1 size=10G" with gwcli
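"Could not find handler" usually points at the handler plugin itself; a couple of things worth checking (the library path below is an assumption for RHEL/CentOS packaging, and some tcmu-runner builds reportedly ship without the rbd handler at all):

    systemctl status tcmu-runner
    # the rbd handler is a separate shared object; if it is missing, tcmu-runner
    # cannot back rbd images (directory path is an assumption)
    ls /usr/lib64/tcmu-runner/
    journalctl -u tcmu-runner --since today    # look for handler load errors at startup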

[ceph-users] Bluestore DB size and onode count

2018-09-10 Thread Nick Fisk
If anybody has 5 minutes, could they just clarify a couple of things for me? 1. onode count - should this be equal to the number of objects stored on the OSD? Through reading several posts, there seems to be a general indication that this is the case, but looking at my OSDs the maths doesn't work.
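For reference, the onode counters come from the OSD admin socket (osd.0 is a placeholder). As far as I understand, bluestore_onodes reflects onodes currently held in the BlueStore cache rather than every object on the OSD, which alone can make the numbers look off:

    ceph daemon osd.0 perf dump | grep onode
    # bluestore_onodes plus the onode hit/miss counters show how warm the metadata cache is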

Re: [ceph-users] Force unmap of RBD image

2018-09-10 Thread Martin Palma
Thanks for the suggestions; I will check for LVM volumes etc. in future. The kernel version is 3.10.0-327.4.4.el7.x86_64 and the OS is CentOS 7.2.1511 (Core). Best, Martin On Mon, Sep 10, 2018 at 12:23 PM Ilya Dryomov wrote: > > On Mon, Sep 10, 2018 at 10:46 AM Martin Palma

Re: [ceph-users] Force unmap of RBD image

2018-09-10 Thread Ilya Dryomov
On Mon, Sep 10, 2018 at 10:46 AM Martin Palma wrote: > > We are trying to unmap an rbd image form a host for deletion and > hitting the following error: > > rbd: sysfs write failed > rbd: unmap failed: (16) Device or resource busy > > We used commands like "lsof" and "fuser" but nothing is

[ceph-users] Tiering stats are blank on Bluestore OSD's

2018-09-10 Thread Nick Fisk
After upgrading a number of OSDs to Bluestore, I have noticed that the cache tier OSDs which have so far been upgraded are no longer logging tier_* stats: "tier_promote": 0, "tier_flush": 0, "tier_flush_fail": 0, "tier_try_flush": 0, "tier_try_flush_fail":
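For completeness, the counters quoted above come from the OSD admin socket (the OSD id is a placeholder):

    ceph daemon osd.0 perf dump | grep tier_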

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-10 Thread Menno Zonneveld
> From: Alwin Antreich > Sent: Thursday 6th September 2018 18:36 > Subject: Re: [ceph-users] Rados performance inconsistencies, lower than expected performance > > On Thu, Sep 06, 2018 at 05:15:26PM +0200, Marc

Re: [ceph-users] Rados performance inconsistencies, lower than expected performance

2018-09-10 Thread Menno Zonneveld
I filled up the cluster by accident by not supplying --no-cleanup to the write benchmark; I'm sure there must be a better way to do that, though. I've run the tests again, first with the cluster 'empty' (I have a few test VMs stored on Ceph), and then let it fill up again. Performance goes up from
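For reference, a typical rados bench cycle with explicit cleanup looks like this (the pool name is a placeholder):

    rados bench -p testbench 60 write --no-cleanup   # keep the objects for the read phases
    rados bench -p testbench 60 seq
    rados bench -p testbench 60 rand
    rados -p testbench cleanup                       # remove the benchmark objects afterwards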

[ceph-users] Force unmap of RBD image

2018-09-10 Thread Martin Palma
We are trying to unmap an rbd image from a host for deletion and are hitting the following error: rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy We used commands like "lsof" and "fuser" but nothing is reported to be using the device. Also checked for watchers with "rados -p pool
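A couple of commands that usually help in this situation (pool/image names are placeholders; the force unmap option needs a reasonably recent kernel and rbd CLI):

    rbd status testpool/testimage              # lists watchers on the image header
    rbd info testpool/testimage                # block_name_prefix reveals the image id
    rados -p testpool listwatchers rbd_header.<id>
    # last resort, if nothing legitimate holds the device:
    rbd unmap -o force /dev/rbd0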

Re: [ceph-users] Mimic upgrade failure

2018-09-10 Thread Janne Johansson
On Mon, 10 Sep 2018 at 08:10, Kevin Hrpcek wrote: > Update for the list archive. > > I went ahead and finished the mimic upgrade with the osds in a fluctuating > state of up and down. The cluster did start to normalize a lot easier after > everything was on mimic since the random mass OSD

Re: [ceph-users] Mimic upgrade failure

2018-09-10 Thread Kevin Hrpcek
Update for the list archive. I went ahead and finished the mimic upgrade with the osds in a fluctuating state of up and down. The cluster did start to normalize a lot easier after everything was on mimic since the random mass OSD heartbeat failures stopped and the constant mon election