[ceph-users] How to fix mon scrub errors?

2017-12-12 Thread Burkhard Linke
Hi, since the upgrade to luminous 12.2.2 the mons are complaining about scrub errors: 2017-12-13 08:49:27.169184 mon.ceph-storage-03 [ERR] scrub mismatch 2017-12-13 08:49:27.169203 mon.ceph-storage-03 [ERR]  mon.0 ScrubResult(keys {logm=87,mds_health=13} crc

Re: [ceph-users] Health Error : Request Stuck

2017-12-12 Thread Karun Josy
Cluster is unusable because of inactive PGs. How can we correct it? = ceph pg dump_stuck inactive ok PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY 1.4b activating+remapped [5,2,0,13,1] 5 [5,2,13,1,4] 5 1.35 activating+remapped [2,7,0,1,12]
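
For anyone debugging similar activating+remapped PGs, a minimal diagnostic sketch (the PG id 1.4b is taken from the listing above; adapt to your own stuck PGs):

    ceph health detail            # lists the inactive PGs and blocked requests
    ceph pg dump_stuck inactive   # the listing quoted above
    ceph pg 1.4b query            # peering state and any "blocked_by" OSDs for one PG
    ceph osd df tree              # sanity-check the OSDs involved in the remapping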

Re: [ceph-users] which version of ceph is better for cephfs in production

2017-12-12 Thread Yan, Zheng
On Wed, Dec 13, 2017 at 9:27 AM, 13605702...@163.com <13605702...@163.com> wrote: > hi > > since Jewel, cephfs is considered as production ready. > but can anybody tell me which version of ceph is better? Jewel? kraken? or > Luminous? > luminous, version 12.2.2 > thanks > >

Re: [ceph-users] inconsistent pg issue with ceph version 10.2.3

2017-12-12 Thread Thanh Tran
I fixed this inconsistency error. It seems ceph didn't delete the mismatched object that depended on the deleted snapshot. This caused the "unexpected clone" error that resulted in the inconsistent state. Log: 2017-12-12 20:14:06.651942 7fc7eff7e700 -1 log_channel(cluster) log [ERR] : deep-scrub 4.1b42
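
For reference, a hedged sketch of the commands usually used to pin down such an inconsistency on Jewel (the PG id 4.1b42 is taken from the log line above):

    rados list-inconsistent-obj 4.1b42 --format=json-pretty   # which object/clone mismatches and why
    ceph pg deep-scrub 4.1b42                                  # re-run the deep scrub after cleaning up
    ceph pg repair 4.1b42                                      # only once the cause of the mismatch is understood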

[ceph-users] Health Error : Request Stuck

2017-12-12 Thread Karun Josy
Hello, We added a new disk to the cluster and while rebalancing we are getting the following errors and warnings. = Overall status: HEALTH_ERR REQUEST_SLOW: 1824 slow requests are blocked > 32 sec REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec == The load in the servers seems to
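
A minimal sketch for locating where the slow and stuck requests are sitting (osd.N is a placeholder for whichever OSDs the health output names):

    ceph health detail                      # names the OSDs with blocked requests
    ceph daemon osd.N dump_ops_in_flight    # run on that OSD's host; shows what each op is waiting on
    ceph daemon osd.N dump_historic_ops     # recently completed ops with their durations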

[ceph-users] which version of ceph is better for cephfs in production

2017-12-12 Thread 13605702...@163.com
hi since Jewel, cephfs is considered as production ready. but can anybody tell me which version of ceph is better? Jewel? kraken? or Luminous? thanks 13605702...@163.com

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 3:36 PM wrote: > From: Gregory Farnum > Date: Tuesday, 12 December 2017 at 19:24 > To: "Vasilakakos, George (STFC,RAL,SC)" > Cc: "ceph-users@lists.ceph.com" >

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread george.vasilakakos
From: Gregory Farnum Date: Tuesday, 12 December 2017 at 19:24 To: "Vasilakakos, George (STFC,RAL,SC)" Cc: "ceph-users@lists.ceph.com" Subject: Re: [ceph-users] Sudden omap growth on some OSDs On Tue, Dec 12, 2017 at

Re: [ceph-users] Fwd: Lock doesn't want to be given up

2017-12-12 Thread Florian Margaine
Hi, As a follow-up, this PR for librbd seems to be what needs to be applied to krbd too. As said in the PR, the bug is very much reproducible after Jason Dillaman's suggestion. Regards, Florian Florian Margaine writes: > Hi, > > We're hitting an odd issue on our ceph

Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 12:33 PM Nick Fisk wrote: > > > That doesn't look like an RBD object -- any idea who is > > "client.34720596.1:212637720"? > > So I think these might be proxy ops from the cache tier, as there are also > block ops on one of the cache tier OSD's, but this

Re: [ceph-users] Bluestore Compression not inheriting pool option

2017-12-12 Thread Stefan Kooman
Quoting Nick Fisk (n...@fisk.me.uk): > Hi All, > > Has anyone been testing the bluestore pool compression option? > > I have set compression=snappy on a RBD pool. When I add a new bluestore OSD, > data is not being compressed when backfilling, confirmed by looking at the > perf dump results. If

Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Nick Fisk
> That doesn't look like an RBD object -- any idea who is > "client.34720596.1:212637720"? So I think these might be proxy ops from the cache tier, as there are also block ops on one of the cache tier OSD's, but this time it actually lists the object name. Block op on cache tier.

Re: [ceph-users] Error in osd_client.c, request_reinit

2017-12-12 Thread Ilya Dryomov
On Tue, Dec 12, 2017 at 8:18 PM, fcid wrote: > Hello everyone, > > We had an incident regarding a client which rebooted after experiencing some > issues with a ceph cluster. > > The other clients who consume RBD images from the same ceph cluster showed > an error at the time of

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 3:16 AM wrote: > > On 11 Dec 2017, at 18:24, Gregory Farnum > wrote: > > Hmm, this does all sound odd. Have you tried just restarting the primary > OSD yet? That frequently resolves transient oddities

[ceph-users] Error in osd_client.c, request_reinit

2017-12-12 Thread fcid
Hello everyone, We had an incident regarding a client which rebooted after experiencing some issues with a ceph cluster. The other clients who consume RBD images from the same ceph cluster showed an error in their logs related to libceph at the time of the reboot. The errors look like this: Dec

Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Gregory Farnum
Jason was more diligent than me and dug enough to realize that we print out the "raw pg"; we print that because we haven't gotten far enough in the pipeline to decode the actual object name. You'll note that it ends with the same characters as the PG does, and unlike a pgid, the raw pg
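
To illustrate the point: for a pool whose pg_num is a power of two, the PG is just the low bits of the raw object hash, which is why the trailing characters match. A small sketch, assuming pool 0 has pg_num = 512 (mask 0x1ff); check the real value with "ceph osd pool get rbd pg_num":

    printf '0.%x\n' $(( 0xae78c1cf & 0x1ff ))    # prints 0.1cf, the stuck PG from this thread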

Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Jason Dillaman
That doesn't look like an RBD object -- any idea who is "client.34720596.1:212637720"? On Tue, Dec 12, 2017 at 12:36 PM, Nick Fisk wrote: > Does anyone know what this object (0.ae78c1cf) might be, it's not your > normal run of the mill RBD object and I can't seem to find it in

Re: [ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Gregory Farnum
On Tue, Dec 12, 2017 at 9:37 AM Nick Fisk wrote: > Does anyone know what this object (0.ae78c1cf) might be, it's not your > normal run of the mill RBD object and I can't seem to find it in the pool > using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an >

Re: [ceph-users] Using CephFS in LXD containers

2017-12-12 Thread David Turner
We have a project using cephfs (ceph-fuse) in kubernetes containers. For us the throughput was limited by the mount point and not the cluster. Having a single mount point for each container would cap you at the throughput of a single mount point. We ended up mounting cephfs inside of the

[ceph-users] Bluestore Compression not inheriting pool option

2017-12-12 Thread Nick Fisk
Hi All, Has anyone been testing the bluestore pool compression option? I have set compression=snappy on a RBD pool. When I add a new bluestore OSD, data is not being compressed when backfilling, confirmed by looking at the perf dump results. If I then set the compression type on the pool again
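
For anyone reproducing this, a hedged sketch of the pool settings and OSD counters involved (the pool name rbd and osd.N are placeholders; counter names are as of Luminous):

    ceph osd pool set rbd compression_algorithm snappy
    ceph osd pool set rbd compression_mode aggressive    # "passive" only compresses writes hinted as compressible
    ceph daemon osd.N perf dump | grep -i compress       # bluestore compressed/allocated/original byte counters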

[ceph-users] Odd object blocking IO on PG

2017-12-12 Thread Nick Fisk
Does anyone know what this object (0.ae78c1cf) might be, it's not your normal run of the mill RBD object and I can't seem to find it in the pool using rados --all ls . It seems to be leaving the 0.1cf PG stuck in an activating+remapped state and blocking IO. Pool 0 is just a pure RBD pool with a

[ceph-users] Using CephFS in LXD containers

2017-12-12 Thread Bogdan SOLGA
Hello, everyone! We have recently started to use CephFS (Jewel, v12.2.1) from a few LXD containers. We have mounted it on the host servers and then exposed it in the LXD containers. Do you have any recommendations (dos and don'ts) on this way of using CephFS? Thank you, in advance! Kind

[ceph-users] Fwd: Lock doesn't want to be given up

2017-12-12 Thread Florian Margaine
Hi, We're hitting an odd issue on our ceph cluster: - We have machine1 mapping an exclusive-lock RBD. - Machine2 wants to take a snapshot of the RBD, but fails to take the lock. Stracing the rbd snap process on machine2 shows it looping on sending "lockget" commands, without ever moving
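
A hedged sketch of the commands for checking who holds the exclusive lock and which clients are watching the image (pool and image names are placeholders):

    rbd lock list rbd/myimage    # shows the current lock holder and locker id
    rbd status rbd/myimage       # lists watchers, i.e. clients with the image open or mapped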

[ceph-users] inconsistent pg issue with ceph version 10.2.3

2017-12-12 Thread Thanh Tran
Hi, My ceph cluster has an inconsistent pg. I tried to deep scrub and repair the pg but this did not fix the problem. I found that the object that made the pg inconsistent depends on a snapshot (snap id is 2ccac = 183468) of an image. I deleted this snapshot, then queried the inconsistent pg and it showed empty, but my

Re: [ceph-users] ceph configuration backup - what is vital?

2017-12-12 Thread Wido den Hollander
On 12/12/2017 02:18 PM, David Turner wrote: I always back up my crush map. Someone making a mistake to the crush map will happen and being able to restore last night's crush map has been wonderful. That's all I really back up. Yes, that's what I would suggest as well. Just have a daily
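
A minimal sketch of such a daily CRUSH map backup (file names are arbitrary):

    ceph osd getcrushmap -o crushmap.$(date +%F).bin                    # binary map, suitable for restore
    crushtool -d crushmap.$(date +%F).bin -o crushmap.$(date +%F).txt   # human-readable copy for diffing
    # restore yesterday's map if a change goes wrong:
    ceph osd setcrushmap -i crushmap.2017-12-11.bin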

Re: [ceph-users] ceph configuration backup - what is vital?

2017-12-12 Thread David Turner
I always back up my crush map. Someone making a mistake to the crush map will happen and being able to restore last night's crush map has been wonderful. That's all I really back up. On Tue, Dec 12, 2017, 5:53 AM Wolfgang Lendl < wolfgang.le...@meduniwien.ac.at> wrote: > hello, > > I'm looking

Re: [ceph-users] Slow objects deletion

2017-12-12 Thread David Turner
To delete objects quickly, I set up a multi-threaded python script, but then I learned about the --bypass-gc so I've been trying to use that instead of putting all of the objects into the GC to be deleted. Deleting using radosgw-admin is not multi-threaded. On Tue, Dec 12, 2017, 5:43 AM Rafał
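
For reference, the --bypass-gc variant mentioned here would look like this (a sketch; the bucket name is the one from the question further down):

    radosgw-admin bucket rm --bucket=bucket-3 --purge-objects --bypass-gc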

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Tobias Prousa
Thank you very much! I feel optimistic that now I got what I need to get that thing back working again. I'll report back... Best regards, Tobi On 12/12/2017 02:08 PM, Yan, Zheng wrote: On Tue, Dec 12, 2017 at 8:29 PM, Tobias Prousa wrote: Hi Zheng, the more you

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Yan, Zheng
On Tue, Dec 12, 2017 at 8:29 PM, Tobias Prousa wrote: > Hi Zheng, > > the more you tell me the more what I see begins to make sense to me. Thank > you very much. > > Could you please be a little more verbose about how to use rados rmomapkey? > What to use for and what to

Re: [ceph-users] Resharding issues / How long does it take?

2017-12-12 Thread Martin Emrich
Hi! (By the way, now a second bucket has this problem, it apparently occurs when the automatic resharding commences while data is being written to the bucket). Am 12.12.17 um 09:53 schrieb Orit Wasserman: On Mon, Dec 11, 2017 at 11:45 AM, Martin Emrich wrote:
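
A hedged sketch of the Luminous commands for inspecting and driving resharding (the bucket name is a placeholder):

    radosgw-admin reshard list                                        # pending/ongoing reshard operations
    radosgw-admin reshard status --bucket=mybucket                    # per-shard status of a reshard
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=128   # trigger a manual reshard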

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Tobias Prousa
Hi Zheng, the more you tell me the more what I see begins to make sense to me. Thank you very much. Could you please be a little more verbose about how to use rados rmomapkey? What to use for and what to use for <>. Here is what my dir_frag looks like: { "damage_type":
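
For context, the general form of the command being discussed (the metadata pool, dirfrag object and dentry key shown here are placeholders; the actual values for this cluster are elided in the archive):

    rados -p cephfs_metadata listomapkeys <dirfrag-object>            # list the dentry keys first
    rados -p cephfs_metadata rmomapkey <dirfrag-object> <dentry-key>  # remove one damaged dentry key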

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Yan, Zheng
On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa wrote: > Hi there, > > regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my > CephFs) I was able to get a little further with the suggested > "cephfs-table-tool take_inos ". This made the whole issue

Re: [ceph-users] Sudden omap growth on some OSDs

2017-12-12 Thread george.vasilakakos
On 11 Dec 2017, at 18:24, Gregory Farnum > wrote: Hmm, this does all sound odd. Have you tried just restarting the primary OSD yet? That frequently resolves transient oddities like this. If not, I'll go poke at the kraken source and one of the

Re: [ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Yan, Zheng
On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa wrote: > Hi there, > > regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my > CephFs) I was able to get a little further with the suggested > "cephfs-table-tool take_inos ". This made the whole issue

[ceph-users] ceph configuration backup - what is vital?

2017-12-12 Thread Wolfgang Lendl
hello, I'm looking for a recommendation about what parts/configuration/etc to backup from a ceph cluster in case of a disaster. I know this depends heavily on the type of disaster and I'm not talking about backup of payload stored on osds. currently I have my admin key stored somewhere outside

Re: [ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Doh! The activate command needs the *osd* fsid, not the cluster fsid. So this works: ceph-volume lvm activate 0 6608c0cf-3827-4967-94fd-5a3336f604c3 Is an "activate-all" equivalent planned? -- Dan On Tue, Dec 12, 2017 at 11:35 AM, Dan van der Ster wrote: > Hi all, > >
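
Putting the fix together, a minimal sketch of the 12.2.2 prepare/activate flow (device and fsid are the ones from this thread; substitute your own):

    ceph-volume lvm prepare --bluestore --data /dev/sdb
    ceph-volume lvm list        # note the osd id and the *osd* fsid, not the cluster fsid
    ceph-volume lvm activate 0 6608c0cf-3827-4967-94fd-5a3336f604c3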

[ceph-users] Slow objects deletion

2017-12-12 Thread Rafał Wądołowski
Hi, Is there any known fast procedure to delete objects in large buckets? I have about 40 million objects. I used: radosgw-admin bucket rm --bucket=bucket-3 --purge-objects but it is very slow. I am using ceph luminous (12.2.1). Is it working in parallel? -- BR, Rafał Wądołowski

[ceph-users] ceph-volume lvm activate could not find osd..0

2017-12-12 Thread Dan van der Ster
Hi all, Did anyone successfully prepare a new OSD with ceph-volume in 12.2.2? We are trying the simplest thing possible and not succeeding :( # ceph-volume lvm prepare --bluestore --data /dev/sdb # ceph-volume lvm list == osd.0 === [block]

Re: [ceph-users] Resharding issues / How long does it take?

2017-12-12 Thread Orit Wasserman
Hi, On Mon, Dec 11, 2017 at 11:45 AM, Martin Emrich wrote: > Hi! > > Am 10.12.17, 11:54 schrieb "Orit Wasserman" : > > Hi Martin, > > On Thu, Dec 7, 2017 at 5:05 PM, Martin Emrich > wrote: > > It could be

[ceph-users] Production 12.2.2 CephFS Cluster still broken, new Details

2017-12-12 Thread Tobias Prousa
Hi there, regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my CephFs) I was able to get a little further with the suggested "cephfs-table-tool take_inos ". This made the whole issue with loads of "falsely free-marked inodes" go away. I then restarted MDS, kept all
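
For readers following along, the general shape of the command referenced appears to be the one below; the maximum inode number actually used in this case is elided in the archive, so <max_ino> is left as a placeholder:

    cephfs-table-tool all take_inos <max_ino>    # advance the inode table past inode numbers already in use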

Re: [ceph-users] Luminous, RGW bucket resharding

2017-12-12 Thread Orit Wasserman
On Mon, Dec 11, 2017 at 5:44 PM, Sam Wouters wrote: > On 11-12-17 16:23, Orit Wasserman wrote: >> On Mon, Dec 11, 2017 at 4:58 PM, Sam Wouters wrote: >>> Hi Orrit, >>> >>> >>> On 04-12-17 18:57, Orit Wasserman wrote: Hi Andreas, On Mon, Dec 4, 2017