[ceph-users] Source Package radosgw file has authentication issues

2016-10-19 Thread 于 姜
The ceph_10.2.3.orig.tar.gz source package compiled successfully in /root/neunn_gitlab/ceph-Jewel10.2.3/src/radosgw. The following issue occurs when the script is executed: 2016-10-20 11:36:30.102266 7f8b4b93f900 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/-admin/keyring: (2) No such file or
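
The error means radosgw could not find its keyring; the "-admin" in the path also suggests the gateway's cluster/instance name was not picked up from ceph.conf. A rough sketch of creating one; the instance name client.radosgw.gateway and its data directory are assumptions, adjust to your ceph.conf:

    # Create the data directory and a keyring for the assumed rgw instance.
    mkdir -p /var/lib/ceph/radosgw/ceph-radosgw.gateway
    ceph auth get-or-create client.radosgw.gateway \
        osd 'allow rwx' mon 'allow rw' \
        -o /var/lib/ceph/radosgw/ceph-radosgw.gateway/keyring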

Re: [ceph-users] qemu-rbd and ceph striping

2016-10-19 Thread Ahmed Mostafa
Does this also mean that stripe count can be thought of as the number of parallel writes to different objects on different OSDs? Thank you On Thursday, 20 October 2016, Jason Dillaman wrote: > librbd (used by QEMU to provide RBD-backed disks) uses librados and > provides

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-19 Thread Goncalo Borges
Hi Kostis... That is a tale from the dark side. Glad you recovered it, and that you were willing to doc it all up and share it. Thank you for that. Can I also ask which tool you used to recover the leveldb? Cheers Goncalo From: ceph-users

Re: [ceph-users] When the kernel support JEWEL tunables?

2016-10-19 Thread Alexandre DERUMIER
Works fine with kernel 4.6 for me. From the doc: http://docs.ceph.com/docs/master/rados/operations/crush-map/#crush-tunables it should work with kernel 4.5 too. I don't know if there is any plan to backport the latest krbd module version to kernel 4.4? - Original Mail - From: "한승진"
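
For reference, if upgrading the kernel is not an option, the tunables can usually be dropped to a profile the running krbd understands; a rough sketch (note that changing tunables can trigger substantial data movement):

    # Show the active tunables, then switch to an older profile.
    ceph osd crush show-tunables
    ceph osd crush tunables hammer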

[ceph-users] When the kernel support JEWEL tunables?

2016-10-19 Thread 한승진
Hi all, When I try to mount an rbd through KRBD, it fails because of mismatched features. The client's OS is Ubuntu 16.04 and the kernel is 4.4.0-38. My original CRUSH tunables are below. root@Fx2x1ctrlserv01:~# ceph osd crush show-tunables { "choose_local_tries": 0,
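
Besides the tunables, images created under Jewel enable RBD features (exclusive-lock, object-map, fast-diff, deep-flatten) that the 4.4 krbd cannot handle. A sketch of checking and disabling them; the pool/image name rbd/myimage is hypothetical:

    # Show which features are enabled on the image.
    rbd info rbd/myimage
    # Disable the features the 4.4 kernel client does not support.
    rbd feature disable rbd/myimage deep-flatten fast-diff object-map exclusive-lock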

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread Christian Balzer
Hello, On Wed, 19 Oct 2016 12:28:28 + Jim Kilborn wrote: > I have setup a new linux cluster to allow migration from our old SAN based > cluster to a new cluster with ceph. > All systems running centos 7.2 with the 3.10.0-327.36.1 kernel. As others mentioned, not a good choice, but also not

Re: [ceph-users] removing image of rbd mirroring

2016-10-19 Thread Jason Dillaman
On Wed, Oct 19, 2016 at 6:52 PM, yan cui wrote: > 2016-10-19 15:46:44.843053 7f35c9925d80 -1 librbd: cannot obtain exclusive > lock - not removing Are you attempting to delete the primary or non-primary image? I would expect any attempts to delete the non-primary image to
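
For reference, the usual flow is to perform the deletion on the primary side, or to stop mirroring the image first; a sketch with a hypothetical mypool/myimage:

    # Check which side is primary for the image.
    rbd mirror image status mypool/myimage
    # On the primary cluster: either remove the image directly...
    rbd rm mypool/myimage
    # ...or disable mirroring for it first.
    rbd mirror image disable mypool/myimage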

Re: [ceph-users] qemu-rbd and ceph striping

2016-10-19 Thread Jason Dillaman
librbd (used by QEMU to provide RBD-backed disks) uses librados and provides the necessary handling for striping across multiple backing objects. When you don't specify "fancy" striping options via "--stripe-count" and "--stripe-unit", it essentially defaults to stripe count of 1 and stripe unit
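
A sketch of requesting non-default ("fancy") striping at image creation time; the pool/image name and the values are purely illustrative:

    # Defaults: stripe count 1, stripe unit equal to the 4 MB object size.
    # Request 64 KB stripe units spread across 8 objects instead.
    rbd create mypool/striped-img --size 10240 \
        --stripe-unit 65536 --stripe-count 8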

[ceph-users] removing image of rbd mirroring

2016-10-19 Thread yan cui
Hi all, We setup rbd mirroring between 2 clusters, but have issues when we want to delete one image. Following is the detailed info. It reports that some other instance is still using it, which kind of makes sense because we set up the mirror between 2 clusters. What's the best practice to

[ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-19 Thread Kostis Fardelas
Hello cephers, this is the blog post on the Ceph cluster outage we experienced some weeks ago and how we managed to revive the cluster and our clients' data. I hope it will prove useful for anyone who finds himself/herself in a similar position. Thanks to everyone on the ceph-users

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread Jim Kilborn
John, Updating to the latest mainline kernel from elrepo (4.8.2-1) on all 4 ceph servers, and on the ceph client that I am testing with, still didn't fix the issues. Still getting "Failing to respond to cache pressure", and blocked ops currently hovering between 100-300 requests > 32 sec
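
One mitigation sometimes suggested for cache-pressure warnings on Jewel is raising the MDS inode cache limit; a sketch via the admin socket, where the MDS name mds.ceph-mds1 is hypothetical and the value is only an example:

    # Inspect and raise the MDS cache limit (Jewel default: 100000 inodes).
    ceph daemon mds.ceph-mds1 config get mds_cache_size
    ceph daemon mds.ceph-mds1 config set mds_cache_size 400000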

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread mykola.dvornik
Not sure if related, but I see the same issue on very different hardware/configuration. In particular, on large data transfers OSDs become slow and blocking. Iostat await on the spinners can go up to 6(!) s (the journal is on the SSD). Looking closer at those spinners with blktrace suggests that
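
For reference, the kind of per-device sampling described above; the device name /dev/sdb is hypothetical:

    # Extended device stats every 2 seconds (watch the await and avgqu-sz columns).
    iostat -xmt 2
    # Trace block-layer events on one spinner and decode them on the fly.
    blktrace -d /dev/sdb -o - | blkparse -i -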

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread John Spray
On Wed, Oct 19, 2016 at 5:17 PM, Jim Kilborn wrote: > John, > > > > Thanks for the tips…. > > Unfortunately, I was looking at this page > http://docs.ceph.com/docs/jewel/start/os-recommendations/ OK, thanks - I've pushed an update to clarify that

[ceph-users] qemu-rbd and ceph striping

2016-10-19 Thread Ahmed Mostafa
Hello, From the documentation I understand that clients that use librados must perform striping themselves, but I do not understand how this can be if we have striping options in ceph? I mean, I can create rbd images that have a configuration for striping, count and unit size. So my

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread Jim Kilborn
John, Thanks for the tips…. Unfortunately, I was looking at this page http://docs.ceph.com/docs/jewel/start/os-recommendations/ I’ll consider either upgrading the kernels or using the fuse client, but will likely go the kernel 4.4 route As for moving to just a replicated pool, I take it
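
For reference, the two client options being weighed; the monitor address, secret file and mount point are hypothetical:

    # Kernel client (needs a kernel recent enough for the Jewel feature set):
    mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    # FUSE client, independent of the kernel version:
    ceph-fuse -m 192.168.1.10:6789 /mnt/cephfs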

Re: [ceph-users] Feedback wanted: health warning when standby MDS dies?

2016-10-19 Thread Sean Redmond
Hi, I would be interested in this case when a mds in standby-replay fails. Thanks On Wed, Oct 19, 2016 at 4:06 PM, Scottix wrote: > I would take the analogy of a Raid scenario. Basically a standby is > considered like a spare drive. If that spare drive goes down. It is good

Re: [ceph-users] Feedback wanted: health warning when standby MDS dies?

2016-10-19 Thread Scottix
I would take the analogy of a RAID scenario. Basically a standby is considered like a spare drive. If that spare drive goes down, it is good to know about the event, but it in no way indicates a degraded system; everything keeps running at top speed. If you had multiple active MDS and one goes

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-19 Thread Tyler Bishop
This is a cool project, keep up the good work! _ Tyler Bishop Founder O: 513-299-7108 x10 M: 513-646-5809 http://BeyondHosting.net This email is intended only for the recipient(s) above and/or otherwise authorized personnel. The information

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread John Spray
On Wed, Oct 19, 2016 at 1:28 PM, Jim Kilborn wrote: > I have setup a new linux cluster to allow migration from our old SAN based > cluster to a new cluster with ceph. > All systems running centos 7.2 with the 3.10.0-327.36.1 kernel. > I am basically running stock ceph

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Yoann Moulin
Hello, >>> We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is >>> composed of 12 nodes; each node has 10 OSDs with the journal on disk. >>> >>> We have one rbd partition and a radosGW with 2 data pools, one replicated, >>> one EC (8+2) >>> >>> in attachment a few details on our

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Dan van der Ster
On Wed, Oct 19, 2016 at 3:22 PM, Yoann Moulin wrote: > Hello, > >>> We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is >>> composed of 12 nodes; each node has 10 OSDs with the journal on disk. >>> >>> We have one rbd partition and a radosGW with 2 data pools,

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Yoann Moulin
Hello, >> We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed >> of 12 nodes; each node has 10 OSDs with the journal on disk. >> >> We have one rbd partition and a radosGW with 2 data pools, one replicated, >> one EC (8+2) >> >> in attachment a few details on our cluster. >>

[ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-19 Thread Jim Kilborn
I have setup a new linux cluster to allow migration from our old SAN based cluster to a new cluster with ceph. All systems running centos 7.2 with the 3.10.0-327.36.1 kernel. I am basically running stock ceph settings, with just turning the write cache off via hdparm on the drives, and
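
For reference, disabling the on-drive write cache as described; the device name is hypothetical and the setting may not persist across reboots on all drives:

    # Turn off the drive's volatile write cache, then verify.
    hdparm -W 0 /dev/sdb
    hdparm -W /dev/sdb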

Re: [ceph-users] offending shards are crashing osd's

2016-10-19 Thread Ronny Aasen
On 06 Oct 2016 13:41, Ronny Aasen wrote: Hello, I have a few OSDs in my cluster that are regularly crashing. [snip] Of course having 3 OSDs dying regularly is not good for my health, so I have set noout to avoid heavy recoveries. Googling this error message gives exactly 1 hit:
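
For reference, the flag mentioned above, which keeps the crashed OSDs from being marked out and triggering rebalancing:

    # Prevent down OSDs from being marked out automatically.
    ceph osd set noout
    # Clear the flag once the OSDs are stable again.
    ceph osd unset noout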

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Burkhard Linke
Hi, just an additional comment: you can disable backfilling and recovery temporarily by setting the 'nobackfill' and 'norecover' flags. It will reduce the backfilling traffic and may help the cluster and its OSDs to recover. Afterwards you should set the backfill traffic settings to the
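
A sketch of setting and later clearing those flags:

    # Temporarily stop backfill and recovery traffic.
    ceph osd set nobackfill
    ceph osd set norecover
    # Re-enable once the OSDs have settled.
    ceph osd unset nobackfill
    ceph osd unset norecover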

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Christian Balzer
Hello, no specific ideas, but this sounds somewhat familiar. One thing first: you already stopped client traffic, but to make sure your cluster really becomes quiescent, stop all scrubs as well. That's always a good idea in any recovery or overload situation. Have you verified CPU load (are those
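
For reference, stopping scrubs cluster-wide while recovering:

    # Disable shallow and deep scrubbing.
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # Re-enable once the cluster is healthy.
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub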

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Dan van der Ster
Hi Yoann, On Wed, Oct 19, 2016 at 9:44 AM, Yoann Moulin wrote: > Dear List, > > We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed > of 12 nodes; each node has 10 OSDs with the journal on disk. > > We have one rbd partition and a radosGW with 2 data

[ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Yoann Moulin
Dear List, We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed of 12 nodes; each node has 10 OSDs with the journal on disk. We have one rbd partition and a radosGW with 2 data pools, one replicated, one EC (8+2). In attachment are a few details on our cluster. Currently, our