[ceph-users] failed to connect to the RADOS monitor on: IP:6789, : Connection timed out

2017-03-21 Thread Vince
Hi, We have set up a ceph cluster and while adding it as primary storage in CloudStack, I am getting the below error on the hypervisor server. The error says the hypervisor server timed out while connecting to the ceph monitor. I disabled the firewall and made sure the ports are open. This is the final
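A minimal sketch of verifying monitor reachability from the hypervisor, assuming the default v1 messenger port and a placeholder monitor address <mon-ip>:

$ nc -zv <mon-ip> 6789                 # is the TCP port reachable at all?
$ ceph -s -m <mon-ip>:6789             # can the ceph CLI reach the monitor with the local keyring?

If the port check fails even with the firewall disabled, the monitor may be bound to a different address than the one CloudStack was given.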

Re: [ceph-users] INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

2017-03-21 Thread Vince
Hi, I have checked and confirmed that the monitor daemon is running and the socket file /var/run/ceph/ceph-mon.mon1.asok has been created. But the server's messages log is still showing the error. Mar 22 00:47:38 mon1 ceph-create-keys: admin_socket: exception getting command descriptions:
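A minimal sketch of querying that socket directly, reusing the monitor name mon1 from the log above; a healthy monitor should answer with its current state:

$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok mon_status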

Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Deepak Naidu
Thanks Brad -- Deepak > On Mar 21, 2017, at 9:31 PM, Brad Hubbard wrote: > >> On Wed, Mar 22, 2017 at 10:55 AM, Deepak Naidu wrote: >> Do we know which version of ceph client does this bug has a fix. Bug: >> http://tracker.ceph.com/issues/17191 >> >>

Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Brad Hubbard
On Wed, Mar 22, 2017 at 10:55 AM, Deepak Naidu wrote: > Do we know which version of ceph client does this bug has a fix. Bug: > http://tracker.ceph.com/issues/17191 > > > > I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611) & ceph-fs-common- > 10.2.6-1(Ubuntu 14.04.5)

[ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Deepak Naidu
Do we know which version of the ceph client has a fix for this bug? Bug: http://tracker.ceph.com/issues/17191 I have ceph-common-10.2.6-0 (on CentOS 7.3.1611) & ceph-fs-common-10.2.6-1 (Ubuntu 14.04.5) -- Deepak ---
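For reference, a quick way to confirm which client packages are actually installed on each node (standard package queries, nothing assumed beyond the package names above):

$ rpm -q ceph-common          # CentOS 7.3
$ dpkg -l ceph-fs-common      # Ubuntu 14.04
$ ceph --version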

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-21 Thread Brad Hubbard
Based solely on the information given, the only RPMs with this specific commit in them would be here: https://shaman.ceph.com/builds/ceph/wip-prune-past-intervals-kraken/ (specifically

Re: [ceph-users] What's the actual justification for min_size? (was: Re: I/O hangs with 2 node failure even if one node isn't involved in I/O)

2017-03-21 Thread Shinobu Kinjo
> I am sure I remember having to reduce min_size to 1 temporarily in the past > to allow recovery from having two drives irrecoverably die at the same time > in one of my clusters. What was the situation in which you had to do that? Thanks for sharing your experience in advance. Regards,

Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-21 Thread Shain Miley
Hi, Thank you for providing me this level of detail. I ended up just failing the drive since it is still under support and we had in fact gotten emails about the health of this drive in the past. I will however use this in the future if we have an issue with a pg and it is the first time we

Re: [ceph-users] What's the actual justification for min_size?

2017-03-21 Thread Anthony D'Atri
I’m fairly sure I saw it as recently as Hammer, definitely Firefly. YMMV. > On Mar 21, 2017, at 4:09 PM, Gregory Farnum wrote: > > You shouldn't need to set min_size to 1 in order to heal any more. That was > the case a long time ago but it's been several major LTS

[ceph-users] rados gateway

2017-03-21 Thread Garg, Pankaj
Hi, I'm installing Rados Gateway, using Jewel 10.2.5, and can't seem to find the correct documentation. I used ceph-deploy to start the gateway, but can't seem to restart the process correctly. Can someone point me to the correct steps? Also, how do I start my rados gateway back up? This is what I
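A minimal sketch for a systemd-based Jewel install; the instance name rgw.<hostname> is an assumption, so list the units first to confirm it:

$ systemctl list-units 'ceph-radosgw@*'
$ sudo systemctl restart ceph-radosgw@rgw.<hostname>
$ sudo systemctl status ceph-radosgw@rgw.<hostname>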

Re: [ceph-users] What's the actual justification for min_size?

2017-03-21 Thread Gregory Farnum
You shouldn't need to set min_size to 1 in order to heal any more. That was the case a long time ago but it's been several major LTS releases now. :) So: just don't ever set min_size to 1. -Greg On Tue, Mar 21, 2017 at 6:04 PM Anthony D'Atri wrote: > >> a min_size of 1 is
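A minimal sketch of inspecting and raising min_size on a pool, using a hypothetical pool named rbd:

$ ceph osd pool get rbd min_size
$ ceph osd pool set rbd min_size 2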

[ceph-users] Preconditioning an RBD image

2017-03-21 Thread Alex Gorbachev
I wanted to share a recent experience in which a few RBD volumes, formatted as XFS and exported via the Ubuntu NFS kernel server, performed poorly and even generated "out of space" warnings on a nearly empty filesystem. I tried a variety of hacks and fixes to no effect, until things started

Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Anthony D'Atri
Deploying or removing OSDs in parallel can certainly save elapsed time and avoid moving data more than once. There are certain pitfalls, though, and the strategy needs careful planning. - Deploying a new OSD at full weight means a lot of write operations. Running multiple whole-OSD backfills
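One common way to limit that impact, sketched here with a hypothetical osd.42 and example weights, is to bring the new OSD in at zero CRUSH weight and ramp it up in steps:

$ ceph osd crush reweight osd.42 0.0    # start with no data mapped to the new OSD
$ ceph osd crush reweight osd.42 0.5    # raise in increments, waiting for HEALTH_OK between steps
$ ceph osd crush reweight osd.42 1.8    # final weight, typically the drive size in TiB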

Re: [ceph-users] What's the actual justification for min_size?

2017-03-21 Thread Anthony D'Atri
>> a min_size of 1 is dangerous though because it means you are 1 hard disk >> failure away from losing the objects within that placement group entirely. a >> min_size of 2 is generally considered the minimum you want but many people >> ignore that advice, some wish they hadn't. > > I admit I

[ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-21 Thread Deepak Naidu
Greetings, I have the below two CephFS "volumes/filesystems" created on my ceph cluster. Yes, I used the "enable_multiple" flag to enable the multiple-filesystem feature. My question: 1) How do I specify the fs name, i.e. dataX or data1, during the CephFS mount, either using a kernel mount or ceph-fuse
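A minimal sketch of selecting a specific filesystem at mount time, assuming a filesystem named data1, a reasonably recent kernel client, and placeholder monitor address and credentials:

$ sudo mount -t ceph <mon-ip>:6789:/ /mnt/data1 -o name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=data1
$ sudo ceph-fuse /mnt/data1 --client_mds_namespace=data1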

[ceph-users] Correcting inconsistent pg in EC pool

2017-03-21 Thread Graham Allan
I came across an inconsistent pg in our 4+2 EC storage pool (ceph 10.2.5). Since "ceph pg repair" wasn't able to correct it, I followed the general outline given in this thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003965.html # zgrep -Hn ERR
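For reference, a minimal sketch of the usual first steps on Jewel, with <pgid> standing in for the inconsistent placement group:

$ ceph health detail | grep inconsistent
$ rados list-inconsistent-obj <pgid> --format=json-pretty
$ ceph pg repair <pgid>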

Re: [ceph-users] krbd exclusive-lock

2017-03-21 Thread Jason Dillaman
The exclusive-lock feature does, by default, automatically transition the lock between clients that are attempting to use the image. Only one client will be able to issue writes to the image at a time. If you ran "dd" against both mappings concurrently, I'd expect you'd see a vastly decreased
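A minimal sketch of enabling and verifying the feature on an existing image, using a hypothetical pool/image name:

$ rbd feature enable mypool/myimage exclusive-lock
$ rbd info mypool/myimage        # 'features' should now include exclusive-lock
$ sudo rbd map mypool/myimage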

Re: [ceph-users] radosgw global quotas

2017-03-21 Thread Graham Allan
On 03/17/2017 11:47 AM, Casey Bodley wrote: On 03/16/2017 03:47 PM, Graham Allan wrote: This might be a dumb question, but I'm not at all sure what the "global quotas" in the radosgw region map actually do. It is like a default quota which is applied to all users or buckets, without having to

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Kjetil Jørgensen
Hi, On Tue, Mar 21, 2017 at 11:59 AM, Adam Carheden wrote: > Let's see if I got this. 4 host cluster. size=3, min_size=2. 2 hosts > fail. Are all of the following accurate? > > a. An rdb is split into lots of objects, parts of which will probably > exist on all 4 hosts. >

Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Jonathan Proulx
If it took 7hr for one drive you have probably already done this (or the defaults are set for low-impact recovery), but before doing anything you want to be sure your OSD settings (max backfills, recovery max active, recovery sleep, perhaps others?) are set such that recovery and backfilling don't overwhelm
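A minimal sketch of throttling recovery across all OSDs at runtime; the values here are conservative examples, not recommendations:

$ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-sleep 0.1'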

Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Steve Taylor
Generally speaking, you are correct. Adding more OSDs at once is more efficient than adding fewer at a time. That being said, do so carefully. We typically add OSDs to our clusters either 32 or 64 at once, and we have had issues on occasion with bad drives. It's common for us to have a drive or

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Adam Carheden
Let's see if I got this. 4 host cluster. size=3, min_size=2. 2 hosts fail. Are all of the following accurate? a. An RBD image is split into lots of objects, parts of which will probably exist on all 4 hosts. b. Some objects will have 2 of their 3 replicas on 2 of the offline OSDs. c. Reads can

[ceph-users] add multiple OSDs to cluster

2017-03-21 Thread mj
Hi, Just a quick question about adding OSDs, since most of the docs I can find talk about adding ONE OSD, and I'd like to add four per server on my three-node cluster. This morning I tried the careful approach, and added one OSD to server1. It all went fine, everything rebuilt and I have a

Re: [ceph-users] Need erasure coding, pg and block size explanation

2017-03-21 Thread Maxime Guyot
Hi Vincent, There is no buffering until the object reaches 8MB. When the object is written, it has a given size. RADOS just splits the object into K chunks; padding occurs if the object size is not a multiple of K. See also:
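As a worked example, with k=2 a 4MB object becomes two 2MB data chunks plus m coding chunks of 2MB each. A minimal sketch of creating such a profile and an EC pool from it, with hypothetical names and PG counts:

$ ceph osd erasure-code-profile set example-profile k=2 m=1
$ ceph osd pool create ecpool 64 64 erasure example-profile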

[ceph-users] What's the actual justification for min_size? (was: Re: I/O hangs with 2 node failure even if one node isn't involved in I/O)

2017-03-21 Thread Richard Hesketh
On 21/03/17 17:48, Wes Dillingham wrote: > a min_size of 1 is dangerous though because it means you are 1 hard disk > failure away from losing the objects within that placement group entirely. a > min_size of 2 is generally considered the minimum you want but many people > ignore that advice,

Re: [ceph-users] INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

2017-03-21 Thread Wes Dillingham
Generally this means the monitor daemon is not running. Is the monitor daemon running? The monitor daemon creates the admin socket in /var/run/ceph/$socket. Elaborate on how you are attempting to deploy ceph. On Tue, Mar 21, 2017 at 9:01 AM, Vince wrote: > Hi, > > I am
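A minimal sketch of those checks on a systemd-based install, assuming the monitor name mon1:

$ sudo systemctl status ceph-mon@mon1
$ ls -l /var/run/ceph/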

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Wes Dillingham
If you had set min_size to 1 you would not have seen the writes pause. a min_size of 1 is dangerous though because it means you are 1 hard disk failure away from losing the objects within that placement group entirely. a min_size of 2 is generally considered the minimum you want but many people

[ceph-users] Linux Fest NW CFP

2017-03-21 Thread Federico Lucifredi
Hello Ceph team, Linux Fest NorthWest's CFP is out. It is a bit too far for me to do it as a day trip from Boston, but it would be nice if someone on the Pacific coast feels like giving a technical overview / architecture session.

[ceph-users] Ceph rebalancing problem!!

2017-03-21 Thread Arturo N. Diaz Crespo
Hello, I have a small Ceph cluster installed and I followed the manual installation instructions since I do not have internet access. I have configured the system with two network interfaces, one for the client (public) network and one for the cluster network. The problem is that the system, when it begins
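For reference, a minimal sketch of the relevant ceph.conf settings, with placeholder subnets:

[global]
public network = 192.168.100.0/24
cluster network = 10.10.10.0/24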

[ceph-users] Need erasure coding, pg and block size explanation

2017-03-21 Thread Vincent Godin
When we use a replicated pool of size 3, for example, each piece of data (a block of 4MB) is written to one PG, which is distributed across 3 hosts (by default). The OSD holding the primary copy will replicate the block to the OSDs holding the second and third copies. With erasure code, let's take a RAID5-like schema with k=2 and

[ceph-users] Cephalocon 2017 CFP Open!

2017-03-21 Thread Patrick McGarry
Hey cephers, For those of you that are interested in presenting, sponsoring, or attending Cephalocon, all of those options are now available on the Ceph site. http://ceph.com/cephalocon2017/ If you have any questions, comments, or difficulties, feel free to let me know. Thanks! -- Best

Re: [ceph-users] Brainstorming ideas for Python-CRUSH

2017-03-21 Thread Loic Dachary
Hi Logan, On 03/21/2017 03:27 PM, Logan Kuhn wrote: > I like the idea > > Being able to play around with different configuration options and using this > tool as a sanity checker or showing what will change as well as whether or > not the changes could cause health warn or health err. The

Re: [ceph-users] I/O hangs with 2 node failure even if one node isn't involved in I/O

2017-03-21 Thread Adam Carheden
Thanks everyone for the replies. Very informative. However, should I have expected writes to pause if I'd had min_size set to 1 instead of 2? And yes, I was under the false impression that my RBD device was a single object. That explains what all those other things are on a test cluster where I

[ceph-users] idea about optimize an osd rebuild

2017-03-21 Thread Vincent Godin
When you replace a failed OSD, it has to recover all of its PGs and so it is pretty busy. Is it possible to tell the OSD not to become primary for any of its already synchronized PGs until all of its PGs have recovered? It should accelerate the rebuild process because the OSD won't have to
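One rough approximation, sketched here with a hypothetical osd.12, is to drop the OSD's primary affinity while it recovers and restore it afterwards (older releases may also require mon_osd_allow_primary_affinity = true):

$ ceph osd primary-affinity osd.12 0
$ ceph osd primary-affinity osd.12 1    # once recovery completes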

Re: [ceph-users] Brainstorming ideas for Python-CRUSH

2017-03-21 Thread Logan Kuhn
I like the idea. Being able to play around with different configuration options and use this tool as a sanity checker, showing what will change as well as whether or not the changes could cause HEALTH_WARN or HEALTH_ERR. For example, if I were to change the replication level of a pool,

[ceph-users] Recompiling source code - to find exact RPM

2017-03-21 Thread nokia ceph
Hello, I made some changes to the below files in the ceph kraken v11.2.0 source code, as per this article https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken ..src/osd/PG.cc ..src/osd/PG.h Is there any way to find which RPM is affected by these two files? I believe it should be
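PG.cc and PG.h are compiled into the ceph-osd binary, so the rebuilt package to redeploy should be ceph-osd; a quick sanity check on an installed node, assuming the standard packaging:

$ rpm -qf /usr/bin/ceph-osd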

[ceph-users] Ceph Day Warsaw (25 Apr)

2017-03-21 Thread Patrick McGarry
Hey cephers, We have now finalized the details for Ceph Day Warsaw (see http://ceph.com/cephdays) and as a result, we need speakers! If you would be interested in sharing some of your experiences or work around Ceph please let me know as soon as possible. Thanks. -- Best Regards, Patrick

Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Alexey Sheplyakov
> Best regards, >> Alexey >> >> On Tue, Mar 21, 2017 at 12:21 PM, Özhan Rüzgar Karaman >> <oruzgarkara...@gmail.com> wrote: >> > Hi Wido; >> > After 30 minutes osd id 3 crashed also with segmentation fault, i >> > uploaded

[ceph-users] INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

2017-03-21 Thread Vince
Hi, I am getting the below error in the messages log after setting up the ceph monitor. === Mar 21 08:48:23 mon1 ceph-create-keys: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory Mar 21 08:48:23 mon1 ceph-create-keys: INFO:ceph-create-keys:ceph-mon admin

Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Alexey Sheplyakov
21, 2017 at 12:21 PM, Özhan Rüzgar Karaman <oruzgarkara...@gmail.com> wrote: > Hi Wido; > After 30 minutes osd id 3 crashed also with segmentation fault, i uploaded > logs again to the same location as ceph.log.wido.20170321-3.tgz. So now all > OSD deamons on that server is c

[ceph-users] Brainstorming ideas for Python-CRUSH

2017-03-21 Thread Xavier Villaneau
Hello all, A few weeks ago Loïc Dachary presented his work on python-crush to the ceph-devel list, but I don't think it has been presented here yet. In a few words, python-crush is a new Python 2 and 3 library / API for the CRUSH algorithm. It also provides a CLI executable with a few built-in tools

[ceph-users] krbd exclusive-lock

2017-03-21 Thread Mikaël Cluseau
Hi, There's something I don't understand about the exclusive-lock feature. I created an image: $ ssh host-3 Container Linux by CoreOS stable (1298.6.0) Update Strategy: No Reboots host-3 ~ # uname -a Linux host-3 4.9.9-coreos-r1 #1 SMP Tue Mar 14 21:09:42 UTC 2017 x86_64 Intel(R) Xeon(R) CPU

[ceph-users] pgs stale during patching

2017-03-21 Thread Laszlo Budai
Hello, we have been patching our ceph cluster from 0.94.7 to 0.94.10. We were updating one node at a time, and after each OSD node was rebooted we waited for the cluster health status to be OK. In the docs we have "stale - The placement group status has not been updated by a ceph-osd,
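A minimal sketch of listing the placement groups in that state while a node is rebooting:

$ ceph pg dump_stuck stale
$ ceph health detail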

Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Özhan Rüzgar Karaman
Hi Wido; After 30 minutes osd id 3 also crashed with a segmentation fault; I uploaded the logs again to the same location as ceph.log.wido.20170321-3.tgz. So now all OSD daemons on that server have crashed. Thanks Özhan On Tue, Mar 21, 2017 at 10:57 AM, Özhan Rüzgar Karaman < oruzgarkara...@gmail.

Re: [ceph-users] Ceph-osd Daemon Receives Segmentation Fault on Trusty After Upgrading to 0.94.10 Release

2017-03-21 Thread Özhan Rüzgar Karaman
This time osd id 3 started and operated successfully, but osd id 2 failed again with the same segmentation fault. I have uploaded new logs to the same destination as ceph.log.wido.20170321-2.tgz and its link is below again. https://drive.google.com/drive/folders/0B_hD9LJqrkd7NmtJOW5YUnh6UE0?usp=sharing