[ceph-users] How stable is a Hot Standby (Standby Replay) MDS?

2014-12-19 Thread Wido den Hollander
Hi, Multi-MDS is not recommended, so currently I'm testing with Active/Standby, but there is also a situation where an MDS could be in Standby Replay by enabling 'mds_standby_replay' in the config. How stable is that? I know the answer would be: Test it! But just wondering what the recommendation
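For reference, a minimal ceph.conf sketch for the standby-replay mode the poster describes; the section placement and option spelling are assumed from the config-file conventions of that release era:

```ini
# /etc/ceph/ceph.conf -- illustrative fragment only
[mds]
  # Let a standby daemon continuously follow the active MDS's
  # journal ("hot standby"), so failover replay is faster.
  mds standby replay = true
```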

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-19 Thread Sean Sullivan
Hello Christian, Thanks again for all of your help! I started a bonnie test using the following: bonnie -d /mnt/rbd/scratch2/ -m $(hostname) -f -b Hopefully it completes in the next hour or so. A reboot of the slow OSDs clears the slow marker for now. kh10-9$ ceph -w cluster

Re: [ceph-users] How stable is a Hot Standby (Standby Replay) MDS?

2014-12-19 Thread John Spray
Standby replay is about as stable as normal standby -- it's covered in some of the nightly test suites. The code running in standby replay is almost all the same as what is run in one go at startup on a normal standby. John On Fri, Dec 19, 2014 at 8:05 AM, Wido den Hollander w...@42on.com

[ceph-users] 0.88

2014-12-19 Thread Lindsay Mathieson
Will this make its way into the debian repo eventually? http://ceph.com/debian-giant -- Lindsay

Re: [ceph-users] 0.88

2014-12-19 Thread Loic Dachary
Hi Lindsay, On 19/12/2014 15:12, Lindsay Mathieson wrote: Will this make its way into the debian repo eventually? This is a development release that is not meant to be published in distributions such as Debian, CentOS etc. Cheers http://ceph.com/debian-giant

Re: [ceph-users] 0.88

2014-12-19 Thread Lindsay Mathieson
On Fri, 19 Dec 2014 03:27:53 PM you wrote: On 19/12/2014 15:12, Lindsay Mathieson wrote: Will this make its way into the debian repo eventually? This is a development release that is not meant to be published in distributions such as Debian, CentOS etc. Ah, thanks. It's not clear from the

Re: [ceph-users] 0.88

2014-12-19 Thread Loic Dachary
On 19/12/2014 15:35, Lindsay Mathieson wrote: On Fri, 19 Dec 2014 03:27:53 PM you wrote: On 19/12/2014 15:12, Lindsay Mathieson wrote: Will this make its way into the debian repo eventually? This is a development release that is not meant to be published in distributions such as Debian,

Re: [ceph-users] 0.88

2014-12-19 Thread Lindsay Mathieson
On Fri, 19 Dec 2014 03:57:42 PM you wrote: The stable releases have real names, that is what makes them different from development releases (dumpling, emperor, firefly, giant, hammer). Ah, so we had two named firefly releases (Firefly 0.86 and Firefly 0.87) - they were both production and we have

Re: [ceph-users] 0.88

2014-12-19 Thread Loic Dachary
On 19/12/2014 16:10, Lindsay Mathieson wrote: On Fri, 19 Dec 2014 03:57:42 PM you wrote: The stable releases have real names, that is what makes them different from development releases (dumpling, emperor, firefly, giant, hammer). Ah, so we had two named firefly releases (Firefly 0.86

[ceph-users] Placement groups stuck inactive after down out of 1/9 OSDs

2014-12-19 Thread Chris Murray
Hello, I'm a newbie to Ceph, gaining some familiarity by hosting some virtual machines on a test cluster. I'm using a virtualisation product called Proxmox Virtual Environment, which conveniently handles cluster setup, pool setup, OSD creation etc. During the attempted removal of an OSD, my pool

Re: [ceph-users] 0.88

2014-12-19 Thread Francois Lafont
Hi, On 19/12/2014 15:57, Loic Dachary wrote: The stable releases have real names, that is what makes them different from development releases (dumpling, emperor, firefly, giant, hammer). And I would add that, from what I understand, every other release is an LTS (Long Term Support) release. Firefly

[ceph-users] Hanging VMs with Qemu + RBD

2014-12-19 Thread Nico Schottelius
Hello, another issue we have experienced with qemu VMs (qemu 2.0.0) with ceph-0.80 on Ubuntu 14.04 managed by opennebula 4.10.1: The VMs are completely frozen when rebalancing takes place, they do not even respond to ping anymore. Looking at the qemu processes they are in state Sl. Is this a

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Craig Lewis
On Thu, Dec 18, 2014 at 10:47 PM, Francois Lafont flafdiv...@free.fr wrote: On 19/12/2014 02:18, Craig Lewis wrote: The daemons bind to *, Yes but *only* for the OSD daemon. Am I wrong? Personally I must provide IP addresses for the monitors in the /etc/ceph/ceph.conf, like this:

Re: [ceph-users] High CPU/Delay when Removing Layered Child RBD Image

2014-12-19 Thread Robert LeBlanc
Do you know whether it uses 4MB or 4096 bytes when this value is not set? Thanks, Robert LeBlanc On Thu, Dec 18, 2014 at 6:51 PM, Tyler Wilson k...@linuxdigital.net wrote: Okay, this is rather unrelated to Ceph but I might as well mention how this is fixed. When using the Juno-Release OpenStack

Re: [ceph-users] Recovering from PG in down+incomplete state

2014-12-19 Thread Robert LeBlanc
I'm still pretty new at troubleshooting Ceph and since no one has responded yet I'll take a stab. What is the size of your pool? 'ceph osd pool get <pool name> size' Based on the number of incomplete PGs, it seems like it was '1'. I understand that if you are able to bring osd 7 back in, it
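The pool check Robert suggests can be run as below; a sketch only, assuming the affected pool is the default `rbd` pool (substitute your own pool name):

```shell
# Replication factor of the pool (copies kept per object)
ceph osd pool get rbd size
# Minimum copies a PG needs before it will serve I/O
ceph osd pool get rbd min_size
```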

Re: [ceph-users] Need help from Ceph experts

2014-12-19 Thread Craig Lewis
I've done single nodes. I have a couple VMs for RadosGW Federation testing. It has a single virtual network, with both clusters on the same network. Because I'm only using a single OSD on a single host, I had to update the crushmap to handle that. My Chef recipe runs: ceph osd getcrushmap -o

Re: [ceph-users] Recovering from PG in down+incomplete state

2014-12-19 Thread Craig Lewis
Why did you remove osd.7? Something else appears to be wrong. With all 11 OSDs up, you shouldn't have any PGs stuck in stale or peering. How badly are the clocks skewed between nodes? If it's bad enough, it can cause communication problems between nodes. Ceph will complain if the clocks are
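A quick way to check for the clock skew Craig mentions; a sketch assuming the nodes use ntpd (the monitors warn when skew exceeds the `mon clock drift allowed` threshold, which defaults to a fraction of a second):

```shell
# Any clock-skew warning is listed here, per monitor
ceph health detail | grep -i clock
# On each node, inspect NTP peers; the "offset" column is in ms
ntpq -p
```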

Re: [ceph-users] Hanging VMs with Qemu + RBD

2014-12-19 Thread Robert LeBlanc
I think smaller clusters get choked up with the default backfill settings. I've seen latency on a four node cluster with 10 OSDs each improve by setting osd_max_backfills to 2. I would try lowering it and see if it helps. Also, if you are running both cluster and VM traffic on the same network, you could
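Lowering the backfill setting Robert refers to can be done at runtime; a sketch using the injectargs mechanism (the value 2 is his suggestion, not a universal default):

```shell
# Apply to all OSDs at runtime, no restart needed
ceph tell osd.* injectargs '--osd-max-backfills 2'
```

To persist it across restarts, the same option would also go under the `[osd]` section of ceph.conf.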

Re: [ceph-users] Placement groups stuck inactive after down out of 1/9 OSDs

2014-12-19 Thread Craig Lewis
That seems odd. So you have 3 nodes, with 3 OSDs each. You should've been able to mark osd.0 down and out, then stop the daemon without having those issues. It's generally best to mark an osd down, then out, and wait until the cluster has recovered completely before stopping the daemon and
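The ordering Craig describes can be sketched as the following command sequence; illustrative only, using osd.0 as the example and an Ubuntu-style upstart stop command (adjust for your init system):

```shell
ceph osd out 0                 # start draining PGs off osd.0
ceph -s                        # wait here until recovery completes
stop ceph-osd id=0             # only now stop the daemon
ceph osd crush remove osd.0    # drop it from the CRUSH map
ceph auth del osd.0            # remove its key
ceph osd rm 0                  # finally delete the OSD id
```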

Re: [ceph-users] 1256 OSD/21 server ceph cluster performance issues.

2014-12-19 Thread Gregory Farnum
On Thu, Dec 18, 2014 at 8:44 PM, Sean Sullivan seapasu...@uchicago.edu wrote: Thanks for the reply Gregory, Sorry if this is in the wrong direction or something. Maybe I do not understand. To test uploads I use bash time and either python-swiftclient or boto

Re: [ceph-users] Placement groups stuck inactive after down out of 1/9 OSDs

2014-12-19 Thread Chris Murray
Interesting indeed, those tuneables were suggested on the pve-user mailing list too, and they certainly sound like they’ll ease the pressure during the recovery operation. What I might not have explained very well though is that the VMs hung indefinitely and past the end of the recovery

Re: [ceph-users] Placement groups stuck inactive after down out of 1/9 OSDs

2014-12-19 Thread Dietmar Maurer
The more I think about this problem, the less I think there'll be an easy answer, and it's more likely that I'll have to reproduce the scenario and actually pause myself next time in order to troubleshoot it? It is even possible to simulate such CRUSH problems. I reported a few examples long

[ceph-users] v0.90 released

2014-12-19 Thread Sage Weil
This is the last development release before Christmas. There are some API cleanups for librados and librbd, and lots of bug fixes across the board for the OSD, MDS, RGW, and CRUSH. The OSD also gets support for discard (potentially helpful on SSDs, although it is off by default), and there

Re: [ceph-users] Placement groups stuck inactive after down out of 1/9 OSDs

2014-12-19 Thread Craig Lewis
With only one OSD down and size = 3, you shouldn't've had any PGs inactive. At worst, they should've been active+degraded. The only thought I have is that some of your PGs aren't mapping to the correct number of OSDs. That's not supposed to be able to happen unless you've messed up your crush
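Craig's mapping theory can be checked directly; a sketch where `2.1f` stands in for one of the stuck PG ids and `rbd` for the pool (substitute your own): the acting set reported by `pg map` should contain as many OSDs as the pool's size.

```shell
# List PGs that are not active, with their states
ceph pg dump_stuck inactive
# Show which OSDs a suspect PG maps to (up set and acting set)
ceph pg map 2.1f
# Compare the acting set length against the pool's replica count
ceph osd pool get rbd size
```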

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Francois Lafont
[Oh, sorry Craig for my mistake: I sent my response to your personal address instead of sending it to the list. Sorry for the duplicate; I'm sending my message to the list.] Hello, On 19/12/2014 19:17, Craig Lewis wrote: I'm not using mon addr lines, and my ceph-mon daemons are bound to 0.0.0.0:*.

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Craig Lewis
On Fri, Dec 19, 2014 at 4:03 PM, Francois Lafont flafdiv...@free.fr wrote: On 19/12/2014 19:17, Craig Lewis wrote: I'm not using mon addr lines, and my ceph-mon daemons are bound to 0.0.0.0:*. And do you have several IP addresses on your server? Can you contact the *same* monitor

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Francois Lafont
On 20/12/2014 02:18, Craig Lewis wrote: And do you have several IP addresses on your server? Can you contact the *same* monitor process with different IP addresses? For instance: telnet -e ']' ip_addr1 6789 telnet -e ']' ip_addr2 6789 Oh. The second one fails, even though

Re: [ceph-users] Have 2 different public networks

2014-12-19 Thread Craig Lewis
On Fri, Dec 19, 2014 at 6:19 PM, Francois Lafont flafdiv...@free.fr wrote: So, indeed, I have to use routing *or* maybe create 2 monitors by server like this: [mon.node1-public1] host = ceph-node1 mon addr = 10.0.1.1 [mon.node1-public2] host = ceph-node1 mon
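The two-monitors-per-host layout Francois sketches, laid out as a ceph.conf fragment; the first address comes from his message, the second is a placeholder since the original is truncated before it (6789 is the default monitor port):

```ini
[mon.node1-public1]
  host = ceph-node1
  mon addr = 10.0.1.1:6789
[mon.node1-public2]
  host = ceph-node1
  mon addr = <address-on-public2>:6789
```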

Re: [ceph-users] v0.90 released

2014-12-19 Thread Anthony Alba
Hi Sage, Has the repo metadata been regenerated? One of my reposync jobs can only see up to 0.89, using http://ceph.com/rpm-testing. Thanks Anthony On Sat, Dec 20, 2014 at 6:22 AM, Sage Weil sw...@redhat.com wrote: This is the last development release before Christmas. There are some API