Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Micha Krause
Hi, I was able to get a dmesg output from the CentOS machine with kernel 3.16: kworker/3:2:9521 blocked for more than 120 seconds. Not tainted 3.16.2-1.el6.elrepo.x86_64 #1 echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message. kworker/3:2 D 0003 0
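
For anyone chasing similar blocked-task messages: a minimal sketch of how to capture fuller traces on the gateway box (assumes sysrq is enabled; the timeout value is only illustrative).

    # dump stack traces of all tasks in uninterruptible sleep to the kernel log
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 300
    # raise the hung-task timeout instead of silencing the warning entirely
    echo 240 > /proc/sys/kernel/hung_task_timeout_secs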

[ceph-users] RadosGW + Keystone = 403 Forbidden

2014-09-24 Thread Florent Bautista
Hi all, I want to set up a RadosGW (Firefly) + Keystone (IceHouse) environment, but I have a problem I can't solve. It seems that authentication is OK, the user gets a token. But when he wants to create a bucket, he gets a 403 error. I have this in the RadosGW logs: 2014-09-24 13:02:37.894674

Re: [ceph-users] RadosGW + Keystone = 403 Forbidden

2014-09-24 Thread Florent Bautista
Sorry, problem solved, I forgot /swift/v1 at the end of URLs in Keystone endpoint creation...
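
For reference, a rough sketch of the kind of Keystone (Icehouse-era CLI) endpoint definition this refers to; the radosgw host, region, and service lookup are placeholders, not taken from the original thread.

    # register the object-store endpoint so the Swift-compatible API path ends in /swift/v1
    keystone endpoint-create \
      --region RegionOne \
      --service-id $(keystone service-list | awk '/ object-store / {print $2}') \
      --publicurl   http://radosgw.example.com/swift/v1 \
      --internalurl http://radosgw.example.com/swift/v1 \
      --adminurl    http://radosgw.example.com/swift/v1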

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Ilya Dryomov
On Wed, Sep 24, 2014 at 12:12 PM, Micha Krause mi...@krausam.de wrote: Hi, I was able to get a dmesg output from the centos Machine with kernel 3.16: kworker/3:2:9521 blocked for more than 120 seconds. Not tainted 3.16.2-1.el6.elrepo.x86_64 #1 echo 0

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Ilya Dryomov
On Wed, Sep 24, 2014 at 12:20 PM, Micha Krause mi...@krausam.de wrote: Hi, So does it actually crash, or is it just the blocked I/Os? If it doesn't crash, you should be able to get everything off dmesg. it's blocked I/Os, I just wrote another mail to the list, with more dmesg output from

Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state

2014-09-24 Thread Sage Weil
On Wed, 24 Sep 2014, Sahana Lokeshappa wrote: 2.a9 518 0 0 0 0 2172649472 3001 3001 active+clean 2014-09-22 17:49:35.357586 6826'35762 17842:72706 [12,7,28] 12 [12,7,28] 12 6826'35762 2014-09-22 11:33:55.985449
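
For a PG reported as stale+down+peering, the usual next step is to ask the cluster about that specific PG; roughly:

    # overall detail on which PGs are unhealthy and which OSDs are involved
    ceph health detail
    # query the problem PG directly (2.a9 is the PG shown in the dump above)
    ceph pg 2.a9 query
    # show which OSDs that PG currently maps to
    ceph pg map 2.a9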

Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-24 Thread Sage Weil
On Wed, 24 Sep 2014, Mark Kirkwood wrote: On 24/09/14 14:29, Aegeaner wrote: I run ceph on Red Hat Enterprise Linux Server 6.4 (Santiago), and when I run service ceph start I got: # service ceph start ERROR:ceph-disk:Failed to activate ceph-disk: Does not look like a Ceph OSD,
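
For context, the setup being discussed amounts to something like the following; keyvaluestore-dev was an experimental backend at the time, so treat this purely as a sketch (node and disk names are placeholders).

    # add the experimental backend to the ceph.conf in the ceph-deploy working directory
    cat >> ceph.conf <<'EOF'
    [osd]
    osd objectstore = keyvaluestore-dev
    EOF
    # push the config and prepare/activate the OSD
    ceph-deploy --overwrite-conf config push node1
    ceph-deploy osd create node1:/dev/sdb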

[ceph-users] Geom Error on boot with rbd volume

2014-09-24 Thread Steven Timm
I have been trying for quite some time to launch a KVM VM from a Ceph RBD volume using OpenNebula. I have gotten past the permissions issues and to the point where kvm can actually start a virtual machine, but we are getting a Geom Error as soon as the virt console comes up. (note, not a GRUB

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Ilya Dryomov
On Wed, Sep 24, 2014 at 4:52 PM, Micha Krause mi...@krausam.de wrote: Hi, Like I mentioned in my other reply, I'd be very interested in any similar messages on kernels other than 3.15.*, 3.16.1 and 3.16.2. One hung task stack trace is usually not enough to diagnose this sort of problem.

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Andrei Mikhailovsky
Ilya, Was wondering if you've had a chance to look into the performance issues with rbd and the patched kernel? I've downloaded 3.16.3 and am running some dd tests, which were producing hung tasks in the past. I've noticed that I can't get past 20 MB/s on the rbd mounted volume. I am sure I was
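
For comparison, the kind of dd test being described is roughly the following (mount point and sizes are placeholders); oflag=direct matters, because writes going through the page cache will report much higher throughput than the cluster can actually sustain.

    # sequential write test against a filesystem on a mapped rbd device
    dd if=/dev/zero of=/mnt/rbd/ddtest bs=4M count=512 oflag=direct
    # crude sequential read-back of the same file
    dd if=/mnt/rbd/ddtest of=/dev/null bs=4M iflag=direct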

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread German Anders
things work fine on kernel 3.13.0-35 German Anders --- Original message --- Subject: Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server From: Ilya Dryomov ilya.dryo...@inktank.com To: Micha Krause mi...@krausam.de Cc: ceph-users@lists.ceph.com

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Micha Krause
Hi, Well, these don't point at rbd at all. Are you seeing *any* progress when this happens? Could it be that things just get very slow and don't actually hang? Can you try watching the osdc file in debugfs for a while to see if requests are going through or not? (/sys/kernel/debug/ceph/fsid.id/osdc)
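
A simple way to watch that file (the fsid.client-id directory name differs per client, hence the glob) is something like:

    # refresh the list of in-flight OSD requests for every kernel ceph/rbd client on this host
    watch -n 1 'for f in /sys/kernel/debug/ceph/*/osdc; do echo "== $f"; cat "$f"; done'

If the request list keeps changing, I/O is slow rather than hung; if the same entries sit there across refreshes, the requests really are stuck.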

[ceph-users] IO wait spike in VM

2014-09-24 Thread Bécholey Alexandre
Dear Ceph guru, We have a Ceph cluster (version 0.80.5 38b73c67d375a2552d8ed67843c8a65c2c0feba6) with 4 MON and 16 OSDs (4 per host) used as a backend storage for libvirt. Hosts: Ubuntu 14.04 CPU: 2 Xeon X5650 RAM: 48 GB (no swap) No SSD for journals HDD: 4 WDC WD2003FYYS-02W0B0 (2 TB, 7200

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Micha Krause
Hi, things work fine on kernel 3.13.0-35 I can reproduce this on 3.13.10, and I had it once on 3.13.0-35 as well. Micha Krause

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread German Anders
3.13.0-35-generic? Really? I found myself in a similar situation to yours, and downgrading to that version works fine; you could also try 3.14.9-031, which works fine for me too. German Anders --- Original message --- Subject: Re: [ceph-users] Frequent Crashes on rbd

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Andrei Mikhailovsky
I also had the hung task issues with 3.13.0-35-generic - Original Message - From: German Anders gand...@despegar.com To: Micha Krause mi...@krausam.de Cc: ceph-users@lists.ceph.com Sent: Wednesday, 24 September, 2014 4:35:15 PM Subject: Re: [ceph-users] Frequent Crashes on rbd to

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Micha Krause
Hi, 3.13.0-35-generic? Really? I found myself in a similar situation to yours, and downgrading to that version works fine; you could also try 3.14.9-031, which works fine for me too. Yes, it's an Ubuntu machine; I was not able to reproduce the problem here, but the workload is quite

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread German Anders
And on 3.14.9-031? German Anders --- Original message --- Subject: Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server From: Andrei Mikhailovsky and...@arhont.com To: German Anders gand...@despegar.com Cc: ceph-users@lists.ceph.com, Micha Krause mi...@krausam.de

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Ilya Dryomov
On Wed, Sep 24, 2014 at 7:51 PM, Micha Krause mi...@krausam.de wrote: Hi, 3.13.0-35-generic? Really? I found myself in a similar situation to yours, and downgrading to that version works fine; you could also try 3.14.9-031, which works fine for me too. That is exactly the problem with

Re: [ceph-users] Frequent Crashes on rbd to nfs gateway Server

2014-09-24 Thread Micha Krause
Hi, That's strange. 3.13 is way before any changes that could have had any such effect. Can you by any chance try with older kernels to see where it starts misbehaving for you? 3.12? 3.10? 3.8? my CRUSH tunables are set to bobtail, so I can't go below 3.9; I will try 3.12 tomorrow and
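
For anyone in the same spot, the tunables profile in effect (and therefore the oldest kernel client that can still connect) can be checked and changed roughly like this; switching profiles can trigger data movement, so this is only a sketch.

    # inspect the CRUSH tunables currently in effect
    ceph osd crush show-tunables
    # move to an older profile if very old kernel clients must be supported (may cause rebalancing)
    ceph osd crush tunables bobtail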

Re: [ceph-users] [Ceph-community] Setting up Ceph calamari :: Made Simple

2014-09-24 Thread Don Talton (dotalton)
Great stuff Karan, thank you. Don Talton dotal...@cisco.com Phone: 602-692-9510 From: Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] On Behalf Of Karan Singh Sent: Wednesday, September 24, 2014 1:16 AM To:

Re: [ceph-users] [ceph-calamari] Setting up Ceph calamari :: Made Simple

2014-09-24 Thread Gregory Meno
On Wed, Sep 24, 2014 at 4:16 AM, Karan Singh karan.si...@csc.fi wrote: Hello Cephers, Now here comes my new blog on setting up Ceph Calamari. I hope you will like this step-by-step guide http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html - Karan -

Re: [ceph-users] Resetting RGW Federated replication

2014-09-24 Thread Craig Lewis
This is embarrassing. I ran a single radosgw-agent, verbose mode, with a single thread... and everything works correctly. I noticed the problem a couple days ago, and stopped radosgw-agent until I had time to look into the issue. Looking back through the logs, things were working correctly a

[ceph-users] How many objects can you store in a Ceph bucket?

2014-09-24 Thread Steve Kingsland
What's the upper bound on the number of objects you can store in a bucket, before read/write performance starts to degrade? Steve Kingsland Senior Software Engineer Opower http://www.opower.com/
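
A rough way to see how large a bucket's index has already grown (bucket and user names are placeholders) is radosgw-admin:

    # per-bucket usage, including the object count held in the bucket index
    radosgw-admin bucket stats --bucket=mybucket
    # list all buckets owned by a given user
    radosgw-admin bucket list --uid=someuser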

Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-24 Thread Craig Lewis
Yehuda, are there any potential problems there? I'm wondering if duplicate bucket names that don't have the same contents might cause problems? Would the second cluster be read-only while replication is running? Robin, are the mtimes in Cluster B's S3 data important? Just wondering if it would

Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-24 Thread Yehuda Sadeh
On Wed, Sep 24, 2014 at 11:17 AM, Craig Lewis cle...@centraldesktop.com wrote: Yehuda, are there any potential problems there? I'm wondering if duplicate bucket names that don't have the same contents might cause problems? Would the second cluster be read-only while replication is running? I

[ceph-users] Tuning osd hearbeat interval and grace period

2014-09-24 Thread Wensley, Barton
I am wondering if anyone has had experience tuning the following options to get faster failure detection of a storage node: - osd heartbeat interval (default 6s) - osd heartbeat grace (default 20s) I am working with a very small cluster: - 2 storage nodes - 1 to 6 OSDs per storage node -
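
For reference, both options live in the [osd] section and can also be injected at runtime; the values below are purely illustrative, not recommendations.

    # try tighter heartbeat settings at runtime on all OSDs
    ceph tell osd.* injectargs '--osd-heartbeat-interval 3 --osd-heartbeat-grace 10'
    # make them persistent in ceph.conf
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd]
    osd heartbeat interval = 3
    osd heartbeat grace = 10
    EOF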

Re: [ceph-users] Status of snapshots in CephFS

2014-09-24 Thread Florian Haas
On Fri, Sep 19, 2014 at 5:25 PM, Sage Weil sw...@redhat.com wrote: On Fri, 19 Sep 2014, Florian Haas wrote: Hello everyone, Just thought I'd circle back on some discussions I've had with people earlier in the year: Shortly before firefly, snapshot support for CephFS clients was effectively

Re: [ceph-users] Geom Error on boot with rbd volume

2014-09-24 Thread Robert LeBlanc
Could this be a lock issue? From what I understand, librbd does not create an rbd device, it is all done in userspace. I would make sure that you have unmapped the image from all machines and try it again. I haven't done a lot with librbd myself, but my co-workers have it working just fine with
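
A quick way to check the locking and mapping state being suggested here (pool/image names and the device path are placeholders):

    # list any advisory locks held on the image
    rbd lock list rbd/vm-image
    # on each KVM host, see whether the kernel client still has the image mapped
    rbd showmapped
    # if it is mapped somewhere it should not be, unmap it there
    rbd unmap /dev/rbd0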

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-24 Thread Alexandre DERUMIER
What about writes with Giant? I'm around - 4k iops (4k random) with 1 osd (1 node - 1 osd) - 8k iops (4k random) with 2 osd (1 node - 2 osd) - 16K iops (4k random) with 4 osd (2 nodes - 2 osd by node) - 22K iops (4k random) with 6 osd (3 nodes - 2 osd by node) Seems to scale, but I'm CPU
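
For what it's worth, numbers like these are typically produced with fio's rbd engine against an existing image; a sketch under assumed pool, image, and client names:

    # 4k random writes straight through librbd (the image rbd/fio-test must already exist)
    fio --name=rbd-randwrite --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
        --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --direct=1 --runtime=60 --time_based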

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-24 Thread Kasper Dieter
On Wed, Sep 24, 2014 at 08:49:21PM +0200, Alexandre DERUMIER wrote: What about writes with Giant? I'm around - 4k iops (4k random) with 1osd (1 node - 1 osd) - 8k iops (4k random) with 2 osd (1 node - 2 osd) - 16K iops (4k random) with 4 osd (2 nodes - 2 osd by node) - 22K iops (4k

Re: [ceph-users] Geom Error on boot with rbd volume

2014-09-24 Thread Steven Timm
Thanks for responding. Others have told me the same--that libvirt is doing its thing in user space. The process of OpenNebula actually clones a copy of the image so this image was freshly created and there's no way it could have been mapped elsewhere. In other news--by just taking the

Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-24 Thread Robin H. Johnson
On Wed, Sep 24, 2014 at 11:31:29AM -0700, Yehuda Sadeh wrote: On Wed, Sep 24, 2014 at 11:17 AM, Craig Lewis cle...@centraldesktop.com wrote: Yehuda, are there any potential problems there? I'm wondering if duplicate bucket names that don't have the same contents might cause problems?

Re: [ceph-users] Geom Error on boot with rbd volume

2014-09-24 Thread Steven Timm
This turned out to be an issue with the actual image I was trying to launch. When I made a second image from a different source and loaded it into Ceph (and after a reboot of the VM host in question), things just worked. In another post someone had said it could be a locking issue--that is possible

Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-24 Thread Yehuda Sadeh
On Wed, Sep 24, 2014 at 2:12 PM, Robin H. Johnson robb...@gentoo.org wrote: On Wed, Sep 24, 2014 at 11:31:29AM -0700, Yehuda Sadeh wrote: On Wed, Sep 24, 2014 at 11:17 AM, Craig Lewis cle...@centraldesktop.com wrote: Yehuda, are there any potential problems there? I'm wondering if duplicate

[ceph-users] RBD import slow

2014-09-24 Thread Brian Rak
I've been doing some testing of importing virtual machine images, and I've found that 'rbd import' is at least 2x as slow as 'qemu-img convert'. Is there anything I can do to speed this process up? I'd like to use rbd import because it gives me a little additional flexibility. My test setup
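
For comparison, the two import paths being benchmarked look roughly like this (pool, image, and file names are placeholders):

    # native import of a raw image into a format-2 RBD image
    rbd import --image-format 2 vm-disk.raw rbd/vm-disk
    # the qemu-img route, writing into the cluster through librbd
    qemu-img convert -p -O raw vm-disk.raw rbd:rbd/vm-disk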

[ceph-users] bug: ceph-deploy does not support jumbo frame

2014-09-24 Thread yuelongguang
Hi all, after I set mtu=9000, ceph-deploy waits for a reply forever at 'detecting platform for host.' How can I find out which commands ceph-deploy needs the OSD host to run? Thanks
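
Before blaming ceph-deploy itself, it is worth checking that 9000-byte frames actually pass end-to-end between the admin node and the OSD host; a quick sanity check (interface and host names are placeholders):

    # confirm the MTU on both ends
    ip link show eth0 | grep mtu
    # send a do-not-fragment ping that only succeeds if jumbo frames work the whole way
    # (8972 = 9000 bytes - 20-byte IP header - 8-byte ICMP header)
    ping -M do -s 8972 -c 3 osd-node1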

[ceph-users] [PG] Slow request *** seconds old,v4 currently waiting for pg to exist locally

2014-09-24 Thread Aegeaner
The cluster health state is WARN: health HEALTH_WARN 118 pgs degraded; 8 pgs down; 59 pgs incomplete; 28 pgs peering; 292 pgs stale; 87 pgs stuck inactive; 292 pgs stuck stale; 205 pgs stuck unclean; 22 requests are blocked > 32 sec; recovery 12474/46357 objects degraded
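
With health output like that, the usual starting points are listing the stuck PGs and the OSDs holding blocked requests, along these lines:

    # full detail on every warning, including which OSDs have slow/blocked requests
    ceph health detail
    # list PGs stuck in the reported states
    ceph pg dump_stuck stale
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean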

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-24 Thread Zhang, Jian
We haven't tried Giant yet... Thanks Jian -Original Message- From: Sebastien Han [mailto:sebastien@enovance.com] Sent: Tuesday, September 23, 2014 11:42 PM To: Zhang, Jian Cc: Alexandre DERUMIER; ceph-users@lists.ceph.com Subject: Re: [ceph-users] [Single OSD performance on SSD]

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-24 Thread Christian Balzer
On Wed, 24 Sep 2014 20:49:21 +0200 (CEST) Alexandre DERUMIER wrote: What about writes with Giant? I'm around - 4k iops (4k random) with 1osd (1 node - 1 osd) - 8k iops (4k random) with 2 osd (1 node - 2 osd) - 16K iops (4k random) with 4 osd (2 nodes - 2 osd by node) - 22K iops (4k

[ceph-users] OSD start fail

2014-09-24 Thread baijia...@126.com
When I start all the OSDs, I find that many of them fail to start. Logs as follows: osd/SnapMapper.cc: 270: FAILED assert(check(oid)) ceph version () 1: ceph-osd() [0x5e61c8] 2: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t,

Re: [ceph-users] Can ceph-deploy be used with 'osd objectstore = keyvaluestore-dev' in config file ?

2014-09-24 Thread Mark Kirkwood
On 25/09/14 01:03, Sage Weil wrote: On Wed, 24 Sep 2014, Mark Kirkwood wrote: On 24/09/14 14:29, Aegeaner wrote: I run ceph on Red Hat Enterprise Linux Server 6.4 (Santiago), and when I run service ceph start I got: # service ceph start ERROR:ceph-disk:Failed to activate ceph-disk: