Re: [ceph-users] Testing CephFS

2015-08-24 Thread Shinobu
Need to be more careful, but you're probably right: ./net/ceph/messenger.c. Shinobu On Mon, Aug 24, 2015 at 8:53 PM, Simon Hallam s...@pml.ac.uk wrote: The clients are: [root@gridnode50 ~]# uname -a Linux gridnode50 4.0.8-200.fc21.x86_64 #1 SMP Fri Jul 10 21:09:54 UTC 2015 x86_64 x86_64

[ceph-users] Ceph for multi-site operation

2015-08-24 Thread Julien Escario
Hello, First, let me say up front that I'm really a noob with Ceph, since I have only read some documentation. I'm now trying to deploy a Ceph cluster for testing purposes. The cluster is based on 3 (more if necessary) hypervisors running Proxmox 3.4. Before going further, I have an essential question: is

[ceph-users] Opensource plugin for pulling out cluster recovery and client IO metric

2015-08-24 Thread Vickey Singh
Hello Ceph Geeks, I am planning to develop a Python plugin that pulls out cluster *recovery IO* and *client IO* operation metrics, which can then be fed to collectd. *For example, I need to take out these values* *recovery io 814 MB/s, 101 objects/s* *client io 85475 kB/s rd, 1430 kB/s
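A minimal sketch of the kind of collector described here, assuming the plugin simply shells out to ceph status --format json and reads the pgmap section; the key names shown (read_bytes_sec, recovering_bytes_per_sec, ...) are assumptions that differ between Ceph releases and only appear while there is client or recovery activity, so verify them against your own cluster's output:

    # Sketch only: poll "ceph status --format json" and pull client/recovery
    # IO rates out of the pgmap section. Key names are assumptions and vary
    # by Ceph release; absent keys simply mean no activity at that instant.
    import json
    import subprocess

    def cluster_io_metrics():
        out = subprocess.check_output(['ceph', 'status', '--format', 'json'])
        pgmap = json.loads(out.decode('utf-8')).get('pgmap', {})
        keys = ('read_bytes_sec', 'write_bytes_sec', 'op_per_sec',
                'recovering_bytes_per_sec', 'recovering_objects_per_sec')
        return dict((k, pgmap.get(k, 0)) for k in keys)

    if __name__ == '__main__':
        print(cluster_io_metrics())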

[ceph-users] radosgw secret_key

2015-08-24 Thread Luis Periquito
When I create a new user using radosgw-admin, most of the time the secret key gets escaped with a backslash, making it not work. Something like secret_key: xx\/\/. Why would the / need to be escaped? Why is it printing \/ instead of /, which does work?
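On the \/ question itself: that is JSON's optional escape for the forward slash, so any JSON parser hands back the plain key. A tiny illustration (the sample document below is made up, not real radosgw-admin output):

    # The "\/" in the radosgw-admin output is just JSON escaping; parsing the
    # JSON (instead of copy/pasting the escaped string) yields the usable key.
    # The document below is a fabricated example, not real radosgw-admin output.
    import json

    sample = '{"keys": [{"user": "test", "secret_key": "AAAA\\/BBBB\\/CCCC"}]}'
    print(json.loads(sample)['keys'][0]['secret_key'])  # -> AAAA/BBBB/CCCC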

Re: [ceph-users] Ceph for multi-site operation

2015-08-24 Thread Lionel Bouton
On 24/08/2015 15:11, Julien Escario wrote: Hello, First, let me say up front that I'm really a noob with Ceph, since I have only read some documentation. I'm now trying to deploy a Ceph cluster for testing purposes. The cluster is based on 3 (more if necessary) hypervisors running Proxmox 3.4.

Re: [ceph-users] TRIM / DISCARD run at low priority by the OSDs?

2015-08-24 Thread Alexandre DERUMIER
Hi, I'm not sure about krbd, but with librbd, using trim/discard on the client doesn't do trim/discard on the OSD physical disk. It simply writes zeroes in the RBD image. Zero writes can be skipped since this commit (librbd related)

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Alex Gorbachev
This can be tuned in the iSCSI initiator on VMware - look in the advanced settings on your ESX hosts (at least if you use the software initiator). Thanks, Jan. I asked this question of VMware as well; I think the problem is specific to a given iSCSI session, so I'm wondering if that's strictly the

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Jan Schermer
I never actually set up iSCSI with VMware; I just had to research various VMware storage options when we had a SAN problem at a former job... But I can take a look at it again if you want me to. Is it really deadlocked when this issue occurs? What I think is partly responsible for this

[ceph-users] rbd du

2015-08-24 Thread Allen Liao
Hi all, The online manual (http://ceph.com/docs/master/man/8/rbd/) for rbd has documentation for the 'du' command. I'm running ceph 0.94.2 and that command isn't recognized, nor is it in the man page. Is there another command that will calculate the provisioned and actual disk usage of all

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Alex Gorbachev
Hi Jan, On Mon, Aug 24, 2015 at 12:40 PM, Jan Schermer j...@schermer.cz wrote: I never actually set up iSCSI with VMware; I just had to research various VMware storage options when we had a SAN problem at a former job... But I can take a look at it again if you want me to. Thank you, I

[ceph-users] EXT4 for Production and Journal Question?

2015-08-24 Thread Robert LeBlanc
Building off a discussion earlier this month [1], how well supported is EXT4 for OSDs? It seems that some people are getting good results with it, and I'll be testing it in our environment. The other question is whether the EXT4 journal is even necessary if you are using Ceph SSD journals. My thoughts are
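For concreteness, and purely as a sketch of the option being weighed (not a recommendation, since whether it is safe for OSD data is exactly the question here): ext4 can be created without its own journal, e.g.

    # Sketch only: format a partition as ext4 without its own journal, which
    # is the variant the question is about. /dev/sdX1 is a placeholder.
    # Whether this is safe for OSD data is the open question in this thread.
    import subprocess

    DEV = '/dev/sdX1'  # placeholder device, replace with the real partition
    subprocess.check_call(['mkfs.ext4', '-O', '^has_journal', DEV])
    # For an existing, unmounted filesystem the equivalent would be:
    #   tune2fs -O ^has_journal /dev/sdX1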

[ceph-users] v9.0.3 released

2015-08-24 Thread Sage Weil
This is the second to last batch of development work for the Infernalis cycle. The most intrusive change is an internal (non user-visible) change to the OSD's ObjectStore interface. Many fixes and improvements elsewhere across RGW, RBD, and another big pile of CephFS scrub/repair

Re: [ceph-users] OSD GHz vs. Cores Question

2015-08-24 Thread Robert LeBlanc
Thanks for all the responses. There has been more to think about, which is what I was looking for. We have MySQL running on this cluster, so we will have some VMs with fairly low queue depths. Our Ops teams are not excited about unplugging cables and

Re: [ceph-users] rbd du

2015-08-24 Thread Jason Dillaman
That rbd CLI command is a new feature that will be included with the upcoming infernalis release. In the meantime, you can use this approach [1] to estimate your RBD image usage. [1] http://ceph.com/planet/real-size-of-a-ceph-rbd-image/ -- Jason Dillaman Red Hat Ceph Storage Engineering
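A rough sketch of the extent-summing idea behind estimates like [1]: walk the output of rbd diff and add up the extent lengths. The pool and image names are placeholders, and the plain-text "offset length type" column layout is an assumption to check against your rbd version:

    # Sum the extent lengths that "rbd diff" reports to approximate an
    # image's allocated (actual) size, versus its provisioned size.
    # Column layout of the plain-text output is an assumption; verify it.
    import subprocess

    def rbd_used_bytes(pool, image):
        out = subprocess.check_output(['rbd', 'diff', '%s/%s' % (pool, image)])
        total = 0
        for line in out.decode('utf-8').splitlines():
            fields = line.split()
            # extent lines carry a numeric length in the second column;
            # anything else (headers, blanks) is skipped
            if len(fields) >= 2 and fields[1].isdigit():
                total += int(fields[1])
        return total

    print(rbd_used_bytes('rbd', 'myimage') / (1024.0 * 1024.0), 'MB')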

Re: [ceph-users] EXT4 for Production and Journal Question?

2015-08-24 Thread Lionel Bouton
On 24/08/2015 19:34, Robert LeBlanc wrote: Building off a discussion earlier this month [1], how well supported is EXT4 for OSDs? It seems that some people are getting good results with it, and I'll be testing it in our environment. The other question is whether the EXT4 journal is even necessary if

Re: [ceph-users] Testing CephFS

2015-08-24 Thread Simon Hallam
Hi Greg, The MDSs detect that the other one went down and start the replay. I did some further testing with 20 client machines. Of the 20 client machines, 5 hung with the following error: [Aug24 10:53] ceph: mds0 caps stale [Aug24 10:54] ceph: mds0 caps stale [Aug24 10:58] ceph: mds0 hung

Re: [ceph-users] Testing CephFS

2015-08-24 Thread Gregory Farnum
On Mon, Aug 24, 2015 at 11:35 AM, Simon Hallam s...@pml.ac.uk wrote: Hi Greg, The MDSs detect that the other one went down and start the replay. I did some further testing with 20 client machines. Of the 20 client machines, 5 hung with the following error: [Aug24 10:53] ceph: mds0 caps

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Jan Schermer
This can be tuned in the iSCSI initiator on VMware - look in the advanced settings on your ESX hosts (at least if you use the software initiator). Jan On 23 Aug 2015, at 21:28, Nick Fisk n...@fisk.me.uk wrote: Hi Alex, Currently RBD+LIO+ESX is broken. The problem is caused by the RBD

Re: [ceph-users] ceph osd debug question / proposal

2015-08-24 Thread Jan Schermer
I'm not talking about IO happening; I'm talking about file descriptors staying open. If they weren't open, you could umount it without the -l. Once you hit the OSD again, all those open files will start working, and if more need to be opened it will start looking for them... Jan On 24 Aug
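A tiny self-contained demonstration of the file-descriptor point, for anyone who wants to see it outside of an OSD (the path is just a scratch file):

    # Demonstrates why the OSD keeps running on already-open files after an
    # "rm -Rf" in its current/ dir: data stays reachable through descriptors
    # that were open before the unlink; only a fresh open() of the path fails.
    import os

    path = '/tmp/fd_demo.txt'
    with open(path, 'w') as f:
        f.write('still here after unlink\n')

    held = open(path)     # hold a descriptor, like the OSD holds object files
    os.unlink(path)       # "rm" the file while it is still open

    print(held.read())    # works: the data is still readable via the open fd
    try:
        open(path)        # a new open of the deleted path fails
    except IOError as err:
        print('fresh open fails:', err)
    held.close()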

Re: [ceph-users] Testing CephFS

2015-08-24 Thread Yan, Zheng
On Aug 24, 2015, at 18:38, Gregory Farnum gfar...@redhat.com wrote: On Mon, Aug 24, 2015 at 11:35 AM, Simon Hallam s...@pml.ac.uk wrote: Hi Greg, The MDSs detect that the other one went down and start the replay. I did some further testing with 20 client machines. Of the 20 client

Re: [ceph-users] Testing CephFS

2015-08-24 Thread Simon Hallam
The clients are: [root@gridnode50 ~]# uname -a Linux gridnode50 4.0.8-200.fc21.x86_64 #1 SMP Fri Jul 10 21:09:54 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux [root@gridnode50 ~]# ceph -v ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70) I don't think it is a reconnect timeout, as they

Re: [ceph-users] Slow responding OSDs are not OUTed and cause RBD client IO hangs

2015-08-24 Thread Nick Fisk
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alex Gorbachev Sent: 24 August 2015 18:06 To: Jan Schermer j...@schermer.cz Cc: ceph-users@lists.ceph.com; Nick Fisk n...@fisk.me.uk Subject: Re: [ceph-users] Slow responding OSDs are not

Re: [ceph-users] TRIM / DISCARD run at low priority by the OSDs?

2015-08-24 Thread Chad William Seys
Hi Alexandre, Thanks for the note. I was not clear enough. The fstrim I was running was only on the krbd mountpoints. The backend OSDs only have standard hard disks, not SSDs, so they don't need to be trimmed. Instead I was reclaiming free space as reported by Ceph. Running fstrim on the

Re: [ceph-users] ceph osd debug question / proposal

2015-08-24 Thread Goncalo Borges
Hi Jan... We were interested in the situation where an rm -Rf is done in the current directory of the OSD. Here are my findings: 1. In this exercise, we simply deleted all the content of /var/lib/ceph/osd/ceph-23/current. # cd /var/lib/ceph/osd/ceph-23/current # rm -Rf *

Re: [ceph-users] ceph osd debug question / proposal

2015-08-24 Thread Shinobu
Hope nobody ever does that. Anyway, that's good to know in case of disaster recovery. Thank you! Shinobu On Tue, Aug 25, 2015 at 12:10 PM, Goncalo Borges gonc...@physics.usyd.edu.au wrote: Hi Shinobu Human mistake, for example :-) Not very frequent, but it happens. Nevertheless, the

Re: [ceph-users] ceph osd debug question / proposal

2015-08-24 Thread Goncalo Borges
Hi Shinobu, Human mistake, for example :-) Not very frequent, but it happens. Nevertheless, the idea is to test Ceph against different DC scenarios, triggered by different problems. In this particular situation, the cluster recovered OK ONCE the problematic OSD daemon was tagged as 'down'

Re: [ceph-users] ceph osd debug question / proposal

2015-08-24 Thread Shinobu
So what is the situation where you need to do: # cd /var/lib/ceph/osd/ceph-23/current # rm -Rf * # df (...) I'm quite sure that is not normal. Shinobu On Tue, Aug 25, 2015 at 9:41 AM, Goncalo Borges gonc...@physics.usyd.edu.au wrote: Hi Jan... We were interested in the situation where an