Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Travis Rhoden
Hi Vickey, The easiest way I know of to get around this right now is to add the following line to the [epel] section in /etc/yum.repos.d/epel.repo: exclude=python-rados python-rbd. So this is what my epel.repo file looks like: http://fpaste.org/208681/ It is those two packages in EPEL that are
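A minimal sketch of that workaround, assuming the stock CentOS 7 epel.repo with an [epel] section (adjust the file or section name if yours differs):

    sed -i '/^\[epel\]/a exclude=python-rados python-rbd' /etc/yum.repos.d/epel.repo
    yum clean all
    yum install ceph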

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Michael Kidd
I don't think this came through the first time.. resending.. If it's a dupe, my apologies.. For Firefly / Giant installs, I've had success with the following: yum install ceph ceph-common --disablerepo=base --disablerepo=epel Let us know if this works for you as well. Thanks, Michael J. Kidd

Re: [ceph-users] [a bit off-topic] Power usage estimation of hardware for Ceph

2015-04-08 Thread Christian Balzer
On Wed, 08 Apr 2015 14:59:21 +0200 Francois Lafont wrote: Hi, Sorry in advance for this thread not being directly linked to Ceph. ;) We are thinking about buying servers to build a ceph cluster and we would like to have, if possible, an *approximate* power usage estimate for these servers

Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Our workload involves creating and destroying a lot of pools. Each pool has 100 pgs, so it adds up. Could this be causing the problem? What would you suggest instead? ...this is most likely the cause. Deleting a pool causes the data and pgs associated with it to be deleted asynchronously,

Re: [ceph-users] RBD hard crash on kernel 3.10

2015-04-08 Thread Christian Balzer
On Wed, 08 Apr 2015 14:25:36 + Shawn Edwards wrote: We've been working on a storage repository for xenserver 6.5, which uses the 3.10 kernel (ug). I got the xenserver guys to include the rbd and libceph kernel modules into the 6.5 release, so that's at least available. Woah, hold on

Re: [ceph-users] MDS unmatched rstat after upgrade hammer

2015-04-08 Thread Yan, Zheng
On Thu, Apr 9, 2015 at 7:09 AM, Scottix scot...@gmail.com wrote: I was testing the upgrade on our dev environment and after I restarted the mds I got the following errors. 2015-04-08 15:58:34.056470 mds.0 [ERR] unmatched rstat on 605, inode has n(v70 rc2015-03-16 09:11:34.390905), dirfrags

Re: [ceph-users] RBD hard crash on kernel 3.10

2015-04-08 Thread Shawn Edwards
On Wed, Apr 8, 2015 at 9:23 PM Christian Balzer ch...@gol.com wrote: On Wed, 08 Apr 2015 14:25:36 + Shawn Edwards wrote: We've been working on a storage repository for xenserver 6.5, which uses the 3.10 kernel (ug). I got the xenserver guys to include the rbd and libceph kernel

[ceph-users] object size in rados bench write

2015-04-08 Thread Deneau, Tom
I've noticed when I use large object sizes like 100M with rados bench write, I get:
  rados -p data2 bench 60 write --no-cleanup -b 100M
  Maintaining 16 concurrent writes of 104857600 bytes for up to 60 seconds or 0 objects
  sec Cur ops started finished avg MB/s cur MB/s last lat avg lat

Re: [ceph-users] object size in rados bench write

2015-04-08 Thread Deneau, Tom
Ah, I see there is an osd parameter for this: osd max write size. Description: The maximum size of a write in megabytes. Default: 90. -Original Message- From: Deneau, Tom Sent: Wednesday, April 08, 2015 3:57 PM To: 'ceph-users@lists.ceph.com' Subject: object size
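A hedged sketch of raising that cap so a 100M bench write is accepted; 200 is an arbitrary example value, not a recommendation:

    # persistently, in the [osd] section of ceph.conf on the OSD hosts:
    #   osd max write size = 200
    # or injected into all running OSDs at runtime:
    ceph tell 'osd.*' injectargs '--osd_max_write_size 200'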

Re: [ceph-users] live migration fails with image on ceph

2015-04-08 Thread Yuming Ma (yumima)
Josh, I think we are using plain live migration and not mirroring block drives as the other test did. What are the chances, or in what scenario, can the disk image be corrupted during live migration, given that both source and target are connected to the same volume and RBD caching is turned on: rbd cache
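For reference, a sketch of the client-side cache options in question as they would appear in ceph.conf; this only illustrates the configuration being asked about and is not an answer to the corruption question:

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true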

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Vickey Singh
Hi Ken As per your suggestion, I tried enabling the epel-testing repository but still no luck. Please check the output below. I would really appreciate any help here. # yum install ceph --enablerepo=epel-testing --- Package python-rbd.x86_64 1:0.80.7-0.5.el7 will be installed -- Processing

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Sam Wouters
Hi Vickey, we had a similar issue and we resolved it by giving the centos base and updates repos a higher priority (e.g. 10) than the epel repo. The ceph-deploy tool only sets a prio of 1 for the ceph repos, but the centos and epel repos stay on the default of 99. regards, Sam On 08-04-15
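A sketch of that priority change, assuming the stock CentOS 7 repo file names and the yum priorities plugin:

    yum install -y yum-plugin-priorities
    sed -i '/^\[base\]/a priority=10'    /etc/yum.repos.d/CentOS-Base.repo
    sed -i '/^\[updates\]/a priority=10' /etc/yum.repos.d/CentOS-Base.repo
    yum clean all && yum install ceph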

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Irek Fasikhov
I use Centos 7.1. The problem is that the base package repository also has ceph-common. [root@ceph01p24 cluster]# yum --showduplicates list ceph-common Loaded plugins: dellsysid, etckeeper, fastestmirror, priorities Loading mirror speeds from cached hostfile * base: centos-mirror.rbc.ru * epel:

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Vickey Singh
Hi The suggestion below also didn't work. Full logs here : http://paste.ubuntu.com/10771939/ [root@rgw-node1 yum.repos.d]# yum --showduplicates list ceph Loaded plugins: fastestmirror, priorities Loading mirror speeds from cached hostfile * base: mirror.zetup.net * epel:

Re: [ceph-users] when recovering start

2015-04-08 Thread Stéphane DUGRAVOT
- On 7 Apr 15, at 14:57, lijian blacker1...@163.com wrote: Haomai Wang, the mon_osd_down_out_interval is 300, please refer to my settings. I use the cli 'service ceph stop osd.X' to stop an osd, and the pg status changes to remap, backfill and recovering ... immediately, so other

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Andrey Korolyov
On Wed, Apr 8, 2015 at 11:17 AM, Somnath Roy somnath@sandisk.com wrote: Hi, Please find the preliminary performance numbers of the TCP vs RDMA (XIO) implementation (on top of SSDs) at the following link. http://www.slideshare.net/somnathroy7568/ceph-on-rdma The attachment didn't go

[ceph-users] Inconsistent ceph-deploy disk list command results

2015-04-08 Thread f...@univ-lr.fr
Hi all, I want to flag a command we've learned to avoid because of its inconsistent results. On Giant 0.87.1 and Hammer 0.93.0 (ceph-deploy-1.5.22-0.noarch was used in both cases), the ceph-deploy disk list command has a problem. We should get an exhaustive list of device entries, like this one :

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Andrei Mikhailovsky
Hi, Am I the only person noticing disappointing results from the preliminary RDMA testing, or am I reading the numbers wrong? Yes, it's true that on a very small cluster you do see a great improvement in rdma, but in real life rdma is used in large infrastructure projects, not on a few

[ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
Hi, Please find the preliminary performance numbers of the TCP vs RDMA (XIO) implementation (on top of SSDs) at the following link. http://www.slideshare.net/somnathroy7568/ceph-on-rdma The attachment didn't go through it seems, so I had to use slideshare. Mark, If we have time, I can present it

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Vickey Singh
Hello Everyone I also tried setting a higher priority as suggested by Sam, but no luck. Please see the full logs here: http://paste.ubuntu.com/10771358/ While installing, yum searches the correct Ceph repository but finds 3 versions of python-ceph under http://ceph.com/rpm-giant/el7/x86_64/

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
Andrei, Yes, I see it has a lot of potential, and I believe that after fixing the performance bottlenecks inside the XIO messenger it should go further. We are working on it and will keep the community posted.. Thanks Regards Somnath From: Andrei Mikhailovsky [mailto:and...@arhont.com] Sent: Wednesday, April 08,

[ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Hi, I'm having sporadic very poor performance running ceph. Right now mkfs, even with nodiscard, takes 30 minutes or more. These kinds of delays happen often but irregularly. There seems to be no common denominator. Clearly, however, they make it impossible to deploy ceph in production. I

Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Lionel Bouton
On 04/08/15 18:24, Jeff Epstein wrote: Hi, I'm having sporadic very poor performance running ceph. Right now mkfs, even with nodiscard, takes 30 minutes or more. These kinds of delays happen often but irregularly. There seems to be no common denominator. Clearly, however, they make it impossible

Re: [ceph-users] Getting placement groups to place evenly (again)

2015-04-08 Thread Gregory Farnum
ceph pg dump will output the size of each pg, among other things. On Wed, Apr 8, 2015 at 8:34 AM J David j.david.li...@gmail.com wrote: On Wed, Apr 8, 2015 at 11:33 AM, Gregory Farnum g...@gregs42.com wrote: Is this a problem with your PGs being placed unevenly, with your PGs being sized
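One way to pull just the per-PG byte counts out of that dump (a sketch: the plain-format column layout varies between releases, so the bytes column is located from the header line rather than hard-coded):

    ceph pg dump pgs 2>/dev/null \
      | awk '$1=="pg_stat" {for (i=1; i<=NF; i++) if ($i=="bytes") c=i; next}
             c && NF>=c {print $1, $c}' \
      | sort -k2 -n | tail -20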

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
I used the default TCP setting in Ubuntu 14.04. -Original Message- From: Andrey Korolyov [mailto:and...@xdel.ru] Sent: Wednesday, April 08, 2015 1:28 AM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-devel Subject: Re: Preliminary RDMA vs TCP numbers On Wed, Apr 8, 2015 at 11:17 AM,

Re: [ceph-users] Getting placement groups to place evenly (again)

2015-04-08 Thread Gregory Farnum
Is this a problem with your PGs being placed unevenly, with your PGs being sized very differently, or both? CRUSH is never going to balance perfectly, but the numbers you're quoting look a bit worse than usual at first glance. -Greg On Tue, Apr 7, 2015 at 8:16 PM J David j.david.li...@gmail.com

Re: [ceph-users] OSDs not coming up on one host

2015-04-08 Thread Gregory Farnum
I'm on my phone so can't check exactly what those threads are trying to do, but the osd has several threads which are stuck. The FileStore threads are certainly trying to access the disk/local filesystem. You may not have a hardware fault, but it looks like something in your stack is not behaving

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Somnath Roy
I am using Mellanox 40GbE, I think it is TCP offloaded. From: Viral Mehta [mailto:viral@gmail.com] Sent: Wednesday, April 08, 2015 1:30 AM To: Somnath Roy Cc: ceph-users@lists.ceph.com; ceph-devel Subject: Re: Preliminary RDMA vs TCP numbers I am sorry, I am new to the discussion. But, is it

Re: [ceph-users] Inconsistent ceph-deploy disk list command results

2015-04-08 Thread f...@univ-lr.fr
Hi Travis, Thanks for your advice, Issue #11347 created http://tracker.ceph.com/issues/11347 Frederic Travis Rhoden trho...@gmail.com wrote on 8/04/15 16:44 : Hi Frederic, Thanks for the report! Do you mind throwing these details into a bug report at http://tracker.ceph.com/ ? I have

Re: [ceph-users] Number of ioctx per rados connection

2015-04-08 Thread Josh Durgin
Yes, you can use multiple ioctxs with the same underlying rados connection. There's no hard limit on how many, it depends on your usage if/when a single rados connection becomes a bottleneck. It's safe to use different ioctxs from multiple threads. IoCtxs have some local state like namespace,

Re: [ceph-users] Interesting problem: 2 pgs stuck in EC pool with missing OSDs

2015-04-08 Thread Loic Dachary
Hi Paul, Contrary to what the documentation states at http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon the crush ruleset can be modified (an update at https://github.com/ceph/ceph/pull/4306 will fix that). Placement groups will move around, but
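For reference, the usual decompile/edit/recompile cycle for modifying a ruleset in place looks like the sketch below; the specific edit needed for the issue in the linked troubleshooting doc (e.g. raising set_choose_tries) happens at the crushmap.txt step:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit the ruleset in crushmap.txt, then recompile and inject it:
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new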

[ceph-users] Number of ioctx per rados connection

2015-04-08 Thread Michel Hollands
Hello, This is a question about the C API for librados. Can you use multiple “IO contexts” (ioctx) per rados connection and if so, how many? Can these then be used by multiple threads? Thanks in advance, Michel ___ ceph-users mailing list

[ceph-users] Radosgw GC parallelization

2015-04-08 Thread ceph
Hi, I have a Ceph cluster, used through radosgw. In that cluster, I write files every second: input files are known, predictable and stable; there is always the same number of new fixed-size files each second. These files are kept a few days, then removed after a fixed duration. And thus, I

Re: [ceph-users] What are you doing to locate performance issues in a Ceph cluster?

2015-04-08 Thread Chris Kitzmiller
On Apr 7, 2015, at 7:44 PM, Francois Lafont wrote: Chris Kitzmiller wrote: I graph aggregate stats for `ceph --admin-daemon /var/run/ceph/ceph-osd.$osdid.asok perf dump`. If the max latency strays too far outside of my mean latency I know to go look for the troublemaker. My graphs look

[ceph-users] [a bit off-topic] Power usage estimation of hardware for Ceph

2015-04-08 Thread Francois Lafont
Hi, Sorry in advance for this thread not being directly linked to Ceph. ;) We are thinking about buying servers to build a ceph cluster and we would like to have, if possible, an *approximate* power usage estimate for these servers (this parameter could be important in your choice): 1. the 12xbays

Re: [ceph-users] Cascading Failure of OSDs

2015-04-08 Thread Francois Lafont
Hi, 01/04/2015 17:28, Quentin Hartman wrote: Right now we're just scraping the output of ifconfig: ifconfig p2p1 | grep -e 'RX\|TX' | grep packets | awk '{print $3}' It's clunky, but it works. I'm sure there's a cleaner way, but this was expedient. QH Ok, thx for the information
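One cleaner alternative is to read the same counters straight from sysfs (interface name p2p1 taken from the quoted command):

    cat /sys/class/net/p2p1/statistics/rx_packets
    cat /sys/class/net/p2p1/statistics/tx_packets
    cat /sys/class/net/p2p1/statistics/rx_dropped /sys/class/net/p2p1/statistics/rx_errors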

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Vickey Singh
Any suggestions, geeks? VS On Wed, Apr 8, 2015 at 2:15 PM, Vickey Singh vickey.singh22...@gmail.com wrote: Hi The suggestion below also didn't work. Full logs here : http://paste.ubuntu.com/10771939/ [root@rgw-node1 yum.repos.d]# yum --showduplicates list ceph Loaded plugins:

Re: [ceph-users] What are you doing to locate performance issues in a Ceph cluster?

2015-04-08 Thread Francois Lafont
Chris Kitzmiller wrote: ~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok perf [...] osd: { opq: 0, op_wip: 0, op: 3566, op_in_bytes: 208803635, op_out_bytes: 146962506, op_latency: { avgcount: 3566, sum: 100.330695000}, op_process_latency:

Re: [ceph-users] Radosgw GC parallelization

2015-04-08 Thread LOPEZ Jean-Charles
Hi, the following parameters can be used to get more, and more efficient, GC processing: - rgw_gc_max_objs (defaults to 32) - rgw_gc_obj_min_wait (defaults to 2 * 3600) - rgw_gc_processor_max_time (defaults to 3600) - rgw_gc_processor_period (defaults to 3600) It is recommended to set rgw_gc_max_objs
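A hedged ceph.conf sketch of those knobs; the section name and values are illustrative placeholders only (the quoted message is truncated before its actual recommendation for rgw_gc_max_objs):

    [client.radosgw.gateway]          # use your own rgw instance name
    rgw gc max objs = 97              # example value only
    rgw gc obj min wait = 3600
    rgw gc processor max time = 1800
    rgw gc processor period = 1800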

[ceph-users] RBD hard crash on kernel 3.10

2015-04-08 Thread Shawn Edwards
We've been working on a storage repository for xenserver 6.5, which uses the 3.10 kernel (ug). I got the xenserver guys to include the rbd and libceph kernel modules into the 6.5 release, so that's at least available. Where things go bad is when we have many (10 or so) VMs on one host, all using

Re: [ceph-users] when recovering start

2015-04-08 Thread Stéphane DUGRAVOT
- On 8 Apr 15, at 14:21, lijian blacker1...@163.com wrote: Hi Stephane, I dumped it from an osd daemon. You have to apply the mon_osd_down_out_interval value to the monitor, not the osd. What is the value on the mon? Stephane. Thanks Jian LI At 2015-04-08 16:05:04, Stéphane DUGRAVOT
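To check what the monitor is actually running with, a sketch via the mon admin socket (the socket path and mon ID are placeholders):

    ceph --admin-daemon /var/run/ceph/ceph-mon.<id>.asok config show \
      | grep mon_osd_down_out_interval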

[ceph-users] [ANN] ceph-deploy 1.5.23 released

2015-04-08 Thread Travis Rhoden
Hi All, This is a new release of ceph-deploy that includes a new feature for Hammer and bugfixes. ceph-deploy can be installed from the ceph.com hosted repos for Firefly, Giant, Hammer, or testing, and is also available on PyPI. ceph-deploy now defaults to installing the Hammer release. If you
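Since 1.5.23 defaults to Hammer, a sketch of pinning an older series with the existing --release flag (hostnames are placeholders):

    ceph-deploy install --release giant node1 node2 node3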

Re: [ceph-users] Inconsistent ceph-deploy disk list command results

2015-04-08 Thread Travis Rhoden
Hi Frederic, Thanks for the report! Do you mind throwing these details into a bug report at http://tracker.ceph.com/ ? I have seen the same thing once before, but at the time didn't have the chance to check if the inconsistency was coming from ceph-deploy or from ceph-disk. This certainly seems

[ceph-users] OSDs not coming up on one host

2015-04-08 Thread Jacob Reid
I have a cluster of 3 servers (recently updated from 0.80.5 to 0.80.9), each running 4-6 osds as single disks, journaled to a partition each on an SSD, with 3 mons on separate hosts. Recently, I started taking the hosts down to move disks between controllers and add extra disk capacity before

Re: [ceph-users] What are you doing to locate performance issues in a Ceph cluster?

2015-04-08 Thread Dan Ryder (daryder)
Yes, the unit is in seconds for those latencies. The sum/avgcount is the average since the daemon was (re)started. If you're interested, I've co-authored a collectd plugin which captures data from Ceph daemons - built into the plugin is the option to calculate either the long-run avg
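A small sketch of that long-run average for one OSD, using the op_latency structure shown earlier in the thread (assumes jq is installed; for an interval average, sample twice and divide the deltas instead):

    ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok perf dump \
      | jq '.osd.op_latency | .sum / .avgcount'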

Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Hi, thanks for answering. Here are the answers to your questions. Hopefully they will be helpful. On 04/08/2015 12:36 PM, Lionel Bouton wrote: I probably won't be able to help much, but people knowing more will need at least: - your Ceph version, - the kernel version of the host on which you

Re: [ceph-users] Getting placement groups to place evenly (again)

2015-04-08 Thread J David
On Wed, Apr 8, 2015 at 11:33 AM, Gregory Farnum g...@gregs42.com wrote: Is this a problem with your PGs being placed unevenly, with your PGs being sized very differently, or both? Please forgive the silly question, but how would one check that? Thanks!

[ceph-users] rados bench seq read with single thread

2015-04-08 Thread Deneau, Tom
Say I have a single node cluster with 5 disks. And using dd iflag=direct on that node, I can see disk read bandwidth at 160 MB/s. I populate a pool with 4MB objects. And then on that same single node, I run: $ drop caches using /proc/sys/vm/drop_caches $ rados -p mypool bench nn seq -t 1
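Spelled out as a sketch (pool name taken from the post; the 60-second duration is an example, and the cache drop must run as root on the reading node):

    rados -p mypool bench 60 write --no-cleanup   # populate with 4 MB objects (the default size)
    echo 3 > /proc/sys/vm/drop_caches             # drop the page cache
    rados -p mypool bench 60 seq -t 1             # single-threaded sequential reads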

Re: [ceph-users] Getting placement groups to place evenly (again)

2015-04-08 Thread J David
On Wed, Apr 8, 2015 at 11:40 AM, Gregory Farnum g...@gregs42.com wrote: ceph pg dump will output the size of each pg, among other things. Among many other things. :) Here is the raw output, in case I'm misinterpreting it: http://pastebin.com/j4ySNBdQ It *looks* like the pg's are roughly

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Andrei Mikhailovsky
Somnath, Sounds very promising! I can't wait to try it on my cluster as I am currently using IPoIB instead of native rdma. Cheers Andrei - Original Message - From: Somnath Roy somnath@sandisk.com To: Andrei Mikhailovsky and...@arhont.com, Andrey Korolyov

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Mark Nelson
Please do keep in mind that this is *very* experimental still and likely to destroy all data and life within a 2 mile radius. ;) Mark On 04/08/2015 01:16 PM, Andrei Mikhailovsky wrote: Somnath, Sounds very promising! I can't wait to try it on my cluster as I am currently using IPoIB instead

Re: [ceph-users] Preliminary RDMA vs TCP numbers

2015-04-08 Thread Andrei Mikhailovsky
Mike, yeah, I wouldn't switch to rdma until it is fully supported in a stable release ))) Andrei - Original Message - From: Andrei Mikhailovsky and...@arhont.com To: Somnath Roy somnath@sandisk.com Cc: ceph-users@lists.ceph.com, ceph-devel ceph-de...@vger.kernel.org Sent:

Re: [ceph-users] Firefly - Giant : CentOS 7 : install failed ceph-deploy

2015-04-08 Thread Vickey Singh
Community, need help. -VS- On Wed, Apr 8, 2015 at 4:36 PM, Vickey Singh vickey.singh22...@gmail.com wrote: Any suggestions, geeks? VS On Wed, Apr 8, 2015 at 2:15 PM, Vickey Singh vickey.singh22...@gmail.com wrote: Hi The suggestion below also didn't work. Full logs here :

Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Josh Durgin
On 04/08/2015 11:40 AM, Jeff Epstein wrote: Hi, thanks for answering. Here are the answers to your questions. Hopefully they will be helpful. On 04/08/2015 12:36 PM, Lionel Bouton wrote: I probably won't be able to help much, but people knowing more will need at least: - your Ceph version, -