Re: [ceph-users] Two osds are spaming dmesg every 900 seconds

2014-08-26 Thread Gregory Farnum
This is being output by one of the kernel clients, and it's just saying that the connections to those two OSDs have died from inactivity. Either the other OSD connections are used a lot more, or aren't used at all. In any case, it's not a problem; just a noisy notification. There's not much you

Re: [ceph-users] Ceph-fuse fails to mount

2014-08-26 Thread Gregory Farnum
In particular, we changed things post-Firefly so that the filesystem isn't created automatically. You'll need to set it up (and its pools, etc) explicitly to use it. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Aug 25, 2014 at 2:40 PM, Sean Crosby
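For readers landing on this thread: a minimal sketch of the setup Greg is describing, assuming a release new enough to have the `ceph fs new` command (post-Firefly development releases); pool names and PG counts are only illustrative:

    # Create the data and metadata pools (sizes are illustrative)
    ceph osd pool create cephfs_data 128
    ceph osd pool create cephfs_metadata 128
    # Create the filesystem explicitly (no longer done automatically)
    ceph fs new cephfs cephfs_metadata cephfs_data
    # Then mount it, e.g. with ceph-fuse
    ceph-fuse /mnt/cephfs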

Re: [ceph-users] MDS dying on Ceph 0.67.10

2014-08-26 Thread Gregory Farnum
I don't think the log messages you're showing are the actual cause of the failure. The log file should have a proper stack trace (with specific function references and probably a listed assert failure), can you find that? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue,

Re: [ceph-users] Fresh Firefly install degraded without modified default tunables

2014-08-26 Thread Gregory Farnum
Hmm, that all looks basically fine. But why did you decide not to segregate OSDs across hosts (according to your CRUSH rules)? I think maybe it's the interaction of your map, setting choose_local_tries to 0, and trying to go straight to the OSDs instead of choosing hosts. But I'm not super
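For context, a conventional replicated CRUSH rule spreads replicas across hosts with a chooseleaf step, whereas a single-node test map typically chooses OSDs directly. A sketch of the usual host-separating rule (not the poster's actual map), as it appears in a decompiled crushmap:

    rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host   # one replica per host
        step emit
    }
    # A single-node test often uses "type osd" here instead, which is
    # the kind of map being discussed in this thread.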

Re: [ceph-users] [Ceph-community] ceph replication and striping

2014-08-26 Thread Aaron Ten Clay
On Tue, Aug 26, 2014 at 5:07 AM, m.channappa.nega...@accenture.com wrote: Hello all, I have configured a Ceph storage cluster. 1. I created the volume. I would like to know whether replication of data happens automatically in Ceph? 2. How do I configure a striped volume using Ceph?
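A short sketch addressing the two questions (pool and image names, sizes and stripe values are illustrative only): replication is a per-pool property and happens automatically once the replica count is set, while striping is configured per RBD image at creation time:

    # 1. Replication: set (or confirm) the pool's replica count
    ceph osd pool set rbd size 3
    # 2. Striping: per-image stripe settings (format 2 images)
    rbd create myimage --size 10240 --image-format 2 \
        --stripe-unit 65536 --stripe-count 16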

Re: [ceph-users] Ceph-fuse fails to mount

2014-08-26 Thread Gregory Farnum
[Re-added the list.] I believe you'll find everything you need at http://ceph.com/docs/master/cephfs/createfs/ -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Aug 26, 2014 at 1:25 PM, LaBarre, James (CTR) A6IT james.laba...@cigna.com wrote: So is there a link

Re: [ceph-users] Fresh Firefly install degraded without modified default tunables

2014-08-26 Thread Ripal Nathuji
Hi Greg, Good question: I started with a single node test and had just left the setting in across larger configs as in earlier versions (e.g. Emperor) it didn't seem to matter. I also had the same thought that it could be causing an issue with the new default tunables in Firefly and did try

[ceph-users] Does RGW have a billing feature? If so, how do we use it?

2014-08-26 Thread baijia...@126.com
baijia...@126.com

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Christian Balzer
Hello, On Tue, 26 Aug 2014 20:21:39 +0200 Loic Dachary wrote: Hi Craig, I assume the reason for the 48 hours recovery time is to keep the cost of the cluster low ? I wrote 1h recovery time because it is roughly the time it would take to move 4TB over a 10Gb/s link. Could you upgrade your

Re: [ceph-users] ceph-deploy with --release (--stable) for dumpling?

2014-08-26 Thread Nigel Williams
On Tue, Aug 26, 2014 at 5:10 PM, Konrad Gutkowski konrad.gutkow...@ffs.pl wrote: Ceph-deploy should set a priority for the Ceph repository, but it doesn't; as a result apt usually installs the best available version from any repository. Thanks Konrad for the tip. It took several goes (notably ceph-deploy
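One way to apply Konrad's tip by hand on Debian/Ubuntu — a sketch only; the file name and priority value are illustrative — is to pin the ceph.com repository above the distribution's own packages:

    cat > /etc/apt/preferences.d/ceph.pref <<'EOF'
    Package: *
    Pin: origin ceph.com
    Pin-Priority: 1001
    EOF
    apt-get update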

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Christian Balzer
Hello, On Tue, 26 Aug 2014 16:12:11 +0200 Loic Dachary wrote: Using percentages instead of numbers led me to calculation errors. Here it is again using 1/100 instead of % for clarity ;-) Assuming that: * The pool is configured for three replicas (size = 3 which is the default) * It

Re: [ceph-users] MDS dying on Ceph 0.67.10

2014-08-26 Thread MinhTien MinhTien
Hi Gregory Farnum, Thank you for your reply! This is the log: 2014-08-26 16:22:39.103461 7f083752f700 -1 mds/CDir.cc: In function 'void CDir::_committed(version_t)' thread 7f083752f700 time 2014-08-26 16:22:39.075809 mds/CDir.cc: 2071: FAILED assert(in->is_dirty() || in->last < ((__u64)(-2)))

[ceph-users] 'incomplete' PGs: what does it mean?

2014-08-26 Thread John Morris
In the docs [1], 'incomplete' is defined thusly: Ceph detects that a placement group is missing a necessary period of history from its log. If you see this state, report a bug, and try to start any failed OSDs that may contain the needed information. However, during an extensive review of
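For anyone digging into the same state, a sketch of the usual commands for investigating an incomplete PG (the pgid 2.5 below is hypothetical):

    ceph health detail | grep incomplete
    ceph pg dump_stuck inactive
    ceph pg 2.5 query   # check peering state and which OSDs it is waiting for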

[ceph-users] question about getting rbd.ko and ceph.ko

2014-08-26 Thread yuelongguang
hi all, is there a way to get rbd.ko and ceph.ko for CentOS 6.x, or do I have to build them from source code? What is the minimum kernel version required? thanks
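A quick way to check whether the running kernel already ships the modules — just a sketch; on stock CentOS 6 kernels the modprobe calls will typically fail, which means a newer kernel or a source build is needed:

    modprobe rbd  && modinfo rbd  | head -n 3
    modprobe ceph && modinfo ceph | head -n 3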

Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks

2014-08-26 Thread Irek Fasikhov
Hi. I, like many people, use fio. For Ceph RBD it has a special engine: https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html 2014-08-26 12:15 GMT+04:00 yuelongguang fasts...@163.com: hi all, i am planning to do a test on ceph, including performance, throughput,
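A minimal invocation of the fio rbd engine along the lines of the linked article — a sketch only; it assumes fio was built with rbd support, and the pool, image name, size and job parameters are illustrative:

    rbd create fio_test --size 10240
    fio --name=rbd-bench --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio_test \
        --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=60 --time_based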

Re: [ceph-users] ceph cluster inconsistency?

2014-08-26 Thread Kenneth Waegeman
Hi, In the meantime I already tried upgrading the cluster to 0.84, to see if that made a difference, and it seems it does. I can't reproduce the crashing OSDs by doing a 'rados -p ecdata ls' anymore. But now the cluster detects it is inconsistent: cluster
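The usual workflow for an inconsistent cluster, sketched here with a hypothetical pgid (3.1a); repair should only be run once you understand which copy of the object is the bad one:

    ceph health detail | grep inconsistent
    ceph pg deep-scrub 3.1a   # re-verify the PG
    ceph pg repair 3.1a       # repair after confirming which replica is correct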

Re: [ceph-users] v0.84 released

2014-08-26 Thread Stijn De Weirdt
hi all, there are a zillion OSD bug fixes. Things are looking pretty good for the Giant release that is coming up in the next month. any chance of having a compilable cephfs kernel module for el7 for the next major release? stijn

Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Irek Fasikhov
Move the logs to SSD and you will immediately increase performance; about 50% of your performance is being lost to the logs. And for three replicas, more than 5 hosts are recommended. 2014-08-26 12:17 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net: Hi, thanks for the reply. From the top of my
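As the follow-up further down clarifies, this means the OSD journals rather than the log files. A sketch of the usual procedure for moving one OSD's journal to an SSD partition — OSD id and device names are illustrative, and a sysvinit deployment is assumed:

    service ceph stop osd.0
    ceph-osd -i 0 --flush-journal
    ln -sf /dev/sdb1 /var/lib/ceph/osd/ceph-0/journal   # or set "osd journal" in ceph.conf
    ceph-osd -i 0 --mkjournal
    service ceph start osd.0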

Re: [ceph-users] ceph cluster inconsistency?

2014-08-26 Thread Haomai Wang
Hmm, it looks like you hit this bug (http://tracker.ceph.com/issues/9223). Sorry for the late message, I forgot that this fix was merged into 0.84. Thanks for your patience :-) On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote: Hi, In the meantime I already

Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Mateusz Skała
Do you mean to move /var/log/ceph/* to an SSD disk?

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Christian Balzer
Hello, On Tue, 26 Aug 2014 10:23:43 +1000 Blair Bethwaite wrote: Message: 25 Date: Fri, 15 Aug 2014 15:06:49 +0200 From: Loic Dachary l...@dachary.org To: Erik Logtenberg e...@logtenberg.eu, ceph-users@lists.ceph.com Subject: Re: [ceph-users] Best practice K/M-parameters EC pool

Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Irek Fasikhov
I'm sorry, of course I meant journals :) 2014-08-26 13:16 GMT+04:00 Mateusz Skała mateusz.sk...@budikom.net: Do you mean to move /var/log/ceph/* to an SSD disk?

[ceph-users] Ceph monitor load, low performance

2014-08-26 Thread pawel . orzechowski
Hello Gentlemen :-) Let me point out one important aspect of this low-performance problem: of all 4 nodes of our ceph cluster, only one node shows bad metrics, that is, very high latency on its OSDs (from 200-600ms), while the other three nodes behave normally, that is, the latency of their OSDs is
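Some ways to confirm which OSDs on the affected node are slow, and whether the disks behind them are the cause — a sketch only; the OSD id and device name are illustrative:

    ceph osd perf                                   # commit/apply latency for every OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok perf dump | grep -A2 op_latency
    iostat -x 5 /dev/sdc                            # look for high await / %util on that node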

Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks

2014-08-26 Thread yuelongguang
thanks Irek Fasikhov. Is it the only way to test ceph-rbd? An important aim of the test is to find where the bottleneck is: qemu/librbd/ceph. Could you share your test results with me? thanks. On 2014-08-26 04:22:22, Irek Fasikhov malm...@gmail.com wrote: Hi. I, like many people,

Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks

2014-08-26 Thread Irek Fasikhov
For me, the bottleneck is single-threaded operation. Writes are more or less solved by enabling the rbd cache, but there are problems with reads. I think those problems can be solved with a cache pool, but I have not tested it. It follows that the more threads, the greater the
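The rbd cache settings referred to here, as a sketch — the values are illustrative and go in the [client] section of ceph.conf on the hypervisor; QEMU guests additionally need cache=writeback on the rbd drive:

    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
    rbd cache = true
    rbd cache size = 67108864
    rbd cache writethrough until flush = true
    EOF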

Re: [ceph-users] enrich ceph test methods, what is your concern about ceph. thanks

2014-08-26 Thread Irek Fasikhov
Sorry... Enter pressed :) Continued: no, it's not the only way to test, but it depends on what you want to use ceph for. 2014-08-26 15:22 GMT+04:00 Irek Fasikhov malm...@gmail.com: For me, the bottleneck is single-threaded operation. Writes are more or less solved by enabling

[ceph-users] ceph can not repair itself after accidental power down, half of pgs are peering

2014-08-26 Thread yuelongguang
hi all, i have 5 osds and 3 mons. Its status was ok before. To be mentioned, this cluster has no data at all; i just deployed it to become familiar with some command lines. What is the problem and how do I fix it? thanks ---environment--- ceph-release-1-0.el6.noarch ceph-deploy-1.5.11-0.noarch

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Loic Dachary
Hi Blair, Assuming that: * The pool is configured for three replicas (size = 3 which is the default) * It takes one hour for Ceph to recover from the loss of a single OSD * Any other disk has a 0.001% chance to fail within the hour following the failure of the first disk (assuming AFR

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Loic Dachary
Using percentages instead of numbers led me to calculation errors. Here it is again using 1/100 instead of % for clarity ;-) Assuming that: * The pool is configured for three replicas (size = 3 which is the default) * It takes one hour for Ceph to recover from the loss of a single OSD * Any
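A back-of-the-envelope version of the arithmetic these assumptions set up — my own illustration, not necessarily the exact figure computed later in the thread: with a 0.001% (10^-5) chance that a given other disk fails during the one-hour recovery window, losing both remaining replicas of the same data requires two such failures:

    % p = probability a given disk fails within the 1 h recovery window
    p = 0.001\% = 10^{-5}
    P(\text{both remaining replicas lost}) \approx p \cdot p = 10^{-10}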

[ceph-users] MDS dying on Ceph 0.67.10

2014-08-26 Thread MinhTien MinhTien
Hi all, I have a cluster of 2 nodes on Centos 6.5 with ceph 0.67.10 (replicate = 2) When I add the 3rd node in the Ceph Cluster, CEPH perform load balancing. I have 3 MDS in 3 nodes,the MDS process is dying after a while with a stack trace:

Re: [ceph-users] ceph can not repair itself after accidental power down, half of pgs are peering

2014-08-26 Thread Michael
How far out are your clocks? It's showing a clock skew; if they're too far out it can cause issues with cephx. Otherwise you're probably going to need to check your cephx auth keys. -Michael On 26/08/2014 12:26, yuelongguang wrote: hi all, i have 5 osds and 3 mons. Its status was ok before. To be
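Quick checks for the skew Michael mentions — a sketch; it assumes ntpd is in use on the monitor hosts:

    ceph health detail | grep -i skew   # monitors report "clock skew detected on mon.X"
    ntpq -p                             # NTP peer offsets on each monitor host
    # keep monitor clocks within mon_clock_drift_allowed (0.05 s by default)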

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Craig Lewis
My OSD rebuild time is more like 48 hours (4TB disks, 60% full, osd max backfills = 1). I believe that increases my risk of failure by 48^2. Since your numbers are failure rate per hour per disk, I need to consider the risk for the whole time for each disk. So more formally, rebuild time to
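A sketch of why the risk scales roughly with the square of the rebuild time for a three-replica pool — my own restatement of the reasoning, with λ the per-disk failure rate per hour and T the rebuild time in hours; both of the two additional failures must land inside the window T:

    P(\text{loss}) \propto (\lambda T)(\lambda T) = \lambda^{2} T^{2}
    \Rightarrow \; P_{48\,\text{h}} / P_{1\,\text{h}} \approx 48^{2} = 2304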

Re: [ceph-users] Ceph monitor load, low performance

2014-08-26 Thread Craig Lewis
I had a similar problem once. I traced it to a failed battery on my RAID card, which disabled write caching. One of the many things I need to add to monitoring. On Tue, Aug 26, 2014 at 3:58 AM, pawel.orzechow...@budikom.net wrote: Hello Gentlemen :-) Let me point out one important

Re: [ceph-users] Best practice K/M-parameters EC pool

2014-08-26 Thread Loic Dachary
Hi Craig, I assume the reason for the 48 hours recovery time is to keep the cost of the cluster low ? I wrote 1h recovery time because it is roughly the time it would take to move 4TB over a 10Gb/s link. Could you upgrade your hardware to reduce the recovery time to less than two hours ? Or

Re: [ceph-users] slow read speeds from kernel rbd (Firefly 0.80.4)

2014-08-26 Thread Steve Anthony
Ok, after some delays and the move to new network hardware I have an update. I'm still seeing the same low bandwidth and high retransmissions from iperf after moving to the Cisco 6001 (10Gb) and 2960 (1Gb). I've narrowed it down to transmissions from a 10Gb connected host to a 1Gb connected host.
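One way to pin down the direction-dependent behaviour described here is to run iperf both ways between the two hosts and compare retransmissions — a sketch; host names are illustrative:

    # on the 1Gb host:   iperf -s
    # on the 10Gb host:  iperf -c host-1g -t 30 -i 5
    # then swap server and client to test the reverse direction
    netstat -s | grep -i retrans     # compare retransmit counters on each side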