Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-08 Thread Kenneth Waegeman
Thank you very much! Is this problem then related to the weird sizes I see? pgmap v55220: 1216 pgs, 3 pools, 3406 GB data, 852 kobjects; 418 GB used, 88130 GB / 88549 GB avail. A calculation with df shows indeed that there is about 400 GB used on the disks, but the tests I ran
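A quick way to line up Ceph's own accounting against the filesystem view; a minimal sketch, assuming the default OSD mount path /var/lib/ceph/osd:

    # Cluster-wide usage as Ceph accounts for it (raw used vs. per-pool data)
    ceph df
    # Filesystem usage of each locally mounted OSD, for comparison
    df -h /var/lib/ceph/osd/*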

Re: [ceph-users] Ceph on RHEL 7 with multiple OSD's

2014-09-08 Thread Loic Dachary
Hi, it looks like your osd.0 is down and you only have one OSD left (osd.1), which would explain why the cluster cannot get to a healthy state. The size 2 in pool 0 'data' replicated size 2 ... means the pool needs at least two OSDs up to function properly. Do you know why the osd.0 is not
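To confirm which OSDs are up and what each pool requires, one could run the following; a sketch using only standard commands, with no cluster-specific assumptions:

    # Per-daemon up/down status in CRUSH-tree form
    ceph osd tree
    # Per-pool replicated size and min_size, straight from the OSDMap
    ceph osd dump | grep '^pool'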

Re: [ceph-users] Ceph on RHEL 7 with multiple OSD's

2014-09-08 Thread BG
Sorry, no idea; this is a first-time install I'm trying and I'm following the Storage Cluster Quick Start guide. Looking in the ceph.log file I do see warnings related to osd.0: 2014-09-08 11:06:44.000667 osd.0 10.119.16.15:6800/4433 1 : [WRN] map e10 wrongly marked me down I've also just

Re: [ceph-users] Ceph on RHEL 7 with multiple OSD's

2014-09-08 Thread BG
Also, for info, this is from the osd.0 log file: 2014-09-08 11:06:44.000663 7f41144c7700 0 log [WRN] : map e10 wrongly marked me down 2014-09-08 11:06:44.002595 7f41144c7700 0 osd.0 10 crush map has features 1107558400, adjusting msgr requires for mons 2014-09-08 11:06:44.003346 7f41072ab700 0
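A common culprit on RHEL 7 for OSDs wrongly marking each other down is a firewall blocking the heartbeat ports. That is only an assumption about this particular cluster, but it is cheap to rule out; the sketch below assumes firewalld with its default zone:

    # Open the monitor port and the OSD port range, then reload
    firewall-cmd --permanent --add-port=6789/tcp
    firewall-cmd --permanent --add-port=6800-7300/tcp
    firewall-cmd --reload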

Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-08 Thread Haomai Wang
I'm not very sure; it's possible that keyvaluestore uses sparse writes, which would make a big difference to Ceph's space statistics. On Mon, Sep 8, 2014 at 6:35 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote: Thank you very much! Is this problem then related to the weird sizes I see:

[ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread JR
Greetings all, I have a small ceph cluster (4 nodes, 2 osds per node) which recently started showing: root@ocd45:~# ceph health HEALTH_WARN 1 near full osd(s) admin@node4:~$ for i in 2 3 4 5; do sudo ssh osd4$i df -h |egrep 'Filesystem|osd/ceph'; done Filesystem Size Used Avail Use%
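For context, `ceph osd reweight` sets the temporary override weight, not the CRUSH weight. A minimal sketch of locating the offending OSD and nudging data off it (osd.3 and the 0.9 factor are illustrative values, not taken from this thread):

    # Name the near-full OSD explicitly
    ceph health detail
    # Temporarily lower its weight so some PGs migrate elsewhere
    ceph osd reweight 3 0.9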

Re: [ceph-users] Ceph object back up details

2014-09-08 Thread Yehuda Sadeh
Not sure I understand what you're asking. Multiple zones within the same region configuration is described here: http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication Yehuda On Sun, Sep 7, 2014 at 10:32 PM, M Ranga Swami Reddy swamire...@gmail.com wrote: Hi Yehuda, I

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread Christian Balzer
Hello, On Mon, 08 Sep 2014 11:42:59 -0400 JR wrote: Greetings all, I have a small ceph cluster (4 nodes, 2 osds per node) which recently started showing: root@ocd45:~# ceph health HEALTH_WARN 1 near full osd(s) admin@node4:~$ for i in 2 3 4 5; do sudo ssh osd4$i df -h |egrep

Re: [ceph-users] resizing the OSD

2014-09-08 Thread JIten Shah
On Sep 6, 2014, at 8:22 PM, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote: Thanks Christian. Replies inline. On Sep 6, 2014, at 8:04 AM, Christian Balzer ch...@gol.com wrote: Hello, On Fri, 05 Sep 2014 15:31:01 -0700 JIten

[ceph-users] Updating the pg and pgp values

2014-09-08 Thread JIten Shah
While checking the health of the cluster, I ran into the following warning: health HEALTH_WARN too few pgs per osd (1 < min 20) When I checked the pg and pgp numbers, I saw the value was the default value of 64 ceph osd pool get data pg_num pg_num: 64 ceph osd pool get data pgp_num pgp_num:
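Raising the placement-group count is a two-step operation; a sketch for the data pool, where the target of 256 is illustrative and should really be sized from the OSD count and replica count:

    # Increase pg_num first, then pgp_num so rebalancing actually happens
    ceph osd pool set data pg_num 256
    ceph osd pool set data pgp_num 256
    # Verify both took effect
    ceph osd pool get data pg_num
    ceph osd pool get data pgp_num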

Re: [ceph-users] Updating the pg and pgp values

2014-09-08 Thread Gregory Farnum
On Mon, Sep 8, 2014 at 10:08 AM, JIten Shah jshah2...@me.com wrote: While checking the health of the cluster, I ran into the following warning: health HEALTH_WARN too few pgs per osd (1 < min 20) When I checked the pg and pgp numbers, I saw the value was the default value of 64 ceph osd

Re: [ceph-users] Updating the pg and pgp values

2014-09-08 Thread JIten Shah
Thanks Greg. —Jiten On Sep 8, 2014, at 10:31 AM, Gregory Farnum g...@inktank.com wrote: On Mon, Sep 8, 2014 at 10:08 AM, JIten Shah jshah2...@me.com wrote: While checking the health of the cluster, I ran into the following warning: health HEALTH_WARN too few pgs per osd (1 < min 20)

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread JR
Hi Christian, I have 448 PGs and 448 PGPs (according to ceph -s). This seems borne out by: root@osd45:~# rados lspools data metadata rbd volumes images root@osd45:~# for i in $(rados lspools); do echo $i pg($(ceph osd pool get $i pg_num), pgp$(ceph osd pool get $i pg_num); done data pg(pg_num:
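Note that the quoted one-liner asks ceph for pg_num twice and never actually queries pgp_num, which (as the rest of the thread reveals) hides exactly the value that matters. A corrected sketch:

    # Print pg_num and pgp_num for every pool
    for i in $(rados lspools); do
        echo "$i: $(ceph osd pool get $i pg_num), $(ceph osd pool get $i pgp_num)"
    done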

Re: [ceph-users] Updating the pg and pgp values

2014-09-08 Thread JIten Shah
So, if it doesn't refer to the entry in ceph.conf, where does it actually store the new value? —Jiten On Sep 8, 2014, at 10:31 AM, Gregory Farnum g...@inktank.com wrote: On Mon, Sep 8, 2014 at 10:08 AM, JIten Shah jshah2...@me.com wrote: While checking the health of the cluster, I ran into the

Re: [ceph-users] Updating the pg and pgp values

2014-09-08 Thread Gregory Farnum
It's stored in the OSDMap on the monitors. Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Sep 8, 2014 at 10:50 AM, JIten Shah jshah2...@me.com wrote: So, if it doesn't refer to the entry in ceph.conf, where does it actually store the new value? —Jiten On Sep 8, 2014,

Re: [ceph-users] Updating the pg and pgp values

2014-09-08 Thread JIten Shah
Thanks. How do I query the OSDMap on the monitors? Using "ceph osd pool get data pg_num"? Or is there a way to get the full list of settings? —jiten On Sep 8, 2014, at 10:52 AM, Gregory Farnum g...@inktank.com wrote: It's stored in the OSDMap on the monitors. Software Engineer #42 @
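Both routes work; a sketch of each, using only standard tooling (/tmp/osdmap is an arbitrary scratch path):

    # Dump the full decoded OSDMap, including every pool's settings
    ceph osd getmap -o /tmp/osdmap
    osdmaptool --print /tmp/osdmap
    # Or query a single value directly
    ceph osd pool get data pgp_num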

Re: [ceph-users] Delays while waiting_for_osdmap according to dump_historic_ops

2014-09-08 Thread Gregory Farnum
On Sun, Sep 7, 2014 at 4:28 PM, Alex Moore a...@lspeed.org wrote: I recently found out about the ceph --admin-daemon /var/run/ceph/ceph-osd.<id>.asok dump_historic_ops command, and noticed something unexpected in the output on my cluster, after checking numerous output samples... It looks to
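For reference, the admin-socket call under discussion, plus its companion for ops still in flight; a sketch assuming osd.0 and the default socket path:

    # Recently completed slow ops, with per-event timestamps
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops
    # Ops currently being processed
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight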

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-08 Thread Gregory Farnum
On Mon, Sep 8, 2014 at 1:42 AM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: Hi, This issue is on a small 2 servers (44 osds) ceph cluster running 0.72.2 under Ubuntu 12.04. The cluster was filling up (a few osds near full) and I tried to increase the number of pg per pool to 1024 for

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-08 Thread Sebastien Han
They definitely are, Warren! Thanks for bringing this here :). On 05 Sep 2014, at 23:02, Wang, Warren warren_w...@cable.comcast.com wrote: +1 to what Cedric said. Anything more than a few minutes of heavy sustained writes tended to get our solid-state devices into a state where garbage

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-08 Thread Francois Deppierraz
Hi Greg, Thanks for your support! On 08. 09. 14 20:20, Gregory Farnum wrote: The first one is not caused by the same thing as the ticket you reference (it was fixed well before emperor), so it appears to be some kind of disk corruption. The second one is definitely corruption of some kind

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-08 Thread Gregory Farnum
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: Hi Greg, Thanks for your support! On 08. 09. 14 20:20, Gregory Farnum wrote: The first one is not caused by the same thing as the ticket you reference (it was fixed well before emperor), so it appears to be

Re: [ceph-users] OSD is crashing while running admin socket

2014-09-08 Thread Samuel Just
That seems reasonable. Bug away! -Sam On Mon, Sep 8, 2014 at 5:11 PM, Somnath Roy somnath@sandisk.com wrote: Hi Sage/Sam, I faced a crash in OSD with latest Ceph master. Here is the log trace for the same. ceph version 0.85-677-gd5777c4 (d5777c421548e7f039bb2c77cb0df2e9c7404723)

Re: [ceph-users] OSD is crashing while running admin socket

2014-09-08 Thread Somnath Roy
Created the following tracker and assigned to me. http://tracker.ceph.com/issues/9384 Thanks Regards Somnath -Original Message- From: Samuel Just [mailto:sam.j...@inktank.com] Sent: Monday, September 08, 2014 5:22 PM To: Somnath Roy Cc: Sage Weil (sw...@redhat.com);

Re: [ceph-users] OSD is crashing while running admin socket

2014-09-08 Thread Sage Weil
On Tue, 9 Sep 2014, Somnath Roy wrote: Created the following tracker and assigned to me. http://tracker.ceph.com/issues/9384 By the way, this might be the same as or similar to http://tracker.ceph.com/issues/8885 Thanks! sage Thanks Regards Somnath -Original Message- From:

Re: [ceph-users] OSD is crashing while running admin socket

2014-09-08 Thread Somnath Roy
Yeah!! Looks similar, but not entirely. There is another potential race condition that may cause this. We are protecting the TrackedOp::events structure only during TrackedOp::mark_event, with the lock mutex. I couldn't find it anywhere else. The events structure should also be protected during

Re: [ceph-users] resizing the OSD

2014-09-08 Thread Christian Balzer
Hello, On Mon, 08 Sep 2014 09:53:58 -0700 JIten Shah wrote: On Sep 6, 2014, at 8:22 PM, Christian Balzer ch...@gol.com wrote: Hello, On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote: Thanks Christian. Replies inline. On Sep 6, 2014, at 8:04 AM, Christian Balzer

Re: [ceph-users] Updating the pg and pgp values

2014-09-08 Thread Christian Balzer
Hello, On Mon, 08 Sep 2014 10:08:27 -0700 JIten Shah wrote: While checking the health of the cluster, I ran into the following warning: health HEALTH_WARN too few pgs per osd (1 < min 20) When I checked the pg and pgp numbers, I saw the value was the default value of 64 ceph osd

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread Christian Balzer
Hello, On Mon, 08 Sep 2014 13:50:08 -0400 JR wrote: Hi Christian, I have 448 PGs and 448 PGPs (according to ceph -s). This seems borne out by: root@osd45:~# rados lspools data metadata rbd volumes images root@osd45:~# for i in $(rados lspools); do echo $i pg($(ceph osd pool get

Re: [ceph-users] SSD journal deployment experiences

2014-09-08 Thread Quenten Grasso
This reminds me of something I was trying to find out a while back. If we have 2000 random IOPS of 4K blocks, our cluster (assuming 3x replication) will generate 6000 IOPS @ 4K onto the journals. Does this mean our journals will absorb 6000 IOPS and turn these into X IOPS onto our
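The arithmetic in question, as a sketch; the numbers are the thread's own, and the write-coalescing factor is precisely the unknown being asked about:

    client_iops=2000   # random 4K writes from clients
    replicas=3         # assumed replication factor
    # Every replicated write hits a journal first
    echo $(( client_iops * replicas ))   # 6000 journal IOPS cluster-wide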

[ceph-users] all my osds are down, but ceph -s tells me they are up and in.

2014-09-08 Thread yuelongguang
Hi all, this is crazy. 1. All my OSDs are down, but ceph -s tells me they are up and in. Why? 2. Now all OSDs are down, a VM is using rbd as its disk, and inside the VM fio is reading/writing the disk, but it hangs and cannot be killed. Why? Thanks [root@cephosd2-monb ~]# ceph -v ceph version 0.81
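A quick way to cross-check what the monitors believe against what is actually running; a sketch, assuming shell access to an OSD host:

    # What the cluster map claims
    ceph osd stat
    # What is actually running on this host
    ps aux | grep [c]eph-osd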

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread Christian Balzer
Hello, On Mon, 08 Sep 2014 18:30:07 -0400 JR wrote: Hi Christian, all, Having researched this a bit more, it seemed that just doing ceph osd pool set rbd pg_num 128 ceph osd pool set rbd pgp_num 128 might be the answer. Alas, it was not. After running the above the cluster just sat

[ceph-users] mix ceph version with 0.80.5 and 0.85

2014-09-08 Thread 廖建锋
Dear list, as there are a lot of keyvalue backend bugs in the 0.80.5 firefly version, I want to upgrade to 0.85 those OSDs which are already down and unable to start, and keep some other OSDs on 0.80.5. I am wondering, will it work? 廖建锋 Derek

Re: [ceph-users] SSD journal deployment experiences

2014-09-08 Thread Christian Balzer
On Tue, 9 Sep 2014 01:40:42 +0000 Quenten Grasso wrote: This reminds me of something I was trying to find out a while back. If we have 2000 random IOPS of 4K blocks, our cluster (assuming 3x replication) will generate 6000 IOPS @ 4K onto the journals. Does this mean our journals

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread JR
Hi Christian, ha ... root@osd45:~# ceph osd pool get rbd pg_num pg_num: 128 root@osd45:~# ceph osd pool get rbd pgp_num pgp_num: 64 That's the explanation! I did run the command, but it spat out some (what I thought was a harmless) warning; I should have checked more carefully. I now have the
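Given that, the remaining half of the change plus a verification pass would look like the sketch below:

    ceph osd pool set rbd pgp_num 128
    ceph osd pool get rbd pgp_num   # confirm it actually took this time
    ceph -w                         # then watch the rebalance proceed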

[ceph-users] Re: mix ceph version with 0.80.5 and 0.85

2014-09-08 Thread 廖建锋
Looks like it doesn't work. I noticed that 0.85 added a superblock to the leveldb OSD; the OSDs which I already have do not have a superblock. Can anybody tell me how to upgrade the OSDs? From: ceph-users mailto:ceph-users-boun...@lists.ceph.com Sent: 2014-09-09 10:32 To:

Re: [ceph-users] Re: mix ceph version with 0.80.5 and 0.85

2014-09-08 Thread Jason King
Check the docs. 2014-09-09 11:02 GMT+08:00 廖建锋 de...@f-club.cn: Looks like it doesn't work. I noticed that 0.85 added a superblock to the leveldb OSD; the OSDs which I already have do not have a superblock. Can anybody tell me how to upgrade the OSDs? From: ceph-users

Re: [ceph-users] all my osds are down, but ceph -s tells me they are up and in.

2014-09-08 Thread Sage Weil
On Tue, 9 Sep 2014, yuelongguang wrote: Hi all, this is crazy. 1. All my OSDs are down, but ceph -s tells me they are up and in. Why? Peer OSDs normally handle failure detection. If all OSDs are down, there is nobody to report the failures. After 5 or 10 minutes, if the OSDs don't report any
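The report timeout Sage describes is a monitor-side option; a sketch of inspecting it on a live mon (the mon id 'a' and default socket path are assumptions):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd_report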

Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-08 Thread Sage Weil
On Sun, 7 Sep 2014, Haomai Wang wrote: I have found the root cause. It's a bug. When a chunky scrub happens, it will iterate over the whole pg's objects, and each iteration will scan only a few objects. osd/PG.cc:3758: ret = get_pgbackend()->objects_list_partial(start,

Re: [ceph-users] Re: mix ceph version with 0.80.5 and 0.85

2014-09-08 Thread 廖建锋
There is nothing about this on ceph.com. From: Jason King mailto:chn@gmail.com Sent: 2014-09-09 11:19 To: 廖建锋 mailto:de...@f-club.cn Cc: ceph-users mailto:ceph-users-boun...@lists.ceph.com; ceph-users mailto:ceph-users@lists.ceph.com Subject: Re: [ceph-users] Re: mix ceph version with 0.80.5 and 0.85

Re: [ceph-users] Is ceph osd reweight always safe to use?

2014-09-08 Thread JR
Greetings, After running for a couple of hours, my attempt to rebalance a near full disk has stopped with a stuck unclean error: root@osd45:~# ceph -s cluster c8122868-27af-11e4-b570-52540004010f health HEALTH_WARN 6 pgs backfilling; 6 pgs stuck unclean; recovery 13086/1158268 degraded
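Standard first steps for digging into stuck PGs, as a sketch (the PG id in the query is illustrative):

    # List PGs stuck unclean and the OSDs they map to
    ceph pg dump_stuck unclean
    # Detailed state of one PG
    ceph pg 2.3f query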