[ceph-users] osd laggy algorithm
Hello. By default, Ceph marks an OSD node down after receiving 3 reports about the unresponsive node. Reports are sent every osd heartbeat grace seconds, but with the settings mon_osd_adjust_heartbeat_grace = true and mon_osd_adjust_down_out_interval = true, the timeout for marking nodes down may vary. Please tell me: what algorithm changes the timeout for marking nodes down/out, and which parameters affect it? Thanks. -- Artem ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
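Not an authoritative answer, but my reading of the docs is that with mon_osd_adjust_heartbeat_grace enabled, the monitor keeps per-OSD "laggy" statistics (how often, and for how long, an OSD was wrongly marked down) and stretches the grace period for OSDs with such a history. A rough Python sketch of that idea; the formula, decay shape, and names below are my assumptions for illustration, not the exact OSDMonitor code:

```python
# Hedged sketch: how the monitor *might* scale the failure grace period
# when mon_osd_adjust_heartbeat_grace is enabled. All names and the decay
# formula are illustrative assumptions, not the real implementation.
import math

def effective_grace(base_grace, laggy_probability, laggy_interval,
                    age, halflife):
    """base_grace: osd_heartbeat_grace (default 20s).
    laggy_probability / laggy_interval: per-OSD history of being wrongly
    marked down (the OSD came back claiming it was up the whole time).
    age: seconds since the laggy stats were last updated.
    halflife: decay control (cf. mon_osd_laggy_halflife)."""
    decay = math.exp(math.log(0.5) * age / halflife)  # exponential half-life decay
    return base_grace + decay * laggy_probability * laggy_interval

# An OSD with a laggy history gets a longer grace period...
print(effective_grace(20.0, 0.5, 60.0, age=0, halflife=3600))
# ...and the bonus decays back toward the base grace over time:
print(effective_grace(20.0, 0.5, 60.0, age=3600, halflife=3600))
```

The point of the sketch is only the shape of the behaviour: a flapping OSD is given more slack before being marked down, and that slack decays once the OSD behaves again.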
Re: [ceph-users] v0.80.9 Firefly released
Hi Loic, Nope, only the versions from 0.81-trusty to 0.93-1trusty are available in http://ceph.com/debian-testing/pool/main/c/ceph/ But the firefly deb source packages for 0.80.9-1trusty is not available :( Cheers, Valery On 11/03/15 14:11, Loic Dachary wrote: Hi Valery, They should be here http://ceph.com/debian-testing/ Cheers On 11/03/2015 10:07, Valery Tschopp wrote: Where can I find the debian trusty source package for v0.80.9? Cheers, Valery On 10/03/15 20:34, Sage Weil wrote: This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs. We recommend that all Firefly users upgrade. For more detailed information, see http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt in to the fixed behavior. In order to set the new tunable to correct the behavior::

    ceph osd crush set-tunable straw_calc_version 1

Note that this change will have no immediate effect. However, from this point forward, any 'straw' bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing.
You can estimate how much rebalancing will eventually be necessary on your cluster with::

    ceph osd getcrushmap -o /tmp/cm
    crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
    crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
    crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
    crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
    wc -l /tmp/a                          # num total mappings
    diff -u /tmp/a /tmp/b | grep -c ^+    # num changed mappings

Divide the number of changed mappings by the total number of mappings in /tmp/a. We've found that most clusters are under 10%. You can force all of this rebalancing to happen at once with::

    ceph osd crush reweight-all

Otherwise, it will happen at some unknown point in the future when CRUSH weights are next adjusted.

Notable Changes
---------------

* ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
* crush: fix straw bucket weight calculation, add straw_calc_version tunable (#10095 Sage Weil)
* crush: fix tree bucket (Rongzu Zhu)
* crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
* crushtool: add --reweight (Sage Weil)
* librbd: complete pending operations before closing image (#10299 Jason Dillaman)
* librbd: fix read caching performance regression (#9854 Jason Dillaman)
* librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
* mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
* osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
* osd: handle no-op write with snapshot (#10262 Sage Weil)
* radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
* rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis, Yehuda Sadeh)
* rgw: don't overwrite bucket/object owner when setting ACLs (#10978 Yehuda Sadeh)
* rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
* rgw: fix partial swift GET (#10553 Yehuda Sadeh)
* rgw: fix quota disable (#9907 Dong Lei)
* rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
* rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
* rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
* rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda Sadeh)
* rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
* rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
* rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
* rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
* rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
* rgw: update swift subuser permission masks when authenticating (#9918 Yehuda Sadeh)
* rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis, Yehuda Sadeh)
* rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
* rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
* For packages, see
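The wc -l / diff arithmetic in the release notes above boils down to "what fraction of PG-to-OSD mappings changed". A standalone sketch of that calculation; the mapping strings are synthetic examples, not real crushtool output:

```python
# Sketch of the "fraction of changed mappings" estimate from the release
# notes: /tmp/a and /tmp/b hold one PG->OSD mapping per line, and the
# fraction of differing lines approximates how much data will rebalance.
# The mapping strings below are made-up examples.

def changed_fraction(before, after):
    """before/after: lists of mapping lines for the same PGs, same order."""
    assert len(before) == len(after)
    changed = sum(1 for a, b in zip(before, after) if a != b)
    return changed / len(before)

before = ["1.0 -> [0,1,2]", "1.1 -> [2,3,4]", "1.2 -> [4,5,0]", "1.3 -> [1,2,5]"]
after  = ["1.0 -> [0,1,2]", "1.1 -> [2,3,5]", "1.2 -> [4,5,0]", "1.3 -> [1,2,5]"]

print(f"{changed_fraction(before, after):.0%} of mappings change")  # 25%
```

On a real cluster you would feed this the two --show-mappings files instead; the "most clusters are under 10%" figure from the notes is this same ratio.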
Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93
Kevin, This is a known issue and should be fixed in the latest krbd. The problem is, it is not backported to 14.04 krbd yet. You need to build it from latest krbd source if you want to stick with 14.04. The workaround is, you need to unmap your clients before restarting osds. Thanks Regards Somnath From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of kevin parrikar Sent: Wednesday, March 11, 2015 11:44 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] client crashed when osd gets restarted - hammer 0.93 Hi, I am trying hammer 0.93 on Ubuntu 14.04. rbd is mapped in client ,which is also ubuntu 14.04 . When i did a stop ceph-osd-all and then a start,client machine crashed and attached pic was in the console.Not sure if its related to ceph. Thanks PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
[ceph-users] Cache Tier Flush = immediate base tier journal sync?
I'm not sure if it's something I'm doing wrong or just an oddity, but when my cache tier flushes dirty blocks out to the base tier, the writes seem to hit the OSDs straight away instead of coalescing in the journals. Is this correct? For example, if I create an RBD on a standard 3-way replica pool and run fio via librbd with 128k writes, I see the journals take all the IOs until I hit my filestore_min_sync_interval, and then I see it start writing to the underlying disks. Doing the same on a full cache tier (to force flushing), I immediately see the base disks at very high utilisation. The journals also have some write IO at the same time. The only other odd thing I can see via iostat is that, most of the time whilst I'm running fio, the underlying disks are doing very small write IOs of around 16kb with an occasional big burst of activity. I know erasure coding + cache tier is slower than plain replicated pools, but even with various high queue depths I'm struggling to get much above 100-150 iops, compared to a 3-way replica pool which can easily achieve 1000-1500. The base tier is comprised of 40 disks. It seems quite a marked difference and I'm wondering if this strange journal behaviour is the cause. Does anyone have any ideas? Nick
Re: [ceph-users] Duplication name Container
On 11/03/2015, at 15.31, Wido den Hollander w...@42on.com wrote: On 03/11/2015 03:23 PM, Jimmy Goffaux wrote: Hello All, I have used Ceph in production for several months, but I have errors with the Ceph Rados Gateway for multiple users. I am faced with the following error: Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists Which corresponds to the documentation: http://ceph.com/docs/master/radosgw/s3/bucketops/ By which means can I avoid this kind of problem? You can not. Bucket names are unique inside the RADOS Gateway, just as with Amazon S3. Well, it can be avoided, though not at the Ceph level but at your application level :) Either ignore already-exists errors in your app, or verify the bucket exists before creating it... /Steffen
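The application-level workaround described above (treat "already exists" as success) is a few lines of code. A minimal sketch; the S3 client class and exception name here are stand-ins for whatever S3 library the application actually uses, so this only illustrates the pattern:

```python
# Sketch of the app-level workaround: create the bucket, but treat a
# 409 Conflict / BucketAlreadyExists response as success rather than an
# error. FakeS3Client and BucketAlreadyExists are illustrative stand-ins.

class BucketAlreadyExists(Exception):
    pass

class FakeS3Client:
    """Stand-in for a real S3 client talking to radosgw."""
    def __init__(self):
        self._buckets = set()
    def create_bucket(self, name):
        if name in self._buckets:
            raise BucketAlreadyExists(name)  # the 409 Conflict case
        self._buckets.add(name)

def ensure_bucket(client, name):
    """Create the bucket if needed, ignoring the 'already exists' conflict."""
    try:
        client.create_bucket(name)
        return "created"
    except BucketAlreadyExists:
        return "exists"

client = FakeS3Client()
print(ensure_bucket(client, "xs02"))  # created
print(ensure_bucket(client, "xs02"))  # exists
```

With a real library the except clause would catch that library's conflict error instead; the point is only that the retry becomes idempotent.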
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 11/03/2015, at 08.19, Steffen W Sørensen ste...@me.com wrote: On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What kind of application is that? Commercial Email platform from Openwave.com Maybe it could be worked around using an apache rewrite rule. In any case, I opened issue #11091. Okay, how, by rewriting the response? Thanks, where can tickets be followed/viewed? Ah here: http://tracker.ceph.com/projects/rgw/issues Not at the moment. There's already issue #6961, I bumped its priority higher, and we'll take a look at it. Please also backport to Giant if possible :) /Steffen
[ceph-users] client crashed when osd gets restarted - hammer 0.93
Hi, I am trying hammer 0.93 on Ubuntu 14.04. rbd is mapped in the client, which is also Ubuntu 14.04. When I did a stop ceph-osd-all and then a start, the client machine crashed and the attached pic was in the console. Not sure if it's related to ceph. Thanks
Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93
Thanks, I will follow this workaround. On Thu, Mar 12, 2015 at 12:18 AM, Somnath Roy somnath@sandisk.com wrote: Kevin, This is a known issue and should be fixed in the latest krbd. The problem is, it is not backported to 14.04 krbd yet. You need to build it from latest krbd source if you want to stick with 14.04. The workaround is, you need to unmap your clients before restarting osds. Thanks Regards Somnath *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf Of *kevin parrikar *Sent:* Wednesday, March 11, 2015 11:44 AM *To:* ceph-users@lists.ceph.com *Subject:* [ceph-users] client crashed when osd gets restarted - hammer 0.93 Hi, I am trying hammer 0.93 on Ubuntu 14.04. rbd is mapped in client ,which is also ubuntu 14.04 . When i did a stop ceph-osd-all and then a start,client machine crashed and attached pic was in the console.Not sure if its related to ceph. Thanks
Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs
For each of those pgs, you'll need to identify the pg copy you want to be the winner and either 1) Remove all of the other ones using ceph-objectstore-tool and hopefully the winner you left alone will allow the pg to recover and go active. 2) Export the winner using ceph-objectstore-tool, use ceph-objectstore-tool to delete *all* copies of the pg, use force_create_pg to recreate the pg empty, use ceph-objectstore-tool to do a rados import on the exported pg copy. Also, the pgs which are still down still have replicas which need to be brought back or marked lost. -Sam On 03/11/2015 07:29 AM, joel.merr...@gmail.com wrote: I'd like to not have to null them if possible, there's nothing outlandishly valuable, its more the time to reprovision (users have stuff on there, mainly testing but I have a nasty feeling some users won't have backed up their test instances). When you say complicated and fragile, could you expand? Thanks again! Joel On Wed, Mar 11, 2015 at 1:21 PM, Samuel Just sj...@redhat.com wrote: Ok, you lost all copies from an interval where the pgs went active. The recovery from this is going to be complicated and fragile. Are the pools valuable? -Sam On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote: For clarity too, I've tried to drop the min_size before as suggested, doesn't make a difference unfortunately On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com joel.merr...@gmail.com wrote: Sure thing, n.b. I increased pg count to see if it would help. Alas not. :) Thanks again! 
health_detail https://gist.github.com/199bab6d3a9fe30fbcae osd_dump https://gist.github.com/499178c542fa08cc33bb osd_tree https://gist.github.com/02b62b2501cbd684f9b2 Random selected queries: queries/0.19.query https://gist.github.com/f45fea7c85d6e665edf8 queries/1.a1.query https://gist.github.com/dd68fbd5e862f94eb3be queries/7.100.query https://gist.github.com/d4fd1fb030c6f2b5e678 queries/7.467.query https://gist.github.com/05dbcdc9ee089bd52d0c On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just sj...@redhat.com wrote: Yeah, get a ceph pg query on one of the stuck ones. -Sam On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote: Stuck unclean and stuck inactive. I can fire up a full query and health dump somewhere useful if you want (full pg query info on ones listed in health detail, tree, osd dump etc). There were blocked_by operations that no longer exist after doing the OSD addition. Side note, spent some time yesterday writing some bash to do this programatically (might be useful to others, will throw on github) On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just sj...@redhat.com wrote: What do you mean by unblocked but still stuck? -Sam On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote: On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote: You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam Thanks Sam, I've readded the OSDs, they became unblocked but there are still the same number of pgs stuck. I looked at them in some more detail and it seems they all have num_bytes='0'. Tried a repair too, for good measure. Still nothing I'm afraid. Does this mean some underlying catastrophe has happened and they are never going to recover? Following on, would that cause data loss. 
There are no missing objects and I'm hoping there's appropriate checksumming / replicas to balance that out, but now I'm not so sure. Thanks again, Joel -- $ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge'
Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out
Hi, I was always in the same situation: I couldn't remove an OSD without having some PGs definitely stuck in the active+remapped state. But I remembered I read on IRC that, before marking an OSD out, it can sometimes be a good idea to reweight it to 0. So, instead of doing [1]:

    ceph osd out 3

I tried [2]:

    ceph osd crush reweight osd.3 0
    # waiting for the rebalancing...
    ceph osd out 3

and it worked. Then I could remove my osd with the online documentation: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual Now the osd is removed and my cluster is HEALTH_OK. \o/ Now, my question is: why was my cluster definitely stuck in active+remapped with [1] but not with [2]? Personally, I have absolutely no explanation. If you have one, I'd love to know it. Should the reweight command be present in the online documentation? http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual If yes, I can make a pull request on the doc with pleasure. ;) Regards. -- François Lafont
Re: [ceph-users] ceph-osd pegging CPU on giant, no snapshots involved this time
On Wed, Feb 18, 2015 at 9:19 PM, Florian Haas wrote: Hey everyone, I must confess I'm still not fully understanding this problem and don't exactly know where to start digging deeper, but perhaps other users have seen this and/or it rings a bell. System info: Ceph giant on CentOS 7; approx. 240 OSDs, 6 pools using 2 different rulesets where the problem applies to hosts and PGs using a bog-standard default crushmap. Symptom: out of the blue, ceph-osd processes on a single OSD node start going to 100% CPU utilization. The problems turns so bad that the machine is effectively becoming CPU bound and can't cope with any client requests anymore. Stopping and restarting all OSDs brings the problem right back, as does rebooting the machine — right after ceph-osd processes start, CPU utilization shoots up again. Stopping and marking out several OSDs on the machine makes the problem go away but obviously causes massive backfilling. All the logs show while CPU utilization is implausibly high are slow requests (which would be expected in a system that can barely do anything). Now I've seen issues like this before on dumpling and firefly, but besides the fact that they have all been addressed and should now be fixed, they always involved the prior mass removal of RBD snapshots. This system only used a handful of snapshots in testing, and is presently not using any snapshots at all. I'll be spending some time looking for clues in the log files of the OSDs that were shut down which caused the problem to go away, but if this sounds familiar to anyone willing to offer clues, I'd be more than interested. :) Thanks! Cheers, Florian Dan vd Ster was kind enough to pitch in an incredibly helpful off-list reply, which I am taking the liberty to paraphrase here: That mysterious OSD madness seems to be caused by NUMA zone reclaim, which is enabled by default on Intel machines with recent kernels. 
It can be disabled as follows:

    echo 0 > /proc/sys/vm/zone_reclaim_mode

or of course,

    sysctl -w vm.zone_reclaim_mode=0

or the corresponding sysctl.conf entry. On the machines affected, that seems to have removed the CPU pegging issue; at least it has not reappeared for several days now. Dan and Sage have discussed the issue recently in this thread: http://www.spinics.net/lists/ceph-users/msg14914.html Thanks a million to Dan. I'm looking into the original issue Florian describes above. It seems that unsetting zone_reclaim_mode wasn't the magical fix we hoped. After a couple of weeks, we're seeing pegged CPUs again, but this time we managed to get a perf top snapshot of it happening. These are the topmost (ahem) lines:

    8.33% [kernel]              [k] _raw_spin_lock
    3.14% perf                  [.] 0x000da124
    2.58% [unknown]             [.] 0x7f8a2901042d
    1.85% libpython2.7.so.1.0   [.] 0x0006dac2
    1.61% libc-2.17.so          [.] __memcpy_ssse3_back
    1.54% perf                  [.] dso__find_symbol
    1.44% libc-2.17.so          [.] __strcmp_sse42
    1.41% libpython2.7.so.1.0   [.] PyEval_EvalFrameEx
    1.25% [kernel]              [k] native_write_msr_safe
    1.24% perf                  [.] hists__output_resort
    1.11% libleveldb.so.1.0.7   [.] 0x0003cde8
    0.86% perf                  [.] perf_evsel__parse_sample
    0.81% libtcmalloc.so.4.1.2  [.] operator new(unsigned long)
    0.76% libpython2.7.so.1.0   [.] PyEval_EvalFrameEx
    0.73% [kernel]              [k] apic_timer_interrupt
    0.71% [kernel]              [k] page_fault
    0.71% [kernel]              [k] _raw_spin_lock_irqsave
    0.62% libpthread-2.17.so    [.] pthread_mutex_unlock
    0.62% libc-2.17.so          [.] __memcmp_sse4_1
    0.61% libc-2.17.so          [.] _int_malloc
    0.60% perf                  [.] rb_next
    0.58% [kernel]              [k] clear_page_c_e
    0.56% [kernel]              [k] tg_load_down

The server in question was booted without any OSDs. A few were started after invoking 'perf top', during which run the CPUs were saturated. Any ideas? Cheers! Adolfo
Re: [ceph-users] Add monitor unsuccesful
On 12/03/2015, at 00.55, Jesus Chavez (jeschave) jesch...@cisco.com wrote: can anybody tell me a good blog link that explain how to add monitor? I have tried manually and also with ceph-deploy without success =( Dunno if these might help U: http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start /Steffen
[ceph-users] Add monitor unsuccesful
can anybody tell me a good blog link that explain how to add monitor? I have tried manually and also with ceph-deploy without success =( Help Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Cisco.com http://www.cisco.com/ Think before you print. This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. Please click here http://www.cisco.com/web/about/doing_business/legal/cri/index.html for Company Registration Information.
[ceph-users] hang osd --zap-disk
I don't know what is going on =( the system hangs with the message below after the command "ceph-deploy osd --zap-disk create tauro:sdb"

[tauro][WARNING] No data was received after 300 seconds, disconnecting...
[ceph_deploy.osd][DEBUG ] Host tauro is now ready for osd use.
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.22): /usr/bin/ceph-deploy osd activate tauro:sdb1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks tauro:/dev/sdb1:
[tauro][DEBUG ] connection detected need for sudo
[tauro][DEBUG ] connected to host: tauro
[tauro][DEBUG ] detect platform information from remote host
[tauro][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[ceph_deploy.osd][DEBUG ] activating host tauro disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[tauro][INFO ] Running command: sudo ceph-disk -v activate --mark-init sysvinit --mount /dev/sdb1
[tauro][WARNING] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sdb1
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[tauro][WARNING] DEBUG:ceph-disk:Mounting /dev/sdb1 on /var/lib/ceph/tmp/mnt.lNpFro with options noatime,inode64
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.lNpFro
[tauro][WARNING] DEBUG:ceph-disk:Cluster uuid is fc72a252-15be-40e9-9de1-34593be5668a
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[tauro][WARNING] DEBUG:ceph-disk:Cluster name is ceph
[tauro][WARNING] DEBUG:ceph-disk:OSD uuid is bf192166-86e9-4c68-9bff-7ced1c9ba8ee
[tauro][WARNING] DEBUG:ceph-disk:Allocating OSD id...
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise bf192166-86e9-4c68-9bff-7ced1c9ba8ee
[tauro][WARNING] 2015-03-11 17:49:31.782184 7f9cf05a8700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9cec0253f0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cec025680).fault
[tauro][WARNING] 2015-03-11 17:49:35.782524 7f9cf04a7700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9cec00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:37.781846 7f9cf05a8700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9ce00030e0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9ce0003370).fault
[tauro][WARNING] 2015-03-11 17:49:41.782566 7f9cf04a7700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9cec00 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:43.782303 7f9cf05a8700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9ce00031b0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9ce00025d0).fault
[tauro][WARNING] 2015-03-11 17:49:47.784627 7f9cf04a7700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9cec00 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:49.782712 7f9cf05a8700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9ce00031b0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9ce0002c60).fault
[tauro][WARNING] 2015-03-11 17:49:53.784690 7f9cf04a7700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9ce0003fb0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9ce0004240).fault
[tauro][WARNING] 2015-03-11 17:49:55.783248 7f9cf05a8700 0 -- :/1015927 >> 192.168.4.35:6789/0 pipe(0x7f9ce0004930 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9ce0004bc0)
Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out
On 11/03/2015 05:44, Francois Lafont wrote: PS: here is my conf. [...] I have this too:

    ~# ceph osd crush show-tunables
    { "choose_local_tries": 0,
      "choose_local_fallback_tries": 0,
      "choose_total_tries": 50,
      "chooseleaf_descend_once": 1,
      "chooseleaf_vary_r": 0,
      "straw_calc_version": 1,
      "profile": "unknown",
      "optimal_tunables": 0,
      "legacy_tunables": 0,
      "require_feature_tunables": 1,
      "require_feature_tunables2": 1,
      "require_feature_tunables3": 0,
      "has_v2_rules": 0,
      "has_v3_rules": 0}

And in the online documentation, I can read this: http://ceph.com/docs/master/rados/operations/crush-map/#crush-tunables3 Legacy default is 0, but with this value CRUSH is sometimes unable to find a mapping. Is this my problem? Should I do this in my cluster?

    ceph osd crush set-tunable chooseleaf_vary_r 1

But here http://ceph.com/docs/master/rados/operations/crush-map/#which-client-versions-support-crush-tunables3, I can read: Linux kernel version v3.15 or later (for the file system and RBD kernel clients) and that could be a problem for me because I have clients with kernel version 3.13 (Ubuntu 14.04). -- François Lafont
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 11/03/2015, at 08.19, Steffen W Sørensen ste...@me.com wrote: On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What kind of application is that? Commercial Email platform from Openwave.com Maybe it could be worked around using an apache rewrite rule. In any case, I opened issue #11091. Okay, how, by rewriting the response? Thanks, where can tickets be followed/viewed? Asked my vendor what confuses their App about the reply. Would be nice if they could work against Ceph S3 :) 2. at every create bucket OP the GW creates what look like new containers for ACLs in the .rgw pool; is this normal, or how can I avoid such multiple objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosgw? That's a bug. Ok, any resolution/work-around to this? Not at the moment. There's already issue #6961, I bumped its priority higher, and we'll take a look at it. Thanks! BTW running Giant:

    [root@rgw ~]# rpm -qa | grep -i ceph
    httpd-tools-2.2.22-1.ceph.el6.x86_64
    ceph-common-0.87.1-0.el6.x86_64
    mod_fastcgi-2.4.7-1.ceph.el6.x86_64
    libcephfs1-0.87.1-0.el6.x86_64
    xfsprogs-3.1.1-14_ceph.el6.x86_64
    ceph-radosgw-0.87.1-0.el6.x86_64
    httpd-2.2.22-1.ceph.el6.x86_64
    python-ceph-0.87.1-0.el6.x86_64
    ceph-0.87.1-0.el6.x86_64
    [root@rgw ~]# uname -a
    Linux rgw.sprawl.dk 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    [root@rgw ~]# cat /etc/redhat-release
    CentOS release 6.6 (Final)
Re: [ceph-users] CephFS: stripe_unit=65536 + object_size=1310720 = pipe.fault, server, going to standby
On Wed, Mar 11, 2015 at 1:21 PM, LOPEZ Jean-Charles jelo...@redhat.com wrote: Hi Florent What are the « rules » for stripe_unit and object_size? - stripe_unit * stripe_count = object_size So in your case set stripe_unit = 2 JC On 11 Mar 2015, at 19:59, Florent B flor...@coppint.com wrote: Hi all, I'm testing CephFS with Giant and I have a problem when I set these attrs:

    setfattr -n ceph.dir.layout.stripe_unit -v 65536 pool_cephfs01/
    setfattr -n ceph.dir.layout.stripe_count -v 1 pool_cephfs01/
    setfattr -n ceph.dir.layout.object_size -v 1310720 pool_cephfs01/
    setfattr -n ceph.dir.layout.pool -v cephfs01 pool_cephfs01/

When a client writes files in pool_cephfs01/, it fails with: Transport endpoint is not connected (107) and these errors on the MDS: 10.111.0.6:6801/41706 >> 10.111.17.118:0/9384 pipe(0x5e3a580 sd=27 :6801 s=2 pgs=2 cs=1 l=0 c=0x6a8d1e0).fault, server, going to standby When I set stripe_unit=1048576 and object_size=1048576, it seems to work. What are the rules for stripe_unit and object_size? "stripe_unit * stripe_count = object_size" is definitely not correct. The current rules are:

- object_size is a multiple of stripe_unit
- stripe_unit (and consequently object_size) is 64k-aligned
- stripe_count is at least 1 (i.e. at least 1 object in an object set)

However, the above layout is pretty bogus - there is basically no striping going on, so it's probably a bug in the way it's handled. Thanks, Ilya
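Ilya's three rules above are easy to encode mechanically. A minimal sketch (my own validator, not the kernel's actual check, which may reject additional cases):

```python
# Sketch encoding the layout rules from Ilya's reply:
#   - object_size is a multiple of stripe_unit
#   - stripe_unit (and consequently object_size) is 64k-aligned
#   - stripe_count is at least 1
# This is an illustrative validator, not the real client-side check.

STRIPE_ALIGN = 65536  # 64k

def layout_is_valid(stripe_unit, stripe_count, object_size):
    return (stripe_count >= 1
            and stripe_unit % STRIPE_ALIGN == 0
            and object_size % stripe_unit == 0)

# Florent's failing layout: 1310720 = 20 * 65536, so by these rules it
# *is* valid -- which is why Ilya suspects a bug in how this particular
# (non-striping) layout is handled rather than an invalid layout.
print(layout_is_valid(65536, 1, 1310720))    # True
print(layout_is_valid(1048576, 1, 1048576))  # True (the working layout)
print(layout_is_valid(65536, 1, 100000))     # False: not a multiple of the unit
```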
Re: [ceph-users] Firefly Tiering
Hi Stefan, If the majority of your hot data fits on the cache tier you will see quite a marked improvement in read performance and similar write performance (assuming you would have had your HDDs backed by SSD journals). However, for data that is not in the cache tier you will get 10-20% less read performance and anything up to 10x less write performance. This is because a cache write miss has to read the entire object from the backing store into the cache and then modify it. The read performance degradation will probably be fixed in Hammer with proxy reads, but writes will most likely still be an issue. Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stefan Priebe - Profihost AG Sent: 11 March 2015 07:27 To: ceph-users@lists.ceph.com Subject: [ceph-users] Firefly Tiering Hi, has anybody successfully tested tiering while using firefly? How much does it impact performance vs. a normal pool? I mean is there any difference between a full SSD pool and a tiering SSD pool with SATA backend? Greets, Stefan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS: stripe_unit=65536 + object_size=1310720 = pipe.fault, server, going to standby
Hi Florent What are the « rules » for stripe_unit object_size ? - stripe_unit * stripe_count = object_size So in your case set stripe_unit = 2 JC On 11 Mar 2015, at 19:59, Florent B flor...@coppint.com wrote: Hi all, I'm testing CephFS with Giant and I have a problem when I set these attrs : setfattr -n ceph.dir.layout.stripe_unit -v 65536 pool_cephfs01/ setfattr -n ceph.dir.layout.stripe_count -v 1 pool_cephfs01/ setfattr -n ceph.dir.layout.object_size -v 1310720 pool_cephfs01/ setfattr -n ceph.dir.layout.pool -v cephfs01 pool_cephfs01/ When a client writes files in pool_cephfs01/, It got failed: Transport endpoint is not connected (107) and these errors on MDS : 10.111.0.6:6801/41706 10.111.17.118:0/9384 pipe(0x5e3a580 sd=27 :6801 s=2 pgs=2 cs=1 l=0 c=0x6a8d1e0).fault, server, going to standby When I set stripe_unit=1048576 object_size=1048576, it seems working. What are the rules for stripe_unit object_size ? Thank you. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs
For clarity too, I've tried to drop the min_size before as suggested, doesn't make a difference unfortunately On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com joel.merr...@gmail.com wrote: Sure thing, n.b. I increased pg count to see if it would help. Alas not. :) Thanks again! health_detail https://gist.github.com/199bab6d3a9fe30fbcae osd_dump https://gist.github.com/499178c542fa08cc33bb osd_tree https://gist.github.com/02b62b2501cbd684f9b2 Random selected queries: queries/0.19.query https://gist.github.com/f45fea7c85d6e665edf8 queries/1.a1.query https://gist.github.com/dd68fbd5e862f94eb3be queries/7.100.query https://gist.github.com/d4fd1fb030c6f2b5e678 queries/7.467.query https://gist.github.com/05dbcdc9ee089bd52d0c On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just sj...@redhat.com wrote: Yeah, get a ceph pg query on one of the stuck ones. -Sam On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote: Stuck unclean and stuck inactive. I can fire up a full query and health dump somewhere useful if you want (full pg query info on ones listed in health detail, tree, osd dump etc). There were blocked_by operations that no longer exist after doing the OSD addition. Side note, spent some time yesterday writing some bash to do this programmatically (might be useful to others, will throw on github) On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just sj...@redhat.com wrote: What do you mean by unblocked but still stuck? -Sam On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote: On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote: You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam Thanks Sam, I've re-added the OSDs, they became unblocked but there are still the same number of pgs stuck. I looked at them in some more detail and it seems they all have num_bytes='0'. 
Tried a repair too, for good measure. Still nothing I'm afraid. Does this mean some underlying catastrophe has happened and they are never going to recover? Following on, would that cause data loss. There are no missing objects and I'm hoping there's appropriate checksumming / replicas to balance that out, but now I'm not so sure. Thanks again, Joel -- $ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge' -- $ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge' ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
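The "do this programmatically" idea Joel mentions is simple to sketch: pull the stuck PG ids out of `ceph health detail` output and emit the per-PG query commands. A hypothetical Python version -- the sample health text below is made up; in practice you would feed in the real command output:

```python
import re

# Made-up sample of `ceph health detail` output, for illustration.
health_detail = """\
pg 0.19 is stuck inactive for 3600.0, current state peering
pg 1.a1 is stuck unclean for 7200.0, current state active+degraded
pg 7.100 is stuck inactive for 1800.0, current state peering
"""

# Extract the PG ids of all stuck PGs and print a query command for each.
stuck = re.findall(r"^pg (\S+) is stuck", health_detail, re.MULTILINE)
for pgid in stuck:
    print(f"ceph pg {pgid} query > queries/{pgid}.query")
```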
Re: [ceph-users] Firefly Tiering
On 11.03.2015 at 11:17, Nick Fisk wrote: Hi Nick, On 11.03.2015 at 10:52, Nick Fisk wrote: Hi Stefan, If the majority of your hot data fits on the cache tier you will see quite a marked improvement in read performance I hardly have any reads ;-) just around 5%. 95% are writes. and similar write performance (assuming you would have had your HDDs backed by SSD journals). similar write performance of SSD cache tier or HDD backend tier? I'm mainly interested in a writeback mode. Writes on cache tiering are the same speed as a non cache tiering solution (with SSD journals), if the blocks are in the cache. However, for data that is not in the cache tier you will get 10-20% less read performance and anything up to 10x less write performance. This is because a cache write miss has to read the entire object from the backing store into the cache and then modify it. The read performance degradation will probably be fixed in Hammer with proxy reads, but writes will most likely still be an issue. Why is writing to the HOT part so slow? If the object is in the cache tier or currently doesn't exist, then writes are fast as it just has to write directly to the cache tier SSDs. However, if the object is in the slow tier and you write to it, then it's very slow. This is because it has to read it off the slow tier (~12ms), write it on to the cache tier (~.5ms) and then update it (~.5ms). Mhm, sounds correct. So it's better to stick with journals instead of using a cache tier. That's purely down to your workload, but in general if you are doing lots of writes, a cache tier will probably slow you down at the moment. 
Stefan With a non caching solution, you would have just written straight to the journal (~.5ms) Stefan Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stefan Priebe - Profihost AG Sent: 11 March 2015 07:27 To: ceph-users@lists.ceph.com Subject: [ceph-users] Firefly Tiering Hi, has anybody successfully tested tiering while using firefly? How much does it impact performance vs. a normal pool? I mean is there any difference between a full SSD pool and a tiering SSD pool with SATA backend? Greets, Stefan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
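Nick's rough latencies make the write-miss penalty easy to quantify. A back-of-envelope model using his approximate numbers (~12 ms to read an object off the slow tier, ~0.5 ms per SSD write) -- purely illustrative, not a benchmark:

```python
# Model the two write paths Nick describes for a writeback cache tier.
HDD_READ_MS = 12.0   # promote: read the whole object off the slow tier
SSD_WRITE_MS = 0.5   # write to a cache-tier SSD (or an SSD journal)

def write_latency_ms(object_in_cache):
    if object_in_cache:
        return SSD_WRITE_MS                        # cache hit: one SSD write
    # cache write miss: read from slow tier, write into cache, then update
    return HDD_READ_MS + SSD_WRITE_MS + SSD_WRITE_MS

print(write_latency_ms(True))    # 0.5  -- same as a journal-backed write
print(write_latency_ms(False))   # 13.0 -- the write-miss promotion penalty
```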
Re: [ceph-users] ceph days
Check out the Ceph YouTube page. - Karan - On 11 Mar 2015, at 00:45, Tom Deneau tom.den...@amd.com wrote: Are the slides or videos from Ceph Days presentations made available somewhere? I noticed some links for the Frankfurt Ceph Day, but not for the other Ceph Days. -- Tom ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com smime.p7s Description: S/MIME cryptographic signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Firefly Tiering
Hi Nick, On 11.03.2015 at 10:52, Nick Fisk wrote: Hi Stefan, If the majority of your hot data fits on the cache tier you will see quite a marked improvement in read performance I hardly have any reads ;-) just around 5%. 95% are writes. and similar write performance (assuming you would have had your HDDs backed by SSD journals). similar write performance of SSD cache tier or HDD backend tier? I'm mainly interested in a writeback mode. However, for data that is not in the cache tier you will get 10-20% less read performance and anything up to 10x less write performance. This is because a cache write miss has to read the entire object from the backing store into the cache and then modify it. The read performance degradation will probably be fixed in Hammer with proxy reads, but writes will most likely still be an issue. Why is writing to the HOT part so slow? Stefan Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stefan Priebe - Profihost AG Sent: 11 March 2015 07:27 To: ceph-users@lists.ceph.com Subject: [ceph-users] Firefly Tiering Hi, has anybody successfully tested tiering while using firefly? How much does it impact performance vs. a normal pool? I mean is there any difference between a full SSD pool and a tiering SSD pool with SATA backend? Greets, Stefan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.9 Firefly released
Where can I find the debian trusty source package for v0.80.9? Cheers, Valery On 10/03/15 20:34 , Sage Weil wrote: This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs. We recommend that all Firefly users upgrade. For more detailed information, see http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior. In order to set the new tunable to correct the behavior:: ceph osd crush set-tunable straw_calc_version 1 Note that this change will have no immediate effect. However, from this point forward, any 'straw' bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing. You can estimate how much rebalancing will eventually be necessary on your cluster with::

  ceph osd getcrushmap -o /tmp/cm
  crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
  crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
  crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
  crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
  wc -l /tmp/a                        # num total mappings
  diff -u /tmp/a /tmp/b | grep -c ^+  # num changed mappings

Divide the number of changed mappings by the total number of mappings. We've found that most clusters are under 10%. 
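The arithmetic at the end of that procedure is just the fraction of PG mappings that differ between the two crushtool runs. A toy illustration in Python -- the mapping tables here are invented; in practice each one comes from a `--show-mappings` output file:

```python
# PG id -> list of OSDs, before and after straw_calc_version=1 (made up).
before = {0: [1, 2, 3], 1: [2, 3, 4], 2: [3, 4, 5], 3: [4, 5, 6]}
after  = {0: [1, 2, 3], 1: [2, 3, 5], 2: [3, 4, 5], 3: [4, 5, 6]}

# Count PGs whose mapping changed, then express it as a fraction of all PGs.
changed = sum(1 for pg in before if before[pg] != after[pg])
fraction = changed / len(before)
print(f"{changed} of {len(before)} mappings changed ({fraction:.0%})")
# 1 of 4 mappings changed (25%)
```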
You can force all of this rebalancing to happen at once with:: ceph osd crush reweight-all Otherwise, it will happen at some unknown point in the future when CRUSH weights are next adjusted. Notable Changes --- * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum) * crush: fix straw bucket weight calculation, add straw_calc_version tunable (#10095 Sage Weil) * crush: fix tree bucket (Rongzu Zhu) * crush: fix underflow of tree weights (Loic Dachary, Sage Weil) * crushtool: add --reweight (Sage Weil) * librbd: complete pending operations before losing image (#10299 Jason Dillaman) * librbd: fix read caching performance regression (#9854 Jason Dillaman) * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman) * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil) * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai) * osd: handle no-op write with snapshot (#10262 Sage Weil) * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh) * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis, Yehuda Sadeh) * rgw: don't overwrite bucket/object owner when setting ACLs (#10978 Yehuda Sadeh) * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh) * rgw: fix partial swift GET (#10553 Yehuda Sadeh) * rgw: fix quota disable (#9907 Dong Lei) * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh) * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh) * rgw: pass civetweb configurables (#10907 Yehuda Sadeh) * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda Sadeh) * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh) * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh) * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh) * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh) * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil) * rgw: update swift subuser permission masks when authenticating (#9918 Yehuda 
Sadeh) * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis, Yehuda Sadeh) * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh) * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh) Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- SWITCH -- Valery Tschopp, Software Engineer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland email:
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Thanks Sage, I will create a “new feature” request on tracker.ceph.com so that this discussion does not get buried in the mailing list. Developers can implement this at their convenience. Karan Singh Systems Specialist , Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ On 10 Mar 2015, at 14:26, Sage Weil s...@newdream.net wrote: On Tue, 10 Mar 2015, Christian Eichelmann wrote: Hi Sage, we hit this problem a few months ago as well and it took us quite a while to figure out what's wrong. As a system administrator I don't like the idea that daemons or even init scripts change system-wide configuration parameters, so I wouldn't like to see the OSDs do it themselves. This is my general feeling as well. As we move to systemd, I'd like to have the ceph unit file get away from this entirely and have the admin set these values in /etc/security/limits.conf or /etc/sysctl.d. The main thing making this problematic right now is that the daemons run as root instead of a 'ceph' user. The warning is on one hand a good hint; on the other hand it may also confuse people, since changing this setting is not required for common hardware. If we make it warn only if it reaches 50% of the threshold that is probably safe... sage Regards, Christian On 03/09/2015 08:01 PM, Sage Weil wrote: On Mon, 9 Mar 2015, Karan Singh wrote: Thanks Guys kernel.pid_max=4194303 did the trick. Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow. Is there somewhere in the docs we can put this to catch more users? Or maybe a warning issued by the osds themselves or something if they see limits that are low? 
sage - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you actually describe in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has in an idle situation about 66,000 threads (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think has triggered your problem. Set the kernel.pid_max setting to 4194303 (the maximum) like Azad Aliyar suggested, and the problem should be gone. Regards, Christian On 09.03.2015 11:41, Karan Singh wrote: Hello Community, I need help fixing a long-running Ceph problem. The cluster is unhealthy, multiple OSDs are DOWN. When I am trying to restart OSDs I am getting this error: 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970 common/Thread.cc: 129: FAILED assert(ret == 0) Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5, 3.17.2-1.el6.elrepo.x86_64 Tried upgrading from 0.80.7 to 0.80.8 but no luck. Tried CentOS stock kernel 2.6.32 but no luck. Memory is not a problem, more than 150+GB is free. Has anyone ever faced this problem?? 
Cluster status:

  cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
  health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
  monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
  osdmap e26633: 239 osds: 85 up, 196 in
  pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
  4699 GB used, 707 TB / 711 TB avail
  6061/31080 objects degraded (19.501%)
  14 down+remapped+peering
  39 active
  3289 active+clean
  547 peering
  663 stale+down+peering
  705 stale+active+remapped
  1 active+degraded+remapped
  1
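Christian's numbers make the failure mode concrete: with roughly 66,000 threads idle on a 60-OSD host, a pid_max of 65536 is exhausted before the cluster even gets busy. Illustrative arithmetic only, using the figures from his report:

```python
# Why pid_max=65536 is too low for dense OSD hosts.
idle_threads_per_osd = 66000 // 60   # ~1100 threads per OSD daemon at idle
osds_per_host = 60
needed = osds_per_host * idle_threads_per_osd

print(needed)               # 66000
print(needed <= 65536)      # False: the 65536 limit is already exceeded at idle
print(needed <= 4194303)    # True: the suggested maximum leaves ample headroom
```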
Re: [ceph-users] Firefly Tiering
On 11.03.2015 at 11:17, Nick Fisk wrote: Hi Nick, On 11.03.2015 at 10:52, Nick Fisk wrote: Hi Stefan, If the majority of your hot data fits on the cache tier you will see quite a marked improvement in read performance I hardly have any reads ;-) just around 5%. 95% are writes. and similar write performance (assuming you would have had your HDDs backed by SSD journals). similar write performance of SSD cache tier or HDD backend tier? I'm mainly interested in a writeback mode. Writes on cache tiering are the same speed as a non cache tiering solution (with SSD journals), if the blocks are in the cache. However, for data that is not in the cache tier you will get 10-20% less read performance and anything up to 10x less write performance. This is because a cache write miss has to read the entire object from the backing store into the cache and then modify it. The read performance degradation will probably be fixed in Hammer with proxy reads, but writes will most likely still be an issue. Why is writing to the HOT part so slow? If the object is in the cache tier or currently doesn't exist, then writes are fast as it just has to write directly to the cache tier SSDs. However, if the object is in the slow tier and you write to it, then it's very slow. This is because it has to read it off the slow tier (~12ms), write it on to the cache tier (~.5ms) and then update it (~.5ms). Mhm, sounds correct. So it's better to stick with journals instead of using a cache tier. Stefan With a non caching solution, you would have just written straight to the journal (~.5ms) Stefan Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Stefan Priebe - Profihost AG Sent: 11 March 2015 07:27 To: ceph-users@lists.ceph.com Subject: [ceph-users] Firefly Tiering Hi, has anybody successfully tested tiering while using firefly? How much does it impact performance vs. a normal pool? 
I mean is there any difference between a full SSD pool and a tiering SSD pool with SATA backend? Greets, Stefan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What kind of application is that? Commercial Email platform from Openwave.com Maybe it could be worked around using an apache rewrite rule. In any case, I opened issue #11091. Okay, how, by rewriting the response? Thanks, where can tickets be followed/viewed? Asked my vendor what confuses their App about the reply. Would be nice if they could work against Ceph S3 :) 2. at every create bucket OP the GW creates what looks like new containers for ACLs in the .rgw pool, is this normal or how to avoid such multiple objects cluttering the GW pools? Is there something wrong since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW? That's a bug. Ok, any resolution/work-around to this? Not at the moment. There's already issue #6961, I bumped its priority higher, and we'll take a look at it. Thanks! /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.9 Firefly released
Hi Sage, On Tue, Mar 10, 2015 at 8:34 PM, Sage Weil sw...@redhat.com wrote: Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior. In order to set the new tunable to correct the behavior:: ceph osd crush set-tunable straw_calc_version 1 Since it's not obvious in this case, does setting straw_calc_version = 1 still allow older firefly clients to connect? Cheers, Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.9 Firefly released
Hi Valery, They should be here http://ceph.com/debian-testing/ Cheers On 11/03/2015 10:07, Valery Tschopp wrote: Where can I find the debian trusty source package for v0.80.9? Cheers, Valery On 10/03/15 20:34 , Sage Weil wrote: This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs. We recommend that all Firefly users upgrade. For more detailed information, see http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior. In order to set the new tunable to correct the behavior:: ceph osd crush set-tunable straw_calc_version 1 Note that this change will have no immediate effect. However, from this point forward, any 'straw' bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing. 
You can estimate how much rebalancing will eventually be necessary on your cluster with::

  ceph osd getcrushmap -o /tmp/cm
  crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
  crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
  crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
  crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
  wc -l /tmp/a                        # num total mappings
  diff -u /tmp/a /tmp/b | grep -c ^+  # num changed mappings

Divide the number of changed mappings by the total number of mappings. We've found that most clusters are under 10%. You can force all of this rebalancing to happen at once with:: ceph osd crush reweight-all Otherwise, it will happen at some unknown point in the future when CRUSH weights are next adjusted. Notable Changes --- * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum) * crush: fix straw bucket weight calculation, add straw_calc_version tunable (#10095 Sage Weil) * crush: fix tree bucket (Rongzu Zhu) * crush: fix underflow of tree weights (Loic Dachary, Sage Weil) * crushtool: add --reweight (Sage Weil) * librbd: complete pending operations before losing image (#10299 Jason Dillaman) * librbd: fix read caching performance regression (#9854 Jason Dillaman) * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman) * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil) * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai) * osd: handle no-op write with snapshot (#10262 Sage Weil) * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh) * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis, Yehuda Sadeh) * rgw: don't overwrite bucket/object owner when setting ACLs (#10978 Yehuda Sadeh) * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh) * rgw: fix partial swift GET (#10553 Yehuda Sadeh) * rgw: fix quota disable (#9907 Dong Lei) * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh) * rgw: make setattrs 
update bucket index (#5595 Yehuda Sadeh) * rgw: pass civetweb configurables (#10907 Yehuda Sadeh) * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda Sadeh) * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh) * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh) * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh) * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh) * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil) * rgw: update swift subuser permission masks when authenticating (#9918 Yehuda Sadeh) * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis, Yehuda Sadeh) * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh) * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh) Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy ___ ceph-users mailing list ceph-users@lists.ceph.com
Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs
Ok, you lost all copies from an interval where the pgs went active. The recovery from this is going to be complicated and fragile. Are the pools valuable? -Sam On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote: For clarity too, I've tried to drop the min_size before as suggested, doesn't make a difference unfortunately On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com joel.merr...@gmail.com wrote: Sure thing, n.b. I increased pg count to see if it would help. Alas not. :) Thanks again! health_detail https://gist.github.com/199bab6d3a9fe30fbcae osd_dump https://gist.github.com/499178c542fa08cc33bb osd_tree https://gist.github.com/02b62b2501cbd684f9b2 Random selected queries: queries/0.19.query https://gist.github.com/f45fea7c85d6e665edf8 queries/1.a1.query https://gist.github.com/dd68fbd5e862f94eb3be queries/7.100.query https://gist.github.com/d4fd1fb030c6f2b5e678 queries/7.467.query https://gist.github.com/05dbcdc9ee089bd52d0c On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just sj...@redhat.com wrote: Yeah, get a ceph pg query on one of the stuck ones. -Sam On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote: Stuck unclean and stuck inactive. I can fire up a full query and health dump somewhere useful if you want (full pg query info on ones listed in health detail, tree, osd dump etc). There were blocked_by operations that no longer exist after doing the OSD addition. Side note, spent some time yesterday writing some bash to do this programatically (might be useful to others, will throw on github) On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just sj...@redhat.com wrote: What do you mean by unblocked but still stuck? -Sam On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote: On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote: You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. 
There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam Thanks Sam, I've readded the OSDs, they became unblocked but there are still the same number of pgs stuck. I looked at them in some more detail and it seems they all have num_bytes='0'. Tried a repair too, for good measure. Still nothing I'm afraid. Does this mean some underlying catastrophe has happened and they are never going to recover? Following on, would that cause data loss. There are no missing objects and I'm hoping there's appropriate checksumming / replicas to balance that out, but now I'm not so sure. Thanks again, Joel -- $ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge' ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.9 Firefly released
On Wed, 11 Mar 2015, Stefan Priebe - Profihost AG wrote: Hi Sage, On 11.03.2015 at 04:14, Sage Weil wrote: On Wed, 11 Mar 2015, Christian Balzer wrote: On Tue, 10 Mar 2015 12:34:14 -0700 (PDT) Sage Weil wrote: Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior. It would be nice to know at what version of Ceph those bugs were introduced. This bug has been present in CRUSH since the beginning. So people upgrading from dumpling have to do the same? 1.) They need to set tunables to optimal (to get firefly tunables) 2.) They have to set those options you mention? Nothing has to (or probably should be) done as part of the upgrade process itself. This tunable can be set without changing to firefly tunables. It affects the monitor-side generation of internal weight values only, and has no dependency or compatibility issue with clients or OSDs. And the bug only triggers when a weight is changed. sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.9 Firefly released
On Wed, 11 Mar 2015, Gabri Mate wrote: May I assume this fix will be in Hammer? So can I use this to fix my cluster after upgrading Giant to Hammer? Yes, the fix is also in Hammer, but the same procedure should be followed to opt-in to the new behavior. sage
Re: [ceph-users] v0.80.9 Firefly released
On Wed, 11 Mar 2015, Dan van der Ster wrote: Hi Sage, On Tue, Mar 10, 2015 at 8:34 PM, Sage Weil sw...@redhat.com wrote: Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior. In order to set the new tunable to correct the behavior:: ceph osd crush set-tunable straw_calc_version 1 Since it's not obvious in this case, does setting straw_calc_version = 1 still allow older firefly clients to connect? Correct. The bug only affects the generation of internal weight values that are stored in the crush map itself (crush_calc_straw()). Setting the tunable makes the *monitors* behave properly (if adjusting weights via the ceph cli) or *crushtool* calculate weights properly if you are compiling the crush map via 'crushtool -c ...'. There is no dependency or compatibility issue with clients, and no need to set tunables to 'firefly' to set straw_calc_version. sage
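To make Sage's point concrete, the opt-in can be done either live against the monitors or offline against a decompiled map. A sketch (output format of show-tunables varies by release; file paths are placeholders):

```shell
# Inspect the current CRUSH tunables; straw_calc_version 0 is the
# legacy/buggy straw weight derivation, 1 is the fixed one:
ceph osd crush show-tunables

# Opt in to the fixed straw weight calculation cluster-wide:
ceph osd crush set-tunable straw_calc_version 1

# Offline equivalent, mirroring the release-notes procedure, if you
# manage the CRUSH map by hand:
ceph osd getcrushmap -o /tmp/cm
crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
```

As Sage notes, this only changes how the monitors (or crushtool) compute the internal straw values on the next weight adjustment; no client or OSD needs to change.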
Re: [ceph-users] Duplication name Container
On 03/11/2015 03:23 PM, Jimmy Goffaux wrote: Hello All, I have been using Ceph in production for several months, but I am getting an error with the Ceph Rados Gateway for multiple users. I am faced with the following error: Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists Which corresponds to the documentation: http://ceph.com/docs/master/radosgw/s3/bucketops/ By what means can I avoid this kind of problem? You cannot. Bucket names are unique inside the RADOS Gateway, just as with Amazon S3. Here are my versions used: radosgw-agent = 1.2-1precise ceph = 0.87-1precise Thank you for your help -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on
[ceph-users] Adding Monitor Stuck
I am really stuck adding a second monitor =(, ceph-deploy mon create seems to finish with an error like "monitor may not be able to form quorum" and they are not defined in mon initial… I have found there is a way to get it to work, which is running: ceph mon add tauro 192.168.4.35:6789 but this is weird because it seems to be a command that you usually run after mkfs, something like this (ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}) :@ but that depends on a monmap and keyring, things that you cannot produce on the "new monitor" since it has nothing =( so even with the manual way, if you follow the steps you get lost because you don't really know which command is for which server. Also it says that you should start the new monitor while the "add" command is hunting a mon client, but that again depends on the monmap and keyring, things that you don't have on the new server… =( I'm going crazy, can anybody explain how this really works? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433
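For what it's worth, the pieces fit together like this: the monmap and the mon keyring for a new monitor are fetched *from the running cluster* via the admin client, not generated on the new host — which resolves the chicken-and-egg confusion above. A sketch of the manual sequence from the docs (`tauro` and the IP are taken from the mail; the /tmp paths are placeholders):

```shell
# Run on the new monitor host, with an admin keyring/ceph.conf that
# can reach the existing cluster.
ceph auth get mon. -o /tmp/mon.keyring   # mon keyring, from the cluster
ceph mon getmap -o /tmp/monmap           # current monitor map, from the cluster

# Build the new monitor's data directory from those two files:
ceph-mon -i tauro --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

# Register the new monitor in the monmap, then start the daemon:
ceph mon add tauro 192.168.4.35:6789
ceph-mon -i tauro --public-addr 192.168.4.35:6789
```

So `ceph mon add` comes after mkfs here, and mkfs itself needs nothing local — only the two files pulled from the cluster.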
Re: [ceph-users] v0.80.9 Firefly released
Hi, May I assume this fix will be in Hammer? So can I use this to fix my cluster after upgrading Giant to Hammer? Best regards, Mate On 12:34 Tue 10 Mar , Sage Weil wrote: This is a bugfix release for firefly. It fixes a performance regression in librbd, an important CRUSH misbehavior (see below), and several RGW bugs. We have also backported support for flock/fcntl locks to ceph-fuse and libcephfs. We recommend that all Firefly users upgrade. For more detailed information, see http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt Adjusting CRUSH maps * This point release fixes several issues with CRUSH that trigger excessive data migration when adjusting OSD weights. These are most obvious when a very small weight change (e.g., a change from 0 to .01) triggers a large amount of movement, but the same set of bugs can also lead to excessive (though less noticeable) movement in other cases. However, because the bug may already have affected your cluster, fixing it may trigger movement *back* to the more correct location. For this reason, you must manually opt-in to the fixed behavior. In order to set the new tunable to correct the behavior:: ceph osd crush set-tunable straw_calc_version 1 Note that this change will have no immediate effect. However, from this point forward, any 'straw' bucket in your CRUSH map that is adjusted will get non-buggy internal weights, and that transition may trigger some rebalancing. You can estimate how much rebalancing will eventually be necessary on your cluster with:: ceph osd getcrushmap -o /tmp/cm crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1 crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2 crushtool -i /tmp/cm2 --reweight -o /tmp/cm2 crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1 wc -l /tmp/a # num total mappings diff -u /tmp/a /tmp/b | grep -c ^+ # num changed mappings Divide the number of changed mappings by the total number of lines in /tmp/a to get the fraction that will move.
We've found that most clusters are under 10%. You can force all of this rebalancing to happen at once with:: ceph osd crush reweight-all Otherwise, it will happen at some unknown point in the future when CRUSH weights are next adjusted. Notable Changes --- * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum) * crush: fix straw bucket weight calculation, add straw_calc_version tunable (#10095 Sage Weil) * crush: fix tree bucket (Rongzu Zhu) * crush: fix underflow of tree weights (Loic Dachary, Sage Weil) * crushtool: add --reweight (Sage Weil) * librbd: complete pending operations before closing image (#10299 Jason Dillaman) * librbd: fix read caching performance regression (#9854 Jason Dillaman) * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman) * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil) * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai) * osd: handle no-op write with snapshot (#10262 Sage Weil) * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh) * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis, Yehuda Sadeh) * rgw: don't overwrite bucket/object owner when setting ACLs (#10978 Yehuda Sadeh) * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh) * rgw: fix partial swift GET (#10553 Yehuda Sadeh) * rgw: fix quota disable (#9907 Dong Lei) * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh) * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh) * rgw: pass civetweb configurables (#10907 Yehuda Sadeh) * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda Sadeh) * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh) * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh) * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh) * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh) * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil) * rgw: update swift subuser
permission masks when authenticating (#9918 Yehuda Sadeh) * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis, Yehuda Sadeh) * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh) * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh) Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
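Once the two --show-mappings dumps from the release-notes procedure exist, the final divide step can be wrapped in a small helper like this (a sketch; file names follow the release notes, and it assumes both dumps list the mappings line-for-line in the same order):

```shell
# pct_changed A B -- print the percentage of lines that differ between
# two crushtool --show-mappings dumps of equal length.  paste joins the
# files side by side with a tab; awk counts mismatching pairs.
pct_changed() {
    paste "$1" "$2" | awk -F'\t' '$1 != $2 { c++ } END { printf "%.1f\n", 100 * c / NR }'
}

# e.g.: pct_changed /tmp/a /tmp/b
```

Per the notes above, most clusters come out under 10% this way.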
Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster
Sorry about all the unrelated grep issues. So I've rebuilt and reinstalled and it's still broken. On the working node, even with the new packages, everything works. On the new broken node, I've added a mon and it works. But I still cannot start an OSD on the new node. What else do you need from me? I'll get logs, run any number of tests. I've got data in this cluster already, and it's full so I need to expand it; I've already got the hardware. Thanks in advance for even having a look. -Original Message- From: Samuel Just [mailto:sj...@redhat.com] Sent: Wednesday, 11 March 2015 1:41 AM To: Malcolm Haak; jl...@redhat.com Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster Joao, it looks like map 2759 is causing trouble, how would he get the full and incremental maps for that out of the mons? -Sam On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote: Hi Samuel, The sha1? I'm going to admit ignorance as to what you are looking for. They are all running the same release if that is what you are asking. Same tarball built into rpms using rpmbuild on both nodes... Only difference being that the other node has been upgraded and the problem node is fresh.
added the requested config here is the command line output microserver-1:/etc # /etc/init.d/ceph start osd.3 === osd.3 === Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3 2015-03-11 01:00:13.492279 7f05b2f72700 1 -- :/0 messenger.start 2015-03-11 01:00:13.492823 7f05b2f72700 1 -- :/1002795 -- 192.168.0.10:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x7f05ac0290b0 con 0x7f05ac027c40 2015-03-11 01:00:13.510814 7f05b07ef700 1 -- 192.168.0.250:0/1002795 learned my addr 192.168.0.250:0/1002795 2015-03-11 01:00:13.527653 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 1 mon_map magic: 0 v1 191+0+0 (1112175541 0 0) 0x7f05aab0 con 0x7f05ac027c40 2015-03-11 01:00:13.527899 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 2 auth_reply(proto 1 0 (0) Success) v1 24+0+0 (3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40 2015-03-11 01:00:13.527973 7f05abfff700 1 -- 192.168.0.250:0/1002795 -- 192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 con 0x7f05ac027c40 2015-03-11 01:00:13.528124 7f05b2f72700 1 -- 192.168.0.250:0/1002795 -- 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7f05ac029a50 con 0x7f05ac027c40 2015-03-11 01:00:13.528265 7f05b2f72700 1 -- 192.168.0.250:0/1002795 -- 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x7f05ac029f20 con 0x7f05ac027c40 2015-03-11 01:00:13.530359 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 3 mon_map magic: 0 v1 191+0+0 (1112175541 0 0) 0x7f05aab0 con 0x7f05ac027c40 2015-03-11 01:00:13.530548 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 4 mon_subscribe_ack(300s) v1 20+0+0 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40 2015-03-11 01:00:13.531114 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 5 osd_map(3277..3277 src has 2757..3277) v3 5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40 2015-03-11 01:00:13.531772 7f05abfff700 1 -- 
192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40 2015-03-11 01:00:13.532186 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 7 osd_map(3277..3277 src has 2757..3277) v3 5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40 2015-03-11 01:00:13.532260 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40 2015-03-11 01:00:13.556748 7f05b2f72700 1 -- 192.168.0.250:0/1002795 -- 192.168.0.10:6789/0 -- mon_command({prefix: get_command_descriptions} v 0) v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40 2015-03-11 01:00:13.564968 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 9 mon_command_ack([{prefix: get_command_descriptions}]=0 v0) v1 72+0+34995 (1092875540 0 1727986498) 0x7f05aa70 con 0x7f05ac027c40 2015-03-11 01:00:13.770122 7f05b2f72700 1 -- 192.168.0.250:0/1002795 -- 192.168.0.10:6789/0 -- mon_command({prefix: osd crush create-or-move, args: [host=microserver-1, root=default], id: 3, weight: 1.81} v 0) v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40 2015-03-11 01:00:13.772299 7f05abfff700 1 -- 192.168.0.250:0/1002795 == mon.0 192.168.0.10:6789/0 10 mon_command_ack([{prefix: osd crush create-or-move, args: [host=microserver-1, root=default], id: 3, weight: 1.81}]=0 create-or-move updated item name 'osd.3' weight 1.81 at location {host=microserver-1,root=default}
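On Sam's question about pulling the suspect map out of the mons: the full osdmap for a given epoch can be fetched and inspected like this (a sketch, not run against this cluster; epoch 2759 is the one Sam flagged, and it assumes the mons still hold that epoch — the log above shows "src has 2757..3277"):

```shell
# Ask the monitors for the full osdmap at the suspect epoch:
ceph osd getmap 2759 -o /tmp/osdmap.2759

# Print it in human-readable form to look for anything odd:
osdmaptool --print /tmp/osdmap.2759
```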
Re: [ceph-users] Add monitor unsuccessful
Thanks Steffen, I have followed everything and am not sure what is going on. Are the mon keyring and client admin keyring individual, per mon host? Or do I need to copy them from the first initial mon node? Thanks again! Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 11, 2015, at 6:28 PM, Steffen W Sørensen ste...@me.com wrote: On 12/03/2015, at 00.55, Jesus Chavez (jeschave) jesch...@cisco.com wrote: can anybody tell me a good blog link that explains how to add a monitor? I have tried manually and also with ceph-deploy without success =( Dunno if these might help U: http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start /Steffen
[ceph-users] Can not list objects in large bucket
I have a single radosgw user with 2 S3 keys and 1 Swift key. I have created a few buckets and I can list all of the contents of buckets A and C but not B, with either S3 (boto) or python-swiftclient. I am able to list the first 1000 entries using radosgw-admin 'bucket list --bucket=bucketB' without any issues but this doesn't really help. The odd thing is I can still upload and download objects in the bucket. I just can't list them. I tried setting the bucket canned_acl to private and public but I still can't list the objects inside. I'm using ceph 0.87 (Giant). Here is some info about the cluster: http://pastebin.com/LvQYnXem -- ceph.conf http://pastebin.com/efBBPCwa -- ceph -s http://pastebin.com/tF62WMU9 -- radosgw-admin bucket list http://pastebin.com/CZ8TkyNG -- python list bucket objects script http://pastebin.com/TUCyxhMD -- radosgw-admin bucket stats --bucketB http://pastebin.com/uHbEtGHs -- rados -p .rgw.buckets ls | grep default.20283.2 (bucketB marker) http://pastebin.com/WYwfQndV -- Python error when trying to list bucketB via boto I have no idea why this could be happening outside of the acl. Has anyone seen this before? Any idea on how I can get access to this bucket again via S3/Swift? Also, is there a way to list the full contents of a bucket via radosgw-admin and not just the first 9000 lines / 1000 entries, or a way to page through them? EDIT: I just fixed it (I hope) but the fix doesn't make any sense: radosgw-admin bucket unlink --uid=user --bucket=bucketB radosgw-admin bucket link --uid=user --bucket=bucketB --bucket-id=default.20283.2 Now with Swift or S3 (boto) I am able to list the bucket contents without issue ^_^ Can someone elaborate on why this works, and how it broke in the first place when ceph was health_ok the entire time? With 3 replicas how did this happen? Could this be a bug? Sorry for the rambling. I am confused and tired ;p
Re: [ceph-users] Shadow files
Anyone got any info on this? Is it safe to delete shadow files? On 2015-03-11 10:03, Ben wrote: We have a large number of shadow files in our cluster that aren't being deleted automatically as data is deleted. Is it safe to delete these files? Is there something we need to be aware of when deleting them? Is there a script that we can run that will delete these safely? Is there something wrong with our cluster that it isn't deleting these files when it should be? We are using civetweb with radosgw, with a tengine SSL proxy in front of it. Any advice please. Thanks