[ceph-users] radosgw backup
Hi everyone. I'm wondering - is there a way to back up radosgw data? What I have already tried: create a backup pool and copy .rgw.buckets into it; then delete an object via an S3 client; then copy the data from the backup pool back to .rgw.buckets. I still can't see the object in the S3 client, but I can fetch it over HTTP using its previously known URL. Questions: where does radosgw store info about objects (i.e. how do I make the restored object visible to the S3 client again)? Is there a best way to back up radosgw data? Thanks for any advice.
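For reference, a minimal sketch of the pool-copy approach described above (the backup pool name is made up here, and rados cppool may not copy everything, e.g. omap data, on older versions, so treat this as a rough sketch only). Note that restoring raw objects into the data pool does not by itself refresh RGW's bucket index, which is kept as omap data on separate index objects; that is the most likely reason a restored object is reachable by its URL but not visible in S3 listings.
rados mkpool .rgw.buckets.backup
rados cppool .rgw.buckets .rgw.buckets.backup
# later, after an accidental delete, copy back the other way:
rados cppool .rgw.buckets.backup .rgw.buckets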
Re: [ceph-users] mds crash
Thank you for your reply. I had read the 'mds crashing' thread and I don't think I'm seeing that bug (http://tracker.ceph.com/issues/10449). I have enabled debug objecter = 10 and here is the full log from starting the mds: http://pastebin.com/dbk0uLYy Here is the last part of the log:
-35 2015-05-29 09:28:23.104098 7f78cdcde700 10 mds.0.objecter ms_handle_connect 0x3f43440
-34 2015-05-29 09:28:23.104555 7f78cdcde700 10 mds.0.objecter ms_handle_connect 0x3f43860
-33 2015-05-29 09:28:23.105016 7f78cdcde700 10 mds.0.objecter ms_handle_connect 0x3f43de0
-32 2015-05-29 09:28:23.105350 7f78c57ad700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(25 164.0002 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-31 2015-05-29 09:28:23.105375 7f78c57ad700 10 mds.0.objecter in handle_osd_op_reply
-30 2015-05-29 09:28:23.105378 7f78c57ad700 7 mds.0.objecter handle_osd_op_reply 25 ondisk v 0'0 uv 0 in 11.2a2643ed attempt 1
-29 2015-05-29 09:28:23.105381 7f78c57ad700 10 mds.0.objecter op 0 rval -95 len 0
-28 2015-05-29 09:28:23.105387 7f78c57ad700 5 mds.0.objecter 1 unacked, 4 uncommitted
-27 2015-05-29 09:28:23.105678 7f78c55ab700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(26 164.0003 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-26 2015-05-29 09:28:23.105696 7f78c55ab700 10 mds.0.objecter in handle_osd_op_reply
-25 2015-05-29 09:28:23.105699 7f78c55ab700 7 mds.0.objecter handle_osd_op_reply 26 ondisk v 0'0 uv 0 in 11.beb48626 attempt 1
-24 2015-05-29 09:28:23.105702 7f78c55ab700 10 mds.0.objecter op 0 rval -95 len 0
-23 2015-05-29 09:28:23.105708 7f78c55ab700 5 mds.0.objecter 1 unacked, 3 uncommitted
-22 2015-05-29 09:28:23.106134 7f78c54aa700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(27 164.0001 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-21 2015-05-29 09:28:23.106152 7f78c54aa700 10 mds.0.objecter in handle_osd_op_reply
-20 2015-05-29 09:28:23.106155 7f78c54aa700 7 mds.0.objecter handle_osd_op_reply 27 ondisk v 0'0 uv 0 in 11.4a09fd98 attempt 1
-19 2015-05-29 09:28:23.106158 7f78c54aa700 10 mds.0.objecter op 0 rval -95 len 0
-18 2015-05-29 09:28:23.106163 7f78c54aa700 5 mds.0.objecter 1 unacked, 2 uncommitted
-17 2015-05-29 09:28:23.106524 7f78c53a9700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(28 164. [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-16 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in handle_osd_op_reply
-15 2015-05-29 09:28:23.106543 7f78c53a9700 7 mds.0.objecter handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
-14 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter op 0 rval -95 len 0
-13 2015-05-29 09:28:23.106552 7f78c53a9700 5 mds.0.objecter 1 unacked, 1 uncommitted
-12 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(29 164.0004 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-11 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in handle_osd_op_reply
-10 2015-05-29 09:28:23.106973 7f78c52a8700 7 mds.0.objecter handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter op 0 rval -95 len 0
-8 2015-05-29 09:28:23.106980 7f78c52a8700 5 mds.0.objecter 1 unacked, 0 uncommitted
-7 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(30 1. [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in handle_osd_op_reply
-5 2015-05-29 09:28:23.107309 7f78c69bf700 7 mds.0.objecter handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter op 0 rval 0 len 222
-3 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter op 1 rval 0 len 4
-2 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter op 1 handler 0x3e316b0
-1 2015-05-29 09:28:23.107321 7f78c69bf700 5 mds.0.objecter 0 unacked, 0 uncommitted
0 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f78cb4d9700 time 2015-05-29 09:28:23.107027
mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)
On 28/05/15 17:43, John Spray wrote: (This came up as in-reply-to to the previous mds crashing thread -- it's better to start threads with a fresh message) On 28/05/2015 16:58, Peter Tiernan wrote: Hi all, I have been testing cephfs with an erasure coded pool and cache tier. I have 3 mds running on the same physical server as 3 mons. The cluster is in an OK state otherwise, rbd is working and all pgs are active+clean. I'm running v
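For anyone reproducing this, a sketch of how the objecter debug level mentioned above can be raised (the MDS name 'a' is a placeholder; either approach should work on a hammer-era cluster):
# in ceph.conf under [mds], then restart the daemon:
debug objecter = 10
# or at runtime via the admin socket on the MDS host:
ceph daemon mds.a config set debug_objecter 10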
Re: [ceph-users] mds crash
Hi, that appears to have worked. The mds are now stable and I can read and write correctly. Thanks for the help and have a good day. On 29/05/15 12:25, John Spray wrote: On 29/05/2015 11:41, Peter Tiernan wrote: OK, thanks. I wasn't aware of this. Should this command fix everything, or do I need to delete cephfs and the pools and start again: ceph osd tier cache-mode CachePool writeback It might well work, give it a try. John
Re: [ceph-users] mds crash
On 29/05/2015 09:46, Peter Tiernan wrote:
-16 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in handle_osd_op_reply
-15 2015-05-29 09:28:23.106543 7f78c53a9700 7 mds.0.objecter handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
-14 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter op 0 rval -95 len 0
-13 2015-05-29 09:28:23.106552 7f78c53a9700 5 mds.0.objecter 1 unacked, 1 uncommitted
-12 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(29 164.0004 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-11 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in handle_osd_op_reply
-10 2015-05-29 09:28:23.106973 7f78c52a8700 7 mds.0.objecter handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter op 0 rval -95 len 0
-8 2015-05-29 09:28:23.106980 7f78c52a8700 5 mds.0.objecter 1 unacked, 0 uncommitted
-7 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(30 1. [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in handle_osd_op_reply
-5 2015-05-29 09:28:23.107309 7f78c69bf700 7 mds.0.objecter handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter op 0 rval 0 len 222
-3 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter op 1 rval 0 len 4
-2 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter op 1 handler 0x3e316b0
-1 2015-05-29 09:28:23.107321 7f78c69bf700 5 mds.0.objecter 0 unacked, 0 uncommitted
0 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f78cb4d9700 time 2015-05-29 09:28:23.107027
mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)
OK, so you have Operation not supported coming out of RADOS. That usually means you've got CephFS trying to use an erasure coded pool directly (doesn't work) rather than via a replicated cache pool (does work). You may have found that the filesystem appeared to work up to a point if you were only writing and not modifying. John
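For others hitting the same assert, a quick way to check whether CephFS is pointed directly at an erasure-coded pool (a sketch using standard commands of that era; pool names are the ones from this thread):
ceph fs ls                      # shows the metadata pool and data pool(s) the filesystem uses
ceph osd dump | grep pool       # erasure-coded pools are listed as 'erasure', and tier/cache-mode settings should appear on the pool lines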
Re: [ceph-users] mds crash
OK, thanks. I wasn't aware of this. Should this command fix everything, or do I need to delete cephfs and the pools and start again:
ceph osd tier cache-mode CachePool writeback
On 29/05/15 11:37, John Spray wrote: On 29/05/2015 11:34, Peter Tiernan wrote: OK, that's interesting. I had issues before this crash where files were being garbled. I followed what I thought was the correct procedure for an erasure coded pool with a cache tier:
ceph osd pool create ECpool 800 800 erasure default
ceph osd pool create CachePool 4096 4096
ceph osd tier add ECpool CachePool
ceph osd tier cache-mode CachePool readonly
ceph osd tier set-overlay ECpool CachePool
ceph osd pool create cephfs_metadata 4096 4096
ceph fs new cephfs cephfs_metadata ECpool
Is my mistake the last command above? Should the ceph fs new be given the CachePool and not the ECpool? The problem is that you're creating a readonly cache tier instead of a writeback cache tier. CephFS needs a writeback cache tier for modifications and truncations. Cheers, John
Re: [ceph-users] mds crash
On 29/05/2015 11:34, Peter Tiernan wrote: OK, that's interesting. I had issues before this crash where files were being garbled. I followed what I thought was the correct procedure for an erasure coded pool with a cache tier:
ceph osd pool create ECpool 800 800 erasure default
ceph osd pool create CachePool 4096 4096
ceph osd tier add ECpool CachePool
ceph osd tier cache-mode CachePool readonly
ceph osd tier set-overlay ECpool CachePool
ceph osd pool create cephfs_metadata 4096 4096
ceph fs new cephfs cephfs_metadata ECpool
Is my mistake the last command above? Should the ceph fs new be given the CachePool and not the ECpool? The problem is that you're creating a readonly cache tier instead of a writeback cache tier. CephFS needs a writeback cache tier for modifications and truncations. Cheers, John
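Putting John's correction together with the sequence above, a sketch of the setup with the cache-mode fixed (same pool names and pg counts as in the thread; only the cache-mode line differs):
ceph osd pool create ECpool 800 800 erasure default
ceph osd pool create CachePool 4096 4096
ceph osd tier add ECpool CachePool
ceph osd tier cache-mode CachePool writeback     # writeback, not readonly
ceph osd tier set-overlay ECpool CachePool
ceph osd pool create cephfs_metadata 4096 4096
ceph fs new cephfs cephfs_metadata ECpool
# on a cluster that already has the readonly tier, reapplying the cache-mode in place
# (ceph osd tier cache-mode CachePool writeback) is what resolved it later in this thread.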
Re: [ceph-users] mds crash
OK, that's interesting. I had issues before this crash where files were being garbled. I followed what I thought was the correct procedure for an erasure coded pool with a cache tier:
ceph osd pool create ECpool 800 800 erasure default
ceph osd pool create CachePool 4096 4096
ceph osd tier add ECpool CachePool
ceph osd tier cache-mode CachePool readonly
ceph osd tier set-overlay ECpool CachePool
ceph osd pool create cephfs_metadata 4096 4096
ceph fs new cephfs cephfs_metadata ECpool
Is my mistake the last command above? Should the ceph fs new be given the CachePool and not the ECpool? Thanks
On 29/05/15 11:17, John Spray wrote: On 29/05/2015 09:46, Peter Tiernan wrote:
-16 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in handle_osd_op_reply
-15 2015-05-29 09:28:23.106543 7f78c53a9700 7 mds.0.objecter handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
-14 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter op 0 rval -95 len 0
-13 2015-05-29 09:28:23.106552 7f78c53a9700 5 mds.0.objecter 1 unacked, 1 uncommitted
-12 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(29 164.0004 [trimtrunc 2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
-11 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in handle_osd_op_reply
-10 2015-05-29 09:28:23.106973 7f78c52a8700 7 mds.0.objecter handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter op 0 rval -95 len 0
-8 2015-05-29 09:28:23.106980 7f78c52a8700 5 mds.0.objecter 1 unacked, 0 uncommitted
-7 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter ms_dispatch 0x3e2e000 osd_op_reply(30 1. [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in handle_osd_op_reply
-5 2015-05-29 09:28:23.107309 7f78c69bf700 7 mds.0.objecter handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter op 0 rval 0 len 222
-3 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter op 1 rval 0 len 4
-2 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter op 1 handler 0x3e316b0
-1 2015-05-29 09:28:23.107321 7f78c69bf700 5 mds.0.objecter 0 unacked, 0 uncommitted
0 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 7f78cb4d9700 time 2015-05-29 09:28:23.107027
mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)
OK, so you have Operation not supported coming out of RADOS. That usually means you've got CephFS trying to use an erasure coded pool directly (doesn't work) rather than via a replicated cache pool (does work). You may have found that the filesystem appeared to work up to a point if you were only writing and not modifying. John
Re: [ceph-users] NFS interaction with RBD
All, I've tried to recreate the issue without success! My configuration is the following: OS (Hypervisor + VM): CentOS 6.6 (2.6.32-504.1.3.el6.x86_64) QEMU: qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64 Ceph: ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), 20x4TB OSDs equally distributed on two disk nodes, 3x Monitors. OpenStack Cinder has been configured to provide RBD volumes from Ceph. I have created 10x 500GB volumes which were then all attached to a single virtual machine. All volumes were formatted twice for comparison, once using mkfs.xfs and once using mkfs.ext4. I did try to issue the commands all at the same time (or as close to that as possible). In both tests I didn't notice any interruption. It may have taken longer than just doing one at a time, but the system was continuously up and everything was responding without a problem. At the time of these processes the open connections were 100 with one of the OSD nodes and 111 with the other one. So I guess I am not experiencing the issue due to the low number of OSDs I have. Is my assumption correct? Best regards, George
Thanks a million for the feedback Christian! I've tried to recreate the issue with 10 RBD volumes mounted on a single server without success! I've issued the mkfs.xfs commands simultaneously (or at least as fast as I could do it in different terminals) without noticing any problems. Can you please tell me what the size of each of the RBD volumes was, because I have a feeling that mine were too small, and if so I have to test it on our bigger cluster. I've also thought that besides the QEMU version the underlying OS might also be important, so what was your testbed? All the best, George
Hi George In order to experience the error it was enough to simply run mkfs.xfs on all the volumes. In the meantime it became clear what the problem was:
~ ; cat /proc/183016/limits
...
Max open files    1024    4096    files
...
This can be changed by setting a decent value in /etc/libvirt/qemu.conf for max_files. Regards Christian
On 27 May 2015, at 16:23, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: George, I will let Christian provide you the details. As far as I know, it was enough to just do an 'ls' on all of the attached drives. We are using QEMU 2.0:
$ dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-2013.c3d1e78-2ubuntu1 all PXE boot firmware - ROM images for qemu
ii qemu-keymaps 2.0.0+dfsg-2ubuntu1.11 all QEMU keyboard maps
ii qemu-system 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries
ii qemu-system-arm 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (arm)
ii qemu-system-common 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-mips 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (mips)
ii qemu-system-misc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (miscelaneous)
ii qemu-system-ppc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (ppc)
ii qemu-system-sparc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (sparc)
ii qemu-system-x86 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU utilities
cheers jc
-- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/stories
On 26.05.2015, at 19:12, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Jens-Christian, how did you test that? Did you just try to write to them simultaneously? Any other tests that one can perform to verify that? In our installation we have a VM with 30 RBD volumes mounted which are all exported via NFS to other VMs. No one has complained for the moment but the load/usage is very minimal. If this problem really exists then very soon, once the trial phase is over, we will have millions of complaints :-( What version of QEMU are you using? We are using the one provided by Ceph in qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm Best regards, George
I think we (i.e. Christian) found the problem: We created a test VM with 9 mounted RBD volumes (no NFS server). As soon as he hit all disks, we started to experience these 120 second timeouts. We realized that the QEMU process on the hypervisor is opening a TCP connection to every OSD for every mounted volume - exceeding the 1024 FD
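For reference, a sketch of the fix Christian describes above (the max_files value is only an illustrative choice; size it to the number of attached volumes times the number of OSDs each guest may talk to):
# /etc/libvirt/qemu.conf on the hypervisor
max_files = 32768
# restart libvirtd and (re)start the guests so the new limit applies, then verify:
cat /proc/<qemu-pid>/limits | grep "Max open files"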
Re: [ceph-users] Discuss: New default recovery config settings
On Fri, May 29, 2015 at 5:47 PM, Samuel Just sj...@redhat.com wrote: Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
We'd like a bit of feedback first though. Is anyone happy with the current configs? Is anyone using something between these values and the current defaults? What kind of workload? I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly. Thoughts? -Sam
Sam, I was thinking about this recently. We recently ended up hitting a recovery storm and a scrub storm; both happened at a time of high client activity. While changing the defaults down will make these kinds of disruptions less likely to occur, it also makes recovery (rebalancing) very slow. What I would be happy to see is more of a QoS-style tunable along the lines of network traffic shaping, where we can guarantee a minimum amount of "recovery load" (and I say it in quotes since there's more than one resource involved) when the cluster is busy with client IO. Or, vice versa, a minimum amount of client IO that's guaranteed. Then, when there are lower periods of client activity, recovery (and other background work) can proceed at full speed. Many workloads are cyclical or seasonal (in the statistics sense of the term, e.g. intra/inter-day seasonality). QoS-style management should lead to a more dynamic system where we can maximize available utilization, minimize disruptions, and not play whack-a-mole with many conf knobs. I'm aware that this is much harder to implement, but thankfully there's a lot of literature, implementation and practical experience out there to draw upon. - Milosz -- Milosz Tanski CTO 16 East 34th Street, 15th floor New York, NY 10016 p: 646-253-9055 e: mil...@adfin.com
Re: [ceph-users] Discuss: New default recovery config settings
Hi, We did it the other way around instead, defining a period where the load is lighter and turning backfill/recover off and on around it, as sketched below. Then you want the backfill values to be what is the default right now. Also, someone said (I think it was Greg?) that if you have problems with backfill, your cluster backing store is not fast enough or has too much load. If 10 osds go down at the same time you want those values to be high to minimize the downtime. /Josef
fre 29 maj 2015 23:47 Samuel Just sj...@redhat.com skrev: Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
We'd like a bit of feedback first though. Is anyone happy with the current configs? Is anyone using something between these values and the current defaults? What kind of workload? I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly. Thoughts? -Sam
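A minimal sketch of that schedule-based approach, run from cron or similar (the cluster flags are standard; the schedule itself is just an example and not from this thread):
# during busy hours: pause backfill/recovery
ceph osd set nobackfill
ceph osd set norecover
# during the quiet window: let it run at full speed again
ceph osd unset nobackfill
ceph osd unset norecover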
Re: [ceph-users] Hammer 0.94.1 - install-deps.sh script error
Hi,
On 28/05/2015 05:13, Dyweni - Ceph-Users wrote: Hi Guys, Running the install-deps.sh script on Debian Squeeze results in the package 'cryptsetup-bin' not being found (and 'cryptsetup' not being used). This is due to the pipe character being deleted. To fix this, I replaced this line:
-e 's/\|//g;' \
with this line:
-e 's/\s*\|\s*/\\\|/g;' \
Nice catch :-) Does that look right? https://github.com/ceph/ceph/pull/4799/files#diff-47a21b3706c13e08943e223c12323aa1L45 It would be great if you could try it, for instance with:
wget -O loic-install-deps.sh https://raw.githubusercontent.com/dachary/ceph/wip-install-deps/install-deps.sh
bash -x install-deps.sh
Cheers
Thought you'd like to include this in the mainline code. (FYI, this is somewhat related to this bug: http://tracker.ceph.com/issues/4943) Thanks, Dyweni
-- Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Discuss: New default recovery config settings
I like the idea of turning the defaults down. During the Ceph operators session at the OpenStack conference last week, Warren described the behavior pretty accurately: Ceph basically DOSes itself unless you reduce those settings. Maybe this is more of a problem when the clusters are small? Another idea would be to have a better way to prioritize recovery traffic to an even lower priority level by setting the ionice value to 'Idle' in the CFQ scheduler? Bryan
From: Josef Johansson jose...@gmail.com
Date: Friday, May 29, 2015 at 4:16 PM
To: Samuel Just sj...@redhat.com, ceph-devel ceph-de...@vger.kernel.org, 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)
Subject: Re: [ceph-users] Discuss: New default recovery config settings
Hi, We did it the other way around instead, defining a period where the load is lighter and turning backfill/recover off and on around it. Then you want the backfill values to be what is the default right now. Also, someone said (I think it was Greg?) that if you have problems with backfill, your cluster backing store is not fast enough or has too much load. If 10 osds go down at the same time you want those values to be high to minimize the downtime. /Josef
fre 29 maj 2015 23:47 Samuel Just sj...@redhat.com skrev: Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
We'd like a bit of feedback first though. Is anyone happy with the current configs? Is anyone using something between these values and the current defaults? What kind of workload? I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly. Thoughts? -Sam
Re: [ceph-users] Discuss: New default recovery config settings
On Fri, May 29, 2015 at 2:47 PM, Samuel Just sj...@redhat.com wrote: Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so. After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;) -Greg
Re: [ceph-users] Discuss: New default recovery config settings
Sam, We are seeing some good client IO results during recovery by using the following values:
osd recovery max active = 1
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1
It is all flash though. The recovery time in the case of an entire node (~120 TB) failure or a single drive (~8TB) failure is also not too bad with the above settings. Thanks & Regards, Somnath
-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
Sent: Friday, May 29, 2015 2:47 PM
To: ceph-devel; 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)
Subject: Discuss: New default recovery config settings
Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
We'd like a bit of feedback first though. Is anyone happy with the current configs? Is anyone using something between these values and the current defaults? What kind of workload? I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly. Thoughts? -Sam
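For what it's worth, the injectable options among those values can also be applied to a running cluster without restarts, roughly like this (osd recovery threads generally needs a daemon restart, so it is left to ceph.conf):
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'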
[ceph-users] newstore configuration
Hi, I have set up a cluster with the newstore functionality and see that files smaller than 100KB are stored in the DB and files larger than 100KB are stored in the fragments directory. Is there a way to change this threshold value in ceph.conf? Regards, Srikanth
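Not an answer to which option controls the threshold, but one way to list the newstore-related options (and their current values) that an OSD actually exposes is via the admin socket, e.g.:
ceph daemon osd.0 config show | grep newstore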
Re: [ceph-users] Hammer 0.94.1 - install-deps.sh script error
Looks good to me. Dyweni
On 2015-05-29 17:08, Loic Dachary wrote: Hi, On 28/05/2015 05:13, Dyweni - Ceph-Users wrote: Hi Guys, Running the install-deps.sh script on Debian Squeeze results in the package 'cryptsetup-bin' not being found (and 'cryptsetup' not being used). This is due to the pipe character being deleted. To fix this, I replaced this line:
-e 's/\|//g;' \
with this line:
-e 's/\s*\|\s*/\\\|/g;' \
Nice catch :-) Does that look right? https://github.com/ceph/ceph/pull/4799/files#diff-47a21b3706c13e08943e223c12323aa1L45 It would be great if you could try it, for instance with:
wget -O loic-install-deps.sh https://raw.githubusercontent.com/dachary/ceph/wip-install-deps/install-deps.sh
bash -x install-deps.sh
Cheers
Thought you'd like to include this in the mainline code. (FYI, this is somewhat related to this bug: http://tracker.ceph.com/issues/4943) Thanks, Dyweni
-- Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] NFS interaction with RBD
In the end this came down to one slow OSD. There were no hardware issues, so I have to assume something gummed up during rebalancing and peering. I restarted the osd process after setting the cluster to noout. After the osd was restarted the rebalance completed and the cluster returned to health OK. As soon as the osd restarted, all previously hanging operations returned to normal. I'm surprised by a single slow OSD impacting access to the entire cluster. I understand now that only the primary osd is used for reads and that writes must go to the primary and then the secondaries, but I would have expected the impact to be more contained. We currently build XFS file systems directly on RBD images. I'm wondering if there would be any value in using an LVM abstraction on top to spread access to other osds for read and failure scenarios. Any thoughts on the above appreciated. ~jpr
On 05/28/2015 03:18 PM, John-Paul Robinson wrote: To follow up on the original post, further digging indicates this is a problem with RBD image access and is not related to NFS-RBD interaction as initially suspected. The nfsd is simply hanging as a result of a hung request to the XFS file system mounted on our RBD-NFS gateway. This hung XFS call is caused by a problem with the RBD module interacting with our Ceph pool. I've found a reliable way to trigger a hang directly on an rbd image mapped into our RBD-NFS gateway box. The image contains an XFS file system. When I try to list the contents of a particular directory, the request hangs indefinitely. Two weeks ago our ceph status was:
jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova status
health HEALTH_WARN 1 near full osd(s)
monmap e1: 3 mons at {da0-36-9f-0e-28-2c=172.16.171.6:6789/0,da0-36-9f-0e-2b-88=172.16.171.5:6789/0,da0-36-9f-0e-2b-a0=172.16.171.4:6789/0}, election epoch 350, quorum 0,1,2 da0-36-9f-0e-28-2c,da0-36-9f-0e-2b-88,da0-36-9f-0e-2b-a0
osdmap e5978: 66 osds: 66 up, 66 in
pgmap v26434260: 3072 pgs: 3062 active+clean, 6 active+clean+scrubbing, 4 active+clean+scrubbing+deep; 45712 GB data, 91590 GB used, 51713 GB / 139 TB avail; 12234B/s wr, 1op/s
mdsmap e1: 0/0/1 up
The near full osd was number 53 and we updated our crush map to reweight the osd. All of the OSDs had a weight of 1 based on the assumption that all osds were 2.0TB. Apparently one of our servers had the OSDs sized to 2.8TB and this caused the OSD imbalance even though we are only at 50% utilization. We reweighted the near full osd to .8 and that initiated a rebalance that has since relieved the 95% full condition on that OSD. However, since that time the re-peering has not completed and we suspect this is causing problems with our access to RBD images.
Our current ceph status is:
jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova status
health HEALTH_WARN 1 pgs peering; 1 pgs stuck inactive; 4 pgs stuck unclean; recovery 9/23842120 degraded (0.000%)
monmap e1: 3 mons at {da0-36-9f-0e-28-2c=172.16.171.6:6789/0,da0-36-9f-0e-2b-88=172.16.171.5:6789/0,da0-36-9f-0e-2b-a0=172.16.171.4:6789/0}, election epoch 350, quorum 0,1,2 da0-36-9f-0e-28-2c,da0-36-9f-0e-2b-88,da0-36-9f-0e-2b-a0
osdmap e6036: 66 osds: 66 up, 66 in
pgmap v27104371: 3072 pgs: 3 active, 3056 active+clean, 9 active+clean+scrubbing, 1 remapped+peering, 3 active+clean+scrubbing+deep; 45868 GB data, 92006 GB used, 51297 GB / 139 TB avail; 3125B/s wr, 0op/s; 9/23842120 degraded (0.000%)
mdsmap e1: 0/0/1 up
Here are further details on our stuck pgs:
jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova pg dump_stuck inactive
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.3af 11600 0 0 0 47941791744 153812 153812 remapped+peering 2015-05-15 12:47:17.223786 5979'293066 6000'1248735 [48,62] [53,48,62] 5979'293056 2015-05-15 07:40:36.275563 5979'293056 2015-05-15 07:40:36.275563
jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.106 11870 0 9 0 49010106368 163991 163991 active 2015-05-15 12:47:19.761469 6035'356332 5968'1358516 [62,53] [62,53] 5979'356242 2015-05-14 22:22:12.966150 5979'351351 2015-05-12 18:04:41.838686
5.104 0 0 0 0 0 0 0 active 2015-05-15 12:47:19.800676 0'0 5968'1615
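For anyone chasing a similar "one slow OSD stalls everything" situation, a rough sketch of the steps described above (the OSD id and the restart mechanism depend on the installation and init system in use):
ceph osd perf          # look for an OSD whose commit/apply latency is a clear outlier
ceph osd set noout     # keep the cluster from marking the OSD out while it restarts
# restart the suspect ceph-osd daemon with whatever init system is in use, then:
ceph osd unset noout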
[ceph-users] Discuss: New default recovery config settings
Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
We'd like a bit of feedback first though. Is anyone happy with the current configs? Is anyone using something between these values and the current defaults? What kind of workload? I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly. Thoughts? -Sam
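For anyone who wants to try the proposed values before any default change lands, a ceph.conf sketch using exactly the numbers from the proposal above (a daemon restart, or injectargs, is needed for them to take effect):
[osd]
osd max backfills = 1
osd recovery max active = 3
osd recovery op priority = 1
osd recovery max single start = 1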