[ceph-users] radosgw backup

2015-05-29 Thread Konstantin Ivanov
Hi everyone.
I'm wondering - is there a way to back up radosgw data?
What I have already tried:
I created a backup pool and copied .rgw.buckets into it. Then I deleted an
object via the S3 client, and later copied the data from the backup pool back
into .rgw.buckets. I still can't see the object in the S3 client, but I can
still fetch it via HTTP using its previously known URL.
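
For illustration, the pool-level copy above can be done roughly like this (the
exact commands are my assumption of how to do it; pool name, pg count and the
object name are only examples):

 ceph osd pool create .rgw.buckets.backup 128
 rados cppool .rgw.buckets .rgw.buckets.backup
 # to restore a single rados object instead of the whole pool:
 rados -p .rgw.buckets.backup get <rados-object-name> /tmp/obj
 rados -p .rgw.buckets put <rados-object-name> /tmp/obj
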
Questions: where does radosgw store info about objects (i.e. how do I make the
restored object visible from the S3 client)? Is there a better way to back up
radosgw data?
Thanks for any advice.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds crash

2015-05-29 Thread Peter Tiernan

Thank you for your reply

I had read the 'mds crashing' thread and I don't think I'm seeing that bug 
(http://tracker.ceph.com/issues/10449).


I have enabled debug objecter = 10 and here is the full log from 
starting the mds: http://pastebin.com/dbk0uLYy
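
For reference, that debug setting lives in ceph.conf under the [mds] section:

 [mds]
 debug objecter = 10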


Here is the last part of the log:


   -35 2015-05-29 09:28:23.104098 7f78cdcde700 10 mds.0.objecter 
ms_handle_connect 0x3f43440
   -34 2015-05-29 09:28:23.104555 7f78cdcde700 10 mds.0.objecter 
ms_handle_connect 0x3f43860
   -33 2015-05-29 09:28:23.105016 7f78cdcde700 10 mds.0.objecter 
ms_handle_connect 0x3f43de0
   -32 2015-05-29 09:28:23.105350 7f78c57ad700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(25 164.0002 [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -31 2015-05-29 09:28:23.105375 7f78c57ad700 10 mds.0.objecter in 
handle_osd_op_reply
   -30 2015-05-29 09:28:23.105378 7f78c57ad700  7 mds.0.objecter 
handle_osd_op_reply 25 ondisk v 0'0 uv 0 in 11.2a2643ed attempt 1
   -29 2015-05-29 09:28:23.105381 7f78c57ad700 10 mds.0.objecter  op 0 
rval -95 len 0
   -28 2015-05-29 09:28:23.105387 7f78c57ad700  5 mds.0.objecter 1 
unacked, 4 uncommitted
   -27 2015-05-29 09:28:23.105678 7f78c55ab700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(26 164.0003 [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -26 2015-05-29 09:28:23.105696 7f78c55ab700 10 mds.0.objecter in 
handle_osd_op_reply
   -25 2015-05-29 09:28:23.105699 7f78c55ab700  7 mds.0.objecter 
handle_osd_op_reply 26 ondisk v 0'0 uv 0 in 11.beb48626 attempt 1
   -24 2015-05-29 09:28:23.105702 7f78c55ab700 10 mds.0.objecter  op 0 
rval -95 len 0
   -23 2015-05-29 09:28:23.105708 7f78c55ab700  5 mds.0.objecter 1 
unacked, 3 uncommitted
   -22 2015-05-29 09:28:23.106134 7f78c54aa700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(27 164.0001 [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -21 2015-05-29 09:28:23.106152 7f78c54aa700 10 mds.0.objecter in 
handle_osd_op_reply
   -20 2015-05-29 09:28:23.106155 7f78c54aa700  7 mds.0.objecter 
handle_osd_op_reply 27 ondisk v 0'0 uv 0 in 11.4a09fd98 attempt 1
   -19 2015-05-29 09:28:23.106158 7f78c54aa700 10 mds.0.objecter  op 0 
rval -95 len 0
   -18 2015-05-29 09:28:23.106163 7f78c54aa700  5 mds.0.objecter 1 
unacked, 2 uncommitted
   -17 2015-05-29 09:28:23.106524 7f78c53a9700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(28 164. [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -16 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in 
handle_osd_op_reply
   -15 2015-05-29 09:28:23.106543 7f78c53a9700  7 mds.0.objecter 
handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
   -14 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter  op 0 
rval -95 len 0
   -13 2015-05-29 09:28:23.106552 7f78c53a9700  5 mds.0.objecter 1 
unacked, 1 uncommitted
   -12 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(29 164.0004 [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -11 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in 
handle_osd_op_reply
   -10 2015-05-29 09:28:23.106973 7f78c52a8700  7 mds.0.objecter 
handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter  op 0 
rval -95 len 0
-8 2015-05-29 09:28:23.106980 7f78c52a8700  5 mds.0.objecter 1 
unacked, 0 uncommitted
-7 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(30 1. [omap-get-header 
0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in 
handle_osd_op_reply
-5 2015-05-29 09:28:23.107309 7f78c69bf700  7 mds.0.objecter 
handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter  op 0 
rval 0 len 222
-3 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter  op 1 
rval 0 len 4
-2 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter  op 1 
handler 0x3e316b0
-1 2015-05-29 09:28:23.107321 7f78c69bf700  5 mds.0.objecter 0 
unacked, 0 uncommitted
 0 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In 
function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 
7f78cb4d9700 time 2015-05-29 09:28:23.107027

mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)



On 28/05/15 17:43, John Spray wrote:


(This came up as in-reply-to to the previous mds crashing thread -- 
it's better to start threads with a fresh message)




On 28/05/2015 16:58, Peter Tiernan wrote:

Hi all,

I have been testing cephfs with an erasure coded pool and a cache tier. I 
have 3 mds running on the same physical server as 3 mons. The cluster 
is otherwise in an OK state: rbd is working and all pgs are active+clean. 
I'm running v 

Re: [ceph-users] mds crash

2015-05-29 Thread Peter Tiernan

hi,

that appears to have worked. The mds are now stable and I can read and 
write correctly.


thanks for the help and have a good day.

On 29/05/15 12:25, John Spray wrote:



On 29/05/2015 11:41, Peter Tiernan wrote:
OK, thanks. I wasn't aware of this. Should this command fix 
everything, or do I need to delete cephfs and the pools and start again:


 ceph osd tier cache-mode CachePool writeback



It might well work, give it a try.

John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds crash

2015-05-29 Thread John Spray

On 29/05/2015 09:46, Peter Tiernan wrote:


   -16 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in 
handle_osd_op_reply
   -15 2015-05-29 09:28:23.106543 7f78c53a9700  7 mds.0.objecter 
handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
   -14 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter  op 
0 rval -95 len 0
   -13 2015-05-29 09:28:23.106552 7f78c53a9700  5 mds.0.objecter 1 
unacked, 1 uncommitted
   -12 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(29 164.0004 [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -11 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in 
handle_osd_op_reply
   -10 2015-05-29 09:28:23.106973 7f78c52a8700  7 mds.0.objecter 
handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter  op 
0 rval -95 len 0
-8 2015-05-29 09:28:23.106980 7f78c52a8700  5 mds.0.objecter 1 
unacked, 0 uncommitted
-7 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(30 1. [omap-get-header 
0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in 
handle_osd_op_reply
-5 2015-05-29 09:28:23.107309 7f78c69bf700  7 mds.0.objecter 
handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter  op 
0 rval 0 len 222
-3 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter  op 
1 rval 0 len 4
-2 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter  op 
1 handler 0x3e316b0
-1 2015-05-29 09:28:23.107321 7f78c69bf700  5 mds.0.objecter 0 
unacked, 0 uncommitted
 0 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In 
function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 
7f78cb4d9700 time 2015-05-29 09:28:23.107027

mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)


OK, so you have "Operation not supported" coming out of RADOS.  That 
usually means you've got CephFS trying to use an erasure coded pool 
directly (doesn't work) rather than via a replicated cache pool (does work).


You may have found that the filesystem appeared to work up to a point if 
you were only writing and not modifying.


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds crash

2015-05-29 Thread Peter Tiernan
OK, thanks. I wasn't aware of this. Should this command fix everything, 
or do I need to delete cephfs and the pools and start again:


 ceph osd tier cache-mode CachePool writeback



On 29/05/15 11:37, John Spray wrote:

On 29/05/2015 11:34, Peter Tiernan wrote:
OK, that's interesting. I had issues before this crash where files 
were being garbled. I followed what I thought was the correct 
procedure for an erasure coded pool with a cache tier:


 ceph osd pool create ECpool 800 800 erasure default
 ceph osd pool create CachePool 4096 4096
 ceph osd tier add ECpool CachePool
 ceph osd tier cache-mode CachePool readonly
 ceph osd tier set-overlay ECpool CachePool
 ceph osd pool create cephfs_metadata 4096 4096
 ceph fs new cephfs cephfs_metadata ECpool

Is my mistake in the last command above? Should ceph fs new be given 
the CachePool and not the ECpool?


The problem is that you're creating a readonly cache tier instead of a 
writeback cache tier.  CephFS needs a writeback cache tier for 
modifications and truncations.


Cheers,
John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds crash

2015-05-29 Thread John Spray

On 29/05/2015 11:34, Peter Tiernan wrote:
OK, that's interesting. I had issues before this crash where files were 
being garbled. I followed what I thought was the correct procedure for 
an erasure coded pool with a cache tier:


 ceph osd pool create ECpool 800 800 erasure default
 ceph osd pool create CachePool 4096 4096
 ceph osd tier add ECpool CachePool
 ceph osd tier cache-mode CachePool readonly
 ceph osd tier set-overlay ECpool CachePool
 ceph osd pool create cephfs_metadata 4096 4096
 ceph fs new cephfs cephfs_metadata ECpool

Is my mistake in the last command above? Should ceph fs new be given 
the CachePool and not the ECpool?


The problem is that you're creating a readonly cache tier instead of a 
writeback cache tier.  CephFS needs a writeback cache tier for 
modifications and truncations.
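
In other words, the same sequence as in your message but with a writeback
cache mode (or, equivalently, just switching the existing tier's cache-mode
to writeback), along these lines:

 ceph osd pool create ECpool 800 800 erasure default
 ceph osd pool create CachePool 4096 4096
 ceph osd tier add ECpool CachePool
 ceph osd tier cache-mode CachePool writeback
 ceph osd tier set-overlay ECpool CachePool
 ceph osd pool create cephfs_metadata 4096 4096
 ceph fs new cephfs cephfs_metadata ECpool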


Cheers,
John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds crash

2015-05-29 Thread Peter Tiernan
OK, that's interesting. I had issues before this crash where files were 
being garbled. I followed what I thought was the correct procedure for 
an erasure coded pool with a cache tier:


 ceph osd pool create ECpool 800 800 erasure default
 ceph osd pool create CachePool 4096 4096
 ceph osd tier add ECpool CachePool
 ceph osd tier cache-mode CachePool readonly
 ceph osd tier set-overlay ECpool CachePool
 ceph osd pool create cephfs_metadata 4096 4096
 ceph fs new cephfs cephfs_metadata ECpool

Is my mistake in the last command above? Should ceph fs new be given 
the CachePool and not the ECpool?


thanks


On 29/05/15 11:17, John Spray wrote:

On 29/05/2015 09:46, Peter Tiernan wrote:


   -16 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in 
handle_osd_op_reply
   -15 2015-05-29 09:28:23.106543 7f78c53a9700  7 mds.0.objecter 
handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
   -14 2015-05-29 09:28:23.106546 7f78c53a9700 10 mds.0.objecter  op 
0 rval -95 len 0
   -13 2015-05-29 09:28:23.106552 7f78c53a9700  5 mds.0.objecter 1 
unacked, 1 uncommitted
   -12 2015-05-29 09:28:23.106958 7f78c52a8700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(29 164.0004 [trimtrunc 
2@0] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6
   -11 2015-05-29 09:28:23.106971 7f78c52a8700 10 mds.0.objecter in 
handle_osd_op_reply
   -10 2015-05-29 09:28:23.106973 7f78c52a8700  7 mds.0.objecter 
handle_osd_op_reply 29 ondisk v 0'0 uv 0 in 11.50e84eb2 attempt 1
-9 2015-05-29 09:28:23.106976 7f78c52a8700 10 mds.0.objecter  op 
0 rval -95 len 0
-8 2015-05-29 09:28:23.106980 7f78c52a8700  5 mds.0.objecter 1 
unacked, 0 uncommitted
-7 2015-05-29 09:28:23.107296 7f78c69bf700 10 mds.0.objecter 
ms_dispatch 0x3e2e000 osd_op_reply(30 1. [omap-get-header 
0~0,omap-get-vals 0~16] v0'0 uv1 ondisk = 0) v6
-6 2015-05-29 09:28:23.107307 7f78c69bf700 10 mds.0.objecter in 
handle_osd_op_reply
-5 2015-05-29 09:28:23.107309 7f78c69bf700  7 mds.0.objecter 
handle_osd_op_reply 30 ondisk v 0'0 uv 1 in 13.6b2cdaff attempt 0
-4 2015-05-29 09:28:23.107311 7f78c69bf700 10 mds.0.objecter  op 
0 rval 0 len 222
-3 2015-05-29 09:28:23.107313 7f78c69bf700 10 mds.0.objecter  op 
1 rval 0 len 4
-2 2015-05-29 09:28:23.107315 7f78c69bf700 10 mds.0.objecter  op 
1 handler 0x3e316b0
-1 2015-05-29 09:28:23.107321 7f78c69bf700  5 mds.0.objecter 0 
unacked, 0 uncommitted
 0 2015-05-29 09:28:23.108478 7f78cb4d9700 -1 mds/MDCache.cc: In 
function 'virtual void C_IO_MDC_TruncateFinish::finish(int)' thread 
7f78cb4d9700 time 2015-05-29 09:28:23.107027

mds/MDCache.cc: 5974: FAILED assert(r == 0 || r == -2)


OK, so you have "Operation not supported" coming out of RADOS. That 
usually means you've got CephFS trying to use an erasure coded pool 
directly (doesn't work) rather than via a replicated cache pool (does 
work).


You may have found that the filesystem appeared to work up to a point 
if you were only writing and not modifying.


John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS interaction with RBD

2015-05-29 Thread Georgios Dimitrakakis

All,

I've tried to recreate the issue, without success!

My configuration is the following:

OS (Hypervisor + VM): CentOS 6.6 (2.6.32-504.1.3.el6.x86_64)
QEMU: qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64
Ceph: ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), 
20x4TB OSDs equally distributed on two disk nodes, 3xMonitors



OpenStack Cinder has been configured to provide RBD Volumes from Ceph.

I have created 10x 500GB volumes, which were then all attached to a 
single virtual machine.


All volumes were formatted twice for comparison, once using 
mkfs.xfs and once using mkfs.ext4.
I tried to issue the commands all at the same time (or as close to that as 
possible).
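
(Roughly the equivalent of the following; the device names are just how the
volumes happened to appear inside the VM:)

 for dev in /dev/vd{b..k}; do mkfs.xfs -f "$dev" & done; wait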


In both tests I didn't notice any interruption. It may have taken longer than 
doing them one at a time, but the system was continuously up and 
everything kept responding without any problem.


While these processes were running, there were 100 open connections to one 
of the OSD nodes and 111 to the other one.


So I guess I am not experiencing the issue because of the low number of 
OSDs I have. Is my assumption correct?



Best regards,

George




Thanks a million for the feedback Christian!

I've tried to recreate the issue with 10 RBD volumes mounted on a
single server, without success!

I've issued the mkfs.xfs commands simultaneously (or at least as
fast as I could in different terminals) without noticing any
problems. Can you please tell me what the size of each of the
RBD volumes was? I have a feeling that mine were too small, and if so
I will have to test it on our bigger cluster.

I've also thought that, besides the QEMU version, the underlying OS
might also be important, so what was your testbed?


All the best,

George


Hi George

In order to experience the error it was enough to simply run mkfs.xfs
on all the volumes.


In the meantime it became clear what the problem was:

 ~ ; cat /proc/183016/limits
...
Max open files            1024                 4096                 files
...

This can be changed by setting a decent value for max_files in
/etc/libvirt/qemu.conf.
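
For example (the value here is only an illustration):

 # /etc/libvirt/qemu.conf
 max_files = 32768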

Regards
Christian



On 27 May 2015, at 16:23, Jens-Christian Fischer
jens-christian.fisc...@switch.ch wrote:


George,

I will let Christian provide you with the details. As far as I know, it 
was enough to just do an ‘ls’ on all of the attached drives.


we are using Qemu 2.0:

$ dpkg -l | grep qemu
ii  ipxe-qemu           1.0.0+git-2013.c3d1e78-2ubuntu1  all    PXE boot firmware - ROM images for qemu
ii  qemu-keymaps        2.0.0+dfsg-2ubuntu1.11           all    QEMU keyboard maps
ii  qemu-system         2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries
ii  qemu-system-arm     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (arm)
ii  qemu-system-common  2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (common files)
ii  qemu-system-mips    2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (mips)
ii  qemu-system-misc    2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (miscelaneous)
ii  qemu-system-ppc     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (ppc)
ii  qemu-system-sparc   2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (sparc)
ii  qemu-system-x86     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (x86)
ii  qemu-utils          2.0.0+dfsg-2ubuntu1.11           amd64  QEMU utilities


cheers
jc

--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/stories

On 26.05.2015, at 19:12, Georgios Dimitrakakis 
gior...@acmac.uoc.gr wrote:



Jens-Christian,

how did you test that? Did you just try to write to them 
simultaneously? Are there any other tests one can perform to verify it?


In our installation we have a VM with 30 RBD volumes mounted, which 
are all exported via NFS to other VMs.
No one has complained so far, but the load/usage is very 
minimal.
If this problem really exists then, as soon as the trial phase 
is over, we will have millions of complaints :-(


What version of QEMU are you using? We are using the one provided 
by Ceph in qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm


Best regards,

George


I think we (i.e. Christian) found the problem:

We created a test VM with 9 mounted RBD volumes (no NFS server). As
soon as he hit all the disks, we started to experience these 120-second
timeouts. We realized that the QEMU process on the hypervisor is
opening a TCP connection to every OSD for every mounted volume -
exceeding the 1024 FD 

Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Milosz Tanski
On Fri, May 29, 2015 at 5:47 PM, Samuel Just sj...@redhat.com wrote:
 Many people have reported that they need to lower the osd recovery config 
 options to minimize the impact of recovery on client io.  We are talking 
 about changing the defaults as follows:

 osd_max_backfills to 1 (from 10)
 osd_recovery_max_active to 3 (from 15)
 osd_recovery_op_priority to 1 (from 10)
 osd_recovery_max_single_start to 1 (from 5)

 We'd like a bit of feedback first though.  Is anyone happy with the current 
 configs?  Is anyone using something between these values and the current 
 defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills 
 to 1 is probably a good idea, but I wonder whether lowering 
 osd_recovery_max_active and osd_recovery_max_single_start will cause small 
 objects to recover unacceptably slowly.

 Thoughts?
 -Sam

Sam, I was thinking about this recently. We ended up hitting a recovery
storm and a scrub storm recently; both happened at a time of
high client activity. While changing the defaults down will make these
kinds of disruptions less likely to occur, it also makes recovery
(rebalancing) very slow.

What I would be happy to see is more of a QoS-style tunable along the
lines of network traffic shaping, where you can guarantee a minimum
amount of "recovery load" (and I put it in quotes since there's more
than one resource involved) when the cluster is busy with client IO. Or,
vice versa, where a minimum amount of client IO is guaranteed.
Then, during periods of lower client activity, the recovery (and
other background work) can proceed at full speed. Many workloads are
cyclical or seasonal (in the statistical sense, e.g. intra-/inter-day
seasonality).

QoS-style management should lead to a more dynamic system where we can
maximize available utilization, minimize disruptions, and not play
whack-a-mole with many conf knobs. I'm aware that this is much harder
to implement, but thankfully there's a lot of literature,
implementation and practical experience out there to draw upon.

- Milosz

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Josef Johansson
Hi,

We did it the other way around instead: defining a period where the load is
lighter and turning backfill/recovery off and on around it. In that case you
want the backfill values to be what the defaults are right now.
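
Concretely that can be a pair of cron entries, something like this (the time
window is only an example):

 # /etc/cron.d/ceph-recovery-window
 0 1 * * * root ceph osd unset nobackfill && ceph osd unset norecover
 0 6 * * * root ceph osd set nobackfill && ceph osd set norecover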

Also, someone (I think it was Greg?) said that if you have problems with
backfill, your cluster's backing store is not fast enough / under too much load.
If 10 OSDs go down at the same time you want those values to be high to
minimize the downtime.

/Josef

On Fri, 29 May 2015 at 23:47, Samuel Just sj...@redhat.com wrote:

 Many people have reported that they need to lower the osd recovery config
 options to minimize the impact of recovery on client io.  We are talking
 about changing the defaults as follows:

 osd_max_backfills to 1 (from 10)
 osd_recovery_max_active to 3 (from 15)
 osd_recovery_op_priority to 1 (from 10)
 osd_recovery_max_single_start to 1 (from 5)

 We'd like a bit of feedback first though.  Is anyone happy with the
 current configs?  Is anyone using something between these values and the
 current defaults?  What kind of workload?  I'd guess that lowering
 osd_max_backfills to 1 is probably a good idea, but I wonder whether
 lowering osd_recovery_max_active and osd_recovery_max_single_start will
 cause small objects to recover unacceptably slowly.

 Thoughts?
 -Sam
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer 0.94.1 - install-deps.sh script error

2015-05-29 Thread Loic Dachary
Hi,

On 28/05/2015 05:13, Dyweni - Ceph-Users wrote:
 Hi Guys,
 
 Running the install-deps.sh script on Debian Squeeze results in the package 
 'cryptsetup-bin' not being found (and 'cryptsetup' not being used).
 
 This is due to the pipe character being deleted.
 
 To fix this, I replaced this line:
 -e 's/\|//g;' \
 with this line:
 -e 's/\s*\|\s*/\\\|/g;' \
 
 

Nice catch :-) Does that look right ?

https://github.com/ceph/ceph/pull/4799/files#diff-47a21b3706c13e08943e223c12323aa1L45

it would be great if you could try it, for instance with

wget -O loic-install-deps.sh \
https://raw.githubusercontent.com/dachary/ceph/wip-install-deps/install-deps.sh
bash -x loic-install-deps.sh

Cheers

 Thought you'd like to include this into the main line code.
 
 (FYI, This is somewhat related to this bug:  
 http://tracker.ceph.com/issues/4943)
 
 Thanks,
 Dyweni
 
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Stillwell, Bryan
I like the idea of turning the defaults down.  During the ceph operators 
session at the OpenStack conference last week, Warren described the behavior 
pretty accurately as "Ceph basically DOSes itself unless you reduce those 
settings."  Maybe this is more of a problem when the clusters are small?

Another idea would be to have a better way to push recovery traffic to an 
even lower priority level, for example by setting the ionice class to 'Idle' 
in the CFQ scheduler?

Bryan

From: Josef Johansson jose...@gmail.com
Date: Friday, May 29, 2015 at 4:16 PM
To: Samuel Just sj...@redhat.com, ceph-devel ceph-de...@vger.kernel.org, 
'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)
Subject: Re: [ceph-users] Discuss: New default recovery config settings


Hi,

We did it the other way around instead: defining a period where the load is 
lighter and turning backfill/recovery off and on around it. In that case you 
want the backfill values to be what the defaults are right now.

Also, someone (I think it was Greg?) said that if you have problems with 
backfill, your cluster's backing store is not fast enough / under too much load.
If 10 OSDs go down at the same time you want those values to be high to 
minimize the downtime.

/Josef

On Fri, 29 May 2015 at 23:47, Samuel Just sj...@redhat.com wrote:
Many people have reported that they need to lower the osd recovery config 
options to minimize the impact of recovery on client io.  We are talking about 
changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

We'd like a bit of feedback first though.  Is anyone happy with the current 
configs?  Is anyone using something between these values and the current 
defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 
1 is probably a good idea, but I wonder whether lowering 
osd_recovery_max_active and osd_recovery_max_single_start will cause small 
objects to recover unacceptably slowly.

Thoughts?
-Sam
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Gregory Farnum
On Fri, May 29, 2015 at 2:47 PM, Samuel Just sj...@redhat.com wrote:
 Many people have reported that they need to lower the osd recovery config 
 options to minimize the impact of recovery on client io.  We are talking 
 about changing the defaults as follows:

 osd_max_backfills to 1 (from 10)
 osd_recovery_max_active to 3 (from 15)
 osd_recovery_op_priority to 1 (from 10)
 osd_recovery_max_single_start to 1 (from 5)

I'm under the (possibly erroneous) impression that reducing the number
of max backfills doesn't actually reduce recovery speed much (but will
reduce memory use), but that dropping the op priority can. I'd rather
we make users manually adjust values which can have a material impact
on their data safety, even if most of them choose to do so.

After all, even under our worst behavior we're still doing a lot
better than a resilvering RAID array. ;)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Somnath Roy
Sam,
We are seeing some good client IO results during recovery by using the 
following values:

osd recovery max active = 1
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1

It is all flash though.  The recovery time in the case of an entire node 
(~120 TB) failure or a single drive (~8 TB) failure is also not too bad with 
the above settings.

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
Sent: Friday, May 29, 2015 2:47 PM
To: ceph-devel; 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)
Subject: Discuss: New default recovery config settings

Many people have reported that they need to lower the osd recovery config 
options to minimize the impact of recovery on client io.  We are talking about 
changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

We'd like a bit of feedback first though.  Is anyone happy with the current 
configs?  Is anyone using something between these values and the current 
defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 
1 is probably a good idea, but I wonder whether lowering 
osd_recovery_max_active and osd_recovery_max_single_start will cause small 
objects to recover unacceptably slowly.

Thoughts?
-Sam
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] newstore configuration

2015-05-29 Thread Srikanth Madugundi
Hi,

I have set up a cluster with the newstore functionality and see that files sized
under 100KB are stored in the DB and files over 100KB are stored in the fragments
directory.

Is there a way to change this threshold value in ceph.conf?

Regards
Srikanth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer 0.94.1 - install-deps.sh script error

2015-05-29 Thread Dyweni - Ceph-Users

Looks good to me.

Dyweni


On 2015-05-29 17:08, Loic Dachary wrote:

Hi,

On 28/05/2015 05:13, Dyweni - Ceph-Users wrote:

Hi Guys,

Running the install-deps.sh script on Debian Squeeze results in the 
package 'cryptsetup-bin' not being found (and 'cryptsetup' not being 
used).


This is due to the pipe character being deleted.

To fix this, I replaced this line:
-e 's/\|//g;' \
with this line:
-e 's/\s*\|\s*/\\\|/g;' \




Nice catch :-) Does that look right ?

https://github.com/ceph/ceph/pull/4799/files#diff-47a21b3706c13e08943e223c12323aa1L45

it would be great if you could try it, for instance with

wget -O loic-install-deps.sh \
https://raw.githubusercontent.com/dachary/ceph/wip-install-deps/install-deps.sh
bash -x loic-install-deps.sh

Cheers


Thought you'd like to include this into the main line code.

(FYI, This is somewhat related to this bug:  
http://tracker.ceph.com/issues/4943)


Thanks,
Dyweni





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Loïc Dachary, Artisan Logiciel Libre

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS interaction with RBD

2015-05-29 Thread John-Paul Robinson
In the end this came down to one slow OSD.  There were no hardware
issues, so I have to assume something gummed up during rebalancing and
peering.

I restarted the osd process after setting the cluster to noout.  After
the osd was restarted the rebalance completed and the cluster returned
to health ok.

As soon as the osd restarted all previously hanging operations returned
to normal.
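
For reference, that sequence looks roughly like this (osd.53 is only an
illustrative ID, and the exact restart command depends on the init system):

 ceph osd set noout
 service ceph restart osd.53
 # once the cluster settles, clear the flag again:
 ceph osd unset noout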

I'm surprised by a single slow OSD impacting access to the entire
cluster.  I understand now that only the primary OSD is used for reads,
and that writes must go to the primary and then the secondaries, but I
would have expected the impact to be more contained.

We currently build XFS file systems directly on RBD images.  I'm
wondering if there would be any value in using an LVM abstraction on top
to spread access across more OSDs for read and failure scenarios.
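
Something like the following is what I have in mind (device and volume group
names are purely illustrative; each /dev/rbdX is a separately mapped image):

 pvcreate /dev/rbd0 /dev/rbd1 /dev/rbd2
 vgcreate vg_rstore /dev/rbd0 /dev/rbd1 /dev/rbd2
 # 3-way striped LV across the images (default stripe size)
 lvcreate -i 3 -l 100%FREE -n lv_export vg_rstore
 mkfs.xfs /dev/vg_rstore/lv_export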

Any thoughts on the above appreciated.

~jpr


On 05/28/2015 03:18 PM, John-Paul Robinson wrote:
 To follow up on the original post,

 Further digging indicates this is a problem with RBD image access and
 is not related to NFS-RBD interaction as initially suspected.  The
 nfsd is simply hanging as a result of a hung request to the XFS file
 system mounted on our RBD-NFS gateway.  This hung XFS call is caused
 by a problem with the RBD module interacting with our Ceph pool.

 I've found a reliable way to trigger a hang directly on an rbd image
 mapped into our RBD-NFS gateway box.  The image contains an XFS file
 system.  When I try to list the contents of a particular directory,
 the request hangs indefinitely.

 Two weeks ago our ceph status was:

 jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova status
health HEALTH_WARN 1 near full osd(s)
monmap e1: 3 mons at
 
 {da0-36-9f-0e-28-2c=172.16.171.6:6789/0,da0-36-9f-0e-2b-88=172.16.171.5:6789/0,da0-36-9f-0e-2b-a0=172.16.171.4:6789/0},
 election epoch 350, quorum 0,1,2
 da0-36-9f-0e-28-2c,da0-36-9f-0e-2b-88,da0-36-9f-0e-2b-a0
osdmap e5978: 66 osds: 66 up, 66 in
 pgmap v26434260: 3072 pgs: 3062 active+clean, 6
 active+clean+scrubbing, 4 active+clean+scrubbing+deep; 45712 GB
 data, 91590 GB used, 51713 GB / 139 TB avail; 12234B/s wr, 1op/s
mdsmap e1: 0/0/1 up


 The near full osd was number 53 and we updated our crush map to
 reweight the osd.  All of the OSDs had a weight of 1 based on the
 assumption that all osds were 2.0TB.  Apparently one of our servers had
 the OSDs sized at 2.8TB and this caused the OSD imbalance even though
 we are only at 50% utilization.  We reweighted the near full osd to .8
 and that initiated a rebalance that has since relieved the 95% full
 condition on that OSD.

 However, since that time the re-peering has not completed and we
 suspect this is causing problems with our access to RBD images.  Our
 current ceph status is:

 jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova status
health HEALTH_WARN 1 pgs peering; 1 pgs stuck inactive; 4 pgs
 stuck unclean; recovery 9/23842120 degraded (0.000%)
monmap e1: 3 mons at
 
 {da0-36-9f-0e-28-2c=172.16.171.6:6789/0,da0-36-9f-0e-2b-88=172.16.171.5:6789/0,da0-36-9f-0e-2b-a0=172.16.171.4:6789/0},
 election epoch 350, quorum 0,1,2
 da0-36-9f-0e-28-2c,da0-36-9f-0e-2b-88,da0-36-9f-0e-2b-a0
osdmap e6036: 66 osds: 66 up, 66 in
 pgmap v27104371: 3072 pgs: 3 active, 3056 active+clean, 9
 active+clean+scrubbing, 1 remapped+peering, 3
 active+clean+scrubbing+deep; 45868 GB data, 92006 GB used, 51297
 GB / 139 TB avail; 3125B/s wr, 0op/s; 9/23842120 degraded (0.000%)
mdsmap e1: 0/0/1 up


 Here are further details on our stuck pgs:

 jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova pg
 dump_stuck inactive
 ok
 pg_stat  objects  mip  degr  unf  bytes  log  disklog
 state  state_stamp  v  reported  up  acting
 last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
 3.3af   11600   0   0   0   47941791744 153812 
 153812  remapped+peering  2015-05-15 12:47:17.223786 
 5979'293066  6000'1248735 [48,62] [53,48,62] 
 5979'293056 2015-05-15 07:40:36.275563  5979'293056
 2015-05-15 07:40:36.275563

 jpr@rcs-02:~/projects/rstore-utils$ sudo ceph --id nova pg
 dump_stuck unclean
 ok
 pg_stat  objects  mip  degr  unf  bytes  log  disklog
 state  state_stamp  v  reported  up  acting
 last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
 3.106   11870   0   9   0   49010106368 163991 
 163991  active  2015-05-15 12:47:19.761469  6035'356332
 5968'1358516 [62,53]  [62,53] 5979'356242 2015-05-14
 22:22:12.966150  5979'351351 2015-05-12 18:04:41.838686
 5.104   0   0   0   0   0   0   0  
 active  2015-05-15 12:47:19.800676  0'0 5968'1615  
 

[ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Samuel Just
Many people have reported that they need to lower the osd recovery config 
options to minimize the impact of recovery on client io.  We are talking about 
changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

We'd like a bit of feedback first though.  Is anyone happy with the current 
configs?  Is anyone using something between these values and the current 
defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 
1 is probably a good idea, but I wonder whether lowering 
osd_recovery_max_active and osd_recovery_max_single_start will cause small 
objects to recover unacceptably slowly.

Thoughts?
-Sam
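
For anyone who wants to try the proposed values on a live cluster, a sketch
(ceph.conf for persistence; the runtime injection is optional and does not
survive an OSD restart):

 [osd]
 osd max backfills = 1
 osd recovery max active = 3
 osd recovery op priority = 1
 osd recovery max single start = 1

 # or injected at runtime on all OSDs:
 ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3 --osd-recovery-op-priority 1 --osd-recovery-max-single-start 1'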
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com