Jerasure 1.2A plugin for Ceph

2013-08-30 Thread Loic Dachary
Hi James,

The first version of the jerasure 1.2A plugin for Ceph is complete at

https://github.com/ceph/ceph/pull/538#commits-pushed-763275e

This commit introduces the main part:
ErasureCodeJerasure: base class for jerasure ErasureCodeInterface
https://github.com/dachary/ceph/commit/76d2842358465e560a4929d60131762f8c93804f

and each technique is derived from it in six successive commits, starting from 
here
ErasureCodeJerasure: define technique ReedSolomonVandermonde

It would be great if you could take a look and let us know if you see anything 
odd.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.





Re: radosgw 0.67.2 update - ERROR: failed to initialize watch

2013-08-30 Thread Sylvain Munaut
Hi,


 I just pushed a fix to wip-6161, can you verify that it fixes the issue
 for you?

Thanks, I'll give it a shot on Monday; I'm out of the office at the moment.


Cheers,

Sylvain


libvirt: Using rbd_create3 to create format 2 images

2013-08-30 Thread Wido den Hollander

Hi,

I created the attached patch to have libvirt create images with format 2 
by default. This would simplify the CloudStack code and could also help 
other projects.


The problem with libvirt is that there is no mechanism to supply 
information like order, features, stripe unit and count to the 
rbd_create3 method, so it's now hardcoded in libvirt.


Any comments on this patch before I fire it off to the libvirt guys?

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
From 2731f7c131d938ed5029bf8343877fcc4d950a0f Mon Sep 17 00:00:00 2001
From: Wido den Hollander w...@widodh.nl
Date: Fri, 30 Aug 2013 10:50:25 +0200
Subject: [PATCH] rbd: Use rbd_create3 to create RBD format 2 images by
 default

This new RBD format supports snapshotting and cloning. By having
libvirt create images in format 2, end-users of the created images
can benefit from the new RBD format.

Signed-off-by: Wido den Hollander w...@widodh.nl
---
 src/storage/storage_backend_rbd.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
index d9e1789..e5d720e 100644
--- a/src/storage/storage_backend_rbd.c
+++ b/src/storage/storage_backend_rbd.c
@@ -443,6 +443,9 @@ static int virStorageBackendRBDCreateVol(virConnectPtr conn,
     ptr.cluster = NULL;
     ptr.ioctx = NULL;
     int order = 0;
+    uint64_t features = 3;
+    uint64_t stripe_count = 1;
+    uint64_t stripe_unit = 4194304;
     int ret = -1;
 
     VIR_DEBUG("Creating RBD image %s/%s with size %llu",
@@ -467,7 +470,8 @@ static int virStorageBackendRBDCreateVol(virConnectPtr conn,
         goto cleanup;
     }
 
-    if (rbd_create(ptr.ioctx, vol->name, vol->capacity, &order) < 0) {
+    if (rbd_create3(ptr.ioctx, vol->name, vol->capacity, features, &order,
+                    stripe_count, stripe_unit) < 0) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("failed to create volume '%s/%s'"),
                        pool->def->source.name,
-- 
1.7.9.5



RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread James Harper
I finally got a valgrind memtest hit... output attached below email. I 
recompiled all of tapdisk and ceph without any -O options (thought I had 
already...) and it seems to have done the trick

Basically it looks like an instance of AioRead is being accessed after being 
free'd. I need some hints on what api behaviour by the tapdisk driver could be 
causing this to happen in librbd...

thanks

James

==25078== Memcheck, a memory error detector
==25078== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==25078== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==25078== Command: /usr/bin/tapdisk.clean
==25078== Parent PID: 25077
==25078==
==25078==
==25078== HEAP SUMMARY:
==25078== in use at exit: 6,808 bytes in 7 blocks
==25078==   total heap usage: 7 allocs, 0 frees, 6,808 bytes allocated
==25078==
==25078== For a detailed leak analysis, rerun with: --leak-check=full
==25078==
==25078== For counts of detected and suppressed errors, rerun with: -v
==25078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
==25081== Warning: noted but unhandled ioctl 0xd0 with no size/direction hints
==25081==This could cause spurious value errors to appear.
==25081==See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a 
proper wrapper.
==25081== Syscall param ioctl(FIBMAP) points to unaddressable byte(s)
==25081==at 0x75F1AC7: ioctl (syscall-template.S:82)
==25081==by 0x4088DF: tapdisk_blktap_complete_request (tapdisk-blktap.c:150)
==25081==by 0x40802C: tapdisk_vbd_kick (tapdisk-vbd.c:1441)
==25081==by 0x40E684: tapdisk_server_iterate (tapdisk-server.c:211)
==25081==by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==by 0x4039BF: main (tapdisk2.c:150)
==25081==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==25081==
==25081== Invalid read of size 8
==25081==at 0x7044DB6: librbd::AioRead::send() (AioRequest.cc:106)
==25081==by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, 
std::vector<std::pair<unsigned long, unsigned long>, 
std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, 
ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, 
unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) 
(internal.cc:3032)
==25081==by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
==25081==  Address 0xfe79b38 is 8 bytes inside a block of size 248 free'd
==25081==at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
==25081==by 0x7046859: librbd::AioRead::~AioRead() (AioRequest.h:74)
==25081==by 0x70426E6: librbd::AioRequest::complete(int) (AioRequest.h:41)
==25081==by 0x7074323: librbd::rados_req_cb(void*, void*) (internal.cc:2751)
==25081==by 0x5FD191A: librados::C_AioComplete::finish(int) 
(AioCompletionImpl.h:181)
==25081==by 0x5F907E0: Context::complete(int) (Context.h:42)
==25081==by 0x6066CEF: Finisher::finisher_thread_entry() (Finisher.cc:56)
==25081==by 0x5FB81D3: Finisher::FinisherThread::entry() (Finisher.h:46)
==25081==by 0x62C89E0: Thread::_entry_func(void*) (Thread.cc:41)
==25081==by 0x7308B4F: start_thread (pthread_create.c:304)
==25081==by 0x75F8A7C: clone (clone.S:112)
==25081==
==25081== Invalid read of size 8
==25081==at 0x7044DBA: librbd::AioRead::send() (AioRequest.cc:106)
==25081==by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, 
std::vector<std::pair<unsigned long, unsigned long>, 
std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, 
ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
==25081==by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, 
unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) 
(internal.cc:3032)
==25081==by 0x703EF75: rbd_aio_read (librbd.cc:1117)
==25081==by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
==25081==by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
==25081==by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
==25081==by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
==25081==by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
==25081==by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
==25081==by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
==25081==by 0x40E864: tapdisk_server_run 

collectd plugin with cuttlefish

2013-08-30 Thread Damien Churchill
Hi,

Has anything changed with the admin socket that would prevent the
collectd plugin (compiled against 5.3.1 using the patches submitted to
the collectd ML) from gathering stats? I've recompiled collectd with
--enable-debug and receive the following output in the log:

ceph_init
name=mon_ceph1, asok_path=/var/run/ceph/ceph-mon.ceph1.asok
entering cconn_main_loop(request_type = 0)
did cconn_prepare(name=mon_ceph1,i=0,st=1)
cconn_handle_event(name=mon_ceph1,state=1,amt=0,ret=4)
did cconn_prepare(name=mon_ceph1,i=0,st=2)
did cconn_prepare(name=mon_ceph1,i=0,st=2)
ERROR: cconn_main_loop: timed out.
cconn_main_loop: reached all Ceph daemons :)
Initialization of plugin `ceph' failed with status -110. Plugin will
be unloaded.
plugin_unregister_read: Marked `ceph' for removal.

Thanks in advance!


ceph s3 allowed characters

2013-08-30 Thread Dominik Mostowiec
Hi,
I got err (400) from radosgw on request:
2013-08-30 08:09:19.396812 7f3b307c0700  2 req 3070:0.000150::POST
/dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%283%29.pdf::http
status=400
2013-08-30 08:09:34.851892 7f3b55ffb700 10
s->object=files/test.t...@op.pl/DOMIWENT 2013/Damian
DW/dw/Specyfikacja istotnych warunkF3w zamF3wienia.doc
s->bucket=dysk

What is the allowed range of characters in a URL in radosgw?

-- 
Regards
Dominik


Re: [ceph-users] ceph s3 allowed characters

2013-08-30 Thread Dominik Mostowiec
(echo -n 'GET 
/dysk/files/test.test%40op.pl/DOMIWENT%202013/Damian%20DW/dw/Specyfikacja%20istotnych%20warunk%F3w%20zam%F3wienia.doc
HTTP/1.0'; printf "\r\n\r\n") | nc localhost 88
HTTP/1.1 400 Bad Request
Date: Fri, 30 Aug 2013 14:10:07 GMT
Server: Apache/2.2.22 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 83
Connection: close
Content-Type: application/xml

<?xml version="1.0"
encoding="UTF-8"?><Error><Code>InvalidObjectName</Code></Error>

Full log from radosgw another error:

2013-08-30 14:32:52.166321 7f42e77d6700  1 == starting new request
req=0x12cff20 =
2013-08-30 14:32:52.166385 7f42e77d6700  2 req 33246:0.65initializing
2013-08-30 14:32:52.166410 7f42e77d6700 10 meta HTTP_X_AMZ_ACL=public-read
2013-08-30 14:32:52.166419 7f42e77d6700 10 x x-amz-acl:public-read
2013-08-30 14:32:52.166497 7f42e77d6700 10
s->object=files/test.t...@op.pl/DOMIWENT 2013/DW 2013_03_27/PROJEKTY
2012/ZB KROL/Szkoła Łaziska ZB
KROL/sala-A3aziska_Dolne_PB-0_went_15_11_06 Layou
t1 (4).pdf s->bucket=dysk
2013-08-30 14:32:52.166563 7f42e77d6700  2 req 33246:0.000242::POST
/dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL
/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%284%29.pdf::http
status=400
2013-08-30 14:32:52.166653 7f42e77d6700  1 == req done
req=0x12cff20 http_status=400 ==

--
Dominik

2013/8/30 Alfredo Deza alfredo.d...@inktank.com:



 On Fri, Aug 30, 2013 at 9:52 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:

 Hi,
 I got err (400) from radosgw on request:
 2013-08-30 08:09:19.396812 7f3b307c0700  2 req 3070:0.000150::POST

 /dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%283%29.pdf::http
 status=400
 2013-08-30 08:09:34.851892 7f3b55ffb700 10
 s-object=files/test.t...@op.pl/DOMIWENT 2013/Damian
 DW/dw/Specyfikacja istotnych warunkF3w zamF3wienia.doc
 s-bucket=dysk

 What is allowed range of chars in url in radosgw?


 Can you post the full HTTP headers for the response?

 The output you are pasting is not entirely clear to me, is that a single log
 line for the whole request? Maybe it is just the formatting that
 is throwing me off.


 --
 Regards
 Dominik
 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 
Regards
Dominik


Re: [ceph-users] ceph s3 allowed characters

2013-08-30 Thread Yehuda Sadeh
On Fri, Aug 30, 2013 at 7:44 AM, Dominik Mostowiec
dominikmostow...@gmail.com wrote:
 (echo -n 'GET 
 /dysk/files/test.test%40op.pl/DOMIWENT%202013/Damian%20DW/dw/Specyfikacja%20istotnych%20warunk%F3w%20zam%F3wienia.doc
 HTTP/1.0'; printf "\r\n\r\n") | nc localhost 88
 HTTP/1.1 400 Bad Request
 Date: Fri, 30 Aug 2013 14:10:07 GMT
 Server: Apache/2.2.22 (Ubuntu)
 Accept-Ranges: bytes
 Content-Length: 83
 Connection: close
 Content-Type: application/xml

 <?xml version="1.0"
 encoding="UTF-8"?><Error><Code>InvalidObjectName</Code></Error>



The object name needs to be utf8 encoded.

Yehuda
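
For illustration, here is a minimal C sketch (not from the thread) of the
difference Yehuda is pointing at: the failing URLs above contain %F3, the
Latin-1 byte for 'ó', whereas a UTF-8 encoded object name percent-encodes
the same character as %C3%B3. The helper below is a generic percent-encoder
written for this example, not radosgw code.

/*
 * Illustration only: percent-encode the same object name fragment from
 * Latin-1 bytes vs. UTF-8 bytes.  radosgw accepts the UTF-8 form.
 */
#include <stdio.h>
#include <string.h>

static void percent_encode(const unsigned char *in, char *out, size_t outlen)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    for (; *in && o + 4 < outlen; in++) {
        if ((*in >= 'A' && *in <= 'Z') || (*in >= 'a' && *in <= 'z') ||
            (*in >= '0' && *in <= '9') || strchr("-_.~/", *in)) {
            out[o++] = *in;              /* unreserved: copy as-is */
        } else {
            out[o++] = '%';              /* everything else: %XX */
            out[o++] = hex[*in >> 4];
            out[o++] = hex[*in & 0x0f];
        }
    }
    out[o] = '\0';
}

int main(void)
{
    /* "zamówienia" with the ó as one Latin-1 byte vs. two UTF-8 bytes */
    const unsigned char latin1[] = { 'z','a','m', 0xF3, 'w','i','e','n','i','a', 0 };
    const unsigned char utf8[]   = { 'z','a','m', 0xC3, 0xB3, 'w','i','e','n','i','a', 0 };
    char buf[128];

    percent_encode(latin1, buf, sizeof(buf));
    printf("latin-1: %s\n", buf);        /* zam%F3wienia    -> rejected (400) */
    percent_encode(utf8, buf, sizeof(buf));
    printf("utf-8:   %s\n", buf);        /* zam%C3%B3wienia -> valid object name */
    return 0;
}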


RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread Sage Weil
On Fri, 30 Aug 2013, James Harper wrote:
 I finally got a valgrind memtest hit... output attached below email. I 
 recompiled all of tapdisk and ceph without any -O options (thought I had 
 already...) and it seems to have done the trick

What version is this?  The line numbers don't seem to match up with my 
source tree.
 
 Basically it looks like an instance of AioRead is being accessed after 
 being free'd. I need some hints on what api behaviour by the tapdisk 
 driver could be causing this to happen in librbd...

It looks like refcounting for the AioCompletion is off.  My first guess 
would be premature (or extra) calls to rados_aio_release or 
AioCompletion::release().

I did a quick look at the code and it looks like aio_read() is carrying a 
ref for the AioCompletion for the entire duration of the function, so it 
should not be disappearing (and taking the AioRead request struct with it) 
until well after where the invalid read is.  Maybe there is an error path 
somewhere that is dropping a ref it shouldn't?

sage
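
As a reference point, here is a minimal sketch of the completion lifetime
Sage describes, using the librbd C aio API (rbd_aio_create_completion,
rbd_aio_read, rbd_aio_release). The struct and function names are
hypothetical and this is not the tapdisk driver code; the point is only
that each completion is released exactly once, either in the callback or
on the submission-error path, never both.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <rbd/librbd.h>

struct my_request {                /* hypothetical per-request state */
    char  *buf;
    size_t len;
};

static void read_done(rbd_completion_t c, void *arg)
{
    struct my_request *req = arg;
    ssize_t r = rbd_aio_get_return_value(c);

    if (r < 0)
        fprintf(stderr, "aio read failed: %zd\n", r);
    else
        printf("read %zu bytes\n", req->len);

    rbd_aio_release(c);            /* the one and only release of this completion */
    free(req->buf);
    free(req);
}

static int submit_read(rbd_image_t image, uint64_t off, size_t len)
{
    struct my_request *req;
    rbd_completion_t c;
    int r;

    if (!(req = calloc(1, sizeof(*req))) || !(req->buf = malloc(len))) {
        free(req);
        return -1;
    }
    req->len = len;

    r = rbd_aio_create_completion(req, read_done, &c);
    if (r < 0) {
        free(req->buf);
        free(req);
        return r;
    }

    r = rbd_aio_read(image, off, len, req->buf, c);
    if (r < 0) {
        rbd_aio_release(c);        /* submit failed: the callback never fires,
                                      so release here instead, never in both places */
        free(req->buf);
        free(req);
    }
    return r;
}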


 
 thanks
 
 James
 
 ==25078== Memcheck, a memory error detector
 ==25078== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
 ==25078== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
 ==25078== Command: /usr/bin/tapdisk.clean
 ==25078== Parent PID: 25077
 ==25078==
 ==25078==
 ==25078== HEAP SUMMARY:
 ==25078== in use at exit: 6,808 bytes in 7 blocks
 ==25078==   total heap usage: 7 allocs, 0 frees, 6,808 bytes allocated
 ==25078==
 ==25078== For a detailed leak analysis, rerun with: --leak-check=full
 ==25078==
 ==25078== For counts of detected and suppressed errors, rerun with: -v
 ==25078== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
 ==25081== Warning: noted but unhandled ioctl 0xd0 with no size/direction hints
 ==25081==This could cause spurious value errors to appear.
 ==25081==See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a 
 proper wrapper.
 ==25081== Syscall param ioctl(FIBMAP) points to unaddressable byte(s)
 ==25081==at 0x75F1AC7: ioctl (syscall-template.S:82)
 ==25081==by 0x4088DF: tapdisk_blktap_complete_request 
 (tapdisk-blktap.c:150)
 ==25081==by 0x40802C: tapdisk_vbd_kick (tapdisk-vbd.c:1441)
 ==25081==by 0x40E684: tapdisk_server_iterate (tapdisk-server.c:211)
 ==25081==by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
 ==25081==by 0x4039BF: main (tapdisk2.c:150)
 ==25081==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
 ==25081==
 ==25081== Invalid read of size 8
 ==25081==at 0x7044DB6: librbd::AioRead::send() (AioRequest.cc:106)
 ==25081==by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, 
 std::vector<std::pair<unsigned long, unsigned long>, 
 std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, 
 ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
 ==25081==by 0x7076330: librbd::aio_read(librbd::ImageCtx*, unsigned long, 
 unsigned long, char*, ceph::buffer::list*, librbd::AioCompletion*) 
 (internal.cc:3032)
 ==25081==by 0x703EF75: rbd_aio_read (librbd.cc:1117)
 ==25081==by 0x41FDA4: tdrbd_submit_request (block-rbd.c:540)
 ==25081==by 0x42004A: tdrbd_queue_request (block-rbd.c:659)
 ==25081==by 0x40602A: tapdisk_vbd_issue_request (tapdisk-vbd.c:1244)
 ==25081==by 0x4062FA: tapdisk_vbd_issue_new_requests (tapdisk-vbd.c:1340)
 ==25081==by 0x407C27: tapdisk_vbd_issue_requests (tapdisk-vbd.c:1403)
 ==25081==by 0x407DBA: tapdisk_vbd_check_state (tapdisk-vbd.c:891)
 ==25081==by 0x40E62C: tapdisk_server_iterate (tapdisk-server.c:220)
 ==25081==by 0x40E864: tapdisk_server_run (tapdisk-server.c:334)
 ==25081==  Address 0xfe79b38 is 8 bytes inside a block of size 248 free'd
 ==25081==at 0x4C279DC: operator delete(void*) (vg_replace_malloc.c:457)
 ==25081==by 0x7046859: librbd::AioRead::~AioRead() (AioRequest.h:74)
 ==25081==by 0x70426E6: librbd::AioRequest::complete(int) (AioRequest.h:41)
 ==25081==by 0x7074323: librbd::rados_req_cb(void*, void*) 
 (internal.cc:2751)
 ==25081==by 0x5FD191A: librados::C_AioComplete::finish(int) 
 (AioCompletionImpl.h:181)
 ==25081==by 0x5F907E0: Context::complete(int) (Context.h:42)
 ==25081==by 0x6066CEF: Finisher::finisher_thread_entry() (Finisher.cc:56)
 ==25081==by 0x5FB81D3: Finisher::FinisherThread::entry() (Finisher.h:46)
 ==25081==by 0x62C89E0: Thread::_entry_func(void*) (Thread.cc:41)
 ==25081==by 0x7308B4F: start_thread (pthread_create.c:304)
 ==25081==by 0x75F8A7C: clone (clone.S:112)
 ==25081==
 ==25081== Invalid read of size 8
 ==25081==at 0x7044DBA: librbd::AioRead::send() (AioRequest.cc:106)
 ==25081==by 0x7076EDE: librbd::aio_read(librbd::ImageCtx*, 
 std::vector<std::pair<unsigned long, unsigned long>, 
 std::allocator<std::pair<unsigned long, unsigned long> > > const&, char*, 
 ceph::buffer::list*, librbd::AioCompletion*) (internal.cc:3096)
 ==25081==by 0x7076330: 

Re: libvirt: Using rbd_create3 to create format 2 images

2013-08-30 Thread Josh Durgin

On 08/30/2013 02:42 AM, Wido den Hollander wrote:

Hi,

I created the attached patch to have libvirt create images with format 2
by default, this would simplify the CloudStack code and could also help
other projects.

The problem with libvirt is that there is no mechanism to supply
information like order, features, stripe unit and count to the
rbd_create3 method, so it's now hardcoded in libvirt.

Any comments on this patch before I fire it of to the libvirt guys?


Seems ok to me. They might want you to detect whether the function is
there and compile without it if librbd doesn't support it (rbd_create3
first appeared in bobtail).



Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Mike Dawson
We've been struggling with an issue of spikes of high i/o latency with 
qemu/rbd guests. As we've been chasing this bug, we've greatly improved 
the methods we use to monitor our infrastructure.


It appears that our RBD performance chokes in two situations:

- Deep-Scrub
- Backfill/recovery

In this email, I want to focus on deep-scrub. Graphing '% Util' from 
'iostat -x' on my hosts with OSDs, I can see Deep-Scrub take my disks 
from around 10% utilized to complete saturation during a scrub.


RBD writeback cache appears to cover the issue nicely, but occasionally 
suffers drops in performance (presumably when it flushes). But, reads 
appear to suffer greatly, with multiple seconds of 0B/s of reads 
accomplished (see log fragment below). If I make the assumption that 
deep-scrub isn't intended to create massive spindle contention, this 
appears to be a problem. What should happen here?


Looking at the settings around deep-scrub, I don't see an obvious way to 
say "don't saturate my drives". Are there any settings in Ceph or 
otherwise (readahead?) that might lower the burden of deep-scrub?


If not, perhaps reads could be remapped to avoid waiting on saturated 
disks during scrub.


Any ideas?

2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665 
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665 
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 3788KB/s wr, 239op/s
2013-08-30 15:47:38.120343 mon.0 [INF] pgmap v9853945: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 4671KB/s wr, 239op/s
2013-08-30 15:47:39.546980 mon.0 [INF] pgmap v9853946: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 13487KB/s wr, 444op/s
2013-08-30 15:47:40.561203 mon.0 [INF] pgmap v9853947: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 15265KB/s wr, 489op/s
2013-08-30 15:47:41.794355 mon.0 [INF] pgmap v9853948: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 7157KB/s wr, 240op/s
2013-08-30 15:47:44.661000 mon.0 [INF] pgmap v9853949: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB 

Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Andrey Korolyov
You may want to reduce the number of scrubbing PGs per OSD to 1 using a
config option and check the results.

On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
 We've been struggling with an issue of spikes of high i/o latency with
 qemu/rbd guests. As we've been chasing this bug, we've greatly improved the
 methods we use to monitor our infrastructure.

 It appears that our RBD performance chokes in two situations:

 - Deep-Scrub
 - Backfill/recovery

 In this email, I want to focus on deep-scrub. Graphing '% Util' from 'iostat
 -x' on my hosts with OSDs, I can see Deep-Scrub take my disks from around
 10% utilized to complete saturation during a scrub.

 RBD writeback cache appears to cover the issue nicely, but occasionally
 suffers drops in performance (presumably when it flushes). But, reads appear
 to suffer greatly, with multiple seconds of 0B/s of reads accomplished (see
 log fragment below). If I make the assumption that deep-scrub isn't intended
 to create massive spindle contention, this appears to be a problem. What
 should happen here?

 Looking at the settings around deep-scrub, I don't see an obvious way to say
 don't saturate my drives. Are there any setting in Ceph or otherwise
 (readahead?) that might lower the burden of deep-scrub?

 If not, perhaps reads could be remapped to avoid waiting on saturated disks
 during scrub.

 Any ideas?

 2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
 2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
 2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
 2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
 2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
 2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
 2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
 2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
 2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
 2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
 2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
 2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
 2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
 2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 3788KB/s wr, 239op/s
 2013-08-30 15:47:38.120343 mon.0 [INF] pgmap v9853945: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 4671KB/s wr, 239op/s
 2013-08-30 15:47:39.546980 mon.0 [INF] pgmap v9853946: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 13487KB/s wr, 444op/s
 2013-08-30 15:47:40.561203 mon.0 [INF] pgmap v9853947: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 15265KB/s wr, 489op/s
 2013-08-30 15:47:41.794355 mon.0 [INF] pgmap v9853948: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 

Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Mike Dawson

Andrey,

I use all the defaults:

# ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
  osd_scrub_thread_timeout: 60,
  osd_scrub_finalize_thread_timeout: 600,
  osd_max_scrubs: 1,
  osd_scrub_load_threshold: 0.5,
  osd_scrub_min_interval: 86400,
  osd_scrub_max_interval: 604800,
  osd_scrub_chunk_min: 5,
  osd_scrub_chunk_max: 25,
  osd_deep_scrub_interval: 604800,
  osd_deep_scrub_stride: 524288,

Which value are you referring to?


Does anyone know exactly how osd scrub load threshold works? The 
manual states "The maximum CPU load. Ceph will not scrub when the CPU 
load is higher than this number. Default is 50%." So on a system with 
multiple processors and cores, what happens? Is the threshold a load of 
0.5 (meaning half a core), or 50% of max load, meaning anything less 
than 8 if you have 16 cores?


Thanks,
Mike Dawson
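
A hedged aside on the load-threshold question: as far as I can tell, the
check compares the raw 1-minute load average from getloadavg() against
osd_scrub_load_threshold, so 0.5 means "load average below 0.5" regardless
of core count, not 50% of the machine's capacity. Treat that as an
assumption to verify against OSD::scrub_should_schedule() in your tree;
the sketch below only illustrates the kind of check involved and is not
Ceph source.

#include <stdio.h>
#include <stdlib.h>

static int scrub_load_ok(double threshold)
{
    double loadavg[1];

    if (getloadavg(loadavg, 1) != 1)
        return 1;                  /* can't read the load: don't block scrubbing */

    /* absolute 1-minute load average, not a per-core percentage */
    return loadavg[0] < threshold;
}

int main(void)
{
    printf("scrub allowed by load threshold 0.5: %s\n",
           scrub_load_ok(0.5) ? "yes" : "no");
    return 0;
}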

On 8/30/2013 1:34 PM, Andrey Korolyov wrote:

You may want to reduce scrubbing pgs per osd to 1 using config option
and check the results.

On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

We've been struggling with an issue of spikes of high i/o latency with
qemu/rbd guests. As we've been chasing this bug, we've greatly improved the
methods we use to monitor our infrastructure.

It appears that our RBD performance chokes in two situations:

- Deep-Scrub
- Backfill/recovery

In this email, I want to focus on deep-scrub. Graphing '% Util' from 'iostat
-x' on my hosts with OSDs, I can see Deep-Scrub take my disks from around
10% utilized to complete saturation during a scrub.

RBD writeback cache appears to cover the issue nicely, but occasionally
suffers drops in performance (presumably when it flushes). But, reads appear
to suffer greatly, with multiple seconds of 0B/s of reads accomplished (see
log fragment below). If I make the assumption that deep-scrub isn't intended
to create massive spindle contention, this appears to be a problem. What
should happen here?

Looking at the settings around deep-scrub, I don't see an obvious way to say
don't saturate my drives. Are there any setting in Ceph or otherwise
(readahead?) that might lower the burden of deep-scrub?

If not, perhaps reads could be remapped to avoid waiting on saturated disks
during scrub.

Any ideas?

2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664
active+clean, 8 

Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Andrey Korolyov
On Fri, Aug 30, 2013 at 9:44 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
 Andrey,

 I use all the defaults:

 # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
   osd_scrub_thread_timeout: 60,
   osd_scrub_finalize_thread_timeout: 600,


   osd_max_scrubs: 1,

This one. I would suggest increasing max_interval and writing some kind
of script that triggers per-PG scrubs at low intensity, so you'll have one
scrubbing PG or less at any time and can wait a while before scrubbing
the next; that way they will not all start scrubbing at once when
max_interval expires. I discussed some throttling mechanisms for
scrubbing a few months ago here or on ceph-devel, but there is still no
such implementation (it is ultimately a low-priority task since it can
be handled by something as simple as the proposal above).

   osd_scrub_load_threshold: 0.5,
   osd_scrub_min_interval: 86400,
   osd_scrub_max_interval: 604800,
   osd_scrub_chunk_min: 5,
   osd_scrub_chunk_max: 25,
   osd_deep_scrub_interval: 604800,
   osd_deep_scrub_stride: 524288,

 Which value are you referring to?


 Does anyone know exactly how osd scrub load threshold works? The manual
 states The maximum CPU load. Ceph will not scrub when the CPU load is
 higher than this number. Default is 50%. So on a system with multiple
 processors and cores...what happens? Is the threshold .5 load (meaning half
 a core) or 50% of max load meaning anything less than 8 if you have 16
 cores?

 Thanks,
 Mike Dawson


 On 8/30/2013 1:34 PM, Andrey Korolyov wrote:

 You may want to reduce scrubbing pgs per osd to 1 using config option
 and check the results.

 On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com
 wrote:

 We've been struggling with an issue of spikes of high i/o latency with
 qemu/rbd guests. As we've been chasing this bug, we've greatly improved
 the
 methods we use to monitor our infrastructure.

 It appears that our RBD performance chokes in two situations:

 - Deep-Scrub
 - Backfill/recovery

 In this email, I want to focus on deep-scrub. Graphing '% Util' from
 'iostat
 -x' on my hosts with OSDs, I can see Deep-Scrub take my disks from around
 10% utilized to complete saturation during a scrub.

 RBD writeback cache appears to cover the issue nicely, but occasionally
 suffers drops in performance (presumably when it flushes). But, reads
 appear
 to suffer greatly, with multiple seconds of 0B/s of reads accomplished
 (see
 log fragment below). If I make the assumption that deep-scrub isn't
 intended
 to create massive spindle contention, this appears to be a problem. What
 should happen here?

 Looking at the settings around deep-scrub, I don't see an obvious way to
 say
 don't saturate my drives. Are there any setting in Ceph or otherwise
 (readahead?) that might lower the burden of deep-scrub?

 If not, perhaps reads could be remapped to avoid waiting on saturated
 disks
 during scrub.

 Any ideas?

 2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
 2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
 2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
 2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
 2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
 2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
 2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
 2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
 2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
 2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
 2013-08-30 15:47:32.388303 

Re: libvirt: Using rbd_create3 to create format 2 images

2013-08-30 Thread Wido den Hollander

On 08/30/2013 05:26 PM, Josh Durgin wrote:

On 08/30/2013 02:42 AM, Wido den Hollander wrote:

Hi,

I created the attached patch to have libvirt create images with format 2
by default, this would simplify the CloudStack code and could also help
other projects.

The problem with libvirt is that there is no mechanism to supply
information like order, features, stripe unit and count to the
rbd_create3 method, so it's now hardcoded in libvirt.

Any comments on this patch before I fire it of to the libvirt guys?


Seems ok to me. They might want you to detect whether the function is
there and compile without it if librbd doesn't support it (rbd_create3
first appeared in bobtail).



Good one. Although I don't think anybody is still running Argonaut, I'll 
do a version check of librbd and switch to rbd_create if needed.
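
For what it's worth, a rough sketch of what that fallback might look like.
This is not the actual libvirt patch: HAVE_RBD_CREATE3 is a hypothetical
macro that a configure-time check such as AC_CHECK_LIB([rbd], [rbd_create3])
could define, and the stripe argument order simply mirrors the patch above
(check the rbd_create3 prototype in your librbd.h).

#include <stdint.h>
#include <rbd/librbd.h>

static int my_rbd_create_vol(rados_ioctx_t ioctx, const char *name,
                             uint64_t capacity, int *order)
{
#if defined(HAVE_RBD_CREATE3)
    uint64_t features = 3;          /* layering + striping v2, as in the patch */
    uint64_t stripe_count = 1;
    uint64_t stripe_unit = 4194304; /* 4 MiB */

    /* format 2 image, arguments as in the patch above */
    return rbd_create3(ioctx, name, capacity, features, order,
                       stripe_count, stripe_unit);
#else
    /* librbd older than bobtail has no rbd_create3: fall back to format 1 */
    return rbd_create(ioctx, name, capacity, order);
#endif
}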



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


[ceph-users] ceph install

2013-08-30 Thread Jimmy Lu [ Storage ]
Hello ceph-users,

I am new to Ceph and would like to bring up a 5-node cluster for my PoC. I
am following the installation instructions at the link below and ran into a
problem that I am not sure how to deal with. Can someone please shed some light?

http://ceph.com/docs/master/install/rpm/

[root@cleverloadgen16 ceph]# ceph auth add client.radosgw.gateway
--in-file=/etc/ceph/keyring.radosgw.gateway
unable to find any monitors in conf. please specify monitors via -m
monaddr or -c ceph.conf
Error connecting to cluster: ObjectNotFound

[root@cleverloadgen16 ceph]# cat keyring.radosgw.gateway
[client.radosgw.gateway]
key = AQCC4yBSyMWQGBAADS7j7DnIZeGAZiaJFaM8Xw==
caps mon = allow rw
caps osd = allow rwx
[root@cleverloadgen16 ceph]#


Thanks,
jimmy
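
For what it's worth: that error just means the client cannot find any
monitor addresses, so either pass one on the command line, e.g.
"ceph -m <mon-ip>:6789 auth add ...", or add a monitor section to
/etc/ceph/ceph.conf along these lines (the host names and addresses below
are placeholders, not taken from the thread):

[global]
    mon host = 192.0.2.1,192.0.2.2,192.0.2.3

[mon.a]
    host = cephmon1
    mon addr = 192.0.2.1:6789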



RE: debugging librbd async - valgrind memtest hit

2013-08-30 Thread James Harper
 
 On Fri, 30 Aug 2013, James Harper wrote:
  I finally got a valgrind memtest hit... output attached below email. I
  recompiled all of tapdisk and ceph without any -O options (thought I had
  already...) and it seems to have done the trick
 
 What version is this?  The line numbers don't seem to match up with my
 source tree.

0.67.2, but I've peppered it with debug prints

  Basically it looks like an instance of AioRead is being accessed after
  being free'd. I need some hints on what api behaviour by the tapdisk
  driver could be causing this to happen in librbd...
 
 It looks like refcounting for the AioCompletion is off.  My first guess
 would be premature (or extra) calls to rados_aio_release or
 AioCompletion::release().
 
 I did a quick look at the code and it looks like aio_read() is carrying a
 ref for the AioComplete for the entire duration of the function, so it
 should not be disappearing (and taking the AioRead request struct with it)
 until well after where the invalid read is.  Maybe there is an error path
 somewhere what is dropping a ref it shouldn't?
 

I'll see if I can find a way to track that. It's the c->get() and c->put() that 
track this, right?
 
The crash seems a little bit different every time, so it could still be 
something stomping on memory, e.g. overwriting the ref count or something.

Thanks

James



Re: collectd plugin with cuttlefish

2013-08-30 Thread Dan Mick
It's a bit surprising that it broke with cuttlefish; something might 
have happened in dumpling, but we wouldn't expect changes in cuttlefish.

It looks like collectd just couldn't talk to the monitor properly.
Maybe look at the mon's log and see what it thinks it saw?

On 08/30/2013 05:04 AM, Damien Churchill wrote:

Hi,

Has anything changed with the admin socket that would prevent the
collectd plugin (compiled against 5.3.1 using the patches submitted to
the collectd ML) from gathering stats? I've recompiled collectd with
--enable-debug and receive the following output in the log:

ceph_init
name=mon_ceph1, asok_path=/var/run/ceph/ceph-mon.ceph1.asok
entering cconn_main_loop(request_type = 0)
did cconn_prepare(name=mon_ceph1,i=0,st=1)
cconn_handle_event(name=mon_ceph1,state=1,amt=0,ret=4)
did cconn_prepare(name=mon_ceph1,i=0,st=2)
did cconn_prepare(name=mon_ceph1,i=0,st=2)
ERROR: cconn_main_loop: timed out.
cconn_main_loop: reached all Ceph daemons :)
Initialization of plugin `ceph' failed with status -110. Plugin will
be unloaded.
plugin_unregister_read: Marked `ceph' for removal.

Thanks in advance!

