Re: [Ceph-qa] Bug #13191 CentOS 7 multipath test fail because libdevmapper version must be >= 1.02.89

2015-12-15 Thread Loic Dachary
[redirecting to ceph-devel].

Hi,

On 14/12/2015 21:20, Abe Asraoui wrote:
> Hi All,
> 
> Does anyone know if this bug # 13191 has been resolved ??

http://tracker.ceph.com/issues/13191 has not been resolved. Could you please 
comment on it? A short explanation of why you need it resolved will help.

Thanks !

> 
> 
> Thanks,
> Abe
> ___
> Ceph-qa mailing list
> ceph...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-qa-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-08 Thread Willem Jan Withagen

On 8-12-2015 01:29, Willem Jan Withagen wrote:

On 7-12-2015 23:19, Michal Jarzabek wrote:

Hi Willem,

If you look at line 411 and 412 you will have variables k and m
defined. They are not changed anywhere(I think), so the sizes must
be big enough. As Xinze mentioned just add const in front of it:
const int k = 12; const int m = 4; and it should fix the compile
error.

buffer::ptr enc[k + m] works with gcc, because of the compiler
extension, but it's not standard
c++(https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html)

I will submit patch to  change it.


That is exactly what I have done to get things compiling. Have not
yet gotten to the state that everything builds to start testing.


Testing has started

Not everything goes well, but it is getting there.
Had to disable rbd testing as that still does not get built.
But I think I saw a patch passing by indicating that it only gets built for Linux.

And the tests below took at least 7 hours to run on a
CPU: AMD Phenom(tm) II X6 1075T Processor (3013.83-MHz K8-class CPU).
So what I gather from that is that this is too long.

Some tests (like unittest_erasure_code_shec_thread) got killed because
they ran out of swap, others (unittest_on_exit) got a signal 6...
But then again that could be because the gtest ASSERT_DEATH stuff is
not supported under FreeBSD (according to Google).
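
For illustration, a minimal googletest sketch of how such a death test could be
guarded so that platforms without death-test support get a trivially passing
test instead of an abort (hypothetical code, not the actual unittest_on_exit source):

#include <cstdlib>
#include "gtest/gtest.h"

// Hypothetical example: only compile the ASSERT_DEATH test where googletest
// reports death-test support; elsewhere provide a harmless placeholder test.
#if GTEST_HAS_DEATH_TEST
TEST(OnExitDeathTest, AbortsAsExpected) {
  ASSERT_DEATH({ abort(); }, "");
}
#else
TEST(OnExitDeathTest, SkippedWithoutDeathTestSupport) {
  SUCCEED() << "death tests not supported on this platform";
}
#endif

int main(int argc, char **argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}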

--WjW



PASS: unittest_erasure_code_plugin
PASS: unittest_erasure_code
PASS: unittest_erasure_code_jerasure
PASS: unittest_erasure_code_plugin_jerasure
PASS: unittest_erasure_code_isa
PASS: unittest_erasure_code_plugin_isa
PASS: unittest_erasure_code_lrc
PASS: unittest_erasure_code_plugin_lrc
PASS: unittest_erasure_code_shec
PASS: unittest_erasure_code_shec_all
PASS: unittest_erasure_code_shec_thread
Killed
FAIL: unittest_erasure_code_shec_arguments
PASS: unittest_erasure_code_plugin_shec
PASS: unittest_erasure_code_example
PASS: unittest_librados
PASS: unittest_librados_config
PASS: unittest_journal
PASS: unittest_rbd_replay
PASS: unittest_encoding
PASS: unittest_base64
PASS: unittest_run_cmd
PASS: unittest_simple_spin
PASS: unittest_libcephfs_config
PASS: unittest_mon_moncap
PASS: unittest_mon_pgmap
PASS: unittest_ecbackend
PASS: unittest_osdscrub
PASS: unittest_pglog
PASS: unittest_hitset
PASS: unittest_osd_osdcap
PASS: unittest_pageset
PASS: unittest_chain_xattr
PASS: unittest_lfnindex
PASS: unittest_mds_authcap
PASS: unittest_addrs
PASS: unittest_bloom_filter
PASS: unittest_histogram
PASS: unittest_prioritized_queue
PASS: unittest_str_map
PASS: unittest_sharedptr_registry
PASS: unittest_shared_cache
PASS: unittest_sloppy_crc_map
PASS: unittest_util
PASS: unittest_crush_wrapper
PASS: unittest_crush
PASS: unittest_osdmap
PASS: unittest_workqueue
PASS: unittest_striper
PASS: unittest_prebufferedstreambuf
PASS: unittest_str_list
PASS: unittest_log
PASS: unittest_throttle
PASS: unittest_ceph_argparse
PASS: unittest_ceph_compatset
PASS: unittest_mds_types
PASS: unittest_osd_types
PASS: unittest_lru
PASS: unittest_io_priority
PASS: unittest_gather
PASS: unittest_signals
PASS: unittest_bufferlist
PASS: unittest_xlist
PASS: unittest_crc32c
PASS: unittest_arch
PASS: unittest_crypto
PASS: unittest_crypto_init
PASS: unittest_perf_counters
PASS: unittest_admin_socket
PASS: unittest_ceph_crypto
PASS: unittest_utf8
PASS: unittest_mime
PASS: unittest_escape
PASS: unittest_strtol
PASS: unittest_confutils
PASS: unittest_config
PASS: unittest_context
PASS: unittest_safe_io
PASS: unittest_heartbeatmap
PASS: unittest_formatter
PASS: unittest_daemon_config
PASS: unittest_ipaddr
PASS: unittest_texttable
Abort trap (core dumped)
FAIL: unittest_on_exit
PASS: unittest_readahead
PASS: unittest_tableformatter
PASS: unittest_bit_vector
FAIL: ceph-detect-init/run-tox.sh
FAIL: test/erasure-code/test-erasure-code.sh
FAIL: test/erasure-code/test-erasure-eio.sh


Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-07 Thread Willem Jan Withagen

On 5-12-2015 14:02, Xinze Chi (信泽) wrote:

I think "const int k = 12; const int m = 4" would pass the compile?



Are these sizes big enough??

--WjW


2015-12-05 20:56 GMT+08:00 Willem Jan Withagen <w...@digiware.nl>:

src/test/erasure-code/TestErasureCodeIsa.cc

contains snippets, function definition like:

buffer::ptr enc[k + m];
   // create buffers with a copy of the original data to be able to compare
it after decoding
   {
 for (int i = 0; i < (k + m); i++) {

Clang refuses because the [k+m] size in not known at compiletime.
Suggesting to tempate this.

How would one normally handle this?

I've temporarily made it fixed size 1024*1024.
But I'm not sure if that is big enough



Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-07 Thread Michal Jarzabek
Hi Willem,

If you look at lines 411 and 412 you will see variables k and m
defined. They are not changed anywhere (I think), so the sizes must be
big enough.
As Xinze mentioned just add const in front of it:
const int k = 12
const int m = 4
and it should fix the compile error.

buffer::ptr enc[k + m] works with gcc because of the compiler's
variable-length array extension, but it's not standard
C++ (https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html).

I will submit a patch to change it.
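
For illustration, a minimal standalone sketch of the difference (hypothetical
code, not the actual TestErasureCodeIsa.cc source; 'ptr' stands in for buffer::ptr):

// Stand-in type so the sketch compiles on its own.
struct ptr { };

// With plain int bounds, enc[k + m] is a variable-length array: accepted by
// gcc's extension, rejected by clang in standard C++ mode.
void vla_version(int k, int m)
{
    (void)k; (void)m;
    // ptr enc[k + m];   // clang error: array bound is not a constant expression
}

// With const ints initialized from constants, k + m is a constant expression,
// so the array bound is valid standard C++.
void const_version()
{
    const int k = 12;
    const int m = 4;
    ptr enc[k + m];
    for (int i = 0; i < (k + m); i++) {
        (void)enc[i];   // fill enc[i] with a copy of the original data, etc.
    }
}

An alternative that also avoids the extension would be std::vector<ptr> enc(k + m).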

Thanks,
Michal

On Mon, Dec 7, 2015 at 9:44 PM, Willem Jan Withagen <w...@digiware.nl> wrote:
> On 5-12-2015 14:02, Xinze Chi (信泽) wrote:
>>
>> I think "const int k = 12; const int m = 4" would pass the compile?
>
>
>
> Are these sizes big enough??
>
> --WjW
>
>> 2015-12-05 20:56 GMT+08:00 Willem Jan Withagen <w...@digiware.nl>:
>>>
>>> src/test/erasure-code/TestErasureCodeIsa.cc
>>>
>>> contains snippets, function definition like:
>>>
>>> buffer::ptr enc[k + m];
>>>// create buffers with a copy of the original data to be able to
>>> compare
>>> it after decoding
>>>{
>>>  for (int i = 0; i < (k + m); i++) {
>>>
>>> Clang refuses because the [k+m] size in not known at compiletime.
>>> Suggesting to tempate this.
>>>
>>> How would one normally handle this?
>>>
>>> I've temporarily made it fixed size 1024*1024.
>>> But I'm not sure if that is big enough
>


Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-07 Thread Willem Jan Withagen

On 7-12-2015 23:19, Michal Jarzabek wrote:

Hi Willem,

If you look at line 411 and 412 you will have variables k and m
defined. They are not changed anywhere(I think), so the sizes must be
big enough.
As Xinze mentioned just add const in front of it:
const int k = 12
const int m = 4
and it should fix the compile error.

buffer::ptr enc[k + m] works with gcc, because of the compiler
extension, but it's not standard
c++(https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html)

I will submit patch to  change it.


That is exactly what I have done to get things compiling.
I have not yet gotten to the state where everything builds so I can start testing.

--WjW


Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-05 Thread Willem Jan Withagen

src/test/erasure-code/TestErasureCodeIsa.cc

contains snippets, function definition like:

buffer::ptr enc[k + m];
  // create buffers with a copy of the original data to be able to 
compare it after decoding

  {
for (int i = 0; i < (k + m); i++) {

Clang refuses because the [k + m] size is not known at compile time,
and suggests templating this.

How would one normally handle this?

I've temporarily made it fixed size 1024*1024.
But I'm not sure if that is big enough

--WjW



Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-05 Thread 信泽
I think "const int k = 12; const int m = 4" would pass the compile?

2015-12-05 20:56 GMT+08:00 Willem Jan Withagen <w...@digiware.nl>:
> src/test/erasure-code/TestErasureCodeIsa.cc
>
> contains snippets, function definition like:
>
> buffer::ptr enc[k + m];
>   // create buffers with a copy of the original data to be able to compare
> it after decoding
>   {
> for (int i = 0; i < (k + m); i++) {
>
> Clang refuses because the [k+m] size in not known at compiletime.
> Suggesting to tempate this.
>
> How would one normally handle this?
>
> I've temporarily made it fixed size 1024*1024.
> But I'm not sure if that is big enough
>
> --WjW
>



-- 
Regards,
Xinze Chi


test

2015-11-11 Thread Somnath Roy
Sorry for the spam, having some issues with devel.


Re: test

2015-11-11 Thread Mark Nelson

whatever you did, it appears to work. :)

On 11/11/2015 05:44 PM, Somnath Roy wrote:

Sorry for the spam , having some issues with devl


Re: RE: [performance] why rbd_aio_write latency increase from 4ms to 7.3ms after the same test

2015-11-02 Thread hzwulibin
Hi,

Thank you, that makes sense for testing, but I'm afraid not in my case.
Even if I test on a volume that has already been tested many times, the IOPS will
not go up again. Yeah, I mean, this VM is broken; the IOPS of the VM will never go up again.

Thanks!

--   
hzwulibin
2015-11-03

-
发件人:"Chen, Xiaoxi" <xiaoxi.c...@intel.com>
发送日期:2015-11-02 14:11
收件人:hzwulibin,ceph-devel,ceph-users
抄送:
主题:RE: [performance] why rbd_aio_write latency increase from 4ms to
 7.3ms after the same test

Pre-allocated the volume by "DD" across the entire RBD before you do any 
performance test:).

In this case, you may want to re-create the RBD, pre-allocate and try again.

> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of hzwulibin
> Sent: Monday, November 2, 2015 1:24 PM
> To: ceph-devel; ceph-users
> Subject: [performance] why rbd_aio_write latency increase from 4ms to
> 7.3ms after the same test
> 
> Hi,
> same environment; after a test script, the io latency (obtained from sudo ceph --
> admin-daemon /run/ceph/guests/ceph-client.*.asok perf dump) increases
> from about 4ms to 7.3ms
> 
> qemu version: debian 2.1.2
> kernel:3.10.45-openstack-amd64
> system: debian 7.8
> ceph: 0.94.5
> VM CPU number: 4  (cpu MHz : 2599.998)
> VM memory size: 16GB
> 9 OSD storage servers, with 4 SSD OSD on each, total 36 OSDs.
> 
> Test scripts in VM:
> # cat reproduce.sh
> #!/bin/bash
> 
> times=20
> for((i=1;i<=$times;i++))
> do
> tmpdate=`date "+%F-%T"`
> echo
> "===$tmpdate($i/$times)===
> "
> tmp=$((i%2))
> if [[ $tmp -eq 0 ]];then
> echo "### fio /root/vdb.cfg ###"
> fio /root/vdb.cfg
> else
> echo "### fio /root/vdc.cfg ###"
> fio /root/vdc.cfg
> fi
> done
> 
> 
> tmpdate=`date "+%F-%T"`
> echo "### [$tmpdate] fio /root/vde.cfg ###"
> fio /root/vde.cfg
> 
> 
> # cat vdb.cfg
> [global]
> rw=randwrite
> direct=1
> numjobs=64
> ioengine=sync
> bsrange=4k-4k
> runtime=180
> group_reporting
> 
> [disk01]
> filename=/dev/vdb
> 
> 
> # cat vdc.cfg
> [global]
> rw=randwrite
> direct=1
> numjobs=64
> ioengine=sync
> bsrange=4k-4k
> runtime=180
> group_reporting
> 
> [disk01]
> filename=/dev/vdc
> 
> # cat vdd.cfg
> [global]
> rw=randwrite
> direct=1
> numjobs=64
> ioengine=sync
> bsrange=4k-4k
> runtime=180
> group_reporting
> 
> [disk01]
> filename=/dev/vdd
> 
> # cat vde.cfg
> [global]
> rw=randwrite
> direct=1
> numjobs=64
> ioengine=sync
> bsrange=4k-4k
> runtime=180
> group_reporting
> 
> [disk01]
> filename=/dev/vde
> 
> After running the script reproduce.sh, the IOPS of the disks in the VM drops from
> 12k to 5k, and the latency increases from 4ms to 7.3ms.
> 
> run steps:
> 1. create a VM
> 2. create four volumes and attach them to the VM
> 3. sh reproduce.sh
> 4. during the runtime of reproduce.sh, run "fio vdd.cfg" or "fio vde.cfg" to check the
> performance
> 
> After reproduce.sh finished, the performance is down.
> 
> 
> Anyone has the same problem or has some ideas about this?
> 
> Thanks!
> --
> hzwulibin
> 2015-11-02


aarch64 test builds for trusty now available

2015-09-24 Thread Sage Weil
We now have a gitbuilder up and running building test packages for arm64 
(aarch64).  The hardware for these builds has been graciously provided by 
Cavium (thank you!).

Trusty aarch64 users can now install packages with

 ceph-deploy install --dev BRANCH HOST

and build results are visible at

http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-deb-trusty-aarch64-basic/
http://gitbuilder.ceph.com/ceph-deb-trusty-aarch64-basic/ref/

I would love to expand this to include centos7 builds and to also have 
hardware to run make check (as we do on x86_64).  If you have reasonably 
powerful aarch64 hardware available and are willing to provide us with 
remote access please let me know off-list.

Thanks!
sage


test

2015-09-22 Thread wangsongbo

test


Re: [PATCH 00/39] drop null test before destroy functions

2015-09-14 Thread SF Markus Elfring
> Recent commits to kernel/git/torvalds/linux.git have made the following
> functions able to tolerate NULL arguments:
>
> kmem_cache_destroy (commit 3942d29918522)
> mempool_destroy (commit 4e3ca3e033d1)
> dma_pool_destroy (commit 44d7175da6ea)

What do you think about extending another SmPL script?

Related topic:
scripts/coccinelle/free: Delete NULL test before freeing functions
https://systeme.lip6.fr/pipermail/cocci/2015-May/001960.html
https://www.mail-archive.com/cocci@systeme.lip6.fr/msg01855.html


> If these changes are OK, I will address the remainder later.

Would anybody like to reuse my general SmPL approach for similar source
code clean-up?

Regards,
Markus


Re: [PATCH 33/39] rbd: drop null test before destroy functions

2015-09-14 Thread Ilya Dryomov
On Sun, Sep 13, 2015 at 3:15 PM, Julia Lawall <julia.law...@lip6.fr> wrote:
> Remove unneeded NULL test.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
>
> // 
> @@ expression x; @@
> -if (x != NULL) {
>   \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
>   x = NULL;
> -}
> // 
>
> Signed-off-by: Julia Lawall <julia.law...@lip6.fr>
>
> ---
>  drivers/block/rbd.c |6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index d93a037..0507246 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -5645,10 +5645,8 @@ static int rbd_slab_init(void)
> if (rbd_segment_name_cache)
> return 0;
>  out_err:
> -   if (rbd_obj_request_cache) {
> -   kmem_cache_destroy(rbd_obj_request_cache);
> -   rbd_obj_request_cache = NULL;
> -   }
> +   kmem_cache_destroy(rbd_obj_request_cache);
> +   rbd_obj_request_cache = NULL;
>
> kmem_cache_destroy(rbd_img_request_cache);
> rbd_img_request_cache = NULL;

Applied.

Thanks,

Ilya


[PATCH 33/39] rbd: drop null test before destroy functions

2015-09-13 Thread Julia Lawall
Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@@ expression x; @@
-if (x != NULL) {
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
  x = NULL;
-}
// 

Signed-off-by: Julia Lawall <julia.law...@lip6.fr>

---
 drivers/block/rbd.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index d93a037..0507246 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5645,10 +5645,8 @@ static int rbd_slab_init(void)
if (rbd_segment_name_cache)
return 0;
 out_err:
-   if (rbd_obj_request_cache) {
-   kmem_cache_destroy(rbd_obj_request_cache);
-   rbd_obj_request_cache = NULL;
-   }
+   kmem_cache_destroy(rbd_obj_request_cache);
+   rbd_obj_request_cache = NULL;
 
kmem_cache_destroy(rbd_img_request_cache);
rbd_img_request_cache = NULL;



[PATCH 00/39] drop null test before destroy functions

2015-09-13 Thread Julia Lawall
Recent commits to kernel/git/torvalds/linux.git have made the following
functions able to tolerate NULL arguments:

kmem_cache_destroy (commit 3942d29918522)
mempool_destroy (commit 4e3ca3e033d1)
dma_pool_destroy (commit 44d7175da6ea)

These patches remove the associated NULL tests for the files that I found
easy to compile test.  If these changes are OK, I will address the
remainder later.

---

 arch/x86/kvm/mmu.c |6 --
 block/bio-integrity.c  |7 --
 block/bio.c|7 --
 block/blk-core.c   |3 -
 block/elevator.c   |3 -
 drivers/atm/he.c   |7 --
 drivers/block/aoe/aoedev.c |3 -
 drivers/block/drbd/drbd_main.c |   21 ++-
 drivers/block/pktcdvd.c|3 -
 drivers/block/rbd.c|6 --
 drivers/dma/dmaengine.c|6 --
 drivers/firmware/google/gsmi.c |3 -
 drivers/gpu/drm/i915/i915_dma.c|   19 ++
 drivers/iommu/amd_iommu_init.c |7 --
 drivers/md/bcache/bset.c   |3 -
 drivers/md/bcache/request.c|3 -
 drivers/md/bcache/super.c  |9 +--
 drivers/md/dm-bufio.c  |3 -
 drivers/md/dm-cache-target.c   |3 -
 drivers/md/dm-crypt.c  |6 --
 drivers/md/dm-io.c |3 -
 drivers/md/dm-log-userspace-base.c |3 -
 drivers/md/dm-region-hash.c|4 -
 drivers/md/dm.c|   13 +---
 drivers/md/multipath.c |3 -
 drivers/md/raid1.c |6 --
 drivers/md/raid10.c|9 +--
 drivers/md/raid5.c |3 -
 drivers/mtd/nand/nandsim.c |3 -
 drivers/mtd/ubi/attach.c   |4 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c  |3 -
 drivers/staging/lustre/lustre/llite/super25.c  |   16 +
 drivers/staging/lustre/lustre/obdclass/genops.c|   24 ++--
 drivers/staging/lustre/lustre/obdclass/lu_object.c |6 --
 drivers/staging/rdma/hfi1/user_sdma.c  |3 -
 drivers/thunderbolt/ctl.c  |3 -
 drivers/usb/gadget/udc/bdc/bdc_core.c  |3 -
 drivers/usb/gadget/udc/gr_udc.c|3 -
 drivers/usb/gadget/udc/mv_u3d_core.c   |3 -
 drivers/usb/gadget/udc/mv_udc_core.c   |3 -
 drivers/usb/host/fotg210-hcd.c |   12 +---
 drivers/usb/host/fusbh200-hcd.c|   12 +---
 drivers/usb/host/whci/init.c   |3 -
 drivers/usb/host/xhci-mem.c|   12 +---
 fs/btrfs/backref.c |3 -
 fs/btrfs/delayed-inode.c   |3 -
 fs/btrfs/delayed-ref.c |   12 +---
 fs/btrfs/disk-io.c |3 -
 fs/btrfs/extent_io.c   |6 --
 fs/btrfs/extent_map.c  |3 -
 fs/btrfs/file.c|3 -
 fs/btrfs/inode.c   |   18 ++
 fs/btrfs/ordered-data.c|3 -
 fs/dlm/memory.c|6 --
 fs/ecryptfs/main.c |3 -
 fs/ext4/crypto.c   |9 +--
 fs/ext4/extents_status.c   |3 -
 fs/ext4/mballoc.c  |3 -
 fs/f2fs/crypto.c   |9 +--
 fs/gfs2/main.c |   29 ++
 fs/jbd2/journal.c  |   15 +
 fs/jbd2/revoke.c   |   12 +---
 fs/jbd2/transaction.c  |6 --
 fs/jffs2/malloc.c  |   27 +++--
 fs/nfsd/nfscache.c |6 --
 fs/nilfs2/super.c  |   12 +---
 fs/ocfs2/dlm/dlmlock.c |3 -
 fs/ocfs2/dlm/dlmmaster.c   |   16 +
 fs/ocfs2/super.c   |   18 ++
 fs/ocfs2/uptodate.c|3 -
 lib/debugobjects.c |3 -
 net/core/sock.c|   12 +---
 net/dccp/ackvec.c  |   12 +---
 net/dccp/ccid.c

Re: How to save log when test met bugs

2015-09-09 Thread Loic Dachary
[adding ceph-devel as this may also be an inconvenience to others]


On 09/09/2015 10:23, Ma, Jianpeng wrote:
> Hi Loic:
>   Today I ran test/cephtool-test-mds.sh; because my code has a bug, it caused an osd 
> to go down. From the screen I only saw "osd o down" and so on, but I can't 
> find the related osd log.
> This is because in vstart_wrapper.sh
>>> function vstart_setup()
>>> {
>>>rm -fr $CEPH_DEV_DIR $CEPH_OUT_DIR
>>>mkdir -p $CEPH_DEV_DIR
>>>trap "teardown $CEPH_DIR" EXIT
>  This " trap "teardown $CEPH_DIR" EXIT" will remove the test dir(osd log in 
> this).
> 
> I don' know how to resolve this. But I think this really is a bug. Or am 
> missing something?

You're right, it is inconvenient for debugging. What I usually do in these 
situations is add an "exit" or even a "sleep 1000" somewhere to bypass 
everything and give me time to take a look at the logs. There should really be 
a way to ask (an option, an environment variable?) "please keep the logs when it 
fails instead of cleaning up".

What do you think ?

-- 
Loïc Dachary, Artisan Logiciel Libre





Test

2015-09-07 Thread Wukongming
Test
-
This e-mail and its attachments contain confidential information from H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!

test

2015-09-06 Thread changtao381
test




Re: My test on newstore show much poorer performance than filestore

2015-08-27 Thread Sage Weil
On Thu, 27 Aug 2015, wenjunh wrote:
 Hi
 I have a try of the newstore to explore its performance, which shows 
 its performance is much poorer than filestore.
 
 I test my cluster using fio, Here is the comparison of the two store 
 with randread  randwrite scenario:
 
 rw=randread, bs=4K, numjobs=1:
 newstore: bw=2280.7KB/s, iops=570
 filestore: bw=4846.6KB/s, iops=1211
 
 rw=randwrite, bs=4K, numjobs=1:
 newstore: bw=32999 B/s, iops=8
 filestore: bw=250978 B/s, iops=61
 The two tests are performed using the same hardware, but newstore is 
 much poorer than filestore. From the view of the community, newstore is 
 the next default backend store in Ceph, but why its performance so poor. 
 Could someone tell me why?

Newstore doesn't do well with random IO just yet, especially when the 
WAL/journal is not on a separate device, because rocksdb doesn't do a very 
good job with its log management.  I'm working on fixing this in rocksdb 
now.

At the other end of the spectrum, if you do something like rados bench (4M 
writes) newstore should be almost 2x faster.

Were you using an SSD journal with filestore or a disk partition?  
Newstore doesn't know how to use a separate SSD yet unless you set up the 
wal_dir manually.

sage



My test on newstore show much poorer performance than filestore

2015-08-27 Thread wenjunh
Hi
I gave newstore a try to explore its performance, and it turns out to be much 
poorer than filestore.

I tested my cluster using fio. Here is the comparison of the two stores in the 
randread and randwrite scenarios:

rw=randread, bs=4K, numjobs=1:
newstore: bw=2280.7KB/s, iops=570
filestore: bw=4846.6KB/s, iops=1211

rw=randwrite, bs=4K, numjobs=1:
newstore: bw=32999 B/s, iops=8
filestore: bw=250978 B/s, iops=61
The two tests were performed using the same hardware, but newstore is much 
poorer than filestore. From the community's point of view, newstore is the next 
default backend store in Ceph, so why is its performance so poor? Could someone 
tell me why?

Thanks
wenjunh


Re: test/run-rbd-unit-tests.sh : pure virtual method called

2015-08-05 Thread Loic Dachary
Hi,

Here is another make check failure. They don't seem to be related. To the best of 
my knowledge these are the only two rbd related failures in make check during 
the past week.

http://jenkins.ceph.dachary.org/job/ceph/LABELS=ubuntu-14.04x86_64/6884/console

[ RUN  ] TestLibRBD.ObjectMapConsistentSnap
using new format!
test/librbd/test_librbd.cc:2790: Failure
Value of: passed
  Actual: false
Expected: true
[  FAILED  ] TestLibRBD.ObjectMapConsistentSnap (396 ms)

[--] Global test environment tear-down
[==] 98 tests from 6 test cases ran. (10554 ms total)
[  PASSED  ] 97 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestLibRBD.ObjectMapConsistentSnap

On 03/08/2015 18:01, Loic Dachary wrote:
 Hi,
 
 test/run-rbd-unit-tests.sh failed today on master on Ubuntu 14.04, when run 
 by the make check bot on an unrelated pull request (modifying do_autogen 
 which is not used by the make check bot).
 
 http://jenkins.ceph.dachary.org/job/ceph/LABELS=ubuntu-14.04x86_64/6834/console
 
 [ RUN  ] TestInternal.MultipleResize
 pure virtual method called
 terminate called without an active exception
 
 Cheers
 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Odd QA Test Running

2015-08-03 Thread Haomai Wang
I found that
https://github.com/ceph/ceph-qa-suite/blob/master/erasure-code/ec-rados-plugin%3Dshec-k%3D4-m%3D3-c%3D2.yaml
has an override section which overrides the user's "enable experimental
unrecoverable data corrupting features" config, so my jobs were
corrupted.

I made a PR (https://github.com/ceph/ceph-qa-suite/pull/518) which I hope
fixes this.

On Fri, Jul 31, 2015 at 5:50 PM, Haomai Wang haomaiw...@gmail.com wrote:
 Hi all,

 I  ran a test 
 suite(http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/)
 and found the failed jobs are failed by 2015-07-29 10:52:35.313197
 7f16ae655780 -1 unrecognized ms_type 'async'

 Then I found the failed jobs(like
 http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/991540/)
 lack of “enable experimental unrecoverable data corrupting features:
 ms-type-async”.

 Other successful jobs(like
 http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/991517/)
 can find enable experimental unrecoverable data corrupting features:
 ms-type-async in yaml.

 So that's mean the same schedule suite will generate the different
 yaml file? Is there something tricky?

 --

 Best Regards,

 Wheat



-- 
Best Regards,

Wheat


Odd QA Test Running

2015-07-31 Thread Haomai Wang
Hi all,

I  ran a test 
suite(http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/)
and found that the failed jobs fail with "2015-07-29 10:52:35.313197
7f16ae655780 -1 unrecognized ms_type 'async'"

Then I found that the failed jobs (like
http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/991540/)
lack the “enable experimental unrecoverable data corrupting features:
ms-type-async” setting.

Other, successful jobs (like
http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/991517/)
do have "enable experimental unrecoverable data corrupting features:
ms-type-async" in their yaml.

So does that mean the same scheduled suite can generate different
yaml files? Is there something tricky going on?

-- 

Best Regards,

Wheat


[PATCH 06/18] rbd: add write test helper

2015-07-29 Thread mchristi
From: Mike Christie micha...@cs.wisc.edu

The next patches add a couple new commands that have write data.
This patch adds a helper to combine all the IMG_REQ write tests.

Signed-off-by: Mike Christie micha...@cs.wisc.edu
---
 drivers/block/rbd.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 1df7bdd..8d0b30a 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1739,6 +1739,12 @@ static bool img_request_layered_test(struct 
rbd_img_request *img_request)
return test_bit(IMG_REQ_LAYERED, img_request-flags) != 0;
 }
 
+static bool img_request_is_write_type_test(struct rbd_img_request *img_request)
+{
+   return img_request_write_test(img_request) ||
+  img_request_discard_test(img_request);
+}
+
 static enum obj_operation_type
 rbd_img_request_op_type(struct rbd_img_request *img_request)
 {
@@ -2024,8 +2030,7 @@ rbd_osd_req_create_copyup(struct rbd_obj_request 
*obj_request)
rbd_assert(obj_request_img_data_test(obj_request));
img_request = obj_request-img_request;
rbd_assert(img_request);
-   rbd_assert(img_request_write_test(img_request) ||
-   img_request_discard_test(img_request));
+   rbd_assert(img_request_is_write_type_test(img_request));
 
if (img_request_discard_test(img_request))
num_osd_ops = 2;
@@ -2259,8 +2264,7 @@ static void rbd_img_request_destroy(struct kref *kref)
rbd_dev_parent_put(img_request-rbd_dev);
}
 
-   if (img_request_write_test(img_request) ||
-   img_request_discard_test(img_request))
+   if (img_request_is_write_type_test(img_request))
ceph_put_snap_context(img_request-snapc);
 
kmem_cache_free(rbd_img_request_cache, img_request);
@@ -2977,8 +2981,7 @@ static bool img_obj_request_simple(struct rbd_obj_request 
*obj_request)
rbd_dev = img_request-rbd_dev;
 
/* Reads */
-   if (!img_request_write_test(img_request) 
-   !img_request_discard_test(img_request))
+   if (!img_request_is_write_type_test(img_request))
return true;
 
/* Non-layered writes */
-- 
1.8.3.1



test

2015-07-28 Thread Shinobu Kinjo
Please ignore.


Re: 9.0.2 test/perf_local.cc on non-x86 architectures

2015-07-24 Thread kefu chai
On Wed, Jul 22, 2015 at 12:34 AM, Deneau, Tom tom.den...@amd.com wrote:
 I was trying to do an rpmbuild of v9.0.2 for aarch64 and got the following 
 error:

 test/perf_local.cc: In function 'double div32()':
 test/perf_local.cc:396:31: error: impossible constraint in 'asm'
   cc);

 Probably should have an if defined (__i386__) around it.

Tom,

hopefully https://github.com/ceph/ceph/pull/5342 will address it.

thanks,


9.0.2 test/perf_local.cc on non-x86 architectures

2015-07-21 Thread Deneau, Tom
I was trying to do an rpmbuild of v9.0.2 for aarch64 and got the following 
error:

test/perf_local.cc: In function 'double div32()':
test/perf_local.cc:396:31: error: impossible constraint in 'asm'
  cc);

It should probably have an "#if defined (__i386__)" guard around it.
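
For illustration, a minimal sketch of that kind of guard (hypothetical code; the
real div32() body in test/perf_local.cc uses x86 inline asm, omitted here):

#include <cstdio>

// Hypothetical guard: compile the x86-only benchmark body on x86, and report
// "not supported" on other architectures instead of failing to build.
static double div32()
{
#if defined(__i386__) || defined(__x86_64__)
    // ... x86 inline-asm timing loop would go here ...
    return 0.0;
#else
    return -1.0;   // benchmark not available on this architecture
#endif
}

int main()
{
    double result = div32();
    if (result < 0)
        printf("div32: not supported on this architecture\n");
    else
        printf("div32: %f seconds\n", result);
    return 0;
}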

-- Tom



Re: Block storage performance test tool - would like to merge into cbt

2015-07-14 Thread Konstantin Danilov
Mark,

is the Wednesday performance meeting a good place for the discussion, or
do we need a separate one?

On Mon, Jul 13, 2015 at 6:16 PM, Mark Nelson mnel...@redhat.com wrote:
 Hi Konstantin,

 I'm definitely interested in looking at your tools and seeing if we can
 merge them into cbt!  One of the things we lack right now in cbt is any kind
 of real openstack integration.  Right now CBT basically just assumes you've
 already launched VMs and specified them as clients in the yaml, so being
 able to spin up VMs in a standard way would be very useful.  It might be
 worth exploring if we can use your tool to make the cluster base class
 openstack aware so that any of the eventual cluster classes (ceph, and
 maybe some day gluster, swift, etc) can use it to launch VMs or do other
 things.  I'd really love to be able to create a cbt yaml config file and
 iterate through parametric configuration parameters building multiple
 different clusters and running tests against them with system monitoring and
 data post processing happening automatically.

 The data post processing is also something that will be very useful.  We
 have a couple of folks really interested in this area as well.

 Mark




 On 07/11/2015 03:02 AM, Konstantin Danilov wrote:

 Hi all,

 We(Mirantis ceph team) have a tool for block storage performance test,
 called 'wally' -
 https://github.com/Mirantis/disk_perf_test_tool.

 It has some nice features, like:

 * Openstack and FUEL integration (can spawn VM for tests, gather HW info,
 etc)
 * Set of tests, joined into suit, which measures different performnce
 aspects and
 creates joined report, as example -
 http://koder-ua.github.io/6.1GA/cinder_volume_iscsi.html,
 VM running on ceph drives report example -
 http://koder-ua.github.io/random/ceph_example.html
 * Data postrocessing - confidence intervals, etc

 We would like to merge our code into cbt. Do you interesting in it?
 Can we discuss a way to merge?

 Thanks





-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com


Re: Block storage performance test tool - would like to merge into cbt

2015-07-14 Thread Mark Nelson
Probably depends how much time it will take.  I'd say if we think it 
might be more than 15 minutes of discussion we should wait until the end 
of the performance meeting and then talk about it.  If it's fairly quick 
though we could probably add it to the perf meeting itself.


Mark

On 07/14/2015 03:11 AM, Konstantin Danilov wrote:

Mark,

does Wednesday performance meeting is a good place for discussion, or
we need a separated one?

On Mon, Jul 13, 2015 at 6:16 PM, Mark Nelson mnel...@redhat.com wrote:

Hi Konstantin,

I'm definitely interested in looking at your tools and seeing if we can
merge them into cbt!  One of the things we lack right now in cbt is any kind
of real openstack integration.  Right now CBT basically just assumes you've
already launched VMs and specified them as clients in the yaml, so being
able to spin up VMs in a standard way would be very useful.  It might be
worth exploring if we can use your tool to make the cluster base class
openstack aware so that any of the eventual cluster classes (ceph, and
maybe some day gluster, swift, etc) can use it to launch VMs or do other
things.  I'd really love to be able to create a cbt yaml config file and
iterate through parametric configuration parameters building multiple
different clusters and running tests against them with system monitoring and
data post processing happening automatically.

The data post processing is also something that will be very useful.  We
have a couple of folks really interested in this area as well.

Mark




On 07/11/2015 03:02 AM, Konstantin Danilov wrote:


Hi all,

We(Mirantis ceph team) have a tool for block storage performance test,
called 'wally' -
https://github.com/Mirantis/disk_perf_test_tool.

It has some nice features, like:

* Openstack and FUEL integration (can spawn VM for tests, gather HW info,
etc)
* Set of tests, joined into suit, which measures different performnce
aspects and
creates joined report, as example -
http://koder-ua.github.io/6.1GA/cinder_volume_iscsi.html,
VM running on ceph drives report example -
http://koder-ua.github.io/random/ceph_example.html
* Data postrocessing - confidence intervals, etc

We would like to merge our code into cbt. Do you interesting in it?
Can we discuss a way to merge?

Thanks










Re: Block storage performance test tool - would like to merge into cbt

2015-07-14 Thread Konstantin Danilov
I think it's better to discuss it after the meeting.

On Tue, Jul 14, 2015 at 2:22 PM, Mark Nelson mnel...@redhat.com wrote:
 Probably depends how much time it will take.  I'd say if we think it might
 be more than 15 minutes of discussion we should wait until the end of the
 performance meeting and then talk about it.  If it's fairly quick though we
 could probably add it to the perf meeting itself.

 Mark


 On 07/14/2015 03:11 AM, Konstantin Danilov wrote:

 Mark,

 does Wednesday performance meeting is a good place for discussion, or
 we need a separated one?

 On Mon, Jul 13, 2015 at 6:16 PM, Mark Nelson mnel...@redhat.com wrote:

 Hi Konstantin,

 I'm definitely interested in looking at your tools and seeing if we can
 merge them into cbt!  One of the things we lack right now in cbt is any
 kind
 of real openstack integration.  Right now CBT basically just assumes
 you've
 already launched VMs and specified them as clients in the yaml, so being
 able to spin up VMs in a standard way would be very useful.  It might be
 worth exploring if we can use your tool to make the cluster base class
 openstack aware so that any of the eventual cluster classes (ceph, and
 maybe some day gluster, swift, etc) can use it to launch VMs or do other
 things.  I'd really love to be able to create a cbt yaml config file and
 iterate through parametric configuration parameters building multiple
 different clusters and running tests against them with system monitoring
 and
 data post processing happening automatically.

 The data post processing is also something that will be very useful.  We
 have a couple of folks really interested in this area as well.

 Mark




 On 07/11/2015 03:02 AM, Konstantin Danilov wrote:


 Hi all,

 We(Mirantis ceph team) have a tool for block storage performance test,
 called 'wally' -
 https://github.com/Mirantis/disk_perf_test_tool.

 It has some nice features, like:

 * Openstack and FUEL integration (can spawn VM for tests, gather HW
 info,
 etc)
 * Set of tests, joined into suit, which measures different performnce
 aspects and
 creates joined report, as example -
 http://koder-ua.github.io/6.1GA/cinder_volume_iscsi.html,
 VM running on ceph drives report example -
 http://koder-ua.github.io/random/ceph_example.html
 * Data postrocessing - confidence intervals, etc

 We would like to merge our code into cbt. Do you interesting in it?
 Can we discuss a way to merge?

 Thanks









-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com


Re: Block storage performance test tool - would like to merge into cbt

2015-07-13 Thread Mark Nelson

Hi Konstantin,

I'm definitely interested in looking at your tools and seeing if we can 
merge them into cbt!  One of the things we lack right now in cbt is any 
kind of real openstack integration.  Right now CBT basically just 
assumes you've already launched VMs and specified them as clients in the 
yaml, so being able to spin up VMs in a standard way would be very 
useful.  It might be worth exploring if we can use your tool to make the 
cluster base class openstack aware so that any of the eventual cluster 
classes (ceph, and maybe some day gluster, swift, etc) can use it to 
launch VMs or do other things.  I'd really love to be able to create a 
cbt yaml config file and iterate through parametric configuration 
parameters building multiple different clusters and running tests 
against them with system monitoring and data post processing happening 
automatically.


The data post processing is also something that will be very useful.  We 
have a couple of folks really interested in this area as well.


Mark



On 07/11/2015 03:02 AM, Konstantin Danilov wrote:

Hi all,

We(Mirantis ceph team) have a tool for block storage performance test,
called 'wally' -
https://github.com/Mirantis/disk_perf_test_tool.

It has some nice features, like:

* Openstack and FUEL integration (can spawn VM for tests, gather HW info, etc)
* Set of tests, joined into suit, which measures different performnce
aspects and
creates joined report, as example -
http://koder-ua.github.io/6.1GA/cinder_volume_iscsi.html,
VM running on ceph drives report example -
http://koder-ua.github.io/random/ceph_example.html
* Data postrocessing - confidence intervals, etc

We would like to merge our code into cbt. Do you interesting in it?
Can we discuss a way to merge?

Thanks




Block storage performance test tool - would like to merge into cbt

2015-07-11 Thread Konstantin Danilov
Hi all,

We (the Mirantis Ceph team) have a tool for block storage performance testing,
called 'wally':
https://github.com/Mirantis/disk_perf_test_tool.

It has some nice features, like:

* Openstack and FUEL integration (can spawn VMs for tests, gather HW info, etc)
* A set of tests, joined into a suite, which measures different performance
aspects and
creates a joined report; as an example,
http://koder-ua.github.io/6.1GA/cinder_volume_iscsi.html,
and a report for a VM running on ceph drives:
http://koder-ua.github.io/random/ceph_example.html
* Data postprocessing - confidence intervals, etc

We would like to merge our code into cbt. Are you interested in it?
Can we discuss a way to merge?

Thanks

-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com


RE: unit test for messaging crc32c

2015-06-18 Thread Dałek , Piotr
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Dan van der Ster
 
 Hi all,
 
 We've recently experienced a broken router than was corrupting packets in
 way that the tcp checksums were still valid. There has been some resulting
 data corruption -- thus far the confirmed corruptions were outside of Ceph
 communications -- but it has made us want to double check the Ceph
 messenger crc32c code.
 
 We have crc32c enabled (as default), and I expected to find some bad crc
 messages logged on the clients and/or osds, but thus far we haven't found
 any.
 
 Is there a unit test which validates this mechanism, e.g. one which
 intentionally corrupts a Message then confirms that the crc code drops it? I
 didn't find anything relevant in src/test/, but I'm not too familiar with the
 framework.

Actually, all it takes is just to disable CRC in the configuration on one node (or 
even one 
daemon). It'll cause zeros to be put in the CRC fields of all messages sent, triggering
CRC check failures cluster-wide (on the remaining, unaffected nodes/daemons).

There's also an internal CRC32 calculator test in src/test/common/test_crc32c.cc.
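
To illustrate the kind of check involved, here is a self-contained sketch of
CRC-32C verification over a corrupted buffer (a plain bit-by-bit implementation
with polynomial 0x82F63B78; this is not Ceph's actual messenger code or its
seeding convention):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Bit-by-bit CRC-32C (Castagnoli); Ceph uses optimized variants, this only
// demonstrates the verify-and-drop idea.
static uint32_t crc32c(uint32_t crc, const unsigned char *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

int main()
{
    unsigned char payload[] = "ceph message payload";
    uint32_t sent_crc = crc32c(0, payload, sizeof(payload));

    payload[3] ^= 0x01;   // simulate corruption that a TCP checksum might miss
    uint32_t recv_crc = crc32c(0, payload, sizeof(payload));

    if (recv_crc != sent_crc)
        printf("bad crc: got 0x%08x, expected 0x%08x -- message would be dropped\n",
               recv_crc, sent_crc);
    return 0;
}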

With best regards / Pozdrawiam
Piotr Dałek


unit test for messaging crc32c

2015-06-18 Thread Dan van der Ster
Hi all,

We've recently experienced a broken router that was corrupting packets
in a way that the TCP checksums were still valid. There has been some
resulting data corruption -- thus far the confirmed corruptions were
outside of Ceph communications -- but it has made us want to double
check the Ceph messenger crc32c code.

We have crc32c enabled (as default), and I expected to find some bad
crc messages logged on the clients and/or osds, but thus far we
haven't found any.

Is there a unit test which validates this mechanism, e.g. one which
intentionally corrupts a Message then confirms that the crc code drops
it? I didn't find anything relevant in src/test/, but I'm not too
familiar with the framework.

Thanks in advance, Dan


Re: unit test for messaging crc32c

2015-06-18 Thread Dan van der Ster
On Thu, Jun 18, 2015 at 9:53 AM, Dałek, Piotr
piotr.da...@ts.fujitsu.com wrote:
 Actually, all it takes is just to disable CRC in configuration on one node 
 (or even
 daemon). It'll cause to put zeros in CRC fields in all messages sent, 
 triggering
 CRC check failures cluster-wide (on remaining, unaffected nodes/daemons).

Thanks for the hint. I will try that!

--
Dan


Cache pool on top of ec base pool teuthology test

2015-05-25 Thread Wang, Zhiqiang
We have a bunch of teuthology tests which build a cache pool on top of an ec base 
pool and do partial object writes. This is ok with the current cache tiering 
implementation, but with proxy write this won't work. In my testing, the error 
message is something like below:

2015-05-20 05:51:42.896828 7f682eed4700 10 osd.5 pg_epoch: 183 pg[2.1( v 
183'7978 (179'7799,183'7978] local-les=172 n=1258 ec=8 les/c 172/172 
170/171/171) [5,0] r=0 lpr=171 luod=183'7975 crt=183'7974 lcod 183'7974 mlcod 
183'7974 active+clean] do_proxy_write Start proxy write for 
osd_op(client.4130.0:32967 plana4222147-6594 [write 733117~408660] 2.3ed99ff9 
ack+ondisk+write+known_if_redirected e183) v5
2015-05-20 05:51:42.899958 7f682c6cf700  1 -- 10.214.132.32:6808/20666 -- 
10.214.132.32:0/20666 -- osd_op_reply(17556 plana4222147-6594 [write 
733117~408660] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6 -- ?+0 
0xa2e23c0 con 0x9355760

What should we do with these tests?


Re: Cache pool on top of ec base pool teuthology test

2015-05-25 Thread Sage Weil
On Mon, 25 May 2015, Wang, Zhiqiang wrote:
 We have a bunch of teuthology tests which build cache pool on top of an 
 ec base pool, and do partial object write. This is ok with the current 
 cache tiering implementation. But with proxy write, this won't work. In 
 my testing, the error message is something like below:
 
 2015-05-20 05:51:42.896828 7f682eed4700 10 osd.5 pg_epoch: 183 pg[2.1( v 
 183'7978 (179'7799,183'7978] local-les=172 n=1258 ec=8 les/c 172/172 
 170/171/171) [5,0] r=0 lpr=171 luod=183'7975 crt=183'7974 lcod 183'7974 mlcod 
 183'7974 active+clean] do_proxy_write Start proxy write for 
 osd_op(client.4130.0:32967 plana4222147-6594 [write 733117~408660] 2.3ed99ff9 
 ack+ondisk+write+known_if_redirected e183) v5
 2015-05-20 05:51:42.899958 7f682c6cf700  1 -- 10.214.132.32:6808/20666 -- 
 10.214.132.32:0/20666 -- osd_op_reply(17556 plana4222147-6594 [write 
 733117~408660] v0'0 uv0 ondisk = -95 ((95) Operation not supported)) v6 -- 
 ?+0 0xa2e23c0 con 0x9355760
 
 What should we do with these tests?

I think the test is fine.. but the OSD should refuse to proxy the write if 
the base tier won't support the write operation in question.  I believe we 
recently renamed one of the helpers supports_omap()... we probably need a 
similar set of helpers for object overwrites?

Or, we can make a method like should_proxy_to_ec() that scans through 
the op vector and makes a conservative judgement of whether it is safe 
to proxy...
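
As a purely hypothetical sketch of such a helper (not actual Ceph code: a real
version would walk the message's vector<OSDOp> and the CEPH_OSD_OP_* codes,
which are replaced here by a stand-in enum so the example is self-contained):

#include <cstdio>
#include <vector>

// Stand-in op classification, just for the sketch.
enum class OpType { READ, WRITE_FULL, DELETE_OBJ, PARTIAL_WRITE, OMAP_WRITE };

// Conservative judgement: only proxy when every op in the vector is something
// an erasure-coded base pool can service (reads, full-object writes, deletes).
// Partial overwrites and omap updates return false, so the cache tier would
// force a promotion instead of proxying.
static bool should_proxy_to_ec(const std::vector<OpType> &ops)
{
    for (const OpType &op : ops) {
        switch (op) {
        case OpType::READ:
        case OpType::WRITE_FULL:
        case OpType::DELETE_OBJ:
            break;              // safe to proxy to the EC base tier
        default:
            return false;       // force a promotion instead
        }
    }
    return true;
}

int main()
{
    std::vector<OpType> partial = { OpType::PARTIAL_WRITE };
    std::vector<OpType> full    = { OpType::WRITE_FULL };
    printf("partial write -> %s\n", should_proxy_to_ec(partial) ? "proxy" : "promote");
    printf("full write    -> %s\n", should_proxy_to_ec(full) ? "proxy" : "promote");
    return 0;
}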

sage


RE: Cache pool on top of ec base pool teuthology test

2015-05-25 Thread Wang, Zhiqiang
OK, so the cache tier OSD checks whether it's able to proxy the write op, and if 
not, it forces a promotion.

Btw, I don't find the supports_omap helper in the current master. Do you mean 
the 'CEPH_OSD_COPY_GET_FLAG_NOTSUPP_OMAP' flag?

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: Monday, May 25, 2015 11:04 PM
To: Wang, Zhiqiang
Cc: ceph-devel@vger.kernel.org
Subject: Re: Cache pool on top of ec base pool teuthology test

On Mon, 25 May 2015, Wang, Zhiqiang wrote:
 We have a bunch of teuthology tests which build cache pool on top of 
 an ec base pool, and do partial object write. This is ok with the 
 current cache tiering implementation. But with proxy write, this won't 
 work. In my testing, the error message is something like below:
 
 2015-05-20 05:51:42.896828 7f682eed4700 10 osd.5 pg_epoch: 183 pg[2.1( 
 v 183'7978 (179'7799,183'7978] local-les=172 n=1258 ec=8 les/c 172/172 
 170/171/171) [5,0] r=0 lpr=171 luod=183'7975 crt=183'7974 lcod 
 183'7974 mlcod 183'7974 active+clean] do_proxy_write Start proxy write 
 for osd_op(client.4130.0:32967 plana4222147-6594 [write 733117~408660] 
 2.3ed99ff9 ack+ondisk+write+known_if_redirected e183) v5
 2015-05-20 05:51:42.899958 7f682c6cf700  1 -- 10.214.132.32:6808/20666 
 -- 10.214.132.32:0/20666 -- osd_op_reply(17556 plana4222147-6594 
 [write 733117~408660] v0'0 uv0 ondisk = -95 ((95) Operation not 
 supported)) v6 -- ?+0 0xa2e23c0 con 0x9355760
 
 What should we do with these tests?

I think the test is fine.. but the OSD should refuse to proxy the write if the 
base tier won't support the write operation in question.  I believe we recently 
renamed one of the helpers supports_omap()... we probably need a similar set of 
helpers for object overwrites?

Or, we can make a method like should_proxy_to_ec() that scans through the op 
vector and makes a conservative judgement of whether it is safe to proxy...

sage


RE: Cache pool on top of ec base pool teuthology test

2015-05-25 Thread Wang, Zhiqiang
Yes, we can do the force promotion check in init_op_flags, as we did before.
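
As a rough sketch of that caller-side check (hypothetical names, reusing the illustrative Op/should_proxy_to_ec types from the sketch earlier in this thread; the real init_op_flags code in the OSD is different):

// Hypothetical caller-side sketch: while initializing op flags, mark the
// request for promotion when it cannot safely be proxied to an EC base tier.
#include <vector>

struct OpRequest {
  std::vector<Op> ops;           // Op as in the earlier illustrative sketch
  bool force_promote = false;
};

void init_op_flags_sketch(OpRequest& req, bool base_tier_is_erasure) {
  if (base_tier_is_erasure && !should_proxy_to_ec(req.ops))
    req.force_promote = true;    // cache tier promotes instead of proxying
}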

-Original Message-
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: Tuesday, May 26, 2015 10:19 AM
To: Wang, Zhiqiang
Cc: ceph-devel@vger.kernel.org
Subject: RE: Cache pool on top of ec base pool teuthology test

On Tue, 26 May 2015, Wang, Zhiqiang wrote:
 OK, so the cache tier OSD checks if it's able to proxy the write op, if not, 
 then forces a promotion.
 
 Btw, I don't find the supports_omap helps in the current master. Do you mean 
 the 'CEPH_OSD_COPY_GET_FLAG_NOTSUPP_OMAP' flag?

Yeah, that isn't helpful.  I think what we need is the helper to check the op, 
and make that dependent on is_erasure()?  That should be good enough for now.

sage


 
 -Original Message-
 From: Sage Weil [mailto:sw...@redhat.com]
 Sent: Monday, May 25, 2015 11:04 PM
 To: Wang, Zhiqiang
 Cc: ceph-devel@vger.kernel.org
 Subject: Re: Cache pool on top of ec base pool teuthology test
 
 On Mon, 25 May 2015, Wang, Zhiqiang wrote:
  We have a bunch of teuthology tests which build cache pool on top of 
  an ec base pool, and do partial object write. This is ok with the 
  current cache tiering implementation. But with proxy write, this 
  won't work. In my testing, the error message is something like below:
  
  2015-05-20 05:51:42.896828 7f682eed4700 10 osd.5 pg_epoch: 183 
  pg[2.1( v 183'7978 (179'7799,183'7978] local-les=172 n=1258 ec=8 
  les/c 172/172
  170/171/171) [5,0] r=0 lpr=171 luod=183'7975 crt=183'7974 lcod
  183'7974 mlcod 183'7974 active+clean] do_proxy_write Start proxy 
  write for osd_op(client.4130.0:32967 plana4222147-6594 [write 
  733117~408660]
  2.3ed99ff9 ack+ondisk+write+known_if_redirected e183) v5
  2015-05-20 05:51:42.899958 7f682c6cf700  1 -- 
  10.214.132.32:6808/20666
  -- 10.214.132.32:0/20666 -- osd_op_reply(17556 plana4222147-6594
  [write 733117~408660] v0'0 uv0 ondisk = -95 ((95) Operation not
  supported)) v6 -- ?+0 0xa2e23c0 con 0x9355760
  
  What should we do with these tests?
 
 I think the test is fine.. but the OSD should refuse to proxy the write if 
 the base tier won't support the write operation in question.  I believe we 
 recently renamed one of the helpers supports_omap()... we probably need a 
 similar set of helpers for object overwrites?
 
 Or, we can make a method like should_proxy_to_ec() that scans through the 
 op vector and makes a conservative judgement of whether it is safe to proxy...
 
 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel 
 in the body of a message to majord...@vger.kernel.org More majordomo 
 info at  http://vger.kernel.org/majordomo-info.html
 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Cache pool on top of ec base pool teuthology test

2015-05-25 Thread Sage Weil
On Tue, 26 May 2015, Wang, Zhiqiang wrote:
 OK, so the cache tier OSD checks if it's able to proxy the write op, if not, 
 then forces a promotion.
 
 Btw, I don't find the supports_omap helps in the current master. Do you mean 
 the 'CEPH_OSD_COPY_GET_FLAG_NOTSUPP_OMAP' flag?

Yeah, that isn't helpful.  I think what we need is the helper to check the 
op, and make that dependent on is_erasure()?  That should be good enough 
for now.

sage


 
 -Original Message-
 From: Sage Weil [mailto:sw...@redhat.com] 
 Sent: Monday, May 25, 2015 11:04 PM
 To: Wang, Zhiqiang
 Cc: ceph-devel@vger.kernel.org
 Subject: Re: Cache pool on top of ec base pool teuthology test
 
 On Mon, 25 May 2015, Wang, Zhiqiang wrote:
  We have a bunch of teuthology tests which build cache pool on top of 
  an ec base pool, and do partial object write. This is ok with the 
  current cache tiering implementation. But with proxy write, this won't 
  work. In my testing, the error message is something like below:
  
  2015-05-20 05:51:42.896828 7f682eed4700 10 osd.5 pg_epoch: 183 pg[2.1( 
  v 183'7978 (179'7799,183'7978] local-les=172 n=1258 ec=8 les/c 172/172 
  170/171/171) [5,0] r=0 lpr=171 luod=183'7975 crt=183'7974 lcod 
  183'7974 mlcod 183'7974 active+clean] do_proxy_write Start proxy write 
  for osd_op(client.4130.0:32967 plana4222147-6594 [write 733117~408660] 
  2.3ed99ff9 ack+ondisk+write+known_if_redirected e183) v5
  2015-05-20 05:51:42.899958 7f682c6cf700  1 -- 10.214.132.32:6808/20666 
  -- 10.214.132.32:0/20666 -- osd_op_reply(17556 plana4222147-6594 
  [write 733117~408660] v0'0 uv0 ondisk = -95 ((95) Operation not 
  supported)) v6 -- ?+0 0xa2e23c0 con 0x9355760
  
  What should we do with these tests?
 
 I think the test is fine.. but the OSD should refuse to proxy the write if 
 the base tier won't support the write operation in question.  I believe we 
 recently renamed one of the helpers supports_omap()... we probably need a 
 similar set of helpers for object overwrites?
 
 Or, we can make a method like should_proxy_to_ec() that scans through the 
 op vector and makes a conservative judgement of whether it is safe to proxy...
 
 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Rados: add test case for CEPH_OP_FLAG_TIER_NOCACHE

2015-05-24 Thread Li Wang
From: Min Chen minc...@ubuntukylin.com

Signed-off-by: Min Chen minc...@ubuntukylin.com
Reviewed-by: Li Wang liw...@ubuntukylin.com
---
 src/test/librados/tier.cc | 176 ++
 1 file changed, 176 insertions(+)

diff --git a/src/test/librados/tier.cc b/src/test/librados/tier.cc
index 60b56f9..0b3368e 100644
--- a/src/test/librados/tier.cc
+++ b/src/test/librados/tier.cc
@@ -2390,6 +2390,94 @@ TEST_F(LibRadosTwoPoolsPP, ProxyRead) {
   cluster.wait_for_latest_osdmap();
 }
 
+TEST_F(LibRadosTwoPoolsPP, TierNocache) {
+  // configure cache
+  bufferlist inbl;
+  ASSERT_EQ(0, cluster.mon_command(
+    "{\"prefix\": \"osd tier add\", \"pool\": \"" + pool_name +
+    "\", \"tierpool\": \"" + cache_pool_name +
+    "\", \"force_nonempty\": \"--force-nonempty\" }",
+    inbl, NULL, NULL));
+  ASSERT_EQ(0, cluster.mon_command(
+    "{\"prefix\": \"osd tier set-overlay\", \"pool\": \"" + pool_name +
+    "\", \"overlaypool\": \"" + cache_pool_name + "\"}",
+    inbl, NULL, NULL));
+
+  std::string cache_modes[] = {"writeback", "forward", "readonly", "readforward", "readproxy"};
+  int count = (int) sizeof(cache_modes)/sizeof(cache_modes[0]);
+  int i;
+  // test write/read with TierNocache in each cache-mode
+  for (i = 0; i < count; i++)
+  {
+    std::cout << "set cache-mode: " + cache_modes[i] << std::endl;
+    ASSERT_EQ(0, cluster.mon_command(
+      "{\"prefix\": \"osd tier cache-mode\", \"pool\": \"" + cache_pool_name +
+      "\", \"mode\": \"" + cache_modes[i] + "\"}",
+      inbl, NULL, NULL));
+
+    // wait for maps to settle
+    cluster.wait_for_latest_osdmap();
+
+    std::string content(cache_modes[i]);
+    std::string object(cache_modes[i] + "_obj");
+    ObjectWriteOperation wr;
+    librados::AioCompletion *completion = cluster.aio_create_completion();
+
+    // writeback: create a new object
+    bufferlist bl;
+    bl.append(content);
+    wr.write_full(bl);
+    ioctx.aio_operate(object, completion, &wr, librados::OPERATION_TIER_NOCACHE);
+    completion->wait_for_safe();
+    completion->release();
+
+    // verify the object is NOT present in the cache tier
+    {
+      NObjectIterator it = cache_ioctx.nobjects_begin();
+      ASSERT_TRUE(it == cache_ioctx.nobjects_end());
+    }
+
+    // writeback: read the object content
+    ObjectReadOperation rd;
+    uint64_t len = bl.length();
+    completion = cluster.aio_create_completion();
+    bufferlist bl2;
+    bufferlist bl3;
+    rd.read(0, len+1, &bl3, NULL);
+    ASSERT_EQ(0, ioctx.aio_operate(
+       object, completion, &rd,
+       librados::OPERATION_TIER_NOCACHE, NULL));
+    completion->wait_for_complete();
+
+    ASSERT_EQ(0, completion->get_return_value());
+    uint64_t n = 0;
+    for (n = 0; n < len; n++) {
+       bl2.append(bl3[n]);
+    }
+    ASSERT_EQ(content, bl2.c_str());
+    completion->release();
+
+    // verify the object is NOT present in the cache tier
+    {
+      NObjectIterator it = cache_ioctx.nobjects_begin();
+      ASSERT_TRUE(it == cache_ioctx.nobjects_end());
+    }
+  }
+
+  // tear down tiers
+  ASSERT_EQ(0, cluster.mon_command(
+    "{\"prefix\": \"osd tier remove-overlay\", \"pool\": \"" + pool_name +
+    "\"}",
+    inbl, NULL, NULL));
+  ASSERT_EQ(0, cluster.mon_command(
+    "{\"prefix\": \"osd tier remove\", \"pool\": \"" + pool_name +
+    "\", \"tierpool\": \"" + cache_pool_name + "\"}",
+    inbl, NULL, NULL));
+
+  // wait for maps to settle before next test
+  cluster.wait_for_latest_osdmap();
+}
+
 class LibRadosTwoPoolsECPP : public RadosTestECPP
 {
 public:
@@ -4439,6 +4527,94 @@ TEST_F(LibRadosTwoPoolsECPP, ProxyRead) {
   cluster.wait_for_latest_osdmap();
 }
 
+TEST_F(LibRadosTwoPoolsECPP, TierNocache) {
+  // configure cache
+  bufferlist inbl;
+  ASSERT_EQ(0, cluster.mon_command(
+    "{\"prefix\": \"osd tier add\", \"pool\": \"" + pool_name +
+    "\", \"tierpool\": \"" + cache_pool_name +
+    "\", \"force_nonempty\": \"--force-nonempty\" }",
+    inbl, NULL, NULL));
+  ASSERT_EQ(0, cluster.mon_command(
+    "{\"prefix\": \"osd tier set-overlay\", \"pool\": \"" + pool_name +
+    "\", \"overlaypool\": \"" + cache_pool_name + "\"}",
+    inbl, NULL, NULL));
+
+  std::string cache_modes[] = {"writeback", "forward", "readonly", "readforward", "readproxy"};
+  int count = (int) sizeof(cache_modes)/sizeof(cache_modes[0]);
+  int i;
+  // test write/read with TierNocache in each cache-mode
+  for (i = 0; i < count; i++)
+  {
+    std::cout << "set cache-mode: " + cache_modes[i] << std::endl;
+    ASSERT_EQ(0, cluster.mon_command(
+      "{\"prefix\": \"osd tier cache-mode\", \"pool\": \"" + cache_pool_name +
+      "\", \"mode\": \"" + cache_modes[i] + "\"}",
+      inbl, NULL, NULL));
+
+    // wait for maps to settle
+    cluster.wait_for_latest_osdmap();
+
+    std::string content(cache_modes[i]);
+    std::string object(cache_modes[i] + "_obj");
+    ObjectWriteOperation wr;
+    librados::AioCompletion *completion = cluster.aio_create_completion();
+
+    // writeback: create a new object
+    bufferlist bl;
+    bl.append(content);
+    wr.write_full(bl);
+    ioctx.aio_operate(object, completion, &wr, librados::OPERATION_TIER_NOCACHE

[PATCH 5/5] Doc: add temperature related stuff in documents and test scripts

2015-05-21 Thread Li Wang
From: MingXin Liu mingxin...@ubuntukylin.com

Signed-off-by: MingXin Liu mingxin...@ubuntukylin.com
Reviewed-by: Li Wang liw...@ubuntukylin.com
---
 doc/dev/cache-pool.rst |  4 
 doc/man/8/ceph.rst | 12 +---
 doc/rados/operations/pools.rst |  7 +++
 qa/workunits/cephtool/test.sh  | 14 ++
 4 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/doc/dev/cache-pool.rst b/doc/dev/cache-pool.rst
index f44cbd9..d3b6257 100644
--- a/doc/dev/cache-pool.rst
+++ b/doc/dev/cache-pool.rst
@@ -179,5 +179,9 @@ the cache tier::
 
  ceph osd pool set foo-hot cache_min_evict_age 1800   # 30 minutes
 
+You can specify the object eviction policy (cache-measure). When cache-measure is set to atime,
+the most recently accessed objects are considered hotter than others; if temperature is used as
+the measure, the agent will consider both access time and frequency::
+ ceph osd tier cache-measure foo-hot atime|temperature
 
 
diff --git a/doc/man/8/ceph.rst b/doc/man/8/ceph.rst
index f950221..53133d8 100644
--- a/doc/man/8/ceph.rst
+++ b/doc/man/8/ceph.rst
@@ -45,7 +45,7 @@ Synopsis
 
 | **ceph** **osd** **pool** [ *create* \| *delete* \| *get* \| *get-quota* \| 
*ls* \| *mksnap* \| *rename* \| *rmsnap* \| *set* \| *set-quota* \| *stats* ] 
...
 
-| **ceph** **osd** **tier** [ *add* \| *add-cache* \| *cache-mode* \| *remove* 
\| *remove-overlay* \| *set-overlay* ] ...
+| **ceph** **osd** **tier** [ *add* \| *add-cache* \| *cache-mode* \| 
*cache-measure* \| *remove* \| *remove-overlay* \| *set-overlay* ] ...
 
 | **ceph** **pg** [ *debug* \| *deep-scrub* \| *dump* \| *dump_json* \| 
*dump_pools_json* \| *dump_stuck* \| *force_create_pg* \| *getmap* \| *ls* \| 
*ls-by-osd* \| *ls-by-pool* \| *ls-by-primary* \| *map* \| *repair* \| *scrub* 
\| *send_pg_creates* \| *set_full_ratio* \| *set_nearfull_ratio* \| *stat* ] ...
 
@@ -878,7 +878,7 @@ Only for tiered pools::
ceph osd pool get poolname 
hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|
target_max_objects|target_max_bytes|cache_target_dirty_ratio|
cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|
-   min_read_recency_for_promote
+   min_read_recency_for_promote|hit_set_grade_decay_rate
 
 Only for erasure coded pools::
 
@@ -927,7 +927,7 @@ Usage::

hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|debug_fake_ec_pool|
target_max_bytes|target_max_objects|cache_target_dirty_ratio|
cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid|
-   min_read_recency_for_promote|write_fadvise_dontneed
+   
min_read_recency_for_promote|write_fadvise_dontneed|hit_set_grade_decay_rate
val {--yes-i-really-mean-it}
 
 Subcommand ``set-quota`` sets object or byte limit on pool.
@@ -1049,6 +1049,12 @@ Usage::
ceph osd tier cache-mode poolname none|writeback|forward|readonly|
readforward|readproxy
 
+Subcommand ``cache-measure`` specifies the caching measure for cache tier 
pool.
+
+Usage::
+
+   ceph osd tier cache-measure poolname atime|temperature
+
 Subcommand ``remove`` removes the tier tierpool (the second one) from base 
pool
 pool (the first one).
 
diff --git a/doc/rados/operations/pools.rst b/doc/rados/operations/pools.rst
index 36b9c94..2c6deab 100644
--- a/doc/rados/operations/pools.rst
+++ b/doc/rados/operations/pools.rst
@@ -374,6 +374,13 @@ You may set values for the following keys:
 :Example: ``100`` #1M objects
 
 
+``hit_set_grade_decay_rate``
+:Description: Temperature grade decay rate between a hit_set and the following one
+:Type: Integer
+:Valid Range: 0 - 100
+:Default: ``50``
+
+
 ``cache_min_flush_age``
 
 :Description: The time (in seconds) before the cache tiering agent will flush 
diff --git a/qa/workunits/cephtool/test.sh b/qa/workunits/cephtool/test.sh
index 15d4e73..c51592a 100755
--- a/qa/workunits/cephtool/test.sh
+++ b/qa/workunits/cephtool/test.sh
@@ -415,6 +415,20 @@ function test_tiering()
   ceph osd pool delete cache5 cache5 --yes-i-really-really-mean-it
   ceph osd pool delete basepoolB basepoolB --yes-i-really-really-mean-it
   ceph osd pool delete basepoolA basepoolA --yes-i-really-really-mean-it
+
+  #cache-measure
+  ceph osd pool create Mbase1 2
+  ceph osd pool create Mcache1 2
+  ceph osd tier add Mbase1 Mcache1
+  ceph osd pool set Mcache1 hit_set_type bloom
+  ceph osd pool set Mcache1 hit_set_count 4
+  ceph osd pool set Mcache1 hit_set_period 1200
+  ceph osd pool set Mcache1 hit_set_grade_decay_rate 3
+  ceph osd tier cache-mode Mcache1 writeback
+  ceph osd tier cache-measure Mcache1 temperature
+  ceph osd tier set-overlay Mbase1 Mcache1
+  ceph osd tier cache-measure Mcache1 atime
+  ceph osd tier cache-measure Mcache1 temperature
 }
 
 function test_auth()
-- 
1.9.1
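
As a rough illustration of the decay-rate idea documented above, assuming each older hit set's contribution is scaled by hit_set_grade_decay_rate percent per step (the agent's actual grading code may differ), a temperature could be computed like this:

// Illustrative only: combine per-hit_set hit counts into a "temperature",
// scaling each older hit set by hit_set_grade_decay_rate percent per step.
#include <cstdio>
#include <vector>

double temperature(const std::vector<int>& hits_newest_first, int decay_rate) {
  double temp = 0.0;
  double weight = 1.0;
  for (int hits : hits_newest_first) {
    temp += hits * weight;
    weight *= decay_rate / 100.0;   // e.g. 50 -> each older set counts half
  }
  return temp;
}

int main() {
  // hit_set_count 4, newest first, decay rate 50%:
  // 3*1 + 1*0.5 + 0*0.25 + 2*0.125 = 3.75
  std::printf("%.2f\n", temperature({3, 1, 0, 2}, 50));
  return 0;
}

With a rate of 50, hits in older hit sets fade quickly, whereas the atime measure only cares about how recently an object was touched.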

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: make check, src/test/ceph-disk.sh fails on Mint

2015-05-16 Thread Michal Jarzabek
Well, I do run it on Linux Mint, but the rest of the tests pass without
any problems. So I was wondering if there was any simple way to fix
this one as well.

On Sat, May 16, 2015 at 10:30 PM, David Zafman dzaf...@redhat.com wrote:

 Is something really broken?  Or are you just on an unsupported platform?

 David


 On 5/16/15 8:49 AM, Michal Jarzabek wrote:

 ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.:


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Performance test for cache tier

2015-04-20 Thread Ning Yao
Hi, all

Fio will access each block exactly once by default. But that is not the
access pattern we want when testing a cache tier. The reason we use a
cache tier is that lots of content will be accessed repeatedly, so there
are lots of cache hits. I find random_distribution can alter the access
distribution, but it is hard to set the hit:miss ratio exactly for the
test. Is there any way to do this?
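
As far as I know fio has no direct hit:miss knob; one way to think about the pattern is to draw offsets from a small hot region with a chosen probability, as in this standalone sketch (an illustration of the idea only, not a fio feature):

// Illustrative offset generator: with probability hit_ratio pick a block from
// a small "hot" region (likely a cache hit once warmed), otherwise a cold one.
#include <cstdint>
#include <cstdio>
#include <random>

int main() {
  const uint64_t block = 4096;
  const uint64_t hot_blocks = 1024;          // small, cacheable region
  const uint64_t cold_blocks = 1024 * 1024;  // large, mostly-miss region
  const double hit_ratio = 0.8;              // target roughly 80% hits

  std::mt19937_64 rng(42);
  std::uniform_real_distribution<double> coin(0.0, 1.0);
  std::uniform_int_distribution<uint64_t> hot(0, hot_blocks - 1);
  std::uniform_int_distribution<uint64_t> cold(hot_blocks,
                                               hot_blocks + cold_blocks - 1);

  for (int i = 0; i < 10; ++i) {
    uint64_t blk = (coin(rng) < hit_ratio) ? hot(rng) : cold(rng);
    std::printf("%llu\n", (unsigned long long)(blk * block));
  }
  return 0;
}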


Regards
Ning Yao
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to test hammer rbd objectmap feature ?

2015-04-15 Thread Alexandre DERUMIER
Yeah, once we're confident in it in master. The idea behind this 
feature was to allow using object maps with existing images. There 
just wasn't time to include it in the base hammer release. 

Ok, thanks Josh !

(I'm planning to implement this feature in proxmox when it's released).


- Mail original -
De: Josh Durgin jdur...@redhat.com
À: aderumier aderum...@odiso.com, ceph-devel ceph-devel@vger.kernel.org
Envoyé: Mercredi 15 Avril 2015 01:12:38
Objet: Re: how to test hammer rbd objectmap feature ?

On 04/14/2015 12:48 AM, Alexandre DERUMIER wrote: 
 Hi, 
 
 I would like to known how to enable object map on hammer ? 
 
 I found a post hammer commit here: 
 
 https://github.com/ceph/ceph/commit/3a7b28d9a2de365d515ea1380ee9e4f867504e10 
 rbd: add feature enable/disable support 
 
 - Specifies which RBD format 2 features are to be enabled when creating 
 - an image. The numbers from the desired features below should be added 
 - to compute the parameter value: 
 + Specifies which RBD format 2 feature should be enabled when creating 
 + an image. Multiple features can be enabled by repeating this option 
 + multiple times. The following features are supported: 
 
 -.. option:: --image-features features 
 +.. option:: --image-feature feature 
 
 - +1: layering support 
 - +2: striping v2 support 
 - +4: exclusive locking support 
 - +8: object map support 
 + layering: layering support 
 + striping: striping v2 support 
 + exclusive-lock: exclusive locking support 
 + object-map: object map support (requires exclusive-lock) 
 
 
 So, in current hammer release, we can only setup objectmap and other features 
 on rbd volume creation ? 

Yes, that's right. 

 Do this patch allow to change features on the fly ? 

Yup, just for exclusive-lock and object map (since they don't 
affect object layout). 

 If yes, is it planned to backport it to hammer soon ? 

Yeah, once we're confident in it in master. The idea behind this 
feature was to allow using object maps with existing images. There 
just wasn't time to include it in the base hammer release. 

Josh 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


how to test hammer rbd objectmap feature ?

2015-04-14 Thread Alexandre DERUMIER
Hi,

I would like to know how to enable object map on hammer?

I found a post hammer commit here:

https://github.com/ceph/ceph/commit/3a7b28d9a2de365d515ea1380ee9e4f867504e10
rbd: add feature enable/disable support

-   Specifies which RBD format 2 features are to be enabled when creating
-   an image. The numbers from the desired features below should be added
-   to compute the parameter value:
+   Specifies which RBD format 2 feature should be enabled when creating
+   an image. Multiple features can be enabled by repeating this option
+   multiple times. The following features are supported:

-.. option:: --image-features features
+.. option:: --image-feature feature

-   +1: layering support
-   +2: striping v2 support
-   +4: exclusive locking support
-   +8: object map support
+   layering: layering support
+   striping: striping v2 support
+   exclusive-lock: exclusive locking support
+   object-map: object map support (requires exclusive-lock)


So, in the current hammer release, we can only set up object map and other features
at rbd volume creation?

Does this patch allow changing features on the fly?
If yes, is it planned to backport it to hammer soon?
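
For reference, what hammer does let you do today is choose the features at creation time; here is a minimal illustrative librbd (C++) sketch, where the pool name "rbd", image name "test-img" and the omitted error handling are just assumptions for the example:

// Minimal sketch: create an image with object-map (and its prerequisite
// exclusive-lock) enabled at creation time.
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>

int main() {
  librados::Rados cluster;
  cluster.init(NULL);                 // client.admin
  cluster.conf_read_file(NULL);       // default ceph.conf search path
  if (cluster.connect() < 0) return 1;

  librados::IoCtx ioctx;
  cluster.ioctx_create("rbd", ioctx);

  uint64_t features = RBD_FEATURE_LAYERING |
                      RBD_FEATURE_EXCLUSIVE_LOCK |
                      RBD_FEATURE_OBJECT_MAP;   // needs exclusive-lock
  int order = 22;                               // 4 MB objects
  librbd::RBD rbd;
  int r = rbd.create2(ioctx, "test-img", 1ULL << 30, features, &order);

  ioctx.close();
  cluster.shutdown();
  return r < 0 ? 1 : 0;
}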


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to test hammer rbd objectmap feature ?

2015-04-14 Thread Josh Durgin

On 04/14/2015 12:48 AM, Alexandre DERUMIER wrote:

Hi,

I would like to known how to enable object map on hammer ?

I found a post hammer commit here:

https://github.com/ceph/ceph/commit/3a7b28d9a2de365d515ea1380ee9e4f867504e10
rbd: add feature enable/disable support

-   Specifies which RBD format 2 features are to be enabled when creating
-   an image. The numbers from the desired features below should be added
-   to compute the parameter value:
+   Specifies which RBD format 2 feature should be enabled when creating
+   an image. Multiple features can be enabled by repeating this option
+   multiple times. The following features are supported:

-.. option:: --image-features features
+.. option:: --image-feature feature

-   +1: layering support
-   +2: striping v2 support
-   +4: exclusive locking support
-   +8: object map support
+   layering: layering support
+   striping: striping v2 support
+   exclusive-lock: exclusive locking support
+   object-map: object map support (requires exclusive-lock)


So, in current hammer release, we can only setup objectmap and other features 
on rbd volume creation ?


Yes, that's right.


Do this patch allow to change features on the fly ?


Yup, just for exclusive-lock and object map (since they don't
affect object layout).


If yes, is it planned to backport it to hammer soon ?


Yeah, once we're confident in it in master. The idea behind this
feature was to allow using object maps with existing images. There
just wasn't time to include it in the base hammer release.

Josh
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Does crushtool --test --simulate do what cluster should do?

2015-03-23 Thread Robert LeBlanc
I'm trying to create a CRUSH ruleset and I'm using crushtool to test
the rules, but it doesn't seem to be mapping things correctly. I have two
roots, one for spindles and another for SSDs. I have two rules, one for
each root. The output of crushtool on rule 0 shows objects being
mapped to SSD OSDs when it should only be choosing spindles.

I'm pretty sure I'm doing something wrong. I've tested the map on .93 and .80.8.

The map is at http://pastebin.com/BjmuASX0

when running

crushtool -i map.crush --test --num-rep 3 --rule 0 --simulate --show-mappings

I'm getting mappings to OSDs > 39, which are SSDs. The same happens when
I run the SSD rule: I get OSDs from both roots. It is as if crushtool
is not selecting the correct root. In fact both rules result in the
same mapping:

RNG rule 0 x 0 [0,38,23]
RNG rule 0 x 1 [10,25,1]
RNG rule 0 x 2 [11,40,0]
RNG rule 0 x 3 [5,30,26]
RNG rule 0 x 4 [44,30,10]
RNG rule 0 x 5 [8,26,16]
RNG rule 0 x 6 [24,5,36]
RNG rule 0 x 7 [38,10,9]
RNG rule 0 x 8 [39,9,23]
RNG rule 0 x 9 [12,3,24]
RNG rule 0 x 10 [18,6,41]
...

RNG rule 1 x 0 [0,38,23]
RNG rule 1 x 1 [10,25,1]
RNG rule 1 x 2 [11,40,0]
RNG rule 1 x 3 [5,30,26]
RNG rule 1 x 4 [44,30,10]
RNG rule 1 x 5 [8,26,16]
RNG rule 1 x 6 [24,5,36]
RNG rule 1 x 7 [38,10,9]
RNG rule 1 x 8 [39,9,23]
RNG rule 1 x 9 [12,3,24]
RNG rule 1 x 10 [18,6,41]
...


Thanks,
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-02 Thread Mark Nelson

Hi Alex,

I see I even responded in the same thread!  This would be a good thing 
to bring up in the meeting on Wednesday.  Those are far faster single 
OSD results than I've been able to muster with simplemessenger.  I 
wonder how much effect flow-control and header/data crc had.  He did 
have quite a bit more CPU (Intel specs say 14 cores @ 2.6GHz, 28 if you 
count hyperthreading).  Depending on whether there were 1 or 2 CPUs in 
that node, that might be around 3x the CPU power I have here.


Some other thoughts:  Were the simplemessenger tests on IPoIB or native? 
 How big was the RBD volume that was created (could some data be 
locally cached)?  Did network data transfer statistics match the 
benchmark result numbers?


I also did some tests on fdcache, though just glancing at the results it 
doesn't look like tweaking those parameters had much effect.


Mark

On 03/01/2015 08:38 AM, Alexandre DERUMIER wrote:

Hi Mark,

I found an previous bench from Vu Pham (it's was about simplemessenger vs 
xiomessenger)

http://www.spinics.net/lists/ceph-devel/msg22414.html

and with 1 osd, he was able to reach 105k iops with simple messenger

. ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32)

this was with more powerfull nodes, but the difference seem to be quite huge



- Mail original -
De: aderumier aderum...@odiso.com
À: Mark Nelson mnel...@redhat.com
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Vendredi 27 Février 2015 07:10:42
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Thanks Mark for the results,
default values seem to be quite resonable indeed.


I also wonder is cpu frequency can have an impact on latency or not.
I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming weeks,
I'll try replay your benchmark to compare



- Mail original -
De: Mark Nelson mnel...@redhat.com
À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Jeudi 26 Février 2015 05:44:15
Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Hi Everyone,

In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison
thread, Alexandre DERUMIER wondered if changing the default shard and
threads per shard OSD settings might have a positive effect on
performance in our tests. I went back and used one of the PCIe SSDs
from our previous tests to experiment with a recent master pull. I
wanted to know how performance was affected by changing these parameters
and also to validate that the default settings still appear to be correct.

I plan to conduct more tests (potentially across multiple SATA SSDs in
the same box), but these initial results seem to show that the default
settings that were chosen are quite reasonable.

Mark

___
ceph-users mailing list
ceph-us...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-us...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-02 Thread Alexandre DERUMIER
 This would be a good thing to bring up in the meeting on Wednesday. 
yes !

I wonder how much effect flow-control and header/data crc had. 
yes. I know that Somnath also disabled crc for his bench

Were the simplemessenger tests on IPoIB or native? 

I think it's native, as the Vu Pham benchmark was done on mellanox sx1012 
(ethernet).
And xio messenger was on Roce (rdma over ethernet)

How big was the RBD volume that was created (could some data be 
locally cached)? Did network data transfer statistics match the 
benchmark result numbers? 



I've cc'd Vu Pham on this mail; maybe he'll be able to give us an answer.


Note that I'll have the same mellanox switches (sx1012) for my production cluster 
in some weeks,
so I'll be able to reproduce the bench (with dual 10-core 3.1GHz nodes and 
clients).





- Mail original -
De: Mark Nelson mnel...@redhat.com
À: aderumier aderum...@odiso.com
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Lundi 2 Mars 2015 15:39:24
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Hi Alex, 

I see I even responded in the same thread! This would be a good thing 
to bring up in the meeting on Wednesday. Those are far faster single 
OSD results than I've been able to muster with simplemessenger. I 
wonder how much effect flow-control and header/data crc had. He did 
have quite a bit more CPU (Intel specs say 14 cores @ 2.6GHz, 28 if you 
count hyperthreading). Depending on whether there were 1 or 2 CPUs in 
that node, that might be around 3x the CPU power I have here. 

Some other thoughts: Were the simplemessenger tests on IPoIB or native? 
How big was the RBD volume that was created (could some data be 
locally cached)? Did network data transfer statistics match the 
benchmark result numbers? 

I also did some tests on fdcache, though just glancing at the results it 
doesn't look like tweaking those parameters had much effect. 

Mark 

On 03/01/2015 08:38 AM, Alexandre DERUMIER wrote: 
 Hi Mark, 
 
 I found an previous bench from Vu Pham (it's was about simplemessenger vs 
 xiomessenger) 
 
 http://www.spinics.net/lists/ceph-devel/msg22414.html 
 
 and with 1 osd, he was able to reach 105k iops with simple messenger 
 
 . ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32) 
 
 this was with more powerfull nodes, but the difference seem to be quite huge 
 
 
 
 - Mail original - 
 De: aderumier aderum...@odiso.com 
 À: Mark Nelson mnel...@redhat.com 
 Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
 ceph-us...@lists.ceph.com 
 Envoyé: Vendredi 27 Février 2015 07:10:42 
 Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 
 
 Thanks Mark for the results, 
 default values seem to be quite resonable indeed. 
 
 
 I also wonder is cpu frequency can have an impact on latency or not. 
 I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming weeks, 
 I'll try replay your benchmark to compare 
 
 
 
 - Mail original - 
 De: Mark Nelson mnel...@redhat.com 
 À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
 ceph-us...@lists.ceph.com 
 Envoyé: Jeudi 26 Février 2015 05:44:15 
 Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 
 
 Hi Everyone, 
 
 In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison 
 thread, Alexandre DERUMIER wondered if changing the default shard and 
 threads per shard OSD settings might have a positive effect on 
 performance in our tests. I went back and used one of the PCIe SSDs 
 from our previous tests to experiment with a recent master pull. I 
 wanted to know how performance was affected by changing these parameters 
 and also to validate that the default settings still appear to be correct. 
 
 I plan to conduct more tests (potentially across multiple SATA SSDs in 
 the same box), but these initial results seem to show that the default 
 settings that were chosen are quite reasonable. 
 
 Mark 
 
 ___ 
 ceph-users mailing list 
 ceph-us...@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 
 ___ 
 ceph-users mailing list 
 ceph-us...@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 -- 
 To unsubscribe from this list: send the line unsubscribe ceph-devel in 
 the body of a message to majord...@vger.kernel.org 
 More majordomo info at http://vger.kernel.org/majordomo-info.html 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-02 Thread Vu Pham

  This would be a good thing to bring up in the meeting on Wednesday.
yes !


Yes, we can discuss details on Wed's call.



I wonder how much effect flow-control and header/data crc had.
yes. I known that sommath also disable crc for his bench


I disabled ceph's header/data crc for both simplemessenger & xio but 
didn't run with header/data crc enabled to see the differences.



Were the simplemessenger tests on IPoIB or native?

I think it's native, as the Vu Pham benchmark was done on mellanox 
sx1012 (ethernet).
And xio messenger was on Roce (rdma over ethernet)

Yes, it's native for simplemessenger and RoCE for xio messenger



How big was the RBD volume that was created (could some data be
locally cached)? Did network data transfer statistics match the
benchmark result numbers?

Single OSD on 4GB ramdisk, journal size is 256MB.

RBD volume is only 128MB; however, I ran fio_rbd client with direct=1 to 
bypass local buffer cache
Yes, the network data xfer statistics match the benchmark result 
numbers.
I used dstat -N ethX to monitor the network data statistics

I also turned all cores @ full speed and applied one parameter tuning 
for Mellanox ConnectX-3 HCA mlx4_core driver
(options mlx4_core  log_num_mgm_entry_size=-7)

$ cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
2601000

$ for c in ./cpu[0-55]*; do echo 2601000 > ${c}/cpufreq/scaling_min_freq; done






I @cc Vu pham to this mail maybe it'll be able to give us answer.


Note that I'll have same mellanox switches (sx1012) for my production 
cluster in some weeks,
so I'll be able to reproduce the bench. (with 2x10 cores 3,1ghz nodes 
and clients).





- Mail original -
De: Mark Nelson mnel...@redhat.com
À: aderumier aderum...@odiso.com
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Lundi 2 Mars 2015 15:39:24
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Hi Alex,

I see I even responded in the same thread! This would be a good thing
to bring up in the meeting on Wednesday. Those are far faster single
OSD results than I've been able to muster with simplemessenger. I
wonder how much effect flow-control and header/data crc had. He did
have quite a bit more CPU (Intel specs say 14 cores @ 2.6GHz, 28 if you
count hyperthreading). Depending on whether there were 1 or 2 CPUs in
that node, that might be around 3x the CPU power I have here.

Some other thoughts: Were the simplemessenger tests on IPoIB or native?
How big was the RBD volume that was created (could some data be
locally cached)? Did network data transfer statistics match the
benchmark result numbers?

I also did some tests on fdcache, though just glancing at the results 
it
doesn't look like tweaking those parameters had much effect.

Mark

On 03/01/2015 08:38 AM, Alexandre DERUMIER wrote:
  Hi Mark,

  I found an previous bench from Vu Pham (it's was about 
simplemessenger vs xiomessenger)

  http://www.spinics.net/lists/ceph-devel/msg22414.html

  and with 1 osd, he was able to reach 105k iops with simple messenger

  . ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32)

  this was with more powerfull nodes, but the difference seem to be 
quite huge



  - Mail original -
  De: aderumier aderum...@odiso.com
  À: Mark Nelson mnel...@redhat.com
  Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
  Envoyé: Vendredi 27 Février 2015 07:10:42
  Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

  Thanks Mark for the results,
  default values seem to be quite resonable indeed.


  I also wonder is cpu frequency can have an impact on latency or not.
  I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming 
weeks,
  I'll try replay your benchmark to compare



  - Mail original -
  De: Mark Nelson mnel...@redhat.com
  À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
  Envoyé: Jeudi 26 Février 2015 05:44:15
  Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

  Hi Everyone,

  In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance 
comparison
  thread, Alexandre DERUMIER wondered if changing the default shard and
  threads per shard OSD settings might have a positive effect on
  performance in our tests. I went back and used one of the PCIe SSDs
  from our previous tests to experiment with a recent master pull. I
  wanted to know how performance was affected by changing these 
parameters
  and also to validate that the default settings still appear to be 
correct.

  I plan to conduct more tests (potentially across multiple SATA SSDs 
in
  the same box), but these initial results seem to show that the 
default
  settings that were chosen are quite reasonable.

  Mark

  ___
  ceph-users mailing list
  ceph-us...@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-01 Thread Kevin Walker
Can I ask what xio and simple messenger are and the differences?

Kind regards

Kevin Walker
+968 9765 1742

On 1 Mar 2015, at 18:38, Alexandre DERUMIER aderum...@odiso.com wrote:

Hi Mark,

I found an previous bench from Vu Pham (it's was about simplemessenger vs 
xiomessenger)

http://www.spinics.net/lists/ceph-devel/msg22414.html

and with 1 osd, he was able to reach 105k iops with simple messenger

. ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32)

this was with more powerfull nodes, but the difference seem to be quite huge



- Mail original -
De: aderumier aderum...@odiso.com
À: Mark Nelson mnel...@redhat.com
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Vendredi 27 Février 2015 07:10:42
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Thanks Mark for the results, 
default values seem to be quite resonable indeed. 


I also wonder is cpu frequency can have an impact on latency or not. 
I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming weeks, 
I'll try replay your benchmark to compare 



- Mail original - 
De: Mark Nelson mnel...@redhat.com 
À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com 
Envoyé: Jeudi 26 Février 2015 05:44:15 
Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 

Hi Everyone, 

In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison 
thread, Alexandre DERUMIER wondered if changing the default shard and 
threads per shard OSD settings might have a positive effect on 
performance in our tests. I went back and used one of the PCIe SSDs 
from our previous tests to experiment with a recent master pull. I 
wanted to know how performance was affected by changing these parameters 
and also to validate that the default settings still appear to be correct. 

I plan to conduct more tests (potentially across multiple SATA SSDs in 
the same box), but these initial results seem to show that the default 
settings that were chosen are quite reasonable. 

Mark 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-us...@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-01 Thread Alexandre DERUMIER
Hi Mark,

I found a previous bench from Vu Pham (it was about simplemessenger vs 
xiomessenger)

http://www.spinics.net/lists/ceph-devel/msg22414.html

and with 1 osd, he was able to reach 105k iops with simple messenger

. ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32)

this was with more powerful nodes, but the difference seems to be quite huge



- Mail original -
De: aderumier aderum...@odiso.com
À: Mark Nelson mnel...@redhat.com
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Vendredi 27 Février 2015 07:10:42
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Thanks Mark for the results, 
default values seem to be quite resonable indeed. 


I also wonder is cpu frequency can have an impact on latency or not. 
I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming weeks, 
I'll try replay your benchmark to compare 



- Mail original - 
De: Mark Nelson mnel...@redhat.com 
À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com 
Envoyé: Jeudi 26 Février 2015 05:44:15 
Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 

Hi Everyone, 

In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison 
thread, Alexandre DERUMIER wondered if changing the default shard and 
threads per shard OSD settings might have a positive effect on 
performance in our tests. I went back and used one of the PCIe SSDs 
from our previous tests to experiment with a recent master pull. I 
wanted to know how performance was affected by changing these parameters 
and also to validate that the default settings still appear to be correct. 

I plan to conduct more tests (potentially across multiple SATA SSDs in 
the same box), but these initial results seem to show that the default 
settings that were chosen are quite reasonable. 

Mark 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-03-01 Thread Alexandre DERUMIER
Can I ask what xio and simple messenger are and the differences? 

simple messenger is the classic messenger protocol used since the beginning of 
ceph.
xio messenger is for RDMA (infiniband or RoCE over ethernet).
There is also a new async messenger.

They should help to reduce latencies (and also cpu usage for rdma, because you 
don't have tcp overhead)


- Mail original -
De: Kevin Walker kwal...@virtualviolet.net
À: aderumier aderum...@odiso.com
Cc: Mark Nelson mnel...@redhat.com, ceph-devel 
ceph-devel@vger.kernel.org, ceph-users ceph-us...@lists.ceph.com
Envoyé: Dimanche 1 Mars 2015 22:49:23
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Can I ask what xio and simple messenger are and the differences? 

Kind regards 

Kevin Walker 
+968 9765 1742 

On 1 Mar 2015, at 18:38, Alexandre DERUMIER aderum...@odiso.com wrote: 

Hi Mark, 

I found an previous bench from Vu Pham (it's was about simplemessenger vs 
xiomessenger) 

http://www.spinics.net/lists/ceph-devel/msg22414.html 

and with 1 osd, he was able to reach 105k iops with simple messenger 

. ~105k iops (4K random read, 20 cores used, numjobs=8, iopdepth=32) 

this was with more powerfull nodes, but the difference seem to be quite huge 



- Mail original - 
De: aderumier aderum...@odiso.com 
À: Mark Nelson mnel...@redhat.com 
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com 
Envoyé: Vendredi 27 Février 2015 07:10:42 
Objet: Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 

Thanks Mark for the results, 
default values seem to be quite resonable indeed. 


I also wonder is cpu frequency can have an impact on latency or not. 
I'm going to benchmark on dual xeon 10-cores 3,1ghz nodes in coming weeks, 
I'll try replay your benchmark to compare 



- Mail original - 
De: Mark Nelson mnel...@redhat.com 
À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com 
Envoyé: Jeudi 26 Février 2015 05:44:15 
Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results 

Hi Everyone, 

In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison 
thread, Alexandre DERUMIER wondered if changing the default shard and 
threads per shard OSD settings might have a positive effect on 
performance in our tests. I went back and used one of the PCIe SSDs 
from our previous tests to experiment with a recent master pull. I 
wanted to know how performance was affected by changing these parameters 
and also to validate that the default settings still appear to be correct. 

I plan to conduct more tests (potentially across multiple SATA SSDs in 
the same box), but these initial results seem to show that the default 
settings that were chosen are quite reasonable. 

Mark 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

2015-02-26 Thread Alexandre DERUMIER
Thanks Mark for the results,
default values seem to be quite reasonable indeed.


I also wonder if cpu frequency can have an impact on latency or not.
I'm going to benchmark on dual xeon 10-core 3.1GHz nodes in the coming weeks;
I'll try to replay your benchmark to compare



- Mail original -
De: Mark Nelson mnel...@redhat.com
À: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Envoyé: Jeudi 26 Février 2015 05:44:15
Objet: [ceph-users] Ceph Hammer OSD Shard Tuning Test Results

Hi Everyone, 

In the Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison 
thread, Alexandre DERUMIER wondered if changing the default shard and 
threads per shard OSD settings might have a positive effect on 
performance in our tests. I went back and used one of the PCIe SSDs 
from our previous tests to experiment with a recent master pull. I 
wanted to know how performance was affected by changing these parameters 
and also to validate that the default settings still appear to be correct. 

I plan to conduct more tests (potentially across multiple SATA SSDs in 
the same box), but these initial results seem to show that the default 
settings that were chosen are quite reasonable. 

Mark 

___ 
ceph-users mailing list 
ceph-us...@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Is there any documents to describe the architecture of ceph unit test based on gtest

2014-12-09 Thread Nicheal
Hi, all

Is there any guideline that describes how to run the ceph unit tests, and their
basic architecture?
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is there any documents to describe the architecture of ceph unit test based on gtest

2014-12-09 Thread Gregory Farnum
On Tue, Dec 9, 2014 at 1:50 AM, Nicheal zay11...@gmail.com wrote:
 Hi, all

 Is any guideline that describes how to run the ceph unit test, and its
 basic architecture?

You can run them all by executing make check [-j N]. The executables
run as part of that are specified in the makefiles throughout the
project structure (CHECK_programs or similar). For more on how the
gtest framework is set up, you should look at the gtest docs. ;)
-Greg
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Teuthology smoke test case(tasks/rados_python.yaml) failed on giant

2014-10-20 Thread Aanchal Agrawal
Hi All,

I am using giant branch for the development purpose.
One of the teuthology smoke test cases, 
'ceph-qa-suite/suites/smoke/basic/tasks/rados_python.yaml', is failing on it.

From teuthology.log
==
2014-10-14T11:31:09.461 
INFO:teuthology.task.workunit.client.0.plana02.stderr:test_rados.test_rados_init
 ... ok
2014-10-14T11:31:09.489 
INFO:teuthology.task.workunit.client.0.plana02.stderr:test_rados.test_ioctx_context_manager
 ... ok
2014-10-14T11:31:09.490 INFO:teuthology.task.workunit.client.0.plana02.stderr:
2014-10-14T11:31:09.490 
INFO:teuthology.task.workunit.client.0.plana02.stderr:==
2014-10-14T11:31:09.490 
INFO:teuthology.task.workunit.client.0.plana02.stderr:ERROR: 
test_rados.TestRados.test_get_pool_base_tier
2014-10-14T11:31:09.490 
INFO:teuthology.task.workunit.client.0.plana02.stderr:--
2014-10-14T11:31:09.490 
INFO:teuthology.task.workunit.client.0.plana02.stderr:Traceback (most recent 
call last):
2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:  
File /usr/local/lib/python2.7/dist-packages/nose/case.py, line 197, in runTest
2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:  
  self.test(*self.arg)
2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:  
File /home/ubuntu/cephtest/mnt.0/client.0/tmp/test_rados.py, line 96, in 
test_get_pool_base_tier
2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:  
  pool_id = self.rados.pool_lookup('foo')
2014-10-14T11:31:09.491 
INFO:teuthology.task.workunit.client.0.plana02.stderr:AttributeError: 'Rados' 
object has no attribute 'pool_lookup'
2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:
2014-10-14T11:31:09.491 
INFO:teuthology.task.workunit.client.0.plana02.stderr:--
2014-10-14T11:31:09.492 
INFO:teuthology.task.workunit.client.0.plana02.stderr:Ran 36 tests in 98.393s
2014-10-14T11:31:09.492 INFO:teuthology.task.workunit.client.0.plana02.stderr:
2014-10-14T11:31:09.492 
INFO:teuthology.task.workunit.client.0.plana02.stderr:FAILED (errors=1)


self.rados.pool_lookup() is a part of file 'src/test/pybind/test_rados.py'.
===
91def test_get_pool_base_tier(self):

95  try:
96   pool_id = self.rados.pool_lookup('foo')
===

The failure is due to a recent commit to '/src/pybind/rados.py' which is in master 
and not present in giant.
It defines 'pool_lookup()' for the Rados object.

Following is a commit:
commit : d8ae14f48965e2aff191d67fda7b5ffd85840d47 to master.

We need to have this commit in the giant branch in order to pass the test case.
Can anyone pull it into giant?

Thanks,
Aanchal



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Teuthology smoke test case(tasks/rados_python.yaml) failed on giant

2014-10-20 Thread Gregory Farnum
It sounds like you're running the master branch ceph-qa-suite tests
against the giant branch of Ceph. The tests should pass if you resolve
that.

If not, or if you have some particular need for this function in the
Giant release, you (or someone you work with) can submit a backport
Pull Request, but this is a feature addition and Giant is pretty close
to release so I'm not sure it's appropriate for merging right now.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sun, Oct 19, 2014 at 11:04 PM, Aanchal Agrawal
aanchal.agra...@sandisk.com wrote:
 Hi All,

 I am using giant branch for the development purpose.
 One of the teuthology smoke test case 
 'ceph-qa-suite/suites/smoke/basic/tasks/rados_python.yaml' is failing on it.

 From teuthology.log
 ==
 2014-10-14T11:31:09.461 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:test_rados.test_rados_init
  ... ok
 2014-10-14T11:31:09.489 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:test_rados.test_ioctx_context_manager
  ... ok
 2014-10-14T11:31:09.490 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:==
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:ERROR: 
 test_rados.TestRados.test_get_pool_base_tier
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:--
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:Traceback (most recent 
 call last):
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:  File 
 /usr/local/lib/python2.7/dist-packages/nose/case.py, line 197, in runTest
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:self.test(*self.arg)
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:  File 
 /home/ubuntu/cephtest/mnt.0/client.0/tmp/test_rados.py, line 96, in 
 test_get_pool_base_tier
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:pool_id = 
 self.rados.pool_lookup('foo')
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:AttributeError: 'Rados' 
 object has no attribute 'pool_lookup'
 2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:--
 2014-10-14T11:31:09.492 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:Ran 36 tests in 98.393s
 2014-10-14T11:31:09.492 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 2014-10-14T11:31:09.492 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:FAILED (errors=1)
 

 self.rados.pool_lookup() is a part of file 'src/test/pybind/test_rados.py'.
 ===
 91def test_get_pool_base_tier(self):

 95  try:
 96   pool_id = self.rados.pool_lookup('foo')
 ===

 Failure is due to latest commits on '/src/pybind/rados.py' which is in master 
 and not present in giant.
 It defines 'pool_lookup()' for rados object .

 Following is a commit:
 commit : d8ae14f48965e2aff191d67fda7b5ffd85840d47 to master.

 We need to have this commit in giant branch in order to pass the test case.
 Can anyone pull it to giant?

 Thanks,
 Aanchal

 

 PLEASE NOTE: The information contained in this electronic mail message is 
 intended only for the use of the designated recipient(s) named above. If the 
 reader of this message is not the intended recipient, you are hereby notified 
 that you have received this message in error and that any review, 
 dissemination, distribution, or copying of this message is strictly 
 prohibited. If you have received this communication in error, please notify 
 the sender by telephone or e-mail (as shown above) immediately and destroy 
 any and all copies of this message in your possession (whether hard copies or 
 electronically stored copies).

 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Teuthology smoke test case(tasks/rados_python.yaml) failed on giant

2014-10-20 Thread Aanchal Agrawal
Got it. Thanks for the reply Greg.

Regards,
Aanchal

-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com] 
Sent: Monday, October 20, 2014 10:46 PM
To: Aanchal Agrawal
Cc: ceph-devel@vger.kernel.org
Subject: Re: Teuthology smoke test case(tasks/rados_python.yaml) failed on giant

It sounds like you're running the master branch ceph-qa-suite tests against the 
giant branch of Ceph. The tests should pass if you resolve that.

If not, or if you have some particular need for this function in the Giant 
release, you (or someone you work with) can submit a backport Pull Request, but 
this is a feature addition and Giant is pretty close to release so I'm not sure 
it's appropriate for merging right now.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sun, Oct 19, 2014 at 11:04 PM, Aanchal Agrawal aanchal.agra...@sandisk.com 
wrote:
 Hi All,

 I am using giant branch for the development purpose.
 One of the teuthology smoke test case 
 'ceph-qa-suite/suites/smoke/basic/tasks/rados_python.yaml' is failing on it.

 From teuthology.log
 ==
 2014-10-14T11:31:09.461 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:test_rados.test_
 rados_init ... ok
 2014-10-14T11:31:09.489 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:test_rados.test_
 ioctx_context_manager ... ok
 2014-10-14T11:31:09.490 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 ==
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:ERROR: 
 test_rados.TestRados.test_get_pool_base_tier
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 --
 2014-10-14T11:31:09.490 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:Traceback (most recent 
 call last):
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:  File 
 /usr/local/lib/python2.7/dist-packages/nose/case.py, line 197, in runTest
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:self.test(*self.arg)
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:  File 
 /home/ubuntu/cephtest/mnt.0/client.0/tmp/test_rados.py, line 96, in 
 test_get_pool_base_tier
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:pool_id = 
 self.rados.pool_lookup('foo')
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:AttributeError: 'Rados' 
 object has no attribute 'pool_lookup'
 2014-10-14T11:31:09.491 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 2014-10-14T11:31:09.491 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 --
 2014-10-14T11:31:09.492 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:Ran 36 tests in 
 98.393s
 2014-10-14T11:31:09.492 INFO:teuthology.task.workunit.client.0.plana02.stderr:
 2014-10-14T11:31:09.492 
 INFO:teuthology.task.workunit.client.0.plana02.stderr:FAILED 
 (errors=1) 

 self.rados.pool_lookup() is a part of file 'src/test/pybind/test_rados.py'.
 ===
 91def test_get_pool_base_tier(self):

 95  try:
 96   pool_id = self.rados.pool_lookup('foo')
 ===

 Failure is due to latest commits on '/src/pybind/rados.py' which is in master 
 and not present in giant.
 It defines 'pool_lookup()' for rados object .

 Following is a commit:
 commit : d8ae14f48965e2aff191d67fda7b5ffd85840d47 to master.

 We need to have this commit in giant branch in order to pass the test case.
 Can anyone pull it to giant?

 Thanks,
 Aanchal

 


 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel 
 in the body of a message to majord...@vger.kernel.org More majordomo 
 info at  http://vger.kernel.org/majordomo-info.html


RE: Is that possible to run teuthology on local test servers

2014-08-14 Thread Sage Weil
You can accomplish the same thing by adding a section like

targets:
  ubu...@plana37.front.sepia.ceph.com: ssh-rsa 
B3NzaC1yc2EDAQABAAABAQDlGkqSVZEriH63rFY4R4y6jelC0RGv/A8JtkIGWqRDIjLYsXgB4GoSNsI5H0FGhR+8ZiE2kOimFvpzYXCiFEW9QNWEcdKn75ZsnUnCTJf0fhYUyyD0qaxL3BhDyPX2pazSPvIB9w+WSATpW3XoL6uos7VIRjGn95VsHOonWjx25rupntyF9ac0VChYCFAO7WVQcxDfQ8MAW3O4YhH+uIuzv62ZahI8l+9A1rKbxcQU5aOaS+lhJArHEGxF38JXGWOmqybhe1x8wc4Fw/1g777VfwbyzG0wh2WgxkZ2R7qzKttnJ+iT8a/Un+ZLi8AnWN7KiwabRiMvc1VXO6zl

to your yaml.  The key is the *public* key of the machine (what you get 
from ssh-keyscan), to verify you are talking to the right target.  That, 
along with the 'roles' and 'tasks' section, is put in a yaml file, and the 
filename is passed to the 'teuthology' binary.
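
If you want to script that step, a small helper along these lines works 
(illustrative only, not part of teuthology; it assumes Python 2 and that 
ssh-keyscan is in your PATH):

    import subprocess

    def host_pubkey(host, key_type='rsa'):
        # ssh-keyscan prints lines like: "<host> ssh-rsa <key>"
        out = subprocess.check_output(['ssh-keyscan', '-t', key_type, host])
        for line in out.splitlines():
            if line and not line.startswith('#'):
                return line.split(None, 1)[1]

    # e.g. targets['ubuntu@plana37'] = host_pubkey('plana37.front.sepia.ceph.com')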

sage


On Thu, 14 Aug 2014, Vijayendra Shamanna wrote:

 Hi,
 
 I had gotten teuthology to work some time back to a reasonable extent in my 
 local setup with a few quick ugly hacks. The main set of changes were,
 
 1. Explicitly named my test systems as plana01 , plana02  plana03. 
 Some of the teuthology code which checks for VM instances does compare with 
 known set/class of machine names
 2. In lock_machines() routine (teuthology/task/internal.py), set 
 ctx.config['targets'] as follows:
 mydict['myuser@plana01'] = 'ssh-rsa my_ssh_key1'
 mydict['myuser@plana02'] = 'ssh-rsa my_ssh_key2' and so on..
ctx.config['targets'] = mydict
 3. Comment out the reporting code in teuthology/report.py
 
 There were a few other minor changes like disabling teuthology branch 
 checkout etc.
 
 Thanks,
 Viju
 
 
 
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of guping
 Sent: Thursday, August 14, 2014 8:22 AM
 To: ceph-devel@vger.kernel.org
 Subject: Is that possible to run teuthology on local test servers
 
 I read the doc on the github teuthology, but still can not figure out how to 
 run teuthology on my local test servers.
 Any experience? Any advice?
 
 --
 Thanks,
 Gu Ping
 
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in the 
 body of a message to majord...@vger.kernel.org More majordomo info at  
 http://vger.kernel.org/majordomo-info.html
 
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Is that possible to run teuthology on local test servers

2014-08-13 Thread guping
I read the docs on the teuthology GitHub, but I still cannot figure out 
how to run teuthology on my local test servers.

Any experience? Any advice?

--
Thanks,
Gu Ping

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Is that possible to run teuthology on local test servers

2014-08-13 Thread Vijayendra Shamanna
Hi,

I had gotten teuthology to work some time back to a reasonable extent in my 
local setup with a few quick ugly hacks. The main set of changes were,

1. Explicitly named my test systems plana01, plana02 and plana03. Some of the 
teuthology code which checks for VM instances compares against a known 
set/class of machine names.
2. In the lock_machines() routine (teuthology/task/internal.py), set 
ctx.config['targets'] as follows (a sketch follows below):
mydict['myuser@plana01'] = 'ssh-rsa my_ssh_key1'
mydict['myuser@plana02'] = 'ssh-rsa my_ssh_key2'  # ...and so on
ctx.config['targets'] = mydict
3. Comment out the reporting code in teuthology/report.py
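
A rough sketch of what item 2 amounts to (illustrative only; the real 
lock_machines() in teuthology/task/internal.py does more and changes between 
teuthology versions):

    def lock_machines(ctx, config):
        # Hard-code the targets instead of asking the lock server for machines.
        targets = {
            'myuser@plana01': 'ssh-rsa my_ssh_key1',
            'myuser@plana02': 'ssh-rsa my_ssh_key2',
            'myuser@plana03': 'ssh-rsa my_ssh_key3',
        }
        ctx.config['targets'] = targets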

There were a few other minor changes like disabling teuthology branch checkout 
etc.

Thanks,
Viju



-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of guping
Sent: Thursday, August 14, 2014 8:22 AM
To: ceph-devel@vger.kernel.org
Subject: Is that possible to run teuthology on local test servers

I read the doc on the github teuthology, but still can not figure out how to 
run teuthology on my local test servers.
Any experience? Any advice?

--
Thanks,
Gu Ping

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in the 
body of a message to majord...@vger.kernel.org More majordomo info at  
http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


swift test error interpretation

2014-06-28 Thread Loic Dachary
Hi Yehuda,

The following error was found when running a firefly cluster mixed with an 
erasure coded related change ( https://github.com/ceph/ceph/pull/1890 ). The 
matching teuthology task 
http://qa-proxy.ceph.com/teuthology/loic-2014-06-27_18:45:37-upgrade:firefly-x:stress-split-wip-8475-testing-basic-plana/329523/
 is part of an upgrade suite 
http://pulpito.ceph.com/loic-2014-06-27_18:45:37-upgrade:firefly-x:stress-split-wip-8475-testing-basic-plana/
 that has been generated from a new 
https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade suite : 
https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade/firefly-x . It 
was created by copying 
https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade/dumpling-x , 
mostly.

Do you have any advice on how to interpret this output?

Feel free to redirect me to whatever URL if this is something explained 
elsewhere. Any pointer would be greatly appreciated :-)

Cheers

http://qa-proxy.ceph.com/teuthology/loic-2014-06-27_18:45:37-upgrade:firefly-x:stress-split-wip-8475-testing-basic-plana/329523/

2014-06-27T19:37:39.441 
INFO:teuthology.orchestra.run.plana64.stderr:testZeroByteFile 
(test.functional.tests.TestFileUTF8) ... ok
2014-06-27T19:37:39.441 INFO:teuthology.orchestra.run.plana64.stderr:
2014-06-27T19:37:39.442 
INFO:teuthology.orchestra.run.plana64.stderr:==
2014-06-27T19:37:39.442 INFO:teuthology.orchestra.run.plana64.stderr:ERROR: 
testAccountHead (test.functional.tests.TestAccount)
2014-06-27T19:37:39.442 
INFO:teuthology.orchestra.run.plana64.stderr:--
2014-06-27T19:37:39.442 INFO:teuthology.orchestra.run.plana64.stderr:Traceback 
(most recent call last):
2014-06-27T19:37:39.442 INFO:teuthology.orchestra.run.plana64.stderr:  File 
/home/ubuntu/cephtest/swift/test/functional/tests.py, line 104, in setUp
2014-06-27T19:37:39.442 INFO:teuthology.orchestra.run.plana64.stderr:
cls.env.setUp()
2014-06-27T19:37:39.442 INFO:teuthology.orchestra.run.plana64.stderr:  File 
/home/ubuntu/cephtest/swift/test/functional/tests.py, line 140, in setUp
2014-06-27T19:37:39.442 INFO:teuthology.orchestra.run.plana64.stderr:raise 
ResponseError(cls.conn.response)
2014-06-27T19:37:39.443 
INFO:teuthology.orchestra.run.plana64.stderr:ResponseError: 500: Internal 
Server Error
2014-06-27T19:37:39.443 INFO:teuthology.orchestra.run.plana64.stderr:
2014-06-27T19:37:39.443 
INFO:teuthology.orchestra.run.plana64.stderr:--
2014-06-27T19:37:39.443 INFO:teuthology.orchestra.run.plana64.stderr:Ran 137 
tests in 452.719s
2014-06-27T19:37:39.443 INFO:teuthology.orchestra.run.plana64.stderr:
2014-06-27T19:37:39.443 INFO:teuthology.orchestra.run.plana64.stderr:FAILED 
(errors=1)
2014-06-27T19:37:39.460 ERROR:teuthology.contextutil:Saw exception from nested 
tasks
Traceback (most recent call last):
  File /home/teuthworker/teuthology-master/teuthology/contextutil.py, line 
27, in nested
vars.append(enter())
  File /usr/lib/python2.7/contextlib.py, line 17, in __enter__
return self.gen.next()
  File /home/teuthworker/teuthology-master/teuthology/task/swift.py, line 
175, in run_tests
args=args,
  File /home/teuthworker/teuthology-master/teuthology/orchestra/cluster.py, 
line 64, in run
return [remote.run(**kwargs) for remote in remotes]
  File /home/teuthworker/teuthology-master/teuthology/orchestra/remote.py, 
line 114, in run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File /home/teuthworker/teuthology-master/teuthology/orchestra/run.py, line 
401, in run
r.wait()
  File /home/teuthworker/teuthology-master/teuthology/orchestra/run.py, line 
102, in wait
exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana64 with status 1: 
SWIFT_TEST_CONFIG_FILE=/home/ubuntu/cephtest/archive/testswift.client.0.conf 
/home/ubuntu/cephtest/swift/virtualen
-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Recommended teuthology upgrade test

2014-06-27 Thread Loic Dachary
Hi Sam,

TL;DR: what oneliner do you recommend to run upgrade tests for 
https://github.com/ceph/ceph/pull/1890 ? 

Running the rados suite can be done with :

   ./schedule_suite.sh rados wip-8071 testing l...@dachary.org basic master 
plana 

or something else since ./schedule_suite.sh was recently obsoleted ( 
http://tracker.ceph.com/issues/8678 ). Running something similar for upgrade 
will presumably run all of 
https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade

Is there a way to run minimal tests by limiting the upgrade suite so that it 
only focuses on a firefly cluster that upgrades to 
https://github.com/ceph/ceph/pull/1890 so that it checks the behavior when 
running a mixed cluster (firefly + master with the change) ?

It looks like http://pulpito.ceph.com/?suite=upgrade was never run ( at least 
that's what appears to cause http://tracker.ceph.com/issues/8681 ) Is 
http://pulpito.ceph.com/?suite=upgrade-rados a good fit ? If so is there a way 
to figure out how it was created ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Recommended teuthology upgrade test

2014-06-27 Thread Yuri Weinstein
Loic

I don't intend to answer all questions, but here's some info; see inline

On Fri, Jun 27, 2014 at 8:16 AM, Loic Dachary l...@dachary.org wrote:
 Hi Sam,

 TL;DR: what oneliner do you recommend to run upgrade tests for 
 https://github.com/ceph/ceph/pull/1890 ?

 Running the rados suite can be done with :

./schedule_suite.sh rados wip-8071 testing l...@dachary.org basic master 
 plana


It was replaced with teuthology-suite, see --help for more info

 or something else since ./schedule_suite.sh was recently obsoleted ( 
 http://tracker.ceph.com/issues/8678 ). Running something similar for upgrade 
 will presumably run all of 
 https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade

 Is there a way to run minimal tests by limiting the upgrade suite so that it 
 only focuses on a firefly cluster that upgrades to 
 https://github.com/ceph/ceph/pull/1890 so that it checks the behavior when 
 running a mixed cluster (firefly + master with the change) ?

You can run it with a smaller suite as the argument, like this:
dumpling-x/parallel


 It looks like http://pulpito.ceph.com/?suite=upgrade was never run ( at least 
 that's what appears to cause http://tracker.ceph.com/issues/8681 ) Is 
 http://pulpito.ceph.com/?suite=upgrade-rados a good fit ? If so is there a 
 way to figure out how it was created ?

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recommended teuthology upgrade test

2014-06-27 Thread Loic Dachary


On 27/06/2014 18:17, Yuri Weinstein wrote:
 Loic
 
 I don't intent to answer all questions, but some info, see inline
 
 On Fri, Jun 27, 2014 at 8:16 AM, Loic Dachary l...@dachary.org wrote:
 Hi Sam,

 TL;DR: what oneliner do you recommend to run upgrade tests for 
 https://github.com/ceph/ceph/pull/1890 ?

 Running the rados suite can be done with :

./schedule_suite.sh rados wip-8071 testing l...@dachary.org basic master 
 plana

 
 It was replaced with teuthology-suite, see --help for more info
 
 or something else since ./schedule_suite.sh was recently obsoleted ( 
 http://tracker.ceph.com/issues/8678 ). Running something similar for upgrade 
 will presumably run all of 
 https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade

 Is there a way to run minimal tests by limiting the upgrade suite so that it 
 only focuses on a firefly cluster that upgrades to 
 https://github.com/ceph/ceph/pull/1890 so that it checks the behavior when 
 running a mixed cluster (firefly + master with the change) ?
 
 You can run specifying argument with smaller suite, like this:
 dumpling-x/parallel

Hi Yuri,

Thanks for the hint :-) It is documented 
https://github.com/ceph/teuthology/commit/7d2388b42c75f4526bc34c6a4b1b16637d967527
 but it looks like it should be more generic because any level of 
subdirectories can be used.

Cheers

 

 It looks like http://pulpito.ceph.com/?suite=upgrade was never run ( at 
 least that's what appears to cause http://tracker.ceph.com/issues/8681 ) Is 
 http://pulpito.ceph.com/?suite=upgrade-rados a good fit ? If so is there a 
 way to figure out how it was created ?

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre


-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: [ceph-users] CDS Conferencing Test

2014-06-17 Thread Patrick McGarry
The conference room is open for the next 30 mins to test BlueJeans before CDS 
next week.

https://bluejeans.com/362952863



Best Regards,  

Patrick McGarry
Director Ceph Community, Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph  

On June 16, 2014 at 7:41:26 PM, Patrick McGarry 
(patr...@inktank.com) wrote:

 Hey Cephers,
  
 As you know the next Ceph Developer Summit is fast approaching! (stay tuned 
 for the schedule later in the week) This summit is going to be utilizing our 
 new video conferencing system “BlueJeans.” In order to ensure that things go 
 smoothly on summit day I’ll be running a few test meetings so people can get 
 things configured to their liking.
  
 The first meeting will be tomorrow (Tues) at 14:30 EDT (GMT -5). The details 
 are as follows:
  
  
 To join the Meeting:
 https://bluejeans.com/362952863
  
 To join via Browser:
 https://bluejeans.com/362952863/browser
  
 To join with Lync:
 https://bluejeans.com/362952863/lync
  
 To join via Room System:
 Video Conferencing System: bjn.vc -or- 199.48.152.152
 Meeting ID: 362952863
  
  
 To join via Phone:  
 1) Dial:
 +1 408 740 7256
 +1 888 240 2560 (US or Canada only)
 (see all numbers - http://bluejeans.com/numbers)
 2) Enter Conference ID: 362952863
  
  
  
 I’ll keep the room open for approximately 30 minutes. If you are unable to 
 attend this session I will schedule 1 or 2 more before next week in order to 
 give everyone that wants to a chance to test it out. Thanks.
  
  
  
  
 Best Regards,
  
 Patrick McGarry
 Director Ceph Community, Red Hat
 http://ceph.com || http://community.redhat.com
 @scuttlemonkey || @ceph
 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


CDS Conferencing Test

2014-06-16 Thread Patrick McGarry
Hey Cephers,

As you know the next Ceph Developer Summit is fast approaching! (stay tuned for 
the schedule later in the week)  This summit is going to be utilizing our new 
video conferencing system “BlueJeans.” In order to ensure that things go 
smoothly on summit day I’ll be running a few test meetings so people can get 
things configured to their liking.

The first meeting will be tomorrow (Tues) at 14:30 EDT (GMT -5).  The details 
are as follows:


To join the Meeting:
https://bluejeans.com/362952863

To join via Browser:
https://bluejeans.com/362952863/browser

To join with Lync:
https://bluejeans.com/362952863/lync

To join via Room System:
Video Conferencing System:  bjn.vc -or- 199.48.152.152
Meeting ID: 362952863


To join via Phone: 
1) Dial:
          +1 408 740 7256
          +1 888 240 2560 (US or Canada only)
          (see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 362952863



I’ll keep the room open for approximately 30 minutes.  If you are unable to 
attend this session I will schedule 1 or 2 more before next week in order to 
give everyone that wants to a chance to test it out.  Thanks.




Best Regards,  

Patrick McGarry
Director Ceph Community, Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question about the test() function in CrushTester class

2014-01-10 Thread Lipeng Wan
Hi guys,

I am now trying to use crushtool.cc to test the CRUSH algorithm. First, I
build a new crush map using crushtool.cc in which all the devices have the
maximum weight (0x1). Then I assign different weights to the devices
using the --weight option and run the test() function. It seems that
during the execution of the test() function, the crush map was not
modified based on the new device weights I gave, which means the
bucket selection was still based on the initial device weights rather
than the ones I gave. Does this make sense?

L. Wan
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-12-03 Thread Sylvain Munaut
Hi,

 What sort of memory are your instances using?

I just had a look: around 120 MB, which indeed is a bit higher than I'd like.


 I haven't turned on any caching so I assume it's disabled.

Yes.


Cheers,

Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-30 Thread James Harper
 
 Hi James,
 
  Are you still working on this in any way?
 
 Well I'm using it, but I haven't worked on it. I never was able to
 reproduce any issue with it locally ...
 In prod, I do run it with cache disabled though since I never took the
 time to check using the cache was safe in the various failure modes.
 
 Is 300 MB normal ? Well, that probably depends on your settings (cache
 enabled / size / ...). But in anycase I'd guess the memory comes from
 a librbd itself. It's not like I do much allocation myself :p
 

What sort of memory are your instances using? I haven't turned on any caching 
so I assume it's disabled.

I increased the stack size to 8M to work around the crash I was having, but 
lowering that to 2MB doesn't have any significant impact on memory usage.

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-29 Thread James Harper
Sylvain,

Are you still working on this in any way?

It's been working great for me but seems to use an excessive amount of memory, 
like 300MB per process. Is that expected?

Thanks

James

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Sylvain Munaut
 Sent: Saturday, 20 April 2013 12:41 AM
 To: Pasi Kärkkäinen
 Cc: ceph-devel@vger.kernel.org; xen-de...@lists.xen.org
 Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
 test ? :p
 
  If you have time to write up some lines about steps required to test this,
  that'd be nice, it'll help people to test this stuff.
 
 To quickly test, I compiled the package and just replaced the tapdisk
 binary from my normal blktap install with the newly compiled one.
 
 Then you need to setup a RBD image named 'test' in the default 'rbd'
 pool. You also need to setup a proper ceph.conf and keyring file on
 the client (since librbd will use those for the parameters). The
 keyring must contain the 'client.admin' key
 
 Then in the config file, use something like
 tap2:tapdisk:rbd:xxx,xvda1,w  the 'xxx' part is currently ignored
 ...
 
 
 Cheers,
 
 Sylvain
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-11-29 Thread Sylvain Munaut
Hi James,

 Are you still working on this in any way?

Well I'm using it, but I haven't worked on it. I never was able to
reproduce any issue with it locally ...
In prod, I do run it with cache disabled though since I never took the
time to check using the cache was safe in the various failure modes.

Is 300 MB normal? Well, that probably depends on your settings (cache
enabled / size / ...). But in any case I'd guess the memory comes from
librbd itself. It's not like I do much allocation myself :p

Cheers,

   Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test/libcephfs: free cmount after tests finishes

2013-11-03 Thread Sage Weil
Applied this one too!

BTW, an easier workflow than sending patches to the list is to accumulate 
a batch of fixes in a branch and submit a pull request via github (at 
least if you're already a github user).  Whichever works well for you.

Thanks!
s

On Sun, 3 Nov 2013, Xing Lin wrote:

 unmount and release cmount at the end of tests
 
 Signed-off-by: Xing Lin xing...@cs.utah.edu
 ---
 src/test/libcephfs/readdir_r_cb.cc | 4 ++++
  1 file changed, 4 insertions(+)
 
 diff --git a/src/test/libcephfs/readdir_r_cb.cc 
 b/src/test/libcephfs/readdir_r_cb.cc
 index 788260b..4a99f10 100644
 --- a/src/test/libcephfs/readdir_r_cb.cc
 +++ b/src/test/libcephfs/readdir_r_cb.cc
 @@ -54,4 +54,8 @@ TEST(LibCephFS, ReaddirRCB) {
    ASSERT_LE(0, ceph_opendir(cmount, c_dir, &dirp));
ASSERT_EQ(5, ceph_getdnames(cmount, dirp, buf, 6));
ASSERT_EQ(4, ceph_getdnames(cmount, dirp, buf, 6));
 +
 +  // free cmount after finishing testing
 +  ASSERT_EQ(0, ceph_unmount(cmount));
 +  ASSERT_EQ(0, ceph_release(cmount));
  }
 -- 
 1.8.3.4 (Apple Git-47)
 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] test/libcephfs: free cmount after tests finishes

2013-11-03 Thread Xing
Hi Sage,

Thanks for applying these two patches. I will try to accumulate more fixes and 
submit pull requests via github later. 

Thanks,
Xing

On Nov 3, 2013, at 12:17 AM, Sage Weil s...@inktank.com wrote:

 Applied this one too!
 
 BTW, an easier workflow than sending patches to the list is to accumulate 
 a batch of fixes in a branch and submit a pull request via github (at 
 least if you're already a github user).  Whichever works well for you.
 
 Thanks!
 s

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] CephFS test-case

2013-09-06 Thread Sage Weil
[re-adding ceph-devel]

On Sat, 7 Sep 2013, Nigel Williams wrote:

 On Sat, Sep 7, 2013 at 1:27 AM, Sage Weil s...@inktank.com wrote:
  It sounds like the problem is cluster B's pools have too few PGs, making
  the data distribution get all out of whack.
 
 Agree, it was too few PGs, I have no re-adjusted and it is busy
 backfilling and evening out the data-distribution across the OSDs.
 
 My overall point is that the out-of-the-box defaults don't provide a
 stable test-deployment (whereas older versions like 0.61 did), and so
 minimally perhaps ceph-deploy needs to have a stab at choosing a
 workable value of PGs? or alternatively the health warning could
 include a note about PGs being too low.

I agree; this is a general problem that we need to come up with a better 
solution to.

One idea:

- make ceph health warn when the pg distribution looks bad:
  - too few pgs relative to the # of osds
  - too many objects in a pool relative to the # of pgs and the above

(We'll need to be a little creative to make thresholds that make sense.)

If we have an interactive ceph-deploy new, we can also estimate how big 
the cluster will get and make a more sensible starting count.  I like that 
less, though, as it is potentially confusing and has more room for user 
error.
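
For illustration only (not actual ceph-deploy code): the commonly quoted rule 
of thumb of ~100 PGs per OSD, divided by the replica count and rounded up to a 
power of two, would give a starting value like this:

    def suggest_pg_num(num_osds, replica_count=3, pgs_per_osd=100):
        # Aim for roughly pgs_per_osd placement groups per OSD, spread across
        # replica_count copies, rounded up to the next power of two.
        target = num_osds * pgs_per_osd // replica_count
        pg_num = 1
        while pg_num < target:
            pg_num *= 2
        return pg_num

    # 12 OSDs at 3x replication -> 512, well above the 192 default mentioned
    # in this thread.
    print(suggest_pg_num(12))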

sage


 
   ceph osd dump | grep ^pool
  say, and how many OSDs do you have?
 
 I assume you mean PGs, it was the default (192?) and changing it to
 400 seems to have helped. There are 12 OSDs (4 per server, 3 servers).
 
 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] CephFS test-case

2013-09-06 Thread Mark Nelson

On 09/06/2013 06:22 PM, Sage Weil wrote:

[re-adding ceph-devel]

On Sat, 7 Sep 2013, Nigel Williams wrote:


On Sat, Sep 7, 2013 at 1:27 AM, Sage Weil s...@inktank.com wrote:

It sounds like the problem is cluster B's pools have too few PGs, making
the data distribution get all out of whack.


Agree, it was too few PGs, I have no re-adjusted and it is busy
backfilling and evening out the data-distribution across the OSDs.

My overall point is that the out-of-the-box defaults don't provide a
stable test-deployment (whereas older versions like 0.61 did), and so
minimally perhaps ceph-deploy needs to have a stab at choosing a
workable value of PGs? or alternatively the health warning could
include a note about PGs being too low.


I agree; this is a general problem that we need to come up with a better
solution to.

One idea:

- make ceph health warn when the pg distribution looks bad
- too few pgs relative the # of osds
- too many objects in a pool relative to the # of pgs and the
  above

(We'll need to be a little creative to make thresholds that make sense.)

If we have an interactive ceph-deploy new, we can also estimate how big
the cluster will get and make a more sensible starting count.  I like that
less, though, as it potentially confusing and has more room for user
error.


At one point Sam and I were discussing some kind of message that 
wouldn't be a health warning, but something kind of similar to what you 
are discussing here.  The idea is this would be for when Ceph thinks 
something is configured sub-optimally, but the issue doesn't necessarily 
affect the health of the cluster (at least in so much as everything is 
functioning as defined).  We were concerned that people might not want 
more things causing health warnings.




sage





  ceph osd dump | grep ^pool
say, and how many OSDs do you have?


I assume you mean PGs, it was the default (192?) and changing it to
400 seems to have helped. There are 12 OSDs (4 per server, 3 servers).



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-16 Thread Frederik Thuysbaert

Hi Sylvain,


I'm not quite sure what u mean, can u give some more information on how I do
this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
is what u meant.


Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

...

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash



I did 2 runs, with a cold reboot in between just to be sure. I don't 
think I'm getting a lot of valuable information, but I will post it 
anyway. The reason for the cold reboot was a 'Cannot access memory at 
address ...' in gdb after the first frame; I thought it could help.


Here's what I got:

try 1:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x7fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0

(gdb) bt
#0  0x7fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0

Cannot access memory at address 0x7fb42f081c38
(gdb) frame 0
#0  0x7fb42d2082d7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/x86_64-linux-gnu/libpthread.so.0

(gdb) list
77  }
78  
79  int
80  main(int argc, char *argv[])
81  {
82  char *control;
83  int c, err, nodaemon;
84  FILE *out;
85  
86  control  = NULL;
(gdb) info locals
No symbol table info available.

try 2:
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  0x7fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x7fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
Cannot access memory at address 0x7fe05c2ba518
(gdb) frame 0
#0  0x7fe05a721e6b in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) list
77  }
78  
79  int
80  main(int argc, char *argv[])
81  {
82  char *control;
83  int c, err, nodaemon;
84  FILE *out;
85  
86  control  = NULL;
(gdb) info locals
No symbol table info available.

Regards,

- Frederik

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-15 Thread James Harper
 
  Hi,
 
   I just tested with tap2:aio and that worked (had an old image of the VM
 on
  lvm still so just tested with that). Switching back to rbd and it crashes 
  every
  time, just as postgres is starting in the vm. Booting into single user mode,
  waiting 30 seconds, then letting the boot continue it still crashes at the
 same
  point so I think it's not a timing thing - maybe postgres has a disk access
  pattern that is triggering the bug?
 
  Mmm, that's really interesting.
 
  Could you try to disable request merging ? Just give option
  max_merge_size=0 in the tap2 disk description. Something like
  'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
 
 
 Just as suddenly the problem went away and I can no longer reproduce the
 crash on startup. Very frustrating. Most likely it still crashed during heavy 
 use
 but that can take days.
 
 I've just upgraded librbd to dumpling (from cuttlefish) on that one server and
 will see what it's doing by morning. I'll disable merging when I can reproduce
 it next.
 

I just had a crash since upgrading to dumpling, and will disable merging 
tonight.

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-15 Thread James Harper

 
 I just had a crash since upgrading to dumpling, and will disable merging
 tonight.
 

Still crashes with merging disabled.

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Frederik Thuysbaert

On 13-08-13 17:39, Sylvain Munaut wrote:


It's actually strange that it changes anything at all.

Can you try adding a ERROR(HERE\n);  in that error path processing
and check syslog to see if it's triggered at all ?

A traceback would be great if you can get a core file. And possibly
compile tapdisk with debug symbols.

When halting the domU after the errors, I get the following in dom0 syslog:

Aug 14 10:43:57 xen-001 kernel: [ 5041.338756] INFO: task tapdisk:9690 
blocked for more than 120 seconds.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338817] echo 0  
/proc/sys/kernel/hung_task_timeout_secs disables this message.
Aug 14 10:43:57 xen-001 kernel: [ 5041.338903] tapdisk D 
8800bf213780 0  9690  1 0x
Aug 14 10:43:57 xen-001 kernel: [ 5041.338908]  8800b4b0e730 
0246 8800 8160d020
Aug 14 10:43:57 xen-001 kernel: [ 5041.338912]  00013780 
8800b4ebffd8 8800b4ebffd8 8800b4b0e730
Aug 14 10:43:57 xen-001 kernel: [ 5041.338916]  8800b4d36190 
000181199c37 8800b5798c00 8800b5798c00

Aug 14 10:43:57 xen-001 kernel: [ 5041.338921] Call Trace:
Aug 14 10:43:57 xen-001 kernel: [ 5041.338929] [a0308411] ? 
blktap_device_destroy_sync+0x85/0x9b [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338936] [8105fadf] ? 
add_wait_queue+0x3c/0x3c
Aug 14 10:43:57 xen-001 kernel: [ 5041.338940] [a0307444] ? 
blktap_ring_release+0x10/0x2d [blktap]
Aug 14 10:43:57 xen-001 kernel: [ 5041.338945] [810fb141] ? 
fput+0xf9/0x1a1
Aug 14 10:43:57 xen-001 kernel: [ 5041.338949] [810f8e6c] ? 
filp_close+0x62/0x6a
Aug 14 10:43:57 xen-001 kernel: [ 5041.338954] [81049831] ? 
put_files_struct+0x60/0xad
Aug 14 10:43:57 xen-001 kernel: [ 5041.338958] [81049e38] ? 
do_exit+0x292/0x713
Aug 14 10:43:57 xen-001 kernel: [ 5041.338961] [8104a539] ? 
do_group_exit+0x74/0x9e
Aug 14 10:43:57 xen-001 kernel: [ 5041.338965] [81055f94] ? 
get_signal_to_deliver+0x46d/0x48f
Aug 14 10:43:57 xen-001 kernel: [ 5041.338970] [81347759] ? 
force_sig_info_fault+0x5b/0x63
Aug 14 10:43:57 xen-001 kernel: [ 5041.338975] [8100de27] ? 
do_signal+0x38/0x610
Aug 14 10:43:57 xen-001 kernel: [ 5041.338979] [81070deb] ? 
arch_local_irq_restore+0x7/0x8
Aug 14 10:43:57 xen-001 kernel: [ 5041.338983] [8134eb77] ? 
_raw_spin_unlock_irqrestore+0xe/0xf
Aug 14 10:43:57 xen-001 kernel: [ 5041.338987] [8103f944] ? 
wake_up_new_task+0xb9/0xc2
Aug 14 10:43:57 xen-001 kernel: [ 5041.338992] [8106f987] ? 
sys_futex+0x120/0x151
Aug 14 10:43:57 xen-001 kernel: [ 5041.338995] [8100e435] ? 
do_notify_resume+0x25/0x68
Aug 14 10:43:57 xen-001 kernel: [ 5041.338999] [8134ef3c] ? 
retint_signal+0x48/0x8c

...
Aug 14 10:44:17 xen-001 tap-ctl: tap-err:tap_ctl_connect: couldn't 
connect to /var/run/blktap-control/ctl9478: 111





Cheers,

 Sylvain

Regards

- Frederik
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Sylvain Munaut
Hi,

 I just tested with tap2:aio and that worked (had an old image of the VM on 
 lvm still so just tested with that). Switching back to rbd and it crashes 
 every time, just as postgres is starting in the vm. Booting into single user 
 mode, waiting 30 seconds, then letting the boot continue it still crashes at 
 the same point so I think it's not a timing thing - maybe postgres has a disk 
 access pattern that is triggering the bug?

Mmm, that's really interesting.

Could you try to disable request merging ? Just give option
max_merge_size=0 in the tap2 disk description. Something like
'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'

Cheers,

 Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread James Harper
 
 Hi,
 
  I just tested with tap2:aio and that worked (had an old image of the VM on
 lvm still so just tested with that). Switching back to rbd and it crashes 
 every
 time, just as postgres is starting in the vm. Booting into single user mode,
 waiting 30 seconds, then letting the boot continue it still crashes at the 
 same
 point so I think it's not a timing thing - maybe postgres has a disk access
 pattern that is triggering the bug?
 
 Mmm, that's really interesting.
 
 Could you try to disable request merging ? Just give option
 max_merge_size=0 in the tap2 disk description. Something like
 'tap2:tapdisk:rbd:rbd/test:max_merge_size=0,xvda2,w'
 

Just as suddenly the problem went away and I can no longer reproduce the crash 
on startup. Very frustrating. Most likely it still crashed during heavy use but 
that can take days.

I've just upgraded librbd to dumpling (from cuttlefish) on that one server and 
will see what it's doing by morning. I'll disable merging when I can reproduce 
it next.

Thanks

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-14 Thread Sylvain Munaut
Hi Frederik,

 A traceback would be great if you can get a core file. And possibly
 compile tapdisk with debug symbols.

 I'm not quite sure what u mean, can u give some more information on how I do
 this? I compiled tapdisk with ./configure CFLAGS=-g, but I'm not sure this
 is what u meant.

Yes, ./configure CFLAGS=-g LDFLAGS=-g  is a good start.

Then when it crashes, it will leave a 'core' file somewhere (not sure
where, maybe in / or in /tmp).
If it doesn't, you may have to enable it. When the process is running,
use this on the tapdisk PID:

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

Then once you have a core file, you can use gdb along with the tapdisk
executable to generate a meaningful backtrace of where the crash
happened:

See for ex http://publib.boulder.ibm.com/httpserv/ihsdiag/get_backtrace.html
for how to do it.


 When halting the domU after the errors, I get the following in dom0 syslog:

It's not really unexpected. If tapdisk crashes the IO ring is going to
be left hanging and god knows what weird behaviour will happen ...


Cheers,

Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
 FWIW, I can confirm via printf's that this error path is never hit in at 
 least some of the crashes I'm seeing.

Ok thanks.

Are you using cache btw ?

Cheers,

Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
 
  FWIW, I can confirm via printf's that this error path is never hit in at 
  least
 some of the crashes I'm seeing.
 
 Ok thanks.
 
 Are you using cache btw ?
 

I hope not. How could I tell? It's not something I've explicitly enabled.

Thanks

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
Hi,

 I hope not. How could I tell? It's not something I've explicitly enabled.

It's disabled by default.

So you'd have to have enabled it either in ceph.conf  or directly in
the device path in the xen config. (option is 'rbd cache',
http://ceph.com/docs/next/rbd/rbd-config-ref/ )
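
(For reference, enabling it in ceph.conf would look something like the stanza
below; the [client] section is an assumption, adjust to your setup.)

    [client]
        rbd cache = true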

Cheers,

Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Frederik Thuysbaert


Hi,

I have been testing this for a while now, and just finished testing your 
untested patch. The rbd caching problem still persists.


The system I am testing on has the following characteristics:

Dom0:
- Linux xen-001 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64
- Most recent git checkout of blktap rbd branch

DomU:
- Same kernel as dom0
- Root (xvda1) is a logical volume on dom0
- xvda2 is a Rados Block Device format 1

Let me start by saying that the errors only occur with RBD client 
caching ON.
I will give the error messages of both dom0 and domU before and after I 
applied the patch.


Actions in domU to trigger errors:

~# mkfs.xfs -f /dev/xvda2
~# mount /dev/xvda2 /mnt
~# bonnie -u 0 -g 0 /mnt


Error messages:

BEFORE patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0: no errors

domU:
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960475] lost page write due 
to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960488] lost page write due 
to I/O error on xvda2
Aug 13 18:18:33 debian-vm-101 kernel: [   37.960501] lost page write due 
to I/O error on xvda2

...
Aug 13 18:18:52 debian-vm-101 kernel: [   56.394645] XFS (xvda2): 
xfs_do_force_shutdown(0x2) called from line 1007 of file 
/build/linux-s5x2oE/linux-3.2.46/fs/xfs/xfs_log.c.  Return address = 
0xa013ced5
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941539] XFS (xvda2): 
xfs_log_force: error 5 returned.
Aug 13 18:19:19 debian-vm-101 kernel: [   83.941565] XFS (xvda2): 
xfs_log_force: error 5 returned.

...

AFTER patch:

Without RBD cache:

dom0: no errors
domU: no errors

With RBD cache:

dom0:
Aug 13 16:40:49 xen-001 kernel: [   94.954734] tapdisk[3075]: segfault 
at 7f749ee86da0 ip 7f749d060776 sp 7f748ea7a460 error 7 in 
libpthread-2.13.so[7f749d059000+17000]



domU:
Same as before patch.



I would like to add that I have the time to test this; we are happy to 
help you in any way possible. However, since I am no C developer, I 
won't be able to do much more than testing.



Regards

Frederik


On 13-08-13 11:20, Sylvain Munaut wrote:

Hi,


I hope not. How could I tell? It's not something I've explicitly enabled.

It's disabled by default.

So you'd have to have enabled it either in ceph.conf  or directly in
the device path in the xen config. (option is 'rbd cache',
http://ceph.com/docs/next/rbd/rbd-config-ref/ )

Cheers,

 Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
Hi,

 I have been testing this a while now, and just finished testing your
 untested patch. The rbd caching problem still persists.

Yes, I wouldn't expect it to change anything for caching. But I still
don't understand why caching would change anything at all ... all of
it should be handled within the librbd lib.


Note that I would recommend against caching anyway. The blktap layer
doesn't pass through the FLUSH commands, so this makes it completely
unsafe: the VM will think things are committed to disk durably
even though they are not ...



 I will give the error messages of both dom0 and domU before and after I
 applied the patch.

It's actually strange that it changes anything at all.

Can you try adding an ERROR("HERE\n"); in that error path processing
and checking syslog to see if it's triggered at all?

A traceback would be great if you can get a core file. And possibly
compile tapdisk with debug symbols.


Cheers,

Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
Just noticed email subject qemu-1.4.0 and onwards, linux kernel 3.2.x, 
ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and 
unresponsive qemu-process, [Qemu-devel] [Bug 1207686] where Sage noted that he 
has seen a completion called twice in the logs the OP posted. If that is 
actually happening (and not just an artefact of logging ring buffer overflowing 
or something) then I think that could easily cause a segfault in tapdisk rbd.

I'll try and see if I can log when that happens.

James

 -Original Message-
 From: Sylvain Munaut [mailto:s.mun...@whatever-company.com]
 Sent: Tuesday, 13 August 2013 7:20 PM
 To: James Harper
 Cc: Pasi Kärkkäinen; ceph-devel@vger.kernel.org; xen-de...@lists.xen.org
 Subject: Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to
 test ? :p
 
 Hi,
 
  I hope not. How could I tell? It's not something I've explicitly enabled.
 
 It's disabled by default.
 
 So you'd have to have enabled it either in ceph.conf  or directly in
 the device path in the xen config. (option is 'rbd cache',
 http://ceph.com/docs/next/rbd/rbd-config-ref/ )
 
 Cheers,
 
 Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
I think I have a separate problem too - tapdisk will segfault almost 
immediately upon starting but seemingly only for Linux PV DomU's. Once it has 
started doing this I have to wait a few hours to a day before it starts working 
again. My Windows DomU's appear to be able to start normally though.

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread Sylvain Munaut
On Wed, Aug 14, 2013 at 1:39 AM, James Harper
james.har...@bendigoit.com.au wrote:
 I think I have a separate problem too - tapdisk will segfault almost 
 immediately upon starting but seemingly only for Linux PV DomU's. Once it has 
 started doing this I have to wait a few hours to a day before it starts 
 working again. My Windows DomU's appear to be able to start normally though.

What about other blktap drivers? Like using the blktap raw driver, does
that work without issue?

Cheers,

Sylvain
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
 
 On Wed, Aug 14, 2013 at 1:39 AM, James Harper
 james.har...@bendigoit.com.au wrote:
  I think I have a separate problem too - tapdisk will segfault almost
 immediately upon starting but seemingly only for Linux PV DomU's. Once it
 has started doing this I have to wait a few hours to a day before it starts
 working again. My Windows DomU's appear to be able to start normally
 though.
 
 What about other blktap driver ? like using blktap raw driver, does
 that work without issue ?
 

What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know 
how to specify raw, and anything I try just says it doesn't understand it.

Thanks

James
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-13 Thread James Harper
 
 
  On Wed, Aug 14, 2013 at 1:39 AM, James Harper
  james.har...@bendigoit.com.au wrote:
   I think I have a separate problem too - tapdisk will segfault almost
  immediately upon starting but seemingly only for Linux PV DomU's. Once it
  has started doing this I have to wait a few hours to a day before it starts
  working again. My Windows DomU's appear to be able to start normally
  though.
 
  What about other blktap driver ? like using blktap raw driver, does
  that work without issue ?
 
 
 What's the syntax for that? I use tap2:tapdisk:rbd for rbd, but I don't know
 how to specify raw and anything I try just says it doesn't understand
 

I just tested with tap2:aio and that worked (I had an old image of the VM on lvm 
still, so I just tested with that). Switching back to rbd, it crashes every 
time, just as postgres is starting in the vm. Booting into single user mode, 
waiting 30 seconds, then letting the boot continue, it still crashes at the same 
point, so I think it's not a timing thing - maybe postgres has a disk access 
pattern that is triggering the bug?

Putting printf's in seems to make the problem go away sometimes, so it's hard 
to debug.

James

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Ceph-qa: change the fsx.sh to support hole punching test

2013-08-13 Thread Li Wang
This patch changes fsx.sh to pull a better fsx.c from the xfstests site
to support hole punching tests.

Signed-off-by: Yunchuan Wen yunchuan...@ubuntukylin.com
Signed-off-by: Li Wang liw...@ubuntukylin.com
---
 qa/workunits/suites/fsx.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/qa/workunits/suites/fsx.sh b/qa/workunits/suites/fsx.sh
index 32d5b63..c48164c 100755
--- a/qa/workunits/suites/fsx.sh
+++ b/qa/workunits/suites/fsx.sh
@@ -2,8 +2,10 @@
 
 set -e
 
-wget http://ceph.com/qa/fsx.c
-gcc fsx.c -o fsx
+apt-get install git libacl1-dev xfslibs-dev libattr1-dev -y
+git clone git://ceph.newdream.net/git/xfstests.git
+make -C xfstests
+cp xfstests/ltp/fsx .
 
 ./fsx   1MB -N 5 -p 1 -l 1048576
 ./fsx  10MB -N 5 -p 1 -l 10485760
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread Sylvain Munaut
Hi,

   tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp
  7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
   tapdisk:9180 blocked for more than 120 seconds.
   tapdisk D 88043fc13540 0  9180  1 0x

You can try generating a core file by changing the ulimit on the running process

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

A backtrace would be useful :)


 Actually maybe not. What I was reading only applies for large number of bytes 
 written to the pipe, and even then I got confused by the double negatives. 
 Sorry for the noise.

Yes, as you discovered, for sizes < PIPE_BUF they should be atomic even
in non-blocking mode. But I could still add an assert() there to make
sure it is.
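
Something along these lines (a sketch only, with illustrative names, not the
actual tapdisk code):

/* Sketch of the assert discussed above: a single pointer is written to
 * the pipe in one call, and POSIX guarantees writes of at most PIPE_BUF
 * bytes are not interleaved.  Names are illustrative. */
#include <assert.h>
#include <limits.h>
#include <unistd.h>

static void queue_completion(int pipe_wr_fd, void *req)
{
	ssize_t n;

	assert(sizeof(req) <= PIPE_BUF);        /* atomic, even with O_NONBLOCK */

	n = write(pipe_wr_fd, &req, sizeof(req));
	assert(n == (ssize_t)sizeof(req));      /* expect the whole pointer to go through */
}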


I did find a bug where the error path could leak requests, which may lead to a
hang once the free list runs empty. But it shouldn't crash ...

Here's an (untested yet) patch in the rbd error path:


diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
index 68fbed7..ab2d2c5 100644
--- a/drivers/block-rbd.c
+++ b/drivers/block-rbd.c
@@ -560,6 +560,9 @@ err:
 	if (c)
 		rbd_aio_release(c);
 
+	list_move(&req->queue, &prv->reqs_free);
+	prv->reqs_free_count++;
+
 	return rv;
 }
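
To illustrate why a leaked request shows up as a hang rather than a crash, here is
a small self-contained toy (illustrative only, not the driver code): the driver
hands out requests from a fixed-size pool, and every error return that forgets to
put its request back shrinks the pool until nothing can be submitted any more.

#include <stdio.h>

#define MAX_REQUESTS 32                 /* toy stand-in for the driver's pool */

static int reqs_free_count = MAX_REQUESTS;

static int submit_request(int fail, int leak_on_error)
{
	if (reqs_free_count == 0)
		return -1;              /* pool empty: new I/O just waits */

	reqs_free_count--;              /* take a request from the pool */

	if (fail) {
		if (!leak_on_error)
			reqs_free_count++;  /* error path returns the request */
		return -1;
	}

	reqs_free_count++;              /* completion returns it (simplified) */
	return 0;
}

int main(void)
{
	for (int i = 0; i < 40; i++)
		submit_request(1, 1);   /* 32 leaking failures drain the pool */
	printf("free requests left: %d\n", reqs_free_count);
	return 0;
}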


Cheers,

 Sylvain


RE: [Xen-devel] Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-08-12 Thread James Harper
tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 7f7e387532d4 sp
   7f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
tapdisk:9180 blocked for more than 120 seconds.
tapdisk D 88043fc13540 0  9180  1 0x
 
 You can try generating a core file by changing the ulimit on the running
 process
 
 A backtrace would be useful :)
 

I found it was actually dumping core in /, but gdb doesn't seem to work nicely 
and all I get is this:

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib/x86_64-linux-gnu/libthread_db.so.1.
Cannot find new threads: generic error
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_cond_wait@@GLIBC_2.3.2 () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:163
163 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such 
file or directory.

Even when I attach to a running process.

One VM segfaults on startup, pretty much every time, except never when I attach 
strace to it, which suggests it's probably a race condition and may not actually be in 
your code...

 
  Actually maybe not. What I was reading only applies for large number of
  bytes written to the pipe, and even then I got confused by the double
  negatives. Sorry for the noise.
 
 Yes, as you discovered, for sizes < PIPE_BUF they should be atomic even
 in non-blocking mode. But I could still add an assert() there to make
 sure it is.

Nah I got that completely backwards. I see now you are only passing a pointer 
so yes it should never be non-atomic.

 I did find a bug where it could leak requests which may lead to
 hang. But it shouldn't crash ...
 
 Here's an (untested yet) patch in the rbd error path:
 

I'll try that later this morning when I get a minute.

I've done the poor man's debugger thing and riddled the code with printf's, but 
as far as I can determine every routine starts and ends. My thinking at the 
moment is that it's either a race (the VMs most likely to crash have multiple 
disks) or a buffer overflow that trips it up either immediately or later.

I have definitely observed multiple VMs crash when something in ceph hiccups 
(e.g. when I bring a mon up or down), if that helps.

I also followed the rbd_aio_release idea through over the weekend: I can see 
that if the read returns failure, the callback was never called, so the 
release is then the responsibility of the caller.
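
That matches the public librbd C API contract; a hedged sketch of the pattern
(local names are illustrative, this is not the tapdisk code):

/* Hedged sketch of the ownership rule above: if submission fails the
 * completion callback never runs, so the caller releases the completion. */
#include <rbd/librbd.h>

static int read_async(rbd_image_t image, uint64_t off, size_t len,
                      char *buf, rbd_callback_t cb, void *cb_arg)
{
	rbd_completion_t c;
	int rv;

	rv = rbd_aio_create_completion(cb_arg, cb, &c);
	if (rv < 0)
		return rv;

	rv = rbd_aio_read(image, off, len, buf, c);
	if (rv < 0) {
		rbd_aio_release(c);     /* callback will never fire */
		return rv;
	}

	return 0;                       /* callback owns the release now */
}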

Thanks

James


