RE: Question about big EC pool.

2015-09-12 Thread Somnath Roy
 I don't think there is any limit from Ceph side..
> We are testing with ~768 TB deployment with 4:2 EC on Flash and it is working 
> well so far..
>
> Thanks & Regards
> Somnath

Thanks for the answer!

It's very interesting!

What hardware do you use for your test cluster?
[Somnath] Three 256 TB SanDisk IF100 JBOFs with 2 heads in front of each, so a 
6-node cluster in total. FYI, each IF100 can support a max of 512 TB. Each head 
server has 128 GB RAM and a dual-socket Xeon 2690 v3.

Do you use only SSD, or SSD + NVMe?

[Somnath] For now, it is all SSDs.

Is the journal located on the same SSD or not?

[Somnath] Yes, journal is on the same SSD.

Which EC plugin do you use?

[Somnath] Cauchy_good jerasure.

Have you caught any bugs or strange behavior?

[Somnath] So far all is well :-)

Sorry for so many questions, but it's very interesting!






Re: Question about big EC pool.

2015-09-12 Thread Mike Almateia

12-Sep-15 19:34, Somnath Roy wrote:

I don't think there is any limit from Ceph side..
We are testing with ~768 TB deployment with 4:2 EC on Flash and it is working 
well so far..

Thanks & Regards
Somnath


Thanks for the answer!

It's very interesting!

What hardware do you use for your test cluster?
Do you use only SSD, or SSD + NVMe?
Is the journal located on the same SSD or not?
Which EC plugin do you use?
Have you caught any bugs or strange behavior?

Sorry for so many questions, but it's very interesting!


RE: loadable objectstore

2015-09-12 Thread Allen Samuels
Performance impact after initialization will be zero. All of the call sequences 
are done as vtable dynamic dispatches on the global ObjectStore instance. For 
this type of call sequence it makes no difference whether the library is 
dynamically or statically linked; the cost is the same (a simple indirection 
through the vtbl, which is loaded from a known constant offset in the object).
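
A minimal sketch of that dispatch pattern, using a much-simplified stand-in for 
the ObjectStore interface (the real class is far larger): the caller only ever 
holds a base-class pointer, so the generated call site is the same vtable 
indirection whether the concrete store was linked in statically or came from a 
dlopen()'d plugin.

#include <iostream>
#include <memory>
#include <string>

// Simplified stand-in for the ObjectStore interface; illustration only.
struct ObjectStore {
  virtual ~ObjectStore() = default;
  virtual int mount() = 0;
  virtual int write(const std::string& oid, const std::string& data) = 0;
};

struct FileStoreLike : ObjectStore {
  int mount() override { return 0; }
  int write(const std::string& oid, const std::string&) override {
    std::cout << "write " << oid << "\n";
    return 0;
  }
};

// Whether FileStoreLike was linked statically or produced by a loaded plugin,
// this function compiles to the same indirect calls through the vtable.
int do_io(ObjectStore* os) {
  if (int r = os->mount(); r < 0)
    return r;
  return os->write("obj0", "payload");
}

int main() {
  std::unique_ptr<ObjectStore> os = std::make_unique<FileStoreLike>();
  return do_io(os.get());
}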


Allen Samuels
Chief Software Architect, Emerging Storage Solutions 

951 SanDisk Drive, Milpitas, CA 95035
T: +1 408 801 7030| M: +1 408 780 6416
allen.samu...@sandisk.com

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, September 11, 2015 9:34 PM
To: James (Fei) Liu-SSI ; Sage Weil 
; Matt W. Benjamin ; Loic Dachary 

Cc: ceph-devel 
Subject: RE: loadable objectstore

Hi James,

Please find the responses inline.

varada

> -Original Message-
> From: James (Fei) Liu-SSI [mailto:james@ssi.samsung.com]
> Sent: Saturday, September 12, 2015 12:13 AM
> To: Varada Kari ; Sage Weil
> ; Matt W. Benjamin ; Loic
> Dachary 
> Cc: ceph-devel 
> Subject: RE: loadable objectstore
>
> Hi Varada,
>   Got a chance to go through the code. Great job; it is much cleaner. A few
> questions:
>   1. What do you think about the performance impact of the new
> implementation, e.g. dynamic library vs. static linking?
[Varada Kari] Haven't measured the performance yet; there will be some hit from 
dynamic vs. static linking, but it shouldn't be a major degradation. I will hold 
off until we have some perf runs to figure that out.
>   2. Could any vendor just provide a dynamically linked binary library that
> implements the ObjectStore interfaces for their own storage engine with the new factory framework?
[Varada Kari] That was one of the design motives for this change. Yes, any 
backend adhering to the ObjectStore interfaces can integrate with the OSD. All 
they need to do is provide a factory interface plus the required version and init 
functionality, in addition to all the required ObjectStore interfaces.
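
As an illustration only (the symbol names below are hypothetical, not the 
interface actually proposed in PR #5884): a vendor backend built as a shared 
object could export a C-linkage factory plus a version hook, and the loading 
side could resolve them with dlopen/dlsym before constructing the store.

// Loader-side sketch; hypothetical symbol names. Build the plugin separately,
// e.g.  g++ -shared -fPIC vendorstore.cc -o libvendorstore.so
#include <dlfcn.h>
#include <cstdio>

struct ObjectStore;  // opaque here; supplied by the common headers in a real build

using factory_fn = ObjectStore* (*)(const char* data_path);
using version_fn = int (*)();

ObjectStore* load_store(const char* so_path, const char* data_path) {
  void* handle = dlopen(so_path, RTLD_NOW | RTLD_LOCAL);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return nullptr;
  }
  auto version = reinterpret_cast<version_fn>(dlsym(handle, "objectstore_plugin_version"));
  auto factory = reinterpret_cast<factory_fn>(dlsym(handle, "objectstore_plugin_factory"));
  if (!version || !factory || version() != 1) {  // reject missing or mismatched plugins
    dlclose(handle);
    return nullptr;
  }
  return factory(data_path);  // plugin runs its own init inside the factory
}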
>
>   Regards,
>   James
>
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, September 11, 2015 3:28 AM
> To: Sage Weil; Matt W. Benjamin; Loic Dachary
> Cc: ceph-devel
> Subject: RE: loadable objectstore
>
> Hi Sage/ Matt,
>
> I have submitted the pull request based on wip-plugin branch for the object
> store factory implementation at https://github.com/ceph/ceph/pull/5884 .
> Haven't rebased to the master yet. Working on rebase and including new
> store in the factory implementation.  Please have a look and let me know
> your comments. Will submit a rebased PR soon with new store integration.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, July 03, 2015 7:31 PM
> To: Sage Weil ; Adam Crume
> 
> Cc: Loic Dachary ; ceph-devel  de...@vger.kernel.org>; Matt W. Benjamin 
> Subject: RE: loadable objectstore
>
> Hi All,
>
> Not able to make much progress after making common a shared object along
> with the object store.
> Compilation of the test binaries is failing with
> "./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'".
>
>   CXXLDceph_streamtest
> ./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
> collect2: error: ld returned 1 exit status
> make[3]: *** [ceph_streamtest] Error 1
>
> But libfilestore.so is linked with lttng-ust.
>
> src/.libs$ ldd libceph_filestore.so
> libceph_keyvaluestore.so.1 => /home/varada/obs-factory/plugin-
> work/src/.libs/libceph_keyvaluestore.so.1 (0x7f5e50f5)
> libceph_os.so.1 => /home/varada/obs-factory/plugin-
> work/src/.libs/libceph_os.so.1 (0x7f5e4f93a000)
> libcommon.so.1 => /home/varada/obs-factory/plugin-
> work/src/.libs/libcommon.so.1 (0x7f5e4b5df000)
> liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
> (0x7f5e4b179000)
> liblttng-ust-tracepoint.so.0 => 
> /usr/lib/x86_64-linux-gnu/liblttng-ust-
> tracepoint.so.0 (0x7f5e4a021000)
> liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x7f5e49e1a000)
> liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x7f5e49c12000)
>
> Edited the above output to show just the dependencies.
> Did anyone face this issue before?
> Any help would be much appreciated.
>
> Thanks,
> Varada
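
One possible cause of the `tracepoint_dlopen' error, offered as a guess rather 
than a confirmed diagnosis: lttng-ust only emits that helper object into a 
translation unit that defines TRACEPOINT_DEFINE (and, when the provider lives in 
its own shared object, TRACEPOINT_PROBE_DYNAMIC_LINKAGE) before including the 
tracepoint header, so if no such provider TU makes it into the final link the 
reference stays unresolved. A minimal provider TU would look roughly like this 
(the file and header names are assumed):

// Hypothetical provider TU, e.g. src/tracing/objectstore_provider.cc; exactly one
// TU in the final link should define these before including the provider header.
#define TRACEPOINT_CREATE_PROBES
#define TRACEPOINT_DEFINE
// #define TRACEPOINT_PROBE_DYNAMIC_LINKAGE  // only if the provider is its own .so
#include "tracing/objectstore.h"             // assumed path to the generated header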
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, June 26, 

Re: About Fio backend with ObjectStore API

2015-09-12 Thread Matt Benjamin
It would be worth exploring async, sure.

matt

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-761-4689
fax.  734-769-8938
cel.  734-216-5309


- Original Message -
> From: "James (Fei) Liu-SSI" 
> To: "Casey Bodley" 
> Cc: "Haomai Wang" , ceph-devel@vger.kernel.org
> Sent: Friday, September 11, 2015 1:18:31 PM
> Subject: RE: About Fio backend with ObjectStore API
> 
> Hi Casey,
>   You are right. I think the bottleneck is on the fio side rather than the
>   filestore side in this case. fio did not issue the IO commands fast
>   enough to saturate the filestore.
>   Here is one possible solution: create an async engine, which is
>   normally much faster than a sync engine in fio.
>
>    Here is a possible framework. This new ObjectStore-AIO engine in fio
>    should in theory be much faster than the sync engine. Once we have a fio
>    engine that can saturate newstore, memstore and filestore, we can
>    investigate in detail where the bottlenecks in their designs are.
> 
> .
> struct objectstore_aio_data {
>   struct aio_ctx *q_aio_ctx;
>   struct aio_completion_data *a_data;
>   aio_ses_ctx_t *p_ses_ctx;
>   unsigned int entries;
> };
> ...
> /*
>  * Note that the structure is exported, so that fio can get it via
>  * dlsym(..., "ioengine");
>  */
> struct ioengine_ops us_aio_ioengine = {
>   .name   = "objectstore-aio",
>   .version= FIO_IOOPS_VERSION,
>   .init   = fio_objectstore_aio_init,
>   .prep   = fio_objectstore_aio_prep,
>   .queue  = fio_objectstore_aio_queue,
>   .cancel = fio_objectstore_aio_cancel,
>   .getevents  = fio_objectstore_aio_getevents,
>   .event  = fio_objectstore_aio_event,
>   .cleanup= fio_objectstore_aio_cleanup,
>   .open_file  = fio_objectstore_aio_open,
>   .close_file = fio_objectstore_aio_close,
> };
> 
> 
> Let me know what you think.
> 
> Regards,
> James
> 
> -Original Message-
> From: Casey Bodley [mailto:cbod...@redhat.com]
> Sent: Friday, September 11, 2015 7:28 AM
> To: James (Fei) Liu-SSI
> Cc: Haomai Wang; ceph-devel@vger.kernel.org
> Subject: Re: About Fio backend with ObjectStore API
> 
> Hi James,
> 
> That's great that you were able to get fio-objectstore running! Thanks to you
> and Haomai for all the help with testing.
> 
> In terms of performance, it's possible that we're not handling the
> completions optimally. When profiling with MemStore I remember seeing a
> significant amount of cpu time spent in polling with
> fio_ceph_os_getevents().
> 
> The issue with reads is more of a design issue than a bug. Because the test
> starts with a mkfs(), there are no objects to read from initially. You would
> just have to add a write job to run before the read job, to make sure that
> the objects are initialized. Or perhaps the mkfs() step could be an optional
> part of the configuration.
> 
> Casey
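
A rough sketch of that suggestion, assuming a setup hook along these lines (the 
function name and option plumbing are hypothetical; mkfs() and mount() are the 
real ObjectStore entry points): run mkfs() only when asked, so a read job can 
reuse objects written by an earlier write job.

// Minimal stand-in for the real ObjectStore interface; illustration only.
struct ObjectStore {
  virtual ~ObjectStore() = default;
  virtual int mkfs() = 0;    // (re)initializes the backing store
  virtual int mount() = 0;   // opens an existing store
};

// Hypothetical engine setup step with mkfs made optional: a write job can run
// with do_mkfs=true, and a following read job with do_mkfs=false will then
// find the objects the write job created.
int fio_objectstore_setup(ObjectStore* os, bool do_mkfs) {
  if (do_mkfs) {
    if (int r = os->mkfs(); r < 0)
      return r;              // fresh store: fine for writes, empty for reads
  }
  return os->mount();
}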
> 
> - Original Message -
> From: "James (Fei) Liu-SSI" 
> To: "Haomai Wang" , "Casey Bodley" 
> Cc: ceph-devel@vger.kernel.org
> Sent: Thursday, September 10, 2015 8:08:04 PM
> Subject: RE: About Fio backend with ObjectStore API
> 
> Hi Casey and Haomai,
> 
>   We finally made the fio-objectstore backend work on our end. Here is fio data
>   against filestore with a Samsung 850 Pro. It is a sequential write, and the
>   performance is very poor, which is expected though.
> 
> Run status group 0 (all jobs):
>   WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s,
>   mint=55378msec, maxt=55378msec
> 
>   But anyway, it works, even though there are still some bugs to fix, like the
>   read and filesystem issues. Thanks a lot for your great work.
> 
>   Regards,
>   James
> 
>   jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio
>   ./test/objectstore.fio
> filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K,
> ioengine=cephobjectstore, iodepth=1 fio-2.2.9-56-g736a Starting 1 process
> test1
> filestore: Laying out IO file(s) (1 file(s) / 512MB)
> 2015-09-10 16:55:40.614494 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
> mkfs in /home/jamesliu/fio_ceph
> 2015-09-10 16:55:40.614924 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
> mkfs generated fsid 5508d58e-dbfc-48a5-9f9c-c639af4fe73a
> 2015-09-10 16:55:40.630326 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
> write_version_stamp 4
> 2015-09-10 16:55:40.673417 7f19d34d1840  0 filestore(/home/jamesliu/fio_ceph)
> backend xfs (magic 0x58465342)
> 2015-09-10 16:55:40.724097 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
> leveldb db exists/created
> 2015-09-10 16:55:40.724218 7f19d34d1840 -1 journal 

RE: Question about big EC pool.

2015-09-12 Thread Somnath Roy
I don't think there is any limit from Ceph side..
We are testing with ~768 TB deployment with 4:2 EC on Flash and it is working 
well so far..

Thanks & Regards
Somnath

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mike Almateia
Sent: Saturday, September 12, 2015 9:01 AM
To: ceph-devel
Subject: Question about big EC pool.

Hello!

Ceph has had the EC pool feature for a long time, and I have been thinking how 
big an EC pool we can build and support.

I have a task from one of our clients to build storage with around 5 PB of usable 
space for storing video from cameras (I asked on the ceph-users mailing list but 
nobody answered me).

Can Ceph handle such a huge EC store today, or not? Maybe someone is testing or 
using huge EC pools in production?
I heard that the EC feature is not ready for production; is that right?

Thanks for any answer.

--
Mike, yes.






RE: loadable objectstore

2015-09-12 Thread Varada Kari
Yes, Allen, for ObjectStore what you said is correct. But along with the 
ObjectStore backends I have also made libcommon a shared object, so I was not 
sure about the performance, since bufferlist is part of libcommon.

Varada

> -Original Message-
> From: Allen Samuels
> Sent: Sunday, September 13, 2015 2:05 AM
> To: Varada Kari ; James (Fei) Liu-SSI
> ; Sage Weil ; Matt W.
> Benjamin ; Loic Dachary 
> Cc: ceph-devel 
> Subject: RE: loadable objectstore
>
> Performance impact after initialization will be zero. All of the call sequences
> are done as vtable dynamic dispatches on the global ObjectStore instance.
> For this type of call sequence it makes no difference whether the library is
> dynamically or statically linked; the cost is the same (a simple indirection
> through the vtbl, which is loaded from a known constant offset in the object).
>
>
> Allen Samuels
> Chief Software Architect, Emerging Storage Solutions
>
> 951 SanDisk Drive, Milpitas, CA 95035
> T: +1 408 801 7030| M: +1 408 780 6416
> allen.samu...@sandisk.com
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, September 11, 2015 9:34 PM
> To: James (Fei) Liu-SSI ; Sage Weil
> ; Matt W. Benjamin ; Loic
> Dachary 
> Cc: ceph-devel 
> Subject: RE: loadable objectstore
>
> Hi James,
>
> Please find the responses inline.
>
> varada
>
> > -Original Message-
> > From: James (Fei) Liu-SSI [mailto:james@ssi.samsung.com]
> > Sent: Saturday, September 12, 2015 12:13 AM
> > To: Varada Kari ; Sage Weil
> > ; Matt W. Benjamin ; Loic
> > Dachary 
> > Cc: ceph-devel 
> > Subject: RE: loadable objectstore
> >
> > Hi Varada,
> >   Got a chance to go through the code. Great job; it is much cleaner.
> > A few questions:
> >   1. What do you think about the performance impact of the new
> > implementation, e.g. dynamic library vs. static linking?
[Varada Kari] Haven't measured the performance yet; there will be some hit from
dynamic vs. static linking, but it shouldn't be a major degradation. I will hold
off until we have some perf runs to figure that out.
> >   2. Could any vendor just provide a dynamically linked binary library that
> > implements the ObjectStore interfaces for their own storage engine with the new factory
> framework?
[Varada Kari] That was one of the design motives for this change. Yes, any
backend adhering to the ObjectStore interfaces can integrate with the OSD. All
they need to do is provide a factory interface plus the required version and init
functionality, in addition to all the required ObjectStore interfaces.
> >
> >   Regards,
> >   James
> >
> >
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Varada Kari
> > Sent: Friday, September 11, 2015 3:28 AM
> > To: Sage Weil; Matt W. Benjamin; Loic Dachary
> > Cc: ceph-devel
> > Subject: RE: loadable objectstore
> >
> > Hi Sage/ Matt,
> >
> > I have submitted the pull request based on wip-plugin branch for the
> > object store factory implementation at
> https://github.com/ceph/ceph/pull/5884 .
> > Haven't rebased to the master yet. Working on rebase and including new
> > store in the factory implementation.  Please have a look and let me
> > know your comments. Will submit a rebased PR soon with new store
> integration.
> >
> > Thanks,
> > Varada
> >
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Varada Kari
> > Sent: Friday, July 03, 2015 7:31 PM
> > To: Sage Weil ; Adam Crume
> 
> > Cc: Loic Dachary ; ceph-devel  > de...@vger.kernel.org>; Matt W. Benjamin 
> > Subject: RE: loadable objectstore
> >
> > Hi All,
> >
> > Not able to make much progress after making common a shared object along
> > with the object store.
> > Compilation of the test binaries is failing with
> > "./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'".
> >
> >   CXXLDceph_streamtest
> > ./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
> > collect2: error: ld returned 1 exit status
> > make[3]: *** [ceph_streamtest] Error 1
> >
> > But libfilestore.so is linked with lttng-ust.
> >
> > src/.libs$ ldd libceph_filestore.so
> > libceph_keyvaluestore.so.1 => /home/varada/obs-factory/plugin-
> > work/src/.libs/libceph_keyvaluestore.so.1 (0x7f5e50f5)
> > libceph_os.so.1 => /home/varada/obs-factory/plugin-
> > work/src/.libs/libceph_os.so.1 (0x7f5e4f93a000)
> > 

Re: 2 replications,flapping can not stop for a very long time

2015-09-12 Thread huang jun
hi, did you set both public_network and cluster_network, but cut off only
the cluster_network?
And do you have more than one OSD on the same host?
If so, the cluster may never settle: each OSD has heartbeat peers among the
previous and next OSD ids, and they exchange ping messages.
When you cut off the cluster_network, the peer OSDs across the cut can no
longer see the pings, so they report the OSD failure to the MON; once the MON
gathers enough reporters and reports, the OSD is marked down.
But the OSD can still report to the MON because the public_network is fine, so
the MON decides the OSD was wrongly marked down and marks it UP again.
So the flapping happens again and again.
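
For reference, a much-simplified sketch of the monitor-side decision described 
above. The option names are the real config knobs (mon_osd_min_down_reporters 
defaults to 2, osd_heartbeat_grace to 20s), but the actual OSDMonitor logic also 
stretches the grace period based on the target OSD's "laggy" history, which is 
omitted here.

#include <chrono>
#include <set>

// Simplified illustration only; not the actual OSDMonitor code.
struct FailureInfo {
  std::set<int> reporters;                              // distinct OSDs reporting the failure
  std::chrono::steady_clock::time_point first_report;   // when the first report arrived
};

bool should_mark_down(const FailureInfo& f,
                      int mon_osd_min_down_reporters,   // real option, default 2
                      double osd_heartbeat_grace_sec)   // real option, default 20
{
  double failed_for = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - f.first_report).count();
  return static_cast<int>(f.reporters.size()) >= mon_osd_min_down_reporters &&
         failed_for >= osd_heartbeat_grace_sec;
}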

2015-09-12 20:26 GMT+08:00 zhao.ming...@h3c.com :
>
> Hi,
> I'm testing the reliability of Ceph recently, and I have hit the flapping problem.
> I have 2 replicas, and I cut off the cluster network; now the flapping does not
> stop. I have waited more than 30 min, but the status of the OSDs is still not stable.
> I want to know: when the monitor receives reports from OSDs, how does it
> decide to mark an OSD down?
> (reports && reporters && grace) need to satisfy some conditions; how is the
> grace calculated?
> And how long will the flapping last? Must it be stopped by
> configuration, such as marking an OSD lost?
> Can someone help me?
> Thanks~



-- 
thanks
huangjun


Re: About Fio backend with ObjectStore API

2015-09-12 Thread Haomai Wang
It's really cool. Do you plan to push it upstream? I think it would be more
convenient if we made the fio repo a submodule.

On Sat, Sep 12, 2015 at 5:04 PM, Haomai Wang  wrote:
> I found the cause of my segfault:
>
> fio links librbd/librados from my /usr/local/lib but uses
> ceph/src/.libs/libfio_ceph_objectstore.so, and they are different Ceph
> versions.
>
> So maybe we need to add a check for the ABI version?
>
> On Sat, Sep 12, 2015 at 4:08 AM, Casey Bodley  wrote:
>> Hi James,
>>
>> I just looked back at the results you posted, and saw that you were using 
>> iodepth=1. Setting this higher should help keep the FileStore busy.
>>
>> Casey
>>
>> - Original Message -
>>> From: "James (Fei) Liu-SSI" 
>>> To: "Casey Bodley" 
>>> Cc: "Haomai Wang" , ceph-devel@vger.kernel.org
>>> Sent: Friday, September 11, 2015 1:18:31 PM
>>> Subject: RE: About Fio backend with ObjectStore API
>>>
>>> Hi Casey,
>>>   You are right. I think the bottleneck is on the fio side rather than the
>>>   filestore side in this case. fio did not issue the IO commands fast
>>>   enough to saturate the filestore.
>>>   Here is one possible solution: create an async engine, which is
>>>   normally much faster than a sync engine in fio.
>>>
>>>    Here is a possible framework. This new ObjectStore-AIO engine in fio
>>>    should in theory be much faster than the sync engine. Once we have a fio
>>>    engine that can saturate newstore, memstore and filestore, we can
>>>    investigate in detail where the bottlenecks in their designs are.
>>>
>>> .
>>> struct objectstore_aio_data {
>>>   struct aio_ctx *q_aio_ctx;
>>>   struct aio_completion_data *a_data;
>>>   aio_ses_ctx_t *p_ses_ctx;
>>>   unsigned int entries;
>>> };
>>> ...
>>> /*
>>>  * Note that the structure is exported, so that fio can get it via
>>>  * dlsym(..., "ioengine");
>>>  */
>>> struct ioengine_ops us_aio_ioengine = {
>>>   .name   = "objectstore-aio",
>>>   .version= FIO_IOOPS_VERSION,
>>>   .init   = fio_objectstore_aio_init,
>>>   .prep   = fio_objectstore_aio_prep,
>>>   .queue  = fio_objectstore_aio_queue,
>>>   .cancel = fio_objectstore_aio_cancel,
>>>   .getevents  = fio_objectstore_aio_getevents,
>>>   .event  = fio_objectstore_aio_event,
>>>   .cleanup= fio_objectstore_aio_cleanup,
>>>   .open_file  = fio_objectstore_aio_open,
>>>   .close_file = fio_objectstore_aio_close,
>>> };
>>>
>>>
>>> Let me know what you think.
>>>
>>> Regards,
>>> James
>>> 
>>> -Original Message-
>>> From: Casey Bodley [mailto:cbod...@redhat.com]
>>> Sent: Friday, September 11, 2015 7:28 AM
>>> To: James (Fei) Liu-SSI
>>> Cc: Haomai Wang; ceph-devel@vger.kernel.org
>>> Subject: Re: About Fio backend with ObjectStore API
>>>
>>> Hi James,
>>>
>>> That's great that you were able to get fio-objectstore running! Thanks to 
>>> you
>>> and Haomai for all the help with testing.
>>>
>>> In terms of performance, it's possible that we're not handling the
>>> completions optimally. When profiling with MemStore I remember seeing a
>>> significant amount of cpu time spent in polling with
>>> fio_ceph_os_getevents().
>>>
>>> The issue with reads is more of a design issue than a bug. Because the test
>>> starts with a mkfs(), there are no objects to read from initially. You would
>>> just have to add a write job to run before the read job, to make sure that
>>> the objects are initialized. Or perhaps the mkfs() step could be an optional
>>> part of the configuration.
>>>
>>> Casey
>>>
>>> - Original Message -
>>> From: "James (Fei) Liu-SSI" 
>>> To: "Haomai Wang" , "Casey Bodley" 
>>> 
>>> Cc: ceph-devel@vger.kernel.org
>>> Sent: Thursday, September 10, 2015 8:08:04 PM
>>> Subject: RE: About Fio backend with ObjectStore API
>>>
>>> Hi Casey and Haomai,
>>>
>>>   We finally made the fio-objectstore backend work on our end. Here is fio data
>>>   against filestore with a Samsung 850 Pro. It is a sequential write, and the
>>>   performance is very poor, which is expected though.
>>>
>>> Run status group 0 (all jobs):
>>>   WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s,
>>>   mint=55378msec, maxt=55378msec
>>>
>>>   But anyway, it works, even though there are still some bugs to fix, like the
>>>   read and filesystem issues. Thanks a lot for your great work.
>>>
>>>   Regards,
>>>   James
>>>
>>>   jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio
>>>   ./test/objectstore.fio
>>> filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K,
>>> ioengine=cephobjectstore, iodepth=1 fio-2.2.9-56-g736a Starting 1 process
>>> test1
>>> filestore: Laying out IO file(s) (1 file(s) 

Re: About Fio backend with ObjectStore API

2015-09-12 Thread Haomai Wang
I found the cause of my segfault:

fio links librbd/librados from my /usr/local/lib but uses
ceph/src/.libs/libfio_ceph_objectstore.so, and they are different Ceph
versions.

So maybe we need to add a check for the ABI version?
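
A possible shape for such a guard, purely as a sketch (the exported symbol name 
and where the local version string comes from are assumptions): have the plugin 
export the Ceph version it was built against and refuse to use it when that 
differs from the loader's own build.

#include <dlfcn.h>
#include <cstdio>
#include <cstring>

// Placeholder for the version string the loading binary was built against.
static const char* kBuiltAgainst = "v9.0.3";

bool plugin_version_matches(void* handle) {
  using version_fn = const char* (*)();
  // "fio_ceph_os_version" is an assumed export name, not an existing symbol.
  auto get_version = reinterpret_cast<version_fn>(dlsym(handle, "fio_ceph_os_version"));
  if (!get_version) {
    std::fprintf(stderr, "plugin exports no version symbol, refusing to use it\n");
    return false;
  }
  if (std::strcmp(get_version(), kBuiltAgainst) != 0) {
    // Mixing /usr/local/lib librados with a freshly built .so is exactly this case.
    std::fprintf(stderr, "plugin built against %s, loader built against %s\n",
                 get_version(), kBuiltAgainst);
    return false;
  }
  return true;
}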

On Sat, Sep 12, 2015 at 4:08 AM, Casey Bodley  wrote:
> Hi James,
>
> I just looked back at the results you posted, and saw that you were using 
> iodepth=1. Setting this higher should help keep the FileStore busy.
>
> Casey
>
> - Original Message -
>> From: "James (Fei) Liu-SSI" 
>> To: "Casey Bodley" 
>> Cc: "Haomai Wang" , ceph-devel@vger.kernel.org
>> Sent: Friday, September 11, 2015 1:18:31 PM
>> Subject: RE: About Fio backend with ObjectStore API
>>
>> Hi Casey,
>>   You are right. I think the bottleneck is on the fio side rather than the
>>   filestore side in this case. fio did not issue the IO commands fast
>>   enough to saturate the filestore.
>>   Here is one possible solution: create an async engine, which is
>>   normally much faster than a sync engine in fio.
>>
>>    Here is a possible framework. This new ObjectStore-AIO engine in fio
>>    should in theory be much faster than the sync engine. Once we have a fio
>>    engine that can saturate newstore, memstore and filestore, we can
>>    investigate in detail where the bottlenecks in their designs are.
>>
>> .
>> struct objectstore_aio_data {
>>   struct aio_ctx *q_aio_ctx;
>>   struct aio_completion_data *a_data;
>>   aio_ses_ctx_t *p_ses_ctx;
>>   unsigned int entries;
>> };
>> ...
>> /*
>>  * Note that the structure is exported, so that fio can get it via
>>  * dlsym(..., "ioengine");
>>  */
>> struct ioengine_ops us_aio_ioengine = {
>>   .name   = "objectstore-aio",
>>   .version= FIO_IOOPS_VERSION,
>>   .init   = fio_objectstore_aio_init,
>>   .prep   = fio_objectstore_aio_prep,
>>   .queue  = fio_objectstore_aio_queue,
>>   .cancel = fio_objectstore_aio_cancel,
>>   .getevents  = fio_objectstore_aio_getevents,
>>   .event  = fio_objectstore_aio_event,
>>   .cleanup= fio_objectstore_aio_cleanup,
>>   .open_file  = fio_objectstore_aio_open,
>>   .close_file = fio_objectstore_aio_close,
>> };
>>
>>
>> Let me know what you think.
>>
>> Regards,
>> James
>> 
>> -Original Message-
>> From: Casey Bodley [mailto:cbod...@redhat.com]
>> Sent: Friday, September 11, 2015 7:28 AM
>> To: James (Fei) Liu-SSI
>> Cc: Haomai Wang; ceph-devel@vger.kernel.org
>> Subject: Re: About Fio backend with ObjectStore API
>>
>> Hi James,
>>
>> That's great that you were able to get fio-objectstore running! Thanks to you
>> and Haomai for all the help with testing.
>>
>> In terms of performance, it's possible that we're not handling the
>> completions optimally. When profiling with MemStore I remember seeing a
>> significant amount of cpu time spent in polling with
>> fio_ceph_os_getevents().
>>
>> The issue with reads is more of a design issue than a bug. Because the test
>> starts with a mkfs(), there are no objects to read from initially. You would
>> just have to add a write job to run before the read job, to make sure that
>> the objects are initialized. Or perhaps the mkfs() step could be an optional
>> part of the configuration.
>>
>> Casey
>>
>> - Original Message -
>> From: "James (Fei) Liu-SSI" 
>> To: "Haomai Wang" , "Casey Bodley" 
>> Cc: ceph-devel@vger.kernel.org
>> Sent: Thursday, September 10, 2015 8:08:04 PM
>> Subject: RE: About Fio backend with ObjectStore API
>>
>> Hi Casey and Haomai,
>>
>>   We finally made the fio-objectstore backend work on our end. Here is fio data
>>   against filestore with a Samsung 850 Pro. It is a sequential write, and the
>>   performance is very poor, which is expected though.
>>
>> Run status group 0 (all jobs):
>>   WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s,
>>   mint=55378msec, maxt=55378msec
>>
>>   But anyway, it works, even though there are still some bugs to fix, like the
>>   read and filesystem issues. Thanks a lot for your great work.
>>
>>   Regards,
>>   James
>>
>>   jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio
>>   ./test/objectstore.fio
>> filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K,
>> ioengine=cephobjectstore, iodepth=1 fio-2.2.9-56-g736a Starting 1 process
>> test1
>> filestore: Laying out IO file(s) (1 file(s) / 512MB)
>> 2015-09-10 16:55:40.614494 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
>> mkfs in /home/jamesliu/fio_ceph
>> 2015-09-10 16:55:40.614924 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
>> mkfs generated fsid 5508d58e-dbfc-48a5-9f9c-c639af4fe73a
>> 2015-09-10 16:55:40.630326 7f19d34d1840  1 

Re: [HPDD-discuss] [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

2015-09-12 Thread Jeff Layton
On Sat, 12 Sep 2015 04:41:33 +
"Dilger, Andreas"  wrote:

> On 2015/09/11, 4:20 AM, "HPDD-discuss on behalf of Jeff Layton"
> 
> wrote:
> 
> >With NFSv3 nfsd will always attempt to send along WCC data to the
> >client. This generally involves saving off the in-core inode information
> >prior to doing the operation on the given filehandle, and then issuing a
> >vfs_getattr to it after the op.
> >
> >Some filesystems (particularly clustered or networked ones) have an
> >expensive ->getattr inode operation. Atomicity is also often difficult
> >or impossible to guarantee on such filesystems. For those, we're best
> >off not trying to provide WCC information to the client at all, and to
> >simply allow it to poll for that information as needed with a GETATTR
> >RPC.
> >
> >This patch adds a new flags field to struct export_operations, and
> >defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
> >that nfsd should not attempt to provide WCC info in NFSv3 replies. It
> >also adds a blurb about the new flags field and flag to the exporting
> >documentation.
> >
> >The server will also now skip collecting this information for NFSv2 as
> >well, since that info is never used there anyway.
> >
> >Note that this patch does not add this flag to any filesystem
> >export_operations structures. This was originally developed to allow
> >reexporting nfs via nfsd. That code is not (and may never be) suitable
> >for merging into mainline.
> >
> >Other filesystems may want to consider enabling this flag too. It's hard
> >to tell however which ones have export operations to enable export via
> >knfsd and which ones mostly rely on them for open-by-filehandle support,
> >so I'm leaving that up to the individual maintainers to decide. I am
> >cc'ing the relevant lists for those filesystems that I think may want to
> >consider adding this though.
> >
> >Cc: hpdd-disc...@lists.01.org
> >Cc: ceph-devel@vger.kernel.org
> >Cc: cluster-de...@redhat.com
> >Cc: fuse-de...@lists.sourceforge.net
> >Cc: ocfs2-de...@oss.oracle.com
> >Signed-off-by: Jeff Layton 
> >---
> > Documentation/filesystems/nfs/Exporting | 27 +++
> > fs/nfsd/nfs3xdr.c   |  5 -
> > fs/nfsd/nfsfh.c | 14 ++
> > fs/nfsd/nfsfh.h |  5 -
> > include/linux/exportfs.h|  2 ++
> > 5 files changed, 51 insertions(+), 2 deletions(-)
> >
> >diff --git a/Documentation/filesystems/nfs/Exporting
> >b/Documentation/filesystems/nfs/Exporting
> >index 520a4becb75c..fa636cde3907 100644
> >--- a/Documentation/filesystems/nfs/Exporting
> >+++ b/Documentation/filesystems/nfs/Exporting
> >@@ -138,6 +138,11 @@ struct which has the following members:
> > to find potential names, and matches inode numbers to find the
> >correct
> > match.
> > 
> >+  flags
> >+Some filesystems may need to be handled differently than others. The
> >+export_operations struct also includes a flags field that allows the
> >+filesystem to communicate such information to nfsd. See the Export
> >+Operations Flags section below for more explanation.
> > 
> > A filehandle fragment consists of an array of 1 or more 4byte words,
> > together with a one byte "type".
> >@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have
> >been padded with
> > nuls.  Rather, the encode_fh routine should choose a "type" which
> > indicates the decode_fh how much of the filehandle is valid, and how
> > it should be interpreted.
> >+
> >+Export Operations Flags
> >+---
> >+In addition to the operation vector pointers, struct export_operations
> >also
> >+contains a "flags" field that allows the filesystem to communicate to
> >nfsd
> >+that it may want to do things differently when dealing with it. The
> >+following flags are defined:
> >+
> >+  EXPORT_OP_NOWCC
> >+RFC 1813 recommends that servers always send weak cache consistency
> >+(WCC) data to the client after each operation. The server should
> >+atomically collect attributes about the inode, do an operation on it,
> >+and then collect the attributes afterward. This allows the client to
> >+skip issuing GETATTRs in some situations but means that the server
> >+is calling vfs_getattr for almost all RPCs. On some filesystems
> >+(particularly those that are clustered or networked) this is
> >expensive
> >+and atomicity is difficult to guarantee. This flag indicates to nfsd
> >+that it should skip providing WCC attributes to the client in NFSv3
> >+replies when doing operations on this filesystem. Consider enabling
> >+this on filesystems that have an expensive ->getattr inode operation,
> >+or when atomicity between pre and post operation attribute collection
> >+is impossible to guarantee.
> >diff --git 

Question about big EC pool.

2015-09-12 Thread Mike Almateia

Hello!

Ceph has had the EC pool feature for a long time, and I have been thinking how 
big an EC pool we can build and support.


I have a task from one of our clients to build storage with around 5 PB of usable 
space for storing video from cameras (I asked on the ceph-users mailing list but 
nobody answered me).


Can Ceph handle such a huge EC store today, or not? Maybe someone is testing or 
using huge EC pools in production?

I heard that the EC feature is not ready for production; is that right?

Thanks for any answer.

--
Mike, yes.


2 replications,flapping can not stop for a very long time

2015-09-12 Thread zhao.ming...@h3c.com
  
Hi,
I'm testing the reliability of Ceph recently, and I have hit the flapping problem.
I have 2 replicas, and I cut off the cluster network; now the flapping does not 
stop. I have waited more than 30 min, but the status of the OSDs is still not stable.
I want to know: when the monitor receives reports from OSDs, how does it decide 
to mark an OSD down?
(reports && reporters && grace) need to satisfy some conditions; how is the 
grace calculated?
And how long will the flapping last? Must it be stopped by 
configuration, such as marking an OSD lost?
Can someone help me?
Thanks~