[ceph-users] erasure coded pool
Is it possible to run an erasure coded pool using the default k=2, m=2 profile on a single node? (This is just for functionality testing.) The single node has 3 OSDs. Replicated pools run fine. ceph.conf does contain: osd crush chooseleaf type = 0
-- Tom Deneau
Re: [ceph-users] erasure coded pool
Hi Tom,
On 20/02/2015 22:59, Deneau, Tom wrote: Is it possible to run an erasure coded pool using the default k=2, m=2 profile on a single node? (This is just for functionality testing.) The single node has 3 OSDs. Replicated pools run fine.
For k=2 m=2 to work you need four (k+m) OSDs. As long as the crush rule allows it, you can have them on the same host.
Cheers
-- Loïc Dachary, Artisan Logiciel Libre
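For reference, a minimal sketch of what that could look like on the single test node (the profile and pool names below are made up, and the failure-domain option is spelled ruleset-failure-domain on Firefly/Hammer-era releases and crush-failure-domain on Luminous and later):
# ceph osd erasure-code-profile set ec22 k=2 m=2 ruleset-failure-domain=osd
# ceph osd erasure-code-profile get ec22
# ceph osd pool create ectest 64 64 erasure ec22
With only 3 OSDs the placement groups of such a pool would stay incomplete, because each PG needs k+m = 4 distinct OSDs, so a fourth OSD has to be added to the node first.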
Re: [ceph-users] erasure coded pool why ever k>1?
Hi,
On 22/01/2015 16:37, Chad William Seys wrote: Hi Loic, "The size of each chunk is object size / K. If you have K=1 and M=2 it will be the same as 3 replicas with none of the advantages ;-)" Interesting! I did not see this explained so explicitly. So is the general explanation of k and m something like: k, m: fault tolerance of m+1 replicas, space of (k+m)/k replicas, plus slowness?
I'm not sure I understand the space formula, but it looks like you got the idea.
So one should never bother with k=1 b/c: k=1, m: fault tolerance of m+1, space of m+1 replicas, plus slowness (therefore, just use m+1 replicas!). But k=2, m=1 might be useful instead of 2 replicas b/c it has the fault tolerance of 2 replicas, space of (1/2)*(1+2) = 3/2 = 1.5 replicas, plus slowness. And k=2, m=2 should be as tolerant as 3 replicas, but take up as much space as (1/2)*(2+2) = 2 replicas (right?).
That's also how I understand it :-)
Cheers
-- Loïc Dachary, Artisan Logiciel Libre
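Spelled out with numbers, the formula discussed above amounts to: raw space used = stored data * (k+m)/k, failures tolerated = m. So k=1,m=2 uses 3x the space and tolerates 2 failures (the same as 3 replicas), k=2,m=1 uses 1.5x and tolerates 1, k=2,m=2 uses 2x and tolerates 2, and a larger k such as k=4,m=2 (not from this thread, just the same arithmetic) gets down to 1.5x while still tolerating 2 failures.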
Re: [ceph-users] erasure coded pool why ever k>1?
Hi Loic,
"The size of each chunk is object size / K. If you have K=1 and M=2 it will be the same as 3 replicas with none of the advantages ;-)"
Interesting! I did not see this explained so explicitly. So is the general explanation of k and m something like: k, m: fault tolerance of m+1 replicas, space of (k+m)/k replicas, plus slowness?
So one should never bother with k=1 b/c: k=1, m: fault tolerance of m+1, space of m+1 replicas, plus slowness (therefore, just use m+1 replicas!). But k=2, m=1 might be useful instead of 2 replicas b/c it has the fault tolerance of 2 replicas, space of (1/2)*(1+2) = 3/2 = 1.5 replicas, plus slowness. And k=2, m=2 should be as tolerant as 3 replicas, but take up as much space as (1/2)*(2+2) = 2 replicas (right?).
Thanks again! Chad.
[ceph-users] erasure coded pool why ever k>1?
Hello all, What reasons would one want k>1? I read that m determines the number of OSDs which can fail before loss. But I don't see explained how to choose k. Any benefits for choosing k>1? Thanks! Chad.
Re: [ceph-users] erasure coded pool why ever k>1?
On 21/01/2015 22:42, Chad William Seys wrote: Hello all, What reasons would one want k>1? I read that m determines the number of OSDs which can fail before loss. But I don't see explained how to choose k. Any benefits for choosing k>1?
The size of each chunk is object size / K. If you have K=1 and M=2 it will be the same as 3 replicas with none of the advantages ;-)
Cheers
-- Loïc Dachary, Artisan Logiciel Libre
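A concrete illustration of the chunk-size point (plain arithmetic, not actual Ceph output): a 4 MB object with K=1, M=2 is stored as one 4 MB data chunk plus two 4 MB coding chunks, i.e. 12 MB on disk, exactly what 3 replicas would cost; with K=2, M=2 the same object becomes two 2 MB data chunks plus two 2 MB coding chunks, i.e. 8 MB on disk.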
Re: [ceph-users] erasure coded pool why ever k>1?
Well, look at it this way: with 3X replication, for each TB of data you need 3 TB of disk. With (for example) 10+3 EC, you get better protection, and for each TB of data you need 1.3 TB of disk. -don-
-----Original Message----- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Loic Dachary Sent: 21 January, 2015 15:18 To: Chad William Seys; ceph-users@lists.ceph.com Subject: Re: [ceph-users] erasure coded pool why ever k>1?
On 21/01/2015 22:42, Chad William Seys wrote: Hello all, What reasons would one want k>1? I read that m determines the number of OSDs which can fail before loss. But I don't see explained how to choose k. Any benefits for choosing k>1?
The size of each chunk is object size / K. If you have K=1 and M=2 it will be the same as 3 replicas with none of the advantages ;-)
Cheers
-- Loïc Dachary, Artisan Logiciel Libre
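A sketch of what such a 10+3 profile could look like (hypothetical profile and pool names; ruleset-failure-domain is the pre-Luminous spelling of the option), keeping in mind that k+m = 13 means the CRUSH rule needs 13 distinct failure domains, e.g. 13 hosts, to place all shards of a PG:
# ceph osd erasure-code-profile set ec103 k=10 m=3 ruleset-failure-domain=host
# ceph osd pool create ecbig 256 256 erasure ec103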
[ceph-users] erasure coded pool k=7,m=5
Hi all, Soon, we should have a 3-datacenter (dc) ceph cluster with 4 hosts in each dc. Each host will have 12 OSDs. We can accept the loss of one datacenter and of one host in the remaining 2 datacenters. In order to use an erasure coded pool: 1. Is a strategy of k = 7, m = 5 acceptable? 2. Is it the only one that guarantees our premise? 3. And more generally, is there a formula (based on the number of dc, hosts and OSDs) that allows us to calculate the profile? Thanks. Stephane.
-- Université de Lorraine, Stéphane DUGRAVOT - Direction du numérique - Infrastructure, Jabber : stephane.dugra...@univ-lorraine.fr, Tél.: +33 3 83 68 20 98
Re: [ceph-users] erasure coded pool k=7,m=5
Hi Stéphane,
On 23/12/2014 14:34, Stéphane DUGRAVOT wrote: Hi all, Soon, we should have a 3-datacenter (dc) ceph cluster with 4 hosts in each dc. Each host will have 12 OSDs. We can accept the loss of one datacenter and of one host in the remaining 2 datacenters. In order to use an erasure coded pool: 1. Is a strategy of k = 7, m = 5 acceptable?
If you want to sustain the loss of one datacenter, k=2,m=1 is what you want, with a ruleset that requires that no two shards be in the same datacenter. It also sustains the loss of one host within a datacenter: the missing chunk on the lost host will be reconstructed using the two other chunks from the two other datacenters. If, in addition, you want to sustain the loss of one machine while a datacenter is down, you would need to use the LRC plugin.
2. Is it the only one that guarantees our premise? 3. And more generally, is there a formula (based on the number of dc, hosts and OSDs) that allows us to calculate the profile?
I don't think there is such a formula.
Cheers
-- Loïc Dachary, Artisan Logiciel Libre
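A sketch of the k=2,m=1 profile described above (names are hypothetical, the CRUSH map is assumed to actually contain datacenter buckets, and the option is spelled ruleset-failure-domain on pre-Luminous releases, crush-failure-domain later):
# ceph osd erasure-code-profile set ecdc k=2 m=1 ruleset-failure-domain=datacenter
# ceph osd pool create ecpool 128 128 erasure ecdc
Each PG then gets one shard per datacenter, so losing a whole datacenter, or one host inside a datacenter, still leaves two of the three shards available for reconstruction. For the additional host-lost-while-a-datacenter-is-down case the profile would use plugin=lrc instead; the exact LRC parameters are best taken from the LRC plugin documentation.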
Re: [ceph-users] Erasure coded pool suitable for MDS?
On 20/06/2014 00:06, Erik Logtenberg wrote: Hi Loic, That is a nice idea. And if I then use newfs against that replicated cache pool, it'll work reliably?
It will not be limited by the erasure coded pool features, indeed.
Cheers
Kind regards, Erik. On 06/19/2014 11:09 PM, Loic Dachary wrote: On 19/06/2014 22:51, Wido den Hollander wrote: On 19 Jun 2014 at 16:10, Erik Logtenberg e...@logtenberg.eu wrote: Hi, Are erasure coded pools suitable for use with MDS? I don't think so. It does in-place updates of objects and that doesn't work with EC pools. Hi Erik, This is correct. You can however set a replicated pool to be the cache of an erasure coded pool. https://ceph.com/docs/master/dev/cache-pool/ Cheers
-- Loïc Dachary, Artisan Logiciel Libre
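A rough sketch of the cache-pool arrangement suggested above (pool names and PG counts are placeholders, and the tiering details such as hit_set settings are left to the cache-pool documentation linked above):
# ceph osd pool create ecdata 128 128 erasure
# ceph osd pool create ecdata-cache 128 128
# ceph osd tier add ecdata ecdata-cache
# ceph osd tier cache-mode ecdata-cache writeback
# ceph osd tier set-overlay ecdata ecdata-cache
With the overlay set, I/O against ecdata is serviced through the replicated ecdata-cache pool, so the MDS never performs its in-place updates directly on the erasure coded pool.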
[ceph-users] Erasure coded pool suitable for MDS?
Hi, Are erasure coded pools suitable for use with MDS? I tried to give it a go by creating two new pools like so:
# ceph osd pool create ecdata 128 128 erasure
# ceph osd pool create ecmetadata 128 128 erasure
Then looked up their id's:
# ceph osd lspools
..., 6 ecdata,7 ecmetadata
# ceph mds newfs 7 6 --yes-i-really-mean-it
But then when I start MDS, it crashes horribly. I did notice that MDS created a couple of objects in the ecmetadata pool:
# rados ls -p ecmetadata
mds0_sessionmap
mds0_inotable
1..inode
200.
mds_anchortable
mds_snaptable
100..inode
However it crashes immediately after. I started mds manually to try and see what's up:
# ceph-mds -i 0 -d
This spews out so much information that I saved it in a logfile, added as an attachment. Kind regards, Erik.
2014-06-19 22:07:34.492328 7f3572f6e7c0 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-mds, pid 2943
starting mds.0 at :/0
2014-06-19 22:07:35.793309 7f356dd88700 1 mds.-1.0 handle_mds_map standby
2014-06-19 22:07:35.876689 7f356dd88700 1 mds.0.15 handle_mds_map i am now mds.0.15
2014-06-19 22:07:35.876695 7f356dd88700 1 mds.0.15 handle_mds_map state change up:standby -- up:creating
2014-06-19 22:07:35.876931 7f356dd88700 0 mds.0.cache creating system inode with ino:1
2014-06-19 22:07:35.877204 7f356dd88700 0 mds.0.cache creating system inode with ino:100
2014-06-19 22:07:35.877209 7f356dd88700 0 mds.0.cache creating system inode with ino:600
2014-06-19 22:07:35.877369 7f356dd88700 0 mds.0.cache creating system inode with ino:601
2014-06-19 22:07:35.877455 7f356dd88700 0 mds.0.cache creating system inode with ino:602
2014-06-19 22:07:35.877519 7f356dd88700 0 mds.0.cache creating system inode with ino:603
2014-06-19 22:07:35.877566 7f356dd88700 0 mds.0.cache creating system inode with ino:604
2014-06-19 22:07:35.877606 7f356dd88700 0 mds.0.cache creating system inode with ino:605
2014-06-19 22:07:35.877683 7f356dd88700 0 mds.0.cache creating system inode with ino:606
2014-06-19 22:07:35.877723 7f356dd88700 0 mds.0.cache creating system inode with ino:607
2014-06-19 22:07:35.877780 7f356dd88700 0 mds.0.cache creating system inode with ino:608
2014-06-19 22:07:35.877819 7f356dd88700 0 mds.0.cache creating system inode with ino:609
2014-06-19 22:07:35.877858 7f356dd88700 0 mds.0.cache creating system inode with ino:200
mds/CDir.cc: In function 'virtual void C_Dir_Committed::finish(int)' thread 7f356dd88700 time 2014-06-19 22:07:35.881337
mds/CDir.cc: 1809: FAILED assert(r == 0)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: ceph-mds() [0x75c6f1]
2: (Context::complete(int)+0x9) [0x56cff9]
3: (C_Gather::sub_finish(Context*, int)+0x1f7) [0x56e9a7]
4: (C_Gather::C_GatherSub::finish(int)+0x12) [0x56eab2]
5: (Context::complete(int)+0x9) [0x56cff9]
6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf4e) [0x7d26ee]
7: (MDS::handle_core_message(Message*)+0xb1f) [0x58e5ef]
8: (MDS::_dispatch(Message*)+0x32) [0x58e7f2]
9: (MDS::ms_dispatch(Message*)+0xa3) [0x5901d3]
10: (DispatchQueue::entry()+0x57a) [0x99d9da]
11: (DispatchQueue::DispatchThread::entry()+0xd) [0x8be63d]
12: (()+0x7c53) [0x7f3572366c53]
13: (clone()+0x6d) [0x7f3571257dbd]
NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this.
2014-06-19 22:07:35.883239 7f356dd88700 -1 mds/CDir.cc: In function 'virtual void C_Dir_Committed::finish(int)' thread 7f356dd88700 time 2014-06-19 22:07:35.881337
mds/CDir.cc: 1809: FAILED assert(r == 0)
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
1: ceph-mds() [0x75c6f1]
2: (Context::complete(int)+0x9) [0x56cff9]
3: (C_Gather::sub_finish(Context*, int)+0x1f7) [0x56e9a7]
4: (C_Gather::C_GatherSub::finish(int)+0x12) [0x56eab2]
5: (Context::complete(int)+0x9) [0x56cff9]
6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf4e) [0x7d26ee]
7: (MDS::handle_core_message(Message*)+0xb1f) [0x58e5ef]
8: (MDS::_dispatch(Message*)+0x32) [0x58e7f2]
9: (MDS::ms_dispatch(Message*)+0xa3) [0x5901d3]
10: (DispatchQueue::entry()+0x57a) [0x99d9da]
11: (DispatchQueue::DispatchThread::entry()+0xd) [0x8be63d]
12: (()+0x7c53) [0x7f3572366c53]
13: (clone()+0x6d) [0x7f3571257dbd]
NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this.
--- begin dump of recent events ---
-144 2014-06-19 22:07:34.489920 7f3572f6e7c0 5 asok(0x1e0) register_command perfcounters_dump hook 0x1dc8010
-143 2014-06-19 22:07:34.489992 7f3572f6e7c0 5 asok(0x1e0) register_command 1 hook 0x1dc8010
-142 2014-06-19 22:07:34.490003 7f3572f6e7c0 5 asok(0x1e0) register_command perf dump hook 0x1dc8010
-141 2014-06-19 22:07:34.490015 7f3572f6e7c0 5 asok(0x1e0) register_command perfcounters_schema hook 0x1dc8010
-140 2014-06-19 22:07:34.490027 7f3572f6e7c0 5 asok(0x1e0) register_command 2 hook 0x1dc8010
-139 2014-06-19 22:07:34.490035 7f3572f6e7c0 5 asok(0x1e0)