Re: [ceph-users] erasure code : number of chunks for a small cluster ?
Oh, I hadn't thought about this. Thanks Hector!

----- Original Message -----
From: "Hector Martin"
To: "ceph-users"
Sent: Friday, 6 February 2015 09:06:29
Subject: Re: [ceph-users] erasure code : number of chunks for a small cluster ?

On 02/02/15 03:38, Udo Lembke wrote:
> With 3 hosts only you can't survive a full node failure, because for
> that you need host >= k + m.

Sure you can. k=2, m=1 with the failure domain set to host will survive a
full host failure.

Configuring an encoding that survives one full host failure or two OSDs
anywhere on the cluster is also possible. Use k=4, m=2, then define a CRUSH
rule like this:

step take default
step choose indep 3 type host
step choose indep 2 type osd
step emit

That ensures that for each PG, each host gets two chunks on two independent
OSDs. That means you can lose any pair of OSDs (since no PG will have two
chunks on the same OSD, and the encoding can survive a two-chunk loss). You
can also lose any host, which will cause the loss of exactly two chunks for
every PG.

Of course, with a setup like this, if you lose a host, the cluster will be
degraded until you can bring the host back, and will not be able to recover
those chunks anywhere else (since the rule prevents it), so any further
failure of an OSD while a host is down will necessarily lose data.

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
On 06/02/15 21:07, Udo Lembke wrote:
> On 06.02.2015 09:06, Hector Martin wrote:
>> On 02/02/15 03:38, Udo Lembke wrote:
>>> With 3 hosts only you can't survive a full node failure, because for
>>> that you need host >= k + m.
>>
>> Sure you can. k=2, m=1 with the failure domain set to host will survive
>> a full host failure.
>
> Hi,
> Alexandre has the requirement of 2 failed disks or one full node failure.
> This is the reason why I wrote that this is not possible...

But it is. I just explained how that can be achieved with only 3 nodes, with
k=4, m=2, and a custom CRUSH rule. Placing precisely two chunks on each host,
on two distinct OSDs, satisfies this requirement: any two OSDs can fail
(leaving 4/6 chunks) or any host can fail (again leaving 4/6 chunks).

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
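For reference, this placement can be sanity-checked offline with crushtool
before any pool is created. The sketch below assumes the custom rule has
already been compiled into a map file named crushmap.new and was assigned
ruleset id 2; both names are illustrative, not from the thread:

# Simulate the rule for 10 sample inputs, asking for k+m = 6 chunks each.
# Every mapping printed should contain 6 OSDs, two per host.
crushtool -i crushmap.new --test --rule 2 --num-rep 6 \
    --min-x 0 --max-x 9 --show-mappings

# --show-statistics additionally reports whether any input could not be
# mapped to the full 6 OSDs (bad mappings).
crushtool -i crushmap.new --test --rule 2 --num-rep 6 --show-statistics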
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
On 06.02.2015 09:06, Hector Martin wrote:
> On 02/02/15 03:38, Udo Lembke wrote:
>> With 3 hosts only you can't survive a full node failure, because for
>> that you need host >= k + m.
>
> Sure you can. k=2, m=1 with the failure domain set to host will survive
> a full host failure.

Hi,
Alexandre has the requirement of 2 failed disks or one full node failure.
This is the reason why I wrote that this is not possible...

Udo
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
On 02/02/15 03:38, Udo Lembke wrote:
> With 3 hosts only you can't survive a full node failure, because for
> that you need host >= k + m.

Sure you can. k=2, m=1 with the failure domain set to host will survive a
full host failure.

Configuring an encoding that survives one full host failure or two OSDs
anywhere on the cluster is also possible. Use k=4, m=2, then define a CRUSH
rule like this:

step take default
step choose indep 3 type host
step choose indep 2 type osd
step emit

That ensures that for each PG, each host gets two chunks on two independent
OSDs. That means you can lose any pair of OSDs (since no PG will have two
chunks on the same OSD, and the encoding can survive a two-chunk loss). You
can also lose any host, which will cause the loss of exactly two chunks for
every PG.

Of course, with a setup like this, if you lose a host, the cluster will be
degraded until you can bring the host back, and will not be able to recover
those chunks anywhere else (since the rule prevents it), so any further
failure of an OSD while a host is down will necessarily lose data.

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://marcan.st/marcan.asc
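For reference, here is a sketch of how the profile and rule above might be
wired together on a Firefly/Hammer-era cluster. The profile, rule and pool
names, the PG count, and the ruleset id are illustrative, not from the
thread; on recent releases the profile parameter is crush-failure-domain
rather than ruleset-failure-domain.

# 1. Create the erasure-code profile. The failure domain set here matters
#    little, since placement is taken over by the hand-written rule below.
ceph osd erasure-code-profile set ec42 k=4 m=2 ruleset-failure-domain=osd

# 2. Pull the CRUSH map, decompile it, and add the rule.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Append something like this to crushmap.txt (6 chunks = 3 hosts x 2 OSDs;
# pick an unused ruleset id):
#
#   rule ec42_host2osd {
#       ruleset 2
#       type erasure
#       min_size 6
#       max_size 6
#       step set_chooseleaf_tries 5
#       step take default
#       step choose indep 3 type host
#       step choose indep 2 type osd
#       step emit
#   }

# 3. Recompile and inject the modified map.
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

# 4. Create the pool using the profile and the new rule.
ceph osd pool create ecpool 128 128 erasure ec42 ec42_host2osd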
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
>> Hi Alexandre,
>>
>> nice to meet you here ;-)

Hi Udo! (Udo from Proxmox? ;)

>> With 3 hosts only you can't survive a full node failure, because for
>> that you need host >= k + m.
>> And k:1 m:2 doesn't make any sense.
>>
>> I start with 5 hosts and use k:3, m:2. In this case two HDDs can fail or
>> one host can be down for maintenance.

Ok, thanks! Together with Loic's explanation, it's clear now!

----- Original Message -----
From: "Udo Lembke"
To: "aderumier", "ceph-users"
Sent: Sunday, 1 February 2015 19:38:55
Subject: Re: [ceph-users] erasure code : number of chunks for a small cluster ?

Hi Alexandre,
nice to meet you here ;-)

With 3 hosts only you can't survive a full node failure, because for that
you need host >= k + m.
And k:1 m:2 doesn't make any sense.

I start with 5 hosts and use k:3, m:2. In this case two HDDs can fail or
one host can be down for maintenance.

Udo

PS: you also can't change k+m on a pool later...

On 01.02.2015 18:15, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently trying to understand how to set up a pool with erasure coding
> correctly.
>
> https://ceph.com/docs/v0.80/dev/osd_internals/erasure_coding/developer_notes/
>
> My cluster is 3 nodes with 6 OSDs for each node (18 OSDs total).
>
> I want to be able to survive 2 disk failures, but also a full node failure.
>
> What is the best setup for this? Do I need M=2 or M=6?
>
> Also, how do I determine the best chunk number?
>
> For example:
> K = 4, M = 2
> K = 8, M = 2
> K = 16, M = 2
>
> With each config you can lose 2 OSDs, but the more data chunks you have,
> the less space is used by coding chunks, right?
> Does the number of chunks have a performance impact? (read/write?)
>
> Regards,
>
> Alexandre
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
>> If you have K=2, M=1 you will survive one node failure. If your failure
>> domain is the host (i.e. there never is more than one chunk per node for
>> any given object), it will also survive two disk failures within a given
>> node, because only one of them will have a chunk. It won't be able to
>> resist the simultaneous failure of two OSDs that belong to two different
>> nodes: that would be the same as having two simultaneous node failures.

Ah, ok, it's clear now! With 1 chunk per node, I finally understand how it
works :)

Thanks Loic

----- Original Message -----
From: "Loic Dachary"
To: "aderumier", "ceph-users"
Sent: Sunday, 1 February 2015 18:42:51
Subject: Re: [ceph-users] erasure code : number of chunks for a small cluster ?

Hi Alexandre,

On 01/02/2015 18:15, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently trying to understand how to set up a pool with erasure coding
> correctly.
>
> https://ceph.com/docs/v0.80/dev/osd_internals/erasure_coding/developer_notes/
>
> My cluster is 3 nodes with 6 OSDs for each node (18 OSDs total).
>
> I want to be able to survive 2 disk failures, but also a full node failure.

If you have K=2, M=1 you will survive one node failure. If your failure
domain is the host (i.e. there never is more than one chunk per node for any
given object), it will also survive two disk failures within a given node,
because only one of them will have a chunk. It won't be able to resist the
simultaneous failure of two OSDs that belong to two different nodes: that
would be the same as having two simultaneous node failures.

> What is the best setup for this? Do I need M=2 or M=6?
>
> Also, how do I determine the best chunk number?
>
> For example:
> K = 4, M = 2
> K = 8, M = 2
> K = 16, M = 2
>
> With each config you can lose 2 OSDs, but the more data chunks you have,
> the less space is used by coding chunks, right?

Yes.

> Does the number of chunks have a performance impact? (read/write?)

If there are more chunks there is an additional computation overhead, but
I'm not sure what the impact is. I suspect it's not significant, but I never
actually measured it.

Cheers

> Regards,
>
> Alexandre

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
Hi Alexandre,
nice to meet you here ;-)

With 3 hosts only you can't survive a full node failure, because for that
you need host >= k + m.
And k:1 m:2 doesn't make any sense.

I start with 5 hosts and use k:3, m:2. In this case two HDDs can fail or
one host can be down for maintenance.

Udo

PS: you also can't change k+m on a pool later...

On 01.02.2015 18:15, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently trying to understand how to set up a pool with erasure coding
> correctly.
>
> https://ceph.com/docs/v0.80/dev/osd_internals/erasure_coding/developer_notes/
>
> My cluster is 3 nodes with 6 OSDs for each node (18 OSDs total).
>
> I want to be able to survive 2 disk failures, but also a full node failure.
>
> What is the best setup for this? Do I need M=2 or M=6?
>
> Also, how do I determine the best chunk number?
>
> For example:
> K = 4, M = 2
> K = 8, M = 2
> K = 16, M = 2
>
> With each config you can lose 2 OSDs, but the more data chunks you have,
> the less space is used by coding chunks, right?
> Does the number of chunks have a performance impact? (read/write?)
>
> Regards,
>
> Alexandre
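A minimal sketch of the 5-host layout Udo suggests, with the host as the
failure domain so that each object gets one chunk per host (profile and
pool names, and the PG count, are illustrative, not from the thread):

# k=3, m=2: any two disks or any single host can fail and >= k = 3 chunks
# remain, so data stays readable.
ceph osd erasure-code-profile set ec32 k=3 m=2 ruleset-failure-domain=host
ceph osd pool create ecpool32 128 128 erasure ec32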
Re: [ceph-users] erasure code : number of chunks for a small cluster ?
Hi Alexandre,

On 01/02/2015 18:15, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently trying to understand how to set up a pool with erasure coding
> correctly.
>
> https://ceph.com/docs/v0.80/dev/osd_internals/erasure_coding/developer_notes/
>
> My cluster is 3 nodes with 6 OSDs for each node (18 OSDs total).
>
> I want to be able to survive 2 disk failures, but also a full node failure.

If you have K=2, M=1 you will survive one node failure. If your failure
domain is the host (i.e. there never is more than one chunk per node for any
given object), it will also survive two disk failures within a given node,
because only one of them will have a chunk. It won't be able to resist the
simultaneous failure of two OSDs that belong to two different nodes: that
would be the same as having two simultaneous node failures.

> What is the best setup for this? Do I need M=2 or M=6?
>
> Also, how do I determine the best chunk number?
>
> For example:
> K = 4, M = 2
> K = 8, M = 2
> K = 16, M = 2
>
> With each config you can lose 2 OSDs, but the more data chunks you have,
> the less space is used by coding chunks, right?

Yes.

> Does the number of chunks have a performance impact? (read/write?)

If there are more chunks there is an additional computation overhead, but
I'm not sure what the impact is. I suspect it's not significant, but I never
actually measured it.

Cheers

> Regards,
>
> Alexandre

--
Loïc Dachary, Artisan Logiciel Libre
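A minimal sketch of the K=2, M=1 layout Loic describes, with the host as the
failure domain so that each of the 3 nodes holds exactly one chunk per
object (profile and pool names and the PG count are illustrative, not from
the thread):

ceph osd erasure-code-profile set ec21 k=2 m=1 ruleset-failure-domain=host
ceph osd pool create ecpool21 128 128 erasure ec21

On the space question: coding chunks consume m/(k+m) of raw capacity, so
with M=2 the overhead shrinks from 2/6 = 33% at K=4 to 2/10 = 20% at K=8 and
2/18 = 11% at K=16, at the cost of spreading every object over more OSDs.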