Re: [ceph-users] Adding multiple OSD
On 05/12/17 09:20, Ronny Aasen wrote:
> On 05. des. 2017 00:14, Karun Josy wrote:
>> Thank you for detailed explanation!
>>
>> Got another doubt,
>>
>> This is the total space available in the cluster :
>>
>> TOTAL : 23490G
>> Use   : 10170G
>> Avail : 13320G
>>
>> But ecpool shows max avail as just 3 TB. What am I missing?
>>
>> Karun Josy
>
> Without knowing details of your cluster this is just guesswork, but...
>
> Perhaps one of your hosts has less free space than the others. A
> replicated pool can pick the 3 hosts that have plenty of space, but
> erasure coding requires more hosts, so the host with the least space
> becomes the limiting factor.
>
> Check
>
>   ceph osd df tree
>
> to see how it looks.
>
> Kind regards
> Ronny Aasen

From previous emails the erasure code profile is k=5, m=3, with a host
failure domain, so the EC pool does use all eight hosts for every object.
I agree it is very likely that your hosts currently have heterogeneous
capacity, and the maximum data in the EC pool will be limited by the size
of the smallest host.

Also remember that with this profile you have a 3/5 overhead on your data,
so 1 GB of real data stored in the pool translates to 1.6 GB of raw data
on disk. The pool usage and MAX AVAIL stats are given in terms of real
data, but the cluster TOTAL usage/availability is expressed in terms of
raw space (since real usable capacity varies depending on pool settings).

If you check, you will probably find that your lowest-capacity host has
near 6 TB of space free, which would let you store a little over 3.5 TB of
real data in your EC pool.

Rich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
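Rich's 1 GB → 1.6 GB figure is just the (k+m)/k expansion factor. A quick
back-of-envelope check of the numbers quoted above — plain awk arithmetic,
no running cluster required:

```shell
# EC profile from the thread: k=5 data chunks, m=3 coding chunks.
# Raw space consumed = real data * (k+m)/k.
awk -v k=5 -v m=3 'BEGIN { printf "raw per GB of real data: %.1f GB\n", (k+m)/k }'
# raw per GB of real data: 1.6 GB

# The 3546G MAX AVAIL that ceph df reports for ecpool1 therefore
# corresponds to roughly this much raw space:
awk -v k=5 -v m=3 'BEGIN { printf "raw behind 3546G usable: %.0f GB\n", 3546*(k+m)/k }'
# raw behind 3546G usable: 5674 GB
```

That ~5.7 TB of raw space lines up with Rich's "near 6 TB" estimate for
the limiting host.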
Re: [ceph-users] Adding multiple OSD
On 05. des. 2017 00:14, Karun Josy wrote:
> Thank you for detailed explanation!
>
> Got another doubt,
>
> This is the total space available in the cluster :
>
> TOTAL : 23490G
> Use   : 10170G
> Avail : 13320G
>
> But ecpool shows max avail as just 3 TB. What am I missing?
>
> ==
>
> $ ceph df
> GLOBAL:
>     SIZE       AVAIL      RAW USED     %RAW USED
>     23490G     13338G     10151G       43.22
> POOLS:
>     NAME          ID     USED      %USED     MAX AVAIL     OBJECTS
>     ostemplates    1      162G      2.79         1134G       42084
>     imagepool     34      122G      2.11         1891G       34196
>     cvm1          54      8058         0         1891G         950
>     ecpool1       55     4246G     42.77         3546G     1232590
>
> $ ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
>  0   ssd 1.86469      1.0 1909G  625G  1284G 32.76 0.76 201
>  1   ssd 1.86469      1.0 1909G  691G  1217G 36.23 0.84 208
>  2   ssd 0.87320      1.0  894G  587G   306G 65.67 1.52 156
> 11   ssd 0.87320      1.0  894G  631G   262G 70.68 1.63 186
>  3   ssd 0.87320      1.0  894G  605G   288G 67.73 1.56 165
> 14   ssd 0.87320      1.0  894G  635G   258G 71.07 1.64 177
>  4   ssd 0.87320      1.0  894G  419G   474G 46.93 1.08 127
> 15   ssd 0.87320      1.0  894G  373G   521G 41.73 0.96 114
> 16   ssd 0.87320      1.0  894G  492G   401G 55.10 1.27 149
>  5   ssd 0.87320      1.0  894G  288G   605G 32.25 0.74  87
>  6   ssd 0.87320      1.0  894G  342G   551G 38.28 0.88 102
>  7   ssd 0.87320      1.0  894G  300G   593G 33.61 0.78  93
> 22   ssd 0.87320      1.0  894G  343G   550G 38.43 0.89 104
>  8   ssd 0.87320      1.0  894G  267G   626G 29.90 0.69  77
>  9   ssd 0.87320      1.0  894G  376G   518G 42.06 0.97 118
> 10   ssd 0.87320      1.0  894G  322G   571G 36.12 0.83 102
> 19   ssd 0.87320      1.0  894G  339G   554G 37.95 0.88 109
> 12   ssd 0.87320      1.0  894G  360G   534G 40.26 0.93 112
> 13   ssd 0.87320      1.0  894G  404G   489G 45.21 1.04 120
> 20   ssd 0.87320      1.0  894G  342G   551G 38.29 0.88 103
> 23   ssd 0.87320      1.0  894G  148G   745G 16.65 0.38  61
> 17   ssd 0.87320      1.0  894G  423G   470G 47.34 1.09 117
> 18   ssd 0.87320      1.0  894G  403G   490G 45.18 1.04 120
> 21   ssd 0.87320      1.0  894G  444G   450G 49.67 1.15 130
>                   TOTAL 23490G 10170G 13320G 43.30
>
> Karun Josy
>
> On Tue, Dec 5, 2017 at 4:42 AM, Karun Josy wrote:
>> Thank you for detailed explanation!
>>
>> Got another doubt,
>>
>> This is the total space available in the cluster :
>>
>> TOTAL : 23490G
>> Use   : 10170G
>> Avail : 13320G
>>
>> But ecpool shows max avail as just 3 TB.

Without knowing details of your cluster this is just guesswork, but...

Perhaps one of your hosts has less free space than the others. A
replicated pool can pick the 3 hosts that have plenty of space, but
erasure coding requires more hosts, so the host with the least space
becomes the limiting factor.

Check

  ceph osd df tree

to see how it looks.

Kind regards
Ronny Aasen
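Ronny's limiting-host argument can be sketched numerically. This is only a
back-of-envelope model, and the per-host free-space figures below are
invented placeholders, not taken from this cluster — in practice you would
sum the AVAIL column of `ceph osd df tree` per host. With a host failure
domain and k+m = 8 chunks, every object touches all 8 hosts equally, so
the emptiest host fills last and the fullest host caps the pool:

```shell
# Hypothetical GB free per host; substitute real per-host totals
# from `ceph osd df tree`.
avail_per_host="700 1200 1400 1500 1600 1700 1800 2500"

# Smallest host bounds the raw capacity: raw ~= min_avail * n_hosts,
# of which only k/(k+m) is real (usable) data.
min=$(echo "$avail_per_host" | tr ' ' '\n' | sort -n | head -1)
awk -v min="$min" -v n=8 -v k=5 -v m=3 \
    'BEGIN { printf "usable data ~= %.0f GB\n", min * n * k / (k + m) }'
# usable data ~= 3500 GB
```

With these made-up numbers a 700 GB limiting host yields roughly 3.5 TB of
usable EC capacity, even though the other hosts have far more free space —
the same shape as the 3546G MAX AVAIL seen in this thread.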
Re: [ceph-users] Adding multiple OSD
Thank you for detailed explanation!

Got another doubt,

This is the total space available in the cluster :

TOTAL : 23490G
Use   : 10170G
Avail : 13320G

But ecpool shows max avail as just 3 TB. What am I missing?

==

$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    23490G     13338G     10151G       43.22
POOLS:
    NAME          ID     USED      %USED     MAX AVAIL     OBJECTS
    ostemplates    1      162G      2.79         1134G       42084
    imagepool     34      122G      2.11         1891G       34196
    cvm1          54      8058         0         1891G         950
    ecpool1       55     4246G     42.77         3546G     1232590

$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
 0   ssd 1.86469      1.0 1909G  625G  1284G 32.76 0.76 201
 1   ssd 1.86469      1.0 1909G  691G  1217G 36.23 0.84 208
 2   ssd 0.87320      1.0  894G  587G   306G 65.67 1.52 156
11   ssd 0.87320      1.0  894G  631G   262G 70.68 1.63 186
 3   ssd 0.87320      1.0  894G  605G   288G 67.73 1.56 165
14   ssd 0.87320      1.0  894G  635G   258G 71.07 1.64 177
 4   ssd 0.87320      1.0  894G  419G   474G 46.93 1.08 127
15   ssd 0.87320      1.0  894G  373G   521G 41.73 0.96 114
16   ssd 0.87320      1.0  894G  492G   401G 55.10 1.27 149
 5   ssd 0.87320      1.0  894G  288G   605G 32.25 0.74  87
 6   ssd 0.87320      1.0  894G  342G   551G 38.28 0.88 102
 7   ssd 0.87320      1.0  894G  300G   593G 33.61 0.78  93
22   ssd 0.87320      1.0  894G  343G   550G 38.43 0.89 104
 8   ssd 0.87320      1.0  894G  267G   626G 29.90 0.69  77
 9   ssd 0.87320      1.0  894G  376G   518G 42.06 0.97 118
10   ssd 0.87320      1.0  894G  322G   571G 36.12 0.83 102
19   ssd 0.87320      1.0  894G  339G   554G 37.95 0.88 109
12   ssd 0.87320      1.0  894G  360G   534G 40.26 0.93 112
13   ssd 0.87320      1.0  894G  404G   489G 45.21 1.04 120
20   ssd 0.87320      1.0  894G  342G   551G 38.29 0.88 103
23   ssd 0.87320      1.0  894G  148G   745G 16.65 0.38  61
17   ssd 0.87320      1.0  894G  423G   470G 47.34 1.09 117
18   ssd 0.87320      1.0  894G  403G   490G 45.18 1.04 120
21   ssd 0.87320      1.0  894G  444G   450G 49.67 1.15 130
                  TOTAL 23490G 10170G 13320G 43.30

Karun Josy

On Tue, Dec 5, 2017 at 4:42 AM, Karun Josy wrote:
> Thank you for detailed explanation!
>
> Got another doubt,
>
> This is the total space available in the cluster :
>
> TOTAL : 23490G
> Use   : 10170G
> Avail : 13320G
>
> But ecpool shows max avail as just 3 TB.
>
> Karun Josy
>
> On Tue, Dec 5, 2017 at 1:06 AM, David Turner wrote:
>
>> No, I would only add disks to 1 failure domain at a time. So in your
>> situation where you're adding 2 more disks to each node, I would
>> recommend adding the 2 disks into 1 node at a time. Your failure domain
>> is the crush-failure-domain=host, so you can lose a host and only lose
>> 1 copy of the data. If all of your pools are using the k=5 m=3 profile,
>> then I would say it's fine to add the disks into 2 nodes at a time. If
>> you have any replica pools for RGW metadata or anything, then I would
>> stick with 1 host at a time.
>>
>> On Mon, Dec 4, 2017 at 2:29 PM Karun Josy wrote:
>>
>>> Thanks for your reply!
>>>
>>> I am using an erasure-coded profile with k=5, m=3 settings:
>>>
>>> $ ceph osd erasure-code-profile get profile5by3
>>> crush-device-class=
>>> crush-failure-domain=host
>>> crush-root=default
>>> jerasure-per-chunk-alignment=false
>>> k=5
>>> m=3
>>> plugin=jerasure
>>> technique=reed_sol_van
>>> w=8
>>>
>>> The cluster has 8 nodes with 3 disks each. We are planning to add 2
>>> more on each node.
>>>
>>> If I understand correctly, I can add 3 disks at once, assuming 3 disks
>>> can fail at a time as per the EC profile.
>>>
>>> Karun Josy
>>>
>>> On Tue, Dec 5, 2017 at 12:06 AM, David Turner wrote:
>>>
>>>> Depending on how well you burn-in/test your new disks, I like to only
>>>> add 1 failure domain of disks at a time in case you have bad disks
>>>> that you're adding. If you are confident that your disks aren't
>>>> likely to fail during the backfilling, then you can go with more. I
>>>> just added 8 servers (16 OSDs each) to a cluster with 15 servers
>>>> (16 OSDs each) all at the same time, but we spent 2 weeks testing the
>>>> hardware before adding the new nodes to the cluster.
>>>>
>>>> If you add 1 failure domain at a time, then any DoA disks in the new
>>>> nodes will only be able to fail with 1 copy of your data instead of
>>>> across multiple nodes.
>>>>
>>>> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is it recommended to add OSD disks one by one, or can I add a couple
>>>>> of disks at a time ?
>>>>>
>>>>> Current cluster size is about 4 TB.
Re: [ceph-users] Adding multiple OSD
Thank you for detailed explanation!

Got another doubt,

This is the total space available in the cluster :

TOTAL : 23490G
Use   : 10170G
Avail : 13320G

But ecpool shows max avail as just 3 TB.

Karun Josy

On Tue, Dec 5, 2017 at 1:06 AM, David Turner wrote:
> No, I would only add disks to 1 failure domain at a time. So in your
> situation where you're adding 2 more disks to each node, I would
> recommend adding the 2 disks into 1 node at a time. Your failure domain
> is the crush-failure-domain=host, so you can lose a host and only lose 1
> copy of the data. If all of your pools are using the k=5 m=3 profile,
> then I would say it's fine to add the disks into 2 nodes at a time. If
> you have any replica pools for RGW metadata or anything, then I would
> stick with 1 host at a time.
>
> On Mon, Dec 4, 2017 at 2:29 PM Karun Josy wrote:
>
>> Thanks for your reply!
>>
>> I am using an erasure-coded profile with k=5, m=3 settings:
>>
>> $ ceph osd erasure-code-profile get profile5by3
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=5
>> m=3
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>>
>> The cluster has 8 nodes with 3 disks each. We are planning to add 2 more
>> on each node.
>>
>> If I understand correctly, I can add 3 disks at once, assuming 3 disks
>> can fail at a time as per the EC profile.
>>
>> Karun Josy
>>
>> On Tue, Dec 5, 2017 at 12:06 AM, David Turner wrote:
>>
>>> Depending on how well you burn-in/test your new disks, I like to only
>>> add 1 failure domain of disks at a time in case you have bad disks that
>>> you're adding. If you are confident that your disks aren't likely to
>>> fail during the backfilling, then you can go with more. I just added 8
>>> servers (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all
>>> at the same time, but we spent 2 weeks testing the hardware before
>>> adding the new nodes to the cluster.
>>>
>>> If you add 1 failure domain at a time, then any DoA disks in the new
>>> nodes will only be able to fail with 1 copy of your data instead of
>>> across multiple nodes.
>>>
>>> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote:
>>>
>>>> Hi,
>>>>
>>>> Is it recommended to add OSD disks one by one, or can I add a couple
>>>> of disks at a time ?
>>>>
>>>> Current cluster size is about 4 TB.
>>>>
>>>> Karun
Re: [ceph-users] Adding multiple OSD
No, I would only add disks to 1 failure domain at a time. So in your
situation where you're adding 2 more disks to each node, I would recommend
adding the 2 disks into 1 node at a time. Your failure domain is the
crush-failure-domain=host, so you can lose a host and only lose 1 copy of
the data. If all of your pools are using the k=5 m=3 profile, then I would
say it's fine to add the disks into 2 nodes at a time. If you have any
replica pools for RGW metadata or anything, then I would stick with 1 host
at a time.

On Mon, Dec 4, 2017 at 2:29 PM Karun Josy wrote:
> Thanks for your reply!
>
> I am using an erasure-coded profile with k=5, m=3 settings:
>
> $ ceph osd erasure-code-profile get profile5by3
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=5
> m=3
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> The cluster has 8 nodes with 3 disks each. We are planning to add 2 more
> on each node.
>
> If I understand correctly, I can add 3 disks at once, assuming 3 disks
> can fail at a time as per the EC profile.
>
> Karun Josy
>
> On Tue, Dec 5, 2017 at 12:06 AM, David Turner wrote:
>
>> Depending on how well you burn-in/test your new disks, I like to only
>> add 1 failure domain of disks at a time in case you have bad disks that
>> you're adding. If you are confident that your disks aren't likely to
>> fail during the backfilling, then you can go with more. I just added 8
>> servers (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all
>> at the same time, but we spent 2 weeks testing the hardware before
>> adding the new nodes to the cluster.
>>
>> If you add 1 failure domain at a time, then any DoA disks in the new
>> nodes will only be able to fail with 1 copy of your data instead of
>> across multiple nodes.
>>
>> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote:
>>
>>> Hi,
>>>
>>> Is it recommended to add OSD disks one by one, or can I add a couple
>>> of disks at a time ?
>>>
>>> Current cluster size is about 4 TB.
>>>
>>> Karun
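A rough sketch of the one-failure-domain-at-a-time workflow David
describes. The host and device names are hypothetical placeholders, the
health check is deliberately crude, and this needs a live cluster — treat
it as an outline of the procedure, not a turnkey script:

```shell
# Add the new disks on ONE host (one failure domain), then wait for the
# cluster to settle before touching the next host. ceph-node1, /dev/sdd
# and /dev/sde are placeholders for your own host and device names.
for dev in /dev/sdd /dev/sde; do
    ssh ceph-node1 "ceph-volume lvm create --data $dev"
done

# Crude wait-for-recovery loop: poll until health returns to HEALTH_OK.
# (A stricter check would inspect `ceph status` for active+clean PGs.)
until ceph health | grep -q HEALTH_OK; do
    sleep 60
done

# ...then repeat the same steps for ceph-node2, ceph-node3, and so on.
```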
Re: [ceph-users] Adding multiple OSD
Thanks for your reply!

I am using an erasure-coded profile with k=5, m=3 settings:

$ ceph osd erasure-code-profile get profile5by3
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=5
m=3
plugin=jerasure
technique=reed_sol_van
w=8

The cluster has 8 nodes with 3 disks each. We are planning to add 2 more
on each node.

If I understand correctly, I can add 3 disks at once, assuming 3 disks can
fail at a time as per the EC profile.

Karun Josy

On Tue, Dec 5, 2017 at 12:06 AM, David Turner wrote:
> Depending on how well you burn-in/test your new disks, I like to only add
> 1 failure domain of disks at a time in case you have bad disks that
> you're adding. If you are confident that your disks aren't likely to fail
> during the backfilling, then you can go with more. I just added 8 servers
> (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all at the
> same time, but we spent 2 weeks testing the hardware before adding the
> new nodes to the cluster.
>
> If you add 1 failure domain at a time, then any DoA disks in the new
> nodes will only be able to fail with 1 copy of your data instead of
> across multiple nodes.
>
> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote:
>
>> Hi,
>>
>> Is it recommended to add OSD disks one by one, or can I add a couple of
>> disks at a time ?
>>
>> Current cluster size is about 4 TB.
>>
>> Karun
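For reference, a profile like the one printed above is created with
`ceph osd erasure-code-profile set`. These commands need a live cluster,
and the pool name and PG counts below are illustrative placeholders only:

```shell
# Create a k=5/m=3 jerasure profile with a host failure domain,
# mirroring the profile5by3 settings shown in this message:
ceph osd erasure-code-profile set profile5by3 \
    k=5 m=3 \
    plugin=jerasure technique=reed_sol_van \
    crush-failure-domain=host

# Hypothetical pool creation using that profile (the PG count of 128 is
# a placeholder; size pg_num/pgp_num for your own cluster):
ceph osd pool create ecpool1 128 128 erasure profile5by3
```

With crush-failure-domain=host, each of the k+m = 8 chunks lands on a
different host, which is why this profile needs all 8 nodes of the cluster.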
Re: [ceph-users] Adding multiple OSD
Depending on how well you burn-in/test your new disks, I like to only add
1 failure domain of disks at a time in case you have bad disks that you're
adding. If you are confident that your disks aren't likely to fail during
the backfilling, then you can go with more. I just added 8 servers (16
OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same
time, but we spent 2 weeks testing the hardware before adding the new
nodes to the cluster.

If you add 1 failure domain at a time, then any DoA disks in the new nodes
will only be able to fail with 1 copy of your data instead of across
multiple nodes.

On Mon, Dec 4, 2017 at 12:54 PM Karun Josy wrote:
> Hi,
>
> Is it recommended to add OSD disks one by one, or can I add a couple of
> disks at a time ?
>
> Current cluster size is about 4 TB.
>
> Karun
[ceph-users] Adding multiple OSD
Hi,

Is it recommended to add OSD disks one by one, or can I add a couple of
disks at a time ?

Current cluster size is about 4 TB.

Karun