Re: [ceph-users] Adding multiple OSD

2017-12-05 Thread Richard Hesketh
On 05/12/17 09:20, Ronny Aasen wrote:
> On 05. des. 2017 00:14, Karun Josy wrote:
>> Thank you for the detailed explanation!
>>
>> I have one more doubt:
>>
>> This is the total space available in the cluster:
>>
>> TOTAL : 23490G
>> Used  : 10170G
>> Avail : 13320G
>>
>>
>> But ecpool shows max avail as just 3 TB. What am I missing?
>>
>> Karun Josy
> 
> without knowing the details of your cluster, this is just guesswork,
> but...
>
> perhaps one of your hosts has less free space than the others. Replicated
> pools can pick 3 hosts that have plenty of space, but erasure coding
> requires more hosts, so the host with the least space is the limiting factor.
> 
> check
> ceph osd df tree
> 
> to see how it looks.
> 
> 
> kind regards
> Ronny Aasen

From previous emails the erasure code profile is k=5, m=3 with a host failure 
domain, so the EC pool does use all eight hosts for every object. I agree it's 
very likely that your hosts currently have heterogeneous free capacity, and that 
the maximum data the EC pool can hold is limited by the host with the least 
available space.

Also remember that with this profile you have a 3/5 (60%) overhead on your data, 
so 1GB of real data stored in the pool translates to 1.6GB of raw data on disk. 
The pool USED and MAX AVAIL stats are given in terms of real data, but the 
cluster TOTAL usage/availability is expressed in raw space (since how much real 
data fits will vary depending on pool settings). If you check ceph osd df tree, 
you will probably find that, once the 1.6x overhead and the free space on your 
most constrained host are accounted for, the pool can only take a little over 
3.5TB more real data, which matches the MAX AVAIL you are seeing.
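
For anyone who wants to sanity-check that arithmetic, here is a rough
back-of-the-envelope calculation in plain shell/awk. The 3546G figure is the
MAX AVAIL reported for ecpool1 in the ceph df output quoted later in this
thread, and the per-host split assumes one chunk per host, which is what
k=5, m=3 with a host failure domain gives you on exactly eight hosts:

$ awk 'BEGIN {
    k = 5; m = 3                 # EC profile: 5 data + 3 coding chunks
    overhead = (k + m) / k       # raw bytes written per byte of real data
    max_avail = 3546             # MAX AVAIL reported for ecpool1, in GB
    raw = max_avail * overhead   # raw space that much real data would occupy
    per_host = raw / (k + m)     # one chunk per host -> spread over 8 hosts
    printf "overhead %.1fx, raw %.0fG, per host %.0fG\n", overhead, raw, per_host
}'
overhead 1.6x, raw 5674G, per host 709G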

Rich





Re: [ceph-users] Adding multiple OSD

2017-12-05 Thread Ronny Aasen

On 05. des. 2017 00:14, Karun Josy wrote:

Thank you for the detailed explanation!

I have one more doubt:

This is the total space available in the cluster:

TOTAL : 23490G
Used  : 10170G
Avail : 13320G


But ecpool shows max avail as just 3 TB. What am I missing?

==


$ ceph df
GLOBAL:
     SIZE       AVAIL      RAW USED     %RAW USED
     23490G     13338G       10151G         43.22
POOLS:
     NAME            ID     USED      %USED     MAX AVAIL     OBJECTS
     ostemplates     1       162G      2.79         1134G       42084
     imagepool       34      122G      2.11         1891G       34196
     cvm1            54      8058         0         1891G         950
     ecpool1         55     4246G     42.77         3546G     1232590


$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
  0   ssd 1.86469  1.0  1909G   625G  1284G 32.76 0.76 201
  1   ssd 1.86469  1.0  1909G   691G  1217G 36.23 0.84 208
  2   ssd 0.87320  1.0   894G   587G   306G 65.67 1.52 156
11   ssd 0.87320  1.0   894G   631G   262G 70.68 1.63 186
  3   ssd 0.87320  1.0   894G   605G   288G 67.73 1.56 165
14   ssd 0.87320  1.0   894G   635G   258G 71.07 1.64 177
  4   ssd 0.87320  1.0   894G   419G   474G 46.93 1.08 127
15   ssd 0.87320  1.0   894G   373G   521G 41.73 0.96 114
16   ssd 0.87320  1.0   894G   492G   401G 55.10 1.27 149
  5   ssd 0.87320  1.0   894G   288G   605G 32.25 0.74  87
  6   ssd 0.87320  1.0   894G   342G   551G 38.28 0.88 102
  7   ssd 0.87320  1.0   894G   300G   593G 33.61 0.78  93
22   ssd 0.87320  1.0   894G   343G   550G 38.43 0.89 104
  8   ssd 0.87320  1.0   894G   267G   626G 29.90 0.69  77
  9   ssd 0.87320  1.0   894G   376G   518G 42.06 0.97 118
10   ssd 0.87320  1.0   894G   322G   571G 36.12 0.83 102
19   ssd 0.87320  1.0   894G   339G   554G 37.95 0.88 109
12   ssd 0.87320  1.0   894G   360G   534G 40.26 0.93 112
13   ssd 0.87320  1.0   894G   404G   489G 45.21 1.04 120
20   ssd 0.87320  1.0   894G   342G   551G 38.29 0.88 103
23   ssd 0.87320  1.0   894G   148G   745G 16.65 0.38  61
17   ssd 0.87320  1.0   894G   423G   470G 47.34 1.09 117
18   ssd 0.87320  1.0   894G   403G   490G 45.18 1.04 120
21   ssd 0.87320  1.0   894G   444G   450G 49.67 1.15 130
                     TOTAL 23490G 10170G 13320G 43.30



Karun Josy

On Tue, Dec 5, 2017 at 4:42 AM, Karun Josy wrote:


Thank you for the detailed explanation!

I have one more doubt:

This is the total space available in the cluster:

TOTAL : 23490G
Used  : 10170G
Avail : 13320G


But ecpool shows max avail as just 3 TB.




without knowing the details of your cluster, this is just guesswork, 
but...


perhaps one of your hosts has less free space than the others. Replicated 
pools can pick 3 hosts that have plenty of space, but erasure coding 
requires more hosts, so the host with the least space is the 
limiting factor.


check
ceph osd df tree

to see how it looks.
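
To illustrate why the tightest host matters here: with k=5, m=3 and a host
failure domain on eight hosts, every host stores one chunk (1/5 of the object
size) per object, so the real data the pool can hold is capped at roughly k
times the free space of the most constrained host, before Ceph's full ratios
shave a bit more off. The per-host totals below are made-up illustrative
numbers, not taken from this cluster; you would substitute the host AVAIL
figures from ceph osd df tree:

$ printf '%s\n' 'host1 2800' 'host2 800' 'host3 1400' 'host4 1700' |
  awk '{ if (min == "" || $2 < min) { min = $2; host = $1 } }
       END { printf "tightest host: %s (%dG free) -> at most ~%dG of real EC data\n",
                    host, min, 5 * min }'
tightest host: host2 (800G free) -> at most ~4000G of real EC data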


kind regards
Ronny Aasen



Re: [ceph-users] Adding multiple OSD

2017-12-04 Thread Karun Josy
Thank you for the detailed explanation!

I have one more doubt:

This is the total space available in the cluster:

TOTAL : 23490G
Used  : 10170G
Avail : 13320G


But ecpool shows max avail as just 3 TB. What am I missing?

==


$ ceph df
GLOBAL:
     SIZE       AVAIL      RAW USED     %RAW USED
     23490G     13338G       10151G         43.22
POOLS:
     NAME            ID     USED      %USED     MAX AVAIL     OBJECTS
     ostemplates     1       162G      2.79         1134G       42084
     imagepool       34      122G      2.11         1891G       34196
     cvm1            54      8058         0         1891G         950
     ecpool1         55     4246G     42.77         3546G     1232590


$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 0   ssd 1.86469  1.0  1909G   625G  1284G 32.76 0.76 201
 1   ssd 1.86469  1.0  1909G   691G  1217G 36.23 0.84 208
 2   ssd 0.87320  1.0   894G   587G   306G 65.67 1.52 156
11   ssd 0.87320  1.0   894G   631G   262G 70.68 1.63 186
 3   ssd 0.87320  1.0   894G   605G   288G 67.73 1.56 165
14   ssd 0.87320  1.0   894G   635G   258G 71.07 1.64 177
 4   ssd 0.87320  1.0   894G   419G   474G 46.93 1.08 127
15   ssd 0.87320  1.0   894G   373G   521G 41.73 0.96 114
16   ssd 0.87320  1.0   894G   492G   401G 55.10 1.27 149
 5   ssd 0.87320  1.0   894G   288G   605G 32.25 0.74  87
 6   ssd 0.87320  1.0   894G   342G   551G 38.28 0.88 102
 7   ssd 0.87320  1.0   894G   300G   593G 33.61 0.78  93
22   ssd 0.87320  1.0   894G   343G   550G 38.43 0.89 104
 8   ssd 0.87320  1.0   894G   267G   626G 29.90 0.69  77
 9   ssd 0.87320  1.0   894G   376G   518G 42.06 0.97 118
10   ssd 0.87320  1.0   894G   322G   571G 36.12 0.83 102
19   ssd 0.87320  1.0   894G   339G   554G 37.95 0.88 109
12   ssd 0.87320  1.0   894G   360G   534G 40.26 0.93 112
13   ssd 0.87320  1.0   894G   404G   489G 45.21 1.04 120
20   ssd 0.87320  1.0   894G   342G   551G 38.29 0.88 103
23   ssd 0.87320  1.0   894G   148G   745G 16.65 0.38  61
17   ssd 0.87320  1.0   894G   423G   470G 47.34 1.09 117
18   ssd 0.87320  1.0   894G   403G   490G 45.18 1.04 120
21   ssd 0.87320  1.0   894G   444G   450G 49.67 1.15 130
                     TOTAL 23490G 10170G 13320G 43.30



Karun Josy

On Tue, Dec 5, 2017 at 4:42 AM, Karun Josy  wrote:

> Thank you for the detailed explanation!
>
> I have one more doubt:
>
> This is the total space available in the cluster:
>
> TOTAL : 23490G
> Used  : 10170G
> Avail : 13320G
>
>
> But ecpool shows max avail as just 3 TB.
>
>
>
> Karun Josy
>
> On Tue, Dec 5, 2017 at 1:06 AM, David Turner 
> wrote:
>
>> No, I would only add disks to 1 failure domain at a time.  So in your
>> situation where you're adding 2 more disks to each node, I would recommend
>> adding the 2 disks into 1 node at a time.  Your failure domain is the
>> crush-failure-domain=host.  So you can lose a host and only lose 1 copy of
>> the data.  If all of your pools are using the k=5 m=3 profile, then I would
>> say it's fine to add the disks into 2 nodes at a time.  If you have any
>> replica pools for RGW metadata or anything, then I would stick with the 1
>> host at a time.
>>
>> On Mon, Dec 4, 2017 at 2:29 PM Karun Josy  wrote:
>>
>>> Thanks for your reply!
>>>
>>> I am using an erasure-coded profile with k=5, m=3 settings
>>>
>>> $ ceph osd erasure-code-profile get profile5by3
>>> crush-device-class=
>>> crush-failure-domain=host
>>> crush-root=default
>>> jerasure-per-chunk-alignment=false
>>> k=5
>>> m=3
>>> plugin=jerasure
>>> technique=reed_sol_van
>>> w=8
>>>
>>>
>>> Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on
>>> each node.
>>>
>>> If I understand correctly, I can add 3 disks at once, right, since up to
>>> 3 disks can fail at a time as per the EC profile?
>>>
>>> Karun Josy
>>>
>>> On Tue, Dec 5, 2017 at 12:06 AM, David Turner 
>>> wrote:
>>>
 Depending on how well you burn-in/test your new disks, I like to only
 add 1 failure domain of disks at a time in case you have bad disks that
 you're adding.  If you are confident that your disks aren't likely to fail
 during the backfilling, then you can go with more.  I just added 8 servers
 (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same
 time, but we spent 2 weeks testing the hardware before adding the new nodes
 to the cluster.

 If you add 1 failure domain at a time, then any DoA disks in the new
 nodes will only be able to fail with 1 copy of your data instead of across
 multiple nodes.

 On Mon, Dec 4, 2017 at 12:54 PM Karun Josy 
 wrote:

> Hi,
>
> Is it recommended to add OSD disks one by one, or can I add a couple of
> disks at a time?
>
> Current cluster size is about 4 TB.
>
>
>

Re: [ceph-users] Adding multiple OSD

2017-12-04 Thread Karun Josy
Thank you for the detailed explanation!

I have one more doubt:

This is the total space available in the cluster:

TOTAL : 23490G
Used  : 10170G
Avail : 13320G


But ecpool shows max avail as just 3 TB.



Karun Josy

On Tue, Dec 5, 2017 at 1:06 AM, David Turner  wrote:

> No, I would only add disks to 1 failure domain at a time.  So in your
> situation where you're adding 2 more disks to each node, I would recommend
> adding the 2 disks into 1 node at a time.  Your failure domain is the
> crush-failure-domain=host.  So you can lose a host and only lose 1 copy of
> the data.  If all of your pools are using the k=5 m=3 profile, then I would
> say it's fine to add the disks into 2 nodes at a time.  If you have any
> replica pools for RGW metadata or anything, then I would stick with the 1
> host at a time.
>
> On Mon, Dec 4, 2017 at 2:29 PM Karun Josy  wrote:
>
>> Thanks for your reply!
>>
>> I am using an erasure-coded profile with k=5, m=3 settings
>>
>> $ ceph osd erasure-code-profile get profile5by3
>> crush-device-class=
>> crush-failure-domain=host
>> crush-root=default
>> jerasure-per-chunk-alignment=false
>> k=5
>> m=3
>> plugin=jerasure
>> technique=reed_sol_van
>> w=8
>>
>>
>> Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on
>> each node.
>>
>> If I understand correctly, I can add 3 disks at once, right, since up to
>> 3 disks can fail at a time as per the EC profile?
>>
>> Karun Josy
>>
>> On Tue, Dec 5, 2017 at 12:06 AM, David Turner 
>> wrote:
>>
>>> Depending on how well you burn-in/test your new disks, I like to only
>>> add 1 failure domain of disks at a time in case you have bad disks that
>>> you're adding.  If you are confident that your disks aren't likely to fail
>>> during the backfilling, then you can go with more.  I just added 8 servers
>>> (16 OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same
>>> time, but we spent 2 weeks testing the hardware before adding the new nodes
>>> to the cluster.
>>>
>>> If you add 1 failure domain at a time, then any DoA disks in the new
>>> nodes will only be able to fail with 1 copy of your data instead of across
>>> multiple nodes.
>>>
>>> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy  wrote:
>>>
 Hi,

 Is it recommended to add OSD disks one by one, or can I add a couple of
 disks at a time?

 Current cluster size is about 4 TB.



 Karun

>>>
>>


Re: [ceph-users] Adding multiple OSD

2017-12-04 Thread David Turner
No, I would only add disks to 1 failure domain at a time.  So in your
situation where you're adding 2 more disks to each node, I would recommend
adding the 2 disks into 1 node at a time.  Your failure domain is the
crush-failure-domain=host.  So you can lose a host and only lose 1 copy of
the data.  If all of your pools are using the k=5 m=3 profile, then I would
say it's fine to add the disks into 2 nodes at a time.  If you have any
replica pools for RGW metadata or anything, then I would stick with the 1
host at a time.
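
To make the "one failure domain at a time" approach concrete, the per-host
expansion on a Luminous-era cluster like this one might look roughly like the
sketch below. The device paths are placeholders and the OSD-creation step
depends on your deployment tooling (ceph-deploy, ceph-volume, ...), so treat
this as an outline rather than a recipe:

# optionally keep backfill gentle while data moves (revert afterwards):
$ ceph tell osd.* injectargs '--osd-max-backfills 1'

# on the single host being expanded, create the new OSDs:
$ ceph-volume lvm create --data /dev/sdX
$ ceph-volume lvm create --data /dev/sdY

# wait for backfill to drain and HEALTH_OK before touching the next host:
$ ceph -s
$ ceph osd df tree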

On Mon, Dec 4, 2017 at 2:29 PM Karun Josy  wrote:

> Thanks for your reply!
>
> I am using an erasure-coded profile with k=5, m=3 settings
>
> $ ceph osd erasure-code-profile get profile5by3
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=5
> m=3
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
>
> Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on
> each node.
>
> If I understand correctly, I can add 3 disks at once, right, since up to
> 3 disks can fail at a time as per the EC profile?
>
> Karun Josy
>
> On Tue, Dec 5, 2017 at 12:06 AM, David Turner 
> wrote:
>
>> Depending on how well you burn-in/test your new disks, I like to only add
>> 1 failure domain of disks at a time in case you have bad disks that you're
>> adding.  If you are confident that your disks aren't likely to fail during
>> the backfilling, then you can go with more.  I just added 8 servers (16
>> OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same
>> time, but we spent 2 weeks testing the hardware before adding the new nodes
>> to the cluster.
>>
>> If you add 1 failure domain at a time, then any DoA disks in the new
>> nodes will only be able to fail with 1 copy of your data instead of across
>> multiple nodes.
>>
>> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy  wrote:
>>
>>> Hi,
>>>
>>> Is it recommended to add OSD disks one by one, or can I add a couple of
>>> disks at a time?
>>>
>>> Current cluster size is about 4 TB.
>>>
>>>
>>>
>>> Karun
>>>
>>
>


Re: [ceph-users] Adding multiple OSD

2017-12-04 Thread Karun Josy
Thanks for your reply!

I am using an erasure-coded profile with k=5, m=3 settings

$ ceph osd erasure-code-profile get profile5by3
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=5
m=3
plugin=jerasure
technique=reed_sol_van
w=8
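
For reference, a profile like this (and a pool using it) would typically have
been created with something along the lines of the commands below; the PG
counts are placeholders, and depending on the use case the pool may also need
an application tag or overwrite settings on top of this:

$ ceph osd erasure-code-profile set profile5by3 \
      k=5 m=3 crush-failure-domain=host plugin=jerasure technique=reed_sol_van
$ ceph osd pool create ecpool1 128 128 erasure profile5by3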


Cluster has 8 nodes, with 3 disks each. We are planning to add 2 more on
each node.

If I understand correctly, I can add 3 disks at once, right, since up to
3 disks can fail at a time as per the EC profile?

Karun Josy

On Tue, Dec 5, 2017 at 12:06 AM, David Turner  wrote:

> Depending on how well you burn-in/test your new disks, I like to only add
> 1 failure domain of disks at a time in case you have bad disks that you're
> adding.  If you are confident that your disks aren't likely to fail during
> the backfilling, then you can go with more.  I just added 8 servers (16
> OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same
> time, but we spent 2 weeks testing the hardware before adding the new nodes
> to the cluster.
>
> If you add 1 failure domain at a time, then any DoA disks in the new nodes
> will only be able to fail with 1 copy of your data instead of across
> multiple nodes.
>
> On Mon, Dec 4, 2017 at 12:54 PM Karun Josy  wrote:
>
>> Hi,
>>
>> Is it recommended to add OSD disks one by one, or can I add a couple of
>> disks at a time?
>>
>> Current cluster size is about 4 TB.
>>
>>
>>
>> Karun
>>
>


Re: [ceph-users] Adding multiple OSD

2017-12-04 Thread David Turner
Depending on how well you burn-in/test your new disks, I like to only add 1
failure domain of disks at a time in case you have bad disks that you're
adding.  If you are confident that your disks aren't likely to fail during
the backfilling, then you can go with more.  I just added 8 servers (16
OSDs each) to a cluster with 15 servers (16 OSDs each) all at the same
time, but we spent 2 weeks testing the hardware before adding the new nodes
to the cluster.

If you add 1 failure domain at a time, then any DoA disks in the new nodes
will only be able to fail with 1 copy of your data instead of across
multiple nodes.
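
On the burn-in point: how much testing is enough is site-specific, but a
minimal sketch for a brand-new, empty disk might be the commands below. The
device path is a placeholder, and badblocks -w is destructive, so only run it
on drives that hold no data yet:

$ badblocks -wsv /dev/sdX     # destructive write/read surface test
$ smartctl -t long /dev/sdX   # start the drive's long SMART self-test
$ smartctl -a /dev/sdX        # afterwards, check reallocated/pending sectors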

On Mon, Dec 4, 2017 at 12:54 PM Karun Josy  wrote:

> Hi,
>
> Is it recommended to add OSD disks one by one, or can I add a couple of
> disks at a time?
>
> Current cluster size is about 4 TB.
>
>
>
> Karun
>


[ceph-users] Adding multiple OSD

2017-12-04 Thread Karun Josy
Hi,

Is it recommended to add OSD disks one by one, or can I add a couple of disks
at a time?

Current cluster size is about 4 TB.



Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com