Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread Pardhiv Karri
Hi David,

Here is the output of ceph df. We have a lot of space in our ceph cluster. We
had 2 OSDs (266 and 500) go down earlier due to a hardware issue and never got
a chance to fix them.

GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    1101T  701T   400T      36.37
POOLS:
    NAME                ID  USED    %USED  MAX AVAIL  OBJECTS
    rbd                 0   0       0      159T       0
    .rgw.root           3   780     0      159T       3
    .rgw.control        4   0       0      159T       8
    .rgw.gc             5   0       0      159T       35
    .users.uid          6   6037    0      159T       32
    images              7   16462G  4.38   159T       2660844
    .rgw                10  820     0      159T       4
    volumes             11  106T    28.91  159T       28011837
    compute             12  11327G  3.01   159T       1467722
    backups             15  0       0      159T       0
    .rgw.buckets.index  16  0       0      159T       2
    .rgw.buckets        17  0       0      159T       0


Thanks,
Pardhiv K


Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread David Turner
What's your `ceph osd tree`, `ceph df`, and `ceph osd df` output? It sounds
like you just have a fairly full cluster on which you haven't balanced the
crush weights.
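
For reference, those commands and roughly what each shows (nothing here is
specific to your cluster):

    ceph df        # cluster-wide and per-pool usage
    ceph osd df    # per-OSD crush weight, reweight, and %USE
    ceph osd tree  # crush hierarchy and weights per host/OSD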



Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread Pardhiv Karri
Hi David,

Thanks for the reply. Yeah, we are seeing that 0.0001 usage on pretty much
all OSDs. But this node is different: whether OSD 611 is at full weight or
just 0.2, its utilization keeps increasing.

--Pardhiv K




Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread Pardhiv Karri
Hi Bryan,

Thank you for the reply.

We are on Hammer, ceph version 0.94.9
(fe6d859066244b97b24f09d46552afc2071e6f90)

We tried with full weight on all OSDs on that node, and OSDs like 611 went
above 90% utilization, so we downsized and tested with only 0.2.

Our PGs are at 119 for all 12 pools in the cluster.

We are using the tree algorithm for our clusters.
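
For reference, the bucket algorithm for each bucket can be checked by
decompiling the crush map (the file paths below are arbitrary):

    ceph osd getcrushmap -o /tmp/cm
    crushtool -d /tmp/cm -o /tmp/cm.txt
    grep 'alg ' /tmp/cm.txt   # prints straw, straw2, tree, ... per bucket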

We deleted and re-added the OSDs and still the same issue.

Not sure if upgrading the cluster might fix it, but we are afraid of
upgrading, so we are hoping for a fix that doesn't require an upgrade.

Thanks,
Pardhiv K





Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread David Turner
There was a time in the history of Ceph when a weight of 0.0 was not always
what you thought. People had better experiences with crush weights of
something like 0.0001. This is just a memory tickling in the back of my mind
from things I've read on the ML years back.
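
If you want to try that, it's a one-liner per OSD; the OSD id and weight
below are only illustrative:

    # near-zero weight instead of exactly 0.0
    ceph osd crush reweight osd.611 0.0001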



Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-11 Thread Bryan Stillwell

You didn't say which version of Ceph you're using, but based on the output
of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?) cluster.

I've found that data placement can be a little weird when you have really
low CRUSH weights (0.2) on one of the nodes where the other nodes have large
CRUSH weights (2.0).  I've had it where a single OSD in a node was getting
almost all the data.  It wasn't until I increased the weights to be more in
line with the rest of the cluster that it evened back out.

I believe this can also be caused by not having enough PGs in your cluster.
Or the PGs you do have aren't distributed correctly based on the data usage
in each pool.  Have you used https://ceph.com/pgcalc/ to determine the
correct number of PGs you should have per pool?
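
As a rough sanity check (all numbers below are made up, not taken from your
cluster), the usual starting point is total PGs ~= (OSD count * 100) /
replica count, rounded to a power of two, then split across pools by
expected data share:

    #   (660 * 100) / 3 ~= 22000  -> round to 16384 total PGs
    #   a pool expected to hold ~60% of the data: 16384 * 0.6 ~= 9830 -> 8192
    ceph osd pool set <pool-name> pg_num 8192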

Since you are likely running a pre-Jewel cluster it could also be that you
haven't switched your tunables to use the straw2 data placement algorithm:

http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4

That should help as well.  Once that's enabled you can convert your existing
buckets to straw2 as well.  Just be careful you don't have any old clients
connecting to your cluster that don't support that feature yet.
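
The conversion itself is a decompile/edit/recompile of the crush map; a
sketch only (it will trigger data movement, so test somewhere safe first):

    # enable the hammer tunables profile (adds straw2 / CRUSH_V4 support)
    ceph osd crush tunables hammer
    # dump and decompile the current crush map
    ceph osd getcrushmap -o /tmp/cm
    crushtool -d /tmp/cm -o /tmp/cm.txt
    # switch every 'alg straw' bucket to 'alg straw2', recompile, inject
    sed -i 's/alg straw$/alg straw2/' /tmp/cm.txt
    crushtool -c /tmp/cm.txt -o /tmp/cm.new
    ceph osd setcrushmap -i /tmp/cm.new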

Bryan



[ceph-users] Ceph osd crush weight to utilization incorrect on one node

2018-05-10 Thread Pardhiv Karri
Hi,

We have a large 1PB ceph cluster. We recently added 6 nodes, each with 16
2TB disks, to the cluster. Five of the nodes rebalanced well without any
issues, but the OSDs on the sixth/last node started acting weird: as I
increase the weight of one OSD, its utilization doesn't change, but a
different OSD on the same node starts filling up. The rebalance completes
fine, but the utilization is not right.


I increased the weight of OSD 610 to 0.2 from 0.0, but the utilization of
OSD 611 started increasing even though its weight is still 0.0. If I increase
the weight of OSD 611 to 0.2, its overall utilization grows to what it would
be if its weight were 0.4. And if I increase the weight of 610 and 615 to
their full weight, utilization on OSD 610 stays at 1% while OSD 611 inches
toward 100%, at which point I had to stop and drop the OSDs' crush weights
back to 0.0 to avoid any impact on the ceph cluster. It's not just one OSD
but different OSDs on that one node. The only correlation I've found is that
the journal partitions for OSDs 610 and 611 are on the same SSD drive; all
the OSDs themselves are SAS drives. Any help on how to debug or resolve this
would be appreciated.
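
For reference, the weight changes above correspond to commands like these
(the weights shown are just the values from my tests):

    # bring the new OSD in at a small crush weight
    ceph osd crush reweight osd.610 0.2
    # then compare crush weight vs. actual utilization per OSD
    ceph osd df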


Attached is a screenshot showing that OSDs 610, 612 and 620 had their crush
weight increased to 0.2, while OSDs 611, 615 and 623 show increased
utilization despite having a crush weight of 0.


Thanks,
Pardhiv K