Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node
Hi David,

Here is the output of ceph df. We have a lot of free space in our ceph cluster. We had 2 OSDs (266, 500) go down earlier due to a hardware issue and never got a chance to fix them.

GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    1101T     701T      400T         36.37
POOLS:
    NAME                  ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd                   0           0         0     159T                 0
    .rgw.root             3         780         0     159T                 3
    .rgw.control          4           0         0     159T                 8
    .rgw.gc               5           0         0     159T                35
    .users.uid            6        6037         0     159T                32
    images                7      16462G      4.38     159T           2660844
    .rgw                  10        820         0     159T                 4
    volumes               11       106T     28.91     159T          28011837
    compute               12     11327G      3.01     159T           1467722
    backups               15          0         0     159T                 0
    .rgw.buckets.index    16          0         0     159T                 2
    .rgw.buckets          17          0         0     159T                 0

Thanks,
Pardhiv K

On Fri, May 11, 2018 at 7:14 PM, David Turner wrote:
> [...]
Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node
What's your `ceph osd tree`, `ceph df`, `ceph osd df`? It sounds like you just have a fairly full cluster on which you haven't balanced the crush weights.

On Fri, May 11, 2018, 10:06 PM Pardhiv Karri wrote:
> [...]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node
Hi David,

Thanks for the reply. Yes, we are seeing that 0.0001 usage on pretty much all OSDs. But this node is different: whether the weight on OSD 611 is its full value or just 0.2, OSD 611's utilization starts increasing.

--Pardhiv K

On Fri, May 11, 2018 at 10:50 AM, David Turner wrote:
> [...]

--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node
Hi Bryan,

Thank you for the reply. We are on Hammer, ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90).

We tried the full weight on all OSDs on that node, and OSDs like 611 went above 90%, so we downsized and tested with only 0.2. Our PGs are at 119 for all 12 pools in the cluster. We are using the tree algorithm for our clusters. We deleted and re-added the OSDs and still see the same issue. Not sure if upgrading the cluster might fix it, but we are afraid of upgrading and are hoping for a fix that doesn't require it.

Thanks,
Pardhiv K

On Fri, May 11, 2018 at 10:26 AM, Bryan Stillwell wrote:
> [...]
Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node
There was a time in the history of Ceph when a weight of 0.0 was not always what you thought. People had better experiences with crush weights of something like 0.0001. This is just a memory tickling the back of my mind from things I've read on the ML years back.

On Fri, May 11, 2018 at 1:26 PM Bryan Stillwell wrote:
> [...]
Re: [ceph-users] Ceph osd crush weight to utilization incorrect on one node
> [...]

You didn't say which version of Ceph you are using, but based on the output of 'ceph osd df' I'm guessing it's a pre-Jewel (maybe Hammer?) cluster.

I've found that data placement can be a little weird when you have really low CRUSH weights (0.2) on one of the nodes while the other nodes have large CRUSH weights (2.0). I've had a single OSD in a node get almost all the data. It wasn't until I increased the weights to be more in line with the rest of the cluster that it evened back out.

I believe this can also be caused by not having enough PGs in your cluster, or by the PGs you do have not being distributed correctly based on the data usage in each pool. Have you used https://ceph.com/pgcalc/ to determine the correct number of PGs you should have per pool?
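For reference, the pgcalc heuristic Bryan links to works out to roughly (OSDs × target PGs per OSD × the pool's share of data) / replica count, rounded to a nearby power of two. A minimal sketch of that rule of thumb, with illustrative numbers rather than figures from this cluster:

```shell
# pgcalc-style rule of thumb: total PGs for a pool ~=
#   (num OSDs * target PGs per OSD * pool's data share) / replicas,
# rounded to the nearest power of two.
# Example: ~550 OSDs, a pool expected to hold ~30% of the data, 3x replication.
awk 'BEGIN {
  osds = 550; target = 100; share = 0.30; replicas = 3
  raw = osds * target * share / replicas
  p = int(log(raw) / log(2) + 0.5)   # nearest power-of-two exponent
  print 2 ^ p                        # -> 4096
}'
```

With a dozen pools of very different sizes, the per-pool data share matters a lot, which is why pgcalc asks for each pool's expected percentage of the cluster's data.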
Since you are likely running a pre-Jewel cluster, it could also be that you haven't switched your tunables to use the straw2 data placement algorithm:

http://docs.ceph.com/docs/master/rados/operations/crush-map/#hammer-crush-v4

That should help as well. Once that's enabled you can convert your existing buckets to straw2. Just be careful you don't have any old clients connecting to your cluster that don't support that feature yet.

Bryan
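One way to do the straw → straw2 bucket conversion Bryan describes is by editing a decompiled CRUSH map. A hedged sketch (the filenames are placeholders, and injecting the new map triggers data movement, so try it on a test cluster first):

```shell
# Check which tunables profile the cluster is currently using
ceph osd crush show-tunables

# Fetch and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Change each bucket's algorithm from straw to straw2
sed -i 's/alg straw$/alg straw2/' crushmap.txt

# Recompile and inject the edited map
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```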
[ceph-users] Ceph osd crush weight to utilization incorrect on one node
Hi,

We have a large 1PB ceph cluster. We recently added 6 nodes, each with 16 2TB disks, to the cluster. Five of the nodes rebalanced well without any issues, but the OSDs on the sixth/last node started acting weird: as I increase the weight of one OSD, its utilization doesn't change, but a different OSD on the same node sees its utilization increase. Rebalance completes fine, but utilization is not right.

I increased the weight of OSD 610 to 0.2 from 0.0, but the utilization of OSD 611 started increasing even though its weight is 0.0. If I increase the weight of OSD 611 to 0.2, its overall utilization grows to what it would be at a weight of 0.4. So when I increased the weight of 610 and 615 to their full weight, utilization on OSD 610 was 1% while OSD 611 inched towards 100%, at which point I had to stop and downsize the OSDs' crush weight back to 0.0 to avoid any implications for the ceph cluster. It's not just one OSD but different OSDs on that one node. The only correlation I found is that the journal partitions for OSDs 610 and 611 are on the same SSD drive; all the OSDs are SAS drives. Any help on how to debug or resolve this would be appreciated.

Attached a screenshot, which shows that OSDs 610, 612 and 620 had their crush weight increased to 0.2, but OSDs 611, 615 and 623 saw their utilization increase while having 0 crush weight.

Thanks,
Pardhiv K
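When debugging a weight/utilization mismatch like this, a few read-only commands help correlate CRUSH weights with actual placement. A sketch (the OSD ids 610/611 come from the report above, and the grep patterns are crude illustrative filters):

```shell
# CRUSH weight, size and utilization for the suspect OSDs side by side
ceph osd df | egrep '^ *(610|611) '

# Where the OSDs sit in the CRUSH hierarchy, with their weights
ceph osd tree | egrep 'osd\.(610|611)'

# Crude check of which PGs list 610 or 611 in their up/acting sets
ceph pg dump pgs_brief | egrep -w '610|611'
```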