Hi John,

Thanks for the explanation. I have a couple more questions on this subject, though.
1. "pretend_min_hours_passed" sounds like something that I could use. I'm okay if there is a chance of interruption in services to the user at this time, as long as it does not cause any data-loss or data-corruption. 2. It would have been really useful if the rebalancing operations could be logged by swift somewhere and automatically run later (after min_part_hours). Regards, Shyam On Thu, May 1, 2014 at 11:15 PM, John Dickinson <m...@not.mn> wrote: > > On May 1, 2014, at 10:32 AM, Shyam Prasad N <nspmangal...@gmail.com> > wrote: > > > Hi Chuck, > > Thanks for the reply. > > > > The reason for such weight distribution seems to do with the ring > rebalance command. I've scripted the disk addition (and rebalance) process > to the ring using a wrapper command. When I trigger the rebalance after > each disk addition, only the first rebalance seems to take effect. > > > > Is there any other way to adjust the weights other than rebalance? Or is > there a way to force a rebalance, even if the frequency of the rebalance > (as a part of disk addition) is under an hour (the min_part_hours value in > ring creation). > > Rebalancing only moves one replica at a time to ensure that your data > remains available, even if you have a hardware failure while you are adding > capacity. This is why it may take multiple rebalances to get everything > evenly balanced. > > The min_part_hours setting (perhaps poorly named) should match how long a > replication pass takes in your cluster. You can understand this because of > what I said above. By ensuring that replication has completed before > putting another partition "in flight", Swift can ensure that you keep your > data highly available. > > For completeness to answer your question, there is an (intentionally) > undocumented option in swift-ring-builder called > "pretend_min_part_hours_passed", but it should ALMOST NEVER be used in a > production cluster, unless you really, really know what you are doing. > Using that option will very likely cause service interruptions to your > users. The better option is to correctly set the min_part_hours value to > match your replication pass time (with set_min_part_hours), and then wait > for swift to move things around. > > Here's some more info on how and why to add capacity to a running Swift > cluster: https://swiftstack.com/blog/2012/04/09/swift-capacity-management/ > > --John > > > > > > > On May 1, 2014 9:00 PM, "Chuck Thier" <cth...@gmail.com> wrote: > > Hi Shyam, > > > > If I am reading your ring output correctly, it looks like only the > devices in node .202 have a weight set, and thus why all of your objects > are going to that one node. You can update the weight of the other > devices, and rebalance, and things should get distributed correctly. > > > > -- > > Chuck > > > > > > On Thu, May 1, 2014 at 5:28 AM, Shyam Prasad N <nspmangal...@gmail.com> > wrote: > > Hi, > > > > I created a swift cluster and configured the rings like this... 
> > > > swift-ring-builder object.builder create 10 3 1 > > > > ubuntu-202:/etc/swift$ swift-ring-builder object.builder > > object.builder, build version 12 > > 1024 partitions, 3.000000 replicas, 1 regions, 4 zones, 12 devices, > 300.00 balance > > The minimum number of hours before a partition can be reassigned is 1 > > Devices: id region zone ip address port replication ip > replication port name weight partitions balance meta > > 0 1 1 10.3.0.202 6010 10.3.0.202 > 6010 xvdb 1.00 1024 300.00 > > 1 1 1 10.3.0.202 6020 10.3.0.202 > 6020 xvdc 1.00 1024 300.00 > > 2 1 1 10.3.0.202 6030 10.3.0.202 > 6030 xvde 1.00 1024 300.00 > > 3 1 2 10.3.0.212 6010 10.3.0.212 > 6010 xvdb 1.00 0 -100.00 > > 4 1 2 10.3.0.212 6020 10.3.0.212 > 6020 xvdc 1.00 0 -100.00 > > 5 1 2 10.3.0.212 6030 10.3.0.212 > 6030 xvde 1.00 0 -100.00 > > 6 1 3 10.3.0.222 6010 10.3.0.222 > 6010 xvdb 1.00 0 -100.00 > > 7 1 3 10.3.0.222 6020 10.3.0.222 > 6020 xvdc 1.00 0 -100.00 > > 8 1 3 10.3.0.222 6030 10.3.0.222 > 6030 xvde 1.00 0 -100.00 > > 9 1 4 10.3.0.232 6010 10.3.0.232 > 6010 xvdb 1.00 0 -100.00 > > 10 1 4 10.3.0.232 6020 10.3.0.232 > 6020 xvdc 1.00 0 -100.00 > > 11 1 4 10.3.0.232 6030 10.3.0.232 > 6030 xvde 1.00 0 -100.00 > > > > Container and account rings have a similar configuration. > > Once the rings were created and all the disks were added to the rings > like above, I ran rebalance on each ring. (I ran rebalance after adding > each of the node above.) > > Then I immediately scp the rings to all other nodes in the cluster. > > > > I now observe that the objects are all going to 10.3.0.202. I don't see > the objects being replicated to the other nodes. So much so that 202 is > approaching 100% disk usage, while other nodes are almost completely empty. > > What am I doing wrong? Am I not supposed to run rebalance operation > after addition of each disk/node? > > > > Thanks in advance for the help. > > > > -- > > -Shyam > > > > _______________________________________________ > > OpenStack-dev mailing list > > OpenStack-dev@lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > > > > > _______________________________________________ > > OpenStack-dev mailing list > > OpenStack-dev@lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > _______________________________________________ > > OpenStack-dev mailing list > > OpenStack-dev@lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > -- -Shyam
--
-Shyam

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev