Thanks Andi, that helps. It's true that my expectations were misplaced; I was expecting all nodes to "rebalance" until they each store the same amount of data.
What's weird, though, is that there are missing folders on the newly added c0d4p1 device. Here's what I get:

root@storage3:/srv/node# ls c0d1p1/
accounts  async_pending  containers  objects  tmp
root@storage3:/srv/node# ls c0d4p1/
accounts  tmp

Is that normal? And when I check /var/log/rsyncd.log for the transfers between storage nodes, I see a great many entries like the following, which again makes me wonder whether something is wrong:

2012/10/24 19:22:56 [6514] rsync to container/c0d4p1/tmp/e49cf526-1d53-4069-bbea-b74f6dbec5f1 from storage2 (192.168.1.4)
2012/10/24 19:22:56 [6514] receiving file list
2012/10/24 19:22:56 [6514] sent 54 bytes received 17527 bytes total size 17408
2012/10/24 21:22:56 [6516] connect from storage2 (192.168.1.4)
2012/10/24 19:22:56 [6516] rsync to container/c0d4p1/tmp/4b8b0618-077b-48e2-a7a0-fb998fcf11bc from storage2 (192.168.1.4)
2012/10/24 19:22:56 [6516] receiving file list
2012/10/24 19:22:56 [6516] sent 54 bytes received 26743 bytes total size 26624
2012/10/24 21:22:56 [6518] connect from storage2 (192.168.1.4)
2012/10/24 19:22:56 [6518] rsync to container/c0d4p1/tmp/53452ee6-c52c-4e3b-abe2-a31a2c8d65ba from storage2 (192.168.1.4)
2012/10/24 19:22:56 [6518] receiving file list
2012/10/24 19:22:57 [6518] sent 54 bytes received 24695 bytes total size 24576
2012/10/24 21:22:57 [6550] connect from storage2 (192.168.1.4)
2012/10/24 19:22:57 [6550] rsync to container/c0d4p1/tmp/b858126d-3152-4d71-a0e8-eea115f69fc8 from storage2 (192.168.1.4)
2012/10/24 19:22:57 [6550] receiving file list
2012/10/24 19:22:57 [6550] sent 54 bytes received 24695 bytes total size 24576
2012/10/24 21:22:57 [6552] connect from storage2 (192.168.1.4)
2012/10/24 19:22:57 [6552] rsync to container/c0d4p1/tmp/f3ce8205-84ac-4236-baea-3a3aef2da6ab from storage2 (192.168.1.4)
2012/10/24 19:22:57 [6552] receiving file list
2012/10/24 19:22:58 [6552] sent 54 bytes received 25719 bytes total size 25600
2012/10/24 21:22:58 [6554] connect from storage2 (192.168.1.4)
2012/10/24 19:22:58 [6554] rsync to container/c0d4p1/tmp/91b4f046-eacb-4a1d-aed1-727d0c982742 from storage2 (192.168.1.4)
2012/10/24 19:22:58 [6554] receiving file list
2012/10/24 19:22:58 [6554] sent 54 bytes received 18551 bytes total size 18432
2012/10/24 21:22:58 [6556] connect from storage2 (192.168.1.4)
2012/10/24 19:22:58 [6556] rsync to container/c0d4p1/tmp/94d223f9-b84d-4911-be6b-bb28f89b6647 from storage2 (192.168.1.4)
2012/10/24 19:22:58 [6556] receiving file list
2012/10/24 19:22:58 [6556] sent 54 bytes received 24695 bytes total size 24576
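For anyone else following along, here is a quick way to watch whether anything is actually landing on the new devices. This is only a rough sketch; it assumes the /srv/node mount layout above and the builder files on the proxy shown further down in the thread:

# on each storage node: compare usage of the old devices vs. the new c0d4p1
du -sh /srv/node/c0d*p1

# on the proxy: confirm the ring really assigned partitions to the new devices
swift-ring-builder /etc/swift/object.builder | grep c0d4p1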
On Tue, Oct 23, 2012 at 11:17 AM, andi abes <[email protected]> wrote:
> On Tue, Oct 23, 2012 at 12:16 PM, Emre Sokullu <[email protected]> wrote:
> > Folks,
> >
> > This is the 3rd day and I see no or very little (kb.s) change with the new disks.
> >
> > Could it be normal, is there a long computation process that takes time first before actually filling newly added disks?
> >
> > Or should I just start from scratch with the "create" command this time. The last time I did it, I didn't use the "swift-ring-builder create 20 3 1 .." command first but just started with "swift-ring-builder add ..." and used the existing ring.gz files, thinking that otherwise I could be reformatting the whole stack. I'm not sure if that's the case.
>
> That is correct - you don't want to recreate the rings, since that is likely to cause redundant partition movement.
>
> > Please advise. Thanks,
>
> I think your expectations might be misplaced. The ring builder tries not to move partitions needlessly. In your cluster, you had 3 zones (and I'm assuming 3 replicas). Swift placed the partitions as efficiently as it could, spread across the 3 zones (servers). As things stand, there's no real reason for partitions to move across the servers. I'm guessing that the data growth you've seen is from new data, not from existing data movement (but there are some calls to random in the code which might have produced some partition movement).
>
> If you truly want to move things around forcefully, you could:
> * decrease the weight of the old devices. This would cause them to be over-weighted, and partitions would be reassigned away from them.
> * delete and re-add devices to the ring. This will cause all the partitions from the deleted devices to be spread across the new set of devices.
>
> After you perform your ring manipulation commands, execute the rebalance command and copy the ring files. This is likely to cause *lots* of activity in your cluster... which seems to be the desired outcome. It's also likely to have a negative impact on service requests to the proxy, so it's something you probably want to be careful about.
>
> If you leave things alone as they are, new data will be distributed onto the new devices, and as old data gets deleted usage will rebalance over time.
>
> > --
> > Emre
> >
> > On Mon, Oct 22, 2012 at 12:09 PM, Emre Sokullu <[email protected]> wrote:
> >>
> >> Hi Samuel,
> >>
> >> Thanks for the quick reply.
> >>
> >> They're all 100. And here's the output of swift-ring-builder:
> >>
> >> root@proxy1:/etc/swift# swift-ring-builder account.builder
> >> account.builder, build version 13
> >> 1048576 partitions, 3 replicas, 3 zones, 12 devices, 0.00 balance
> >> The minimum number of hours before a partition can be reassigned is 1
> >> Devices:  id  zone  ip address   port  name    weight  partitions  balance  meta
> >>            0     1  192.168.1.3  6002  c0d1p1  100.00      262144     0.00
> >>            1     1  192.168.1.3  6002  c0d2p1  100.00      262144     0.00
> >>            2     1  192.168.1.3  6002  c0d3p1  100.00      262144     0.00
> >>            3     2  192.168.1.4  6002  c0d1p1  100.00      262144     0.00
> >>            4     2  192.168.1.4  6002  c0d2p1  100.00      262144     0.00
> >>            5     2  192.168.1.4  6002  c0d3p1  100.00      262144     0.00
> >>            6     3  192.168.1.5  6002  c0d1p1  100.00      262144     0.00
> >>            7     3  192.168.1.5  6002  c0d2p1  100.00      262144     0.00
> >>            8     3  192.168.1.5  6002  c0d3p1  100.00      262144     0.00
> >>            9     1  192.168.1.3  6002  c0d4p1  100.00      262144     0.00
> >>           10     2  192.168.1.4  6002  c0d4p1  100.00      262144     0.00
> >>           11     3  192.168.1.5  6002  c0d4p1  100.00      262144     0.00
> >>
> >> On Mon, Oct 22, 2012 at 12:03 PM, Samuel Merritt <[email protected]> wrote:
> >> > On 10/22/12 9:38 AM, Emre Sokullu wrote:
> >> >>
> >> >> Hi folks,
> >> >>
> >> >> At GROU.PS, we've been an OpenStack SWIFT user for more than 1.5 years now. Currently, we hold about 18TB of data on 3 storage nodes. Since we hit 84% utilization, we have recently decided to expand the storage with more disks.
> >> >>
> >> >> In order to do that, after creating a new c0d4p1 partition on each of the storage nodes, we ran the following commands on our proxy server:
> >> >>
> >> >> swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
> >> >> swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
> >> >> swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
> >> >> swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
> >> >> swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
> >> >> swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
> >> >> swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
> >> >> swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
> >> >> swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100
> >> >>
> >> >> [snip]
> >> >
> >> >>
> >> >> So right now, the problem is that the disk growth on each of the storage nodes seems to have stalled,
> >> >
> >> > So you've added 3 new devices to each ring and assigned a weight of 100 to each one. What are the weights of the other devices in the ring? If they're much larger than 100, then that will cause the new devices to end up with a small fraction of the data you want on them.
> >> >
> >> > Running "swift-ring-builder <thing>.builder" will show you information, including weights, for all the devices in the ring.
> >> >
> >> >> * Bonus question: why do we copy ring.gz files to storage nodes, and how critical are they? To me it's not clear how Swift can afford to wait (even though it's just a few seconds) for .ring.gz files to be on the storage nodes after rebalancing, if those files are so critical.
> >> >
> >> > The ring.gz files contain the mapping from Swift partitions to disks. As you know, the proxy server uses it to determine which backends have the data for a given request. The replicators also use the ring to determine where data belongs so that they can ensure the right number of replicas, etc.
> >> >
> >> > When two storage nodes have different versions of a ring.gz file, you can get replicator fights. They look like this:
> >> >
> >> > - node1's (old) ring says that the partition for a replica of /cof/fee/cup belongs on node2's /dev/sdf.
> >> > - node2's (new) ring says that the same partition belongs on node1's /dev/sdd.
> >> >
> >> > When the replicator on node1 runs, it will see that it has the partition for /cof/fee/cup on its disk. It will then consult the ring, push that partition's contents to node2, and then delete its local copy (since node1's ring says that this data does not belong on node1).
> >> >
> >> > When the replicator on node2 runs, it will do the converse: push to node1, then delete its local copy.
> >> >
> >> > If you leave the rings out of sync for a long time, then you'll end up consuming disk and network IO ping-ponging a set of data around. If they're out of sync for a few seconds, then it's not a big deal.
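Following up on Andi's suggestion above, this is roughly what I understand the "forceful" route to look like: lower the weights of the old devices so partitions get reassigned toward the new ones, then rebalance. The value 50 is only an illustration, and I'm showing the object builder only; the account and container builders would need the same treatment (the search-value syntax mirrors what we used for add):

# shrink the old devices in zone 1 (repeat for z2/192.168.1.4 and z3/192.168.1.5)
swift-ring-builder object.builder set_weight z1-192.168.1.3:6002/c0d1p1 50
swift-ring-builder object.builder set_weight z1-192.168.1.3:6002/c0d2p1 50
swift-ring-builder object.builder set_weight z1-192.168.1.3:6002/c0d3p1 50

# then reassign partitions
swift-ring-builder object.builder rebalance

If I read the builder output right, min_part_hours is 1 here, so repeated rebalances within the same hour won't keep moving the same partitions again.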
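And the step Samuel's explanation makes time-sensitive: as soon as the builders are rebalanced, copy the fresh .ring.gz files to every storage node so all the replicators work from the same mapping. The destination path and the use of scp here are my assumptions, not something confirmed in the thread:

cd /etc/swift
for node in 192.168.1.3 192.168.1.4 192.168.1.5; do
    scp account.ring.gz container.ring.gz object.ring.gz root@${node}:/etc/swift/
done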

