Thanks everyone. At first glance it seems that the application using swift is pushing all the PUT operations into a single container (100+ PUT/sec), so the developers are making a quick change to split the load across many containers and scale horizontally (a rough sketch of that kind of sharding is below), since we are getting a lot of concurrency on a single container.
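For the curious, the quick change the developers are making looks roughly like the sketch below: pick the container by hashing the object name, so PUTs fan out over many container databases instead of one hot one. This is only an illustrative sketch, not our actual code; it assumes python-swiftclient, and the shard count, container prefix, auth URL and credentials are all made up.

```python
import hashlib

from swiftclient import client as swift_client

NUM_SHARDS = 64               # hypothetical shard count; tune to the PUT rate
CONTAINER_PREFIX = "events"   # hypothetical container name prefix


def shard_container(object_name):
    """Deterministically pick a shard container from the object name."""
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return "%s_%02d" % (CONTAINER_PREFIX, int(digest, 16) % NUM_SHARDS)


def put_object_sharded(conn, object_name, contents):
    """PUT the object into its shard container instead of one hot container."""
    container = shard_container(object_name)
    conn.put_object(container, object_name, contents)
    return container


if __name__ == "__main__":
    # Placeholder connection details.
    conn = swift_client.Connection(
        authurl="http://keystone.example.com:5000/v2.0",
        user="tenant:user",
        key="password",
        auth_version="2.0",
    )
    # The shard containers need to exist up front, e.g.:
    # for i in range(NUM_SHARDS):
    #     conn.put_container("%s_%02d" % (CONTAINER_PREFIX, i))
    print(put_object_sharded(conn, "some/object/key", b"payload"))
```

The point is just that container DB updates spread over NUM_SHARDS SQLite databases, so no single container has to absorb 100+ concurrent PUTs per second.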
No doubt, if that's the problem, our next move is to swap account/container onto SSD devices. I'll keep you posted!

*Alejandro.*

On Wed, Jan 16, 2013 at 5:13 AM, Ywang225 <ywang...@126.com> wrote:
> If you care about PUT performance, one thing to check: are you placing
> account and container together with objects? If so, this possibly becomes
> the bottleneck; you could place account and container on dedicated nodes
> or dedicated faster disks. Of course, this involves ring changes.
>
> The other side is the parameters for the account and container servers;
> workers = 48 seems too high, which will increase contention on accessing
> the account or container DB.
>
> -ywang
>
> On 2013-1-15, 4:01, Alejandro Comisario <alejandro.comisa...@mercadolibre.com> wrote:
>
> Chuck et al.
>
> Let me go through the points one by one.
>
> #1 Even though the "object-auditor" always runs and never stops, we
> stopped the swift-*-auditor and didn't see any improvement. Across all
> the datanodes we have an average of 8% IO wait (using iostat); the only
> thing we see is that "xfsbuf" runs once in a while, causing 99% iowait
> for a second. We delayed the runtime for that process and didn't see
> changes either.
>
> Our object-auditor config for all devices is as follows:
>
> [object-auditor]
> files_per_second = 5
> zero_byte_files_per_second = 5
> bytes_per_second = 3000000
>
> #2 Our 12 proxies are 6 physical and 6 KVM instances running on nova;
> checking iftop we are at an average of 15Mb/s of bandwidth usage, so I
> don't think we are saturating the network.
>
> #3 The overall idle CPU on all datanodes is 80%. I'm not sure how to
> check the CPU usage per worker; let me paste the config of one device
> for object, account and container.
>
> *object-server.conf*
> *------------------*
> [DEFAULT]
> devices = /srv/node/sda3
> mount_check = false
> bind_port = 6010
> user = swift
> log_facility = LOG_LOCAL2
> log_level = DEBUG
> workers = 48
> disable_fallocate = true
>
> [pipeline:main]
> pipeline = object-server
>
> [app:object-server]
> use = egg:swift#object
>
> [object-replicator]
> vm_test_mode = yes
> concurrency = 8
> run_pause = 600
>
> [object-updater]
> concurrency = 8
>
> [object-auditor]
> files_per_second = 5
> zero_byte_files_per_second = 5
> bytes_per_second = 3000000
>
> *account-server.conf*
> *-------------------*
> [DEFAULT]
> devices = /srv/node/sda3
> mount_check = false
> bind_port = 6012
> user = swift
> log_facility = LOG_LOCAL2
> log_level = DEBUG
> workers = 48
> db_preallocation = on
> disable_fallocate = true
>
> [pipeline:main]
> pipeline = account-server
>
> [app:account-server]
> use = egg:swift#account
>
> [account-replicator]
> vm_test_mode = yes
> concurrency = 8
> run_pause = 600
>
> [account-auditor]
>
> [account-reaper]
>
> *container-server.conf*
> *---------------------*
> [DEFAULT]
> devices = /srv/node/sda3
> mount_check = false
> bind_port = 6011
> user = swift
> workers = 48
> log_facility = LOG_LOCAL2
> allow_versions = True
> disable_fallocate = true
>
> [pipeline:main]
> pipeline = container-server
>
> [app:container-server]
> use = egg:swift#container
> allow_versions = True
>
> [container-replicator]
> vm_test_mode = yes
> concurrency = 8
> run_pause = 500
>
> [container-updater]
> concurrency = 8
>
> [container-auditor]
>
> #4 We don't use SSL for swift, so no latency there.
>
> Hope you guys can shed some light.
> *Alejandro Comisario*
> *#melicloud CloudBuilders*
> Arias 3751, Piso 7 (C1430CRG)
> Ciudad de Buenos Aires - Argentina
> Cel: +549(11) 15-3770-1857
> Tel: +54(11) 4640-8443
>
> On Mon, Jan 14, 2013 at 1:23 PM, Chuck Thier <cth...@gmail.com> wrote:
>
>> Hi Alejandro,
>>
>> I really doubt that partition size is causing these issues. It can be
>> difficult to debug these types of issues without access to the cluster,
>> but I can think of a couple of things to look at.
>>
>> 1. Check your disk IO usage and IO wait on the storage nodes. If that
>> seems abnormally high, then that could be one of the sources of
>> problems. If this is the case, then the first things that I would look
>> at are the auditors, as they can use up a lot of disk IO if not
>> properly configured. I would try turning them off for a bit
>> (swift-*-auditor) and see if that makes any difference.
>>
>> 2. Check your network IO usage. You haven't described what type of
>> network you have going to the proxies, but if they share a single GigE
>> interface then, if my quick calculations are correct, you could be
>> saturating the network.
>>
>> 3. Check your CPU usage. I listed this one last as you have said that
>> you have already worked at tuning the number of workers (though I would
>> be interested to hear how many workers you have running for each
>> service). The main thing to look for is whether all of your workers are
>> maxed out on CPU; if so, then you may need to bump workers.
>>
>> 4. SSL termination? Where are you terminating the SSL connection? If
>> you are terminating SSL directly with the swift proxy, then that could
>> also be a source of issues. This was only meant for dev and testing,
>> and you should use an SSL-terminating load balancer in front of the
>> swift proxies.
>>
>> That's what I could think of right off the top of my head.
>>
>> --
>> Chuck
>>
>> On Mon, Jan 14, 2013 at 5:45 AM, Alejandro Comisario
>> <alejandro.comisa...@mercadolibre.com> wrote:
>> > Chuck / John.
>> > We are having 50,000 requests per minute (where 10,000+ are PUTs of
>> > small objects, from 10KB to 150KB).
>> >
>> > We are using swift 1.7.4 with keystone token caching, so no latency
>> > over there.
>> > We have 12 proxies and 24 datanodes divided into 4 zones (each
>> > datanode has 48GB of RAM, 2 hexacores and 4 devices of 3TB each).
>> >
>> > The workers that are putting objects into swift are seeing awful
>> > performance, and so are we, with peaks of 2 to 15 seconds per PUT
>> > operation coming from the datanodes.
>> > We tuned db_preallocation, disable_fallocate, workers and concurrency,
>> > but we can't reach the request rate that we need (we need 24,000 PUTs
>> > per minute of small objects) and we can't seem to find where the
>> > problem is, other than in the datanodes.
>> >
>> > Maybe it's worth pasting our config over here?
>> > Thanks in advance.
>> >
>> > alejandro
>> >
>> > On 12 Jan 2013 02:01, "Chuck Thier" <cth...@gmail.com> wrote:
>> >>
>> >> Looking at this from a different perspective: having 2500 partitions
>> >> per drive shouldn't be an absolutely horrible thing either. Do you
>> >> know how many objects you have per partition? What types of problems
>> >> are you seeing?
>> >>
>> >> --
>> >> Chuck
>> >>
>> >> On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson <m...@not.mn> wrote:
>> >> > In effect, this would be a complete replacement of your rings, and
>> >> > that is essentially a whole new cluster.
>> >> > All of the existing data would need to be rehashed into the new
>> >> > ring before it is available.
>> >> >
>> >> > There is no process that rehashes the data to ensure that it is
>> >> > still in the correct partition. Replication only ensures that the
>> >> > partitions are on the right drives.
>> >> >
>> >> > To change the number of partitions, you will need to GET all of the
>> >> > data from the old ring and PUT it to the new ring. A more
>> >> > complicated (but perhaps more efficient) solution may include
>> >> > something like walking each drive and rehashing+moving the data to
>> >> > the right partition and then letting replication settle it down.
>> >> >
>> >> > Either way, 100% of your existing data will need to at least be
>> >> > rehashed (and probably moved). Your CPU (hashing), disks
>> >> > (read+write), RAM (directory walking), and network (replication)
>> >> > may all be limiting factors in how long it will take to do this.
>> >> > Your per-disk free space may also determine what method you choose.
>> >> >
>> >> > I would not expect any data loss while doing this, but you will
>> >> > probably have availability issues, depending on the data access
>> >> > patterns.
>> >> >
>> >> > I'd like to eventually see something in swift that allows for
>> >> > changing the partition power in existing rings, but that will be
>> >> > hard/tricky/non-trivial.
>> >> >
>> >> > Good luck.
>> >> >
>> >> > --John
>> >> >
>> >> >
>> >> > On Jan 11, 2013, at 1:17 PM, Alejandro Comisario
>> >> > <alejandro.comisa...@mercadolibre.com> wrote:
>> >> >
>> >> >> Hi guys.
>> >> >> We created a swift cluster several months ago; the thing is that
>> >> >> right now we can't add hardware, and we configured lots of
>> >> >> partitions thinking about the final picture of the cluster.
>> >> >>
>> >> >> Today each datanode has 2500+ partitions per device, and even
>> >> >> after tuning the background processes (replicator, auditor &
>> >> >> updater) we really want to try to lower the partition power.
>> >> >>
>> >> >> Since it's not possible to do that without recreating the ring, we
>> >> >> can have the luxury of recreating it with a much lower partition
>> >> >> power and rebalancing / deploying the new ring.
>> >> >>
>> >> >> The question is, having a working cluster with *existing data*, is
>> >> >> it possible to do this and wait for the data to move around
>> >> >> *without data loss*?
>> >> >> If so, might we see an improvement in the overall cluster
>> >> >> performance?
>> >> >>
>> >> >> We have no problem having a non-working cluster (while moving the
>> >> >> data), even for an entire weekend.
>> >> >>
>> >> >> Cheers.
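PS: for anyone following John's point about why lowering the partition power means touching everything, here is a minimal, illustrative sketch of how a Swift-style ring maps an object path to a partition. It is not the actual ring code (the real ring salts the path with a hash suffix and then maps partitions to devices), and the account/container/object names and partition powers in it are made up.

```python
import hashlib
import struct


def get_partition(account, container, obj, part_power):
    """Map an object path to a partition, Swift-ring style (simplified).

    md5 the path, take the first 4 bytes as a big-endian integer, and keep
    only the top `part_power` bits. The real ring also mixes in a secret
    hash path suffix and looks the result up in its partition-to-device table.
    """
    path = "/%s/%s/%s" % (account, container, obj)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


if __name__ == "__main__":
    # The same object lands in a different partition under a different
    # partition power, which is why every object has to be rehashed
    # (and almost certainly moved) when the ring is rebuilt.
    for power in (18, 12):  # hypothetical old and new partition powers
        part = get_partition("AUTH_demo", "events_01", "some/object/key", power)
        print("part_power=%d -> partition %d" % (power, part))
```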