[openstack-dev] [Swift3] improve multi-delete performance
Hi,

The multi_delete in swift3 performs sequential DELETEs. In my 3-storage-node configuration, deleting 1,000 objects took 30 seconds. With the following change, which uses a pool of 100 greenthreads to issue the deletes concurrently, the same 1,000 objects took only 12 seconds. (This should improve further with more storage nodes in the picture.) If the code change below looks fine, how can we formally propose/review/commit it to the swift3 github code base?

*Code diff:*

diff --git a/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py b/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py
index 1bfde1d..5140529 100644
--- a/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py
+++ b/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py
@@ -21,9 +21,9 @@ from swift3.response import HTTPOk, S3NotImplemented, NoSuchKey, \
 from swift3.cfg import CONF
 from swift3.utils import LOGGER
 
-# Zadara-Begin
+from eventlet import GreenPool
+import copy
 MAX_MULTI_DELETE_BODY_SIZE = 262144
-# Zadara-End
 
 
 class MultiObjectDeleteController(Controller):
@@ -44,6 +44,24 @@ class MultiObjectDeleteController(Controller):
 
         return tostring(elem)
 
+    def async_delete(self, reqs, key, elem):
+        req = copy.copy(reqs)
+        req.object_name = key
+        try:
+            req.get_response(self.app, method='DELETE')
+        except NoSuchKey:
+            pass
+        except ErrorResponse as e:
+            error = SubElement(elem, 'Error')
+            SubElement(error, 'Key').text = key
+            SubElement(error, 'Code').text = e.__class__.__name__
+            SubElement(error, 'Message').text = e._msg
+            return
+
+        if not self.quiet:
+            deleted = SubElement(elem, 'Deleted')
+            SubElement(deleted, 'Key').text = key
+
     @bucket_operation
     def POST(self, req):
         """
@@ -90,27 +108,17 @@ class MultiObjectDeleteController(Controller):
             body = self._gen_error_body(error, elem, delete_list)
             return HTTPOk(body=body)
 
+        parallel_delete = 100
+        run_pool = GreenPool(size=parallel_delete)
         for key, version in delete_list:
             if version is not None:
                 # TODO: delete the specific version of the object
                 raise S3NotImplemented()
 
-            req.object_name = key
-
-            try:
-                req.get_response(self.app, method='DELETE')
-            except NoSuchKey:
-                pass
-            except ErrorResponse as e:
-                error = SubElement(elem, 'Error')
-                SubElement(error, 'Key').text = key
-                SubElement(error, 'Code').text = e.__class__.__name__
-                SubElement(error, 'Message').text = e._msg
-                continue
-
-            if not self.quiet:
-                deleted = SubElement(elem, 'Deleted')
-                SubElement(deleted, 'Key').text = key
+            run_pool.spawn(self.async_delete, req, key, elem)
+
+        # Wait for all the spawned deletes to complete
+        run_pool.waitall()
 
         body = tostring(elem)
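For anyone who wants to try the pattern in isolation, here is a minimal, self-contained sketch of the eventlet GreenPool fan-out the patch relies on. delete_one() and the key names are illustrative stand-ins, not swift3 code:

    import eventlet
    from eventlet import GreenPool

    eventlet.monkey_patch()   # make socket I/O cooperative under eventlet

    def delete_one(key):
        # Stand-in for the per-object DELETE; the real patch issues an
        # HTTP request via req.get_response(self.app, method='DELETE').
        eventlet.sleep(0.01)
        print('deleted %s' % key)

    pool = GreenPool(size=100)          # cap at 100 concurrent greenthreads
    for key in ('obj-%04d' % i for i in range(1000)):
        pool.spawn(delete_one, key)     # blocks only when the pool is full
    pool.waitall()                      # wait for every delete to finish

The pool size is the knob to tune here: too small and the deletes serialize again; too large and a single multi-delete request can flood the object servers.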
[openstack-dev] [swift] zones and partition
Hi,

I have 3 zones, with different capacity in each. Say I have 4 x 1TB disks (r0z1 - 1TB, r0z2 - 1TB, r0z3 - 2TB). The ring builder (rebalance code) keeps all 3 replicas of roughly a quarter of the partitions in Zone-3. This is the current default behavior of the rebalance code, and it puts pressure on the storage user to increase capacity evenly across the zones. Is my understanding correct? If so, why have we chosen this approach rather than enforcing zone-based placement (even though the partition share on z1 and z2 would then be smaller than on z3)? That would guarantee 100% zone-level protection and no loss of data on a single zone failure.

Thanks,
-kiru
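To make the question reproducible, here is a minimal sketch using Swift's ring-builder API to build the layout described above and count how many partitions end up spanning fewer than 3 zones. The IPs/devices are illustrative, and _replica2part2dev is an internal table whose name and shape may vary across Swift releases:

    from swift.common.ring.builder import RingBuilder

    # 2^10 partitions, 3 replicas, 1 hour min_part_hours
    builder = RingBuilder(10, 3, 1)
    for dev in [
            {'id': 0, 'region': 0, 'zone': 1, 'ip': '192.168.0.1',
             'port': 6000, 'device': 'sdb1', 'weight': 1000},   # z1: 1TB
            {'id': 1, 'region': 0, 'zone': 2, 'ip': '192.168.0.2',
             'port': 6000, 'device': 'sdb1', 'weight': 1000},   # z2: 1TB
            {'id': 2, 'region': 0, 'zone': 3, 'ip': '192.168.0.3',
             'port': 6000, 'device': 'sdb1', 'weight': 1000},   # z3: 2 x 1TB
            {'id': 3, 'region': 0, 'zone': 3, 'ip': '192.168.0.3',
             'port': 6000, 'device': 'sdb2', 'weight': 1000}]:
        builder.add_dev(dev)
    builder.rebalance()

    # Count partitions whose replicas do not span all 3 zones.
    collapsed = 0
    for part in range(builder.parts):
        zones = set(builder.devs[row[part]]['zone']
                    for row in builder._replica2part2dev)
        if len(zones) < 3:
            collapsed += 1
    print('%d of %d partitions span fewer than 3 zones'
          % (collapsed, builder.parts))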
[openstack-dev] profiling Latency of single PUT operation on proxy + storage
Hi All,

I have attached a simple timeline chart of proxy+object latency for a single PUT request; please check. I am profiling the swift proxy + object server to improve the latency of a single PUT request. This may help improve the overall OPS performance.

Test configuration: 4 CPUs + 16GB + 1 proxy node + 1 storage node; 1 replica for the object ring, 3 replicas for the container ring, on SSD; perform 4K PUT requests one-by-one.

Every 4K PUT request in the above case takes 22 ms (30 ms with a replica count of 3 for the object ring). The target is to bring each 4K PUT request below 10 ms, doubling the overall OPS performance. There are some potential places where we can reduce latency to achieve this. Can you please provide your thoughts?

*Performance optimization-1:* The proxy server doesn't have to block in connect() - getexpect() until the object server responds.

*Problem today:* On a PUT request, the proxy server's _connect_put_node() waits for the response from the object server (getexpect()) after the connection is established. Only once the 'HTTP_CONTINUE' response is received does the proxy server spawn the send-file thread to send data to the object servers. This code path is serialized between the proxy and object server.

*Optimization:*
*Option 1:* Avoid waiting for all the connects to complete before proceeding with send_data to the already-connected object servers.
*Option 2:* The purpose of the getexpect() call is not very clear. Can we relax it, so that the proxy server goes ahead, reads the data_source, and sends it to the object server immediately after the connection is established? We may have to handle extra failure cases here. (FYI: this saves 3 ms for a single PUT request.)

    def _connect_put_node(self, nodes, part, path, headers,
                          logger_thread_locals, req):
        """Method for a file PUT connect"""
        ...
        with Timeout(self.app.node_timeout):
            resp = conn.getexpect()
        ...

*Performance optimization-2:* The object server serializes the container_update after the data write.

*Problem today:* On a PUT request, the object server calls container_update() after writing the data and metadata, and the update is serialized across all storage nodes (3-way). Each container update takes 3 ms, adding up to 9 ms for container_update to complete.

*Optimization:* Can we make this parallel using green threads, and probably return success on the first successful container update if there is no connection error (see the sketch below)? I am trying to understand whether this would have any data-integrity issues; can you please provide your feedback on this? (FYI: this saves at least 5 ms.)

*Performance optimization-3:* write(metadata) in the object server takes 2 to 3 ms.

*Problem today:* After writing the data to the file, writer.put(metadata) -> _finalize_put() handles the post-write work. This takes an average of 3 ms for every PUT request.

*Optimization:*
*Option 1:* Is it possible to flush the file (group of files) asynchronously in _finalize_put()?
*Option 2:* Can we make this put(metadata) call asynchronous, so the container update can happen in parallel? Error conditions must be handled properly.

I would like to know whether any work has been done in this area, so as not to repeat the effort. The motivation for this work is that 30 ms for a single 4K I/O looks too high; with that, the only way to scale is to add more servers. Trying to see whether we can achieve anything quickly by modifying some portion of the code, or whether this requires quite a bit of code rewrite.
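To make optimization-2 concrete, here is a minimal, hypothetical sketch (not Swift's actual code) of firing the three container updates concurrently with eventlet and unblocking the object PUT on the first success. update_one() and nodes are stand-ins for the real per-node update call and the container ring nodes:

    import eventlet

    def parallel_container_update(nodes, update_one):
        # Spawn one greenthread per container server; each pushes its
        # result (True on success, False on error) into a shared queue.
        results = eventlet.Queue()
        for node in nodes:
            eventlet.spawn_n(lambda n=node: results.put(update_one(n)))
        # Unblock the object PUT as soon as any update succeeds. The
        # remaining greenthreads keep running in the background; any that
        # fail would still need the usual async_pending fallback to
        # preserve eventual consistency -- this is exactly the integrity
        # question raised above.
        for _ in nodes:
            if results.get():
                return True
        return False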
Also, please suggest whether this approach of working on the latency of a single PUT request is the right one.

Thanks
-kiru

*From:* Shyam Kaushik [mailto:sh...@zadarastorage.com]
*Sent:* Friday, September 04, 2015 11:53 AM
*To:* Kirubakaran Kaliannan
*Subject:* RE: profiling per I/O logs

Hi Kiru,

I listed a couple of optimization options like the one below. Can you please list 3-4 optimizations in a similar format and pass them back to me for a quick review? Once we finalize, let's bounce them off the community to see what they think.

*Performance optimization-1:* Proxy server - on a PUT request, drive the client side independently of the auth/object-server connection establishment.

*Problem today:* On a PUT request, the client connects and puts the header to the proxy server. The proxy server goes to auth, then looks up the ring and connects to each object server, sending a header. Then, when the object servers accept the connection, the proxy server sends HTTP continue to the client; only now does the client write data to the proxy server, which then writes the data to the object servers.

*Optimization:* The proxy server can drive the client side independently of the backend side. I.e., upon auth completion, the proxy server, through a thread, can send HTTP continue to the client and ask for the data to be written. In the background it can try to conne
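A minimal, hypothetical sketch of the overlap being proposed in the quoted mail. The function arguments are stand-ins for the real proxy-server steps, and real code would also have to handle backend connect failures discovered after the client has started sending:

    import eventlet

    def put_with_overlap(connect_backends, send_continue_to_client, read_body):
        # Establish the backend connections in a background greenthread...
        backend_gt = eventlet.spawn(connect_backends)
        # ...while immediately unblocking the client, so the client's data
        # transfer overlaps with backend connection establishment.
        send_continue_to_client()
        body = read_body()
        conns = backend_gt.wait()   # join before streaming the body out
        return conns, body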
[openstack-dev] [Swift] ObjectController::async_update() latest change
Hi,

Regarding https://github.com/openstack/swift/commit/2289137164231d7872731c2cf3d81b86f34f01a4

I am profiling each section of the swift code. I noticed that ObjectController::async_update() has high latency, and tried using threads to parallelize the container_update. Then I noticed the change above and profiled it as well. On my setup, each container async_update takes 2 to 4 ms, and it takes 9-11 ms to complete the 3 async_updates. With the above change this came down to 7 to 9 ms, but I am expecting it to go further down, to at least 4 ms.

1. Do we have any latency numbers for the improvement from the above change?
2. Trying to understand the possibilities: do we really want to wait for all async_update() calls to complete? Can we just return success once one async_update() is successful?

Thanks,
-kiru
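For reference, a minimal sketch of the kind of per-call timing used to produce numbers like those above. timed() and the logger argument are illustrative, not Swift's profiling hooks:

    import functools
    import time

    def timed(logger):
        """Decorator that logs the wall-clock duration of each call in ms."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.time()
                try:
                    return fn(*args, **kwargs)
                finally:
                    logger('%s took %.1f ms'
                           % (fn.__name__, (time.time() - start) * 1000))
            return wrapper
        return decorator

Wrapping async_update() (or any suspect method) with @timed(LOGGER.info) gives per-call latencies that can then be aggregated per request.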
[openstack-dev] swift-dispersion-populate for different storage policies
Hi,

I am working on making swift-dispersion-populate and swift-dispersion-report work with different storage policies (different rings). It looks like the dispersion containers and objects are hard-coded internally. Is anyone working on improving this, or am I the first one?

Thanks
kiru
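A hypothetical illustration of the direction being proposed (not the actual dispersion tool code); the 'dispersion_objects' base name and the suffix scheme are assumptions for the sketch:

    def dispersion_container(policy_index, base='dispersion_objects'):
        """Derive a dispersion container name per storage policy instead
        of relying on a single hard-coded name."""
        # Policy 0 keeps the legacy name so existing reports remain valid;
        # other policies get a suffixed container populated via their own ring.
        if policy_index == 0:
            return base
        return '%s_%d' % (base, policy_index)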