[openstack-dev] [Swift3] improve multi-delete performance

2016-05-23 Thread Kirubakaran Kaliannan
Hi,



The multi-delete operation in swift3 performs the DELETEs sequentially. On my 3
storage-node configuration, deleting 1000 objects took 30 seconds.



With the following code change, which uses a pool of 100 green threads to delete the
1000 objects, it took only 12 seconds. (This may drop further with more storage nodes in the picture.)



If the code change below looks fine, how can we formally
(propose/review/commit) take it into the swift3 GitHub code base?
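
For reference, the change relies on the standard eventlet GreenPool spawn/waitall
pattern; a minimal standalone sketch (illustrative names only, not swift3 code):

    from eventlet import GreenPool

    def delete_all(keys, delete_one, concurrency=100):
        # Fan the per-key deletes out onto green threads and wait for all
        # of them; errors are collected instead of raised.
        errors = []

        def _delete(key):
            try:
                delete_one(key)
            except Exception as e:
                errors.append((key, e))

        pool = GreenPool(size=concurrency)
        for key in keys:
            pool.spawn(_delete, key)
        pool.waitall()
        return errors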





*Code diff:*



diff --git a/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py b/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py
index 1bfde1d..5140529 100644
--- a/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py
+++ b/swift3/swift-plugin-s3/swift3/controllers/multi_delete.py
@@ -21,9 +21,9 @@ from swift3.response import HTTPOk, S3NotImplemented, NoSuchKey, \
 from swift3.cfg import CONF
 from swift3.utils import LOGGER
-# Zadara-Begin
+from eventlet import GreenPool
+import copy
 MAX_MULTI_DELETE_BODY_SIZE = 262144
-# Zadara-End
 
 
 class MultiObjectDeleteController(Controller):
@@ -44,6 +44,24 @@ class MultiObjectDeleteController(Controller):
 
         return tostring(elem)
 
+    def async_delete(self, reqs, key, elem):
+        req = copy.copy(reqs)
+        req.object_name = key
+        try:
+            req.get_response(self.app, method='DELETE')
+        except NoSuchKey:
+            pass
+        except ErrorResponse as e:
+            error = SubElement(elem, 'Error')
+            SubElement(error, 'Key').text = key
+            SubElement(error, 'Code').text = e.__class__.__name__
+            SubElement(error, 'Message').text = e._msg
+            return
+
+        if not self.quiet:
+            deleted = SubElement(elem, 'Deleted')
+            SubElement(deleted, 'Key').text = key
+
     @bucket_operation
     def POST(self, req):
         """
@@ -90,27 +108,17 @@ class MultiObjectDeleteController(Controller):
             body = self._gen_error_body(error, elem, delete_list)
             return HTTPOk(body=body)
 
+        parallel_delete = 100
+        run_pool = GreenPool(size=parallel_delete)
         for key, version in delete_list:
             if version is not None:
                 # TODO: delete the specific version of the object
                 raise S3NotImplemented()
 
-            req.object_name = key
-
-            try:
-                req.get_response(self.app, method='DELETE')
-            except NoSuchKey:
-                pass
-            except ErrorResponse as e:
-                error = SubElement(elem, 'Error')
-                SubElement(error, 'Key').text = key
-                SubElement(error, 'Code').text = e.__class__.__name__
-                SubElement(error, 'Message').text = e._msg
-                continue
-
-            if not self.quiet:
-                deleted = SubElement(elem, 'Deleted')
-                SubElement(deleted, 'Key').text = key
+            run_pool.spawn(self.async_delete, req, key, elem)
+
+        # Wait for all the deletes to complete
+        run_pool.waitall()
 
         body = tostring(elem)
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [swift] zones and partition

2016-02-21 Thread Kirubakaran Kaliannan
Hi,



I have 3 zones with different capacities in each. Say I have 4 x 1TB disks
(r0z1 - 1TB, r0z2 - 1TB, r0z3 - 2TB).



The ring builder (rebalance code) keeps all 3 replicas of 1/4 of the partitions in
zone 3. This is the current default behavior of the rebalance code.

This puts pressure on the storage user to increase the storage capacity evenly
across the zones. Is my understanding correct?



If so, why have we chosen this approach? Can't we instead enforce zone-based
partition placement (even though the partition share on z1 and z2 may be smaller than on z3)?

That would make sure we have 100% zone-level protection and no loss of data on
a single zone failure.
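
If it helps to reproduce this, a sketch against Swift's RingBuilder API with the
4 x 1TB layout above (addresses and device names are made up; use a throwaway
builder, not a production ring):

    from swift.common.ring import RingBuilder

    # part_power=10, replicas=3, min_part_hours=1 -- a toy ring
    builder = RingBuilder(10, 3, 1)
    devices = [
        {'region': 0, 'zone': 1, 'ip': '10.0.0.1', 'port': 6000,
         'device': 'sdb', 'weight': 1000},   # r0z1 - 1TB
        {'region': 0, 'zone': 2, 'ip': '10.0.0.2', 'port': 6000,
         'device': 'sdb', 'weight': 1000},   # r0z2 - 1TB
        {'region': 0, 'zone': 3, 'ip': '10.0.0.3', 'port': 6000,
         'device': 'sdb', 'weight': 1000},   # r0z3 - 1TB
        {'region': 0, 'zone': 3, 'ip': '10.0.0.3', 'port': 6000,
         'device': 'sdc', 'weight': 1000},   # r0z3 - 1TB
    ]
    for dev in devices:
        builder.add_dev(dev)
    builder.rebalance()

    # After rebalance each device dict carries its partition-replica count;
    # zone 3 ends up holding roughly half of all partition replicas.
    for dev in builder.devs:
        print('z%d %s parts=%d' % (dev['zone'], dev['device'], dev['parts']))

Counting how many partitions end up with all three replicas in zone 3 is then just
a walk over the builder's replica-to-partition assignment.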



Thanks,

-kiru
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] profiling Latency of single PUT operation on proxy + storage

2015-09-08 Thread Kirubakaran Kaliannan
Hi All,



I have attached a simple timeline chart of proxy + object-server latency for a
single PUT request. Please check it.



I am profiling the Swift proxy + object server to improve the latency of a
single PUT request. This may help to improve the overall OPS performance.

Test configuration: 4 CPUs + 16 GB RAM + 1 proxy node + 1 storage node; 1
replica for the object ring and 3 replicas for the container ring, on SSD; 4K
PUT requests issued one by one.

Every 4K PUT request in the above case takes 22 ms (30 ms with an object replica
count of 3). The target is to bring each 4K PUT request below 10 ms, to double
the overall OPS performance.



There are some potential places where we can improve the latency to achieve
this. Can you please share your thoughts?



*Performance optimization-1:* The proxy server doesn't have to block in
connect()/getexpect() until the object server responds.

*Problem today:* On a PUT request, the proxy server's _connect_put_node() waits
for the response from the object server (getexpect()) after the connection
is established. Only once the response ('HTTP_CONTINUE') is received does the
proxy server spawn the send_file thread to send data to the object servers.
This code path looks serialized between the proxy and the object servers.

*Optimization:*

*Option 1:* Avoid waiting for all the connects to complete before proceeding
with send_data to the object servers that are already connected?

*Option 2:* The purpose of the getexpect() call is not very clear. Can we
relax it, so that the proxy server goes ahead, reads the data_source
and sends it to the object server as soon as the connection is
established? We may have to handle extra failure cases here. (FYI: this
reduces a single PUT request by 3 ms.)

    def _connect_put_node(self, nodes, part, path, headers,
                          logger_thread_locals, req):
        """Method for a file PUT connect"""
        ...
        with Timeout(self.app.node_timeout):
            resp = conn.getexpect()
        ...
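
A rough sketch of option 1 - waiting for the 100-continue responses from all
object servers in parallel instead of one after the other. This is not actual
Swift code: make_conn and send_data are stand-ins, and the real proxy reads the
client body once and fans each chunk out to all backends, so this only
illustrates the control flow, not the data path:

    from eventlet import GreenPile, Timeout

    def connect_and_send(nodes, make_conn, send_data, node_timeout):
        # As soon as a backend has answered the Expect: 100-continue
        # handshake, start streaming to it rather than waiting for all.
        def _one(node):
            try:
                conn = make_conn(node)          # connect + send headers
                with Timeout(node_timeout):
                    resp = conn.getexpect()     # wait for "100 Continue"
                if resp.status != 100:
                    return None
                send_data(conn)                 # start streaming right away
                return conn
            except (Timeout, Exception):
                return None

        pile = GreenPile(len(nodes))
        for node in nodes:
            pile.spawn(_one, node)
        return [conn for conn in pile if conn is not None]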



*Performance Optimization-2:* The object server serializes the container_update
calls after the data write.

*Problem today:* On a PUT request, after writing the data and metadata, the
object server calls container_update(), which is serialized across all
storage nodes (3-way). Each container update takes 3 ms, which adds up
to 9 ms for the container_update to complete.

*Optimization:* Can we make this parallel using green threads, and
probably *return success on the first successful container update* if
there is no connection error? I am trying to understand whether this would
have any data-integrity issues; can you please provide your feedback on
this?

*(FYI:* this saves at least 5 ms.)
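
A sketch of the idea (illustrative only, not the object server's code): fan the
container updates out onto green threads and unblock the PUT as soon as one of
them succeeds, falling through only when every update has failed:

    from eventlet import event, spawn_n

    def parallel_container_update(update_fns):
        # update_fns: one zero-argument callable per container replica,
        # each returning True on a successful update.
        done = event.Event()
        remaining = [len(update_fns)]

        def _run(fn):
            try:
                ok = fn()
            except Exception:
                ok = False
            if ok and not done.ready():
                done.send(True)
            remaining[0] -= 1
            if remaining[0] == 0 and not done.ready():
                done.send(False)    # every replica update failed

        for fn in update_fns:
            spawn_n(_run, fn)
        return done.wait()          # True on the first success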



*Performance Optimization-3:* write(metadata) in the object server takes 2 to
3 ms.

*Problem today:* After writing the data to the file, writer.put(metadata)
-> _finalize_put() processes the post-write operations. This takes an
average of 3 ms for every PUT request.

*Optimization:*

*Option 1:* Is it possible to flush the file (or a group of files)
asynchronously in _finalize_put()?

*Option 2:* Can we make this put(metadata) an asynchronous call, so that the
container update can happen in parallel? Error conditions must be handled
properly.
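
A sketch of option 1 under a strong assumption - that answering the PUT before
the fsync has completed is acceptable, which is exactly the durability question
to settle first. This is illustrative only, not the DiskFile code:

    import os
    from eventlet import spawn_n, tpool

    def finalize_put_async(fd, tmppath, target_path, on_durable):
        # Push the blocking fsync into eventlet's OS thread pool and do the
        # rename + callback from a green thread, so the PUT request path
        # does not wait on the disk flush.
        def _finalize():
            tpool.execute(os.fsync, fd)       # blocking syscall, off the hub
            os.rename(tmppath, target_path)   # make the object visible
            os.close(fd)
            on_durable()
        spawn_n(_finalize)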



I would like to know whether any work has already been done in this area, so as
not to repeat the effort.



The motivation for this work is that 30 ms for a single 4K I/O looks too high.
With that, the only way to scale is to add more servers. I am trying to see
whether we can achieve anything quickly by modifying some portion of the code,
or whether this would require quite a bit of code rewrite.



Also, please suggest whether this approach of working on the latency of a single
PUT request is the right one.





Thanks

-kiru



*From:* Shyam Kaushik [mailto:sh...@zadarastorage.com]
*Sent:* Friday, September 04, 2015 11:53 AM
*To:* Kirubakaran Kaliannan
*Subject:* RE: profiling per I/O logs



*Hi Kiru,*



I listed a couple of optimization options like the ones below. Can you please
write down 3-4 optimizations in a similar format & pass them back to me for a
quick review? Once we finalize them, let's bounce them off the community to see
what they think.



*Performance optimization-1:* Proxy server - on a PUT request, drive the client
side independently of the auth/object-server connection establishment.

*Problem today:* On a PUT request, the client connects and sends the headers to
the proxy server. The proxy server goes to auth and then looks up the ring,
connecting to each of the object servers and sending a header. When the object
servers accept the connection, the proxy server sends HTTP continue to the
client; the client then writes data to the proxy server, and the proxy server
writes the data on to the object servers.

*Optimization:* The proxy server can drive the client side independently of the
backend side, i.e. once auth completes, the proxy server can, through a thread,
send HTTP continue to the client & ask for the data to be written. In the
background it can try to conne

[openstack-dev] [Swift] ObjectController::async_update() latest change

2015-09-01 Thread Kirubakaran Kaliannan
Hi,

Regarding
https://github.com/openstack/swift/commit/2289137164231d7872731c2cf3d81b86f34f01a4

I am profiling each section of the Swift code. I noticed that
ObjectController::async_update() has high latency and tried to use threads
to parallelize the container_update. Then I noticed the above change and
profiled it as well.

On my setup, each per-container async_update takes 2 to 4 ms, and it takes
9-11 ms to complete the 3 async_updates. With the above change this came down
to 7 to 9 ms, but I am expecting it to go further down, to at least 4 ms.
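
(For context, per-section numbers like the ones above can be collected with a
tiny helper of this shape - illustrative only, not part of Swift:)

    import time
    from contextlib import contextmanager

    @contextmanager
    def timed(logger, label):
        # Log how long the wrapped section took, in milliseconds.
        start = time.time()
        try:
            yield
        finally:
            logger.info('%s took %.1f ms', label,
                        (time.time() - start) * 1000)

    # e.g. inside the object server:
    #     with timed(self.logger, 'container_update'):
    #         self.container_update(...)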

1. Do we have any latency numbers for the improvement from the above change?
2. Trying to understand the possibilities: do we really need to wait for all
async_update() calls to complete? Can we just return success once one
async_update() has succeeded?


Thanks,
-kiru

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] swift-dispersion-populate for different storage policies

2015-03-31 Thread Kirubakaran Kaliannan
Hi

I am working on making swift-dispersion-populate and swift-dispersion-report
work with different storage policies (different rings).
It looks like the dispersion containers/objects are hardcoded internally.
Is anyone already working on improving this, or am I the first?

Thanks
kiru

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
