Also note that you should expect async operations to be slower on a call-by-call basis. Async protocols add overhead. Their real purpose is to leave the client free to interleave other computing activity between the async calls; they are not usually a better way to do batch writing. That’s not an observation specific to C*, that’s just about understanding the role of async operations in computing.
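To make the interleaving point concrete, here is a minimal sketch using only the Python standard library. `slow_write` is a hypothetical stand-in for a network round trip, not a Cassandra driver call, and the thread pool only plays the role of the async machinery.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_write(row):
    """Hypothetical stand-in for a network round trip (not a driver call)."""
    time.sleep(0.05)
    return row


def write_and_compute(row, pool):
    # Start the "write" without waiting for it.
    future = pool.submit(slow_write, row)
    # The client is free to do unrelated work while the write is in flight.
    checksum = sum(range(1000))
    # Only block once the result is actually needed.
    return future.result(), checksum


with ThreadPoolExecutor(max_workers=1) as pool:
    written, checksum = write_and_compute({"id": 1}, pool)
```

If the client has nothing useful to do between submitting and blocking, the async call buys nothing for that single operation and only its extra overhead remains.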
There is some subtlety with distributed services like C*, where you’re round-robining the calls around the cluster. Repeated async calls can win relative to sync there because you aren’t waiting to hand off the next unit of work to a different node. But once the activity starts to queue up on any kind of resource, even just TCP buffering, you’ll likely be back to a situation where all you are measuring is the net difference in protocol overhead for async vs sync.

One of the challenges with performance testing is you have to be pretty clear on what exactly it is you are exercising, or all you can conclude from different numbers is that different numbers can exist.

R

From: Alexander Dejanovski <a...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 11, 2019 at 7:44 AM
To: user <user@cassandra.apache.org>
Subject: Re: execute is faster than execute_async?

Hi,

you can check this piece of documentation from Datastax: https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async

The usual way of doing this is to send a bunch of execute_async() calls, adding the returned futures to a list. Once the list reaches the chosen threshold (usually we send around 100 queries and wait for them to finish before moving on to the next ones), loop through the futures and call the result() method on each, which blocks until that query completes.
Should look like this:

    futures = []
    for i in range(len(queries)):
        futures.append(session.execute_async(queries[i]))
        if len(futures) >= 100 or i == len(queries) - 1:
            for future in futures:
                results = future.result()  # will block until the query finishes
            futures = []  # empty the list

Haven't tested the code above but it should give you an idea on how this can be implemented.
Sending hundreds/thousands of queries without waiting for a result will DDoS the cluster, so you should always implement some throttling.

Cheers,

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On Wed, Dec 11, 2019 at 10:42 AM Jordan West <jorda...@gmail.com> wrote:

I’m not very familiar with the python client unfortunately. If it helps: In Java, async would return futures and at the end of submitting each batch you would block on them by calling get.

Jordan

On Wed, Dec 11, 2019 at 1:37 AM lampahome <pahome.c...@mirlab.org> wrote:

On Wed, Dec 11, 2019 at 4:34 PM Jordan West <jorda...@gmail.com> wrote:

Hi, Have you tried batching calls to execute_async with periodic blocking for the batch’s responses?

Can you give me some keywords about calling execute_async batch?
PS: I use python version.
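As a variation on the chunked loop in Alexander's reply, the throttling can also be done with a sliding window: keep at most N requests in flight, and block on the oldest one before sending the next. The sketch below is untested against a real cluster. It relies only on execute_async() returning a future with a blocking result(), as described in the thread; `execute_windowed` and `FakeSession` are hypothetical names, and `FakeSession` exists only so the sketch runs without a cluster.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def execute_windowed(session, queries, max_in_flight=100):
    """Run queries via session.execute_async with at most max_in_flight pending."""
    pending = deque()
    results = []
    for query in queries:
        if len(pending) >= max_in_flight:
            # Window is full: block on the oldest request before sending more.
            results.append(pending.popleft().result())
        pending.append(session.execute_async(query))
    # Drain whatever is still in flight.
    while pending:
        results.append(pending.popleft().result())
    return results


class FakeSession:
    """Hypothetical stand-in for a driver session (assumption, not driver code)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)

    def execute_async(self, query):
        # Returns a future whose result() blocks until the "query" finishes.
        return self._pool.submit(lambda: "ok: " + query)

    def shutdown(self):
        self._pool.shutdown()


session = FakeSession()
results = execute_windowed(session, ["q%d" % i for i in range(10)], max_in_flight=3)
session.shutdown()
```

Compared with waiting for a whole chunk of 100 to finish, the window keeps the pipeline closer to full, at the cost of slightly more bookkeeping. The Python driver also ships a helper along these lines, cassandra.concurrent.execute_concurrent, which takes a concurrency argument; check the driver documentation for your version before relying on its exact behavior.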