Also note that you should expect async operations to be slower on a 
call-by-call basis, because async protocols add overhead.  Their real purpose 
is to leave the client free to interleave other computing activity between 
the async calls; they are not usually a better way to do batch writing. 
That observation isn't specific to C*; it's just about understanding the 
role of async operations in computing.
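
As a rough illustration of that interleaving, here is a minimal sketch that uses Python's standard concurrent.futures as a stand-in for a driver's async call. The names slow_write, local_work and write_and_compute are hypothetical, and the sleep only mimics network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_write(item):
    """Hypothetical stand-in for a network write; the sleep mimics latency."""
    time.sleep(0.05)
    return item


def local_work(n):
    """Hypothetical client-side computation done while the write is pending."""
    return sum(range(n))


def write_and_compute(item, n):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_write, item)  # returns immediately
        computed = local_work(n)                # runs while the write is in flight
        written = future.result()               # block only when the result is needed
    return written, computed
```

The benefit here comes from the overlap between the write and the local work, not from the individual call getting any cheaper.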

There is some subtlety with distributed services like C*, where the calls are 
round-robined around the cluster.  There, repeated async calls can win 
relative to sync because you aren't waiting to hand off the next unit of work 
to a different node.  But once activity starts to queue up on any kind of 
resource, even just TCP buffering, you'll likely be back to a situation where 
all you are measuring is the net difference in protocol overhead for async vs 
sync.

One of the challenges with performance testing is you have to be pretty clear 
on what exactly it is you are exercising, or all you can conclude from 
different numbers is that different numbers can exist.

R

From: Alexander Dejanovski <a...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, December 11, 2019 at 7:44 AM
To: user <user@cassandra.apache.org>
Subject: Re: execute is faster than execute_async?

Hi,

you can check this piece of documentation from Datastax: 
https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async

The usual way of doing this is to send a bunch of execute_async() calls, adding 
the returned futures to a list. Once the list reaches the chosen threshold 
(usually we send around 100 queries and wait for them to finish before moving 
on to the next ones), loop through the futures and call result() on each one 
to block until it completes.
It should look like this:


futures = []
for i in range(len(queries)):
    futures.append(session.execute_async(queries[i]))
    if len(futures) >= 100 or i == len(queries)-1:
        for future in futures:
            results = future.result() # will block until the query finishes
        futures = []  # empty the list



Haven't tested the code above but it should give you an idea on how this can be 
implemented.
Sending hundreds/thousands of queries without waiting for a result will DDoS 
the cluster, so you should always implement some throttling.
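
One possible way to throttle without waiting for a whole chunk to drain is to cap the number of in-flight requests with a semaphore. The sketch below is untested against a real cluster and makes some assumptions: execute_throttled and max_in_flight are hypothetical names, and session is assumed to be a Python driver Session whose execute_async() returns a future supporting add_callbacks().

```python
import threading


def execute_throttled(session, queries, max_in_flight=100):
    """Run queries via session.execute_async with at most max_in_flight pending."""
    slots = threading.BoundedSemaphore(max_in_flight)
    lock = threading.Lock()
    results = [None] * len(queries)
    errors = []

    def on_success(rows, index):
        results[index] = rows
        slots.release()  # free a slot for the next query

    def on_error(exc, index):
        with lock:
            errors.append((index, exc))
        slots.release()

    for index, query in enumerate(queries):
        slots.acquire()  # blocks while max_in_flight requests are pending
        try:
            future = session.execute_async(query)
        except Exception:
            slots.release()  # do not leak the slot if submission itself fails
            raise
        future.add_callbacks(on_success, on_error,
                             callback_args=(index,), errback_args=(index,))

    # Wait for the tail: once every slot can be re-acquired, nothing is pending.
    for _ in range(max_in_flight):
        slots.acquire()
    return results, errors
```

Compared with the chunked loop, this keeps the pipeline full: a new query is sent as soon as any earlier one completes, instead of waiting for the slowest query in each group of 100.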

Cheers,

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Wed, Dec 11, 2019 at 10:42 AM Jordan West 
<jorda...@gmail.com> wrote:
I’m not very familiar with the python client unfortunately. If it helps: In 
Java, async would return futures and at the end of submitting each batch you 
would block on them by calling get.

Jordan

On Wed, Dec 11, 2019 at 1:37 AM lampahome 
<pahome.c...@mirlab.org> wrote:


Jordan West <jorda...@gmail.com> wrote on Wed, Dec 11, 2019 at 4:34 PM:
Hi,

Have you tried batching calls to execute_async with periodic blocking for the 
batch’s responses?

Can you give me some keywords to search for on batching execute_async calls?

PS: I use python version.
