Re: execute is faster than execute_async?
On 12/12/2019 06.25, lampahome wrote:
> On Thu, Dec 12, 2019 at 12:42 AM Jon Haddad wrote:
>> I'm not sure how you're measuring this - could you share your benchmarking code?
>
>     start = time.time()
>     for i in range(40960):
>         prep = session.prepare(query, (args))
>         session.execute(prep)  # or session.execute_async(prep)
>     print('time', time.time()-start)
>
> Just like the code snippet above. execute_async() almost always takes more time than normal execute().

I think you're just exposing Python and perhaps driver weaknesses.

With .execute(), memory usage stays constant and you suffer the round trip time once per loop.

With .execute_async(), memory usage grows, and if there is any algorithm in the driver that is not O(1) (say, to maintain the outstanding request table), execution time grows as you push more and more requests. The thread(s) that process responses have to contend with the request-issuing thread over locks. You don't suffer the round trip time, but from your results the other issues dominate.

If you also collected responses in your loop, and also bounded the number of outstanding requests to a reasonable number, you'd see execute_async performing better. You'll see even better performance if you drop Python for a language more suitable for the data plane.
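The advice above (collect responses, and bound the number of outstanding requests) can be sketched roughly as follows. This is only an illustration, not driver code: `run_bounded` and its `submit` argument are hypothetical names, and the futures are assumed to behave like `concurrent.futures.Future` (that is, to offer `add_done_callback()` and `result()`). As far as I recall, the Python driver's ResponseFuture exposes `add_callbacks()` instead, so the release hook would need adapting before using this with a real session.

```python
import threading


def run_bounded(submit, items, max_in_flight=100):
    """Call submit(item) for every item, keeping at most max_in_flight
    requests outstanding, and return the results in submission order.

    submit is a hypothetical callable that starts one request and returns
    a future with add_done_callback() and result().
    """
    slots = threading.Semaphore(max_in_flight)
    futures = []
    for item in items:
        slots.acquire()  # blocks while the window of requests is full
        future = submit(item)
        # Free the slot once this request completes, whether it
        # succeeded or failed.
        future.add_done_callback(lambda _f: slots.release())
        futures.append(future)
    # Collect every response so that errors surface and any timing
    # around this call includes completion, not just submission.
    return [f.result() for f in futures]
```

Unlike a fixed-size chunk, this keeps the window full: a new request goes out as soon as any earlier one finishes. The list of futures still grows with the number of items, so for very long runs one would consume results as they complete instead of keeping them all.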
Re: execute is faster than execute_async?
On Thu, Dec 12, 2019 at 12:42 AM Jon Haddad wrote:
> I'm not sure how you're measuring this - could you share your benchmarking code?

    start = time.time()
    for i in range(40960):
        prep = session.prepare(query, (args))
        session.execute(prep)  # or session.execute_async(prep)
    print('time', time.time()-start)

Just like the code snippet above. execute_async() almost always takes more time than normal execute().
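Two things in the snippet above may distort the comparison: the statement is prepared again on every iteration inside the timed loop, and the execute_async() variant never waits for any response before the clock stops. A minimal sketch of a more even comparison is below. `time_sync`, `time_async`, `prepared` and `rows` are names made up for this example; the sketch assumes `prepared = session.prepare(query)` was called once beforehand and that the bound values are passed to execute() / execute_async() as the second argument.

```python
import time


def time_sync(session, prepared, rows):
    """Time len(rows) blocking execute() calls, one after another."""
    start = time.time()
    for params in rows:
        session.execute(prepared, params)
    return time.time() - start


def time_async(session, prepared, rows):
    """Time the same work with execute_async(), including the wait for
    every response; without that wait only submission is measured."""
    start = time.time()
    futures = [session.execute_async(prepared, params) for params in rows]
    for future in futures:
        future.result()  # block until this request has completed
    return time.time() - start
```

Note that `time_async` submits everything at once, which is fine for a small test only. For something like 40960 rows, the number of outstanding requests should be capped, as suggested elsewhere in this thread.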
Re: execute is faster than execute_async?
I'm not sure how you're measuring this - could you share your benchmarking code?

I ask because execute calls execute_async under the hood:
https://github.com/datastax/python-driver/blob/master/cassandra/cluster.py#L2316

I tested the Python driver a ways back and found some weird behavior due to the way its non-blocking code was implemented. IIRC there were some sleep calls thrown in there to get around Python's threading inadequacies. I can't remember if this code path is avoided when you use the execute() call.

Jon

On Wed, Dec 11, 2019 at 3:09 AM lampahome wrote:
> I submit 1 row 40960 times with session.execute() and with session.execute_async().
>
> I found that the total time with execute() is always shorter than with execute_async().
>
> Does that make sense? Or am I missing some details about them?
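The relationship Jon describes (the blocking call being built on the async one) can be pictured with a toy stand-in. `SketchSession` and `handler` are invented for this illustration and are not the driver's classes; the real implementation is behind the link above.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchSession:
    """Hypothetical stand-in for a driver session; not the real class."""

    def __init__(self, handler, workers=4):
        self._handler = handler  # pretends to be the server round trip
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def execute_async(self, query):
        # Hand the request off and return a future immediately.
        return self._pool.submit(self._handler, query)

    def execute(self, query):
        # The blocking call is the async call plus an immediate wait.
        return self.execute_async(query).result()
```

Seen this way, a single execute_async() call cannot be cheaper than the work execute() does to submit a request; any gain has to come from overlapping several requests before waiting on them.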
Re: execute is faster than execute_async?
Also note that you should be expecting async operations to be slower on a call-by-call basis. Async protocols have added overhead. The point of them really is to leave the client free to interleave other computing activity between the async calls. It’s not usually a better way to do batch writing. That’s not an observation specific to C*, that’s just about understanding the role of async operations in computing.

There is some subtlety with distributed services like C* where you’re round-robining the calls around the cluster, where repeated async calls can win relative to sync because you aren’t waiting to hand off the next unit of work to a different node, but once the activity starts to queue up on any kind of resource, even just TCP buffering, you’ll likely be back to a situation where all you are measuring is the net difference in protocol overhead for async vs sync.

One of the challenges with performance testing is you have to be pretty clear on what exactly it is you are exercising, or all you can conclude from different numbers is that different numbers can exist.

R

From: Alexander Dejanovski
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, December 11, 2019 at 7:44 AM
To: user
Subject: Re: execute is faster than execute_async?

> Hi,
>
> you can check this piece of documentation from Datastax:
> https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async
>
> The usual way of doing this is to send a bunch of execute_async() calls, adding the returned futures to a list. Once the list reaches the chosen threshold (usually we send around 100 queries and wait for them to finish before moving on to the next ones), loop through the futures and call the result() method on each to block until it completes.
>
> Should look like this:
>
>     futures = []
>     for i in range(len(queries)):
>         futures.append(session.execute_async(queries[i]))
>         if len(futures) >= 100 or i == len(queries)-1:
>             for future in futures:
>                 results = future.result()  # will block until the query finishes
>             futures = []  # empty the list
>
> Haven't tested the code above but it should give you an idea on how this can be implemented. Sending hundreds/thousands of queries without waiting for a result will DDoS the cluster, so you should always implement some throttling.
>
> Cheers,
>
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Wed, Dec 11, 2019 at 10:42 AM Jordan West wrote:
>> I’m not very familiar with the python client unfortunately. If it helps: In Java, async would return futures and at the end of submitting each batch you would block on them by calling get.
>>
>> Jordan
>>
>> On Wed, Dec 11, 2019 at 1:37 AM lampahome wrote:
>>> On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:
>>>> Hi,
>>>>
>>>> Have you tried batching calls to execute_async with periodic blocking for the batch’s responses?
>>>
>>> Can you give me some keywords for batching execute_async calls?
>>>
>>> PS: I use the Python version.
Re: execute is faster than execute_async?
Hi,

you can check this piece of documentation from Datastax:
https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async

The usual way of doing this is to send a bunch of execute_async() calls, adding the returned futures to a list. Once the list reaches the chosen threshold (usually we send around 100 queries and wait for them to finish before moving on to the next ones), loop through the futures and call the result() method on each to block until it completes.

Should look like this:

    futures = []
    for i in range(len(queries)):
        futures.append(session.execute_async(queries[i]))
        if len(futures) >= 100 or i == len(queries)-1:
            for future in futures:
                results = future.result()  # will block until the query finishes
            futures = []  # empty the list

Haven't tested the code above but it should give you an idea on how this can be implemented. Sending hundreds/thousands of queries without waiting for a result will DDoS the cluster, so you should always implement some throttling.

Cheers,

Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On Wed, Dec 11, 2019 at 10:42 AM Jordan West wrote:
> I’m not very familiar with the python client unfortunately. If it helps: In Java, async would return futures and at the end of submitting each batch you would block on them by calling get.
>
> Jordan
>
> On Wed, Dec 11, 2019 at 1:37 AM lampahome wrote:
>> On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:
>>> Hi,
>>>
>>> Have you tried batching calls to execute_async with periodic blocking for the batch’s responses?
>>
>> Can you give me some keywords for batching execute_async calls?
>>
>> PS: I use the Python version.
Re: execute is faster than execute_async?
I’m not very familiar with the python client unfortunately. If it helps: In Java, async would return futures and at the end of submitting each batch you would block on them by calling get.

Jordan

On Wed, Dec 11, 2019 at 1:37 AM lampahome wrote:
> On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:
>> Hi,
>>
>> Have you tried batching calls to execute_async with periodic blocking for the batch’s responses?
>
> Can you give me some keywords for batching execute_async calls?
>
> PS: I use the Python version.
Re: execute is faster than execute_async?
On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:
> Hi,
>
> Have you tried batching calls to execute_async with periodic blocking for the batch’s responses?

Can you give me some keywords for batching execute_async calls?

PS: I use the Python version.
Re: execute is faster than execute_async?
Hi,

Have you tried batching calls to execute_async with periodic blocking for the batch’s responses? I’ve witnessed this behavior as well with large or no batches, and while I didn’t have time to investigate fully, it’s likely due to message queuing behavior within Cassandra (pre-4.0). Smaller batches or execute() alleviate the queueing issues and spread the load more evenly across the time quanta being measured.

Jordan

On Wed, Dec 11, 2019 at 12:09 AM lampahome wrote:
> I submit 1 row 40960 times with session.execute() and with session.execute_async().
>
> I found that the total time with execute() is always shorter than with execute_async().
>
> Does that make sense? Or am I missing some details about them?
execute is faster than execute_async?
I submit 1 row 40960 times with session.execute() and with session.execute_async().

I found that the total time with execute() is always shorter than with execute_async().

Does that make sense? Or am I missing some details about them?