Re: execute is faster than execute_async?

2019-12-12 Thread Avi Kivity


On 12/12/2019 06.25, lampahome wrote:



On Thu, Dec 12, 2019 at 12:42 AM Jon Haddad <j...@jonhaddad.com> wrote:


I'm not sure how you're measuring this - could you share your
benchmarking code?

Or am I missing some detail about them?


start = time.time()
for i in range(40960):
    prep = session.prepare(query, (args))
    session.execute(prep) # or session.execute_async(prep)
print('time', time.time()-start)

It is just like the code snippet above.
execute_async() almost always takes more time than plain execute().



I think you're just exposing Python and perhaps driver weaknesses.


With .execute(), memory usage stays constant and you suffer the round 
trip time once per loop.


With .execute_async(), memory usage grows, and if there is any algorithm 
in the driver that is not O(1) (say to maintain the outstanding request 
table), execution time grows as you push more and more requests. The 
thread(s) that process responses have to contend with the request 
issuing thread over locks. You don't suffer the round trip time, but 
from your results the other issues dominate.



If you also collect the responses in your loop, and bound the number 
of outstanding requests to a reasonable number, you'll see execute_async 
perform better. You'll see even better performance if you drop Python 
for a language more suitable for the data plane.
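
A minimal sketch of that idea, keeping a bounded window of outstanding requests and collecting each response before issuing more. FakeSession, STATEMENT, and run_windowed are hypothetical names made up for illustration; FakeSession only imitates the execute_async()/result() shape with a thread pool so the sketch runs without a cluster, and the window size of 64 is an arbitrary assumption.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

STATEMENT = "INSERT INTO t (id) VALUES (%s)"  # hypothetical statement text


class FakeSession:
    """Hypothetical stand-in for a driver session; no cluster is contacted."""

    def __init__(self, latency=0.001, workers=32):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._latency = latency

    def _run(self, statement, params):
        time.sleep(self._latency)  # simulated network round trip
        return (statement, params)

    def execute_async(self, statement, params=None):
        # Returns a standard-library Future, which offers result() like a driver future.
        return self._pool.submit(self._run, statement, params)

    def shutdown(self):
        self._pool.shutdown()


def run_windowed(session, statement, rows, max_in_flight=128):
    """Issue one async request per row, keeping at most max_in_flight outstanding."""
    in_flight = deque()
    results = []
    for params in rows:
        if len(in_flight) >= max_in_flight:
            # Window is full: collect the oldest response before issuing another request.
            results.append(in_flight.popleft().result())
        in_flight.append(session.execute_async(statement, params))
    # Drain the requests that are still outstanding.
    while in_flight:
        results.append(in_flight.popleft().result())
    return results


session = FakeSession()
rows = [(i,) for i in range(1000)]
results = run_windowed(session, STATEMENT, rows, max_in_flight=64)
session.shutdown()
print(len(results), "responses collected")
```

Memory stays bounded by the window size, and the issuing loop only pays a round trip when the window is full.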




Re: execute is faster than execute_async?

2019-12-11 Thread lampahome
On Thu, Dec 12, 2019 at 12:42 AM Jon Haddad wrote:

> I'm not sure how you're measuring this - could you share your benchmarking
> code?
>
>> Or am I missing some detail about them?
>>
>
start = time.time()
for i in range(40960):
    prep = session.prepare(query, (args))
    session.execute(prep) # or session.execute_async(prep)
print('time', time.time()-start)

It is just like the code snippet above.
execute_async() almost always takes more time than plain execute().


Re: execute is faster than execute_async?

2019-12-11 Thread Jon Haddad
I'm not sure how you're measuring this - could you share your benchmarking
code?

I ask because execute calls execute_async under the hood:
https://github.com/datastax/python-driver/blob/master/cassandra/cluster.py#L2316
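
As I read that code, the relationship is roughly "execute() is execute_async() followed by a blocking wait on the returned future". Here is a toy model of that shape only. ToySession is a hypothetical class, not the driver's Session, and it uses a standard-library Future in place of the driver's own future type.

```python
from concurrent.futures import Future


class ToySession:
    """Toy model (not the driver's class): a synchronous call built on an asynchronous one."""

    def execute_async(self, query):
        future = Future()
        # A real driver would complete this later from an I/O thread;
        # the toy version completes it immediately.
        future.set_result("rows for: " + query)
        return future

    def execute(self, query, timeout=None):
        # The synchronous call is the asynchronous call plus an immediate wait.
        return self.execute_async(query).result(timeout)


session = ToySession()
print(session.execute("SELECT 1"))
```

If that reading is right, a single execute() cannot be cheaper than a single execute_async(); any difference in a benchmark comes from how many requests are in flight and when the results are collected.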

I tested the Python driver a while back and found some weird behavior due to
the way its non-blocking code was implemented.  IIRC there were some sleep
calls thrown in there to get around Python's threading inadequacies.  I
can't remember if this code path is avoided when you use the execute() call.

Jon


On Wed, Dec 11, 2019 at 3:09 AM lampahome  wrote:

> I submitted one row 40960 times with session.execute() and with
> session.execute_async().
>
> I found that the total time of execute() is always less than that of
> execute_async().
>
> Does that make sense? Or am I missing some detail about them?
>


Re: execute is faster than execute_async?

2019-12-11 Thread Reid Pinchback
Also note that you should be expecting async operations to be slower on a 
call-by-call basis.  Async protocols have added overhead.  The point of them 
really is to leave the client free to interleave other computing activity 
between the async calls.  It’s not usually a better way to do batch writing. 
That’s not an observation specific to C*, that’s just about understanding the 
role of async operations in computing.

There is some subtlety with distributed services like C* where you’re 
round-robining the calls around the cluster, where repeated async calls can win 
relative to sync because you aren’t waiting to hand off the next unit of work 
to a different node, but once the activity starts to queue up on any kind of 
resource, even just TCP buffering, you’ll likely be back to a situation where 
all you are measuring is the net difference in protocol overhead for async vs 
sync.

One of the challenges with performance testing is you have to be pretty clear 
on what exactly it is you are exercising, or all you can conclude from 
different numbers is that different numbers can exist.
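
To make that concrete, here is a rough sketch of a harness that reports separately what each number covers: the full synchronous loop, the time to merely submit the async requests, and the time until every async response has arrived. FakeSession and benchmark are hypothetical names; the fake session sleeps instead of talking to a cluster, so the absolute numbers mean nothing and only the separation of the three measurements is the point.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeSession:
    """Hypothetical stand-in for a driver session; sleeps instead of contacting a cluster."""

    def __init__(self, latency=0.0005, workers=16):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._latency = latency

    def _run(self, query):
        time.sleep(self._latency)  # simulated round trip
        return query

    def execute_async(self, query):
        return self._pool.submit(self._run, query)

    def execute(self, query):
        return self.execute_async(query).result()

    def shutdown(self):
        self._pool.shutdown()


def benchmark(session, n):
    """Return wall-clock timings, labelled by what each one actually measures."""
    queries = [f"query {i}" for i in range(n)]
    timings = {}

    start = time.perf_counter()
    for query in queries:
        session.execute(query)
    timings["sync_total"] = time.perf_counter() - start

    start = time.perf_counter()
    futures = [session.execute_async(query) for query in queries]
    # Requests are still in flight here: this is submission cost only.
    timings["async_submit_only"] = time.perf_counter() - start
    for future in futures:
        future.result()
    # Measured from the same start, so it includes waiting for every reply.
    timings["async_total"] = time.perf_counter() - start
    return timings


session = FakeSession()
timings = benchmark(session, 200)
session.shutdown()
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

A loop that stops the clock right after the last execute_async() call is reporting the second number, which is not comparable to the first.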

R

From: Alexander Dejanovski 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, December 11, 2019 at 7:44 AM
To: user 
Subject: Re: execute is faster than execute_async?

Hi,

you can check this piece of documentation from Datastax: 
https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async

The usual way of doing this is to send a bunch of execute_async() calls, adding 
the returned futures to a list. Once the list reaches the chosen threshold 
(usually we send around 100 queries and wait for them to finish before moving 
on to the next ones), loop through the futures and call the result() method on 
each one to block until it completes.
It should look like this:


futures = []
for i in range(len(queries)):
    futures.append(session.execute_async(queries[i]))
    if len(futures) >= 100 or i == len(queries)-1:
        for future in futures:
            results = future.result() # will block until the query finishes
        futures = []  # empty the list



Haven't tested the code above but it should give you an idea on how this can be 
implemented.
Sending hundreds/thousands of queries without waiting for a result will DDoS 
the cluster, so you should always implement some throttling.

Cheers,

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Wed, Dec 11, 2019 at 10:42 AM Jordan West <jorda...@gmail.com> wrote:
I’m not very familiar with the python client unfortunately. If it helps: In 
Java, async would return futures and at the end of submitting each batch you 
would block on them by calling get.

Jordan

On Wed, Dec 11, 2019 at 1:37 AM lampahome <pahome.c...@mirlab.org> wrote:

On Wed, Dec 11, 2019 at 4:34 PM Jordan West <jorda...@gmail.com> wrote:
Hi,

Have you tried batching calls to execute_async with periodic blocking for the 
batch’s responses?

Can you give me some keywords for batching execute_async calls?

PS: I use the Python version.


Re: execute is faster than execute_async?

2019-12-11 Thread Alexander Dejanovski
Hi,

you can check this piece of documentation from Datastax:
https://docs.datastax.com/en/developer/python-driver/3.20/api/cassandra/cluster/#cassandra.cluster.Session.execute_async

The usual way of doing this is to send a bunch of execute_async() calls,
adding the returned futures to a list. Once the list reaches the chosen
threshold (usually we send around 100 queries and wait for them to finish
before moving on to the next ones), loop through the futures and call the
result() method on each one to block until it completes.
It should look like this:

futures = []
for i in range(len(queries)):
    futures.append(session.execute_async(queries[i]))
    if len(futures) >= 100 or i == len(queries)-1:
        for future in futures:
            results = future.result() # will block until the query finishes
        futures = []  # empty the list


Haven't tested the code above but it should give you an idea on how this
can be implemented.
Sending hundreds/thousands of queries without waiting for a result will
DDoS the cluster, so you should always implement some throttling.
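
Another way to throttle, sketched here under assumptions, is a semaphore released from completion callbacks, so that a new query is issued as soon as any earlier one finishes instead of waiting for a whole batch. As far as I know the driver's response future offers add_callbacks(callback, errback); FakeFuture and FakeSession below are hypothetical stand-ins that only imitate that shape so the sketch runs without a cluster, and run_throttled is a made-up helper name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeFuture:
    """Hypothetical stand-in exposing add_callbacks(callback, errback)."""

    def __init__(self, inner):
        self._inner = inner

    def add_callbacks(self, callback, errback):
        def dispatch(done):
            error = done.exception()
            if error is None:
                callback(done.result())
            else:
                errback(error)
        self._inner.add_done_callback(dispatch)


class FakeSession:
    """Hypothetical stand-in for a driver session; no cluster is contacted."""

    def __init__(self, workers=16):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _run(self, query):
        time.sleep(0.001)  # simulated round trip
        return query

    def execute_async(self, query):
        return FakeFuture(self._pool.submit(self._run, query))

    def shutdown(self):
        self._pool.shutdown()


def run_throttled(session, queries, max_in_flight=100):
    """Issue queries asynchronously with at most max_in_flight outstanding."""
    slots = threading.Semaphore(max_in_flight)
    lock = threading.Lock()
    results, errors = [], []

    def on_success(rows):
        with lock:
            results.append(rows)
        slots.release()  # free a slot for the next query

    def on_error(exc):
        with lock:
            errors.append(exc)
        slots.release()

    for query in queries:
        slots.acquire()  # blocks while max_in_flight requests are outstanding
        session.execute_async(query).add_callbacks(on_success, on_error)

    # Re-acquiring every slot succeeds only after all callbacks have fired.
    for _ in range(max_in_flight):
        slots.acquire()
    return results, errors


session = FakeSession()
queries = [f"query {i}" for i in range(500)]
results, errors = run_throttled(session, queries, max_in_flight=50)
session.shutdown()
print(len(results), "succeeded,", len(errors), "failed")
```

Results arrive in completion order, not submission order, so carry an identifier with each query if the order matters.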

Cheers,

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Wed, Dec 11, 2019 at 10:42 AM Jordan West  wrote:

> I’m not very familiar with the python client unfortunately. If it helps:
> In Java, async would return futures and at the end of submitting each batch
> you would block on them by calling get.
>
> Jordan
>
> On Wed, Dec 11, 2019 at 1:37 AM lampahome  wrote:
>
>>
>>
>> On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:
>>
>>> Hi,
>>>
>>> Have you tried batching calls to execute_async with periodic blocking
>>> for the batch’s responses?
>>>
>>
>> Can you give me some keywords for batching execute_async calls?
>>
>> PS: I use the Python version.
>>
>


Re: execute is faster than execute_async?

2019-12-11 Thread Jordan West
I’m not very familiar with the python client unfortunately. If it helps: In
Java, async would return futures and at the end of submitting each batch
you would block on them by calling get.

Jordan

On Wed, Dec 11, 2019 at 1:37 AM lampahome  wrote:

>
>
> On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:
>
>> Hi,
>>
>> Have you tried batching calls to execute_async with periodic blocking for
>> the batch’s responses?
>>
>
> Can you give me some keywords for batching execute_async calls?
>
> PS: I use the Python version.
>


Re: execute is faster than execute_async?

2019-12-11 Thread lampahome
On Wed, Dec 11, 2019 at 4:34 PM Jordan West wrote:

> Hi,
>
> Have you tried batching calls to execute_async with periodic blocking for
> the batch’s responses?
>

Can you give me some keywords for batching execute_async calls?

PS: I use the Python version.


Re: execute is faster than execute_async?

2019-12-11 Thread Jordan West
Hi,

Have you tried batching calls to execute_async with periodic blocking for
the batch’s responses? I’ve witnessed this behavior as well with large or
no batches, and while I didn’t have time to investigate fully, it’s likely due
to message queuing behavior within Cassandra (pre-4.0). Smaller batches or
execute() alleviate the queueing issues and spread the load more evenly
across the time quanta being measured.

Jordan


On Wed, Dec 11, 2019 at 12:09 AM lampahome  wrote:

> I submitted one row 40960 times with session.execute() and with
> session.execute_async().
>
> I found that the total time of execute() is always less than that of
> execute_async().
>
> Does that make sense? Or am I missing some detail about them?
>


execute is faster than execute_async?

2019-12-11 Thread lampahome
I submitted one row 40960 times with session.execute() and with
session.execute_async().

I found that the total time of execute() is always less than that of
execute_async().

Does that make sense? Or am I missing some detail about them?