[ 
https://issues.apache.org/jira/browse/CASSANDRA-11320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs resolved CASSANDRA-11320.
-------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 3.0.x)
                       (was: 3.x)
                   3.5
                   3.0.5

bq. I believe they should be fine as hidden options?

Yes, that seems fine to me.  I forgot that the dtest was using them.

bq. Is there a better way?

I'm not sure; I didn't spend a lot of time digging into it.  That seems fine to 
me for now, though.

bq. We could back-port it to 2.2 as well.

Since this is a fairly large change, let's keep it in 3.x for now.  If we see 
this problem in the wild in 2.2, we can look at backporting it.

bq. once the code review is completed, I will take care of merging to other 
branches and running CI.

In order to get this before the 3.5 code freeze, I went ahead and merged this 
myself and ran the relevant dtests locally.  Aside from one [minor dtest 
tweak|https://github.com/riptano/cassandra-dtest/commit/0b0047617c8a38562d3714e1e6b9d744396b255f]
 everything looks good.

So, +1, committed as {{98086b65d0bc76631a3aeb50cddd8c9a82bc05b9}} to 3.0 and 
merged upwards.  Thanks!

> Improve backoff policy for cqlsh COPY FROM
> ------------------------------------------
>
>                 Key: CASSANDRA-11320
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11320
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Stefania
>            Assignee: Stefania
>              Labels: doc-impacting
>             Fix For: 3.0.5, 3.5
>
>
> Currently we have an exponential back-off policy in COPY FROM that kicks in 
> when timeouts are received. However, there are two limitations:
> * it does not cover new requests, and therefore we may not back off 
> sufficiently to give an overloaded server time to recover
> * the pause is performed in the receiving thread, and therefore we may not 
> process server messages quickly enough
> There is a static throttling mechanism, in rows per second, from the feeder 
> to the worker processes (the INGESTRATE), but the feeder has no idea of the 
> load of each worker process. However, it's easy to keep track of how many 
> chunks a worker process has yet to read by introducing a bounded semaphore.
> The idea is to move the back-off pauses to the worker process's main thread 
> so as to include all messages, new and retries, not just the retries that 
> timed out. The worker process will not read new chunks during the back-off 
> pauses, and the feeder process can then look at the number of pending chunks 
> before sending new chunks to a worker process.
> [~aholmber], [~aweisberg] what do you think?  
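
The idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed cqlsh code: the names {{ExpBackoff}} and {{WorkerChannel}}, the {{base}}, {{cap}} and {{max_pending}} parameters, and the use of threads rather than separate processes are all hypothetical. It shows a pause that doubles with each consecutive timeout, and a bounded semaphore that lets the feeder see when a worker still has unread chunks.

```python
import collections
import threading


class ExpBackoff:
    """Hypothetical policy: the pause doubles with each consecutive timeout."""

    def __init__(self, base=0.1, cap=5.0):
        self.base = base      # pause after the first timeout, in seconds
        self.cap = cap        # upper bound on any single pause
        self.timeouts = 0     # consecutive timeouts seen so far

    def on_timeout(self):
        self.timeouts += 1

    def on_success(self):
        self.timeouts = 0

    def pause(self):
        """Seconds the worker's main thread should sleep before its next send."""
        if self.timeouts == 0:
            return 0.0
        return min(self.cap, self.base * 2 ** (self.timeouts - 1))


class WorkerChannel:
    """Hypothetical feeder-to-worker channel with a bounded number of unread chunks."""

    def __init__(self, max_pending):
        # One slot per chunk the worker may have pending at any time.
        self._slots = threading.BoundedSemaphore(max_pending)
        self._chunks = collections.deque()

    def try_send(self, chunk):
        """Feeder side: refuse (return False) when the worker has no free slot."""
        if not self._slots.acquire(blocking=False):
            return False
        self._chunks.append(chunk)
        return True

    def read(self):
        """Worker side: take the oldest chunk and free its slot."""
        chunk = self._chunks.popleft()
        self._slots.release()
        return chunk
```

In such a design the worker would sleep for {{pause()}} seconds in its main thread before calling {{read()}}, so it stops draining chunks while backing off, and the feeder would skip any worker for which {{try_send()}} returns False.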



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
