[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-12-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508068#comment-13508068
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

Okay, let's leave UpdateSampleLatencies alone (although as a matter of style I'd 
prefer to inline it as an anonymous Runnable).

Thinking more about the core functionality:

- a RetryType of one pre-emptive redundant data read would be a useful 
alternative to ALL.  (If supporting both makes things more complex, I would 
vote for just supporting the single extra read.)  E.g., for a CL.ONE read it 
would perform two data reads; for CL.QUORUM it would perform two data reads and 
a digest read.  Put another way, it would do the same extra data read 
Xpercentile would, but it would do it ahead of the threshold timeout.
- ISTM we should continue to use RDR for normal (non-RR) SR reads, and just 
accept the first data reply that comes back without comparing it to others.  
This makes the most sense to me semantically, and keeps CL.ONE reads 
lightweight.
- I think it's incorrect (again, in the non-RR case) to perform a data read 
against the same host we sent a digest read to.  Consider CL.QUORUM: I send a 
data read to replica X and a digest to replica Y.  X is slow to respond.  Doing 
a data read to Y won't help, since I need both to meet my CL.  I have to do my 
SR read to replica Z, if one exists and is alive.
- We should probably extend this to doing extra digest reads for CL > ONE, when 
we get the data read back quickly but the digest read is slow.
- SR + RR is the tricky part... this is where SR could result in data and 
digests from the same host.  So ideally, we want the ability to compare 
(potentially) multiple data reads, *and* multiple digests, *and* track the 
source for CL purposes, which neither RDR nor RRR is equipped to do.  Perhaps 
we should just force all reads to data reads for SR + RR [or even for all RR 
reads], to simplify this.

Finally,
- millis may be too coarse a grain here, especially for Custom settings.  
Currently an in-memory read will typically be under 2ms and it's quite possible 
we can get that down to 1 if we can purge some of the latency between stages.  
Might as well use micros since Timer gives it to us for free, right?
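
For what it's worth, here is a minimal sketch of the micros idea (assuming 
metrics-core 2.x; the class and method names are made up, not from the attached 
patch): a Timer created with a microsecond duration unit already hands back 
microsecond values from its Snapshot, so the extra precision costs nothing.

{code:java}
// Minimal sketch (assumption: metrics-core 2.x): track read latencies in a
// Timer whose duration unit is microseconds, and read any percentile from it.
import java.util.concurrent.TimeUnit;

import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Timer;
import com.yammer.metrics.stats.Snapshot;

public class ReadLatencyTracker
{
    private final Timer readLatency =
        Metrics.newTimer(ReadLatencyTracker.class, "ReadLatency",
                         TimeUnit.MICROSECONDS, TimeUnit.SECONDS);

    public void record(long durationNanos)
    {
        readLatency.update(durationNanos, TimeUnit.NANOSECONDS);
    }

    // e.g. thresholdMicros(0.95) for Xpercentile = 95
    public double thresholdMicros(double quantile)
    {
        Snapshot snapshot = readLatency.getSnapshot();
        return snapshot.getValue(quantile); // already in microseconds
    }
}
{code}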

 Speculative execution for CL_ONE
 

 Key: CASSANDRA-4705
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4705
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Vijay
Assignee: Vijay
Priority: Minor
 Attachments: 0001-CASSANDRA-4705.patch, 0001-CASSANDRA-4705-v2.patch


 When read_repair is not 1.0, we send the request to one node for some of the 
 requests. When a node goes down or when a node is too busy, the client has to 
 wait for the timeout before it can retry. 
 It would be nice to watch for latency and execute an additional request to a 
 different node if the response is not received within the average/99th 
 percentile of the response times recorded in the past.
 CASSANDRA-2540 might be able to solve the variance when read_repair is set to 
 1.0.
 1) Maybe we need to use metrics-core to record various percentiles.
 2) Modify ReadCallback.get to execute an additional request speculatively.



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-11-23 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503260#comment-13503260
 ] 

Vijay commented on CASSANDRA-4705:
--

Hi Jonathan, Sorry for the delay.

{quote}
Would it make more sense to move getReadLatencyRate and UpdateSampleLatencies 
into SR? That way we could replace case statements with polymorphism.
{quote}
The problem is that the expensive percentile calculation has to be done 
asynchronously using a scheduled TPE. We can avoid the switch by introducing an 
additional SRFactory, which would initialize the TPE whenever the per-CF 
settings change? Let me know.
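
For illustration, the scheduled-TPE idea might look roughly like this 
(hypothetical names, not the patch's actual code): the expensive percentile 
calculation runs periodically in the background and the read path only reads a 
cached value.

{code:java}
// Hypothetical sketch of the scheduled-TPE approach: recompute the
// percentile off the request path and cache it for cheap lookups.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import com.yammer.metrics.core.Timer;

public class SpeculativeThresholdUpdater
{
    private static final ScheduledExecutorService updater =
        Executors.newSingleThreadScheduledExecutor();

    private final AtomicLong cachedThreshold = new AtomicLong(Long.MAX_VALUE);

    public SpeculativeThresholdUpdater(final Timer readLatency, final double quantile)
    {
        // the UpdateSampleLatencies idea: refresh the cached percentile
        // periodically instead of computing it on every read
        updater.scheduleWithFixedDelay(new Runnable()
        {
            public void run()
            {
                cachedThreshold.set((long) readLatency.getSnapshot().getValue(quantile));
            }
        }, 100, 100, TimeUnit.MILLISECONDS);
    }

    // cheap lookup on the read path
    public long threshold()
    {
        return cachedThreshold.get();
    }
}
{code}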

{quote}
Why does preprocess return a boolean now?
{quote}
The current patch uses the boolean to indicate whether the processing was done 
or not. It is used by RCB, after the patch, when the coordinator receives more 
than one response from the same host (when SR is on and the actual read 
response arrives at the same time as the speculated response); we should not 
count that response towards the consistency level.
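
To illustrate the point (a hypothetical sketch, not the patch itself): the 
callback could remember which replicas have already been counted, so a second 
reply from the same host is ignored for CL purposes.

{code:java}
// Hypothetical sketch: count each replica at most once toward the
// consistency level, even if it sends more than one response (e.g. the
// original read and the speculated read both complete).
import java.net.InetAddress;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ResponseCounter
{
    private final Set<InetAddress> counted =
        Collections.newSetFromMap(new ConcurrentHashMap<InetAddress, Boolean>());
    private final AtomicInteger received = new AtomicInteger();

    // returns true if this response should count toward the CL
    public boolean preprocess(InetAddress from)
    {
        if (!counted.add(from))
            return false; // duplicate reply from the same host
        received.incrementAndGet();
        return true;
    }

    public boolean satisfies(int blockFor)
    {
        return received.get() >= blockFor;
    }
}
{code}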

{quote}
How does/should SR interact with RR? Using ALL + RRR
{quote}
Currently we do an additional read to double-check whether we need to write; I 
thought the goal of ALL was to eliminate that and do an additional write 
instead... In most cases it will be a memtable update :)
I can think of 2 options:
1) Just document the ALL case and live with the additional writes; it might not 
be a big issue for most users, and the rest can switch back to the default 
behavior.
2) We can queue the repair mutations, and in the async thread check whether 
there are duplicate mutations pending... if yes, we can just ignore the 
duplicates. This can be done by doing sendRR and adding the CF to be repaired 
to a HashSet (at the cost of some additional memory footprint); a rough sketch 
follows.
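
For illustration only, option 2 might be shaped roughly like this (hypothetical 
class and method names, not code from the attached patch): pending repairs are 
tracked per (CF, key) so duplicates can be dropped before sendRR.

{code:java}
// Hypothetical sketch of option 2: dedupe queued read-repair mutations
// by (column family, row key) before sending them with sendRR.
import java.nio.ByteBuffer;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RepairMutationDeduper
{
    // (CF name, row key) pairs with a repair mutation already queued;
    // this is the extra memory footprint mentioned above
    private final Set<SimpleImmutableEntry<String, ByteBuffer>> pending =
        Collections.newSetFromMap(
            new ConcurrentHashMap<SimpleImmutableEntry<String, ByteBuffer>, Boolean>());

    // returns true if the caller should go ahead and sendRR this repair
    public boolean shouldSend(String cfName, ByteBuffer key)
    {
        return pending.add(new SimpleImmutableEntry<String, ByteBuffer>(cfName, key.duplicate()));
    }

    // called from the async thread once the repair write has actually been sent
    public void markSent(String cfName, ByteBuffer key)
    {
        pending.remove(new SimpleImmutableEntry<String, ByteBuffer>(cfName, key));
    }
}
{code}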

Should we move this discussion to a different ticket?

Let me know, Thanks!



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-11-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13501877#comment-13501877
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

Avro is just used for upgrading from 1.0 schemas, so we shouldn't need to touch 
that anymore.

Would it make more sense to move getReadLatencyRate and UpdateSampleLatencies 
into SR?  That way we could replace case statements with polymorphism.
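
For concreteness, the polymorphism suggestion could be shaped roughly like this 
(an illustrative sketch only; the names are made up): each retry flavor knows 
its own threshold, so callers never switch on the setting.

{code:java}
// Illustrative sketch: one subclass per speculative-retry flavor instead of
// a case statement; each one decides when the extra read should fire.
import java.util.concurrent.TimeUnit;

public abstract class SpeculativeRetry
{
    // how long to wait before firing the extra read; Long.MAX_VALUE means never
    public abstract long thresholdMicros();

    public static final class Never extends SpeculativeRetry
    {
        public long thresholdMicros() { return Long.MAX_VALUE; }
    }

    public static final class Always extends SpeculativeRetry
    {
        public long thresholdMicros() { return 0; } // send the extra read up front
    }

    public static final class CustomMillis extends SpeculativeRetry
    {
        private final long micros;
        public CustomMillis(long millis) { this.micros = TimeUnit.MILLISECONDS.toMicros(millis); }
        public long thresholdMicros() { return micros; }
    }

    public static final class Percentile extends SpeculativeRetry
    {
        // supplier of the asynchronously sampled latency percentile
        public interface LatencySupplier { long latestThresholdMicros(); }

        private final LatencySupplier latencies;
        public Percentile(LatencySupplier latencies) { this.latencies = latencies; }
        public long thresholdMicros() { return latencies.latestThresholdMicros(); }
    }
}
{code}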

Can you split the AbstractReadExecutor refactor out from the speculative 
execution code?  That would make it easier to isolate the changes in review.

Why does preprocess return a boolean now?

How does/should SR interact with RR?  Using ALL + RRR means we're probably 
going to do a lot of unnecessary repair writes in a high-update environment 
(i.e., it would be normal for one replica to be slightly behind the others on a 
read), which is probably not what we want.  It's also unclear to me what happens 
when we use RDR and do an SR when we've also requested extra digests for RR, and 
we get a data read and a digest from the same replica.



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-10-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13472500#comment-13472500
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

So I guess we could support {ALL, Xpercentile, Yms, NONE} where X and Y are 
both doubles?
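
For concreteness, parsing such a setting might look like this (purely an 
illustrative sketch; the class name is made up): ALL and NONE are keywords, 
everything else is a double followed by either "percentile" or "ms".

{code:java}
// Illustrative sketch of parsing a speculative_retry value of the form
// ALL | NONE | <X>percentile | <Y>ms, where X and Y may be doubles.
public final class RetryOption
{
    public enum Kind { ALL, PERCENTILE, CUSTOM_MS, NONE }

    public final Kind kind;
    public final double value; // percentile (0-100) or milliseconds; unused for ALL/NONE

    private RetryOption(Kind kind, double value)
    {
        this.kind = kind;
        this.value = value;
    }

    public static RetryOption parse(String setting)
    {
        String s = setting.trim().toUpperCase();
        if (s.equals("ALL"))
            return new RetryOption(Kind.ALL, 0);
        if (s.equals("NONE"))
            return new RetryOption(Kind.NONE, 0);
        if (s.endsWith("PERCENTILE"))
            return new RetryOption(Kind.PERCENTILE,
                                   Double.parseDouble(s.substring(0, s.length() - "PERCENTILE".length())));
        if (s.endsWith("MS"))
            return new RetryOption(Kind.CUSTOM_MS,
                                   Double.parseDouble(s.substring(0, s.length() - "MS".length())));
        throw new IllegalArgumentException("Unrecognized speculative_retry value: " + setting);
    }
}
{code}

So values like 95percentile, 10.5ms, ALL and NONE would all parse.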



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-10-09 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13472535#comment-13472535
 ] 

Vijay commented on CASSANDRA-4705:
--

Cool, let me work on the patch soon... 
{quote}
are both doubles?
{quote}
Well, it will be a long, in ms :)



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-10-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13472552#comment-13472552
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

our history has been that sooner or later someone always wants fractional ms, 
but I'm fine w/ long (or int) :)



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-10-05 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470461#comment-13470461
 ] 

Chris Burroughs commented on CASSANDRA-4705:


bq. Looks like metrics-core exposes 75, 95, 97, 99 and 99.9

Reporters have a limited set (i.e., you can't generate new values that will pop 
up in JMX on the fly), but in code you should be able to get at any percentile 
you want: 
https://github.com/codahale/metrics/blob/2.x-maintenance/metrics-core/src/main/java/com/yammer/metrics/stats/Snapshot.java#L54
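
A minimal sketch of that, for reference (the histogram name here is made up):

{code:java}
// Minimal sketch: any quantile can be pulled from a metrics-core 2.x
// Snapshot, not just the fixed set exposed by the reporters.
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Histogram;
import com.yammer.metrics.stats.Snapshot;

public class PercentileExample
{
    public static void main(String[] args)
    {
        Histogram latencies = Metrics.newHistogram(PercentileExample.class, "ReadLatency", true);
        for (long micros = 1; micros <= 1000; micros++)
            latencies.update(micros);

        Snapshot snapshot = latencies.getSnapshot();
        System.out.println("p95   = " + snapshot.getValue(0.95));
        System.out.println("p99.9 = " + snapshot.getValue(0.999));
        System.out.println("p42   = " + snapshot.getValue(0.42)); // any quantile works
    }
}
{code}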



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-10-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470470#comment-13470470
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

Thanks Chris!



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-30 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466559#comment-13466559
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

Well, we have a pretty short list of possibilities from metrics...  I guess we 
could add auto95, auto97, auto99 options?



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-29 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466390#comment-13466390
 ] 

Vijay commented on CASSANDRA-4705:
--

I pushed the prototype code into 
https://github.com/Vijay2win/cassandra/commit/62bbabfc41ba8e664eb63ba50110e5f5909b2a87

Looks like metrics-core exposes the 75, 95, 97, 99 and 99.9 percentiles. In my 
tests 75P is too low and 99P is too high to make a difference, whereas 95P 
handles the long tail better (a moving average doesn't make much of a 
difference either). 

I still think we should also support a hard-coded value in addition to the auto :)

Note: speculative_retry still has to be made part of the schema; currently, if 
you want to test it out, it is a code change in CFMetaData.



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462762#comment-13462762
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

I don't like the idea of making users manually specify thresholds.  They will 
usually get it wrong, and we have latency histograms that should let us do a 
better job automagically.

But I could see the value of a setting to allow disabling it when you know your 
CF has a bunch of different query types being thrown at it.  Something like 
speculative_retry = {off, automatic, full} where full is Peter's full data 
reads to each replica.



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-24 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461805#comment-13461805
 ] 

Jonathan Ellis commented on CASSANDRA-4705:
---

FTR I'm not sure CL.ONE is going to be substantially easier than generalizing 
to all CL.



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-24 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462450#comment-13462450
 ] 

Peter Schuller commented on CASSANDRA-4705:
---

99% based on what time period? If the period is too short, you won't get the 
full impact since you'll pollute the track record. If it's too large, consider 
the traffic increase resulting from a prolonged hiccup. Will you be able to hide 
typical GC pauses? Then you'd better have the window be larger than 250 ms. What 
about full GCs? How do you determine what the p99 is for a node that shares 
multiple replica sets? If a single node goes into full GC, how do you keep 
latency unaffected while still capping the number of backup requests at a 
reasonable number? If you don't cap it, the optimization is more dangerous than 
useful, since it just means you'll fall over in various hard-to-predict emergent 
situations if you expect to take advantage of fewer reads when provisioning your 
cluster. What's an appropriate cap? How do you scale that with RF and 
consistency level? How do you explain this to the person who has to figure out 
how much capacity is needed for a cluster?

In our case, we pretty much run all our clusters with RR turned fully up - not 
necessarily for RR purposes, but for the purpose of more deterministic 
behavior. You don't want things falling over when a replica goes down. If you 
don't have the iops/CPU to handle all replicas processing all requests for a 
replica set, you're at risk of falling over (i.e., you don't scale, because 
failures are common in large clusters) - unless you over-provision, but then 
you might as well go with all data reads to begin with.

I am not arguing against the idea of backup requests, but I *strongly* 
recommend simply going for the trivial and obvious route of full data reads 
*first* and getting the obvious pay-off with no increase in complexity (I would 
even argue it's a *decrease* in complexity in terms of the behavior of the 
system as a whole, especially from the perspective of a human understanding 
emergent cluster behavior) - and then slowly develop something like this, with 
very careful thought to all the edge cases and implications of it.

I'm in favor of long-term *predictable* performance. Full data reads is a very 
very easy way to achieve that, and vastly better latency, in many cases (the 
bandwidth saturation case pretty much being the major exception; CPU savings 
aren't really relevant with Cassandra's model if you expect to survive nodes 
being down). It's also very easy for a human to understand the behavior when 
looking at graphs of system behavior in some event, and trying to predict what 
will happen, or explain what did happen.

I really think the drawbacks of full data reads are being massively 
over-estimated and the implications of lack of data reads massively 
under-estimated.




[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-24 Thread Peter Schuller (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462456#comment-13462456
 ] 

Peter Schuller commented on CASSANDRA-4705:
---

Here's a good example of a complexity implication that I just thought of (and 
it's stuff like this I'm worried about w.r.t. complexity): How do you split 
requests into groups within which to do latency profiling? If you don't, you'll 
easily end up having the expensive requests always be processed multiple times, 
because they always hit the backup path (because they are expensive and thus 
latent). So you could very easily eat up all your intended benefit by having 
the very expensive requests take the backup path. Without knowledge of the 
nature of the requests, and since we cannot reliably just assume a homogeneous 
request pattern, you would probably need some non-trivial way of classifying 
requests and relating that classification to the statistics being kept.

In some cases, having it be a per-cf setting might be enough. In other cases 
that's not feasible - for example, maybe you're doing slicing on large rows, and 
maybe it's impossible to determine from an incoming request whether it's 
expensive or not (the range may be large but result in only a single column, for 
example).

What if you don't care about the latency of the legitimately expensive 
requests, but about the cheap ones? And what if those legitimately expensive 
requests consume your 1% (p99), such that none of the cheaper requests are 
subject to backup requests? Now you get none of the benefit, but you still take 
the brunt of the cost you'd have if you just went with full data reads.

I'm sure there are many other concerns I'm not thinking of; this was meant as 
an example of how it can be hard to make this actually work the way it's 
intended.




[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-23 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461464#comment-13461464
 ] 

Brandon Williams commented on CASSANDRA-4705:
-

bq. It would be nice to watch for latency and execute an additional request to 
a different node

Isn't this what the dsnitch does to some degree?



[jira] [Commented] (CASSANDRA-4705) Speculative execution for CL_ONE

2012-09-23 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461478#comment-13461478
 ] 

Vijay commented on CASSANDRA-4705:
--

No, the dsnitch watches for the latency but doesn't do the latter. It won't 
speculate/execute duplicate requests to another host if the response times are 
> x%. 

I think this patch will be in addition to the dsnitch, something like what 
Jonathan posted in CASSANDRA-2540:

{quote}
I like the approach described in 
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
 of doing backup requests if the original doesn't reply within N% of normal.
{quote}
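
To make the backup-request idea concrete, a rough sketch of the wait-then-retry 
shape (hypothetical; the helper methods here are made up and are not the real 
ReadCallback API):

{code:java}
// Hypothetical sketch of a backup request: wait up to the observed latency
// threshold for the first replica, then fire one extra read at a different
// replica and wait out the remaining rpc timeout.
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public abstract class SpeculatingReadCallback
{
    protected final CountDownLatch condition = new CountDownLatch(1);

    // assumed helpers, not real Cassandra methods in this form
    protected abstract void sendRead(InetAddress replica);
    protected abstract Object resolve();

    public Object get(List<InetAddress> replicas, long thresholdMicros, long rpcTimeoutMillis)
        throws InterruptedException
    {
        sendRead(replicas.get(0));

        // normal case: the first replica answers within the threshold
        if (!condition.await(thresholdMicros, TimeUnit.MICROSECONDS) && replicas.size() > 1)
        {
            // too slow: speculate one extra read to a different node
            sendRead(replicas.get(1));
            condition.await(rpcTimeoutMillis, TimeUnit.MILLISECONDS);
        }
        return resolve();
    }
}
{code}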
