[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-11 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565718#comment-15565718
 ] 

Jeremy Hanna commented on CASSANDRA-7296:
-

Given the use case, and given that it really only applies to CL.ONE, it does 
look like the CL addition is the clearer/cleaner option.  It makes the rest of 
the driver options simpler to reason about because it makes the CL contract 
very clear regardless of the other options.  The driver changes appear to have 
the same level of intrusiveness, and the protocol would have to be updated in 
either case.

Is there a reason why a CL addition couldn't be done in this case? In other 
words, do the edge cases of adding a CL outweigh the clarity of this function 
as a CL?

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563258#comment-15563258
 ] 

 Brian Hess commented on CASSANDRA-7296:


Consistency Level does feel like the right approach. The THIS_ prefix is in 
line with LOCAL_ in that it would identify the locus of nodes that are 
available for consistency. With LOCAL_ONE, we need just one replica from the 
data center of this coordinator. If no replicas exist there (e.g., RF=0 for 
that DC), then you get an UnavailableException. That is, the coordinator does 
not reach out to other nodes and proxy for another DC. Also note that while the 
client can certainly see that it's talking to a DC with no replicas by looking 
at system tables or driver API calls, we still throw the UnavailableException. 

With THIS_ONE, we are saying that the locus of available nodes for the 
consistency level is just the coordinator itself. If that node is not a 
replica, then it should also throw an UnavailableException. It should not 
silently go ask the actual replicas, just as in the LOCAL_ONE case we don't 
ask other DCs. While it is true that the client could know that this node is 
not a replica, the same is true of LOCAL_ONE and RF. 
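
To make the comparison concrete, here is a minimal Python sketch of the "locus" idea described above. It is a toy model, not Cassandra code: THIS_ONE is only the CL name proposed in this thread, and `eligible_replicas`, `dc_of`, and the local `UnavailableException` class are hypothetical names introduced for illustration.

```python
class UnavailableException(Exception):
    """Toy stand-in for Cassandra's UnavailableException."""


def eligible_replicas(cl, coordinator, replicas, dc_of):
    """Return the replicas a coordinator may use for the given CL.

    cl          -- "ONE", "LOCAL_ONE", or the proposed (hypothetical) "THIS_ONE"
    coordinator -- name of the node that received the request
    replicas    -- names of all replicas for the requested partition
    dc_of       -- mapping of node name -> data center name
    """
    if cl == "ONE":
        locus = list(replicas)
    elif cl == "LOCAL_ONE":
        # Only replicas in the coordinator's own data center count.
        locus = [r for r in replicas if dc_of[r] == dc_of[coordinator]]
    elif cl == "THIS_ONE":
        # Only the coordinator itself counts.
        locus = [r for r in replicas if r == coordinator]
    else:
        raise ValueError(f"unsupported consistency level: {cl}")
    if not locus:
        # Never proxy outside the locus; fail instead.
        raise UnavailableException(f"{cl}: no replica available to {coordinator}")
    return locus
```

In this model, THIS_ONE on a non-replica coordinator fails in the same way LOCAL_ONE fails in a data center that holds no replicas.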



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563081#comment-15563081
 ] 

Jeremiah Jordan commented on CASSANDRA-7296:


Just an FYI: changes to the CL enum require changes to every driver as well. CL 
is a protocol-level option, not part of a query string.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563059#comment-15563059
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

It sounds like what you're suggesting is that *every* query setting be moved to 
CQL.  That's a different discussion altogether.  Currently, settings that change 
how the protocol behaves go in the protocol; that's how Cassandra works now.  
Trying to change that behavior by starting with a single feature just leaves 
everyone with inconsistencies in how the driver itself behaves.  



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563043#comment-15563043
 ] 

Edward Capriolo commented on CASSANDRA-7296:


{quote}
I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that 
anywhere, and why would we want that to be part of the statement? Making it 
part of the statement removes the ability to disable dynamic snitch at a per 
query level, including it as part of CQL makes it per prepared statement.
It's not like adding it to the protocol is any different than specifying 
consistency level or a write timestamp.
{quote}

Again, this is how most (if not all) databases do this. The reason is that for 
RDBMS databases the APIs are standard (like JDBC) and you cannot add new 
functionality in the form of new methods.

The point of CQL is that it solves everything in the query language; every weird 
switch that takes something out of the language makes it more like Thrift. It 
is now something that EVERY client driver must implement.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562992#comment-15562992
 ] 

Jeremiah Jordan commented on CASSANDRA-7296:


Besides what the client API looks like, how would people expect this to behave 
if the coordinator is not a replica?  That decision may also affect how the API 
should look from a "least surprises" standpoint. If the CL was "THIS_ONE" I 
would expect no data or possibly an UnavailableException (IIRC this is what you 
get from LOCAL_ in a DC with no replicas). If it was a flag called "prefer 
coordinator" or something then I would expect the request to be coordinated to 
replica nodes if the coordinator wasn't one.
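
The two expectations can be contrasted in a small Python sketch. This is a toy model under stated assumptions, not a proposed implementation: `read_strict`, `read_prefer_coordinator`, the `data` mapping (node name to the value stored locally on that node), and the local exception class are all hypothetical names.

```python
class UnavailableException(Exception):
    """Toy stand-in for Cassandra's UnavailableException."""


def read_strict(coordinator, replicas, data):
    """Strict semantics (a THIS_ONE-style CL): never leave the coordinator."""
    if coordinator not in replicas:
        raise UnavailableException(f"{coordinator} is not a replica")
    return data[coordinator]


def read_prefer_coordinator(coordinator, replicas, data):
    """Flag semantics ("prefer coordinator"): fall back to another replica."""
    target = coordinator if coordinator in replicas else replicas[0]
    return data[target]
```

Only the strict variant guarantees that a successful result came from the node the client connected to; with the flag variant, a mistaken query against a non-replica silently returns another node's data.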



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562990#comment-15562990
 ] 

Blake Eggleston commented on CASSANDRA-7296:


bq. are there scenarios where you would query a non-replica and expect it to 
return nothing rather than proxy the request

It's more the guarantee that you're _definitely_ looking at the data on a 
certain node. If we proxy when non-replicas are queried then you can't be sure 
that you're looking at the data on a certain node. If you've made a mistake 
and queried a non-replica, you'll see data from a different node.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562951#comment-15562951
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

To Blake's point, are there scenarios where you would query a non-replica and 
expect it to return nothing rather than proxy the request?



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562934#comment-15562934
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

I'm very opposed to the /\*disable_snitch=true\*/ syntax.  We don't use that 
anywhere, and why would we want that to be part of the statement?  Making it 
part of the statement removes the ability to disable dynamic snitch at a _per 
query_ level, including it as part of CQL makes it per prepared statement.  

It's not like adding it to the protocol is any different than specifying 
consistency level or a write timestamp.  



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562910#comment-15562910
 ] 

Edward Capriolo commented on CASSANDRA-7296:


{quote}
stmt = session.prepare("SELECT * from tab where id = ?", 
consistency_level=ConsistencyLevel.ONE)
stmt.disable_dynamic_snitch()
{quote}

I think it would be better using more standard SQL for optimizations. This is 
the common way query hints are provided.

{quote}
stmt = session.prepare("SELECT /*disable_snitch=true*/ * from tab where id = ?", 
consistency_level=ConsistencyLevel.ONE)
{quote}

Providing extra methods like this seems Thrift-like. 
{quote}
stmt.disable_dynamic_snitch()
{quote}
This makes an API not a query language.




[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562896#comment-15562896
 ] 

Blake Eggleston commented on CASSANDRA-7296:


Just to clarify, has the goal of the ticket changed to give operators the 
option to always include the coordinator in a read if it's a replica? The goal 
as stated in the ticket description is to give operators the option to perform 
a local only read against the coordinator they’ve connected to, and fail (or 
return nothing) if it's not a replica.

In the context of the original description, combining this option with CLs 
other than ONE doesn’t make much sense.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562870#comment-15562870
 ] 

Edward Capriolo commented on CASSANDRA-7296:


Is https://issues.apache.org/jira/browse/CASSANDRA-8119 a protocol option as 
well?



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562824#comment-15562824
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

{quote}But short of that, why not directly attack the problem we're trying to 
solve and add a (protocol) option to queries to force the behavior ("always 
pick the coordinator as one replica if it's one"). That sounds less confusing 
to me than having a new CL that will confuse newcomers (as the difference with 
ONE is somewhat subtle for a newcomer). As a bonus, it would also work for CL > 
ONE (since again, it'll just be about forcing the dynamic snitch to pick the 
coordinator if it's a replica).
{quote}

This is a reasonable alternative.  I'm not sure if it's useful outside of 
CL=ONE, but there's probably a use case I'm not thinking of.  

Using the Python driver would look something like this, I'm assuming:

{code}
stmt = session.prepare("SELECT * from tab where id = ?", 
consistency_level=ConsistencyLevel.ONE)
stmt.disable_dynamic_snitch()
session.execute(stmt, [1])
{code}

Plus a bit to direct the driver to a particular replica, which has to happen 
regardless. 




[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562814#comment-15562814
 ] 

Jeff Jirsa commented on CASSANDRA-7296:
---

+1 in favor of protocol option, so users can apply it to other CLs as desired. 





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562736#comment-15562736
 ] 

Jeremiah Jordan commented on CASSANDRA-7296:


+1 for protocol option, not new CL.

Also I think the removal of Severity from DynamicEndpointSnitch 
(CASSANDRA-11738) should reduce the times where it does something very screwy 
in picking replicas.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562698#comment-15562698
 ] 

Edward Capriolo commented on CASSANDRA-7296:


{quote}
Basically, despite this being arguably confusing to most, I'm not sure we have 
really quantified the advantage this brings us, which is a shame 
{quote}

It brings one key thing. Clients apply their own logic to decide where to route 
a request, and they do this because they want the lowest latency. We want the 
server to respect the brain power of the client and carry out the operation on 
the node the client chose, not forward the request elsewhere like it 
(sometimes) does now.
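
As a rough illustration of the client-side routing logic mentioned above, the sketch below picks a coordinator from a toy token ring. The ring layout, the integer tokens, and the function names are assumptions made for this example only; real drivers hash the partition key and consult the keyspace's replication strategy rather than walking a hand-built list.

```python
import bisect


def replicas_for_token(token, ring, rf):
    """Return up to rf distinct nodes that own the given token.

    ring -- list of (token, node) pairs sorted by token (toy layout)
    rf   -- replication factor
    """
    tokens = [t for t, _ in ring]
    # First ring position whose token is >= the key's token, wrapping around.
    start = bisect.bisect_left(tokens, token) % len(ring)
    owners = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


def choose_coordinator(token, ring, rf):
    """Token-aware choice: send the request straight to a replica."""
    return replicas_for_token(token, ring, rf)[0]
```

The complaint in this comment is that, even after the client has made such a choice, the coordinator may still hand the read to a different replica.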



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562659#comment-15562659
 ] 

Sylvain Lebresne commented on CASSANDRA-7296:
-

bq. I'm concerned it will prove to be a step backwards in real clusters, where 
coordinator disk latencies may truly jump up significantly

Right, but we also have rapid read protection now that might limit that problem 
reasonably well. But anyway, I'm not making any strong claim here, that's why I 
started my sentence with "I'm not even entirely sure". Basically, despite this 
being arguably confusing to most, I'm not sure we have really quantified the 
advantage this brings us, which is a shame (but it's not like I'm volunteering 
for experimenting here, so...).

To clarify, my main point was that I dislike the idea of providing this through 
a new CL, and I'd rather have that being a protocol level query option (we have 
to change the protocol _anyway_).



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562632#comment-15562632
 ] 

Jeff Jirsa commented on CASSANDRA-7296:
---

{quote}
First, I'm not even entirely sure that letting the dynamic snitch bypass the 
coordinator if it's a replica is a good idea in the first place. Everyone more 
or less agrees that doing token-aware routing is a good thing nowadays, and it's 
certainly confusing that the dynamic snitch may screw that up. If the dynamic 
snitch was a perfect and instantaneous view of latencies, then that could make 
sense, but it's not. Anyway, I think it's worth at least evaluating making even 
the dynamic snitch always pick the local node if it's a replica, as I'm not 
sure the benefit of not doing so outweighs the confusion it creates.
{quote}

Emotionally, I want this to be the right answer (principle of least 
astonishment), but I don't think it is. I'm concerned it will prove to be a 
step backwards in real clusters, where coordinator disk latencies may truly 
jump up significantly. Imagine all compaction threads running scrub/cleanup: 
not only is the disk likely completely utilized, but the number of sstables on 
disk grows because all compaction threads are in use, so reads are more 
expensive than normal. In this case, dsnitch DOES save us, and if this type of 
change were implemented, it would be very hard to work around in production 
with most drivers.





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562407#comment-15562407
 ] 

 Brian Hess commented on CASSANDRA-7296:


Just my 2 cents, but having this be a per-table option is not really a great 
solution for the debugging issue.  I'd be okay if we had that as a default, but 
we'd certainly need to support having this behavior even if that table option 
isn't set (it would be unfortunate to have to ALTER the table to get this 
behavior).



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561789#comment-15561789
 ] 

Sylvain Lebresne commented on CASSANDRA-7296:
-

I can certainly agree that having the dynamic snitch getting in the way of 
local queries is not always desirable, but I'm less convinced that adding a new 
consistency level is the cleanest way to circumvent that.

First, I'm not even entirely sure that letting the dynamic snitch bypass the 
coordinator if it's a replica is a good idea in the first place. Everyone more 
or less agrees that doing token-aware routing is a good thing nowadays, and it's 
certainly confusing that the dynamic snitch may screw that up. If the dynamic 
snitch was a perfect and instantaneous view of latencies, then that could make 
sense, but it's not. Anyway, I think it's worth at least evaluating making even 
the dynamic snitch always pick the local node if it's a replica, as I'm not 
sure the benefit of not doing so outweighs the confusion it creates.

But short of that, why not directly attack the problem we're trying to solve 
and add a (protocol) option to queries to force the behavior ("always pick the 
coordinator as one replica if it's one"). That sounds less confusing to me than 
having a new CL that will confuse newcomers (as the difference with ONE is 
somewhat subtle for a newcomer). As a bonus, it would also work for CL > ONE 
(since again, it'll just be about forcing the dynamic snitch to pick the 
coordinator if it's a replica).

We could also have a table option to do the same: force the dynamic snitch to 
pick the coordinator if it's a replica for all queries on that table, which 
would be a tad more convenient for request pinning (of course, I get that for 
troubleshooting you still want the per-query option).
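
A minimal Python sketch of what such an option could mean for replica selection follows. It is a simplified model, not the DynamicEndpointSnitch implementation: the `scores` mapping (lower is better), the `pin_coordinator` flag, and both function names are hypothetical.

```python
def order_replicas(replicas, scores, coordinator, pin_coordinator=False):
    """Order replicas by latency score, optionally forcing the coordinator first.

    scores -- mapping of node name -> latency score (lower is better)
    """
    ordered = sorted(replicas, key=lambda r: scores[r])
    if pin_coordinator and coordinator in ordered:
        # The option only moves the coordinator to the front when it is a
        # replica; the remaining replicas keep their score order.
        ordered.remove(coordinator)
        ordered.insert(0, coordinator)
    return ordered


def read_targets(replicas, scores, coordinator, required, pin_coordinator=False):
    """Pick the first `required` replicas, e.g. 1 for ONE, 2 for QUORUM at RF=3."""
    return order_replicas(replicas, scores, coordinator, pin_coordinator)[:required]
```

Because the flag only reorders the candidate list, the same logic applies when more than one replica is required, and a non-replica coordinator still falls back to score order.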




[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561602#comment-15561602
 ] 

Stefan Podkowinski commented on CASSANDRA-7296:
---

bq. There's cases where an operator needs to know exactly what's on a specific 
node. CL.COORDINATOR_ONLY is useful for debugging all sorts of production 
issues. Dynamic snitch makes CL=ONE not an effective way of determining what's 
on a specific node.

This. Most people are not even aware of this behavior and get confused by 
different results. There definitely should be a way to query individual nodes 
deterministically even if it's "just" for troubleshooting.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556085#comment-15556085
 ] 

Blake Eggleston commented on CASSANDRA-7296:


Agreed, this would be useful in testing and troubleshooting.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556007#comment-15556007
 ] 

Tupshin Harper commented on CASSANDRA-7296:
---

Given the fresh activity, I'd like to re-emphasize my support for this ticket. 
I think node/data debugging via request pinning is an excellent use of it, and 
is basically the original reason for the ticket. Spark turned out to be an 
irrelevant tangent, but there is significant benefit in supporting this 
(degeneratively simple) form of consistency. If [~jjirsa]'s patch is still 
applicable (or can be), I'd love to see it given a fair shake.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556008#comment-15556008
 ] 

Chris Lohfink commented on CASSANDRA-7296:
--

I could see this being useful in writing tests.



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1978#comment-1978
 ] 

Edward Capriolo commented on CASSANDRA-7296:


{quote}
 Since there's little upside to this, and quite a bit of potential downside
{quote}

This is really useful if you want to do user-generated request pinning. ONE 
allows the node to proxy the request away based on what the dynamic snitch 
wants to do.

{quote}
New consistency levels tend to introduce a lot of edge-case bugs, and this one 
is particularly special, which probably means extra bugs.
{quote}

I am not following this logic. Why do previous attempts that added buggy or 
incomplete features stand as a reason not to add new features?



[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1951#comment-1951
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

I'd like to resurrect this.  There are cases where an operator needs to know 
exactly what's on a specific node.  CL.COORDINATOR_ONLY is useful for debugging 
all sorts of production issues.
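
The semantics requested in this ticket (a read that is never distributed and 
fails unless the coordinator owns the row) can be illustrated with a toy model. 
This is only a sketch under stated assumptions: Ring, UnavailableError and 
coordinator_only_read are hypothetical names, replica placement is a simplified 
"next distinct nodes on the ring" rule, and none of it is taken from the 
Cassandra code base or from the patch attached to this issue.

```python
from bisect import bisect_left


class UnavailableError(Exception):
    """Raised when the coordinator is not a replica for the requested token."""


class Ring:
    """Toy token ring (hypothetical): a token's owner plus the next distinct nodes."""

    def __init__(self, token_owners, rf):
        # token_owners maps a ring token (int) to the node that owns it.
        self.tokens = sorted(token_owners)
        self.owners = token_owners
        self.rf = rf

    def replicas(self, token):
        # First ring position whose token is >= the key's token, wrapping around.
        start = bisect_left(self.tokens, token) % len(self.tokens)
        found = []
        for i in range(len(self.tokens)):
            node = self.owners[self.tokens[(start + i) % len(self.tokens)]]
            if node not in found:
                found.append(node)
            if len(found) == self.rf:
                break
        return found


def coordinator_only_read(ring, coordinator, token, local_data):
    """Serve the read from the coordinator's own data, or fail; never proxy."""
    if coordinator not in ring.replicas(token):
        raise UnavailableError(f"{coordinator} is not a replica for token {token}")
    return local_data[coordinator].get(token)
```

In this model, a read sent to a node that is not a replica raises 
UnavailableError instead of being forwarded, which is the behaviour an operator 
would rely on when inspecting what a specific node holds.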

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2015-02-23 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334203#comment-14334203
 ] 

Tyler Hobbs commented on CASSANDRA-7296:


It seems like the primary motivation here was performance and the use-case 
mentioned in CASSANDRA-6340.  The performance concern seems to have been 
addressed thoroughly by Piotr, and as I commented on 6340, I'm not sure this is 
the best solution there either.

New consistency levels tend to introduce a lot of edge-case bugs, and this one 
is particularly special, which probably means extra bugs.  Since there's little 
upside to this, and quite a bit of potential downside, I vote for closing this 
as Won't Fix.

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>Assignee: Jeff Jirsa
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2015-02-20 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329916#comment-14329916
 ] 

Jeff Jirsa commented on CASSANDRA-7296:
---

An untested patch is available at 
https://github.com/jeffjirsa/cassandra/compare/cassandra-7296.diff 
CQLSH requires an updated/patched Python driver, available at 
https://github.com/jeffjirsa/python-driver/compare/coordinator-only.diff 

I have no idea if there's interest in actually merging this (that is, if the 
project actually wants CL.COORDINATOR_ONLY). I can see use cases where people 
might want it. I'm not sure if it's worth the added complexity on the project. 
If someone confirms there's interest, I'll do more thorough testing. 

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-17 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249655#comment-14249655
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7296:
---

Honestly, I don't like this idea, for the following reasons:

# It seems to add quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd actually like to 
split token ranges so that they are smaller and there are more Spark tasks? 
This is important for bigger jobs to protect against sudden failures and to 
avoid recomputing too much when a Spark partition is lost.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that 
fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem that hasn't been proven in 
practice yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (the constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: If data is large enough to not fit in page cache 
(hundreds of GBs on a single node), 256 additional random seeks are not going 
to cause a huge penalty because:
  *** some of them can be hidden by splitting those queries between separate 
Spark threads, so they would be submitted and executed in parallel
  *** each token range will be *hundreds* of MBs in size, which is large 
enough to hide one or two seeks
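
The point about splitting token ranges into smaller pieces so that there are 
more Spark tasks can be sketched as follows. This is a minimal illustration, 
not the Spark connector's actual splitting logic: split_range is a hypothetical 
helper, it assumes integer tokens and a non-wrapping range (start, end], and it 
splits by token width only, without looking at how much data each piece holds.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into contiguous sub-ranges.

    Returns a list of (sub_start, sub_end) pairs that cover (start, end]
    exactly once, in order. At most `parts` pairs are returned; fewer if the
    range contains fewer than `parts` tokens.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("wrapping or empty ranges are not handled in this sketch")
    width = end - start
    # Never create more sub-ranges than there are tokens to distribute.
    parts = min(parts, width)
    # Integer arithmetic keeps the bounds exact for 64-bit token values.
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each resulting pair could back one Spark partition, so losing a partition only 
requires re-reading a fraction of the original range.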

Some *real* performance problems we (and users) observed:
 * Cassandra takes plenty of CPU when doing sequential scans. It is not 
possible to saturate the bandwidth of a single laptop spinning HDD, because all 
cores of an i7 CPU @2.4 GHz are 100% busy processing those small CQL cells, 
merging rows from different SSTables, ordering cells, filtering out tombstones, 
serializing etc. The problem doesn't go away after doing a full compaction or 
disabling vnodes. This is a serious problem, because doing exactly the same 
query on a plain text file stored in CFS (still C*, but data stored as 2MB 
blobs) gives a 3-30x performance boost (depending on who did the benchmark). We 
need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
 * We need to improve the backpressure mechanism at least in such a way that 
the driver or Spark connector would know to start throttling writes if the 
cluster doesn't keep up. Currently Cassandra just times out the writes, but 
once that happens, the driver has no clue how long to wait until it is ok to 
resubmit the update. It would actually be good to know well before timing out, 
so we could slow down and avoid wasteful retrying altogether. Currently it is 
not possible to predict cluster load by e.g. observing write latency, because 
the latency is extremely good until it is suddenly terrible (timeout). This is 
also important for other non-Spark related use cases. See 
https://issues.apache.org/jira/browse/CASSANDRA-7937.
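
To make the backpressure point concrete: without a signal from the cluster, a 
client can only react after a timeout has already happened. The sketch below 
shows that purely reactive approach as an additive-increase, 
multiplicative-decrease limit on in-flight writes. AimdThrottle and its 
parameters are hypothetical names chosen for illustration; this is not what the 
driver or the Spark connector does, and it is exactly the kind of guesswork 
that a real backpressure signal would make unnecessary.

```python
class AimdThrottle:
    """Additive-increase / multiplicative-decrease cap on in-flight writes."""

    def __init__(self, initial=64, minimum=1, maximum=1024, step=1, backoff=0.5):
        self.limit = initial      # current cap on concurrent writes
        self.minimum = minimum    # never throttle below this
        self.maximum = maximum    # never grow beyond this
        self.step = step          # added to the cap after each success
        self.backoff = backoff    # cap is multiplied by this after a timeout

    def on_success(self):
        # Probe gently for more capacity while writes keep succeeding.
        self.limit = min(self.maximum, self.limit + self.step)

    def on_timeout(self):
        # A timeout is the only overload signal available, so cut hard.
        self.limit = max(self.minimum, int(self.limit * self.backoff))

    def may_send(self, in_flight):
        # Caller passes the number of writes currently awaiting a response.
        return in_flight < self.limit
```

Because latency stays good until it suddenly turns into a timeout, a scheme 
like this only slows down after some writes have already been wasted.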




> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-17 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250222#comment-14250222
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

Good points.  I think this issue would result in other, perhaps more serious, 
problems making an appearance.  I am not convinced, however, that NUM_TOKENS = 
NUM_QUERIES is the right solution on the spark side either, under the case of 
(data  disk  disk_type == spinning_rust).  I think we can move any future 
discussion to the driver JIRA and reference this from there.

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.





[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-13 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245489#comment-14245489
 ] 

Jon Haddad commented on CASSANDRA-7296:
---

I suspect this would help with Spark since it currently has to do 1 query per 
token range.  Kind of a mess doing 256 queries that all look at the same 
sstables.
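
For illustration, the "1 query per token range" pattern looks roughly like 
this. It is only a sketch: range_queries is a hypothetical helper, the table 
and partition key names are placeholders, and the input is assumed to be the 
list of non-wrapping (start, end] token ranges owned by one node. The "?" 
markers are ordinary CQL bind markers for a prepared statement.

```python
def range_queries(table, partition_key, owned_ranges):
    """Build one (cql, bind_values) pair per token range owned by a node."""
    cql = (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) > ? AND token({partition_key}) <= ?"
    )
    # Same statement text every time; only the bound tokens differ.
    return [(cql, (start, end)) for start, end in owned_ranges]
```

With 256 vnodes per node, owned_ranges has 256 entries, so a full scan of one 
node issues 256 separate statements against the same set of sstables.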

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.


