[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565718#comment-15565718 ] Jeremy Hanna commented on CASSANDRA-7296: Given the use case, and that it really only applies to CL.ONE, it does look like the CL addition is the clearer/cleaner option. It makes using the rest of the driver options simpler to reason about because it makes the CL contract very clear regardless of the other options. The driver changes appear to have the same level of intrusiveness and the protocol would have to be updated in either case. Is there a reason why a CL addition couldn't be done in this case - or in other words, do the edge cases of adding a CL outweigh the clarity of this function as a CL? > Add CL.COORDINATOR_ONLY > --- > > Key: CASSANDRA-7296 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper > > For reasons such as CASSANDRA-6340 and similar, it would be nice to have a > read that never gets distributed, and only works if the coordinator you are > talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563258#comment-15563258 ] Brian Hess commented on CASSANDRA-7296: Consistency Level does feel like the right approach. The THIS_ prefix is in line with LOCAL_ in that it would identify the locus of nodes that are available for consistency. With LOCAL_ONE, we need just one replica from the data center of this coordinator. If no replicas exist (as with RF=0) then you get UnavailableException. Namely, you don't reach out to other nodes and proxy for another DC, etc. Also note that while the client can certainly see that it's talking to a DC with no RF by looking at system tables or driver API calls, we still throw the UnavailableException. With THIS_ONE, we are saying that the locus of available nodes for the consistency level is just the coordinator itself. If that node is not a replica, then it should also throw an UnavailableException. It should not silently go ask the actual replicas, just like in the LOCAL_ONE case we don't ask other DCs. While it is true that the client could know that this node is not a replica, the same is true of LOCAL_ONE and RF.
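The "locus of nodes" analogy above can be sketched as a toy model. This is not Cassandra code: `Node`, `eligible_replicas` and the `Unavailable` exception are hypothetical names invented for illustration, and THIS_ONE is only the level proposed in this thread. The point it tries to show is that a level narrows the set of candidate replicas and fails when that set is empty, rather than silently reaching outside it.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Stand-in for Cassandra's UnavailableException (toy model only)."""


@dataclass(frozen=True)
class Node:
    name: str
    dc: str


def eligible_replicas(level, coordinator, replicas):
    """Return the replicas allowed to satisfy a read at the given level.

    The level only narrows the locus of candidate nodes; it never widens it.
    """
    if level == "ONE":
        locus = list(replicas)
    elif level == "LOCAL_ONE":
        locus = [r for r in replicas if r.dc == coordinator.dc]
    elif level == "THIS_ONE":  # proposed in this thread, hypothetical
        locus = [r for r in replicas if r == coordinator]
    else:
        raise ValueError(f"unsupported level in this sketch: {level}")
    if not locus:
        # No silent proxying to nodes outside the locus.
        raise Unavailable(f"{level}: no eligible replica via {coordinator.name}")
    return locus


# Example topology (assumed): the row lives on node a (dc1) and node c (dc2).
a, b, c = Node("a", "dc1"), Node("b", "dc1"), Node("c", "dc2")
replicas = [a, c]
```

With this topology, a THIS_ONE read through `a` is served by `a`, while the same read through `b` raises `Unavailable` even though `b` could find a replica in its own data center under LOCAL_ONE.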
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563081#comment-15563081 ] Jeremiah Jordan commented on CASSANDRA-7296: Just an FYI: changes to the CL enum require changes to every driver as well. CL is a protocol-level option, not part of a query string.
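To make the "protocol level" point concrete: on the wire, a consistency level is a two-byte code rather than text in the query string, so a peer that does not know a code cannot interpret it. The sketch below is a hedged illustration. The eleven codes are the ones listed in the native protocol specification as far as I know (worth verifying for your protocol version); the `COORDINATOR_ONLY` entry and its code `0x00FF` are placeholders made up for this example, and the helper names are hypothetical.

```python
import struct

# Codes as listed in the native protocol specification (to the best of my
# knowledge; verify against the spec for your protocol version).
CONSISTENCY_CODES = {
    "ANY": 0x0000, "ONE": 0x0001, "TWO": 0x0002, "THREE": 0x0003,
    "QUORUM": 0x0004, "ALL": 0x0005, "LOCAL_QUORUM": 0x0006,
    "EACH_QUORUM": 0x0007, "SERIAL": 0x0008, "LOCAL_SERIAL": 0x0009,
    "LOCAL_ONE": 0x000A,
}
CODE_TO_NAME = {code: name for name, code in CONSISTENCY_CODES.items()}


def encode_consistency(name, codes=CONSISTENCY_CODES):
    """Encode a consistency level as a big-endian unsigned 16-bit value."""
    return struct.pack(">H", codes[name])


def decode_consistency(raw, names=CODE_TO_NAME):
    """Decode two bytes; a code this peer does not know is an error."""
    (code,) = struct.unpack(">H", raw)
    if code not in names:
        raise ValueError(f"unknown consistency code 0x{code:04X}")
    return names[code]


# A peer that already knows the new level (placeholder code, hypothetical).
NEW_CODES = dict(CONSISTENCY_CODES, COORDINATOR_ONLY=0x00FF)
```

Encoding `COORDINATOR_ONLY` with `NEW_CODES` and decoding it with the old table raises `ValueError`, which is the toy equivalent of a driver or server that has not been updated for the new enum value.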
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563059#comment-15563059 ] Jon Haddad commented on CASSANDRA-7296: It sounds like what you're suggesting is that *every* query setting be moved to CQL. That's a different discussion altogether. Currently settings that change how the protocol behaves go in the protocol, that's how Cassandra works now. Trying to change that behavior by starting with a single feature just leaves everyone with inconsistencies in how the driver itself behaves.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563043#comment-15563043 ] Edward Capriolo commented on CASSANDRA-7296: {quote} I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that anywhere, and why would we want that to be part of the statement? Making it part of the statement removes the ability to disable dynamic snitch at a per query level, including it as part of CQL makes it per prepared statement. It's not like adding it to the protocol is any different than specifying consistency level or a write timestamp. {quote} Again, this is how most (if not all) databases do this. The reason is that for RDBMS databases the APIs are standard (like JDBC) and you cannot add new functionality in the form of new methods. The point of CQL is that it solves everything in the query language; every weird switch that takes something out of the language makes it more like Thrift. It is now something that EVERY client driver must implement.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562992#comment-15562992 ] Jeremiah Jordan commented on CASSANDRA-7296: Besides what the client API looks like, how would people expect this to behave if the coordinator is not a replica? That decision may also affect how the API should look from a "least surprises" standpoint. If the CL was "THIS_ONE" I would expect no data or possibly an UnavailableException (IIRC this is what you get from LOCAL_ in a DC with no replicas). If it was a flag called "prefer coordinator" or something then I would expect the request to be coordinated to replica nodes if the coordinator wasn't one.
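The two expectations described above differ only in what happens when the coordinator does not own the row. A minimal sketch, assuming a single-replica read and using hypothetical names (`route_read`, the mode strings `"strict"` and `"prefer"`, and the `Unavailable` stand-in are all invented here):

```python
class Unavailable(Exception):
    """Stand-in for UnavailableException in this toy model."""


def route_read(coordinator, replicas, mode):
    """Pick the node that serves a single-replica read.

    mode="strict": coordinator-only; fail if the coordinator is not a replica.
    mode="prefer": use the coordinator when it is a replica, else proxy.
    """
    if coordinator in replicas:
        return coordinator
    if mode == "strict":
        raise Unavailable(f"{coordinator} does not own this row")
    if mode == "prefer":
        # Proxied: the caller now sees another node's data.
        return replicas[0]
    raise ValueError(f"unknown mode: {mode}")
```

Under `"strict"` a mistaken choice of node is reported as an error; under `"prefer"` it is silently served by a different node, which is why the name of the option matters for the principle of least surprise.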
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562990#comment-15562990 ] Blake Eggleston commented on CASSANDRA-7296:
bq. are there scenarios where you would query a non-replica and expect it to return nothing rather than proxy the request
It's more the guarantee that you're _definitely_ looking at the data on a certain node. If we proxy when non-replicas are queried then you can't be sure that you're looking at the data on a certain node. If you've made a mistake, and queried a non replica, you'll see data from a different node
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562951#comment-15562951 ] Jon Haddad commented on CASSANDRA-7296: To Blake's point, are there scenarios where you would query a non-replica and expect it to return nothing rather than proxy the request?
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562934#comment-15562934 ] Jon Haddad commented on CASSANDRA-7296: I'm very opposed to the /\*disable_snitch=true\*/ syntax. We don't use that anywhere, and why would we want that to be part of the statement? Making it part of the statement removes the ability to disable dynamic snitch at a _per query_ level, including it as part of CQL makes it per prepared statement. It's not like adding it to the protocol is any different than specifying consistency level or a write timestamp.
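The per-statement versus per-query distinction can be illustrated with a toy model. Nothing here is a real driver or CQL feature: `prepare`, `execute`, the dictionary layout and the hint syntax itself are hypothetical, taken only from the proposals in this thread. The sketch assumes a hint embedded in the query text is parsed once at prepare time, whereas an execution-time option can change on every call.

```python
import re

# Hypothetical hint syntax from this thread: /*disable_snitch=true*/
HINT = re.compile(r"/\*\s*disable_snitch\s*=\s*(true|false)\s*\*/")


def prepare(query):
    """Toy 'prepare': the hint is parsed once and frozen into the statement."""
    match = HINT.search(query)
    frozen = bool(match and match.group(1) == "true")
    return {"query": query, "disable_snitch": frozen}


def execute(stmt, params, disable_snitch=None):
    """Toy 'execute': a per-call option overrides whatever was prepared."""
    if disable_snitch is None:
        effective = stmt["disable_snitch"]
    else:
        effective = disable_snitch
    return {"query": stmt["query"], "params": params, "disable_snitch": effective}


hinted = prepare("SELECT /*disable_snitch=true*/ * from tab where id = ?")
plain = prepare("SELECT * from tab where id = ?")
```

In this model, toggling the behavior through the hint alone requires preparing two statements, while the per-call option lets one prepared statement serve both cases.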
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562910#comment-15562910 ] Edward Capriolo commented on CASSANDRA-7296:
{quote}
stmt = session.prepare("SELECT * from tab where id = ?", consistency_level=ConsistencyLevel.ONE)
stmt.disable_dynamic_snitch()
{quote}
I think it would be better using more standard SQL for optimizations. This is the common way query hints are provided.
{quote}
stmt = session.prepare("SELECT /*disable_snitch=true*/ * from tab where id = ?", consistency_level=ConsistencyLevel.ONE)
{quote}
Providing extra methods like this seems thrift like.
{quote}
stmt.disable_dynamic_snitch()
{quote}
This makes an API not a query language.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562896#comment-15562896 ] Blake Eggleston commented on CASSANDRA-7296: Just to clarify, has the goal of the ticket changed to give operators the option to always include the coordinator in a read if it's a replica? The goal as stated in the ticket description is to give operators the option to perform a local only read against the coordinator they’ve connected to, and fail (or return nothing) if it's not a replica. In the context of the original description, combining this option with CLs other than ONE doesn’t make much sense.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562870#comment-15562870 ] Edward Capriolo commented on CASSANDRA-7296: Is https://issues.apache.org/jira/browse/CASSANDRA-8119 a protocol option as well?
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562824#comment-15562824 ] Jon Haddad commented on CASSANDRA-7296: {quote}But short of that, why not directly attack the problem we're trying to solve and add a (protocol) option to queries to force the behavior ("always pick the coordinator as one replica if it's one"). That sounds less confusing to me than have a new CL that will confuse newcomers (as the difference with ONE is somewhat subtle for a newcomer). As a bonus, it would also work for CL > ONE (since again, it'll just be about forcing the dynamic snitch to pick the coordinator if it's a replica). {quote} This is a reasonable alternative. I'm not sure if it's useful outside of CL=ONE, but there's probably a use case I'm not thinking of. Using the Python driver would look something like this, I'm assuming:
{code}
stmt = session.prepare("SELECT * from tab where id = ?", consistency_level=ConsistencyLevel.ONE)
stmt.disable_dynamic_snitch()
session.execute(stmt, [1])
{code}
Plus a bit to direct the driver to a particular replica, which has to happen regardless.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562814#comment-15562814 ] Jeff Jirsa commented on CASSANDRA-7296: +1 in favor of protocol option, so users can apply it to other CLs as desired.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562736#comment-15562736 ] Jeremiah Jordan commented on CASSANDRA-7296: +1 for protocol option, not new CL. Also I think the removal of Severity from DynamicEndpointSnitch (CASSANDRA-11738) should reduce the times where it does something very screwy in picking replicas.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562698#comment-15562698 ] Edward Capriolo commented on CASSANDRA-7296: {quote} Basically, despite this being arguably confusing to most, I'm not sure we have really quantified the advantage this brings us, which is a shame {quote} It brings one key thing. Clients apply logic to decide where to route a request; they do this because they want the lowest latency. We want the server to respect the brain power of the client and carry out the operation where the client decided, not forward the request elsewhere like it (sometimes) does now.
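The client-side routing logic referred to above is usually token-aware routing: hash the partition key, find the replicas on the ring, and send the request straight to one of them. The sketch below is a simplified model under stated assumptions, not a driver implementation: it uses one token per node, places replicas on consecutive ring positions, and uses MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3). `Ring`, `token_for` and `pick_coordinator` are hypothetical names.

```python
import bisect
import hashlib


def token_for(key):
    """Toy token: first 8 bytes of MD5 (a stand-in, not Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens, rf):
        # node_tokens: {node_name: token}; one token per node for simplicity.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]
        self.rf = rf

    def replicas(self, key):
        """Owner is the first node whose token is >= the key's token,
        wrapping around; the next rf-1 ring positions hold the other copies."""
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(self.rf)]


def pick_coordinator(ring, key):
    """Token-aware choice: send the request straight to a replica."""
    return ring.replicas(key)[0]


ring = Ring({"n1": 0, "n2": 2**62, "n3": 2**63}, rf=2)
```

The complaint in the comment is that, after the client has done this work, the coordinator may still hand the read to a different replica; a coordinator-only or pin-to-coordinator option would make the server honor the client's choice.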
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562659#comment-15562659 ] Sylvain Lebresne commented on CASSANDRA-7296:
bq. I'm concerned it will prove to be a step backwards in real clusters, where coordinator disk latencies may truly jump up significantly
Right, but we also have rapid read protection now that might limit that problem reasonably well. But anyway, I'm not making any strong claim here, that's why I started my sentence by "I'm not even entirely sure". Basically, despite this being arguably confusing to most, I'm not sure we have really quantified the advantage this brings us, which is a shame (but it's not like I'm volunteering to experiment here). To clarify, my main point was that I dislike the idea of providing this through a new CL, and I'd rather have that be a protocol-level query option (we have to change the protocol _anyway_).
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562632#comment-15562632 ] Jeff Jirsa commented on CASSANDRA-7296: {quote} First, I'm not even entirely sure that letting the dynamic snitch bypass the coordinator if it's a replica is a good idea in the first place. Everyone more or less agrees that doing token-aware routing is a good thing nowadays, and it's certainly confusing that the dynamic snitch may screw that up. If the dynamic snitch was a perfect and instantaneous view of latencies, then that could make sense, but it's not. Anyway, I think it's worth at least evaluating making even the dynamic snitch always pick the local node if it's a replica, as I'm not sure the benefit of not doing so outweighs the confusion it creates. {quote} Emotionally, I want this to be the right answer (principle of least astonishment), but I don't think it is. I'm concerned it will prove to be a step backwards in real clusters, where coordinator disk latencies may truly jump up significantly (imagine all compaction threads running scrub/cleanup, where not only is the disk likely completely utilized, but the # of sstables on disk grows because all compaction threads are in use, so reads are more expensive than normal - in this case, dsnitch DOES save us, and implementing this type of change would be very hard to work around in production with most drivers).
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562407#comment-15562407 ] Brian Hess commented on CASSANDRA-7296: Just my 2 cents, but having this be a per-table option is not really a great solution for the debugging issue. I'd be okay if we had that as a default, but we'd certainly need to support having this behavior even if that table option isn't set (it would be unfortunate to have to ALTER the table to get this behavior).
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561789#comment-15561789 ] Sylvain Lebresne commented on CASSANDRA-7296: I can certainly agree that having the dynamic snitch getting in the way of local queries is not always desirable, but I'm less convinced that adding a new consistency level is the cleanest way to circumvent that. First, I'm not even entirely sure that letting the dynamic snitch bypass the coordinator if it's a replica is a good idea in the first place. Everyone more or less agrees that doing token-aware routing is a good thing nowadays, and it's certainly confusing that the dynamic snitch may screw that up. If the dynamic snitch was a perfect and instantaneous view of latencies, then that could make sense, but it's not. Anyway, I think it's worth at least evaluating making even the dynamic snitch always pick the local node if it's a replica, as I'm not sure the benefit of not doing so outweighs the confusion it creates. But short of that, why not directly attack the problem we're trying to solve and add a (protocol) option to queries to force the behavior ("always pick the coordinator as one replica if it's one"). That sounds less confusing to me than having a new CL that will confuse newcomers (as the difference with ONE is somewhat subtle for a newcomer). As a bonus, it would also work for CL > ONE (since again, it'll just be about forcing the dynamic snitch to pick the coordinator if it's a replica). We could also have a table option to do the same: force the dynamic snitch to pick the coordinator if it's a replica for all queries on that table, which would be a tad more convenient for request pinning (of course, I get that for troubleshooting you still want the per-query option).
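The "always pick the coordinator as one replica if it's one" option can be sketched as a small change to replica ordering. This is a simplified model, not the DynamicEndpointSnitch implementation: `order_replicas`, `targets_for`, the `pin_coordinator` flag and the plain score dictionary are hypothetical, and the scores stand in for the snitch's possibly stale view of latencies.

```python
def order_replicas(replicas, latency_score, coordinator=None, pin_coordinator=False):
    """Order replicas for a read, best candidate first.

    latency_score maps node -> score (lower is better); nodes without a
    score sort last.
    """
    ranked = sorted(replicas, key=lambda node: latency_score.get(node, float("inf")))
    if pin_coordinator and coordinator in ranked:
        # Force the coordinator to the front, keep the rest in score order.
        ranked.remove(coordinator)
        ranked.insert(0, coordinator)
    return ranked


def targets_for(required, replicas, latency_score, coordinator=None, pin_coordinator=False):
    """Take as many replicas as the consistency level requires."""
    ranked = order_replicas(replicas, latency_score, coordinator, pin_coordinator)
    return ranked[:required]


scores = {"n1": 5.0, "n2": 1.0, "n3": 2.0}
```

Because the flag only reorders candidates, it composes with any consistency level: with two required replicas and `n1` as a pinned coordinator, the targets become `n1` plus the best-scoring other replica. If the coordinator is not a replica, the ordering is unchanged, which is the "prefer" rather than the "only" semantics debated elsewhere in this thread.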
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561602#comment-15561602 ] Stefan Podkowinski commented on CASSANDRA-7296: bq. There's cases where an operator needs to know exactly what's on a specific node. CL.COORDINATOR_ONLY is useful for debugging all sorts of production issues. Dynamic snitch makes CL=ONE not an effective way of determining what's on a specific node. This. Most people are not even aware of this behavior and get confused by different results. There definitely should be a way to query individual nodes deterministically even if it's "just" for troubleshooting.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556085#comment-15556085 ] Blake Eggleston commented on CASSANDRA-7296: Agreed, this would be useful in testing and troubleshooting.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556007#comment-15556007 ] Tupshin Harper commented on CASSANDRA-7296: Given the fresh activity, I'd like to re-emphasize my support for this ticket. I think node/data debugging via request pinning is an excellent use of it, and is basically the original reason for the ticket. Spark turned out to be an irrelevant tangent, but there is significant benefit in supporting this (degeneratively simple) form of consistency. If [~jjirsa]'s patch is still applicable (or can be), i'd love to see it given a fair shake.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556008#comment-15556008 ] Chris Lohfink commented on CASSANDRA-7296: I could see this being useful in writing tests
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1978#comment-1978 ] Edward Capriolo commented on CASSANDRA-7296: {quote} Since there's little upside to this, and quite a bit of potential downside {quote} This is really useful if you want to do user-generated request pinning. ONE could allow the node to proxy the request away based on what dynamic_snitch wants to do. {quote} New consistency levels tend to introduce a lot of edge-case bugs, and this one is particularly special, which probably means extra bugs. {quote} I am not following this logic. Why do previous attempts that added buggy or incomplete features stand as a reason not to add new features?
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1951#comment-1951 ] Jon Haddad commented on CASSANDRA-7296: --- I'd like to resurrect this. There's cases where an operator needs to know exactly what's on a specific node. CL.COORDINATOR_ONLY is useful for debugging all sorts of production issues. > Add CL.COORDINATOR_ONLY > --- > > Key: CASSANDRA-7296 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper >Assignee: Jeff Jirsa > > For reasons such as CASSANDRA-6340 and similar, it would be nice to have a > read that never gets distributed, and only works if the coordinator you are > talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334203#comment-14334203 ] Tyler Hobbs commented on CASSANDRA-7296: It seems like the primary motivation here was performance and the use-case mentioned in CASSANDRA-6340. The performance concern seems to have been addressed thoroughly by Piotr, and as I commented on 6340, I'm not sure this is the best solution there either. New consistency levels tend to introduce a lot of edge-case bugs, and this one is particularly special, which probably means extra bugs. Since there's little upside to this, and quite a bit of potential downside, I vote for closing this as Won't Fix.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329916#comment-14329916 ] Jeff Jirsa commented on CASSANDRA-7296: Untested patch available at https://github.com/jeffjirsa/cassandra/compare/cassandra-7296.diff cqlsh requires an updated/patched Python driver, available at https://github.com/jeffjirsa/python-driver/compare/coordinator-only.diff I have no idea if there's interest in actually merging this (that is, if the project actually wants CL.COORDINATOR_ONLY). I can see use cases where people might want it. I'm not sure if it's worth the added complexity for the project. If someone confirms there's interest, I'll do more thorough testing.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655 ] Piotr Kołaczkowski commented on CASSANDRA-7296:

Honestly, I don't like this idea, for the following reasons:
# It seems to add quite a lot of complexity to handle the following cases:
** What do we do if RF > 1 to avoid duplicates?
** If we decide on primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect against sudden failures and to avoid recomputing too much when a Spark partition is lost.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough to not fit in the page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be of size *hundreds* of MBs, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra is taking plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @2.4 GHz are 100% busy processing those small CQL cells, merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
* We need to improve the backpressure mechanism, at least in such a way that the driver or Spark connector would know to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait until it is OK to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by, e.g., observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other, non-Spark-related use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.
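The theoretical argument about seeks in this comment can be checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate: the 200 GB node size, 10 ms seek time, and 100 MB/s sequential throughput are assumed figures picked for illustration, not measurements from this thread, and `seek_overhead_fraction` is a hypothetical helper.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def seek_overhead_fraction(node_bytes, num_ranges, seek_s, throughput_bps,
                           seeks_per_range=1):
    """Fraction of per-range scan time spent seeking rather than reading."""
    range_bytes = node_bytes / num_ranges
    scan_s = range_bytes / throughput_bps      # sequential read time per range
    seek_total_s = seeks_per_range * seek_s    # extra random-seek time per range
    return seek_total_s / (seek_total_s + scan_s)


# Assumed inputs: 200 GB on one node, 256 vnode token ranges,
# 10 ms per random seek, 100 MB/s sequential throughput.
node_bytes = 200 * GB
num_ranges = 256
range_mb = node_bytes / num_ranges / MB        # 800.0 MB per token range
frac = seek_overhead_fraction(node_bytes, num_ranges, 0.010, 100 * MB)

print(f"{range_mb:.0f} MB per range, seek overhead {frac:.2%}")
```

Under these assumptions each range takes seconds to scan while a seek costs milliseconds, which is consistent with the claim that ranges of hundreds of MBs hide one or two seeks. With much smaller data sets or slower seeks the ratio would shift.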
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250222#comment-14250222 ] Jon Haddad commented on CASSANDRA-7296: Good points. I think this issue would result in other, perhaps more serious, problems making an appearance. I am not convinced, however, that NUM_TOKENS = NUM_QUERIES is the right solution on the Spark side either, in the case of (data disk disk_type == spinning_rust). I think we can move any future discussion to the driver JIRA and reference this from there.
[jira] [Commented] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245489#comment-14245489 ] Jon Haddad commented on CASSANDRA-7296: I suspect this would help with Spark since it currently has to do 1 query per token range. Kind of a mess doing 256 queries that all look at the same sstables.
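To make the "1 query per token range" pattern concrete, here is a rough sketch of how such queries might be generated. It is not the Spark connector's code. The keyspace `ks`, table `events`, and partition key `id` are hypothetical, and the evenly split ranges are a stand-in for a node's real vnode ranges, which are randomly placed and include a wrap-around range that this sketch ignores.

```python
MIN_TOKEN = -2 ** 63       # lower bound of the Murmur3Partitioner token space
MAX_TOKEN = 2 ** 63 - 1    # upper bound of the Murmur3Partitioner token space


def even_token_ranges(n):
    """Split the token space into n contiguous (start, end] ranges."""
    step = 2 ** 64 // n
    bounds = [MIN_TOKEN + i * step for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(keyspace, table, partition_key, ranges):
    """Build one CQL statement (with bind markers) plus parameters per range."""
    cql = (f"SELECT * FROM {keyspace}.{table} "
           f"WHERE token({partition_key}) > ? AND token({partition_key}) <= ?")
    return [(cql, (start, end)) for start, end in ranges]


# 256 ranges, matching the 256 queries mentioned in the comment.
queries = range_queries("ks", "events", "id", even_token_ranges(256))
print(len(queries), queries[0])
```

Every one of these statements is a separate scan on the node, so all 256 end up touching the same set of SSTables, which is the overhead the comment describes.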