[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563043#comment-15563043 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 6:22 PM: --

{quote} I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that anywhere, and why would we want that to be part of the statement? Making it part of the statement removes the ability to disable dynamic snitch at a per query level, including it as part of CQL makes it per prepared statement. It's not like adding it to the protocol is any different than specifying consistency level or a write timestamp. {quote}

Again, this is how most (if not all) databases do this. The reason is that for RDBMS databases the APIs are standard (like JDBC), and you cannot add new functionality in the form of new methods at the driver level. The win of CQL is that it solves everything in the query language. Every option that lives outside the language makes it more like Thrift: it becomes something that EVERY client driver must implement. This is why the consistency level makes sense as well, because you can meet the need without making a new feature that all the clients must implement to get the functionality.

Another way to do this is to make the options an explicit part of the language: https://msdn.microsoft.com/en-us/library/ms181714.aspx This is essentially the same thing as /* */: the parser parses it and acts. It is only a matter of syntax.

was (Author: appodictic): {quote} I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that anywhere, and why would we want that to be part of the statement? Making it part of the statement removes the ability to disable dynamic snitch at a per query level, including it as part of CQL makes it per prepared statement. It's not like adding it to the protocol is any different than specifying consistency level or a write timestamp. {quote} Again, this is how most (if not all) databases do this. The reason is that for RDBMS databases the APIs are standard (like JDBC), and you cannot add new functionality in the form of new methods. The win of CQL is that it solves everything in the query language. Every option that lives outside the language makes it more like Thrift: it becomes something that EVERY client driver must implement. This is why the consistency level makes sense as well, because you can meet the need without making a new feature that all the clients must implement to get the functionality.

> Add CL.COORDINATOR_ONLY > --- > > Key: CASSANDRA-7296 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7296 > Project: Cassandra > Issue Type: Improvement >Reporter: Tupshin Harper > > For reasons such as CASSANDRA-6340 and similar, it would be nice to have a > read that never gets distributed, and only works if the coordinator you are > talking to is an owner of the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
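The hint-comment idea above (options embedded in a /* ... */ comment that the server parses and acts on) can be sketched in a few lines. This is purely illustrative Python, not a Cassandra or driver API; the function and hint names are hypothetical:

```python
import re

# Hypothetical sketch: extract key=value hints from /* ... */ comments
# embedded in a statement, in the spirit of Oracle-style hint comments or
# SQL Server's OPTION clause. Not a real Cassandra feature.
HINT_RE = re.compile(r"/\*\s*(.*?)\s*\*/", re.DOTALL)

def parse_hints(statement: str):
    """Return (statement_without_hints, {hint: value}) for a CQL-like string."""
    hints = {}

    def collect(match):
        for pair in match.group(1).split(","):
            if "=" in pair:
                key, _, value = pair.partition("=")
                hints[key.strip()] = value.strip()
        return ""  # strip the comment from the statement text

    cleaned = HINT_RE.sub(collect, statement)
    return " ".join(cleaned.split()), hints
```

Because the hint rides inside the query text, any driver that passes the string through unchanged supports it, which is the portability argument being made here:

```python
parse_hints("SELECT /* disable_snitch=true */ * FROM tab WHERE id = ?")
```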
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563043#comment-15563043 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 6:20 PM: --

{quote} I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that anywhere, and why would we want that to be part of the statement? Making it part of the statement removes the ability to disable dynamic snitch at a per query level, including it as part of CQL makes it per prepared statement. It's not like adding it to the protocol is any different than specifying consistency level or a write timestamp. {quote}

Again, this is how most (if not all) databases do this. The reason is that for RDBMS databases the APIs are standard (like JDBC), and you cannot add new functionality in the form of new methods. The win of CQL is that it solves everything in the query language. Every option that lives outside the language makes it more like Thrift: it becomes something that EVERY client driver must implement. This is why the consistency level makes sense as well, because you can meet the need without making a new feature that all the clients must implement to get the functionality.

was (Author: appodictic): {quote} I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that anywhere, and why would we want that to be part of the statement? Making it part of the statement removes the ability to disable dynamic snitch at a per query level, including it as part of CQL makes it per prepared statement. It's not like adding it to the protocol is any different than specifying consistency level or a write timestamp. {quote} Again, this is how most (if not all) databases do this. The reason is that for RDBMS databases the APIs are standard (like JDBC), and you cannot add new functionality in the form of new methods. The point of CQL is that it solves everything in the query language; every weird switch that takes something out of the language makes it more like Thrift. It is now something that EVERY client driver must implement.
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562910#comment-15562910 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 5:30 PM: --

{quote} stmt = session.prepare("SELECT * from tab where id = ?", consistency_level=ConsistencyLevel.ONE) stmt.disable_dynamic_snitch() {quote}

I think it would be better to use more standard SQL syntax for optimizations. This is the common way query hints are provided:

{quote} stmt = session.prepare("SELECT /* disable_snitch=true */ * from tab where id = ?", consistency_level=ConsistencyLevel.ONE) {quote}

Providing extra methods like this seems Thrift-like:

{quote} stmt.disable_dynamic_snitch() {quote}

This makes it an API, not a query language.

was (Author: appodictic): {quote} stmt = session.prepare("SELECT * from tab where id = ?", consistency_level=ConsistencyLevel.ONE) stmt.disable_dynamic_snitch() {quote} I think it would be better to use more standard SQL syntax for optimizations. This is the common way query hints are provided. {quote} stmt = session.prepare("SELECT /*disable_snitch=true*/ * from tab where id = ?", consistency_level=ConsistencyLevel.ONE) {quote} Providing extra methods like this seems Thrift-like. {quote} stmt.disable_dynamic_snitch() {quote} This makes it an API, not a query language.
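The two styles being contrasted above can be made concrete with a small sketch. This is mock Python, not the real cassandra-driver API; the class and method names are hypothetical stand-ins for the snippets quoted in the comment:

```python
# Illustrative sketch (not the real driver API) of the two styles under
# discussion: an option pushed through a driver method vs. an option kept
# inside the statement text itself.
class PreparedStatement:
    def __init__(self, query, consistency_level="ONE"):
        self.query = query
        self.consistency_level = consistency_level
        self.options = {}

    # Style 1: every new option becomes a new driver method, which every
    # client library would then have to implement (the "Thrift-like" concern).
    def disable_dynamic_snitch(self):
        self.options["disable_snitch"] = True
        return self


# Style 2: the option rides inside the query string, so any driver that
# simply passes the text through supports it with no new methods.
hinted = PreparedStatement("SELECT /* disable_snitch=true */ * FROM tab WHERE id = ?")
```

The design trade-off is that style 1 gives drivers a typed API surface, while style 2 keeps the protocol and driver interface frozen and pushes all evolution into the language, which is the argument the comment is making.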
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562870#comment-15562870 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 5:26 PM: --

I think it makes sense as either, but it really does make sense as a consistency level. THIS_ONE might be a better name. Other consistency levels do express WHERE you want something to happen. Aren't we discussing adding consistency levels here? https://issues.apache.org/jira/browse/CASSANDRA-8119 The difference between 8119 and this is that this one is already implemented in a patch, so a rational argument is to do this feature in the least intrusive way.

was (Author: appodictic): Is https://issues.apache.org/jira/browse/CASSANDRA-8119 a protocol option as well?
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562870#comment-15562870 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 5:27 PM: --

I think it makes sense as either, but it really does make sense as a consistency level. THIS_ONE might be a better name. Other consistency levels do express WHERE you want something to happen, such as ANY. Aren't we discussing adding consistency levels here? https://issues.apache.org/jira/browse/CASSANDRA-8119 The difference between 8119 and this is that this one is already implemented in a patch, so a rational argument is to do this feature in the least intrusive way.

was (Author: appodictic): I think it makes sense as either, but it really does make sense as a consistency level. THIS_ONE might be a better name. Other consistency levels do express WHERE you want something to happen. Aren't we discussing adding consistency levels here? https://issues.apache.org/jira/browse/CASSANDRA-8119 The difference between 8119 and this is that this one is already implemented in a patch, so a rational argument is to do this feature in the least intrusive way.
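The proposal in this thread is that the new behavior slots in next to the existing consistency levels. A hedged sketch of what that might look like (THIS_ONE/COORDINATOR_ONLY are names floated in the discussion, not levels Cassandra actually ships; the enum values are illustrative):

```python
from enum import Enum

# Hedged sketch: the proposed level placed next to a few existing ones.
# THIS_ONE is the name suggested in the comment above, not a real level.
class ConsistencyLevel(Enum):
    ANY = 0        # a write may land anywhere, even as a hint
    ONE = 1        # any single replica may answer (the snitch picks which)
    QUORUM = 2
    ALL = 3
    THIS_ONE = 4   # proposed: only the coordinator itself may answer

def may_forward(cl: ConsistencyLevel) -> bool:
    """Whether the coordinator is allowed to proxy the read to another replica."""
    return cl is not ConsistencyLevel.THIS_ONE
```

The point of the "WHERE, not just how many" observation is visible here: like ANY, the proposed level constrains location rather than replica count.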
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562698#comment-15562698 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 4:10 PM: --

{quote} Basically, despite this being arguably confusing to most, I'm not sure we have really quantified the advantage this brings us, which is a shame {quote}

It brings one key thing. Clients run logic to control where to route requests; they do this because they want the lowest latency. We want the server to respect the brain power of the client and carry out the operation where the client decided, not forward the request elsewhere like it (sometimes) does now, incurring more latency on some requests and making them hard to debug.

was (Author: appodictic): {quote} Basically, despite this being arguably confusing to most, I'm not sure we have really quantified the advantage this brings us, which is a shame {quote} It brings one key thing. Clients run logic to control where to route requests; they do this because they want the lowest latency. We want the server to respect the brain power of the client and carry out the operation where the client decided, not forward the request elsewhere like it (sometimes) does now.
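The client-side routing logic referred to above (token-aware drivers hashing the partition key and contacting an owning replica directly) can be sketched roughly like this. The ring layout and hash are simplified stand-ins, not Cassandra's actual Murmur3 partitioner:

```python
import hashlib
from bisect import bisect_right

# Minimal sketch of token-aware routing: hash the partition key, walk a
# sorted token ring, and pick the node owning the next token. Simplified
# stand-in for what a token-aware driver does; not real driver code.
def token_of(key: bytes) -> int:
    # MD5-derived 64-bit token; Cassandra really uses Murmur3.
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")

def pick_replica(ring, key: bytes):
    """ring: sorted list of (token, node) pairs; returns the owning node."""
    tokens = [t for t, _ in ring]
    idx = bisect_right(tokens, token_of(key)) % len(ring)
    return ring[idx][1]
```

The complaint in the comment is that after the client has done this work, the coordinator may still re-route the request based on the dynamic snitch, wasting the hop the client tried to save.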
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1978#comment-1978 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/7/16 7:05 PM: -

{quote} Since there's little upside to this, and quite a bit of potential downside {quote}

This is really useful if you want to do user-generated request pinning. ONE could allow the node to proxy the request away based on what the dynamic snitch wants to do.

{quote} New consistency levels tend to introduce a lot of edge-case bugs, and this one is particularly special, which probably means extra bugs. {quote}

I am not following this logic. Do previous attempts that added buggy or incomplete features stand as a reason not to add new features?

was (Author: appodictic): {quote} Since there's little upside to this, and quite a bit of potential downside {quote} This is really useful if you want to do user-generated request pinning. ONE could allow the node to proxy the request away based on what the dynamic snitch wants to do. {quote} New consistency levels tend to introduce a lot of edge-case bugs, and this one is particularly special, which probably means extra bugs. {quote} I am not following this logic. Why do previous attempts that added buggy or incomplete features stand as a reason not to add new features?
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1978#comment-1978 ] Edward Capriolo edited comment on CASSANDRA-7296 at 10/7/16 7:03 PM: -

{quote} Since there's little upside to this, and quite a bit of potential downside {quote}

This is really useful if you want to do user-generated request pinning. ONE could allow the node to proxy the request away based on what the dynamic snitch wants to do.

{quote} New consistency levels tend to introduce a lot of edge-case bugs, and this one is particularly special, which probably means extra bugs. {quote}

I am not following this logic. Why do previous attempts that added buggy or incomplete features stand as a reason not to add new features?

was (Author: appodictic): {quote} Since there's little upside to this, and quite a bit of potential downside {quote} This is really useful if you want to do user-generated request pinning. ONE could allow the node to proxy the request away based on what the dynamic snitch wants to do. {quote} New consistency levels tend to introduce a lot of edge-case bugs, and this one is particularly special, which probably means extra bugs. {quote} I am not following this logic. Why do previous attempts that added buggy or incomplete features stand as a reason not to add new features?
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1951#comment-1951 ] Jon Haddad edited comment on CASSANDRA-7296 at 10/7/16 6:57 PM:

I'd like to resurrect this. There are cases where an operator needs to know exactly what's on a specific node. CL.COORDINATOR_ONLY is useful for debugging all sorts of production issues. The dynamic snitch makes CL=ONE ineffective as a way of determining what's on a specific node.

was (Author: rustyrazorblade): I'd like to resurrect this. There are cases where an operator needs to know exactly what's on a specific node. CL.COORDINATOR_ONLY is useful for debugging all sorts of production issues.
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655 ] Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 9:39 AM: -

Honestly, I don't like this idea for Spark because of the following reasons:
# Seems like it adds quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be *hundreds* of MBs in size, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra takes plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @2.4 GHz are 100% busy processing those small CQL cells, merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
* We need to improve the backpressure mechanism, at least in such a way that the driver or Spark connector would know to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait until it is OK to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other non-Spark-related use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.

was (Author: pkolaczk): Honestly, I don't like this idea because of the following reasons:
# Seems like it adds quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting
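The backpressure complaint above (no signal before a write timeout, so the client can only react after the fact) is usually worked around with client-side backoff. A hedged sketch; `submit` is a hypothetical callable standing in for a driver write that raises `TimeoutError` when the cluster doesn't keep up:

```python
import random
import time

# Hedged sketch of the client-side workaround for the backpressure gap
# described above: since the server only signals overload via timeouts,
# the client can only retry with exponential backoff after the fact.
# submit is a hypothetical callable, not a real driver API.
def write_with_backoff(submit, attempts=5, base_delay=0.05):
    for attempt in range(attempts):
        try:
            return submit()
        except TimeoutError:
            # No feedback arrives before the timeout, so all we can do is
            # guess: sleep exponentially longer, with jitter, and retry.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise TimeoutError("cluster did not keep up after %d attempts" % attempts)
```

This illustrates exactly the waste the comment objects to: every retried write was already paid for once, which is why advance throttling feedback (CASSANDRA-7937) would be preferable.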
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655 ] Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 9:41 AM: -

Honestly, I don't think it would benefit Spark integration:
# Seems like it adds quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be *hundreds* of MBs in size, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra takes plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @2.4 GHz are 100% busy processing those small CQL cells, merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
* We need to improve the backpressure mechanism, at least in such a way that the driver or Spark connector would know to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait until it is OK to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other non-Spark-related use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.

was (Author: pkolaczk): Honestly, I don't like this idea for Spark because of the following reasons:
# Seems like it adds quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655 ] Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 5:41 PM: -

Honestly, I don't think it would benefit Spark integration:
# Seems like it adds quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be submitted and executed in parallel
*** each token range will be *hundreds* of MBs in size, which is large enough to hide one or two seeks

Some *real* performance problems we (and users) observed:
* Cassandra takes plenty of CPU when doing sequential scans. It is not possible to saturate the bandwidth of a single laptop spinning HDD, because all cores of an i7 CPU @2.4 GHz are 100% busy processing those small CQL cells, merging rows from different SSTables, ordering cells, filtering out tombstones, serializing, etc. The problem doesn't go away after doing a full compaction or disabling vnodes. This is a serious problem, because doing exactly the same query on a plain text file stored in CFS (still C*, but data stored as 2MB blobs) gives a 3-30x performance boost (depending on who did the benchmark). We need to close this gap.
* We need to improve the backpressure mechanism, at least in such a way that the driver or Spark connector would know to start throttling writes if the cluster doesn't keep up. Currently Cassandra just times out the writes, but once that happens, the driver has no clue how long to wait until it is OK to resubmit the update. It would actually be good to know long enough before timing out, so we could slow down and avoid wasteful retrying altogether. Currently it is not possible to predict cluster load by e.g. observing write latency, because the latency is extremely good until it is suddenly terrible (timeout). This is also important for other non-Spark-related use cases. See https://issues.apache.org/jira/browse/CASSANDRA-7937.

was (Author: pkolaczk): Honestly, I don't think it would benefit Spark integration:
# Seems like it adds quite a lot of complexity to handle the following cases:
** What do we do if RF > 1, to avoid duplicates?
** If we decide on the primary token range only, what do we do if one of the nodes fails and some primary token ranges have no node to query from?
** What if the amount of data is large enough that we'd like to actually split token ranges so that they are smaller and there are more Spark tasks? This is important for bigger jobs, to protect from sudden failures and avoid having to recompute too much in case of a lost Spark partition.
** How do we fetch data from the same node in parallel? Currently it is perfectly fine to have one Spark node using multiple cores (mappers) that fetch data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been proven in practice yet.
** Russell Spitzer benchmarked vnodes on small/medium/large data sets. No significant difference on larger data sets, and only a tiny difference on really small sets (the constant cost of the query is higher than the cost of fetching the data).
** There are no customers reporting vnodes to be a problem for them.
** Theoretical reason: if the data is large enough not to fit in page cache (hundreds of GBs on a single node), 256 additional random seeks are not going to cause a huge penalty, because:
*** some of them can be hidden by splitting those queries between separate Spark threads, so they would be
[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY
[ https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250222#comment-14250222 ] Jon Haddad edited comment on CASSANDRA-7296 at 12/17/14 6:05 PM: -

Good points. I think this issue would result in other, perhaps more serious, problems making an appearance. I am not convinced, however, that NUM_TOKENS = NUM_QUERIES is the right solution on the Spark side either, in the case of (data size > disk size && disk_type == spinning_rust). I think we can move any future discussion to the driver JIRA and reference this from there.

was (Author: rustyrazorblade): Good points. I think this issue would result in other, perhaps more serious, problems making an appearance. I am not convinced, however, that NUM_TOKENS = NUM_QUERIES is the right solution on the Spark side either, in the case of (data > disk && disk_type == spinning_rust). I think we can move any future discussion to the driver JIRA and reference this from there.