[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563043#comment-15563043
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 6:22 PM:
--

{quote}
I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that 
anywhere, and why would we want that to be part of the statement? Making it 
part of the statement removes the ability to disable dynamic snitch at a per 
query level, including it as part of CQL makes it per prepared statement.
It's not like adding it to the protocol is any different than specifying 
consistency level or a write timestamp.
{quote}

Again, this is how most (if not all) databases do this. The reason is that for RDBMS 
databases the APIs are standard (like JDBC), and you cannot add new 
functionality in the form of new methods at the driver level.

The win of CQL is that it solves everything in the query language. Everything else 
takes something out of the language and makes it more like Thrift. It then becomes 
something that EVERY client driver must implement. 

This is why the consistency level makes sense as well: it fits the need without 
creating a new feature that all the clients must implement to get the 
functionality.

Another way to do this is to make the options a clear part of the language:

https://msdn.microsoft.com/en-us/library/ms181714.aspx

This is essentially the same thing as /* */: the parser parses it and acts on it. It 
is only a matter of syntax.
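
To make that concrete, here is a minimal sketch (not actual Cassandra code, and the 
disable_snitch hint is hypothetical) of what "the parser parses it and acts on it" 
could look like on the coordinator side:

{quote}
import re

# Hypothetical hint syntax: /* key=value, key=value */ embedded in the CQL text.
HINT_RE = re.compile(r"/\*\s*(.*?)\s*\*/", re.DOTALL)

def extract_hints(cql):
    """Return (cql_without_hints, {hint_name: value}) for a statement string."""
    hints = {}
    for block in HINT_RE.findall(cql):
        for pair in block.split(","):
            if "=" in pair:
                key, value = pair.split("=", 1)
                hints[key.strip()] = value.strip()
    return HINT_RE.sub(" ", cql).strip(), hints

query, hints = extract_hints("SELECT /* disable_snitch=true */ * FROM tab WHERE id = ?")
if hints.get("disable_snitch") == "true":
    pass  # a coordinator could skip dynamic-snitch reordering for this statement
{quote}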


was (Author: appodictic):
{quote}
I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that 
anywhere, and why would we want that to be part of the statement? Making it 
part of the statement removes the ability to disable dynamic snitch at a per 
query level, including it as part of CQL makes it per prepared statement.
It's not like adding it to the protocol is any different than specifying 
consistency level or a write timestamp.
{quote}

Again, this is how most (if not all) databases do this. The reason is that for RDBMS 
databases the APIs are standard (like JDBC), and you cannot add new 
functionality in the form of new methods.

The win of CQL is that it solves everything in the query language. Everything else 
takes something out of the language and makes it more like Thrift. It then becomes 
something that EVERY client driver must implement. 

This is why the consistency level makes sense as well: it fits the need without 
creating a new feature that all the clients must implement to get the 
functionality.

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563043#comment-15563043
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 6:20 PM:
--

{quote}
I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that 
anywhere, and why would we want that to be part of the statement? Making it 
part of the statement removes the ability to disable dynamic snitch at a per 
query level, including it as part of CQL makes it per prepared statement.
It's not like adding it to the protocol is any different than specifying 
consistency level or a write timestamp.
{quote}

Again, this is how most (if not all) databases do this. The reason is that for RDBMS 
databases the APIs are standard (like JDBC), and you cannot add new 
functionality in the form of new methods.

The win of CQL is that it solves everything in the query language. Everything else 
takes something out of the language and makes it more like Thrift. It then becomes 
something that EVERY client driver must implement. 

This is why the consistency level makes sense as well: it fits the need without 
creating a new feature that all the clients must implement to get the 
functionality.


was (Author: appodictic):
{quote}
I'm very opposed to the /*disable_snitch=true*/ syntax. We don't use that 
anywhere, and why would we want that to be part of the statement? Making it 
part of the statement removes the ability to disable dynamic snitch at a per 
query level, including it as part of CQL makes it per prepared statement.
It's not like adding it to the protocol is any different than specifying 
consistency level or a write timestamp.
{quote}

Again, this is how most (if not all) databases do this. The reason is that for RDBMS 
databases the APIs are standard (like JDBC), and you cannot add new 
functionality in the form of new methods.

The point of CQL is that it solves everything in the query language; every weird 
switch that takes something out of the language makes it more like Thrift. It 
then becomes something that EVERY client driver must implement.

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562910#comment-15562910
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 5:30 PM:
--

{quote}
stmt = session.prepare("SELECT * from tab where id = ?", 
consistency_level=ConsistencyLevel.ONE)
stmt.disable_dynamic_snitch()
{quote}

I think it would be better to use more standard SQL-style syntax for optimizations. This is 
the common way query hints are provided.

{quote}
stmt = session.prepare("SELECT /* disable_snitch=true */ * from tab where id = 
?", consistency_level=ConsistencyLevel.ONE)
{quote}

Providing extra methods like this seems Thrift-like. 
{quote}
stmt.disable_dynamic_snitch()
{quote}
This makes it an API, not a query language.
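
For contrast, a minimal sketch with the actual DataStax Python driver (the contact 
point, keyspace, table, and the disable_snitch hint are all hypothetical): the 
consistency level already rides on the statement object and an in-query hint would 
ride inside the CQL text, so no new driver method is needed.

{quote}
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel

cluster = Cluster(['127.0.0.1'])   # hypothetical contact point
session = cluster.connect('ks')    # hypothetical keyspace

# CQL already accepts /* */ comments, so the hint travels inside the statement text.
stmt = session.prepare("SELECT /* disable_snitch=true */ * FROM tab WHERE id = ?")
stmt.consistency_level = ConsistencyLevel.ONE
rows = session.execute(stmt, [42])
{quote}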



was (Author: appodictic):
{quote}
stmt = session.prepare("SELECT * from tab where id = ?", 
consistency_level=ConsistencyLevel.ONE)
stmt.disable_dynamic_snitch()
{quote}

I think it would be better to use more standard SQL-style syntax for optimizations. This is 
the common way query hints are provided.

{quote}
stmt = session.prepare("SELECT /*disable_snitch=true*/ * from tab where id = 
?", consistency_level=ConsistencyLevel.ONE)
{quote}

Providing extra methods like this seems Thrift-like. 
{quote}
stmt.disable_dynamic_snitch()
{quote}
This makes it an API, not a query language.


> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562870#comment-15562870
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 5:26 PM:
--

I think it makes sense as either, but it really does make sense as a consistency 
level as well. 

THIS_ONE might be a better name. Other consistency levels do express WHERE you 
want something to happen.

Aren't we discussing adding consistency levels here?  
https://issues.apache.org/jira/browse/CASSANDRA-8119 

The difference between 8119 and this issue is that this one is already implemented in a patch, so 
a rational argument is to do this feature in the least intrusive way. 


was (Author: appodictic):
Is https://issues.apache.org/jira/browse/CASSANDRA-8119 a protocol option as 
well?

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562870#comment-15562870
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 5:27 PM:
--

I think it makes sense as either, but it really does make sense as a consistency 
level as well. 

THIS_ONE might be a better name. Other consistency levels do express WHERE you 
want something to happen, such as ANY.

Aren't we discussing adding consistency levels here?  
https://issues.apache.org/jira/browse/CASSANDRA-8119 

The difference between 8119 and this issue is that this one is already implemented in a patch, so 
a rational argument is to do this feature in the least intrusive way. 


was (Author: appodictic):
I think it makes sense as either, but it really does make sense as a consistency 
level as well. 

THIS_ONE might be a better name. Other consistency levels do express WHERE you 
want something to happen.

Aren't we discussing adding consistency levels here?  
https://issues.apache.org/jira/browse/CASSANDRA-8119 

The difference between 8119 and this issue is that this one is already implemented in a patch, so 
a rational argument is to do this feature in the least intrusive way. 

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562698#comment-15562698
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/10/16 4:10 PM:
--

{quote}
Basically, despite this being arguably confusing to most, I'm not sure we have 
really quantified the advantage this brings us, which is a shame 
{quote}

It brings one key thing. The clients do logic to control where to route a 
request, and they do this because they want the lowest latency. We want the server 
to respect the brain power of the client and carry out the operation where the client 
decided, not forward the request elsewhere like it (sometimes) does now, 
incurring more latency on some requests and making them hard to debug.
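
As a concrete illustration, a minimal sketch with the DataStax Python driver (contact 
point, data center, keyspace, and table are hypothetical): token-aware routing already 
sends the request to a replica of the partition key; a coordinator-only read would 
simply tell that node to answer from its own data instead of forwarding.

{quote}
from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy
from cassandra import ConsistencyLevel

# Token-aware routing: the driver picks a replica of the partition key as coordinator.
cluster = Cluster(
    ['10.0.0.1'],
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='DC1')),
)
session = cluster.connect('ks')

stmt = session.prepare("SELECT * FROM tab WHERE id = ?")
stmt.consistency_level = ConsistencyLevel.ONE
# With ONE, the chosen coordinator may still forward the read to another replica
# when the dynamic snitch prefers it; that forwarding is what adds the extra hop.
rows = session.execute(stmt, [42])
{quote}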


was (Author: appodictic):
{quote}
Basically, despite this being arguably confusing to most, I'm not sure we have 
really quantified the advantage this brings us, which is a shame 
{quote}

It brings one key thing. The clients do logic to control where to route a 
request, and they do this because they want the lowest latency. We want the server 
to respect the brain power of the client and carry out the operation where the client 
decided, not forward the request elsewhere like it (sometimes) does now.

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1978#comment-1978
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/7/16 7:05 PM:
-

{quote}
 Since there's little upside to this, and quite a bit of potential downside
{quote}

This is really useful if you want to do user-generated request pinning. ONE 
could allow the node to proxy the request away based on what the dynamic snitch 
wants to do.

{quote}
New consistency levels tend to introduce a lot of edge-case bugs, and this one 
is particularly special, which probably means extra bugs.
{quote}

I am not following this logic. Do previous attempts that added buggy or 
incomplete features stand as a reason not to add new features?


was (Author: appodictic):
{quote}
 Since there's little upside to this, and quite a bit of potential downside
{quote}

This is really useful if you want to do user-generated request pinning. ONE 
could allow the node to proxy the request away based on what the dynamic snitch 
wants to do.

{quote}
New consistency levels tend to introduce a lot of edge-case bugs, and this one 
is particularly special, which probably means extra bugs.
{quote}

I am not following this logic. Why should previous attempts that added 
buggy or incomplete features stand as a reason not to add new features?

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1978#comment-1978
 ] 

Edward Capriolo edited comment on CASSANDRA-7296 at 10/7/16 7:03 PM:
-

{quote}
 Since there's little upside to this, and quite a bit of potential downside
{quote}

This is really useful if you want to do user-generated request pinning. ONE 
could allow the node to proxy the request away based on what the dynamic snitch 
wants to do.

{quote}
New consistency levels tend to introduce a lot of edge-case bugs, and this one 
is particularly special, which probably means extra bugs.
{quote}

I am not following this logic. Why should previous attempts that added 
buggy or incomplete features stand as a reason not to add new features?


was (Author: appodictic):
{quote}
 Since there's little upside to this, and quite a bit of potential downside
{quote}

This is really useful if you want to do user-generated request pinning. ONE 
could allow the node to proxy the request away based on what the dynamic snitch 
wants to do.

{quote}
New consistency levels tend to introduce a lot of edge-case bugs, and this one 
is particularly special, which probably means extra bugs.
{quote}

I am not following this logic. Why should previous attempts that added 
buggy or incomplete features stand as a reason not to add new features?

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2016-10-07 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1951#comment-1951
 ] 

Jon Haddad edited comment on CASSANDRA-7296 at 10/7/16 6:57 PM:


I'd like to resurrect this.  There are cases where an operator needs to know 
exactly what's on a specific node.  CL.COORDINATOR_ONLY is useful for debugging 
all sorts of production issues.  The dynamic snitch makes CL=ONE an ineffective 
way of determining what's on a specific node.
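
A rough sketch of how an operator might use it, assuming the DataStax Python driver 
and hypothetical address/keyspace/table names: pin the driver to the node being 
inspected and issue local reads.

{quote}
from cassandra.cluster import Cluster
from cassandra.policies import WhiteListRoundRobinPolicy
from cassandra import ConsistencyLevel

node = '10.0.0.5'   # hypothetical address of the node being inspected
cluster = Cluster([node], load_balancing_policy=WhiteListRoundRobinPolicy([node]))
session = cluster.connect('ks')

stmt = session.prepare("SELECT * FROM tab WHERE id = ?")
stmt.consistency_level = ConsistencyLevel.ONE
# Today, even with CL=ONE the pinned coordinator may read from another replica when
# the dynamic snitch prefers it; a COORDINATOR_ONLY level would forbid that, so the
# result would reflect exactly what this node has locally.
rows = session.execute(stmt, ['some-key'])
{quote}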


was (Author: rustyrazorblade):
I'd like to resurrect this.  There are cases where an operator needs to know 
exactly what's on a specific node.  CL.COORDINATOR_ONLY is useful for debugging 
all sorts of production issues.

> Add CL.COORDINATOR_ONLY
> ---
>
> Key: CASSANDRA-7296
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tupshin Harper
>
> For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
> read that never gets distributed, and only works if the coordinator you are 
> talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655
 ] 

Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 9:39 AM:
-

Honestly, I don't like this idea for Spark because of the following reasons:

# Seems like adding quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd like to actually 
split token ranges so that they are smaller and there are more Spark tasks? 
This is important for bigger jobs, to protect from sudden failures and avoid 
having to recompute too much in case of a lost Spark partition.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that fetch 
data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been shown to exist in 
practice yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: if the data is large enough not to fit in the page cache 
(hundreds of GBs on a single node), 256 additional random seeks are not going to 
cause a huge penalty because:
  *** some of them can be hidden by splitting those queries between separate 
Spark threads, so they would be submitted and executed in parallel
  *** each token range will be *hundreds* of MBs in size, which is large enough 
to hide one or two seeks

Some *real* performance problems we (and users) observed:
 * Cassandra uses plenty of CPU when doing sequential scans. It is not 
possible to saturate the bandwidth of a single laptop spinning HDD, because all 
cores of an i7 CPU @ 2.4 GHz are 100% busy processing those small CQL cells, 
merging rows from different SSTables, ordering cells, filtering out tombstones, 
serializing, etc. The problem doesn't go away after doing a full compaction or 
disabling vnodes. This is a serious problem, because doing exactly the same 
query on a plain text file stored in CFS (still C*, but data stored as 2MB 
blobs) gives a 3-30x performance boost (depending on who did the benchmark). We 
need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
 * We need to improve the backpressure mechanism at least in such a way that the 
driver or Spark connector would know to start throttling writes if the cluster 
doesn't keep up. Currently Cassandra just times out the writes, but once that 
happens, the driver has no clue how long to wait until it is ok to resubmit the 
update. It would actually be good to know long enough before timing out, so we 
could slow down and avoid wasteful retrying altogether. Currently it is not 
possible to predict cluster load by e.g. observing write latency, because the 
latency is extremely good until it is suddenly terrible (timeout). This is 
also important for other, non-Spark-related use cases. See 
https://issues.apache.org/jira/browse/CASSANDRA-7937.
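
To illustrate the backpressure point, here is a minimal sketch (hypothetical contact 
point, keyspace, and table) of the blind retry loop clients are left with today: the 
only load signal is the timeout itself, so the backoff delay is a guess.

{quote}
import time

from cassandra import WriteTimeout
from cassandra.cluster import Cluster

cluster = Cluster(['10.0.0.1'])   # hypothetical contact point
session = cluster.connect('ks')   # hypothetical keyspace
insert = session.prepare("INSERT INTO tab (id, payload) VALUES (?, ?)")

def write_with_backoff(args, retries=5):
    """Retry on WriteTimeout with an exponential backoff chosen blindly by the client."""
    delay = 0.1
    for _ in range(retries):
        try:
            return session.execute(insert, args)
        except WriteTimeout:
            time.sleep(delay)   # no hint from the server about how long to wait
            delay *= 2
    raise RuntimeError("cluster did not keep up within the retry budget")
{quote}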





was (Author: pkolaczk):
Honestly, I don't like this idea because of the following reasons:

# Seems like adding quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd like to actually 
split token ranges so that they are smaller and there are more spark tasks? 
This is important for bigger jobs to protect from sudden failures and not 
having to recompute too much in case of a lost spark partition.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that fetch 
data from the same coordinator node separately?
# It is trying to solve a theoretical problem which hasn't proved in practice 
yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: If data is large enough to not fit in page cache 
(hundreds of GBs on a single node), 256 additional random seeks is not going to 
cause a huge penalty because:
  *** some of them can be hidden by splitting 

[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655
 ] 

Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 9:41 AM:
-

Honestly, I don't think it would benefit Spark integration:

# Seems like adding quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd like to actually 
split token ranges so that they are smaller and there are more Spark tasks? 
This is important for bigger jobs, to protect from sudden failures and avoid 
having to recompute too much in case of a lost Spark partition.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that fetch 
data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been shown to exist in 
practice yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: if the data is large enough not to fit in the page cache 
(hundreds of GBs on a single node), 256 additional random seeks are not going to 
cause a huge penalty because:
  *** some of them can be hidden by splitting those queries between separate 
Spark threads, so they would be submitted and executed in parallel
  *** each token range will be *hundreds* of MBs in size, which is large enough 
to hide one or two seeks

Some *real* performance problems we (and users) observed:
 * Cassandra uses plenty of CPU when doing sequential scans. It is not 
possible to saturate the bandwidth of a single laptop spinning HDD, because all 
cores of an i7 CPU @ 2.4 GHz are 100% busy processing those small CQL cells, 
merging rows from different SSTables, ordering cells, filtering out tombstones, 
serializing, etc. The problem doesn't go away after doing a full compaction or 
disabling vnodes. This is a serious problem, because doing exactly the same 
query on a plain text file stored in CFS (still C*, but data stored as 2MB 
blobs) gives a 3-30x performance boost (depending on who did the benchmark). We 
need to close this gap. See: https://datastax.jira.com/browse/DSP-3670
 * We need to improve the backpressure mechanism at least in such a way that the 
driver or Spark connector would know to start throttling writes if the cluster 
doesn't keep up. Currently Cassandra just times out the writes, but once that 
happens, the driver has no clue how long to wait until it is ok to resubmit the 
update. It would actually be good to know long enough before timing out, so we 
could slow down and avoid wasteful retrying altogether. Currently it is not 
possible to predict cluster load by e.g. observing write latency, because the 
latency is extremely good until it is suddenly terrible (timeout). This is 
also important for other, non-Spark-related use cases. See 
https://issues.apache.org/jira/browse/CASSANDRA-7937.





was (Author: pkolaczk):
Honestly, I don't like this idea for Spark because of the following reasons:

# Seems like adding quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd like to actually 
split token ranges so that they are smaller and there are more spark tasks? 
This is important for bigger jobs to protect from sudden failures and not 
having to recompute too much in case of a lost spark partition.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that fetch 
data from the same coordinator node separately?
# It is trying to solve a theoretical problem which hasn't proved in practice 
yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: If data is large enough to not fit in page cache 
(hundreds of GBs on a single node), 256 additional random seeks is not going to 
cause a huge penalty because:
  *** some of them can be hidden by splitting those 

[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249655#comment-14249655
 ] 

Piotr Kołaczkowski edited comment on CASSANDRA-7296 at 12/17/14 5:41 PM:
-

Honestly, I don't think it would benefit Spark integration:

# Seems like adding quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd like to actually 
split token ranges so that they are smaller and there are more Spark tasks? 
This is important for bigger jobs, to protect from sudden failures and avoid 
having to recompute too much in case of a lost Spark partition.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that fetch 
data from the same coordinator node separately.
# It is trying to solve a theoretical problem which hasn't been shown to exist in 
practice yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: if the data is large enough not to fit in the page cache 
(hundreds of GBs on a single node), 256 additional random seeks are not going to 
cause a huge penalty because:
  *** some of them can be hidden by splitting those queries between separate 
Spark threads, so they would be submitted and executed in parallel
  *** each token range will be *hundreds* of MBs in size, which is large enough 
to hide one or two seeks

Some *real* performance problems we (and users) observed:
 * Cassandra uses plenty of CPU when doing sequential scans. It is not 
possible to saturate the bandwidth of a single laptop spinning HDD, because all 
cores of an i7 CPU @ 2.4 GHz are 100% busy processing those small CQL cells, 
merging rows from different SSTables, ordering cells, filtering out tombstones, 
serializing, etc. The problem doesn't go away after doing a full compaction or 
disabling vnodes. This is a serious problem, because doing exactly the same 
query on a plain text file stored in CFS (still C*, but data stored as 2MB 
blobs) gives a 3-30x performance boost (depending on who did the benchmark). We 
need to close this gap. 
 * We need to improve the backpressure mechanism at least in such a way that the 
driver or Spark connector would know to start throttling writes if the cluster 
doesn't keep up. Currently Cassandra just times out the writes, but once that 
happens, the driver has no clue how long to wait until it is ok to resubmit the 
update. It would actually be good to know long enough before timing out, so we 
could slow down and avoid wasteful retrying altogether. Currently it is not 
possible to predict cluster load by e.g. observing write latency, because the 
latency is extremely good until it is suddenly terrible (timeout). This is 
also important for other, non-Spark-related use cases. See 
https://issues.apache.org/jira/browse/CASSANDRA-7937.





was (Author: pkolaczk):
Honestly, I don't think it would benefit Spark integration:

# Seems like adding quite a lot of complexity to handle the following cases:
  ** What do we do if RF > 1 to avoid duplicates? 
  ** If we decide on primary token range only, what do we do if one of the 
nodes fails and some primary token ranges have no node to query from? 
  ** What if the amount of data is large enough that we'd like to actually 
split token ranges so that they are smaller and there are more spark tasks? 
This is important for bigger jobs to protect from sudden failures and not 
having to recompute too much in case of a lost spark partition.
  ** How do we fetch data from the same node in parallel? Currently it is 
perfectly fine to have one Spark node using multiple cores (mappers) that fetch 
data from the same coordinator node separately?
# It is trying to solve a theoretical problem which hasn't proved in practice 
yet.
  ** Russell Spitzer benchmarked vnodes on small/medium/larger data sets. No 
significant difference on larger data sets, and only a tiny difference on 
really small sets (constant cost of the query is higher than the cost of 
fetching the data).
  ** There are no customers reporting vnodes to be a problem for them.
  ** Theoretical reason: If data is large enough to not fit in page cache 
(hundreds of GBs on a single node), 256 additional random seeks is not going to 
cause a huge penalty because:
  *** some of them can be hidden by splitting those queries between separate 
Spark threads, so they would be 

[jira] [Comment Edited] (CASSANDRA-7296) Add CL.COORDINATOR_ONLY

2014-12-17 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250222#comment-14250222
 ] 

Jon Haddad edited comment on CASSANDRA-7296 at 12/17/14 6:05 PM:
-

Good points.  I think this issue would result in other, perhaps more serious, 
problems making an appearance.  I am not convinced, however, that NUM_TOKENS = 
NUM_QUERIES is the right solution on the Spark side either, in the case of 
(data size > disk size && disk_type == spinning_rust).  I think we can move any 
future discussion to the driver JIRA and reference this from there.


was (Author: rustyrazorblade):
Good points.  I think this issue would result in other, perhaps more serious, 
problems making an appearance.  I am not convinced, however, that NUM_TOKENS = 
NUM_QUERIES is the right solution on the Spark side either, in the case of 
(data > disk && disk_type == spinning_rust).  I think we can move any future 
discussion to the driver JIRA and reference this from there.

 Add CL.COORDINATOR_ONLY
 ---

 Key: CASSANDRA-7296
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7296
 Project: Cassandra
  Issue Type: Improvement
Reporter: Tupshin Harper

 For reasons such as CASSANDRA-6340 and similar, it would be nice to have a 
 read that never gets distributed, and only works if the coordinator you are 
 talking to is an owner of the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)