[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/10/15 12:32 AM:
-

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing an OverloadedException (OE) to the client. It does 
not have to shed load yet. (Perhaps this exception or something like it can be 
thrown _before_ load shedding occurs.) This is a very reasonable expectation 
for users who are savvy enough to do active load management at the client 
level. It may have to start writing hints, but if you are writing hints merely 
because of load, this might not be the best justification for having the hints 
system kick in. To me this is inherently a convenient remedy for the wrong 
problem, even if it works well. Yes, hints are there as a general mechanism, 
but it does not solve the problem of needing to know when the system is being 
pushed beyond capacity and how to handle it proactively. You could also say 
that hints actively hurt capacity when you need them most sometimes. They are 
expensive to process given the current implementation, and will always be load 
shifting even at theoretical best. Still we need them for node availability 
concerns, although we should be careful not to use them as a crutch for general 
capacity issues.
2c) Once it reaches that limit, it starts backlogging (ideally with a helpful 
signature of such in the responses, maybe a BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c. In the current 
system, this decision is already made for them. They have no choice.
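
The blocking admission behavior in 2a can be pictured as a bounded gate at the coordinator. A minimal, illustrative sketch (AdmissionGate and its names are hypothetical, not C* internals), where a timeout on admission is the natural point to surface the OE of 2b:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of option 2a: a coordinator-side gate that blocks new
// requests once a bound on in-flight work is reached, instead of shedding
// load or throwing immediately.
public class AdmissionGate {
    private final Semaphore inFlight;

    public AdmissionGate(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    // Blocks the submitter until capacity frees up (2a); a timeout is where
    // a caller could choose to surface an overload exception instead (2b).
    public boolean tryAdmit(long timeout, TimeUnit unit) throws InterruptedException {
        return inFlight.tryAcquire(timeout, unit);
    }

    // Called when a request completes, releasing capacity to blocked submitters.
    public void complete() {
        inFlight.release();
    }

    public int available() {
        return inFlight.availablePermits();
    }
}
```

The point of the sketch is only the shape of the behavior: work is neither dropped nor failed at the limit; submission simply slows to the completion rate.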

In a more optimistic world, users would get near optimal performance for a well 
tuned workload with back-pressure active throughout the system, or something 
very much like it. We could call it a different kind of scheduler, different 
queue management methods, or whatever. 
As long as the user could prioritize stability at some bounded load over 
possible instability at an over-saturating load, I think they would in most 
cases. Like I said, they really don't have this choice right now. I know this 
is not trivial. We can't remove the need to make sane judgments about sizing 
and configuration. We might be able to, however, make the system ramp more 
predictably up to saturation, and behave more reasonably at that level.

Order of precedence, how to designate a mode of operation, and other such 
concerns aren't really addressed here. I just provided the examples above as 
types of behaviors which are nuanced yet perfectly valid for different types of 
system designs. The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly. I do believe that 
these ideas are all compatible ideas on a spectrum of behavior. They are not 
mutually exclusive from a design/implementation perspective. It's possible that 
they could be specified per operation, even, with some traffic yield to others 
due to client policies. For example, a lower priority client could yield when 
it knows the 

cassandra git commit: simplify: switch contains + get -> get

2015-05-09 Thread dbrosius
Repository: cassandra
Updated Branches:
  refs/heads/trunk 6cb19216f -> 16bf51211


simplify: switch contains + get -> get


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/16bf5121
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/16bf5121
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/16bf5121

Branch: refs/heads/trunk
Commit: 16bf51211594fada8115ca70e3731aa3d4440191
Parents: 6cb1921
Author: Dave Brosius dbros...@mebigfatguy.com
Authored: Sat May 9 17:08:27 2015 -0400
Committer: Dave Brosius dbros...@mebigfatguy.com
Committed: Sat May 9 17:08:27 2015 -0400

--
 .../cassandra/io/sstable/metadata/MetadataSerializer.java   | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/16bf5121/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
--
diff --git 
a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java 
b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
index 46fbbe2..8a65d8d 100644
--- a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
+++ b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
@@ -116,9 +116,10 @@ public class MetadataSerializer implements 
IMetadataSerializer
 for (MetadataType type : types)
 {
 MetadataComponent component = null;
-if (toc.containsKey(type))
+Integer offset = toc.get(type);
+if (offset != null)
 {
-in.seek(toc.get(type));
+in.seek(offset);
 component = type.serializer.deserialize(descriptor.version, 
in);
 }
 components.put(type, component);
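
The change above replaces a containsKey/get pair with a single get plus a null check, halving the map lookups per component. A standalone illustration of the pattern (the toy map stands in for the sstable table-of-contents):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the commit's pattern: one map lookup instead of two.
public class SingleLookup {
    // Before: containsKey + get, i.e. two hash lookups for a present key.
    static Integer offsetBefore(Map<String, Integer> toc, String type) {
        if (toc.containsKey(type))
            return toc.get(type);
        return null;
    }

    // After: a single get; a null result signals absence. Equivalent only
    // because the map never stores null values, as is the case for the TOC.
    static Integer offsetAfter(Map<String, Integer> toc, String type) {
        return toc.get(type);
    }
}
```

Note the prerequisite: with nullable values, get alone cannot distinguish "absent" from "present but null", so the rewrite is safe only when nulls are never stored.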



[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications

2015-05-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-9337:
--
Fix Version/s: 3.x

 Expose LocalStrategy to Applications
 

 Key: CASSANDRA-9337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matthias Broecheler
Assignee: Jeremiah Jordan
 Fix For: 3.x


 For applications maintaining secondary indexes (or, more generally, views) on 
 a table, it would be nice if they could rely on the same mechanism that C* 
 uses under the hood to maintain its secondary column indexes. That is, allow 
 applications to create tables with LocalReplicationStrategy (i.e. not 
 replicated) which are not visible to the user when describing the keyspace 
 and which cannot be modified through the client directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications

2015-05-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-9337:
--
Assignee: Jeremiah Jordan

 Expose LocalStrategy to Applications
 

 Key: CASSANDRA-9337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matthias Broecheler
Assignee: Jeremiah Jordan

 For applications maintaining secondary indexes (or, more generally, views) on 
 a table, it would be nice if they could rely on the same mechanism that C* 
 uses under the hood to maintain its secondary column indexes. That is, allow 
 applications to create tables with LocalReplicationStrategy (i.e. not 
 replicated) which are not visible to the user when describing the keyspace 
 and which cannot be modified through the client directly.





[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/10/15 12:26 AM:
-

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing an OverloadedException (OE) to the client. It does 
not have to shed load yet. (Perhaps this exception or something like it can be 
thrown _before_ load shedding occurs.) This is a very reasonable expectation 
for users who are savvy enough to do active load management at the client 
level. It may have to start writing hints, but if you are writing hints merely 
because of load, this might not be the best justification for having the hints 
system kick in. To me this is inherently a convenient remedy for the wrong 
problem, even if it works well. Yes, hints are there as a general mechanism, 
but it does not solve the problem of needing to know when the system is being 
pushed beyond capacity and how to handle it proactively. You could also say 
that hints actively hurt capacity when you need them most sometimes. They are 
expensive to process given the current implementation, and will always be load 
shifting even at theoretical best. Still we need them for node availability 
concerns, although we should be careful not to use them as a crutch for general 
capacity issues.
2c) Once it reaches that limit, it starts backlogging (ideally with a helpful 
signature of such in the responses, maybe a BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c. In the current 
system, this decision is already made for them. They have no choice.

In a more optimistic world, users would get near optimal performance for a well 
tuned workload with back-pressure active throughout the system, or something 
very much like it. We could call it a different kind of scheduler, different 
queue management methods, or whatever. 
As long as the user could prioritize stability at some bounded load over 
possible instability at an over-saturating load, I think they would in most 
cases. Like I said, they really don't have this choice right now. I know this 
is not trivial. We can't remove the need to make sane judgments about sizing 
and configuration. We might be able to, however, make the system ramp more 
predictably up to saturation, and behave more reasonably at that level.

Order of precedence, how to designate a mode of operation, and other such 
concerns aren't really addressed here. I just provided the examples above as 
types of behaviors which are nuanced yet perfectly valid for different types of 
system designs. The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly. I do believe that 
these ideas are all compatible ideas on a spectrum of behavior. They are not 
mutually exclusive from a design/implementation perspective. It's possible that 
they could be specified per operation, even, with some traffic yield to others 
due to client policies. For example, a lower priority client could yield when 
it knows the 

[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications

2015-05-09 Thread Matthias Broecheler (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Broecheler updated CASSANDRA-9337:
---
Description: For applications maintaining secondary indexes (or, more 
generally, views) on a table, it would be nice if they could rely on the same 
mechanism that C* uses under the hood to maintain its secondary column indexes. 
That is, allow applications to create tables with LocalReplicationStrategy 
(i.e. not replicated) which are not visible to the user when describing the 
keyspace and which cannot be modified through the client directly.  (was: For 
applications that build on top of Cassandra, two common use cases emerge:

1) Secondary indexes are used to maintain some form of a custom materialized 
view locally in a separate table. This is essentially what C* column indexes 
do. In that case, the table should be local (i.e. not replicated) as it is 
maintained against another table.
2) A table is used to store configuration information that pertains to the 
application running atop of Cassandra which needs to be replicated to all nodes.

In both cases, the replication strategy differs from standard tables and the 
tables should not be visible to the user when doing a DESCRIBE KEYSPACE. In 
both cases, it would furthermore be nice if writing could be restricted so that 
the tables can only be updated from within the process but not by clients 
through CQL. No read restrictions need to be imposed.)

 Expose LocalStrategy to Applications
 

 Key: CASSANDRA-9337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matthias Broecheler

 For applications maintaining secondary indexes (or, more generally, views) on 
 a table, it would be nice if they could rely on the same mechanism that C* 
 uses under the hood to maintain its secondary column indexes. That is, allow 
 applications to create tables with LocalReplicationStrategy (i.e. not 
 replicated) which are not visible to the user when describing the keyspace 
 and which cannot be modified through the client directly.





[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications

2015-05-09 Thread Matthias Broecheler (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Broecheler updated CASSANDRA-9337:
---
Summary: Expose LocalStrategy to Applications  (was: Advanced table options)

 Expose LocalStrategy to Applications
 

 Key: CASSANDRA-9337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matthias Broecheler

 For applications that build on top of Cassandra, two common use cases emerge:
 1) Secondary indexes are used to maintain some form of a custom materialized 
 view locally in a separate table. This is essentially what C* column indexes 
 do. In that case, the table should be local (i.e. not replicated) as it is 
 maintained against another table.
 2) A table is used to store configuration information that pertains to the 
 application running atop of Cassandra which needs to be replicated to all 
 nodes.
 In both cases, the replication strategy differs from standard tables and the 
 tables should not be visible to the user when doing a DESCRIBE KEYSPACE. In 
 both cases, it would furthermore be nice if writing could be restricted so 
 that the tables can only be updated from within the process but not by 
 clients through CQL. No read restrictions need to be imposed.





[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536484#comment-14536484
 ] 

Jeremiah Jordan commented on CASSANDRA-8576:


Bq. It looks better now, but the mixed-cluster during rolling upgrade issue is 
still there. If someone upgrades half of the cluster to the version with this 
patch, Hadoop jobs will very likely report errors (not sure how bad that will 
be - need to test it).

This is only an issue if the jobs are pulling the C* jar off of the nodes and 
the jar isn't part of the job itself?  So if this is a problem for someone, 
they have a work around.

 Primary Key Pushdown For Hadoop
 ---

 Key: CASSANDRA-8576
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Russell Alexander Spitzer
Assignee: Alex Liu
 Fix For: 2.1.x

 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
 CASSANDRA-8576-v2-2.1-branch.txt


 I've heard reports from several users that they would like to have predicate 
 pushdown functionality for hadoop (Hive in particular) based services. 
 Example use case:
 Table with wide partitions, one per customer
 Application team has HQL they would like to run on a single customer
 Currently time to complete scales with number of customers since Input Format 
 can't push down the primary key predicate
 Current implementation requires a full table scan (since it can't recognize 
 that a single partition was specified)





[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536484#comment-14536484
 ] 

Jeremiah Jordan edited comment on CASSANDRA-8576 at 5/9/15 12:33 PM:
-

bq. It looks better now, but the mixed-cluster during rolling upgrade issue is 
still there. If someone upgrades half of the cluster to the version with this 
patch, Hadoop jobs will very likely report errors (not sure how bad that will 
be - need to test it).

This is only an issue if the jobs are pulling the C* jar off of the nodes and 
the jar isn't part of the job itself?  So if this is a problem for someone, 
they have a work around.


was (Author: jjordan):
Bq. It looks better now, but the mixed-cluster during rolling upgrade issue is 
still there. If someone upgrades half of the cluster to the version with this 
patch, Hadoop jobs will very likely report errors (not sure how bad that will 
be - need to test it).

This is only an issue if the jobs are pulling the C* jar off of the nodes and 
the jar isn't part of the job itself?  So if this is a problem for someone, 
they have a work around.

 Primary Key Pushdown For Hadoop
 ---

 Key: CASSANDRA-8576
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Russell Alexander Spitzer
Assignee: Alex Liu
 Fix For: 2.1.x

 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
 CASSANDRA-8576-v2-2.1-branch.txt


 I've heard reports from several users that they would like to have predicate 
 pushdown functionality for hadoop (Hive in particular) based services. 
 Example use case:
 Table with wide partitions, one per customer
 Application team has HQL they would like to run on a single customer
 Currently time to complete scales with number of customers since Input Format 
 can't push down the primary key predicate
 Current implementation requires a full table scan (since it can't recognize 
 that a single partition was specified)





[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536341#comment-14536341
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

Some comments were not addressed.
{noformat}
  boolean containToken;
for (Range<Token> subrange : ranges)
{
//make sure subrange contains the token
containToken = false;
if (token != null)
{
if (subrange.contains(token))
containToken = true;
else
continue;
}

ColumnFamilySplit split =
new ColumnFamilySplit(
factory.toString(subrange.left),
factory.toString(subrange.right),
subSplit.getRow_count(),
endpoints);

if (containToken)
split.setPartitionKeyEqQuery(containToken);
logger.debug("adding {}", split);
{noformat}
Multiple code smells in this fragment:
* boolean flag declared in a needlessly broad scope. If something is used only 
inside a loop, it should be declared only inside the loop.
* continue controlled by a boolean flag
* redundant if (the code is equivalent without the if (containToken) check)

I simplified it for you:
{noformat}
for (Range<Token> subrange : ranges)
{
boolean containsToken = token != null && subrange.contains(token);
if (token == null || containsToken) {
ColumnFamilySplit split =
new ColumnFamilySplit(
factory.toString(subrange.left),
factory.toString(subrange.right),
subSplit.getRow_count(),
endpoints);
split.setPartitionKeyEqQuery(containsToken);
logger.debug("adding {}", split);
splits.add(split);
}
}
{noformat}





 Primary Key Pushdown For Hadoop
 ---

 Key: CASSANDRA-8576
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Russell Alexander Spitzer
Assignee: Alex Liu
 Fix For: 2.1.x

 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
 CASSANDRA-8576-v2-2.1-branch.txt


 I've heard reports from several users that they would like to have predicate 
 pushdown functionality for hadoop (Hive in particular) based services. 
 Example use case:
 Table with wide partitions, one per customer
 Application team has HQL they would like to run on a single customer
 Currently time to complete scales with number of customers since Input Format 
 can't push down the primary key predicate
 Current implementation requires a full table scan (since it can't recognize 
 that a single partition was specified)





[jira] [Commented] (CASSANDRA-9197) Startup slowdown due to preloading jemalloc

2015-05-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536362#comment-14536362
 ] 

Robert Stupp commented on CASSANDRA-9197:
-

ping [~philipthompson] ;)

 Startup slowdown due to preloading jemalloc
 ---

 Key: CASSANDRA-9197
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9197
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Robert Stupp
Priority: Minor
 Fix For: 3.x

 Attachments: 9197.txt


 On my box, it seems that the jemalloc loading from CASSANDRA-8714 made the 
 process take ~10 seconds to even start (I have no explanation for it). I 
 don't know if it's specific to my machine or not, so this ticket is mainly so 
 someone else can check whether they see the same, in particular for jenkins. If it 
 does see the same slowness, we might want to at least disable jemalloc for 
 dtests.





[jira] [Commented] (CASSANDRA-9229) Add functions to convert timeuuid to date or time

2015-05-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536361#comment-14536361
 ] 

Robert Stupp commented on CASSANDRA-9229:
-

[~JoshuaMcKenzie] the point why I'd prefer not to add any {{toTime}} conversion 
is that we only "know" UTC - so we could only convert to UTC, which is usually 
wrong. You need date+time *and* time zone to perform a correct to-time 
conversion, since time zones and their definitions (e.g. daylight saving time) 
may change. A distributed system with possibly multiple JRE versions can also 
produce different results. That said, I'd like to leave any time conversion up 
to the client.
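
The UTC-only point can be made concrete: a version-1 (time-based) UUID carries 100-ns ticks since 1582-10-15 00:00 UTC, so a server can recover an instant but never a local time without a zone. A hedged sketch (the v1-UUID construction helper is purely illustrative, not how timeuuids are generated in C*):

```java
import java.util.UUID;

// Sketch: extracting the UTC instant from a version-1 (time-based) UUID.
public class TimeuuidMillis {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    // Unix millis of the UUID's timestamp. This is a UTC instant only;
    // rendering it as a local time needs a zone the server cannot know.
    static long toUnixMillis(UUID timeuuid) {
        // UUID.timestamp() throws for non-time-based UUIDs.
        return (timeuuid.timestamp() - UUID_EPOCH_OFFSET) / 10000;
    }

    // Illustrative helper: build a v1 UUID from a raw 60-bit UUID timestamp.
    static UUID fromUuidTimestamp(long ts) {
        long msb = ((ts & 0xFFFFFFFFL) << 32)       // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)  // time_mid
                 | 0x1000L                          // version 1
                 | ((ts >>> 48) & 0x0FFFL);         // time_hi
        long lsb = 0x8000000000000000L;             // IETF variant, zero clockseq/node
        return new UUID(msb, lsb);
    }
}
```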

 Add functions to convert timeuuid to date or time
 -

 Key: CASSANDRA-9229
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9229
 Project: Cassandra
  Issue Type: New Feature
Reporter: Michaël Figuière
Assignee: Benjamin Lerer
  Labels: cql, doc-impacting
 Fix For: 3.x

 Attachments: CASSANDRA-9229.txt


 As CASSANDRA-7523 brings the {{date}} and {{time}} native types to Cassandra, 
 it would be useful to add builtin functions to convert {{timeuuid}} to these 
 two new types, just like {{dateOf()}} is doing for timestamps.
 {{timeOf()}} would extract the time component from a {{timeuuid}}. Example 
 use case could be at insert time with for instance {{timeOf(now())}}, as well 
 as at read time to compare the time component of a {{timeuuid}} column in a 
 {{WHERE}} clause.
 The use cases would be similar for {{date}} but the solution is slightly less 
 obvious, as in a perfect world we would want {{dateOf()}} to convert to 
 {{date}} and {{timestampOf()}} for {{timestamp}}, unfortunately {{dateOf()}} 
 already exists and converts to a {{timestamp}}, not a {{date}}. Making this 
 change would break many existing CQL queries which is not acceptable. 
 Therefore we could use a different naming scheme such as {{toDate}} 
 or {{dateFrom}}. We could then also consider using this new naming convention 
 for the 3 date-related types and just have {{dateOf}} become a deprecated 
 alias.





[jira] [Commented] (CASSANDRA-9230) Allow preparing multiple prepared statements at once

2015-05-09 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536372#comment-14536372
 ] 

Aleksey Yeschenko commented on CASSANDRA-9230:
--

CASSANDRA-8831 and CASSANDRA-7923 should indeed be enough.

Also, the protocol is already asynchronous - send all your prepare requests at 
once, then wait for completion, and you would essentially get the same result 
in the end. If the drivers don't allow us to do that, then it's a drivers 
issue, not a C* issue.

So I'm with Tyler on this. Not worth adding a new protocol level construct 
(-1). Might be worth doing some work on the driver side, though.
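
The "send all prepare requests at once, then wait once" approach can be sketched as follows; prepareAsync here is a stand-in for whatever async prepare call a driver exposes, not any specific driver's API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Sketch: because the protocol is asynchronous, a client can fire all
// PREPARE requests without waiting on each round trip individually.
public class BulkPrepare {
    // Stand-in: a real driver would send a PREPARE frame and complete the
    // future when the server's response arrives.
    static CompletableFuture<String> prepareAsync(String query) {
        return CompletableFuture.supplyAsync(() -> "prepared:" + query);
    }

    static List<String> prepareAll(List<String> queries) {
        // Fire every request first, so they are all in flight concurrently...
        List<CompletableFuture<String>> futures = queries.stream()
                .map(BulkPrepare::prepareAsync)
                .collect(Collectors.toList());
        // ...then wait once at the end: roughly one round trip total.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```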

 Allow preparing multiple prepared statements at once
 

 Key: CASSANDRA-9230
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9230
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Vishy Kasar
Priority: Minor
  Labels: ponies

 We have a few cases like this:
 1. Large (40K) clients
 2. Each client preparing the same 10 prepared statements at the start up and 
 on reconnection to node
 3. Small(ish) number (24) of cassandra nodes 
 The statements need to be prepared on a Cassandra node just once, but currently 
 each is prepared 40K times at startup. 
 https://issues.apache.org/jira/browse/CASSANDRA-8831 will make the situation 
 much better. A further optimization is to allow clients to create not-yet 
 prepared statements in bulk. This way, a client can prepare all of the 
 not-yet-prepared statements with one round trip to the server. 





[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536307#comment-14536307
 ] 

Benedict commented on CASSANDRA-9318:
-

bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing 
logic (and your proposal took me by surprise). Now that I do, I think I 
understand what you are proposing. There are a few problems that I see with it, 
though:

# the cluster as a whole, especially in large clusters, can still send a _lot_ 
of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with 
older operations completely blocking newer ones 
# it might mean a lot more OE than users are used to during temporary blips, 
pushing problems down to clients, when the cluster is actually quite capable of 
coping (through hinting)
# tuning it is hard; network latencies, query processing times, and cluster 
size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current 
work shedding to make it more robust (MessagingService, MUTATION stage and 
ExpiringMap all, effectively, shed; just not with sufficient predictability), 
but I think I've made all my concerns sufficiently clear so I'll leave it with 
you.

 Bound the number of in-flight requests at the coordinator
 -

 Key: CASSANDRA-9318
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
 Fix For: 2.1.x


 It's possible to somewhat bound the amount of load accepted into the cluster 
 by bounding the number of in-flight requests and request bytes.
 An implementation might do something like track the number of outstanding 
 bytes and requests and if it reaches a high watermark disable read on client 
 connections until it goes back below some low watermark.
 Need to make sure that disabling read on the client connection won't 
 introduce other issues.
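
 The watermark scheme described here can be sketched minimally as below (the names and byte-counting policy are illustrative, not C* internals); the gap between the two watermarks provides hysteresis so reads don't flap on and off at the boundary:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the high/low watermark idea: track in-flight request bytes;
// stop reading from client connections at the high watermark and resume
// only once back below the low watermark. A real implementation would also
// need to handle races between the counter update and the flag flip.
public class InFlightWatermark {
    private final long low, high;
    private final AtomicLong inFlightBytes = new AtomicLong();
    private volatile boolean readsEnabled = true;

    InFlightWatermark(long lowWatermark, long highWatermark) {
        this.low = lowWatermark;
        this.high = highWatermark;
    }

    void onRequestStart(long bytes) {
        if (inFlightBytes.addAndGet(bytes) >= high)
            readsEnabled = false;   // stop selecting client channels for read
    }

    void onRequestComplete(long bytes) {
        if (inFlightBytes.addAndGet(-bytes) <= low)
            readsEnabled = true;    // resume reading
    }

    boolean readsEnabled() {
        return readsEnabled;
    }
}
```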





[jira] [Commented] (CASSANDRA-8939) Stack overflow when reading data ingested through SSTableLoader

2015-05-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536318#comment-14536318
 ] 

Benedict commented on CASSANDRA-8939:
-

See CASSANDRA-8946, which ultimately superseded this ticket.

 Stack overflow when reading data ingested through SSTableLoader
 ---

 Key: CASSANDRA-8939
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8939
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Single C* node
 Linux Mint 17.1, kernel 3.16.0-30-generic
 Oracle Java 7u75.
Reporter: Piotr Kołaczkowski
Assignee: Benedict
 Fix For: 2.1.5

 Attachments: 8939.txt


 I created an empty table:
 {noformat}
 CREATE TABLE test.kv (
 key int PRIMARY KEY,
 value text
 ) WITH bloom_filter_fp_chance = 0.01
 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
 AND comment = ''
 AND compaction = {'min_threshold': '4', 'class': 
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
 'max_threshold': '32'}
 AND compression = {'sstable_compression': 
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99.0PERCENTILE';
 {noformat}
 Then I loaded some rows into it using CqlSSTableWriter and SSTableLoader 
 (programmatically, doing it the same way as BulkLoader is doing it). The 
 streaming finished with no errors. 
 I can even read all the data back with cqlsh:
 {noformat}
 cqlsh> SELECT key, value FROM test.kv;
 
  3405 | foo3405
  5504 | foo5504
  3476 | foo3476
  2542 | foo2542
  6931 | foo6931
 ---MORE---
 (1 rows)
 {noformat}
 However, filtering by token fails:
 {noformat}
 cqlsh> SELECT key, value FROM test.kv WHERE token(key) > 854443789258213092;
 OperationTimedOut: errors={}, last_host=127.0.0.1
 cqlsh>
 {noformat}
 Server log reports a StackOverflowError:
 {noformat}
 WARN  15:10:05  Uncaught exception on thread 
 Thread[SharedPool-Worker-2,5,main]: {}
 java.lang.StackOverflowError: null
   at 
 java.nio.charset.CharsetDecoder.implReplaceWith(CharsetDecoder.java:302) 
 ~[na:1.7.0_75]
   at java.nio.charset.CharsetDecoder.replaceWith(CharsetDecoder.java:288) 
 ~[na:1.7.0_75]
   at java.nio.charset.CharsetDecoder.<init>(CharsetDecoder.java:203) 
 ~[na:1.7.0_75]
   at java.nio.charset.CharsetDecoder.<init>(CharsetDecoder.java:226) 
 ~[na:1.7.0_75]
   at sun.nio.cs.UTF_8$Decoder.<init>(UTF_8.java:84) ~[na:1.7.0_75]
   at sun.nio.cs.UTF_8$Decoder.<init>(UTF_8.java:81) ~[na:1.7.0_75]
   at sun.nio.cs.UTF_8.newDecoder(UTF_8.java:68) ~[na:1.7.0_75]
   at 
 org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:152) 
 ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:39)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:26)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:82) 
 ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.cql3.ColumnIdentifier.<init>(ColumnIdentifier.java:54) 
 ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.composites.CompoundSparseCellNameType.idFor(CompoundSparseCellNameType.java:169)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.composites.CompoundSparseCellNameType.makeWith(CompoundSparseCellNameType.java:177)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:106)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:75)
  ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) 
 ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
 org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) 
 ~[cassandra-all-2.1.3.248.jar:2.1.3.248]
   at 
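The repeating deserialize/computeNext frames above suggest the overflow came from recursion proportional to the data being traversed. A minimal, hypothetical illustration of that failure mode and its iterative fix follows; this is not Cassandra's actual code, just the general shape of the bug:

```java
class SkipDemo {
    // Recursive skip: one stack frame per element examined. A long run of
    // non-matching elements grows the stack linearly, eventually throwing
    // StackOverflowError -- the failure mode implied by the trace above.
    static int firstAtLeastRecursive(int[] a, int i, int min) {
        if (i >= a.length) return -1;
        if (a[i] >= min) return a[i];
        return firstAtLeastRecursive(a, i + 1, min);
    }

    // Iterative rewrite: constant stack depth no matter how much is skipped.
    static int firstAtLeastIterative(int[] a, int min) {
        for (int v : a)
            if (v >= min) return v;
        return -1;
    }
}
```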
 

[jira] [Reopened] (CASSANDRA-8812) JVM Crashes on Windows x86

2015-05-09 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reopened CASSANDRA-8812:
-

Somehow missed this in the retrospective. This should be caught by the kitchen 
sink tests, since it requires a lot of concurrent work in parallel to schema 
changes (DROP TABLE), but it could do with its own regression test as well. 
Reopening for that.

 JVM Crashes on Windows x86
 --

 Key: CASSANDRA-8812
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8812
 Project: Cassandra
  Issue Type: Bug
 Environment: Windows 7 running x86(32-bit) Oracle JDK 1.8.0_u31
Reporter: Amichai Rothman
Assignee: Benedict
 Fix For: 2.1.5

 Attachments: 8812.txt, crashtest.tgz


 Under Windows (32 or 64 bit) with the 32-bit Oracle JDK, the JVM may crash 
 due to EXCEPTION_ACCESS_VIOLATION. This happens inconsistently. The attached 
 test project can recreate the crash - sometimes it works successfully, 
 sometimes there's a Java exception in the log, and sometimes the hotspot JVM 
 crash shows up (regardless of whether the JUnit test results in success - you 
 can ignore that). Run it a bunch of times to see the various outcomes. It 
 also contains a sample hotspot error log.
 Note that both when the Java exception is thrown and when the JVM crashes, 
 the stack trace is almost the same - they both eventually occur when the 
 PERIODIC-COMMIT-LOG-SYNCER thread calls CommitLogSegment.sync and accesses 
 the buffer (MappedByteBuffer): if it happens to be in buffer.force(), then 
 the Java exception is thrown, and if it's in one of the buffer.put() calls 
 before it, then the JVM crashes. This possibly exposes a JVM bug as well in 
 this case. So it basically looks like a race condition which results in the 
 buffer sometimes being used after it is no longer valid.
 I recreated this on a PC with Windows 7 64-bit running the 32-bit Oracle JDK, 
 as well as on a modern.ie virtualbox image of Windows 7 32-bit running the 
 JDK, and it happens both with JDK 7 and JDK 8. Also defining an explicit 
 dependency on cassandra 2.1.2 (as opposed to the cassandra-unit dependency on 
 2.1.0) doesn't make a difference. At some point in my testing I've also seen 
 a Java-level exception on Linux, but I can't recreate it at the moment with 
 this test project, so I can't guarantee it.





[jira] [Commented] (CASSANDRA-8851) Uncaught exception on thread Thread[SharedPool-Worker-16,5,main] after upgrade to 2.1.3

2015-05-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536314#comment-14536314
 ] 

Benedict commented on CASSANDRA-8851:
-

bq. Benedict is there still an issue with tests not being run against 
compressed tables where they should be?

Probably, but I'm confused as to the context wrt this ticket?

 Uncaught exception on thread Thread[SharedPool-Worker-16,5,main] after 
 upgrade to 2.1.3
 ---

 Key: CASSANDRA-8851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8851
 Project: Cassandra
  Issue Type: Bug
 Environment: ubuntu 
Reporter: Tobias Schlottke
Assignee: Benedict
Priority: Critical
 Fix For: 2.1.5

 Attachments: cassandra.yaml, schema.txt, system.log.gz


 Hi there,
 after upgrading to 2.1.3 we've been getting the following error every few seconds:
 {code}
 WARN  [SharedPool-Worker-16] 2015-02-23 10:20:36,392 
 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread 
 Thread[SharedPool-Worker-16,5,main]: {}
 java.lang.AssertionError: null
   at org.apache.cassandra.io.util.Memory.size(Memory.java:307) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.utils.obs.OffHeapBitSet.capacity(OffHeapBitSet.java:61) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at org.apache.cassandra.utils.BloomFilter.indexes(BloomFilter.java:74) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.utils.BloomFilter.isPresent(BloomFilter.java:98) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1366)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1350)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:41)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:185)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:273)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1915)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1748)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:342) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[apache-cassandra-2.1.3.jar:2.1.3]
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_45]
   at 
 org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
  ~[apache-cassandra-2.1.3.jar:2.1.3]
   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
 [apache-cassandra-2.1.3.jar:2.1.3]
   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
 {code}
 This seems to crash compactions, push up server load, and pile up pending 
 compactions.
 Any idea / possible workaround?
 Best,
 Tobias





[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop

2015-05-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536352#comment-14536352
 ] 

Piotr Kołaczkowski commented on CASSANDRA-8576:
---

It looks better now, but the mixed-version cluster issue during a rolling 
upgrade is still there. If someone upgrades half of the cluster to the version 
with this patch, Hadoop jobs will very likely report errors (not sure how bad 
that will be; I need to test it). If this is not a problem, +1.


 Primary Key Pushdown For Hadoop
 ---

 Key: CASSANDRA-8576
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Russell Alexander Spitzer
Assignee: Alex Liu
 Fix For: 2.1.x

 Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, 
 CASSANDRA-8576-v2-2.1-branch.txt


 I've heard reports from several users that they would like to have predicate 
 pushdown functionality for hadoop (Hive in particular) based services. 
 Example usecase
 Table with wide partitions, one per customer
 Application team has HQL they would like to run on a single customer
 Currently time to complete scales with number of customers since Input Format 
 can't pushdown primary key predicate
 Current implementation requires a full table scan (since it can't recognize 
 that a single partition was specified)





[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1

2015-05-09 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536739#comment-14536739
 ] 

Alan Boudreault commented on CASSANDRA-7409:


Tests are done, no new blockers experienced during the runs: 
https://drive.google.com/drive/u/0/folders/0BwZ_GPM33j6KfktyN29kelQzd3NEYnNhTnpfajE2UDRwTTUtQkxwQVQ4YnpqaEMxSUk4TXM

We do see some bad performance for standard LCS for Like and Temperature 
scenarios. I will compare them with 2.1 to ensure it's not a new issue.

 Allow multiple overlapping sstables in L1
 -

 Key: CASSANDRA-7409
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7409
 Project: Cassandra
  Issue Type: Improvement
Reporter: Carl Yeksigian
Assignee: Carl Yeksigian
  Labels: compaction
 Fix For: 3.x


 Currently, when a normal L0 compaction takes place (not STCS), we take up to 
 MAX_COMPACTING_L0 L0 sstables and all of the overlapping L1 sstables and 
 compact them together. If we didn't have to deal with the overlapping L1 
 tables, we could compact a higher number of L0 sstables together into a set 
 of non-overlapping L1 sstables.
 This could be done by delaying the invariant that L1 has no overlapping 
 sstables. Going from L1 to L2, we would be compacting fewer sstables together 
 which overlap.
 When reading, we will not have the same one sstable per level (except L0) 
 guarantee, but this can be bounded (once we have too many sets of sstables, 
 either compact them back into the same level, or compact them up to the next 
 level).
 This could be generalized to allow any level to be the maximum for this 
 overlapping strategy.
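For reference, the invariant being relaxed here, that sstables within a level cover disjoint token ranges, reduces to a simple interval-overlap check. A toy illustration with token ranges modelled as longs (purely illustrative, not Cassandra code):

```java
/** Toy model of the leveled-compaction non-overlap invariant: each sstable
 *  covers a [first, last] token range, and within L1+ no two ranges overlap. */
class LevelCheck {
    // Two closed intervals overlap iff each starts before the other ends.
    static boolean overlaps(long aFirst, long aLast, long bFirst, long bLast) {
        return aFirst <= bLast && bFirst <= aLast;
    }

    /** Ranges given as {first, last} pairs, sorted by first token; with that
     *  ordering, only adjacent pairs need checking. */
    static boolean levelIsNonOverlapping(long[][] ranges) {
        for (int i = 1; i < ranges.length; i++)
            if (overlaps(ranges[i - 1][0], ranges[i - 1][1],
                         ranges[i][0], ranges[i][1]))
                return false;
        return true;
    }
}
```

The ticket proposes letting `levelIsNonOverlapping` be temporarily false for L1, bounding how many overlapping sets may accumulate before compacting them away.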





[jira] [Commented] (CASSANDRA-9197) Startup slowdown due to preloading jemalloc

2015-05-09 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536616#comment-14536616
 ] 

Philip Thompson commented on CASSANDRA-9197:


Sorry, [~snazy], I never got the email notification for this. I'll check it out 
on Monday.

 Startup slowdown due to preloading jemalloc
 ---

 Key: CASSANDRA-9197
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9197
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Robert Stupp
Priority: Minor
 Fix For: 3.x

 Attachments: 9197.txt


 On my box, it seems that the jemalloc loading from CASSANDRA-8714 made the 
 process take ~10 seconds to even start (I have no explanation for it). I 
 don't know if it's specific to my machine or not, so this ticket is mainly so 
 someone else can check whether they see the same, in particular for Jenkins. 
 If they do see the same slowness, we might want to at least disable jemalloc 
 for dtests.





[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536307#comment-14536307
 ] 

Benedict edited comment on CASSANDRA-9318 at 5/9/15 5:23 PM:
-

bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing 
logic (and your proposal took me by surprise). Now that I do, I think I 
understand what you are proposing. There are a few problems that I see with it, 
though:

# the cluster as a whole, especially in large clusters, can still send a _lot_ 
of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with 
older operations completely blocking newer ones 
# it might mean a lot more OE than users are used to during temporary blips, 
pushing problems down to clients, when the cluster is actually quite capable of 
coping (through hinting)
#* It seems like this would in fact seriously compromise our A (availability) property, with 
any failure for any node in a token range rapidly making the entire token range 
unavailable for writes\*
# tuning it is hard; network latencies, query processing times, and cluster 
size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current 
work shedding to make it more robust (MessagingService, MUTATION stage and 
ExpiringMap all, effectively, shed; just not with sufficient predictability), 
but I think I've made all my concerns sufficiently clear so I'll leave it with 
you.

\* At the very least we would have to first fallback to hints, rather than 
throwing OE, and wait for hints to saturate before throwing (AFAICT). In which 
case we're _in effect_ introducing LIFO-leaky pruning of the ExpiringMap, MS, 
and the receiving node's MUTATION stage, but under a new mechanism (as opposed 
to inline FIFO? (tbd) pruning). I don't really have anything against this, 
since it is functionally equivalent, although I think FIFO-pruning is 
preferable; having fewer pruning mechanisms is probably preferable; these 
mechanisms would apply more universally; and they would insulate the node from 
the many-to-one effect (by making the MUTATION stage itself robust to overload).


was (Author: benedict):
bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing 
logic (and your proposal took me by surprise). Now that I do, I think I 
understand what you are proposing. There are a few problems that I see with it, 
though:

# the cluster as a whole, especially in large clusters, can still send a _lot_ 
of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with 
older operations completely blocking newer ones 
# it might mean a lot more OE than users are used to during temporary blips, 
pushing problems down to clients, when the cluster is actually quite capable of 
coping (through hinting)
# tuning it is hard; network latencies, query processing times, and cluster 
size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current 
work shedding to make it more robust (MessagingService, MUTATION stage and 
ExpiringMap all, effectively, shed; just not with sufficient predictability), 
but I think I've made all my concerns sufficiently clear so I'll leave it with 
you.

 Bound the number of in-flight requests at the coordinator
 -

 Key: CASSANDRA-9318
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
 Fix For: 2.1.x


 It's possible to somewhat bound the amount of load accepted into the cluster 
 by bounding the number of in-flight requests and request bytes.
 An implementation might do something like track the number of outstanding 
 bytes and requests and if it reaches a high watermark disable read on client 
 connections until it goes back below some low watermark.
 Need to make sure that disabling read on the client connection won't 
 introduce other issues.





[jira] [Resolved] (CASSANDRA-9230) Allow preparing multiple prepared statements at once

2015-05-09 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp resolved CASSANDRA-9230.
-
Resolution: Not A Problem

Closing as "not a problem". Although the tests show some advantage, having 
CASSANDRA-8831 committed should solve the issue, since clients no longer have to 
re-prepare the statements.

 Allow preparing multiple prepared statements at once
 

 Key: CASSANDRA-9230
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9230
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Vishy Kasar
Priority: Minor
  Labels: ponies

 We have a few cases like this:
 1. Large (40K) clients
 2. Each client preparing the same 10 prepared statements at the start up and 
 on reconnection to node
 3. Small(ish) number (24) of cassandra nodes 
 Each statement needs to be prepared on a Cassandra node just once, but 
 currently it is prepared 40K times at startup. 
 https://issues.apache.org/jira/browse/CASSANDRA-8831 will make the situation 
 much better. A further optimization is to allow clients to prepare not-yet 
 prepared statements in bulk. This way, a client can prepare all the not-yet 
 prepared statements with one round trip to the server. 
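The bulk-prepare idea can be sketched from the client side. Everything here is hypothetical: `prepareOnServer` stands in for one driver round trip and is not a real driver API. The point is that deduplication plus batching turns the 10 prepares into at most one round trip per reconnection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical client-side sketch of bulk statement preparation. */
class BulkPreparer {
    private final Map<String, Object> prepared = new HashMap<>();
    // Stands in for a single network round trip preparing many statements.
    private final Function<List<String>, List<Object>> prepareOnServer;

    BulkPreparer(Function<List<String>, List<Object>> prepareOnServer) {
        this.prepareOnServer = prepareOnServer;
    }

    /** Prepares only the statements not yet cached, in a single batch. */
    void prepareAll(List<String> cql) {
        List<String> missing = new ArrayList<>();
        for (String q : cql)
            if (!prepared.containsKey(q))
                missing.add(q);
        if (missing.isEmpty())
            return; // everything cached: zero round trips
        List<Object> stmts = prepareOnServer.apply(missing);
        for (int i = 0; i < missing.size(); i++)
            prepared.put(missing.get(i), stmts.get(i));
    }

    int preparedCount() { return prepared.size(); }
}
```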





[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook commented on CASSANDRA-9318:
---

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. This is a very reasonable expectation for users who 
are savvy enough to do active load management at the client level. It may have 
to start writing hints, but if you are writing hints because of load, this 
might not be the best justification for having the hints system kick in. To me 
this is inherently a convenient remedy for the wrong problem, even if it works 
well. Yes, hints are there as a general mechanism, but it does not relieve us 
of the problem of needing to know when the system is at capacity and how to 
handle it proactively. You could also say that hints actively hurt capacity 
when you need them most sometimes. They are expensive to process given the 
current implementation, and will always be load shifting even at theoretical 
best. Still we need them for node availability concerns, although we should be 
careful not to use them as a crutch for general capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and has pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c.

Order of precedence, designated mode of operation, or any other concerns aren't 
really addressed here. I just provided them as examples of types of behaviors 
which are nuanced yet perfectly valid for different types of system designers. 
The real point here is that there is not a single overall design which is going 
to be acceptable to all users. Still, we need to ensure stability under 
saturating load where possible. I would like to think that with CASSANDRA-8099 
that we can start discussing some of the client-facing back-pressure ideas more 
earnestly.

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
unlimited, bounded at level ___, or bounded by inline resource 
management, but in any case it would be controllable by the user, admin, or 
client. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.
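The 2a behaviour above, block new submissions rather than shedding or erroring, is essentially a bounded admission gate. A hypothetical coordinator-side sketch follows (bounding request count; a real implementation would likely bound bytes too, as in this ticket's description):

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of 2a: once the bound is reached, new work simply
 *  blocks until in-flight work completes -- no shedding, no exceptions. */
class AdmissionGate {
    private final Semaphore permits;

    AdmissionGate(int maxInFlight) {
        permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until there is capacity for one more request. */
    void admit() {
        permits.acquireUninterruptibly();
    }

    /** Called when a request completes (success or failure). */
    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Under this scheme back-pressure propagates naturally: when the gate is full, the thread reading client requests stalls, which in turn slows the TCP stream from the client.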

 Bound the number of in-flight requests at the coordinator
 

[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

2015-05-09 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/9/15 7:42 PM:
---

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. This is a very reasonable expectation for users who 
are savvy enough to do active load management at the client level. It may have 
to start writing hints, but if you are writing hints because of load, this 
might not be the best justification for having the hints system kick in. To me 
this is inherently a convenient remedy for the wrong problem, even if it works 
well. Yes, hints are there as a general mechanism, but it does not relieve us 
of the problem of needing to know when the system is at capacity and how to 
handle it proactively. You could also say that hints actively hurt capacity 
when you need them most sometimes. They are expensive to process given the 
current implementation, and will always be load shifting even at theoretical 
best. Still we need them for node availability concerns, although we should be 
careful not to use them as a crutch for general capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and has pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c.

Order of precedence, designated mode of operation, or any other concerns aren't 
really addressed here. I just provided them as examples of types of behaviors 
which are nuanced yet perfectly valid for different types of system designers. 
The real point here is that there is not a single overall design which is going 
to be acceptable to all users. Still, we need to ensure stability under 
saturating load where possible. I would like to think that with CASSANDRA-8099 
that we can start discussing some of the client-facing back-pressure ideas more 
earnestly.

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
unlimited, bounded at level ___, or bounded by inline resource 
management, but in any case it would be controllable by the user, admin, or 
client. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled by default.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.


was (Author: jshook):
I would venture that a 
