[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846 ] Jonathan Shook edited comment on CASSANDRA-9318 at 5/10/15 12:32 AM:

I would venture that a solid load-shedding system may improve the degenerate overloading case, but it is not the preferred method for dealing with overloading for most users. The concept of back-pressure is more squarely what people expect, for better or worse. Here is what I think reasonable users want to see, with some variations:

1) The system performs with stability, up to the workload that it is able to handle with stability.

2a) Once it reaches that limit, it starts pushing back in terms of how quickly it accepts new work. This means that it simply blocks the operations or submissions of new requests with some useful bound that is determined by the system. It does not yet have to shed load. It does not yet have to give exceptions. This is a very reasonable expectation for most users. This is what they expect. Load shedding is a term of art which does not change the users' expectations.

2b) Once it reaches that limit, it starts throwing OE (OverloadedException) to the client. It does not have to shed load yet. (Perhaps this exception or something like it can be thrown _before_ load shedding occurs.) This is a very reasonable expectation for users who are savvy enough to do active load management at the client level. It may have to start writing hints, but if you are writing hints merely because of load, this might not be the best justification for having the hints system kick in. To me this is inherently a convenient remedy for the wrong problem, even if it works well. Yes, hints are there as a general mechanism, but they do not solve the problem of needing to know when the system is being pushed beyond capacity and how to handle it proactively. You could also say that hints sometimes actively hurt capacity when you need it most. They are expensive to process given the current implementation, and will always be load shifting even at theoretical best. Still, we need them for node availability concerns, although we should be careful not to use them as a crutch for general capacity issues.

2c) Once it reaches that limit, it starts backlogging (without a helpful signature of such in the responses, maybe a BackloggingException with some queue estimate). This is a very reasonable expectation for users who are savvy enough to manage their peak and valley workloads in a sensible way. Sometimes you actually want to tax the ingest and flush side of the system for a bit before allowing it to switch modes and catch up with compaction. The fact that C* can do this is an interesting capability, but those who want back-pressure will not easily see it that way.

2d) If the system is being pushed beyond its capacity, then it may have to shed load. This should only happen if the user has decided that they want to be responsible for such and have pushed the system beyond the reasonable limit without paying attention to the indications in 2a, 2b, and 2c. In the current system, this decision is already made for them. They have no choice.

In a more optimistic world, users would get near-optimal performance for a well-tuned workload with back-pressure active throughout the system, or something very much like it. We could call it a different kind of scheduler, different queue management methods, or whatever. As long as the user could prioritize stability at some bounded load over possible instability at an over-saturating load, I think they would in most cases. Like I said, they really don't have this choice right now. I know this is not trivial. We can't remove the need to make sane judgments about sizing and configuration. We might be able to, however, make the system ramp more predictably up to saturation, and behave more reasonably at that level.

Order of precedence, how to designate a mode of operation, and other such concerns aren't really addressed here. I just provided the examples above as types of behaviors which are nuanced yet perfectly valid for different types of system designs. The real point here is that there is not a single overall QoS/capacity/back-pressure behavior which is going to be acceptable to all users. Still, we need to ensure stability under saturating load where possible. I would like to think that with CASSANDRA-8099 we can start discussing some of the client-facing back-pressure ideas more earnestly. I do believe that these ideas are all compatible ideas on a spectrum of behavior. They are not mutually exclusive from a design/implementation perspective. It's possible that they could even be specified per operation, with some traffic yielding to others due to client policies. For example, a lower priority client could yield when it knows the
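The difference between options 2a and 2b above can be sketched as a coordinator-side admission gate. This is a hypothetical illustration only (class and method names are invented, not Cassandra code): new work blocks while the in-flight bound is held (2a), and an overload error is surfaced to the client only after a bounded wait (2b).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: bound the number of in-flight requests at the
// coordinator. 2a = block the submitter until capacity frees up;
// 2b = give up after a bounded wait and signal overload to the client.
public class AdmissionGate {
    private final Semaphore inFlight;
    private final long maxWaitMillis;

    public AdmissionGate(int maxInFlight, long maxWaitMillis) {
        this.inFlight = new Semaphore(maxInFlight);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Blocks up to maxWaitMillis (2a); past that, throws instead of
    // silently shedding the request (2b).
    public void admit() throws InterruptedException {
        if (!inFlight.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS))
            throw new IllegalStateException("overloaded: in-flight request bound reached");
    }

    // Must be called when the request completes, success or failure.
    public void release() {
        inFlight.release();
    }
}
```

The point of the sketch is that 2a and 2b are the same mechanism with different timeout policies, which is why they sit on one spectrum of behavior rather than being competing designs.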
cassandra git commit: simplify: switch contains get - get
Repository: cassandra
Updated Branches: refs/heads/trunk 6cb19216f -> 16bf51211

simplify: switch contains get - get

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/16bf5121
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/16bf5121
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/16bf5121

Branch: refs/heads/trunk
Commit: 16bf51211594fada8115ca70e3731aa3d4440191
Parents: 6cb1921
Author: Dave Brosius <dbros...@mebigfatguy.com>
Authored: Sat May 9 17:08:27 2015 -0400
Committer: Dave Brosius <dbros...@mebigfatguy.com>
Committed: Sat May 9 17:08:27 2015 -0400
--
 .../cassandra/io/sstable/metadata/MetadataSerializer.java | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/16bf5121/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
--
diff --git a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
index 46fbbe2..8a65d8d 100644
--- a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
+++ b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
@@ -116,9 +116,10 @@ public class MetadataSerializer implements IMetadataSerializer
         for (MetadataType type : types)
         {
             MetadataComponent component = null;
-            if (toc.containsKey(type))
+            Integer offset = toc.get(type);
+            if (offset != null)
             {
-                in.seek(toc.get(type));
+                in.seek(offset);
                 component = type.serializer.deserialize(descriptor.version, in);
             }
             components.put(type, component);
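The commit above replaces a containsKey probe followed by get with a single get plus null check, halving the hash lookups per metadata component. A minimal sketch of the same pattern (class and method names hypothetical; the rewrite is only safe when the map cannot contain null values, so null unambiguously means "absent"):

```java
import java.util.HashMap;
import java.util.Map;

public class SingleLookup {
    // Before: two lookups per hit (containsKey hashes the key, then get hashes it again).
    static long seekBefore(Map<String, Integer> toc, String type) {
        if (toc.containsKey(type))
            return toc.get(type);      // second lookup of the same key
        return -1;
    }

    // After: one lookup. Correct only because this map never stores null
    // values, so a null result means the key is absent.
    static long seekAfter(Map<String, Integer> toc, String type) {
        Integer offset = toc.get(type);
        if (offset != null)
            return offset;
        return -1;
    }

    // Tiny sample table-of-contents for illustration.
    static Map<String, Integer> sampleToc() {
        Map<String, Integer> toc = new HashMap<>();
        toc.put("STATS", 42);
        return toc;
    }
}
```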
[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications
[ https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9337: -- Fix Version/s: 3.x Expose LocalStrategy to Applications Key: CASSANDRA-9337 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337 Project: Cassandra Issue Type: Improvement Reporter: Matthias Broecheler Assignee: Jeremiah Jordan Fix For: 3.x For applications maintaining secondary indexes (or, more generally, views) on a table, it would be nice if they could rely on the same mechanism that C* uses under the hood to maintain its secondary column indexes. That is, allow applications to create tables with LocalReplicationStrategy (i.e. not replicated) which are not visible to the user when describing the keyspace and which cannot be modified through the client directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications
[ https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9337: -- Assignee: Jeremiah Jordan Expose LocalStrategy to Applications Key: CASSANDRA-9337 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337 Project: Cassandra Issue Type: Improvement Reporter: Matthias Broecheler Assignee: Jeremiah Jordan For applications maintaining secondary indexes (or, more generally, views) on a table, it would be nice if they could rely on the same mechanism that C* uses under the hood to maintain its secondary column indexes. That is, allow applications to create tables with LocalReplicationStrategy (i.e. not replicated) which are not visible to the user when describing the keyspace and which cannot be modified through the client directly.
[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications
[ https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Broecheler updated CASSANDRA-9337: --- Description: For applications maintaining secondary indexes (or, more generally, views) on a table, it would be nice if they could rely on the same mechanism that C* uses under the hood to maintain its secondary column indexes. That is, allow applications to create tables with LocalReplicationStrategy (i.e. not replicated) which are not visible to the user when describing the keyspace and which cannot be modified through the client directly. (was: For applications that build on top of Cassandra, two common use cases emerge: 1) Secondary indexes are used to maintain some form of a custom materialized view locally in a separate table. This is essentially what C* column indexes do. In that case, the table should be local (i.e. not replicated) as it is maintained against another table. 2) A table is used to store configuration information that pertains to the application running atop of Cassandra which needs to be replicated to all nodes. In both cases, the replication strategy differs from standard tables and the tables should not be visible to the user when doing a DESCRIBE KEYSPACE. In both cases, it would furthermore be nice if writing could be restricted so that the tables can only be updated from within the process but not by clients through CQL. No read restrictions need to be imposed.) Expose LocalStrategy to Applications Key: CASSANDRA-9337 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337 Project: Cassandra Issue Type: Improvement Reporter: Matthias Broecheler For applications maintaining secondary indexes (or, more generally, views) on a table, it would be nice if they could rely on the same mechanism that C* uses under the hood to maintain its secondary column indexes. That is, allow applications to create tables with LocalReplicationStrategy (i.e. not replicated) which are not visible to the user when describing the keyspace and which cannot be modified through the client directly.
[jira] [Updated] (CASSANDRA-9337) Expose LocalStrategy to Applications
[ https://issues.apache.org/jira/browse/CASSANDRA-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Broecheler updated CASSANDRA-9337: --- Summary: Expose LocalStrategy to Applications (was: Advanced table options) Expose LocalStrategy to Applications Key: CASSANDRA-9337 URL: https://issues.apache.org/jira/browse/CASSANDRA-9337 Project: Cassandra Issue Type: Improvement Reporter: Matthias Broecheler For applications that build on top of Cassandra, two common use cases emerge: 1) Secondary indexes are used to maintain some form of a custom materialized view locally in a separate table. This is essentially what C* column indexes do. In that case, the table should be local (i.e. not replicated) as it is maintained against another table. 2) A table is used to store configuration information that pertains to the application running atop of Cassandra which needs to be replicated to all nodes. In both cases, the replication strategy differs from standard tables and the tables should not be visible to the user when doing a DESCRIBE KEYSPACE. In both cases, it would furthermore be nice if writing could be restricted so that the tables can only be updated from within the process but not by clients through CQL. No read restrictions need to be imposed.
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536484#comment-14536484 ] Jeremiah Jordan commented on CASSANDRA-8576: Bq. It looks better now, but the mixed-cluster during rolling upgrade issue is still there. If someone upgrades half of the cluster to the version with this patch, Hadoop jobs will very likely report errors (not sure how bad that will be - need to test it). This is only an issue if the jobs are pulling the C* jar off of the nodes and the jar isn't part of the job itself? So if this is a problem for someone, they have a work around.

Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt

I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services.

Example use case:
Table with wide partitions, one per customer
Application team has HQL they would like to run on a single customer
Currently time to complete scales with number of customers since Input Format can't push down the primary key predicate
Current implementation requires a full table scan (since it can't recognize that a single partition was specified)
[jira] [Comment Edited] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536484#comment-14536484 ] Jeremiah Jordan edited comment on CASSANDRA-8576 at 5/9/15 12:33 PM:

bq. It looks better now, but the mixed-cluster during rolling upgrade issue is still there. If someone upgrades half of the cluster to the version with this patch, Hadoop jobs will very likely report errors (not sure how bad that will be - need to test it).

This is only an issue if the jobs are pulling the C* jar off of the nodes and the jar isn't part of the job itself? So if this is a problem for someone, they have a work around.

was (Author: jjordan): Bq. It looks better now, but the mixed-cluster during rolling upgrade issue is still there. If someone upgrades half of the cluster to the version with this patch, Hadoop jobs will very likely report errors (not sure how bad that will be - need to test it). This is only an issue if the jobs are pulling the C* jar off of the nodes and the jar isn't part of the job itself? So if this is a problem for someone, they have a work around.

Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt

I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services.

Example use case:
Table with wide partitions, one per customer
Application team has HQL they would like to run on a single customer
Currently time to complete scales with number of customers since Input Format can't push down the primary key predicate
Current implementation requires a full table scan (since it can't recognize that a single partition was specified)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536341#comment-14536341 ] Piotr Kołaczkowski commented on CASSANDRA-8576: --- Some comments were not addressed.

{noformat}
boolean containToken;
for (Range<Token> subrange : ranges)
{
    // make sure subrange contains the token
    containToken = false;
    if (token != null)
    {
        if (subrange.contains(token))
            containToken = true;
        else
            continue;
    }
    ColumnFamilySplit split = new ColumnFamilySplit(
            factory.toString(subrange.left),
            factory.toString(subrange.right),
            subSplit.getRow_count(),
            endpoints);
    if (containToken)
        split.setPartitionKeyEqQuery(containToken);
    logger.debug("adding {}", split);
{noformat}

Multiple code smells in this fragment:
* boolean flag declared in a needlessly broad scope. If something is used only inside a loop, it should be declared only inside the loop.
* continue controlled by a boolean flag
* redundant if (the code is equivalent without the if (containToken) check)

I simplified it for you:
{noformat}
for (Range<Token> subrange : ranges)
{
    boolean containsToken = token != null && subrange.contains(token);
    if (token == null || containsToken)
    {
        ColumnFamilySplit split = new ColumnFamilySplit(
                factory.toString(subrange.left),
                factory.toString(subrange.right),
                subSplit.getRow_count(),
                endpoints);
        split.setPartitionKeyEqQuery(containsToken);
        logger.debug("adding {}", split);
        splits.add(split);
    }
}
{noformat}

Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt

I've heard reports from several users that they would like to have predicate pushdown functionality for hadoop (Hive in particular) based services.

Example use case:
Table with wide partitions, one per customer
Application team has HQL they would like to run on a single customer
Currently time to complete scales with number of customers since Input Format can't push down the primary key predicate
Current implementation requires a full table scan (since it can't recognize that a single partition was specified)
[jira] [Commented] (CASSANDRA-9197) Startup slowdown due to preloading jemalloc
[ https://issues.apache.org/jira/browse/CASSANDRA-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536362#comment-14536362 ] Robert Stupp commented on CASSANDRA-9197: - ping [~philipthompson] ;) Startup slowdown due to preloading jemalloc --- Key: CASSANDRA-9197 URL: https://issues.apache.org/jira/browse/CASSANDRA-9197 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Priority: Minor Fix For: 3.x Attachments: 9197.txt On my box, it seems that the jemalloc loading from CASSANDRA-8714 made the process take ~10 seconds to even start (I have no explanation for it). I don't know if it's specific to my machine or not, so this ticket is mainly so someone else can check if they see the same, in particular for jenkins. If they do see the same slowness, we might want to at least disable jemalloc for dtests.
[jira] [Commented] (CASSANDRA-9229) Add functions to convert timeuuid to date or time
[ https://issues.apache.org/jira/browse/CASSANDRA-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536361#comment-14536361 ] Robert Stupp commented on CASSANDRA-9229: - [~JoshuaMcKenzie] the reason I'd prefer not to add any {{toTime}} conversion is that we only "know" UTC - so we could only convert to UTC, which is usually wrong. You need date+time *and* time zone to perform a correct to-time conversion, since time zones or their definitions (e.g. daylight-saving time) may change. Having a distributed system with probably multiple JRE versions can cause different results. That said, I'd like to leave any time conversion up to the client. Add functions to convert timeuuid to date or time - Key: CASSANDRA-9229 URL: https://issues.apache.org/jira/browse/CASSANDRA-9229 Project: Cassandra Issue Type: New Feature Reporter: Michaël Figuière Assignee: Benjamin Lerer Labels: cql, doc-impacting Fix For: 3.x Attachments: CASSANDRA-9229.txt As CASSANDRA-7523 brings the {{date}} and {{time}} native types to Cassandra, it would be useful to add builtin functions to convert {{timeuuid}} to these two new types, just like {{dateOf()}} is doing for timestamps. {{timeOf()}} would extract the time component from a {{timeuuid}}. An example use case could be at insert time, for instance {{timeOf(now())}}, as well as at read time to compare the time component of a {{timeuuid}} column in a {{WHERE}} clause. The use cases would be similar for {{date}} but the solution is slightly less obvious, as in a perfect world we would want {{dateOf()}} to convert to {{date}} and {{timestampOf()}} for {{timestamp}}; unfortunately {{dateOf()}} already exists and converts to a {{timestamp}}, not a {{date}}. Making this change would break many existing CQL queries, which is not acceptable. Therefore we could use a different name formatting logic such as {{toDate}} or {{dateFrom}}. We could then also consider using this new name convention for the 3 date-related types and just have {{dateOf}} become a deprecated alias.
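The UTC-only point above can be made concrete: a version-1 {{timeuuid}} encodes a 60-bit count of 100-ns intervals since 1582-10-15, which identifies an instant but carries no time zone. The sketch below (class and helper names are illustrative, not Cassandra's UUIDGen API) extracts the Unix-epoch milliseconds that a dateOf()-style function works from; converting that instant to a wall-clock time is exactly the step that needs a client-supplied zone.

```java
import java.util.UUID;

public class TimeUuidConversions {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    // Extract the Unix timestamp in milliseconds from a version-1
    // (time-based) UUID. The result is an instant, anchored to UTC;
    // rendering it as a local time requires a zone the server cannot know.
    static long unixMillis(UUID timeuuid) {
        if (timeuuid.version() != 1)
            throw new IllegalArgumentException("not a time-based UUID");
        return (timeuuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000;
    }

    // Illustration-only helper: pack a raw 60-bit UUID timestamp into the
    // most-significant bits of a version-1 UUID.
    static UUID fromRawTimestamp(long ts) {
        long msb = ((ts & 0xFFFFFFFFL) << 32)        // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)   // time_mid
                 | 0x1000L                           // version 1
                 | ((ts >>> 48) & 0x0FFFL);          // time_hi
        return new UUID(msb, 0x8000000000000000L);   // variant bits in lsb
    }
}
```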
[jira] [Commented] (CASSANDRA-9230) Allow preparing multiple prepared statements at once
[ https://issues.apache.org/jira/browse/CASSANDRA-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536372#comment-14536372 ] Aleksey Yeschenko commented on CASSANDRA-9230: -- CASSANDRA-8831 and CASSANDRA-7923 should indeed be enough. Also, the protocol is already asynchronous - send all your prepare requests at once, then wait for completion, and you would essentially get the same result in the end. If the drivers don't allow us to do that, then it's a drivers issue, not a C* issue. So I'm with Tyler on this. Not worth adding a new protocol-level construct (-1). Might be worth doing some work on the driver side though. Allow preparing multiple prepared statements at once Key: CASSANDRA-9230 URL: https://issues.apache.org/jira/browse/CASSANDRA-9230 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Vishy Kasar Priority: Minor Labels: ponies We have a few cases like this: 1. Large (40K) clients 2. Each client preparing the same 10 prepared statements at the start up and on reconnection to node 3. Small(ish) number (24) of Cassandra nodes The statements need to be prepared on a Cassandra node just once, but currently they are prepared 40K times at startup. https://issues.apache.org/jira/browse/CASSANDRA-8831 will make the situation much better. A further optimization is to allow clients to create not-yet-prepared statements in bulk. This way, a client can prepare all the not-yet-prepared statements with one round trip to the server.
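Aleksey's "send all your prepare requests at once, then wait for completion" can be sketched with plain CompletableFutures. This is a hypothetical illustration, not a driver API: `prepareAsync` stands in for whatever asynchronous prepare call a given driver exposes, so N statements cost roughly one round trip of latency instead of N sequential ones.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: fire every PREPARE asynchronously up front,
// then wait once for all of them to complete.
public class BulkPrepare {
    static <T> List<T> prepareAll(List<String> queries,
                                  Function<String, CompletableFuture<T>> prepareAsync) {
        // Send all requests immediately; nothing blocks here.
        List<CompletableFuture<T>> inFlight = queries.stream()
                .map(prepareAsync)
                .collect(Collectors.toList());
        // Single wait point: completes once every prepare has finished.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        return inFlight.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

This is the driver-side shape of the optimization: no new protocol construct is needed because the native protocol already allows multiple outstanding requests per connection.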
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536307#comment-14536307 ] Benedict commented on CASSANDRA-9318: -

bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing logic (and your proposal took me by surprise). Now that I do, I think I understand what you are proposing. There are a few problems that I see with it, though:
# the cluster as a whole, especially in large clusters, can still send a _lot_ of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with older operations completely blocking newer ones
# it might mean a lot more OE than users are used to during temporary blips, pushing problems down to clients, when the cluster is actually quite capable of coping (through hinting)
# tuning it is hard; network latencies, query processing times, and cluster size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current work shedding to make it more robust (MessagingService, MUTATION stage and ExpiringMap all, effectively, shed; just not with sufficient predictability), but I think I've made all my concerns sufficiently clear so I'll leave it with you.

Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x

It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests and, if it reaches a high watermark, disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues.
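The high/low watermark scheme the ticket describes can be sketched as a small byte-counting gate. This is a hypothetical illustration only (names invented, no Cassandra internals): the two thresholds provide hysteresis, so reads are not toggled on and off around a single boundary.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: track outstanding request bytes, stop reading from
// client connections above a high watermark, and resume only once drained
// below a lower watermark (hysteresis avoids rapid flapping).
public class InFlightBytesTracker {
    private final long highWatermark;
    private final long lowWatermark;
    private final AtomicLong outstanding = new AtomicLong();
    private volatile boolean readsPaused = false;

    public InFlightBytesTracker(long high, long low) {
        this.highWatermark = high;
        this.lowWatermark = low;
    }

    // Called when a request is admitted; returns true if reads are now paused.
    public boolean onRequest(long bytes) {
        if (outstanding.addAndGet(bytes) >= highWatermark)
            readsPaused = true;   // stop reading new requests from clients
        return readsPaused;
    }

    // Called when a request completes; returns true if reads may proceed.
    public boolean onComplete(long bytes) {
        if (outstanding.addAndGet(-bytes) <= lowWatermark)
            readsPaused = false;  // drained enough: resume reading
        return !readsPaused;
    }
}
```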
[jira] [Commented] (CASSANDRA-8939) Stack overflow when reading data ingested through SSTableLoader
[ https://issues.apache.org/jira/browse/CASSANDRA-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536318#comment-14536318 ] Benedict commented on CASSANDRA-8939: - See CASSANDRA-8946, which ultimately superceded this ticket. Stack overflow when reading data ingested through SSTableLoader --- Key: CASSANDRA-8939 URL: https://issues.apache.org/jira/browse/CASSANDRA-8939 Project: Cassandra Issue Type: Bug Components: Core Environment: Single C* node Linux Mint 17.1, kernel 3.16.0-30-generic Oracle Java 7u75. Reporter: Piotr Kołaczkowski Assignee: Benedict Fix For: 2.1.5 Attachments: 8939.txt I created an empty table: {noformat} CREATE TABLE test.kv ( key int PRIMARY KEY, value text ) WITH bloom_filter_fp_chance = 0.01 AND caching = '{keys:ALL, rows_per_partition:NONE}' AND comment = '' AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99.0PERCENTILE'; {noformat} Then I loaded some rows into it using CqlSSTableWriter and SSTableLoader (programmatically, doing it the same way as BulkLoader is doing it). The streaming finished with no errors. 
I can even read all the data back with cqlsh: {noformat} cqlsh> SELECT key, value FROM test.kv; 3405 | foo3405 5504 | foo5504 3476 | foo3476 2542 | foo2542 6931 | foo6931 ---MORE--- (1 rows) {noformat} However, filtering by token fails: {noformat} cqlsh> SELECT key, value FROM test.kv WHERE token(key) > 854443789258213092; OperationTimedOut: errors={}, last_host=127.0.0.1 cqlsh> {noformat} Server log reports a StackOverflowError: {noformat} WARN 15:10:05 Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {} java.lang.StackOverflowError: null at java.nio.charset.CharsetDecoder.implReplaceWith(CharsetDecoder.java:302) ~[na:1.7.0_75] at java.nio.charset.CharsetDecoder.replaceWith(CharsetDecoder.java:288) ~[na:1.7.0_75] at java.nio.charset.CharsetDecoder.<init>(CharsetDecoder.java:203) ~[na:1.7.0_75] at java.nio.charset.CharsetDecoder.<init>(CharsetDecoder.java:226) ~[na:1.7.0_75] at sun.nio.cs.UTF_8$Decoder.<init>(UTF_8.java:84) ~[na:1.7.0_75] at sun.nio.cs.UTF_8$Decoder.<init>(UTF_8.java:81) ~[na:1.7.0_75] at sun.nio.cs.UTF_8.newDecoder(UTF_8.java:68) ~[na:1.7.0_75] at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:152) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:39) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.serializers.AbstractTextSerializer.deserialize(AbstractTextSerializer.java:26) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:82) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.cql3.ColumnIdentifier.<init>(ColumnIdentifier.java:54) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.composites.CompoundSparseCellNameType.idFor(CompoundSparseCellNameType.java:169) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.composites.CompoundSparseCellNameType.makeWith(CompoundSparseCellNameType.java:177) 
~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.composites.AbstractCompoundCellNameType.fromByteBuffer(AbstractCompoundCellNameType.java:106) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:397) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:381) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:75) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) ~[cassandra-all-2.1.3.248.jar:2.1.3.248] at
[jira] [Reopened] (CASSANDRA-8812) JVM Crashes on Windows x86
[ https://issues.apache.org/jira/browse/CASSANDRA-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict reopened CASSANDRA-8812: - Somehow missed this in the retrospective. This should be caught by the kitchen sink tests, since it requires a lot of concurrent work in parallel to schema changes (DROP TABLE), but it could do with its own regression test as well. Reopening for that. JVM Crashes on Windows x86 -- Key: CASSANDRA-8812 URL: https://issues.apache.org/jira/browse/CASSANDRA-8812 Project: Cassandra Issue Type: Bug Environment: Windows 7 running x86(32-bit) Oracle JDK 1.8.0_u31 Reporter: Amichai Rothman Assignee: Benedict Fix For: 2.1.5 Attachments: 8812.txt, crashtest.tgz Under Windows (32 or 64 bit) with the 32-bit Oracle JDK, the JVM may crash due to EXCEPTION_ACCESS_VIOLATION. This happens inconsistently. The attached test project can recreate the crash - sometimes it works successfully, sometimes there's a Java exception in the log, and sometimes the hotspot JVM crash shows up (regardless of whether the JUnit test results in success - you can ignore that). Run it a bunch of times to see the various outcomes. It also contains a sample hotspot error log. Note that both when the Java exception is thrown and when the JVM crashes, the stack trace is almost the same - they both eventually occur when the PERIODIC-COMMIT-LOG-SYNCER thread calls CommitLogSegment.sync and accesses the buffer (MappedByteBuffer): if it happens to be in buffer.force(), then the Java exception is thrown, and if it's in one of the buffer.put() calls before it, then the JVM crashes. This possibly exposes a JVM bug as well in this case. So it basically looks like a race condition which results in the buffer sometimes being used after it is no longer valid. I recreated this on a PC with Windows 7 64-bit running the 32-bit Oracle JDK, as well as on a modern.ie virtualbox image of Windows 7 32-bit running the JDK, and it happens both with JDK 7 and JDK 8. 
Also defining an explicit dependency on cassandra 2.1.2 (as opposed to the cassandra-unit dependency on 2.1.0) doesn't make a difference. At some point in my testing I've also seen a Java-level exception on Linux, but I can't recreate it at the moment with this test project, so I can't guarantee it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8851) Uncaught exception on thread Thread[SharedPool-Worker-16,5,main] after upgrade to 2.1.3
[ https://issues.apache.org/jira/browse/CASSANDRA-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536314#comment-14536314 ] Benedict commented on CASSANDRA-8851: - bq. Benedict is there still an issue with tests not being run against compressed tables where they should be? Probably, but I'm confused as to the context wrt this ticket? Uncaught exception on thread Thread[SharedPool-Worker-16,5,main] after upgrade to 2.1.3 --- Key: CASSANDRA-8851 URL: https://issues.apache.org/jira/browse/CASSANDRA-8851 Project: Cassandra Issue Type: Bug Environment: ubuntu Reporter: Tobias Schlottke Assignee: Benedict Priority: Critical Fix For: 2.1.5 Attachments: cassandra.yaml, schema.txt, system.log.gz Hi there, after upgrading to 2.1.3 we've got the following error every few seconds: {code} WARN [SharedPool-Worker-16] 2015-02-23 10:20:36,392 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-16,5,main]: {} java.lang.AssertionError: null at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.utils.obs.OffHeapBitSet.capacity(OffHeapBitSet.java:61) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.utils.BloomFilter.indexes(BloomFilter.java:74) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.utils.BloomFilter.isPresent(BloomFilter.java:98) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1366) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1350) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:41) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:185) ~[apache-cassandra-2.1.3.jar:2.1.3] at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:273) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1915) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1748) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:342) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:57) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) ~[apache-cassandra-2.1.3.jar:2.1.3] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.3.jar:2.1.3] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] {code} This seems to crash the compactions and pushes up server load and piles up compactions. Any idea / possible workaround? Best, Tobias -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8576) Primary Key Pushdown For Hadoop
[ https://issues.apache.org/jira/browse/CASSANDRA-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536352#comment-14536352 ] Piotr Kołaczkowski commented on CASSANDRA-8576: --- It looks better now, but the mixed-version cluster issue during a rolling upgrade is still there. If someone upgrades half of the cluster to the version with this patch, Hadoop jobs will very likely report errors (not sure how bad that will be - need to test it). If this is not a problem, +1. Primary Key Pushdown For Hadoop --- Key: CASSANDRA-8576 URL: https://issues.apache.org/jira/browse/CASSANDRA-8576 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Russell Alexander Spitzer Assignee: Alex Liu Fix For: 2.1.x Attachments: 8576-2.1-branch.txt, 8576-trunk.txt, CASSANDRA-8576-v2-2.1-branch.txt I've heard reports from several users that they would like to have predicate pushdown functionality for Hadoop (Hive in particular) based services. Example use case: Table with wide partitions, one per customer. Application team has HQL they would like to run on a single customer. Currently time to complete scales with the number of customers, since the Input Format can't push down the primary key predicate. The current implementation requires a full table scan (since it can't recognize that a single partition was specified). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7409) Allow multiple overlapping sstables in L1
[ https://issues.apache.org/jira/browse/CASSANDRA-7409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536739#comment-14536739 ] Alan Boudreault commented on CASSANDRA-7409: Tests are done, no new blockers experienced during the runs: https://drive.google.com/drive/u/0/folders/0BwZ_GPM33j6KfktyN29kelQzd3NEYnNhTnpfajE2UDRwTTUtQkxwQVQ4YnpqaEMxSUk4TXM We do see some bad performance for standard LCS for Like and Temperature scenarios. I will compare them with 2.1 to ensure it's not a new issue. Allow multiple overlapping sstables in L1 - Key: CASSANDRA-7409 URL: https://issues.apache.org/jira/browse/CASSANDRA-7409 Project: Cassandra Issue Type: Improvement Reporter: Carl Yeksigian Assignee: Carl Yeksigian Labels: compaction Fix For: 3.x Currently, when a normal L0 compaction takes place (not STCS), we take up to MAX_COMPACTING_L0 L0 sstables and all of the overlapping L1 sstables and compact them together. If we didn't have to deal with the overlapping L1 tables, we could compact a higher number of L0 sstables together into a set of non-overlapping L1 sstables. This could be done by delaying the invariant that L1 has no overlapping sstables. Going from L1 to L2, we would be compacting fewer sstables together which overlap. When reading, we will not have the same one sstable per level (except L0) guarantee, but this can be bounded (once we have too many sets of sstables, either compact them back into the same level, or compact them up to the next level). This could be generalized to allow any level to be the maximum for this overlapping strategy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9197) Startup slowdown due to preloading jemalloc
[ https://issues.apache.org/jira/browse/CASSANDRA-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536616#comment-14536616 ] Philip Thompson commented on CASSANDRA-9197: Sorry, [~snazy], I never got the email notification for this. I'll check it out on Monday. Startup slowdown due to preloading jemalloc --- Key: CASSANDRA-9197 URL: https://issues.apache.org/jira/browse/CASSANDRA-9197 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Priority: Minor Fix For: 3.x Attachments: 9197.txt On my box, it seems that the jemalloc loading from CASSANDRA-8714 made the process take ~10 seconds to even start (I have no explanation for it). I don't know if it's specific to my machine or not, so this ticket is mainly so someone else can check whether they see the same, in particular for jenkins. If it does show the same slowness, we might want to at least disable jemalloc for dtests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536307#comment-14536307 ] Benedict edited comment on CASSANDRA-9318 at 5/9/15 5:23 PM: - bq. Where? Are you talking about the hint limit? I was, and I realise that was a mistake; I didn't fully understand the existing logic (and your proposal took me by surprise). Now that I do, I think I understand what you are proposing. There are a few problems that I see with it, though: # the cluster as a whole, especially in large clusters, can still send a _lot_ of requests to a single node # it has the opposite impact of (and likely prevents) CASSANDRA-3852, with older operations completely blocking newer ones # it might mean a lot more OE than users are used to during temporary blips, pushing problems down to clients, when the cluster is actually quite capable of coping (through hinting) #* It seems like this would in fact seriously compromise our A property, with any failure for any node in a token range rapidly making the entire token range unavailable for writes\* # tuning it is hard; network latencies, query processing times, and cluster size (which changes over time) will each impact it I'm wary about a feature like this, when we could simply improve our current work shedding to make it more robust (MessagingService, MUTATION stage and ExpiringMap all, effectively, shed; just not with sufficient predictability), but I think I've made all my concerns sufficiently clear so I'll leave it with you. \* At the very least we would have to first fallback to hints, rather than throwing OE, and wait for hints to saturate before throwing (AFAICT). In which case we're _in effect_ introducing LIFO-leaky pruning of the ExpiringMap, MS, and the receiving node's MUTATION stage, but under a new mechanism (as opposed to inline FIFO? (tbd) pruning). 
I don't really have anything against this, since it is functionally equivalent, although I think FIFO-pruning is preferable; having fewer pruning mechanisms is probably preferable; these mechanisms would apply more universally; and they would insulate the node from the many-to-one effect (by making the MUTATION stage itself robust to overload). was (Author: benedict): bq. Where? Are you talking about the hint limit? I was, and I realise that was a mistake; I didn't fully understand the existing logic (and your proposal took me by surprise). Now that I do, I think I understand what you are proposing. There are a few problems that I see with it, though: # the cluster as a whole, especially in large clusters, can still send a _lot_ of requests to a single node # it has the opposite impact of (and likely prevents) CASSANDRA-3852, with older operations completely blocking newer ones # it might mean a lot more OE than users are used to during temporary blips, pushing problems down to clients, when the cluster is actually quite capable of coping (through hinting) # tuning it is hard; network latencies, query processing times, and cluster size (which changes over time) will each impact it I'm wary about a feature like this, when we could simply improve our current work shedding to make it more robust (MessagingService, MUTATION stage and ExpiringMap all, effectively, shed; just not with sufficient predictability), but I think I've made all my concerns sufficiently clear so I'll leave it with you. Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Ariel Weisberg Fix For: 2.1.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. 
An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark. Need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9230) Allow preparing multiple prepared statements at once
[ https://issues.apache.org/jira/browse/CASSANDRA-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp resolved CASSANDRA-9230. - Resolution: Not A Problem Closing as "not a problem". Although the tests show some advantage, having CASSANDRA-8831 committed should solve the issue since clients no longer have to re-prepare the statements. Allow preparing multiple prepared statements at once Key: CASSANDRA-9230 URL: https://issues.apache.org/jira/browse/CASSANDRA-9230 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Vishy Kasar Priority: Minor Labels: ponies We have a few cases like this: 1. Large (40K) clients 2. Each client preparing the same 10 prepared statements at startup and on reconnection to a node 3. Small(ish) number (24) of Cassandra nodes The statement needs to be prepared on a Cassandra node just once, but currently it is prepared 40K times at startup. https://issues.apache.org/jira/browse/CASSANDRA-8831 will make the situation much better. A further optimization is to allow clients to create not-yet-prepared statements in bulk. This way, the client can prepare all the not-yet-prepared statements with one round trip to the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536846#comment-14536846 ] Jonathan Shook commented on CASSANDRA-9318: --- I would venture that a solid load shedding system may improve the degenerate overloading case, but it is not the preferred method for dealing with overloading for most users. The concept of back-pressure is more squarely what people expect, for better or worse. Here is what I think reasonable users want to see, with some variations: 1) The system performs with stability, up to the workload that it is able to handle with stability. 2a) Once it reaches that limit, it starts pushing back in terms of how quickly it accepts new work. This means that it simply blocks the operations or submissions of new requests with some useful bound that is determined by the system. It does not yet have to shed load. It does not yet have to give exceptions. This is a very reasonable expectation for most users. This is what they expect. Load shedding is a term of art which does not change the users' expectations. 2b) Once it reaches that limit, it starts throwing OE to the client. It does not have to shed load yet. This is a very reasonable expectation for users who are savvy enough to do active load management at the client level. It may have to start writing hints, but if you are writing hints merely because of load, this might not be the best justification for having the hints system kick in. To me this is inherently a convenient remedy for the wrong problem, even if it works well. Yes, hints are there as a general mechanism, but it does not relieve us of the problem of needing to know when the system is at capacity and how to handle it proactively. You could also say that hints sometimes actively hurt capacity when you need it most. They are expensive to process given the current implementation, and will always be load shifting even at theoretical best. 
Still, we need them for node availability concerns, although we should be careful not to use them as a crutch for general capacity issues. 2c) Once it reaches that limit, it starts backlogging (with a helpful signature of such in the responses, maybe a BackloggingException with some queue estimate). This is a very reasonable expectation for users who are savvy enough to manage their peak and valley workloads in a sensible way. Sometimes you actually want to tax the ingest and flush side of the system for a bit before allowing it to switch modes and catch up with compaction. The fact that C* can do this is an interesting capability, but those who want backpressure will not easily see it that way. 2d) If the system is being pushed beyond its capacity, then it may have to shed load. This should only happen if the user has decided that they want to be responsible for such and has pushed the system beyond the reasonable limit without paying attention to the indications in 2a, 2b, and 2c. Order of precedence, designated mode of operation, or any other concerns aren't really addressed here. I just provided them as examples of types of behaviors which are nuanced yet perfectly valid for different types of system designers. The real point here is that there is not a single overall design which is going to be acceptable to all users. Still, we need to ensure stability under saturating load where possible. I would like to think that with CASSANDRA-8099 we can start discussing some of the client-facing back-pressure ideas more earnestly. We can come up with methods to improve the reliable and responsive capacity of the system even with some internal load management. If the first cut ends up being sub-optimal, then we can measure it against non-bounded workload tests and strive to close the gap. 
If it is implemented in a way that can support multiple usage scenarios, as described above, then such a limit might be unlimited, bounded at level ___, or bounded by inline resource management, but in any case it would be controllable by the user, admin, or client. If we could ultimately give the categories of users above the ability to enable the various modes, then the 2a) scenario would be perfectly desirable for many users already, even if the back-pressure logic only gave you 70% of the effective system capacity. Once testing shows that performance with active back-pressure to the client is close enough to the unbounded workloads, it could be enabled. Summary: We still need reasonable back-pressure support throughout the system and eventually to the client. Features like this that can be a stepping stone towards such are still needed. The most perfect load shedding and hinting systems will still not be a sufficient replacement for back-pressure and capacity management. Bound the number of in-flight requests at the coordinator
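For illustration, the 2a/2b/2d behaviors above can all hang off a single in-flight bound, with the policy selecting how overload is surfaced. This is a hedged sketch with made-up names (`Admission`, `OverloadedException`), not Cassandra's actual API:

```java
// Sketch of three admission policies around one in-flight bound:
// 2a block the submitter, 2b fail fast with an overload error,
// 2d shed work at the limit. Names are illustrative only.
import java.util.concurrent.Semaphore;

public class Admission {
    public enum Policy { BLOCK, FAIL_FAST, SHED }

    public static class OverloadedException extends RuntimeException {}

    private final Semaphore inflight;  // bound on concurrent requests
    private final Policy policy;

    public Admission(int maxInflight, Policy policy) {
        this.inflight = new Semaphore(maxInflight);
        this.policy = policy;
    }

    /** Returns true if admitted; false means the request was shed. */
    public boolean admit() {
        switch (policy) {
            case BLOCK:      // 2a: back-pressure; the caller waits for capacity
                inflight.acquireUninterruptibly();
                return true;
            case FAIL_FAST:  // 2b: throw to clients doing active load management
                if (!inflight.tryAcquire())
                    throw new OverloadedException();
                return true;
            default:         // 2d: drop the request at the limit
                return inflight.tryAcquire();
        }
    }

    /** Called when the request completes, freeing capacity. */
    public void release() {
        inflight.release();
    }
}
```

2c (backlogging) would correspond to admitting into a deep or unbounded queue instead; the point is that one bound can back several user-selectable behaviors.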
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536846#comment-14536846 ] Jonathan Shook edited comment on CASSANDRA-9318 at 5/9/15 7:42 PM: --- I would venture that a solid load shedding system may improve the degenerate overloading case, but it is not the preferred method for dealing with overloading for most users. The concept of back-pressure is more squarely what people expect, for better or worse. Here is what I think reasonable users want to see, with some variations: 1) The system performs with stability, up to the workload that it is able to handle with stability. 2a) Once it reaches that limit, it starts pushing back in terms of how quickly it accepts new work. This means that it simply blocks the operations or submissions of new requests with some useful bound that is determined by the system. It does not yet have to shed load. It does not yet have to give exceptions. This is a very reasonable expectation for most users. This is what they expect. Load shedding is a term of art which does not change the users' expectations. 2b) Once it reaches that limit, it starts throwing OE to the client. It does not have to shed load yet. This is a very reasonable expectation for users who are savvy enough to do active load management at the client level. It may have to start writing hints, but if you are writing hints merely because of load, this might not be the best justification for having the hints system kick in. To me this is inherently a convenient remedy for the wrong problem, even if it works well. Yes, hints are there as a general mechanism, but it does not relieve us of the problem of needing to know when the system is at capacity and how to handle it proactively. You could also say that hints sometimes actively hurt capacity when you need it most. They are expensive to process given the current implementation, and will always be load shifting even at theoretical best. 
Still, we need them for node availability concerns, although we should be careful not to use them as a crutch for general capacity issues. 2c) Once it reaches that limit, it starts backlogging (with a helpful signature of such in the responses, maybe a BackloggingException with some queue estimate). This is a very reasonable expectation for users who are savvy enough to manage their peak and valley workloads in a sensible way. Sometimes you actually want to tax the ingest and flush side of the system for a bit before allowing it to switch modes and catch up with compaction. The fact that C* can do this is an interesting capability, but those who want backpressure will not easily see it that way. 2d) If the system is being pushed beyond its capacity, then it may have to shed load. This should only happen if the user has decided that they want to be responsible for such and has pushed the system beyond the reasonable limit without paying attention to the indications in 2a, 2b, and 2c. Order of precedence, designated mode of operation, or any other concerns aren't really addressed here. I just provided them as examples of types of behaviors which are nuanced yet perfectly valid for different types of system designers. The real point here is that there is not a single overall design which is going to be acceptable to all users. Still, we need to ensure stability under saturating load where possible. I would like to think that with CASSANDRA-8099 we can start discussing some of the client-facing back-pressure ideas more earnestly. We can come up with methods to improve the reliable and responsive capacity of the system even with some internal load management. If the first cut ends up being sub-optimal, then we can measure it against non-bounded workload tests and strive to close the gap. 
If it is implemented in a way that can support multiple usage scenarios, as described above, then such a limit might be unlimited, bounded at level ___, or bounded by inline resource management, but in any case it would be controllable by the user, admin, or client. If we could ultimately give the categories of users above the ability to enable the various modes, then the 2a) scenario would be perfectly desirable for many users already, even if the back-pressure logic only gave you 70% of the effective system capacity. Once testing shows that performance with active back-pressure to the client is close enough to the unbounded workloads, it could be enabled by default. Summary: We still need reasonable back-pressure support throughout the system and eventually to the client. Features like this that can be a stepping stone towards such are still needed. The most perfect load shedding and hinting systems will still not be a sufficient replacement for back-pressure and capacity management. was (Author: jshook): I would venture that a
[jira] [Comment Edited] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536846#comment-14536846 ] Jonathan Shook edited comment on CASSANDRA-9318 at 5/9/15 7:46 PM: --- I would venture that a solid load shedding system may improve the degenerate overloading case, but it is not the preferred method for dealing with overloading for most users. The concept of back-pressure is more squarely what people expect, for better or worse. Here is what I think reasonable users want to see, with some variations: 1) The system performs with stability, up to the workload that it is able to handle with stability. 2a) Once it reaches that limit, it starts pushing back in terms of how quickly it accepts new work. This means that it simply blocks the operations or submissions of new requests with some useful bound that is determined by the system. It does not yet have to shed load. It does not yet have to give exceptions. This is a very reasonable expectation for most users. This is what they expect. Load shedding is a term of art which does not change the users' expectations. 2b) Once it reaches that limit, it starts throwing OE to the client. It does not have to shed load yet. This is a very reasonable expectation for users who are savvy enough to do active load management at the client level. It may have to start writing hints, but if you are writing hints merely because of load, this might not be the best justification for having the hints system kick in. To me this is inherently a convenient remedy for the wrong problem, even if it works well. Yes, hints are there as a general mechanism, but it does not relieve us of the problem of needing to know when the system is at capacity and how to handle it proactively. You could also say that hints sometimes actively hurt capacity when you need it most. They are expensive to process given the current implementation, and will always be load shifting even at theoretical best. 
Still, we need them for node availability concerns, although we should be careful not to use them as a crutch for general capacity issues. 2c) Once it reaches that limit, it starts backlogging (with a helpful signature of such in the responses, maybe a BackloggingException with some queue estimate). This is a very reasonable expectation for users who are savvy enough to manage their peak and valley workloads in a sensible way. Sometimes you actually want to tax the ingest and flush side of the system for a bit before allowing it to switch modes and catch up with compaction. The fact that C* can do this is an interesting capability, but those who want backpressure will not easily see it that way. 2d) If the system is being pushed beyond its capacity, then it may have to shed load. This should only happen if the user has decided that they want to be responsible for such and has pushed the system beyond the reasonable limit without paying attention to the indications in 2a, 2b, and 2c. Order of precedence, designated mode of operation, or any other concerns aren't really addressed here. I just provided the examples above as types of behaviors which are nuanced yet perfectly valid for different types of system designs. The real point here is that there is not a single overall QoS/capacity/back-pressure behavior which is going to be acceptable to all users. Still, we need to ensure stability under saturating load where possible. I would like to think that with CASSANDRA-8099 we can start discussing some of the client-facing back-pressure ideas more earnestly. We can come up with methods to improve the reliable and responsive capacity of the system even with some internal load management. If the first cut ends up being sub-optimal, then we can measure it against non-bounded workload tests and strive to close the gap. 
If it is implemented in a way that can support multiple usage scenarios, as described above, then such a limit might be unlimited, bounded at level ___, or bounded by inline resource management, but in any case it would be controllable by the user, admin, or client. If we could ultimately give the categories of users above the ability to enable the various modes, then the 2a) scenario would be perfectly desirable for many users already, even if the back-pressure logic only gave you 70% of the effective system capacity. Once testing shows that performance with active back-pressure to the client is close enough to the unbounded workloads, it could be enabled by default. Summary: We still need reasonable back-pressure support throughout the system and eventually to the client. Features like this that can be a stepping stone towards such are still needed. The most perfect load shedding and hinting systems will still not be a sufficient replacement for back-pressure and capacity management. was (Author: