[jira] [Commented] (CASSANDRA-7731) Get max values for live/tombstone cells per slice
[ https://issues.apache.org/jira/browse/CASSANDRA-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151011#comment-14151011 ]

Cyril Scetbon commented on CASSANDRA-7731:

Yeah, but what is displayed depends on the way it's calculated, if that calculation is actually wrong ... I glanced at other tickets and it has been open and not updated since December 2013, so I'm not confident it will be fixed soon. I really think it's an important subject.

Get max values for live/tombstone cells per slice

Key: CASSANDRA-7731
URL: https://issues.apache.org/jira/browse/CASSANDRA-7731
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Cyril Scetbon
Assignee: Robert Stupp
Priority: Minor
Fix For: 2.1.1
Attachments: 7731-2.0.txt, 7731-2.1.txt

I think you should not say that slice statistics are valid for the [last five minutes|https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java#L955-L956] in the CFSTATS command of nodetool. I've read the Yammer documentation for Histograms and there is no way to force values to expire after x minutes except by [clearing|http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.1.2/com/yammer/metrics/core/Histogram.java#96] it. The only thing I can see is that the snapshot used to provide the median (or whatever you'd use instead) is based on 1028 values. I think we should also be able to detect that some requests are accessing a lot of live/tombstone cells per query, and that's not possible for now without activating DEBUG for SliceQueryFilter, for example, and tweaking the threshold. Currently, because nodetool cfstats returns the median, if only a small fraction of queries scan a lot of live/tombstone cells we miss it!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
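To make the histogram point above concrete, here is a small, hedged Java sketch against the Yammer metrics-core 2.x API referenced in the comment (the class name, metric name, and values are invented for illustration; this is not Cassandra code). It shows why a snapshot median can hide the rare expensive slices that a max would expose:

{code:java}
// Hedged sketch, not Cassandra code: metric name and values are invented.
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Histogram;
import com.yammer.metrics.stats.Snapshot;

public class SliceHistogramSketch
{
    public static void main(String[] args)
    {
        // biased=true uses the exponentially decaying reservoir of ~1028 samples
        Histogram cellsPerSlice = Metrics.newHistogram(SliceHistogramSketch.class, "cellsPerSlice", true);

        for (int i = 0; i < 10000; i++)
            cellsPerSlice.update(10);      // almost every slice touches a handful of cells
        cellsPerSlice.update(1000000);     // one pathological slice

        Snapshot snapshot = cellsPerSlice.getSnapshot();
        System.out.println("median = " + snapshot.getMedian()); // ~10: the outlier is invisible
        System.out.println("max    = " + cellsPerSlice.max());  // the largest observed value, which this ticket asks to expose
    }
}
{code}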
[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151074#comment-14151074 ]

Robert Stupp commented on CASSANDRA-4914:

bq. aggregate functions do only return their input type ...

Hm - I still think that we should be able to handle integer overflows - if, for example, people get a negative result when summing just positive values, they'll complain about it. Postgres, for example, [returns a bigint for int sums|http://www.postgresql.org/docs/9.1/static/functions-aggregate.html] - Oracle doesn't have any integer data type in tables (only the NUMBER data type). (But if that input-type==output-type behavior is clearly documented and people are able to do e.g. a {{SELECT sum( (varint) myIntCol ) FROM ...}} then that's fine for me - but I'm not sold on this.)

Aggregation functions in CQL

Key: CASSANDRA-4914
URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
Project: Cassandra
Issue Type: New Feature
Reporter: Vijay
Assignee: Benjamin Lerer
Labels: cql, docs
Fix For: 3.0
Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt

The requirement is to do aggregation of data in Cassandra (wide rows of column values of int, double, float, etc.) with some basic aggregate functions like AVG, SUM, MEAN, MIN, MAX, etc. (for the columns within a row).

Example:

SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;

 empid | deptid | first_name | last_name | salary
-------+--------+------------+-----------+--------
   130 |      3 | joe        | doe       |   10.1
   130 |      2 | joe        | doe       |    100
   130 |      1 | joe        | doe       |  1e+03

SELECT sum(salary), empid FROM emp WHERE empID IN (130);

 sum(salary) | empid
-------------+-------
      1110.1 |   130

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
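To illustrate the overflow concern above in plain Java (a hedged sketch only, not a proposal for the patch): if sum() over an int column keeps the 32-bit input type as its result type, summing large positive values silently wraps negative, which is exactly the surprise that widening the result to a bigint (as Postgres does) avoids.

{code:java}
// Hedged illustration only: shows the wrap-around, not Cassandra's aggregate code.
public class SumOverflowSketch
{
    public static void main(String[] args)
    {
        int a = 2000000000;
        int b = 2000000000;

        int narrowSum = a + b;        // wraps to -294967296: two positive ints, negative "sum"
        long wideSum = (long) a + b;  // 4000000000: widening the result type avoids the surprise

        System.out.println("int sum  = " + narrowSum);
        System.out.println("long sum = " + wideSum);
    }
}
{code}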
[jira] [Commented] (CASSANDRA-494) add remove_slice to the api
[ https://issues.apache.org/jira/browse/CASSANDRA-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151103#comment-14151103 ]

ZhongYu commented on CASSANDRA-494:

Why not implement this feature? We are having trouble deleting columns like timestamps. There are too many columns to load to the client, and it is really slow to delete data by reading it first! It took us 10 days to delete 1,000,000,000 timestamp-style columns across about 1000 CFs, each CF having on average 1 row. If we could delete columns by range, I think the above operation could finish in several minutes.

add remove_slice to the api

Key: CASSANDRA-494
URL: https://issues.apache.org/jira/browse/CASSANDRA-494
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Dan Di Spaltro
Priority: Minor

It would be nice to mimic how get_slice works for removing values.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8011) Fail on large batch sizes
[ https://issues.apache.org/jira/browse/CASSANDRA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Yeksigian updated CASSANDRA-8011:
Attachment: 8011-trunk.txt

Fail on large batch sizes

Key: CASSANDRA-8011
URL: https://issues.apache.org/jira/browse/CASSANDRA-8011
Project: Cassandra
Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Carl Yeksigian
Priority: Minor
Fix For: 3.0
Attachments: 8011-trunk.txt

Related to https://issues.apache.org/jira/browse/CASSANDRA-6487

Education alone is not going to stop some of the largest batch sizes from being used. Just like we have a tombstone fail threshold, I propose that we have a batch size fail threshold. Maybe 10x warn? {{batch_fail_threshold: 50k}}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay
[ https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151127#comment-14151127 ]

Donald Smith commented on CASSANDRA-6666:

I know this is moot because of the redesign of hints, but I want to understand this. OK, if a hint was successfully delivered, then I can see how a tombstone would be useful for causing deletion of *older* instances in other sstables. But if a hint timed out (tombstone), then any older instance will also have timed out (presumably). So, could tombstones be deleted in that case (timeout)? Perhaps a timed-out cell IS a tombstone, but my point is: I don't see why they need to take up space.

Avoid accumulating tombstones after partial hint replay

Key: CASSANDRA-6666
URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
Labels: hintedhandoff
Attachments: 6666.txt, cassandra_system.log.debug.gz

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7838) Warn user when disks are network/ebs mounted
[ https://issues.apache.org/jira/browse/CASSANDRA-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151143#comment-14151143 ]

AMIT KUMAR commented on CASSANDRA-7838:

It's based off of trunk. I forked the repo and branched from trunk. Let me know if you see otherwise.

Amit

On Fri, Sep 26, 2014 at 10:54 AM, T Jake Luciani (JIRA) j...@apache.org

Warn user when disks are network/ebs mounted

Key: CASSANDRA-7838
URL: https://issues.apache.org/jira/browse/CASSANDRA-7838
Project: Cassandra
Issue Type: Improvement
Reporter: T Jake Luciani
Priority: Minor
Labels: bootcamp, lhf
Fix For: 3.0
Attachments: 0001-CASSANDRA-7838-log-warning-for-networked-drives.patch, 0002-added-missing-winnt-native.patch, 0003-CASSANDRA-7838-WIP-adding-a-few-other-checks.patch, 0004-CASSANDRA-7838-Removed-network-check-and-read-latenc.patch

The Sigar project lets you probe os/cpu/filesystems across the major platforms: https://github.com/hyperic/sigar

It would be nice on start-up to use this to warn users if they are running with settings that will make them sad, like a network drive or EBS on EC2.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
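As a rough illustration of the kind of startup check this ticket describes, here is a hedged sketch against the public Sigar API (not the attached patches; the class name and messages are invented):

{code:java}
// Hedged sketch, not the attached patch: flag network-mounted filesystems at startup.
// Assumes sigar.jar and its native library are available on the path.
import org.hyperic.sigar.FileSystem;
import org.hyperic.sigar.Sigar;
import org.hyperic.sigar.SigarException;

public class NetworkMountCheck
{
    public static void main(String[] args) throws SigarException
    {
        Sigar sigar = new Sigar();
        try
        {
            for (FileSystem fs : sigar.getFileSystemList())
            {
                if (fs.getType() == FileSystem.TYPE_NETWORK)
                    System.err.println("WARN: " + fs.getDirName() + " is a network mount ("
                                       + fs.getSysTypeName() + "); data or commitlog directories "
                                       + "placed on it are likely to perform poorly");
            }
        }
        finally
        {
            sigar.close();
        }
    }
}
{code}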
[jira] [Commented] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151250#comment-14151250 ]

sankalp kohli commented on CASSANDRA-6246:

Keeping executed instances: in the current implementation, we only keep the last commit per CQL partition. We can do the same for this as well. I have also been reading about EPaxos recently and want to know: when do you do the condition check in your implementation?

EPaxos

Key: CASSANDRA-6246
URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies. EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf

However, there is substantial additional complexity involved if we choose to implement it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256 ]

Blake Eggleston commented on CASSANDRA-6246:

bq. In the current implementation, we only keep the last commit per CQL partition. We can do the same for this as well.

Yeah, I've been thinking about that some more. Just because we could keep a bunch of historical data doesn't mean we should. There may be situations where we need to keep more than one instance around though, specifically when the instance is part of a strongly connected component. Keeping some historical data would be useful for helping instances recover from short failures where they miss several instances, but after a point, transmitting all the activity for the last hour or two would just be nuts. The other issue with relying on historical data for failure recovery is that you can't keep all of it, so you'd have dangling pointers on the older instances. For longer partitions, and for nodes joining the ring, if we transmitted our current dependency bookkeeping for the token ranges they're replicating, the corresponding instances, and the current values for those instances, that should be enough to get going.

bq. I am also reading about epaxos recently and want to know when do you do the condition check in your implementation?

It would have to be when the instance is executed.

EPaxos

Key: CASSANDRA-6246
URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies. EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf

However, there is substantial additional complexity involved if we choose to implement it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8011) Fail on large batch sizes
[ https://issues.apache.org/jira/browse/CASSANDRA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151258#comment-14151258 ]

sankalp kohli commented on CASSANDRA-8011:

Some nits:
1) In your comment I think you mean batch_size_fail_threshold_in_kb and not batch_size_file_threshold_in_kb.
2) We should also expose changing this via JMX, like we do for TombstoneFailureThreshold.
3) When we hit the TombstoneFailureThreshold, we say so clearly in the exception message, e.g. "(see tombstone_warn_threshold)". We should add the same for this.

Fail on large batch sizes

Key: CASSANDRA-8011
URL: https://issues.apache.org/jira/browse/CASSANDRA-8011
Project: Cassandra
Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Carl Yeksigian
Priority: Minor
Fix For: 3.0
Attachments: 8011-trunk.txt

Related to https://issues.apache.org/jira/browse/CASSANDRA-6487

Education alone is not going to stop some of the largest batch sizes from being used. Just like we have a tombstone fail threshold, I propose that we have a batch size fail threshold. Maybe 10x warn? {{batch_fail_threshold: 50k}}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
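For a sense of what the warn/fail pair being discussed could look like, here is a minimal, hedged sketch of the guard pattern (the class name, messages, and constructor are invented for illustration; the actual change is whatever is in the attached 8011-trunk.txt patch):

{code:java}
// Hedged sketch only: warn at one threshold, refuse the batch at a larger one.
public final class BatchSizeGuard
{
    private final long warnBytes;
    private final long failBytes;

    public BatchSizeGuard(long warnKb, long failKb)
    {
        this.warnBytes = warnKb * 1024;
        this.failBytes = failKb * 1024;
    }

    /** @throws IllegalArgumentException if the batch exceeds the fail threshold */
    public void check(long batchSizeBytes)
    {
        if (batchSizeBytes > failBytes)
            throw new IllegalArgumentException("Batch of " + batchSizeBytes
                    + " bytes exceeds batch_size_fail_threshold_in_kb"
                    + " (see batch_size_warn_threshold_in_kb)");
        if (batchSizeBytes > warnBytes)
            System.err.println("WARN: batch of " + batchSizeBytes + " bytes exceeds the warn threshold");
    }
}
{code}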
[jira] [Comment Edited] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256 ]

Blake Eggleston edited comment on CASSANDRA-6246 at 9/28/14 11:15 PM:

bq. In the current implementation, we only keep the last commit per CQL partition. We can do the same for this as well.

Yeah, I've been thinking about that some more. Just because we could keep a bunch of historical data doesn't mean we should. There may be situations where we need to keep more than one instance around though, specifically when the instance is part of a strongly connected component. Keeping some historical data would be useful for helping nodes recover from short failures where they miss several instances, but after a point, transmitting all the activity for the last hour or two would just be nuts. The other issue with relying on historical data for failure recovery is that you can't keep all of it, so you'd have dangling pointers on the older instances. For longer partitions, and for nodes joining the ring, if we transmitted our current dependency bookkeeping for the token ranges they're replicating, the corresponding instances, and the current values for those instances, that should be enough to get going.

bq. I am also reading about epaxos recently and want to know when do you do the condition check in your implementation?

It would have to be when the instance is executed.

was (Author: bdeggleston):

bq. In the current implementation, we only keep the last commit per CQL partition. We can do the same for this as well.

Yeah, I've been thinking about that some more. Just because we could keep a bunch of historical data doesn't mean we should. There may be situations where we need to keep more than one instance around though, specifically when the instance is part of a strongly connected component. Keeping some historical data would be useful for helping instances recover from short failures where they miss several instances, but after a point, transmitting all the activity for the last hour or two would just be nuts. The other issue with relying on historical data for failure recovery is that you can't keep all of it, so you'd have dangling pointers on the older instances. For longer partitions, and for nodes joining the ring, if we transmitted our current dependency bookkeeping for the token ranges they're replicating, the corresponding instances, and the current values for those instances, that should be enough to get going.

bq. I am also reading about epaxos recently and want to know when do you do the condition check in your implementation?

It would have to be when the instance is executed.

EPaxos

Key: CASSANDRA-6246
URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies. EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf

However, there is substantial additional complexity involved if we choose to implement it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151261#comment-14151261 ]

sankalp kohli commented on CASSANDRA-6246:

It would have to be when the instance is executed.

Since the client (the application) needs to know whether this was a success or not, I was thinking of making it part of the preaccept. When a replica gets a preaccept request, along with the last instance it can also send the values for the check. If the responses from all replicas are the same (fast path), it could be committed locally and asynchronously on the other replicas. Also, the response to the client will contain whether the query succeeded or not. Make sense?

PS: I am quite excited to see this implementation coming along, especially since you are working on it :)

EPaxos

Key: CASSANDRA-6246
URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies. EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf

However, there is substantial additional complexity involved if we choose to implement it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151289#comment-14151289 ]

Blake Eggleston commented on CASSANDRA-6246:

Thanks [~kohlisankalp] :)

So the issue with making the check part of the preaccept phase is that you can't trust the value in the database at that point. If there are other interfering instances in flight, you don't know what order they'll be executed in until they're all committed. So one of them could change the value, and you'd have replied to the client with incorrect information.

Assuming the client sends the query to a replica, things would go like this:
# receive client request
# send preaccept request to replicas and wait for a fast path quorum to respond
# assuming all responses agreed, commit locally and notify replicas asynchronously
# assuming all dependencies are committed, sort the dependency graph
# execute all instances preceding the client's instance, read the value* in question and perform the check, make the mutation*
# respond with the result to the client

*performed locally

EPaxos

Key: CASSANDRA-6246
URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies. EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf

However, there is substantial additional complexity involved if we choose to implement it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
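A toy, hedged illustration of the point about interfering instances above (all names are invented; this is not from the EPaxos branch): the outcome of a conditional update depends on the committed execution order, which is not known at preaccept time, so the result can only be reported to the client after execution.

{code:java}
// Hedged toy example only: shows why the condition check has to wait for execution.
import java.util.LinkedHashMap;
import java.util.Map;

public class ConditionCheckTiming
{
    static String value = "A"; // the cell both instances interfere on

    static class CasInstance
    {
        final String expected;
        final String newValue;

        CasInstance(String expected, String newValue)
        {
            this.expected = expected;
            this.newValue = newValue;
        }

        // The condition is evaluated only when the instance is executed.
        boolean execute()
        {
            if (value.equals(expected))
            {
                value = newValue;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args)
    {
        // Both instances arrive while the cell is still "A", so a preaccept-time check
        // would report success for both. Suppose the committed dependency order is i2, i1.
        CasInstance i1 = new CasInstance("A", "B");
        CasInstance i2 = new CasInstance("A", "C");

        Map<String, CasInstance> executionOrder = new LinkedHashMap<>();
        executionOrder.put("i2", i2);
        executionOrder.put("i1", i1);

        for (Map.Entry<String, CasInstance> e : executionOrder.entrySet())
            System.out.println(e.getKey() + " applied=" + e.getValue().execute() + " value=" + value);
        // i2 applies, i1 fails its condition: a result that could not have been
        // reported correctly at preaccept time.
    }
}
{code}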
[Cassandra Wiki] Update of Operations by JonHaddad
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The Operations page has been changed by JonHaddad:
https://wiki.apache.org/cassandra/Operations?action=diff&rev1=112&rev2=113

  When the !RandomPartitioner is used, Tokens are integers from 0 to 2**127. Keys are converted to this range by MD5 hashing for comparison with Tokens. (Thus, keys are always convertible to Tokens, but the reverse is not always true.)

  === Token selection ===
+
  Using a strong hash function means !RandomPartitioner keys will, on average, be evenly spread across the Token space, but you can still have imbalances if your Tokens do not divide up the range evenly, so you should specify !InitialToken for your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`.

  With !NetworkTopologyStrategy, you should calculate the tokens for the nodes in each DC independently. Tokens still need to be unique, so you can add 1 to the tokens in the 2nd DC, add 2 in the 3rd, and so on. Thus, for a 4-node cluster in 2 datacenters, you would have

@@ -45, +46 @@

  With order preserving partitioners, your key distribution will be application-dependent. You should still take your best guess at specifying initial tokens (guided by sampling actual data, if possible), but you will be more dependent on active load balancing (see below) and/or adding new nodes to hot spots.

  Once data is placed on the cluster, the partitioner may not be changed without wiping and starting over.
+
+ As a caveat to the above section, it is generally not necessary to manually select individual tokens when using the vnodes feature.
+
  === Replication ===

  A Cassandra cluster always divides up the key space into ranges delimited by Tokens as described above, but additional replica placement is customizable via IReplicaPlacementStrategy in the configuration file. The standard strategies are
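For concreteness, here is a small, hedged Java sketch of the `i * (2**127 / N)` token arithmetic described in the wiki text above, including the per-DC offset trick for keeping tokens unique (the class name and cluster layout are invented for illustration; this is not from the wiki or the codebase):

{code:java}
// Hedged illustration of i * (2**127 / N) token selection with per-DC offsets.
import java.math.BigInteger;

public class InitialTokenMath
{
    public static void main(String[] args)
    {
        BigInteger range = BigInteger.valueOf(2).pow(127);
        int nodesPerDc = 2;
        int datacenters = 2;

        for (int dc = 0; dc < datacenters; dc++)
        {
            for (int i = 0; i < nodesPerDc; i++)
            {
                BigInteger token = range.divide(BigInteger.valueOf(nodesPerDc))
                                        .multiply(BigInteger.valueOf(i))
                                        .add(BigInteger.valueOf(dc)); // +0 in DC1, +1 in DC2 keeps tokens unique
                System.out.println("DC" + (dc + 1) + " node " + i + ": initial_token=" + token);
            }
        }
    }
}
{code}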