[jira] [Commented] (CASSANDRA-7731) Get max values for live/tombstone cells per slice

2014-09-28 Thread Cyril Scetbon (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151011#comment-14151011
 ] 

Cyril Scetbon commented on CASSANDRA-7731:
--

Yeah, but what is displayed depends on the way it's calculated, if the 
calculation is actually wrong... I glanced at other tickets and that one has 
been open and not updated since December 2013, so I'm not confident it will be 
fixed soon. I really think it's an important subject.

 Get max values for live/tombstone cells per slice
 -

 Key: CASSANDRA-7731
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7731
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Cyril Scetbon
Assignee: Robert Stupp
Priority: Minor
 Fix For: 2.1.1

 Attachments: 7731-2.0.txt, 7731-2.1.txt


 I think you should not say that slice statistics are valid for the [last five 
 minutes|https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java#L955-L956] 
 in the cfstats command of nodetool. I've read the Yammer documentation for 
 Histograms and there is no way to force values to expire after x minutes 
 except by 
 [clearing|http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.1.2/com/yammer/metrics/core/Histogram.java#96] 
 the histogram. The only thing I can see is that the snapshot used to provide 
 the median (or whatever you'd use instead) is based on the last 1028 values.
 I think we should also be able to detect that some requests are accessing a 
 lot of live/tombstone cells per query, and that's not possible for now 
 without, for example, enabling DEBUG logging for SliceQueryFilter and tweaking 
 the threshold. Currently, as nodetool cfstats returns the median, if only a 
 small fraction of the queries scan a lot of live/tombstone cells, we miss it!
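 For illustration, a minimal standalone sketch (assuming the metrics-core 2.x 
 API linked above; the class and metric names are only examples) showing that 
 neither the max nor the snapshot expires with time:
{code}
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Histogram;

public class HistogramExpiryCheck
{
    public static void main(String[] args) throws Exception
    {
        // biased=true uses the exponentially decaying reservoir (up to 1028 samples)
        Histogram h = Metrics.newHistogram(HistogramExpiryCheck.class, "cells-per-slice", true);

        h.update(100000); // one query that touched a lot of cells

        Thread.sleep(10000); // waiting does not make the recorded value expire

        // max() only resets when clear() is called; there is no time window
        System.out.println("max = " + h.max());
        // the median comes from a snapshot of the reservoir samples
        System.out.println("median = " + h.getSnapshot().getMedian());
    }
}
{code}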



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4914) Aggregation functions in CQL

2014-09-28 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151074#comment-14151074
 ] 

Robert Stupp commented on CASSANDRA-4914:
-

bq. aggregate functions do only return their input type ...

Hm - I still think that we should be able to handle integer overflows - if, for 
example, people get a negative result when summing just positive values, 
they'll complain about it.
Postgres for example [returns a bigint for int 
sums|http://www.postgresql.org/docs/9.1/static/functions-aggregate.html] - 
Oracle doesn't have any integer data type in tables (only that NUMBER data 
type).
(But if that input-type==output-type behavior is clearly documented and people 
are able to do e.g. a {{SELECT sum( (varint) myIntCol ) FROM ...}} then that's 
fine for me - but I'm not sold on this.)
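For a concrete illustration of the overflow concern (plain Java; the values are 
just examples):
{code}
public class SumOverflowExample
{
    public static void main(String[] args)
    {
        int[] values = { 2000000000, 2000000000 }; // all positive ints

        int intSum = 0;
        long longSum = 0;
        for (int v : values)
        {
            intSum += v;   // wraps around on overflow
            longSum += v;  // widened accumulator keeps the correct result
        }

        System.out.println("int sum  = " + intSum);  // prints -294967296
        System.out.println("long sum = " + longSum); // prints 4000000000
    }
}
{code}
This is the situation Postgres avoids by widening the result type of sum().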

 Aggregation functions in CQL
 

 Key: CASSANDRA-4914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
 Project: Cassandra
  Issue Type: New Feature
Reporter: Vijay
Assignee: Benjamin Lerer
  Labels: cql, docs
 Fix For: 3.0

 Attachments: CASSANDRA-4914-V2.txt, CASSANDRA-4914-V3.txt, 
 CASSANDRA-4914-V4.txt, CASSANDRA-4914.txt


 The requirement is to do aggregation of data in Cassandra (Wide row of column 
 values of int, double, float etc).
 With some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for 
 the columns within a row).
 Example:
 SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;
 
  empid | deptid | first_name | last_name | salary
 -------+--------+------------+-----------+--------
    130 |      3 | joe        | doe       |   10.1
    130 |      2 | joe        | doe       |    100
    130 |      1 | joe        | doe       |  1e+03
 
 SELECT sum(salary), empid FROM emp WHERE empID IN (130);
 
  sum(salary) | empid
 -------------+-------
       1110.1 |   130



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-494) add remove_slice to the api

2014-09-28 Thread ZhongYu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151103#comment-14151103
 ] 

ZhongYu commented on CASSANDRA-494:
---

Why not implement this feature?
We are having trouble deleting columns such as timestamps. There are too many 
columns to load into the client. It is really slow to delete data by reading it 
first!
It took us 10 days to delete 1,000,000,000 timestamp-style columns across about 
1000 CFs. Each CF has on average 1 row.
If we could delete columns by range, I think the above operation could finish 
in several minutes.

 add remove_slice to the api
 ---

 Key: CASSANDRA-494
 URL: https://issues.apache.org/jira/browse/CASSANDRA-494
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dan Di Spaltro
Priority: Minor

 It would be nice to mimic how get_slice works for removing values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8011) Fail on large batch sizes

2014-09-28 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian updated CASSANDRA-8011:
--
Attachment: 8011-trunk.txt

 Fail on large batch sizes 
 --

 Key: CASSANDRA-8011
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8011
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Carl Yeksigian
Priority: Minor
 Fix For: 3.0

 Attachments: 8011-trunk.txt


 Related to https://issues.apache.org/jira/browse/CASSANDRA-6487
 Just education alone is not going to stop some of the largest batch sizes 
 from being used. Just like we have a tombstone fail threshold, I propose that 
 we have a batch size fail threshold.
 Maybe 10x warn?
 {{batch_fail_threshold: 50k}}
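 For illustration only, a minimal sketch of the kind of check being proposed 
 (the threshold name mirrors the existing warn threshold from CASSANDRA-6487; 
 this is not the attached patch):
{code}
// Hypothetical guard; Cassandra itself would throw an InvalidRequestException instead.
public final class BatchSizeGuard
{
    private final long failThresholdBytes;

    public BatchSizeGuard(long failThresholdInKb)
    {
        this.failThresholdBytes = failThresholdInKb * 1024;
    }

    public void check(long batchSizeInBytes)
    {
        if (batchSizeInBytes > failThresholdBytes)
            throw new IllegalArgumentException("Batch of " + batchSizeInBytes
                    + " bytes exceeds batch_size_fail_threshold_in_kb; rejecting");
    }

    public static void main(String[] args)
    {
        new BatchSizeGuard(50).check(100 * 1024); // a 100kb batch against a 50kb threshold fails
    }
}
{code}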



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay

2014-09-28 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151127#comment-14151127
 ] 

Donald Smith commented on CASSANDRA-6666:
-

I know this is moot because of the redesign of hints, but I want to understand 
this. OK, if a hint was successfully delivered, then I can see how a tombstone 
would be useful for causing deletion of *older* instances in other sstables. 
But if a hint timed out (tombstone), then any older instance will presumably 
also have timed out. So, could tombstones be deleted in that case (timeout)? 
Perhaps a timed-out cell IS a tombstone, but my point is: I don't see why they 
need to take up space.

 Avoid accumulating tombstones after partial hint replay
 ---

 Key: CASSANDRA-6666
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
  Labels: hintedhandoff
 Attachments: 6666.txt, cassandra_system.log.debug.gz






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7838) Warn user when disks are network/ebs mounted

2014-09-28 Thread AMIT KUMAR (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151143#comment-14151143
 ] 

AMIT KUMAR commented on CASSANDRA-7838:
---

It's based off of trunk. I forked the repo and branched from trunk. Let me
know if you see otherwise.

Amit

On Fri, Sep 26, 2014 at 10:54 AM, T Jake Luciani (JIRA) j...@apache.org



 Warn user when disks are network/ebs mounted
 

 Key: CASSANDRA-7838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7838
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Priority: Minor
  Labels: bootcamp, lhf
 Fix For: 3.0

 Attachments: 
 0001-CASSANDRA-7838-log-warning-for-networked-drives.patch, 
 0002-added-missing-winnt-native.patch, 
 0003-CASSANDRA-7838-WIP-adding-a-few-other-checks.patch, 
 0004-CASSANDRA-7838-Removed-network-check-and-read-latenc.patch


 The Sigar project lets you probe OS/CPU/filesystems across the major 
 platforms.
 https://github.com/hyperic/sigar
 It would be nice to use this on start-up to warn users if they are running 
 with settings that will make them sad, like a network drive or EBS on EC2.
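 For illustration, a minimal sketch of such a start-up check (assuming Sigar's 
 {{FileSystem}} API as documented in the project above; this is not one of the 
 attached patches):
{code}
import org.hyperic.sigar.FileSystem;
import org.hyperic.sigar.Sigar;
import org.hyperic.sigar.SigarException;

public class NetworkDriveCheck
{
    public static void main(String[] args) throws SigarException
    {
        Sigar sigar = new Sigar();
        for (FileSystem fs : sigar.getFileSystemList())
        {
            // warn if a mounted filesystem is reported as a network drive
            if (fs.getType() == FileSystem.TYPE_NETWORK)
                System.err.println("WARN: " + fs.getDirName() + " looks like a network drive ("
                                   + fs.getSysTypeName() + "); this will likely hurt performance");
        }
    }
}
{code}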



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6246) EPaxos

2014-09-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151250#comment-14151250
 ] 

sankalp kohli commented on CASSANDRA-6246:
--

Keeping executed instances
In the current implementation, we only keep the last commit per CQL partition. 
We can do the same for this as well. 

I am also reading about epaxos recently and want to know when do you do the 
condition check in your implementation? 


 EPaxos
 --

 Key: CASSANDRA-6246
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

 One reason we haven't optimized our Paxos implementation with Multi-paxos is 
 that Multi-paxos requires leader election and hence, a period of 
 unavailability when the leader dies.
 EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
 (2) is particularly useful across multiple datacenters, and (3) allows any 
 node to act as coordinator: 
 http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
 However, there is substantial additional complexity involved if we choose to 
 implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6246) EPaxos

2014-09-28 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256
 ] 

Blake Eggleston commented on CASSANDRA-6246:


bq. In the current implementation, we only keep the last commit per CQL 
partition. We can do the same for this as well.

Yeah I've been thinking about that some more. Just because we could keep a 
bunch of historical data doesn't mean we should. There may be situations where 
we need to keep more than one instance around though, specifically when the 
instance is part of a strongly connected component. Keeping some historical 
data would be useful for helping instances recover from short failures where 
they miss several instances, but after a point, transmitting all the activity 
for the last hour or two would just be nuts. The other issue with relying on 
historical data for failure recovery is that you can't keep all of it, so you'd 
have dangling pointers on the older instances. 

For longer partitions, and nodes joining the ring, if we transmitted our 
current dependency bookkeeping for the token ranges they're replicating, the 
corresponding instances, and the current values for those instances, that 
should be enough to get going.

bq. I am also reading about epaxos recently and want to know when do you do the 
condition check in your implementation?

It would have to be when the instance is executed.

 EPaxos
 --

 Key: CASSANDRA-6246
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

 One reason we haven't optimized our Paxos implementation with Multi-paxos is 
 that Multi-paxos requires leader election and hence, a period of 
 unavailability when the leader dies.
 EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
 (2) is particularly useful across multiple datacenters, and (3) allows any 
 node to act as coordinator: 
 http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
 However, there is substantial additional complexity involved if we choose to 
 implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8011) Fail on large batch sizes

2014-09-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151258#comment-14151258
 ] 

sankalp kohli commented on CASSANDRA-8011:
--

Some nits:
1) In your comment I think you mean batch_size_fail_threshold_in_kb and not 
batch_size_file_threshold_in_kb.
2) We should also expose changing this via JMX, like we do for 
TombstoneFailureThreshold.
3) When we hit the TombstoneFailureThreshold, we clearly point to the setting 
in the exception, like (see tombstone_warn_threshold). We should add the same 
for this.  

 Fail on large batch sizes 
 --

 Key: CASSANDRA-8011
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8011
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Carl Yeksigian
Priority: Minor
 Fix For: 3.0

 Attachments: 8011-trunk.txt


 Related to https://issues.apache.org/jira/browse/CASSANDRA-6487
 Just education alone is not going to stop some of the largest batch sizes 
 from being used. Just like we have a tombstone fail threshold, I propose that 
 we have a batch size fail threshold.
 Maybe 10x warn?
 {{batch_fail_threshold: 50k}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-6246) EPaxos

2014-09-28 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151256#comment-14151256
 ] 

Blake Eggleston edited comment on CASSANDRA-6246 at 9/28/14 11:15 PM:
--

bq. In the current implementation, we only keep the last commit per CQL 
partition. We can do the same for this as well.

Yeah I've been thinking about that some more. Just because we could keep a 
bunch of historical data doesn't mean we should. There may be situations where 
we need to keep more than one instance around though, specifically when the 
instance is part of a strongly connected component. Keeping some historical 
data would be useful for helping nodes recover from short failures where they 
miss several instances, but after a point, transmitting all the activity for 
the last hour or two would just be nuts. The other issue with relying on 
historical data for failure recovery is that you can't keep all of it, so you'd 
have dangling pointers on the older instances. 

For longer partitions, and nodes joining the ring, if we transmitted our 
current dependency bookkeeping for the token ranges they're replicating, the 
corresponding instances, and the current values for those instances, that 
should be enough to get going.

bq. I am also reading about epaxos recently and want to know when do you do the 
condition check in your implementation?

It would have to be when the instance is executed.


was (Author: bdeggleston):
bq. In the current implementation, we only keep the last commit per CQL 
partition. We can do the same for this as well.

Yeah I've been thinking about that some more. Just because we could keep a 
bunch of historical data doesn't mean we should. There may be situations where 
we need to keep more than one instance around though, specifically when the 
instance is part of a strongly connected component. Keeping some historical 
data would be useful for helping instances recover from short failures where 
they miss several instances, but after a point, transmitting all the activity 
for the last hour or two would just be nuts. The other issue with relying on 
historical data for failure recovery is that you can't keep all of it, so you'd 
have dangling pointers on the older instances. 

For longer partitions, and nodes joining the ring, if we transmitted our 
current dependency bookkeeping for the token ranges they're replicating, the 
corresponding instances, and the current values for those instances, that 
should be enough to get going.

bq. I am also reading about epaxos recently and want to know when do you do the 
condition check in your implementation?

It would have to be when the instance is executed.

 EPaxos
 --

 Key: CASSANDRA-6246
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

 One reason we haven't optimized our Paxos implementation with Multi-paxos is 
 that Multi-paxos requires leader election and hence, a period of 
 unavailability when the leader dies.
 EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
 (2) is particularly useful across multiple datacenters, and (3) allows any 
 node to act as coordinator: 
 http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
 However, there is substantial additional complexity involved if we choose to 
 implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6246) EPaxos

2014-09-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151261#comment-14151261
 ] 

sankalp kohli commented on CASSANDRA-6246:
--

bq. It would have to be when the instance is executed.

Since the client (the application) needs to know whether this was a success or 
not, I was thinking of making it part of the preaccept. 
When a replica gets a preaccept request, along with the last instance, it can 
also send the values of the check. If the responses from all replicas are the 
same (fast path), it could be committed locally and asynchronously on the other 
replicas. Also, the response to the client will contain whether the query 
succeeded or not. 

Make sense? 

PS: I am quite excited to see this implementation coming along, especially 
since you are working on it :) 

 EPaxos
 --

 Key: CASSANDRA-6246
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

 One reason we haven't optimized our Paxos implementation with Multi-paxos is 
 that Multi-paxos requires leader election and hence, a period of 
 unavailability when the leader dies.
 EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
 (2) is particularly useful across multiple datacenters, and (3) allows any 
 node to act as coordinator: 
 http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
 However, there is substantial additional complexity involved if we choose to 
 implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6246) EPaxos

2014-09-28 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151289#comment-14151289
 ] 

Blake Eggleston commented on CASSANDRA-6246:


Thanks [~kohlisankalp] :)

So the issue with making the check part of the preaccept phase is that you 
can't trust the value in the database at that point. If there are other 
interfering instances in flight, you don't know what order they'll be executed 
in until they're all committed. So, one of them could change the value and 
you'd have replied to the client with incorrect information. Assuming the 
client sends the query to a replica, things would go like this:

# receive client request
# send preaccept request to replicas and wait for a fast path quorum to respond
# assuming all responses agreed, commit locally & notify replicas asynchronously
# assuming all dependencies are committed, sort dependency graph
# execute all instances preceding the client's instance, read value* in 
question and perform the check, make mutation*
# respond with result to client

*performed locally

 EPaxos
 --

 Key: CASSANDRA-6246
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

 One reason we haven't optimized our Paxos implementation with Multi-paxos is 
 that Multi-paxos requires leader election and hence, a period of 
 unavailability when the leader dies.
 EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
 (2) is particularly useful across multiple datacenters, and (3) allows any 
 node to act as coordinator: 
 http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
 However, there is substantial additional complexity involved if we choose to 
 implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[Cassandra Wiki] Update of Operations by JonHaddad

2014-09-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The Operations page has been changed by JonHaddad:
https://wiki.apache.org/cassandra/Operations?action=diff&rev1=112&rev2=113

  When the !RandomPartitioner is used, Tokens are integers from 0 to 2**127.  
Keys are converted to this range by MD5 hashing for comparison with Tokens.  
(Thus, keys are always convertible to Tokens, but the reverse is not always 
true.)
  
  === Token selection ===
+ 
  Using a strong hash function means !RandomPartitioner keys will, on average, 
be evenly spread across the Token space, but you can still have imbalances if 
your Tokens do not divide up the range evenly, so you should specify 
!InitialToken to your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In 
Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`.
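  For illustration, that formula can be computed as follows (a sketch in Java, 
using BigInteger because 2**127 does not fit in a long; this is not an official 
token generation tool):
  {{{
import java.math.BigInteger;

public class InitialTokens
{
    public static void main(String[] args)
    {
        int n = Integer.parseInt(args[0]); // number of nodes, e.g. 4
        BigInteger tokenSpace = BigInteger.valueOf(2).pow(127); // RandomPartitioner range

        // i * (2**127 / N) for i = 0 .. N-1
        for (int i = 0; i < n; i++)
            System.out.println(tokenSpace.divide(BigInteger.valueOf(n))
                                         .multiply(BigInteger.valueOf(i)));
    }
}
}}}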
  
  With !NetworkTopologyStrategy, you should calculate the tokens for the nodes 
in each DC independently. Tokens still need to be unique, so you can add 1 to 
the tokens in the 2nd DC, add 2 in the 3rd, and so on.  Thus, for a 4-node 
cluster in 2 datacenters, you would have
@@ -45, +46 @@

  With order preserving partitioners, your key distribution will be 
application-dependent.  You should still take your best guess at specifying 
initial tokens (guided by sampling actual data, if possible), but you will be 
more dependent on active load balancing (see below) and/or adding new nodes to 
hot spots.
  
  Once data is placed on the cluster, the partitioner may not be changed 
without wiping and starting over.
+ 
+ As a caveat to the above section, it is generally not necessary to manually 
select individual tokens when using the vnodes feature.
+ 
  
  === Replication ===
  A Cassandra cluster always divides up the key space into ranges delimited by 
Tokens as described above, but additional replica placement is customizable via 
IReplicaPlacementStrategy in the configuration file.  The standard strategies 
are