date:20140422

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 bc6f4d003 -> 48d7e4080


Fix CQL version number for CASSANDRA-7055


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/48d7e408
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/48d7e408
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/48d7e408

Branch: refs/heads/cassandra-2.0
Commit: 48d7e408085ff4f56a718487dedb05c23dfe57c9
Parents: bc6f4d0
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Apr 22 10:12:19 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Apr 22 10:12:19 2014 +0200

--
 doc/cql3/CQL.textile| 16 +++-
 .../org/apache/cassandra/cql3/QueryProcessor.java   |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/48d7e408/doc/cql3/CQL.textile
--
diff --git a/doc/cql3/CQL.textile b/doc/cql3/CQL.textile
index 3847701..f6208bf 100644
--- a/doc/cql3/CQL.textile
+++ b/doc/cql3/CQL.textile
@@ -497,13 +497,16 @@ bc(syntax)..
   ( USING option ( AND option )* )?
   SET assignment ( ',' assignment )*
   WHERE where-clause
-  ( IF identifier '=' term ( AND identifier '=' term )* )?
+  ( IF condition ( AND condition )* )?
 
 assignment ::= identifier '=' term
            | identifier '=' identifier ('+' | '-') (int-term | set-literal | list-literal)
            | identifier '=' identifier '+' map-literal
            | identifier '[' term ']' '=' term
 
+condition ::= identifier '=' term
+          | identifier '[' term ']' '=' term
+
 where-clause ::= relation ( AND relation )*
 
 relation ::= identifier '=' term
@@ -552,6 +555,7 @@ bc(syntax)..
   FROM tablename
   ( USING TIMESTAMP integer)?
   WHERE where-clause
+  ( IF ( EXISTS | ( condition ( AND condition )*) ) )?
 
 selection ::= identifier ( '[' term ']' )?
 
@@ -560,6 +564,9 @@ bc(syntax)..
 relation ::= identifier '=' term
          | identifier IN '(' ( term ( ',' term )* )? ')'
          | identifier IN '?'
+
+condition ::= identifier '=' term
+          | identifier '[' term ']' '=' term
 p. 
 __Sample:__
 
@@ -574,6 +581,7 @@ The @DELETE@ statement deletes columns and rows. If column names are provided di
 
 In a @DELETE@ statement, all deletions within the same partition key are applied atomically and in isolation.
 
+A @DELETE@ operation can be made conditional using @IF@, as for @UPDATE@ and @INSERT@. But please note that, as for the latter, this will incur a non-negligible performance cost (internally, Paxos will be used) and so should be used sparingly.
 
 
 h3(#batchStmt). BATCH
@@ -1149,6 +1157,12 @@ h2(#changes). Changes
 
 The following describes the addition/changes brought for each version of CQL.
 
+h3. 3.1.6
+
+* A new "@uuid@ method":#uuidFun has been added.
+* Support for @DELETE ... IF EXISTS@ syntax.
+
+
 h3. 3.1.5
 
 * It is now possible to group clustering columns in a relation, see "SELECT Where clauses":#selectWhere.

http://git-wip-us.apache.org/repos/asf/cassandra/blob/48d7e408/src/java/org/apache/cassandra/cql3/QueryProcessor.java
--
diff --git a/src/java/org/apache/cassandra/cql3/QueryProcessor.java b/src/java/org/apache/cassandra/cql3/QueryProcessor.java
index 64ea5e5..ab0ea40 100644
--- a/src/java/org/apache/cassandra/cql3/QueryProcessor.java
+++ b/src/java/org/apache/cassandra/cql3/QueryProcessor.java
@@ -43,7 +43,7 @@ import org.apache.cassandra.utils.SemanticVersion;
 
 public class QueryProcessor implements QueryHandler
 {
-    public static final SemanticVersion CQL_VERSION = new SemanticVersion("3.1.5");
+    public static final SemanticVersion CQL_VERSION = new SemanticVersion("3.1.6");
 
     public static final QueryProcessor instance = new QueryProcessor();

[1/2] git commit: Fix CQL version number for CASSANDRA-7055

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 30e2bff69 -> 3e6b29925


Fix CQL version number for CASSANDRA-7055


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/48d7e408
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/48d7e408
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/48d7e408

Branch: refs/heads/cassandra-2.1
Commit: 48d7e408085ff4f56a718487dedb05c23dfe57c9
Parents: bc6f4d0
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Apr 22 10:12:19 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Apr 22 10:12:19 2014 +0200

--
 doc/cql3/CQL.textile| 16 +++-
 .../org/apache/cassandra/cql3/QueryProcessor.java   |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)
--



[2/2] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3e6b2992
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3e6b2992
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3e6b2992

Branch: refs/heads/cassandra-2.1
Commit: 3e6b29925686dac0275c2db64e3a3b69203b1747
Parents: 30e2bff 48d7e40
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Apr 22 10:12:47 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Apr 22 10:12:47 2014 +0200

--
 doc/cql3/CQL.textile| 16 +++-
 .../org/apache/cassandra/cql3/QueryProcessor.java   |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3e6b2992/doc/cql3/CQL.textile
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3e6b2992/src/java/org/apache/cassandra/cql3/QueryProcessor.java
--

[1/3] git commit: Fix CQL version number for CASSANDRA-7055

Repository: cassandra
Updated Branches:
  refs/heads/trunk 33fa4f648 -> 68aa62bde


Fix CQL version number for CASSANDRA-7055


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/48d7e408
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/48d7e408
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/48d7e408

Branch: refs/heads/trunk
Commit: 48d7e408085ff4f56a718487dedb05c23dfe57c9
Parents: bc6f4d0
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Apr 22 10:12:19 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Apr 22 10:12:19 2014 +0200

--
 doc/cql3/CQL.textile| 16 +++-
 .../org/apache/cassandra/cql3/QueryProcessor.java   |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)
--



[3/3] git commit: Merge branch 'cassandra-2.1' into trunk

Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/68aa62bd
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/68aa62bd
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/68aa62bd

Branch: refs/heads/trunk
Commit: 68aa62bde1596a0f7cae03049a3cdcb491151a1c
Parents: 33fa4f6 3e6b299
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Apr 22 10:13:26 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Apr 22 10:13:26 2014 +0200

--
 doc/cql3/CQL.textile| 16 +++-
 .../org/apache/cassandra/cql3/QueryProcessor.java   |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/68aa62bd/src/java/org/apache/cassandra/cql3/QueryProcessor.java
--

[2/3] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3e6b2992
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3e6b2992
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3e6b2992

Branch: refs/heads/trunk
Commit: 3e6b29925686dac0275c2db64e3a3b69203b1747
Parents: 30e2bff 48d7e40
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Apr 22 10:12:47 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Apr 22 10:12:47 2014 +0200

--
 doc/cql3/CQL.textile| 16 +++-
 .../org/apache/cassandra/cql3/QueryProcessor.java   |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3e6b2992/doc/cql3/CQL.textile
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3e6b2992/src/java/org/apache/cassandra/cql3/QueryProcessor.java
--

[jira] [Resolved] (CASSANDRA-7055) Broken CQL Version number in 2.0.7

2014-04-22 Thread Sylvain Lebresne (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-7055.
-

Resolution: Fixed

Can't do anything for 2.0.7, but bumped the version for 2.0.8 (and updated the 
doc accordingly).

 Broken CQL Version number in 2.0.7
 -

 Key: CASSANDRA-7055
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7055
 Project: Cassandra
  Issue Type: Bug
Reporter: Michaël Figuière
Assignee: Sylvain Lebresne
Priority: Trivial
 Fix For: 2.0.8


 Cassandra 2.0.7 has introduced 2 changes in the CQL language:
 * Add uuid() function (CASSANDRA-6473)
 * Add support for DELETE ... IF EXISTS to CQL3 (CASSANDRA-5708)
 Unfortunately the {{cql_version}} reported in the {{system.local}} table 
 hasn't been incremented.
 In 2.0.6:
 {code}
 cqlsh> select cql_version from system.local;
  cql_version
 -------------
        3.1.5
 {code}
 In 2.0.7:
 {code}
 cqlsh> select cql_version from system.local;
  cql_version
 -------------
        3.1.5
 {code}
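
To illustrate why the stale value matters to clients: code that gates features on the reported cql_version would misjudge 2.0.7. The following is only a minimal sketch, assuming dotted numeric versions and taking 3.1.6 as the version under which DELETE ... IF EXISTS is documented (per the changelog entry in the commit above). CqlVersionCheck and its methods are hypothetical names, not part of Cassandra or any driver.

```java
// Hypothetical helper for illustration; not part of Cassandra or any driver.
final class CqlVersionCheck
{
    // Assumption: the CQL changelog lists DELETE ... IF EXISTS under 3.1.6.
    static final String DELETE_IF_EXISTS_SINCE = "3.1.6";

    /** Compares two dotted numeric versions such as "3.1.5" and "3.1.6". */
    static int compare(String a, String b)
    {
        String[] as = a.split("\\.");
        String[] bs = b.split("\\.");
        int n = Math.max(as.length, bs.length);
        for (int i = 0; i < n; i++)
        {
            // Missing components count as 0, so "3.1" equals "3.1.0".
            int x = i < as.length ? Integer.parseInt(as[i]) : 0;
            int y = i < bs.length ? Integer.parseInt(bs[i]) : 0;
            if (x != y)
                return Integer.compare(x, y);
        }
        return 0;
    }

    /** True if the reported version is at least the one documenting the feature. */
    static boolean advertisesDeleteIfExists(String reportedCqlVersion)
    {
        return compare(reportedCqlVersion, DELETE_IF_EXISTS_SINCE) >= 0;
    }

    public static void main(String[] args)
    {
        // 2.0.7 reports 3.1.5, so a version-gated client would skip the feature.
        System.out.println(advertisesDeleteIfExists("3.1.5"));
        System.out.println(advertisesDeleteIfExists("3.1.6"));
    }
}
```

With the value reported by 2.0.7 the check comes out negative even though the server accepts the syntax, which is the mismatch this issue describes.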



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6863) Incorrect read repair of range tombstones

2014-04-22 Thread Oleg Anastasyev (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976532#comment-13976532
 ] 

Oleg Anastasyev commented on CASSANDRA-6863:


RangeTombstoneList.updateDigest:
{code}
+for (int j = 0; j < ends[i].size(); j++)
+    digest.update(starts[i].get(j).duplicate());
{code}
should call digest.update on ends[i], not starts[i]

the rest LGTM
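
The effect of the suggested correction can be sketched outside Cassandra. This is a simplified stand-in, assuming each bound is a list of ByteBuffer components and using MD5 purely as an example digest; RangeBoundsDigest is a hypothetical class, not the real RangeTombstoneList. It feeds the end bound's components into the digest, so two range lists that differ only in their end bounds hash differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration; not Cassandra's RangeTombstoneList.
final class RangeBoundsDigest
{
    /** Digests every component of every start and end bound, range by range. */
    static byte[] digest(List<List<ByteBuffer>> starts, List<List<ByteBuffer>> ends)
    {
        MessageDigest digest;
        try
        {
            // Assumption: MD5 is used here only as an example algorithm.
            digest = MessageDigest.getInstance("MD5");
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException(e);
        }
        for (int i = 0; i < starts.size(); i++)
        {
            for (int j = 0; j < starts.get(i).size(); j++)
                digest.update(starts.get(i).get(j).duplicate());
            // Read from ends here; reading starts again would ignore the end bound.
            for (int j = 0; j < ends.get(i).size(); j++)
                digest.update(ends.get(i).get(j).duplicate());
        }
        return digest.digest();
    }

    /** Builds a one-range list whose bound has a single component. */
    static List<List<ByteBuffer>> single(String component)
    {
        List<ByteBuffer> bound =
            Collections.singletonList(ByteBuffer.wrap(component.getBytes(StandardCharsets.UTF_8)));
        return Collections.singletonList(bound);
    }

    public static void main(String[] args)
    {
        byte[] a = digest(single("a"), single("m"));
        byte[] b = digest(single("a"), single("z"));
        System.out.println("digests differ: " + !Arrays.equals(a, b));
    }
}
```

Calling duplicate() before update keeps the position of the original buffers untouched, so the same bounds can be digested again later.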

 Incorrect read repair of range tombstones
 --

 Key: CASSANDRA-6863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6863
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.0
Reporter: Oleg Anastasyev
 Attachments: 6863-v2.txt, 6863-v2.txt, 
 ReadRepairRangeThombstoneDiff.txt, ReadRepairsDebugLogger.txt


 Rows with range tombstones are read repaired for every replica, if RR is 
 triggered (this is because CF.diff() returns non null if !isEmpty(), which in 
 turn returns false if the range tombstone list is not empty). 
 Also, the full range tombstone list is sent to all nodes, which could be a 
 problem if you have a wide partition.
 Fixed this by evaluating diff on range tombstone lists as well as on 
 deleteInfo of endpoint CF versions. Also return null from CF.diff, if no diff 
 in RTL.
 A second patch (ReadRepairsDebugLogger.txt) adds some debug logging to look 
 at read repairs. You may find it useful as well.




[jira] [Commented] (CASSANDRA-6993) Windows: remove mmap'ed I/O for index files and force standard file access


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976537#comment-13976537
 ] 

Benedict commented on CASSANDRA-6993:
-

1) isUnix should be final
2) I think your isUnix check is too limited: this will break Mac OSX, FreeBSD 
and Solaris users, possibly others. Since basically every OS other than Windows 
probably supports this, I'd suggest making it an isWindows check and looking 
for contains("windows"). [This 
link|http://mindprod.com/jgloss/properties.html#OSNAME] may help, although it may 
not be completely authoritative. A quick grep of openjdk shows the following 
line in their own test tools, though: 

{code}
static boolean isWindows = System.getProperty("os.name").startsWith("Windows");
{code}

Which suggests it's probably sufficient.
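
A minimal sketch of the suggested check, assuming a case-insensitive substring match is acceptable; OsCheck is a hypothetical class name and not taken from the attached patch:

```java
import java.util.Locale;

// Hypothetical helper for illustration; not taken from the 6993 patch.
final class OsCheck
{
    /** True if the given os.name value denotes a Windows platform. */
    static boolean isWindows(String osName)
    {
        // Match the full word: a bare "win" substring would also match "Darwin".
        return osName != null && osName.toLowerCase(Locale.ROOT).contains("windows");
    }

    // Evaluated once at class load, and final as requested in point 1.
    static final boolean IS_WINDOWS = isWindows(System.getProperty("os.name"));

    public static void main(String[] args)
    {
        System.out.println("os.name=" + System.getProperty("os.name") + " isWindows=" + IS_WINDOWS);
    }
}
```

Inverting the test this way means any platform that is not Windows keeps mmap'ed I/O without having to be listed explicitly.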

 Windows: remove mmap'ed I/O for index files and force standard file access
 --

 Key: CASSANDRA-6993
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6993
 Project: Cassandra
  Issue Type: Improvement
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
Priority: Minor
 Fix For: 3.0

 Attachments: 6993_v1.txt


 Memory-mapped I/O on Windows causes issues with hard-links; we're unable to 
 delete hard-links to open files with memory-mapped segments even using nio.  
 We'll need to push for close to performance parity between mmap'ed I/O and 
 buffered going forward as the buffered / compressed path offers other 
 benefits.




[jira] [Commented] (CASSANDRA-7029) Investigate alternative transport protocols for both client and inter-server communications


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976543#comment-13976543
 ] 

Benedict commented on CASSANDRA-7029:
-

This is exactly the reason I created CASSANDRA-7061, which I intend to look at 
first. Profilers indicate a great deal of overhead in networking, but I'm not 
sure how honest that is.


 Investigate alternative transport protocols for both client and inter-server 
 communications
 ---

 Key: CASSANDRA-7029
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7029
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 There are a number of reasons to think we can do better than TCP for our 
 communications:
 1) We can actually tolerate sporadic small message losses, so guaranteed 
 delivery isn't essential (although for larger messages it probably is)
 2) As shown in \[1\] and \[2\], Linux can behave quite suboptimally with 
 regard to TCP message delivery when the system is under load. Judging from 
 the theoretical description, this is likely to apply even when the 
 system-load is not high, but the number of processes to schedule is high. 
 Cassandra generally has a lot of threads to schedule, so this is quite 
 pertinent for us. UDP performs substantially better here.
 3) Even when the system is not under load, UDP has a lower CPU burden, and 
 that burden is constant regardless of the number of connections it processes. 
 4) On a simple benchmark on my local PC, using non-blocking IO for UDP and 
 busy spinning on IO I can actually push 20-40% more throughput through 
 loopback (where TCP should be optimal, as no latency), even for very small 
 messages. Since we can see networking taking multiple CPUs' worth of time 
 during a stress test, using a busy-spin for ~100micros after last message 
 receipt is almost certainly acceptable, especially as we can (ultimately) 
 process inter-server and client communications on the same thread/socket in 
 this model.
 5) We can optimise the threading model heavily: since we generally process 
 very small messages (200 bytes not at all implausible), the thread signalling 
 costs on the processing thread can actually dramatically impede throughput. 
 In general it costs ~10micros to signal (and passing the message to another 
 thread for processing in the current model requires signalling). For 200-byte 
 messages this caps our throughput at 20MB/s.
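
The 20MB/s figure in point 5 follows from simple arithmetic: one signal per message at ~10micros allows at most 100,000 messages per second on the signalling thread. A small sketch of that calculation, with illustrative class and method names and assuming every message pays exactly one signalling cost:

```java
// Illustrative arithmetic only; names are hypothetical.
final class SignalCostCap
{
    /** Upper bound in bytes per second when each message pays one signalling cost. */
    static double maxBytesPerSecond(int messageBytes, double signalMicros)
    {
        double messagesPerSecond = 1_000_000.0 / signalMicros;
        return messagesPerSecond * messageBytes;
    }

    public static void main(String[] args)
    {
        double cap = maxBytesPerSecond(200, 10.0);
        System.out.println(cap / 1_000_000.0 + " MB/s");
    }
}
```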
 I propose to knock up a highly naive UDP-based connection protocol with 
 super-trivial congestion control over the course of a few days, with the only 
 initial goal being maximum possible performance (not fairness, reliability, 
 or anything else), and trial it in Netty (possibly making some changes to 
 Netty to mitigate thread signalling costs). The reason for knocking up our 
 own here is to get a ceiling on what the absolute limit of potential for this 
 approach is. Assuming this pans out with performance gains in C* proper, we 
 then look to contributing to/forking the udt-java project and see how easy it 
 is to bring performance in line with what we can get with our naive approach 
 (I don't suggest starting here, as the project is using blocking old-IO, and 
 modifying it with latency in mind may be challenging, and we won't know for 
 sure what the best case scenario is).
 \[1\] 
 http://test-docdb.fnal.gov/0016/001648/002/Potential%20Performance%20Bottleneck%20in%20Linux%20TCP.PDF
 \[2\] 
 http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=1968;filename=Performance%20Analysis%20of%20Linux%20Networking%20-%20Packet%20Receiving%20(Official).pdf;version=2
 Further related reading:
 http://public.dhe.ibm.com/software/commerce/doc/mft/cdunix/41/UDTWhitepaper.pdf
 https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/14482/ChoiUndPerTcp.pdf?sequence=1
 https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.3762&rep=rep1&type=pdf




[jira] [Created] (CASSANDRA-7065) Add some extra metadata in leveled manifest to be able to reduce the amount of sstables searched on read path

2014-04-22 Thread Marcus Eriksson (JIRA)

Marcus Eriksson created CASSANDRA-7065:
--

 Summary: Add some extra metadata in leveled manifest to be able to 
reduce the amount of sstables searched on read path
 Key: CASSANDRA-7065
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7065
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson


Based on this:
http://rocksdb.org/blog/431/indexing-sst-files-for-better-lookup-performance/

By keeping pointers from the sstables in lower to higher levels we could reduce 
the number of candidates in higher levels, i.e., instead of searching all 1000 L3 
sstables, we use the information from the L2 search to include fewer L3 sstables.

First we need to figure out if this can beat our IntervalTree approach (and if 
the win is worth it).
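
A rough sketch of the idea, under simplifying assumptions: keys are plain longs, each level is a sorted list of disjoint inclusive ranges, and the pointers are rebuilt naively. LevelPointers, Range, buildPointers and find are hypothetical names; nothing here reflects the existing manifest or IntervalTree code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; not Cassandra's leveled manifest.
final class LevelPointers
{
    /** Key range covered by one sstable; both bounds are inclusive. */
    static final class Range
    {
        final long first, last;
        Range(long first, long last) { this.first = first; this.last = last; }
    }

    /** For each lower-level range, the [from, to) slice of the higher level overlapping it. */
    static int[][] buildPointers(List<Range> lower, List<Range> higher)
    {
        int[][] pointers = new int[lower.size()][];
        for (int i = 0; i < lower.size(); i++)
        {
            int from = 0;
            while (from < higher.size() && higher.get(from).last < lower.get(i).first)
                from++;
            int to = from;
            while (to < higher.size() && higher.get(to).first <= lower.get(i).last)
                to++;
            pointers[i] = new int[]{ from, to };
        }
        return pointers;
    }

    /** Binary search for the range containing key within [from, to); returns -1 if none. */
    static int find(List<Range> level, int from, int to, long key)
    {
        int lo = from, hi = to - 1;
        while (lo <= hi)
        {
            int mid = (lo + hi) >>> 1;
            Range r = level.get(mid);
            if (key < r.first) hi = mid - 1;
            else if (key > r.last) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args)
    {
        List<Range> l2 = Arrays.asList(new Range(0, 99), new Range(100, 199));
        List<Range> l3 = Arrays.asList(new Range(0, 24), new Range(25, 49), new Range(50, 74),
                                       new Range(75, 99), new Range(100, 149), new Range(150, 199));
        int[][] pointers = buildPointers(l2, l3);
        int hit = find(l2, 0, l2.size(), 160);
        // Only the L3 slice referenced by the matching L2 range is searched.
        System.out.println(find(l3, pointers[hit][0], pointers[hit][1], 160));
    }
}
```

When the key falls in a gap of the lower level there is no pointer to follow, so a real implementation would need a fallback such as searching the whole higher level.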




[jira] [Commented] (CASSANDRA-7031) Increase default commit log total space + segment size


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976547#comment-13976547
 ] 

Benedict commented on CASSANDRA-7031:
-

The _worst_ latency is substantially reduced, which is down to waiting on the 
commit log to catch up. It's possible the 99th/99.9th are increased due to 
sharing the same disk, but notice the 95th percentile is lower also for both, 
so it's only a slight spike in the 99th+99.9th for a substantial drop in the 
max and the more common cases. Could simply be random noise from running on my 
box, though. [~enigmacurry] perhaps you could kick off a simple test comparing 
with and without this patch on the real cluster so we can see some pretty 
graphs (keep populate range low so that commit log is a more visible component 
though, preferably)?

 Increase default commit log total space + segment size
 --

 Key: CASSANDRA-7031
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7031
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Trivial
 Fix For: 2.1 beta2

 Attachments: 7031.txt


 I would like to increase the default commit log total space and segment size 
 options for 64-bit JVMs:
 The current default of 1Gb and 32Mb is quite constrained and can have some 
 (very minor) negative performance implications, for no major benefit: 
 # 32Mb files are actually quite small, and if during the 10s interval we have 
 completely filled multiple of them (quite easy) it would be more efficient to 
 write fewer larger files, as we can issue fewer fsyncs and permit the OS to 
 schedule the writes more efficiently. On my box this has a small but 
 noticeable impact. Although I would expect on decent server hardware this 
 would be smaller still, since we immediately drop the pages from cache on 
 writing there isn't a great deal of advantage to keeping the files so small. 
 The only advantage I can see is that during a drop KS/CF or other event that 
 forces log rollover we're wasting less space until log recycling. 128-256Mb 
 are modest increases that seem more appropriate to me.
 # 1Gb is too small for the default total log space. We can find that we force 
 memtable flushes as a result of log utilisation instead of memtable occupancy 
 quite often (esp. as a result of increased effective memtable space from 
 recent improvements), especially on machines with more addressable memory. I 
 suggest 8Gb as a minimum. The only disadvantage of having more log data is 
 that replay on restart may be slightly slower, but since most of the events 
 will be ignored it should be relatively benign, and I would rather take the 
 penalty on startup instead of during running, no matter how small the running 
 penalty.




[jira] [Commented] (CASSANDRA-6916) Preemptive opening of compaction result


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976585#comment-13976585
 ] 

Benedict commented on CASSANDRA-6916:
-

[~enigmacurry]: 6916v3 should crash if you set the preheat_kernel_page_cache 
property to true (just tested this locally). Setting the 
populate_io_cache_on_flush property of the CF would probably work but simply 
have no effect. Are you sure you were running the correct branch?

bq. Furthermore, I hadn't realized when testing CASSANDRA-6746 that we could 
actually fare well with the existing options like this

The problem is that we consider the default to be better at preventing dramatic 
page cache churn during compaction, which this should continue to deliver but 
without the downsides.

The errors look plausible - but could we confirm we're running the correct (v3) 
branch given it didn't crash with the preheat setting in the yaml?

 Preemptive opening of compaction result
 ---

 Key: CASSANDRA-6916
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6916
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1

 Attachments: 6916-stock2_1.mixed.cache_tweaks.tar.gz, 
 6916-stock2_1.mixed.logs.tar.gz, 6916v3-preempive-open-compact.logs.gz, 
 6916v3-preempive-open-compact.mixed.2.logs.tar.gz, 
 6916v3-premptive-open-compact.mixed.cache_tweaks.2.tar.gz


 Related to CASSANDRA-6812, but a little simpler: when compacting, we mess 
 quite badly with the page cache. One thing we can do to mitigate this problem 
 is to use the sstable we're writing before we've finished writing it, and to 
 drop the regions from the old sstables from the page cache as soon as the new 
 sstables have them (even if they're only written to the page cache). This 
 should minimise any page cache churn, as the old sstables must be larger than 
 the new sstable, and since both will be in memory, dropping the old sstables 
 is at least as good as dropping the new.
 The approach is quite straight-forward. Every X MB written:
 # grab flushed length of index file;
 # grab second to last index summary record, after excluding those that point 
 to positions after the flushed length;
 # open index file, and check that our last record doesn't occur outside of 
 the flushed length of the data file (pretty unlikely)
 # Open the sstable with the calculated upper bound
 Some complications:
 # must keep running copy of compression metadata for reopening with
 # we need to be able to replace an sstable with itself but a different lower 
 bound
 # we need to drop the old page cache only when readers have finished




[jira] [Commented] (CASSANDRA-7031) Increase default commit log total space + segment size


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976624#comment-13976624
 ] 

Benedict commented on CASSANDRA-7031:
-

From what POV is 128Mb a long gap between archived segments? Do we mean that 
there may be a 128Mb gap after the most recent archive during which no PIT 
restore is possible? Seems like this would be a minimal problem, as the most 
recent CLS is still present in the CL directory, and we could always offer the 
ability to create a PITR point through force recycling the current CL segment 
at the requested time to make sure there is a separate backup. If you care 
about rolling PITR backups with minimal intervals then you're probably a very 
specific use case, I'd reckon.

As far as replay is concerned, I don't see a major difference: we need to read 
ahead potentially more than even one 128Mb file to check if there are delayed 
commits, and either way 128Mb is a very small amount of data - a few seconds at 
most of extra restore time.

 Increase default commit log total space + segment size
 --

 Key: CASSANDRA-7031
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7031
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Trivial
 Fix For: 2.1 beta2

 Attachments: 7031.txt


 I would like to increase the default commit log total space and segment size 
 options for 64-bit JVMs:
 The current default of 1Gb and 32Mb is quite constrained and can have some 
 (very minor) negative performance implications, for no major benefit: 
 # 32Mb files are actually quite small, and if during the 10s interval we have 
 completely filled multiple of them (quite easy) it would be more efficient to 
 write fewer larger files, as we can issue fewer fsyncs and permit the OS to 
 schedule the writes more efficiently. On my box this has a small but 
 noticeable impact. Although I would expect on decent server hardware this 
 would be smaller still, since we immediately drop the pages from cache on 
 writing there isn't a great deal of advantage to keeping the files so small. 
 The only advantage I can see is that during a drop KS/CF or other event that 
 forces log rollover we're wasting less space until log recycling. 128-256Mb 
 are modest increases that seem more appropriate to me.
 # 1Gb is too small for the default total log space. We can find that we force 
 memtable flushes as a result of log utilisation instead of memtable occupancy 
 quite often (esp. as a result of increased effective memtable space from 
 recent improvements), especially on machines with more addressable memory. I 
 suggest 8Gb as a minimum. The only disadvantage of having more log data is 
 that replay on restart may be slightly slower, but since most of the events 
 will be ignored it should be relatively benign, and I would rather take the 
 penalty on startup instead of during running, no matter how small the running 
 penalty.
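
As a rough illustration of the first point above, the sketch below counts how many segment files are touched during one 10s sync interval for a few segment sizes. The 40 MB/s write rate and the function name are assumptions chosen for illustration only; they are not measurements from this ticket.

```python
import math

def segments_filled(write_rate_mb_s, interval_s, segment_mb):
    """Number of commit log segments touched while writing for interval_s seconds."""
    total_mb = write_rate_mb_s * interval_s
    return math.ceil(total_mb / segment_mb)

# Assumed write rate of 40 MB/s over the 10s interval mentioned above.
for segment_mb in (32, 128, 256):
    print(segment_mb, segments_filled(40, 10, segment_mb))
```

Under these assumptions, larger segments mean noticeably fewer files to create and sync per interval, which is the effect the description argues for.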



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-4450) CQL3: Allow preparing the consistency level, timestamp and ttl

2014-04-22 Thread Pavel Eremeev (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976639#comment-13976639
]

Pavel Eremeev commented on CASSANDRA-4450:
--

Why is [timestamp] using LongType.instance instead of TimestampType.instance?

I think it's better for clients to know the true type of that field, so that
they can encode the value simply and correctly.

CQL3: Allow preparing the consistency level, timestamp and ttl
--

Key: CASSANDRA-4450
URL: https://issues.apache.org/jira/browse/CASSANDRA-4450
Project: Cassandra
Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
Labels: cql3
Fix For: 2.0 beta 1

It could be useful to allow the preparation of the consistency level, the
timestamp and the ttl, i.e. to allow:
{noformat}
UPDATE foo SET .. USING CONSISTENCY ? AND TIMESTAMP ? AND TTL ?
{noformat}
A slight concern is that when preparing a statement we return the names of
the prepared variables, but none of timestamp, ttl and consistency are
reserved names currently, so returning those as names could conflict with a
column name. We can either:
* make these reserved identifiers (I have to add that I'm not a fan because at
least for timestamp, I think that's a potentially useful and common column
name).
* use some specific special character to indicate those are not column names,
like returning [timestamp], [ttl], [consistency].
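
The second option can be sketched as follows. This is a toy illustration, not the Cassandra implementation; the function name and its arguments are hypothetical.

```python
def prepared_variable_names(columns, options):
    """Name bind markers: plain names for columns, bracketed names for USING options."""
    return list(columns) + [f"[{opt}]" for opt in options]

# A table with a column called "timestamp", prepared with USING TIMESTAMP ? AND TTL ?
names = prepared_variable_names(["timestamp", "value"], ["timestamp", "ttl"])
print(names)
```

Because the brackets cannot appear in an unquoted column name, the column called timestamp and the USING TIMESTAMP marker stay distinguishable in the returned metadata.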


[jira] [Commented] (CASSANDRA-7046) Update nodetool commands to output the date and time they were run on

2014-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976662#comment-13976662
 ] 

Clément Lardeur commented on CASSANDRA-7046:


In my view, it's not nodetool's responsibility to print to stdout the time at 
which a command was executed; that is the job of the script or client that 
calls nodetool.

If you use a shell script to call nodetool and redirect its output into a file, 
run the {{date}} command before calling nodetool.
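
A minimal sketch of such a wrapper is shown below. It is written in Python rather than shell, and the function name and the header format are assumptions made for illustration. It writes a timestamp line and then the command's output to the given stream.

```python
import subprocess
import sys
from datetime import datetime, timezone

def run_with_timestamp(cmd, out):
    """Write a UTC timestamp header to out, then the command's stdout; return its exit code."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    out.write(f"# {' '.join(cmd)} run at {stamp}\n")
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    out.write(result.stdout)
    return result.returncode

# Example use (assumes nodetool is on the PATH):
#   with open("nodetool.log", "a") as log:
#       run_with_timestamp(["nodetool", "status"], log)
```

Keeping the timestamp in the caller leaves nodetool's own output format unchanged.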

 Update nodetool commands to output the date and time they were run on
 -

 Key: CASSANDRA-7046
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7046
 Project: Cassandra
  Issue Type: Improvement
Reporter: Johnny Miller
Priority: Trivial
  Labels: lhf

 It would help if the various nodetool commands also output the system date 
 and time at which they were run. Often these commands are executed and then we look at the 
 cassandra log files to try and find out what was happening at that time. 
 This is certainly just a convenience feature, but it would be nice to have 
 the information in there to aid with diagnostics.




[jira] [Updated] (CASSANDRA-7062) Extension of static columns for compound cluster keys

2014-04-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Clément Lardeur updated CASSANDRA-7062:
---

Description: 
CASSANDRA-6561 implemented static columns for a given partition key.

What this is proposing for a compound cluster key is a static column that is 
static at intermediate parts of a compound cluster key. This example shows a 
table modelling a moderately complex EAV pattern:

{code}
CREATE TABLE t (
   entityID text,
   propertyName text,
   valueIndex text,
   entityName text static (entityID),
   propertyType text static (entityID, propertyName),
   propertyRelations list<text> static (entityID, propertyName),
   data text,
   PRIMARY KEY (entityID, (propertyName,valueIndex))
)
{code}
So this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all 
cluster rows have the same value
- the propertyType and propertyRelations columns are static with respect to the 
remaining parts of the cluster key (that is, across all valueIndex values for a 
given propertyName), so an update to those values for an entityID and a 
propertyName will be shared/constant by all the value rows...

Is this a relatively simple extension of the same mechanism in -6561, or is 
this a "whoa, you have no idea what you are proposing"?

Sample data:

Mary and Jane aren't married...
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, 
data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, 
data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) 
VALUES ('0001','MARY MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex) 
VALUES ('0002','JANE JOHNSON','kids','NOVALUE','');
{code}
{code}
SELECT * FROM t:

0001  MARY MATALIN  married  SingleValue  0  false
0001  MARY MATALIN  kids     NOVALUE         null
0002  JANE JOHNSON  married  SingleValue  0  false
0002  JANE JOHNSON  kids     NOVALUE         null
{code}
Then Mary and Jane get married (so the entityName column, which is static on the 
partition key, is updated just like in CASSANDRA-6561):
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, 
data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex, 
data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE');
{code}
{code}
SELECT * FROM t:

0001  MARY SMITH  married  SingleValue  0  TRUE
0001  MARY SMITH  kids     NOVALUE         null
0002  JANE JONES  married  SingleValue  0  TRUE
0002  JANE JONES  kids     NOVALUE         null
{code}
Then Mary and Jane each have a kid, so we add another value to the kids attribute:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES 
('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES 
('0002','kids','SingleValue','0','JENNY');
{code}
{code}
SELECT * FROM t:

0001  MARY SMITH  married  SingleValue  0  TRUE
0001  MARY SMITH  kids     SingleValue     null
0001  MARY SMITH  kids     SingleValue  0  JIM-BOB
0002  JANE JONES  married  SingleValue  0  TRUE
0002  JANE JONES  kids     SingleValue     null
0002  JANE JONES  kids     SingleValue  0  JENNY
{code}
Then Mary has ANOTHER kid, which demonstrates the partially static column 
relative to the cluster key, as ALL value rows for the property 'kids' get 
updated to the new value:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES 
('0001','kids','MultiValue','1','HARRY');
{code}
{code}
SELECT * FROM t:

0001  MARY SMITH  married  SingleValue  0  TRUE
0001  MARY SMITH  kids     MultiValue      null
0001  MARY SMITH  kids     MultiValue   0  JIM-BOB
0001  MARY SMITH  kids     MultiValue   1  HARRY
0002  JANE JONES  married  SingleValue  0  TRUE
0002  JANE JONES  kids     SingleValue     null
0002  JANE JONES  kids     SingleValue  0  JENNY
{code}

... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid 
hack bug in there with the null/empty row for the kids attribute, but please 
bear with me on that 

Generally speaking, this will aid in flattening / denormalization of relational 
constructs into cassandra-friendly schemas. In the above example we are 
flattening a relational schema of three tables: entity, property, and value 
tables into a single sparse flattened denormalized compound table.
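
The proposed semantics can be modelled with a small in-memory sketch. This is a toy model under stated assumptions, not Cassandra code: the class name, method names, and storage layout are all hypothetical, and only the entityName and propertyType statics from the example are modelled.

```python
class PrefixStaticTable:
    """Toy model: statics scoped to a prefix of the primary key."""

    def __init__(self):
        self.entity_name = {}    # entityID -> entityName (static per partition)
        self.property_type = {}  # (entityID, propertyName) -> propertyType
        self.rows = {}           # (entityID, propertyName, valueIndex) -> data

    def insert(self, entity_id, property_name, value_index, data=None,
               entity_name=None, property_type=None):
        # Static values are stored once per key prefix, not once per row.
        if entity_name is not None:
            self.entity_name[entity_id] = entity_name
        if property_type is not None:
            self.property_type[(entity_id, property_name)] = property_type
        self.rows[(entity_id, property_name, value_index)] = data

    def select_all(self):
        # Each row is joined with the statics of its key prefixes at read time.
        return [
            (eid, self.entity_name.get(eid), prop,
             self.property_type.get((eid, prop)), idx, data)
            for (eid, prop, idx), data in sorted(self.rows.items())
        ]

t = PrefixStaticTable()
t.insert("0001", "kids", "", entity_name="MARY SMITH", property_type="NOVALUE")
t.insert("0001", "kids", "0", data="JIM-BOB", property_type="SingleValue")
t.insert("0001", "kids", "1", data="HARRY", property_type="MultiValue")
for row in t.select_all():
    print(row)
```

After the last insert, every 'kids' row for entity 0001 reports MultiValue, which mirrors the final SELECT output in the example.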


  was:
CASSANDRA-6561 implemented static columns for a given partition key.

What this is proposing for a compound cluster key is a static

[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.

2014-04-22 Thread Tupshin Harper (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976672#comment-13976672
]

Tupshin Harper commented on CASSANDRA-6696:
---

+1. To the extent that we can do sstables per vnode without introducing other
performance costs, I am hugely in favor of it. With good OS tuning, I'm not
scared of too many sstables. If it is a pain for backup, or other things, you
could have an offline sstable consolidator script that would take a batch of
sstables and stream them out as a single sstable to a remote location.

Drive replacement in JBOD can cause data to reappear.
--


[jira] [Updated] (CASSANDRA-6863) Incorrect read repair of range thombstones


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6863:
--

 Reviewer: Jonathan Ellis
Fix Version/s: 2.1
 Assignee: Oleg Anastasyev

 Incorrect read repair of range thombstones
 --

 Key: CASSANDRA-6863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6863
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.0
Reporter: Oleg Anastasyev
Assignee: Oleg Anastasyev
 Fix For: 2.1

 Attachments: 6863-v2.txt, 6863-v2.txt, 
 ReadRepairRangeThombstoneDiff.txt, ReadRepairsDebugLogger.txt


 Rows with range tombstones are read repaired for every replica if RR is 
 triggered (this is because CF.diff() returns non-null if !isEmpty(), which in 
 turn returns false if the range tombstone list is not empty). 
 Also, the full range tombstone list is sent to all nodes, which could be a 
 problem if you have wide partitions.
 Fixed this by evaluating the diff on range tombstone lists as well as on the 
 deletion info of the endpoint CF versions. CF.diff now also returns null if 
 there is no diff in the RTL.
 A second patch (ReadRepairsDebugLogger.txt) adds some debug logging to look 
 at read repairs. You may find it useful as well.
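
The idea behind the fix can be sketched with a simplified model. This is not the attached patch: the dictionary layout, the cell representation as (value, timestamp), and the function name are assumptions made for illustration.

```python
def diff(replica, resolved):
    """Return the parts of `resolved` that `replica` lacks, or None if there are none."""
    missing_columns = {
        name: cell
        for name, cell in resolved["columns"].items()
        if name not in replica["columns"] or replica["columns"][name][1] < cell[1]
    }
    # Compare the range tombstone lists too, instead of treating any
    # non-empty list as a difference.
    missing_tombstones = resolved["range_tombstones"] - replica["range_tombstones"]
    if not missing_columns and not missing_tombstones:
        return None  # nothing to repair on this replica
    return {"columns": missing_columns, "range_tombstones": missing_tombstones}

resolved = {"columns": {"a": ("x", 5)}, "range_tombstones": {("k1", "k9", 3)}}
up_to_date = {"columns": {"a": ("x", 5)}, "range_tombstones": {("k1", "k9", 3)}}
stale = {"columns": {"a": ("x", 5)}, "range_tombstones": set()}
print(diff(up_to_date, resolved))
print(diff(stale, resolved))
```

In this model a replica that already holds the same range tombstones produces no repair at all, and a stale replica is sent only the tombstones it is missing rather than the full list.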




[jira] [Commented] (CASSANDRA-6987) sstablesplit fails in 2.1


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976684#comment-13976684
 ] 

Jonathan Ellis commented on CASSANDRA-6987:
---

committed

 sstablesplit fails in 2.1
 -

 Key: CASSANDRA-6987
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6987
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Debian Testing/Jessie
 Oracle JDK 1.7.0_51
 c*-2.1 branch, commit 5ebadc11e36749e6479f9aba19406db3aacdaf41
Reporter: Michael Shuler
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: 6987.txt


 sstablesplit dtest began failing in 2.1 at 
 http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/95/ triggered by 
 http://cassci.datastax.com/job/cassandra-2.1/186/
 repro:
 {noformat}
 (cassandra-2.1)mshuler@hana:~/git/cassandra$ ./bin/cassandra >/dev/null
 (cassandra-2.1)mshuler@hana:~/git/cassandra$ ./tools/bin/cassandra-stress write n=100
 Created keyspaces. Sleeping 1s for propagation.
 Warming up WRITE with 5 iterations...
 Connected to cluster: Test Cluster
 Datatacenter: datacenter1; Host: localhost/127.0.0.1; Rack: rack1
 Sleeping 2s...
 Running WRITE with 50 threads  for 100 iterations
 ops   ,   op/s,  key/s, mean,  med,  .95,  .99,  .999,   max,  time,  stderr
 26836 ,  26830,  26830,  2.0,  1.1,  4.0, 20.8, 131.4, 207.4,   1.0,  0.0
 64002 ,  36236,  36236,  1.4,  0.8,  4.2, 13.8,  41.3, 234.8,   2.0,  0.0
 105604,  38188,  38188,  1.3,  0.8,  3.2, 10.6,  78.4,  93.7,   3.1,  0.10546
 156179,  36750,  36750,  1.4,  0.9,  2.9,  8.8, 117.0, 139.8,   4.5,  0.08482
 202092,  40487,  40487,  1.2,  0.9,  2.9,  7.3,  45.6, 122.5,   5.6,  0.07231
 246947,  40583,  40583,  1.2,  0.8,  3.0,  7.6,  98.2, 152.1,   6.7,  0.07056
 290186,  39867,  39867,  1.3,  0.8,  2.6,  8.9, 113.3, 126.4,   7.8,  0.06391
 331609,  40155,  40155,  1.2,  0.8,  3.1,  8.7,  99.1, 124.9,   8.8,  0.05731
 371813,  38742,  38742,  1.3,  0.8,  3.1,  9.2, 117.2, 123.9,   9.9,  0.05153
 416853,  40024,  40024,  1.2,  0.8,  3.2,  8.1,  70.4, 119.8,  11.0,  0.04634
 458389,  39045,  39045,  1.3,  0.8,  3.2,  9.1, 106.4, 135.9,  12.1,  0.04236
 511323,  36513,  36513,  1.4,  0.8,  3.3,  9.2, 120.2, 161.0,  13.5,  0.03883
 549872,  34296,  34296,  1.5,  0.9,  3.4, 11.5, 106.7, 132.7,  14.6,  0.03678
 589405,  34535,  34535,  1.4,  0.9,  2.9, 10.6, 106.2, 147.9,  15.8,  0.03607
 633225,  39472,  39472,  1.3,  0.8,  3.0,  7.6, 106.3, 125.1,  16.9,  0.03374
 672751,  38251,  38251,  1.3,  0.8,  3.0,  8.0,  94.7, 157.5,  17.9,  0.03193
 714762,  38047,  38047,  1.3,  0.8,  3.0,  9.3, 102.6, 167.8,  19.0,  0.03001
 756629,  38080,  38080,  1.3,  0.8,  3.2,  8.8, 101.7, 117.4,  20.1,  0.02847
 802981,  38955,  38955,  1.3,  0.8,  3.0,  9.1, 105.2, 164.6,  21.3,  0.02708
 847262,  38817,  38817,  1.3,  0.7,  3.2,  9.8, 112.1, 137.4,  22.5,  0.02581
 887639,  38403,  38403,  1.3,  0.8,  2.9, 10.0,  99.1, 147.8,  23.5,  0.02470
 929362,  35056,  35056,  1.4,  0.8,  3.3, 11.5, 111.8, 149.3,  24.7,  0.02360
 980996,  38247,  38247,  1.3,  0.8,  3.5,  8.3,  78.8, 129.0,  26.1,  0.02338
 100   ,  39379,  39379,  1.2,  0.9,  3.1,  9.0,  29.4,  83.8,  26.5,  0.02238
 Results:
 real op rate  : 37673
 adjusted op rate stderr   : 0
 key rate  : 37673
 latency mean  : 1.3
 latency median: 0.8
 latency 95th percentile   : 3.2
 latency 99th percentile   : 10.4
 latency 99.9th percentile : 92.1
 latency max   : 234.8
 Total operation time  : 00:00:26
 END
 (cassandra-2.1)mshuler@hana:~/git/cassandra$ ./bin/nodetool compact Keyspace1
 (cassandra-2.1)mshuler@hana:~/git/cassandra$ ./bin/sstablesplit /var/lib/cassandra/data/Keyspace1/Standard1-*/Keyspace1-Standard1-ka-2-Data.db
 Exception in thread "main" java.lang.AssertionError
     at org.apache.cassandra.db.Keyspace.openWithoutSSTables(Keyspace.java:104)
     at org.apache.cassandra.tools.StandaloneSplitter.main(StandaloneSplitter.java:108)
 {noformat}
 There are no errors in system.log.




[jira] [Commented] (CASSANDRA-6863) Incorrect read repair of range thombstones


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976686#comment-13976686
 ] 

Jonathan Ellis commented on CASSANDRA-6863:
---

committed

 Incorrect read repair of range thombstones
 --

 Key: CASSANDRA-6863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6863
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.0
Reporter: Oleg Anastasyev
Assignee: Oleg Anastasyev
 Fix For: 2.1

 Attachments: 6863-v2.txt, 6863-v2.txt, 
 ReadRepairRangeThombstoneDiff.txt, ReadRepairsDebugLogger.txt


 Rows with range tombstones are read repaired for every replica if RR is 
 triggered (this is because CF.diff() returns non-null if !isEmpty(), which in 
 turn returns false if the range tombstone list is not empty). 
 Also, the full range tombstone list is sent to all nodes, which could be a 
 problem if you have wide partitions.
 Fixed this by evaluating the diff on range tombstone lists as well as on the 
 deletion info of the endpoint CF versions. CF.diff now also returns null if 
 there is no diff in the RTL.
 A second patch (ReadRepairsDebugLogger.txt) adds some debug logging to look 
 at read repairs. You may find it useful as well.




[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.

[
https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976688#comment-13976688
]

Benedict commented on CASSANDRA-6696:
-

The problem here is packing vnodes fairly across the disks: either we need to
ensure that all vnodes are of roughly equal size (very difficult), or we
probably need to have a dynamic allocation strategy, and the problem with
_that_ is that when the token range gets redistributed by node
additions/removals, the whole cluster suddenly needs to start kicking off
rebalancing of their local disks.

We could split the total token range into M distinct chunks, where M is
preferably some multiple of the number of disks, then allocate each chunk to a
disk in round-robin fashion. This remains deterministic, and I think it is
easier to guarantee an even distribution within a given token range than to
guarantee that all vnodes are of equal size, whilst still supporting a dynamic
cluster size. Even here, though, I think we realistically need the number of
chunks to be quite a bit smaller than the number of vnodes to guarantee
anything approaching balance of these chunks.
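
The chunking scheme described above can be sketched as follows. This is a minimal illustration, not a proposed patch: the function name is hypothetical, and the token range of -2^63 to 2^63-1 assumes Murmur3Partitioner-style tokens.

```python
MIN_TOKEN = -2**63
TOKEN_SPAN = 2**64  # assumed size of the full token range

def disk_for_token(token, num_chunks, disks):
    """Map a token to a disk: find its fixed chunk, then assign chunks round-robin."""
    chunk = (token - MIN_TOKEN) * num_chunks // TOKEN_SPAN
    return disks[chunk % len(disks)]

disks = ["disk0", "disk1", "disk2"]
for token in (MIN_TOKEN, 0, 2**63 - 1):
    print(token, disk_for_token(token, 6, disks))
```

The mapping depends only on the token, the chunk count, and the disk list, so it does not change when vnode ownership moves between nodes.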

Drive replacement in JBOD can cause data to reappear.
--


[jira] [Commented] (CASSANDRA-7031) Increase default commit log total space + segment size

[
https://issues.apache.org/jira/browse/CASSANDRA-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976690#comment-13976690
]

Jonathan Ellis commented on CASSANDRA-7031:
---

bq. Do we mean that there may be a 128Mb gap after the most recent archive
during which no PIT restore is possible? Seems like this would be a minimal
problem, as the most recent CLS is still present in the CL directory

Since one point of restore is "I don't have the CL directory anymore," this is
kind of a non-solution.

bq. we could always offer the ability to create a PITR point through force
recycling the current CL segment at the requested time to make sure there is
a separate backup.

So now we're forcing users to add a cron job for PITR to work? I don't like
that idea either.

Increase default commit log total space + segment size
--

Key: CASSANDRA-7031
URL: https://issues.apache.org/jira/browse/CASSANDRA-7031
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Trivial
Fix For: 2.1 beta2

Attachments: 7031.txt

I would like to increase the default commit log total space and segment size
options for 64-bit JVMs:
The current default of 1Gb and 32Mb is quite constrained and can have some
(very minor) negative performance implications, for no major benefit:
# 32Mb files are actually quite small, and if during the 10s interval we have
completely filled multiple of them (quite easy) it would be more efficient to
write fewer larger files, as we can issue fewer fsyncs and permit the OS to
schedule the writes more efficiently. On my box this has a small but
noticeable impact. Although I would expect on decent server hardware this
would be smaller still, since we immediately drop the pages from cache on
writing there isn't a great deal of advantage to keeping the files so small.
The only advantage I can see is that during a drop KS/CF or other event that
forces log rollover we're wasting less space until log recycling. 128-256Mb
are modest increases that seem more appropriate to me.
# 1Gb is too small for the default total log space. We can find that we force
memtable flushes as a result of log utilisation instead of memtable occupancy
quite often (esp. as a result of increased effective memtable space from
recent improvements), especially on machines with more addressable memory. I
suggest 8Gb as a minimum. The only disadvantage of having more log data is
that replay on restart may be slightly slower, but since most of the events
will be ignored it should be relatively benign, and I would rather take the
penalty on startup instead of during running, no matter how small the running
penalty.


[jira] [Comment Edited] (CASSANDRA-7031) Increase default commit log total space + segment size