date:20120725


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-4459:


Attachment: 4459-v2.txt

Update with a comment explaining that IntegerType is wrong, but we're doing it 
anyway.  Also switched all the IntegerTypes to Int32Types in the tests, which 
pass.  I don't see any point in explicitly testing IntegerType as well until 
pig has a BigInteger.
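Why is IntegerType "wrong" for Pig? The two types use different encodings. Cassandra's IntegerType stores a variable-length big-endian two's-complement varint (a `java.math.BigInteger`). Int32Type stores exactly four big-endian bytes. A Pig `int` is a signed 32-bit Java int. The Python sketch below decodes both encodings and shows a value that IntegerType can hold but a Pig int cannot. All helper names are hypothetical illustrations, not Cassandra or Pig API.

```python
"""Sketch: why mapping IntegerType to a 32-bit Pig int can overflow.

All function names are hypothetical helpers for illustration only.
"""

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def decode_int32type(raw: bytes) -> int:
    """Int32Type: exactly 4 bytes, big-endian, two's complement."""
    if len(raw) != 4:
        raise ValueError("Int32Type values are exactly 4 bytes")
    return int.from_bytes(raw, "big", signed=True)


def encode_integertype(value: int) -> bytes:
    """IntegerType: minimal-length big-endian two's complement,
    the same layout as Java's BigInteger.toByteArray()."""
    # Non-negative values need bit_length + 1 bits (room for the sign bit);
    # negative values need the bit length of (-value - 1) plus the sign bit.
    magnitude_bits = (value + 1 if value < 0 else value).bit_length()
    return value.to_bytes(magnitude_bits // 8 + 1, "big", signed=True)


def decode_integertype(raw: bytes) -> int:
    """IntegerType: any length, so the decoded value is unbounded."""
    return int.from_bytes(raw, "big", signed=True)


def fits_pig_int(value: int) -> bool:
    """A Pig int is a Java int: a signed 32-bit value."""
    return INT32_MIN <= value <= INT32_MAX


if __name__ == "__main__":
    sat = decode_int32type((650).to_bytes(4, "big", signed=True))
    big = decode_integertype(encode_integertype(2 ** 31))
    print(sat, fits_pig_int(sat))  # 650 True
    print(big, fits_pig_int(big))  # 2147483648 False
```

Any IntegerType value at or above 2**31 (or below -2**31) has no faithful Pig `int` representation. This is the overflow that the code comment in the commit warns about.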

 pig driver casts ints as bytearray
 --

 Key: CASSANDRA-4459
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4459
 Project: Cassandra
  Issue Type: Bug
 Environment: C* 1.1.2 embedded in DSE
Reporter: Cathy Daw
Assignee: Brandon Williams
 Fix For: 1.1.3

 Attachments: 4459-v2.txt, 4459.txt


 we seem to be auto-mapping C* int columns to bytearray in Pig, and farther 
 down I can't seem to find a way to cast that to int and do an average.  
 {code}
 grunt> cassandra_users = LOAD 'cassandra://cqldb/users' USING CassandraStorage();
 grunt> dump cassandra_users;
 (bobhatter,(act,22),(fname,bob),(gender,m),(highSchool,Cal High),(lname,hatter),(sat,500),(state,CA),{})
 (alicesmith,(act,27),(fname,alice),(gender,f),(highSchool,Tuscon High),(lname,smith),(sat,650),(state,AZ),{})
 
 // notice sat and act columns are bytearray values
 grunt> describe cassandra_users;
 cassandra_users: {key: chararray,act: (name: chararray,value: bytearray),fname: (name: chararray,value: chararray),gender: (name: chararray,value: chararray),highSchool: (name: chararray,value: chararray),lname: (name: chararray,value: chararray),sat: (name: chararray,value: bytearray),state: (name: chararray,value: chararray),columns: {(name: chararray,value: chararray)}}
 grunt> users_by_state = GROUP cassandra_users BY state;
 grunt> dump users_by_state;
 ((state,AX),{(aoakley,(highSchool,Phoenix High),(lname,Oakley),state,(act,22),(sat,500),(gender,m),(fname,Anne),{})})
 ((state,AZ),{(gjames,(highSchool,Tuscon High),(lname,James),state,(act,24),(sat,650),(gender,f),(fname,Geronomo),{})})
 ((state,CA),{(philton,(highSchool,Beverly High),(lname,Hilton),state,(act,37),(sat,220),(gender,m),(fname,Paris),{}),(jbrown,(highSchool,Cal High),(lname,Brown),state,(act,20),(sat,700),(gender,m),(fname,Jerry),{})})
 // Error - use explicit cast
 grunt> user_avg = FOREACH users_by_state GENERATE cassandra_users.state, AVG(cassandra_users.sat);
 grunt> dump user_avg;
 2012-07-22 17:15:04,361 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.AVG as multiple or none of them fit. Please use an explicit cast.
 // Unable to cast as int
 grunt> user_avg = FOREACH users_by_state GENERATE cassandra_users.state, AVG((int)cassandra_users.sat);
 grunt> dump user_avg;
 2012-07-22 17:07:39,217 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1052: Cannot cast bag with schema sat: bag({name: chararray,value: bytearray}) to int
 {code}
 *Seed data in CQL*
 {code}
 CREATE KEYSPACE cqldb with 
   strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' 
   and strategy_options:replication_factor=3;  
 use cqldb;
 CREATE COLUMNFAMILY users (
   KEY text PRIMARY KEY, 
   fname text, lname text, gender varchar, 
   act int, sat int, highSchool text, state varchar);
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values (gjames, Geronomo, James, f, 24, 650, 'Tuscon High', 'AZ');
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values (aoakley, Anne, Oakley, m , 22, 500, 'Phoenix High', 'AX');
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values (jbrown, Jerry, Brown, m , 20, 700, 'Cal High', 'CA');
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values (philton, Paris, Hilton, m , 37, 220, 'Beverly High', 'CA');
 select * from users;
 {code}
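For reference, the `AVG` the reporter wants is a plain group-and-average once `sat` is decoded as a 4-byte Int32Type value. This Python sketch uses the four rows from the CQL seed data above and only illustrates the expected result. It does not use Pig or CassandraStorage, and the function name is hypothetical.

```python
import struct
from collections import defaultdict

# (state, sat) pairs from the seed rows; sat is packed the way Int32Type
# stores it: 4 bytes, big-endian, signed.
SEED_ROWS = [
    ("AZ", struct.pack(">i", 650)),  # gjames
    ("AX", struct.pack(">i", 500)),  # aoakley
    ("CA", struct.pack(">i", 700)),  # jbrown
    ("CA", struct.pack(">i", 220)),  # philton
]


def average_by_state(rows):
    """Group rows by state and average the decoded sat values."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sum, count]
    for state, raw_sat in rows:
        (sat,) = struct.unpack(">i", raw_sat)
        totals[state][0] += sat
        totals[state][1] += 1
    return {state: s / n for state, (s, n) in totals.items()}


if __name__ == "__main__":
    print(average_by_state(SEED_ROWS))  # {'AZ': 650.0, 'AX': 500.0, 'CA': 460.0}
```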

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4292) Per-disk I/O queues

2012-07-25 Thread Yuki Morishita (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yuki Morishita updated CASSANDRA-4292:
--

Attachment: 4292-v2.txt

v2 attached.
This version introduces a new way of specifying the number of threads per disk. In
cassandra.yaml, each {{data_file_directories}} entry now takes an additional parameter in the
following format (the number of threads follows the ':').

{code}
data_file_directories:
- /mnt/d1/data:1
- /mnt/d1/data:3
{code}

If ':#' is omitted, it defaults to 1, so we can preserve backward
compatibility. {{memtable_flush_writers}} is removed from yaml.
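The entry format above is easy to get subtly wrong, so here is a minimal Python sketch of parsing it. It treats a trailing `:` plus digits as the thread count, defaults to 1 when the suffix is omitted, and builds one bounded thread pool per directory to model a per-disk queue. This is not Cassandra's parser, and the function names are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

DEFAULT_THREADS = 1  # an entry without ':#' gets one thread

# A trailing ':' followed by ASCII digits; the greedy (.*) keeps any earlier
# colons as part of the path.
_ENTRY = re.compile(r"(.*):([0-9]+)", re.DOTALL)


def parse_data_file_directory(entry: str) -> Tuple[str, int]:
    """Split '/mnt/d1/data:3' into ('/mnt/d1/data', 3)."""
    match = _ENTRY.fullmatch(entry)
    if match is None:
        return entry, DEFAULT_THREADS
    path, threads = match.group(1), int(match.group(2))
    if threads < 1:
        raise ValueError(f"thread count must be at least 1: {entry!r}")
    return path, threads


def make_disk_executors(entries: List[str]) -> Dict[str, ThreadPoolExecutor]:
    """One bounded thread pool (an I/O queue) per data directory."""
    return {
        path: ThreadPoolExecutor(max_workers=threads)
        for path, threads in map(parse_data_file_directory, entries)
    }
```

Keeping the pools separate means a slow disk only backs up its own queue. The thread count per directory bounds how many concurrent writers that disk sees.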

In this version, compaction also uses the disk-bound task executor to write
sstables. The directory is chosen based on available space in both the queue and the disk.
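The comment does not spell out the selection rule beyond "available space in both queue and disk". The Python sketch below shows one plausible reading, and it is purely an assumption. It skips directories without room for the estimated write, then prefers the disk whose queue has the most idle slots, breaking ties by free disk space. The `DataDirectory` type, its fields and `choose_directory` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataDirectory:
    path: str
    free_bytes: int     # space left on the disk
    threads: int        # the ':#' thread count from the config
    queued_tasks: int   # writes currently queued or running on this disk

    @property
    def idle_slots(self) -> int:
        return self.threads - self.queued_tasks


def choose_directory(dirs: List[DataDirectory],
                     estimated_bytes: int) -> Optional[DataDirectory]:
    """Pick a directory with room on disk, preferring the least busy queue.

    Returns None when no directory can hold the estimated write."""
    candidates = [d for d in dirs if d.free_bytes >= estimated_bytes]
    if not candidates:
        return None
    # Most idle queue slots first; more free disk space breaks ties.
    return max(candidates, key=lambda d: (d.idle_slots, d.free_bytes))
```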

bq. probably cleaner to use a Map for the new getLocationForDisk method

I did not change it to a Map, since I think that is redundant; looping through a few
directories makes no difference.

Per-disk I/O queues
---

Key: CASSANDRA-4292
URL: https://issues.apache.org/jira/browse/CASSANDRA-4292
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Yuki Morishita
Fix For: 1.2

Attachments: 4292-v2.txt, 4292.txt

As noted in CASSANDRA-809, we have a certain number of flush (and compaction)
threads, which mix and match disk volumes indiscriminately. It may be worth
creating a tight thread-to-disk affinity, to prevent unnecessary contention at
that level.
OTOH, as SSDs become more prevalent this becomes a non-issue. It is unclear how
much pain this actually causes in practice in the meantime.


[jira] [Commented] (CASSANDRA-4459) pig driver casts ints as bytearray


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422395#comment-13422395
 ] 

Jonathan Ellis commented on CASSANDRA-4459:
---

+1


[Cassandra Wiki] Update of VirtualNodes/Balance by EricEvans

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The VirtualNodes/Balance page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance

Comment:
stubbed out page

New page:
This page is for design notes and information relating to operations affecting 
token/range ownership.  See also:

 * [[https://issues.apache.org/jira/browse/CASSANDRA-4445|CASSANDRA-4445: 
balance utility for vnodes]]
 * [[https://issues.apache.org/jira/browse/CASSANDRA-4443|CASSANDRA-4443: 
shuffle utility for vnodes]]

<<TableOfContents>>

<<Anchor(requirements)>>
== Requirements ==

 1. Offsetting ownership ratios for [[#heterogeneous_nodes|heterogeneous nodes]]
 1. Correcting [[#imbalance|imbalances created by random token selection]]
 1. [[#shuffling|Randomizing ranges]] after a migration

<<Anchor(heterogeneous_nodes)>>
== Heterogeneous Nodes ==

When running a cluster of heterogeneous nodes (i.e. nodes with differing amounts of 
storage, memory, cores, etc.), it may be desirable to place a greater or lesser 
portion of the keyspace on one or more nodes.
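With virtual nodes, one natural way to offset ownership is to give each node a token count proportional to its capacity. With uniformly random token placement, a node's expected share of the ring is its fraction of all tokens. The Python sketch below is an assumption rather than anything this page specifies. Its scaling rule (the smallest node keeps a base of 256 tokens) and the capacity weights are hypothetical.

```python
from typing import Dict


def tokens_for_capacity(capacities: Dict[str, float],
                        base_tokens: int = 256) -> Dict[str, int]:
    """Give the smallest node base_tokens and scale the rest by capacity."""
    smallest = min(capacities.values())
    return {node: max(1, round(base_tokens * cap / smallest))
            for node, cap in capacities.items()}


def expected_ownership(tokens: Dict[str, int]) -> Dict[str, float]:
    """Expected ring share: each node's fraction of all tokens."""
    total = sum(tokens.values())
    return {node: count / total for node, count in tokens.items()}
```

For example, a node with twice the storage of its peers would get 512 tokens instead of 256. In expectation, it would own twice their share.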

<<Anchor(imbalance)>>
== Imbalance ==

By default, a node's tokens are randomly generated with the expectation that an 
even distribution of the namespace will result.  However, variations of as much 
as 7% have been reported on small clusters when using the `num_tokens` default 
of 256.

These randomly generated tokens are MD5 sums, so entropy isn't the problem 
here, at least not in the sense that using a better RNG would create a more 
even distribution of ranges.  Increasing the token count (by increasing either 
num_tokens or the number of nodes) will improve this: the more tokens, the 
more evenly the distribution works out.
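A small simulation makes the "more tokens, more even" point concrete. The Python sketch below has every node pick `num_tokens` uniformly random tokens on a ring of size 2**127, which is the RandomPartitioner token space and is assumed here. It then measures how much of the ring each node owns. Uniform random draws stand in for the MD5-derived tokens, and the function names are hypothetical. No particular output values are claimed, since they depend on the seed.

```python
import random
from collections import defaultdict
from typing import Dict

RING = 2 ** 127  # assumed token space (RandomPartitioner's range)


def ownership(num_nodes: int, num_tokens: int, seed: int = 0) -> Dict[int, float]:
    """Fraction of the ring owned by each node when every node picks
    num_tokens uniformly random tokens."""
    rng = random.Random(seed)
    tokens = sorted(
        (rng.randrange(RING), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = defaultdict(int)
    prev = tokens[-1][0] - RING  # the first range wraps around from the last token
    for token, node in tokens:
        owned[node] += token - prev  # a token owns the range (prev, token]
        prev = token
    return {node: owned[node] / RING for node in range(num_nodes)}


def imbalance(shares: Dict[int, float]) -> float:
    """Ratio of the largest to the smallest share (1.0 is perfect balance)."""
    return max(shares.values()) / min(shares.values())


if __name__ == "__main__":
    for n in (1, 16, 256):
        print(n, round(imbalance(ownership(6, n)), 3))
```

Running it with several seeds shows the ratio shrinking toward 1.0 as `num_tokens` grows. With a small number of nodes, it stays noticeably above 1.0 even at 256.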

This anecdotal worst-case is probably Good Enough, especially when considering 
that key distribution is subject to the same properties, or that many data sets 
are skewed on their own (i.e. optimal ownership is not necessarily optimal 
anyway).

That said, our history is one where random token selection produced completely 
unacceptable results, and manual intervention was required.  The typical 
(expected) result of manual token selection is near-perfect balance of 
ownership, and it will likely be some time before people are comfortable seeing 
otherwise.

<<Anchor(shuffling)>>
== Shuffling ==

When migrating a legacy cluster with one-token-per-node to virtual nodes, the 
existing range is carved up into `num_tokens` new ranges.  These new ranges are 
still contiguous, however, so a means of randomizing their placement is needed.
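The Python sketch below shows why a shuffle is needed after the split. It carves a legacy node's range into `num_tokens` contiguous sub-ranges, then randomly redeals all tokens across nodes while keeping each node's token count. It is an illustration only, not the CASSANDRA-4443 shuffle utility. In a real cluster, every reassigned token implies moving the data for that range. The function names and the 2**127 token space are assumptions.

```python
import random
from typing import Dict, List

RING = 2 ** 127  # assumed token space


def split_range(start: int, end: int, num_tokens: int) -> List[int]:
    """Carve the range (start, end] into num_tokens contiguous sub-ranges,
    returning the end token of each. start == end means the whole ring."""
    width = (end - start) % RING or RING
    return [(start + width * i // num_tokens) % RING
            for i in range(1, num_tokens + 1)]


def shuffle_tokens(assignment: Dict[str, List[int]],
                   seed: int = 0) -> Dict[str, List[int]]:
    """Randomly redeal every token while keeping each node's token count."""
    rng = random.Random(seed)
    pool = [t for tokens in assignment.values() for t in tokens]
    rng.shuffle(pool)
    result, i = {}, 0
    for node, tokens in assignment.items():
        result[node] = sorted(pool[i:i + len(tokens)])
        i += len(tokens)
    return result
```

Right after `split_range`, each node's tokens are adjacent on the ring. That is the contiguity problem described above. `shuffle_tokens` interleaves them.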

[jira] [Commented] (CASSANDRA-4459) pig driver casts ints as bytearray

2012-07-25 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422400#comment-13422400
 ] 

Pavel Yaskevich commented on CASSANDRA-4459:


+1


git commit: Pig: support for Int32Type. Patch by brandonwilliams, reviewed by xedin for CASSANDRA-4459

Updated Branches:
  refs/heads/cassandra-1.1 9a6339476 -> 6f384c54d


Pig: support for Int32Type.
Patch by brandonwilliams, reviewed by xedin for CASSANDRA-4459


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f384c54
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f384c54
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f384c54

Branch: refs/heads/cassandra-1.1
Commit: 6f384c54de567d8d901592f0c32769b6582e50e4
Parents: 9a63394
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Jul 25 12:06:49 2012 -0500
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Jul 25 12:06:49 2012 -0500

--
 examples/pig/test/populate-cli.txt |4 ++--
 .../cassandra/hadoop/pig/CassandraStorage.java |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f384c54/examples/pig/test/populate-cli.txt
--
diff --git a/examples/pig/test/populate-cli.txt b/examples/pig/test/populate-cli.txt
index 1f59642..b2dda58 100644
--- a/examples/pig/test/populate-cli.txt
+++ b/examples/pig/test/populate-cli.txt
@@ -8,7 +8,7 @@ column_metadata =
 [
 {column_name: name, validation_class: UTF8Type, index_type: KEYS},
 {column_name: vote_type, validation_class: UTF8Type},
-{column_name: rating, validation_class: IntegerType},
+{column_name: rating, validation_class: Int32Type},
 {column_name: score, validation_class: LongType},
 {column_name: percent, validation_class: FloatType},
 {column_name: atomic_weight, validation_class: DoubleType},
@@ -23,7 +23,7 @@ column_metadata =
 [
 {column_name: name, validation_class: UTF8Type, index_type: KEYS},
 {column_name: vote_type, validation_class: UTF8Type},
-{column_name: rating, validation_class: IntegerType},
+{column_name: rating, validation_class: Int32Type},
 {column_name: score, validation_class: LongType},
 {column_name: percent, validation_class: FloatType},
 {column_name: atomic_weight, validation_class: DoubleType},

http://git-wip-us.apache.org/repos/asf/cassandra/blob/6f384c54/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
--
diff --git a/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java b/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
index 454330c..f2fad67 100644
--- a/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
+++ b/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
@@ -670,7 +670,7 @@ public class CassandraStorage extends LoadFunc implements StoreFuncInterface, Lo
 {
 if (type instanceof LongType || type instanceof DateType) // DateType is bad and it should feel bad
 return DataType.LONG;
-else if (type instanceof IntegerType)
+else if (type instanceof IntegerType || type instanceof Int32Type) // IntegerType will overflow at 2**31, but is kept for compatibility until pig has a BigInteger
 return DataType.INTEGER;
 else if (type instanceof AsciiType)
 return DataType.CHARARRAY;

[2/4] git commit: Pig: support for Int32Type. Patch by brandonwilliams, reviewed by xedin for CASSANDRA-4459

Pig: support for Int32Type.
Patch by brandonwilliams, reviewed by xedin for CASSANDRA-4459


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f384c54
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f384c54
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f384c54

Branch: refs/heads/trunk
Commit: 6f384c54de567d8d901592f0c32769b6582e50e4
Parents: 9a63394
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Jul 25 12:06:49 2012 -0500
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Jul 25 12:06:49 2012 -0500


[4/4] git commit: Fix scary message about secondaries always being created at startup

Fix scary message about secondaries always being created at startup


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/41c9ba63
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/41c9ba63
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/41c9ba63

Branch: refs/heads/trunk
Commit: 41c9ba63d624d1d6863b67a0cbcf4144bfbea29c
Parents: aba1f16
Author: Brandon Williams brandonwilli...@apache.org
Authored: Mon Jul 23 18:30:13 2012 -0500
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Mon Jul 23 18:32:00 2012 -0500

--
 .../cassandra/db/index/SecondaryIndexManager.java  |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/41c9ba63/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
--
diff --git a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
index 6733c90..ba066e2 100644
--- a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
+++ b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
@@ -205,7 +205,6 @@ public class SecondaryIndexManager
 return null;
 
 assert cdef.getIndexType() != null;
-logger.info("Creating new index : {}",cdef);
 
 SecondaryIndex index;
 try
@@ -231,6 +230,7 @@ public class SecondaryIndexManager
 {
 index = currentIndex;
 index.addColumnDef(cdef);
+logger.info("Creating new index : {}",cdef);
 }
 }
 else

[1/4] git commit: Merge branch 'cassandra-1.1' into trunk

Updated Branches:
  refs/heads/trunk e73b2a68b -> d62f8c1e5


Merge branch 'cassandra-1.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d62f8c1e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d62f8c1e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d62f8c1e

Branch: refs/heads/trunk
Commit: d62f8c1e5f4a901652cd9dd7ef7f8ecb4b779450
Parents: e73b2a6 6f384c5
Author: Brandon Williams brandonwilli...@apache.org
Authored: Wed Jul 25 12:09:14 2012 -0500
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Wed Jul 25 12:09:14 2012 -0500

--
 examples/pig/test/populate-cli.txt |4 ++--
 .../cassandra/hadoop/pig/CassandraStorage.java |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/d62f8c1e/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
--

[3/4] git commit: cqlsh: add a COPY TO command Patch by paul cannon, reviewed by brandonwilliams for CASSANDRA-4434

cqlsh: add a COPY TO command
Patch by paul cannon, reviewed by brandonwilliams for CASSANDRA-4434


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9a633947
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9a633947
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9a633947

Branch: refs/heads/trunk
Commit: 9a63394765de28160d576c9285be68587e222a86
Parents: 41c9ba6
Author: Brandon Williams brandonwilli...@apache.org
Authored: Tue Jul 24 13:57:19 2012 -0500
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Tue Jul 24 13:57:19 2012 -0500

--
 CHANGES.txt |1 +
 bin/cqlsh   |  126 -
 2 files changed, 105 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a633947/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 0885387..638574c 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -23,6 +23,7 @@ Merged from 1.0:
  * Fix LCS splitting sstable base on uncompressed size (CASSANDRA-4419)
  * Bootstraps that fail are detected upon restart and will retry safely without
needing to delete existing data first (CASSANDRA-4427)
+ * (cqlsh) add a COPY TO command to copy a CF to a CSV file (CASSANDRA-4434)
 
 
 1.1.2

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9a633947/bin/cqlsh
--
diff --git a/bin/cqlsh b/bin/cqlsh
index 574d49b..c67a818 100755
--- a/bin/cqlsh
+++ b/bin/cqlsh
@@ -224,7 +224,8 @@ cqlsh_extra_syntax_rules = r'''
 
 copyCommand ::= COPY cf=columnFamilyName
  ( "(" [colnames]=colname ( "," [colnames]=colname )* ")" )?
- FROM ( fname=stringLiteral | STDIN )
+ ( dir=FROM ( fname=stringLiteral | STDIN )
+ | dir=TO   ( fname=stringLiteral | STDOUT ) )
  ( WITH copyOption ( AND copyOption )* )?
 ;
 
@@ -303,12 +304,16 @@ def complete_copy_column_names(ctxt, cqlsh):
 return [colnames[0]]
 return set(colnames[1:]) - set(existcols)
 
-COPY_OPTIONS = ('DELIMITER', 'QUOTE', 'ESCAPE', 'HEADER')
+COPY_OPTIONS = ('DELIMITER', 'QUOTE', 'ESCAPE', 'HEADER', 'ENCODING', 'NULL')
 
 @cqlsh_syntax_completer('copyOption', 'optnames')
 def complete_copy_options(ctxt, cqlsh):
 optnames = map(str.upper, ctxt.get_binding('optnames', ()))
-return set(COPY_OPTIONS) - set(optnames)
+direction = ctxt.get_binding('dir').upper()
+opts = set(COPY_OPTIONS) - set(optnames)
+if direction == 'FROM':
+opts -= ('ENCODING', 'NULL')
+return opts
 
 @cqlsh_syntax_completer('copyOption', 'optvals')
 def complete_copy_opt_values(ctxt, cqlsh):
@@ -448,13 +453,13 @@ def unix_time_from_uuid1(u):
 return (u.get_time() - 0x01B21DD213814000) / 1000.0
 
 def format_value(val, casstype, output_encoding, addcolor=False, time_format='',
- float_precision=3, colormap=DEFAULT_VALUE_COLORS):
+ float_precision=3, colormap=DEFAULT_VALUE_COLORS, nullval='null'):
 color = colormap['default']
 coloredval = None
 displaywidth = None
 
 if val is None:
-bval = 'null'
+bval = nullval
 color = colormap['error']
 elif isinstance(val, DecodeError):
 casstype = 'BytesType'
@@ -727,7 +732,7 @@ class Shell(cmd.Cmd):
 def get_column_names(self, ksname, cfname):
 if ksname is None:
 ksname = self.current_keyspace
-if self.cqlver_atleast(3):
+if ksname != 'system' and self.cqlver_atleast(3):
 return self.get_column_names_from_layout(ksname, cfname)
 else:
 return self.get_column_names_from_cfdef(ksname, cfname)
@@ -1433,6 +1438,9 @@ class Shell(cmd.Cmd):
 COPY table_name [ ( column [, ...] ) ]
  FROM ( 'filename' | STDIN )
  [ WITH option='value' [AND ...] ];
+COPY table_name [ ( column [, ...] ) ]
+ TO ( 'filename' | STDOUT )
+ [ WITH option='value' [AND ...] ];
 
 Available options and defaults:
 
@@ -1440,6 +1448,8 @@ class Shell(cmd.Cmd):
  QUOTE='"'    - quoting character to be used to quote fields
  ESCAPE='\'   - character to appear before the QUOTE char when quoted
   HEADER=false - whether to ignore the first line
+  ENCODING='utf8'  - encoding for CSV output (COPY TO only)
+  NULL=''  - string that represents a null value (COPY TO only)
 
 When entering CSV data on STDIN, you can use the sequence \.
 on a line by itself to end the data input.
@@ -1448,12 +1458,11 @@ class Shell(cmd.Cmd):
 ks
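The Python sketch below shows what the new COPY TO options amount to, using the standard `csv` module. It writes rows to a text stream with DELIMITER, QUOTE, ESCAPE and HEADER. NULL replaces null columns (modelled as `None`). It also filters completion candidates so that FROM does not offer the COPY TO-only options. This is not cqlsh's implementation, and the function names are hypothetical. Writing a header row for COPY TO when HEADER is true is an assumption, because the help text above describes HEADER only for input.

```python
import csv
import io
from typing import Iterable, Optional, Sequence, Set, TextIO

COPY_OPTIONS = ('DELIMITER', 'QUOTE', 'ESCAPE', 'HEADER', 'ENCODING', 'NULL')
COPY_TO_ONLY = {'ENCODING', 'NULL'}


def remaining_copy_options(direction: str, used: Iterable[str]) -> Set[str]:
    """Options still offered for completion; FROM omits the TO-only ones."""
    opts = set(COPY_OPTIONS) - {name.upper() for name in used}
    if direction.upper() == 'FROM':
        opts -= COPY_TO_ONLY  # must be a set: `set -= tuple` raises TypeError
    return opts


def copy_to(rows: Iterable[Sequence[Optional[object]]], columns: Sequence[str],
            out: TextIO, delimiter: str = ',', quote: str = '"',
            escape: str = '\\', header: bool = False, null: str = '') -> None:
    """Write rows as CSV, replacing None (a null column) with `null`."""
    writer = csv.writer(out, delimiter=delimiter, quotechar=quote,
                        escapechar=escape, doublequote=False,
                        lineterminator='\n')
    if header:
        writer.writerow(columns)
    for row in rows:
        writer.writerow([null if value is None else value for value in row])


if __name__ == "__main__":
    buf = io.StringIO()
    copy_to([("gjames", 650, None)], ["key", "sat", "act"], buf, header=True)
    print(buf.getvalue(), end="")
```

ENCODING does not appear in the sketch because, in Python, encoding is a property of the output file. It would be applied when opening it, e.g. `open(path, "w", encoding="utf8", newline="")`. One detail in the completer diff is worth noting. `opts -= ('ENCODING', 'NULL')` subtracts a tuple from a set, and Python's in-place `-=` on a set only accepts another set, so the sketch subtracts a set instead.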

[Cassandra Wiki] Update of VirtualNodes/Balance by EricEvans

The VirtualNodes/Balance page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance?action=diff&rev1=1&rev2=2

Comment:
hashing out implementation proposal

  
  When migrating a legacy cluster with one-token-per-node to virtual nodes, the 
existing range is carved up into `num_tokens` new ranges.  These new ranges are 
still contiguous however, and a means of randomizing their placement is needed.
  
+ <<Anchor(implementation)>>
+ == Implementation (Draft) ==
+ === Nodes / Cluster ===
+ The most straightforward way of changing ownership is a token move (i.e. relocating a range from one node to another).  Exposing this via JMX would allow implementing all of the required operations client-side.
+ 
+ === User Interface ===
+ 
+ {{{
+ $ nodetool balance
+ }}}
+ 
+ {{{
+ $ nodetool shuffle
+ }}}
+ 
+ {{{
+ $ nodetool trim
+ }}}
+

[jira] [Updated] (CASSANDRA-4447) enable jamm for OpenJDK >= 1.6.0.23


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-4447:


Attachment: 4447.txt

Attaching a slightly different approach with cleaner logic.

 enable jamm for OpenJDK >= 1.6.0.23
 ---

 Key: CASSANDRA-4447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4447
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
 Environment: openjdk
Reporter: Ilya Shipitsin
Priority: Trivial
 Fix For: 1.1.3

 Attachments: 4447.txt


 We tested jamm with OpenJDK; it works well starting at 1.6.0.23, so I suggest:
 --- cassandra-env.sh.dist   2012-07-19 12:24:44.938886154 +0600
 +++ cassandra-env.sh        2012-07-19 12:28:34.913886847 +0600
 @@ -119,8 +119,10 @@
  
  # add the jamm javaagent
  check_openjdk=`"${JAVA:-java}" -version 2>&1 | awk '{if (NR == 2) {print $1}}'`
 -if [ "$check_openjdk" != "OpenJDK" ]
 +check_openjdk_is_good_for_jamm=`"${JAVA:-java}" -version 2>&1 | awk -F "_|\"" '/1\.6\.0/ && $3 < 23 {print "bad" }'`
 +if [ "$check_openjdk" = "OpenJDK" ] && [ "$check_openjdk_is_good_for_jamm" = "bad" ]
  then
 +else 
      JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"
  fi
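The awk program splits a line like `java version "1.6.0_23"` on `_` or `"`, which leaves the update number in the third field. The new check therefore flags OpenJDK 1.6.0 builds older than update 23. The Java sketch below restates that condition for readers less at home in awk. It is an illustration only, and the class and method names are made up. It assumes that the first line of `java -version` carries the version string and that the second line starts with `OpenJDK` on OpenJDK builds.

As quoted, the patch leaves the `then` branch empty, and POSIX shells reject that as a syntax error. The sketch therefore expresses the intended logic as a single negated condition.

```java
final class JammGateSketch {
    // Roughly mirrors the proposed cassandra-env.sh check: skip jamm only when the JVM is
    // OpenJDK *and* reports a 1.6.0 version whose update number is below 23.
    // `versionLines` are the lines printed by `java -version` (the script captures them via 2>&1).
    static boolean shouldLoadJamm(String[] versionLines) {
        // Like awk '{if (NR == 2) {print $1}}': first word of the second line.
        String vendor = versionLines.length > 1 ? versionLines[1].trim().split("\\s+")[0] : "";
        boolean isOpenJdk = vendor.equals("OpenJDK");

        // Like awk -F "_|\"" '/1\.6\.0/ && $3 < 23': split on '_' or '"', test the third field.
        boolean tooOld = false;
        for (String line : versionLines) {
            if (!line.contains("1.6.0")) continue;
            String[] fields = line.split("_|\"");
            if (fields.length > 2) {
                try {
                    if (Integer.parseInt(fields[2].trim()) < 23) tooOld = true;
                } catch (NumberFormatException e) {
                    // Non-numeric third field: awk would compare strings; this sketch ignores it.
                }
            }
        }
        return !(isOpenJdk && tooOld);
    }

    public static void main(String[] args) {
        String[] sample = {
            "java version \"1.6.0_20\"",
            "OpenJDK Runtime Environment (IcedTea6 1.9.7)",
            "OpenJDK Server VM",
        };
        System.out.println(shouldLoadJamm(sample)); // prints false
    }
}
```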

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (CASSANDRA-4447) enable jamm for OpenJDK >= 1.6.0.23


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-4447:
---

Assignee: Brandon Williams

 enable jamm for OpenJDK >= 1.6.0.23
 ---

 Key: CASSANDRA-4447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4447
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
 Environment: openjdk
Reporter: Ilya Shipitsin
Assignee: Brandon Williams
Priority: Trivial
 Fix For: 1.1.3

 Attachments: 4447.txt


[jira] [Updated] (CASSANDRA-4447) enable jamm for OpenJDK >= 1.6.0.23


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-4447:


Reviewer: thepaul  (was: brandon.williams)

 enable jamm for OpenJDK >= 1.6.0.23
 ---

 Key: CASSANDRA-4447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4447
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
 Environment: openjdk
Reporter: Ilya Shipitsin
Assignee: Brandon Williams
Priority: Trivial
 Fix For: 1.1.3

 Attachments: 4447.txt


[jira] [Commented] (CASSANDRA-4447) enable jamm for OpenJDK >= 1.6.0.23

2012-07-25 Thread Ilya Shipitsin (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422455#comment-13422455
 ] 

Ilya Shipitsin commented on CASSANDRA-4447:
---

ok, it's better

 enable jamm for OpenJDK >= 1.6.0.23
 ---

 Key: CASSANDRA-4447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4447
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
 Environment: openjdk
Reporter: Ilya Shipitsin
Assignee: Brandon Williams
Priority: Trivial
 Fix For: 1.1.3

 Attachments: 4447.txt


[2/5] git commit: add comment to #4452

add comment to #4452


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f46232c0
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f46232c0
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f46232c0

Branch: refs/heads/cassandra-1.1
Commit: f46232c0b02f27c5177bd453a6d0b0f6441c2499
Parents: 06bdd3e
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Jul 25 13:15:21 2012 -0500
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Jul 25 13:15:21 2012 -0500

--
 .../cassandra/service/StorageServiceMBean.java |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f46232c0/src/java/org/apache/cassandra/service/StorageServiceMBean.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageServiceMBean.java 
b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
index 0872e2b..c4c6a1d 100644
--- a/src/java/org/apache/cassandra/service/StorageServiceMBean.java
+++ b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
@@ -401,8 +401,10 @@ public interface StorageServiceMBean
 public void loadNewSSTables(String ksName, String cfName);
 
 /**
- * Return a List of Tokens representing a sample of keys
- * across all ColumnFamilyStores
+ * Return a List of Tokens representing a sample of keys across all ColumnFamilyStores.
+ *
+ * Note: this should be left as an operation, not an attribute (methods starting with get)
+ * to avoid sending potentially multiple MB of data when accessing this mbean by default.  See CASSANDRA-4452.
  *
  * @return set of Tokens as Strings
  */
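The getter name matters because of how JMX builds a Standard MBean. The JDK introspects the management interface and exposes a `getX()` method as a readable attribute `X`, while other methods become operations that a client must invoke explicitly. The sketch below registers a made-up `SampleMBean` on the platform MBean server and lists how each method was classified. `SampleMBean` stands in for Cassandra's `StorageServiceMBean` and is not that interface.

```java
import java.lang.management.ManagementFactory;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxNamingSketch {
    // Hypothetical Standard MBean interface. The JDK requires it to be public and named
    // after the implementation class plus "MBean" (here ...$Sample -> ...$SampleMBean).
    public interface SampleMBean {
        List<String> getRangeKeySample(); // "get" prefix: introspected as attribute "RangeKeySample"
        List<String> sampleKeyRange();    // no getter prefix: introspected as operation "sampleKeyRange"
    }

    public static class Sample implements SampleMBean {
        public List<String> getRangeKeySample() { return Collections.emptyList(); }
        public List<String> sampleKeyRange() { return Collections.emptyList(); }
    }

    // Registers Sample on the platform MBean server and reports how JMX classified each method.
    static Set<String> describe() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example:type=Sample");
            if (!server.isRegistered(name)) {
                server.registerMBean(new Sample(), name);
            }
            MBeanInfo info = server.getMBeanInfo(name);
            Set<String> kinds = new TreeSet<>();
            for (MBeanAttributeInfo a : info.getAttributes()) kinds.add("attr:" + a.getName());
            for (MBeanOperationInfo o : info.getOperations()) kinds.add("op:" + o.getName());
            return kinds;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints [attr:RangeKeySample, op:sampleKeyRange]
    }
}
```

A client that reads every attribute while browsing such an MBean would pull the full `RangeKeySample` list. It receives the list from `sampleKeyRange` only when it invokes that operation.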

[4/5] git commit: rename getRangeKeySample to sampleKeyRange patch by Jan Prach; reviewed by jbellis for CASSANDRA-4452

rename getRangeKeySample to sampleKeyRange
patch by Jan Prach; reviewed by jbellis for CASSANDRA-4452


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/06bdd3ea
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/06bdd3ea
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/06bdd3ea

Branch: refs/heads/trunk
Commit: 06bdd3ea8e6ec8ebf47b7bd813041550f99fa48b
Parents: 6f384c5
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Jul 25 13:12:32 2012 -0500
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Jul 25 13:12:32 2012 -0500

--
 CHANGES.txt|2 ++
 .../apache/cassandra/service/StorageService.java   |2 +-
 .../cassandra/service/StorageServiceMBean.java |2 +-
 src/java/org/apache/cassandra/tools/NodeCmd.java   |2 +-
 src/java/org/apache/cassandra/tools/NodeProbe.java |4 ++--
 5 files changed, 7 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/06bdd3ea/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 638574c..c160d69 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 1.1.3
+ * (JMX) rename getRangeKeySample to sampleKeyRange to avoid returning
+   multi-MB results as an attribute (CASSANDRA-4452)
  * flush based on data size, not throughput; overwritten columns no longer artificially inflate liveRatio (CASSANDRA-4399)
  * update default commitlog segment size to 32MB and total commitlog

http://git-wip-us.apache.org/repos/asf/cassandra/blob/06bdd3ea/src/java/org/apache/cassandra/service/StorageService.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageService.java 
b/src/java/org/apache/cassandra/service/StorageService.java
index 28a3551..bfc8c81 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -3080,7 +3080,7 @@ public class StorageService implements IEndpointStateChangeSubscriber, StorageSe
 /**
  * #{@inheritDoc}
  */
-public List<String> getRangeKeySample()
+public List<String> sampleKeyRange() // do not rename to getter - see CASSANDRA-4452 for details
 {
 List<DecoratedKey> keys = keySamples(ColumnFamilyStore.allUserDefined(), getLocalPrimaryRange());
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/06bdd3ea/src/java/org/apache/cassandra/service/StorageServiceMBean.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageServiceMBean.java 
b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
index 72d03d1..0872e2b 100644
--- a/src/java/org/apache/cassandra/service/StorageServiceMBean.java
+++ b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
@@ -406,7 +406,7 @@ public interface StorageServiceMBean
  *
  * @return set of Tokens as Strings
  */
-public List<String> getRangeKeySample();
+public List<String> sampleKeyRange();
 
 /**
  * rebuild the specified indexes

http://git-wip-us.apache.org/repos/asf/cassandra/blob/06bdd3ea/src/java/org/apache/cassandra/tools/NodeCmd.java
--
diff --git a/src/java/org/apache/cassandra/tools/NodeCmd.java 
b/src/java/org/apache/cassandra/tools/NodeCmd.java
index a8d3f55..b73e96a 100644
--- a/src/java/org/apache/cassandra/tools/NodeCmd.java
+++ b/src/java/org/apache/cassandra/tools/NodeCmd.java
@@ -922,7 +922,7 @@ public class NodeCmd
 private void printRangeKeySample(PrintStream outs)
 {
 outs.println("RangeKeySample: ");
-List<String> tokenStrings = this.probe.getRangeKeySample();
+List<String> tokenStrings = this.probe.sampleKeyRange();
 for (String tokenString : tokenStrings)
 {
 outs.println("\t" + tokenString);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/06bdd3ea/src/java/org/apache/cassandra/tools/NodeProbe.java
--
diff --git a/src/java/org/apache/cassandra/tools/NodeProbe.java 
b/src/java/org/apache/cassandra/tools/NodeProbe.java
index d1a615d..5c04eff 100644
--- a/src/java/org/apache/cassandra/tools/NodeProbe.java
+++ b/src/java/org/apache/cassandra/tools/NodeProbe.java
@@ -690,9 +690,9 @@ public class NodeProbe
 ssProxy.rebuild(sourceDc);
 }
 
-public List<String> getRangeKeySample()
+public List<String> sampleKeyRange()
 {
-return ssProxy.getRangeKeySample();
+return ssProxy.sampleKeyRange();
 }
 
 public void resetLocalSchema() throws IOException

[3/5] git commit: add comment to #4452

add comment to #4452


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f46232c0
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f46232c0
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f46232c0

Branch: refs/heads/trunk
Commit: f46232c0b02f27c5177bd453a6d0b0f6441c2499
Parents: 06bdd3e
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Jul 25 13:15:21 2012 -0500
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Jul 25 13:15:21 2012 -0500

--
 .../cassandra/service/StorageServiceMBean.java |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f46232c0/src/java/org/apache/cassandra/service/StorageServiceMBean.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageServiceMBean.java 
b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
index 0872e2b..c4c6a1d 100644
--- a/src/java/org/apache/cassandra/service/StorageServiceMBean.java
+++ b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
@@ -401,8 +401,10 @@ public interface StorageServiceMBean
 public void loadNewSSTables(String ksName, String cfName);
 
 /**
- * Return a List of Tokens representing a sample of keys
- * across all ColumnFamilyStores
+ * Return a List of Tokens representing a sample of keys across all ColumnFamilyStores.
+ *
+ * Note: this should be left as an operation, not an attribute (methods starting with get)
+ * to avoid sending potentially multiple MB of data when accessing this mbean by default.  See CASSANDRA-4452.
  *
  * @return set of Tokens as Strings
  */

[1/5] git commit: Merge branch 'cassandra-1.1' into trunk

Updated Branches:
  refs/heads/cassandra-1.1 6f384c54d -> f46232c0b
  refs/heads/trunk d62f8c1e5 -> b167e9ba7


Merge branch 'cassandra-1.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b167e9ba
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b167e9ba
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b167e9ba

Branch: refs/heads/trunk
Commit: b167e9ba74fa917aeed55cfdcbff9133c13720d5
Parents: d62f8c1 f46232c
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Jul 25 13:15:29 2012 -0500
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Jul 25 13:15:29 2012 -0500

--
 CHANGES.txt|2 ++
 .../apache/cassandra/service/StorageService.java   |2 +-
 .../cassandra/service/StorageServiceMBean.java |8 +---
 src/java/org/apache/cassandra/tools/NodeCmd.java   |2 +-
 src/java/org/apache/cassandra/tools/NodeProbe.java |4 ++--
 5 files changed, 11 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b167e9ba/CHANGES.txt
--
diff --cc CHANGES.txt
index c558c3f,c160d69..6dc6382
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,39 -1,6 +1,41 @@@
 +1.2-dev
 + * Introduce new json format with row level deletion (CASSANDRA-4054)
 + * remove redundant name column from schema_keyspaces (CASSANDRA-4433)
 + * improve nodetool ring handling of multi-dc clusters (CASSANDRA-3047)
 + * update NTS calculateNaturalEndpoints to be O(N log N) (CASSANDRA-3881)
 + * add UseCondCardMark XX jvm settings on jdk 1.7 (CASSANDRA-4366)
 + * split up rpc timeout by operation type (CASSANDRA-2819)
 + * rewrite key cache save/load to use only sequential i/o (CASSANDRA-3762)
 + * update MS protocol with a version handshake + broadcast address id
 +   (CASSANDRA-4311)
 + * multithreaded hint replay (CASSANDRA-4189)
 + * add inter-node message compression (CASSANDRA-3127)
 + * remove COPP (CASSANDRA-2479)
 + * Track tombstone expiration and compact when tombstone content is
 +   higher than a configurable threshold, default 20% (CASSANDRA-3442, 4234)
 + * update MurmurHash to version 3 (CASSANDRA-2975)
 + * (CLI) track elapsed time for `delete' operation (CASSANDRA-4060)
 + * (CLI) jline version is bumped to 1.0 to properly support
 +   'delete' key function (CASSANDRA-4132)
 + * Save IndexSummary into new SSTable 'Summary' component (CASSANDRA-2392, 4289)
 + * Add support for range tombstones (CASSANDRA-3708)
 + * Improve MessagingService efficiency (CASSANDRA-3617)
 + * Avoid ID conflicts from concurrent schema changes (CASSANDRA-3794)
 + * Set thrift HSHA server thread limit to unlimited by default (CASSANDRA-4277)
 + * Avoids double serialization of CF id in RowMutation messages
 +   (CASSANDRA-4293)
 + * stream compressed sstables directly with java nio (CASSANDRA-4297)
 + * Support multiple ranges in SliceQueryFilter (CASSANDRA-3885)
 + * Add column metadata to system column families (CASSANDRA-4018)
 + * (cql3) Always use composite types by default (CASSANDRA-4329)
 + * (cql3) Add support for set, map and list (CASSANDRA-3647)
 + * Validate date type correctly (CASSANDRA-4441)
 + * (cql3) Allow definitions with only a PK (CASSANDRA-4361)
 +
 +
  1.1.3
+  * (JMX) rename getRangeKeySample to sampleKeyRange to avoid returning
+    multi-MB results as an attribute (CASSANDRA-4452)
   * flush based on data size, not throughput; overwritten columns no longer artificially inflate liveRatio (CASSANDRA-4399)
   * update default commitlog segment size to 32MB and total commitlog

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b167e9ba/src/java/org/apache/cassandra/service/StorageService.java
--
diff --cc src/java/org/apache/cassandra/service/StorageService.java
index d8bed6f,bfc8c81..4af399d
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@@ -3289,11 -3080,9 +3289,11 @@@ public class StorageService implements 
  /**
   * #{@inheritDoc}
   */
- public List<String> getRangeKeySample()
+ public List<String> sampleKeyRange() // do not rename to getter - see CASSANDRA-4452 for details
  {
 -List<DecoratedKey> keys = keySamples(ColumnFamilyStore.allUserDefined(), getLocalPrimaryRange());
 +List<DecoratedKey> keys = new ArrayList<DecoratedKey>();
 +for (Range<Token> range : getLocalPrimaryRanges())
 +keys.addAll(keySamples(ColumnFamilyStore.allUserDefined(), range));
   
  List<String> sampledKeys = new ArrayList<String>(keys.size());
  for (DecoratedKey key : keys)

[Cassandra Wiki] Update of VirtualNodes/Balance by EricEvans

The VirtualNodes/Balance page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance?action=diff&rev1=2&rev2=3

Comment:
balance vs. intentional imbalance

  
  When migrating a legacy cluster with one-token-per-node to virtual nodes, the existing range is carved up into `num_tokens` new ranges.  These new ranges are still contiguous, however, and a means of randomizing their placement is needed.
  
+ 
  <<Anchor(implementation)>>
  == Implementation (Draft) ==
+ === Considerations ===
+ In the most basic sense, ''balanced'' means that each node has 1/n of the token-space, so adjusting ownership for [[#heterogeneous_nodes|heterogeneous nodes]] is implicitly about ''unbalancing''.  This is important because if, for example, you reduced the ownership of a node to, say, (1/n)*0.8, you would expect that imbalance to persist rather than be balanced away by operations on other nodes.
+ 
+ ''Note: This will likely require storing state, in the form of an offset, on each node.''
+ 
  === Nodes / Cluster ===
  The most straightforward method of effecting ownership is a token move (i.e. relocating a range from one node to another).  Exposing this with JMX would allow implementing all of the required operations client-side.
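Here is a worked example of the intentional-imbalance point. With n = 5 nodes, a balanced node owns 1/5 = 0.2 of the token space. A node reduced to (1/n)*0.8 owns 0.16, and the remaining 0.84 is split among the other four nodes at 0.21 each. The sketch below is one hypothetical reading of the per-node offset mentioned in the note, with 1.0 meaning "no adjustment". The names and the equal-remainder policy are assumptions, not part of the draft.

```java
import java.util.Arrays;

final class OwnershipSketch {
    // Hypothetical per-node offsets: 1.0 means "balanced"; any other value pins that node
    // at offset / n of the token space. Unpinned nodes share the remainder equally, so
    // balancing the others never erases the intended imbalance.
    // Assumes at least one unpinned node and pinned shares summing to at most 1.
    static double[] targetOwnership(double[] offsets) {
        int n = offsets.length;
        double[] target = new double[n];
        double pinnedTotal = 0.0;
        int unpinned = 0;
        for (int i = 0; i < n; i++) {
            if (offsets[i] == 1.0) {
                unpinned++;
            } else {
                target[i] = offsets[i] / n;
                pinnedTotal += target[i];
            }
        }
        if (unpinned == 0 || pinnedTotal > 1.0) {
            throw new IllegalArgumentException("need an unpinned node and pinned total <= 1");
        }
        double share = (1.0 - pinnedTotal) / unpinned;
        for (int i = 0; i < n; i++) {
            if (offsets[i] == 1.0) target[i] = share;
        }
        return target;
    }

    public static void main(String[] args) {
        // n = 5, first node reduced to (1/n)*0.8: approximately [0.16, 0.21, 0.21, 0.21, 0.21]
        System.out.println(Arrays.toString(targetOwnership(new double[] {0.8, 1.0, 1.0, 1.0, 1.0})));
    }
}
```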

[jira] [Assigned] (CASSANDRA-1920) Enhance word_count example s.t. it can ingest and analyze arbitrary text

2012-07-25 Thread Kirk True (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True reassigned CASSANDRA-1920:


Assignee: Kirk True

 Enhance word_count example s.t. it can ingest and analyze arbitrary text
 

 Key: CASSANDRA-1920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1920
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
 Environment: N/A
Reporter: Benjamin Coverston
Assignee: Kirk True
Priority: Minor
  Labels: lhf
   Original Estimate: 4h
  Remaining Estimate: 4h

 Enhance the word_count demo so that arbitrary text files can be ingested, and 
 those ingested files can also be analyzed in the map reduce jobs.
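The ingestion half of such an enhancement mostly comes down to turning arbitrary text into word counts. The sketch below shows only that tokenize-and-count step in plain Java, with no Hadoop or Cassandra involved. The class name and the tokenization rule are assumptions, not taken from the existing word_count demo. The rule lower-cases each line and splits on anything that is not a letter or digit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {
    // Counts words in arbitrary text, one line at a time. Tokenization rule (an assumption):
    // lower-case the line, then split on runs of characters that are not letters or digits.
    static Map<String, Integer> count(Reader input) {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (!token.isEmpty()) {
                        counts.merge(token, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(new StringReader("The cat and the hat.\nThe END")));
        // prints {and=1, cat=1, end=1, hat=1, the=3}
    }
}
```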


[jira] [Commented] (CASSANDRA-3564) flush before shutdown so restart is faster


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422511#comment-13422511
 ] 

Brandon Williams commented on CASSANDRA-3564:
-

Doesn't {{nodetool flush}} already contain everything we need?  We just need 
the packaging glue.

 flush before shutdown so restart is faster
 --

 Key: CASSANDRA-3564
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3564
 Project: Cassandra
  Issue Type: New Feature
  Components: Packaging
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2

 Attachments: 3564.patch, 3564.patch


 Cassandra handles flush in its shutdown hook for durable_writes=false CFs 
 (otherwise we're *guaranteed* to lose data) but leaves it up to the operator 
 otherwise.  I'd rather leave it that way to offer these semantics:
 - cassandra stop = shutdown nicely [explicit flush, then kill -int]
 - kill -INT = shutdown faster but don't lose any updates [current behavior]
 - kill -KILL = lose most recent writes unless durable_writes=true and batch 
 commits are on [also current behavior]
 But if it's not reasonable to use nodetool from the init script then I guess 
 we can just make the shutdown hook flush everything.
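The current split described above can be sketched as follows. The shutdown hook flushes only `durable_writes=false` column families. Everything else is replayed from the commit log on restart unless the operator flushes first. `Table`, `flush()` and the class name are hypothetical stand-ins rather than Cassandra classes. `Runtime.addShutdownHook` is the real JDK API: such hooks run on normal exit and on `kill -INT`/`kill -TERM`, but never on `kill -KILL`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShutdownFlushSketch {
    // Hypothetical stand-in for a column family with a durable_writes setting.
    static final class Table {
        final String name;
        final boolean durableWrites;
        boolean flushed;

        Table(String name, boolean durableWrites) {
            this.name = name;
            this.durableWrites = durableWrites;
        }

        void flush() {
            flushed = true; // a real flush would write the memtable out to an sstable
        }
    }

    // Flush only non-durable tables: their writes never reached the commit log, so
    // without a flush they would be lost. Durable tables are replayed on restart.
    static List<String> flushNonDurable(List<Table> tables) {
        List<String> flushedNames = new ArrayList<>();
        for (Table t : tables) {
            if (!t.durableWrites) {
                t.flush();
                flushedNames.add(t.name);
            }
        }
        return flushedNames;
    }

    static void installHook(List<Table> tables) {
        // Runs on normal exit, SIGINT and SIGTERM; never runs on SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> flushNonDurable(tables)));
    }

    public static void main(String[] args) {
        List<Table> tables = Arrays.asList(new Table("users", true), new Table("scratch", false));
        installHook(tables); // on normal exit the hook flushes "scratch" only
    }
}
```

A "flush everything" variant, as floated at the end of the description, would drop the `durableWrites` check. That would trade a slower shutdown for a faster restart.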


[1/2] git commit: merge from 1.1

2012-07-25 Thread xedin

Updated Branches:
  refs/heads/trunk b167e9ba7 -> 5cde66bab


merge from 1.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/5cde66ba
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/5cde66ba
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/5cde66ba

Branch: refs/heads/trunk
Commit: 5cde66bab30e7aa8f98bae9a7504b2a4a17cdda1
Parents: b167e9b cc0be1b
Author: Pavel Yaskevich xe...@apache.org
Authored: Wed Jul 25 22:01:14 2012 +0300
Committer: Pavel Yaskevich xe...@apache.org
Committed: Wed Jul 25 22:01:14 2012 +0300

--
 CHANGES.txt|1 +
 .../org/apache/cassandra/db/ColumnFamilyStore.java |   33 +--
 .../cassandra/db/ColumnFamilyStoreMBean.java   |5 ++
 .../compaction/SizeTieredCompactionStrategy.java   |3 +-
 src/java/org/apache/cassandra/tools/NodeProbe.java |3 +-
 .../cassandra/db/compaction/CompactionsTest.java   |3 +-
 6 files changed, 29 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/5cde66ba/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5cde66ba/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5cde66ba/src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5cde66ba/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5cde66ba/src/java/org/apache/cassandra/tools/NodeProbe.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/5cde66ba/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
--

[2/2] git commit: fix nodetool's setcompactionthreshold command patch by Aleksey Yeschenko; reviewed by Pavel Yaskevich for CASSANDRA-4455

2012-07-25 Thread xedin

fix nodetool's setcompactionthreshold command
patch by Aleksey Yeschenko; reviewed by Pavel Yaskevich for CASSANDRA-4455


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/cc0be1b4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/cc0be1b4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/cc0be1b4

Branch: refs/heads/trunk
Commit: cc0be1b40007ef4b653e4ad6bc4dbe0438b97785
Parents: f46232c
Author: Pavel Yaskevich xe...@apache.org
Authored: Wed Jul 25 17:52:39 2012 +0300
Committer: Pavel Yaskevich xe...@apache.org
Committed: Wed Jul 25 21:59:38 2012 +0300

--
 CHANGES.txt|1 +
 .../org/apache/cassandra/db/ColumnFamilyStore.java |   33 +--
 .../cassandra/db/ColumnFamilyStoreMBean.java   |5 ++
 .../compaction/SizeTieredCompactionStrategy.java   |3 +-
 src/java/org/apache/cassandra/tools/NodeProbe.java |3 +-
 .../cassandra/db/compaction/CompactionsTest.java   |3 +-
 6 files changed, 29 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/cc0be1b4/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index c160d69..169f66d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -18,6 +18,7 @@
  * Fix LCS bug with sstable containing only 1 row (CASSANDRA-4411)
  * fix Can't Modify Index Name problem on CF update (CASSANDRA-4439)
  * Fix assertion error in getOverlappingSSTables during repair (CASSANDRA-4456)
+ * fix nodetool's setcompactionthreshold command (CASSANDRA-4455)
 Merged from 1.0:
  * allow dropping columns shadowed by not-yet-expired supercolumn or row
tombstones in PrecompactedRow (CASSANDRA-4396)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/cc0be1b4/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 0b66020..b93adc1 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -1845,6 +1845,18 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
 return compactionStrategy;
 }
 
+public void setCompactionThresholds(int minThreshold, int maxThreshold)
+{
+validateCompactionThresholds(minThreshold, maxThreshold);
+
+minCompactionThreshold.set(minThreshold);
+maxCompactionThreshold.set(maxThreshold);
+
+// this is called as part of CompactionStrategy constructor; avoid circular dependency by checking for null
+if (compactionStrategy != null)
+CompactionManager.instance.submitBackground(this);
+}
+
 public int getMinimumCompactionThreshold()
 {
 return minCompactionThreshold.value();
@@ -1852,14 +1864,8 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
 
 public void setMinimumCompactionThreshold(int minCompactionThreshold)
 {
-if ((minCompactionThreshold > this.maxCompactionThreshold.value()) && this.maxCompactionThreshold.value() != 0)
-throw new RuntimeException("The min_compaction_threshold cannot be larger than the max.");
-
+validateCompactionThresholds(minCompactionThreshold, maxCompactionThreshold.value());
 this.minCompactionThreshold.set(minCompactionThreshold);
-
-// this is called as part of CompactionStrategy constructor; avoid circular dependency by checking for null
-if (compactionStrategy != null)
-CompactionManager.instance.submitBackground(this);
 }
 
 public int getMaximumCompactionThreshold()
@@ -1869,14 +1875,15 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
 
 public void setMaximumCompactionThreshold(int maxCompactionThreshold)
 {
-if (maxCompactionThreshold > 0 && maxCompactionThreshold < this.minCompactionThreshold.value())
-throw new RuntimeException("The max_compaction_threshold cannot be smaller than the min.");
-
+validateCompactionThresholds(minCompactionThreshold.value(), maxCompactionThreshold);
 this.maxCompactionThreshold.set(maxCompactionThreshold);
+}
 
-// this is called as part of CompactionStrategy constructor; avoid circular dependency by checking for null
-if (compactionStrategy != null)
-CompactionManager.instance.submitBackground(this);
+private void validateCompactionThresholds(int minThreshold, int maxThreshold)
+{
+if (minThreshold > maxThreshold && maxThreshold != 0)
+throw new RuntimeException(String.format("The min_compaction_threshold cannot be
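With separate setters, each call is validated against the current value of the other threshold. Moving from the defaults (4, 32) to (40, 64) therefore fails if min is set first, even though the final pair is valid. A combined setter avoids this. The sketch below models only the validation rule visible in the diff: min may not exceed max unless max is 0. The class and method names are hypothetical, and the sketch does not reproduce the truncated error message.

```java
final class ThresholdSketch {
    private int min = 4;  // Cassandra's default min_compaction_threshold
    private int max = 32; // Cassandra's default max_compaction_threshold

    // Same rule as validateCompactionThresholds in the diff: min may not exceed max unless max is 0.
    private static void validate(int newMin, int newMax) {
        if (newMin > newMax && newMax != 0)
            throw new IllegalArgumentException("min " + newMin + " > max " + newMax);
    }

    // Separate setters: each checks against the *current* value of the other threshold.
    void setMin(int newMin) { validate(newMin, max); min = newMin; }
    void setMax(int newMax) { validate(min, newMax); max = newMax; }

    // Combined setter: validates only the final pair, so update order no longer matters.
    void setBoth(int newMin, int newMax) { validate(newMin, newMax); min = newMin; max = newMax; }

    int min() { return min; }
    int max() { return max; }

    public static void main(String[] args) {
        ThresholdSketch cf = new ThresholdSketch();
        try {
            cf.setMin(40); // rejected: 40 > current max of 32
        } catch (IllegalArgumentException e) {
            System.out.println("separate setters: " + e.getMessage());
        }
        cf.setBoth(40, 64);
        System.out.println("combined setter: " + cf.min() + "/" + cf.max());
    }
}
```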

[jira] [Commented] (CASSANDRA-3564) flush before shutdown so restart is faster


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422517#comment-13422517
 ] 

Jonathan Ellis commented on CASSANDRA-3564:
---

bq. we could still improve the (debian) packaging to have a 'call flush for me before shutdown' 

WFM, but it should default to off IMO.

 flush before shutdown so restart is faster
 --

 Key: CASSANDRA-3564
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3564
 Project: Cassandra
  Issue Type: New Feature
  Components: Packaging
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2

 Attachments: 3564.patch, 3564.patch


[jira] [Commented] (CASSANDRA-3564) flush before shutdown so restart is faster


[ 
https://issues.apache.org/jira/browse/CASSANDRA-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422522#comment-13422522
 ] 

Brandon Williams commented on CASSANDRA-3564:
-

bq. it should default to off IMO

+1

 flush before shutdown so restart is faster
 --

 Key: CASSANDRA-3564
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3564
 Project: Cassandra
  Issue Type: New Feature
  Components: Packaging
Reporter: Jonathan Ellis
Assignee: David Alves
Priority: Minor
 Fix For: 1.2

 Attachments: 3564.patch, 3564.patch


[jira] [Commented] (CASSANDRA-4436) Counters in columns don't preserve correct values after cluster restart

[
https://issues.apache.org/jira/browse/CASSANDRA-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422532#comment-13422532
]

Jonathan Ellis commented on CASSANDRA-4436:
---

bq. But we won't have the same ancestor multiple times

I don't think that's true. Suppose for instance we have leveled compaction
with A and B in L0. They are larger than 5MB so we split the result into X, Y,
and Z. Next we flush C to L0. It overlaps with Y and Z, so we're compacting
C, Y, and Z. Now we have Y and Z both with A and B as ancestors.

(Switching from LCS back to STCS is another way you could get duplicate
ancestors.)
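The scenario can be replayed in a toy Java model. As a simplification, the model assumes each compaction output records its direct inputs as ancestors. `AncestorSketch`, `compact` and `sharedAncestors` are hypothetical names; this is not the sstable metadata code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only: simplified ancestor tracking, not Cassandra's sstable metadata. */
class AncestorSketch {
    static final class SSTable {
        final String name;
        final Set<String> ancestors;

        SSTable(String name, Set<String> ancestors) {
            this.name = name;
            this.ancestors = ancestors;
        }
    }

    /** Compacts inputs into outputs with the given names; each output records all inputs as ancestors. */
    static List<SSTable> compact(List<SSTable> inputs, String... outputNames) {
        List<SSTable> outputs = new ArrayList<>();
        for (String outputName : outputNames) {
            Set<String> ancestors = new TreeSet<>();
            for (SSTable input : inputs) {
                ancestors.add(input.name);
            }
            outputs.add(new SSTable(outputName, ancestors));
        }
        return outputs;
    }

    /** Returns ancestors claimed by more than one input, sorted and comma-separated. */
    static String sharedAncestors(List<SSTable> inputs) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (SSTable input : inputs) {
            for (String ancestor : input.ancestors) {
                if (!seen.add(ancestor)) {
                    duplicates.add(ancestor);
                }
            }
        }
        return String.join(",", duplicates);
    }
}
```

First compact A and B into X, Y and Z. Then call `sharedAncestors` on C, Y and Z, and it returns `A,B`. Two inputs of the same compaction claim the same ancestors, which is exactly the case the quoted assumption rules out.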

Counters in columns don't preserve correct values after cluster restart
---

Key: CASSANDRA-4436
URL: https://issues.apache.org/jira/browse/CASSANDRA-4436
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.10
Reporter: Peter Velas
Assignee: Sylvain Lebresne
Fix For: 1.1.3

Attachments: 4436-1.0-2.txt, 4436-1.0.txt, 4436-1.1-2.txt,
4436-1.1.txt, increments.cql.gz

Similar to #3821, but affecting normal columns.
Set up a 2-node cluster with rf=2.
1. Create a counter column family and increment 100 keys in a loop 5000
times.
2. Then do a rolling restart of the cluster.
3. Increment another 5000 times.
4. Do a rolling restart of the cluster.
5. Increment another 5000 times.
6. Do a rolling restart of the cluster.
After step 6 we were able to reproduce the bug with bad counter values.
Expected values were 15000. Values returned from the cluster are higher than
that: 15000 + some random number.
Rolling restarts are done with nodetool drain, always waiting until the second
node discovers the node is down, then killing the java process.

[jira] [Commented] (CASSANDRA-4292) Per-disk I/O queues

[
https://issues.apache.org/jira/browse/CASSANDRA-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422574#comment-13422574
]

Jonathan Ellis commented on CASSANDRA-4292:
---

bq. Directory is chosen based on available space in both queue and disk.

We still want to prioritize disks that have no tasks yet, since iops are a
bigger bottleneck than space, in general.

So specifically, we want to prioritize in order of:

# enough space for the new sstable (boolean)
# zero tasks (boolean)
# total free space (long)

We may want to test changing #2 to ordering by task count... both have pros
and cons.
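The three-level priority maps onto a chained `Comparator`. Below is a minimal sketch. It assumes a hypothetical `DiskCandidate` holder with free bytes and a queued-task count; this is not Cassandra's actual directory API.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch only: DiskCandidate and pick() are hypothetical, not Cassandra code. */
class DiskChooserSketch {
    static final class DiskCandidate {
        final String path;
        final long freeBytes;
        final int queuedTasks;

        DiskCandidate(String path, long freeBytes, int queuedTasks) {
            this.path = path;
            this.freeBytes = freeBytes;
            this.queuedTasks = queuedTasks;
        }
    }

    /** Orders candidates best-first by the three criteria, in priority order. */
    static Comparator<DiskCandidate> priority(long sstableBytes) {
        return Comparator
                // 1. disks with enough space first (false sorts before true)
                .comparing((DiskCandidate d) -> d.freeBytes < sstableBytes)
                // 2. then disks with zero queued tasks
                .thenComparing(d -> d.queuedTasks != 0)
                // 3. then the most total free space
                .thenComparing(Comparator.comparingLong((DiskCandidate d) -> d.freeBytes).reversed());
    }

    /** Returns the best disk for an sstable of the given size. */
    static DiskCandidate pick(List<DiskCandidate> disks, long sstableBytes) {
        return disks.stream().min(priority(sstableBytes)).orElseThrow(IllegalStateException::new);
    }
}
```

To try the alternative of ordering by task count, replace the second key with `.thenComparingInt(d -> d.queuedTasks)`.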

Per-disk I/O queues
---

Attachments: 4292-v2.txt, 4292.txt

[jira] [Commented] (CASSANDRA-4460) SystemTable.setBootstrapState always sets bootstrap state to true

2012-07-25 Thread Dave Brosius (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422593#comment-13422593
 ] 

Dave Brosius commented on CASSANDRA-4460:
-

it has to be migrated anyway. The table is defined to be boolean currently. So 
either you migrate to integer or string. I chose string as 0, 1, 2 mean nothing 
to me.

 SystemTable.setBootstrapState always sets bootstrap state to true
 -

 Key: CASSANDRA-4460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4460
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2
Reporter: Dave Brosius
Assignee: Dave Brosius
Priority: Trivial
 Attachments: use_bootstrap_enum_strings.txt


 public static void setBootstrapState(BootstrapState state)
 {
     String req = "INSERT INTO system.%s (key, bootstrapped) VALUES ('%s', '%b')";
     // Bug: the third argument is getBootstrapState(), not the 'state' parameter
     processInternal(String.format(req, LOCAL_CF, LOCAL_KEY, getBootstrapState()));
     forceBlockingFlush(LOCAL_CF);
 }
 The third parameter, %b, is filled from getBootstrapState(), which returns an 
 enum, so %b collapses to a null/non-null check. This would seem to always set 
 it to true.
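The collapse follows from how `java.util.Formatter` handles `%b`. A `null` argument prints `false`, a `Boolean` prints its own value, and any other argument prints `true`. Here is a small sketch with a stand-in enum. Its constant names are assumed, not copied from SystemTable.

```java
/** Sketch only: a stand-in enum demonstrating Formatter's %b rule. */
class FormatBooleanSketch {
    // Hypothetical stand-in for SystemTable.BootstrapState.
    enum BootstrapState { NEEDS_BOOTSTRAP, COMPLETED, IN_PROGRESS }

    /** Returns what the query above would store for a given argument under %b. */
    static String asBooleanFlag(Object arg) {
        // %b prints "false" for null, the value for a Boolean, and "true" for anything else.
        return String.format("%b", arg);
    }
}
```

So every state, including NEEDS_BOOTSTRAP, would be stored as `true`.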

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[Cassandra Wiki] Update of VirtualNodes/Balance by EricEvans

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The VirtualNodes/Balance page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance?action=diffrev1=3rev2=4

Comment:
proposed tool interfaces

  
  === User Interface ===
  
+ The `balance` sub-command balances the node it is run against, by default 
targeting an ownership of `1/n`.  The sub-command takes an optional offset in the 
range<<FootNote(Does this range make sense?)>> of `+100` to `-100`, which 
results in a targeted ownership of `(1/n)*(offset/100)`.
+ 
+ ''Note: ranges copied from/to other nodes must be selected in such a way as 
to respect their offsets.''
+ 
  {{{
- $ nodetool balance
+ $ nodetool balance [+/-offset]
  }}}
+ 
+ The `shuffle` sub-command randomly exchanges contiguous ranges on the node 
it is run against with other nodes in the cluster.
  
  {{{
  $ nodetool shuffle
  }}}
  
+ The `trim` sub-command assigns an offset in the range<<FootNote(Does this 
range make sense?)>> of `+100` to `-100`, and copies randomly selected ranges 
onto, or off of, the node it is run against to achieve the requested ownership 
(`(1/n)*(offset/100)`).
+ 
  {{{
- $ nodetool trim
+ $ nodetool trim +/- offset
  }}}

[jira] [Commented] (CASSANDRA-1967) commit log replay shouldn't end with a flush

2012-07-25 Thread Robert Coli (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422613#comment-13422613
]

Robert Coli commented on CASSANDRA-1967:

After making the above update, I noticed Cassandra 1.0.10 flushing after
replay. Since this clashed with my interpretation of the code, I
conjectured that the flush must be deeper in the code paths than in previous
versions, and deeper than I read this time. I asked about this in #cassandra.

Per jbellis in #cassandra :

1) Explicit flush at the end of replay is by design.
2) The design goal in this case is to avoid replaying the same log more than once
if the node crashes before the replayed data is flushed.

I don't find 2) a compelling design goal, and believe it violates the principle
of least surprise.

The purpose of the commitlog is to hold the contents of memtables. In the case
of a crash, I expect the commitlog replay process to result in the same
memtables that my node contained before it crashed. If it then crashes again, I
expect the same memtables to be replayed again. There may be some negative
externalities to this repeated replay which are not currently clear to me, but
I am relatively confident that being surprised by my memtable state is not one
of them.

In my opinion, avoiding compaction as a side effect of restart/replay is, in
contrast, a compelling design goal.

Significant production users appear to agree in CASSANDRA-2444 ([Twitter has]
ran into many times where we do not want compaction to run right away against
CFs when booting up a node.) But the resolution of CASSANDRA-2444 (If the
node needs to compact, it will do so at the first flush, which is more likely
to be staggered across the cluster) does not make sense if commitlog replay
always ends with a flush. The logical result of both code paths appears to be the
same: restart has the potential to trigger immediate compaction.

In summary... +1 for re-opening this ticket and making commit log replay not
end with a flush.

commit log replay shouldn't end with a flush

Key: CASSANDRA-1967
URL: https://issues.apache.org/jira/browse/CASSANDRA-1967
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.3
Reporter: Robert Coli
Priority: Minor

(Apologies in advance if there is some very compelling reason to flush after
replay, of which I am not currently aware. ;D)
Currently, when a node restarts, the following sequence occurs :
a) commitlog is replayed
b) any memtables resulting from a) are flushed
c) a new commitlog is opened, new memtables are switched in
... (other stuff happens)
d) node starts taking traffic
This has side effects, perhaps most seriously the potential of triggering
compaction. As a node is likely to struggle performance-wise after
restarting, triggering compaction at that time seems like something we might
wish to avoid.
I propose that the sequence be :
a) commitlog is replayed
b) a new commitlog is opened, new memtables are switched in
... (other stuff happens)
c) node starts taking traffic
Looking through the relevant code, the only code that appears to depend on
this flush is at
src/java/org/apache/cassandra/db/commitlog/CommitLog.java:112 :

// all old segments are recovered and deleted before CommitLog is instantiated.
// All we need to do is create a new one.
segments.add(new CommitLogSegment());

Presumably this code would have to be refactored to be aware of the currently
open commitlog.

[jira] [Commented] (CASSANDRA-1967) commit log replay shouldn't end with a flush


[ 
https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422628#comment-13422628
 ] 

Jonathan Ellis commented on CASSANDRA-1967:
---

You're barking up the wrong tree by blaming flush.  To the degree that 
compaction is a problem (and on a properly tuned system it shouldn't be), we 
can simply extend the five minute delay on autocompaction to these flushes as 
well.
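Here is a minimal sketch of that suggestion. It assumes a `ScheduledExecutorService` owns the deferred compaction check. The class, method and constant names are hypothetical, and the comment does not say where the existing five-minute delay is implemented.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch only: defer the compaction check after the post-replay flush (hypothetical names). */
class DelayedCompactionSketch {
    static final long STARTUP_COMPACTION_DELAY_MINUTES = 5;

    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    /** Called once commit log replay has flushed its memtables. */
    ScheduledFuture<?> afterReplayFlush(Runnable submitBackgroundCompaction) {
        // Queue the compaction check for later instead of running it right away.
        return executor.schedule(submitBackgroundCompaction,
                STARTUP_COMPACTION_DELAY_MINUTES, TimeUnit.MINUTES);
    }

    /** Stops the executor, cancelling any check that has not yet run. */
    void shutdown() {
        executor.shutdownNow();
    }
}
```

The replayed data is still flushed at once. Only the compaction check is pushed back.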

[jira] [Updated] (CASSANDRA-4460) SystemTable.setBootstrapState always sets bootstrap state to true


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-4460:


Attachment: 4460.txt

bq. it has to be migrated anyway. The table is defined to be boolean currently

Actually, it's only boolean in trunk and we don't need to keep trunk compatible 
with itself.  It turns out upgradeSystemData() is handling the 1.1 to trunk 
transition for us already.

bq.  I chose string as 0, 1, 2 mean nothing to me.

Fair enough.  Attaching a new version which takes all of this into account, and 
fixes a bug in setBootstrapState using getBootstrapState instead of the state 
passed to it.
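Here is a sketch of the query building the fix describes. It formats the state that was passed in and stores it as a string rather than through `%b`. The enum constants and the `LOCAL_CF`/`LOCAL_KEY` values are assumptions for illustration, not the attached patch.

```java
/** Sketch only: hypothetical constants stand in for SystemTable's LOCAL_CF and LOCAL_KEY. */
class SetBootstrapStateSketch {
    // Hypothetical stand-in for SystemTable.BootstrapState.
    enum BootstrapState { NEEDS_BOOTSTRAP, COMPLETED, IN_PROGRESS }

    static final String LOCAL_CF = "local";
    static final String LOCAL_KEY = "local";

    /** Builds the INSERT from the state passed in, stored by enum name. */
    static String bootstrapInsert(BootstrapState state) {
        String req = "INSERT INTO system.%s (key, bootstrapped) VALUES ('%s', '%s')";
        // %s with state.name() keeps the state distinguishable; %b would print "true" for any state.
        return String.format(req, LOCAL_CF, LOCAL_KEY, state.name());
    }
}
```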

 SystemTable.setBootstrapState always sets bootstrap state to true
 -

 Key: CASSANDRA-4460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4460
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2
Reporter: Dave Brosius
Assignee: Dave Brosius
Priority: Trivial
 Attachments: 4460.txt, use_bootstrap_enum_strings.txt


[jira] [Created] (CASSANDRA-4465) Index fails to be created on all nodes in cluster, restart resolves

2012-07-25 Thread Grant Heffernan (JIRA)

Grant Heffernan created CASSANDRA-4465:
--

 Summary: Index fails to be created on all nodes in cluster, 
restart resolves
 Key: CASSANDRA-4465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.10
 Environment: 21 node cluster, Ubuntu Linux 11.10 in a virtualized 
environment, Apache cassandra community release, binary distribution
Reporter: Grant Heffernan
Priority: Minor


On a production cluster, under load, creating an index on a column resulted in 
the index being successfully created on 4 of 21 nodes. All nodes received the 
schema agreement and were in concert. There were no errors logged on any of the 
nodes that failed to build the index.

A rolling restart of the cluster resulted in the nodes which had previously 
failed to build the index doing so when coming back up from a restart.

[jira] [Commented] (CASSANDRA-4460) SystemTable.setBootstrapState always sets bootstrap state to true

2012-07-25 Thread Dave Brosius (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422810#comment-13422810
 ] 

Dave Brosius commented on CASSANDRA-4460:
-

LGTM

git commit: Fix SystemTable.setBootstrapState and other merge fallout from #4427. Patch by Dave Brosius and brandonwilliams, reviewed by Dave Brosius for CASSANDRA-4460