[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yang updated CASSANDRA-2843:
-

Attachment: fast_cf.diff

diff file

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: fast_cf.diff



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)
better performance on long row read
---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: fast_cf.diff

currently if a row contains > 1000 columns, the read time becomes considerably 
slow: in my test, a row with 3000 columns (standard, regular), each with 8 bytes 
in name and 40 bytes in value, takes about 16ms to read.
this is all running in memory, no disk read is involved.

through debugging we can find most of this time is spent on:
[Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time]  org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
[Wall Time]  org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
[Wall Time]  org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
[Wall Time]  org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
[Wall Time]  org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
[Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)

ColumnFamily.addColumn() is slow because it inserts into an internal 
ConcurrentSkipListMap that maps column names to values.
this structure is slow for two reasons: it needs to do synchronization, and it 
needs to maintain the more complex structure of a map.

but if we look at the whole read path, thrift already defines the read output 
to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map 
data structure in the interim and finally convert it to a list. on the 
synchronization side, since the returned CF is never going to be shared/modified 
by other threads, we know the access is always single-threaded, so no 
synchronization is needed.

but these 2 features are indeed needed by ColumnFamily in other cases, 
particularly writes. so we can provide a different ColumnFamily to 
CFS.getTopLevelColumnFamily(): getTopLevelColumnFamily no longer always 
creates the standard ColumnFamily, but takes a provided returnCF, which is 
much cheaper.

the provided patch is for demonstration now; I will work on it further once we 
agree on the general direction. 
CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is provided. 
the main work is to let the FastColumnFamily use an array for internal storage. 
at first I used binary search to insert new columns in addColumn(), but later I 
found that even this is not necessary, since all calling scenarios of 
ColumnFamily.addColumn() have an invariant that the inserted columns come in 
sorted order (I still have an issue to resolve between descending and ascending 
order now, but ascending works). so the current logic is simply to compare the 
new column against the last column in the array: if the names are not equal, 
append; if they are equal, reconcile.
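
A minimal sketch of this append-or-reconcile idea, assuming columns arrive in 
ascending name order (FastColumn and the reconcile rule below are simplified 
stand-ins for illustration, not the types in the attached patch):
{code}
import java.util.ArrayList;
import java.util.List;

public class AppendOrReconcileSketch
{
    static final class FastColumn
    {
        final String name;
        final String value;
        final long timestamp;

        FastColumn(String name, String value, long timestamp)
        {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final List<FastColumn> columns = new ArrayList<FastColumn>();

    public void addColumn(FastColumn column)
    {
        if (columns.isEmpty())
        {
            columns.add(column);
            return;
        }
        FastColumn last = columns.get(columns.size() - 1);
        int cmp = last.name.compareTo(column.name);
        if (cmp < 0)
            columns.add(column);                                      // sorted input: just append
        else if (cmp == 0)
            columns.set(columns.size() - 1, reconcile(last, column)); // same name: reconcile
        else
            throw new IllegalStateException("columns must arrive in ascending order");
    }

    // simplified reconcile: keep the column with the higher timestamp
    private static FastColumn reconcile(FastColumn a, FastColumn b)
    {
        return a.timestamp >= b.timestamp ? a : b;
    }
}
{code}
Appending at the tail of an array this way is O(1) per column, versus the 
O(log n) insert plus synchronization overhead of the skip list.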

slight temporary hacks are made on getTopLevelColumnFamily so that we have 2 
flavors of the method, one accepting a returnCF. but we could definitely think 
about what a better way to provide this returnCF would be.


this patch compiles fine; no tests are provided yet. but I tested it in my 
application, and the performance improvement is dramatic: it gives about a 50% 
reduction in read time in the 3000-column case.


thanks
Yang


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yang updated CASSANDRA-2843:
-

Attachment: b.tar.gz

just untar this file into the 0.8.0-rc1 source tree, then compile

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: b.tar.gz, fast_cf.diff



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058256#comment-13058256
 ] 

Sylvain Lebresne commented on CASSANDRA-2843:
-

The usual way to do things is to attach a patch. But I see that your diff doesn't 
include FastColumnFamily. The patch also includes instrumentation and a few 
unrelated changes (a commented-out method, a change from SortedSet to Set in an 
unrelated method signature) that would ideally be removed. It would be great to 
have this rebased to the current 0.8 branch too.

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: b.tar.gz, fast_cf.diff



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2844) grep friendly nodetool compactionstats output

2011-07-01 Thread Wojciech Meler (JIRA)
grep friendly nodetool compactionstats output
-

 Key: CASSANDRA-2844
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.8.1
Reporter: Wojciech Meler
Priority: Trivial


output from nodetool compactionstats is quite hard to parse with text tools - 
it would be nice to have one line per compaction
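
A hedged illustration of what a grep-friendly, one-line-per-compaction format 
could look like; the field names and layout below are assumptions, not the 
actual nodetool columns:
{code}
public class CompactionLine
{
    // format one compaction as a single grep-friendly line; field names are
    // illustrative, not nodetool's actual output
    public static String format(String keyspace, String columnFamily, String taskType,
                                long bytesCompacted, long bytesTotal)
    {
        double pct = bytesTotal == 0 ? 100.0 : 100.0 * bytesCompacted / bytesTotal;
        return String.format("%s %s.%s %d/%d bytes (%.1f%%)",
                             taskType, keyspace, columnFamily, bytesCompacted, bytesTotal, pct);
    }

    public static void main(String[] args)
    {
        // sample values only
        System.out.println(format("Keyspace1", "Standard1", "Compaction", 1048576L, 4194304L));
    }
}
{code}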

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2844) grep friendly nodetool compactionstats output

2011-07-01 Thread Wojciech Meler (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wojciech Meler updated CASSANDRA-2844:
--

Attachment: comapctionstats.patch

patch for 0.8.1 that does the job

 grep friendly nodetool compactionstats output
 -

 Key: CASSANDRA-2844
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.8.1
Reporter: Wojciech Meler
Priority: Trivial
 Attachments: comapctionstats.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058539#comment-13058539
 ] 

Jonathan Ellis commented on CASSANDRA-2819:
---

That is the wrong place for it because Message is used both to send and to 
receive.  MDT creation time is effectively identical.

 Split rpc timeout for read and write ops
 

 Key: CASSANDRA-2819
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2819
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Stu Hood
Assignee: Melvin Wang
 Fix For: 1.0

 Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff


 Given the vastly different latency characteristics of reads and writes, it 
 makes sense for them to have independent rpc timeouts internally.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)

2011-07-01 Thread Steve Corona (JIRA)
Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
--

 Key: CASSANDRA-2845
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2845
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1, 0.8.0
 Environment: Default install of Ubuntu 11.04
Reporter: Steve Corona


Step 1. Boot up a brand new, default Ubuntu 11.04 Server install
Step 2. Install Cassandra from the Apache APT Repository (deb 
http://www.apache.org/dist/cassandra/debian 08x main)
Step 3. apt-get install cassandra; as soon as cassandra starts it will 
freeze the machine

What's happening is that as soon as cassandra starts up it immediately sucks up 
100% of CPU and starves the machine. This effectively bricks the box until you 
boot into single user mode and disable the cassandra init.d script.

Under htop, the CPU usage shows up as system cpu, not user.

The machine I'm testing this on is a Quad-Core Sandy Bridge w/ 16GB of Memory, 
so it's not a system resource issue. I've also tested this on completely 
different hardware (Dual 64-Bit Xeons & AMD X4) and it has the same effect.

Ubuntu 10.10 does not exhibit the same issue. I have only tested 0.8 and 0.8.1.

root@cassandra01:/# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

root@cassandra:/# uname -a
Linux cassandra01 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 
x86_64 x86_64 x86_64 GNU/Linux

/proc/cpu
Intel(R) Xeon(R) CPU E31270 @ 3.40GHz

/proc/meminfo
MemTotal:   16459776 kB
MemFree:14190708 kB

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-01 Thread JIRA
Changing replication_factor using update keyspace not working
---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström


Unless I've misunderstood the new way to do this in 0.8, I think update 
keyspace is broken:

{code}
[default@unknown] create keyspace Test with placement_strategy = 
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
[{replication_factor:1}];
37f70d40-a3e9-11e0--242d50cf1fbf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] describe keyspace Test;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:1]
  Column Families:
[default@unknown] update keyspace Test with placement_strategy = 
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
[{replication_factor:2}];
489fe220-a3e9-11e0--242d50cf1fbf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] describe keyspace Test;   

Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:1]
  Column Families:
{code}

Isn't the second describe keyspace supposed to say replication_factor:2?

Relevant bits from system.log:
{code}
Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Test<rep strategy:SimpleStrategy{}durable_writes: true> to Test<rep strategy:SimpleStrategy{}durable_writes: true>
UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations
{code}


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2842) Hive JDBC connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded

2011-07-01 Thread Rick Shaw (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Shaw updated CASSANDRA-2842:
-

Attachment: pass-if-not-right-driver-v1.txt

This test has been run against v1.0.3 of the driver. In that version the 
{{connect(...)}} method of {{CassandraDriver}} is called with an unsupported 
protocol:subprotocol in its URL. It recognizes that it is not the proper 
protocol, but erroneously throws an exception rather than returning null to the 
caller to indicate that it cannot handle the URL and the caller should move on. 
The patch is based on the current trunk of {{/drivers}} (v1.0.4).
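
For reference, a sketch of the standard JDBC contract the fix relies on 
(illustrative only, not the attached patch): a driver handed a URL it does not 
own should return null from connect() so DriverManager can try the next 
registered driver. PoliteDriver and doConnect below are hypothetical names.
{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

// connect() must return null (not throw) for a URL this driver does not
// accept, so DriverManager can move on to the next registered driver.
public abstract class PoliteDriver implements java.sql.Driver
{
    private static final String PREFIX = "jdbc:cassandra";   // assumed prefix

    public boolean acceptsURL(String url)
    {
        return url != null && url.startsWith(PREFIX);
    }

    public Connection connect(String url, Properties props) throws SQLException
    {
        if (!acceptsURL(url))
            return null;               // not ours: defer to the next driver
        return doConnect(url, props);  // hypothetical internal helper
    }

    protected abstract Connection doConnect(String url, Properties props) throws SQLException;
}
{code}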

 Hive JDBC connections fail with InvalidUrlException when both the C* and Hive 
 JDBC drivers are loaded
 -

 Key: CASSANDRA-2842
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2842
 Project: Cassandra
  Issue Type: Bug
Reporter: Cathy Daw
Priority: Trivial
 Attachments: pass-if-not-right-driver-v1.txt


 Hive connections fail with InvalidUrlException when both the C* and Hive JDBC 
 drivers are loaded, and it seems the URL is being interpreted as a C* url.
 {code}
   Caused an ERROR
 [junit] Invalid connection url:jdbc:hive://127.0.0.1:1/default. 
 should start with jdbc:cassandra
 [junit] org.apache.cassandra.cql.jdbc.InvalidUrlException: Invalid 
 connection url:jdbc:hive://127.0.0.1:1/default. should start with 
 jdbc:cassandra
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:90)
 [junit]   at java.sql.DriverManager.getConnection(DriverManager.java:582)
 [junit]   at java.sql.DriverManager.getConnection(DriverManager.java:185)
 [junit]   at 
 com.datastax.bugRepros.repro_connection_error.test1_runHiveBeforeJdbc(repro_connection_error.java:34)
 {code}
 *Code Snippet: intended to illustrate the connection issues* 
 * Copy file to test directory
 * Change package declaration
 * run:  ant test -Dtest.name=repro_conn_error
 {code}
 package com.datastax.bugRepros;

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.SQLException;
 import java.util.Enumeration;

 import org.junit.Test;

 public class repro_conn_error
 {
     @Test
     public void jdbcConnectionError() throws Exception
     {
         // Create Hive JDBC Connection - will succeed if
         try
         {
             // Uncomment loading C* driver to reproduce bug
             Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");

             // Load Hive driver and connect
             Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
             Connection hiveConn =
                 DriverManager.getConnection("jdbc:hive://127.0.0.1:1/default", "", "");
             hiveConn.close();
             System.out.println("successful hive connection");
         } catch (SQLException e) {
             System.out.println("unsuccessful hive connection");
             e.printStackTrace();
         }

         // Create C* JDBC Connection
         try
         {
             Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
             Connection jdbcConn =
                 DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/default");
             jdbcConn.close();
             System.out.println("successful c* connection");
         } catch (SQLException e) {
             System.out.println("unsuccessful c* connection");
             e.printStackTrace();
         }

         // Print out all loaded JDBC drivers.
         Enumeration d = java.sql.DriverManager.getDrivers();
         while (d.hasMoreElements()) {
             Object driverAsObject = d.nextElement();
             System.out.println("JDBC driver=" + driverAsObject);
         }
     }
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2739) Cannot recover SSTable with version f (current version g) during the node decommission.

2011-07-01 Thread Thibaut (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058568#comment-13058568
 ] 

Thibaut commented on CASSANDRA-2739:


Running into the same problem after upgrading our test cluster from 0.7.* 
(don't know what the exact version number was) to 0.8.1. Do I have to run scrub 
on each node, and will everything be fine afterwards?

We plan to upgrade our production cluster soon and can't afford to lose data 
there.


 Cannot recover SSTable with version f (current version g) during the node 
 decommission.
 ---

 Key: CASSANDRA-2739
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2739
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
 Environment: centos, cassandra 0.7.4 upgrade to 0.8.0-final.
Reporter: Dikang Gu
  Labels: decommission, version

 I upgrade the 4-nodes cassandra 0.7.4 cluster to 0.8.0-final. Then, I do the 
 bin/nodetool decommission on one node, the decommission hangs there and I got 
 the following errors on other nodes.
 ERROR [Thread-55] 2011-06-03 18:02:03,500 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-55,5,main]
 java.lang.RuntimeException: Cannot recover SSTable with version f (current 
 version g).
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
   at 
 org.apache.cassandra.db.CompactionManager.submitSSTableBuild(CompactionManager.java:1088)
   at 
 org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:108)
   at 
 org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
   at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
   at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
   at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 ERROR [Thread-56] 2011-06-03 18:02:04,285 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-56,5,main]
 java.lang.RuntimeException: Cannot recover SSTable with version f (current 
 version g).
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
   at 
 org.apache.cassandra.db.CompactionManager.submitSSTableBuild(CompactionManager.java:1088)
   at 
 org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:108)
   at 
 org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
   at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
   at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
   at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2846:
--

Attachment: 2846.txt

// server helpfully sets deprecated replication factor when it sends a 
KsDef back, for older clients.
// we need to unset that on the new KsDef we create to avoid being 
treated as a legacy client in return.
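
A hypothetical client-side sketch of that approach, assuming the 0.8 Thrift API 
(describe_keyspace / system_update_keyspace) and the Thrift-generated 
unsetReplication_factor() on KsDef; UpdateRf and updateReplicationFactor are 
illustrative names, not the attached patch:
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.KsDef;

public class UpdateRf
{
    // Raise the replication factor of a keyspace through the Thrift API,
    // clearing the deprecated replication_factor field that describe_keyspace
    // returns so the server does not treat us as a legacy client.
    public static void updateReplicationFactor(Cassandra.Client client, String keyspace, int rf)
            throws Exception
    {
        KsDef ksDef = client.describe_keyspace(keyspace);
        Map<String, String> opts = new HashMap<String, String>();
        opts.put("replication_factor", Integer.toString(rf));
        ksDef.setStrategy_options(opts);
        ksDef.unsetReplication_factor();   // assumed Thrift-generated unsetter
        client.system_update_keyspace(ksDef);
    }
}
{code}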


 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)

2011-07-01 Thread Steve Corona (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058613#comment-13058613
 ] 

Steve Corona commented on CASSANDRA-2845:
-

I actually figured this out- it's more of a cassandra packaging issue than an 
issue with the actual code.

I extracted the cassandra-0.8.1.deb file and diff'ed all of the files with 
apache-cassandra-0.8.1-bin.tar.gz. I noticed that apache-cassandra-0.8.1.jar 
was off by a few bytes. I extracted the jar and determined that the deb file 
was using a different version of the following classes:

cli/CliLexer.class
cli/CliParser.class
cql/CqlLexer.class
cql/CqlParser.class

I repackaged the .deb using apache-cassandra-0.8.1.jar from the bin.tar.gz 
(will post instructions below) and it installed on Ubuntu 11.04 without a 
hitch. I'm not sure if the .jar/.class files used to package the deb were 
corrupted or are just a different/incomplete/broken version.

Poor man's .deb repackaging until it's officially fixed:

cd /tmp
mkdir work && cd work
wget http://www.fightrice.com/mirrors/apache/cassandra/0.8.1/apache-cassandra-0.8.1-bin.tar.gz
tar -zxvf apache-cassandra-0.8.1-bin.tar.gz

mkdir deb && cd deb
wget http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.1_all.deb

# need binutils to get the ar utility
sudo apt-get install binutils

ar vx cassandra_0.8.1_all.deb
tar -zxvf data.tar.gz
rm data.tar.gz
cd ./usr/share/cassandra

mv /tmp/work/apache-cassandra-0.8.1/lib/apache-cassandra-0.8.1.jar .
cd /tmp/work/deb
tar -czvf data.tar.gz etc/ usr/ var/

rm cassandra_0.8.1_all.deb
ar rc cassandra_0.8.1_all.deb debian-binary control.tar.gz data.tar.gz

sudo apt-get install openjdk-6-jdk
sudo dpkg -i cassandra_0.8.1_all.deb

Alternatively, you can use policy-rc.d to prevent cassandra.deb's post-init 
script from running on install and replace the messed up .jar after it has been 
installed. Instructions here: 
http://lifeonubuntu.com/how-to-prevent-server-daemons-from-starting-during-apt-get-install/




 Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
 --

 Key: CASSANDRA-2845
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2845
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0, 0.8.1
 Environment: Default install of Ubuntu 11.04
Reporter: Steve Corona


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-2845:
-

Assignee: paul cannon

/baffled

 Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
 --

 Key: CASSANDRA-2845
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2845
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0, 0.8.1
 Environment: Default install of Ubuntu 11.04
Reporter: Steve Corona
Assignee: paul cannon


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yang updated CASSANDRA-2843:
-

Attachment: (was: fast_cf.diff)

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yang updated CASSANDRA-2843:
-

Attachment: (was: b.tar.gz)

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yang updated CASSANDRA-2843:
-

Attachment: fast_cf_081_trunk.diff



 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: fast_cf_081_trunk.diff



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058645#comment-13058645
 ] 

Yang Yang commented on CASSANDRA-2843:
--

thanks Sylvain. 

I changed the patch to be based on the current svn trunk.

(sorry, the last attempt was based on the 0.8.0-rc1 tarball; I did not know how 
to include a new file in diff -uw -r, so I had to include FastColumnFamily.java 
in a tarball)

sorry, the last SortedSet change was a typo... I once changed SortedSet to Set 
when I tried to use the cheaper HashMap, but later removed it when I switched to 
the array.

a lot of the FastColumnFamily methods are not implemented now, but the basic 
functionality is there to demonstrate the idea

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: fast_cf_081_trunk.diff



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2843) better performance on long row read

2011-07-01 Thread Yang Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058652#comment-13058652
 ] 

Yang Yang commented on CASSANDRA-2843:
--

right now, design-wise, the thing I'm most unsure about is where to properly 
inject the returnCF.

also, on a bigger scale, the multiple levels of
collateIterator
reducingIterator
ColumnFamily
Table.getRow()

could probably be looked at from a more holistic view, so that fewer internal 
conversions are done. my patch makes a small attempt at this, but probably more 
can be done: for example, getRow() converts the result of 
CFS.getSortedColumns() into another List by thriftifyColumns(). instead of a 
list, we may just let FastColumnFamily pass along the original iterators, and 
thriftify directly from the iterator instead of going through 
FastColumnFamily.columns_array. this time saving could be small, though, since 
the array is already very cheap.
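
A generic sketch of that single-pass idea (Converter here is a hypothetical 
stand-in for the existing thriftifyColumns() conversion, not a real Cassandra 
interface):
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class Thriftify
{
    // Converter stands in for the existing per-column conversion done by
    // thriftifyColumns(); it is a placeholder, not a real Cassandra interface.
    public interface Converter<IN, OUT>
    {
        OUT convert(IN column);
    }

    // Walk the column iterator once and build the output list directly,
    // instead of materializing an intermediate collection first.
    public static <IN, OUT> List<OUT> thriftify(Iterator<IN> columns, Converter<IN, OUT> converter)
    {
        List<OUT> result = new ArrayList<OUT>();
        while (columns.hasNext())
            result.add(converter.convert(columns.next()));
        return result;
    }
}
{code}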



 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Attachments: fast_cf_081_trunk.diff



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2842) Hive JDBC connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded

2011-07-01 Thread Rick Shaw (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058658#comment-13058658
 ] 

Rick Shaw commented on CASSANDRA-2842:
--

I took a quick look at the Hive sources and I believe you will find that the
Hive driver suffers from this defect as well. So if you reversed the order, I
think it would be the Hive driver that throws an exception rather than
deferring to the next driver in the chain of loaded drivers (C*).
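
For reference, the cooperative JDBC pattern is for connect() to return null, rather than throw, for a URL the driver does not accept, so DriverManager can keep walking the registered drivers. A minimal illustration follows (class and prefix names are hypothetical; the actual change to the C* driver appears in the svn commit further down this digest):

{code}
// Illustrative only: the cooperative pattern each driver should follow.
import java.sql.Connection;
import java.sql.SQLException;

public abstract class CooperativeDriver implements java.sql.Driver
{
    private static final String PREFIX = "jdbc:example:"; // each driver's own prefix

    public boolean acceptsURL(String url) throws SQLException
    {
        return url != null && url.startsWith(PREFIX);
    }

    public Connection connect(String url, java.util.Properties info) throws SQLException
    {
        if (!acceptsURL(url))
            return null; // defer to the next registered driver instead of throwing
        return doConnect(url, info);
    }

    protected abstract Connection doConnect(String url, java.util.Properties info) throws SQLException;
}
{code}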

 Hive JDBC connections fail with InvalidUrlException when both the C* and Hive 
 JDBC drivers are loaded
 -

 Key: CASSANDRA-2842
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2842
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.0
Reporter: Cathy Daw
Assignee: Rick Shaw
Priority: Trivial
 Fix For: 1.0

 Attachments: pass-if-not-right-driver-v1.txt


 Hive connections fail with InvalidUrlException when both the C* and Hive JDBC 
 drivers are loaded, and it seems the URL is being interpreted as a C* url.
 {code}
   Caused an ERROR
 [junit] Invalid connection url:jdbc:hive://127.0.0.1:1/default. 
 should start with jdbc:cassandra
 [junit] org.apache.cassandra.cql.jdbc.InvalidUrlException: Invalid 
 connection url:jdbc:hive://127.0.0.1:1/default. should start with 
 jdbc:cassandra
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:90)
 [junit]   at java.sql.DriverManager.getConnection(DriverManager.java:582)
 [junit]   at java.sql.DriverManager.getConnection(DriverManager.java:185)
 [junit]   at 
 com.datastax.bugRepros.repro_connection_error.test1_runHiveBeforeJdbc(repro_connection_error.java:34)
 {code}
 *Code Snippet: intended to illustrate the connection issues* 
 * Copy file to test directory
 * Change package declaration
 * run:  ant test -Dtest.name=repro_conn_error
 {code}
 package com.datastax.bugRepros;
 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.SQLException;
 import java.util.Enumeration;
 import org.junit.Test;
 public class repro_conn_error
 {
 @Test
 public void jdbcConnectionError() throws Exception
 {
 // Create Hive JDBC Connection - will succeed if
 try
 {
 // Uncomment loading C* driver to reproduce bug
 Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");

 // Load Hive driver and connect
 Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
 Connection hiveConn =
 DriverManager.getConnection("jdbc:hive://127.0.0.1:1/default", "", "");
 hiveConn.close();
 System.out.println("successful hive connection");
 } catch (SQLException e) {
 System.out.println("unsuccessful hive connection");
 e.printStackTrace();
 }

 // Create C* JDBC Connection
 try
 {
 Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
 Connection jdbcConn =
 DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/default");
 jdbcConn.close();
 System.out.println("successful c* connection");
 } catch (SQLException e) {
 System.out.println("unsuccessful c* connection");
 e.printStackTrace();
 }

 // Print out all loaded JDBC drivers.
 Enumeration d = java.sql.DriverManager.getDrivers();

 while (d.hasMoreElements()) {
 Object driverAsObject = d.nextElement();
 System.out.println("JDBC driver=" + driverAsObject);
 }
 }
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops

2011-07-01 Thread Melvin Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058684#comment-13058684
 ] 

Melvin Wang commented on CASSANDRA-2819:


How about adding the creation timestamp to the header of the message? MDT is
executed almost immediately after it is created, so the construction time of
MDT is too close to the checkpoint we have in run(). I am just concerned about
the effectiveness of the current logic in MDT, although I am not sure about the
consequences of adding 4 bytes to every message we create.
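
A hypothetical sketch of the idea, not the actual MessagingService code: stamp the message at construction time and drop it if it has already exceeded the rpc timeout by the time the delivery task runs. The class and method names are illustrative.

{code}
// Hypothetical sketch only. The timestamp would ride in the message header
// (the comment above suggests 4 bytes; a full millisecond clock is 8).
public final class TimestampedMessage
{
    private final long constructedAt = System.currentTimeMillis();
    private final Runnable payload;

    public TimestampedMessage(Runnable payload)
    {
        this.payload = payload;
    }

    /** Returns true if the message was processed, false if it was dropped as stale. */
    public boolean deliver(long rpcTimeoutMillis)
    {
        if (System.currentTimeMillis() - constructedAt > rpcTimeoutMillis)
            return false; // already older than the timeout; drop rather than do wasted work
        payload.run();
        return true;
    }
}
{code}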

 Split rpc timeout for read and write ops
 

 Key: CASSANDRA-2819
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2819
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Stu Hood
Assignee: Melvin Wang
 Fix For: 1.0

 Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff


 Given the vastly different latency characteristics of reads and writes, it 
 makes sense for them to have independent rpc timeouts internally.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2847) Nullpointer Exception in get_range_slices

2011-07-01 Thread Thibaut (JIRA)
Nullpointer Exception in get_range_slices
-

 Key: CASSANDRA-2847
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2847
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
Reporter: Thibaut
Priority: Critical


Hi,

we upgraded our test cluster from 0.7.* to 0.8.1. We ran nodetool scrub on
each node, and then nodetool repair (repair might not have finished yet). We
also upgraded to hector 0.8.1.

We tried to run our application and get_range_slices fails with the following 
error:

ERROR [pool-2-thread-15] 2011-07-01 20:15:46,224 Cassandra.java (line 3210) 
Internal error processing get_range_slices
java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:298)
at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:406)
at 
org.apache.cassandra.service.RowRepairResolver.maybeScheduleRepairs(RowRepairResolver.java:103)
at 
org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:120)
at 
org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:85)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:715)
at 
org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
at 
org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)

2011-07-01 Thread Steve Corona (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058706#comment-13058706
 ] 

Steve Corona commented on CASSANDRA-2845:
-

Okay, so as it turns out, the original problem is different from what I
thought. My dpkg solution was just skirting around the real issue (since dpkg
doesn't force you to install all of the recommended dependencies).

It's libjna-java (3.2.4-2ubuntu2) that's really causing the issue. The
cassandra apt repository is pulling it in as a dependency and, for whatever
reason, it sucks up all of the CPU when it runs with cassandra. I don't know if
it's a matter of libjna being broken in 11.04 or just that it doesn't play nice
with Cassandra.

FWIW, CASSANDRA-2803 mentions deb packages and libjna; not sure what role that
plays in this.

Here is my current workaround:

mkdir -p /usr/sbin/
cat > /usr/sbin/policy-rc.d <<EOF
#!/bin/sh
exit 101
EOF
chmod 755 /usr/sbin/policy-rc.d

apt-get install cassandra
apt-get remove libjna-java
service cassandra start


 Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
 --

 Key: CASSANDRA-2845
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2845
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0, 0.8.1
 Environment: Default install of Ubuntu 11.04
Reporter: Steve Corona
Assignee: paul cannon

 Step 1. Boot up a brand new, default Ubuntu 11.04 Server install
 Step 2. Install Cassandra from Apache APT Respository (deb 
 http://www.apache.org/dist/cassandra/debian 08x main)
 Step 3. apt-get install cassandra, as soon as it cassandra starts it will 
 freeze the machine
 What's happening is that as soon as cassandra starts up it immediately sucks 
 up 100% of CPU and starves the machine. This effectively bricks the box until 
 you boot into single user mode and disable the cassandra init.d script.
 Under htop, the CPU usage shows up as system cpu, not user.
 The machine I'm testing this on is a Quad-Core Sandy Bridge w/ 16GB of
 Memory, so it's not a system resource issue. I've also tested this on
 completely different hardware (Dual 64-Bit Xeons and an AMD X4) and it has the
 same effect.
 Ubuntu 10.10 does not exhibit the same issue. I have only tested 0.8 and 
 0.8.1.
 root@cassandra01:/# java -version
 java version "1.6.0_22"
 OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
 OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
 root@cassandra:/# uname -a
 Linux cassandra01 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 /proc/cpu
 Intel(R) Xeon(R) CPU E31270 @ 3.40GHz
 /proc/meminfo
 MemTotal:   16459776 kB
 MemFree:14190708 kB

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058715#comment-13058715
 ] 

Jonathan Ellis commented on CASSANDRA-2819:
---

Let's keep the scope of this ticket to splitting the rpc timeout. We can open
another for making request dropping more accurate/aggressive.

 Split rpc timeout for read and write ops
 

 Key: CASSANDRA-2819
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2819
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Stu Hood
Assignee: Melvin Wang
 Fix For: 1.0

 Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff


 Given the vastly different latency characteristics of reads and writes, it 
 makes sense for them to have independent rpc timeouts internally.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2252) off-heap memtables

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058733#comment-13058733
 ] 

Jonathan Ellis commented on CASSANDRA-2252:
---

JNA 3.3.0 has been released including the http://java.net/jira/browse/JNA-179 
fixes.

 off-heap memtables
 --

 Key: CASSANDRA-2252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
 Fix For: 1.0

 Attachments: 0001-add-MemtableAllocator.txt, 
 0002-add-off-heap-MemtableAllocator-support.txt, merged-2252.tgz

   Original Estimate: 0.4h
  Remaining Estimate: 0.4h

 The memtable design practically actively fights Java's GC design.  Todd 
 Lipcon gave a good explanation over on HBASE-3455.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2847) Nullpointer Exception in get_range_slices

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058736#comment-13058736
 ] 

Jonathan Ellis commented on CASSANDRA-2847:
---

Sounds like CASSANDRA-2823.  Can you try svn head of the 0.8 branch?

 Nullpointer Exception in get_range_slices
 -

 Key: CASSANDRA-2847
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2847
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
Reporter: Thibaut
Priority: Critical

 Hi,
 we upgraded our test cluster from 0.7.* to 0.8.1. We did run nodetool scrub 
 on each node, and then nodetool repair (Repair might not have finished so 
 far). We also upgradet to hector 0.8.1 
 We tried to run our application and get_range_slices fails with the following 
 error:
 ERROR [pool-2-thread-15] 2011-07-01 20:15:46,224 Cassandra.java (line 3210) 
 Internal error processing get_range_slices
 java.lang.NullPointerException
 at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:298)
 at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:406)
 at 
 org.apache.cassandra.service.RowRepairResolver.maybeScheduleRepairs(RowRepairResolver.java:103)
 at 
 org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:120)
 at 
 org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:85)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at 
 org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:715)
 at 
 org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1142046 - in /cassandra/drivers/java: CHANGES.txt src/org/apache/cassandra/cql/jdbc/CassandraDriver.java

2011-07-01 Thread jbellis
Author: jbellis
Date: Fri Jul  1 19:49:33 2011
New Revision: 1142046

URL: http://svn.apache.org/viewvc?rev=1142046view=rev
Log:
cooperate with other jdbc drivers
patch by Rick Shaw; reviewed by jbellis for CASSANDRA-2842

Modified:
cassandra/drivers/java/CHANGES.txt

cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java

Modified: cassandra/drivers/java/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/drivers/java/CHANGES.txt?rev=1142046r1=1142045r2=1142046view=diff
==
--- cassandra/drivers/java/CHANGES.txt (original)
+++ cassandra/drivers/java/CHANGES.txt Fri Jul  1 19:49:33 2011
@@ -1,2 +1,3 @@
 1.0.4
   * improve JDBC spec compliance (CASSANDRA-2720, 2754)
+  * cooperate with other jdbc drivers (CASSANDRA-2842)

Modified: 
cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java
URL: 
http://svn.apache.org/viewvc/cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java?rev=1142046r1=1142045r2=1142046view=diff
==
--- 
cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java 
(original)
+++ 
cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java 
Fri Jul  1 19:49:33 2011
@@ -20,6 +20,8 @@
  */
 package org.apache.cassandra.cql.jdbc;
 
+import static org.apache.cassandra.cql.jdbc.Utils.*;
+
 import java.sql.Connection;
 import java.sql.Driver;
 import java.sql.DriverManager;
@@ -39,12 +41,7 @@ import java.util.Properties;
 
 /** The Constant MINOR_VERSION. */
 private static final int MINOR_VERSION = 0;
-
-private static final String BAD_URL = "Invalid connection url: '%s'. it should start with 'jdbc:cassandra:'";
 
-/** The ACCEPT s_ url. */
-public static final String ACCEPTS_URL = "jdbc:cassandra:";
-
 //private static final Logger logger = 
LoggerFactory.getLogger(CassandraDriver.class); 
 
 static
@@ -66,7 +63,7 @@ import java.util.Properties;
  */
 public boolean acceptsURL(String url) throws SQLException
 {
-return url.startsWith(ACCEPTS_URL);
+return url.startsWith(PROTOCOL);
 }
 
 /**
@@ -80,7 +77,7 @@ import java.util.Properties;
 }
 else
 {
-throw new 
SQLNonTransientConnectionException(String.format(BAD_URL, url));
+return null; // signal it is the wrong driver for this 
protocol:subprotocol
 }
 }
 




svn commit: r1142050 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/tools/NodeCmd.java

2011-07-01 Thread jbellis
Author: jbellis
Date: Fri Jul  1 19:54:15 2011
New Revision: 1142050

URL: http://svn.apache.org/viewvc?rev=1142050view=rev
Log:
improve nodetool compactionstats formatting
patch by Wojciech Meler; reviewed by jbellis for CASSANDRA-2844

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142050r1=1142049r2=1142050view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Fri Jul  1 19:54:15 2011
@@ -11,6 +11,7 @@
(CASSANDRA-2823)
  * Fix race in SystemTable.getCurrentLocalNodeId (CASSANDRA-2824)
  * Correctly set default for replicate_on_write (CASSANDRA-2835)
+ * improve nodetool compactionstats formatting (CASSANDRA-2844)
 
 
 0.8.1

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java?rev=1142050r1=1142049r2=1142050view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java
 Fri Jul  1 19:54:15 2011
@@ -354,26 +354,22 @@ public class NodeCmd
         completed += n;
         outs.printf("%-25s%10s%10s%15s%n", "Responses", "n/a", pending, completed);
     }
-   
+
     public void printCompactionStats(PrintStream outs)
     {
         CompactionManagerMBean cm = probe.getCompactionManagerProxy();
+        outs.println("pending tasks: " + cm.getPendingTasks());
+        if (cm.getCompactions().size() > 0)
+            outs.printf("%25s%16s%16s%16s%16s%10s%n", "compaction type", "keyspace", "column family", "bytes compacted", "bytes total", "progress");
         for (CompactionInfo c : cm.getCompactions())
         {
-            outs.println("compaction type: " + c.getTaskType());
-            outs.println("keyspace: " + c.getKeyspace());
-            outs.println("column family: " + c.getColumnFamily());
-            outs.println("bytes compacted: " + c.getBytesComplete());
-            outs.println("bytes total: " + c.getTotalBytes());
             String percentComplete = c.getTotalBytes() == 0
                                    ? "n/a"
-                                   : new DecimalFormat("#.##").format((double) c.getBytesComplete() / c.getTotalBytes() * 100) + "%";
-            outs.println("compaction progress: " + percentComplete);
-            outs.println("-");
+                                   : new DecimalFormat("0.00").format((double) c.getBytesComplete() / c.getTotalBytes() * 100) + "%";
+            outs.printf("%25s%16s%16s%16s%16s%10s%n", c.getTaskType(), c.getKeyspace(), c.getColumnFamily(), c.getBytesComplete(), c.getTotalBytes(), percentComplete);
         }
-        outs.println("pending tasks: " + cm.getPendingTasks());
     }
- 
+
     public void printColumnFamilyStats(PrintStream outs)
     {
         Map<String, List<ColumnFamilyStoreMBean>> cfstoreMap = new HashMap<String, List<ColumnFamilyStoreMBean>>();




[jira] [Updated] (CASSANDRA-2844) grep friendly nodetool compactionstats output

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2844:
--

Affects Version/s: (was: 0.8.1)
   0.8.0
Fix Version/s: 0.8.2
 Assignee: Wojciech Meler

 grep friendly nodetool compactionstats output
 -

 Key: CASSANDRA-2844
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.8.0
Reporter: Wojciech Meler
Assignee: Wojciech Meler
Priority: Trivial
 Fix For: 0.8.2

 Attachments: comapctionstats.patch


 output from nodetool compactionstats is quite hard to parse with text tools - 
 it would be nice to have one line per compaction

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2844) grep friendly nodetool compactionstats output

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2844.
---

Resolution: Fixed
  Reviewer: jbellis

reformatted and committed.  thanks!

 grep friendly nodetool compactionstats output
 -

 Key: CASSANDRA-2844
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.8.0
Reporter: Wojciech Meler
Assignee: Wojciech Meler
Priority: Trivial
 Fix For: 0.8.2

 Attachments: comapctionstats.patch


 output from nodetool compactionstats is quite hard to parse with text tools - 
 it would be nice to have one line per compaction

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-803) remove PropertyConfigurator from CassandraDaemon

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058751#comment-13058751
 ] 

Jonathan Ellis commented on CASSANDRA-803:
--

bq. I'd be happy to rip out all the log4j specific stuff and replace it with 
slf4j if that patch would be used.

Sure, as long as the log4j-based defaults continue to work.

Related: CASSANDRA-2383

 remove PropertyConfigurator from CassandraDaemon
 

 Key: CASSANDRA-803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-803
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Jesse McConnell

 In order for users to make use of the EmbeddedCassandraService for unit 
 testing they need to have a dependency declared on log4j.  
 It would be nice if we could use the log4j-over-slf4j artifact to bridge this 
 requirement for those of us using slf4j.  
 http://www.slf4j.org/legacy.html#log4j-over-slf4j
 Currently it errors with the direct usage of the PropertyConfigurator in 
 o.a.c.thrift.CassandraDaemon.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2786) After a minor compaction, deleted key-slices are visible again

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058755#comment-13058755
 ] 

Jonathan Ellis commented on CASSANDRA-2786:
---

Nit: wouldn't it be cleaner to just pass gcBefore, rather than the entire
controller, to the EchoedRow constructor?
+1 otherwise.
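
For illustration, the suggestion amounts to narrowing the constructor's dependency to the one value it needs; the sketch below uses simplified, hypothetical signatures rather than the actual patch code.

{code}
// Illustrative sketch of the review suggestion (hypothetical, simplified signatures).
class EchoedRow
{
    private final int gcBefore;

    // suggested: depend only on the value actually used
    EchoedRow(int gcBefore)
    {
        this.gcBefore = gcBefore;
    }

    // instead of: EchoedRow(CompactionController controller) { this.gcBefore = controller.gcBefore; }
}
{code}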

 After a minor compaction, deleted key-slices are visible again
 --

 Key: CASSANDRA-2786
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2786
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
 Environment: Reproduced on single Cassandra node (CentOS 5.5)
 Reproduced on single Cassandra node (Windows Server 2008)
Reporter: rene kochen
Assignee: Sylvain Lebresne
 Fix For: 0.8.1, 0.8.2

 Attachments: 0001-Fix-wrong-purge-of-deleted-cf.patch, 
 2786_part2.patch, CassandraIssue.zip, CassandraIssueJava.zip


 After a minor compaction, deleted key-slices are visible again.
 Steps to reproduce:
 1) Insert a row named test.
 2) Insert 50 rows. During this step, row test is included in a major 
 compaction:
file-1, file-2, file-3 and file-4 compacted to file-5 (includes test).
 3) Delete row named test.
 4) Insert 50 rows. During this step, row test is included in a minor 
 compaction:
file-6, file-7, file-8 and file-9 compacted to file-10 (should include 
 tombstoned test).
 After step 4, row test is live again.
 Test environment:
 Single node with empty database.
 Standard configured super-column-family (I see this behavior with several
 gc_grace settings, big and small values):
 create column family Customers with column_type = 'Super' and comparator =
 'BytesType';
 In Cassandra 0.7.6 I observe the expected behavior, i.e. after step 4, the 
 row is still deleted.
 I've included a .NET program to reproduce the problem. I will add a Java 
 version later on.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058757#comment-13058757
 ] 

Jonathan Ellis commented on CASSANDRA-2388:
---

+1 to the CFRR changes.

It wasn't immediately clear to me what the CFIF changes are doing; can you elaborate?

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2844) grep friendly nodetool compactionstats output

2011-07-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058765#comment-13058765
 ] 

Hudson commented on CASSANDRA-2844:
---

Integrated in Cassandra-0.8 #201 (See 
[https://builds.apache.org/job/Cassandra-0.8/201/])
improve nodetool compactionstats formatting
patch by Wojciech Meler; reviewed by jbellis for CASSANDRA-2844

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1142050
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java


 grep friendly nodetool compactionstats output
 -

 Key: CASSANDRA-2844
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 0.8.0
Reporter: Wojciech Meler
Assignee: Wojciech Meler
Priority: Trivial
 Fix For: 0.8.2

 Attachments: comapctionstats.patch


 output from nodetool compactionstats is quite hard to parse with text tools - 
 it would be nice to have one line per compaction

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058767#comment-13058767
 ] 

Jonathan Ellis commented on CASSANDRA-2753:
---

Is there a reason not to have the max timestamp code in an IColumn method?

 Capture the max client timestamp for an SSTable
 ---

 Key: CASSANDRA-2753
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2753
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Alan Liang
Assignee: Alan Liang
Priority: Minor
 Attachments: 
 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch, 
 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 
 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2804) expose dropped messages, exceptions over JMX

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2804:
--

Attachment: 2804-v2.txt

v2 adds the recently-dropped mbean as in Ryan's patch, and changes
logDroppedMessages to log the total counts so as not to interfere with it.
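
A hypothetical sketch of the kind of MBean surface being discussed; the interface and method names are illustrative and do not correspond to the attached patches.

{code}
// Illustrative only: expose per-verb dropped-message counts over JMX,
// both cumulative totals and counts since the last read.
import java.util.Map;

public interface DroppedMessagesMBean
{
    /** Total dropped messages per verb since the node started. */
    Map<String, Long> getDroppedMessages();

    /** Dropped messages per verb since the last time this attribute was read. */
    Map<String, Long> getRecentlyDroppedMessages();
}
{code}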

 expose dropped messages, exceptions over JMX
 

 Key: CASSANDRA-2804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2804
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2804-v2.txt, 2804.txt, 
 twttr-cassandra-0.8-counts-resync-droppedmsg-metric.diff


 Patch against 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2804) expose dropped messages, exceptions over JMX

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2804:
--

Fix Version/s: (was: 0.7.7)

Targeting 0.8+ now since we're changing logDroppedMessages behavior.

 expose dropped messages, exceptions over JMX
 

 Key: CASSANDRA-2804
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2804
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2804-v2.txt, 2804.txt, 
 twttr-cassandra-0.8-counts-resync-droppedmsg-metric.diff


 Patch against 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-01 Thread Jon Hermes (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Hermes updated CASSANDRA-2846:
--

Reviewer: jhermes  (was: bcoverston)

 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8 I think update 
 keyspace is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second describe keyspace supposed to say 
 replication_factor:2?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep 
 strategy:SimpleStrategy{}durable_writes: true to Testrep 
 strategy:SimpleStrategy{}durable_writes: true
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-01 Thread Jon Hermes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058855#comment-13058855
 ] 

Jon Hermes commented on CASSANDRA-2846:
---

-1, doesn't update strategy_options for KS's that already have SimpleStrategy.
repro:

{noformat}
start 1-node local
stress -o insert -n 1 (create Keyspace1 with SS and RF1)
cli:
  [] update keyspace Keyspace1 with strategy_options=[{replication_factor:2}];
{noformat}

Creating a new keyspace (Keyspace2, with default NTS and [{DC1:1}]) and then
running `update keyspace Keyspace2 with
placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
strategy_options=[{replication_factor:2}];` does work, however.

 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8 I think update 
 keyspace is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second describe keyspace supposed to to say 
 replication_factor:2?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep 
 strategy:SimpleStrategy{}durable_writes: true to Testrep 
 strategy:SimpleStrategy{}durable_writes: true
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058856#comment-13058856
 ] 

Jonathan Ellis commented on CASSANDRA-2846:
---

Jonas's test case works for me.

 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8 I think update 
 keyspace is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second describe keyspace supposed to to say 
 replication_factor:2?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep 
 strategy:SimpleStrategy{}durable_writes: true to Testrep 
 strategy:SimpleStrategy{}durable_writes: true
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1125:
--

Attachment: 1125-formatted.txt

Looks good to me for the most part.  (Attaching reformatted version.)

One part though I'm not 100% sure about -- we're using KeyRange for 
start-exclusive ranges, when the Thrift API always uses it for start-inclusive.

I'd be more comfortable with any of:
- using a Pair<String, String>
- using a new one-off class (see the sketch after this list)
- using KeyRange but with tokens (which Thrift also uses for start-exclusive)
- using a Range object directly (also requires tokens)
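
For example, the one-off class option could be as small as the following hypothetical holder, which makes the start-exclusive semantics explicit instead of overloading KeyRange; the class and field names are illustrative only.

{code}
// Hypothetical one-off holder; the field names carry the range semantics.
public final class ExclusiveStartKeyRange
{
    public final String startKeyExclusive; // rows strictly after this key
    public final String endKeyInclusive;   // up to and including this key

    public ExclusiveStartKeyRange(String startKeyExclusive, String endKeyInclusive)
    {
        this.startKeyExclusive = startKeyExclusive;
        this.endKeyInclusive = endKeyInclusive;
    }
}
{code}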

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058859#comment-13058859
 ] 

Jonathan Ellis commented on CASSANDRA-1125:
---

(And I'd be fine with putting this in 0.8.x.)

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-01 Thread Jon Hermes (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058855#comment-13058855
 ] 

Jon Hermes edited comment on CASSANDRA-2846 at 7/1/11 11:21 PM:


--1, doesn't update strategy_options for KS's that already have SimpleStrategy.-

+1, it's good.

  was (Author: jhermes):
-1, doesn't update strategy_options for KS's that already have 
SimpleStrategy.
repro:

{noformat}
start 1-node local
stress -o insert -n 1 (create Keyspace1 with SS and RF1)
cli:
  [] update keyspace Keyspace1 with strategy_options=[{replication_factor:2}];
{noformat}

Creating a new keyspace (Keyspace2, with default NTS and [{DC1:1}], then 
`update keyspace Keyspace2 with 
placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and 
strategy_options=[{replication_factor:2}];` does work, however.
  
 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8 I think update 
 keyspace is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second describe keyspace supposed to to say 
 replication_factor:2?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep 
 strategy:SimpleStrategy{}durable_writes: true to Testrep 
 strategy:SimpleStrategy{}durable_writes: true
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira