[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: fast_cf.diff diff file

better performance on long row read
---
Key: CASSANDRA-2843 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843 Project: Cassandra Issue Type: New Feature Reporter: Yang Yang Attachments: fast_cf.diff

Currently, if a row contains 1000 columns, the run time becomes considerably slow: my test of a row with 3000 columns (standard, regular), each with 8 bytes in name and 40 bytes in value, takes about 16ms. This is all running in memory; no disk read is involved. Through debugging we can find most of this time is spent in:

[Wall Time] org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
[Wall Time] org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
[Wall Time] org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)

ColumnFamily.addColumn() is slow because it inserts into an internal ConcurrentSkipListMap that maps column names to values. This structure is slow for two reasons: it needs to do synchronization, and it needs to maintain the more complex structure of a map. But if we look at the whole read path, Thrift already defines the read output to be List<ColumnOrSuperColumn>, so it does not make sense to use an expensive map data structure in the interim and then finally convert it to a list.
On the synchronization side: since the returned CF is never going to be shared or modified by other threads, we know the access is always single-threaded, so no synchronization is needed. But these two features are indeed needed for ColumnFamily in other cases, particularly writes. So we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(): getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper.

The provided patch is for demonstration for now; I will work on it further once we agree on the general direction. CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is provided. The main work is to let the FastColumnFamily use an array for internal storage. At first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in sorted order (I still have an issue to resolve between descending and ascending order, but ascending works). So the current logic is simply to compare the new column against the last column in the array: if the names are not equal, append; if equal, reconcile.

Slight temporary hacks are made on getTopLevelColumnFamily so that we have two flavors of the method, one accepting a returnCF; we could definitely think about a better way to provide this returnCF. This patch compiles fine; no tests are provided yet. But I tested it in my application, and the performance improvement is dramatic: it offers about a 50% reduction in read time in the 3000-column case.

thanks
Yang

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
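[Editor's note] The append-or-reconcile logic described above can be sketched as follows. This is an illustrative stand-in, not the attached fast_cf.diff: the Column class, its String-based comparison, and the keep-the-newer-timestamp reconcile rule are simplified assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the FastColumnFamily idea: since columns arrive in
// ascending sorted order, addColumn() only ever compares against the last
// element -- append on a new name, reconcile on a repeated name. No skip
// list, no locking, O(1) per column.
class FastColumnList {
    static final class Column {
        final String name;
        final String value;
        final long timestamp;
        Column(String name, String value, long timestamp) {
            this.name = name; this.value = value; this.timestamp = timestamp;
        }
    }

    private final List<Column> columns = new ArrayList<>();

    void addColumn(Column c) {
        if (!columns.isEmpty()) {
            Column last = columns.get(columns.size() - 1);
            int cmp = last.name.compareTo(c.name);
            if (cmp > 0)
                throw new IllegalStateException("columns must arrive in ascending order");
            if (cmp == 0) {
                // same name: reconcile, keeping the newer write
                if (c.timestamp > last.timestamp)
                    columns.set(columns.size() - 1, c);
                return;
            }
        }
        columns.add(c); // new name: plain array append
    }

    List<Column> getSortedColumns() { return columns; }
}
```

Because the input invariant does all the work, the structure stays sorted without any per-insert search, which is where the skip-list cost goes away.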
[jira] [Created] (CASSANDRA-2843) better performance on long row read
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: b.tar.gz just untar this file into the 0.8.0-rc1 source tree, then compile

better performance on long row read
---
Key: CASSANDRA-2843 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843 Project: Cassandra Issue Type: New Feature Reporter: Yang Yang Attachments: b.tar.gz, fast_cf.diff
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058256#comment-13058256 ] Sylvain Lebresne commented on CASSANDRA-2843: - The usual way to do things is to attach a patch. But I see that your diff doesn't include FastColumnFamily. The patch also includes instrumentation and a few unrelated changes (a commented-out method, a change from SortedSet to Set in an unrelated method signature) that would ideally be removed. It would be great to have this rebased to the current 0.8 branch too.
[jira] [Created] (CASSANDRA-2844) grep friendly nodetool compactionstats output
grep friendly nodetool compactionstats output - Key: CASSANDRA-2844 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 0.8.1 Reporter: Wojciech Meler Priority: Trivial

Output from nodetool compactionstats is quite hard to parse with text tools - it would be nice to have one line per compaction.
[jira] [Updated] (CASSANDRA-2844) grep friendly nodetool compactionstats output
[ https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wojciech Meler updated CASSANDRA-2844: -- Attachment: comapctionstats.patch patch for 0.8.1 that does the job
[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops
[ https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058539#comment-13058539 ] Jonathan Ellis commented on CASSANDRA-2819: --- That is the wrong place for it because Message is used both to send and to receive. MDT creation time is effectively identical.

Split rpc timeout for read and write ops
Key: CASSANDRA-2819 URL: https://issues.apache.org/jira/browse/CASSANDRA-2819 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Stu Hood Assignee: Melvin Wang Fix For: 1.0 Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff

Given the vastly different latency characteristics of reads and writes, it makes sense for them to have independent rpc timeouts internally.
[jira] [Created] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
--
Key: CASSANDRA-2845 URL: https://issues.apache.org/jira/browse/CASSANDRA-2845 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1, 0.8.0 Environment: Default install of Ubuntu 11.04 Reporter: Steve Corona

Step 1. Boot up a brand new, default Ubuntu 11.04 Server install
Step 2. Install Cassandra from the Apache APT Repository (deb http://www.apache.org/dist/cassandra/debian 08x main)
Step 3. apt-get install cassandra; as soon as cassandra starts it will freeze the machine

What's happening is that as soon as cassandra starts up it immediately sucks up 100% of CPU and starves the machine. This effectively bricks the box until you boot into single user mode and disable the cassandra init.d script. Under htop, the CPU usage shows up as system cpu, not user. The machine I'm testing this on is a Quad-Core Sandy Bridge w/ 16GB of Memory, so it's not a system resource issue. I've also tested this on completely different hardware (Dual 64-Bit Xeons AMD X4) and it has the same effect. Ubuntu 10.10 does not exhibit the same issue. I have only tested 0.8 and 0.8.1.

root@cassandra01:/# java -version
java version 1.6.0_22
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

root@cassandra:/# uname -a
Linux cassandra01 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

/proc/cpu Intel(R) Xeon(R) CPU E31270 @ 3.40GHz
/proc/meminfo MemTotal: 16459776 kB MemFree: 14190708 kB
[jira] [Created] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
Changing replication_factor using update keyspace not working
---
Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström

Unless I've misunderstood the new way to do this with 0.8, I think update keyspace is broken:
{code}
[default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];
37f70d40-a3e9-11e0--242d50cf1fbf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] describe keyspace Test;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:1]
  Column Families:
[default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}];
489fe220-a3e9-11e0--242d50cf1fbf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] describe keyspace Test;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:1]
  Column Families:
{code}
Isn't the second describe keyspace supposed to say replication_factor:2?

Relevant bits from system.log:
{code}
Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true
UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations
{code}
[jira] [Updated] (CASSANDRA-2842) Hive JDBC connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded
[ https://issues.apache.org/jira/browse/CASSANDRA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Shaw updated CASSANDRA-2842: - Attachment: pass-if-not-right-driver-v1.txt This test has been run against v1.0.3 of the driver. In that version the {{connect(...)}} method of {{CassandraDriver}} is called with an unsupported protocol:subprotocol in its URL. It recognizes that it is not the proper protocol but erroneously throws an exception rather than returning null to the caller to indicate that it cannot handle the URL, so the caller can move on. The patch is based on the current trunk of {{/drivers}} (v1.0.4).

Hive JDBC connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded
-
Key: CASSANDRA-2842 URL: https://issues.apache.org/jira/browse/CASSANDRA-2842 Project: Cassandra Issue Type: Bug Reporter: Cathy Daw Priority: Trivial Attachments: pass-if-not-right-driver-v1.txt

Hive connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded, and it seems the URL is being interpreted as a C* url.
{code}
Caused an ERROR
[junit] Invalid connection url:jdbc:hive://127.0.0.1:1/default. should start with jdbc:cassandra
[junit] org.apache.cassandra.cql.jdbc.InvalidUrlException: Invalid connection url:jdbc:hive://127.0.0.1:1/default.
should start with jdbc:cassandra
[junit] at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:90)
[junit] at java.sql.DriverManager.getConnection(DriverManager.java:582)
[junit] at java.sql.DriverManager.getConnection(DriverManager.java:185)
[junit] at com.datastax.bugRepros.repro_connection_error.test1_runHiveBeforeJdbc(repro_connection_error.java:34)
{code}

*Code Snippet: intended to illustrate the connection issues*
* Copy file to test directory
* Change package declaration
* run: ant test -Dtest.name=repro_conn_error
{code}
package com.datastax.bugRepros;

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Enumeration;
import org.junit.Test;

public class repro_conn_error {
    @Test
    public void jdbcConnectionError() throws Exception {
        // Create Hive JDBC Connection - will succeed if the C* driver is not loaded
        try {
            // Uncomment loading the C* driver to reproduce the bug
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            // Load Hive driver and connect
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection hiveConn = DriverManager.getConnection("jdbc:hive://127.0.0.1:1/default", "", "");
            hiveConn.close();
            System.out.println("successful hive connection");
        } catch (SQLException e) {
            System.out.println("unsuccessful hive connection");
            e.printStackTrace();
        }

        // Create C* JDBC Connection
        try {
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            Connection jdbcConn = DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/default");
            jdbcConn.close();
            System.out.println("successful c* connection");
        } catch (SQLException e) {
            System.out.println("unsuccessful c* connection");
            e.printStackTrace();
        }

        // Print out all loaded JDBC drivers.
        Enumeration d = java.sql.DriverManager.getDrivers();
        while (d.hasMoreElements()) {
            Object driverAsObject = d.nextElement();
            System.out.println("JDBC driver=" + driverAsObject);
        }
    }
}
{code}
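[Editor's note] The contract Rick's patch restores is the standard java.sql.Driver one: connect() must return null, not throw, for a URL the driver does not recognize, so DriverManager can move on to the next registered driver. A minimal sketch of a driver honoring that contract; the class name and "jdbc:example:" prefix are invented for illustration:

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical driver showing the java.sql.Driver contract: connect() must
// return null for a foreign URL so DriverManager can try the next driver,
// instead of throwing and aborting the whole connection attempt.
class ExampleDriver implements Driver {
    private static final String PREFIX = "jdbc:example:";

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url))
            return null; // not our URL: decline quietly, never throw
        throw new SQLException("actual connection logic omitted in this sketch");
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override public int getMajorVersion() { return 1; }
    @Override public int getMinorVersion() { return 0; }
    @Override public boolean jdbcCompliant() { return false; }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
}
```

With this behavior, loading the C* driver before the Hive driver would be harmless: the C* driver would decline the jdbc:hive URL and DriverManager would hand it to the Hive driver.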
[jira] [Commented] (CASSANDRA-2739) Cannot recover SSTable with version f (current version g) during the node decommission.
[ https://issues.apache.org/jira/browse/CASSANDRA-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058568#comment-13058568 ] Thibaut commented on CASSANDRA-2739: Running into the same problem after upgrading our test cluster from 0.7.* (don't know what the exact version number was) to 0.8.1. Do I have to run scrub on each node, and everything will be fine afterwards? We plan to upgrade our production cluster soon and can't afford to lose data there.

Cannot recover SSTable with version f (current version g) during the node decommission.
---
Key: CASSANDRA-2739 URL: https://issues.apache.org/jira/browse/CASSANDRA-2739 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.0 Environment: centos, cassandra 0.7.4 upgrade to 0.8.0-final. Reporter: Dikang Gu Labels: decommission, version

I upgraded the 4-node cassandra 0.7.4 cluster to 0.8.0-final. Then I ran bin/nodetool decommission on one node; the decommission hangs there and I get the following errors on the other nodes.

ERROR [Thread-55] 2011-06-03 18:02:03,500 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-55,5,main] java.lang.RuntimeException: Cannot recover SSTable with version f (current version g).
at org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
at org.apache.cassandra.db.CompactionManager.submitSSTableBuild(CompactionManager.java:1088)
at org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:108)
at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)

ERROR [Thread-56] 2011-06-03 18:02:04,285 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-56,5,main] java.lang.RuntimeException: Cannot recover SSTable with version f (current version g).
at org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
at org.apache.cassandra.db.CompactionManager.submitSSTableBuild(CompactionManager.java:1088)
at org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:108)
at org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
[jira] [Updated] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2846: -- Attachment: 2846.txt
// server helpfully sets deprecated replication factor when it sends a KsDef back, for older clients.
// we need to unset that on the new KsDef we create to avoid being treated as a legacy client in return.

Changing replication_factor using update keyspace not working
---
Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2846.txt
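[Editor's note] The pattern behind Jonathan's patch comment: Thrift-generated structs track whether an optional field like the deprecated replication_factor is "set", and a client that echoes the server-set field back is treated as legacy, so its strategy_options are ignored. The class below is a simplified self-contained mock of that set/unset semantics, not the actual Thrift-generated KsDef or the server's real logic:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified mock of a Thrift-style optional field. The "server" side here
// mimics the legacy-client path: if the deprecated field is set, it wins
// over strategy_options, which is exactly the CASSANDRA-2846 symptom.
class MockKsDef {
    private int replicationFactor;
    private boolean replicationFactorIsSet;
    final Map<String, String> strategyOptions = new HashMap<>();

    void setReplicationFactor(int rf) {
        replicationFactor = rf;
        replicationFactorIsSet = true;
    }

    // What the client-side fix must do before resubmitting the KsDef.
    void unsetReplicationFactor() {
        replicationFactor = 0;
        replicationFactorIsSet = false;
    }

    // Mocked server interpretation of an update request.
    int effectiveReplicationFactor() {
        if (replicationFactorIsSet)
            return replicationFactor; // legacy path: strategy_options ignored
        return Integer.parseInt(strategyOptions.getOrDefault("replication_factor", "1"));
    }
}
```

A client that copies the KsDef it received from the server must unset the deprecated field before changing strategy_options; otherwise the stale value silently wins, and describe keyspace keeps showing the old replication_factor.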
[jira] [Commented] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
[ https://issues.apache.org/jira/browse/CASSANDRA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058613#comment-13058613 ] Steve Corona commented on CASSANDRA-2845: - I actually figured this out: it's more of a cassandra packaging issue than an issue with the actual code. I extracted the cassandra-0.8.1.deb file and diff'ed all of the files with apache-cassandra-0.8.1-bin.tar.gz. I noticed that apache-cassandra-0.8.1.jar was off by a few bytes. I extracted the jar and determined that the deb file was using a different version of the following classes:

cli/CliLexer.class
cli/CliParser.class
cql/CqlLexer.class
cql/CqlParser.class

I repackaged the .deb using apache-cassandra-0.8.1.jar from the bin.tar.gz (instructions below) and it installed on Ubuntu 11.04 without a hitch. I'm not sure if the .jar/.class files used to package the deb were corrupted or are just a different/incomplete/broken version.

Poor man's .deb repackaging until it's officially fixed:

cd /tmp
mkdir work
cd work
wget http://www.fightrice.com/mirrors/apache/cassandra/0.8.1/apache-cassandra-0.8.1-bin.tar.gz
tar -zxvf apache-cassandra-0.8.1-bin.tar.gz
mkdir deb
cd deb
wget http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.1_all.deb
# need binutils to get the ar utility
sudo apt-get install binutils
ar vx cassandra_0.8.1_all.deb
tar -zxvf data.tar.gz
rm data.tar.gz
cd ./usr/share/cassandra
mv /tmp/work/apache-cassandra-0.8.1/lib/apache-cassandra-0.8.1.jar .
cd /tmp/work/deb
tar -czvf data.tar.gz etc/ usr/ var/
rm cassandra_0.8.1_all.deb
ar rc cassandra_0.8.1_all.deb debian-binary control.tar.gz data.tar.gz
sudo apt-get install openjdk-6-jdk
sudo dpkg -i cassandra_0.8.1_all.deb

Alternatively, you can use policy-rc.d to prevent the cassandra deb's post-init script from running on install and replace the broken .jar after it has been installed.
Instructions here: http://lifeonubuntu.com/how-to-prevent-server-daemons-from-starting-during-apt-get-install/

Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
--
Key: CASSANDRA-2845
URL: https://issues.apache.org/jira/browse/CASSANDRA-2845
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.8.0, 0.8.1
Environment: Default install of Ubuntu 11.04
Reporter: Steve Corona

Step 1. Boot up a brand new, default Ubuntu 11.04 Server install
Step 2. Add the Apache APT repository (deb http://www.apache.org/dist/cassandra/debian 08x main)
Step 3. apt-get install cassandra; as soon as cassandra starts, it will freeze the machine

What's happening is that as soon as cassandra starts up, it immediately sucks up 100% of CPU and starves the machine. This effectively bricks the box until you boot into single-user mode and disable the cassandra init.d script. Under htop, the CPU usage shows up as system CPU, not user. The machine I'm testing this on is a quad-core Sandy Bridge with 16GB of memory, so it's not a system resource issue. I've also tested this on completely different hardware (dual 64-bit Xeons, AMD X4) and it has the same effect. Ubuntu 10.10 does not exhibit the same issue. I have only tested 0.8 and 0.8.1.

{code}
root@cassandra01:/# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

root@cassandra:/# uname -a
Linux cassandra01 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

/proc/cpu:     Intel(R) Xeon(R) CPU E31270 @ 3.40GHz
/proc/meminfo: MemTotal: 16459776 kB  MemFree: 14190708 kB
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
[ https://issues.apache.org/jira/browse/CASSANDRA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2845:

Assignee: paul cannon

/baffled
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843:

Attachment: (was: fast_cf.diff)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843:

Attachment: (was: b.tar.gz)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843:

Attachment: fast_cf_081_trunk.diff
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058645#comment-13058645 ] Yang Yang commented on CASSANDRA-2843:

Thanks Sylvain. I changed the patch to be based on current svn trunk. (Sorry, the last attempt was based on the 080-rc1 tarball; I did not know how to include a new file in diff -uw -r, so I had to include FastColumnFamily.java in a tarball.) Sorry, the last SortedSet change was a typo: I once changed SortedSet to Set when I tried to use the cheaper HashMap, but later removed it when I switched to an array. A lot of the FastColumnFamily methods are not implemented yet, but the basic functionality is there to demonstrate the idea.
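The array-backed append-or-reconcile addColumn that the ticket describes can be sketched roughly as follows. This is a simplified stand-in, not the actual patch: the Column class, the last-writer-wins reconcile rule, and the string comparator are illustrative assumptions in place of Cassandra's IColumn machinery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the FastColumnFamily idea: an array-backed store
// that relies on the caller invariant that columns arrive in sorted order,
// so inserts are O(1) appends instead of skip-list insertions.
class FastColumnList {
    static class Column {
        final String name;
        final String value;
        final long timestamp;
        Column(String name, String value, long timestamp) {
            this.name = name; this.value = value; this.timestamp = timestamp;
        }
    }

    private final List<Column> columns = new ArrayList<>();

    void addColumn(Column c) {
        if (!columns.isEmpty()) {
            Column last = columns.get(columns.size() - 1);
            int cmp = last.name.compareTo(c.name);
            if (cmp == 0) {
                // Same name: reconcile; here, simply keep the newer timestamp.
                if (c.timestamp > last.timestamp)
                    columns.set(columns.size() - 1, c);
                return;
            }
            // cmp < 0 is expected given the sorted-arrival invariant; a real
            // implementation would assert this or fall back to binary search.
        }
        columns.add(c);
    }

    int size() { return columns.size(); }
    Column get(int i) { return columns.get(i); }
}
```

The point of the sketch is the branch structure: compare only against the last element, append on a new name, reconcile on an equal name, and never pay for synchronization or map maintenance.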
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058652#comment-13058652 ] Yang Yang commented on CASSANDRA-2843:

Right now, design-wise, the thing I'm least sure about is where to properly inject the returnCF. On a bigger scale, the multiple levels of collating iterator, reducing iterator, ColumnFamily, and Table.getRow() could probably be looked at from a more holistic view, so that fewer internal conversions are done. My patch makes a small try in this direction, but probably more can be done: for example, getRow() converts the result of CFS.getSortedColumns() into another List via thriftifyColumns(). Instead of building that list, we could just let FastColumnFamily pass along the original iterators and have thriftify consume the iterator directly, instead of going through FastColumnFamily's internal column array. This time saving could be small, though, since an array is already very cheap.
[jira] [Commented] (CASSANDRA-2842) Hive JDBC connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded
[ https://issues.apache.org/jira/browse/CASSANDRA-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058658#comment-13058658 ] Rick Shaw commented on CASSANDRA-2842:

I took a quick look at the Hive sources and I believe you will find the Hive driver suffers from this defect as well. So if you reversed the order, I think it would be the Hive driver that throws an exception rather than deferring to the next driver in the chain of loaded drivers (C*).

Hive JDBC connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded
-
Key: CASSANDRA-2842
URL: https://issues.apache.org/jira/browse/CASSANDRA-2842
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.0
Reporter: Cathy Daw
Assignee: Rick Shaw
Priority: Trivial
Fix For: 1.0
Attachments: pass-if-not-right-driver-v1.txt

Hive connections fail with InvalidUrlException when both the C* and Hive JDBC drivers are loaded; it seems the URL is being interpreted as a C* URL.

{code}
Caused an ERROR
[junit] Invalid connection url:jdbc:hive://127.0.0.1:1/default. should start with jdbc:cassandra
[junit] org.apache.cassandra.cql.jdbc.InvalidUrlException: Invalid connection url:jdbc:hive://127.0.0.1:1/default. should start with jdbc:cassandra
[junit] at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:90)
[junit] at java.sql.DriverManager.getConnection(DriverManager.java:582)
[junit] at java.sql.DriverManager.getConnection(DriverManager.java:185)
[junit] at com.datastax.bugRepros.repro_connection_error.test1_runHiveBeforeJdbc(repro_connection_error.java:34)
{code}

*Code Snippet: intended to illustrate the connection issues*
* Copy file to test directory
* Change package declaration
* run: ant test -Dtest.name=repro_conn_error

{code}
package com.datastax.bugRepros;

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Enumeration;
import org.junit.Test;

public class repro_conn_error {
    @Test
    public void jdbcConnectionError() throws Exception {
        // Create Hive JDBC Connection - will succeed if the C* driver is not loaded
        try {
            // Uncomment loading the C* driver to reproduce the bug
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            // Load Hive driver and connect
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection hiveConn = DriverManager.getConnection("jdbc:hive://127.0.0.1:1/default", "", "");
            hiveConn.close();
            System.out.println("successful hive connection");
        } catch (SQLException e) {
            System.out.println("unsuccessful hive connection");
            e.printStackTrace();
        }

        // Create C* JDBC Connection
        try {
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            Connection jdbcConn = DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/default");
            jdbcConn.close();
            System.out.println("successful c* connection");
        } catch (SQLException e) {
            System.out.println("unsuccessful c* connection");
            e.printStackTrace();
        }

        // Print out all loaded JDBC drivers.
        Enumeration d = java.sql.DriverManager.getDrivers();
        while (d.hasMoreElements()) {
            Object driverAsObject = d.nextElement();
            System.out.println("JDBC driver=" + driverAsObject);
        }
    }
}
{code}

-- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
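The defect both drivers apparently share comes down to the JDBC contract: DriverManager.getConnection tries each registered driver in turn, and a Driver.connect implementation is supposed to return null for a URL it does not recognize rather than throw, so the next driver in the chain gets a chance. A minimal sketch of a well-behaved driver follows; the class name and "jdbc:example:" prefix are hypothetical, not the actual Cassandra or Hive drivers.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;

// Sketch of the JDBC driver-chaining contract: unrecognized URLs must yield
// null from connect(), never an exception, so DriverManager can keep trying.
class PrefixDriver implements Driver {
    private static final String PREFIX = "jdbc:example:";

    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url))
            return null; // defer to the next registered driver in the chain
        // Real connection logic would go here; omitted in this sketch.
        throw new SQLException("connection logic not implemented in this sketch");
    }

    public int getMajorVersion() { return 1; }
    public int getMinorVersion() { return 0; }
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }
    public boolean jdbcCompliant() { return false; }
    public java.util.logging.Logger getParentLogger() { return null; }
}
```

With this pattern, loading the Cassandra and Hive drivers in either order would be harmless: each driver silently declines foreign URLs instead of raising InvalidUrlException.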
[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops
[ https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058684#comment-13058684 ] Melvin Wang commented on CASSANDRA-2819:

How about adding the creation timestamp to the header of the message? MDT is executed nearly immediately after it is created, so the construction time of MDT is too close to the checkpoint we have in run(). I am just concerned with the effectiveness of the current logic in MDT, although I am not sure about the consequences of adding 4 bytes to all the messages we create.

Split rpc timeout for read and write ops
-
Key: CASSANDRA-2819
URL: https://issues.apache.org/jira/browse/CASSANDRA-2819
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Stu Hood
Assignee: Melvin Wang
Fix For: 1.0
Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff

Given the vastly different latency characteristics of reads and writes, it makes sense for them to have independent rpc timeouts internally.
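The timestamp-in-header idea above could look roughly like the following. The class and fields are hypothetical stand-ins, not Cassandra's actual messaging types: the point is only that stamping creation time at message construction lets the executing stage drop work that has already outlived its rpc timeout, rather than measuring from a checkpoint taken just before execution.

```java
// Sketch: carry the creation time in the message so staleness is measured
// from when the request was made, not from when the task began running.
class TimestampedMessage {
    final byte[] body;
    final long createdAtMillis; // stored as 8 bytes here; the comment above weighs a 4-byte field

    TimestampedMessage(byte[] body, long createdAtMillis) {
        this.body = body;
        this.createdAtMillis = createdAtMillis;
    }

    // True if the message has already exceeded its rpc timeout and
    // should be dropped instead of executed.
    boolean expired(long nowMillis, long rpcTimeoutMillis) {
        return nowMillis - createdAtMillis > rpcTimeoutMillis;
    }
}
```

Under the split proposed in this ticket, the rpcTimeoutMillis passed in would differ for read and write messages.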
[jira] [Created] (CASSANDRA-2847) Nullpointer Exception in get_range_slices
Nullpointer Exception in get_range_slices
-
Key: CASSANDRA-2847
URL: https://issues.apache.org/jira/browse/CASSANDRA-2847
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.8.1
Reporter: Thibaut
Priority: Critical

Hi,

we upgraded our test cluster from 0.7.* to 0.8.1. We ran nodetool scrub on each node and then nodetool repair (repair might not have finished so far). We also upgraded to hector 0.8.1. We tried to run our application, and get_range_slices fails with the following error:

{code}
ERROR [pool-2-thread-15] 2011-07-01 20:15:46,224 Cassandra.java (line 3210) Internal error processing get_range_slices
java.lang.NullPointerException
    at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:298)
    at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:406)
    at org.apache.cassandra.service.RowRepairResolver.maybeScheduleRepairs(RowRepairResolver.java:103)
    at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:120)
    at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:85)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:715)
    at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
    at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Commented] (CASSANDRA-2845) Cassandra uses 100% system CPU on Ubuntu Natty (11.04)
[ https://issues.apache.org/jira/browse/CASSANDRA-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058706#comment-13058706 ] Steve Corona commented on CASSANDRA-2845: - Okay, so as it turns out the original problem is different than I thought. My dpkg solution was just skirting around the real issue (since dpkg doesn't force you to install all of the recommended dependencies). It's libjna-java (3.2.4-2ubuntu2) that's really causing the issue. The cassandra apt repository is pulling it in as a dependency and for, whatever reason, it sucks up all of the CPU when it runs with cassandra. I don't know if it's a matter of libjna being broken in 11.04 or just that it doesn't play nice with Cassandra. FWIW, CASSANDRA-2803 mentions deb packages libjna- not sure what role that plays into this. Here is my current workaround: mkdir -p /usr/sbin/ cat /usr/sbin/policy-rc.d #!/bin/sh exit 101 EOF chmod 755 /usr/sbin/policy-rc.d apt-get install cassandra apt-get remove libjna-java service cassandra start Cassandra uses 100% system CPU on Ubuntu Natty (11.04) -- Key: CASSANDRA-2845 URL: https://issues.apache.org/jira/browse/CASSANDRA-2845 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.8.1 Environment: Default install of Ubuntu 11.04 Reporter: Steve Corona Assignee: paul cannon Step 1. Boot up a brand new, default Ubuntu 11.04 Server install Step 2. Install Cassandra from Apache APT Respository (deb http://www.apache.org/dist/cassandra/debian 08x main) Step 3. apt-get install cassandra, as soon as it cassandra starts it will freeze the machine What's happening is that as soon as cassandra starts up it immediately sucks up 100% of CPU and starves the machine. This effectively bricks the box until you boot into single user mode and disable the cassandra init.d script. Under htop, the CPU usage shows up as system cpu, not user. 
The machine I'm testing this on is a Quad-Core Sandy Bridge w/ 16GB of Memory, so it's not a system resource issue. I've also tested this on completely different hardware (Dual 64-Bit Xeons & AMD X4) and it has the same effect. Ubuntu 10.10 does not exhibit the same issue. I have only tested 0.8 and 0.8.1.

root@cassandra01:/# java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

root@cassandra:/# uname -a
Linux cassandra01 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

/proc/cpu: Intel(R) Xeon(R) CPU E31270 @ 3.40GHz
/proc/meminfo: MemTotal: 16459776 kB MemFree: 14190708 kB

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2819) Split rpc timeout for read and write ops
[ https://issues.apache.org/jira/browse/CASSANDRA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058715#comment-13058715 ] Jonathan Ellis commented on CASSANDRA-2819: --- Let's keep the scope of this ticket to splitting the rpc timeout. We can open another for making request dropping more accurate/aggressive. Split rpc timeout for read and write ops Key: CASSANDRA-2819 URL: https://issues.apache.org/jira/browse/CASSANDRA-2819 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Stu Hood Assignee: Melvin Wang Fix For: 1.0 Attachments: twttr-cassandra-0.8-counts-resync-rpc-rw-timeouts.diff Given the vastly different latency characteristics of reads and writes, it makes sense for them to have independent rpc timeouts internally. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
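The split Stu describes can be pictured as a per-verb timeout lookup. The sketch below is illustrative only, under invented names (RpcTimeouts, Verb); it is not Cassandra's configuration API, and the millisecond values are placeholders.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model of per-operation rpc timeouts: reads and writes each get
// their own limit instead of sharing one global rpc_timeout_in_ms.
public class RpcTimeouts {
    public enum Verb { READ, MUTATION }

    private final Map<Verb, Long> timeoutMillis = new EnumMap<Verb, Long>(Verb.class);

    public RpcTimeouts(long readMs, long writeMs) {
        timeoutMillis.put(Verb.READ, readMs);
        timeoutMillis.put(Verb.MUTATION, writeMs);
    }

    // A coordinator waiting on replica responses would consult this
    // per-verb value instead of a single shared timeout.
    public long getRpcTimeout(Verb verb) {
        return timeoutMillis.get(verb);
    }
}
```

With something like this in place, a slow read path can be given a generous limit while writes keep a tight one, which is the motivation stated in the ticket.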
[jira] [Commented] (CASSANDRA-2252) off-heap memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058733#comment-13058733 ] Jonathan Ellis commented on CASSANDRA-2252: --- JNA 3.3.0 has been released including the http://java.net/jira/browse/JNA-179 fixes. off-heap memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, merged-2252.tgz Original Estimate: 0.4h Remaining Estimate: 0.4h The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
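The GC pressure this ticket targets comes from keeping bulk memtable data as many small on-heap objects. As a rough illustration of the off-heap idea (a toy class invented here, not the MemtableAllocator from the attached patches), the bytes can live in a direct ByteBuffer whose backing storage the collector never traces:

```java
import java.nio.ByteBuffer;

// Toy off-heap scratch region: values are copied into native memory
// obtained via allocateDirect, so only the small buffer object itself
// sits on the Java heap.
public class OffHeapScratch {
    private final ByteBuffer region;
    private int writePos = 0;

    public OffHeapScratch(int capacityBytes) {
        region = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Copies value into the off-heap region; returns its offset, or -1 if full. */
    public int append(byte[] value) {
        if (region.capacity() - writePos < value.length)
            return -1;
        region.position(writePos);
        region.put(value);
        int offset = writePos;
        writePos += value.length;
        return offset;
    }

    /** Copies length bytes back onto the heap from the given offset. */
    public byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = region.duplicate(); // independent position/limit
        dup.position(offset);
        dup.get(out);
        return out;
    }
}
```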
[jira] [Commented] (CASSANDRA-2847) Nullpointer Exception in get_range_slices
[ https://issues.apache.org/jira/browse/CASSANDRA-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058736#comment-13058736 ] Jonathan Ellis commented on CASSANDRA-2847: --- Sounds like CASSANDRA-2823. Can you try svn head of the 0.8 branch? Nullpointer Exception in get_range_slices - Key: CASSANDRA-2847 URL: https://issues.apache.org/jira/browse/CASSANDRA-2847 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Reporter: Thibaut Priority: Critical Hi, we upgraded our test cluster from 0.7.* to 0.8.1. We did run nodetool scrub on each node, and then nodetool repair (repair might not have finished so far). We also upgraded to hector 0.8.1. We tried to run our application and get_range_slices fails with the following error:

ERROR [pool-2-thread-15] 2011-07-01 20:15:46,224 Cassandra.java (line 3210) Internal error processing get_range_slices
java.lang.NullPointerException
	at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:298)
	at org.apache.cassandra.db.ColumnFamily.diff(ColumnFamily.java:406)
	at org.apache.cassandra.service.RowRepairResolver.maybeScheduleRepairs(RowRepairResolver.java:103)
	at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:120)
	at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:85)
	at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
	at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:715)
	at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
	at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1142046 - in /cassandra/drivers/java: CHANGES.txt src/org/apache/cassandra/cql/jdbc/CassandraDriver.java
Author: jbellis
Date: Fri Jul 1 19:49:33 2011
New Revision: 1142046

URL: http://svn.apache.org/viewvc?rev=1142046&view=rev
Log:
cooperate with other jdbc drivers
patch by Rick Shaw; reviewed by jbellis for CASSANDRA-2842

Modified:
    cassandra/drivers/java/CHANGES.txt
    cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java

Modified: cassandra/drivers/java/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/drivers/java/CHANGES.txt?rev=1142046&r1=1142045&r2=1142046&view=diff
==============================================================================
--- cassandra/drivers/java/CHANGES.txt (original)
+++ cassandra/drivers/java/CHANGES.txt Fri Jul 1 19:49:33 2011
@@ -1,2 +1,3 @@
 1.0.4
  * improve JDBC spec compliance (CASSANDRA-2720, 2754)
+ * cooperate with other jdbc drivers (CASSANDRA-2842)

Modified: cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java
URL: http://svn.apache.org/viewvc/cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java?rev=1142046&r1=1142045&r2=1142046&view=diff
==============================================================================
--- cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java (original)
+++ cassandra/drivers/java/src/org/apache/cassandra/cql/jdbc/CassandraDriver.java Fri Jul 1 19:49:33 2011
@@ -20,6 +20,8 @@
  */
 package org.apache.cassandra.cql.jdbc;
 
+import static org.apache.cassandra.cql.jdbc.Utils.*;
+
 import java.sql.Connection;
 import java.sql.Driver;
 import java.sql.DriverManager;
@@ -39,12 +41,7 @@
 import java.util.Properties;
     /** The Constant MINOR_VERSION. */
     private static final int MINOR_VERSION = 0;
-
-    private static final String BAD_URL = "Invalid connection url: '%s'. it should start with 'jdbc:cassandra:'";
-
-    /** The ACCEPTS_URL. */
-    public static final String ACCEPTS_URL = "jdbc:cassandra:";
-
     //private static final Logger logger = LoggerFactory.getLogger(CassandraDriver.class);
 
     static
@@ -66,7 +63,7 @@
      */
     public boolean acceptsURL(String url) throws SQLException
     {
-        return url.startsWith(ACCEPTS_URL);
+        return url.startsWith(PROTOCOL);
     }
 
     /**
@@ -80,7 +77,7 @@
         }
         else
         {
-            throw new SQLNonTransientConnectionException(String.format(BAD_URL, url));
+            return null; // signal it is the wrong driver for this protocol:subprotocol
         }
     }
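The substantive change here follows the JDBC cooperation contract: Driver.connect() must return null, rather than throw, when handed a URL belonging to another driver, so DriverManager can keep probing the other registered drivers. A minimal self-contained sketch of that contract (SketchDriver and its PROTOCOL constant are invented for illustration; this is not the actual CassandraDriver):

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

// Minimal JDBC driver skeleton showing the connect()-returns-null
// convention the patch adopts for unrecognized URLs.
public class SketchDriver implements Driver {
    /** Hypothetical stand-in for the PROTOCOL constant the patch references. */
    static final String PROTOCOL = "jdbc:cassandra:";

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(PROTOCOL);
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url))
            return null; // wrong protocol: null lets DriverManager try the next driver
        throw new SQLException("real connection logic elided from this sketch");
    }

    @Override public int getMajorVersion() { return 1; }
    @Override public int getMinorVersion() { return 0; }
    @Override public boolean jdbcCompliant() { return false; }
    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) { return new DriverPropertyInfo[0]; }
    @Override public Logger getParentLogger() { return Logger.getGlobal(); }
}
```

Throwing from connect() for a foreign URL, as the removed code did, aborts DriverManager's scan and breaks any application that registers several drivers at once.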
svn commit: r1142050 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/tools/NodeCmd.java
Author: jbellis
Date: Fri Jul 1 19:54:15 2011
New Revision: 1142050

URL: http://svn.apache.org/viewvc?rev=1142050&view=rev
Log:
improve nodetool compactionstats formatting
patch by Wojciech Meler; reviewed by jbellis for CASSANDRA-2844

Modified:
    cassandra/branches/cassandra-0.8/CHANGES.txt
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142050&r1=1142049&r2=1142050&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Fri Jul 1 19:54:15 2011
@@ -11,6 +11,7 @@
    (CASSANDRA-2823)
  * Fix race in SystemTable.getCurrentLocalNodeId (CASSANDRA-2824)
  * Correctly set default for replicate_on_write (CASSANDRA-2835)
+ * improve nodetool compactionstats formatting (CASSANDRA-2844)
 
 0.8.1

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java?rev=1142050&r1=1142049&r2=1142050&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java Fri Jul 1 19:54:15 2011
@@ -354,26 +354,22 @@ public class NodeCmd
             completed += n;
         outs.printf("%-25s%10s%10s%15s%n", "Responses", "n/a", pending, completed);
     }
-
+
     public void printCompactionStats(PrintStream outs)
     {
         CompactionManagerMBean cm = probe.getCompactionManagerProxy();
+        outs.println("pending tasks: " + cm.getPendingTasks());
+        if (cm.getCompactions().size() > 0)
+            outs.printf("%25s%16s%16s%16s%16s%10s%n", "compaction type", "keyspace", "column family", "bytes compacted", "bytes total", "progress");
         for (CompactionInfo c : cm.getCompactions())
         {
-            outs.println("compaction type: " + c.getTaskType());
-            outs.println("keyspace: " + c.getKeyspace());
-            outs.println("column family: " + c.getColumnFamily());
-            outs.println("bytes compacted: " + c.getBytesComplete());
-            outs.println("bytes total: " + c.getTotalBytes());
             String percentComplete = c.getTotalBytes() == 0 ? "n/a"
-                : new DecimalFormat("#.##").format((double) c.getBytesComplete() / c.getTotalBytes() * 100) + "%";
-            outs.println("compaction progress: " + percentComplete);
-            outs.println("-");
+                : new DecimalFormat("0.00").format((double) c.getBytesComplete() / c.getTotalBytes() * 100) + "%";
+            outs.printf("%25s%16s%16s%16s%16s%10s%n", c.getTaskType(), c.getKeyspace(), c.getColumnFamily(), c.getBytesComplete(), c.getTotalBytes(), percentComplete);
         }
-        outs.println("pending tasks: " + cm.getPendingTasks());
     }
-
+
     public void printColumnFamilyStats(PrintStream outs)
     {
         Map<String, List<ColumnFamilyStoreMBean>> cfstoreMap = new HashMap<String, List<ColumnFamilyStoreMBean>>();
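The reformatted output is easiest to see in miniature: one header row plus one fixed-width line per compaction, which grep and awk handle easily. The demo class and sample values below are invented; only the format strings mirror the committed patch:

```java
import java.io.PrintStream;
import java.text.DecimalFormat;

// Demonstrates the one-line-per-compaction layout from CASSANDRA-2844.
public class CompactionStatsDemo {
    // Same progress formatting as the patch: "n/a" when the total is
    // unknown, otherwise a percentage with two decimal places.
    public static String percent(long done, long total) {
        return total == 0 ? "n/a"
                          : new DecimalFormat("0.00").format((double) done / total * 100) + "%";
    }

    public static void main(String[] args) {
        PrintStream outs = System.out;
        outs.println("pending tasks: " + 3);
        outs.printf("%25s%16s%16s%16s%16s%10s%n",
                    "compaction type", "keyspace", "column family",
                    "bytes compacted", "bytes total", "progress");
        // One aligned row per in-flight compaction (sample values).
        outs.printf("%25s%16s%16s%16s%16s%10s%n",
                    "Minor", "Keyspace1", "Standard1", 1048576L, 4194304L,
                    percent(1048576L, 4194304L));
    }
}
```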
[jira] [Updated] (CASSANDRA-2844) grep friendly nodetool compactionstats output
[ https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2844: -- Affects Version/s: (was: 0.8.1) 0.8.0 Fix Version/s: 0.8.2 Assignee: Wojciech Meler grep friendly nodetool compactionstats output - Key: CASSANDRA-2844 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 0.8.0 Reporter: Wojciech Meler Assignee: Wojciech Meler Priority: Trivial Fix For: 0.8.2 Attachments: comapctionstats.patch output from nodetool compactionstats is quite hard to parse with text tools - it would be nice to have one line per compaction -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-2844) grep friendly nodetool compactionstats output
[ https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2844. --- Resolution: Fixed Reviewer: jbellis reformatted and committed. thanks! grep friendly nodetool compactionstats output - Key: CASSANDRA-2844 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 0.8.0 Reporter: Wojciech Meler Assignee: Wojciech Meler Priority: Trivial Fix For: 0.8.2 Attachments: comapctionstats.patch output from nodetool compactionstats is quite hard to parse with text tools - it would be nice to have one line per compaction -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-803) remove PropertyConfigurator from CassandraDaemon
[ https://issues.apache.org/jira/browse/CASSANDRA-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058751#comment-13058751 ] Jonathan Ellis commented on CASSANDRA-803: -- bq. I'd be happy to rip out all the log4j specific stuff and replace it with slf4j if that patch would be used. Sure, as long as the log4j-based defaults continue to work. Related: CASSANDRA-2383 remove PropertyConfigurator from CassandraDaemon Key: CASSANDRA-803 URL: https://issues.apache.org/jira/browse/CASSANDRA-803 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.6 Reporter: Jesse McConnell In order for users to make use of the EmbeddedCassandraService for unit testing they need to have a dependency declared on log4j. It would be nice if we could use the log4j-over-slf4j artifact to bridge this requirement for those of us using slf4j. http://www.slf4j.org/legacy.html#log4j-over-slf4j Currently it errors with the direct usage of the PropertyConfigurator in o.a.c.thrift.CassandraDaemon. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2786) After a minor compaction, deleted key-slices are visible again
[ https://issues.apache.org/jira/browse/CASSANDRA-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058755#comment-13058755 ] Jonathan Ellis commented on CASSANDRA-2786: --- Nit: wouldn't it be cleaner to just pass gcBefore rather than the entire controller to the EchoedRow constructor? +1 otherwise. After a minor compaction, deleted key-slices are visible again -- Key: CASSANDRA-2786 URL: https://issues.apache.org/jira/browse/CASSANDRA-2786 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.0 Environment: Reproduced on single Cassandra node (CentOS 5.5) Reproduced on single Cassandra node (Windows Server 2008) Reporter: rene kochen Assignee: Sylvain Lebresne Fix For: 0.8.1, 0.8.2 Attachments: 0001-Fix-wrong-purge-of-deleted-cf.patch, 2786_part2.patch, CassandraIssue.zip, CassandraIssueJava.zip After a minor compaction, deleted key-slices are visible again. Steps to reproduce: 1) Insert a row named "test". 2) Insert 50 rows. During this step, row "test" is included in a major compaction: file-1, file-2, file-3 and file-4 compacted to file-5 (includes "test"). 3) Delete row named "test". 4) Insert 50 rows. During this step, row "test" is included in a minor compaction: file-6, file-7, file-8 and file-9 compacted to file-10 (should include tombstoned "test"). After step 4, row "test" is live again. Test environment: Single node with empty database. Standard configured super-column-family (I see this behavior with several gc_grace settings, big and small values): create column family Customers with column_type = 'Super' and comparator = 'BytesType'; In Cassandra 0.7.6 I observe the expected behavior, i.e. after step 4, the row is still deleted. I've included a .NET program to reproduce the problem. I will add a Java version later on. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058757#comment-13058757 ] Jonathan Ellis commented on CASSANDRA-2388: --- +1 to the CFRR changes. It wasn't immediately clear to me what the CFIF changes are doing, can you elaborate? ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2844) grep friendly nodetool compactionstats output
[ https://issues.apache.org/jira/browse/CASSANDRA-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058765#comment-13058765 ] Hudson commented on CASSANDRA-2844: --- Integrated in Cassandra-0.8 #201 (See [https://builds.apache.org/job/Cassandra-0.8/201/]) improve nodetool compactionstats formatting patch by Wojciech Meler; reviewed by jbellis for CASSANDRA-2844 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1142050 Files : * /cassandra/branches/cassandra-0.8/CHANGES.txt * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/tools/NodeCmd.java grep friendly nodetool compactionstats output - Key: CASSANDRA-2844 URL: https://issues.apache.org/jira/browse/CASSANDRA-2844 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 0.8.0 Reporter: Wojciech Meler Assignee: Wojciech Meler Priority: Trivial Fix For: 0.8.2 Attachments: comapctionstats.patch output from nodetool compactionstats is quite hard to parse with text tools - it would be nice to have one line per compaction -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2753) Capture the max client timestamp for an SSTable
[ https://issues.apache.org/jira/browse/CASSANDRA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058767#comment-13058767 ] Jonathan Ellis commented on CASSANDRA-2753: --- Is there a reason not to have the max timestamp code in an IColumn method? Capture the max client timestamp for an SSTable --- Key: CASSANDRA-2753 URL: https://issues.apache.org/jira/browse/CASSANDRA-2753 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Alan Liang Assignee: Alan Liang Priority: Minor Attachments: 0001-capture-max-timestamp-and-created-SSTableMetadata-to-V2.patch, 0001-capture-max-timestamp-and-created-SSTableMetadata-to.patch, 0003-capture-max-timestamp-for-sstable-and-introduced-SST.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2804) expose dropped messages, exceptions over JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2804: -- Attachment: 2804-v2.txt v2 adds recently dropped mbean as in Ryan's, and changes logDroppedMessages to log the total counts to avoid interfering with it. expose dropped messages, exceptions over JMX Key: CASSANDRA-2804 URL: https://issues.apache.org/jira/browse/CASSANDRA-2804 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2804-v2.txt, 2804.txt, twttr-cassandra-0.8-counts-resync-droppedmsg-metric.diff Patch against 0.7. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2804) expose dropped messages, exceptions over JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2804: -- Fix Version/s: (was: 0.7.7) Targeting 0.8+ now since we're changing logDroppedMessages behavior. expose dropped messages, exceptions over JMX Key: CASSANDRA-2804 URL: https://issues.apache.org/jira/browse/CASSANDRA-2804 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2804-v2.txt, 2804.txt, twttr-cassandra-0.8-counts-resync-droppedmsg-metric.diff Patch against 0.7. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Hermes updated CASSANDRA-2846: -- Reviewer: jhermes (was: bcoverston) Changing replication_factor using update keyspace not working --- Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2846.txt Unless I've misunderstood the new way to do this with 0.8 I think update keyspace is broken: {code} [default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; 37f70d40-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: [default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}]; 489fe220-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: {code} Isn't the second describe keyspace supposed to to say replication_factor:2? Relevant bits from system.log: {code} Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true UpdateKeyspace.java (line 74) Keyspace updated. 
Please perform any manual operations {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058855#comment-13058855 ] Jon Hermes commented on CASSANDRA-2846: --- -1, doesn't update strategy_options for KS's that already have SimpleStrategy. repro: {noformat} start 1-node local stress -o insert -n 1 (create Keyspace1 with SS and RF1) cli: [] update keyspace Keyspace1 with strategy_options=[{replication_factor:2}]; {noformat} Creating a new keyspace (Keyspace2, with default NTS and [{DC1:1}], then `update keyspace Keyspace2 with placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and strategy_options=[{replication_factor:2}];` does work, however. Changing replication_factor using update keyspace not working --- Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2846.txt Unless I've misunderstood the new way to do this with 0.8 I think update keyspace is broken: {code} [default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; 37f70d40-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: [default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}]; 489fe220-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... 
schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: {code} Isn't the second describe keyspace supposed to to say replication_factor:2? Relevant bits from system.log: {code} Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058856#comment-13058856 ] Jonathan Ellis commented on CASSANDRA-2846: --- Jonas's test case works for me. Changing replication_factor using update keyspace not working --- Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2846.txt Unless I've misunderstood the new way to do this with 0.8 I think update keyspace is broken: {code} [default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; 37f70d40-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: [default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}]; 489fe220-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: {code} Isn't the second describe keyspace supposed to to say replication_factor:2? 
Relevant bits from system.log: {code} Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1125: -- Attachment: 1125-formatted.txt Looks good to me for the most part. (Attaching reformatted version.) One part though I'm not 100% sure about -- we're using KeyRange for start-exclusive ranges, when the Thrift API always uses it for start-inclusive. I'd be more comfortable with any of:
- using a Pair<String, String>
- using a new one-off class
- using KeyRange but with tokens (which Thrift also uses for start-exclusive)
- using a Range object directly (also requires tokens)
Filter out ColumnFamily rows that aren't part of the query -- Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
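The inclusive/exclusive distinction behind that concern can be sketched with plain string comparisons (toy code under invented names; Cassandra actually compares keys and tokens through its own types, not raw strings):

```java
// Contrasts Thrift KeyRange semantics (start_key inclusive) with the
// start-exclusive bound wanted when resuming a scan past the last row
// already returned.
public class RangeSketch {
    // KeyRange-style check: both endpoints inclusive.
    public static boolean inKeyRange(String start, String end, String key) {
        return key.compareTo(start) >= 0 && key.compareTo(end) <= 0;
    }

    // Start-exclusive check: the last key already seen is NOT returned again.
    public static boolean afterLastSeen(String lastSeen, String end, String key) {
        return key.compareTo(lastSeen) > 0 && key.compareTo(end) <= 0;
    }
}
```

Reusing KeyRange for the second behavior silently changes its documented meaning, which is why a Pair, a one-off class, or token-based bounds would be clearer.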
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058859#comment-13058859 ] Jonathan Ellis commented on CASSANDRA-1125: --- (And I'd be fine with putting this in 0.8.x.) Filter out ColumnFamily rows that aren't part of the query -- Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058855#comment-13058855 ] Jon Hermes edited comment on CASSANDRA-2846 at 7/1/11 11:21 PM: --1, doesn't update strategy_options for KS's that already have SimpleStrategy.- +1, it's good. was (Author: jhermes): -1, doesn't update strategy_options for KS's that already have SimpleStrategy. repro: {noformat} start 1-node local stress -o insert -n 1 (create Keyspace1 with SS and RF1) cli: [] update keyspace Keyspace1 with strategy_options=[{replication_factor:2}]; {noformat} Creating a new keyspace (Keyspace2, with default NTS and [{DC1:1}], then `update keyspace Keyspace2 with placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and strategy_options=[{replication_factor:2}];` does work, however. Changing replication_factor using update keyspace not working --- Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2846.txt Unless I've misunderstood the new way to do this with 0.8 I think update keyspace is broken: {code} [default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; 37f70d40-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... 
schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: [default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}]; 489fe220-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: {code} Isn't the second describe keyspace supposed to to say replication_factor:2? Relevant bits from system.log: {code} Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira