[jira] Created: (CASSANDRA-2076) Not starting due to Invalid saved cache
Not starting due to Invalid saved cache
---------------------------------------

Key: CASSANDRA-2076
URL: https://issues.apache.org/jira/browse/CASSANDRA-2076
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: linux
Reporter: Thibaut
Priority: Minor
Fix For: 0.7.2

This occurred on two of my nodes (running 0.7.1 from svn). One node was killed by the kernel due to an OOM, and the other node was hanging, so I had to kill it manually with kill -9 (kill didn't work). (Maybe these were faulty hardware nodes, I don't know.)

The saved_cache was corrupt afterwards and I couldn't start the nodes. After deleting the saved_caches directory I could start the nodes again. Instead of refusing to start when an error occurs, could Cassandra simply delete the erroneous file and continue to start?

INFO 22:31:11,570 reading saved cache /hd1/cassandra_md5/saved_caches/table_attributes-table_attributes-KeyCache
ERROR 22:31:11,595 Exception encountered during startup.
java.lang.RuntimeException: The provided key was not UTF8 encoded.
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:159)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
    at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:281)
    at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:218)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:458)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:440)
    at org.apache.cassandra.db.Table.initCf(Table.java:360)
    at org.apache.cassandra.db.Table.<init>(Table.java:290)
    at org.apache.cassandra.db.Table.open(Table.java:107)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:312)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:81)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
    at org.apache.cassandra.utils.FBUtilities.decodeToUTF8(FBUtilities.java:403)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:155)
    ... 11 more
Exception encountered during startup.
java.lang.RuntimeException: The provided key was not UTF8 encoded.
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:159)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
    at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:281)
    at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:218)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:458)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:440)
    at org.apache.cassandra.db.Table.initCf(Table.java:360)
    at org.apache.cassandra.db.Table.<init>(Table.java:290)
    at org.apache.cassandra.db.Table.open(Table.java:107)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:312)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:81)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
    at org.apache.cassandra.utils.FBUtilities.decodeToUTF8(FBUtilities.java:403)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:155)
    ... 11 more

--
This message is automatically generated by JIRA.
- For more information on JIRA, see: http://www.atlassian.com/software/jira
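The root cause in the trace above is Java's strict UTF-8 decoding: a truncated multi-byte sequence makes CharsetDecoder.decode() throw MalformedInputException ("Input length = 1"), which readSavedCache turns into a fatal RuntimeException. A minimal sketch of the reporter's suggestion follows; the class and method names here are hypothetical illustrations, not Cassandra code. It decodes a cached key strictly and returns null for an undecodable key, so a caller could discard the corrupt cache file and continue startup instead of aborting.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SavedCacheRecovery {
    // Strictly decode a saved-cache key as UTF-8, the way an
    // order-preserving partitioner must. Instead of letting the
    // CharacterCodingException propagate and kill startup, return null
    // so the caller can treat the cache file as corrupt and delete it.
    static String tryDecodeUtf8(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null; // corrupt entry: discard the cache, don't abort
        }
    }

    public static void main(String[] args) {
        byte[] good = "row-key".getBytes(StandardCharsets.UTF_8);
        byte[] bad = new byte[] { (byte) 0xC3 }; // truncated 2-byte sequence: "Input length = 1"
        System.out.println(tryDecodeUtf8(good)); // row-key
        System.out.println(tryDecodeUtf8(bad));  // null
    }
}
```

A warm-up cache is only an optimization, so dropping it on a decode error loses nothing but cache warmth.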
[jira] Commented: (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices
[ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988732#comment-12988732 ]

Ben Pirt commented on CASSANDRA-1600:
-------------------------------------

Hopefully it is useful to have another use case for why this is important to a real-world user. We are storing time-series data and would like to be able to pull out all values between time A and time B that have a specific value as a property. Because we aren't able to combine a range slice with an indexed slice, we are having to duplicate our data into several keyspaces so we can still do the range slice.

Our ideal scenario would be to be able to say: "Give me all keys between time A and time B whose property P is greater than or equal to 5."

I would imagine that in another time-series scenario of storing lots of logs (e.g. Apache logs) it would be very useful to say: "Give me all logs between time A and time B with a status code of 200."

Please do let me know if I'm misunderstanding things and there is a better way of doing this, but it seems to me that it would be very useful functionality. Very much looking forward to 0.8 for this fix alone!
Merge get_indexed_slices with get_range_slices
----------------------------------------------

Key: CASSANDRA-1600
URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
Project: Cassandra
Issue Type: Improvement
Components: API
Affects Versions: 0.7 beta 1
Reporter: Stu Hood
Fix For: 0.8
Attachments: 0001-Add-optional-IndexClause-to-KeyRange-and-serialize-wit.txt, 0002-Drop-the-IndexClause.count-parameter.txt, 0003-Execute-RangeSliceCommands-using-scan-when-an-IndexCla.txt, 0004-Remove-get_indexed_slices-method.txt, 0005-Update-system-tests-to-use-get_range_slices.txt, 0006-Remove-start_key-from-IndexClause-for-the-start_key-in.txt, 0007-Respect-end_key-for-filtered-queries.txt, 0008-allow-applying-row-filtering-to-sequential-scan.txt, 0009-rename-Index-Filter.txt, AbstractScanIterator.java

From a comment on 1157:
{quote}
IndexClause only has a start key for get_indexed_slices, but it would seem that the reasoning behind using 'KeyRange' for get_range_slices applies there as well, since if you know the range you care about in the primary index, you don't want to continue scanning until you exhaust 'count' (or the cluster). Since it would appear that get_indexed_slices would benefit from a KeyRange, why not smash get_(range|indexed)_slices together, and make IndexClause an optional field on KeyRange?
{quote}
[jira] Issue Comment Edited: (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices
[ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988732#comment-12988732 ]

Ben Pirt edited comment on CASSANDRA-1600 at 1/31/11 11:47 AM:
---------------------------------------------------------------

Hopefully it is useful to have another use case for why this is important to a real-world user. We are storing time-series data and would like to be able to pull out all values between time A and time B that have a specific value as a property. Because we aren't able to combine a range slice with an indexed slice, we are having to duplicate our data into several keyspaces so we can still do the range slice.

Our ideal scenario would be to be able to say: "Give me all keys between time A and time B whose property P is greater than or equal to 5."

I would imagine that in another time-series scenario of storing lots of logs (e.g. Apache logs) it would be very useful to say: "Give me all logs between time A and time B with a status code of 200."

My only question is how this works in conjunction with limit. As a user I would expect that if I limited the results to 100, I would get a maximum of 100 results between time A and time B which matched the secondary index query; however, I understand this may be at odds with how get_range applies the limit. I would want the limit to be applied after the secondary index predicate has been applied.

Please do let me know if I'm misunderstanding things and there is a better way of doing this, but it seems to me that it would be very useful functionality. Very much looking forward to 0.8 for this fix alone!

was (Author: bjpirt):
Hopefully it is useful to have another use case for why this is important to a real-world user. We are storing time-series data and would like to be able to pull out all values between time A and time B that have a specific value as a property. Because we aren't able to combine a range slice with an indexed slice, we are having to duplicate our data into several keyspaces so we can still do the range slice.

Our ideal scenario would be to be able to say: "Give me all keys between time A and time B whose property P is greater than or equal to 5."

I would imagine that in another time-series scenario of storing lots of logs (e.g. Apache logs) it would be very useful to say: "Give me all logs between time A and time B with a status code of 200."

Please do let me know if I'm misunderstanding things and there is a better way of doing this, but it seems to me that it would be very useful functionality. Very much looking forward to 0.8 for this fix alone!
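Ben's question about how the row limit should interact with the index predicate can be made concrete with a small sketch. The types and method here are hypothetical illustrations, not the Thrift API: the point is only the ordering of operations he asks for, where the key range and the index predicate filter first and the limit applies last, so a limit of 100 means up to 100 matching rows.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePlusPredicate {
    // A log line with a timestamp (the row key under an order-preserving
    // partitioner) and a status code (the secondary-indexed column).
    static class Row {
        final long timestamp;
        final int status;
        Row(long timestamp, int status) { this.timestamp = timestamp; this.status = status; }
    }

    // Sketch of the requested semantics: restrict rows to a key range
    // first, then apply the index predicate, and count toward the row
    // limit only rows that survived both filters.
    static List<Row> query(List<Row> rows, long from, long to, int wantedStatus, int limit) {
        List<Row> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.timestamp < from || r.timestamp > to) continue; // KeyRange
            if (r.status != wantedStatus) continue;               // IndexClause
            result.add(r);
            if (result.size() == limit) break;                    // limit after filtering
        }
        return result;
    }
}
```

"Give me all logs between time A and time B with a status code of 200, up to 100 rows" is then query(rows, a, b, 200, 100).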
svn commit: r1065627 - /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
Author: jbellis
Date: Mon Jan 31 14:45:33 2011
New Revision: 1065627

URL: http://svn.apache.org/viewvc?rev=1065627&view=rev
Log:
include stacktrace for configuration errors in system log

Modified:
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java?rev=1065627&r1=1065626&r2=1065627&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java Mon Jan 31 14:45:33 2011
@@ -374,19 +374,19 @@ public class DatabaseDescriptor
         }
         catch (UnknownHostException e)
         {
-            logger.error("Fatal error: " + e.getMessage());
+            logger.error("Fatal configuration error", e);
             System.err.println("Unable to start with unknown hosts configured. Use IP addresses instead of hostnames.");
             System.exit(2);
         }
         catch (ConfigurationException e)
         {
-            logger.error("Fatal error: " + e.getMessage());
+            logger.error("Fatal configuration error", e);
             System.err.println("Bad configuration; unable to start server");
             System.exit(1);
         }
         catch (YAMLException e)
         {
-            logger.error("Fatal error: " + e.getMessage());
+            logger.error("Fatal configuration error", e);
             System.err.println("Bad configuration; unable to start server");
             System.exit(1);
         }
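The difference this commit makes can be illustrated with a plain-Java helper (illustrative only, not the logging call Cassandra actually uses): logging e.getMessage() records a single line and discards the stack trace, while passing the Throwable itself preserves the frames that locate the bad configuration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class ConfigErrorLogging {
    // Render a message together with the throwable's stack trace, the way
    // logger.error("Fatal configuration error", e) does, as opposed to the
    // pre-patch logger.error("Fatal error: " + e.getMessage()) which keeps
    // only the message line.
    static String withStackTrace(String message, Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return message + System.lineSeparator() + sw;
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("Bad configuration; unable to start server");
        // Old behaviour: one line, no frames to point at the failing code.
        String messageOnly = "Fatal error: " + e.getMessage();
        // New behaviour: message plus "at ..." frames.
        String full = withStackTrace("Fatal configuration error", e);
        System.out.println(messageOnly);
        System.out.println(full);
    }
}
```
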
[jira] Commented: (CASSANDRA-2076) Not starting due to Invalid saved cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988780#comment-12988780 ]

Thibaut commented on CASSANDRA-2076:
------------------------------------

This might be related: two other nodes (still running) also show the "The provided key was not UTF8 encoded." error in the log. I have never seen this error in 0.7.0.

ERROR [MutationStage:19] 2011-01-30 21:36:16,951 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.RuntimeException: The provided key was not UTF8 encoded.
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: The provided key was not UTF8 encoded.
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:159)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
    at org.apache.cassandra.db.Table.apply(Table.java:406)
    at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:190)
    at org.apache.cassandra.service.StorageProxy$2.runMayThrow(StorageProxy.java:288)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 3 more
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
    at org.apache.cassandra.utils.FBUtilities.decodeToUTF8(FBUtilities.java:403)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:155)
    ... 8 more
ERROR [MutationStage:19] 2011-01-30 21:36:16,991 AbstractCassandraDaemon.java (line 119) Fatal exception in thread Thread[MutationStage:19,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: The provided key was not UTF8 encoded.
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: The provided key was not UTF8 encoded.
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:159)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
    at org.apache.cassandra.db.Table.apply(Table.java:406)
    at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:190)
    at org.apache.cassandra.service.StorageProxy$2.runMayThrow(StorageProxy.java:288)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 3 more
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
    at org.apache.cassandra.utils.FBUtilities.decodeToUTF8(FBUtilities.java:403)
    at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:155)
    ... 8 more
WARN [ScheduledTasks:1] 2011-01-30 21:36:21,450 MessagingService.java (line 506) Dropped 8 MUTATION messages in the last 5000ms
[jira] Updated: (CASSANDRA-2076) Not starting due to Invalid saved cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2076:
-------------------------------

    Priority: Critical  (was: Minor)
[jira] Updated: (CASSANDRA-2076) Not starting due to Invalid saved cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2076:
-------------------------------

    Fix Version/s: 0.7.1
[jira] Commented: (CASSANDRA-2076) Not starting due to Invalid saved cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988781#comment-12988781 ]

Thibaut commented on CASSANDRA-2076:
------------------------------------

I brought down and restarted the entire cluster (100 nodes, 5x20 nodes). Every single node complains about an invalid file in the saved_cache directory.
[jira] Updated: (CASSANDRA-2076) Not restarting due to Invalid saved cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2076:
-------------------------------

    Summary: Not restarting due to Invalid saved cache  (was: Not starting due to Invalid saved cache)

Not restarting due to Invalid saved cache
-----------------------------------------

                Key: CASSANDRA-2076
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2076
            Project: Cassandra
         Issue Type: Bug
         Components: Core
        Environment: linux
           Reporter: Thibaut
           Priority: Critical
            Fix For: 0.7.1, 0.7.2

This occurred to me on two nodes (running 0.7.1 from svn). One node was killed by the kernel due to an OOM, and the other node was hanging, so I had to kill it manually with kill -9 (kill didn't work). (Maybe these were faulty hardware nodes, I don't know.)

The saved cache was corrupt afterwards and I couldn't start the nodes. After deleting the saved_caches directory I could start the nodes again. Instead of refusing to start when an error occurs, couldn't cassandra simply delete the erroneous file and continue to start?

INFO 22:31:11,570 reading saved cache /hd1/cassandra_md5/saved_caches/table_attributes-table_attributes-KeyCache
ERROR 22:31:11,595 Exception encountered during startup.
java.lang.RuntimeException: The provided key was not UTF8 encoded.
        at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:159)
        at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
        at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:281)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:218)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:458)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:440)
        at org.apache.cassandra.db.Table.initCf(Table.java:360)
        at org.apache.cassandra.db.Table.<init>(Table.java:290)
        at org.apache.cassandra.db.Table.open(Table.java:107)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:312)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:81)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
        at org.apache.cassandra.utils.FBUtilities.decodeToUTF8(FBUtilities.java:403)
        at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:155)
        ... 11 more
Exception encountered during startup.
java.lang.RuntimeException: The provided key was not UTF8 encoded.
        at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:159)
        at org.apache.cassandra.dht.OrderPreservingPartitioner.decorateKey(OrderPreservingPartitioner.java:44)
        at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:281)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:218)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:458)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:440)
        at org.apache.cassandra.db.Table.initCf(Table.java:360)
        at org.apache.cassandra.db.Table.<init>(Table.java:290)
        at org.apache.cassandra.db.Table.open(Table.java:107)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:312)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:81)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:781)
        at org.apache.cassandra.utils.FBUtilities.decodeToUTF8(FBUtilities.java:403)
        at org.apache.cassandra.dht.OrderPreservingPartitioner.getToken(OrderPreservingPartitioner.java:155)
        ... 11 more

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
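The fallback the reporter asks for (discard a corrupt saved cache instead of refusing to start) can be sketched in a few lines. This is a toy, assuming a cache file of `writeUTF`-encoded keys; `SavedCacheReader` and its file format are illustrative, not Cassandra's actual saved-cache serialization.

```java
import java.io.*;
import java.util.*;

// Sketch of the suggested behavior: if the saved cache file cannot be
// read back, log it, delete the file, and start with an empty cache
// instead of aborting startup. Format and class names are hypothetical.
public class SavedCacheReader {
    public static Set<String> readSavedCache(File cacheFile) {
        Set<String> keys = new HashSet<String>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(cacheFile)))) {
            while (in.available() > 0)
                keys.add(in.readUTF()); // throws on truncated/corrupt data
        } catch (IOException | RuntimeException e) {
            // corrupt or unreadable: discard the file and continue startup
            System.err.println("discarding corrupt saved cache " + cacheFile + ": " + e);
            cacheFile.delete();
            return new HashSet<String>();
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("KeyCache", ".db");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeUTF("row1");
            out.write(0xC3); // stray trailing byte simulates corruption
        }
        // the stray byte makes readUTF fail, so the whole file is dropped
        System.out.println("recovered keys: " + readSavedCache(f));
    }
}
```

The key design point is that a cache is disposable: losing it only costs warm-up time, so "delete and continue" is strictly better for availability than failing startup.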
[jira] Commented: (CASSANDRA-2073) Streaming occasionally makes gossip back up
[ https://issues.apache.org/jira/browse/CASSANDRA-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988782#comment-12988782 ]

Gary Dusbabek commented on CASSANDRA-2073:
------------------------------------------

+1

Streaming occasionally makes gossip back up
-------------------------------------------

                Key: CASSANDRA-2073
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2073
            Project: Cassandra
         Issue Type: Bug
         Components: Core
   Affects Versions: 0.7.0
           Reporter: Brandon Williams
           Assignee: Jonathan Ellis
           Priority: Minor
            Fix For: 0.7.2
        Attachments: 2073.txt

Streaming occasionally makes gossip back up, causing nodes to mark each other as down even though the network is ok. This appears to happen just after streaming has finished. I noticed this in the course of working on CASSANDRA-2072, so decommission is one way to reproduce. It seems to happen maybe one in fifteen or twenty tries, so it's fairly rare.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1065654 - in /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra: config/DatabaseDescriptor.java service/AbstractCassandraDaemon.java service/StorageService.java
Author: jbellis
Date: Mon Jan 31 15:40:48 2011
New Revision: 1065654

URL: http://svn.apache.org/viewvc?rev=1065654&view=rev
Log: more informative error messages for configuration problems
patch by jbellis

Modified:
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageService.java

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java?rev=1065654&r1=1065653&r2=1065654&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/config/DatabaseDescriptor.java Mon Jan 31 15:40:48 2011
@@ -120,8 +120,17 @@ public class DatabaseDescriptor
         {
             URL url = getStorageConfigURL();
             logger.info("Loading settings from " + url);
-
-            InputStream input = url.openStream();
+
+            InputStream input = null;
+            try
+            {
+                input = url.openStream();
+            }
+            catch (IOException e)
+            {
+                // getStorageConfigURL should have ruled this out
+                throw new AssertionError(e);
+            }
             org.yaml.snakeyaml.constructor.Constructor constructor = new org.yaml.snakeyaml.constructor.Constructor(Config.class);
             TypeDescription desc = new TypeDescription(Config.class);
             desc.putListPropertyType("keyspaces", RawKeyspace.class);
@@ -253,7 +262,16 @@ public class DatabaseDescriptor
             /* Local IP or hostname to bind RPC server to */
             if (conf.rpc_address != null)
-                rpcAddress = InetAddress.getByName(conf.rpc_address);
+            {
+                try
+                {
+                    rpcAddress = InetAddress.getByName(conf.rpc_address);
+                }
+                catch (UnknownHostException e)
+                {
+                    throw new ConfigurationException("Unknown host in rpc_address " + conf.rpc_address);
+                }
+            }

             if (conf.thrift_framed_transport_size_in_mb > 0 && conf.thrift_max_message_length_in_mb < conf.thrift_framed_transport_size_in_mb)
             {
@@ -291,6 +309,10 @@ public class DatabaseDescriptor
             {
                 throw new ConfigurationException("Invalid Request Scheduler class " + conf.request_scheduler);
             }
+            catch (Exception e)
+            {
+                throw new ConfigurationException("Unable to instantiate request scheduler", e);
+            }
         }
         else
         {
@@ -369,31 +391,28 @@ public class DatabaseDescriptor
             }
             for (String seedString : conf.seeds)
             {
-                seeds.add(InetAddress.getByName(seedString));
+                try
+                {
+                    seeds.add(InetAddress.getByName(seedString));
+                }
+                catch (UnknownHostException e)
+                {
+                    throw new ConfigurationException("Unknown seed " + seedString + ". Consider using IP addresses instead of host names");
+                }
             }
         }
-        catch (UnknownHostException e)
-        {
-            logger.error("Fatal configuration error ", e);
-            System.err.println("Unable to start with unknown hosts configured. Use IP addresses instead of hostnames.");
-            System.exit(2);
-        }
         catch (ConfigurationException e)
         {
             logger.error("Fatal configuration error", e);
-            System.err.println("Bad configuration; unable to start server");
+            System.err.println(e.getMessage() + "\nFatal configuration error; unable to start server. See log for stacktrace.");
             System.exit(1);
         }
         catch (YAMLException e)
         {
             logger.error("Fatal configuration error error", e);
-            System.err.println("Bad configuration; unable to start server");
+            System.err.println(e.getMessage() + "\nInvalid yaml; unable to start server. See log for stacktrace.");
            System.exit(1);
         }
-        catch (Exception e)
-        {
-            throw new RuntimeException(e);
-        }
     }

     private static IEndpointSnitch createEndpointSnitch(String endpointSnitchClassName) throws ConfigurationException

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java
URL:
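The pattern this patch applies throughout DatabaseDescriptor can be distilled into a standalone sketch: catch the low-level exception at the point where the offending config value is known, and rethrow a configuration error that names that value. `ConfigCheck` and its nested `ConfigurationException` are stand-ins for Cassandra's own classes, used here only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of the error-message improvement in r1065654: wrap a low-level
// lookup failure in a configuration error that names the bad value,
// instead of letting a generic UnknownHostException escape.
public class ConfigCheck {
    public static class ConfigurationException extends Exception {
        public ConfigurationException(String msg) { super(msg); }
    }

    public static InetAddress resolveSeed(String seed) throws ConfigurationException {
        try {
            return InetAddress.getByName(seed);
        } catch (UnknownHostException e) {
            // the message now says WHICH seed failed and how to fix it
            throw new ConfigurationException("Unknown seed " + seed
                    + ". Consider using IP addresses instead of host names");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveSeed("127.0.0.1")); // IP literal: no DNS lookup
        try {
            resolveSeed("no-such-host.invalid");      // .invalid never resolves
        } catch (ConfigurationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```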
svn commit: r1065660 - in /cassandra/branches/cassandra-0.7: conf/cassandra.yaml src/java/org/apache/cassandra/locator/TokenMetadata.java
Author: jbellis
Date: Mon Jan 31 16:02:21 2011
New Revision: 1065660

URL: http://svn.apache.org/viewvc?rev=1065660&view=rev
Log: move initialization out of constructor where possible

Modified:
    cassandra/branches/cassandra-0.7/conf/cassandra.yaml
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/TokenMetadata.java

Modified: cassandra/branches/cassandra-0.7/conf/cassandra.yaml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/conf/cassandra.yaml?rev=1065660&r1=1065659&r2=1065660&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/conf/cassandra.yaml (original)
+++ cassandra/branches/cassandra-0.7/conf/cassandra.yaml Mon Jan 31 16:02:21 2011
@@ -220,7 +220,7 @@ rpc_timeout_in_ms: 10000
 #  org.apache.cassandra.locator.PropertyFileSnitch:
 #  - Proximity is determined by rack and data center, which are
 #    explicitly configured in cassandra-topology.properties.
-endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
+endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch

 # dynamic_snitch -- This boolean controls whether the above snitch is
 # wrapped with a dynamic snitch, which will monitor read latencies

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/TokenMetadata.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/TokenMetadata.java?rev=1065660&r1=1065659&r2=1065660&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/TokenMetadata.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/TokenMetadata.java Mon Jan 31 16:02:21 2011
@@ -50,22 +50,22 @@ public class TokenMetadata
     // for any nodes that boot simultaneously between same two nodes. For this we cannot simply make pending ranges a <tt>Multimap</tt>,
     // since that would make us unable to notice the real problem of two nodes trying to boot using the same token.
     // In order to do this properly, we need to know what tokens are booting at any time.
-    private BiMap<Token, InetAddress> bootstrapTokens;
+    private BiMap<Token, InetAddress> bootstrapTokens = HashBiMap.create();

     // we will need to know at all times what nodes are leaving and calculate ranges accordingly.
     // An anonymous pending ranges list is not enough, as that does not tell which node is leaving
     // and/or if the ranges are there because of bootstrap or leave operation.
     // (See CASSANDRA-603 for more detail + examples).
-    private Set<InetAddress> leavingEndpoints;
+    private Set<InetAddress> leavingEndpoints = new HashSet<InetAddress>();

-    private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges;
+    private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges = new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();

     /* Use this lock for manipulating the token map */
     private final ReadWriteLock lock = new ReentrantReadWriteLock(true);
     private ArrayList<Token> sortedTokens;

     /* list of subscribers that are notified when the tokenToEndpointMap changed */
-    private final CopyOnWriteArrayList<AbstractReplicationStrategy> subscribers;
+    private final CopyOnWriteArrayList<AbstractReplicationStrategy> subscribers = new CopyOnWriteArrayList<AbstractReplicationStrategy>();

     public TokenMetadata()
     {
@@ -77,11 +77,7 @@ public class TokenMetadata
         if (tokenToEndpointMap == null)
             tokenToEndpointMap = HashBiMap.create();
         this.tokenToEndpointMap = tokenToEndpointMap;
-        bootstrapTokens = HashBiMap.create();
-        leavingEndpoints = new HashSet<InetAddress>();
-        pendingRanges = new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();
         sortedTokens = sortTokens();
-        subscribers = new CopyOnWriteArrayList<AbstractReplicationStrategy>();
     }

     private ArrayList<Token> sortTokens()
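The refactoring above can be shown in isolation: initializing collection fields at their declaration means no constructor can forget one and nothing can ever observe them as null. `Registry` is a made-up class that only mirrors the shape of TokenMetadata, as a minimal sketch of the design choice.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration of "move initialization out of constructor where possible":
// fields get their value at the point of declaration, so every constructor
// (present or future) starts from a fully usable state.
public class Registry {
    private final CopyOnWriteArrayList<String> subscribers = new CopyOnWriteArrayList<String>();
    private final ConcurrentMap<String, Integer> pending = new ConcurrentHashMap<String, Integer>();

    public Registry() {
        // nothing to do: every field is already initialized above
    }

    public void subscribe(String name)  { subscribers.add(name); }
    public int subscriberCount()        { return subscribers.size(); }
    public void markPending(String op)  { pending.put(op, 1); }
    public int pendingCount()           { return pending.size(); }

    public static void main(String[] args) {
        Registry r = new Registry();
        r.subscribe("replication-strategy-1");
        System.out.println("subscribers: " + r.subscriberCount());
    }
}
```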
svn commit: r1065664 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/gms/Gossiper.java src/java/org/apache/cassandra/net/IncomingTcpConnection.java src/java/org/apach
Author: gdusbabek
Date: Mon Jan 31 16:12:57 2011
New Revision: 1065664

URL: http://svn.apache.org/viewvc?rev=1065664&view=rev
Log: ignore messages from the future. keep track of nodes in gossip regardless. patch by gdusbabek, reviewed by jbellis. CASSANDRA-1970

Modified:
    cassandra/branches/cassandra-0.7/CHANGES.txt
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/gms/Gossiper.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/net/MessagingService.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1065664&r1=1065663&r2=1065664&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Mon Jan 31 16:12:57 2011
@@ -49,7 +49,8 @@
  * fix math in RandomPartitioner.describeOwnership (CASSANDRA-2071)
  * fix deletion of sstable non-data components (CASSANDRA-2059)
  * avoid blocking gossip while deleting handoff hints (CASSANDRA-2073)
-
+ * ignore messages from newer versions, keep track of nodes in gossip
+   regardless of version (CASSANDRA-1970)

 0.7.0-final
  * fix offsets to ByteBuffer.get (CASSANDRA-1939)

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/gms/Gossiper.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/gms/Gossiper.java?rev=1065664&r1=1065663&r2=1065664&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/gms/Gossiper.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/gms/Gossiper.java Mon Jan 31 16:12:57 2011
@@ -26,6 +26,7 @@ import java.util.*;
 import java.util.Map.Entry;
 import java.util.concurrent.*;

+import org.cliffc.high_scale_lib.NonBlockingHashMap;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

@@ -141,6 +142,10 @@ public class Gossiper implements IFailureDetectionEventListener
      * after removal to prevent nodes from falsely reincarnating during the time when removal
      * gossip gets propagated to all nodes */
     Map<InetAddress, Long> justRemovedEndpoints_ = new ConcurrentHashMap<InetAddress, Long>();
+
+    // protocol versions of the other nodes in the cluster
+    private final ConcurrentMap<InetAddress, Integer> versions = new NonBlockingHashMap<InetAddress, Integer>();
+
     private Gossiper()
     {
@@ -169,6 +174,20 @@ public class Gossiper implements IFailureDetectionEventListener
     {
         subscribers_.remove(subscriber);
     }
+
+    public void setVersion(InetAddress address, int version)
+    {
+        Integer old = versions.put(address, version);
+        EndpointState state = endpointStateMap_.get(address);
+        if (state == null)
+            addSavedEndpoint(address);
+    }
+
+    public Integer getVersion(InetAddress address)
+    {
+        return versions.get(address);
+    }
+
     public Set<InetAddress> getLiveMembers()
     {

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/net/IncomingTcpConnection.java?rev=1065664&r1=1065663&r2=1065664&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/net/IncomingTcpConnection.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/net/IncomingTcpConnection.java Mon Jan 31 16:12:57 2011
@@ -24,6 +24,7 @@ package org.apache.cassandra.net;
 import java.io.*;
 import java.net.Socket;

+import org.apache.cassandra.gms.Gossiper;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

@@ -52,6 +53,7 @@ public class IncomingTcpConnection extends Thread
     {
         DataInputStream input;
         boolean isStream;
+        int version;
         try
         {
             // determine the connection type to decide whether to buffer
@@ -62,6 +64,8 @@ public class IncomingTcpConnection extends Thread
             if (!isStream)
                 // we should buffer
                 input = new DataInputStream(new BufferedInputStream(socket.getInputStream(), 4096));
+            version = MessagingService.getBits(header, 15, 8);
+            Gossiper.instance.setVersion(socket.getInetAddress(), version);
         }
         catch (IOException e)
         {
@@ -74,6 +78,12 @@ public class IncomingTcpConnection extends Thread
         {
             if (isStream)
             {
+                if (version > MessagingService.version_)
+                {
+                    logger.error("Received untranslated stream from newer protcol version.
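The two behaviors this commit combines — track every peer's protocol version, but refuse to interpret messages from a newer protocol — can be reduced to a small sketch. `VersionGate` is a toy: `OUR_VERSION` stands in for `MessagingService.version_`, and string peer names stand in for `InetAddress`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the version gate from CASSANDRA-1970: remember each peer's
// protocol version (so gossip still knows the node exists), but drop
// rather than misparse messages from a peer speaking a newer protocol.
public class VersionGate {
    static final int OUR_VERSION = 1; // stand-in for MessagingService.version_
    static final ConcurrentMap<String, Integer> peerVersions =
            new ConcurrentHashMap<String, Integer>();

    public static boolean accept(String peer, int messageVersion) {
        peerVersions.put(peer, messageVersion); // track the node regardless
        return messageVersion <= OUR_VERSION;   // but ignore "the future"
    }

    public static void main(String[] args) {
        System.out.println(accept("10.0.0.1", 1)); // same version: accepted
        System.out.println(accept("10.0.0.2", 2)); // newer version: dropped
        System.out.println(peerVersions);          // both peers are tracked
    }
}
```

The asymmetry is deliberate: an old node cannot safely translate a message format it has never seen, but a newer node can downgrade what it sends, so dropping on the receiver plus tracking versions on both sides is enough for a rolling upgrade.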
svn commit: r1065665 - in /cassandra/branches/cassandra-0.7: src/java/org/apache/cassandra/locator/ src/java/org/apache/cassandra/service/ test/unit/org/apache/cassandra/dht/
Author: jbellis
Date: Mon Jan 31 16:13:20 2011
New Revision: 1065665

URL: http://svn.apache.org/viewvc?rev=1065665&view=rev
Log: convert SS.partitioner, valueFactory to instance fields

Modified:
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/Ec2Snitch.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/MigrationManager.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageLoadBalancer.java
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageService.java
    cassandra/branches/cassandra-0.7/test/unit/org/apache/cassandra/dht/BootStrapperTest.java

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/Ec2Snitch.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/Ec2Snitch.java?rev=1065665&r1=1065664&r2=1065665&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/Ec2Snitch.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/locator/Ec2Snitch.java Mon Jan 31 16:13:20 2011
@@ -89,7 +89,7 @@ public class Ec2Snitch extends AbstractNetworkTopologySnitch
     {
         // Share EC2 info via gossip. We have to wait until Gossiper is initialized though.
         logger.info("Ec2Snitch adding ApplicationState ec2region=" + ec2region + " ec2zone=" + ec2zone);
-        Gossiper.instance.addLocalApplicationState(ApplicationState.DC, StorageService.valueFactory.datacenter(ec2region));
-        Gossiper.instance.addLocalApplicationState(ApplicationState.RACK, StorageService.valueFactory.rack(ec2zone));
+        Gossiper.instance.addLocalApplicationState(ApplicationState.DC, StorageService.instance.valueFactory.datacenter(ec2region));
+        Gossiper.instance.addLocalApplicationState(ApplicationState.RACK, StorageService.instance.valueFactory.rack(ec2zone));
     }
 }

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/MigrationManager.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/MigrationManager.java?rev=1065665&r1=1065664&r2=1065665&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/MigrationManager.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/MigrationManager.java Mon Jan 31 16:13:20 2011
@@ -97,7 +97,7 @@ public class MigrationManager
             MessagingService.instance().sendOneWay(msg, host);
         // this is for notifying nodes as they arrive in the cluster.
         if (!StorageService.instance.isClientMode())
-            Gossiper.instance.addLocalApplicationState(ApplicationState.SCHEMA, StorageService.valueFactory.migration(version));
+            Gossiper.instance.addLocalApplicationState(ApplicationState.SCHEMA, StorageService.instance.valueFactory.migration(version));
     }

     /**

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageLoadBalancer.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageLoadBalancer.java?rev=1065665&r1=1065664&r2=1065665&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageLoadBalancer.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageLoadBalancer.java Mon Jan 31 16:13:20 2011
@@ -348,7 +348,7 @@ public class StorageLoadBalancer
                 if (logger_.isDebugEnabled())
                     logger_.debug("Disseminating load info ...");
                 Gossiper.instance.addLocalApplicationState(ApplicationState.LOAD,
-                                                           StorageService.valueFactory.load(StorageService.instance.getLoad()));
+                                                           StorageService.instance.valueFactory.load(StorageService.instance.getLoad()));
             }
         };
         StorageService.scheduledTasks.scheduleWithFixedDelay(runnable, 2 * Gossiper.intervalInMillis_, BROADCAST_INTERVAL, TimeUnit.MILLISECONDS);

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageService.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageService.java?rev=1065665&r1=1065664&r2=1065665&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/StorageService.java (original)
+++
svn commit: r1065668 - /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java
Author: jbellis
Date: Mon Jan 31 16:16:06 2011
New Revision: 1065668

URL: http://svn.apache.org/viewvc?rev=1065668&view=rev
Log: fix circular initialization problem with PropertyFileSnitch caused by #1951
patch by slebresne; reviewed by jbellis

Modified:
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java?rev=1065668&r1=1065667&r2=1065668&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java Mon Jan 31 16:16:06 2011
@@ -56,11 +56,6 @@ import org.mortbay.thread.ThreadPool;
  */
 public abstract class AbstractCassandraDaemon implements CassandraDaemon
 {
-    public AbstractCassandraDaemon()
-    {
-        StorageService.instance.registerDaemon(this);
-    }
-
     //Initialize logging in such a way that it checks for config changes every 10 seconds.
     static
     {
@@ -184,6 +179,7 @@ public abstract class AbstractCassandraDaemon
         SystemTable.purgeIncompatibleHints();

         // start server internals
+        StorageService.instance.registerDaemon(this);
         try
         {
             StorageService.instance.initServer();
svn commit: r1065669 - /cassandra/branches/cassandra-0.7/conf/cassandra.yaml
Author: jbellis
Date: Mon Jan 31 16:16:44 2011
New Revision: 1065669

URL: http://svn.apache.org/viewvc?rev=1065669&view=rev
Log: set default snitch back to Simple

Modified:
    cassandra/branches/cassandra-0.7/conf/cassandra.yaml

Modified: cassandra/branches/cassandra-0.7/conf/cassandra.yaml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/conf/cassandra.yaml?rev=1065669&r1=1065668&r2=1065669&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/conf/cassandra.yaml (original)
+++ cassandra/branches/cassandra-0.7/conf/cassandra.yaml Mon Jan 31 16:16:44 2011
@@ -220,7 +220,7 @@ rpc_timeout_in_ms: 10000
 #  org.apache.cassandra.locator.PropertyFileSnitch:
 #  - Proximity is determined by rack and data center, which are
 #    explicitly configured in cassandra-topology.properties.
-endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
+endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch

 # dynamic_snitch -- This boolean controls whether the above snitch is
 # wrapped with a dynamic snitch, which will monitor read latencies
svn commit: r1065676 - in /cassandra/trunk: ./ conf/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/config/ src/java/org/apache/cassandra/db/ src/java/org/apache/
Author: gdusbabek
Date: Mon Jan 31 16:30:16 2011
New Revision: 1065676

URL: http://svn.apache.org/viewvc?rev=1065676&view=rev
Log: merge from 0.7

Modified:
    cassandra/trunk/   (props changed)
    cassandra/trunk/CHANGES.txt
    cassandra/trunk/conf/cassandra.yaml
    cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java   (props changed)
    cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java   (props changed)
    cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java   (props changed)
    cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java   (props changed)
    cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java   (props changed)
    cassandra/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
    cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java
    cassandra/trunk/src/java/org/apache/cassandra/gms/Gossiper.java
    cassandra/trunk/src/java/org/apache/cassandra/locator/Ec2Snitch.java
    cassandra/trunk/src/java/org/apache/cassandra/locator/TokenMetadata.java
    cassandra/trunk/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
    cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java
    cassandra/trunk/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java
    cassandra/trunk/src/java/org/apache/cassandra/service/MigrationManager.java
    cassandra/trunk/src/java/org/apache/cassandra/service/StorageLoadBalancer.java
    cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
    cassandra/trunk/test/unit/org/apache/cassandra/dht/BootStrapperTest.java

Propchange: cassandra/trunk/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jan 31 16:30:16 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1064713
-/cassandra/branches/cassandra-0.7:1026516-1064915
+/cassandra/branches/cassandra-0.7:1026516-1065665
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689
 /incubator/cassandra/branches/cassandra-0.3:774578-796573

Modified: cassandra/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1065676&r1=1065675&r2=1065676&view=diff
==============================================================================
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Mon Jan 31 16:30:16 2011
@@ -58,7 +58,9 @@
    (CASSANDRA-2058)
  * fix math in RandomPartitioner.describeOwnership (CASSANDRA-2071)
  * fix deletion of sstable non-data components (CASSANDRA-2059)
-
+ * avoid blocking gossip while deleting handoff hints (CASSANDRA-2073)
+ * ignore messages from newer versions, keep track of nodes in gossip
+   regardless of version (CASSANDRA-1970)

 0.7.0-final
  * fix offsets to ByteBuffer.get (CASSANDRA-1939)

Modified: cassandra/trunk/conf/cassandra.yaml
URL: http://svn.apache.org/viewvc/cassandra/trunk/conf/cassandra.yaml?rev=1065676&r1=1065675&r2=1065676&view=diff
==============================================================================
--- cassandra/trunk/conf/cassandra.yaml (original)
+++ cassandra/trunk/conf/cassandra.yaml Mon Jan 31 16:30:16 2011
@@ -225,7 +225,7 @@ rpc_timeout_in_ms: 10000
 #  org.apache.cassandra.locator.PropertyFileSnitch:
 #  - Proximity is determined by rack and data center, which are
 #    explicitly configured in cassandra-topology.properties.
-endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
+endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch

 # dynamic_snitch -- This boolean controls whether the above snitch is
 # wrapped with a dynamic snitch, which will monitor read latencies

Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jan 31 16:30:16 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1064713
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1064915
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1065665
 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689
 /incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/Cassandra.java:774578-796573

Propchange:
[jira] Commented: (CASSANDRA-1970) Message version resolution
[ https://issues.apache.org/jira/browse/CASSANDRA-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988795#comment-12988795 ]

Hudson commented on CASSANDRA-1970:
-----------------------------------

Integrated in Cassandra-0.7 #231 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/231/])
    ignore messages from the future. keep track of nodes in gossip regardless. patch by gdusbabek, reviewed by jbellis. CASSANDRA-1970

Message version resolution
--------------------------

                Key: CASSANDRA-1970
                URL: https://issues.apache.org/jira/browse/CASSANDRA-1970
            Project: Cassandra
         Issue Type: Sub-task
           Reporter: Gary Dusbabek
           Assignee: Gary Dusbabek
           Priority: Minor
            Fix For: 0.7.2
        Attachments: 1970.txt, v3-0001-ignore-messages-from-newer-versions-keep-track-of-node.txt

When a new node (version N) contacts an old node (version N-1) for the first time, the old node will not understand the message. One resolution mechanism would be for the old node to bounce the message back to the sender. The sender would then respond by translating the message to the appropriate version and resending it. For this to work, 0.7.1 will need to have the bounce feature.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (CASSANDRA-2079) AsciiType comparator no longer usable on numeric types in 0.7
AsciiType comparator no longer usable on numeric types in 0.7
-------------------------------------------------------------

                Key: CASSANDRA-2079
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2079
            Project: Cassandra
         Issue Type: Improvement
         Components: Documentation & website
   Affects Versions: 0.7.0
        Environment: Ubuntu 10
           Reporter: Robbie Strickland
            Fix For: 0.7.0

Prior to 0.7, if you wanted to use integer values other than long types as column names, you had to use AsciiType to get a valid numeric-order comparison. If you migrate to 0.7, you need to change the comparison type to IntegerType; otherwise you will get the following error: InvalidRequestException(Why: Invalid byte for ascii: -51), or something similar. The documentation should be updated to warn users of this issue.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
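Why raw integer column names now fail AsciiType validation can be seen with a small check. ASCII only covers byte values 0–127, while big-endian integer encodings routinely contain bytes with the high bit set; the `-51` in the error above is exactly `0xCD` read as a signed byte. `isValidAscii` below mirrors the idea of that validation only; the real validator lives in Cassandra's AsciiType class and works differently in detail.

```java
import java.nio.ByteBuffer;

// Minimal illustration of ASCII validation failing on raw integer bytes:
// any byte with the high bit set (>= 0x80) is not valid 7-bit ASCII.
public class AsciiCheck {
    public static boolean isValidAscii(ByteBuffer bytes) {
        for (int i = bytes.position(); i < bytes.limit(); i++)
            if ((bytes.get(i) & 0x80) != 0) return false; // outside 0..127
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer text = ByteBuffer.wrap("42".getBytes()); // ASCII digits
        ByteBuffer raw = ByteBuffer.allocate(4).putInt(205); // ends in 0xCD,
        raw.flip();                                          // i.e. -51 signed
        System.out.println(isValidAscii(text)); // valid ASCII
        System.out.println(isValidAscii(raw));  // 0xCD is rejected
    }
}
```

This is also why the fix is to switch the comparator rather than re-encode the data: IntegerType compares the raw bytes as a number, which is what the AsciiType trick was approximating in the first place.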
[jira] Commented: (CASSANDRA-1969) Use BB for row cache - To Improve GC performance.
[ https://issues.apache.org/jira/browse/CASSANDRA-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988806#comment-12988806 ]

Vijay commented on CASSANDRA-1969:
----------------------------------

Hi Jonathan,
1) Can we catch the OOM when allocating direct memory, log it, and return null so it does not affect normal operations?
2) Can we add a JVM parameter to limit the JVM direct memory allocations (which will include the allocations for the page cache)?

Use BB for row cache - To Improve GC performance.
-------------------------------------------------

                Key: CASSANDRA-1969
                URL: https://issues.apache.org/jira/browse/CASSANDRA-1969
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
        Environment: Linux and Mac
           Reporter: Vijay
           Assignee: Vijay
           Priority: Minor
        Attachments: 0001-Config-1969.txt, 0001-introduce-ICache-InstrumentingCache-IRowCacheProvider.txt, 0002-Update_existing-1965.txt, 0002-implement-SerializingCache.txt, 0003-New_Cache_Providers-1969.txt, 0003-add-ICache.isCopying-method.txt, 0004-TestCase-1969.txt, BB_Cache-1945.png, JMX-Cache-1945.png, Old_Cahce-1945.png, POC-0001-Config-1945.txt, POC-0002-Update_existing-1945.txt, POC-0003-New_Cache_Providers-1945.txt

Java BB.allocateDirect() will allocate native memory outside the JVM heap and will help reduce GC pressure in the JVM with a large cache. Some basic tests show around a 50% improvement over a normal object cache. In addition, this patch gives users the option to choose BB.allocateDirect or to store everything in the heap.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
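The off-heap idea being discussed can be sketched in miniature: store each cached value as serialized bytes in a direct buffer, so the garbage collector sees one small reference per entry instead of a whole object graph. `OffHeapCache` below is a toy with String values under that assumption, not Cassandra's SerializingCache.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough illustration of a serializing, off-heap row cache: values live in
// direct (native) buffers allocated with ByteBuffer.allocateDirect, and
// every read deserializes a fresh copy back onto the heap.
public class OffHeapCache {
    private final Map<String, ByteBuffer> cache = new ConcurrentHashMap<String, ByteBuffer>();

    public void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length); // off-heap
        buf.put(bytes);
        buf.flip();
        cache.put(key, buf);
    }

    public String get(String key) {
        ByteBuffer buf = cache.get(key);
        if (buf == null) return null;
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // copy out without disturbing position
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapCache c = new OffHeapCache();
        c.put("row1", "some column data");
        System.out.println(c.get("row1"));
    }
}
```

The trade-off Vijay's questions probe is visible even here: `allocateDirect` can fail with an OOM outside the normal heap limits, and direct memory is only bounded if the JVM is told to bound it (e.g. via `-XX:MaxDirectMemorySize`).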
[jira] Created: (CASSANDRA-2080) Upgrade to release of Whirr 0.3.0
Upgrade to release of Whirr 0.3.0
---------------------------------

                Key: CASSANDRA-2080
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2080
            Project: Cassandra
         Issue Type: Improvement
           Reporter: Stu Hood
           Assignee: Stu Hood
           Priority: Trivial

Whirr 0.3.0 has been released.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices
[ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988844#comment-12988844 ]

Jonathan Ellis commented on CASSANDRA-1600:
-------------------------------------------

You can do this with the existing get_indexed_slices API; you just have to manually stop paging when you get to B.

Merge get_indexed_slices with get_range_slices
----------------------------------------------

                Key: CASSANDRA-1600
                URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
            Project: Cassandra
         Issue Type: Improvement
         Components: API
   Affects Versions: 0.7 beta 1
           Reporter: Stu Hood
            Fix For: 0.8
        Attachments: 0001-Add-optional-IndexClause-to-KeyRange-and-serialize-wit.txt, 0002-Drop-the-IndexClause.count-parameter.txt, 0003-Execute-RangeSliceCommands-using-scan-when-an-IndexCla.txt, 0004-Remove-get_indexed_slices-method.txt, 0005-Update-system-tests-to-use-get_range_slices.txt, 0006-Remove-start_key-from-IndexClause-for-the-start_key-in.txt, 0007-Respect-end_key-for-filtered-queries.txt, 0008-allow-applying-row-filtering-to-sequential-scan.txt, 0009-rename-Index-Filter.txt, AbstractScanIterator.java

From a comment on 1157:
{quote}
IndexClause only has a start key for get_indexed_slices, but it would seem that the reasoning behind using 'KeyRange' for get_range_slices applies there as well, since if you know the range you care about in the primary index, you don't want to continue scanning until you exhaust 'count' (or the cluster). Since it would appear that get_indexed_slices would benefit from a KeyRange, why not smash get_(range|indexed)_slices together, and make IndexClause an optional field on KeyRange?
{quote}

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
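The client-side workaround Jonathan describes — page through an index scan and stop consuming once a returned key passes the end key B — can be sketched generically. `fetchPage` here is a hypothetical stand-in for the real Thrift call, reading from an in-memory sorted "index" purely for illustration.

```java
import java.util.*;

// Sketch of manual stop-at-B paging over a start-key-only index scan.
public class IndexPager {
    static final NavigableSet<String> index = new TreeSet<String>(
            Arrays.asList("a1", "a2", "b1", "b2", "c1", "c2"));

    // pretend server call: up to 'count' keys starting at 'start' (inclusive)
    public static List<String> fetchPage(String start, int count) {
        List<String> page = new ArrayList<String>();
        for (String key : index.tailSet(start, true)) {
            if (page.size() == count) break;
            page.add(key);
        }
        return page;
    }

    // page through [startKey, endKey], stopping manually once keys pass endKey
    public static List<String> scan(String startKey, String endKey, int pageSize) {
        List<String> results = new ArrayList<String>();
        String cursor = startKey;
        while (true) {
            List<String> page = fetchPage(cursor, pageSize);
            if (page.isEmpty()) return results;
            for (String key : page) {
                if (key.compareTo(endKey) > 0) return results; // passed B: stop
                results.add(key);
            }
            // resume just after the last key of this page
            cursor = page.get(page.size() - 1) + "\0";
        }
    }

    public static void main(String[] args) {
        System.out.println(scan("a1", "b9", 2)); // prints [a1, a2, b1, b2]
    }
}
```

The inefficiency the ticket wants to remove is visible in the sketch: without an end key in the request, the server may scan and return rows past B (here, the `c*` keys) that the client then throws away.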
[jira] Created: (CASSANDRA-2081) Consistency QUORUM does not work anymore
Consistency QUORUM does not work anymore Key: CASSANDRA-2081 URL: https://issues.apache.org/jira/browse/CASSANDRA-2081 Project: Cassandra Issue Type: Bug Components: Core Environment: linux, hector + cassandra Reporter: Thibaut Priority: Blocker Fix For: 0.7.1
I'm using apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25. Using consistency level QUORUM won't work anymore (tested on reads). Consistency level ONE still works, though. I have tried this with one dead node in my cluster. If I restart cassandra with an older svn revision (apache-cassandra-2011-01-28_20-06-01.jar), I can access the cluster with consistency level QUORUM again, while still using apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25 in my application.
11/01/31 19:54:38 ERROR connection.CassandraHostRetryService: Downed intr1n18(192.168.0.18):9160 host still appears to be down: Unable to open transport to intr1n18(192.168.0.18):9160 , java.net.NoRouteToHostException: No route to host
11/01/31 19:54:38 INFO connection.CassandraHostRetryService: Downed Host retry status false with host: intr1n18(192.168.0.18):9160
11/01/31 19:54:45 ERROR connection.HConnectionManager: Could not fullfill request on this host CassandraClientintr1n11:9160-483
intr1n11 is marked as up however, and I can also access the node through the cassandra cli.
192.168.0.1   Up    Normal  8.02 GB  5.00%  0cc
192.168.0.2   Up    Normal  7.96 GB  5.00%  199
192.168.0.3   Up    Normal  8.24 GB  5.00%  266
192.168.0.4   Up    Normal  4.94 GB  5.00%  333
192.168.0.5   Up    Normal  5.02 GB  5.00%  400
192.168.0.6   Up    Normal  5 GB     5.00%  4cc
192.168.0.7   Up    Normal  5.1 GB   5.00%  599
192.168.0.8   Up    Normal  5.07 GB  5.00%  666
192.168.0.9   Up    Normal  4.78 GB  5.00%  733
192.168.0.10  Up    Normal  4.34 GB  5.00%  7ff
192.168.0.11  Up    Normal  5.01 GB  5.00%  8cc
192.168.0.12  Up    Normal  5.31 GB  5.00%  999
192.168.0.13  Up    Normal  5.56 GB  5.00%  a66
192.168.0.14  Up    Normal  5.82 GB  5.00%  b33
192.168.0.15  Up    Normal  5.57 GB  5.00%  c00
192.168.0.16  Up    Normal  5.03 GB  5.00%  ccc
192.168.0.17  Up    Normal  4.77 GB  5.00%  d99
192.168.0.18  Down  Normal  ?        5.00%  e66
192.168.0.19  Up    Normal  4.78 GB  5.00%  f33
192.168.0.20  Up    Normal  4.83 GB  5.00%
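As context for why this report is surprising: a QUORUM operation needs a majority of the replication factor, floor(RF/2) + 1, so with RF=3 (confirmed later in the thread) one down replica still leaves the 2 live replicas a quorum requires. A minimal illustration of that arithmetic:

```java
public class QuorumMath {
    // Majority of RF replicas: floor(RF/2) + 1.
    public static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // With `down` replicas unavailable, can a quorum still be assembled?
    public static boolean quorumPossible(int rf, int down) {
        return rf - down >= quorum(rf);
    }
}
```

With RF=3 and one node down, quorumPossible(3, 1) is true, so the reported failure points at a regression rather than an expected unavailability.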
[jira] Updated: (CASSANDRA-2072) Race condition during decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-2072: Attachment: (was: 0003-Remove-endpoint-state-when-expiring-justRemovedEndpo.patch) Race condition during decommission -- Key: CASSANDRA-2072 URL: https://issues.apache.org/jira/browse/CASSANDRA-2072 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Brandon Williams Assignee: Brandon Williams Priority: Minor Attachments: 0001-announce-having-left-the-ring-for-RING_DELAY-on-deco.patch, 0002-Improve-TRACE-logging-for-Gossiper.patch Occasionally when decommissioning a node, there is a race condition that occurs where another node will never remove the token and thus propagate it again with a state of down. With CASSANDRA-1900 we can solve this, but it shouldn't occur in the first place. Given nodes A, B, and C, if you decommission B it will stream to A and C. When complete, B will decommission and receive this stacktrace: ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91 At this point A will show it is removing B's token, but C will not and instead its failure detector will report that B is dead, and nodetool ring on C shows B in a leaving/down state. In another gossip round, C will propagate this state back to A. -- This message is automatically generated by JIRA. 
[jira] Commented: (CASSANDRA-2081) Consistency QUORUM does not work anymore
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988865#comment-12988865 ] Jonathan Ellis commented on CASSANDRA-2081: --- What kind of doesn't work are you seeing?
[jira] Created: (CASSANDRA-2083) Hinted Handoff and schema race
Hinted Handoff and schema race -- Key: CASSANDRA-2083 URL: https://issues.apache.org/jira/browse/CASSANDRA-2083 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Brandon Williams Priority: Minor If a node is down while a keyspace/cf is created and then data is inserted into the CF causing other nodes to hint, when the down node recovers it will lose some hints until the schema propagates: {noformat} ERROR 19:59:28,264 Error in row mutation org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000 at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:377) at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:70) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) INFO 19:59:28,356 Applying migration 28e2e7a4-2d74-11e0-9b6b-cdc89135952c {noformat} -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Issue Comment Edited: (CASSANDRA-2081) Consistency QUORUM does not work anymore
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988867#comment-12988867 ] Thibaut edited comment on CASSANDRA-2081 at 1/31/11 8:02 PM: - My application hangs/blocks forever, as I catch all the Hector exceptions and retry when there was an error. The above log messages repeat again and again. There are also no error messages in the cassandra log file. was (Author: tbritz): My application hangs/blocks forever as I catch all the Hector exceptions and retry when there was an error. Above log file messages will repeat itself again and again.
[jira] Updated: (CASSANDRA-2081) Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thibaut updated CASSANDRA-2081: --- Summary: Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host) (was: Consistency QUORUM does not work anymore)
[jira] Created: (CASSANDRA-2084) Corrupt sstables cause compaction to fail again, and again and again, ...
Corrupt sstables cause compaction to fail again, and again and again, ... - Key: CASSANDRA-2084 URL: https://issues.apache.org/jira/browse/CASSANDRA-2084 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Environment: Ubuntu 10.10 Cassandra 0.7.0 4 Nodes Reporter: Dan Hendry
I have been having some serious data corruption issues in my cluster. I suspect some deeper, more serious Cassandra bug, but I don't know what or where it is, and I have not found a way to reproduce the issues I have been having. This ticket is for a behaviour I have observed where Cassandra starts compacting a set of sstables, fails, does not clean up the tmp files, then starts compacting the exact same set of sstables again (see logs below). After a while, the node runs out of disk space and crashes. At the very least, Cassandra should clean up temp files after a failed compaction. Better yet, it should stop trying to compact that file and log which file the error occurred for. The list of corrupt sstables does not even have to be persistent, just an in-memory list which gets wiped out on a restart. Here is a sample log: the same 4 sstables are being compacted, failing, then being compacted again.
INFO [CompactionExecutor:1] 2011-01-31 13:08:26,434 CompactionManager.java (line 272) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-562-Data.db'), org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-692-Data.db'), org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-773-Data.db'), org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-940-Data.db')]
INFO [HintedHandoff:1] 2011-01-31 13:08:28,878 HintedHandOffManager.java (line 226) Could not complete hinted handoff to /192.168.4.16
INFO [HintedHandoff:1] 2011-01-31 13:08:28,879 ColumnFamilyStore.java (line 648) switching in a fresh Memtable for HintsColumnFamily at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1296500864696.log', position=104140211)
INFO [HintedHandoff:1] 2011-01-31 13:08:28,879 ColumnFamilyStore.java (line 952) Enqueuing flush of Memtable-HintsColumnFamily@1652350488(1155546 bytes, 20839 operations)
INFO [FlushWriter:1] 2011-01-31 13:08:28,879 Memtable.java (line 155) Writing Memtable-HintsColumnFamily@1652350488(1155546 bytes, 20839 operations)
INFO [FlushWriter:1] 2011-01-31 13:08:29,199 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-e-9-Data.db (1075487 bytes)
INFO [GossipStage:1] 2011-01-31 13:08:45,508 Gossiper.java (line 569) InetAddress /192.168.4.16 is now UP
INFO [COMMIT-LOG-WRITER] 2011-01-31 13:08:59,736 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1296500939735.log
INFO [MutationStage:8] 2011-01-31 13:09:15,868 ColumnFamilyStore.java (line 648) switching in a fresh Memtable for UserSearch at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1296500939735.log', position=56028937)
INFO [MutationStage:8] 2011-01-31 13:09:15,868 ColumnFamilyStore.java (line 952) Enqueuing flush of Memtable-UserSearch@1186863256(174163962 bytes, 2097155 operations)
INFO [FlushWriter:1] 2011-01-31 13:09:15,868 Memtable.java (line 155) Writing Memtable-UserSearch@1186863256(174163962 bytes, 2097155 operations)
ERROR [CompactionExecutor:1] 2011-01-31 13:09:22,462 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException: attempted to skip 776104308 bytes but only skipped 8469212
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:78)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:178)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:143)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:135)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at
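The reporter's suggested mitigation above, a non-persistent, in-memory list of sstables that failed compaction, consulted before choosing compaction candidates, could look roughly like this. The class and method names are illustrative, not Cassandra's actual internals:

```java
import java.util.*;

public class CompactionBlacklist {
    // In-memory only: wiped on restart, exactly as the reporter suggests.
    private final Set<String> corrupt = Collections.synchronizedSet(new HashSet<>());

    // Record an sstable whose compaction failed (e.g. with an EOFException).
    public void markCorrupt(String sstablePath) {
        corrupt.add(sstablePath);
    }

    // Filter compaction candidates so known-bad sstables are not retried
    // over and over, filling the disk with tmp files.
    public List<String> filterCandidates(List<String> candidates) {
        List<String> ok = new ArrayList<>();
        for (String path : candidates) {
            if (!corrupt.contains(path)) ok.add(path);
        }
        return ok;
    }
}
```

After the first failure on DeviceEventsByDevice-e-562-Data.db, markCorrupt would keep that file out of the next candidate set, breaking the compact/fail/compact loop described in the logs.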
[jira] Issue Comment Edited: (CASSANDRA-2081) Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988867#comment-12988867 ] Thibaut edited comment on CASSANDRA-2081 at 1/31/11 8:10 PM: - My application hangs/blocks forever, as I catch all the Hector exceptions and retry when there was an error. The above log messages repeat again and again. There are also no error messages in the cassandra log file. Also, "Could not fullfill request on this host CassandraClient" is an error message I have never seen before. was (Author: tbritz): My application hangs/blocks forever as I catch all the Hector exceptions and retry when there was an error. Above log file messages will repeat itself again and again. There are also no error messages in the cassandra log file.
[jira] Updated: (CASSANDRA-2084) Corrupt sstables cause compaction to fail again, and again and again, ...
[ https://issues.apache.org/jira/browse/CASSANDRA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Hendry updated CASSANDRA-2084: -- Environment: Ubuntu 10.10 Cassandra 0.7.0 (4 Nodes) Java: - java version 1.6.0_22 - Java(TM) SE Runtime Environment (build 1.6.0_22-b04) - Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) was: Ubuntu 10.10 Cassandra 0.7.0 4 Nodes
[jira] Commented: (CASSANDRA-2081) Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988879#comment-12988879 ] Jonathan Ellis commented on CASSANDRA-2081: --- Is this RF=3? What do you see in the Cassandra log when you set the log level to debug, for the queries that Hector gives up on? What are the versions you tried that work/don't work? (In the description above, both versions are given as apache-cassandra-2011-01-28_20-06-01.jar.)
[jira] Commented: (CASSANDRA-2081) Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1295#comment-1295 ] Thibaut commented on CASSANDRA-2081: RF=3. I will enable the debug log level for cassandra tomorrow, switch back to apache-cassandra-2011-01-28_20-06-01.jar, and post the results. The last version I tried that worked was apache-cassandra-2011-01-24_06-01-26.jar; apache-cassandra-2011-01-28_20-06-01.jar doesn't work anymore.
[jira] Commented: (CASSANDRA-2058) Nodes periodically spike in load
[ https://issues.apache.org/jira/browse/CASSANDRA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988890#comment-12988890 ] David King commented on CASSANDRA-2058: --- I have upgraded to 0.6.11 and am definitely still seeing this problem (although I'm no longer seeing the 30% performance hit while the nodes are up). Nodes periodically spike in load Key: CASSANDRA-2058 URL: https://issues.apache.org/jira/browse/CASSANDRA-2058 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.6.10, 0.7.1 Reporter: David King Assignee: Jonathan Ellis Fix For: 0.6.11, 0.7.1 Attachments: 2058-0.7-v2.txt, 2058-0.7-v3.txt, 2058-0.7.txt, 2058.txt, cassandra.pmc01.log.bz2, cassandra.pmc14.log.bz2, graph a.png, graph b.png (Filing as a placeholder bug as I gather information.) At ~10p 24 Jan, I upgraded our 20-node cluster from 0.6.8 to 0.6.10, turned on the DES, and moved some CFs from one KS into another (drain whole cluster, take it down, move files, change schema, put it back up). Since then, I've had four storms whereby a node's load will shoot to 700+ (400% CPU on a 4-cpu machine) and become totally unresponsive. After a moment or two like that, its neighbour dies too, and the failure cascades around the ring. Unfortunately, because of the high load I'm not able to get into the machine to pull a thread dump to see what it's doing as it happens. I've also had an issue where a single node spikes up to high load, but recovers. This may or may not be the same issue from which the nodes don't recover as above, but both are new behaviour.
[jira] Commented: (CASSANDRA-2081) Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988894#comment-12988894 ] Brandon Williams commented on CASSANDRA-2081: - I'm not able to reproduce with contrib/stress, can you try that?
[jira] Updated: (CASSANDRA-2072) Race condition during decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2072: -- Fix Version/s: 0.7.2 Race condition during decommission -- Key: CASSANDRA-2072 URL: https://issues.apache.org/jira/browse/CASSANDRA-2072 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Brandon Williams Assignee: Brandon Williams Priority: Minor Fix For: 0.7.2 Attachments: 0001-announce-having-left-the-ring-for-RING_DELAY-on-deco.patch, 0002-Improve-TRACE-logging-for-Gossiper.patch, 0003-Remove-endpoint-state-when-expiring-justRemovedEndpo.patch Occasionally when decommissioning a node, there is a race condition that occurs where another node will never remove the token and thus propagate it again with a state of down. With CASSANDRA-1900 we can solve this, but it shouldn't occur in the first place. Given nodes A, B, and C, if you decommission B it will stream to A and C. When complete, B will decommission and receive this stacktrace: ERROR 00:02:40,282 Fatal exception in thread Thread[Thread-5,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:62) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:387) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:91 At this point A will show it is removing B's token, but C will not and instead its failure detector will report that B is dead, and nodetool ring on C shows B in a leaving/down state. In another gossip round, C will propagate this state back to A. -- This message is automatically generated by JIRA. 
- For more information on JIRA, see: http://www.atlassian.com/software/jira
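The loop described in CASSANDRA-2072 (C re-propagating B's stale state after A forgets it) can be illustrated with a toy gossip model. This is a deliberate simplification, not the real Gossiper: actual endpoint states carry generation/version numbers, and the expiry step below stands in for the just-removed-endpoints aging out:

```python
def gossip_round(sender, receiver):
    # Receiver adopts any peer entry it has no local record of
    # (version comparison is elided from this sketch).
    for peer, state in sender.items():
        receiver.setdefault(peer, state)

a = {"B": "left"}   # A processed B's decommission announcement
c = {"B": "down"}   # C missed it; its failure detector marked B down
gossip_round(c, a)  # while A remembers "left", the stale state is ignored
assert a["B"] == "left"

a.pop("B")          # A eventually expires the departed endpoint...
gossip_round(c, a)  # ...and the next round resurrects B as "down"
assert a["B"] == "down"
```

The attached patches work around exactly this window by announcing the departure for RING_DELAY and removing endpoint state when expiring just-removed endpoints.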
[jira] Commented: (CASSANDRA-2058) Nodes periodically spike in load
[ https://issues.apache.org/jira/browse/CASSANDRA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988912#comment-12988912 ] Jonathan Ellis commented on CASSANDRA-2058: --- Please tell me you're at least seeing this less often than with .10 :)
svn commit: r1065827 - in /cassandra/trunk: ./ conf/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/service/
Author: jbellis Date: Mon Jan 31 22:12:43 2011 New Revision: 1065827 URL: http://svn.apache.org/viewvc?rev=1065827view=rev Log: merge from 0.7 Modified: cassandra/trunk/ (props changed) cassandra/trunk/conf/cassandra.yaml cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/src/java/org/apache/cassandra/service/AbstractCassandraDaemon.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jan 31 22:12:43 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1064713 -/cassandra/branches/cassandra-0.7:1026516-1065665 +/cassandra/branches/cassandra-0.7:1026516-1065826 /cassandra/branches/cassandra-0.7.0:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3:774578-796573 Modified: cassandra/trunk/conf/cassandra.yaml URL: http://svn.apache.org/viewvc/cassandra/trunk/conf/cassandra.yaml?rev=1065827r1=1065826r2=1065827view=diff == --- cassandra/trunk/conf/cassandra.yaml (original) +++ cassandra/trunk/conf/cassandra.yaml Mon Jan 31 22:12:43 2011 @@ -225,7 +225,7 @@ rpc_timeout_in_ms: 1 # org.apache.cassandra.locator.PropertyFileSnitch: # - Proximity is determined by rack and data center, which are #explicitly configured in cassandra-topology.properties. 
-endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch +endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch # dynamic_snitch -- This boolean controls whether the above snitch is # wrapped with a dynamic snitch, which will monitor read latencies Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jan 31 22:12:43 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1064713 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1065665 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1065826 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/Cassandra.java:774578-796573 Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jan 31 22:12:43 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1064713 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1065665 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1065826 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/column_t.java:774578-792198 Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jan 31 22:12:43 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:922689-1052356,1052358-1053452,1053454,1053456-1064713 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:1026516-1065665
[jira] Commented: (CASSANDRA-2067) refactor o.a.c.utils.UUIDGen to allow creating type 1 UUIDs for a given time
[ https://issues.apache.org/jira/browse/CASSANDRA-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988927#comment-12988927 ] Folke Behrens commented on CASSANDRA-2067: -- You could also use the Preferences system to store a random permanent node ID. refactor o.a.c.utils.UUIDGen to allow creating type 1 UUIDs for a given time Key: CASSANDRA-2067 URL: https://issues.apache.org/jira/browse/CASSANDRA-2067 Project: Cassandra Issue Type: Bug Components: Core Reporter: Eric Evans Assignee: Eric Evans Fix For: 0.8 Attachments: v1-0001-CASSANDRA-2067-o.a.c.utils.UUIDGen-adapted-from-flewto.txt, v1-0002-eliminate-usage-of-JUG-for-UUIDs.txt, v1-0003-remove-JUG-jar-and-references.txt, v2-0001-CASSANDRA-2067-o.a.c.utils.UUIDGen-adapted-from-flewto.txt, v2-0002-eliminate-usage-of-JUG-for-UUIDs.txt, v2-0003-remove-JUG-jar-and-license-files.txt Original Estimate: 0h Remaining Estimate: 0h CASSANDRA-2027 creates the need to generate type 1 UUIDs using arbitrary date/times. IMO, this would be a good opportunity to replace o.a.c.utils.UUIDGen with the class that Gary Dusbabek wrote for Flewton (https://github.com/flewton/flewton/blob/master/src/com/rackspace/flewton/util/UUIDGen.java), which is better/more comprehensive. We can even eliminate the dependency on JUG. Patches to follow. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
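The refactoring goal — type 1 UUIDs for an arbitrary time — comes down to packing a 60-bit count of 100 ns intervals since the Gregorian epoch (1582-10-15) into the time_low/time_mid/time_hi fields. A rough sketch of that packing using Python's stdlib uuid module (illustrative only; it omits the sub-tick counter and stable node ID that a real generator such as the Flewton UUIDGen needs):

```python
import uuid

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def uuid1_for_unix_time(unix_seconds, clock_seq=0, node=0):
    # Illustrative only: real generators add a sub-tick counter and a
    # stable node ID to avoid collisions between calls in the same tick.
    ticks = int(unix_seconds * 10_000_000) + UUID_EPOCH_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | (1 << 12)  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq & 0xFF, node))

u = uuid1_for_unix_time(0)  # 1970-01-01T00:00:00 UTC
assert u.version == 1
assert u.time == UUID_EPOCH_OFFSET
```

This also shows why generating a UUID "for a given time" is cheap: only the three time fields depend on the timestamp.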
[jira] Updated: (CASSANDRA-1551) create tell me what nodes you have hints for jmx api
[ https://issues.apache.org/jira/browse/CASSANDRA-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Hermes updated CASSANDRA-1551: -- Attachment: 1551-v4.txt Rebased, deleteHFE() now accepts an ipaddr or hostname. create tell me what nodes you have hints for jmx api -- Key: CASSANDRA-1551 URL: https://issues.apache.org/jira/browse/CASSANDRA-1551 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: Jonathan Ellis Assignee: Jon Hermes Priority: Minor Fix For: 0.7.2 Attachments: 1551-v2.txt, 1551-v3.txt, 1551-v4.txt, 1551.txt Original Estimate: 4h Remaining Estimate: 4h we can do this efficiently in 0.7 due to new HH schema. in 0.6 this would require scanning all hints so probably not worth it. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (CASSANDRA-2074) Currently voted on 7.0.1 release won't start on windows
[ https://issues.apache.org/jira/browse/CASSANDRA-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988961#comment-12988961 ] Joaquin Casares commented on CASSANDRA-2074: I tried to reproduce this using the same version of Cassandra that you downloaded and couldn't. I updated the instructions to include Windows and Cassandra 0.7 configurations here: http://wiki.apache.org/cassandra/RunningCassandraInEclipse. I did, however, notice that you aren't passing the -Dcassandra-foreground argument. What you are probably seeing is the output before Cassandra starts running in the background, since it seems like everything processed fine. You could either include the foreground option or access Cassandra using the cassandra-cli. Do either of these options give you better results? Currently voted on 7.0.1 release won't start on windows --- Key: CASSANDRA-2074 URL: https://issues.apache.org/jira/browse/CASSANDRA-2074 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows 7 Reporter: Thibaut Assignee: Joaquin Casares Fix For: 0.7.1 The proposed release (https://hudson.apache.org/hudson/job/Cassandra-0.7/228/) won't start on my windows dev machine running Eclipse. (Haven't tested this on linux.) Startup parameters: -Dcassandra.config=cassandra-test/cassandra.yaml -ea -Xmx2G It exits right after the following message; no ERROR message is shown. I also tried deleting all my data folders, but cassandra still exits. INFO 18:02:09,690 Will not load MX4J, mx4j-tools.jar is not in the classpath apache-cassandra-2011-01-24_06-01-26.jar works fine though.
[jira] Commented: (CASSANDRA-2058) Nodes periodically spike in load
[ https://issues.apache.org/jira/browse/CASSANDRA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988968#comment-12988968 ] David King commented on CASSANDRA-2058: --- It's hard to say. I lost 5 nodes in about an hour, but I don't know how many I lost last time. Environment: OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode), Ubuntu 8.10, Linux pmc01 2.6.27-22-xen #1 SMP Fri Feb 20 23:58:13 UTC 2009 x86_64 GNU/Linux
[Cassandra Wiki] Update of RunningCassandraInEclipse by JoaquinCasares
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The RunningCassandraInEclipse page has been changed by JoaquinCasares. http://wiki.apache.org/cassandra/RunningCassandraInEclipse?action=diffrev1=18rev2=19 -- Right click on the build.xml (in your project root) - Run As - Ant Build. This will do a whole lot of good things, eg. generate the CLI grammar with ANTLR, generate avro and thrift code. + '''UPDATE''' New for Cassandra 0.7.1: Right click on the build.xml (in your project root) - Run As - Ant Build... and select generate-eclipse-files. This will automatically build most of the jars in the right places. All that is left to do is to is the very next step in which you add all of the jars in the lib/ folder to the Build Path and all the dissociations should dissappear. If so, skip to the Run Cassandra section. + Next thing you want to do is to add all the needed third party libraries to the build path. Expand the lib/ folder and find a bunch of jar files. Shift select all of them and right mouse click and choose Build Path - Add to Build Path. This will force Eclipse do update the entire workspace, so please be patient. Some of the errors should also have disappeared by now (not all though). @@ -64, +66 @@ Now, if you are lucky, your Eclipse workspace should look something like this: {{attachment:FixSrcJavaSourceFolder-11.png}} - + + = Common Errors = + - (Some Eclipse users have complained about the following error message: + Some Eclipse users have complained about the following error message: 'Access restriction: The method getDuration() from the type GcInfo is not accessible due to restriction on required library /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Classes/classes.jar'. @@ -80, +84 @@ Now the errors should be gone and you are ready to create a run/debug configuration for cassandra. 
+ + = Run Cassandra = Click Run - Run Configurations. Select org.apache.cassandra.thrift.CassandraDaemon as your Main class, and make sure that your cassandra project is selected in the Project field. Under the Arguments tab you can specify VM arguments. Below is my complete VM arguments list for Cassandra 0.6:
[jira] Created: (CASSANDRA-2085) digest latencies are not included in snitch calculations
digest latencies are not included in snitch calculations Key: CASSANDRA-2085 URL: https://issues.apache.org/jira/browse/CASSANDRA-2085 Project: Cassandra Issue Type: Bug Affects Versions: 0.6.9 Reporter: Jonathan Ellis Fix For: 0.6.11 ResponseVerbHandler calls MessagingService.instance.maybeAddLatency(cb, message.getFrom(), age); but maybeAddLatency needs to include DigestResponseHandler (it was ported from 0.7 where that no longer exists) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
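For context on why the omission matters: the dynamic snitch ranks replicas by the read latencies it has observed, so dropping the samples carried by digest responses biases those rankings. A toy sketch of the ranking idea (names echo, but do not reproduce, Cassandra's MessagingService/snitch API):

```python
from collections import defaultdict

# Toy dynamic snitch: record per-host latency samples, rank hosts by
# their average observed latency (lowest first).
latencies = defaultdict(list)

def maybe_add_latency(host, ms):
    latencies[host].append(ms)

def sorted_by_proximity(hosts):
    return sorted(hosts, key=lambda h: sum(latencies[h]) / len(latencies[h]))

maybe_add_latency("a", 5); maybe_add_latency("a", 7)
maybe_add_latency("b", 2); maybe_add_latency("b", 2)
assert sorted_by_proximity(["a", "b"]) == ["b", "a"]
```

If digest replies never reach maybe_add_latency, most of a QUORUM read's samples vanish and the rankings are computed from a fraction of the traffic.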
[jira] Created: (CASSANDRA-2086) array index out of bounds on compact repair
array index out of bounds on compact repair - Key: CASSANDRA-2086 URL: https://issues.apache.org/jira/browse/CASSANDRA-2086 Project: Cassandra Issue Type: Bug Affects Versions: 0.7.0 Reporter: Jeffrey Damick Priority: Critical We're seeing array index out of bounds exceptions (below) on 0.7.0 when running compact. The repair seems to hang indefinitely on all nodes (also throws index oob). On 1 node in our cluster (running compact): INFO [CompactionExecutor:1] 2011-01-31 20:07:12,140 CompactionManager.java (line 272) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data//XXX-e-318-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/xxx/xxx-e-317-Data.db')] ERROR [CompactionExecutor:1] 2011-01-31 20:07:12,295 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.ArrayIndexOutOfBoundsException: 7 at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:58) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29) at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.doPut(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(Unknown Source) And another node (running compact): INFO [StreamStage:1] 2011-01-31 20:03:48,663 StreamOutSession.java (line 174) Streaming to /xxx.xxx.xxx.xxx ERROR [CompactionExecutor:1] 2011-01-31 20:03:52,587 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.ArrayIndexOutOfBoundsException ERROR [CompactionExecutor:1] 2011-01-31 20:03:54,216 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.ArrayIndexOutOfBoundsException: 6 at 
org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29) at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.doPut(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(Unknown Source) at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:218) Is this related to: CASSANDRA-1959 or CASSANDRA-1992? This has left some of my data in an unrecoverable inaccessible state - how can i repair this situation? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
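The out-of-bounds indices in the traces (6 and 7) are consistent with comparing a value shorter than 8 bytes as if it were a time UUID: a version-1 UUID's 60-bit timestamp spans bytes 0-7, and a timestamp comparison reads the high-order bytes 6 and 7 first. A simplified sketch of that comparison (the real TimeUUIDType comparator also masks the version nibble and treats bytes as unsigned):

```python
# Version-1 UUIDs store the timestamp little-end-first across fields:
# bytes 0-3 = time_low, 4-5 = time_mid, 6-7 = time_hi. Comparing two
# timestamps therefore visits bytes in the order 6,7,4,5,0,1,2,3.
MSB_ORDER = [6, 7, 4, 5, 0, 1, 2, 3]

def compare_timestamp_bytes(a, b):
    # Raises IndexError (Java: ArrayIndexOutOfBoundsException) when a
    # value shorter than 8 bytes -- e.g. a corrupt column name -- is
    # compared as if it were a time UUID.
    for i in MSB_ORDER:
        if a[i] != b[i]:
            return -1 if a[i] < b[i] else 1
    return 0

assert compare_timestamp_bytes(bytes(16), bytes(16)) == 0
try:
    compare_timestamp_bytes(b"\x00" * 6, bytes(16))  # truncated input
    raise AssertionError("expected IndexError")
except IndexError:
    pass
```

That makes the exception a symptom of malformed column-name bytes reaching the comparator, which fits the resolution below as a duplicate of a data-corruption issue.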
[jira] Issue Comment Edited: (CASSANDRA-2081) Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12989032#comment-12989032 ] Aaron Morton edited comment on CASSANDRA-2081 at 2/1/11 4:46 AM: - I've sort of stumbled onto something similar with an 0.7 install. I need to go home now so cannot dig any deeper and rule out human error, but this is what I have. 5 node 0.7.0 install 1) Load data in using python stress.py -d jb-cass1,jb-cass2,jb-cass3,jb-cass4,jb-cass5 -o insert -n 100 -e QUORUM -t 10 -i 1 -l 3 (use all 5 nodes, insert 1,000,000 rows with RF 3 and QUORUM and 10 threads, report progress every second) 2) Read back using python stress.py -d jb-cass2,jb-cass3,jb-cass4,jb-cass5 -o read -n 100 -e QUORUM -t 10 -i 1 (note that jb-cass1 is removed from the list) 3) make big bang Once the read has run a few seconds I ran reboot -f on node 1. I expect the read operations to complete, output was 11270,1315,1315,0.00839671943578,9 11631,361,361,0.00746133188792,11 11631,0,0,NaN,12 11631,0,0,NaN,13 11631,0,0,NaN,14 11631,0,0,NaN,15 11631,0,0,NaN,16 11631,0,0,NaN,17 11631,0,0,NaN,18 11631,0,0,NaN,19 Process Reader-10: Traceback (most recent call last): File /vol/apps/python-2.6.4_64/lib/python2.6/multiprocessing/process.py, line 232, in _bootstrap self.run() File stress.py, line 279, in run r = self.cclient.get_slice(key, parent, p, consistency) File /local1/frameworks/cassandra/apache-cassandra-0.7.0-src/contrib/py_stress/cassandra/Cassandra.py, line 432, in get_slice return self.recv_get_slice() File /local1/frameworks/cassandra/apache-cassandra-0.7.0-src/contrib/py_stress/cassandra/Cassandra.py, line 462, in recv_get_slice raise result.te All clients died. stress.py is not setting a timeout on the thrift socket, so am guessing this is server side. I was running DEBUG on all the nodes (but had turned off the line numbers), this is from one. the 114.63 machine is obviously the one I killed. 
DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,186 StorageService.java (line org.apache.cassandra.service.StorageService) Sorted endpoints are /192.168.114.63,jb08.wetafx.co.nz/192.168.114.67,/192.168.114.64 DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,186 QuorumResponseHandler.java (line org.apache.cassandra.service.QuorumResponseHandler) QuorumResponseHandler blocking for 2 responses DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,186 StorageProxy.java (line org.apache.cassandra.service.StorageProxy) strongread reading digest for SliceFromReadCommand(table='Keyspace1', key='30323334343534', column_parent='QueryPath(columnFamilyName='Standard1', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=5) from 6...@jb08.wetafx.co.nz/192.168.114.67 DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,187 StorageProxy.java (line org.apache.cassandra.service.StorageProxy) strongread reading data for SliceFromReadCommand(table='Keyspace1', key='30323334343534', column_parent='QueryPath(columnFamilyName='Standard1', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=5) from 6623@/192.168.114.63 DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,187 StorageProxy.java (line org.apache.cassandra.service.StorageProxy) strongread reading digest for SliceFromReadCommand(table='Keyspace1', key='30323334343534', column_parent='QueryPath(columnFamilyName='Standard1', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=5) from 6624@/192.168.114.64 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line org.apache.cassandra.db.filter.SliceQueryFilter) collecting 0 of 5: 4330:false:34@1296532428248604 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line org.apache.cassandra.db.filter.SliceQueryFilter) collecting 1 of 5: 4331:false:34@1296532428248637 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line 
org.apache.cassandra.db.filter.SliceQueryFilter) collecting 2 of 5: 4332:false:34@1296532428248640 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line org.apache.cassandra.db.filter.SliceQueryFilter) collecting 3 of 5: 4333:false:34@1296532428248642 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line org.apache.cassandra.db.filter.SliceQueryFilter) collecting 4 of 5: 4334:false:34@1296532428248656 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 ReadVerbHandler.java (line org.apache.cassandra.db.ReadVerbHandler) digest is 220b82e28c2bb4be869c168243d75f01 DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 ReadVerbHandler.java (line org.apache.cassandra.db.ReadVerbHandler) Read key 30323334343534; sending response to 7d8fa1fd-a2fe-6a54-7bb0-3b129206d...@jb08.wetafx.co.nz/192.168.114.67 DEBUG [RequestResponseStage:13] 2011-02-01 17:14:08,188 ResponseVerbHandler.java (line
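The DEBUG log above shows the read path clearly: endpoints sorted by proximity, one full data read, digest reads to the other replicas, and a handler blocking for 2 responses. A toy model of that flow, useful for reasoning about why killing the data-read target could stall clients (the helper below is hypothetical, not Cassandra's StorageProxy):

```python
# Toy model of a QUORUM "strong read": the coordinator sends one data
# request to the closest replica and digest requests to the rest, then
# blocks until quorum responses arrive.
def strong_read(endpoints, quorum, alive):
    responses = [e for e in endpoints if e in alive]
    if len(responses) < quorum:
        # Surfaces to the client as a TimedOutException, as in the
        # stress.py traceback above.
        raise TimeoutError("quorum not reached")
    return len(responses)

endpoints = ["192.168.114.63", "192.168.114.67", "192.168.114.64"]
assert strong_read(endpoints, 2, alive=set(endpoints)) == 3
# After rebooting .63 (the data-read target in the log), the two digest
# replicas can still form a quorum:
assert strong_read(endpoints, 2, alive={"192.168.114.67", "192.168.114.64"}) == 2
```

Since two live replicas still satisfy the quorum arithmetically, the reported cluster-wide timeouts suggest the coordinator keeps waiting on the dead data-read target instead of promoting a digest replica to a data read, which is consistent with the reporter's guess that the failure is server side.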
[jira] Resolved: (CASSANDRA-2086) array index out of bounds on compact repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2086. --- Resolution: Duplicate this is CASSANDRA-1992. array index out of bounds on compact repair - Key: CASSANDRA-2086 URL: https://issues.apache.org/jira/browse/CASSANDRA-2086 Project: Cassandra Issue Type: Bug Affects Versions: 0.7.0 Reporter: Jeffrey Damick Priority: Critical We're seeing array index out of bounds exceptions (below) on 0.7.0 when running compact. The repair seems to hang indefinitely on all nodes (also throws index oob). On 1 node in our cluster (running compact): INFO [CompactionExecutor:1] 2011-01-31 20:07:12,140 CompactionManager.java (line 272) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data//XXX-e-318-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/xxx/xxx-e-317-Data.db')] ERROR [CompactionExecutor:1] 2011-01-31 20:07:12,295 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.ArrayIndexOutOfBoundsException: 7 at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:58) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29) at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.doPut(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(Unknown Source) And another node (running compact): INFO [StreamStage:1] 2011-01-31 20:03:48,663 StreamOutSession.java (line 174) Streaming to /xxx.xxx.xxx.xxx ERROR [CompactionExecutor:1] 2011-01-31 20:03:52,587 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.ArrayIndexOutOfBoundsException ERROR [CompactionExecutor:1] 
2011-01-31 20:03:54,216 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.ArrayIndexOutOfBoundsException: 6 at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45) at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29) at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.doPut(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(Unknown Source) at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:218) Is this related to: CASSANDRA-1959 or CASSANDRA-1992? This has left some of my data in an unrecoverable, inaccessible state - how can I repair this situation? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
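The out-of-bounds indexes 6 and 7 in these traces line up with where a version-1 UUID stores its most significant timestamp bits, which suggests the comparator was handed column names that are not full 16-byte TimeUUIDs. A simplified sketch of such a comparator (an illustration of the failure mode, not the actual TimeUUIDType source; the byte ordering follows the v1 UUID layout):

```java
public class TimeUuidCompareSketch {
    // A version-1 UUID stores its timestamp split across three fields:
    // bytes 6-7 (time_hi_and_version), bytes 4-5 (time_mid), bytes 0-3
    // (time_low). Ordering by timestamp means comparing hi, then mid,
    // then low.
    public static int compareTimestampBytes(byte[] o1, byte[] o2) {
        // byte 6 carries the UUID version in its high nibble; mask it off
        int d = (o1[6] & 0x0F) - (o2[6] & 0x0F);
        if (d != 0) return d;
        // A column name shorter than 8 bytes -- i.e. not a serialized
        // UUID at all -- throws ArrayIndexOutOfBoundsException on these
        // accesses, consistent with the ": 6" / ": 7" messages above.
        d = (o1[7] & 0xFF) - (o2[7] & 0xFF);
        if (d != 0) return d;
        for (int i = 4; i <= 5; i++) {          // time_mid
            d = (o1[i] & 0xFF) - (o2[i] & 0xFF);
            if (d != 0) return d;
        }
        for (int i = 0; i <= 3; i++) {          // time_low
            d = (o1[i] & 0xFF) - (o2[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }
}
```

Under this reading, the fix is less about the comparator and more about how non-UUID bytes got into a TimeUUID-compared column family in the first place (the question CASSANDRA-1992 addresses).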
[jira] Commented: (CASSANDRA-2086) array index out of bounds on compact repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989035#comment-12989035 ] Jeffrey Damick commented on CASSANDRA-2086: --- but is there any way to repair the problem without deleting all of my data?
[jira] Issue Comment Edited: (CASSANDRA-1941) Add distributed test doing reads during MovementTest
[ https://issues.apache.org/jira/browse/CASSANDRA-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989048#comment-12989048 ] Stu Hood edited comment on CASSANDRA-1941 at 2/1/11 5:28 AM: - MovementTest performs a loadbalance, which is almost a full roundtrip. It should be possible to test bootstrap by: # decommissioning the node via nodetool # killing the process # wiping its state # starting it again These commands exist in the Whirr scripts now. was (Author: stuhood): MovementTest performs a loadbalance, which almost a full roundtrip. It should be possible to test bootstrap by: # decommissioning the node via nodetool # killing the process # wiping its state # starting it again These commands exist in the Whirr scripts now. Add distributed test doing reads during MovementTest Key: CASSANDRA-1941 URL: https://issues.apache.org/jira/browse/CASSANDRA-1941 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Brandon Williams Priority: Minor Fix For: 0.8 Following introduction of the distributed test framework in CASSANDRA-1859, we should extend that to test reads while bootstrap happens (this is a scenario that has had regressions in the past). See test/distributed/README.txt for intro.
[jira] Commented: (CASSANDRA-1941) Add distributed test doing reads during MovementTest
[ https://issues.apache.org/jira/browse/CASSANDRA-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989048#comment-12989048 ] Stu Hood commented on CASSANDRA-1941: - MovementTest performs a loadbalance, which is almost a full roundtrip. It should be possible to test bootstrap by: # decommissioning the node via nodetool # killing the process # wiping its state # starting it again These commands exist in the Whirr scripts now.
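The four steps above could be driven from test harness code. A hedged sketch that only assembles the commands rather than executing them; the host, JMX port, data paths, and process pattern are placeholder assumptions (only `nodetool decommission` itself is a real subcommand):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BootstrapCycleSketch {
    // Builds (but does not run) the command sequence for the
    // decommission/kill/wipe/restart cycle described in the comment.
    // Paths and the process pattern are illustrative placeholders.
    public static List<List<String>> buildCommands(String host, int jmxPort) {
        List<List<String>> cmds = new ArrayList<List<String>>();
        // 1. stream this node's ranges away and remove it from the ring
        cmds.add(Arrays.asList("nodetool", "-h", host,
                "-p", String.valueOf(jmxPort), "decommission"));
        // 2. stop the process (kill -9 mirrors a hard failure)
        cmds.add(Arrays.asList("pkill", "-9", "-f", "CassandraDaemon"));
        // 3. wipe local state so the node rejoins as a fresh bootstrap
        cmds.add(Arrays.asList("rm", "-rf",
                "/var/lib/cassandra/data",
                "/var/lib/cassandra/commitlog",
                "/var/lib/cassandra/saved_caches"));
        // 4. start it again; with auto_bootstrap it re-streams its ranges
        cmds.add(Arrays.asList("bin/cassandra"));
        return cmds;
    }
}
```

In a real distributed test these would be dispatched to the target node (e.g. via the Whirr scripts the comment mentions) with reads running concurrently against the rest of the cluster.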
[jira] Created: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Fix For: 0.7.2 From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.
[jira] Commented: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989053#comment-12989053 ] Stu Hood commented on CASSANDRA-2088: - Regarding repair: http://www.mail-archive.com/user@cassandra.apache.org/msg09259.html And compaction: CASSANDRA-2084
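A minimal sketch of the cleanup behavior the ticket asks for: route compaction output through a tmp file and delete it when the merge fails, instead of leaving partial files to fill the disk. `writeMerged` and the file naming are stand-ins for illustration, not Cassandra APIs:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class CompactionTmpCleanupSketch {
    // Write merged output to a tmp file; on failure, delete the partial
    // output before propagating the error. On success the caller would
    // rename the tmp file to its final sstable name.
    public static File compact(File dir, boolean failMidway) throws IOException {
        File tmp = File.createTempFile("sstable-tmp-", "-Data.db", dir);
        try {
            writeMerged(tmp, failMidway);
            return tmp;
        } catch (IOException e) {
            // the cleanup step CASSANDRA-2088 says is currently missed
            if (!tmp.delete())
                System.err.println("could not delete " + tmp);
            throw e;
        }
    }

    // Stand-in for the actual sstable merge; optionally fails partway,
    // simulating a corrupt input sstable.
    private static void writeMerged(File out, boolean fail) throws IOException {
        FileWriter w = new FileWriter(out);
        try {
            w.write("merged rows...");
            if (fail)
                throw new IOException("corrupt input sstable");
        } finally {
            w.close();
        }
    }
}
```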
[jira] Updated: (CASSANDRA-2087) Keep in-memory list of uncompactable sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-2087: Summary: Keep in-memory list of uncompactable sstables (was: Keep in memory list of uncompactable sstables) Keep in-memory list of uncompactable sstables - Key: CASSANDRA-2087 URL: https://issues.apache.org/jira/browse/CASSANDRA-2087 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Priority: Minor Rather than retrying compactions that we know will fail we should: {quote}stop trying to compact that file and log what file the error occurred for. The list of corrupt sstables does not even have to be persistent, just an in memory list which gets wiped out on a restart.{quote}
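The quoted proposal can be sketched as a purely in-memory set consulted when choosing compaction candidates; class and method names here are hypothetical, not anything in the Cassandra codebase:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class UncompactableTracker {
    // Remember which sstables failed to compact so the same failing
    // compaction is not retried until the disk fills up. The set is
    // in-memory only, so a restart wipes it, as the ticket suggests.
    private final Set<String> corrupt =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public void markCorrupt(String sstablePath) {
        corrupt.add(sstablePath);
    }

    public boolean isCompactable(String sstablePath) {
        return !corrupt.contains(sstablePath);
    }

    // Drop known-bad sstables from a candidate compaction bucket.
    public List<String> filterCandidates(List<String> candidates) {
        List<String> ok = new ArrayList<String>();
        for (String path : candidates)
            if (isCompactable(path))
                ok.add(path);
        return ok;
    }
}
```

A compaction executor would call `markCorrupt` in its failure handler and run each candidate bucket through `filterCandidates` before compacting.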
[jira] Issue Comment Edited: (CASSANDRA-2084) Corrupt sstables cause compaction to fail again, and again and again, ...
[ https://issues.apache.org/jira/browse/CASSANDRA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989054#comment-12989054 ] Stu Hood edited comment on CASSANDRA-2084 at 2/1/11 6:00 AM: - EDIT: Just double-checked: apparently version 'f' was in the 0.7 branch, but did not make it into 0.7.0; apologies. I'll take a close look at this tomorrow. -It looks like those SSTables were created with a pre-release version of Cassandra 0.7 (version 'e', vs the release version 'f'). Mind you, that is a use case that we would like to support, but it's important information to include in a bug report.- -This error occurs suspiciously close to the bloom filter reading code, which changed between e and f. I'll CC kingryan to have him take a look tomorrow.- Keeping a list of uncompactable SSTables is an excellent idea: opened CASSANDRA-2087. Also opened CASSANDRA-2088 for the compaction cleanup problem. Thanks for the report! was (Author: stuhood): It looks like those SSTables were created with a pre-release version of Cassandra 0.7 (version 'e', vs the release version 'f'). Mind you, that is a usecase that we would like to support, but it's important information to include in a bug report. This error occurs suspiciously close to the bloom filter reading code, which changed between e and f. I'll CC kingryan to have him take a look tomorrow. Keeping a list of uncompactable SSTables is an excellent idea: opened CASSANDRA-2087. Also opened CASSANDRA-2088 for the compaction cleanup problem. Thanks for the report! Corrupt sstables cause compaction to fail again, and again and again, ...
- Key: CASSANDRA-2084 URL: https://issues.apache.org/jira/browse/CASSANDRA-2084 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Environment: Ubuntu 10.10 Cassandra 0.7.0 (4 Nodes) Java: - java version 1.6.0_22 - Java(TM) SE Runtime Environment (build 1.6.0_22-b04) - Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) Reporter: Dan Hendry I have been having some serious data corruption issues in my cluster. I suspect a deeper, more serious Cassandra bug, but I don't know what or where it is, and I have not found a way to reproduce the issues I have been having. This ticket is for a behaviour I have observed where Cassandra starts compacting a set of sstables, fails, does not clean up the tmp files, then starts compacting the exact same set of sstables again (see logs below). After a while, the node runs out of disk space and crashes. At the very least, Cassandra should clean up temp files after a failed compaction. Better yet, it should stop trying to compact that file and log which file the error occurred for. The list of corrupt sstables does not even have to be persistent, just an in-memory list which gets wiped out on a restart. Here is a sample log: the same 4 sstables are compacted, fail, and are compacted again.
INFO [CompactionExecutor:1] 2011-01-31 13:08:26,434 CompactionManager.java (line 272) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-562-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-692-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-773-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-940-Data.db')] INFO [HintedHandoff:1] 2011-01-31 13:08:28,878 HintedHandOffManager.java (line 226) Could not complete hinted handoff to /192.168.4.16 INFO [HintedHandoff:1] 2011-01-31 13:08:28,879 ColumnFamilyStore.java (line 648) switching in a fresh Memtable for HintsColumnFamily at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1296500864696.log', position=104140211) INFO [HintedHandoff:1] 2011-01-31 13:08:28,879 ColumnFamilyStore.java (line 952) Enqueuing flush of Memtable-HintsColumnFamily@1652350488(1155546 bytes, 20839 operations) INFO [FlushWriter:1] 2011-01-31 13:08:28,879 Memtable.java (line 155) Writing Memtable-HintsColumnFamily@1652350488(1155546 bytes, 20839 operations) INFO [FlushWriter:1] 2011-01-31 13:08:29,199 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/HintsColumnFamily-e-9-Data.db (1075487 bytes) INFO [GossipStage:1] 2011-01-31 13:08:45,508 Gossiper.java (line 569) InetAddress /192.168.4.16 is now UP INFO [COMMIT-LOG-WRITER] 2011-01-31 13:08:59,736 CommitLogSegment.java (line 50) Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1296500939735.log
[jira] Updated: (CASSANDRA-1941) Add distributed test doing reads during MovementTest
[ https://issues.apache.org/jira/browse/CASSANDRA-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-1941: Issue Type: Test (was: New Feature)
[jira] Created: (CASSANDRA-2089) Distributed test for the dynamic snitch
Distributed test for the dynamic snitch --- Key: CASSANDRA-2089 URL: https://issues.apache.org/jira/browse/CASSANDRA-2089 Project: Cassandra Issue Type: Test Components: Core Reporter: Stu Hood The dynamic snitch has turned into an essential component in dealing with partially failed nodes: it would be great to have it fully tested before the 0.8 release. In order to implement a proper test of the snitch, it is necessary to be able to flip a switch to place a node in a degraded state.
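One way to read "flip a switch to place a node in a degraded state" is a test hook that inflates a node's latency score so the snitch routes around it; a sketch under that assumption (the class, method names, and 500 ms penalty are invented for illustration, not the DynamicEndpointSnitch API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DegradableLatencySketch {
    // Test-only switch: when degraded, report an artificially inflated
    // latency score, so a latency-aware snitch should deprioritize this
    // node without the test needing real hardware failure.
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private static final double PENALTY_MS = 500.0;

    public void setDegraded(boolean on) {
        degraded.set(on);
    }

    public double score(double measuredLatencyMs) {
        return degraded.get() ? measuredLatencyMs + PENALTY_MS
                              : measuredLatencyMs;
    }
}
```

A distributed test could then flip the switch on one node and assert that reads at the tested consistency level keep succeeding while that node's score stays worst-ranked.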
[jira] Updated: (CASSANDRA-2089) Distributed test for the dynamic snitch
[ https://issues.apache.org/jira/browse/CASSANDRA-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-2089: Fix Version/s: 0.8