[jira] [Updated] (CASSANDRA-3636) cassandra 1.0.6 debian packages will not run on OpenVZ
[ https://issues.apache.org/jira/browse/CASSANDRA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zenek Kraweznik updated CASSANDRA-3636:
---------------------------------------

    Description: 
During upgrade to 1.0.6:
{code}
Setting up cassandra (1.0.6) ...
*error: permission denied on key 'vm.max_map_count'*
dpkg: error processing cassandra (--configure):
 subprocess installed post-installation script returned error exit status 255
Errors were encountered while processing:
 cassandra
{code}

  was:
During upgrade to 1.0.6:
Setting up cassandra (1.0.6) ...
*error: permission denied on key 'vm.max_map_count'*
dpkg: error processing cassandra (--configure):
 subprocess installed post-installation script returned error exit status 255
Errors were encountered while processing:
 cassandra


cassandra 1.0.6 debian packages will not run on OpenVZ
-------------------------------------------------------

                 Key: CASSANDRA-3636
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3636
             Project: Cassandra
          Issue Type: Bug
          Components: Packaging
    Affects Versions: 1.0.6
         Environment: Debian Linux (stable), OpenVZ container
            Reporter: Zenek Kraweznik
            Priority: Critical

During upgrade to 1.0.6:
{code}
Setting up cassandra (1.0.6) ...
*error: permission denied on key 'vm.max_map_count'*
dpkg: error processing cassandra (--configure):
 subprocess installed post-installation script returned error exit status 255
Errors were encountered while processing:
 cassandra
{code}
[jira] [Issue Comment Edited] (CASSANDRA-3625) Do something about DynamicCompositeType
[ https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170057#comment-13170057 ]

Boris Yen edited comment on CASSANDRA-3625 at 12/15/11 9:07 AM:
----------------------------------------------------------------

Not sure if this is doable or has any issue. I am thinking: why not mimic the way the secondary index is implemented right now? Create one extra column family for keeping track of the comparators for each row. Then, whenever a column is inserted, Cassandra needs to read before write to make sure the new column is valid. This would sacrifice the write performance of a DynamicComposite column family, but at least it allows Cassandra to perform the validation before the actual write.

  was (Author: yulinyen):
Not sure if this is doable or has an issue. I am thinking: why not mimic the way the secondary index is implemented right now? Create one extra column family for keeping track of the comparators for each row. Then, whenever a column is inserted, Cassandra needs to read before write to make sure the new column is valid. This would sacrifice the write performance of a DynamicComposite column family, but at least it allows Cassandra to perform the validation before the actual write.


Do something about DynamicCompositeType
---------------------------------------

                 Key: CASSANDRA-3625
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Sylvain Lebresne

Currently, DynamicCompositeType is a super dangerous type. We cannot leave it that way or people will get hurt.

Let's recall that DynamicCompositeType allows composite column names without any limitation on what each component type can be. It was added basically to allow different rows of the same column family to each store a different index. So for instance you would have:
{noformat}
index1: {
  bar:24 -> someval
  bar:42 -> someval
  foo:12 -> someval
  ...
}
index2: {
  0:uuid1:3.2 -> someval
  1:uuid2:2.2 -> someval
  ...
}
{noformat}
where index1, index2, ... are rows. So each row has columns whose names have a similar structure (so they can be compared), but between rows the structure can be different (we never compare two columns from two different rows).

But the problem is the following: what happens if, in the index1 row above, you insert a column whose name is 0:uuid1? There is no really meaningful way to compare bar:24 and 0:uuid1. The current implementation of DynamicCompositeType, when confronted with this, says that it is a user error and throws a MarshalException. The problem with that is that the exception is not thrown at insert time, and it *cannot* be, because of the dynamic nature of the comparator. But that means that if you do insert the wrong column in the wrong row, you end up *corrupting* an sstable. That is too dangerous a behavior. And it's probably made worse by the fact that some people probably think that DynamicCompositeType should be superior to CompositeType since, you know, it's dynamic.

One solution to that problem could be to decide on some arbitrary (but predictable) order between two incomparable components. For example we could decide that IntType < LongType < StringType < ... Note that even if we do that, I would suggest renaming DynamicCompositeType to something that suggests that CompositeType is always preferable to DynamicCompositeType unless you're really doing very advanced stuff.

Opinions?
[jira] [Commented] (CASSANDRA-3625) Do something about DynamicCompositeType
[ https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170057#comment-13170057 ]

Boris Yen commented on CASSANDRA-3625:
---------------------------------------

Not sure if this is doable or has an issue. I am thinking: why not mimic the way the secondary index is implemented right now? Create one extra column family for keeping track of the comparators for each row. Then, whenever a column is inserted, Cassandra needs to read before write to make sure the new column is valid. This would sacrifice the write performance of a DynamicComposite column family, but at least it allows Cassandra to perform the validation before the actual write.


Do something about DynamicCompositeType
---------------------------------------

                 Key: CASSANDRA-3625
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Sylvain Lebresne

Currently, DynamicCompositeType is a super dangerous type. We cannot leave it that way or people will get hurt.

Let's recall that DynamicCompositeType allows composite column names without any limitation on what each component type can be. It was added basically to allow different rows of the same column family to each store a different index. So for instance you would have:
{noformat}
index1: {
  bar:24 -> someval
  bar:42 -> someval
  foo:12 -> someval
  ...
}
index2: {
  0:uuid1:3.2 -> someval
  1:uuid2:2.2 -> someval
  ...
}
{noformat}
where index1, index2, ... are rows. So each row has columns whose names have a similar structure (so they can be compared), but between rows the structure can be different (we never compare two columns from two different rows).

But the problem is the following: what happens if, in the index1 row above, you insert a column whose name is 0:uuid1? There is no really meaningful way to compare bar:24 and 0:uuid1. The current implementation of DynamicCompositeType, when confronted with this, says that it is a user error and throws a MarshalException. The problem with that is that the exception is not thrown at insert time, and it *cannot* be, because of the dynamic nature of the comparator. But that means that if you do insert the wrong column in the wrong row, you end up *corrupting* an sstable. That is too dangerous a behavior. And it's probably made worse by the fact that some people probably think that DynamicCompositeType should be superior to CompositeType since, you know, it's dynamic.

One solution to that problem could be to decide on some arbitrary (but predictable) order between two incomparable components. For example we could decide that IntType < LongType < StringType < ... Note that even if we do that, I would suggest renaming DynamicCompositeType to something that suggests that CompositeType is always preferable to DynamicCompositeType unless you're really doing very advanced stuff.

Opinions?
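To make the last suggestion concrete, here is a minimal sketch of a predictable cross-type fallback order. This is illustrative only, not Cassandra code: the Component holder, the field names, and the rank-by-type-name rule are all assumptions, and a real implementation would delegate same-type comparisons to the component type's own comparator.

{code:title=FallbackComponentComparator.java (illustrative sketch)|borderStyle=solid}
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical stand-in for one component of a dynamic composite name:
// a (type name, serialized value) pair.
class Component
{
    final String type;      // e.g. "IntType", "LongType", "UTF8Type"
    final ByteBuffer value; // serialized component value

    Component(String type, ByteBuffer value)
    {
        this.type = type;
        this.value = value;
    }
}

// Instead of throwing a MarshalException when components have different
// types, fall back to an arbitrary but predictable order (here, the
// lexical order of the type names, giving IntType < LongType < UTF8Type).
class FallbackComponentComparator implements Comparator<Component>
{
    public int compare(Component a, Component b)
    {
        if (!a.type.equals(b.type))
            return a.type.compareTo(b.type); // stable cross-type order
        return a.value.compareTo(b.value);   // same type: compare the bytes
    }
}
{code}

With such a total order, inserting 0:uuid1 into the index1 row would still be a modeling mistake, but comparisons would stay well defined and no sstable would be corrupted.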
[jira] [Commented] (CASSANDRA-3615) CommitLog BufferOverflowException
[ https://issues.apache.org/jira/browse/CASSANDRA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170092#comment-13170092 ]

Vitalii Tymchyshyn commented on CASSANDRA-3615:
-----------------------------------------------

I've got a similar problem on 1.0.5:

ERROR [COMMIT-LOG-WRITER] 2011-12-13 21:11:57,004 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.nio.BufferOverflowException
	at java.nio.Buffer.nextPutIndex(Buffer.java:518)
	at java.nio.DirectByteBuffer.putInt(DirectByteBuffer.java:664)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:244)
	at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:567)
	at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	at java.lang.Thread.run(Thread.java:679)

It seems that there is an inconsistency between the space check and the actual write: org.apache.cassandra.db.commitlog.CommitLogSegment#hasCapacityFor checks for serialized length + ENTRY_OVERHEAD_SIZE = 4 + 8 + 8. At the same time, write also writes an END_OF_SEGMENT_MARKER int, so ENTRY_OVERHEAD_SIZE should be 4 + 8 + 8 + 4.


CommitLog BufferOverflowException
---------------------------------

                 Key: CASSANDRA-3615
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3615
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.1
            Reporter: Rick Branson
            Assignee: Rick Branson

Reported on the mailing list: http://mail-archives.apache.org/mod_mbox/cassandra-dev/201112.mbox/%3CCAJHHpg2Rw_BWFJ9DycRGSYkmwMwrJDK3%3Dzw3HwRoutWHbUcULw%40mail.gmail.com%3E

ERROR 14:07:31,215 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.nio.BufferOverflowException
	at java.nio.Buffer.nextPutIndex(Buffer.java:501)
	at java.nio.DirectByteBuffer.putInt(DirectByteBuffer.java:654)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:259)
	at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:568)
	at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	at java.lang.Thread.run(Thread.java:662)
 INFO 14:07:31,504 flushing high-traffic column family CFS(Keyspace='***', ColumnFamily='***') (estimated 103394287 bytes)

It happened during a fairly standard load process using M/R.
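The inconsistency described above can be summarized in a small sketch. The constant breakdown follows the comment; the names are illustrative and this is not the actual patch:

{code:title=CommitLogCapacitySketch.java (illustrative sketch)|borderStyle=solid}
import java.nio.ByteBuffer;

public class CommitLogCapacitySketch
{
    static final int ENTRY_HEADER_SIZE = 4 + 8 + 8;  // overhead hasCapacityFor checks today
    static final int END_OF_SEGMENT_MARKER_SIZE = 4; // trailing int that write() also appends
    static final int ENTRY_OVERHEAD_SIZE =
        ENTRY_HEADER_SIZE + END_OF_SEGMENT_MARKER_SIZE; // i.e. 4 + 8 + 8 + 4

    // Reserving room for the end-of-segment marker as well means write()
    // can no longer overflow the buffer by exactly four bytes.
    static boolean hasCapacityFor(ByteBuffer buffer, long serializedSize)
    {
        return serializedSize + ENTRY_OVERHEAD_SIZE <= buffer.remaining();
    }
}
{code}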
[jira] [Updated] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and scheduled repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Williams updated CASSANDRA-3620:
----------------------------------------

    Summary: Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and scheduled repairs  (was: Proposal for distributed deletes - use Reaper Model rather than GCSeconds and scheduled repairs)


Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and scheduled repairs
-----------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-3620
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Dominic Williams
              Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

Here is a proposal for an improved system for handling distributed deletes.

h2. The Problem

There are various issues with repair:
* Repair is expensive anyway
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can reappear (and a node might miss a delete for several reasons, including being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted data reappearing, in some cases the growing tombstone overhead can significantly degrade performance

Because of the foregoing, in high-throughput environments it can be very difficult to make repair a cron job. It can be preferable to keep a terminal open and run repair jobs one by one, making sure they succeed and keeping an eye on overall load to reduce system impact. This isn't desirable, and problems are exacerbated when there are lots of column families in a database, or when it is necessary to run a column family with a low GCSeconds to reduce tombstone load (because there are many writes/deletes to that column family). The database owner must run repair within the GCSeconds window, or increase GCSeconds, to avoid potentially losing delete operations.

It would be much better if there was no ongoing requirement to run repair to ensure deletes aren't lost, and no GCSeconds window. Ideally repair would be an optional maintenance utility used in special cases, or to ensure ONE reads get consistent data.

h2. Reaper Model Proposal

# Tombstones do not expire, and there is no GCSeconds
# Tombstones have associated ACK lists, which record the replicas that have acknowledged them
# Tombstones are only deleted (or marked for compaction) when they have been acknowledged by all replicas
# When a tombstone is deleted, it is added to a fast relic index of MD5 hashes of cf-key-name[-subName]-ackList. The relic index makes it possible for a reaper to acknowledge a tombstone after it is deleted (see the sketch after this list)
# Background reaper threads constantly stream ACK requests to other nodes, and stream ACK responses back for the requests they have received (throttling their usage of CPU and bandwidth so as not to affect performance)
# If a reaper receives a request to ACK a tombstone that does not exist, it creates the tombstone, adds an ACK for the requestor, and replies with an ACK

NOTES
* The existence of entries in the relic index does not affect normal query performance
* If a node goes down, and comes up after a configurable relic entry timeout, the worst that can happen is that a tombstone that hasn't received all its acknowledgements is re-created across the replicas when the reaper requests their acknowledgements (which is no big deal, since this does not corrupt data)
* Since early removal of entries in the relic index does not cause corruption, the index can be kept small, or even kept in memory
* Simple to implement and predictable

h3. Planned Benefits
* Operations are finely grained (reaper interruption is not an issue)
* The labour and administration overhead associated with running repair can be removed
* Reapers can utilize spare cycles and run constantly in the background to prevent the load spikes and performance issues associated with repair
* There will no longer be the threat of corruption if repair can't be run for some reason (for example because of a new adopter's lack of Cassandra expertise, a cron script failing, or Cassandra bugs preventing repair being run, etc.)
*
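Point 4 of the proposal can be sketched as follows. Everything here is an assumption for illustration (the class, the method names, and the string encoding are invented; the proposal does not specify a concrete API): a set of MD5 hashes of cf-key-name[-subName]-ackList identifiers standing in for the relic index.

{code:title=RelicIndexSketch.java (illustrative sketch)|borderStyle=solid}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RelicIndexSketch
{
    // Hashes of deleted tombstones; small, and could be kept in memory
    // since early eviction cannot corrupt data.
    private final Set<String> relics = ConcurrentHashMap.newKeySet();

    static String relicHash(String cf, String key, String name, String ackList)
    {
        try
        {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((cf + '-' + key + '-' + name + '-' + ackList)
                                       .getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new AssertionError(e); // MD5 is always available on the JVM
        }
    }

    // Called when a fully acknowledged tombstone is removed.
    void recordDeletedTombstone(String cf, String key, String name, String ackList)
    {
        relics.add(relicHash(cf, key, name, ackList));
    }

    // Lets a reaper acknowledge a tombstone even after it was deleted.
    boolean wasDeleted(String cf, String key, String name, String ackList)
    {
        return relics.contains(relicHash(cf, key, name, ackList));
    }
}
{code}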
[jira] [Updated] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Williams updated CASSANDRA-3620:
----------------------------------------

    Description: 
Proposal for an improved system for handling distributed deletes, which removes the requirement to run regular repair processes to maintain performance and data integrity.

h2. The Problem

There are various issues with repair:
* Repair is expensive anyway
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can reappear (and a node might miss a delete for several reasons, including being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted data reappearing, in some cases the growing tombstone overhead can significantly degrade performance

Because of the foregoing, in high-throughput environments it can be very difficult to make repair a cron job. It can be preferable to keep a terminal open and run repair jobs one by one, making sure they succeed and keeping an eye on overall load to reduce system impact. This isn't desirable, and problems are exacerbated when there are lots of column families in a database, or when it is necessary to run a column family with a low GCSeconds to reduce tombstone load (because there are many writes/deletes to that column family). The database owner must run repair within the GCSeconds window, or increase GCSeconds, to avoid potentially losing delete operations.

It would be much better if there was no ongoing requirement to run repair to ensure deletes aren't lost, and no GCSeconds window. Ideally repair would be an optional maintenance utility used in special cases, or to ensure ONE reads get consistent data.

h2. Reaper Model Proposal

# Tombstones do not expire, and there is no GCSeconds
# Tombstones have associated ACK lists, which record the replicas that have acknowledged them
# Tombstones are only deleted (or marked for compaction) when they have been acknowledged by all replicas
# When a tombstone is deleted, it is added to a fast relic index of MD5 hashes of cf-key-name[-subName]-ackList. The relic index makes it possible for a reaper to acknowledge a tombstone after it is deleted
# Background reaper threads constantly stream ACK requests to other nodes, and stream ACK responses back for the requests they have received (throttling their usage of CPU and bandwidth so as not to affect performance)
# If a reaper receives a request to ACK a tombstone that does not exist, it creates the tombstone, adds an ACK for the requestor, and replies with an ACK

NOTES
* The existence of entries in the relic index does not affect normal query performance
* If a node goes down, and comes up after a configurable relic entry timeout, the worst that can happen is that a tombstone that hasn't received all its acknowledgements is re-created across the replicas when the reaper requests their acknowledgements (which is no big deal, since this does not corrupt data)
* Since early removal of entries in the relic index does not cause corruption, the index can be kept small, or even kept in memory
* Simple to implement and predictable

h3. Planned Benefits
* Operations are finely grained (reaper interruption is not an issue)
* The labour and administration overhead associated with running repair can be removed
* Reapers can utilize spare cycles and run constantly in the background to prevent the load spikes and performance issues associated with repair
* There will no longer be the threat of corruption if repair can't be run for some reason (for example because of a new adopter's lack of Cassandra expertise, a cron script failing, or Cassandra bugs preventing repair being run, etc.)
* Deleting tombstones earlier, thereby reducing the number involved in query processing, will often dramatically improve performance

  was:
Here is a proposal for an improved system for handling distributed deletes.

h2. The Problem

There are various issues with repair:
* Repair is expensive anyway
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can
[jira] [Created] (CASSANDRA-3638) It may iterate the whole memtable while just query one row . This seriously affect the performance . of Cassandra
It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra
-------------------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-3638
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3638
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.0
            Reporter: MaHaiyang

RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{code:title=Memtable.java|borderStyle=solid}
{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        {color:red}// entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable{color}
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}
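One possible shape of a fix, shown purely as an illustration (the names are assumed and this is not a patch): bound the memtable view by both endpoints of the query with subMap() instead of tailMap(), so that the iterator cannot run past the requested range.

{code:title=BoundedMemtableIteratorSketch.java (illustrative sketch)|borderStyle=solid}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class BoundedMemtableIteratorSketch<K, V>
{
    private final ConcurrentSkipListMap<K, V> columnFamilies = new ConcurrentSkipListMap<K, V>();

    public Iterator<Map.Entry<K, V>> getEntryIterator(K startWith, K stopAt)
    {
        // subMap(from, true, to, true) is inclusive on both ends, so
        // iteration stops at the query's right bound; tailMap(from)
        // would run to the end of the sorted map.
        return columnFamilies.subMap(startWith, true, stopAt, true).entrySet().iterator();
    }
}
{code}

A real fix would also have to handle the wrap-around ranges Cassandra uses, where the right bound can represent the minimum token and the view must stay unbounded.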
[jira] [Updated] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MaHaiyang updated CASSANDRA-3638:
---------------------------------

    Description: 
RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{code:title=Memtable.java|borderStyle=solid}
{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        {color:red}// entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable{color}
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}

  was:
RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{code:title=Memtable.java|borderStyle=solid}
{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        {color:red}// entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable{color}
        if (pred.apply(ici))
            return ici;
    }
    return
[jira] [Updated] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MaHaiyang updated CASSANDRA-3638:
---------------------------------

    Description: 
RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{color:red}// this iterator may iterate the whole memtable!!{color}
{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) // this iterator may iterate the whole memtable!!
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
{code:title=Memtable.java|borderStyle=solid}
// Queries just one row, but returns a sublist of columnFamilies
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{color:red}// entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable{color}
{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        // entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}

  was:
RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
{code:title=Memtable.java|borderStyle=solid}
// Queries just one row, but returns a sublist of columnFamilies
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{color:red}// entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable{color}
{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator
[jira] [Updated] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MaHaiyang updated CASSANDRA-3638:
---------------------------------

    Description: 
RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{color:red}// this iterator may iterate the whole memtable!!{color}
{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) // this iterator may iterate the whole memtable!!
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
{code:title=Memtable.java|borderStyle=solid}
// Queries just one row, but returns a sublist of columnFamilies
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{color:red}// entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable{color}
{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        // entry.getKey() will never be bigger than or equal to startKey, and then it iterates the whole sublist of the memtable
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}

  was:
RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method.

{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);
    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();
        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{code:title=Memtable.java|borderStyle=solid}
{color:red}// Queries just one row, but returns a sublist of columnFamilies{color}
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(),
[jira] [Updated] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Williams updated CASSANDRA-3620:
----------------------------------------

    Description: 
Proposal for an improved system for handling distributed deletes, which removes the requirement to regularly run repair processes to maintain performance and data integrity.

h2. The Problem

There are various issues with repair:
* Repair is expensive to run
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can reappear (and a node might miss a delete for several reasons, including being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted data reappearing, in some cases the growing tombstone overhead can significantly degrade performance

Because of the foregoing, in high-throughput environments it can be very difficult to make repair a cron job. It can be preferable to keep a terminal open and run repair jobs one by one, making sure they succeed and keeping an eye on overall load to reduce system impact. This isn't desirable, and problems are exacerbated when there are lots of column families in a database, or when it is necessary to run a column family with a low GCSeconds to reduce tombstone load (because there are many writes/deletes to that column family). The database owner must run repair within the GCSeconds window, or increase GCSeconds, to avoid potentially losing delete operations.

It would be much better if there was no ongoing requirement to run repair to ensure deletes aren't lost, and no GCSeconds window. Ideally repair would be an optional maintenance utility used in special cases, or to ensure ONE reads get consistent data.

h2. Reaper Model Proposal

# Tombstones do not expire, and there is no GCSeconds
# Tombstones have associated ACK lists, which record the replicas that have acknowledged them
# Tombstones are deleted (or marked for compaction) when they have been acknowledged by all replicas
# When a tombstone is deleted, it is added to a relic index. The relic index makes it possible for a reaper to acknowledge a tombstone after it is deleted
# The ACK lists and relic index are held in memory for speed
# Background reaper threads constantly stream ACK requests to other nodes, and stream ACK responses back for the requests they have received (throttling their usage of CPU and bandwidth so as not to affect performance)
# If a reaper receives a request to ACK a tombstone that does not exist, it creates the tombstone, adds an ACK for the requestor, and replies with an ACK. This is the worst that can happen, and it does not cause data corruption.

ADDENDUM

The proposal to hold the ACK and relic lists in memory was added after the first posting. Please see the comments for the full reasons. Furthermore, a proposal for enhancements to repair was posted to the comments, which would cause tombstones to be scavenged when repair completes (the author had assumed this was the case anyway, but it seems that at the time of writing they are only scavenged during compaction on GCSeconds timeout). The proposals are not exclusive, and this proposal is extended to include the possible enhancements to repair described.

NOTES
* If a node goes down for a prolonged period, the worst that can happen is that some tombstones are recreated across the cluster when it restarts, which does not corrupt data (and this will only occur with a very small number of tombstones)
* The system is simple to implement and predictable
* With the reaper model, repair would become an optional process for optimizing the database to increase the consistency seen by ConsistencyLevel.ONE reads, and for fixing up nodes, for example after an sstable was lost

h3. Planned Benefits
* Reaper threads can utilize spare cycles to constantly scavenge tombstones in the background, thereby greatly reducing tombstone load, improving query performance, reducing the system resources needed by processes such as compaction, and making performance generally more predictable
* The reaper model means that GCSeconds is no longer necessary, which removes the threat of data corruption if repair can't be run successfully within that period (for example if repair can't be run because of a new adopter's lack of Cassandra expertise, a cron script failing, or Cassandra bugs or other technical issues)
* Reaper threads are fully automatic, work in the background and perform finely grained operations where
[jira] [Issue Comment Edited] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169761#comment-13169761 ]

Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:19 PM:
-----------------------------------------------------------------------

Ok I got it, and +1 on that idea. I had actually assumed tombstones were compacted away after repair anyway. So abandon GCSeconds and simply kill off tombstones created before repair when it runs successfully (presumably on a range-by-range basis?)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair is not run

That would be a cool first step and improve the current situation.

I think a reaper system is still needed though, although this feature would take some of the existing pressure off. There would still be the issue of tombstone build-up between repairs, which means performance can vary (or actually, degrade) between invocations, the load spikes from repair itself, and the manual nature of the process.

I guess I'm on the sharp end of this - we have several column families where columns represent game objects or messages owned by users, and where there is a high delete and insert load. Various operations need to perform slices of user rows, and these can get much slower as tombstones build up, so GCSeconds has been brought right down, but this leads to the constant pain of "omg, how long left before we need to run repair or increase GCSeconds" etc. Improving repair as described would remove the Sword of Damocles threat of data corruption, but we'd still need to make sure it was run regularly, performance would degrade between invocations, and repair would create load spikes. The reaping model can take away those problems.

  was (Author: dccwilliams):
Ok I got it, and +1 on that idea. Abandon GCSeconds and simply kill off tombstones created before repair when it runs successfully (presumably on a range-by-range basis)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair is not run

That would be a very cool first step to optimize this.

I think a reaper system would still be well worthwhile though, although this feature would take some pressure off. There is still the issue of tombstone build-up between repairs, which means performance can vary (or actually, degrade) between invocations, plus there are still the load spikes from repair itself.

I guess I'm on the sharp end of this - we have several column families where columns represent game objects or messages owned by users, and where there is a high delete and insert load. Various operations need to perform slices of user rows, and these can get much slower as tombstones build up, so GCSeconds has been brought right down, but this leads to the constant pain of "omg, how long left before we need to run repair or increase GCSeconds" etc. Improving repair would remove the Sword of Damocles thing, but we'd still need to run it regularly, and performance wouldn't be as consistent as it could be with constant background reaping.


Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
--------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-3620
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Dominic Williams
              Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

Proposal for an improved system for handling distributed deletes, which removes the requirement to regularly run repair processes to maintain performance and data integrity.

h2. The Problem

There are various issues with repair:
* Repair is expensive to run
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can reappear (and a node might miss a delete for several reasons, including being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted data reappearing, in some cases the growing tombstone overhead can significantly degrade performance

Because of the foregoing, in high-throughput environments it can be very difficult to make repair a cron job. It can be preferable to keep a terminal
[jira] [Issue Comment Edited] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169761#comment-13169761 ]

Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:22 PM:
-----------------------------------------------------------------------

Ok I got it, and +1 on that idea. I had actually assumed tombstones were compacted away after repair anyway. So, as I understand it, GCSeconds would be removed, and tombstones would be marked for deletion once a repair operation was successfully run.

That would be a cool first step and improve the current situation. But I think a reaper system is still needed: although this feature would take some of the current pressure off, there would still be the issue of tombstone build-up between repairs, which means performance will degrade between invocations, plus the load spikes from repair itself and the manual nature of the process.

I guess I'm on the sharp end of this - we have several column families where columns represent game objects or messages owned by users, and where there is a high delete and insert load. Various operations need to perform slices of user rows, and these can get much slower as tombstones build up, so GCSeconds has been brought right down, but this leads to the constant pain of "omg, how long left before we need to run repair or increase GCSeconds" etc. Improving repair as described would remove the Sword of Damocles threat of data corruption, but we'd still need to make sure it was run regularly, performance would degrade between invocations, and repair would create load spikes. The reaping model can take away those problems.

  was (Author: dccwilliams):
Ok I got it, and +1 on that idea. I had actually assumed tombstones were compacted away after repair anyway. So abandon GCSeconds and simply kill off tombstones created before repair when it runs successfully (presumably on a range-by-range basis?)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair is not run

That would be a cool first step and improve the current situation.

I think a reaper system is still needed though, although this feature would take some of the existing pressure off. There would still be the issue of tombstone build-up between repairs, which means performance can vary (or actually, degrade) between invocations, the load spikes from repair itself, and the manual nature of the process.

I guess I'm on the sharp end of this - we have several column families where columns represent game objects or messages owned by users, and where there is a high delete and insert load. Various operations need to perform slices of user rows, and these can get much slower as tombstones build up, so GCSeconds has been brought right down, but this leads to the constant pain of "omg, how long left before we need to run repair or increase GCSeconds" etc. Improving repair as described would remove the Sword of Damocles threat of data corruption, but we'd still need to make sure it was run regularly, performance would degrade between invocations, and repair would create load spikes. The reaping model can take away those problems.


Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs
--------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-3620
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Dominic Williams
              Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

Proposal for an improved system for handling distributed deletes, which removes the requirement to regularly run repair processes to maintain performance and data integrity.

h2. The Problem

There are various issues with repair:
* Repair is expensive to run
* Repair jobs are often made more expensive than they should be by other issues (nodes dropping requests, hinted handoff not working, downtime, etc.)
* Repair processes can often fail and need restarting, for example in cloud environments where network issues make a node disappear from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of issues with Cassandra, data written to a node that did not see a later delete can reappear (and a node might miss a delete for several reasons, including being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted data reappearing, in some cases the growing tombstone overhead can significantly degrade performance

Because of the foregoing, in high-throughput environments it can be
[jira] [Commented] (CASSANDRA-2475) Prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170257#comment-13170257 ]

Eric Evans commented on CASSANDRA-2475:
---------------------------------------

bq. WFM. I assumed Rick had already implemented one for JDBC API completeness, but if we're just going to no-op that out for now I'm not going to lose any sleep over it.

He did, but we removed it at an earlier stage of the review, for the reasons listed here (so if it's decided that we should have one, I'll do the work to put it back in).

bq. It's the client's responsibility to prepare the statements on each connection before using them, which implies some caching behavior on the part of the driver, as in http://www.theserverside.com/news/1365244/Why-Prepared-Statements-are-important-and-how-to-use-them-properly

OK, that makes sense. Though it would seem to add another data point to the "an API to remove PSes isn't necessary" argument, since a close() on a pooled connection isn't going to remove the statement server-side anyway.


Prepared statements
-------------------

                 Key: CASSANDRA-2475
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2475
             Project: Cassandra
          Issue Type: New Feature
          Components: API, Core
    Affects Versions: 1.0.5
            Reporter: Eric Evans
            Assignee: Rick Shaw
            Priority: Minor
              Labels: cql
             Fix For: 1.1
         Attachments: 2475-v1.patch, 2475-v2.patch, 2475-v3.1.patch, 2475-v3.2-Thrift.patch, v1-0001-CASSANDRA-2475-prepared-statement-patch.txt, v1-0002-regenerated-thrift-java.txt, v2-0001-CASSANDRA-2475-rickshaw-2475-v3.1.patch.txt, v2-0002-rickshaw-2475-v3.2-Thrift.patch-w-changes.txt, v2-0003-eevans-increment-thrift-version-by-1-not-3.txt, v2-0004-eevans-misc-cleanups.txt, v2-0005-eevans-refactor-for-better-encapsulation-of-prepare.txt, v2-0006-eevans-log-queries-at-TRACE.txt, v2-0007-use-an-LRU-map-for-storage-of-prepared-statements.txt
svn commit: r1214803 - /cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java
Author: eevans
Date: Thu Dec 15 15:05:08 2011
New Revision: 1214803

URL: http://svn.apache.org/viewvc?rev=1214803&view=rev
Log:
bump maximum cached prepared statements to 10,000 (from 50) (and fix Map so that it is actually LRU)

Patch by eevans for CASSANDRA-2475

Modified:
    cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java?rev=1214803&r1=1214802&r2=1214803&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java Thu Dec 15 15:05:08 2011
@@ -43,7 +43,7 @@ import org.apache.cassandra.thrift.Inval
  */
 public class ClientState
 {
-    private static final int MAX_CACHE_PREPARED = 50; // Ridiculously large, right?
+    private static final int MAX_CACHE_PREPARED = 10000; // Enough to keep buggy clients from OOM'ing us
     private static Logger logger = LoggerFactory.getLogger(ClientState.class);
     // Current user for the session
@@ -53,7 +53,7 @@ public class ClientState
     private final List<Object> resource = new ArrayList<Object>();
     // An LRU map of prepared statements
-    private Map<Integer, CQLStatement> prepared = new HashMap<Integer, CQLStatement>() {
+    private Map<Integer, CQLStatement> prepared = new LinkedHashMap<Integer, CQLStatement>(16, 0.75f, true) {
         protected boolean removeEldestEntry(Map.Entry<Integer, CQLStatement> eldest)
         {
             return size() > MAX_CACHE_PREPARED;
         }
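For context, the access-order LinkedHashMap idiom used in the patch above behaves as follows. This standalone demo uses illustrative names and values, not the Cassandra source:

{code:title=LruCacheDemo.java (illustrative)|borderStyle=solid}
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo
{
    private static final int MAX_ENTRIES = 3;

    public static void main(String[] args)
    {
        // The third constructor argument (accessOrder = true) is what makes
        // this an LRU map; the default insertion order evicts the oldest
        // *inserted* entry even if it was just read, which is the bug the
        // commit message refers to.
        Map<Integer, String> lru = new LinkedHashMap<Integer, String>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest)
            {
                return size() > MAX_ENTRIES;
            }
        };

        lru.put(1, "a");
        lru.put(2, "b");
        lru.put(3, "c");
        lru.get(1);      // touch 1 so it becomes most recently used
        lru.put(4, "d"); // evicts 2, the least recently used entry

        System.out.println(lru.keySet()); // prints [3, 1, 4]
    }
}
{code}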
[Cassandra Wiki] Update of ArticlesAndPresentations by zznate
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The ArticlesAndPresentations page has been changed by zznate:
http://wiki.apache.org/cassandra/ArticlesAndPresentations?action=diff&rev1=129&rev2=130

    * [[http://www.emtg.net78.net/2011/10/21/cassandra_hector.html|Cassandra y Hector]], Spanish, October 2011
  
  = Presentations =
+   * [[http://www.slideshare.net/jsevellec/cassandra-pour-les-dveloppeurs-java|Cassandra pourles(ch'tis)Développeurs Java]] - Jérémy Sevellec, December 2011
    * [[http://www.slideshare.net/mattdennis/cassandra-data-modeling|Cassandra Data Modeling Workshop]] - Cassandra SF, Matthew F. Dennis, July 2011
    * [[http://www.slideshare.net/jeromatron/cassandrahadoop-integration|Cassandra/Hadoop Integration]] - Jeremy Hanna, January 2011
    * [[http://www.slideshare.net/supertom/using-cassandra-with-your-web-application|Using Cassandra with your Web Application]] - Tom Melendez, Oct 2010
[Cassandra Wiki] Update of ArticlesAndPresentations by zznate
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The ArticlesAndPresentations page has been changed by zznate:
http://wiki.apache.org/cassandra/ArticlesAndPresentations?action=diff&rev1=130&rev2=131

Comment:
adjust link name format

    * [[http://www.emtg.net78.net/2011/10/21/cassandra_hector.html|Cassandra y Hector]], Spanish, October 2011
  
  = Presentations =
-   * [[http://www.slideshare.net/jsevellec/cassandra-pour-les-dveloppeurs-java|Cassandra pourles(ch'tis)Développeurs Java]] - Jérémy Sevellec, December 2011
+   * [[http://www.slideshare.net/jsevellec/cassandra-pour-les-dveloppeurs-java|Cassandra pour les (ch'tis) Développeurs Java]] - Jérémy Sevellec, December 2011
    * [[http://www.slideshare.net/mattdennis/cassandra-data-modeling|Cassandra Data Modeling Workshop]] - Cassandra SF, Matthew F. Dennis, July 2011
    * [[http://www.slideshare.net/jeromatron/cassandrahadoop-integration|Cassandra/Hadoop Integration]] - Jeremy Hanna, January 2011
    * [[http://www.slideshare.net/supertom/using-cassandra-with-your-web-application|Using Cassandra with your Web Application]] - Tom Melendez, Oct 2010
[jira] [Updated] (CASSANDRA-3639) Move streams too many data
[ https://issues.apache.org/jira/browse/CASSANDRA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Rousseau updated CASSANDRA-3639: --- Attachment: 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch Move streams too many data -- Key: CASSANDRA-3639 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Fabien Rousseau Priority: Minor Attachments: 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch During a move operation, we observed that the node streamed most of its data and received all its data. We are running Cassandra 0.8.7 (plus a few patches) After reading the code related to move, we found out that : - in StorageService.java, line 2002 and line 2004 = ranges are returned in a non ordered collection, but calculateStreamAndFetchRanges() method (line 2011) assume ranges are sorted, thus, resulting in wider ranges to be fetched/streamed We managed to isolate and reproduce this in a unit test. We also propose a patch which : - does not rely on any sort - adds a few unit tests (may not be exhaustive...) Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, but they probably should be moved. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
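To make the order sensitivity concrete: any pairwise walk over two interval lists is only correct if both lists are sorted by their left endpoint, and feeding such a walk an unordered collection produces results that are too wide, which is exactly the over-streaming described above. A hedged illustration with a toy Range type and a plain interval subtraction of my own (not the actual Range/StorageService code):

{code:title=RangeSubtraction.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RangeSubtraction
{
    // toy half-open token interval (left, right]; not Cassandra's Range class
    static class Range
    {
        final long left, right;
        Range(long left, long right) { this.left = left; this.right = right; }
        public String toString() { return "(" + left + "," + right + "]"; }
    }

    // subtract 'minus' from 'from'; the walk below is only correct when both
    // lists are ordered by left endpoint, so we sort defensively first
    static List<Range> subtract(List<Range> from, List<Range> minus)
    {
        Comparator<Range> byLeft = new Comparator<Range>()
        {
            public int compare(Range a, Range b) { return Long.compare(a.left, b.left); }
        };
        List<Range> sortedFrom = new ArrayList<Range>(from);
        List<Range> sortedMinus = new ArrayList<Range>(minus);
        sortedFrom.sort(byLeft);   // without these sorts, an unordered input
        sortedMinus.sort(byLeft);  // collection can yield too-wide results

        List<Range> result = new ArrayList<Range>();
        for (Range f : sortedFrom)
        {
            long cursor = f.left;
            for (Range m : sortedMinus)
            {
                if (m.right <= cursor || m.left >= f.right)
                    continue; // no overlap with the unprocessed part of f
                if (m.left > cursor)
                    result.add(new Range(cursor, m.left));
                cursor = Math.max(cursor, m.right);
            }
            if (cursor < f.right)
                result.add(new Range(cursor, f.right));
        }
        return result;
    }

    public static void main(String[] args)
    {
        // ranges deliberately passed in non-sorted order, as a hash-based set might
        List<Range> current = Arrays.asList(new Range(50, 100), new Range(0, 50));
        List<Range> afterMove = Arrays.asList(new Range(60, 100), new Range(0, 50));
        System.out.println(subtract(current, afterMove)); // [(50,60]]
    }
}
{code}

With the defensive sorts removed, passing the same ranges in a different collection order can change the answer; the attached patch instead makes the computation independent of any sort, which is the more robust fix.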
[jira] [Resolved] (CASSANDRA-3637) data file size limit
[ https://issues.apache.org/jira/browse/CASSANDRA-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-3637. --- Resolution: Not A Problem LeveledCompactionStrategy addresses this. http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra data file size limit Key: CASSANDRA-3637 URL: https://issues.apache.org/jira/browse/CASSANDRA-3637 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Zenek Kraweznik For a 100GB Cassandra database (on a 500GB disk) I need another 100GB of space for compacting (caused by large files; one of the data files is 80GB). Limiting the file size, for example to 5GB (the limit should be configurable), would require significantly less space for that operation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3639) Move streams too many data
Move streams too many data -- Key: CASSANDRA-3639 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Fabien Rousseau Priority: Minor Attachments: 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch During a move operation, we observed that the node streamed most of its data and received all its data. We are running Cassandra 0.8.7 (plus a few patches) After reading the code related to move, we found out that : - in StorageService.java, line 2002 and line 2004 = ranges are returned in a non ordered collection, but calculateStreamAndFetchRanges() method (line 2011) assume ranges are sorted, thus, resulting in wider ranges to be fetched/streamed We managed to isolate and reproduce this in a unit test. We also propose a patch which : - does not rely on any sort - adds a few unit tests (may not be exhaustive...) Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, but they probably should be moved. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170304#comment-13170304 ] Jonathan Ellis commented on CASSANDRA-3638: --- getRangeSlice is the "scan a lot of rows" method. getColumnFamily is the "scan a single row" method. It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra -- Key: CASSANDRA-3638 URL: https://issues.apache.org/jira/browse/CASSANDRA-3638 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.0 Reporter: MaHaiyang RangeSliceVerbHandler may query only one row, but Cassandra may iterate the whole memtable. The problem is in the ColumnFamilyStore.getRangeSlice() method. {color:red} // this iterator may iterate the whole memtable!!{color} {code:title=ColumnFamilyStore.java|borderStyle=solid} public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter) throws ExecutionException, InterruptedException { ... DecoratedKey startWith = new DecoratedKey(range.left, null); DecoratedKey stopAt = new DecoratedKey(range.right, null); QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter); int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds(); List<Row> rows; ViewFragment view = markReferenced(startWith, stopAt); try { CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this); rows = new ArrayList<Row>(); try { // pull rows out of the iterator boolean first = true; while (iterator.hasNext()) // this iterator may iterate the whole memtable!! { } } . } . return rows; } {code} {color:red} // Queries only one row, but returns a sublist of columnFamilies {color} {code:title=Memtable.java|borderStyle=solid} // Queries only one row, but returns a sublist of columnFamilies public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith) { return columnFamilies.tailMap(startWith).entrySet().iterator(); } {code} {color:red} // entry.getKey() will never be bigger than or equal to startKey, and then the whole sublist of the memtable is iterated {color} {code:title=RowIteratorFactory.java|borderStyle=solid} public IColumnIterator computeNext() { while (iter.hasNext()) { Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next(); IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator); // entry.getKey() will never be bigger than or equal to startKey, and then the whole sublist of the memtable is iterated if (pred.apply(ici)) return ici; } return endOfData(); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
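For readers tracing the tailMap call quoted above: a sorted map's tailMap view is bounded from below only, so the iterator it returns will walk to the end of the map unless the caller enforces the stop key itself or requests a subMap bounded on both ends. A small sketch using ConcurrentSkipListMap with String keys, purely as a stand-in for the memtable's sorted map:

{code:title=BoundedScan.java|borderStyle=solid}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class BoundedScan
{
    public static void main(String[] args)
    {
        // stands in for the memtable's sorted columnFamilies map
        // (String keys here instead of DecoratedKey, for illustration only)
        ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<String, String>();
        for (String k : new String[]{ "a", "b", "c", "d", "e" })
            rows.put(k, "cf-" + k);

        String startWith = "b", stopAt = "c";

        // tailMap bounds the view from below only: without an explicit stop
        // condition this walks every key >= startWith
        Iterator<Map.Entry<String, String>> unbounded = rows.tailMap(startWith).entrySet().iterator();
        int walked = 0;
        while (unbounded.hasNext()) { unbounded.next(); walked++; }
        System.out.println("tailMap walked " + walked + " entries"); // 4 (b..e)

        // subMap bounds the view on both ends, so the iterator itself stops
        for (Map.Entry<String, String> e : rows.subMap(startWith, true, stopAt, true).entrySet())
            System.out.println(e.getKey() + " -> " + e.getValue()); // b and c only
    }
}
{code}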
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170310#comment-13170310 ] Jonathan Ellis commented on CASSANDRA-3635: --- I don't think we should put this in 0.8. The repair problems there are a lot deeper than this. I'm fine with posting a backport patch if people want to run a custom build with it, but this shouldn't go in anything earlier than 1.0. (I'd prefer 1.1 TBH.) Since compaction throughput does not include validation anymore, I'd prefer to default to something like 12/4 instead of effectively increasing the impact of compaction + repair out of the box. Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively but don't necessarily want to have minor compactions throttled that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3639) Move streams too many data
[ https://issues.apache.org/jira/browse/CASSANDRA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3639: -- Reviewer: thepaul Fix Version/s: 1.1 I'm not comfortable changing the move code in a stable release, but this sounds like a good change to make in 1.1. Can you post a version of the patch that applies to trunk? Move streams too many data -- Key: CASSANDRA-3639 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Fabien Rousseau Priority: Minor Fix For: 1.1 Attachments: 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch During a move operation, we observed that the node streamed most of its data and received all its data. We are running Cassandra 0.8.7 (plus a few patches) After reading the code related to move, we found out that : - in StorageService.java, line 2002 and line 2004 = ranges are returned in a non ordered collection, but calculateStreamAndFetchRanges() method (line 2011) assume ranges are sorted, thus, resulting in wider ranges to be fetched/streamed We managed to isolate and reproduce this in a unit test. We also propose a patch which : - does not rely on any sort - adds a few unit tests (may not be exhaustive...) Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, but they probably should be moved. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2475) Prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170320#comment-13170320 ] Jonathan Ellis commented on CASSANDRA-2475: --- bq. it would seem to add another data-point to the API to remove PSes isn't necessary argument Agreed, let's leave that out for now. Prepared statements --- Key: CASSANDRA-2475 URL: https://issues.apache.org/jira/browse/CASSANDRA-2475 Project: Cassandra Issue Type: New Feature Components: API, Core Affects Versions: 1.0.5 Reporter: Eric Evans Assignee: Rick Shaw Priority: Minor Labels: cql Fix For: 1.1 Attachments: 2475-v1.patch, 2475-v2.patch, 2475-v3.1.patch, 2475-v3.2-Thrift.patch, v1-0001-CASSANDRA-2475-prepared-statement-patch.txt, v1-0002-regenerated-thrift-java.txt, v2-0001-CASSANDRA-2475-rickshaw-2475-v3.1.patch.txt, v2-0002-rickshaw-2475-v3.2-Thrift.patch-w-changes.txt, v2-0003-eevans-increment-thrift-version-by-1-not-3.txt, v2-0004-eevans-misc-cleanups.txt, v2-0005-eevans-refactor-for-better-encapsulation-of-prepare.txt, v2-0006-eevans-log-queries-at-TRACE.txt, v2-0007-use-an-LRU-map-for-storage-of-prepared-statements.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations
[ https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170322#comment-13170322 ] Jonathan Ellis commented on CASSANDRA-1391: --- That doesn't work, though: what if we have two updates at the same timestamp? I think it really does need to be content-based. Also, I still think using Table.apply and CF.diff is the right way to do this, instead of effectively duplicating that code as a special case. Are there any downsides to that approach I'm missing? Allow Concurrent Schema Migrations -- Key: CASSANDRA-1391 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Stu Hood Assignee: Pavel Yaskevich Fix For: 1.1 Attachments: 0001-new-migration-schema-and-avro-methods-cleanup.patch, 0002-avro-removal.patch, 0003-oldVersion-removed-nit-fixed.patch, CASSANDRA-1391.patch CASSANDRA-1292 fixed multiple migrations started from the same node to properly queue themselves, but it is still possible for migrations initiated on different nodes to conflict and leave the cluster in a bad state. Since the system_add/drop/rename methods are accessible directly from the client API, they should be completely safe for concurrent use. It should be possible to allow for most types of concurrent migrations by converting the UUID schema ID into a VersionVectorClock (as provided by CASSANDRA-580). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
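One way to read "content-based" here, sketched under my own assumptions (not the patch under review): derive the version by hashing a deterministic serialization of the schema, so the version is a pure function of what the node actually holds. Two nodes with identical definitions then agree on the version regardless of the order or timestamps of the updates that got them there, and equal timestamps stop being a problem.

{code:title=SchemaVersion.java|borderStyle=solid}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class SchemaVersion
{
    // version as a pure function of schema content: nodes holding identical
    // definitions compute identical versions, whatever the update history
    static UUID versionOf(TreeMap<String, String> definitions) throws Exception
    {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        // TreeMap iterates in sorted key order, so serialization is deterministic
        for (Map.Entry<String, String> e : definitions.entrySet())
        {
            digest.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            digest.update(e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return UUID.nameUUIDFromBytes(digest.digest());
    }

    public static void main(String[] args) throws Exception
    {
        TreeMap<String, String> a = new TreeMap<String, String>();
        a.put("ks1.cf1", "comparator=UTF8Type");
        a.put("ks1.cf2", "comparator=LongType");

        TreeMap<String, String> b = new TreeMap<String, String>();
        b.put("ks1.cf2", "comparator=LongType");  // same content,
        b.put("ks1.cf1", "comparator=UTF8Type");  // different arrival order

        System.out.println(versionOf(a).equals(versionOf(b))); // true
    }
}
{code}

Note the limit of the scheme: matching versions prove equality, but a mismatched version alone does not tell a node which migrations to exchange, which is the reconciliation question raised later in this thread.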
[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170323#comment-13170323 ] Jonathan Ellis commented on CASSANDRA-3616: --- Eric, do you also see correlation w/ repair operations? Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel Discussion about this started in CASSANDRA-3532. It's on it's own ticket now. Anyhow: The nodes in my cluster are using a lot of file descriptors, holding open tmp files. A few are using 50K+, nearing their limit (on Solaris, of 64K). Here's a small snippet of lsof: java 828 appdeployer *162u VREG 181,65540 0 333884 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db java 828 appdeployer *163u VREG 181,65540 0 333502 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db java 828 appdeployer *165u VREG 181,65540 0 333929 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db java 828 appdeployer *166u VREG 181,65540 0 333859 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db java 828 appdeployer *167u VREG 181,65540 0 333663 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db java 828 appdeployer *168u VREG 181,65540 0 333812 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db I spot checked a few and found they still exist on the filesystem too: rw-rr- 1 appdeployer appdeployer 0 Dec 12 07:16 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db After more investigation, it seems to happen during a CompactionTask. I waited until I saw some -tmp- files hanging around in the data dir: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Data.db -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Index.db and then found this in the logs: INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java (line 113) Compacting [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')] INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java (line 218) Compacted to [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,]. 83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 24.332518MB/s. Time: 3,288ms. 
Note that the timestamp of the 2nd log line matches the last modified time of the files, and has IDs leading up to, *but not including 788904*. I thought this might be relevant information, but I haven't found the specific cause yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
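For anyone wanting to watch for this from a cron job rather than with lsof, the reported signature is easy to scan for: zero-length -tmp- components that persist after the compaction that created them has logged completion. A throwaway sketch (the data directory path is copied from the report above and is otherwise an assumption):

{code:title=TmpSSTableScan.java|borderStyle=solid}
import java.io.File;

public class TmpSSTableScan
{
    public static void main(String[] args)
    {
        // path taken from the lsof output in this ticket; pass your own as args[0]
        File dataDir = new File(args.length > 0 ? args[0] : "/data1/cassandra/data/MA_DDR");
        File[] files = dataDir.listFiles();
        if (files == null)
            return; // not a directory, or unreadable
        for (File f : files)
            // the leaked components reported here are zero-length -tmp- files
            if (f.getName().contains("-tmp-") && f.length() == 0)
                System.out.println("possible leak: " + f.getName());
    }
}
{code}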
[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170333#comment-13170333 ] Eric Parusel commented on CASSANDRA-3616: - I haven't run a repair lately (no deletes at this time, I plan on setting up scheduled repairs though), but can run one to find out in the next few hours. So far I've only correlated it with compaction. I should note we're using LeveledCompactionStrategy. Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel Discussion about this started in CASSANDRA-3532. It's on it's own ticket now. Anyhow: The nodes in my cluster are using a lot of file descriptors, holding open tmp files. A few are using 50K+, nearing their limit (on Solaris, of 64K). Here's a small snippet of lsof: java 828 appdeployer *162u VREG 181,65540 0 333884 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db java 828 appdeployer *163u VREG 181,65540 0 333502 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db java 828 appdeployer *165u VREG 181,65540 0 333929 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db java 828 appdeployer *166u VREG 181,65540 0 333859 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db java 828 appdeployer *167u VREG 181,65540 0 333663 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db java 828 appdeployer *168u VREG 181,65540 0 333812 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db I spot checked a few and found they still exist on the filesystem too: rw-rr- 1 appdeployer appdeployer 0 Dec 12 07:16 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db After more investigation, it seems to happen during a CompactionTask. 
I waited until I saw some -tmp- files hanging around in the data dir: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Data.db -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Index.db and then found this in the logs: INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java (line 113) Compacting [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')] INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java (line 218) Compacted to [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,]. 83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 24.332518MB/s. Time: 3,288ms. Note that the timestamp of the 2nd log line matches the last modified time of the files, and has IDs leading up to, *but not including 788904*. I thought this might be relavent information, but I haven't found the specific cause yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations
[ https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170336#comment-13170336 ] Pavel Yaskevich commented on CASSANDRA-1391: We could compare uuids instead in the isMergingMigration method. How does a node determine whether it is ahead of or behind the ring with content-based versioning? Even if it can determine that state, how do you find out which migrations a node needs to send/receive to get the ring in sync? Allow Concurrent Schema Migrations -- Key: CASSANDRA-1391 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Stu Hood Assignee: Pavel Yaskevich Fix For: 1.1 Attachments: 0001-new-migration-schema-and-avro-methods-cleanup.patch, 0002-avro-removal.patch, 0003-oldVersion-removed-nit-fixed.patch, CASSANDRA-1391.patch CASSANDRA-1292 fixed multiple migrations started from the same node to properly queue themselves, but it is still possible for migrations initiated on different nodes to conflict and leave the cluster in a bad state. Since the system_add/drop/rename methods are accessible directly from the client API, they should be completely safe for concurrent use. It should be possible to allow for most types of concurrent migrations by converting the UUID schema ID into a VersionVectorClock (as provided by CASSANDRA-580). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0003-CacheServiceMBean-and-correct-key-cache-loading.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0001-global-key-cache.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0004-key-row-cache-tests-and-tweaks.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: 0007-second-round-of-changes-according-to-Sylvain-comment.patch 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch 0004-key-row-cache-tests-and-tweaks.patch 0003-CacheServiceMBean-and-correct-key-cache-loading.patch 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch 0001-global-key-cache.patch rebased set of patches, where all the changes from the second Sylvain's comment are in patch #7. Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2475) Prepared statements
[ https://issues.apache.org/jira/browse/CASSANDRA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170360#comment-13170360 ] Hudson commented on CASSANDRA-2475: --- Integrated in Cassandra #1257 (See [https://builds.apache.org/job/Cassandra/1257/]) bump maximum cached prepared statements to 10,000 (from 50) (and fix Map so that it is actually LRU) Patch by evans for CASSANDRA-2475 eevans : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1214803 Files : * /cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java Prepared statements --- Key: CASSANDRA-2475 URL: https://issues.apache.org/jira/browse/CASSANDRA-2475 Project: Cassandra Issue Type: New Feature Components: API, Core Affects Versions: 1.0.5 Reporter: Eric Evans Assignee: Rick Shaw Priority: Minor Labels: cql Fix For: 1.1 Attachments: 2475-v1.patch, 2475-v2.patch, 2475-v3.1.patch, 2475-v3.2-Thrift.patch, v1-0001-CASSANDRA-2475-prepared-statement-patch.txt, v1-0002-regenerated-thrift-java.txt, v2-0001-CASSANDRA-2475-rickshaw-2475-v3.1.patch.txt, v2-0002-rickshaw-2475-v3.2-Thrift.patch-w-changes.txt, v2-0003-eevans-increment-thrift-version-by-1-not-3.txt, v2-0004-eevans-misc-cleanups.txt, v2-0005-eevans-refactor-for-better-encapsulation-of-prepare.txt, v2-0006-eevans-log-queries-at-TRACE.txt, v2-0007-use-an-LRU-map-for-storage-of-prepared-statements.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170364#comment-13170364 ] Jonathan Ellis commented on CASSANDRA-2749: --- bq. we cannot stream between two nodes, one using separate cf directory I don't see any reason to continue to support the old-style directory layout. That adds complexity (operationally as well as in the code) for no benefit that I can think of. I think we should migrate from old layout to new on the first startup under 1.1. bq. regarding keyspaces in file names, sure, why not, guess having a header with this info in the file is out of the question, then the only meta data we have is the file name, right? A problem could be if we want to do CASSANDRA-1983 later, that would increase the file name length even more I'm on the fence here -- on the one hand having ks + cf in the filename simplifies some things. On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all support max filename length of ~256. I'm starting to think we should move these into the metadata component instead of the filename. fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
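The arithmetic behind the filename worry is simple: with the keyspace and columnfamily repeated in every component name, two generous names by themselves can cross the roughly 255-byte per-component limit those filesystems share. A hypothetical check, with the ks-cf-version-generation-component pattern below modeled loosely on the 1.0-era layout rather than any actual Cassandra API:

{code:title=FilenameBudget.java|borderStyle=solid}
public class FilenameBudget
{
    // ext3/ext4, xfs and ntfs all cap a single filename component at ~255 bytes
    static final int MAX_FILENAME = 255;

    static boolean fits(String keyspace, String cf, String version, int generation, String component)
    {
        String name = keyspace + "-" + cf + "-" + version + "-" + generation + "-" + component;
        return name.length() <= MAX_FILENAME;
    }

    public static void main(String[] args)
    {
        System.out.println(fits("Keyspace1", "Standard1", "hb", 788904, "Data.db")); // true
        // two ~130-character names already blow the budget on their own
        String longName = new String(new char[130]).replace('\0', 'x');
        System.out.println(fits(longName, longName, "hb", 1, "Data.db")); // false
    }
}
{code}

(The length check assumes single-byte characters; multi-byte UTF-8 names hit the limit even sooner, which argues for either the metadata approach or the name-length cap discussed below.)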
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170367#comment-13170367 ] Jonathan Ellis commented on CASSANDRA-3635: --- Taking a step back, I'm not sure I see the benefit here. If we're okay with X MB/s of i/o going on, doesn't that disrupt reads just as much whether that comes from repair validation or ordinary compaction? Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively but don't necessarily want to have minor compactions throttled that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3628) Make Pig/CassandraStorage delete functionality disabled by default and configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-3628: Attachment: 3628.txt Split out the conditions so it can do a noop for null values. Not 100% certain that's the desired behavior - do we want to do that or do we want to just write an empty value. However, if we want to write an empty value, we have to modify the null to an empty value because of the NPEs that happen if we don't change it. For our purposes, we want to skip them if the values are null. In our code we also log the column family name and the column name, but that might be up to the user who wants to do that - adds a lot of logging. Maybe people want that though. Make Pig/CassandraStorage delete functionality disabled by default and configurable --- Key: CASSANDRA-3628 URL: https://issues.apache.org/jira/browse/CASSANDRA-3628 Project: Cassandra Issue Type: Task Reporter: Jeremy Hanna Assignee: Jeremy Hanna Labels: pig Fix For: 1.0.7, 1.1 Attachments: 3628.txt Right now, there is a way to delete column with the CassandraStorage loadstorefunc. In practice it is a bad idea to have that enabled by default. A scenario: do an outer join and you don't have a value for something and then you write out to cassandra all of the attributes of that relation. You've just inadvertently deleted a column for all the rows that didn't have that value as a result of the outer join. It can be argued that you want to be careful with how you project after the join. However, I would think disabling by default and having a configurable property to enable it for the instances when you explicitly want to use it is the right plan. Fwiw, we had a bug in one of our scripts that did exactly as described above. It's good to fix the bug. It's bad to implicitly delete data. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
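As I read the comment, the change amounts to a three-way split on write: non-null values are written, nulls become a no-op by default, and nulls only turn into deletes behind an explicit switch. A sketch under those assumptions (the property name and method below are hypothetical, not the actual CassandraStorage code):

{code:title=NullHandlingSketch.java|borderStyle=solid}
public class NullHandlingSketch
{
    // hypothetical opt-in flag, e.g. -Dpig.cassandra.allow_deletes=true
    static final boolean ALLOW_DELETES = Boolean.getBoolean("pig.cassandra.allow_deletes");

    static void putColumn(String columnName, Object value)
    {
        if (value == null)
        {
            if (ALLOW_DELETES)
                System.out.println("DELETE " + columnName);    // old implicit behavior, now opt-in
            else
                System.out.println("skip null " + columnName); // default: no-op, nothing written
            return;
        }
        System.out.println("WRITE " + columnName + " = " + value);
    }

    public static void main(String[] args)
    {
        putColumn("age", 42);
        putColumn("nickname", null); // e.g. an outer-join miss: skipped, not deleted
    }
}
{code}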
[jira] [Commented] (CASSANDRA-3625) Do something about DynamicCompositeType
[ https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170383#comment-13170383 ] Ed Anuff commented on CASSANDRA-3625: - I'm not sure we need a longer-term solution than what I'm proposing. I think we're all in agreement that throwing the exception the way it's done now is bad and that a deterministic, though not necessarily transparent, sort behavior is the best solution. Sylvain, are you working on this one or would you like me to take a stab at it? Do something about DynamicCompositeType --- Key: CASSANDRA-3625 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Currently, DynamicCompositeType is a super dangerous type. We cannot leave it that way or people will get hurt. Let's recall that DynamicCompositeType allows composite column names without any limitation on what each component type can be. It was added basically to allow using different rows of the same column family to each store a different index. So for instance you would have: {noformat} index1: { bar:24 -> someval bar:42 -> someval foo:12 -> someval ... } index2: { 0:uuid1:3.2 -> someval 1:uuid2:2.2 -> someval ... } {noformat} where index1, index2, ... are rows. So each row has columns whose names have similar structure (so they can be compared), but between rows the structure can be different (we never compare two columns from two different rows). But the problem is the following: what happens if in the index1 row above, you insert a column whose name is 0:uuid1 ? There is no really meaningful way to compare bar:24 and 0:uuid1. The current implementation of DynamicCompositeType, when confronted with this, says that it is a user error and throws a MarshalException. The problem with that is that the exception is not thrown at insert time, and it *cannot* be because of the dynamic nature of the comparator. But that means that if you do insert the wrong column in the wrong row, you end up *corrupting* a sstable. It is too dangerous a behavior. And it's probably made worse by the fact that some people probably think that DynamicCompositeType should be superior to CompositeType since, you know, it's dynamic. One solution to that problem could be to decide on some random (but predictable) order between two incomparable components. For example we could decide that IntType < LongType < StringType ... Note that even if we do that, I would suggest renaming the DynamicCompositeType to something that suggests that CompositeType is always preferable to DynamicCompositeType unless you're really doing very advanced stuff. Opinions? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170385#comment-13170385 ] Sylvain Lebresne commented on CASSANDRA-2749: - {quote} On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all support max filename length of ~256. I'm starting to think we should move these into the metadata component instead of the filename. {quote} The thing with the metadata component is that, from a code perspective, there are lots of places where we want to create a Descriptor, which involves extracting the keyspace/cf names based only on the filename. Adding the necessity to locate and read the metadata in those places will likely not be very fun. So I'd be in favor of just limiting the keyspace and column family names. These are names for which there is no real point in being very long. Limiting each one to 32 characters shouldn't be a strong limitation. fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3639) Move streams too many data
[ https://issues.apache.org/jira/browse/CASSANDRA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabien Rousseau updated CASSANDRA-3639: --- Attachment: 0001-try-to-fix-move-streaming-too-many-data-unit-tests-v2.patch Sure. This patch (v2) applies to trunk. I also made a little modification in the tests (for the first patch, at the last moment, I changed the first token from 0 to 10, and this made most of the test OK with the current code) Move streams too many data -- Key: CASSANDRA-3639 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Fabien Rousseau Priority: Minor Fix For: 1.1 Attachments: 0001-try-to-fix-move-streaming-too-many-data-unit-tests-v2.patch, 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch During a move operation, we observed that the node streamed most of its data and received all its data. We are running Cassandra 0.8.7 (plus a few patches) After reading the code related to move, we found out that : - in StorageService.java, line 2002 and line 2004 = ranges are returned in a non ordered collection, but calculateStreamAndFetchRanges() method (line 2011) assume ranges are sorted, thus, resulting in wider ranges to be fetched/streamed We managed to isolate and reproduce this in a unit test. We also propose a patch which : - does not rely on any sort - adds a few unit tests (may not be exhaustive...) Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, but they probably should be moved. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170393#comment-13170393 ] Brandon Williams commented on CASSANDRA-3616: - I can reproduce this with SizeTieredStrategy. Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel Discussion about this started in CASSANDRA-3532. It's on it's own ticket now. Anyhow: The nodes in my cluster are using a lot of file descriptors, holding open tmp files. A few are using 50K+, nearing their limit (on Solaris, of 64K). Here's a small snippet of lsof: java 828 appdeployer *162u VREG 181,65540 0 333884 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db java 828 appdeployer *163u VREG 181,65540 0 333502 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db java 828 appdeployer *165u VREG 181,65540 0 333929 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db java 828 appdeployer *166u VREG 181,65540 0 333859 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db java 828 appdeployer *167u VREG 181,65540 0 333663 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db java 828 appdeployer *168u VREG 181,65540 0 333812 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db I spot checked a few and found they still exist on the filesystem too: rw-rr- 1 appdeployer appdeployer 0 Dec 12 07:16 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db After more investigation, it seems to happen during a CompactionTask. I waited until I saw some -tmp- files hanging around in the data dir: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Data.db -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Index.db and then found this in the logs: INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java (line 113) Compacting [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')] INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java (line 218) Compacted to [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,]. 83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 24.332518MB/s. Time: 3,288ms. 
Note that the timestamp of the 2nd log line matches the last modified time of the files, and has IDs leading up to, *but not including 788904*. I thought this might be relevant information, but I haven't found the specific cause yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170396#comment-13170396 ] Jonathan Ellis commented on CASSANDRA-3616: --- Just with compaction? Did you get a debug log? Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel Discussion about this started in CASSANDRA-3532. It's on it's own ticket now. Anyhow: The nodes in my cluster are using a lot of file descriptors, holding open tmp files. A few are using 50K+, nearing their limit (on Solaris, of 64K). Here's a small snippet of lsof: java 828 appdeployer *162u VREG 181,65540 0 333884 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db java 828 appdeployer *163u VREG 181,65540 0 333502 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db java 828 appdeployer *165u VREG 181,65540 0 333929 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db java 828 appdeployer *166u VREG 181,65540 0 333859 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db java 828 appdeployer *167u VREG 181,65540 0 333663 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db java 828 appdeployer *168u VREG 181,65540 0 333812 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db I spot checked a few and found they still exist on the filesystem too: rw-rr- 1 appdeployer appdeployer 0 Dec 12 07:16 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db After more investigation, it seems to happen during a CompactionTask. I waited until I saw some -tmp- files hanging around in the data dir: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Data.db -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Index.db and then found this in the logs: INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java (line 113) Compacting [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')] INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java (line 218) Compacted to [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,]. 83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 24.332518MB/s. Time: 3,288ms. 
Note that the timestamp of the 2nd log line matches the last modified time of the files, and has IDs leading up to, *but not including 788904*. I thought this might be relevant information, but I haven't found the specific cause yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170397#comment-13170397 ] Sylvain Lebresne commented on CASSANDRA-3635: - I guess part of the idea is that validation is a bit cpu intensive (due to the SHA-256 hash it does), so this allows limiting that too without it being a problem for other compaction. It also allows giving more room to ordinary compactions, so that they complete earlier, which benefits reads (while having validation finish quickly is not necessarily as important). Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively but don't necessarily want to have minor compactions throttled that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
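Mechanically, the patch's effect is just two independent rate caps instead of one. A toy pacing loop under my own assumptions (reading Jonathan's "12/4" above as 12 MB/s for ordinary compaction and 4 MB/s for validation, which is an interpretation, not something the patch states):

{code:title=SeparateThrottles.java|borderStyle=solid}
public class SeparateThrottles
{
    // simple pacing: after processing 'bytes' in 'elapsedMillis', sleep long
    // enough that the average rate stays at or below 'mbPerSec'
    static void throttle(long bytes, int mbPerSec, long elapsedMillis) throws InterruptedException
    {
        double targetMillis = bytes / (mbPerSec * 1024.0 * 1024.0) * 1000.0;
        long sleep = (long) targetMillis - elapsedMillis;
        if (sleep > 0)
            Thread.sleep(sleep);
    }

    public static void main(String[] args) throws InterruptedException
    {
        int compactionMbSec = 12; // ordinary compactions: finishing early helps reads
        int validationMbSec = 4;  // validation: CPU-heavy (SHA-256), less urgent

        long chunk = 4L * 1024 * 1024; // pretend we just hashed a 4MB chunk in 10ms
        throttle(chunk, validationMbSec, 10); // pauses ~990ms to hold 4MB/s
        throttle(chunk, compactionMbSec, 10); // would pause ~323ms to hold 12MB/s
        System.out.println("each stream paced against its own cap");
    }
}
{code}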
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170400#comment-13170400 ] Jonathan Ellis commented on CASSANDRA-3143: --- bq. I fail to see what is so crazy about having the function that saves the cache having access to both key and value. It may require a bit of refactoring, but I don't see that as a good argument. Anyway, it's not a very big deal but I still think that the two phase loading is more fragile than it needs, and saving values would allow a proper reload. Why would you want to do a cache reload? That's just going to be stale... Clearing the cache I can understand, but reloading a semi-arbitrary older cache state? I don't see the value there. ISTM we're talking about trading one kind of ugly code (passing around the Set of keys to load to SSTR) for another (a lot of code duplication between key cache, which wants to save values, and row cache, which doesn't). It's also worth pointing out that if we're concerned about cache size, the two-phase approach gives smaller saved caches. So I think I'd lean towards the existing, two-phase approach. Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
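To make the two-phase approach being discussed concrete, here is a rough sketch of it; the types and helper methods are simplified stand-ins, not Cassandra's actual API:
{code}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Set;

// Sketch of two-phase key cache loading: the save file stores keys only,
// and real index positions are recomputed at load time, so the saved file
// stays small and saved values can never be stale.
final class TwoPhaseKeyCacheLoad
{
    interface SSTable
    {
        long indexPosition(ByteBuffer key); // -1 when the key is absent
    }

    static void load(Set<ByteBuffer> savedKeys,     // phase 1: keys read from disk
                     SSTable sstable,
                     Map<ByteBuffer, Long> keyCache)
    {
        for (ByteBuffer key : savedKeys)
        {
            long position = sstable.indexPosition(key); // phase 2: recompute value
            if (position >= 0)
                keyCache.put(key, position);
            // keys that no longer exist are skipped, so no placeholder
            // entry is ever visible to readers
        }
    }
}
{code}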
[jira] [Commented] (CASSANDRA-3625) Do something about DynamicCompositeType
[ https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170402#comment-13170402 ] Sylvain Lebresne commented on CASSANDRA-3625: - bq. Right, but I thought we were positing that You Shouldn't Do That. In which case as long as it doesn't crash, I'm good Without going into the debate of how useful that is, I think that as soon as it is allowed (mixing, in the same row, columns with components of different types), some will do it, so we'd rather choose the more coherent solution; so I also prefer fixing some order on the types themselves and using that. As for picking the actual sort between types, I'd prefer avoiding a hash, as that's not a very good use for hashes IMO (I don't want to worry about collisions, however unlikely). But using the alias character (and falling back to good old string comparison on the class name if there is no alias) seems fine to me. bq. Sylvain, are you working on this one or would you like me to take a stab at it? I haven't started writing anything so feel free to give it a shot. Do something about DynamicCompositeType --- Key: CASSANDRA-3625 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Currently, DynamicCompositeType is a super dangerous type. We cannot leave it that way or people will get hurt. Let's recall that DynamicCompositeType allows composite column names without any limitation on what each component type can be. It was added basically to allow different rows of the same column family to each store a different index. So for instance you would have:
{noformat}
index1: {
  bar:24 -> someval
  bar:42 -> someval
  foo:12 -> someval
  ...
}
index2: {
  0:uuid1:3.2 -> someval
  1:uuid2:2.2 -> someval
  ...
}
{noformat}
where index1, index2, ... are rows. So each row has columns whose names have a similar structure (so they can be compared), but between rows the structure can be different (we never compare two columns from two different rows). But the problem is the following: what happens if, in the index1 row above, you insert a column whose name is 0:uuid1? There is no really meaningful way to compare bar:24 and 0:uuid1. The current implementation of DynamicCompositeType, when confronted with this, says that it is a user error and throws a MarshalException. The problem with that is that the exception is not thrown at insert time, and it *cannot* be, because of the dynamic nature of the comparator. But that means that if you do insert the wrong column in the wrong row, you end up *corrupting* an sstable. That is too dangerous a behavior. And it's probably made worse by the fact that some people probably think that DynamicCompositeType should be superior to CompositeType since, you know, it's dynamic. One solution to that problem could be to decide on some random (but predictable) order between two incomparable components. For example we could decide that IntType < LongType < StringType < ... Note that even if we do that, I would suggest renaming DynamicCompositeType to something that suggests that CompositeType is always preferable to DynamicCompositeType unless you're really doing very advanced stuff. Opinions? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
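As a sketch of the ordering Sylvain describes (alias byte first, class name as the fallback); the method shape is invented for illustration, not the real comparator code:
{code}
import java.nio.ByteBuffer;
import java.util.Comparator;

// Illustrative only: order components of different comparator types by the
// one-byte comparator alias, falling back to comparing fully-qualified
// class names when a side has no alias (alias 0 here means "none").
// Same-type components compare with their real comparator, as before.
final class CrossTypeOrdering
{
    static int compare(byte aliasA, String classA, ByteBuffer valueA,
                       byte aliasB, String classB, ByteBuffer valueB,
                       Comparator<ByteBuffer> sameTypeComparator)
    {
        if (classA.equals(classB))
            return sameTypeComparator.compare(valueA, valueB); // comparable as usual
        if (aliasA != 0 && aliasB != 0)
            return aliasA - aliasB;          // both aliased: the alias fixes the order
        return classA.compareTo(classB);     // good old string comparison
    }
}
{code}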
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170403#comment-13170403 ] Jonathan Ellis commented on CASSANDRA-3635: --- If you're i/o bound under size-tiered compaction you're kind of screwed, since it does such a poor job of actually bucketing the same rows together. I think we should get some feedback of the 'here's what my workload looks like, and this diminishes my repair pain' nature before committing this. Again, I'm fine with posting a 0.8 version of the patch if that helps. Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively without necessarily throttling minor compactions that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170404#comment-13170404 ] Sylvain Lebresne commented on CASSANDRA-3143: - Alright. I don't really care about cache reloading either, actually. The only thing I don't like about the two-phase approach is that it populates the cache with -1 positions. If for any reason this doesn't get updated correctly, we'll end up with the cache wrongly saying that the key doesn't exist in the sstable. Of course there is no reason for the two-phase approach not to work, but there is a part of me that doesn't like that a simple mess-up in the cache loading can make some keys inaccessible. Anyway, let's just not have bugs in there :) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170406#comment-13170406 ] Sylvain Lebresne commented on CASSANDRA-3635: - bq. I think we should get some feedback of the 'here's what my workload looks like, and this diminishes my repair pain' nature before committing this. I'm totally fine with that. bq. Again, I'm fine with posting a 0.8 version of the patch if that helps. The currently attached patch is against 0.8. Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively without necessarily throttling minor compactions that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3626) Nodes can get stuck in UP state forever, despite being DOWN
[ https://issues.apache.org/jira/browse/CASSANDRA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170409#comment-13170409 ] Peter Schuller commented on CASSANDRA-3626: --- +1. :) Nodes can get stuck in UP state forever, despite being DOWN --- Key: CASSANDRA-3626 URL: https://issues.apache.org/jira/browse/CASSANDRA-3626 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.8, 1.0.5 Reporter: Peter Schuller Assignee: Peter Schuller Attachments: 3626.txt This is a proposed phrasing for an upstream ticket named Newly discovered nodes that are down get stuck in UP state forever (will edit w/ feedback until done): We have observed a problem with gossip whereby, when you are bootstrapping a new node (or replacing one using the replace_token support), any node in the cluster which is Down at the time the new node is started will be assumed to be Up, and then *never ever* flapped back to Down until you restart the node. This has at least two implications for replacing or bootstrapping new nodes when there are nodes down in the ring:
* If the new node happens to select a node listed as UP (but in reality DOWN) as a stream source, streaming will sit there hanging forever.
* If that doesn't happen (by picking another host), it will instead finish bootstrapping correctly and begin servicing requests, all the while thinking DOWN nodes are UP, and thus routing requests to them, generating timeouts.
The way to get out of this is to restart the node(s) that you bootstrapped. I have tested and confirmed the symptom (that the bootstrapped node thinks other nodes are Up) using a fairly recent 1.0. The main debugging effort happened on 0.8 however, so all details below refer to 0.8, but are probably similar in 1.0. Steps to reproduce:
* Bring up a cluster of >= 3 nodes. *Ensure RF is < N*, so that the cluster is operative with one node removed.
* Pick two random nodes A and B. Shut them *both* off.
* Wait for everyone to realize they are both off (for good measure).
* Now, take node A, nuke its data directories, and re-start it, such that it comes up w/ normal bootstrap (or use replace_token; didn't test that, but it should not affect this).
* Watch how node A starts up, all the while believing node B is up, even though all other nodes in the cluster agree that B is down, and B is in fact still turned off.
The mechanism by which it initially goes into Up state is that the node receives a gossip response from any other node in the cluster, and GossipDigestAck2VerbHandler.doVerb() calls Gossiper.applyStateLocally(). Gossiper.applyStateLocally() doesn't have any local endpoint state for the cluster, so the else statement at the end (it's a new node) gets triggered and handleMajorStateChange() is called. handleMajorStateChange() always calls markAlive(), unless the state is a dead state (but dead here does not mean not up; it refers to joining/hibernate etc). So at this point the node is up in the mind of the node you just bootstrapped. Now, in each gossip round doStatusCheck() is called, which iterates over all nodes (including the one falsely Up) and, among other things, calls FailureDetector.interpret() on each node. FailureDetector.interpret() is meant to update its sense of Phi for the node, and potentially convict it. However there is a short-circuit at the top, whereby if we do not yet have any arrival window for the node, we simply return immediately. 
Arrival intervals are only added as a result of a FailureDetector.report() call, which never happens in this case because the initial endpoint state we added, which came from a remote node that was up, had the latest version of the gossip state (so Gossiper.reportFailureDetector() will never call report()). The result is that the node can never be convicted. Now, let's ignore for a moment the problem that a node that is actually Down will be thought to be Up temporarily for a little while. That is sub-optimal, but let's aim for a fix to the more serious problem in this ticket - which is that it stays Up forever. Considered solutions:
* When interpret() gets called and there is no arrival window, we could add a faked arrival window far back in time to cause the node to have history and be marked down. This works in the particular test case. The problem is that since we are not ourselves actively trying to gossip to these nodes with any particular speed, it might take a significant time before we get any kind of confirmation from someone else that it's actually Up in cases where the node actually *is* Up, so it's not clear that this is a good idea.
* When interpret() gets called and
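For readers following along, a rough reconstruction of the short-circuit being described; the class shape and names are approximate stand-ins for the 0.8-era failure detector, not a verbatim quote:
{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Approximate reconstruction (names are stand-ins): with no arrival window
// ever recorded for ep (because report() was never called), interpret()
// returns before computing phi, so the endpoint can never be convicted
// and stays UP in the eyes of the bootstrapped node.
final class PhiFailureDetectorSketch
{
    interface ArrivalWindow { double phi(long nowMillis); }

    private final Map<InetAddress, ArrivalWindow> arrivalSamples = new ConcurrentHashMap<InetAddress, ArrivalWindow>();
    private final double phiConvictThreshold = 8.0;

    void report(InetAddress ep, ArrivalWindow window)
    {
        arrivalSamples.put(ep, window); // only happens on observed gossip heartbeats
    }

    void interpret(InetAddress ep)
    {
        ArrivalWindow window = arrivalSamples.get(ep);
        if (window == null)
            return; // the short-circuit: no history, no conviction

        if (window.phi(System.currentTimeMillis()) > phiConvictThreshold)
            convict(ep);
    }

    void convict(InetAddress ep) { /* notify listeners that ep is down */ }
}
{code}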
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170410#comment-13170410 ] Pavel Yaskevich commented on CASSANDRA-3143: How about we just change SSTableReader.getCachedPosition to return null if value of the key cache was -1? Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
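A sketch of what Pavel is proposing; the signature is simplified (the real method deals in cache-key types rather than raw byte buffers):
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the suggestion above: treat the -1 placeholder written during
// two-phase load as a cache miss (and evict it), so a botched load can
// never make the cache claim a key is absent from the sstable.
final class KeyCacheSketch
{
    private final ConcurrentMap<ByteBuffer, Long> keyCache = new ConcurrentHashMap<ByteBuffer, Long>();

    Long getCachedPosition(ByteBuffer key)
    {
        Long position = keyCache.get(key);
        if (position != null && position == -1L)
        {
            keyCache.remove(key, position); // drop the placeholder
            return null;                    // caller falls back to the index
        }
        return position;
    }
}
{code}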
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170411#comment-13170411 ] Jonathan Ellis commented on CASSANDRA-3143: --- I'd rather go with the current approach of leaving the cache empty until we have real values for it, and pass SSTR a Set of keys-to-load. Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170415#comment-13170415 ] Vijay commented on CASSANDRA-3635: -- I think it will be much better if we can prioritize long-running compaction vs normal compaction. Let's say we have a 10MB compaction limit and a 2MB validation compaction limit: 2MB is the limit for validation for a while, but when a normal compaction kicks in we might want to hold the validation and let the compaction complete (because that is what affects read performance), and continue with the validation compaction after that. By doing this we can set something like a 12MB compaction limit and a 6MB validation compaction limit, and still be within the HDD limit of 12MB. The good thing about normal compaction is that it is spread out, and not all the nodes are involved in it. I am starting to think that we can do repairs one by one for a range (within a region), so the traffic doesn't get stuck waiting for the IO. Hope that makes sense. Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively without necessarily throttling minor compactions that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
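A toy sketch of the pause-validation-while-compacting idea Vijay describes; every name and number here is invented for illustration:
{code}
// Toy sketch only: validation gets its own cap but yields entirely while a
// normal compaction is running, so the combined rate stays inside one disk
// budget instead of the two limits simply adding up.
public final class SharedIoBudget
{
    private int activeNormalCompactions = 0;

    public synchronized void normalCompactionStarted()  { activeNormalCompactions++; }
    public synchronized void normalCompactionFinished() { activeNormalCompactions--; }

    // KB/s a validation pass may use right now; 0 means "hold and retry later".
    public synchronized int validationAllowanceKBPerSec(int diskBudgetKBPerSec, int validationCapKBPerSec)
    {
        if (activeNormalCompactions > 0)
            return 0; // let the read-impacting compaction finish first
        return Math.min(diskBudgetKBPerSec, validationCapKBPerSec);
    }
}
{code}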
[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170414#comment-13170414 ] Brandon Williams commented on CASSANDRA-3616: - CASSANDRA-3532 is the culprit here. Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170418#comment-13170418 ] Pavel Yaskevich commented on CASSANDRA-3143: I'm not a fan of that because we would need to drag read keys through all of the CFS and SSTableReaders :( Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1214916 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/gms/Gossiper.java
Author: brandonwilliams Date: Thu Dec 15 19:10:36 2011 New Revision: 1214916 URL: http://svn.apache.org/viewvc?rev=1214916view=rev Log: Prevent new nodes from thinking down nodes are up forever. Patch by brandonwilliams, reviewed by Peter Schuller for CASSANDRA-3626 Modified: cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java Modified: cassandra/branches/cassandra-0.8/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1214916r1=1214915r2=1214916view=diff == --- cassandra/branches/cassandra-0.8/CHANGES.txt (original) +++ cassandra/branches/cassandra-0.8/CHANGES.txt Thu Dec 15 19:10:36 2011 @@ -1,3 +1,6 @@ +0.8.10 + * prevent new nodes from thinking down nodes are up forever (CASSANDRA-3626) + 0.8.9 * remove invalid assertion that table was opened before dropping it (CASSANDRA-3580) Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java?rev=1214916r1=1214915r2=1214916view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java Thu Dec 15 19:10:36 2011 @@ -831,7 +831,8 @@ public class Gossiper implements IFailur } else { -// this is a new node +// this is a new node, report it to the FD in case it is the first time we are seeing it AND it's not alive +FailureDetector.instance.report(ep); handleMajorStateChange(ep, remoteState); } }
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170417#comment-13170417 ] Jonathan Ellis commented on CASSANDRA-3635: --- bq. I am starting to think that we can do repairs one by one for a range (within a region You mean if you have replicas A B C, comparing A and B before comparing A and C? The downside there is you now have to validate twice, or they will be too out of sync. Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively without necessarily throttling minor compactions that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1214918 - in /cassandra/branches/cassandra-1.0: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/gms/
Author: brandonwilliams Date: Thu Dec 15 19:14:28 2011 New Revision: 1214918 URL: http://svn.apache.org/viewvc?rev=1214918view=rev Log: Merge 3626 from 0.8 Modified: cassandra/branches/cassandra-1.0/ (props changed) cassandra/branches/cassandra-1.0/CHANGES.txt cassandra/branches/cassandra-1.0/contrib/ (props changed) cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/branches/cassandra-1.0/src/java/org/apache/cassandra/gms/Gossiper.java Propchange: cassandra/branches/cassandra-1.0/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Thu Dec 15 19:14:28 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291 /cassandra/branches/cassandra-0.7:1026516-1211709 /cassandra/branches/cassandra-0.7.0:1053690-1055654 -/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1212854,1212938 +/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1212854,1212938,1214916 /cassandra/branches/cassandra-0.8.0:1125021-1130369 /cassandra/branches/cassandra-0.8.1:1101014-1125018 /cassandra/branches/cassandra-1.0:1167106,1167185 Modified: cassandra/branches/cassandra-1.0/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0/CHANGES.txt?rev=1214918r1=1214917r2=1214918view=diff == --- cassandra/branches/cassandra-1.0/CHANGES.txt (original) +++ cassandra/branches/cassandra-1.0/CHANGES.txt Thu Dec 15 19:14:28 2011 @@ -2,6 +2,8 @@ * fix assertion when dropping a columnfamily with no sstables (CASSANDRA-3614) * more efficient allocation of small bloom filters (CASSANDRA-3618) * CLibrary.createHardLinkWithExec() to check for errors (CASSANDRA-3101) +Merged from 0.8: + * prevent new nodes from thinking down nodes are up forever (CASSANDRA-3626) 1.0.6 * (CQL) fix cqlsh support for replicate_on_write (CASSANDRA-3596) Propchange: cassandra/branches/cassandra-1.0/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Thu Dec 15 19:14:28 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 /cassandra/branches/cassandra-0.7/contrib:1026516-1211709 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 -/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1212854,1212938 +/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1212854,1212938,1214916 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018 /cassandra/branches/cassandra-1.0/contrib:1167106,1167185 Propchange: cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Thu Dec 15 19:14:28 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1211709 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 -/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1212854,1212938 +/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1212854,1212938,1214916 /cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369 /cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1125018 /cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1167106,1167185 Propchange: cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Thu Dec
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170423#comment-13170423 ] Jonathan Ellis commented on CASSANDRA-3143: --- Again, that's what we're doing now, so I don't see it as *that* big a deal. But I'm good with either that approach, or save-the-values-also approach. I agree with Sylvain that keeping invalid values in the cache and replacing them later is a bad idea. Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.
Dynamic Snitch does not compute scores if no direct reads hit the node. --- Key: CASSANDRA-3640 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640 Project: Cassandra Issue Type: Bug Reporter: Edward Capriolo Priority: Minor We ran into an interesting situation. We added 2 nodes to our cluster. Strangely, this node performed worse than other nodes. It had more IOwait, for example. The impact was not major, but it was noticeable. Later I determined that this Cassandra node was not in our client's list of nodes, and our clients do not auto discover. I confirmed that the host did not have any scores inside its dynamic snitch. It is counterintuitive that a node receiving less or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again, this is more of a client configuration issue than a direct Cassandra issue, but I found it quite interesting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3626) Nodes can get stuck in UP state forever, despite being DOWN
[ https://issues.apache.org/jira/browse/CASSANDRA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-3626: Reviewer: scode (was: lenn0x) Fix Version/s: 1.0.7 0.8.10 Assignee: Brandon Williams (was: Peter Schuller) Nodes can get stuck in UP state forever, despite being DOWN --- Key: CASSANDRA-3626 URL: https://issues.apache.org/jira/browse/CASSANDRA-3626 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.8, 1.0.5 Reporter: Peter Schuller Assignee: Brandon Williams Fix For: 0.8.10, 1.0.7 Attachments: 3626.txt
[jira] [Commented] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.
[ https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170427#comment-13170427 ] Jonathan Ellis commented on CASSANDRA-3640: --- I think I misunderstood on irc. The dynamic snitch is populated based on client requests, so it's normal for that to be empty here. It sounds like the real question is: were other nodes directing requests away from the poorly performing ones, and if not, what did their dsnitch contents look like? Dynamic Snitch does not compute scores if no direct reads hit the node. --- Key: CASSANDRA-3640 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640 Project: Cassandra Issue Type: Bug Reporter: Edward Capriolo Priority: Minor We ran into an interesting situation. We added 2 nodes to our cluster. Strangely, this node performed worse than other nodes. It had more IOwait, for example. The impact was not major, but it was noticeable. Later I determined that this Cassandra node was not in our client's list of nodes, and our clients do not auto discover. I confirmed that the host did not have any scores inside its dynamic snitch. It is counterintuitive that a node receiving less or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again, this is more of a client configuration issue than a direct Cassandra issue, but I found it quite interesting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.
[ https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated CASSANDRA-3640: --- Affects Version/s: 0.8.7 Issue Type: Improvement (was: Bug) Dynamic Snitch does not compute scores if no direct reads hit the node. --- Key: CASSANDRA-3640 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Edward Capriolo Priority: Minor We ran into an interesting situation. We added 2 nodes to our cluster. Strangely, this node performed worse than other nodes. It had more IOwait, for example. The impact was not major, but it was noticeable. Later I determined that this Cassandra node was not in our client's list of nodes, and our clients do not auto discover. I confirmed that the host did not have any scores inside its dynamic snitch. It is counterintuitive that a node receiving less or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again, this is more of a client configuration issue than a direct Cassandra issue, but I found it quite interesting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.
[ https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated CASSANDRA-3640: --- Description: We ran into an interesting situation. We added 2 nodes to our cluster. Strangely, these nodes were performing worse than other nodes. They had more IOwait, for example. The impact was not major, but it was noticeable. Later I determined that these Cassandra nodes were not in our client's list of nodes, and our clients do not auto discover. I confirmed that the hosts did not have any scores inside their dynamic snitch. It is counterintuitive that a node receiving less or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again, this is more of a client configuration issue than a direct Cassandra issue, but I found it interesting. was: We ran into an interesting situation. We added 2 nodes to our cluster. Strangely, this node performed worse than other nodes. It had more IOwait, for example. The impact was not major, but it was noticeable. Later I determined that this Cassandra node was not in our client's list of nodes, and our clients do not auto discover. I confirmed that the host did not have any scores inside its dynamic snitch. It is counterintuitive that a node receiving less or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again, this is more of a client configuration issue than a direct Cassandra issue, but I found it quite interesting. Dynamic Snitch does not compute scores if no direct reads hit the node. --- Key: CASSANDRA-3640 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Edward Capriolo Priority: Minor We ran into an interesting situation. We added 2 nodes to our cluster. Strangely, these nodes were performing worse than other nodes. They had more IOwait, for example. The impact was not major, but it was noticeable. Later I determined that these Cassandra nodes were not in our client's list of nodes, and our clients do not auto discover. I confirmed that the hosts did not have any scores inside their dynamic snitch. It is counterintuitive that a node receiving less or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again, this is more of a client configuration issue than a direct Cassandra issue, but I found it interesting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
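To make the mechanism concrete, a toy sketch of a dynamic-snitch-style score table that only accumulates from reads this coordinator itself sends; class and method names are invented, not the real snitch API:
{code}
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy sketch: a coordinator-local latency score. sample() is only invoked
// for reads this node coordinates, which is why a node that no client ever
// contacts directly ends up with an empty score table, and no one shares
// scores on its behalf.
final class LatencyScoreSketch
{
    private final ConcurrentMap<InetAddress, Double> ewma = new ConcurrentHashMap<InetAddress, Double>();

    void sample(InetAddress replica, double latencyMillis)
    {
        Double old = ewma.get(replica);
        // exponentially weighted moving average of observed read latency
        ewma.put(replica, old == null ? latencyMillis : 0.9 * old + 0.1 * latencyMillis);
    }

    Double score(InetAddress replica)
    {
        return ewma.get(replica); // null: we never coordinated a read to it
    }
}
{code}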
svn commit: r1214927 - in /cassandra/trunk: ./ contrib/ debian/ interface/thrift/gen-java/org/apache/cassandra/thrift/ pylib/cqlshlib/ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/d
Author: jbellis Date: Thu Dec 15 19:34:40 2011 New Revision: 1214927 URL: http://svn.apache.org/viewvc?rev=1214927view=rev Log: merge from 1.0 Added: cassandra/trunk/debian/cassandra-sysctl.conf Removed: cassandra/trunk/test/distributed/README.txt cassandra/trunk/test/distributed/org/apache/cassandra/CassandraServiceController.java cassandra/trunk/test/distributed/org/apache/cassandra/CountersTest.java cassandra/trunk/test/distributed/org/apache/cassandra/MovementTest.java cassandra/trunk/test/distributed/org/apache/cassandra/MutationTest.java cassandra/trunk/test/distributed/org/apache/cassandra/TestBase.java cassandra/trunk/test/distributed/org/apache/cassandra/utils/BlobUtils.java cassandra/trunk/test/distributed/org/apache/cassandra/utils/KeyPair.java Modified: cassandra/trunk/ (props changed) cassandra/trunk/.rat-excludes cassandra/trunk/CHANGES.txt cassandra/trunk/NEWS.txt cassandra/trunk/build.xml cassandra/trunk/contrib/ (props changed) cassandra/trunk/debian/changelog cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/pylib/cqlshlib/cqlhandling.py cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java cassandra/trunk/src/java/org/apache/cassandra/db/SystemTable.java cassandra/trunk/src/java/org/apache/cassandra/db/migration/DropKeyspace.java cassandra/trunk/src/java/org/apache/cassandra/gms/Gossiper.java cassandra/trunk/src/java/org/apache/cassandra/gms/GossiperMBean.java cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java cassandra/trunk/test/cassandra.in.sh cassandra/trunk/test/unit/org/apache/cassandra/service/RemoveTest.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Thu Dec 15 19:34:40 2011 @@ -1,11 +1,12 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291 /cassandra/branches/cassandra-0.7:1026516-1211709 /cassandra/branches/cassandra-0.7.0:1053690-1055654 -/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1198724,1198726-1206097,1206099-1211976 +/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1198724,1198726-1206097,1206099-1212854,1212938 /cassandra/branches/cassandra-0.8.0:1125021-1130369 /cassandra/branches/cassandra-0.8.1:1101014-1125018 -/cassandra/branches/cassandra-1.0:1167085-1211978,1212284,1213775 +/cassandra/branches/cassandra-1.0:1167085-1213775 /cassandra/branches/cassandra-1.0.0:1167104-1167229,1167232-1181093,1181741,1181816,1181820,1182951,1183243 +/cassandra/branches/cassandra-1.0.5:1208016 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /cassandra/tags/cassandra-0.8.0-rc1:1102511-1125020 /incubator/cassandra/branches/cassandra-0.3:774578-796573 Modified: cassandra/trunk/.rat-excludes URL: http://svn.apache.org/viewvc/cassandra/trunk/.rat-excludes?rev=1214927r1=1214926r2=1214927view=diff == --- cassandra/trunk/.rat-excludes (original) +++ cassandra/trunk/.rat-excludes Thu Dec 15 19:34:40 2011 @@ -29,3 +29,4 @@ drivers/txpy/txcql/cassandra/* drivers/py/cql/cassandra/* 
doc/cql/CQL* build.properties.default +test/data/legacy-sstables/** Modified: cassandra/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1214927r1=1214926r2=1214927view=diff == --- cassandra/trunk/CHANGES.txt (original) +++ cassandra/trunk/CHANGES.txt Thu Dec 15 19:34:40 2011 @@ -27,7 +27,12 @@ * more efficient allocation of small bloom filters (CASSANDRA-3618) +1.0.7 + * fix assertion when dropping a columnfamily with no sstables (CASSANDRA-3614) + + 1.0.6 + * (CQL) fix cqlsh support for replicate_on_write (CASSANDRA-3596) * fix adding to leveled manifest after streaming (CASSANDRA-3536) * filter out unavailable cipher suites when using encryption (CASSANDRA-3178) * (HADOOP) add old-style api support for CFIF and CFRR (CASSANDRA-2799) @@ -49,13 +54,20 @@ * add back partitioner to sstable metadata (CASSANDRA-3540) * fix NPE in get_count for counters (CASSANDRA-3601) Merged from 0.8: + * remove invalid assertion that table was opened before dropping it + (CASSANDRA-3580) + * range and
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170438#comment-13170438 ] Pavel Yaskevich commented on CASSANDRA-3143: We are doing that now because we are able to read caches independently for each of the CFs, but with a global cache we would need to load that set on cache init and keep it through Schema.load as a global state. Why wouldn't changing SSTableReader.getCachedPosition to return null (and delete that key) if the value was -1 be the path of least resistance in this case? Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-3616: Attachment: 3616.patch I believe the problem is that when compaction is over we were creating an empty (and useless) writer. Previously we were just deleting it right away because the 'cleanupIfNeeded' was in the finally. Patch attached to not create it in the first place. Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel Attachments: 3616.patch Discussion about this started in CASSANDRA-3532. It's on it's own ticket now. Anyhow: The nodes in my cluster are using a lot of file descriptors, holding open tmp files. A few are using 50K+, nearing their limit (on Solaris, of 64K). Here's a small snippet of lsof: java 828 appdeployer *162u VREG 181,65540 0 333884 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db java 828 appdeployer *163u VREG 181,65540 0 333502 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db java 828 appdeployer *165u VREG 181,65540 0 333929 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db java 828 appdeployer *166u VREG 181,65540 0 333859 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db java 828 appdeployer *167u VREG 181,65540 0 333663 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db java 828 appdeployer *168u VREG 181,65540 0 333812 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db I spot checked a few and found they still exist on the filesystem too: rw-rr- 1 appdeployer appdeployer 0 Dec 12 07:16 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db After more investigation, it seems to happen during a CompactionTask. 
I waited until I saw some -tmp- files hanging around in the data dir: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Data.db -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Index.db and then found this in the logs: INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java (line 113) Compacting [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')] INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java (line 218) Compacted to [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,]. 83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 24.332518MB/s. Time: 3,288ms. Note that the timestamp of the 2nd log line matches the last modified time of the files, and has IDs leading up to, *but not including 788904*. I thought this might be relevant information, but I haven't found the specific cause yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
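A self-contained sketch of the shape of the fix described in the update above (3616.patch itself is not quoted here), with a plain file writer standing in for SSTableWriter: the temp file is only created once there is actually a row to write, so an empty -tmp- file and its descriptor never come into existence:
{code:java}
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

final class LazyWriterSketch
{
    static void writeRows(Iterator<String> rows, Path tmpFile) throws IOException
    {
        BufferedWriter writer = null;
        try
        {
            while (rows.hasNext())
            {
                if (writer == null)                      // open lazily, never for zero rows
                    writer = Files.newBufferedWriter(tmpFile);
                writer.write(rows.next());
                writer.newLine();
            }
        }
        finally
        {
            if (writer != null)
                writer.close();                          // nothing to clean up otherwise
        }
    }
}
{code}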
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170446#comment-13170446 ] Jonathan Ellis commented on CASSANDRA-3143: --- Also note that while we have one global cache internally, there's nothing stopping us from splitting out the different CFs to different save files. In fact that would be great from a backwards compatibility point of view; there's users out there who would really hate to blow away their cache on upgrade, and preserving the save format would avoid the need for a backwards compatibility mode. Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 0007-second-round-of-changes-according-to-Sylvain-comment.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
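A rough sketch of that split, assuming a cache key that records its keyspace and column family (the record and method names are invented for illustration): the in-memory cache stays global, while the save path groups entries per CF so the existing one-file-per-CF save format survives:
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CacheSaveSketch
{
    // stand-in for the real cache key type (Java 16+ record syntax)
    record CacheKey(String keyspace, String cf, byte[] key) {}

    static Map<String, List<CacheKey>> groupByCf(List<CacheKey> globalCacheKeys)
    {
        Map<String, List<CacheKey>> perCf = new HashMap<>();
        for (CacheKey k : globalCacheKeys)
            perCf.computeIfAbsent(k.keyspace() + "-" + k.cf(), unused -> new ArrayList<>()).add(k);
        return perCf; // write each list to its own save file, as before
    }
}
{code}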
[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-3143: --- Attachment: (was: 0007-second-round-of-changes-according-to-Sylvain-comment.patch) Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)
[ https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170450#comment-13170450 ] Pavel Yaskevich commented on CASSANDRA-3143: Yeah, I guess this is the best way to go; I will remove the #7 patch and re-attach it with those changes, to avoid pre-loading keys as well as keeping global state. Global caches (key/row) --- Key: CASSANDRA-3143 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143 Project: Cassandra Issue Type: Improvement Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Labels: Core Fix For: 1.1 Attachments: 0001-global-key-cache.patch, 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 0004-key-row-cache-tests-and-tweaks.patch, 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch Caches are difficult to configure well as ColumnFamilies are added, similar to how memtables were difficult pre-CASSANDRA-2006. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-1537) Add option (on CF) to remove expired column on minor compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-1537. --- Resolution: Won't Fix Fix Version/s: (was: 1.1) Assignee: (was: Sylvain Lebresne) This doesn't seem urgent or useful enough to justify adding more options and complexity to the TTL code. Add option (on CF) to remove expired column on minor compactions Key: CASSANDRA-1537 URL: https://issues.apache.org/jira/browse/CASSANDRA-1537 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.1 Reporter: Sylvain Lebresne Priority: Minor Original Estimate: 8h Remaining Estimate: 8h In some use cases, you can safely remove the tombstones of an expired column. In theory, this is true in each case where you know that you will never update a column using a ttl strictly less than that of the old column. This will be the case for instance if you always use the same ttl on all the columns of a CF (say you use the CF for a long term persistent cache). I propose adding an option (by CF) that says 'always remove tombstones of expired columns for that CF'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-2056) Need a way of flattening schemas.
[ https://issues.apache.org/jira/browse/CASSANDRA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2056. --- Resolution: Invalid Fix Version/s: (was: 1.1) Assignee: (was: Gary Dusbabek) This is obsolete post-CASSANDRA-1391 Need a way of flattening schemas. - Key: CASSANDRA-2056 URL: https://issues.apache.org/jira/browse/CASSANDRA-2056 Project: Cassandra Issue Type: Improvement Reporter: Gary Dusbabek Priority: Minor Attachments: v2-0001-convert-MigrationManager-into-a-singleton.txt, v2-0002-bail-on-migrations-originating-from-newer-protocol-ver.txt, v2-0003-a-way-to-upgrade-schema-when-protocol-version-changes.txt For all of our trying not to, we still managed to screw this up. Schema updates currently contain a serialized RowMutation stored as a column value. When a node needs updated schema, it requests these values, deserializes them and applies them. As the serialization scheme for RowMutation changes over time (this is inevitable), those old migrations will become incompatible with newer implementations of the RowMutation deserializer. This means that when new nodes come online, they'll get migration messages that they have trouble deserializing. (Remember, we've only made the promise that we'll be backwards compatible for one version--see CASSANDRA-1015--even though we'd eventually have this problem without that guarantee.) What I propose is a cluster command to flatten the schema prior to upgrading. This would basically purge the old schema updates and replace them with a single serialized migration (serialized in the current protocol version). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.
[ https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170457#comment-13170457 ] Jonathan Ellis commented on CASSANDRA-2261: --- I don't suppose you'd care to rebase to trunk? During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted. --- Key: CASSANDRA-2261 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benjamin Coverston Assignee: Benjamin Coverston Priority: Minor Labels: not_a_pony Fix For: 1.1 Attachments: 2261.patch When a compaction of a set of SSTables fails because of corruption it will continue to try to compact that SSTable causing pending compactions to build up. One way to mitigate this problem would be to log the error, then identify the specific SSTable that caused the failure, subsequently blacklisting that SSTable and ensuring that it is no longer included in future compactions. For this we could simply store the problematic SSTable's name in memory. If it's not possible to identify the SSTable that caused the issue, then perhaps blacklisting the (ordered) permutation of SSTables to be compacted together is something that can be done to solve this problem in a more general case, and avoid issues where two (or more) SSTables have trouble compacting a particular row. For this option we would probably want to store the lists of the bad combinations in the system table somewhere s.t. these can survive a node failure (there have been a few cases where I have seen a compaction cause a node failure). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
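A hedged sketch of the in-memory blacklist idea from the description; the class and method names are invented for illustration:
{code:java}
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

final class CompactionBlacklistSketch
{
    private final Set<String> badSSTables = ConcurrentHashMap.newKeySet();

    void markCorrupt(String sstableName)
    {
        badSSTables.add(sstableName); // remembered for the life of the process
    }

    List<String> filterCandidates(List<String> candidates)
    {
        // known-bad sstables are skipped, so pending compactions stop piling up
        return candidates.stream()
                         .filter(name -> !badSSTables.contains(name))
                         .collect(Collectors.toList());
    }
}
{code}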
[jira] [Resolved] (CASSANDRA-2876) JDBC 1.1 Roadmap of Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2876. --- Resolution: Fixed Assignee: Rick Shaw Resolving as fixed since the subtasks are, but really JDBC driver moved out-of-tree anyway. JDBC 1.1 Roadmap of Enhancements Key: CASSANDRA-2876 URL: https://issues.apache.org/jira/browse/CASSANDRA-2876 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.1 Reporter: Rick Shaw Assignee: Rick Shaw Priority: Minor Labels: cql, jdbc Fix For: 1.1 Organizational ticket to tie together the proposed improvements to Cassandra's JDBC driver in order to coincide with the 1.0 release of the server-side product in the fall of 2011. The target list of improvements (in no particular order for the moment) are as follows: # Complete the {{PreparedStatement}} functionality by implementing true server side variable binding against pre-compiled CQL references. # Provide simple {{DataSource}} Support. # Provide a full {{PooledDataSource}} implementation that integrates the C* JDBC driver with App Servers, JPA implementations and POJO Frameworks (like Spring). # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to complete the planned datatype support for {{PreparedStatement}} and {{ResultSet}}. # Enhance the {{Driver}} features to support automatic error recovery and reconnection. # Support {{RowId}} in {{ResultSet}} # Allow bi-directional row access scrolling to complete functionality in the {{ResultSet}}. # Deliver unit tests for each of the major components of the suite. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3024) sstable and message varint encoding
[ https://issues.apache.org/jira/browse/CASSANDRA-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170459#comment-13170459 ] Jonathan Ellis commented on CASSANDRA-3024: --- Are you still working on a patch for this, Terje? sstable and message varint encoding --- Key: CASSANDRA-3024 URL: https://issues.apache.org/jira/browse/CASSANDRA-3024 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 We could save some sstable space by encoding longs and ints as vlong and vint, respectively. (Probably most short lengths would be better as vint as well.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
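For illustration, a minimal zig-zag vlong encoder of the kind the ticket suggests; the concrete wire format Cassandra would adopt is an assumption here:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class VIntSketch
{
    static void writeVLong(long value, DataOutputStream out) throws IOException
    {
        long zigzag = (value << 1) ^ (value >> 63);        // small magnitudes use few bytes
        while ((zigzag & ~0x7FL) != 0)
        {
            out.writeByte((int) ((zigzag & 0x7F) | 0x80)); // high bit set: more bytes follow
            zigzag >>>= 7;
        }
        out.writeByte((int) zigzag);
    }

    public static void main(String[] args) throws IOException
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeVLong(42L, new DataOutputStream(buf));
        System.out.println("42 encodes in " + buf.size() + " byte(s) instead of 8");
    }
}
{code}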
[jira] [Created] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge
inconsistent/corrupt counters w/ broken shards never converge - Key: CASSANDRA-3641 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist shards with the *same* node_id, *same* clock id, but *different* counts. The counter column diffing and reconciliation code assumes that this never happens, and ignores the count. The problem with this is that if there is an inconsistency, the result of a reconciliation will depend on the order of the shards. In our case for example, we would see the value of the counter randomly fluctuating on a CL.ALL read, but we would get a consistent value (whatever the node had) on CL.ONE (submitted to one of the nodes in the replica set for the key). In addition, read repair would not work despite digest mismatches because the diffing algorithm also did not care about the counts when determining the differences to send. I'm attaching patches that fix this. The first patch is against our 0.8 branch, which is not terribly useful to people, but I include it because it is the well-tested version that we have used on the production cluster which was subject to this corruption. The other patch is against trunk, and contains the same change. What the patch does is: * On diffing, treat as DISJOINT if there is a count discrepancy. * On reconciliation, look at the count and *deterministically* pick the higher one, and: ** log the fact that we detected a corrupt counter ** increment a JMX observable counter for monitoring purposes A cluster which is subject to such corruption and has this patch will fix itself with an AES + compact (or just repeated compactions, assuming the replicate-on-compact is able to deliver correctly). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
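A sketch of the reconciliation rule the description lays out; Shard is a stand-in record (Java 16+), not the real CounterContext layout:
{code:java}
final class ShardReconcileSketch
{
    record Shard(long nodeId, long clock, long count) {}

    static Shard reconcile(Shard a, Shard b)
    {
        assert a.nodeId() == b.nodeId();
        if (a.clock() != b.clock())
            return a.clock() > b.clock() ? a : b;  // normal rule: higher clock wins
        if (a.count() != b.count())
        {
            // corruption: same node_id and clock but different counts;
            // deterministically keep the higher count so all replicas converge
            System.err.println("corrupt counter shard detected: " + a + " vs " + b);
            return a.count() > b.count() ? a : b;
        }
        return a;                                  // identical shards
    }
}
{code}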
[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.
[ https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170461#comment-13170461 ] Benjamin Coverston commented on CASSANDRA-2261: --- happily During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted. --- Key: CASSANDRA-2261 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benjamin Coverston Assignee: Benjamin Coverston Priority: Minor Labels: not_a_pony Fix For: 1.1 Attachments: 2261.patch When a compaction of a set of SSTables fails because of corruption it will continue to try to compact that SSTable causing pending compactions to build up. One way to mitigate this problem would be to log the error, then identify the specific SSTable that caused the failure, subsequently blacklisting that SSTable and ensuring that it is no longer included in future compactions. For this we could simply store the problematic SSTable's name in memory. If it's not possible to identify the SSTable that caused the issue, then perhaps blacklisting the (ordered) permutation of SSTables to be compacted together is something that can be done to solve this problem in a more general case, and avoid issues where two (or more) SSTables have trouble compacting a particular row. For this option we would probably want to store the lists of the bad combinations in the system table somewhere s.t. these can survive a node failure (there have been a few cases where I have seen a compaction cause a node failure). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3389) Evaluate CSLM alternatives for improved cache or GC performance
[ https://issues.apache.org/jira/browse/CASSANDRA-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170462#comment-13170462 ] Jonathan Ellis commented on CASSANDRA-3389: --- Can you test these under G1 garbage collector? CSLM is the main reason G1 works poorly for us. (Especially the uses in Memtable.) Evaluate CSLM alternatives for improved cache or GC performance --- Key: CASSANDRA-3389 URL: https://issues.apache.org/jira/browse/CASSANDRA-3389 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Brandon Williams Priority: Minor Fix For: 1.1 Attachments: 0001-Replace-CSLM-with-ConcurrentSkipTreeMap.patch, 0001-Switch-CSLM-to-SnapTree.patch Ben Manes commented on http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance that it's worth evaluating https://github.com/mspiegel/lockfreeskiptree and https://github.com/nbronson/snaptree as CSLM replacements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge
[ https://issues.apache.org/jira/browse/CASSANDRA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Schuller updated CASSANDRA-3641: -- Attachment: 3641-0.8-internal-not-for-inclusion.txt 3641-trunk.txt inconsistent/corrupt counters w/ broken shards never converge - Key: CASSANDRA-3641 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist shards with the *same* node_id, *same* clock id, but *different* counts. The counter column diffing and reconciliation code assumes that this never happens, and ignores the count. The problem with this is that if there is an inconsistency, the result of a reconciliation will depend on the order of the shards. In our case for example, we would see the value of the counter randomly fluctuating on a CL.ALL read, but we would get a consistent value (whatever the node had) on CL.ONE (submitted to one of the nodes in the replica set for the key). In addition, read repair would not work despite digest mismatches because the diffing algorithm also did not care about the counts when determining the differences to send. I'm attaching patches that fix this. The first patch is against our 0.8 branch, which is not terribly useful to people, but I include it because it is the well-tested version that we have used on the production cluster which was subject to this corruption. The other patch is against trunk, and contains the same change. What the patch does is: * On diffing, treat as DISJOINT if there is a count discrepancy. * On reconciliation, look at the count and *deterministically* pick the higher one, and: ** log the fact that we detected a corrupt counter ** increment a JMX observable counter for monitoring purposes A cluster which is subject to such corruption and has this patch will fix itself with an AES + compact (or just repeated compactions, assuming the replicate-on-compact is able to deliver correctly). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge
[ https://issues.apache.org/jira/browse/CASSANDRA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Schuller reassigned CASSANDRA-3641: - Assignee: Peter Schuller inconsistent/corrupt counters w/ broken shards never converge - Key: CASSANDRA-3641 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: Peter Schuller Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist shards with the *same* node_id, *same* clock id, but *different* counts. The counter column diffing and reconciliation code assumes that this never happens, and ignores the count. The problem with this is that if there is an inconsistency, the result of a reconciliation will depend on the order of the shards. In our case for example, we would see the value of the counter randomly fluctuating on a CL.ALL read, but we would get a consistent value (whatever the node had) on CL.ONE (submitted to one of the nodes in the replica set for the key). In addition, read repair would not work despite digest mismatches because the diffing algorithm also did not care about the counts when determining the differences to send. I'm attaching patches that fix this. The first patch is against our 0.8 branch, which is not terribly useful to people, but I include it because it is the well-tested version that we have used on the production cluster which was subject to this corruption. The other patch is against trunk, and contains the same change. What the patch does is: * On diffing, treat as DISJOINT if there is a count discrepancy. * On reconciliation, look at the count and *deterministically* pick the higher one, and: ** log the fact that we detected a corrupt counter ** increment a JMX observable counter for monitoring purposes A cluster which is subject to such corruption and has this patch will fix itself with an AES + compact (or just repeated compactions, assuming the replicate-on-compact is able to deliver correctly). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3213) Upgrade Thrift
[ https://issues.apache.org/jira/browse/CASSANDRA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3213: -- Summary: Upgrade Thrift (was: Upgrade Thrift to 0.7.0) Upgrade Thrift -- Key: CASSANDRA-3213 URL: https://issues.apache.org/jira/browse/CASSANDRA-3213 Project: Cassandra Issue Type: Task Components: Core Reporter: Jake Farrell Assignee: Jake Farrell Priority: Trivial Labels: thrift Fix For: 1.2 Attachments: v1-0001-update-generated-thrift-code.patch, v1-0002-upgrade-thrift-jar-and-license.patch, v1-0003-update-build-xml.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2319) Promote row index
[ https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2319: -- Fix Version/s: (was: 1.1) Promote row index - Key: CASSANDRA-2319 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stu Hood Assignee: Stu Hood Labels: compression, index, timeseries Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, version-g-lzf.txt, version-g.txt The row index contains entries for configurably sized blocks of a wide row. For a row of appreciable size, the row index ends up directing the third seek (1. index, 2. row index, 3. content) to nearby the first column of a scan. Since the row index is always used for wide rows, and since it contains information that tells us whether or not the 3rd seek is necessary (the column range or name we are trying to slice may not exist in a given sstable), promoting the row index into the sstable index would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly, would allow sstables to be eliminated using only the index. An example usecase that benefits greatly from this change is time series data in wide rows, where data is appended to the beginning or end of the row. Our existing compaction strategy gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended data, we would be able to eliminate wide rows using only the sstable index, rather than needing to seek into the data file to determine that it isn't interesting. For narrow rows, this change would have no effect, as they will not reach the threshold for indexing anyway. A first cut design for this change would look very similar to the file format design proposed on #674: http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, column names clustered, and offsets clustered and delta encoded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2398) Type specific compression
[ https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2398: -- Fix Version/s: (was: 1.2) Type specific compression - Key: CASSANDRA-2398 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Stu Hood Labels: compression Attachments: 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, compress-lzf-0.7.0.jar Cassandra has a lot of locations that are ripe for type specific compression. A short list: Indexes * Keys compressed as BytesType, which could default to LZO/LZMA * Offsets (delta and varint encoding) * Column names added by 2319 Data * Keys, columns, timestamps: see http://wiki.apache.org/cassandra/FileFormatDesignDoc A basic interface for type specific compression could be as simple as: {code:java} public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException public void skip(int version, DataInput from) throws IOException {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
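As a concrete, purely illustrative instance of that interface, a delta codec for sorted long offsets; this is not taken from the attached patches:
{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

final class DeltaOffsetCodec
{
    public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
    {
        to.writeInt(from.size());
        long previous = 0;
        for (ByteBuffer buf : from)
        {
            long value = buf.getLong(buf.position());
            to.writeLong(value - previous);  // deltas between sorted offsets stay small
            previous = value;                // (pair with a vlong encoding for real savings)
        }
    }

    public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
    {
        int count = from.readInt();
        long previous = 0;
        for (int i = 0; i < count; i++)
        {
            previous += from.readLong();
            to.add(ByteBuffer.allocate(8).putLong(0, previous));
        }
    }
}
{code}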
[jira] [Updated] (CASSANDRA-3067) Simple SSTable Pluggability
[ https://issues.apache.org/jira/browse/CASSANDRA-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3067: -- Fix Version/s: (was: 1.1) Simple SSTable Pluggability --- Key: CASSANDRA-3067 URL: https://issues.apache.org/jira/browse/CASSANDRA-3067 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Stu Hood Assignee: Stu Hood Attachments: 0001-CASSANDRA-3067-Create-an-ABC-for-SSTableIdentityIterat.txt, 0002-CASSANDRA-3067-Move-from-linear-SSTable-versions-to-fe.txt, 0003-CASSANDRA-3067-Create-an-ABC-for-SSTableWriter.txt, 0004-CASSANDRA-3067-Rename-SSTable-Names-Slice-Iterator-to-.txt, 0005-CASSANDRA-3067-Create-ABCs-for-SSTableReader-and-KeyIt.txt, 0006-CASSANDRA-3067-Allow-overriding-the-current-sstable-ve.txt CASSANDRA-2995 proposes full storage engine pluggability, which is probably unavoidable in the long run. For now though, I'd like to propose an incremental alternative that preserves the sstable model, but allows it to evolve non-linearly. The sstable version field could allow for simple switching between writable sstable types, without moving all the way to differentiating between engines as CASSANDRA-2995 requires. This can be accomplished by moving towards a feature flags model (with a mapping between versions and feature sets), rather than a linear versions model (where versions can be strictly ordered and all versions above X have a feature). There are restrictions on this approach: * It's sufficient for an alternate SSTable(Writer|Reader|*) set to require a patch to enable (rather than a JAR) * Filenames/descriptors/components must conform to the existing conventions -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3213) Upgrade Thrift
[ https://issues.apache.org/jira/browse/CASSANDRA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170464#comment-13170464 ] Jonathan Ellis commented on CASSANDRA-3213: --- Where was the Thrift wire-compatibility change? Was that 0.6 -> 0.7? If so maybe we should upgrade to 0.7 for our 1.1 release so that people can use a modern Thrift client-side. Upgrade Thrift -- Key: CASSANDRA-3213 URL: https://issues.apache.org/jira/browse/CASSANDRA-3213 Project: Cassandra Issue Type: Task Components: Core Reporter: Jake Farrell Assignee: Jake Farrell Priority: Trivial Labels: thrift Fix For: 1.2 Attachments: v1-0001-update-generated-thrift-code.patch, v1-0002-upgrade-thrift-jar-and-license.patch, v1-0003-update-build-xml.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170467#comment-13170467 ] Marcus Eriksson commented on CASSANDRA-2749: sounds great (both just supporting the new-style layout and limiting names to 32 chars) guess we need to supply a tool to rename sstable files if anyone is on longer names? and rolling upgrades are out of the question then, right? (maybe they already are?) fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170475#comment-13170475 ] Pavel Yaskevich commented on CASSANDRA-2749: I think if we're just supporting the new-style layout we can convert on startup. +1 on both ideas tho. fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2056) Need a way of flattening schemas.
[ https://issues.apache.org/jira/browse/CASSANDRA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170478#comment-13170478 ] Gary Dusbabek commented on CASSANDRA-2056: -- bq. This is obsolete post-CASSANDRA-1391 This ticket is orthogonal to 1391. The purpose of flattening schemas is to make it so we do not have to maintain compatibility for the migration serializers indefinitely (for the purpose of bootstrapping a new node). Need a way of flattening schemas. - Key: CASSANDRA-2056 URL: https://issues.apache.org/jira/browse/CASSANDRA-2056 Project: Cassandra Issue Type: Improvement Reporter: Gary Dusbabek Priority: Minor Attachments: v2-0001-convert-MigrationManager-into-a-singleton.txt, v2-0002-bail-on-migrations-originating-from-newer-protocol-ver.txt, v2-0003-a-way-to-upgrade-schema-when-protocol-version-changes.txt For all of our trying not to, we still managed to screw this up. Schema updates currently contain a serialized RowMutation stored as a column value. When a node needs updated schema, it requests these values, deserializes them and applies them. As the serialization scheme for RowMutation changes over time (this is inevitable), those old migrations will become incompatible with newer implementations of the RowMutation deserializer. This means that when new nodes come online, they'll get migration messages that they have trouble deserializing. (Remember, we've only made the promise that we'll be backwards compatible for one version--see CASSANDRA-1015--even though we'd eventually have this problem without that guarantee.) What I propose is a cluster command to flatten the schema prior to upgrading. This would basically purge the old schema updates and replace them with a single serialized migration (serialized in the current protocol version). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170480#comment-13170480 ] Sylvain Lebresne commented on CASSANDRA-2749: - bq. guess we need to supply a tool to rename sstable files if anyone is on longer names? We probably don't need to do anything. I don't think anyone is really using names long enough to hit the file system limit; the goal of limiting the names is just to prevent that from ever happening, and the code will make no other assumption that the names are short. I also don't think anything will prevent rolling upgrades, did you have something in mind? Note: I have a long flight ahead of me so I plan to update my last patch with both those changes, as I still like moving all the directories handling into a dedicated class, even if we don't support both layouts. fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak
[ https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170481#comment-13170481 ] Jonathan Ellis commented on CASSANDRA-3616: --- +1 Temp SSTable and file descriptor leak - Key: CASSANDRA-3616 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.5 Environment: 1.0.5 + CASSANDRA-3532 patch Solaris 10 Reporter: Eric Parusel Attachments: 3616.patch Discussion about this started in CASSANDRA-3532. It's on its own ticket now. Anyhow: The nodes in my cluster are using a lot of file descriptors, holding open tmp files. A few are using 50K+, nearing their limit (on Solaris, of 64K). Here's a small snippet of lsof: java 828 appdeployer *162u VREG 181,65540 0 333884 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db java 828 appdeployer *163u VREG 181,65540 0 333502 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db java 828 appdeployer *165u VREG 181,65540 0 333929 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db java 828 appdeployer *166u VREG 181,65540 0 333859 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db java 828 appdeployer *167u VREG 181,65540 0 333663 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db java 828 appdeployer *168u VREG 181,65540 0 333812 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db I spot checked a few and found they still exist on the filesystem too: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db After more investigation, it seems to happen during a CompactionTask. I waited until I saw some -tmp- files hanging around in the data dir: -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Data.db -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 21:47:10 2011 messages_meta-tmp-hb-788904-Index.db and then found this in the logs: INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java (line 113) Compacting [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'), SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')] INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java (line 218) Compacted to [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,]. 83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 24.332518MB/s. Time: 3,288ms.
Note that the timestamp of the 2nd log line matches the last modified time of the files, and has IDs leading up to, *but not including 788904*. I thought this might be relevant information, but I haven't found the specific cause yet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1684) Entity groups
[ https://issues.apache.org/jira/browse/CASSANDRA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170484#comment-13170484 ] Jonathan Ellis commented on CASSANDRA-1684: --- bq. if there were more optimizations done on rows (allowed them to be even larger, etc.), would that be a better approach? I think it would be. That's definitely a long-term play, though. I only have ideas on how to fix some of the problems Sylvain raised. And then there's others like CASSANDRA-3362. But we kind of need to fix large rows independent of the entity group idea. bq. Two use cases where same row does not work for us: Both of these sound like basically workarounds for weaknesses elsewhere. Which again feels like the right answer is to fix those weaknesses rather than adding another layer of hack on top. I guess there's really two questions here: - Should we add a special row group api? - What should the implementation look like? In other words, we could add a row group api and implement it in terms of large rows. Or implement it another way. But, we want wide rows that work well independent of row groups, so it feels like that's the right place to spend our efforts now. Entity groups - Key: CASSANDRA-1684 URL: https://issues.apache.org/jira/browse/CASSANDRA-1684 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Sylvain Lebresne Fix For: 1.2 Original Estimate: 80h Remaining Estimate: 80h Supporting entity groups similar to App Engine's (that is, allow rows to be part of a parent entity group, whose key is used for routing instead of the row itself) allows several improvements: - batches within an EG can be atomic across multiple rows - order-by-value queries within an EG only have to touch a single replica even with RandomPartitioner -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.
[ https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170485#comment-13170485 ] Edward Capriolo commented on CASSANDRA-3640: I did happen to capture the snitch information from another node at the time of the event. /10.71.71.51=3.15 /10.71.74.30=26.88 /10.71.71.62=1.67 /10.71.71.66=5.19 /10.71.71.73=3.76 /10.71.71.76=0.68 /10.71.74.34=1.66 /10.71.71.63=2.42 /10.71.71.72=0.82 /10.71.71.59=3.44 /10.71.74.33=1.21 /10.71.71.64=1.21 /10.71.71.60=2.19 /10.71.71.71=1.75 /10.71.74.32=106.55 /10.71.71.54=86.69 /10.71.71.53=5.14 /10.71.74.27=5.93 /10.71.74.31=3.11 /10.71.71.69=1.15 /10.71.71.56=2.73 /10.71.74.37=2.16 /10.71.71.70=2.85 /10.71.71.58=0.77 /10.71.71.55=5.83 /10.71.74.38=1.14 /10.71.74.35=3.61 /10.71.71.68=0.81 /10.71.71.67=0.69 /10.71.71.74=3.64 /10.71.71.57=1.21 /10.71.71.52=2.37 /10.71.71.65=6.78 /10.71.71.61=2.8 /10.71.71.75=4.12 /10.71.74.36=3.22 These are the two systems that I show as getting a lot of IO: /10.71.74.29=1.64 /10.71.74.28=3.49 We are doing mostly READ.ONE with a low read repair chance. The ring is balanced and data/node is the same. badness_threshold is 0.2. No user load is hitting these two machines directly. Thus it is hard for me to understand how the dynamic snitch is routing so much traffic to these machines that they are more burdened than other nodes. As I understand the dynamic snitch, these machines should be the least burdened ones. Dynamic Snitch does not compute scores if no direct reads hit the node. --- Key: CASSANDRA-3640 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.7 Reporter: Edward Capriolo Priority: Minor We ran into an interesting situation. We added 2 nodes to our cluster. Strangely these nodes were performing worse than other nodes. They had more IOwait for example. The impact was not major but it was noticeable. Later I determined that these Cassandra nodes were not in our client's list of nodes and our clients do not auto discover. I confirmed that the hosts did not have any scores inside their dynamic snitch. It is counterintuitive that a node receiving few or no direct user requests would perform worse than others. I am not sure of the dynamic that caused this. I understand that DSnitch is supposed to have its own view of the world; maybe it could share information with neighbours. Again this is more of a client configuration issue than a direct Cassandra issue, but I found it interesting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
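For reference, a heavily simplified sketch of how a badness threshold gates score-based rerouting, which is the behavior being reasoned about above; it mirrors the general idea only, not DynamicEndpointSnitch's actual code:
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class BadnessThresholdSketch
{
    // reorder replicas by score only when the preferred replica is at least
    // (1 + badnessThreshold) times worse than some alternative
    static void maybeReorder(List<String> replicas, Map<String, Double> scores, double badnessThreshold)
    {
        double firstScore = scores.getOrDefault(replicas.get(0), 0.0);
        for (String replica : replicas)
        {
            double score = scores.getOrDefault(replica, 0.0);
            if (firstScore > score * (1 + badnessThreshold))
            {
                replicas.sort(Comparator.comparingDouble((String r) -> scores.getOrDefault(r, 0.0)));
                return;
            }
        }
    }
}
{code}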
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170487#comment-13170487 ] Jonathan Ellis commented on CASSANDRA-2749: --- It might be worth adding an "are my filenames going to be too large" check against all KS + CF combinations before starting to migrate data files around, though. It would suck to end up with a partially converted database if some short CF names complete early on, before erroring out on a long one. fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Priority: Minor Fix For: 1.1 Attachments: 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-add-new-directory-layout.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
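A sketch of such a pre-flight pass, assuming a 255-character per-name filesystem limit and a placeholder value for the sstable component overhead; nothing would be moved until every keyspace/CF combination passes:
{code:java}
import java.util.List;
import java.util.Map;

final class FilenameLengthPreflight
{
    private static final int MAX_FILENAME = 255;      // common per-name filesystem limit
    private static final int COMPONENT_OVERHEAD = 32; // assumed room for "-tmp-hb-<gen>-Data.db" etc.

    static void validate(Map<String, List<String>> cfsByKeyspace)
    {
        for (Map.Entry<String, List<String>> entry : cfsByKeyspace.entrySet())
            for (String cf : entry.getValue())
                if (entry.getKey().length() + cf.length() + COMPONENT_OVERHEAD > MAX_FILENAME)
                    throw new IllegalStateException(
                        "migration would exceed filename limit for " + entry.getKey() + "/" + cf);
    }
}
{code}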
[jira] [Assigned] (CASSANDRA-3636) cassandra 1.0.6 debian packages will not run on OpenVZ
[ https://issues.apache.org/jira/browse/CASSANDRA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne reassigned CASSANDRA-3636: --- Assignee: Brandon Williams cassandra 1.0.6 debian packages will not run on OpenVZ -- Key: CASSANDRA-3636 URL: https://issues.apache.org/jira/browse/CASSANDRA-3636 Project: Cassandra Issue Type: Bug Components: Packaging Affects Versions: 1.0.6 Environment: Debian Linux (stable), OpenVZ container Reporter: Zenek Kraweznik Assignee: Brandon Williams Priority: Critical During upgrade from 1.0.6 {code}Setting up cassandra (1.0.6) ... *error: permission denied on key 'vm.max_map_count'* dpkg: error processing cassandra (--configure): subprocess installed post-installation script returned error exit status 255 Errors were encountered while processing: cassandra {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3636) cassandra 1.0.6 debian packages will not run on OpenVZ
[ https://issues.apache.org/jira/browse/CASSANDRA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3636: -- Priority: Minor (was: Critical) cassandra 1.0.6 debian packages will not run on OpenVZ -- Key: CASSANDRA-3636 URL: https://issues.apache.org/jira/browse/CASSANDRA-3636 Project: Cassandra Issue Type: Bug Components: Packaging Affects Versions: 1.0.6 Environment: Debian Linux (stable), OpenVZ container Reporter: Zenek Kraweznik Assignee: Brandon Williams Priority: Minor During upgrade from 1.0.6 {code}Setting up cassandra (1.0.6) ... *error: permission denied on key 'vm.max_map_count'* dpkg: error processing cassandra (--configure): subprocess installed post-installation script returned error exit status 255 Errors were encountered while processing: cassandra {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170490#comment-13170490 ] Vijay commented on CASSANDRA-3635: -- Nope, I think we can create a tree independently on each of the nodes and then compare them. Let's say we create a tree on A first; after completion, we can create a tree on B and then on C (we have to sync on time, maybe flush at the time the repair was requested or something like that). Once we have all 3 trees we can exchange and compare them, then transfer what is required via real streaming if needed. That way we don't bring the whole range down or make it hot. Throttle validation separately from other compaction Key: CASSANDRA-3635 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 0.8.10, 1.0.7 Attachments: 0001-separate-validation-throttling.patch Validation compaction is fairly resource intensive. It is possible to throttle it with other compaction, but there are cases where you really want to throttle it rather aggressively but don't necessarily want to have minor compactions throttled that much. The goal is to (optionally) allow setting a separate throttling value for validation. PS: I'm not pretending this will solve every repair problem or anything. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way
[ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170529#comment-13170529 ] Radim Kolar commented on CASSANDRA-3497: It would be good to have the ability to shrink bloom filters during loading: save only standard Cassandra bloom filters, but shrink them at load time according to the CF settings. BloomFilter FP ratio should be configurable or size-restricted some other way - Key: CASSANDRA-3497 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Brandon Williams Priority: Minor When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but you end up getting restricted by having the BloomFilters in memory, even though you have absolutely no use for them. It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.
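For what it's worth, "shrink during load" is mechanically possible with a standard bloom filter: if the target bit count divides the saved one, fixed-size segments of the saved bit array can be OR-ed together, and lookups then reduce each hash index modulo the smaller size. The cost is a higher false-positive rate. A hedged Java sketch of the folding idea, not Cassandra's actual code:
{code}
import java.util.BitSet;

public class BloomFold {

    // Fold a saved filter's bit array down to targetBits, which must
    // divide the original size; segments collapse onto one smaller array.
    static BitSet fold(BitSet saved, int savedBits, int targetBits) {
        if (savedBits % targetBits != 0)
            throw new IllegalArgumentException("target must divide original");
        BitSet folded = new BitSet(targetBits);
        for (int i = saved.nextSetBit(0); i >= 0; i = saved.nextSetBit(i + 1))
            folded.set(i % targetBits);
        return folded;
    }

    // After folding, a query simply mods each hash by the new size.
    static boolean mightContain(BitSet bits, int targetBits, long[] hashes) {
        for (long h : hashes)
            if (!bits.get((int) Long.remainderUnsigned(h, targetBits)))
                return false;
        return true;
    }
}
{code}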
[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way
[ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3497: -- Fix Version/s: 1.1 Assignee: Yuki Morishita BloomFilter FP ratio should be configurable or size-restricted some other way - Key: CASSANDRA-3497 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Brandon Williams Assignee: Yuki Morishita Priority: Minor Fix For: 1.1 When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but you end up getting restricted by having the BloomFilters in memory, even though you have absolutely no use for them. It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.
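A quick check of why a tunable FP ratio saves memory: the standard bloom-filter sizing formulas give bits per element m/n = -ln(p) / (ln 2)^2 and hash count k = (m/n) ln 2 for a desired false-positive rate p. The snippet below just evaluates those textbook formulas; it is not Cassandra-specific code.
{code}
public class BloomSizing {
    public static void main(String[] args) {
        for (double p : new double[] { 0.01, 0.1, 0.5 }) {
            double bitsPerElement = -Math.log(p) / (Math.log(2) * Math.log(2));
            int k = (int) Math.max(1, Math.round(bitsPerElement * Math.log(2)));
            System.out.printf("p=%.2f -> %.1f bits/element, k=%d hashes%n",
                              p, bitsPerElement, k);
        }
        // p=0.01 needs ~9.6 bits/element while p=0.5 needs only ~1.4,
        // so an analytics-only DC that tolerates more false positives
        // can hold much smaller filters in memory.
    }
}
{code}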
[jira] [Commented] (CASSANDRA-3626) Nodes can get stuck in UP state forever, despite being DOWN
[ https://issues.apache.org/jira/browse/CASSANDRA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170535#comment-13170535 ] Hudson commented on CASSANDRA-3626: --- Integrated in Cassandra-0.8 #419 (See [https://builds.apache.org/job/Cassandra-0.8/419/]) Prevent new nodes from thinking down nodes are up forever. Patch by brandonwilliams, reviewed by Peter Schuller for CASSANDRA-3626 brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1214916 Files : * /cassandra/branches/cassandra-0.8/CHANGES.txt * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java Nodes can get stuck in UP state forever, despite being DOWN --- Key: CASSANDRA-3626 URL: https://issues.apache.org/jira/browse/CASSANDRA-3626 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.8, 1.0.5 Reporter: Peter Schuller Assignee: Brandon Williams Fix For: 0.8.10, 1.0.7 Attachments: 3626.txt This is a proposed phrasing for an upstream ticket named "Newly discovered nodes that are down get stuck in UP state forever" (will edit w/ feedback until done): We have observed a problem with gossip whereby, when you are bootstrapping a new node (or replacing one using the replace_token support), any node in the cluster that is Down at the time the new node is started will be assumed to be Up, and then *never ever* flapped back to Down until you restart the node. This has at least two implications for replacing or bootstrapping new nodes while there are nodes down in the ring: * If the new node happens to select a node listed as UP (but in reality DOWN) as a stream source, streaming will sit there hanging forever. * If that doesn't happen (because it picks another host), it will instead finish bootstrapping correctly and begin servicing requests, all the while thinking the DOWN nodes are UP, and thus routing requests to them and generating timeouts. The way to get out of this is to restart the node(s) that you bootstrapped. I have tested and confirmed the symptom (that the bootstrapped node thinks the other nodes are Up) using a fairly recent 1.0. The main debugging effort happened on 0.8, however, so all details below refer to 0.8 but are probably similar in 1.0. Steps to reproduce: * Bring up a cluster of >= 3 nodes. *Ensure RF < N*, so that the cluster is operative with one node removed. * Pick two random nodes A and B. Shut them *both* off. * Wait for everyone to realize they are both off (for good measure). * Now, take node A, nuke its data directories and re-start it, such that it comes up with a normal bootstrap (or use replace_token; didn't test that, but it should not affect the result). * Watch how node A starts up, all the while believing node B is up, even though all other nodes in the cluster agree that B is down and B is in fact still turned off. The mechanism by which it initially goes into the Up state is that the node receives a gossip response from any other node in the cluster, and GossipDigestAck2VerbHandler.doVerb() calls Gossiper.applyStateLocally(). Gossiper.applyStateLocally() doesn't have any local endpoint state for the cluster, so the else statement at the end ("it's a new node") gets triggered and handleMajorStateChange() is called. handleMajorStateChange() always calls markAlive(), unless the state is a dead state (but "dead" here does not mean "not up"; it refers to joining/hibernate etc). So at this point the node is up in the mind of the node you just bootstrapped.
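The code path described above, condensed into an illustrative Java sketch; the method names mirror the report, but every body here is a simplification, not Cassandra's actual source:
{code}
import java.net.InetAddress;
import java.util.*;

class GossipPathSketch {
    static class EndpointState { boolean deadState; /* joining/hibernate etc. */ }

    final Map<InetAddress, EndpointState> endpointStateMap = new HashMap<>();

    void applyStateLocally(Map<InetAddress, EndpointState> remote) {
        for (Map.Entry<InetAddress, EndpointState> e : remote.entrySet())
            if (!endpointStateMap.containsKey(e.getKey()))
                handleMajorStateChange(e.getKey(), e.getValue()); // "it's a new node"
    }

    void handleMajorStateChange(InetAddress ep, EndpointState state) {
        endpointStateMap.put(ep, state);
        if (!state.deadState)  // "dead" = joining/hibernate, NOT "down",
            markAlive(ep);     // so a node that is really down is marked Up
    }

    void markAlive(InetAddress ep) { System.out.println(ep + " marked alive"); }
}
{code}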
Now, in each gossip round doStatusCheck() is called, which iterates over all nodes (including the one falsely marked Up) and, among other things, calls FailureDetector.interpret() on each node. FailureDetector.interpret() is meant to update its sense of Phi for the node, and potentially convict it. However, there is a short-circuit at the top whereby, if we do not yet have any arrival window for the node, we simply return immediately. Arrival intervals are only added as a result of a FailureDetector.report() call, which never happens in this case because the initial endpoint state we added, which came from a remote node that was up, had the latest version of the gossip state (so Gossiper.reportFailureDetector() will never call report()). The result is that the node can never be convicted. Now, let's ignore for a moment the problem that a node that is actually Down will be thought to be Up temporarily for a little while. That is sub-optimal, but let's aim this ticket at a fix for the more serious problem - which is that it stays Up forever. Considered solutions: * When interpret() gets called and there is no arrival window, we could add a faked arrival window
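A sketch of that short-circuit and of the first considered fix: seed a synthetic arrival window when interpret() finds none, so that Phi can eventually cross the conviction threshold. This is hypothetical illustration code, not the actual FailureDetector, and the phi() calculation below is heavily simplified from the phi-accrual model.
{code}
import java.net.InetAddress;
import java.util.*;

class FailureDetectorSketch {
    static final double PHI_CONVICT_THRESHOLD = 8.0;
    final Map<InetAddress, Deque<Long>> arrivalWindows = new HashMap<>();

    void interpret(InetAddress ep) {
        Deque<Long> window = arrivalWindows.get(ep);
        if (window == null) {
            // Current behavior: return immediately, so the node can never
            // be convicted (report() is never called for it).
            // Considered fix: fake an arrival so a window exists and phi
            // starts growing from this point.
            window = new ArrayDeque<>();
            window.add(System.currentTimeMillis());
            arrivalWindows.put(ep, window);
            return;
        }
        if (phi(window) > PHI_CONVICT_THRESHOLD)
            convict(ep);
    }

    double phi(Deque<Long> window) {
        // Simplified: seconds elapsed since the last recorded arrival.
        return (System.currentTimeMillis() - window.peekLast()) / 1000.0;
    }

    void convict(InetAddress ep) { System.out.println(ep + " convicted (DOWN)"); }
}
{code}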