[jira] [Created] (CASSANDRA-3075) Cassandra CLI unable to use list command with INTEGER column names, resulting in syntax error
Cassandra CLI unable to use list command with INTEGER column names, resulting in syntax error

Key: CASSANDRA-3075
URL: https://issues.apache.org/jira/browse/CASSANDRA-3075
Project: Cassandra
Issue Type: Bug
Components: Tools
Affects Versions: 0.8.0
Environment: 64 Bit Ubuntu 11.04 (full update), AMD64 + 8GB RAM + 500GB Hdd, Java 1.6.0_26, Cassandra 0.8.0 + 4GB heap, Cassandra CLI
Reporter: Renato Bacelar da Silveira
Priority: Minor

I have a Column Family named 1105115.

I have inserted the CF with Hector, and it did not throw any exception concerning the name of the column.

When I issue the command

list 1105115;

I get the following error:

[default@unknown] list 1105115;
Syntax error at position 5: mismatched input '1105115' expecting Identifier

I presume we are not to name CFs as integers? Or is there something I am missing from the help content below:

[default@unknown] help list;
list <cf>;
list <cf>[<startKey>:];
list <cf>[<startKey>:<endKey>];
list <cf>[<startKey>:<endKey>] limit <limit>;

List a range of rows, and all of their columns, in the specified column family.

The order of rows returned is dependant on the Partitioner in use.

Required Parameters:
- cf: Name of the column family to list rows from.

Optional Parameters:
- endKey: Key to end the range at. The end key will be included in the result. Defaults to an empty byte array.
- limit: Number of rows to return. Default is 100.
- startKey: Key start the range from. The start key will be included in the result. Defaults to an empty byte array.

Examples:
list Standard1;
list Super1[j:];
list Standard1[j:k] limit 40;

Column Family Info:

ColumnFamily: 1105100
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.5203125/111/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3075) Cassandra CLI unable to use list command with INTEGER column names, resulting in syntax error
[ https://issues.apache.org/jira/browse/CASSANDRA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renato Bacelar da Silveira updated CASSANDRA-3075:

Description: the Column Family Info block now reads "ColumnFamily: 1105115" (was: "ColumnFamily: 1105100"); the rest of the description is unchanged.

Column Family name edit on the extra info... sorry

Labels: features, newbie
[jira] [Updated] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Coverston updated CASSANDRA-1608:

Attachment: 1608-v5.txt

I've made the changes requested in the last two comments. The latest changes/merge seem to have caused a regression when the number of SSTables increases beyond a few hundred. The next time I'll be able to look at this is Friday; I'll try to figure out what on earth is going on.

Redesigned Compaction

Key: CASSANDRA-1608
URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Chris Goffinet
Assignee: Benjamin Coverston
Attachments: 1608-22082011.txt, 1608-v2.txt, 1608-v4.txt, 1608-v5.txt

After seeing the I/O issues in CASSANDRA-1470, I've been doing some more thinking on this subject that I wanted to lay out.

I propose we redo the concept of how compaction works in Cassandra. At the moment, compaction is kicked off based on a write access pattern, not a read access pattern. In most cases, you want the opposite. You want to be able to track how well each SSTable is performing in the system. If we were to keep in-memory statistics for each SSTable and prioritize them by how often they are accessed and by their bloom filter hit/miss ratios, we could intelligently group the SSTables that are being read most often and schedule them for compaction. We could also schedule lower-priority maintenance on SSTables that are not often accessed.

I also propose we limit each SSTable to a fixed size, which would let us utilize our bloom filters in a more predictable manner. At the moment, bloom filters become less reliable beyond a certain size. This would also allow us to group the most-accessed data. Currently an SSTable can grow to a point where large portions of its data might not actually be accessed very often.
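The prioritization idea in the proposal above can be illustrated with a small sketch. This is not code from any attached patch: the class SSTablePrioritizer, the SSTableStats fields, and the ordering rule (most reads first, ties broken by bloom filter false-positive ratio) are all hypothetical names and assumptions chosen only to show what "prioritize by access statistics" could look like.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank SSTables for compaction by read statistics.
class SSTablePrioritizer {

    // In-memory statistics for one SSTable (all fields are assumptions).
    static final class SSTableStats {
        final String name;
        final long reads;               // number of reads served from this SSTable
        final long bloomChecks;         // number of bloom filter lookups
        final long bloomFalsePositives; // lookups that passed the filter but found no data

        SSTableStats(String name, long reads, long bloomChecks, long bloomFalsePositives) {
            this.name = name;
            this.reads = reads;
            this.bloomChecks = bloomChecks;
            this.bloomFalsePositives = bloomFalsePositives;
        }

        double falsePositiveRatio() {
            return bloomChecks == 0 ? 0.0 : (double) bloomFalsePositives / bloomChecks;
        }
    }

    // Returns up to 'limit' SSTables, most-read first; ties are broken by the
    // higher bloom filter false-positive ratio.
    static List<SSTableStats> compactionCandidates(List<SSTableStats> all, int limit) {
        List<SSTableStats> sorted = new ArrayList<SSTableStats>(all);
        Collections.sort(sorted, new Comparator<SSTableStats>() {
            public int compare(SSTableStats a, SSTableStats b) {
                int byReads = Long.compare(b.reads, a.reads);
                if (byReads != 0) {
                    return byReads;
                }
                return Double.compare(b.falsePositiveRatio(), a.falsePositiveRatio());
            }
        });
        return new ArrayList<SSTableStats>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        List<SSTableStats> tables = new ArrayList<SSTableStats>();
        tables.add(new SSTableStats("cold", 10, 100, 1));
        tables.add(new SSTableStats("hot-a", 5000, 1000, 10));
        tables.add(new SSTableStats("hot-b", 5000, 1000, 200));
        for (SSTableStats s : compactionCandidates(tables, 2)) {
            System.out.println(s.name);
        }
    }
}
```

A real implementation would also have to decide how to decay old statistics and how to combine this ranking with the fixed SSTable size limit; the sketch leaves both out.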
[jira] [Issue Comment Edited] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090883#comment-13090883 ]

Benjamin Coverston edited comment on CASSANDRA-1608 at 8/25/11 10:05 AM:

I've made the changes requested in the last two comments. The latest changes/merge seem to have caused a regression when the # of SSTables increases beyond a few hundred. Next time I'll be able to look at this is Friday I'll try to figure out what on earth is going on.

EDIT: Somehow I screwed up the attached patch.. I'll fix it and resubmit.

was (Author: bcoverston):
I've made the changes requested in the last two comments. The latest changes/merge seem to have caused a regression when the # of SSTables increases beyond a few hundred. Next time I'll be able to look at this is Friday I'll try to figure out what on earth is going on.
[jira] [Updated] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Coverston updated CASSANDRA-1608:

Attachment: (was: 1608-v5.txt)
[jira] [Updated] (CASSANDRA-3075) Cassandra CLI unable to use list command with INTEGER column names, resulting in syntax error
[ https://issues.apache.org/jira/browse/CASSANDRA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3075:

Fix Version/s: 0.8.5
Assignee: Pavel Yaskevich

One possible solution would be to allow quoting CF names.
[jira] [Commented] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091037#comment-13091037 ]

T Jake Luciani commented on CASSANDRA-3076:

[junit] ERROR 10:20:16,755 Fatal exception in thread Thread[ScheduledTasks:1,5,main]
[junit] java.lang.AssertionError
[junit] at org.apache.cassandra.service.GCInspector.logGCResults(GCInspector.java:110)
[junit] at org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:41)
[junit] at org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:85)
[junit] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
[junit] at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
[junit] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[junit] at java.lang.Thread.run(Thread.java:680)
[junit] - ---

AssertionError in new GCInspector log

Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.8.5

Small regression from CASSANDRA-2868
[jira] [Resolved] (CASSANDRA-2761) JDBC driver does not build
[ https://issues.apache.org/jira/browse/CASSANDRA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2761.

Resolution: Fixed
Fix Version/s: (was: 1.0)

JDBC driver does not build

Key: CASSANDRA-2761
URL: https://issues.apache.org/jira/browse/CASSANDRA-2761
Project: Cassandra
Issue Type: Bug
Components: API
Affects Versions: 1.0
Reporter: Jonathan Ellis
Assignee: Rick Shaw
Attachments: jdbc-driver-build-v1.txt, v1-0001-CASSANDRA-2761-cleanup-nits.txt

Need a way to build (and run tests for) the Java driver.

Also: there are still some vestigial references to drivers/ in the trunk build.xml.

Should we remove drivers/ from the 0.8 branch as well?
[jira] [Issue Comment Edited] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091037#comment-13091037 ]

T Jake Luciani edited comment on CASSANDRA-3076 at 8/25/11 2:27 PM:

{code}
[junit] ERROR 10:20:16,755 Fatal exception in thread Thread[ScheduledTasks:1,5,main]
[junit] java.lang.AssertionError
[junit] at org.apache.cassandra.service.GCInspector.logGCResults(GCInspector.java:110)
[junit] at org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:41)
[junit] at org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:85)
[junit] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
[junit] at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
[junit] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
[junit] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[junit] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[junit] at java.lang.Thread.run(Thread.java:680)
[junit] - ---
{code}
[jira] [Created] (CASSANDRA-3076) AssertionError in new GCInspector log
AssertionError in new GCInspector log

Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.8.5

Small regression from CASSANDRA-2868
[jira] [Commented] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091038#comment-13091038 ]

Jonathan Ellis commented on CASSANDRA-3074:

The proposed changes conflate the index entries themselves (always one per key) and the sampling rate (which is what index_interval affects).

comments and documentation for index_interval are misleading

Key: CASSANDRA-3074
URL: https://issues.apache.org/jira/browse/CASSANDRA-3074
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Matthew F. Dennis
Assignee: Matthew F. Dennis
Priority: Minor
Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state that the larger the *sampling*, the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies that the larger the *setting* is, the larger the index is, while in fact it's the opposite.
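To make the distinction concrete, here is a minimal arithmetic sketch. It assumes that sampling keeps one in-memory sample for every index_interval-th index entry; the class and method names are hypothetical and are not taken from the Cassandra source. The on-disk index still has one entry per key regardless of the setting; only the number of samples changes.

```java
// Hypothetical sketch: how the number of in-memory samples relates to index_interval.
class IndexSampling {

    // Number of sampled entries when every indexInterval-th of totalKeys
    // index entries is kept (ceiling division).
    static long sampledEntries(long totalKeys, int indexInterval) {
        if (indexInterval <= 0) {
            throw new IllegalArgumentException("indexInterval must be positive");
        }
        return (totalKeys + indexInterval - 1) / indexInterval;
    }

    public static void main(String[] args) {
        long keys = 1000000L; // index entries on disk: always one per key
        int[] intervals = {64, 128, 256};
        for (int interval : intervals) {
            // A larger index_interval gives fewer samples, so less memory
            // but a longer scan between samples on lookup.
            System.out.println("index_interval=" + interval
                    + " samples=" + sampledEntries(keys, interval));
        }
    }
}
```

Under this assumption, doubling the setting roughly halves the sample held in memory, which is the "opposite" direction the issue description refers to.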
[jira] [Updated] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-3076:

Attachment: 3076.txt

AssertionError in new GCInspector log

Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.8.5
Attachments: 3076.txt

Small regression from CASSANDRA-2868
[jira] [Updated] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-3076:

Fix Version/s: (was: 0.8.5)
               0.7.9

AssertionError in new GCInspector log

Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.7.9
Attachments: 3076.txt

Small regression from CASSANDRA-2868
[jira] [Commented] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091042#comment-13091042 ]

Jonathan Ellis commented on CASSANDRA-3076:

The assert is basically saying that if total GC time has increased, the count should have increased as well. If that's valid, then the "if (previousTotal.equals(total)) continue" check should handle this. If it's not, we should probably remove the assert entirely.

AssertionError in new GCInspector log

Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.7.9
Attachments: 3076.txt

Small regression from CASSANDRA-2868
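The check discussed in this comment can be sketched in isolation. The following is a simplified illustration, not the actual GCInspector code: the class GcDeltaTracker and its method names are hypothetical, and only the standard java.lang.management API is used. It skips a collector whose total GC time has not changed and, unlike the failing assert, makes no assumption that the collection count grew whenever the time did.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-collector delta tracking without the count assertion.
class GcDeltaTracker {
    private final Map<String, Long> gcTimes = new HashMap<String, Long>();
    private final Map<String, Long> gcCounts = new HashMap<String, Long>();

    // Records the latest totals for one collector. Returns false when the total
    // GC time is unchanged (nothing to log), true otherwise.
    boolean update(String name, long total, long count) {
        Long previousTotal = gcTimes.get(name);
        if (previousTotal == null) {
            previousTotal = 0L;
        }
        if (previousTotal.equals(total)) {
            return false; // same role as the "continue" in the comment above
        }
        gcTimes.put(name, total);
        // No "assert count > previousCount" here: the time may grow while the
        // reported count stays the same.
        gcCounts.put(name, count);
        return true;
    }

    // Polls every collector the JVM exposes.
    void poll() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            update(gc.getName(), gc.getCollectionTime(), gc.getCollectionCount());
        }
    }

    public static void main(String[] args) {
        GcDeltaTracker tracker = new GcDeltaTracker();
        tracker.poll();
        System.out.println("collectors with recorded GC time: " + tracker.gcTimes.keySet());
    }
}
```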
[jira] [Commented] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091049#comment-13091049 ]

T Jake Luciani commented on CASSANDRA-3076:

Right, I think it's likely an OS X Lion thing. Removing the assert works for me.

AssertionError in new GCInspector log

Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.7.9
Attachments: 3076.txt

Small regression from CASSANDRA-2868
svn commit: r1161607 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/service/GCInspector.java
Author: jbellis
Date: Thu Aug 25 15:36:57 2011
New Revision: 1161607

URL: http://svn.apache.org/viewvc?rev=1161607&view=rev
Log:
r/m failing assert to match 1161167 in 0.8

Modified:
    cassandra/branches/cassandra-0.7/CHANGES.txt
    cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1161607&r1=1161606&r2=1161607&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Thu Aug 25 15:36:57 2011
@@ -9,7 +9,7 @@
    CompactionManager.estimatedCompactions (CASSANDRA-2708)
  * remove gossip state when a new IP takes over a token (CASSANDRA-3071)
  * work around native memory leak in com.sun.management.GarbageCollectorMXBean
-   (CASSANDRA-2868)
+   (CASSANDRA-2868, 3076)
 
 
 0.7.8

Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java?rev=1161607&r1=1161606&r2=1161607&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java Thu Aug 25 15:36:57 2011
@@ -106,7 +106,6 @@ public class GCInspector
             if (previousCount == null)
                 previousCount = 0L;
             gccounts.put(gc.getName(), count);
-            assert count > previousCount;
 
             MemoryUsage mu = membean.getHeapMemoryUsage();
             long memoryUsed = mu.getUsed();
svn commit: r1161608 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/
Author: jbellis
Date: Thu Aug 25 15:37:43 2011
New Revision: 1161608

URL: http://svn.apache.org/viewvc?rev=1161608&view=rev
Log:
merge from 0.7

Modified:
    cassandra/branches/cassandra-0.8/ (props changed)
    cassandra/branches/cassandra-0.8/contrib/ (props changed)
    cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed)
    cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed)
    cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed)
    cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed)
    cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed)

Propchange: cassandra/branches/cassandra-0.8/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Aug 25 15:37:43 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7:1026516-1160444,1160825
+/cassandra/branches/cassandra-0.7:1026516-1160444,1160825,1161607
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
 /cassandra/branches/cassandra-0.8:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0:1125021-1130369

Propchange: cassandra/branches/cassandra-0.8/contrib/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Aug 25 15:37:43 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
-/cassandra/branches/cassandra-0.7/contrib:1026516-1160444,1160825
+/cassandra/branches/cassandra-0.7/contrib:1026516-1160444,1160825,1161607
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
 /cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369

Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Aug 25 15:37:43 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1160444,1160825
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1160444,1160825,1161607
 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
 /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369

Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Aug 25 15:37:43 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1160444,1160825
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1160444,1160825,1161607
 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654
 /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1125021-1130369

Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Aug 25 15:37:43 2011
@@ -1,5 +1,5 @@
 /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:1026516-1160444,1160825
[jira] [Resolved] (CASSANDRA-3076) AssertionError in new GCInspector log
[ https://issues.apache.org/jira/browse/CASSANDRA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3076.
---
Resolution: Fixed
Fix Version/s: 0.8.5
Reviewer: jbellis

ok, done in 0.7. (Brandon already did that in 0.8 in r1161167.)

AssertionError in new GCInspector log
-
Key: CASSANDRA-3076
URL: https://issues.apache.org/jira/browse/CASSANDRA-3076
Project: Cassandra
Issue Type: Bug
Environment: Lion OSX
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
Fix For: 0.7.9, 0.8.5
Attachments: 3076.txt

Small regression from CASSANDRA-2868
[jira] [Commented] (CASSANDRA-2380) Cassandra requires hostname is resolvable even when specifying IP's for listen and rpc addresses
[ https://issues.apache.org/jira/browse/CASSANDRA-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091123#comment-13091123 ]

Jonathan Ellis commented on CASSANDRA-2380:
---
"Exception thrown by the agent" usually refers to the RMI agent used by JMX. I'd try uncommenting and editing this line in cassandra-env.sh:
{noformat}
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
{noformat}

Cassandra requires hostname is resolvable even when specifying IP's for listen and rpc addresses
Key: CASSANDRA-2380
URL: https://issues.apache.org/jira/browse/CASSANDRA-2380
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.4
Environment: open jdk 1.6.0_20 64-Bit
Reporter: Eric Tamme
Priority: Trivial

A strange looking error is printed out, with no stack trace and no other log, when hostname is not resolvable regardless of whether or not the hostname is being used to specify a listen or rpc address. I am specifically using IPv6 addresses but I have tested it with IPv4 and gotten the same result.

Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException

I have spent several hours trying to track down what is happening and have been unable to determine if this is down in the java getByName -> getAllByName -> getAllByName0 set of methods that is happening when

    listenAddress = InetAddress.getByName(conf.listen_address);

is called from DatabaseDescriptor.java

I am not able to replicate the error in a stand alone java program (see below) so I am not sure what cassandra is doing to force name resolution. Perhaps the issue is not in DatabaseDescriptor, but somewhere else? I get no log output, and no stack trace when this happens, only the single line error.

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    class Test
    {
        public static void main(String args[])
        {
            try
            {
                // resolve an arbitrary name, as DatabaseDescriptor does for listen_address
                InetAddress listenAddress = InetAddress.getByName("foo");
                System.out.println(listenAddress);
            }
            catch (UnknownHostException e)
            {
                System.out.println("Unable to parse address");
            }
        }
    }

People have just said "oh go put a line in your hosts file" and while that does work, it is not right. If I am not using my hostname for any reason cassandra should not have to resolve it, and carrying around that application specific stuff in your hosts file is not correct.

Regardless of whether this bug gets fixed, I want to better understand what the heck is going on that makes cassandra crash and print out that exception.
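One possible reason the stand-alone program does not reproduce the error: it resolves the name "foo", while the message "Local host name unknown" points at resolution of the machine's own hostname. The Javadoc for javax.management.remote.JMXServiceURL says a missing host defaults to InetAddress.getLocalHost().getHostName(); whether that is the exact path taken at Cassandra startup is an assumption, not something confirmed in this thread. A minimal sketch that checks the local hostname instead (the class and method names are made up for illustration):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostCheck {
    /** Returns a one-line report on whether this machine's own hostname resolves. */
    static String describeLocalHost() {
        try {
            // This is the lookup that fails when the hostname has no hosts-file or DNS entry.
            InetAddress local = InetAddress.getLocalHost();
            return "resolved: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            return "unresolvable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If this reports "unresolvable" on the affected machine, that would be consistent with the suggestion to set java.rmi.server.hostname in cassandra-env.sh.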
[jira] [Resolved] (CASSANDRA-2380) Cassandra requires hostname is resolvable even when specifying IP's for listen and rpc addresses
[ https://issues.apache.org/jira/browse/CASSANDRA-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2380.
---
Resolution: Cannot Reproduce

Cassandra requires hostname is resolvable even when specifying IP's for listen and rpc addresses
Key: CASSANDRA-2380
URL: https://issues.apache.org/jira/browse/CASSANDRA-2380
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.4
Environment: open jdk 1.6.0_20 64-Bit
Reporter: Eric Tamme
Priority: Trivial

A strange looking error is printed out, with no stack trace and no other log, when hostname is not resolvable regardless of whether or not the hostname is being used to specify a listen or rpc address. I am specifically using IPv6 addresses but I have tested it with IPv4 and gotten the same result.

Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException

I have spent several hours trying to track down what is happening and have been unable to determine if this is down in the java getByName -> getAllByName -> getAllByName0 set of methods that is happening when

    listenAddress = InetAddress.getByName(conf.listen_address);

is called from DatabaseDescriptor.java

I am not able to replicate the error in a stand alone java program (see below) so I am not sure what cassandra is doing to force name resolution. Perhaps the issue is not in DatabaseDescriptor, but somewhere else? I get no log output, and no stack trace when this happens, only the single line error.

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    class Test
    {
        public static void main(String args[])
        {
            try
            {
                // resolve an arbitrary name, as DatabaseDescriptor does for listen_address
                InetAddress listenAddress = InetAddress.getByName("foo");
                System.out.println(listenAddress);
            }
            catch (UnknownHostException e)
            {
                System.out.println("Unable to parse address");
            }
        }
    }

People have just said "oh go put a line in your hosts file" and while that does work, it is not right. If I am not using my hostname for any reason cassandra should not have to resolve it, and carrying around that application specific stuff in your hosts file is not correct.

Regardless of whether this bug gets fixed, I want to better understand what the heck is going on that makes cassandra crash and print out that exception.
[jira] [Updated] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Coverston updated CASSANDRA-1608:
--
Attachment: 1608-v5.txt

Fixed up the patch according to the comments given. Took a stab at culling some of the SSTables from the locking mechanism.

Redesigned Compaction
-
Key: CASSANDRA-1608
URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Chris Goffinet
Assignee: Benjamin Coverston
Attachments: 1608-22082011.txt, 1608-v2.txt, 1608-v4.txt, 1608-v5.txt

After seeing the I/O issues in CASSANDRA-1470, I've been doing some more thinking on this subject that I wanted to lay out.

I propose we redo the concept of how compaction works in Cassandra. At the moment, compaction is kicked off based on a write access pattern, not a read access pattern. In most cases, you want the opposite. You want to be able to track how well each SSTable is performing in the system. If we were to keep in-memory statistics for each SSTable and prioritize them by how often they are accessed and by their bloom filter hit/miss ratios, we could intelligently group the SSTables that are being read most often and schedule them for compaction. We could also schedule lower priority maintenance on SSTables that are not often accessed.

I also propose we limit the size of each SSTable to a fixed size, which gives us the ability to better utilize our bloom filters in a predictable manner. At the moment, after a certain size, the bloom filters become less reliable. This would also allow us to group the most accessed data. Currently the size of an SSTable can grow to a point where large portions of the data might not actually be accessed as often.
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091196#comment-13091196 ]

Yang Yang commented on CASSANDRA-2252:
--
SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }

could you please explain why we need to add them to filledRegions? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go and free memory? then we should not tie this region in memory through the references starting from filledRegions, no?

just to confirm my thoughts, I looked at the HBase implementation:

./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java

    /**
     * Try to retire the current chunk if it is still
     * <code>c</code>. Postcondition is that curChunk.get()
     * != c
     */
    private void tryRetireChunk(Chunk c) {
        @SuppressWarnings("unused")
        boolean weRetiredIt = curChunk.compareAndSet(c, null);
        // If the CAS succeeds, that means that we won the race
        // to retire the chunk. We could use this opportunity to
        // update metrics on external fragmentation.
        //
        // If the CAS fails, that means that someone else already
        // retired the chunk for us.
    }

it does not tie it to a region list.

the current result of tying regions together through the filledRegions is that all regions (even the dead ones) still occupy memory.
---
well if the purpose is to count the size() held in allocator, should we use weak references?

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
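To make the question about filledRegions concrete, here is a minimal sketch of a region allocator built around the tryRetireRegion snippet quoted in the comment. Only currentRegion, filledRegions and tryRetireRegion come from that snippet; the Region internals, the tiny region size and the class name are assumptions for illustration and are not taken from the attached patches.

```java
import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class RegionAllocatorSketch {
    // Tiny on purpose so the example is easy to follow; the thread discusses 2MB regions.
    static final int REGION_SIZE = 1024;

    static final class Region {
        final byte[] data = new byte[REGION_SIZE];
        final AtomicInteger nextFree = new AtomicInteger(0);

        /** Bump-pointer allocation; returns null when the region cannot fit the request. */
        ByteBuffer allocate(int size) {
            while (true) {
                int old = nextFree.get();
                if (old + size > data.length) return null;
                if (nextFree.compareAndSet(old, old + size)) {
                    // The returned buffer is a view over this region's array.
                    return ByteBuffer.wrap(data, old, size).slice();
                }
            }
        }
    }

    final AtomicReference<Region> currentRegion = new AtomicReference<>();
    // Strong references: every retired region stays reachable through this collection.
    final Collection<Region> filledRegions = new ConcurrentLinkedQueue<>();

    ByteBuffer allocate(int size) {
        if (size > REGION_SIZE) return ByteBuffer.allocate(size); // oversized: bypass regions
        while (true) {
            Region region = getRegion();
            ByteBuffer buffer = region.allocate(size);
            if (buffer != null) return buffer;
            tryRetireRegion(region); // full: retire it and retry with a fresh region
        }
    }

    private Region getRegion() {
        while (true) {
            Region region = currentRegion.get();
            if (region != null) return region;
            region = new Region();
            if (currentRegion.compareAndSet(null, region)) return region;
            // Lost the race: another thread installed a region, loop and use that one.
        }
    }

    private void tryRetireRegion(Region region) {
        if (currentRegion.compareAndSet(region, null)) {
            filledRegions.add(region);
        }
    }

    public static void main(String[] args) {
        RegionAllocatorSketch allocator = new RegionAllocatorSketch();
        for (int i = 0; i < 3; i++) allocator.allocate(400);
        System.out.println(allocator.filledRegions.size()); // prints 1
    }
}
```

In this sketch the third 400-byte request does not fit in the first 1024-byte region, so that region is retired into filledRegions and stays strongly reachable from the allocator even if all buffers carved from it are dropped, which is the behaviour the comment is asking about.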
[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091196#comment-13091196 ]

Yang Yang edited comment on CASSANDRA-2252 at 8/25/11 6:26 PM:
---
SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }

could you please explain why we need to add them to filledRegions? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go and free memory? then we should not tie this region in memory through the references starting from filledRegions, no?

just to confirm my thoughts, I looked at the HBase implementation:

./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java

    /**
     * Try to retire the current chunk if it is still
     * <code>c</code>. Postcondition is that curChunk.get()
     * != c
     */
    private void tryRetireChunk(Chunk c) {
        @SuppressWarnings("unused")
        boolean weRetiredIt = curChunk.compareAndSet(c, null);
        // If the CAS succeeds, that means that we won the race
        // to retire the chunk. We could use this opportunity to
        // update metrics on external fragmentation.
        //
        // If the CAS fails, that means that someone else already
        // retired the chunk for us.
    }

it does not tie it to a region list.

the current result of tying regions together through the filledRegions is that all regions (even the dead ones) always occupy memory.
---
well if the purpose is to count the size() held in allocator, should we use weak references?

was (Author: yangyangyyy):
SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }

could you please explain why we need to add them to filledRegions? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go and free memory? then we should not tie this region in memory through the references starting from filledRegions, no?

just to confirm my thoughts, I looked at the HBase implementation:

./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java

    /**
     * Try to retire the current chunk if it is still
     * <code>c</code>. Postcondition is that curChunk.get()
     * != c
     */
    private void tryRetireChunk(Chunk c) {
        @SuppressWarnings("unused")
        boolean weRetiredIt = curChunk.compareAndSet(c, null);
        // If the CAS succeeds, that means that we won the race
        // to retire the chunk. We could use this opportunity to
        // update metrics on external fragmentation.
        //
        // If the CAS fails, that means that someone else already
        // retired the chunk for us.
    }

it does not tie it to a region list.

the current result of tying regions together through the filledRegions is that all regions (even the dead ones) still occupy memory.
---
well if the purpose is to count the size() held in allocator, should we use weak references?

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091196#comment-13091196 ]

Yang Yang edited comment on CASSANDRA-2252 at 8/25/11 6:32 PM:
---
SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }

could you please explain why we need to add them to filledRegions? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go and free memory? then we should not tie this region in memory through the references starting from filledRegions, no?

just to confirm my thoughts, I looked at the HBase implementation:

./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java

    /**
     * Try to retire the current chunk if it is still
     * <code>c</code>. Postcondition is that curChunk.get()
     * != c
     */
    private void tryRetireChunk(Chunk c) {
        @SuppressWarnings("unused")
        boolean weRetiredIt = curChunk.compareAndSet(c, null);
        // If the CAS succeeds, that means that we won the race
        // to retire the chunk. We could use this opportunity to
        // update metrics on external fragmentation.
        //
        // If the CAS fails, that means that someone else already
        // retired the chunk for us.
    }

it does not tie it to a region list.

the current result of tying regions together through the filledRegions is that all regions (even the dead ones) always occupy memory.
---
well if the purpose is to count the size() held in allocator, should we just keep an int var of total size, or use weak references?

was (Author: yangyangyyy):
SlabAllocator:

    private void tryRetireRegion(Region region)
    {
        if (currentRegion.compareAndSet(region, null))
        {
            filledRegions.add(region);
        }
    }

could you please explain why we need to add them to filledRegions? when all the buffers that share the same region die/become unreachable, shouldn't we just let the region go and free memory? then we should not tie this region in memory through the references starting from filledRegions, no?

just to confirm my thoughts, I looked at the HBase implementation:

./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java

    /**
     * Try to retire the current chunk if it is still
     * <code>c</code>. Postcondition is that curChunk.get()
     * != c
     */
    private void tryRetireChunk(Chunk c) {
        @SuppressWarnings("unused")
        boolean weRetiredIt = curChunk.compareAndSet(c, null);
        // If the CAS succeeds, that means that we won the race
        // to retire the chunk. We could use this opportunity to
        // update metrics on external fragmentation.
        //
        // If the CAS fails, that means that someone else already
        // retired the chunk for us.
    }

it does not tie it to a region list.

the current result of tying regions together through the filledRegions is that all regions (even the dead ones) always occupy memory.
---
well if the purpose is to count the size() held in allocator, should we use weak references?

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091202#comment-13091202 ]

Jonathan Ellis commented on CASSANDRA-2252:
---
the purpose is we need to keep them alive until flush. so weak references would not work.

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091209#comment-13091209 ]

Yang Yang commented on CASSANDRA-2252:
--
Thanks Jonathan, but why do we need them alive?

for example I create a 2MB region, which is carved out into 100 ByteBuffers. each of these ByteBuffers would point to the data of the Region, so as long as one of them is live, the bytes pointed to by Region.data are still in heap; and if these 100 ByteBuffers all die, isn't it our goal to free the 2MB region, since no one is using it?

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
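The claim that each carved ByteBuffer "points to the data of the Region" can be checked with plain java.nio. The sketch below is illustrative only: the carve helper and the class name are hypothetical, and a 64-byte array stands in for the 2MB region from the comment.

```java
import java.nio.ByteBuffer;

public class SharedBackingArrayDemo {
    /** Carves a view of length bytes starting at offset out of a shared region array. */
    static ByteBuffer carve(byte[] regionData, int offset, int length) {
        return ByteBuffer.wrap(regionData, offset, length).slice();
    }

    public static void main(String[] args) {
        byte[] regionData = new byte[64]; // stand-in for Region.data
        ByteBuffer first = carve(regionData, 0, 16);
        ByteBuffer second = carve(regionData, 16, 16);

        // Both views reference the very same array object, so the whole array
        // stays on the heap for as long as either view is reachable.
        System.out.println(first.array() == regionData);  // prints true
        System.out.println(second.array() == regionData); // prints true
        System.out.println(second.arrayOffset());         // prints 16

        // A write through the array is visible through the view.
        regionData[16] = 42;
        System.out.println(second.get(0));                // prints 42
    }
}
```

So the buffers alone are enough to keep a region's bytes reachable; an extra strong reference from a list such as filledRegions keeps them reachable even after every buffer is gone, which is the trade-off being debated here.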
[jira] [Updated] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-3074:
-
Attachment: 3074-cassandra-0.8.patch

poor choice of words on my part. new version attached.

comments and documentation for index_interval are misleading
Key: CASSANDRA-3074
URL: https://issues.apache.org/jira/browse/CASSANDRA-3074
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Matthew F. Dennis
Assignee: Matthew F. Dennis
Priority: Minor
Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state the larger the *sampling* the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies the larger the *setting* is the larger the index is, while in fact it's the opposite of that.
[jira] [Updated] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-3074:
-
Attachment: (was: 3074-cassandra-0.8.patch)

comments and documentation for index_interval are misleading
Key: CASSANDRA-3074
URL: https://issues.apache.org/jira/browse/CASSANDRA-3074
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Matthew F. Dennis
Assignee: Matthew F. Dennis
Priority: Minor
Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state the larger the *sampling* the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies the larger the *setting* is the larger the index is, while in fact it's the opposite of that.
[jira] [Created] (CASSANDRA-3077) Support TTL option to be set for column family
Support TTL option to be set for column family
--
Key: CASSANDRA-3077
URL: https://issues.apache.org/jira/browse/CASSANDRA-3077
Project: Cassandra
Issue Type: Wish
Components: Core
Affects Versions: 0.8.4
Reporter: Aleksey Vorona
Priority: Minor

Use case: I want one of my CFs not to store any data older than two months. It is a notifications CF which is of no interest to the user. Currently I am setting TTL with each insert in the CF, but since it is a constant it makes sense to me to have it configured in the CF definition to apply automatically to all rows in the CF.
[jira] [Updated] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-3074:
-
Attachment: (was: 3074-cassandra-0.8.patch)

comments and documentation for index_interval are misleading
Key: CASSANDRA-3074
URL: https://issues.apache.org/jira/browse/CASSANDRA-3074
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Matthew F. Dennis
Assignee: Matthew F. Dennis
Priority: Minor
Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state the larger the *sampling* the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies the larger the *setting* is the larger the index is, while in fact it's the opposite of that.
[jira] [Updated] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-3074:
-
Attachment: 3074-cassandra-0.8.patch

comments and documentation for index_interval are misleading
Key: CASSANDRA-3074
URL: https://issues.apache.org/jira/browse/CASSANDRA-3074
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Matthew F. Dennis
Assignee: Matthew F. Dennis
Priority: Minor
Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state the larger the *sampling* the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies the larger the *setting* is the larger the index is, while in fact it's the opposite of that.
[jira] [Commented] (CASSANDRA-3077) Support TTL option to be set for column family
[ https://issues.apache.org/jira/browse/CASSANDRA-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091211#comment-13091211 ]

Brandon Williams commented on CASSANDRA-3077:
-
I could see this as a default ttl option that is used when one is not specified, sort of like a default validator.

Support TTL option to be set for column family
--
Key: CASSANDRA-3077
URL: https://issues.apache.org/jira/browse/CASSANDRA-3077
Project: Cassandra
Issue Type: Wish
Components: Core
Affects Versions: 0.8.4
Reporter: Aleksey Vorona
Priority: Minor

Use case: I want one of my CFs not to store any data older than two months. It is a notifications CF which is of no interest to the user. Currently I am setting TTL with each insert in the CF, but since it is a constant it makes sense to me to have it configured in the CF definition to apply automatically to all rows in the CF.
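The "default ttl that is used when one is not specified" idea amounts to a simple fallback rule. The sketch below only illustrates that rule; the class, method and parameter names are hypothetical, and nothing in this thread says such an option exists in 0.8.4 or how it would be named.

```java
public class DefaultTtlSketch {
    /** Two months, approximated here as 60 days, expressed in seconds. */
    static final int TWO_MONTHS_SECONDS = 60 * 24 * 60 * 60;

    /**
     * Picks the TTL for a write: the per-insert TTL when the client supplied one,
     * otherwise the column family's configured default.
     */
    static int effectiveTtl(Integer requestTtl, int cfDefaultTtl) {
        return requestTtl != null ? requestTtl : cfDefaultTtl;
    }

    public static void main(String[] args) {
        // No TTL on the insert: the CF-level default applies.
        System.out.println(effectiveTtl(null, TWO_MONTHS_SECONDS)); // prints 5184000
        // Explicit TTL on the insert: it overrides the default.
        System.out.println(effectiveTtl(3600, TWO_MONTHS_SECONDS)); // prints 3600
    }
}
```

With a rule like this the notifications use case would no longer need a TTL on every insert, while individual writes could still override the default.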
svn commit: r1161701 - /cassandra/branches/cassandra-0.8/conf/cassandra.yaml
Author: jbellis
Date: Thu Aug 25 19:05:49 2011
New Revision: 1161701

URL: http://svn.apache.org/viewvc?rev=1161701&view=rev
Log:
clarify index_interval explanation
patch by mdennis for CASSANDRA-3074

Modified:
    cassandra/branches/cassandra-0.8/conf/cassandra.yaml

Modified: cassandra/branches/cassandra-0.8/conf/cassandra.yaml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/conf/cassandra.yaml?rev=1161701&r1=1161700&r2=1161701&view=diff
==
--- cassandra/branches/cassandra-0.8/conf/cassandra.yaml (original)
+++ cassandra/branches/cassandra-0.8/conf/cassandra.yaml Thu Aug 25 19:05:49 2011
@@ -380,9 +380,16 @@ request_scheduler: org.apache.cassandra.
 # the request scheduling. Currently the only valid option is keyspace.
 # request_scheduler_id: keyspace
 
-# The Index Interval determines how large the sampling of row keys
-# is for a given SSTable. The larger the sampling, the more effective
-# the index is at the cost of space.
+# index_interval controls the sampling of entries from the primrary
+# row index in terms of space versus time. The larger the interval,
+# the smaller and less effective the sampling will be. In technicial
+# terms, the interval coresponds to the number of index entries that
+# are skipped between taking each sample. All the sampled entries
+# must fit in memory. Generally, a value between 128 and 512 here
+# coupled with a large key cache size on CFs results in the best trade
+# offs. This value is not often changed, however if you have many
+# very small rows (many to an OS page), then increasing this will
+# often lower memory usage without a impact on performance.
 index_interval: 128
 
 # Enable or disable inter-node encryption
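The relationship described in the new comment (a larger interval gives a smaller in-memory sample) can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and whether Cassandra samples the first entry or handles the boundary exactly this way is an assumption rather than something taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexSamplingSketch {
    /** Keeps one entry out of every indexInterval entries of the full index. */
    static <T> List<T> sample(List<T> indexEntries, int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        List<T> samples = new ArrayList<>();
        for (int i = 0; i < indexEntries.size(); i += indexInterval) {
            samples.add(indexEntries.get(i));
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < 1000; i++) index.add(i);

        // Larger interval -> fewer samples held in memory, but more index
        // entries to scan between two neighbouring samples on a lookup.
        System.out.println(sample(index, 128).size()); // prints 8
        System.out.println(sample(index, 512).size()); // prints 2
    }
}
```

For 1000 index entries, raising the setting from 128 to 512 shrinks the sample from 8 entries to 2, which is the inverse relationship that the original wording obscured.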
[jira] [Updated] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3074:
--
Affects Version/s: (was: 0.8.4)

comments and documentation for index_interval are misleading
Key: CASSANDRA-3074
URL: https://issues.apache.org/jira/browse/CASSANDRA-3074
Project: Cassandra
Issue Type: Bug
Reporter: Matthew F. Dennis
Assignee: Matthew F. Dennis
Priority: Minor
Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state the larger the *sampling* the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies the larger the *setting* is the larger the index is, while in fact it's the opposite of that.
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091225#comment-13091225 ]

Yang Yang commented on CASSANDRA-2252:
--
my line of thought comes from update-heavy workloads (counters etc). conceivably (putting the row key issue aside for a while) one region would contain bytebuffer values of similar age. as more updates come in, all the columns in older regions are likely to have died out, thus allowing us to free the entire region before flushing happens.

coming back to the row key issue: in the original slab allocator paper (Jeff Bonwick), a slab contains strictly the same objects, which implies that they die at roughly the same time. if they don't, then yes, in our case, slab has the disadvantage that an entire slab (2MB worth of mem) is held simply because a row key in it is not dead yet.

so to overcome this disadvantage, we probably need to further distinguish between object types to be allocated in the slab: this JIRA (same as HBase code) distinguishes between the allocations of different memtables; to work better with update-heavy traffic, we also need to distinguish between row keys and column values (they have different lifetimes)

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091225#comment-13091225 ]

Yang Yang edited comment on CASSANDRA-2252 at 8/25/11 7:09 PM:
---
my line of thought comes from update-heavy workloads (counters etc). conceivably (putting the row key issue aside for a while) one region would contain bytebuffer values of similar age. as more updates come in, all the columns in older regions are likely to have died out, thus allowing us to free the entire region before flushing happens.

coming back to the row key issue: in the original slab allocator paper (Jeff Bonwick), a slab contains strictly the same objects, which implies that they die at roughly the same time. if they don't, then yes, in our case, slab has the disadvantage that an entire slab (2MB worth of mem) is held simply because a row key in it is not dead yet.

so to overcome this disadvantage, we probably need to further distinguish between object types to be allocated in the slab: this JIRA (same as HBase code) distinguishes between the allocations of different memtables; to work better with update-heavy traffic, we also need to *distinguish between row keys and column values (they have different lifetimes)*

was (Author: yangyangyyy):
my line of thought comes from update-heavy workloads (counters etc). conceivably (putting the row key issue aside for a while) one region would contain bytebuffer values of similar age. as more updates come in, all the columns in older regions are likely to have died out, thus allowing us to free the entire region before flushing happens.

coming back to the row key issue: in the original slab allocator paper (Jeff Bonwick), a slab contains strictly the same objects, which implies that they die at roughly the same time. if they don't, then yes, in our case, slab has the disadvantage that an entire slab (2MB worth of mem) is held simply because a row key in it is not dead yet.

so to overcome this disadvantage, we probably need to further distinguish between object types to be allocated in the slab: this JIRA (same as HBase code) distinguishes between the allocations of different memtables; to work better with update-heavy traffic, we also need to distinguish between row keys and column values (they have different lifetimes)

arena allocation for memtables
--
Key: CASSANDRA-2252
URL: https://issues.apache.org/jira/browse/CASSANDRA-2252
Project: Cassandra
Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 1.0
Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091229#comment-13091229 ]

Yang Yang commented on CASSANDRA-2252:
--
actually even if we don't do the further optimization suggested in the last comment (separate row keys and column values into different slab allocators), it would still very likely work better and kill off some dead regions.

let's say a row/column is continually updated 1000 times, and 100 column values fit into 2MB. then to do these 1000 updates we allocate 10 regions. only the first region would contain the row key, and finally all 8 regions in the middle would die; the first one remains due to the row key, and the last remains due to the latest (live) column value.

arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz

The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455.
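The arithmetic in the comment above (1000 updates, 100 values per 2MB region) can be checked with a toy model. This is only a sketch under stated assumptions: `RegionLifetimeSketch`, `Region` and `simulate` are hypothetical names, the row key is modelled as a flag on the first region rather than as bytes, and none of this is the allocator from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the update-heavy scenario; hypothetical, not Cassandra's allocator. */
final class RegionLifetimeSketch {

    /** A fixed-capacity region that tracks how much of its content is still live. */
    static final class Region {
        int allocated;        // values ever placed in this region
        int liveValues;       // values not yet overwritten by a newer update
        boolean holdsRowKey;  // assumption: the row key lives in the first region only

        boolean isDead() {
            return liveValues == 0 && !holdsRowKey;
        }
    }

    /**
     * Overwrites one column `updates` times, packing `valuesPerRegion` values per region.
     * Returns {regions allocated, regions that hold nothing live at the end}.
     */
    static int[] simulate(int updates, int valuesPerRegion) {
        if (updates < 1 || valuesPerRegion < 1) {
            throw new IllegalArgumentException("updates and valuesPerRegion must be positive");
        }
        List<Region> regions = new ArrayList<>();
        Region current = null;
        Region holderOfLiveValue = null;
        for (int i = 0; i < updates; i++) {
            if (current == null || current.allocated == valuesPerRegion) {
                current = new Region();
                current.holdsRowKey = regions.isEmpty();
                regions.add(current);
            }
            current.allocated++;
            current.liveValues++;
            // The new value supersedes the previous one, wherever it was stored.
            if (holderOfLiveValue != null) {
                holderOfLiveValue.liveValues--;
            }
            holderOfLiveValue = current;
        }
        int dead = 0;
        for (Region r : regions) {
            if (r.isDead()) {
                dead++;
            }
        }
        return new int[] { regions.size(), dead };
    }

    public static void main(String[] args) {
        int[] result = simulate(1000, 100);
        System.out.println("regions allocated: " + result[0] + ", regions freeable: " + result[1]);
    }
}
```

With the numbers from the comment this reports 10 regions allocated and 8 freeable: the first is pinned by the row key and the last by the live column value.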
svn commit: r1161709 - in /cassandra/trunk: ./ conf/ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/config/ src/java/org/apache/cassandra/db/ src/java/or
Author: jbellis Date: Thu Aug 25 19:28:24 2011 New Revision: 1161709 URL: http://svn.apache.org/viewvc?rev=1161709view=rev Log: merge from 0.8 Modified: cassandra/trunk/ (props changed) cassandra/trunk/CHANGES.txt cassandra/trunk/build.xml cassandra/trunk/conf/cassandra.yaml cassandra/trunk/contrib/ (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/src/java/org/apache/cassandra/config/CFMetaData.java cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysIndex.java cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableReader.java cassandra/trunk/src/java/org/apache/cassandra/service/GCInspector.java cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Thu Aug 25 19:28:24 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291 -/cassandra/branches/cassandra-0.7:1026516-1160444,1160825 +/cassandra/branches/cassandra-0.7:1026516-1160444,1160825,1161607 /cassandra/branches/cassandra-0.7.0:1053690-1055654 -/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1133844,1133846-1133917,1133919-1135156,1135158-1160459,1160827 +/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1161708 /cassandra/branches/cassandra-0.8.0:1125021-1130369 
/cassandra/branches/cassandra-0.8.1:1101014-1125018 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 Modified: cassandra/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1161709r1=1161708r2=1161709view=diff == --- cassandra/trunk/CHANGES.txt (original) +++ cassandra/trunk/CHANGES.txt Thu Aug 25 19:28:24 2011 @@ -73,6 +73,8 @@ CompactionManager.estimatedCompactions (CASSANDRA-2708) * expose rpc timeouts per host in MessagingServiceMBean (CASSANDRA-2941) * avoid including cwd in classpath for deb and rpm packages (CASSANDRA-2881) + * remove gossip state when a new IP takes over a token (CASSANDRA-3071) + * allow sstable2json to work on index sstable files (CASSANDRA-3059) 0.8.4 Modified: cassandra/trunk/build.xml URL: http://svn.apache.org/viewvc/cassandra/trunk/build.xml?rev=1161709r1=1161708r2=1161709view=diff == --- cassandra/trunk/build.xml (original) +++ cassandra/trunk/build.xml Thu Aug 25 19:28:24 2011 @@ -36,7 +36,6 @@ property name=build.src value=${basedir}/src/ property name=build.src.java value=${basedir}/src/java/ property name=build.src.resources value=${basedir}/src/resources/ -property name=build.src.driver value=${basedir}/drivers/java/src / property name=avro.src value=${basedir}/src/avro/ property name=build.src.gen-java value=${basedir}/src/gen-java/ property name=build.lib value=${basedir}/lib/ @@ -46,7 +45,6 @@ property name=build.classes value=${build.dir}/classes/ property name=build.classes.main value=${build.classes}/main / property name=build.classes.thrift value=${build.classes}/thrift / -property name=build.classes.cql value=${build.classes}/cql / property name=javadoc.dir value=${build.dir}/javadoc/ property name=javadoc.jars.dir value=${build.dir}/javadocs/ property name=interface.dir value=${basedir}/interface/ @@ -161,7 +159,6 @@ message=Not a source artifact, stopping here. 
/ mkdir dir=${build.classes.main}/ mkdir dir=${build.classes.thrift}/ -mkdir dir=${build.classes.cql}/ mkdir dir=${test.lib}/ mkdir dir=${test.classes}/ mkdir dir=${build.src.gen-java}/ @@ -396,7 +393,6 @@ url=${svn.entry.url}?pathrev=${svn.entry dependency groupId=log4j artifactId=log4j version=1.2.16 / dependency groupId=org.apache.cassandra artifactId=cassandra-all version=${version} / dependency groupId=org.apache.cassandra artifactId=cassandra-thrift version=${version} / - dependency groupId=org.apache.cassandra artifactId=cassandra-cql version=${version} / /dependencyManagement developer
buildbot failure in ASF Buildbot on cassandra-trunk
The Buildbot has detected a new failure on builder cassandra-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/cassandra-trunk/builds/1553 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: scheduler Build Source Stamp: [branch cassandra/trunk] 1161709 Blamelist: jbellis BUILD FAILED: failed compile sincerely, -The Buildbot
svn commit: r1161712 - /cassandra/trunk/build.xml
Author: jbellis Date: Thu Aug 25 19:33:04 2011 New Revision: 1161712 URL: http://svn.apache.org/viewvc?rev=1161712view=rev Log: fix bad merge Modified: cassandra/trunk/build.xml Modified: cassandra/trunk/build.xml URL: http://svn.apache.org/viewvc/cassandra/trunk/build.xml?rev=1161712r1=1161711r2=1161712view=diff == --- cassandra/trunk/build.xml (original) +++ cassandra/trunk/build.xml Thu Aug 25 19:33:04 2011 @@ -36,6 +36,7 @@ property name=build.src value=${basedir}/src/ property name=build.src.java value=${basedir}/src/java/ property name=build.src.resources value=${basedir}/src/resources/ +property name=build.src.driver value=${basedir}/drivers/java/src / property name=avro.src value=${basedir}/src/avro/ property name=build.src.gen-java value=${basedir}/src/gen-java/ property name=build.lib value=${basedir}/lib/ @@ -45,6 +46,7 @@ property name=build.classes value=${build.dir}/classes/ property name=build.classes.main value=${build.classes}/main / property name=build.classes.thrift value=${build.classes}/thrift / +property name=build.classes.cql value=${build.classes}/cql / property name=javadoc.dir value=${build.dir}/javadoc/ property name=javadoc.jars.dir value=${build.dir}/javadocs/ property name=interface.dir value=${basedir}/interface/ @@ -159,6 +161,7 @@ message=Not a source artifact, stopping here. 
/ mkdir dir=${build.classes.main}/ mkdir dir=${build.classes.thrift}/ +mkdir dir=${build.classes.cql}/ mkdir dir=${test.lib}/ mkdir dir=${test.classes}/ mkdir dir=${build.src.gen-java}/ @@ -393,6 +396,7 @@ url=${svn.entry.url}?pathrev=${svn.entry dependency groupId=log4j artifactId=log4j version=1.2.16 / dependency groupId=org.apache.cassandra artifactId=cassandra-all version=${version} / dependency groupId=org.apache.cassandra artifactId=cassandra-thrift version=${version} / + dependency groupId=org.apache.cassandra artifactId=cassandra-cql version=${version} / /dependencyManagement developer id=alakshman name=Avinash Lakshman/ developer id=antelder name=Anthony Elder/ @@ -499,6 +503,22 @@ url=${svn.entry.url}?pathrev=${svn.entry dependency groupId=org.slf4j artifactId=slf4j-api/ dependency groupId=org.apache.thrift artifactId=libthrift/ /artifact:pom + artifact:pom id=cql-pom +artifactId=cassandra-cql +url=http://cassandra.apache.org; +name=Apache Cassandra +parent groupId=org.apache.cassandra +artifactId=cassandra-parent +version=${version}/ +scm connection=${scm.connection} developerConnection=${scm.developerConnection} url=${scm.url}/ +dependency groupId=com.google.guava artifactId=guava/ +dependency groupId=org.slf4j artifactId=slf4j-api/ +dependency groupId=org.apache.thrift artifactId=libthrift/ +dependency groupId=org.apache.cassandra artifactId=cassandra-thrift/ +dependency groupId=org.apache.cassandra artifactId=cassandra-all/ +!-- because cassandra-all uses log4j, and we need cassandra-all, consumers must use log4j, so force log4j version of slf4j -- +dependency groupId=org.slf4j artifactId=slf4j-log4j12 scope=runtime/ + /artifact:pom artifact:pom id=dist-pom artifactId=apache-cassandra @@ -668,6 +688,11 @@ url=${svn.entry.url}?pathrev=${svn.entry src path=${build.src.gen-java}/ classpath refid=cassandra.classpath/ /javac +javac debug=true debuglevel=${debuglevel} + destdir=${build.classes.cql} includeantruntime=false +src path=${build.src.driver} / 
+classpath refid=cassandra.classpath/ +/javac copy todir=${build.classes.main} fileset dir=${build.src.resources} / /copy @@ -725,6 +750,20 @@ url=${svn.entry.url}?pathrev=${svn.entry !-- /section -- /manifest /jar + + !-- CQL driver Jar -- + artifact:writepom pomRefId=cql-pom + file=${build.dir}/${ant.project.name}-cql-${cql.driver.version}.pom/ + jar jarfile=${build.dir}/${ant.project.name}-cql-${cql.driver.version}.jar + basedir=${build.classes.cql} +manifest + attribute name=Implementation-Title value=Cassandra/ + attribute name=Implementation-Version value=${version}/ + attribute name=Implementation-Vendor value=Apache/ + attribute name=Class-Path + value=${ant.project.name}-thrift-${version}.jar / +/manifest + /jar /target !-- @@ -750,11 +789,23 @@ url=${svn.entry.url}?pathrev=${svn.entry fileset dir=${build.src.gen-java} defaultexcludes=yes
[Cassandra Wiki] Update of ConfigurationNotes by PavelYaskevich
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ConfigurationNotes page has been changed by PavelYaskevich: http://wiki.apache.org/cassandra/ConfigurationNotes?action=diffrev1=9rev2=10 Per-node options are loaded from yaml and held in !DatabaseDescriptor. - Per-KS, per-CF, and per-Column options are loaded from the !MigrationsTable at startup and are encapsulated with KSMetaData, CFMetaData, and !ColumnDefinition objects, which are held by !DatabaseDescriptor as well as !Tables and !ColumnFamilyStores respectively. When a migration arrives, it writes to the !MigrationsTable, then propogates the changes out to the KS/CFMD objects in the system. + Per-KS, per-CF, and per-Column options are loaded from the !MigrationsTable at startup and are encapsulated with !KSMetaData, !CFMetaData, and !ColumnDefinition objects, which are held by !Schema and !Table. When a migration arrives, it writes to the !MigrationsTable, then propogates the changes out to the KS/CFMD objects in the system. Configuration can be changed at runtime without a restart (excluding the ones that change on-disk format (which cannot be changed without clearing the cluster) and ones that change routing). For per-node options, poke !StorageService via JMX (which in turn pokes !DatabaseDescriptor). For per-KS options, poke the appropriate !Table. For per-CF and per-Column options, poke the appropriate !ColumnFamilyStore. These ephemeral changes are stronger than migrations (they stay set regardless of new config coming in), but do not persist between reboots. 
@@ -22, +22 @@ * define T getFoo() {return foo;} since all optional params are private - * update deflate() and inflate() to handle the new option -!CfDef and !CfDef- + * update to{Avro/Thrift}() and from{Avro/Thrift}() to handle the new option -!CfDef and !CfDef- * update equals(), hashcode(), and tostring() to build with the new prop * update applyImplicitDefaults() - * update convertTo{Thrift/Avro}() - * update apply() (a.k.a. applyAvroMigrationChangesToCurrentCFMD) - - * update convertToCFMetaData (a.k.a. convert thrift to CFMD and validate it) * if desired, add new option to CLI add/update CF
svn commit: r1161719 - /cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java
Author: jbellis
Date: Thu Aug 25 19:49:52 2011
New Revision: 1161719

URL: http://svn.apache.org/viewvc?rev=1161719&view=rev
Log:
fix bad merge

Modified:
    cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java?rev=1161719&r1=1161718&r2=1161719&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/tools/SSTableExport.java Thu Aug 25 19:49:52 2011
@@ -29,6 +29,7 @@ import org.apache.cassandra.config.Colum
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.index.keys.KeysIndex;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.io.util.RandomAccessReader;
 import org.apache.cassandra.service.StorageService;
@@ -341,13 +342,13 @@ public class SSTableExport
             // look up index metadata from parent
             int i = descriptor.cfname.indexOf(".");
             String parentName = descriptor.cfname.substring(0, i);
-            CFMetaData parent = DatabaseDescriptor.getCFMetaData(descriptor.ksname, parentName);
+            CFMetaData parent = Schema.instance.getCFMetaData(descriptor.ksname, parentName);
             ColumnDefinition def = parent.getColumnDefinitionForIndex(descriptor.cfname.substring(i + 1));
-            metadata = CFMetaData.newIndexMetadata(parent, def, ColumnFamilyStore.indexComparator());
+            metadata = CFMetaData.newIndexMetadata(parent, def, KeysIndex.indexComparator());
         }
         else
         {
-            metadata = DatabaseDescriptor.getCFMetaData(descriptor.ksname, descriptor.cfname);
+            metadata = Schema.instance.getCFMetaData(descriptor.ksname, descriptor.cfname);
         }
 
         export(SSTableReader.open(descriptor, metadata), outs, excludes);
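The hunk above splits an index sstable's column family name at the first dot into the parent column family and the index name. A minimal standalone sketch of just that string handling follows; `IndexCfNameSketch`, `splitIndexCfName` and the sample names are hypothetical, and the real code goes on to resolve both parts through the schema.

```java
/** Hypothetical helper mirroring the cfname split in the diff; not part of SSTableExport. */
final class IndexCfNameSketch {

    /**
     * Splits "parent.index" into {parent, index}.
     * Returns {cfname, null} when there is no dot, i.e. for a regular column family.
     */
    static String[] splitIndexCfName(String cfname) {
        int i = cfname.indexOf('.');
        if (i < 0) {
            return new String[] { cfname, null };
        }
        return new String[] { cfname.substring(0, i), cfname.substring(i + 1) };
    }

    public static void main(String[] args) {
        // "Users" and "birthdate_idx" are made-up names for illustration.
        String[] parts = splitIndexCfName("Users.birthdate_idx");
        System.out.println("parent=" + parts[0] + ", index=" + parts[1]);
    }
}
```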
buildbot success in ASF Buildbot on cassandra-trunk
The Buildbot has detected a restored build on builder cassandra-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/cassandra-trunk/builds/1555 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: scheduler Build Source Stamp: [branch cassandra/trunk] 1161719 Blamelist: jbellis Build succeeded! sincerely, -The Buildbot
[jira] [Commented] (CASSANDRA-3074) comments and documentation for index_interval are misleading
[ https://issues.apache.org/jira/browse/CASSANDRA-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091296#comment-13091296 ]

Hudson commented on CASSANDRA-3074:
---
Integrated in Cassandra-0.8 #295 (See [https://builds.apache.org/job/Cassandra-0.8/295/])
clarify index_interval explanation
patch by mdennis for CASSANDRA-3074

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1161701
Files :
* /cassandra/branches/cassandra-0.8/conf/cassandra.yaml

comments and documentation for index_interval are misleading Key: CASSANDRA-3074 URL: https://issues.apache.org/jira/browse/CASSANDRA-3074 Project: Cassandra Issue Type: Bug Reporter: Matthew F. Dennis Assignee: Matthew F. Dennis Priority: Minor Fix For: 0.8.5 Attachments: 3074-cassandra-0.8.patch

The comments and documentation for index_interval are misleading. They state that the larger the *sampling*, the more effective the index is, at the cost of space. This is true, but in the context of the configuration variable it implies that the larger the *setting* is, the larger the index is, while in fact it's the opposite.
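The inverse relationship the ticket describes is easy to see with numbers. The sketch below assumes the sample keeps the first key and every index_interval-th key after it; `IndexIntervalSketch` and `sampledEntries` are hypothetical names, and the key count is made up.

```java
/** Hypothetical illustration of how index_interval relates to sample size. */
final class IndexIntervalSketch {

    /** Entries kept in memory when every indexInterval-th key is sampled, starting with the first. */
    static long sampledEntries(long keysInSSTable, int indexInterval) {
        if (keysInSSTable < 0 || indexInterval < 1) {
            throw new IllegalArgumentException("need keys >= 0 and interval >= 1");
        }
        return (keysInSSTable + indexInterval - 1) / indexInterval; // ceiling division
    }

    public static void main(String[] args) {
        long keys = 1_000_000L; // made-up sstable size
        for (int interval : new int[] { 128, 256, 512 }) {
            System.out.println("index_interval=" + interval
                    + " -> sampled entries=" + sampledEntries(keys, interval));
        }
    }
}
```

Doubling the setting roughly halves the number of sampled entries, so a larger index_interval means a smaller, less precise sample.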
[jira] [Created] (CASSANDRA-3078) Make Secondary Indexes Pluggable
Make Secondary Indexes Pluggable - Key: CASSANDRA-3078 URL: https://issues.apache.org/jira/browse/CASSANDRA-3078 Project: Cassandra Issue Type: Improvement Components: Core Reporter: T Jake Luciani Assignee: T Jake Luciani Fix For: 1.0

CASSANDRA-2982 got us most of the way there... This ticket removes the IndexType enum (while keeping support for KEYS internally from old cf metadata). You now specify an index_class rather than an index_type. index_class is the full classname of the SecondaryIndex impl. This also adds an index_options map to pass extra info to the secondary index impl if needed.
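As a rough illustration of what "specify a full classname plus an options map" usually means in Java, here is a minimal sketch. It is not the code in the attached patch: `SecondaryIndexLike`, `KeysLikeIndex` and `create` are hypothetical stand-ins, and the real SecondaryIndex type has a different shape.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of loading an index implementation by class name. */
final class PluggableIndexSketch {

    /** Stand-in for the SecondaryIndex base type; not Cassandra's real class. */
    interface SecondaryIndexLike {
        void init(Map<String, String> indexOptions);
        String describe();
    }

    /** Example implementation; it needs a public no-arg constructor to be loadable. */
    public static final class KeysLikeIndex implements SecondaryIndexLike {
        private Map<String, String> options = new HashMap<>();

        @Override
        public void init(Map<String, String> indexOptions) {
            options = new HashMap<>(indexOptions);
        }

        @Override
        public String describe() {
            return "KeysLikeIndex" + options;
        }
    }

    /** Instantiates the class named by indexClass and hands it the index_options map. */
    static SecondaryIndexLike create(String indexClass, Map<String, String> indexOptions) {
        try {
            Class<? extends SecondaryIndexLike> type =
                    Class.forName(indexClass).asSubclass(SecondaryIndexLike.class);
            SecondaryIndexLike index = type.getDeclaredConstructor().newInstance();
            index.init(indexOptions);
            return index;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load index_class " + indexClass, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("analyzer", "standard"); // made-up option for illustration
        SecondaryIndexLike index = create(KeysLikeIndex.class.getName(), options);
        System.out.println(index.describe());
    }
}
```

Rejecting an unknown or incompatible class name up front keeps a bad index_class from surfacing later as an obscure failure.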
[jira] [Updated] (CASSANDRA-3025) PHP/PDO driver for Cassandra CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikko Koppanen updated CASSANDRA-3025: -- Attachment: pdo_cassandra-0.1.2.tgz Hi, PDO doesn't seem to support different amount of columns on rows, which is slightly problematic with sparse columns. I did the following solution for now: https://github.com/mkoppanen/php-pdo_cassandra/blob/master/tests/017-sparsecolumns.phpt The columns that are not set for the row are named __column_not_set_%d, which I think is about the cleanest way to do it. The test for integers is updated here: https://github.com/mkoppanen/php-pdo_cassandra/blob/master/tests/018-int.phpt UUID behaviour is here: https://github.com/mkoppanen/php-pdo_cassandra/blob/master/tests/019-uuid.phpt For UUIDs I was thinking about adding additional configuration option to automatically unparse them into string representation. Test with the available data types as values: https://github.com/mkoppanen/php-pdo_cassandra/blob/master/tests/020-types.phpt And a test using bigint comparator + sparse columns: https://github.com/mkoppanen/php-pdo_cassandra/blob/master/tests/021-comparators.phpt PHP/PDO driver for Cassandra CQL Key: CASSANDRA-3025 URL: https://issues.apache.org/jira/browse/CASSANDRA-3025 Project: Cassandra Issue Type: New Feature Components: API Reporter: Mikko Koppanen Labels: php Attachments: pdo_cassandra-0.1.0.tgz, pdo_cassandra-0.1.1.tgz, pdo_cassandra-0.1.2.tgz, php_test_results_20110818_2317.txt Hello, attached is the initial version of the PDO driver for Cassandra CQL language. This is a native PHP extension written in what I would call a combination of C and C++, due to PHP being C. The thrift API used is the C++. 
The API looks roughly as follows:

{code}
<?php
$db = new PDO('cassandra:host=127.0.0.1;port=9160');
$db->exec ("CREATE KEYSPACE mytest with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;");
$db->exec ("USE mytest");
$db->exec ("CREATE COLUMNFAMILY users ( my_key varchar PRIMARY KEY, full_name varchar );");
$stmt = $db->prepare ("INSERT INTO users (my_key, full_name) VALUES (:key, :full_name);");
$stmt->execute (array (':key' => 'mikko', ':full_name' => 'Mikko K' ));
{code}

Currently prepared statements are emulated on the client side but I understand that there is a plan to add prepared statements to Cassandra CQL API as well. I will add this feature into the extension as soon as they are implemented. Additional documentation can be found on GitHub at https://github.com/mkoppanen/php-pdo_cassandra, in the form of a rendered Markdown file. Tests are currently not included in the package file and they can be found on GitHub for now as well. I have created documentation in docbook format as well, but have not yet rendered it. Comments and feedback are welcome. Thanks, Mikko
[jira] [Updated] (CASSANDRA-3078) Make Secondary Indexes Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-3078: -- Attachment: 3078.txt Make Secondary Indexes Pluggable - Key: CASSANDRA-3078 URL: https://issues.apache.org/jira/browse/CASSANDRA-3078 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0 Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: secondary_index Fix For: 1.0 Attachments: 3078.txt CASSANDRA-2982 got us most of the way there... This ticket removes the IndexType enum (while keeping support for KEYS internally from old cf metadata). You now specify a index_class rather than index_type. index_class is the full classname of the SecondaryIndex impl. This also adds a index_options map to pass extra info to the secondary index impl if needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3078) Make Secondary Indexes Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani updated CASSANDRA-3078: -- Attachment: 3078_thrift.txt Make Secondary Indexes Pluggable - Key: CASSANDRA-3078 URL: https://issues.apache.org/jira/browse/CASSANDRA-3078 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0 Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: secondary_index Fix For: 1.0 Attachments: 3078.txt, 3078_thrift.txt CASSANDRA-2982 got us most of the way there... This ticket removes the IndexType enum (while keeping support for KEYS internally from old cf metadata). You now specify a index_class rather than index_type. index_class is the full classname of the SecondaryIndex impl. This also adds a index_options map to pass extra info to the secondary index impl if needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13091364#comment-13091364 ] Nick Bailey commented on CASSANDRA-957: --- The hint rework looks good. The only comment I have there is that it would be nice if the logging statements for sending hints creating hints indicated the ip as well as the token. Even though it's stored by token it would be nice to immediately see the ip in the log without having to look it up. I'm also unsure about the reasoning behind the last patch. Why increase the initial sleep in joinTokenRing? convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-Support-token-replace.patch, 0001-support-for-replace-token-v3.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-upport-for-hints-on-token-v3.patch, 0003-Make-HintedHandoff-More-reliable.patch, 0003-Make-hints-More-reliable.patch, 0003-making-bootstrap-sleep-longer.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one) and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. 
A one-off -DreplaceToken=XXY argument?
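For readers unfamiliar with how a one-off -D argument reaches Java code, a minimal sketch follows. The property name `replaceToken` comes from the question in the ticket; the class, the method and the trimming behaviour are assumptions for illustration, not what the attached patches do.

```java
import java.util.Properties;

/** Hypothetical sketch of reading a one-off startup option passed as -DreplaceToken=... */
final class ReplaceTokenOptionSketch {

    /** Property name as floated in the ticket description. */
    static final String PROPERTY = "replaceToken";

    /** Returns the requested token, or null when the option is absent or blank. */
    static String requestedReplaceToken(Properties props) {
        String value = props.getProperty(PROPERTY);
        if (value == null) {
            return null;
        }
        value = value.trim();
        return value.isEmpty() ? null : value;
    }

    public static void main(String[] args) {
        // Run with: java -DreplaceToken=XXY ReplaceTokenOptionSketch.java
        String token = requestedReplaceToken(System.getProperties());
        System.out.println(token == null ? "normal startup" : "replacing token " + token);
    }
}
```

Because the value lives only on the command line of that one startup, nothing has to be edited back out of the configuration file afterwards, which is the convenience the ticket is after.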
[jira] [Commented] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091376#comment-13091376 ]

Vijay commented on CASSANDRA-957:
-
* In Gossiper.doStatusCheck() you made it ignore any state that is for the local endpoint and is not a dead state. Shouldn't it just always ignore any state about the local endpoint though? Basically what it was doing previously?
* Basically the same question about Gossiper.applyStateLocally(): the loop continues if the state is for the local node and the state is dead. Why would we want to apply a live local state?
- Fixed, initial intention was to find the old state of the node. Seems like it is not possible now…
* Does the hibernate state need the true/false value? Seems like all we care about is that it is set at all. Looks like when we are starting up right now we automatically go into a hibernate state, then we go into a bootstrap state afterwards if they specified a replace token. Seems like we shouldn't set a state at all until we know we are doing one of replace/bootstrap/just joining.
- it will be either true or false (if not a replace, or overwrite with the state normal)… if you don't then Gossiper.applyStateLocally will mark it alive on all the other nodes.
* It looks like right now you could specify a replace token that isn't part of the cluster. If that happens we should throw an exception and tell the user to do the normal bootstrap process.
- As we are ignoring the local states… this information is hard to gather when we are trying to replace the same node…. The check is to see that no other live node owns this token….
- We can document in the wiki about the effects if they replace a token which is not part of the ring…. (repair/decommission)
* Why use the last gossip time to determine if the node we are replacing is alive? Why not just check gossip to see if the ring thinks it is alive?
- because by default when we hear about someone we consider them to be alive…. the idea is to check and see if we heard back from them or not (after the ring delay); if not then it is more probable that the dead node is really dead (that's why we have to wait for 90 + delay)
* We should update the message for the exception that is thrown when you try to bootstrap to an existing token. It should indicate either remove the dead node or follow this replacement process.
- I am not sure if I parse that, I have added more to it, please check.
* I'm not sure why we are calling updateNormalToken() in the StorageService.bootstrap() method when it's a token replacement.
- That's because you don't want the range request sent to the node which does not exist.
* A little bit of doc on this would be good, maybe in cassandra.yaml? Just on how to pass the argument to the startup process.
- Yaml is bad because this is a one time thing…. Wiki page? like the don't join ring property

convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-Support-token-replace.patch, 0001-support-for-replace-token-v3.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-upport-for-hints-on-token-v3.patch, 0003-Make-HintedHandoff-More-reliable.patch, 0003-Make-hints-More-reliable.patch, 0003-making-bootstrap-sleep-longer.patch Original Estimate: 24h Remaining Estimate: 24h

Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one) and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument?
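The "did we hear back after the ring delay" reasoning in the reply above can be sketched as a small predicate. This is an interpretation under stated assumptions, not the patch's logic: the class, method and parameter names are hypothetical, timestamps are plain milliseconds, and the 30000 ms delay in `main` is only an example value.

```java
/** Hypothetical sketch of the "heard nothing during the wait" liveness check. */
final class ReplaceLivenessSketch {

    /**
     * Decides whether the node owning the token may be treated as dead.
     *
     * @param waitStartedMillis    when this node started listening to gossip
     * @param lastHeardMillis      last time gossip brought a fresh update from the old node
     * @param nowMillis            current time
     * @param requiredWaitMillis   ring delay plus any extra safety margin
     */
    static boolean safeToReplace(long waitStartedMillis, long lastHeardMillis,
                                 long nowMillis, long requiredWaitMillis) {
        // Until the full wait has elapsed, silence proves nothing.
        if (nowMillis - waitStartedMillis < requiredWaitMillis) {
            return false;
        }
        // Any update that arrived during the wait means the old node is still talking.
        return lastHeardMillis < waitStartedMillis;
    }

    public static void main(String[] args) {
        long start = 1_000L;
        long wait = 30_000L; // example value only
        System.out.println(safeToReplace(start, 500L, start + wait, wait));   // silent during wait
        System.out.println(safeToReplace(start, 5_000L, start + wait, wait)); // heard during wait
    }
}
```

Merely knowing about an endpoint is not evidence that it is alive, since gossip state learned from other nodes marks it alive by default; only a fresh update during the wait counts.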
[jira] [Updated] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-957: Attachment: 0001-support-token-replace-v4.patch Attaching newer version with fix and rebase. convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-Support-token-replace.patch, 0001-support-for-replace-token-v3.patch, 0001-support-token-replace-v4.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-upport-for-hints-on-token-v3.patch, 0003-Make-HintedHandoff-More-reliable.patch, 0003-Make-hints-More-reliable.patch, 0003-making-bootstrap-sleep-longer.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one) and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091386#comment-13091386 ]
Vijay commented on CASSANDRA-957:
bq. I'm also unsure about the reasoning behind the last patch. Why increase the initial sleep in joinTokenRing?
Ring delay plus extra time, so that we can check whether there is any live server before actually replacing the node.
convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-Support-token-replace.patch, 0001-support-for-replace-token-v3.patch, 0001-support-token-replace-v4.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-upport-for-hints-on-token-v3.patch, 0003-Make-HintedHandoff-More-reliable.patch, 0003-Make-hints-More-reliable.patch, 0003-making-bootstrap-sleep-longer.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one) and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument?
[jira] [Updated] (CASSANDRA-3079) Allow the client request scheduler to throw Timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-3079: Reviewer: lenn0x (was: chrisg) Allow the client request scheduler to throw Timeouts Key: CASSANDRA-3079 URL: https://issues.apache.org/jira/browse/CASSANDRA-3079 Project: Cassandra Issue Type: Improvement Components: API Reporter: Stu Hood Assignee: Stu Hood Priority: Minor Fix For: 1.0 The RoundRobinScheduler prioritizes threads by allowing them to queue on a SynchronousQueue per scheduling bucket. These queues currently do not use timeouts, and we observed a cascading case where client retries caused the scheduler queues to fill such that latency was way above the client timeout. Allowing the IRequestScheduler.queue method to throw a (per-call configurable) timeout, we can avoid this cascading.
[jira] [Updated] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-957: Attachment: 0002-hints-on-token-than-ip-v4.patch Added more logging for RMV; I am not sure whether we have to parse the string to a token and then to an IP. convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-Support-token-replace.patch, 0001-support-for-replace-token-v3.patch, 0001-support-token-replace-v4.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-hints-on-token-than-ip-v4.patch, 0002-upport-for-hints-on-token-v3.patch, 0003-Make-HintedHandoff-More-reliable.patch, 0003-Make-hints-More-reliable.patch, 0003-making-bootstrap-sleep-longer.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one) and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument?
[jira] [Updated] (CASSANDRA-3079) Allow the client request scheduler to throw Timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stu Hood updated CASSANDRA-3079:
Attachment: 0002-Fix-try-finally-nesting-for-scheduling.txt, 0001-Add-timeouts-to-request-scheduling.txt
0001 Adds the timeouts mentioned above (currently all schedule() calls use RpcTimeout, pending commit of CASSANDRA-2819)
0002 Fixes the try-finally nesting of schedule calls to avoid spurious release() calls due to timeouts
Allow the client request scheduler to throw Timeouts Key: CASSANDRA-3079 URL: https://issues.apache.org/jira/browse/CASSANDRA-3079 Project: Cassandra Issue Type: Improvement Components: API Reporter: Stu Hood Assignee: Stu Hood Priority: Minor Fix For: 1.0 Attachments: 0001-Add-timeouts-to-request-scheduling.txt, 0002-Fix-try-finally-nesting-for-scheduling.txt The RoundRobinScheduler prioritizes threads by allowing them to queue on a SynchronousQueue per scheduling bucket. These queues currently do not use timeouts, and we observed a cascading case where client retries caused the scheduler queues to fill such that latency was way above the client timeout. Allowing the IRequestScheduler.queue method to throw a (per-call configurable) timeout, we can avoid this cascading.
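The two patches described above (a timeout on queueing, and try-finally nesting that avoids a spurious release() after a timeout) can be illustrated with a small sketch. It is not the patch itself: TimedSlotScheduler and runScheduled are hypothetical names, and a plain Semaphore stands in for the per-bucket SynchronousQueue that the RoundRobinScheduler uses.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a request scheduler; not Cassandra's actual class.
final class TimedSlotScheduler
{
    private final Semaphore slots;

    TimedSlotScheduler(int maxConcurrent)
    {
        slots = new Semaphore(maxConcurrent);
    }

    // Block for at most timeoutMS waiting for a slot, then give up.
    void queue(long timeoutMS) throws TimeoutException
    {
        boolean acquired;
        try
        {
            acquired = slots.tryAcquire(timeoutMS, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for a slot", e);
        }
        if (!acquired)
            throw new TimeoutException("No slot available within " + timeoutMS + " ms");
    }

    void release()
    {
        slots.release();
    }

    int availableSlots()
    {
        return slots.availablePermits();
    }

    // queue() sits outside the try block: if it times out, no slot was taken,
    // so the finally clause (and its release()) must not run.
    void runScheduled(Runnable request, long timeoutMS) throws TimeoutException
    {
        queue(timeoutMS);
        try
        {
            request.run();
        }
        finally
        {
            release();
        }
    }
}
```

Had queue() been placed inside the try block, a timed-out request would still reach the finally clause and hand back a slot it never held, which is the kind of spurious release() that 0002 is described as fixing.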
[jira] [Commented] (CASSANDRA-2434) node bootstrapping can violate consistency
[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091418#comment-13091418 ]
paul cannon commented on CASSANDRA-2434:
So, it looks like it will be possible for the node-that-will-be-removed to change between starting a bootstrap and finishing it (other nodes being bootstrapped/moved/decom'd during that time period); in some cases, that could still lead to a consistency violation. Is that unlikely enough that we don't care, here? At least the situation would be better with the proposed fix than it is now.
Second question: what might the permission from the operator to choose a replica that is closer/less dead look like? Maybe just a boolean flag saying it's ok to stream from any node for any range you need to stream? Or would we want to allow specifying precise source nodes for any/all affected address ranges?
node bootstrapping can violate consistency Key: CASSANDRA-2434 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Assignee: paul cannon Fix For: 1.1
My reading (a while ago) of the code indicates that there is no logic involved during bootstrapping that avoids consistency level violations. If I recall correctly it just grabs neighbors that are currently up. There are at least two issues I have with this behavior:
* If I have a cluster where I have applications relying on QUORUM with RF=3, and bootstrapping completes based on only one node, I have just violated the supposedly guaranteed consistency semantics of the cluster.
* Nodes can flap up and down at any time, so even if a human takes care to look at which nodes are up and thinks about it carefully before bootstrapping, there's no guarantee.
A complication is that not only does it depend on the use case whether this is an issue (if everything you do is at CL.ONE, it's fine); even in a cluster which is otherwise used for QUORUM operations you may wish to accept less-than-quorum nodes during bootstrap in various emergency situations. A potential easy fix is to have bootstrap take an argument which is the number of hosts to bootstrap from, or to assume QUORUM if none is given. (A related concern is bootstrapping across data centers. You may *want* to bootstrap to a local node and then do a repair to avoid sending loads of data across DCs while still achieving consistency. Or even if you don't care about the consistency issues, I don't think there is currently a way to bootstrap from local nodes only.) Thoughts?
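The "potential easy fix" in the description (bootstrap takes the number of hosts to bootstrap from, and assumes QUORUM if none is given) comes down to a small calculation. The sketch below assumes the usual quorum size of floor(RF/2) + 1; BootstrapSources and its methods are hypothetical names for illustration, not existing Cassandra code.

```java
// Hypothetical helper illustrating the proposed bootstrap argument.
final class BootstrapSources
{
    // Quorum size for a given replication factor: floor(rf / 2) + 1.
    static int quorum(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    // requested == null means the operator gave no argument, so QUORUM is assumed.
    static int required(int replicationFactor, Integer requested)
    {
        int n = (requested == null) ? quorum(replicationFactor) : requested;
        if (n < 1 || n > replicationFactor)
            throw new IllegalArgumentException("source count must be between 1 and RF");
        return n;
    }

    // True if enough replicas are currently up to bootstrap a range safely.
    static boolean canBootstrap(int replicationFactor, Integer requested, int liveReplicas)
    {
        return liveReplicas >= required(replicationFactor, requested);
    }
}
```

With RF=3 and no argument, one live replica is not enough; passing 1 explicitly is the emergency override the description mentions.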
[jira] [Commented] (CASSANDRA-3079) Allow the client request scheduler to throw Timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091421#comment-13091421 ] Melvin Wang commented on CASSANDRA-3079: looks good to me. Allow the client request scheduler to throw Timeouts Key: CASSANDRA-3079 URL: https://issues.apache.org/jira/browse/CASSANDRA-3079 Project: Cassandra Issue Type: Improvement Components: API Reporter: Stu Hood Assignee: Stu Hood Priority: Minor Fix For: 1.0 Attachments: 0001-Add-timeouts-to-request-scheduling.txt, 0002-Fix-try-finally-nesting-for-scheduling.txt The RoundRobinScheduler prioritizes threads by allowing them to queue on a SynchronousQueue per scheduling bucket. These queues currently do not use timeouts, and we observed a cascading case where client retries caused the scheduler queues to fill such that latency was way above the client timeout. Allowing the IRequestScheduler.queue method to throw a (per-call configurable) timeout, we can avoid this cascading.
[jira] [Created] (CASSANDRA-3080) Add throttling for internode streaming
Add throttling for internode streaming Key: CASSANDRA-3080 URL: https://issues.apache.org/jira/browse/CASSANDRA-3080 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stu Hood Fix For: 1.0 Cassandra does (mostly) sequential reads from disk to send data to other nodes, which means that it is easily possible to stream upwards of 100 MB/s per source node. To avoid affecting service, we should add streaming throttling across all streams in the outbound direction, preferably configurable from JMX, and with `nodetool netstats` integration.
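One way to picture a throttle shared by all outbound streams is a single rate limiter that every stream consults before sending a chunk. The sketch below is an assumption-laden illustration, not a proposed patch: StreamThrottle, reserve and acquire are hypothetical names, and the rate setter merely marks where a JMX-configurable value could plug in.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical throttle shared by all outbound streams on one node.
final class StreamThrottle
{
    private static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    private long bytesPerSecond;
    private long nextFreeNanos; // earliest time the next chunk may be sent

    StreamThrottle(long bytesPerSecond, long nowNanos)
    {
        setBytesPerSecond(bytesPerSecond);
        this.nextFreeNanos = nowNanos;
    }

    // Could be exposed through a management interface so the rate is adjustable at runtime.
    synchronized void setBytesPerSecond(long bytesPerSecond)
    {
        if (bytesPerSecond <= 0)
            throw new IllegalArgumentException("rate must be positive");
        this.bytesPerSecond = bytesPerSecond;
    }

    // Reserves bandwidth for `bytes` and returns how long the caller should wait, in nanoseconds.
    // Assumes chunk sizes small enough that bytes * NANOS_PER_SECOND does not overflow a long.
    synchronized long reserve(long bytes, long nowNanos)
    {
        // Compare by subtraction so that wrapping nanoTime values are handled correctly.
        long start = (nextFreeNanos - nowNanos > 0) ? nextFreeNanos : nowNanos;
        long costNanos = bytes * NANOS_PER_SECOND / bytesPerSecond;
        nextFreeNanos = start + costNanos;
        return start - nowNanos;
    }

    // Called by each stream before sending a chunk of the given size.
    void acquire(long bytes) throws InterruptedException
    {
        long waitNanos = reserve(bytes, System.nanoTime());
        if (waitNanos > 0)
            TimeUnit.NANOSECONDS.sleep(waitNanos);
    }
}
```

Because all streams share one instance, the limit applies to the node's total outbound streaming rate rather than to each stream separately, which matches the "across all streams" wording of the ticket.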
[jira] [Commented] (CASSANDRA-3025) PHP/PDO driver for Cassandra CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091520#comment-13091520 ]
Jonathan Ellis commented on CASSANDRA-3025:
bq. The columns that are not set for the row are named _column_not_set%d, which I think is about the cleanest way to do it
I don't think I understand. Can we just set them to null instead?
bq. For UUIDs I was thinking about adding additional configuration option to automatically unparse them into string representation
Why not unparse into objects as in the phpcassa link above?
PHP/PDO driver for Cassandra CQL Key: CASSANDRA-3025 URL: https://issues.apache.org/jira/browse/CASSANDRA-3025 Project: Cassandra Issue Type: New Feature Components: API Reporter: Mikko Koppanen Labels: php Attachments: pdo_cassandra-0.1.0.tgz, pdo_cassandra-0.1.1.tgz, pdo_cassandra-0.1.2.tgz, php_test_results_20110818_2317.txt
Hello, attached is the initial version of the PDO driver for Cassandra CQL language. This is a native PHP extension written in what I would call a combination of C and C++, due to PHP being C. The Thrift API used is the C++ one. The API looks roughly as follows:
{code}
<?php
$db = new PDO('cassandra:host=127.0.0.1;port=9160');
$db->exec ("CREATE KEYSPACE mytest with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;");
$db->exec ("USE mytest");
$db->exec ("CREATE COLUMNFAMILY users ( my_key varchar PRIMARY KEY, full_name varchar );");
$stmt = $db->prepare ("INSERT INTO users (my_key, full_name) VALUES (:key, :full_name);");
$stmt->execute (array (':key' => 'mikko', ':full_name' => 'Mikko K' ));
{code}
Currently prepared statements are emulated on the client side, but I understand that there is a plan to add prepared statements to the Cassandra CQL API as well. I will add this feature to the extension as soon as they are implemented.
Additional documentation can be found on GitHub at https://github.com/mkoppanen/php-pdo_cassandra, in the form of a rendered Markdown file. Tests are currently not included in the package file; for now they can be found on GitHub as well. I have also created documentation in DocBook format, but have not yet rendered it. Comments and feedback are welcome. Thanks, Mikko
[jira] [Commented] (CASSANDRA-3078) Make Secondary Indexes Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091522#comment-13091522 ]
Jonathan Ellis commented on CASSANDRA-3078:
skimmed:
- hashconstruct addition should be moved into a separate ticket and used for strategy_options as well
- we should default to a package (org.apache.cassandra.db.index) if none is specified in class name so using built-ins is that much less of a pita
- style: spaces not tabs, space after //, space before () in conditionals + loops
Make Secondary Indexes Pluggable Key: CASSANDRA-3078 URL: https://issues.apache.org/jira/browse/CASSANDRA-3078 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0 Reporter: T Jake Luciani Assignee: T Jake Luciani Labels: secondary_index Fix For: 1.0 Attachments: 3078.txt, 3078_thrift.txt
CASSANDRA-2982 got us most of the way there... This ticket removes the IndexType enum (while keeping support for KEYS internally from old cf metadata). You now specify an index_class rather than index_type. index_class is the full classname of the SecondaryIndex impl. This also adds an index_options map to pass extra info to the secondary index impl if needed.
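The second review point (default to org.apache.cassandra.db.index when the class name carries no package) could look roughly like this. It is a sketch of the suggestion only: IndexClassNames and qualify are hypothetical, as is the KeysIndex class name used in the usage note; only the package name comes from the comment.

```java
// Hypothetical helper illustrating the "default package" suggestion.
final class IndexClassNames
{
    static final String DEFAULT_PACKAGE = "org.apache.cassandra.db.index";

    // A name without any '.' is treated as a built-in and gets the default package prepended.
    static String qualify(String indexClass)
    {
        return indexClass.contains(".") ? indexClass : DEFAULT_PACKAGE + "." + indexClass;
    }
}
```

Under this sketch, an index_class of KeysIndex would resolve to org.apache.cassandra.db.index.KeysIndex, while a fully qualified name such as com.example.MyIndex would be left untouched.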
svn commit: r1161983 - in /cassandra/trunk: ./ src/java/org/apache/cassandra/scheduler/ src/java/org/apache/cassandra/thrift/
Author: jbellis
Date: Fri Aug 26 03:54:32 2011
New Revision: 1161983

URL: http://svn.apache.org/viewvc?rev=1161983&view=rev
Log:
Add timeouts to client request schedulers
patch by Stu Hood; reviewed by Melvin Wang for CASSANDRA-3079

Modified:
    cassandra/trunk/CHANGES.txt
    cassandra/trunk/src/java/org/apache/cassandra/scheduler/IRequestScheduler.java
    cassandra/trunk/src/java/org/apache/cassandra/scheduler/NoScheduler.java
    cassandra/trunk/src/java/org/apache/cassandra/scheduler/RoundRobinScheduler.java
    cassandra/trunk/src/java/org/apache/cassandra/scheduler/WeightedQueue.java
    cassandra/trunk/src/java/org/apache/cassandra/thrift/CassandraServer.java

Modified: cassandra/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1161983&r1=1161982&r2=1161983&view=diff
==============================================================================
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Fri Aug 26 03:54:32 2011
@@ -42,6 +42,7 @@
  * Add install command to cassandra.bat (CASSANDRA-292)
  * clean up KSMetadata, CFMetadata from unnecessary
    Thrift-Avro conversion methods (CASSANDRA-3032)
+ * Add timeouts to client request schedulers (CASSANDRA-3079)

 0.8.5

Modified: cassandra/trunk/src/java/org/apache/cassandra/scheduler/IRequestScheduler.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/scheduler/IRequestScheduler.java?rev=1161983&r1=1161982&r2=1161983&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/scheduler/IRequestScheduler.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/scheduler/IRequestScheduler.java Fri Aug 26 03:54:32 2011
@@ -20,6 +20,8 @@ package org.apache.cassandra.scheduler;
  *
  */

+import java.util.concurrent.TimeoutException;
+
 /**
  * Implementors of IRequestScheduler must provide a constructor taking a RequestSchedulerOptions object.
  */
@@ -30,8 +32,9 @@ public interface IRequestScheduler
      *
      * @param t Thread handing the request
      * @param id    Scheduling parameter, an id to distinguish profiles (users/keyspace)
+     * @param timeout   The max time in milliseconds to spend blocking for a slot
      */
-    public void queue(Thread t, String id);
+    public void queue(Thread t, String id, long timeoutMS) throws TimeoutException;

     /**
      * A convenience method for indicating when a particular request has completed

Modified: cassandra/trunk/src/java/org/apache/cassandra/scheduler/NoScheduler.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/scheduler/NoScheduler.java?rev=1161983&r1=1161982&r2=1161983&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/scheduler/NoScheduler.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/scheduler/NoScheduler.java Fri Aug 26 03:54:32 2011
@@ -34,7 +34,7 @@ public class NoScheduler implements IReq

     public NoScheduler() {}

-    public void queue(Thread t, String id) {}
+    public void queue(Thread t, String id, long timeoutMS) {}

     public void release() {}
 }

Modified: cassandra/trunk/src/java/org/apache/cassandra/scheduler/RoundRobinScheduler.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/scheduler/RoundRobinScheduler.java?rev=1161983&r1=1161982&r2=1161983&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/scheduler/RoundRobinScheduler.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/scheduler/RoundRobinScheduler.java Fri Aug 26 03:54:32 2011
@@ -23,6 +23,7 @@ package org.apache.cassandra.scheduler;

 import java.util.Map;
 import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeoutException;

 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -78,14 +79,14 @@ public class RoundRobinScheduler impleme
         logger.info("Started the RoundRobin Request Scheduler");
     }

-    public void queue(Thread t, String id)
+    public void queue(Thread t, String id, long timeoutMS) throws TimeoutException
     {
         WeightedQueue weightedQueue = getWeightedQueue(id);

         try
         {
             queueSize.release();
-            weightedQueue.put(t);
+            weightedQueue.put(t, timeoutMS);
         }
         catch (InterruptedException e)
         {

Modified: cassandra/trunk/src/java/org/apache/cassandra/scheduler/WeightedQueue.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/scheduler/WeightedQueue.java?rev=1161983&r1=1161982&r2=1161983&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/scheduler/WeightedQueue.java (original)
+++
[jira] [Commented] (CASSANDRA-3079) Allow the client request scheduler to throw Timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091565#comment-13091565 ]
Hudson commented on CASSANDRA-3079:
Integrated in Cassandra #1048 (See [https://builds.apache.org/job/Cassandra/1048/])
Add timeouts to client request schedulers
patch by Stu Hood; reviewed by Melvin Wang for CASSANDRA-3079
jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1161983
Files :
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/scheduler/IRequestScheduler.java
* /cassandra/trunk/src/java/org/apache/cassandra/scheduler/NoScheduler.java
* /cassandra/trunk/src/java/org/apache/cassandra/scheduler/RoundRobinScheduler.java
* /cassandra/trunk/src/java/org/apache/cassandra/scheduler/WeightedQueue.java
* /cassandra/trunk/src/java/org/apache/cassandra/thrift/CassandraServer.java
Allow the client request scheduler to throw Timeouts Key: CASSANDRA-3079 URL: https://issues.apache.org/jira/browse/CASSANDRA-3079 Project: Cassandra Issue Type: Improvement Components: API Reporter: Stu Hood Assignee: Stu Hood Priority: Minor Fix For: 1.0 Attachments: 0001-Add-timeouts-to-request-scheduling.txt, 0002-Fix-try-finally-nesting-for-scheduling.txt The RoundRobinScheduler prioritizes threads by allowing them to queue on a SynchronousQueue per scheduling bucket. These queues currently do not use timeouts, and we observed a cascading case where client retries caused the scheduler queues to fill such that latency was way above the client timeout. Allowing the IRequestScheduler.queue method to throw a (per-call configurable) timeout, we can avoid this cascading.