[jira] [Commented] (CASSANDRA-2630) CLI - 'describe column family' would be nice
[ https://issues.apache.org/jira/browse/CASSANDRA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092430#comment-13092430 ] satish babu krishnamoorthy commented on CASSANDRA-2630: --- Pavel, do you want me to update the patch with your comments? CLI - 'describe column family' would be nice Key: CASSANDRA-2630 URL: https://issues.apache.org/jira/browse/CASSANDRA-2630 Project: Cassandra Issue Type: Improvement Reporter: Jeremy Hanna Assignee: satish babu krishnamoorthy Priority: Minor Labels: cli, lhf Fix For: 1.0 Attachments: cassandra-0.8.2-2630-1.txt, cassandra-0.8.2-2630.txt I end up verifying column families a lot, and 'describe keyspace keyspace;' spits out a whole bunch of data since our keyspace has a lot of metadata. It would be really useful to have a 'describe column family;' for a given column family in the currently authenticated keyspace. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3092) Delete columns using range without specifying the column names
Delete columns using range without specifying the column names -- Key: CASSANDRA-3092 URL: https://issues.apache.org/jira/browse/CASSANDRA-3092 Project: Cassandra Issue Type: Improvement Reporter: Tongguo Pang When we delete columns, especially columns whose names are timestamps (obtained from System.currentTimeMillis()), it's very hard to get the column names. If delete could take a range of column names (a start and an end), this operation would be much easier.
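A range delete can be pictured on an in-memory model of one row whose columns are sorted by name. The Java sketch below is hypothetical (the `TimestampRow` class and its `deleteRange` method are not Cassandra APIs): it keys columns by `System.currentTimeMillis()`-style `long` names and removes everything in `[start, end]` without first reading the names back.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a single row: column name (a millisecond timestamp) -> value.
class TimestampRow {
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void insert(long timestampName, String value) {
        columns.put(timestampName, value);
    }

    // Deletes every column whose name lies in [start, end] (both inclusive),
    // without the caller having to know the individual column names.
    int deleteRange(long start, long end) {
        NavigableMap<Long, String> slice = columns.subMap(start, true, end, true);
        int removed = slice.size();
        slice.clear(); // subMap is a live view, so clearing it removes the columns from the row
        return removed;
    }

    int size() {
        return columns.size();
    }
}
```

Because `TreeMap.subMap` returns a live view, clearing it deletes from the row itself; a server-side range delete would give clients the same convenience.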
[jira] [Commented] (CASSANDRA-2630) CLI - 'describe column family' would be nice
[ https://issues.apache.org/jira/browse/CASSANDRA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092455#comment-13092455 ] Pavel Yaskevich commented on CASSANDRA-2630: Please attach version 2 instead. CLI - 'describe column family' would be nice Key: CASSANDRA-2630 URL: https://issues.apache.org/jira/browse/CASSANDRA-2630 Project: Cassandra Issue Type: Improvement Reporter: Jeremy Hanna Assignee: satish babu krishnamoorthy Priority: Minor Labels: cli, lhf Fix For: 1.0 Attachments: cassandra-0.8.2-2630-1.txt, cassandra-0.8.2-2630.txt I end up verifying column families a lot, and 'describe keyspace keyspace;' spits out a whole bunch of data since our keyspace has a lot of metadata. It would be really useful to have a 'describe column family;' for a given column family in the currently authenticated keyspace.
[jira] [Created] (CASSANDRA-3093) delete multiple rows on secondary equals
delete multiple rows on secondary equals Key: CASSANDRA-3093 URL: https://issues.apache.org/jira/browse/CASSANDRA-3093 Project: Cassandra Issue Type: Improvement Reporter: Tongguo Pang Currently, to delete rows matching a secondary index equality expression, we have to read their keys back first. This is very inefficient, especially when the result set is big (for example, 1,000,000 rows). If the delete operation could accept a secondary index query as input, that would be very helpful in this situation.
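To make the cost concrete, here is a hypothetical in-memory Java sketch of the client-side workaround described above (`IndexedDeleteSketch` and its map-of-maps "column family" are illustrative stand-ins, not Cassandra code): first collect every key whose indexed column equals the value, then delete the keys one by one. Against a real cluster, the first step ships every matching key to the client, which is what a delete accepting an index expression would avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a column family: row key -> (column name -> value).
class IndexedDeleteSketch {
    // Step 1: "query the index" and pull the matching keys back.
    // Step 2: issue one delete per key.
    static int deleteWhereEquals(Map<String, Map<String, String>> rows,
                                 String column, String value) {
        List<String> matchingKeys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                matchingKeys.add(e.getKey());
            }
        }
        // Deleting after the scan avoids modifying the map while iterating over it.
        for (String key : matchingKeys) {
            rows.remove(key);
        }
        return matchingKeys.size();
    }
}
```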
svn commit: r1162495 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cli/Cli.g src/java/org/apache/cassandra/cli/CliCompiler.java test/unit/org/apache/cassandra/cli/C
Author: xedin
Date: Sun Aug 28 11:06:17 2011
New Revision: 1162495

URL: http://svn.apache.org/viewvc?rev=1162495&view=rev
Log:
Fix parsing of the Keyspace and ColumnFamily names in numeric and string representations in CLI
patch by Pavel Yaskevich; reviewed by Jonathan Ellis for CASSANDRA-3075

Modified:
    cassandra/branches/cassandra-0.8/CHANGES.txt
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/Cli.g
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliCompiler.java
    cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1162495&r1=1162494&r2=1162495&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Sun Aug 28 11:06:17 2011
@@ -35,7 +35,8 @@
  * work around native memory leak in
    com.sun.management.GarbageCollectorMXBean (CASSANDRA-2868)
  * fix UnavailableException with writes at CL.EACH_QUORM (CASSANDRA-3084)
-
+ * fix parsing of the Keyspace and ColumnFamily names in numeric
+   and string representations in CLI (CASSANDRA-3075)
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/Cli.g
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/Cli.g?rev=1162495&r1=1162494&r2=1162495&view=diff
==
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/Cli.g (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/Cli.g Sun Aug 28 11:06:17 2011
@@ -382,8 +382,8 @@ useKeyspace
 keyValuePairExpr
-    : objectName ( (AND | WITH) keyValuePair )*
-        -> ^(NODE_NEW_KEYSPACE_ACCESS objectName ( keyValuePair )* )
+    : entityName ( (AND | WITH) keyValuePair )*
+        -> ^(NODE_NEW_KEYSPACE_ACCESS entityName ( keyValuePair )* )
     ;
 keyValuePair
@@ -423,12 +423,12 @@ columnFamilyExpr
     ;
 keyRangeExpr
-    :'[' ( startKey? ':' endKey? )? ']'
-        -> ^(NODE_KEY_RANGE startKey? endKey?)
+    :'[' ( startKey=entityName? ':' endKey=entityName? )? ']'
+        -> ^(NODE_KEY_RANGE $startKey? $endKey?)
     ;
 columnName
-    : (StringLiteral | Identifier | IntegerPositiveLiteral | IntegerNegativeLiteral)
+    : entityName
     ;
 attr_name
@@ -448,12 +448,8 @@ attrValueDouble
     : DoubleLiteral
     ;
-objectName
-    : Identifier
-    ;
-
 keyspace
-    : Identifier
+    : entityName
     ;
 replica_placement_strategy
@@ -461,7 +457,7 @@ replica_placement_strategy
     ;
 keyspaceNewName
-    : Identifier
+    : entityName
     ;
 comparator
@@ -472,7 +468,7 @@ command
     : Identifier
     ;
 newColumnFamily
-    : Identifier
+    : entityName
     ;
 username: Identifier
@@ -482,8 +478,12 @@ password: StringLiteral
     ;
 columnFamily
-    : Identifier
-    ;
+    : entityName
+    ;
+
+entityName
+    : (Identifier | StringLiteral | IntegerPositiveLiteral | IntegerNegativeLiteral)
+    ;
 rowKey
     : (Identifier | StringLiteral | IntegerPositiveLiteral | IntegerNegativeLiteral | functionCall)
@@ -502,14 +502,6 @@ functionArgument
     : Identifier | StringLiteral | IntegerPositiveLiteral | IntegerNegativeLiteral
     ;
-startKey
-    : (Identifier | StringLiteral)
-    ;
-
-endKey
-    : (Identifier | StringLiteral)
-    ;
-
 columnOrSuperColumn
     : (Identifier | IntegerPositiveLiteral | IntegerNegativeLiteral | StringLiteral | functionCall)
     ;

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliCompiler.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliCompiler.java?rev=1162495&r1=1162494&r2=1162495&view=diff
==
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliCompiler.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliCompiler.java Sun Aug 28 11:06:17 2011
@@ -96,7 +96,7 @@ public class CliCompiler
     public static String getColumnFamily(Tree astNode, List<CfDef> cfDefs)
     {
-        return getColumnFamily(astNode.getChild(0).getText(), cfDefs);
+        return getColumnFamily(CliUtils.unescapeSQLString(astNode.getChild(0).getText()), cfDefs);
     }
     public static String getColumnFamily(String cfName, List<CfDef> cfDefs)

Modified: cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java
URL:
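The grammar change above lets a keyspace or column family name be an Identifier, a quoted StringLiteral, or an integer literal, and `CliCompiler` now passes the name through `CliUtils.unescapeSQLString`. The Java sketch below illustrates the intended effect with a hypothetical `EntityNames.normalize` helper; it is not the actual `unescapeSQLString` code, and treating `''` as an escaped quote is an assumption made for illustration.

```java
// Hypothetical helper, NOT CliUtils.unescapeSQLString: it only illustrates that
// Identifier, IntegerPositiveLiteral and quoted StringLiteral tokens should all
// resolve to the same plain column family name.
class EntityNames {
    static String normalize(String tokenText) {
        boolean quoted = tokenText.length() >= 2
                && tokenText.startsWith("'") && tokenText.endsWith("'");
        if (!quoted) {
            return tokenText; // Identifier or integer literal: use the text verbatim
        }
        // Strip the surrounding quotes; treat '' as an escaped single quote (assumption).
        return tokenText.substring(1, tokenText.length() - 1).replace("''", "'");
    }
}
```

With this normalization, `list 1105115;` and `list '1105115';` would both look up the column family named `1105115`.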
[jira] [Commented] (CASSANDRA-3075) Cassandra CLI unable to use list command with INTEGER column names, resulting in syntax error
[ https://issues.apache.org/jira/browse/CASSANDRA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092472#comment-13092472 ] Hudson commented on CASSANDRA-3075: --- Integrated in Cassandra-0.8 #297 (See [https://builds.apache.org/job/Cassandra-0.8/297/]) Fix parsing of the Keyspace and ColumnFamily names in numeric and string representations in CLI patch by Pavel Yaskevich; reviewed by Jonathan Ellis for CASSANDRA-3075 xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1162495 Files : * /cassandra/branches/cassandra-0.8/CHANGES.txt * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/Cli.g * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliCompiler.java * /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/cli/CliTest.java Cassandra CLI unable to use list command with INTEGER column names, resulting in syntax error - Key: CASSANDRA-3075 URL: https://issues.apache.org/jira/browse/CASSANDRA-3075 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.8.0 Environment: 64 Bit Ubuntu 11.04 (full update), AMD64 + 8GB RAM + 500GB Hdd, Java 1.6.0_26, Cassandra 0.8.0 + 4GB heap, Cassandra CLI Reporter: Renato Bacelar da Silveira Assignee: Pavel Yaskevich Priority: Minor Labels: features, newbie Fix For: 0.8.5 Attachments: CASSANDRA-3075.patch I have a Column Family named 1105115. I have inserted the CF with Hector, and it did not throw any exception concerning the name of the column. If I issue the command list 1105115; I get the following error:
[default@unknown] list 1105115;
Syntax error at position 5: mismatched input '1105115' expecting Identifier
I presume we are not to name CFs as integers?
Or is there something I am missing from the help content below:
[default@unknown] help list;
list cf;
list cf[startKey:];
list cf[startKey:endKey];
list cf[startKey:endKey] limit limit;
List a range of rows, and all of their columns, in the specified column family.
The order of rows returned is dependant on the Partitioner in use.
Required Parameters:
- cf: Name of the column family to list rows from.
Optional Parameters:
- endKey: Key to end the range at. The end key will be included in the result. Defaults to an empty byte array.
- limit: Number of rows to return. Default is 100.
- startKey: Key start the range from. The start key will be included in the result. Defaults to an empty byte array.
Examples:
list Standard1;
list Super1[j:];
list Standard1[j:k] limit 40;
Column Family Info:
ColumnFamily: 1105115
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.5203125/111/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []
[jira] [Created] (CASSANDRA-3094) Jdbc Connection pooling for Cassandra
Jdbc Connection pooling for Cassandra - Key: CASSANDRA-3094 URL: https://issues.apache.org/jira/browse/CASSANDRA-3094 Project: Cassandra Issue Type: New Feature Components: Drivers Reporter: Vivek Mishra Assignee: Vivek Mishra Now that the JDBC driver is in place for Cassandra connections, the idea is to add (JDBC-specific) connection pooling. This could be a useful feature for large or long-running applications.
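A minimal sketch of the pooling idea, under stated assumptions: the generic `SimplePool<T>` below is hypothetical and is not the design in `cassandra_connection_pool_v1.patch`. In a JDBC setting `T` would be `java.sql.Connection`; it stays generic here so the example runs without a server. Idle resources sit in a bounded `ArrayBlockingQueue`, and callers borrow with a timeout.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; T would be java.sql.Connection for JDBC.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // open all resources eagerly
        }
    }

    // Borrow a resource, waiting up to the timeout; returns null if none became free.
    T borrow(long timeout, TimeUnit unit) {
        try {
            return idle.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    // Return a borrowed resource so other callers can reuse it.
    void release(T resource) {
        if (!idle.offer(resource)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    int available() {
        return idle.size();
    }
}
```

A production pool would also validate connections before handing them out and replace broken ones; those concerns are omitted here.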
[jira] [Updated] (CASSANDRA-3094) Jdbc Connection pooling for Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Mishra updated CASSANDRA-3094: Attachment: cassandra_connection_pool_v1.patch Jdbc Connection pooling for Cassandra - Key: CASSANDRA-3094 URL: https://issues.apache.org/jira/browse/CASSANDRA-3094 Project: Cassandra Issue Type: New Feature Components: Drivers Reporter: Vivek Mishra Assignee: Vivek Mishra Attachments: cassandra_connection_pool_v1.patch Now that the JDBC driver is in place for Cassandra connections, the idea is to add (JDBC-specific) connection pooling. This could be a useful feature for large or long-running applications.
[jira] [Resolved] (CASSANDRA-3086) Use interval tree to narrow down sstables on range scans
[ https://issues.apache.org/jira/browse/CASSANDRA-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-3086. --- Resolution: Invalid Assignee: (was: Benjamin Coverston) This was done in 1608 after all Use interval tree to narrow down sstables on range scans Key: CASSANDRA-3086 URL: https://issues.apache.org/jira/browse/CASSANDRA-3086 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Priority: Minor CASSANDRA-1608 added interval tree optimization for single-row queries but not range scans (CFS.getRangeSlice).
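The optimization amounts to skipping sstables whose token span cannot intersect the scanned range. The Java sketch below (hypothetical `SSTableBounds` and `RangeScanFilter` types) shows only the overlap predicate, checked with a linear scan; an interval tree answers the same question without visiting every sstable. It also ignores wrapping token ranges, which a real ring would have to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an sstable's first and last token.
class SSTableBounds {
    final String name;
    final long minToken;
    final long maxToken;

    SSTableBounds(String name, long minToken, long maxToken) {
        this.name = name;
        this.minToken = minToken;
        this.maxToken = maxToken;
    }
}

class RangeScanFilter {
    // Keeps only sstables whose [minToken, maxToken] overlaps the scan range [start, end].
    // Two closed intervals overlap iff each one starts no later than the other ends.
    static List<String> candidates(List<SSTableBounds> sstables, long start, long end) {
        List<String> result = new ArrayList<>();
        for (SSTableBounds s : sstables) {
            if (s.minToken <= end && start <= s.maxToken) {
                result.add(s.name);
            }
        }
        return result;
    }
}
```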
[jira] [Updated] (CASSANDRA-3025) PHP/PDO driver for Cassandra CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikko Koppanen updated CASSANDRA-3025: -- Attachment: pdo_cassandra-0.1.3.tgz Update to the latest package PHP/PDO driver for Cassandra CQL Key: CASSANDRA-3025 URL: https://issues.apache.org/jira/browse/CASSANDRA-3025 Project: Cassandra Issue Type: New Feature Components: API Reporter: Mikko Koppanen Labels: php Attachments: pdo_cassandra-0.1.0.tgz, pdo_cassandra-0.1.1.tgz, pdo_cassandra-0.1.2.tgz, pdo_cassandra-0.1.3.tgz, php_test_results_20110818_2317.txt Hello, attached is the initial version of the PDO driver for the Cassandra CQL language. This is a native PHP extension written in what I would call a combination of C and C++, because PHP itself is written in C. The Thrift API used is the C++ one. The API looks roughly like the following:
{code}
<?php
// Connect using the pdo_cassandra DSN
$db = new PDO('cassandra:host=127.0.0.1;port=9160');
$db->exec ("CREATE KEYSPACE mytest with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;");
$db->exec ("USE mytest");
$db->exec ("CREATE COLUMNFAMILY users (
  my_key varchar PRIMARY KEY,
  full_name varchar );");
// Named placeholders are bound on the client side (emulated prepared statements)
$stmt = $db->prepare ("INSERT INTO users (my_key, full_name) VALUES (:key, :full_name);");
$stmt->execute (array (':key' => 'mikko', ':full_name' => 'Mikko K' ));
{code}
Currently prepared statements are emulated on the client side, but I understand that there is a plan to add prepared statements to the Cassandra CQL API as well. I will add this feature to the extension as soon as they are implemented. Additional documentation can be found on GitHub at https://github.com/mkoppanen/php-pdo_cassandra, in the form of a rendered Markdown file. Tests are currently not included in the package file; for now they can also be found on GitHub. I have created documentation in DocBook format as well, but have not yet rendered it. Comments and feedback are welcome. Thanks, Mikko
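Client-side emulation of prepared statements generally means substituting each named placeholder with a properly quoted literal before the query is sent. The Java sketch below (the `NamedBinder` class is hypothetical, not part of pdo_cassandra or PDO) binds `:name` placeholders as CQL string literals, doubling embedded single quotes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of client-side placeholder binding.
class NamedBinder {
    private static final Pattern PLACEHOLDER = Pattern.compile(":([A-Za-z_][A-Za-z0-9_]*)");

    static String bind(String cql, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(cql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value bound for :" + m.group(1));
            }
            // Escape embedded single quotes by doubling them, then wrap the value in quotes.
            String literal = "'" + value.replace("'", "''") + "'";
            m.appendReplacement(out, Matcher.quoteReplacement(literal));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

This naive scan would also match colons inside string literals or option names such as `strategy_options:replication_factor`, so a real implementation needs to tokenize the statement first.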
[jira] [Created] (CASSANDRA-3095) java.lang.NegativeArraySizeException during compacting large row
java.lang.NegativeArraySizeException during compacting large row Key: CASSANDRA-3095 URL: https://issues.apache.org/jira/browse/CASSANDRA-3095 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.4 Environment: Linux 2.6.26-2-amd64 #1 SMP Thu Feb 11 00:59:32 UTC 2010 x86_64 GNU/Linux JDK 1.6.0_27 (Java 6 update 27), with JNA. Reporter: Pas Hello, it's a 4-node ring: 3 nodes are on 0.7.4, and I've upgraded one to 0.8.4. This particular node was having compaction issues, which is why I tried the upgrade (it looks likely that this solved the compaction issues). Here's the stack trace from system.log:
INFO [CompactionExecutor:22] 2011-08-28 18:12:46,566 CompactionController.java (line 136) Compacting large row (36028797018963968 bytes) incrementally
ERROR [CompactionExecutor:22] 2011-08-28 18:12:46,609 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[CompactionExecutor:22,1,main]
java.lang.NegativeArraySizeException
    at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:85)
    at org.apache.cassandra.utils.BloomFilter.bucketsFor(BloomFilter.java:56)
    at org.apache.cassandra.utils.BloomFilter.getFilter(BloomFilter.java:73)
    at org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:62)
    at org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
    at org.apache.cassandra.db.compaction.LazilyCompactedRow.<init>(LazilyCompactedRow.java:89)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:138)
    at org.apache.cassandra.db.compaction.CompactionIterator.getReduced(CompactionIterator.java:123)
    at org.apache.cassandra.db.compaction.CompactionIterator.getReduced(CompactionIterator.java:43)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
    at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
    at org.apache.cassandra.db.compaction.CompactionManager.doCompactionWithoutSizeEstimation(CompactionManager.java:569)
    at org.apache.cassandra.db.compaction.CompactionManager.doCompaction(CompactionManager.java:506)
    at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:141)
    at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:107)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
We still have ~70 files in the f format and ~80 in g. We have ~100 GB of data on this node. Thanks.
[jira] [Updated] (CASSANDRA-3091) Move the caching of KS and CF metadata in the JDBC suite from Connection to Statement
[ https://issues.apache.org/jira/browse/CASSANDRA-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Shaw updated CASSANDRA-3091: - Attachment: move-metadata-for-decoder-to-statement-level-v2.txt v2 of the patch clarifies the role of the non-interface methods of {{CassandraConnection}} by making them package {{protected}}. Move the caching of KS and CF metadata in the JDBC suite from Connection to Statement - Key: CASSANDRA-3091 URL: https://issues.apache.org/jira/browse/CASSANDRA-3091 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.4 Reporter: Rick Shaw Assignee: Rick Shaw Priority: Minor Labels: JDBC Fix For: 0.8.5 Attachments: move-metadata-for decoder-to-statement-level-v1.txt, move-metadata-for-decoder-to-statement-level-v2.txt Currently, all metadata used by JDBC's {{ColumnDecoder}} class is loaded and cached in the {{CassandraConnection}} class. As a result, any changes made on the connected server after the connection is established are not reflected in the KSs and CFs that can be accessed by {{ResultSet}}, {{Statement}} and {{PreparedStatement}}. By moving the cached metadata to the {{Statement}} level, the currency of the metadata can be checked within the {{Statement}} and reloaded if an entry is found to be absent. And by instantiating a new {{Statement}} (on any existing connection), you are assured of getting the most current copy of the metadata known to the server at the time of instantiation.
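The caching strategy described above can be sketched as follows. `StatementMetadata` is a hypothetical Java class, not the patch's code, and the CF-name-to-comparator map is an assumed shape for the cached metadata: a snapshot is loaded when the statement is created and reloaded once when a lookup finds no entry.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical statement-scoped metadata cache.
class StatementMetadata {
    private final Supplier<Map<String, String>> loader; // fetches CF name -> comparator from the server
    private Map<String, String> snapshot;

    StatementMetadata(Supplier<Map<String, String>> loader) {
        this.loader = loader;
        this.snapshot = loader.get(); // current as of statement creation
    }

    String comparatorFor(String columnFamily) {
        String cmp = snapshot.get(columnFamily);
        if (cmp == null) {
            snapshot = loader.get(); // entry absent: reload once, e.g. for a CF created later
            cmp = snapshot.get(columnFamily);
        }
        return cmp; // still null if the server does not know the CF either
    }
}
```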
[jira] [Commented] (CASSANDRA-2664) JDBC driver for CQL works only with Strings
[ https://issues.apache.org/jira/browse/CASSANDRA-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092517#comment-13092517 ] Rick Shaw commented on CASSANDRA-2664: -- This appears closable... There is no such method in the current code, and all current methods in the {{PreparedStatement}} unit test succeed. JDBC driver for CQL works only with Strings --- Key: CASSANDRA-2664 URL: https://issues.apache.org/jira/browse/CASSANDRA-2664 Project: Cassandra Issue Type: Bug Components: API Affects Versions: 0.8.0 beta 2 Environment: It happens with the JDBC driver for both 0.8.0 beta and 0.8.0-rc1 Reporter: Roman Kuzmin Labels: cql, jdbc Original Estimate: 4h Remaining Estimate: 4h CassandraPreparedStatement.java Line 141: String stringParam = makeCqlString(type.toString(param)); It crashes with a ClassCastException for all parameters that are not Strings. This is because when the method applyDualBindings is called from makeUpdate, it ALWAYS gets one and the same type as its parameter; in fact, that type is the comparator of the column family itself. In my case it is UTF8Type, and the UTF8Type.toString() method expects only Strings. I think it must be column-dependent.
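A hypothetical Java sketch of the reported failure mode and of the column-dependent alternative the reporter suggests. `BindingSketch` and its converter functions are illustrative stand-ins, not Cassandra's `AbstractType` API: a single String-only converter (standing in for the column family comparator) throws `ClassCastException` for an `Integer`, while looking the converter up per column does not.

```java
import java.util.Map;
import java.util.function.Function;

class BindingSketch {
    // Stand-in for a UTF8-style converter that only accepts Strings.
    static final Function<Object, String> UTF8_ONLY = v -> (String) v;

    // Reported shape: one converter (the CF comparator's) applied to every bound value.
    static String bindWithComparator(Object value) {
        return UTF8_ONLY.apply(value); // ClassCastException when value is an Integer
    }

    // Column-dependent shape: use the converter registered for the column being bound.
    static String bindForColumn(Map<String, Function<Object, String>> validators,
                                String column, Object value) {
        Function<Object, String> f = validators.getOrDefault(column, UTF8_ONLY);
        return f.apply(value);
    }
}
```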
[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092518#comment-13092518 ] Benjamin Coverston commented on CASSANDRA-1608: --- .bq Additional note: test suite runs about 20% slower for me w/ Leveled compactions. Unsure if that should be expected. That's not entirely expected. It's probably due in part to the amount of flushing that we force during the tests. Flushes and compactions both trigger interval tree builds. Other than that, the codepaths are the same. Redesigned Compaction - Key: CASSANDRA-1608 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Goffinet Assignee: Benjamin Coverston Fix For: 1.0 Attachments: 1608-22082011.txt, 1608-v2.txt, 1608-v4.txt, 1608-v5.txt After seeing the I/O issues in CASSANDRA-1470, I've been doing some more thinking on this subject that I wanted to lay out. I propose we redo the concept of how compaction works in Cassandra. At the moment, compaction is kicked off based on a write access pattern, not a read access pattern. In most cases, you want the opposite. You want to be able to track how well each SSTable is performing in the system. If we were to keep in-memory statistics for each SSTable and prioritize them based on access frequency and bloom filter hit/miss ratios, we could intelligently group the sstables that are read most often and schedule them for compaction. We could also schedule lower-priority maintenance on SSTables that are not often accessed. I also propose we limit each SSTable to a fixed size, which gives us the ability to better utilize our bloom filters in a predictable manner. At the moment, after a certain size, the bloom filters become less reliable. This would also allow us to group the most-accessed data. Currently the size of an SSTable can grow to a point where large portions of the data might not actually be accessed as often.
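The read-driven prioritization proposed above could look roughly like this Java sketch. `SSTableStats` and `ReadDrivenCompaction` are hypothetical; the proposal speaks of bloom filter hit/miss ratios, and a raw false-positive count is used here as a simpler stand-in. It illustrates the original proposal, not necessarily what the attached patches implement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-sstable statistics kept in memory.
class SSTableStats {
    final String name;
    final long reads;
    final long bloomFalsePositives;

    SSTableStats(String name, long reads, long bloomFalsePositives) {
        this.name = name;
        this.reads = reads;
        this.bloomFalsePositives = bloomFalsePositives;
    }
}

class ReadDrivenCompaction {
    // Picks the `count` most-read sstables as the next compaction group;
    // among equally read sstables, more bloom filter false positives ranks higher.
    static List<String> nextGroup(List<SSTableStats> stats, int count) {
        List<SSTableStats> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingLong((SSTableStats s) -> s.reads)
                .thenComparingLong(s -> s.bloomFalsePositives)
                .reversed()); // descending on both keys
        List<String> group = new ArrayList<>();
        for (int i = 0; i < Math.min(count, sorted.size()); i++) {
            group.add(sorted.get(i).name);
        }
        return group;
    }
}
```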
[jira] [Issue Comment Edited] (CASSANDRA-1608) Redesigned Compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092518#comment-13092518 ] Benjamin Coverston edited comment on CASSANDRA-1608 at 8/28/11 5:25 PM: bq. Additional note: test suite runs about 20% slower for me w/ Leveled compactions. Unsure if that should be expected. That's not entirely expected. It's probably due in part to the amount of flushing that we force during the tests. Flushes and compactions both trigger interval tree builds. Other than that, the codepaths are the same. was (Author: bcoverston): .bq Additional note: test suite runs about 20% slower for me w/ Leveled compactions. Unsure if that should be expected. That's not entirely expected. It's probably due in part to the amount of flushing that we force during the tests. Flushes and compactions both trigger interval tree builds. Other than that, the codepaths are the same. Redesigned Compaction - Key: CASSANDRA-1608 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Goffinet Assignee: Benjamin Coverston Fix For: 1.0 Attachments: 1608-22082011.txt, 1608-v2.txt, 1608-v4.txt, 1608-v5.txt After seeing the I/O issues in CASSANDRA-1470, I've been doing some more thinking on this subject that I wanted to lay out. I propose we redo the concept of how compaction works in Cassandra. At the moment, compaction is kicked off based on a write access pattern, not a read access pattern. In most cases, you want the opposite. You want to be able to track how well each SSTable is performing in the system. If we were to keep in-memory statistics for each SSTable and prioritize them based on access frequency and bloom filter hit/miss ratios, we could intelligently group the sstables that are read most often and schedule them for compaction. We could also schedule lower-priority maintenance on SSTables that are not often accessed.
I also propose we limit each SSTable to a fixed size, which gives us the ability to better utilize our bloom filters in a predictable manner. At the moment, after a certain size, the bloom filters become less reliable. This would also allow us to group the most-accessed data. Currently the size of an SSTable can grow to a point where large portions of the data might not actually be accessed as often.
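The link between a fixed SSTable size and predictable bloom filters follows from the textbook sizing formulas: for n keys and a target false-positive probability p, a filter needs m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. With n bounded by the SSTable size, m and the resulting false-positive rate are known in advance. The Java sketch below only evaluates these standard formulas; `BloomSizing` is a hypothetical helper, and Cassandra's own `BloomFilter` may choose its sizes differently.

```java
// Hypothetical helper evaluating the standard bloom filter sizing formulas.
class BloomSizing {
    // Bits needed for n elements at false-positive probability p: m = -n ln p / (ln 2)^2
    static long bits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions for m bits and n elements: k = (m / n) ln 2
    static int hashes(long m, long n) {
        return (int) Math.max(1, Math.round((double) m / n * Math.log(2)));
    }

    // Expected false-positive rate for m bits, n elements, k hashes: (1 - e^(-kn/m))^k
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }
}
```

For example, 1,000 keys at p = 0.01 need 9,586 bits and 7 hash functions.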
[jira] [Resolved] (CASSANDRA-3092) Delete columns using range without specifying the column names
[ https://issues.apache.org/jira/browse/CASSANDRA-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-3092. --- Resolution: Duplicate see CASSANDRA-494 Delete columns using range without specifying the column names -- Key: CASSANDRA-3092 URL: https://issues.apache.org/jira/browse/CASSANDRA-3092 Project: Cassandra Issue Type: Improvement Reporter: Tongguo Pang When we delete columns, especially columns whose names are timestamps (obtained from System.currentTimeMillis()), it's very hard to get the column names. If delete could take a range of column names (a start and an end), this operation would be much easier.
[jira] [Resolved] (CASSANDRA-2664) JDBC driver for CQL works only with Strings
[ https://issues.apache.org/jira/browse/CASSANDRA-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2664. --- Resolution: Invalid JDBC driver for CQL works only with Strings --- Key: CASSANDRA-2664 URL: https://issues.apache.org/jira/browse/CASSANDRA-2664 Project: Cassandra Issue Type: Bug Components: API Affects Versions: 0.8.0 beta 2 Environment: It happens with the JDBC driver for both 0.8.0 beta and 0.8.0-rc1 Reporter: Roman Kuzmin Labels: cql, jdbc Original Estimate: 4h Remaining Estimate: 4h CassandraPreparedStatement.java Line 141: String stringParam = makeCqlString(type.toString(param)); It crashes with a ClassCastException for all parameters that are not Strings. This is because when the method applyDualBindings is called from makeUpdate, it ALWAYS gets one and the same type as its parameter; in fact, that type is the comparator of the column family itself. In my case it is UTF8Type, and the UTF8Type.toString() method expects only Strings. I think it must be column-dependent.
svn commit: r1162598 - /cassandra/trunk/src/java/org/apache/cassandra/security/streaming/SSLIncomingStreamReader.java
Author: xedin
Date: Sun Aug 28 21:37:49 2011
New Revision: 1162598

URL: http://svn.apache.org/viewvc?rev=1162598&view=rev
Log:
Deleted empty file src/java/org/apache/cassandra/security/streaming/SSLIncomingStreamReader.java

Modified:
    cassandra/trunk/src/java/org/apache/cassandra/security/streaming/SSLIncomingStreamReader.java
[jira] [Updated] (CASSANDRA-3085) Race condition in sstable reference counting
[ https://issues.apache.org/jira/browse/CASSANDRA-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3085: -- Attachment: (was: 3085-v2.txt) Race condition in sstable reference counting Key: CASSANDRA-3085 URL: https://issues.apache.org/jira/browse/CASSANDRA-3085 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0 Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Critical Fix For: 1.0 Attachments: 3085-v2.txt, 3085.txt DataTracker gives us an atomic View of memtable/sstables, but acquiring references is not atomic. So it is possible to acquire references to an SSTableReader object that is no longer valid, as in this example: View V contains sstables {A, B}. We attempt a read in thread T using this View. Meanwhile, A and B are compacted to {C}, yielding View W. No references exist to A or B so they are cleaned up. Back in thread T we acquire references to A and B. This does not cause an error, but it will when we attempt to read from them next.
[jira] [Updated] (CASSANDRA-3085) Race condition in sstable reference counting
[ https://issues.apache.org/jira/browse/CASSANDRA-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3085: -- Attachment: 3085-v2.txt v2 encapsulates the lockless atomic acquisition in CFS.markReferenced(Interval). I'm not 100% sure how important the getRangeSlice token changes that I took out were. :) If we need those, we might need to make getRangeSlice loop manually w/o the encapsulation, since we need the view to compute the Interval, but we need the Interval to search for sstables. Race condition in sstable reference counting Key: CASSANDRA-3085 URL: https://issues.apache.org/jira/browse/CASSANDRA-3085 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0 Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Critical Fix For: 1.0 Attachments: 3085-v2.txt, 3085.txt DataTracker gives us an atomic View of memtable/sstables, but acquiring references is not atomic. So it is possible to acquire references to an SSTableReader object that is no longer valid, as in this example: View V contains sstables {A, B}. We attempt a read in thread T using this View. Meanwhile, A and B are compacted to {C}, yielding View W. No references exist to A or B so they are cleaned up. Back in thread T we acquire references to A and B. This does not cause an error, but it will when we attempt to read from them next.
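The lockless acquisition pattern can be sketched in Java as follows. `RefCountedReader` and `ReferenceAcquirer` are hypothetical stand-ins for `SSTableReader` and `CFS.markReferenced`, not their actual code: a reference is taken with a compare-and-set that refuses to increment a count that has already dropped to zero, and if any reader in the view has already been released, the references acquired so far are released and the loop retries against a freshly read view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical reader with a reference count; 0 means released (files may be deleted).
class RefCountedReader {
    final String name;
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = held by the live view

    RefCountedReader(String name) { this.name = name; }

    // Take a reference only if the reader is still live; never resurrect a released one.
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    void unref() { refs.decrementAndGet(); }

    int refCount() { return refs.get(); }
}

class ReferenceAcquirer {
    // Re-reads the current view until references to all of its readers are acquired.
    static List<RefCountedReader> markReferenced(Supplier<List<RefCountedReader>> currentView) {
        while (true) {
            List<RefCountedReader> view = currentView.get();
            List<RefCountedReader> acquired = new ArrayList<>();
            boolean ok = true;
            for (RefCountedReader r : view) {
                if (r.tryRef()) {
                    acquired.add(r);
                } else {
                    ok = false; // r was released concurrently (e.g. compacted away)
                    break;
                }
            }
            if (ok) return acquired;
            for (RefCountedReader r : acquired) r.unref(); // back out, then retry
        }
    }
}
```

Retrying converges because a released reader has already been replaced in the newer view, just as A and B are replaced by C in the example from the issue description.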
[jira] [Updated] (CASSANDRA-3085) Race condition in sstable reference counting
[ https://issues.apache.org/jira/browse/CASSANDRA-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3085: -- Attachment: 3085-v2.txt Race condition in sstable reference counting Key: CASSANDRA-3085 URL: https://issues.apache.org/jira/browse/CASSANDRA-3085 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0 Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Critical Fix For: 1.0 Attachments: 3085-v2.txt, 3085.txt DataTracker gives us an atomic View of memtable/sstables, but acquiring references is not atomic. So it is possible to acquire references to an SSTableReader object that is no longer valid, as in this example: View V contains sstables {A, B}. We attempt a read in thread T using this View. Meanwhile, A and B are compacted to {C}, yielding View W. No references exist to A or B so they are cleaned up. Back in thread T we acquire references to A and B. This does not cause an error, but it will when we attempt to read from them next. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3096) Test RoundRobinScheduler timeouts
Test RoundRobinScheduler timeouts - Key: CASSANDRA-3096 URL: https://issues.apache.org/jira/browse/CASSANDRA-3096 Project: Cassandra Issue Type: Bug Components: API Reporter: Stu Hood Assignee: Stu Hood CASSANDRA-3079 was very hasty, and introduced two bugs that would: 1) cause the scheduler to busywait after a timeout, 2) never actually throw timeouts. This calls for a test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3096) Test RoundRobinScheduler timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-3096: Attachment: 0001-Properly-throw-timeouts-decrement-the-count-of-waiters.txt 0001 - Properly throw timeouts from WeightedQueue, decrement the count of waiters on timeout, fix off-by-one in taskCount, and test all of it. Test RoundRobinScheduler timeouts - Key: CASSANDRA-3096 URL: https://issues.apache.org/jira/browse/CASSANDRA-3096 Project: Cassandra Issue Type: Bug Components: API Reporter: Stu Hood Assignee: Stu Hood Fix For: 1.0 Attachments: 0001-Properly-throw-timeouts-decrement-the-count-of-waiters.txt CASSANDRA-3079 was very hasty, and introduced two bugs that would: 1) cause the scheduler to busywait after a timeout, 2) never actually throw timeouts. This calls for a test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
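The two behaviours the patch describes (actually throwing on timeout, and decrementing the waiter count when a wait times out) can be sketched with a plain java.util.concurrent.Semaphore. ThrottledQueue below is a hypothetical simplification, not Cassandra's WeightedQueue; it only illustrates blocking with a deadline instead of busy-waiting, and a finally block that keeps the waiter count correct on every exit path.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class TimeoutQueueSketch {
    // Hypothetical throttle: a fixed number of permits plus a count of waiting threads.
    static final class ThrottledQueue {
        private final Semaphore permits;
        private final AtomicInteger waiters = new AtomicInteger();

        ThrottledQueue(int maxConcurrent) { permits = new Semaphore(maxConcurrent); }

        void acquire(long timeoutMs) throws TimeoutException, InterruptedException {
            waiters.incrementAndGet();
            try {
                // Blocks up to the deadline rather than spinning.
                if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS))
                    throw new TimeoutException("timed out after " + timeoutMs + " ms");
            } finally {
                waiters.decrementAndGet(); // runs on success, timeout, or interrupt
            }
        }

        void release() { permits.release(); }

        int waiting() { return waiters.get(); }
    }

    public static void main(String[] args) throws Exception {
        ThrottledQueue q = new ThrottledQueue(1);
        q.acquire(10);                 // takes the only permit
        try {
            q.acquire(50);             // nothing available: must time out
            System.out.println("no timeout (bug)");
        } catch (TimeoutException e) {
            System.out.println("timed out; waiters = " + q.waiting());
        }
        q.release();
    }
}
```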
[jira] [Updated] (CASSANDRA-3096) Test RoundRobinScheduler timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-3096: Attachment: (was: 0001-Properly-throw-timeouts-decrement-the-count-of-waiters.txt) Test RoundRobinScheduler timeouts - Key: CASSANDRA-3096 URL: https://issues.apache.org/jira/browse/CASSANDRA-3096 Project: Cassandra Issue Type: Bug Components: API Reporter: Stu Hood Assignee: Stu Hood Fix For: 1.0 CASSANDRA-3079 was very hasty, and introduced two bugs that would: 1) cause the scheduler to busywait after a timeout, 2) never actually throw timeouts. This calls for a test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3096) Test RoundRobinScheduler timeouts
[ https://issues.apache.org/jira/browse/CASSANDRA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-3096: Attachment: 0001-Properly-throw-timeouts-decrement-the-count-of-waiters.txt Test RoundRobinScheduler timeouts - Key: CASSANDRA-3096 URL: https://issues.apache.org/jira/browse/CASSANDRA-3096 Project: Cassandra Issue Type: Bug Components: API Reporter: Stu Hood Assignee: Stu Hood Fix For: 1.0 Attachments: 0001-Properly-throw-timeouts-decrement-the-count-of-waiters.txt CASSANDRA-3079 was very hasty, and introduced two bugs that would: 1) cause the scheduler to busywait after a timeout, 2) never actually throw timeouts. This calls for a test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2630) CLI - 'describe column family' would be nice
[ https://issues.apache.org/jira/browse/CASSANDRA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish babu krishnamoorthy updated CASSANDRA-2630: -- Attachment: cassandra-0.8.2-2630-2.txt updated comments from pavel :) CLI - 'describe column family' would be nice Key: CASSANDRA-2630 URL: https://issues.apache.org/jira/browse/CASSANDRA-2630 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.4 Reporter: Jeremy Hanna Assignee: satish babu krishnamoorthy Priority: Minor Labels: cli, lhf Fix For: 1.0 Attachments: cassandra-0.8.2-2630-1.txt, cassandra-0.8.2-2630-2.txt, cassandra-0.8.2-2630.txt I end up verifying column families a lot and using 'describe keyspace keyspace;' spits out a whole bunch of data since our keyspace has a lot of metadata. It would be really useful to have a 'describe column family;' for a given column family in the currently authenticated keyspace. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2630) CLI - 'describe column family' would be nice
[ https://issues.apache.org/jira/browse/CASSANDRA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092569#comment-13092569 ] Pavel Yaskevich commented on CASSANDRA-2630: One last thing: can you please re-attach the v2 version and check the "Grant license to ASF for inclusion in ASF works" checkbox? Thanks! CLI - 'describe column family' would be nice Key: CASSANDRA-2630 URL: https://issues.apache.org/jira/browse/CASSANDRA-2630 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.4 Reporter: Jeremy Hanna Assignee: satish babu krishnamoorthy Priority: Minor Labels: cli, lhf Fix For: 1.0 Attachments: cassandra-0.8.2-2630-1.txt, cassandra-0.8.2-2630-2.txt, cassandra-0.8.2-2630.txt I end up verifying column families a lot and using 'describe keyspace keyspace;' spits out a whole bunch of data since our keyspace has a lot of metadata. It would be really useful to have a 'describe column family;' for a given column family in the currently authenticated keyspace. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092574#comment-13092574 ] Yang Yang edited comment on CASSANDRA-2252 at 8/29/11 12:17 AM: Hi Jonathan: I checked the counters code; they currently use HeapAllocator. What is the reason we don't yet use SlabAllocator for counters? Thanks. Also, I put the idea of using two SlabAllocators (one for long-lived buffers, one for short-lived ones) in https://github.com/yangyangyyy/cassandra/commit/bc017835c64240e58c0c51b2d5f8793f3c7f3a76 https://github.com/yangyangyyy/cassandra/commit/8431ca1b9586086073e6b81d346a06e8172a97e7 Maybe it is useful. was (Author: yangyangyyy): hi Jonathan: I checked the counters code, they currently use HeapAllocator . what is the reason we don't yet use SlabAllocator for Counters? Thanks also I put the idea to use 2 SlabAllocators (one for those buffers with long life, one for those short-lived) in https://github.com/yangyangyyy/cassandra/commit/bc017835c64240e58c0c51b2d5f8793f3c7f3a76 maybe it is useful arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
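A minimal sketch of the two-allocator idea, assuming a simple bump-pointer slab: small buffers are carved out of a shared heap region so they are promoted and freed together, and large requests fall back to a plain allocation. Slab and the sizes used are hypothetical; this is not the SlabAllocator from the attached patches or from the linked commits.

```java
import java.nio.ByteBuffer;

public class SlabSketch {
    // Bump-pointer slab: hands out slices of the current region until it is full.
    static final class Slab {
        private final int regionSize;
        private final int maxSliceSize;
        private ByteBuffer region;
        private int regionsAllocated = 0;

        Slab(int regionSize, int maxSliceSize) {
            this.regionSize = regionSize;
            this.maxSliceSize = maxSliceSize;
        }

        ByteBuffer allocate(int size) {
            if (size > maxSliceSize) return ByteBuffer.allocate(size); // too big: plain heap buffer
            if (region == null || region.remaining() < size) {
                region = ByteBuffer.allocate(regionSize);             // start a new region
                regionsAllocated++;
            }
            ByteBuffer slice = region.slice();   // view starting at the current position
            slice.limit(size);
            region.position(region.position() + size);
            return slice;
        }

        int regions() { return regionsAllocated; }
    }

    public static void main(String[] args) {
        // Hypothetical split: one slab for long-lived data, one for frequently replaced values.
        Slab longLived = new Slab(1 << 20, 1 << 16);
        Slab shortLived = new Slab(1 << 20, 1 << 16);
        ByteBuffer name = longLived.allocate(16);
        for (int i = 0; i < 1000; i++) shortLived.allocate(100);
        System.out.println(name.remaining() + " " + longLived.regions() + " " + shortLived.regions());
        // prints "16 1 1"
    }
}
```

Keeping short-lived values out of the long-lived slab means a long-lived region is not pinned by garbage, and vice versa; which data counts as "long-lived" is an assumption the caller has to make.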
[jira] [Updated] (CASSANDRA-1599) Add sort/order support for secondary indexing
[ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1599: - Issue Type: Sub-task (was: New Feature) Parent: CASSANDRA-2915 Add sort/order support for secondary indexing - Key: CASSANDRA-1599 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599 Project: Cassandra Issue Type: Sub-task Components: API Reporter: Todd Nine Assignee: Jonathan Ellis Original Estimate: 32h Remaining Estimate: 32h For many users, paging is a standard use case in web applications. It would be nice to allow paging as part of a Boolean Expression. Page: start index, end index, page timestamp, sort order. When sorting, is it possible to sort both ASC and DESC? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
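The requested page parameters (start index, end index, sort order) can be sketched as a function over an in-memory result list. PagingSketch and SortOrder are hypothetical names; the page timestamp is not modeled, and a real implementation would page through an ordered index rather than sorting every result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PagingSketch {
    enum SortOrder { ASC, DESC }

    // Return rows [startIndex, endIndex) after sorting by natural order,
    // ascending or descending; out-of-range indexes are clamped.
    static <T extends Comparable<? super T>> List<T> page(List<T> rows, int startIndex, int endIndex, SortOrder order) {
        Comparator<T> cmp = Comparator.naturalOrder();
        List<T> sorted = new ArrayList<>(rows);
        sorted.sort(order == SortOrder.ASC ? cmp : cmp.reversed());
        int from = Math.min(Math.max(startIndex, 0), sorted.size());
        int to = Math.min(Math.max(endIndex, from), sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(5, 1, 4, 2, 3);
        System.out.println(page(rows, 0, 2, SortOrder.ASC));  // [1, 2]
        System.out.println(page(rows, 0, 2, SortOrder.DESC)); // [5, 4]
    }
}
```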
[jira] [Updated] (CASSANDRA-1598) Add Boolean Expression to secondary querying
[ https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1598: - Issue Type: Sub-task (was: New Feature) Parent: CASSANDRA-2915 Add Boolean Expression to secondary querying Key: CASSANDRA-1598 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598 Project: Cassandra Issue Type: Sub-task Components: API Affects Versions: 0.7 beta 3 Reporter: Todd Nine Add Boolean operators similar to Lucene-style searches. Currently there is implicit support for the && (intersection) operator. It would be helpful to also add support for ||/Union operators. I envision that the client would be required to construct the expression tree and pass it via the Thrift interface. BooleanExpression -- BooleanOrIndexExpression -- BooleanOperator -- BooleanOrIndexExpression I'd like to take a crack at this since it will greatly improve my Datanucleus plugin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1598) Add Boolean Expression to secondary querying
[ https://issues.apache.org/jira/browse/CASSANDRA-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Nine updated CASSANDRA-1598: - Issue Type: New Feature (was: Sub-task) Parent: (was: CASSANDRA-2915) Add Boolean Expression to secondary querying Key: CASSANDRA-1598 URL: https://issues.apache.org/jira/browse/CASSANDRA-1598 Project: Cassandra Issue Type: New Feature Components: API Affects Versions: 0.7 beta 3 Reporter: Todd Nine Add Boolean operators similar to Lucene-style searches. Currently there is implicit support for the && (intersection) operator. It would be helpful to also add support for ||/Union operators. I envision that the client would be required to construct the expression tree and pass it via the Thrift interface. BooleanExpression -- BooleanOrIndexExpression -- BooleanOperator -- BooleanOrIndexExpression I'd like to take a crack at this since it will greatly improve my Datanucleus plugin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
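A client-built expression tree of the kind described (leaf index clauses combined by AND/OR nodes) might look like the sketch below. Expr, Eq, BoolExpr and Op are hypothetical names chosen for illustration; they are not Thrift types, and evaluation here runs against a single in-memory row rather than an index.

```java
import java.util.Map;

public class BooleanExprSketch {
    // Hypothetical expression node evaluated against one row (column name -> value).
    interface Expr {
        boolean matches(Map<String, String> row);
    }

    // Leaf: an equality clause on one indexed column.
    static final class Eq implements Expr {
        final String column;
        final String value;

        Eq(String column, String value) { this.column = column; this.value = value; }

        public boolean matches(Map<String, String> row) { return value.equals(row.get(column)); }
    }

    enum Op { AND, OR }

    // Inner node: two subexpressions joined by AND (intersection) or OR (union).
    static final class BoolExpr implements Expr {
        final Expr left;
        final Op op;
        final Expr right;

        BoolExpr(Expr left, Op op, Expr right) { this.left = left; this.op = op; this.right = right; }

        public boolean matches(Map<String, String> row) {
            return op == Op.AND ? left.matches(row) && right.matches(row)
                                : left.matches(row) || right.matches(row);
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("state", "CA", "status", "active");
        // (state = NY OR state = CA) AND status = active
        Expr expr = new BoolExpr(
                new BoolExpr(new Eq("state", "NY"), Op.OR, new Eq("state", "CA")),
                Op.AND,
                new Eq("status", "active"));
        System.out.println(expr.matches(row)); // true
    }
}
```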
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092600#comment-13092600 ] Yang Yang commented on CASSANDRA-2252: -- Re: "if it's a memtable-related operation": aren't the CounterContexts ultimately inserted into the CounterColumns, and hence into the Memtable too? For example: CounterMutation.computeShardMerger() -> CounterColumn.computeOldShardMerger() -> ByteBuffer contextManager.computeOldShardMerger { ... ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance); return merger.context; } The merger.context ByteBuffer is inserted into the CounterColumn by CounterColumn.computeOldShardMerger(). Thanks, Yang arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092600#comment-13092600 ] Yang Yang edited comment on CASSANDRA-2252 at 8/29/11 2:33 AM: --- if it's a memtable-related operation CounterContext.allocate produces a ByteBuffer , some of which goes into CounterColumn, hence Memtable, it seems. for example: CounterMutation.computeShardMerger() == CounterColumn.computeOldShardMerger() === ByteBuffer contextManager.computeOldShardMerger { .. ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance); return merger.context; } the merger.context is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger() Thanks Yang was (Author: yangyangyyy): if it's a memtable-related operation aren't the CounterContext's finally inserted into the CounterColumns, hence the Memtable too ? for example: CounterMutation.computeShardMerger() == CounterColumn.computeOldShardMerger() === ByteBuffer contextManager.computeOldShardMerger { .. ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance); return merger.context; } the merger.context is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger() Thanks Yang arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092600#comment-13092600 ] Yang Yang edited comment on CASSANDRA-2252 at 8/29/11 2:33 AM: --- Re: "if it's a memtable-related operation": but CounterContext.allocate produces ByteBuffers, some of which go into CounterColumn and hence into the Memtable, it seems. For example: CounterMutation.computeShardMerger() -> CounterColumn.computeOldShardMerger() -> ByteBuffer contextManager.computeOldShardMerger { ... ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance); return merger.context; } The merger.context ByteBuffer is inserted into the CounterColumn by CounterColumn.computeOldShardMerger(). Thanks, Yang was (Author: yangyangyyy): if it's a memtable-related operation CounterContext.allocate produces a ByteBuffer , some of which goes into CounterColumn, hence Memtable, it seems. for example: CounterMutation.computeShardMerger() == CounterColumn.computeOldShardMerger() === ByteBuffer contextManager.computeOldShardMerger { .. ContextState merger = ContextState.allocate(2, nbDelta, HeapAllocator.instance); return merger.context; } the merger.context is a ByteBuffer that is inserted into CounterColumn by CounterColumn.computeOldShardMerger() Thanks Yang arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092602#comment-13092602 ] Jonathan Ellis commented on CASSANDRA-2252: --- Happy to look at a patch to fix that. Please open a new ticket. arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092606#comment-13092606 ] Todd Nine commented on CASSANDRA-2915: -- I don't necessarily think there is a one-to-one relationship between a column and a Lucene document field. In our case we need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. However, we also want to tokenize the email, first name and last name columns to allow our customer support people to perform partial name matching. I think a 1-to-N mapping from column to document field is required to allow this sort of functionality. As for expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires, rather than adding an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE, etc. is a must. Most users have become accustomed to this functionality with RDBMSs. If these features cause potential performance problems, I think this should be documented so that users have enough information to determine whether they can rely on the Lucene index or should build their own index directly. Lastly, this is a huge feature for the hector-jpa plugin; what can I do to help? 
Lucene based Secondary Indexes -- Key: CASSANDRA-2915 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Jason Rutherglen Labels: secondary_index Secondary indexes (of type KEYS) suffer from a number of limitations in their current form: - Multiple IndexClauses only work when there is a subset of rows under the highest clause - One new column family is created per index; this means 10 new CFs for 10 secondary indexes This ticket will use the Lucene library to implement secondary indexes as one index per CF, and utilize the Lucene query engine to handle multiple index clauses. Also, by using Lucene we get a highly optimized file format. There are a few parallels we can draw between Cassandra and Lucene. Lucene indexes segments in memory and then flushes them to disk, so we can sync our memtable flushes to Lucene flushes. Lucene also has optimize(), which correlates to our compaction process, so these can be synced as well. We will also need to correlate column validators to Lucene tokenizers so the data can be stored properly; the big win is that once this is done, we can perform complex queries within a column, like wildcard searches. The downside of this approach is that we will need to read before write, since documents in Lucene are written as complete documents. For random workloads with lots of indexed columns, this means we need to read the document from the index, update it, and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
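The 1-to-N column-to-field mapping described in the comment can be sketched without any Lucene dependency: one column value produces an exact, lowercased field plus a tokenized field for partial matching. The field-name suffixes and the split-on-non-alphanumerics tokenizer are hypothetical simplifications; a real integration would use Lucene analyzers instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MultiFieldSketch {
    // Map one column value to several index fields (1-to-N).
    static Map<String, List<String>> indexFields(String column, String value) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        String lower = value.toLowerCase(Locale.ROOT);

        // Field 1: the whole value, lowercased, for exact matching.
        List<String> exact = new ArrayList<>();
        exact.add(lower);
        fields.put(column + "_exact", exact);

        // Field 2: alphanumeric tokens, for partial matching.
        List<String> tokens = new ArrayList<>();
        for (String t : lower.split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        fields.put(column + "_tokens", tokens);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(indexFields("email", "Jane.Doe@Example.com"));
        // {email_exact=[jane.doe@example.com], email_tokens=[jane, doe, example, com]}
    }
}
```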
[jira] [Commented] (CASSANDRA-2252) arena allocation for memtables
[ https://issues.apache.org/jira/browse/CASSANDRA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092608#comment-13092608 ] Yang Yang commented on CASSANDRA-2252: -- Cool. Actually, after some thought, I think we need to take more care in using SlabAllocator for counters. I realized this when you said "only if it's a memtable-related operation": that is certainly true for temporary ByteBuffers that are thrown away immediately, so they get reclaimed in the new-gen GC and never reach the old gen. For counters, the column values (which contain the CounterContext) change a lot. If we assume that each counter is updated 1000 times during the lifetime of a memtable before it is flushed, then in a typical 2MB slab, 99.9% of the buffers it contains will be unreachable before flushing. So when the remaining 0.1% of buffers are promoted, they occupy 2MB instead of their actual size, which may waste more memory than the fragmentation problem the slab is meant to avoid. So in this case (or, more generally, in all cases where updates are frequent), using HeapAllocator may be better. Thanks, Yang arena allocation for memtables -- Key: CASSANDRA-2252 URL: https://issues.apache.org/jira/browse/CASSANDRA-2252 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Jonathan Ellis Fix For: 1.0 Attachments: 0001-add-MemtableAllocator.txt, 0002-add-off-heap-MemtableAllocator-support.txt, 2252-v3.txt, 2252-v4.txt, merged-2252.tgz The memtable design practically actively fights Java's GC design. Todd Lipcon gave a good explanation over on HBASE-3455. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
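The arithmetic behind the comment can be made explicit. Assuming a slab holds only equal-sized counter values and only the latest of each counter's 1000 versions is still reachable at flush time (both simplifying assumptions from the comment, not measurements), roughly 1/1000 of the slab is live while the whole region stays pinned:

```java
public class SlabWasteSketch {
    // Live bytes in one slab if only the newest of `updatesPerCounter` versions of
    // each value survives; assumes the slab holds equal-sized counter values only.
    static long liveBytes(long slabBytes, int updatesPerCounter) {
        return slabBytes / updatesPerCounter;
    }

    public static void main(String[] args) {
        long slabBytes = 2L * 1024 * 1024; // 2MB slab, as in the comment
        int updates = 1000;                // assumed updates per counter per memtable lifetime
        long live = liveBytes(slabBytes, updates);
        System.out.printf("live %d of %d retained bytes (%.1f%%)%n",
                live, slabBytes, 100.0 * live / slabBytes);
        // prints "live 2097 of 2097152 retained bytes (0.1%)"
    }
}
```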
[jira] [Updated] (CASSANDRA-2630) CLI - 'describe column family' would be nice
[ https://issues.apache.org/jira/browse/CASSANDRA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish babu krishnamoorthy updated CASSANDRA-2630: -- Attachment: cassandra-0.8.2-2630-2.txt re-attach v2 version and check Grant license to ASF for inclusion in ASF works checkbox CLI - 'describe column family' would be nice Key: CASSANDRA-2630 URL: https://issues.apache.org/jira/browse/CASSANDRA-2630 Project: Cassandra Issue Type: Improvement Affects Versions: 0.8.4 Reporter: Jeremy Hanna Assignee: satish babu krishnamoorthy Priority: Minor Labels: cli, lhf Fix For: 1.0 Attachments: cassandra-0.8.2-2630-1.txt, cassandra-0.8.2-2630-2.txt, cassandra-0.8.2-2630-2.txt, cassandra-0.8.2-2630.txt I end up verifying column families a lot and using 'describe keyspace keyspace;' spits out a whole bunch of data since our keyspace has a lot of metadata. It would be really useful to have a 'describe column family;' for a given column family in the currently authenticated keyspace. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092606#comment-13092606 ] Todd Nine edited comment on CASSANDRA-2915 at 8/29/11 4:29 AM: --- I don't necessaryly think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Has anyone looked at existing code in ElasticSearch to avoid some of the pitfalls they have already experienced in building something similar? http://www.elasticsearch.org/ Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? was (Author: tnine): I don't necessaryly think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. 
However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? Lucene based Secondary Indexes -- Key: CASSANDRA-2915 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Jason Rutherglen Labels: secondary_index Secondary indexes (of type KEYS) suffer from a number of limitations in their current form: - Multiple IndexClauses only work when there is a subset of rows under the highest clause - One new column family is created per index; this means 10 new CFs for 10 secondary indexes This ticket will use the Lucene library to implement secondary indexes as one index per CF, and utilize the Lucene query engine to handle multiple index clauses. Also, by using Lucene we get a highly optimized file format. There are a few parallels we can draw between Cassandra and Lucene. Lucene indexes segments in memory and then flushes them to disk, so we can sync our memtable flushes to Lucene flushes. Lucene also has optimize(), which correlates to our compaction process, so these can be synced as well. 
We will also need to correlate column validators to Lucene tokenizers so the data can be stored properly; the big win is that once this is done, we can perform complex queries within a column, like wildcard searches. The downside of this approach is that we will need to read before write, since documents in Lucene are written as complete documents. For random workloads with lots of indexed columns, this means we need to read the document from the index, update it, and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092606#comment-13092606 ] Todd Nine edited comment on CASSANDRA-2915 at 8/29/11 4:30 AM: --- I don't necessarily think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Has anyone looked at existing code in ElasticSearch to avoid some of the pitfalls they have already experienced in building something similar? http://www.elasticsearch.org/ Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? was (Author: tnine): I don't necessaryly think there is a 1 to 1 relationship between a column and a Lucene document field. In our case we have the need to index fields in more than one manner. For instance, we index users as straight strings (lowercased) with email, first name and last name columns. 
However we also want to tokenize the email, first and last name columns to allow our customer support people to perform partial name matching. I think a 1 to N mapping is required for column to document field to allow this sort of functionality. As far as expiration on columns, is there a system event that we can hook into to just force a document reindex when a column expires rather than add an additional field that will need to be sorted from? As per Jason's previous post, I think supporting ORDER BY, GROUP BY, COUNT, LIKE etc are a must. Most users have become accustomed to this functionality with RDBMS. If they cause potential performance problems, I think this should be documented so that users have enough information to determine if they can rely on the Lucene index or should build their own index directly. Has anyone looked at existing code in ElasticSearch to avoid some of the pitfalls they have already experienced in building something similar? http://www.elasticsearch.org/ Lastly, this is a huge feature for the hector-jpa plugin, what can I do to help? Lucene based Secondary Indexes -- Key: CASSANDRA-2915 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Jason Rutherglen Labels: secondary_index Secondary indexes (of type KEYS) suffer from a number of limitations in their current form: - Multiple IndexClauses only work when there is a subset of rows under the highest clause - One new column family is created per index; this means 10 new CFs for 10 secondary indexes This ticket will use the Lucene library to implement secondary indexes as one index per CF, and utilize the Lucene query engine to handle multiple index clauses. Also, by using Lucene we get a highly optimized file format. There are a few parallels we can draw between Cassandra and Lucene. 
Lucene indexes segments in memory and then flushes them to disk, so we can sync our memtable flushes to Lucene flushes. Lucene also has optimize(), which correlates to our compaction process, so these can be synced as well. We will also need to correlate column validators to Lucene tokenizers so the data can be stored properly; the big win is that once this is done, we can perform complex queries within a column, like wildcard searches. The downside of this approach is that we will need to read before write, since documents in Lucene are written as complete documents. For random workloads with lots of indexed columns, this means we need to read the document from the index, update it, and write it back. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3056) Able to set path location of HeapDump in cassandra-env
[ https://issues.apache.org/jira/browse/CASSANDRA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish babu krishnamoorthy updated CASSANDRA-3056: -- Attachment: CASSANDRA-3056-1.txt Added HeapDumpPath parameter to cassandra-env.sh Able to set path location of HeapDump in cassandra-env -- Key: CASSANDRA-3056 URL: https://issues.apache.org/jira/browse/CASSANDRA-3056 Project: Cassandra Issue Type: Improvement Affects Versions: 0.7.8, 0.8.4 Reporter: David Talbott Priority: Minor Labels: lhf Attachments: CASSANDRA-3056-1.txt We should be able to designate the path location for any heap dumps that are performed. By default, with this not set, the heap dump can land on the root disk and fill the drive. We should be able to solve this by simply inserting JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=<path to dir>" into cassandra-env.sh as a default option that is available and set. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira