[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113677#comment-15113677 ]

Sam Tunnicliffe commented on CASSANDRA-10661:

[~xedin] SGTM!

> Integrate SASI to Cassandra
> ---------------------------
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Pavel Yaskevich
> Assignee: Pavel Yaskevich
> Labels: sasi
> Fix For: 3.x
>
> We have recently released a new secondary index engine
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There are
> still a couple of things to work out regarding 3.x, since it's currently
> targeted at the 2.0 release. I want to make this an umbrella issue for all of the
> things related to the integration of SASI, which are also tracked in
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline Cassandra
> 3.x release.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113715#comment-15113715 ]

Navjyot Nishant commented on CASSANDRA-8028:

Hi all, we are getting a similar issue while autocompaction is running on a few of our nodes. The following error is being logged; can someone please suggest what is causing this and how to resolve it? We use Cassandra 2.1.9. Please let me know if further information is required.

ERROR [CompactionExecutor:3] 2016-01-23 11:54:50,198 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:3,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:203) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1987) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:370) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:96) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:179) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

> Unable to compute when histogram overflowed
> -------------------------------------------
>
> Key: CASSANDRA-8028
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8028
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Environment: Linux
> Reporter: Gianluca Borello
> Assignee: Carl Yeksigian
> Fix For: 2.1.3
> Attachments: 8028-2.1-clean.txt, 8028-2.1-v2.txt, 8028-2.1.txt, 8028-trunk.txt, sstable-histogrambuster.tar.bz2
>
> It seems like with 2.1.0 histograms can't be computed most of the time:
> $ nodetool cfhistograms draios top_files_by_agent1
> nodetool: Unable to compute when histogram overflowed
> See 'nodetool help' or 'nodetool help '.
> I can probably find a way to attach a .cql script to reproduce it, but I
> suspect it must be obvious to replicate it as it happens on more than 50% of
> my column families.
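The "Unable to compute ceiling for max when histogram overflowed" error above comes from Cassandra's EstimatedHistogram, which counts values in a fixed series of exponentially sized buckets plus one final overflow bucket; once any value (here, a huge partition) lands in the overflow bucket, statistics such as mean() refuse to compute. The toy model below is a hedged sketch of that mechanism only: the class name is borrowed, but the bucket offsets, API, and mean formula are simplified illustrations, not Cassandra's actual implementation.

```python
# Toy model (assumption-laden sketch, not Cassandra's real EstimatedHistogram):
# counts are kept per bucket, with one extra trailing bucket for values larger
# than the largest offset. Statistics become uncomputable once that overflow
# bucket is non-empty, which is what the exception in the logs reports.

class EstimatedHistogram:
    def __init__(self, bucket_offsets):
        self.offsets = bucket_offsets                   # upper bound of each bucket
        self.counts = [0] * (len(bucket_offsets) + 1)   # +1 trailing overflow bucket

    def add(self, value):
        for i, upper in enumerate(self.offsets):
            if value <= upper:
                self.counts[i] += 1
                return
        self.counts[-1] += 1                            # too large: overflow bucket

    def is_overflowed(self):
        return self.counts[-1] > 0

    def mean(self):
        # Approximate mean using each bucket's upper bound; impossible to
        # bound once a value fell past the last bucket.
        if self.is_overflowed():
            raise ValueError("Unable to compute when histogram overflowed")
        total = sum(self.counts[:-1])
        weighted = sum(c * upper for c, upper in zip(self.counts, self.offsets))
        return weighted / total

h = EstimatedHistogram([10, 100, 1000])
h.add(50)       # fits in a bucket; mean() would work here
h.add(5000)     # larger than the last offset -> overflow, mean() now raises
print(h.is_overflowed())  # True
```

This matches the pattern in the tickets above: one extremely large compacted partition (note "Compacted partition maximum bytes: 30753941057" in the cfstats output later in this thread) pushes the partition-size histogram into its overflow bucket.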
[jira] [Updated] (CASSANDRA-11063) Unable to compute ceiling for max when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navjyot Nishant updated CASSANDRA-11063:

Description: Issue https://issues.apache.org/jira/browse/CASSANDRA-8028 seems related to the error we are getting, but we are seeing it with Cassandra 2.1.9: while autocompaction is running, it keeps throwing the following errors. We are unsure if it is a bug or can be resolved; please suggest.

WARN [CompactionExecutor:3] 2016-01-23 13:30:40,907 SSTableWriter.java:240 - Compacting large partition gccatlgsvcks/category_name_dedup:66611300 (138152195 bytes)
ERROR [CompactionExecutor:1] 2016-01-23 13:30:50,267 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:203) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1987) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:370) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:96) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:179) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

Additional info: cfstats is running fine for that table...

~ $ nodetool cfstats gccatlgsvcks.category_name_dedup
Keyspace: gccatlgsvcks
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: category_name_dedup
        SSTable count: 6
        Space used (live): 836089073
        Space used (total): 836089073
        Space used by snapshots (total): 3621519
        Off heap memory used (total): 6925736
        SSTable Compression Ratio: 0.03725398763856016
        Number of keys (estimate): 3004
        Memtable cell count: 0
        Memtable data size: 0
        Memtable off heap memory used: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 5240
        Bloom filter off heap memory used: 5192
        Index summary off heap memory used: 1200
        Compression metadata off heap memory used: 6919344
        Compacted partition minimum bytes: 125
        Compacted partition maximum bytes: 30753941057
        Compacted partition mean bytes: 8352388
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113690#comment-15113690 ]

Pavel Yaskevich commented on CASSANDRA-10661:

[~beobal] Awesome, will try to do everything tomorrow, thanks!
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112994#comment-15112994 ]

Pavel Yaskevich edited comment on CASSANDRA-10661 at 1/23/16 11:04 AM:

[~beobal] How about `unfilteredCluster`? Since we are on the same page about this, here is what I'm thinking - we are going to update the README.md we have in xedin/sasi and I'm going to put it into doc/SASI.md, squash all 17 commits into one, and push to trunk. Sounds good?

was (Author: xedin): [~beobal] How about `unfilteredCluster`? Since we are on the same page about this, here is what I'm thinking - we are going to avoid README.md we have in xedin/sasi and I'm going to put it into doc/SASI.md, squash all 17 commits into one and push to trunk, sounds good?
[jira] [Commented] (CASSANDRA-8028) Unable to compute when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113748#comment-15113748 ]

Navjyot Nishant commented on CASSANDRA-8028:

I have created https://issues.apache.org/jira/browse/CASSANDRA-11063 to track this issue.
[jira] [Created] (CASSANDRA-11063) Unable to compute ceiling for max when histogram overflowed
Navjyot Nishant created CASSANDRA-11063:

Summary: Unable to compute ceiling for max when histogram overflowed
Key: CASSANDRA-11063
URL: https://issues.apache.org/jira/browse/CASSANDRA-11063
Project: Cassandra
Issue Type: Bug
Components: Compaction
Environment: Cassandra 2.1.9 on RHEL
Reporter: Navjyot Nishant

Issue https://issues.apache.org/jira/browse/CASSANDRA-8028 seems related to the error we are getting, but we are seeing it with Cassandra 2.1.9: while autocompaction is running, it keeps throwing the following errors. We are unsure if it is a bug or can be resolved; please suggest.

ERROR [CompactionExecutor:3] 2016-01-23 11:52:50,197 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:3,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:203) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1987) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:370) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:96) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:179) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
[jira] [Updated] (CASSANDRA-11063) Unable to compute ceiling for max when histogram overflowed
[ https://issues.apache.org/jira/browse/CASSANDRA-11063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navjyot Nishant updated CASSANDRA-11063:

Description: Issue https://issues.apache.org/jira/browse/CASSANDRA-8028 seems related to the error we are getting, but we are seeing it with Cassandra 2.1.9: while autocompaction is running, it keeps throwing the following errors. We are unsure if it is a bug or can be resolved; please suggest.

WARN [CompactionExecutor:3] 2016-01-23 13:30:40,907 SSTableWriter.java:240 - Compacting large partition gccatlgsvcks/category_name_dedup:66611300 (138152195 bytes)
ERROR [CompactionExecutor:1] 2016-01-23 13:30:50,267 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:203) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.metadata.StatsMetadata.getEstimatedDroppableTombstoneRatio(StatsMetadata.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.io.sstable.SSTableReader.getEstimatedDroppableTombstoneRatio(SSTableReader.java:1987) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.worthDroppingTombstones(AbstractCompactionStrategy.java:370) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:96) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:179) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getNextBackgroundTask(WrappingCompactionStrategy.java:84) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:230) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

Additional info: cfstats is running fine for that table...

~ $ nodetool cfstats gccatlgsvcks.category_name_dedup
Keyspace: gccatlgsvcks
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Flushes: 0
        Table: category_name_dedup
        SSTable count: 6
        Space used (live): 836314727
        Space used (total): 836314727
        Space used by snapshots (total): 3621519
        Off heap memory used (total): 6930368
        SSTable Compression Ratio: 0.03725358753117693
        Number of keys (estimate): 3004
        Memtable cell count: 0
        Memtable data size: 0
        Memtable off heap memory used: 0
        Memtable switch count: 0
        Local read count: 0
        Local read latency: NaN ms
        Local write count: 0
        Local write latency: NaN ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 5240
        Bloom filter off heap memory used: 5192
        Index summary off heap memory used: 1200
        Compression metadata off heap memory used: 6923976
        Compacted partition minimum bytes: 125
        Compacted partition maximum bytes: 30753941057
        Compacted partition mean bytes: 8352388
        Average live cells per slice (last five minutes): 0.0
        Maximum live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

cfhistograms is also running fine...

~ $ nodetool cfhistograms gccatlgsvcks category_name_dedup
gccatlgsvcks/category_name_dedup histograms

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         0.00      0.00           0.00          1109            20
75%         0.00      0.00           0.00          2299            42
95%         0.00
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113836#comment-15113836 ]

Jack Krupansky edited comment on CASSANDRA-10661 at 1/23/16 6:57 PM:

Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, it will always query by prefix or contains. For example, I may want to query for a full first name - where the full first name really is "J" - and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc.

Or, can I indeed have two indexes on a single column, one a traditional exact match and one a prefix match? Hmmm... in which case, which gets used if I just specify a column name?

CREATE INDEX first_name_full ON mytable (first_name)...
CREATE CUSTOM INDEX first_name_prefix ON mytable (first_name)...

(I may be confused here - can you specify an index name in place of a column name in a relation in a SELECT/WHERE clause (SELECT... WHERE... first_name_exact = 'Joe')? I don't see any doc/spec that indicates that you can, and I'm not sure why I thought that you could. But I don't see any code that detects and fails on this case at CREATE INDEX time. The code checks for "everything but name" rather than detecting two non-keys/values indexes on the same column.)

It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries.

Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better)? Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE? After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me.

Maybe, for the first_name use case I mentioned, the user would be better off with a Materialized View using first_name in the PK instead of the SPARSE SASI index. In fact, by placing first_name in the partition key of the MV I could assure that all base table rows with the same first name would be on the same node. If all of that is true, we will need to give users some decent guidance on when to use SPARSE SASI vs. MV (vs. classic secondary... or even DSE Search).
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113918#comment-15113918 ]

Jordan West commented on CASSANDRA-10661:

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation in CQL, and we decided not to further extend the grammar, since we have already had to scale back our grammar changes to later phases (removing OR, grouping, and != support for now). Ideally, CQL would support a `LIKE` operator similar to SQL, and depending on whether the index was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`.

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.)

It does, but all queries on numerical data are exact matches as well, which, thinking about it, may make the `PREFIX` option confusing for numeric types. SPARSE is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps) but a small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and for most non-numerical data it is not an ideal setting either.

For example, in a large data set of first names the number of names will be small compared to the number of keys, and given the distribution of first names, using SPARSE will increase the size of the index and at best have zero effect on query performance, but may hurt it.
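The exact-match-vs-prefix distinction debated above can be illustrated with a toy in-memory index. This is a hedged sketch only: the `PrefixIndex` class and its methods are invented for illustration, not SASI's actual on-disk term dictionaries, and a real index would not scan every term on a prefix query.

```python
# Toy illustration (not SASI's implementation): an exact-match lookup
# returns only rows whose value is the literal term, while a prefix
# lookup returns every row whose value starts with the query string.

class PrefixIndex:
    def __init__(self):
        self.terms = {}                      # term -> set of row keys

    def add(self, term, key):
        self.terms.setdefault(term, set()).add(key)

    def exact(self, term):
        # Exact match: the behavior Jack is asking how to get.
        return self.terms.get(term, set())

    def prefix(self, p):
        # Prefix match: what a PREFIX-mode index answers.
        keys = set()
        for term, term_keys in self.terms.items():
            if term.startswith(p):
                keys |= term_keys
        return keys

idx = PrefixIndex()
idx.add("J", 1)
idx.add("John", 2)
idx.add("James", 3)

print(sorted(idx.exact("J")))    # [1] - only the literal value "J"
print(sorted(idx.prefix("J")))   # [1, 2, 3] - "J", "John", and "James"
```

As Jordan notes above, SASI can answer both shapes of query; the constraint at the time was CQL's grammar (no `LIKE`), not the index structure itself.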
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113918#comment-15113918 ]

Jordan West edited comment on CASSANDRA-10661 at 1/23/16 7:42 PM:

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation in CQL, and we decided not to further extend the grammar, since we have already had to scale back our grammar changes to later phases (removing OR, grouping, and != support for now). Ideally, `=` would mean exact match and CQL would support a `LIKE` operator similar to SQL, and depending on whether the index was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`.

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.)

It does, but all queries on numerical data are exact matches as well, which, thinking about it, may make the `PREFIX` option confusing for numeric types. SPARSE is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps) but a small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and for most non-numerical data it is not an ideal setting either.

For example, in a large data set of first names the number of names will be small compared to the number of keys, and given the distribution of first names, using SPARSE will increase the size of the index and at best have zero effect on query performance, but may hurt it.
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113816#comment-15113816 ] Jack Krupansky commented on CASSANDRA-10661: So is this stuff actually ready to release? I mean, consistent with the new philosophy that "trunk is always releasable"? IOW, if it does get committed, it will be in 3.4 no matter what? I only ask because it just seemed that there was stuff in flux fairly recently (a couple of days ago), suggesting it wasn't quite baked enough to be considered "releasable". 
[jira] [Updated] (CASSANDRA-11060) Allow DTCS old SSTable filtering to use min timestamp instead of max
[ https://issues.apache.org/jira/browse/CASSANDRA-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Deng updated CASSANDRA-11060: - Labels: dtcs (was: ) > Allow DTCS old SSTable filtering to use min timestamp instead of max > > > Key: CASSANDRA-11060 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11060 > Project: Cassandra > Issue Type: Improvement >Reporter: Sam Bisbee > Labels: dtcs > > We have observed a DTCS behavior when using TTLs where SSTables are never or > very rarely fully expired due to compaction, allowing expired data to be > "stuck" in large partially expired SSTables. > This is because compaction filtering is performed on the max timestamp, which > continues to grow as SSTables are compacted together. This means they will > never move past max_sstable_age_days. With a sufficiently large TTL, like 30 > days, this allows old but not expired SSTables to continue combining and > never become fully expired, even with a max_sstable_age_days of 1. > As a result we have seen expired data hang around in large SSTables for over > six months longer than it should have. This is obviously wasteful and a disk > capacity issue. > As a result we have been running an extended version of DTCS called MTCS in > some deployments. The only change is that it uses min timestamp instead of > max for compaction filtering (filterOldSSTables()). This allows SSTables to > move beyond max_sstable_age_days and stop compacting, which means the entire > SSTable can become fully expired and be dropped off disk as intended. > You can see and test MTCS here: https://github.com/threatstack/mtcs > I am not advocating that MTCS be its own stand alone compaction strategy. > However, I would like to see a configuration option for DTCS that allows you > to specify whether old SSTables should be filtered on min or max timestamp.
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113836#comment-15113836 ] Jack Krupansky commented on CASSANDRA-10661: Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, it will always query by prefix or contains. For example, if I want to query for a full first name, like where the full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match? Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON table CREATE CUSTOM INDEX first_name_prefix ... It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better)? Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE? After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. 
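The two-indexes-on-one-column scenario being asked about could be written out as full CQL roughly as follows (table and index names are hypothetical; whether the two indexes may in fact coexist on the same column is exactly the open question here):

```sql
CREATE TABLE demo.users (id uuid PRIMARY KEY, first_name text);

-- Classic built-in secondary index: equality (exact-match) lookups only.
CREATE INDEX first_name_full ON demo.users (first_name);

-- SASI index in PREFIX mode on the same column, per the SASI syntax.
CREATE CUSTOM INDEX first_name_prefix ON demo.users (first_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'PREFIX'};
```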
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113836#comment-15113836 ] Jack Krupansky edited comment on CASSANDRA-10661 at 1/23/16 5:58 PM: -
Maybe, for the first_name use case I mentioned, the user would be better off with a first_name materialized view using first_name in the PK instead of the SPARSE SASI index. In fact, by placing first_name in the partition key of the MV I could ensure that all base table rows with the same first name would be on the same node. If all of that is true, we will need to give users some decent guidance on when to use SPARSE SASI vs. MV (vs. classic secondary... or even DSE Search.) 
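A minimal sketch of the materialized-view alternative floated above (hypothetical schema; the MV syntax is the one introduced in Cassandra 3.0):

```sql
CREATE TABLE demo.users (id uuid PRIMARY KEY, first_name text, last_name text);

-- Putting first_name in the MV partition key means all base rows sharing a
-- first name land on the same node and can be fetched with a single query.
CREATE MATERIALIZED VIEW demo.users_by_first_name AS
  SELECT * FROM demo.users
  WHERE first_name IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (first_name, id);

-- Exact-match lookup:
SELECT * FROM demo.users_by_first_name WHERE first_name = 'J';
```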
[jira] [Updated] (CASSANDRA-11056) Use max timestamp to decide DTCS-timewindow-membership
[ https://issues.apache.org/jira/browse/CASSANDRA-11056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Deng updated CASSANDRA-11056: - Labels: dtcs (was: ) > Use max timestamp to decide DTCS-timewindow-membership > -- > > Key: CASSANDRA-11056 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11056 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Björn Hegerfors > Labels: dtcs > Attachments: cassandra-2.2-CASSANDRA-11056.txt > > > TWCS (CASSANDRA-9666) uses max timestamp to decide time window membership; we > should do the same in DTCS so that users can configure DTCS to work exactly > like TWCS
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113842#comment-15113842 ] Jon Haddad commented on CASSANDRA-10661: If sparse means what Jack is implying, perhaps a better name for it would be EXACT.
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113836#comment-15113836 ] Jack Krupansky edited comment on CASSANDRA-10661 at 1/23/16 4:55 PM: -
[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113842#comment-15113842 ] Jon Haddad edited comment on CASSANDRA-10661 at 1/23/16 5:02 PM: - If sparse means what Jack is implying, perhaps a better name for it would be EXACT. Using SPARSE will usually result in people asking "what does that mean?", and the answer will be "exact match", so I propose we just use that, as it'll cut down on the number of questions people have.
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113967#comment-15113967 ] Pavel Yaskevich commented on CASSANDRA-10661: - bq. So is this stuff actually ready to release? I mean, consistent with the new philosophy that "trunk is always releasable"? IOW, if it does get committed, it will be in 3.4 no matter what? I only ask because it just seemed that there was stuff in flux fairly recently (a couple days ago), suggested it wasn't quite baked enough to be considered "releasable". Yes, it is ready to release: the recently added changes are ported from 2.0, and clustering support is just a couple of lines of additional filtering, with no internal data-structure changes. This is also an opt-in feature that has no effect on core functionality until enabled, which is also why we don't want to make any of the CQL front-end changes right away but prefer a more gradual migration.
[jira] [Commented] (CASSANDRA-10937) OOM on multiple nodes on write load (v. 3.0.0), problem also present on DSE-4.8.3, but there it survives more time
[ https://issues.apache.org/jira/browse/CASSANDRA-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114212#comment-15114212 ] Peter Kovgan commented on CASSANDRA-10937: -- Thank you, Jack. I answer inline:

I still don't see any reason to believe that there is a bug here and that the primary issue is that you are overloading the cluster.

Peter: Agree, and I hope this is the reason.

Sure, Cassandra should do a better job of shedding/failing excessive incoming requests, and there is an open Jira ticket to add just such a feature, but even with that new feature, the net effect will be the same - it will still be up to the application and operations to properly size the cluster and throttle application load before it gets to Cassandra.

Peter: No problem, I understand the driving force for that. I only claim that a friendly warning would be appropriate when there is an estimated danger of approaching OOM. It is hard to do that, I understand; some situations are not easy to analyze and draw conclusions from. But see below…

OOM is not typically an indication of a software bug. Sure, sometimes code has memory leaks, but with a highly dynamic system such as Cassandra, it typically means either a misconfigured JVM or just very heavy load. Sometimes OOM simply means that there is a lot of background processing going on (like compactions or hinted handoff) that is having trouble keeping up with incoming requests. Sometimes OOM occurs because you have too large a heap, which defers GC, but then GC takes too long and further incoming requests simply generate more pressure on the heap faster than that massive GC can deal with it.

Peter: Regarding compactions... I could imagine that. We notice progressive growth in IO demand. So, I would take progressive growth in IO wait as a warning trigger for a possibly approaching OOM. E.g. 
if normal IO wait is configured as 0.3%, and the system progressively goes through some configured thresholds of 0.7, 1.0, 1.5%, I would like to see that noted in some warning log. This way, I can judge earlier that I need to grow the ring, or wait for an OOM. Now, in the latest test, I see pending compactions gradually increasing. Very slowly. Two days ago it was 40, now 135. I wonder, is that a sign of a pending problem?

It is indeed tricky to make sure the JVM has enough heap but not too much.

Peter: Aware of that. I deal with GC issues in general more frequently than others in my company. Previous DSE tests were done with G1, providing a multiple of 2048 MB (a G1 recommendation); concretely I gave it 73728M. Here I assume effective GC with G1 is more a function of available CPU, because there are a lot of "young" and "old" spaces and things are more complicated than with the concurrent collector. CPU was fine when OOM happened, with a lot of idle - another sign that IO is the bottleneck. We now test 2 single-node installations, one with a 36 GB heap and one with 73 GB. I want to see which one does better. We also reduced the load to 5 Mb/sec, instead of 25-30.

DSE typically runs with a larger heap by default. You can try increasing your heap to 10 or 12G. But if you make the heap too big, the big GC can bite you as described above. In that case, the heap needs to be reduced. Typically you don't need a heap smaller than 8 GB. If OOM occurs with an 8 GB heap it typically means the load on that node is simply too heavy. Be sure to review the recommendations in this blog post: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassandra

Peter: Done. All is by the book, except: we use a custom producer and a custom data model. We changed the data model, trying to make it more effective; the last change was adding the day to the partition, as we want to avoid overly wide rows. Our producer is multi-threaded and configurable.

A few questions that will help us better understand what you are really trying to do:

1. How much reading are you doing, and when, relative to writes?

Peter: In the OOM-ended tests (in all tests before) we did only writes. Just recently, with lower load, I started doing reads. Meanwhile it is OK. (4 days passed)

2. Are you doing any updates or deletes? (These cause compaction, which can fall behind your write/update load.)

Peter: No, no updates, and we will not do any. Our TTL will be set to 4 weeks in production. Now I use no TTL, to test reads on a larger data store.

3. How much data is on the cluster (rows)?

Peter: This info is currently unavailable (for the OOM-ended tests and the previous particular data model). I cannot check, because Cassandra fails with OOM during restart and I have no different environment to check in. But for today's test (we added a day to the partition; other parameters are the same) the estimated numbers from nodetool cfstats are:
Number of keys (estimate): 2000
Number of keys (estimate): 10142095
Number of keys (estimate): 350
Number of keys (estimate): 2000
Number of keys (estimate): 350
Number of keys (estimate): 12491
I assume now
[04/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/resources/tokenization/apache_license_header.txt -- diff --git a/test/resources/tokenization/apache_license_header.txt b/test/resources/tokenization/apache_license_header.txt new file mode 100644 index 000..d973dce --- /dev/null +++ b/test/resources/tokenization/apache_license_header.txt @@ -0,0 +1,16 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ \ No newline at end of file http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/resources/tokenization/ja_jp_1.txt -- diff --git a/test/resources/tokenization/ja_jp_1.txt b/test/resources/tokenization/ja_jp_1.txt new file mode 100644 index 000..1a0a198 --- /dev/null +++ b/test/resources/tokenization/ja_jp_1.txt @@ -0,0 +1 @@ +[Japanese sample text for tokenizer tests; mis-encoded in this mail archive] \ No newline at end of file http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/resources/tokenization/ja_jp_2.txt -- diff --git a/test/resources/tokenization/ja_jp_2.txt b/test/resources/tokenization/ja_jp_2.txt new file mode 100644 index 000..278b4fd --- /dev/null +++ b/test/resources/tokenization/ja_jp_2.txt @@ -0,0 +1,2 @@ +[Japanese sample text for tokenizer tests; mis-encoded in this mail archive] +[second line of the same mis-encoded Japanese sample] \ No newline at end of file http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/resources/tokenization/lorem_ipsum.txt -- diff --git a/test/resources/tokenization/lorem_ipsum.txt b/test/resources/tokenization/lorem_ipsum.txt new file mode 100644 index 000..14a4477 --- /dev/null +++ b/test/resources/tokenization/lorem_ipsum.txt @@ -0,0 +1 @@ +"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum." 
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/resources/tokenization/ru_ru_1.txt -- diff --git a/test/resources/tokenization/ru_ru_1.txt b/test/resources/tokenization/ru_ru_1.txt new file mode 100644 index 000..c19a9be --- /dev/null +++ b/test/resources/tokenization/ru_ru_1.txt @@ -0,0 +1,19 @@ +[Russian sample text for tokenizer tests; mis-encoded in this mail archive and truncated here]
[14/14] cassandra git commit: Integrate SASI index into Cassandra
Integrate SASI index into Cassandra

patch by xedin; reviewed by beobal for CASSANDRA-10661

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/72790dc8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/72790dc8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/72790dc8

Branch: refs/heads/trunk
Commit: 72790dc8e34826b39ac696b03025ae6b7b6beb2b
Parents: 11c8ca6
Author: Pavel Yaskevich
Authored: Wed Dec 2 19:23:54 2015 -0800
Committer: Pavel Yaskevich
Committed: Sat Jan 23 19:35:29 2016 -0800

--
 CHANGES.txt                                     |    1 +
 build.xml                                       |   22 +-
 doc/SASI.md                                     |  768 +
 lib/concurrent-trees-2.4.0.jar                  |  Bin 0 -> 118696 bytes
 lib/hppc-0.5.4.jar                              |  Bin 0 -> 1305173 bytes
 lib/jflex-1.6.0.jar                             |  Bin 0 -> 1048690 bytes
 lib/licenses/concurrent-trees-2.4.0.txt         |  201 +
 lib/licenses/hppc-0.5.4.txt                     |  202 +
 lib/licenses/jflex-1.6.0.txt                    |  201 +
 lib/licenses/primitive-1.0.txt                  |  201 +
 lib/licenses/snowball-stemmer-1.3.0.581.1.txt   |  201 +
 lib/primitive-1.0.jar                           |  Bin 0 -> 52589 bytes
 lib/snowball-stemmer-1.3.0.581.1.jar            |  Bin 0 -> 93019 bytes
 .../cassandra/config/DatabaseDescriptor.java    |    7 +-
 .../org/apache/cassandra/db/ColumnIndex.java    |    6 +-
 .../apache/cassandra/db/filter/RowFilter.java   |   15 +-
 .../cassandra/index/SecondaryIndexManager.java  |   11 +
 .../apache/cassandra/index/sasi/SASIIndex.java  |  288 +
 .../cassandra/index/sasi/SASIIndexBuilder.java  |  128 +
 .../cassandra/index/sasi/SSTableIndex.java      |  187 +
 .../org/apache/cassandra/index/sasi/Term.java   |   65 +
 .../cassandra/index/sasi/TermIterator.java      |  208 +
 .../index/sasi/analyzer/AbstractAnalyzer.java   |   51 +
 .../index/sasi/analyzer/NoOpAnalyzer.java       |   54 +
 .../sasi/analyzer/NonTokenizingAnalyzer.java    |  126 +
 .../sasi/analyzer/NonTokenizingOptions.java     |  147 +
 .../sasi/analyzer/SUPPLEMENTARY.jflex-macro     |  143 +
 .../index/sasi/analyzer/StandardAnalyzer.java   |  194 +
 .../sasi/analyzer/StandardTokenizerImpl.jflex   |  220 +
 .../analyzer/StandardTokenizerInterface.java    |   65 +
 .../sasi/analyzer/StandardTokenizerOptions.java |  272 +
 .../analyzer/filter/BasicResultFilters.java     |   76 +
 .../analyzer/filter/FilterPipelineBuilder.java  |   51 +
 .../analyzer/filter/FilterPipelineExecutor.java |   53 +
 .../analyzer/filter/FilterPipelineTask.java     |   52 +
 .../sasi/analyzer/filter/StemmerFactory.java    |  101 +
 .../sasi/analyzer/filter/StemmingFilters.java   |   46 +
 .../sasi/analyzer/filter/StopWordFactory.java   |  100 +
 .../sasi/analyzer/filter/StopWordFilters.java   |   42 +
 .../cassandra/index/sasi/conf/ColumnIndex.java  |  193 +
 .../cassandra/index/sasi/conf/DataTracker.java  |  162 +
 .../cassandra/index/sasi/conf/IndexMode.java    |  169 +
 .../index/sasi/conf/view/PrefixTermTree.java    |  194 +
 .../index/sasi/conf/view/RangeTermTree.java     |   77 +
 .../index/sasi/conf/view/TermTree.java          |   58 +
 .../cassandra/index/sasi/conf/view/View.java    |  104 +
 .../cassandra/index/sasi/disk/Descriptor.java   |   51 +
 .../cassandra/index/sasi/disk/OnDiskBlock.java  |  142 +
 .../cassandra/index/sasi/disk/OnDiskIndex.java  |  773 ++
 .../index/sasi/disk/OnDiskIndexBuilder.java     |  627 +
 .../index/sasi/disk/PerSSTableIndexWriter.java  |  361 +
 .../apache/cassandra/index/sasi/disk/Token.java |   42 +
 .../cassandra/index/sasi/disk/TokenTree.java    |  519 +
 .../index/sasi/disk/TokenTreeBuilder.java       |  839 ++
 .../exceptions/TimeQuotaExceededException.java  |   21 +
 .../index/sasi/memory/IndexMemtable.java        |   71 +
 .../index/sasi/memory/KeyRangeIterator.java     |  118 +
 .../cassandra/index/sasi/memory/MemIndex.java   |   51 +
 .../index/sasi/memory/SkipListMemIndex.java     |   97 +
 .../index/sasi/memory/TrieMemIndex.java         |  254 +
 .../cassandra/index/sasi/plan/Expression.java   |  340 +
 .../cassandra/index/sasi/plan/Operation.java    |  477 +
 .../index/sasi/plan/QueryController.java        |  261 +
 .../cassandra/index/sasi/plan/QueryPlan.java    |  170 +
 .../cassandra/index/sasi/sa/ByteTerm.java       |   51 +
 .../cassandra/index/sasi/sa/CharTerm.java       |   54 +
 .../cassandra/index/sasi/sa/IntegralSA.java     |   84 +
 .../org/apache/cassandra/index/sasi/sa/SA.java  |   58 +
 .../cassandra/index/sasi/sa/SuffixSA.java       |  143 +
 .../apache/cassandra/index/sasi/sa/Term.java    |   58 +
 .../cassandra/index/sasi/sa/TermIterator.java   |   31 +
[13/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/lib/licenses/jflex-1.6.0.txt -- diff --git a/lib/licenses/jflex-1.6.0.txt b/lib/licenses/jflex-1.6.0.txt new file mode 100644 index 000..50086f8 --- /dev/null +++ b/lib/licenses/jflex-1.6.0.txt @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 +http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You +
[05/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt
--
diff --git a/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt b/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt
new file mode 100644
index 000..27cadc3
--- /dev/null
+++ b/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt
@@ -0,0 +1,12361 @@
+
+
+The Project Gutenberg EBook of Adventures of Huckleberry Finn, Complete
+by Mark Twain (Samuel Clemens)
+
+This eBook is for the use of anyone anywhere at no cost and with almost
+no restrictions whatsoever. You may copy it, give it away or re-use
+it under the terms of the Project Gutenberg License included with this
+eBook or online at www.gutenberg.net
+
+Title: Adventures of Huckleberry Finn, Complete
+
+Author: Mark Twain (Samuel Clemens)
+
+Release Date: August 20, 2006 [EBook #76]
+
+Last Updated: October 20, 2012]
+
+Language: English
+
+
+*** START OF THIS PROJECT GUTENBERG EBOOK HUCKLEBERRY FINN ***
+
+Produced by David Widger
+
+
+
+
+
+ADVENTURES
+
+OF
+
+HUCKLEBERRY FINN
+
+(Tom Sawyer's Comrade)
+
+By Mark Twain
+
+Complete
+
+
+
+
+CONTENTS.
+
+CHAPTER I. Civilizing Huck.—Miss Watson.—Tom Sawyer Waits.
+
+CHAPTER II. The Boys Escape Jim.—Torn Sawyer's Gang.—Deep-laid Plans.
+
+CHAPTER III. A Good Going-over.—Grace Triumphant.—"One of Tom Sawyers's
+Lies".
+
+CHAPTER IV. Huck and the Judge.—Superstition.
+
+CHAPTER V. Huck's Father.—The Fond Parent.—Reform.
+
+CHAPTER VI. He Went for Judge Thatcher.—Huck Decided to Leave.—Political
+Economy.—Thrashing Around.
+
+CHAPTER VII. Laying for Him.—Locked in the Cabin.—Sinking the
+Body.—Resting.
+
+CHAPTER VIII. Sleeping in the Woods.—Raising the Dead.—Exploring the
+Island.—Finding Jim.—Jim's Escape.—Signs.—Balum.
+
+CHAPTER IX. The Cave.—The Floating House.
+
+CHAPTER X. The Find.—Old Hank Bunker.—In Disguise.
+
+CHAPTER XI. Huck and the Woman.—The Search.—Prevarication.—Going to
+Goshen.
+
+CHAPTER XII. Slow Navigation.—Borrowing Things.—Boarding the Wreck.—The
+Plotters.—Hunting for the Boat.
+
+CHAPTER XIII. Escaping from the Wreck.—The Watchman.—Sinking.
+
+CHAPTER XIV. A General Good Time.—The Harem.—French.
+
+CHAPTER XV. Huck Loses the Raft.—In the Fog.—Huck Finds the Raft.—Trash.
+
+CHAPTER XVI. Expectation.—A White Lie.—Floating Currency.—Running by
+Cairo.—Swimming Ashore.
+
+CHAPTER XVII. An Evening Call.—The Farm in Arkansaw.—Interior
+Decorations.—Stephen Dowling Bots.—Poetical Effusions.
+
+CHAPTER XVIII. Col. Grangerford.—Aristocracy.—Feuds.—The
+Testament.—Recovering the Raft.—The Wood—pile.—Pork and Cabbage.
+
+CHAPTER XIX. Tying Up Day—times.—An Astronomical Theory.—Running a
+Temperance Revival.—The Duke of Bridgewater.—The Troubles of Royalty.
+
+CHAPTER XX. Huck Explains.—Laying Out a Campaign.—Working the
+Camp—meeting.—A Pirate at the Camp—meeting.—The Duke as a Printer.
+
+CHAPTER XXI. Sword Exercise.—Hamlet's Soliloquy.—They Loafed Around
+Town.—A Lazy Town.—Old Boggs.—Dead.
+
+CHAPTER XXII. Sherburn.—Attending the Circus.—Intoxication in the
+Ring.—The Thrilling Tragedy.
+
+CHAPTER XXIII. Sold.—Royal Comparisons.—Jim Gets Home-sick.
+
+CHAPTER XXIV. Jim in Royal Robes.—They Take a Passenger.—Getting
+Information.—Family Grief.
+
+CHAPTER XXV. Is It Them?—Singing the "Doxologer."—Awful Square—Funeral
+Orgies.—A Bad Investment .
+
+CHAPTER XXVI. A Pious King.—The King's Clergy.—She Asked His
+Pardon.—Hiding in the Room.—Huck Takes the Money.
+
+CHAPTER XXVII. The Funeral.—Satisfying Curiosity.—Suspicious of
+Huck,—Quick Sales and Small.
+
+CHAPTER XXVIII. The Trip to England.—"The Brute!"—Mary Jane Decides to
+Leave.—Huck Parting with Mary Jane.—Mumps.—The Opposition Line.
+
+CHAPTER XXIX. Contested Relationship.—The King Explains the Loss.—A
+Question of Handwriting.—Digging up the Corpse.—Huck Escapes.
+
+CHAPTER XXX. The King Went for Him.—A Royal Row.—Powerful Mellow.
+
+CHAPTER XXXI. Ominous Plans.—News from Jim.—Old Recollections.—A Sheep
+Story.—Valuable Information.
+
+CHAPTER XXXII. Still and Sunday—like.—Mistaken Identity.—Up a Stump.—In
+a Dilemma.
+
+CHAPTER XXXIII. A Nigger Stealer.—Southern Hospitality.—A Pretty Long
+Blessing.—Tar and Feathers.
+
+CHAPTER XXXIV. The Hut by the Ash Hopper.—Outrageous.—Climbing the
+Lightning Rod.—Troubled with Witches.
+
+CHAPTER XXXV. Escaping Properly.—Dark Schemes.—Discrimination in
+Stealing.—A Deep Hole.
+
+CHAPTER XXXVI. The Lightning Rod.—His Level Best.—A Bequest to
+Posterity.—A High Figure.
+
+CHAPTER XXXVII. The Last Shirt.—Mooning Around.—Sailing Orders.—The
+Witch Pie.
+
+CHAPTER XXXVIII. The Coat of Arms.—A Skilled Superintendent.—Unpleasant
+Glory.—A Tearful Subject.
+
+CHAPTER XXXIX. Rats.—Lively
[11/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java -- diff --git a/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java b/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java new file mode 100644 index 000..72b6daf --- /dev/null +++ b/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi.conf.view;
+
+import java.nio.ByteBuffer;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.trie.KeyAnalyzer;
+import org.apache.cassandra.index.sasi.utils.trie.PatriciaTrie;
+import org.apache.cassandra.index.sasi.utils.trie.Trie;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.utils.Interval;
+import org.apache.cassandra.utils.IntervalTree;
+
+import com.google.common.collect.Sets;
+
+/**
+ * This class is an extension over RangeTermTree for string terms,
+ * it is required because interval tree can't handle matching if search is on the
+ * prefix of min/max of the range, so for ascii/utf8 fields we build an additional
+ * prefix trie (including both min/max terms of the index) and do union of the results
+ * of the prefix tree search and results from the interval tree lookup.
+ */
+public class PrefixTermTree extends RangeTermTree
+{
+    private final OnDiskIndexBuilder.Mode mode;
+    private final Trie<ByteBuffer, Set<SSTableIndex>> trie;
+
+    public PrefixTermTree(ByteBuffer min, ByteBuffer max,
+                          Trie<ByteBuffer, Set<SSTableIndex>> trie,
+                          IntervalTree<ByteBuffer, SSTableIndex, Interval<ByteBuffer, SSTableIndex>> ranges,
+                          OnDiskIndexBuilder.Mode mode)
+    {
+        super(min, max, ranges);
+
+        this.mode = mode;
+        this.trie = trie;
+    }
+
+    public Set<SSTableIndex> search(Expression e)
+    {
+        Map<ByteBuffer, Set<SSTableIndex>> indexes = (e == null || e.lower == null || mode == OnDiskIndexBuilder.Mode.CONTAINS)
+                                                   ? trie : trie.prefixMap(e.lower.value);
+
+        Set<SSTableIndex> view = new HashSet<>(indexes.size());
+        indexes.values().forEach(view::addAll);
+
+        return Sets.union(view, super.search(e));
+    }
+
+    public static class Builder extends RangeTermTree.Builder
+    {
+        private final PatriciaTrie<ByteBuffer, Set<SSTableIndex>> trie;
+
+        protected Builder(OnDiskIndexBuilder.Mode mode, final AbstractType comparator)
+        {
+            super(mode, comparator);
+            trie = new PatriciaTrie<>(new ByteBufferKeyAnalyzer(comparator));
+        }
+
+        public void addIndex(SSTableIndex index)
+        {
+            super.addIndex(index);
+            addTerm(index.minTerm(), index);
+            addTerm(index.maxTerm(), index);
+        }
+
+        public TermTree build()
+        {
+            return new PrefixTermTree(min, max, trie, IntervalTree.build(intervals), mode);
+        }
+
+        private void addTerm(ByteBuffer term, SSTableIndex index)
+        {
+            Set<SSTableIndex> indexes = trie.get(term);
+            if (indexes == null)
+                trie.put(term, (indexes = new HashSet<>()));
+
+            indexes.add(index);
+        }
+    }
+
+    private static class ByteBufferKeyAnalyzer implements KeyAnalyzer<ByteBuffer>
+    {
+        private final AbstractType comparator;
+
+        public ByteBufferKeyAnalyzer(AbstractType comparator)
+        {
+            this.comparator = comparator;
+        }
+
+        /**
+         * A bit mask where the first bit is 1 and the others are zero
+         */
+        private static final int MSB = 1 << Byte.SIZE-1;
+
+        public int compare(ByteBuffer a, ByteBuffer b)
+        {
+            return comparator.compare(a, b);
+        }
+
+        public int lengthInBits(ByteBuffer o)
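The class comment above explains the trick: an interval tree cannot match a query that is only a prefix of a range's min/max term, so SASI additionally indexes each SSTable index under its min and max terms and unions both lookups. A minimal standalone sketch of that idea, with invented names and a plain `TreeMap` standing in for the PatriciaTrie:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: index each sstable under both its min and max term,
// then answer a prefix query by unioning every entry whose stored term
// starts with the query prefix.
class PrefixViewSketch
{
    private final NavigableMap<String, Set<String>> terms = new TreeMap<>();

    public void addIndex(String minTerm, String maxTerm, String indexName)
    {
        terms.computeIfAbsent(minTerm, k -> new HashSet<>()).add(indexName);
        terms.computeIfAbsent(maxTerm, k -> new HashSet<>()).add(indexName);
    }

    // All indexes whose min or max term starts with the given prefix.
    public Set<String> prefixSearch(String prefix)
    {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : terms.tailMap(prefix, true).entrySet())
        {
            if (!e.getKey().startsWith(prefix))
                break; // keys are sorted, so we can stop at the first non-match
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args)
    {
        PrefixViewSketch view = new PrefixViewSketch();
        view.addIndex("apple", "orange", "sstable-1");
        view.addIndex("apricot", "banana", "sstable-2");
        System.out.println(view.prefixSearch("ap")); // both sstables match on their min terms
    }
}
```

The real implementation would then union this result with the interval-tree lookup, as `Sets.union(view, super.search(e))` does above.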
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114168#comment-15114168 ] Pavel Yaskevich commented on CASSANDRA-10661: - Pushed as squashed commit [72790dc|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b]. I'm going to resolve this issue and promote CASSANDRA-10765 from sub-task. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[06/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt -- diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt new file mode 100644 index 000..3c8bfd5 --- /dev/null +++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt @@ -0,0 +1,748 @@ +# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html +aiemmin +aika +aikaa +aikaan +aikaisemmin +aikaisin +aikajen +aikana +aikoina +aikoo +aikovat +aina +ainakaan +ainakin +ainoa +ainoat +aiomme +aion +aiotte +aist +aivan +ajan +�l� +alas +alemmas +�lk��n +alkuisin +alkuun +alla +alle +aloitamme +aloitan +aloitat +aloitatte +aloitattivat +aloitettava +aloitettevaksi +aloitettu +aloitimme +aloitin +aloitit +aloititte +aloittaa +aloittamatta +aloitti +aloittivat +alta +aluksi +alussa +alusta +annettavaksi +annetteva +annettu +antaa +antamatta +antoi +aoua +apu +asia +asiaa +asian +asiasta +asiat +asioiden +asioihin +asioita +asti +avuksi +avulla +avun +avutta +edell� +edelle +edelleen +edelt� +edemm�s +edes +edess� +edest� +ehk� +ei +eik� +eilen +eiv�t +eli +ellei +elleiv�t +ellemme +ellen +ellet +ellette +emme +en +en�� +enemm�n +eniten +ennen +ensi +ensimm�inen +ensimm�iseksi +ensimm�isen +ensimm�isen� +ensimm�iset +ensimm�isi� +ensimm�isiksi +ensimm�isin� +ensimm�ist� +ensin +entinen +entisen +entisi� +entist� +entisten +er��t +er�iden +er�s +eri +eritt�in +erityisesti +esi +esiin +esill� +esimerkiksi +et +eteen +etenkin +ett� +ette +ettei +halua +haluaa +haluamatta +haluamme +haluan +haluat +haluatte +haluavat +halunnut +halusi +halusimme +halusin +halusit +halusitte +halusivat +halutessa +haluton +h�n +h�neen +h�nell� +h�nelle +h�nelt� +h�nen +h�ness� +h�nest� +h�net +he +hei +heid�n +heihin +heille +heilt� +heiss� +heist� +heit� +helposti +heti +hetkell� +hieman +huolimatta +huomenna +hyv� +hyv�� 
+hyv�t +hyvi� +hyvien +hyviin +hyviksi +hyville +hyvilt� +hyvin +hyvin� +hyviss� +hyvist� +ihan +ilman +ilmeisesti +itse +itse��n +itsens� +ja +j�� +j�lkeen +j�lleen +jo +johon +joiden +joihin +joiksi +joilla +joille +joilta +joissa +joista +joita +joka +jokainen +jokin +joko +joku +jolla +jolle +jolloin +jolta +jompikumpi +jonka +jonkin +jonne +joo +jopa +jos +joskus +jossa +josta +jota +jotain +joten +jotenkin +jotenkuten +jotka +jotta +jouduimme +jouduin +jouduit +jouduitte +joudumme +joudun +joudutte +joukkoon +joukossa +joukosta +joutua +joutui +joutuivat +joutumaan +joutuu +joutuvat +juuri +kahdeksan +kahdeksannen +kahdella +kahdelle +kahdelta +kahden +kahdessa +kahdesta +kahta +kahteen +kai +kaiken +kaikille +kaikilta +kaikkea +kaikki +kaikkia +kaikkiaan +kaikkialla +kaikkialle +kaikkialta +kaikkien +kaikkin +kaksi +kannalta +kannattaa +kanssa +kanssaan +kanssamme +kanssani +kanssanne +kanssasi +kauan +kauemmas +kautta +kehen +keiden +keihin +keiksi +keill� +keille +keilt� +kein� +keiss� +keist� +keit� +keitt� +keitten +keneen +keneksi +kenell� +kenelle +kenelt� +kenen +kenen� +keness� +kenest� +kenet +kenett� +kenness�st� +kerran +kerta +kertaa +kesken +keskim��rin +ket� +ketk� +kiitos +kohti +koko +kokonaan +kolmas +kolme +kolmen +kolmesti +koska +koskaan +kovin +kuin +kuinka +kuitenkaan +kuitenkin +kuka +kukaan +kukin +kumpainen +kumpainenkaan +kumpi +kumpikaan +kumpikin +kun +kuten +kuuden +kuusi +kuutta +kyll� +kymmenen +kyse +l�hekk�in +l�hell� +l�helle +l�helt� +l�hemm�s +l�hes +l�hinn� +l�htien +l�pi +liian +liki +lis�� +lis�ksi +luo +mahdollisimman +mahdollista +me +meid�n +meill� +meille +melkein +melko +menee +meneet +menemme +menen +menet +menette +menev�t +meni +menimme +menin +menit +meniv�t +menness� +mennyt +menossa +mihin +mik� +mik��n +mik�li +mikin +miksi +milloin +min� +minne +minun +minut +miss� +mist� +mit� +mit��n +miten +moi +molemmat +mones +monesti +monet +moni +moniaalla +moniaalle +moniaalta +monta +muassa +muiden +muita +muka 
+mukaan +mukaansa +mukana +mutta +muu +muualla +muualle +muualta +muuanne +muulloin +muun +muut +muuta +muutama +muutaman +muuten +my�hemmin +my�s +my�sk��n +my�skin +my�t� +n�iden +n�in +n�iss� +n�iss�hin +n�iss�lle +n�iss�lt� +n�iss�st� +n�it� +n�m� +ne +nelj� +nelj�� +nelj�n +niiden +niin +niist� +niit� +noin +nopeammin +nopeasti +nopeiten +nro +nuo +nyt +ohi +oikein +ole +olemme +olen +olet +olette +oleva +olevan +olevat +oli +olimme +olin +olisi +olisimme +olisin +olisit +olisitte +olisivat +olit +olitte +olivat +olla +olleet +olli +ollut +oma +omaa +omaan +omaksi +omalle +omalta +oman +omassa +omat
[12/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java -- diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java b/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java new file mode 100644 index 000..303087b --- /dev/null +++ b/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.util.Map;
+
+public class NonTokenizingOptions
+{
+    public static final String NORMALIZE_LOWERCASE = "normalize_lowercase";
+    public static final String NORMALIZE_UPPERCASE = "normalize_uppercase";
+    public static final String CASE_SENSITIVE = "case_sensitive";
+
+    private boolean caseSensitive;
+    private boolean upperCaseOutput;
+    private boolean lowerCaseOutput;
+
+    public boolean isCaseSensitive()
+    {
+        return caseSensitive;
+    }
+
+    public void setCaseSensitive(boolean caseSensitive)
+    {
+        this.caseSensitive = caseSensitive;
+    }
+
+    public boolean shouldUpperCaseOutput()
+    {
+        return upperCaseOutput;
+    }
+
+    public void setUpperCaseOutput(boolean upperCaseOutput)
+    {
+        this.upperCaseOutput = upperCaseOutput;
+    }
+
+    public boolean shouldLowerCaseOutput()
+    {
+        return lowerCaseOutput;
+    }
+
+    public void setLowerCaseOutput(boolean lowerCaseOutput)
+    {
+        this.lowerCaseOutput = lowerCaseOutput;
+    }
+
+    public static class OptionsBuilder
+    {
+        private boolean caseSensitive = true;
+        private boolean upperCaseOutput = false;
+        private boolean lowerCaseOutput = false;
+
+        public OptionsBuilder()
+        {
+        }
+
+        public OptionsBuilder caseSensitive(boolean caseSensitive)
+        {
+            this.caseSensitive = caseSensitive;
+            return this;
+        }
+
+        public OptionsBuilder upperCaseOutput(boolean upperCaseOutput)
+        {
+            this.upperCaseOutput = upperCaseOutput;
+            return this;
+        }
+
+        public OptionsBuilder lowerCaseOutput(boolean lowerCaseOutput)
+        {
+            this.lowerCaseOutput = lowerCaseOutput;
+            return this;
+        }
+
+        public NonTokenizingOptions build()
+        {
+            if (lowerCaseOutput && upperCaseOutput)
+                throw new IllegalArgumentException("Options to normalize terms cannot be " +
+                                                   "both uppercase and lowercase at the same time");
+
+            NonTokenizingOptions options = new NonTokenizingOptions();
+            options.setCaseSensitive(caseSensitive);
+            options.setUpperCaseOutput(upperCaseOutput);
+            options.setLowerCaseOutput(lowerCaseOutput);
+            return options;
+        }
+    }
+
+    public static NonTokenizingOptions buildFromMap(Map<String, String> optionsMap)
+    {
+        OptionsBuilder optionsBuilder = new OptionsBuilder();
+
+        if (optionsMap.containsKey(CASE_SENSITIVE) && (optionsMap.containsKey(NORMALIZE_LOWERCASE)
+            || optionsMap.containsKey(NORMALIZE_UPPERCASE)))
+            throw new IllegalArgumentException("case_sensitive option cannot be specified together " +
+                                               "with either normalize_lowercase or normalize_uppercase");
+
+        for (Map.Entry<String, String> entry : optionsMap.entrySet())
+        {
+            switch (entry.getKey())
+            {
+                case NORMALIZE_LOWERCASE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.lowerCaseOutput(bool);
+                    break;
+                }
+                case NORMALIZE_UPPERCASE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.upperCaseOutput(bool);
+                    break;
+                }
+                case
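The option parsing above enforces two rules: `case_sensitive` may not be combined with either `normalize_*` option, and the two `normalize_*` options exclude each other at `build()` time. A hypothetical standalone sketch of just that validation logic (the class, method names, and defaults here are illustrative, not the real API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the validation rules shown above for NonTokenizingOptions:
// case_sensitive conflicts with normalize_*, and the two normalize_*
// options are mutually exclusive.
class OptionValidationSketch
{
    // Returns {caseSensitive, upperCaseOutput, lowerCaseOutput}.
    static boolean[] parse(Map<String, String> optionsMap)
    {
        if (optionsMap.containsKey("case_sensitive")
            && (optionsMap.containsKey("normalize_lowercase") || optionsMap.containsKey("normalize_uppercase")))
            throw new IllegalArgumentException("case_sensitive cannot be combined with normalize_* options");

        boolean lower = Boolean.parseBoolean(optionsMap.getOrDefault("normalize_lowercase", "false"));
        boolean upper = Boolean.parseBoolean(optionsMap.getOrDefault("normalize_uppercase", "false"));
        if (lower && upper)
            throw new IllegalArgumentException("terms cannot be normalized to both cases at once");

        // Default mirrors OptionsBuilder: case-sensitive unless requested otherwise.
        boolean caseSensitive = Boolean.parseBoolean(optionsMap.getOrDefault("case_sensitive", "true"));
        return new boolean[] { caseSensitive, upper, lower };
    }

    public static void main(String[] args)
    {
        Map<String, String> options = new HashMap<>();
        options.put("normalize_lowercase", "true");
        boolean[] parsed = parse(options);
        System.out.println(parsed[0] + " " + parsed[1] + " " + parsed[2]); // true false true
    }
}
```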
[07/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java -- diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java new file mode 100644 index 000..3c672ec --- /dev/null +++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java @@ -0,0 +1,1261 @@ +/* + * Copyright 2005-2010 Roger Kapsi, Sam Berlin + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.cassandra.index.sasi.utils.trie; + +import java.io.Serializable; +import java.util.*; + +/** + * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6), and slightly modified + * to correspond to Cassandra code style, as the only Patricia Trie implementation, + * which supports pluggable key comparators (e.g. commons-collections PatriciaTrie (which is based + * on rkapsi/patricia-trie project) only supports String keys) + * but unfortunately is not deployed to the maven central as a downloadable artifact. + */ + +/** + * PATRICIA {@link Trie} + * + * Practical Algorithm to Retrieve Information Coded in Alphanumeric + * + * A PATRICIA {@link Trie} is a compressed {@link Trie}. Instead of storing + * all data at the edges of the {@link Trie} (and having empty internal nodes), + * PATRICIA stores data in every node. 
This allows for very efficient traversal,
+ * insert, delete, predecessor, successor, prefix, range, and {@link #select(Object)}
+ * operations. All operations are performed at worst in O(K) time, where K
+ * is the number of bits in the largest item in the tree. In practice,
+ * operations actually take O(A(K)) time, where A(K) is the average number of
+ * bits of all items in the tree.
+ *
+ * Most importantly, PATRICIA requires very few comparisons to keys while
+ * doing any operation. While performing a lookup, each comparison (at most
+ * K of them, described above) will perform a single bit comparison against
+ * the given key, instead of comparing the entire key to another key.
+ *
+ * The {@link Trie} can return operations in lexicographical order using the
+ * {@link #traverse(Cursor)}, 'prefix', 'submap', or 'iterator' methods. The
+ * {@link Trie} can also scan for items that are 'bitwise' (using an XOR
+ * metric) by the 'select' method. Bitwise closeness is determined by the
+ * {@link KeyAnalyzer} returning true or false for a bit being set or not in
+ * a given key.
+ *
+ * Any methods here that take an {@link Object} argument may throw a
+ * {@link ClassCastException} if the method is expecting an instance of K
+ * and it isn't K.
+ *
+ * @see <a href="http://en.wikipedia.org/wiki/Radix_tree">Radix Tree</a>
+ * @see <a href="http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA">PATRICIA</a>
+ * @see <a href="http://www.imperialviolet.org/binary/critbit.pdf">Crit-Bit Tree</a>
+ *
+ * @author Roger Kapsi
+ * @author Sam Berlin
+ */
+public class PatriciaTrie<K, V> extends AbstractPatriciaTrie<K, V> implements Serializable
+{
+    private static final long serialVersionUID = -2246014692353432660L;
+
+    public PatriciaTrie(KeyAnalyzer<? super K> keyAnalyzer)
+    {
+        super(keyAnalyzer);
+    }
+
+    public PatriciaTrie(KeyAnalyzer<? super K> keyAnalyzer, Map<? extends K, ? extends V> m)
+    {
+        super(keyAnalyzer, m);
+    }
+
+    @Override
+    public Comparator<? super K> comparator()
+    {
+        return keyAnalyzer;
+    }
+
+    @Override
+    public SortedMap<K, V> prefixMap(K prefix)
+    {
+        return lengthInBits(prefix) == 0 ? this : new PrefixRangeMap(prefix);
+    }
+
+    @Override
+    public K firstKey()
+    {
+        return firstEntry().getKey();
+    }
+
+    @Override
+    public K lastKey()
+    {
+        TrieEntry<K, V> entry = lastEntry();
+        return entry != null ? entry.getKey() : null;
+    }
+
+    @Override
+    public SortedMap<K, V> headMap(K toKey)
+    {
+        return new RangeEntryMap(null, toKey);
+    }
+
+    @Override
+    public SortedMap<K, V> subMap(K fromKey, K toKey)
+    {
+        return new RangeEntryMap(fromKey, toKey);
+    }
+
+    @Override
+    public SortedMap<K, V> tailMap(K fromKey)
+    {
+        return new RangeEntryMap(fromKey, null);
+    }
+
+    /**
+     * Returns an entry strictly higher than the given key,
+     * or null if no
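The `prefixMap` contract above is the operation SASI leans on: a live view of every entry whose key starts with a given prefix. A rough emulation on a plain `TreeMap` shows the semantics (illustrative names only; a real PATRICIA trie answers this with at most K single-bit comparisons rather than full-key comparisons):

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of prefixMap semantics emulated on a sorted map: all keys sharing
// a string prefix form the contiguous range [prefix, prefix + U+FFFF).
class PrefixMapSketch
{
    static SortedMap<String, Integer> prefixMap(NavigableMap<String, Integer> map, String prefix)
    {
        if (prefix.isEmpty())
            return map; // mirrors lengthInBits(prefix) == 0 ? this : ... above

        return map.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args)
    {
        TreeMap<String, Integer> terms = new TreeMap<>();
        terms.put("car", 1);
        terms.put("cart", 2);
        terms.put("cat", 3);
        terms.put("dog", 4);
        System.out.println(prefixMap(terms, "car").keySet()); // prints [car, cart]
    }
}
```

The trie version additionally returns its result in O(K) bit-comparison steps and supports arbitrary key types through the pluggable `KeyAnalyzer`, which is why SASI vendors this implementation instead of using a `TreeMap`.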
[02/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java -- diff --git a/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java b/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java new file mode 100644 index 000..ba67853 --- /dev/null +++ b/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.cassandra.index.sasi.analyzer; + +import java.nio.ByteBuffer; + +import org.apache.cassandra.db.marshal.Int32Type; +import org.apache.cassandra.db.marshal.UTF8Type; +import org.apache.cassandra.utils.ByteBufferUtil; + +import org.junit.Assert; +import org.junit.Test; + +/** + * Tests for the non-tokenizing analyzer + */ +public class NonTokenizingAnalyzerTest +{ +@Test +public void caseInsensitiveAnalizer() throws Exception +{ +NonTokenizingAnalyzer analyzer = new NonTokenizingAnalyzer(); +NonTokenizingOptions options = NonTokenizingOptions.getDefaultOptions(); +options.setCaseSensitive(false); +analyzer.init(options, UTF8Type.instance); + +String testString = "Nip it in the bud"; +ByteBuffer toAnalyze = ByteBuffer.wrap(testString.getBytes()); +analyzer.reset(toAnalyze); +ByteBuffer analyzed = null; +while (analyzer.hasNext()) +analyzed = analyzer.next(); + Assert.assertTrue(testString.toLowerCase().equals(ByteBufferUtil.string(analyzed))); +} + +@Test +public void caseSensitiveAnalizer() throws Exception +{ +NonTokenizingAnalyzer analyzer = new NonTokenizingAnalyzer(); +NonTokenizingOptions options = NonTokenizingOptions.getDefaultOptions(); +analyzer.init(options, UTF8Type.instance); + +String testString = "Nip it in the bud"; +ByteBuffer toAnalyze = ByteBuffer.wrap(testString.getBytes()); +analyzer.reset(toAnalyze); +ByteBuffer analyzed = null; +while (analyzer.hasNext()) +analyzed = analyzer.next(); + Assert.assertFalse(testString.toLowerCase().equals(ByteBufferUtil.string(analyzed))); +} + +@Test +public void ensureIncompatibleInputSkipped() throws Exception +{ +NonTokenizingAnalyzer analyzer = new NonTokenizingAnalyzer(); +NonTokenizingOptions options = NonTokenizingOptions.getDefaultOptions(); +analyzer.init(options, Int32Type.instance); + +ByteBuffer toAnalyze = ByteBufferUtil.bytes(1); +analyzer.reset(toAnalyze); +Assert.assertTrue(!analyzer.hasNext()); +} +} 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java -- diff --git a/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java b/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java new file mode 100644 index 000..e307512 --- /dev/null +++ b/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.cassandra.index.sasi.analyzer;
[09/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/java/org/apache/cassandra/index/sasi/plan/Operation.java -- diff --git a/src/java/org/apache/cassandra/index/sasi/plan/Operation.java b/src/java/org/apache/cassandra/index/sasi/plan/Operation.java new file mode 100644 index 000..1857c56 --- /dev/null +++ b/src/java/org/apache/cassandra/index/sasi/plan/Operation.java @@ -0,0 +1,477 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.ColumnDefinition.Kind;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.db.rows.Unfiltered;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.index.sasi.utils.RangeIntersectionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.*;
+import org.apache.cassandra.utils.FBUtilities;
+
+public class Operation extends RangeIterator<Long, Token>
+{
+    public enum OperationType
+    {
+        AND, OR;
+
+        public boolean apply(boolean a, boolean b)
+        {
+            switch (this)
+            {
+                case OR:
+                    return a | b;
+
+                case AND:
+                    return a & b;
+
+                default:
+                    throw new AssertionError();
+            }
+        }
+    }
+
+    private final QueryController controller;
+
+    protected final OperationType op;
+    protected final ListMultimap<ColumnDefinition, Expression> expressions;
+    protected final RangeIterator<Long, Token> range;
+
+    protected Operation left, right;
+
+    private Operation(OperationType operation,
+                      QueryController controller,
+                      ListMultimap<ColumnDefinition, Expression> expressions,
+                      RangeIterator<Long, Token> range,
+                      Operation left, Operation right)
+    {
+        super(range);
+
+        this.op = operation;
+        this.controller = controller;
+        this.expressions = expressions;
+        this.range = range;
+
+        this.left = left;
+        this.right = right;
+    }
+
+    /**
+     * Recursive "satisfies" checks based on operation
+     * and data from the lower level members using depth-first search
+     * and bubbling the results back to the top level caller.
+     *
+     * Most of the work here is done by {@link #localSatisfiedBy(Unfiltered, boolean)},
+     * see its comment for details; if there are no local expressions
+     * assigned to an Operation it will call satisfiedBy(Row) on its children.
+     *
+     * Query: first_name = X AND (last_name = Y OR address = XYZ AND street = IL AND city = C) OR (state = 'CA' AND country = 'US')
+     * Row: key1: (first_name: X, last_name: Z, address: XYZ, street: IL, city: C, state: NY, country: US)
+     *
+     *                        #1 OR
+     *                       /      \
+     *    #2 (first_name) AND        AND (state, country)
+     *                       \
+     *        #3 (last_name) OR
+     *                         \
+     *       #4                 AND (address, street, city)
+     *
+     * Evaluation of key1 is a top-down depth-first search:
+     *
+     * --- going down ---
+     * Level #1 is evaluated; the OR expression has to pull results from its children, which are at level #2, and OR them together.
+     * Level #2 AND (state, country) can be evaluated right away; AND (first_name) refers to its "right" child at level #3.
+     * Level #3 OR (last_name) requests results
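The depth-first AND/OR evaluation described in the comment above can be sketched as a small standalone illustration. This is not the SASI `Operation` API: `Node`, `leaf`, `and`, `or`, and `evaluate` are invented names, and rows are modeled as plain maps; the point is only to show how results bubble up from leaf predicates through the operator tree (note the non-short-circuiting `&`/`|`, matching `OperationType.apply`).

```java
import java.util.Map;

// Minimal sketch of depth-first AND/OR tree evaluation (hypothetical names,
// not the SASI Operation API).
public class BooleanTreeSketch
{
    interface Node { boolean satisfiedBy(Map<String, String> row); }

    static Node leaf(String column, String expected)
    {
        return row -> expected.equals(row.get(column));
    }

    // non-short-circuiting, like OperationType.apply's `a & b`
    static Node and(Node left, Node right)
    {
        return row -> left.satisfiedBy(row) & right.satisfiedBy(row);
    }

    static Node or(Node left, Node right)
    {
        return row -> left.satisfiedBy(row) | right.satisfiedBy(row);
    }

    public static boolean evaluate(Node root, Map<String, String> row)
    {
        return root.satisfiedBy(row);
    }

    public static void main(String[] args)
    {
        // first_name = X AND (last_name = Y OR (address = XYZ AND street = IL AND city = C))
        Node query = and(leaf("first_name", "X"),
                         or(leaf("last_name", "Y"),
                            and(leaf("address", "XYZ"),
                                and(leaf("street", "IL"), leaf("city", "C")))));

        Map<String, String> row = Map.of("first_name", "X", "last_name", "Z",
                                         "address", "XYZ", "street", "IL", "city", "C");

        // last_name = Y fails, but the nested AND branch matches, so the OR (and
        // therefore the whole tree) is satisfied
        System.out.println(evaluate(query, row)); // prints "true"
    }
}
```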
[08/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java -- diff --git a/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java new file mode 100644 index 000..0d2214a --- /dev/null +++ b/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.PriorityQueue;
+
+import com.google.common.collect.Iterators;
+import org.apache.cassandra.io.util.FileUtils;
+
+import com.google.common.annotations.VisibleForTesting;
+
+public class RangeIntersectionIterator
+{
+    protected enum Strategy
+    {
+        BOUNCE, LOOKUP, ADAPTIVE
+    }
+
+    public static <K extends Comparable<K>, D extends CombinedValue<K>> Builder<K, D> builder()
+    {
+        return builder(Strategy.ADAPTIVE);
+    }
+
+    @VisibleForTesting
+    protected static <K extends Comparable<K>, D extends CombinedValue<K>> Builder<K, D> builder(Strategy strategy)
+    {
+        return new Builder<>(strategy);
+    }
+
+    public static class Builder<K extends Comparable<K>, D extends CombinedValue<K>> extends RangeIterator.Builder<K, D>
+    {
+        private final Strategy strategy;
+
+        public Builder(Strategy strategy)
+        {
+            super(IteratorType.INTERSECTION);
+            this.strategy = strategy;
+        }
+
+        protected RangeIterator<K, D> buildIterator()
+        {
+            // if the range is disjoint we can simply return an empty
+            // iterator of any type, because it's not going to produce any results.
+            if (statistics.isDisjoint())
+                return new BounceIntersectionIterator<>(statistics, new PriorityQueue<RangeIterator<K, D>>(1));
+
+            switch (strategy)
+            {
+                case LOOKUP:
+                    return new LookupIntersectionIterator<>(statistics, ranges);
+
+                case BOUNCE:
+                    return new BounceIntersectionIterator<>(statistics, ranges);
+
+                case ADAPTIVE:
+                    return statistics.sizeRatio() <= 0.01d
+                           ? new LookupIntersectionIterator<>(statistics, ranges)
+                           : new BounceIntersectionIterator<>(statistics, ranges);
+
+                default:
+                    throw new IllegalStateException("Unknown strategy: " + strategy);
+            }
+        }
+    }
+
+    private static abstract class AbstractIntersectionIterator<K extends Comparable<K>, D extends CombinedValue<K>> extends RangeIterator<K, D>
+    {
+        protected final PriorityQueue<RangeIterator<K, D>> ranges;
+
+        private AbstractIntersectionIterator(Builder.Statistics<K, D> statistics, PriorityQueue<RangeIterator<K, D>> ranges)
+        {
+            super(statistics);
+            this.ranges = ranges;
+        }
+
+        public void close() throws IOException
+        {
+            for (RangeIterator<K, D> range : ranges)
+                FileUtils.closeQuietly(range);
+        }
+    }
+
+    /**
+     * Iterator which performs an intersection of multiple ranges by using a bouncing (merge-join) technique to identify
+     * common elements in the given ranges. The aforementioned "bounce" works as follows: the range queue is polled for the
+     * range with the smallest current token (main loop), and that token is used to {@link RangeIterator#skipTo(Comparable)}
+     * the other ranges; if the token produced by {@link RangeIterator#skipTo(Comparable)} is equal to the current "candidate"
+     * token, both get merged together and the same operation is repeated for the next range from the queue; if the returned
+     * token is not equal to the candidate, the candidate's range is put back into the queue and the main loop repeats until
+     * the next intersection token is found or at least one iterator runs out of tokens.
+     *
+     * This technique is very efficient to jump over gaps in
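The "bounce" merge-join described in the comment above can be illustrated with a small self-contained sketch. This is not the SASI `BounceIntersectionIterator`: `SortedStream` is an invented stand-in for `RangeIterator` (a sorted array with a cursor and a `skipTo`), and the candidate selection is simplified to taking the maximum of all current heads rather than polling a priority queue, but the core idea is the same: use `skipTo` to leap over gaps instead of scanning every token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of "bounce" (merge-join) intersection over sorted streams.
// Hypothetical names; not the SASI RangeIntersectionIterator classes.
public class BounceIntersectSketch
{
    // Minimal stand-in for RangeIterator: a sorted array with a cursor and skipTo().
    static final class SortedStream
    {
        final long[] tokens;
        int pos = 0;

        SortedStream(long... tokens) { this.tokens = tokens; }

        boolean hasCurrent() { return pos < tokens.length; }
        long current()       { return tokens[pos]; }

        // advance to the first token >= target; binary search gives the
        // "jump over gaps" behaviour the doc comment describes
        void skipTo(long target)
        {
            int idx = Arrays.binarySearch(tokens, pos, tokens.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
        }
    }

    public static List<Long> intersect(SortedStream... streams)
    {
        List<Long> out = new ArrayList<>();
        outer:
        while (true)
        {
            // candidate token: the maximum of all current heads
            long candidate = Long.MIN_VALUE;
            for (SortedStream s : streams)
            {
                if (!s.hasCurrent())
                    break outer;
                candidate = Math.max(candidate, s.current());
            }

            // bounce every stream forward to the candidate
            boolean all = true;
            for (SortedStream s : streams)
            {
                s.skipTo(candidate);
                if (!s.hasCurrent())
                    break outer;
                all &= s.current() == candidate;
            }

            if (all) // every stream agrees: an intersection hit
            {
                out.add(candidate);
                for (SortedStream s : streams)
                    s.pos++;
            }
        }
        return out;
    }

    public static void main(String[] args)
    {
        List<Long> r = intersect(new SortedStream(1, 3, 5, 7, 100),
                                 new SortedStream(3, 7, 50, 100),
                                 new SortedStream(2, 3, 7, 100, 200));
        System.out.println(r); // [3, 7, 100]
    }
}
```

If a candidate is not matched by every stream, at least one head has moved past it, so the next candidate is strictly larger and the loop always makes progress.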
[10/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java -- diff --git a/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java b/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java new file mode 100644 index 000..5d85d00 --- /dev/null +++ b/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java @@ -0,0 +1,519 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.utils.AbstractIterator;
+import org.apache.cassandra.index.sasi.utils.CombinedValue;
+import org.apache.cassandra.index.sasi.utils.MappedBuffer;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.utils.MergeIterator;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Function;
+import com.google.common.collect.Iterators;
+import org.apache.commons.lang3.builder.HashCodeBuilder;
+
+import static org.apache.cassandra.index.sasi.disk.TokenTreeBuilder.EntryType;
+
+// Note: all of the seek-able offsets contained in TokenTree should be sizeof(long),
+// even if currently only the lower int portion of them is used, because that makes
+// it possible to switch to an mmap implementation which supports long positions
+// without any on-disk format changes and/or re-indexing if one day we'll have a need to.
+public class TokenTree
+{
+    private static final int LONG_BYTES = Long.SIZE / 8;
+    private static final int SHORT_BYTES = Short.SIZE / 8;
+
+    private final Descriptor descriptor;
+    private final MappedBuffer file;
+    private final long startPos;
+    private final long treeMinToken;
+    private final long treeMaxToken;
+    private final long tokenCount;
+
+    @VisibleForTesting
+    protected TokenTree(MappedBuffer tokenTree)
+    {
+        this(Descriptor.CURRENT, tokenTree);
+    }
+
+    public TokenTree(Descriptor d, MappedBuffer tokenTree)
+    {
+        descriptor = d;
+        file = tokenTree;
+        startPos = file.position();
+
+        file.position(startPos + TokenTreeBuilder.SHARED_HEADER_BYTES);
+
+        if (!validateMagic())
+            throw new IllegalArgumentException("invalid token tree");
+
+        tokenCount = file.getLong();
+        treeMinToken = file.getLong();
+        treeMaxToken = file.getLong();
+    }
+
+    public long getCount()
+    {
+        return tokenCount;
+    }
+
+    public RangeIterator<Long, Token> iterator(Function<Long, DecoratedKey> keyFetcher)
+    {
+        return new TokenTreeIterator(file.duplicate(), keyFetcher);
+    }
+
+    public OnDiskToken get(final long searchToken, Function<Long, DecoratedKey> keyFetcher)
+    {
+        seekToLeaf(searchToken, file);
+        long leafStart = file.position();
+        short leafSize = file.getShort(leafStart + 1); // skip the info byte
+
+        file.position(leafStart + TokenTreeBuilder.BLOCK_HEADER_BYTES); // skip to tokens
+        short tokenIndex = searchLeaf(searchToken, leafSize);
+
+        file.position(leafStart + TokenTreeBuilder.BLOCK_HEADER_BYTES);
+
+        OnDiskToken token = OnDiskToken.getTokenAt(file, tokenIndex, leafSize, keyFetcher);
+        return token.get().equals(searchToken) ? token : null;
+    }
+
+    private boolean validateMagic()
+    {
+        switch (descriptor.version.toString())
+        {
+            case Descriptor.VERSION_AA:
+                return true;
+            case Descriptor.VERSION_AB:
+                return TokenTreeBuilder.AB_MAGIC == file.getShort();
+            default:
+                return false;
+        }
+    }
+
+    // finds the leaf that *could* contain the token
+    private void seekToLeaf(long token, MappedBuffer file)
+    {
+        // this loop always seeks forward except for the first iteration
+        // where it may seek back to the root
+        long blockStart = startPos;
+        while (true)
+        {
+            file.position(blockStart);
+
+            byte info = file.get();
+
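The leaf lookup in `TokenTree.get` above (seek to a leaf block, then binary-search its sorted fixed-width tokens) can be sketched in isolation. This is a hypothetical illustration, not the actual TokenTree on-disk format: the offsets and layout here are invented, and the buffer holds nothing but `count` sorted longs starting at `start`.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of binary-searching sorted fixed-width tokens inside a
// serialized leaf block; layout is invented, not the TokenTree format.
public class LeafSearchSketch
{
    static final int LONG_BYTES = Long.SIZE / 8;

    // binary search for `token` among `count` longs starting at byte `start`;
    // returns the entry index within the leaf, or -1 if absent
    public static int searchLeaf(ByteBuffer leaf, int start, int count, long token)
    {
        int lo = 0, hi = count - 1;
        while (lo <= hi)
        {
            int mid = (lo + hi) >>> 1;
            long t = leaf.getLong(start + mid * LONG_BYTES); // absolute read, position untouched
            if (t == token)
                return mid;
            else if (t < token)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1; // not present in this leaf
    }

    public static void main(String[] args)
    {
        long[] tokens = { 10, 42, 99, 1000 };
        ByteBuffer leaf = ByteBuffer.allocate(tokens.length * LONG_BYTES);
        for (long t : tokens)
            leaf.putLong(t);

        System.out.println(searchLeaf(leaf, 0, tokens.length, 99)); // 2
        System.out.println(searchLeaf(leaf, 0, tokens.length, 7));  // -1
    }
}
```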
[01/14] cassandra git commit: Integrate SASI index into Cassandra
Repository: cassandra Updated Branches: refs/heads/trunk 11c8ca6b5 -> 72790dc8e http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java -- diff --git a/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java b/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java new file mode 100644 index 000..92fbf69 --- /dev/null +++ b/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java @@ -0,0 +1,645 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.ListMultimap;
+import com.google.common.collect.Multimap;
+import com.google.common.collect.Sets;
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.marshal.DoubleType;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.index.sasi.plan.Operation.OperationType;
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.schema.Tables;
+import org.apache.cassandra.service.MigrationManager;
+import org.apache.cassandra.utils.FBUtilities;
+
+import org.junit.*;
+
+public class OperationTest extends SchemaLoader
+{
+    private static final String KS_NAME = "sasi";
+    private static final String CF_NAME = "test_cf";
+    private static final String CLUSTERING_CF_NAME = "clustering_test_cf";
+
+    private static ColumnFamilyStore BACKEND;
+    private static ColumnFamilyStore CLUSTERING_BACKEND;
+
+    @BeforeClass
+    public static void loadSchema() throws ConfigurationException
+    {
+        System.setProperty("cassandra.config", "cassandra-murmur.yaml");
+        SchemaLoader.loadSchema();
+        MigrationManager.announceNewKeyspace(KeyspaceMetadata.create(KS_NAME,
+                                                                     KeyspaceParams.simpleTransient(1),
+                                                                     Tables.of(SchemaLoader.sasiCFMD(KS_NAME, CF_NAME),
+                                                                               SchemaLoader.clusteringSASICFMD(KS_NAME, CLUSTERING_CF_NAME))));
+
+        BACKEND = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+        CLUSTERING_BACKEND = Keyspace.open(KS_NAME).getColumnFamilyStore(CLUSTERING_CF_NAME);
+    }
+
+    private QueryController controller;
+
+    @Before
+    public void beforeTest()
+    {
+        controller = new QueryController(BACKEND,
+                                         PartitionRangeReadCommand.allDataRead(BACKEND.metadata, FBUtilities.nowInSeconds()),
+                                         TimeUnit.SECONDS.toMillis(10));
+    }
+
+    @After
+    public void afterTest()
+    {
+        controller.finish();
+    }
+
+    @Test
+    public void testAnalyze() throws Exception
+    {
+        final ColumnDefinition firstName = getColumn(UTF8Type.instance.decompose("first_name"));
+        final ColumnDefinition age = getColumn(UTF8Type.instance.decompose("age"));
+        final ColumnDefinition comment = getColumn(UTF8Type.instance.decompose("comment"));
+
+        // age != 5 AND age > 1 AND age != 6 AND age <= 10
+        Map<Expression.Op, Expression> expressions = convert(Operation.analyzeGroup(controller, OperationType.AND,
+                                                             Arrays.asList(new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(5)),
[03/14] cassandra git commit: Integrate SASI index into Cassandra
http://git-wip-us.apache.org/repos/asf/cassandra/blob/72790dc8/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java -- diff --git a/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java b/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java new file mode 100644 index 000..cb5ec73 --- /dev/null +++ b/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java @@ -0,0 +1,1852 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.*;
+import org.apache.cassandra.cql3.Term;
+import org.apache.cassandra.cql3.statements.IndexTarget;
+import org.apache.cassandra.cql3.statements.SelectStatement;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.ColumnFilter;
+import org.apache.cassandra.db.filter.DataLimits;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.dht.IPartitioner;
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.dht.Range;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException;
+import org.apache.cassandra.index.sasi.plan.QueryPlan;
+import org.apache.cassandra.schema.IndexMetadata;
+import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.schema.Tables;
+import org.apache.cassandra.serializers.MarshalException;
+import org.apache.cassandra.serializers.TypeSerializer;
+import org.apache.cassandra.service.MigrationManager;
+import org.apache.cassandra.service.QueryState;
+import org.apache.cassandra.thrift.CqlRow;
+import org.apache.cassandra.transport.messages.ResultMessage;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
+
+import com.google.common.collect.Lists;
+import com.google.common.util.concurrent.Uninterruptibles;
+
+import junit.framework.Assert;
+
+import org.junit.*;
+
+public class SASIIndexTest
+{
+    private static final IPartitioner PARTITIONER = new Murmur3Partitioner();
+
+    private static final String KS_NAME = "sasi";
+    private static final String CF_NAME = "test_cf";
+    private static final String CLUSTRING_CF_NAME = "clustering_test_cf";
+
+    @BeforeClass
+    public static void loadSchema() throws ConfigurationException
+    {
+        System.setProperty("cassandra.config", "cassandra-murmur.yaml");
+        SchemaLoader.loadSchema();
+        MigrationManager.announceNewKeyspace(KeyspaceMetadata.create(KS_NAME,
+                                                                     KeyspaceParams.simpleTransient(1),
+                                                                     Tables.of(SchemaLoader.sasiCFMD(KS_NAME, CF_NAME),
+                                                                               SchemaLoader.clusteringSASICFMD(KS_NAME, CLUSTRING_CF_NAME))));
+    }
+
+    @After
+    public void cleanUp()
+    {
+        Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME).truncateBlocking();
+    }
+
+    @Test
+    public void testSingleExpressionQueries() throws Exception
+    {
+        testSingleExpressionQueries(false);
+        cleanupData();