[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346368#comment-15346368 ] Dave Brosius commented on CASSANDRA-10661: -- {quote} Dave Brosius Can you expand further why someMapEntry.equals(someAbstractTrie) will always be false ? {quote} darn it, i misread. the equals isn't on AbstractTrie, but on a subclass. so yes you are right. Sorry for the noise. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345995#comment-15345995 ] DOAN DuyHai commented on CASSANDRA-10661: - [~dbrosius] Can you expand further why {{someMapEntry.equals(someAbstractTrie)}} will always be false ? According to the contract of {{Map.Entry::equals}}, as long as the key and value are equal, the equality holds. I've tried an unit test and it works: {code:java} public class AbstractTrieTest { @Test public void should_test_equality() throws Exception { Mapmap = new HashMap<>(); map.put("10", 10L); final Map.Entry mapEntry = map.entrySet().iterator().next(); final AbstractTrie.BasicEntry trieEntry = new AbstractPatriciaTrie.TrieEntry<>("10", 10L, 0); Assert.assertTrue("mapEntry.equals(trieEntry)", mapEntry.equals(trieEntry)); } } {code} {noformat} % ant testsome -Dtest.name=org.apache.cassandra.index.sasi.utils.AbstractTrieTest -Dtest.methods=should_test_equality ... testsome: [junit] WARNING: multiple versions of ant detected in path for junit [junit] jar:file:/usr/local/Cellar/ant/1.9.4/libexec/lib/ant.jar!/org/apache/tools/ant/Project.class [junit] and jar:file:/Users/archinnovinfo/perso/cassandra/build/lib/jars/ant-1.9.4.jar!/org/apache/tools/ant/Project.class [junit] Testsuite: org.apache.cassandra.index.sasi.utils.AbstractTrieTest [junit] Testsuite: org.apache.cassandra.index.sasi.utils.AbstractTrieTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.058 sec {noformat} > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345812#comment-15345812 ] Dave Brosius commented on CASSANDRA-10661: -- AbstractTrie.equals isn't symmetric.. someAbstractTrie.equals(someMapEntry) could be true, but someMapEntry.equals(someAbstractTrie) will always be false. that is unstable. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261541#comment-15261541 ] Chanh Le commented on CASSANDRA-10661: -- [~xedin] Thank man. You got my day. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261537#comment-15261537 ] Pavel Yaskevich commented on CASSANDRA-10661: - Hi [~giaosudau], the name of the index class is 'org.apache.cassandra.index.sasi.SASIIndex' you most likely reading documentation specific for 2.0, here is the updated doc https://github.com/apache/cassandra/blob/trunk/doc/SASI.md, it resides in doc/ folder of Apache Cassandra distribution. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261528#comment-15261528 ] Chanh Le commented on CASSANDRA-10661: -- Hi I am using cassandra 3.5 and I have problem when create index with that. CREATE CUSTOM INDEX ON bar (fname) USING 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex' WITH OPTIONS = { 'analyzer_class': 'org.apache.cassandra.db.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false' }; it throws: unable to find custom indexer class 'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119962#comment-15119962 ] DOAN DuyHai commented on CASSANDRA-10661: - Last but not least. We have an use-case for secondary index that is pretty interesting. With the *Spark/Cassandra* connector, since each Spark partition is distributed according to Cassandra token ranges, each CQL query is *node local*. Consequently a {code:sql} SELECT * FROM table WHERe col=xxx` {code} is transformed by the connector into {code:sql} SELECT * FROM table WHERE col=xxx AND token(pk)>=aaa AND token(pk) <= bbb {code} The normal CQL query engine is token-aware and will restrict the secondary index query on the given range. *Does SASI query planner also take into account token range restrictions to avoid hitting un-necessary nodes ?* > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120020#comment-15120020 ] Pavel Yaskevich commented on CASSANDRA-10661: - Yes, it does account for the range specified and even not going to touch unnecessary SSTables which are not in the range. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113677#comment-15113677 ] Sam Tunnicliffe commented on CASSANDRA-10661: - [~xedin] SGTM! > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113690#comment-15113690 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~beobal] Awesome, will try to do everything tomorrow, thanks! > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113918#comment-15113918 ] Jordan West commented on CASSANDRA-10661: - bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. The example is correct, but this is not a limitation of SASI, its a limitation in CQL, and we decided not to further extend the grammar, since we have already had to scale back our grammar changes to later phases (removing OR, grouping, and != support for now). Ideally, CQL would support a `LIKE` operator similar to SQL, and depending on if the index was created with `PREFIX` or `CONTAINS` we would allow/disallow forms such as `%Jo%` or `_j%`. bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) It does, but so are all queries on numerical data, which thinking about it, may make the `PREFIX` option confusing for numeric types. SPARSE is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps), but small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and for most non-numerical data is not an ideal setting either. For example, in a large data set of first names the number of names will be small compared to the number of keys, and given the distribution of first names using SPARSE will increase the size of the index and at best have zero effect on query performance, but may hurt it. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113816#comment-15113816 ] Jack Krupansky commented on CASSANDRA-10661: So is this stuff actually ready to release? I mean, consistent with the new philosophy that "trunk is always releasable"? IOW, if it does get committed, it will be in 3.4 no matter what? I only ask because it just seemed that there was stuff in flux fairly recently (a couple days ago), suggested it wasn't quite baked enough to be considered "releasable". > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113836#comment-15113836 ] Jack Krupansky commented on CASSANDRA-10661: Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc. Or, can I indeed have two indexes on a single column, one a traditional exact match, and one a prefix match. Hmmm... in which case, which gets used if I just specify a column name? CREATE INDEX first_name_full ON table CREATE CUSTOM INDEX first_name_prefix ... It would be good to have an example that illustrates this. In fact, I would argue that first and last names are perfect examples of where you really do need to query on both exact match and partial match. In fact, I'm not sure I can think of any examples of non-tokenized text fields where you don't want to reserve the ability to find an exact match even if you do need partial matches for some queries. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.) Are there any use cases of traditional Cassandra indexes which shouldn't almost automatically be converted to SPARSE. After all, the current recommended best practice is to avoid secondary indexes where the column cardinality is either very high or very low, which seems to be a match for SPARSE, although the precise meaning of SPARSE is still a bit fuzzy for me. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113842#comment-15113842 ] Jon Haddad commented on CASSANDRA-10661: If sparse means what Jack is implying, perhaps a better name for it would be EXACT > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113967#comment-15113967 ] Pavel Yaskevich commented on CASSANDRA-10661: - bq. So is this stuff actually ready to release? I mean, consistent with the new philosophy that "trunk is always releasable"? IOW, if it does get committed, it will be in 3.4 no matter what? I only ask because it just seemed that there was stuff in flux fairly recently (a couple days ago), suggested it wasn't quite baked enough to be considered "releasable". Yes, the stuff is ready to release since fairly recently added changes are ported from 2.0 and clustering support is just couple of lines of additional filtering added, no internal data structure changes, this is also opt-in feature which is irrelevant for core functionality until enabled. This is also the reason why we don't want do any of the CQL front-end related changes right away but rather more gradual migration. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114168#comment-15114168 ] Pavel Yaskevich commented on CASSANDRA-10661: - Pushed as squashed commit [72790dc|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b]. I'm going to resolve this issue and promote CASSANDRA-10765 from sub-task. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.4 > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112994#comment-15112994 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~beobal] How about `unfilteredCluster`? Since we are on the same page about this, here is what I'm thinking - we are going to avoid README.md we have in xedin/sasi and I'm going to put it into doc/SASI.md, squash all 17 commits into one and push to trunk, sounds good? > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113071#comment-15113071 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~doanduyhai] Regarding CONTAINS mode - it's more expensive to build since it has to extract the suffixes (which is exactly what search people are doing) which makes it as expensive to query as PREFIX columns. Regarding sorting - this is currently not a priority since it would require extensive changes to Cassandra interfaces to support that, MAX_ROWS currently is the same restriction per result page as in CQL3. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113138#comment-15113138 ] DOAN DuyHai commented on CASSANDRA-10661: - Thanks for the clarifications > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113157#comment-15113157 ] Pavel Yaskevich commented on CASSANDRA-10661: - No problem! Just to clarify a bit more - ORDER BY I mean is "real SQL" ORDER BY (the one which sorts keys) and not currently built-in one which depends on CLUSTERING ORDER, that one would still work the same way with indexes as it works right now. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113051#comment-15113051 ] DOAN DuyHai commented on CASSANDRA-10661: - Hello [~xedin], it's me again. I've had some discussion with search people and they told me that wildcard searches (name like "\*x\*") are very expensive. Classical data structure like suffix trees are adapted for suffix searching (name like "xxx\*"). For prefix search (name like "\*xxx") they're creating a *reversed* index. Does it mean that the CONTAINS mode (formerly named SUFFIX) is more expensive than the NORMAL search mode ? If yes, how much expensive is it (x2 ? order of magnitude ?) Second question, more related to the impl, since you query the nodes following the token range and do not hit all nodes like normal secondary index, does it imply that *sorting* (ORDER BY) is no longer relevant since you do not retrieve all possible results ? (I've seen in QueryPlan.MAX_ROWS that there is a hard-coded limit of 10 000 results) Sorry to annoy you with my questions but they are important so that we, evangelists, can give the right use-cases for users and especially deter them from mis-using SASI when it's not appropriate or when the search cost is prohibitive. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112458#comment-15112458 ] Sam Tunnicliffe commented on CASSANDRA-10661: - bq. So let's prioritize phase #2 and #3 over that for now, WDYT? Sounds good to me. I've only one tiny comment about the latest commits, which is that it's slightly odd API-wise that {{SSTableFlushObserver::nextRow}} takes an {{Unfiltered}} rather than a {{Row}}. Maybe renaming it to {{nextUnfiltered}} or even just {{flushed}} would make sense. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111582#comment-15111582 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~beobal] Thinking more about it, we should probably proceed with implementation of the clustering that I've finished last night to conclude phase #1, since it's already an improvement over internal indexes anyway and it would take some time to have TokenTree keys with flexible size implemented which is going to speed up intersections but not necessarily satisfies-by at the end of the query, so benefit of it is uncertain at this point, because we have to scan through all row clusters anyway. So let's prioritize phase #2 and #3 over that for now, WDYT? P.S. I have been running testall and dtest in CI (0 failures for testall, some old failures for dtest): ||branch||testall||dtest|| |[sasi-integration|https://github.com/xedin/sasi/]|[testall|http://cassci.datastax.com/job/xedin-sasi-3.2-integration-testall/]|[dtest|http://cassci.datastax.com/job/xedin-sasi-3.2-integration-dtest/]| > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109984#comment-15109984 ] Pavel Yaskevich commented on CASSANDRA-10661: - I've pushed initial version of the clustering support for SASI into sasi-3.2-integration branch, it's not as well performing yet as it can be but is still an improvement over ClusteringColumnIndex because index intersection is used instead of filtering (although intersection currently only done on the partition key instead of partition key + clustering), so if anybody wants they can start testing it. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099200#comment-15099200 ] Pavel Yaskevich commented on CASSANDRA-10661: - Just as a quick update, I've ported in-memory index size estimation, so the only remaining thing on the list for phase #1 is clustering support. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094522#comment-15094522 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~doanduyhai] It uses built-in facilities for it, namely PartitionRangeReadCommand in 3.x since it returns results in the token order it doesn't have to scatter-gather right away and can do what normal read commands do. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094106#comment-15094106 ] DOAN DuyHai commented on CASSANDRA-10661: - [~xedin] If you have some time, can you point me to the source code (class) where SASI manages the fetching of data on other nodes in the ring ? Jason Brown told me that SASI does not use the scatter-gather technique but fetches data by token range: https://twitter.com/doanduyhai/status/662392685706289152 > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091764#comment-15091764 ] Sam Tunnicliffe commented on CASSANDRA-10661: - bq. maybe we should rename ORIGINAL into PREFIX at the same time, so we'll have - PREFIX, CONTAINS, SPARSE +1 > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092561#comment-15092561 ] Pavel Yaskevich commented on CASSANDRA-10661: - Pushed to [3.2-integration branch|https://github.com/xedin/cassandra] > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090954#comment-15090954 ] DOAN DuyHai commented on CASSANDRA-10661: - Looks good to me! > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090941#comment-15090941 ] Pavel Yaskevich commented on CASSANDRA-10661: - This got me thinking - maybe we should rename ORIGINAL into PREFIX at the same time, so we'll have - PREFIX, CONTAINS, SPARSE, WDYT? > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090025#comment-15090025 ] DOAN DuyHai commented on CASSANDRA-10661: - A minor remark. Shouldn't we take this integration into C* opportunity to *rename* the *SUFFIX* mode to *CONTAINS* ? Indeed, *NORMAL* and *SPARSE* indexing modes are quite self-explanatory whereas *SUFFIX* mode not only allows searching on suffixes but also on prefixes. Users can be confused by the suffix name and think it only works for suffix search > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090476#comment-15090476 ] Pavel Yaskevich commented on CASSANDRA-10661: - Sounds good, Doan! we can do that. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085957#comment-15085957 ] Sam Tunnicliffe commented on CASSANDRA-10661: - This is looking pretty good. A problem (which isn't caught by any of the unit tests btw) is that due to the fact that under the hood 3.x considers all compact storage columns as static. This breaks interactions with sasi-indexed tables via CQL - for example, try running through the examples in the original [SASI readme|https://github.com/xedin/sasi/blob/master/README.md] and you'll find querying mostly broken. {code} cqlsh:demo> select first_name, last_name, age, height, created_at from sasi where first_name = 'M'; InvalidRequest: code=2200 [Invalid query] message="Queries using 2ndary indexes don't support selecting only static columns" cqlsh:demo> cqlsh:demo> cqlsh:demo> select * from sasi where first_name = 'M'; id | age | created_at | first_name | height | last_name +-++++--- (0 rows) {code} Fortunately, I believe we can simply drop the use of COMPACT STORAGE. My (limited) testing suggests that when tables are created without it, everything that's currently implemented works as expected. The new SASI specific tests look good and are all green, but we obviously need to run this through CI before it's committed. On a related note, are there any dtests that may be worth adding? The utest coverage is pretty comprehensive (modulo the CQL issues) so I wouldn't say it was absolutely critical, but some multi-node & CQL based tests would be nice to have. Otherwise, this first phase of integration looks good to me. On initial review I found one bug and a handful of nits. I have a few scenarios I want to run through, mostly to verify how sasi interacts with some of the parts of the index subsystem that were changed in 3.0. Initial review comments: * The regex matching in o.a.c.io.sstable.Component.Type::fromRepresentation throws an NPE when it encounters an unknown name and tries to match it to a CUSTOM component. * In SASIIndex, getMetadataReloadTask & getBlockingFlushTask should be able to just return null (like getInitializationTask does). In the case of the former, that is true right now as the only call site is in SIM where nulls are properly handled. getBlockingFlushTask is also called from KeyCacheCqlTest which doesn't check for nulls so would need tweaking slightly. (This is totally minor, the irregularity in SASIIndex just bugged me). * I couldn't see why a PeekingIterator is used in OnDiskIndex::search * The use of "a" and "b" in the o.a.c.i.sasi.plan.Expression ctor seems like it could have the potential for pain when debugging. I'm sure that it isn't very likely we'll ever care too much & I don't have any particularly better suggestion but if you do, could these be changed to something more greppable (or extracted to constants)? * The anonymous extension of Expression in Operation::analyzeGroup can be replaced with {{perColumn.add(new Expression(controller, columnIndex).add(e.operator(), token));}} * MemIndex::estimateSize is unused * It doesn't really affect anything, but just for clarity I would rename MemoryUtil.DIRECT_BYTE_BUFFER_R_CLASS to RO_DIRECT_BYTE_BUFFER_CLASS * Most trivial of nits: brace placement in SchemaLoader (ln 255) > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086531#comment-15086531 ] Pavel Yaskevich commented on CASSANDRA-10661: - Thanks for the review! This is pretty ironic that COMPACT STORAGE is broken now, but we are going to fit this anyway once clustering support is added. :) dtests are definitely a good idea, I think we should have some already and we'll definitely port them main repo, I will also add my branch to CI tonight. Regarding review comments (all of the changes are pushed): bq. The regex matching in o.a.c.io.sstable.Component.Type::fromRepresentation throws an NPE Fixed. bq. In SASIIndex, getMetadataReloadTask & getBlockingFlushTask should be able to just return null Made both return null in SASIndex and made sure there would be no NPEs in existing call sites. bq. I couldn't see why a PeekingIterator is used in OnDiskIndex::search Moved back to Iterator that, since PeekingIterator was a rudiment of the previous version if search. bq. The use of "a" and "b" in the o.a.c.i.sasi.plan.Expression ctor seems like it could have the potential for pain when debugging. Renamed to "sasi", "internal". maybe it will make it a bit clearer... bq. The anonymous extension of Expression in Operation::analyzeGroup can be replaced Fixed, Thanks for catching that! bq. MemIndex::estimateSize is unused Is a TODO item for me as memory tracking is different in 3.0 (see my previous comment). bq. It doesn't really affect anything, but just for clarity I would rename MemoryUtil.DIRECT_BYTE_BUFFER_R_CLASS to RO_DIRECT_BYTE_BUFFER_CLASS Makes sense and done. bq. Most trivial of nits: brace placement in SchemaLoader (ln 255) Fixed :) > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069465#comment-15069465 ] Sam Tunnicliffe commented on CASSANDRA-10661: - [~xedin] that sounds to me like a reasonable approach. Does that basically make this ticket Patch Available based on your integration branch (if so, I'll be sure and wrap up review asap after the holidays)? > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070176#comment-15070176 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~beobal] Yes, I will remove OR/parenthesis features from the branch (since we have a separate repo we can port it back from), do some more cleanup, and mark this as Patch Available. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067415#comment-15067415 ] Pavel Yaskevich commented on CASSANDRA-10661: - [~beobal] Here is the latest status: I've attempted to integrate OR/Parenthesis into the CQL3 and SelectStatement which, as I've figured, actually would still require CASSADRA-10765 to be implemented since all of the restrictions have to be constructed/checked per logical operation (in other words, per CQL3 statement we'll have to build operation graph instead of current list approach) which would require substantial changes in SelectStatement, StatementRestrictions and other query processing classes. Maybe an alternative, and granular approach, would be more appropriate in this case: phase #1 - SASI goes into trunk supporting AND only (in other words, having QueryPlan internalized, no changes to CQL3); phase #2 - implement CASSANDRA-10765 with AND support only, which would supersede restriction support (via StatementRestrictions) in CQL3; phase #3 - add OR support to, by that time, already global QueryPlan. WDYT? > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048288#comment-15048288 ] Pavel Yaskevich commented on CASSANDRA-10661: - Just as a quick update - I've finished porting all SASI functionality except OR/parenthesis and NOT_EQ to trunk in my [branch|https://github.com/xedin/cassandra/commits/sasi-3.2-integration], all of the appropriate SASI tests pass now. Going to proceed with porting of new operations. > Integrate SASI to Cassandra > --- > > Key: CASSANDRA-10661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10661 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths >Reporter: Pavel Yaskevich >Assignee: Pavel Yaskevich > Labels: sasi > Fix For: 3.x > > > We have recently released new secondary index engine > (https://github.com/xedin/sasi) build using SecondaryIndex API, there are > still couple of things to work out regarding 3.x since it's currently > targeted on 2.0 released. I want to make this an umbrella issue to all of the > things related to integration of SASI, which are also tracked in > [sasi_issues|https://github.com/xedin/sasi/issues], into mainline Cassandra > 3.x release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)