[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-06-23 Thread Dave Brosius (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346368#comment-15346368
 ] 

Dave Brosius commented on CASSANDRA-10661:
--

{quote}
Dave Brosius Can you expand further why someMapEntry.equals(someAbstractTrie) 
will always be false ?
{quote}

Darn it, I misread. The equals isn't on AbstractTrie but on a subclass, so yes, 
you are right. Sorry for the noise.

> Integrate SASI to Cassandra
> ---
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>  Labels: sasi
> Fix For: 3.4
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There are 
> still a couple of things to work out regarding 3.x, since it's currently 
> targeted at the 2.0 release. I want to make this an umbrella issue for all of the 
> things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline Cassandra 
> 3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-06-23 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345995#comment-15345995
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

[~dbrosius] Can you expand further why 
{{someMapEntry.equals(someAbstractTrie)}} will always be false ?

According to the contract of {{Map.Entry::equals}}, as long as the key and 
value are equal, the equality holds.

I've tried a unit test and it works:

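{code:java}
import java.util.HashMap;
import java.util.Map;

import org.junit.Assert;
import org.junit.Test;

// AbstractTrie and AbstractPatriciaTrie are SASI's trie utility classes
public class AbstractTrieTest
{
    @Test
    public void should_test_equality() throws Exception
    {
        Map<String, Long> map = new HashMap<>();
        map.put("10", 10L);

        final Map.Entry<String, Long> mapEntry = map.entrySet().iterator().next();

        final AbstractTrie.BasicEntry<String, Long> trieEntry =
            new AbstractPatriciaTrie.TrieEntry<>("10", 10L, 0);

        // Map.Entry.equals compares only key and value, so this should hold
        Assert.assertTrue("mapEntry.equals(trieEntry)", mapEntry.equals(trieEntry));
    }
}
{code}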

{noformat}
% ant testsome -Dtest.name=org.apache.cassandra.index.sasi.utils.AbstractTrieTest -Dtest.methods=should_test_equality
...
testsome:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]          jar:file:/usr/local/Cellar/ant/1.9.4/libexec/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]      and jar:file:/Users/archinnovinfo/perso/cassandra/build/lib/jars/ant-1.9.4.jar!/org/apache/tools/ant/Project.class
[junit] Testsuite: org.apache.cassandra.index.sasi.utils.AbstractTrieTest
[junit] Testsuite: org.apache.cassandra.index.sasi.utils.AbstractTrieTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.058 sec
{noformat}
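For comparison, the same `Map.Entry` contract can be checked with JDK classes alone, without SASI on the classpath. This is a minimal sketch; `EntryEqualityDemo` and `equalBothWays` are illustrative names, not part of Cassandra. It also checks `hashCode`, which the `Map.Entry` contract requires to agree whenever `equals` does.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

public class EntryEqualityDemo {
    /** True when two entries are equal in both directions and agree on hashCode. */
    static boolean equalBothWays(Map.Entry<?, ?> a, Map.Entry<?, ?> b) {
        return a.equals(b) && b.equals(a) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Map<String, Long> map = new HashMap<>();
        map.put("10", 10L);
        // HashMap's own entry implementation
        Map.Entry<String, Long> hashMapEntry = map.entrySet().iterator().next();
        // A different Map.Entry implementation holding the same key and value
        Map.Entry<String, Long> simpleEntry = new AbstractMap.SimpleEntry<>("10", 10L);

        System.out.println(equalBothWays(hashMapEntry, simpleEntry)); // true
        // Same key, different value: not equal
        System.out.println(equalBothWays(hashMapEntry, new AbstractMap.SimpleEntry<>("10", 11L))); // false
    }
}
```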



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-06-22 Thread Dave Brosius (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345812#comment-15345812
 ] 

Dave Brosius commented on CASSANDRA-10661:
--

AbstractTrie.equals isn't symmetric.

someAbstractTrie.equals(someMapEntry) could be true, but

someMapEntry.equals(someAbstractTrie) will always be false.

That is unstable: the equals contract requires symmetry.
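To make the symmetry concern concrete, here is a hypothetical class (not SASI code) whose `equals` accepts any `Map.Entry` with the same key while not being a `Map.Entry` itself. Because `AbstractMap.SimpleEntry.equals` returns false for anything that is not a `Map.Entry`, the comparison holds in one direction only.

```java
import java.util.AbstractMap;
import java.util.Map;

public class AsymmetricEqualsDemo {
    // Hypothetical class for illustration: compares equal to Map.Entry objects by key,
    // but does not implement Map.Entry itself.
    static final class KeyHolder {
        private final String key;

        KeyHolder(String key) { this.key = key; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o instanceof KeyHolder) return key.equals(((KeyHolder) o).key);
            // Accepting a foreign type here is what breaks symmetry
            if (o instanceof Map.Entry) return key.equals(((Map.Entry<?, ?>) o).getKey());
            return false;
        }

        @Override
        public int hashCode() { return key.hashCode(); }
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> entry = new AbstractMap.SimpleEntry<>("10", 10L);
        KeyHolder holder = new KeyHolder("10");
        System.out.println(holder.equals(entry)); // true
        System.out.println(entry.equals(holder)); // false: KeyHolder is not a Map.Entry
    }
}
```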



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Chanh Le (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261541#comment-15261541
 ] 

Chanh Le commented on CASSANDRA-10661:
--

[~xedin] Thanks, man. You made my day.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261537#comment-15261537
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Hi [~giaosudau], the name of the index class is 
'org.apache.cassandra.index.sasi.SASIIndex'. You are most likely reading 
documentation specific to 2.0; here is the updated doc: 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md. It resides in the doc/ 
folder of the Apache Cassandra distribution.
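A small sketch of the corrected statement for the table and column from the question, assembled in Java. `SasiIndexCql` and `createIndex` are hypothetical helpers. The analyzer class path `org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer` is an assumption based on the 3.x package layout (the 2.0 name used `db.index.sasi`), so check it against doc/SASI.md for your version.

```java
public final class SasiIndexCql {
    // Index class named in the comment above
    static final String SASI_INDEX = "org.apache.cassandra.index.sasi.SASIIndex";
    // Assumption: 3.x package for the non-tokenizing analyzer
    static final String NON_TOKENIZING =
        "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer";

    /** Builds a CREATE CUSTOM INDEX statement for the given table and column. */
    static String createIndex(String table, String column, boolean caseSensitive) {
        return "CREATE CUSTOM INDEX ON " + table + " (" + column + ") USING '" + SASI_INDEX + "'\n"
            + "WITH OPTIONS = {\n"
            + "  'analyzer_class': '" + NON_TOKENIZING + "',\n"
            + "  'case_sensitive': '" + caseSensitive + "'\n"
            + "};";
    }

    public static void main(String[] args) {
        System.out.println(createIndex("bar", "fname", false));
    }
}
```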



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-04-27 Thread Chanh Le (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261528#comment-15261528
 ] 

Chanh Le commented on CASSANDRA-10661:
--

Hi, I am using Cassandra 3.5 and I have a problem when creating an index with it:
CREATE CUSTOM INDEX ON bar (fname) USING 
'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'
WITH OPTIONS = {
'analyzer_class':
'org.apache.cassandra.db.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};

It throws: unable to find custom indexer class 
'org.apache.cassandra.db.index.SSTableAttachedSecondaryIndex'





[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-27 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119962#comment-15119962
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

Last but not least, we have a pretty interesting use case for secondary 
indexes. With the *Spark/Cassandra* connector, since each Spark partition 
is distributed according to Cassandra token ranges, each CQL query is *node 
local*.

Consequently a 

{code:sql}
SELECT * FROM table WHERE col=xxx
{code} 

is transformed by the connector into  

{code:sql}
SELECT * FROM table WHERE col=xxx AND token(pk)>=aaa AND token(pk) <= bbb
{code}

The normal CQL query engine is token-aware and will restrict the secondary 
index query to the given range. *Does the SASI query planner also take token 
range restrictions into account to avoid hitting unnecessary nodes?*
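A minimal sketch of how a client could derive such token-bounded queries, assuming the Murmur3Partitioner token space of signed 64-bit values. `TokenRangeQueries`, `split` and `restrict` are hypothetical names, not the Spark connector's API, and the connector works from the cluster's actual token ranges rather than an even split. The sketch uses half-open (start, end] bounds instead of the `>= ... <=` form above so that adjacent ranges never overlap; it assumes no key hashes to `Long.MIN_VALUE`, which the first exclusive bound would skip.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public final class TokenRangeQueries {
    // Assumption: Murmur3Partitioner tokens are signed 64-bit longs
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Splits the ring into n contiguous (start, end] ranges; BigInteger avoids long overflow. */
    static List<long[]> split(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        BigInteger width = MAX.subtract(MIN);
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 1; i <= n; i++) {
            BigInteger end = (i == n)
                ? MAX
                : MIN.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
            start = end;
        }
        return ranges;
    }

    /** Adds token bounds to a hypothetical single-column equality query. */
    static String restrict(String table, String column, String value, String pk, long[] range) {
        return String.format(Locale.ROOT,
            "SELECT * FROM %s WHERE %s = %s AND token(%s) > %d AND token(%s) <= %d",
            table, column, value, pk, range[0], pk, range[1]);
    }

    public static void main(String[] args) {
        for (long[] range : split(4)) {
            System.out.println(restrict("table", "col", "'xxx'", "pk", range));
        }
    }
}
```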



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-27 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120020#comment-15120020
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Yes, it does account for the specified range, and it won't even touch 
SSTables that are outside the range.
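Skipping SSTables outside the queried range amounts to an interval-overlap test. A hypothetical sketch, assuming each SSTable can report its lowest and highest token; the `SSTableSummary` type and `overlapping` method are invented for illustration and do not mirror Cassandra's internal classes. The query range is closed, matching the `>= aaa AND <= bbb` form in the question.

```java
import java.util.ArrayList;
import java.util.List;

public final class SSTableRangeFilter {
    /** Hypothetical SSTable summary: the lowest and highest token it contains. */
    static final class SSTableSummary {
        final String name;
        final long minToken;
        final long maxToken;

        SSTableSummary(String name, long minToken, long maxToken) {
            this.name = name;
            this.minToken = minToken;
            this.maxToken = maxToken;
        }
    }

    /** Keeps SSTables whose [minToken, maxToken] intersects the query range [start, end]. */
    static List<SSTableSummary> overlapping(List<SSTableSummary> sstables, long start, long end) {
        List<SSTableSummary> result = new ArrayList<>();
        for (SSTableSummary s : sstables) {
            // Two closed intervals overlap unless one ends before the other starts
            if (s.maxToken >= start && s.minToken <= end) {
                result.add(s);
            }
        }
        return result;
    }
}
```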



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113677#comment-15113677
 ] 

Sam Tunnicliffe commented on CASSANDRA-10661:
-

[~xedin] SGTM!



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113690#comment-15113690
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~beobal] Awesome, will try to do everything tomorrow, thanks!



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113918#comment-15113918
 ] 

Jordan West commented on CASSANDRA-10661:
-

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, 
it seems as if by enabling prefix or contains, that it will always query by 
prefix or contains. For example, if I want to query for full first name, like 
where their full first name really is "J" and not get "John" and "James" as 
well, while at other times I am indeed looking for names starting with a prefix 
of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation 
in CQL, and we decided not to extend the grammar further, since we have already 
had to defer some of our grammar changes to later phases (removing OR, grouping, 
and != support for now). Ideally, CQL would support a `LIKE` operator similar 
to SQL, and depending on whether the index was created with `PREFIX` or `CONTAINS` 
we would allow or disallow forms such as `%Jo%` or `_j%`.
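A sketch of the pattern rules described above, under stated assumptions: `%` appears only at the start and/or end of a pattern, `_` is not handled, and `LikePatterns` with its `Mode`/`Kind` enums is hypothetical rather than actual CQL grammar. It also shows how an exact pattern such as `J` would avoid matching `John`, the case raised in the question.

```java
public final class LikePatterns {
    // Index modes named in the comment above
    enum Mode { PREFIX, CONTAINS }

    // Shapes of a pattern that uses '%' only at its ends (a simplifying assumption)
    enum Kind { EXACT, PREFIX, SUFFIX, CONTAINS }

    static Kind classify(String pattern) {
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.length() > 1 && pattern.endsWith("%");
        if (leading && trailing) return Kind.CONTAINS;
        if (leading) return Kind.SUFFIX;
        if (trailing) return Kind.PREFIX;
        return Kind.EXACT;
    }

    /** Exact and prefix lookups work in either mode; a leading '%' needs the suffixes CONTAINS stores. */
    static boolean allowed(Mode mode, String pattern) {
        Kind kind = classify(pattern);
        return kind == Kind.EXACT || kind == Kind.PREFIX || mode == Mode.CONTAINS;
    }

    /** Evaluates the pattern against one value (case-sensitive; analyzers are ignored). */
    static boolean matches(String pattern, String value) {
        int from = pattern.startsWith("%") ? 1 : 0;
        int to = (pattern.length() > 1 && pattern.endsWith("%")) ? pattern.length() - 1 : pattern.length();
        String term = pattern.substring(from, to);
        switch (classify(pattern)) {
            case EXACT:  return value.equals(term);
            case PREFIX: return value.startsWith(term);
            case SUFFIX: return value.endsWith(term);
            default:     return value.contains(term);
        }
    }
}
```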

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

It does, but so do all queries on numerical data, which, thinking about it, may 
make the `PREFIX` option confusing for numeric types. SPARSE is intended to 
improve query performance on numerical data where there are a large number of 
terms (e.g. timestamps) but a small number of keys per term (e.g. some 
time-series data). `SPARSE` should not be used on every numerical column, and 
for most non-numerical data it is not an ideal setting either. For example, in a 
large data set of first names the number of distinct names will be small compared 
to the number of keys, and given the distribution of first names, using SPARSE will 
increase the size of the index and at best have zero effect on query 
performance; it may even hurt it.
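A rough way to reason about the advice above is to compare the number of keys with the number of distinct terms. The sketch below is a hypothetical diagnostic, not part of SASI; it assumes one indexed term per key, and the `maxKeysPerTerm` threshold is supplied by the caller because no numeric cutoff is given here.

```java
import java.util.HashSet;
import java.util.Map;

public final class TermDensity {
    /** Average number of keys per distinct indexed term (one term per key assumed). */
    static double averageKeysPerTerm(Map<String, String> termByKey) {
        if (termByKey.isEmpty()) return 0.0;
        int distinctTerms = new HashSet<>(termByKey.values()).size();
        return (double) termByKey.size() / distinctTerms;
    }

    /** Hypothetical heuristic: SPARSE only looks plausible when each term maps to few keys. */
    static boolean looksSparse(Map<String, String> termByKey, double maxKeysPerTerm) {
        return !termByKey.isEmpty() && averageKeysPerTerm(termByKey) <= maxKeysPerTerm;
    }
}
```

With millisecond timestamps nearly every key gets its own term, while first names repeat heavily, which is the contrast drawn above.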



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113816#comment-15113816
 ] 

Jack Krupansky commented on CASSANDRA-10661:


So is this stuff actually ready to release? I mean, consistent with the new 
philosophy that "trunk is always releasable"? IOW, if it does get committed, it 
will be in 3.4 no matter what? I only ask because there seemed to be stuff in 
flux fairly recently (a couple of days ago), suggesting it wasn't quite baked 
enough to be considered "releasable".



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113836#comment-15113836
 ] 

Jack Krupansky commented on CASSANDRA-10661:


Is there also a way to query a SASI-indexed column by exact value? I mean, it 
seems as if, by enabling prefix or contains, it will always query by prefix 
or contains. For example, if I want to query for a full first name, like where 
the full first name really is "J", and not get "John" and "James" as well, 
while at other times I am indeed looking for names starting with a prefix of 
"Jo" for "John", "Joseph", etc.

Or can I indeed have two indexes on a single column, one a traditional exact 
match and one a prefix match? Hmm... in that case, which gets used if I just 
specify a column name?

CREATE INDEX first_name_full ON table 
CREATE CUSTOM INDEX first_name_prefix ...

It would be good to have an example that illustrates this. In fact, I would 
argue that first and last names are perfect examples of where you really do 
need to query on both exact match and partial match. In fact, I'm not sure I 
can think of any examples of non-tokenized text fields where you don't want to 
reserve the ability to find an exact match even if you do need partial matches 
for some queries.

Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

Are there any use cases of traditional Cassandra indexes which shouldn't almost 
automatically be converted to SPARSE? After all, the current recommended best 
practice is to avoid secondary indexes where the column cardinality is either 
very high or very low, which seems to be a match for SPARSE, although the 
precise meaning of SPARSE is still a bit fuzzy for me.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113842#comment-15113842
 ] 

Jon Haddad commented on CASSANDRA-10661:


If SPARSE means what Jack is implying, perhaps a better name for it would be 
EXACT.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113967#comment-15113967
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

bq. So is this stuff actually ready to release? I mean, consistent with the new 
philosophy that "trunk is always releasable"? IOW, if it does get committed, it 
will be in 3.4 no matter what? I only ask because it just seemed that there was 
stuff in flux fairly recently (a couple days ago), suggested it wasn't quite 
baked enough to be considered "releasable".

Yes, it is ready to release: the recently added changes are ported from 2.0, 
and clustering support is just a couple of lines of additional filtering with no 
internal data structure changes. It is also an opt-in feature, which doesn't 
affect core functionality until enabled. That is also why we don't want to make 
any of the CQL front-end changes right away, but would rather migrate more 
gradually.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-23 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15114168#comment-15114168
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Pushed as squashed commit 
[72790dc|https://github.com/apache/cassandra/commit/72790dc8e34826b39ac696b03025ae6b7b6beb2b].
 I'm going to resolve this issue and promote CASSANDRA-10765 from sub-task.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112994#comment-15112994
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~beobal] How about `unfilteredCluster`? Since we are on the same page about 
this, here is what I'm thinking: instead of keeping the README.md we have in 
xedin/sasi, I'm going to put it into doc/SASI.md, squash all 17 commits into 
one, and push to trunk. Sounds good?



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113071#comment-15113071
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~doanduyhai] 

Regarding CONTAINS mode: it's more expensive to build, since it has to extract 
the suffixes (which is exactly what search people do), but that in turn makes it 
no more expensive to query than PREFIX columns. Regarding sorting: this is 
currently not a priority, since supporting it would require extensive changes to 
Cassandra interfaces. MAX_ROWS is currently the same per-result-page restriction 
as in CQL3.
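A rough in-memory illustration of why CONTAINS is costlier to build: every suffix of every term is stored, so a substring query turns into a prefix range scan over the sorted suffixes. This sketch substitutes a `java.util.TreeMap` for SASI's on-disk index structures, and `SuffixIndexSketch` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public final class SuffixIndexSketch {
    // Sorted map from each suffix to the keys whose indexed term ends with that suffix
    private final TreeMap<String, Set<String>> suffixes = new TreeMap<>();

    /** CONTAINS-style build: a term of length n produces n suffix entries. */
    void add(String key, String term) {
        for (int i = 0; i < term.length(); i++) {
            suffixes.computeIfAbsent(term.substring(i), s -> new TreeSet<>()).add(key);
        }
    }

    /** Number of distinct suffixes stored so far (a rough proxy for build cost). */
    int suffixEntries() {
        return suffixes.size();
    }

    /** Substring query: each occurrence of the fragment starts some suffix, so scan that prefix range. */
    Set<String> contains(String fragment) {
        Set<String> result = new TreeSet<>();
        // Upper bound is fragment followed by Character.MAX_VALUE (assumes data never contains that char)
        for (Set<String> keys : suffixes.subMap(fragment, fragment + Character.MAX_VALUE).values()) {
            result.addAll(keys);
        }
        return result;
    }
}
```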




[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-22 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113138#comment-15113138
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

Thanks for the clarifications



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113157#comment-15113157
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

No problem! Just to clarify a bit more: the ORDER BY I mean is the "real SQL" 
ORDER BY (the one which sorts keys), not the currently built-in one which 
depends on CLUSTERING ORDER; that one would still work with indexes the same way 
it works right now.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-22 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113051#comment-15113051
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

Hello [~xedin], it's me again.

I've had some discussions with search people, and they told me that wildcard 
searches (name like "\*x\*") are very expensive. Classical data structures 
like suffix trees are adapted for prefix searching (name like "xxx\*"). For 
suffix searching (name like "\*xxx") they create a *reversed* index. Does that 
mean that the CONTAINS mode (formerly named SUFFIX) is more expensive than the 
NORMAL search mode? If so, how much more expensive is it (2x? an order of 
magnitude?)
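The cost question can be illustrated with a small hypothetical model. This is not SASI's on-disk structure, and the class and method names are illustrative only. A prefix lookup is one range scan over sorted terms, a suffix lookup is the same scan over reversed terms, and a contains lookup needs one dictionary entry per suffix of every term:

```java
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not SASI's data structures): how prefix, suffix and
 * contains lookups can be served from sorted term dictionaries.
 */
public class WildcardIndexSketch
{
    // Appended to a search string to form an inclusive upper bound for a prefix range (BMP-only simplification).
    private static final char MAX = Character.MAX_VALUE;

    private final TreeSet<String> terms = new TreeSet<>();                            // serves 'xxx%'
    private final TreeSet<String> reversedTerms = new TreeSet<>();                    // serves '%xxx'
    private final TreeMap<String, SortedSet<String>> suffixToTerms = new TreeMap<>(); // serves '%xxx%'

    public void add(String term)
    {
        terms.add(term);
        reversedTerms.add(reverse(term));
        // One dictionary entry per suffix: the extra space a contains-style index pays.
        for (int i = 0; i < term.length(); i++)
            suffixToTerms.computeIfAbsent(term.substring(i), k -> new TreeSet<>()).add(term);
    }

    /** name LIKE 'p%': a single range scan over the sorted terms. */
    public SortedSet<String> prefix(String p)
    {
        return new TreeSet<>(terms.subSet(p, true, p + MAX, true));
    }

    /** name LIKE '%s': a prefix scan over the reversed terms. */
    public SortedSet<String> suffix(String s)
    {
        String rs = reverse(s);
        SortedSet<String> out = new TreeSet<>();
        for (String r : reversedTerms.subSet(rs, true, rs + MAX, true))
            out.add(reverse(r));
        return out;
    }

    /** name LIKE '%c%': a prefix scan over all suffixes, mapped back to their terms. */
    public SortedSet<String> contains(String c)
    {
        SortedSet<String> out = new TreeSet<>();
        for (SortedSet<String> owners : suffixToTerms.subMap(c, true, c + MAX, true).values())
            out.addAll(owners);
        return out;
    }

    /** Number of distinct suffix keys stored for contains queries. */
    public int suffixEntries()
    {
        return suffixToTerms.size();
    }

    private static String reverse(String s)
    {
        return new StringBuilder(s).reverse().toString();
    }
}
```

In this model, four terms totalling 24 characters produce 4 prefix entries but 24 suffix entries. The gap grows roughly with average term length. This shows only where the extra space comes from and makes no claim about SASI's measured cost.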

Second question, more related to the implementation: since you query the nodes 
following the token range and do not hit all nodes like a normal secondary index 
does, does that imply that *sorting* (ORDER BY) is no longer relevant, since you do 
not retrieve all possible results? (I've seen in QueryPlan.MAX_ROWS that there is 
a hard-coded limit of 10 000 results.)
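The concern behind this question can be shown with a hypothetical model that is not SASI code. Suppose a scan collects matches in token order and stops at a cap, such as the 10 000 rows mentioned for QueryPlan.MAX_ROWS. Sorting the collected rows then gives a different answer from sorting all matches first and applying the limit afterwards:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: why sorting a capped, token-ordered result set is not a global ORDER BY. */
public class TruncatedSortSketch
{
    /** A matching row as returned by a token-ordered scan: its token and a column a user might sort on. */
    public static final class Row
    {
        public final long token;
        public final int value;

        public Row(long token, int value)
        {
            this.token = token;
            this.value = value;
        }
    }

    /** What a capped scan can offer: keep the first `limit` rows in token order, then sort them. */
    public static List<Integer> sortAfterCap(List<Row> rowsInTokenOrder, int limit)
    {
        List<Row> page = new ArrayList<>(rowsInTokenOrder.subList(0, Math.min(limit, rowsInTokenOrder.size())));
        page.sort((a, b) -> Integer.compare(a.value, b.value));
        return values(page);
    }

    /** What "real SQL" ORDER BY ... LIMIT means: sort every match, then keep `limit`. */
    public static List<Integer> capAfterSort(List<Row> rowsInTokenOrder, int limit)
    {
        List<Row> all = new ArrayList<>(rowsInTokenOrder);
        all.sort((a, b) -> Integer.compare(a.value, b.value));
        return values(all.subList(0, Math.min(limit, all.size())));
    }

    private static List<Integer> values(List<Row> rows)
    {
        List<Integer> out = new ArrayList<>();
        for (Row r : rows)
            out.add(r.value);
        return out;
    }
}
```

For example, take rows with values 50, 40, 10 and 5 in token order and a limit of 2. The capped approach yields 40, 50, while a true ORDER BY yields 5, 10.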

Sorry to bother you with my questions, but they are important so that we, 
evangelists, can recommend the right use cases to users and, especially, deter 
them from misusing SASI when it's not appropriate or when the search cost is 
prohibitive.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-22 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112458#comment-15112458
 ] 

Sam Tunnicliffe commented on CASSANDRA-10661:
-

bq. So let's prioritize phase #2 and #3 over that for now, WDYT?

Sounds good to me. I've only one tiny comment about the latest commits, which 
is that it's slightly odd API-wise that {{SSTableFlushObserver::nextRow}} takes 
an {{Unfiltered}} rather than a {{Row}}. Maybe renaming it to 
{{nextUnfiltered}} or even just {{flushed}} would make sense. 




[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-21 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111582#comment-15111582
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~beobal] Thinking more about it, we should probably go ahead with the 
clustering implementation I finished last night to conclude phase #1, since it's 
already an improvement over internal indexes. Implementing flexible-size 
TokenTree keys would take some time, and while it would speed up intersections, 
it would not necessarily speed up the satisfies-by check at the end of the 
query, because we have to scan through all row clusters anyway, so its benefit 
is uncertain at this point. So let's prioritize phase #2 and #3 over that for 
now, WDYT?

P.S. I have been running testall and dtest in CI (0 failures for testall, some 
old failures for dtest):

||branch||testall||dtest||
|[sasi-integration|https://github.com/xedin/sasi/]|[testall|http://cassci.datastax.com/job/xedin-sasi-3.2-integration-testall/]|[dtest|http://cassci.datastax.com/job/xedin-sasi-3.2-integration-dtest/]|




[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-20 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109984#comment-15109984
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

I've pushed an initial version of clustering support for SASI into the 
sasi-3.2-integration branch. It doesn't perform as well as it could yet, but it 
is still an improvement over ClusteringColumnIndex because index intersection 
is used instead of filtering (although intersection is currently done only on the 
partition key rather than on partition key + clustering), so anybody who wants to 
can start testing it.
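Here is a minimal sketch of index intersection versus filtering. It assumes each index returns its matching partitions as an ascending, duplicate-free list of tokens, which is a simplification. The names are hypothetical and this is not SASI's TokenTree code. A linear merge keeps only the tokens present in both lists, so only partitions that match both expressions are read. Filtering would instead read every partition that matches one expression and check the other on each row:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: intersect the sorted token lists returned by two index lookups. */
public final class SortedIntersectionSketch
{
    /** Both inputs must be ascending and duplicate-free; runs in O(a.size() + b.size()). */
    public static List<Long> intersect(List<Long> a, List<Long> b)
    {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size())
        {
            int cmp = Long.compare(a.get(i), b.get(j));
            if (cmp == 0)
            {
                out.add(a.get(i)); // present in both indexes
                i++;
                j++;
            }
            else if (cmp < 0)
                i++; // advance whichever side is behind
            else
                j++;
        }
        return out;
    }

    private SortedIntersectionSketch() {}
}
```

Intersecting on the partition key alone, as described above, can still return partitions where the individual rows fail one of the predicates. Those rows then have to be checked after the intersection.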



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-14 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099200#comment-15099200
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Just as a quick update, I've ported in-memory index size estimation, so the 
only remaining thing on the list for phase #1 is clustering support.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-12 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094522#comment-15094522
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~doanduyhai] It uses the built-in facilities for that, namely 
PartitionRangeReadCommand in 3.x. Since it returns results in token order, it 
doesn't have to scatter-gather right away and can do what normal read commands 
do.
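A simplified sketch of reading in token order instead of scatter-gather follows. TokenRange, scan and the fetch callback are hypothetical names, not Cassandra APIs. Ranges are queried in ring order, and the coordinator stops as soon as the limit is met, so later ranges are never contacted. Cassandra's real range-read path may query several ranges concurrently, so treat this as a model of the ordering only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical sketch: read token ranges in ring order and stop once the limit is reached. */
public final class TokenOrderedScanSketch
{
    /** A token range (start exclusive, end inclusive), assumed to be served by one replica set. */
    public static final class TokenRange
    {
        public final long start;
        public final long end;

        public TokenRange(long start, long end)
        {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * @param rangesInTokenOrder ranges sorted by token, so results come back in token order
     * @param limit              maximum number of rows to return
     * @param fetch              stand-in for a per-range read; receives the range and the remaining row budget
     */
    public static List<Long> scan(List<TokenRange> rangesInTokenOrder, int limit,
                                  BiFunction<TokenRange, Integer, List<Long>> fetch)
    {
        List<Long> results = new ArrayList<>();
        for (TokenRange range : rangesInTokenOrder)
        {
            if (results.size() >= limit)
                break; // the limit is satisfied: later ranges are never contacted
            results.addAll(fetch.apply(range, limit - results.size()));
        }
        return results;
    }

    private TokenOrderedScanSketch() {}
}
```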



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-12 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094106#comment-15094106
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

[~xedin] If you have some time, can you point me to the source code (class) 
where SASI manages the fetching of data from other nodes in the ring? Jason 
Brown told me that SASI does not use the scatter-gather technique but fetches 
data by token range: https://twitter.com/doanduyhai/status/662392685706289152



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-11 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091764#comment-15091764
 ] 

Sam Tunnicliffe commented on CASSANDRA-10661:
-

bq. maybe we should rename ORIGINAL into PREFIX at the same time, so we'll have 
- PREFIX, CONTAINS, SPARSE
+1



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-11 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092561#comment-15092561
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Pushed to [3.2-integration branch|https://github.com/xedin/cassandra]



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-10 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090954#comment-15090954
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

Looks good to me!



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090941#comment-15090941
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

This got me thinking - maybe we should rename ORIGINAL into PREFIX at the same 
time, so we'll have - PREFIX, CONTAINS, SPARSE, WDYT?



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-08 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090025#comment-15090025
 ] 

DOAN DuyHai commented on CASSANDRA-10661:
-

A minor remark: shouldn't we take the opportunity of this integration into C* to 
*rename* the *SUFFIX* mode to *CONTAINS*?
Indeed, the *NORMAL* and *SPARSE* indexing modes are quite self-explanatory, whereas 
the *SUFFIX* mode allows searching not only on suffixes but also on prefixes. 

Users may be confused by the name and think it only works for suffix 
search.
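The naming argument can be made concrete with a hypothetical model, not SASI's classes. It uses the two mode names proposed in this thread, PREFIX and CONTAINS, and maps each to the LIKE pattern shapes it would accept. A contains-style index answers prefix, suffix and substring patterns, while a prefix-only index answers only 'abc%'. SPARSE is left out because it is not about LIKE patterns:

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the proposed mode names and the LIKE pattern shapes each would serve. */
public final class IndexModeSketch
{
    public enum PatternKind { PREFIX, SUFFIX, CONTAINS }

    public enum Mode
    {
        PREFIX(EnumSet.of(PatternKind.PREFIX)),     // 'abc%' only
        CONTAINS(EnumSet.allOf(PatternKind.class)); // 'abc%', '%abc' and '%abc%'

        private final Set<PatternKind> supported;

        Mode(Set<PatternKind> supported)
        {
            this.supported = supported;
        }

        public boolean supports(String likePattern)
        {
            return supported.contains(classify(likePattern));
        }
    }

    /** Classifies a LIKE pattern by where its '%' wildcards are. */
    public static PatternKind classify(String pattern)
    {
        if (pattern.length() < 2)
            throw new IllegalArgumentException("pattern too short: " + pattern);
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.endsWith("%");
        if (leading && trailing)
            return PatternKind.CONTAINS;
        if (leading)
            return PatternKind.SUFFIX;
        if (trailing)
            return PatternKind.PREFIX;
        throw new IllegalArgumentException("no wildcard in pattern: " + pattern);
    }

    private IndexModeSketch() {}
}
```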



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-08 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090476#comment-15090476
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Sounds good, Doan! we can do that. 



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-06 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085957#comment-15085957
 ] 

Sam Tunnicliffe commented on CASSANDRA-10661:
-

This is looking pretty good. 

One problem (which isn't caught by any of the unit tests, btw) is that under the 
hood 3.x considers all compact storage columns as static. 
This breaks interactions with SASI-indexed tables via CQL: for example, try 
running through the examples in the original [SASI 
readme|https://github.com/xedin/sasi/blob/master/README.md] and you'll find 
querying mostly broken. 

{code}
cqlsh:demo> select first_name, last_name, age, height, created_at from sasi 
where first_name = 'M';
InvalidRequest: code=2200 [Invalid query] message="Queries using 2ndary indexes 
don't support selecting only static columns"
cqlsh:demo>
cqlsh:demo>
cqlsh:demo> select * from sasi where first_name = 'M';

 id | age | created_at | first_name | height | last_name
----+-----+------------+------------+--------+-----------

(0 rows)
{code}

Fortunately, I believe we can simply drop the use of COMPACT STORAGE. My 
(limited) testing suggests that when tables are created without it, everything 
that's currently implemented works as expected.

The new SASI specific tests look good and are all green, but we obviously need 
to run this through CI before it's committed. On a related note, are there any 
dtests that may be worth adding? The utest coverage is pretty comprehensive 
(modulo the CQL issues) so I wouldn't say it was absolutely critical, but some 
multi-node & CQL based tests would be nice to have.

Otherwise, this first phase of integration looks good to me. On initial review 
I found one bug and a handful of nits. I have a few scenarios I want to run 
through, mostly to verify how sasi interacts with some of the parts of the 
index subsystem that were changed in 3.0.

Initial review comments:

* The regex matching in o.a.c.io.sstable.Component.Type::fromRepresentation 
throws an NPE when it encounters an unknown name and tries to match it to a 
CUSTOM component.
* In SASIIndex, getMetadataReloadTask & getBlockingFlushTask should be able to 
just return null (like getInitializationTask does). In the case of the former, 
that is true right now as the only call site is in SIM where nulls are properly 
handled. getBlockingFlushTask is also called from KeyCacheCqlTest which doesn't 
check for nulls so would need tweaking slightly. (This is totally minor, the 
irregularity in SASIIndex just bugged me).
* I couldn't see why a PeekingIterator is used in OnDiskIndex::search
* The use of "a" and "b" in the o.a.c.i.sasi.plan.Expression ctor seems like it 
could have the potential for pain when debugging. I'm sure that it isn't very 
likely we'll ever care too much & I don't have any particularly better 
suggestion but if you do, could these be changed to something more greppable 
(or extracted to constants)?
* The anonymous extension of Expression in Operation::analyzeGroup can be 
replaced with {{perColumn.add(new Expression(controller, 
columnIndex).add(e.operator(), token));}}
* MemIndex::estimateSize is unused
* It doesn't really affect anything, but just for clarity I would rename 
MemoryUtil.DIRECT_BYTE_BUFFER_R_CLASS to RO_DIRECT_BYTE_BUFFER_CLASS
* Most trivial of nits: brace placement in SchemaLoader (ln 255)
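The first review item can be illustrated with a reduced, hypothetical model. The enum constants and patterns below are illustrative, not Cassandra's actual Component.Type. Suppose a type with no fixed representation stores a null pattern. Passing that null to Pattern.matches throws a NullPointerException, so the lookup has to skip such types and fall back to CUSTOM for unknown names:

```java
import java.util.regex.Pattern;

/** Hypothetical, reduced model of a component-type lookup by file-name pattern. */
public enum ComponentTypeSketch
{
    DATA("Data.db"),
    FILTER("Filter.db"),
    SECONDARY_INDEX("SI_.*.db"),
    CUSTOM(null); // no fixed representation, so no pattern

    private final String repr;

    ComponentTypeSketch(String repr)
    {
        this.repr = repr;
    }

    public static ComponentTypeSketch fromRepresentation(String name)
    {
        for (ComponentTypeSketch type : values())
        {
            // Pattern.matches(null, name) would throw NullPointerException, so skip pattern-less types.
            if (type.repr != null && Pattern.matches(type.repr, name))
                return type;
        }
        return CUSTOM; // unknown names fall back to CUSTOM instead of failing
    }
}
```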




[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2016-01-06 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086531#comment-15086531
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Thanks for the review!

It's pretty ironic that COMPACT STORAGE is broken now, but we are going to 
fix this anyway once clustering support is added. :) dtests are definitely a 
good idea; I think we should have some already, and we'll definitely port them to 
the main repo. I will also add my branch to CI tonight.

Regarding review comments (all of the changes are pushed): 

bq. The regex matching in o.a.c.io.sstable.Component.Type::fromRepresentation 
throws an NPE
Fixed.

bq. In SASIIndex, getMetadataReloadTask & getBlockingFlushTask should be able 
to just return null
Made both return null in SASIIndex and made sure there would be no NPEs in 
existing call sites.

bq. I couldn't see why a PeekingIterator is used in OnDiskIndex::search
Moved back to Iterator, since PeekingIterator was a leftover 
from the previous version of search.

bq. The use of "a" and "b" in the o.a.c.i.sasi.plan.Expression ctor seems like 
it could have the potential for pain when debugging.
Renamed to "sasi" and "internal"; maybe that will make it a bit clearer...

bq. The anonymous extension of Expression in Operation::analyzeGroup can be 
replaced
Fixed, thanks for catching that!

bq. MemIndex::estimateSize is unused
This is a TODO item for me, as memory tracking is different in 3.0 (see my 
previous comment).

bq. It doesn't really affect anything, but just for clarity I would rename 
MemoryUtil.DIRECT_BYTE_BUFFER_R_CLASS to RO_DIRECT_BYTE_BUFFER_CLASS
Makes sense and done.

bq. Most trivial of nits: brace placement in SchemaLoader (ln 255)
Fixed :)



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2015-12-23 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069465#comment-15069465
 ] 

Sam Tunnicliffe commented on CASSANDRA-10661:
-

[~xedin] that sounds to me like a reasonable approach. Does that basically make 
this ticket Patch Available based on your integration branch (if so, I'll be 
sure to wrap up the review asap after the holidays)?



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2015-12-23 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070176#comment-15070176
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~beobal] Yes, I will remove OR/parenthesis features from the branch (since we 
have a separate repo we can port it back from), do some more cleanup, and mark 
this as Patch Available.



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2015-12-21 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067415#comment-15067415
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

[~beobal] Here is the latest status: I've attempted to integrate OR/parenthesis 
support into CQL3 and SelectStatement, which, as I've figured out, would actually 
still require CASSANDRA-10765 to be implemented, since all of the restrictions 
have to be constructed/checked per logical operation (in other words, per CQL3 
statement we'll have to build an operation graph instead of using the current 
list approach). That would require substantial changes in SelectStatement, 
StatementRestrictions and other query processing classes. Maybe an alternative, 
more granular approach would be more appropriate in this case:

phase #1 - SASI goes into trunk supporting AND only (in other words, having 
QueryPlan internalized, no changes to CQL3);
phase #2 - implement CASSANDRA-10765 with AND support only, which would 
supersede restriction support (via StatementRestrictions) in CQL3;
phase #3 - add OR support to, by that time, already global QueryPlan.

WDYT?
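The difference between the current list of restrictions and the operation graph described in this comment can be sketched as follows. Node, eq, and and or are illustrative names, not Cassandra classes. A flat list of restrictions is equivalent to a single AND node over leaf predicates, while OR and parentheses require nested nodes that are evaluated recursively:

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of a boolean operation tree evaluated against a single row. */
public final class OperationTreeSketch
{
    /** A node of the tree; a row is modelled as column name -> value. */
    public interface Node extends Predicate<Map<String, String>> {}

    /** Leaf: column = value. */
    public static Node eq(String column, String value)
    {
        return row -> value.equals(row.get(column));
    }

    /** AND node: a flat restriction list is just one of these over leaves. */
    public static Node and(Node... children)
    {
        return row -> {
            for (Node child : children)
                if (!child.test(row))
                    return false; // short-circuit on the first failing child
            return true;
        };
    }

    /** OR node: only expressible once restrictions form a tree rather than a list. */
    public static Node or(Node... children)
    {
        return row -> {
            for (Node child : children)
                if (child.test(row))
                    return true; // short-circuit on the first matching child
            return false;
        };
    }

    private OperationTreeSketch() {}
}
```

For instance, the expression `(first_name = 'M' AND age = 40) OR age = 30` becomes `or(and(eq("first_name", "M"), eq("age", "40")), eq("age", "30"))`. That shape cannot be expressed as a single list of ANDed restrictions.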



[jira] [Commented] (CASSANDRA-10661) Integrate SASI to Cassandra

2015-12-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048288#comment-15048288
 ] 

Pavel Yaskevich commented on CASSANDRA-10661:
-

Just as a quick update - I've finished porting all SASI functionality except 
OR/parenthesis and NOT_EQ to trunk in my 
[branch|https://github.com/xedin/cassandra/commits/sasi-3.2-integration], all 
of the appropriate SASI tests pass now. Going to proceed with porting of new 
operations.
