Re: Ordering by multiple columns?

2016-10-10 Thread Mikhail Krupitskiy
Looks like ordering by multiple columns in Cassandra has few sides that are not 
obvious.
I wasn’t able to find this information in the official documentation but it’s 
quite well described here:
http://stackoverflow.com/questions/35708118/where-and-order-by-clauses-in-cassandra-cql
 


Thanks,
Mikhail
> On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:
> 
> No, we didn't record the talk this time unfortunately :(
> 
> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar  > wrote:
> Really helpful slides. Is there a video to go with them?
> 
> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan  > wrote:
> Yes it is possible, read this: 
> http://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>  
> 
> 
> and the following slides
> 
> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  > wrote:
> Is it possible to have multiple clustering keys in cassandra, or some other 
> way to order by multiple columns?
> 
> For example, say I have a table of songs, and each song has a rating and a 
> date.
> 
> I want to sort songs by rating first, and then with newer songs on top.
> 
> So if two songs have 5 rating, and one's date is 1st Feb, the other is 2nd 
> Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
> 
> Like this:
> 
> Select * from songs order by rating, createdAt
> 
> Is this possible?
> 
> 
> 



Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-10-04 Thread Mikhail Krupitskiy
Please see my comments inline.

Thanks,
Mikhail
> On 26 Sep 2016, at 17:07, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> "In the current implementation (‘%’ could be a wildcard only at the start/end 
> of a term) I guess it should be ’ENDS with ‘%escape’ ‘." 
> 
> --> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI to 
> understand the %% as an escape for % so the goal is that SASI understands 
> LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ?
I guess that the goal is to define a way to use ‘%’ as a simple char.
LIKE '%escape' - ENDS WITH 'escape'
LIKE '%%escape' - EQUALS TO '%escape’
LIKE '%%escape%' - STARTS WITH '%escape’

LIKE ‘%%%escape’ - undefined in general case
LIKE ‘%%%escape’ - ENDS WITH “%escape” in a case when we know that a wildcard 
could be only at the start/end.
> 
> "Moreover all terms that contains single ‘%’ somewhere in the middle should 
> cause an exception."
> 
> --> Not necessarily, sometime people may want to search text pattern 
> containing the literal %. Imagine the text "this year the average income has 
> increase by 10%". People may want to search for "10%”.
If someone wants to search for ’10%’ then he should escape the ‘%’ char: like 
“10%%”.
> 
> 
> 
> "BUT may be it’s better to make escaping more universal to support a future 
> possible case where a wildcard could be placed in the middle of a term too?"
> 
> --> I guess universal escaping for % is the cleaner and better solution. 
> However it may involve some complex regular expression. I'm not sure that 
> input.replaceAll("%%", "%") trick would work for any cases.
As I wrote I don’t think that it’s possible to do universal escaping using ‘%’ 
as an escape char (a char to escape wildcard char to make it a simple char 
semantically) and as wildcard at the same time.
I suggest to use “\” as an escape char.
Also I don’t know enough about Cassandra’s internals to estimate how universal 
escaping will affect performance.
It really looks like a better solution for Cassandra users.
> 
> And we also need to define when we want to detect operation type 
> (LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ? 
> 
> Should we detect operation type BEFORE escaping or AFTER escaping ?
As I understand ‘escaping' will be done by users. 
So on DB level we get an already escaped string from a request and it’s 
possible to know which symbol is a wildcard and which is just a char.
I guess that Cassandra should parse (unescape?) an incoming string to define 
wildcards positions and simple chars positions and then define an operation 
type.

 
> 
> 
> 
> 
> 
> On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> In the current implementation (‘%’ could be a wildcard only at the start/end 
> of a term) I guess it should be ’ENDS with ‘%escape’ ‘.
> Moreover all terms that contains single ‘%’ somewhere in the middle should 
> cause an exception.
> BUT may be it’s better to make escaping more universal to support a future 
> possible case where a wildcard could be placed in the middle of a term too?
> 
> Thanks,
> Mikhail 
>> On 24 Sep 2016, at 21:09, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> Reminder, right now, the % character is only interpreted as wildcard IF AND 
>> ONLY IF it is the first/last character of the searched term
>> 
>> 
>> LIKE '%escape' --> ENDS WITH 'escape' 
>> 
>> If we use % to escape %,
>> LIKE '%%escape' -->  EQUALS TO '%escape'
>> 
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>> 
>> 
>> 
>> 
>> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Hi, Jim,
>> 
>> What pattern should be used to search “ends with  ‘%escape’ “ with your 
>> conception?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com 
>>> <mailto:j...@anconafamily.com>> wrote:
>>> 
>>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>>> So the first two %'s are translated to a literal, non-wildcard % and the 
>>> third % is a wildcard because it's not doubled.
>>> 
>>> Jim
>>> 
>>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
>>> <mikhail.kru

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-26 Thread Mikhail Krupitskiy
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
In the current implementation (‘%’ could be a wildcard only at the start/end of 
a term) I guess it should be ’ENDS with ‘%escape’ ‘.
Moreover all terms that contains single ‘%’ somewhere in the middle should 
cause an exception.
BUT may be it’s better to make escaping more universal to support a future 
possible case where a wildcard could be placed in the middle of a term too?

Thanks,
Mikhail 
> On 24 Sep 2016, at 21:09, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Reminder, right now, the % character is only interpreted as wildcard IF AND 
> ONLY IF it is the first/last character of the searched term
> 
> 
> LIKE '%escape' --> ENDS WITH 'escape' 
> 
> If we use % to escape %,
> LIKE '%%escape' -->  EQUALS TO '%escape'
> 
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> 
> 
> 
> 
> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Hi, Jim,
> 
> What pattern should be used to search “ends with  ‘%escape’ “ with your 
> conception?
> 
> Thanks,
> Mikhail
> 
>> On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com 
>> <mailto:j...@anconafamily.com>> wrote:
>> 
>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>> So the first two %'s are translated to a literal, non-wildcard % and the 
>> third % is a wildcard because it's not doubled.
>> 
>> Jim
>> 
>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> I guess that it should be similar to how it is done in SQL for LIKE patterns.
>> 
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>> 
>> What do you think?
>> 
>> Thanks,
>> Mikhail
>>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> 
>>> Hello Mikhail
>>> 
>>> It's more complicated that it seems
>>> 
>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>> 
>>> LIKE '%escape' means ENDS WITH 'escape'
>>> 
>>> What's about LIKE '%%%escape' 
>>> 
>>> How should we treat this case ? Replace %% by % at the beginning of the 
>>> searched term ??
>>> 
>>> 
>>> 
>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
>>> <mikhail.krupits...@jetbrains.com 
>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>> Hi!
>>> 
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
>>> of LIKE.
>>> 
>>> Item #1was resolved as CASSANDRA-12573.
>>> 
>>> Regarding to item #2: you said the following:
>>>> A possible fix would be:
>>>> 
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>>> the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>>> regex to exclude the %% from parsing e.g
>>>> 
>>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>>> expensive or not (in term of computation)
>>>> 
>>>> Let me try a patch 
>>> 
>>> So is there any update on this?
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>>> 
>>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>>>> <mikhail.krupits...@jetbrains.com 
>>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> Have you had a chance to try your patch or solve the issue in an other 
>>>> way? 
>>>> 
>>>> Thanks,
>>>> Mikhail
>>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>>> 
>>>>> Ok so I've found the source of t

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-23 Thread Mikhail Krupitskiy
Hi, Jim,

What pattern should be used to search “ends with  ‘%escape’ “ with your 
conception?

Thanks,
Mikhail
> On 22 Sep 2016, at 18:51, Jim Ancona <j...@anconafamily.com> wrote:
> 
> To answer DuyHai's question without introducing new syntax, I'd suggest:
>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
> So the first two %'s are translated to a literal, non-wildcard % and the 
> third % is a wildcard because it's not doubled.
> 
> Jim
> 
> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> I guess that it should be similar to how it is done in SQL for LIKE patterns.
> 
> You can introduce an escape character, e.g. ‘\’.
> Examples:
> ‘%’ - any string
> ‘\%’ - equal to ‘%’ character
> ‘\%foo%’ - starts from ‘%foo’
> ‘%%%escape’ - ends with ’escape’
> ‘\%%’ - starts from ‘%’
> ‘\\\%%’ - starts from ‘\%’ .
> 
> What do you think?
> 
> Thanks,
> Mikhail
>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> Hello Mikhail
>> 
>> It's more complicated that it seems
>> 
>> LIKE '%%escape' means  EQUAL TO '%escape'
>> 
>> LIKE '%escape' means ENDS WITH 'escape'
>> 
>> What's about LIKE '%%%escape' 
>> 
>> How should we treat this case ? Replace %% by % at the beginning of the 
>> searched term ??
>> 
>> 
>> 
>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Hi!
>> 
>> We’ve talked about two items:
>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
>> of LIKE.
>> 
>> Item #1was resolved as CASSANDRA-12573.
>> 
>> Regarding to item #2: you said the following:
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>> regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>> expensive or not (in term of computation)
>>> 
>>> Let me try a patch 
>> 
>> So is there any update on this?
>> 
>> Thanks,
>> Mikhail
>> 
>> 
>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>>> <mikhail.krupits...@jetbrains.com 
>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>> 
>>> Hi!
>>> 
>>> Have you had a chance to try your patch or solve the issue in an other way? 
>>> 
>>> Thanks,
>>> Mikhail
>>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>>>> <mailto:doanduy...@gmail.com>> wrote:
>>>> 
>>>> Ok so I've found the source of the issue, it's pretty well hidden because 
>>>> it is NOT in the SASI source code directly.
>>>> 
>>>> Here is the method where C* determines what kind of LIKE expression you're 
>>>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>> 
>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>>>  
>>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>>> 
>>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
>>>> to remove escape character BEFORE doing the matching so if your search is 
>>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>> 
>>>> A possible fix would be:
>>>> 
>>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>>> the column data type)
>>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>>> regex to exclude the %% from parsing e.g
>>>> 
>>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>>> expensive or not (in term of computation)
>>>> 
>>>>

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Mikhail Krupitskiy
I guess that it should be similar to how it is done in SQL for LIKE patterns.

You can introduce an escape character, e.g. ‘\’.
Examples:
‘%’ - any string
‘\%’ - equal to ‘%’ character
‘\%foo%’ - starts from ‘%foo’
‘%%%escape’ - ends with ’escape’
‘\%%’ - starts from ‘%’
‘\\\%%’ - starts from ‘\%’ .

What do you think?

Thanks,
Mikhail
> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Hello Mikhail
> 
> It's more complicated that it seems
> 
> LIKE '%%escape' means  EQUAL TO '%escape'
> 
> LIKE '%escape' means ENDS WITH 'escape'
> 
> What's about LIKE '%%%escape' 
> 
> How should we treat this case ? Replace %% by % at the beginning of the 
> searched term ??
> 
> 
> 
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Hi!
> 
> We’ve talked about two items:
> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
> of LIKE.
> 
> Item #1was resolved as CASSANDRA-12573.
> 
> Regarding to item #2: you said the following:
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
>> column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced 
>> regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if 
>> converting the bytebuffer into String at this stage of the CQL parser is 
>> expensive or not (in term of computation)
>> 
>> Let me try a patch 
> 
> So is there any update on this?
> 
> Thanks,
> Mikhail
> 
> 
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> 
>> Hi!
>> 
>> Have you had a chance to try your patch or solve the issue in an other way? 
>> 
>> Thanks,
>> Mikhail
>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> 
>>> Ok so I've found the source of the issue, it's pretty well hidden because 
>>> it is NOT in the SASI source code directly.
>>> 
>>> Here is the method where C* determines what kind of LIKE expression you're 
>>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>> 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>>  
>>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>>> 
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
>>> to remove escape character BEFORE doing the matching so if your search is 
>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>> 
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>> regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>> expensive or not (in term of computation)
>>> 
>>> Let me try a patch  
>>> 
>>> 
>>> 
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> Ok you're right, I get your point
>>> 
>>> LIKE '%%esc%' --> startWith('%esc')
>>> 
>>> LIKE 'escape%%' -->  = 'escape%'
>>> 
>>> What I strongly suspect is that in the source code of SASI, we parse the % 
>>> xxx % expression BEFORE applying escape. That will explain the observed 
>>> behavior. E.g:
>>> 
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>> 
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>> 
>>> Let me check in the source code and try to reproduce the issue
>>> 
>>> 
>>> 
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>>> <mikhail.krupits...@jetbrains.com 
>>> <mailto:mikhail.krupits...@jetbrains.com>> wrote:
>>> Looks like we have different understanding of what re

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Mikhail Krupitskiy
Hi!

We’ve talked about two items:
1) ‘%’ as a wildcard in the middle of LIKE pattern.
2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of 
LIKE.

Item #1was resolved as CASSANDRA-12573.

Regarding to item #2: you said the following:
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch 

So is there any update on this?

Thanks,
Mikhail


> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com> wrote:
> 
> Hi!
> 
> Have you had a chance to try your patch or solve the issue in an other way? 
> 
> Thanks,
> Mikhail
>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> Ok so I've found the source of the issue, it's pretty well hidden because it 
>> is NOT in the SASI source code directly.
>> 
>> Here is the method where C* determines what kind of LIKE expression you're 
>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>> 
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>  
>> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
>> 
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
>> remove escape character BEFORE doing the matching so if your search is LIKE 
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>> 
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
>> column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced 
>> regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if 
>> converting the bytebuffer into String at this stage of the CQL parser is 
>> expensive or not (in term of computation)
>> 
>> Let me try a patch  
>> 
>> 
>> 
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> Ok you're right, I get your point
>> 
>> LIKE '%%esc%' --> startWith('%esc')
>> 
>> LIKE 'escape%%' -->  = 'escape%'
>> 
>> What I strongly suspect is that in the source code of SASI, we parse the % 
>> xxx % expression BEFORE applying escape. That will explain the observed 
>> behavior. E.g:
>> 
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>> 
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>> 
>> Let me check in the source code and try to reproduce the issue
>> 
>> 
>> 
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on 
>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
>> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
>> According to the doc ‘esc’ is a pattern for exact match and I guess that 
>> there is no semantical difference between two LIKE patterns (both of 
>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>> starting with 'escape%' so escape%me is a valid result and also escape%esc
>> Why ’starting’? I expect that it should be ‘exact matching’.
>> 
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
>> nothing (CASSANDRA-12573).
>> 
>> What I’m missing?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com 
>>> <mailto:doanduy...@gmail.com>> wrote:
>>> 
>>> CREATE CUSTOM INDEX ON

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-20 Thread Mikhail Krupitskiy
Hi!

Have you had a chance to try your patch or solve the issue in an other way? 

Thanks,
Mikhail
> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it 
> is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're 
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>  
> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
> remove escape character BEFORE doing the matching so if your search is LIKE 
> '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
> <mailto:doanduy...@gmail.com>> wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % 
> xxx % expression BEFORE applying escape. That will explain the observed 
> behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on 
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
> According to the doc ‘esc’ is a pattern for exact match and I guess that 
> there is no semantical difference between two LIKE patterns (both of patterns 
> should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
> nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa 
>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 
>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 
>> 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | v

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread Mikhail Krupitskiy
Thank you for the investigation. Will wait for a fix and news.

Probably it’s not a directly related question but what do you think about 
CASSANDRA-12573? Let me know if it’s better to create a separate thread for it.

Thanks,
Mikhail

> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it 
> is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're 
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>  
> <https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778>
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
> remove escape character BEFORE doing the matching so if your search is LIKE 
> '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduy...@gmail.com 
> <mailto:doanduy...@gmail.com>> wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % 
> xxx % expression BEFORE applying escape. That will explain the observed 
> behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on 
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
> <http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
> According to the doc ‘esc’ is a pattern for exact match and I guess that 
> there is no semantical difference between two LIKE patterns (both of patterns 
> should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
> nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa 
>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {'mode': 
>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer', 
>> 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread Mikhail Krupitskiy
Looks like we have different understanding of what results are expected.
I based my understanding on 
http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
<http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html>
According to the doc ‘esc’ is a pattern for exact match and I guess that there 
is no semantical difference between two LIKE patterns (both of patterns should 
be treated as ‘exact match'): ‘%%esc’ and ‘esc’.

> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing 
> '%esc' so %escapeme is a possible match and also escape%esc
Why ‘containing’? I expect that it should be ’starting’..
> 
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
> with 'escape%' so escape%me is a valid result and also escape%esc
Why ’starting’? I expect that it should be ‘exact matching’.

Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
nothing (CASSANDRA-12573).

What I’m missing?

Thanks,
Mikhail

> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> CREATE CUSTOM INDEX ON test.escape(val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> 
> I don't see any problem in the results you got
> 
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing 
> '%esc' so %escapeme is a possible match and also escape%esc
Why ‘containing’? I expect that it should be ’starting’..
> 
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
> with 'escape%' so escape%me is a valid result and also escape%esc
Why ’starting’? I expect that it should be ‘exact matching’.
> 
> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Thanks for the reply.
> Could you please provide what index definition did you use?
> With the index from my script I get the following results:
> 
> cqlsh:test> select * from escape;
> 
>  id | val
> +---
>   1 | %escapeme
>   2 | escape%me
>   3 | escape%esc
> 
> Contains search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
> 
>  id | val
> +---
>   1 | %escapeme
>   3 | escape%esc
> (2 rows)
> 
> 
> Prefix search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
> 
>  id | val
> +---
>   2 | escape%me
>   3 | escape%esc
> 
> Thanks,
> Mikhail 
> 
>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com 
>> <mailto:doanduy...@gmail.com>> wrote:
>> 
>> Use % to escape %
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>> 
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>> 
>> (1 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> +---
>>   2 | escape%me
>> 
>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
>> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
>> wrote:
>> Hi Cassandra guys,
>> 
>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a 
>> search pattern.
>> Here is my test script:
>> 
>> DROP keyspace if exists kmv;
>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
>> 'SimpleStrategy', 'replication_factor':'1'} ;
>> USE kmv;
>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
>> c1));
>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa 
>> <http://org.apache.cassandra.index.sa/>si.SASIIndex' WITH OPTIONS = {
>> 'analyzed' : 'true',
>> 'analyzer_class' : 'org.apache.cassandra.index.sa 
>> <http://org.apache.cassandra.index.sa/>si.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive' : 'false',
>> 'mode' : 'CONTAINS'
>> };
>> 
>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>> 
>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>> 
>> _pattern_ '%%%' finds all columns that contain %.
>> How to find columns that start form ‘%’ or ‘%a’?
>> How to find columns that end with ‘%’?
>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ 
>> char form % as a command symbol? (Also there is a related issue 
>> CASSANDRA-12573).
>> 
>> 
>> Thanks,
>> Mikhail
>> 
> 
> 



Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread Mikhail Krupitskiy
Thanks for the reply.
Could you please provide what index definition did you use?
With the index from my script I get the following results:

cqlsh:test> select * from escape;

 id | val
+---
  1 | %escapeme
  2 | escape%me
  3 | escape%esc

Contains search

cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

 id | val
+---
  1 | %escapeme
  3 | escape%esc
(2 rows)


Prefix search

cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';

 id | val
+---
  2 | escape%me
  3 | escape%esc

Thanks,
Mikhail 

> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduy...@gmail.com> wrote:
> 
> Use % to escape %
> 
> cqlsh:test> select * from escape;
> 
>  id | val
> +---
>   1 | %escapeme
>   2 | escape%me
> 
> 
> Contains search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
> 
>  id | val
> +---
>   1 | %escapeme
> 
> (1 rows)
> 
> 
> Prefix search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
> 
>  id | val
> +---
>   2 | escape%me
> 
> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
> <mikhail.krupits...@jetbrains.com <mailto:mikhail.krupits...@jetbrains.com>> 
> wrote:
> Hi Cassandra guys,
> 
> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search 
> pattern.
> Here is my test script:
> 
> DROP keyspace if exists kmv;
> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> USE kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'analyzed' : 'true',
> 'analyzer_class' : 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive' : 'false',
> 'mode' : 'CONTAINS'
> };
> 
> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
> 
> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
> 
> _pattern_ '%%%' finds all columns that contain %.
> How to find columns that start form ‘%’ or ‘%a’?
> How to find columns that end with ‘%’?
> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char 
> form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
> 
> 
> Thanks,
> Mikhail
> 



How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread Mikhail Krupitskiy
Hi Cassandra guys,

I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search 
pattern.
Here is my test script:

DROP keyspace if exists kmv;
CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
'SimpleStrategy', 'replication_factor':'1'} ;
USE kmv;
CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
'analyzed' : 'true',
'analyzer_class' : 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive' : 'false', 
'mode' : 'CONTAINS'
};

INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');

SELECT c2 from kmv.kmv where c2 like ‘_pattern_';

_pattern_ '%%%' finds all columns that contain %.
How to find columns that start form ‘%’ or ‘%a’?
How to find columns that end with ‘%’?
What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char 
form % as a command symbol? (Also there is a related issue CASSANDRA-12573).


Thanks,
Mikhail