Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-10-04 Thread Mikhail Krupitskiy
Please see my comments inline.

Thanks,
Mikhail
> On 26 Sep 2016, at 17:07, DuyHai Doan  wrote:
> 
> "In the current implementation (‘%’ could be a wildcard only at the start/end 
> of a term) I guess it should be ’ENDS with ‘%escape’ ‘." 
> 
> --> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI to 
> understand the %% as an escape for % so the goal is that SASI understands 
> LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ?
I guess that the goal is to define a way to use ‘%’ as a simple char.
LIKE '%escape' - ENDS WITH 'escape'
LIKE '%%escape' - EQUALS TO '%escape’
LIKE '%%escape%' - STARTS WITH '%escape’

LIKE ‘%%%escape’ - undefined in general case
LIKE ‘%%%escape’ - ENDS WITH “%escape” in a case when we know that a wildcard 
could be only at the start/end.
> 
> "Moreover all terms that contains single ‘%’ somewhere in the middle should 
> cause an exception."
> 
> --> Not necessarily, sometime people may want to search text pattern 
> containing the literal %. Imagine the text "this year the average income has 
> increase by 10%". People may want to search for "10%”.
If someone wants to search for ’10%’ then he should escape the ‘%’ char: like 
“10%%”.
> 
> 
> 
> "BUT may be it’s better to make escaping more universal to support a future 
> possible case where a wildcard could be placed in the middle of a term too?"
> 
> --> I guess universal escaping for % is the cleaner and better solution. 
> However it may involve some complex regular expression. I'm not sure that 
> input.replaceAll("%%", "%") trick would work for any cases.
As I wrote I don’t think that it’s possible to do universal escaping using ‘%’ 
as an escape char (a char to escape wildcard char to make it a simple char 
semantically) and as wildcard at the same time.
I suggest to use “\” as an escape char.
Also I don’t know enough about Cassandra’s internals to estimate how universal 
escaping will affect performance.
It really looks like a better solution for Cassandra users.
> 
> And we also need to define when we want to detect operation type 
> (LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ? 
> 
> Should we detect operation type BEFORE escaping or AFTER escaping ?
As I understand ‘escaping' will be done by users. 
So on DB level we get an already escaped string from a request and it’s 
possible to know which symbol is a wildcard and which is just a char.
I guess that Cassandra should parse (unescape?) an incoming string to define 
wildcards positions and simple chars positions and then define an operation 
type.

 
> 
> 
> 
> 
> 
> On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy 
> > 
> wrote:
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> In the current implementation (‘%’ could be a wildcard only at the start/end 
> of a term) I guess it should be ’ENDS with ‘%escape’ ‘.
> Moreover all terms that contains single ‘%’ somewhere in the middle should 
> cause an exception.
> BUT may be it’s better to make escaping more universal to support a future 
> possible case where a wildcard could be placed in the middle of a term too?
> 
> Thanks,
> Mikhail 
>> On 24 Sep 2016, at 21:09, DuyHai Doan > > wrote:
>> 
>> Reminder, right now, the % character is only interpreted as wildcard IF AND 
>> ONLY IF it is the first/last character of the searched term
>> 
>> 
>> LIKE '%escape' --> ENDS WITH 'escape' 
>> 
>> If we use % to escape %,
>> LIKE '%%escape' -->  EQUALS TO '%escape'
>> 
>> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>> 
>> 
>> 
>> 
>> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Hi, Jim,
>> 
>> What pattern should be used to search “ends with  ‘%escape’ “ with your 
>> conception?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 22 Sep 2016, at 18:51, Jim Ancona >> > wrote:
>>> 
>>> To answer DuyHai's question without introducing new syntax, I'd suggest:
 LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>>> So the first two %'s are translated to a literal, non-wildcard % and the 
>>> third % is a wildcard because it's not doubled.
>>> 
>>> Jim
>>> 
>>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
>>> >> > wrote:
>>> I guess that it should be similar to how it is done in SQL for LIKE 
>>> patterns.
>>> 
>>> You can introduce an escape character, e.g. ‘\’.
>>> Examples:
>>> ‘%’ - any string
>>> ‘\%’ - equal to ‘%’ character
>>> ‘\%foo%’ - starts from ‘%foo’
>>> ‘%%%escape’ - ends with ’escape’
>>> ‘\%%’ - starts from ‘%’
>>> ‘\\\%%’ - starts from ‘\%’ .
>>> 
>>> What do you think?
>>> 
>>> Thanks,
>>> Mikhail
 On 22 Sep 2016, at 16:47, DuyHai Doan 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-26 Thread DuyHai Doan
"In the current implementation (‘%’ could be a wildcard only at the
start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘."

--> Yes in the current impl, it means ENDS WITH '%escape' but we want SASI
to understand the %% as an escape for % so the goal is that SASI
understands LIKE '%%escape' as EQUALS TO '%escape'. Am I correct ?

"Moreover all terms that contains single ‘%’ somewhere in the middle should
cause an exception."

--> Not necessarily, sometime people may want to search text pattern
containing the literal %. Imagine the text "this year the average income
has increase by 10%". People may want to search for "10%".


"BUT may be it’s better to make escaping more universal to support a future
possible case where a wildcard could be placed in the middle of a term too?"

--> I guess universal escaping for % is the cleaner and better solution.
However it may involve some complex regular expression. I'm not sure that
input.replaceAll("%%", "%") trick would work for any cases.

And we also need to define when we want to detect operation type
(LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, EQUAL) ?

Should we detect operation type BEFORE escaping or AFTER escaping ?





On Mon, Sep 26, 2016 at 3:54 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>
> In the current implementation (‘%’ could be a wildcard only at the
> start/end of a term) I guess it should be ’ENDS with ‘%escape’ ‘.
> Moreover all terms that contains single ‘%’ somewhere in the middle should
> cause an exception.
> BUT may be it’s better to make escaping more universal to support a future
> possible case where a wildcard could be placed in the middle of a term too?
>
> Thanks,
> Mikhail
>
> On 24 Sep 2016, at 21:09, DuyHai Doan  wrote:
>
> Reminder, right now, the % character is only interpreted as wildcard IF
> AND ONLY IF it is the first/last character of the searched term
>
>
> LIKE '%escape' --> ENDS WITH 'escape'
>
> If we use % to escape %,
> LIKE '%%escape' -->  EQUALS TO '%escape'
>
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
>
>
>
>
> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Hi, Jim,
>>
>> What pattern should be used to search “ends with  ‘%escape’ “ with your
>> conception?
>>
>> Thanks,
>> Mikhail
>>
>> On 22 Sep 2016, at 18:51, Jim Ancona  wrote:
>>
>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>
>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'
>>
>> So the first two %'s are translated to a literal, non-wildcard % and the
>> third % is a wildcard because it's not doubled.
>>
>> Jim
>>
>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> I guess that it should be similar to how it is done in SQL for LIKE
>>> patterns.
>>>
>>> You can introduce an escape character, e.g. ‘\’.
>>> Examples:
>>> ‘%’ - any string
>>> ‘\%’ - equal to ‘%’ character
>>> ‘\%foo%’ - starts from ‘%foo’
>>> ‘%%%escape’ - ends with ’escape’
>>> ‘\%%’ - starts from ‘%’
>>> ‘\\\%%’ - starts from ‘\%’ .
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 22 Sep 2016, at 16:47, DuyHai Doan  wrote:
>>>
>>> Hello Mikhail
>>>
>>> It's more complicated that it seems
>>>
>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>>
>>> LIKE '%escape' means ENDS WITH 'escape'
>>>
>>> What's about LIKE '%%%escape' 
>>>
>>> How should we treat this case ? Replace %% by % at the beginning of the
>>> searched term ??
>>>
>>>
>>>
>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Hi!

 We’ve talked about two items:
 1) ‘%’ as a wildcard in the middle of LIKE pattern.
 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
 help of LIKE.

 Item #1was resolved as CASSANDRA-12573.

 Regarding to item #2: you said the following:

 A possible fix would be:

 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending
 on the column data type)
 2) remove the escape character e.g. before parsing OR use some advanced
 regex to exclude the %% from parsing e.g

 Step 2) is dead easy but step 1) is harder because I don't know if
 converting the bytebuffer into String at this stage of the CQL parser is
 expensive or not (in term of computation)

 Let me try a patch

 So is there any update on this?

 Thanks,
 Mikhail


 On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
 mikhail.krupits...@jetbrains.com> wrote:

 Hi!

 Have you had a chance to try your patch or solve the issue in an other
 way?

 Thanks,
 Mikhail

 On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:

 Ok so I've found the source of the issue, 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-26 Thread Mikhail Krupitskiy
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
In the current implementation (‘%’ could be a wildcard only at the start/end of 
a term) I guess it should be ’ENDS with ‘%escape’ ‘.
Moreover all terms that contains single ‘%’ somewhere in the middle should 
cause an exception.
BUT may be it’s better to make escaping more universal to support a future 
possible case where a wildcard could be placed in the middle of a term too?

Thanks,
Mikhail 
> On 24 Sep 2016, at 21:09, DuyHai Doan  wrote:
> 
> Reminder, right now, the % character is only interpreted as wildcard IF AND 
> ONLY IF it is the first/last character of the searched term
> 
> 
> LIKE '%escape' --> ENDS WITH 'escape' 
> 
> If we use % to escape %,
> LIKE '%%escape' -->  EQUALS TO '%escape'
> 
> LIKE '%%%escape' --> EQUALS TO '%%escape' ???
> 
> 
> 
> 
> On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Hi, Jim,
> 
> What pattern should be used to search “ends with  ‘%escape’ “ with your 
> conception?
> 
> Thanks,
> Mikhail
> 
>> On 22 Sep 2016, at 18:51, Jim Ancona > > wrote:
>> 
>> To answer DuyHai's question without introducing new syntax, I'd suggest:
>>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
>> So the first two %'s are translated to a literal, non-wildcard % and the 
>> third % is a wildcard because it's not doubled.
>> 
>> Jim
>> 
>> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> I guess that it should be similar to how it is done in SQL for LIKE patterns.
>> 
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>> 
>> What do you think?
>> 
>> Thanks,
>> Mikhail
>>> On 22 Sep 2016, at 16:47, DuyHai Doan >> > wrote:
>>> 
>>> Hello Mikhail
>>> 
>>> It's more complicated that it seems
>>> 
>>> LIKE '%%escape' means  EQUAL TO '%escape'
>>> 
>>> LIKE '%escape' means ENDS WITH 'escape'
>>> 
>>> What's about LIKE '%%%escape' 
>>> 
>>> How should we treat this case ? Replace %% by % at the beginning of the 
>>> searched term ??
>>> 
>>> 
>>> 
>>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
>>> >> > wrote:
>>> Hi!
>>> 
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
>>> of LIKE.
>>> 
>>> Item #1was resolved as CASSANDRA-12573.
>>> 
>>> Regarding to item #2: you said the following:
 A possible fix would be:
 
 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
 the column data type)
 2) remove the escape character e.g. before parsing OR use some advanced 
 regex to exclude the %% from parsing e.g
 
 Step 2) is dead easy but step 1) is harder because I don't know if 
 converting the bytebuffer into String at this stage of the CQL parser is 
 expensive or not (in term of computation)
 
 Let me try a patch 
>>> 
>>> So is there any update on this?
>>> 
>>> Thanks,
>>> Mikhail
>>> 
>>> 
 On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
 > wrote:
 
 Hi!
 
 Have you had a chance to try your patch or solve the issue in an other 
 way? 
 
 Thanks,
 Mikhail
> On 15 Sep 2016, at 16:02, DuyHai Doan  > wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because 
> it is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression 
> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>  
> 
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
> to remove escape character BEFORE doing the matching so if your search is 
> LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced 
> regex to exclude the %% from 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-24 Thread DuyHai Doan
Reminder, right now, the % character is only interpreted as wildcard IF AND
ONLY IF it is the first/last character of the searched term


LIKE '%escape' --> ENDS WITH 'escape'

If we use % to escape %,
LIKE '%%escape' -->  EQUALS TO '%escape'

LIKE '%%%escape' --> EQUALS TO '%%escape' ???




On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Hi, Jim,
>
> What pattern should be used to search “ends with  ‘%escape’ “ with your
> conception?
>
> Thanks,
> Mikhail
>
> On 22 Sep 2016, at 18:51, Jim Ancona  wrote:
>
> To answer DuyHai's question without introducing new syntax, I'd suggest:
>
> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'
>
> So the first two %'s are translated to a literal, non-wildcard % and the
> third % is a wildcard because it's not doubled.
>
> Jim
>
> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> I guess that it should be similar to how it is done in SQL for LIKE
>> patterns.
>>
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>>
>> What do you think?
>>
>> Thanks,
>> Mikhail
>>
>> On 22 Sep 2016, at 16:47, DuyHai Doan  wrote:
>>
>> Hello Mikhail
>>
>> It's more complicated that it seems
>>
>> LIKE '%%escape' means  EQUAL TO '%escape'
>>
>> LIKE '%escape' means ENDS WITH 'escape'
>>
>> What's about LIKE '%%%escape' 
>>
>> How should we treat this case ? Replace %% by % at the beginning of the
>> searched term ??
>>
>>
>>
>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Hi!
>>>
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>>> help of LIKE.
>>>
>>> Item #1was resolved as CASSANDRA-12573.
>>>
>>> Regarding to item #2: you said the following:
>>>
>>> A possible fix would be:
>>>
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>> regex to exclude the %% from parsing e.g
>>>
>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>> converting the bytebuffer into String at this stage of the CQL parser is
>>> expensive or not (in term of computation)
>>>
>>> Let me try a patch
>>>
>>> So is there any update on this?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>>
>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
>>> Hi!
>>>
>>> Have you had a chance to try your patch or solve the issue in an other
>>> way?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>>>
>>> Ok so I've found the source of the issue, it's pretty well hidden
>>> because it is NOT in the SASI source code directly.
>>>
>>> Here is the method where C* determines what kind of LIKE expression
>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.j
>>> ava#L733-L778
>>>
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they
>>> forget to remove escape character BEFORE doing the matching so if your
>>> search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>
>>> A possible fix would be:
>>>
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>> regex to exclude the %% from parsing e.g
>>>
>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>> converting the bytebuffer into String at this stage of the CQL parser is
>>> expensive or not (in term of computation)
>>>
>>> Let me try a patch
>>>
>>>
>>>
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan 
>>> wrote:
>>>
 Ok you're right, I get your point

 LIKE '%%esc%' --> startWith('%esc')

 LIKE 'escape%%' -->  = 'escape%'

 What I strongly suspect is that in the source code of SASI, we parse
 the % xxx % expression BEFORE applying escape. That will explain the
 observed behavior. E.g:

 LIKE '%%esc%'  parsed as %xxx% where xxx = %esc

 LIKE 'escape%%' parsed as xxx% where xxx =escape%

 Let me check in the source code and try to reproduce the issue



 On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
 mikhail.krupits...@jetbrains.com> wrote:

> Looks like we have different understanding of what results are
> expected.
> I based my understanding on http://docs.datastax.com/en
> 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-23 Thread Mikhail Krupitskiy
Hi, Jim,

What pattern should be used to search “ends with  ‘%escape’ “ with your 
conception?

Thanks,
Mikhail
> On 22 Sep 2016, at 18:51, Jim Ancona  wrote:
> 
> To answer DuyHai's question without introducing new syntax, I'd suggest:
>> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' 
> So the first two %'s are translated to a literal, non-wildcard % and the 
> third % is a wildcard because it's not doubled.
> 
> Jim
> 
> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy 
> > 
> wrote:
> I guess that it should be similar to how it is done in SQL for LIKE patterns.
> 
> You can introduce an escape character, e.g. ‘\’.
> Examples:
> ‘%’ - any string
> ‘\%’ - equal to ‘%’ character
> ‘\%foo%’ - starts from ‘%foo’
> ‘%%%escape’ - ends with ’escape’
> ‘\%%’ - starts from ‘%’
> ‘\\\%%’ - starts from ‘\%’ .
> 
> What do you think?
> 
> Thanks,
> Mikhail
>> On 22 Sep 2016, at 16:47, DuyHai Doan > > wrote:
>> 
>> Hello Mikhail
>> 
>> It's more complicated that it seems
>> 
>> LIKE '%%escape' means  EQUAL TO '%escape'
>> 
>> LIKE '%escape' means ENDS WITH 'escape'
>> 
>> What's about LIKE '%%%escape' 
>> 
>> How should we treat this case ? Replace %% by % at the beginning of the 
>> searched term ??
>> 
>> 
>> 
>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Hi!
>> 
>> We’ve talked about two items:
>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
>> of LIKE.
>> 
>> Item #1was resolved as CASSANDRA-12573.
>> 
>> Regarding to item #2: you said the following:
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>> regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>> expensive or not (in term of computation)
>>> 
>>> Let me try a patch 
>> 
>> So is there any update on this?
>> 
>> Thanks,
>> Mikhail
>> 
>> 
>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>>> >> > wrote:
>>> 
>>> Hi!
>>> 
>>> Have you had a chance to try your patch or solve the issue in an other way? 
>>> 
>>> Thanks,
>>> Mikhail
 On 15 Sep 2016, at 16:02, DuyHai Doan > wrote:
 
 Ok so I've found the source of the issue, it's pretty well hidden because 
 it is NOT in the SASI source code directly.
 
 Here is the method where C* determines what kind of LIKE expression you're 
 using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
 
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
  
 
 
 As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
 to remove escape character BEFORE doing the matching so if your search is 
 LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
 
 A possible fix would be:
 
 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
 the column data type)
 2) remove the escape character e.g. before parsing OR use some advanced 
 regex to exclude the %% from parsing e.g
 
 Step 2) is dead easy but step 1) is harder because I don't know if 
 converting the bytebuffer into String at this stage of the CQL parser is 
 expensive or not (in term of computation)
 
 Let me try a patch  
 
 
 
 On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan > wrote:
 Ok you're right, I get your point
 
 LIKE '%%esc%' --> startWith('%esc')
 
 LIKE 'escape%%' -->  = 'escape%'
 
 What I strongly suspect is that in the source code of SASI, we parse the % 
 xxx % expression BEFORE applying escape. That will explain the observed 
 behavior. E.g:
 
 LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
 
 LIKE 'escape%%' parsed as xxx% where xxx =escape%
 
 Let me check in the source code and try to reproduce the issue
 
 
 
 On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
 > wrote:
 Looks like we have different understanding of what results are expected.
 I based my 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Jim Ancona
To answer DuyHai's question without introducing new syntax, I'd suggest:

LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'

So the first two %'s are translated to a literal, non-wildcard % and the
third % is a wildcard because it's not doubled.

Jim

On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> I guess that it should be similar to how it is done in SQL for LIKE
> patterns.
>
> You can introduce an escape character, e.g. ‘\’.
> Examples:
> ‘%’ - any string
> ‘\%’ - equal to ‘%’ character
> ‘\%foo%’ - starts from ‘%foo’
> ‘%%%escape’ - ends with ’escape’
> ‘\%%’ - starts from ‘%’
> ‘\\\%%’ - starts from ‘\%’ .
>
> What do you think?
>
> Thanks,
> Mikhail
>
> On 22 Sep 2016, at 16:47, DuyHai Doan  wrote:
>
> Hello Mikhail
>
> It's more complicated that it seems
>
> LIKE '%%escape' means  EQUAL TO '%escape'
>
> LIKE '%escape' means ENDS WITH 'escape'
>
> What's about LIKE '%%%escape' 
>
> How should we treat this case ? Replace %% by % at the beginning of the
> searched term ??
>
>
>
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Hi!
>>
>> We’ve talked about two items:
>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>> help of LIKE.
>>
>> Item #1was resolved as CASSANDRA-12573.
>>
>> Regarding to item #2: you said the following:
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>> So is there any update on this?
>>
>> Thanks,
>> Mikhail
>>
>>
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>> Hi!
>>
>> Have you had a chance to try your patch or solve the issue in an other
>> way?
>>
>> Thanks,
>> Mikhail
>>
>> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>>
>> Ok so I've found the source of the issue, it's pretty well hidden because
>> it is NOT in the SASI source code directly.
>>
>> Here is the method where C* determines what kind of LIKE expression
>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
>> to remove escape character BEFORE doing the matching so if your search is 
>> LIKE
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>>
>>
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan 
>> wrote:
>>
>>> Ok you're right, I get your point
>>>
>>> LIKE '%%esc%' --> startWith('%esc')
>>>
>>> LIKE 'escape%%' -->  = 'escape%'
>>>
>>> What I strongly suspect is that in the source code of SASI, we parse the
>>> % xxx % expression BEFORE applying escape. That will explain the observed
>>> behavior. E.g:
>>>
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>
>>> Let me check in the source code and try to reproduce the issue
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Looks like we have different understanding of what results are expected.
 I based my understanding on http://docs.datastax.com/en
 /cql/3.3/cql/cql_using/useSASIIndex.html
 According to the doc ‘esc’ is a pattern for exact match and I guess
 that there is no semantical difference between two LIKE patterns (both of
 patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.

 SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
 *containing* '%esc' so *%esc*apeme is a possible match and also escape
 *%esc*

 Why ‘containing’? I expect that it should be ’starting’..


 SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
 *starting* with 'escape%' so *escape%*me is a valid result and also
 *escape%*esc

 Why ’starting’? I expect that it should be ‘exact matching’.


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Mikhail Krupitskiy
I guess that it should be similar to how it is done in SQL for LIKE patterns.

You can introduce an escape character, e.g. ‘\’.
Examples:
‘%’ - any string
‘\%’ - equal to ‘%’ character
‘\%foo%’ - starts from ‘%foo’
‘%%%escape’ - ends with ’escape’
‘\%%’ - starts from ‘%’
‘\\\%%’ - starts from ‘\%’ .

What do you think?

Thanks,
Mikhail
> On 22 Sep 2016, at 16:47, DuyHai Doan  wrote:
> 
> Hello Mikhail
> 
> It's more complicated that it seems
> 
> LIKE '%%escape' means  EQUAL TO '%escape'
> 
> LIKE '%escape' means ENDS WITH 'escape'
> 
> What's about LIKE '%%%escape' 
> 
> How should we treat this case ? Replace %% by % at the beginning of the 
> searched term ??
> 
> 
> 
> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Hi!
> 
> We’ve talked about two items:
> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help 
> of LIKE.
> 
> Item #1was resolved as CASSANDRA-12573.
> 
> Regarding to item #2: you said the following:
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
>> column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced 
>> regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if 
>> converting the bytebuffer into String at this stage of the CQL parser is 
>> expensive or not (in term of computation)
>> 
>> Let me try a patch 
> 
> So is there any update on this?
> 
> Thanks,
> Mikhail
> 
> 
>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>> > 
>> wrote:
>> 
>> Hi!
>> 
>> Have you had a chance to try your patch or solve the issue in an other way? 
>> 
>> Thanks,
>> Mikhail
>>> On 15 Sep 2016, at 16:02, DuyHai Doan >> > wrote:
>>> 
>>> Ok so I've found the source of the issue, it's pretty well hidden because 
>>> it is NOT in the SASI source code directly.
>>> 
>>> Here is the method where C* determines what kind of LIKE expression you're 
>>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>> 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>>  
>>> 
>>> 
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget 
>>> to remove escape character BEFORE doing the matching so if your search is 
>>> LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>> 
>>> A possible fix would be:
>>> 
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on 
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced 
>>> regex to exclude the %% from parsing e.g
>>> 
>>> Step 2) is dead easy but step 1) is harder because I don't know if 
>>> converting the bytebuffer into String at this stage of the CQL parser is 
>>> expensive or not (in term of computation)
>>> 
>>> Let me try a patch  
>>> 
>>> 
>>> 
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan >> > wrote:
>>> Ok you're right, I get your point
>>> 
>>> LIKE '%%esc%' --> startWith('%esc')
>>> 
>>> LIKE 'escape%%' -->  = 'escape%'
>>> 
>>> What I strongly suspect is that in the source code of SASI, we parse the % 
>>> xxx % expression BEFORE applying escape. That will explain the observed 
>>> behavior. E.g:
>>> 
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>> 
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>> 
>>> Let me check in the source code and try to reproduce the issue
>>> 
>>> 
>>> 
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>>> >> > wrote:
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on 
>>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
>>> 
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that 
>>> there is no semantical difference between two LIKE patterns (both of 
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>> 
 SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
 containing '%esc' so %escapeme is a possible match and also escape%esc
>>> Why ‘containing’? I expect that it should be ’starting’..
 
 SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
 starting with 'escape%' so escape%me is a valid result and also escape%esc
>>> Why ’starting’? I expect 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread DuyHai Doan
Hello Mikhail

It's more complicated that it seems

LIKE '%%escape' means  EQUAL TO '%escape'

LIKE '%escape' means ENDS WITH 'escape'

What's about LIKE '%%%escape' 

How should we treat this case ? Replace %% by % at the beginning of the
searched term ??



On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Hi!
>
> We’ve talked about two items:
> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
> help of LIKE.
>
> Item #1was resolved as CASSANDRA-12573.
>
> Regarding to item #2: you said the following:
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
> So is there any update on this?
>
> Thanks,
> Mikhail
>
>
> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
> Hi!
>
> Have you had a chance to try your patch or solve the issue in an other
> way?
>
> Thanks,
> Mikhail
>
> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>
> Ok so I've found the source of the issue, it's pretty well hidden because
> it is NOT in the SASI source code directly.
>
> Here is the method where C* determines what kind of LIKE expression you're
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>
> https://github.com/apache/cassandra/blob/trunk/src/java/
> org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#
> L733-L778
>
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
> to remove escape character BEFORE doing the matching so if your search is LIKE
> '%%esc%', the detected expression is LIKE_CONTAINS.
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
>
>
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  wrote:
>
>> Ok you're right, I get your point
>>
>> LIKE '%%esc%' --> startWith('%esc')
>>
>> LIKE 'escape%%' -->  = 'escape%'
>>
>> What I strongly suspect is that in the source code of SASI, we parse the
>> % xxx % expression BEFORE applying escape. That will explain the observed
>> behavior. E.g:
>>
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>
>> Let me check in the source code and try to reproduce the issue
>>
>>
>>
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on http://docs.datastax.com/en
>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>>> there is no semantical difference between two LIKE patterns (both of
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>> returns nothing (CASSANDRA-12573).
>>>
>>> What I’m missing?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
>>>
>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>
>>> I don't see any problem in the results you got
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Mikhail Krupitskiy
Hi!

We’ve talked about two items:
1) ‘%’ as a wildcard in the middle of LIKE pattern.
2) How to escape ‘%’ to be able to find strings with the ‘%’ char with help of 
LIKE.

Item #1was resolved as CASSANDRA-12573.

Regarding to item #2: you said the following:
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch 

So is there any update on this?

Thanks,
Mikhail


> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy 
>  wrote:
> 
> Hi!
> 
> Have you had a chance to try your patch or solve the issue in an other way? 
> 
> Thanks,
> Mikhail
>> On 15 Sep 2016, at 16:02, DuyHai Doan > > wrote:
>> 
>> Ok so I've found the source of the issue, it's pretty well hidden because it 
>> is NOT in the SASI source code directly.
>> 
>> Here is the method where C* determines what kind of LIKE expression you're 
>> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>> 
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>  
>> 
>> 
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
>> remove escape character BEFORE doing the matching so if your search is LIKE 
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>> 
>> A possible fix would be:
>> 
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
>> column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced 
>> regex to exclude the %% from parsing e.g
>> 
>> Step 2) is dead easy but step 1) is harder because I don't know if 
>> converting the bytebuffer into String at this stage of the CQL parser is 
>> expensive or not (in term of computation)
>> 
>> Let me try a patch  
>> 
>> 
>> 
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan > > wrote:
>> Ok you're right, I get your point
>> 
>> LIKE '%%esc%' --> startWith('%esc')
>> 
>> LIKE 'escape%%' -->  = 'escape%'
>> 
>> What I strongly suspect is that in the source code of SASI, we parse the % 
>> xxx % expression BEFORE applying escape. That will explain the observed 
>> behavior. E.g:
>> 
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>> 
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>> 
>> Let me check in the source code and try to reproduce the issue
>> 
>> 
>> 
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on 
>> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
>> 
>> According to the doc ‘esc’ is a pattern for exact match and I guess that 
>> there is no semantical difference between two LIKE patterns (both of 
>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>> starting with 'escape%' so escape%me is a valid result and also escape%esc
>> Why ’starting’? I expect that it should be ‘exact matching’.
>> 
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
>> nothing (CASSANDRA-12573).
>> 
>> What I’m missing?
>> 
>> Thanks,
>> Mikhail
>> 
>>> On 13 Sep 2016, at 19:31, DuyHai Doan >> > wrote:
>>> 
>>> CREATE CUSTOM INDEX ON test.escape(val) USING 
>>> 'org.apache.cassandra.index.sa 
>>> si.SASIIndex' WITH OPTIONS = 
>>> {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>>> si.analyzer.NonTokenizingAnalyzer', 
>>> 'case_sensitive': 'false'};
>>> 
>>> I don't see any problem in the results you got
>>> 
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>>> containing '%esc' so %escapeme is a possible match and also escape%esc
>> Why ‘containing’? I expect that it should be ’starting’..
>>> 
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results 
>>> 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-20 Thread Mikhail Krupitskiy
Hi!

Have you had a chance to try your patch or solve the issue in an other way? 

Thanks,
Mikhail
> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it 
> is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're 
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>  
> 
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
> remove escape character BEFORE doing the matching so if your search is LIKE 
> '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  > wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % 
> xxx % expression BEFORE applying escape. That will explain the observed 
> behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on 
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
> 
> According to the doc ‘esc’ is a pattern for exact match and I guess that 
> there is no semantical difference between two LIKE patterns (both of patterns 
> should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
> nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan > > wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa 
>> si.SASIIndex' WITH OPTIONS = {'mode': 
>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>> si.analyzer.NonTokenizingAnalyzer', 
>> 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   3 | escape%esc
>> (2 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> +---
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Thanks,
>> Mikhail 
>> 
>>> On 13 Sep 2016, at 18:16, DuyHai Doan >> 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Ok I get around the issue about %w%a%

So this will be interpreter first by the CQL parser as LIKE CONTAINS with
searched term = w%a

And then things get complicated

1) if you're using NonTokeninzingAnalyzer or NoOpAnalyzer, everything is
fine, the % in 'w%a' is interpreted as simple literal and not wildcard
character

2) if you're using StandardAnalyzer, it's an entirely different story.
During the parsing of the search predicates by the query planer, the term
'w%a' is passed to the analyzer (StandardAnalyzer here):
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L303-L323

The StandardAnalyzer is tokenizing the search term so 'w%a' becomes 2
distinct token, 'w' OR 'a'. Why does it ignore the % ? Because according to
Unicode line breaking rule, % is a separator, read here:
http://www.unicode.org/Public/UNIDATA/LineBreak.txt

Nowhere in the source code we can see this, in fact you'll need to look
into the JFlex grammar file
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex
to see a reference to Unicode word breaking rules ...

So indeed when using StandardAnalyzer, any % character will be interpreter
as a separator so our LIKE '%w%a%' is indeed transformed into a LIKE '%w%'
OR LIKE '%a%' e.g all words containing 'w' OR 'a', irrespective of their
relative position to each other ...

Why is it an OR predicate and not an AND predicate ? The answer is a
comment in the source code here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L290-L295

I'll end by a famous sentence : "It is not a bug, it is a feature"  :D

On Thu, Sep 15, 2016 at 4:11 PM, DuyHai Doan  wrote:

> Currently SASI can only understand the % for the beginning (suffix) or
> ending (prefix) position.
>
> Any expression containing the % in the middle like %w%a% will not be
> interpreter by SASI as wildcard.
>
> %w%a% will translate into "Give me all results containing w%a
>
> On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Thank you for the investigation. Will wait for a fix and news.
>>
>> Probably it’s not a directly related question but what do you think about
>> CASSANDRA-12573? Let me know if it’s better to create a separate thread for
>> it.
>>
>> Thanks,
>> Mikhail
>>
>>
>> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>>
>> Ok so I've found the source of the issue, it's pretty well hidden because
>> it is NOT in the SASI source code directly.
>>
>> Here is the method where C* determines what kind of LIKE expression
>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>>
>> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
>> to remove escape character BEFORE doing the matching so if your search is 
>> LIKE
>> '%%esc%', the detected expression is LIKE_CONTAINS.
>>
>> A possible fix would be:
>>
>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>> the column data type)
>> 2) remove the escape character e.g. before parsing OR use some advanced
>> regex to exclude the %% from parsing e.g
>>
>> Step 2) is dead easy but step 1) is harder because I don't know if
>> converting the bytebuffer into String at this stage of the CQL parser is
>> expensive or not (in term of computation)
>>
>> Let me try a patch
>>
>>
>>
>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan 
>> wrote:
>>
>>> Ok you're right, I get your point
>>>
>>> LIKE '%%esc%' --> startWith('%esc')
>>>
>>> LIKE 'escape%%' -->  = 'escape%'
>>>
>>> What I strongly suspect is that in the source code of SASI, we parse the
>>> % xxx % expression BEFORE applying escape. That will explain the observed
>>> behavior. E.g:
>>>
>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>
>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>
>>> Let me check in the source code and try to reproduce the issue
>>>
>>>
>>>
>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Looks like we have different understanding of what results are expected.
 I based my understanding on http://docs.datastax.com/en
 /cql/3.3/cql/cql_using/useSASIIndex.html
 According to the doc ‘esc’ is a pattern for exact match and I guess
 that there is no semantical difference between two LIKE patterns (both of
 patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.

 SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
 *containing* '%esc' so *%esc*apeme is a possible match and also escape
 *%esc*

 Why ‘containing’? I expect that it should be ’starting’..


 SELECT * FROM escape 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Currently SASI can only understand the % for the beginning (suffix) or
ending (prefix) position.

Any expression containing the % in the middle like %w%a% will not be
interpreter by SASI as wildcard.

%w%a% will translate into "Give me all results containing w%a

On Thu, Sep 15, 2016 at 3:58 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Thank you for the investigation. Will wait for a fix and news.
>
> Probably it’s not a directly related question but what do you think about
> CASSANDRA-12573? Let me know if it’s better to create a separate thread for
> it.
>
> Thanks,
> Mikhail
>
>
> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
>
> Ok so I've found the source of the issue, it's pretty well hidden because
> it is NOT in the SASI source code directly.
>
> Here is the method where C* determines what kind of LIKE expression you're
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>
> https://github.com/apache/cassandra/blob/trunk/src/java/
> org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#
> L733-L778
>
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget
> to remove escape character BEFORE doing the matching so if your search is LIKE
> '%%esc%', the detected expression is LIKE_CONTAINS.
>
> A possible fix would be:
>
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
> the column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced
> regex to exclude the %% from parsing e.g
>
> Step 2) is dead easy but step 1) is harder because I don't know if
> converting the bytebuffer into String at this stage of the CQL parser is
> expensive or not (in term of computation)
>
> Let me try a patch
>
>
>
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  wrote:
>
>> Ok you're right, I get your point
>>
>> LIKE '%%esc%' --> startWith('%esc')
>>
>> LIKE 'escape%%' -->  = 'escape%'
>>
>> What I strongly suspect is that in the source code of SASI, we parse the
>> % xxx % expression BEFORE applying escape. That will explain the observed
>> behavior. E.g:
>>
>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>
>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>
>> Let me check in the source code and try to reproduce the issue
>>
>>
>>
>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Looks like we have different understanding of what results are expected.
>>> I based my understanding on http://docs.datastax.com/en
>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>>> there is no semantical difference between two LIKE patterns (both of
>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>>> returns nothing (CASSANDRA-12573).
>>>
>>> What I’m missing?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
>>>
>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>
>>> I don't see any problem in the results you got
>>>
>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>> *%esc*
>>>
>>> Why ‘containing’? I expect that it should be ’starting’..
>>>
>>>
>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>> *escape%*esc
>>>
>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>
>>>
>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Thanks for the reply.
 Could you please provide what index definition did you use?
 With the index from my script I get the following results:

 cqlsh:test> select * from escape;

  id | val
 +---
   1 | %escapeme
   2 | escape%me
 *  3 | escape%esc*

 Contains search

 cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

  id | val
 +---
   1 | %escapeme
   3
 * | escape%esc*(2 rows)


 Prefix search

 cqlsh:test> SELECT * 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread Mikhail Krupitskiy
Thank you for the investigation. Will wait for a fix and news.

Probably it’s not a directly related question but what do you think about 
CASSANDRA-12573? Let me know if it’s better to create a separate thread for it.

Thanks,
Mikhail

> On 15 Sep 2016, at 16:02, DuyHai Doan  wrote:
> 
> Ok so I've found the source of the issue, it's pretty well hidden because it 
> is NOT in the SASI source code directly.
> 
> Here is the method where C* determines what kind of LIKE expression you're 
> using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778
>  
> 
> 
> As you can see, it's pretty simple, maybe too simple. Indeed, they forget to 
> remove escape character BEFORE doing the matching so if your search is LIKE 
> '%%esc%', the detected expression is LIKE_CONTAINS.
> 
> A possible fix would be:
> 
> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on the 
> column data type)
> 2) remove the escape character e.g. before parsing OR use some advanced regex 
> to exclude the %% from parsing e.g
> 
> Step 2) is dead easy but step 1) is harder because I don't know if converting 
> the bytebuffer into String at this stage of the CQL parser is expensive or 
> not (in term of computation)
> 
> Let me try a patch  
> 
> 
> 
> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  > wrote:
> Ok you're right, I get your point
> 
> LIKE '%%esc%' --> startWith('%esc')
> 
> LIKE 'escape%%' -->  = 'escape%'
> 
> What I strongly suspect is that in the source code of SASI, we parse the % 
> xxx % expression BEFORE applying escape. That will explain the observed 
> behavior. E.g:
> 
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
> 
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
> 
> Let me check in the source code and try to reproduce the issue
> 
> 
> 
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Looks like we have different understanding of what results are expected.
> I based my understanding on 
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 
> 
> According to the doc ‘esc’ is a pattern for exact match and I guess that 
> there is no semantical difference between two LIKE patterns (both of patterns 
> should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
> nothing (CASSANDRA-12573).
> 
> What I’m missing?
> 
> Thanks,
> Mikhail
> 
>> On 13 Sep 2016, at 19:31, DuyHai Doan > > wrote:
>> 
>> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sa 
>> si.SASIIndex' WITH OPTIONS = {'mode': 
>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa 
>> si.analyzer.NonTokenizingAnalyzer', 
>> 'case_sensitive': 'false'};
>> 
>> I don't see any problem in the results you got
>> 
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results 
>> containing '%esc' so %escapeme is a possible match and also escape%esc
> Why ‘containing’? I expect that it should be ’starting’..
>> 
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
>> with 'escape%' so escape%me is a valid result and also escape%esc
> Why ’starting’? I expect that it should be ‘exact matching’.
> 
>> 
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>>   3 | escape%esc
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   3 | escape%esc
>> (2 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> +---
>>   2 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-15 Thread DuyHai Doan
Ok so I've found the source of the issue, it's pretty well hidden because
it is NOT in the SASI source code directly.

Here is the method where C* determines what kind of LIKE expression you're
using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java#L733-L778

As you can see, it's pretty simple, maybe too simple. Indeed, they forget
to remove escape character BEFORE doing the matching so if your search is LIKE
'%%esc%', the detected expression is LIKE_CONTAINS.

A possible fix would be:

1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
the column data type)
2) remove the escape character e.g. before parsing OR use some advanced
regex to exclude the %% from parsing e.g

Step 2) is dead easy but step 1) is harder because I don't know if
converting the bytebuffer into String at this stage of the CQL parser is
expensive or not (in term of computation)

Let me try a patch



On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan  wrote:

> Ok you're right, I get your point
>
> LIKE '%%esc%' --> startWith('%esc')
>
> LIKE 'escape%%' -->  = 'escape%'
>
> What I strongly suspect is that in the source code of SASI, we parse the %
> xxx % expression BEFORE applying escape. That will explain the observed
> behavior. E.g:
>
> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>
> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>
> Let me check in the source code and try to reproduce the issue
>
>
>
> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Looks like we have different understanding of what results are expected.
>> I based my understanding on http://docs.datastax.com/en
>> /cql/3.3/cql/cql_using/useSASIIndex.html
>> According to the doc ‘esc’ is a pattern for exact match and I guess that
>> there is no semantical difference between two LIKE patterns (both of
>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>> *%esc*
>>
>> Why ‘containing’? I expect that it should be ’starting’..
>>
>>
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>> *starting* with 'escape%' so *escape%*me is a valid result and also
>> *escape%*esc
>>
>> Why ’starting’? I expect that it should be ‘exact matching’.
>>
>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
>> returns nothing (CASSANDRA-12573).
>>
>> What I’m missing?
>>
>> Thanks,
>> Mikhail
>>
>> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
>>
>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>
>> I don't see any problem in the results you got
>>
>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>> *%esc*
>>
>> Why ‘containing’? I expect that it should be ’starting’..
>>
>>
>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>> *starting* with 'escape%' so *escape%*me is a valid result and also
>> *escape%*esc
>>
>> Why ’starting’? I expect that it should be ‘exact matching’.
>>
>>
>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Thanks for the reply.
>>> Could you please provide what index definition did you use?
>>> With the index from my script I get the following results:
>>>
>>> cqlsh:test> select * from escape;
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>   2 | escape%me
>>> *  3 | escape%esc*
>>>
>>> Contains search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>   3
>>> * | escape%esc*(2 rows)
>>>
>>>
>>> Prefix search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>
>>>  id | val
>>> +---
>>>   2 | escape%me
>>>   3
>>> * | escape%esc*
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 13 Sep 2016, at 18:16, DuyHai Doan  wrote:
>>>
>>> Use % to escape %
>>>
>>> cqlsh:test> select * from escape;
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>   2 | escape%me
>>>
>>>
>>> Contains search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>
>>>  id | val
>>> +---
>>>   1 | %escapeme
>>>
>>> (1 rows)
>>>
>>>
>>> Prefix search
>>>
>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>
>>>  id | val
>>> +---
>>>   2 | escape%me
>>>
>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Hi Cassandra guys,


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-14 Thread DuyHai Doan
Ok you're right, I get your point

LIKE '%%esc%' --> startWith('%esc')

LIKE 'escape%%' -->  = 'escape%'

What I strongly suspect is that in the source code of SASI, we parse the %
xxx % expression BEFORE applying escape. That will explain the observed
behavior. E.g:

LIKE '%%esc%'  parsed as %xxx% where xxx = %esc

LIKE 'escape%%' parsed as xxx% where xxx =escape%

Let me check in the source code and try to reproduce the issue



On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Looks like we have different understanding of what results are expected.
> I based my understanding on http://docs.datastax.com/
> en/cql/3.3/cql/cql_using/useSASIIndex.html
> According to the doc ‘esc’ is a pattern for exact match and I guess that
> there is no semantical difference between two LIKE patterns (both of
> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
> *containing* '%esc' so *%esc*apeme is a possible match and also escape
> *%esc*
>
> Why ‘containing’? I expect that it should be ’starting’..
>
>
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
> *starting* with 'escape%' so *escape%*me is a valid result and also
> *escape%*esc
>
> Why ’starting’? I expect that it should be ‘exact matching’.
>
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
> returns nothing (CASSANDRA-12573).
>
> What I’m missing?
>
> Thanks,
> Mikhail
>
> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
>
> CREATE CUSTOM INDEX ON test.escape(val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
> I don't see any problem in the results you got
>
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
> *containing* '%esc' so *%esc*apeme is a possible match and also escape
> *%esc*
>
> Why ‘containing’? I expect that it should be ’starting’..
>
>
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
> *starting* with 'escape%' so *escape%*me is a valid result and also
> *escape%*esc
>
> Why ’starting’? I expect that it should be ‘exact matching’.
>
>
> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>>
>> cqlsh:test> select * from escape;
>>
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>> *  3 | escape%esc*
>>
>> Contains search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>
>>  id | val
>> +---
>>   1 | %escapeme
>>   3
>> * | escape%esc*(2 rows)
>>
>>
>> Prefix search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>
>>  id | val
>> +---
>>   2 | escape%me
>>   3
>> * | escape%esc*
>>
>> Thanks,
>> Mikhail
>>
>> On 13 Sep 2016, at 18:16, DuyHai Doan  wrote:
>>
>> Use % to escape %
>>
>> cqlsh:test> select * from escape;
>>
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>>
>>
>> Contains search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>
>>  id | val
>> +---
>>   1 | %escapeme
>>
>> (1 rows)
>>
>>
>> Prefix search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>
>>  id | val
>> +---
>>   2 | escape%me
>>
>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Hi Cassandra guys,
>>>
>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>>> search pattern.
>>> Here is my test script:
>>>
>>> DROP keyspace if exists kmv;
>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>> USE kmv;
>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>> KEY(id, c1));
>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>> 'analyzed' : 'true',
>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive' : 'false',
>>> 'mode' : 'CONTAINS'
>>> };
>>>
>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>
>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>
>>> _pattern_ '%%%' finds all columns that contain %.
>>> How to find columns that start form ‘%’ or ‘%a’?
>>> How to find columns that end with ‘%’?
>>> What about more complex patterns: '%qwe%a%sd%’? How to 

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread Mikhail Krupitskiy
Looks like we have different understanding of what results are expected.
I based my understanding on 
http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSASIIndex.html 

According to the doc ‘esc’ is a pattern for exact match and I guess that there 
is no semantical difference between two LIKE patterns (both of patterns should 
be treated as ‘exact match'): ‘%%esc’ and ‘esc’.

> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing 
> '%esc' so %escapeme is a possible match and also escape%esc
Why ‘containing’? I expect that it should be ’starting’..
> 
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
> with 'escape%' so escape%me is a valid result and also escape%esc
Why ’starting’? I expect that it should be ‘exact matching’.

Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it returns 
nothing (CASSANDRA-12573).

What I’m missing?

Thanks,
Mikhail

> On 13 Sep 2016, at 19:31, DuyHai Doan  wrote:
> 
> CREATE CUSTOM INDEX ON test.escape(val) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 
> 'CONTAINS', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
> 'case_sensitive': 'false'};
> 
> I don't see any problem in the results you got
> 
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results containing 
> '%esc' so %escapeme is a possible match and also escape%esc
Why ‘containing’? I expect that it should be ’starting’..
> 
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results starting 
> with 'escape%' so escape%me is a valid result and also escape%esc
Why ’starting’? I expect that it should be ‘exact matching’.
> 
> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Thanks for the reply.
> Could you please provide what index definition did you use?
> With the index from my script I get the following results:
> 
> cqlsh:test> select * from escape;
> 
>  id | val
> +---
>   1 | %escapeme
>   2 | escape%me
>   3 | escape%esc
> 
> Contains search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
> 
>  id | val
> +---
>   1 | %escapeme
>   3 | escape%esc
> (2 rows)
> 
> 
> Prefix search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
> 
>  id | val
> +---
>   2 | escape%me
>   3 | escape%esc
> 
> Thanks,
> Mikhail 
> 
>> On 13 Sep 2016, at 18:16, DuyHai Doan > > wrote:
>> 
>> Use % to escape %
>> 
>> cqlsh:test> select * from escape;
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>>   2 | escape%me
>> 
>> 
>> Contains search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>> 
>>  id | val
>> +---
>>   1 | %escapeme
>> 
>> (1 rows)
>> 
>> 
>> Prefix search
>> 
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>> 
>>  id | val
>> +---
>>   2 | escape%me
>> 
>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
>> > 
>> wrote:
>> Hi Cassandra guys,
>> 
>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a 
>> search pattern.
>> Here is my test script:
>> 
>> DROP keyspace if exists kmv;
>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
>> 'SimpleStrategy', 'replication_factor':'1'} ;
>> USE kmv;
>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
>> c1));
>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sa 
>> si.SASIIndex' WITH OPTIONS = {
>> 'analyzed' : 'true',
>> 'analyzer_class' : 'org.apache.cassandra.index.sa 
>> si.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive' : 'false',
>> 'mode' : 'CONTAINS'
>> };
>> 
>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>> 
>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>> 
>> _pattern_ '%%%' finds all columns that contain %.
>> How to find columns that start form ‘%’ or ‘%a’?
>> How to find columns that end with ‘%’?
>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ 
>> char form % as a command symbol? (Also there is a related issue 
>> CASSANDRA-12573).
>> 
>> 
>> Thanks,
>> Mikhail
>> 
> 
> 



Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread DuyHai Doan
CREATE CUSTOM INDEX ON test.escape(val) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
'CONTAINS', 'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};

I don't see any problem in the results you got

SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
*containing* '%esc' so *%esc*apeme is a possible match and also escape*%esc*

SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
*starting* with 'escape%' so *escape%*me is a valid result and also
*escape%*esc

On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Thanks for the reply.
> Could you please provide what index definition did you use?
> With the index from my script I get the following results:
>
> cqlsh:test> select * from escape;
>
>  id | val
> +---
>   1 | %escapeme
>   2 | escape%me
> *  3 | escape%esc*
>
> Contains search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>
>  id | val
> +---
>   1 | %escapeme
>   3
> * | escape%esc*(2 rows)
>
>
> Prefix search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>
>  id | val
> +---
>   2 | escape%me
>   3
> * | escape%esc*
>
> Thanks,
> Mikhail
>
> On 13 Sep 2016, at 18:16, DuyHai Doan  wrote:
>
> Use % to escape %
>
> cqlsh:test> select * from escape;
>
>  id | val
> +---
>   1 | %escapeme
>   2 | escape%me
>
>
> Contains search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>
>  id | val
> +---
>   1 | %escapeme
>
> (1 rows)
>
>
> Prefix search
>
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>
>  id | val
> +---
>   2 | escape%me
>
> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Hi Cassandra guys,
>>
>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>> search pattern.
>> Here is my test script:
>>
>> DROP keyspace if exists kmv;
>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>> 'SimpleStrategy', 'replication_factor':'1'} ;
>> USE kmv;
>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id,
>> c1));
>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>> 'analyzed' : 'true',
>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>> si.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive' : 'false',
>> 'mode' : 'CONTAINS'
>> };
>>
>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>
>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>
>> _pattern_ '%%%' finds all columns that contain %.
>> How to find columns that start form ‘%’ or ‘%a’?
>> How to find columns that end with ‘%’?
>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’
>> char form % as a command symbol? (Also there is a related issue
>> CASSANDRA-12573).
>>
>>
>> Thanks,
>> Mikhail
>
>
>
>


Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread Mikhail Krupitskiy
Thanks for the reply.
Could you please provide what index definition did you use?
With the index from my script I get the following results:

cqlsh:test> select * from escape;

 id | val
+---
  1 | %escapeme
  2 | escape%me
  3 | escape%esc

Contains search

cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

 id | val
+---
  1 | %escapeme
  3 | escape%esc
(2 rows)


Prefix search

cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';

 id | val
+---
  2 | escape%me
  3 | escape%esc

Thanks,
Mikhail 

> On 13 Sep 2016, at 18:16, DuyHai Doan  wrote:
> 
> Use % to escape %
> 
> cqlsh:test> select * from escape;
> 
>  id | val
> +---
>   1 | %escapeme
>   2 | escape%me
> 
> 
> Contains search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
> 
>  id | val
> +---
>   1 | %escapeme
> 
> (1 rows)
> 
> 
> Prefix search
> 
> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
> 
>  id | val
> +---
>   2 | escape%me
> 
> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy 
> > 
> wrote:
> Hi Cassandra guys,
> 
> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search 
> pattern.
> Here is my test script:
> 
> DROP keyspace if exists kmv;
> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> USE kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
> 'analyzed' : 'true',
> 'analyzer_class' : 
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive' : 'false',
> 'mode' : 'CONTAINS'
> };
> 
> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
> 
> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
> 
> _pattern_ '%%%' finds all columns that contain %.
> How to find columns that start form ‘%’ or ‘%a’?
> How to find columns that end with ‘%’?
> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char 
> form % as a command symbol? (Also there is a related issue CASSANDRA-12573).
> 
> 
> Thanks,
> Mikhail
> 



Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread DuyHai Doan
Use % to escape %

cqlsh:test> select * from escape;

 id | val
+---
  1 | %escapeme
  2 | escape%me


Contains search

cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';

 id | val
+---
  1 | %escapeme

(1 rows)


Prefix search

cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';

 id | val
+---
  2 | escape%me

On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Hi Cassandra guys,
>
> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
> search pattern.
> Here is my test script:
>
> DROP keyspace if exists kmv;
> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
> 'SimpleStrategy', 'replication_factor':'1'} ;
> USE kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id,
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
> 'analyzed' : 'true',
> 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.
> NonTokenizingAnalyzer',
> 'case_sensitive' : 'false',
> 'mode' : 'CONTAINS'
> };
>
> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>
> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>
> _pattern_ '%%%' finds all columns that contain %.
> How to find columns that start form ‘%’ or ‘%a’?
> How to find columns that end with ‘%’?
> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’
> char form % as a command symbol? (Also there is a related issue
> CASSANDRA-12573).
>
>
> Thanks,
> Mikhail


How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-13 Thread Mikhail Krupitskiy
Hi Cassandra guys,

I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a search 
pattern.
Here is my test script:

DROP keyspace if exists kmv;
CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' : 
'SimpleStrategy', 'replication_factor':'1'} ;
USE kmv;
CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
'analyzed' : 'true',
'analyzer_class' : 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive' : 'false', 
'mode' : 'CONTAINS'
};

INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');

SELECT c2 from kmv.kmv where c2 like ‘_pattern_';

_pattern_ '%%%' finds all columns that contain %.
How to find columns that start form ‘%’ or ‘%a’?
How to find columns that end with ‘%’?
What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’ char 
form % as a command symbol? (Also there is a related issue CASSANDRA-12573).


Thanks,
Mikhail