Thank you all for your
replies,

@RSmith:

My question is about the _specific_ case:

SELECT * FROM ATable WHERE AColumn LIKE
'FixedSearchPart' || 'SomeWildCard'

where the right side of the LIKE operator is a constant that has a fixed
part at the beginning and a wildcard after that. The optimization is that
the index is used to produce a rowset in which AColumn starts with
'FixedSearchPart' and then rows are filtered according to 'SomeWildCard'.
For instance, in

SELECT * FROM ATable WHERE AColumn LIKE
'Constantine%Yann%'

As I understand it, the index scan will use the string 'Constantine' as
argument and then the full string 'Constantine%Yann%' will be used to
further filter the rowset. Of course any other case that has no fixed part
at the start of the right-side string will have to fall back to a full scan
and filter.

Maybe the parameter notation was a little confusing but from the
description you can deduce that it will not contain a wildcard.
Essentially, the user will be asking for the rows where AColumn BEGINS WITH
a string.

@Jean-Christophe:

Thank you for the extension. I will certainly have a look at it, but I
already have the comparison algorithm (specific to latin/greek) from
another case. So the only thing I have to do is to tailor it inside a
custom collation.

@Yuriy:

Yes, the current interface for custom functions does not provide any
information on what the arguments are, so it is impossible to deduce the
collation of arguments inside the function body. That's why this has to be
implemented by the SqLite query optimizer itself. I was thinking about
something like:

- The query optimizer understands that the left-side of the LIKE argument
is a column with an index and a collation, and the right side is a literal
with a fixed left part (after parameter substitution).
- It takes the fixed part, appends a very high-order (preferably
non-printable) character to it and use the index to materialize this clause:

  AColumn BETWEEN 'FixedPart' AND 'FixedPart' || <HighOrderChar>.

If we were using ANSI I would suggest something like:

  AColumn BETWEEN 'FixedPart' AND 'FixedPart' || (char)0xFF

but for UTF-8 I am not sure what to suggest. It would have to be a UTF8
character value that is guaranteed to be greater than any other usable
character in any code page. I am not sure whether there actually is one.

-Then it applies the whole right-side argument to this rowset using LIKE
pattern matching to produce the correct result.

@Simon:

Your proposal is very clever but it has two weaknesses:
- You are assuming that 'z' is the higher order character that can appear
in a value. This is not the case; for instance greek characters have higher
order than 'z'. This can be fixed (only for latin/greek) by using the
highest order greek character 'ώ' (accented omega) instead of 'z'; but I
would prefer a very high-order non-printable one instead.
- It assumes that the column has a maximum width. Since SQLite does not
have maximum widths for columns of TEXT affinity (although a max width can
be implied and imposed by the app itself), improbable as it may be, there
could be a row with the value :SearchString||'zzzzzzzzzzzz'||'a' which will
erroneously not make it into the result set. Again, the very high-order
non-printable character would solve this.
So it can be a solution for a specific case with a given max value and a
known set of code pages but it cannot be a general solution.

--Constantine.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to