On Fri, Jan 29, 2016 at 1:53 PM, Cecil Westerhof <cldwesterhof at gmail.com>
wrote:

> 2016-01-29 10:12 GMT+01:00 Richard Hipp <drh at sqlite.org>:
> > On 1/29/16, Cecil Westerhof <cldwesterhof at gmail.com> wrote:
> > > In H2 I can find records containing non-ascii characters with:
> > >     SELECT *
> > >     FROM <TABLE>
> > >     WHERE STRINGENCODE(<COLUMN>) LIKE '%\\u%'
> > >
> > > Is something like this also possible with SQLite?
> >
> > Perhaps something like this:
> > SELECT * FROM <table>
> > WHERE <column> GLOB ('*[^'||char(1,45,127)||']*');
>
> I tried it and it looks like it works. But I do not understand how it
> works. Could you enlighten me?
>

char() returns text with 1 (unicode) char per unicode code point integer
used as argument to this function.
So char(1,45,127) return first-ascii-char - (i.e. dash) last-ascii-char
(unicode is a superset of ascii).
So the same way a-z is the character range between a and z, i.e. all
lower-case letters
NUL - DEL is the full range of ascii characters. (char(1) = NUL, char(127)
= DEL)
[^...] is the negation of the characters inside.
So [^NUL-DEL] is all non-ascii (unicode) characters. It matches a single
character.
And *[^NULL-DEL]* allows that 1 character to be matched anywhere.
* is the kleen star, which matches 0 or more arbitrary characters.

Note that glob above is not unicode aware per se. It operates on UTF-8
encoded strings,
and in that encoding, all non-ascii characters are encoded into multiple
bytes, which (the bytes) all are
in the 128-255 range. So it works indeed.

> Note, however, that the "*[" combination in a GLOB pattern in SQLite
> > is inefficient. So the above might be slow for a large amount of text.
>

because the kleen star is "griddy" by default.


> > Perhaps a better approach would be to create an application-defined
> > function to do the search.
> >
> >   SELECT * FROM <table> WHERE contains_non_ascii(<column>);
> >
>
> ?I am working (in this case) with bash. That is not possible then I
> think.Or is it?
>

You can create a C shared library SQLite extension, that you .load in the
sqlite3 shell.
So that works in bash too, but you need to do some C programming. --DD

Reply via email to