2016-01-29 14:39 GMT+01:00 Dominique Devienne <ddevienne at gmail.com>:

> On Fri, Jan 29, 2016 at 1:53 PM, Cecil Westerhof <cldwesterhof at gmail.com>
> wrote:
>
> > 2016-01-29 10:12 GMT+01:00 Richard Hipp <drh at sqlite.org>:
> > > On 1/29/16, Cecil Westerhof <cldwesterhof at gmail.com> wrote:
> > > > In H2 I can find records containing non-ascii characters with:
> > > >     SELECT *
> > > >     FROM <TABLE>
> > > >     WHERE STRINGENCODE(<COLUMN>) LIKE '%\\u%'
> > > >
> > > > Is something like this also possible with SQLite?
> > >
> > > Perhaps something like this:
> > > SELECT * FROM <table>
> > > WHERE <column> GLOB ('*[^'||char(1,45,127)||']*');
> >
> > I tried it and it looks like it works. But I do not understand how it
> > works. Could you enlighten me?
> >
>
> char() returns text with 1 (unicode) char per unicode code point integer
> used as argument to this function.
> ??
>
> So char(1,45,127) return first-ascii-char - (i.e. dash) last-ascii-char
> (unicode is a superset of ascii).
> So the same way a-z is the character range between a and z, i.e. all
> lower-case letters
> NUL - DEL is the full range of ascii characters. (char(1) = NUL, char(127)
> = DEL)
>

?I should have been a little more persevering: I did not understand the
usage of char, but I could/should have worked it out.

Thanks.

?


> [^...] is the negation of the characters inside.
> So [^NUL-DEL] is all non-ascii (unicode) characters. It matches a single
> character.
> And *[^NULL-DEL]* allows that 1 character to be matched anywhere.
> * is the kleen star, which matches 0 or more arbitrary characters.
>
> Note that glob above is not unicode aware per se. It operates on UTF-8
> encoded strings,
> and in that encoding, all non-ascii characters are encoded into multiple
> bytes, which (the bytes) all are
> in the 128-255 range. So it works indeed.
>
> > Note, however, that the "*[" combination in a GLOB pattern in SQLite
> > > is inefficient. So the above might be slow for a large amount of text.
> >
>
> because the kleen star is "griddy" by default.
>
>
> > > Perhaps a better approach would be to create an application-defined
> > > function to do the search.
> > >
> > >   SELECT * FROM <table> WHERE contains_non_ascii(<column>);
> > >
> >
> > ?I am working (in this case) with bash. That is not possible then I
> > think.Or is it?
> >
>
> You can create a C shared library SQLite extension, that you .load in the
> sqlite3 shell.
> So that works in bash too, but you need to do some C programming. --DD
>

?In principal I share my things, so that could make things a little ?

?complicated, but I'll cross that bridge when I get there.

-- 
Cecil Westerhof

Reply via email to