Re: [sqlite] unique with icu

dd Sat, 25 Oct 2014 23:01:38 -0700

Thanks Richard/Joseph.

Application using sqlite database without icu extension. I am planning to
add icu extension. for schema, add new column and index with lower.


Is it safe to add icu for existing db's? Will it lead to any corruptions?



On Sun, Oct 26, 2014 at 1:22 AM, Joseph R. Justice <[email protected]>
wrote:

> On Sat, Oct 25, 2014 at 3:44 PM, Richard Hipp <[email protected]> wrote:
>
> > On Sat, Oct 25, 2014 at 7:09 AM, dd <[email protected]> wrote:
> >
>
>
> > > Hi,
> > >
> > >   icu enabled for sqlite. I didn't do any custom collations/like
> > operator.
> > >
> > > CREATE TABLE test(id integer primary key autoincrement, t text collate
> > > nocase, unique(t));
> > >
> > > Case 1:  When I try to insert 'd' and 'D', throwing constraint
> violation.
> > >  (SUCCESS)
> > > Case 2:  When I try to insert 'ö' and 'Ö', sqlite inserted both
> > > sucessfully. (FAILED)
> > >
> > >  I expect case 2 should throw constraint violation, but not.  Am I
> > missing
> > > anything here?
> >
> > Please read https://www.sqlite.org/src/artifact/d9fbbad0c2f and
> especially
> > the part about ICU collating sequences.  "NOCASE" is still the standard
> > ASCII-only collating sequence, even if you enable ICU.
> >
>
> I'm looking at that page now.  (I'm not especially interested in this issue
> -- it's just a whim.)
>
>
>
> When I look at the following text from section 1.1:
>
>     To utilise "general" case mapping, the upper() or lower() scalar
>     functions are invoked with one argument:
>
>         upper('ABC') -> 'abc'
>         lower('abc') -> 'ABC'
>
> I wonder if the examples shown are reversed, and it should be
> upper('abc') -> 'ABC', et al.
>
>
>
> W.r.t. the original poster's question, given what DRH has said here, I
> wonder if they might have to introduce a third column to the table, say
> t_unique, where t_unique is defined to be the value of t after it has been
> passed through the upper() and/or lower() functions (as they are defined by
> 1.1), and in the definition of the table instead of unique(t) have
> unique(t_unique).  Obviously this would bloat the size of the table in any
> real usage (since t and t_unique could potentially be quite long).  And I
> recognize that such bloat may be unacceptable to the original poster.
>
> Perhaps a "good enough" solution would be to define t_unique as being a
> hash of the value of upper(t) (or lower(t)), and again have
> unique(t_unique).  I recognize this could potentially lead to a false
> positive (two unrelated strings hashing to the same value and therefore
> colliding), but hopefully the chance of that occurring in real life would
> be acceptably small.  Alternatively, define t_unique to be a concatenation
> of two or three hash values of upper(t), where each hash value is generated
> using a different hash function -- the chance of two unrelated strings
> hashing to the same value with two or three unrelated hash functions should
> be far, far smaller even than that of them hashing to the same value under
> only one function.  (Pulling numbers out of thin air, if we say the chance
> of two unrelated strings hashing to the same value with any reasonable hash
> function is no more likely than 1x10^-10, then concatenating the value of
> three unrelated hash values should result in a collision no more often at
> worst than 1x10^-30 (assuming my math is correct), which is 20 orders of
> magnitude (or more!) smaller than the chance using just one hash function.)
>  Of course, here we are probably going to use far more CPU, since we will
> be hashing strings at least once (and possibly two or three times,
> depending on the chosen implementation) and this is not necessarily a cheap
> thing to compute.
>
> Alternatively, and this would require programming / API changes to SQLite,
> perhaps a "nocase_icu" could be defined / created which, in the absence of
> ICU being enabled for sqlite, works identically to how "nocase" works now,
> but which when ICU is enabled works in the fashion expected by the original
> poster as to how they thought plain "nocase" would work with ICU enabled.
>  (I assume that changing how "nocase" works when ICU is enabled is
> unacceptable since that would be a backwards-incompatible change, and it's
> possible there are programs existing which depend on the current behavior.)
>
>
>
> Hope this is of some use, interest.  Thanks for your time.
>
>
>
> Joseph
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] unique with icu

Reply via email to