Re: [sqlite] unique with icu

2014-10-26 Thread dd
Thanks a million Simon :-)

On Sun, Oct 26, 2014 at 8:11 PM, Simon Slavin  wrote:

>
> On 26 Oct 2014, at 6:00am, dd  wrote:
>
> > Application using sqlite database without icu extension. I am planning to
> > add icu extension. for schema, add new column and index with lower.
> >
> > Is it safe to add icu for existing db's?
>
> Yes.  But once you've added it and used it there will be a problem if you
> ever try to use the database without it.
>
> > Will it lead to any corruptions?
>
> No.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unique with icu

2014-10-26 Thread Simon Slavin

On 26 Oct 2014, at 6:00am, dd  wrote:

> Application using sqlite database without icu extension. I am planning to
> add icu extension. for schema, add new column and index with lower.
> 
> Is it safe to add icu for existing db's?

Yes.  But once you've added it and used it there will be a problem if you ever 
try to use the database without it.

> Will it lead to any corruptions?

No.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unique with icu

2014-10-26 Thread dd
Thanks Richard/Joseph.

Application using sqlite database without icu extension. I am planning to
add icu extension. for schema, add new column and index with lower.

Is it safe to add icu for existing db's? Will it lead to any corruptions?



On Sun, Oct 26, 2014 at 1:22 AM, Joseph R. Justice 
wrote:

> On Sat, Oct 25, 2014 at 3:44 PM, Richard Hipp  wrote:
>
> > On Sat, Oct 25, 2014 at 7:09 AM, dd  wrote:
> >
>
>
> > > Hi,
> > >
> > >   icu enabled for sqlite. I didn't do any custom collations/like
> > operator.
> > >
> > > CREATE TABLE test(id integer primary key autoincrement, t text collate
> > > nocase, unique(t));
> > >
> > > Case 1:  When I try to insert 'd' and 'D', throwing constraint
> violation.
> > >  (SUCCESS)
> > > Case 2:  When I try to insert 'ö' and 'Ö', sqlite inserted both
> > > sucessfully. (FAILED)
> > >
> > >  I expect case 2 should throw constraint violation, but not.  Am I
> > missing
> > > anything here?
> >
> > Please read https://www.sqlite.org/src/artifact/d9fbbad0c2f and
> especially
> > the part about ICU collating sequences.  "NOCASE" is still the standard
> > ASCII-only collating sequence, even if you enable ICU.
> >
>
> I'm looking at that page now.  (I'm not especially interested in this issue
> -- it's just a whim.)
>
>
>
> When I look at the following text from section 1.1:
>
> To utilise "general" case mapping, the upper() or lower() scalar
> functions are invoked with one argument:
>
> upper('ABC') -> 'abc'
> lower('abc') -> 'ABC'
>
> I wonder if the examples shown are reversed, and it should be
> upper('abc') -> 'ABC', et al.
>
>
>
> W.r.t. the original poster's question, given what DRH has said here, I
> wonder if they might have to introduce a third column to the table, say
> t_unique, where t_unique is defined to be the value of t after it has been
> passed through the upper() and/or lower() functions (as they are defined by
> 1.1), and in the definition of the table instead of unique(t) have
> unique(t_unique).  Obviously this would bloat the size of the table in any
> real usage (since t and t_unique could potentially be quite long).  And I
> recognize that such bloat may be unacceptable to the original poster.
>
> Perhaps a "good enough" solution would be to define t_unique as being a
> hash of the value of upper(t) (or lower(t)), and again have
> unique(t_unique).  I recognize this could potentially lead to a false
> positive (two unrelated strings hashing to the same value and therefore
> colliding), but hopefully the chance of that occurring in real life would
> be acceptably small.  Alternatively, define t_unique to be a concatenation
> of two or three hash values of upper(t), where each hash value is generated
> using a different hash function -- the chance of two unrelated strings
> hashing to the same value with two or three unrelated hash functions should
> be far, far smaller even than that of them hashing to the same value under
> only one function.  (Pulling numbers out of thin air, if we say the chance
> of two unrelated strings hashing to the same value with any reasonable hash
> function is no more likely than 1x10^-10, then concatenating the value of
> three unrelated hash values should result in a collision no more often at
> worst than 1x10^-30 (assuming my math is correct), which is 20 orders of
> magnitude (or more!) smaller than the chance using just one hash function.)
>  Of course, here we are probably going to use far more CPU, since we will
> be hashing strings at least once (and possibly two or three times,
> depending on the chosen implementation) and this is not necessarily a cheap
> thing to compute.
>
> Alternatively, and this would require programming / API changes to SQLite,
> perhaps a "nocase_icu" could be defined / created which, in the absence of
> ICU being enabled for sqlite, works identically to how "nocase" works now,
> but which when ICU is enabled works in the fashion expected by the original
> poster as to how they thought plain "nocase" would work with ICU enabled.
>  (I assume that changing how "nocase" works when ICU is enabled is
> unacceptable since that would be a backwards-incompatible change, and it's
> possible there are programs existing which depend on the current behavior.)
>
>
>
> Hope this is of some use, interest.  Thanks for your time.
>
>
>
> Joseph
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unique with icu

2014-10-25 Thread Joseph R. Justice
On Sat, Oct 25, 2014 at 3:44 PM, Richard Hipp  wrote:

> On Sat, Oct 25, 2014 at 7:09 AM, dd  wrote:
>


> > Hi,
> >
> >   icu enabled for sqlite. I didn't do any custom collations/like
> operator.
> >
> > CREATE TABLE test(id integer primary key autoincrement, t text collate
> > nocase, unique(t));
> >
> > Case 1:  When I try to insert 'd' and 'D', throwing constraint violation.
> >  (SUCCESS)
> > Case 2:  When I try to insert 'ö' and 'Ö', sqlite inserted both
> > sucessfully. (FAILED)
> >
> >  I expect case 2 should throw constraint violation, but not.  Am I
> missing
> > anything here?
>
> Please read https://www.sqlite.org/src/artifact/d9fbbad0c2f and especially
> the part about ICU collating sequences.  "NOCASE" is still the standard
> ASCII-only collating sequence, even if you enable ICU.
>

I'm looking at that page now.  (I'm not especially interested in this issue
-- it's just a whim.)



When I look at the following text from section 1.1:

To utilise "general" case mapping, the upper() or lower() scalar
functions are invoked with one argument:

upper('ABC') -> 'abc'
lower('abc') -> 'ABC'

I wonder if the examples shown are reversed, and it should be
upper('abc') -> 'ABC', et al.



W.r.t. the original poster's question, given what DRH has said here, I
wonder if they might have to introduce a third column to the table, say
t_unique, where t_unique is defined to be the value of t after it has been
passed through the upper() and/or lower() functions (as they are defined by
1.1), and in the definition of the table instead of unique(t) have
unique(t_unique).  Obviously this would bloat the size of the table in any
real usage (since t and t_unique could potentially be quite long).  And I
recognize that such bloat may be unacceptable to the original poster.

Perhaps a "good enough" solution would be to define t_unique as being a
hash of the value of upper(t) (or lower(t)), and again have
unique(t_unique).  I recognize this could potentially lead to a false
positive (two unrelated strings hashing to the same value and therefore
colliding), but hopefully the chance of that occurring in real life would
be acceptably small.  Alternatively, define t_unique to be a concatenation
of two or three hash values of upper(t), where each hash value is generated
using a different hash function -- the chance of two unrelated strings
hashing to the same value with two or three unrelated hash functions should
be far, far smaller even than that of them hashing to the same value under
only one function.  (Pulling numbers out of thin air, if we say the chance
of two unrelated strings hashing to the same value with any reasonable hash
function is no more likely than 1x10^-10, then concatenating the value of
three unrelated hash values should result in a collision no more often at
worst than 1x10^-30 (assuming my math is correct), which is 20 orders of
magnitude (or more!) smaller than the chance using just one hash function.)
 Of course, here we are probably going to use far more CPU, since we will
be hashing strings at least once (and possibly two or three times,
depending on the chosen implementation) and this is not necessarily a cheap
thing to compute.

Alternatively, and this would require programming / API changes to SQLite,
perhaps a "nocase_icu" could be defined / created which, in the absence of
ICU being enabled for sqlite, works identically to how "nocase" works now,
but which when ICU is enabled works in the fashion expected by the original
poster as to how they thought plain "nocase" would work with ICU enabled.
 (I assume that changing how "nocase" works when ICU is enabled is
unacceptable since that would be a backwards-incompatible change, and it's
possible there are programs existing which depend on the current behavior.)



Hope this is of some use, interest.  Thanks for your time.



Joseph
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unique with icu

2014-10-25 Thread Richard Hipp
On Sat, Oct 25, 2014 at 7:09 AM, dd  wrote:

> Hi,
>
>   icu enabled for sqlite. I didn't do any custom collations/like operator.
>
> CREATE TABLE test(id integer primary key autoincrement, t text collate
> nocase, unique(t));
>
> Case 1:  When I try to insert 'd' and 'D', throwing constraint violation.
>  (SUCCESS)
> Case 2:  When I try to insert 'ö' and 'Ö', sqlite inserted both
> sucessfully. (FAILED)
>
>  I expect case 2 should throw constraint violation, but not.  Am I missing
> anything here?
>
>
>
Please read https://www.sqlite.org/src/artifact/d9fbbad0c2f and especially
the part about ICU collating sequences.  "NOCASE" is still the standard
ASCII-only collating sequence, even if you enable ICU.



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] unique with icu

2014-10-25 Thread dd
any inputs.

On Sat, Oct 25, 2014 at 3:09 PM, dd  wrote:

> Hi,
>
>   icu enabled for sqlite. I didn't do any custom collations/like operator.
>
> CREATE TABLE test(id integer primary key autoincrement, t text collate
> nocase, unique(t));
>
> Case 1:  When I try to insert 'd' and 'D', throwing constraint violation.
>  (SUCCESS)
> Case 2:  When I try to insert 'ö' and 'Ö', sqlite inserted both
> sucessfully. (FAILED)
>
>  I expect case 2 should throw constraint violation, but not.  Am I missing
> anything here?
>
> Thanks.
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] unique with icu

2014-10-25 Thread dd
Hi,

  icu enabled for sqlite. I didn't do any custom collations/like operator.

CREATE TABLE test(id integer primary key autoincrement, t text collate
nocase, unique(t));

Case 1:  When I try to insert 'd' and 'D', throwing constraint violation.
 (SUCCESS)
Case 2:  When I try to insert 'ö' and 'Ö', sqlite inserted both
sucessfully. (FAILED)

 I expect case 2 should throw constraint violation, but not.  Am I missing
anything here?

Thanks.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users