[sqlite] Using collation instead a virtual table

2015-09-10 Thread Eduardo Morras
On Wed, 9 Sep 2015 09:54:28 -0400
Igor Tandetnik  wrote:

> On 9/9/2015 6:54 AM, Eduardo Morras wrote:
> > Yes, the comparison operators work correctly and the b-tree binary
> > search should give correct answers, the only tweak is in xCompare,
> > that returns 0 when left(x) and rigth($value) expressions distance
> > is lower than a threshold.
> 
> A comparison like this would not generally be a proper collation. The 
> equivalence relation it induces is not transitive - it's possible to 
> have A == B and B == C but A != C (when A is "close enough" to B and
> B is "close enough" to C, but A and C are just far enough from each
> other).

Yes, in first mail I noted that, but I don't think it's a problem because:

a) xCompare(A,C) returns AC (-1/+1),
b) the query always return only one value, 
c) no ORDER BY in query,
d) hashed data hash are far enough / disperse and expect no new additions.

I'll compare collation vs virtual table with the tests cases, if fails or is 
slower I'll mark the collation branch as closed.

Thanks

> -- 
> Igor Tandetnik

---   ---
Eduardo Morras 


[sqlite] Using collation instead a virtual table

2015-09-10 Thread Constantine Yannakopoulos
On Wed, Sep 9, 2015 at 10:59 PM, Scott Doctor  wrote:

>
> Best practice when dealing with floating point is to normalize and Chop.


In my line of business (ledger/ERP/retail) I don't think I can normalize
but chopping is definitely performed ?on each operation that can produce
rounding errors. Luckily the number of decimal digits is fixed (almost
always 3) and one can easily detect when there is a problem with rounding.
However, there are cases like the one I described above (distribution)
where deviations are inevitable, but it is usually acceptable to fix by
slightly violating the rules, e.g. calculating the rounding error and
adding it to a single row's amount in order to balance it out.

But I feel I have hijacked the thread so I think I'd better stop.


[sqlite] Using collation instead a virtual table

2015-09-09 Thread Constantine Yannakopoulos
On Wed, Sep 9, 2015 at 9:47 PM, R.Smith  wrote:

>
> On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote:
>
>> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik 
>> wrote:
>>
>> ?Out of curiosity, doesn't this also apply also to numeric (real number)
>> comparisons since SQLite3 uses IEEE floating point arithmetic??
>>
>
> IEEE Float comparisons do not work this way - you are more likely to find
> the opposite:  two numbers that seem to be near perfectly equal might fail
> an equality test.
>

?That is the problem. There are cases where two numbers that come out of
different calculations? (especially if a division is included) are expected
to be equal but they fail the equality test. A classic case is when you
distribute an amount (e.g. a total) among several rows using a certain
column as weight and you expect the sum of the distributed amounts to be
exactly equal to the original total but it is not.


> On Wed, Sep 9, 2015 at 7:47 PM, Igor Tandetnik  wrote:
> What aspect of IEEE floating point arithmetic makes comparisons unsafe, in
> your opinion? Given two IEEE numbers (NaNs and INFs excepted), the
> comparison would only ever declare them equal if their representations are
> bit-for-bit identical; it doesn't play "close enough" games. What again
> seems to be the problem?
>

?OK, I was under the impression that SQLite ?used an epsilon comparison to
avoid the aforementioned case. Obviously I was wrong. Sorry. As I said, no
problem, just curiosity.


[sqlite] Using collation instead a virtual table

2015-09-09 Thread R.Smith


On 2015-09-09 09:02 PM, Constantine Yannakopoulos wrote:
> On Wed, Sep 9, 2015 at 9:47 PM, R.Smith  wrote:
>
>> On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote:
>>
>>> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik 
>>> wrote:
>>>
>>> ?Out of curiosity, doesn't this also apply also to numeric (real number)
>>> comparisons since SQLite3 uses IEEE floating point arithmetic??
>>>
>> IEEE Float comparisons do not work this way - you are more likely to find
>> the opposite:  two numbers that seem to be near perfectly equal might fail
>> an equality test.
>>
> ?That is the problem. There are cases where two numbers that come out of
> different calculations? (especially if a division is included) are expected
> to be equal but they fail the equality test. A classic case is when you
> distribute an amount (e.g. a total) among several rows using a certain
> column as weight and you expect the sum of the distributed amounts to be
> exactly equal to the original total but it is not.

Indeed - but you can always avoid it when comparing values using your 
own epsilon difference test, something like:

WHERE abs(x-y) < 0.1
(Or whatever small value seems adequate to satisfy your equality 
specification).


While on the subject - found this re-print and edited version of an old 
paper by David Goldberg highlighting all the IEEE pitfalls in computing 
- a valuable read to any programmer:

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html




[sqlite] Using collation instead a virtual table

2015-09-09 Thread R.Smith


On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote:
> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik  wrote:
>
>> A comparison like this would not generally be a proper collation. The
>> equivalence relation it induces is not transitive - it's possible to have A
>> == B and B == C but A != C (when A is "close enough" to B and B is "close
>> enough" to C, but A and C are just far enough from each other).
>>
> ?Out of curiosity, doesn't this also apply also to numeric (real number)
> comparisons since SQLite3 uses IEEE floating point arithmetic??

IEEE Float comparisons do not work this way - you are more likely to 
find the opposite:  two numbers that seem to be near perfectly equal 
might fail an equality test.

Such confusion might be caused by statements such as:
...WHERE (5.6 - 3.1) = 2.5
...WHERE (14 * 0.4) = 5.6

Which might return false if two or more of the constants cannot be 
precisely represented. (The second one is a known problem value).

Nothing however would "seem" equal to the processor if they are not 
exactly equal in binary form - no "almost" matching happens.

BTW: In strict Math it can be shown that 0.999...  (repeating) is 
exactly equal to 1 but in IEEE floats they are not, but that is just 
because an 8-byte (64b) float lacks the capacity to render the repeating 
nines to sufficiently wide a representation to find the one-ness of it.

https://en.wikipedia.org/wiki/0.999...

IEEE fun in C#:

Testing 1/3:
  f = 0.333
  d = 0.333
  m = 0.
  f*3 = 1
  d*3 = 1
  m*3 = 0.
  (double)f*3 = 1.0002980232
  (decimal)f*3 = 0.999
  (decimal)d*3 = 0.999
  (double)((float)i/3)*3 = 1
Testing 2/3:
  f = 0.667
  d = 0.667
  m = 0.6667
  f*3 = 2
  d*3 = 2
  m*3 = 2.0001
  (double)f*3 = 2.0005960464
  (decimal)f*3 = 2.001
  (decimal)d*3 = 2.001
  (double)((float)i/3)*3 = 2

Cheers,
Ryan



[sqlite] Using collation instead a virtual table

2015-09-09 Thread Constantine Yannakopoulos
On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik  wrote:

> A comparison like this would not generally be a proper collation. The
> equivalence relation it induces is not transitive - it's possible to have A
> == B and B == C but A != C (when A is "close enough" to B and B is "close
> enough" to C, but A and C are just far enough from each other).
>

?Out of curiosity, doesn't this also apply also to numeric (real number)
comparisons since SQLite3 uses IEEE floating point arithmetic??


[sqlite] Using collation instead a virtual table

2015-09-09 Thread Scott Doctor

Best practice when dealing with floating point is to normalize 
and Chop.

The best practice for dealing with floating point operations is 
to normalize your data sets before proceeding. All numbers 
should be -1.0<=x<=1.0. Done properly, after calculations are 
complete, the data set is easily returned to its original range 
and domain by reversing the normalization process.

Chop is basically a rounding to a specified number of digits. 
Often 4 to 8 digits is adequate but each application is 
different. When dealing with matrix convolutions, after each row 
operation, chop each number in the matrix. Mathematically it can 
be shown that the final result is more accurate than allowing 
the floating point resolution error to propagate. Numbers such 
as 2.9 should round to 3. before doing the next round of 
calculations. This is especially important for results near 
zero. Typically numbers less than, for instance 0.1 should 
be set to zero. This is especially important in matrix operations.

All floating point math libraries have round and/or chop 
functions. In fact, not chopping when doing large data sets will 
ultimately result in significantly wrong results due to 
propagating floating point resolution errors. Chopping corrects 
for these errors.

If you remember your High School chemistry or biology class, one 
of the first lessons is about significant digits. How to 
determine the proper number of significant digits depends on 
your application and field of study. By using proper number of 
significant digits throughout your calculations, the result will 
be more correct than not doing so.


Scott Doctor
scott at scottdoctor.com
--

On 9/9/2015 11:47 AM, R.Smith wrote:
>
>
> On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote:
>> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik 
>>  wrote:
>>
>>> A comparison like this would not generally be a proper 
>>> collation. The
>>> equivalence relation it induces is not transitive - it's 
>>> possible to have A
>>> == B and B == C but A != C (when A is "close enough" to B 
>>> and B is "close
>>> enough" to C, but A and C are just far enough from each other).
>>>
>> ?Out of curiosity, doesn't this also apply also to numeric 
>> (real number)
>> comparisons since SQLite3 uses IEEE floating point arithmetic??
>
> IEEE Float comparisons do not work this way - you are more 
> likely to find the opposite:  two numbers that seem to be near 
> perfectly equal might fail an equality test.
>
> Such confusion might be caused by statements such as:
> ...WHERE (5.6 - 3.1) = 2.5
> ...WHERE (14 * 0.4) = 5.6
>
> Which might return false if two or more of the constants 
> cannot be precisely represented. (The second one is a known 
> problem value).
>
> Nothing however would "seem" equal to the processor if they 
> are not exactly equal in binary form - no "almost" matching 
> happens.
>
> BTW: In strict Math it can be shown that 0.999...  (repeating) 
> is exactly equal to 1 but in IEEE floats they are not, but 
> that is just because an 8-byte (64b) float lacks the capacity 
> to render the repeating nines to sufficiently wide a 
> representation to find the one-ness of it.
>
> https://en.wikipedia.org/wiki/0.999...
>
> IEEE fun in C#:
>
> Testing 1/3:
>  f = 0.333
>  d = 0.333
>  m = 0.
>  f*3 = 1
>  d*3 = 1
>  m*3 = 0.
>  (double)f*3 = 1.0002980232
>  (decimal)f*3 = 0.999
>  (decimal)d*3 = 0.999
>  (double)((float)i/3)*3 = 1
> Testing 2/3:
>  f = 0.667
>  d = 0.667
>  m = 0.6667
>  f*3 = 2
>  d*3 = 2
>  m*3 = 2.0001
>  (double)f*3 = 2.0005960464
>  (decimal)f*3 = 2.001
>  (decimal)d*3 = 2.001
>  (double)((float)i/3)*3 = 2
>
> Cheers,
> Ryan
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 
>



[sqlite] Using collation instead a virtual table

2015-09-09 Thread Eduardo Morras
On Tue, 8 Sep 2015 15:42:28 -0400
Richard Hipp  wrote:

> On 9/8/15, Eduardo Morras  wrote:
> >
> >
> > Hello,
> >
> > I have a virtual table that implements query perceptual hashing data
> > [1]. Now I'm thinking about converting the virtual table
> > implementation in a collation on a normal sqlite3 table, but
> > collation requieres that '=','<' and '>' be well defined by obeying
> > the rules cited on create_collation() page[2]. Sometimes, rule 2
> > may not be true, but I always query for '=',
> 
> Yes, but under the hood, SQLlite never does an == query on the b-trees
> even if you ask for a == query in the SQL.  Instead, the b-trees are
> queried using one of >, >=, <, or <=.  A query of the form:
> 
>  x=$value
> 
> Gets translated (at the b-tree layer) into
> 
>  x>=$value AND x<=$value
> 
> So it is *very* important that the comparison operators all work
> correctly on your collating sequence function.  If they don't, then
> SQLite will give incorrect answers.

Yes, the comparison operators work correctly and the b-tree binary search 
should give correct answers, the only tweak is in xCompare, that returns 0 when 
left(x) and rigth($value) expressions distance is lower than a threshold. I 
begin with the implementation and test cases, I expect it be faster than 
virtual table.

Thanks

---   ---
Eduardo Morras 


[sqlite] Using collation instead a virtual table

2015-09-09 Thread Igor Tandetnik
On 9/9/2015 11:19 AM, Constantine Yannakopoulos wrote:
> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik  wrote:
>
>> A comparison like this would not generally be a proper collation. The
>> equivalence relation it induces is not transitive - it's possible to have A
>> == B and B == C but A != C (when A is "close enough" to B and B is "close
>> enough" to C, but A and C are just far enough from each other).
>>
>
> ?Out of curiosity, doesn't this also apply also to numeric (real number)
> comparisons since SQLite3 uses IEEE floating point arithmetic??

What aspect of IEEE floating point arithmetic makes comparisons unsafe, 
in your opinion? Given two IEEE numbers (NaNs and INFs excepted), the 
comparison would only ever declare them equal if their representations 
are bit-for-bit identical; it doesn't play "close enough" games. What 
again seems to be the problem?
-- 
Igor Tandetnik



[sqlite] Using collation instead a virtual table

2015-09-09 Thread Igor Tandetnik
On 9/9/2015 6:54 AM, Eduardo Morras wrote:
> Yes, the comparison operators work correctly and the b-tree binary search 
> should give correct answers, the only tweak is in xCompare, that returns 0 
> when left(x) and rigth($value) expressions distance is lower than a threshold.

A comparison like this would not generally be a proper collation. The 
equivalence relation it induces is not transitive - it's possible to 
have A == B and B == C but A != C (when A is "close enough" to B and B 
is "close enough" to C, but A and C are just far enough from each other).
-- 
Igor Tandetnik



[sqlite] Using collation instead a virtual table

2015-09-08 Thread Eduardo Morras


Hello, 

I have a virtual table that implements query perceptual hashing data[1]. Now 
I'm thinking about converting the virtual table implementation in a collation 
on a normal sqlite3 table, but collation requieres that '=','<' and '>' be well 
defined by obeying the rules cited on create_collation() page[2]. Sometimes, 
rule 2 may not be true, but I always query for '=', never need sort query 
output and result is unique.
If I calculate the perceptual hash of an input, I want to get the closer 
(minimal distance) hash in the table calculating equal . Can I use a collation 
in this case?

[1] Perceptual Hashing: Hash function that similar input data has equal or 
similar hash. 
[2] http://www.sqlite.org/c3ref/create_collation.html
1.If A==B then B==A.
2.If A==B and B==C then A==C.
3.If AA.
4.If A


[sqlite] Using collation instead a virtual table

2015-09-08 Thread Richard Hipp
On 9/8/15, Eduardo Morras  wrote:
>
>
> Hello,
>
> I have a virtual table that implements query perceptual hashing data[1]. Now
> I'm thinking about converting the virtual table implementation in a
> collation on a normal sqlite3 table, but collation requieres that '=','<'
> and '>' be well defined by obeying the rules cited on create_collation()
> page[2]. Sometimes, rule 2 may not be true, but I always query for '=',

Yes, but under the hood, SQLlite never does an == query on the b-trees
even if you ask for a == query in the SQL.  Instead, the b-trees are
queried using one of >, >=, <, or <=.  A query of the form:

 x=$value

Gets translated (at the b-tree layer) into

 x>=$value AND x<=$value

So it is *very* important that the comparison operators all work
correctly on your collating sequence function.  If they don't, then
SQLite will give incorrect answers.



> never need sort query output and result is unique.
> If I calculate the perceptual hash of an input, I want to get the closer
> (minimal distance) hash in the table calculating equal . Can I use a
> collation in this case?
>
> [1] Perceptual Hashing: Hash function that similar input data has equal or
> similar hash.
> [2] http://www.sqlite.org/c3ref/create_collation.html
> 1.If A==B then B==A.
> 2.If A==B and B==C then A==C.
> 3.If AA.
> 4.If A
> ---   ---
> Eduardo Morras 
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
drh at sqlite.org