[sqlite] Using collation instead a virtual table
On Wed, 9 Sep 2015 09:54:28 -0400 Igor Tandetnik wrote: > On 9/9/2015 6:54 AM, Eduardo Morras wrote: > > Yes, the comparison operators work correctly and the b-tree binary > > search should give correct answers, the only tweak is in xCompare, > > that returns 0 when left(x) and rigth($value) expressions distance > > is lower than a threshold. > > A comparison like this would not generally be a proper collation. The > equivalence relation it induces is not transitive - it's possible to > have A == B and B == C but A != C (when A is "close enough" to B and > B is "close enough" to C, but A and C are just far enough from each > other). Yes, in first mail I noted that, but I don't think it's a problem because: a) xCompare(A,C) returns AC (-1/+1), b) the query always return only one value, c) no ORDER BY in query, d) hashed data hash are far enough / disperse and expect no new additions. I'll compare collation vs virtual table with the tests cases, if fails or is slower I'll mark the collation branch as closed. Thanks > -- > Igor Tandetnik --- --- Eduardo Morras
[sqlite] Using collation instead a virtual table
On Wed, Sep 9, 2015 at 10:59 PM, Scott Doctor wrote: > > Best practice when dealing with floating point is to normalize and Chop. In my line of business (ledger/ERP/retail) I don't think I can normalize but chopping is definitely performed ?on each operation that can produce rounding errors. Luckily the number of decimal digits is fixed (almost always 3) and one can easily detect when there is a problem with rounding. However, there are cases like the one I described above (distribution) where deviations are inevitable, but it is usually acceptable to fix by slightly violating the rules, e.g. calculating the rounding error and adding it to a single row's amount in order to balance it out. But I feel I have hijacked the thread so I think I'd better stop.
[sqlite] Using collation instead a virtual table
On Wed, Sep 9, 2015 at 9:47 PM, R.Smith wrote: > > On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote: > >> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik >> wrote: >> >> ?Out of curiosity, doesn't this also apply also to numeric (real number) >> comparisons since SQLite3 uses IEEE floating point arithmetic?? >> > > IEEE Float comparisons do not work this way - you are more likely to find > the opposite: two numbers that seem to be near perfectly equal might fail > an equality test. > ?That is the problem. There are cases where two numbers that come out of different calculations? (especially if a division is included) are expected to be equal but they fail the equality test. A classic case is when you distribute an amount (e.g. a total) among several rows using a certain column as weight and you expect the sum of the distributed amounts to be exactly equal to the original total but it is not. > On Wed, Sep 9, 2015 at 7:47 PM, Igor Tandetnik wrote: > What aspect of IEEE floating point arithmetic makes comparisons unsafe, in > your opinion? Given two IEEE numbers (NaNs and INFs excepted), the > comparison would only ever declare them equal if their representations are > bit-for-bit identical; it doesn't play "close enough" games. What again > seems to be the problem? > ?OK, I was under the impression that SQLite ?used an epsilon comparison to avoid the aforementioned case. Obviously I was wrong. Sorry. As I said, no problem, just curiosity.
[sqlite] Using collation instead a virtual table
On 2015-09-09 09:02 PM, Constantine Yannakopoulos wrote: > On Wed, Sep 9, 2015 at 9:47 PM, R.Smith wrote: > >> On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote: >> >>> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik >>> wrote: >>> >>> ?Out of curiosity, doesn't this also apply also to numeric (real number) >>> comparisons since SQLite3 uses IEEE floating point arithmetic?? >>> >> IEEE Float comparisons do not work this way - you are more likely to find >> the opposite: two numbers that seem to be near perfectly equal might fail >> an equality test. >> > ?That is the problem. There are cases where two numbers that come out of > different calculations? (especially if a division is included) are expected > to be equal but they fail the equality test. A classic case is when you > distribute an amount (e.g. a total) among several rows using a certain > column as weight and you expect the sum of the distributed amounts to be > exactly equal to the original total but it is not. Indeed - but you can always avoid it when comparing values using your own epsilon difference test, something like: WHERE abs(x-y) < 0.1 (Or whatever small value seems adequate to satisfy your equality specification). While on the subject - found this re-print and edited version of an old paper by David Goldberg highlighting all the IEEE pitfalls in computing - a valuable read to any programmer: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
[sqlite] Using collation instead a virtual table
On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote: > On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik wrote: > >> A comparison like this would not generally be a proper collation. The >> equivalence relation it induces is not transitive - it's possible to have A >> == B and B == C but A != C (when A is "close enough" to B and B is "close >> enough" to C, but A and C are just far enough from each other). >> > ?Out of curiosity, doesn't this also apply also to numeric (real number) > comparisons since SQLite3 uses IEEE floating point arithmetic?? IEEE Float comparisons do not work this way - you are more likely to find the opposite: two numbers that seem to be near perfectly equal might fail an equality test. Such confusion might be caused by statements such as: ...WHERE (5.6 - 3.1) = 2.5 ...WHERE (14 * 0.4) = 5.6 Which might return false if two or more of the constants cannot be precisely represented. (The second one is a known problem value). Nothing however would "seem" equal to the processor if they are not exactly equal in binary form - no "almost" matching happens. BTW: In strict Math it can be shown that 0.999... (repeating) is exactly equal to 1 but in IEEE floats they are not, but that is just because an 8-byte (64b) float lacks the capacity to render the repeating nines to sufficiently wide a representation to find the one-ness of it. https://en.wikipedia.org/wiki/0.999... IEEE fun in C#: Testing 1/3: f = 0.333 d = 0.333 m = 0. f*3 = 1 d*3 = 1 m*3 = 0. (double)f*3 = 1.0002980232 (decimal)f*3 = 0.999 (decimal)d*3 = 0.999 (double)((float)i/3)*3 = 1 Testing 2/3: f = 0.667 d = 0.667 m = 0.6667 f*3 = 2 d*3 = 2 m*3 = 2.0001 (double)f*3 = 2.0005960464 (decimal)f*3 = 2.001 (decimal)d*3 = 2.001 (double)((float)i/3)*3 = 2 Cheers, Ryan
[sqlite] Using collation instead a virtual table
On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik wrote: > A comparison like this would not generally be a proper collation. The > equivalence relation it induces is not transitive - it's possible to have A > == B and B == C but A != C (when A is "close enough" to B and B is "close > enough" to C, but A and C are just far enough from each other). > ?Out of curiosity, doesn't this also apply also to numeric (real number) comparisons since SQLite3 uses IEEE floating point arithmetic??
[sqlite] Using collation instead a virtual table
Best practice when dealing with floating point is to normalize and Chop. The best practice for dealing with floating point operations is to normalize your data sets before proceeding. All numbers should be -1.0<=x<=1.0. Done properly, after calculations are complete, the data set is easily returned to its original range and domain by reversing the normalization process. Chop is basically a rounding to a specified number of digits. Often 4 to 8 digits is adequate but each application is different. When dealing with matrix convolutions, after each row operation, chop each number in the matrix. Mathematically it can be shown that the final result is more accurate than allowing the floating point resolution error to propagate. Numbers such as 2.9 should round to 3. before doing the next round of calculations. This is especially important for results near zero. Typically numbers less than, for instance 0.1 should be set to zero. This is especially important in matrix operations. All floating point math libraries have round and/or chop functions. In fact, not chopping when doing large data sets will ultimately result in significantly wrong results due to propagating floating point resolution errors. Chopping corrects for these errors. If you remember your High School chemistry or biology class, one of the first lessons is about significant digits. How to determine the proper number of significant digits depends on your application and field of study. By using proper number of significant digits throughout your calculations, the result will be more correct than not doing so. Scott Doctor scott at scottdoctor.com -- On 9/9/2015 11:47 AM, R.Smith wrote: > > > On 2015-09-09 05:19 PM, Constantine Yannakopoulos wrote: >> On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik >> wrote: >> >>> A comparison like this would not generally be a proper >>> collation. The >>> equivalence relation it induces is not transitive - it's >>> possible to have A >>> == B and B == C but A != C (when A is "close enough" to B >>> and B is "close >>> enough" to C, but A and C are just far enough from each other). >>> >> ?Out of curiosity, doesn't this also apply also to numeric >> (real number) >> comparisons since SQLite3 uses IEEE floating point arithmetic?? > > IEEE Float comparisons do not work this way - you are more > likely to find the opposite: two numbers that seem to be near > perfectly equal might fail an equality test. > > Such confusion might be caused by statements such as: > ...WHERE (5.6 - 3.1) = 2.5 > ...WHERE (14 * 0.4) = 5.6 > > Which might return false if two or more of the constants > cannot be precisely represented. (The second one is a known > problem value). > > Nothing however would "seem" equal to the processor if they > are not exactly equal in binary form - no "almost" matching > happens. > > BTW: In strict Math it can be shown that 0.999... (repeating) > is exactly equal to 1 but in IEEE floats they are not, but > that is just because an 8-byte (64b) float lacks the capacity > to render the repeating nines to sufficiently wide a > representation to find the one-ness of it. > > https://en.wikipedia.org/wiki/0.999... > > IEEE fun in C#: > > Testing 1/3: > f = 0.333 > d = 0.333 > m = 0. > f*3 = 1 > d*3 = 1 > m*3 = 0. > (double)f*3 = 1.0002980232 > (decimal)f*3 = 0.999 > (decimal)d*3 = 0.999 > (double)((float)i/3)*3 = 1 > Testing 2/3: > f = 0.667 > d = 0.667 > m = 0.6667 > f*3 = 2 > d*3 = 2 > m*3 = 2.0001 > (double)f*3 = 2.0005960464 > (decimal)f*3 = 2.001 > (decimal)d*3 = 2.001 > (double)((float)i/3)*3 = 2 > > Cheers, > Ryan > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Using collation instead a virtual table
On Tue, 8 Sep 2015 15:42:28 -0400 Richard Hipp wrote: > On 9/8/15, Eduardo Morras wrote: > > > > > > Hello, > > > > I have a virtual table that implements query perceptual hashing data > > [1]. Now I'm thinking about converting the virtual table > > implementation in a collation on a normal sqlite3 table, but > > collation requieres that '=','<' and '>' be well defined by obeying > > the rules cited on create_collation() page[2]. Sometimes, rule 2 > > may not be true, but I always query for '=', > > Yes, but under the hood, SQLlite never does an == query on the b-trees > even if you ask for a == query in the SQL. Instead, the b-trees are > queried using one of >, >=, <, or <=. A query of the form: > > x=$value > > Gets translated (at the b-tree layer) into > > x>=$value AND x<=$value > > So it is *very* important that the comparison operators all work > correctly on your collating sequence function. If they don't, then > SQLite will give incorrect answers. Yes, the comparison operators work correctly and the b-tree binary search should give correct answers, the only tweak is in xCompare, that returns 0 when left(x) and rigth($value) expressions distance is lower than a threshold. I begin with the implementation and test cases, I expect it be faster than virtual table. Thanks --- --- Eduardo Morras
[sqlite] Using collation instead a virtual table
On 9/9/2015 11:19 AM, Constantine Yannakopoulos wrote: > On Wed, Sep 9, 2015 at 4:54 PM, Igor Tandetnik wrote: > >> A comparison like this would not generally be a proper collation. The >> equivalence relation it induces is not transitive - it's possible to have A >> == B and B == C but A != C (when A is "close enough" to B and B is "close >> enough" to C, but A and C are just far enough from each other). >> > > ?Out of curiosity, doesn't this also apply also to numeric (real number) > comparisons since SQLite3 uses IEEE floating point arithmetic?? What aspect of IEEE floating point arithmetic makes comparisons unsafe, in your opinion? Given two IEEE numbers (NaNs and INFs excepted), the comparison would only ever declare them equal if their representations are bit-for-bit identical; it doesn't play "close enough" games. What again seems to be the problem? -- Igor Tandetnik
[sqlite] Using collation instead a virtual table
On 9/9/2015 6:54 AM, Eduardo Morras wrote: > Yes, the comparison operators work correctly and the b-tree binary search > should give correct answers, the only tweak is in xCompare, that returns 0 > when left(x) and rigth($value) expressions distance is lower than a threshold. A comparison like this would not generally be a proper collation. The equivalence relation it induces is not transitive - it's possible to have A == B and B == C but A != C (when A is "close enough" to B and B is "close enough" to C, but A and C are just far enough from each other). -- Igor Tandetnik
[sqlite] Using collation instead a virtual table
Hello, I have a virtual table that implements query perceptual hashing data[1]. Now I'm thinking about converting the virtual table implementation in a collation on a normal sqlite3 table, but collation requieres that '=','<' and '>' be well defined by obeying the rules cited on create_collation() page[2]. Sometimes, rule 2 may not be true, but I always query for '=', never need sort query output and result is unique. If I calculate the perceptual hash of an input, I want to get the closer (minimal distance) hash in the table calculating equal . Can I use a collation in this case? [1] Perceptual Hashing: Hash function that similar input data has equal or similar hash. [2] http://www.sqlite.org/c3ref/create_collation.html 1.If A==B then B==A. 2.If A==B and B==C then A==C. 3.If AA. 4.If A
[sqlite] Using collation instead a virtual table
On 9/8/15, Eduardo Morras wrote: > > > Hello, > > I have a virtual table that implements query perceptual hashing data[1]. Now > I'm thinking about converting the virtual table implementation in a > collation on a normal sqlite3 table, but collation requieres that '=','<' > and '>' be well defined by obeying the rules cited on create_collation() > page[2]. Sometimes, rule 2 may not be true, but I always query for '=', Yes, but under the hood, SQLlite never does an == query on the b-trees even if you ask for a == query in the SQL. Instead, the b-trees are queried using one of >, >=, <, or <=. A query of the form: x=$value Gets translated (at the b-tree layer) into x>=$value AND x<=$value So it is *very* important that the comparison operators all work correctly on your collating sequence function. If they don't, then SQLite will give incorrect answers. > never need sort query output and result is unique. > If I calculate the perceptual hash of an input, I want to get the closer > (minimal distance) hash in the table calculating equal . Can I use a > collation in this case? > > [1] Perceptual Hashing: Hash function that similar input data has equal or > similar hash. > [2] http://www.sqlite.org/c3ref/create_collation.html > 1.If A==B then B==A. > 2.If A==B and B==C then A==C. > 3.If AA. > 4.If A > --- --- > Eduardo Morras > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp drh at sqlite.org