Re: [sqlite] Will ROWID be required to be unique in contentless Full text search

2019-04-25 Thread Dan Kennedy


On 25/4/62 07:58, paul tracy wrote:

My question is whether or not I can rely on my use of non-unique rowids in FTS5 
contentless tables to remain unenforced into the future of SQLite or am I 
exploiting an oversight that may be resolved in the future?
Details ...
I have a key-value pair SQLite table (called props) used for storing properties 
of objects. It has the following columns: objID, propCode, propVal
objID + propCode is my primary key.

I have created a contentless FTS5 virtual table as follows: create virtual 
table fti using FTS5(ftiCode, ftiVal, content='');
for every record in props table a corresponding insert is made to the fti 
table, I insert the props.objID into fti.rowid.
Since any given objID in my props table can have multiple props I wind up 
making multiple inserts into the fti table with a non-unique rowid (since I'm 
using rowid to store the objid). This works perfectly fine in all use cases so 
my app runs perfectly.
All of the docs on rowid say that it is unique but in the FTS5 contentless 
table this unique requirement is not enforced. Since I have to provide all 3 of 
the initial fields whenever a delete is required then this also works to remove 
only the exact record from FTI. I view the rowid in the FTS contentless table 
as simply holding whatever value I send to it and it will return that value in 
a field named rowid but that field isn't really a rowid in the true sense. If 
this interpretation will remain valid into the future then my implementation is 
perfect.

For the curious: Why am I doing this instead of something more traditional? 
Mostly it has to do with elimination of duplicate returns from the FTS5 search. 
But also contentless to save space. When I query the FT index I am interested 
in which objects contain given property values. My table has millions of rows 
so I'm sensitive to space and speed.
If I insert the following rows into fti rowid, ftiCode, ftiValue:
3, 7, A3, 8, B
and then I ... SELECT rowid FROM FTI where FTI MATCH A OR B … I only get one 
row returned with rowid of 3. This was a surprise but was exactly what I 
wanted. A more traditional approach to using FTI would have resulted in two 
rows returned both with the same ID field (object ID in my case). By exploiting 
the fact that the FTS5 does not enforce unique rowids I save myself the expense 
of using DISTINCT.


Well, I can tell you that a single query on an FTS5 will never return 
two rows with the same rowid. Or, more accurately, if it does it 
indicates a bug that would affect all users, not just folks doing 
unusual things. That's just the nature of the data structure.


Other things:

I think you should add the "columnsize=0" option to the CREATE VIRTUAL 
TABLE statement. Without this option FTS5 creates a table that maps from 
rowid to the size in tokens of each column in the associated row (in 
your case, it will be named "fti_docsize"). This isn't going to work for 
you - as the second time you insert a row with rowid=3 the size values 
will clobber the values set when you inserted the first row with 
rowid=3. It's only apps using specific ranking functions that benefit 
from this extra data structure, so you can probably live without it:


https://sqlite.org/fts5.html#the_columnsize_option

You might also add the "detail=none" option, so that entire CREATE 
VIRTUAL TABLE statement is:


  CREATE VIRTUAL TABLE fti USING fts5(ftiCode, ftiVal, content='', 
columnsize=0, detail=none);


This option means that instead of storing offset information, FTS5 just 
stores a list of rowids for each token. This might be better for you as 
the offset information isn't going to work quite right either. Say you do:


  INSERT INTO fti VALUES(3, 'B', 'A B');
  INSERT INTO fti VALUSE(3, 3, 'B');

The second insert will clobber the offset information for "rowid=3, 
token=B". So FTS5 will think that the only instance of "B" in the 
rowid=3 document is token 0 of column "ftiVal". It will forget all about 
the instances associated with the first insert of a rowid=3 row. But it 
will remember that the rowid=3 document contains an 'A' as the first 
token of column 'ftiVal'. If you set detail=none, then FTS5 won't store 
any offset information and you won't have a problem. Like columnsize=0, 
this option saves disk space too.


Good luck then,

Dan.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Will ROWID be required to be unique in contentless Full text search

2019-04-24 Thread paul tracy
My question is whether or not I can rely on my use of non-unique rowids in FTS5 
contentless tables to remain unenforced into the future of SQLite or am I 
exploiting an oversight that may be resolved in the future?
Details ...
I have a key-value pair SQLite table (called props) used for storing properties 
of objects. It has the following columns: objID, propCode, propVal
objID + propCode is my primary key.

I have created a contentless FTS5 virtual table as follows: create virtual 
table fti using FTS5(ftiCode, ftiVal, content='');
for every record in props table a corresponding insert is made to the fti 
table, I insert the props.objID into fti.rowid.
Since any given objID in my props table can have multiple props I wind up 
making multiple inserts into the fti table with a non-unique rowid (since I'm 
using rowid to store the objid). This works perfectly fine in all use cases so 
my app runs perfectly. 
All of the docs on rowid say that it is unique but in the FTS5 contentless 
table this unique requirement is not enforced. Since I have to provide all 3 of 
the initial fields whenever a delete is required then this also works to remove 
only the exact record from FTI. I view the rowid in the FTS contentless table 
as simply holding whatever value I send to it and it will return that value in 
a field named rowid but that field isn't really a rowid in the true sense. If 
this interpretation will remain valid into the future then my implementation is 
perfect.

For the curious: Why am I doing this instead of something more traditional? 
Mostly it has to do with elimination of duplicate returns from the FTS5 search. 
But also contentless to save space. When I query the FT index I am interested 
in which objects contain given property values. My table has millions of rows 
so I'm sensitive to space and speed.
If I insert the following rows into fti rowid, ftiCode, ftiValue:
3, 7, A3, 8, B
and then I ... SELECT rowid FROM FTI where FTI MATCH A OR B … I only get one 
row returned with rowid of 3. This was a surprise but was exactly what I 
wanted. A more traditional approach to using FTI would have resulted in two 
rows returned both with the same ID field (object ID in my case). By exploiting 
the fact that the FTS5 does not enforce unique rowids I save myself the expense 
of using DISTINCT.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users