> Behalf Of Dan Kennedy
> Sent: Tuesday, October 17, 2017 11:58 AM
> 
> On 10/17/2017 01:22 PM, Hick Gunter wrote:
> > In our virtual table implementations, we are using the rowid to return
> > the location of the record in the backing store (e.g. record offset in
> > the file used as a backing store, offset within a shared memory section
> > or maybe even the memory address of the record image) and also
> > implement fast lookup by rowid.
> >
> > If you don't require such ability, you may as well return a constant, a
> > global counter value or a counter that is reset in the xFilter function.
> >
> > So, YES you always have to implement the xRowid method.
> >
> > It will only get called if your SELECT statement explicitly mentions
> > it. No "INTEGER PRIMARY KEY" magic is performed for virtual tables.
> 
> I think the exception is queries with OR terms. With FTS[345], if you do
> something like:
> 
>    CREATE VIRTUAL TABLE t1 USING fts5(x);
>    EXPLAIN SELECT x FROM t1 WHERE t1 MATCH 'abc' OR t1 MATCH 'def';
> 
> You can see the Rowid opcodes.
> 
> SQLite runs two separate queries on the virtual table - one with
> "MATCH 'abc'" and the other with "MATCH 'def'". It uses the rowids for
> each matched row to avoid returning duplicates. If the xRowid method
> always returned 0, then only the first set of matches would be returned
> (because SQLite would deem the second set to be duplicates of the first).
> Or if xRowid returned arbitrary values your results might include
> duplicates. etc.
> 
> Same applies to other virtual table types.
> 
> Dan.

Yikes, thanks for the insight on that OR use-case; I'll have to do some
analysis to see what my exposure is.

The counter trick is an interesting suggestion, but I guess I am still at
risk, because I have to make the rowids deterministic/repeatable at least
within the context of a single statement, which can still be a challenge.
E.g. in your OR-clause example, if the query engine really does do two
table scans (as opposed to one table scan that evaluates all the
predicates), then there is some risk that the two scans return different
results, since my data comes from APIs and is dynamic, rather than from
persisted collections.
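
For the archives, the counter variant I'm picturing is roughly the sketch
below -- just the cursor-side callbacks, with struct and function names of
my own invention; only the sqlite3 callback signatures are the real API:

   #include "sqlite3.h"

   typedef struct ApiCursor ApiCursor;
   struct ApiCursor {
     sqlite3_vtab_cursor base;   /* Base class - must be first */
     sqlite3_int64 iRow;         /* Position of the current row in this scan */
     int bEof;                   /* True once the API feed is exhausted */
     /* ... handle to the backing API result set goes here ... */
   };

   /* Called at the start of every scan: reset the counter and
   ** (re)issue the API request. */
   static int apiFilter(
     sqlite3_vtab_cursor *cur,
     int idxNum, const char *idxStr,
     int argc, sqlite3_value **argv
   ){
     ApiCursor *p = (ApiCursor*)cur;
     p->iRow = 0;
     p->bEof = 0;
     /* fetch the first row from the API here */
     return SQLITE_OK;
   }

   /* Advance to the next row and bump the counter. */
   static int apiNext(sqlite3_vtab_cursor *cur){
     ApiCursor *p = (ApiCursor*)cur;
     p->iRow++;
     /* fetch the next row; set p->bEof when the API runs dry */
     return SQLITE_OK;
   }

   static int apiEof(sqlite3_vtab_cursor *cur){
     return ((ApiCursor*)cur)->bEof;
   }

   /* Report the counter as the rowid.  Only stable if the API returns
   ** rows in the same order on every scan within the statement. */
   static int apiRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid){
     *pRowid = ((ApiCursor*)cur)->iRow;
     return SQLITE_OK;
   }

which is exactly where the ordering risk bites: two scans in the same
statement mean two independently reset counters.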

In other projects I've definitely used the counter trick before, caching
the underlying data (to support updates and transactions), but those were
known to be small datasets.  The stuff coming from these APIs could be
big, so I wanted to avoid caching it all.  But one does what one must....
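
For completeness, the cached flavour from those other projects amounted to
keeping the materialized rows on the table object, so every cursor (and
therefore both scans of an OR query) walks the same array; again just a
sketch with invented names:

   #include "sqlite3.h"

   typedef struct ApiRow ApiRow;
   typedef struct ApiTable ApiTable;
   typedef struct CacheCursor CacheCursor;

   struct ApiRow {
     char *zData;                /* One materialized row from the API */
   };

   struct ApiTable {
     sqlite3_vtab base;          /* Base class - must be first */
     ApiRow *aRow;               /* Rows fetched once from the API */
     int nRow;                   /* Number of cached rows */
   };

   struct CacheCursor {
     sqlite3_vtab_cursor base;   /* Base class - must be first */
     int iRow;                   /* Index of current row in aRow[] */
   };

   /* The rowid is the row's index in the cache, so it is identical on
   ** every scan of the table for as long as the cache is held. */
   static int cacheRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid){
     *pRowid = (sqlite3_int64)((CacheCursor*)cur)->iRow;
     return SQLITE_OK;
   }

   static int cacheNext(sqlite3_vtab_cursor *cur){
     ((CacheCursor*)cur)->iRow++;
     return SQLITE_OK;
   }

   static int cacheEof(sqlite3_vtab_cursor *cur){
     CacheCursor *p = (CacheCursor*)cur;
     return p->iRow >= ((ApiTable*)p->base.pVtab)->nRow;
   }

Stable rowids, but at the cost of holding everything in memory.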

Thanks for all the feedback!

-dave


