On 02/01/2013 08:31 PM, Dominique Pellé wrote:
Hi

I was trying to optimize this FTS query which can be a bottleneck
in my application:


SELECT docId,matchinfo(ftsStreets,'pcx')
FROM ftsStreets
WHERE ftsStreets MATCH '...' AND docId BETWEEN docid1 AND docid2;


Can't SQLite optimize the query to use the fact that docid cannot be more
than the specified upper limit (docid2) given in the BETWEEN clause here?

Looking that the SQLite code (3.7.14),  I see that it unpacks docid lists (in
function fts3SegReaderNextDocid.c) of the FTS inverted index. Those
docid lists are stored in ascending order and stored by delta. Each delta
is stored in variable size (1 bytes to 10 bytes depending on size of delta).

When running above query, I see that SQLite unpacks sequentially all
the docids of the lists, even after it reaches beyond upper bound docid2,
which seems useless at least in above query.

It probably is possible to include an optimization like this.

One problem will be the 'x' passed to the matchinfo() function. One of
the values returned in this case is the number of rows in the entire
table that contain at least one instance of a given query phrase. To
obtain that information we have to iterate through the entire set of
docids anyhow.

Dan.




I made a temporary change to sqlite3.c (function fts3SegReaderNextDocid.c)
just to prove that there is a potential optimization here:

$ p4 diff -d-c sqlite3.c

***************
*** 126024,126029 ****
--- 126024,126030 ----
       }else{
         rc = fts3SegReaderRequire(pReader, p, FTS3_VARINT_MAX);
         if( rc==SQLITE_OK ){
+         extern sqlite3_int64 docidMax;
           sqlite3_int64 iDelta;
           pReader->pOffsetList = p + sqlite3Fts3GetVarint(p,&iDelta);
           if( pTab->bDescIdx ){
***************
*** 126031,126036 ****
--- 126032,126042 ----
           }else{
             pReader->iDocid += iDelta;
           }
+         if ((unsigned long long)pReader->iDocid>  (unsigned long
long)docidMax)
+         {
+           fprintf(stdout, "*** MAX DOCID (0x%llx) REACHED! (abort
unpack of docid list)\n", docidMax);
+           pReader->pOffsetList = 0;
+         }
         }
       }
     }

The docidMax is normally initialized to 0xffffffffffffffffUL (-1), so my
optimization normally do not do anything. But I set doidMax to docid2
(upper bound of the BETWEEN clause) temporarily when I run my query
and then put it back to 0xffffffffffffffffUL.

Needless to say that this is an ugly hack which should definitely not be
checked-in as-is.

But it clearly helps to speed up my query and it seems to work (without
being 100% sure it's correct).  I get the same results, faster.
And I hope it's good enough at least to prove that there is a potential
optimization in FTS SQLite. I don't know the SQLite code well enough
to come up with a clean patch for the optimization.

I measured timing several times and it consistently speeds up. For
example, I see some queries that used to take 203 ms which now
take 68 ms. Of course speed up depends on the upper bound
value docid2.

Regards
Dominique
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to