[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-06-08 Thread Eric Hill
Surely enough, this SQL:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title, 
t4.category_id, t5."name" 
FROM rental  t2 
LEFT OUTER JOIN inventory t1 
ON  ( t1.inventory_id = t2.inventory_id )  
LEFT OUTER JOIN film t3 
ON  ( t3.film_id = t1.film_id )  
LEFT OUTER JOIN film_category t4 
ON  ( t4.film_id = t1.film_id )  
LEFT OUTER JOIN category t5 
ON  ( t5.category_id = t4.category_id )  ;

generates the desired xBestIndex call:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

allowing the "first constraint" heuristic to be successful.

So, in the case where my application is generating SQL, I can fix it so that 
the ON is generated in the helpful order.  But I also let users type in their 
own SQL.

Would it be too much to ask SQLite to determine  vs.  based 
on context rather than inferring based on position?  I'm looking at SQL In a 
Nutshell, 3rd Edition, page 353, and the JOIN syntax example shows the ON 
statements in the order SQLite dislikes.

Thanks very much!

Eric


-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Hick Gunter
Sent: Monday, June 08, 2015 10:21 AM
To: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Can you try changing LHS and RHS in the first ON expression?

The older, larger query has inventory_id on the LHS and film_id on the RHS. Now 
you have all fields on the RHS.

It would seem the QP may be inferring   LEFT OUTER JOIN  ON 
( = ) And placing the  fields first in the argument 
list And adding the  fields only for symmetry reasons (they can never 
be used, because they point the "wrong way")

-Urspr?ngliche Nachricht-
Von: Eric Hill [mailto:Eric.Hill at jmp.com]
Gesendet: Montag, 08. Juni 2015 15:55
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Revisiting a thread from about 3 weeks back, I have another xBestIndex puzzler. 
 The example query from that thread was:

SELECT t1.rental_date, t1.inventory_id, t8.film_id, t5.title AS Title,
t3."name" AS Category, t4.customer_id, t4.store_id, t4.email,
t5.length
FROM rental  t1
LEFT OUTER JOIN inventory t8
ON  ( t8.inventory_id = t1.inventory_id )
LEFT OUTER JOIN customer t4
ON  ( t4.customer_id = t1.customer_id )
LEFT OUTER JOIN film_category t7
ON  ( t7.film_id = t8.film_id )
LEFT OUTER JOIN film t5
ON  ( t5.film_id = t8.film_id )
LEFT OUTER JOIN category t3
ON  ( t3.category_id = t7.category_id )
LEFT OUTER JOIN film_actor t6
ON  ( t6.film_id = t5.film_id )
LEFT OUTER JOIN actor t2
ON  ( t2.actor_id = t6.actor_id )  ;

The "money" call to xBestIndex in that case looked like this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

In response to this, I originally promised to create an index using all 3 
constraints (two if which are the same), but SQLite rejected that and did a 
table scan instead, because it really only needed an index based on 
inventory_id.  So the heuristic we ended up with, which did fix this specific 
query, was (from Gunter):

> If you know nothing about a table except for the names of the fields 
> and the number of rows, then you are best off choosing the first 
> constraint only. (rows = cost = log n)

Well, now I have a counter-example.  This time, the SQL looks like this:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title,
t4.category_id, t5."name"
FROM rental  t2
LEFT OUTER JOIN inventory t1
ON  ( t2.inventory_id = t1.inventory_id )
LEFT OUTER JOIN film t3
ON  ( t3.film_id = t1.film_id )
LEFT OUTER JOIN film_category t4
ON  ( t4.film_id = t1.film_id )
LEFT OUTER JOIN category t5
ON  ( t5.category_id = t4.category_id )  ;

For whatever reason, the "money" call to xBestIndex in this case looks like 
this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 0 (inventory_id) = Usable

The order of the constraints is different!  So, using the "first constraint" 
heuristic, I commit to indexing based on film_id, but indexing inventory on 
film_id not helpful for this query.  SQLite sees that inventory is indexed on 
film_id and decides to use table scan fo

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-06-08 Thread Hick Gunter
Can you try changing LHS and RHS in the first ON expression?

The older, larger query has inventory_id on the LHS and film_id on the RHS. Now 
you have all fields on the RHS.

It would seem the QP may be inferring   LEFT OUTER JOIN  ON 
( = )
And placing the  fields first in the argument list
And adding the  fields only for symmetry reasons (they can never be 
used, because they point the "wrong way")

-Urspr?ngliche Nachricht-
Von: Eric Hill [mailto:Eric.Hill at jmp.com]
Gesendet: Montag, 08. Juni 2015 15:55
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Revisiting a thread from about 3 weeks back, I have another xBestIndex puzzler. 
 The example query from that thread was:

SELECT t1.rental_date, t1.inventory_id, t8.film_id, t5.title AS Title,
t3."name" AS Category, t4.customer_id, t4.store_id, t4.email,
t5.length
FROM rental  t1
LEFT OUTER JOIN inventory t8
ON  ( t8.inventory_id = t1.inventory_id )
LEFT OUTER JOIN customer t4
ON  ( t4.customer_id = t1.customer_id )
LEFT OUTER JOIN film_category t7
ON  ( t7.film_id = t8.film_id )
LEFT OUTER JOIN film t5
ON  ( t5.film_id = t8.film_id )
LEFT OUTER JOIN category t3
ON  ( t3.category_id = t7.category_id )
LEFT OUTER JOIN film_actor t6
ON  ( t6.film_id = t5.film_id )
LEFT OUTER JOIN actor t2
ON  ( t2.actor_id = t6.actor_id )  ;

The "money" call to xBestIndex in that case looked like this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

In response to this, I originally promised to create an index using all 3 
constraints (two if which are the same), but SQLite rejected that and did a 
table scan instead, because it really only needed an index based on 
inventory_id.  So the heuristic we ended up with, which did fix this specific 
query, was (from Gunter):

> If you know nothing about a table except for the names of the fields
> and the number of rows, then you are best off choosing the first
> constraint only. (rows = cost = log n)

Well, now I have a counter-example.  This time, the SQL looks like this:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title,
t4.category_id, t5."name"
FROM rental  t2
LEFT OUTER JOIN inventory t1
ON  ( t2.inventory_id = t1.inventory_id )
LEFT OUTER JOIN film t3
ON  ( t3.film_id = t1.film_id )
LEFT OUTER JOIN film_category t4
ON  ( t4.film_id = t1.film_id )
LEFT OUTER JOIN category t5
ON  ( t5.category_id = t4.category_id )  ;

For whatever reason, the "money" call to xBestIndex in this case looks like 
this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 0 (inventory_id) = Usable

The order of the constraints is different!  So, using the "first constraint" 
heuristic, I commit to indexing based on film_id, but indexing inventory on 
film_id not helpful for this query.  SQLite sees that inventory is indexed on 
film_id and decides to use table scan for inventory, and it's game over.

If SQLite calls to xBestIndex do not in some way convey which constraints 
matter and which ones don't, I don't see how I can use virtual tables.

Here is the WhereTrace, in case it might help:

*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
 begin solver.  (nRowEst=0)
New0 cost=216,200 order=0
 after round 0 
 0 cost=216 nrow=200 order=0
 begin solver.  (nRowEst=201)
New0 cost=216,200 order=1
 after round 0 
 0 cost=216 nrow=200 order=1 rev=0x0
 Solution nRow=200 ORDERBY=1,0x0
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increase

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-06-08 Thread Eric Hill
Revisiting a thread from about 3 weeks back, I have another xBestIndex puzzler. 
 The example query from that thread was:

SELECT t1.rental_date, t1.inventory_id, t8.film_id, t5.title AS Title, 
t3."name" AS Category, t4.customer_id, t4.store_id, t4.email, 
t5.length 
FROM rental  t1 
LEFT OUTER JOIN inventory t8 
ON  ( t8.inventory_id = t1.inventory_id )  
LEFT OUTER JOIN customer t4 
ON  ( t4.customer_id = t1.customer_id )  
LEFT OUTER JOIN film_category t7 
ON  ( t7.film_id = t8.film_id )  
LEFT OUTER JOIN film t5 
ON  ( t5.film_id = t8.film_id )  
LEFT OUTER JOIN category t3 
ON  ( t3.category_id = t7.category_id )  
LEFT OUTER JOIN film_actor t6 
ON  ( t6.film_id = t5.film_id )  
LEFT OUTER JOIN actor t2 
ON  ( t2.actor_id = t6.actor_id )  ;

The "money" call to xBestIndex in that case looked like this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

In response to this, I originally promised to create an index using all 3 
constraints (two if which are the same), but SQLite rejected that and did a 
table scan instead, because it really only needed an index based on 
inventory_id.  So the heuristic we ended up with, which did fix this specific 
query, was (from Gunter):

> If you know nothing about a table except for the names of the fields 
> and the number of rows, then you are best off choosing 
> the first constraint only. (rows = cost = log n)

Well, now I have a counter-example.  This time, the SQL looks like this:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title, 
t4.category_id, t5."name" 
FROM rental  t2 
LEFT OUTER JOIN inventory t1 
ON  ( t2.inventory_id = t1.inventory_id )  
LEFT OUTER JOIN film t3 
ON  ( t3.film_id = t1.film_id )  
LEFT OUTER JOIN film_category t4 
ON  ( t4.film_id = t1.film_id )  
LEFT OUTER JOIN category t5 
ON  ( t5.category_id = t4.category_id )  ;

For whatever reason, the "money" call to xBestIndex in this case looks like 
this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 0 (inventory_id) = Usable

The order of the constraints is different!  So, using the "first constraint" 
heuristic, I commit to indexing based on film_id, but indexing inventory on 
film_id not helpful for this query.  SQLite sees that inventory is indexed on 
film_id and decides to use table scan for inventory, and it's game over.

If SQLite calls to xBestIndex do not in some way convey which constraints 
matter and which ones don't, I don't see how I can use virtual tables.

Here is the WhereTrace, in case it might help:

*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
 begin solver.  (nRowEst=0)
New0 cost=216,200 order=0
 after round 0 
 0 cost=216 nrow=200 order=0
 begin solver.  (nRowEst=201)
New0 cost=216,200 order=1
 after round 0 
 0 cost=216 nrow=200 order=1 rev=0x0
 Solution nRow=200 ORDERBY=1,0x0
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increases cost 271 to 272
New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread Eric Hill
Sorry, scratch that.  T2 was not a virtual table when I ran this query.  My 
bad.  Told you I was a noob.

Eric

-Original Message-
From: Eric Hill 
Sent: Wednesday, May 20, 2015 12:32 PM
To: 'General Discussion of SQLite Database'
Subject: RE: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Eric had said:

>> But then what about a query like this:
>>
>>SELECT * FROM T1
>>LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = 
>> T1.b ) AND ( T2.c = T1.c );
>>
>> xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in 
>> that order.  

To which Gunter replied in part:

>  an Index on the LHS Table of a LEFT OUTER join never helps, because 
> you are requesting to retrieve all rows of the LHS anyway. SQLite is calling 
> xBestIndex with constraints on T1 for symmetry purposes only (this may even 
> be considered a bug).
> Try again with indexes on T2.

Very good point, so this is what puzzles me:  xBestIndex is *never called* for 
T2 for this query (!).  To my undoubtedly noob mind, *that* seems like a bug.

Thanks again!

Eric




[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread Eric Hill
Eric had said:

>> But then what about a query like this:
>>
>>SELECT * FROM T1
>>LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = T1.b ) AND 
>> ( T2.c = T1.c );
>>
>> xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in 
>> that order.  

To which Gunter replied in part:

>  an Index on the LHS Table of a LEFT OUTER join never helps, because you are 
> requesting to retrieve all rows of the LHS 
> anyway. SQLite is calling xBestIndex with constraints on T1 for symmetry 
> purposes only (this may even be considered a bug). 
> Try again with indexes on T2.

Very good point, so this is what puzzles me:  xBestIndex is *never called* for 
T2 for this query (!).  To my undoubtedly noob mind, *that* seems like a bug.

Thanks again!

Eric




[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread Hick Gunter
-Urspr?ngliche Nachricht-
Von: Eric Hill [mailto:Eric.Hill at jmp.com]
Gesendet: Dienstag, 19. Mai 2015 22:44
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Hey, Gunter,

...
But then what about a query like this:

SELECT * FROM T1
LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = T1.b ) AND ( 
T2.c = T1.c );

xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in that 
order.  In this case, though, it seems (to the uninitiated, at least ;-) that 
the "best index" would be:

CREATE INDEX T1_all ON T1 (c, b, a);

(or would it?) but the xBestIndex call does not contain any information that 
tells me that.  To be able to know that, I would have to know independently 
what the join conditions are.  And I could make that knowledge available to my 
xBestIndex implementation.  But how much better would that be than an index on 
c and partial scans to find the others?

(I tried this exact example, once creating an index on just T1.c and once 
creating an index on T1 (c, b, a), and in both cases, my index was not used by 
SQLite.  So I don't think I understand what SQLite wants.)

  Eric,

  an Index on the LHS Table of a LEFT OUTER join never helps, because you are 
requesting to retrieve all rows of the LHS anyway. SQLite is calling xBestIndex 
with constraints on T1 for symmetry purposes only (this may even be considered 
a bug). Try again with indexes on T2.

  The more xBestIndex knows about a table, the better its answers to SQLite 
will be. Unfortunately, xBestIndex has no way of knowing how many joins the 
passed constraints are coming from (apart from the obvious giveaway of repeated 
fields).

  If you know nothing about a table except for the names of the fields and the 
number of rows, then you are best off choosing the first constraint only. (rows 
= cost = log n)

  If you know the cardinality of each field, you should choose the one with the 
highest cardinality c. (rows = n/c, cost = rows + log n)

  If you at least know which sets of fields are guaranteed to be unique, choose 
the ?most fulfilled? set (tie break: least number of fields, highest 
cardinality).

  Gunter




  ___
   Gunter Hick
  Software Engineer
  Scientific Games International GmbH
  FN 157284 a, HG Wien
  Klitschgasse 2-4, A-1130 Vienna, Austria
  Tel: +43 1 80100 0
  E-Mail: hick at scigames.at

  This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.





[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread James K. Lowden
On Tue, 19 May 2015 20:44:17 +
Eric Hill  wrote:

> But then what about a query like this:
> 
>   SELECT * FROM T1
>   LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b =
> T1.b ) AND ( T2.c = T1.c );
> 
> xBestIndex will get called here for T1 with 3 constraints, c, b, and
> a, in that order.  In this case, though, it seems (to the
> uninitiated, at least ;-) that the "best index" would be:
> 
>   CREATE INDEX T1_all ON T1 (c, b, a);

Given that query, any index that includes a, b, and c would serve
equally well.  The order of the criteria in the ON clause is
immaterial.  

> I guess xBestIndex is saying "Tell me about indexes that you already
> have available so I can take advantage of them", 

Right.  

> I have *no* indexes, but I am willing to make whatever indexes would
> be most helpful, if I could just figure that out.

Depending on the physical characteristics of thing you're
representing as a virtual table, you might not want any indexes. If
every access requires either a sequential scan or a random access to
computed location, all search criteria are either perfect or
inconsequential.  If that's fast enough, it's fast enough.  If you want
to make it faster, indexes are certainly an option.  

Don't start by guessing, though.  Rather than beginning by trying to
anticipate what the query-generation tool will produce (and thus what
xBestIndex combinations will be interrogated), it's better to put the
horse before the cart and consider the data.  Once you've characterized
your data, you apply that information to that supplied by xBestIndex.  

If you have a table in 1NF, you have a key.  What are the columns that
identify a row?  That's your primary key, and your first index. There
may be other sets of columns that uniquely identify a row; these also
could use an index to enforce/verify uniqueness.  

Suppose your primary key is {a,b,c} and you want to construct an index
for it.  At this point I start will talking out of school, because I
don't know anything about the SQLite query planner.  But if I go astray
I'm sure others will correct me.  

The choice of the first column in the index is the main concern, and it
is influenced by the dominant search criteria i.e., the kinds of queries
that will be submitted.  

Queries in general are of two kinds: point queries, returning one row
(specifying values for every column in the index), and range queries
(specifying an incomplete set of values for the index).  Point queries
are indifferent to column order: find row in index and return. 
Range queries are the interesting ones.  

In a B+ tree such as SQLite uses for its indexes, rows will be sorted
in column order.  If your index is {a,b,c}, all the "a" rows will be
together and all the "b" columns within those "a"'s, and so on.  When
the query says, 

where a between x and y

the I/O engine can scoop all the row references out of contiguous
pages in the index.  Whereas if the query says, 

where b between x and y

the index is all but useless.  Because it's sorted by "a", it would have
to be scanned from beginning to end to find all the "b"'s that meet the
criteria.  

The other important criterion to answering xBestIndex, as Hick
mentioned, is cardinality.  You can create for each column a histogram
of the frequency distribution for the values in that column.  If the
column domain is two values (e.g., 'male' and 'female'), for example,
the probability of it matching a queried value X might be 50%.  (I think
I read that at Google 4 employees in 5 are men, in which case there a
value of 'female' would be a much more useful 20%.)  You can use the
combination of selective cardinality and your intuition (or experience)
for probable search criteria to create other indexes.  Your answer to
xBestIndex will be based on the product of the selectivity of the
contiguous columns (starting from the first) in the applicable index.  

Cardinality is only one part of the equation, however; the other is the
operator.  Equality is at least an opportunity for a point query, but
inequality implies a range.  How big a range?  If we're talking about
ages, for example, 

where age > x

If X is 0, that's not very useful, whereas if X is 100, it's very
selective of human beings (assuming age is in years).  Problem is, X
isn't supplied (for good reason) to xBestIndex.  How to answer?  

You can only work with what you know: the cardinality of the column,
the operator, and the rowcount.  Rows and cardinality you know, and
they apply directly to equality comparisons.  For inequality, you need
a heuristic.  A quarter of a century ago, during the Late Bronz Age,
Sybase used a simple heuristic: every inequality selected 25% of the
table.  That meant a BETWEEN query (two inequalities) was 6.25% = 0.25
* 0.25.  I doubt they pulled that number out of a hat.  Absent better
information, that might be your best option.  

I hope that's of some use.  Complicated, for sure, and no small amount

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-19 Thread Eric Hill
Hey, Gunter,

Thanks for taking the time to make such a thorough response to my question.

Yes, your query is arguably more correct, though our queries do produce 
identical results.  The SQL I included was generated by a general-purpose query 
tool that automatically joins tables based on key relationships first and then 
common variable names, so it doesn't always do what makes the most sense.

Your explanation of xBestIndex is very helpful.  The data that I am projecting 
into SQLite via virtual tables does not have the concept of keys, constraints, 
or indexes, at least not currently.  So, when SQLite presents me with multiple 
constraints and asks for the best choice, I'm flying a bit blind.  Also, I'm 
using SQLite as part of the aforementioned general-purpose query tool; I'm not 
just trying to optimize queries into this DVD rental sample data; I'm trying to 
come up with general rules that will work reasonably well for arbitrary queries 
against data that I don't know much about.  I have implemented the strategy I 
mentioned at the end of my last post (choose the first constraint, constructing 
an index for it in the xFilter call), and it's doing orders of magnitude better 
than what I had before (SQLite is using my indexes) and performs as well as 
copying the data into SQLite and letting SQLite do everything, at least for the 
small set of cases I've tried.

But then what about a query like this:

SELECT * FROM T1
LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = T1.b ) AND ( 
T2.c = T1.c );

xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in that 
order.  In this case, though, it seems (to the uninitiated, at least ;-) that 
the "best index" would be:

CREATE INDEX T1_all ON T1 (c, b, a);

(or would it?) but the xBestIndex call does not contain any information that 
tells me that.  To be able to know that, I would have to know independently 
what the join conditions are.  And I could make that knowledge available to my 
xBestIndex implementation.  But how much better would that be than an index on 
c and partial scans to find the others?

(I tried this exact example, once creating an index on just T1.c and once 
creating an index on T1 (c, b, a), and in both cases, my index was not used by 
SQLite.  So I don't think I understand what SQLite wants.)

I guess xBestIndex is saying "Tell me about indexes that you already have 
available so I can take advantage of them", but in my case, I have *no* 
indexes, but I am willing to make whatever indexes would be most helpful, if I 
could just figure that out.

Thanks again,

Eric

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Hick Gunter
Sent: Tuesday, May 19, 2015 3:20 AM
To: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Eric

Working backwards from your query, I think your schema would be similar to 
(foreign keys omitted)

CREATE TABLE rental ( rentalID INTEGER PRIMARY KEY NOT NULL, rental_date TEXT, 
inventory_id INTEGER, customer_id INTEGER, ...); CREATE TABLE inventory ( 
inventory_id INTEGER PRIMARY KEY NOT NULL, film_id INTEGER, ...); CREATE TABLE 
customer ( customer_id INTEGER PRIMARY KEY NOT NULL, first_name TEXT, last_name 
TEXT,...); CREATE TABLE film ( film_id INTEGER PRIMARY KEY NOT NULL, title 
TEXT, release_year INTEGER, length INTEGER, ...); CREATE TABLE actor ( actor_id 
INTEGER PRIMARY KEY NOT NULL, first_name TEXT, last_name TEXT,...); CREATE 
TABLE category ( category_id INTEGER PRIMARY KEY NOT NULL, name TEXT, ...); 
CREATE TABLE film_actor ( film_id INTEGER, actor_id INTEGER, 
UNIQUE(film_id,actor_id) ); CREATE TABLE film_category ( film_id INTEGER, 
category_id INTEGER, UNIQUE (film_id,category_id) );

And your query should really be (note that film_actor and film_category are 
joined to the film table)

SELECT t1.rental_date, t1.inventory_id, t1.customer_id, t2.film_id,
t2.store_id, t3.first_name AS cust_firstname, 
t3.last_name AS cust_lastname, t3.email,
t6.category_id, t4.title, t4.release_year, 
t4.length,
t4.rating, t4.rental_rate, t5.actor_id, t8.name 
AS category,
t7.first_name AS actor_firstname, t7.last_name 
AS actor_lastname
FROM rental10  t1
LEFT OUTER JOIN inventory t2
ON  ( t2.inventory_id = 
t1.inventory_id )
LEFT OUTER JOIN customer t3
ON  ( t3.customer_id = 
t1.customer_id )
LEFT OUTER JOIN film t4
ON  ( t4.film_id = t2.film_id )
LEFT OUTE

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-19 Thread Hick Gunter
 Database
Betreff: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Per Richard's request, I have produced the WhereTrace for the query that uses 
virtual tables and the same query that is not using virtual tables.  That 
output is at the end of this message.  Ultimately, I think my question is, how 
should I respond to xBestIndex such that SQLite will perform the query as 
efficiently with virtual tables as it does when I copy all the data into SQLite?

Hick asked:

> Are you setting the constraintUsage return parameters correctly?

I suspect not.  I think I don't understand what to do when xBestIndex is passed 
multiple constraints, because those are the cases when SQLite rejects my 
indexes.  And, from debugging, my indexes are being rejected because this 
condition (at the tail end of whereLoopFindLesser()):

if( (p->prereq & pTemplate->prereq)==pTemplate->prereq   /* (1)  */

is false, so the costs I am returning never get considered.

Here is an example.  For the inventory table, xBestIndex gets called twice, 
each time with 4 constraints, 3 of which happen to be the same, film_id, 
presumably because inventory.film_id is used in 3 different join constraints:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 440393770484721

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 1

The first time, all the constraints are marked unusable, so I set argvIndex and 
omit to 0 for all four constraints and set the estimatedCost to a ridiculously 
large number.  The second time, all of the constraints are marked as usable.  
My data does not have any pre-existing indexes; I'm willing to create whatever 
indexes are needed to speed up the query.  So I set omit to 1 and argvIndex to 
1, 2, 3, and 4 respectively for the four constraints.  This is clearly where I 
am confusing SQLite.

Here is a simpler example, for the film_actor table.  xBestIndex is again 
called twice:

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Unusable
   CONST[1]: 0 (actor_id) = Unusable
   Index NOT created: est. cost: 890034380901136

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 0 (actor_id) = Usable
   Index created: est. cost: 1

Here again, in the second case, I set omit to 1 for both and set argvIndex to 1 
for film_id and 2 for actor_id.  What I am trying to tell SQLite is that I am 
willing to make an index that sorts first by film_id and then by actor_id for 
rows with the same film_id.  But I'm thinking that's not what SQLite wants.

Since I'm going to have to create an index for whichever constraint that SQLite 
is going to filter on, maybe I should just take the first constraint, set omit 
to 1 and argvIndex to 1, and set omit and argvIndex to 0 for all other 
constraints?  And then set the cost to n(log n) where n is number of rows, 
since that is the cost of making an index?

Thanks very much for your help.  WhereTrace follows:

=
WhereTrace when USING virtual tables:
=

New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increases cost 271 to 272
New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
***

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-18 Thread Eric Hill
Per Richard's request, I have produced the WhereTrace for the query that uses 
virtual tables and the same query that is not using virtual tables.  That 
output is at the end of this message.  Ultimately, I think my question is, how 
should I respond to xBestIndex such that SQLite will perform the query as 
efficiently with virtual tables as it does when I copy all the data into SQLite?

Hick asked:

> Are you setting the constraintUsage return parameters correctly?

I suspect not.  I think I don't understand what to do when xBestIndex is passed 
multiple constraints, because those are the cases when SQLite rejects my 
indexes.  And, from debugging, my indexes are being rejected because this 
condition (at the tail end of whereLoopFindLesser()):

if( (p->prereq & pTemplate->prereq)==pTemplate->prereq   /* (1)  */

is false, so the costs I am returning never get considered.

Here is an example.  For the inventory table, xBestIndex gets called twice, 
each time with 4 constraints, 3 of which happen to be the same, film_id, 
presumably because inventory.film_id is used in 3 different join constraints:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 440393770484721

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 1

The first time, all the constraints are marked unusable, so I set argvIndex and 
omit to 0 for all four constraints and set the estimatedCost to a ridiculously 
large number.  The second time, all of the constraints are marked as usable.  
My data does not have any pre-existing indexes; I'm willing to create whatever 
indexes are needed to speed up the query.  So I set omit to 1 and argvIndex to 
1, 2, 3, and 4 respectively for the four constraints.  This is clearly where I 
am confusing SQLite.  

Here is a simpler example, for the film_actor table.  xBestIndex is again 
called twice:

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Unusable
   CONST[1]: 0 (actor_id) = Unusable
   Index NOT created: est. cost: 890034380901136

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 0 (actor_id) = Usable
   Index created: est. cost: 1

Here again, in the second case, I set omit to 1 for both and set argvIndex to 1 
for film_id and 2 for actor_id.  What I am trying to tell SQLite is that I am 
willing to make an index that sorts first by film_id and then by actor_id for 
rows with the same film_id.  But I'm thinking that's not what SQLite wants.

Since I'm going to have to create an index for whichever constraint that SQLite 
is going to filter on, maybe I should just take the first constraint, set omit 
to 1 and argvIndex to 1, and set omit and argvIndex to 0 for all other 
constraints?  And then set the cost to n(log n) where n is number of rows, 
since that is the cost of making an index?

Thanks very much for your help.  WhereTrace follows:

=
WhereTrace when USING virtual tables:
=

New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increases cost 271 to 272
New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-18 Thread Richard Hipp
Eric:  Can you please recompile the SQLite shell (sqlite3.exe) with
-DSQLITE_DEBUG and -DSQLITE_ENABLE_WHERETRACE and then run your query
after first running

   .wheretrace 0xff

And then send us the output?  (In order for this to work, you'll have
to build your virtual table implementation as a shared library/DLL and
load it using the ".load" command.)

If you cannot easily run your query from sqlite3.exe, then please
recompile your application using -DSQLITE_DEBUG and
-DSQLITE_ENABLE_WHERETRACE and then run:

extern int sqlite3WhereTrace;
sqlite3WhereTrace = 0xff;

Right before you run the problem query.


-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-18 Thread Hick Gunter
Are you setting the constraintUsage return parameters correctly?

SQLite ist asking two questions:

1) What is the cost of a full table scan on table inventory (your answer is 
4581; I personally would leave idxNum=0 for this case)

2) What is the least cost of doing a partial table scan with any combination of 
the four constraints (resulting from the joins of t1->t2 on inventory_id and 
T4/5/6 -> t2 on film_id)

In the second case, you should not only set rows and cost, but also indicate 
which constraint is to be used e.b. by setting constraintUsage[0].argvIndex = 
1. This allows SQLite to pass the actual constraint value to the xFilter 
function.
Also, if you can guarantee that your xFilter/xNext implementation will only 
return matching rows, you should set contraintUsage[0].omit = 1. This allows 
SQLite to skip the value check that would be coded to eliminate unwanted rows.
Personally, I would be using small positive numbers for the indexes.

Again, you should not be creating Indexes in xBestIndex; if you are actually 
creating indexes on request in xFilter, the cost of doing that should be 
included in the cost return value.

Looking at your query, you will be returning the cartesian product of all 
actors and categories. I also find it strange to join inventory records to 
actors and categories instead of joining via the film table, as would be 
suggested by the n:m relational table names too.

-Urspr?ngliche Nachricht-
Von: Eric Hill [mailto:Eric.Hill at jmp.com]
Gesendet: Freitag, 15. Mai 2015 18:03
An: sqlite-users at mailinglists.sqlite.org
Betreff: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Hey,

Let me say up front that I'm sure this is my fault.  I have SQLite version 
3.8.9.

I am using virtual tables, and I am trying to get xBestIndex and xFilter doing 
the right things so that I get optimal queries.  Currently, I am working on the 
following query:

SELECT t1.rental_date, t1.inventory_id, t1.customer_id, 
t2.film_id,
t2.store_id, t3.first_name AS cust_firstname, 
t3.last_name AS cust_lastname, t3.email,
t6.category_id, t4.title, t4.release_year, 
t4.length,
t4.rating, t4.rental_rate, t5.actor_id, t8.name 
AS category,
t7.first_name AS actor_firstname, t7.last_name 
AS actor_lastname
FROM rental10  t1
LEFT OUTER JOIN inventory t2
ON  ( t2.inventory_id = 
t1.inventory_id )
LEFT OUTER JOIN customer t3
ON  ( t3.customer_id = 
t1.customer_id )
LEFT OUTER JOIN film_category t6
ON  ( t6.film_id = t2.film_id )
LEFT OUTER JOIN film t4
ON  ( t4.film_id = t2.film_id )
LEFT OUTER JOIN film_actor t5
ON  ( t5.film_id = t2.film_id )
LEFT OUTER JOIN category t8
ON  ( t8.category_id = 
t6.category_id )
LEFT OUTER JOIN actor t7
ON  ( t7.actor_id = t5.actor_id 
);

When I execute this query, the result is correct, but it is taking too long by 
an order of magnitude or two.  It seems to be doing full table scans despite 
the fact that I am creating indexes as requested.

For example, xBestIndex gets called for the inventory table twice, with four 
constraints, once with usable set to false for all four constraints, and once 
with usable set to true for all four.  Here is my printf debugging spew:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 4581
jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 3.66096029177608

So, in the first case, I do not create an index (which I signify by setting 
idxNum to -999), and I set the cost (and, est. rows) to 4581.  In the second 
case, I create an index (0) and set cost to log10(4581) = 3.66 and est. rows to 
4.  Yet, later, whenever xFilter is called for the inventory table, SQLite 
passes in idxNum = 999 and nConstraints = 0.  The index I dutifully created is 
never asked for.  In cases where there is a single constraint, SQLite does ask 
request the index in the xFilter call, but it seems that for all the cases 
where multiple constraints are involved, the

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-17 Thread Eric Hill
Thanks, Dominique, for the tip on multifile amalgamation. After a few internet 
searches and the acquisition of gawk.exe and tclsh85.exe, I was able to build 
the multi-file amalgamation for 3.8.10.  Tomorrow, I will see if I can tell why 
SQLite is choosing table scans over using my indexes.  If anyone has tips on 
where to set breakpoints, that would be helpful.

One thing I did do:  Since I can copy the data into SQLite and do the same join 
as I was trying using virtual tables, I did that with EXPLAIN QUERY PLAN, and I 
got this:

0   0   0   SCAN TABLE rental10 AS t1
0   1   1   SEARCH TABLE inventory AS t2 USING AUTOMATIC COVERING 
INDEX (inventory_id=?)
0   2   2   SEARCH TABLE customer AS t3 USING AUTOMATIC COVERING 
INDEX (customer_id=?)
0   3   3   SEARCH TABLE film_category AS t6 USING AUTOMATIC 
COVERING INDEX (film_id=?)
0   4   4   SEARCH TABLE film AS t4 USING AUTOMATIC COVERING INDEX 
(film_id=?)
0   5   5   SEARCH TABLE film_actor AS t5 USING AUTOMATIC COVERING 
INDEX (film_id=?)
0   6   6   SEARCH TABLE category AS t8 USING AUTOMATIC COVERING 
INDEX (category_id=?)
0   7   7   SEARCH TABLE actor AS t7 USING AUTOMATIC COVERING INDEX 
(actor_id=?)

So, when SQLite has all the data, it figures out that it needs to use indexes, 
but when using my virtual tables, for some reason it doesn't.  Still working 
under the assumption that it is my fault.

Thanks,

Eric



[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-16 Thread Eduardo Morras
On Fri, 15 May 2015 18:34:33 +
Eric Hill  wrote:

> Thanks for your reply.
> 
> I went as high as using (number of rows)^4, resulting in a cost of
> ~440 trillion for the unindexed case, along with setting the cost to
> 1 for the indexed case, and it still won't use my index.
> 
> I'd like to step out of my xBestFilter implementation into SQLite
> code to see if I can tell what is going on there, but when I step
> out, the call stack knows where I'm supposed to be, but the debugger
> does not find the right line in sqlite3.c.  I have built sqlite3.c
> simply by adding the amalgamation to my Microsoft Visual Studio 2013 C
> ++ project.  Is there something I can do to make the debugger work?
> 
> I will postpone index creation until the call to xFilter, I reckon,
> once I work out these other issues.  Thanks for the tip!

Perhaps you can disable the unindexed path, forcing to use always the index. 
This way your virtual table always use your index.

What I don't know is: 

Sqlite3 can use only one index per query, I don't know if indexes from virtual 
tables count for this limit. I suppouse not because it should be an internal 
virtual table decission implementation show the selected data the faster way. 

If it counts for index limit, perhaps Sqlite3 found a faster index when joining 
your virtual table data with other table data.

Perhaps I miss read something in documentation.

Dr. Hipps, can you explain if virtual table index count for this limit?

> Eric

---   ---
Eduardo Morras 


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Dominique Devienne
On Fri, May 15, 2015 at 8:34 PM, Eric Hill  wrote:

> Is there something I can do to make the debugger work?
>

See this thread:
http://sqlite.1065341.n5.nabble.com/Windows-Specific-2-c-files-Amalgamation-td67626.html

Basically, use a multi-file amalgamation, that uses files with fewer than
64K lines. --DD

PS: Richard or someone else might provide a link to the current-release
such amalgamation.


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Eric Hill
Thanks for your reply.

I went as high as using (number of rows)^4, resulting in a cost of ~440 
trillion for the unindexed case, along with setting the cost to 1 for the 
indexed case, and it still won't use my index.

I'd like to step out of my xBestFilter implementation into SQLite code to see 
if I can tell what is going on there, but when I step out, the call stack knows 
where I'm supposed to be, but the debugger does not find the right line in 
sqlite3.c.  I have built sqlite3.c simply by adding the amalgamation to my 
Microsoft Visual Studio 2013 C++ project.  Is there something I can do to make 
the debugger work?

I will postpone index creation until the call to xFilter, I reckon, once I work 
out these other issues.  Thanks for the tip!

Eric

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Richard Hipp
Sent: Friday, May 15, 2015 12:23 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

On 5/15/15, Eric Hill  wrote:
>
> So, in the first case, I do not create an index (which I signify by 
> setting idxNum to -999), and I set the cost (and, est. rows) to 4581.  
> In the second case, I create an index (0) and set cost to log10(4581) 
> = 3.66 and est. rows to 4.  Yet, later, whenever xFilter is called for 
> the inventory table, SQLite passes in idxNum = 999 and nConstraints = 
> 0.  The index I dutifully created is never asked for.  In cases where 
> there is a single constraint, SQLite does ask request the index in the 
> xFilter call, but it seems that for all the cases where multiple 
> constraints are involved, the index is not being used.
>

Two things:

(1) You probably shouldn't be "creating an index" in response to an xBestIndex 
call.  xBestIndex should be thought of as a "what-if"
function.  It is asking your virtual table what it could do with a query given 
certain constraints.  SQLite makes no guarantees that it will actually ever 
call your virtual table that way - it is merely exploring possibilities.

(2) The query planner looks at many different cost factors and tries to pick 
the best overall query plan.  You've told it that running your virtual table 
without an index is 1252 times slower than running it with an index.  And it 
takes this into consideration.  That SQLite is not choosing to use the virtual 
table index indicates that some other part or parts of the join would be more 
than 1252 times slower if the virtual table index were in fact used, and so the 
overall query plan is faster even without the virtual table index.  If these 
estimates are incorrect, then an obvious work-around is merely to increase the 
cost of not using the indexing mode on the virtual table.  Have it return 10x 
or 100x the cost (45810 or 458100) when not using an index, and see if that 
helps.

--
D. Richard Hipp
drh at sqlite.org
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Eric Hill
Hey,

Let me say up front that I'm sure this is my fault.  I have SQLite version 
3.8.9.

I am using virtual tables, and I am trying to get xBestIndex and xFilter doing 
the right things so that I get optimal queries.  Currently, I am working on the 
following query:

SELECT t1.rental_date, t1.inventory_id, t1.customer_id, 
t2.film_id,
t2.store_id, t3.first_name AS cust_firstname, 
t3.last_name AS cust_lastname, t3.email,
t6.category_id, t4.title, t4.release_year, 
t4.length,
t4.rating, t4.rental_rate, t5.actor_id, t8.name 
AS category,
t7.first_name AS actor_firstname, t7.last_name 
AS actor_lastname
FROM rental10  t1
LEFT OUTER JOIN inventory t2
ON  ( t2.inventory_id = 
t1.inventory_id )
LEFT OUTER JOIN customer t3
ON  ( t3.customer_id = 
t1.customer_id )
LEFT OUTER JOIN film_category t6
ON  ( t6.film_id = t2.film_id )
LEFT OUTER JOIN film t4
ON  ( t4.film_id = t2.film_id )
LEFT OUTER JOIN film_actor t5
ON  ( t5.film_id = t2.film_id )
LEFT OUTER JOIN category t8
ON  ( t8.category_id = 
t6.category_id )
LEFT OUTER JOIN actor t7
ON  ( t7.actor_id = t5.actor_id 
);

When I execute this query, the result is correct, but it is taking too long by 
an order of magnitude or two.  It seems to be doing full table scans despite 
the fact that I am creating indexes as requested.

For example, xBestIndex gets called for the inventory table twice, with four 
constraints, once with usable set to false for all four constraints, and once 
with usable set to true for all four.  Here is my printf debugging spew:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 4581
jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 3.66096029177608

So, in the first case, I do not create an index (which I signify by setting 
idxNum to -999), and I set the cost (and, est. rows) to 4581.  In the second 
case, I create an index (0) and set cost to log10(4581) = 3.66 and est. rows to 
4.  Yet, later, whenever xFilter is called for the inventory table, SQLite 
passes in idxNum = 999 and nConstraints = 0.  The index I dutifully created is 
never asked for.  In cases where there is a single constraint, SQLite does ask 
request the index in the xFilter call, but it seems that for all the cases 
where multiple constraints are involved, the index is not being used.

I did EXPLAIN QUERY PLAN for the query and got this, consistent with what I'm 
seeing:

0  0  0  SCAN TABLE rental10 AS t1 VIRTUAL 
TABLE INDEX -999:
0  1  1  SCAN TABLE inventory AS t2 VIRTUAL 
TABLE INDEX -999:
0  2  2  SCAN TABLE customer AS t3 VIRTUAL 
TABLE INDEX 0:
0  3  3  SCAN TABLE film_category AS t6 
VIRTUAL TABLE INDEX -999:
0  4  4  SCAN TABLE film AS t4 VIRTUAL 
TABLE INDEX 0:
0  5  5  SCAN TABLE film_actor AS t5 
VIRTUAL TABLE INDEX -999:
0  6  6  SCAN TABLE category AS t8 VIRTUAL 
TABLE INDEX -999:
0  7  7  SCAN TABLE actor AS t7 VIRTUAL 
TABLE INDEX 0:

Now, I know that SQLite is capable of efficiently performing this query, 
because I also have the ability to copy these tables into SQLite so that I am 
querying real tables instead of virtual tables.  SQLite can perform the query 
in under 1 second with real tables, but with virtual tables, it is taking > 25 
seconds.

Any thoughts on what I can do to convince SQLite to use my indexes?

Thanks,

Eric



[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Richard Hipp
On 5/15/15, Eric Hill  wrote:
>
> So, in the first case, I do not create an index (which I signify by setting
> idxNum to -999), and I set the cost (and, est. rows) to 4581.  In the second
> case, I create an index (0) and set cost to log10(4581) = 3.66 and est. rows
> to 4.  Yet, later, whenever xFilter is called for the inventory table,
> SQLite passes in idxNum = 999 and nConstraints = 0.  The index I dutifully
> created is never asked for.  In cases where there is a single constraint,
> SQLite does ask request the index in the xFilter call, but it seems that for
> all the cases where multiple constraints are involved, the index is not
> being used.
>

Two things:

(1) You probably shouldn't be "creating an index" in response to an
xBestIndex call.  xBestIndex should be thought of as a "what-if"
function.  It is asking your virtual table what it could do with a
query given certain constraints.  SQLite makes no guarantees that it
will actually ever call your virtual table that way - it is merely
exploring possibilities.

(2) The query planner looks at many different cost factors and tries
to pick the best overall query plan.  You've told it that running your
virtual table without an index is 1252 times slower than running it
with an index.  And it takes this into consideration.  That SQLite is
not choosing to use the virtual table index indicates that some other
part or parts of the join would be more than 1252 times slower if the
virtual table index were in fact used, and so the overall query plan
is faster even without the virtual table index.  If these estimates
are incorrect, then an obvious work-around is merely to increase the
cost of not using the indexing mode on the virtual table.  Have it
return 10x or 100x the cost (45810 or 458100) when not using an index,
and see if that helps.

-- 
D. Richard Hipp
drh at sqlite.org