"Simon Slavin"  wrote ...
>
> On 21 Aug 2010, at 3:29am, Igor Tandetnik wrote:
>
>> ve3meo <holden_fam...@sympatico.ca> wrote:
>>> If the number of VM instructions is a good inverse indicator of 
>>> efficiency,
>>> then the second query with the index appears to be the most efficient. 
>>> I'm
>>> sure somebody will point out if it is not.
>>
>> It is not. It may very well take fewer instructions to implement a 
>> straightforward linear scan than it is to join through the index.
>
> Yeah.  I think your formulation like this:
>
>> 41 - Jim's query revamped to work:
>> select * from A as A1, A as A2 where A1.name=A2.name and
>> A1.position != A2.position and
>> A2.position between A1.position - 10 and A1.position + 10 ;
>
> is going to be the fastest.  It involves more VM instructions but ends up 
> searching fewer points.  You might want to time it as it is, then add an 
> index (just for testing) on just (position) and see whether that improves 
> matters.

Thanks for the explanation. I thought I might be naively wishing for too 
much from the count of VM instructions when some may take much longer to 
execute than others. I was intrigued by Peng's question and wondered if I 
might be able to predict efficiency by examining Explain Query Plan and 
Explain. Apparently not.

I understand the principle of minimising the number of points to search and 
I think that this argues for the WHERE conditions to have the coarse 
resolution first and the finest resolution last. To pick up on Simon's 
suggestion to chunk, I added an integer field 'chunk' to the table:

UPDATE A SET chunk = (position/10);

I indexed a bunch of ways to see what the optimiser would do:

create index idxNameChunkPos ON A(name,chunk,position);
create index idxNamePos ON A(name,position);
create index idxNameChunk ON A(name,chunk);
create index idxName ON A(name);
create index idxPos ON A(position);
create index idxChunk ON A(chunk);


Here's how I have interpreted Simon's suggested chunky query:

select * from A as A1, A as A2 where A1.name=A2.name and
A1.position != A2.position and
A2.chunk between A1.chunk - 1 and A1.chunk + 1 and
A2.position between A1.position - 10 and A1.position + 10 ;

Explain Query Plan shows that this query uses table A2 with idxNameChunkPos. 
Is this liable to result in a minimum of points searched?

But so do Peng's original and re-ordered query use idxNameChunkPos even 
though Chunk is not in the query. Despite the presence of all these other 
indexes, Jim's suggested query persists in using idxNamePos, as one would 
expect it to be the optimal index for that query and for the original and 
re-ordered.

I'm wary of what the optimiser does. On those first two queries, I have seen 
that it switches from using idxNamePos to idxName after the addition of the 
Chunk field when only those two indexes were present. I should add that the 
SQLite manager through which I play with SQLite uses a 3.7.0 DLL.

Tom 



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to