Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread Russ Leighton

On Sep 26, 2008, at 2:29 PM, Kees Nuyt wrote:

> On Fri, 26 Sep 2008 12:54:36 -0400, Russell wrote:
>
>> I need a 2 key index for some queries and also want to aggregate on
>> these 2 columns. I need this index BUT I have many large sqlite dbs I
>> iterate over and they won't fit in the filesystem cache. Run time  
>> when
>> the index is present is 105min. Run time with out the index is 3min.
>
> Did you populate the sqlite_stat1 index statistics table
> with ANALYZE ? It might influence the query plan.

Yup. 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread Kees Nuyt
On Fri, 26 Sep 2008 12:54:36 -0400, Russell wrote:

>I need a 2 key index for some queries and also want to aggregate on  
>these 2 columns. I need this index BUT I have many large sqlite dbs I  
>iterate over and they won't fit in the filesystem cache. Run time when  
>the index is present is 105min. Run time with out the index is 3min.

Did you populate the sqlite_stat1 index statistics table
with ANALYZE ? It might influence the query plan.

>I see no way a simple query planner can account for factors like  
>available ram, disk io speeds and CPU speeds. The solution DRH  
>suggests is perfect for my needs.
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread Russell Leighton
Sqlite would need to know if the file was cached or not to make the  
right decision.

The big web site where every user has their own db is a perfect example.
Assume that after a user logs in that their db gets cached (because  
they do many queries) and they do some aggregation, hence it runs fast  
due to the index. Every week the system does the same aggregation over  
ALL db files for billing, hence it loops over all files and nothing is  
in the file cache. This will run 25 times slower with the index.  
Sqlite cannot know the difference between the two cases.

Using many sqlite dbs as partitions (like above for the web site) is a  
great way to scale. The application will know if a table scan is  
better or not. Sqlite itself does not have the view.



On Sep 26, 2008, at 1:23 PM, Nicolas Williams  
<[EMAIL PROTECTED]> wrote:

> On Fri, Sep 26, 2008 at 12:54:36PM -0400, Russell Leighton wrote:
>> I need a 2 key index for some queries and also want to aggregate on
>> these 2 columns. I need this index BUT I have many large sqlite dbs I
>> iterate over and they won't fit in the filesystem cache. Run time  
>> when
>> the index is present is 105min. Run time with out the index is 3min.
>>
>> I see no way a simple query planner can account for factors like
>> available ram, disk io speeds and CPU speeds. The solution DRH
>> suggests is perfect for my needs.
>
> First, CPU speed is probably not an issue here.  Knowing the size of  
> the
> tables and indexes relative to RAM/cache size sure is relevant though.
>
> Given knowledge of table row counts, why couldn't SQLite3 recognize  
> that
> your query is best planned to do a full table scan?
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread Nicolas Williams
On Fri, Sep 26, 2008 at 12:54:36PM -0400, Russell Leighton wrote:
> I need a 2 key index for some queries and also want to aggregate on  
> these 2 columns. I need this index BUT I have many large sqlite dbs I  
> iterate over and they won't fit in the filesystem cache. Run time when  
> the index is present is 105min. Run time with out the index is 3min.
> 
> I see no way a simple query planner can account for factors like  
> available ram, disk io speeds and CPU speeds. The solution DRH  
> suggests is perfect for my needs.

First, CPU speed is probably not an issue here.  Knowing the size of the
tables and indexes relative to RAM/cache size sure is relevant though.

Given knowledge of table row counts, why couldn't SQLite3 recognize that
your query is best planned to do a full table scan?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread Russell Leighton

Perfect solution as long as there is a no index option along with  
index by.



On Sep 26, 2008, at 12:54 PM, Russell Leighton <[EMAIL PROTECTED] 
 > wrote:

> I have another scenario where this is needed , the one in the subject.
> I repeated this problem this AM.
>
> I need a 2 key index for some queries and also want to aggregate on
> these 2 columns. I need this index BUT I have many large sqlite dbs I
> iterate over and they won't fit in the filesystem cache. Run time when
> the index is present is 105min. Run time with out the index is 3min.
>
> I see no way a simple query planner can account for factors like
> available ram, disk io speeds and CPU speeds. The solution DRH
> suggests is perfect for my needs.
>
>
>
> On Sep 26, 2008, at 12:38 PM, "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
>
>>
>> On Sep 25, 2008, at 10:37 PM, Alex Scotti wrote:
>>>
>>>
>>> read http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/
>>> index.html as well if you could.
>>>
>>>
>>> i implore you all to take the high road here.
>>
>>
>> I agree with philosophy expressed at the link above: "If [the RDBMS]
>> does not choose the optimal access plan and the poor access plan is
>> not due to a limitation inherent in the query, we consider it a  
>> defect
>> in the product and prefer to fix the problem at the source so that
>> all ... users may benefit "  The intent of the INDEXED BY syntax
>> in SQLite is *not* to provide users a way to tune queries.  The
>> problem we are attempting to solve is one of detecting a performance
>> regression due to coding erors. Here are the two problem reports from
>> real SQLite users that we are trying to resolve:
>>
>> (1)  A workstation application uses SQLite as its database file
>> format.  During a product enhancement, one of the indices in the
>> schema was deleted by mistake, which caused certain queries to run
>> much slower.  But the slowdown was not detected during testing  
>> because
>> very large application files were necessary for the slowdown to
>> appear.  The vendor asks for some way to detect during testing that
>> the query plan has changed.
>>
>> (2) A web portal uses SQLite databases to store per-customer state
>> information - one SQLite database per customer.  These millions of
>> SQLite database are stored on a network filesystem.  Access must be
>> efficient in order to prevent the file servers from being
>> overwhelmed.  If a critical index is deleted by mistake, the
>> applications will still work fine during testing (because SQLite will
>> just use a different query plan) but might buckle under real-world
>> loads.  There is a significant chance that the problem will not be
>> detected until the upgrade is moved into production and millions of
>> users start pounding on it all at once.  The vendor lives in terror  
>> of
>> this scenario and would like a way to detect the query plan change
>> early - before a large scale rollout and subsequent disruption of
>> their service.
>>
>> The MySQL, Oracle, and MSSQL hinting solutions are not applicable to
>> the above problems because they are only hints.  If the hints cannot
>> be satisfied, the query plan silently reverts to something else.  But
>> in my proposed INDEXED BY clause, if the query plan specified by the
>> INDEXED BY clause cannot be used, then the query fails with an error.
>> This allows developers to detect problems in a few critical queries
>> early, before a large rollout.  To put it another way, the INDEXED BY
>> clause is more like a CHECK constraint than a hint - only that the
>> constraint applies to the query plan instead of the database content.
>>
>> My original idea on how to solve the problems above was to provide
>> some new API that returned an estimate of the performance for a
>> prepared statement.  Then an index change that caused (for example)  
>> an
>> O(logN) to O(N) performance regression could be detected using the  
>> new
>> API.  That sounds like a good approach upon first hearing, but as we
>> have dug deeper, we have uncovered many subtleties that make it much
>> less appealing.  The INDEXED BY clause, in contrast, is simple,
>> direct, and gets the job done with remarkably little fuss.
>>
>>
>>
>> D. Richard Hipp
>> [EMAIL PROTECTED]
>>
>>
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread Russell Leighton
I have another scenario where this is needed , the one in the subject.  
I repeated this problem this AM.

I need a 2 key index for some queries and also want to aggregate on  
these 2 columns. I need this index BUT I have many large sqlite dbs I  
iterate over and they won't fit in the filesystem cache. Run time when  
the index is present is 105min. Run time with out the index is 3min.

I see no way a simple query planner can account for factors like  
available ram, disk io speeds and CPU speeds. The solution DRH  
suggests is perfect for my needs.



On Sep 26, 2008, at 12:38 PM, "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:

>
> On Sep 25, 2008, at 10:37 PM, Alex Scotti wrote:
>>
>>
>> read http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/
>> index.html as well if you could.
>>
>>
>> i implore you all to take the high road here.
>
>
> I agree with philosophy expressed at the link above: "If [the RDBMS]
> does not choose the optimal access plan and the poor access plan is
> not due to a limitation inherent in the query, we consider it a defect
> in the product and prefer to fix the problem at the source so that
> all ... users may benefit "  The intent of the INDEXED BY syntax
> in SQLite is *not* to provide users a way to tune queries.  The
> problem we are attempting to solve is one of detecting a performance
> regression due to coding erors. Here are the two problem reports from
> real SQLite users that we are trying to resolve:
>
> (1)  A workstation application uses SQLite as its database file
> format.  During a product enhancement, one of the indices in the
> schema was deleted by mistake, which caused certain queries to run
> much slower.  But the slowdown was not detected during testing because
> very large application files were necessary for the slowdown to
> appear.  The vendor asks for some way to detect during testing that
> the query plan has changed.
>
> (2) A web portal uses SQLite databases to store per-customer state
> information - one SQLite database per customer.  These millions of
> SQLite database are stored on a network filesystem.  Access must be
> efficient in order to prevent the file servers from being
> overwhelmed.  If a critical index is deleted by mistake, the
> applications will still work fine during testing (because SQLite will
> just use a different query plan) but might buckle under real-world
> loads.  There is a significant chance that the problem will not be
> detected until the upgrade is moved into production and millions of
> users start pounding on it all at once.  The vendor lives in terror of
> this scenario and would like a way to detect the query plan change
> early - before a large scale rollout and subsequent disruption of
> their service.
>
> The MySQL, Oracle, and MSSQL hinting solutions are not applicable to
> the above problems because they are only hints.  If the hints cannot
> be satisfied, the query plan silently reverts to something else.  But
> in my proposed INDEXED BY clause, if the query plan specified by the
> INDEXED BY clause cannot be used, then the query fails with an error.
> This allows developers to detect problems in a few critical queries
> early, before a large rollout.  To put it another way, the INDEXED BY
> clause is more like a CHECK constraint than a hint - only that the
> constraint applies to the query plan instead of the database content.
>
> My original idea on how to solve the problems above was to provide
> some new API that returned an estimate of the performance for a
> prepared statement.  Then an index change that caused (for example) an
> O(logN) to O(N) performance regression could be detected using the new
> API.  That sounds like a good approach upon first hearing, but as we
> have dug deeper, we have uncovered many subtleties that make it much
> less appealing.  The INDEXED BY clause, in contrast, is simple,
> direct, and gets the job done with remarkably little fuss.
>
>
>
> D. Richard Hipp
> [EMAIL PROTECTED]
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-26 Thread D. Richard Hipp

On Sep 25, 2008, at 10:37 PM, Alex Scotti wrote:
>
>
> read http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/
> index.html as well if you could.
>
>
> i implore you all to take the high road here.


I agree with philosophy expressed at the link above: "If [the RDBMS]  
does not choose the optimal access plan and the poor access plan is  
not due to a limitation inherent in the query, we consider it a defect  
in the product and prefer to fix the problem at the source so that  
all ... users may benefit "  The intent of the INDEXED BY syntax  
in SQLite is *not* to provide users a way to tune queries.  The  
problem we are attempting to solve is one of detecting a performance  
regression due to coding erors. Here are the two problem reports from  
real SQLite users that we are trying to resolve:

(1)  A workstation application uses SQLite as its database file  
format.  During a product enhancement, one of the indices in the  
schema was deleted by mistake, which caused certain queries to run  
much slower.  But the slowdown was not detected during testing because  
very large application files were necessary for the slowdown to  
appear.  The vendor asks for some way to detect during testing that  
the query plan has changed.

(2) A web portal uses SQLite databases to store per-customer state  
information - one SQLite database per customer.  These millions of  
SQLite database are stored on a network filesystem.  Access must be  
efficient in order to prevent the file servers from being  
overwhelmed.  If a critical index is deleted by mistake, the  
applications will still work fine during testing (because SQLite will  
just use a different query plan) but might buckle under real-world  
loads.  There is a significant chance that the problem will not be  
detected until the upgrade is moved into production and millions of  
users start pounding on it all at once.  The vendor lives in terror of  
this scenario and would like a way to detect the query plan change  
early - before a large scale rollout and subsequent disruption of  
their service.

The MySQL, Oracle, and MSSQL hinting solutions are not applicable to  
the above problems because they are only hints.  If the hints cannot  
be satisfied, the query plan silently reverts to something else.  But  
in my proposed INDEXED BY clause, if the query plan specified by the  
INDEXED BY clause cannot be used, then the query fails with an error.   
This allows developers to detect problems in a few critical queries  
early, before a large rollout.  To put it another way, the INDEXED BY  
clause is more like a CHECK constraint than a hint - only that the  
constraint applies to the query plan instead of the database content.

My original idea on how to solve the problems above was to provide  
some new API that returned an estimate of the performance for a  
prepared statement.  Then an index change that caused (for example) an  
O(logN) to O(N) performance regression could be detected using the new  
API.  That sounds like a good approach upon first hearing, but as we  
have dug deeper, we have uncovered many subtleties that make it much  
less appealing.  The INDEXED BY clause, in contrast, is simple,  
direct, and gets the job done with remarkably little fuss.



D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-25 Thread Alex Scotti

On Sep 24, 2008, at 4:17 PM, Nicolas Williams wrote:


>
> But every commercial SQL RDBMS seems to have syntax for index control.

as "[EMAIL PROTECTED]" was kind enough to post on sept 21 to  
this very mailing list, not all sql rdbms have taken this approach.   
at least one, db2, chose the high road.

On Sep 21, 2008, at 12:59 PM, Stephen Woodbridge wrote:

> Richard,
. edited
>
> DB2:
> http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/ 
> com.ibm.db2.udb.doc/admin/t0005308.htm


read http://www.ibm.com/developerworks/db2/library/tips/dm-0312yip/ 
index.html as well if you could.


i implore you all to take the high road here.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-24 Thread Nicolas Williams
On Wed, Sep 24, 2008 at 01:35:40AM -0400, Alex Scotti wrote:
> On Sep 24, 2008, at 1:13 AM, Alex Scotti wrote:
> > On Sep 23, 2008, at 2:35 PM, Jay A. Kreibich wrote:
> >>   It doesn't "blatantly" anything.  Indexes are outside of the
> >>   Relational Model and have nothing to do with it.  They're  
> >> orthogonal.
> >>   From that, anything having to do with creating, using, or
> >>   manipulating indexes is outside the model.
> >>
> >>   Within any actual real-world RDBMS product, however, once you  
> >> accept
> >>   the need for indexes (on any level) then it shouldn't be hard to
> >>   accept the desire to control how those indexes are used.
> >
> > no.  no no no.  i don't know how to convince you of this deep  
> > mistake other than beating you over the head with your own words.
> >
> >
> > as i said, i give up.
> >
> > at the point where the conversations look this, i begin to worry:
> >
> > question: "my query runs very slowly!  i don't understand why!?"
> > answer: "just append this glob of sqlite specific crud to your  
> > query and it runs great!"
> >
> > question: "this query runs slow also!"
> > answer: "just code in raw vdbe instructions, you'll have absolute  
> > control over the choice of index usage!"
> >
> > question: "this query runs slowly still!!"
> > answer: you should be using berkeley db.  they give you absolute  
> > control over everything.

But every commercial SQL RDBMS seems to have syntax for index control.

I see an admission there that it's really difficult for an RDBMS to take
a query and make the system go as fast as possible without more
information (namely, an analysis of typical data and queries).

If SQLite3 can do better than the rest on this, then it has a bright
future indeed.  Otherwise to then provide less control over index use
than other choices needs to be justified on "that's what lite means"
grounds.  (I've no idea whether which is easier to implement: specific
or generic index control directives.)

A tool to generate an appropriate indexing strategy given a set of
queries and model database contents would be great.  But if there were
still times when the optimizer would pick the wrong plan for a given
query and making it pick a better plan would be ETOOHARD then index use
directives is really not a terrible thing to settle for.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-23 Thread Alex Scotti

On Sep 24, 2008, at 1:13 AM, Alex Scotti wrote:

>
> On Sep 23, 2008, at 2:35 PM, Jay A. Kreibich wrote:
>
>
>>
>>   It doesn't "blatantly" anything.  Indexes are outside of the
>>   Relational Model and have nothing to do with it.  They're  
>> orthogonal.
>>   From that, anything having to do with creating, using, or
>>   manipulating indexes is outside the model.
>>
>>   Within any actual real-world RDBMS product, however, once you  
>> accept
>>   the need for indexes (on any level) then it shouldn't be hard to
>>   accept the desire to control how those indexes are used.
>
> no.  no no no.  i don't know how to convince you of this deep  
> mistake other than beating you over the head with your own words.
>
>
> as i said, i give up.
>
> at the point where the conversations look this, i begin to worry:
>
> question: "my query runs very slowly!  i don't understand why!?"
> answer: "just append this glob of sqlite specific crud to your  
> query and it runs great!"
>
> question: "this query runs slow also!"
> answer: "just code in raw vdbe instructions, you'll have absolute  
> control over the choice of index usage!"
>
> question: "this query runs slowly still!!"
> answer: you should be using berkeley db.  they give you absolute  
> control over everything.
>
>

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-23 Thread Jay A. Kreibich
On Tue, Sep 23, 2008 at 02:26:08PM -0500, Nicolas Williams scratched on the 
wall:
> On Tue, Sep 23, 2008 at 01:35:44PM -0500, Jay A. Kreibich wrote:

> >   IMHO, the jump from "you must manually create indexes" to "you may
> >   control the *use* of an index" is a MUCH smaller jump than the very
> >   concept of "CREATE INDEX."  After all, if the RDBMS is supposed to be
> >   smart enough to figure out when using or not using an index is a good
> >   idea, shouldn't it also be smart enough to figure out if creating an
> >   index is a good idea?  Or, conversely, if nobody expects a system to
> >   be smart enough to know when to create them, why do we expect it to
> >   be smart enough to know when to use them?
> 
> Given a finite number of [parametrized] queries that are expected to
> perform well against a given schema and data, then yes, the RDBMS could
> automatically create the indexes needed to optimize those queries.

> That'd sure be user-friendly.  Provide set of [parametrized] queries.
> Analyze data.  Automatically create indexes.  It'd be nice too to give
> the user information about the impact of indexes on writes.

  I hope this was clear, but I wasn't proposing anything like this for
  SQLite itself.  I was speaking of RDBMS systems in general at a much
  higher level.

  It is, however, the kind of feature I'd like to see in some of the
  management tools.  As you said, point it at a database, feed it a
  bunch of queries, and let it make suggestions.

> If the API allowed one to first import a statement (from text), then
> apply directives relating to index use and so on, _then_ compile the
> statement, then these directives would be "outside the relational model"
> at least in so far as the directives have no impact on the SQL syntax.

  SQL (the standard) != the Relational Model.  Not even close.

> I suspect that "impact on syntax" is taken to be evil by some.

  Personally, I think altering the SQL is the least of my concerns.
  When you get this nitty-gritty you're so deep into whatever specific
  RDBMS product you're using that custom syntax shouldn't be a worry.
  You're already fully committed.

  If anything, it will remind you that you need to re-visit the issue
  if you ever move the SQL to another platform.

> Also, API-only directives would be harder to use from the shell...

  Yes, I would put forth that a "hint API" is actually against the general
  approach.  There is a reason why the C interface doesn't include
  public functions to scan a table or things like that.  The interface
  is SQL, and that's where DB engine interactions should go.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-23 Thread Nicolas Williams
On Tue, Sep 23, 2008 at 01:35:44PM -0500, Jay A. Kreibich wrote:
>   If there was a point I was trying to make, it was that something
>   being "un-RDBMS like" in itself doesn't make it a bad thing.  After
>   all, the very concept of indexes themselves is (from a Relational
>   Model theory viewpoint) "un-RDBMS like," but most of us are very very
>   comfortable with them, and accept the need to occasionally manually
>   create them to improve practical real-world performance.

+1

>   IMHO, the jump from "you must manually create indexes" to "you may
>   control the *use* of an index" is a MUCH smaller jump than the very
>   concept of "CREATE INDEX."  After all, if the RDBMS is supposed to be
>   smart enough to figure out when using or not using an index is a good
>   idea, shouldn't it also be smart enough to figure out if creating an
>   index is a good idea?  Or, conversely, if nobody expects a system to
>   be smart enough to know when to create them, why do we expect it to
>   be smart enough to know when to use them?

Given a finite number of [parametrized] queries that are expected to
perform well against a given schema and data, then yes, the RDBMS could
automatically create the indexes needed to optimize those queries.

That'd sure be user-friendly.  Provide set of [parametrized] queries.
Analyze data.  Automatically create indexes.  It'd be nice too to give
the user information about the impact of indexes on writes.

SQLite3 doesn't do that.  If indexes are exposed to the user, then the
user should have more control over the query optimizer's use of indexes
than DROP INDEX and CREATE INDEX.  How much control, I don't know.

But being SQ_Lite_, and given that the 'lite' part is often advertised
as relating to footprint, I'd think that whatever syntax requires the
least amount of code, while not painting SQLite3 too much into a corner,
would be the way to go.  I've no idea then if simple directives like Dr.
Hipp proposes, or something more complex, like MySQL's, would be best.

> > an index surely doesn't break the relational model in any way.
> 
>   Of course not.  No more than the sky being blue.  "Indexes are outside
>   of the Relational Model."  As long as they stay outside, they
>   shouldn't matter to the Model.
>   
>   But that doesn't mean they're outside the developer experience.

Like the garbage collector in many a high level language.  You could do
without a GC... for a while, but mostly you need it, even if it's
transparent.  And when the GC is critical, then you want to be able to
tune it.

> > on the other hand we have here a non standard sql extension which ties 
> > users to sqlite, and blatantly does fly in the face of the relational 
> > model.
> 
>   It doesn't "blatantly" anything.  Indexes are outside of the
>   Relational Model and have nothing to do with it.  They're orthogonal.
>   From that, anything having to do with creating, using, or
>   manipulating indexes is outside the model.

If the API allowed one to first import a statement (from text), then
apply directives relating to index use and so on, _then_ compile the
statement, then these directives would be "outside the relational model"
at least in so far as the directives have no impact on the SQL syntax.

I suspect that "impact on syntax" is taken to be evil by some.  I don't
disagree, but you can't have as much precise control over index use if
you put the directives outside the statement, not without breaking more
abstractions.

Also, API-only directives would be harder to use from the shell...

>   Within any actual real-world RDBMS product, however, once you accept
>   the need for indexes (on any level) then it shouldn't be hard to
>   accept the desire to control how those indexes are used.
> 
>   I find it much more odd that people expect the system to magically
>   understand when to use or not use an index perfectly in every odd
>   case, but don't think twice about the fact that we need to manually
>   create (or not) those indexes to start with.  The status quo is
>   this strange kind of second-guessing tuning game between the DBA
>   and the optimizer.

+1

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-23 Thread Jay A. Kreibich
On Mon, Sep 22, 2008 at 10:44:20PM -0400, Alex Scotti scratched on the wall:
>
> On Sep 22, 2008, at 11:18 AM, Jay A. Kreibich wrote:
>
>> On Mon, Sep 22, 2008 at 10:07:54AM -0400, D. Richard Hipp scratched on 
>> the wall:
>>> I am reluctant to add to SQLite the ability to explicitly specify the
>>> index for a query.  I agree with Alex Scotti that the whole idea  
>>> seems very un-RDBMS like.
>>
>>   Well it is outside of the Relational Model, that's for sure.
>>
>>   Then again, the whole concept of indexes are outside of the
>>   Relational Model.
>
> this isn't exactly a good argument.

  I didn't realize it was an argument.  I'm simply pointing out that
  there is a reason it feels "un-RDBMS like" -- mostly because it is.

  If there was a point I was trying to make, it was that something
  being "un-RDBMS like" in itself doesn't make it a bad thing.  After
  all, the very concept of indexes themselves is (from a Relational
  Model theory viewpoint) "un-RDBMS like," but most of us are very very
  comfortable with them, and accept the need to occasionally manually
  create them to improve practical real-world performance.

  IMHO, the jump from "you must manually create indexes" to "you may
  control the *use* of an index" is a MUCH smaller jump than the very
  concept of "CREATE INDEX."  After all, if the RDBMS is supposed to be
  smart enough to figure out when using or not using an index is a good
  idea, shouldn't it also be smart enough to figure out if creating an
  index is a good idea?  Or, conversely, if nobody expects a system to
  be smart enough to know when to create them, why do we expect it to
  be smart enough to know when to use them?

> an index surely doesn't break the relational model in any way.

  Of course not.  No more than the sky being blue.  "Indexes are outside
  of the Relational Model."  As long as they stay outside, they
  shouldn't matter to the Model.
  
  But that doesn't mean they're outside the developer experience.

> it's existence or absence may or may not effect execution time, but
> would never yield incorrect or different results.

  In theory, yes.  In fact, that's what keeps them outside of the Model.

  Of course nearly every RDBMS has some type of "REINDEX" command which
  says something in itself, but that's more about the realities of
  implementing complex systems than anything else.

> to beat a dead horse, the relational model doesn't discuss anything  
> physical at all. 

  Yes.  That's kind of the whole point of a "model."  Especially a
  mathematical one.

> by your line of reasoning using disks would be outside.

  Disks *are* outside the Relational Model.  You just said so, disks
  being physical and all.

  And I'm glad they're outside.  I use in-memory databases all the time.

> heaven forbid a buffer pool caching your i/o.

  As long as it would never yield incorrect or different results, the
  Model doesn't care about it.  Just like indexes.

  There are plenty of practical real-world reasons to worry about disks,
  however.  And indexes.



  A very hefty part of what an RDBMS system does is outside the
  Relational Model.  The Model gives us concepts to manipulate data,
  not a complete solution.  It is only a kernel.

  Further, nobody would want an RDBMS that tried to stay as true as
  possible to the Relational Model.  I mean, how many of us would be
  willing to give up ORDER BY just because it is outside the Model?

  Actually, ORDER BY isn't outside the model, it is in direct
  conflict with it.  But we love it and use it anyways.

> on the other hand we have here a non standard sql extension which ties 
> users to sqlite, and blatantly does fly in the face of the relational 
> model.

  It doesn't "blatantly" anything.  Indexes are outside of the
  Relational Model and have nothing to do with it.  They're orthogonal.
  From that, anything having to do with creating, using, or
  manipulating indexes is outside the model.

  Within any actual real-world RDBMS product, however, once you accept
  the need for indexes (on any level) then it shouldn't be hard to
  accept the desire to control how those indexes are used.


  I find it much more odd that people expect the system to magically
  understand when to use or not use an index perfectly in every odd
  case, but don't think twice about the fact that we need to manually
  create (or not) those indexes to start with.  The status quo is
  this strange kind of second-guessing tuning game between the DBA
  and the optimizer.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Alex Scotti

On Sep 22, 2008, at 11:18 AM, Jay A. Kreibich wrote:

> On Mon, Sep 22, 2008 at 10:07:54AM -0400, D. Richard Hipp scratched  
> on the wall:
>> I am reluctant to add to SQLite the ability to explicitly specify the
>> index for a query.  I agree with Alex Scotti that the whole idea  
>> seems
>> very un-RDBMS like.
>
>   Well it is outside of the Relational Model, that's for sure.
>
>   Then again, the whole concept of indexes are outside of the
>   Relational Model.

this isn't exactly a good argument.  an index surely doesn't break  
the relational model in any way.  it's existence or
absence may or may not effect execution time, but would never yield  
incorrect or different results.  the ubiquitous b-tree index
may not have been so obvious at the time, and who's to say it will  
continue to be a given forever.  the model is just that, a model -
abstracted away from implementation details, no matter how obvious  
they seem at the time to the implementers.

to beat a dead horse, the relational model doesn't discuss anything  
physical at all.  by your line of reasoning using disks would be  
outside.  heaven forbid a buffer pool caching your i/o.

on the other hand we have here a non standard sql extension which  
ties users to sqlite, and blatantly does fly in the face of the  
relational model.

that being said, as richard points out nobody would force anyone to  
use this extension.  i would simply pretend it didn't exist.  my fear  
is more along the lines of what a crutch for query optimization  
problems features like this can become.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Ken
Seems to me, 

That maybe  "index by" might be better served as an "access by" key phrase 
instead. That way the access to the table can be described, join order etc. Not 
just  pick an index. That way you can programatically describe the "access" 
path, index, full scan, rowid and potentially the join ordering of tables. 

How would you extend the "index by" if in the future you extend sqlite to 
utilise other indexing/(r-tree)/Hash based indexes ???  Or want the user to be 
able to specify other access components?

Some other areas for performance optimizations can occur when column data is 
not evenly distributed making some index paths poor when the column data is of 
low cardinality for a subset of data points. In oracle this is handled via 
histograms and the optimizer rejects indices based upon low cardinality where 
clause field qualifiers.


Just my .02.



--- On Mon, 9/22/08, Jay A. Kreibich <[EMAIL PROTECTED]> wrote:
From: Jay A. Kreibich <[EMAIL PROTECTED]>
Subject: Re: [sqlite] Specifing which index to use. Was: Performance/bug in 
multikey 'group by' in 3.6.2
To: "General Discussion of SQLite Database" 
Cc: [EMAIL PROTECTED]
Date: Monday, September 22, 2008, 10:18 AM

On Mon, Sep 22, 2008 at 10:07:54AM -0400, D. Richard Hipp scratched on the wall:
> I am reluctant to add to SQLite the ability to explicitly specify the  
> index for a query.  I agree with Alex Scotti that the whole idea seems  
> very un-RDBMS like.

  Well it is outside of the Relational Model, that's for sure.
  
  Then again, the whole concept of indexes are outside of the
  Relational Model.

> There seems to be no standard SQL way of providing hints to the query  
> optimizer for which index to use.   Every SQL database engine does it  
> differently.  The MySQL approach is the simplest by far. 

  Since the FROM clause is the only area of a SELECT statement where
  you're always dealing with fully-realized tables (and those are the
  only objects that can be indexed) it does seem like the most direct
  and cleanest approach.

>   SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>   SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>   SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;

  How are the last two different?
  
  It seems that NOT INDEXED would use a table scan, which is essentially
  walking the ROWID b-tree, while "INDEX BY ROWID" is going to use
the
  "index" on ROWID, which isn't really an index, but the
table's root
  b-tree-- e.g. the ROWID b-tree.

  Unless the "BY ROWID" is just some kind of alias for "the
primary b-tree"
  and can be used regardless of if there is a INTEGER PRIMARY KEY or not?
  But wouldn't that still be the same as NOT INDEXED?

  Or would NOT INDEXED force a full table-scan no matter what, even if
  the query had something like "WHERE ROWID = 43" ?



  Also, reading the MySQL docs, it sounds like their syntax is only
  applied to JOIN operations (i.e. none of the examples above).  I'm
  not sure that implies that it forces JOINs to be first or not (given
  an index or NOT given an index).  Since different clauses in a SELECT
  may need to access data from the tables in different ways, it seems
  that telling the system to use or not use an index is only half the
  issue.  You also need to be able to tell the query optimizer what
  you want the index used for (since, in general, the index can only
  be used for the first operation).

  For example, the system might pull out a long series of rows "in
  order" via an index and then join them to a much smaller data set, or
  it might join two large tables with a small result set via in index
  and then sort them.  Even if you tell the optimizer to use (or not
  use) an index, that's only half the story if you can't tell it what
  order you want the operations performed in.

  Or are SELECT operations more or less set in a fixed order for SQLite?

> I further propose that if the specified index cannot be used, the  
> sqlite3_prepare_v2() call will fail with an error.  In other words,  
> the new syntax is a requirement, not a hint.

  Given that the SQLite query optimizer doesn't have access to a large
  amount of (expensive to maintain) statistics, there are always going
  to be limitation in what it can do.  This seems like a good balance
  between simple system design and real-world tuning needs.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a
ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Stephen Oberholtzer
Since someone  mentioned only the support in passing, not the details, I
thought I'd throw out this:

Microsoft SQL Server using the following syntax for hints:

SELECT * FROM FOO WITH (hints)

'hints' can included locking hints, index-usage hints, and whatnot.

I will quote some of the proposed examples and give the equivalent in MSSQL.
The reason is, language extensions are a *bitch*, because you have to do
massive regression testing on the parser to ensure backwards and forwards
compatibility.
This way, we can use whichever syntax is easier to support programmatically.
(Remember: We're smarter and more flexible than the computers.  If writing
our queries one way is far more efficient for the computer to process, we
should do that.)


>  >>   SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>
select * from tablex with(index(indexy)) where... order by...


> >  >>   SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>
>  >>   SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;
>

These two, as far as I understand, would be expressed to SQL Server
identically: "with (index(0))".


-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread P Kishor
On 9/22/08, Steve Friedman <[EMAIL PROTECTED]> wrote:
>
>  >>
>  >> There seems to be no standard SQL way of providing hints to the query
>  >> optimizer for which index to use.   Every SQL database engine does it
>  >> differently.  The MySQL approach is the simplest by far.  But even it
>  >> is more complex than is really needed.  I propose syntax for SQLite as
>  >> follows:
>  >>
>  >>   SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>  >>   SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>  >>   SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;
>  >>
>  >> I further propose that if the specified index cannot be used, the
>  >> sqlite3_prepare_v2() call will fail with an error.  In other words,
>  >> the new syntax is a requirement, not a hint.
>  >>
>  >> Comments?  Objections?
>  >>
>
>
> As a pedant, I have two comments:
>
>  - INDEX BY is a verb form.  I would think that INDEXED BY (a past
>  participle) would be more accurate syntax since no new indices are being
>  constructed.

How about USING INDEX

SELECT * FROM tablex USING INDEX indexy WHERE...;
SELECT * FROM tablex USING INDEX ROWID WHERE ...;
SELECT * FROM tablex NOT USING INDEX WHERE ... ;

>
>  - I presume that the following is not contemplated (and the
>  documentation should reflect this just to avoid surprises):
>  SELECT * FROM (some sub-query) AS t INDEX BY ...
>
>
>  Steve Friedman
>
>
>  ___
>  sqlite-users mailing list
>  sqlite-users@sqlite.org
>  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Puneet Kishor http://punkish.eidesis.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Seun Osewa
Why something like "USE INDEX a,b,c"?

On Mon, Sep 22, 2008 at 5:42 PM, Steve Friedman <[EMAIL PROTECTED]> wrote:

> As a pedant, I have two comments:
>
> - INDEX BY is a verb form.  I would think that INDEXED BY (a past
> participle) would be more accurate syntax since no new indices are being
> constructed.
>
> - I presume that the following is not contemplated (and the
> documentation should reflect this just to avoid surprises):
> SELECT * FROM (some sub-query) AS t INDEX BY ...
>
> Steve Friedman
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Seun Osewa
http://www.nairaland.com/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Steve Friedman

>>
>> There seems to be no standard SQL way of providing hints to the query  
>> optimizer for which index to use.   Every SQL database engine does it  
>> differently.  The MySQL approach is the simplest by far.  But even it  
>> is more complex than is really needed.  I propose syntax for SQLite as  
>> follows:
>>
>>   SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>>   SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>>   SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;
>>
>> I further propose that if the specified index cannot be used, the  
>> sqlite3_prepare_v2() call will fail with an error.  In other words,  
>> the new syntax is a requirement, not a hint.
>>
>> Comments?  Objections?
>>

As a pedant, I have two comments:

- INDEX BY is a verb form.  I would think that INDEXED BY (a past 
participle) would be more accurate syntax since no new indices are being 
constructed.

- I presume that the following is not contemplated (and the 
documentation should reflect this just to avoid surprises):
SELECT * FROM (some sub-query) AS t INDEX BY ...

Steve Friedman

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread John Stanton
To me this is a very rational approach.  It is simple and unambiguous to 
understand and use and simple to implement compared to the alternative 
schemes.  That fits nicely with the "lite" approach.

Ad the directive to the SQL and measure the result and the effect is 
immediately obvious.  Hard to improve on that.

D. Richard Hipp wrote:
> I am reluctant to add to SQLite the ability to explicitly specify the  
> index for a query.  I agree with Alex Scotti that the whole idea seems  
> very un-RDBMS like.
> 
> On the other hand, just because a feature is there does not mean  
> people have to use it.  The documentation can make it clear that the  
> feature should be used rarely and only be experts.  We can make  
> arrangements to omit the feature at compile-time (or perhaps to  
> require a special compile-time option to enable it.)  And, there  
> really do seem to be a few rare cases where explicitly naming the  
> index is helpfull.
> 
> There seems to be no standard SQL way of providing hints to the query  
> optimizer for which index to use.   Every SQL database engine does it  
> differently.  The MySQL approach is the simplest by far.  But even it  
> is more complex than is really needed.  I propose syntax for SQLite as  
> follows:
> 
>   SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>   SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>   SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;
> 
> I further propose that if the specified index cannot be used, the  
> sqlite3_prepare_v2() call will fail with an error.  In other words,  
> the new syntax is a requirement, not a hint.
> 
> Comments?  Objections?
> 
> D. Richard Hipp
> [EMAIL PROTECTED]
> 
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Shawn Wilsher
On Mon, Sep 22, 2008 at 8:23 AM, D. Richard Hipp <[EMAIL PROTECTED]> wrote:

> In the two high-profile use cases, the programmers already have the
> statement using the "correct" index without an INDEX BY clause.  They
> just want to be alerted if some future schema change alters the index
> choice, perhaps by deleting one of the indexes that were being used.
> If the INDEX BY clause becomes a hint, then this function of the
> clause is removed.  And without the impetus of those two high-profile
> use cases, the functionality will not be added at all.  So, I am
> offering this choice:  (1) The ability to select and index with an
> error if that index won't work and (2) no new capabilities at all.

I can see option (1) being useful to Mozilla, so we'd like to see that if
possible.

Cheers,

Shawn Wilsher
Mozilla Developer
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread D. Richard Hipp

On Sep 22, 2008, at 10:53 AM, Jeffrey Becker wrote:

>   I think the policy other dbms
> systems have of making these things hints rather than requirements is
> a good one because it still allows the query optimizer to make the
> best choice when the hints its given become incorrect.

If you want the query optimizer to make the choice, omit the INDEX BY  
clause all together.  If you include an INDEX BY clause, that is  
saying that you the programmer know better than the optimizer and the  
optimizer is not to second-guess you.

In the two high-profile use cases, the programmers already have the  
statement using the "correct" index without an INDEX BY clause.  They  
just want to be alerted if some future schema change alters the index  
choice, perhaps by deleting one of the indexes that were being used.   
If the INDEX BY clause becomes a hint, then this function of the  
clause is removed.  And without the impetus of those two high-profile  
use cases, the functionality will not be added at all.  So, I am  
offering this choice:  (1) The ability to select and index with an  
error if that index won't work and (2) no new capabilities at all.

> I'd prefer to
> see some sort of programmatic method of doing this.  The method I'd
> find ideal would be to have some sort of sqlite_suggest_* api which
> would allow a user to apply hints to an already prepared statement.

The indices are already chosen by the time the sqlite3_stmt is  
constructed.  It is too late to offer hints after the fact.

>  I
> forget if sqlite_stmt keeps a copy of the sql so I may well be
> suggesting the impossible here.  The api would reinforce the
> non-standard nature of the action while keeping the sql dialect free
> of non-standard sql.

D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Jay A. Kreibich
On Mon, Sep 22, 2008 at 10:07:54AM -0400, D. Richard Hipp scratched on the wall:
> I am reluctant to add to SQLite the ability to explicitly specify the  
> index for a query.  I agree with Alex Scotti that the whole idea seems  
> very un-RDBMS like.

  Well it is outside of the Relational Model, that's for sure.
  
  Then again, the whole concept of indexes are outside of the
  Relational Model.

> There seems to be no standard SQL way of providing hints to the query  
> optimizer for which index to use.   Every SQL database engine does it  
> differently.  The MySQL approach is the simplest by far. 

  Since the FROM clause is the only area of a SELECT statement where
  you're always dealing with fully-realized tables (and those are the
  only objects that can be indexed) it does seem like the most direct
  and cleanest approach.

>   SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>   SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>   SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;

  How are the last two different?
  
  It seems that NOT INDEXED would use a table scan, which is essentially
  walking the ROWID b-tree, while "INDEX BY ROWID" is going to use the
  "index" on ROWID, which isn't really an index, but the table's root
  b-tree-- e.g. the ROWID b-tree.

  Unless the "BY ROWID" is just some kind of alias for "the primary b-tree"
  and can be used regardless of if there is a INTEGER PRIMARY KEY or not?
  But wouldn't that still be the same as NOT INDEXED?

  Or would NOT INDEXED force a full table-scan no matter what, even if
  the query had something like "WHERE ROWID = 43" ?



  Also, reading the MySQL docs, it sounds like their syntax is only
  applied to JOIN operations (i.e. none of the examples above).  I'm
  not sure that implies that it forces JOINs to be first or not (given
  an index or NOT given an index).  Since different clauses in a SELECT
  may need to access data from the tables in different ways, it seems
  that telling the system to use or not use an index is only half the
  issue.  You also need to be able to tell the query optimizer what
  you want the index used for (since, in general, the index can only
  be used for the first operation).

  For example, the system might pull out a long series of rows "in
  order" via an index and then join them to a much smaller data set, or
  it might join two large tables with a small result set via in index
  and then sort them.  Even if you tell the optimizer to use (or not
  use) an index, that's only half the story if you can't tell it what
  order you want the operations performed in.

  Or are SELECT operations more or less set in a fixed order for SQLite?

> I further propose that if the specified index cannot be used, the  
> sqlite3_prepare_v2() call will fail with an error.  In other words,  
> the new syntax is a requirement, not a hint.

  Given that the SQLite query optimizer doesn't have access to a large
  amount of (expensive to maintain) statistics, there are always going
  to be limitation in what it can do.  This seems like a good balance
  between simple system design and real-world tuning needs.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2

2008-09-22 Thread Jeffrey Becker
First, I have to agree that this is very 'un-rdbmsish'.  I understand
that sometimes the programmer really does know better than the DB
engine which indexes it should use.  However, the RDBMS is
fundamentally an abstraction layer.  I think the policy other dbms
systems have of making these things hints rather than requirements is
a good one because it still allows the query optimizer to make the
best choice when the hints its given become incorrect. I'd prefer to
see some sort of programmatic method of doing this.  The method I'd
find ideal would be to have some sort of sqlite_suggest_* api which
would allow a user to apply hints to an already prepared statement.  I
forget if sqlite_stmt keeps a copy of the sql so I may well be
suggesting the impossible here.  The api would reinforce the
non-standard nature of the action while keeping the sql dialect free
of non-standard sql.

On Mon, Sep 22, 2008 at 10:07 AM, D. Richard Hipp <[EMAIL PROTECTED]> wrote:
> I am reluctant to add to SQLite the ability to explicitly specify the
> index for a query.  I agree with Alex Scotti that the whole idea seems
> very un-RDBMS like.
>
> On the other hand, just because a feature is there does not mean
> people have to use it.  The documentation can make it clear that the
> feature should be used rarely and only be experts.  We can make
> arrangements to omit the feature at compile-time (or perhaps to
> require a special compile-time option to enable it.)  And, there
> really do seem to be a few rare cases where explicitly naming the
> index is helpfull.
>
> There seems to be no standard SQL way of providing hints to the query
> optimizer for which index to use.   Every SQL database engine does it
> differently.  The MySQL approach is the simplest by far.  But even it
> is more complex than is really needed.  I propose syntax for SQLite as
> follows:
>
>  SELECT * FROM tablex INDEX BY indexy WHERE ... ORDER BY ...;
>  SELECT * FROM tablex INDEX BY ROWID WHERE ... ORDER BY ...;
>  SELECT * FROM tablex NOT INDEXED WHERE ... ORDER BY ...;
>
> I further propose that if the specified index cannot be used, the
> sqlite3_prepare_v2() call will fail with an error.  In other words,
> the new syntax is a requirement, not a hint.
>
> Comments?  Objections?
>
> D. Richard Hipp
> [EMAIL PROTECTED]
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users