Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-18 Thread niklas
Thanks to all who have replied, very informative! :)

This is just a database for own personal use so it's not a big deal in any
way, mainly trying to get a better understanding of how Sqlite works here.

I'll note that the sql queries are not static inside my application but they
are generated dynamically from command line arguments. 
Basically any column can be added to the SELECT, WHERE and ORDER BY clauses
at runtime, and some columns will be generated from sub-queries (via JOIN:s)
as shown in first post.

As the correlated sub-queries will be executed twice if used in the WHERE
clause it seems that using JOIN:s is preferable for my use cases.

Some further experimentation shows that using LEFT JOIN for the subqueries
instead of INNER JOIN will always make the query planner do the "right"
thing, i.e. use automatic indexes instead of table scans. Regardless of
ANALYZE information being present or not. 

So that is maybe a better work-around than removing the ANALYZE tables. LEFT
JOIN and INNER JOIN will always return the same results in this case as all
books will (or should) have dates, genres and authors, and if they do not
then I probably want LEFT JOIN semantics anyway to better notice it. I
currently use INNER JOIN to give the query planner more freedom in selecting
query plans. 

Still would be interesting to know why Sqlite went for plain table scans in
the initial case. Seems that using automatic indexes will always be faster
(N*logN vs N*N), so why not always use them when possible? Acccording to the
documentation Sqlite assumes N is a million without analyze information, and
in that case it opted to generate automatic indexes. In my case with ANALYZE
information present N will be around 3000, and then it opted for table
scans. The final query took over 24 minutes with all 3 sub-query columns
present when run to completion though, so obviously the wrong choice since
the loops ended up being nested three or more times.

(I understand that the query planner must take many different scenarios into
account and generate good plans for all of them, so this is most likely an
unfortunate edge case.)

Also noticed another case where Sqlite uses nested table scanning for JOIN:s
and this time it was not instead of automatic indexes, apparently it opted
for nested scans to avoid using a temp b-tree in the ORDER BY. (This is part
of co-routine for a window function using the AuthorID for partitions.)

Slow nested table scan (execution time measured in seconds):

|  |  |--SCAN TABLE Authors
|  |  |--SCAN TABLE DatesRead
|  |  |--SEARCH TABLE AuthorBooks USING PRIMARY KEY (BookID=? AND
AuthorID=?)
|  |  |--SEARCH SUBQUERY 1 ...
|  |  `--SEARCH SUBQUERY 2 ...

vs temp b-tree (execution time measured in milliseconds):

|  |  |--SCAN TABLE AuthorBooks
|  |  |--SEARCH TABLE Authors USING INTEGER PRIMARY KEY (rowid=?)
|  |  |--SEARCH TABLE DatesRead USING PRIMARY KEY (BookID=?)
|  |  |--SEARCH SUBQUERY 1 ...
|  |  |--SEARCH SUBQUERY 2 ...
|  |  `--USE TEMP B-TREE FOR ORDER BY

This is part of a larger query but I have not included all tables used in it
so just show parts that differ here, everything else in the two queries is
identical.
Dropping the ANALYZE information makes the query planner select the faster
alternative here as well.
I can provide more information about the query in case anyone is interested,
just included these parts now to illustrate the "problematic" nested scans.

(Not really that problematic, this just came up in a test that iterated over
all supported columns for all main queries, in actual use of the application
I would 
hardly run it, but still an interesting case I think.)




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-17 Thread Keith Medcalf

On Sunday, 17 March, 2019 11:19, niklas  wrote:

>I agree that correlated subqueries in general seem more natural and
>are probably also less likely to have the performance pessimizations
>noticed with joins.

>But I might also want to use the column, or in case of a correlated
>subquery, the column alias, in the WHERE clause and previously that
>has not been supported as I recall. Maybe also not allowed by SQL
>specification itself?

>I modified the suggested query and used the column alias in WHERE now
>though, and it seemed to work!

Yes and no and it "seemed to work" is an adequate description.  You have to 
realize that the list of things to get (the SELECT arguments) are executed only 
after the WHERE clause is executed.  That is to say, the semantics of the 
SELECT statement is:

SELECT some stuff
  FROM 
 WHERE 

so that except in the case where the alias in the select list is a simple alias 
for a column name, the computation will be made twice.  So for example if you 
did something like:

  select BookID,
 Title,
 (
   SELECT group_concat(ltrim("First Name" || ' ' || "Last Name"),',')
 FROM AuthorBooks JOIN Authors USING (AuthorID)
WHERE BookID == Books.BookID
 ) as "Author(s)",
 (
   SELECT group_concat("Date read",', ')
 FROM DatesRead
WHERE BookID == Books.BookID
 ) as "Date(s)",
 (
   SELECT group_concat(Genre,', ')
 FROM BookGenres JOIN Genres USING (GenreID)
WHERE BookID == Books.BookID
 ) AS "Genre(s)"
FROM Books
   WHERE "Author(s)" IN NOT NULL
 AND "Date(s)" IS NOT NULL
 AND "Genre(s)" IS NOT NULL
ORDER BY BookID;

then you are executing the correlated subquery's TWICE each, once for the WHERE 
clause and once for the SELECT clause.  If you want to ensure that those values 
are not null, and do not want to execute the correlates twice, you need to do 
something like this which will execute the correlates only for the books that 
would not have null results is those three columns (without doing the duplicate 
group_concat).

But the inner join of the subqueries will still be more performant IF THE QUERY 
PLANNER USES AUTOMATIC INDEXES.

  select BookID,
 Title,
 (
   SELECT group_concat(ltrim("First Name" || ' ' || "Last Name"),',')
 FROM AuthorBooks JOIN Authors USING (AuthorID)
WHERE BookID == Books.BookID
 ) as "Author(s)",
 (
   SELECT group_concat("Date read",', ')
 FROM DatesRead
WHERE BookID == Books.BookID
 ) as "Date(s)",
 (
   SELECT group_concat(Genre,', ')
 FROM BookGenres JOIN Genres USING (GenreID)
WHERE BookID == Books.BookID
 ) AS "Genre(s)"
FROM Books
   WHERE BookID in (
   SELECT BookID FROM AuthorBooks JOIN Authors USING 
(AuthorID)
INTERSECT
   SELECT BookID FROM DatesRead
INTERSECT
   SELECT BookID FROM BookGenres JOIN Genres USING (GenreID)
   )
ORDER BY BookID;

>Is this a recent change in Sqlite or have I misunderstood something?
>The Sqlite documentation still does not seem to say that column aliases
>can be used in the WHERE clause at least.

Well, I think this was added somewhere along the way.  Remember they are 
ALIASES and the original text is substituted for them.

Of course, you could always just retrieve all the data and ignore the rows you 
do not want at the application level ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-17 Thread Simon Slavin
On 17 Mar 2019, at 5:19pm, niklas  wrote:

> Is this a recent change in Sqlite or have I misunderstood something? The 
> Sqlite documentation still does not seem to say that column aliases can be 
> used in the WHERE clause at least.

You are correct in two things.  Column aliases cannot be relied on inside the 
WHERE clause or any other clause.  You should imagine that they are used only 
when the SELECT statement is ready to return values.

And also that some implementations of SQL allow the 'AS' names to be used for 
other purposes, but this is not in the SQL specification.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-17 Thread niklas
I agree that correlated subqueries in general seem more natural and are
probably also less likely to have the performance pessimizations noticed
with joins.

But I might also want to use the column, or in case of a correlated
subquery, the column alias, in the WHERE clause and previously that has not
been supported as I recall. Maybe also not allowed by SQL specification
itself?

I modified the suggested query and used the column alias in WHERE now
though, and it seemed to work! 

Is this a recent change in Sqlite or have I misunderstood something? The
Sqlite documentation still does not seem to say that column aliases can be
used in the WHERE clause at least.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-16 Thread Simon Slavin
On 15 Mar 2019, at 7:02pm, niklas  wrote:

> The data used for sqlite_stat1 in create.txt is taken from the real data, 
> it's copied from the sql-dump generated just after running ANALYZE.

Okay.  I should have guessed that.  Sorry for doubting you.

You seem to have figured out a work-around for now.  Good luck with it.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-16 Thread niklas
The data used for sqlite_stat1 in create.txt is taken from the real data,
it's copied from the sql-dump generated just after running ANALYZE.

I only wanted to include the minimum amount of data the demonstrate the
issue so I omitted all other tables, views and data. As I understand it
sqlite only checks the statN table data when planning the queries and not
the actual data in the real tables.

I have dropped the sqlite_stat1 table for now and so disabled analyze
functionality. This way I avoid the slow queries. Running ANALYZE on the
database again will immediately make them return though.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-15 Thread Keith Medcalf

Of course, the correlated version returns all books, not just the ones that 
have at least one author, were read at least once, and have at least one genre. 
 

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf
>Sent: Friday, 15 March, 2019 14:44
>To: SQLite mailing list
>Subject: Re: [sqlite] Query planner: Scanning subqueries vs using
>automatic covering index
>
>
>The current tip of trunk produced the same results demonstrated by
>Niklas in his original post for his original testcase for me.
>
>I would have written the query as a simple query with correlated
>subqueries to get the concatenated data as it is a "more natural"
>declaration of what is wanted (in my opinion anyway), rather than re-
>writing into joins.
>
>  select BookID,
> Title,
> (
>   SELECT group_concat(ltrim("First Name" || ' ' || "Last
>Name"),',')
> FROM AuthorBooks JOIN Authors USING (AuthorID)
>WHERE BookID == Books.BookID
> ) as "Author(s)",
> (
>   SELECT group_concat("Date read",', ')
> FROM DatesRead
>WHERE BookID == Books.BookID
> ) as "Date(s)",
> (
>   SELECT group_concat(Genre,', ')
> FROM BookGenres JOIN Genres USING (GenreID)
>WHERE BookID == Books.BookID
> ) AS "Genre(s)"
>FROM Books
>ORDER BY BookID;
>
>---
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>>-Original Message-
>>From: sqlite-users [mailto:sqlite-users-
>>boun...@mailinglists.sqlite.org] On Behalf Of niklas
>>Sent: Friday, 15 March, 2019 01:36
>>To: sqlite-users@mailinglists.sqlite.org
>>Subject: [sqlite] Query planner: Scanning subqueries vs using
>>automatic covering index
>>
>>I recently noticed some very slow queries for my sqlite book
>>database.
>>Turns out that it happened due to the query planner decided to scan
>>sub-queries instead of using an automatic covering index to search
>>them.
>>
>>The database contains about 3000 entries and with two subqueries it
>>took a noticable time to run, with three subqueries I had to give up
>>and
>>abort to save the CPU from overheating.
>>
>>Using indexes only take milliseconds in all cases.
>>
>>Dropping analyze (sqlite_stat1) information from the database made
>>the problem go away, i.e. made sqlite use index instead of scanning.
>>As did adding a few more columns to the ORDER BY part.
>>
>>These sql statements should be enough to demonstrate the problem.
>>I am using sqlite 3.25.1
>>
>>**
>>create.txt
>>**
>>
>>BEGIN;
>>
>>CREATE TABLE "Authors" (
>>"AuthorID" INTEGER PRIMARY KEY,
>>"Last Name" TEXT NOT NULL,
>>"First Name" TEXT NOT NULL);
>>
>>CREATE TABLE "Books" (
>>"BookID" INTEGER PRIMARY KEY,
>>"Title" TEXT NOT NULL);
>>
>>CREATE TABLE "Genres"(
>>"GenreID" INTEGER PRIMARY KEY,
>>"Genre" TEXT UNIQUE NOT NULL);
>>
>>CREATE TABLE "DatesRead"(
>>"BookID" INTEGER,
>>"Date Read" TEXT,
>>PRIMARY KEY("BookID", "Date Read"),
>>FOREIGN KEY(BookID) REFERENCES Books(BookID)) WITHOUT ROWID;
>>
>>CREATE TABLE "AuthorBooks"(
>>"BookID" INTEGER,
>>"AuthorID" INTEGER,
>>PRIMARY KEY("BookID", "AuthorID" ),
>>FOREIGN KEY(BookID) REFERENCES Books(BookID),
>>FOREIGN KEY(AuthorID) REFERENCES Authors(AuthorID)) WITHOUT
>>ROWID;
>>
>>CREATE TABLE "BookGenres"(
>>"BookID" INTEGER,
>>"GenreID" INTEGER,
>>PRIMARY KEY("BookID", "GenreID" ),
>>FOREIGN KEY(BookID) REFERENCES Books(BookID),
>>FOREIGN KEY(GenreID) REFERENCES Genres(GenreID)) WITHOUT ROWID;
>>
>>ANALYZE;
>>INSERT INTO sqlite_stat1 VALUES('BookGenres','BookGenres','3190 2
>>1');
>>INSERT INTO sqlite_stat1 VALUES('DatesRead','DatesRead','3047 2 1');
>>INSERT INTO sqlite_stat1 VALUES('AuthorBooks','AuthorBooks','3549 2
>>1');
>>INSERT INTO sqlite_stat1 VA

Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-15 Thread Keith Medcalf

The current tip of trunk produced the same results demonstrated by Niklas in 
his original post for his original testcase for me.

I would have written the query as a simple query with correlated subqueries to 
get the concatenated data as it is a "more natural" declaration of what is 
wanted (in my opinion anyway), rather than re-writing into joins.

  select BookID,
 Title,
 (
   SELECT group_concat(ltrim("First Name" || ' ' || "Last Name"),',')
 FROM AuthorBooks JOIN Authors USING (AuthorID)
WHERE BookID == Books.BookID
 ) as "Author(s)",
 (
   SELECT group_concat("Date read",', ')
 FROM DatesRead
WHERE BookID == Books.BookID
 ) as "Date(s)",
 (
   SELECT group_concat(Genre,', ')
 FROM BookGenres JOIN Genres USING (GenreID)
WHERE BookID == Books.BookID
 ) AS "Genre(s)"
FROM Books
ORDER BY BookID;

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of niklas
>Sent: Friday, 15 March, 2019 01:36
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] Query planner: Scanning subqueries vs using
>automatic covering index
>
>I recently noticed some very slow queries for my sqlite book
>database.
>Turns out that it happened due to the query planner decided to scan
>sub-queries instead of using an automatic covering index to search
>them.
>
>The database contains about 3000 entries and with two subqueries it
>took a noticable time to run, with three subqueries I had to give up
>and
>abort to save the CPU from overheating.
>
>Using indexes only take milliseconds in all cases.
>
>Dropping analyze (sqlite_stat1) information from the database made
>the problem go away, i.e. made sqlite use index instead of scanning.
>As did adding a few more columns to the ORDER BY part.
>
>These sql statements should be enough to demonstrate the problem.
>I am using sqlite 3.25.1
>
>**
>create.txt
>**
>
>BEGIN;
>
>CREATE TABLE "Authors" (
>"AuthorID" INTEGER PRIMARY KEY,
>"Last Name" TEXT NOT NULL,
>"First Name" TEXT NOT NULL);
>
>CREATE TABLE "Books" (
>"BookID" INTEGER PRIMARY KEY,
>"Title" TEXT NOT NULL);
>
>CREATE TABLE "Genres"(
>"GenreID" INTEGER PRIMARY KEY,
>"Genre" TEXT UNIQUE NOT NULL);
>
>CREATE TABLE "DatesRead"(
>"BookID" INTEGER,
>"Date Read" TEXT,
>PRIMARY KEY("BookID", "Date Read"),
>FOREIGN KEY(BookID) REFERENCES Books(BookID)) WITHOUT ROWID;
>
>CREATE TABLE "AuthorBooks"(
>"BookID" INTEGER,
>"AuthorID" INTEGER,
>PRIMARY KEY("BookID", "AuthorID" ),
>FOREIGN KEY(BookID) REFERENCES Books(BookID),
>FOREIGN KEY(AuthorID) REFERENCES Authors(AuthorID)) WITHOUT
>ROWID;
>
>CREATE TABLE "BookGenres"(
>"BookID" INTEGER,
>"GenreID" INTEGER,
>PRIMARY KEY("BookID", "GenreID" ),
>FOREIGN KEY(BookID) REFERENCES Books(BookID),
>FOREIGN KEY(GenreID) REFERENCES Genres(GenreID)) WITHOUT ROWID;
>
>ANALYZE;
>INSERT INTO sqlite_stat1 VALUES('BookGenres','BookGenres','3190 2
>1');
>INSERT INTO sqlite_stat1 VALUES('DatesRead','DatesRead','3047 2 1');
>INSERT INTO sqlite_stat1 VALUES('AuthorBooks','AuthorBooks','3549 2
>1');
>INSERT INTO sqlite_stat1 VALUES('Authors',NULL,'1329');
>INSERT INTO sqlite_stat1 VALUES('Books',NULL,'2978');
>INSERT INTO sqlite_stat1
>VALUES('Genres','sqlite_autoindex_Genres_1','112
>1');
>
>COMMIT;
>
>
>queries.txt
>
>
>select "";
>select "order by bookid only";
>select "";
>
>EXPLAIN QUERY PLAN
>SELECT BookID,Title,"Author(s)","Date(s)","Genre(s)"
>FROM Books
>JOIN (SELECT BookID, group_concat(ltrim("First Name"||' '||"Last
>Name"),',
>') AS 'Author(s)' FROM Books JOIN AuthorBooks USING(BookID) JOIN
>Authors
>USING(AuthorID) GROUP BY BookID) USING(BookID)
>JOIN (SELECT BookID, group_concat("Date read",', ') AS 'Date(s)' FROM
>Books
>JOIN DatesRead USING(BookID) GROUP BY BookID) USING(BookID)
>JOIN (SELECT BookID, group_concat(Genre,', ') AS 'Genre(s)' FROM
>Books JOIN
>BookGenres USING(BookID) JOIN Genres USING(GenreID) GROUP BY BookID)
>USING(BookID)
>ORDER BY BookID;
>
>select "---";
>select "order by bookid and some other fields too";
>select "--";
>
>EXPLAIN QUERY PLAN
>SELECT BookID,Title,"Author(s)","Date(s)","Genre(s)"
>FROM Books
>JOIN (SELECT BookID, group_concat(ltrim("First Name"||' '||"Last
>Name"),',
>') AS 'Author(s)' FROM Books JOIN AuthorBooks USING(BookID) JOIN
>Authors
>USING(AuthorID) GROUP BY BookID) USING(BookID)
>JOIN (SELECT BookID, group_concat("Date read",', ') AS 'Date(s)' FROM
>Books
>JOIN DatesRead USING(BookID) GROUP BY BookID) USING(BookID)
>JOIN (SELECT BookID, group_concat(Genre,', ') AS 'Genre(s)' 

Re: [sqlite] Query planner: Scanning subqueries vs using automatic covering index

2019-03-15 Thread Simon Slavin
On 15 Mar 2019, at 7:35am, niklas  wrote:

> Dropping analyze (sqlite_stat1) information from the database made 
> the problem go away, i.e. made sqlite use index instead of scanning. 

Instead of dropping ANALYZE information, did you try updating it instead, by 
running ANALYZE again ?

Ah ...

> ANALYZE;
> INSERT INTO sqlite_stat1 VALUES('BookGenres','BookGenres','3190 2 1');

How do we know your values for sqlite_stat1 are right ?  Why aren't you just 
leaving ANALZE to do its job ?

Also, you are missing out on part of what ANALYZE does.  As well as looking at 
your schema, ANALYZE also considers the 'chunkiness' of the data in each 
indexed column.  Run it with data in the table, with the data as convincing as 
possible in terms of number of rows and the values in each column.

In your create.txt I would insert data in your tables, then put ANALYZE just 
before COMMIT.

Curious to know if this improves matters for you.  I'm not certain.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users