Re: [sqlite] Parallel reading can be slow on APFS

2018-11-12 Thread Jens Alfke


> On Oct 29, 2018, at 7:13 PM, Simon Slavin  wrote:
> 
> This post is about a problem with Apple's new APFS file system.  The problem 
> will affect you only if you have multiple reads/writes happening at the same 
> time.  The problem involves merely slowing of performance, not corruption of 
> databases or incorrect answers being returned by SQLite.

I finally got around to reading the article — thanks for posting the link, 
Simon.

However, I don’t think the issue described in the article is relevant to SQLite 
at all. The article is specifically about slowdowns in the readdir system call, 
not file I/O. ("I haven't conducted a comprehensive analysis of APFS to 
determine what other filesystem operations seem to acquire global kernel locks: 
all I know is readdir() does.”) But there are no calls to readdir in the SQLite 
3.25 source code!

Second, it’s good to see that the situation has been improved in macOS 10.14. 
"It is apparent that macOS 10.14 Mojave has received performance work relative 
to macOS 10.13! Overall kernel CPU time when performing parallel directory 
walks has decreased substantially - to ~50% of original on some invocations!” 
However, "Despite those improvements, APFS is still spending a lot of CPU time 
in the kernel.”

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Query regression with virtual tables

2018-11-12 Thread Hick Gunter
On upgrading from 3.7.14 to 3.24 (at last) I found a very detrimental change in 
the query implementation for a certain type of query:

SELECT <...> FROM  WHERE a IN () AND b IN () AND 
c =  AND timestamp between  AND  ORDER BY timestamp 
DESC LIMIT ,;

In 3.7.14 the xBestIndex function was called with 3 constraints 
{(c,'='),(timestamp, '<='),(timestamp,'>=)}. The function sets argvIndex values 
and OMIT flags for all three constraints, an index number and the 
orderByConsumed flag.

The created bytecode is

- create & populate ephemeral table a
- create & populate ephemeral table b
- loop over virtual table
-- check field a value IN ephemeral table a
-- check field b value IN ehpemeral table b
-next

In 3.24 the xBestIndex fuction is called with two additional constraints 
{(a,'='),(b.'=')}. The function sets argvIndex values and OMIT flags for all 
five constraints, an index number and the orderByConsumed flag.

The created bytecode is

- create & populate ephemeral table a
- create & populate ephemeral table b
- loop over ephemeral table a
-- loop over ephemeral table b
--- loop over virtual table
- sort result set
- return LIMIT rows


The virtual table is actually a partitioned table that will search only the 
partitions that cover the selected range of timestamps and implements merge 
logic. It will therefore accept and pass on addtional constraints to the 
subquery against the partition members. Unfortunately, the NGQP seems to be 
asking about a join with ephemeral tables, which precludes using CROSS JOIN to 
force a performant query plan that returns correctly ordered result rows.

Of course I could rewrite this as a sequence of statements approximately like:

BEGIN;
CREATE TEMP TABLE a_values AS VALUES()...
CREATE TEMP TABLE b_values AS VALUES()...
SELECT <...> FROM  CROSS JOIN a_values ON (a) CROSS JOIN 
b_values ON (b) WHERE c =  AND timestamp between  AND  
ORDER BY timestamp DESC LIMIT ,;
DROP TABLE a_values;
DROP TABLE b_values;
COMMIT;

But this results in two nested loops (even if an index is added on teach temp 
table). How can I get the ephemeral table lookups back?


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users