Re: [sqlite] Efficient query to count number of leaves in a DAG.
If a different perspective may be helpful to you: If moving overhead to writes is an option (ie you dont have many or time critical writes), then the tree descendants problem can be sped up to stellar speeds by using a path column. IE. add a column "path" in the nodes table that would contain something like "1.2.3" for node 3 that is descendant of node 2 that is descendant of node 1. Then querying 2's descendant would result in the following range: "path">='1.2.' AND "path"<'1.2/' or you can try using LIKE semantics - both can use an index if you are careful with collation; I found out the hard way that the range queries are more resilient and portable, SQLite and others have a pretty awkward way of plugging in the LIKE optimisations that may result in the index being skipped for not-so-obvious reasons. Inserting nodes is trivial, but moving edges requires an algorithm to update paths (whenever a node's parent changes, all descendant's paths must be updated). However, for most real-world ontology use scenarios, this opperation happens very rarely and usually on the admin range of functions, so you can afford this operation that can be pretty slow. -- Sent from: http://sqlite.1065341.n5.nabble.com/ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Shane Dev wrote: > CREATE TABLE nodes(id integer primary key, description text); > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent, child)); > > select * from nodes where not exists (select * from edges where > child=nodes.id); > > This works but is very slow when there are a million nodes and edges. > I thought an index on edges(child) might help > but it didn't The index has the wrong affinity; the edges.child column must be integer. The following query would be simpler, and does not need the index (because SQLite always creates a temporary index for the lookup anyway): select * from nodes where id not in (select child from edges); Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch: > It is indeed possible to change the query so that SQLite uses rowid > lookups for the R-tree filter (INDEX 1). However, any likelihood on the > R-tree search expression still did not make any difference. Do you have > an example? Try: --- CREATE TABLE t(id INTEGER PRIMARY KEY,x INTEGER,y INTEGER,z INTEGER); CREATE VIRTUAL TABLE i USING rtree(id,minx,maxx,miny,maxy); CREATE INDEX t_x ON t(x); --- EXPLAIN QUERY PLAN SELECT * FROM t INNER JOIN i USING(id) WHERE LIKELIHOOD(i.minX>=-81.08 AND i.maxX<=-80.58 AND i.minY>=35.00 AND i.maxY<=35.44, 0.999) -- 0.999 AND LIKELIHOOD(t.x=3, 0.001); -- 0.001 -> SEARCH TABLE t USING INDEX t_x (x=?) -> SCAN TABLE i VIRTUAL TABLE INDEX 1: --- SELECT * FROM t INNER JOIN i USING(id) WHERE LIKELIHOOD(i.minX>=-81.08 AND i.maxX<=-80.58 AND i.minY>=35.00 AND i.maxY<=35.44, 0.001) -- 0.001 AND LIKELIHOOD(t.x=3, 0.999); -- 0.999 -> SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B1D2B3 -> SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?) --- Tested with SQLite 3.13.0 here, but IIRC newer versions behave the same. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
On Mon, Jan 1, 2018 at 10:45 AM, Clemens Ladischwrote: > Wolfgang Enzinger wrote: > > First, query the overall extent of your data, like this: > > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM > flst_shape_index; > > This results in a full table scan. Instead of caching these values > manually, > it would be a better idea to read them from the index: > > SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1; > > (rtreenode() is undocumented; maybe you should use your own decoder.) > Intriguing! I'm looking at the output on my local data set (all Texas roads, 3+ million records, large, but not all that much in GIS systems): SELECT rtreenode(2, data) FROM rtree_tiger_shp_geom_node WHERE nodeno = 1; {26338 -100.598 -96.2836 25.8411 30.6962} {26337 -96.9349 -93.5195 28.598 31.1915} {43295 -96.9621 -93.6133 30.6486 33.9442} {62086 -106.644 -98.0079 28.887 32.1568} {80094 -97.6371 -96.4565 28.5382 33.9419} {97065 -98.5755 -97.2869 28.686 34.1405} {108645 -104.096 -98.3331 31.6511 36.5007} So, if I'm guessing right, each set of numbers is a count of features and then the rectangle intersecting them? I've prepared a picture with the data: https://drive.google.com/file/d/15WpyfWNKdVeBoLDDRCoCW2xx87T6qf_Z/view?usp=sharing This query is indeed quite a bit faster and more interesting than scanning the entire index virtual table, wondering, is there also a way to get the "next level" of the rtree? This one returns only a few records, I would not mind having a bit more in memory for more accurate pre-checks (dabase access can be optimized out fully if the requested rectangle falls outside of the rtree boxes no?) What worries me is that these functions are undocumented, and thus, I imaging, unsupported and at risk of being removed at any time. Do you have a feeling of how likely is this to happen? > > > Let SQLite know about that likelihood in a JOIN query > The query is being generated after translation from another generic filter language and at the moment it's expressed as a sub-query (cause it can be negated, related with other conditions by and/or, and so on). I've tried to use it with little luck, the subquery is always run according to explain plan: sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp" WHERE likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >= 30.532285631231357 AND r.miny <= 30.595068029284644), *0.01*); 0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?) 0|0|0|EXECUTE LIST SUBQUERY 1 1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2 sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp" WHERE likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >= 30.532285631231357 AND r.miny <= 30.595068029284644), *0.99*); 0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?) 0|0|0|EXECUTE LIST SUBQUERY 1 1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2 Oh well, since the code is generating the SQL, it can directly omit the subquery. Is there any literature on what a good threshold would be? Cheers Andrea ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?
Richard. Please consider adding capturing groups during your upgrade of the regexp.c matching capability. In addition to the adding a powerful new capability to all SQLite expressions, it would be very instructive to see how your code obtains the cached object for a pair of captured group accessors such as regexp_group_count(R) and regexp_group(R,I) where R is the common regexp and I is the index of the captured group text to be retrieved. Peter ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?
On 1/1/18, peternwrote: > Richard. Please consider adding capturing groups during your upgrade of > the regexp.c matching capability. I did consider that. It seems hard to do in linear time. I also notice that neither JavaScript nor AWK support that capability. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
On 01-01-18 03:14, Shane Dev wrote: > Hello, > > I have a directed acyclic graph defined as follows - > > sqlite> .sch > CREATE TABLE nodes(id integer primary key, description text); > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent, child)); > > Now I want to find the "roots" of the graph - i.e nodes which are not > children of other nodes - > > select * from nodes where not exists (select * from edges where child= > nodes.id); > > This works but is very slow when there are a million nodes and edges. Changing this to: select * from nodes where not exists (select 1 from edges where child= nodes.id); saved in my test about 10% of time > Is there any way to speed it up? ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
This query will use the index you proposed. SELECT * FROM nodes NATURAL JOIN (SELECT parent AS id FROM edges WHERE parent NOT IN (SELECT child FROM edges)); Peter On Sun, Dec 31, 2017 at 6:14 PM, Shane Devwrote: > Hello, > > I have a directed acyclic graph defined as follows - > > sqlite> .sch > CREATE TABLE nodes(id integer primary key, description text); > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent, child)); > > Now I want to find the "roots" of the graph - i.e nodes which are not > children of other nodes - > > select * from nodes where not exists (select * from edges where child= > nodes.id); > > This works but is very slow when there are a million nodes and edges. > > Looking at the query plan - > > sqlite> explain query plan select * from nodes where not exists (select * > from edges where child=nodes.id); > selectidorder fromdetail > 0 0 0 SCAN TABLE nodes > 0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1 > 1 0 0 SCAN TABLE edges > > I thought an index on edges(child) might help > > sqlite> CREATE INDEX iedges on edges(child); > > but it didn't - > > sqlite> explain query plan select * from nodes where not exists (select * > from edges where child=nodes.id); > selectidorder fromdetail > 0 0 0 SCAN TABLE nodes > 0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1 > 1 0 0 SCAN TABLE edges > > Is there any way to speed it up? > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
On 01-01-18 12:18, Luuk wrote: > On 01-01-18 03:14, Shane Dev wrote: >> Hello, >> >> I have a directed acyclic graph defined as follows - >> >> sqlite> .sch >> CREATE TABLE nodes(id integer primary key, description text); >> CREATE TABLE edges(parent not null references nodes, child not null >> references nodes, primary key(parent, child)); >> >> Now I want to find the "roots" of the graph - i.e nodes which are not >> children of other nodes - >> >> select * from nodes where not exists (select * from edges where child= >> nodes.id); >> >> This works but is very slow when there are a million nodes and edges. > Changing this to: > > select * from nodes where not exists (select 1 from edges where child= > nodes.id); > > saved in my test about 10% of time > >> Is there any way to speed it up? Above is the same as what Clemens already mentioned ... sqlite> explain query plan select * from nodes where not exists (select * from edges where child=nodes.id); 0|0|0|SCAN TABLE nodes 0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1 1|0|0|SCAN TABLE edges Run Time: real 0.009 user 0.00 sys 0.00 sqlite> sqlite> explain query plan select * from nodes where not exists (select 1 from edges where child=nodes.id); 0|0|0|SCAN TABLE nodes 0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1 1|0|0|SCAN TABLE edges USING COVERING INDEX iedges Run Time: real 0.003 user 0.00 sys 0.00 sqlite> ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
Am Mon, 1 Jan 2018 10:45:50 +0100 schrieb Clemens Ladisch: > Wolfgang Enzinger wrote: >> First, query the overall extent of your data, like this: >> SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM flst_shape_index; > > This results in a full table scan. Instead of caching these values manually, > it would be a better idea to read them from the index: > > SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1; > > (rtreenode() is undocumented; maybe you should use your own decoder.) Thanks, didn't know that, I'll look into it. You're right, my query results in a full table scan, however it's pretty fast anyway - less than a second with 160,000 rows and cold cache. >> Let SQLite know about that likelihood in a JOIN query > > This does not appear to change anything with a virtual table: > > CREATE TABLE t(id INETGER PRIMARY KEY, x, [...]); > CREATE VIRTUAL TABLE i USING rtree(id, minx, maxx); > > SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND > 20, 0.01); > --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0 > --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?) > SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND > 20, 0.99); > --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0 > --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?) This is not surprising because only criteria concerning the "i" table are in effect here. So it is clear that even a likelihood of 0.99 is more selective than a likelihood of 1.000 (= no filter criteria in this table). However, if your query has criteria both in the "i" and the "t" table, it can make a difference. Of course, anybody correct me if I'm mistaken. Happy new year! Wolfgang ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Shane Dev wrote: Hi Clemens, Your query is much faster on my system - thanks! Apart from visual inspection and testing, is there anyway to be sure your query selects the same results as my query? From https://sqlite.org/queryplanner.html "When programming in SQL you tell the system what you want to compute, not how to compute it". Is this an exception to the rule where the query planner must be told how to compute the result? select * from nodes where not exists (select * from edges where child=nodes.id); select * from nodes where not exists (select 1 from edges where child=nodes.id); select * from nodes where id not in (select child from edges); The first two queries look more 'procedural' than the last. So this may confirm "When programming in SQL you tell the system what you want to compute, not how to compute it" ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Shane Dev wrote: > Apart from visual inspection and testing, is there anyway to be sure your > query selects the same results as my query? Can I interest you in things like relational algebra or tuple calculus? ;-) >>> select * from nodes where not exists (select * from edges where >>> child=nodes.id); >> select * from nodes where id not in (select child from edges); > > From https://sqlite.org/queryplanner.html "When programming in SQL you tell > the system what you want to compute, not how to compute it". Is this an > exception to the rule where the query planner must be told how to compute > the result? Arguable, the second query is on a higher abstraction level, i.e., the _first_ one tells the system how to compute it. (AFAIK SQLite is able to transform an IN lookup into a correlated subquery, if worthwhile, but not in the opposite direction.) Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Efficient query to count number of leaves in a DAG.
Shane. I sent you a query to work with the crippled schema and index you proposed for TABLE edges. Clemens then explicitly suggested you correct the schema to have use of automatic covering index. >CREATE TABLE edges(parent not null references nodes, child not null >references nodes, primary key(parent, child)); Try your leaf counter again - after making the schema changes Clemens suggested. Peter On Mon, Jan 1, 2018 at 8:13 AM, Shane Devwrote: > Hi, > > I want to the count the number of leaves (descendants without children) for > each node in a DAG > > DAG definition - > > CREATE TABLE nodes(id integer primary key, description text); > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent, child)); > > My query - > > CREATE VIEW v_count_leaves as with recursive r(id, top) as ( > select id, id from nodes > union all > select t.id, top from nodes as t, edges as e, r where e.parent=r.id and > t.id > =e.child) > select top, count(*) from r where top<>id and id not in (select parent from > edges where parent=id) group by top; > > It seems to work but is complex to understand and debug despite my aim to > keep it simple as possible, but more importantly - it is very slow when > there are more than a few thousand nodes and edges. > > It there a more efficient (and ideally simpler) way? > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Hi Clemens, Your query is much faster on my system - thanks! Apart from visual inspection and testing, is there anyway to be sure your query selects the same results as my query? From https://sqlite.org/queryplanner.html "When programming in SQL you tell the system what you want to compute, not how to compute it". Is this an exception to the rule where the query planner must be told how to compute the result? On 1 January 2018 at 10:58, Clemens Ladischwrote: > Shane Dev wrote: > > CREATE TABLE nodes(id integer primary key, description text); > > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent, child)); > > > > select * from nodes where not exists (select * from edges where child= > nodes.id); > > > > This works but is very slow when there are a million nodes and edges. > > I thought an index on edges(child) might help > > but it didn't > > The index has the wrong affinity; the edges.child column must be integer. > > The following query would be simpler, and does not need the index (because > SQLite > always creates a temporary index for the lookup anyway): > > select * from nodes where id not in (select child from edges); > > > Regards, > Clemens > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
On 01-01-18 16:52, E.Pasma wrote: > Clemens Ladisch wrote: > >> Luuk wrote: >>> On 01-01-18 03:14, Shane Dev wrote: select * from nodes where not exists (select * from edges where child=nodes.id); >>> >>> Changing this to: >>> >>> select * from nodes where not exists (select 1 from edges where >>> child=nodes.id); >>> >>> saved in my test about 10% of time >> >> Then I have to doubt your test; the generated code (see the EXPLAIN >> output) is exactly the same. > > the 3rd step of EXPLAIN changed from: 1|0|0|SCAN TABLE edges to: 1|0|0|SCAN TABLE edges USING COVERING INDEX iedges luuk@opensuse:~/tmp> sqlite3 nodes.db SQLite version 3.8.10.2 2015-05-20 18:17:19 Enter ".help" for usage hints. sqlite> .timer on sqlite> select * from nodes where not exists (select * from edges where child=nodes.id) and id<10; Run Time: real 254.400 user 254.007998 sys 0.119976 sqlite> select * from nodes where not exists (select * from edges where child=nodes.id) and id<10; Run Time: real 234.342 user 233.348752 sys 0.491637 sqlite> select * from nodes where not exists (select 1 from edges where child=nodes.id) and id<10; Run Time: real 219.904 user 218.968920 sys 0.651569 sqlite> select * from nodes where not exists (select 1 from edges where child=nodes.id) and id<10; Run Time: real 219.929 user 219.562780 sys 0.127948 sqlite> select * from nodes where not exists (select * from edges where child=nodes.id) and id<10; Run Time: real 236.423 user 234.648774 sys 1.622957 ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Efficient query to count number of leaves in a DAG.
Hi, I want to the count the number of leaves (descendants without children) for each node in a DAG DAG definition - CREATE TABLE nodes(id integer primary key, description text); CREATE TABLE edges(parent not null references nodes, child not null references nodes, primary key(parent, child)); My query - CREATE VIEW v_count_leaves as with recursive r(id, top) as ( select id, id from nodes union all select t.id, top from nodes as t, edges as e, r where e.parent=r.id and t.id =e.child) select top, count(*) from r where top<>id and id not in (select parent from edges where parent=id) group by top; It seems to work but is complex to understand and debug despite my aim to keep it simple as possible, but more importantly - it is very slow when there are more than a few thousand nodes and edges. It there a more efficient (and ideally simpler) way? ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Luuk wrote: > On 01-01-18 03:14, Shane Dev wrote: >> select * from nodes where not exists (select * from edges where >> child=nodes.id); > > Changing this to: > > select * from nodes where not exists (select 1 from edges where > child=nodes.id); > > saved in my test about 10% of time Then I have to doubt your test; the generated code (see the EXPLAIN output) is exactly the same. Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Clemens Ladisch wrote: Luuk wrote: On 01-01-18 03:14, Shane Dev wrote: select * from nodes where not exists (select * from edges where child=nodes.id); Changing this to: select * from nodes where not exists (select 1 from edges where child=nodes.id); saved in my test about 10% of time Then I have to doubt your test; the generated code (see the EXPLAIN output) is exactly the same. Yes, the execution plans are the same. Still the second is faster! Another query that uses EXCEPT instead of NOT IN and that has yet another execution plan: select id from nodes except select child from edges; ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] scanstats
The info on .help is not complete i.e. '.version' is missing also '.scanstats' give info that one should user 'on', of 'off'. When one of these options is used a warning is show that this option is not available suggestion: remove the '.scanstats' from the list or, give the warning when doing '.scanstats' about it not being available. luuk@opensuse:~/tmp> sqlite-autoconf-321/sqlite3 SQLite version 3.21.0 2017-10-24 18:55:49 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> .scanstats Usage: .scanstats on|off sqlite> .scanstats on Warning: .scanstats not available in this build. sqlite> BTW, it's hard to find out how to compile sQlite with this *SQLITE_ENABLE_STMT_SCANSTATUS* ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?
Indeed, but JavaScript and awk also have first class loop accessible variables to make up for the limitations in their respective regex parsers. About linear time. Are you saying it is slower than linear time to compile a group captured regex or that it is impossible to efficiently reuse the compiled object in the body of a complementary UDF in the same statement? All sorts of ad hoc parsing functions are possible and I use quite a few in my code. However, as I pointed out in the carefully worked test case linked below, there is presently no reliable way to share that parsed state among functions with different names, in different columns, or of column valued objects using the thread safe auxdata API. Those improvements in the auxdata API alone would be valuable without the regexp capture capability. https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg107041.html BTW, it is not only my possibly eccentric boutique code that is running into this problem: https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg107045.html On Mon, Jan 1, 2018 at 10:44 AM, Richard Hippwrote: > On 1/1/18, petern wrote: > > Richard. Please consider adding capturing groups during your upgrade of > > the regexp.c matching capability. > > I did consider that. It seems hard to do in linear time. I also > notice that neither JavaScript nor AWK support that capability. > > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
Wolfgang Enzinger wrote: > Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch: >> It is indeed possible to change the query so that SQLite uses rowid >> lookups for the R-tree filter (INDEX 1). However, any likelihood on the >> R-tree search expression still did not make any difference. Do you have >> an example? > > SELECT * FROM t INNER JOIN i USING(id) > WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.999) -- 0.999 >AND LIKELIHOOD(t.x=3, 0.001); -- 0.001 > > SELECT * FROM t INNER JOIN i USING(id) > WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.001) -- 0.001 >AND LIKELIHOOD(t.x=3, 0.999); -- 0.999 Sorry, this is not what I meant. The original question was about manually removing the R-tree search depending on the spatial window. So, do you have an example where the query plan changes due to a difference in *only* the likelihood of the R-tree search expression? Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
Am Mon, 1 Jan 2018 20:30:10 +0100 schrieb Clemens Ladisch: > Wolfgang Enzinger wrote: >> Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch: >>> It is indeed possible to change the query so that SQLite uses rowid >>> lookups for the R-tree filter (INDEX 1). However, any likelihood on the >>> R-tree search expression still did not make any difference. Do you have >>> an example? >> >> SELECT * FROM t INNER JOIN i USING(id) >> WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.999) -- 0.999 >>AND LIKELIHOOD(t.x=3, 0.001); -- 0.001 >> >> SELECT * FROM t INNER JOIN i USING(id) >> WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.001) -- 0.001 >>AND LIKELIHOOD(t.x=3, 0.999); -- 0.999 > > Sorry, this is not what I meant. Seems like I didn't get the original question correctly, then. > The original question was about manually removing the R-tree search > depending on the spatial window. So, do you have an example where > the query plan changes due to a difference in *only* the likelihood > of the R-tree search expression? No. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Can select * from table where not exists (subquery) be optimized?
Luuk wrote: >> Clemens Ladisch wrote: >>> Luuk wrote: On 01-01-18 03:14, Shane Dev wrote: > select * from nodes where not exists (select * from edges where > child=nodes.id); Changing this to: select * from nodes where not exists (select 1 from edges where child=nodes.id); saved in my test about 10% of time >>> >>> Then I have to doubt your test; the generated code (see the EXPLAIN >>> output) is exactly the same. > > the 3rd step of EXPLAIN changed from: > > 1|0|0|SCAN TABLE edges > to: > 1|0|0|SCAN TABLE edges USING COVERING INDEX iedges Indeed it does; my own test was without the index. Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Efficient query to count number of leaves in a DAG.
Hi Peter, By "schema changes Clemens suggested" I assume you mean replacing the constraint - not exists (select * from edges where child=nodes.id); with ...where id not in (select child from edges); For this leaf count query, I need to constrain the result set to exclude nodes which are parents - sqlite> CREATE VIEW v_count_leaves as with recursive r(id, top) as (select id, id from nodes union all select t.id, top from nodes as t, edges as e, r where e.parent=r.id and t.id=e.child) select top, count(*) from r where top<>id and id not in (select parent from edges) group by top; sqlite> .timer on sqlite> select * from v_count_leaves where top=4465; 44651 Run Time: real 75.678 user 75.671875 sys 0.00 sqlite> select count(*) from nodes; 1 sqlite> select count(*) from edges; 9986 It is still very slow or did you mean something else? On 1 January 2018 at 18:04, peternwrote: > Shane. I sent you a query to work with the crippled schema and index you > proposed for TABLE edges. > Clemens then explicitly suggested you correct the schema to have use of > automatic covering index. > > >CREATE TABLE edges(parent not null references nodes, child not null > >references nodes, primary key(parent, child)); > > Try your leaf counter again - after making the schema changes Clemens > suggested. > > Peter > > > On Mon, Jan 1, 2018 at 8:13 AM, Shane Dev wrote: > > > Hi, > > > > I want to the count the number of leaves (descendants without children) > for > > each node in a DAG > > > > DAG definition - > > > > CREATE TABLE nodes(id integer primary key, description text); > > CREATE TABLE edges(parent not null references nodes, child not null > > references nodes, primary key(parent, child)); > > > > My query - > > > > CREATE VIEW v_count_leaves as with recursive r(id, top) as ( > > select id, id from nodes > > union all > > select t.id, top from nodes as t, edges as e, r where e.parent=r.id and > > t.id > > =e.child) > > select top, count(*) from r where top<>id and id not in (select parent > from > > edges where parent=id) group by top; > > > > It seems to work but is complex to understand and debug despite my aim to > > keep it simple as possible, but more importantly - it is very slow when > > there are more than a few thousand nodes and edges. > > > > It there a more efficient (and ideally simpler) way? > > ___ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
Wolfgang Enzinger wrote: > First, query the overall extent of your data, like this: > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM flst_shape_index; This results in a full table scan. Instead of caching these values manually, it would be a better idea to read them from the index: SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1; (rtreenode() is undocumented; maybe you should use your own decoder.) > Let SQLite know about that likelihood in a JOIN query This does not appear to change anything with a virtual table: CREATE TABLE t(id INETGER PRIMARY KEY, x, [...]); CREATE VIRTUAL TABLE i USING rtree(id, minx, maxx); SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 20, 0.01); --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0 --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?) SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 20, 0.99); --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0 --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?) Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes
Wolfgang Enzinger wrote: > Am Mon, 1 Jan 2018 10:45:50 +0100 schrieb Clemens Ladisch: >> Wolfgang Enzinger wrote: >>> Let SQLite know about that likelihood in a JOIN query >> >> This does not appear to change anything with a virtual table: >> >> SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND >> 20, 0.01); >> SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND >> 20, 0.99); >> ... > > This is not surprising because only criteria concerning the "i" table are > in effect here. So it is clear that even a likelihood of 0.99 is more > selective than a likelihood of 1.000 (= no filter criteria in this table). > However, if your query has criteria both in the "i" and the "t" table, it > can make a difference. It is indeed possible to change the query so that SQLite uses rowid lookups for the R-tree filter (INDEX 1). However, any likelihood on the R-tree search expression still did not make any difference. Do you have an example? Regards, Clemens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users