Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Dinu
If a different perspective may be helpful to you: If moving overhead to writes is an option (ie you dont have many or time critical writes), then the tree descendants problem can be sped up to stellar speeds by using a path column. IE. add a column "path" in the nodes table that would contain

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Shane Dev wrote: > CREATE TABLE nodes(id integer primary key, description text); > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent, child)); > > select * from nodes where not exists (select * from edges where > child=nodes.id); > > This

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Wolfgang Enzinger
Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch: > It is indeed possible to change the query so that SQLite uses rowid > lookups for the R-tree filter (INDEX 1). However, any likelihood on the > R-tree search expression still did not make any difference. Do you have > an example?

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Andrea Aime
On Mon, Jan 1, 2018 at 10:45 AM, Clemens Ladisch wrote: > Wolfgang Enzinger wrote: > > First, query the overall extent of your data, like this: > > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM > flst_shape_index; > > This results in a full table scan. Instead of

[sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?

2018-01-01 Thread petern
Richard. Please consider adding capturing groups during your upgrade of the regexp.c matching capability. In addition to the adding a powerful new capability to all SQLite expressions, it would be very instructive to see how your code obtains the cached object for a pair of captured group

Re: [sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?

2018-01-01 Thread Richard Hipp
On 1/1/18, petern wrote: > Richard. Please consider adding capturing groups during your upgrade of > the regexp.c matching capability. I did consider that. It seems hard to do in linear time. I also notice that neither JavaScript nor AWK support that capability.

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Luuk
On 01-01-18 03:14, Shane Dev wrote: > Hello, > > I have a directed acyclic graph defined as follows - > > sqlite> .sch > CREATE TABLE nodes(id integer primary key, description text); > CREATE TABLE edges(parent not null references nodes, child not null > references nodes, primary key(parent,

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread petern
This query will use the index you proposed. SELECT * FROM nodes NATURAL JOIN (SELECT parent AS id FROM edges WHERE parent NOT IN (SELECT child FROM edges)); Peter On Sun, Dec 31, 2017 at 6:14 PM, Shane Dev wrote: > Hello, > > I have a directed acyclic graph defined as

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Luuk
On 01-01-18 12:18, Luuk wrote: > On 01-01-18 03:14, Shane Dev wrote: >> Hello, >> >> I have a directed acyclic graph defined as follows - >> >> sqlite> .sch >> CREATE TABLE nodes(id integer primary key, description text); >> CREATE TABLE edges(parent not null references nodes, child not null >>

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Wolfgang Enzinger
Am Mon, 1 Jan 2018 10:45:50 +0100 schrieb Clemens Ladisch: > Wolfgang Enzinger wrote: >> First, query the overall extent of your data, like this: >> SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM flst_shape_index; > > This results in a full table scan. Instead of caching these values

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread E.Pasma
Shane Dev wrote: Hi Clemens, Your query is much faster on my system - thanks! Apart from visual inspection and testing, is there anyway to be sure your query selects the same results as my query? From https://sqlite.org/queryplanner.html "When programming in SQL you tell the system what

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Shane Dev wrote: > Apart from visual inspection and testing, is there anyway to be sure your > query selects the same results as my query? Can I interest you in things like relational algebra or tuple calculus? ;-) >>> select * from nodes where not exists (select * from edges where >>>

Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread petern
Shane. I sent you a query to work with the crippled schema and index you proposed for TABLE edges. Clemens then explicitly suggested you correct the schema to have use of automatic covering index. >CREATE TABLE edges(parent not null references nodes, child not null >references nodes, primary

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Shane Dev
Hi Clemens, Your query is much faster on my system - thanks! Apart from visual inspection and testing, is there anyway to be sure your query selects the same results as my query? From https://sqlite.org/queryplanner.html "When programming in SQL you tell the system what you want to compute, not

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Luuk
On 01-01-18 16:52, E.Pasma wrote: > Clemens Ladisch wrote: > >> Luuk wrote: >>> On 01-01-18 03:14, Shane Dev wrote: select * from nodes where not exists (select * from edges where child=nodes.id); >>> >>> Changing this to: >>> >>> select * from nodes where not exists (select 1 from edges

[sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Shane Dev
Hi, I want to the count the number of leaves (descendants without children) for each node in a DAG DAG definition - CREATE TABLE nodes(id integer primary key, description text); CREATE TABLE edges(parent not null references nodes, child not null references nodes, primary key(parent, child));

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Luuk wrote: > On 01-01-18 03:14, Shane Dev wrote: >> select * from nodes where not exists (select * from edges where >> child=nodes.id); > > Changing this to: > > select * from nodes where not exists (select 1 from edges where > child=nodes.id); > > saved in my test about 10% of time Then I

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread E.Pasma
Clemens Ladisch wrote: Luuk wrote: On 01-01-18 03:14, Shane Dev wrote: select * from nodes where not exists (select * from edges where child=nodes.id); Changing this to: select * from nodes where not exists (select 1 from edges where child=nodes.id); saved in my test about 10% of time

[sqlite] scanstats

2018-01-01 Thread Luuk
The info on .help is not complete i.e. '.version' is missing also '.scanstats' give info that one should user 'on', of 'off'. When one of these options is used a warning is show that this option is not available suggestion: remove the '.scanstats' from the list or, give the warning when doing

Re: [sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?

2018-01-01 Thread petern
Indeed, but JavaScript and awk also have first class loop accessible variables to make up for the limitations in their respective regex parsers. About linear time. Are you saying it is slower than linear time to compile a group captured regex or that it is impossible to efficiently reuse the

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Clemens Ladisch
Wolfgang Enzinger wrote: > Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch: >> It is indeed possible to change the query so that SQLite uses rowid >> lookups for the R-tree filter (INDEX 1). However, any likelihood on the >> R-tree search expression still did not make any difference.

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Wolfgang Enzinger
Am Mon, 1 Jan 2018 20:30:10 +0100 schrieb Clemens Ladisch: > Wolfgang Enzinger wrote: >> Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch: >>> It is indeed possible to change the query so that SQLite uses rowid >>> lookups for the R-tree filter (INDEX 1). However, any likelihood on the

Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Luuk wrote: >> Clemens Ladisch wrote: >>> Luuk wrote: On 01-01-18 03:14, Shane Dev wrote: > select * from nodes where not exists (select * from edges where > child=nodes.id); Changing this to: select * from nodes where not exists (select 1 from edges where

Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Shane Dev
Hi Peter, By "schema changes Clemens suggested" I assume you mean replacing the constraint - not exists (select * from edges where child=nodes.id); with ...where id not in (select child from edges); For this leaf count query, I need to constrain the result set to exclude nodes which are

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Clemens Ladisch
Wolfgang Enzinger wrote: > First, query the overall extent of your data, like this: > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM flst_shape_index; This results in a full table scan. Instead of caching these values manually, it would be a better idea to read them from the index:

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Clemens Ladisch
Wolfgang Enzinger wrote: > Am Mon, 1 Jan 2018 10:45:50 +0100 schrieb Clemens Ladisch: >> Wolfgang Enzinger wrote: >>> Let SQLite know about that likelihood in a JOIN query >> >> This does not appear to change anything with a virtual table: >> >> SELECT t.* FROM t JOIN i USING (id) WHERE