Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Dinu
If a different perspective may be helpful to you:
If moving overhead to writes is an option (ie you dont have many or time
critical writes), then the tree descendants problem can be sped up to
stellar speeds by using a path column.

IE. 
add a column "path" in the nodes table that would contain something like
"1.2.3" for node 3 that is descendant of node 2 that is descendant of node
1.
Then querying 2's descendant would result in the following range:
"path">='1.2.' AND "path"<'1.2/' or you can try using LIKE semantics - both
can use an index if you are careful with collation; I found out the hard way
that the range queries are more resilient and portable, SQLite and others
have a pretty awkward way of plugging in the LIKE optimisations that may
result in the index being skipped for not-so-obvious reasons.

Inserting nodes is trivial, but moving edges requires an algorithm to update
paths (whenever a node's parent changes, all descendant's paths must be
updated). However, for most real-world ontology use scenarios, this
opperation happens very rarely and usually on the admin range of functions,
so you can afford this operation that can be pretty slow.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Shane Dev wrote:
> CREATE TABLE nodes(id integer primary key, description text);
> CREATE TABLE edges(parent not null references nodes, child not null 
> references nodes, primary key(parent, child));
>
> select * from nodes where not exists (select * from edges where 
> child=nodes.id);
>
> This works but is very slow when there are a million nodes and edges.
> I thought an index on edges(child) might help
> but it didn't

The index has the wrong affinity; the edges.child column must be integer.

The following query would be simpler, and does not need the index (because 
SQLite
always creates a temporary index for the lookup anyway):

  select * from nodes where id not in (select child from edges);


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Wolfgang Enzinger
Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch:

> It is indeed possible to change the query so that SQLite uses rowid
> lookups for the R-tree filter (INDEX 1).  However, any likelihood on the
> R-tree search expression still did not make any difference.  Do you have
> an example?

Try:

---

CREATE TABLE t(id INTEGER PRIMARY KEY,x INTEGER,y INTEGER,z INTEGER);
CREATE VIRTUAL TABLE i USING rtree(id,minx,maxx,miny,maxy);
CREATE INDEX t_x ON t(x);

---

EXPLAIN QUERY PLAN
SELECT * FROM t INNER JOIN i USING(id) 
 WHERE LIKELIHOOD(i.minX>=-81.08 AND i.maxX<=-80.58
   AND i.minY>=35.00  AND i.maxY<=35.44, 0.999)  -- 0.999
   AND LIKELIHOOD(t.x=3, 0.001); -- 0.001

-> SEARCH TABLE t USING INDEX t_x (x=?)
-> SCAN TABLE i VIRTUAL TABLE INDEX 1:

---

SELECT * FROM t INNER JOIN i USING(id) 
 WHERE LIKELIHOOD(i.minX>=-81.08 AND i.maxX<=-80.58
   AND i.minY>=35.00  AND i.maxY<=35.44, 0.001)  -- 0.001
   AND LIKELIHOOD(t.x=3, 0.999); -- 0.999

-> SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B1D2B3
-> SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?)

---

Tested with SQLite 3.13.0 here, but IIRC newer versions behave the same.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Andrea Aime
On Mon, Jan 1, 2018 at 10:45 AM, Clemens Ladisch  wrote:

> Wolfgang Enzinger wrote:
> > First, query the overall extent of your data, like this:
> > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM
> flst_shape_index;
>
> This results in a full table scan.  Instead of caching these values
> manually,
> it would be a better idea to read them from the index:
>
>   SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1;
>
> (rtreenode() is undocumented; maybe you should use your own decoder.)
>

Intriguing! I'm looking at the output on my local data set (all Texas
roads, 3+ million records, large,
but not all that much in GIS systems):

SELECT rtreenode(2, data) FROM rtree_tiger_shp_geom_node WHERE nodeno = 1;
{26338 -100.598 -96.2836 25.8411 30.6962} {26337 -96.9349 -93.5195 28.598
31.1915} {43295 -96.9621 -93.6133 30.6486 33.9442} {62086 -106.644 -98.0079
28.887 32.1568} {80094 -97.6371 -96.4565 28.5382 33.9419} {97065 -98.5755
-97.2869 28.686 34.1405} {108645 -104.096 -98.3331 31.6511 36.5007}

So, if I'm guessing right, each set of numbers is a count of features and
then the rectangle intersecting them?
I've prepared a picture with the data:

https://drive.google.com/file/d/15WpyfWNKdVeBoLDDRCoCW2xx87T6qf_Z/view?usp=sharing

This query is indeed quite a bit faster and more interesting than scanning
the entire index virtual table, wondering,
is there also a way to get the "next level" of the rtree?
This one returns only a few records, I would not mind having a bit more in
memory for more accurate pre-checks (dabase
access can be optimized out fully if the requested rectangle falls outside
of the rtree boxes no?)

What worries me is that these functions are undocumented, and thus, I
imaging, unsupported and at risk
of being removed at any time. Do you have a feeling of how likely is this
to happen?



>
> > Let SQLite know about that likelihood in a JOIN query
>

The query is being generated after translation from another generic filter
language and at the moment it's
expressed as a sub-query (cause it can be negated, related with other
conditions by and/or, and so on).
I've tried to use it with little luck, the subquery is always run according
to explain plan:

sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp"
WHERE  likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE
r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >=
30.532285631231357 AND r.miny <= 30.595068029284644), *0.01*);
0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2
sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp"
WHERE  likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE
r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >=
30.532285631231357 AND r.miny <= 30.595068029284644), *0.99*);
0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2

Oh well, since the code is generating the SQL, it can directly omit the
subquery. Is there any literature on what a good threshold
would be?

Cheers
Andrea
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?

2018-01-01 Thread petern
Richard.  Please consider adding capturing groups during your upgrade of
the regexp.c matching capability.

In addition to the adding a powerful new capability to all SQLite
expressions, it would be very instructive to see how your code obtains the
cached object for a pair of captured group accessors such as
regexp_group_count(R) and regexp_group(R,I) where R is the common regexp
and I is the index of the captured group text to be retrieved.

Peter
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?

2018-01-01 Thread Richard Hipp
On 1/1/18, petern  wrote:
> Richard.  Please consider adding capturing groups during your upgrade of
> the regexp.c matching capability.

I did consider that.  It seems hard to do in linear time.  I also
notice that neither JavaScript nor AWK support that capability.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Luuk
On 01-01-18 03:14, Shane Dev wrote:
> Hello,
>
> I have a directed acyclic graph defined as follows -
>
> sqlite> .sch
> CREATE TABLE nodes(id integer primary key, description text);
> CREATE TABLE edges(parent not null references nodes, child not null
> references nodes, primary key(parent, child));
>
> Now I want to find the "roots" of the graph - i.e nodes which are not
> children of other nodes -
>
> select * from nodes where not exists (select * from edges where child=
> nodes.id);
>
> This works but is very slow when there are a million nodes and edges.

Changing this to:

select * from nodes where not exists (select 1 from edges where child=
nodes.id);

saved in my test about 10% of time

> Is there any way to speed it up?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread petern
This query will use the index you proposed.

SELECT * FROM nodes NATURAL JOIN (SELECT parent AS id FROM edges WHERE
parent NOT IN (SELECT child FROM edges));

Peter

On Sun, Dec 31, 2017 at 6:14 PM, Shane Dev  wrote:

> Hello,
>
> I have a directed acyclic graph defined as follows -
>
> sqlite> .sch
> CREATE TABLE nodes(id integer primary key, description text);
> CREATE TABLE edges(parent not null references nodes, child not null
> references nodes, primary key(parent, child));
>
> Now I want to find the "roots" of the graph - i.e nodes which are not
> children of other nodes -
>
> select * from nodes where not exists (select * from edges where child=
> nodes.id);
>
> This works but is very slow when there are a million nodes and edges.
>
> Looking at the query plan -
>
> sqlite> explain query plan select * from nodes where not exists (select *
> from edges where child=nodes.id);
> selectidorder   fromdetail
> 0   0   0   SCAN TABLE nodes
> 0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 1
> 1   0   0   SCAN TABLE edges
>
> I thought an index on edges(child) might help
>
> sqlite> CREATE INDEX iedges on edges(child);
>
> but it didn't -
>
> sqlite> explain query plan select * from nodes where not exists (select *
> from edges where child=nodes.id);
> selectidorder   fromdetail
> 0   0   0   SCAN TABLE nodes
> 0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 1
> 1   0   0   SCAN TABLE edges
>
> Is there any way to speed it up?
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Luuk


On 01-01-18 12:18, Luuk wrote:
> On 01-01-18 03:14, Shane Dev wrote:
>> Hello,
>>
>> I have a directed acyclic graph defined as follows -
>>
>> sqlite> .sch
>> CREATE TABLE nodes(id integer primary key, description text);
>> CREATE TABLE edges(parent not null references nodes, child not null
>> references nodes, primary key(parent, child));
>>
>> Now I want to find the "roots" of the graph - i.e nodes which are not
>> children of other nodes -
>>
>> select * from nodes where not exists (select * from edges where child=
>> nodes.id);
>>
>> This works but is very slow when there are a million nodes and edges.
> Changing this to:
>
> select * from nodes where not exists (select 1 from edges where child=
> nodes.id);
>
> saved in my test about 10% of time
>
>> Is there any way to speed it up?

Above is the same as what Clemens already mentioned ...

sqlite> explain query plan select * from nodes where not exists (select
* from edges where child=nodes.id);
0|0|0|SCAN TABLE nodes
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SCAN TABLE edges
Run Time: real 0.009 user 0.00 sys 0.00
sqlite>
sqlite> explain query plan select * from nodes where not exists (select
1 from edges where child=nodes.id);
0|0|0|SCAN TABLE nodes
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SCAN TABLE edges USING COVERING INDEX iedges
Run Time: real 0.003 user 0.00 sys 0.00
sqlite>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Wolfgang Enzinger
Am Mon, 1 Jan 2018 10:45:50 +0100 schrieb Clemens Ladisch:

> Wolfgang Enzinger wrote:
>> First, query the overall extent of your data, like this:
>> SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM flst_shape_index;
> 
> This results in a full table scan.  Instead of caching these values manually,
> it would be a better idea to read them from the index:
> 
>   SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1;
> 
> (rtreenode() is undocumented; maybe you should use your own decoder.)

Thanks, didn't know that, I'll look into it. You're right, my query results
in a full table scan, however it's pretty fast anyway - less than a second
with 160,000 rows and cold cache.

>> Let SQLite know about that likelihood in a JOIN query
> 
> This does not appear to change anything with a virtual table:
> 
>   CREATE TABLE t(id INETGER PRIMARY KEY, x, [...]);
>   CREATE VIRTUAL TABLE i USING rtree(id, minx, maxx);
> 
>   SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 
> 20, 0.01);
>   --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0
>   --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?)
>   SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 
> 20, 0.99);
>   --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0
>   --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?)

This is not surprising because only criteria concerning the "i" table are
in effect here. So it is clear that even a likelihood of 0.99 is more
selective than a likelihood of 1.000 (= no filter criteria in this table).
However, if your query has criteria both in the "i" and the "t" table, it
can make a difference.

Of course, anybody correct me if I'm mistaken.

Happy new year!
Wolfgang

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread E.Pasma

Shane Dev wrote:


Hi Clemens,

Your query is much faster on my system - thanks!

Apart from visual inspection and testing, is there anyway to be sure  
your

query selects the same results as my query?

From https://sqlite.org/queryplanner.html "When programming in SQL  
you tell
the system what you want to compute, not how to compute it". Is this  
an
exception to the rule where the query planner must be told how to  
compute

the result?


select * from nodes where not exists (select * from edges where  
child=nodes.id);


select * from nodes where not exists (select 1 from edges where  
child=nodes.id);


 select * from nodes where id not in (select child from edges);



The first two queries look more 'procedural' than the last. So this  
may confirm "When programming in SQL you tell the system what you want  
to compute, not how to compute it"


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Shane Dev wrote:
> Apart from visual inspection and testing, is there anyway to be sure your
> query selects the same results as my query?

Can I interest you in things like relational algebra or tuple calculus?
;-)

>>> select * from nodes where not exists (select * from edges where 
>>> child=nodes.id);
>>  select * from nodes where id not in (select child from edges);
>
> From https://sqlite.org/queryplanner.html "When programming in SQL you tell
> the system what you want to compute, not how to compute it". Is this an
> exception to the rule where the query planner must be told how to compute
> the result?

Arguable, the second query is on a higher abstraction level, i.e., the
_first_ one tells the system how to compute it.  (AFAIK SQLite is able
to transform an IN lookup into a correlated subquery, if worthwhile, but
not in the opposite direction.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread petern
Shane.  I sent you a query to work with the crippled schema and index you
proposed for TABLE edges.
Clemens then explicitly suggested you correct the schema to have use of
automatic covering index.

>CREATE TABLE edges(parent not null references nodes, child not null
>references nodes, primary key(parent, child));

Try your leaf counter again - after making the schema changes Clemens
suggested.

Peter


On Mon, Jan 1, 2018 at 8:13 AM, Shane Dev  wrote:

> Hi,
>
> I want to the count the number of leaves (descendants without children) for
> each node in a DAG
>
> DAG definition -
>
> CREATE TABLE nodes(id integer primary key, description text);
> CREATE TABLE edges(parent not null references nodes, child not null
> references nodes, primary key(parent, child));
>
> My query -
>
> CREATE VIEW v_count_leaves as with recursive r(id, top) as (
> select id, id from nodes
> union all
> select t.id, top from nodes as t, edges as e, r where e.parent=r.id and
> t.id
> =e.child)
> select top, count(*) from r where top<>id and id not in (select parent from
> edges where parent=id) group by top;
>
> It seems to work but is complex to understand and debug despite my aim to
> keep it simple as possible, but more importantly - it is very slow when
> there are more than a few thousand nodes and edges.
>
> It there a more efficient (and ideally simpler) way?
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Shane Dev
Hi Clemens,

Your query is much faster on my system - thanks!

Apart from visual inspection and testing, is there anyway to be sure your
query selects the same results as my query?

From https://sqlite.org/queryplanner.html "When programming in SQL you tell
the system what you want to compute, not how to compute it". Is this an
exception to the rule where the query planner must be told how to compute
the result?




On 1 January 2018 at 10:58, Clemens Ladisch  wrote:

> Shane Dev wrote:
> > CREATE TABLE nodes(id integer primary key, description text);
> > CREATE TABLE edges(parent not null references nodes, child not null
> references nodes, primary key(parent, child));
> >
> > select * from nodes where not exists (select * from edges where child=
> nodes.id);
> >
> > This works but is very slow when there are a million nodes and edges.
> > I thought an index on edges(child) might help
> > but it didn't
>
> The index has the wrong affinity; the edges.child column must be integer.
>
> The following query would be simpler, and does not need the index (because
> SQLite
> always creates a temporary index for the lookup anyway):
>
>   select * from nodes where id not in (select child from edges);
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Luuk
On 01-01-18 16:52, E.Pasma wrote:
> Clemens Ladisch wrote:
>
>> Luuk wrote:
>>> On 01-01-18 03:14, Shane Dev wrote:
 select * from nodes where not exists (select * from edges where
 child=nodes.id);
>>>
>>> Changing this to:
>>>
>>> select * from nodes where not exists (select 1 from edges where
>>> child=nodes.id);
>>>
>>> saved in my test about 10% of time
>>
>> Then I have to doubt your test; the generated code (see the EXPLAIN
>> output) is exactly the same.
>
>

the 3rd step of EXPLAIN changed from:

1|0|0|SCAN TABLE edges
to:
1|0|0|SCAN TABLE edges USING COVERING INDEX iedges


luuk@opensuse:~/tmp> sqlite3 nodes.db
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> .timer on
sqlite> select * from nodes where not exists (select * from edges where
child=nodes.id) and id<10;
Run Time: real 254.400 user 254.007998 sys 0.119976
sqlite> select * from nodes where not exists (select * from edges where
child=nodes.id) and id<10;
Run Time: real 234.342 user 233.348752 sys 0.491637
sqlite> select * from nodes where not exists (select 1 from edges where
child=nodes.id) and id<10;
Run Time: real 219.904 user 218.968920 sys 0.651569
sqlite> select * from nodes where not exists (select 1 from edges where
child=nodes.id) and id<10;
Run Time: real 219.929 user 219.562780 sys 0.127948
sqlite> select * from nodes where not exists (select * from edges where
child=nodes.id) and id<10;
Run Time: real 236.423 user 234.648774 sys 1.622957




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Shane Dev
Hi,

I want to the count the number of leaves (descendants without children) for
each node in a DAG

DAG definition -

CREATE TABLE nodes(id integer primary key, description text);
CREATE TABLE edges(parent not null references nodes, child not null
references nodes, primary key(parent, child));

My query -

CREATE VIEW v_count_leaves as with recursive r(id, top) as (
select id, id from nodes
union all
select t.id, top from nodes as t, edges as e, r where e.parent=r.id and t.id
=e.child)
select top, count(*) from r where top<>id and id not in (select parent from
edges where parent=id) group by top;

It seems to work but is complex to understand and debug despite my aim to
keep it simple as possible, but more importantly - it is very slow when
there are more than a few thousand nodes and edges.

It there a more efficient (and ideally simpler) way?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Luuk wrote:
> On 01-01-18 03:14, Shane Dev wrote:
>> select * from nodes where not exists (select * from edges where 
>> child=nodes.id);
>
> Changing this to:
>
>  select * from nodes where not exists (select 1 from edges where 
> child=nodes.id);
>
> saved in my test about 10% of time

Then I have to doubt your test; the generated code (see the EXPLAIN
output) is exactly the same.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread E.Pasma

Clemens Ladisch wrote:


Luuk wrote:

On 01-01-18 03:14, Shane Dev wrote:
select * from nodes where not exists (select * from edges where  
child=nodes.id);


Changing this to:

select * from nodes where not exists (select 1 from edges where  
child=nodes.id);


saved in my test about 10% of time


Then I have to doubt your test; the generated code (see the EXPLAIN
output) is exactly the same.



Yes, the execution plans are the same. Still the second is faster!

Another query that uses EXCEPT instead of NOT IN and that has yet  
another execution plan:


select id from nodes except select child from edges;
 

___

sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] scanstats

2018-01-01 Thread Luuk
The info on .help is not complete

i.e. '.version' is missing

also '.scanstats' give info that one should user 'on', of 'off'. When
one of these options is used a warning is show that this option is not
available

suggestion: remove the '.scanstats' from the list or, give the warning
when doing '.scanstats' about it not being available.


luuk@opensuse:~/tmp> sqlite-autoconf-321/sqlite3
SQLite version 3.21.0 2017-10-24 18:55:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .scanstats
Usage: .scanstats on|off
sqlite> .scanstats on
Warning: .scanstats not available in this build.
sqlite>


BTW, it's hard to find out how to compile sQlite with this
*SQLITE_ENABLE_STMT_SCANSTATUS*

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Capturing groups for regexp.c Check-in [3d6fba62] ?

2018-01-01 Thread petern
Indeed, but JavaScript and awk also have first class loop accessible
variables to make up for the limitations in their respective regex parsers.

About linear time. Are you saying it is slower than linear time to compile
a group captured regex or that it is impossible to efficiently reuse the
compiled object in the body of a complementary UDF in the same statement?

All sorts of ad hoc parsing functions are possible and I use quite a few in
my code.  However, as I pointed out in the carefully worked test case
linked below, there is presently no reliable way to share that parsed state
among functions with different names, in different columns, or of column
valued objects using the thread safe auxdata API.  Those improvements in
the auxdata API alone would be valuable without the regexp capture
capability.

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg107041.html

BTW, it is not only my possibly eccentric boutique code that is running
into this problem:

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg107045.html



On Mon, Jan 1, 2018 at 10:44 AM, Richard Hipp  wrote:

> On 1/1/18, petern  wrote:
> > Richard.  Please consider adding capturing groups during your upgrade of
> > the regexp.c matching capability.
>
> I did consider that.  It seems hard to do in linear time.  I also
> notice that neither JavaScript nor AWK support that capability.
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Clemens Ladisch
Wolfgang Enzinger wrote:
> Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch:
>> It is indeed possible to change the query so that SQLite uses rowid
>> lookups for the R-tree filter (INDEX 1).  However, any likelihood on the
>> R-tree search expression still did not make any difference.  Do you have
>> an example?
>
> SELECT * FROM t INNER JOIN i USING(id)
>  WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.999)  -- 0.999
>AND LIKELIHOOD(t.x=3, 0.001);  -- 0.001
>
> SELECT * FROM t INNER JOIN i USING(id)
>  WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.001)  -- 0.001
>AND LIKELIHOOD(t.x=3, 0.999);  -- 0.999

Sorry, this is not what I meant.

The original question was about manually removing the R-tree search
depending on the spatial window.  So, do you have an example where
the query plan changes due to a difference in *only* the likelihood
of the R-tree search expression?


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Wolfgang Enzinger
Am Mon, 1 Jan 2018 20:30:10 +0100 schrieb Clemens Ladisch:

> Wolfgang Enzinger wrote:
>> Am Mon, 1 Jan 2018 16:20:21 +0100 schrieb Clemens Ladisch:
>>> It is indeed possible to change the query so that SQLite uses rowid
>>> lookups for the R-tree filter (INDEX 1).  However, any likelihood on the
>>> R-tree search expression still did not make any difference.  Do you have
>>> an example?
>>
>> SELECT * FROM t INNER JOIN i USING(id)
>>  WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.999)  -- 0.999
>>AND LIKELIHOOD(t.x=3, 0.001);  -- 0.001
>>
>> SELECT * FROM t INNER JOIN i USING(id)
>>  WHERE LIKELIHOOD(i.minX>=-81.08 ..., 0.001)  -- 0.001
>>AND LIKELIHOOD(t.x=3, 0.999);  -- 0.999
> 
> Sorry, this is not what I meant.

Seems like I didn't get the original question correctly, then.

> The original question was about manually removing the R-tree search
> depending on the spatial window.  So, do you have an example where
> the query plan changes due to a difference in *only* the likelihood
> of the R-tree search expression?

No.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Can select * from table where not exists (subquery) be optimized?

2018-01-01 Thread Clemens Ladisch
Luuk wrote:
>> Clemens Ladisch wrote:
>>> Luuk wrote:
 On 01-01-18 03:14, Shane Dev wrote:
> select * from nodes where not exists (select * from edges where
> child=nodes.id);

 Changing this to:

 select * from nodes where not exists (select 1 from edges where
 child=nodes.id);

 saved in my test about 10% of time
>>>
>>> Then I have to doubt your test; the generated code (see the EXPLAIN
>>> output) is exactly the same.
>
> the 3rd step of EXPLAIN changed from:
>
> 1|0|0|SCAN TABLE edges
> to:
> 1|0|0|SCAN TABLE edges USING COVERING INDEX iedges

Indeed it does; my own test was without the index.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-01 Thread Shane Dev
Hi Peter,

By "schema changes Clemens suggested" I assume you mean replacing the
constraint -

not exists (select * from edges where child=nodes.id);

with

...where id not in (select child from edges);

For this leaf count query, I need to constrain the result set to exclude
nodes which are parents -

sqlite> CREATE VIEW v_count_leaves as with recursive r(id, top) as (select
id, id from nodes union all select t.id, top from nodes as t, edges as e, r
where e.parent=r.id and t.id=e.child) select top, count(*) from r where
top<>id and id not in (select parent from edges) group by top;
sqlite> .timer on
sqlite> select * from v_count_leaves where top=4465;
44651
Run Time: real 75.678 user 75.671875 sys 0.00

sqlite> select count(*) from nodes;
1
sqlite> select count(*) from edges;
9986

It is still very slow or did you mean something else?


On 1 January 2018 at 18:04, petern  wrote:

> Shane.  I sent you a query to work with the crippled schema and index you
> proposed for TABLE edges.
> Clemens then explicitly suggested you correct the schema to have use of
> automatic covering index.
>
> >CREATE TABLE edges(parent not null references nodes, child not null
> >references nodes, primary key(parent, child));
>
> Try your leaf counter again - after making the schema changes Clemens
> suggested.
>
> Peter
>
>
> On Mon, Jan 1, 2018 at 8:13 AM, Shane Dev  wrote:
>
> > Hi,
> >
> > I want to the count the number of leaves (descendants without children)
> for
> > each node in a DAG
> >
> > DAG definition -
> >
> > CREATE TABLE nodes(id integer primary key, description text);
> > CREATE TABLE edges(parent not null references nodes, child not null
> > references nodes, primary key(parent, child));
> >
> > My query -
> >
> > CREATE VIEW v_count_leaves as with recursive r(id, top) as (
> > select id, id from nodes
> > union all
> > select t.id, top from nodes as t, edges as e, r where e.parent=r.id and
> > t.id
> > =e.child)
> > select top, count(*) from r where top<>id and id not in (select parent
> from
> > edges where parent=id) group by top;
> >
> > It seems to work but is complex to understand and debug despite my aim to
> > keep it simple as possible, but more importantly - it is very slow when
> > there are more than a few thousand nodes and edges.
> >
> > It there a more efficient (and ideally simpler) way?
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Clemens Ladisch
Wolfgang Enzinger wrote:
> First, query the overall extent of your data, like this:
> SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM flst_shape_index;

This results in a full table scan.  Instead of caching these values manually,
it would be a better idea to read them from the index:

  SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1;

(rtreenode() is undocumented; maybe you should use your own decoder.)

> Let SQLite know about that likelihood in a JOIN query

This does not appear to change anything with a virtual table:

  CREATE TABLE t(id INETGER PRIMARY KEY, x, [...]);
  CREATE VIRTUAL TABLE i USING rtree(id, minx, maxx);

  SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 
20, 0.01);
  --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0
  --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?)
  SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 
20, 0.99);
  --EQP-- 0,0,1,SCAN TABLE i VIRTUAL TABLE INDEX 2:D0B0
  --EQP-- 0,1,0,SEARCH TABLE t USING INTEGER PRIMARY KEY (rowid=?)



Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

2018-01-01 Thread Clemens Ladisch
Wolfgang Enzinger wrote:
> Am Mon, 1 Jan 2018 10:45:50 +0100 schrieb Clemens Ladisch:
>> Wolfgang Enzinger wrote:
>>> Let SQLite know about that likelihood in a JOIN query
>>
>> This does not appear to change anything with a virtual table:
>>
>>   SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 
>> 20, 0.01);
>>   SELECT t.* FROM t JOIN i USING (id) WHERE likelihood(i.minx BETWEEN 10 AND 
>> 20, 0.99);
>>   ...
>
> This is not surprising because only criteria concerning the "i" table are
> in effect here. So it is clear that even a likelihood of 0.99 is more
> selective than a likelihood of 1.000 (= no filter criteria in this table).
> However, if your query has criteria both in the "i" and the "t" table, it
> can make a difference.

It is indeed possible to change the query so that SQLite uses rowid
lookups for the R-tree filter (INDEX 1).  However, any likelihood on the
R-tree search expression still did not make any difference.  Do you have
an example?


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users