Re: [sqlite] The performance of indexed select

2018-01-05 Thread Simon Slavin


On 6 Jan 2018, at 6:41am, Nick  wrote:

> I find it is indeed faster than t2(c).

If you want to know which is the best index, create all the indexes you think 
might be good, run ANALYZE, then use

EXPLAIN QUERY PLAN SELECT (rest of SELECT statement here)

and see which index SQLite chooses to use.  Then you can delete the index(es) 
it didn’t choose.

Note that this gives accurate results only if your tables have convincing data 
in.  It’s not pointless, but nowhere near as accurate, if you're running it on 
a test database with 10 entries.

> Or in another word, if a TEXT column has similar meaning with an INTEGER
> column in my applications,(such as use userID instead of userName, still the
> way that the data works in my head:) ) is it recommended to use INTEGER one
> in order to get a less index pages?  

The correct way to arrange your data is to have userID everywhere you don’t 
need to know the actual name.  UserID can always remain the same but people 
change their names.  You wouldn’t want to have to go update your invoice table 
every time someone got married, would you ?

When you need to print out your invoices showing the actual name, that’s what 
JOIN is for.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Nick
Some simple SQLs:
SELECT count(*) FROM t2 WHERE c = xx; (or d = xx) 



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Nick
Thank you Simon. 

But I am still uncertain if it is a good way to replace column 'c'. 

CREATE TABLE t2(a INTEGER, b INTEGER, d INTEGER); 
or:
CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT, d INTEGER); 
and then
CREATE INDEX t2d ON t2(d);
SELECT count(*) FROM t2 WHERE d = xx;

I find it is indeed faster than t2(c). 

Or in another word, if a TEXT column has similar meaning with an INTEGER
column in my applications,(such as use userID instead of userName, still the
way that the data works in my head:) ) is it recommended to use INTEGER one
in order to get a less index pages?  


One more small question:
> For instance, once SQLite has found the right entry in the index it might
> need to look up that entry in the table to retrieve values which are not
> in the index.

I understand the execution process you said. And in my opinion, sqlite
should fetch pages when looking up the entry both in the index and then in
the table. But I only found pages with '0x0A' and '0x02' when
getPageNormal() is called during the time running select SQL. Could you give
me any advises to find the code when sqlite fetching the '0x0D' pages? 

Thanks.




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Clemens Ladisch
Nick wrote:
>I am trying to analysis the performance of indexed select. 
>
>CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT);
>CREATE INDEX t2c ON t2(c); 

Show the query that you are trying to analyze.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Simon Slavin
On 6 Jan 2018, at 3:32am, Nick  wrote:

> I think there may be much more leaf index b-tree pages whose header is
> '0x0A' if the length of the content of index key 'c' is always 20-25 bytes,
> as I notice the format of index inside sqlite consist of the index key and
> rowid.

You’re overthinking it.  Establish your tables and indexes according to however 
the data works in your head.  Insert some plausible data into the tables. The 
more this data is like a fully-populated production database, the better.  Then 
run the SQL command "ANALYZE".

See if the results are fast enough for your intended purposes.  They probably 
will be.  Only if they’re not, start worrying about optimization.

Try to make SQL serve the way you want to organise your data, not the other way 
around.

> So if it is correct that the performance of indexed select is up to the
> number of index pages which is fetched in getPageNormal() within the select?
> I think it has positive correlation but I do not know if it is the major
> constraint.

More complicated than that.  For instance, once SQLite has found the right 
entry in the index it might need to look up that entry in the table to retrieve 
values which are not in the index.  So it might be better to make an index 
which contains all those values (called a "covering index").  But it might not, 
because that will make the index bigger, and will mean that your computer has 
to do a lot more disk access while searching the index.

> And does sqlite have a profile tool to get call tree or execution time of
> each functions? All I know is VDBE_PROFILE.

No.  But the shell tool has a timer which can closely time the execution of any 
command.  And that is a far more reliable way of knowing what will take longer 
or shorter in the real world than timing individual calls.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] difference between 'ID IS NULL' and 'ID = NULL'

2018-01-05 Thread Cezary H. Noweta

Hello,

On 2018-01-05 21:56, David Raymond wrote:

sqlite> explain select id from tbl where id = null;
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 8 000  Start at 8
1 OpenRead   0 3 0 0  00  root=3 iDb=0; tbl
2 Explain0 0 0 SEARCH TABLE tbl USING INTEGER PRIMARY 
KEY (rowid=?)  00
3 Null   0 1 000  r[1]=NULL
4 SeekRowid  0 7 100  intkey=r[1]; pk
5 Copy   1 2 000  r[2]=r[1]
6 ResultRow  2 1 000  output=r[2]
7 Halt   0 0 000
8 Transaction0 0 2 0  01  usesStmtJournal=0
9 Goto   0 1 000
Run Time: real 0.000 user 0.00 sys 0.00


??? My 3.21 checks if NULL is NULL and exits:

2 Null   0 1
3 IsNull 1 9
...
9 Halt   0 0

-- best regards

Cezary H. Noweta
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] The performance of indexed select

2018-01-05 Thread Nick
I am trying to analysis the performance of indexed select. 

CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT);
CREATE INDEX t2c ON t2(c); 

I think there may be much more leaf index b-tree pages whose header is
'0x0A' if the length of the content of index key 'c' is always 20-25 bytes,
as I notice the format of index inside sqlite consist of the index key and
rowid.

I can establish mapping relation between column 'c' and a new INTEGER column
'd'. Then I am wondering if it is reasonable to create new index t2(d) to
get a better performance, as sqlite stores INTEGER in a variable-length way
which means there will be less index pages. 

So if it is correct that the performance of indexed select is up to the
number of index pages which is fetched in getPageNormal() within the select?
I think it has positive correlation but I do not know if it is the major
constraint. 

And does sqlite have a profile tool to get call tree or execution time of
each functions? All I know is VDBE_PROFILE. 

Thanks for any light you can shed.


I want to profile sqlite



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  

/There was some text here that seems to have been lost and my browser opened
a "super resume templates" page... maybe I've been already spectre-hacked?
Anyway, I repost:/

The primary difference from what I see is that it changes the query plan to
start off with an index-based search on b, not a search (scan) on a. To me,
this can be due to the supplimentary condition in the ON clause making this
plan more appealing to the optimizer. So with this in mind, an easy
optimisation would be to just "permeate" any imperative non-null condition
in the where clause to the on clause. I don't know if the execution plan is
identical to that of an inner join (I never tried to actually convert it to
an inner join and I'm working on sth else atm), but it's satisfactory and it
starts off the right way, with a search on b, with an execution time well in
the expected margins.


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect intentionality the query should have been written as
such:

SELECT FROM (
  SELECT FROM a LEFT JOIN b
) WHERE b.c=5




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect intentionality the query should have been written as
such:

SELECT FROM (
  SELECT FROM a LEFT JOIN b
) WHERE b.c=5




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Keith Medcalf
>To reiterate, Keith: to get the query to execute properly, I didn't
>change
>the LEFT JOIN to an INNER JOIN!
>Nope,
>I rewrote
>
>SELECT
>FROM
>  a
>  LEFT JOIN b ON 
>WHERE b.c=5

This means:

Take all rows of table a
match with rows of table b on the key and where there is no match "fill" 
columns that would have come from b with a NULL
AND THEN AFTER THAT
return all rows where b.c = 5


>to

>SELECT
>FROM
>  a
>  LEFT JOIN b ON  AND b.c=5
>WHERE b.c IS NOT NULL

This translates to:
Take all rows of table a
match with rows of table b on the key AND where b.c = 5 and where there is no 
match "fill" columns that would have come from b with a NULL
AND THEN AFTER THAT
return only rows where b.c is not NULL

The primary difference is likely the test on b.c.  You see, to check if b.c is 
a particular value requires unpacking the row and retrieving the value.  
Checking whether b.c IS NULL does not require unpacking the data but merely 
reading the 'datatype', which should be a much faster operation.

If you wrote the query as an inner join it would be much faster since it would 
not be possible after projection for b.c to be anything other than not null.

>So I just added a redundant predicate and it runs perfectly, on
>SQLite!

>That's why I said this simple improvement can surely be taken care of
>on the
>optimizer, while the larger discussion of actually changing the outer
>join
>to an inner join or even tackling the -OR- case is for sure something
>nice
>to think of, but increasingly more complicated.

I think the query is ill-conceived when written as an outer join.  You are 
asking the query to "project" a result containing a crap-load of rows which you 
will subsequently remove from the result set.  For bitty-bases (a couple of 
hundred rows) this is irrelevant.  However, for tables containing millions of 
rows, creating a googillion of rows then culling out all of them except 4 is a 
very bad use of computer resources.  Why generate them in the first place if 
you are just going to discard them?

I have seen many a mis-designed program which works blazingly quickly on a 
"test" database that contains a dozen customers, half-a-dozen products, with 
two or three components each.  But when populated with the production data of 
1,000,000 customers, 500,000 products, and between 100 and 10,000 components 
per product frankly take "years" to run any step and are completely unuseable.

One should always design as if each table contained a googillion rows.  If it 
is fast enough then, it will certainly be efficient enough to handle your ten 
row per table database.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
To reiterate, Keith: to get the query to execute properly, I didn't change
the LEFT JOIN to an INNER JOIN!
Nope,
I rewrote

SELECT
FROM
  a
  LEFT JOIN b ON 
WHERE b.c=5

to

SELECT
FROM
  a
  LEFT JOIN b ON  AND b.c=5
WHERE b.c IS NOT NULL

So I just added a redundant predicate and it runs perfectly, on SQLite!
That's why I said this simple improvement can surely be taken care of on the
optimizer, while the larger discussion of actually changing the outer join
to an inner join or even tackling the -OR- case is for sure something nice
to think of, but increasingly more complicated.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> but cannot be re-ordered for OUTER JOINS.

Actually, I think order requirements (or rather, row grouping requirements,
as far as I can paint it the requirement is just that all outer joined rows
come in a bunch for each main join tuple) would not be violated if the join
is made on an unique key left-side and an index is used right-side :) or
something similar. I don't know, extensive algebra must be involved :) Even
without index order inference, the main trunk keys can be sorted in a temp
structure to preserve the condition, like in a GROUP BY query. However, I do
see that SQLite seems to actually do it (scan b before a I mean) if I
reqrite the query as I showed.

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a

Right, right, and with this in mind you can see my problem with the query is
so easy to understand:

My query, on the old format, is:

select *
from a, b
where 
  a.a *= b.a AND
  b.c = 5

My "improved" query, on the old format:

select *
from a, b
where 
  a.a *= b.a AND
  b.c *= 5 AND
  b.c = 5 // (OR IS NOT NULL)

You can see the b.c *= 5 (JOIN ON ... AND b.c=5 ... WHERE b.c IS NOT NULL)
is redundant, because it's just a weaker predicate, and I needed to add it
just as an index hint on the join loop to trigger the right execution plan.


Keith Medcalf wrote
> I would sincerely doubt that there is *any* SQL optimizer or query planner
> that can optimize mis-spoken queries containing OUTER JOINS.

I don't know about mis-spoken, I don't think anything is mis-spoken.
This app is running fine on Maria, I'm in the process of porting in to
SQLite. I wouldn't have picked on this query unless it was lagging behind
orders of magnitude (2.5s vs 50ms). So I think Maria does it (I haven't
bothered to check the execution plan there, went straight to hacking
SQLite).



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] difference between 'ID IS NULL' and 'ID = NULL'

2018-01-05 Thread Simon Slavin
On 5 Jan 2018, at 9:03pm, Doug Currie  wrote:

> NULL is not equal to NULL, though NULL is NULL.
> 
> sqlite> select NULL IS NULL;
> 
> 1
> 
> sqlite> select NULL = NULL;
> 
> 
> sqlite>

To expand on this, in SQL NULL has a special meaning.  It means "value unknown 
or missing".  It turns up naturally in some places, for example, in JOINs where 
the joined row doesn’t exist.  Don’t think of NULL as a value.  Think of it as 
a flag.

Because of this, testing for NULL is unexpectedly complicated, as shown by the 
above.  Two values might both be unknown, but this doesn’t automatically mean 
that they have the same value.  Yet an unknown value is indeed unknown, even if 
we don’t know what it is.  That’s why you get the results Doug posted above.

Many examples I see which store NULL as a value in the database don’t really 
mean NULL, they mean 0 or the empty string, or an empty list, and changing the 
database to store a more appropriate value lets the programmer strip many lines 
of complicated logic out of their software.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-05 Thread David Raymond
Something is seriously funky here. I'm getting the opposite, where your query 
appears to be going faster than mine. I used your queries there to populate 
nodes and edges, based on 1,000,000 nodes. I even added in the extra index 
which turns out isn't used anyway. With it all in memory my version is taking 
59 seconds, whereas your _new version is taking 28 seconds and the old version 
is only 34. So apparently I should be taking query advice from you.

If I change my union into a union all it goes down to 31 seconds, so closer to 
yours.
If I change your union all into a union the time jumps to 156 seconds.
I think I was thinking of a graph with possible loops or multiple paths to get 
from A to B, which is why I went with the union.

So my next question is: what SQLite version are you using, and what hardware 
are you on?

Are you query plans looking like what I'm seeing here?

sqlite> select * from v_count_leaves_new where top = 777;
--EQP-- 3,0,0,SCAN TABLE nodes
--EQP-- 4,0,1,SCAN TABLE r
--EQP-- 4,1,0,SEARCH TABLE edges AS e USING COVERING INDEX 
sqlite_autoindex_edges_1 (parent=?)
--EQP-- 2,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 1,0,0,SCAN SUBQUERY 2
--EQP-- 0,0,0,USING INDEX sqlite_autoindex_edges_1 FOR IN-OPERATOR
--EQP-- 0,0,0,SCAN SUBQUERY 1
top|count(*)
777|314
Run Time: real 28.502 user 28.454582 sys 0.00

--now with union all
sqlite> select * from leafCounts2 where parent = 777;
--EQP-- 3,0,0,SCAN TABLE edges
--EQP-- 4,0,0,SCAN TABLE paths
--EQP-- 4,1,1,SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 
(parent=?)
--EQP-- 2,0,0,COMPOUND SUBQUERIES 0 AND 0 (UNION ALL)
--EQP-- 1,0,0,SCAN SUBQUERY 2
--EQP-- 1,0,0,EXECUTE LIST SUBQUERY 5
--EQP-- 5,0,0,SCAN TABLE nodes
--EQP-- 5,1,1,SEARCH TABLE edges USING COVERING INDEX sqlite_autoindex_edges_1 
(parent=?)
--EQP-- 0,0,0,SCAN SUBQUERY 1
parent|leafCount
777|314
Run Time: real 31.590 user 31.434202 sys 0.00


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Shane Dev
Sent: Friday, January 05, 2018 4:57 PM
To: SQLite mailing list
Subject: Re: [sqlite] Efficient query to count number of leaves in a DAG.

Hi David,

According to https://sqlite.org/lang_with.html, "Optimization note: ...if
the example had used UNION instead of UNION ALL, then SQLite would have had
to keep around all previously generated content in order to check for
duplicates. For this reason, programmers should strive to use UNION ALL
instead of UNION when feasible."

Despite that, your RCTE with UNION is much faster than mine.

sqlite> select count(*) from nodes;
count(*)
1
sqlite> select count(*) from edges;
count(*)
9990

Here is how create my test data -

sqlite> .sch v_generate_nodes
-- Generates an infinite series of x, 'nodex' records where x = 1, 2, 3 ...
CREATE VIEW v_generate_nodes as with recursive rcte(id, description) as
(select 1, 'node1' union all select id+1, 'node'||(id+1) from rcte) select
* from rcte;
sqlite> insert into nodes select from v_generate_nodes limit 1;

sqlite> .sch v_generate_edges
-- Randomly generates edges between entries in the nodes table.
---Assumption : node ids are 1, 2, 3...n without gaps
-- Each node will have 0 or 1 parents and 0, 1, 2, ... children
CREATE VIEW v_generate_edges as with rcte(parent, child) as (select
cast(abs(random())/9223372036854775808 as integer), 1 union all select
cast(abs(random())/9223372036854775808*(child+1) as integer), child+1 from
rcte where child <= (select count(*) from nodes) limit (select count(*)
from nodes)) select * from rcte where parent>0;
sqlite> insert into edges select * from v_generate_edges;



On 5 January 2018 at 18:32, David Raymond  wrote:

> Hmm. Maybe try yours with union instead of union all? Though if there's
> only 1 path between any pair of nodes that shouldn't make too much
> difference. Otherwise I'm getting low on ideas.
>
> What're the record counts for nodes and edges?
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Shane Dev
> Sent: Thursday, January 04, 2018 5:20 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] Efficient query to count number of leaves in a DAG.
>
> Hi David
>
> I recommend using longer names than 1 letter for your aliases, what you
> > save in typing you lose a couple times over again when wondering what "r"
> > is or why "t" has anything to do with "nodes"
> >
>
> Fair enough. I tend to use shorts names to reduce the risk of typos. My
> original node table was called "tasks". I tried to simplify the query for
> this forum post but neglected to change the alias.
>
> >
> > In your CTE you're doing a 3 table join. There's no need to include the
> > nodes table in there at all, you can get the node ID from the edge table.
> > ...union all select e.child, top from r, edges as e where e.parent =
> r.id
> > )...
> >
>
> You're right in this case. My original 

Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Keith Medcalf

>Thanks for acknowledging this, you are absolutely right, that's why I
>stated
>that no DB does perfect optimisations and that computing the
>alternative
>-OR- based closures are probably much harder to tackle. Also E. Pasma
>pointed out the -OR- queries as I wrote them are not really
>semantically
>equivalent unless the 2 joins are disjunct.

I suspect that query re-write of an outer join to an inner join would be 
violating some rule in the new SQL standards, most likely about visitation 
(nested loop) order -- which is freely reorderable for INNER JOINS but cannot 
be re-ordered for OUTER JOINS.  Back in the "olden days" one specified outer 
join conditions in the WHERE clause using *= =* or *=* syntax, where the * was 
on the side of the operator where all rows came from.  This was deprecated many 
years ago when the  JOIN ON  syntatactic sugar was created 
because too many people where forgetting that they need to use the appropriate 
* to "bind" the where condition to the appropriate outer join 
binding, then complaining that it was too difficult to remember or figure out 
where a plain "WHERE" clause was needed and where a "OUTER JOIN" bound 
condition was required, resulting in many calls to SQL Database support lines.  
And these same vendors also happen to be who write the specs, so they promptly 
changed the spec to eliminate the support calls.

So the ** format was written out of the standard and the OUTER JOIN 
ON became mandatory ONLY FOR OUTER JOIN operations.  In all other cases the 
JOIN ON syntax was just syntactic sugar for the old "list of tables" and where 
clause.  Don't recall exactly when this occurred but it was about two decades 
ago.  ( I remember it well because there were many queries that could not be 
expressed in the new-fangled format )

So really, adding conditions to apply to an OUTER JOIN to the where clause is 
equivalent to the common error of yester-decades of forgetting the *.

OLD SYNTAX:

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a

Of course, the old syntax allowed one to specify algebraic conditions that can 
no longer be expressed with the new syntax.  But that is OK, it is easier and 
simpler to say:  go RTFM, it is jolly clear, and if you don't like it then 
boo-hoo on you.  A good implementation was destroyed.  Live with it, love it, 
and get over it.

I would sincerely doubt that there is *any* SQL optimizer or query planner that 
can optimize mis-spoken queries containing OUTER JOINS.  Not even DB2's 
exhaustive search query planner/optimizer can do it, and it is quite possibly 
now one of the best in existence if you tell it that it can take unlimited time 
and resources to generate (and run) the plan.  All others pale in comparison.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Preventing a slow, determinstic function from running twice

2018-01-05 Thread Rob Hoelz
On Fri, 5 Jan 2018 14:42:00 -0600
Rob Hoelz  wrote:

> On Fri, 5 Jan 2018 15:04:16 -0500
> Richard Hipp  wrote:
> 
> > On 1/5/18, r...@hoelz.ro  wrote:
> > > Hi SQLite users and devs,
> > >
> > > I have an application using SQLite which stores XZ-compressed
> > > blobs of JSON from the Twitter API to minimize disk usage.  My
> > > unxz function is a little slow, and I've noticed that if I
> > > specify the function several times in a query, it gets run
> > > multiple times, even though it's deterministic.  For example:
> > >
> > 
> > Your example did not go through.  Please resend.  Use plain-text
> > email for best results.
> > 
> > >
> > >
> > > In the above query, unxz is run three times, even though content
> > > doesn't change within the same row.  Is there a way to tell SQLite
> > > to only run a function once per row of results?  I looked into
> > > https://sqlite.org/c3ref/get_auxdata.html, but it appears that
> > > metadata is only stored for compile-time constants.
> > >
> > > Thanks,
> > > Rob
> > >
> > >
> > >
> > >
> > > --
> > > Sent from: http://sqlite.1065341.n5.nabble.com/
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users@mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > 
> > 
> 
> 
> Here's the example:
> 
> > select json_ref(unxz(content), '$.user.id'), json_ref(unxz(content),
> > '$.full_text) from archive where json_ref(unxz(content),
> > '$.full_text') like '%dogs%';
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Sorry - I just noticed that I wrote "json_ref" where I meant to write 
"json_extract".
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-05 Thread Shane Dev
Hi David,

According to https://sqlite.org/lang_with.html, "Optimization note: ...if
the example had used UNION instead of UNION ALL, then SQLite would have had
to keep around all previously generated content in order to check for
duplicates. For this reason, programmers should strive to use UNION ALL
instead of UNION when feasible."

Despite that, your RCTE with UNION is much faster than mine.

sqlite> select count(*) from nodes;
count(*)
1
sqlite> select count(*) from edges;
count(*)
9990

Here is how create my test data -

sqlite> .sch v_generate_nodes
-- Generates an infinite series of x, 'nodex' records where x = 1, 2, 3 ...
CREATE VIEW v_generate_nodes as with recursive rcte(id, description) as
(select 1, 'node1' union all select id+1, 'node'||(id+1) from rcte) select
* from rcte;
sqlite> insert into nodes select from v_generate_nodes limit 1;

sqlite> .sch v_generate_edges
-- Randomly generates edges between entries in the nodes table.
---Assumption : node ids are 1, 2, 3...n without gaps
-- Each node will have 0 or 1 parents and 0, 1, 2, ... children
CREATE VIEW v_generate_edges as with rcte(parent, child) as (select
cast(abs(random())/9223372036854775808 as integer), 1 union all select
cast(abs(random())/9223372036854775808*(child+1) as integer), child+1 from
rcte where child <= (select count(*) from nodes) limit (select count(*)
from nodes)) select * from rcte where parent>0;
sqlite> insert into edges select * from v_generate_edges;



On 5 January 2018 at 18:32, David Raymond  wrote:

> Hmm. Maybe try yours with union instead of union all? Though if there's
> only 1 path between any pair of nodes that shouldn't make too much
> difference. Otherwise I'm getting low on ideas.
>
> What're the record counts for nodes and edges?
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Shane Dev
> Sent: Thursday, January 04, 2018 5:20 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] Efficient query to count number of leaves in a DAG.
>
> Hi David
>
> I recommend using longer names than 1 letter for your aliases, what you
> > save in typing you lose a couple times over again when wondering what "r"
> > is or why "t" has anything to do with "nodes"
> >
>
> Fair enough. I tend to use shorts names to reduce the risk of typos. My
> original node table was called "tasks". I tried to simplify the query for
> this forum post but neglected to change the alias.
>
> >
> > In your CTE you're doing a 3 table join. There's no need to include the
> > nodes table in there at all, you can get the node ID from the edge table.
> > ...union all select e.child, top from r, edges as e where e.parent =
> r.id
> > )...
> >
>
> You're right in this case. My original node table "tasks" had more columns
> which I wanted in the final result set.
>
> >
> > The big thing though is in the where clause.
> > where...and id not in (select parent from edges where parent = id)...
> >
>
> That was a sloppy mistake, I changed it to  ..and id not in (select parent
> from edges)... but it was still very slow
>
> >
> > Old:
> > sqlite> explain query plan with recursive r (id, top) as (select id, id
> > from nodes union all select t.id, top from nodes as t, edges as e, r
> > where e.parent = r.id and t.id = e.child) select top, count(*) from r
> > where top != id and id not in (select parent from edges where parent =
> id)
> > group by top;
> >
>
> CREATE VIEW v_count_leaves as with recursive r (id, top) as (select id, id
> from nodes union all select t.id, top from nodes as t, edges as e, r where
> e.parent = r.id and t.id = e.child) select top, count(*) from r where top
> != id and id not in (select parent from edges where parent = id) group by
> top;
>
> sqlite> select * from v_count_leaves where top=679;
> top count(*)
> 679 2
> Run Time: real 73.365 user 73.328125 sys 0.00
>
>
> > New:
> > sqlite> explain query plan with recursive r (id, top) as (select id, id
> > from nodes union all select e.child, top from edges as e, r where
> e.parent
> > = r.id) select top, count(*) from r where top != id and id not in
> (select
> > parent from edges) group by top;
> > Now give your modified query a go and let me know how it compares to what
> > I came up with.
> >
>
> CREATE VIEW v_count_leaves_new as with recursive r (id, top) as (select id,
> id from nodes union all select e.child, top from edges as e, r where
> e.parent = r.id) select top, count(*) from r where top != id and id not in
> (select parent from edges) group by top;
>
> sqlite> select * from v_count_leaves_new where top=679;
> top count(*)
> 679 2
> Run Time: real 45.099 user 45.093750 sys 0.00
>
> faster, but about 8 times slower than your query -
>
> sqlite> select * from leafcounts where parent=679;
> parent  leafCount
> 679 2
> Run Time: real 5.639 user 5.640625 sys 0.00
>
> and that is without the reverseEdges index.
>
> I still don't understand 

Re: [sqlite] Preventing a slow, determinstic function from running twice

2018-01-05 Thread Rob Hoelz
On Fri, 5 Jan 2018 12:02:21 -0800
Jens Alfke  wrote:

> > On Jan 5, 2018, at 11:25 AM, r...@hoelz.ro wrote:
> > 
> > … My unxz function is a little slow, and I've noticed that if I
> > specify the function several times in a query, it gets run multiple
> > times, even though it's deterministic. … Is there a way to tell
> > SQLite to only run a function once per row of results?
> 
> Not currently. There was a long-ish email thread about that here last
> September, with subject "Common subexpression optimization of
> deterministic functions”. I don’t think there was any resolution,
> just that it’s an optimization that hasn’t been implemented yet.
> 
> —Jens
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Thanks for letting me know about the thread, Jens - I did a cursory search of 
the archives but I must've been using the wrong keywords!  I'll read up on that.

-Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread R Smith

At the risk of preserving this thread well past its end of life cycle...

On 2018/01/05 6:58 PM, Dinu wrote:

Ryan,

You cannot ask SQL a Method query, that's where my whole RDBMS understanding
takes me.


Everything you ask SQL is underpinned by a specific Method. Perhaps I 
should have been more clear - by METHOD I mean, in the case of the join 
method, "the /way/ you expect the query to enumerate rows" such as an 
Inner join being one method, and outer join being another method. 
Sorting/Ordering is a method of output, grouping, etc.


Why would the SQL standard propose these different methods if they were 
not meaningful and distinct?


As I said before, there need not be an AI to judge that the query 
uttered by the programmer can in fact, in a narrow case, be recomputed 
as another query because the result will be the same and hopefully that 
the alternate method would be more efficient.


Note that I said there needn't be... I did not say there /can't/ be one, 
indeed, query flattening is a good example, but with query flattening 
the cost is low and the reward is high for a really broad spectrum of cases.


Why did I say it is not needed?  Well, what you propose has a relatively 
high cost (added heuristic AI) considering it is paid across all DB 
engine query planning to achieve a small advantage in the narrowest of 
use cases, not to mention that - should the programmer wish for a 
speed-up for the left join that conforms to this narrow set of 
circumstances, he or she could instantly change it to a normal join (the 
way it should have been to start with) and enjoy the fruits of the added 
speed with zero cost to the rest of us who wouldn't have made the 
imperfect query in the first place.


Why do I call it narrow?  Have you looked at your example queries in 
detail? Do you know how many things must be exactly just so (or how many 
other normal query things must be absent) for that join replacement to 
work algebraically? At least in the case of query flattening, it 
improved a query construct that is found abundantly and considered the 
correct construct for the expected results.


You essentially want the engine to second-guess programmers who didn't 
write the best query for their expected results. That kind of 
hand-holding belongs to the realms of Microsoft and MySQL.


(Apologies for all the word clarifications, but I'd rather avoid having 
this turn into a "semantics" debate, so trying to be as clear as 
possible on meanings of statements... not sure I succeeded though)  :)



Cheers,
Ryan

PS: I'm not judging MySQL, at least it has the benefit of being a fully 
fledged server-side software and greatly tweak-able on the fly for all 
its hand-holdy functionality.
PPS: I will say this - If you're not using MySQL in STRICT mode, you 
are  n  hours away from some disaster, where n is a not-too-big positive 
integer.

PPPS: I wish SQLite had a STRICT mode. :)



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] difference between 'ID IS NULL' and 'ID = NULL'

2018-01-05 Thread Doug Currie
Cezary is correct,

NULL is not equal to NULL, though NULL is NULL.

sqlite> select NULL IS NULL;

1

sqlite> select NULL = NULL;


sqlite>


e
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] difference between 'ID IS NULL' and 'ID = NULL'

2018-01-05 Thread Cezary H. Noweta

Hwllo,

On 2018-01-05 21:19, x wrote:

Tbl has 2.4 million records and has an INTEGER PRIMARY KEY ID which is NOT NULL.

SELECT ID FROM Tbl where ID = NULL


It has no sense as ``='' returns NULL when one of operands is NULL. NULL 
evaluates to FALSE. Above query can be described as SELECT NOTHING.



SELECT ID FROM Tbl WHERE ID IS NULL



takes around 350 milliseconds to run and the explain query plan shows it uses 
some covering index.


SQLite is looking for NULL values of ID.

Please, please, do not require to optimize such query due to the fact 
that your field does not contain NULLs. :-)



I’m sure I’ve read something before about this but can’t find it in the 
documentation?


STH IS STH allows NULLs and returns 1 if both are NULLs, 0 otherwise. 
http://sqlite.org/lang_expr.html#isisnot


-- best regards

Cezary H. Noweta
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] difference between 'ID IS NULL' and 'ID = NULL'

2018-01-05 Thread David Raymond
I  it's treating "is" as an operator on the value.

So with the = version, it knows it can use the index to zip to where the nulls 
would be and ends pretty quick when it doesn't find any.

With the IS, it thinks it has to check every value because indexes are sorted 
on =, <, > etc, and not on "is"

This is just my guess anyway. What has you checking an integer primary key for 
null anyway?


SQLite version 3.21.0 2017-10-24 18:55:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table tbl (id integer primary key);

sqlite> explain query plan select id from tbl where id = null;
selectid|order|from|detail
0|0|0|SEARCH TABLE tbl USING INTEGER PRIMARY KEY (rowid=?)

sqlite> explain query plan select id from tbl where id is null;
selectid|order|from|detail
0|0|0|SCAN TABLE tbl

sqlite> explain select id from tbl where id = null;
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 8 000  Start at 8
1 OpenRead   0 3 0 0  00  root=3 iDb=0; tbl
2 Explain0 0 0 SEARCH TABLE tbl USING INTEGER PRIMARY 
KEY (rowid=?)  00
3 Null   0 1 000  r[1]=NULL
4 SeekRowid  0 7 100  intkey=r[1]; pk
5 Copy   1 2 000  r[2]=r[1]
6 ResultRow  2 1 000  output=r[2]
7 Halt   0 0 000
8 Transaction0 0 2 0  01  usesStmtJournal=0
9 Goto   0 1 000
Run Time: real 0.000 user 0.00 sys 0.00

sqlite> explain select id from tbl where id is null;
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 10000  Start at 10
1 OpenRead   0 3 0 0  00  root=3 iDb=0; tbl
2 Explain0 0 0 SCAN TABLE tbl  00
3 Rewind 0 9 000
4   Rowid  0 1 000  r[1]=rowid
5   NotNull1 8 000  if r[1]!=NULL goto 8
6   Copy   1 2 000  r[2]=r[1]
7   ResultRow  2 1 000  output=r[2]
8 Next   0 4 001
9 Halt   0 0 000
10Transaction0 0 2 0  01  usesStmtJournal=0
11Goto   0 1 000


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of x
Sent: Friday, January 05, 2018 3:20 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] difference between 'ID IS NULL' and 'ID = NULL'

Tbl has 2.4 million records and has an INTEGER PRIMARY KEY ID which is NOT NULL.

SELECT ID FROM Tbl where ID = NULL

takes only a few milliseconds to run and the explain query plan shows it uses 
the primary key.

SELECT ID FROM Tbl WHERE ID IS NULL

takes around 350 milliseconds to run and the explain query plan shows it uses 
some covering index.


I’m sure I’ve read something before about this but can’t find it in the 
documentation?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Preventing a slow, determinstic function from running twice

2018-01-05 Thread Rob Hoelz
On Fri, 5 Jan 2018 15:04:16 -0500
Richard Hipp  wrote:

> On 1/5/18, r...@hoelz.ro  wrote:
> > Hi SQLite users and devs,
> >
> > I have an application using SQLite which stores XZ-compressed blobs
> > of JSON from the Twitter API to minimize disk usage.  My unxz
> > function is a little slow, and I've noticed that if I specify the
> > function several times in a query, it gets run multiple times, even
> > though it's deterministic.  For example:
> >
> 
> Your example did not go through.  Please resend.  Use plain-text email
> for best results.
> 
> >
> >
> > In the above query, unxz is run three times, even though content
> > doesn't change within the same row.  Is there a way to tell SQLite
> > to only run a function once per row of results?  I looked into
> > https://sqlite.org/c3ref/get_auxdata.html, but it appears that
> > metadata is only stored for compile-time constants.
> >
> > Thanks,
> > Rob
> >
> >
> >
> >
> > --
> > Sent from: http://sqlite.1065341.n5.nabble.com/
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> 
> 


Here's the example:

> select json_ref(unxz(content), '$.user.id'), json_ref(unxz(content),
> '$.full_text) from archive where json_ref(unxz(content),
> '$.full_text') like '%dogs%';
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] difference between 'ID IS NULL' and 'ID = NULL'

2018-01-05 Thread x
Tbl has 2.4 million records and has an INTEGER PRIMARY KEY ID which is NOT NULL.

SELECT ID FROM Tbl where ID = NULL

takes only a few milliseconds to run and the explain query plan shows it uses 
the primary key.

SELECT ID FROM Tbl WHERE ID IS NULL

takes around 350 milliseconds to run and the explain query plan shows it uses 
some covering index.


I’m sure I’ve read something before about this but can’t find it in the 
documentation?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Preventing a slow, determinstic function from running twice

2018-01-05 Thread Richard Hipp
On 1/5/18, r...@hoelz.ro  wrote:
> Hi SQLite users and devs,
>
> I have an application using SQLite which stores XZ-compressed blobs of JSON
> from the Twitter API to minimize disk usage.  My unxz function is a little
> slow, and I've noticed that if I specify the function several times in a
> query, it gets run multiple times, even though it's deterministic.  For
> example:
>

Your example did not go through.  Please resend.  Use plain-text email
for best results.

>
>
> In the above query, unxz is run three times, even though content doesn't
> change within the same row.  Is there a way to tell SQLite to only run a
> function once per row of results?  I looked into
> https://sqlite.org/c3ref/get_auxdata.html, but it appears that metadata is
> only stored for compile-time constants.
>
> Thanks,
> Rob
>
>
>
>
> --
> Sent from: http://sqlite.1065341.n5.nabble.com/
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Preventing a slow, determinstic function from running twice

2018-01-05 Thread Jens Alfke


> On Jan 5, 2018, at 11:25 AM, r...@hoelz.ro wrote:
> 
> … My unxz function is a little slow, and I've noticed that if I specify the 
> function several times in a query, it gets run multiple times, even though 
> it's deterministic. …
> Is there a way to tell SQLite to only run a function once per row of results?

Not currently. There was a long-ish email thread about that here last 
September, with subject "Common subexpression optimization of deterministic 
functions”. I don’t think there was any resolution, just that it’s an 
optimization that hasn’t been implemented yet.

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Preventing a slow, determinstic function from running twice

2018-01-05 Thread r...@hoelz.ro
Hi SQLite users and devs,

I have an application using SQLite which stores XZ-compressed blobs of JSON
from the Twitter API to minimize disk usage.  My unxz function is a little
slow, and I've noticed that if I specify the function several times in a
query, it gets run multiple times, even though it's deterministic.  For
example:



In the above query, unxz is run three times, even though content doesn't
change within the same row.  Is there a way to tell SQLite to only run a
function once per row of results?  I looked into
https://sqlite.org/c3ref/get_auxdata.html, but it appears that metadata is
only stored for compile-time constants.

Thanks,
Rob




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] new Error database disk image is malformed

2018-01-05 Thread Ron Barnes
I'm coding that now.  I never thought about it.  Thank you!

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jens Alfke
Sent: Friday, January 5, 2018 1:34 PM
To: SQLite mailing list 
Subject: Re: [sqlite] new Error database disk image is malformed



> On Jan 4, 2018, at 6:29 PM, Peter Da Silva  
> wrote:
> 
> Since you're I/O bound on socket connections, and not CPU or database bound, 
> you might want to just have one database thread that communicates using 
> native inter-thread messaging to pass out work and accept responses from the 
> worker threads.

+1. Also, this will make it easy to batch multiple updates into a single 
transaction, which greatly improves write performance.

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] new Error database disk image is malformed

2018-01-05 Thread Jens Alfke


> On Jan 4, 2018, at 6:29 PM, Peter Da Silva  
> wrote:
> 
> Since you're I/O bound on socket connections, and not CPU or database bound, 
> you might want to just have one database thread that communicates using 
> native inter-thread messaging to pass out work and accept responses from the 
> worker threads.

+1. Also, this will make it easy to batch multiple updates into a single 
transaction, which greatly improves write performance.

—Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Efficient query to count number of leaves in a DAG.

2018-01-05 Thread David Raymond
Hmm. Maybe try yours with union instead of union all? Though if there's only 1 
path between any pair of nodes that shouldn't make too much difference. 
Otherwise I'm getting low on ideas.

What're the record counts for nodes and edges?

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Shane Dev
Sent: Thursday, January 04, 2018 5:20 PM
To: SQLite mailing list
Subject: Re: [sqlite] Efficient query to count number of leaves in a DAG.

Hi David

I recommend using longer names than 1 letter for your aliases, what you
> save in typing you lose a couple times over again when wondering what "r"
> is or why "t" has anything to do with "nodes"
>

Fair enough. I tend to use shorts names to reduce the risk of typos. My
original node table was called "tasks". I tried to simplify the query for
this forum post but neglected to change the alias.

>
> In your CTE you're doing a 3 table join. There's no need to include the
> nodes table in there at all, you can get the node ID from the edge table.
> ...union all select e.child, top from r, edges as e where e.parent = r.id
> )...
>

You're right in this case. My original node table "tasks" had more columns
which I wanted in the final result set.

>
> The big thing though is in the where clause.
> where...and id not in (select parent from edges where parent = id)...
>

That was a sloppy mistake, I changed it to  ..and id not in (select parent
from edges)... but it was still very slow

>
> Old:
> sqlite> explain query plan with recursive r (id, top) as (select id, id
> from nodes union all select t.id, top from nodes as t, edges as e, r
> where e.parent = r.id and t.id = e.child) select top, count(*) from r
> where top != id and id not in (select parent from edges where parent = id)
> group by top;
>

CREATE VIEW v_count_leaves as with recursive r (id, top) as (select id, id
from nodes union all select t.id, top from nodes as t, edges as e, r where
e.parent = r.id and t.id = e.child) select top, count(*) from r where top
!= id and id not in (select parent from edges where parent = id) group by
top;

sqlite> select * from v_count_leaves where top=679;
top count(*)
679 2
Run Time: real 73.365 user 73.328125 sys 0.00


> New:
> sqlite> explain query plan with recursive r (id, top) as (select id, id
> from nodes union all select e.child, top from edges as e, r where e.parent
> = r.id) select top, count(*) from r where top != id and id not in (select
> parent from edges) group by top;
> Now give your modified query a go and let me know how it compares to what
> I came up with.
>

CREATE VIEW v_count_leaves_new as with recursive r (id, top) as (select id,
id from nodes union all select e.child, top from edges as e, r where
e.parent = r.id) select top, count(*) from r where top != id and id not in
(select parent from edges) group by top;

sqlite> select * from v_count_leaves_new where top=679;
top count(*)
679 2
Run Time: real 45.099 user 45.093750 sys 0.00

faster, but about 8 times slower than your query -

sqlite> select * from leafcounts where parent=679;
parent  leafCount
679 2
Run Time: real 5.639 user 5.640625 sys 0.00

and that is without the reverseEdges index.

I still don't understand why "leafcounts" is so much faster than
"v_count_leaves_new"




> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Shane Dev
> Sent: Wednesday, January 03, 2018 12:45 AM
> To: SQLite mailing list
> Subject: Re: [sqlite] Efficient query to count number of leaves in a DAG.
>
> Hi David,
>
> Nice work! your query is far quicker than mine- even without the
> reverseEdges index. I think you are right about the problem of potentially
> double counting leaves. There weren't any multi-parent nodes in my test
> data so I didn't notice this mistake.
>
> Could you please explain why your query is so much faster?
>
> On 2 January 2018 at 17:50, David Raymond 
> wrote:
>
> > I think you need a union there instead of a union all. Otherwise you're
> > double (or more) counting leaves where there is more than 1 path to get
> to
> > the leaf.
> >
> > I don't have a large dataset to test it on, but how about something like:
> >
> > create table nodes
> > (
> >   id integer primary key,
> >   description text
> > );
> >
> > create table edges
> > (
> >   parent int not null references nodes,
> >   child int not null references nodes,
> >   primary key (parent, child),
> >   check (parent != child)
> > ) without rowid;
> > create index reverseEdges on edges (child, parent);
> >
> > create view leafCounts as with recursive
> > leaves (id) as (
> >   select nodes.id
> >   from nodes left outer join edges
> >   on nodes.id = edges.parent
> >   where edges.parent is null
> > ),
> > paths (parent, child) as (
> >   select parent, child from edges
> >   union
> >   select paths.parent, edges.child
> >   from paths inner join 

[sqlite] Explain query plan output and recursive CTE's

2018-01-05 Thread David Raymond
Quick thing I noticed looking at some explain query plan outputs. When you get 
to the compound subqueries part in a recursive CTE it always seems to give the 
subquery numbers as 0 and 0. I’m not sure if that’s a bug, a limitation, or 
intended behavior. There're no issues with the queries themselves, just the 
text in the explain query plan output.

(following queries aren't equivalent, they're just showing the numbering issue)
"Normal", non-recursive, and recursive examples given.

SQLite version 3.21.0 2017-10-24 18:55:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table foo (x);

sqlite> explain query plan select x from foo union select x + 1 from foo;
selectid|order|from|detail
1|0|0|SCAN TABLE foo
2|0|0|SCAN TABLE foo
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)

sqlite> explain query plan with bar as (select x from foo) select * from bar 
union select * from foo;
selectid|order|from|detail
1|0|0|SCAN TABLE foo
2|0|0|SCAN TABLE foo
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)

sqlite> explain query plan with recursive bar (x) as (select x from foo union 
select x + 1 from bar) select * from bar;
selectid|order|from|detail
2|0|0|SCAN TABLE foo
3|0|0|SCAN TABLE bar
1|0|0|COMPOUND SUBQUERIES 0 AND 0 USING TEMP B-TREE (UNION)
0|0|0|SCAN SUBQUERY 1

sqlite>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Loadable extension with shared state

2018-01-05 Thread Simon Slavin
On 5 Jan 2018, at 12:12pm, Lifepillar  wrote:

> I can make application functions access the context only for reading,
> if that matters.

Nope.  Reading vs. writing doesn’t matter.  You treat them both the same.

Nor does it matter whether you’re doing something trivial with sqlite3_vtab() 
or doing a multi-row UPDATE.

What does matter is whether all your threads are using the same SQLite3 
connection and whether they’re all using the same statement handle.  I don’t 
think you’ve made this plain anywhere, or whether it can be changed if it would 
help things.

In terms of SQLITE3_THREADSAFE, I can’t improve on



Read that, do the thing in section 3, and if you have questions about it, ask 
here.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Short sum-up:
- The -OR- alternative I provided is not semantically equivalent, I will
work on one that is :)
- The other one, without the -OR- (second post) still stands.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Richard,

Thanks for acknowledging this, you are absolutely right, that's why I stated
that no DB does perfect optimisations and that computing the alternative
-OR- based closures are probably much harder to tackle. Also E. Pasma
pointed out the -OR- queries as I wrote them are not really semantically
equivalent unless the 2 joins are disjunct.

However, the case of the imperative WHERE NOT NULL implying INNER JOIN is
just a matter of replacing a predicate with a stronger one, so in all
fairness I imagined it a far lesser overhead than, say, the query flattener.
And I imagine it's a much more common situation, too, especially when users
are adding additional filters via WHERE clauses to a base query, so it might
benefit a lot of users, too. I know it would us, by not having to rewrite
these queries when porting; we are working on x86 servers, and a stick of
memory or a hard drive cost less than a programmer's day for us :) 

For the extra memory, I know for computing relational closures the spatial
complexity can get big, but only when the structure of the query is written
warrants it in the first place, so it shouldn't manifest heavily on a query
that doesn't have this structure.

This is just my best view on this, obviously it's a political decision to be
made so it's no make-it-or-break-it thing, like mentioned before, we are
porting some pretty big system and when I notice differences with SQLite, I
jolt them down, in the hope it might benefit you or the millions of users,
if not by changing SQLite, then simply by pointing out the workaround to
other users, such as moving the WHERE condition out to the ON clause, it's
not necessarily a trivial thing to consider for everyone. 

Ryan, 

You cannot ask SQL a Method query, that's where my whole RDBMS understanding
takes me. It nullifies the purpose of queries as well as all efforts you
yourselves have put into a lot of things, query flattening to mention just
one. The "same result" is not accidental, the equivalent queries will
produce the same result no matter which data populates the tables. That is
the only deffinition I know of semantic equivalence. SQL is declarative and
thus everything that describes the same thing is the same thing.

E.Pasma,

Thanks for taking the time to make the TC. This is always a huge putdown for
me, because finally the execution plan depends on the data indexes are
populated with (via ANALYZE) and are tables are huge so it's always a
putdown for me to create a minimal TC.

Indeed I noticed just now the 2 queries are not equivalent that way :)
Thanks for pointing that out! I will work on an equivalent -and- optimized
rewrite :)

For the query plans though, here is where the index stats come in: here a
"SCAN a" makes sense, but in our case the number of records in a is on the
order of 1x records to b and c, and also the cardinality of b.d and c.d
is on the order of 1000; so a "SEARCH b, SEARCH c" works out. 

At minimum you should have indexes on b.d, c.d, a.ab, a.ac; but even so and
with adding another 1000 records on a, b and c and running the query:

EXPLAIN QUERY PLAN
SELECT * 
FROM 
  a 
JOIN 
  b ON b=ab AND b.d IN (1,2,3) 
JOIN 
  c ON c=ac AND c.d IN (4,5) 

selectid |order |from |detail |
-|--|-|---|
0|0 |0|SCAN TABLE a   |
0|1 |1|SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) |
0|0 |0|EXECUTE LIST SUBQUERY 1|
0|2 |2|SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) |

it still plans a "SCAN a" first. So I guess I'll have to backtrack from the
real data to generate a TC.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread E.Pasma
op 05-01-2018 17:23 schreef David Raymond op david.raym...@tomtom.com:

>> Anyway the two queries return the same set of rows.
> 
>> This test also show a small semantic difference in the two queries.
>> The set of rows is the same but the second query leaves certain
>> details null if only one of the OR conditions is true. That occurs in
>> row 1.
> 
> You're contradicting yourself there. If there's a difference in the results
> then they're not the same set of rows.
> 
> We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has
> (1, 1)
> 
> Best if viewed in a fixed-width font
> 
> Step by step version 1:
> 
> a left join b on b = ab
> 
> a b
> a ab ac   b  d
> (1, 1, 1) (1, 1)
> 
> result:
> a ab ac  b  d
> (1, 1, 1, 1, 1)
> 
> left join c on c = ab
> 
> c
> a ab ac  b  d   c  d
> (1, 1, 1, 1, 1) (1, 1)
> 
> result:
> a ab ac  b  d  c  d
> (1, 1, 1, 1, 1, 1, 1)
> 
> where b.d in (1, 2, 3) or c.d in (4, 5)
> 
> b.d is 1, so it passes
> result:
> 
> a ab ac  b  d  c  d
> (1, 1, 1, 1, 1, 1, 1)
> 
> The OR worked. We got the values from both tables b, and c and because one of
> them was correct.
> 
> 
> 
> Now, Step by step version 2:
> 
> a left join b on b = ab and b.d in (1, 2, 3)
> 
> a b
> a ab ac   b  d
> (1, 1, 1) (1, 1)
> 
> result:
> a ab ac  b  d
> (1, 1, 1, 1, 1)  same so far (but only by coincidence)
> 
> left join c on c = ac and c.d in (4, 5)
> 
> c
> a ab ac  b  d   c  d
> (1, 1, 1, 1, 1) (1, 1)
> 
> the ON condition doesn't match. Since this is an outer join, and there were no
> matches for the row in the left side, nulls are included
> 
> result:
> a ab ac  b  dc d
> (1, 1, 1, 1, 1,  null, null)
> 
> where b.d is not null or c.d is not null
> 
> b.d isn't null, so that passes.
> result:
> 
> a ab ac  b  dc d
> (1, 1, 1, 1, 1,  null, null)
> 
> So we get a row saying that a matched something in b, but we're throwing out
> the value from the c table, which is not what we wanted.

Thanks, all clear except this last line. Did we not want the value to be
thrown out. Or not want the value?
It depends on that which query is favourite.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread David Raymond
> Anyway the two queries return the same set of rows.

> This test also show a small semantic difference in the two queries.  
> The set of rows is the same but the second query leaves certain  
> details null if only one of the OR conditions is true. That occurs in  
> row 1.

You're contradicting yourself there. If there's a difference in the results 
then they're not the same set of rows.

We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has 
(1, 1)

Best if viewed in a fixed-width font

Step by step version 1:

a left join b on b = ab

 a b
 a ab ac   b  d
(1, 1, 1) (1, 1)

result:
 a ab ac  b  d
(1, 1, 1, 1, 1)

left join c on c = ab

 c
 a ab ac  b  d   c  d
(1, 1, 1, 1, 1) (1, 1)

result:
 a ab ac  b  d  c  d
(1, 1, 1, 1, 1, 1, 1)

where b.d in (1, 2, 3) or c.d in (4, 5)

b.d is 1, so it passes
result:

 a ab ac  b  d  c  d
(1, 1, 1, 1, 1, 1, 1)

The OR worked. We got the values from both tables b, and c and because one of 
them was correct.



Now, Step by step version 2:

a left join b on b = ab and b.d in (1, 2, 3)

 a b
 a ab ac   b  d
(1, 1, 1) (1, 1)

result:
 a ab ac  b  d
(1, 1, 1, 1, 1)  same so far (but only by coincidence)

left join c on c = ac and c.d in (4, 5)

 c
 a ab ac  b  d   c  d
(1, 1, 1, 1, 1) (1, 1)

the ON condition doesn't match. Since this is an outer join, and there were no 
matches for the row in the left side, nulls are included

result:
 a ab ac  b  dc d
(1, 1, 1, 1, 1,  null, null)

where b.d is not null or c.d is not null

b.d isn't null, so that passes.
result:

 a ab ac  b  dc d
(1, 1, 1, 1, 1,  null, null)

So we get a row saying that a matched something in b, but we're throwing out 
the value from the c table, which is not what we wanted.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread E.Pasma

Dinu wrote:


Hi all,
I've ran into an optimisation problem with a double-left join that  
works as

an "either" clause.

The query is as follows:

SELECT *
FROM
 a
LEFT JOIN
 b ON 
LEFT JOIN
 c ON 
WHERE
 b.someId IN (1,2,3) OR
 c.someId IN (4,5)

This results in a bloated execution plan:
SEARCH a
SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX

However, the semantically equivalent:
SELECT *
FROM
 a
LEFT JOIN
 b ON  AND b.someId IN (1,2,3)
LEFT JOIN
 c ON AND c.someId IN (4,5)
WHERE
 b.someId IS NOT NULL OR
 c.someId IS NOT NULL

Gets the proper execution plan:
SEARCH b
SEARCH c
EXECUTE LIST SUBQUERY




Hello, the discussion about whether the two queries are equivalent is  
not satisfactory to me. What Keith sais


Your "where" condition is effectively converted an OUTER JOIN into  
an INNER JOIN ..


is true. But the OR condition makes this true for either the one or  
the other outer join. I hope this is what Dinu means here:


And the two queries are algebrically identical. "(X ∊ S or X:=null)  
AND (X is not null)" is
equivalent to "X ∊ S is not null". The two queries might look  
different only

from an imperative programming point of view.


Anyway the two queries return the same set of rows.

Furthermore: what is a "bloated" execution plan?
I set up some test data and the query deamed bloated appears just as  
fast. See below.


This test also show a small semantic difference in the two queries.  
The set of rows is the same but the second query leaves certain  
details null if only one of the OR conditions is true. That occurs in  
row 1.

The outcome of the "bloated" execution plan is more complete.

Possibly I am too pragmatical and don't understand the discussion.

E. Pasma


My test script:

create table a (a integer primary key, ab, ac);
create table b (b integer primary key, d);
create table c (c integer primary key, d);
insert into a values (null,1,1);
insert into a select null, 2,2 from a;
insert into a select null, 3,2 from a;
insert into a select null, 4,4 from a;
insert into a select null, 5,5 from a;
insert into a select null, 6,6 from a;
insert into a select null, 7,7 from a;
insert into a select null, 8,8 from a;
insert into a select null, 9,9 from a;
insert into a select null, 10,10 from a;

insert into b values (1,1),(2,2),(3,3),(4,3),(6,3);
insert into c values (1,1),(4,5),(5,5),(7,1);
.eqp on
.timer on
SELECT *
FROM
 a
LEFT JOIN
 b ON b=ab
LEFT JOIN
 c ON c=ac
WHERE
 b.d IN (1,2,3) OR
 c.d IN (4,5)
;

SELECT *
FROM
 a
LEFT JOIN
 b ON b=ab AND b.d IN (1,2,3)
LEFT JOIN
 c ON c=ac AND c.d IN (4,5)
WHERE
 b.d IS NOT NULL OR
 c.d IS NOT NULL
;

Output:

--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1
1|1|1|1|1|1|1
2|2|2|2|2||
3|3|2|3|3||
4|3|2|3|3||
5|4|4|4|3|4|5
6|4|4|4|3|4|5
7|4|4|4|3|4|5
8|4|4|4|3|4|5
9|5|5|||5|5
10|5|5|||5|5
11|5|5|||5|5
12|5|5|||5|5
13|5|5|||5|5
14|5|5|||5|5
15|5|5|||5|5
16|5|5|||5|5
17|6|6|6|3||
18|6|6|6|3||
19|6|6|6|3||
20|6|6|6|3||
21|6|6|6|3||
22|6|6|6|3||
23|6|6|6|3||
24|6|6|6|3||
25|6|6|6|3||
26|6|6|6|3||
27|6|6|6|3||
28|6|6|6|3||
29|6|6|6|3||
30|6|6|6|3||
31|6|6|6|3||
32|6|6|6|3||
Run Time: real 0.003 user 0.001587 sys 0.000358
--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1
--EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
1|1|1|1|1||
2|2|2|2|2||
3|3|2|3|3||
4|3|2|3|3||
5|4|4|4|3|4|5
6|4|4|4|3|4|5
7|4|4|4|3|4|5
8|4|4|4|3|4|5
9|5|5|||5|5
10|5|5|||5|5
11|5|5|||5|5
12|5|5|||5|5
13|5|5|||5|5
14|5|5|||5|5
15|5|5|||5|5
16|5|5|||5|5
17|6|6|6|3||
18|6|6|6|3||
19|6|6|6|3||
20|6|6|6|3||
21|6|6|6|3||
22|6|6|6|3||
23|6|6|6|3||
24|6|6|6|3||
25|6|6|6|3||
26|6|6|6|3||
27|6|6|6|3||
28|6|6|6|3||
29|6|6|6|3||
30|6|6|6|3||
31|6|6|6|3||
32|6|6|6|3||
Run Time: real 0.002 user 0.001560 sys 0.000296








___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Richard Hipp
On 1/4/18, Dinu  wrote:
>  I think it's every DB's intention to optimize as best possible
> a query into an execution plan. None does it perfectly, but all try to, very
> hard.

There are trade-offs here.  How much slower are you willing for
sqlite3_prepare() to run in order to get a better query plan?  How
much extra memory and disk space are you willing to allocation to
libsqlite3.so in order to get a better query plan?  Are you willing to
impose these costs on (literally) billions of other users that don't
really need the more advanced query planning?  These are hard
questions with no easy answers.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Loadable extension with shared state

2018-01-05 Thread Lifepillar

On 03/01/2018 15:58, Lifepillar wrote:

On 03/01/2018 15:48, Richard Hipp wrote:

On 1/3/18, Lifepillar  wrote:

Consider an extension that has some shared state, say a global `context`
struct, whose value is used by a few user-defined SQL functions.
Besides, assume that there are other SQL functions that can act on the
global context.

The question is: how do I turn this into a thread-safe extension?

Should I use SQLite3 mutex functions to guarantee exclusive access to
shared state? Or should I define my own locks?


Either approach will work.  Which is easiest for you?


If SQLite3 locks may be used in loadable extensions, it is fine with me.

Thanks for the quick feedback!


Just wondering: if I put the context data into a sqlite3_vtab instance,
do I still need to take care of thread synchronization myself?

What I am currently doing is this:

1. Shared context data is allocated dynamically during initialization;

2. A pointer to the allocated context is passed to each application
   function and to sqlite3_create_module_v2().

3. In my xConnect() I store the pointer into an object derived from
   sqlite3_vtab.

4. Application functions and xUpdate() freely read and write through
   the pointer.

I can make application functions access the context only for reading,
if that matters.

Life.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Search % using sqlite

2018-01-05 Thread Hegde, Deepakakumar (D.)
Hi,


Thanks a lot for the support.


We are actually using sqlite3_snprintf() with the proper buffer size to make 
sure that overflow will not happen. and also bind operations.


INSTR() seems to be the best option. It seems to be fine. Thanks a lot.


Thanks and Regards

Deepak



From: sqlite-users  on behalf of 
R Smith 
Sent: Friday, January 5, 2018 3:41:35 PM
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] Search % using sqlite


On 2018/01/05 10:14 AM, Rowan Worth wrote:
> For this simple search it's easier to replace "NAME LIKE ?" with
> "instr(NAME, ?) > 0", unless you need case insensitive matching.

And in case you do wish for Case-Insensitive matching while using
Rowan's method...

"instr(UPPER(NAME), UPPER(?)) > 0"

(Or LOWER(), if you so wish).


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Search % using sqlite

2018-01-05 Thread R Smith


On 2018/01/05 10:14 AM, Rowan Worth wrote:

For this simple search it's easier to replace "NAME LIKE ?" with
"instr(NAME, ?) > 0", unless you need case insensitive matching.


And in case you do wish for Case-Insensitive matching while using 
Rowan's method...


"instr(UPPER(NAME), UPPER(?)) > 0"

(Or LOWER(), if you so wish).


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread R Smith


On 2018/01/05 4:24 AM, Dinu wrote:

Thank you for your answer, Keith. I had my problem "fixed" before I wrote the
first mail. Also with every problem I also provided the fix that worked, for
anyone that might run into the same problem.
However, it's difficult to not get a little frustrated with your answer.

At https://sqlite.org/queryplanner.html I read:

"The best feature of SQL (in all its implementations, not just SQLite) is
that it is a declarative language, not a procedural language. When
programming in SQL you tell the system what you want to compute, not how to
compute it."


I'm sure his frustration is on a par. :)
While we've both stated that your queries are not equivalent, Keith took 
the time to write an explanation of why that is, which seemingly did not 
hit home, and now I will try again with an analogy:


You are essentially have a delivery person usually tasked to take a 
truck, go to a farm and pick up a load of eggs. Now you ask the same 
driver to use the same truck to go to the corner cafe and pick up 6 
eggs, and then you exclaim "Wow, why he takes the truck? he can just 
take the scooter!! The truck is sooo inefficient for this job!".


And yes, you are right, but the problem is you asked for it to be done 
by truck. Now I agree the ideal in SQL is (as the quote above states) 
that one should merely ask for a result and the engine should decide how 
best to achieve it, but in practice there are many nuances in the 
programming of the engine that thwarts this ideal, not to mention how 
many programmers like to tweak their queries to get the engine to 
execute the quickest, and there is nothing wrong with this. It does 
however mean that the engine should in all circumstances, while trying 
to find the best query plan, still adhere to the type of question that 
was asked of it. You asked a METHOD A question, it won't (and shouldn't) 
apply a METHOD B to reach the accidental similar result.


Second problem, your queries show the narrowest of use cases. The engine 
has to work for ALL use cases which can get very involved and complex. 
The engine needn't have another level of abstraction AI going "Oh this 
one is simple, we will disregard what the programmer asked for and use 
our own more simple query because it should get the same result."




To sum it up: I think it's every DB's intention to optimize as best possible
a query into an execution plan. None does it perfectly, but all try to, very
hard. With this intention, I reported a case where the query planner COULD
be improved. I think you will at least agree with me that making it better
can't be wrong. Whether that happens tomorrow, in a year or never, that's up
to the mercy, resources and priorities of the developers, so I am really am
not interested in an argue over this.


It's a good idea to report possible improvements, and thank you for 
that, but this case isn't able to improve since mangling an outer join 
into an inner join when sometimes it might yield the same result is as 
unsafe as it gets. However, that doesn't mean the devs (who would have 
read all this) doesn't find something of interest and could possibly 
think of a tweak that might improve things, so having this debate is 
never a waste, but the specific algebraic essence of what you are 
suggesting is not correct - 's all we're sayin.


Cheers!
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Search % using sqlite

2018-01-05 Thread Rowan Worth
Firstly this code is extremely dangerous. What would happen if
acInputString contained this string?

';DROP TABLE AUDIO; SELECT '

It's best practice to use bound parameters to prevent this kind of problem,
ie.

sqlite3_prepare(db, "SELECT NAME FROM AUDIO WHERE NAME LIKE ?", -1, ,
NULL);
sprintf(acGlobString, "%%%s%%", acInputString);
sqlite3_bind_string(stmt, 1, acGlobString);
sqlite3_step(stmt);

Which is still dangerous if acInputString is larger than can fit in
acGlobString -- snprintf is advised to avoid buffer overflow. And all
sqlite3 return codes should be checked to see if an error occurred, of
course.


Ok now to the actual problem -- you can modify your query to read:

SELECT NAME FROM AUDIO WHERE NAME LIKE ? ESCAPE '!'

The ESCAPE clause defines a character which can be used to match a literal
% instead of % being treated as a wildcard. I've chosen ! as the escape
character, which means you'll have to prefix all !, %, and _ characters
with an ! to get a literal match.


For this simple search it's easier to replace "NAME LIKE ?" with
"instr(NAME, ?) > 0", unless you need case insensitive matching.

-Rowan


On 5 January 2018 at 15:49, Hegde, Deepakakumar (D.) <
deep...@allgosystems.com> wrote:

> Hi All,
>
>
> We are implementing a wild card search feature.  our query is as below:
>
>
> sprintf (acQstring,
>
> "SELECT NAME FROM AUDIO WHERE NAME LIKE '%%%s%%'", acInputString);
>
>
> In the input string is '%' then we are getting t all the entry in the
> column.
>
>
> ex: name column have following:
>
> %a
>
> a
>
> a%a
>
> aa%
>
>
> we are expecting entry 2 which don't have % in it should not get as
> output. But it seems not the case, it is giving all the 4 entry as output.
>
> Please can we know is there any way of searching this? Thanks.
>
>
> Thanks and Regards
>
> Deepak
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users