Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-08 Thread Dinu
Another reason to do at least the LEFT JOIN + WHERE -> INNER JOIN
optimisation:
If a view is based on a LEFT JOIN, running a WHERE query on it will exhibit
the same poor behavior and here there will be no way to rewrite the query.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  

/There was some text here that seems to have been lost and my browser opened
a "super resume templates" page... maybe I've been already spectre-hacked?
Anyway, I repost:/

The primary difference from what I see is that it changes the query plan to
start off with an index-based search on b, not a search (scan) on a. To me,
this can be due to the supplimentary condition in the ON clause making this
plan more appealing to the optimizer. So with this in mind, an easy
optimisation would be to just "permeate" any imperative non-null condition
in the where clause to the on clause. I don't know if the execution plan is
identical to that of an inner join (I never tried to actually convert it to
an inner join and I'm working on sth else atm), but it's satisfactory and it
starts off the right way, with a search on b, with an execution time well in
the expected margins.


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect intentionality the query should have been written as
such:

SELECT FROM (
  SELECT FROM a LEFT JOIN b
) WHERE b.c=5




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> The primary difference is likely the test on b.c.  


Keith Medcalf wrote
> I think the query is ill-conceived when written as an outer join.

If anything, to reflect intentionality the query should have been written as
such:

SELECT FROM (
  SELECT FROM a LEFT JOIN b
) WHERE b.c=5




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Keith Medcalf
>To reiterate, Keith: to get the query to execute properly, I didn't
>change
>the LEFT JOIN to an INNER JOIN!
>Nope,
>I rewrote
>
>SELECT
>FROM
>  a
>  LEFT JOIN b ON 
>WHERE b.c=5

This means:

Take all rows of table a
match with rows of table b on the key and where there is no match "fill" 
columns that would have come from b with a NULL
AND THEN AFTER THAT
return all rows where b.c = 5


>to

>SELECT
>FROM
>  a
>  LEFT JOIN b ON  AND b.c=5
>WHERE b.c IS NOT NULL

This translates to:
Take all rows of table a
match with rows of table b on the key AND where b.c = 5 and where there is no 
match "fill" columns that would have come from b with a NULL
AND THEN AFTER THAT
return only rows where b.c is not NULL

The primary difference is likely the test on b.c.  You see, to check if b.c is 
a particular value requires unpacking the row and retrieving the value.  
Checking whether b.c IS NULL does not require unpacking the data but merely 
reading the 'datatype', which should be a much faster operation.

If you wrote the query as an inner join it would be much faster since it would 
not be possible after projection for b.c to be anything other than not null.

>So I just added a redundant predicate and it runs perfectly, on
>SQLite!

>That's why I said this simple improvement can surely be taken care of
>on the
>optimizer, while the larger discussion of actually changing the outer
>join
>to an inner join or even tackling the -OR- case is for sure something
>nice
>to think of, but increasingly more complicated.

I think the query is ill-conceived when written as an outer join.  You are 
asking the query to "project" a result containing a crap-load of rows which you 
will subsequently remove from the result set.  For bitty-bases (a couple of 
hundred rows) this is irrelevant.  However, for tables containing millions of 
rows, creating a googillion of rows then culling out all of them except 4 is a 
very bad use of computer resources.  Why generate them in the first place if 
you are just going to discard them?

I have seen many a mis-designed program which works blazingly quickly on a 
"test" database that contains a dozen customers, half-a-dozen products, with 
two or three components each.  But when populated with the production data of 
1,000,000 customers, 500,000 products, and between 100 and 10,000 components 
per product frankly take "years" to run any step and are completely unuseable.

One should always design as if each table contained a googillion rows.  If it 
is fast enough then, it will certainly be efficient enough to handle your ten 
row per table database.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
To reiterate, Keith: to get the query to execute properly, I didn't change
the LEFT JOIN to an INNER JOIN!
Nope,
I rewrote

SELECT
FROM
  a
  LEFT JOIN b ON 
WHERE b.c=5

to

SELECT
FROM
  a
  LEFT JOIN b ON  AND b.c=5
WHERE b.c IS NOT NULL

So I just added a redundant predicate and it runs perfectly, on SQLite!
That's why I said this simple improvement can surely be taken care of on the
optimizer, while the larger discussion of actually changing the outer join
to an inner join or even tackling the -OR- case is for sure something nice
to think of, but increasingly more complicated.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Keith Medcalf wrote
> but cannot be re-ordered for OUTER JOINS.

Actually, I think order requirements (or rather, row grouping requirements,
as far as I can paint it the requirement is just that all outer joined rows
come in a bunch for each main join tuple) would not be violated if the join
is made on an unique key left-side and an index is used right-side :) or
something similar. I don't know, extensive algebra must be involved :) Even
without index order inference, the main trunk keys can be sorted in a temp
structure to preserve the condition, like in a GROUP BY query. However, I do
see that SQLite seems to actually do it (scan b before a I mean) if I
reqrite the query as I showed.

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a

Right, right, and with this in mind you can see my problem with the query is
so easy to understand:

My query, on the old format, is:

select *
from a, b
where 
  a.a *= b.a AND
  b.c = 5

My "improved" query, on the old format:

select *
from a, b
where 
  a.a *= b.a AND
  b.c *= 5 AND
  b.c = 5 // (OR IS NOT NULL)

You can see the b.c *= 5 (JOIN ON ... AND b.c=5 ... WHERE b.c IS NOT NULL)
is redundant, because it's just a weaker predicate, and I needed to add it
just as an index hint on the join loop to trigger the right execution plan.


Keith Medcalf wrote
> I would sincerely doubt that there is *any* SQL optimizer or query planner
> that can optimize mis-spoken queries containing OUTER JOINS.

I don't know about mis-spoken, I don't think anything is mis-spoken.
This app is running fine on Maria, I'm in the process of porting in to
SQLite. I wouldn't have picked on this query unless it was lagging behind
orders of magnitude (2.5s vs 50ms). So I think Maria does it (I haven't
bothered to check the execution plan there, went straight to hacking
SQLite).



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Keith Medcalf

>Thanks for acknowledging this, you are absolutely right, that's why I
>stated
>that no DB does perfect optimisations and that computing the
>alternative
>-OR- based closures are probably much harder to tackle. Also E. Pasma
>pointed out the -OR- queries as I wrote them are not really
>semantically
>equivalent unless the 2 joins are disjunct.

I suspect that query re-write of an outer join to an inner join would be 
violating some rule in the new SQL standards, most likely about visitation 
(nested loop) order -- which is freely reorderable for INNER JOINS but cannot 
be re-ordered for OUTER JOINS.  Back in the "olden days" one specified outer 
join conditions in the WHERE clause using *= =* or *=* syntax, where the * was 
on the side of the operator where all rows came from.  This was deprecated many 
years ago when the  JOIN ON  syntatactic sugar was created 
because too many people where forgetting that they need to use the appropriate 
* to "bind" the where condition to the appropriate outer join 
binding, then complaining that it was too difficult to remember or figure out 
where a plain "WHERE" clause was needed and where a "OUTER JOIN" bound 
condition was required, resulting in many calls to SQL Database support lines.  
And these same vendors also happen to be who write the specs, so they promptly 
changed the spec to eliminate the support calls.

So the ** format was written out of the standard and the OUTER JOIN 
ON became mandatory ONLY FOR OUTER JOIN operations.  In all other cases the 
JOIN ON syntax was just syntactic sugar for the old "list of tables" and where 
clause.  Don't recall exactly when this occurred but it was about two decades 
ago.  ( I remember it well because there were many queries that could not be 
expressed in the new-fangled format )

So really, adding conditions to apply to an OUTER JOIN to the where clause is 
equivalent to the common error of yester-decades of forgetting the *.

OLD SYNTAX:

select *
  from a, b
 where a.a *= b.a

was replaced by

select *
  from a LEFT JOIN b ON a.a = b.a

Of course, the old syntax allowed one to specify algebraic conditions that can 
no longer be expressed with the new syntax.  But that is OK, it is easier and 
simpler to say:  go RTFM, it is jolly clear, and if you don't like it then 
boo-hoo on you.  A good implementation was destroyed.  Live with it, love it, 
and get over it.

I would sincerely doubt that there is *any* SQL optimizer or query planner that 
can optimize mis-spoken queries containing OUTER JOINS.  Not even DB2's 
exhaustive search query planner/optimizer can do it, and it is quite possibly 
now one of the best in existence if you tell it that it can take unlimited time 
and resources to generate (and run) the plan.  All others pale in comparison.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread R Smith

At the risk of preserving this thread well past its end of life cycle...

On 2018/01/05 6:58 PM, Dinu wrote:

Ryan,

You cannot ask SQL a Method query, that's where my whole RDBMS understanding
takes me.


Everything you ask SQL is underpinned by a specific Method. Perhaps I 
should have been more clear - by METHOD I mean, in the case of the join 
method, "the /way/ you expect the query to enumerate rows" such as an 
Inner join being one method, and outer join being another method. 
Sorting/Ordering is a method of output, grouping, etc.


Why would the SQL standard propose these different methods if they were 
not meaningful and distinct?


As I said before, there need not be an AI to judge that the query 
uttered by the programmer can in fact, in a narrow case, be recomputed 
as another query because the result will be the same and hopefully that 
the alternate method would be more efficient.


Note that I said there needn't be... I did not say there /can't/ be one, 
indeed, query flattening is a good example, but with query flattening 
the cost is low and the reward is high for a really broad spectrum of cases.


Why did I say it is not needed?  Well, what you propose has a relatively 
high cost (added heuristic AI) considering it is paid across all DB 
engine query planning to achieve a small advantage in the narrowest of 
use cases, not to mention that - should the programmer wish for a 
speed-up for the left join that conforms to this narrow set of 
circumstances, he or she could instantly change it to a normal join (the 
way it should have been to start with) and enjoy the fruits of the added 
speed with zero cost to the rest of us who wouldn't have made the 
imperfect query in the first place.


Why do I call it narrow?  Have you looked at your example queries in 
detail? Do you know how many things must be exactly just so (or how many 
other normal query things must be absent) for that join replacement to 
work algebraically? At least in the case of query flattening, it 
improved a query construct that is found abundantly and considered the 
correct construct for the expected results.


You essentially want the engine to second-guess programmers who didn't 
write the best query for their expected results. That kind of 
hand-holding belongs to the realms of Microsoft and MySQL.


(Apologies for all the word clarifications, but I'd rather avoid having 
this turn into a "semantics" debate, so trying to be as clear as 
possible on meanings of statements... not sure I succeeded though)  :)



Cheers,
Ryan

PS: I'm not judging MySQL, at least it has the benefit of being a fully 
fledged server-side software and greatly tweak-able on the fly for all 
its hand-holdy functionality.
PPS: I will say this - If you're not using MySQL in STRICT mode, you 
are  n  hours away from some disaster, where n is a not-too-big positive 
integer.

PPPS: I wish SQLite had a STRICT mode. :)



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Short sum-up:
- The -OR- alternative I provided is not semantically equivalent, I will
work on one that is :)
- The other one, without the -OR- (second post) still stands.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Dinu
Richard,

Thanks for acknowledging this, you are absolutely right, that's why I stated
that no DB does perfect optimisations and that computing the alternative
-OR- based closures are probably much harder to tackle. Also E. Pasma
pointed out the -OR- queries as I wrote them are not really semantically
equivalent unless the 2 joins are disjunct.

However, the case of the imperative WHERE NOT NULL implying INNER JOIN is
just a matter of replacing a predicate with a stronger one, so in all
fairness I imagined it a far lesser overhead than, say, the query flattener.
And I imagine it's a much more common situation, too, especially when users
are adding additional filters via WHERE clauses to a base query, so it might
benefit a lot of users, too. I know it would us, by not having to rewrite
these queries when porting; we are working on x86 servers, and a stick of
memory or a hard drive cost less than a programmer's day for us :) 

For the extra memory, I know for computing relational closures the spatial
complexity can get big, but only when the structure of the query is written
warrants it in the first place, so it shouldn't manifest heavily on a query
that doesn't have this structure.

This is just my best view on this, obviously it's a political decision to be
made so it's no make-it-or-break-it thing, like mentioned before, we are
porting some pretty big system and when I notice differences with SQLite, I
jolt them down, in the hope it might benefit you or the millions of users,
if not by changing SQLite, then simply by pointing out the workaround to
other users, such as moving the WHERE condition out to the ON clause, it's
not necessarily a trivial thing to consider for everyone. 

Ryan, 

You cannot ask SQL a Method query, that's where my whole RDBMS understanding
takes me. It nullifies the purpose of queries as well as all efforts you
yourselves have put into a lot of things, query flattening to mention just
one. The "same result" is not accidental, the equivalent queries will
produce the same result no matter which data populates the tables. That is
the only deffinition I know of semantic equivalence. SQL is declarative and
thus everything that describes the same thing is the same thing.

E.Pasma,

Thanks for taking the time to make the TC. This is always a huge putdown for
me, because finally the execution plan depends on the data indexes are
populated with (via ANALYZE) and are tables are huge so it's always a
putdown for me to create a minimal TC.

Indeed I noticed just now the 2 queries are not equivalent that way :)
Thanks for pointing that out! I will work on an equivalent -and- optimized
rewrite :)

For the query plans though, here is where the index stats come in: here a
"SCAN a" makes sense, but in our case the number of records in a is on the
order of 1x records to b and c, and also the cardinality of b.d and c.d
is on the order of 1000; so a "SEARCH b, SEARCH c" works out. 

At minimum you should have indexes on b.d, c.d, a.ab, a.ac; but even so and
with adding another 1000 records on a, b and c and running the query:

EXPLAIN QUERY PLAN
SELECT * 
FROM 
  a 
JOIN 
  b ON b=ab AND b.d IN (1,2,3) 
JOIN 
  c ON c=ac AND c.d IN (4,5) 

selectid |order |from |detail |
-|--|-|---|
0|0 |0|SCAN TABLE a   |
0|1 |1|SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) |
0|0 |0|EXECUTE LIST SUBQUERY 1|
0|2 |2|SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?) |

it still plans a "SCAN a" first. So I guess I'll have to backtrack from the
real data to generate a TC.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread E.Pasma
op 05-01-2018 17:23 schreef David Raymond op david.raym...@tomtom.com:

>> Anyway the two queries return the same set of rows.
> 
>> This test also show a small semantic difference in the two queries.
>> The set of rows is the same but the second query leaves certain
>> details null if only one of the OR conditions is true. That occurs in
>> row 1.
> 
> You're contradicting yourself there. If there's a difference in the results
> then they're not the same set of rows.
> 
> We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has
> (1, 1)
> 
> Best if viewed in a fixed-width font
> 
> Step by step version 1:
> 
> a left join b on b = ab
> 
> a b
> a ab ac   b  d
> (1, 1, 1) (1, 1)
> 
> result:
> a ab ac  b  d
> (1, 1, 1, 1, 1)
> 
> left join c on c = ab
> 
> c
> a ab ac  b  d   c  d
> (1, 1, 1, 1, 1) (1, 1)
> 
> result:
> a ab ac  b  d  c  d
> (1, 1, 1, 1, 1, 1, 1)
> 
> where b.d in (1, 2, 3) or c.d in (4, 5)
> 
> b.d is 1, so it passes
> result:
> 
> a ab ac  b  d  c  d
> (1, 1, 1, 1, 1, 1, 1)
> 
> The OR worked. We got the values from both tables b, and c and because one of
> them was correct.
> 
> 
> 
> Now, Step by step version 2:
> 
> a left join b on b = ab and b.d in (1, 2, 3)
> 
> a b
> a ab ac   b  d
> (1, 1, 1) (1, 1)
> 
> result:
> a ab ac  b  d
> (1, 1, 1, 1, 1)  same so far (but only by coincidence)
> 
> left join c on c = ac and c.d in (4, 5)
> 
> c
> a ab ac  b  d   c  d
> (1, 1, 1, 1, 1) (1, 1)
> 
> the ON condition doesn't match. Since this is an outer join, and there were no
> matches for the row in the left side, nulls are included
> 
> result:
> a ab ac  b  dc d
> (1, 1, 1, 1, 1,  null, null)
> 
> where b.d is not null or c.d is not null
> 
> b.d isn't null, so that passes.
> result:
> 
> a ab ac  b  dc d
> (1, 1, 1, 1, 1,  null, null)
> 
> So we get a row saying that a matched something in b, but we're throwing out
> the value from the c table, which is not what we wanted.

Thanks, all clear except this last line. Did we not want the value to be
thrown out. Or not want the value?
It depends on that which query is favourite.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread David Raymond
> Anyway the two queries return the same set of rows.

> This test also show a small semantic difference in the two queries.  
> The set of rows is the same but the second query leaves certain  
> details null if only one of the OR conditions is true. That occurs in  
> row 1.

You're contradicting yourself there. If there's a difference in the results 
then they're not the same set of rows.

We'll just look at the "all 1" case. a has (1, 1, 1), b has (1, 1) and c has 
(1, 1)

Best if viewed in a fixed-width font

Step by step version 1:

a left join b on b = ab

 a b
 a ab ac   b  d
(1, 1, 1) (1, 1)

result:
 a ab ac  b  d
(1, 1, 1, 1, 1)

left join c on c = ab

 c
 a ab ac  b  d   c  d
(1, 1, 1, 1, 1) (1, 1)

result:
 a ab ac  b  d  c  d
(1, 1, 1, 1, 1, 1, 1)

where b.d in (1, 2, 3) or c.d in (4, 5)

b.d is 1, so it passes
result:

 a ab ac  b  d  c  d
(1, 1, 1, 1, 1, 1, 1)

The OR worked. We got the values from both tables b, and c and because one of 
them was correct.



Now, Step by step version 2:

a left join b on b = ab and b.d in (1, 2, 3)

 a b
 a ab ac   b  d
(1, 1, 1) (1, 1)

result:
 a ab ac  b  d
(1, 1, 1, 1, 1)  same so far (but only by coincidence)

left join c on c = ac and c.d in (4, 5)

 c
 a ab ac  b  d   c  d
(1, 1, 1, 1, 1) (1, 1)

the ON condition doesn't match. Since this is an outer join, and there were no 
matches for the row in the left side, nulls are included

result:
 a ab ac  b  dc d
(1, 1, 1, 1, 1,  null, null)

where b.d is not null or c.d is not null

b.d isn't null, so that passes.
result:

 a ab ac  b  dc d
(1, 1, 1, 1, 1,  null, null)

So we get a row saying that a matched something in b, but we're throwing out 
the value from the c table, which is not what we wanted.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread E.Pasma

Dinu wrote:


Hi all,
I've ran into an optimisation problem with a double-left join that  
works as

an "either" clause.

The query is as follows:

SELECT *
FROM
 a
LEFT JOIN
 b ON 
LEFT JOIN
 c ON 
WHERE
 b.someId IN (1,2,3) OR
 c.someId IN (4,5)

This results in a bloated execution plan:
SEARCH a
SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX

However, the semantically equivalent:
SELECT *
FROM
 a
LEFT JOIN
 b ON  AND b.someId IN (1,2,3)
LEFT JOIN
 c ON AND c.someId IN (4,5)
WHERE
 b.someId IS NOT NULL OR
 c.someId IS NOT NULL

Gets the proper execution plan:
SEARCH b
SEARCH c
EXECUTE LIST SUBQUERY




Hello, the discussion about whether the two queries are equivalent is  
not satisfactory to me. What Keith sais


Your "where" condition is effectively converted an OUTER JOIN into  
an INNER JOIN ..


is true. But the OR condition makes this true for either the one or  
the other outer join. I hope this is what Dinu means here:


And the two queries are algebrically identical. "(X ∊ S or X:=null)  
AND (X is not null)" is
equivalent to "X ∊ S is not null". The two queries might look  
different only

from an imperative programming point of view.


Anyway the two queries return the same set of rows.

Furthermore: what is a "bloated" execution plan?
I set up some test data and the query deamed bloated appears just as  
fast. See below.


This test also show a small semantic difference in the two queries.  
The set of rows is the same but the second query leaves certain  
details null if only one of the OR conditions is true. That occurs in  
row 1.

The outcome of the "bloated" execution plan is more complete.

Possibly I am too pragmatical and don't understand the discussion.

E. Pasma


My test script:

create table a (a integer primary key, ab, ac);
create table b (b integer primary key, d);
create table c (c integer primary key, d);
insert into a values (null,1,1);
insert into a select null, 2,2 from a;
insert into a select null, 3,2 from a;
insert into a select null, 4,4 from a;
insert into a select null, 5,5 from a;
insert into a select null, 6,6 from a;
insert into a select null, 7,7 from a;
insert into a select null, 8,8 from a;
insert into a select null, 9,9 from a;
insert into a select null, 10,10 from a;

insert into b values (1,1),(2,2),(3,3),(4,3),(6,3);
insert into c values (1,1),(4,5),(5,5),(7,1);
.eqp on
.timer on
SELECT *
FROM
 a
LEFT JOIN
 b ON b=ab
LEFT JOIN
 c ON c=ac
WHERE
 b.d IN (1,2,3) OR
 c.d IN (4,5)
;

SELECT *
FROM
 a
LEFT JOIN
 b ON b=ab AND b.d IN (1,2,3)
LEFT JOIN
 c ON c=ac AND c.d IN (4,5)
WHERE
 b.d IS NOT NULL OR
 c.d IS NOT NULL
;

Output:

--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1
1|1|1|1|1|1|1
2|2|2|2|2||
3|3|2|3|3||
4|3|2|3|3||
5|4|4|4|3|4|5
6|4|4|4|3|4|5
7|4|4|4|3|4|5
8|4|4|4|3|4|5
9|5|5|||5|5
10|5|5|||5|5
11|5|5|||5|5
12|5|5|||5|5
13|5|5|||5|5
14|5|5|||5|5
15|5|5|||5|5
16|5|5|||5|5
17|6|6|6|3||
18|6|6|6|3||
19|6|6|6|3||
20|6|6|6|3||
21|6|6|6|3||
22|6|6|6|3||
23|6|6|6|3||
24|6|6|6|3||
25|6|6|6|3||
26|6|6|6|3||
27|6|6|6|3||
28|6|6|6|3||
29|6|6|6|3||
30|6|6|6|3||
31|6|6|6|3||
32|6|6|6|3||
Run Time: real 0.003 user 0.001587 sys 0.000358
--EQP-- 0,0,0,SCAN TABLE a
--EQP-- 0,1,1,SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)
--EQP-- 0,0,0,EXECUTE LIST SUBQUERY 1
--EQP-- 0,2,2,SEARCH TABLE c USING INTEGER PRIMARY KEY (rowid=?)
1|1|1|1|1||
2|2|2|2|2||
3|3|2|3|3||
4|3|2|3|3||
5|4|4|4|3|4|5
6|4|4|4|3|4|5
7|4|4|4|3|4|5
8|4|4|4|3|4|5
9|5|5|||5|5
10|5|5|||5|5
11|5|5|||5|5
12|5|5|||5|5
13|5|5|||5|5
14|5|5|||5|5
15|5|5|||5|5
16|5|5|||5|5
17|6|6|6|3||
18|6|6|6|3||
19|6|6|6|3||
20|6|6|6|3||
21|6|6|6|3||
22|6|6|6|3||
23|6|6|6|3||
24|6|6|6|3||
25|6|6|6|3||
26|6|6|6|3||
27|6|6|6|3||
28|6|6|6|3||
29|6|6|6|3||
30|6|6|6|3||
31|6|6|6|3||
32|6|6|6|3||
Run Time: real 0.002 user 0.001560 sys 0.000296








___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread Richard Hipp
On 1/4/18, Dinu  wrote:
>  I think it's every DB's intention to optimize as best possible
> a query into an execution plan. None does it perfectly, but all try to, very
> hard.

There are trade-offs here.  How much slower are you willing for
sqlite3_prepare() to run in order to get a better query plan?  How
much extra memory and disk space are you willing to allocation to
libsqlite3.so in order to get a better query plan?  Are you willing to
impose these costs on (literally) billions of other users that don't
really need the more advanced query planning?  These are hard
questions with no easy answers.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-05 Thread R Smith


On 2018/01/05 4:24 AM, Dinu wrote:

Thank you for your answer, Keith. I had my problem "fixed" before I wrote the
first mail. Also with every problem I also provided the fix that worked, for
anyone that might run into the same problem.
However, it's difficult to not get a little frustrated with your answer.

At https://sqlite.org/queryplanner.html I read:

"The best feature of SQL (in all its implementations, not just SQLite) is
that it is a declarative language, not a procedural language. When
programming in SQL you tell the system what you want to compute, not how to
compute it."


I'm sure his frustration is on a par. :)
While we've both stated that your queries are not equivalent, Keith took 
the time to write an explanation of why that is, which seemingly did not 
hit home, and now I will try again with an analogy:


You are essentially have a delivery person usually tasked to take a 
truck, go to a farm and pick up a load of eggs. Now you ask the same 
driver to use the same truck to go to the corner cafe and pick up 6 
eggs, and then you exclaim "Wow, why he takes the truck? he can just 
take the scooter!! The truck is sooo inefficient for this job!".


And yes, you are right, but the problem is you asked for it to be done 
by truck. Now I agree the ideal in SQL is (as the quote above states) 
that one should merely ask for a result and the engine should decide how 
best to achieve it, but in practice there are many nuances in the 
programming of the engine that thwarts this ideal, not to mention how 
many programmers like to tweak their queries to get the engine to 
execute the quickest, and there is nothing wrong with this. It does 
however mean that the engine should in all circumstances, while trying 
to find the best query plan, still adhere to the type of question that 
was asked of it. You asked a METHOD A question, it won't (and shouldn't) 
apply a METHOD B to reach the accidental similar result.


Second problem, your queries show the narrowest of use cases. The engine 
has to work for ALL use cases which can get very involved and complex. 
The engine needn't have another level of abstraction AI going "Oh this 
one is simple, we will disregard what the programmer asked for and use 
our own more simple query because it should get the same result."




To sum it up: I think it's every DB's intention to optimize as best possible
a query into an execution plan. None does it perfectly, but all try to, very
hard. With this intention, I reported a case where the query planner COULD
be improved. I think you will at least agree with me that making it better
can't be wrong. Whether that happens tomorrow, in a year or never, that's up
to the mercy, resources and priorities of the developers, so I am really am
not interested in an argue over this.


It's a good idea to report possible improvements, and thank you for 
that, but this case isn't able to improve since mangling an outer join 
into an inner join when sometimes it might yield the same result is as 
unsafe as it gets. However, that doesn't mean the devs (who would have 
read all this) doesn't find something of interest and could possibly 
think of a tweak that might improve things, so having this debate is 
never a waste, but the specific algebraic essence of what you are 
suggesting is not correct - 's all we're sayin.


Cheers!
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Thank you for your answer, Keith. I had my problem "fixed" before I wrote the
first mail. Also with every problem I also provided the fix that worked, for
anyone that might run into the same problem.
However, it's difficult to not get a little frustrated with your answer. 

At https://sqlite.org/queryplanner.html I read:

"The best feature of SQL (in all its implementations, not just SQLite) is
that it is a declarative language, not a procedural language. When
programming in SQL you tell the system what you want to compute, not how to
compute it."

And I completely agree with this, "how to compute it" is called relational
algebra and it's what a query planner should do best. And the two queries
are algebrically identical. "(X ∊ S or X:=null) AND (X is not null)" is
equivalent to "X ∊ S is not null". The two queries might look different only
from an imperative programming point of view.

As to why the query is written that way: with the above in mind, I will
contend that there can absolutely never exist a "mistaken" way to write a
query, as long as the description of the predicates is correct and
consistent with the schema. You should consider that quite frequently
queries are the result of one or more levels of logic abstraction (ORM,
DBAL, etc). In my case, modifying the query was not difficult to do, but in
other cases one may have few options on rewriting the way the query
structure is generated. The only way to reduce a fabricated query is through
relational algebra, and that is up to the DB, not the programmer, not the
abstractions in-between.

In this particular case, the where is optional; depending on parameters, I
want the set of data that is correctly defined as the left join of tables a
and b, or I might want a subset of this join that has a particular property
over the left-joined set. The query was correctly written, to rewrite it so
that the query planner might know how to run it is wrong, IMHO.

To sum it up: I think it's every DB's intention to optimize as best possible
a query into an execution plan. None does it perfectly, but all try to, very
hard. With this intention, I reported a case where the query planner COULD
be improved. I think you will at least agree with me that making it better
can't be wrong. Whether that happens tomorrow, in a year or never, that's up
to the mercy, resources and priorities of the developers, so I am really am
not interested in an argue over this. 



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Keith Medcalf

They are not the same.  Just as 5 - 3 is not the same as 1 + 1, even though 
both come up with the same result, 2. by happenstance.

Your "where" condition is effectively converted an OUTER JOIN into an INNER 
JOIN through artifice (and quite likely mistake).  If you *want* an inner join, 
use an inner join.  If you want an outer join, use an outer join.  Just because 
subtraction of two different numbers may have the same result as addition of 
two other numbers, does not meant that addition and subtraction are the same 
thing.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Dinu
>Sent: Thursday, 4 January, 2018 16:01
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] LEFT JOIN + WHERE / OR optimisation
>
>Algebrically, having a non-null imperative lookup condition in the
>WHERE
>clause means you have a stronger predicate on the same subject (ALL
>MUST fit
>vs. ANY that fit).
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Algebrically, having a non-null imperative lookup condition in the WHERE
clause means you have a stronger predicate on the same subject (ALL MUST fit
vs. ANY that fit).



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
I think they are equivalent, if you look closer.

SELECT FROM a LEFT JOIN b ON a.x=b.x WHERE b.y=5 -is- equivalent to 
SELECT FROM a JOIN b ON a.x=b.x AND b.y=5
SELECT FROM a JOIN b WHERE a.x=b.x AND b.y=5
SELECT FROM a LEFT JOIN b ON a.x=b.x AND b.y=5 WHERE b.y IS NOT NULL

All the above are semantically equivalent. When there is only one LEFT JOIN,
the presence of any non-null non-alternative condition on the joined table
in the WHERE clause transforms it in an INNER join. There is no other way to
have a non-null value except if the row exists. The reciprocal is not true
of course.

I don't know how difficult it is to compute the -OR- closure, as it is more
difficult. But for an imperative non-null condition, I did expect the WHERE
condition to be ported to the ON lookup for optimisation.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread David Raymond
The ON condition is used  the "add one result row for each row of the 
outer table where nothing matches the ON condition"
The WHERE condition is used  those rows are added.

Example with the basic "not in" type of outer join:

SQLite version 3.21.0 2017-10-24 18:55:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> create table a (x);

sqlite> create table b (x);

sqlite> insert into a values (1), (2), (3);

sqlite> insert into b values (2);

sqlite> select a.x from a left outer join b on a.x = b.x where b.x is null;
x
1
3

sqlite> select a.x from a left outer join b on a.x = b.x and b.x is null;
x
1
2
3

sqlite> select a.x from a left outer join b where a.x = b.x and b.x is null;

sqlite>

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Keith Medcalf
Sent: Thursday, January 04, 2018 2:53 PM
To: SQLite mailing list
Subject: Re: [sqlite] LEFT JOIN + WHERE / OR optimisation


They are not semantically equivalent.  join conditions attached to an outer 
join operation are not semantically equivalent to the same conditions being in 
the where clause.

In other words:

select a,b,c
  from a
  join b
  join c on a.a=b.b
 where c.c=b.d

is simply syntactic sugar for

select a,b,c
  from a, b, c
 where a.a=b.b
   and c.c=b.d;

In all cases the conditions in ON clauses of INNER JOINS are nothing more than 
WHERE clause filters.  You do not even have to have the tables used in the ON 
clause "referenced" at the point you refer to them.

the word "INNER JOIN" is syntactic sugar for a comma (,), and ON is sytactic 
sugar for the word WHERE (or AND).

However, for OUTER JOINS the conditions in the ON clause "glue themselves" to 
the OUTER JOIN operation and ARE NEITHER syntactically or symantically the same 
as WHERE clause conditions.

That is to say the behaviour observed is how it is designed to work and you 
expectations are misguided.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Dinu
>Sent: Thursday, 4 January, 2018 12:29
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] LEFT JOIN + WHERE / OR optimisation
>
>Hi all,
>I've ran into an optimisation problem with a double-left join that
>works as
>an "either" clause.
>
>The query is as follows:
>
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON 
>LEFT JOIN
>  c ON 
>WHERE
>  b.someId IN (1,2,3) OR
>  c.someId IN (4,5)
>
>This results in a bloated execution plan:
>SEARCH a
>SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
>SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX
>
>However, the semantically equivalent:
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON  AND b.someId IN (1,2,3)
>LEFT JOIN
>  c ON AND c.someId IN (4,5)
>WHERE
>  b.someId IS NOT NULL OR
>  c.someId IS NOT NULL
>
>Gets the proper execution plan:
>SEARCH b
>SEARCH c
>EXECUTE LIST SUBQUERY
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread R Smith


On 2018/01/04 9:28 PM, Dinu wrote:

Hi all,
I've ran into an optimisation problem with a double-left join that works as
an "either" clause.

The query is as follows:

SELECT *
FROM
   a
LEFT JOIN
   b ON 
LEFT JOIN
   c ON 
WHERE
   b.someId IN (1,2,3) OR
   c.someId IN (4,5)

This results in a bloated execution plan:
SEARCH a
SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX

However, the semantically equivalent:
SELECT *
FROM
   a
LEFT JOIN
   b ON  AND b.someId IN (1,2,3)
LEFT JOIN
   c ON AND c.someId IN (4,5)
WHERE
   b.someId IS NOT NULL OR
   c.someId IS NOT NULL

Gets the proper execution plan:
SEARCH b
SEARCH c
EXECUTE LIST SUBQUERY


These Queries are not equivalent, they cannot and should not have the 
same query plan.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Keith Medcalf

They are not semantically equivalent.  join conditions attached to an outer 
join operation are not semantically equivalent to the same conditions being in 
the where clause.

In other words:

select a,b,c
  from a
  join b
  join c on a.a=b.b
 where c.c=b.d

is simply syntactic sugar for

select a,b,c
  from a, b, c
 where a.a=b.b
   and c.c=b.d;

In all cases the conditions in ON clauses of INNER JOINS are nothing more than 
WHERE clause filters.  You do not even have to have the tables used in the ON 
clause "referenced" at the point you refer to them.

the word "INNER JOIN" is syntactic sugar for a comma (,), and ON is sytactic 
sugar for the word WHERE (or AND).

However, for OUTER JOINS the conditions in the ON clause "glue themselves" to 
the OUTER JOIN operation and ARE NEITHER syntactically or symantically the same 
as WHERE clause conditions.

That is to say the behaviour observed is how it is designed to work and you 
expectations are misguided.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Dinu
>Sent: Thursday, 4 January, 2018 12:29
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] LEFT JOIN + WHERE / OR optimisation
>
>Hi all,
>I've ran into an optimisation problem with a double-left join that
>works as
>an "either" clause.
>
>The query is as follows:
>
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON 
>LEFT JOIN
>  c ON 
>WHERE
>  b.someId IN (1,2,3) OR
>  c.someId IN (4,5)
>
>This results in a bloated execution plan:
>SEARCH a
>SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
>SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX
>
>However, the semantically equivalent:
>SELECT *
>FROM
>  a
>LEFT JOIN
>  b ON  AND b.someId IN (1,2,3)
>LEFT JOIN
>  c ON AND c.someId IN (4,5)
>WHERE
>  b.someId IS NOT NULL OR
>  c.someId IS NOT NULL
>
>Gets the proper execution plan:
>SEARCH b
>SEARCH c
>EXECUTE LIST SUBQUERY
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Probably related:

Compound join with a left outer join generates different execution plans:

LEFT JOIN (
  b 
  JOIN c ON 
)
WHERE
  b.something = 5

vs.

LEFT JOIN (
  b 
  JOIN c ON  AND b.something = 5
)
WHERE
  b.something IS NOT NULL




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LEFT JOIN + WHERE / OR optimisation

2018-01-04 Thread Dinu
Hi all,
I've ran into an optimisation problem with a double-left join that works as
an "either" clause.

The query is as follows:

SELECT *
FROM
  a
LEFT JOIN
  b ON 
LEFT JOIN 
  c ON 
WHERE
  b.someId IN (1,2,3) OR
  c.someId IN (4,5)

This results in a bloated execution plan:
SEARCH a
SEARCH SUBQUERY 1 USING AUTOMATIC COVERING INDEX
SEARCH SUBQUERY 2 USING AUTOMATIC COVERING INDEX

However, the semantically equivalent:
SELECT *
FROM
  a
LEFT JOIN
  b ON  AND b.someId IN (1,2,3)
LEFT JOIN 
  c ON AND c.someId IN (4,5)
WHERE
  b.someId IS NOT NULL OR
  c.someId IS NOT NULL

Gets the proper execution plan:
SEARCH b
SEARCH c
EXECUTE LIST SUBQUERY



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users