Re: [sqlite] Equiv stmts, different explain plans

2019-03-06 Thread James K. Lowden
On Tue, 05 Mar 2019 13:58:06 -0700
"Keith Medcalf"  wrote:

> >The query requests no such thing.  SQL makes no request or
> >suggestion for how to execute a query.  It simply describes a result.
> >It's up to the implementation to determine how to produce that
> >result.
> 
> You are, of course, correct.  However for the two queries given I do
> not believe that any query planner currently in existence will
> recognize that t1.c == 1 and t2.c == 1 implies that t1.c == t2.c.  

Thank you for the clarification, Keith.  You may well be right about
the state of the art.  I fault SQL itself; if it implemented relational
algebra correctly, there would be no duplicates from SELECT, and no need
of DISTINCT.  Perhaps then the transformation of FORALL to JOIN would
be easier to infer.  

There is sometimes a tendency in this forum to use shorthand, to
describe what SQLite does as what SQL does.  It's useful for the users'
sake to distinguish between the two, so as not to confuse the
attainable with the attained.   :-) 

--jkl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-05 Thread Keith Medcalf

On Tuesday, 5 March, 2019 12:53, James K. Lowden  
wrote:

>On Mon, 04 Mar 2019 20:20:08 -0700> "Keith Medcalf"  
>wrote:

>> In the first query the subselect that creates the list is
>> independent.
>> In the second query the subselect that creates the list is
>> correlated.

>Yes, and if it can be shown that the two queries are logically
>equivalent under relational algebra, then it's theoretically possible
>for the query planner to have arrived at the same plan in both cases.
>That is the only test that could support/deny the assertion that they
>could be rendered according to the same execution plan.

>> In the first query you have requested that the subquery be executed
>> to create the list for use by the IN operator.

>No.  The query requests no such thing.  SQL makes no request or
>suggestion for how to execute a query.  It simply describes a result.
>It's up to the implementation to determine how to produce that
>result.

You are, of course, correct.  However for the two queries given I do not 
believe that any query planner currently in existence will recognize that t1.c 
== 1 and t2.c == 1 implies that t1.c == t2.c.  However, that implication may be 
stated explicitly (as it is in the correlated subquery).  It is also entirely 
possible that if the (first) query were phrased as:

select * 
  from t1 
 where c == ?0
   and d in (select d from t2 where c == ?0 and d == t1.d)
;

then it is quite possible for the query planner to take notice of the fact that 
t1.c == t2.c ...

Similarly I would not *expect* that a query planner would consider t1.c and 
t2.c to be transitively equal if the query were phrased as:

select * 
  from t1 
 where c == ?
   and d in (select d from t2 where c == ? and d == t1.d)
;

even if the two parameters were the same value ...

As another note, you also commented that a "select distinct * from t1 join 
" is (possibly) equivalent.  This is not necessarily the case because there 
is nothing in the schema which requires the rows of t1 (or t2 for that matter) 
to be distinct and thus the loop order does affect the output (without 
distinct).  Granted, with distinct the output (set) will be the same no matter 
the loop nesting order, it may not be the same without distinct depending on 
the data in the tables.

In other words, we arrive at the same point in the end.  It depends on the 
original "problem statement" which the SQL was composed to solve.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-05 Thread James K. Lowden
On Mon, 04 Mar 2019 20:20:08 -0700
"Keith Medcalf"  wrote:

> In the first query the subselect that creates the list is independent.
> In the second query the subselect that creates the list is correlated.

Yes, and if it can be shown that the two queries are logically
equivalent under relational algebra, then it's theoretically possible
for the query planner to have arrived at the same plan in both cases.
That is the only test that could support/deny the assertion that they
could be rendered according to the same execution plan.  

> In the first query you have requested that the subquery be executed
> to create the list for use by the IN operator.  

No.  The query requests no such thing.  SQL makes no request or
suggestion for how to execute a query.  It simply describes a result.
It's up to the implementation to determine how to produce that result.  

--jkl


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-05 Thread Keith Medcalf

On Tuesday, 5 March, 2019 04:09, Simon Slavin  wrote:

>On 5 Mar 2019, at 2:06am, kk  wrote:

 select * from t1
where c=1 and d in (select d from t2 where c=1);
 select * from t1
where c=1 and d in (select d from t2 where t2.c=t1.c);

>> DRH, many thanks for your reply, I was expecting same output
>because I believe stmts to be equivalent, so was not sure why query
>plan was different. I see the explain plans are very similar.
>> But I believe original stmts mentioned are still equivalent?

>How do you expect a SQL engine to approach the above statements ?
>Should it process the inner SELECT first, or the outer SELECT first ?

>If it processes the inner SELECT first, where does it get the value
>it needs for t1.c ?

>If it processes the outer SELECT first, what strategy does it use for
>selecting on t1.d when it doesn't yet know whether there's going to
>be no, a single, or multiple values ?

>> Do you agree? And in SQLite what is best way to write such stmt (or
>in other terms, what is difference)?
>
>Using a JOIN.
>
>SELECT t1.* FROM t1
>INNER JOIN t2 ON t2.c=1 AND t2.d = t1.d
>WHERE t1.c=1;

Technically this is invalid.  t2.c == 1 is NOT a equijoin condition and should 
NOT appear in the ON clause (though it would be valid in the case of an outer 
join).  However, since [INNER] JOIN is merely syntactic sugar for a , and the 
ON is merely syntactic sugar for a WHERE clause, this does not really matter 
much.  Moreover, the query plans will be different because in this case you are 
joining T1 against T2 using only the common column d, and then filtering the 
interim results based on t1.c and t2.c.  (The actual plan will use the c==1 
condition to constrain the outer loop, then loop through the inner table based 
on the join column d, then filter the result of that with the "other" c==1 
condition -- which table is chosen as the outer table is up to the query 
planner (and it will choose whichever one it things the constraint c==1 will 
produce the least rows)).

This is entirely different from the below query where you are joining T1 and T2 
on the common columns c and d, then filtering for a specific value of c in t1 
(which will constrain the outer loop).

The total number of candidate solutions (the number of intersects in the nested 
loops) can be quite different.

>SELECT t1.* FROM t1
>INNER JOIN t2 ON t2.c=t1.c AND t2.d = t1.d
>WHERE t1.c=1;

>The INNER JOIN (as opposed to OUTER JOIN) means that a row must exist
>in t2 for the equivalent row in t1 to be returned.  INNER is the
>default kind of JOIN.  Of the two statements, it seems that the fist
>one requires less processing.

Actually, the latter (the properly phrased equijoin) will require the least 
processing since it gives the query planner the greatest latitude to generate 
an optimal solution.  In one case (the former) you have the condition "t1.c == 
1 AND t2.c == 1".  The query planner cannot possibly derive from this the fact 
that "t1.c == t2.c".  However, in the latter case, you have the condition "t1.c 
== t2.c AND t1.c == 1" from which the query planner can determine the fact that 
"t2.c == 1".  Depending on the shape of the data and the available statistics 
this may have a great effect on the performance of the query because the query 
planner has more information and may choose a more optimal solution (that is, 
it may now choose whether t1 or t2 is the outer loop, and the descent into the 
inner table uses both columns).

This is, of course, also fully equivalent to the properly phrased JOIN but it 
does constrain the solution (it is equivalent to specifying "t1 CROSS JOIN t2" 
in the above properly phrased equijoin in that it constrains the order of 
traversal of the tables):

SELECT *
  FROM t1
 WHERE c == 1
   AND EXISTS (SELECT * FROM t2 WHERE c == t1.c AND d == t1.d)
;

since no actual data from t2 needs to be returned.  Which "phrasing" of the 
myriad of perhaps different queries producing (quite possibly) the same (or 
perhaps different) results probably depends on how one "translates" the 
original problem statement from English into SQL (and the adequacy of that 
problem statement) together with the maintainability of the application and the 
actual schema including the indexes ...

>Internally, SQLite does comparison and conversion when faced with
>different ways of phrasing your query.  But that's not your problem.
>Phrase what you want in as specific terms as possible, and let SQLite
>pick its preferred way of solving the problem.

Of course, which one chooses depends entirely on the original problem 
statement.  If one "assumes" that the t1.c=1 and t2.c=1 in the original 
statement:

select * from t1
   where c=1 and d in (select d from t2 where c=1);

is merely a lazy programmer expressing that t1.c == t2.c without actually 
saying so, then the properly phrased equijoin is the most efficient because it 
gives the query planner the most latitude to generate an optimal 

Re: [sqlite] Equiv stmts, different explain plans

2019-03-05 Thread Simon Slavin
On 5 Mar 2019, at 2:06am, kk  wrote:

>>> select * from t1
>>>where c=1 and d in (select d from t2 where c=1);
>>> select * from t1
>>>where c=1 and d in (select d from t2 where t2.c=t1.c);

> DRH, many thanks for your reply, I was expecting same output because I 
> believe stmts to be equivalent, so was not sure why query plan was different. 
> I see the explain plans are very similar.
> But I believe original stmts mentioned are still equivalent?

How do you expect a SQL engine to approach the above statements ?  Should it 
process the inner SELECT first, or the outer SELECT first ?

If it processes the inner SELECT first, where does it get the value it needs 
for t1.c ?

If it processes the outer SELECT first, what strategy does it use for selecting 
on t1.d when it doesn't yet know whether there's going to be no, a single, or 
multiple values ?

> Do you agree? And in SQLite what is best way to write such stmt (or in other 
> terms, what is difference)?

Using a JOIN.

SELECT t1.* FROM t1
INNER JOIN t2 ON t2.c=1 AND t2.d = t1.d
WHERE t1.c=1;
SELECT t1.* FROM t1
INNER JOIN t2 ON t2.c=t1.c AND t2.d = t1.d
WHERE t1.c=1;

The INNER JOIN (as opposed to OUTER JOIN) means that a row must exist in t2 for 
the equivalent row in t1 to be returned.  INNER is the default kind of JOIN.  
Of the two statements, it seems that the fist one requires less processing.

Internally, SQLite does comparison and conversion when faced with different 
ways of phrasing your query.  But that's not your problem.  Phrase what you 
want in as specific terms as possible, and let SQLite pick its preferred way of 
solving the problem.

Simon.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-05 Thread R Smith


On 2019/03/05 4:06 AM, kk wrote:

On 05/03/2019 01:33, Richard Hipp wrote:



create table t1(c,d);
create table t2(c,d);
explain select * from t1
    where c=1 and d in (select d from t2 where c=1);
explain select * from t1
    where c=1 and d in (select d from t2 where t2.c=t1.c);



DRH, many thanks for your reply, I was expecting same output because I 
believe stmts to be equivalent//...


They are very much not equivalent. They happen to produce the same 
output with this very specific crafted schema and queries, but that does 
not say that they mean the same thing, in fact they mean very different 
things in execution. I think Keith explained it well enough technically, 
but in case it is not 100% clear yet, let me add to it this example:


Say we have a group of random people, and I asked you to separate out 
all the people aged above 25, and then from that group separate out all 
the women, and then from that group separate all who have 
husbands/partners in the original group.


The next day, with the same group of beings, I might ask to first 
separate out all partnered pairs from the group, then from that group 
separate out all females and from that remainder, get everything that's 
been on Earth more than 25 years.


You might rightfully protest that, in the end, we would have the exact 
same people we've already picked out yesterday, and it would be true - 
however, the intermediate groups along the execution plan look very 
different, and the method you've used to achieve this second result 
follows a very different set of instructions, and, if the origin group 
allowed non-humans in, the second query may actually yield different 
results.


They are not equivalent in function just because they happen to yield 
the same end-results for the specific schema and content.



Hope that is a useful clarification!

Ryan


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-04 Thread Keith Medcalf

In the first query the subselect that creates the list is independent.
In the second query the subselect that creates the list is correlated.

In the first query you have requested that the subquery be executed to create 
the list for use by the IN operator.  After this has been done the main (outer) 
query is executed and a result generated when the where condition (which 
includes the IN operator) are satisfied.  Since it is possible that the list 
may not need to be generated because the condition c==1 in the outer query may 
never be satisfied, the subquery is only executed ONCE the first time its 
results are required.

In the second query you have requested that the outer query be executed AND FOR 
EACH ROW that passes the WHERE c=1 constraint, execute the subquery and then 
check if d in the outer query is in the set of the results obtained by running 
the correlated subquery.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Kyle
>Sent: Monday, 4 March, 2019 18:05
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] Equiv stmts, different explain plans
>
>On another DB I came across 2 stmts, that I think are equivalent, but
>generated different explain plans. I request a second opinion - are
>these 2 stmts equivalent? If so, why do they generate different
>explain
>plans even on sqlite?
>TIA
>--
>create table t1(c,d);
>create table t2(c,d);
>explain select * from t1
>   where c=1 and d in (select d from t2 where c=1);
>explain select * from t1
>   where c=1 and d in (select d from t2 where t2.c=t1.c);
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-04 Thread kk

On 05/03/2019 01:33, Richard Hipp wrote:

On 3/4/19, Kyle  wrote:

On another DB I came across 2 stmts, that I think are equivalent, but
generated different explain plans. I request a second opinion - are
these 2 stmts equivalent? If so, why do they generate different explain
plans even on sqlite?


The two SELECT statements below may well compute the same output
(unless I'm missing something) but they are not the same.  The WHERE
clause in the subquery is different. So why do you expect them to
generate the same query plan?


create table t1(c,d);
create table t2(c,d);
explain select * from t1
where c=1 and d in (select d from t2 where c=1);
explain select * from t1
where c=1 and d in (select d from t2 where t2.c=t1.c);



DRH, many thanks for your reply, I was expecting same output because I 
believe stmts to be equivalent, so was not sure why query plan was 
different. I see the explain plans are very similar.

But I believe original stmts mentioned are still equivalent?
Do you agree? And in SQLite what is best way to write such stmt (or in 
other terms, what is difference)?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Equiv stmts, different explain plans

2019-03-04 Thread Richard Hipp
On 3/4/19, Kyle  wrote:
> On another DB I came across 2 stmts, that I think are equivalent, but
> generated different explain plans. I request a second opinion - are
> these 2 stmts equivalent? If so, why do they generate different explain
> plans even on sqlite?

The two SELECT statements below may well compute the same output
(unless I'm missing something) but they are not the same.  The WHERE
clause in the subquery is different. So why do you expect them to
generate the same query plan?

> create table t1(c,d);
> create table t2(c,d);
> explain select * from t1
>where c=1 and d in (select d from t2 where c=1);
> explain select * from t1
>where c=1 and d in (select d from t2 where t2.c=t1.c);


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Equiv stmts, different explain plans

2019-03-04 Thread Kyle
On another DB I came across 2 stmts, that I think are equivalent, but 
generated different explain plans. I request a second opinion - are 
these 2 stmts equivalent? If so, why do they generate different explain 
plans even on sqlite?

TIA
--
create table t1(c,d);
create table t2(c,d);
explain select * from t1
  where c=1 and d in (select d from t2 where c=1);
explain select * from t1
  where c=1 and d in (select d from t2 where t2.c=t1.c);
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users