Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-11 Thread J. King
Doh. I meant the linked document on the error log. Silly me. 



On September 11, 2017 9:41:39 PM EDT, "J. King"  wrote:
>There's an extra word in the first paragraph of Section 4 of that
>document, by the way:
>
>" The error logger callback has also proven useful in catching errors
>occasional errors that the application misses..."
>
>On September 11, 2017 11:22:50 AM EDT, Dan Kennedy
> wrote:
>>On 09/10/2017 08:30 PM, R Smith wrote:
>>> Well yes but the documentation suggests that one could expect a
>>slight 
>>> degradation. The words "works best with" does not seem to imbue an 
>>> idea of "give WITHOUT ROWID tables a wide berth when your tables are
>
>>> more than few columns wide", and I don't think the Devs intended
>that
>>
>>> either.
>>>
>>> I can still roll with the idea that the WITHOUT ROWID tables with
>151
>>
>>> columns and lots of data actually performed FASTER than the same
>>ROWID 
>>> tables for the other queries but slower with the JOIN... but that
>>much 
>>> slower?
>>
>>
>>
>>The difference seems to be that, currently, SQLite never creates 
>>automatic indexes on WITHOUT ROWID tables.
>>
>>   http://sqlite.org/optoverview.html#autoindex
>>
>>I don't think there is a good reason that it cannot do so. It just 
>>doesn't yet.
>>
>>Dan.
>>
>>
>>
>>
>>
>>>
>>> I mean we are not talking a "little" slower, we are talking 50ms vs.
>
>>> ~70,000ms, that's a difference of a whopping 3 orders of magnitude
>>and 
>>> change. And we are not talking huge tables, another query that
>simply
>>
>>> walks the tables can do so several million times in the same time
>the
>>
>>> JOIN query does.
>>>
>>> Put another way, I can create 100 new tables and populate them each 
>>> with the rows from one test table, then delete the rows from each
>new
>>
>>> table that doesn't satisfy an EXISTS() check in the other test table
>
>>> (essentially mimicking the JOIN query) and then output each full new
>
>>> table, 100 of them in turn, and then DROP them all. SQlite can do
>ALL
>>
>>> of that in a fraction of the time that the normal JOIN query
>(between
>>
>>> those same two test tables) takes to complete.
>>>
>>> We are talking a formula 1 car suddenly going at max speed of 1 
>>> mile-per-weekend, barely keeping up with a semi-athletic snail, and 
>>> people suggest checking the fuel octane rating. I'm saying there is 
>>> something wrong under the hood.
>>>
>>> Cheers,
>>> Ryan
>>>
>>>
>>> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
 R Smith wrote:
> I am using 151 columns for both tests. The only thing that changes
> between the two scripts are the words "WITHOUT ROWID" being added
  says:
 | WITHOUT ROWID tables will work correctly ... for tables with a
>>single
 | INTEGER PRIMARY KEY. However, ordinary rowid tables will run
>>faster in
 | that case.
 and:
 | WITHOUT ROWID tables work best when individual rows are not too
>>large.

 So this is pretty much the documented worst case for WITHOUT ROWID 
 tables.

 If the query is executed by making a copy of all the table data
>into
 a temporary B-tree (index), which is what INTERSECT and probably
>>IN(...)
 are doing, then the WITHOUT-ROWID-ness of the table does not
>matter.


 Regards,
 Clemens
 ___
 sqlite-users mailing list
 sqlite-users@mailinglists.sqlite.org

>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>>___
>>sqlite-users mailing list
>>sqlite-users@mailinglists.sqlite.org
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>-- 
>Sent from my Android device with K-9 Mail. Please excuse my brevity.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-11 Thread J. King
There's an extra word in the first paragraph of Section 4 of that document, by 
the way:

" The error logger callback has also proven useful in catching errors 
occasional errors that the application misses..."

On September 11, 2017 11:22:50 AM EDT, Dan Kennedy  
wrote:
>On 09/10/2017 08:30 PM, R Smith wrote:
>> Well yes but the documentation suggests that one could expect a
>slight 
>> degradation. The words "works best with" does not seem to imbue an 
>> idea of "give WITHOUT ROWID tables a wide berth when your tables are 
>> more than few columns wide", and I don't think the Devs intended that
>
>> either.
>>
>> I can still roll with the idea that the WITHOUT ROWID tables with 151
>
>> columns and lots of data actually performed FASTER than the same
>ROWID 
>> tables for the other queries but slower with the JOIN... but that
>much 
>> slower?
>
>
>
>The difference seems to be that, currently, SQLite never creates 
>automatic indexes on WITHOUT ROWID tables.
>
>   http://sqlite.org/optoverview.html#autoindex
>
>I don't think there is a good reason that it cannot do so. It just 
>doesn't yet.
>
>Dan.
>
>
>
>
>
>>
>> I mean we are not talking a "little" slower, we are talking 50ms vs. 
>> ~70,000ms, that's a difference of a whopping 3 orders of magnitude
>and 
>> change. And we are not talking huge tables, another query that simply
>
>> walks the tables can do so several million times in the same time the
>
>> JOIN query does.
>>
>> Put another way, I can create 100 new tables and populate them each 
>> with the rows from one test table, then delete the rows from each new
>
>> table that doesn't satisfy an EXISTS() check in the other test table 
>> (essentially mimicking the JOIN query) and then output each full new 
>> table, 100 of them in turn, and then DROP them all. SQlite can do ALL
>
>> of that in a fraction of the time that the normal JOIN query (between
>
>> those same two test tables) takes to complete.
>>
>> We are talking a formula 1 car suddenly going at max speed of 1 
>> mile-per-weekend, barely keeping up with a semi-athletic snail, and 
>> people suggest checking the fuel octane rating. I'm saying there is 
>> something wrong under the hood.
>>
>> Cheers,
>> Ryan
>>
>>
>> On 2017/09/10 11:28 AM, Clemens Ladisch wrote:
>>> R Smith wrote:
 I am using 151 columns for both tests. The only thing that changes
 between the two scripts are the words "WITHOUT ROWID" being added
>>>  says:
>>> | WITHOUT ROWID tables will work correctly ... for tables with a
>single
>>> | INTEGER PRIMARY KEY. However, ordinary rowid tables will run
>faster in
>>> | that case.
>>> and:
>>> | WITHOUT ROWID tables work best when individual rows are not too
>large.
>>>
>>> So this is pretty much the documented worst case for WITHOUT ROWID 
>>> tables.
>>>
>>> If the query is executed by making a copy of all the table data into
>>> a temporary B-tree (index), which is what INTERSECT and probably
>IN(...)
>>> are doing, then the WITHOUT-ROWID-ness of the table does not matter.
>>>
>>>
>>> Regards,
>>> Clemens
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-11 Thread Dan Kennedy

On 09/10/2017 08:30 PM, R Smith wrote:
Well yes but the documentation suggests that one could expect a slight 
degradation. The words "works best with" does not seem to imbue an 
idea of "give WITHOUT ROWID tables a wide berth when your tables are 
more than few columns wide", and I don't think the Devs intended that 
either.


I can still roll with the idea that the WITHOUT ROWID tables with 151 
columns and lots of data actually performed FASTER than the same ROWID 
tables for the other queries but slower with the JOIN... but that much 
slower?




The difference seems to be that, currently, SQLite never creates 
automatic indexes on WITHOUT ROWID tables.


  http://sqlite.org/optoverview.html#autoindex

I don't think there is a good reason that it cannot do so. It just 
doesn't yet.


Dan.







I mean we are not talking a "little" slower, we are talking 50ms vs. 
~70,000ms, that's a difference of a whopping 3 orders of magnitude and 
change. And we are not talking huge tables, another query that simply 
walks the tables can do so several million times in the same time the 
JOIN query does.


Put another way, I can create 100 new tables and populate them each 
with the rows from one test table, then delete the rows from each new 
table that doesn't satisfy an EXISTS() check in the other test table 
(essentially mimicking the JOIN query) and then output each full new 
table, 100 of them in turn, and then DROP them all. SQlite can do ALL 
of that in a fraction of the time that the normal JOIN query (between 
those same two test tables) takes to complete.


We are talking a formula 1 car suddenly going at max speed of 1 
mile-per-weekend, barely keeping up with a semi-athletic snail, and 
people suggest checking the fuel octane rating. I'm saying there is 
something wrong under the hood.


Cheers,
Ryan


On 2017/09/10 11:28 AM, Clemens Ladisch wrote:

R Smith wrote:

I am using 151 columns for both tests. The only thing that changes
between the two scripts are the words "WITHOUT ROWID" being added

 says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID 
tables.


If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-10 Thread R Smith
Well yes but the documentation suggests that one could expect a slight 
degradation. The words "works best with" does not seem to imbue an idea 
of "give WITHOUT ROWID tables a wide berth when your tables are more 
than few columns wide", and I don't think the Devs intended that either.


I can still roll with the idea that the WITHOUT ROWID tables with 151 
columns and lots of data actually performed FASTER than the same ROWID 
tables for the other queries but slower with the JOIN... but that much 
slower?


I mean we are not talking a "little" slower, we are talking 50ms vs. 
~70,000ms, that's a difference of a whopping 3 orders of magnitude and 
change. And we are not talking huge tables, another query that simply 
walks the tables can do so several million times in the same time the 
JOIN query does.


Put another way, I can create 100 new tables and populate them each with 
the rows from one test table, then delete the rows from each new table 
that doesn't satisfy an EXISTS() check in the other test table 
(essentially mimicking the JOIN query) and then output each full new 
table, 100 of them in turn, and then DROP them all. SQlite can do ALL of 
that in a fraction of the time that the normal JOIN query (between those 
same two test tables) takes to complete.


We are talking a formula 1 car suddenly going at max speed of 1 
mile-per-weekend, barely keeping up with a semi-athletic snail, and 
people suggest checking the fuel octane rating. I'm saying there is 
something wrong under the hood.


Cheers,
Ryan


On 2017/09/10 11:28 AM, Clemens Ladisch wrote:

R Smith wrote:

I am using 151 columns for both tests. The only thing that changes
between the two scripts are the words "WITHOUT ROWID" being added

 says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID tables.

If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-10 Thread Clemens Ladisch
R Smith wrote:
> I am using 151 columns for both tests. The only thing that changes
> between the two scripts are the words "WITHOUT ROWID" being added

 says:
| WITHOUT ROWID tables will work correctly ... for tables with a single
| INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in
| that case.
and:
| WITHOUT ROWID tables work best when individual rows are not too large.

So this is pretty much the documented worst case for WITHOUT ROWID tables.

If the query is executed by making a copy of all the table data into
a temporary B-tree (index), which is what INTERSECT and probably IN(...)
are doing, then the WITHOUT-ROWID-ness of the table does not matter.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread Nico Williams
On Sat, Sep 09, 2017 at 11:26:35PM +0200, R Smith wrote:
> I think you are missing something or my explanation was not clear.
> When I say "first test" I mean of THIS test suite, not the previous set from
> 3 days ago.

I meant the opposite.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread R Smith



On 2017/09/09 9:20 PM, Nico Williams wrote:

On Sat, Sep 09, 2017 at 05:56:03PM +0200, R Smith wrote:

*Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries on
tables *WITHOUT Row_ids*:
(This is the full test posted below because it is the one that matters most)
INTERSECT AND WHERE IN (...) queries posted similar times here with WHERE IN
(...) being the slightly faster (similar to the above findings), but BOTH
were a good 10 - 15% faster than on the ROW-ID tables, so WITHOUT ROWID
tables seem to have a definite advantage here (it is currently unclear to me
why this is).

A troubling test is the JOIN on WITHOUT ROWID tables - it took several
orders of magnitude longer than any other test in the entire experiment.

In your first test you were ordering by PK, now you're not, and you
don't have an [covering] index on the columns you're ordering by, so,
yeah, "orders of magnitude" slower is to be expected.  You're comparing
apples and oranges. /// etc 


I think you are missing something or my explanation was not clear.
When I say "first test" I mean of THIS test suite, not the previous set 
from 3 days ago.


I am in no way ordering by PK in this set of tests, anywhere. I am not 
using 1 column, I am using 151 columns for both tests. The only thing 
that changes between the two scripts are the words "WITHOUT ROWID" being 
added or removed, NOTHING else. The problem also happens on the 
un-ordered tests. The ordering is of no consequence.


To be clear, there are two tests, both do not order by PK, both do not 
have covering indexes, both have 151 columns and lots of data... the 
only difference between them is the words "WITHOUT ROWID" at the end of 
the test tables. One query runs in ~50 milliseconds, the other runs for 
minutes. The results are exactly the same.


You can test this by taking the script I provided, run it, it will run 
for longer than 2 minutes (depending on your machine speed) because of 
the JOIN queries taking over a minute each, then remove the words 
"WITHOUT ROWID" from the two test tables and run it again. The entire 
script will finish in under  3 seconds with the JOIN queries weighing in 
at circa 50ms each.


That cannot possibly be an expected circumstance and it isn't an 
apples-oranges thing.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread Nico Williams
On Sat, Sep 09, 2017 at 05:56:03PM +0200, R Smith wrote:
> *Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries on
> tables *WITHOUT Row_ids*:
> (This is the full test posted below because it is the one that matters most)
> INTERSECT AND WHERE IN (...) queries posted similar times here with WHERE IN
> (...) being the slightly faster (similar to the above findings), but BOTH
> were a good 10 - 15% faster than on the ROW-ID tables, so WITHOUT ROWID
> tables seem to have a definite advantage here (it is currently unclear to me
> why this is).
> 
> A troubling test is the JOIN on WITHOUT ROWID tables - it took several
> orders of magnitude longer than any other test in the entire experiment.

In your first test you were ordering by PK, now you're not, and you
don't have an [covering] index on the columns you're ordering by, so,
yeah, "orders of magnitude" slower is to be expected.  You're comparing
apples and oranges.

Also, in your first test you have one column.  Now you have lots.
That's no way to isolate the performance numbers you're trying to get.

The only way to have an ORDER BY speed up a query is when there's an
index on the columns in question that can be scanned to produce the
order you're looking for without additional sorting, or when the engine
can create such an index as a temporary index (this, of course, has a
cost, so it won't always work well)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread R Smith
Full tests completed with findings ranging from less interesting to 
exposing a rather significant inefficiency in SQLite.


I won't post all the tests because that would take far too much space, 
in stead I will simply discuss the experiment and findings and post the 
test script so that anyone can verify the findings.



*Problem*:
Earlier tests suggested that, in a query where the common rows between 
two tables need to be found, JOIN is faster than INTERSECT when there 
are very few matches with efficiencies evening out as the number of 
matches increase. It suggested, more pertinently, that an ORDER BY 
clause when added to both types of queries improved all-round 
performance by a significant margin (moreso in JOIN's case).
After examination of the tests, comments suggested that a lot of the 
efficiency differences may be due to:

- row_id involvement,
- checks only done on Primary key values as opposed to common row 
values, and
- test tables having one or two columns only not matching any real-world 
scenario which made the findings irrelevant.
It was also pointed out that apart from JOIN and INTERSECT, a WHERE IN 
(SELECT FROM ...) clause could achieve the same query result set and 
should form part of the tests.



*Purpose*:
This experiment and tests aimed to establish the differences in 
efficiency for queries with the same results that could be posed in 
different ways to the Query Planner in specifically SQLite.
The possible ways to ask the same question that was tested included 
similar queries (producing the same result set) when finding the common 
set of rows between two tables using JOIN, INTERSECT and WHERE IN (...), 
and secondly similar queries finding the exclusive set of rows between 
two tables using OUTER JOIN, EXCEPT and WHERE NOT IN (...).
[SQLite doesn't implement a full outer join so that specific test was 
restricted to comparing EXCEPT vs. WHERE NOT IN (...)]



*Method*:
A simple SQL script that first establishes a table "p" with parameters 
that dictate the data shape added to the tests. Next a random value 
table "rnd" is set up with the number of rows and randomness scope 
dictated by table "p" and then two test tables "t1" and "t2" with each 
151 columns (1 Key and 30 each of 5 data types - INT, REAL, NUMERIC 
(Dates), BLOB and TEXT) which gets populated with randomly selected rows 
from the "rnd" table so that many rows appear only in either table, and 
some rows appear in both (coincidence).


The total number of rows to  choose from is dictated by the "p" table's 
"rndCount" field and the approximate percentage of rows that appear in 
both tables is dictated by the "rndAddFreq" (random row adding 
frequency) field.
Up to here the tests are all similar, except that for the EXCEPT / JOIN 
WHERE NOT IN (...) a much higher frequency of coincidence between the 
table rows were selected since the output count is opposite to the other 
tests.
The queries designed to produce the exact same output using the 
different methods discussed above are run in turn. The queries are then 
repeated with an added ORDER BY clause.

Timings are measured for all.


*Test parameters* to ensure all potential problems mentioned above were 
accounted for:
- Pre-setup of the random table ensures all the queries operate on the 
exact same randomized data.
- Tables contain 150 data columns of all typical use types and with 
varying data and data lengths (in the case of BLOB and TEXT at least)

- Tables have a Primary Key, but it does not alias the Row_ID
- Tests were repeated for both normal and WITHOUT ROWID tables
- Query field comparisons involved a full set of field types (except 
BLOB columns) and not the primary key
- Test-bed using older/slower notebook and sqlitespeed on Windows 10 
64bit prof with updated latest sqlite3 DLL file - version 3.20.1 Release 
2017-08-24.



*General Notes*:
- Queries were tested in different orders to avoid caching effects, but 
this made no difference - it seems there is no mystery caching when the 
queries are so different.
- The EXCEPT vs. WHERE NOT IN (...) queries yielded consistently similar 
times through a range of parameters, with or without ORDER BY clauses, 
with or without ROWIDs, so their tests provided no real insight, is 
probably both well-optimized already, and I won't refer to them again in 
this document.



*Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries 
on tables *with Row_ids*:
First important thing to note is that when not involving the primary key 
and using these larger (wider) tables, the ORDER BY clause addition made 
very little difference (sometimes completely zero difference). JOIN and 
INTERSECT posted similar times for these larger (wider) tables with 
INTERSECT consistently winning by a very slight margin.
The surprise here was that WHERE IN (...) queries were consistently 
faster than both JOIN and INTERSECT (again with the added ORDER BY being 
inconsequential). Not by a large margin though, 4-5%