Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Hamish Symington
Hi -

> Adding the warning to the explain plan output should work well.

It'd be a good place for it, I'd agree - it'd've saved me a trawl through the 
documentation and posting to this list! :)

Thanks,

Hamish
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Jim Morris

Adding the warning to the explain plan output should work well.

...


And yet the coding mistake in the SQL query was very subtle.  It makes me
wonder if we shouldn't somehow come up with a "warning" mechanism in SQLite
to give developers a heads-up on error-prone constructs, such as using ==
between two columns with different default collating sequences...




Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users






___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Richard Hipp
On Thu, Oct 11, 2012 at 10:20 AM, Pavel Ivanov  wrote:

> On Thu, Oct 11, 2012 at 7:09 AM, Hamish Symington
>  wrote:
> >> Note that I was mistaken earlier when I said that "X=Y" and "Y=X" should
> >> result in the same answer.  If X and Y have different default collating
> >> sequences, then X=Y does not mean the same thing as Y=X and so the
> answers
> >> can be different.
> > Right, I understand. Perhaps a brief note on the docs for the query
> optimizer to remind people of this might be worth it - it seems slightly
> counterintuitive, from a 'normal' user's point of view.
>
> Usage of appropriate collating function is not an optimizer feature,
> it's property of SQLite data types. And it's well documented here
> http://www.sqlite.org/datatype3.html (see section 6.1). Whether
> optimizer chooses to use index or not depending on collating function
> is also well documented.
>
>
And yet the coding mistake in the SQL query was very subtle.  It makes me
wonder if we shouldn't somehow come up with a "warning" mechanism in SQLite
to give developers a heads-up on error-prone constructs, such as using ==
between two columns with different default collating sequences...



> Pavel
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Pavel Ivanov
On Thu, Oct 11, 2012 at 7:09 AM, Hamish Symington
 wrote:
>> Note that I was mistaken earlier when I said that "X=Y" and "Y=X" should
>> result in the same answer.  If X and Y have different default collating
>> sequences, then X=Y does not mean the same thing as Y=X and so the answers
>> can be different.
> Right, I understand. Perhaps a brief note on the docs for the query optimizer 
> to remind people of this might be worth it - it seems slightly 
> counterintuitive, from a 'normal' user's point of view.

Usage of appropriate collating function is not an optimizer feature,
it's property of SQLite data types. And it's well documented here
http://www.sqlite.org/datatype3.html (see section 6.1). Whether
optimizer chooses to use index or not depending on collating function
is also well documented.

Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Hamish Symington
Hello,

> In your schema, SaleItem_SaleUUID collates using NOCASE and Picture_UUID
> collates using BINARY.
I think you mean SaleItem_PictureUUID, not SaleItem_SaleUUID, but yes.

> Note that I was mistaken earlier when I said that "X=Y" and "Y=X" should
> result in the same answer.  If X and Y have different default collating
> sequences, then X=Y does not mean the same thing as Y=X and so the answers
> can be different.
Right, I understand. Perhaps a brief note on the docs for the query optimizer 
to remind people of this might be worth it - it seems slightly 
counterintuitive, from a 'normal' user's point of view. 

Many thanks for the help - much appreciated. 

Best wishes,

Hamish
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Richard Hipp
On Wed, Oct 10, 2012 at 10:53 AM, Hamish Symington <
ham...@lightbluesoftware.com> wrote:

>
> If I perform the query
>
> SELECT SaleItem.*, Picture_FileName FROM SaleItem LEFT JOIN Picture ON
> SaleItem_PictureUUID=Picture_UUID WHERE SaleItem_SaleUUID = 'DAB8FE97-
> D308-4809-B496-E55142DC05B5'
>
> the query is slow (c. 11 sec)
>
> but if I perform the exact same query but with the order of the terms in
> the ON part switched
>
> SELECT SaleItem.*, Picture_FileName FROM SaleItem LEFT JOIN Picture ON
> Picture_UUID=SaleItem_PictureUUID WHERE SaleItem_SaleUUID = 'DAB8FE97-
> D308-4809-B496-E55142DC05B5'
>
> then the query is near instant.
>

When you are comparing two columns that have different collating sequences,
the collating sequence of the left operand is used for the comparison.  See
http://www.sqlite.org/datatype3.html#collation for additional information.

In your schema, SaleItem_SaleUUID collates using NOCASE and Picture_UUID
collates using BINARY.

Hence, the second query above (the fast query) uses BINARY for comparing
the two values in the ON clause and is thus able to use the BINARY index
that exists on Picture_UUID:

CREATE INDEX idx_Picture_UUID ON Picture(Picture_UUID);

The first query above (the slow query) compares the two values using
NOCASE, and is thus unable to use the idx_Picture_UUID index.  If you
change your index to be:

CREATE INDEX idx_Picture_UUID ON Picture(Picture_UUID COLLATE nocase);

Then the first query will become fast and the second one will be slow.  You
can create two separate indices, one with COLLATE BINARY and the other with
COLLATE NOCASE, and then both queries will be fast.

Note that I was mistaken earlier when I said that "X=Y" and "Y=X" should
result in the same answer.  If X and Y have different default collating
sequences, then X=Y does not mean the same thing as Y=X and so the answers
can be different.

You can add a COLLATE clause after a comparison operator to explicitly
state the collating sequence to be used.  For example, "(X=Y collate
binary)" instead of just "X=Y".

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Richard Hipp
On Thu, Oct 11, 2012 at 3:13 AM, Hamish Symington <
ham...@lightbluesoftware.com> wrote:

>
> > If you run ANALYZE on your database, SQLite will have more information
> with
> > which to estimate the run-time of each plan, and is more likely to choose
> > the faster on.  At the very least, it much less likely for the estimated
> > run-times to tie.  Even better is if you compile with SQLITE_ENABLE_STAT3
> > and then run ANALYZE.
> ANALYZE run; still the same result.
>

Please send me your complete schema and the content of the sqlite_stat1 and
sqlite_stat3 tables and I will have a look.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Hamish Symington
Hello,

> Perhaps try the forthcoming 3.7.15 preview:
> http://www.sqlite.org/sqlite3-20121009.zip 
> See "Please test the latest SQLite enhancements" for details:
> http://comments.gmane.org/gmane.comp.db.sqlite.general/77259

Still present in that, as well. 

I should note (from further investigation) that this only occurs when COLLATE 
NOCASE is set on the fields I am joining on. 

My original CREATE TABLE statements should have read:

CREATE TABLE SaleItem (SaleItem_ID integer NOT NULL PRIMARY KEY, SaleItem_UUID 
varchar NOT NULL DEFAULT '' COLLATE NOCASE, SaleItem_SaleUUID varchar NOT NULL 
DEFAULT '' COLLATE NOCASE, SaleItem_PictureUUID varchar NOT NULL DEFAULT '' 
COLLATE NOCASE);

and

CREATE TABLE Picture (Picture_ID integer NOT NULL PRIMARY KEY, Picture_UUID 
varchar NOT NULL DEFAULT '' COLLATE NOCASE, Picture_FileName varchar NOT NULL 
DEFAULT '' COLLATE NOCASE);

I would speculate that the presence of a COLLATE is making it (incorrectly) 
believe that = is somehow no longer a symmetric relation (i.e. a = b doesn’t 
imply b = a).

Best wishes,

Hamish





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-11 Thread Hamish Symington
Hi -

>> I was under the impression that the order of fields in the ON clause
>> doesn't matter. Is this impression incorrect?
> It doesn't matter for the answer.  You get the same result either way,
> right?  Just one way is faster than the other.
True, I do get the same result. 

> If interchanging the order of terms in an ON clause makes a performance
> difference, that means that there are (at least) two possible ways of
> evaluating the query.  SQLite is looking at both possible plans and
> estimating the run-time for each, and it is coming up with the same
> estimate in both cases - a tie.  The ambiguity is resolved in favor of the
> plan it looked at first.  And the order in which the plans are examined is
> probably determined (to some extent) by the order of the terms in the ON
> clause.
I'd be surprised if the engine was evaluating the same time for each, given 
that the slower one is a good few hundred times slower - unless there's 
something about evaluation which I've fundamentally misunderstood. 

> If you run ANALYZE on your database, SQLite will have more information with
> which to estimate the run-time of each plan, and is more likely to choose
> the faster on.  At the very least, it much less likely for the estimated
> run-times to tie.  Even better is if you compile with SQLITE_ENABLE_STAT3
> and then run ANALYZE.
ANALYZE run; still the same result.

Thanks,

Hamish

> 
> 
> -- 
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--
Hamish Symington
ham...@lightbluesoftware.com
Twitter: @lighterblue

Winner of the 2009 FileMaker CubeAwards 'Best Private Sector Solution' award

Light Blue Software is a company registered in England and Wales, number 
6671025. Our registered office is 101 Teversham Drift, Cambridge CB1 3LL.




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-10 Thread Petite Abeille

On Oct 10, 2012, at 7:30 PM, Hamish Symington  
wrote:

>>> I was under the impression that the order of fields in the ON clause 
>>> doesn't matter. Is this impression incorrect? 
>> 
>> You should be correct, and the latest version of SQLite has some changes in 
>> which look relevant to this issue.  Can you check which version of SQLite 
>> you're using and possibly download the latest version and see if this fixes 
>> your issue ?
> 
> It doesn't - still present in 3.7.14.1. 

Perhaps try the forthcoming 3.7.15 preview:

http://www.sqlite.org/sqlite3-20121009.zip 

See "Please test the latest SQLite enhancements" for details:

http://comments.gmane.org/gmane.comp.db.sqlite.general/77259
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-10 Thread Richard Hipp
On Wed, Oct 10, 2012 at 10:53 AM, Hamish Symington <
ham...@lightbluesoftware.com> wrote:

>
> I was under the impression that the order of fields in the ON clause
> doesn't matter. Is this impression incorrect?
>

It doesn't matter for the answer.  You get the same result either way,
right?  Just one way is faster than the other.

If interchanging the order of terms in an ON clause makes a performance
difference, that means that there are (at least) two possible ways of
evaluating the query.  SQLite is looking at both possible plans and
estimating the run-time for each, and it is coming up with the same
estimate in both cases - a tie.  The ambiguity is resolved in favor of the
plan it looked at first.  And the order in which the plans are examined is
probably determined (to some extent) by the order of the terms in the ON
clause.

If you run ANALYZE on your database, SQLite will have more information with
which to estimate the run-time of each plan, and is more likely to choose
the faster on.  At the very least, it much less likely for the estimated
run-times to tie.  Even better is if you compile with SQLITE_ENABLE_STAT3
and then run ANALYZE.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-10 Thread Hamish Symington
>> I was under the impression that the order of fields in the ON clause doesn't 
>> matter. Is this impression incorrect? 
> 
> You should be correct, and the latest version of SQLite has some changes in 
> which look relevant to this issue.  Can you check which version of SQLite 
> you're using and possibly download the latest version and see if this fixes 
> your issue ?

It doesn't - still present in 3.7.14.1. 

Thanks,

Hamish
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of fields in a join

2012-10-10 Thread Simon Slavin

On 10 Oct 2012, at 3:53pm, Hamish Symington  
wrote:

> I was under the impression that the order of fields in the ON clause doesn't 
> matter. Is this impression incorrect? 

You should be correct, and the latest version of SQLite has some changes in 
which look relevant to this issue.  Can you check which version of SQLite 
you're using and possibly download the latest version and see if this fixes 
your issue ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Ordering of fields in a join

2012-10-10 Thread Hamish Symington
Hello,

I've made a new database with the latest version of SQLite, and added two 
tables, created thus:

CREATE TABLE SaleItem (SaleItem_ID integer NOT NULL PRIMARY KEY,
SaleItem_UUID varchar NOT NULL DEFAULT '', SaleItem_SaleUUID varchar NOT
NULL DEFAULT '', SaleItem_PictureUUID varchar NOT NULL DEFAULT '' COLLATE
NOCASE);

and

CREATE TABLE Picture (Picture_ID integer NOT NULL PRIMARY KEY,
Picture_UUID varchar NOT NULL DEFAULT '', Picture_FileName varchar NOT
NULL DEFAULT '' COLLATE NOCASE);

I also have three indexes - one each on SaleItem_UUID, Picture_UUID and
SaleItem_PictureUUID.

In SaleItem, I have 8500 rows. In Picture, I have about 1 million. For the
sake of ease, let's say that every four SaleItem rows have the same
SaleItem_SaleUUID, and that each SaleItem has a different, random
Picture_UUID in its SaleItem_PictureUUID field.

If I perform the query

SELECT SaleItem.*, Picture_FileName FROM SaleItem LEFT JOIN Picture ON
SaleItem_PictureUUID=Picture_UUID WHERE SaleItem_SaleUUID = 'DAB8FE97-
D308-4809-B496-E55142DC05B5'

the query is slow (c. 11 sec)

but if I perform the exact same query but with the order of the terms in
the ON part switched

SELECT SaleItem.*, Picture_FileName FROM SaleItem LEFT JOIN Picture ON
Picture_UUID=SaleItem_PictureUUID WHERE SaleItem_SaleUUID = 'DAB8FE97-
D308-4809-B496-E55142DC05B5'

then the query is near instant.

Doing an EXPLAIN QUERY PLAN gives me

0  0  0  SEARCH TABLE SaleItem USING INDEX idx_SaleItem_PictureUUID
(SaleItem_SaleUUID=?) (~10 rows)
0  1  1  SEARCH TABLE Picture USING AUTOMATIC COVERING INDEX
(Picture_UUID=?) (~10 rows)
for the slow one and

0  0  0  SEARCH TABLE SaleItem USING INDEX idx_SaleItem_PictureUUID
(SaleItem_SaleUUID=?) (~10 rows)
0  1  1  SEARCH TABLE Picture USING INDEX idx_Picture_UUID
(Picture_UUID=?) (~10 rows)
for the fast one.

I was under the impression that the order of fields in the ON clause doesn't 
matter. Is this impression incorrect? 

Thanks,

Hamish
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users