>>>>> Jim Morris writes:
>>>>> On 10/6/2011 10:43 PM, Ivan Shmakov wrote:
>>>>> Jim Morris writes:

[…]

 >>> INSERT INTO fts3_table (a,b,c)
 >>> SELECT 'an A','a B','a C'
 >>> WHERE NOT EXISTS
 >>> (SELECT DISTINCT a,b,c
 >>> FROM fts3_table
 >>> WHERE a='an A' AND b='a B' AND c='a C');

 >> However, I wonder, would the following (slightly more concise) query
 >> imply any performance loss in comparison to the one above?

 >> INSERT INTO fts3_table (a, b, c)
 >>     SELECT 'an A', 'a B', 'a C'
 >>         EXCEPT SELECT DISTINCT a, b, c
 >>                    FROM fts3_table;

 >> Also, I'm curious if DISTINCT may cause any performance loss in the
 >> case that the columns in question are constrained by an UNIQUE
 >> index?  Like:

 >> CREATE UNIQUE INDEX "foo-unique"
 >>     ON "foo" (a, b, c);

 > I don't know for sure.  You would need to do some testing to
 > determine performance issues.  I wouldn't use distinct if the values
 > are already guaranteed to be unique

        As per EXPLAIN, SELECT DISTINCT indeed doesn't take the
        uniqueness constraint into consideration.

 > or in a not exits clause since it may have additional overhead.

        Surprisingly enough, when used in the EXCEPT part of the query,
        DISTINCT has no effect on the VM code (as shown by EXPLAIN)!

        At a first glance, the code for the NOT EXISTS variant (sans
        INSERT; hopefully it doesn't matter) seems simpler (one
        operation less in total, one less Next, has no OpenEphemeral's,
        and IdxGE instead of IdxInsert and IdxDelete):

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Trace          0     0     0                    00               
1     Goto           0     21    0                    00               
2     OpenRead       0     2     1     1              00               
3     Rewind         0     19    0                    00               
4     Integer        0     2     0                    00               
5     Integer        1     3     0                    00               
6     OpenRead       2     3     0     keyinfo(1,BINARY)  00               
7     Column         0     0     4                    00               
8     IsNull         4     14    0                    00               
9     SeekGe         2     14    4     1              00               
10    IdxGE          2     14    4     1              01               
11    Integer        1     2     0                    00               
12    IfZero         3     14    -1                   00               
13    Next           2     10    0                    00               
14    Close          2     0     0                    00               
15    If             2     18    1                    00               
16    Column         0     0     7                    00               
17    ResultRow      7     1     0                    00               
18    Next           0     4     0                    01               
19    Close          0     0     0                    00               
20    Halt           0     0     0                    00               
21    Transaction    0     0     0                    00               
22    VerifyCookie   0     2     0                    00               
23    Transaction    1     0     0                    00               
24    VerifyCookie   1     1     0                    00               
25    TableLock      1     2     0     temp.staged    00               
26    TableLock      0     2     0     foo            00               
27    Goto           0     2     0                    00               

        The code for the EXCEPT variant is:

addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Trace          0     0     0                    00               
1     OpenEphemeral  2     1     0     keyinfo(1,BINARY)  00               
2     Goto           0     22    0                    00               
3     OpenRead       1     2     1     1              00               
4     Rewind         1     9     0                    00               
5     Column         1     0     1                    00               
6     MakeRecord     1     1     2                    00               
7     IdxInsert      2     2     0                    00               
8     Next           1     5     0                    01               
9     Close          1     0     0                    00               
10    OpenRead       0     2     0     2              00               
11    Rewind         0     15    0                    00               
12    Column         0     1     1                    00               
13    IdxDelete      2     1     1                    00               
14    Next           0     12    0                    01               
15    Close          0     0     0                    00               
16    Rewind         2     20    0                    00               
17    Column         2     0     3                    00               
18    ResultRow      3     1     0                    00               
19    Next           2     17    0                    00               
20    Close          2     0     0                    00               
21    Halt           0     0     0                    00               
22    Transaction    0     0     0                    00               
23    VerifyCookie   0     2     0                    00               
24    Transaction    1     0     0                    00               
25    VerifyCookie   1     1     0                    00               
26    TableLock      1     2     0     temp.staged    00               
27    TableLock      0     2     0     foo            00               
28    Goto           0     3     0                    00               

        The SQL code used for testing is as follows.

BEGIN;
CREATE TABLE "foo" (
    "id"  INTEGER PRIMARY KEY,
    "bar" TEXT    NOT NULL);
CREATE UNIQUE INDEX "foo-unique"
    ON "foo" ("bar");
INSERT INTO "foo" ("bar") VALUES ('one');
INSERT INTO "foo" ("bar") VALUES ('two');
INSERT INTO "foo" ("bar") VALUES ('three');

CREATE TEMP TABLE "temp.staged" ("bar" TEXT NOT NULL);
INSERT INTO "temp.staged" ("bar") VALUES ('one');
INSERT INTO "temp.staged" ("bar") VALUES ('four');

.explain ON

EXPLAIN 
    SELECT DISTINCT "bar"
        FROM "foo";

EXPLAIN 
    SELECT "bar"
        FROM "foo";

-- INSERT INTO "foo" ("bar")

EXPLAIN
    SELECT "bar"
        FROM "temp.staged"
        EXCEPT SELECT DISTINCT "bar"
                   FROM "foo";

-- produces exactly the same output as the one above:
EXPLAIN
    SELECT "bar"
        FROM "temp.staged"
        EXCEPT SELECT "bar"
                   FROM "foo";

EXPLAIN
    SELECT s."bar"
        FROM "temp.staged" s
        WHERE NOT EXISTS (SELECT 1
                          FROM "foo" f
                          WHERE f."bar" = s."bar");

-- SELECT *
--     FROM "foo";

END;

PS.  As of SQLite version 3.7.3 debian 1.

-- 
FSF associate member #7257

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to