Re: [sqlite] Unexpected REINDEX behavior.

2019-08-31 Thread dirdi
On 8/30/19 3:59 AM, dirdi wrote:
> Do not bother!
- "bother!"
+ "worry!"

Sorry @Keith, English is only a second language to me and sometimes I
mix-up idioms. You all were very helpful and I learned a lot so far. I
just feel a bit dumb right now =/


-- 
Best regards

dirdi
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-30 Thread dirdi
On 8/30/19 3:42 PM, David Raymond wrote:
> Sorry if my mind is working slowly today, but why are those showing up as a 
> difference when they're exactly the same line?
> 
> 
> The only difference between both runs:
>> $ diff run1 run2
>> 1260d1259
>> < INSERT INTO space_used 
>> VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
>> 1270a1270
>>> INSERT INTO space_used 
>>> VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);

Because the line has been moved ;)

-- 
Best regards

dirdi
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-30 Thread David Raymond
Sorry if my mind is working slowly today, but why are those showing up as a 
difference when they're exactly the same line?


The only difference between both runs:
> $ diff run1 run2
> 1260d1259
> < INSERT INTO space_used 
> VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
> 1270a1270
>> INSERT INTO space_used 
>> VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread dirdi
On 8/30/19 3:34 AM, Keith Medcalf wrote:
> Fascinating again, because the code is identical.  The p1 difference in the 
> OpenWrite/IdxInsert/IdxDelete opcodes is just the "file handle" that happens 
> to be being used and doesn't have any real significance (at least I don't 
> think it does).  The p3 in the Transaction opcode is merely the schemacookie 
> and not only indicates that the schema has seen 2 changes between these plans 
> being generated (presumably the DROP and CREATE of the index).
> 
> I am at a loss to explain the difference in execution speed. 

Do not bother! I was just curious if this was expected behavior. Until
now I put about 5h into solving this little riddle. With DROP INDEX /
CREATE INDEX I got something that works for me and therefore I will
leave it at that.

However, if one wants to investigate further, I am happy to provide
additional information.


-- 
Best regards

dirdi
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Keith Medcalf

Fascinating again, because the code is identical.  The p1 difference in the 
OpenWrite/IdxInsert/IdxDelete opcodes is just the "file handle" that happens to 
be being used and doesn't have any real significance (at least I don't think it 
does).  The p3 in the Transaction opcode is merely the schemacookie and not 
only indicates that the schema has seen 2 changes between these plans being 
generated (presumably the DROP and CREATE of the index).

I am at a loss to explain the difference in execution speed.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of dirdi
>Sent: Thursday, 29 August, 2019 15:10
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Unexpected REINDEX behavior.
>
>On 8/29/19 9:19 PM, Keith Medcalf wrote:
>> Fascinating ... because the optimal query plan would be to do a
>table scan of tbl1 and then dip into tbl2 for each row to get the
>update value.  The only meaningful index then would be on tbl2(int)
>(that is idx3) since an index will be unhelpful for the LIKE
>constraint.
>
>I noticed that, too. But I stopped wondering about optimizer
>internals,
>back when web-browsers started to JIT compile javascript ;)
>
>> Is the generated code different for the two queries, the one that
>runs slow -vs- the one that runs fast?  (That is, the output from
>EXPLAIN ..., not the output from EXPLAIN QUERY PLAN, or from .eqp
>full in the CLI).
>
>This is the output of EXPLAIN UPDATE before I recreate the index:
>> 0Init0   73  0   00
>> 1Null0   4   5   00
>> 2OpenRead0   13  0   0   00
>> 3Rewind  0   7   0   00
>> 4Rowid   0   5   0   00
>> 5RowSetAdd   4   5   0   00
>> 6Next0   4   0   01
>> 7OpenWrite   0   13  0   7   00
>> 8OpenWrite   1   18  0   k(2,,)  00
>> 9RowSetRead  4   72  5   00
>> 10   NotExists   0   9   5   00
>> 11   Rowid   0   6   0   00
>> 12   Null0   7   0   00
>> 13   Null0   8   0   00
>> 14   Column  0   3   9   00
>> 15   Null0   10  0   00
>> 16   Null0   11  0   00
>> 17   Null0   12  0   00
>> 18   Copy5   13  0   00
>> 19   Null0   14  0   00
>> 20   Column  0   1   15  00
>> 21   Column  0   2   16  00
>> 22   Null0   22  22  00
>> 23   Noop5   3   0   00
>> 24   Integer 1   23  0   00
>> 25   OpenRead4   11  0   3   00
>> 26   OpenRead6   23  0   k(2,,)  00
>> 27   Column  0   2   24  00
>> 28   Affinity24  1   0   D   00
>> 29   SeekLE  6   39  24  1   00
>> 30   DeferredSeek6   0   4   00
>> 31   Column  4   1   29  00
>> 32   Concat  29  28  26  00
>> 33   Column  0   1   27  00
>> 34   Function0   0   26  25  like(2) 02
>> 35   IfNot   25  38  1   00
>> 36   IdxRowid6   22  0   00
>> 37   DecrJumpZero23  39  0   00
>> 38   Prev6   30  0   00
>> 39   SCopy   22  17  0   00
>> 40   Column  0   4   18  NULL00
>> 41   Column  0   5   19  NULL00
>> 42   Column  0   6   20  0   00
>> 43   Affinity14  7   0   DBDDBDD 00
>> 44   SCopy   17  2   0   00
>> 45   IntCopy 13  3   0   00
>> 46   MakeRecord  2   2   1   00
>> 47   MakeRecord  14  7   21  00
>> 48   FkIfZero0   56  0   00
>> 49   IsNull  9   56  0   00
>> 50   SCopy   9   25  0   00
>> 51   MustBeInt   25  55  0   00
>> 52   OpenRead7   11  0   3   00
>> 53   NotExists   7   55  25  00
>> 5

Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread dirdi
On 8/29/19 9:19 PM, Keith Medcalf wrote:
> Fascinating ... because the optimal query plan would be to do a table scan of 
> tbl1 and then dip into tbl2 for each row to get the update value.  The only 
> meaningful index then would be on tbl2(int) (that is idx3) since an index 
> will be unhelpful for the LIKE constraint.

I noticed that, too. But I stopped wondering about optimizer internals,
back when web-browsers started to JIT compile javascript ;)

> Is the generated code different for the two queries, the one that runs slow 
> -vs- the one that runs fast?  (That is, the output from EXPLAIN ..., not the 
> output from EXPLAIN QUERY PLAN, or from .eqp full in the CLI).

This is the output of EXPLAIN UPDATE before I recreate the index:
> 0 Init0   73  0   00  
> 1 Null0   4   5   00  
> 2 OpenRead0   13  0   0   00  
> 3 Rewind  0   7   0   00  
> 4 Rowid   0   5   0   00  
> 5 RowSetAdd   4   5   0   00  
> 6 Next0   4   0   01  
> 7 OpenWrite   0   13  0   7   00  
> 8 OpenWrite   1   18  0   k(2,,)  00  
> 9 RowSetRead  4   72  5   00  
> 10NotExists   0   9   5   00  
> 11Rowid   0   6   0   00  
> 12Null0   7   0   00  
> 13Null0   8   0   00  
> 14Column  0   3   9   00  
> 15Null0   10  0   00  
> 16Null0   11  0   00  
> 17Null0   12  0   00  
> 18Copy5   13  0   00  
> 19Null0   14  0   00  
> 20Column  0   1   15  00  
> 21Column  0   2   16  00  
> 22Null0   22  22  00  
> 23Noop5   3   0   00  
> 24Integer 1   23  0   00  
> 25OpenRead4   11  0   3   00  
> 26OpenRead6   23  0   k(2,,)  00  
> 27Column  0   2   24  00  
> 28Affinity24  1   0   D   00  
> 29SeekLE  6   39  24  1   00  
> 30DeferredSeek6   0   4   00  
> 31Column  4   1   29  00  
> 32Concat  29  28  26  00  
> 33Column  0   1   27  00  
> 34Function0   0   26  25  like(2) 02  
> 35IfNot   25  38  1   00  
> 36IdxRowid6   22  0   00  
> 37DecrJumpZero23  39  0   00  
> 38Prev6   30  0   00  
> 39SCopy   22  17  0   00  
> 40Column  0   4   18  NULL00  
> 41Column  0   5   19  NULL00  
> 42Column  0   6   20  0   00  
> 43Affinity14  7   0   DBDDBDD 00  
> 44SCopy   17  2   0   00  
> 45IntCopy 13  3   0   00  
> 46MakeRecord  2   2   1   00  
> 47MakeRecord  14  7   21  00  
> 48FkIfZero0   56  0   00  
> 49IsNull  9   56  0   00  
> 50SCopy   9   25  0   00  
> 51MustBeInt   25  55  0   00  
> 52OpenRead7   11  0   3   00  
> 53NotExists   7   55  25  00  
> 54Goto0   56  0   00  
> 55FkCounter   0   -1  0   00  
> 56Close   7   0   0   00  
> 57Column  0   3   26  00  
> 58Rowid   0   27  0   00  
> 59IdxDelete   1   26  2   00  
> 60Delete  0   68  13  tbl100  
> 61IsNull  17  68  0   00  
> 62SCopy   17  25  0   00  
> 63MustBeInt   25  67  0   00  
> 64OpenRead8   11  0   3   00  
> 65NotExists   8   67  25  00  
> 66Goto0   68  0   00  
> 67FkCounter   0   1   0   00  
> 68Close   8   0   0   00  
> 69IdxInsert   1   1   2   2   00  
> 70Insert  0   21  13  tbl1 

Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread dirdi
On 8/29/19 9:35 PM, Simon Slavin wrote:
> On 29 Aug 2019, at 8:12pm, dirdi  wrote:
> 
>> The only difference between both runs:
> 
> Well, that's nothing.  I see no reason for the massive change in timing from 
> what you posted.  But someone else might.

Well the quoted line was shifted to output's second to last line. Hence
I assume that internally something must have changed.

-- 
Best regards

dirdi
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Simon Slavin
On 29 Aug 2019, at 8:12pm, dirdi  wrote:

> The only difference between both runs:

Well, that's nothing.  I see no reason for the massive change in timing from 
what you posted.  But someone else might.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Keith Medcalf

Fascinating ... because the optimal query plan would be to do a table scan of 
tbl1 and then dip into tbl2 for each row to get the update value.  The only 
meaningful index then would be on tbl2(int) (that is idx3) since an index will 
be unhelpful for the LIKE constraint.

Is the generated code different for the two queries, the one that runs slow 
-vs- the one that runs fast?  (That is, the output from EXPLAIN ..., not the 
output from EXPLAIN QUERY PLAN, or from .eqp full in the CLI).

Also, what version of SQLite3 are you using?

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of dirdi
>Sent: Thursday, 29 August, 2019 11:57
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Unexpected REINDEX behavior.
>
>On 8/29/19 6:10 PM, Keith Medcalf wrote:
>> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is,
>have you ever run analyze)?
>
>No.
>
>> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you
>run ANALYZE; rather than reindexing or dropping/recreating the index,
>what is the result?
>
>If I run ...
>
>> ANALYZE;
>> REINDEX;
>> ANALYZE;
>
>... instead of ...
>
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>> `int`
>> );
>
>the UPDATE query remains being slow (~36m).
>
>--
>Best regards
>
>dirdi
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread dirdi
On 8/29/19 6:00 PM, Simon Slavin wrote:
> Just in case, please run an integrity-check on the database as it is before 
> you delete/rebuild the index.

The python script creates the DB from scratch every time. However ...

> PRAGMA integrity_check;

... returned ...

> ok

> You might be able to learn more about the various versions of that index 
> using this:
> 
> 
> 
> It may show that the various versions of idx1 you see/create take different 
> numbers of pages, or have different depths.  However, I agree with you that 
> the amount of time disparity is unexpected.

Here is what sqlanalyze reports about the index:
> *** Index IDX1 of table TBL1 ***
> 
> Percentage of total database..   3.7%
> Number of entries. 551407
> Bytes of storage consumed. 5517312   
> Bytes of payload.. 3826907 69.4% 
> Bytes of metadata. 1670381 30.3% 
> B-tree depth.. 3 
> Average payload per entry. 6.94  
> Average unused bytes per entry 0.04  
> Average metadata per entry 3.03  
> Average fanout 224.00
> Non-sequential pages.. 128295.2% 
> Maximum payload per entry. 7 
> Entries that use overflow. 00.0% 
> Index pages used.. 6 
> Primary pages used 1341  
> Overflow pages used... 0 
> Total pages used.. 1347  
> Unused bytes on index pages... 581623.7% 
> Unused bytes on primary pages. 142080.26% 
> Unused bytes on overflow pages 0 
> Unused bytes on all pages. 200240.36%

I ran it twice, once before executing DROP INDEX and CREATE INDEX
commands ...

> $ ./sqlite3_analyzer db.sqlite3 > run1

... and once afterwards:

> $ ./sqlite3_analyzer db.sqlite3 > run2

The only difference between both runs:
> $ diff run1 run2
> 1260d1259
> < INSERT INTO space_used 
> VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);
> 1270a1270
>> INSERT INTO space_used 
>> VALUES('idx_words1','words',1,0,551407,550067,3,3826907,0,0,7,6,1341,0,5816,14208,0,1282,5517312);

-- 
Best regards

dirdi
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Richard Damon
On 8/29/19 10:56 AM, dirdi wrote:
> On 8/29/19 6:10 PM, Keith Medcalf wrote:
>> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you 
>> ever run analyze)?  
> No.
>
>> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run 
>> ANALYZE; rather than reindexing or dropping/recreating the index, what is 
>> the result?
> If I run ...
>
>> ANALYZE;
>> REINDEX;
>> ANALYZE;
> ... instead of ...
>
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>> `int`
>> );
> the UPDATE query remains being slow (~36m).
>
One thought, could there perhaps be something about the query that the
ANALYZE gives bad information about how the structure of data in the
table that dropping and recreating (which deletes the results of the
last ANALYZE) makes the query planner chose a different plan which works
better. Perhaps you could look at and compare the query plans of the two
different scenarios.

-- 
Richard Damon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread dirdi
On 8/29/19 6:10 PM, Keith Medcalf wrote:
> Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you 
> ever run analyze)?  

No.

> If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run 
> ANALYZE; rather than reindexing or dropping/recreating the index, what is the 
> result?

If I run ...

> ANALYZE;
> REINDEX;
> ANALYZE;

... instead of ...

> DROP INDEX `idx1`;
> CREATE INDEX `idx1` ON `tbl1` (
> `int`
> );

the UPDATE query remains being slow (~36m).

-- 
Best regards

dirdi
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Keith Medcalf

Do you have sqlite_stat1 or sqlite_stat4 tables present (that is, have you ever 
run analyze)?  

Neither running a REINDEX nor a VACUUM will modify the statistics, however, 
dropping the index will delete the index statistics for the index (and 
re-creating the index will not re-create the statistics).

If the statistics tables (sqlite_stat1 and/or sqlite_stat4) are present then 
you update them by running analyze;  You can also drop the sqlite_stat1 and/or 
sqlite_stat4 tables to get rid of the statistics entirely, or delete individual 
rows from the table to get rid of stale shape data.

If the sqlite_stat1 or sqlite_stat4 tables are present, then if you run 
ANALYZE; rather than reindexing or dropping/recreating the index, what is the 
result?

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of dirdi
>Sent: Thursday, 29 August, 2019 08:10
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] Unexpected REINDEX behavior.
>
>Hi there, I noticed some - at least for me - unexpected behavior of
>the
>REINDEX command:
>
>I have a DB with 7 tables, 11 indexes and a size of about 140MiB. The
>DB
>is generated by a python script parsing an XML file. Entries are
>added
>to the tables in arbitrary order.
>
>After the tables have been created and filled, some UPDATE queries
>are
>executed. One of those queries took ~36m. I tried to reduce the
>runtime
>by using the REINDEX command, but with no success. However, if I
>delete
>and recreate one of the indexes, using the DROP INDEX and CREATE
>INDEX
>commands, the UPDATE query takes only ~2m.
>
>Unfortunately I can not share the DB with you, due to legal reasons
>and
>was also not able to create a test script to trigger this behavior.
>
>The two tables, affected by the UPDATE query:
>> CREATE TABLE `tbl1` (
>> `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>> `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>> `int` INTEGER NOT NULL,
>> `tbl2_id` INTEGER,
>> `other_str` TEXT DEFAULT NULL,
>> `other_int1` INTEGER DEFAULT NULL,
>> `other_int2` INTEGER NOT NULL DEFAULT 0,
>> FOREIGN KEY(`tbl2_id`) REFERENCES `tbl2`(`id`)
>> );
>>
>> CREATE TABLE `tbl2` (
>> `id` INTEGER PRIMARY KEY AUTOINCREMENT,
>> `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
>> `int` INTEGER NOT NULL
>> );
>
>Their corresponding indexes:
>> CREATE INDEX `idx1` ON `tbl1` (
>> `int`
>> );
>> CREATE INDEX `idx2` ON `tbl1` (
>> `tbl2_id`
>> );
>> CREATE INDEX `idx3` ON `tbl2` (
>> `int`
>> );
>... filling the tables ...
>
>Recreating index idx1 (To archive the speedup, recreation of the
>other
>indexes is not necessary):
>> DROP INDEX `idx1`;
>> CREATE INDEX `idx1` ON `tbl1` (
>> `int`
>> );
>And the update query:
>> UPDATE `tbl1`
>> SET `tbl2_id` = (
>> SELECT `tbl2`.`id`
>> FROM `tbl2`
>> WHERE `tbl1`.`int` >= `tbl2`.`int`
>> AND `tbl1`.`str` LIKE '%' || `tbl2`.`str`
>> ORDER BY `tbl2`.`int` DESC
>> LIMIT 1
>> );
>
>And it's corresponding query plan:
>> idparent  notused detail> 3  0   0   SCAN TABLE tbl1
>> 23   0   0   CORRELATED SCALAR SUBQUERY 1
>> 29   23  0   SEARCH TABLE tbl2 USING INDEX idx3 (int
>tbl1 and tbl2 contain about 600k and 20k entries, respectively.
>
>Does one have an explanation for this behavior?
>Is it expected that the REINDEX command produces other results than
>"manually" recreation of an index?
>If yes, under which circumstances does this happen?
>And is there some way to measure the "quality" of an index / if it
>has
>been scattered?
>
>PS: I also tried the VACUUM (combined w/ and w/o REINDEX) command to
>no
>success.
>
>--
>Best regards
>
>dirdi
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread Simon Slavin
On 29 Aug 2019, at 3:09pm, dirdi  wrote:

> Does one have an explanation for this behavior?

> Is it expected that the REINDEX command produces other results than 
> "manually" recreation of an index?
> If yes, under which circumstances does this happen?
> And is there some way to measure the "quality" of an index / if it has been 
> scattered?

Just in case, please run an integrity-check on the database as it is before you 
delete/rebuild the index.

What you're seeing seems weird to me.  One thing that comes to mind is that 
deleting and recreating an index might put all the pages of the new index at 
the end of the database file, whereas the REINDEX command might reuse the pages 
which the index is currently using.  But I don't know whether this is how 
SQLite really works.

You might be able to learn more about the various versions of that index using 
this:



It may show that the various versions of idx1 you see/create take different 
numbers of pages, or have different depths.  However, I agree with you that the 
amount of time disparity is unexpected.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Unexpected REINDEX behavior.

2019-08-29 Thread dirdi
Hi there, I noticed some - at least for me - unexpected behavior of the
REINDEX command:

I have a DB with 7 tables, 11 indexes and a size of about 140MiB. The DB
is generated by a python script parsing an XML file. Entries are added
to the tables in arbitrary order.

After the tables have been created and filled, some UPDATE queries are
executed. One of those queries took ~36m. I tried to reduce the runtime
by using the REINDEX command, but with no success. However, if I delete
and recreate one of the indexes, using the DROP INDEX and CREATE INDEX
commands, the UPDATE query takes only ~2m.

Unfortunately I can not share the DB with you, due to legal reasons and
was also not able to create a test script to trigger this behavior.

The two tables, affected by the UPDATE query:
> CREATE TABLE `tbl1` (
> `id` INTEGER PRIMARY KEY AUTOINCREMENT,
> `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
> `int` INTEGER NOT NULL,
> `tbl2_id` INTEGER,
> `other_str` TEXT DEFAULT NULL,
> `other_int1` INTEGER DEFAULT NULL,
> `other_int2` INTEGER NOT NULL DEFAULT 0,
> FOREIGN KEY(`tbl2_id`) REFERENCES `tbl2`(`id`)
> );
> 
> CREATE TABLE `tbl2` (
> `id` INTEGER PRIMARY KEY AUTOINCREMENT,
> `str` TEXT NOT NULL UNIQUE COLLATE NOCASE,
> `int` INTEGER NOT NULL
> );

Their corresponding indexes:
> CREATE INDEX `idx1` ON `tbl1` (
> `int`
> );
> CREATE INDEX `idx2` ON `tbl1` (
> `tbl2_id`
> );
> CREATE INDEX `idx3` ON `tbl2` (
> `int`
> );
... filling the tables ...

Recreating index idx1 (To archive the speedup, recreation of the other
indexes is not necessary):
> DROP INDEX `idx1`;
> CREATE INDEX `idx1` ON `tbl1` (
> `int`
> );
And the update query:
> UPDATE `tbl1`
> SET `tbl2_id` = (
> SELECT `tbl2`.`id`
> FROM `tbl2`
> WHERE `tbl1`.`int` >= `tbl2`.`int`
> AND `tbl1`.`str` LIKE '%' || `tbl2`.`str`
> ORDER BY `tbl2`.`int` DESC
> LIMIT 1
> );

And it's corresponding query plan:
> idparent  notused detail> 3   0   0   SCAN TABLE tbl1
> 230   0   CORRELATED SCALAR SUBQUERY 1
> 2923  0   SEARCH TABLE tbl2 USING INDEX idx3 (inthttp://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users