Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

2019-12-03 Thread Jonathan Moules

Thanks for the suggestion.
Changing the USING to ON makes absolutely no difference. The speed is 
the same and the query plans (both EXPLAIN and EXPLAIN QUERY PLAN) are 
absolutely identical. Same for if I convert it to WHERE:


WHERE joining_table.data_id = data_table.data_id;

On 2019-12-03 14:46, Simon Slavin wrote:

On 3 Dec 2019, at 8:48am, Jonathan Moules  wrote:


   SELECT
   count(1)
   FROM
   data_table
   JOIN joining_table USING (data_id);

   SELECT
   count(1)
   FROM data_table
   JOIN joining_table
   ON joining_table.data_id = data_table.data_id;

Given the rest of the structure you gave, including the indexes, compare the 
speeds of these two.

Simon
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

2019-12-03 Thread Simon Slavin
On 3 Dec 2019, at 8:48am, Jonathan Moules  wrote:

>   SELECT
>   count(1)
>   FROM
>   data_table
>   JOIN joining_table USING (data_id);

  SELECT
  count(1)
  FROM data_table
  JOIN joining_table
  ON joining_table.data_id = data_table.data_id;

Given the rest of the structure you gave, including the indexes, compare the 
speeds of these two.

Simon
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

2019-12-03 Thread Jonathan Moules
/peterhansen.ca/blog/sqlite-negative-integer-primary-keys.html - 
but it's odd that the HDD was better than the SSD for the most part 
with these.


Also the full-size 64bit integers were a fair percentage slower than 
the regular integers even though there were the exact same number of 
them.



Thanks again,

Jonathan


On 2019-11-26 14:40, David Raymond wrote:
Not the reason for the slowdown, but note that both of these are 
redundant:


CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
  data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
  ignored_id
);

...because you declared them as the primary keys in the table 
creation. So you now have 2 different indexes on the exact same data 
for each of those.



The rest of it looks fine to me anyway, and I'm not sure why you'd be 
seeing such slow times. Old slow hard disk?


If you analyze and vacuum it does it get any better?

I think the CLI has something like ".scanstats on" to get a little 
more info, but I'm not sure how much more info it'll provide.



-Original Message-
From: sqlite-users  On 
Behalf Of Hick Gunter

Sent: Tuesday, November 26, 2019 4:57 AM
To: 'SQLite mailing list' 
Subject: Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

You are using text columns as primary keys and referencing them 
directly in foreign keys. This is probably not what you want, because 
it duplicates the text key. Also, with foreign keys enabled, your 
join is not accomplishing anything more than a direct select from 
joining_table, just with more effort (and circumventing the count() 
optimization).


SQLite uses an 64bit Integer as a rowid that uniquely identifies the 
row in the table. This is what you should be using as a foreign key, 
because it is twice as fast as using an index.


OTOH, SQLite supports WITHOUT ROWID tables, you might like to read up 
on those too


-Ursprüngliche Nachricht-
Von: sqlite-users 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von 
Jonathan Moules

Gesendet: Dienstag, 26. November 2019 10:25
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Slow joining of tables with indexes

Hi List,
I have a relational table setup where I've built indexes but I'm 
still seeing very slow join times on middling amounts of data. I'm 
guessing I'm doing something wrong but I can't see what. (SQLite: 
3.24.0)


Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me 
table only because it's relevant to the indexes.
Note: I can *guarantee* that the data inserted into `data_table` and 
`ignore_me` is ordered by their respective primary keys ASC. Entries 
in joining_table are ordered by one of either data_id ASC or 
ignored_id ASC depending on creation method.


--==

-- 1.7 million items
CREATE TABLE data_table (
  data_id    TEXT PRIMARY KEY
  NOT NULL
  COLLATE NOCASE,
  data_1    TEXT,
  data_2 TEXT );

-- 1.9 million items
CREATE TABLE joining_table (
  data_id TEXT REFERENCES data_table (data_id)
  NOT NULL
  COLLATE NOCASE,
  ignored_id TEXT    REFERENCES ignore_me (ignored_id)
  NOT NULL
  COLLATE NOCASE,
  misc_col_1    TEXT,
  misc_col_2    TEXT
);

-- ~200,000 items
CREATE TABLE ignore_me (
  ignored_id    TEXT PRIMARY KEY
  NOT NULL
  COLLATE NOCASE );

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
  data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
  ignored_id
);

-- Allow quick joining from data_table to ignore_me CREATE INDEX IF 
NOT EXISTS joining_table__data_ignored_id__fk_idx ON joining_table (

  data_id ASC,
  ignored_id ASC
);
-- Allow quick joining from ignore_me to data_table CREATE INDEX IF 
NOT EXISTS joining_table__ignored_data_id__fk_idx ON joining_table (

  ignored_id ASC,
  data_id ASC
);

-- Example data:

INSERT INTO data_table (data_id) VALUES ('00196a21e8c0f9f6'); INSERT 
INTO data_table (data_id) VALUES ('579c57f1268c0f5c');


INSERT INTO ignore_me VALUES ('c402eb3f05d433f2'); INSERT INTO 
ignore_me VALUES ('d827e58953265f63'); INSERT INTO ignore_me VALUES 
('ec1d2e817f55b249');


INSERT INTO joining_table (data_id, ignored_id) VALUES 
('00196a21e8c0f9f6', 'c402eb3f05d433f2'); INSERT INTO joining_table 
(data_id, ignored_id) VALUES ('00196a21e8c0f9f6', 
'd827e58953265f63'); INSERT INTO joining_table (data_id, ignored_id) 
VALUES ('579c57f1268c0f5c', 'ec1d2e817f55b249');





-- Then to test the speed I'm simply doing:

Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

2019-12-02 Thread Jonathan Moules
Thanks for the comments. I've done some testing. Results below for those 
interested.


* Unnecessary manual indexes on the Primary Key - good spot, I'd 
forgotten SQLite does that!


* I was indeed using a Hard Disk but that was intentional - this is for 
customers and I can't know their hardware.


* INTEGERs vs WITHOUT ROW_ID vs what I have now vs full-on 64 bit 
INTEGERs - Tested below


Non-scientific Timings are below (in seconds). "HDD" = Hard drive, 
otherwise it's a SSD. "Indexes" means an index built on the FK.


==

Original structure

-- original (16 character string PK/FK) - indexes
-- 4.04
-- 4.6 (hdd)
-- 4.1
-- 4.7 (hdd)
-- 4.14
-- 5.03 (hdd)

-- original (16 character string PK/FK) - no indexes
-- 4.03
-- 5.9 (hdd)
-- 5.1
-- 11.4 (hdd)
-- 4.18
-- 9.766 (hdd)

So not much speed difference with indexes between SSD and HDD.

===

Original structure but changing to WITHOUT ROW_ID

-- original (16 character string PK/FK) - WITHOUT ROW_ID - indexes
-- 3.69
-- 2.9 (hdd)
-- 3.8
-- 5.2 (hdd)
-- 3.74
-- 5.8 (hdd)

-- original (16 character string PK/FK) - WITHOUT ROW_ID - no indexes
-- 3.45
-- 3.4 (hdd)
-- 3.4
-- 3.4 (hdd)
-- 8.47
-- 18.4 (hdd)

Curiously with the with-indexes seems to on average be slower than 
without indexes for this on the HDD.


==

Auto-incrementing INTEGER as the ID and FK

-- integer_id (autoincrement INTEGER PK/FK) - indexes
-- 1.3
-- 1.21
-- 6.9 (hdd)
-- 1.2
-- 4.4 (hdd)
-- 2.45
-- 5.2 (hdd)

-- integer_id (autoincrement INTEGER PK/FK) - no indexes
-- 1.3
-- 19.3 (hdd)
-- 4.7
-- 9.1 (hdd)
-- 5.229
-- 18.98 (hdd)

no-index speeds seem to be very random.



The last test I did was to convert the hex strings to their 64bit 
INTEGER equivalents and use those as the keys. So still using a 64bit 
INTEGER as the keys, they could be anything rather than low value


So my keys are like:
-9223326038759585676
-5012230838021194131
-3961911462337065450
3423089283580538480
9221679147258515042
...

my integer (Hex to INTEGER PK/FK - negative PKs) - index
-- 2.02
-- 2.03
-- 1.9 (hdd)
-- 1.9 (hdd)
-- 6.1
-- 1.9 (hdd)

my integer (Hex to INTEGER PK/FK - negative PKs) - no indexes
-- 2.48s
-- 2.42s
-- 2.4 (hdd)
-- 2.4 (hdd)
-- 7.5
-- 20.1 (hdd)

The HDD was consistently good with these full-size 64bit keys which 
surprised me. I've seen that there are some optimisations assuming 
positive integers - 
http://peterhansen.ca/blog/sqlite-negative-integer-primary-keys.html - 
but it's odd that the HDD was better than the SSD for the most part with 
these.


Also the full-size 64bit integers were a fair percentage slower than the 
regular integers even though there were the exact same number of them.



Thanks again,

Jonathan


On 2019-11-26 14:40, David Raymond wrote:

Not the reason for the slowdown, but note that both of these are redundant:

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
  data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
  ignored_id
);

...because you declared them as the primary keys in the table creation. So you 
now have 2 different indexes on the exact same data for each of those.


The rest of it looks fine to me anyway, and I'm not sure why you'd be seeing 
such slow times. Old slow hard disk?

If you analyze and vacuum it does it get any better?

I think the CLI has something like ".scanstats on" to get a little more info, 
but I'm not sure how much more info it'll provide.


-Original Message-
From: sqlite-users  On Behalf Of 
Hick Gunter
Sent: Tuesday, November 26, 2019 4:57 AM
To: 'SQLite mailing list' 
Subject: Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

You are using text columns as primary keys and referencing them directly in 
foreign keys. This is probably not what you want, because it duplicates the 
text key. Also, with foreign keys enabled, your join is not accomplishing 
anything more than a direct select from joining_table, just with more effort 
(and circumventing the count() optimization).

SQLite uses an 64bit Integer as a rowid that uniquely identifies the row in the 
table. This is what you should be using as a foreign key, because it is twice 
as fast as using an index.

OTOH, SQLite supports WITHOUT ROWID tables, you might like to read up on those 
too

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jonathan Moules
Gesendet: Dienstag, 26. November 2019 10:25
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Slow joining of tables with indexes

Hi List,
I have a relational table setup where I've built indexes but I'm still seeing 
very slow join times on middling amounts of data. I'm guessing I'm doing 
something wrong but I can't see what. (SQLite: 3.24.0)

Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me table only 
because it's relevant to the indexes.
Note: I can *guarantee* that the 

Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

2019-11-26 Thread David Raymond
Not the reason for the slowdown, but note that both of these are redundant:

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
 data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
 ignored_id
);

...because you declared them as the primary keys in the table creation. So you 
now have 2 different indexes on the exact same data for each of those.


The rest of it looks fine to me anyway, and I'm not sure why you'd be seeing 
such slow times. Old slow hard disk?

If you analyze and vacuum it does it get any better?

I think the CLI has something like ".scanstats on" to get a little more info, 
but I'm not sure how much more info it'll provide.


-Original Message-
From: sqlite-users  On Behalf Of 
Hick Gunter
Sent: Tuesday, November 26, 2019 4:57 AM
To: 'SQLite mailing list' 
Subject: Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

You are using text columns as primary keys and referencing them directly in 
foreign keys. This is probably not what you want, because it duplicates the 
text key. Also, with foreign keys enabled, your join is not accomplishing 
anything more than a direct select from joining_table, just with more effort 
(and circumventing the count() optimization).

SQLite uses an 64bit Integer as a rowid that uniquely identifies the row in the 
table. This is what you should be using as a foreign key, because it is twice 
as fast as using an index.

OTOH, SQLite supports WITHOUT ROWID tables, you might like to read up on those 
too

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jonathan Moules
Gesendet: Dienstag, 26. November 2019 10:25
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Slow joining of tables with indexes

Hi List,
I have a relational table setup where I've built indexes but I'm still seeing 
very slow join times on middling amounts of data. I'm guessing I'm doing 
something wrong but I can't see what. (SQLite: 3.24.0)

Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me table only 
because it's relevant to the indexes.
Note: I can *guarantee* that the data inserted into `data_table` and 
`ignore_me` is ordered by their respective primary keys ASC. Entries in 
joining_table are ordered by one of either data_id ASC or ignored_id ASC 
depending on creation method.

--==

-- 1.7 million items
CREATE TABLE data_table (
 data_idTEXT PRIMARY KEY
 NOT NULL
 COLLATE NOCASE,
 data_1TEXT,
 data_2 TEXT );

-- 1.9 million items
CREATE TABLE joining_table (
 data_id TEXT REFERENCES data_table (data_id)
 NOT NULL
 COLLATE NOCASE,
 ignored_id TEXTREFERENCES ignore_me (ignored_id)
 NOT NULL
 COLLATE NOCASE,
 misc_col_1TEXT,
 misc_col_2TEXT
);

-- ~200,000 items
CREATE TABLE ignore_me (
 ignored_idTEXT PRIMARY KEY
 NOT NULL
 COLLATE NOCASE );

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
 data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
 ignored_id
);

-- Allow quick joining from data_table to ignore_me CREATE INDEX IF NOT EXISTS 
joining_table__data_ignored_id__fk_idx ON joining_table (
 data_id ASC,
 ignored_id ASC
);
-- Allow quick joining from ignore_me to data_table CREATE INDEX IF NOT EXISTS 
joining_table__ignored_data_id__fk_idx ON joining_table (
 ignored_id ASC,
 data_id ASC
);

-- Example data:

INSERT INTO data_table (data_id) VALUES ('00196a21e8c0f9f6'); INSERT INTO 
data_table (data_id) VALUES ('579c57f1268c0f5c');

INSERT INTO ignore_me VALUES ('c402eb3f05d433f2'); INSERT INTO ignore_me VALUES 
('d827e58953265f63'); INSERT INTO ignore_me VALUES ('ec1d2e817f55b249');

INSERT INTO joining_table (data_id, ignored_id) VALUES ('00196a21e8c0f9f6', 
'c402eb3f05d433f2'); INSERT INTO joining_table (data_id, ignored_id) VALUES 
('00196a21e8c0f9f6', 'd827e58953265f63'); INSERT INTO joining_table (data_id, 
ignored_id) VALUES ('579c57f1268c0f5c', 'ec1d2e817f55b249');




-- Then to test the speed I'm simply doing:
 SELECT
 count(1)
 FROM
 data_table
 JOIN joining_table USING (data_id);

--==

The query plan says it's using the indexes:
 SCAN TABLE joining_table USING COVERING INDEX 
joining_table__ignored_data_id__fk_idx
 SEARCH TABLE data_table USING COVERING INDEX data_table__data_id__pk_idx 
(data_id=?)

But it takes about 20 seconds to do 

Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

2019-11-26 Thread Hick Gunter
You are using text columns as primary keys and referencing them directly in 
foreign keys. This is probably not what you want, because it duplicates the 
text key. Also, with foreign keys enabled, your join is not accomplishing 
anything more than a direct select from joining_table, just with more effort 
(and circumventing the count() optimization).

SQLite uses an 64bit Integer as a rowid that uniquely identifies the row in the 
table. This is what you should be using as a foreign key, because it is twice 
as fast as using an index.

OTOH, SQLite supports WITHOUT ROWID tables, you might like to read up on those 
too

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jonathan Moules
Gesendet: Dienstag, 26. November 2019 10:25
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] Slow joining of tables with indexes

Hi List,
I have a relational table setup where I've built indexes but I'm still seeing 
very slow join times on middling amounts of data. I'm guessing I'm doing 
something wrong but I can't see what. (SQLite: 3.24.0)

Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me table only 
because it's relevant to the indexes.
Note: I can *guarantee* that the data inserted into `data_table` and 
`ignore_me` is ordered by their respective primary keys ASC. Entries in 
joining_table are ordered by one of either data_id ASC or ignored_id ASC 
depending on creation method.

--==

-- 1.7 million items
CREATE TABLE data_table (
 data_idTEXT PRIMARY KEY
 NOT NULL
 COLLATE NOCASE,
 data_1TEXT,
 data_2 TEXT );

-- 1.9 million items
CREATE TABLE joining_table (
 data_id TEXT REFERENCES data_table (data_id)
 NOT NULL
 COLLATE NOCASE,
 ignored_id TEXTREFERENCES ignore_me (ignored_id)
 NOT NULL
 COLLATE NOCASE,
 misc_col_1TEXT,
 misc_col_2TEXT
);

-- ~200,000 items
CREATE TABLE ignore_me (
 ignored_idTEXT PRIMARY KEY
 NOT NULL
 COLLATE NOCASE );

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
 data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
 ignored_id
);

-- Allow quick joining from data_table to ignore_me CREATE INDEX IF NOT EXISTS 
joining_table__data_ignored_id__fk_idx ON joining_table (
 data_id ASC,
 ignored_id ASC
);
-- Allow quick joining from ignore_me to data_table CREATE INDEX IF NOT EXISTS 
joining_table__ignored_data_id__fk_idx ON joining_table (
 ignored_id ASC,
 data_id ASC
);

-- Example data:

INSERT INTO data_table (data_id) VALUES ('00196a21e8c0f9f6'); INSERT INTO 
data_table (data_id) VALUES ('579c57f1268c0f5c');

INSERT INTO ignore_me VALUES ('c402eb3f05d433f2'); INSERT INTO ignore_me VALUES 
('d827e58953265f63'); INSERT INTO ignore_me VALUES ('ec1d2e817f55b249');

INSERT INTO joining_table (data_id, ignored_id) VALUES ('00196a21e8c0f9f6', 
'c402eb3f05d433f2'); INSERT INTO joining_table (data_id, ignored_id) VALUES 
('00196a21e8c0f9f6', 'd827e58953265f63'); INSERT INTO joining_table (data_id, 
ignored_id) VALUES ('579c57f1268c0f5c', 'ec1d2e817f55b249');




-- Then to test the speed I'm simply doing:
 SELECT
 count(1)
 FROM
 data_table
 JOIN joining_table USING (data_id);

--==

The query plan says it's using the indexes:
 SCAN TABLE joining_table USING COVERING INDEX 
joining_table__ignored_data_id__fk_idx
 SEARCH TABLE data_table USING COVERING INDEX data_table__data_id__pk_idx 
(data_id=?)

But it takes about 20 seconds to do that count on the full dataset.

The full EXPLAIN from the full dataset:

0Init016000
1Null01100
2OpenRead27718750k(3,NOCASE,NOCASE,)00
3OpenRead37377150k(2,NOCASE,)02
4Rewind2122000
5Column21200
6SeekGE3112100
7IdxGT3112100
8Integer13000
9AggStep0031count(1)01
10Next37100
11Next25001
12AggFinal110count(1)00
13Copy14000
14ResultRow41000
15Halt00000
16Transaction0077001
17Goto01000

Thoughts? What (probably obvious) thing am I missing?

Thanks,
Jonathan