Hi,

I'm thinking about using a non standard design for my database.
There are up to 1 million records in the database. Each record
has an integer UID. There is another table with keywords for
each of the records. Records typically have 0-15 keywords associated.
The number of keywords is small 100-1000. Currently I have

CREATE TABLE data (
        dataId INTEGER PRIMARY KEY AUTOINCREMENT,
        ...
);
CREATE TABLE words (
        wordId INTEGER PRIMARY KEY AUTOINCREMENT,
        word TEXT PRIMARY KEY ON CONFLICT IGNORE
);

CREATE TABLE keywords (
        wordId INTEGER,
        dataId INTEGER,
        PRIMARY KEY (wordId, dataId)    ON CONFLICT IGNORE
);
CREATE INDEX keywordsDataIdIndex ON keywords (dataId);

This creates in 2 big indexes for keywords...

When I display data items, I also display all associated keywords.
Therefore I created a cat function, that creates a comma separated
list of items:

SELECT *,(SELECT cat(category) FROM keywords WHERE keywords.dataId = 
data.dataId) FROM data ...

I just wonder, if I should not forget about database normalization
and create one table per keyword:

CREATE TABLE data (
        dataId    INTEGER PRIMARY KEY AUTOINCREMENT,
        keywords  TEXT, -- a comma separated list of word ids
        ...
);
CREATE TABLE words (
        wordId  INTEGER PRIMARY KEY AUTOINCREMENT,
        word    TEXT PRIMARY KEY ON CONFLICT IGNORE
);

-- for each word ID there's a table
CREATE TABLE keywords_0 (
        dataId  INTEGER PRIMARY KEY,
);

Is there a limit in the number of tables?

If I choose keyword 1, 7 and 42 I would

SELECT * FROM keywords_1 UNION SELECT * FROM keywords_7' UNION SELECT * FROM 
keywords_42'

I expect the database to become significantly smaller...

Has anybody tried something like this?

Michael

Reply via email to