I need to save text files (let say between 1 KB to 20 MB) in a SQLite DB.
I see two possibilities:
1) save all the content in a single column:
create table content(id integer not null primary key,
text blob not null);
2) split the content in lines:
create table line(content integer not null,
line integer not null,
text blob not null,
primary key(content, line));
Some queries will need to extract the whole file, while other queries will need
to extract the text for a range of lines.
According to your experience it is better/faster the first option, the second
option or a smarter option I've not considered?
Why not do both?
If it was me, I would write some code to split the text into sentences
(not lines - which is rather easy in English, but might be harder in
some other languages).
Then I would save the lines in a two-table Database like this:
CREATE TABLE files(
ID INTEGER PRIMARY KEY, -- This is only to get a handle on the rowid
fileName TEXT, -- Add COLLATE NOCASE if the file system is case
insensitive, like Windows.
filePath TEXT,
... -- Any other things you want to store about the origin file, or
perhaps a timestamp etc.
);
CREATE TABLE content(
ID INTEGER PRIMARY KEY,
fileID INT NOT NULL REFERENCES files(ID), -- This to know which
origin file [*].
lineNo INT NOT NULL, -- So that a line can be referenced by position
into a file.
txtLine TEXT,
...
);
Now you can easily query all lines with a specific fileID to see the
entire document, or JOIN by fileName even to list the content of any
file by name, or simply refer to any specific line in any file by either
its fileID+lineNo or simply its own ID. I would probably go further
(since I'm code-parsing the file anyway) and include a paragraph number
or even chapter + page numbers if that's relevant.
This approach will work whether you split lines into sentences or just
physical positional lines, though a sentence-split makes more sense to
me (unless it's a data format).
[*] - I'm only showing basic options, but you'd typically want to add
some ON UPDATE CASCADE or ON DELETE referencing methods.
Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users