Am 21.02.2020 um 02:24 schrieb Chip Beaulieu:
I have a table with 4.5 million records with full text indexing.
> Reads are very fast, but deleting / inserting / updating
takes on average about 50 seconds per record.
I often do batches of 30,000 deletes / inserts at a time.
The last batch took 10 hours to complete.
These 30,000 deletes+inserts indicates that these were the
verse-records of a complete Bible-Content, right?
And your 4.5Mio records total indicate, that you have about
140 different Bible-versions in your DB?
I suspect it’s got something to do with the triggers more than the indexes.
Could be - but my guess is, that your primary DB-changes are caused
by "deleting or inserting verses of whole bible-contents".
So why not handle (hold) each bible in a separate table?
An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.
Inserting "your next Bible-Version" into it would be fast,
because you're creating separate Tables for that job.
Same thing for "deleting Bible-versions" (just 2 "Drop Table"-calls).
You might argue, that the perfomance will suffer - especially
when you're trying to find "differences between two versions"...
But that is only a matter of a Join (via VerseID) on
the two Tables you want to "get a Verse-Diff for".
I've played that through with two different "Bibles":
- Bible12.txt and Bible13.txt (each one about 5MB unzipped)
- downloadable here: http://www.gutenberg.org/files/30/old/
The 31,102 "verse-records" each of the two text-files contains,
can be parsed and inserted into separate FTS-Tables in about:
- 0.5sec each (incl. text-parsing)
So, the total import-time for both (Bible12 and Bible13) was ~ 1sec.
That's far below your timings for "inserting a single verse-record".
FWIW, here is the SQLite-DB, my imports have produced (about 7MB)
http://vbRichClient.com/Downloads/BibleVersions.zip
It contains 4 Tables now (after importing the 2 text-files):
- Books_Bible12 (66 records) + FTS_Bible12 (31,102 records)
- Books_Bible13 (66 records) + FTS_Bible13 (31,102 records)
The schema-defs for those "Books_xxx" and "FTS_xxx" tables are:
"Create Table <Books_xxx> (BookID Integer Primary Key, Title Text)"
"Create Virtual Table <FTS_xxx> Using FTS5(Scripture, tokenize=porter)"
No extra-indexes are needed... (IMO)
So, whilst the Books-Table was defined "quite normally" -
the FTS-table only contains a single Field ("Scripture")...
which raises the question, how to efficiently join 2 FTS-tables
(e.g. to find "differences in the Scripture-fields").
Well, in the recent Bible-TextFiles (from project Gutenberg),
theres already a "verse- or scripture-id" which is unique:
01:001:001 (two digits for BookNr, 3 for ChapterNr and 3 for VerseNr)
This Text-based (unique) ID can be easily transformed
to an Int-Value of the form: 01001001 - and that's what
I did whilst inserting into the FTS-Tables (filling their RowID-fields)
So, a later Join, to find scripture-differences between:
- FTS_Bible12 and FTS_Bible13 contents
becomes possible - e.g. this way:
Select T1.RowID, T1.Scripture, T2.Scripture
From FTS_Bible12 T1 Join FTS_Bible13 T2 On T1.RowID=T2.RowID
Where T1.Scripture <> T2.Scripture
It will (on the given example-DB above) find 8 records,
where scripture-content differs (after about 50msec or so).
For those interested, below is the import-Code I've used - which
will run as either VBScript - but also within VBA or VB5/6...
(it reads like "pseudo-code", so porting to other languages is easy):
'**depends on the COM-wrapper for SQLite (available at vbRichClient.com)
Function ImportInto(oMemDB, sTblNameBooks, sTblNameFTS, sBibleContent)
On Error Resume Next
With oMemDB
.BeginTrans
.Exec "Create Table " & sTblNameBooks & " (BookID Integer Primary
Key, Title Text)"
.Exec "Create Virtual Table " & sTblNameFTS & " Using
FTS5(Scripture, tokenize=porter)"
Dim sCmdB: sCmdB = "Insert Into " & sTblNameBooks & " (BookID,
Title) Values(?,?)"
Dim sCmdV: sCmdV = "Insert Into " & sTblNameFTS & "(RowID,
Scripture) Values(?,?)"
Dim L, VID, S
For Each L In Split(sBibleContent, vbCrLf) 'enumerate the Lines (in L)
Select Case InStr(L, " ") 'check the Pos of the first Space-Char in L
Case 1 'we are probably still in a verse-block...
If Len(VID) Then S = S & Mid(L, 11) 'Ok, concat that Verse-Part
Case 11 'a potential Verse-Idx-Line
If InStr(L, ":") = 3 Then VID = Left(L, 10): S = Mid(L, 12)
Case 0 'probably an empty Line (as in the end of a Verse-Block)
If Len(VID) Then
.ExecCmd CStr(sCmdV), CLng(Replace(VID, ":", "")), S 'RowID
+ Scripture
VID = "" 'reset the VID for the next verse
End If
Case 5 'it's a potential Book-Line
If Left(L, 4) = "Book" And Len(L) < 40 Then
.ExecCmd CStr(sCmdB), CLng(Mid(L, 6, 2)), Trim(Mid(L, 10))
'BookID + Title
End If
End Select
Next
.CommitTrans
If Err Then .RollbackTrans: ImportInto = Err.Description 'return
the ErrStr
End With
End Function
HTH
Olaf
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users