Am 21.02.2020 um 02:24 schrieb Chip Beaulieu:
I have a table with 4.5 million records with full text indexing. > Reads are very fast, but deleting / inserting / updating takes on average about 50 seconds per record. I often do batches of 30,000 deletes / inserts at a time. The last batch took 10 hours to complete.

These 30,000 deletes+inserts indicates that these were the
verse-records of a complete Bible-Content, right?

And your 4.5Mio records total indicate, that you have about
140 different Bible-versions in your DB?

I suspect it’s got something to do with the triggers more than the indexes.
Could be - but my guess is, that your primary DB-changes are caused
by "deleting or inserting verses of whole bible-contents".

So why not handle (hold) each bible in a separate table?

An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite in a good performance.

Inserting "your next Bible-Version" into it would be fast,
because you're creating separate Tables for that job.
Same thing for "deleting Bible-versions" (just 2 "Drop Table"-calls).

You might argue, that the perfomance will suffer - especially
when you're trying to find "differences between two versions"...

But that is only a matter of a Join (via VerseID) on
the two Tables you want to "get a Verse-Diff for".

I've played that through with two different "Bibles":
- Bible12.txt and Bible13.txt (each one about 5MB unzipped)
- downloadable here: http://www.gutenberg.org/files/30/old/

The 31,102 "verse-records" each of the two text-files contains,
can be parsed and inserted into separate FTS-Tables in about:
- 0.5sec each (incl. text-parsing)

So, the total import-time for both (Bible12 and Bible13) was ~ 1sec.

That's far below your timings for "inserting a single verse-record".

FWIW, here is the SQLite-DB, my imports have produced (about 7MB)
http://vbRichClient.com/Downloads/BibleVersions.zip

It contains 4 Tables now (after importing the 2 text-files):
- Books_Bible12 (66 records) + FTS_Bible12 (31,102 records)
- Books_Bible13 (66 records) + FTS_Bible13 (31,102 records)

The schema-defs for those "Books_xxx" and "FTS_xxx" tables are:
"Create Table <Books_xxx> (BookID Integer Primary Key, Title Text)"
"Create Virtual Table <FTS_xxx> Using FTS5(Scripture, tokenize=porter)"

No extra-indexes are needed... (IMO)

So, whilst the Books-Table was defined "quite normally" -
the FTS-table only contains a single Field ("Scripture")...
which raises the question, how to efficiently join 2 FTS-tables
(e.g. to find "differences in the Scripture-fields").

Well, in the recent Bible-TextFiles (from project Gutenberg),
theres already a "verse- or scripture-id" which is unique:
01:001:001 (two digits for BookNr, 3 for ChapterNr and 3 for VerseNr)

This Text-based (unique) ID can be easily transformed
to an Int-Value of the form: 01001001 - and that's what
I did whilst inserting into the FTS-Tables (filling their RowID-fields)

So, a later Join, to find scripture-differences between:
- FTS_Bible12 and FTS_Bible13 contents
becomes possible - e.g. this way:

Select T1.RowID, T1.Scripture, T2.Scripture
From FTS_Bible12 T1 Join FTS_Bible13 T2 On T1.RowID=T2.RowID
Where T1.Scripture <> T2.Scripture

It will (on the given example-DB above) find 8 records,
where scripture-content differs (after about 50msec or so).


For those interested, below is the import-Code I've used - which
will run as either VBScript - but also within VBA or VB5/6...
(it reads like "pseudo-code", so porting to other languages is easy):

'**depends on the COM-wrapper for SQLite (available at vbRichClient.com)
Function ImportInto(oMemDB, sTblNameBooks, sTblNameFTS, sBibleContent)
  On Error Resume Next

  With oMemDB
    .BeginTrans
.Exec "Create Table " & sTblNameBooks & " (BookID Integer Primary Key, Title Text)" .Exec "Create Virtual Table " & sTblNameFTS & " Using FTS5(Scripture, tokenize=porter)"

Dim sCmdB: sCmdB = "Insert Into " & sTblNameBooks & " (BookID, Title) Values(?,?)" Dim sCmdV: sCmdV = "Insert Into " & sTblNameFTS & "(RowID, Scripture) Values(?,?)"

    Dim L, VID, S
    For Each L In Split(sBibleContent, vbCrLf) 'enumerate the Lines (in L)

      Select Case InStr(L, " ") 'check the Pos of the first Space-Char in L

        Case 1  'we are probably still in a verse-block...
          If Len(VID) Then S = S & Mid(L, 11) 'Ok, concat that Verse-Part

        Case 11 'a potential Verse-Idx-Line
          If InStr(L, ":") = 3 Then VID = Left(L, 10): S = Mid(L, 12)

        Case 0  'probably an empty Line (as in the end of a Verse-Block)
          If Len(VID) Then
.ExecCmd CStr(sCmdV), CLng(Replace(VID, ":", "")), S 'RowID + Scripture
             VID = "" 'reset the VID for the next verse
          End If

        Case 5 'it's a potential Book-Line
          If Left(L, 4) = "Book" And Len(L) < 40 Then
.ExecCmd CStr(sCmdB), CLng(Mid(L, 6, 2)), Trim(Mid(L, 10)) 'BookID + Title
          End If
      End Select
    Next

    .CommitTrans
If Err Then .RollbackTrans: ImportInto = Err.Description 'return the ErrStr
  End With
End Function

HTH

Olaf


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to