A lot of speculation here. I am certainly no SQLite expert.

Your input has 1g positions, taking 187gb, so averaging 187b/position. From 
your CREATE TABLE, it looks like to get that size most of your qfrom and qto 
are fairly long strings. I'm assuming there are a great many duplications in 
those positions. If so, put them into a separate position table { positionId 
INT, positionName TEXT }, with positionId as the primary key and positionName 
also being unique. This will be even more useful if you have a fixed set of 
possible positions, and you make it so that positionId is increasing whenever 
positionName is increasing. In your qmoves table, store positionId values, 
rather than postionName values. Saves a lot of space because no name is in the 
database more than once, and most of your space is related to names.

Space is important, because at a few hundred gb, your database is probably 
bigger than all of your available fast cache's, and you are probably storing 
your data on rotating storage. Writes to random positions might easily average 
10ms, and the import of most of your records may involve one or more such a 
writes by the time indices are updated. Reducing sizes mean fewer such writes, 
because things are more likely to fit in the various caches.

I believe that with SQLite, if you don't specify WITHOUT ROWID your "real" 
record order is based on rowid, so complete records are, perhaps, stored in 
import order. The PRIMARY KEY you specified is large (a name and an integer), 
so the index entries are likely about half as large as a complete record, and 
they don't fit in cache either. It is also likely that your input data was not 
in sorted order. That means that adding an entry to that index likely involves 
a write to a random position. 10ms * 1g = 10e6s, or about three months.

Your qmoves_by_dest index is about as large as your primary (it also has a 
position and an integer), so similar timings might apply.

I suggest trying timing your script on smaller inputs (1gb, 10gb, 20gb) and 
seeing if there is a size where things suddenly get worse (available caches are 
not big enough). See if my suggestions help those timings. See if WITHOUT ROWID 
helps those timings.

In principle, indices can be created by writing the needed information (index 
key, record position) in the original order, and then sorting that into 
key-order. That can be done with many less random seeks (merge sorts involve 
mostly sequential reads and writes). I don't know if, or when, SQLite does that.

Regards,
Bill

-----Original Message-----
From: Kevin O'Gorman [mailto:kevinogorm...@gmail.com]
Sent: Wednesday, August 03, 2016 10:00 PM
To: sqlite-users
Subject: [sqlite] newbie has waited days for a DB build to complete. what's up 
with this.

I'm working on a hobby project, but the data has gotten a bit out of hand.
I thought I'd put it in a real database rather than flat ASCII files.

I've got a problem set of about 1 billion game positions and 187GB to work on 
(no, I won't have to solve them all) that took about 4 hours for a generator 
program just to write.  I wrote code to turn them into something SQLite could 
import.  Actually, it's import, build a non-primary index, and alter table to 
add a column, all in sqlite3.

The database was growing for about 1-1/2 days.  Then its journal disappeared, 
the file size dropped to zero, but sqlite3 is still running 100% CPU time, now 
for a total of 3800+ minutes (63+ hours).  The database is still locked, but I 
have no idea what sqlite3 is doing, or if it will ever stop.  All partitions 
still have lots of space left (most of this is running in a RAID partition of 
11 TiB).  Here's what I gave to sqlite3 on my Linux system:

time sqlite3 qubic.db <<EOF
BEGIN EXCLUSIVE TRANSACTION;
DROP TABLE IF EXISTS qmoves;
CREATE TABLE qmoves (
  qfrom CHAR(64),
  qmove INT,
  qto   CHAR(64),
  qweight INT,
  PRIMARY KEY (qfrom, qmove) ON CONFLICT ROLLBACK ); CREATE INDEX IF NOT EXISTS 
qmoves_by_dest ON qmoves (
  qto,
  qweight
);
CREATE TABLE IF NOT EXISTS qposn (
  qposn CHAR(64) PRIMARY KEY ON CONFLICT ROLLBACK,
  qmaxval INT,
  qmove INT,
  qminval INT,
  qstatus INT
);
.separator " "
.import am.all qmoves
ALTER TABLE qmoves ADD COLUMN qstatus INT DEFAULT NULL; .schema COMMIT 
TRANSACTION;

EOF

Any clues, hints, or advice?


--
#define QUESTION ((bb) || (!bb)) /* Shakespeare */


**************************************************************************************
This e-mail and any attachments thereto may contain confidential information 
and/or information protected by intellectual property rights for the exclusive 
attention of the intended addressees named above. If you have received this 
transmission in error, please immediately notify the sender by return e-mail 
and delete this message and its attachments. Unauthorized use, copying or 
further full or partial distribution of this e-mail or its contents is 
prohibited.
**************************************************************************************
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to