> > I'm creating a table by inserting rows from a select statement. Given
> > enough memory this takes a few hours for small datasets and ages for
> > larger ones.
> Guillaume FOUGNIES:
> try a 'PRAGMA synchronous=OFF;' before executing your statement.
Yes, this is already implemented.
In addition: pragma cache_size = 1000000; (about 1.5 GB)
> D. Hipp:
> When I do a CREATE TABLE ... AS SELECT on my 3-year-old desktop, I get a
> table created at over 29000 rows/second and about 3.35MB/sec. At that
> rate, 1 hour of runtime should give you about 100 million rows and 12 GB.
> Not what I would call a "small dataset".
Tables:
-haploshare has 7 columns, just integer values, 3 of them declared as foreign
keys
- matched pairs has 7 columns, 3 of them integer (foreign keys), the
remaining ones are declared as varchar
The statement:
INSERT INTO haploshare
SELECT id1, id2, locus,
CASE WHEN p1 == 0 OR p2 == 0 THEN NULL WHEN p1 != p2 THEN 0 ELSE -1
END,
CASE WHEN p1 == 0 OR m2 == 0 THEN NULL WHEN p1 != m2 THEN 0 ELSE -1
END,
CASE WHEN m1 == 0 OR p2 == 0 THEN NULL WHEN m1 != p2 THEN 0 ELSE -1
END,
CASE WHEN m1 == 0 OR m2 == 0 THEN NULL WHEN m1 != m2 THEN 0 ELSE -1
END,
FROM matched_pairs
ORDER BY id1, id2, locus;
Where each inserted row fires a trigger if the inserted values do not equal
'-1'. The trigger itself does updates the very same table (haploshare) on
rows already inserted by replacing all -1 by the number of '-1' found in the
respective column.
(to support all those find-and-replace operations I added a bunch of indices)
The trigger statement:
--
CREATE TRIGGER haplo_share_of_p1p2 BEFORE INSERT ON haploshare
WHEN new.p1p2 == 0 OR new.p1p2 IS NULL
BEGIN
UPDATE haploshare
SET p1p2 = (SELECT count(*) FROM haploshare
WHERE p1p2 == -1 AND id1 == new.id1 AND id2 ==
new.id2) - 1
WHERE p1p2 == -1 AND id1 == new.id1 AND id2 == new.id2;
END;
--
(this trigger exists for each of the for columns created by the CASE-WHEN
staments of the INSERT above):
Number of rows to expect: up to 10 million
It takes about 2 hours for smaller datasets (about 600.000 rows) to be
created ... any possible speed-up would be greatly appreciated!
Thanks for your replies :)
Daniel
--
Dipl.-Math. (FH) Daniel Franke
Institut fuer Medizinische Biometrie und Statistik
Medizinische Universit�t zu Luebeck
Ratzeburger Allee 160, Haus 4
23538 Luebeck
Telefon: 0451-500-2786
Telefax: 0451-500-2999
[EMAIL PROTECTED]