> > I'm creating a table by inserting rows from a select statement. Given
> > enough memory this takes a few hours for small datasets and ages for
> > larger ones.

> Guillaume FOUGNIES:
> try a 'PRAGMA synchronous=OFF;' before executing your statement.
Yes, this is already implemented.
In addition: PRAGMA cache_size = 1000000; (about 1.5 GB)
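For reference, a minimal sketch of that setup via Python's sqlite3 module (an in-memory database here just for illustration; note that cache_size counts pages, so the actual memory footprint depends on the database's page_size):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The two pragmas from this thread: no fsync on commit, large page cache.
con.execute("PRAGMA synchronous = OFF;")
con.execute("PRAGMA cache_size = 1000000;")

# Querying the pragma returns the value just set.
val = con.execute("PRAGMA cache_size;").fetchone()[0]
print(val)
```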

> D. Hipp:
> When I do a CREATE TABLE ... AS SELECT on my 3-year-old desktop, I get a
> table created at over 29000 rows/second and about 3.35MB/sec.  At that
> rate, 1 hour of runtime should give you about 100 million rows and 12 GB. 
> Not what I would call a "small dataset".

Tables:
 - haploshare has 7 columns, all integer values, 3 of them declared as foreign
   keys
 - matched_pairs has 7 columns, 3 of them integer (foreign keys), the
   remaining ones declared as varchar


The statement:

INSERT INTO haploshare
        SELECT id1, id2, locus,
                CASE WHEN p1 == 0 OR p2 == 0 THEN NULL WHEN p1 != p2 THEN 0 ELSE -1 END,
                CASE WHEN p1 == 0 OR m2 == 0 THEN NULL WHEN p1 != m2 THEN 0 ELSE -1 END,
                CASE WHEN m1 == 0 OR p2 == 0 THEN NULL WHEN m1 != p2 THEN 0 ELSE -1 END,
                CASE WHEN m1 == 0 OR m2 == 0 THEN NULL WHEN m1 != m2 THEN 0 ELSE -1 END
        FROM matched_pairs
        ORDER BY id1, id2, locus;

Each inserted row fires a trigger if the inserted value does not equal
-1. The trigger updates the very same table (haploshare) on rows already
inserted, replacing all -1 values by the number of -1 entries found in the
respective column.
(To support all those find-and-replace operations I added a bunch of indices.)

The trigger statement:
--
CREATE TRIGGER haplo_share_of_p1p2 BEFORE INSERT ON haploshare
WHEN new.p1p2 == 0 OR new.p1p2 IS NULL
BEGIN
        UPDATE haploshare
        SET p1p2 = (SELECT count(*) FROM haploshare
                        WHERE p1p2 == -1 AND id1 == new.id1 AND id2 == new.id2) - 1
        WHERE p1p2 == -1 AND id1 == new.id1 AND id2 == new.id2;
END;
--
(This trigger exists for each of the four columns created by the CASE-WHEN
statements of the INSERT above.)
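To make the per-row mechanism concrete, here is a minimal sketch (toy single-value table, made-up data) that reproduces the trigger exactly as given above: runs of -1 markers for a pair (id1, id2) are rewritten to count-minus-one as soon as a 0 value arrives for that pair:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE haploshare (id1 INTEGER, id2 INTEGER, locus INTEGER, p1p2 INTEGER);

CREATE TRIGGER haplo_share_of_p1p2 BEFORE INSERT ON haploshare
WHEN new.p1p2 == 0 OR new.p1p2 IS NULL
BEGIN
        UPDATE haploshare
        SET p1p2 = (SELECT count(*) FROM haploshare
                        WHERE p1p2 == -1 AND id1 == new.id1 AND id2 == new.id2) - 1
        WHERE p1p2 == -1 AND id1 == new.id1 AND id2 == new.id2;
END;
""")

# Hypothetical data: three consecutive -1 loci for pair (1, 2),
# then a 0 that ends the run and fires the trigger.
rows = [(1, 2, 1, -1), (1, 2, 2, -1), (1, 2, 3, -1), (1, 2, 4, 0)]
con.executemany("INSERT INTO haploshare VALUES (?, ?, ?, ?)", rows)

# The three -1 rows were rewritten to 2 (count of -1 rows minus one)
# before the 0 row was inserted.
result = con.execute(
    "SELECT locus, p1p2 FROM haploshare ORDER BY locus").fetchall()
print(result)  # [(1, 2), (2, 2), (3, 2), (4, 0)]
```

The cost visible here is that every insert whose value is 0 or NULL triggers a correlated count plus an UPDATE scan over the growing table, which is presumably where the hours go.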

Number of rows to expect: up to 10 million

It takes about 2 hours for smaller datasets (about 600,000 rows) to be
created ... any possible speed-up would be greatly appreciated!

Thanks for your replies :)
        Daniel



-- 
Dipl.-Math. (FH) Daniel Franke
Institut fuer Medizinische Biometrie und Statistik
Medizinische Universität zu Luebeck
Ratzeburger Allee 160, Haus 4
23538 Luebeck
Telefon: 0451-500-2786
Telefax: 0451-500-2999
[EMAIL PROTECTED]
