On 22/02/2009 7:48 PM, Kim Boulton wrote:
> Thanks for the hints so far.
> 
> Here is my existing Mysql setup and what I've tried with Sqlite....
> 
> *Mysql Table structure:*
> CREATE TABLE  `mydb`.`mytable` (
>   `c1` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
[snip FFS]
>   `c7` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NULL,
>   `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  KEY 
> `c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` (`c6`),  KEY 
> `c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT 
> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
> 
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,9999
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
> ........ approx 30 million row, 500MB csv, text file
> 
> If I compress the table it is approximately 900MB in size with an index 
> of approximately 550MB, which i can load into memory via LOAD INDEX INTO 
> CACHE
> 
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN 
> ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 
> 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 
> 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 
> IN ('2','5') ) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 
> IS NULL ) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN 
> ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
> *
> The above query takes 37 seconds and produces approx 200,000 results and 
> uses 550MB RAM*

Try this:

(1) ditch the redundant commas in your csv file
(2) run this (suitably adjusted for my errors and typos, and for my
ignorance of the minutiae of command-line and grep syntax on your
[unspecified IIRC] OS):

grep
"(....W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.),"
yourfile.csv

How long does that take?

Another suggestion: search for clues on whether it might be better
instead of doing
    select * from mytable where <critera1>
    union all
    etc etc
    select * from mytable where <criteria7>
to do
    select * from mytable where <criteria1> or <criteria2> or etc etc
and if you don't find a strong weight of opinion for one way or the
other, give it a try.

HTH,

John


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to