On 22/02/2009 7:48 PM, Kim Boulton wrote:
> Thanks for the hints so far.
>
> Here is my existing Mysql setup and what I've tried with Sqlite....
>
> *Mysql Table structure:*
> CREATE TABLE `mydb`.`mytable` (
> `c1`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>
> NOT NULL,
[snip FFS]
> `c7`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>
> NULL,
> `price` smallint(4) NOT NULL, KEY `c1` (`c1`), KEY `c2` (`c2`), KEY
> `c3` (`c3`), KEY `c4` (`c4`), KEY `c5` (`c5`), KEY `c6` (`c6`), KEY
> `c7` (`c7`), KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT
> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,9999
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
> ........ approx 30 million row, 500MB csv, text file
>
> If I compress the table it is approximately 900MB in size with an index
> of approximately 550MB, which i can load into memory via LOAD INDEX INTO
> CACHE
>
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN
> ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 =
> 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 =
> 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3
> IN ('2','5') ) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6
> IS NULL ) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN
> ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
> *
> The above query takes 37 seconds and produces approx 200,000 results and
> uses 550MB RAM*
Try this:
(1) ditch the redundant commas in your csv file
(2) run this (suitably adjusted for my errors and typos, and for my
ignorance of the minutiae of command-line and grep syntax on your
[unspecified IIRC] OS):
grep
"(....W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.),"
yourfile.csv
How long does that take?
Another suggestion: search for clues on whether it might be better
instead of doing
select * from mytable where <critera1>
union all
etc etc
select * from mytable where <criteria7>
to do
select * from mytable where <criteria1> or <criteria2> or etc etc
and if you don't find a strong weight of opinion for one way or the
other, give it a try.
HTH,
John
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users