Re: [sqlite] Newb-ish performance questions
As the topic goes, I am new to SQLite. The GUI I'd been using was compiled with an older version than the 3.6.10 CLI I downloaded, so things were a bit choppy. I'll stick to using just the downloaded SQLite CLI now that I (sort of) know what I'm doing with it. I'll post the results once I've finished trying all the options; it's looking good so far though :-)

k

D. Richard Hipp wrote:
> On Feb 23, 2009, at 2:42 AM, Kim Boulton wrote:
>> But OR is a lot slower than using UNION ALL on both Sqlite and Mysql
>
> The optimizer was significantly enhanced for version 3.6.8 in order to
> better handle OR in WHERE clauses. What version of SQLite did you
> test this with?
>
> D. Richard Hipp
> d...@hwaci.com

___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Newb-ish performance questions
On Feb 23, 2009, at 2:42 AM, Kim Boulton wrote:
> But OR is a lot slower than using UNION ALL on both Sqlite and Mysql

The optimizer was significantly enhanced for version 3.6.8 in order to better handle OR in WHERE clauses. What version of SQLite did you test this with?

D. Richard Hipp
d...@hwaci.com
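For anyone following the thread, both halves of Richard's question are easy to check from code: which SQLite version you are actually linked against, and how the planner treats an OR'ed WHERE clause. A minimal sketch using Python's sqlite3 binding (the table and index names here are made up for illustration):

```python
import sqlite3

# Print the version of the SQLite library this binding links against.
# The OR optimization Richard mentions landed in 3.6.8.
print(sqlite3.sqlite_version)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b TEXT)")
con.execute("CREATE INDEX ia ON t(a)")
con.execute("CREATE INDEX ib ON t(b)")

# EXPLAIN QUERY PLAN shows whether each OR branch can use its own index
# (on 3.6.8+ you may see a multi-index OR plan instead of a full scan).
for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM t WHERE a = 'W' OR b = 'S'"):
    print(row)
```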
Re: [sqlite] Newb-ish performance questions
Why do you need the 7 single-column indexes? Do you ever do a lookup on a single column? Bear in mind that only 1 index is used per query, so having seven separate indexes on seven separate columns means that six are always unused.

I'm curious why the UNION is faster than the OR'ed version... can you post the EXPLAINs of the two? (Feel free to send them to me personally... I dunno if the rest of the list cares :)

-T

On Mon, Feb 23, 2009 at 1:43 AM, Kim Boulton wrote:
> Sorry, I forgot to mention that I was running the query twice, first to
> cache it, second to measure the speed.
>
> Yeah, it's a horrible query but produces the fastest results in MySQL.
> Maybe using OR instead works better on SQLite; working on that one.
>
> The problem I have with indexes in SQLite is that the table balloons 400MB
> in size with every index I add. At the moment it's 4GB with one on each
> column; if I add more indexes across other columns I'm thinking it'll
> get too big to cache.
>
> thanks
>
> Thomas Briggs wrote:
>> For starters, I think that loading the index into the cache in
>> MySQL is biasing your performance measures. SQLite will automatically
>> load pages of any necessary indexes into memory as part of executing
>> the query, but doing so takes time. By preloading the index on MySQL,
>> you're removing that time from the amount of time MySQL spends
>> executing that query, so you're not really comparing apples to apples.
>> Said another way: how long does the query take in MySQL if you don't
>> pre-load the index?
>>
>> Secondly, the page size needs to be set before the database is
>> created, i.e. before you execute the CREATE TABLE statement.
>>
>> Thirdly, that's a pretty horrible query. :) Collapsing the
>> different queries into fewer queries - even if it isn't a single query
>> - will speed things up. Each SELECT is a separate query that requires
>> that the table data be read; reducing the number of SELECTs will thus
>> reduce query time.
>> >>Lastly, your indexes need to reflect how the data is accessed. For >> example, if you're going to do "WHERE c5 = x AND c6 = y AND c7 = 7" >> then you need one index on (c5, c6, c7) - not three individual indexes >> on c5, c6, and c7. >> >>Performance tuning rule #1 - problems are almost always in your use >> of the database, not the database itself. :) >> >>-Tom >> >> On Sun, Feb 22, 2009 at 3:48 AM, Kim Boulton wrote: >> >>> Thanks for the hints so far. >>> >>> Here is my existing Mysql setup and what I've tried with Sqlite >>> >>> *Mysql Table structure:* >>> CREATE TABLE `mydb`.`mytable` ( >>> `c1` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NOT NULL, >>> `c2` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NOT NULL, >>> `c3` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NOT NULL, >>> `c4` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NOT NULL, >>> `c5` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NOT NULL, >>> `c6` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NULL, >>> `c7` >>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >>> NULL, >>> `price` smallint(4) NOT NULL, KEY `c1` (`c1`), KEY `c2` (`c2`), KEY >>> `c3` (`c3`), KEY `c4` (`c4`), KEY `c5` (`c5`), KEY `c6` (`c6`), KEY >>> `c7` (`c7`), 
KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT >>> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED; >>> >>> *CSV Import Data e.g.:* >>> 1,A,P,0,0,X,X,300 >>> 1,A,P,0,0,X,P, >>> A,A,P,0,0,,,2000 >>> B,3,Y,0,1,X,,300 >>> approx 30 million row, 500MB csv, text file >>> >>> If I compress the table it is approximately 900MB in size with an index >>> of approximately 550MB, which i can load into memory via LOAD INDEX INTO >>> CACHE >>> >>> *Mysql Query example:* >>> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN >>> ('S','Z')) UNION ALL >>> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = >>> 'S' AND c6 IS NULL) UNION ALL >>> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = >>> 'S' AND c7 IS NULL) UNION ALL >>> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 >>> IN ('2','5') ) UNION ALL >>> SELECT * FROM mytable
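Tom's composite-index point from the quoted message is easy to demonstrate. A minimal sketch using Python's sqlite3 binding; the index name is made up, and only three of the thread's columns are kept:

```python
import sqlite3

# One composite index on (c5, c6, c7) serves a
# "WHERE c5 = ? AND c6 = ? AND c7 = ?" query in a single lookup,
# which three separate single-column indexes cannot do.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (c5 TEXT, c6 TEXT, c7 TEXT)")
con.execute("CREATE INDEX idx_c567 ON mytable(c5, c6, c7)")

plan = list(con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM mytable WHERE c5 = 'W' AND c6 = 'C' AND c7 = 'S'"))
print(plan)  # the plan should name idx_c567 rather than a full table scan
```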
Re: [sqlite] Newb-ish performance questions
On 23/02/2009 8:14 PM, Kim Boulton wrote:
> Hehe, probably a combination of rubbish grep (I used the regex function in a
> text editor) and vacuuming a 4GB table at the same time.

google("scientific method") :-)

> @echo off
> setlocal
> set starttime=%time%
> egrep --count "(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," my-30-million-rows-of-data.txt
> set stoptime=%time%
> echo Started: %starttime%
> echo Ended: %stoptime%
>
> results in:
> 24561
> Started: 9:00:58.82
> Ended: 9:01:34.29
>
> 36-ish seconds. Obviously the regex needs a bit of work, as there are
> supposed to be around 200,000 matches.

Probably a big contributing factor is that my regex is based on you getting rid of the commas in the part number. If the above input file is in your original format, you need to sprinkle commas about madly; the first subpattern would become:

.,.,.,.,W,[CEF],[SZ]

Note that your average record size is 16 to 17 bytes. If you lose 6 commas, it will be 10 to 11 bytes per record, i.e. the file can be reduced from about 500MB to about 320MB ... quite a useful saving in processing time as well as disk space.

> interesting nonetheless, never used grep before... useful.

Sure is.

Cheers,
John
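John's comma-adjusted subpattern can be tried end to end with a few made-up rows (the sample data and filename below are illustrative, not from the thread; `grep -E` is the portable spelling of egrep):

```shell
# Build a tiny file in the thread's 8-field CSV format:
# c1..c7, then price (last two columns may be empty).
cat > sample.csv <<'EOF'
1,A,P,0,W,C,S,300
1,A,P,0,0,X,P,
A,A,P,0,W,F,Z,2000
B,3,Y,0,1,X,,300
EOF

# The first branch of the thread's pattern, rewritten with commas so it
# matches whole fields: c5 = W, c6 in [CEF], c7 in [SZ].
grep -Ec '^.,.,.,.,W,[CEF],[SZ],' sample.csv   # prints 2 for this sample
```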
Re: [sqlite] Newb-ish performance questions
Hehe, probably a combination of rubbish grep (I used the regex function in a text editor) and vacuuming a 4GB table at the same time.

@echo off
setlocal
set starttime=%time%
egrep --count "(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," my-30-million-rows-of-data.txt
set stoptime=%time%
echo Started: %starttime%
echo Ended: %stoptime%

results in:
24561
Started: 9:00:58.82
Ended: 9:01:34.29

36-ish seconds. Obviously the regex needs a bit of work, as there are supposed to be around 200,000 matches.

Interesting nonetheless; never used grep before... useful.

k

John Machin wrote:
> On 23/02/2009 5:14 PM, Kim Boulton wrote:
>> Hello,
>>
>> Thanks
>>
>> The grep regex on the text file found around 10,000 lines over 5 minutes
>> (out of a total possible 200,000 rows), at which time I stopped it,
>> interesting experiment anyway :-)
>
> Uh-huh ... so you'd estimate that it would take 5 minutes * (200K rows /
> 10K rows) = 100 minutes to get through the lot, correct?
>
> I tried an experiment on a 161MB CSV file with about 1.1M
> name-and-address-etc rows in it. Because none of the patterns in your
> query are likely to match my data, I added an extra pattern that would
> select about 22% of the records (ended up with 225K output rows),
> putting it at the end to ensure it got no unfair advantage from a regex
> engine that tested each pattern sequentially.
>
> BTW, I had to use egrep (or grep -E) to get it to work.
>
> Anyway, it took about 6 seconds. Scaling up by number of input records:
> 6 * 30M / 1M = 180 seconds = 3 minutes. Scaling up by file size: 6 * 500
> / 161 = 19 seconds. By number of output rows: 6 * 200 / 225 ... forget
> it. By size of output rows: ... triple forget it.
>
> Conclusion: something went drastically wrong with your experiment.
> Swapping? Other processes hogging the disk or the CPU? A really duff grep?
>
> Anyway, here's my environment: 2.0 GHz single-core AMD Turion (64-bit
> but running 32-bit Windows XP SP3), using GNU grep 2.5.3 from the
> GnuWin32 project; 1 GB memory.
>
> Cheers,
> John
Re: [sqlite] Newb-ish performance questions
On 23/02/2009 5:14 PM, Kim Boulton wrote:
> Hello,
>
> Thanks
>
> The grep regex on the text file found around 10,000 lines over 5 minutes
> (out of a total possible 200,000 rows), at which time I stopped it,
> interesting experiment anyway :-)

Uh-huh ... so you'd estimate that it would take 5 minutes * (200K rows / 10K rows) = 100 minutes to get through the lot, correct?

I tried an experiment on a 161MB CSV file with about 1.1M name-and-address-etc rows in it. Because none of the patterns in your query are likely to match my data, I added an extra pattern that would select about 22% of the records (ended up with 225K output rows), putting it at the end to ensure it got no unfair advantage from a regex engine that tested each pattern sequentially.

BTW, I had to use egrep (or grep -E) to get it to work.

Anyway, it took about 6 seconds. Scaling up by number of input records: 6 * 30M / 1M = 180 seconds = 3 minutes. Scaling up by file size: 6 * 500 / 161 = 19 seconds. By number of output rows: 6 * 200 / 225 ... forget it. By size of output rows: ... triple forget it.

Conclusion: something went drastically wrong with your experiment. Swapping? Other processes hogging the disk or the CPU? A really duff grep?

Anyway, here's my environment: 2.0 GHz single-core AMD Turion (64-bit but running 32-bit Windows XP SP3), using GNU grep 2.5.3 from the GnuWin32 project; 1 GB memory.

Cheers,
John
Re: [sqlite] Newb-ish performance questions
Woo, that's a bit better. Setting the page_size = 4096 gets the query time down to 17 seconds, so it's now faster than Mysql. Result! I also tried changing the query to: SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN ('S','Z')) OR (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) OR (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 'S' AND c7 IS NULL) OR (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 IN ('2','5') ) OR (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 IS NULL ) OR (c2 = '1' AND c3 = '1' AND c4 IN ('C','E','F') AND c5 = 'S' AND c7 IS NULL) ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC; But OR is a lot slower than using UNION ALL on both Sqlite and Mysql I'm going to try the page_size at 8192 and see if that works even better. Thanks to everyone for the help! kim Kim Boulton wrote: > Thanks for the hints so far. > > Here is my existing Mysql setup and what I've tried with Sqlite > > *Mysql Table structure:* > CREATE TABLE `mydb`.`mytable` ( > `c1` > enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NOT NULL, > `c2` > enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NOT NULL, > `c3` > enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NOT NULL, > `c4` > enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NOT NULL, > `c5` > enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NOT NULL, > `c6` > 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NULL, > `c7` > enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') > > NULL, > `price` smallint(4) NOT NULL, KEY `c1` (`c1`), KEY `c2` (`c2`), > KEY `c3` (`c3`), KEY `c4` (`c4`), KEY `c5` (`c5`), KEY `c6` > (`c6`), KEY `c7` (`c7`), KEY `price` (`price`) ) ENGINE=MyISAM > DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED; > > *CSV Import Data e.g.:* > 1,A,P,0,0,X,X,300 > 1,A,P,0,0,X,P, > A,A,P,0,0,,,2000 > B,3,Y,0,1,X,,300 > approx 30 million row, 500MB csv, text file > > If I compress the table it is approximately 900MB in size with an > index of approximately 550MB, which i can load into memory via LOAD > INDEX INTO CACHE > > *Mysql Query example:* > SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 > IN ('S','Z')) UNION ALL > SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = > 'S' AND c6 IS NULL) UNION ALL > SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = > 'S' AND c7 IS NULL) UNION ALL > SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND > c3 IN ('2','5') ) UNION ALL > SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 > IS NULL ) UNION ALL > SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN > ('C','E','F') AND c5 = 'S' AND c7 IS NULL) > ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC; > * > The above query takes 37 seconds and produces approx 200,000 results > and uses 550MB RAM* > > *So, in Sqlite3 I created a similar table structure like this:* > CREATE TABLE [mytable] ([c1] TEXT NOT NULL,[c2] TEXT NOT NULL,[c3] > TEXT NOT NULL,[c4] TEXT NOT NULL,[c5] TEXT NOT NULL,[c6] TEXT > NULL,[c7] TEXT NULL,[price] INTEGER NOT NULL) > CREATE INDEX [c1] ON [mytable]([c1] ASC) > CREATE INDEX [c2] ON [mytable]([c2] ASC) > 
CREATE INDEX [c3] ON [mytable]([c3] ASC) > CREATE INDEX [c4] ON [mytable]([c4] ASC) > CREATE INDEX [c5] ON [mytable]([c5] ASC) > CREATE INDEX [c6] ON [mytable]([c6] ASC) > CREATE INDEX [c7] ON [mytable]([c7] ASC) > CREATE INDEX [price] ON [mytable]([price] ASC) > > *Then I imported the same csv data using:* > .separator "," > .import mycsv.csv mytable > > *Then fixed the NULL values in the last two columns with:* > UPDATE mytable SET c6 = NULL where c6 = ''; > UPDATE mytable SET c7 = NULL where c7 = ''; > > Then Vacuumed - took 6 hours! > > This leaves me with a 4GB Sqlite table > > *Then queried the Sqlite3 table with:* > PRAGMA cache_size = 2000; */uses up 1.5GB RAM regardless*/ > PRAGMA page_size = 2000; /*this doesn't make any difference*/ > SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 > IN ('S','Z')) UNION ALL > SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = > 'S' AND c6 IS NULL) UNION ALL > SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 =
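A side note on why the `PRAGMA page_size = 2000` above appeared to do nothing: the page size only sticks when set before the first table is created (or just before a VACUUM), and it must be a power of two between 512 and 65536, so 2000 is silently ignored in any case. A small sketch, using a throwaway database file:

```python
import os
import sqlite3
import tempfile

# A fresh database file: page_size must be set *before* the first
# CREATE TABLE, and must be a power of two (2000 would be a no-op).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 8192")   # before any table exists
con.execute("CREATE TABLE t (x)")
con.commit()

# Setting it again now would be deferred until the next VACUUM.
print(con.execute("PRAGMA page_size").fetchone()[0])  # 8192
```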
Re: [sqlite] Newb-ish performance questions
Sorry, I forgot to mention that I was running the query twice, first to cache it, second to measure the speed.

Yeah, it's a horrible query but produces the fastest results in MySQL. Maybe using OR instead works better on SQLite; working on that one.

The problem I have with indexes in SQLite is that the table balloons 400MB in size with every index I add. At the moment it's 4GB with one on each column; if I add more indexes across other columns I'm thinking it'll get too big to cache.

thanks

Thomas Briggs wrote:
> For starters, I think that loading the index into the cache in
> MySQL is biasing your performance measures. SQLite will automatically
> load pages of any necessary indexes into memory as part of executing
> the query, but doing so takes time. By preloading the index on MySQL,
> you're removing that time from the amount of time MySQL spends
> executing that query, so you're not really comparing apples to apples.
> Said another way: how long does the query take in MySQL if you don't
> pre-load the index?
>
> Secondly, the page size needs to be set before the database is
> created, i.e. before you execute the CREATE TABLE statement.
>
> Thirdly, that's a pretty horrible query. :) Collapsing the
> different queries into fewer queries - even if it isn't a single query
> - will speed things up. Each SELECT is a separate query that requires
> that the table data be read; reducing the number of SELECTs will thus
> reduce query time.
>
> Lastly, your indexes need to reflect how the data is accessed. For
> example, if you're going to do "WHERE c5 = x AND c6 = y AND c7 = 7"
> then you need one index on (c5, c6, c7) - not three individual indexes
> on c5, c6, and c7.
>
> Performance tuning rule #1 - problems are almost always in your use
> of the database, not the database itself. :)
>
> -Tom
>
> On Sun, Feb 22, 2009 at 3:48 AM, Kim Boulton wrote:
>> Thanks for the hints so far.
>> >> Here is my existing Mysql setup and what I've tried with Sqlite >> >> *Mysql Table structure:* >> CREATE TABLE `mydb`.`mytable` ( >> `c1` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NOT NULL, >> `c2` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NOT NULL, >> `c3` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NOT NULL, >> `c4` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NOT NULL, >> `c5` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NOT NULL, >> `c6` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NULL, >> `c7` >> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') >> NULL, >> `price` smallint(4) NOT NULL, KEY `c1` (`c1`), KEY `c2` (`c2`), KEY >> `c3` (`c3`), KEY `c4` (`c4`), KEY `c5` (`c5`), KEY `c6` (`c6`), KEY >> `c7` (`c7`), KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT >> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED; >> >> *CSV Import Data e.g.:* >> 1,A,P,0,0,X,X,300 >> 1,A,P,0,0,X,P, >> A,A,P,0,0,,,2000 >> B,3,Y,0,1,X,,300 >> approx 30 million row, 500MB csv, text file >> >> If I compress the table it is approximately 900MB in size with an index >> of approximately 550MB, which i can load into memory via LOAD INDEX INTO >> CACHE >> >> *Mysql Query example:* >> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 
IN ('C','E','F') AND c7 IN >> ('S','Z')) UNION ALL >> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = >> 'S' AND c6 IS NULL) UNION ALL >> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = >> 'S' AND c7 IS NULL) UNION ALL >> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 >> IN ('2','5') ) UNION ALL >> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 >> IS NULL ) UNION ALL >> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN >> ('C','E','F') AND c5 = 'S' AND c7 IS NULL) >> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC; >> * >> The above query takes 37 seconds and produces approx 200,000 results and >> uses 550MB RAM* >> >> *So, in Sqlite3 I created a similar table structure like this:* >> CREATE TABLE [mytable] ([c1] TEXT NOT NULL,[c2] TEXT NOT NULL,[c3] >> TEXT NOT NULL,[c4] TEXT NOT NULL,[c5] TEXT NOT NULL,[c6] TEXT >> NULL,[c7] TEXT NULL,[price] INTEGER NOT NULL) >> CREATE INDEX [c1] ON
Re: [sqlite] Newb-ish performance questions
You have a good point there; I've been using that wrongly. I'll try that.

Thanks

Kees Nuyt wrote:
> On Sun, 22 Feb 2009 08:48:00 +, Kim Boulton wrote in k...@jesk.co.uk,
> General Discussion of SQLite Database:
>
>> *Then queried the Sqlite3 table with:*
>> PRAGMA cache_size = 2000; /*uses up 1.5GB RAM regardless*/
>
> cache_size is expressed in number of pages.
> default_cache_size is useful too.
>
>> PRAGMA page_size = 2000; /*this doesn't make any difference*/
>
> PRAGMA page_size will only make a difference if you use it
> when creating the database (before the first table is
> created), or just before a VACUUM statement.
> Don't make it too big. 4096 or 8192 are a good start to
> experiment with.
>
>> Unless anyone has some good ideas I might
>> have to give up on Sqlite.
>
> I'm sure you can get more performance if you tweak page_size
> and cache_size with some more understanding.
> Use whatever engine is best for the task.
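Kees's point about units can be sketched in a few lines: PRAGMA cache_size is a count of *pages*, not bytes, so the memory budget is roughly cache_size × page_size. With a 1 KB page (the old default), `cache_size = 2000` is only about 2 MB. Note that newer SQLite versions report a negative default cache_size, which means KiB rather than pages:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Set the cache to 2000 *pages*, then compute the approximate budget.
con.execute("PRAGMA cache_size = 2000")
page_size = con.execute("PRAGMA page_size").fetchone()[0]
cache_pages = con.execute("PRAGMA cache_size").fetchone()[0]

print(cache_pages * page_size)  # approximate cache budget in bytes
```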
Re: [sqlite] Newb-ish performance questions
Hello,

Thanks

The grep regex on the text file found around 10,000 lines over 5 minutes (out of a total possible 200,000 rows), at which time I stopped it, interesting experiment anyway :-)

Using OR instead of UNION in MySQL is definitely slower, although maybe, as someone has mentioned above, the SQLite optimizer works better with OR. Will try that next.

Thanks.

> Try this:
>
> (1) ditch the redundant commas in your csv file
> (2) run this (suitably adjusted for my errors and typos, and for my
> ignorance of the minutiae of command-line and grep syntax on your
> [unspecified IIRC] OS):
>
> grep "(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," yourfile.csv
>
> How long does that take?
>
> Another suggestion: search for clues on whether it might be better
> instead of doing
>   select * from mytable where <condition>
>   union all
>   etc etc
>   select * from mytable where <condition>
> to do
>   select * from mytable where <condition> or <condition> or etc etc
> and if you don't find a strong weight of opinion for one way or the
> other, give it a try.
>
> HTH,
>
> John
Re: [sqlite] Newb-ish performance questions
Would loading the 30 million row csv file via the command line be wrapped inside a single transaction, thus building a very large rollback log? I like to break my bulk loads into nice chunks.

Also, it could be that the MySQL parser does a better job of optimizing the unusual select.

On Sun, Feb 22, 2009 at 11:27 AM, Nicolas Williams wrote:
> On Sun, Feb 22, 2009 at 01:29:09PM +0100, Kees Nuyt wrote:
>>> PRAGMA page_size = 2000; /*this doesn't make any difference*/
>>
>> PRAGMA page_size will only make a difference if you use it
>> when creating the database (before the first table is
>> created), or just before a VACUUM statement.
>> Don't make it too big. 4096 or 8192 are a good start to
>> experiment with.
>
> The hard max is 32KB, IIRC, and even that requires changing the code.
> Otherwise 16KB is the max, and it works fine.
>
> I've wanted to make SQLite3 default to using the smaller of 16KB or the
> filesystem's preferred block size. This would make SQLite3 more
> efficient on filesystems like ZFS. But unfortunately the tests assume
> the 1KB page size throughout, thus a number of tests fail if that change
> is made. Thus for Solaris I've held back on that change for now.
>
> Nico

--
Jim Dodgen
j...@dodgen.us
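Jim's chunking suggestion can be sketched with Python's sqlite3 module. The helper name, table, and chunk size below are illustrative; the point is one explicit transaction per batch instead of one row per implicit transaction (slow) or one giant transaction (huge journal):

```python
import sqlite3
from itertools import islice

def load_in_chunks(con, rows, chunk=50_000):
    """Insert rows in fixed-size batches, one transaction per batch."""
    it = iter(rows)
    while True:
        batch = list(islice(it, chunk))
        if not batch:
            break
        with con:  # commits (or rolls back) this batch as one transaction
            con.executemany("INSERT INTO t VALUES (?, ?)", batch)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (part TEXT, price INTEGER)")
load_in_chunks(con, [("A", 300), ("B", 2000), ("C", 100)], chunk=2)
print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 3
```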
Re: [sqlite] Newb-ish performance questions
On Sun, Feb 22, 2009 at 01:29:09PM +0100, Kees Nuyt wrote:
>> PRAGMA page_size = 2000; /*this doesn't make any difference*/
>
> PRAGMA page_size will only make a difference if you use it
> when creating the database (before the first table is
> created), or just before a VACUUM statement.
> Don't make it too big. 4096 or 8192 are a good start to
> experiment with.

The hard max is 32KB, IIRC, and even that requires changing the code. Otherwise 16KB is the max, and it works fine.

I've wanted to make SQLite3 default to using the smaller of 16KB or the filesystem's preferred block size. This would make SQLite3 more efficient on filesystems like ZFS. But unfortunately the tests assume the 1KB page size throughout, thus a number of tests fail if that change is made. Thus for Solaris I've held back on that change for now.

Nico
Re: [sqlite] Newb-ish performance questions
For starters, I think that loading the index into the cache in MySQL is biasing your performance measures. SQLite will automatically load pages of any necessary indexes into memory as part of executing the query, but doing so takes time. By preloading the index on MySQL, you're removing that time from the amount of time MySQL spends executing that query, so you're not really comparing apples to apples. Said another way: how long does the query take in MySQL if you don't pre-load the index?

Secondly, the page size needs to be set before the database is created, i.e. before you execute the CREATE TABLE statement.

Thirdly, that's a pretty horrible query. :) Collapsing the different queries into fewer queries - even if it isn't a single query - will speed things up. Each SELECT is a separate query that requires that the table data be read; reducing the number of SELECTs will thus reduce query time.

Lastly, your indexes need to reflect how the data is accessed. For example, if you're going to do "WHERE c5 = x AND c6 = y AND c7 = 7" then you need one index on (c5, c6, c7) - not three individual indexes on c5, c6, and c7.

Performance tuning rule #1 - problems are almost always in your use of the database, not the database itself. :)

-Tom

On Sun, Feb 22, 2009 at 3:48 AM, Kim Boulton wrote:
> Thanks for the hints so far.
>
> Here is my existing Mysql setup and what I've tried with Sqlite
>
> *Mysql Table structure:*
> CREATE TABLE `mydb`.`mytable` (
> `c1` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
> `c2` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
> `c3` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
> `c4` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
> `c5` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
> `c6` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NULL,
> `c7` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NULL,
> `price` smallint(4) NOT NULL,
> KEY `c1` (`c1`), KEY `c2` (`c2`), KEY `c3` (`c3`), KEY `c4` (`c4`),
> KEY `c5` (`c5`), KEY `c6` (`c6`), KEY `c7` (`c7`), KEY `price` (`price`)
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
> approx 30 million rows, 500MB csv, text file
>
> If I compress the table it is approximately 900MB in size with an index
> of approximately 550MB, which I can load into memory via LOAD INDEX INTO CACHE
>
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 IN ('2','5')) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
>
> *The above query takes 37 seconds and produces approx 200,000 results and uses 550MB RAM*
>
> *So, in Sqlite3 I created a similar table structure like this:*
> CREATE TABLE [mytable] ([c1] TEXT NOT NULL,[c2] TEXT NOT NULL,[c3] TEXT NOT NULL,[c4] TEXT NOT NULL,[c5] TEXT NOT NULL,[c6] TEXT NULL,[c7] TEXT NULL,[price] INTEGER NOT NULL)
> CREATE INDEX [c1] ON [mytable]([c1] ASC)
> CREATE INDEX [c2] ON [mytable]([c2] ASC)
> CREATE INDEX [c3] ON [mytable]([c3] ASC)
> CREATE INDEX [c4] ON [mytable]([c4] ASC)
> CREATE INDEX [c5] ON [mytable]([c5] ASC)
> CREATE INDEX [c6] ON [mytable]([c6] ASC)
> CREATE INDEX [c7] ON [mytable]([c7] ASC)
> CREATE INDEX [price] ON [mytable]([price] ASC)
>
> *Then I imported the same csv data using:*
> .separator ","
> .import mycsv.csv mytable
>
> *Then fixed the NULL values in the last two columns with:*
> UPDATE mytable SET c6 = NULL where c6 = '';
> UPDATE mytable SET c7 = NULL where c7 = '';
>
> Then Vacuumed - took 6 hours!
>
> This leaves me with a 4GB
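Tom's composite-index point can be seen in the query plan: with one index on (c5, c6, c7), SQLite reports an index search rather than a table scan. A small sketch with Python's sqlite3 module (the index name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (c5 TEXT, c6 TEXT, c7 TEXT, price INTEGER)")
# One composite index matching the WHERE clause; three separate
# single-column indexes would leave two of them unused per query.
con.execute("CREATE INDEX idx_c567 ON mytable (c5, c6, c7)")
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM mytable WHERE c5 = 'W' AND c6 = 'C' AND c7 = 'S'"
).fetchall()
# The plan row mentions idx_c567 rather than a full-table SCAN.
print(plan)
```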
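The two full-table UPDATEs (and part of what made the 6-hour VACUUM necessary) could be avoided by converting empty strings to NULL during the import itself, e.g. with a small loader script instead of `.import`. A hedged sketch in Python; the inline sample rows stand in for the real 30-million-row CSV:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (c1,c2,c3,c4,c5,c6,c7,price)")
# Sample rows in the thread's CSV layout; a real loader would stream
# these from csv.reader over the 500MB file inside one transaction.
rows = [
    ["1", "A", "P", "0", "0", "X", "X", "300"],
    ["A", "A", "P", "0", "0", "", "", "2000"],
]
# Map empty strings to NULL on the way in, so no UPDATE pass is needed.
con.executemany(
    "INSERT INTO mytable VALUES (?,?,?,?,?,?,?,?)",
    ([v if v != "" else None for v in r] for r in rows),
)
nulls = con.execute("SELECT COUNT(*) FROM mytable WHERE c6 IS NULL").fetchone()[0]
print(nulls)  # 1
```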
Re: [sqlite] Newb-ish performance questions
On Sun, 22 Feb 2009 08:48:00 +, Kim Boulton wrote in k...@jesk.co.uk, General Discussion of SQLite Database:

> *Then queried the Sqlite3 table with:*
> PRAGMA cache_size = 2000; /*uses up 1.5GB RAM regardless*/

cache_size is expressed in number of pages. default_cache_size is useful too.

> PRAGMA page_size = 2000; /*this doesn't make any difference*/

PRAGMA page_size will only make a difference if you use it when creating the database (before the first table is created), or just before a VACUUM statement. Don't make it too big. 4096 or 8192 are a good start to experiment with.

> Unless anyone has some good ideas I might
> have to give up on Sqlite.

I'm sure you can get more performance if you tweak page_size and cache_size with some more understanding. Use whatever engine is best for the task.
--
( Kees Nuyt )
c[_]
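Since cache_size counts pages rather than bytes, the approximate memory ceiling is cache_size × page_size (plus per-page bookkeeping overhead). A quick check with Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
page_size = con.execute("PRAGMA page_size").fetchone()[0]
con.execute("PRAGMA cache_size = 2000")
cache_pages = con.execute("PRAGMA cache_size").fetchone()[0]
# cache_size is a page count; rough memory ceiling in bytes:
approx_bytes = cache_pages * page_size
print(cache_pages, approx_bytes)
```

With a 2000-page pragma this stays the same page count regardless of page size, which is why shrinking page_size also shrinks the cache footprint.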
Re: [sqlite] Newb-ish performance questions
On 22/02/2009 7:48 PM, Kim Boulton wrote:
> Thanks for the hints so far.
>
> Here is my existing Mysql setup and what I've tried with Sqlite
>
> *Mysql Table structure:*
> CREATE TABLE `mydb`.`mytable` (
> `c1` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
[snip FFS]
> `c7` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NULL,
> `price` smallint(4) NOT NULL,
> KEY `c1` (`c1`), KEY `c2` (`c2`), KEY `c3` (`c3`), KEY `c4` (`c4`),
> KEY `c5` (`c5`), KEY `c6` (`c6`), KEY `c7` (`c7`), KEY `price` (`price`)
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
> approx 30 million rows, 500MB csv, text file
>
> If I compress the table it is approximately 900MB in size with an index
> of approximately 550MB, which I can load into memory via LOAD INDEX INTO CACHE
>
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 IN ('2','5')) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
>
> *The above query takes 37 seconds and produces approx 200,000 results and uses 550MB RAM*

Try this:

(1) ditch the redundant commas in your csv file

(2) run this (suitably adjusted for my errors and typos, and for my ignorance of the minutiae of command-line and grep syntax on your [unspecified IIRC] OS):

grep "(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," yourfile.csv

How long does that take?

Another suggestion: search for clues on whether it might be better, instead of doing

select * from mytable where (cond1) union all
select * from mytable where (cond2)
etc etc

to do

select * from mytable where (cond1)
or (cond2)
etc etc

and if you don't find a strong weight of opinion for one way or the other, give it a try.

HTH,
John
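John's grep idea — one regex alternation per SELECT branch, run over the raw CSV — can be sketched without the shell. Below is an illustrative Python version of just the first branch (c5='W', c6 in C/E/F, c7 in S/Z); the sample lines are made up, and a real run would stream the 500MB file line by line:

```python
import re

# Made-up lines in the thread's 8-field CSV layout (c1..c7,price).
lines = [
    "1,A,P,0,W,C,S,300",  # c5=W, c6=C, c7=S -> matches the first branch
    "B,3,Y,0,1,X,,300",   # c5=1 -> no match
]
# Skip the first four fields, then test c5/c6/c7 positionally.
branch1 = re.compile(r"^([^,]*,){4}W,[CEF],[SZ],")
matches = [line for line in lines if branch1.match(line)]
print(len(matches))  # 1
```

Note that John's pattern uses `(...|...)` alternation, which plain grep treats literally; `grep -E` (or egrep) would be needed, as his "suitably adjusted" caveat anticipates.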
Re: [sqlite] Newb-ish performance questions
Thanks for the hints so far.

Here is my existing Mysql setup and what I've tried with Sqlite

*Mysql Table structure:*
CREATE TABLE `mydb`.`mytable` (
`c1` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
`c2` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
`c3` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
`c4` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
`c5` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NOT NULL,
`c6` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NULL,
`c7` enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z') NULL,
`price` smallint(4) NOT NULL,
KEY `c1` (`c1`), KEY `c2` (`c2`), KEY `c3` (`c3`), KEY `c4` (`c4`),
KEY `c5` (`c5`), KEY `c6` (`c6`), KEY `c7` (`c7`), KEY `price` (`price`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;

*CSV Import Data e.g.:*
1,A,P,0,0,X,X,300
1,A,P,0,0,X,P,
A,A,P,0,0,,,2000
B,3,Y,0,1,X,,300
approx 30 million rows, 500MB csv, text file

If I compress the table it is approximately 900MB in size with an index of approximately 550MB, which I can load into memory via LOAD INDEX INTO CACHE

*Mysql Query example:*
SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN ('S','Z')) UNION ALL
SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 'S' AND c7 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 IN ('2','5')) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;

*The above query takes 37 seconds and produces approx 200,000 results and uses 550MB RAM*

*So, in Sqlite3 I created a similar table structure like this:*
CREATE TABLE [mytable] ([c1] TEXT NOT NULL,[c2] TEXT NOT NULL,[c3] TEXT NOT NULL,[c4] TEXT NOT NULL,[c5] TEXT NOT NULL,[c6] TEXT NULL,[c7] TEXT NULL,[price] INTEGER NOT NULL)
CREATE INDEX [c1] ON [mytable]([c1] ASC)
CREATE INDEX [c2] ON [mytable]([c2] ASC)
CREATE INDEX [c3] ON [mytable]([c3] ASC)
CREATE INDEX [c4] ON [mytable]([c4] ASC)
CREATE INDEX [c5] ON [mytable]([c5] ASC)
CREATE INDEX [c6] ON [mytable]([c6] ASC)
CREATE INDEX [c7] ON [mytable]([c7] ASC)
CREATE INDEX [price] ON [mytable]([price] ASC)

*Then I imported the same csv data using:*
.separator ","
.import mycsv.csv mytable

*Then fixed the NULL values in the last two columns with:*
UPDATE mytable SET c6 = NULL where c6 = '';
UPDATE mytable SET c7 = NULL where c7 = '';

Then Vacuumed - took 6 hours!

This leaves me with a 4GB Sqlite table

*Then queried the Sqlite3 table with:*
PRAGMA cache_size = 2000; /*uses up 1.5GB RAM regardless*/
PRAGMA page_size = 2000; /*this doesn't make any difference*/
SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN ('S','Z')) UNION ALL
SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 'S' AND c7 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 IN ('2','5')) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;

*The above query takes 57 seconds, 20 seconds slower than Mysql but produces the same results - but uses up 1.5GB RAM!*

Also if I try some more complex queries it always runs out of steam at 2GB RAM and crashes. Maybe that is a limitation of my PC or XP though.

I have also tried...
1. Sqlite table with no indexes - very slow!
2. Sqlite table with one unique index using all columns - still slower than separate indexes.

So I've had a fair crack of the whip with Sqlite. I was hoping it would be faster and use less memory, no luck though. Unless anyone has some good ideas I might have to give up on Sqlite.

Cheers.

Kim

Kim Boulton wrote:
> Hello,
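Since the thread later notes that SQLite 3.6.8 improved OR handling in WHERE clauses, the six UNION ALL branches are worth trying as a single WHERE with ORs. A sketch of the rewrite (one seeded row and two of the six branches shown, for illustration only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (c1,c2,c3,c4,c5,c6,c7,price)")
con.execute("INSERT INTO mytable VALUES ('x','x','x','x','W','C','S',100)")
# Two of the six UNION ALL branches collapsed into one WHERE ... OR ...;
# on SQLite >= 3.6.8 the optimizer can turn each OR term into its own
# index lookup instead of six separate passes over the table.
rows = con.execute(
    "SELECT * FROM mytable WHERE "
    "(c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN ('S','Z')) OR "
    "(c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) "
    "ORDER BY c1,c2,c3,c4,c5,c6,c7"
).fetchall()
print(len(rows))  # 1
```

The OR form also returns each row at most once per match, so it is not a drop-in replacement if the UNION ALL's duplicate semantics matter.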
Re: [sqlite] Newb-ish performance questions
On Fri, Feb 20, 2009 at 05:22:33AM +, Kim Boulton scratched on the wall:
> Hello,
>
> I'm trying out Sqlite3 with an eye to improving the performance of
> queries on an existing MySQL database.
>
> I've imported the data into sqlite which is approx. 30 million rows of
> part numbers each with a price.
>
> So far, it's approx. four times slower than the MySQL version, and the
> size of the sqlite database is too big to fit in memory (several GB)
> whereas I can get the MySQL data down to 900MB if it's compressed and
> read only.

Make sure you have indexes on appropriate columns, like part-number or whatever your queries are commonly keyed off of.

Bump up the page-cache size. No matter what the size of the database, by default SQLite will only cache 2000 pages. The default page size is 1K, and a cache slot takes about 1.5K, so adjust accordingly (like 10x or 100x) via PRAGMA if you're on a system with a comfortable amount of RAM. If you're randomly pulling out individual records, the only thing you really care about is keeping the most important index(es) in the cache. Also, SQLite will not pre-load anything, so your first few queries are likely to be slower as it seeds the indexes into the data cache.

I'm surprised about the size. If you have a name or description column, make sure you're not using fixed-length strings for the import.

-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson
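Jay's arithmetic (2000 pages by default, roughly 1.5K of RAM per 1K page) suggests sizing the cache from a RAM budget. An illustrative sketch; the 256MB budget and the ~0.5K-per-page overhead estimate are assumptions in the spirit of his numbers, not figures from the thread:

```python
import sqlite3

con = sqlite3.connect(":memory:")
page_size = con.execute("PRAGMA page_size").fetchone()[0]
# Target cache footprint; per-page bookkeeping overhead is estimated
# at ~0.5KB, following Jay's "about 1.5K per 1K page" rule of thumb.
budget_bytes = 256 * 1024 * 1024
pages = budget_bytes // (page_size + 512)
con.execute("PRAGMA cache_size = %d" % pages)
applied = con.execute("PRAGMA cache_size").fetchone()[0]
print(applied == pages)  # True
```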
Re: [sqlite] Newb-ish performance questions
was it 4 times slower to load? or 4 times slower to query? also we need some examples.

On Thu, Feb 19, 2009 at 9:33 PM, Thomas Briggs wrote:
> Depending on the nature of the data and queries, increasing the
> block size may help.
>
> Posting some information about your schema and queries is the only
> way to get truly good advice on this though, I think. There is no
> "-runfast" switch you can include on the command line to fix things. :)
> The answers are almost guaranteed to be found in your use of
> SQLite, not in the database itself.
>
> -T
>
> On Fri, Feb 20, 2009 at 12:22 AM, Kim Boulton wrote:
> > Hello,
> >
> > I'm trying out Sqlite3 with an eye to improving the performance of
> > queries on an existing MySQL database.
> >
> > I've imported the data into sqlite which is approx. 30 million rows of
> > part numbers each with a price.
> >
> > So far, it's approx. four times slower than the MySQL version, and the
> > size of the sqlite database is too big to fit in memory (several GB)
> > whereas I can get the MySQL data down to 900MB if it's compressed and
> > read only.
> >
> > I would appreciate some tips or pointers on getting sqlite3 performance
> > up and the data size down. I googled but couldn't find much.
> >
> > I don't need concurrency or inserts, it's single user, read only.
> >
> > TIA
> >
> > kimb

--
Jim Dodgen
j...@dodgen.us
Re: [sqlite] Newb-ish performance questions
Depending on the nature of the data and queries, increasing the block size may help.

Posting some information about your schema and queries is the only way to get truly good advice on this though, I think. There is no "-runfast" switch you can include on the command line to fix things. :) The answers are almost guaranteed to be found in your use of SQLite, not in the database itself.

-T

On Fri, Feb 20, 2009 at 12:22 AM, Kim Boulton wrote:
> Hello,
>
> I'm trying out Sqlite3 with an eye to improving the performance of
> queries on an existing MySQL database.
>
> I've imported the data into sqlite which is approx. 30 million rows of
> part numbers each with a price.
>
> So far, it's approx. four times slower than the MySQL version, and the
> size of the sqlite database is too big to fit in memory (several GB)
> whereas I can get the MySQL data down to 900MB if it's compressed and
> read only.
>
> I would appreciate some tips or pointers on getting sqlite3 performance
> up and the data size down. I googled but couldn't find much.
>
> I don't need concurrency or inserts, it's single user, read only.
>
> TIA
>
> kimb
[sqlite] Newb-ish performance questions
Hello,

I'm trying out Sqlite3 with an eye to improving the performance of queries on an existing MySQL database.

I've imported the data into sqlite which is approx. 30 million rows of part numbers each with a price.

So far, it's approx. four times slower than the MySQL version, and the size of the sqlite database is too big to fit in memory (several GB) whereas I can get the MySQL data down to 900MB if it's compressed and read only.

I would appreciate some tips or pointers on getting sqlite3 performance up and the data size down. I googled but couldn't find much.

I don't need concurrency or inserts, it's single user, read only.

TIA

kimb