Re: [sqlite] Newb-ish performance questions

2009-02-23 Thread Kim Boulton

As the topic says, I am new to SQLite. The GUI I'd been using was 
compiled with an older version than the 3.6.10 CLI I downloaded, so 
things were a bit choppy. I'll stick to using just the downloaded SQLite 
CLI, now that I (sort of) know what I'm doing with it.
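As an aside, version mismatches like this are easy to confirm: ask each tool which library it actually links. A minimal sketch using Python's bundled sqlite3 module, purely as an illustration (a GUI tool may bundle a different, older library than the standalone CLI):

```python
import sqlite3

# Version of the SQLite library linked into this Python build.
print(sqlite3.sqlite_version)

# Asking the engine itself gives the same answer for this process;
# run the equivalent "SELECT sqlite_version();" in each tool to compare.
con = sqlite3.connect(":memory:")
print(con.execute("SELECT sqlite_version()").fetchone()[0])
```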

I'll post the results once I've finished trying all the options, it's 
looking good so far though :-)

k


D. Richard Hipp wrote:
>
> On Feb 23, 2009, at 2:42 AM, Kim Boulton wrote:
>>
>> But OR is a lot slower than using UNION ALL on both Sqlite and Mysql
>
>
> The optimizer was significantly enhanced for version 3.6.8 in order to 
> better handle OR in WHERE clauses.  What version of SQLite did you 
> test this with?
>
>
> D. Richard Hipp
> d...@hwaci.com
>
>
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-23 Thread D. Richard Hipp

On Feb 23, 2009, at 2:42 AM, Kim Boulton wrote:
>
> But OR is a lot slower than using UNION ALL on both Sqlite and Mysql


The optimizer was significantly enhanced for version 3.6.8 in order to  
better handle OR in WHERE clauses.  What version of SQLite did you  
test this with?


D. Richard Hipp
d...@hwaci.com



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-23 Thread Thomas Briggs
   Why do you need the 7 single-column indexes?  Do you ever do a
lookup on a single column?  Bear in mind that only 1 index is used per
query, so having seven separate indexes on seven separate columns
means that six are always unused.

   I'm curious why the UNION is faster than the OR'ed version... can
you post the EXPLAINs of the two?  (Feel free to send them to me
personally... I dunno if the rest of the list cares :)

   -T
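The composite-index advice quoted below can be checked directly with EXPLAIN QUERY PLAN. A minimal sketch (table and index names invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE mytable (
    c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT,
    c5 TEXT, c6 TEXT, c7 TEXT, price INTEGER)""")

# One composite index matching the WHERE clause, instead of
# three single-column indexes, two of which would go unused.
con.execute("CREATE INDEX idx_c5_c6_c7 ON mytable (c5, c6, c7)")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM mytable WHERE c5 = 'W' AND c6 = 'C' AND c7 = 'S'"
).fetchall()
print(plan)  # the plan should mention idx_c5_c6_c7
```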

On Mon, Feb 23, 2009 at 1:43 AM, Kim Boulton  wrote:
>
> Sorry, I forgot to mention that I was running the query twice, first to
> cache it, second to measure the speed.
>
> Yeah, it's a horrible query, but it produces the fastest results in Mysql.
> Maybe using OR instead works better on Sqlite; working on that one.
>
> Problem I have with indexes in sqlite is that the table balloons 400MB
> in size with every index I add. At the moment it's 4GB with one on each
> column; if I add more indexes across other columns, I'm thinking it'll
> get too big to cache up.
>
> thanks
>
> Thomas Briggs wrote:
>>For starters, I think that loading the index into the cache in
>> MySQL is biasing your performance measures.  SQLite will automatically
>> load pages of any necessary indexes into memory as part of executing
>> the query, but doing so takes time.  By preloading the index on MySQL,
>> you're removing that time from the amount of time MySQL spends
>> executing that query, so you're not really comparing apples to apples.
>>  Said another way: how long does the query take in MySQL if you don't
>> pre-load the index?
>>
>>Secondly, the page size needs to be set before the database is
>> created, i.e. before you execute the CREATE TABLE statement.
>>
>>Thirdly, that's a pretty horrible query. :)  Collapsing the
>> different queries into fewer queries - even if it isn't a single query
>> - will speed things up.  Each SELECT is a separate query that requires
>> that the table data be read; reducing the number of SELECTs will thus
>> reduce query time.
>>
>>Lastly, your indexes need to reflect how the data is accessed.  For
>> example, if you're going to do "WHERE c5 = x AND c6 = y AND c7 = 7"
>> then you need one index on (c5, c6, c7) - not three individual indexes
>> on c5, c6, and c7.
>>
>>Performance tuning rule #1 - problems are almost always in your use
>> of the database, not the database itself. :)
>>
>>-Tom
>>
>> On Sun, Feb 22, 2009 at 3:48 AM, Kim Boulton  wrote:
>>
>>> Thanks for the hints so far.
>>>
>>> Here is my existing Mysql setup and what I've tried with Sqlite
>>>
>>> *Mysql Table structure:*
>>> CREATE TABLE  `mydb`.`mytable` (
>>>  `c1`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NOT NULL,
>>>  `c2`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NOT NULL,
>>>  `c3`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NOT NULL,
>>>  `c4`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NOT NULL,
>>>  `c5`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NOT NULL,
>>>  `c6`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NULL,
>>>  `c7`
>>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>>> NULL,
>>>  `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  KEY
>>> `c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` (`c6`),  KEY
>>> `c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT
>>> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>>>
>>> *CSV Import Data e.g.:*
>>> 1,A,P,0,0,X,X,300
>>> 1,A,P,0,0,X,P,
>>> A,A,P,0,0,,,2000
>>> B,3,Y,0,1,X,,300
>>>  approx 30 million rows; 500MB CSV text file
>>>
>>> If I compress the table it is approximately 900MB in size with an index
>>> of approximately 550MB, which i can load into memory via LOAD INDEX INTO
>>> CACHE
>>>
>>> *Mysql Query example:*
>>> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN
>>> ('S','Z')) UNION ALL
>>> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 =
>>> 'S' AND c6 IS NULL) UNION ALL
>>> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 =
>>> 'S' AND c7 IS NULL) UNION ALL
>>> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3
>>> IN ('2','5') ) UNION ALL
>>> SELECT * FROM mytable 

Re: [sqlite] Newb-ish performance questions

2009-02-23 Thread John Machin
On 23/02/2009 8:14 PM, Kim Boulton wrote:
> Hehe, probably a combination of rubbish grep (I used the regex function in a 
> text editor) and vacuuming a 4GB table at the same time.

google("scientific method") :-)

> 
> @echo off
> setlocal
> set starttime=%time%
> egrep --count 
> "(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," 
> my-30-million-rows-of-data.txt
> set stoptime=%time%
> echo Started: %starttime%
> echo Ended: %stoptime%
> 
> results in:
> 24561
> Started:  9:00:58.82
> Ended:  9:01:34.29
> 
> 36-ish seconds. Obviously the regex needs a bit of work, as there are 
> supposed to be around 200,000 matches.

Probably a big contributing factor is that my regex is based on you 
getting rid of the commas in the part number. If the above input file is 
in your original format, you need to sprinkle commas about madly; the 
first subpattern would become:
.,.,.,.,W,[CEF],[SZ]
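For example, with a made-up three-row file (data invented, just to show the comma-sprinkled subpattern in action against the original CSV format):

```shell
cat > sample.csv <<'EOF'
1,A,P,0,W,C,S,300
1,A,P,0,0,X,P,
A,A,P,0,0,,,2000
EOF
# First subpattern with commas restored: c5=W, c6 in [CEF], c7 in [SZ];
# only the first row matches, so this counts 1.
egrep --count '^.,.,.,.,W,[CEF],[SZ],' sample.csv
```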

Note that your average record size is 16 to 17 bytes. If you lose 6 
commas, it will be 10 to 11 bytes per record i.e. it can be reduced from 
about 500Mb to about 320Mb ... quite a useful saving in processing time 
as well as disk space.

> 
> interesting nonetheless, never used grep before...useful.

Sure is.


Cheers,
John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-23 Thread Kim Boulton
Hehe, probably a combination of rubbish grep (I used the regex function in a 
text editor) and vacuuming a 4GB table at the same time.

@echo off
setlocal
set starttime=%time%
egrep --count 
"(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.)," 
my-30-million-rows-of-data.txt
set stoptime=%time%
echo Started: %starttime%
echo Ended: %stoptime%

results in:
24561
Started:  9:00:58.82
Ended:  9:01:34.29

36-ish seconds. Obviously the regex needs a bit of work, as there are 
supposed to be around 200,000 matches.

interesting nonetheless, never used grep before...useful.

k

John Machin wrote:
> On 23/02/2009 5:14 PM, Kim Boulton wrote:
>   
>> Hello,
>>
>> Thanks
>>
>> The grep regex on the text file found around 10,000 lines over 5 minutes 
>> (out of a total possible 200,000 rows), at which time I stopped it, 
>> interesting experiment anyway :-)
>> 
>
> Uh-huh ... so you'd estimate that it would take 5 minutes * (200K rows / 
> 10k rows) = 100 minutes to get through the lot, correct?
>
> I tried an experiment on a 161Mb CSV file with about 1.1M 
> name-and-address-etc rows in it. Because none of the patterns in your 
> query are likely to match my data, I added an extra pattern that would 
> select about 22% of the records (ended up with 225K output rows), 
> putting it at the end to ensure it got no unfair advantage from a regex 
> engine that tested each pattern sequentially.
>
> BTW, I had to use egrep (or grep -E) to get it to work.
>
> Anyway, it took about 6 seconds. Scaling up by number of input records: 
> 6 * 30M / 1M = 180 seconds = 3 minutes. Scaling up by file size: 6 * 500 
> / 161 = 19 seconds. By number of output rows: 6 * 200 / 225 ... forget 
> it. By size of output rows: ... triple forget it.
>
> Conclusion: something went drastically wrong with your experiment. 
> Swapping? Other processes hogging the disk or the CPU? A really duff grep??
>
> Anyway, here's my environment: 2.0 GHz single-core AMD Turion (64 bit 
> but running 32-bit Windows XP SP3), using GNU grep 2.5.3 from the 
> GnuWin32 project; 1 GB memory.
>
> Cheers,
> John
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>   
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread John Machin
On 23/02/2009 5:14 PM, Kim Boulton wrote:
> Hello,
> 
> Thanks
> 
> The grep regex on the text file found around 10,000 lines over 5 minutes 
> (out of a total possible 200,000 rows), at which time I stopped it, 
> interesting experiment anyway :-)

Uh-huh ... so you'd estimate that it would take 5 minutes * (200K rows / 
10k rows) = 100 minutes to get through the lot, correct?

I tried an experiment on a 161Mb CSV file with about 1.1M 
name-and-address-etc rows in it. Because none of the patterns in your 
query are likely to match my data, I added an extra pattern that would 
select about 22% of the records (ended up with 225K output rows), 
putting it at the end to ensure it got no unfair advantage from a regex 
engine that tested each pattern sequentially.

BTW, I had to use egrep (or grep -E) to get it to work.

Anyway, it took about 6 seconds. Scaling up by number of input records: 
6 * 30M / 1M = 180 seconds = 3 minutes. Scaling up by file size: 6 * 500 
/ 161 = 19 seconds. By number of output rows: 6 * 200 / 225 ... forget 
it. By size of output rows: ... triple forget it.

Conclusion: something went drastically wrong with your experiment. 
Swapping? Other processes hogging the disk or the CPU? A really duff grep??

Anyway, here's my environment: 2.0 GHz single-core AMD Turion (64 bit 
but running 32-bit Windows XP SP3), using GNU grep 2.5.3 from the 
GnuWin32 project; 1 GB memory.

Cheers,
John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Kim Boulton

Woo, that's a bit better. Setting page_size = 4096 gets the query 
time down to 17 seconds, so it's now faster than MySQL. Result!

I also tried changing the query to:
SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN 
('S','Z')) OR
 (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 'S' AND c6 IS NULL) OR
 (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 'S' AND c7 IS NULL) OR
 (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 IN ('2','5') ) OR
 (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 IS NULL ) OR
 (c2 = '1' AND c3 = '1' AND c4 IN ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;

But OR is a lot slower than using UNION ALL on both SQLite and MySQL.

I'm going to try the page_size at 8192 and see if that works even better.

Thanks to everyone for the help!

kim
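For anyone comparing the two query forms, EXPLAIN QUERY PLAN shows how each is executed. A minimal sketch, with the schema and WHERE clauses reduced to two branches for brevity:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (c1,c2,c3,c4,c5,c6,c7,price)")
con.execute("CREATE INDEX idx_c5 ON mytable (c5)")
con.execute("CREATE INDEX idx_c3 ON mytable (c3)")

or_sql = ("SELECT * FROM mytable "
          "WHERE (c5 = 'W' AND c6 IN ('C','E','F')) "
          "   OR (c3 = 'W' AND c4 IN ('C','E','F'))")
union_sql = ("SELECT * FROM mytable WHERE c5 = 'W' AND c6 IN ('C','E','F') "
             "UNION ALL "
             "SELECT * FROM mytable WHERE c3 = 'W' AND c4 IN ('C','E','F')")

# Whether the OR form can use both indexes depends on the SQLite
# version (the OR optimization was improved in 3.6.8).
for label, sql in (("OR", or_sql), ("UNION ALL", union_sql)):
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print(label, row)
```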

Kim Boulton wrote:
> Thanks for the hints so far.
>
> Here is my existing Mysql setup and what I've tried with Sqlite
>
> *Mysql Table structure:*
> CREATE TABLE  `mydb`.`mytable` (
>   `c1` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
>   `c2` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
>   `c3` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
>   `c4` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
>   `c5` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
>   `c6` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NULL,
>   `c7` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NULL,
>   `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  
> KEY `c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` 
> (`c6`),  KEY `c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM 
> DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
>  approx 30 million rows; 500MB CSV text file
>
> If I compress the table it is approximately 900MB in size with an 
> index of approximately 550MB, which i can load into memory via LOAD 
> INDEX INTO CACHE
>
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 
> IN ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 
> 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 
> 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND 
> c3 IN ('2','5') ) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 
> IS NULL ) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN 
> ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
> *
> The above query takes 37 seconds and produces approx 200,000 results 
> and uses 550MB RAM*
>
> *So, in Sqlite3 I created a similar table structure like this:*
> CREATE TABLE [mytable] ([c1] TEXT  NOT NULL,[c2] TEXT  NOT NULL,[c3] 
> TEXT  NOT NULL,[c4] TEXT  NOT NULL,[c5] TEXT  NOT NULL,[c6] TEXT  
> NULL,[c7] TEXT  NULL,[price] INTEGER  NOT NULL)
> CREATE INDEX [c1] ON [mytable]([c1]  ASC)
> CREATE INDEX [c2] ON [mytable]([c2]  ASC)
> CREATE INDEX [c3] ON [mytable]([c3]  ASC)
> CREATE INDEX [c4] ON [mytable]([c4]  ASC)
> CREATE INDEX [c5] ON [mytable]([c5]  ASC)
> CREATE INDEX [c6] ON [mytable]([c6]  ASC)
> CREATE INDEX [c7] ON [mytable]([c7]  ASC)
> CREATE INDEX [price] ON [mytable]([price]  ASC)
>
> *Then I imported the same csv data using:*
> .separator ","
> .import mycsv.csv mytable
>
> *Then fixed the NULL values in the last two columns with:*
> UPDATE mytable SET c6 = NULL where c6 = '';
> UPDATE mytable SET c7 = NULL where c7 = '';
>
> Then Vacuumed - took 6 hours!
>
> This leaves me with a 4GB Sqlite table
>
> *Then queried the Sqlite3 table with:*
> PRAGMA cache_size = 2000; /* uses up 1.5GB RAM regardless */
> PRAGMA page_size = 2000; /* this doesn't make any difference */
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 
> IN ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 
> 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Kim Boulton

Sorry, I forgot to mention that I was running the query twice, first to
cache it, second to measure the speed.

Yeah, it's a horrible query, but it produces the fastest results in MySQL.
Maybe using OR instead works better on SQLite; working on that one.

Problem I have with indexes in SQLite is that the table balloons 400MB
in size with every index I add. At the moment it's 4GB with one on each
column; if I add more indexes across other columns, I'm thinking it'll
get too big to cache up.

thanks

Thomas Briggs wrote:
>For starters, I think that loading the index into the cache in
> MySQL is biasing your performance measures.  SQLite will automatically
> load pages of any necessary indexes into memory as part of executing
> the query, but doing so takes time.  By preloading the index on MySQL,
> you're removing that time from the amount of time MySQL spends
> executing that query, so you're not really comparing apples to apples.
>  Said another way: how long does the query take in MySQL if you don't
> pre-load the index?
>
>Secondly, the page size needs to be set before the database is
> created, i.e. before you execute the CREATE TABLE statement.
>
>Thirdly, that's a pretty horrible query. :)  Collapsing the
> different queries into fewer queries - even if it isn't a single query
> - will speed things up.  Each SELECT is a separate query that requires
> that the table data be read; reducing the number of SELECTs will thus
> reduce query time.
>
>Lastly, your indexes need to reflect how the data is accessed.  For
> example, if you're going to do "WHERE c5 = x AND c6 = y AND c7 = 7"
> then you need one index on (c5, c6, c7) - not three individual indexes
> on c5, c6, and c7.
>
>Performance tuning rule #1 - problems are almost always in your use
> of the database, not the database itself. :)
>
>-Tom
>
> On Sun, Feb 22, 2009 at 3:48 AM, Kim Boulton  wrote:
>   
>> Thanks for the hints so far.
>>
>> Here is my existing Mysql setup and what I've tried with Sqlite
>>
>> *Mysql Table structure:*
>> CREATE TABLE  `mydb`.`mytable` (
>>  `c1`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NOT NULL,
>>  `c2`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NOT NULL,
>>  `c3`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NOT NULL,
>>  `c4`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NOT NULL,
>>  `c5`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NOT NULL,
>>  `c6`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NULL,
>>  `c7`
>> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>> NULL,
>>  `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  KEY
>> `c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` (`c6`),  KEY
>> `c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT
>> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>>
>> *CSV Import Data e.g.:*
>> 1,A,P,0,0,X,X,300
>> 1,A,P,0,0,X,P,
>> A,A,P,0,0,,,2000
>> B,3,Y,0,1,X,,300
>>  approx 30 million rows; 500MB CSV text file
>>
>> If I compress the table it is approximately 900MB in size with an index
>> of approximately 550MB, which i can load into memory via LOAD INDEX INTO
>> CACHE
>>
>> *Mysql Query example:*
>> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN
>> ('S','Z')) UNION ALL
>> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 =
>> 'S' AND c6 IS NULL) UNION ALL
>> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 =
>> 'S' AND c7 IS NULL) UNION ALL
>> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3
>> IN ('2','5') ) UNION ALL
>> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6
>> IS NULL ) UNION ALL
>> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN
>> ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
>> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
>> *
>> The above query takes 37 seconds and produces approx 200,000 results and
>> uses 550MB RAM*
>>
>> *So, in Sqlite3 I created a similar table structure like this:*
>> CREATE TABLE [mytable] ([c1] TEXT  NOT NULL,[c2] TEXT  NOT NULL,[c3]
>> TEXT  NOT NULL,[c4] TEXT  NOT NULL,[c5] TEXT  NOT NULL,[c6] TEXT
>> NULL,[c7] TEXT  NULL,[price] INTEGER  NOT NULL)
>> CREATE INDEX [c1] ON 

Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Kim Boulton

You have a good point there; I've been using that wrongly. I'll try that.

Thanks


Kees Nuyt wrote:
> On Sun, 22 Feb 2009 08:48:00 +, Kim Boulton
>  wrote in k...@jesk.co.uk, General Discussion
> of SQLite Database :
>
>   
>> *Then queried the Sqlite3 table with:*
>> PRAGMA cache_size = 2000; /* uses up 1.5GB RAM regardless */
>> 
>
> cache_size is expressed in number of pages.
> default_cache_size is useful too.
>
>   
>> PRAGMA page_size = 2000; /*this doesn't make any difference*/
>> 
>
> PRAGMA page_size will only make a difference if you use it
> when creating the database (before the first table is
> created), or just before a VACUUM statement.
> Don't make it too big. 4096 or 8192 are a good start to
> experiment with.
>
>   
>> Unless anyone has some good ideas I might 
>> have to give up on Sqlite.
>> 
>
> I'm sure you can get more performance if you tweak page_size
> and cache_size with some more understanding.
> Use whatever engine is best for the task.
>   
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
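Kees's point, that page_size only takes effect on a brand-new database (or just before a VACUUM), can be verified with a short script. A sketch with a throwaway file path:

```python
import sqlite3
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fresh.db")
con = sqlite3.connect(path)

# Set the page size BEFORE the first table is created; once the
# database has pages, the pragma is ignored until a VACUUM.
con.execute("PRAGMA page_size = 4096")
con.execute("CREATE TABLE t (x)")
con.commit()

print(con.execute("PRAGMA page_size").fetchone()[0])
```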


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Kim Boulton

Hello,

Thanks

The grep regex on the text file found around 10,000 lines over 5 minutes 
(out of a total possible 200,000 rows), at which time I stopped it, 
interesting experiment anyway :-)

Using OR instead of UNION in Mysql is definitely slower, although maybe, 
as someone has mentioned above, the Sqlite optimizer works better with 
OR. Will try that next.

Thanks.


> Try this:
>
> (1) ditch the redundant commas in your csv file
> (2) run this (suitably adjusted for my errors and typos, and for my
> ignorance of the minutiae of command-line and grep syntax on your
> [unspecified IIRC] OS):
>
> grep
> "(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.),"
> yourfile.csv
>
> How long does that take?
>
> Another suggestion: search for clues on whether it might be better
> instead of doing
> select * from mytable where 
> union all
> etc etc
> select * from mytable where 
> to do
> select * from mytable where  or  or etc etc
> and if you don't find a strong weight of opinion for one way or the
> other, give it a try.
>
> HTH,
>
> John
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>   
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Jim Dodgen
Would loading the 30 million row CSV file via the command line be wrapped
inside a single transaction, thus building a very large rollback log? I like
to break my bulk loads into nice chunks.

Also, it could be that the MySQL optimizer does a better job with the
unusual SELECT.
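A chunked load along those lines can be sketched with Python's sqlite3 module (table shape borrowed from the thread; the chunk size is an arbitrary example, not a recommendation):

```python
import sqlite3

def load_in_chunks(con, rows, chunk=50000):
    """Insert rows in moderately sized transactions, rather than one
    giant transaction (huge rollback journal) or per-row autocommit."""
    cur = con.cursor()
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= chunk:
            cur.executemany("INSERT INTO mytable VALUES (?,?,?,?,?,?,?,?)", batch)
            con.commit()
            batch = []
    if batch:
        cur.executemany("INSERT INTO mytable VALUES (?,?,?,?,?,?,?,?)", batch)
        con.commit()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (c1,c2,c3,c4,c5,c6,c7,price)")
rows = [("1", "A", "P", "0", "0", "X", "X", 300)] * 120000
load_in_chunks(con, rows)
print(con.execute("SELECT COUNT(*) FROM mytable").fetchone()[0])
```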





On Sun, Feb 22, 2009 at 11:27 AM, Nicolas Williams  wrote:

> On Sun, Feb 22, 2009 at 01:29:09PM +0100, Kees Nuyt wrote:
> > > PRAGMA page_size = 2000; /*this doesn't make any difference*/
> >
> > PRAGMA page_size will only make a difference if you use it
> > when creating the database (before the first table is
> > created), or just before a VACUUM statement.
> > Don't make it too big. 4096 or 8192 are a good start to
> > experiment with.
>
> The hard max is 32KB, IIRC, and even that requires changing the code.
> Otherwise 16KB is the max, and it works fine.
>
> I've wanted to make SQLite3 default to using the smaller of 16KB or the
> filesystem's preferred block size.  This would make SQLite3 more
> efficient on filesystems like ZFS.  But unfortunately the tests assume
> the 1KB page size throughout, thus a number of tests fail if that change
> is made.  Thus for Solaris I've held back on that change for now.
>
> Nico
> --
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Jim Dodgen
j...@dodgen.us
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Nicolas Williams
On Sun, Feb 22, 2009 at 01:29:09PM +0100, Kees Nuyt wrote:
> > PRAGMA page_size = 2000; /*this doesn't make any difference*/
> 
> PRAGMA page_size will only make a difference if you use it
> when creating the database (before the first table is
> created), or just before a VACUUM statement.
> Don't make it too big. 4096 or 8192 are a good start to
> experiment with.

The hard max is 32KB, IIRC, and even that requires changing the code.
Otherwise 16KB is the max, and it works fine.

I've wanted to make SQLite3 default to using the smaller of 16KB or the
filesystem's preferred block size.  This would make SQLite3 more
efficient on filesystems like ZFS.  But unfortunately the tests assume
the 1KB page size throughout, thus a number of tests fail if that change
is made.  Thus for Solaris I've held back on that change for now.
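The heuristic described above (take the smaller of 16 KB and the filesystem's
preferred block size) can be sketched in a few lines of Python. This is an
illustration only, not SQLite's actual code; os.statvfs is POSIX-only, and the
4096-byte fallback is an assumption:

```python
import os
import sqlite3

CAP = 16384  # the 16 KB practical maximum discussed above

def preferred_page_size(path="."):
    """Smaller of 16 KB and the filesystem's preferred block size,
    rounded down to a power of two (PRAGMA page_size requires one)."""
    try:
        fs_block = os.statvfs(path).f_bsize
    except (AttributeError, OSError):   # os.statvfs is POSIX-only
        fs_block = 4096                 # assumed fallback
    size = 512                          # SQLite's minimum page size
    while size * 2 <= min(CAP, fs_block):
        size *= 2
    return size

conn = sqlite3.connect(":memory:")
# must happen before the first CREATE TABLE, or it has no effect
conn.execute("PRAGMA page_size = %d" % preferred_page_size())
conn.execute("CREATE TABLE t (x)")
applied = conn.execute("PRAGMA page_size").fetchone()[0]
conn.close()
```

On a typical filesystem with 4 KB blocks this settles on 4096.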

Nico
-- 


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Thomas Briggs
   For starters, I think that loading the index into the cache in
MySQL is biasing your performance measures.  SQLite will automatically
load pages of any necessary indexes into memory as part of executing
the query, but doing so takes time.  By preloading the index on MySQL,
you're removing that time from the amount of time MySQL spends
executing that query, so you're not really comparing apples to apples.
 Said another way: how long does the query take in MySQL if you don't
pre-load the index?

   Secondly, the page size needs to be set before the database is
created, i.e. before you execute the CREATE TABLE statement.

   Thirdly, that's a pretty horrible query. :)  Collapsing the
different queries into fewer queries - even if it isn't a single query
- will speed things up.  Each SELECT is a separate query that requires
that the table data be read; reducing the number of SELECTs will thus
reduce query time.

   Lastly, your indexes need to reflect how the data is accessed.  For
example, if you're going to do "WHERE c5 = x AND c6 = y AND c7 = 7"
then you need one index on (c5, c6, c7) - not three individual indexes
on c5, c6, and c7.
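A toy illustration of that last point, with column names from this thread and
invented rows: a single composite index on (c5, c6, c7) lets the planner
satisfy the whole three-column WHERE, which EXPLAIN QUERY PLAN confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (c5 TEXT, c6 TEXT, c7 TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("W", "C", "S", 300), ("W", "E", "Z", 2000), ("A", "C", "S", 10)],
)
# One composite index matching the access pattern, instead of three
# single-column indexes of which SQLite would use at most one.
conn.execute("CREATE INDEX idx_c5_c6_c7 ON mytable (c5, c6, c7)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM mytable WHERE c5 = 'W' AND c6 = 'C' AND c7 = 'S'"
).fetchall()
plan_text = " ".join(str(col) for row in plan for col in row)
rows = conn.execute(
    "SELECT price FROM mytable WHERE c5 = 'W' AND c6 = 'C' AND c7 = 'S'"
).fetchall()
conn.close()
```

The plan text names idx_c5_c6_c7, i.e. the query is answered by one index
search rather than a scan.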

   Performance tuning rule #1 - problems are almost always in your use
of the database, not the database itself. :)

   -Tom

On Sun, Feb 22, 2009 at 3:48 AM, Kim Boulton  wrote:
> Thanks for the hints so far.
>
> Here is my existing Mysql setup and what I've tried with Sqlite
>
> *Mysql Table structure:*
> CREATE TABLE  `mydb`.`mytable` (
>  `c1`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NOT NULL,
>  `c2`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NOT NULL,
>  `c3`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NOT NULL,
>  `c4`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NOT NULL,
>  `c5`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NOT NULL,
>  `c6`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NULL,
>  `c7`
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
> NULL,
>  `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  KEY
> `c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` (`c6`),  KEY
> `c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT
> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
>
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
>  approx 30 million row, 500MB csv, text file
>
> If I compress the table it is approximately 900MB in size with an index
> of approximately 550MB, which i can load into memory via LOAD INDEX INTO
> CACHE
>
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN
> ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 =
> 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 =
> 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3
> IN ('2','5') ) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6
> IS NULL ) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN
> ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
> *
> The above query takes 37 seconds and produces approx 200,000 results and
> uses 550MB RAM*
>
> *So, in Sqlite3 I created a similar table structure like this:*
> CREATE TABLE [mytable] ([c1] TEXT  NOT NULL,[c2] TEXT  NOT NULL,[c3]
> TEXT  NOT NULL,[c4] TEXT  NOT NULL,[c5] TEXT  NOT NULL,[c6] TEXT
> NULL,[c7] TEXT  NULL,[price] INTEGER  NOT NULL)
> CREATE INDEX [c1] ON [mytable]([c1]  ASC)
> CREATE INDEX [c2] ON [mytable]([c2]  ASC)
> CREATE INDEX [c3] ON [mytable]([c3]  ASC)
> CREATE INDEX [c4] ON [mytable]([c4]  ASC)
> CREATE INDEX [c5] ON [mytable]([c5]  ASC)
> CREATE INDEX [c6] ON [mytable]([c6]  ASC)
> CREATE INDEX [c7] ON [mytable]([c7]  ASC)
> CREATE INDEX [price] ON [mytable]([price]  ASC)
>
> *Then I imported the same csv data using:*
> .separator ","
> .import mycsv.csv mytable
>
> *Then fixed the NULL values in the last two columns with:*
> UPDATE mytable SET c6 = NULL where c6 = '';
> UPDATE mytable SET c7 = NULL where c7 = '';
>
> Then Vacuumed - took 6 hours!
>
> This leaves me with a 4GB Sqlite table

Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Kees Nuyt
On Sun, 22 Feb 2009 08:48:00 +, Kim Boulton
 wrote in k...@jesk.co.uk, General Discussion
of SQLite Database :

> *Then queried the Sqlite3 table with:*
> PRAGMA cache_size = 2000; /* uses up 1.5GB RAM regardless */

cache_size is expressed in number of pages.
default_cache_size is useful too.

> PRAGMA page_size = 2000; /*this doesn't make any difference*/

PRAGMA page_size will only make a difference if you use it
when creating the database (before the first table is
created), or just before a VACUUM statement.
Don't make it too big. 4096 or 8192 are a good start to
experiment with.
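A minimal demonstration of that ordering rule (the temp file path and the 8192
target are arbitrary choices): a page size set on an existing database is only
remembered, and takes effect at the next VACUUM.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x)")   # file now exists, so page size is fixed
before = conn.execute("PRAGMA page_size").fetchone()[0]

target = 8192 if before != 8192 else 4096       # pick something different
conn.execute("PRAGMA page_size = %d" % target)  # remembered, not yet applied
conn.execute("VACUUM")                          # rebuild applies the new size
after = conn.execute("PRAGMA page_size").fetchone()[0]
conn.close()
```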

> Unless anyone has some good ideas I might 
> have to give up on Sqlite.

I'm sure you can get more performance if you tweak page_size
and cache_size with some more understanding.
Use whatever engine is best for the task.
-- 
  (  Kees Nuyt
  )
c[_]


Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread John Machin
On 22/02/2009 7:48 PM, Kim Boulton wrote:
> Thanks for the hints so far.
> 
> Here is my existing Mysql setup and what I've tried with Sqlite
> 
> *Mysql Table structure:*
> CREATE TABLE  `mydb`.`mytable` (
>   `c1` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NOT NULL,
[snip FFS]
>   `c7` 
> enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
>  
> NULL,
>   `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  KEY 
> `c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` (`c6`),  KEY 
> `c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT 
> CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;
> 
> *CSV Import Data e.g.:*
> 1,A,P,0,0,X,X,300
> 1,A,P,0,0,X,P,
> A,A,P,0,0,,,2000
> B,3,Y,0,1,X,,300
>  approx 30 million row, 500MB csv, text file
> 
> If I compress the table it is approximately 900MB in size with an index 
> of approximately 550MB, which i can load into memory via LOAD INDEX INTO 
> CACHE
> 
> *Mysql Query example:*
> SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN 
> ('S','Z')) UNION ALL
> SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 
> 'S' AND c6 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 
> 'S' AND c7 IS NULL) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 
> IN ('2','5') ) UNION ALL
> SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 
> IS NULL ) UNION ALL
> SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN 
> ('C','E','F') AND c5 = 'S' AND c7 IS NULL)
> ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
> *
> The above query takes 37 seconds and produces approx 200,000 results and 
> uses 550MB RAM*

Try this:

(1) ditch the redundant commas in your csv file
(2) run this (suitably adjusted for my errors and typos, and for my
ignorance of the minutiae of command-line and grep syntax on your
[unspecified IIRC] OS):

grep -E
"(W[CEF][SZ]|..W[CEF]S|...W[CEF]S|W[3CEF]S[25]..|W3S..|.11[CEF]S.),"
yourfile.csv

How long does that take?

Another suggestion: search for clues on whether it might be better
instead of doing
select * from mytable where <condition 1>
union all
etc etc
select * from mytable where <condition N>
to do
select * from mytable where <condition 1> or <condition 2> or etc etc
and if you don't find a strong weight of opinion for one way or the
other, give it a try.
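A quick way to try that on a small sample before timing it on the real
30-million-row table; the rows here are invented, and note the caveat in the
comment about when the two forms actually agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (c1,c2,c3,c4,c5,c6,c7,price)")
conn.executemany("INSERT INTO mytable VALUES (?,?,?,?,?,?,?,?)", [
    ("1", "A", "P", "0", "W", "C", "S", 300),     # matches the first branch
    ("A", "A", "W", "C", "S", None, None, 2000),  # matches the second branch
    ("B", "3", "Y", "0", "1", "X", None, 300),    # matches neither
])

union_sql = (
    "SELECT * FROM mytable WHERE c5 = 'W' AND c6 IN ('C','E','F') "
    "UNION ALL "
    "SELECT * FROM mytable WHERE c3 = 'W' AND c4 IN ('C','E','F') AND c6 IS NULL"
)
or_sql = (
    "SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F')) "
    "OR (c3 = 'W' AND c4 IN ('C','E','F') AND c6 IS NULL)"
)

# Caveat: UNION ALL repeats a row that matches more than one branch, while OR
# does not, so the two forms agree only when the branches are mutually
# exclusive (as the column patterns in this thread appear to be).
union_rows = sorted(conn.execute(union_sql).fetchall(), key=repr)
or_rows = sorted(conn.execute(or_sql).fetchall(), key=repr)
conn.close()
```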

HTH,

John




Re: [sqlite] Newb-ish performance questions

2009-02-22 Thread Kim Boulton
Thanks for the hints so far.

Here is my existing Mysql setup and what I've tried with Sqlite

*Mysql Table structure:*
CREATE TABLE  `mydb`.`mytable` (
  `c1` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NOT NULL,
  `c2` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NOT NULL,
  `c3` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NOT NULL,
  `c4` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NOT NULL,
  `c5` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NOT NULL,
  `c6` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NULL,
  `c7` 
enum('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z')
 
NULL,
  `price` smallint(4) NOT NULL,  KEY `c1` (`c1`),  KEY `c2` (`c2`),  KEY 
`c3` (`c3`),  KEY `c4` (`c4`),  KEY `c5` (`c5`),  KEY `c6` (`c6`),  KEY 
`c7` (`c7`),  KEY `price` (`price`) ) ENGINE=MyISAM DEFAULT 
CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=COMPRESSED;

*CSV Import Data e.g.:*
1,A,P,0,0,X,X,300
1,A,P,0,0,X,P,
A,A,P,0,0,,,2000
B,3,Y,0,1,X,,300
 approx 30 million row, 500MB csv, text file

If I compress the table it is approximately 900MB in size with an index 
of approximately 550MB, which i can load into memory via LOAD INDEX INTO 
CACHE

*Mysql Query example:*
SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN 
('S','Z')) UNION ALL
SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 
'S' AND c6 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 
'S' AND c7 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 
IN ('2','5') ) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 
IS NULL ) UNION ALL
SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN 
('C','E','F') AND c5 = 'S' AND c7 IS NULL)
ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;
*
The above query takes 37 seconds and produces approx 200,000 results and 
uses 550MB RAM*

*So, in Sqlite3 I created a similar table structure like this:*
CREATE TABLE [mytable] ([c1] TEXT  NOT NULL,[c2] TEXT  NOT NULL,[c3] 
TEXT  NOT NULL,[c4] TEXT  NOT NULL,[c5] TEXT  NOT NULL,[c6] TEXT  
NULL,[c7] TEXT  NULL,[price] INTEGER  NOT NULL)
CREATE INDEX [c1] ON [mytable]([c1]  ASC)
CREATE INDEX [c2] ON [mytable]([c2]  ASC)
CREATE INDEX [c3] ON [mytable]([c3]  ASC)
CREATE INDEX [c4] ON [mytable]([c4]  ASC)
CREATE INDEX [c5] ON [mytable]([c5]  ASC)
CREATE INDEX [c6] ON [mytable]([c6]  ASC)
CREATE INDEX [c7] ON [mytable]([c7]  ASC)
CREATE INDEX [price] ON [mytable]([price]  ASC)

*Then I imported the same csv data using:*
.separator ","
.import mycsv.csv mytable

*Then fixed the NULL values in the last two columns with:*
UPDATE mytable SET c6 = NULL where c6 = '';
UPDATE mytable SET c7 = NULL where c7 = '';
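As an aside, the two UPDATE passes can be avoided by importing with a short
script that converts empty CSV fields to real NULLs up front. A sketch, not
what was actually run here (sample rows from this thread, in-memory database
and a temp file for brevity):

```python
import csv
import os
import sqlite3
import tempfile

# write two of the thread's sample rows to a CSV file for the demo
path = os.path.join(tempfile.mkdtemp(), "mycsv.csv")
with open(path, "w", newline="") as f:
    f.write("1,A,P,0,0,X,X,300\nA,A,P,0,0,,,2000\n")

def rows(path):
    with open(path, newline="") as f:
        for rec in csv.reader(f):
            # empty CSV fields become real NULLs instead of ''
            yield [v if v != "" else None for v in rec]

conn = sqlite3.connect(":memory:")  # the real case used a file database
conn.execute("CREATE TABLE mytable (c1,c2,c3,c4,c5,c6,c7,price)")
conn.executemany("INSERT INTO mytable VALUES (?,?,?,?,?,?,?,?)", rows(path))
nulls = conn.execute("SELECT count(*) FROM mytable WHERE c6 IS NULL").fetchone()[0]
total = conn.execute("SELECT count(*) FROM mytable").fetchone()[0]
conn.close()
```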

Then Vacuumed - took 6 hours!

This leaves me with a 4GB Sqlite table

*Then queried the Sqlite3 table with:*
PRAGMA cache_size = 2000; /* uses up 1.5GB RAM regardless */
PRAGMA page_size = 2000; /*this doesn't make any difference*/
SELECT * FROM mytable WHERE (c5 = 'W' AND c6 IN ('C','E','F') AND c7 IN 
('S','Z')) UNION ALL
SELECT * FROM mytable WHERE (c3 = 'W' AND c4 IN ('C','E','F') AND c5 = 
'S' AND c6 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c4 = 'W' AND c5 IN ('C','E','F') AND c6 = 
'S' AND c7 IS NULL) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 IN ('3','C','E','F') AND c3 
IN ('2','5') ) UNION ALL
SELECT * FROM mytable WHERE (c1 = 'W' AND c2 = '3' AND c3 = 'S' AND c6 
IS NULL ) UNION ALL
SELECT * FROM mytable WHERE (c2 = '1' AND c3 = '1' AND c4 IN 
('C','E','F') AND c5 = 'S' AND c7 IS NULL)
ORDER BY c1,c2,c3,c4,c5,c6,c7 ASC;

*The above query takes 57 seconds, 20 seconds slower than Mysql but 
produces the same results - but uses up 1.5GB RAM!*

Also if I try some more complex queries it always runs out of steam at 2GB 
RAM and crashes. Maybe that is a limitation of my PC or XP though.

I have also tried...
1. Sqlite table with no indexes - very slow!
2. Sqlite table with one unique index using all columns - still slower 
than separate indexes.

So I've had a fair crack of the whip with Sqlite. I was hoping it would 
be faster and use less memory, no luck though.

Unless anyone has some good ideas I might have to give up on Sqlite.

Cheers.

Kim

Kim Boulton wrote:
> Hello,

Re: [sqlite] Newb-ish performance questions

2009-02-19 Thread Jay A. Kreibich
On Fri, Feb 20, 2009 at 05:22:33AM +, Kim Boulton scratched on the wall:
> Hello,
> 
> I'm trying out Sqlite3 with an eye to improving the performance of 
> queries on an existing MySQL database.
> 
> I've imported the data into sqlite which is approx. 30 million rows of 
> part numbers each with a price.
> 
> So far, it's approx. four times slower than the MySQL version, and the 
> size of the sqlite database is too big to fit in memory (several GB) 
> whereas I can get the MySQL data down to 900MB if it's compressed and 
> read only.

  Make sure you have indexes on appropriate columns, like part-number or
  whatever your queries are commonly keyed off of.

  Bump up the page-cache size.  No matter what the size of the
  database, by default SQLite will only cache 2000 pages.  The default
  page size is 1K, and a cache slot takes about 1.5K, so adjust
  accordingly (like 10x or 100x) via PRAGMA if you're on a system with
  a comfortable amount of RAM.  If you're randomly pulling out
  individual records, the only thing you really care about is keeping
  the most important index(es) in the cache.
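Those numbers turn into a simple rule of thumb for sizing the cache. A sketch;
the 256 MB budget is an example value, and the roughly 0.5 KB per-page
overhead comes from the estimate above, not an exact SQLite constant:

```python
import sqlite3

def cache_pages_for_budget(budget_bytes, page_size=1024, overhead=512):
    """Pages that fit in a RAM budget at ~(page_size + overhead) bytes each."""
    return budget_bytes // (page_size + overhead)

conn = sqlite3.connect(":memory:")
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
budget = 256 * 1024 * 1024                      # e.g. allow the cache 256 MB
pages = cache_pages_for_budget(budget, page_size)
conn.execute("PRAGMA cache_size = %d" % pages)  # default is only 2000 pages
applied = conn.execute("PRAGMA cache_size").fetchone()[0]
conn.close()
```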
  
  Also, SQLite will not pre-load anything, so your first few queries are
  likely to be slower as it seeds the indexes into the data cache.


  I'm surprised about the size.  If you have a name or description
  column, make sure you're not using fixed-length strings for the import.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson


Re: [sqlite] Newb-ish performance questions

2009-02-19 Thread Jim Dodgen
was it 4 times slower to load? or 4 times slower to query?

also we need some examples.

On Thu, Feb 19, 2009 at 9:33 PM, Thomas Briggs  wrote:

>   Depending on the nature of the data and queries, increasing the
> block size may help.
>
>   Posting some information about your schema and queries is the only
> way to get truly good advice on this though, I think.  There is no
> "-runfast" switch you can include on the command line to fix things.
> :)  The answers are almost guaranteed to be found in your use of
> SQLite, not in the database itself.
>
>   -T
>
> On Fri, Feb 20, 2009 at 12:22 AM, Kim Boulton  wrote:
> > Hello,
> >
> > I'm trying out Sqlite3 with an eye to improving the performance of
> > queries on an existing MySQL database.
> >
> > I've imported the data into sqlite which is approx. 30 million rows of
> > part numbers each with a price.
> >
> > So far, it's approx. four times slower than the MySQL version, and the
> > size of the sqlite database is too big to fit in memory (several GB)
> > whereas I can get the MySQL data down to 900MB if it's compressed and
> > read only.
> >
> > I would appreciate some tips or pointers on getting sqlite3 performance
> > up and the data size down. I googled but couldn't find much.
> >
> > I don't need concurrency or inserts, it's single user, read only.
> >
> > TIA
> >
> > kimb
> >
> >
> >
> >



-- 
Jim Dodgen
j...@dodgen.us


Re: [sqlite] Newb-ish performance questions

2009-02-19 Thread Thomas Briggs
   Depending on the nature of the data and queries, increasing the
block size may help.

   Posting some information about your schema and queries is the only
way to get truly good advice on this though, I think.  There is no
"-runfast" switch you can include on the command line to fix things.
:)  The answers are almost guaranteed to be found in your use of
SQLite, not in the database itself.

   -T

On Fri, Feb 20, 2009 at 12:22 AM, Kim Boulton  wrote:
> Hello,
>
> I'm trying out Sqlite3 with an eye to improving the performance of
> queries on an existing MySQL database.
>
> I've imported the data into sqlite which is approx. 30 million rows of
> part numbers each with a price.
>
> So far, it's approx. four times slower than the MySQL version, and the
> size of the sqlite database is too big to fit in memory (several GB)
> whereas I can get the MySQL data down to 900MB if it's compressed and
> read only.
>
> I would appreciate some tips or pointers on getting sqlite3 performance
> up and the data size down. I googled but couldn't find much.
>
> I don't need concurrency or inserts, it's single user, read only.
>
> TIA
>
> kimb
>
>
>
>


[sqlite] Newb-ish performance questions

2009-02-19 Thread Kim Boulton
Hello,

I'm trying out Sqlite3 with an eye to improving the performance of 
queries on an existing MySQL database.

I've imported the data into sqlite which is approx. 30 million rows of 
part numbers each with a price.

So far, it's approx. four times slower than the MySQL version, and the 
size of the sqlite database is too big to fit in memory (several GB) 
whereas I can get the MySQL data down to 900MB if it's compressed and 
read only.

I would appreciate some tips or pointers on getting sqlite3 performance 
up and the data size down. I googled but couldn't find much.

I don't need concurrency or inserts, it's single user, read only.

TIA

kimb



