Re: [sqlite] COUNT() extremely slow first time!

2011-02-28 Thread Black, Michael (IS)
Did you do multiple runs and average?  And how are you measuring time?  Did you 
do buffered reads?
And you're MB/sec isn't accurate for the count() test.  I think it was already 
said that count() doesn't read the entire record.

The SQLite numbers are due to SQLite caching.  If you make page size 4096 you 
are 4 times less likely to need another disk read (SSD or otherwise).  That's 
different from what you're doing.  You're one layer down from that.

Your read makes perfect sense if I look at it as sampling errorwhich is 
what I would expect.

As you increase your read page size you get a curve that saturates at 50MB/sec.

Also...you either have a really slow machine or your timing is wrong.

I can do over 70MB/sec just reading from disk (2.6Ghz Xeon, 7200 SAS drives 
mirrored)
#include 
#include 
#include 
double tic(int flag)
{
static double t1;
double t2;
struct timeval tv;
static long base_seconds;
gettimeofday(,NULL);
// to give us the resolution we need for high-speed we will subtract our 
starting seconds
// otherwise 15 digits of accuracy aren't enough
if (base_seconds==0) base_seconds=tv.tv_sec;
if (flag==0) {
t1 = (tv.tv_sec-base_seconds)+tv.tv_usec/(double)100;
}
t2 = (tv.tv_sec-base_seconds)+tv.tv_usec/(double)100;
return t2-t1;
}
void test1(int bufsize)
{
char *buf=malloc(bufsize);
FILE *fp=fopen("test2.dat","r");
int n,nn=0;
while((n=fread(buf,1,bufsize,fp))) {
nn+=n;
}
fclose(fp);
printf("%d\n",nn);
}
int main(int argc,char *argv[])
{
tic(0);
test1(atoi(argv[1]));
printf("%f MB/sec\n",51200/tic(1));
return 0;
}

dd if=/dev/zero of=test2.dat bs=1024 count=50
sync; echo 3 > /proc/sys/vm/drop_caches
io 4096
51200
73.071591 MB/sec

Now that it's cached
io 4096
51200
1884.153971 MB/sec

I would hope your SSD can do better than my disk on raw throughput.


Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Max Vlasov [max.vla...@gmail.com]
Sent: Monday, February 28, 2011 8:52 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT :Re: COUNT() extremely slow first time!

Michael,
it's an interesting hypothesis. Maybe you can describe the following
results. Recently I made a test program that just reads a file sequentially
with different block sizes and results are still confusing. This was the
same SSD drive and the functions were basically from api:
CreateFile/ReadFile

128, 256: 5MB/Sec
512: 20MB/Sec
1024: 25MB/Sec.
2048: 41MB/Sec
4096 - 65536: ~50MB/Sec.

Strangely I didn't see such a big difference between slowest and fastest
scenario (if we exclude the sizes below 1024), as you see the difference is
only x2. I have only one explanation looking at the numbers. Although it's
well-known that sqlite reads only full pages, if it sometimes does partial
reading, this 5MB/Sec drop for <256 reading can affect linear speed of 25
MB/Sec to end up as 12MB/Sec. But it's just a guess.

Max


On Mon, Feb 28, 2011 at 4:43 PM, Black, Michael (IS)  wrote:

> Those numbers make sense to me.
>
> Since count is doing a linear walk throughcorrect?
>
> #1 SSDs more than likely don't do "read ahead" like a disk drive.  So what
> you're seeing is what read-ahead does for you.
> #2 Count is doing a linear scan of the table...Probability of 2K containing
> the next page after 1K -- 100% (2X performance improvment)
> #3 Probability of 4K containing the next page after 2K -- 100% (2X
> improvement).
> #4 Probability of 8K containing the next page after 4K -- here the
> probability either drops or we're starting to hit the bandwidth of
> SSD+Sqlite -- I'm tempted to say that it's not the probability that drops.
>
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
> behalf of Max Vlasov [max.vla...@gmail.com]
>
> my timing for Select Count for 400-500MB Bases and the table with about
> 1,000,000 records
>
> 1024:
>  40 Seconds, 98% system+user time, 500 MB Read, 12 MB/Sec
> 2048:
>  22 Seconds, 94% system+user time, 500 MB Read, 20 MB/Sec
> 4096 (This NTFS Cluster size)
>  11 Seconds, 96% system+user time, 450 MB Read, 32 MB/Sec
> 8192
>  8 Seconds, 87% system+user time, 420 MB Read, 50 MB/Sec
> 32768
>  8 Seconds, 56% system+user time, 410 MB Read, 50 MB/Sec
>
> Max
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-27 Thread Nico Williams
On Sun, Feb 27, 2011 at 3:11 PM, Max Vlasov  wrote:
> Hmm, yesterday something struck me I can do similar tests on an Asus T91MT
> having SSD as the storage. I got similar timings to Greg's. So it seems like
> page size is a very sensitive parameter for solid state drives. Looks like
> having the page_size lower than the cluster size on NTFS file drive will
> have very high price for scanning-intensive operations. [...]

Page size (typically a power of two larger than or equal to 12, i.e.,
4KB) and _alignment_ are critical in SSD applications.  Make
absolutely sure that the partition alignment is such that the
filesystem can ensure 4KB blocks starting at 4KB boundaries on SSDs,
or larger blocks starting on 4KB or larger boundaries.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-27 Thread Max Vlasov
On Sat, Feb 26, 2011 at 1:03 AM, Greg Barker  wrote:

>  harddrive. Can you confirm this?
>
> > Giving this, if the fastest is 3.6 seconds, you have a very fast
> I can confirm this, my tests were run on a machine that uses a solid state
> drive.
>
>
Hmm, yesterday something struck me I can do similar tests on an Asus T91MT
having SSD as the storage. I got similar timings to Greg's. So it seems like
page size is a very sensitive parameter for solid state drives. Looks like
having the page_size lower than the cluster size on NTFS file drive will
have very high price for scanning-intensive operations. What puzzles me is
that CPU time for 1024 'select count' can take about 98% of the time so it's
not the same logic as was for rotating hard drives when a program just waits
for the hard drive to complete the operation. I doubt sqlite does something
special so it's maybe about adaptation of ssd specific by windows, it looks
like it's far from ideal. Can someone point to some tests regarding this?
Since SSD becomes more and more spread, i think it's better to understand
what it going on at least approximately.

my timing for Select Count for 400-500MB Bases and the table with about
1,000,000 records

1024:
  40 Seconds, 98% system+user time, 500 MB Read, 12 MB/Sec
2048:
  22 Seconds, 94% system+user time, 500 MB Read, 20 MB/Sec
4096 (This NTFS Cluster size)
  11 Seconds, 96% system+user time, 450 MB Read, 32 MB/Sec
8192
  8 Seconds, 87% system+user time, 420 MB Read, 50 MB/Sec
32768
  8 Seconds, 56% system+user time, 410 MB Read, 50 MB/Sec

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-26 Thread Simon Slavin

On 25 Feb 2011, at 10:03pm, Greg Barker wrote:

> Thanks for the responses guys.
> 
>> I would never have any table with 150 columns.  It should be possible to
>> keep the schema for your table in your head.
> 
> Unfortunately those are the cards I have been dealt. The columns are just
> buckets of data (bucket1, bucket2, bucket3, etc). Each bucket could be 1
> month or 1 weeks worth of data, it can vary from table to table. They can
> have up to two years worth of data loaded, so that could be 24 monthly
> buckets or 104 weekly buckets.

So your schema really should have a table with 'bucketNumber' and 
'bucketContents' as columns.  That would make every bucket a separate row.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-26 Thread Greg Barker
Thanks for the responses guys.

> I would never have any table with 150 columns.  It should be possible to
keep the schema for your table in your head.

Unfortunately those are the cards I have been dealt. The columns are just
buckets of data (bucket1, bucket2, bucket3, etc). Each bucket could be 1
month or 1 weeks worth of data, it can vary from table to table. They can
have up to two years worth of data loaded, so that could be 24 monthly
buckets or 104 weekly buckets.

We don't have any control over the machines this code is running on, so
hardware changes are not an option. These machines are definitely less than
ideal for what we are trying to get them to do, but what can you do.

> Giving this, if the fastest is 3.6 seconds, you have a very fast
harddrive. Can you confirm this?

I can confirm this, my tests were run on a machine that uses a solid state
drive.

Side note - how do I become a member of this list? Do my posts always need
to be approved?

On Fri, Feb 25, 2011 at 4:57 AM, Max Vlasov  wrote:

> On Fri, Feb 25, 2011 at 1:51 AM, Greg Barker 
> wrote:
>
> > I ran my tests again, this time setting the cache_size based on the
> > page_size. For some reason I wasn't able to replicate the timing I got
> > before for the 1024 page size (40.2 seconds) which is really confusing
> me,
> > since I was careful to make sure nothing was running in the background
> > while
> > I was testing.
> >
> > page_size/cache_size: SELECT COUNT time
> > 1024/2000: 20.83s
> > 4096/500: 14.4s
> > 8192/250: 8.8s
> > 16384/125: 5.0s
> > 32768/62: 3.6s
> >
> > I'm assuming reducing the cache size like that will definitely have an
> > impact on the performance of an actual application? Optimizing
> performance
> > for an application where both the queries and data can take many
> different
> > shapes and sizes is beginning to seem like quite a daunting task. What do
> > you do if there could be anywhere between 30-150 columns?
> >
> >
> Greg, first, I suggest to forget about sqlite cache for scanning
> operations.
> This is because the os cache is also used and it's a known fact that they
> sometimes store the same data twice, so disabling or decreasing one still
> leaves another in effect.
>
> I saw that the db you have is about 400 MB in size. Giving this, if the
> fastest is 3.6 seconds, you have a very fast harddrive. Can you confirm
> this? The scanning of such big base with at least 5-years old hd definitely
> should be 10 seconds or even slower. Did you reboot your comp this time? By
> the way, a faster way to clear the system cache for a particular file is to
> "touch" the file with CreateFile(...FILE_FLAG_NO_BUFFERING).
>
> I tried to emulate your base with a table
> CREATE TABLE [TestTable] ([Id] Integer Primary key autoincrement, [a] Text,
> [b] Text, [c] Text, [d] Text);
> and filling it with 1,000,000 records having a, b,c,d about 100 bytes each.
> I saw no big difference between 1024 and 8192 page sizes. When copy the db
> to Nul took about 20 seconds, 1024-select count took 25 seconds and 8192 -
> 20 seconds.
>
> Anyway, select count usually is a very slow operation regardless of the
> tweaks we can use, almost in every scenario it's equivalent to reading all
> the data of the table. I suppose the only exception is if your records is
> very big to take advantage of bypassing overflow data. So, probably in a
> table with 100 records each containing a 1MB text, 'select count' will be
> very fast
>
> Max Vlasov
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-25 Thread Max Vlasov
On Fri, Feb 25, 2011 at 1:51 AM, Greg Barker  wrote:

> I ran my tests again, this time setting the cache_size based on the
> page_size. For some reason I wasn't able to replicate the timing I got
> before for the 1024 page size (40.2 seconds) which is really confusing me,
> since I was careful to make sure nothing was running in the background
> while
> I was testing.
>
> page_size/cache_size: SELECT COUNT time
> 1024/2000: 20.83s
> 4096/500: 14.4s
> 8192/250: 8.8s
> 16384/125: 5.0s
> 32768/62: 3.6s
>
> I'm assuming reducing the cache size like that will definitely have an
> impact on the performance of an actual application? Optimizing performance
> for an application where both the queries and data can take many different
> shapes and sizes is beginning to seem like quite a daunting task. What do
> you do if there could be anywhere between 30-150 columns?
>
>
Greg, first, I suggest to forget about sqlite cache for scanning operations.
This is because the os cache is also used and it's a known fact that they
sometimes store the same data twice, so disabling or decreasing one still
leaves another in effect.

I saw that the db you have is about 400 MB in size. Giving this, if the
fastest is 3.6 seconds, you have a very fast harddrive. Can you confirm
this? The scanning of such big base with at least 5-years old hd definitely
should be 10 seconds or even slower. Did you reboot your comp this time? By
the way, a faster way to clear the system cache for a particular file is to
"touch" the file with CreateFile(...FILE_FLAG_NO_BUFFERING).

I tried to emulate your base with a table
CREATE TABLE [TestTable] ([Id] Integer Primary key autoincrement, [a] Text,
[b] Text, [c] Text, [d] Text);
and filling it with 1,000,000 records having a, b,c,d about 100 bytes each.
I saw no big difference between 1024 and 8192 page sizes. When copy the db
to Nul took about 20 seconds, 1024-select count took 25 seconds and 8192 -
20 seconds.

Anyway, select count usually is a very slow operation regardless of the
tweaks we can use, almost in every scenario it's equivalent to reading all
the data of the table. I suppose the only exception is if your records is
very big to take advantage of bypassing overflow data. So, probably in a
table with 100 records each containing a 1MB text, 'select count' will be
very fast

Max Vlasov
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-25 Thread Simon Slavin

On 24 Feb 2011, at 10:51pm, Greg Barker wrote:

> What do
> you do if there could be anywhere between 30-150 columns?

I would never have any table with 150 columns.  It should be possible to keep 
the schema for your table in your head.

> Optimizing performance
> for an application where both the queries and data can take many different
> shapes and sizes is beginning to seem like quite a daunting task.

For 'daunting' read 'impossible', since the optimal setup will vary over the 
life of the system you're writing, as the tables get bigger, the operating 
system gets slower and the hardware gets faster.  If there was a good general 
solution it would be built into SQLite as the default setting.

There's a big difference between 'fast enough to do what's wanted' and 'as fast 
as possible without heroic measures'.  The first is vital to the success of the 
project.  The second is just bragging rights.  I almost never make performance 
tweaks because although they might save me 600 milliseconds on the sort of 
lookup my users actually do, my users don't notice a difference of 2/3rds of a 
second.

My employer pays about 240 dollars a day for my time.  So for 480 dollars I 
could spend two days optimizing one of my schema or it could buy a faster 
computer (hard disk ?) and speed up not only database access but also 
everything else done with that computer.  Since I have too much to do as it is, 
I'm unlikely to spend the extra two days and emerge with a program which does 
weird things in a weird way just to save a second or two.  I'm curious what my 
customers would have chosen back when I was contracting and being paid a great 
deal more than 240 dollars a day for my time.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-25 Thread Greg Barker
I ran my tests again, this time setting the cache_size based on the
page_size. For some reason I wasn't able to replicate the timing I got
before for the 1024 page size (40.2 seconds) which is really confusing me,
since I was careful to make sure nothing was running in the background while
I was testing.

page_size/cache_size: SELECT COUNT time
1024/2000: 20.83s
4096/500: 14.4s
8192/250: 8.8s
16384/125: 5.0s
32768/62: 3.6s

I'm assuming reducing the cache size like that will definitely have an
impact on the performance of an actual application? Optimizing performance
for an application where both the queries and data can take many different
shapes and sizes is beginning to seem like quite a daunting task. What do
you do if there could be anywhere between 30-150 columns?

I'll post the times I got before, when using the default cache_size of 2000:
1024: 40.2s
4096: 15.5s
8192: 8.5s
16384: 5.3s
32768: 3.8s

On Thu, Feb 24, 2011 at 11:19 AM, Greg Barker  wrote:

> Average payload per entry for my test table was 409.00. Sounds about right
> since the db has 4 columns and each is filled with a random string of length
> 100.
>
> I've uploaded the full output from the sqlite3_analyzer for a few different
> page sizes:
>
> http://fletchowns.net/files/1024-analyzer.txt
> http://fletchowns.net/files/4096-analyzer.txt
> http://fletchowns.net/files/8192-analyzer.txt
>
> I'm not quite sure how to interpret everything in there. What stands out to
> you guys? What is Average Fanout?
>
> Greg
>
>
> On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov  wrote:
>
>> Yes, Greg, please post this value for this table you use in count query
>>
>> Max
>>
>>
>> On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker 
>> wrote:
>>
>> > Is the record size you refer to here the same as the "Average payload
>> per
>> > entry" that sqlite3_analyzer determines for me?
>> >
>> > On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov 
>> wrote:
>> >
>> > > Greg, you should also take the record size into account. My hypothesis
>> is
>> > > that if your record is comparatively small (several fits into 1024)
>> the
>> > > speed of select count will be the same for any page size (my quick
>> tests
>> > > confirm this). It's interesting to know what is an average size of
>> your
>> > > record to understand why the numbers are so different.
>> > >
>> > > Max
>> >
>> >
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-25 Thread Greg Barker
Average payload per entry for my test table was 409.00. Sounds about right
since the db has 4 columns and each is filled with a random string of length
100.

I've uploaded the full output from the sqlite3_analyzer for a few different
page sizes:

http://fletchowns.net/files/1024-analyzer.txt
http://fletchowns.net/files/4096-analyzer.txt
http://fletchowns.net/files/8192-analyzer.txt

I'm not quite sure how to interpret everything in there. What stands out to
you guys? What is Average Fanout?

Greg

On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov  wrote:

> Yes, Greg, please post this value for this table you use in count query
>
> Max
>
>
> On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker 
> wrote:
>
> > Is the record size you refer to here the same as the "Average payload per
> > entry" that sqlite3_analyzer determines for me?
> >
> > On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov 
> wrote:
> >
> > > Greg, you should also take the record size into account. My hypothesis
> is
> > > that if your record is comparatively small (several fits into 1024) the
> > > speed of select count will be the same for any page size (my quick
> tests
> > > confirm this). It's interesting to know what is an average size of your
> > > record to understand why the numbers are so different.
> > >
> > > Max
> >
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-24 Thread Sven L

Note sure. Will get back on this as soon as I have played a bit with the 
analyzer utility!
 
What I meant was "My own estimated x bytes/row" :)
x + ID + ~20 characters should make some ~32 bytes :)
 
> Date: Wed, 23 Feb 2011 10:58:00 -0800
> From: fle...@fletchowns.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> Is the record size you refer to here the same as the "Average payload per
> entry" that sqlite3_analyzer determines for me?
> 
> On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov <max.vla...@gmail.com> wrote:
> 
> > Greg, you should also take the record size into account. My hypothesis is
> > that if your record is comparatively small (several fits into 1024) the
> > speed of select count will be the same for any page size (my quick tests
> > confirm this). It's interesting to know what is an average size of your
> > record to understand why the numbers are so different.
> >
> > Max
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-24 Thread Max Vlasov
Yes, Greg, please post this value for this table you use in count query

Max


On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker  wrote:

> Is the record size you refer to here the same as the "Average payload per
> entry" that sqlite3_analyzer determines for me?
>
> On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov  wrote:
>
> > Greg, you should also take the record size into account. My hypothesis is
> > that if your record is comparatively small (several fits into 1024) the
> > speed of select count will be the same for any page size (my quick tests
> > confirm this). It's interesting to know what is an average size of your
> > record to understand why the numbers are so different.
> >
> > Max
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-24 Thread Greg Barker
Is the record size you refer to here the same as the "Average payload per
entry" that sqlite3_analyzer determines for me?

On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov  wrote:

> Greg, you should also take the record size into account. My hypothesis is
> that if your record is comparatively small (several fits into 1024) the
> speed of select count will be the same for any page size (my quick tests
> confirm this). It's interesting to know what is an average size of your
> record to understand why the numbers are so different.
>
> Max
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Jim Wilcoxson
On Wed, Feb 23, 2011 at 11:12 AM, Sven L <larvpo...@hotmail.se> wrote:
>
> Thanks for pointing this out!
>
> In my case I have spent much time on normalizing my tables, so the row size 
> should be constant in most cases. I do wonder though, what if the row size is 
> 32 bytes? Or is there a minimum?
>
> For instance, I have many lookup tables with ID+text (usually around 20 
> characters):
> MyID|MyText
>
> With a page size of 4096, will SQLite put ~200 rows in one page?

Yes, very roughly.  There is other internal information: a header on
each page, on each row, on each field, ints are variable length, etc.,
and SQLite reserves some free space on each page for later inserts.

Use sqlite3_analyzer for lots of useful info when picking a page size.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
>
>
>> Date: Wed, 23 Feb 2011 10:47:03 -0500
>> From: pri...@gmail.com
>> To: t...@djii.com; sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>> The SQLite cache size is in pages (2000 by default), so by increasing
>> the page size 8x, you're also increasing the cache size and memory
>> requirements by 8x. Not saying it's a bad thing, just something to be
>> aware of.
>>
>> If you want to compare 1K and 8K page size and only compare the effect
>> page size has, you should either increase the cache size to 16000 for
>> 1K pages or decrease the cache to 250 for 8K pages.
>>
>> The other thing to be aware of is that SQLite will not allow a row to
>> cross 2 pages. (It does allow a row to be larger than a page, using
>> an overflow page.) So for example, if your page size is 1024 and row
>> size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this
>> somewhat and ignoring internal SQLite data, but you get the idea. If
>> your row size is 513 bytes, you will have 511 bytes of waste on each
>> page, so 50% of your database will be "air". As your row size heads
>> toward 1024 there will be less waste. At 1025 bytes, SQLite will
>> start splitting rows into overflow pages, putting 1024 bytes into the
>> overflow page and 1 byte in the btree page. These numbers aren't
>> right, but illustrate the point.
>>
>> So to find a good page size, experiment and measure.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Wed, Feb 23, 2011 at 10:20 AM, Teg <t...@djii.com> wrote:
>> > Hello Greg,
>> >
>> > I found this to be the case too. The difference between 1K and 8K is
>> > staggering. I default all my windows DB's to 8K now.
>> >
>> >
>> > Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
>> >
>> > GB> I'm currently dealing with a similar issue. I've found that the 
>> > page_size
>> > GB> PRAGMA setting can have a dramatic effect on how long it takes to 
>> > "warm up"
>> > GB> the table. On Windows 7, with page_size=1024, a SELECT 
>> > COUNT(last_column)
>> > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 
>> > takes
>> > GB> 8.5 seconds. This was done with a reboot between each test.
>> >
>> >
>> >
>> >
>> > --
>> > Best regards,
>> >  Teg                            mailto:t...@djii.com
>> >
>> > ___
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Sven L

Thanks for pointing this out!
 
In my case I have spent much time on normalizing my tables, so the row size 
should be constant in most cases. I do wonder though, what if the row size is 
32 bytes? Or is there a minimum?
 
For instance, I have many lookup tables with ID+text (usually around 20 
characters):
MyID|MyText
 
With a page size of 4096, will SQLite put ~200 rows in one page?
 

 
> Date: Wed, 23 Feb 2011 10:47:03 -0500
> From: pri...@gmail.com
> To: t...@djii.com; sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> The SQLite cache size is in pages (2000 by default), so by increasing
> the page size 8x, you're also increasing the cache size and memory
> requirements by 8x. Not saying it's a bad thing, just something to be
> aware of.
> 
> If you want to compare 1K and 8K page size and only compare the effect
> page size has, you should either increase the cache size to 16000 for
> 1K pages or decrease the cache to 250 for 8K pages.
> 
> The other thing to be aware of is that SQLite will not allow a row to
> cross 2 pages. (It does allow a row to be larger than a page, using
> an overflow page.) So for example, if your page size is 1024 and row
> size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this
> somewhat and ignoring internal SQLite data, but you get the idea. If
> your row size is 513 bytes, you will have 511 bytes of waste on each
> page, so 50% of your database will be "air". As your row size heads
> toward 1024 there will be less waste. At 1025 bytes, SQLite will
> start splitting rows into overflow pages, putting 1024 bytes into the
> overflow page and 1 byte in the btree page. These numbers aren't
> right, but illustrate the point.
> 
> So to find a good page size, experiment and measure.
> 
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> 
> 
> 
> On Wed, Feb 23, 2011 at 10:20 AM, Teg <t...@djii.com> wrote:
> > Hello Greg,
> >
> > I found this to be the case too. The difference between 1K and 8K is
> > staggering. I default all my windows DB's to 8K now.
> >
> >
> > Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
> >
> > GB> I'm currently dealing with a similar issue. I've found that the 
> > page_size
> > GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm 
> > up"
> > GB> the table. On Windows 7, with page_size=1024, a SELECT 
> > COUNT(last_column)
> > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 
> > takes
> > GB> 8.5 seconds. This was done with a reboot between each test.
> >
> >
> >
> >
> > --
> > Best regards,
> >  Tegmailto:t...@djii.com
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Jim Wilcoxson
The SQLite cache size is in pages (2000 by default), so by increasing
the page size 8x, you're also increasing the cache size and memory
requirements by 8x.  Not saying it's a bad thing, just something to be
aware of.

If you want to compare 1K and 8K page size and only compare the effect
page size has, you should either increase the cache size to 16000 for
1K pages or decrease the cache to 250 for 8K pages.

The other thing to be aware of is that SQLite will not allow a row to
cross 2 pages.  (It does allow a row to be larger than a page, using
an overflow page.)  So for example, if your page size is 1024 and row
size is 512 bytes, you can fit 2 rows on a page.  I'm simplifying this
somewhat and ignoring internal SQLite data, but you get the idea.  If
your row size is 513 bytes, you will have 511 bytes of waste on each
page, so 50% of your database will be "air".  As your row size heads
toward 1024 there will be less waste.  At 1025 bytes, SQLite will
start splitting rows into overflow pages, putting 1024 bytes into the
overflow page and 1 byte in the btree page.  These numbers aren't
right, but illustrate the point.

So to find a good page size, experiment and measure.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com



On Wed, Feb 23, 2011 at 10:20 AM, Teg  wrote:
> Hello Greg,
>
> I found this to be the case too. The difference between 1K and 8K is
> staggering. I default all my windows DB's to 8K now.
>
>
> Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
>
> GB> I'm currently dealing with a similar issue. I've found that the page_size
> GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm 
> up"
> GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 
> takes
> GB> 8.5 seconds. This was done with a reboot between each test.
>
>
>
>
> --
> Best regards,
>  Teg                            mailto:t...@djii.com
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Jay A. Kreibich
On Wed, Feb 23, 2011 at 03:10:02PM +0100, Sven L scratched on the wall:
> 
> Does this trick work on the primary key? If not, why?

  Yes, all the time.
  
  Defining a column as a PK automatically creates a UNIQUE index
  over that column.  The only exception is when the column is an
  INTEGER PRIMARY KEY, in which case the column becomes the rowid and
  uses the table's native index.  In that case, to achieve the same
  result, one would need to manually create an index, as Max outlines
  below.

  In addition to fast counts, this setup is also very good for
  equi-joins, which tend to be somewhat common in most database
  designs.  (An equi-join is when you join table A to table B only for
  the purpose of filtering rows in A, and don't actually return any
  values from B as part of the result set.)

  Because of the way SQLite works internally, there are also tricks
  of creating indexes with "extra" columns.  For example, if you have
  a table with 23 columns, but you mostly use the PK and two additional
  columns, you can create an index over just those tree columns.  This
  will generally result in faster access to those two columns (for
  reasons that take several pages to explain).  Of course, even better
  would be to break things up into a primary and detail table, but
  that's a design consideration.

  Just beware that all of these indexes come at a cost.  If you
  database is strongly read dominated, you might consider some of these
  techniques.  On the other hand, if you're more or less read/write
  mixed, or write dominated, these techniques will cause an overall
  performance drop.  Indexes can be useful for reads, but they always
  come at a write (INSERT/UPDATE/DELETE) cost.

   -j


> > Returning to the original topic, for performance reasons I sometimes
> > recommend using an index created on the id/rowid. It's a strange construct
> > that makes no sense, but actually it sometimes give a speed improvement.
> >
> > This is because any index contains only the data used in it and if the query
> > doesn't require getting additional data from the table it was created for,
> > sqlite only reads this index and nothing else.

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Teg
Hello Greg,

I found this to be the case too. The difference between 1K and 8K is
staggering. I default all my windows DB's to 8K now.


Tuesday, February 22, 2011, 1:59:29 PM, you wrote:

GB> I'm currently dealing with a similar issue. I've found that the page_size
GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
GB> 8.5 seconds. This was done with a reboot between each test.




-- 
Best regards,
 Tegmailto:t...@djii.com

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Sven L

Does this trick work on the primary key? If not, why?
 
> From: max.vla...@gmail.com
> Date: Wed, 23 Feb 2011 16:09:04 +0300
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> On Tue, Feb 22, 2011 at 9:59 PM, Greg Barker <fle...@fletchowns.net> wrote:
> 
> > I'm currently dealing with a similar issue. I've found that the page_size
> > PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> > the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> > takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> > 8.5 seconds. This was done with a reboot between each test.
> >
> > This page recommends a page_size of 4096:
> > http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
> >
> > If I see better performance with the larger page sizes (going to test 16384
> > and beyond after this) is there any reason not to use them?
> >
> >
> Greg, you should also take the record size into account. My hypothesis is
> that if your record is comparatively small (several fits into 1024) the
> speed of select count will be the same for any page size (my quick tests
> confirm this). It's interesting to know what is an average size of your
> record to understand why the numbers are so different.
> 
> Returning to the original topic, for performance reasons I sometimes
> recommend using an index created on the id/rowid. It's a strange construct
> that makes no sense, but actually it sometimes give a speed improvement.
> This is because any index contains only the data used in it and if the query
> doesn't require getting additional data from the table it was created for,
> sqlite only reads this index and nothing else.
> 
> So to get the fastest count result one can create the following index
> (assuming id is the alias for rowid)
> 
> CREATE INDEX [idx_MyTableId] ON [MyTable] ([ID] )
> 
> And use the following query
> 
> SELECT COUNT(id) from (SELECT id FROM MyTable ORDER By Id)
> 
> "Order by" here forces using this index and I used outer select since
> count(id) inside the main select for unknown reasons triggers the table
> scanning.
> 
> For any query in my tests that usually takes 5-50 seconds, this one is
> always less than a second. But is costs a little in term of the size (the
> index takes space) and the speed of insert. If this is a small price to pay
> then this may be an answer.
> 
> Max
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Max Vlasov
On Tue, Feb 22, 2011 at 9:59 PM, Greg Barker  wrote:

> I'm currently dealing with a similar issue. I've found that the page_size
> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> 8.5 seconds. This was done with a reboot between each test.
>
> This page recommends a page_size of 4096:
> http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
>
> If I see better performance with the larger page sizes (going to test 16384
> and beyond after this) is there any reason not to use them?
>
>
Greg, you should also take the record size into account. My hypothesis is
that if your record is comparatively small (several fits into 1024) the
speed of select count will be the same for any page size (my quick tests
confirm this). It's interesting to know what is an average size of your
record to understand why the numbers are so different.

Returning to the original topic, for performance reasons I sometimes
recommend using an index created on the id/rowid. It's a strange construct
that makes no sense, but actually it sometimes give a speed improvement.
This is because any index contains only the data used in it and if the query
doesn't require getting additional data from the table it was created for,
sqlite only reads this index and nothing else.

So to get the fastest count result one can create the following index
(assuming id is the alias for rowid)

   CREATE INDEX [idx_MyTableId] ON [MyTable] ([ID] )

And use the following query

   SELECT COUNT(id) from (SELECT id FROM MyTable ORDER By Id)

"Order by" here forces using this index and I used outer select since
count(id) inside the main select for unknown reasons triggers the table
scanning.

For any query in my tests that usually takes 5-50 seconds, this one is
always less than a second. But is costs a little in term of the size (the
index takes space) and the speed of insert. If this is a small price to pay
then this may be an answer.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Sven L

Interesting!
I've been using "PRAGMA page_size = 4096;" in my software. Perhaps I should 
increase it and see if I can get a performance gain.
 
Does it affect INSERTs too?
 
> Date: Tue, 22 Feb 2011 10:59:29 -0800
> From: fle...@fletchowns.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> I'm currently dealing with a similar issue. I've found that the page_size
> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> 8.5 seconds. This was done with a reboot between each test.
> 
> This page recommends a page_size of 4096:
> http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
> 
> If I see better performance with the larger page sizes (going to test 16384
> and beyond after this) is there any reason not to use them?
> 
> Greg
> 
> On Mon, Feb 21, 2011 at 4:37 PM, Stephen Oberholtzer <
> oliverkloz...@gmail.com> wrote:
> 
> > On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
> > > On Mon, Feb 21, 2011 at 11:05 AM, Sven L <larvpo...@hotmail.se> wrote:
> > >>
> > >> Thank you for your detailed explanation!
> > >> First, can you please tell me how to purge the cache in Windows 7? This
> > could be very useful for my tests!
> > >
> > > Sorry, dunno for Windows. On Mac OSX it is the purge command, in the
> > > development tools. On Linux, you do: echo 3 >
> > > /prog/sys/vm/drop_caches
> >
> > Just make sure you either (a) quote the 3 (echo '3' >
> > /proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >.
> > If you don't quote it, and you don't put the space in (echo
> > 3>/proc/sys/vm/drop_caches) it doesn't do anything, for reasons I
> > won't go into.
> >
> > --
> > -- Stevie-O
> > Real programmers use COPY CON PROGRAM.EXE
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Greg Barker
I'm currently dealing with a similar issue. I've found that the page_size
PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
8.5 seconds. This was done with a reboot between each test.

This page recommends a page_size of 4096:
http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows

If I see better performance with the larger page sizes (going to test 16384
and beyond after this) is there any reason not to use them?

Greg

On Mon, Feb 21, 2011 at 4:37 PM, Stephen Oberholtzer <
oliverkloz...@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson  wrote:
> > On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
> >>
> >> Thank you for your detailed explanation!
> >> First, can you please tell me how to purge the cache in Windows 7? This
> could be very useful for my tests!
> >
> > Sorry, dunno for Windows.  On Mac OSX it is the purge command, in the
> > development tools.  On Linux, you do: echo 3 >
> > /prog/sys/vm/drop_caches
>
> Just make sure you either (a) quote the 3 (echo '3' >
> /proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >.
> If you don't quote it, and you don't put the space in (echo
> 3>/proc/sys/vm/drop_caches) it doesn't do anything, for reasons I
> won't go into.
>
> --
> -- Stevie-O
> Real programmers use COPY CON PROGRAM.EXE
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Stephen Oberholtzer
On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson  wrote:
> On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
>>
>> Thank you for your detailed explanation!
>> First, can you please tell me how to purge the cache in Windows 7? This 
>> could be very useful for my tests!
>
> Sorry, dunno for Windows.  On Mac OSX it is the purge command, in the
> development tools.  On Linux, you do: echo 3 >
> /prog/sys/vm/drop_caches

Just make sure you either (a) quote the 3 (echo '3' >
/proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >.
If you don't quote it, and you don't put the space in (echo
3>/proc/sys/vm/drop_caches) it doesn't do anything, for reasons I
won't go into.

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

Thanks :)
 
This did the trick:
First make a copy of the database: copy HugeDatabase.db HugeDatabase_copy.db
Then for each run, replace the database with its copy. This is why I thought 
the COUNT operation was somehow written to the database after its first run... 
:P

> From: slav...@bigfraud.org
> Date: Mon, 21 Feb 2011 16:56:01 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> 
> On 21 Feb 2011, at 4:35pm, Jim Wilcoxson wrote:
> 
> > On Mon, Feb 21, 2011 at 11:05 AM, Sven L <larvpo...@hotmail.se> wrote:
> >> 
> >> Thank you for your detailed explanation!
> >> First, can you please tell me how to purge the cache in Windows 7? This 
> >> could be very useful for my tests!
> > 
> > Sorry, dunno for Windows. On Mac OSX it is the purge command, in the
> > development tools.
> 
> In current versions it's just /usr/bin/purge/, which should be in your path. 
> No idea why a non-programmer should need it, but there it is.
> 
> > On Linux, you do: echo 3 >
> > /prog/sys/vm/drop_caches
> 
> And in Windows it's ... almost impossible. You can sync to disk, using 
> fflush(), and there's no reason you can't do this from the command-line:
> 
> http://technet.microsoft.com/en-us/sysinternals/bb897438.aspx
> 
> But for a purge you have to invalidate the contents of cache, and there's no 
> way for an application to tell Windows to do this. For a start, each 
> application has its own user mode address cache, so if you run another 
> utility to flush cache, it just messes with its own space. Second, there's no 
> system call that allows you access to the cache because Microsoft considers 
> it private to the device level. I am not dissing Microsoft for this: there 
> are fair arguments that this is actually the Right Way to do it.
> 
> So the way to do it is to overrun the cache by yourself. If you know your 
> cache is 2Gig, find a 2Gig disk file that has nothing to do with your test 
> suite and read it. Or generate 2Gig of gibberish and write it to disk, then 
> delete that file. Pah.
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Simon Slavin

On 21 Feb 2011, at 4:35pm, Jim Wilcoxson wrote:

> On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
>> 
>> Thank you for your detailed explanation!
>> First, can you please tell me how to purge the cache in Windows 7? This 
>> could be very useful for my tests!
> 
> Sorry, dunno for Windows.  On Mac OSX it is the purge command, in the
> development tools.

In current versions it's just /usr/bin/purge/, which should be in your path.  
No idea why a non-programmer should need it, but there it is.

> On Linux, you do: echo 3 >
> /prog/sys/vm/drop_caches

And in Windows it's ... almost impossible.  You can sync to disk, using 
fflush(), and there's no reason you can't do this from the command-line:

http://technet.microsoft.com/en-us/sysinternals/bb897438.aspx

But for a purge you have to invalidate the contents of cache, and there's no 
way for an application to tell Windows to do this.  For a start, each 
application has its own user mode address cache, so if you run another utility 
to flush cache, it just messes with its own space.  Second, there's no system 
call that allows you access to the cache because Microsoft considers it private 
to the device level.  I am not dissing Microsoft for this: there are fair 
arguments that this is actually the Right Way to do it.

So the way to do it is to overrun the cache by yourself.  If you know your 
cache is 2Gig, find a 2Gig disk file that has nothing to do with your test 
suite and read it.  Or generate 2Gig of gibberish and write it to disk, then 
delete that file.  Pah.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jim Wilcoxson
(logid) the first time, and 8x faster once all
pages are cached.  This is just what happens on my system (Mac OSX),
and may have nothing at all to do with the performance on yours.

Jim

>
> Thanks again!
>
>
>> Date: Mon, 21 Feb 2011 10:17:03 -0500
>> From: pri...@gmail.com
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>> This is a common issue on the mailing list. The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>>
>> Here's a timing from my own system, after a purge command to clear the
>> buffer cache:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real 0m0.580s
>> user 0m0.190s
>> sys 0m0.034s
>>
>> Same command again, with the file cached:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real 0m0.189s
>> user 0m0.165s
>> sys 0m0.019s
>>
>> This time is consistent no matter how many times I run it, because the
>> file is still cached. Doing a purge command to clear the cache and
>> re-running the query, we get:
>>
>> $ purge
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real 0m0.427s
>> user 0m0.175s
>> sys 0m0.024s
>>
>> On my system, there is not a huge difference, but it is consistent.
>> Now, if you have a fragmented file system, you will see a much larger
>> difference. There are many posts on the mailing list about both file
>> system fragmentation and logical fragmentation within the SQLite file
>> itself. Your first count(*) is subject to these fragmentation
>> effects, while your 2nd usually is not, because the file is in memory.
>>
>> Some people on the list believe fragmentation is an unimportant detail
>> you shouldn't worry about, because you have little control over it.
>> That may be true, but it's useful to understand how it can affect
>> performance. I think you are seeing this first hand.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>> >
>> > Same result :(
>> > Note that I have compiled SQLite with the following switches:
>> > SQLITE_ENABLE_STAT2
>> > SQLITE_THREADSAFE=2
>> >
>> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
>> > engine has to traverse all columns and it might even return another value 
>> > if there are NULL-values...
>> >
>> > Also, this is quite interesting:
>> >
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
>> > 0|0|0|SCAN TABLE Items (~100 rows)
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
>> > sqlite>
>> >
>> > I would expect an index scan on the first statement. The second statement 
>> > tells me nada?!
>> >
>> > Thanks for your help!
>> >
>> >
>> >> From: slav...@bigfraud.org
>> >> Date: Mon, 21 Feb 2011 14:24:50 +
>> >> To: sqlite-users@sqlite.org
>> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
>> >>
>> >>
>> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>> >>
>> >> > SELECT COUNT(ItemID) FROM Items;
>> >> >
>> >> > This takes around 40 seconds the first time! WHY?!
>> >>
>> >> Try again, doing everything identically except that instead of the above 
>> >> line use
>> >>
>> >> SELECT COUNT(*) FROM Items;
>> >>
>> >> Simon.
>> >> ___
>> >> sqlite-users mailing list
>> >> sqlite-users@sqlite.org
>> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> > ___
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jay A. Kreibich
On Mon, Feb 21, 2011 at 03:37:50PM +0100, Sven L scratched on the wall:

> I've learnt that COUNT(*) is slower than COUNT(ID),

  That's usually not true.

> since * means the engine has to traverse all columns 

  Actually, count(*) is the one case when the engine does *not* need to
  traverse any columns.

> and it might even return another value if there are NULL-values...

  Yes, but that's how count() is defined to work.

  The expression "count(id)" only counts rows where id is not NULL. 
  This requires that the database engine retrieve the value of the id
  column from each row in order to test for NULL.
  
  The expression "count(*)" strictly counts rows, without concern over
  NULLs.  As such, there is no need to actually retrieve any row data,
  because there is no need to test any row values.  The count(*)
  function can scan the table tree, counting the number of rows,
  without actually loading or decoding the row data.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jim Wilcoxson
On Mon, Feb 21, 2011 at 10:38 AM, Puneet Kishor <punk.k...@gmail.com> wrote:
> On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
>> This is a common issue on the mailing list.  The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>
>
> One question I have related to the above -- how long does that buffer
> cache remain filled with the data? I am assuming it is until new stuff
> to be cached pushes out old stuff, no?

For most OS's, the time data remains in the cache and the size of the
cache will vary as a function of available RAM.

>
> I was doing some R*Tree selects, and the first query was dog slow,
> although benchmarking showed that the actual CPU time was very small.
> Subsequent queries were lightning fast. I am assuming that the buffer is
> not getting filled with the results as much as it is getting filled with
> whatever part of the db that the program needs to open to do its work.

Right.  SQLite doesn't cache query results.  It does cache database
pages in its own cache, which by default is rather small: 2000 pages.
At the default page size is 1K, that's a 2MB cache.  To fetch records
not in it's own cache, SQLite will use the OS.  If the page is in the
OS cache, there is no seek time and no read latency (for spinning
media).

Jim

>
>
>>
>> Here's a timing from my own system, after a purge command to clear the
>> buffer cache:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real  0m0.580s
>> user  0m0.190s
>> sys   0m0.034s
>>
>> Same command again, with the file cached:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real  0m0.189s
>> user  0m0.165s
>> sys   0m0.019s
>>
>> This time is consistent no matter how many times I run it, because the
>> file is still cached.  Doing a purge command to clear the cache and
>> re-running the query, we get:
>>
>> $ purge
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real  0m0.427s
>> user  0m0.175s
>> sys   0m0.024s
>>
>> On my system, there is not a huge difference, but it is consistent.
>> Now, if you have a fragmented file system, you will see a much larger
>> difference.  There are many posts on the mailing list about both file
>> system fragmentation and logical fragmentation within the SQLite file
>> itself.  Your first count(*) is subject to these fragmentation
>> effects, while your 2nd usually is not, because the file is in memory.
>>
>> Some people on the list believe fragmentation is an unimportant detail
>> you shouldn't worry about, because you have little control over it.
>> That may be true, but it's useful to understand how it can affect
>> performance.  I think you are seeing this first hand.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>> >
>> > Same result :(
>> > Note that I have compiled SQLite with the following switches:
>> > SQLITE_ENABLE_STAT2
>> > SQLITE_THREADSAFE=2
>> >
>> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
>> > engine has to traverse all columns and it might even return another value 
>> > if there are NULL-values...
>> >
>> > Also, this is quite interesting:
>> >
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
>> > 0|0|0|SCAN TABLE Items (~100 rows)
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
>> > sqlite>
>> >
>> > I would expect an index scan on the first statement. The second statement 
>> > tells me nada?!
>> >
>> > Thanks for your help!
>> >
>> >
>> >> From: slav...@bigfraud.org
>> >> Date: Mon, 21 Feb 2011 14:24:50 +
>> >> To: sqlite-users@sqlite.org
>> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
>> >>
>> >>
>> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>> >>
>> >> > SELECT COUNT(ItemID) FROM

Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

Thank you for your detailed explanation!
First, can you please tell me how to purge the cache in Windows 7? This could 
be very useful for my tests!
 
I'm quite sure my database itself is not fragmented, since I have only inserted 
data. The file system is in good shape too; Windows reports 0% fragmentation. 
Perhaps there is some other bottleneck, like disk performance in general (this 
is a 5400 rpm)? Or antivirus-related CPU holds? (I did turn real-time scanning 
off though.)
I have even turned Windows Search off (and got a 20% performance gain!). My 
32-bit application is running under Windows 7 (64-bit). Could WOW64 have 
something to do with this performance issue?
 
The size of the database is 1,98 GB, with 1.5M rows. Couldn't it be that the 
table scan simply is darn slow for huge tables?
 
In an ideal world the COUNT() would be performed on the primary key in RAM. 
That's not possible? :P
 
Thanks again!
 
 
> Date: Mon, 21 Feb 2011 10:17:03 -0500
> From: pri...@gmail.com
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> This is a common issue on the mailing list. The first time you do
> count(*), SQLite (actually your OS) has to load data into memory.
> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
> doesn't have to read from disk.
> 
> Here's a timing from my own system, after a purge command to clear the
> buffer cache:
> 
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real 0m0.580s
> user 0m0.190s
> sys 0m0.034s
> 
> Same command again, with the file cached:
> 
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real 0m0.189s
> user 0m0.165s
> sys 0m0.019s
> 
> This time is consistent no matter how many times I run it, because the
> file is still cached. Doing a purge command to clear the cache and
> re-running the query, we get:
> 
> $ purge
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real 0m0.427s
> user 0m0.175s
> sys 0m0.024s
> 
> On my system, there is not a huge difference, but it is consistent.
> Now, if you have a fragmented file system, you will see a much larger
> difference. There are many posts on the mailing list about both file
> system fragmentation and logical fragmentation within the SQLite file
> itself. Your first count(*) is subject to these fragmentation
> effects, while your 2nd usually is not, because the file is in memory.
> 
> Some people on the list believe fragmentation is an unimportant detail
> you shouldn't worry about, because you have little control over it.
> That may be true, but it's useful to understand how it can affect
> performance. I think you are seeing this first hand.
> 
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> 
> 
> 
> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
> >
> > Same result :(
> > Note that I have compiled SQLite with the following switches:
> > SQLITE_ENABLE_STAT2
> > SQLITE_THREADSAFE=2
> >
> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
> > engine has to traverse all columns and it might even return another value 
> > if there are NULL-values...
> >
> > Also, this is quite interesting:
> >
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> > 0|0|0|SCAN TABLE Items (~100 rows)
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> > sqlite>
> >
> > I would expect an index scan on the first statement. The second statement 
> > tells me nada?!
> >
> > Thanks for your help!
> >
> >
> >> From: slav...@bigfraud.org
> >> Date: Mon, 21 Feb 2011 14:24:50 +
> >> To: sqlite-users@sqlite.org
> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
> >>
> >>
> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> >>
> >> > SELECT COUNT(ItemID) FROM Items;
> >> >
> >> > This takes around 40 seconds the first time! WHY?!
> >>
> >> Try again, doing everything identically except that instead of the above 
> >> line use
> >>
> >> SELECT COUNT(*) FROM Items;
> >>
> >> Simon.
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Puneet Kishor
On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
> This is a common issue on the mailing list.  The first time you do
> count(*), SQLite (actually your OS) has to load data into memory.
> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
> doesn't have to read from disk.


One question I have related to the above -- how long does that buffer
cache remain filled with the data? I am assuming it is until new stuff
to be cached pushes out old stuff, no?

I was doing some R*Tree selects, and the first query was dog slow,
although benchmarking showed that the actual CPU time was very small.
Subsequent queries were lightning fast. I am assuming that the buffer is
not getting filled with the results as much as it is getting filled with
whatever part of the db that the program needs to open to do its work.


> 
> Here's a timing from my own system, after a purge command to clear the
> buffer cache:
> 
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real  0m0.580s
> user  0m0.190s
> sys   0m0.034s
> 
> Same command again, with the file cached:
> 
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real  0m0.189s
> user  0m0.165s
> sys   0m0.019s
> 
> This time is consistent no matter how many times I run it, because the
> file is still cached.  Doing a purge command to clear the cache and
> re-running the query, we get:
> 
> $ purge
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real  0m0.427s
> user  0m0.175s
> sys   0m0.024s
> 
> On my system, there is not a huge difference, but it is consistent.
> Now, if you have a fragmented file system, you will see a much larger
> difference.  There are many posts on the mailing list about both file
> system fragmentation and logical fragmentation within the SQLite file
> itself.  Your first count(*) is subject to these fragmentation
> effects, while your 2nd usually is not, because the file is in memory.
> 
> Some people on the list believe fragmentation is an unimportant detail
> you shouldn't worry about, because you have little control over it.
> That may be true, but it's useful to understand how it can affect
> performance.  I think you are seeing this first hand.
> 
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> 
> 
> 
> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
> >
> > Same result :(
> > Note that I have compiled SQLite with the following switches:
> > SQLITE_ENABLE_STAT2
> > SQLITE_THREADSAFE=2
> >
> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
> > engine has to traverse all columns and it might even return another value 
> > if there are NULL-values...
> >
> > Also, this is quite interesting:
> >
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> > 0|0|0|SCAN TABLE Items (~100 rows)
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> > sqlite>
> >
> > I would expect an index scan on the first statement. The second statement 
> > tells me nada?!
> >
> > Thanks for your help!
> >
> >
> >> From: slav...@bigfraud.org
> >> Date: Mon, 21 Feb 2011 14:24:50 +
> >> To: sqlite-users@sqlite.org
> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
> >>
> >>
> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> >>
> >> > SELECT COUNT(ItemID) FROM Items;
> >> >
> >> > This takes around 40 seconds the first time! WHY?!
> >>
> >> Try again, doing everything identically except that instead of the above 
> >> line use
> >>
> >> SELECT COUNT(*) FROM Items;
> >>
> >> Simon.
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jim Wilcoxson
This is a common issue on the mailing list.  The first time you do
count(*), SQLite (actually your OS) has to load data into memory.
Most OS's will keep the file in a buffer cache, so the 2nd count(*)
doesn't have to read from disk.

Here's a timing from my own system, after a purge command to clear the
buffer cache:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real0m0.580s
user0m0.190s
sys 0m0.034s

Same command again, with the file cached:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real0m0.189s
user0m0.165s
sys 0m0.019s

This time is consistent no matter how many times I run it, because the
file is still cached.  Doing a purge command to clear the cache and
re-running the query, we get:

$ purge
$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real0m0.427s
user0m0.175s
sys 0m0.024s

On my system, there is not a huge difference, but it is consistent.
Now, if you have a fragmented file system, you will see a much larger
difference.  There are many posts on the mailing list about both file
system fragmentation and logical fragmentation within the SQLite file
itself.  Your first count(*) is subject to these fragmentation
effects, while your 2nd usually is not, because the file is in memory.

Some people on the list believe fragmentation is an unimportant detail
you shouldn't worry about, because you have little control over it.
That may be true, but it's useful to understand how it can affect
performance.  I think you are seeing this first hand.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com



On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>
> Same result :(
> Note that I have compiled SQLite with the following switches:
> SQLITE_ENABLE_STAT2
> SQLITE_THREADSAFE=2
>
> I've learnt that COUNT(*) is slower than COUNT(ID), since * means the engine 
> has to traverse all columns and it might even return another value if there 
> are NULL-values...
>
> Also, this is quite interesting:
>
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> 0|0|0|SCAN TABLE Items (~100 rows)
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> sqlite>
>
> I would expect an index scan on the first statement. The second statement 
> tells me nada?!
>
> Thanks for your help!
>
>
>> From: slav...@bigfraud.org
>> Date: Mon, 21 Feb 2011 14:24:50 +
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>>
>> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>>
>> > SELECT COUNT(ItemID) FROM Items;
>> >
>> > This takes around 40 seconds the first time! WHY?!
>>
>> Try again, doing everything identically except that instead of the above 
>> line use
>>
>> SELECT COUNT(*) FROM Items;
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

Same result :(
Note that I have compiled SQLite with the following switches:
SQLITE_ENABLE_STAT2
SQLITE_THREADSAFE=2
 
I've learnt that COUNT(*) is slower than COUNT(ID), since * means the engine 
has to traverse all columns and it might even return another value if there are 
NULL-values...
 
Also, this is quite interesting:
 
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
0|0|0|SCAN TABLE Items (~100 rows)
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
sqlite>

I would expect an index scan on the first statement. The second statement tells 
me nada?!
 
Thanks for your help!

 
> From: slav...@bigfraud.org
> Date: Mon, 21 Feb 2011 14:24:50 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> 
> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> 
> > SELECT COUNT(ItemID) FROM Items;
> > 
> > This takes around 40 seconds the first time! WHY?!
> 
> Try again, doing everything identically except that instead of the above line 
> use
> 
> SELECT COUNT(*) FROM Items;
> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Simon Slavin

On 21 Feb 2011, at 2:23pm, Sven L wrote:

> SELECT COUNT(ItemID) FROM Items;
> 
> This takes around 40 seconds the first time! WHY?!

Try again, doing everything identically except that instead of the above line 
use

SELECT COUNT(*) FROM Items;

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

SQLite 3.7.5.
 
I have a table (13 columns with 1 primary key, 1 index on a date column, 1 
UNIQUE-constraint on two columns).
I insert ~130 rows, 1000 at a time (transaction-wise).
I close the database, reopen it and immediately perform a COUNT-operation:
 
SELECT COUNT(ItemID) FROM Items;

This takes around 40 seconds the first time! WHY?!
I believe the value is somehow cached inside the database after the above call.
 
 
The table used:
 
CREATE TABLE IF NOT EXISTS Item
(
 ItemID INTEGER PRIMARY KEY,
 A INTEGER NOT NULL,
 B INTEGER NOT NULL,
 C INTEGER NOT NULL,
 D INTEGER NOT NULL,
 E INTEGER NOT NULL,
 F INTEGER NOT NULL,
 G INTEGER NOT NULL,
 H DATE NOT NULL,
 I CHAR(3) NOT NULL,
 J INTEGER NOT NULL,
 K INTEGER NOT NULL,
 L INTEGER NOT NULL,
 
 UNIQUE (B, A)
);
CREATE INDEX IF NOT EXISTS Item_H_Index ON Item (H);
  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users