Re: [sqlite] COUNT() extremely slow first time!
Did you do multiple runs and average? And how are you measuring time? Did you do buffered reads? And your MB/sec figure isn't accurate for the count() test; I think it was already said that count() doesn't read the entire record. The SQLite numbers are due to SQLite caching. If you make the page size 4096 you are 4 times less likely to need another disk read (SSD or otherwise).

That's different from what you're doing. You're one layer down from that. Your read makes perfect sense if I look at it as sampling error, which is what I would expect. As you increase your read page size you get a curve that saturates at 50MB/sec.

Also... you either have a really slow machine or your timing is wrong. I can do over 70MB/sec just reading from disk (2.6GHz Xeon, 7200rpm SAS drives, mirrored):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

double tic(int flag)
{
    static double t1;
    static long base_seconds;
    double t2;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    /* to give us the resolution we need for high speed we subtract our
       starting seconds -- otherwise 15 digits of accuracy aren't enough */
    if (base_seconds == 0) base_seconds = tv.tv_sec;
    if (flag == 0) {
        t1 = (tv.tv_sec - base_seconds) + tv.tv_usec / (double)1000000;
    }
    t2 = (tv.tv_sec - base_seconds) + tv.tv_usec / (double)1000000;
    return t2 - t1;
}

void test1(int bufsize)
{
    char *buf = malloc(bufsize);
    FILE *fp = fopen("test2.dat", "r");
    int n, nn = 0;
    while ((n = fread(buf, 1, bufsize, fp))) {
        nn += n;
    }
    fclose(fp);
    free(buf);
    printf("%d\n", nn / 1024);  /* KB read */
}

int main(int argc, char *argv[])
{
    tic(0);
    test1(atoi(argv[1]));
    printf("%f MB/sec\n", 51200 / 1024.0 / tic(1));  /* 51200 KB = 50 MB file */
    return 0;
}

dd if=/dev/zero of=test2.dat bs=1024k count=50
sync; echo 3 > /proc/sys/vm/drop_caches

./io 4096
51200
73.071591 MB/sec

Now that it's cached:

./io 4096
51200
1884.153971 MB/sec

I would hope your SSD can do better than my disk on raw throughput.

Michael D.
Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Max Vlasov [max.vla...@gmail.com]
Sent: Monday, February 28, 2011 8:52 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT :Re: COUNT() extremely slow first time!

Michael, it's an interesting hypothesis. Maybe you can explain the following results. Recently I made a test program that just reads a file sequentially with different block sizes, and the results are still confusing. This was the same SSD drive and the functions were basically from the api: CreateFile/ReadFile.

128, 256: 5 MB/sec
512: 20 MB/sec
1024: 25 MB/sec
2048: 41 MB/sec
4096 - 65536: ~50 MB/sec

Strangely, I didn't see such a big difference between the slowest and fastest scenario (if we exclude the sizes below 1024); as you see, the difference is only 2x. I have only one explanation looking at the numbers. Although it's well known that sqlite reads only full pages, if it sometimes does partial reading, this 5 MB/sec drop for <256 reads could drag a linear speed of 25 MB/sec down to an effective 12 MB/sec. But it's just a guess.

Max

On Mon, Feb 28, 2011 at 4:43 PM, Black, Michael (IS) wrote:
> Those numbers make sense to me.
>
> Since count is doing a linear walk through the table...correct?
>
> #1 SSDs more than likely don't do "read ahead" like a disk drive. So what
> you're seeing is what read-ahead does for you.
> #2 Count is doing a linear scan of the table...Probability of 2K containing
> the next page after 1K -- 100% (2X performance improvement)
> #3 Probability of 4K containing the next page after 2K -- 100% (2X
> improvement).
> #4 Probability of 8K containing the next page after 4K -- here the
> probability either drops or we're starting to hit the bandwidth of
> SSD+Sqlite -- I'm tempted to say that it's not the probability that drops.
>
>
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
> behalf of Max Vlasov [max.vla...@gmail.com]
>
> my timing for Select Count for 400-500 MB bases and a table with about
> 1,000,000 records:
>
> 1024: 40 seconds, 98% system+user time, 500 MB read, 12 MB/sec
> 2048: 22 seconds, 94% system+user time, 500 MB read, 20 MB/sec
> 4096 (the NTFS cluster size): 11 seconds, 96% system+user time, 450 MB read, 32 MB/sec
> 8192: 8 seconds, 87% system+user time, 420 MB read, 50 MB/sec
> 32768: 8 seconds, 56% system+user time, 410 MB read, 50 MB/sec
>
> Max
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] COUNT() extremely slow first time!
On Sun, Feb 27, 2011 at 3:11 PM, Max Vlasov wrote:
> Hmm, yesterday something struck me: I can do similar tests on an Asus T91MT
> having SSD as the storage. I got similar timings to Greg's. So it seems like
> page size is a very sensitive parameter for solid state drives. Looks like
> having the page_size lower than the cluster size on an NTFS drive carries a
> very high price for scanning-intensive operations. [...]

Page size (typically a power of two of at least 2^12, i.e., 4KB) and _alignment_ are critical in SSD applications. Make absolutely sure that the partition alignment is such that the filesystem can ensure 4KB blocks starting at 4KB boundaries on SSDs, or larger blocks starting on 4KB or larger boundaries.

Nico
--
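[Editor's note: Nico's alignment requirement reduces to simple arithmetic once you know the partition's starting sector (e.g. from `fdisk -l`). A minimal sketch, assuming the common 512-byte logical sector size:]

```python
# Sketch: check whether a partition's start sector is 4 KiB-aligned.
# Assumes 512-byte logical sectors (the common case); the start sector
# would come from e.g. `fdisk -l` or /sys/block/<disk>/<part>/start.

SECTOR_SIZE = 512          # logical sector size in bytes (assumption)
FLASH_PAGE = 4096          # SSD page size we want to align to

def is_aligned(start_sector, sector_size=SECTOR_SIZE, page=FLASH_PAGE):
    """True if the partition's first byte falls on a flash-page boundary."""
    return (start_sector * sector_size) % page == 0

# The old fdisk default start (sector 63) is misaligned;
# the modern default (sector 2048) is aligned.
print(is_aligned(63), is_aligned(2048))   # False True
```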
Re: [sqlite] COUNT() extremely slow first time!
On Sat, Feb 26, 2011 at 1:03 AM, Greg Barker wrote:
> > Giving this, if the fastest is 3.6 seconds, you have a very fast
> > harddrive. Can you confirm this?
>
> I can confirm this, my tests were run on a machine that uses a solid state
> drive.

Hmm, yesterday something struck me: I can do similar tests on an Asus T91MT having SSD as the storage. I got similar timings to Greg's. So it seems like page size is a very sensitive parameter for solid state drives. Looks like having the page_size lower than the cluster size on an NTFS drive carries a very high price for scanning-intensive operations.

What puzzles me is that CPU time for the 1024 'select count' can take about 98% of the total, so it's not the same situation as with rotating hard drives, where a program just waits for the drive to complete the operation. I doubt sqlite does something special, so maybe it's about Windows' adaptation to SSD specifics; it looks like it's far from ideal. Can someone point to some tests regarding this? Since SSDs become more and more widespread, I think it's better to understand what is going on, at least approximately.

My timing for Select Count for 400-500 MB bases and a table with about 1,000,000 records:

1024: 40 seconds, 98% system+user time, 500 MB read, 12 MB/sec
2048: 22 seconds, 94% system+user time, 500 MB read, 20 MB/sec
4096 (the NTFS cluster size): 11 seconds, 96% system+user time, 450 MB read, 32 MB/sec
8192: 8 seconds, 87% system+user time, 420 MB read, 50 MB/sec
32768: 8 seconds, 56% system+user time, 410 MB read, 50 MB/sec

Max
Re: [sqlite] COUNT() extremely slow first time!
On 25 Feb 2011, at 10:03pm, Greg Barker wrote:

> Thanks for the responses guys.
>
>> I would never have any table with 150 columns. It should be possible to
>> keep the schema for your table in your head.
>
> Unfortunately those are the cards I have been dealt. The columns are just
> buckets of data (bucket1, bucket2, bucket3, etc). Each bucket could be 1
> month's or 1 week's worth of data; it can vary from table to table. They can
> have up to two years' worth of data loaded, so that could be 24 monthly
> buckets or 104 weekly buckets.

So your schema really should have a table with 'bucketNumber' and 'bucketContents' as columns. That would make every bucket a separate row.

Simon.
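[Editor's note: Simon's normalization can be sketched as follows; the table and column names are hypothetical, chosen only to illustrate the one-row-per-bucket layout:]

```python
# Sketch of Simon's suggestion: one row per bucket instead of 100+ columns.
# All names here are made up for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE buckets (
                   itemId         INTEGER,
                   bucketNumber   INTEGER,
                   bucketContents TEXT,
                   PRIMARY KEY (itemId, bucketNumber))""")

# 104 weekly buckets become 104 rows, not 104 columns:
con.executemany("INSERT INTO buckets VALUES (1, ?, ?)",
                [(w, "week %d data" % w) for w in range(104)])

# The schema no longer has to change when the bucket count does.
n = con.execute("SELECT COUNT(*) FROM buckets WHERE itemId=1").fetchone()[0]
print(n)   # 104
```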
Re: [sqlite] COUNT() extremely slow first time!
Thanks for the responses guys.

> I would never have any table with 150 columns. It should be possible to
> keep the schema for your table in your head.

Unfortunately those are the cards I have been dealt. The columns are just buckets of data (bucket1, bucket2, bucket3, etc). Each bucket could be 1 month's or 1 week's worth of data; it can vary from table to table. They can have up to two years' worth of data loaded, so that could be 24 monthly buckets or 104 weekly buckets.

We don't have any control over the machines this code is running on, so hardware changes are not an option. These machines are definitely less than ideal for what we are trying to get them to do, but what can you do.

> Giving this, if the fastest is 3.6 seconds, you have a very fast
> harddrive. Can you confirm this?

I can confirm this, my tests were run on a machine that uses a solid state drive.

Side note - how do I become a member of this list? Do my posts always need to be approved?

On Fri, Feb 25, 2011 at 4:57 AM, Max Vlasov wrote:
> On Fri, Feb 25, 2011 at 1:51 AM, Greg Barker wrote:
>
> > I ran my tests again, this time setting the cache_size based on the
> > page_size. For some reason I wasn't able to replicate the timing I got
> > before for the 1024 page size (40.2 seconds) which is really confusing me,
> > since I was careful to make sure nothing was running in the background
> > while I was testing.
> >
> > page_size/cache_size: SELECT COUNT time
> > 1024/2000: 20.83s
> > 4096/500: 14.4s
> > 8192/250: 8.8s
> > 16384/125: 5.0s
> > 32768/62: 3.6s
> >
> > I'm assuming reducing the cache size like that will definitely have an
> > impact on the performance of an actual application? Optimizing performance
> > for an application where both the queries and data can take many different
> > shapes and sizes is beginning to seem like quite a daunting task. What do
> > you do if there could be anywhere between 30-150 columns?
>
> Greg, first, I suggest forgetting about the sqlite cache for scanning
> operations. This is because the os cache is also used, and it's a known fact
> that they sometimes store the same data twice, so disabling or decreasing
> one still leaves the other in effect.
>
> I saw that the db you have is about 400 MB in size. Given this, if the
> fastest is 3.6 seconds, you have a very fast harddrive. Can you confirm
> this? Scanning such a big base with an at least 5-year-old hd definitely
> should take 10 seconds or even longer. Did you reboot your comp this time? By
> the way, a faster way to clear the system cache for a particular file is to
> "touch" the file with CreateFile(...FILE_FLAG_NO_BUFFERING).
>
> I tried to emulate your base with a table
> CREATE TABLE [TestTable] ([Id] Integer Primary key autoincrement, [a] Text,
> [b] Text, [c] Text, [d] Text);
> and filled it with 1,000,000 records having a, b, c, d about 100 bytes each.
> I saw no big difference between 1024 and 8192 page sizes. Copying the db
> to Nul took about 20 seconds; with 1024 select count took 25 seconds and
> with 8192, 20 seconds.
>
> Anyway, select count usually is a very slow operation regardless of the
> tweaks we can use; in almost every scenario it's equivalent to reading all
> the data of the table. I suppose the only exception is if your records are
> very big, taking advantage of bypassing overflow data. So, probably in a
> table with 100 records each containing a 1 MB text, 'select count' will be
> very fast.
>
> Max Vlasov
Re: [sqlite] COUNT() extremely slow first time!
On Fri, Feb 25, 2011 at 1:51 AM, Greg Barker wrote:

> I ran my tests again, this time setting the cache_size based on the
> page_size. For some reason I wasn't able to replicate the timing I got
> before for the 1024 page size (40.2 seconds) which is really confusing me,
> since I was careful to make sure nothing was running in the background
> while I was testing.
>
> page_size/cache_size: SELECT COUNT time
> 1024/2000: 20.83s
> 4096/500: 14.4s
> 8192/250: 8.8s
> 16384/125: 5.0s
> 32768/62: 3.6s
>
> I'm assuming reducing the cache size like that will definitely have an
> impact on the performance of an actual application? Optimizing performance
> for an application where both the queries and data can take many different
> shapes and sizes is beginning to seem like quite a daunting task. What do
> you do if there could be anywhere between 30-150 columns?

Greg, first, I suggest forgetting about the sqlite cache for scanning operations. This is because the os cache is also used, and it's a known fact that they sometimes store the same data twice, so disabling or decreasing one still leaves the other in effect.

I saw that the db you have is about 400 MB in size. Given this, if the fastest is 3.6 seconds, you have a very fast harddrive. Can you confirm this? Scanning such a big base with an at least 5-year-old hd definitely should take 10 seconds or even longer. Did you reboot your comp this time? By the way, a faster way to clear the system cache for a particular file is to "touch" the file with CreateFile(...FILE_FLAG_NO_BUFFERING).

I tried to emulate your base with a table

CREATE TABLE [TestTable] ([Id] Integer Primary key autoincrement, [a] Text, [b] Text, [c] Text, [d] Text);

and filled it with 1,000,000 records having a, b, c, d about 100 bytes each. I saw no big difference between 1024 and 8192 page sizes. Copying the db to Nul took about 20 seconds; with 1024 select count took 25 seconds and with 8192, 20 seconds.
Anyway, select count usually is a very slow operation regardless of the tweaks we can use; in almost every scenario it's equivalent to reading all the data of the table. I suppose the only exception is if your records are very big, taking advantage of bypassing overflow data. So, probably in a table with 100 records each containing a 1 MB text, 'select count' will be very fast.

Max Vlasov
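[Editor's note: Max's emulation can be reproduced in miniature. This is a sketch, not his actual test program: it uses 10,000 rows instead of 1,000,000 and an in-memory database, so it only demonstrates the setup, not the cold-cache timings under discussion:]

```python
# Scaled-down sketch of Max's emulation: a table of four ~100-byte text
# columns, then time SELECT COUNT(*).  Row count is reduced from
# 1,000,000 to 10,000 so the sketch runs quickly; an in-memory database
# measures only the warm case, not the cold-cache behaviour discussed.
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("PRAGMA page_size=8192")           # must be set before first write
con.execute("""CREATE TABLE TestTable (
                   Id INTEGER PRIMARY KEY AUTOINCREMENT,
                   a TEXT, b TEXT, c TEXT, d TEXT)""")

row = ("x" * 100,) * 4                         # four ~100-byte text fields
con.executemany("INSERT INTO TestTable(a,b,c,d) VALUES (?,?,?,?)",
                [row] * 10000)

t0 = time.perf_counter()
count = con.execute("SELECT COUNT(*) FROM TestTable").fetchone()[0]
elapsed = time.perf_counter() - t0
print(count, elapsed)
```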
Re: [sqlite] COUNT() extremely slow first time!
On 24 Feb 2011, at 10:51pm, Greg Barker wrote:

> What do
> you do if there could be anywhere between 30-150 columns?

I would never have any table with 150 columns. It should be possible to keep the schema for your table in your head.

> Optimizing performance
> for an application where both the queries and data can take many different
> shapes and sizes is beginning to seem like quite a daunting task.

For 'daunting' read 'impossible', since the optimal setup will vary over the life of the system you're writing, as the tables get bigger, the operating system gets slower and the hardware gets faster. If there was a good general solution it would be built into SQLite as the default setting.

There's a big difference between 'fast enough to do what's wanted' and 'as fast as possible without heroic measures'. The first is vital to the success of the project. The second is just bragging rights.

I almost never make performance tweaks because although they might save me 600 milliseconds on the sort of lookup my users actually do, my users don't notice a difference of two-thirds of a second. My employer pays about 240 dollars a day for my time. So for 480 dollars I could spend two days optimizing one of my schemas, or that money could buy a faster computer (hard disk?) and speed up not only database access but also everything else done with that computer. Since I have too much to do as it is, I'm unlikely to spend the extra two days and emerge with a program which does weird things in a weird way just to save a second or two.

I'm curious what my customers would have chosen back when I was contracting and being paid a great deal more than 240 dollars a day for my time.

Simon.
Re: [sqlite] COUNT() extremely slow first time!
I ran my tests again, this time setting the cache_size based on the page_size. For some reason I wasn't able to replicate the timing I got before for the 1024 page size (40.2 seconds), which is really confusing me, since I was careful to make sure nothing was running in the background while I was testing.

page_size/cache_size: SELECT COUNT time
1024/2000: 20.83s
4096/500: 14.4s
8192/250: 8.8s
16384/125: 5.0s
32768/62: 3.6s

I'm assuming reducing the cache size like that will definitely have an impact on the performance of an actual application? Optimizing performance for an application where both the queries and data can take many different shapes and sizes is beginning to seem like quite a daunting task. What do you do if there could be anywhere between 30-150 columns?

I'll post the times I got before, when using the default cache_size of 2000:

1024: 40.2s
4096: 15.5s
8192: 8.5s
16384: 5.3s
32768: 3.8s

On Thu, Feb 24, 2011 at 11:19 AM, Greg Barker wrote:

> Average payload per entry for my test table was 409.00. Sounds about right
> since the db has 4 columns and each is filled with a random string of length
> 100.
>
> I've uploaded the full output from the sqlite3_analyzer for a few different
> page sizes:
>
> http://fletchowns.net/files/1024-analyzer.txt
> http://fletchowns.net/files/4096-analyzer.txt
> http://fletchowns.net/files/8192-analyzer.txt
>
> I'm not quite sure how to interpret everything in there. What stands out to
> you guys? What is Average Fanout?
>
> Greg
>
> On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov wrote:
>
>> Yes, Greg, please post this value for the table you use in the count query
>>
>> Max
>>
>> On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker wrote:
>>
>> > Is the record size you refer to here the same as the "Average payload per
>> > entry" that sqlite3_analyzer determines for me?
>> >
>> > On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov wrote:
>> >
>> > > Greg, you should also take the record size into account.
>> > > My hypothesis is
>> > > that if your record is comparatively small (several fit into 1024) the
>> > > speed of select count will be the same for any page size (my quick tests
>> > > confirm this). It's interesting to know the average size of your
>> > > record to understand why the numbers are so different.
>> > >
>> > > Max
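[Editor's note: Greg's page_size/cache_size pairs keep the cache's byte budget roughly constant (1024 x 2000 ~ 32768 x 62 ~ 2 MB). A sketch of that bookkeeping; the helper name is made up, and the page counts are rounded from an exact 2 MB budget rather than Greg's 2000-page baseline:]

```python
# Sketch: keep the page-cache *memory* roughly constant while varying
# page_size, as in Greg's table.  Helper name is hypothetical.
import sqlite3

CACHE_BYTES = 2 * 1024 * 1024      # ~2 MB budget (default is 2000 x 1024)

def open_with_page_size(path, page_size):
    con = sqlite3.connect(path)
    # page_size must be set before the database file is first written
    con.execute("PRAGMA page_size=%d" % page_size)
    con.execute("PRAGMA cache_size=%d" % (CACHE_BYTES // page_size))
    return con

for ps in (1024, 4096, 8192, 16384, 32768):
    print(ps, CACHE_BYTES // ps)   # pages of cache for a constant 2 MB
```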
Re: [sqlite] COUNT() extremely slow first time!
Average payload per entry for my test table was 409.00. Sounds about right since the db has 4 columns and each is filled with a random string of length 100.

I've uploaded the full output from the sqlite3_analyzer for a few different page sizes:

http://fletchowns.net/files/1024-analyzer.txt
http://fletchowns.net/files/4096-analyzer.txt
http://fletchowns.net/files/8192-analyzer.txt

I'm not quite sure how to interpret everything in there. What stands out to you guys? What is Average Fanout?

Greg

On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov wrote:
> Yes, Greg, please post this value for the table you use in the count query
>
> Max
>
> On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker wrote:
>
> > Is the record size you refer to here the same as the "Average payload per
> > entry" that sqlite3_analyzer determines for me?
> >
> > On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov wrote:
> >
> > > Greg, you should also take the record size into account. My hypothesis is
> > > that if your record is comparatively small (several fit into 1024) the
> > > speed of select count will be the same for any page size (my quick tests
> > > confirm this). It's interesting to know the average size of your
> > > record to understand why the numbers are so different.
> > >
> > > Max
Re: [sqlite] COUNT() extremely slow first time!
Not sure. Will get back on this as soon as I have played a bit with the analyzer utility!

What I meant was "my own estimated x bytes/row" :) x + ID + ~20 characters should make some ~32 bytes :)

> Date: Wed, 23 Feb 2011 10:58:00 -0800
> From: fle...@fletchowns.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
>
> Is the record size you refer to here the same as the "Average payload per
> entry" that sqlite3_analyzer determines for me?
>
> On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>
> > Greg, you should also take the record size into account. My hypothesis is
> > that if your record is comparatively small (several fit into 1024) the
> > speed of select count will be the same for any page size (my quick tests
> > confirm this). It's interesting to know the average size of your
> > record to understand why the numbers are so different.
> >
> > Max
Re: [sqlite] COUNT() extremely slow first time!
Yes, Greg, please post this value for the table you use in the count query

Max

On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker wrote:
> Is the record size you refer to here the same as the "Average payload per
> entry" that sqlite3_analyzer determines for me?
>
> On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov wrote:
>
> > Greg, you should also take the record size into account. My hypothesis is
> > that if your record is comparatively small (several fit into 1024) the
> > speed of select count will be the same for any page size (my quick tests
> > confirm this). It's interesting to know the average size of your
> > record to understand why the numbers are so different.
> >
> > Max
Re: [sqlite] COUNT() extremely slow first time!
Is the record size you refer to here the same as the "Average payload per entry" that sqlite3_analyzer determines for me?

On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov wrote:
> Greg, you should also take the record size into account. My hypothesis is
> that if your record is comparatively small (several fit into 1024) the
> speed of select count will be the same for any page size (my quick tests
> confirm this). It's interesting to know the average size of your
> record to understand why the numbers are so different.
>
> Max
Re: [sqlite] COUNT() extremely slow first time!
On Wed, Feb 23, 2011 at 11:12 AM, Sven L <larvpo...@hotmail.se> wrote: > > Thanks for pointing this out! > > In my case I have spent much time on normalizing my tables, so the row size > should be constant in most cases. I do wonder though, what if the row size is > 32 bytes? Or is there a minimum? > > For instance, I have many lookup tables with ID+text (usually around 20 > characters): > MyID|MyText > > With a page size of 4096, will SQLite put ~200 rows in one page? Yes, very roughly. There is other internal information: a header on each page, on each row, on each field, ints are variable length, etc., and SQLite reserves some free space on each page for later inserts. Use sqlite3_analyzer for lots of useful info when picking a page size. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com > > > >> Date: Wed, 23 Feb 2011 10:47:03 -0500 >> From: pri...@gmail.com >> To: t...@djii.com; sqlite-users@sqlite.org >> Subject: Re: [sqlite] COUNT() extremely slow first time! >> >> The SQLite cache size is in pages (2000 by default), so by increasing >> the page size 8x, you're also increasing the cache size and memory >> requirements by 8x. Not saying it's a bad thing, just something to be >> aware of. >> >> If you want to compare 1K and 8K page size and only compare the effect >> page size has, you should either increase the cache size to 16000 for >> 1K pages or decrease the cache to 250 for 8K pages. >> >> The other thing to be aware of is that SQLite will not allow a row to >> cross 2 pages. (It does allow a row to be larger than a page, using >> an overflow page.) So for example, if your page size is 1024 and row >> size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this >> somewhat and ignoring internal SQLite data, but you get the idea. If >> your row size is 513 bytes, you will have 511 bytes of waste on each >> page, so 50% of your database will be "air". As your row size heads >> toward 1024 there will be less waste. 
At 1025 bytes, SQLite will >> start splitting rows into overflow pages, putting 1024 bytes into the >> overflow page and 1 byte in the btree page. These numbers aren't >> right, but illustrate the point. >> >> So to find a good page size, experiment and measure. >> >> Jim >> -- >> HashBackup: easy onsite and offsite Unix backup >> http://www.hashbackup.com >> >> >> >> On Wed, Feb 23, 2011 at 10:20 AM, Teg <t...@djii.com> wrote: >> > Hello Greg, >> > >> > I found this to be the case too. The difference between 1K and 8K is >> > staggering. I default all my windows DB's to 8K now. >> > >> > >> > Tuesday, February 22, 2011, 1:59:29 PM, you wrote: >> > >> > GB> I'm currently dealing with a similar issue. I've found that the >> > page_size >> > GB> PRAGMA setting can have a dramatic effect on how long it takes to >> > "warm up" >> > GB> the table. On Windows 7, with page_size=1024, a SELECT >> > COUNT(last_column) >> > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 >> > takes >> > GB> 8.5 seconds. This was done with a reboot between each test. >> > >> > >> > >> > >> > -- >> > Best regards, >> > Teg mailto:t...@djii.com >> > >> > ___ >> > sqlite-users mailing list >> > sqlite-users@sqlite.org >> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] COUNT() extremely slow first time!
Thanks for pointing this out! In my case I have spent much time on normalizing my tables, so the row size should be constant in most cases. I do wonder though, what if the row size is 32 bytes? Or is there a minimum? For instance, I have many lookup tables with ID+text (usually around 20 characters): MyID|MyText With a page size of 4096, will SQLite put ~200 rows in one page? > Date: Wed, 23 Feb 2011 10:47:03 -0500 > From: pri...@gmail.com > To: t...@djii.com; sqlite-users@sqlite.org > Subject: Re: [sqlite] COUNT() extremely slow first time! > > The SQLite cache size is in pages (2000 by default), so by increasing > the page size 8x, you're also increasing the cache size and memory > requirements by 8x. Not saying it's a bad thing, just something to be > aware of. > > If you want to compare 1K and 8K page size and only compare the effect > page size has, you should either increase the cache size to 16000 for > 1K pages or decrease the cache to 250 for 8K pages. > > The other thing to be aware of is that SQLite will not allow a row to > cross 2 pages. (It does allow a row to be larger than a page, using > an overflow page.) So for example, if your page size is 1024 and row > size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this > somewhat and ignoring internal SQLite data, but you get the idea. If > your row size is 513 bytes, you will have 511 bytes of waste on each > page, so 50% of your database will be "air". As your row size heads > toward 1024 there will be less waste. At 1025 bytes, SQLite will > start splitting rows into overflow pages, putting 1024 bytes into the > overflow page and 1 byte in the btree page. These numbers aren't > right, but illustrate the point. > > So to find a good page size, experiment and measure. > > Jim > -- > HashBackup: easy onsite and offsite Unix backup > http://www.hashbackup.com > > > > On Wed, Feb 23, 2011 at 10:20 AM, Teg <t...@djii.com> wrote: > > Hello Greg, > > > > I found this to be the case too. 
The difference between 1K and 8K is > > staggering. I default all my windows DB's to 8K now. > > > > > > Tuesday, February 22, 2011, 1:59:29 PM, you wrote: > > > > GB> I'm currently dealing with a similar issue. I've found that the > > page_size > > GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm > > up" > > GB> the table. On Windows 7, with page_size=1024, a SELECT > > COUNT(last_column) > > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 > > takes > > GB> 8.5 seconds. This was done with a reboot between each test. > > > > > > > > > > -- > > Best regards, > > Tegmailto:t...@djii.com > > > > ___ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] COUNT() extremely slow first time!
The SQLite cache size is in pages (2000 by default), so by increasing the page size 8x, you're also increasing the cache size and memory requirements by 8x. Not saying it's a bad thing, just something to be aware of.

If you want to compare 1K and 8K page sizes and only compare the effect page size has, you should either increase the cache size to 16000 for 1K pages or decrease the cache to 250 for 8K pages.

The other thing to be aware of is that SQLite will not allow a row to cross 2 pages. (It does allow a row to be larger than a page, using an overflow page.) So for example, if your page size is 1024 and row size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this somewhat and ignoring internal SQLite data, but you get the idea. If your row size is 513 bytes, you will have 511 bytes of waste on each page, so 50% of your database will be "air". As your row size heads toward 1024 there will be less waste. At 1025 bytes, SQLite will start splitting rows into overflow pages, putting 1024 bytes into the overflow page and 1 byte in the btree page. These numbers aren't right, but illustrate the point.

So to find a good page size, experiment and measure.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

On Wed, Feb 23, 2011 at 10:20 AM, Teg wrote:
> Hello Greg,
>
> I found this to be the case too. The difference between 1K and 8K is
> staggering. I default all my windows DB's to 8K now.
>
> Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
>
> GB> I'm currently dealing with a similar issue. I've found that the page_size
> GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> GB> 8.5 seconds. This was done with a reboot between each test.
> --
> Best regards,
> Teg mailto:t...@djii.com

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
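Jim's fair-comparison point can be tried directly from code. Below is a minimal sketch using Python's stdlib `sqlite3` (the table and column names are invented for illustration): set `page_size` before the schema exists, and scale `cache_size` (which is measured in pages) inversely so both configurations get the same memory budget.

```python
import sqlite3

def make_db(path, page_size, cache_pages):
    """Open a database with a given page size and page-count cache budget."""
    con = sqlite3.connect(path)
    # page_size only takes effect before the first table is created
    # (afterwards a VACUUM is needed to rebuild the file with the new size).
    con.execute(f"PRAGMA page_size = {page_size}")
    # cache_size is counted in pages, so halve it when you double the
    # page size to keep the cache's memory use constant.
    con.execute(f"PRAGMA cache_size = {cache_pages}")
    con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload BLOB)")
    con.commit()
    return con

con = make_db(":memory:", 8192, 250)   # 250 x 8 KB is roughly the 2 MB budget
assert con.execute("PRAGMA page_size").fetchone()[0] == 8192
assert con.execute("PRAGMA cache_size").fetchone()[0] == 250
```

With this helper, comparing 1K/16000 against 8K/250 measures the page size itself rather than a larger cache.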
Re: [sqlite] COUNT() extremely slow first time!
On Wed, Feb 23, 2011 at 03:10:02PM +0100, Sven L scratched on the wall:

> Does this trick work on the primary key? If not, why?

Yes, all the time. Defining a column as a PK automatically creates a UNIQUE index over that column. The only exception is when the column is an INTEGER PRIMARY KEY, in which case the column becomes the rowid and uses the table's native index. In that case, to achieve the same result, one would need to manually create an index, as Max outlines below.

In addition to fast counts, this setup is also very good for equi-joins, which tend to be somewhat common in most database designs. (An equi-join is when you join table A to table B only for the purpose of filtering rows in A, and don't actually return any values from B as part of the result set.)

Because of the way SQLite works internally, there are also tricks of creating indexes with "extra" columns. For example, if you have a table with 23 columns, but you mostly use the PK and two additional columns, you can create an index over just those three columns. This will generally result in faster access to those two columns (for reasons that take several pages to explain). Of course, even better would be to break things up into a primary and detail table, but that's a design consideration.

Just beware that all of these indexes come at a cost. If your database is strongly read-dominated, you might consider some of these techniques. On the other hand, if you're more or less read/write mixed, or write-dominated, these techniques will cause an overall performance drop. Indexes can be useful for reads, but they always come at a write (INSERT/UPDATE/DELETE) cost.

-j

> > Returning to the original topic, for performance reasons I sometimes
> > recommend using an index created on the id/rowid. It's a strange construct
> > that makes no sense, but actually it sometimes gives a speed improvement.
> > This is because any index contains only the data used in it and if the query
> > doesn't require getting additional data from the table it was created for,
> > sqlite only reads this index and nothing else.

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson
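The "extra columns" idea above can be sketched with Python's stdlib `sqlite3` (the table, column, and index names here are invented for the example): an index over the handful of hot columns lets SQLite satisfy a query from the narrow index b-tree without touching the wider table rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE items (
    id    INTEGER PRIMARY KEY,   -- alias for the rowid: no separate index
    name  TEXT,
    qty   INTEGER,
    note  TEXT)""")
con.executemany(
    "INSERT INTO items (name, qty, note) VALUES (?, ?, ?)",
    [(f"item{i}", i, "x" * 50) for i in range(1000)])

# An index over just the hot columns "covers" queries that touch only
# those columns, so SQLite can answer from the index b-tree alone.
con.execute("CREATE INDEX idx_items_name_qty ON items (name, qty)")

assert con.execute(
    "SELECT qty FROM items WHERE name = 'item42'").fetchone() == (42,)
```

Whether the planner actually uses the index can be checked with EXPLAIN QUERY PLAN; as Jay notes, every such index also taxes INSERT/UPDATE/DELETE.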
Re: [sqlite] COUNT() extremely slow first time!
Hello Greg,

I found this to be the case too. The difference between 1K and 8K is staggering. I default all my windows DB's to 8K now.

Tuesday, February 22, 2011, 1:59:29 PM, you wrote:

GB> I'm currently dealing with a similar issue. I've found that the page_size
GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
GB> 8.5 seconds. This was done with a reboot between each test.

--
Best regards,
Teg mailto:t...@djii.com
Re: [sqlite] COUNT() extremely slow first time!
Does this trick work on the primary key? If not, why?

> From: max.vla...@gmail.com
> Date: Wed, 23 Feb 2011 16:09:04 +0300
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
>
> Returning to the original topic, for performance reasons I sometimes
> recommend using an index created on the id/rowid. It's a strange construct
> that makes no sense, but actually it sometimes gives a speed improvement.
> This is because any index contains only the data used in it and if the query
> doesn't require getting additional data from the table it was created for,
> sqlite only reads this index and nothing else.
>
> So to get the fastest count result one can create the following index
> (assuming id is the alias for rowid)
>
> CREATE INDEX [idx_MyTableId] ON [MyTable] ([ID])
>
> And use the following query
>
> SELECT COUNT(id) FROM (SELECT id FROM MyTable ORDER BY Id)
Re: [sqlite] COUNT() extremely slow first time!
On Tue, Feb 22, 2011 at 9:59 PM, Greg Barker wrote:

> I'm currently dealing with a similar issue. I've found that the page_size
> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> 8.5 seconds. This was done with a reboot between each test.
>
> This page recommends a page_size of 4096:
> http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
>
> If I see better performance with the larger page sizes (going to test 16384
> and beyond after this) is there any reason not to use them?

Greg, you should also take the record size into account. My hypothesis is that if your record is comparatively small (several fit into 1024), the speed of select count will be the same for any page size (my quick tests confirm this). It would be interesting to know the average size of your records, to understand why the numbers are so different.

Returning to the original topic, for performance reasons I sometimes recommend using an index created on the id/rowid. It's a strange construct that makes no sense, but it actually sometimes gives a speed improvement. This is because any index contains only the data used in it, and if the query doesn't require getting additional data from the table it was created for, sqlite only reads this index and nothing else.

So to get the fastest count result one can create the following index (assuming id is the alias for rowid):

CREATE INDEX [idx_MyTableId] ON [MyTable] ([ID])

And use the following query:

SELECT COUNT(id) FROM (SELECT id FROM MyTable ORDER BY Id)

"Order by" here forces using this index, and I used the outer select since count(id) inside the main select for unknown reasons triggers a table scan.

For any query in my tests that usually takes 5-50 seconds, this one always takes less than a second.
But it costs a little in terms of size (the index takes space) and insert speed. If this is a small price to pay, then this may be an answer.

Max
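Max's construct runs as-is; here is a minimal end-to-end sketch using Python's stdlib `sqlite3` (the table name and row count are invented). Note that newer SQLite versions already optimize a bare count(*) over the table b-tree, so the measured benefit of this trick depends on the version:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO MyTable (payload) VALUES (?)",
                [("x" * 100,) for _ in range(10000)])

# The index holds only the rowid values, so it is far narrower than the
# table and a full scan of it touches many fewer pages.
con.execute("CREATE INDEX idx_MyTableId ON MyTable (ID)")

# ORDER BY steers the scan onto an ordered source; the wrapping SELECT
# works around the planner quirk Max describes.
(n,) = con.execute(
    "SELECT COUNT(ID) FROM (SELECT ID FROM MyTable ORDER BY ID)").fetchone()
assert n == 10000
```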
Re: [sqlite] COUNT() extremely slow first time!
Interesting! I've been using "PRAGMA page_size = 4096;" in my software. Perhaps I should increase it and see if I can get a performance gain. Does it affect INSERTs too?

> Date: Tue, 22 Feb 2011 10:59:29 -0800
> From: fle...@fletchowns.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
>
> I'm currently dealing with a similar issue. I've found that the page_size
> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> 8.5 seconds. This was done with a reboot between each test.
Re: [sqlite] COUNT() extremely slow first time!
I'm currently dealing with a similar issue. I've found that the page_size PRAGMA setting can have a dramatic effect on how long it takes to "warm up" the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column) takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes 8.5 seconds. This was done with a reboot between each test.

This page recommends a page_size of 4096:
http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows

If I see better performance with the larger page sizes (going to test 16384 and beyond after this) is there any reason not to use them?

Greg

On Mon, Feb 21, 2011 at 4:37 PM, Stephen Oberholtzer <oliverkloz...@gmail.com> wrote:

> Just make sure you either (a) quote the 3 (echo '3' > /proc/sys/vm/drop_caches)
> or (b) put a space between the 3 and the >. If you don't quote it, and you
> don't put the space in (echo 3>/proc/sys/vm/drop_caches) it doesn't do
> anything, for reasons I won't go into.
Re: [sqlite] COUNT() extremely slow first time!
On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson wrote:

> On Mon, Feb 21, 2011 at 11:05 AM, Sven L wrote:
>>
>> Thank you for your detailed explanation!
>> First, can you please tell me how to purge the cache in Windows 7? This
>> could be very useful for my tests!
>
> Sorry, dunno for Windows. On Mac OSX it is the purge command, in the
> development tools. On Linux, you do: echo 3 > /proc/sys/vm/drop_caches

Just make sure you either (a) quote the 3 (echo '3' > /proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >. If you don't quote it, and you don't put the space in (echo 3>/proc/sys/vm/drop_caches) it doesn't do anything, for reasons I won't go into.

--
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
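The quoting pitfall Stevie-O describes is easy to demonstrate without root by targeting a scratch file instead of /proc/sys/vm/drop_caches:

```shell
tmp=$(mktemp)

# Without a space, "3>" is a redirection of file descriptor 3: the file
# is opened (and truncated), but echo, left with no arguments, writes
# only a newline to stdout -- nothing reaches the file.
echo 3>"$tmp"
wc -c < "$tmp"    # 0

# With the space (or with '3' quoted) the digit is echo's argument.
echo 3 > "$tmp"
wc -c < "$tmp"    # 2: "3" plus a newline

rm -f "$tmp"
```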
Re: [sqlite] COUNT() extremely slow first time!
Thanks :) This did the trick:

First make a copy of the database:

copy HugeDatabase.db HugeDatabase_copy.db

Then for each run, replace the database with its copy.

This is why I thought the COUNT operation was somehow written to the database after its first run... :P

> From: slav...@bigfraud.org
> Date: Mon, 21 Feb 2011 16:56:01 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
>
> So the way to do it is to overrun the cache by yourself. If you know your
> cache is 2Gig, find a 2Gig disk file that has nothing to do with your test
> suite and read it. Or generate 2Gig of gibberish and write it to disk, then
> delete that file. Pah.
>
> Simon.
Re: [sqlite] COUNT() extremely slow first time!
On 21 Feb 2011, at 4:35pm, Jim Wilcoxson wrote:

> On Mon, Feb 21, 2011 at 11:05 AM, Sven L wrote:
>>
>> Thank you for your detailed explanation!
>> First, can you please tell me how to purge the cache in Windows 7? This
>> could be very useful for my tests!
>
> Sorry, dunno for Windows. On Mac OSX it is the purge command, in the
> development tools.

In current versions it's just /usr/bin/purge, which should be in your path. No idea why a non-programmer should need it, but there it is.

> On Linux, you do: echo 3 > /proc/sys/vm/drop_caches

And in Windows it's ... almost impossible. You can sync to disk, using fflush(), and there's no reason you can't do this from the command-line:

http://technet.microsoft.com/en-us/sysinternals/bb897438.aspx

But for a purge you have to invalidate the contents of the cache, and there's no way for an application to tell Windows to do this. For a start, each application has its own user mode address cache, so if you run another utility to flush cache, it just messes with its own space. Second, there's no system call that allows you access to the cache because Microsoft considers it private to the device level. I am not dissing Microsoft for this: there are fair arguments that this is actually the Right Way to do it.

So the way to do it is to overrun the cache by yourself. If you know your cache is 2Gig, find a 2Gig disk file that has nothing to do with your test suite and read it. Or generate 2Gig of gibberish and write it to disk, then delete that file. Pah.

Simon.
Re: [sqlite] COUNT() extremely slow first time!
(logid) the first time, and 8x faster once all pages are cached. This is just what happens on my system (Mac OSX), and may have nothing at all to do with the performance on yours.

Jim
Re: [sqlite] COUNT() extremely slow first time!
On Mon, Feb 21, 2011 at 03:37:50PM +0100, Sven L scratched on the wall:

> I've learnt that COUNT(*) is slower than COUNT(ID),

That's usually not true.

> since * means the engine has to traverse all columns

Actually, count(*) is the one case when the engine does *not* need to traverse any columns.

> and it might even return another value if there are NULL-values...

Yes, but that's how count() is defined to work. The expression "count(id)" only counts rows where id is not NULL. This requires that the database engine retrieve the value of the id column from each row in order to test for NULL.

The expression "count(*)" strictly counts rows, without concern over NULLs. As such, there is no need to actually retrieve any row data, because there is no need to test any row values. The count(*) function can scan the table tree, counting the number of rows, without actually loading or decoding the row data.

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson
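The NULL-handling difference described above is easy to verify with Python's stdlib `sqlite3` (table and column names invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

# count(*) counts rows outright; count(id) must fetch and decode the id
# column for every row just to skip the NULLs.
assert con.execute("SELECT COUNT(*) FROM t").fetchone()[0] == 3
assert con.execute("SELECT COUNT(id) FROM t").fetchone()[0] == 2
```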
Re: [sqlite] COUNT() extremely slow first time!
On Mon, Feb 21, 2011 at 10:38 AM, Puneet Kishor <punk.k...@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
>> This is a common issue on the mailing list. The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>
> One question I have related to the above -- how long does that buffer
> cache remain filled with the data? I am assuming it is until new stuff
> to be cached pushes out old stuff, no?

For most OS's, the time data remains in the cache and the size of the cache will vary as a function of available RAM.

> I was doing some R*Tree selects, and the first query was dog slow,
> although benchmarking showed that the actual CPU time was very small.
> Subsequent queries were lightning fast. I am assuming that the buffer is
> not getting filled with the results as much as it is getting filled with
> whatever part of the db that the program needs to open to do its work.

Right. SQLite doesn't cache query results. It does cache database pages in its own cache, which by default is rather small: 2000 pages. At the default page size of 1K, that's a 2MB cache. To fetch records not in its own cache, SQLite will use the OS. If the page is in the OS cache, there is no seek time and no read latency (for spinning media).
Jim > > >> >> Here's a timing from my own system, after a purge command to clear the >> buffer cache: >> >> $ time sqlite3 hb.db 'select count(*) from logs' >> -- Loading resources from /Users/jim/.sqliterc >> count(*) >> -- >> 734909 >> >> real 0m0.580s >> user 0m0.190s >> sys 0m0.034s >> >> Same command again, with the file cached: >> >> $ time sqlite3 hb.db 'select count(*) from logs' >> -- Loading resources from /Users/jim/.sqliterc >> count(*) >> -- >> 734909 >> >> real 0m0.189s >> user 0m0.165s >> sys 0m0.019s >> >> This time is consistent no matter how many times I run it, because the >> file is still cached. Doing a purge command to clear the cache and >> re-running the query, we get: >> >> $ purge >> $ time sqlite3 hb.db 'select count(*) from logs' >> -- Loading resources from /Users/jim/.sqliterc >> count(*) >> -- >> 734909 >> >> real 0m0.427s >> user 0m0.175s >> sys 0m0.024s >> >> On my system, there is not a huge difference, but it is consistent. >> Now, if you have a fragmented file system, you will see a much larger >> difference. There are many posts on the mailing list about both file >> system fragmentation and logical fragmentation within the SQLite file >> itself. Your first count(*) is subject to these fragmentation >> effects, while your 2nd usually is not, because the file is in memory. >> >> Some people on the list believe fragmentation is an unimportant detail >> you shouldn't worry about, because you have little control over it. >> That may be true, but it's useful to understand how it can affect >> performance. I think you are seeing this first hand. 
>> >> Jim >> -- >> HashBackup: easy onsite and offsite Unix backup >> http://www.hashbackup.com >> >> >> >> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote: >> > >> > Same result :( >> > Note that I have compiled SQLite with the following switches: >> > SQLITE_ENABLE_STAT2 >> > SQLITE_THREADSAFE=2 >> > >> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the >> > engine has to traverse all columns and it might even return another value >> > if there are NULL-values... >> > >> > Also, this is quite interesting: >> > >> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items; >> > 0|0|0|SCAN TABLE Items (~100 rows) >> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items; >> > sqlite> >> > >> > I would expect an index scan on the first statement. The second statement >> > tells me nada?! >> > >> > Thanks for your help! >> > >> > >> >> From: slav...@bigfraud.org >> >> Date: Mon, 21 Feb 2011 14:24:50 + >> >> To: sqlite-users@sqlite.org >> >> Subject: Re: [sqlite] COUNT() extremely slow first time! >> >> >> >> >> >> On 21 Feb 2011, at 2:23pm, Sven L wrote: >> >> >> >> > SELECT COUNT(ItemID) FROM
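Jim's 2000-page/2MB arithmetic can be checked from code. The defaults have changed since 2011 (current builds default to 4 KB pages, and a negative cache_size sets a KiB budget instead of a page count), so read the pragmas rather than assuming. A sketch with Python's stdlib `sqlite3`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA cache_size = 2000")   # positive value = number of pages
page_size = con.execute("PRAGMA page_size").fetchone()[0]
cache_pages = con.execute("PRAGMA cache_size").fetchone()[0]

# The page cache's memory budget is the product of the two pragmas.
print(f"{cache_pages} pages x {page_size} bytes = "
      f"{cache_pages * page_size} bytes of page cache")
assert cache_pages == 2000 and page_size >= 512
```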
Re: [sqlite] COUNT() extremely slow first time!
Thank you for your detailed explanation!

First, can you please tell me how to purge the cache in Windows 7? This could be very useful for my tests!

I'm quite sure my database itself is not fragmented, since I have only inserted data. The file system is in good shape too; Windows reports 0% fragmentation. Perhaps there is some other bottleneck, like disk performance in general (this is a 5400 rpm drive)? Or antivirus-related CPU holds? (I did turn real-time scanning off, though.) I have even turned Windows Search off (and got a 20% performance gain!).

My 32-bit application is running under Windows 7 (64-bit). Could WOW64 have something to do with this performance issue?

The size of the database is 1.98 GB, with 1.5M rows. Couldn't it be that the table scan simply is darn slow for huge tables? In an ideal world the COUNT() would be performed on the primary key in RAM. That's not possible? :P

Thanks again!

> Date: Mon, 21 Feb 2011 10:17:03 -0500
> From: pri...@gmail.com
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
>
> This is a common issue on the mailing list. The first time you do
> count(*), SQLite (actually your OS) has to load data into memory.
> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
> doesn't have to read from disk.
>
> Here's a timing from my own system, after a purge command to clear the
> buffer cache:
>
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
>
> real 0m0.580s
> user 0m0.190s
> sys 0m0.034s
>
> Same command again, with the file cached:
>
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
>
> real 0m0.189s
> user 0m0.165s
> sys 0m0.019s
>
> This time is consistent no matter how many times I run it, because the
> file is still cached.
> Doing a purge command to clear the cache and re-running the query, we get:
>
> $ purge
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
>
> real 0m0.427s
> user 0m0.175s
> sys 0m0.024s
>
> On my system, there is not a huge difference, but it is consistent.
> Now, if you have a fragmented file system, you will see a much larger
> difference. There are many posts on the mailing list about both file
> system fragmentation and logical fragmentation within the SQLite file
> itself. Your first count(*) is subject to these fragmentation
> effects, while your 2nd usually is not, because the file is in memory.
>
> Some people on the list believe fragmentation is an unimportant detail
> you shouldn't worry about, because you have little control over it.
> That may be true, but it's useful to understand how it can affect
> performance. I think you are seeing this first hand.
>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
>
> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
> >
> > Same result :(
> > Note that I have compiled SQLite with the following switches:
> > SQLITE_ENABLE_STAT2
> > SQLITE_THREADSAFE=2
> >
> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the
> > engine has to traverse all columns and it might even return another value
> > if there are NULL-values...
> >
> > Also, this is quite interesting:
> >
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> > 0|0|0|SCAN TABLE Items (~100 rows)
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> > sqlite>
> >
> > I would expect an index scan on the first statement. The second statement
> > tells me nada?!
> >
> > Thanks for your help!
> >
> >> From: slav...@bigfraud.org
> >> Date: Mon, 21 Feb 2011 14:24:50 +
> >> To: sqlite-users@sqlite.org
> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
> >>
> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> >>
> >> > SELECT COUNT(ItemID) FROM Items;
> >> >
> >> > This takes around 40 seconds the first time! WHY?!
> >>
> >> Try again, doing everything identically except that instead of the above
> >> line use
> >>
> >> SELECT COUNT(*) FROM Items;
> >>
> >> Simon.
Re: [sqlite] COUNT() extremely slow first time!
On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
> This is a common issue on the mailing list. The first time you do
> count(*), SQLite (actually your OS) has to load data into memory.
> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
> doesn't have to read from disk.

One question I have related to the above -- how long does that buffer
cache remain filled with the data? I am assuming it is until new stuff
to be cached pushes out old stuff, no?

I was doing some R*Tree selects, and the first query was dog slow,
although benchmarking showed that the actual CPU time was very small.
Subsequent queries were lightning fast. I am assuming that the buffer
is not getting filled with the results as much as it is getting filled
with whatever part of the db that the program needs to open to do its
work.

> Here's a timing from my own system, after a purge command to clear
> the buffer cache:
>
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
>
> real 0m0.580s
> user 0m0.190s
> sys  0m0.034s
>
> Same command again, with the file cached:
>
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
>
> real 0m0.189s
> user 0m0.165s
> sys  0m0.019s
>
> This time is consistent no matter how many times I run it, because
> the file is still cached. Doing a purge command to clear the cache
> and re-running the query, we get:
>
> $ purge
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
>
> real 0m0.427s
> user 0m0.175s
> sys  0m0.024s
>
> On my system, there is not a huge difference, but it is consistent.
> Now, if you have a fragmented file system, you will see a much larger
> difference. There are many posts on the mailing list about both file
> system fragmentation and logical fragmentation within the SQLite file
> itself. Your first count(*) is subject to these fragmentation
> effects, while your 2nd usually is not, because the file is in memory.
>
> Some people on the list believe fragmentation is an unimportant
> detail you shouldn't worry about, because you have little control
> over it. That may be true, but it's useful to understand how it can
> affect performance. I think you are seeing this first hand.
>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
>
> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
> >
> > Same result :(
> > Note that I have compiled SQLite with the following switches:
> > SQLITE_ENABLE_STAT2
> > SQLITE_THREADSAFE=2
> >
> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means
> > the engine has to traverse all columns and it might even return
> > another value if there are NULL-values...
> >
> > Also, this is quite interesting:
> >
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> > 0|0|0|SCAN TABLE Items (~100 rows)
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> > sqlite>
> >
> > I would expect an index scan on the first statement. The second
> > statement tells me nada?!
> >
> > Thanks for your help!
> >
> >> From: slav...@bigfraud.org
> >> Date: Mon, 21 Feb 2011 14:24:50 +
> >> To: sqlite-users@sqlite.org
> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
> >>
> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> >>
> >> > SELECT COUNT(ItemID) FROM Items;
> >> >
> >> > This takes around 40 seconds the first time! WHY?!
> >>
> >> Try again, doing everything identically except that instead of the
> >> above line use
> >>
> >> SELECT COUNT(*) FROM Items;
> >>
> >> Simon.
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] COUNT() extremely slow first time!
This is a common issue on the mailing list. The first time you do
count(*), SQLite (actually your OS) has to load data into memory.
Most OS's will keep the file in a buffer cache, so the 2nd count(*)
doesn't have to read from disk.

Here's a timing from my own system, after a purge command to clear the
buffer cache:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real 0m0.580s
user 0m0.190s
sys  0m0.034s

Same command again, with the file cached:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real 0m0.189s
user 0m0.165s
sys  0m0.019s

This time is consistent no matter how many times I run it, because the
file is still cached. Doing a purge command to clear the cache and
re-running the query, we get:

$ purge
$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real 0m0.427s
user 0m0.175s
sys  0m0.024s

On my system, there is not a huge difference, but it is consistent.
Now, if you have a fragmented file system, you will see a much larger
difference. There are many posts on the mailing list about both file
system fragmentation and logical fragmentation within the SQLite file
itself. Your first count(*) is subject to these fragmentation effects,
while your 2nd usually is not, because the file is in memory.

Some people on the list believe fragmentation is an unimportant detail
you shouldn't worry about, because you have little control over it.
That may be true, but it's useful to understand how it can affect
performance. I think you are seeing this first hand.
Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>
> Same result :(
> Note that I have compiled SQLite with the following switches:
> SQLITE_ENABLE_STAT2
> SQLITE_THREADSAFE=2
>
> I've learnt that COUNT(*) is slower than COUNT(ID), since * means the
> engine has to traverse all columns and it might even return another
> value if there are NULL-values...
>
> Also, this is quite interesting:
>
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> 0|0|0|SCAN TABLE Items (~100 rows)
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> sqlite>
>
> I would expect an index scan on the first statement. The second
> statement tells me nada?!
>
> Thanks for your help!
>
>> From: slav...@bigfraud.org
>> Date: Mon, 21 Feb 2011 14:24:50 +
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>>
>> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>>
>> > SELECT COUNT(ItemID) FROM Items;
>> >
>> > This takes around 40 seconds the first time! WHY?!
>>
>> Try again, doing everything identically except that instead of the
>> above line use
>>
>> SELECT COUNT(*) FROM Items;
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
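[Editor's note] Jim's purge-and-time experiment needs the macOS `purge` command (or `echo 3 > /proc/sys/vm/drop_caches` on Linux) to get a truly cold OS cache. Within a single process you can still watch the warm-cache effect he describes. A minimal Python sketch, with made-up file name and row count; absolute times depend entirely on your hardware, and because the inserts just warmed the OS cache, the first-run penalty here is muted compared to a real cold start:

```python
# Time repeated COUNT(*) runs: the first query pays the cost of pulling
# pages into the SQLite page cache; later runs are served from memory.
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "timing.db")  # hypothetical db file
con = sqlite3.connect(path)
con.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, payload TEXT)")
with con:  # one transaction for the bulk load
    con.executemany("INSERT INTO logs (payload) VALUES (?)",
                    (("x" * 100,) for _ in range(100_000)))
con.close()

con = sqlite3.connect(path)  # fresh connection: empty SQLite page cache
for attempt in range(3):
    t0 = time.perf_counter()
    (n,) = con.execute("SELECT COUNT(*) FROM logs").fetchone()
    print(attempt, n, f"{time.perf_counter() - t0:.4f}s")
con.close()
```

To reproduce the full effect, run the script once, drop the OS caches as above, and run only the query half again.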
Re: [sqlite] COUNT() extremely slow first time!
Same result :(

Note that I have compiled SQLite with the following switches:
SQLITE_ENABLE_STAT2
SQLITE_THREADSAFE=2

I've learnt that COUNT(*) is slower than COUNT(ID), since * means the
engine has to traverse all columns and it might even return another
value if there are NULL-values...

Also, this is quite interesting:

sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
0|0|0|SCAN TABLE Items (~100 rows)
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
sqlite>

I would expect an index scan on the first statement. The second
statement tells me nada?!

Thanks for your help!

> From: slav...@bigfraud.org
> Date: Mon, 21 Feb 2011 14:24:50 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
>
>
> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>
> > SELECT COUNT(ItemID) FROM Items;
> >
> > This takes around 40 seconds the first time! WHY?!
>
> Try again, doing everything identically except that instead of the
> above line use
>
> SELECT COUNT(*) FROM Items;
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
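[Editor's note] The empty EXPLAIN QUERY PLAN output Sven sees for COUNT(*) is because SQLite can answer a bare COUNT(*) with a b-tree entry count rather than a row-by-row scan. The experiment is easy to repeat from Python's built-in `sqlite3` module (a sketch with a cut-down `Items` table; the exact plan text varies by SQLite version, so only the COUNT(ItemID) scan is guaranteed to show up):

```python
# Compare query plans for COUNT(ItemID) vs COUNT(*): the former must
# scan the table, while the latter may use the b-tree count shortcut.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, B INTEGER)")

plans = {}
for sql in ("SELECT COUNT(ItemID) FROM Items",
            "SELECT COUNT(*) FROM Items"):
    plans[sql] = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(sql, "->", plans[sql])
```

On recent versions the first statement reports a SCAN of Items, matching the `0|0|0|SCAN TABLE Items` line in the post.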
Re: [sqlite] COUNT() extremely slow first time!
On 21 Feb 2011, at 2:23pm, Sven L wrote:

> SELECT COUNT(ItemID) FROM Items;
>
> This takes around 40 seconds the first time! WHY?!

Try again, doing everything identically except that instead of the
above line use

SELECT COUNT(*) FROM Items;

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
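[Editor's note] Simon's suggestion matters semantically as well as for speed: COUNT(*) counts rows, while COUNT(col) counts only rows where col is not NULL (for a NOT NULL primary key such as ItemID the two results are identical). A quick sketch with a throwaway table:

```python
# COUNT(*) vs COUNT(column) in the presence of NULLs.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

star = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # counts all rows
col = con.execute("SELECT COUNT(a) FROM t").fetchone()[0]   # skips NULLs
print(star, col)  # 3 2
```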
[sqlite] COUNT() extremely slow first time!
SQLite 3.7.5.

I have a table (13 columns with 1 primary key, 1 index on a date
column, 1 UNIQUE-constraint on two columns). I insert ~130 rows,
1000 at a time (transaction-wise). I close the database, reopen it and
immediately perform a COUNT-operation:

SELECT COUNT(ItemID) FROM Items;

This takes around 40 seconds the first time! WHY?!
I believe the value is somehow cached inside the database after the
above call.

The table used:

CREATE TABLE IF NOT EXISTS Item
(
ItemID INTEGER PRIMARY KEY,
A INTEGER NOT NULL,
B INTEGER NOT NULL,
C INTEGER NOT NULL,
D INTEGER NOT NULL,
E INTEGER NOT NULL,
F INTEGER NOT NULL,
G INTEGER NOT NULL,
H DATE NOT NULL,
I CHAR(3) NOT NULL,
J INTEGER NOT NULL,
K INTEGER NOT NULL,
L INTEGER NOT NULL,
UNIQUE (B, A)
);

CREATE INDEX IF NOT EXISTS Item_H_Index ON Item (H);

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
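[Editor's note] The schema and batched-insert pattern described above can be reproduced at toy scale as below. This is a hedged sketch: the row count is tiny, the column values are made up, and the DDL creates `Item` while the query in the post says `Items`, so the sketch uses `Item` throughout:

```python
# Insert rows in 1000-row transactions, then COUNT, as in the post.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE IF NOT EXISTS Item (
  ItemID INTEGER PRIMARY KEY,
  A INTEGER NOT NULL, B INTEGER NOT NULL, C INTEGER NOT NULL,
  D INTEGER NOT NULL, E INTEGER NOT NULL, F INTEGER NOT NULL,
  G INTEGER NOT NULL, H DATE NOT NULL, I CHAR(3) NOT NULL,
  J INTEGER NOT NULL, K INTEGER NOT NULL, L INTEGER NOT NULL,
  UNIQUE (B, A))""")
con.execute("CREATE INDEX IF NOT EXISTS Item_H_Index ON Item (H)")

# Fabricated rows; (B, A) = (i, i) keeps the UNIQUE constraint happy.
rows = [(i, i, i, i, i, i, i, "2011-02-21", "ABC", i, i, i)
        for i in range(5000)]
BATCH = 1000
for start in range(0, len(rows), BATCH):
    with con:  # one transaction per 1000-row batch
        con.executemany(
            "INSERT INTO Item (A,B,C,D,E,F,G,H,I,J,K,L) "
            "VALUES (?,?,?,?,?,?,?,?,?,?,?,?)",
            rows[start:start + BATCH])

(n,) = con.execute("SELECT COUNT(ItemID) FROM Item").fetchone()
print(n)
```

At real scale, closing and reopening the database before the COUNT (as the poster does) is what exposes the cold-cache cost discussed in the replies.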