Re: [sqlite] Tracing latencies
On Fri, Dec 7, 2012 at 3:28 PM, Keith Chewwrote: > > We can see the times taken to do the syscalls are fast, so what is > causing the delay between the seek and write? I had a look at the > code, and found the method seekAndWrite(), which looks pretty straight > forward. The only time it could be slow is when osWrite() return -1. > Any ideas? > > Please advise if I should still try to get vfs=unix-none to work. Looks like I have managed to pin point the problem area. Here are the results: --- Config 1: No latency --- PRAGMA journal_mode = OFF; PRAGMA locking_mode = EXCLUSIVE; --- Config 2: Latency observed --- PRAGMA journal_mode = WAL; PRAGMA locking_mode = EXCLUSIVE; --- Config 3: Latency observed --- PRAGMA journal_mode = OFF; PRAGMA locking_mode = NORMAL; --- >From the above, we can see that: - WAL itself is not the issue, because config 3 shows the latency without WAL - Speed of writing to disk is not the issue, because of config 1 & 3 What causes the latency is the common thing between WAL and locking_mode=normal, and just briefly scanning the code, it looks like they both lock files before writing to disk. I am happy to try out any suggestions to pin point this even further. Any guidance on where I should next look at would be greatly appreciated! I am happy to change or comment out sections of the code just to prove things... Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Hi Richard On Fri, Dec 7, 2012 at 2:08 AM, Richard Hippwrote: > If that does cure your latencies, it also adds another problem, in that it > does nothing to prevent two or more processes, or even two threads within > the same process, from trying to use the database at the same time and > tripping over one another. So this solution will only work if you can > absolutely guarantee that nobody will ever fire up an "sqlite3" shell on > your database file at the same time you application is using it. First, thank you so much for having the patience to help me out, really appreciate it. I tried to see if unix-none works from the shell: - sqlite3 file:/tmp/test.db?vfs=unix-none SQLite version 3.7.10 2012-01-16 13:28:40 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> .tables Error: unable to open database file - I can't seem to get this to work, but I know URI name format is working because sqlite3 file:/tmp/test.db?vfs=unix works fine. I had another idea. Would setting "PRAGMA locking_mode=EXCLUSIVE;" have the same effect? Ie it only locks once when opening the DB, and never releases it? I tried this, but the latency is still there. I have gone another step further and straced it: --- 19547 22:04:13.331096 _llseek(79, 1050632, [1050632], SEEK_SET) = 0 <0.16> 19547 22:04:13.545595 write(79, "\0\0\0\4\0\0\r\222\313\370\227\2001\342\366\304p\5\254\250W}<\372", 24) = 24 <0.000243> 19547 22:04:13.546570 _llseek(79, 1050656, 19547 22:04:13.547100 <... _llseek resumed> [1050656], SEEK_SET) = 0 <0.000502> 19547 22:04:13.548011 write(79, "\r\0\0\0\1\0173\0\0173\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096 19547 22:04:13.548368 <... write resumed> ) = 4096 <0.000293> --- We can see the times taken to do the syscalls are fast, so what is causing the delay between the seek and write? I had a look at the code, and found the method seekAndWrite(), which looks pretty straight forward. The only time it could be slow is when osWrite() return -1. Any ideas? Please advise if I should still try to get vfs=unix-none to work. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Thu, Dec 6, 2012 at 12:14 AM, Keith Chewwrote: > On Thu, Dec 6, 2012 at 3:52 PM, Keith Chew wrote: > > I am guess it may have something to do with the locking of the files. > > sqlite could be getting blocked by the filesystem when trying to > > obtain the lock before writing to the file... > > This is a stab in the dark... I had a look at the sqlite code, and > noticed that the file lock is acquired using fcntl(F_SETLK). So, when > kjournald wants to flush the data to disk, and sqlite wants to write > to the file at the same time, I guess there will be contention, and > sqlite can only get the lock when the file is free. This explains the > latencies that I am seeing being consistent with the ext3 journal's > parameters. Eg increase commit time removes the contention (thus 0ms > between commits), removing write-cache, disabling barriers all affect > the latency accordingly. > > So, if this is the case, how do I get around it? > Specify "unix-none" at the VFS in the 4th parameter to sqlite3_open_v2(). (Or you can use URI pathnames and add the vfs=unix-none query parameter.) This will load a version of the backend that never does any file locking. Does that make your problem go away? If that does cure your latencies, it also adds another problem, in that it does nothing to prevent two or more processes, or even two threads within the same process, from trying to use the database at the same time and tripping over one another. So this solution will only work if you can absolutely guarantee that nobody will ever fire up an "sqlite3" shell on your database file at the same time you application is using it. > > Regards > Keith > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Thu, Dec 6, 2012 at 3:52 PM, Keith Chewwrote: > I am guess it may have something to do with the locking of the files. > sqlite could be getting blocked by the filesystem when trying to > obtain the lock before writing to the file... This is a stab in the dark... I had a look at the sqlite code, and noticed that the file lock is acquired using fcntl(F_SETLK). So, when kjournald wants to flush the data to disk, and sqlite wants to write to the file at the same time, I guess there will be contention, and sqlite can only get the lock when the file is free. This explains the latencies that I am seeing being consistent with the ext3 journal's parameters. Eg increase commit time removes the contention (thus 0ms between commits), removing write-cache, disabling barriers all affect the latency accordingly. So, if this is the case, how do I get around it? Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Thu, Dec 6, 2012 at 1:01 PM, Keith Chewwrote: > I used sar and iostat to profile what is happening at the disk level, > and below are the results. You can see that the TPS on the disk is > very small, only 6-7 tps, and both mysql and sqlite profiles show very > close patterns, except sqlite is writing a bit more. I would expect at > this tps level, writes should be very fast. Is there anything else I > can try to narrow this down further? > Hi I have found that by changing the commit time of ext3 (ie from commit=1 to commit=5), the latency drops to 0s, and only goes up to 160ms during the commit from the kernel. So, I know that the latency is actually caused by the kernel journal commits, but strangely it is not affecting mysql. Is there anything I can do in the code to investigate this further? I am guess it may have something to do with the locking of the files. sqlite could be getting blocked by the filesystem when trying to obtain the lock before writing to the file... ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Thu, Dec 6, 2012 at 9:16 AM, Keith Chewwrote: > Thank you so much for all the input. I tried 2 things, using only a > single connection (as suggested by Clemen) and to change the page size > to 32K. Single connection did not reduce the average latency, but did > help in reducing the spikes. So, I believe Clemen has helped identify > one issue (write concurrencies). Using 32K page size also did help > reduce the spikes (not sure logic behind this, have yet to do a > strace), but average latency is still there. I used sar and iostat to profile what is happening at the disk level, and below are the results. You can see that the TPS on the disk is very small, only 6-7 tps, and both mysql and sqlite profiles show very close patterns, except sqlite is writing a bit more. I would expect at this tps level, writes should be very fast. Is there anything else I can try to narrow this down further? Sqlite 12/06/2012 12:43:10 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.930.006.93 0.00 110.89 16.00 0.35 64.57 25.14 17.43 12/06/2012 12:43:11 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 9.000.006.00 0.00 120.00 20.00 0.45 59.33 29.33 17.60 12/06/2012 12:43:12 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.000.007.00 0.00 104.00 14.86 0.37 67.43 24.57 17.20 12/06/2012 12:43:13 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 5.940.005.94 0.0095.05 16.00 0.36 59.33 30.00 17.82 12/06/2012 12:43:14 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.000.004.00 0.0080.00 20.00 0.46 34.00 39.00 15.60 12/06/2012 12:43:15 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.000.009.00 0.00 120.00 13.33 0.49 89.78 27.11 24.40 Mysql 12/06/2012 12:50:22 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 3.060.006.12 0.0073.47 12.00 0.38 62.67 27.33 16.73 12/06/2012 12:50:23 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 2.910.004.85 0.0062.14 12.80 0.31 63.20 30.40 14.76 12/06/2012 12:50:24 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 3.030.006.06 0.0072.73 12.00 0.34 57.33 24.67 14.95 12/06/2012 12:50:25 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 3.000.005.00 0.0064.00 12.80 0.30 57.60 29.60 14.80 12/06/2012 12:50:26 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 2.970.005.94 0.0071.29 12.00 0.36 60.00 26.00 15.45 12/06/2012 12:50:27 PM Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 3.000.005.00 0.0064.00 12.80 0.29 57.60 28.80 14.40 Sqlite 12:43:44 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:43:45 PMdev8-0 6.93 0.00102.97 14.86 0.39 56.00 21.71 15.05 12:43:45 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:43:46 PMdev8-0 5.94 0.00 95.05 16.00 0.31 52.67 24.67 14.65 12:43:46 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:43:47 PMdev8-0 6.93 0.00102.97 14.86 0.41 58.86 22.86 15.84 12:43:47 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:43:48 PMdev8-0 6.00 0.00 96.00 16.00 0.33 54.67 28.00 16.80 12:43:48 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:43:49 PMdev8-0 7.00 0.00120.00 17.14 0.46 66.29 26.86 18.80 12:43:49 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:43:50 PMdev8-0 5.94 0.00 95.05 16.00 0.36 60.00 31.33 18.61 12:43:50 PM DEV tps rd_sec/s wr_sec/s
Re: [sqlite] Tracing latencies
Hi All > That hypothesis could be easily tested by examining the corresponding strace > output for mysql... I don't know if sqlite3's unaligned log writes are > actually slow (didn't see this info in any of Keith's messages), but if they > are that would suggest the hypothesis is false; it would really put nails in > the coffin if mysql uses aligned log writes. > > (BTW Michael, whatever mechanism you use to reply breaks threading in both > Thunderbird and the list archives at sqlite.org; makes it harder to follow > conversations) Thank you so much for all the input. I tried 2 things, using only a single connection (as suggested by Clemen) and to change the page size to 32K. Single connection did not reduce the average latency, but did help in reducing the spikes. So, I believe Clemen has helped identify one issue (write concurrencies). Using 32K page size also did help reduce the spikes (not sure logic behind this, have yet to do a strace), but average latency is still there. As expected, combining the 2 solutions would not reduce the average latency, but spikes are not as frequent anymore. Eg I still see 160ms average, and sometimes it spikes to 300ms, but I have not seen a 4000ms transaction. I don't think there is anything more I do about this at this stage, I am getting a 7200rpm drive later today to test and see if that makes a difference. I am expecting the average latency to drop by about 25%. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On 05/12/2012 1:25 PM, Dan Kennedy wrote: On 12/06/2012 12:47 AM, Black, Michael (IS) wrote: I'm a bit confused with "always aligned". None of the lseeks in this log are 1024 aligned. And I just ran a test with the 3.7.15.1 latest amalgamation and most of these seeks are not aligned. Once in a while it gets lucky. Alignment sure isn't deliberate in this. It appears the first page is 1080 which is already out of alignment. File#4 here is the wal file. I wasn't clear. Writes to the WAL file are not aligned. However, they are usually sequential, which means on an OS with a smart enough cache you don't really gain anything at all trying to align them. That hypothesis could be easily tested by examining the corresponding strace output for mysql... I don't know if sqlite3's unaligned log writes are actually slow (didn't see this info in any of Keith's messages), but if they are that would suggest the hypothesis is false; it would really put nails in the coffin if mysql uses aligned log writes. (BTW Michael, whatever mechanism you use to reply breaks threading in both Thunderbird and the list archives at sqlite.org; makes it harder to follow conversations) Ryan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On 12/06/2012 12:47 AM, Black, Michael (IS) wrote: I'm a bit confused with "always aligned". None of the lseeks in this log are 1024 aligned. And I just ran a test with the 3.7.15.1 latest amalgamation and most of these seeks are not aligned. Once in a while it gets lucky. Alignment sure isn't deliberate in this. It appears the first page is 1080 which is already out of alignment. File#4 here is the wal file. I wasn't clear. Writes to the WAL file are not aligned. However, they are usually sequential, which means on an OS with a smart enough cache you don't really gain anything at all trying to align them. To the database file, where it does matter, writes are always aligned. lseek(4, 0, SEEK_SET) = 0 lseek(4, 0, SEEK_SET) = 0 lseek(4, 32, SEEK_SET) = 32 lseek(4, 56, SEEK_SET) = 56 lseek(4, 1080, SEEK_SET) = 1080 lseek(4, 1104, SEEK_SET)= 1104 lseek(4, 2128, SEEK_SET)= 2128 lseek(4, 2152, SEEK_SET) = 2152 lseek(4, 3176, SEEK_SET)= 3176 lseek(4, 3200, SEEK_SET)= 3200 Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Dan Kennedy [danielk1...@gmail.com] Sent: Wednesday, December 05, 2012 10:27 AM To: sqlite-users@sqlite.org Subject: EXT :Re: [sqlite] Tracing latencies On 12/05/2012 09:03 PM, Black, Michael (IS) wrote: Hmmm...looking at that strace sequencing is there some reason those 24-byte and 1024-byte writes can't be combined? The 1024-byte write is occurring at the end boundary of the 24-byte. That would cut the i/o ops in half and might be a noticeable improvement. A memory copy would be a lot faster than 2 i/o requests. And since disk page size is always a power of 2 would the 1024 buffer be better off as 1000 to align the page i/o better? Those writes are appending to the WAL file. Writes to the database file are always aligned page-sized (in this case 1024 byte) blocks. For the WAL file, we tried it both ways (combining the 24 and 1024 bytes writes into one and leaving them separate) and found that, on linux, it's faster to call write() twice. In other words, the extra write() call is cheaper than doing a 1048 byte memcpy(). And you can't just use 1048 byte buffers everywhere internally, as memory allocators tend to waste lots of space if you allocate many blocks that are all just a bit larger than a power-of-two. Having said that, there has been at least one closed-source VFS backend that buffers the sequential writes SQLite makes on the journal and WAL files so that it can make mostly 8KB aligned writes to the underlying file-system. So on some systems there is a benefit to writing aligned page blocks even if you are writing sequentially. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Keith Chew [keith.c...@gmail.com] Sent: Wednesday, December 05, 2012 2:11 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Tracing latencies Hi Dan On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy<danielk1...@gmail.com> wrote: If it's not fsync() then IO delays are normally caused by read(). You could try [strace -T -eread ...] to check. Are SELECT statements fast on the same database? How large is the database compared to the machines memory? Ah, do you think the read on the OS level is starving the writes? There are not many DB selects from the app, but there are other read IO activities happening in the background. Still it doesn't make sense, because WAL mode ensures from sqlite's point of view, the reader will not be blocked by the writer. So, sqlite is subjected to similar levels of read IO activity as Mysql (because there are very few DB selects). The tables are small, around 20MB in total, compared to 2GB of memory available. I have done more investigation. Using strace, I waited to capture an insert/update with a long latency, and caught one that is 2s long, below is the strace to the WAL file. We can see that it is very bursty, all the seeks and writes span close to over 1s. I recall doing a strace on Mysql a long time ago, and each insert only does a single seek and write. Maybe it is sqlite's file format that requires it to seek/write multiple places causing the slowness? I am only guessing here, as I do not know how to interpret the strace logs below. [pid 4015] 21:01:53.634099 _llseek(98, 499928, [pid 4015] 21:01:53.634245 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 [pid 4015] 21:01:53.634546 _llseek(98, 499952, [pid 4015] 21:01:
Re: [sqlite] Tracing latencies
I'm a bit confused with "always aligned". None of the lseeks in this log are 1024 aligned. And I just ran a test with the 3.7.15.1 latest amalgamation and most of these seeks are not aligned. Once in a while it gets lucky. Alignment sure isn't deliberate in this. It appears the first page is 1080 which is already out of alignment. File#4 here is the wal file. lseek(4, 0, SEEK_SET) = 0 lseek(4, 0, SEEK_SET) = 0 lseek(4, 32, SEEK_SET) = 32 lseek(4, 56, SEEK_SET) = 56 lseek(4, 1080, SEEK_SET)= 1080 lseek(4, 1104, SEEK_SET)= 1104 lseek(4, 2128, SEEK_SET)= 2128 lseek(4, 2152, SEEK_SET)= 2152 lseek(4, 3176, SEEK_SET)= 3176 lseek(4, 3200, SEEK_SET)= 3200 Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Dan Kennedy [danielk1...@gmail.com] Sent: Wednesday, December 05, 2012 10:27 AM To: sqlite-users@sqlite.org Subject: EXT :Re: [sqlite] Tracing latencies On 12/05/2012 09:03 PM, Black, Michael (IS) wrote: > Hmmm...looking at that strace sequencing is there some reason those 24-byte > and 1024-byte writes can't be combined? The 1024-byte write is occurring at > the end boundary of the 24-byte. > > That would cut the i/o ops in half and might be a noticeable improvement. A > memory copy would be a lot faster than 2 i/o requests. > > And since disk page size is always a power of 2 would the 1024 buffer be > better off as 1000 to align the page i/o better? Those writes are appending to the WAL file. Writes to the database file are always aligned page-sized (in this case 1024 byte) blocks. For the WAL file, we tried it both ways (combining the 24 and 1024 bytes writes into one and leaving them separate) and found that, on linux, it's faster to call write() twice. In other words, the extra write() call is cheaper than doing a 1048 byte memcpy(). And you can't just use 1048 byte buffers everywhere internally, as memory allocators tend to waste lots of space if you allocate many blocks that are all just a bit larger than a power-of-two. Having said that, there has been at least one closed-source VFS backend that buffers the sequential writes SQLite makes on the journal and WAL files so that it can make mostly 8KB aligned writes to the underlying file-system. So on some systems there is a benefit to writing aligned page blocks even if you are writing sequentially. > > Michael D. Black > Senior Scientist > Advanced Analytics Directorate > Advanced GEOINT Solutions Operating Unit > Northrop Grumman Information Systems > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Keith Chew [keith.c...@gmail.com] > Sent: Wednesday, December 05, 2012 2:11 AM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] Tracing latencies > > Hi Dan > > On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy<danielk1...@gmail.com> wrote: >> If it's not fsync() then IO delays are normally caused by read(). >> You could try [strace -T -eread ...] to check. >> >> Are SELECT statements fast on the same database? How large is the >> database compared to the machines memory? > > Ah, do you think the read on the OS level is starving the writes? > There are not many DB selects from the app, but there are other read > IO activities happening in the background. Still it doesn't make > sense, because WAL mode ensures from sqlite's point of view, the > reader will not be blocked by the writer. So, sqlite is subjected to > similar levels of read IO activity as Mysql (because there are very > few DB selects). > > The tables are small, around 20MB in total, compared to 2GB of memory > available. > > I have done more investigation. Using strace, I waited to capture an > insert/update with a long latency, and caught one that is 2s long, > below is the strace to the WAL file. We can see that it is very > bursty, all the seeks and writes span close to over 1s. > > I recall doing a strace on Mysql a long time ago, and each insert only > does a single seek and write. Maybe it is sqlite's file format that > requires it to seek/write multiple places causing the slowness? I am > only guessing here, as I do not know how to interpret the strace logs > below. > > [pid 4015] 21:01:53.634099 _llseek(98, 499928, > [pid 4015] 21:01:53.634245 write(98, > "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 > > [pid 4015] 21:01:53.634546 _llseek(98,
Re: [sqlite] Tracing latencies
On 12/05/2012 09:03 PM, Black, Michael (IS) wrote: Hmmm...looking at that strace sequencing is there some reason those 24-byte and 1024-byte writes can't be combined? The 1024-byte write is occurring at the end boundary of the 24-byte. That would cut the i/o ops in half and might be a noticeable improvement. A memory copy would be a lot faster than 2 i/o requests. And since disk page size is always a power of 2 would the 1024 buffer be better off as 1000 to align the page i/o better? Those writes are appending to the WAL file. Writes to the database file are always aligned page-sized (in this case 1024 byte) blocks. For the WAL file, we tried it both ways (combining the 24 and 1024 bytes writes into one and leaving them separate) and found that, on linux, it's faster to call write() twice. In other words, the extra write() call is cheaper than doing a 1048 byte memcpy(). And you can't just use 1048 byte buffers everywhere internally, as memory allocators tend to waste lots of space if you allocate many blocks that are all just a bit larger than a power-of-two. Having said that, there has been at least one closed-source VFS backend that buffers the sequential writes SQLite makes on the journal and WAL files so that it can make mostly 8KB aligned writes to the underlying file-system. So on some systems there is a benefit to writing aligned page blocks even if you are writing sequentially. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Keith Chew [keith.c...@gmail.com] Sent: Wednesday, December 05, 2012 2:11 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Tracing latencies Hi Dan On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy<danielk1...@gmail.com> wrote: If it's not fsync() then IO delays are normally caused by read(). You could try [strace -T -eread ...] to check. Are SELECT statements fast on the same database? How large is the database compared to the machines memory? Ah, do you think the read on the OS level is starving the writes? There are not many DB selects from the app, but there are other read IO activities happening in the background. Still it doesn't make sense, because WAL mode ensures from sqlite's point of view, the reader will not be blocked by the writer. So, sqlite is subjected to similar levels of read IO activity as Mysql (because there are very few DB selects). The tables are small, around 20MB in total, compared to 2GB of memory available. I have done more investigation. Using strace, I waited to capture an insert/update with a long latency, and caught one that is 2s long, below is the strace to the WAL file. We can see that it is very bursty, all the seeks and writes span close to over 1s. I recall doing a strace on Mysql a long time ago, and each insert only does a single seek and write. Maybe it is sqlite's file format that requires it to seek/write multiple places causing the slowness? I am only guessing here, as I do not know how to interpret the strace logs below. [pid 4015] 21:01:53.634099 _llseek(98, 499928, [pid 4015] 21:01:53.634245 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 [pid 4015] 21:01:53.634546 _llseek(98, 499952, [pid 4015] 21:01:53.634712 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0 [pid 4015] 21:01:53.740449 write(98, "\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) = 24 [pid 4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0 [pid 4015] 21:01:53.740566 write(98, "\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0 [pid 4015] 21:01:53.740677 write(98, "\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) = 24 [pid 4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0 [pid 4015] 21:01:53.740781 write(98, "\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0 [pid 4015] 21:01:53.740889 write(98, "\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24 [pid 4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0 [pid 4015] 21:01:53.740989 write(98, "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3 \3<\3X\3t"..., 1024) = 1024 [pid 4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0 [pid 4015] 21:01:54.636636 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24 [pid 4015] 21:01:54.636812 _llsee
Re: [sqlite] Tracing latencies
Run this program (change the diff threshold if you want) on the strace log file. This will only show the calls that take "too long" and the time involved. #include #include main(int argc, char *argv[]) { char buf[65535]; double t1=0,t2; FILE *fp=fopen(argv[1],"r"); while(fgets(buf,sizeof(buf),fp)) { if (t1==0) { t1 = atof(buf); } else { t2 = atof(buf); double diff = t2 - t1; if (diff > .0001) { printf("%.6f %s",t2-t1,buf); } t1 = t2; } } } Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Keith Chew [keith.c...@gmail.com] Sent: Wednesday, December 05, 2012 2:11 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Tracing latencies Hi Dan On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy <danielk1...@gmail.com> wrote: > If it's not fsync() then IO delays are normally caused by read(). > You could try [strace -T -eread ...] to check. > > Are SELECT statements fast on the same database? How large is the > database compared to the machines memory? Ah, do you think the read on the OS level is starving the writes? There are not many DB selects from the app, but there are other read IO activities happening in the background. Still it doesn't make sense, because WAL mode ensures from sqlite's point of view, the reader will not be blocked by the writer. So, sqlite is subjected to similar levels of read IO activity as Mysql (because there are very few DB selects). The tables are small, around 20MB in total, compared to 2GB of memory available. I have done more investigation. Using strace, I waited to capture an insert/update with a long latency, and caught one that is 2s long, below is the strace to the WAL file. We can see that it is very bursty, all the seeks and writes span close to over 1s. I recall doing a strace on Mysql a long time ago, and each insert only does a single seek and write. Maybe it is sqlite's file format that requires it to seek/write multiple places causing the slowness? I am only guessing here, as I do not know how to interpret the strace logs below. [pid 4015] 21:01:53.634099 _llseek(98, 499928, [pid 4015] 21:01:53.634245 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 [pid 4015] 21:01:53.634546 _llseek(98, 499952, [pid 4015] 21:01:53.634712 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0 [pid 4015] 21:01:53.740449 write(98, "\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) = 24 [pid 4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0 [pid 4015] 21:01:53.740566 write(98, "\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0 [pid 4015] 21:01:53.740677 write(98, "\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) = 24 [pid 4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0 [pid 4015] 21:01:53.740781 write(98, "\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0 [pid 4015] 21:01:53.740889 write(98, "\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24 [pid 4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0 [pid 4015] 21:01:53.740989 write(98, "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3 \3<\3X\3t"..., 1024) = 1024 [pid 4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0 [pid 4015] 21:01:54.636636 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24 [pid 4015] 21:01:54.636812 _llseek(98, 504144, [504144], SEEK_SET) = 0 [pid 4015] 21:01:54.636860 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 6744] 21:01:54.676590 close(98) = 0 Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Hmmm...looking at that strace sequencing is there some reason those 24-byte and 1024-byte writes can't be combined? The 1024-byte write is occurring at the end boundary of the 24-byte. That would cut the i/o ops in half and might be a noticeable improvement. A memory copy would be a lot faster than 2 i/o requests. And since disk page size is always a power of 2 would the 1024 buffer be better off as 1000 to align the page i/o better? Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Keith Chew [keith.c...@gmail.com] Sent: Wednesday, December 05, 2012 2:11 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Tracing latencies Hi Dan On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy <danielk1...@gmail.com> wrote: > If it's not fsync() then IO delays are normally caused by read(). > You could try [strace -T -eread ...] to check. > > Are SELECT statements fast on the same database? How large is the > database compared to the machines memory? Ah, do you think the read on the OS level is starving the writes? There are not many DB selects from the app, but there are other read IO activities happening in the background. Still it doesn't make sense, because WAL mode ensures from sqlite's point of view, the reader will not be blocked by the writer. So, sqlite is subjected to similar levels of read IO activity as Mysql (because there are very few DB selects). The tables are small, around 20MB in total, compared to 2GB of memory available. I have done more investigation. Using strace, I waited to capture an insert/update with a long latency, and caught one that is 2s long, below is the strace to the WAL file. We can see that it is very bursty, all the seeks and writes span close to over 1s. I recall doing a strace on Mysql a long time ago, and each insert only does a single seek and write. Maybe it is sqlite's file format that requires it to seek/write multiple places causing the slowness? I am only guessing here, as I do not know how to interpret the strace logs below. [pid 4015] 21:01:53.634099 _llseek(98, 499928, [pid 4015] 21:01:53.634245 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 [pid 4015] 21:01:53.634546 _llseek(98, 499952, [pid 4015] 21:01:53.634712 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0 [pid 4015] 21:01:53.740449 write(98, "\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) = 24 [pid 4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0 [pid 4015] 21:01:53.740566 write(98, "\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0 [pid 4015] 21:01:53.740677 write(98, "\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) = 24 [pid 4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0 [pid 4015] 21:01:53.740781 write(98, "\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0 [pid 4015] 21:01:53.740889 write(98, "\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24 [pid 4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0 [pid 4015] 21:01:53.740989 write(98, "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3 \3<\3X\3t"..., 1024) = 1024 [pid 4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0 [pid 4015] 21:01:54.636636 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24 [pid 4015] 21:01:54.636812 _llseek(98, 504144, [504144], SEEK_SET) = 0 [pid 4015] 21:01:54.636860 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 6744] 21:01:54.676590 close(98) = 0 Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Keith Chew wrote: > In the application, it opens 5 connections and put them in a pool. Please note that SQLite has no write concurrency (actual writing to the DB file requires an exclusive lock), and that there is also no synchronization between multiple connections -- when one connection does not get the exclusive lock, it just waits a little time before trying again. It's possible that it's these waits that you're seeing. > I had a look, and at the most 3 connections are used at a time. Try it with one connection at a time. Regards, Clemens ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On 12/05/2012 03:11 PM, Keith Chew wrote: Hi Dan On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedywrote: If it's not fsync() then IO delays are normally caused by read(). You could try [strace -T -eread ...] to check. Are SELECT statements fast on the same database? How large is the database compared to the machines memory? Ah, do you think the read on the OS level is starving the writes? There are not many DB selects from the app, but there are other read IO activities happening in the background. Still it doesn't make sense, because WAL mode ensures from sqlite's point of view, the reader will not be blocked by the writer. So, sqlite is subjected to similar levels of read IO activity as Mysql (because there are very few DB selects). The tables are small, around 20MB in total, compared to 2GB of memory available. I have done more investigation. Using strace, I waited to capture an insert/update with a long latency, and caught one that is 2s long, below is the strace to the WAL file. We can see that it is very bursty, all the seeks and writes span close to over 1s. I recall doing a strace on Mysql a long time ago, and each insert only does a single seek and write. Maybe it is sqlite's file format that requires it to seek/write multiple places causing the slowness? I am only guessing here, as I do not know how to interpret the strace logs below. A bunch of sequential writes to the wal file, totaling somewhere around 4KB. This should be fast. But in the middle you have a 1 second delay: > [pid 4015] 21:01:53.740989 write(98, > "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3 > \3<\3X\3t"..., 1024) = 1024 > [pid 4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0 If you use -T instead of -t with strace, it reports the amount of wall-clock time spent in each system call. Dan. [pid 4015] 21:01:53.634099 _llseek(98, 499928, [pid 4015] 21:01:53.634245 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 [pid 4015] 21:01:53.634546 _llseek(98, 499952, [pid 4015] 21:01:53.634712 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0 [pid 4015] 21:01:53.740449 write(98, "\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) = 24 [pid 4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0 [pid 4015] 21:01:53.740566 write(98, "\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0 [pid 4015] 21:01:53.740677 write(98, "\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) = 24 [pid 4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0 [pid 4015] 21:01:53.740781 write(98, "\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0 [pid 4015] 21:01:53.740889 write(98, "\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24 [pid 4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0 [pid 4015] 21:01:53.740989 write(98, "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3 \3<\3X\3t"..., 1024) = 1024 [pid 4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0 [pid 4015] 21:01:54.636636 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24 [pid 4015] 21:01:54.636812 _llseek(98, 504144, [504144], SEEK_SET) = 0 [pid 4015] 21:01:54.636860 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 6744] 21:01:54.676590 close(98) = 0 Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On 5 Dec 2012, at 8:11am, Keith Chewwrote: > Ah, do you think the read on the OS level is starving the writes? > There are not many DB selects from the app, but there are other read > IO activities happening in the background. Still it doesn't make > sense, because WAL mode ensures from sqlite's point of view, the > reader will not be blocked by the writer. So, sqlite is subjected to > similar levels of read IO activity as Mysql (because there are very > few DB selects). You may be right. If your cache is big enough, some storage setups prioritise all reads ahead of writes. They'll accumulate write commands in cache but won't actually do them until nothing is asking to read data any more. The theory is that all programs, sooner or later, will need some writes to happen before they know what to read next. In these days when programs do several things simultaneously the theory is wrong, and many programs do things like constantly updating one window while allowing another to be used to input new information. If they ask for data to be read fast enough, the writes never get done. This is called 'write starvation'. I can verify that there's nothing in SQLite that makes this assumption or works this way. What you're seeing must be provided by some subset of your firmware, networking protocol, framework or database library. The cures you've already discovered: don't use that combination of PRAGMAs on that setup. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Hi Dan On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedywrote: > If it's not fsync() then IO delays are normally caused by read(). > You could try [strace -T -eread ...] to check. > > Are SELECT statements fast on the same database? How large is the > database compared to the machines memory? Ah, do you think the read on the OS level is starving the writes? There are not many DB selects from the app, but there are other read IO activities happening in the background. Still it doesn't make sense, because WAL mode ensures from sqlite's point of view, the reader will not be blocked by the writer. So, sqlite is subjected to similar levels of read IO activity as Mysql (because there are very few DB selects). The tables are small, around 20MB in total, compared to 2GB of memory available. I have done more investigation. Using strace, I waited to capture an insert/update with a long latency, and caught one that is 2s long, below is the strace to the WAL file. We can see that it is very bursty, all the seeks and writes span close to over 1s. I recall doing a strace on Mysql a long time ago, and each insert only does a single seek and write. Maybe it is sqlite's file format that requires it to seek/write multiple places causing the slowness? I am only guessing here, as I do not know how to interpret the strace logs below. [pid 4015] 21:01:53.634099 _llseek(98, 499928, [pid 4015] 21:01:53.634245 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24 [pid 4015] 21:01:53.634546 _llseek(98, 499952, [pid 4015] 21:01:53.634712 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0 [pid 4015] 21:01:53.740449 write(98, "\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) = 24 [pid 4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0 [pid 4015] 21:01:53.740566 write(98, "\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0 [pid 4015] 21:01:53.740677 write(98, "\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) = 24 [pid 4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0 [pid 4015] 21:01:53.740781 write(98, "\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0 [pid 4015] 21:01:53.740889 write(98, "\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24 [pid 4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0 [pid 4015] 21:01:53.740989 write(98, "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3 \3<\3X\3t"..., 1024) = 1024 [pid 4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0 [pid 4015] 21:01:54.636636 write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24 [pid 4015] 21:01:54.636812 _llseek(98, 504144, [504144], SEEK_SET) = 0 [pid 4015] 21:01:54.636860 write(98, "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024 [pid 6744] 21:01:54.676590 close(98) = 0 Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On 12/05/2012 09:22 AM, Keith Chew wrote: On Wed, Dec 5, 2012 at 1:14 PM, Keith Chewwrote: Since there is no fsync (I know this using grep on the strace ouput), it is hard to tell what to look for in the strace. The output is very noisy, so it makes it hard to go through them. Does anyone know any handy flags for strace to only see all the activities associated with sqlite file access? I have confirmed using strace there is no fsyncs done by sqlite (using synchronous=0 or synchronous=1), just seeks and writes. fsync is seen only when synchronous=2. After conducting more tests (using synchronous=0), I have some results which is baffling me: - Mysql average insert time, with write-cache OFF: 1-2ms - Mysql average insert time, with write-cache ON: 0-1ms - sqlite average insert time, with write-cache OFF: 160ms (have seen up to 4000ms) - sqlite average insert time, with write-cache ON: 50ms (have seen up to 150ms) Quite a significant difference. I am stumped on how to make sqlite run faster. If it's not fsync() then IO delays are normally caused by read(). You could try [strace -T -eread ...] to check. Are SELECT statements fast on the same database? How large is the database compared to the machines memory? Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Wed, Dec 5, 2012 at 1:14 PM, Keith Chewwrote: > Since there is no fsync (I know this using grep on the strace ouput), > it is hard to tell what to look for in the strace. The output is very > noisy, so it makes it hard to go through them. > > Does anyone know any handy flags for strace to only see all the > activities associated with sqlite file access? I have confirmed using strace there is no fsyncs done by sqlite (using synchronous=0 or synchronous=1), just seeks and writes. fsync is seen only when synchronous=2. After conducting more tests (using synchronous=0), I have some results which is baffling me: - Mysql average insert time, with write-cache OFF: 1-2ms - Mysql average insert time, with write-cache ON: 0-1ms - sqlite average insert time, with write-cache OFF: 160ms (have seen up to 4000ms) - sqlite average insert time, with write-cache ON: 50ms (have seen up to 150ms) Quite a significant difference. I am stumped on how to make sqlite run faster. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Hi Michael On Wed, Dec 5, 2012 at 12:02 PM, Black, Michael (IS)wrote: > Can you re-run your strace as "strace -tt" and look at the timings to help > pinpoint it? Since there is no fsync (I know this using grep on the strace ouput), it is hard to tell what to look for in the strace. The output is very noisy, so it makes it hard to go through them. Does anyone know any handy flags for strace to only see all the activities associated with sqlite file access? Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Can you re-run your strace as "strace -tt" and look at the timings to help pinpoint it? Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Keith Chew [keith.c...@gmail.com] Sent: Tuesday, December 04, 2012 5:00 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Tracing latencies On Wed, Dec 5, 2012 at 11:10 AM, Keith Chew <keith.c...@gmail.com> wrote: > The strange thing is that I am setting sqlite it to use WAL, > autocheckpoint off and synchronous off. Even in this setup, I still > see > 350ms transactions times for less than 3 TPS. A bit hard to > believe, so I am now doing a strace to find out exactly what is > hitting the disk. II can confirm using strace that there are no fsyncs happening from the application (which we expect because synchronous=0). So, it must be something else that is causing these blocks. Somekind of file locking issue (particular to my environment)? What else should I be looking at? Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Wed, Dec 5, 2012 at 11:10 AM, Keith Chewwrote: > The strange thing is that I am setting sqlite it to use WAL, > autocheckpoint off and synchronous off. Even in this setup, I still > see > 350ms transactions times for less than 3 TPS. A bit hard to > believe, so I am now doing a strace to find out exactly what is > hitting the disk. II can confirm using strace that there are no fsyncs happening from the application (which we expect because synchronous=0). So, it must be something else that is causing these blocks. Somekind of file locking issue (particular to my environment)? What else should I be looking at? Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Hi Michael On Wed, Dec 5, 2012 at 10:57 AM, Black, Michael (IS)wrote: > Could it be waitiing on the prior transaction though? > > Since disk I/O lies it might be syncing the last transaction causing the new > one to wait longer. Yes, that is possible. In the application, it opens 5 connections and put them in a pool. I had a look, and at the most 3 connections are used at a time. So, I am looking at 3 TPS max. The strange thing is that I am setting sqlite it to use WAL, autocheckpoint off and synchronous off. Even in this setup, I still see > 350ms transactions times for less than 3 TPS. A bit hard to believe, so I am now doing a strace to find out exactly what is hitting the disk. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Wed, Dec 5, 2012 at 10:56 AM, Richard Hippwrote: >> It is not an indexing issue. For one of the UPDATE SQLs, it is >> updating a table with only 1 record in it. And this takes > 350ms... >> > > Can you provide specifics: The schema and the UPDATE statement? Schema and SQL are below: CREATE TABLE `k_user` ( `user_id` integer primary key autoincrement, `user_name` varchar(50) default NULL, `created_at` datetime default NULL, `user_group_id` bigint(20) default NULL, `x` double default NULL, `y` double default NULL, `heading` double default NULL, `altitude` double default NULL, `speed` double default NULL, `state_id` bigint(20) default NULL, `status` varchar(15) default NULL, `last_updated_at` datetime default NULL ); update k_user set x = ? , y = ? , heading = ? , last_updated_at = ? where user_id=? It cannot get simpler than this. For the inserts, it is just as simple. A table with 15 columns, and a straight insert. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Could it be waitiing on the prior transaction though? Since disk I/O lies it might be syncing the last transaction causing the new one to wait longer. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Keith Chew [keith.c...@gmail.com] Sent: Tuesday, December 04, 2012 3:45 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Tracing latencies On Wed, Dec 5, 2012 at 10:28 AM, Keith Chew <keith.c...@gmail.com> wrote: > > I wonder what could be causing sqlite to hang so long? Will try to > remove all indexes to see if that narrows things down. It is not an indexing issue. For one of the UPDATE SQLs, it is updating a table with only 1 record in it. And this takes > 350ms... All the other tables have only 1 or 2 indexes, so should not be impacting the inserts/updates. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Tue, Dec 4, 2012 at 4:45 PM, Keith Chewwrote: > On Wed, Dec 5, 2012 at 10:28 AM, Keith Chew wrote: > > > > I wonder what could be causing sqlite to hang so long? Will try to > > remove all indexes to see if that narrows things down. > > It is not an indexing issue. For one of the UPDATE SQLs, it is > updating a table with only 1 record in it. And this takes > 350ms... > Can you provide specifics: The schema and the UPDATE statement? -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Wed, Dec 5, 2012 at 10:28 AM, Keith Chewwrote: > > I wonder what could be causing sqlite to hang so long? Will try to > remove all indexes to see if that narrows things down. It is not an indexing issue. For one of the UPDATE SQLs, it is updating a table with only 1 record in it. And this takes > 350ms... All the other tables have only 1 or 2 indexes, so should not be impacting the inserts/updates. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Wed, Dec 5, 2012 at 9:33 AM, Keith Chewwrote: > Any other ideas for me to trace this at a lower level would be greatly > appreciated. I think I will need to get deeper into linux's block > layer stuff, so perhaps this is not the correct place to ask the > question, but I thought I'd start here. I have done a bit more comparison on the IO pattern when running MySQL and sqlite, using vmstat (results below). I can observe that sqlite causes a bit more bytes to be written to the disk, but not by much. Wait times are also reasonable. I wonder what could be causing sqlite to hang so long? Will try to remove all indexes to see if that narrows things down. Sqlite: 2 0 0 184148 51564 44622400 028 290 602 0 0 86 14 0 1 0 0 184148 51572 44623200 028 293 1041 26 7 51 16 0 0 0 0 184148 51580 44623200 052 300 926 0 0 84 16 0 1 0 0 184024 51588 44623200 028 293 1071 24 9 53 14 0 1 0 0 184024 51596 44622800 036 287 802 0 1 83 16 0 2 0 0 183588 51604 44623600 028 291 1004 10 4 45 41 0 0 0 0 184024 51612 44623600 056 293 1003 16 3 65 17 0 0 0 0 184024 51620 44624000 028 293 854 0 0 87 13 0 0 0 0 184024 51628 44624400 040 292 1060 28 6 49 18 0 0 0 0 184024 51636 44624000 040 293 598 0 1 82 17 0 1 0 0 184024 51644 44624400 036 290 899 24 7 54 15 0 0 0 0 184024 51652 44624800 036 292 591 0 1 90 9 0 1 0 0 183580 51660 44624800 040 290 758 16 3 67 14 0 0 0 0 183900 51668 44624800 036 291 730 11 3 71 16 0 1 0 0 183900 51676 44624800 028 292 601 0 1 59 40 0 0 0 0 183900 51684 44625200 036 292 894 25 8 67 0 0 1 0 0 183900 51692 44625200 064 295 590 0 0 91 9 0 0 0 0 183900 51700 44625600 028 289 860 25 7 55 13 0 Mysql: 2 0 0 192408 48120 44830800 048 260 1076 30 5 62 3 0 0 0 0 195300 48128 45031600 032 263 1027 15 4 71 10 0 2 1 0 194708 48128 45027600 056 257 768 15 4 81 0 0 2 0 0 199004 48136 44640000 052 263 722 10 2 85 3 0 1 0 0 197268 48144 44842800 040 258 636 7 0 81 12 0 0 0 0 196896 48152 44845600 036 260 916 22 9 59 10 0 1 0 0 198880 48152 44640800 0 0 253 496 0 0 100 0 0 1 0 0 199012 48160 44626400 024 258 765 28 9 63 0 0 1 0 0 199012 48168 44626400 032 260 382 1 0 85 14 0 2 0 0 198868 48168 44626400 0 0 254 701 23 8 69 0 0 0 0 0 199012 48176 44629200 032 259 434 6 0 82 12 0 3 0 0 198832 48184 44628000 032 259 445 0 2 86 12 0 1 0 0 199012 48184 44626400 0 0 253 656 22 8 70 0 0 1 0 0 199012 48192 44626400 032 259 412 6 1 79 14 0 1 0 0 19 48200 44625600 056 261 702 24 6 53 17 0 0 0 0 19 48200 44626400 056 266 376 0 1 96 3 0 1 0 0 198864 48208 44626400 024 257 757 31 5 53 11 0 0 0 0 19 48216 44626400 032 259 382 1 1 85 13 0 2 0 0 19 48216 44626400 0 0 253 368 0 0 100 0 0 0 0 0 19 48224 44626800 036 258 430 6 1 73 20 0 1 0 0 197568 48232 44468800 024 267 687 8 3 80 9 0 Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
Hi Richard On Wed, Dec 5, 2012 at 9:05 AM, Richard Hippwrote: > The $dir will contain initial database contents and scripts of SQL > statements that were run. You can rerun those SQL statement using the > command-line shell to find slow ones, then do things like EXPLAIN QUERY > PLAN to figure out why they are slow and perhaps fix them by adding indices. Thank you very much for your comprehensive reply. I actually know the queries, because the application logs both the SQL and times. They are simple SQL statements, eg "insert into table_a (...) values (...)" or "update table_a set aaa='...' where bbb='...'". I just tried enabling the write-cache of the drive, and the warn logs disappeared. So, I know that somehow the OS/filesystem/disk setup is blocking the inserts/updates quite severely for sqlite (but not for MySQL). Is there anything else I can do to improve the situation? Ideally I would like to understand what is happening under the hood of sqlite, to improve my understanding of the difference between the 2 DBs. Any other ideas for me to trace this at a lower level would be greatly appreciated. I think I will need to get deeper into linux's block layer stuff, so perhaps this is not the correct place to ask the question, but I thought I'd start here. Regards Keith ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Tracing latencies
On Tue, Dec 4, 2012 at 2:30 PM, Keith Chewwrote: > Hi > > I have an existing application, which I am experimenting with > replacing MySQL with sqlite. I have 2 boxes, one running MySQL, the > other sqlite. > > Execution times for every insert/update/delete SQLs is measured, and a > warn log is printed if it takes > 250ms to perform the SQL. > > OS configuration: > - kernel 2.6.39 > - ext3, data=ordered, commit=1, barrier=1 > > On the MySQL box, I do not see any warn logs, so everything is going > well. On the sqlite box, I see warn logs come up quite often, ranging > from 350ms up to 4000ms to run simple insert/update/delete SQL. > > I have tried many sqlite configurations including: > - WAL, synchronous=0, wal_autocheckpoint=0 > > which should make sqlite run the fastest! > > My question is how do I go about tracing the cause of these latencies? > Here is one approach: (1) Get the latest SQLite sources from the Fossil repository (2) Do "./configure; make sqlite3.c" to build the amalgamation (3) Do "cat src/test_sqllog.c >>sqlite3.c" to append the "test_sqllog.c" file to the end of the amalgamation from step (2). (4) Edit the "sqlite3.c" file to insert a line near the top: #define SQLITE_ENABLE_SQLLOG Or, you can compile with -DSQLITE_ENABLE_SQLLOG, whichever is easiest for you. (5) Set the environment variable SQLITE_SQLLOG_DIR=$dir where $dir is an empty directory someplace where you want logging information to be written. (6) Run your application The $dir will contain initial database contents and scripts of SQL statements that were run. You can rerun those SQL statement using the command-line shell to find slow ones, then do things like EXPLAIN QUERY PLAN to figure out why they are slow and perhaps fix them by adding indices. Further information in the header comment of test_sqllog.c. http://www.sqlite.org/src/artifact/7813b47021a6d4e39bb7b1b328a8893dc59885cb The SQLLOG facility is new. So you'll have to use the latest sources out of Fossil for it to work. > I am starting to look at tools like blktrace, is that the right path? > Some guidance on how to use some low level tools would be appreciated, > > Regards > Keith > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users