I'm a bit confused with "always aligned".  None of the lseeks in this log are 
1024 aligned.

And I just ran a test with the 3.7.15.1 latest amalgamation and most of these 
seeks are not aligned.  Once in a while it gets lucky.  Alignment sure isn't 
deliberate in this.  It appears the first page is 1080 which is already out of 
alignment.  File#4 here is the wal file.

lseek(4, 0, SEEK_SET)                   = 0
lseek(4, 0, SEEK_SET)                   = 0
lseek(4, 32, SEEK_SET)                  = 32
lseek(4, 56, SEEK_SET)                  = 56
lseek(4, 1080, SEEK_SET)                = 1080
lseek(4, 1104, SEEK_SET)                = 1104
lseek(4, 2128, SEEK_SET)                = 2128
lseek(4, 2152, SEEK_SET)                = 2152
lseek(4, 3176, SEEK_SET)                = 3176
lseek(4, 3200, SEEK_SET)                = 3200




Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Dan Kennedy [danielk1...@gmail.com]
Sent: Wednesday, December 05, 2012 10:27 AM
To: sqlite-users@sqlite.org
Subject: EXT :Re: [sqlite] Tracing latencies

On 12/05/2012 09:03 PM, Black, Michael (IS) wrote:
> Hmmm...looking at that strace sequencing is there some reason those 24-byte 
> and 1024-byte writes can't be combined?  The 1024-byte write is occurring at 
> the end boundary of the 24-byte.
>
> That would cut the i/o ops in half and might be a noticeable improvement. A 
> memory copy would be a lot faster than 2 i/o requests.
>
> And since disk page size is always a power of 2 would the 1024 buffer be 
> better off as 1000 to align the page i/o better?


Those writes are appending to the WAL file. Writes to the
database file are always aligned page-sized (in this case
1024 byte) blocks.

For the WAL file, we tried it both ways (combining the 24
and 1024 bytes writes into one and leaving them separate) and
found that, on linux, it's faster to call write() twice. In
other words, the extra write() call is cheaper than doing
a 1048 byte memcpy(). And you can't just use 1048 byte buffers
everywhere internally, as memory allocators tend to waste lots
of space if you allocate many blocks that are all just a bit
larger than a power-of-two.

Having said that, there has been at least one closed-source VFS backend
that buffers the sequential writes SQLite makes on the journal
and WAL files so that it can make mostly 8KB aligned writes to
the underlying file-system. So on some systems there is a benefit
to writing aligned page blocks even if you are writing sequentially.




>
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Advanced GEOINT Solutions Operating Unit
> Northrop Grumman Information Systems
>
> ________________________________________
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
> behalf of Keith Chew [keith.c...@gmail.com]
> Sent: Wednesday, December 05, 2012 2:11 AM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] Tracing latencies
>
> Hi Dan
>
> On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy<danielk1...@gmail.com>  wrote:
>> If it's not fsync() then IO delays are normally caused by read().
>> You could try [strace -T -eread ...] to check.
>>
>> Are SELECT statements fast on the same database? How large is the
>> database compared to the machines memory?
>
> Ah, do you think the read on the OS level is starving the writes?
> There are not many DB selects from the app, but there are other read
> IO activities happening in the background. Still it doesn't make
> sense, because WAL mode ensures from sqlite's point of view, the
> reader will not be blocked by the writer. So, sqlite is subjected to
> similar levels of read IO activity as Mysql (because there are very
> few DB selects).
>
> The tables are small, around 20MB in total, compared to 2GB of memory 
> available.
>
> I have done more investigation. Using strace, I waited to capture an
> insert/update with a long latency, and caught one that is 2s long,
> below is the strace to the WAL file. We can see that it is very
> bursty, all the seeks and writes span close to over 1s.
>
> I recall doing a strace on Mysql a long time ago, and each insert only
> does a single seek and write. Maybe it is sqlite's file format that
> requires it to seek/write multiple places causing the slowness? I am
> only guessing here, as I do not know how to interpret the strace logs
> below.
>
> [pid  4015] 21:01:53.634099 _llseek(98, 499928,<unfinished ...>
> [pid  4015] 21:01:53.634245 write(98,
> "\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24
> <unfinished ...>
> [pid  4015] 21:01:53.634546 _llseek(98, 499952,<unfinished ...>
> [pid  4015] 21:01:53.634712 write(98,
> "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024<unfinished ...>
> [pid  4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0
> [pid  4015] 21:01:53.740449 write(98,
> "\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) =
> 24
> [pid  4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0
> [pid  4015] 21:01:53.740566 write(98,
> "\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"...,
> 1024) = 1024
> [pid  4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0
> [pid  4015] 21:01:53.740677 write(98,
> "\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) =
> 24
> [pid  4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0
> [pid  4015] 21:01:53.740781 write(98,
> "\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024) = 1024
> [pid  4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0
> [pid  4015] 21:01:53.740889 write(98,
> "\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24
> [pid  4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0
> [pid  4015] 21:01:53.740989 write(98,
> "\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3
> \3<\3X\3t"..., 1024) = 1024
> [pid  4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0
> [pid  4015] 21:01:54.636636 write(98,
> "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24
> [pid  4015] 21:01:54.636812 _llseek(98, 504144, [504144], SEEK_SET) = 0
> [pid  4015] 21:01:54.636860 write(98,
> "\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1024<unfinished ...>
> [pid  6744] 21:01:54.676590 close(98)   = 0
>
>
> Regards
> Keith
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to