On 12/06/2012 12:47 AM, Black, Michael (IS) wrote:
I'm a bit confused with "always aligned".  None of the lseeks in this
log are 1024 aligned.

And I just ran a test with the 3.7.15.1 latest amalgamation and most
of these seeks are not aligned.  Once in a while it gets lucky.
Alignment sure isn't deliberate in this.  It appears the first page
is 1080 which is already out of alignment.  File#4 here is the wal
file.

I wasn't clear. Writes to the WAL file are not aligned. However,
they are usually sequential, which means on an OS with a smart
enough cache you don't really gain anything at all trying to align
them.

To the database file, where it does matter, writes are always
aligned.





lseek(4, 0, SEEK_SET)                   = 0 lseek(4, 0, SEEK_SET)
= 0 lseek(4, 32, SEEK_SET)                  = 32 lseek(4, 56,
SEEK_SET)                  = 56 lseek(4, 1080, SEEK_SET)
= 1080 lseek(4, 1104, SEEK_SET)                = 1104 lseek(4, 2128,
SEEK_SET)                = 2128 lseek(4, 2152, SEEK_SET)
= 2152 lseek(4, 3176, SEEK_SET)                = 3176 lseek(4, 3200,
SEEK_SET)                = 3200




Michael D. Black Senior Scientist Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit Northrop Grumman Information
Systems

________________________________________ From:
sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
behalf of Dan Kennedy [danielk1...@gmail.com] Sent: Wednesday,
December 05, 2012 10:27 AM To: sqlite-users@sqlite.org Subject: EXT
:Re: [sqlite] Tracing latencies

On 12/05/2012 09:03 PM, Black, Michael (IS) wrote:
Hmmm...looking at that strace sequencing is there some reason those
24-byte and 1024-byte writes can't be combined?  The 1024-byte
write is occurring at the end boundary of the 24-byte.

That would cut the i/o ops in half and might be a noticeable
improvement. A memory copy would be a lot faster than 2 i/o
requests.

And since disk page size is always a power of 2 would the 1024
buffer be better off as 1000 to align the page i/o better?


Those writes are appending to the WAL file. Writes to the database
file are always aligned page-sized (in this case 1024 byte) blocks.

For the WAL file, we tried it both ways (combining the 24 and 1024
bytes writes into one and leaving them separate) and found that, on
linux, it's faster to call write() twice. In other words, the extra
write() call is cheaper than doing a 1048 byte memcpy(). And you
can't just use 1048 byte buffers everywhere internally, as memory
allocators tend to waste lots of space if you allocate many blocks
that are all just a bit larger than a power-of-two.

Having said that, there has been at least one closed-source VFS
backend that buffers the sequential writes SQLite makes on the
journal and WAL files so that it can make mostly 8KB aligned writes
to the underlying file-system. So on some systems there is a benefit
to writing aligned page blocks even if you are writing sequentially.





Michael D. Black Senior Scientist Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit Northrop Grumman
Information Systems

________________________________________ From:
sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org]
on behalf of Keith Chew [keith.c...@gmail.com] Sent: Wednesday,
December 05, 2012 2:11 AM To: General Discussion of SQLite
Database Subject: EXT :Re: [sqlite] Tracing latencies

Hi Dan

On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy<danielk1...@gmail.com>
wrote:
If it's not fsync() then IO delays are normally caused by
read(). You could try [strace -T -eread ...] to check.

Are SELECT statements fast on the same database? How large is
the database compared to the machines memory?

Ah, do you think the read on the OS level is starving the writes?
There are not many DB selects from the app, but there are other
read IO activities happening in the background. Still it doesn't
make sense, because WAL mode ensures from sqlite's point of view,
the reader will not be blocked by the writer. So, sqlite is
subjected to similar levels of read IO activity as Mysql (because
there are very few DB selects).

The tables are small, around 20MB in total, compared to 2GB of
memory available.

I have done more investigation. Using strace, I waited to capture
an insert/update with a long latency, and caught one that is 2s
long, below is the strace to the WAL file. We can see that it is
very bursty, all the seeks and writes span close to over 1s.

I recall doing a strace on Mysql a long time ago, and each insert
only does a single seek and write. Maybe it is sqlite's file format
that requires it to seek/write multiple places causing the
slowness? I am only guessing here, as I do not know how to
interpret the strace logs below.

[pid  4015] 21:01:53.634099 _llseek(98, 499928,<unfinished ...>
[pid  4015] 21:01:53.634245 write(98,
"\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24
<unfinished ...> [pid  4015] 21:01:53.634546 _llseek(98,
499952,<unfinished ...> [pid  4015] 21:01:53.634712 write(98,
"\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,


1024<unfinished ...>
[pid  4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET)
= 0 [pid  4015] 21:01:53.740449 write(98,
"\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24)
= 24 [pid  4015] 21:01:53.740521 _llseek(98, 501000, [501000],
SEEK_SET) = 0 [pid  4015] 21:01:53.740566 write(98,
"\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"...,


1024) = 1024
[pid  4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET)
= 0 [pid  4015] 21:01:53.740677 write(98,
"\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24)
= 24 [pid  4015] 21:01:53.740736 _llseek(98, 502048, [502048],
SEEK_SET) = 0 [pid  4015] 21:01:53.740781 write(98,
"\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,


1024) = 1024
[pid  4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET)
= 0 [pid  4015] 21:01:53.740889 write(98,
"\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24
[pid  4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET)
= 0 [pid  4015] 21:01:53.740989 write(98,
"\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3
\3<\3X\3t"..., 1024) = 1024 [pid  4015] 21:01:54.636566 _llseek(98,
504120, [504120], SEEK_SET) = 0 [pid  4015] 21:01:54.636636
write(98, "\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n",
24) = 24 [pid  4015] 21:01:54.636812 _llseek(98, 504144, [504144],
SEEK_SET) = 0 [pid  4015] 21:01:54.636860 write(98,
"\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,


1024<unfinished ...>
[pid  6744] 21:01:54.676590 close(98)   = 0


Regards Keith _______________________________________________
sqlite-users mailing list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________ sqlite-users
mailing list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________ sqlite-users mailing
list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________ sqlite-users mailing
list sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to