On 12/05/2012 09:03 PM, Black, Michael (IS) wrote:
Hmmm...looking at that strace sequencing is there some reason those 24-byte and 
1024-byte writes can't be combined?  The 1024-byte write is occurring at the 
end boundary of the 24-byte.

That would cut the i/o ops in half and might be a noticeable improvement. A 
memory copy would be a lot faster than 2 i/o requests.

And since disk page size is always a power of 2 would the 1024 buffer be better 
off as 1000 to align the page i/o better?


Those writes are appending to the WAL file. Writes to the
database file are always aligned page-sized (in this case
1024 byte) blocks.

For the WAL file, we tried it both ways (combining the 24
and 1024 bytes writes into one and leaving them separate) and
found that, on linux, it's faster to call write() twice. In
other words, the extra write() call is cheaper than doing
a 1048 byte memcpy(). And you can't just use 1048 byte buffers
everywhere internally, as memory allocators tend to waste lots
of space if you allocate many blocks that are all just a bit
larger than a power-of-two.

Having said that, there has been at least one closed-source VFS backend
that buffers the sequential writes SQLite makes on the journal
and WAL files so that it can make mostly 8KB aligned writes to
the underlying file-system. So on some systems there is a benefit
to writing aligned page blocks even if you are writing sequentially.





Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Keith Chew [keith.c...@gmail.com]
Sent: Wednesday, December 05, 2012 2:11 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Tracing latencies

Hi Dan

On Wed, Dec 5, 2012 at 6:38 PM, Dan Kennedy<danielk1...@gmail.com>  wrote:
If it's not fsync() then IO delays are normally caused by read().
You could try [strace -T -eread ...] to check.

Are SELECT statements fast on the same database? How large is the
database compared to the machines memory?

Ah, do you think the read on the OS level is starving the writes?
There are not many DB selects from the app, but there are other read
IO activities happening in the background. Still it doesn't make
sense, because WAL mode ensures from sqlite's point of view, the
reader will not be blocked by the writer. So, sqlite is subjected to
similar levels of read IO activity as Mysql (because there are very
few DB selects).

The tables are small, around 20MB in total, compared to 2GB of memory available.

I have done more investigation. Using strace, I waited to capture an
insert/update with a long latency, and caught one that is 2s long,
below is the strace to the WAL file. We can see that it is very
bursty, all the seeks and writes span close to over 1s.

I recall doing a strace on Mysql a long time ago, and each insert only
does a single seek and write. Maybe it is sqlite's file format that
requires it to seek/write multiple places causing the slowness? I am
only guessing here, as I do not know how to interpret the strace logs
below.

[pid  4015] 21:01:53.634099 _llseek(98, 499928,<unfinished ...>
[pid  4015] 21:01:53.634245 write(98,
"\0\0\0\4\0\0HU^=\226\213\23\10<\247+\214\332\260\314Wf ", 24
<unfinished ...>
[pid  4015] 21:01:53.634546 _llseek(98, 499952,<unfinished ...>
[pid  4015] 21:01:53.634712 write(98,
"\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1024<unfinished ...>
[pid  4015] 21:01:53.740378 _llseek(98, 500976, [500976], SEEK_SET) = 0
[pid  4015] 21:01:53.740449 write(98,
"\0\0\0\3\0\0\0\0^=\226\213\23\10<\247\327\201>\32\227\323\f8", 24) =
24
[pid  4015] 21:01:53.740521 _llseek(98, 501000, [501000], SEEK_SET) = 0
[pid  4015] 21:01:53.740566 write(98,
"\r\3\221\0\t\3G\1\3\243\3\261\3\352\3\325\3\304\3G\3\177\3m\3\\\0\0\0\0\0\0"...,
1024) = 1024
[pid  4015] 21:01:53.740632 _llseek(98, 502024, [502024], SEEK_SET) = 0
[pid  4015] 21:01:53.740677 write(98,
"\0\0\37X\0\0\0\0^=\226\213\23\10<\247dL\17\316\32\30\301\237", 24) =
24
[pid  4015] 21:01:53.740736 _llseek(98, 502048, [502048], SEEK_SET) = 0
[pid  4015] 21:01:53.740781 write(98,
"\r\0\0\0\5\1*\0\3o\2\336\2L\1\273\1*\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1024) = 1024
[pid  4015] 21:01:53.740844 _llseek(98, 503072, [503072], SEEK_SET) = 0
[pid  4015] 21:01:53.740889 write(98,
"\0\0HR\0\0HU^=\226\213\23\10<\247\276\32g\304j\372Q.", 24) = 24
[pid  4015] 21:01:53.740945 _llseek(98, 503096, [503096], SEEK_SET) = 0
[pid  4015] 21:01:53.740989 write(98,
"\n\0\0\0\25\1\264\0\2@\2\\\2x\2\224\2\260\2\314\2\350\3\4\3
\3<\3X\3t"..., 1024) = 1024
[pid  4015] 21:01:54.636566 _llseek(98, 504120, [504120], SEEK_SET) = 0
[pid  4015] 21:01:54.636636 write(98,
"\0\0\0\4\0\0HU^=\226\213\23\10<\247=]`\3700\351\226n", 24) = 24
[pid  4015] 21:01:54.636812 _llseek(98, 504144, [504144], SEEK_SET) = 0
[pid  4015] 21:01:54.636860 write(98,
"\r\0\0\0\1\0033\0\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1024<unfinished ...>
[pid  6744] 21:01:54.676590 close(98)   = 0


Regards
Keith
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to