Any advice for settings for extremely IO constrained systems?
A demo I've set up for sales seems to be spending much of its time in disk wait states.
The particular system I'm working with is:
Ext3 on Debian inside Microsoft VirtualPC on NTFS
on WindowsXP on laptops of our sales team.
Somewhat surprisingly, CPU performance is close to native; but disk IO is much worse - probably orders of magnitude worse - since there are
so many layers of filesystems involved. Unfortunately, no, I don't think the sales guys will upgrade to BSD. :)
The database is too large to fit entirely in memory (3GB of spatial data
using PostGIS); and has relatively large updates (people can add "layers" consisting of perhaps 10000 points, lines, and polygons out of a million
or so possibilities - they do this by doing 10K inserts into tables with postgis geometry columns).
Steps I've already done:
* Gave VirtualPC as much memory as possible (1/2 gig)
* Tuned postgresql.conf:
    - increased effective_cache_size to 10000 (tested a few values with this workload)
    - reduced cpu_index_tuple_cost to 0.0005 (encourages indexes, which may reduce disk hits)
    - decreased random_page_cost to 2 (seems the fragmented NTFS means many sequential accesses are probably random accesses anyway)
    - increased work_mem to 15000 (sorting on disk was very VERY amazingly slow)
    - increased shared_buffers to 3000 (guess)
* Tuned ext3 (yeah, I'll try JFS or XFS next):
    - Journal_data_writeback == minimize journaling?
    - commit=600,noatime in fstab
* Tuned the VM:
    - echo 60000 > /proc/sys/vm/dirty_expire_centisecs
    - echo 70 > /proc/sys/vm/dirty_ratio
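
For scale, here is a rough sketch of what the postgresql.conf numbers above come to in megabytes. It assumes the default 8 kB block size, and that shared_buffers and effective_cache_size are counted in pages while work_mem is counted in kB (how unit-less values are read in 8.x-era configs; the exact server version is an assumption on my part). The helper name pages_to_mb is just for illustration.

```shell
#!/bin/sh
# Assumption: 8 kB pages, the PostgreSQL default block size.
PAGE_KB=8

# Convert a count of 8 kB pages to whole megabytes (integer division).
pages_to_mb() {
    echo $(( $1 * PAGE_KB / 1024 ))
}

echo "shared_buffers       = 3000 pages  ~ $(pages_to_mb 3000) MB"
echo "effective_cache_size = 10000 pages ~ $(pages_to_mb 10000) MB"
# work_mem is already in kB, so only a kB -> MB conversion is needed.
echo "work_mem             = 15000 kB    ~ $(( 15000 / 1024 )) MB"
```

All three are small next to the 1/2 gig given to the guest, which is worth keeping in mind when reading the questions below.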
It seems for this workload, the two biggest benefits were "commit=600" and writeback for ext3 and "echo 60000 > /proc/sys/vm/dirty_expire_centisecs"
If I understand right, this combination says that dirty pages can sit in memory far longer than the defaults -- and I guess this delays my bad IO
times to the point in the sales guy's presentation when he's playing with PowerPoint :).
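
To make the timing concrete, a small sketch of the unit arithmetic: dirty_expire_centisecs is in hundredths of a second, while the ext3 commit= option is in seconds, so the two values I picked line up at the same interval. The helper name centisecs_to_secs is hypothetical, and the defaults mentioned in the comments are what I believe stock kernels use, not something I measured here.

```shell
#!/bin/sh
# Convert centiseconds (the unit of dirty_expire_centisecs) to seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

expire_cs=60000    # value written to /proc/sys/vm/dirty_expire_centisecs
commit_secs=600    # ext3 commit= mount option, already in seconds

# Assumed stock defaults for comparison: 3000 centisecs (30 s) and commit=5.
echo "dirty pages may age $(centisecs_to_secs "$expire_cs") s before they must be written"
echo "ext3 journal commits every $commit_secs s"
```

So both knobs allow roughly ten minutes of dirty data to sit in memory, which also means that much work could be lost if the laptop or the guest dies mid-demo.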
Much of this tuning was guesswork, but it did make the demo go from "unacceptable" to "reasonable". Were any of my guesses particularly bad, and might they be doing more harm than good?
Any more ideas on how to deal with a pathologically slow IO system?