Re: [zfs-discuss] Remedies for suboptimal mmap performance on zfs

Andrew Gabriel Mon, 28 May 2012 12:44:05 -0700

On 05/28/12 20:06, Iwan Aucamp wrote:

I'm getting sub-optimal performance with an mmap based database (mongodb) which is running on zfs on Solaris 10u9.
System is Sun-Fire X4270-M2 with 2xX5680 and 72GB (6 * 8GB + 6 * 4GB) ram (installed so it runs at 1333MHz) and 2 * 300GB 15K RPM disks
- a few mongodb instances are running with moderate IO and total rss of 50 GB
- a service which logs quite excessively (5GB every 20 mins) is also running (max 2GB ram use)
- log files are compressed after some time to bzip2.
Database performance is quite horrid though - it seems that zfs does not know how to manage allocation between page cache and arc cache - and it seems arc cache wins most of the time.
I'm thinking of doing the following:
- relocating mmaped (mongo) data to a zfs filesystem with only metadata cache
- reducing zfs arc cache to 16 GB
Are there any other recommendations - and is the above likely to improve performance?

1. Upgrade to S10 Update 10 - this has various performance improvements, in particular related to database type loads (but I don't know anything about mongodb).


2. Reduce the ARC size so RSS + ARC + other memory users < RAM size.

I assume the RSS includes whatever caching the database does. In theory, a database should be able to work out what's worth caching better than any filesystem can guess from underneath it, so you want to configure more memory in the DB's cache than in the ARC. (The default ARC tuning is unsuitable for a database server.)
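
As a rough illustration of the arithmetic in point 2, here is a minimal Python sketch using the figures from the original post (72 GB RAM, 50 GB mongodb RSS, 2 GB for the logging service). The 4 GB of headroom for the kernel and everything else is purely an assumed figure, and the helper names are made up for this example. The output is formatted as a zfs_arc_max line for /etc/system, which takes effect after a reboot.

```python
GIB = 1024 ** 3

def arc_max_bytes(ram_gib, rss_gib, other_gib, headroom_gib):
    """Return the largest ARC size (bytes) that keeps total use under RAM."""
    budget = ram_gib - rss_gib - other_gib - headroom_gib
    if budget <= 0:
        raise ValueError("no memory left for the ARC; reduce RSS or add RAM")
    return budget * GIB

def etc_system_line(arc_bytes):
    """Format the value as an /etc/system tunable line (hex byte count)."""
    return "set zfs:zfs_arc_max = 0x%x" % arc_bytes

# Figures from the original post; headroom_gib=4 is an assumption.
arc = arc_max_bytes(ram_gib=72, rss_gib=50, other_gib=2, headroom_gib=4)
print(arc // GIB, "GiB")       # 16 GiB
print(etc_system_line(arc))    # set zfs:zfs_arc_max = 0x400000000
```

With these assumed numbers the budget comes out at 16 GiB; a different headroom figure shifts the result accordingly.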

3. If the database has some concept of blocksize or recordsize that it uses to perform i/o, make sure the filesystems it is using are configured with the same recordsize. The ZFS default recordsize (128kB) is usually much bigger than database blocksizes. This is probably going to have less impact with an mmaped database than with a read(2)/write(2) database; in the mmaped case it may prove better to match the filesystem's record size to the system's page size (4kB, unless it's using some type of large pages). I haven't tried playing with recordsize for memory mapped i/o, so I'm speculating here.

Blocksize or recordsize may apply to the log file writer too, and it may be that this needs a different recordsize and therefore has to be in a different filesystem. If it uses write(2) or some variant rather than mmap(2) and doesn't document this in detail, DTrace is your friend.
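
To make the recordsize suggestion in point 3 concrete, here is a small Python sketch that reads the system page size and builds the matching `zfs set recordsize=` command line. It only prints the command and does not run it. The dataset name `tank/mongo` is hypothetical, and whether page-sized records actually help an mmaped database is the same speculation as above. Note also that a changed recordsize only applies to files written after the change.

```python
import mmap

def recordsize_command(dataset, size_bytes):
    """Build a 'zfs set recordsize=...' command line after checking the value."""
    # ZFS record sizes here are powers of two from 512 bytes up to 128 kB.
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    if not is_power_of_two or not 512 <= size_bytes <= 128 * 1024:
        raise ValueError("recordsize must be a power of two in 512..131072")
    return "zfs set recordsize=%d %s" % (size_bytes, dataset)

# mmap.PAGESIZE is the page size of the machine running this script.
# "tank/mongo" is a hypothetical dataset name used only for illustration.
print(recordsize_command("tank/mongo", mmap.PAGESIZE))
```

The same helper could be pointed at a separate filesystem for the log writer, with whatever record size its write pattern turns out to need.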

4. Keep plenty of free space in the zpool if you want good database performance. If you're more than 60% full (S10U9) or 80% full (S10U10), that could be a factor.
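
A quick way to keep an eye on the fullness thresholds in point 4 is sketched below in Python. It assumes two-column text of pool name and percentage used, such as something like `zpool list -H -o name,capacity` would be expected to produce; check the exact output on your release. The sample data is invented for illustration and the function name is made up.

```python
def pools_over_threshold(zpool_output, threshold_pct):
    """Return [(pool, pct)] for pools whose used capacity exceeds threshold_pct."""
    flagged = []
    for line in zpool_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, cap = fields
        pct = int(cap.rstrip("%"))
        if pct > threshold_pct:
            flagged.append((name, pct))
    return flagged

# Invented sample output: pool name, then percentage of the pool in use.
sample = "rpool\t41%\ntank\t67%\n"
print(pools_over_threshold(sample, 60))  # S10U9 threshold:  [('tank', 67)]
print(pools_over_threshold(sample, 80))  # S10U10 threshold: []
```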


Anyway, there are a few things to think about.

--
Andrew
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
