On Mon, Nov 17, 2008 at 07:01:15PM +0100, Santi Saez spake thusly:
> It's curious, when changing from "cfq" to "noop" there are fewer reads than
> when using cfq!! I have measured using dstat, vmstat, etc.. and always
> occurs the same :-/

I wonder if you might be having a disk/page cache alignment issue. Are
you creating partitions inside your iSCSI volumes? You might want to
check out the following URL, as it has lots of good performance
optimization tips that my colleagues and I have discovered. We did
most of this with AoE, but the same things should all apply to iSCSI.

From: http://xenaoe.org/wiki/AoEOptimizations/view

Page alignment problems

An alignment issue arises when an AoE (or possibly iSCSI or Fibre
Channel) device is partitioned.

Loather on the #aoe channel of irc.freenode.net has written a short
paper explaining the issue in
detail. http://xenaoe.org/wiki/aoe-caching-alignment.pdf

If I write blocks directly to the AoE disk device etherd/e0.0, for
instance, the block writes are directly aligned with pages and I can
stream at full speed (in this case I'm saturating my gigabit ethernet
interface on a write) to the underlying device.

If I add a partition table and write to this partition, the writes are
offset by 512 bytes, which is not a multiple of a page size (and,
incidentally, the exact size of the sector that holds an x86 partition
table).

That is to say, if I write blocks directly to etherd/e0.0p1, the
writes cause page cache reads every time.

I have discovered that this alignment error causes every write() on
the initiator to be offset into two cache pages on the target. So,
even if the cache reads don't happen on the initiator, they do happen
on the target.
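
To illustrate the "two cache pages" effect, here is a minimal sketch. It
assumes 4096-byte pages and a partition that starts at sector 63 (the
default geometry discussed below); the function name is made up for
illustration. A page that is only partly covered by a write generally
has to be read in before it can be modified, which would explain the
extra reads.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(offset, length):
    """Return the indices of the pages covered by a write of
    `length` bytes starting at byte `offset` on the device."""
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))


# A page-sized write to the raw device at offset 0 covers one page.
print(pages_touched(0, 4096))         # [0]
# The same write through a partition that starts 63 * 512 bytes in
# straddles two pages.
print(pages_touched(63 * 512, 4096))  # [7, 8]
```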

The off-by-one error is a result of the default CHS geometry:

AoE devices default to a 255 head, 63 SPT geometry. This causes the
first partition to start at the beginning of the 63rd block on the
device.

    63 isn't a multiple of 8 (eight 512-byte sectors make one 4096-byte page).
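
The arithmetic can be checked with a quick sketch, assuming 512-byte
sectors and 4096-byte pages. The function name is only illustrative.

```python
SECTOR_SIZE = 512  # assumed bytes per sector
PAGE_SIZE = 4096   # assumed bytes per page


def partition_misalignment(start_sector):
    """Return how many bytes past a page boundary a partition
    starting at `start_sector` begins (0 means page aligned)."""
    return (start_sector * SECTOR_SIZE) % PAGE_SIZE


# First partition under the default 63 sectors-per-track geometry:
print(partition_misalignment(63))  # 3584, i.e. 512 bytes short of a page boundary
# First partition under a 32 sectors-per-track geometry:
print(partition_misalignment(32))  # 0
```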

Solution: I edited the partition table to align the disk with the
geometry of 256 heads and 32 sectors per track. I also had to change
the number of cylinders on the virtual disk to align with the change
in bytes per cylinder.
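
As a rough sketch of the cylinder recalculation, assuming 512-byte
sectors: the 100 GiB volume size is a hypothetical example, not the
size of the disk I actually used, and the helper names are made up.

```python
SECTOR_SIZE = 512  # assumed bytes per sector


def bytes_per_cylinder(heads, sectors_per_track):
    """Size of one cylinder in bytes under the given CHS geometry."""
    return heads * sectors_per_track * SECTOR_SIZE


def cylinders_for(disk_bytes, heads, sectors_per_track):
    """Whole cylinders that fit on a disk of `disk_bytes` bytes."""
    return disk_bytes // bytes_per_cylinder(heads, sectors_per_track)


disk_bytes = 100 * 1024**3  # hypothetical 100 GiB volume

print(bytes_per_cylinder(255, 63))         # 8225280
print(bytes_per_cylinder(256, 32))         # 4194304, exactly 4 MiB
print(cylinders_for(disk_bytes, 255, 63))  # 13054
print(cylinders_for(disk_bytes, 256, 32))  # 25600
```

With 256 heads and 32 sectors per track, every cylinder boundary lands
on a 4096-byte page boundary, so partitions that start on a cylinder or
track boundary stay aligned.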

Powers of two align nicely with 4096-byte pages. This way, pages on
the underlying device (target) directly correspond with pages on the
virtual device (initiator). Having anything other than a multiple of
eight as the number of sectors per track will result in a misaligned
partition.

However, I now have a filesystem on a partitioned AoE device that can
write at full speed (minus filesystem overhead) without the cache
alignment penalty. This is definitely a huge step in the right
direction.

Tracy Reed
