Roch,

I've been chewing on this for a little while and had some thoughts.

On Jan 15, 2007, at 12:02, Roch - PAE wrote:


Jonathan Edwards writes:

On Jan 5, 2007, at 11:10, Anton B. Rang wrote:

DIRECT IO is a set of performance optimisations to circumvent
shortcomings of a given filesystem.

Direct I/O as generally understood (i.e. not UFS-specific) is an
optimization which allows data to be transferred directly between
user data buffers and disk, without a memory-to-memory copy.

This isn't related to a particular file system.


true .. directio(3) is generally used in the context of *any* given
filesystem to advise it that an application-buffer-to-system-buffer
copy may get in the way or add overhead (particularly if the
filesystem buffer is doing additional copies of its own).  You can
also look at it as a way of removing layers of indirection,
particularly if I want the application overhead to be higher than
the subsystem overhead.  Programmatically .. less is more.
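
To ground that a little: the advice itself is nothing more than a
flag on the open file descriptor.  A minimal sketch, assuming the
Solaris directio(3C) interface as UFS and QFS honor it today (ZFS has
no direct IO path behind it, which is really what this thread is
about) - the helper name is just for illustration:

  /*
   * a minimal sketch, assuming the Solaris directio(3C) call; the
   * call is purely advisory, and ZFS currently has no direct IO
   * path behind it
   */
  #include <sys/types.h>
  #include <sys/fcntl.h>
  #include <fcntl.h>
  #include <stdio.h>

  int
  open_with_dio_advice(const char *path)
  {
      int fd = open(path, O_RDWR);

      if (fd < 0)
          return (-1);
      /* ask the filesystem to skip its buffer-to-buffer copy if it can */
      if (directio(fd, DIRECTIO_ON) != 0)
          perror("directio advice not taken");
      return (fd);
  }

The point being that the application only ever *asks*; the filesystem
is free to keep buffering if it has to.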

Direct IO makes good sense when the target disk sectors are
set a priori.  But in the context of ZFS, would you rather
have 10 direct disk I/Os, or 10 bcopies and 2 I/Os (say that
was possible)?

sure, but in a well-designed filesystem this is essentially the
same as efficient buffer cache utilization .. coalescing IO
operations to commit on a more efficient and larger disk
allocation unit.  However, paged IO (and in particular ZFS
paged IO) is probably a little more than simply a bcopy()
in comparison to Direct IO (at least in the QFS context).

As for read, I can see that when the load is cached in the
disk array and we're running 100% CPU, the extra copy might
be noticeable.  Is this the situation that longs for DIO?
What % of a system is spent in the copy?  What is the added
latency that comes from the copy?  Is DIO the best way to
reduce the CPU cost of ZFS?

To achieve maximum IO rates (in particular if you have a flexible
blocksize and know the optimal stripe width for the best raw disk
or array logical volume performance) you're going to do much
better if you don't have to pass through buffered IO strategies
with the added latencies and kernel space dependencies.

Consider the case where you're copying or replicating from one
disk device to another in a one-time shot.  There's tremendous
advantage in bypassing the buffer and reading and writing full
stripe passes.  The additional buffer copy is also going to add
latency and affect your run queue, particularly if you're working
on a shared system as the buffer cache might get affected by
memory pressure, kernel interrupts, or other applications.
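
To make that concrete, here's a rough sketch of that one-shot copy,
assuming raw (character) device nodes so no filesystem buffering is
involved, and a stripe width known ahead of time - the 512K figure
and the helper name are placeholders, nothing ZFS-specific:

  #include <fcntl.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define STRIPE_BYTES  (512 * 1024)  /* assumed full-stripe size */

  int
  copy_full_stripes(const char *src, const char *dst)
  {
      int in = open(src, O_RDONLY);
      int out = open(dst, O_WRONLY);
      void *buf;
      ssize_t n;

      if (in < 0 || out < 0)
          return (-1);
      /* sector-aligned buffer keeps raw-device reads and writes legal */
      if (posix_memalign(&buf, 512, STRIPE_BYTES) != 0)
          return (-1);
      /* one full stripe in, one full stripe out - no intermediate cache */
      while ((n = read(in, buf, STRIPE_BYTES)) > 0)
          if (write(out, buf, (size_t)n) != n)
              break;
      free(buf);
      close(in);
      close(out);
      return (n == 0 ? 0 : -1);
  }

The code isn't the point - the point is that every byte moves exactly
once, between the device and the application's own buffer.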

Another common case could be line speed network data capture
if the frame size is already well aligned for the storage device.
Being able to attach one device to another with minimal kernel
intervention should be seen as an advantage for a wide range
of applications that need to stream data from device A to device
B and already know more than you might about both devices.

The current Nevada code base has quite nice performance
characteristics (and certainly quirks); there are many
further efficiency gains to be reaped from ZFS.  I just don't
see DIO at the top of that list for now.  Or at least someone
needs to spell out what ZFS/DIO is and how much better it is
expected to be (back of the envelope calculation accepted).

the real benefit is measured more in terms of memory consumption
for a given application and the balance between application memory
space and filesystem memory space.  When the filesystem imposes
more pressure on the application due to its mappings, you're really
measuring the impact of doing an application buffer read and copy
for each write.  In other words, you're imposing more of a limit on
how the application should behave with respect to its notion of the
storage device.

DIO should not be seen as a catchall for the notion of "more
efficiency will be gotten by bypassing the filesystem buffers" but
rather as "please don't buffer this, since you might push back on
me and I don't know if I can handle a push back" advice.

Reading RAID-Z subblocks on filesystems that have checksums disabled
might be interesting.  That would avoid some disk seeks.  Whether to
serve the subblocks directly or not is a separate matter; it's a
small deal compared to the feature itself.  How about disabling the
DB checksum (it can't fix the block anyway) and doing mirroring?

Basically speaking - there needs to be some sort of strategy for
bypassing the ARC, or even parts of the ARC, for applications that
need to advise the filesystem either:
1) that imposing additional buffering on their data flow is a
delicate matter, or
2) that they are already well optimized and need the adaptive cache
in the application rather than in the underlying filesystem or
volume manager.
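
Something in the spirit of the existing POSIX hint is the kind of
interface I mean - a sketch, assuming posix_fadvise(3C) (platform and
filesystem support varies, and whether the ARC would honor it is
exactly the open question; the helper name is just for illustration):

  #include <fcntl.h>

  /*
   * advise the kernel that we won't reuse this range, so it
   * shouldn't spend cache on it
   */
  int
  advise_no_reuse(int fd, off_t off, off_t len)
  {
      return (posix_fadvise(fd, off, len, POSIX_FADV_NOREUSE));
  }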

---
.je