The problem I see with "sequential access jumping all over the place" is that it increases the utilization of the disks. Over the years disks have become ever faster at sequential access, whereas random access (which has to move the actuator) has not improved at the same pace - this is what ZFS exploits when writing. With its fancy detection of sequential access patterns and improved readahead, ZFS should be able to deal with the latency aspect of randomized read accesses - but at the expense of higher disk utilization.

If many processes access the same disks, they may "run out of IOPS" earlier than in an environment with sequential accesses through contiguous data. Obviously this depends heavily on the workload - but with the trend towards ever higher capacity disks, IOPS become a valuable resource, and it may be worth thinking about how to use them most efficiently. A "self-optimizing" mechanism that rearranges files to become contiguous, in the background or on request, may therefore be useful.
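
To illustrate, the crudest user-level version of such a rearrangement is simply "stream the file into a fresh copy and rename it over the original", so the allocator gets to lay the new blocks out in one pass. This is only a sketch - the temporary name, buffer size and permissions are made up, and a real mechanism would live inside the filesystem, preserve holes and attributes, and cope with concurrent writers:

/*
 * Minimal sketch of "defragment by rewrite": stream the file into a new
 * copy so the allocator can lay it out contiguously, then rename the copy
 * over the original.  Illustration only - it drops the original file mode
 * and holes.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSZ   (1 << 20)               /* 1 MB copy buffer */

static int
rewrite_file(const char *path)
{
        char tmp[4096];
        char *buf = malloc(BUFSZ);
        ssize_t n;
        int in, out;

        if (buf == NULL)
                return (-1);
        (void) snprintf(tmp, sizeof (tmp), "%s.rewrite", path);

        in = open(path, O_RDONLY);
        out = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (in < 0 || out < 0) {
                perror("open");
                if (in >= 0)
                        (void) close(in);
                if (out >= 0)
                        (void) close(out);
                free(buf);
                return (-1);
        }

        /* Sequential copy: the new blocks get allocated in one pass. */
        while ((n = read(in, buf, BUFSZ)) > 0) {
                if (write(out, buf, n) != n) {
                        perror("write");
                        break;
                }
        }

        (void) fsync(out);
        (void) close(in);
        (void) close(out);
        free(buf);

        /* Replace the original with the freshly laid out copy. */
        if (n == 0 && rename(tmp, path) == 0)
                return (0);
        (void) unlink(tmp);
        return (-1);
}

int
main(int argc, char **argv)
{
        if (argc != 2) {
                (void) fprintf(stderr, "usage: %s file\n", argv[0]);
                return (1);
        }
        return (rewrite_file(argv[1]) == 0 ? 0 : 1);
}

Of course this burns bandwidth and breaks block sharing with snapshots, which is exactly why it would be nicer to have something like it inside ZFS.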

- Franz

Gregory Shaw wrote:

Rich, correct me if I'm wrong, but here's the scenario I was thinking of:

- A large file is created.
- Over time, the file grows and shrinks.

The anticipated on-disk layout is that new extents are allocated as the file changes, and those extents may or may not be on multiple spindles.

I envision fragmentation over time that will cause sequential access to jump all over the place. With smart controllers or disks that do read caching, their use of stripes and read-ahead (if enabled) could hurt performance.

So, my thought was to de-fragment the file to make it more contiguous and to allow hardware read-ahead to be effective.

An additional benefit would be to spread it across multiple spindles in a contiguous fashion, such as:

disk0: 32mb
disk1: 32mb
disk2: 32mb
... etc.

Perhaps this is unnecessary. I'm simply trying to grasp the long term performance implications of COW data.
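
One way to put a number on the "jump all over the place" cost would be to read the same file once in order and once at shuffled offsets and compare throughput. A rough sketch (the 128K chunk size is arbitrary, and you would want a file much larger than RAM so the cache doesn't hide the seeks):

/*
 * Rough sketch: compare sequential vs. random read throughput over the
 * same file.  Chunk size and the shuffle are arbitrary choices.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

#define CHUNK   (128 * 1024)

static double
now(void)
{
        struct timeval tv;

        (void) gettimeofday(&tv, NULL);
        return (tv.tv_sec + tv.tv_usec / 1e6);
}

static double
read_chunks(int fd, off_t *offsets, long nchunks, char *buf)
{
        double t0 = now();
        long i;

        for (i = 0; i < nchunks; i++)
                (void) pread(fd, buf, CHUNK, offsets[i]);
        return (now() - t0);
}

int
main(int argc, char **argv)
{
        struct stat st;
        off_t *offsets;
        char *buf;
        long nchunks, i;
        int fd;
        double seq, rnd;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
                (void) fprintf(stderr, "usage: %s file\n", argv[0]);
                return (1);
        }
        (void) fstat(fd, &st);
        nchunks = st.st_size / CHUNK;
        buf = malloc(CHUNK);
        offsets = malloc(nchunks * sizeof (off_t));
        if (buf == NULL || offsets == NULL)
                return (1);

        for (i = 0; i < nchunks; i++)           /* in-order offsets */
                offsets[i] = (off_t)i * CHUNK;
        seq = read_chunks(fd, offsets, nchunks, buf);

        for (i = nchunks - 1; i > 0; i--) {     /* Fisher-Yates shuffle */
                long j = rand() % (i + 1);
                off_t t = offsets[i];
                offsets[i] = offsets[j];
                offsets[j] = t;
        }
        rnd = read_chunks(fd, offsets, nchunks, buf);

        (void) printf("sequential: %.1f MB/s  random: %.1f MB/s\n",
            (double)nchunks * CHUNK / seq / 1e6,
            (double)nchunks * CHUNK / rnd / 1e6);
        return (0);
}

On a badly fragmented file the two numbers should converge; on a contiguous one, the sequential pass should win by a wide margin.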

On May 15, 2006, at 8:47 AM, Roch Bourbonnais - Performance Engineering wrote:


Gregory Shaw writes:

I really like the idea below:
    - the ability to defragment a file 'live'.

I can see instances where that could be very useful. For instance, if you have multiple LUNs (or spindles, whatever) using ZFS, you could re-optimize large files to spread the chunks across as many spindles as possible. Or, as stated below, make it contiguous. I don't know if that is automatic with ZFS today, but it's an idea.


I think the expected benefit of making it contiguous is rooted in the belief that bigger I/Os are the only way to reach top performance.

I think that before ZFS, both physical and logical contiguity were required to enable sufficient readahead and get performance.

Once we have good readahead based on detected logically contiguous accesses, it may well be possible to get top device speed through reasonably-sized I/O concurrency.
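
As an illustration of what "reasonably-sized I/O concurrency" could look like from user land, something along these lines keeps several moderate reads in flight at once (the thread count and 128K request size are arbitrary knobs, not recommendations; compile with -lpthread):

/*
 * Sketch of concurrent reads: instead of one huge sequential stream,
 * several threads each stride through the file issuing moderate-sized
 * preads, keeping multiple I/Os in flight at the device.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define NTHREADS        8
#define CHUNK           (128 * 1024)

static int fd;
static off_t filesize;

static void *
reader(void *arg)
{
        long id = (long)arg;
        char *buf = malloc(CHUNK);
        off_t off;

        if (buf == NULL)
                return (NULL);
        /* Each thread reads every NTHREADS-th chunk of the file. */
        for (off = (off_t)id * CHUNK; off < filesize;
            off += (off_t)NTHREADS * CHUNK)
                (void) pread(fd, buf, CHUNK, off);
        free(buf);
        return (NULL);
}

int
main(int argc, char **argv)
{
        pthread_t tid[NTHREADS];
        struct stat st;
        long i;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
                (void) fprintf(stderr, "usage: %s file\n", argv[0]);
                return (1);
        }
        (void) fstat(fd, &st);
        filesize = st.st_size;

        for (i = 0; i < NTHREADS; i++)
                (void) pthread_create(&tid[i], NULL, reader, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                (void) pthread_join(tid[i], NULL);
        return (0);
}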

-r


-----
Gregory Shaw, IT Architect
Phone: (303) 673-8273        Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382           [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382                 [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds



_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
