> From: Richard Elling [mailto:rich...@nexenta.com]
> >
> > Regardless of multithreading, multiprocessing, it's absolutely
> possible to
> > have contiguous files, and/or file fragmentation.  That's not a
> > characteristic which depends on the threading model.
> 
> Possible, yes.  Probable, no.  Consider that a file system is
> allocating
> space for multiple, concurrent file writers.

Process A is writing.  Suppose it starts writing at block 10,000 of my
1,000,000-block device.
Process B is also writing.  Suppose it starts writing at block 50,000.

These two processes write simultaneously, and no fragmentation occurs,
unless Process A writes more than 40,000 blocks.  In that case, A's file
gets fragmented, and the 2nd fragment might begin at block 300,000.

What causes fragmentation (not counting COW) is the size of the span of
unallocated blocks.  Most filesystems will allocate blocks from the
largest unallocated contiguous area of the physical device, so as to
minimize fragmentation.
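
To put that in code, here's a toy extent allocator (Python) - purely
illustrative, not how ZFS's real allocator works.  The free-extent layout,
the allocate() helper, and the assumption that B's write has already claimed
blocks 50,000-299,999 are all mine:

    # Toy extent allocator -- NOT ZFS's real allocator, just an
    # illustration of why a writer only fragments when it outgrows
    # the free span it started in.

    def allocate(free_extents, start, nblocks):
        """Allocate nblocks starting at 'start'; return the fragments.

        free_extents: list of [offset, length] free spans (mutated).
        """
        fragments = []
        remaining = nblocks
        # First consume the free span that contains 'start'.
        for ext in free_extents:
            off, length = ext
            if off <= start < off + length:
                take = min(remaining, off + length - start)
                fragments.append((start, take))
                # Shrink the span (the hole before 'start' is simply
                # dropped here, to keep the sketch short).
                ext[0] = start + take
                ext[1] = (off + length) - (start + take)
                remaining -= take
                break
        # If that span ran out, continue in the largest free span left.
        while remaining > 0:
            biggest = max(free_extents, key=lambda e: e[1])
            take = min(remaining, biggest[1])
            fragments.append((biggest[0], take))
            biggest[0] += take
            biggest[1] -= take
            remaining -= take
        return fragments

    # 1,000,000-block device; assume B's write has already claimed
    # blocks 50,000-299,999.
    free = [[0, 50_000], [300_000, 700_000]]
    print(allocate(free, 10_000, 30_000))  # [(10000, 30000)] - contiguous

    free = [[0, 50_000], [300_000, 700_000]]
    print(allocate(free, 10_000, 60_000))  # [(10000, 40000), (300000, 20000)]

A's 30,000-block write stays in one piece; the 60,000-block write splits
exactly as described above, with the second fragment at block 300,000.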

I can't speak authoritatively about how ZFS behaves, but I'd be extremely
surprised if two processes writing different files as fast as possible ended
up with all their blocks interleaved with each other on the physical disk.
I think that's possible if you have multiple processes lazily writing at
less than full speed, because then ZFS might remap a bunch of small writes
into a single contiguous write.


> > Also regardless of raid, it's possible to have contiguous or
> fragmented
> > files.  The same concept applies to multiple disks.
> 
> RAID works against the efforts to gain performance by contiguous access
> because the access becomes non-contiguous.

To me, these might as well be words randomly selected from the dictionary -
I recognize that it's a complete sentence, but you might as well have said
"processors aren't needed in computers anymore," or something equally
illogical.

Suppose you have a 3-disk raid stripe set, using traditional simple
striping, because it's very easy to explain.  Suppose a process is writing
as fast as it can, and suppose it's going to write block 0 through block 99
of a virtual device.

        virtual block 0 = block 0 of disk 0
        virtual block 1 = block 0 of disk 1
        virtual block 2 = block 0 of disk 2
        virtual block 3 = block 1 of disk 0
        virtual block 4 = block 1 of disk 1
        virtual block 5 = block 1 of disk 2
        virtual block 6 = block 2 of disk 0
        virtual block 7 = block 2 of disk 1
        virtual block 8 = block 2 of disk 2
        virtual block 9 = block 3 of disk 0
        ...
        virtual block 96 = block 32 of disk 0
        virtual block 97 = block 32 of disk 1
        virtual block 98 = block 32 of disk 2
        virtual block 99 = block 33 of disk 0
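
That mapping is nothing more than modulo and integer division.  A quick
Python sketch (function name mine, stripe unit assumed to be one block)
reproduces the table and shows how a sequential write of virtual blocks 0-8
decomposes into one contiguous run per disk:

    # The simple-striping address map above: 3 disks, stripe unit of
    # one block.  Function name is mine, not any particular RAID
    # implementation's.

    NDISKS = 3

    def map_virtual_block(vblock, ndisks=NDISKS):
        """Return (disk, physical block) for a virtual block number."""
        return vblock % ndisks, vblock // ndisks

    # Reproduce a few rows of the table:
    for v in (0, 1, 2, 3, 96, 99):
        disk, pblock = map_virtual_block(v)
        print(f"virtual block {v} = block {pblock} of disk {disk}")

    # A sequential write of virtual blocks 0-8 lands on each disk as
    # one contiguous run of blocks 0-2:
    runs = {}
    for v in range(9):
        disk, pblock = map_virtual_block(v)
        runs.setdefault(disk, []).append(pblock)
    print(runs)  # {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}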

Thanks to buffering and command queueing, the OS tells the RAID controller
to write blocks 0-8, and the raid controller tells disk 0 to write blocks
0-2, disk 1 to write blocks 0-2, and disk 2 to write blocks 0-2,
simultaneously.  So the total throughput is the sum of all three disks
writing continuously and contiguously to sequential blocks.

This accelerates performance for continuous sequential writes.  It does not
"work against efforts to gain performance by contiguous access."

The same concept is true for raid-5 or raidz, but it's more complicated.
The filesystem or raid controller does in fact know how to write sequential
filesystem blocks to sequential physical blocks on the physical devices for
the sake of performance enhancement on contiguous read/write.

If you don't believe me, there's a very easy test to prove it:

Create a zpool with 1 disk in it, and time writing 100G (or some amount of
data much larger than RAM).
Create a zpool with several disks in a raidz set, and time writing 100G.
The speed scales up linearly with the number of disks, until you reach some
other hardware bottleneck, such as bus or controller bandwidth.
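
If you want to script the timing side of that test, something like this
rough Python sketch works; /testpool is just a placeholder mount point, and
you'd recreate the pool (zpool create ...) with each layout before running
it:

    # Rough sequential-write timer.  /testpool is a placeholder; create
    # and mount the pool under test first (zpool create ...), and turn
    # compression off if you want raw numbers.

    import os
    import time

    def time_sequential_write(path, size_gb=100, chunk_mb=8):
        chunk = os.urandom(chunk_mb * 1024 * 1024)
        total = size_gb * 1024 ** 3
        written = 0
        start = time.time()
        with open(path, "wb") as f:
            while written < total:
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure it actually hit the disks
        elapsed = time.time() - start
        print(f"{written / 1024 ** 2 / elapsed:.0f} MB/s in {elapsed:.0f}s")

    # time_sequential_write("/testpool/bigfile", size_gb=100)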
