Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-23 Thread Kjetil Torgrim Homme
Kjetil Torgrim Homme  writes:
> Cindy Swearingen  writes:
>> You might check the slides on this page:
>>
>> http://hub.opensolaris.org/bin/view/Community+Group+zfs/docs
>>
>> Particularly, slides 14-18.
>>
>> In this case, graphic illustrations are probably the best way
>> to answer your questions.
>
> thanks, Cindy.  can you explain the meaning of the blocks marked X in
> the illustration on page 18?

I found the explanation in an older (2009-09-03) message to this list
from Adam Leventhal:

|   RAID-Z writes full stripes every time; note that without careful
|   accounting it would be possible to effectively fragment the vdev
|   such that single sectors were free but useless since single-parity
|   RAID-Z requires two adjacent sectors to store data (one for data,
|   one for parity). To address this, RAID-Z rounds up its allocation to
|   the next (nparity + 1).  This ensures that all space is accounted
|   for. RAID-Z will thus skip sectors that are unused based on this
|   rounding. For example, under raidz1 a write of 1024 bytes would
|   result in 512 bytes of parity, 512 bytes of data on two devices and
|   512 bytes skipped.
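
The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the rounding rule as quoted, not code taken from ZFS: the function name, the three-disk raidz1 layout, and the rule of one parity sector per row of data sectors are my assumptions; the 512-byte sector size is implied by the quote.

```python
def raidz_allocation(psize, ndisks, nparity, sector=512):
    """Return (data, parity, skipped) sector counts for one block.

    Hypothetical helper: models the rounding rule described in the quote.
    """
    data_cols = ndisks - nparity
    data = -(-psize // sector)        # ceiling division: sectors of user data
    rows = -(-data // data_cols)      # rows needed across the data columns
    parity = nparity * rows           # assumed: nparity parity sectors per row
    used = data + parity
    unit = nparity + 1
    total = -(-used // unit) * unit   # round up to a multiple of nparity + 1
    return data, parity, total - used


# The 1024-byte write from the quote, on an assumed 3-disk raidz1:
print(raidz_allocation(1024, ndisks=3, nparity=1))  # (2, 1, 1)
```

Two data sectors plus one parity sector is three sectors, which rounds up to four, so one sector is skipped. That skipped sector is presumably what the X marks on the slide denote.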

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-05 Thread Ilya
So then of what use is the parity? 

And how is the metadata used to reconstruct bad data? I obviously understand
what the metadata contains, but I don't get how ZFS traverses a file system
and USES the metadata to reconstruct bad blocks.

I understand that you write everything to separate blocks. My question was this:

If you have initially two stripes over two disks like this:

Disk 1:  (Stripe Unit 1)
Disk 2:  (Stripe Unit 2)

You then want to modify something in the first stripe unit with modifications 
which are smaller so now Disk 1 and Disk 2 stripes look like this:

Disk 1: XXYY (the y's indicate modified bits or bytes or whatever)
Disk 2: 

So now, with a full-stripe write, you then make new blocks for both stripes and
just copy the data over to the new blocks. Now, tell me if I am right about what
happens on a full-stripe write:

You read in Disk 1 and Disk 2 stripes in the file system cache. You then apply 
the modifications to the Disk 1 stripe within the cache. After this, you 
compute the parity within the cache and finally you write out both Disk 1 
Stripe and Disk 2 stripe to new blocks. Since the modifications to the disk 1 
stripe (the Ys) were smaller than the total stripe size, the new sector which 
will be written to will be of a smaller stripe size than the originals.

Is this correct?
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-05 Thread A Darren Dunham
On Thu, Nov 05, 2009 at 11:55:58AM -0800, Ilya wrote:

> Slide 18 shows variably sized extents but doesn't explain the process
> of full-on write. What I'm looking for is one example. I still don't
> understand how it works with variable sized extents. So if you have 2
> stripe units on one disk and 1 stripe unit for the parity and you
> modify half of the first stripe unit only, when you do a full-stripe
> write, what happens in terms of a full-stripe write?

You never modify a ZFS block (or part of a ZFS block).  You write a new
replacement block elsewhere and a new metadata tree is constructed that
references the new block and not the old one (other than via snapshots).
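
A toy model may make this concrete. The sketch below is not how ZFS is implemented: the `Store` class, the helper names, and the two-level tree are all invented for illustration. It only shows the idea that an update writes a new leaf and a new parent, and leaves the old blocks untouched.

```python
import hashlib


class Store:
    """Append-only block store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = []

    def write(self, payload):
        self.blocks.append(payload)
        return len(self.blocks) - 1   # "address" of the newly written block


def checksum(payload):
    return hashlib.sha256(payload).hexdigest()


def write_tree(store, leaves):
    """Write the data blocks, then a parent block that points at them."""
    pointers = [(store.write(data), checksum(data)) for data in leaves]
    root = store.write(repr(pointers).encode())
    return root, pointers


def cow_update(store, pointers, index, new_data):
    """Replace one leaf by writing a new leaf and a new parent block."""
    new_pointers = list(pointers)
    new_pointers[index] = (store.write(new_data), checksum(new_data))
    new_root = store.write(repr(new_pointers).encode())
    return new_root, new_pointers


store = Store()
old_root, old_ptrs = write_tree(store, [b"AAAA", b"BBBB"])
new_root, new_ptrs = cow_update(store, old_ptrs, 0, b"AAYY")

# The old leaf is still intact; a snapshot holding old_root could read it.
print(store.blocks[old_ptrs[0][0]], store.blocks[new_ptrs[0][0]])  # b'AAAA' b'AAYY'
```

The unmodified leaf is shared between the old and new trees; only the changed leaf and its parent are written again.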

I'm not sure I understand your picture of "2 stripe units on one disk
and 1 stripe unit for parity".  That doesn't seem correct.  Are you
looking at a particular portion of that graphic that you could
reference? 

> I also didn't see a distinction between parity and metadata
> reconstruction. I still do not know the process behind the metadata
> reconstruction for bad data and when parity is used for bad data.

Not sure what you mean by metadata reconstruction.

The checksums stored in parent blocks can be used to validate child
blocks (either metadata or data).

If the checksum fails, and there is redundant information (copy, parity,
mirror), then the system tries to see if the data is available through
the redundant data.  It will read the other half of the mirror, or try
to read the data using parity reconstruction.  It will then validate
that other read via the checksum.  If that checksum succeeds, you've
read the data and the system should attempt to rewrite the redundant
info (assuming it was a bad block and not a disk failure that has left
the pool in a degraded state).

If the checksum fails and there is no redundant copy, then the data is
not returned.
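
As a rough sketch of that read path, here is a mirror-only model in Python. Everything in it is an assumption made for illustration: the function names, the use of SHA-256, and the representation of a mirror as a list of byte strings. With raidz, the candidate data would come from parity reconstruction rather than from a second copy.

```python
import hashlib


def checksum(payload):
    return hashlib.sha256(payload).digest()


def self_healing_read(copies, expected):
    """Return the first copy that matches the checksum held by the parent.

    copies:   mutable list of redundant copies of one block (hypothetical)
    expected: checksum stored in the parent block
    """
    for candidate in copies:
        if checksum(candidate) == expected:
            # Repair every copy that failed validation with the good data.
            for i in range(len(copies)):
                if checksum(copies[i]) != expected:
                    copies[i] = candidate
            return candidate
    # No copy validated, so no data is returned.
    raise IOError("checksum mismatch and no valid redundant copy")


good = b"file contents"
mirror = [b"file c0ntents", good]      # first side is silently corrupted
data = self_healing_read(mirror, checksum(good))
print(data == good, mirror[0] == good)  # True True
```

With a single corrupted copy and no redundancy, the same call raises an error instead of returning bad data.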

-- 
Darren


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-05 Thread Ilya
Hey

Thanks for the slides but some things are still unclear.

Slide 18 shows variably sized extents but doesn't explain the process of
full-on write. What I'm looking for is one example. I still don't understand 
how it works with variable sized extents. So if you have 2 stripe units on one 
disk and 1 stripe unit for the parity and you modify half of the first stripe 
unit only, when you do a full-stripe write, what happens in terms of a 
full-stripe write?

I also didn't see a distinction between parity and metadata reconstruction. I 
still do not know the process behind the metadata reconstruction for bad data 
and when parity is used for bad data.


Re: [zfs-discuss] Basic question about striping and ZFS

2009-11-04 Thread Ilya
Forgot to add: are those four stripe units (for that one file) above considered
the stripe itself? Or is each of those stripe units on the separate disks
considered a separate stripe?


[zfs-discuss] Basic question about striping and ZFS

2009-11-04 Thread Ilya
Researching ZFS and had a question relating to RAID-Z and the striping.
So, I was glancing over Jeff's blog (http://blogs.sun.com/bonwick/entry/raid_z):

"RAID-Z is a data/parity scheme like RAID-5, but it uses dynamic stripe
width. Every block is its own RAID-Z stripe, regardless of blocksize. This
means that every RAID-Z write is a full-stripe write. This, when combined with
the copy-on-write transactional semantics of ZFS, completely eliminates the
RAID write hole. RAID-Z is also faster than traditional RAID because it never
has to do read-modify-write."
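
To picture "every block is its own stripe", here is a toy Python model with single XOR parity. It is not the real RAID-Z on-disk layout (the order in which sectors are placed on disks is simplified here), and the function names, the 512-byte sector size, and the three data disks are assumptions for illustration. The point is only that the stripe is sized to the block being written, so parity is always computed from data already in hand.

```python
from functools import reduce

SECTOR = 512


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe(block, data_disks):
    """Return a list of rows; each row is [parity, data0, data1, ...]."""
    sectors = [block[i:i + SECTOR].ljust(SECTOR, b"\0")
               for i in range(0, len(block), SECTOR)]
    rows = []
    for i in range(0, len(sectors), data_disks):
        data = sectors[i:i + data_disks]
        rows.append([reduce(xor, data)] + data)   # parity covers this row only
    return rows


def rebuild(row, lost):
    """Recover the sector at index `lost` by XOR-ing the surviving sectors."""
    return reduce(xor, [s for i, s in enumerate(row) if i != lost])


small = full_stripe(b"a" * 1024, data_disks=3)   # 2 data sectors + parity
large = full_stripe(b"b" * 4096, data_disks=3)   # 8 data sectors in 3 rows
print([len(r) for r in small], [len(r) for r in large])  # [3] [4, 4, 3]
print(rebuild(small[0], lost=1) == small[0][1])          # True
```

A 1 KB block occupies a three-sector stripe and a 4 KB block a wider one; neither write needs to read old data or old parity from disk first.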

So firstly, is this literally referring to the blocks of a file for example? 
Also by stripe, is this referring to the stripe UNITS (within a whole stripe) 
or the ENTIRE stripe across disks? 

So, let's say that you have a file of 64 KB per sector (stripe units consisting
of blocks of whatever size totaling 64 KB) across four disks.

Disk 0: Stripe 1
Disk 1: Stripe 2
Disk 2: Stripe 3
Disk 3: Parity

When Jeff's blog mentions that "every block has its own stripe", what exactly
does he mean in the context of this example? And let's say that I am
modifying/writing out bytes in the first stripe: how does this affect the other
stripes/parity?