Re: suns raid-z / zfs

2008-02-18 Thread Neil Brown
On Monday February 18, [EMAIL PROTECTED] wrote:
 On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
  On Sunday February 17, [EMAIL PROTECTED] wrote:
   Hi
   
  
   It seems like a good way to avoid the performance problems of raid-5
   /raid-6
  
  I think there are better ways.
 
 Interesting! What do you have in mind?

A Log Structured Filesystem always does large contiguous writes.
Aligning these to the raid5 stripes wouldn't be too hard and then you
would never have to do any pre-reading.
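
As a rough illustration (the device name, chunk size, and filesystem numbers below are assumed for the example, not taken from this thread): on a 4-drive raid5 with a 64k chunk, a full stripe carries 3 x 64k = 192k of data, so writes that are sized and aligned to 192k boundaries never need to pre-read old data or parity.

cat /sys/block/md0/md/chunk_size          # chunk size in bytes
mdadm --detail /dev/md0 | grep -i chunk   # same information via mdadm
# ext3 can be given an allocation hint matching the chunk
# (stride 16 = 64k chunk / 4k filesystem block):
mke2fs -j -E stride=16 /dev/md0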

 
 and what are the problems with zfs?

Recovery after a failed drive would not be an easy operation, and I
cannot imagine it being even close to the raw speed of the device.

 
   
   But does it stripe? One could think that rewriting stripes to
   other places would damage the striping effects.
  
  I'm not sure what you mean exactly.  But I suspect your concerns here
  are unjustified.
 
 More precisely: I understand that zfs always writes the data anew.
 That would mean to other blocks on the partitions, for the logical blocks
 of the file in question. So the blocks on the partitions will not be
 adjacent, and striping will generally not be possible.

The important part of striping is that a write is spread out over
multiple devices, isn't it?

If ZFS can choose where to put each block that it writes, it can
easily choose to write a series of blocks to a collection of different
devices, thus getting the major benefit of striping.


NeilBrown


Re: RAID5 to RAID6 reshape?

2008-02-18 Thread Beolach
On Feb 17, 2008 10:26 PM, Janek Kozicki [EMAIL PROTECTED] wrote:
 Conway S. Smith said: (by the date of Sun, 17 Feb 2008 07:45:26 -0700)

  Well, I was reading that LVM2 had a 20%-50% performance penalty,

 Huh? Make a benchmark. Do you really think that anyone would be using
 it if there was any penalty bigger than 1-2%? (random access, r/w)

 I have no idea what the penalty is, but I'm totally sure I didn't
 notice it.


(Oops, replied straight to Janek, rather than the list.  Sorry.)

I saw those numbers in a few places; the only one I can remember off
the top of my head was the Gentoo-Wiki:
http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Software_RAID_mirror_and_LVM2_on_top_of_RAID.
Looking at its history, that warning was added back on 23 Dec. 2006,
so it could very well be out-of-date.  Good to hear you don't notice
any performance drop.  I think I will try to run some benchmarks.
What do you guys recommend using for benchmarking?  Plain dd,
bonnie++?
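
In case a concrete starting point helps (the paths and sizes below are just
placeholders), the same sequential and mixed-load tests can be run against the
bare md device and against an LV on top of it and compared:

dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=4096 conv=fdatasync   # sequential write
dd if=/mnt/test/bigfile of=/dev/null bs=1M                             # sequential read
bonnie++ -d /mnt/test -s 4g -n 128                                     # seeks, creates, rewrites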


Conway S. Smith


Re: RAID5 to RAID6 reshape?

2008-02-18 Thread Andre Noll
On 17:40, Mark Hahn wrote:
 Question to other people here - what is the maximum partition size
 that ext3 can handle? Am I correct that it is 4 TB?
 
 8 TB.  People who want to push this are probably using ext4 already.

ext3 has supported up to 16 TB for quite some time. It works fine for me:

[EMAIL PROTECTED]:~ # mount |grep sda; df /dev/sda; uname -a; uptime
/dev/sda on /media/bia type ext3 (rw)
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda   15T  7.8T  7.0T  53% /media/bia
Linux ume 2.6.20.12 #3 SMP Tue Jun 5 14:33:44 CEST 2007 x86_64 GNU/Linux
 13:44:29 up 236 days, 15:12,  9 users,  load average: 10.47, 10.28, 10.17

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe




Re: RAID5 to RAID6 reshape?

2008-02-18 Thread Janek Kozicki
Beolach said: (by the date of Mon, 18 Feb 2008 05:38:15 -0700)

 On Feb 17, 2008 10:26 PM, Janek Kozicki [EMAIL PROTECTED] wrote:
  Conway S. Smith said: (by the date of Sun, 17 Feb 2008 07:45:26 -0700)
 
   Well, I was reading that LVM2 had a 20%-50% performance penalty,
 http://gentoo-wiki.com/HOWTO_Gentoo_Install_on_Software_RAID_mirror_and_LVM2_on_top_of_RAID.

Hold on. This might be related to RAID chunk positioning with respect
to LVM chunk positioning. If they interfere, there may indeed be some
performance drop. Best to make sure that those chunks are aligned with each other.
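
For what it's worth, the alignment is easy to check (the device and size values
here are only illustrative): compare where LVM starts its data area with the
md chunk size, and pick the metadata size at pvcreate time so the data area
begins on a chunk boundary.

pvs -o +pe_start /dev/md0            # offset of the first physical extent
cat /sys/block/md0/md/chunk_size     # md chunk size in bytes
# asking for ~250k of metadata commonly rounds the data start up to 256k:
pvcreate --metadatasize 250k /dev/md0
vgcreate -s 4M vg0 /dev/md0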

-- 
Janek Kozicki |


Re: RAID5 to RAID6 reshape?

2008-02-18 Thread Mark Hahn

8 TB.  People who want to push this are probably using ext4 already.


ext3 has supported up to 16 TB for quite some time. It works fine for me:


Thanks.  16 makes sense (2^32 * 4k blocks).


Re: HDD errors in dmesg, but don't know why...

2008-02-18 Thread Justin Piszcz
Looks like your replacement disk is no good, the SATA port is bad, or there is 
some other issue.  I am not sure what "SDB FIS" means, but as long as you keep 
getting that error, don't expect the drive to work correctly.  I had a drive 
that did a similar thing (a DOA Raptor), and after I got the replacement it 
worked fine.
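
One thing worth doing before trusting the drive again (just a suggestion, and
the device name is assumed) is to ask the disk itself what it thinks via SMART:

smartctl -a /dev/sdd | egrep -i 'reallocated|pending|uncorrect'   # error counters
smartctl -t long /dev/sdd        # offline surface scan; re-check with -a afterwards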


On Mon, 18 Feb 2008, Steve Fairbairn wrote:



Hi All,

I've got a degraded RAID5 which I'm trying to add in the replacement
disk.  Trouble is, every time the recovery starts, it flies along at
70MB/s or so.  Then after doing about 1%, it starts dropping rapidly,
until eventually a device is marked failed.

When I look in dmesg, I get the following...

SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back

I've no idea what to make of these errors.  As far as I can work out,
the HDs themselves are fine. They are all less than 2 months old.

The box is CentOS 5.1.  Linux space.homenet.com 2.6.18-53.1.13.el5 #1
SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Any suggestions on what I can do to stop this issue?

Steve.







HDD errors in dmesg, but don't know why...

2008-02-18 Thread Steve Fairbairn

Hi All,

I've got a degraded RAID5 which I'm trying to add in the replacement
disk.  Trouble is, every time the recovery starts, it flies along at
70MB/s or so.  Then after doing about 1%, it starts dropping rapidly,
until eventually a device is marked failed.

When I look in dmesg, I get the following...

SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
 res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
 res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back

I've no idea what to make of these errors.  As far as I can work out,
the HDs themselves are fine. They are all less than 2 months old.

The box is CentOS 5.1.  Linux space.homenet.com 2.6.18-53.1.13.el5 #1
SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

Any suggestions on what I can do to stop this issue?

Steve.




Re: HDD errors in dmesg, but don't know why...

2008-02-18 Thread Jeff Garzik

Steve Fairbairn wrote:

Hi All,

I've got a degraded RAID5 which I'm trying to add in the replacement
disk.  Trouble is, every time the recovery starts, it flies along at
70MB/s or so.  Then after doing about 1%, it starts dropping rapidly,
until eventually a device is marked failed.

When I look in dmesg, I get the following...

SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:10:3f:0e:f9/01:00:00:00:00/40 tag 2 cdb 0x0 data
131072 in
 res 41/40:00:50:0e:f9/9c:00:00:00:00/40 Emask 0x9 (media error)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: drive cache: write back
ata5.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
ata5.00: (irq_stat 0x00060002, device error via SDB FIS)
ata5.00: cmd 60/00:18:3f:02:f9/01:00:00:00:00/40 tag 3 cdb 0x0 data
131072 in
 res 41/40:00:c3:02:f9/9c:00:00:00:00/40 Emask 0x9 (media error)


media error means just that -- your hard drive is reporting bad media 
to libata, which in turn dutifully reports that info to you :)


Jeff






Re: suns raid-z / zfs

2008-02-18 Thread Keld Jørn Simonsen
On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
 On Monday February 18, [EMAIL PROTECTED] wrote:
  On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
   On Sunday February 17, [EMAIL PROTECTED] wrote:
Hi

   
It seems like a good way to avoid the performance problems of raid-5
/raid-6
   
   I think there are better ways.
  
  Interesting! What do you have in mind?
 
 A Log Structured Filesystem always does large contiguous writes.
 Aligning these to the raid5 stripes wouldn't be too hard and then you
 would never have to do any pre-reading.
 
  
  and what are the problems with zfs?
 
 Recovery after a failed drive would not be an easy operation, and I
 cannot imagine it being even close to the raw speed of the device.

I thought this was a problem with most RAID types: while
reconstructing, performance is quite slow. And as there has been some
damage, this is to be expected, and there is probably not much to do about it.

Or is there? Are there any RAID types that perform reasonably well
while one disk is under repair? The performance could be crucial
for some applications.

One could think of clever arrangements so that, say, two disks could go
down and the rest of an array with 10-20 drives could still function
reasonably well, even during the reconstruction. As far as I can tell
from the code, the reconstruction itself does not impede normal
performance much, as normal operation takes priority over reconstruction operations.
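
(For reference, md's rebuild throttling is tunable; the numbers below are the
usual defaults in KB/s per device, but check your own kernel:)

cat /proc/sys/dev/raid/speed_limit_min            # typically 1000
cat /proc/sys/dev/raid/speed_limit_max            # typically 200000
echo 50000 > /proc/sys/dev/raid/speed_limit_min   # e.g. let a rebuild push harder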

Hmm, my understanding would then be that, for both random reads and writes,
performance in typical RAIDs would only be reduced by the I/O bandwidth
of the failing disk(s).

Sequential R/W performance for raid10,f would
be hurt, downgrading its performance to random I/O for the drives involved.

Raid5/6 would be hurt much more for reading, as all the remaining drives need to be
read to produce correct data during reconstruction.


So it looks like, if performance is important to you during a
reconstruction, then you should avoid raid5/6 and use the mirrored RAID
types. Given a big operation, with a load mix of a lot of
random reading and writing, it does not matter much which mirrored
RAID type you choose, as they all perform about equally for random
I/O, even when reconstructing. Is that correct advice?

  

But does it stripe? One could think that rewriting stripes to
other places would damage the striping effects.
   
   I'm not sure what you mean exactly.  But I suspect your concerns here
   are unjustified.
  
  More precisely: I understand that zfs always writes the data anew.
  That would mean to other blocks on the partitions, for the logical blocks
  of the file in question. So the blocks on the partitions will not be
  adjacent, and striping will generally not be possible.
 
 The important part of striping is that a write is spread out over
 multiple devices, isn't it?
 
 If ZFS can choose where to put each block that it writes, it can
 easily choose to write a series of blocks to a collection of different
 devices, thus getting the major benefit of striping.

I see two major benefits of striping: one is that many drives are involved,
and the other is that the stripes are allocated adjacently, so that I/O
on one drive can just proceed to the next physical blocks when one
stripe has been processed. Depending on the size of the I/O operations
involved, first one or more disks in a stripe are processed, and then the
following stripes are processed. ZFS misses the second part of this
optimization, I think.

Best regards
Keld


Re: RAID5 to RAID6 reshape?

2008-02-18 Thread Peter Grandi
 On Sun, 17 Feb 2008 07:45:26 -0700, Conway S. Smith
 [EMAIL PROTECTED] said:

[ ... ]

beolach Which part isn't wise? Starting w/ a few drives w/ the
beolach intention of growing; or ending w/ a large array (IOW,
beolach are 14 drives more than I should put in 1 array & expect
beolach to be safe from data loss)?

Well, that rather depends on what your intended data setup and
access patterns are, but the above are all things that may be unwise
in many cases. The intended use mentioned below does not require
a single array, for example.

However, while doing the above may make sense in *some* situations,
I reckon that the number of those situations is rather small.

Consider for example the answers to these questions:

* Suppose you have a 2+1 array which is full. Now you add a disk,
  and that means that almost all free space is on a single disk.
  The MD subsystem has two options as to where to add that lump
  of space; consider why neither is very pleasant.

* How fast are unaligned writes with a 13+1 or a 12+2
  stripe? How often is that going to happen, especially on an
  array that started as a 2+1?

* How long does it take to rebuild parity with a 13+1 array or a
  12+2 array in the case of a single disk failure (a rough
  back-of-envelope figure follows below)? What happens if a
  disk fails during rebuild?

* When you have 13 drives and you add the 14th, how long does
  that take? What happens if a disk fails during rebuild??

The points made by http://WWW.BAARF.com/ apply too.
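
As a back-of-envelope figure for the rebuild question above (the drive size and
throughput are assumptions, not numbers from this thread): a rebuild has to read
every surviving member end to end, so with 500 GB drives sustaining roughly
60 MB/s the floor is about 500000 MB / 60 MB/s = ~8300 s, i.e. well over two
hours on an otherwise idle array, and usually much longer once normal I/O and
the md speed limits throttle it.

echo $(( 500 * 1000 / 60 / 60 )) minutes   # ~138 minutes minimum per pass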

beolach [ ... ] media files that would typically be accessed
beolach over the network by MythTV boxes.  I'll also be using
beolach it as a sandbox database/web/mail server. [ ... ] most
beolach important stuff backed up, [ ... ] some gaming, which
beolach is where I expect performance to be most noticeable.

To me that sounds like something that could well be split across
multiple arrays, rather than risking repeatedly extending a
single array, and then risking a single large array.

beolach Well, I was reading that LVM2 had a 20%-50% performance
beolach penalty, which in my mind is a really big penalty. But I
beolach think those numbers were from some time ago; has the
beolach situation improved?

LVM2 relies on DM, which is not much slower than say 'loop', so
it is almost insignificant for most people.
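
A quick way to see what DM actually adds on top of md (the volume names here
are made up) is to look at the mapping table; for an ordinary LV it is just a
linear remapping per segment, which costs next to nothing per I/O:

dmsetup table vg0-lv0    # usually one or a few 'linear' targets
dmsetup status vg0-lv0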

But even if the overhead may be very very low, DM/LVM2/EVMS seem
to me to have very limited usefulness (e.g. Oracle tablespaces,
and there are contrary opinions as to that too). In your stated
applications it is hard to see why you'd want to split your
arrays into very many block devices or why you'd want to resize
them.

beolach And is a 14 drive RAID6 going to already have enough
beolach overhead that the additional overhead isn't very
beolach significant? I'm not sure why you say it's amusing.

Consider the questions above. Parity RAID has issues, extending
an array has issues, and the idea of extending a parity RAID both
massively and in several steps looks very amusing to me.

beolach [ ... ] The other reason I wasn't planning on using LVM
beolach was because I was planning on keeping all the drives in
beolach the one RAID. [... ]

Good luck :-).