Re: The SX4 challenge

2008-01-20 Thread Mikael Pettersson
Jeff Garzik writes:
  
  Promise just gave permission to post the docs for their PDC20621 (i.e. 
  SX4) hardware:
  http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-1.2.pdf.bz2
  
  joining the existing PDC20621 DIMM and PLL docs:
  http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-dimm-1.6.pdf.bz2
  http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-pll-ata-timing-1.2.pdf.bz2
  
  
  So, the SX4 is now open.  Yay :)  I am hoping to talk Mikael into 
  becoming the sata_sx4 maintainer, and finally integrating my 'new-eh' 
  conversion in libata-dev.git.

The best solution would be if some storage driver person would
take on the SX4 challenge and work towards integrating the SX4
into Linux' RAID framework.

If no-one steps forward I'll take over Jeff's SX4 card and just
maintain sata_sx4 as a plain non-RAID driver. Unfortunately I
don't have the time needed to turn it into a decent RAID or
RAID-offload driver myself.

/Mikael

  
  But now is a good time to remind people how lame the sata_sx4 driver 
  software really is -- and I should know, I wrote it.
  
  The SX4 hardware, simplified, is three pieces:  XOR engine (for raid5), 
  host-board memcpy engine, and several ATA engines (and some helpful 
  transaction sequencing features).  Data for each WRITE command is first 
  copied to the board RAM, then the ATA engines DMA to/from the board RAM. 
Data for each READ command is copied to board RAM via the ATA engines, 
  then DMA'd across PCI to your host memory.
  
  Therefore, while it is not hardware RAID, the SX4 provides all the 
  pieces necessary to offload RAID1 and RAID5, and handle other RAID 
  levels optimally.  RAID1 and 5 copies can be offloaded (provided all 
  copies go to SX4-attached devices of course).  RAID5 XOR gen and 
  checking can be offloaded, allowing the OS to see a single request, 
  while the hardware processes a sequence of low-level requests sent in a 
  batch.
  
  This hardware presents an interesting challenge:  it does not really fit 
  into software RAID (i.e. no RAID) /or/ hardware RAID categories.  The 
  sata_sx4 driver presents the no-RAID configuration, which is terribly 
  inefficient:
  
   WRITE:
      submit host DMA (copy to board)
      host DMA completion via interrupt
      submit ATA command
      ATA command completion via interrupt
   READ:
      submit ATA command
      ATA command completion via interrupt
      submit host DMA (copy from board)
      host DMA completion via interrupt
  
  Thus, the SX4 challenge is a challenge to developers to figure out the 
  most optimal configuration for this hardware, given the existing MD and 
  DM work going on.
  
  Now, it must be noted that the SX4 is not current-gen technology.  Most 
  vendors have moved towards an IOP model, where the hw vendor puts most 
  of their hard work into an ARM/MIPS firmware, running on an embedded 
  chip specially tuned for storage purposes.  (ref hptiop and stex 
  drivers, very very small SCSI drivers)
  
  I know Dan Williams @ Intel is working on very similar issues on the IOP 
  -- async memcpy, XOR offload, etc. -- and I am hoping that, due to that 
  current work, some of the good ideas can be reused with the SX4.
  
  Anyway...  it's open, it's interesting, even if it's not current-gen 
  tech anymore.  You can probably find them on Ebay or in an 
  out-of-the-way computer shop somewhere.
  
   Jeff


how to create a degraded raid1 with only 1 of 2 drives ??

2008-01-20 Thread Mitchell Laks
Hi mdadm raid gurus,

I wanted to make a raid1 array, but at the moment I have only 1 drive 
available. The other disk is
in the mail. I wanted to make a raid1 that i will use as a backup.

But I need to do the backup now, before the second drive comes.

So I did this.

formatted /dev/sda, creating /dev/sda1 with type fd.

then I tried to run
mdadm -C /dev/md0 --level=1 --raid-devices=1 /dev/sda1

but I got an error message
mdadm: 1 is an unusual number of drives for an array so it is probably a 
mistake. If you really
mean it you will need to specify --force before setting the number of drives

so then i tried

mdadm -C /dev/md0 --level=1 --force --raid-devices=1 /dev/sda1
mdadm: /dev/sda1 is too small:0K
mdadm: create aborted

now what does that mean?

fdisk -l /dev/sda  shows

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       60801   488384001   fd  Linux raid autodetect

so what do I do?


I need to back up my data.
If I simply format /dev/sda1 as an ext3 file system then I can't add the 
second drive later on.

How can I set it up as a `degraded` raid1 array so I can add in the second 
drive later on and sync?

Thanks for your help!

Mitchell
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: how to create a degraded raid1 with only 1 of 2 drives ??

2008-01-20 Thread michael

Quoting Mitchell Laks [EMAIL PROTECTED]:


Hi mdadm raid gurus,

I wanted to make a raid1 array, but at the moment I have only 1   
drive available. The other disk is

in the mail. I wanted to make a raid1 that i will use as a backup.

But I need to do the backup now, before the second drive comes.

So I did this.

formated /dev/sda creating /dev/sda1 with type fd.

then I tried to run
mdadm -C /dev/md0 --level=1 --raid-devices=1 /dev/sda1


Perhaps give this a try:

# mdadm -C /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing
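
Once the second disk arrives, partition it the same way and add it to the
degraded array; assuming it shows up as /dev/sdb, something along these
lines should do it:

# mdadm /dev/md0 --add /dev/sdb1
# cat /proc/mdstat          # watch the resync progress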

Cheers,
Mike


Re: how to create a degraded raid1 with only 1 of 2 drives ??

2008-01-20 Thread Mitchell Laks
I think my error was that maybe I did not write the fdisk changes to the
drive with fdisk's 'w' command.

so I did
fdisk /dev/sda
p
then 
w
and then when I did

mdadm -C /dev/md0 --level=2 -n2 /dev/sda1 missing

it worked and set up the array.

Thanks for being there!

Mitchell


RE: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Steve Fairbairn

Thanks for the response Bill.  Neil has responded to me a few times, but
I'm more than happy to try and keep it on this list instead as it feels
like I'm badgering Neil which really isn't fair...

Since my initial email, I got to the point of believing it was down to
the superblock, and that --zero-superblock wasn't working, so a good few
hours and a dd if=/dev/zero of=/dev/hdc later, I tried adding it again
to the same result.

As it happens, I did the --zero-superblock, then tried to insert it
again and then examined it (mdadm -E) again, and the superblock was 'still
there'. What really happened was that the act of trying to add the device
writes a new superblock.  So --zero-superblock is working fine for me, but
it's still refusing to add the device.
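
For reference, the sequence looked roughly like this (my device naming):

  mdadm --zero-superblock /dev/hdc1
  mdadm -E /dev/hdc1              # reports no superblock, so the zeroing did work
  mdadm /dev/md0 --add /dev/hdc1  # fails with 'Invalid argument'...
  mdadm -E /dev/hdc1              # ...but the attempt has written a new superblock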

The only other thing I've tried is moving the replacement drive to
/dev/hdd instead (secondary slave), with a small old HD I had lying
around as hdc.

[EMAIL PROTECTED] ~]# mdadm -E /dev/hdd1
mdadm: No md superblock detected on /dev/hdd1.

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdd1
mdadm: add new device failed for /dev/hdd1 as 5: Invalid argument

[EMAIL PROTECTED] ~]# dmesg | tail
...
md: hdd1 has invalid sb, not importing!
md: md_import_device returned -22

[EMAIL PROTECTED] ~]# mdadm -E /dev/hdd1
/dev/hdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 382c157a:405e0640:c30f9e9e:888a5e63
Creation Time : Wed Jan 9 18:57:53 2008
Raid Level : raid5
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1953535744 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 0
Update Time : Sun Jan 20 13:02:00 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : 198f8fb4 - correct
Events : 0.348270
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 22 65 -1 spare /dev/hdd1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 0 0 3 faulty removed
4 4 8 49 4 active sync /dev/sdd1

I have mentioned it to Neil, but didn't mention it here before.  I am a
C developer by trade, so I can easily delve into the mdadm source for
extra debug if anyone thinks it could help.  I could also delve into md
in the kernel if really needed, but my knowledge of building kernels on
Linux is some 4+ years out of date and forgotten, so if that's a yes,
then some pointers on how to get the CentOS kernel config and a choice
of kernel from www.kernel.org, or from the CentOS distro, would be
invaluable.

I'm away for a few days from tomorrow and probably won't be able to do
much, if anything, until I'm back on Thursday, so please be patient if I
don't respond before then.

Many Thanks,

Steve.




One Large md or Many Smaller md for Better Performance?

2008-01-20 Thread Moshe Yudkowsky
Question: with the same number of physical drives,  do I get better 
performance with one large md-based drive, or do I get better 
performance if I have several smaller md-based drives?


Situation: dual CPU, 4 drives (which I will set up as RAID-1 after being 
terrorized by the anti-RAID-5 polemics included in the Debian distro of 
mdadm).


I've two choices:

1. Allocate all the drive space into a single large partition, place 
into a single RAID array (either 10 or 1 + LVM, a separate question).


2. Allocate each drive into several smaller partitions. Make each set of 
smaller partitions into a separate RAID 1 array and use separate RAID md 
drives for the various file systems.


Example use case:

While working other problems, I download a large torrent in the 
background. The torrent writes to its own, separate file system called 
/foo. If /foo is mounted on its own RAID 10 or 1-LVM array, will that 
help or hinder overall system responsiveness?


It would seem a no-brainer that giving each major filesystem its own 
array would allow for better threading and responsiveness, but I'm 
picking up hints in various pieces of documentation that the performance 
can be counter-intuitive. I've even considered the possibility of giving 
/var and /usr separate RAID arrays (data vs. executables).


If an expert could chime in, I'd appreciate it a great deal.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 There are more ways to skin a cat than nuking it from orbit
-- but it's the only way to be sure.
-- Eliezer Yudkowsky


Performance of RAID 10 vs. using LVM?

2008-01-20 Thread Moshe Yudkowsky
Let's assume that I have 4 drives; they are set up in mirrored pairs as 
RAID 1, and then aggregated together to create a RAID 10 system (RAID 1 
followed by RAID 0). That is, 4 x N disks become a 2N size filesystem.


Question: Is this higher or lower performance than using LVM to 
aggregate the disks?


LVM allows the creation of unitary file system from disparate physical 
drives, and has the advantage that filesystems can be expanded or shrunk 
with ease. I'll be using LVM on top of the RAID 1 or RAID 10 regardless.


Therefore, I can use LVM to create a 1L system, to coin an acronym. 
This would have the same 2N size, but would be created by LVM instead of 
RAID 0. Is there a performance advantage to using RAID 10 instead of 
RAID 1L? (The other question is whether the hypothetical performance 
advantage of 10 outweighs the flexibility advantage of 1L, a question that 
only an individual user can answer... perhaps.)
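
To make the comparison concrete, here is a rough sketch of the two layouts
(device and volume-group names purely illustrative):

  # RAID 10: one md array across all four drives, LVM on top
  mdadm -C /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0

  # "RAID 1L": two RAID 1 pairs, aggregated by LVM instead of RAID 0
  mdadm -C /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm -C /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
  pvcreate /dev/md0 /dev/md1
  vgcreate vg0 /dev/md0 /dev/md1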


Comments extremely welcome.

--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 The sharpest knives are also the quietest.
 -- John M. Ford, _The Final Reflection_


Re: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Robin Hill
On Sat Jan 19, 2008 at 11:08:43PM -, Steve Fairbairn wrote:

 
 Hi All,
 
 I have a Software RAID 5 device configured, but one of the drives
 failed. I removed the drive with the following command...
 
 mdadm /dev/md0 --remove /dev/hdc1
 
 Now, when I try to insert the replacement drive back in, I get the
 following...
 
 [EMAIL PROTECTED] ~]# mdadm /dev/md0 --add /dev/hdc1
 mdadm: add new device failed for /dev/hdc1 as 5: Invalid argument
 
 [EMAIL PROTECTED] mdadm-2.6.4]# dmesg | tail
 ...
 md: hdc1 has invalid sb, not importing!
 md: md_import_device returned -22
 md: hdc1 has invalid sb, not importing!
 md: md_import_device returned -22
 
I've had the same error message trying to add a drive into an array
myself - in my case I'm almost certain it's because the drive is
slightly smaller than the others in the array (the array's currently
growing so I haven't delved any further yet).  Have you checked the
actual partition sizes?  Particularly if it's a different type of drive
as drives from different manufacturers can vary by quite a large
amount.
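
A quick way to compare is something like:

  cat /proc/partitions            # compare the blocks column for hdc1 and the working members
  fdisk -l /dev/hdc /dev/sda      # or look at the partition tables directly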

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




RE: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-20 Thread Steve Fairbairn
 -Original Message-
 From: Neil Brown [mailto:[EMAIL PROTECTED] 
 Sent: 20 January 2008 20:37
 
  md: hdd1 has invalid sb, not importing!
  md: md_import_device returned -22
 
 In 2.6.18, the only thing that can return this message 
 without other more explanatory messages are:
 
   2/ If the device appears to be too small.
 
  Maybe it is the latter, though that seems unlikely.
 

[EMAIL PROTECTED] ~]# mdadm /dev/md0 --verbose --add /dev/hdd1
mdadm: added /dev/hdd1

HUGE thanks to Neil, and one white gold plated donkey award to me.

OK.  When I created /dev/md1 after creating /dev/md0, I was using a
mishmash of disks I had lying around.  As this selection of disks used
differing block sizes, I chose to create the raid partitions from the
first block, to a set size (+250G).  When I reinstalled the disk for
going into /dev/md0, I partitioned the disk the same way (+500G), which
it turns out isn't how I created the partitions when I created that
array.

So the device I was trying to add was about 22 blocks too small.  Taking
Neil's suggestion and looking at /proc/partitions showed this up
incredibly quickly.

My sincere apologies for wasting all your time on a stupid error, and
again many many thanks for the solution...

md0 : active raid5 hdd1[5] sdd1[4] sdc1[2] sdb1[1] sda1[0]
  1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUU_U]
  [>....................]  recovery =  0.9% (4430220/488383936)
  finish=1110.8min speed=7259K/sec

Steve.




Re: One Large md or Many Smaller md for Better Performance?

2008-01-20 Thread Iustin Pop
On Sun, Jan 20, 2008 at 02:24:46PM -0600, Moshe Yudkowsky wrote:
 Question: with the same number of physical drives,  do I get better  
 performance with one large md-based drive, or do I get better  
 performance if I have several smaller md-based drives?

No expert here, but my opinion:
  - md code works better if there's only one array per physical drive,
    because it keeps statistics per array (like last accessed sector,
    etc.), and if you combine two arrays on the same drive these
    statistics are not exactly true anymore
  - simply separating 'application work areas' into different
    filesystems is IMHO enough, no need to separate the raid arrays too
  - if you download torrents, fragmentation is a real problem, so use a
    filesystem that knows how to preallocate space (XFS and maybe ext4;
    for XFS use xfs_io to set a bigger extent size for where you
    download; rough example below)
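
For instance (exact syntax may differ between xfs_io versions, and 16m is
just a guess at a sensible hint):

  xfs_io -c 'extsize 16m' /path/to/download/dir   # new files created there inherit the hint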

regards,
iustin


Re: One Large md or Many Smaller md for Better Performance?

2008-01-20 Thread Bill Davidsen

Moshe Yudkowsky wrote:
Question: with the same number of physical drives,  do I get better 
performance with one large md-based drive, or do I get better 
performance if I have several smaller md-based drives?


Situation: dual CPU, 4 drives (which I will set up as RAID-1 after 
being terrorized by the anti-RAID-5 polemics included in the Debian 
distro of mdadm).


I've two choices:

1. Allocate all the drive space into a single large partition, place 
into a single RAID array (either 10 or 1 + LVM, a separate question).


One partitionable RAID-10, perhaps, then partition as needed. Read the 
discussion here about performance of LVM and RAID. I personally don't do 
LVM unless I know I will have to have great flexibility of configuration 
and can give up performance to get it. Others report different results, 
so make up your own mind.
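Roughly (untested here, device names are just examples), a partitionable 
array would look like:

  mdadm -C /dev/md_d0 --auto=part --level=10 --raid-devices=4 /dev/sd[abcd]1
  fdisk /dev/md_d0     # then carve out /dev/md_d0p1, /dev/md_d0p2, ... as needed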
2. Allocate each drive into several smaller partitions. Make each set 
of smaller partitions into a separate RAID 1 array and use separate 
RAID md drives for the various file systems.


Example use case:

While working other problems, I download a large torrent in the 
background. The torrent writes to its own, separate file system called 
/foo. If /foo is mounted on its own RAID 10 or 1-LVM array, will that 
help or hinder overall system responsiveness?


It would seem a no brainer that giving each major filesystem its own 
array would allow for better threading and responsiveness, but I'm 
picking up hints in various piece of documentation that the 
performance can be counter-intuitive. I've even considered the 
possibility of giving /var and /usr separate RAID arrays (data vs. 
executables).


If an expert could chime in, I'd appreciate it a great deal.





--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismarck 





Re: how to create a degraded raid1 with only 1 of 2 drives ??

2008-01-20 Thread David Greaves
Mitchell Laks wrote:
 I think my error was that maybe I did not
 do write the fdisk changes to the drive with 
 fdisk w

No - your problem was that you needed to use the literal word 'missing',
like you did this time:
 mdadm -C /dev/md0 --level=2 -n2 /dev/sda1 missing

[however, this time you also asked for a RAID2 (--level=2) which I doubt would 
work]
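
What you presumably meant is the RAID1 form Mike suggested earlier:

  mdadm -C /dev/md0 --level=1 -n2 /dev/sda1 missing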

David


Re: how to create a degraded raid1 with only 1 of 2 drives ??

2008-01-20 Thread Bill Davidsen

Mitchell Laks wrote:

Hi mdadm raid gurus,

I wanted to make a raid1 array, but at the moment I have only 1 drive 
available. The other disk is
in the mail. I wanted to make a raid1 that i will use as a backup.

But I need to do the backup now, before the second drive comes.

So I did this.

formated /dev/sda creating /dev/sda1 with type fd.

then I tried to run
mdadm -C /dev/md0 --level=1 --raid-devices=1 /dev/sda1

but I got an error message
mdadm: 1 is an unusual numner of drives for an array so it is probably a 
mistake. If you really
mean it you will need to specify --force before setting the number of drives

so then i tried

mdadm -C /dev/md0 --level=1 --force --raid-devices=1 /dev/sda1
mdadm: /dev/sda1 is too small:0K
mdadm: create aborted

now what does that mean?

fdisk -l /dev/sda  shows

device boot start   end blocks  Id  System
/dev/sda1   1   60801   488384001 fdlinux raid autodetect

so what do I do?


I need to back up my data.
If I simply format /dev/sda1 as an ext3 file system then I can't add the 
second drive later on.

How can I  set it up as a `degraded` raid1 array so I can later on add in the 
second drive and sync?
  


Specify two drives and one as missing.

--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismarck 





array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
I've got a raid5 array with 5 disks where 2 failed. The failures are
occasional and only on a few sectors so I tried to assemble it with 4
disks anyway:

# mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4

However mdadm complains that one of the disks has an out-of-date
superblock and kicks it out, and then it cannot run the array with
only 3 disks.

Shouldn't it adjust the superblock and assemble-run it anyway? That's
what -f is for, no? This is with kernel 2.6.22.16 and mdadm 2.6.4.


idle array consuming cpu ??!!

2008-01-20 Thread Carlos Carvalho
A raid6 array with a spare and bitmap is idle: not mounted and with no
IO to it or any of its disks (obviously), as shown by iostat. However
it's consuming cpu: since reboot it used about 11min in 24h, which is quite
a lot even for a busy array (the cpus are fast). The array was cleanly
shutdown so there's been no reconstruction/check or anything else.

How can this be? Kernel is 2.6.22.16 with the two patches for the
deadlock ([PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
FIX) and the previous one.


Re: array doesn't run even with --force

2008-01-20 Thread Neil Brown
On Sunday January 20, [EMAIL PROTECTED] wrote:
 I've got a raid5 array with 5 disks where 2 failed. The failures are
 occasional and only on a few sectors so I tried to assemble it with 4
 disks anyway:
 
 # mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4
 
 However mdadm complains that one of the disks has an out-of-date
 superblock and kicks it out, and then it cannot run the array with
 only 3 disks.
 
 Shouldn't it adjust the superblock and assemble-run it anyway? That's
 what -f is for, no? This is with kernel 2.6.22.16 and mdadm 2.6.4.

Please provide actual commands and actual output.
Also add --verbose to the assemble command
Also provide --examine for all devices.
Also provide any kernel log messages.

Thanks,
NeilBrown


Re: idle array consuming cpu ??!!

2008-01-20 Thread Neil Brown
On Sunday January 20, [EMAIL PROTECTED] wrote:
 A raid6 array with a spare and bitmap is idle: not mounted and with no
 IO to it or any of its disks (obviously), as shown by iostat. However
 it's consuming cpu: since reboot it used about 11min in 24h, which is quite
 a lot even for a busy array (the cpus are fast). The array was cleanly
 shutdown so there's been no reconstruction/check or anything else.
 
 How can this be? Kernel is 2.6.22.16 with the two patches for the
 deadlock ([PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
 FIX) and the previous one.

Maybe the bitmap code is waking up regularly to do nothing.

Would you be happy to experiment?  Remove the bitmap with
   mdadm --grow /dev/mdX --bitmap=none

and see how that affects cpu usage?
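
If the bitmap does turn out to be the cause, it can be put back afterwards
with something like
   mdadm --grow /dev/mdX --bitmap=internal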

Thanks,
NeilBrown


Re: idle array consuming cpu ??!!

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
 On Sunday January 20, [EMAIL PROTECTED] wrote:
  A raid6 array with a spare and bitmap is idle: not mounted and with no
  IO to it or any of its disks (obviously), as shown by iostat. However
  it's consuming cpu: since reboot it used about 11min in 24h, which is quite
  a lot even for a busy array (the cpus are fast). The array was cleanly
  shutdown so there's been no reconstruction/check or anything else.
  
  How can this be? Kernel is 2.6.22.16 with the two patches for the
  deadlock ([PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
  FIX) and the previous one.
 
 Maybe the bitmap code is waking up regularly to do nothing.
 
 Would you be happy to experiment?  Remove the bitmap with
mdadm --grow /dev/mdX --bitmap=none
 
 and see how that affects cpu usage?

OK, I just removed the bitmap (checked with mdadm -E on one of the
devices) and recorded the cpu time of the kernel thread. Tomorrow I'll
look at it again.


Re: array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:13:
 On Sunday January 20, [EMAIL PROTECTED] wrote:
  I've got a raid5 array with 5 disks where 2 failed. The failures are
  occasional and only on a few sectors so I tried to assemble it with 4
  disks anyway:
  
  # mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4
  
  However mdadm complains that one of the disks has an out-of-date
  superblock and kicks it out, and then it cannot run the array with
  only 3 disks.
  
  Shouldn't it adjust the superblock and assemble-run it anyway? That's
  what -f is for, no? This is with kernel 2.6.22.16 and mdadm 2.6.4.
 
 Please provide actual commands and actual output.
 Also add --verbose to the assemble command
 Also provide --examine for all devices.
 Also provide any kernel log messages.

The command is

mdadm -A --verbose -f -R /dev/md3 /dev/sda4 /dev/sdc4 /dev/sde4 /dev/sdd4

The failed areas are sdb4 (which I didn't include above) and sdd4. I
did a dd if=/dev/sdb4 of=/dev/hda4 bs=512 conv=noerror and it
complained about roughly 10 bad sectors. I did dd if=/dev/sdd4
of=/dev/hdc4 bs=512 conv=noerror and there were no errors, that's why
I used sdd4 above. I tried to substitute hdc4 for sdd4, and hda4 for
sdb4, to no avail.

I don't have kernel logs because the failed area has /home and /var.
The double fault occurred during the holidays, so I don't know which
happened first. Below are the output of the command above and of
--examine.

mdadm: looking for devices for /dev/md3
mdadm: /dev/sda4 is identified as a member of /dev/md3, slot 0.
mdadm: /dev/sdc4 is identified as a member of /dev/md3, slot 2.
mdadm: /dev/sde4 is identified as a member of /dev/md3, slot 4.
mdadm: /dev/sdd4 is identified as a member of /dev/md3, slot 5.
mdadm: no uptodate device for slot 1 of /dev/md3
mdadm: added /dev/sdc4 to /dev/md3 as 2
mdadm: no uptodate device for slot 3 of /dev/md3
mdadm: added /dev/sde4 to /dev/md3 as 4
mdadm: added /dev/sdd4 to /dev/md3 as 5
mdadm: added /dev/sda4 to /dev/md3 as 0
mdadm: failed to RUN_ARRAY /dev/md3: Input/output error
mdadm: Not enough devices to start the array.

On screen it shows kicking out of date... for sdd4.

/dev/sda4:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 2f2f8327:375b4306:94521055:e3dc373b
  Creation Time : Tue May 11 16:03:35 2004
 Raid Level : raid5
  Used Dev Size : 70454400 (67.19 GiB 72.15 GB)
 Array Size : 281817600 (268.76 GiB 288.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

Update Time : Wed Jan 16 16:00:53 2008
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
   Checksum : 16119868 - correct
 Events : 0.14967284

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     0       8        4        0      active sync   /dev/sda4

   0     0       8        4        0      active sync   /dev/sda4
   1     1       0        0        1      active sync   -  note the difference compared to sdc4
   2     2       8       36        2      active sync   /dev/sdc4
   3     3       0        0        3      faulty removed
   4     4       8       68        4      active sync   /dev/sde4

/dev/sdc4:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 2f2f8327:375b4306:94521055:e3dc373b
  Creation Time : Tue May 11 16:03:35 2004
 Raid Level : raid5
  Used Dev Size : 70454400 (67.19 GiB 72.15 GB)
 Array Size : 281817600 (268.76 GiB 288.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

Update Time : Wed Jan 16 16:00:53 2008
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
   Checksum : 1611988f - correct
 Events : 0.14967284

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice State
this     2       8       36        2      active sync   /dev/sdc4

   0     0       8        4        0      active sync   /dev/sda4
   1     1       0        0        1      faulty removed
   2     2       8       36        2      active sync   /dev/sdc4
   3     3       0        0        3      faulty removed
   4     4       8       68        4      active sync   /dev/sde4

/dev/sdd4:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 2f2f8327:375b4306:94521055:e3dc373b
  Creation Time : Tue May 11 16:03:35 2004
 Raid Level : raid5
  Used Dev Size : 70454400 (67.19 GiB 72.15 GB)
 Array Size : 281817600 (268.76 GiB 288.58 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 3

Update Time : Fri Jan 11 18:45:17 2008
  State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 1
   Checksum : 160b27ce - correct
 Events : 0.14967266

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice 

Re: One Large md or Many Smaller md for Better Performance?

2008-01-20 Thread Moshe Yudkowsky

Bill Davidsen wrote:

One partitionable RAID-10, perhaps, then partition as needed. Read the 
discussion here about performance of LVM and RAID. I personally don't do 
LVM unless I know I will have to have great flexibility of configuration 
and can give up performance to get it. Other report different results, 
so make up your own mind.


I've used Google to search (again) through the archives of the 
newsgroup, and 'performance lvm' turns up relevant discussions back in 
2004 or so, but nothing very recent. Am I missing some other location 
for discussions of this question, or perhaps I'm looking in the wrong 
places? (The Wiki didn't help either.)


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Rumor is information distilled so finely that it can filter through 
anything.

 --  Terry Pratchett, _Feet of Clay_


Re: array doesn't run even with --force

2008-01-20 Thread Neil Brown
On Monday January 21, [EMAIL PROTECTED] wrote:
 
 The command is
 
 mdadm -A --verbose -f -R /dev/md3 /dev/sda4 /dev/sdc4 /dev/sde4 /dev/sdd4
 
 The failed areas are sdb4 (which I didn't include above) and sdd4. I
 did a dd if=/dev/sdb4 of=/dev/hda4 bs=512 conv=noerror and it
 complained about roughly 10 bad sectors. I did dd if=/dev/sdd4
 of=/dev/hdc4 bs=512 conv=noerror and there were no errors, that's why
 I used sdd4 above. I tried to substitute hdc4 for sdd4, and hda4 for
 sdb4, to no avail.
 
 I don't have kernel logs because the failed area has /home and /var.
 The double fault occurred during the holidays, so I don't know which
 happened first. Below are the output of the command above and of
 --examine.
 
 mdadm: looking for devices for /dev/md3
 mdadm: /dev/sda4 is identified as a member of /dev/md3, slot 0.
 mdadm: /dev/sdc4 is identified as a member of /dev/md3, slot 2.
 mdadm: /dev/sde4 is identified as a member of /dev/md3, slot 4.
 mdadm: /dev/sdd4 is identified as a member of /dev/md3, slot 5.
 mdadm: no uptodate device for slot 1 of /dev/md3
 mdadm: added /dev/sdc4 to /dev/md3 as 2
 mdadm: no uptodate device for slot 3 of /dev/md3
 mdadm: added /dev/sde4 to /dev/md3 as 4
 mdadm: added /dev/sdd4 to /dev/md3 as 5
 mdadm: added /dev/sda4 to /dev/md3 as 0
 mdadm: failed to RUN_ARRAY /dev/md3: Input/output error
 mdadm: Not enough devices to start the array.

So no device claims to be member '1' or '3' of the array, and as you
cannot start an array with 2 devices missing, there is nothing that
mdadm can do.  It has no way of knowing what should go in as '1' or
'3'.

As you note, sda4 says that it thinks slot 1 is still active/sync, but
it doesn't seem to know which device should go there either.
However that does indicate that slot 3 failed first and slot 1 failed
later.  So if we have candidates for both, slot 1 is probably more
uptodate.

You need to tell mdadm what goes where by creating the array.
e.g. if you think that sdb4 is adequately reliable and that it was in
slot 1, then

 mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 /dev/sdb4 /dev/sdc4 missing 
/dev/sde4

alternately if you think it best to use sdd, and it was in slot 3,
then

 mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 missing /dev/sdc4 /dev/sdd4 
/dev/sde4

would be the command to use.

Note that this command will not touch any data.  It will just
overwrite the superblock and assemble the array.
You can then 'fsck' or whatever to confirm that the data looks good.
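
For a non-destructive check, something like
   fsck -n /dev/md3
(or a read-only mount) should show whether the filesystem looks sane
without changing anything.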

good luck.

NeilBrown


Re: One Large md or Many Smaller md for Better Performance?

2008-01-20 Thread Moshe Yudkowsky

Thanks for the tips, and in particular:

Iustin Pop wrote:


  - if you download torrents, fragmentation is a real problem, so use a
filesystem that knows how to preallocate space (XFS and maybe ext4;
for XFS use xfs_io to set a bigger extend size for where you
download)


That's a very interesting idea; it also gives me an opportunity to 
experiment with XFS. I had been avoiding it because of possible 
power-failure issues on writes.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 She will have fun who knows when to work
  and when not to work.
-- Segami


Re: array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 14:09:
 As you note, sda4 says that it thinks slot 1 is still active/sync, but
 it doesn't seem to know which device should go there either.
 However that does indicate that slot 3 failed first and slot 1 failed
 later.  So if we have candidates for both, slot 1 is probably more
 uptodate.

I was going home (it's 1h20 past midnight) when I remembered and came
back to write that assembling with
/dev/sda4 /dev/sdb4 /dev/sdc4 missing /dev/sde4

works, which confirms what you say. Adding sdd4 back, it starts
resyncing; however, since sdb4 has errors, a double fault happens again
and the array fails.

 You need to tell mdadm what goes where by creating the array.
 e.g. if you think that sdb4 is adequately reliable and that it was in
 slot 1, then
 
  mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 /dev/sdb4 /dev/sdc4 missing 
  /dev/sde4
 
 alternately if you think it best to use sdd, and it was in slot 3,
 then
 
  mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 missing /dev/sdc4 /dev/sdd4 
  /dev/sde4
 
 would be the command to use.
 
 Note that this command will not touch any data.  It will just
 overwrite the superblock and assemble the array.
 You can then 'fsck' or whatever to confirm that the data looks good.

I have two possibilities: use sdd4 in slot 3, or the dump of sdb4 on
another disk in slot 1. The latter copy is more recent but has errors. Is it
possible to know which would be less bad before I fsck?
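
One thing I'm considering (just a sketch based on your earlier commands,
with hda4 standing in for the copy of sdb4) is to try each layout in turn
and compare read-only fsck runs before committing to either:

  mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 /dev/hda4 /dev/sdc4 missing /dev/sde4
  fsck -n /dev/md3     # or whatever sits on top of it; note the errors reported
  mdadm -S /dev/md3
  mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 missing /dev/sdc4 /dev/sdd4 /dev/sde4
  fsck -n /dev/md3
  mdadm -S /dev/md3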