Re: mdadm --grow failed

2007-02-18 Thread Marc Marais
Ok, I understand the risks, which is why I did a full backup before doing 
this. I have subsequently recreated the array and restored my data from 
backup.

Just for information, the e2fsck -n on the drive hung (unresponsive, with no 
I/O), so I assume the filesystem was hosed. I suspect resyncing the array 
after the grow failed was a bad idea. 

I'm not sure how the grow operation is performed, but it seems to me that 
there is no fault tolerance during the operation, so any failure will leave a 
corrupt array. My 2c: if any drive fails during a grow operation, the 
operation should be aborted in such a way as to allow a restart later (if 
possible) - in my case a retry would probably have worked. 

Anyway, if you need more info to help improve growing arrays let me know.

As a side note, either my hardware (a Promise TX4000 card) is acting up or 
there are still some unresolved issues with libata in general and/or 
sata_promise in particular. 

Regards,
Marc

On Sat, 17 Feb 2007 19:40:17 +1100, Neil Brown wrote
 On Saturday February 17, [EMAIL PROTECTED] wrote:
  
  Is my array destroyed? Seeing as the sda disk wasn't completely synced,
  I wonder how it was being used to resync the array when sdc went offline.
  I've got a bad feeling about this :|
 
 I can understand your bad feeling...
 What happened there shouldn't happen, but obviously it did.  There is
 evidence that all is not lost but obviously I cannot be sure yet.
 
 Can you fsck -n the array?  Does the data still seem to be intact?
 
 Can you report exactly what version of Linux kernel, and of mdadm you
 are using, and give the output of mdadm -E on each drive.
 
 I'll try to work out what happened and how to go forward, but am
 unlikely to get back to you for 24-48 hours (I have a busy weekend:-).
 
 NeilBrown




Re: 2.6.20: reproducible hard lockup with RAID-5 resync

2007-02-18 Thread Corey Hickey
Neil Brown wrote:
 Ok, so the difference is CONFIG_SYSFS_DEPRECATED. If that is not
 defined, the kernel locks up. There's not a lot of code under
 #ifdef/#ifndef CONFIG_SYSFS_DEPRECATED, but since I'm not familiar with
 any of it I don't expect trying to locate the bug on my own would be
 very productive.

 Neil, do you have CONFIG_SYSFS_DEPRECATED enabled? If so, does disabling
 it reproduce my problem? If you can't reproduce it, should I take the
 problem over to linux-kernel?
 
 # CONFIG_SYSFS_DEPRECATED is not set
 
 No, it is not set, and yet it all still works for me.

Dang, again. :)

 It is very hard to see how this CONFIG option can make a difference.
 Have you double checked that setting it removed the problem and
 clearing it causes the problem?

Yes, it seems odd to me too, but I have double-checked. If I build a
kernel with CONFIG_SYSFS_DEPRECATED enabled, it works; if I disable that
option and rebuild the kernel, it locks up.

I just tried running 'make defconfig' and then enabling only RAID,
RAID-0, RAID-1, and RAID-4/5/6. If I then disable
CONFIG_SYSFS_DEPRECATED, there aren't any problems. ...so, I'll try to
isolate the problem some more later.

Thanks,
Corey


Re: mdadm --grow failed

2007-02-18 Thread Neil Brown
On Sunday February 18, [EMAIL PROTECTED] wrote:
 
 I'm not sure how the grow operation is performed, but it seems to me that 
 there is no fault tolerance during the operation, so any failure will leave a 
 corrupt array. My 2c: if any drive fails during a grow operation, the 
 operation should be aborted in such a way as to allow a restart later (if 
 possible) - in my case a retry would probably have worked. 

For what it's worth, the code does exactly what you suggest.  It does
fail gracefully.  The problem is that it doesn't restart quite the
way you would like.

Had you stopped the array and re-assembled it, it would have resumed
the reshape process (at least it did in my testing).

The following patch makes it retry a reshape straight away if it was
aborted due to a device failure (of course, if too many devices have
failed, the retry won't get anywhere, but you would expect that).

Thanks for the valuable feedback.

NeilBrown


Restart a (raid5) reshape that has been aborted due to a read/write error.

An error always aborts any resync/recovery/reshape on the understanding
that it will immediately be restarted if that still makes sense.
However a reshape currently doesn't get restarted.  With this patch
it does.
To avoid restarting when it is not possible to do work, we call
into the personality to check that a reshape is ok, and strengthen
raid5_check_reshape to fail if there are too many failed devices.

We also break some code out into a separate function, remove_and_add_spares,
as the indent level for that code was getting crazy.
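
The raid5.c side of the change is only two lines and does not survive in the
quoted diff below, so here is a rough sketch of what the strengthened
raid5_check_reshape check amounts to (illustrative only - the exact placement
and field names are assumptions, not the verbatim hunk):

	static int raid5_check_reshape(mddev_t *mddev)
	{
		raid5_conf_t *conf = mddev_to_conf(mddev);

		/* Sketch: if more devices have failed than the parity can
		 * cover, a reshape cannot make progress, so refuse to
		 * (re)start one rather than retrying forever. */
		if (mddev->degraded > conf->max_degraded)
			return -EINVAL;

		/* ... existing geometry and stripe-cache checks follow ... */
	}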


### Diffstat output
 ./drivers/md/md.c|   74 +++
 ./drivers/md/raid5.c |2 +
 2 files changed, 47 insertions(+), 29 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c   2007-02-19 11:44:51.0 +1100
+++ ./drivers/md/md.c   2007-02-19 11:44:54.0 +1100
@@ -5343,6 +5343,44 @@ void md_do_sync(mddev_t *mddev)
 EXPORT_SYMBOL_GPL(md_do_sync);
 
 
+static int remove_and_add_spares(mddev_t *mddev)
+{
+	mdk_rdev_t *rdev;
+	struct list_head *rtmp;
+	int spares = 0;
+
+	ITERATE_RDEV(mddev,rdev,rtmp)
+		if (rdev->raid_disk >= 0 &&
+		    (test_bit(Faulty, &rdev->flags) ||
+		     ! test_bit(In_sync, &rdev->flags)) &&
+		    atomic_read(&rdev->nr_pending)==0) {
+			if (mddev->pers->hot_remove_disk(
+				    mddev, rdev->raid_disk)==0) {
+				char nm[20];
+				sprintf(nm,"rd%d", rdev->raid_disk);
+				sysfs_remove_link(&mddev->kobj, nm);
+				rdev->raid_disk = -1;
+			}
+		}
+
+	if (mddev->degraded) {
+		ITERATE_RDEV(mddev,rdev,rtmp)
+			if (rdev->raid_disk < 0
+			    && !test_bit(Faulty, &rdev->flags)) {
+				rdev->recovery_offset = 0;
+				if (mddev->pers->hot_add_disk(mddev,rdev)) {
+					char nm[20];
+					sprintf(nm, "rd%d", rdev->raid_disk);
+					sysfs_create_link(&mddev->kobj,
+							  &rdev->kobj, nm);
+					spares++;
+					md_new_event(mddev);
+				} else
+					break;
+			}
+	}
+	return spares;
+}
 /*
  * This routine is regularly called by all per-raid-array threads to
  * deal with generic issues like resync and super-block update.
@@ -5397,7 +5435,7 @@ void md_check_recovery(mddev_t *mddev)
return;
 
if (mddev_trylock(mddev)) {
-   int spares =0;
+   int spares = 0;
 
 		spin_lock_irq(&mddev->write_lock);
 		if (mddev->safemode && !atomic_read(&mddev->writes_pending) &&
@@ -5460,35 +5498,13 @@ void md_check_recovery(mddev_t *mddev)
 * Spare are also removed and re-added, to allow
 * the personality to fail the re-add.
 */
-		ITERATE_RDEV(mddev,rdev,rtmp)
-			if (rdev->raid_disk >= 0 &&
-			    (test_bit(Faulty, &rdev->flags) || ! test_bit(In_sync, &rdev->flags)) &&
-			    atomic_read(&rdev->nr_pending)==0) {
-				if (mddev->pers->hot_remove_disk(mddev, rdev->raid_disk)==0) {
-					char nm[20];
-					sprintf(nm,"rd%d", rdev->raid_disk);
-					sysfs_remove_link(&mddev->kobj, nm);
-					rdev->raid_disk = -1;
-				}
-			}
-
-		if 

Re: mdadm --grow failed

2007-02-18 Thread Marc Marais
On Sun, 18 Feb 2007 07:13:28 -0500 (EST), Justin Piszcz wrote
 On Sun, 18 Feb 2007, Marc Marais wrote:
 
  On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
  On Sunday February 18, [EMAIL PROTECTED] wrote:
  Ok, I understand the risks which is why I did a full backup before doing
  this. I have subsequently recreated the array and restored my data from
  backup.
 
  Could you still please tell me exactly what kernel/mdadm version you
  were using?
 
  Thanks,
  NeilBrown
 
  2.6.20 with the patch you supplied in response to the md6_raid5 crash
  email I posted in linux-raid a few days ago. Just as background, I replaced
  the failing drive and at the same time bought an additional drive in order
  to increase the array size.
 
  mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
 
  Also, I've just noticed another drive failure on the new array, with an
  error similar to the one that occurred during the grow operation (although
  on a different drive). I wonder if I should post this to linux-ide?
 
  Feb 18 00:58:10 xerces kernel: ata4: command timeout
  Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
  Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
  Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
  Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code = 0x0802
  Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted Command
  Feb 18 00:58:10 xerces kernel: Additional sense: No additional sense information
  Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors (in hex):
  Feb 18 00:58:10 xerces kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
  Feb 18 00:58:10 xerces kernel: 00 00 00 00
  Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector 35666775
  Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 3 devices
 
  Regards,
  Marc
 
 
 
 Just out of curiosity:
 
 Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector 35666775
 
 Can you run:
 
 smartctl -d ata -t short /dev/sdd
 wait 5 min
 smartctl -d ata -t long /dev/sdd
 wait 2-3 hr
 smartctl -d ata -a /dev/sdd
 
 And then e-mail that output to the list?
 
 Justin.

Ok here we go:

/dev/sdd:

smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1600JB-00EVA0
Serial Number:    WD-WMAEK2751794
Firmware Version: 15.05R15
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Feb 19 14:38:16 2007 GMT-9
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting
                                        command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (5073) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  67) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART