Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

2006-08-27 Thread Neil Brown
On Thursday August 17, [EMAIL PROTECTED] wrote:
 
 I just tried the patch and now it seems to be syncing the drives instead
 of only checking them?  (At the very least the message is misleading.)
 

Yes, the message is misleading.  I should fix that.

NeilBrown


  # echo check /sys/block/md0/md/sync_action
  # dmesg | tail -9
  md: syncing RAID array md0
  md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
  md: using maximum available idle IO bandwidth (but not more than 20 
 KB/sec) for reconstruction.
  md: using 128k window, over a total of 104256 blocks.
  md: md0: sync done.
  RAID1 conf printout:
   --- wd:2 rd:2
   disk 0, wo:0, o:1, dev:hda9
   disk 1, wo:0, o:1, dev:sda5
 
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

2006-08-18 Thread raid
Neil introduced read-checking into 2.6.16. In versions prior, mirror copies 
were overwritten instead of checked.

I'm running 2.6.17rc4:
# echo check  /sys/block/md0/md/sync_action
# dmesg
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) 
for reconstruction.
md: using 128k window, over a total of 15542784 blocks.

# dstat -d -D sda,sdg
--disk/sdadisk/sdg-
_read write:_read write
  81k   30k:  81k   30k
  58M0 :  58M0
  58M0 :  57M0
  57M0 :  58M0
  58M0 :  57M0

Although the message uses the word reconscruction, the drives are being 
checked for consistancy.



-
I just tried the patch and now it seems to be syncing the drives instead
of only checking them?  (At the very least the message is misleading.)

 # echo check /sys/block/md0/md/sync_action
 # dmesg | tail -9
 md: syncing RAID array md0
 md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
 md: using maximum available idle IO bandwidth (but not more than 20 
KB/sec) for reconstruction.
 md: using 128k window, over a total of 104256 blocks.
 md: md0: sync done.
 RAID1 conf printout:
  --- wd:2 rd:2
  disk 0, wo:0, o:1, dev:hda9
  disk 1, wo:0, o:1, dev:sda5
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

2006-08-17 Thread Chuck Ebbert
In-Reply-To: [EMAIL PROTECTED]

On Mon, 14 Aug 2006 16:14:06 +1000, Neil Brown wrote:

  On 2.6.18-rc4 + the below patch:
  Drive activity light blinks once on one drive, then the
  array status prints (obviously no checking takes place.)
  
 
 Thanks for the report.
 Easily duplicated, easily fixed.
 I'll make sure this patch gets into 2.6.18.
 
 Thanks again,
 NeilBrown
 

I just tried the patch and now it seems to be syncing the drives instead
of only checking them?  (At the very least the message is misleading.)

 # echo check /sys/block/md0/md/sync_action
 # dmesg | tail -9
 md: syncing RAID array md0
 md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
 md: using maximum available idle IO bandwidth (but not more than 20 
KB/sec) for reconstruction.
 md: using 128k window, over a total of 104256 blocks.
 md: md0: sync done.
 RAID1 conf printout:
  --- wd:2 rd:2
  disk 0, wo:0, o:1, dev:hda9
  disk 1, wo:0, o:1, dev:sda5


 Signed-off-by: Neil Brown [EMAIL PROTECTED]
 
 diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
 --- .prev/drivers/md/raid1.c  2006-07-31 17:24:36.0 +1000
 +++ ./drivers/md/raid1.c  2006-08-14 15:52:48.0 +1000
 @@ -1644,15 +1644,16 @@ static sector_t sync_request(mddev_t *md
   return 0;
   }
  
 - /* before building a request, check if we can skip these blocks..
 -  * This call the bitmap_start_sync doesn't actually record anything
 -  */
   if (mddev-bitmap == NULL 
   mddev-recovery_cp == MaxSector 
 + !test_bit(MD_RECOVERY_REQUESTED, mddev-recovery) 
   conf-fullsync == 0) {
   *skipped = 1;
   return max_sector - sector_nr;
   }
 + /* before building a request, check if we can skip these blocks..
 +  * This call the bitmap_start_sync doesn't actually record anything
 +  */
   if (!bitmap_start_sync(mddev-bitmap, sector_nr, sync_blocks, 1) 
   !conf-fullsync  !test_bit(MD_RECOVERY_REQUESTED, 
 mddev-recovery)) {
   /* We can skip this block, and probably several more */

-- 
Chuck
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

2006-08-12 Thread Justin Piszcz



On Sat, 12 Aug 2006, Chuck Ebbert wrote:


Doing this on a raid1 array:
   echo check /sys/block/md0/md/sync_action

On 2.6.16.27:
   Activity lights on both mirrors show activity for a while,
   then the array status prints on the console.

On 2.6.18-rc4 + the below patch:
   Drive activity light blinks once on one drive, then the
   array status prints (obviously no checking takes place.)


Applied hotfix on 2.6.18-rc4:

--- .prev/drivers/md/md.c   2006-08-08 09:00:44.0 +1000
+++ ./drivers/md/md.c   2006-08-08 09:04:04.0 +1000
@@ -1597,6 +1597,19 @@ void md_update_sb(mddev_t * mddev)

repeat:
   spin_lock_irq(mddev-write_lock);
+
+   if (mddev-degraded  mddev-sb_dirty == 3)
+   /* If the array is degraded, then skipping spares is both
+* dangerous and fairly pointless.
+* Dangerous because a device that was removed from the array
+* might have a event_count that still looks up-to-date,
+* so it can be re-added without a resync.
+* Pointless because if there are any spares to skip,
+* then a recovery will happen and soon that array won't
+* be degraded any more and the spare can go back to sleep then.
+*/
+   mddev-sb_dirty = 1;
+
   sync_req = mddev-in_sync;
   mddev-utime = get_seconds();
   if (mddev-sb_dirty == 3)
--
Chuck

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Is there a doc for all of the options you can echo into the sync_action? 
I'm assuming mdadm does these as well and echo is just another way to run 
work with the array?

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

2006-08-12 Thread Michael Tokarev
Justin Piszcz wrote:
 Is there a doc for all of the options you can echo into the sync_action?
 I'm assuming mdadm does these as well and echo is just another way to
 run work with the array?

How about the obvious, Documentation/md.txt ?

And no, mdadm does not perform or trigger data integrity checking.

/mjt
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[bug?] raid1 integrity checking is broken on 2.6.18-rc4

2006-08-12 Thread Chuck Ebbert
Doing this on a raid1 array:
echo check /sys/block/md0/md/sync_action

On 2.6.16.27:
Activity lights on both mirrors show activity for a while,
then the array status prints on the console.

On 2.6.18-rc4 + the below patch:
Drive activity light blinks once on one drive, then the
array status prints (obviously no checking takes place.)


Applied hotfix on 2.6.18-rc4:

--- .prev/drivers/md/md.c   2006-08-08 09:00:44.0 +1000
+++ ./drivers/md/md.c   2006-08-08 09:04:04.0 +1000
@@ -1597,6 +1597,19 @@ void md_update_sb(mddev_t * mddev)
 
 repeat:
spin_lock_irq(mddev-write_lock);
+
+   if (mddev-degraded  mddev-sb_dirty == 3)
+   /* If the array is degraded, then skipping spares is both
+* dangerous and fairly pointless.
+* Dangerous because a device that was removed from the array
+* might have a event_count that still looks up-to-date,
+* so it can be re-added without a resync.
+* Pointless because if there are any spares to skip,
+* then a recovery will happen and soon that array won't
+* be degraded any more and the spare can go back to sleep then.
+*/
+   mddev-sb_dirty = 1;
+
sync_req = mddev-in_sync;
mddev-utime = get_seconds();
if (mddev-sb_dirty == 3)
-- 
Chuck

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html