Re: question : raid bio sector size

2006-03-29 Thread Raz Ben-Jehuda(caro)
I was referring to bios reaching make_request in raid5.c.
To be more precise, I am dd'ing:
  dd if=/dev/md1 of=/dev/zero bs=1M count=1 skip=10
and I have added the following printk to make_request:
  printk("%d:", bio->bi_size);
I am getting bio sizes of 512:512:512:512:512.
I suppose they get merged in the elevator,
but still, why so small?

thank you
raz.

On 3/27/06, Neil Brown [EMAIL PROTECTED] wrote:
 On Monday March 27, [EMAIL PROTECTED] wrote:
  I have been playing with raid5 and I noticed that the arriving bios' sizes
  are 1 sector.
  Why is that, and where is it set?

 bios arriving from where?

 bios from the filesystem to the raid5 device will be whatever size the
 fs wants to make them.

 bios from the raid5 device to the component devices will always be 1
 page (typically 8 sectors).  This is the size used by the stripe cache
 which is used to synchronise everything.

 NeilBrown
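
To illustrate the point about one-page bios, here is a rough stand-alone sketch -- NOT the actual md/raid5 code, and it ignores chunk size and parity rotation -- of how a request gets chopped into page-sized pieces, each of which becomes one bio to some component device:

/* Rough stand-alone sketch only -- NOT the actual md/raid5 code; it ignores
 * chunk size and parity rotation.  It just shows why the component devices
 * see one-page bios: the stripe cache holds one page per device per stripe,
 * so a request is first chopped into page-sized pieces. */
#include <stdio.h>

#define PAGE_SIZE_B   4096   /* one stripe-cache buffer                */
#define SECTOR_SIZE   512
#define DATA_DISKS    3      /* e.g. a 4-device raid5 has 3 data disks */

int main(void)
{
    unsigned long long start = 0;     /* starting sector of the request */
    unsigned long long len   = 64;    /* request length in sectors      */
    unsigned long long spp   = PAGE_SIZE_B / SECTOR_SIZE;  /* 8 sectors */

    for (unsigned long long s = start; s < start + len; s += spp) {
        unsigned long long page = s / spp;          /* page-sized piece  */
        int disk = (int)(page % DATA_DISKS);        /* simplified layout */
        printf("sectors %llu-%llu -> disk %d as one %d-byte bio\n",
               s, s + spp - 1, disk, PAGE_SIZE_B);
    }
    return 0;
}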



--
Raz


Re: question : raid bio sector size

2006-03-29 Thread Neil Brown
On Wednesday March 29, [EMAIL PROTECTED] wrote:
 I was referring to bios reaching make_request in raid5.c.
 To be more precise, I am dd'ing:
   dd if=/dev/md1 of=/dev/zero bs=1M count=1 skip=10
 and I have added the following printk to make_request:
   printk("%d:", bio->bi_size);
 I am getting bio sizes of 512:512:512:512:512.
 I suppose they get merged in the elevator,
 but still, why so small?

Odd.. When I try that I get 4096 repeatedly.
Which kernel are you using?
What does
   blockdev --getbsz /dev/md1
say?
Do you have a filesystem mounted on /dev/md1?  If so, what sort of
filesystem?

NeilBrown
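
For reference, the value being asked about can also be read programmatically. A small stand-alone check (illustrative only; /dev/md1 is just the device from this thread) of the soft block size the page cache uses for a block device, the same number "blockdev --getbsz" prints:

/* Stand-alone check of the soft block size the page cache uses for a block
 * device -- the same value "blockdev --getbsz" reports.  A 512-byte answer
 * here would explain 512-byte bios reaching make_request. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>        /* BLKBSZGET */

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/md1";
    int fd = open(dev, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    int bsz = 0;
    if (ioctl(fd, BLKBSZGET, &bsz) < 0) {   /* soft block size, in bytes */
        perror("ioctl(BLKBSZGET)");
        close(fd);
        return 1;
    }
    printf("%s: block size %d bytes\n", dev, bsz);
    close(fd);
    return 0;
}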


(X)FS corruption on 2 SATA disk RAID 1

2006-03-29 Thread JaniD++
Hello, list,

I think this is generally a hardware error, but it looks like a software
problem too.
At this point there is no dirty data in memory!

Cheers,
Janos

[EMAIL PROTECTED] /]# cmp -b /dev/sda1 /dev/sdb1
/dev/sda1 /dev/sdb1 differ: byte 68881481729, line 308395510 is 301 M-A  74 <

[EMAIL PROTECTED] /]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [faulty]
md10 : active raid1 sdb1[1] sda1[0]
  136729088 blocks [2/2] [UU]
  bitmap: 0/131 pages [0KB], 512KB chunk

unused devices: <none>

[EMAIL PROTECTED] /]# mount
192.168.0.1://NFS/ROOT-BASE/ on / type nfs (rw,hard,rsize=8192,wsize=8192,timeo=5,retrans=0,actimeo=1)
none on /proc type proc (rw,noexec,nosuid,nodev)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /dev/shm type tmpfs (rw)
none on /sys type sysfs (rw)
/dev/ram0 on /mnt/fast type ext2 (rw)
none on /dev/cpuset type cpuset (rw)
/dev/md10 on /mnt/1 type xfs (ro)
[EMAIL PROTECTED] /]#

cut from log:

Mar 29 08:14:45 dy-xeon-1 kernel: scsi1 : ata_piix
Mar 29 08:14:45 dy-xeon-1 kernel:   Vendor: ATA       Model: WDC WD2000JD-19H  Rev: 08.0
Mar 29 08:14:45 dy-xeon-1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Mar 29 08:14:45 dy-xeon-1 kernel:   Vendor: ATA       Model: WDC WD2000JD-19H  Rev: 08.0
Mar 29 08:14:45 dy-xeon-1 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel:  sda: sda1 sda2
Mar 29 08:14:45 dy-xeon-1 kernel: sd 0:0:0:0: Attached scsi disk sda
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel:  sdb: sdb1 sdb2
Mar 29 08:14:45 dy-xeon-1 kernel: sd 1:0:0:0: Attached scsi disk sdb
Mar 29 08:14:45 dy-xeon-1 kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Mar 29 08:14:45 dy-xeon-1 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0

Smart logs:
sda:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   130   124   021    Pre-fail  Always       -       6025
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       97
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8047
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       97
194 Temperature_Celsius     0x0022   120   111   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

sdb:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   127   120   021    Pre-fail  Always       -       6175
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       94
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8065
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       94
194 Temperature_Celsius     0x0022   117   109   000    Old_age   Always       -       33

Re: question : raid bio sector size

2006-03-29 Thread Raz Ben-Jehuda(caro)
Man, very, very good.
blockdev --getbsz says 512.


On 3/29/06, Neil Brown [EMAIL PROTECTED] wrote:
 On Wednesday March 29, [EMAIL PROTECTED] wrote:
  I was referring to bios reaching make_request in raid5.c.
  To be more precise, I am dd'ing:
    dd if=/dev/md1 of=/dev/zero bs=1M count=1 skip=10
  and I have added the following printk to make_request:
    printk("%d:", bio->bi_size);
  I am getting bio sizes of 512:512:512:512:512.
  I suppose they get merged in the elevator,
  but still, why so small?

 Odd.. When I try that I get 4096 repeatedly.
 Which kernel are you using?
 What does
blockdev --getbsz /dev/md1
 say?
 Do you have a filesystem mounted on /dev/md1?  If so, what sort of
 filesystem?

 NeilBrown



--
Raz


Re: making raid5 more robust after a crash?

2006-03-29 Thread Chris Allen
On Sat, Mar 18, 2006 at 08:13:48AM +1100, Neil Brown wrote:
 On Friday March 17, [EMAIL PROTECTED] wrote:
  Dear All,
  
  We have a number of machines running 4TB raid5 arrays.
  Occasionally one of these machines will lock up solid and
  will need power cycling. Often when this happens, the
  array will refuse to restart with 'cannot start dirty
  degraded array'. Usually  mdadm --assemble --force will
  get the thing going again - although it will then do
  a complete resync.
  
  
  My question is: Is there any way I can make the array
  more robust? I don't mind it losing a single drive and
  having to resync when we get a lockup - but having to
  do a forced assemble always makes me nervous, and means
  that this sort of crash has to be escalated to a senior
  engineer.
 
 Why is the array degraded?
 
 Having a crash while the array is degraded can cause undetectable data
 loss.  That is why md won't assemble the array itself: you need to
 know there could be a problem.
 
 But a crash with a degraded array should be fairly unusual.  If it is
 happening a lot, then there must be something wrong with your config:
 either you are running degraded a lot (which is not safe, don't do
 it), or md cannot find all the devices to assemble.


Thanks for your reply. As you guessed, this was a problem
with our hardware/config and nothing to do with the raid software.

After much investigation we found that we had two separate problems.
The first of these was a SATA driver problem. This would occasionally
return hard errors for a drive in the array, after which it would
get kicked. The second was XFS over NFS using up too much kernel
stack and hanging the machine. If both happened before we noticed
(say during the night), the result would be one drive dirty because
of the SATA driver and one dirty because of the lockup.

The real sting in the tail is that (for some reason) the drive lost through
the SATA problem would not be marked as dirty - so if the array was force
rebuilt it would be used in place of the more recent failure - causing
horrible synchronisation problems.

Can anybody point me to the syntax I could use for saying:

"force rebuild the array using drives ABCD but not E, even though
E looks fresh and D doesn't"?



  
  Typical syslog:
  
  
  Mar 17 10:45:24 snap27 kernel: md: Autodetecting RAID arrays.
  Mar 17 10:45:24 snap27 kernel: raid5: cannot start dirty degraded
  array for md0
 
 So where is 'disk 1'?  Presumably it should be 'sdb1'.  Does that
 drive exist?  Is it marked for auto-detect like the others?

Ok, this syslog was a complete red herring for the above problem - 
and you hit the nail right on the head - in this particular case I
had installed a new sdb1 and forgot to set the autodetect flag :-)


Chris.




Re: making raid5 more robust after a crash?

2006-03-29 Thread Neil Brown
On Wednesday March 29, [EMAIL PROTECTED] wrote:
 
 Thanks for your reply. As you guessed, this was a problem
 with our hardware/config and nothing to do with the raid software.

I'm glad you have found your problem!
 
 Can anybody point me to the syntax I could use for saying:
 
 "force rebuild the array using drives ABCD but not E, even though
 E looks fresh and D doesn't."

mdadm -Af /dev/mdX A B C D

i.e. don't even tell mdadm about E.

NeilBrown


addendum: was Re: recovering data on a failed raid-0 installation

2006-03-29 Thread Technomage
ok, guy and others.

this is a followup to the case I am currently trying (still) to solve.

synopsis:
the general consensus is that raid0 writes in a striping fashion.

However, the test case I have here doesn't appear to operate in the above 
described manner. What was observed was this: on /dev/md0 (while observing 
drive activity for both hda and hdb) hda was active until filled at which 
point data was spanned to hdb.  In other words, the data was written in a 
linear, not striped, manner. 

given this behavior (as observed), it stands to reason that the data on the 
first of the 2 members of this raid should be recoverable, if only we could 
trick the raid into allowing us to mount it without its second member. at 
this point, we are assuming that the data on drive 2 (hdb) is not 
recoverable. 

In a scientific fashion, assuming that the observed behavior is correct, how 
would one go about recovering data from the first member without the second 
being present? I assume that we are going to have to use mdadm in such a way 
as to trick it into thinking it is doing something that it is not. I invite 
anyone here to set up a similar testing environment to confirm these results.

drives: 2 identical IDE drives (same make/model)
suse 9.3 os.

p.s. I have heard all the naysayer commentary so please, keep it to USEFUL 
information only. thanks

On Tuesday 28 March 2006 22:26, you wrote:
 RAID0 uses all disks evenly (all 2 in your case).  I don't see how you can
 recover from a drive failure with a RAID0.  Never use RAID0 unless you are
 willing to lose all the data!

 Are you sure the second disk is dead?  Have you done a read test on the
 disk?  dd works well for read testing.  Try this:
 dd if=/dev/hdb2 of=/dev/null bs=64k
 or
 dd if=/dev/hdb of=/dev/null bs=64k

 Guy

 } -Original Message-
 } From: [EMAIL PROTECTED] [mailto:linux-raid-
 } [EMAIL PROTECTED] On Behalf Of Technomage
 } Sent: Wednesday, March 29, 2006 12:09 AM
 } To: linux-raid@vger.kernel.org
 } Subject: recovering data on a failed raid-0 installation
 }
 } ok,
 } here's the situation in a nutshell.
 }
 } one of the 2 HD's in a linux raid-0 installation has failed.
 }
 } Fortunately, or otherwise, it was NOT the primary HD.
 }
 } problem is, I need to recover data from the first drive but appear to be
 } unable to do so because the raid is not complete. the second drive only had
 } 193 MB written to it and I am fairly certain that the data I would like to
 } recover is NOT on that drive.
 }
 } can anyone offer any solutions to this?
 }
 } the second HD is not usable (heat related failure issues).
 }
 } The filesystem used on the md0 partition (under mdadm) was xfs. now I have
 } tried the xfs_check and xfs_repair tools and they are not helpful at this
 } point.
 }
 } The developer (of mdadm) suggested I use the following commands in an
 } attempt
 } to recover:
 }
 }   mdadm -C /dev/md0 -l0 -n2 /dev/..
 }   fsck -n /dev/md0
 }
 } However, the second one was a no go.
 }
 } I am stumped as to how to proceed here. I need the data off the first
 } drive,
 } but do not appear to have any way (other than using cat to see it) to get
 } at
 } it.
 }
 } some help would be greatly appreciated.
 }
 } technomage
 }
 } p. here is the original response sent back to me from the developer of
 } mdadm:
 } ***
 } Re: should have been more explicit here - Re: need some help URGENT!
 } From: Neil Brown [EMAIL PROTECTED]
 } To: Technomage [EMAIL PROTECTED]
 } Date: Sunday 22:01:45
 } On Sunday March 26, [EMAIL PROTECTED] wrote:
 }  ok,
 } 
 }  you gave me more info than some local to that mentioned e-mail list.
 } 
 }  ok, the vast majority of the data I need to recover is on /dev/hda
 }  and /dev/hdb only has 193 MB and is probably irrelevant.
 } 
 }  can you help me with this?
 }  can you baby me through this. I really need to recover this data (if at all
 }  possible).
 }
 } Not really, and certainly not now (I have to go out).
 } I have already made 2 suggestions
 }   mail linux-raid@vger.kernel.org
 } and
 }   mdadm -C /dev/md0 -l0 -n2 /dev/..
 }   fsck -n /dev/md0
 }
 } try one of those.
 }
 } NeilBrown
 }
 } 
 }  the friend of mine that this actually happened to is on the phone, begging
 }  me and grovelling before the gods of linux in order to have this fixed. I
 }  have set up an identical test situation here.
 } 
 }  the important data is on drive 1 and drive 2 is mostly irrelevant.
 }  is there any way to convince raid-0 to truncate to the end of drive 1 and
 }  allow me to get whatever data I can off. btw, the filesystem that was
 }  formatted was xfs (for linux) on md0.
 } 
 }  if you have questions, please do not hesitate to ask.
 } 
 }  thank you.
 } 
 }  p. real name here is Eric.
 } 
 } 
 }  On Sunday 26 March 2006 21:33, you wrote:
 }   On Sunday March 26, [EMAIL PROTECTED] wrote:
 }  
 }   With a name like Technomage and a vague subject "need some help
 }   URGENT", I very really 

ANNOUNCE: mdadm 2.4 - A tool for managing Soft RAID under Linux

2006-03-29 Thread Neil Brown

I am pleased to announce the availability of
   mdadm version 2.4

It is available at the usual places:
   http://www.cse.unsw.edu.au/~neilb/source/mdadm/
and
   http://www.{countrycode}.kernel.org/pub/linux/utils/raid/mdadm/

mdadm is a tool for creating, managing and monitoring
device arrays using the md driver in Linux, also
known as Software RAID arrays.

Release 2.4 primarily adds support for increasing the number of
devices in a RAID5 array, which requires 2.6.17 (or some -rc or -mm
prerelease).
It also includes a number of minor functionality enhancements and
documentation updates.

Changelog Entries:
-   Rewrite 'reshape' support including performing a backup
of the critical region for a raid5 growth, and restoring that
backup after a crash.
-   Put a 'canary' at each end of the backup so a corruption
can be more easily detected.
-   Remove useless 'ident' argument from ->getinfo_super method.
-   Support --backup-file for backing-up critical section during
growth.
-   Erase old superblocks (of different versions) when creating new
array.
-   Allow --monitor to work with arrays with >28 devices
-   Report reshape information in --detail
-   Handle symlinks in /dev better
-   Fix mess in --detail output when a device is missing.
-   Manpage tidyup
-   Support 'bitmap=' in mdadm.conf for auto-assembling arrays with
write-intent bitmaps in separate files.
-   Updates to md.4 man page including section on RESTRIPING and SYSFS

Development of mdadm is sponsored by
 SUSE Labs, Novell Inc.

NeilBrown  30th March 2006



RE: addendum: was Re: recovering data on a failed raid-0 installation

2006-03-29 Thread Guy
If what you say is true, then it was not a RAID0.  It sounds like LINEAR.
Do you have the original command used to create the array?
Or the output from mdadm before you tried any recovery methods.
The output must be from before you re-created the array.
Output from commands like mdadm -D /dev/md0 or mdadm -E /dev/hda2.
Or the output from cat /proc/mdstat, from before you re-created the array.

Guy


} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid-
} [EMAIL PROTECTED] On Behalf Of Technomage
} Sent: Wednesday, March 29, 2006 11:15 PM
} To: Guy
} Cc: linux-raid@vger.kernel.org
} Subject: addendum: was Re: recovering data on a failed raid-0 installation
} 
} ok, guy and others.
} 
} this is a followup to the case I am currently trying (still) to solve.
} 
} synopsis:
} the general consensus is that raid0 writes in a striping fashion.
} 
} However, the test case I have here doesn't appear to operate in the above
} described manner. What was observed was this: on /dev/md0 (while observing
} drive activity for both hda and hdb) hda was active until filled at which
} point data was spanned to hdb.  In other words, the data was written in a
} linear, not striped, manner.
} 
} given this behavior (as observed), it stands to reason that the data on
} the
} first of the 2 members of this raid should be recoverable, if only we
} could
} trick the raid into allowing us to mount it without its second member.
} at
} this point, we are assuming that the data on drive 2 (hdb) is not
} recoverable.
} 
} In a scientific fashion, assuming that the observed behavior is correct,
} how
} would one go about recovering data from the first member without the
} second
} being present? I assume that we are going to have to use mdadm in such a
} way
} as to trick it into thinking it is doing something that it is not. I
} invite
} anyone here to set up a similar testing environment to confirm these
} results.
} 
} drives: 2 identical IDE drives (same make/model)
} suse 9.3 os.
} 
} p.s. I have heard all the naysayer commentary so please, keep it to
} USEFUL
} information only. thanks
} 
} On Tuesday 28 March 2006 22:26, you wrote:
}  RAID0 uses all disks evenly (all 2 in your case).  I don't see how you
} can
}  recover from a drive failure with a RAID0.  Never use RAID0 unless you
} are
}  willing to lose all the data!
} 
}  Are you sure the second disk is dead?  Have you done a read test on the
}  disk?  dd works well for read testing.  Try this:
}  dd if=/dev/hdb2 of=/dev/null bs=64k
}  or
}  dd if=/dev/hdb of=/dev/null bs=64k
} 
}  Guy
} 
}  } -Original Message-
}  } From: [EMAIL PROTECTED] [mailto:linux-raid-
}  } [EMAIL PROTECTED] On Behalf Of Technomage
}  } Sent: Wednesday, March 29, 2006 12:09 AM
}  } To: linux-raid@vger.kernel.org
}  } Subject: recovering data on a failed raid-0 installation
}  }
}  } ok,
}  } here's the situation in a nutshell.
}  }
}  } one of the 2 HD's in a linux raid-0 installation has failed.
}  }
}  } Fortunately, or otherwise, it was NOT the primary HD.
}  }
}  } problem is, I need to recover data from the first drive but appear to be
}  } unable to do so because the raid is not complete. the second drive only
}  } had 193 MB written to it and I am fairly certain that the data I would
}  } like to recover is NOT on that drive.
}  }
}  } can anyone offer any solutions to this?
}  }
}  } the second HD is not usable (heat related failure issues).
}  }
}  } The filesystem used on the md0 partition (under mdadm) was xfs. now I have
}  } tried the xfs_check and xfs_repair tools and they are not helpful at this
}  } point.
}  }
}  } The developer (of mdadm) suggested I use the following commands in an
}  } attempt
}  } to recover:
}  }
}  }   mdadm -C /dev/md0 -l0 -n2 /dev/..
}  }   fsck -n /dev/md0
}  }
}  } However, the second one was a no go.
}  }
}  } I am stumped as to how to proceed here. I need the data off the first
}  } drive,
}  } but do not appear to have any way (other than using cat to see it) to
} get
}  } at
}  } it.
}  }
}  } some help would be greatly appreciated.
}  }
}  } technomage
}  }
}  } p. here is the original response sent back to me from the developer of
}  } mdadm:
}  } ***
}  } Re: should have been more explicit here - Re: need some help
} URGENT!
}  } From: Neil Brown [EMAIL PROTECTED]
}  } To: Technomage [EMAIL PROTECTED]
}  } Date: Sunday 22:01:45
}  } On Sunday March 26, [EMAIL PROTECTED] wrote:
}  }  ok,
}  } 
}  }  you gave me more info than some local to that mentioned e-mail list.
}  } 
}  }  ok, the vast majority of the data I need to recover is on /dev/hda
}  }  and /dev/hdb only has 193 MB and is probably irrelevant.
}  } 
}  }  can you help me with this?
}  }  can you baby me through this. I really need to recover this data (if at
}  }  all possible).
}  }
}  } Not really, and certainly not now (I have to go out).
}  } I have already made 2 suggestions
}  }   mail 

Re: [PATCH] Add stripe cache entries to raid6 sysfs

2006-03-29 Thread Neil Brown
On Saturday March 25, [EMAIL PROTECTED] wrote:
 Raid-6 did not create sysfs entries for stripe cache
 
 Signed-off-by: Brad Campbell [EMAIL PROTECTED]
 
 ---
 diff -u vanilla/linux-2.6.16/drivers/md/raid6main.c linux-2.6.16/drivers/md/raid6main.c
 --- vanilla/linux-2.6.16/drivers/md/raid6main.c  2006-03-20 09:53:29.0 +0400
 +++ linux-2.6.16/drivers/md/raid6main.c  2006-03-25 16:35:05.0 +0400
 @@ -2148,6 +2148,7 @@
  }
 
  /* Ok, everything is just fine now */
 +sysfs_create_group(&mddev->kobj, &raid6_attrs_group);
  mddev->array_size =  mddev->size * (mddev->raid_disks - 2);
 
  mddev->queue->unplug_fn = raid6_unplug_device;

Gee, I wonder how I missed that...
Thanks!

NeilBrown


[PATCH 001 of 3] md: Don't clear bits in bitmap when writing to one device fails during recovery.

2006-03-29 Thread NeilBrown

Currently a device failure during recovery leaves bits set in the
bitmap.  This normally isn't a problem as the offending device will be
rejected because of errors.  However if device re-adding is being used
with non-persistent bitmaps, this can be a problem.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid1.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~   2006-03-30 16:48:29.0 +1100
+++ ./drivers/md/raid1.c            2006-03-30 16:48:40.0 +1100
@@ -1135,8 +1135,19 @@ static int end_sync_write(struct bio *bi
             mirror = i;
             break;
         }
-    if (!uptodate)
+    if (!uptodate) {
+        int sync_blocks = 0;
+        sector_t s = r1_bio->sector;
+        long sectors_to_go = r1_bio->sectors;
+        /* make sure these bits doesn't get cleared. */
+        do {
+            bitmap_end_sync(mddev->bitmap, s,
+                            &sync_blocks, 1);
+            s += sync_blocks;
+            sectors_to_go -= sync_blocks;
+        } while (sectors_to_go > 0);
         md_error(mddev, conf->mirrors[mirror].rdev);
+    }
 
     update_head_pos(mirror, r1_bio);
 


[PATCH 002 of 3] md: Remove some code that can sleep from under a spinlock.

2006-03-29 Thread NeilBrown

And remove the comments that were put in in place of a fix, too.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2006-03-30 16:48:30.0 +1100
+++ ./drivers/md/md.c           2006-03-30 16:48:47.0 +1100
@@ -214,13 +214,11 @@ static void mddev_put(mddev_t *mddev)
         return;
     if (!mddev->raid_disks && list_empty(&mddev->disks)) {
         list_del(&mddev->all_mddevs);
-        /* that blocks */
+        spin_unlock(&all_mddevs_lock);
         blk_cleanup_queue(mddev->queue);
-        /* that also blocks */
         kobject_unregister(&mddev->kobj);
-        /* result blows... */
-    }
-    spin_unlock(&all_mddevs_lock);
+    } else
+        spin_unlock(&all_mddevs_lock);
 }
 
 static mddev_t * mddev_find(dev_t unit)


[PATCH 003 of 3] md: Raid-6 did not create sysfs entries for stripe cache

2006-03-29 Thread NeilBrown

Signed-off-by: Brad Campbell [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid6main.c |2 ++
 1 file changed, 2 insertions(+)

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~   2006-03-30 16:48:30.0 +1100
+++ ./drivers/md/raid6main.c            2006-03-30 16:48:52.0 +1100
@@ -2151,6 +2151,8 @@ static int run(mddev_t *mddev)
     }
 
     /* Ok, everything is just fine now */
+    sysfs_create_group(&mddev->kobj, &raid6_attrs_group);
+
     mddev->array_size =  mddev->size * (mddev->raid_disks - 2);
 
     mddev->queue->unplug_fn = raid6_unplug_device;


[PATCH 000 of 3] md: Introduction - assorted fixed for 2.6.16

2006-03-29 Thread NeilBrown
Following are three patches for md.  The first fixes a problem that
can cause corruption in fairly unusual circumstances (re-adding a
device to a raid1 and suffering write errors that are subsequently
fixed, after which the device is re-added again).

The other two fix minor problems.

They are suitable to go straight into 2.6.17-rc.

NeilBrown

 [PATCH 001 of 3] md: Don't clear bits in bitmap when writing to one device 
fails during recovery.
 [PATCH 002 of 3] md: Remove some code that can sleep from under a spinlock.
 [PATCH 003 of 3] md: Raid-6 did not create sysfs entries for stripe cache


Re: [PATCH 001 of 3] md: Don't clear bits in bitmap when writing to one device fails during recovery.

2006-03-29 Thread Andrew Morton
NeilBrown [EMAIL PROTECTED] wrote:

 +    if (!uptodate) {
 +        int sync_blocks = 0;
 +        sector_t s = r1_bio->sector;
 +        long sectors_to_go = r1_bio->sectors;
 +        /* make sure these bits doesn't get cleared. */
 +        do {
 +            bitmap_end_sync(mddev->bitmap, s,
 +                            &sync_blocks, 1);
 +            s += sync_blocks;
 +            sectors_to_go -= sync_blocks;
 +        } while (sectors_to_go > 0);
          md_error(mddev, conf->mirrors[mirror].rdev);
 +    }

Can mddev->bitmap be NULL?

If so, will the above loop do the right thing when this:

void bitmap_end_sync(struct bitmap *bitmap, sector_t offset, int *blocks, int aborted)
{
    bitmap_counter_t *bmc;
    unsigned long flags;
/*
    if (offset == 0) printk("bitmap_end_sync 0 (%d)\n", aborted);
*/  if (bitmap == NULL) {
        *blocks = 1024;
        return;
    }

triggers?


Re: [PATCH 001 of 3] md: Don't clear bits in bitmap when writing to one device fails during recovery.

2006-03-29 Thread Neil Brown
On Wednesday March 29, [EMAIL PROTECTED] wrote:
 NeilBrown [EMAIL PROTECTED] wrote:
 
  +    if (!uptodate) {
  +        int sync_blocks = 0;
  +        sector_t s = r1_bio->sector;
  +        long sectors_to_go = r1_bio->sectors;
  +        /* make sure these bits doesn't get cleared. */
  +        do {
  +            bitmap_end_sync(mddev->bitmap, s,
  +                            &sync_blocks, 1);
  +            s += sync_blocks;
  +            sectors_to_go -= sync_blocks;
  +        } while (sectors_to_go > 0);
           md_error(mddev, conf->mirrors[mirror].rdev);
  +    }
 
 Can mddev->bitmap be NULL?

Yes, normally it is.

 
 If so, will the above loop do the right thing when this:
 
 void bitmap_end_sync(struct bitmap *bitmap, sector_t offset, int *blocks, int aborted)
 {
     bitmap_counter_t *bmc;
     unsigned long flags;
 /*
     if (offset == 0) printk("bitmap_end_sync 0 (%d)\n", aborted);
 */  if (bitmap == NULL) {
         *blocks = 1024;
         return;
     }
 
 triggers?

Yes.  sync_blocks will be 1024 (a nice big number) and the loop will
exit quite quickly, having done nothing (which is what it needs to do
in that case).
Of course, if someone submits a bio for multiple thousands of sectors
it will loop needlessly a few times, but do we ever generate bios that
are even close to a megabyte?
If so, that 1024 can be safely increased to 1<<20, and possibly higher,
but I would need to check.

Thanks for asking
NeilBrown
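
A trivial stand-alone mock-up of the termination argument (not kernel code) shows the behaviour: with a NULL bitmap the step is always 1024 sectors, so e.g. a 2048-sector bio makes the do/while run exactly twice and stop.

/* Stand-alone mock-up of the termination argument above -- not kernel code.
 * With bitmap == NULL the real bitmap_end_sync() just reports 1024 blocks,
 * so the do/while in end_sync_write() advances in 1024-sector steps and
 * exits after a few harmless iterations. */
#include <stdio.h>

static void mock_bitmap_end_sync(void *bitmap, unsigned long long offset,
                                 int *blocks, int aborted)
{
    (void)offset;
    (void)aborted;
    if (bitmap == NULL) {   /* the case being asked about */
        *blocks = 1024;
        return;
    }
    /* the real code would look the region up in the bitmap here */
    *blocks = 1024;
}

int main(void)
{
    unsigned long long s = 0;     /* plays the role of r1_bio->sector  */
    long sectors_to_go = 2048;    /* plays the role of r1_bio->sectors */
    int iterations = 0;

    do {
        int sync_blocks = 0;
        mock_bitmap_end_sync(NULL, s, &sync_blocks, 1);
        s += sync_blocks;
        sectors_to_go -= sync_blocks;
        iterations++;
    } while (sectors_to_go > 0);

    printf("loop ran %d times for a 2048-sector bio\n", iterations); /* 2 */
    return 0;
}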


making raid5 more robust against block errors

2006-03-29 Thread Mikael Abrahamsson


Is there any work going on to handle read errors on a raid5 disk by
recreating the faulty block from the other disks and just rewriting the
block, instead of kicking the disk out?
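
What is being asked for is, in miniature, the raid5 reconstruction step applied to a single unreadable block: XOR the corresponding blocks of the surviving members to regenerate the data, then rewrite it in place. A toy stand-alone illustration of just that arithmetic (not md code):

/* Toy stand-alone illustration of the arithmetic being requested: a raid5
 * block that fails to read can be regenerated as the XOR of the matching
 * blocks on the surviving members, then simply rewritten.  Not md code. */
#include <stdio.h>
#include <string.h>

#define BLOCK 16   /* tiny block size, just for the demo */

int main(void)
{
    unsigned char d0[BLOCK] = "block on disk 0";   /* 15 chars + NUL */
    unsigned char d1[BLOCK] = "block on disk 1";
    unsigned char parity[BLOCK];
    unsigned char rebuilt[BLOCK];

    /* parity block as raid5 would have written it: XOR of the data blocks */
    for (int i = 0; i < BLOCK; i++)
        parity[i] = d0[i] ^ d1[i];

    /* pretend d0 returned a read error: rebuild it from d1 and parity,
     * after which it could be rewritten instead of kicking the disk out */
    for (int i = 0; i < BLOCK; i++)
        rebuilt[i] = d1[i] ^ parity[i];

    printf("rebuilt block matches original: %s\n",
           memcmp(rebuilt, d0, BLOCK) == 0 ? "yes" : "no");
    return 0;
}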


I've had problems on several occasions where two disks in a raid5 will have
single-sector errors, and thus it's impossible (afaik) to get the array up
and running without a lot of manual intervention and likely data loss,
even though the information needed to get the array up and running without
data loss is actually there.


I know this has been discussed before (I've been in these discussions
myself); I just wanted to know whether it resulted in any improvement.


Right now I am more prone to using 3ware hw-raid rather than sw-raid due
to this, as it will do the above and handle the read error gracefully.
Data integrity is more important to me than write speed (where sw-raid
excels).


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]