Re: Please help me save my data

2006-09-16 Thread martin . kihlgren
 Patrick Hoover wrote:
 Is anyone else having issues with USB interfaced disks to implement
 RAID? Any thoughts on Pros / Cons for doing this?

 Sounds like a very good stress test for MD.

 I often find servers completely hung when a disk fails, this usually
 happens in the IDE layer.
 If using USB disks circumvents the IDE layer enough, using USB disks
 might get rid of these hangs.  Would be nice at least.  Maybe I'm just
 dreaming.

 For end users, USB might remove the need to take special care of
 cooling in your cabinet.
 OTOH, most USB disk enclosures have horrible thermal properties.

 USB would make it a lot easier to add new disks (beyond your cabinet's
 capacity) and to remove old disks when/if they're no longer needed.
 Users might run into a bandwidth issue at some point..

After I got rid of a crappy USB hub the catastrophic resets stopped. And
after I bought a separate PCI USB card the non-catastrophic resets have
almost stopped as well.

So now the system works as well as I hoped when I planned it!

And no, nothing hangs except the disk access to the device in question
when a disk fails.

My Seagate disks DO generate too much heat if I stack them on top of each
other, which their form factor suggests they would accept. If I space them
out a bit more, though, it works perfectly. And there ARE enclosures with
separate fans.

I have 10 external USB disks now - I got rid of my internal ones which
were too old and failing, and I plan on continuing to add on to my
external array. My RAID5 + LVM + dm_crypt + XFS setup allows for a very
extendable system.

And as long as I treat the entire disk set as one device, the bandwidth
will not be an issue since I will never demand more bandwidth from the
entire array than from a single USB drive anyway.

//Martin
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Please help me save my data

2006-09-11 Thread martin . kihlgren
 On 9/8/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 So, what I want to do is:

  * Mark the synced spare drive as working and in position 1
  * Assemble the array without the unsynced spare and check if this
    provides consistent data
  * If it didn't, I want to mark the synced spare as working and in
    position 3, and try the same thing again
  * When I have it working, I just want to add the unsynced spare and
    let it sync normally
  * Then I will create a write-intent bitmap to avoid the dangerously
    long sync times, and also buy a new USB controller hoping that it
    will solve my problems

 You can recreate the raid array with 1 missing disk, like this:

 mdadm -C /dev/md1 /dev/sdn1 /dev/sdX1 /dev/sdn1 /dev/sdn1 missing

 The ordering is relevant, raid-disks 0,1,2,3,4 or so. beware, you have
 to have block size and symmetry correct, so better backup mdadm
 --examine and --detail output beforehand.

 This create op causes no sync (no danger data overwrites), as there is
 still the one drive missing, but raid-superblocks are rewritten.

 (On a sidenote, i'm uncertain if a bitmap helps in the case of
 single-device remove-add cycle? I thought it was only for crashes, at
 least for now..)


Thanks for your help! Your advice is good, and I will use it next time.

This time I found an old USB memory stick to experiment with, and managed
to do pretty much the same thing with:

mdadm -C -l 5 -n 5 -f -e 1.2 --assume-clean /dev/md1 /dev
mdadm -f /dev/md1 /dev/borken_device

And yes, the ordering was very relevant. An xfs_check showed me which
ordering was correct however. But I still have a problem with not easily
knowing what physical drive is what raid device, since USB devices get
ordered in some random way.
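
The trial-and-error over orderings described above can be sketched as a dry
run. This is only an illustration under stated assumptions: the device names
(/dev/sdA1 and so on) are hypothetical placeholders, the slot layout (known
members in 0, 2 and 4, the synced spare in 1 or 3) is taken from the original
post, and the script prints the candidate commands instead of running them.
Chunk size and layout are not filled in here; they would have to match the
values saved from mdadm --examine before anything is actually created.

```shell
#!/bin/sh
# Dry run: print (never execute) the two candidate re-create commands.
# All device names below are hypothetical placeholders.
MD=/dev/md1
SLOT0=/dev/sdA1    # known working member, slot 0
SLOT2=/dev/sdB1    # known working member, slot 2
SLOT4=/dev/sdC1    # known working member, slot 4
SYNCED=/dev/sdD1   # marked as spare, but believed to be in sync

candidates() {
    for pos in 1 3; do
        if [ "$pos" -eq 1 ]; then
            s1=$SYNCED; s3=missing
        else
            s1=missing; s3=$SYNCED
        fi
        # "missing" keeps the array degraded, so no resync is started.
        echo "mdadm --create $MD --level=5 --raid-devices=5" \
             "--metadata=1.2 --assume-clean" \
             "$SLOT0 $s1 $SLOT2 $s3 $SLOT4"
    done
}

candidates
```

Each printed candidate would then be checked read-only (the thread used
xfs_check) before the unsynced disk is added back.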

And no, the bitmap didn't help in this case (it has happened again but with
only one disk)... I wish my USB worked better, but I guess it's a question
of time and kernel development.

Thanks anyhow!
regards,
//Martin



Please help me save my data

2006-09-08 Thread martin . kihlgren

Hello list.

I have a spot of trouble with a RAID5 array of mine, and I thought maybe you 
could help me.

This is the story so far:

 * I bought 10 external USB drives. This seemed like a good idea, they are
   cheap, they are hot-pluggable and they are fast enough.
 * I set them up in two RAID5 arrays, which I set up as LVM pv's. Then I
   created an LVM vg out of these and an LVM lv out of the vg.
 * I encrypted this lv and formatted it with an xfs fs.

This all worked perfectly fine, until I realized how badly these drives and
this USB controller work with ehci_hcd.

In short, the devices get reset all the time. And each time they get reset, 
everything stops for a while. Nothing strange and no showstopper 
here.

But the really bad thing is when they reset in some extra-bad way, and get
dropped completely. What happens then is that the RAID5 system drops them,
and they get reincarnated with a new device name. /dev/sdi suddenly becomes
/dev/sdn or something similarly horrible.
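
One way to follow a disk across such renames, assuming the system runs udev
and that it populates /dev/disk/by-id (an assumption, not something stated in
this thread), is to list the persistent names next to the kernel name each one
currently points at. The function name below is hypothetical, and the
directory argument exists only so the sketch can be tried on a scratch
directory.

```shell
#!/bin/sh
# Print "persistent-name -> current kernel device" for every symlink in
# the given directory (default: the udev-managed /dev/disk/by-id).
list_persistent_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

list_persistent_names "$@"
```

The persistent name stays the same when /dev/sdi comes back as /dev/sdn, so
it can be matched against a label on the physical enclosure.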

And since I (up until yesterday) didn't know about write-intent bitmaps,
each resync took around 10 hours. Plenty of time for ANOTHER disk to fail
and get dropped.

This I usually solved by doing mdadm -S and then mdadm -A -f.

Yesterday, however, I was feeling extra clever, and I just did mdadm -a 
/dev/md1 /dev/sdn1.

This was a huge mistake.

What had happened, I now realized, was this:

 * /dev/md1 is fine
 * /dev/sdX1 drops, and /dev/md1 is degraded
 * I re-add /dev/sdX1 in its new guise, and /dev/md1 is resyncing with
   4 working drives and one spare
 * /dev/sdY1 drops, and /dev/md1 stops
 * I re-add /dev/sdY1 in its new guise, and mdadm marks it as a SPARE.
 * I suddenly have an array with 3 working drives and 2 spares where
   I know that one spare is in fact synced and ready to go, since
   the array stopped the moment it failed.

Also, I don't know any longer WHERE in the array the synced but
spare-marked drive should go. I know that the working drives are 0, 2 and
4, but not where the synced spare drive should go.

So, what I want to do is:

 * Mark the synced spare drive as working and in position 1
 * Assemble the array without the unsynced spare and check if this
   provides consistent data
 * If it didn't, I want to mark the synced spare as working and in
   position 3, and try the same thing again
 * When I have it working, I just want to add the unsynced spare and
   let it sync normally
 * Then I will create a write-intent bitmap to avoid the dangerously
   long sync times, and also buy a new USB controller hoping that it
   will solve my problems

So, do you guys have any idea how I can do this? mdadm doesn't support
changing the superblock in such a free-hand manner...

Please help me save this data :/ It is precious to me :.(

regards,
//Martin Kihlgren


