Re: mdadm create to existing raid5

2007-07-13 Thread David Greaves

Guy Watkins wrote:

} [EMAIL PROTECTED] On Behalf Of Jon Collette
} I wasn't thinking and did a mdadm --create to my existing raid5 instead
} of --assemble.  The syncing process ran and now its not mountable.  Is
} there anyway to recover from this?
Maybe.  Not really sure.  But don't do anything until someone that really
knows answers!

I agree - Yes, maybe.



What I think...
If you did a create with the exact same parameters the data should not have
changed.  But you can't mount so you must have used different parameters.

I'd agree.



Only 1 disk was written to during the create.

Yep.


 Only that disk was changed.

Yep.


If you remove the 1 disk and do another create with the original parameters
and put missing for the 1 disk your array will be back to normal, but
degraded.  Once you confirm this you can add back the 1 disk.

Yep.
**WARNING**
**WARNING**
**WARNING**
At this point you are relatively safe (!) but as soon as you do an 'add' and 
initiate another resync then if you got it wrong you will have toasted your data 
completely!!!

**WARNING**
**WARNING**
**WARNING**


 You must be
able to determine which disk was written to.  I don't know how to do that
unless you have the output from mdadm -D during the create/syncing.


Do you know the *exact* command you issued when you did the initial --create?
Do you know the *exact* command you issued when you did the bogus --create?

And what version of mdadm are you using?

Neil said that it's mdadm, not the kernel, that determines which device is 
initially degraded during a create. We can look at the code and your command 
line and guess which device mdadm chose. (Getting this wrong won't matter but it 
may make recovery quicker.)


assuming you have a 4 device raid using /dev/sda1, /dev/sdb1, /dev/sdc1, 
/dev/sdd1

you'll then do something like:

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 missing

try a mount

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 missing /dev/sdc1 /dev/sdb1

try a mount

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sdb1 /dev/sda1 /dev/sdc1 missing

try a mount

mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sdc1 /dev/sdb1 /dev/sda1 missing

try a mount

etc etc,

So you'll still need to do a trial and error assemble.
For a simple 4 device array there are 24 permutations - doable by hand; if you 
have 5 devices then it's 120, 6 is 720 - getting tricky ;)


I'm bored so I'm going to write a script based on something like this:
http://www.unix.org.ua/orelly/perl/cookbook/ch04_20.htm

Feel free to beat me to it ...

The critical thing is that you *must* use 'missing' when doing these trial 
--create calls.
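In the meantime, here is a rough, untested sketch of what such a script could look like 
(the device list, level, raid-devices count and mount point are assumptions - adjust them 
to match your original array; every trial keeps one slot as 'missing' and mounts read-only, 
so nothing gets resynced):

#!/bin/bash
# try every ordering of (n-1 devices + 'missing'), for each choice of omitted device
DEVS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"

permute() {
    # print every ordering of the arguments, one per line
    if [ $# -le 1 ]; then echo "$*"; return; fi
    local x y rest
    for x in "$@"; do
        rest=""
        for y in "$@"; do [ "$y" = "$x" ] || rest="$rest $y"; done
        permute $rest | while read -r tail; do echo "$x $tail"; done
    done
}

for omit in $DEVS; do
    kept=""
    for d in $DEVS; do [ "$d" = "$omit" ] || kept="$kept $d"; done
    while read -r order; do
        mdadm --stop /dev/md0 2>/dev/null
        echo "trying: $order"
        # 'yes' answers the confirmation mdadm asks for when devices look used
        yes | mdadm --create /dev/md0 --level=5 --raid-devices=4 $order
        if mount -o ro /dev/md0 /mnt/test 2>/dev/null; then
            echo "SUCCESS with order: $order"
            exit 0
        fi
    done < <(permute $kept missing)
done
echo "no ordering produced a mountable filesystem"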


If we've not explained something very well and you don't understand then please 
ask before trying it out...


David


Re: mdadm create to existing raid5

2007-07-13 Thread David Greaves

David Greaves wrote:
For a simple 4 device array there are 24 permutations - doable by 
hand, if you have 5 devices then it's 120, 6 is 720 - getting tricky ;)


Oh, wait, for 4 devices there are 24 permutations - and you need to do it 4 
times, substituting 'missing' for each device - so 96 trials.


4320 trials for a 6 device array.

Hmm. I've got a 7 device raid 6 - I think I'll go and make a note of how it's put 
together... <grin>



Have a look at this section and the linked script.
I can't test it until later

http://linux-raid.osdl.org/index.php/RAID_Recovery

http://linux-raid.osdl.org/index.php/Permute_array.pl


David




Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-07-13 Thread Ric Wheeler



Guy Watkins wrote:

} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid-
} [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
} Sent: Thursday, July 12, 2007 1:35 PM
} To: [EMAIL PROTECTED]
} Cc: Tejun Heo; [EMAIL PROTECTED]; Stefan Bader; Phillip Susi; device-mapper
} development; [EMAIL PROTECTED]; [EMAIL PROTECTED];
} linux-raid@vger.kernel.org; Jens Axboe; David Chinner; Andreas Dilger
} Subject: Re: [dm-devel] Re: [RFD] BIO_RW_BARRIER - what it means for
} devices, filesystems, and dm/md.
} 
} On Wed, 11 Jul 2007 18:44:21 EDT, Ric Wheeler said:

}  [EMAIL PROTECTED] wrote:
}   On Tue, 10 Jul 2007 14:39:41 EDT, Ric Wheeler said:
}  
}   All of the high end arrays have non-volatile cache (read, on power
} loss, it is a
}   promise that it will get all of your data out to permanent storage).
} You don't
}   need to ask this kind of array to drain the cache. In fact, it might
} just ignore
}   you if you send it that kind of request ;-)
}  
}   OK, I'll bite - how does the kernel know whether the other end of that
}   fiberchannel cable is attached to a DMX-3 or to some no-name product that
}   may not have the same assurances?  Is there an "I'm a high-end array" bit
}   in the sense data that I'm unaware of?
}  
} 
}  There are ways to query devices (think of hdparm -I in S-ATA/P-ATA
} drives, SCSI
}  has similar queries) to see what kind of device you are talking to. I am
} not
}  sure it is worth the trouble to do any automatic detection/handling of
} this.
} 
}  In this specific case, it is more a case of when you attach a high end
} (or
}  mid-tier) device to a server, you should configure it without barriers
} for its
}  exported LUNs.
} 
} I don't have a problem with the sysadmin *telling* the system the other

} end of
} that fiber cable has characteristics X, Y and Z.  What worried me was
} that it
} looked like conflating device reported writeback cache with device
} actually
} has enough battery/hamster/whatever backup to flush everything on a power
} loss.
} (My back-of-envelope calculation shows for a worst-case of needing a 1ms
} seek
} for each 4K block, a 1G cache can take up to 4 1/2 minutes to sync.
} That's
} a lot of battery..)

Most hardware RAID devices I know of use the battery to save the cache while
the power is off.  When the power is restored it flushes the cache to disk.
If the power failure lasts longer than the batteries then the cache data is
lost, but the batteries last 24+ hours I believe.


Most mid-range and high end arrays actually use that battery to ensure that data 
is all written out to permanent media when the power is lost. I won't go into 
how that is done, but it clearly would not be safe to assume that your power 
outage is only going to last a certain length of time (and if not, you would 
lose data).
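In practice, 'configure it without barriers' (as mentioned above) usually just comes 
down to a mount option on the exported LUNs - a sketch only, with a hypothetical 
device name, since the exact option depends on the filesystem and kernel version:

# ext3: turn write barriers off for a LUN backed by battery/NV cache
mount -o barrier=0 /dev/sdX1 /mnt/array
# XFS spells the same thing differently
mount -o nobarrier /dev/sdX1 /mnt/array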




A big EMC array we had had enough battery power to power about 400 disks
while the 16 Gig of cache was flushed.  I think EMC told me the batteries
would last about 20 minutes.  I don't recall if the array was usable during
the 20 minutes.  We never tested a power failure.

Guy


I worked on the team that designed that big array.

At one point, we had an array on loan to a partner who tried to put it in a very 
small data center. A few weeks later, they brought in an electrician who needed 
to run more power into the center.  It was pretty funny - he tried to find a 
power button to turn it off and then just walked over and dropped power trying 
to get the Symm to turn off.  When that didn't work, he was really, really 
confused ;-)


ric


RE: Software based SATA RAID-5 expandable arrays?

2007-07-13 Thread Daniel Korstad
To run it manually;

echo check > /sys/block/md0/md/sync_action

then you can check the status with;

cat /proc/mdstat

Or to continually watch it, if you want (kind of boring though :) )

watch cat /proc/mdstat

This will refresh every 2 sec.

In my original email I suggested using a crontab so you don't need to remember 
to do this every once in a while.

Run (I did this as root);

crontab -e 

This will allow you to edit your crontab. Now paste this command in there;

30 2 * * Mon echo check > /sys/block/md0/md/sync_action

If you want you can add comments. I like to comment my stuff since I have lots 
of stuff in mine; just make sure you have '#' at the front of the lines so your 
system knows it is just a comment and not a command it should run;

#check for bad blocks once a week (every Mon at 2:30am)
#if bad blocks are found, they are corrected from parity information

After you have put this in your crontab, write and quit with this command;

:wq

It should come back with this;
[EMAIL PROTECTED] ~]# crontab -e
crontab: installing new crontab

Now you can look at your cron table (without editing) with this;

crontab -l

It should return something like this, depending on whether you added comments or how 
you scheduled your command;

#check for bad blocks once a week (every Mon at 2:30am)
#if bad blocks are found, they are corrected from parity information
30 2 * * Mon echo check > /sys/block/md0/md/sync_action

For more info on crontab and syntax for times (I just did a google and grabbed 
the first couple links...);
http://www.tech-geeks.org/contrib/mdrone/croncrontab-howto.htm
http://ubuntuforums.org/showthread.php?t=102626&highlight=cron
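If you end up with several md arrays, one variation (just a sketch - it assumes every 
array on the box should be checked) is to let a single cron line loop over all of them;

30 2 * * Mon for f in /sys/block/md*/md/sync_action; do echo check > "$f"; done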

Cheers,
Dan.

-Original Message-
From: Michael [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 12, 2007 5:43 PM
To: Bill Davidsen; Daniel Korstad
Cc: linux-raid@vger.kernel.org
Subject: Re: Software based SATA RAID-5 expandable arrays?

SuSe uses its own version of cron which is different than everything else I 
have seen, and the documentation is horrible.  However they provide a 
wonderful xwindows utility that helps set them up... the problem I'm having is 
figuring out what to run.  When I try to run /sys/block/md0/md/sync_action 
at a prompt it shoots out a permission denied even though I am su or logged 
in as root.  Very annoying.  You mention check vs. repair... which brings me 
to my last issue on setting up this machine.  How do you send an email when 
check, SMART, or a RAID drive fails?  How do you auto repair if the check 
fails?

These are the last things I need to do for my Linux Server to work right... 
after I get all of this done, I will change the boot to go to the command prompt 
and not XWindows, and I will leave it in the corner of my room hopefully not to 
be used for as long as possible.

- Original Message 
From: Bill Davidsen [EMAIL PROTECTED]
To: Daniel Korstad [EMAIL PROTECTED]
Cc: Michael [EMAIL PROTECTED]; linux-raid@vger.kernel.org
Sent: Wednesday, July 11, 2007 10:21:42 AM
Subject: Re: Software based SATA RAID-5 expandable arrays?

Daniel Korstad wrote:
 You have lots of options.  This will be a lengthy response and will give just 
 some ideas for just some of the options...
  
   
Just a few thoughts below interspersed with your comments.
 For my server, I had started out with a single drive.  I later migrated 
 to a RAID 1 mirror (after having to deal with reinstalls after drive 
 failures, I wised up).  Since I already had an OS that I wanted to keep, my 
 RAID-1 setup was a bit more involved.  I followed this migration guide to get me 
 there;
 http://wiki.clug.org.za/wiki/RAID-1_in_a_hurry_with_grub_and_mdadm
  
 Since you are starting from scratch, it should be easier for you.  Most 
 distros will have an installer that will guide you through the process.  When 
 you get to hard drive partitioning, look for an advanced option or a review and 
 modify partition layout option or something similar, otherwise it might just 
 make a guess at what you want and that would not be RAID.  In this advanced 
 partition setup, you will be able to create your RAID.  First you make equal 
 size partitions on both physical drives.  For example, first carve out a 100M 
 partition on each of the two physical OS drives, then make a RAID 1 md0 with 
 each of these partitions and then make this your /boot.  Do this again for 
 other partitions you want to have RAIDed.  You can do this for /boot, /var, 
 /home, /tmp, /usr.  It can be nice to have a separation in case a user 
 fills /home/foo with crap and this will not affect other parts of the OS, or 
 if the mail spool fills up, it will not hang the OS.  The only problem is 
 determining how big to make them during the install.  At a minimum, I would do 
three partitions; /boot, swap, and /.  This means all the others (/var, /home, 
/tmp, /usr) are in the / partition, but this way you don't have to worry about 
sizing them all correctly. 
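Outside an installer, that step looks roughly like this (a sketch with hypothetical 
partition names - most installers do the equivalent for you behind the scenes):

# two ~100M partitions, one per physical disk, mirrored for /boot
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext3 /dev/md0     # then mount /dev/md0 as /boot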
  
 For the simplest setup, I would do RAID 1 for 

[GIT PULL] ioat fixes, raid5 acceleration, and the async_tx api

2007-07-13 Thread Dan Williams

Linus, please pull from

git://lost.foo-projects.org/~dwillia2/git/iop ioat-md-accel-for-linus

to receive:

1/ I/OAT performance tweaks and simple fixups.  These patches have been
   in -mm for a few kernel releases as git-ioat.patch
2/ RAID5 acceleration and the async_tx api.  These patches have also
   been in -mm for a few kernel releases as git-md-accel.patch.  In
   addition, they have received field testing as a part of the -iop kernel
   released via SourceForge[1] since 2.6.18-rc6.

The raid acceleration work can further be subdivided into three logical
areas:
- API -
The async_tx api provides methods for describing a chain of
asynchronous bulk memory transfers/transforms with support for
inter-transactional dependencies.  It is implemented as a dmaengine
client that smooths over the details of different hardware offload
engine implementations.  Code that is written to the api can optimize
for asynchronous operation and the api will fit the chain of operations
to the available offload resources. 

- Implementation -
When the raid acceleration work was proposed, Neil laid out the
following attack plan:
1/ move the xor and copy operations outside spin_lock(sh->lock)
2/ find/implement an asynchronous offload api
The raid5_run_ops routine uses the asynchronous offload api
(async_tx) and the stripe_operations member of a stripe_head to carry
out xor and copy operations asynchronously, outside the lock.

- Driver -
The Intel(R) Xscale IOP series of I/O processors integrate an Xscale
core with raid acceleration engines.  The iop-adma driver supports the
copy and xor capabilities of the 3 IOP architectures iop32x, iop33x, and
iop34x.

All the MD changes have been acked-by Neil Brown.  For the changes made
to net/ I have received David Miller's acked-by.  Shannon Nelson has
tested the I/OAT changes (due to async_tx support) in his environment
and has added his signed-off-by.  Herbert Xu has agreed to let the
async_tx api be housed under crypto/ with the intent to coordinate
efforts as support for transforms like crc32c and raid6-p+q are
developed.

To be clear Shannon Nelson is the I/OAT maintainer, but we agreed that I
should coordinate this release to simplify the merge process.  Going
forward I will be the iop-adma maintainer.  For the common bits,
dmaengine core and the async_tx api, Shannon and I will coordinate as
co-maintainers.

- Credits -
I cannot thank Neil Brown enough for his advice and patience as this
code was developed.

Jeff Garzik is credited with helping the dmaengine core and async_tx
become sane apis.  You are credited with the general premise that users
of an asynchronous offload engine api should not know or care if an
operation is carried out asynchronously or synchronously in software.
Andrew Morton is credited with corralling these conflicting git trees in
-mm and more importantly imparting encouragement at OLS 2006.

Per Andrew's request the md-accel changelogs were fleshed out and the
patch set was posted for a final review a few weeks ago[2].  To my
knowledge there are no pending review items.  This tree is based on
2.6.22.

Thank you,
Dan

[1] http://sourceforge.net/projects/xscaleiop
[2] http://marc.info/?l=linux-raid&w=2&r=1&s=md-accel&q=b

Andrew Morton (1):
  I/OAT: warning fix

Chris Leech (5):
  ioatdma: Push pending transactions to hardware more frequently
  ioatdma: Remove the wrappers around read(bwl)/write(bwl) in ioatdma
  ioatdma: Remove the use of writeq from the ioatdma driver
  I/OAT: Add documentation for the tcp_dma_copybreak sysctl
  I/OAT: Only offload copies for TCP when there will be a context switch

Dan Aloni (1):
  I/OAT: fix I/OAT for kexec

Dan Williams (20):
  dmaengine: refactor dmaengine around dma_async_tx_descriptor
  dmaengine: make clients responsible for managing channels
  xor: make 'xor_blocks' a library routine for use with async_tx
  async_tx: add the async_tx api
  raid5: refactor handle_stripe5 and handle_stripe6 (v3)
  raid5: replace custom debug PRINTKs with standard pr_debug
  md: raid5_run_ops - run stripe operations outside sh->lock
  md: common infrastructure for running operations with raid5_run_ops
  md: handle_stripe5 - add request/completion logic for async write ops
  md: handle_stripe5 - add request/completion logic for async compute ops
  md: handle_stripe5 - add request/completion logic for async check ops
  md: handle_stripe5 - add request/completion logic for async read ops
  md: handle_stripe5 - add request/completion logic for async expand ops
  md: handle_stripe5 - request io processing in raid5_run_ops
  md: remove raid5 compute_block and compute_parity5
  dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
  iop13xx: surface the iop13xx adma units to the iop-adma driver
  iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
  ARM: Add drivers/dma to arch/arm/Kconfig
  

Re: Software based SATA RAID-5 expandable arrays?

2007-07-13 Thread Bill Davidsen

Michael wrote:

RESPONSE

I had everything working, but it is evident that when I installed SuSe
the first time check and repair were not included in the package :(  I
did not use the one redirection operator, I used the other, as was incorrectly
stated in many of the documents I set this up from.


  

Doesn't matter, either will work and most people just use '>' 

The thing that made me suspect check and repair weren't part of SuSe was
the failure of check or repair typed at the command prompt to
respond in any way other than a response that stated there was no such
command.  In addition, man check and man repair were also missing.


  
One more time, check and repair are not commands, they are character 
strings! You are using the echo command to write those strings into the 
control interface in the sysfs area. If you type exactly what people 
have sent you that will work.
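In other words, assuming your array is md0:

echo check > /sys/block/md0/md/sync_action    # scan the array and count/report mismatches
echo repair > /sys/block/md0/md/sync_action   # same scan, but rewrite mismatches from parity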

BROKEN!

I did an auto update of the SuSe machine, which ended up replacing the
kernel.  They added the new entries to the boot choices but the mount
information was not transferred.  SuSe also deleted the original kernel
boot setup.  When SuSe looked at the drives individually it found
that none of them was recognizable.  Therefore when I woke up this
morning and rebooted the machine after the update, I received the
errors and was then dumped to a basic prompt with limited ability to do
anything.  I know I need to manually remount the drives, but it's going
to be a challenge since I did not do this in the past.  The answer to
this question is that I either have to change distros (which I am
tempted to do) or fix the current distro.  Please do not bother
providing any solutions, for I simply have to RTFM (which I haven't had
time to do).



I think I am going to re-set up my machines.  The first two drives with
identical boot partitions, yet not mirror them.  I can then manually
run a tree copy that would update my second drive as I grow the
system, and after successful and needed updates.  This would then
allow me a fallback after any updates, simply by swapping SATA
drive cables from the first boot drive to the second.  I am assuming
this will work.  I can then RAID-6 (or 5) in the setup and recopy my files
(yes, I haven't deleted them because I am not confident in my ability
with Linux yet).  Hopefully I will just simply remount these 4 drives
because they're a simple raid 5 array.



SUSE's COMPLETE FAILURES

This frustration with SuSe, the lack of a simple reliable update
utility and the failures I experienced have discouraged me from using
SuSe at all.  It's got some amazing tools that keep me from constantly
looking up documentation, posting to forums, or going to IRC, but the
unreliable upgrade process is a deal breaker for me.  It's simply too
much work to manually update everything.  This project had a simple
goal, which was to provide an easy and cheap solution for an unlimited
NAS service.



SUPPORT

In addition, SuSe's IRC help channel is among the worst I have
encountered.  The level of support is often very good, but the level of
harassment, flames and simple childish behavior overcomes almost any
attempt at providing any level of support.  I have no problem giving
back to the community when I learn enough to do so, but I will not be
mocked for my inability to understand a new and very in-depth system.
In fact, I tend to go to the wonderful gentoo IRC for my answers.  That
IRC is amazing, the people patient and encouraging, and the level of
knowledge is the best I have experienced.  This list, outside the
original incident, has been an amazing resource.  I feel highly
confident asking questions about RAID here, because I know you guys are
actually RUNNING the systems that I am attempting to build.

- Original Message 
From: Daniel Korstad [EMAIL PROTECTED]
To: big.green.jelly.bean [EMAIL PROTECTED]
Cc: davidsen [EMAIL PROTECTED]; linux-raid linux-raid@vger.kernel.org
Sent: Friday, July 13, 2007 11:22:45 AM
Subject: RE: Software based SATA RAID-5 expandable arrays?

To run it manually;

echo check > /sys/block/md0/md/sync_action

than you can check the status with;

cat /proc/mdstat

Or to continually watch it, if you want (kind of boring though :) )

watch cat /proc/mdstat

This will refresh ever 2sec.

In my original email I suggested to use a crontab so you don't need to remember 
to do this every once in a while.

Run (I did this in root);

crontab -e 


This will allow you to edit your crontab. Now paste this command in there;

30 2 * * Mon echo check > /sys/block/md0/md/sync_action

If you want you can add comments, I like to comment my stuff since I have lots 
of stuff in mine, just make sure you have '#' in the front of the lines so your 
system knows it is just a comment and not a command it should run;

#check for bad blocks once a week (every Mon at 2:30am)
#if bad blocks are found, they are corrected from parity information

After you have put this in your crontab, write and quit with this command;

:wq

It should come back with this;
[EMAIL PROTECTED] ~]# crontab -e

RE: Software based SATA RAID-5 expandable arrays?

2007-07-13 Thread Daniel Korstad
I can't speak for SuSe issues but I believe there is some confusion on the 
packages and command syntax.  

So hang on, we are going for a ride, step by step...

Check and repair are not packages per se.

You should have a package called echo.

If you run this;

echo 1

you should get a 1 echoed back at you.

For example;

[EMAIL PROTECTED] echo 1
1

Or anything else you want;

[EMAIL PROTECTED] echo check
check

Now all we are doing with this is redirecting with the > to another 
location, /sys/block/md0/md/sync_action

The difference between a double >> and a single > is the >> will append it to 
the end and the single > will replace the contents of the file with the value.

For example;
I will create a file called foo;

[EMAIL PROTECTED] tmp]# vi foo

In this file I add two lines of text, foo, then I will write and quit with :wq

Now I will take a look at the file I just made with my vi editor...

[EMAIL PROTECTED] tmp]# cat foo
foo
foo

Great, now I run my echo command to send another value to it.

First I use the double >> to just append;

[EMAIL PROTECTED] tmp]# echo foo2 >> foo

Now I take another look at the file;

[EMAIL PROTECTED] tmp]# cat foo
foo
foo
foo2

So, I have my first two text lines and the third line foo2 appended.

Now I do this again but use just the single > to replace the file with a value.

[EMAIL PROTECTED] tmp]# echo foo3 > foo

Then I look at it again;

[EMAIL PROTECTED] tmp]# cat foo
foo3

Ahh, all the other lines are gone and now I just have foo3.

So, > replaces and >> appends.

How does this affect your /sys/block/md0/md/sync_action file?  As it turns 
out, it does not matter.

Think of proc and sys (/proc and /sys) as pseudo file systems: a real-time, 
memory-resident file system that tracks the processes running on your 
machine and the state of your system.

So first lets go to /sys/block/

Then I will list its contents;
[EMAIL PROTECTED] ~]# cd /sys/block/
[EMAIL PROTECTED] block]# ls
dm-0  dm-3  hda  md1  ram0   ram11  ram14  ram3  ram6  ram9  sdc  sdf  sdi
dm-1  dm-4  hdc  md2  ram1   ram12  ram15  ram4  ram7  sda   sdd  sdg
dm-2  dm-5  md0  md3  ram10  ram13  ram2   ram5  ram8  sdb   sde  sdh


This will be different for you since your system will have different hardware 
and settings; again, it is a pseudo file system.  The dm entries are my logical 
volumes and you might have more or fewer sata drives; the sda, sdb, ... entries 
were created when I booted the system.  If I add another sata drive, another sdj 
will be created automatically for me.

So depending on how many raid devices you have (I have four: /boot, swap, /, and 
my RAID6 data (md0, md1, md2, md3)) they are listed here too.

So let's go into one.  My swap RAID, md1, is small, so let's go to that one and test 
this out;

[EMAIL PROTECTED] md1]# ls
dev  holders  md  range  removable  size  slaves  stat  uevent


Let's go deeper,
[EMAIL PROTECTED] md1]# cd /sys/block/md1/md/

[EMAIL PROTECTED] md]# ls
chunk_size  dev-hdc1  mismatch_cnt  rd0 suspend_lo  
sync_speed
component_size  level new_dev   rd1 sync_action 
sync_speed_max
dev-hda1metadata_version  raid_diskssuspend_hi  sync_completed  
sync_speed_min

Now lets look at sync_action;

[EMAIL PROTECTED] md]# cat sync_action
idle

That is the pseudo file that represents the current state of my RAID md1.

So let's run that echo command and then let's check the state of the RAID;

[EMAIL PROTECTED] md]# echo check > sync_action
[EMAIL PROTECTED] md]# cat /proc/mdstat
Personalities : [raid1] [raid6]
md1 : active raid1 hdc1[1] hda1[0]
  104320 blocks [2/2] [UU]
  []  resync = 62.7% (65664/104320) finish=0.0min 
speed=65664K/sec

So it is in resync state and if there are bad blocks they will be corrected from 
parity.

Now once it is done, lets check that sync_action file again.

[EMAIL PROTECTED] md]# cat sync_action
idle

Now remember we used the single redirect, so we replaced the value with the text 
'check' with our echo command.  Once it was done with the resync, my system 
changed the value back to idle.

What about the double >>?  Well, it appends to the file but it will have the 
overall same effect...

[EMAIL PROTECTED] md]# echo check >> sync_action
[EMAIL PROTECTED] md]# cat /proc/mdstat
Personalities : [raid1] [raid6]
md1 : active raid1 hdc1[1] hda1[0]
  104320 blocks [2/2] [UU]
  [=...]  resync = 49.0% (52096/104320) finish=0.0min 
speed=52096K/sec

When it is done the value goes back to idle;

[EMAIL PROTECTED] md]# cat sync_action
idle

So, > or >> does not matter here.  And the command you need is echo.

Manipulating the pseudo files in /proc is similar.

Say for example, for security, I don't want my box to respond to pings (1 is 
for true and 0 is for false);
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all

In this case, you want the single > because you want to replace the current 
value with 1 and not the >> to append.
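One caveat not covered above: values written into /proc like this are gone after a 
reboot.  To make a setting permanent you would typically put it in /etc/sysctl.conf, 
for example;

echo "net.ipv4.icmp_echo_ignore_all = 1" >> /etc/sysctl.conf
sysctl -p     # re-reads /etc/sysctl.conf and applies it now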

Also another pseudo file for turning you linux box into a 

Re: mdadm create to existing raid5

2007-07-13 Thread Jon Collette
The mdadm --create with missing instead of a drive is a good idea.  Do 
you actually say missing or just leave out a drive?  However, doesn't it 
do a sync every time you create?  So wouldn't you run the risk of 
corrupting another drive each time?  Or does it not sync because of 
saying missing?


Too bad I am intent on learning things the hard way.

/etc/mdadm.conf from before I recreated
ARRAY /dev/md2 level=raid5 num-devices=4 spares=1 
UUID=4f935928:2b7a1633:71d575d6:dab4d6bc


/etc/mdadm.conf after I recreated
ARRAY /dev/md1 level=raid5 num-devices=4 
UUID=81bdd737:901c0a8f:af38cb94:41c4e3da


Well, before I heard back from you guys, I noticed this problem and in 
my fountain of infinite wisdom I did mdadm --zero-superblock on all my 
raid drives and created them again, thinking if I got it to look the same 
it would just fix it.  Well, they do look the same now; I am at work or I 
would give you the new mdadm.conf.


I really need to learn patience :(


David Greaves wrote:

David Greaves wrote:
For a simple 4 device array I there are 24 permutations - doable by 
hand, if you have 5 devices then it's 120, 6 is 720 - getting tricky ;)


Oh, wait, for 4 devices there are 24 permutations - and you need to do 
it 4 times, substituting 'missing' for each device - so 96 trials.


4320 trials for a 6 device array.

Hmm. I've got a 7 device raid 6 - I think I'll go an make a note of 
how it's put together... grin



Have a look at this section and the linked script.
I can't test it until later

http://linux-raid.osdl.org/index.php/RAID_Recovery

http://linux-raid.osdl.org/index.php/Permute_array.pl


David




Re: 3ware 9650 tips

2007-07-13 Thread Jon Collette

Wouldn't Raid 6 be slower than Raid 5 because of the extra fault tolerance?
   http://www.enterprisenetworksandservers.com/monthly/art.php?1754 - 
20% drop according to this article


His 500GB WD drives are 7200RPM compared to the Raptors' 10K, so his 
numbers will be slower.

Justin, what file system do you have running on the Raptors?  I think 
that's an interesting point made by Joshua.



Justin Piszcz wrote:



On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:

My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB 
WD drives.  The controller is set up as a RAID6 w/ a hot spare.  OS 
is CentOS 5 x86_64.  It's all running on a couple of Xeon 5130s on a 
Supermicro X7DBE motherboard w/ 4GB of RAM.


Trying to stick with a supported config as much as possible, I need 
to run ext3.  As per usual, though, initial ext3 numbers are less 
than impressive. Using bonnie++ to get a baseline, I get (after doing 
'blockdev --setra 65536' on the device):

Write: 136MB/s
Read:  384MB/s

Proving it's not the hardware, with XFS the numbers look like:
Write: 333MB/s
Read:  465MB/s

How many folks are using these?  Any tuning tips?

Thanks.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



Let's try that again with the right address :)


You are using HW RAID then?  Those numbers seem pretty awful for that
setup.  I'm including linux-raid@ even though it appears you're running 
HW raid, as this is rather peculiar.

To give you an example I get 464MB/s write and 627MB/s with a 10 disk
raptor software raid5.

Justin.



Re: 3ware 9650 tips

2007-07-13 Thread Justin Piszcz



On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:

My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB WD 
drives.  The controller is set up as a RAID6 w/ a hot spare.  OS is CentOS 5 
x86_64.  It's all running on a couple of Xeon 5130s on a Supermicro X7DBE 
motherboard w/ 4GB of RAM.


Trying to stick with a supported config as much as possible, I need to run 
ext3.  As per usual, though, initial ext3 numbers are less than impressive. 
Using bonnie++ to get a baseline, I get (after doing 'blockdev --setra 65536' 
on the device):

Write: 136MB/s
Read:  384MB/s

Proving it's not the hardware, with XFS the numbers look like:
Write: 333MB/s
Read:  465MB/s

How many folks are using these?  Any tuning tips?

Thanks.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



Let's try that again with the right address :)


You are using HW RAID then?  Those numbers seem pretty awful for that
setup.  I'm including linux-raid@ even though it appears you're running HW raid,
as this is rather peculiar.

To give you an example I get 464MB/s write and 627MB/s with a 10 disk
raptor software raid5.

Justin.
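On the tuning side, the usual ext3 knobs for a wide stripe are the stride hint at 
mkfs time plus readahead and journal mode - a sketch only, since the right numbers 
depend on the controller's chunk size (64KiB chunks, a 4KiB block size and a 
hypothetical /dev/sdb are assumed here):

mkfs.ext3 -E stride=16 /dev/sdb            # stride = chunk size / filesystem block size
blockdev --setra 65536 /dev/sdb            # large readahead, as in the original test
mount -o data=writeback,noatime /dev/sdb /mnt/test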



Re-building an array

2007-07-13 Thread mail
Hi List,

I am very new to raid, and I am having a problem.

I made a raid10 array, but I only used 2 disks.  Since then, one failed,
and my system crashes with a kernel panic.

I copied all the data, and I would like to start over.  How can I start
from scratch? I need to get rid of my /dev/md0, fully test the discs,
and build them over again as raid1 ?

Thanks!
Rick




Re: 3ware 9650 tips

2007-07-13 Thread Joshua Baker-LePain

On Fri, 13 Jul 2007 at 2:35pm, Justin Piszcz wrote


On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:

My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB WD 
drives.  The controller is set up as a RAID6 w/ a hot spare.  OS is CentOS 
5 x86_64.  It's all running on a couple of Xeon 5130s on a Supermicro X7DBE 
motherboard w/ 4GB of RAM.


Trying to stick with a supported config as much as possible, I need to run 
ext3.  As per usual, though, initial ext3 numbers are less than impressive. 
Using bonnie++ to get a baseline, I get (after doing 'blockdev --setra 
65536' on the device):

Write: 136MB/s
Read:  384MB/s

Proving it's not the hardware, with XFS the numbers look like:
Write: 333MB/s
Read:  465MB/s

How many folks are using these?  Any tuning tips?

Thanks.


You are using HW RAID then?  Those numbers seem pretty awful for that
setup.  I'm including linux-raid@ even though it appears you're running HW 
raid, as this is rather peculiar.


Yep, hardware RAID -- I need the hot swappability (which, AFAIK, is still 
an issue with md).


--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Re-building an array

2007-07-13 Thread Justin Piszcz



On Fri, 13 Jul 2007, mail wrote:


Hi List,

I am very new to raid, and I am having a problem.

I made a raid10 array, but I only used 2 disks.  Since then, one failed,
and my system crashes with a kernel panic.

I copied all the data, and I would like to start over.  How can I start
from scratch? I need to get rid of my /dev/md0, fully test the discs,
and build them over again as raid1 ?

Thanks!
Rick





man mdadm, check --zero-superblock option
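Roughly, the sequence would look like this (a sketch with hypothetical device 
names - it is destructive, so double-check which disks you point it at):

mdadm --stop /dev/md0                          # stop the old raid10 array
mdadm --zero-superblock /dev/sda1 /dev/sdb1    # wipe the md metadata from both members
badblocks -sv /dev/sda                         # read-only surface test (-w is a destructive write test)
badblocks -sv /dev/sdb
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1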


[-mm PATCH 1/2] raid5: add the stripe_queue object for tracking raid io requests (take2)

2007-07-13 Thread Dan Williams
The raid5 stripe cache object, struct stripe_head, serves two purposes:
1/ frontend: queuing incoming requests
2/ backend: transitioning requests through the cache state machine
   to the backing devices
The problem with this model is that queuing decisions are directly tied to
cache availability.  There is no facility to determine that a request or
group of requests 'deserves' usage of the cache and disks at any given time.

This patch separates the object members needed for queuing from the object
members used for caching.  The stripe_queue object takes over the incoming
bio lists as well as the buffer state flags.

The following fields are moved from struct stripe_head to struct
stripe_queue:
raid5_private_data *raid_conf
int pd_idx
spinlock_t lock
int bm_seq

The following fields are moved from struct r5dev to struct r5_queue_dev:
sector_t sector
struct bio *toread, *towrite

This patch lays the groundwork, but does not implement, the facility to
have more queue objects in the system than available stripes, currently this
remains a 1:1 relationship.  In other words, this patch just moves fields
around and does not implement new logic.

--- Performance Data ---

Unit information

File size = megabytes
Blk Size  = bytes
Num Thr   = number of threads
Avg Rate  = relative throughput
CPU%  = relative percentage of CPU used during the test
CPU Eff   = Rate divided by CPU% - relative throughput per cpu load

Configuration
=
Platform: 1200Mhz iop348 with 4-disk sata_vsc array
mdadm --create /dev/md0 /dev/sd[abcd] -n 4 -l 5
mkfs.ext2 /dev/md0
mount /dev/md0 /mnt/raid
tiobench --size 2048 --numruns 5 --block 4096 --block 131072 --dir /mnt/raid

Sequential Reads
            File  Blk     Num  Avg   Maximum  CPU
Identifier  Size  Size    Thr  Rate  (CPU%)   Eff
----------- ----  ------  ---  ----  -------  ---
2.6.22-iop1 2048  4096    1    -1%   2%       -3%
2.6.22-iop1 2048  4096    2    -37%  -34%     -5%
2.6.22-iop1 2048  4096    4    -22%  -19%     -3%
2.6.22-iop1 2048  4096    8    -3%   -3%      -1%
2.6.22-iop1 2048  131072  1    1%    -1%      2%
2.6.22-iop1 2048  131072  2    -11%  -11%     -1%
2.6.22-iop1 2048  131072  4    25%   20%      4%
2.6.22-iop1 2048  131072  8    8%    6%       2%

Sequential Writes
            File  Blk     Num  Avg   Maximum  CPU
Identifier  Size  Size    Thr  Rate  (CPU%)   Eff
----------- ----  ------  ---  ----  -------  ---
2.6.22-iop1 2048  4096    1    26%   29%      -2%
2.6.22-iop1 2048  4096    2    40%   43%      -2%
2.6.22-iop1 2048  4096    4    24%   7%       16%
2.6.22-iop1 2048  4096    8    6%    -11%     19%
2.6.22-iop1 2048  131072  1    66%   65%      0%
2.6.22-iop1 2048  131072  2    41%   33%      6%
2.6.22-iop1 2048  131072  4    23%   -8%      34%
2.6.22-iop1 2048  131072  8    13%   -24%     49%

The read numbers in this take have improved from a 14% average decline to a
5% average decline.  However it is still a mystery as to why any
significant variance is showing up, because most reads should completely
bypass the stripe_cache.

New for take3 is blktrace data for a component disk while running the
following:
for i in `seq 1 5`; do dd if=/dev/zero of=/dev/md0 bs=1024k count=1024; 
done
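For reference, per-disk numbers like the ones below can be collected with the 
blktrace/blkparse pair - a sketch, assuming sda is the component disk being watched:

blktrace -d /dev/sda -o sda &      # trace while the dd loop above runs
kill %1                            # stop tracing once the dd loop finishes
blkparse -i sda                    # the per-CPU summary at the end gives Queued/Dispatched/Merges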

Pre-patch:
CPU0 (sda):
 Reads Queued:7965,31860KiB  Writes Queued:  437458, 1749MiB
 Read Dispatches:  881,31860KiB  Write Dispatches:26405, 1749MiB
 Reads Requeued: 0   Writes Requeued: 0
 Reads Completed:  881,31860KiB  Writes Completed:26415, 1749MiB
 Read Merges: 6955,27820KiB  Write Merges:   411007, 1644MiB
 Read depth: 2   Write depth: 2
 IO unplugs:   176   Timer unplugs: 176

Post-patch:
CPU0 (sda):
 Reads Queued:   36255,   145020KiB  Writes Queued:  437727, 1750MiB
 Read Dispatches: 1960,   145020KiB  Write Dispatches: 6672, 1750MiB
 Reads Requeued: 0   Writes Requeued: 0
 Reads Completed: 1960,   145020KiB  Writes Completed: 6682, 1750MiB
 Read Merges:34235,   136940KiB  Write Merges:   430409, 1721MiB
 Read depth: 2   Write depth: 2
 IO unplugs:   423   Timer unplugs: 423

It looks like the performance win is coming from improved merging and not
from reduced reads as previously assumed.  Note that with blktrace enabled
the throughput comes in at ~98MB/s compared to ~120MB/s without.  Pre-patch
throughput hovers at ~85MB/s for this dd command.

Changes in take2:
* leave the flags with the buffers, prevents a data corruption issue
  whereby stale buffer state 

Re: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 15:35:42 -0700
Dan Williams [EMAIL PROTECTED] wrote:

 The following patches replace the stripe-queue patches currently in -mm.

I have a little practical problem here: am presently unable to compile
anything much due to all the git rejects coming out of git-md-accel.patch.

It'd be appreciated if you could keep on top of that, please.  It's a common
problem at this time of the kernel cycle.  The quilt trees are much worse -
Greg's stuff is an unholy mess.  Ho hum.


RE: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3

2007-07-13 Thread Williams, Dan J
 -Original Message-
 From: Andrew Morton [mailto:[EMAIL PROTECTED]
  The following patches replace the stripe-queue patches currently in
-mm.
 
 I have a little practical problem here: am presently unable to compile
 anything much due to all the git rejects coming out of
git-md-accel.patch.
 
 It'd be appreciated if you could keep on top of that, please.  It's a
common
 problem at this time of the kernel cycle.  The quilt trees are much
worse -
 Greg's
 stuff is an unholy mess.  Ho hum.

Sorry, please drop git-md-accel.patch and git-ioat.patch as they have
been merged into Linus' tree.


Re: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 15:57:26 -0700
Williams, Dan J [EMAIL PROTECTED] wrote:

  -Original Message-
  From: Andrew Morton [mailto:[EMAIL PROTECTED]
   The following patches replace the stripe-queue patches currently in
 -mm.
  
  I have a little practical problem here: am presently unable to compile
  anything much due to all the git rejects coming out of
 git-md-accel.patch.
  
  It'd be appreciated if you could keep on top of that, please.  It's a
 common
  problem at this time of the kernel cycle.  The quilt trees are much
 worse -
  Greg's
  stuff is an unholy mess.  Ho hum.
 
 Sorry, please drop git-md-accel.patch and git-ioat.patch as they have
 been merged into Linus' tree.

But your ongoing maintenance activity will continue to be held in those
trees, won't it?


RE: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3

2007-07-13 Thread Williams, Dan J
 -Original Message-
 From: Andrew Morton [mailto:[EMAIL PROTECTED]
 
 But your ongoing maintenance activity will continue to be held in
those
 trees, won't it?

For now:
git://lost.foo-projects.org/~dwillia2/git/iop
ioat-md-accel-for-linus

is where the latest combined tree is located.  However, Shannon Nelson
is coming online to own the i/oat driver so we may need to revisit this
situation.  We want to avoid the git-ioat/git-md-accel collisions that
happened in the past.  I will talk with Shannon about how we will
coordinate this going forward.

The code ownership looks like this:
ioat dma driver - Shannon
net dma offload implementation - Shannon
dmaengine core - shared
async_tx api - shared
iop-adma dma driver - Dan
md-accel implementation - Dan


Re: 3ware 9650 tips

2007-07-13 Thread Michael Tokarev
Joshua Baker-LePain wrote:
[]
 Yep, hardware RAID -- I need the hot swappability (which, AFAIK, is
 still an issue with md).

Just out of curiosity - what do you mean by swappability?

For many years we've been using linux software raid, and we've had no problems
with swappability of the component drives (in case of drive
failures and whatnot).  With non-hotswappable drives (old scsi
and ide ones), rebooting is needed for the system to recognize the
drives.  For modern sas/sata drives, I can replace a faulty drive
without anyone noticing...  Maybe you're referring to something
else?

Thanks.

/mjt




Re: [-mm PATCH 0/2] 74% decrease in dispatched writes, stripe-queue take3

2007-07-13 Thread Andrew Morton
On Fri, 13 Jul 2007 16:28:30 -0700
Williams, Dan J [EMAIL PROTECTED] wrote:

  -Original Message-
  From: Andrew Morton [mailto:[EMAIL PROTECTED]
  
  But your ongoing maintenance activity will continue to be held in
 those
  trees, won't it?
 
 For now:
   git://lost.foo-projects.org/~dwillia2/git/iop
 ioat-md-accel-for-linus
 
 is where the latest combined tree is located.  However, Shannon Nelson
 is coming online to own the i/oat driver so we may need to revisit this
 situation.  We want to avoid the git-ioat/git-md-accel collisions that
 happened in the past.  I will talk with Shannon about how we will
 coordinate this going forward.
 
 The code ownership looks like this:
 ioat dma driver - Shannon
 net dma offload implementation - Shannon
 dmaengine core - shared
 async_tx api - shared
 iop-adma dma driver - Dan
 md-accel implementation - Dan

oh my, how scary.  I'll go into hiding until the dust has settled.  Please
send me the git URLs when it's all set up, thanks.



Re: Raid array is not automatically detected.

2007-07-13 Thread Zivago Lee
On Fri, 2007-07-13 at 15:36 -0500, Bryan Christ wrote:
 My apologies if this is not the right place to ask this question. 
 Hopefully it is.
 
 I created a RAID5 array with:
 
 mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 
 /dev/sdc1 /dev/sdd1 /dev/sde1
 
  mdadm -D /dev/md0 verifies the devices have a persistent superblock, but 
  upon reboot, /dev/md0 does not get automatically assembled (and hence is 
  not an installable/bootable device).
 
 I have created several raid1 arrays and one raid5 array this way and 
 have never had this problem.  In all fairness, this is the first time I 
 have used mdadm for the job.  Usually, I boot to something like 
 SysRescueCD, used raidtools to create my array and then reboot with my 
 Slackware install CD.
 
 Anyone know why this might be happening?

Are you trying to boot from this raid device?  I believe there is a
limitation as to what raid type you can boot off of (IIRC, only raid0 and
raid1).

-- 
Zivago Lee [EMAIL PROTECTED]



Re: 3ware 9650 tips

2007-07-13 Thread Andrew Klaassen
--- Justin Piszcz [EMAIL PROTECTED] wrote:

 To give you an example I get 464MB/s write and
 627MB/s with a 10 disk
 raptor software raid5.

Is that with the 9650?

Andrew





  



Re: Raid array is not automatically detected.

2007-07-13 Thread Bryan Christ
I would like for it to be the boot device.  I have set up a raid5 mdraid 
array before and it was automatically accessible as /dev/md0 after every 
reboot.  In this peculiar case, I am having to assemble the array 
manually before I can access it...


mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Unless I do the above, I cannot access /dev/md0.  I've never had this 
happen before.  Usually a cursory glance through dmesg will show that 
the array was detected, but not so in this case.
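One thing worth checking (a guess, not a diagnosis): many distros' init scripts only 
auto-assemble arrays that are listed in /etc/mdadm.conf, so if that file predates the 
new array it will never be started at boot.  Something like this records it;

mdadm --examine --scan >> /etc/mdadm.conf    # appends an ARRAY line with the new UUID
mdadm --assemble --scan                      # what the init scripts do at boot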


Zivago Lee wrote:

On Fri, 2007-07-13 at 15:36 -0500, Bryan Christ wrote:
My apologies if this is not the right place to ask this question. 
Hopefully it is.


I created a RAID5 array with:

mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sda1 /dev/sdb1 
/dev/sdc1 /dev/sdd1 /dev/sde1


mdadm -D /dev/md0 verifies the devices has a persistent super-block, but 
upon reboot, /dev/md0 does not get automatically assembled (an hence is 
not a installable/bootable device).


I have created several raid1 arrays and one raid5 array this way and 
have never had this problem.  In all fairness, this is the first time I 
have used mdadm for the job.  Usually, I boot to something like 
SysRescueCD, used raidtools to create my array and then reboot with my 
Slackware install CD.


Anyone know why this might be happening?


Are you trying to boot on this raid device?  I believe there is a
limitation as what raid type you can boot off of (IIRC. only raid0 and
raid1).

