Re: Hard drive lifetime: wear from spinning up or rebooting vs running

2006-02-06 Thread Mattias Wadenstein

On Sun, 5 Feb 2006, David Liontooth wrote:


In designing an archival system, we're trying to find data on when it
pays to power or spin the drives down versus keeping them running.

Is there a difference between spinning up the drives from sleep and from
a reboot? Leaving out the cost imposed on the (separate) operating
system drive.


Hitachi claims 5 years ("Surface temperature of HDA is 45°C or less") and
that "Life of the drive does not change in the case that the drive is used
intermittently." for their Ultrastar 10K300 drives. I suspect that the best
estimates you're going to get are from the manufacturers, if you can find
the right documents (OEM specifications, not marketing blurbs).


For their Deskstar (SATA/PATA) drives I didn't find lifetime estimates 
beyond 5 start-stop-cycles.


/Mattias Wadenstein


Question: read-only array

2006-02-06 Thread Chris Osicki

Hi

I've just noticed that setting an array readonly doesn't really make
it readonly.

I have a RAID1 array and LVM on top of it.

When I run 

/sbin/mdadm --misc --readonly /dev/md0

/proc/mdstat shows:

Personalities : [raid1]
md0 : active (read-only) raid1 sda[0] sdb[1]
  160436096 blocks [2/2] [UU]

However, it doesn't prevent me from activating volume groups, mounting 
filesystems and writing files to them.

Is this a bug, a feature, or my misunderstanding of the meaning of the readonly flag?


I use RedHat AS 4 (U1) on a dual core Opteron machine.
Kernel 2.6.9-11.ELsmp as delivered with RH.

mdadm - v1.12.0 - 14 June 2005
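
For what it's worth, the only way I can see to force read-only at every
layer of the stack is something like the sketch below; the VG/LV and
mount-point names are made up, not taken from this setup:

  /sbin/mdadm --misc --readonly /dev/md0   # md layer, as above
  lvchange --permission r /dev/vg0/lv0     # mark the LV itself read-only (hypothetical names)
  mount -o remount,ro /mnt/data            # remount the filesystem read-only (hypothetical mount point)

i.e. the md readonly flag on its own doesn't stop writes arriving through
the layers stacked on top of it.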

Thanks for your time.

Regards,
Chris


Re: Hard drive lifetime: wear from spinning up or rebooting vs running

2006-02-06 Thread David Liontooth
Mattias Wadenstein wrote:

 On Sun, 5 Feb 2006, David Liontooth wrote:

 In designing an archival system, we're trying to find data on when it
 pays to power or spin the drives down versus keeping them running.

 Hitachi claims 5 years ("Surface temperature of HDA is 45°C or less") and
 that "Life of the drive does not change in the case that the drive is used
 intermittently." for their Ultrastar 10K300 drives. I suspect that the
 best estimates you're going to get are from the manufacturers, if you
 can find the right documents (OEM specifications, not marketing blurbs).

"Intermittently" may assume the drive is powered on and in regular use,
and may simply be a claim that the spindle-motor components are designed
to fail at about the same time as the platter and head-motor components.

Konstantin's observation that disks die about evenly from three causes
(the platters stop spinning due to dead spindle-motor power electronics,
the heads stop moving due to dead head-motor power electronics, or bad
sectors develop spontaneously, perhaps from platter contamination) is
consistent with the rational goal of manufacturing components with
similar lifetimes under normal use.

 For their deskstar (sata/pata) drives I didn't find life time
 estimates beyond 5 start-stop-cycles.

If components are in fact manufactured to fail at about the same time
under normal use (including a dozen or two start-stop cycles a day),
then taking the drive off-line for more than a few hours should
straightforwardly extend its life.

Appreciate all the good advice and references. While we have to rely on
specifications rather than actual long-term tests, this should still
move us in the right direction. One of the problems with creating a
digital archive is that the technology has no archival history. We know
acid-free paper lasts millennia; how long do modern hard drives last in
cold storage?  To some people's horror, we now know home-made CDs last a
couple of years.

Dave








Re: Hard drive lifetime: wear from spinning up or rebooting vs running

2006-02-06 Thread Francois Barre
2006/2/6, David Liontooth [EMAIL PROTECTED]:
 Mattias Wadenstein wrote:

  On Sun, 5 Feb 2006, David Liontooth wrote:
  For their deskstar (sata/pata) drives I didn't find life time
  estimates beyond 5 start-stop-cycles.

 If components are in fact manufactured to fail simultaneously under
 normal use (including a dozen or two start-stop cycles a day), then
 taking the drive off-line for more than a few hours should
 unproblematically extend its life.

IMHO, a single start-stop cycle costs more lifetime than a couple of
hours of spinning. As far as I know, on current disks (especially 7200
and 10k rpm ones), spin-up is a really critical, life-consuming
operation; the spindle motor is stressed much more than it is once the
spin speed is stable. In our current storage design, disks are never
stopped (sorry, Earth...), because it isn't worth spinning down for
less than a couple of days.
However, temperature has a real impact on the heads (including the head
motors), because the materials expand when they overheat. So cooling
your drives is a major issue.

 how long do modern hard drives last in cold storage?
Demagnetization?
A few years back there were tools that would read and then rewrite a
floppy's contents to refresh its magnetization. I guess the same should
apply to a drive: periodically re-read and re-write each and every
sector to keep the surface properly magnetized.
I wouldn't give a drive more than 100 years before it loses all its
content to demagnetization... and anyway, in 100 years no computer will
have a controller to plug a SATA or SCSI drive into :-p.
I guess a long-lived system should not sit cold, but should
re-activate and check its content periodically...
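
A rough shell sketch of that periodic re-read/re-write idea, using
badblocks' non-destructive read-write mode (it assumes the array is
unmounted while it runs; /dev/md0 and the mount point are just examples):

  umount /mnt/archive        # hypothetical mount point
  badblocks -n -s /dev/md0   # read each block, rewrite it, verify it; -s shows progress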

 we now know home-made CDs last a couple of years.
I thought they were said to last at least a century... But with the
enormous cost reductions in this area, it's no surprise the lifetime
has dropped so much.


Re: [RFC][PATCH 000 of 3] MD Acceleration and the ADMA interface: Introduction

2006-02-06 Thread Dan Williams
On 2/5/06, Neil Brown [EMAIL PROTECTED] wrote:
 I've looked through the patches - not exhaustively, but hopefully
 enough to get a general idea of what is happening.
 There are some things I'm not clear on and some things that I could
 suggest alternates too...

I have a few questions to check that I understand your suggestions.

  - Each ADMA client (e.g. a raid5 array) gets a dedicated adma thread
to handle all its requests.  And it handles them all in series.  I
wonder if this is really optimal.  If there are multiple adma
engines, then a single client could only make use of one of them
reliably.
It would seem to make more sense to have just one thread - or maybe
one per processor or one per adma engine - and have any ordering
between requests made explicit in the interface.

Actually as each processor could be seen as an ADMA engine, maybe
you want one thread per processor AND one per engine.  If there are
no engines, the per-processor threads run with high priority, else
with low.

...so the engine thread would handle explicit client-requested
ordering constraints and then hand the operations off to per-processor
worker threads in the pio case, or queue directly to hardware in the
presence of such an engine.  In md_thread you talk about priority
inversion deadlocks; do those same concerns apply here?

  - I have thought that the way md/raid5 currently does the
'copy-to-buffer' and 'xor' in two separate operations may not be
the best use of the memory bus.  If you could have a 3-address
operation that read from A, stored into B, and xorred into C, then
A would have to be read half as often.  Would such an interface
make sense with ADMA?  I don't have sufficient knowledge of
assembler to do it myself for the current 'xor' code.

At the very least I can add a copy+xor command to ADMA; that way
developers implementing engines can optimize for this case, if the
hardware supports it, and the hand-coded assembly guys can do their
thing.

  - Your handling of highmem doesn't seem right.  You shouldn't kmap it
until you have decided that you have to do the operation 'by hand'
(i.e. in the cpu, not in the DMA engine).  If the dma engine can be
used at all, kmap isn't needed at all.

I made the assumption that if CONFIG_HIGHMEM is not set then the kmap
call resolves to a simple page_address() call.  I think it's ok, but
it does look fishy so I will revise this code.  I also was looking to
handle the case where the underlying hardware DMA engine does not
support high memory addresses.

  - The interfacing between raid5 and adma seems clumsy... Maybe this
is just because you were trying to minimise changes to raid5.c.
I think it would be better to make substantial but elegant changes
to raid5.c - handle_stripe in particular - so that what is
happening becomes very obvious.

Yes, I went into this with the idea of being minimally intrusive, but
you are right: the end result should have MD optimized for ADMA rather
than ADMA shoe-horned into MD.

For example, once it has been decided to initiate a write (there is
enough data to correctly update the parity block), you need to
perform a sequence of copies and xor operations, and then submit
write requests.
This is currently done by the copy/xor happening inline under the
sh->lock spinlock, and then R5_WantWrite is set.  Then, outside
the spinlock, if WantWrite is set generic_make_request is called as
appropriate.

I would change this so that a sequence of descriptors was assembled
which describes the copies and xors.  Appropriate call-backs would
be set so that generic_make_request is called at the right time
(after the copy, or after the last xor for the parity block).
Then outside the sh->lock spinlock this sequence is passed to the
ADMA manager.  If there is no ADMA engine present, everything is
performed inline - multiple xors are possibly combined into
multi-way xors automatically.  If there is an ADMA engine, it is
scheduled to do the work.

I like this idea of clearly separated stripe assembly (finding work
while under the lock) and stripe execute (running copy+xor / touching
disks) stages.

Can you elaborate on a scenario where xors are combined into multi-way xors?

The relevant blocks are all 'locked' as they are added to the
sequence, and unlocked as the writes complete or, for unchanged
blocks in RECONSTRUCT_WRITE, when the copy/xor that uses them
completes.

resync operations would construct similar descriptor sequences, and
have a different call-back on completion.


Doing this would require making sure that get_desc always
succeeds.  I notice that you currently allow for the possible
failure of adma_get_desc and fall back to 'pio' in that case (I
think).  I think it would be better to use a mempool (or similar)
to ensure that you never fail.  There 

Re: Hard drive lifetime: wear from spinning up or rebooting vs running

2006-02-06 Thread Brad Dameron
On Sun, 2006-02-05 at 15:42 -0800, David Liontooth wrote:
 In designing an archival system, we're trying to find data on when it
 pays to power or spin the drives down versus keeping them running. 
 
 Is there a difference between spinning up the drives from sleep and from
 a reboot? Leaving out the cost imposed on the (separate) operating
 system drive.
 
 Temperature obviously matters -- a linear approximation might look like
 this,
 
  Lifetime = 60 - 12 [(t-40)/2.5]
 
 where 60 is the average maximum lifetime, achieved at 40 degrees C and
 below, and lifetime decreases by a year for every 2.5 degree rise in
 temperature.  Does anyone have an actual formula?
 
 To keep it simple, let's assume we keep temperature at or below what is
 required to reach average maximum lifetime. What is the cost of spinning
 up the drives in the currency of lifetime months?
 
 My guess would be that the cost is tiny -- in the order of minutes.
 
 Or are different components stressed in a running drive versus one that
 is spinning up, so it's not possible to translate the cost of one into
 the currency of the other?
 
 Finally, is there passive decay of drive components in storage?
 
 Dave

I read somewhere (still looking for the link) that constantly powering
a drive on and off actually decreases its lifespan due to the
heating and cooling of the bearings. The conclusion was that it is
best to leave the drive spinning.

Brad Dameron
SeaTab Software
www.seatab.com




Re: Hard drive lifetime: wear from spinning up or rebooting vs running

2006-02-06 Thread Dan Stromberg

Drives are probably going to have a lifetime that depends on a variety
of things, and while I'm not a physicist or mechanical engineer, nor in
the hard disk business, the things that come to mind first are:

1) Thermal stress due to temperature changes - with more rapid changes
being more severe (expansion and contraction, I assume - viz. one of
those projectors or cars that run hot, and leave a fan running for a
while before fully powering off)

2) The amount of time a disk spends in a powered-off state (e.g.,
lubricants may congeal, and about every time my employer, UCI, has a
campus-wide power outage, -some- piece of equipment somewhere on campus
fails to come back up - probably due to thermal stress)

3) The number of times a disk goes to a powered-off state (thermal
stress again)

4) The amount of bumping around the disk undergoes, which may to an
extent be greater in disks that are surrounded by other disks, with
disks on the physical periphery of your RAID solution bumping around a
little less - those little rubber things that you screw the drive into
may help here.

5) The materials used in the platters, heads, servo, etc.

6) The number of alternate blocks for remapping bad blocks

7) The degree of tendency for a head crash to peel off a bunch of
material, or to just make a tiny scratch, and the degree of tendency for
scratched-off particles to bang into platters or heads later and scrape
off more particles - which can sometimes yield an exponential decay of
drive usability

8) How good the clean room(s) the drive was built in was/were

9) How good a drive is at parking the heads over unimportant parts of
the platters when bumped, dropped, in an earthquake, when turned off,
etc.

If you want to be thorough with this, you probably want to employ some
materials scientists, some statisticians, get a bunch of different kinds
of drives and characterize their designs somehow, do multiple
longitudinal studies, hunt for correlations between drive attributes and
lifetimes, etc.

And I totally agree with a previous poster - this stuff may all change
quite a bit by the time the study is done, so it'd be a really good idea
to look for ways of increasing your characterization's longevity somehow,
possibly by delving down into individual parts of the drives and looking
at their lifetime.  But don't rule out holistic/chaotic effects
unnecessarily, even if the light's better over here when looking at
the reductionistic view of drives.

PS: A drive that stays powered but not spinning is sometimes called a
warm spare, while a drive that's spinning all the time even while not
in active use in a RAID array is usually called a hot spare; a cold
spare is one that sits on the shelf, unpowered.

HTH :)

On Sun, 2006-02-05 at 15:42 -0800, David Liontooth wrote:
 In designing an archival system, we're trying to find data on when it
 pays to power or spin the drives down versus keeping them running. 
 
 Is there a difference between spinning up the drives from sleep and from
 a reboot? Leaving out the cost imposed on the (separate) operating
 system drive.
 
 Temperature obviously matters -- a linear approximation might look like
 this,
 
  Lifetime = 60 - 12 [(t-40)/2.5]
 
 where 60 is the average maximum lifetime, achieved at 40 degrees C and
 below, and lifetime decreases by a year for every 2.5 degree rise in
 temperature.  Does anyone have an actual formula?
 
 To keep it simple, let's assume we keep temperature at or below what is
 required to reach average maximum lifetime. What is the cost of spinning
 up the drives in the currency of lifetime months?
 
 My guess would be that the cost is tiny -- in the order of minutes.
 
 Or are different components stressed in a running drive versus one that
 is spinning up, so it's not possible to translate the cost of one into
 the currency of the other?
 
 Finally, is there passive decay of drive components in storage?
 
 Dave
 
 
 
 


Raid 1 always degrades after a reboot.

2006-02-06 Thread Hans Rasmussen

Hi all.

After every reboot, my brand new Raid1 array comes up degraded.  It's always 
/dev/sdb1 that is unavailable or removed.


The hardware is as follows..

2x200GB Seagate SATA drives in RAID 1. These are for data only; the OS is
on a separate IDE disk.

LVM Partitions for my data on the RAID
Promise SATA300 TX2Plus SATA card (using their kernel module, ULSATA2)
Asus P3B motherboard and 400MHz P2 (getting replaced in the near future)

Software

Mandriva 2006 download edition, upgraded from Mandrake 9.1
Kernel 2.6.12-15mdk (not rebuilt)
MDADM version 00.90.01
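
(For reference, a minimal set of checks for this kind of problem; the
array name /dev/md0 is an assumption, since /proc/mdstat isn't shown:

  mdadm --examine /dev/sda1 /dev/sdb1   # what each member's superblock says about the array
  mdadm --detail /dev/md0               # state of the assembled, degraded array
  mdadm /dev/md0 --add /dev/sdb1        # if sdb1 is only marked removed, re-add it and let it resync

If sdb1 drops out again on the next reboot, the update times and event
counts from --examine should at least show whether the second disk is
being seen late or not at all.)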

Logs, etc..

dmesg
Linux version 2.6.12-15mdk ([EMAIL PROTECTED]) (gcc version 4.0.1 
(4.0.1-5mdk for Mandriva Linux release 2006.0)) #1 Mon Jan 9 17:08:48 MST 
2006

BIOS-provided physical RAM map:
BIOS-e820:  - 0009d400 (usable)
BIOS-e820: 0009d400 - 000a (reserved)
BIOS-e820: 000f - 0010 (reserved)
BIOS-e820: 0010 - 27ffc000 (usable)
BIOS-e820: 27ffc000 - 27fff000 (ACPI data)
BIOS-e820: 27fff000 - 2800 (ACPI NVS)
BIOS-e820:  - 0001 (reserved)
0MB HIGHMEM available.
639MB LOWMEM available.
On node 0 totalpages: 163836
 DMA zone: 4096 pages, LIFO batch:1
 Normal zone: 159740 pages, LIFO batch:31
 HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
ACPI: RSDP (v000 ASUS  ) @ 0x000f58a0
ACPI: RSDT (v001 ASUS   P3B_F0x30303031 MSFT 0x31313031) @ 0x27ffc000
ACPI: FADT (v001 ASUS   P3B_F0x30303031 MSFT 0x31313031) @ 0x27ffc080
ACPI: BOOT (v001 ASUS   P3B_F0x30303031 MSFT 0x31313031) @ 0x27ffc040
ACPI: DSDT (v001   ASUS P3B_F0x1000 MSFT 0x010b) @ 0x
ACPI: PM-Timer IO Port: 0xe408
Allocating PCI resources starting at 2800 (gap: 2800:d7ff)
Built 1 zonelists
Local APIC disabled by BIOS -- you can enable it with lapic
mapped APIC to d000 (01503000)
Initializing CPU#0
Kernel command line: auto BOOT_IMAGE=linux root=306 quiet acpi=ht 
resume=/dev/hda5 splash=silent

bootsplash: silent mode.
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 400.955 MHz processor.
Using pmtmr for high-res timesource
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 644744k/655344k available (2348k kernel code, 10028k reserved, 717k 
data, 268k init, 0k highmem, 0k BadRAM)

Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 794.62 BogoMIPS (lpj=397312)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0183f9ff    
  
CPU: After vendor identify, caps: 0183f9ff    
  

CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0183f9ff   0040  
 

Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel Pentium II (Deschutes) stepping 02
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
checking if image is initramfs...it isn't (bad gzip magic numbers); looks 
like an initrd

Freeing initrd memory: 299k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf08b0, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
PnPBIOS: Disabled
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
Boot video device is :01:00.0
PCI: Using IRQ router PIIX/ICH [8086/7110] at :00:04.0
Simple Boot Flag at 0x3a set to 0x1
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
audit: initializing netlink socket (disabled)
audit(1139207174.020:0): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch ([EMAIL PROTECTED])
devfs: boot_options: 0x0
Initializing Cryptographic API
Limiting direct PCI/PCI transfers.
vesafb: framebuffer at 0xe300, mapped to 0xe888, using 3750k, total 
4096k

vesafb: mode is 800x600x16, linelength=1600, pages=3
vesafb: protected mode interface info at c000:474c
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
bootsplash 3.1.6-2004/03/31: looking for picture...6 silentjpeg size 34430 
bytes,6...found (800x600, 34382 bytes, v3).

Console: switching to colour frame buffer device 93x30
fb0: VESA VGA frame buffer device
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, 

Re: [klibc] Re: Exporting which partitions to md-configure

2006-02-06 Thread H. Peter Anvin

Neil Brown wrote:


What constitutes 'a piece of data'?  A bit? a byte?

I would say that 
   msdos:fd

is one piece of data.  The 'fd' is useless without the 'msdos'.
The 'msdos' is, I guess, not completely useless with the fd.

I would lean towards the composite, but I wouldn't fight a separation.



Well, the two pieces come from different sources.



Just as there is a direct unambiguous causal path from something
present at early boot to the root filesystem that is mounted (and the
root filesystem specifies all other filesystems through fstab)
similarly there should be an unambiguous causal path from something
present at early boot to the array which holds the root filesystem -
and the root filesystem should describe all other arrays via
mdadm.conf

Does that make sense?



It makes sense, but I disagree.  I believe you are correct in that the 
current preferred minor bit causes an invalid assumption that, e.g. 
/dev/md3 is always a certain thing, but since each array has a UUID, and 
one should be able to mount by either filesystem UUID or array UUID, 
there should be no need for such a conflict if one allows for dynamic md 
numbers.


Requiring that mdadm.conf describe the actual state of all volumes 
would be an enormous step in the wrong direction.  Right now, the Linux 
md system can handle some very oddball hardware changes (such as on 
hera.kernel.org, where the disks not only completely changed names due to 
a controller change, but changed from hd* to sd*!)


Dynamicity is a good thing, although it needs to be harnessed.

 kernel parameter md_root_uuid=xxyy:zzyy:aabb:ccdd...
This could be interpreted by an initramfs script to run mdadm
to find and assemble the array with that uuid.  The uuid of
each array is reasonably unique.

This, in fact, is *EXACTLY* what we're talking about; it does require 
autoassemble.  Why do we care about the partition types at all?  The 
reason is that since the md superblock is at the end, it doesn't get 
automatically wiped if the partition is used as a raw filesystem, and so 
it's important that there is a qualifier for it.
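
A rough initramfs-script sketch of that md_root_uuid idea (the parameter
name is the one proposed above; the device node and shell plumbing are
illustrative, not an existing interface):

  # pick the md_root_uuid=... argument off the kernel command line
  for arg in $(cat /proc/cmdline); do
      case "$arg" in
          md_root_uuid=*) uuid=${arg#md_root_uuid=} ;;
      esac
  done
  # scan all partitions and assemble whichever array carries that UUID
  mdadm --assemble /dev/md0 --uuid="$uuid" --config=partitions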


-hpa


Re: Raid5 Debian Yaird Woes

2006-02-06 Thread dean gaudet
On Sun, 5 Feb 2006, Lewis Shobbrook wrote:

 On Saturday 04 February 2006 11:22 am, you wrote:
  On Sat, 4 Feb 2006, Lewis Shobbrook wrote:
   Is there any way to avoid this requirement for input, so that the system
   skips the missing drive as the raid/initrd system did previously?
 
  what boot errors are you getting before it drops you to the root password
  prompt?
 
 Basically it just states waiting X seconds for /dev/sdx3 (corresponding to 
 the 
 missing raid5 member). Where X cycles from 2,4,8,16 and then drops you into a 
 recovery console, no root pwd prompt.
 It will only occur if the partition is completely missing, such as a 
 replacement disk with a blank partition table, or a completely missing/failed 
 drive.
  is it trying to fsck some filesystem it doesn't have access to?
 
 No fsck seen for bad extX partitions etc.

try something like this...

cd /tmp
mkdir t
cd t
zcat /boot/initrd.img-`uname -r` | cpio -i
grep -r sd.3 .

that should show us what script is directly accessing /dev/sdx3 ... maybe 
there's something more we can do about it.

i did find a possible deficiency with the patch i posted... looking more 
closely at my yaird /init i see this:

mkbdev '/dev/sdb' 'sdb'
mkbdev '/dev/sdb4' 'sdb/sdb4'
mkbdev '/dev/sda' 'sda'
mkbdev '/dev/sda4' 'sda/sda4'

and i think that means that mdadm -Ac partitions will fail if one of my 
root disks ends up somewhere other than sda or sdb... because the device 
nodes won't exist.

i suspect i should update the patch to use mdrun instead of mdadm -Ac 
partitions... because mdrun will create temporary device nodes for 
everything in /proc/partitions in order to find all the possible raid 
pieces.

-dean


Problems making a RAID-0 over 2 gpt partitions...

2006-02-06 Thread James Lamanna
I recently acquired a 7TB Xserve RAID. It is configured in hardware as
2 RAID 5 arrays of 3TB each.
Now I'm trying to configure a RAID 0 over these 2 drives (so RAID 50 in total).

I only wanted to make 1 large partition on each array, so I used
parted as follows:
parted /dev/sd[bc]
(parted) mklabel gpt
(parted) mkpart primary 0 3000600
(parted) set 1 raid on
(parted) q

for each of the disks.

Then I went to go make the RAID array:
mdadm -C -l 0 --raid-devices=2 /dev/md0 /dev/sdb1 /dev/sdc1

Everything seems ok at this point, /proc/mdstat lists the array as active..
I then wanted to put LVM on top of this for future expansion:

pvcreate /dev/md0
vgcreate imagery /dev/md0
lvcreate -l xxx -n image1 imagery (x is the number of PEs for
the whole disk, couldn't remember the number off the top of my head)

then a filesystem..
mkfs.xfs /dev/imagery/image1

Everything works fine up to this point, until I reboot.
After the reboot, the md array does not reassemble itself, and manually
doing it results in:
mdadm -A /dev/md0 /dev/sdb1 /dev/sdc1
/dev/sdb1: no RAID superblock

Kernel is 2.6.14
mdadm is 1.12.0

Did I miss a partitioning step here (or do something else sufficiently stupid)?
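
(A short sketch of the checks that seem worth running next; device names
are the ones from the commands above, and the metadata note is a guess,
not a diagnosis:

  mdadm --examine /dev/sdb1 /dev/sdc1        # was a superblock ever written to the members?
  mdadm --detail --scan >> /etc/mdadm.conf   # while the array is assembled, record it so it can be reassembled at boot

It may also be worth checking whether the default version-0.90 superblock
copes with ~3 TB members, or whether the array needs to be created with
version-1 metadata via mdadm's --metadata option.)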

Thanks in advance, and please CC me for I am not subscribed.
-- James


Re: [RFC][PATCH 000 of 3] MD Acceleration and the ADMA interface: Introduction

2006-02-06 Thread Evgeniy Polyakov
On Mon, Feb 06, 2006 at 12:25:22PM -0700, Dan Williams ([EMAIL PROTECTED]) 
wrote:
 On 2/5/06, Neil Brown [EMAIL PROTECTED] wrote:
  I've looked through the patches - not exhaustively, but hopefully
  enough to get a general idea of what is happening.
  There are some things I'm not clear on and some things that I could
  suggest alternates too...
 
 I have a few questions to check that I understand your suggestions.
 
   - Each ADMA client (e.g. a raid5 array) gets a dedicated adma thread
 to handle all its requests.  And it handles them all in series.  I
 wonder if this is really optimal.  If there are multiple adma
 engines, then a single client could only make use of one of them
 reliably.
 It would seem to make more sense to have just one thread - or maybe
 one per processor or one per adma engine - and have any ordering
 between requests made explicit in the interface.
 
 Actually as each processor could be seen as an ADMA engine, maybe
 you want one thread per processor AND one per engine.  If there are
 no engines, the per-processor threads run with high priority, else
 with low.
 
 ...so the engine thread would handle explicit client requested
 ordering constraints and then hand the operations off to per processor
 worker threads in the pio case or queue directly to hardware in the
 presence of such an engine.  In md_thread you talk about priority
 inversion deadlocks, do those same concerns apply here?

Just for reference: the more threads you have, the less stable your
system is. Ping-ponging work between several completely independent
entities is always a bad idea. Even postponing completion of a request
from the current execution unit to a workqueue introduces noticeable
latencies. The system should be able to process as much of its work as
possible in one flow.

 Dan

-- 
Evgeniy Polyakov