Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Michael Tokarev
Moshe Yudkowsky wrote:
[]
 But that's *exactly* what I have -- well, 5GB -- and which failed. I've
 modified my /etc/fstab to use data=journal (even on root, which I
 thought wasn't supposed to work without a grub option!) and I can
 power-cycle the system and bring it up reliably afterwards.

Note also that data=journal effectively doubles the write time.
It's a bit faster for small writes (because all writes go into the
journal first, i.e. into the same place, so no seeking is needed),
but for larger writes the journal fills up and the data in it has
to be written out to its proper place to free space for new data.
At that point, if you keep writing, you will see more than a 2x
slowdown, because of a) the double writes, and b) the extra seeking.

/mjt
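As an illustration of the trade-off described above, this is roughly how data=journal is selected for an ext3 filesystem in /etc/fstab, and how it can be passed to the root filesystem from grub's kernel line (a sketch only; the device names and kernel version are placeholders, not taken from this thread):

  # /etc/fstab -- journal data as well as metadata on this filesystem
  /dev/md3   /home   ext3   defaults,data=journal   0  2

  # for the root filesystem the option has to arrive before it is mounted,
  # e.g. via rootflags= on the kernel line in /boot/grub/menu.lst:
  kernel /vmlinuz-2.6.x-amd64 root=/dev/md1 ro rootflags=data=journal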


Re: raid1 or raid10 for /boot

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:

 I understand that lilo and grub only can boot partitions that look like
 a normal single-drive partition. And then I understand that a plain
 raid10 has a layout which is equivalent to raid1. Can such a raid10
 partition be used with grub or lilo for booting?
 And would there be any advantages in this, for example better disk
 utilization in the raid10 driver compared with raid?
 
A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
_cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
I'm not sure how that would be laid out).  RAID-10 uses striping as well as
mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
other boot manager currently out there).

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Moshe Yudkowsky

Michael Tokarev wrote:

Moshe Yudkowsky wrote:
[]

But that's *exactly* what I have -- well, 5GB -- and which failed. I've
modified my /etc/fstab to use data=journal (even on root, which I
thought wasn't supposed to work without a grub option!) and I can
power-cycle the system and bring it up reliably afterwards.


Note also that data=journal effectively doubles the write time.
It's a bit faster for small writes (because all writes go into the
journal first, i.e. into the same place, so no seeking is needed),
but for larger writes the journal fills up and the data in it has
to be written out to its proper place to free space for new data.
At that point, if you keep writing, you will see more than a 2x
slowdown, because of a) the double writes, and b) the extra seeking.


The alternative seems to be that portions of the / file system won't 
mount because the file system is corrupted on a crash while writing.


If I'm reading the man pages, Wikis, READMEs and mailing lists correctly 
--  not necessarily the case -- the ext3 file system uses the equivalent 
of data=journal as a default.


The question then becomes what data scheme to use with reiserfs on the 
remainder of the file system: /usr, /var, /home, and the others. If they 
can recover on a reboot using fsck and the default configuration of 
reiserfs, then I have no problem using them. But my understanding is 
that data can be destroyed or lost if there's a crash during a 
write; in that case there's little point in running a RAID system that can 
collect corrupt data.


Another way to phrase this: unless you're running data-center grade 
hardware and have absolute confidence in your UPS, you should use 
data=journal for reiserfs and perhaps avoid XFS entirely.



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Right in the middle of a large field where there had never been a 
trench was a
shell hole... 8 feet deep by 15 across. On the edge of it was a dead... 
rat not over
twice the size of a mouse. No wonder the war costs so much. Col. George 
Patton



Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Moshe Yudkowsky

Robin, thanks for the explanation. I have a further question.

Robin Hill wrote:


Once the file system is mounted then hdX,Y maps according to the
device.map file (which may actually bear no resemblance to the drive
order at boot - I've had issues with this before).  At boot time it maps
to the BIOS boot order though, and (in my experience anyway) hd0 will
always map to the drive the BIOS is booting from.


At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. 
Therefore, I don't quite understand why this would not work:


grub <<EOF
root(hd2,1)
setup(hd2)
EOF

This would seem to be a command to have the MBR on hd2 written to use 
the boot on hd2,1. It's valid when written. Are you saying that it's a 
command for the MBR on /dev/sdc to find the data on (hd2,1), the 
location of which might change at any time? That's... a  very strange 
way to write the tool. I thought it would be a command for the MBR on 
hd2 (sdc) to look at hd2,1 (sdc1) to find its data, regardless of the 
boot order that caused sdc to be the boot disk.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Bring me the head of Prince Charming.
-- Robert Sheckley  Roger Zelazny


Re: raid1 or raid10 for /boot

2008-02-04 Thread Keld Jørn Simonsen
On Mon, Feb 04, 2008 at 09:17:35AM +, Robin Hill wrote:
 On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:
 
  I understand that lilo and grub only can boot partitions that look like
  a normal single-drive partition. And then I understand that a plain
  raid10 has a layout which is equivalent to raid1. Can such a raid10
  partition be used with grub or lilo for booting?
  And would there be any advantages in this, for example better disk
  utilization in the raid10 driver compared with raid?
  
 A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
 _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
 I'm not sure how that'd be layed out).  RAID-10 uses striping as well as
 mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
 other boot manager currently out there).

Yes, it is understood that raid10,f2 uses striping, but a raid10 with near=2,
far=1 does not use striping, and this is what you get if you just run
mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1

best regards
keld


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Michael Tokarev
Moshe Yudkowsky wrote:
[]
 If I'm reading the man pages, Wikis, READMEs and mailing lists correctly
 --  not necessarily the case -- the ext3 file system uses the equivalent
 of data=journal as a default.

ext3 defaults to data=ordered, not data=journal.  ext2 doesn't have
a journal at all.

 The question then becomes what data scheme to use with reiserfs on the

I'd say don't use reiserfs in the first place ;)

 Another way to phrase this: unless you're running data-center grade
 hardware and have absolute confidence in your UPS, you should use
 data=journal for reiserfs and perhaps avoid XFS entirely.

By the way, even if you do have a good UPS, there should be some
control program for it, to properly shut down your system when the
UPS loses AC power.  So far, I've seen no such programs...

/mjt


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Moshe Yudkowsky

Eric,

Thanks very much for your note. I'm becoming very leery of resiserfs at 
the moment... I'm about to run another series of crash tests.


Eric Sandeen wrote:

Justin Piszcz wrote:


Why avoid XFS entirely?

esandeen, any comments here?


Heh; well, it's the meme.


Well, yeah...


Note also that ext3 has the barrier option as well, but it is not
enabled by default due to performance concerns.  Barriers also affect
xfs performance, but enabling them in the non-battery-backed-write-cache
scenario is the right thing to do for filesystem integrity.


So if I understand you correctly, you're stating that current the most 
reliable fs in its default configuration, in terms of protection against 
power-loss scenarios, is XFS?



--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 There is something fundamentally wrong with a country [USSR] where
  the citizens want to buy your underwear.  -- Paul Thereaux


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Michael Tokarev
Eric Sandeen wrote:
 Moshe Yudkowsky wrote:
 So if I understand you correctly, you're stating that current the most 
 reliable fs in its default configuration, in terms of protection against 
 power-loss scenarios, is XFS?
 
 I wouldn't go that far without some real-world poweroff testing, because
 various fs's are probably more or less tolerant of a write-cache
 evaporation.  I suppose it'd depend on the size of the write cache as well.

I know of no filesystem which is, as you say, tolerant of write-cache
evaporation.  If a drive says the data is written but in fact it's
not, it's a Bad Drive (tm) and it should be thrown away immediately.
Fortunately, almost all modern disk drives don't lie this way.  The
only thing the filesystem needs to do is tell the drive to flush
its cache at the appropriate time, and actually wait for the flush
to complete.  Barriers (mentioned in this thread) are just another,
somewhat more efficient, way to do so, but a normal cache flush will
do as well -- if, that is, write caching is enabled in the
first place.  Note that with some workloads, write caching in
the drive actually makes write speed worse, not better -- namely,
in the case of massive writes.
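For anyone wanting to check or change this on their own drives, the write cache can be queried and toggled with hdparm (a sketch; /dev/sda is a placeholder, and some drives or controllers ignore the setting):

  hdparm -W /dev/sda       # show whether the drive's write cache is enabled
  hdparm -W0 /dev/sda      # turn the write cache off
  hdparm -W1 /dev/sda      # turn it back on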

Speaking of XFS (and of ext3 with write barriers enabled) --
I'm confused here as well, and the answers to my questions didn't
help either.  As far as I understand, XFS only uses barriers,
not regular cache flushes, hence without write barrier support
(which is not there for linux software raid, as explained
elsewhere) it's unsafe -- and probably the same applies to ext3
with barrier support enabled.  But I'm not sure I've got it all
correct.

/mjt


Re: raid1 and raid 10 always writes all data to all disks?

2008-02-04 Thread Bill Davidsen

Keld Jørn Simonsen wrote:

On Sun, Feb 03, 2008 at 10:56:01AM -0500, Bill Davidsen wrote:
  

Keld Jørn Simonsen wrote:


I found a sentence in the HOWTO:

raid1 and raid 10 always writes all data to all disks

I think this is wrong for raid10.

eg

a raid10,f2 of 4 disks only writes to two of the disks -
not all 4 disks. Is that true?
 
  
I suspect that really should have read all mirror copies, in the 
raid10 case.



OK, I changed the text to:

raid1 always writes all data to all disks.
  


Just to be really pedantic, you might say "devices" instead of "disks", 
since many or most arrays are on partitions. Otherwise I like this, it's 
much clearer.

raid10 always writes all data to the number of copies that the raid holds.
For example on a raid10,f2 or raid10,o2 of 6 disks, the data will only
be written 2 times.

Best regards
Keld

  



--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 






Re: draft howto on making raids for surviving a disk crash

2008-02-04 Thread Bill Davidsen

Keld Jørn Simonsen wrote:

On Sun, Feb 03, 2008 at 10:53:51AM -0500, Bill Davidsen wrote:
  

Keld Jørn Simonsen wrote:


This is intended for the linux raid howto. Please give comments.
It is not fully ready /keld

Howto prepare for a failing disk

6. /etc/mdadm.conf

Something here on /etc/mdadm.conf. What would be safe, allowing
a system to boot even if a disk has crashed?
 
  

Recommend PARTITIONS be used



Thanks Bill for your suggestions, which I have incorporated in the text.

However, I do not understand what to do with the remark above.
Please explain.
  


The mdadm.conf file should contain the DEVICE partitions statement to 
identify all possible partitions regardless of name changes. See man 
mdadm.conf for more discussion. This protects against udev doing 
something innovative in device naming.
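A minimal sketch of such an mdadm.conf (the array UUID below is a made-up placeholder; take the real one from mdadm --detail or mdadm --examine --scan):

  # /etc/mdadm/mdadm.conf
  DEVICE partitions
  ARRAY /dev/md0 UUID=01234567:89abcdef:01234567:89abcdef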


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismark 






Re: In this partition scheme, grub does not find md information?

2008-02-04 Thread Michael Tokarev
John Stoffel wrote:
[]
 C'mon, how many of you are programmed to believe that 1.2 is better
 than 1.0?  But when they're not different, just different
 placements, then it's confusing.

Speaking of more is better thing...

There were quite a few bugs fixed in recent months wrt version-1
superblocks -- both in the kernel and in mdadm.  The 0.90 format has been
stable for a very long time, and unless you're hitting its limits
(namely, a maximum of 26 drives in an array and no homehost field),
there's nothing that makes v1 superblocks better than the 0.90 ones.
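For reference, the superblock format is chosen when the array is created, via mdadm's --metadata (-e) option; a sketch with placeholder devices:

  mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --detail /dev/md0 | grep Version     # confirm which superblock format is in use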

In my view, better = stable first, faster/easier/whatever second.

/mjt


Re: In this partition scheme, grub does not find md information?

2008-02-04 Thread John Stoffel

David On 26 Oct 2007, Neil Brown wrote:
 On Thursday October 25, [EMAIL PROTECTED] wrote:
 I also suspect that a *lot* of people will assume that the highest 
 superblock
 version is the best and should be used for new installs etc.
 
 Grumble... why can't people expect what I want them to expect?


David Moshe Yudkowsky wrote:
 I expect it's because I used 1.2 superblocks (why
 not use the latest, I said, foolishly...) and therefore the RAID10 --

David Aha - an 'in the wild' example of why we should deprecate '0.9
David 1.0 1.1, 1.2' and rename the superblocks to data-version +
David on-disk-location :)

As the person who started this entire thread ages ago about the *poor*
naming convention used for RAID superblocks, I have to agree.

I'd much rather see 1.near, 1.far, 1.both or something like that added
in.

Heck, we don't have to remove the support for the old 1.0, 1.1, 1.2
names either, just make the default be something more user friendly.

C'mon, how many of you are programmed to believe that 1.2 is better
than 1.0?  But when they're not different, just different
placements, then it's confusing.

John


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Richard Scobie

Michael Tokarev wrote:


Unfortunately an UPS does not *really* help here.  Because unless
it has control program which properly shuts system down on the loss
of input power, and the battery really has the capacity to power the
system while it's shutting down (anyone tested this?  With new UPS?
and after an year of use, when the battery is not new?), -- unless
the UPS actually has the capacity to shutdown system, it will cut
the power at an unexpected time, while the disk(s) still has dirty
caches...


I'm unsure what you mean here. The Network UPS Tools project
http://www.networkupstools.org/ has been supplying software to do this
for years.
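A rough sketch of the sort of NUT configuration being referred to -- the UPS name, user and password are placeholders, and the directives should be checked against the NUT documentation for the installed version:

  # /etc/nut/upsmon.conf (sketch)
  MONITOR myups@localhost 1 monuser secretpass master
  MINSUPPLIES 1
  SHUTDOWNCMD "/sbin/shutdown -h +0"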

In addition, a number of UPS manufacturers including APC, one of the
larger ones, provide Linux management and monitoring software with the UPS.

As far as worrying whether a one year old battery has enough capacity to
hold up while the system shuts down, there is no reason why you cannot
set it to shut the system down gracefully after maybe 30 seconds of
power loss if you feel it is necessary.

A reputable brand UPS with a correctly sized battery capacity will have
no trouble in this scenario unless the battery is faulty, in which case
it will probably be picked up during automated load tests. As long as
the manufacturer's battery replacement schedule is followed, genuine
replacement batteries are used and automated regular UPS tests are
enabled, the risks of failure are small.

Regards,

Richard




Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Eric Sandeen
Moshe Yudkowsky wrote:
 So if I understand you correctly, you're stating that current the most 
 reliable fs in its default configuration, in terms of protection against 
 power-loss scenarios, is XFS?

I wouldn't go that far without some real-world poweroff testing, because
various fs's are probably more or less tolerant of a write-cache
evaporation.  I suppose it'd depend on the size of the write cache as well.

-Eric


Re: raid1 or raid10 for /boot

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 12:21:40PM +0100, Keld Jørn Simonsen wrote:

 On Mon, Feb 04, 2008 at 09:17:35AM +, Robin Hill wrote:
  On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote:
  
   I understand that lilo and grub only can boot partitions that look like
   a normal single-drive partition. And then I understand that a plain
   raid10 has a layout which is equivalent to raid1. Can such a raid10
   partition be used with grub or lilo for booting?
   And would there be any advantages in this, for example better disk
   utilization in the raid10 driver compared with raid?
   
  A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and
  _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could -
  I'm not sure how that'd be layed out).  RAID-10 uses striping as well as
  mirroring, and the striping breaks both grub and lilo (and, AFAIK, every
  other boot manager currently out there).
 
 Yes, it is understood that raid10,f2 uses striping, but a raid10 with near=2,
 far=1 does not use striping, and this is what you get if you just run
 mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1
 
Well yes, if you do a two-disk RAID-10 then (as I said above) you
probably end up with a RAID-1 (as you do with a two-disk RAID-5).  I
don't see how this would work any differently (or better) than a RAID-1
though (and only serves to confuse things).

If you have more than two disks then RAID-10 will _always_ stripe (no
matter whether you use near, far or offset layout - these affect only
where the mirrored chunks are put) and grub/lilo will fail to work.
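For comparison, sketches of the commands being discussed (placeholder devices; --layout / -p is what selects the near/far/offset arrangement on a raid10):

  # plain two-disk mirror -- bootable with grub/lilo
  mdadm --create /dev/md0 --level=1  --raid-devices=2 /dev/sda1 /dev/sdb1

  # two-disk raid10 with the default near-2 layout -- effectively a mirror
  mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=2 /dev/sda2 /dev/sdb2

  # four-disk raid10,f2 -- striped, so not usable for /boot
  mdadm --create /dev/md2 --level=10 --layout=f2 --raid-devices=4 /dev/sd[abcd]3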

Cheers,
Robin

-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 05:06:09AM -0600, Moshe Yudkowsky wrote:

 Robin, thanks for the explanation. I have a further question.

 Robin Hill wrote:

 Once the file system is mounted then hdX,Y maps according to the
 device.map file (which may actually bear no resemblance to the drive
 order at boot - I've had issues with this before).  At boot time it maps
 to the BIOS boot order though, and (in my experience anyway) hd0 will
 always map to the drive the BIOS is booting from.

 At the time that I use grub to write to the MBR, hd2,1 is /dev/sdc1. 
 Therefore, I don't quite understand why this would not work:

 grub <<EOF
 root(hd2,1)
 setup(hd2)
 EOF

 This would seem to be a command to have the MBR on hd2 written to use the 
 boot on hd2,1. It's valid when written. Are you saying that it's a command 
 for the MBR on /dev/sdc to find the data on (hd2,1), the location of which 
 might change at any time? That's... a  very strange way to write the tool. 
 I thought it would be a command for the MBR on hd2 (sdc) to look at hd2,1 
 (sdc1) to find its data, regardless of the boot order that caused sdc to be 
 the boot disk.

This is exactly what it does, yes - the hdX,Y names are mapped by GRUB onto
BIOS disk interfaces (0x80 being the first, 0x81 the second and so on)
and it writes (to /dev/sdc in this case) the instructions to look on the
second partition of BIOS drive 0x82 (whichever drive that ends up being)
for the rest of the bootloader.

It is a bit of a strange way to work, but it's really the only way it
_can_ work (and cover all circumstances).  Unfortunately when you start
playing with bootloaders you have to get down to the BIOS level, and
things weren't written to make sense at that level (after all, when
these standards were put in place everyone was booting from a single
floppy disk system).  If EFI becomes more standard then hopefully this
will simplify but we're stuck with things as they are for now.
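For reference, the mapping grub (legacy) uses at install time is recorded in its device.map file, which can be inspected or regenerated if it drifts out of sync with the BIOS order; a sketch with placeholder devices:

  # /boot/grub/device.map (sketch)
  (hd0)   /dev/sda
  (hd1)   /dev/sdb
  (hd2)   /dev/sdc

  grub --device-map=/boot/grub/device.map    # run the grub shell with an explicit map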

Cheers,
Robin

-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




Re: Linux md and iscsi problems

2008-02-04 Thread aristizb

Good morning.


Quoting Neil Brown [EMAIL PROTECTED]:


On Friday February 1, [EMAIL PROTECTED] wrote:



Summarizing, I have two questions about the behavior of Linux md with
slow devices:

1. Is it possible to modify some kind of time-out parameter on the
mdadm tool so the slow device wouldn't be marked as faulty because of
its slow performance.


No.  md doesn't do timeouts at all.  The underlying device does.
So if you are getting time out errors from the iscsi initiator, then
you need to change the timeout value used by the iscsi initiator.  md
has no part to play in this.  It just sends a request and eventually
gets either 'success' or 'fail'.



1. What seems strange to me is that under the same conditions, running
the test with iscsi alone, the initiator never fails. I had some
problems with iscsi in the past and they were solved by changing the
time-out parameters on the initiator side, but now that I have added the
md layer I am getting errors with the slow device.
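Assuming the open-iscsi initiator is in use (the thread doesn't say which one), the timeouts in question live in iscsid.conf; a sketch -- parameter names and defaults should be verified against the installed version:

  # /etc/iscsi/iscsid.conf (sketch, open-iscsi)
  node.session.timeo.replacement_timeout = 120    # how long to wait before failing I/O up to md
  node.conn[0].timeo.noop_out_interval = 10
  node.conn[0].timeo.noop_out_timeout = 30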






2. Is it possible to control the buffer size of the RAID?, in other
words, can I control the amount of data I can write to the local disc
before I receive an acknowledgment from the slow device when I am
using the write-behind option.


No.  md/raid1 simply calls 'kmalloc' to get space to buffer each write
as the write arrives.  If the allocation succeeds, it is used to
perform the write lazily.  If the allocation fails, the write is
performed synchronously.

What did you hope to achieve by such tuning?  It can probably be
added if it is generally useful.

NeilBrown




2. The idea here is to implement remote replication: by having a
RAID-1 I can create a mirror of my local disk and use it for backup or
for any other purpose in a centralized location.

Changing the buffer would allow me to improve write performance, so a
user would always experience local write speed, while I am still
sending data across the network to the mirror device.
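A sketch of the raid1 setup this describes: the remote (iSCSI) member is flagged write-mostly, and writes to it are buffered with --write-behind, which requires a write-intent bitmap. All device names are placeholders:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=1024 \
        /dev/sda1 --write-mostly /dev/sdd1    # sdd1 standing in for the slow iSCSI device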



Thanks a lot for your time.


Juan Aristizabal.



Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Justin Piszcz



On Mon, 4 Feb 2008, Michael Tokarev wrote:


Moshe Yudkowsky wrote:
[]

If I'm reading the man pages, Wikis, READMEs and mailing lists correctly
--  not necessarily the case -- the ext3 file system uses the equivalent
of data=journal as a default.


ext3 defaults to data=ordered, not data=journal.  ext2 doesn't have
journal at all.


The question then becomes what data scheme to use with reiserfs on the


I'd say don't use reiserfs in the first place ;)


Another way to phrase this: unless you're running data-center grade
hardware and have absolute confidence in your UPS, you should use
data=journal for reiserfs and perhaps avoid XFS entirely.


By the way, even if you do have a good UPS, there should be some
control program for it, to properly shut down your system when
UPS loses the AC power.  So far, I've seen no such programs...

/mjt


Why avoid XFS entirely?

esandeen, any comments here?

Justin.


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Michael Tokarev
Eric Sandeen wrote:
[]
 http://oss.sgi.com/projects/xfs/faq.html#nulls
 
 and note that recent fixes have been made in this area (also noted in
 the faq)
 
 Also - the above all assumes that when a drive says it's written/flushed
 data, that it truly has.  Modern write-caching drives can wreak havoc
 with any journaling filesystem, so that's one good reason for a UPS.  If

Unfortunately a UPS does not *really* help here.  Because unless
it has a control program which properly shuts the system down on loss
of input power, and the battery really has the capacity to power the
system while it's shutting down (has anyone tested this?  With a new UPS?
And after a year of use, when the battery is no longer new?) -- unless
the UPS actually has the capacity to shut the system down, it will cut
the power at an unexpected time, while the disk(s) still have dirty
caches...

 the drive claims to have metadata safe on disk but actually does not,
 and you lose power, the data claimed safe will evaporate, there's not
 much the fs can do.  IO write barriers address this by forcing the drive
 to flush order-critical data before continuing; xfs has them on by
 default, although they are tested at mount time and if you have
 something in between xfs and the disks which does not support barriers
 (i.e. lvm...) then they are disabled again, with a notice in the logs.

Note also that with linux software raid barriers are NOT supported.

/mjt


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Eric Sandeen
Eric Sandeen wrote:
 Justin Piszcz wrote:
 
 Why avoid XFS entirely?

 esandeen, any comments here?
 
 Heh; well, it's the meme.
 
 see:
 
 http://oss.sgi.com/projects/xfs/faq.html#nulls
 
 and note that recent fixes have been made in this area (also noted in
 the faq)

Actually, continue reading past that specific entry to the next several,
 it covers all this quite well.

-Eric



Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Eric Sandeen
Justin Piszcz wrote:

 Why avoid XFS entirely?
 
 esandeen, any comments here?

Heh; well, it's the meme.

see:

http://oss.sgi.com/projects/xfs/faq.html#nulls

and note that recent fixes have been made in this area (also noted in
the faq)

Also - the above all assumes that when a drive says it's written/flushed
data, that it truly has.  Modern write-caching drives can wreak havoc
with any journaling filesystem, so that's one good reason for a UPS.  If
the drive claims to have metadata safe on disk but actually does not,
and you lose power, the data claimed safe will evaporate, there's not
much the fs can do.  IO write barriers address this by forcing the drive
to flush order-critical data before continuing; xfs has them on by
default, although they are tested at mount time and if you have
something in between xfs and the disks which does not support barriers
(i.e. lvm...) then they are disabled again, with a notice in the logs.

Note also that ext3 has the barrier option as well, but it is not
enabled by default due to performance concerns.  Barriers also affect
xfs performance, but enabling them in the non-battery-backed-write-cache
scenario is the right thing to do for filesystem integrity.
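For ext3 the barrier option is set per mount; a sketch (placeholder device and mount point), assuming the kernel and the underlying device actually honour barriers:

  # /etc/fstab
  /dev/md2   /data   ext3   defaults,barrier=1   0  2

  # or enable it on an already-mounted filesystem:
  mount -o remount,barrier=1 /data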

-Eric

 Justin.
 



Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread maximilian attems
On Mon, Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:
 Problem: on reboot, the I get an error message:
 
 root (hd0,1)  (Moshe comment: as expected)
 Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
 kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
 
 Error 15: File not found
 

error 15 is a *grub* error.

grub is known for its dislike of xfs, so with this whole setup use ext3,
rerun grub-install and you should be fine.
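After switching /boot to ext3, re-installing grub on each member of the mirror would look roughly like this (placeholder devices):

  grub-install /dev/sda
  grub-install /dev/sdb    # so the machine can still boot if the first disk dies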


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

 I've managed to get myself into a little problem.

 Since power hits were taking out the /boot partition, I decided to split 
 /boot out of root. Working from my emergency partition,  I copied all files 
 from /root, re-partitioned what had been /root into room for /boot and 
 /root, and then created the drives. This left me with /dev/md/boot, 
 /dev/md/root, and /dev/md/base (everything else).

 I modified mdadm.conf on the emergency partition, used update-initramfs to 
 make certain that the new md drives would be recognized, and rebooted. This 
 worked as expected.

 I then mounted all the entire new file system on a mount point, copied the 
 mdadm.conf to that point, did a chroot to that point, and did an 
 update-initramfs so that the non-emergency partition would have the updated 
 mdadm.conf. This worked -- but with complaints about missing the file 
 /proc/modules (which is not present under chroot). If I use the -v option I 
 can see the raid456, raid1, etc. modules loading.

 I modified menu.lst to make certain that boot=/dev/md/boot, ran grub 
 (thanks, Robin!) successfully.

 Problem: on reboot, the I get an error message:

 root (hd0,1)  (Moshe comment: as expected)
 Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
 kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro
---^^

Are you sure that's right?  Looks like a typo to me.

Cheers,
Robin
-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

Robin Hill wrote:


File not found at that point would suggest it can't find the kernel
file.  The path here should be relative to the root of the partition
/boot is on, so if your /boot is its own partition then you should
either use kernel /vmlinuz or (the more usual solution from what
I've seen) make sure there's a symlink:
ln -s . /boot/boot


Robin,

Thanks very much! ln -s . /boot/boot works to get past this problem.

Now it fails in a different section and complains that it can't find
/sbin/init. I'm at the (initramfs) prompt, which I don't recall ever
seeing before. I can't mount /dev/md/root on any mount point (invalid
arguments, even though I'm not supplying any). I've checked /dev/md/root
and it does work as expected when I try mounting it from my
emergency partition, and it does contain /sbin/init and the other files
and mount points for /var, /boot, /tmp, etc.

So this leads me to the question of why /sbin isn't being seen. /sbin is
on the device /dev/md/root, and /etc/fstab specifically mounts it at /.
I would have thought /boot would look at an internal copy of /etc/fstab. Is
this another side effect of putting /boot on its own partition?


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 Blessed are the peacemakers,
  for they shall be mowed down in the crossfire.
-- Michael Flynn


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

maximilian attems wrote:


error 15 is an *grub* error.

grub is known for it's dislike of xfs, so with this whole setup use ext3
rerun grub-install and you should be fine.


I should mention that something *did* change. When attempting to use 
XFS, grub would give me a note about 18 partitions used (I forget the 
exact language). This was different than I'd remembered; when I switched 
back to using reiserfs, grub reports using 19 partitions.


So there's something definitely interesting about XFS and booting.

As an additional note, if I use the grub boot-time commands to edit root 
to read, e.g., root=/dev/sda2 or root=/dev/sdb2, I get the same Error 15 
error message.


It may be that this is just a grub-and-reiserfs issue, but I suspect
that grub has a genuine complaint about the file system and what's on
the partitions.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
 If, after hearing my songs, just one human being is inspired to
  say something nasty to a friend, it will all have been worthwhile.
-- Tom Lehrer


Re: RAID needs more to survive a power hit, different /boot layout for example (was Re: draft howto on making raids for surviving a disk crash)

2008-02-04 Thread Justin Piszcz



On Mon, 4 Feb 2008, Michael Tokarev wrote:


Eric Sandeen wrote:
[]

http://oss.sgi.com/projects/xfs/faq.html#nulls

and note that recent fixes have been made in this area (also noted in
the faq)

Also - the above all assumes that when a drive says it's written/flushed
data, that it truly has.  Modern write-caching drives can wreak havoc
with any journaling filesystem, so that's one good reason for a UPS.  If


Unfortunately an UPS does not *really* help here.  Because unless
it has control program which properly shuts system down on the loss
of input power, and the battery really has the capacity to power the
system while it's shutting down (anyone tested this?  With new UPS?
and after an year of use, when the battery is not new?), -- unless
the UPS actually has the capacity to shutdown system, it will cut
the power at an unexpected time, while the disk(s) still has dirty
caches...
If you use nut and a large enough UPS to handle the load of the system, it 
shuts the machine down just fine.





the drive claims to have metadata safe on disk but actually does not,
and you lose power, the data claimed safe will evaporate, there's not
much the fs can do.  IO write barriers address this by forcing the drive
to flush order-critical data before continuing; xfs has them on by
default, although they are tested at mount time and if you have
something in between xfs and the disks which does not support barriers
(i.e. lvm...) then they are disabled again, with a notice in the logs.


Note also that with linux software raid barriers are NOT supported.

/mjt





Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Robin Hill
On Mon Feb 04, 2008 at 02:59:44PM -0600, Moshe Yudkowsky wrote:

 I've managed to get myself into a little problem.

 Since power hits were taking out the /boot partition, I decided to split 
 /boot out of root. Working from my emergency partition,  I copied all files 
 from /root, re-partitioned what had been /root into room for /boot and 
 /root, and then created the drives. This left me with /dev/md/boot, 
 /dev/md/root, and /dev/md/base (everything else).

 I modified mdadm.conf on the emergency partition, used update-initramfs to 
 make certain that the new md drives would be recognized, and rebooted. This 
 worked as expected.

 I then mounted all the entire new file system on a mount point, copied the 
 mdadm.conf to that point, did a chroot to that point, and did an 
 update-initramfs so that the non-emergency partition would have the updated 
 mdadm.conf. This worked -- but with complaints about missing the file 
 /proc/modules (which is not present under chroot). If I use the -v option I 
 can see the raid456, raid1, etc. modules loading.

 I modified menu.lst to make certain that boot=/dev/md/boot, ran grub 
 (thanks, Robin!) successfully.

 Problem: on reboot, the I get an error message:

 root (hd0,1)  (Moshe comment: as expected)
 Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
 kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro

 Error 15: File not found

File not found at that point would suggest it can't find the kernel
file.  The path here should be relative to the root of the partition
/boot is on, so if your /boot is its own partition then you should
either use kernel /vmlinuz or (the more usual solution from what
I've seen) make sure there's a symlink:
ln -s . /boot/boot
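Putting that together, a menu.lst entry for a separate /boot partition would look roughly like this (kernel version and devices are placeholders); note that root= on the kernel line names the root filesystem, not the boot partition:

  title  Linux (sketch)
  root   (hd0,1)
  kernel /vmlinuz-2.6.xx-amd64 root=/dev/md/root ro
  initrd /initrd.img-2.6.xx-amd64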

HTH,
Robin
-- 
 ___
( ' } |   Robin Hill[EMAIL PROTECTED] |
   / / )  | Little Jim says |
  // !!   |  He fallen in de water !! |




when is a disk non-fresh?

2008-02-04 Thread Dexter Filmore
Seems the other topic wasn't quite clear...
Occasionally a disk is kicked for being non-fresh - what does this mean and 
what causes it?

Dex



-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+++ L+++ E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h++ r* y?
--END GEEK CODE BLOCK--

http://www.vorratsdatenspeicherung.de


using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

I've managed to get myself into a little problem.

Since power hits were taking out the /boot partition, I decided to split 
/boot out of root. Working from my emergency partition,  I copied all 
files from /root, re-partitioned what had been /root into room for /boot 
and /root, and then created the drives. This left me with /dev/md/boot, 
/dev/md/root, and /dev/md/base (everything else).


I modified mdadm.conf on the emergency partition, used update-initramfs 
to make certain that the new md drives would be recognized, and 
rebooted. This worked as expected.


I then mounted the entire new file system on a mount point, copied 
the mdadm.conf to that point, did a chroot to that point, and ran 
update-initramfs so that the non-emergency partition would have the 
updated mdadm.conf. This worked -- but with complaints about missing the 
file /proc/modules (which is not present under chroot). If I use the -v 
option I can see the raid456, raid1, etc. modules loading.


I modified menu.lst to make certain that boot=/dev/md/boot, ran grub 
(thanks, Robin!) successfully.


Problem: on reboot, I get an error message:

root (hd0,1)  (Moshe comment: as expected)
Filesystem type is xfs, partition type 0xfd (Moshe comment: as expected)
kernel /boot/vmliuz-etc.-amd64 root=/dev/md/boot ro

Error 15: File not found

Did I miss something? I'm pretty certain this is the procedure I used 
before. The XFS module is being loaded by update-initramfs, so unless 
there's a reason that I can't boot md from a /boot partition with the 
XFS file system, I don't understand what the problem is.


Comments welcome -- I'm wedged!


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
Many that live deserve death. And some that die deserve life. Can you 
give it to

them? Then do not be too eager to deal out death in judgement. For even the
wise cannot see all ends.
-- Gandalf (J.R.R. Tolkien)


Re: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-04 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote:
 
 [EMAIL PROTECTED]:/# cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
 [multipath] [faulty]
 md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
   1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] 
 [_]
 
 unused devices: none
 
 ##
 But how can I see the status of the reshaping?
 Is it really reshaping? Or has it just hung? Or is mdadm perhaps not doing
 anything at all?
 How long should I wait for the reshape to finish?
 ##
 

The reshape hasn't restarted.

Did you do that mdadm -w /dev/md1 like I suggested?  If so, what
happened?

Possibly you tried mounting the filesystem before trying the mdadm
-w.  There seems to be a bug such that doing this would cause the
reshape not to restart, and mdadm -w would not help any more.

I suggest you:

  echo 0 > /sys/module/md_mod/parameters/start_ro

stop the array 
  mdadm -S /dev/md1
(after unmounting if necessary).

Then assemble the array again.
Then
  mdadm -w /dev/md1

just to be sure.

If this doesn't work, please report exactly what you did, exactly what
message you got and exactly where the message appeared in the kernel log.
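To answer the how-do-I-see-the-status part of the question: once the reshape is actually running, progress is visible in /proc/mdstat and in sysfs; a sketch, using md1 as in the report above:

  cat /proc/mdstat                        # shows a "reshape = NN.N%" line while it is running
  cat /sys/block/md1/md/sync_action       # "reshape" while running, "idle" otherwise
  cat /sys/block/md1/md/sync_completed    # sectors done / sectors total
  watch -n 60 cat /proc/mdstat            # poll it to confirm the numbers are moving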

NeilBrown


Re: using update-initramfs: how to get new mdadm.conf into the /boot? Or is it XFS?

2008-02-04 Thread Moshe Yudkowsky

I wrote:

Now it's failed in a different section and complains that it can't find 
/sbin/init. I'm at the (initramfs) prompt, which I don't ever recall 
seeing before. I can't  mount /dev/md/root on any mount points (invalid 
arguments even though I'm not supplying any). I've checked /dev/md/root 
and it does work as expected when I try mounting it while in my 
emergency partition, and it does contain /sbin/init and the other files 
and mount points for /var, /boot, /tmp, etc.


So this leads me to the question of why /sbin isn't being seen. /sbin is 
on the device /dev/md/root, and /etc/fstab specifically mounts it at /. 
 I would think /boot would look at an internal copy of /etc/fstab. Is 
this another side effect of using /boot on its own partition?


The answer: I managed to make a mistake in the grub configuration, in 
/boot/grub/menu.lst. I'd changed root= from /dev/md/root to 
/dev/md/boot -- but it really needs to point at the *root* location, which 
does not change, not at the boot location, which is not relevant here.


--
Moshe Yudkowsky * [EMAIL PROTECTED] * www.pobox.com/~moshe
The central tenet of Buddhism is not 'Every man for himself.'
-- Wanda


Re: when is a disk non-fresh?

2008-02-04 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote:
 Seems the other topic wasn't quite clear...

not necessarily.  sometimes it helps to repeat your question.  there
is a lot of noise on the internet and sometimes important things get
missed... :-)

 Occasionally a disk is kicked for being non-fresh - what does this mean and 
 what causes it?

The 'event' count is too small.
Every event that happens on an array causes the event count to be
incremented.
If the event counts on different devices differ by more than 1, then
the device with the smaller count is considered 'non-fresh'.

You need to look to the kernel logs of when the array was previously
shut down to figure out why it is now non-fresh.
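A sketch of how to compare the event counts described here (placeholder devices):

  mdadm --examine /dev/sda1 | grep -i events
  mdadm --examine /dev/sdb1 | grep -i events    # the member with the lower count is the non-fresh one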

NeilBrown


 