Grub vs Lilo

2006-07-26 Thread Lewis Shobbrook
Hi All,

Wondering if anyone can comment on an easy way to get grub to update all
components in a raid1 array.  I have a raid1 /boot with a raid10 /root and
have previously used lilo with the raid-extra-boot option to install to the
boot sectors of all component devices.  With grub it appears that you can
only update non-default devices via the command line.  I like being able to
type lilo and have them all updated in one hit.  Is there a way to do this
with grub?

Cheers,

Lewis


Re: host based mirror distance in a fc-based SAN environment

2006-07-26 Thread David Greaves
Stefan Majer wrote:
 Hi,
 
 I'm curious whether there are any numbers on the distance up to which it is
 possible to mirror (raid1) two FC LUNs. We have two datacenters with an
 effective distance of 11 km. The fabrics in one datacenter are connected to
 the fabrics in the other datacenter by 5 dark fibres, each about 11 km long.

 I want to set up servers which mirror their LUNs across the SAN-boxen in
 both datacenters. On top of this mirrored LUN I put lvm2.

 So the question is: does anybody have numbers on the distance up to which
 this method works?

No. But have a look at man mdadm in a recent mdadm release:

   -W, --write-mostly
 subsequent devices listed in a --build, --create,  or  --add
 command  will  be flagged as 'write-mostly'.  This is valid
 for RAID1 only and means that the 'md'  driver  will  avoid
 reading from these devices if at all possible.  This can be
 useful if mirroring over a slow link.

   --write-behind=
 Specify that write-behind mode should be enabled (valid for
 RAID1  only).  If an argument is specified, it will set the
 maximum number of outstanding writes allowed.  The  default
 value  is  256.  A write-intent bitmap is required in order
 to  use  write-behind  mode,  and  write-behind   is   only
 attempted on drives marked as write-mostly.

Which suggests that the WAN/LAN latency shouldn't impact you except on
failure.
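
For illustration, a minimal sketch of how the two options combine on a RAID1
over a slow link (device names are placeholders, and the write-behind value
is just the default, not a recommendation):

   # local LUN first; the remote LUN is flagged write-mostly so reads stay
   # local, and the internal write-intent bitmap enables write-behind
   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
         --bitmap=internal --write-behind=256 \
         /dev/sda1 --write-mostly /dev/sdb1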

HTH

David



Re: Grub vs Lilo

2006-07-26 Thread Jason Lunz
[EMAIL PROTECTED] said:
 Wondering if anyone can comment on an easy way to get grub to update
 all components in a raid1 array.  I have a raid1 /boot with a raid10
 /root and have previously used lilo with the raid-extra-boot option to
 install to the boot sectors of all component devices.  With grub it
 appears that you can only update non-default devices via the command
 line.  I like being able to type lilo and have them all updated in one
 hit.  Is there a way to do this with grub?

assuming your /boot is made of hda1 and hdc1:

grub-install /dev/hda1
grub-install /dev/hdc1

Jason



Re: Grub vs Lilo

2006-07-26 Thread Michael Tokarev
Jason Lunz wrote:
 [EMAIL PROTECTED] said:
 Wondering if anyone can comment on an easy way to get grub to update
 all components in a raid1 array.  I have a raid1 /boot with a raid10
 /root and have previously used lilo with the raid-extra-boot option to
 install to the boot sectors of all component devices.  With grub it
 appears that you can only update non-default devices via the command
 line.  I like being able to type lilo and have them all updated in one
 hit.  Is there a way to do this with grub?
 
 assuming your /boot is made of hda1 and hdc1:
 
 grub-install /dev/hda1
 grub-install /dev/hdc1

Don't do that.
If your hda dies and you try to boot off hdc instead (which will then be
hda), grub will try to read hdc, which is gone, and will fail.

Most of the time (unless the bootloader is really smart and understands
mirroring in full - lilo and grub do not) you want to have THE SAME boot
code on all of your disks (two, or more in the case of 3- or 4-disk
mirrors), including the BIOS disk codes.  After the above two commands,
grub will write code that boots from disk 0x80 to hda, and code that boots
from disk 0x81 (or 0x82) to hdc.  So when your hdc becomes hda, it will not
boot.

In order to solve all this, you have to write a diskmap file and run
grub-install twice.  Both times, the diskmap should list 0x80 for the device
to which you're installing grub.

I don't remember the syntax of the diskmap file (or even if it's really
called 'diskmap'), but assuming hda and hdc notation, I mean the following:

  echo /dev/hda 0x80 > /boot/grub/diskmap
  grub-install /dev/hda1

  echo /dev/hdc 0x80 > /boot/grub/diskmap   # overwrite it!
  grub-install /dev/hdc1
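
For what it's worth, GRUB legacy normally calls this file /boot/grub/device.map
and uses (hd0)-style names rather than hex codes, so an untested sketch of the
same trick in that notation (device names are examples only) would be:

  for d in /dev/hda /dev/hdc; do
      # map whichever disk we are installing to as the BIOS boot disk (hd0)
      echo "(hd0) $d" > /boot/grub/device.map
      grub-install ${d}1
  done

Either way the point is the same: each disk's boot code must believe it is
disk 0x80.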

The thing with all this "my RAID devices work, it is really simple!" stuff
is: for too many people it indeed works, so they think it's the good and
correct way.  But it only works up to the actual failure, which, in most
setups, isn't tested.  And once something fails, umm...  Jason, try to
remove your hda (pretend it has failed) and boot off hdc to see what I mean
;)  (Well yes, a rescue disk will help in that case... hopefully.  But that
is not RAID, which, when installed properly, will really make a disk
failure transparent.)

/mjt


Re: Grub vs Lilo

2006-07-26 Thread Bernd Rieke

Michael Tokarev wrote on 26.07.2006 20:00:


 The thing with all this "my RAID devices work, it is really simple!" stuff
 is: for too many people it indeed works, so they think it's the good and
 correct way.  But it only works up to the actual failure, which, in most
 setups, isn't tested.  And once something fails, umm...  Jason, try to
 remove your hda (pretend it has failed) and boot off hdc to see what I
 mean ;)  (Well yes, a rescue disk will help in that case... hopefully.
 But that is not RAID, which, when installed properly, will really make a
 disk failure transparent.)
 /mjt

Yes Michael, you're right. We use a simple RAID1 config with swap and / on
three SCSI disks (two working, one hot-spare) on SuSE 9.3 systems. We had to
use lilo to handle booting off any of the two (three) disks. But we had
problem after problem until lilo 22.7 came out. With that version of lilo
we can pull any disk in any scenario, and the box boots in every case.

We wondered why, when we asked the groups while in trouble with lilo before
22.7, we got no response at all. OK, the RAID driver and the kernel worked
fine while resyncing the spare after a disk failure (thanks to Neil Brown
for that). But if a box had to be rebooted with a failed disk, the
situation got worse. And you have to reboot, because hotplug still doesn't
work. But nobody seems to care, or nobody apart from us has these
problems...

We tested the setup again and again until we found a stable setup which
works in _any_ case. OK, we're still missing hotplugging (it seems to be
solved for aic79 in 2.6.17; we're testing). But when we tried to discuss
these problems (one half of the raid devices go offline on the controller
where the hotplugging occurs), there was no response either.

So we came to the conclusion that everybody is working on RAID but nobody
cares about the things around it, just as you mentioned - thanks for that.

Bernd Rieke



Re: Grub vs Lilo

2006-07-26 Thread Michael Tokarev
Bernd Rieke wrote:
 Michael Tokarev wrote on 26.07.2006 20:00:
 .
 .
  The thing with all this "my RAID devices work, it is really simple!"
  stuff is: for too many people it indeed works, so they think it's the
  good and correct way.  But it only works up to the actual failure, which,
  in most setups, isn't tested.  And once something fails, umm...  Jason,
  try to remove your hda (pretend it has failed) and boot off hdc to see
  what I mean ;)  (Well yes, a rescue disk will help in that case...
  hopefully.  But that is not RAID, which, when installed properly, will
  really make a disk failure transparent.)
  /mjt
 
 Yes Michael, you're right. We use a simple RAID1 config with swap and / on
 three SCSI disks (two working, one hot-spare) on SuSE 9.3 systems. We had
 to use lilo to handle booting off any of the two (three) disks. But we had
 problem after problem until lilo 22.7 came out. With that version of lilo
 we can pull any disk in any scenario, and the box boots in every case.

Well, a lot of systems here work with root-on-raid1 and lilo-2.2.4 (the
Debian package), and grub.  By "works" I mean they really work, i.e. a disk
failure doesn't prevent the system from working and (re)booting flawlessly
(provided the disk is really dead, as opposed to present but failing to
read (some) data - in which case the only way out is either to remove it
physically or to choose another boot device in the BIOS.  But that's an
entirely different story, about the (non-existent) really smart boot loader
I mentioned in my previous email).

The trick is to set the system up properly.  The simple/obvious way
(installing grub to hda1 and hdc1) doesn't work when you remove hda, but
the more involved way does.

Also, I wouldn't let LILO do more guesswork for me (like the raid-extra-boot
stuff, or whatever comes with 22.7 - to be honest, I didn't look at it at
all, as the Debian package of 2.2.4 (or 22.4?) works for me just fine).
Just write the damn thing into the start of mdN (and let the raid code
replicate it to all drives, regardless of how many of them there are),
after working out that it's really partition number X (with offset Y) on a
real disk, and use BIOS code 0x80 for all disk access.  That's all.
The rest - like ensuring all the (boot) partitions are at the same place on
every disk, that the disk geometry is the same, etc. - is my duty, and I do
it carefully, because I want the disks to be interchangeable.
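
To make that concrete, a sketch of the lilo.conf fragment implied above
(device names and labels are examples, not a tested config):

  boot=/dev/md0        # write the boot sector to the md device itself;
                       # the raid1 code replicates it to every member disk
  root=/dev/md1
  image=/vmlinuz
      label=linux
      read-only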

 We wondered why, when we asked the groups while in trouble with lilo
 before 22.7, we got no response at all. OK, the RAID driver and the kernel
 worked fine while resyncing the spare after a disk failure (thanks to Neil
 Brown for that). But if a box had to be rebooted with a failed disk, the
 situation got worse. And you have to reboot, because hotplug still doesn't
 work. But nobody seems to care, or nobody apart from us has these
 problems...

Just curious - when/where you asked?
[]
 So we came to the conclusion that everybody is working on RAID but nobody
 cares about the things around it, just as you mentioned - thanks for that.

I tend to disagree.  My statement above refers to the simple advice
sometimes given here and elsewhere - "do this and that, it worked for me" -
by users who didn't do their homework, who never tested the stuff, who
sometimes just have no idea HOW to test (hopefully that's not an insulting
statement - I don't blame them for their lack of knowledge, it's something
that isn't really cheap to acquire, after all).  The majority of users are
of this sort, and they follow each other's advice, again without testing.
HOWTOs get written by such users as well (as someone mentioned to me in a
private email in response to my reply).

I mean, the existing software works.  It really works.  The only thing left
is to set it up correctly.

And please, PLEASE, don't treat all this as blaming bad users.  It's not.
I learned this stuff the hard way too.  After having unbootable remote
machines after a disk failure, when everything had seemed to be ok.  After
screwing up systems using the famous linux raid autodetect stuff everyone
loves: after I replaced a failed disk with another one which -- bad me --
was part of another raid array on another system, the box chose to assemble
THAT array instead of its own, and overwrote a good disk with data from the
new disk that had been in a testing machine.  And so on.  All that is to
say: it's easy to make a mistake and treat the resulting setup as a good
one, until shit starts happening.  But shit happens very rarely, compared
to average system usage, so you may never find out that your setup is
wrong, and of course you will go on telling others how to do things... :)

/mjt


Re: host based mirror distance in a fc-based SAN environment

2006-07-26 Thread Luca Berra

On Wed, Jul 26, 2006 at 07:58:09AM +0200, Stefan Majer wrote:

Hi,

I'm curious whether there are any numbers on the distance up to which it is
possible to mirror (raid1) two FC LUNs. We have two datacenters with an
effective distance of 11 km. The fabrics in one datacenter are connected to
the fabrics in the other datacenter by 5 dark fibres, each about 11 km long.


As you probably already know, with LX (1310 nm) GBICs and single-mode fiber
you can reach a theoretical limit of about 50 km, and you can double that
using 1550 nm lasers (ZX?).


I want to set up servers which mirror their LUNs across the SAN-boxen in
both datacenters. On top of this mirrored LUN I put lvm2.

So the question is: does anybody have numbers on the distance up to which
this method works?


The method is independent of the distance: if your FC hardware can do it,
then you can.  The only thing you should consider (and it is not directly
related to distance) is the bandwidth you have between the two sites (I
mean the number of systems that might be sharing those 5 fibers).


Regards,
L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media  Services S.r.l.


Re: [PATCH] md: new bitmap sysfs interface

2006-07-26 Thread Mike Snitzer

On 7/25/06, Paul Clements [EMAIL PROTECTED] wrote:

This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface
that allows the bitmap of an array to be dirtied. The interface is
write-only, and is used as follows:

echo 1000 > /sys/block/md2/md/bitmap

(dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
bitmaps of array md2)

echo 1000-2000 > /sys/block/md1/md/bitmap

(dirty the bits for chunks 1000-2000 in md1's bitmap)

This is useful, for example, in cluster environments where you may need
to combine two disjoint bitmaps into one (following a server failure,
after a secondary server has taken over the array). By combining the
bitmaps on the two servers, a full resync can be avoided (this was
discussed on the list back on March 18, 2005, in the "[PATCH 1/2] md bitmap
bug fixes" thread).


Hi Paul,

I tracked down the thread you referenced and these posts (by you)
seem to summarize things well:
http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2

But for clarity's sake, could you elaborate on the negative
implications of not merging the bitmaps on the secondary server?  Will
the previous primary's dirty blocks get dropped on the floor because
the secondary (now the primary) doesn't have awareness of the previous
primary's dirty blocks once it activates the raid1?

Also, what is the interface one should use to collect dirty bits from
the primary's bitmap?

This bitmap merge can't happen until the primary's dirty bits can be
collected, right?  Waiting for the failed server to come back so we can
harvest the dirty bits it has seems wrong (why fail over at all?), so I
must be missing something.

please advise, thanks.
Mike


ICH7R strip size

2006-07-26 Thread Jeff Woods

Howdy.

I realize this list is more focused on Linux software RAID than 
proprietary RAID controllers, but I've been Googling and can't find 
an answer that I suspect someone on this list probably knows. If 
anyone can suggest a better forum for this question or someplace 
where the answer is documented, that would be great.


Intel's ICH7R firmware allows creating volumes in various RAID modes where
the "strip size" (which I believe to be a misspelling of "stripe size") can
be certain multiples of 16KB.


IIUC, a RAID stripe is composed of chunks spanning two or more 
drives. For example, a four disc RAID-5 array with 64KB chunks would 
yield a 256KB stripe with 192KB of usable storage per stripe.


I'm trying to optimize I/O performance for a database with 64KB 
blocks. I understand that RAID-0 multiplies the likelihood of a 
failure but that's only a nuisance and not a problem in my situation.


My question: What are the chunk and usable storage sizes per stripe 
for four discs in RAID-0 on an ICH7R configured for a 128KB strip?


Thank you very much.

--
Jeff Woods [EMAIL PROTECTED] 




Re: [PATCH] md: new bitmap sysfs interface

2006-07-26 Thread Paul Clements

Mike Snitzer wrote:


I tracked down the thread you referenced and these posts (by you)
seem to summarize things well:
http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2

But for clarity's sake, could you elaborate on the negative
implications of not merging the bitmaps on the secondary server?  Will
the previous primary's dirty blocks get dropped on the floor because
the secondary (now the primary) doesn't have awareness of the previous
primary's dirty blocks once it activates the raid1?


Right. At the time of the failover, there were (probably) blocks that 
were out of sync between the primary and secondary. Now, after you've 
failed over to the secondary, you've got to overwrite those blocks with 
data from the secondary in order to make the primary disk consistent 
again. This requires that either you do a full resync from secondary to 
primary (if you don't know what differs), or you merge the two bitmaps 
and resync just that data.



Also, what is the interface one should use to collect dirty bits from
the primary's bitmap?


Whatever you'd like. scp the bitmap file over or collect the ranges into 
a file and scp that over, or something similar.



This bitmap merge can't happen until the primary's dirty bits can be
collected right?  Waiting for the failed server to come back to


Right. So, when the primary fails, you start the array on the secondary 
with a _clean_ bitmap, and just its local disk component. Now, whatever 
gets written while the primary is down gets put into the bitmap on the 
secondary. When the primary comes back up, you take the dirty bits from 
it and add them into the secondary's bitmap. Then, you insert the 
primary's disk (via nbd or similar) back into the array, and begin a 
resync.


That's the whole reason for this interface. We have to modify the bitmap 
while the array is active (modifying the bitmap while the array is down 
is trivial, and certainly doesn't require sysfs :).



harvest the dirty bits it has seems wrong (why failover at all?); so I
must be missing something.


We fail over immediately. We wait until later to combine the bitmaps and 
resync the data.
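
As a rough sketch of that sequence on the secondary (assuming the old
primary's dirty chunk ranges have already been extracted into a plain text
file - how to extract them is the open question in this thread - and that
its disk comes back via nbd):

  # hypothetical file: one chunk or chunk range per line, e.g. "1000-2000"
  while read range; do
      echo "$range" > /sys/block/md1/md/bitmap    # dirty those chunks here
  done < /tmp/old-primary-dirty-ranges

  # re-insert the old primary's disk and resync only what the bitmap marks
  mdadm /dev/md1 --re-add /dev/nbd0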


Hope that helps.

--
Paul


Re: [PATCH] md: new bitmap sysfs interface

2006-07-26 Thread Mike Snitzer

On 7/26/06, Paul Clements [EMAIL PROTECTED] wrote:

Mike Snitzer wrote:

 I tracked down the thread you referenced and these posts (by you)
 seem to summarize things well:
 http://marc.theaimsgroup.com/?l=linux-raid&m=16563016418&w=2
 http://marc.theaimsgroup.com/?l=linux-raid&m=17515400864&w=2

 But for clarity's sake, could you elaborate on the negative
 implications of not merging the bitmaps on the secondary server?  Will
 the previous primary's dirty blocks get dropped on the floor because
 the secondary (now the primary) doesn't have awareness of the previous
 primary's dirty blocks once it activates the raid1?

Right. At the time of the failover, there were (probably) blocks that
were out of sync between the primary and secondary. Now, after you've
failed over to the secondary, you've got to overwrite those blocks with
data from the secondary in order to make the primary disk consistent
again. This requires that either you do a full resync from secondary to
primary (if you don't know what differs), or you merge the two bitmaps
and resync just that data.


I took more time to read the later posts in the original thread; that
coupled with your detailed response has helped a lot. thanks.


 Also, what is the interface one should use to collect dirty bits from
 the primary's bitmap?

Whatever you'd like. scp the bitmap file over or collect the ranges into
a file and scp that over, or something similar.


OK, so regardless of whether you are using an external or internal
bitmap, how does one collect the ranges from an array's bitmap?

Generally speaking, I think others would have the same (naive) question,
given that we need to know what to use as input for the sysfs interface
you've kindly provided.  If it is left as an exercise to the user that is
fine; I'd imagine neilb will have our backs with a nifty new mdadm flag if
need be.

thanks again,
Mike


Re: ICH7R strip size

2006-07-26 Thread Mark Hahn
 My question: What are the chunk and usable storage sizes per stripe for
 four discs in RAID-0 on an ICH7R configured for a 128KB strip?

raid0 always has 100% usable storage; configuring it is deciding how much
concurrency you want.

If your writes are 64K and you have 4 disks, your max concurrency would be
at 64K per disk (the stripe size in MD terms), or 256K for a whole stripe.
If you want max bandwidth but a concurrency of 1, you want the 64K write to
keep all the disks busy, i.e. a 16K stripe size (64K whole-stripe).
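
For comparison, a rough sketch of those two cases as md software raid0
(--chunk is per-disk and in KB; device names are placeholders):

  # max concurrency: each 64K write fits in a single 64K chunk on one disk
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=64 /dev/sd[abcd]1

  # max bandwidth for one 64K write: 16K per disk, 64K whole stripe
  mdadm --create /dev/md1 --level=0 --raid-devices=4 --chunk=16 /dev/sd[abcd]1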

With raid5, the main additional factor is to try to arrange "blind"
whole-stripe writes if possible.  That is, you pay the read-modify-write
penalty unless you write a whole stripe at once: (n-1) * stripe size if you
can manage it aligned, or at least several whole stripes to amortize the
reads needed at the edges...