Re: Journaling FS and RAID

2000-06-28 Thread Benno Senoner

Krzysztof SIEROTA wrote:


 
 As far as I know the issue has been fixed in 2.4.* kernel series.
 ReiserFS and software RAID5 is NOT safe in 2.2.*

 Chris

Hi,

But Stephen Tweedie pointed out some time ago that the only way to make a software RAID system survive a power failure without data corruption while in degraded mode (this case is rare, but it COULD happen) is to make a big RAID5 partition where you store the data and a small RAID1 partition where you keep the journal of the RAID5 filesystem.

He said ext3fs can be adapted for this; what is the current status?

Regarding ReiserFS and RAID5: does ReiserFS allow putting the journal on a different partition (e.g. a RAID1 device)?


If not, please consider it, because people want to run software RAID5 arrays expecting the same reliability as hardware ones.
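
For reference, here is a sketch of what an external-journal setup could look like with ext3's own tools (the -O journal_dev / -J device= options come from later e2fsprogs releases, and the md device names are only placeholders):

  # /dev/md1: small RAID1 that holds only the journal
  # /dev/md0: big RAID5 that holds the data
  mke2fs -b 4096 -O journal_dev /dev/md1          # format md1 as an external journal device
  mke2fs -b 4096 -j -J device=/dev/md1 /dev/md0   # create ext3 on md0 with its journal on md1
  # the external journal must use the same block size as the filesystem, hence the -b 4096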

Last question: are the current ext3 and ReiserFS safe during RAID reconstruction?

Thanks for the info.

(Waiting for the moment when I will be able to power-cycle a 100GB journaled soft-RAID5 array and have it come back up within a few seconds, instead of dozens of minutes or even hours of fsck.
:-) )

PS: do you think we will see that before the end of the year ?

Benno.






Re: RAID1 on IDE

2000-05-13 Thread Benno Senoner

There is a possibility that the slave dies if the master goes down, so the availability benefit of RAID1 goes away in this situation.

What's still left is RAID1 data protection: even if your box stops or does not boot anymore, your data should still be OK, since only one of the two disks failed. Therefore adding a fresh disk and reconstructing the array will work just fine.

But after hearing all the discussions about the drawbacks (both from a reliability and a performance point of view) of IDE RAID in master/slave configurations, I would never use that kind of config.

If you need more IDE channels in your box, add a Promise card or a 3ware card, which supports up to 8 IDE channels (8 disks in a master-only configuration).

Benno.



Edward Schernau wrote:

 If I have a RAID1 set on a single IDE channel, i.e. master & slave,
 will the box keep running if a drive goes down?
 --
 Edward Schernau,   mailto:[EMAIL PROTECTED]
 Network Architect  http://www.schernau.com
 RC5-64#: 243249 e-gold acct #:131897




3WARE IDE cards questions and thoughts ..

2000-04-28 Thread Benno Senoner

Hi,
I went to the 3WARE site.

The 8-IDE-channel version is really nice, and cheap too.
:-)

I guess they do not support master/slave configurations, for performance and reliability reasons.
(At least I am assuming this, because they say up to 8 drives and there are 8 connectors.)

I noticed that they do not support UDMA/66 (at least the PDF says so).

Do you think that the impact is negligible when there is only one drive per channel?
(At least I think there are not that many EIDE disks which can sustain 33MB/sec all the time.)

The load (in terms of bandwidth) generated on the bus/CPU by multiple (8) disks is quite high IMHO; 66MB/sec * 8 (even if every drive were able to deliver that kind of bandwidth) would likely saturate your mobo/CPU, shifting the bottleneck from the disks to the memory/CPU subsystem.

BTW, do you need a 2.3.x kernel to work with these 3ware monsters, or are there patches backported to 2.2.x floating around?

Benno.






Re: RAID-0 -> RAID-5

2000-04-28 Thread Benno Senoner

Jakob Østergaard wrote:

 On Thu, 27 Apr 2000, Mika Kuoppala wrote:

 [snip]
 
  I think Jakob Østergaard has made raid-reconf utility
  which you can use to grow raid0 arrays. But i think
  i didn't support converting from raid0 to raid5. Or
  perhaps it alraeady does =? :)

 It doesn't (yet).  And unfortunately, with examns coming
 up, it's not likely that it will for the next month or
 two.

 I haven't abandoned the idea though.  With the new raid in 2.4
 the demand for such a utility will be even greater.

Sorry, I am not very up to date: do you plan both growing the individual partitions and resizing by adding more disks?

That would be too cool: having, for example, a 4-disk soft-RAID5 array and, when you run out of space, adding one more disk and letting the resize tool recalculate all the parities etc. in order to take the fifth disk into account.

Is that possible from a practical POV?

Benno.





Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?

2000-01-14 Thread Benno Senoner

Chris Wedgwood wrote:

  In the power+disk failure case, there is a very narrow window in which
  parity may be incorrect, so loss of the disk may result in inability to
  correctly restore the lost data.

 For some people, this very narrow window may still be a problem.
 Especially when you consider the case of a disk failing because of a
 power surge -- which also kills a drive.

  This may affect data which was not being written at the time of the
  crash.  Only raid 5 is affected.

 Long term -- if you journal to something outside the RAID5 array (ie.
 to raid-1 protected log disks) then you should be safe against this
 type of failure?

 -cw

Wow, really good idea to journal to a RAID1 array!

Do you think it is possible to do the following:

- N disks holding a soft RAID5 array;
- reserve a small partition on at least 2 disks of the array to hold a RAID1 array;
- keep the journal on this partition.

Do you think this will be possible?
Is ext3 / ReiserFS capable of keeping the journal on a different partition than the one holding the FS?

That would really be great!
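
For concreteness, a sketch of that layout in raidtools 0.90 raidtab terms (disk names, partition sizes, and chunk sizes are only placeholders):

  # /etc/raidtab sketch: data on a RAID5, journal on a small RAID1
  raiddev /dev/md0                  # the big data array
      raid-level              5
      nr-raid-disks           3
      nr-spare-disks          0
      persistent-superblock   1
      chunk-size              32
      device                  /dev/hda2
      raid-disk               0
      device                  /dev/hdb2
      raid-disk               1
      device                  /dev/hdc2
      raid-disk               2

  raiddev /dev/md1                  # small RAID1 from spare partitions, holds the journal
      raid-level              1
      nr-raid-disks           2
      nr-spare-disks          0
      persistent-superblock   1
      chunk-size              4
      device                  /dev/hda1
      raid-disk               0
      device                  /dev/hdb1
      raid-disk               1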

Benno.




Re: large ide raid system

2000-01-13 Thread Benno Senoner

Thomas Davis wrote:

 My 4way IDE based, 2 channels (ie, master/slave, master/slave) built
 using IBM 16gb Ultra33 drives in RAID0 are capable of about 25mb/sec
 across the raid.

Nice to hear :-) Not a very big performance degradation.



 Adding a Promise 66 card, changing to all masters, got the numbers up
 into the 30's range (I don't have them at the moment.. hmm..)

  I was also wondering about the reliability of using slaves.
  Does anyone know about the likelihood of a single failed drive
  bringing down the whole master/slave pair?  Since I have tended to
  stay away from slaves, for performance reasons, I don't know
  how they influence reliability.  Maybe it's ok.
 

 When the slave fail, the master goes down.

 My experience has been, when _ANY_ IDE drive fails, it takes down the
 whole channel.  Master or slave.  The kernel just gives fits..

Hmm .. strange .. I have an old Pentium box, and I disconnected the slave and the RAID5 array continued to work after a TON of syslog messages.

Anyway, I agree that the master-only configuration is much more reliable from an electrical point of view.

I was wondering how many IDE channels Linux 2.2 can handle; can it handle 8 channels?

Would an Abit board with 4 channels + 2 Promise Ultra 66 cards work?
Or a normal BX mainboard (2 channels) + 3 Promise Ultra 66 cards?

Thanks for the info,

Benno.





Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?

2000-01-12 Thread Benno Senoner

James Manning wrote:

 [ Tuesday, January 11, 2000 ] Benno Senoner wrote:
  The problem is that power outages are unpredictable even in presence
  of UPSes therefore it is important to have some protection against
  power losses.

 I gotta ask dying power supply? cord getting ripped out?
 Most ppl run serial lines (of course :) and with powerd they
 get nice shutdowns :)

 Just wanna make sure I'm understanding you...

 James
 --
 Miscellaneous Engineer --- IBM Netfinity Performance Development

Yep, obviously the UPS has a serial line to shut down the machine nicely before a failure, but it happened to me that the serial cable was disconnected and the power outage lasted SEVERAL hours during a weekend, when no one was in the machine room (of an ISP).

You know Murphy's law ...
:-)

But I am mainly interested in power-failure protection in the case where you want to set up a workstation with a reliable disk array (soft RAID5) and do not always have a UPS handy.

You will lose the file that was being written, but the important thing is that the disk array remains in a safe state, just like a single disk + journaled FS.

Stephen Tweedie said that this is possible (by fixing the remaining races in the RAID code). If these problems get fixed at some point, then our fears of a corrupted soft-RAID array in the case of a power failure on a machine without a UPS will completely go away.

cheers,
Benno.







Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?

2000-01-12 Thread Benno Senoner

"Stephen C. Tweedie" wrote:

 Ideally, what I'd like to see the reconstruction code do is to:

 * lock a stripe
 * read a new copy of that stripe locally
 * recalc parity and write back whatever disks are necessary for the stripe
 * unlock the stripe

 so that the data never goes through the buffer cache at all, but that
 the stripe is locked with respect to other IOs going on below the level
 of ll_rw_block (remember there may be IOs coming in to ll_rw_block which
 are not from the buffer cache, eg. swap or journal IOs).

  We are '100% journal-safe' if power fails during resync.

 Except for the fact that resync isn't remotely journal-safe in the first
 place, yes.  :-)

 --Stephen

Sorry for my ignorance, but I got a little confused by this post: Ingo said we are 100% journal-safe; you said the contrary.

Can you or Ingo please explain in which situation (power loss) running linux-raid + a journaled FS we risk a corrupted filesystem?

I am interested in what happens if the power goes down while you are writing heavily to an ext3/ReiserFS (journaled FS) on a soft-RAID5 array.

After the reboot, if all disks remain physically intact, will we only lose the data that was being written, or is there a possibility of ending up with a corrupted filesystem which could cause more damage in the future?

(Or do we need to wait for the RAID code in 2.3?)

Sorry for re-asking that question, but I am still confused.

regards,
Benno.





Re: large ide raid system

2000-01-11 Thread Benno Senoner

Jan Edler wrote:

 On Mon, Jan 10, 2000 at 12:49:29PM -0800, Dan Hollis wrote:
  On Mon, 10 Jan 2000, Jan Edler wrote:
- Performance is really horrible if you use IDE slaves.
  Even though you say you aren't performance-sensitive, I'd
  recommend against it if possible.
 
  My tests indicate UDMA performs favorably with ultrascsi, at about 1/6 the
  cost. Cost is often a big factor.

 I wasn't advising against IDE, only against the use of slaves.
 With UDMA-33 or -66, masters work quite well,
 if you can deal with the other constraints that I mentioned
 (cable length, PCI slots, etc).

Do you have any numbers handy?

Will the performance of a master/slave setup be at least HALF of the master-only setup?

For some apps cost is really important, and software IDE RAID has a very low price per megabyte.
If the app doesn't need killer performance, then I think it is the best solution.

Now if we only had soft-RAID + journaled FS + power-failure safety right now ...

cheers,
Benno.





Re: soft RAID5 + journalled FS + power failure = problems ?

2000-01-11 Thread Benno Senoner

"Stephen C. Tweedie" wrote:

 Hi,

 On Fri, 07 Jan 2000 13:26:21 +0100, Benno Senoner [EMAIL PROTECTED]
 said:

  what happens when I run RAID5+ jornaled FS and the box is just writing
  data to the disk and then a power outage occurs ?

  Will this lead to a corrupted filesystem or will only the data which
  was just written, be lost ?

 It's more complex than that.  Right now, without any other changes, the
 main danger is that the raid code can sometimes lead to the filesystem's
 updates being sent to disk in the wrong order, so that on reboot, the
 journaling corrupts things unpredictably and silently.

 There is a second effect, which is that if the journaling code tries to
 prevent a buffer being written early by keeping its dirty bit clear,
 then raid can miscalculate parity by assuming that the buffer matches
 what is on disk, and that can actually cause damage to other data than
 the data being written if a disk dies and we have to start using parity
 for that stripe.

Do you know if using soft RAID5 + regular ext2 causes the same sort of damage, or if the corruption chances are lower when using a non-journaled FS?

Is the potential corruption caused by the RAID layer or by the FS layer?
(Does the FS code or the RAID code need to be fixed?)

If it's caused by the FS layer, how do XFS (not here yet ;-) ) or ReiserFS behave in this case?

cheers,
Benno.




 Both are fixable, but for now, be careful...

 --Stephen





Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?

2000-01-11 Thread Benno Senoner

"Stephen C. Tweedie" wrote:

(...)


 3) The soft-raid backround rebuild code reads and writes through the
buffer cache with no synchronisation at all with other fs activity.
After a crash, this background rebuild code will kill the
write-ordering attempts of any journalling filesystem.

This affects both ext3 and reiserfs, under both RAID-1 and RAID-5.

 Interaction 3) needs a bit more work from the raid core to fix, but it's
 still not that hard to do.

 So, can any of these problems affect other, non-journaled filesystems
 too?  Yes, 1) can: throughout the kernel there are places where buffers
 are modified before the dirty bits are set.  In such places we will
 always mark the buffers dirty soon, so the window in which an incorrect
 parity can be calculated is _very_ narrow (almost non-existant on
 non-SMP machines), and the window in which it will persist on disk is
 also very small.

 This is not a problem.  It is just another example of a race window
 which exists already with _all_ non-battery-backed RAID-5 systems (both
 software and hardware): even with perfect parity calculations, it is
 simply impossible to guarantee that an entire stripe update on RAID-5
 completes in a single, atomic operation.  If you write a single data
 block and its parity block to the RAID array, then on an unexpected
 reboot you will always have some risk that the parity will have been
 written, but not the data.  On a reboot, if you lose a disk then you can
 reconstruct it incorrectly due to the bogus parity.

 THIS IS EXPECTED.  RAID-5 isn't proof against multiple failures, and the
 only way you can get bitten by this failure mode is to have a system
 failure and a disk failure at the same time.



 --Stephen

Thank you very much for these clear explanations.

Last doubt :-) :
Assume all RAID code / FS interaction problems get fixed. Since a Linux soft-RAID5 box has no battery backup, does this mean that we will lose data ONLY if there is a power failure AND a subsequent disk failure?
If we lose the power and after the reboot all disks remain intact, can the RAID layer reconstruct all the information in a safe way?

The problem is that power outages are unpredictable even in the presence of UPSes; therefore it is important to have some protection against power losses.

regards,
Benno.






Re: Swap on Raid1 : safe on EIDE disks ? = no hangs ?

2000-01-07 Thread Benno Senoner

Luca Berra wrote:



 According to Stephen Tweedie the problem happens in both case,
 writes to the swap file do bypass the buffer cache in ANY case.

 only way to be safe is:
 do not use spare drives
 swapoff before raidhotadding
 replace swapon with something like
 while grep -q "^${1#/dev/} .* resync=" /proc/mdstat;do sleep 1;done && swapon $1
 (needs to be smarter than that, to support swapon -a and swap on file)

I was wondering whether this procedure (swapping on RAID1 + waiting for the resync) is safe on an IDE-only system.

I don't need hot-swapping; the only thing I need is that if one disk dies, swapping will not take the box down (hangs like in the SCSI case, etc.).

If this is safe I can easily live with the "wait for resync" issue.
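
A slightly fuller sketch of the wrapper Luca outlines (single device only; as he notes, a real version would also have to handle "swapon -a" and swap files):

  #!/bin/sh
  # safe-swapon (sketch): wait until the RAID1 swap device has finished
  # resyncing before enabling swap on it
  dev=$1
  while grep -q "^${dev#/dev/} .* resync=" /proc/mdstat; do
      sleep 1
  done
  swapon "$dev"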

regards,
Benno.




soft RAID5 + journalled FS + power failure = problems ?

2000-01-07 Thread Benno Senoner

Hi,
I was just thinking about the following:

There will soon be at least one stable journaled FS available for Linux out of the box.
Of course we want to run our soft-RAID5 arrays with journaling, in order to avoid large fscks and speed up the boot process.

My question:

What happens when I run RAID5 + a journaled FS and the box is writing data to the disk when a power outage occurs?

Will this lead to a corrupted filesystem, or will only the data which was just being written be lost?

I know that a single disk + journaled FS does not lead to a corrupted disk, but what about the RAID array case?

The Software-RAID HOWTO says:
"In particular, note that RAID is designed to protect against *disk* failures, and not against *power* failures or *operator* mistakes."

What happens if a newly written block was only committed to 1 disk out of the 4 disks present in an array (RAID5)?

Will this block be marked as free after the array resync, or will it lead to problems that leave the md device corrupted?

If the software RAID in Linux doesn't already guarantee this, could it be added with a technique similar to what a journaled FS does?

I am thinking about keeping a journal of the blocks committed to disk; if a power failure occurs, just wipe out these blocks and make them free again.

A journaled FS on top of the RAID device will automatically avoid placing files in these "uncommitted disk blocks".

But since md is a "virtual device", the situation might be more complicated.

I think quite a few of us are interested in this "RAID reliability on power failure during disk writes" topic.

Thank you in advance for your explanations.

regards,
Benno.




Re: linux raid patch

1999-05-29 Thread Benno Senoner

John Barbee wrote:

 I'm running Redhat 6 which comes with the 2.2.5 kernel.

 Does that mean I should be using the raid0145-2.2.6 patch from ftp.kernel.org?

 How do I incorporate this patch into the kernel.  I couldn't find any
 directions.  The patch starts with a lot of +'s and -'s.  I presume those
 mark the differences between what the block_device.c file is and what it
 should be, but what is the proper way for me to update that file?

 john.

You don't need the patches:

the 2.2.5 kernel which comes with Red Hat 6.0 already has these patches, and the raidtools-0.90 RPM is included too.

The RAID drivers are built as modules, so you can load them with insmod.

So if you want a root RAID device, you must rebuild your initrd and add the raid*.o modules.
regards,
Benno.
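
A sketch of that initrd rebuild on Red Hat 6.0 (the --with= option is how I remember Red Hat's mkinitrd pulling in extra modules, and the version strings are only examples; check your mkinitrd man page):

  # rebuild the initrd so the raid modules load before the root fs is mounted
  mkinitrd --with=raid1 --with=raid5 /boot/initrd-2.2.5-15.raid.img 2.2.5-15
  # point the initrd= line in /etc/lilo.conf at the new image, then rerun lilo
  lilo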




IDE: Master/Slave failure = entire channel down ?

1999-05-21 Thread Benno Senoner

Can one of you please explain whether a failure of the master or the slave drive causes both drives on the same IDE channel to become inaccessible?

I tested this with my Pentium board (Tyan Tomcat HX chipset) and had no problems while disconnecting either a master or a slave drive.
Does this happen only on certain IDE controllers?

regards,
Benno.




Re: installing root raid non-destructively

1999-05-15 Thread Benno Senoner

Luca Berra wrote:



  Third, (naive questions) if raid1 supports on-the-fly disk
  "reconstruction" why can't I simply add another identical disk alongside
  my present one, activate raid1 non-destructively and have disk2 be
  "reconstructed" as the mirror image of disk1?
 because linux-raid keeps a 4K raid superblock at the end of the partition
 so if you already created the filesystem you have no room for the superblock.
 i have not yet tested this, but you could get resize2fs (you need a partition
 magic license for that, shrink the filesystem by 4K, dd one disk over the
 other, then create a new raid.

I tested this some time ago (dd if=/dev/hda1 of=/dev/hdc1 in single-user mode); I did not shrink the ext2 fs, but it worked.

Will the superblock get corrupted when I use up my last 4K of space on the ext2 fs?

regards,
Benno.




Re: RAID and RedHat 6.0

1999-05-11 Thread Benno Senoner

Ingo Molnar wrote:

 On Sun, 9 May 1999, Giulio Botto wrote:

   I downloaded the latest version of the raidutils and compiled them but
   still the same error, is there something else I should have goten?
 
  My guess is the "latest" raidtools are already installed, the problem lies
  with the kernel: they probably put the mainstream kernel without the
  correct patches for raid [...]

 no, RedHat 6.0 has the newest (ie. most stable) RAID driver. The problem
 in the config probably is the missing 'persistent-superblock' and
 'chunksize' parameter.

I think Red Hat should compile all the MD driver options directly into the kernel; it will not take up very much memory. And if Red Hat puts the raidtools on the CD install image, people will be able to install a fresh Red Hat distribution onto a soft-RAID array, and even upgrade an older distribution sitting on a root RAID array.

This is an often-requested feature.
The installer should be able to recognize md devices and handle them properly; Disk Druid should allow a simple RAIDed setup too.
All this would lead to a very easy installation/upgrade procedure for Red Hat on a machine with a soft-RAID array.

comments ?

PS: is there a way to boot off an autodetected root RAID array using the kernel shipped with RH 6.0 (without recompiling), by generating a new initrd which loads the raid* modules at bootup? Or does the boot process need the raid* drivers before the initrd is started?

regards,
Benno.






 -- mingo



SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?

1999-04-27 Thread Benno Senoner

Hi,

My system is Red Hat 5.2 running Linux 2.2.6 + raid0145-19990421.

I tested whether the system is stable while swapping heavily.
I tested a regular swap area and a soft-RAID1 (2 disks) swap area.


So I wrote a little program which does basically the following:

allocate as many blocks of about 4MB as possible,

then begin to write data linearly to each block, displaying the write performance in MB/sec.
The program writes a dot on the screen after each 1024 bytes written.
NOW TO THE VERY STRANGE RESULTS:

1)
My first BIG QUESTION is whether there is a design flaw in malloc() or not:

when I compute (number of successfully allocated blocks) * 4MB,
I get 2GB of *SUCCESSFULLY* malloc()ed memory,
but my system has only about 100MB of virtual mem
(64MB RAM + 40MB swap).

How does the kernel hope to squeeze the allocated 2GB into 100MB of virtual mem?
:-)

Does anyone know why the kernel does not limit the maximum malloc()ed memory to the amount of RAM + swap?

Will this be changed in the future?


2)

At the beginning the program runs fine; when the RAM is used up, swapping activity begins, and the memory-write performance drops to about the write performance of the disk or RAID1 array.

Now the problem:
When all RAM + swap are used up, the system begins to freeze, and every 10-20 secs messages appear on the console like:

out of memory of syslog
out of memory of klogd
.
.

and after a while my swapstress program exits with a Bus Error.

Sometimes even "update" gets killed, or "init", which writes: PANIC SEGMENT VIOLATION !

After my swapstress program exits with "Bus Error" the system continues to work, but since init got killed, you cannot reboot or shut down the machine anymore.

Note that I ran swapstress as a normal user.
This means it's easy to crash Linux, or render it unstable, with heavy malloc()ing / swapping.

You can find my swapstress program at

http://www.gardena.net/benno/linux/swapstress.tgz

Please let me know your result (crash / lockup .. ?).

Comments please, especially from the kernel gurus!

regards,
Benno.




Re: SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?

1999-04-27 Thread Benno Senoner


Alvin Starr wrote:

 On Tue, 27 Apr 1999, Benno Senoner wrote:

  Hi,
 
  My system is a Redhat 5.2 running on
  linux 2.2.6 + raid0145-19990421.
 
  I tested if the system is stable while swapping heavily.
  I tested a regular swap area and a soft-RAID1 (2 disks) swaparea.
 
 
  So I wrote e little program wich does basically the following:
 
  allocate as much as possible blocks of about 4MB,
 
  then begin to write data in a linear way to each block,
  and displaying the write-performance in MB/sec.
  The program writes a dot on the screen after the write
  of each 1024bytes.
 
  NOW TO THE VERY STRANGE RESULTS :
 
  1)
  my frist BIG QUESTION is if there is a design flaw in malloc() or not:
 
  when I do (number of successfully allocated blocks)* 4MB
 
  then I get 2GB of *SUCCESSFULLY* malloc()ed memory,
  but my system has only about 100MB of virtual mem
  (64MB RAM + 40MB swap).

 I took a look at your program. It looks as if you are not using the memory
 that you malloc. If I remember correctly Linux will not allocate the
 memory that you have requested until you use it. This is a really handy
 feature when you have sparce arrays.

I know this fact;
without the fragment below, the program would do nothing.
Just run the program and see that it writes to memory; look at your HD LED.
:-)

 What you need to do is to have your
 program touch every page in the virtual memory allocated. If the page size
 is 4096 bytes you will have to write every 4096'th byte to insure that the
 page is brought into existance.

This is the fragment that writes to the memory; in particular, buffer[u]=u does the actual write:


buffer = block_arr[i];            /* the next 4MB block from the allocation loop */
for (u = 0; u < MYBUFSIZE; u++)
{
  buffer[u] = u;                  /* touch every byte so the page really gets allocated */
  if ((u & 1023) == 0) { printf("."); fflush(stdout); }   /* one dot per 1024 bytes written */
}





 
  Does anyone know why the kernels does not limit the maximum malloc()ed
  memory
  to the amount of RAM+SWAP ?

 This is a design choice. If you do this you will limit programms that use
 virtual memory in a way that is sparsly populated.

  Will this be changed in the future ?

 Other systems pre-commit memory and this can at times cause your system to
 stop allowing new processes to run up even though you are not using
 anywhere near the total of ram+swap avilable. The choice of using one
 allocation method or another may be a possible place for a kernel tuneing
 feature. But for me I like the current choice.

 
  2)
 
  At beginning the program runs fine and when the RAM is used up,
  swapping activity begins, and the mem-write performance drops
  to about the write performance of the disk or RAID1 array.
 
  Now the problem:
  When all RAM + SWAP are used up, then the system begin to freeze,
  and every 10-20secs , there appear messages on the console
  like:
 
  out of memory of syslog
  out of memory of klogd
  .
  .
 
  and after a while my swapstress program exits with a Bus Error.
 
  Sometimes even "update" get killed, or "init"
  which writes: PANIC SEGMENT VIOLATION !
 
  after the exit of with "Bus Error" of my swapstress program the system
  continues to work, but since init got killed , you can not reboot or
  shutdown the machine anymore.
 
  Note that I ran swapstress as normal user.
  This means it's easy to crash/render unstable Linux with heavy
  malloc()ing
   / swapping.

 resource exhaustion can cause a number of problems one of which is system
 crashes. One solution is to limit the per user memory maximum so that a
 single user cannot burn up all the system memory but that still will not
 stop the problem.

 One possible answer is for the kernel to allways spare some swap space for
 tasks running as root and to suspend any user tasks that request memory
 when the swap limit is reached. the creation of new user processes should
 also be suspended when this limit is reached. at this point an
 administrator would be able to login to the system and kill the offending
 processes or take some other remedial action.

I agree, root processes should have some spare resources.



 Alvin Starr   ||   voice: (416)585-9971
 Interlink Connectivity||   fax:   (416)585-9974
 [EMAIL PROTECTED]  ||



Re: SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?

1999-04-27 Thread Benno Senoner

David Guo wrote:

 Hi.
 If you read the document of the raid. You'll know swap on raid is not safe.
 And you don't have any reason to use swap on raid. Because kernel handles
 the swap on different disk will not be worse then raid.
 I think you can checkout the docs with raid.

 Yours David.

Not true:
the document is pretty outdated; older RAID patches crashed easily with my program
("free_page not on list" ... and so on).

But with the latest patches + 2.2.6, swapping on a soft-RAID1 swap area gives me the same behaviour as swapping on a regular disk
(no more free_page errors).

Therefore it's a kernel design issue.

ciao
Benno.





Re: SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?

1999-04-27 Thread Benno Senoner

[EMAIL PROTECTED] wrote:

 I don't see the relevance to linux-raid either, but the 2.2.x kernel does
 have /proc/sys/vm/overcommit_memory which will enable the below behaviour.
 It's off by default though...

Thanks, I will try this /proc setting.
I am using a standard RH 5.2 box with a 2.2.6 kernel + 0145 raid patches,
but it is strange: if the /proc setting is off by default, why is it on on my machine?
I don't think RH 5.2 specifically turns on the switch, because the init scripts are for kernel 2.0.36.
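
Trying it amounts to something like this (run as root; I have not checked what the individual values mean on 2.2, so treat the echo as an example only):

  cat /proc/sys/vm/overcommit_memory        # see the current setting
  echo 0 > /proc/sys/vm/overcommit_memory   # change it (see the kernel docs for the value semantics)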

The relevance to linux-raid is the following:

I was testing the stability of swapping over soft-RAID1, and got the malloc problem as a "side effect" ...
:-)

Earlier RAID patches crashed the machine because of the free_page problems (memory starvation);
the newer patches gave me the same results while swapping on soft-RAID1 as while swapping on a regular swap area.
regards,
Benno.




auto-partition new blank hot-added disk

1999-04-26 Thread Benno Senoner


 This brings up another question, partitioning.  The above (I don't think)
 would work currently
 anyway due to the disks having to be partitioned out first (correct?).  How
 hard would it be
 to have the raid code itself write the required partition information and
 whatever it requires
 to get a raw disk (which was marked as a hot spare) up and running?

I am more interested in the idea of automatically repartitioning a new blank disk while it is hot-added.

A simple solution would be:

assume that all disks in the array are partitioned in the same way,

and assume you have a command like

myraidhotadd /dev/md0 /dev/hdb

which does the following:

- scans /etc/raidtab and sees which disks are part of the md0 device
- chooses a non-failed disk, for example /dev/hda
- reads the partition table from hda using "fdisk /dev/hda < inputcommands > outputfile"
- parses the contents of outputfile (the partition table of hda), invokes fdisk on hdb, and recreates the exact same partition table (perhaps first deleting any partitions found on hdb)
- finally runs raidhotadd to add the disk to the array

The fun thing is that you can write this program with little effort, because it is basically only text parsing and calling external programs.

If I find some free time I will write such a script and post it here.

Or am I reinventing the wheel?
:-)
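
A minimal sketch of such a wrapper, using sfdisk's dump format instead of driving fdisk interactively (the use of sfdisk and the assumption that the array lives on the first partition of each disk are mine; untested):

  #!/bin/sh
  # myraidhotadd (sketch): clone the partition table of a healthy array member
  # onto a freshly connected blank disk, then hot-add its partition.
  # usage: myraidhotadd <md-device> <new-blank-disk> <healthy-disk>
  #   e.g. myraidhotadd /dev/md0 /dev/hdb /dev/hda
  MD=$1 NEW=$2 GOOD=$3
  sfdisk -d "$GOOD" | sfdisk "$NEW"     # copy the partition layout onto the new disk
  raidhotadd "$MD" "${NEW}1"            # hot-add its first partition to the array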

ciao
Benno.






Re: auto-partition new blank hot-added disk

1999-04-26 Thread Benno Senoner

Ingo Molnar wrote:

 On Mon, 26 Apr 1999, Benno Senoner wrote:

  I am interested more in the idea of automatically repartition a new blank disk
  while it is hot-added.

 no need to do this in the kernel (or even in raidtools). I use such
 scripts to 'mass-create' partitioned disks:

That's OK,

but isn't it unsafe to overwrite the partition table of disks which are actually part of a soft-RAID array and in use?

I think for hot-adding it would be better to use a script which takes as a parameter the disk you want to partition, and partitions only that disk.

Another idea would be reading the partition table of a non-failed disk that is part of the array and recreating that partitioning on the new disk.

A third possibility would be creating a nice command-line tool where you give the following information:

the disks which you want to include in the array
and how much space of each disk you want to use for the array.

The tool then partitions all the disks and writes a sort of "template" script for fdisk, so that you can write a wrapper for raidhotadd which first calls this script before doing the hot-add.

Opinions?

regards,
Benno.





 [root@moon root]# cat dobigsd

 if [ "$#" -ne "1" ]; then
echo 'sample usage: dobigsd sda'
exit -1
 fi

 echo "*** DESTROYING /dev/$1 in 5 seconds!!! ***"
 sleep 5
 dd if=/dev/zero of=/dev/$1 bs=1024k count=1
 (for N in `cat domanydisks`; do echo $N; done) | fdisk /dev/$1

 [root@moon root]# cat domanydisks
 n e 1   1 200
  n l 1 25
  n l 26 50
  n l 51 75
  n l 76 100
  n l 101 125
  n l 126 150
  n l 151 175
  n l 176 200
  n p 2 300 350
  n p 3 350 400
  n p 4 450 500
 t 2 86
 t 3 83
 t 4 83
 t 5 83
 t 6 83
 t 7 83
 t 8 83
 t 9 83
 t 10 83
 t 11 83
 t 12 83
 w

 thats all, fdisk is happy to be put into scripts.

 -- mingo



Re: auto-partition new blank hot-added disk

1999-04-26 Thread Benno Senoner

Ingo Molnar wrote:

 On Mon, 26 Apr 1999, Benno Senoner wrote:

   no need to do this in the kernel (or even in raidtools). I use such
   scripts to 'mass-create' partitioned disks:
 
  but it's not unsafe to overwrite the partition-table of disks which are
  actually part of a soft-raid array and in use ?

 it's unsafe, and thus the kernel does not allow it at all. Why dont you
 create the partitions before hot-adding the disk?

 -- mingo

OK, I misunderstood your script:

it acts on only one disk; I thought it partitioned your whole array.

I will try to write a RAID init tool which automates the task of creating the partitions for the array and, when you hot-add a new disk, partitions the blank disk before the actual hot-add.


PS: does someone know why the Linux disk driver blocks for so long during heavy disk writes?

This is very bad for playing low-latency audio (50-100ms):

all processes block for up to 2-3 secs during syncs of the buffer cache.

Is there something planned for the future to avoid this?


regards,
Benno.




Re: Hot Swap

1999-04-23 Thread Benno Senoner

Helge Hafting wrote:

 [question on how to hot-swap IDE]

 I have an IDE drive that I can disconnect and reconnect with no
 problems.  It is not on a RAID though.
 The controller is nothing special, a ultra-dma thing
 built into the shuttle motherboard.

 The trick seems to be unloading the IDE kernel module before
 reconnecting.  It probes the drive just fine upon reloading.
 Unfortunately you can't do that with a IDE root file system.

 Helge Hafting

I did some hot-swap tests with IDE drives successfully.

Normally the IDE driver is compiled statically into the kernel, and even if you compile it as a module in the 2.2.x kernels, you cannot unload it because it is busy when the root fs is on an IDE disk.

The kernel probes the attached disks at boot time, so it has a static table of the disks which are present in your machine.

I tested the following:

a soft-RAID5 array consisting of 3 disks: hda1, hdb1, hdc1.

During heavy writes on the RAID array, I disconnected the power of hdb, for example.
At this point the kernel sees that the device does not respond and writes many messages to the syslog.
The RAID layer detects the error condition, marks the device as faulty
(you can see this in /proc/mdstat), and continues in degraded mode.

At this point you must issue:

raidhotremove /dev/md0 /dev/hdb1

which removes the hdb1 device from the array
(again, check your /proc/mdstat).

Now you can reconnect the disk.

If it's a new blank disk, you have to repartition it like the old one, creating hdb1 for example; you can do this on the fly without a reboot (it worked for me, though I know there are disks where you must reboot to ensure that the new partition table is read properly).

Now simply type:

raidhotadd /dev/md0 /dev/hdb1

et voilà: the device is added to the array, hot reconstruction begins in the background, and in /proc/mdstat you can see the progress of the reconstruction.
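
Condensed, the whole sequence from the test above (the device names are just the ones from my setup):

  cat /proc/mdstat                    # hdb1 shows up marked (F) after the failure
  raidhotremove /dev/md0 /dev/hdb1    # drop the failed member from the array
  # reconnect or replace the disk, recreate hdb1 with fdisk (partition type 0xFD)
  raidhotadd /dev/md0 /dev/hdb1       # re-add it; reconstruction runs in the background
  cat /proc/mdstat                    # watch the rebuild progress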

regards,
Benno.




Re: Swap on RAID

1999-04-15 Thread Benno Senoner


Stephen C. Tweedie wrote:

 Hi,

 On Wed, 14 Apr 1999 21:59:49 +0100 (BST), A James Lewis
[EMAIL PROTECTED]
 said:

  It wasn't a month ago that this was not possible because it needed
to
  allocate memory for the raid and couldn't because it needed to swap
to
  do it?  Was I imagining this or have you guys been working too hard!


 There may well have been a few possible deadlocks, but the current
 kswapd code is pretty careful to avoid them.  Things should be OK.

 --Stephen

So what must one use to swap over a soft-RAID1 partition:

a standard 2.2.x kernel, or a 2.2.x kernel patched with the latest alpha RAID patches?

I want to use the alpha patches because the raid0145 driver has hot reconstruction.

Can one of you please explain exactly which patches can do swap over soft-RAID1?

Thanks,

regards,
Benno.





raid patches for linux 2.2.4 available ?

1999-03-28 Thread Benno Senoner

Hi,
do you know if the RAID alpha patches are already available for kernel 2.2.4?
At www.us.kernel.org/ the latest I found is for 2.2.3.

Or can I apply the raid-2.2.3 patch to a 2.2.4 kernel?

Thank you for the info.

regards,
Benno.




most RAID crashproof setup = BOOT FROM FLOPPY DISK

1999-01-29 Thread Benno Senoner

Hello,

I was looking for the most crash-proof setup of a software root RAID array.

My conclusion:

Assume one wants to set up a machine with a root RAID5 array. The problem is booting the kernel: since LILO uses the BIOS routines, the kernel must reside on a standard (non-software-RAID) partition, within the usual 1024 cylinders, etc.

Assume I set up LILO to load the kernel off the first disk (where the /boot dir resides too):
when the first disk crashes, the system won't boot anymore.


Solution:

I use Red Hat 5.2. I prepared a bootdisk which contains a 2.0.36+raid0145 kernel plus the initrd image; then, using partition id 0xFD on the hard drives, the kernel autodetects the RAID array, starts the root /dev/md0 device, and boots the system.

So even if the first disk crashes, the system is still functional.

My recommendations:

make 2-3 copies of the bootdisk, and buy a spare 3.5-inch floppy drive (very cheap), in case the floppy drive breaks.
:-)
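
A sketch of how the bootdisk and its copies can be produced (mkbootdisk is Red Hat's helper for putting the running kernel plus its initrd on a floppy; the option spelling and image name are from memory, so check your man page):

  mkbootdisk --device /dev/fd0 2.0.36          # build the boot floppy for the raid kernel
  dd if=/dev/fd0 of=/root/raid-bootdisk.img    # keep an image of it around
  dd if=/root/raid-bootdisk.img of=/dev/fd0    # write it onto each spare floppy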

Yes, I know it is not very elegant to boot off a floppy drive, but it saves a lot of trouble.

Or are there better methods to boot a pure software RAID5 array reliably?

Any comments?

PS: do you know if switching the power of the computer on/off with the floppy disk in the floppy drive can accidentally (through electromagnetic discharges) erase the data on the floppy disk?
(This is the reason why I recommend making some copies of the bootdisk.)

best regards,
Benno.




Re: hot-add features of alpha-code IDE drives , is this possible ?

1999-01-29 Thread Benno Senoner

m. allan noah wrote:

 the eide electrical interface does not support hot-swap. i have done
 hot-remove many times, but not hot-add. seems like a nice way to lose a
 drive.

I tested the following:
Red Hat 5.2, kernel 2.0.36 + raid0145 + raidtools 0.90.

A RAID5 setup consisting of 3 partitions: /dev/hda1, /dev/hda2, /dev/hdc1
(yes, the first 2 partitions are physically on the same drive, but at testing time I had only 2 drives :-) ).

I wrote a script that copies many files to the RAID5 array; at some point I disconnected the data cable of the hdc drive (but not the power cable) (secondary master).
The kernel then gave many errors like "eide: IntrCmd" and then the usual I/O errors on /dev/hdc1.
The RAID code spat out many debugging messages, plus a strange
"md: raid bug found in line 666" (it calls the procedure MD_BUG);
what does this mean?

The kernel disabled the drive: in /proc/mdstat I can see /dev/hdc1[0] (F).

raidhotremove /dev/md0 /dev/hdc1 works without problems.
I reconnected the hdc drive, ran
fdisk /dev/hdc
and the kernel told me: ide1: reset.

I removed the hdc1 partition, recreated it with id 0xFD, and then left fdisk.

raidhotadd /dev/md0 /dev/hdc1

The kernel began to reconstruct the parity on the disk, and all seemed OK.


But I am very curious whether one can really damage an EIDE drive doing hot-adds.

As you know, the price/performance of the latest EIDE drives is very good, especially if you look at the IBM Deskstar 16GB or the upcoming IBM 24GB drives.

Using 4 drives you can build a huge RAID array for a very competitive price, and the performance is not so bad (the IBM 16GB drive does a nice 12MB/sec read transfer).

Yes, SCSI is superior to IDE, but software RAID adds much reliability to these drives.

The maximum would be having (reliable) hot-swap even on IDE drives, but you say it is not possible in a reliable manner.
:-(

Do you know precisely what factor can damage the IDE drives?


Thank you for the info,

best regards,

Benno.




Re: most RAID crashproof setup = BOOT FROM FLOPPY DISK

1999-01-29 Thread Benno Senoner


 In this case the bootfloppy has the advantage that if it is corrupt you can
 quickly replace it with a new.

 This is why you'd have entries for all the disks in the lilo config files.
 You'd use lablel like Linux_d1, Linux_d2 ... or whatever. If loading the
 image from the first disk fails, you just have to enter the label of
 another kernel at the lilo prompt.

Hmm, you are right, the floppy has a security problem (but the customer in my case IS root, and there are no additional users).

I forgot the fact that you can set up multiple labels.

But do you think there is a possibility that the disk gets corrupted at a point where LILO begins loading itself (prior to loading the kernel) and then stops due to a disk I/O error?
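
For reference, the multi-label scheme Martin describes would look roughly like this in /etc/lilo.conf (paths, devices and labels are only placeholders):

  boot=/dev/hda              # LILO itself still installs to the first disk
  image=/boot/vmlinuz        # kernel copy on the first disk
      label=Linux_d1
      root=/dev/md0
      read-only
  image=/boot2/vmlinuz       # kernel copy kept on the second disk, mounted at /boot2
      label=Linux_d2
      root=/dev/md0
      read-only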

Thank you again for your info,

best regards,
Benno.




 Plus there is another additional work when a disk fails,using your solution:
 
 you have to set up the /bootn dir and rerun lilo.

 Since you must repartition and raidhotadd the new disk anyway I don't
 consider this to be much of a problem.

 i plan to install software-RAID Linux PCs, where the user is a joe-average
 user, so I think for my purposes I think I will consider the boot-floppy
 option.

 I don't particularely like booting off floppy for several reasons:

 * security problems - by booting with some sort of rescue disk you can get
 around any security on the system.

 * I've had more problems with dust-clogged or otherwise inoperative floppy
 disks than with failing harddisks - admittedly, this may not be typical.

 * additional single point of failure. If the floppy drive or controller
 breaks, you're down till you get a replacement.

 Bye, Martin
 --
  Martin Bene   vox: +43-664-3251047
  simon media   fax: +43-316-813824-6
  Andreas-Hofer-Platz 9 e-mail: [EMAIL PROTECTED]
  8010 Graz, Austria
 --
 finger [EMAIL PROTECTED] for PGP public key