Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-17 Thread Hans Reiser
Vitaly Fertman wrote:


Ok, so the reiserfs kernel code detects an error on disk, what does it
do?  Print out an error message, maybe BUG?  There is an error field
in the reiserfs superblock, I hope it is set when the kernel detects
something bad.

So, now what happens?  Maybe the user doesn't read their syslog and
doesn't see the error, or the error is just a prelude to memory corruption
which causes the system to crash.  When the system boots again, it goes
on its merry way, mounting the reiserfs filesystem with _known_ errors
on it, using bad allocation bitmaps, directory btrees, etc. and maybe
double allocating blocks or overwriting blocks from other files causing
them to become corrupt, etc, etc, etc.  Until finally the filesystem is
totally corrupt, the system crashes miserably, the user emails this list
and reiserfsck has an impossible job trying to fix the filesystem.

Instead, what I propose is to have reiserfsck -a AS A STARTING POINT
simply check for a valid reiserfs superblock and the absence of the
error flag before declaring the filesystem clean and allowing the
system to boot.
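Below is a minimal C sketch of that starting point. It assumes a simplified in-memory view of the superblock; struct sb_view, fsck_auto_quick and the fs_state encoding are hypothetical and are not reiserfsck code. As far as I know, the magic strings are the ones reiserfs 3.x writes. The return values follow the conventional fsck(8) exit codes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Conventional fsck(8) exit codes. */
#define FSCK_OK              0
#define FSCK_ERRORS_LEFT     4
#define FSCK_OPERATIONAL_ERR 8

/* Hypothetical, simplified superblock view: only the fields this check
 * needs, NOT the real on-disk reiserfs layout. */
struct sb_view {
    char     magic[10];   /* e.g. "ReIsEr2Fs" */
    uint16_t fs_state;    /* 0 = valid, nonzero = kernel recorded an error */
};

#define SB_STATE_VALID 0

static int magic_ok(const char *m)
{
    static const char *const known[] = { "ReIsErFs", "ReIsEr2Fs", "ReIsEr3Fs" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strncmp(m, known[i], sizeof ((struct sb_view *)0)->magic) == 0)
            return 1;
    return 0;
}

/* "reiserfsck -a" as a starting point: no tree walk at all, just a valid
 * superblock and the absence of the error flag. */
int fsck_auto_quick(const struct sb_view *sb)
{
    assert(sb != NULL);
    if (!magic_ok(sb->magic))
        return FSCK_OPERATIONAL_ERR;   /* not a reiserfs superblock */
    if (sb->fs_state != SB_STATE_VALID)
        return FSCK_ERRORS_LEFT;       /* refuse to declare it clean */
    return FSCK_OK;
}
```

A nonzero return stops the boot scripts from mounting the filesystem silently. That is exactly the behaviour being asked for.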

What's even worse, the reiserfs_read_super code (at least in the 2.4.18 RH
kernel) OVERWRITES the superblock error status at mount time, making it
worse than useless, since each mount hides any errors that were detected
before the crash:

	s->u.reiserfs_sb.s_mount_state = SB_REISERFS_STATE(s);
	s->u.reiserfs_sb.s_mount_state = REISERFS_VALID_FS ;
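
The small C sketch below shows why those two lines lose information. All names in it are hypothetical (struct sb_info, mount_buggy, mount_keep_state), and FS_VALID/FS_ERROR are illustrative values, not copied from the reiserfs headers. Like the complaint above, it assumes that the in-memory mount state is what gets written back at a clean unmount.

```c
#include <assert.h>

/* Illustrative values only; the real constants live in the reiserfs headers. */
#define FS_VALID 1
#define FS_ERROR 2

struct sb_info {
    int disk_state;    /* state field as stored in the on-disk superblock */
    int mount_state;   /* in-memory copy written back at clean unmount */
};

/* The quoted pattern: the second assignment discards the on-disk state. */
void mount_buggy(struct sb_info *s)
{
    assert(s);
    s->mount_state = s->disk_state;
    s->mount_state = FS_VALID;
}

/* Sketch of a fix: keep the on-disk state so an earlier error survives the
 * mount. Returns nonzero if the caller should warn the admin. */
int mount_keep_state(struct sb_info *s)
{
    assert(s);
    s->mount_state = s->disk_state;
    return s->mount_state == FS_ERROR;
}

/* Called when the kernel detects corruption while mounted. */
void record_error(struct sb_info *s)
{
    assert(s);
    s->mount_state = FS_ERROR;
}

/* A clean unmount writes the remembered state back to disk. */
void umount_clean(struct sb_info *s)
{
    assert(s);
    s->disk_state = s->mount_state;
}
```

With mount_buggy, a mount followed by a clean unmount turns an errored filesystem back into a "valid" one. With mount_keep_state, the error survives until fsck clears it.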
 

Andreas seems reasonable, Vitaly, what are your thoughts?

   

Next, add journal replay to reiserfsck if it isn't already there,
 

Why, when it is in the kernel?
   

Because that is the next stage to allowing reiserfsck to do checks on the
filesystem after a crash.  Do you tell me you would rather (and you
must, because it obviously currently does) have reiserfsck just throw
away everything in the journal, leaving possibly inconsistent data in
the filesystem for it to check?  Or maybe make the user mount the
filesystem (which obviously has problems or they wouldn't be running
reiserfsck to do a full check) just to clear out the journal and maybe
risk crashing or corruption if the filesystem is strangely corrupted?
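
Here is a rough C sketch of what journal replay inside fsck would have to do. It assumes a hypothetical in-memory journal in which each transaction carries a descriptor id and a commit id. struct jtrans and journal_replay are made-up names, and the real reiserfs journal layout is more involved. Only transactions whose commit matches their descriptor are copied to their home blocks, and replay stops at the first incomplete one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLKSZ   8    /* tiny block size, for illustration only */
#define NBLOCKS 16

/* Hypothetical journal record: replayable only if the commit record
 * carries the same transaction id as the descriptor. */
struct jtrans {
    unsigned desc_id;
    unsigned commit_id;
    size_t   nblocks;
    unsigned target[4];          /* home block numbers */
    char     data[4][BLKSZ];     /* logged block contents */
};

/* Replays committed transactions in order. Returns how many were replayed. */
size_t journal_replay(char disk[NBLOCKS][BLKSZ],
                      const struct jtrans *entries, size_t n)
{
    size_t replayed = 0;
    for (size_t i = 0; i < n; i++) {
        const struct jtrans *t = &entries[i];
        if (t->desc_id != t->commit_id)
            break;                       /* incomplete: trust nothing after it */
        assert(t->nblocks <= 4);
        for (size_t b = 0; b < t->nblocks; b++) {
            assert(t->target[b] < NBLOCKS);
            memcpy(disk[t->target[b]], t->data[b], BLKSZ);
        }
        replayed++;
    }
    return replayed;
}
```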
 

Vitaly, answer this.
   


Ok, so probably we should make the following changes. The kernel sets IO_ERROR
and FS_ERROR flags.
In the case of IO_ERROR, reiserfsck prints a message about hardware problems
and returns an error, so the fs does not get mounted at boot. On an attempt to
mount the fs with the IO_ERROR flag set, it is mounted ro with a message about
hardware problems. When you are sure the problems have disappeared, you can mount
it with a special option that clears this flag, and probably reiserfstune will
get an option to clear these flags too.
In the case of FS_ERROR (search_by_key failed, an access beyond the end of the
device, or similar), reiserfsck gets the -a option at boot, replays the journal
if needed and checks for the flag. No flag: it returns OK. Otherwise it runs
fix-fixable. If errors are left, it returns 'errors left uncorrected' and the fs
does not get mounted at boot. On an attempt to mount the fs with the flag set,
the kernel just prints a message about mounting the fs with errors and mounts it.
Not ro here, as the kernel does not do a deep analysis of errors and it could be
just a small insignificant error.
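
The proposal can be sketched in C as two small decision functions. The flag bits, the function names and the fix_ok parameter are hypothetical. The return values are the conventional fsck(8) exit codes. After a successful fix-fixable pass the sketch returns 1 ("errors corrected"), which boot scripts normally treat as success.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_IO_ERROR 0x1   /* hypothetical bit values */
#define FLAG_FS_ERROR 0x2

#define FSCK_OK          0  /* conventional fsck(8) exit codes */
#define FSCK_CORRECTED   1
#define FSCK_ERRORS_LEFT 4

enum mount_mode { MOUNT_RW, MOUNT_RO };

/* Boot-time "reiserfsck -a" decision. Journal replay is assumed to have
 * happened already. fix_ok stands in for the outcome of a fix-fixable
 * pass, passed in so the sketch stays self-contained. */
int fsck_at_boot(unsigned flags, bool fix_ok)
{
    if (flags & FLAG_IO_ERROR)
        return FSCK_ERRORS_LEFT;   /* report hardware trouble; do not mount */
    if (!(flags & FLAG_FS_ERROR))
        return FSCK_OK;            /* no flag: clean */
    return fix_ok ? FSCK_CORRECTED : FSCK_ERRORS_LEFT;
}

/* Kernel mount decision. clear_io_error models the (hypothetical) special
 * option that clears IO_ERROR once the hardware is trusted again. */
enum mount_mode choose_mount_mode(unsigned flags, bool clear_io_error)
{
    if ((flags & FLAG_IO_ERROR) && !clear_io_error)
        return MOUNT_RO;   /* warn about hardware, mount read-only */
    return MOUNT_RW;       /* FS_ERROR alone: warn, but mount rw */
}
```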

 

Sounds good to me.  Do it.  Reiser4 also.

--
Hans





Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-17 Thread Andreas Dilger
On Feb 14, 2003  22:19 +0300, Hans Reiser wrote:
 Andreas Dilger wrote:
  You are well aware
 that the e2fsck check intervals can be tuned per-filesystem and even
 disabled if desired (it prints options for how to do this at mke2fs time
 and is clearly documented for the experienced user).  For a boot-once-a-day
 machine, the default is to check about once a month (at most 6 months for
 the time check), and if machines are crashing more often, then they should
 probably be checked more often because _something_ has to be causing crashes.
 
 The idea that how often you boot determines how often it checks is just 
 silly, sorry.

I guess the shortcoming in the ext2 case is that it counts mounts and
not crashes.  If it were counting the number of times the filesystem
was uncleanly shut down instead of normal shutdowns, would that be more
acceptable?  The reason I'm still interested in crashes, even if they
are not filesystem-related crashes, is because there had to be _something_
which caused a crash (bad code, bad hardware, whatever), and once you have
any driver corrupting memory the chance that it is also corrupting filesystem
memory exists.
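
The C sketch below contrasts the two policies. Its hypothetical struct has four fields that loosely mirror ext2's s_mnt_count, s_max_mnt_count, s_lastcheck and s_checkinterval. The fields unclean_count and max_unclean are invented for the crash-counting variant described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

struct check_state {
    int    mnt_count, max_mnt_count;   /* loosely ext2-like */
    time_t lastcheck, checkinterval;   /* loosely ext2-like */
    int    unclean_count, max_unclean; /* hypothetical crash counter */
};

/* Classic rule: check after N mounts or once the interval has elapsed.
 * A non-positive limit disables that test. */
bool need_check_by_mounts(const struct check_state *s, time_t now)
{
    if (s->max_mnt_count > 0 && s->mnt_count >= s->max_mnt_count)
        return true;
    return s->checkinterval > 0 && now >= s->lastcheck + s->checkinterval;
}

/* Variant discussed above: only unclean shutdowns count toward a check.
 * Called once per mount; returns true when a full check is due. */
bool on_mount_unclean_policy(struct check_state *s, bool was_clean)
{
    if (!was_clean)
        s->unclean_count++;
    return s->max_unclean > 0 && s->unclean_count >= s->max_unclean;
}
```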

 Having reiserfsck just do read-only checks shouldn't force you to type
 yes (and we mean yes because this is so scary, mere mortals shouldn't
 be doing this).  Hans, you've always talked about making things easy for
 the average user (error messages and such), don't you think it would make
 sense to make a data consistency check a little less intimidating too?

 I think that you should have to agree that you have time to wait for 
 fsck before you get stuck with a 1 day large server fsck.

That is definitely true.  However, my assumption would be that if someone
is running a system with terabytes of data they will read the man page
after waiting a day for fsck to complete, or lose their job.  It is entirely
possible for administrators to disable the per-mount e2fsck checking, and
the time-based (6 months by default) checking too, and do fsck themselves.
My experience would be that, like backups, people don't do that, so leaving
the 6 month check in protects users from themselves.

The other thing to keep in mind is that you can have different levels of
automated fsck at boot time, depending on how long they take.  You never
necessarily have to try and fix anything with fsck -a, just detect errors
and leave it up to the user to decide what to do if you find a problem:
- always recover journal, validate superblock, error flag (< 1s)

Don't know how long it takes these things to run, so it is up to you to
trade off checks vs. speed, and you could even round-robin them (storing
the last checked item in the superblock or something):
- check block allocation bitmaps match superblock counts
- walk directory structure from root, checking for directory corruption
- check btree validity on inodes for up to 10 seconds (or whatever, storing
  last checked inode in superblock for restarting this test at next one)

By all means, don't do checks for an hour, or allow users to set the maximum
boot check duration in the superblock.  I'm sure users don't mind waiting
5s at boot time if it means they don't lose data.
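
The round-robin idea can be sketched in C as follows. It assumes the superblock has room for a hypothetical cursor (struct sb_cursor). To keep the example deterministic, the time budget is counted in inodes rather than seconds. A real tool would read the clock instead and run a real per-inode check rather than consult a corrupt[] table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the superblock remembers where the last boot-time scan
 * stopped, so each boot continues where the previous one left off. */
struct sb_cursor { size_t next_inode; };

/* Checks inodes from the saved cursor until `budget` units are spent
 * (one unit per inode here), wrapping around at the end.
 * Returns the number of corrupt inodes seen in this run. */
size_t incremental_check(struct sb_cursor *sb, const bool *corrupt,
                         size_t ninodes, size_t budget)
{
    assert(sb && corrupt && ninodes > 0);
    size_t bad = 0;
    for (size_t done = 0; done < budget && done < ninodes; done++) {
        if (corrupt[sb->next_inode])
            bad++;
        sb->next_inode = (sb->next_inode + 1) % ninodes;
    }
    return bad;
}
```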

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: Error - Partition Correspondance [was Re: Corrupted/unreadable journal: reiser vs. ext3]

2003-02-17 Thread Oleg Drokin
Hello!

On Tue, Feb 18, 2003 at 12:35:23AM +0100, Manuel Krause wrote:

 BTW, do the ReiserFS errors nowadays print out a usable partition 
 identification (like Chris's current data-logging patches do at mount,
 e.g.)?

Sometimes it does.

 I mostly always have 2 partitions with ReiserFS mounted, so -- is it 
 still meaningless to get an error message related to one of them in my logs?

It depends on what the messages are.

 I posted this circumstance some 3.6-ReiserFS levels ago and someone on
 your team wanted to implement this after his task-list was done, IIRC.

Yes. I have a patch dated back to May 7th, 2002. But it was never
accepted, for a reason I no longer remember.
I will dig through my email, though. Probably I will give it another try.
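
The hypothetical C helper below shows the kind of change being discussed: every message is prefixed with the device it concerns, so errors from two mounted reiserfs partitions can be told apart. fs_format_msg and the prefix format are assumptions, not Oleg's patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Formats "reiserfs (device <dev>): <message>" into buf. Returns the
 * snprintf-style length, or a negative value on a formatting error. */
int fs_format_msg(char *buf, size_t len, const char *dev, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);
    if (dev == NULL || strlen(dev) == 0)
        dev = "unknown";
    int n = snprintf(buf, len, "reiserfs (device %s): ", dev);
    if (n < 0 || (size_t)n >= len)
        return n;                 /* error, or prefix already truncated */
    va_list ap;
    va_start(ap, fmt);
    int m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}
```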

Bye,
Oleg



fsck on boot (was: Re: Corrupted/unreadable journal: reiser vs. ext3)

2003-02-17 Thread Ookhoi
Andreas Dilger wrote (ao):
 The other thing to keep in mind is that you can have different
 levels of automated fsck at boot time, depending on how long they
 take.  You never necessarily have to try and fix anything with fsck
 -a, just detect errors and leave it up to the user to decide what to
 do if you find a problem: - always recover journal, validate
 superblock, error flag (< 1s)
 
 Don't know how long it takes these things to run, so it is up to you
 to trade off checks vs. speed, and you could even round-robin them
 (storing the last checked item in the superblock or something):
 - check block allocation bitmaps match superblock counts
 - walk directory structure from root, checking for directory
   corruption
 - check btree validity on inodes for up to 10 seconds (or whatever,
   storing last checked inode in superblock for restarting this test at
   next one)
 
 By all means, don't do checks for an hour, or allow users to set the
 maximum boot check duration in the superblock.  I'm sure users don't
 mind waiting 5s at boot time if it means they don't lose data.

Yes! Yes! I agree so much on this .. Let fsck always run at boot, and
perform checks which take at most a few seconds all together.

Then dmesg will tell if something is wrong. Maybe it can also show the
error code in /proc/mounts ?



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-13 Thread Anders Widman
 In article [EMAIL PROTECTED],
 Anders Widman  [EMAIL PROTECTED] wrote:
The others want to make Linux a viable option for normal users and
want Linux to be able to replace Windows or Mac OS. The only way I see
that happening is if Linux becomes more user-friendly and safe.

 Last time I checked, Windows and Mac OS come to a near total halt when
 they see a disk error while doing a write on non-removable media, unless
 the application goes to extraordinary lengths to handle the error itself.

Actually no. :) Windows continues to run (ok, maybe not win9x or WinNT,
but these are old anyway). You can just remove a hard drive in Windows
XP and the system continues to run. Or you can add new PCI cards and
Windows will find those too.


 Frankly, I used to mount my ext3 filesystems on servers with
 'errors=panic', causing a reboot at the very first sign of trouble (past
 tense as I now use reiserfs which doesn't like that option ;-).
 The sooner the server goes out of production and starts running fsck,
 the sooner it will finish running fsck and come back into production
 (or, in the worst case, the sooner an admin person will start pulling
 out backup tapes and ordering replacement disks).


Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-13 Thread Rudy Zijlstra


On Thu, 13 Feb 2003, Anders Widman wrote:

  In article [EMAIL PROTECTED],
  Anders Widman  [EMAIL PROTECTED] wrote:
 The others want to make Linux a viable option for normal users and
 want Linux to be able to replace Windows or Mac OS. The only way I see
 that happening is if Linux becomes more user-friendly and safe.

  Last time I checked, Windows and Mac OS come to a near total halt when
  they see a disk error while doing a write on non-removable media, unless
  the application goes to extraordinary lengths to handle the error itself.

 Actually no. :) Windows continues to run (ok, maybe not win9x or WinNT,
 but these are old anyway). You can just remove a hard drive in Windows
 XP and the system continues to run. Or you can add new PCI cards and
 Windows will find those too.


Provided you first shut it down, then yes. I am not aware of PC hardware
that will allow you to safely do this with power on the board. Disk removal and
addition also worked using Win2K. And by the way, also using Linux -:)

If you get trouble with the system disk under Windows, I do not know
what happens; likely to be interesting... And I have had Linux running
with 1 disk disconnected after it was mounted (an unexpected SCSI disconnect).
Everything kept working, except for the partitions that were unreachable, which
happened to be reiserfs and were unharmed.

Cheers

Rudy

P.S. I am getting RAID for that particular system...

Re: rijndael loopback encryption was [Re: Corrupted/unreadable journal: reiser vs. ext3]

2003-02-13 Thread Zygo Blaxell
In article [EMAIL PROTECTED],
Philippe Gramoullé [EMAIL PROTECTED] wrote:
Hi Zygo,

This is a little bit OT from the thread on ReiserFS ML, but could you tell me more
about your laptop setup with rijndael loopback encryption and how you installed it,
what kernel version/patches (a link to a guide or FAQ or tutorial about how to set this up)?

I once had a NOC laptop full of critical info stolen in the tube in Paris, France,
and it was a mess to change all the passwords, etc...

Having every FS encrypted would make my paranoid ego feel much better ;o)

Linux 2.4.20, loop-AES 1.7b (replaces the standard loop.o module).
See http://loop-aes.sourceforge.net for loop-AES.  The package tarball
contains a crypto-ramdisk-boot script and some information on how to
set things up.

Note that you should probably encrypt swap as well, and watch out for
features like suspend-to-disk (aka hibernate) that save the contents
of RAM without encryption.

-- 
Zygo Blaxell (Laptop) [EMAIL PROTECTED]
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Russell Coker
On Wed, 12 Feb 2003 16:26, Anders Widman wrote:
  Unplanned downtime does cause a lot of harm to any business.
 
  It's better to stop when there's a serious error than to blindly continue
  and make things worse.

 I (and, I think, no one else) never said to continue blindly. Most
 users/workstations do not have RAID and probably never will.

Hard drive costs are constantly decreasing while the value of data is 
constantly increasing.  I think that the use of RAID will increase steadily.

 The others want to make Linux a viable option for normal users and
 want Linux to be able to replace Windows or Mac OS. The only way I see
 that happening is if Linux becomes more user-friendly and safe.

I guess you're not familiar with what NT does then.

NT 3.5x would sometimes get confused about its data and umount the file 
system in question to avoid the risk of damaging data.

In case of a serious kernel error NT will give a BSOD in situations where 
Linux by default will print an Oops message and continue running.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page




Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Sam Vilain
On Thu, 13 Feb 2003 00:12, Adam Goryachev wrote:
 I can conceive of a few things that *might* be the right thing in
 various circumstances:

 A) Immediately re-mount the drive read-only, and wait for the sysadmin
 to either re-mount rw or to do some other data recovery/repair

 B) Immediately dis-mount the drive and wait

 C) OK, I tried to write to sector 1324 so lets just try each consecutive
 available sector until it doesn't return an error (possibly marking the
 sectors bad/used as we go)

 D) Just return an error to the application

Or a mixture...

C) with a max limit of, say 5 attempts, then D).  And then, later if it 
gets `really bad', where most I/O operations are failing, then A).

But I'd consider it acceptable behaviour for bounds check exceptions (ie, 
unreported filesystem corruption) or situations where you have lost a 
large amount of really critical structural information to invoke B).  Much 
better than an Oops.
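
Sam's mixed policy (C capped at 5 attempts, then D, then A once most I/O is failing) can be sketched in C against a simulated device. Everything in the sketch is hypothetical: the device model, the assumption that the following sectors are free, and the "more than half of at least 10 operations failed" threshold for going read-only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_RELOCATE_TRIES 5

/* Simulated device: bad[i] is true if writing sector i fails. */
struct dev {
    const bool *bad;
    unsigned nsectors;
    unsigned ops, failures;   /* running I/O statistics */
    bool read_only;
};

/* (C) try `start`, then the following sectors (assumed free), up to
 * MAX_RELOCATE_TRIES; (D) give up with -EIO. Returns the sector written. */
int write_with_relocation(struct dev *d, unsigned start)
{
    assert(d && d->bad);
    if (d->read_only)
        return -EROFS;
    for (unsigned i = 0; i < MAX_RELOCATE_TRIES && start + i < d->nsectors; i++) {
        d->ops++;
        if (!d->bad[start + i])
            return (int)(start + i);
        d->failures++;
    }
    /* (A) if most operations are failing, stop writing altogether. */
    if (d->ops >= 10 && d->failures * 2 > d->ops)
        d->read_only = true;
    return -EIO;
}
```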

Whoever made that statement about the hard disk head crashing... now that's 
certainly a laughable suggestion; a hard disk continuing after a head 
crash.  If anything, my experience with disks has been that if they start 
failing, you have to sort things out sooner rather than later.
-- 
Sam Vilain, [EMAIL PROTECTED]

  You can judge your age by the amount of pain you feel when you come
in contact with a new idea.
JOHN NUVEEN



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Anders Widman
 On Wed, 12 Feb 2003 16:26, Anders Widman wrote:
  Unplanned downtime does cause a lot of harm to any business.
 
  It's better to stop when there's a serious error than to blindly continue
  and make things worse.

 I (and, I think, no one else) never said to continue blindly. Most
 users/workstations do not have RAID and probably never will.

 Hard drive costs are constantly decreasing while the value of data is
 constantly increasing.  I think that the use of RAID will increase steadily.

 The others want to make Linux a viable option for normal users and
 want Linux to be able to replace Windows or Mac OS. The only way I see
 that happening is if Linux becomes more user-friendly and safe.

 I guess you're not familiar with what NT does then.

 NT 3.5x would sometimes get confused about its data and umount the file
 system in question to avoid the risk of damaging data.

 In case of a serious kernel error NT will give a BSOD in situations where
 Linux by default will print an Oops message and continue running.

NT3.5  is  a little old to compare a modern OS with, is it not? I have
had numerous Linux kernel crashes that were not recoverable also.

Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Sam Vilain
On Wed, 12 Feb 2003 08:43, [EMAIL PROTECTED] wrote:
 Dirk, I'd be interested in hearing from you your performance
 experience with ext3 when it reaches 96% full.

No problem, because you get ENOSPC at 95% or 90%.

Hmm, another feature SysAdmins actually find useful, missing in reiserfs.  
Along with quotas (this feature is a lazy case of a quota, really).
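
For reference, the ext2/ext3 behaviour described here is the reserved-blocks mechanism. Unprivileged allocations get ENOSPC once free space falls to the reserve (5% by default at mke2fs time), while root can still allocate. The minimal C sketch below uses hypothetical struct and function names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct space {
    uint64_t free;       /* free blocks */
    uint64_t reserved;   /* blocks only a privileged user may consume */
};

/* Allocates n blocks, honouring the reserve for unprivileged callers. */
int alloc_blocks(struct space *s, uint64_t n, bool privileged)
{
    assert(s);
    uint64_t usable = privileged ? s->free
                    : (s->free > s->reserved ? s->free - s->reserved : 0);
    if (n > usable)
        return -ENOSPC;
    s->free -= n;
    return 0;
}
```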

On Wed, 12 Feb 2003 18:12, Ross Vandegrift wrote:
 You have to start your software on some kind of foundation.  Working
 hardware sounds like a great place to me.

Hmm, you've never heard of redundancy or fault tolerance then.

What part fails the most in running systems ?  Disk platters.

CPUs might overheat and RAM might suddenly one day get a sticky bit, but as 
you point out there ain't much you can do about it.  Except buy a Tandem, 
or use ECC memory.

But with disks, you can.  Mirroring aside, modern hard disks use S.M.A.R.T. 
technology which claims to be able to spot failures before they happen.  
Many BIOSes will let you turn this feature on and off.  Of course I've 
never actually seen it in action :-).

Not only that, but re-attempting a failed read might just work.  In that 
case, you need to freshen the data (hopefully the disk will re-map the 
block once it sees a write), and if that fails, re-map the block.  I don't 
know if any of the other filesystems do that (I seriously doubt it), but 
it's what Norton 4.5 on DOS used to do to `repair' faulty disks :-).
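
The retry, then freshen, then remap sequence is sketched in C below against a simulated sector (struct sector, read_recover and READ_RETRIES are hypothetical). The write-back is modelled by the write_ok flag rather than performed. "Remap" only sets a flag, where a real implementation would relocate the block.

```c
#include <assert.h>
#include <stdbool.h>

#define READ_RETRIES 3

/* Simulated sector: the first `transient_fails` reads fail; `dead` sectors
 * never read; write_ok says whether writing the data back would succeed. */
struct sector {
    int  transient_fails;
    bool dead;
    bool write_ok;
    bool freshened, remapped;   /* what the policy ended up doing */
};

static bool sim_read(struct sector *s)
{
    if (s->dead)
        return false;
    if (s->transient_fails > 0) {
        s->transient_fails--;
        return false;
    }
    return true;
}

/* Returns true if the data was read. If it only came back on a retry, it
 * is written back ("freshened"); if that write fails, the block is remapped. */
bool read_recover(struct sector *s)
{
    assert(s);
    for (int attempt = 0; attempt < READ_RETRIES; attempt++) {
        if (sim_read(s)) {
            if (attempt > 0) {
                s->freshened = true;
                if (!s->write_ok)
                    s->remapped = true;
            }
            return true;
        }
    }
    return false;   /* unreadable: the caller should see EIO */
}
```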

But doing disk repair is entirely irrelevant for a filesystem.  What's 
important is that the calling process gets EIO instead of an Oops, a kernel 
segfault or, worse, random data corruption or file structure mangling.

Stopping random corruption from violating your assumptions is extremely 
difficult; a software engineer's nightmare :-).  However, modern disks are 
pretty good at keeping their own CRCs, so you should expect that you can 
always get an error code back from the OS if the data didn't come back in the 
same state you wrote it.

You (the reiserfs team) need to wire up reiserfs on a custom loopback 
device, and selectively flick blocks to faulty and see what happens.  It's 
just a part of stress testing.
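
Below is a minimal C sketch of such a test device: a RAM-backed block store whose blocks can be flagged faulty, so that reads and writes to them return -EIO. The names (struct faulty_dev, dev_read, dev_write, mark_faulty) are hypothetical. A real harness would sit under the filesystem as a block driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NBLK  64
#define BSIZE 512

struct faulty_dev {
    unsigned char data[NBLK][BSIZE];
    bool faulty[NBLK];
};

/* Flips a block between healthy and faulty. */
void mark_faulty(struct faulty_dev *d, unsigned blk, bool on)
{
    assert(d && blk < NBLK);
    d->faulty[blk] = on;
}

int dev_read(struct faulty_dev *d, unsigned blk, void *buf)
{
    if (blk >= NBLK)
        return -EINVAL;
    if (d->faulty[blk])
        return -EIO;               /* injected failure */
    memcpy(buf, d->data[blk], BSIZE);
    return 0;
}

int dev_write(struct faulty_dev *d, unsigned blk, const void *buf)
{
    if (blk >= NBLK)
        return -EINVAL;
    if (d->faulty[blk])
        return -EIO;               /* injected failure */
    memcpy(d->data[blk], buf, BSIZE);
    return 0;
}
```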

And there is no excuse - reiserfsck should do the right thing when it 
encounters a filesystem with bad blocks and recover what is possible, 
marking the bad blocks as bad.  It needs dd_rescue built into its 
operation :-).
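
The dd_rescue-style behaviour asked for here has three parts: copy block by block, fill unreadable blocks with zeros so offsets stay intact, and record their numbers for a bad-block list. The C sketch below works over in-memory buffers and uses hypothetical names and a tiny block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BS 4   /* tiny block size, for illustration only */

/* Copies nblocks blocks from src to dst. Blocks marked unreadable are
 * zero-filled in dst and their numbers appended to bad_list.
 * Returns how many blocks were unreadable. */
size_t rescue_copy(const char *src, const bool *readable, size_t nblocks,
                   char *dst, size_t *bad_list)
{
    assert(src && readable && dst && bad_list);
    size_t nbad = 0;
    for (size_t i = 0; i < nblocks; i++) {
        if (readable[i]) {
            memcpy(dst + i * BS, src + i * BS, BS);
        } else {
            memset(dst + i * BS, 0, BS);   /* keep later offsets correct */
            bad_list[nbad++] = i;          /* candidate for the bad-block list */
        }
    }
    return nbad;
}
```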

It must suck having a free project get only slight funding.  All of a 
sudden a whole load of geeks get very angry and demanding.  I wish I could 
help, but hey it's more fun to troll.^H^H^H I've got better things to do.
-- 
Sam Vilain, [EMAIL PROTECTED]

The reason we start a war is to fight a war, win a war, thereby
causing no more war!
 - George W. Bush during the first Presidential debate



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Sam Vilain
On Wed, 12 Feb 2003 14:02, Mike Hodson wrote:
 Well one way of being completely sure is to reset the mount count in the
 filesystem before rebooting, or to set the fstab to never automatically
 fsck. then on some set  schedule, fsck along with a kernel upgrade, and
 schedule the downtime

Nah.  Set up a mirror, wait for a fairly quiet time, sync, split the 
mirror, fsck the split mirror, and only do something if that fsck fails 
:-).

Solaris does all this very well.  Its equivalent of `md-utils' - Online 
Disk Suite - does journalling for you of all writes (including data) if 
you turn it on; at the block level, ignorant of the FS.  IMHO that's a 
much better place to do the journalling.  It's simple, solid.
-- 
Sam Vilain, [EMAIL PROTECTED]

Do you have blacks, too?
 - George W. Bush, talking to Fernando Henrique Cardoso (the president
   of Brazil).  Reported in Der Spiegel on May 19 2002.  Never
   reported in any US paper or news source.



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Anders Widman
 On Wednesday 12 February 2003 02:17, Anders Widman wrote:
  I've used ReiserFS in the past, but have also used ext3 for my users'
  important data (/home) after a good chunk of one drive was converted to
  sparse/null files due to a screwup stemming from no 'badblocks' support
  in reiserfs.  Since then, I've used ext3 as well as Reiser but recently
 
  I can't comment on your experience, but personally if I have a drive with
  any number of badblocks (which are showing up to the fs layer, not
  invisibly re-mapped by the drive) then I take the drive back and get a
  replacement, or bin the drive.

 However,  the FS SHOULD support handling of bad blocks/clusters at the
 FS  layer,  even  while running in a production system. Bad blocks can
 pop  up  at any given time for no particular reason, and it is at these
 times  you  (we) need a strong and reliable filesystem that can handle
 and logically remap broken blocks/sectors.

 Sure,  a  disk  with physical errors should be replaced, but until you
 find out about the error on the drive the FS HAS TO HANDLE these kinds
 of problems.

 It is difficult to say whether bad blocks should be handled at the fs layer or not.
 It would be useful to have this problem solved somehow, but hard drives with
 their remapping look like the proper place to do this. And probably the fs
 layer should just skilfully use some interface for such remapping. Well,
 remapping is probably not the correct word here. For example, Xuan Baldauf
 [EMAIL PROTECTED] once sent us his program, which he claimed recovered
 blocks without remapping. His explanation was the following:

 The problem is that often multiple adjacent blocks are bad. You'll have to detect
 them manually. Once you know the bad blocks, just trying to overwrite them usually
 does not succeed because the disk wants to seek to that block exactly (which does
 not work for the same reason the block is bad). But if the whole track is
 rewritten, the bad blocks usually are gone.

 I suspect track wandering for this: due to small misalignments at each write, a
 track (or more precisely, an arc of the track which contains the block to be
 written) slowly wanders. If the misalignments do not cancel each other out, they
 add up to a bias. If an arc has been written many times, it will have wandered
 under these conditions. If the wandering has progressed too far, the wandering
 arc slowly reaches the next neighbouring track.

 Now imagine an access to the wandered track: if the head seeks to the original
 position of the wandered track, it may not be able to read the wandered arc
 because it is too far away (lower signal quality). If the head seeks to the new
 position of the wandered arc, the signal may be interfered with by the
 neighbouring track.

 Both effects may occur; which one does not really matter, since both make parts
 of the wandered arc inaccessible.

 The problem is: the individual wandered arc is no longer accessible, because the
 disk controller cannot sync to the block it is flying over because of the bad
 signal-to-noise ratio. And if the wandered arc is accessible, another write will
 make it wander further, up to inaccessibility.

 But if the seek to the track of the arc which should be overwritten occurs
 before the wandered arc, the disk controller actually can sync to the track and
 then write the whole track, effectively creating the track anew and only having
 the bias of the not-wandered part of the track. Thus, the wandered arc is no
 longer displaced compared to the other arcs of the track.

 Well, it worked. We had some bad blocks on a drive, writes to them failed, and
 after using this program there were no bad blocks anymore.
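
Xuan's program itself is not shown here. As a hypothetical C sketch, the code below only computes which whole tracks would be rewritten for a sorted list of bad LBAs. It assumes a fixed sectors_per_track, but real drives use zoned recording and hide their geometry, so that assumption rarely holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-geometry range of one track. */
struct range { uint64_t first; uint32_t count; };

/* Whole track containing `lba`: the span the rewrite trick would read
 * (where possible) and write back in one pass. */
struct range track_of(uint64_t lba, uint32_t sectors_per_track)
{
    assert(sectors_per_track > 0);
    struct range r = { lba - lba % sectors_per_track, sectors_per_track };
    return r;
}

/* Collapses a sorted list of bad LBAs into distinct track numbers, since
 * adjacent bad blocks usually share a track. Returns the count in out[]. */
size_t tracks_to_rewrite(const uint64_t *bad, size_t n,
                         uint32_t sectors_per_track, uint64_t *out)
{
    assert(sectors_per_track > 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = bad[i] / sectors_per_track;
        if (k == 0 || out[k - 1] != t)
            out[k++] = t;
    }
    return k;
}
```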

 So it would be possible to take some actions: 1) get some blocks back in the
 described way; 1.1) a write to really bad blocks should already have remapped
 them here if there is space in the remap area; 2) save blocks that are still bad
 (out of the remap area) to a badblock list in the fs.
 It would also be nice to recover already-remapped blocks this way, but I do not
 know how to get the list of them.

 Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
 the fs to work in the described way? Trying to fix all automatically? I am not sure.

  How about trial and (then) error? :)

 Now about user space. Using badblocks and programs like the one Xuan Baldauf sent
 us, and just trying to write to bad blocks, makes them get remapped; that is how
 you can try to get rid of some of the bad blocks. Should a drive whose bad blocks
 exceed the remap area be used at all? It is a really rare case that the number of
 bad blocks on such a drive stops increasing (the case where you may want to
 continue using the drive), so this is why proper support for bad blocks has not
 been implemented in reiserfs yet. And probably it is not the most urgent thing to do.

  No, perhaps bad block handling is not the major improvement we
  need, however I

Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Oleg Drokin
Hello!

On Wed, Feb 12, 2003 at 05:56:58PM +0100, Anders Widman wrote:
 
  So it would be possible to take some actions: 1) get some blocks back in the
  described way; 1.1) a write to really bad blocks should already have remapped
  them here if there is space in the remap area; 2) save blocks that are still bad
  (out of the remap area) to a badblock list in the fs.
  It would also be nice to recover already-remapped blocks this way, but I do not
  know how to get the list of them.
  Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you
  want the fs to work in the described way? Trying to fix everything automatically?
  I am not sure.
   How about trial and (then) error? :)

That might be suitable for fsck, but not for the kernel, I am sure.
The kernel should probably just return an error, or try to use a different block
(if it was doing a write), and if a certain number of attempts fail, return an
error too. Also remount R/O if the write error is in a system area (journal,
superblock, bitmaps) or if a special mount option was given that demands
remounting R/O on IO errors.
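
Oleg's write-error policy is sketched below as a C decision function. The names, the block classification and MAX_WRITE_ATTEMPTS = 3 are hypothetical; he only says "a certain number of attempts". The ro_on_error parameter models the special mount option.

```c
#include <assert.h>
#include <stdbool.h>

enum blk_kind  { BLK_DATA, BLK_JOURNAL, BLK_SUPER, BLK_BITMAP };
enum io_action { ACT_RETRY_ELSEWHERE, ACT_RETURN_EIO, ACT_REMOUNT_RO };

#define MAX_WRITE_ATTEMPTS 3   /* assumed value */

/* Decides what to do after a failed write. `attempts` counts the failures
 * so far for this write. */
enum io_action on_write_error(enum blk_kind kind, int attempts, bool ro_on_error)
{
    assert(attempts >= 0);
    if (kind != BLK_DATA || ro_on_error)
        return ACT_REMOUNT_RO;        /* journal, superblock or bitmap */
    if (attempts < MAX_WRITE_ATTEMPTS)
        return ACT_RETRY_ELSEWHERE;   /* allocate a different block */
    return ACT_RETURN_EIO;
}
```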

Bye,
Oleg



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Hans Reiser
We are doing ok financially until summer, during which I need to come up 
with more money from somewhere.

There is a lot of fiscal uncertainty in a project like ours.  We get 
money in big chunks with no knowledge of how long we need to make it 
last.  There is this nagging worry that I could get unlucky for the 
wrong 6 month stretch, and have to lay off everyone.  This worry is 
especially acute during the midst of a tech bust (sponsors dry up even 
for successful projects) which is also during the debugging phase for 
reiser4 (will it be debugged by June, or will it take as long as V3, or 
longer).

For V3 we are going to fix any bugs that come in (one known bug remains 
that seems to be elusive and will distract our lead programmer from V4 
this month), put in Oleg's write patch and Chris's patches, fix fsck 
when it fails, and that is it.  V3 will be our feature frozen FS for 
mission critical servers.

Every resource we have is going to go into getting V4 done and stable so 
that we can sell it in the summer.  Hopefully we will make it.  One 
worry is that while V4 will be much more stable by design (transactions, 
fsck friendly node format with mkfsids and transaction ids (that make it 
easier to figure out, when two versions of a file collide, which one to 
keep), etc.), V3 will be more stable in implementation for quite some time.

It must suck having a free project get only slight funding.  All of a
sudden a whole load of geeks get very angry and demanding.  I wish I could
help, but hey it's more fun to troll.^H^H^H I've got better things to do.
   


You are free to donate your hard earned money to them.

- Anders

 



--
Hans





Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Anders Widman

 Every resource we have is going to go into getting V4 done and stable so
 that we can sell it in the summer.  Hopefully we will make it.

  Just a question. (I know lots of people will shout at me for asking,
  but please don't :) Will V3/4 be ported to Windows, or are we doomed
  to use the new MS database with integrated Palladium software?

  Linux  is  a great OS, but there are tools that I (and probably many
  other)  use  every  day that I need. One example is Adobe Photoshop,
  colour  management  and lots of other things - not to mention people
  who want to use games ;).

  As  of  now I can not completely go over to Linux. Therefore I would
  pay  to use ReiserFS on my Windows machines. Maybe I am the only one
  who would, but perhaps not.

  - Anders




Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Dirk Mueller
On Mit, 12 Feb 2003, Anders Widman wrote:

   Just a question. (I know lots of people will shout at me for asking,
   but please don't :) Will V3/4 be ported to Windows, or are we doomed
   to use the new MS database with integrated Palladium software?

very unlikely. porting a filesystem is about the same work as writing it 
from scratch. 


Dirk



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Anders Widman
 On Mit, 12 Feb 2003, Anders Widman wrote:

   Just a question. (I know lots of people will shout at me for asking,
   but please don't :) Will V3/4 be ported to Windows, or are we doomed
   to use the new MS database with integrated Palladium software?

 very unlikely. porting a filesystem is about the same work as writing it
 from scratch. 

  Depends  what  is  the most difficult part; to develop a good system
  and algorithms, or to write the code. :)

  Anyway,  I  see your point and I know my request was far fetched. It
  is  more  likely  that  Adobe  port their programs to Linux than the
  other way around.

   - Anders


 Dirk

Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Anders Widman

 On Wed, Feb 12, 2003 at 06:40:04PM +0100, Anders Widman wrote:
 
  Every resource we have is going to go into getting V4 done and stable so
  that we can sell it in the summer.  Hopefully we will make it.
 
   Just a question. (I know lots of people will shout at me for asking,
   but please don't :) Will V3/4 be ported to Windows, or are we doomed
   to use the new MS database with integrated Palladium software?

 Have you supplied namesys with funding for a port?

  Nope,  I do not have the cash for that. I do have cash to buy myself
  a licence to use ReiserFS though, if it were sold.
 
   Linux is a great OS, but there are tools that I (and probably many
   others) use every day and need. One example is Adobe Photoshop,
   colour management and lots of other things, not to mention people
   who want to play games ;).

 Does Photoshop no longer run on a Macintosh?  Does colour management no longer
 run on a Macintosh?  As for games, have you considered a subscription to
 WineX or a game console?

No, I do not use a Mac because they are simply too slow :). WineX and
similar tools are not fast enough or stable enough to run most modern games.
But it is not all about the games; rather, it is about all the software
that exists in the Windows world and has not yet been ported.

 I apologize, but I have a habit of hounding Windows users into admitting
 that the main reason they need Windows is that 1) their employer
 requires it (and my response is "The employer can supply the hardware and
 technical support."), 2) they haven't really looked to see if it can be done
 elsewhere, or 3) a software vendor (like Autodesk) only supports Windows.

And what should we (Windows users) do when software vendors do not
support anything but Windows?

 
   As of now I cannot completely move over to Linux. Therefore I would
   pay to use ReiserFS on my Windows machines. Maybe I am the only one
   who would, but perhaps not.

 Out of curiosity, what do you think that reiserfs would buy you on windows?
 Would reiserfs be more of a benefit than a separate linux box running
 samba or nfsd?

  No, Samba and NFS would defeat some of the benefit (speed) of
  ReiserFS, though I do use ReiserFS over Samba for backup/storage of my
  data.

   - Anders


   




Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Dirk Schenkewitz

Oleg Drokin wrote (in response to Anders Widman):

   So it would be possible to take some actions to
   1) get some blocks back in the described way;
   1.1) writing to really bad blocks should already have remapped them
        at this point, if there is space left in the remap area;
   2) save bad blocks to the badblock list in the fs if they are still
      bad, i.e. outside the remap area.
   It would not be bad to try to recover already-remapped blocks in this
   way too; I just do not know how to get the list of them.
   Ok, but what if the IO error you got is not a bad block, but
   a bad cable? Do you want the fs to work in the described way,
   trying to fix everything automatically? I am not sure.
 
How about trial and (then) error? :)
 
 That might be suitable for fsck, but not for the kernel, I am sure.
 The kernel should probably just return an error, or try to use a different
 block (if it was doing a write) and, if a certain number of attempts
 failed, return an error too.
 It should also remount R/O if the write error is in the system area
 (journal, superblock, bitmaps), or if a special mount option was given
 that demands remounting R/O on IO errors.

I still feel that the system area should be DESIGNED to be extra-robust
against everything, because it is vital for the whole fs.
Btw, might such thoughts be the reason that ext2 has superblock
backups? I agree that a bad block in the system area is a good
reason for all kinds of alarm, but a really good fs should survive
more than one without (unrecoverable) damage to the fs as a whole.
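The kernel-side policy Oleg describes above (retry a failed data write on a different block a limited number of times, then return an error; go read-only instead if the failure is in the system area) can be modeled as a small simulation. This is a hypothetical sketch, not reiserfs code: the class `FsModel`, the constants `MAX_WRITE_ATTEMPTS` and `SYSTEM_AREA`, and the block layout are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum

MAX_WRITE_ATTEMPTS = 3      # hypothetical: give up with EIO after this many tries
SYSTEM_AREA = range(0, 4)   # hypothetical: superblock, bitmaps, journal live here


class Result(Enum):
    OK = "ok"
    EIO = "eio"
    REMOUNTED_RO = "remounted_ro"


@dataclass
class FsModel:
    nblocks: int = 16
    bad: set[int] = field(default_factory=set)            # simulated media defects
    in_use: set[int] = field(default_factory=set)         # allocated data blocks
    badblock_list: set[int] = field(default_factory=set)  # fs-level badblock list
    read_only: bool = False

    def disk_write(self, block: int) -> bool:
        """Simulated device write: fails on any block marked defective."""
        return block not in self.bad

    def alloc_block(self) -> int | None:
        """Return a free data block that is not on the badblock list."""
        for b in range(SYSTEM_AREA.stop, self.nblocks):
            if b not in self.in_use and b not in self.badblock_list:
                self.in_use.add(b)
                return b
        return None

    def write(self, block: int) -> tuple[Result, int]:
        """Write one block, relocating data blocks on error (bounded retries)."""
        if self.read_only:
            return Result.EIO, block
        for _ in range(MAX_WRITE_ATTEMPTS):
            if self.disk_write(block):
                return Result.OK, block
            if block in SYSTEM_AREA:
                # Error in the system area: stop writing rather than relocate.
                self.read_only = True
                return Result.REMOUNTED_RO, block
            # Data block failed: remember it as bad and try a different block.
            self.badblock_list.add(block)
            self.in_use.discard(block)
            replacement = self.alloc_block()
            if replacement is None:
                return Result.EIO, block
            block = replacement
        # Too many failed attempts (maybe a bad cable, not bad media): give up.
        return Result.EIO, block
```

For example, `FsModel(bad={4, 5}, in_use={4}).write(4)` fails on blocks 4 and 5 and lands the data on block 6. If every block fails, as with a bad cable, the model returns EIO after three attempts, but it has already put three healthy blocks on the badblock list. That is exactly the bad-cable risk raised in the quoted text about fixing everything automatically.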

happy coding
dirk
-- 
Dirk Schenkewitz 

InterFace AG fon: +49 (0)89 / 610 49 - 126
Leipziger Str. 16 fax: +49 (0)89 / 610 49 - 83
D-82008 Unterhaching 
http://www.interface-ag.de   mailto:[EMAIL PROTECTED]



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Zygo Blaxell
In article [EMAIL PROTECTED],
Russell Coker  [EMAIL PROTECTED] wrote:
Now all machines other than laptops are getting RAID, all hard drives support 
re-mapping bad sectors, and the entire situation is different.

Actually, laptops get RAID too... ;-)

My laptop can have up to three 2.5-inch IDE disks installed simultaneously,
if I remove optional equipment such as second batteries and the CD-ROM.
My root filesystem (/) is /dev/loop7 (Rijndael encryption) on top of
/dev/md0, on top of /dev/hd[ab]2.

-- 
Zygo Blaxell (Laptop) [EMAIL PROTECTED]
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Zygo Blaxell
In article [EMAIL PROTECTED],
Anders Widman  [EMAIL PROTECTED] wrote:
The others want to make Linux a viable option for normal users and
want Linux to be able to replace Windows or Mac OS. The only way I see
that happening is if Linux becomes more user-friendly and safe.

Last time I checked, Windows and Mac OS come to a near total halt when
they see a disk error while doing a write on non-removable media, unless
the application goes to extraordinary lengths to handle the error itself.

Frankly, I used to mount my ext3 filesystems on servers with
'errors=panic', causing a reboot at the very first sign of trouble (past
tense as I now use reiserfs which doesn't like that option ;-).
The sooner the server goes out of production and starts running fsck,
the sooner it will finish running fsck and come back into production
(or, in the worst case, the sooner an admin person will start pulling
out backup tapes and ordering replacement disks).


-- 
Zygo Blaxell (Laptop) [EMAIL PROTECTED]
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD