Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-07 Thread Matthias Andree
[stripping Cc: list]

On Thu, 03 Aug 2006, Edward Shishkin wrote:

 What kind of forward error correction would that be,
 
 Actually we use checksums, not ECC. If a checksum is wrong, then run
 fsck - it will remove the whole disk cluster, which represents 64K of
 data.

Well, that's quite a difference...

 Checksum is checked before unsafe decompression (when trying to
 decompress incorrect data can lead to fatal things).

Is this sufficient? How about corruptions that lead to the same checksum
and can then confuse the decompressor? Is the decompressor safe in that
it does not scribble over memory it has not allocated?
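The concern is real on both counts: a 32-bit checksum lets roughly one in 2^32 corruptions slip through unnoticed, so the decompressor still has to be defensive. A minimal sketch of the check-before-decompress pattern, using CRC32 and zlib merely as stand-ins (the reiser4 plugin's actual checksum and compressor may differ):

```python
import zlib

CLUSTER = 64 * 1024  # 64K disk cluster, as in the compression plugin

def pack(data: bytes) -> bytes:
    """Compress a cluster and prepend a CRC32 of the compressed bytes."""
    comp = zlib.compress(data)
    return zlib.crc32(comp).to_bytes(4, "big") + comp

def unpack(blob: bytes) -> bytes:
    """Verify the checksum before handing data to the decompressor."""
    stored = int.from_bytes(blob[:4], "big")
    comp = blob[4:]
    if zlib.crc32(comp) != stored:
        raise ValueError("checksum mismatch: refusing to decompress")
    return zlib.decompress(comp)

blob = pack(b"A" * CLUSTER)
corrupt = bytearray(blob)
corrupt[10] ^= 0xFF           # flip one byte inside the compressed stream
try:
    unpack(bytes(corrupt))
except ValueError as e:
    print(e)                  # corruption caught before decompression runs
```

A single flipped byte always changes the CRC32, so this corruption is caught; a corruption engineered to preserve the checksum would reach `zlib.decompress`, which at least fails cleanly rather than scribbling over memory.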

-- 
Matthias Andree


e2fsck unfixable corruptions (was: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-06 Thread Matthias Andree
(changing subject to catch Ted's attention)

Bodo Eggert schrieb am 2006-08-05:

 - I have an ext3 that can't be fixed by e2fsck (see below). fsck will fix
   some errors, trash some files and leave a fs waiting to throw the same
   error again. I'm fixing it using mkreiserfs now.

If such a bug persists with the latest released e2fsck version - you're
not showing e2fsck logs - I'm rather sure Ted Ts'o would like to have a
look at your file system meta data in order to teach e2fsck how to fix
this.

I've seen sufficient releases of reiserfsck that couldn't fix certain
bugs, too, so trying with the latest version of the respective tools is
a must.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-03 Thread Matthias Andree
On Tue, 01 Aug 2006, David Masover wrote:

 RAID deals with the case where a device fails. RAID 1 with 2 disks can
 in theory detect an internal inconsistency but cannot fix it.
 
 Still, if it does that, that should be enough.  The scary part wasn't 
 that there's an internal inconsistency, but that you wouldn't know.

You won't usually know, unless you run a consistency check: RAID-1 will
only read from one of the two drives, for speed - unless you make the
system check consistency as it goes, which implies waiting for both
disks at the same time. And in that case, you'd better look for drives
that can synchronize their spindles, to avoid the read access penalty
that waiting for two drives entails.
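The scrub pass described above is easy to sketch: comparing a two-way mirror block by block tells you *that* the copies diverge, but with only two copies it cannot tell which side is right. A toy illustration (block size and data are made up):

```python
def verify_mirror(a: bytes, b: bytes, block: int = 4096) -> list[int]:
    """Compare two RAID-1 members block by block.
    Returns the block numbers that differ; with only two copies
    there is no way to tell which member holds the good data."""
    assert len(a) == len(b)
    return [i // block
            for i in range(0, len(a), block)
            if a[i:i + block] != b[i:i + block]]

disk0 = bytearray(b"\x00" * 4096 * 8)
disk1 = bytearray(disk0)
disk1[4096 * 3 + 17] = 0xFF        # silent corruption on one member
print(verify_mirror(bytes(disk0), bytes(disk1)))   # [3]
```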

 And it can fix it if you can figure out which disk went.

If it's decent and detects a bad block, it'll log it and rewrite it with
data from the mirror and let the drive do the remapping through ARWE.

 Depending how far you propagate it. Some people working with huge
 data sets already write and check user level CRC values for this reason
 (in fact bitkeeper does it for one example). It should be relatively
 cheap to get much of that benefit without doing application to
 application just as TCP gets most of its benefit without going app to
 app.
 
 And yet, if you can do that, I'd suspect you can, should, must do it at 
 a lower level than the FS.  Again, FS robustness is good, but if the 
 disk itself is going, what good is having your directory (mostly) intact 
 if the files themselves have random corruptions?

Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).
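The page-checksum idea is simple to sketch: store a digest of the page payload inside the page, and a write that was torn mid-way (say, only the first 512-byte sector reached the platter) no longer verifies. This is a generic illustration, not Berkeley DB's actual on-disk layout:

```python
import hashlib

PAGE = 4096
DIGEST = hashlib.sha1().digest_size   # 20 bytes of SHA1 per page

def make_page(payload: bytes) -> bytes:
    """Build a page: payload padded to size, digest stored at the end."""
    payload = payload.ljust(PAGE - DIGEST, b"\0")[:PAGE - DIGEST]
    return payload + hashlib.sha1(payload).digest()

def page_ok(page: bytes) -> bool:
    """Recompute the digest; a torn or corrupted page fails the check."""
    return hashlib.sha1(page[:-DIGEST]).digest() == page[-DIGEST:]

old = make_page(b"old version")
new = make_page(b"new version")
# A torn write: the drive committed only the first sector of the new page
torn = new[:512] + old[512:]
print(page_ok(old), page_ok(new), page_ok(torn))   # True True False
```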

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-03 Thread Matthias Andree
On Tue, 01 Aug 2006, Ric Wheeler wrote:

 Mirroring a corrupt file system to a remote data center will mirror your 
 corruption.
 
 Rolling back to a snapshot typically only happens when you notice a 
 corruption which can go undetected for quite a while, so even that will 
 benefit from having reliability baked into the file system (i.e., it 
 should grumble about corruption to let you know that you need to roll 
 back or fsck or whatever).
 
 An even larger issue is that our tools, like fsck, which are used to 
 uncover these silent corruptions need to scale up to the point that they 
 can uncover issues in minutes instead of days.  A lot of the focus at 
 the file system workshop was around how to dramatically reduce the 
 repair time of file systems.

Which makes me wonder whether backup systems shouldn't help with this.
Since they read whole files anyway, they could easily compute strong
checksums as they go, record them for later use, and re-check some
percentage of unchanged files every day to complain about corruptions.
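As a sketch of that idea (file names and the sampling fraction here are invented), the backup tool records a SHA-256 per file while streaming it, then later re-reads a sample of files that should not have changed:

```python
import hashlib
import random

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# At backup time: record a strong checksum while the data is read anyway.
files = {"a.txt": b"hello", "b.txt": b"world", "c.txt": b"!"}
recorded = {name: digest(data) for name, data in files.items()}

def spot_check(current: dict, recorded: dict, fraction: float = 0.3) -> list:
    """Re-read a sample of supposedly unchanged files; report any whose
    contents no longer match the checksum taken at backup time."""
    sample = random.sample(sorted(recorded),
                           max(1, int(len(recorded) * fraction)))
    return [n for n in sample if digest(current[n]) != recorded[n]]

files["b.txt"] = b"w0rld"      # silent bit rot since the backup was taken
print(spot_check(files, recorded, fraction=1.0))   # ['b.txt']
```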

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-03 Thread Matthias Andree
On Tue, 01 Aug 2006, Hans Reiser wrote:

 You will want to try our compression plugin, it has an ecc for every 64k

What kind of forward error correction would that be, and how much and
what failure patterns can it correct? URL suffices.

-- 
Matthias Andree


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Matthias Andree
Adrian Ulrich schrieb am 2006-08-01:

  suspect, particularly with 7200/min (s)ATA crap. 
 
 Quoting myself (again):
  A quick'n'dirty ZFS-vs-UFS-vs-Reiser3-vs-Reiser4-vs-Ext3 'benchmark'
 
 Yeah, the test ran on a single SATA-Harddisk (quick'n'dirty).
 I'm so sorry but i don't have access to a $$$ Raid-System at home. 

I'm not asking you to perform testing on a RAID system with
SCSI or SAS, but I consider the obtained data (I am focusing on
transactions per unit of time) highly suspicious, and suspect write
caches have contributed their share - I haven't seen a drive that
shipped with its write cache disabled in the past years.

  sdparm --clear=WCE /dev/sda   # please.
 
 How about using /dev/emcpower* for the next benchmark?

No, it is valid to run the test on commodity hardware, but if you (or
rather the benchmark) claim transactions, I tend to think
ACID, and I highly doubt any 200 GB SATA drive manages 3000
synchronous writes per second without causing either serious
fragmentation or background block moving.
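Rough arithmetic backs this up: at 7200/min, each committed synchronous write has to wait out a seek plus, on average, half a platter revolution. Assuming a typical ~8.5 ms average seek for a desktop drive of that era (my figure, not the benchmark's):

```python
# Back-of-the-envelope ceiling for synchronous writes on a 7200 rpm drive.
rpm = 7200
rotational_latency = 60.0 / rpm / 2   # half a revolution on average, ~4.2 ms
seek = 0.0085                         # assumed average seek time, seconds
per_write = rotational_latency + seek # each sync write pays both penalties
print(round(1.0 / per_write))         # 79 - nowhere near 3000/s
```

Which is exactly the 70-100/s ballpark one would expect for honest synchronous I/O on such a drive.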

This is a figure I'd expect for synchronous random access to RAM disks
that have no seek and rotational latencies (and research into hybrid
disks with flash or other nonvolatile fast random-access media to cache
the actual rotating magnetic platter accesses is going on elsewhere).

I didn't mean to say your particular drive were crap, but 200GB SATA
drives are low end, like it or not -- still, I have one in my home
computer because these Samsung SP2004C are so nicely quiet.

 I might be able to re-run it in a few weeks if people are interested
 and if i receive constructive suggestions (= Postmark parameters,
 mkfs options, etc..)

I don't know Postmark, but I did suggest turning the write cache off. If
your system uses hdparm -W0 /dev/sda instead, go ahead. But you're
right to collect and evaluate suggestions first if you don't want to run
a new benchmark every day :)

-- 
Matthias Andree


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Matthias Andree
On Tue, 01 Aug 2006, Avi Kivity wrote:

 There's no reason to repack *all* of the data.  Many workloads write and 
 delete whole files, so file data should be contiguous.  The repacker 
 would only need to move metadata and small files.

Move small files? What for?

Even if it is only moving metadata, it is not different from what ext3
or xfs are doing today (rewriting metadata from the intent log or block
journal to the final location).

The UFS+softupdates from the BSD world looks pretty good at avoiding
unnecessary writes (at the expense of a long-running but nice background
fsck after a crash, which is however easy on the I/O as of recent FreeBSD
versions).  Which was their main point against logging/journaling BTW,
but they are porting XFS as well to save those that need instant
complete recovery.

-- 
Matthias Andree


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread Matthias Andree
Jan Engelhardt schrieb am 2006-08-01:

 I didn't mean to say your particular drive were crap, but 200GB SATA
 drives are low end, like it or not --
 
 And you think an 18 GB SCSI disk just does it better because it's SCSI?

18 GB SCSI disks are 1999 gear, so who cares?
Seagate didn't sell 200 GB SATA drives at that time.

 Esp. in long sequential reads.

You think SCSI drives aren't on par? Right, they're ahead.
98 MB/s for the fastest SCSI drives vs. 88 MB/s for Raptor 150 GB SATA
and 74 MB/s for the fastest other ATA drives.

(Figures obtained from StorageReview.com's Performance Database.)

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
Adrian Ulrich schrieb am 2006-07-31:

   And EXT3 imposes practical limits that ReiserFS doesn't as well. The big
   one being a fixed number of inodes that can't be adjusted on the fly,
  
  Right. Plan ahead.
 
 Ok: Assume that i've read the mke2fs manpage and added more inodes to
 my filesystem.
 
 So: What happens if i need to grow my filesystem by 200% after 1-2
 years? Can i add more inodes to Ext3 on-the-fly ?

Since you're growing, you'll be using resize2fs (or growfs, or mkfs -G for
UFS). resize2fs and the other tools do exactly that: add inodes - and
you could easily have verified this, either by reading the resize2fs code
or just by trying it on a temp file:

  -- create file system
  dd if=/dev/zero of=/tmp/foo bs=1k count=5
  /sbin/mke2fs -F -j /tmp/foo

  -- check no. of inodes
  /sbin/tune2fs -l /tmp/foo | grep -i inode | head -2
  # Inode count:  12544
  # Free inodes:  12533

  -- resize
  /sbin/e2fsck -f /tmp/foo
  dd if=/dev/zero bs=1k count=5 >> /tmp/foo
  /sbin/resize2fs /tmp/foo

  -- check no. of inodes
  /sbin/tune2fs -l /tmp/foo | grep -i inode
  # Inode count:  23296
  # Free inodes:  23285

Trying the same after mke2fs -b 1024 -i 1024 shows that the inode
density will continue to be respected.

FreeBSD 6.1's growfs(8) increases the number of inodes. This is
documented to work since 4.4.

Solaris 8's mkfs -G also increases the number of inodes and apparently
also works for mounted file systems.

This looks like an education issue rather than a technical limit.

 A filesystem with a fixed number of inodes (= not readjustable while
 mounted) is ehr.. somewhat unuseable for a lot of people with
 big and *flexible* storage needs (Talking about NetApp/EMC owners)

Which is untrue at least for Solaris, which allows resizing a live file
system. FreeBSD and Linux require an unmount.

 Why are a lot of Solaris-people using (buying) VxFS? Maybe because UFS
 also has such silly limitations? (..and performs awkward with trillions
 of files..?..)

Well, those silly limitations look mostly like hot air
spewed by marketroids who need to justify people spending money on their
new filesystem.

The only remaining problem is if you grossly overestimate the average file
size and with it underestimate the number of inodes needed. But even
then, I'd be interested to know whether that's a real problem for systems
such as ZFS.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
(resending complete message to the list).

Adrian Ulrich schrieb am 2006-07-31:

 Hello Matthias,
 
  This looks rather like an education issue rather than a technical limit.
 
 We aren't talking about the same issue: I was asking to do it
 on-the-fly. Umounting the filesystem, running e2fsck and resize2fs
 is something different ;-)

There was work by Andreas Dilger to support online resizing of
mounted ext2 file systems. I never cared to look into it (does it
support ext3, does it work with current kernels, what's the merge
status?) since offline resizing was always sufficient for me.

 A colleague of mine happened to create a ~300gb filesystem and started
 to migrate Mailboxes (Maildir-style format = many small files (1-3kb))
 to the new LUN. At about 70% the filesystem ran out of inodes;

Well - easy to fix, newfs again with proper inode density (perhaps 1 per
2 kB) and redo the migration. Of course you're free to pay for a new
file system if your fellow admin can't be bothered to remember newfs's
-i option.
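The arithmetic behind that failure is simple. With made-up but plausible numbers (8192 bytes-per-inode is a common mke2fs default - an assumption here; the 2 kB average comes from the 1-3 kB maildir messages mentioned above):

```python
# Why a maildir migration can run out of inodes: a worked example.
fs_bytes  = 300 * 10**9   # the ~300 GB filesystem from the anecdote
avg_file  = 2 * 1024      # maildir messages of 1-3 kB; take 2 kB average
default_i = 8192          # assumed default bytes-per-inode ratio

inodes_available = fs_bytes // default_i   # what the default mkfs gives you
inodes_needed    = fs_bytes // avg_file    # what the workload actually needs
print(inodes_available, inodes_needed)     # available is ~4x too small

# With -i 2048 the inode count is sized for the actual average file:
print(fs_bytes // 2048 >= inodes_needed)   # True
```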

  Well, such silly limitations... looks like they are mostly hot air
  spewn by marketroids that need to justify people spending money on their
  new filesystem.
 
 Have you ever seen VxFS or WAFL in action?

No I haven't. As long as they are commercial, it's not likely that I
will.

 Great to see that Sun ships a state-of-the-art Filesystem with
 Solaris... I think linux should do the same...

I think reallocating inodes for UFS and/or ext2/ext3 is possible, even
online, but someone needs to write, debug and field-test the code to do
that - possibly based on Andreas Dilger's earlier ext2 online resizing
work.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
Jan-Benedict Glaw schrieb am 2006-07-31:

 Uh?  Where did you face a problem there?
 
 With maildir, you shouldn't face any problems IMO. Even users with
 zillions of mails should work properly with the dir_index stuff:
 
   tune2fs -O dir_index /dev/hdXX
 
 or alternatively (to start that for already existing directories):
 
   e2fsck -fD /dev/hdXX

That is not an alternative: run tune2fs first, then e2fsck -fD
(which can't happen on a RW-mounted FS, and you should only try this on
your rootfs if you can reboot with magic SysRq or from a rescue system).

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
Jan-Benedict Glaw schrieb am 2006-07-31:

 On Mon, 2006-07-31 18:44:33 +0200, Rudy Zijlstra [EMAIL PROTECTED] wrote:
  On Mon, 31 Jul 2006, Jan-Benedict Glaw wrote:
   On Mon, 2006-07-31 17:59:58 +0200, Adrian Ulrich 
   [EMAIL PROTECTED] wrote:
A colleague of mine happened to create a ~300gb filesystem and started
to migrate Mailboxes (Maildir-style format = many small files (1-3kb))
to the new LUN. At about 70% the filesystem ran out of inodes; Not a
  
   So preparation work wasn't done.
  
  Of course you are right. Preparation work was not fully done. And using 
  ext1 would also have been possible. I suspect you are still using ext1, 
  cause with proper preparation it is perfectly usable.
 
 Oh, and before people start laughing at me, here are some personal or
 friend's experiences with different filesystems:
 
   * reiser3: A HDD containing a reiser3 filesystem was tried to be
 booted on a machine that fucked up DMA writes. Fortunately, it
 crashed really soon (right after going for read-write.)  After
 rebooting the HDD on a sane PeeCee, it refused to boot. Starting
 off some rescue system showed an _empty_ root filesystem.

Massive hardware problems don't count. ext2/ext3 doesn't look much better in
such cases. I had a machine with RAM gone bad (no ECC - I wonder what
idiot ordered a machine without ECC for a server, but anyways) and it
fucked up every 64th bit - only in a certain region. Guess what happened
to the fs when it went into e2fsck after a reboot. Boom. Same with a
dead DPTA that lost every 16th block or so, the rescue in the first case
was swapping the RAM and amrecover and in the second swapping the
drive and dsmc restore. OTOH, kernel panics on bad blocks are a no-no
of course.

   * A friend's XFS data partition (portable USB/FireWire HDD) once
 crashed due to being hot-unplugged off the USB.  The in-kernel XFS
 driver refused to mount that thing again, and the tools also
 refused to fix any errors. (Don't ask, no details at my hands...)

Don't use write caches then. (Though I've seen NUL-filled blocks in new
files, or appended to files, back in 2001 or 2002.)

   * JFS just always worked for me. Though I've never ever had a broken
 HDD where it (or it's tools) could have shown how well-done they
 were, so from a crash-recovery point of view, it's untested.

SUSE removed JFS support from their installation tool for technical
reasons they didn't specify in the release notes. Whatever.

 ext3 always worked well for me, so why should I abandon it?

Plus, it and its tools are maintained.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
Adrian Ulrich schrieb am 2006-07-31:

 Ehr: Such a migration (on a very busy system) takes *some* time (weeks).
 Re-Doing (migrate users back / recreate the FS / start again) the whole
 thing isn't really an option..

All the more important to think about FS requirements *before*
newfs-ing if a quick one day for rsync/star/dump+restore isn't
available. If you're hitting, for instance, the hash collision problem
in reiser3, you're as dead as with a FS without inodes.

   Have you ever seen VxFS or WAFL in action?
  
  No I haven't. As long as they are commercial, it's not likely that I
  will.
 
 Why?

I'm trying to shift my focus away from computer administration; better
file systems than old-style non-journalling, non-softupdates UFS
are available today, and more will follow.

Cc: list weeded out.

-- 
Matthias Andree


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread Matthias Andree
Adrian Ulrich wrote:

 See also: http://spam.workaround.ch/dull/postmark.txt
 
 A quick'n'dirty ZFS-vs-UFS-vs-Reiser3-vs-Reiser4-vs-Ext3 'benchmark'

Whatever Postmark does, this looks pretty beside the point.

Are these actual transactions with the Durability guarantee?
3000/s doesn't look much like synchronous I/O (otherwise
figures around 70/s, perhaps 100/s, would be more plausible), and cache
exercise is rather irrelevant for databases that manage real (= valuable)
data...

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
Jan-Benedict Glaw schrieb am 2006-07-31:

  Massive hardware problems don't count. ext2/ext3 doesn't look much better in
  such cases. I had a machine with RAM gone bad (no ECC - I wonder what
 
 They do! Very much, actually. These happen In Real Life, so I have to
 pay attention to them. Once you're in setups with > 1 machines,
 everything counts. At some certain point, you can even use HDD's
 temperature sensors in old machines to diagnose dead fans.
 
 Everything that eases recovery for whatever reason is something you
 have to pay attention to. The simplicity of ext{2,3} is something I
 really fail to find proper words for. As well as the really good fsck.
 Once seen a SIGSEGV'ing fsck, you really don't want to go there.

The point is: If you've written data with broken hardware (RAM, bus,
controllers - loads of them, CPU), what is on your disks is
untrustworthy anyways, and fsck isn't going to repair your gzip file
where every 64th bit has become a 1 or when the battery-backed write
cache threw 60 MB down the drain...

Of course, an fsck that crashes is unbearable, but that's a separate
issue from broken-hardware failures. You need backups spanning a few
generations to avoid losing data massively.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread Matthias Andree
Theodore Tso schrieb am 2006-07-31:

 With the latest e2fsprogs and 2.6 kernels, the online resizing support
 has been merged in, and as long as the filesystem was created with
 space reserved for growing the filesystem (which is now the default,
 or if the filesystem has the off-line prepration step ext2prepare run
 on it), you can run resize2fs on a mounted filesystem and grow an
 ext2/3 filesystem on-line.  And yes, you get more inodes as you add
 more disk blocks, using the original inode ratio that was established
 when the filesystem was created.

That's cool.

The interesting part for some people would be, if I read past postings
correctly, to change the inode ratio in an existing (perhaps even
mounted) file system without losing data.

(I'm not sure how many blocks have to be moved and/or changed for that
purpose, because I know too little about the on-disk ext2 layout, but
since block relocating is already in place for shrink support in the
offline resizer, some of the work appears to be done already.)

-- 
Matthias Andree


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread Matthias Andree
On Mon, 31 Jul 2006, Nate Diller wrote:

 this is only a limitation for filesystems which do in-place data and
 metadata updates.  this is why i mentioned the similarities to log
 file systems (see rosenblum and ousterhout, 1991).  they observed an
 order-of-magnitude increase in performance for such workloads on their
 system.

It's well known that workloads which would thrash on UFS or ext2fs may
show quieter access patterns with shorter strokes, and can benefit from
logging, data journaling, or whatever else turns seeks into serial
writes. Then again, with wandering logs (to avoid double writes) and the
like, you start wondering how much fragmentation you get as the price to
pay for avoiding seeks and double writes at the same time. TANSTAAFL -
or: how long can the system sustain such access patterns, particularly
once it gets under memory pressure and must move data? Even with lazy
allocation and other optimizations, I question the validity of 3000/s or
faster transaction frequencies. Even the 500 on ext3 are suspect,
particularly with 7200/min (s)ATA crap. This sounds pretty much like the
drive doing its best to shuffle blocks around in its 8 MB cache and
lazily writing back.

sdparm --clear=WCE /dev/sda   # please.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-27 Thread Matthias Andree
On Thu, 27 Jul 2006, Grzegorz Kulewski wrote:

 Sorry for my stupid question, but could you tell me why starting to make 
 incompatible changes to reiserfs3 now (when reiserfs3 technology is 
 rather old) and making reiserfs3 unstable (again), possibly for several 
 months or even years is better than fixing big issues with reiser4 (if 
 there are any really big left) merging it and trying to stabilize it?
 
 For end user both ways will result in mkfs so...

ext2fs and ext3fs, without plugins, added dir_index as a compatible
upgrade, with an e2fsck option (that implies optional) to build indices
for directories without them.

ext3fs is a compatible upgrade from ext2fs, it's as simple as unmount,
tune2fs -j, mount.

reiserfs 3.6 could deal with 3.5 file systems, and mount -o conv with
a 3.6 driver would convert a 3.5 file system to 3.6 level
(ISTR it had to do with large file support and perhaps NFS
exportability, but don't quote me on that).

I wonder what makes the hash overflow issue so complicated (other than
differing business plans, that is) that upgrading in place isn't
possible. Changes introduce instability, but namesys were proud of their
regression testing - so how sustainable is their internal test suite?

Instead, we're told reiser4 will fix this (quite likely) and that we
should wait until it's ready (OK, we shouldn't be using experimental
stuff for production but rather for /tmp, but the file system will take
many months to mature after integration) - and then it will be mkfs
time. So reiser4 had better be mature before we go that way, since
there's no way back short of amrecover, restore or tar -x.

Smashing out most of the Cc:s in order not to bore people.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-25 Thread Matthias Andree
On Tue, 25 Jul 2006, Denis Vlasenko wrote:

 I, on the contrary, want software to impose as few limits on me
 as possible.

As long as it's choosing some limit, I'll pick the one with fewer
surprises.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-24 Thread Matthias Andree
On Sun, 23 Jul 2006, Hans Reiser wrote:

 I want reiserfs to be the filesystem that professional system
 administrators view as the one with both the fastest technological pace,
 and the most conservative release management.

Well, I, with the administrator hat on, phased out all reiserfs file
systems and replaced them with ext3. This rid me of silent
corruptions, immature reiserfsprogs, and hash collision chain limits.

 I apologize to users  that the technology required a 5 year gap between
 releases.   It just did, an outsider may not realize how deep the
 changes we made were.  Things like per node locking based on a whole new
 approach to tree locking that goes bottom up instead of the usual top
 down are big tasks.Dancing trees are a big change, getting rid of
 blobs is a big change, wandering logs.  We did a lot of things like
 that, and got very fortunate with them.  If we had tried to add such
 changes to V3, the code would have been unstable the whole 5 years, and
 would not have come out right.

And that is something an administrator does not care the least about.
It must simply work, and the tools must simply work. Once I hit issues
like xfs_check believing / is mounted R/W (it doesn't ignore rootfs) and
refusing the R/O check, reiserfsck being unable to fix a R/O file system
(I believe this one got fixed before 3.6.19), or silent corruptions that
show up later in a routine reiserfsck --check after a kernel update, the
filesystem and its tools appear in a bad light. I've never had such
troubles with ext2fs, ext3fs, or FreeBSD's or Solaris's UFS.

I'm not sure what patches Chris added to SUSE's reiserfs, nor do I care
any more. The father declared his child unsupported, and that's the end
of the story for me. There's nothing wrong about focusing on newer code,
but the old code needs to be cared for, too, to fix remaining issues
such as the "can only have N files with the same hash value" limit. (I am well
aware this is exploiting worst-case behavior in a malicious sense but I
simply cannot risk such nonsense on a 270 GB RAID5 if users have shared
work directories.)
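For illustration, the collision is easy to reproduce in user space. The sketch below uses the r5 directory hash as I remember it from fs/reiserfs/hashes.c (treat the exact constants as an assumption); because the hash is a simple linear combination, colliding names such as "alxx"/"baxx" can even be constructed by hand:

```python
def r5(name: bytes) -> int:
    """reiserfs's default 'r5' directory hash, transcribed from memory
    of fs/reiserfs/hashes.c - treat the constants as an assumption."""
    a = 0
    for c in name:
        a += (c << 4) + (c >> 4)   # per-character mixing
        a = (a * 11) & 0xFFFFFFFF  # the kernel uses 32-bit arithmetic
    return a

# For 4-char names the hash is 11^4*f(c1) + 11^3*f(c2) + 11^2*f(c3)
# + 11*f(c4) with f(c) = 16*c + (c >> 4), so bumping c1 by one (+16 in f)
# is cancelled exactly by dropping c2 by eleven (-176 in f),
# since 14641*16 == 1331*176.
print(r5(b"alxx") == r5(b"baxx"))   # True: two names, one hash value
```

Generate enough such names in one directory and the limited per-hash generation counter overflows, which is exactly the failure the demonstration code exploited.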

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-24 Thread Matthias Andree
On Mon, 24 Jul 2006, Hans Reiser wrote:

 and that's the end
 of the story for me. There's nothing wrong about focusing on newer code,
 but the old code needs to be cared for, too, to fix remaining issues
 such as the can only have N files with the same hash value. 

 Requires a disk format change, in a filesystem without plugins, to fix it.

You see, I don't care an iota about plugins or other implementation details.

The bottom line is that reiserfs 3.6 imposes practical limits that ext3fs
doesn't, and that's reason enough for an administrator not to
install reiserfs 3.6. Sorry.

-- 
Matthias Andree


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-24 Thread Matthias Andree
Mike Benoit wrote:

 I've been bitten by running out of inodes on several occasions, and by
 switching to ReiserFS it saved one company I worked for over $250,000
 because they didn't need to buy a totally new piece of software.

ext3fs's inode density is configurable, reiserfs's hash overflow chain
length is not, and it doesn't show in df -i either.

If you need lots of inodes, mkfs for lots. That's old Unix lore.


Re: Congratulations! we have got hash function screwed up

2004-12-31 Thread Matthias Andree
On Thu, 30 Dec 2004, Hans Reiser wrote:

 A working undelete can either hog disk space or die the moment some
 large write comes in. And if you're at that point, make it a versioning
 file system
 
 Well, yes, it should be one.
 
 darpa is paying for views, add in a little versioning and.

If the view is something between a transactional view in an SQL
database and a device-mapper snapshot, then yes, it might be close
enough. There's however always the problem of capacity conflicts, and
there may need to be a switch that prefers keeping older versions over
discarding them, so that the admin with - by your leave - idiot
users has a chance to save his users' a**e*.

 - but then don't complain about space efficiency.

 This is an area where apple was smarter than Unix.  Having a trash can 
 is what real users need, more than they need performance..

Does Apple's trash can help against files that get overwritten in situ?
If not, it's insufficient to fix another common failure. My Mom is
prone (is that word applicable to human behavior?) to open a file
(say, a half year schedule of the local community), edit it, without
saving it under a new name - by the time she's completed her edit, she
has forgotten to rename it and boom, old file dead. Next week she wants
it back...

 I would however auto-empty the trash can when space got low

That isn't desired. See above.

 Well, it hasn't been coded solely because we haven't gotten around to it 
 what with all else that needs doing and still needs doing.  Remind me 
 about this in a year.:)

Save this mail to a file and have atd mail it to you. Or use a calendar :)

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
Hans Reiser [EMAIL PROTECTED] writes:

Again, this is a lame excuse for a bug. First you declare some features of
your filesystem; later, when it turns out they aren't being delivered,
you act as if this were a known condition.

 Well this is true, you are right.  Reiser4 is the fix though.

No, it isn't. Reiser4 is an alternative beast. Or will it transparently
fix the collision problem in a 3.5 or 3.6 file system, in a way that
is backwards compatible with 3.6 drivers? If not, please fix reiser3.6.

Given that Reiser4 isn't proven yet in the field (for that, it would
have to be used as the default file system by at least one major
distributor for at least a year), it is certainly not an option for
servers _yet_.

A file system that non-transparently (i. e. with no inode count or block
count to show why) refuses to create a new file doesn't belong on _my_
production machines, which shall migrate away from reiserfs on the next
suitable occasion (such as upgrades). There's ext3fs, jfs, xfs, and in
2006 or 2007 we'll talk about reiser4 again. Yes, I am conservative WRT
file systems and storage.

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
Yiannis Mavroukakis [EMAIL PROTECTED] writes:

 Your proven reasoning sounds a bit strange to me..Microsoft (aka
 major distributor at least in my books) had her filesystems in the
 field for ages, does this prove any of them good (or bad for that
 matter)?

My reasoning mentioned a /required/, but not a /sufficient/ criterion.

In other words: not before it is proven in the field will I consider it
for production use.

Remember the Linux 2.2 reiserfs 3.5 NFS woes?
Remember the early XFS-NFS woes?

These are all reasons to avoid a shiny new file system for serious work.

 I don't think I'd wait for a distributor to shove reiser4 down my
 throat, just because the distributor seems to trust it, so the logical
 course would be for me to try it out. I'll grant you that I am not
 using it on the mission critical server, because our hosting provider
 will not support it (ext3 addicts..oh well)

For practical recovery reasons (error on root FS after a crash), ext3fs
is easier to handle. You can fsck the (R/O) root partition just fine
(e2fsck then asks you to reboot right away); for reiserfs, you'll have
to boot into some emergency or rescue system...

 but I do have it on my development server, that does house critical
 code and receives all kinds of hammering from yours truly; And I use
 it at home.

I reformatted my last at-home reiserfs an hour ago and unloaded the
reiserfs kernel module, as the way Hans has responded to the error
report is unacceptable.

Anyone is free to choose their file system. The simple demonstration
code posted earlier shows a serious flaw in reiserfs, Hans's response
was bald-faced, and so I ditched reiserfs3. End of story.

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
Yiannis Mavroukakis [EMAIL PROTECTED] writes:

 I agree, but you're generalising, this is not xfs and reiser4 is not 3.5
 ;)
 If you don't try out the shiny new filesystem yourself, how can you
 possibly dismiss it based on the past failures
 of other filesystems? 

I doubt new software is bug-free. I don't expect NFS problems with
reiser4 though, these should be in the regression tests. :-)

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
Cal [EMAIL PROTECTED] writes:

 --
 and then at Thu, 30 Dec 2004 13:40:53 +0100, it was written ...
  ... 
  Anyone is free to choose the file system, and as the simple
  demonstration code posted earlier shows a serious flaw in reiserfs,
  Hans's response was boldfaced, I ditched reiserfs3. End of story.
  

 Your policy and philosophy on file system selection are yours to enjoy as
 you see fit, but the anger and angst ... ?   Phew!! 

I have no interest to deal with systems that have known and reproducible
cases of failure that are nondeterministic in practical use.

And Marc's documentation showed this is a real-world problem, not an
ivory tower problem.

The reiserfs story is over for me. All private machines I deal with are
reiserfs-free as of a few hours ago.

It was just one bug too many, and it was handled unprofessionally,
unlike many earlier bugs, which had usually been dealt with on short
notice, or at least accepted for investigation.

I'll phase reiser3 out on my work machines as I see fit.

I have seen too many bugs in reiserfs3.

I do believe reiserfs4 fixes some design flaws of reiser3, and when the
implementation issues are all shaken out in one or two years' time, it
may be a good file system and I will look at it - I trust the reiserfs
team can learn from their mistakes.

I hope they learn that THIS handling of the error was wrong.

"Who cares, not us for the past five years" is not a proper response to
a real-world problem.

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
On Thu, 30 Dec 2004, Hans Reiser wrote:

 Fixing hash collisions in V3 to do them the way V4 does them would 
 create more bugs and user disruption than the current bug we have all 
 lived with for 5 years until now.  If someone thinks it is a small 
 change to fix it, send me a patch.  Better by far to fix bugs in V4, 
 which is pretty stable these days.

Better to fix a known bug than create a file system vacuum before V4 is
really stable.

Anyways, I don't care any more, I'm phasing out ReiserFS v3 and have no
plans to try V4 before 2006.

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
On Thu, 30 Dec 2004, Burnes, James wrote:

 (BTW: If Hans is a little tired of working on Reiser3 it's probably
 because he is currently stressed out making last minute tweaks on
 Reiser4 and managing his team.
 
 Cut him some slack.  Email conversations don't show a number of things
 we take for granted, like that fact that the person we're talking to
 looks really tired etc.  Unlike ext3, XFS and JFS, Reiser isn't funded
 by someone with huge pockets.)

I'm willing to grant ANY time-out. If Hans had written "I have a pile of
deadline reiser4 contract work before I can deal with that", fine; but he
didn't, and instead said "use reiser4". And that's inadequate.

And I say this without any emotions, red head, swelling veins and such.

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
On Thu, 30 Dec 2004, Esben Stien wrote:

 Sure, but your not factoring in murphys law here. A tool to undelete
 would come many people in handy who even got proper backup solutions.

You're asking for a versioned file system. If reiserfs v4 doesn't offer
such properties, find something else that does.

 Besides, a recovery is not the same as a feature to undelete. Such a
 feature would maybe save a days work, which is a lot in some circles.

Backup more often. Staged backup schemes (hourly, daily, weekly,
monthly) with varying levels of differential or complete backups, plus
off-site archives, are probably a good idea for these circles then.

-- 
Matthias Andree


Re: Congratulations! we have got hash function screwed up

2004-12-30 Thread Matthias Andree
Spam [EMAIL PROTECTED] writes:

   In any case. Undelete has been since ages on many platforms. It IS a
   useful feature. Accidents CAN happen for many reasons and in some
   cases you may need to recover data.

   Besides, a deletion does not fully remove the data, but just unlinks
   it. In Reiser where there is tailing etc for small files this can be
   a problem. Either the little file might not be able to be recovered
   (shouldn't the data still exist, even if it is tailed), or the user
   need to use a non-tailing policy?

A working undelete can either hog disk space or die the moment some
large write comes in. And if you're at that point, make it a versioning
file system - but then don't complain about space efficiency.

   well, overwritten data is not so easy to get back. But from what I
   understand in Linux, is that many applications actually write
   another file and then unlinks the old file? If that is the case then
   it may even be possible to get back some overwritten files!

I see enough applications that just overwrite an output file.

This whole discussion doesn't belong here until someone talks about
implementing a whole versioning system for reiser4.

-- 
Matthias Andree


Re: I oppose Chris and Jeff's patch to add an unnecessary additional namespace to ReiserFS

2004-04-26 Thread Matthias Andree
On Mon, 26 Apr 2004, Chris Mason wrote:

 I hope v4 does improve the xattr api, and I hope it manages to do so for
 more then just reiser4.  It is important that application writers are
 able to code to a single interface and get coverage across all the major
 linux filesystems.

Interesting point, given that SuSE were early adopters of alternative
file systems such as JFS, ReiserFS, and XFS (in lexicographical order
rather than order of appearance). These have always diversified the
semantics offered, not only in adding features that other systems didn't
have, but also in omitting features the other file systems did have -
chattr, for instance, or tail merging that confused boot loaders, for
another.

With respect to Hans's reasoning about name spaces, is there an official
standard that mandates a particular API for the ACL stuff (POSIX)?

If so, the whole discussion is about getting out of the frying pan and
into the fire. The traditional approach will then be standards compliant
but be out-of-band and outside of the file system name space, the new
approach will be outside of the standards, requiring application
developers to produce a Linux and a POSIX version.

Or am I barking up the wrong tree?

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Re: online fsck

2004-04-22 Thread Matthias Andree
On Thu, 22 Apr 2004, Jure Pečar wrote:

 Is it theoretically posible?

It is actually implemented in the BSD kernels, for UFS. Look for
softdep or softupdates.

As for other file systems, when crashing while the write cache is
enabled (unfortunately, it is in FreeBSD, for instance), it can royally
screw up your file system. Beyond repair, to a "use your backup" state.

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


reiserfs corruption through 2.6 crash?

2004-03-11 Thread Matthias Andree


rebuild.log.gz
Description: gzipped reiserfsck --rebuild-tree log


Re: nfs with reiserfs and ufs on freebsd

2004-02-18 Thread Matthias Andree
Philippe Gramoullé [EMAIL PROTECTED] writes:

 Hello,

 If you don't use BSD flock() nor O_EXCL in a dotlocking scheme, there
 shouldn't be any problem with an NFS Linux server and a BSD client.

FreeBSD 4.8 as NFS client doesn't do any locking at all.

O_EXCL doesn't work across NFS anyway; the workaround is to use mkstemp(2)
and link(2).

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


2.4.24 Oops in find (maybe reiserfs related)

2004-01-20 Thread Matthias Andree
This happened during the nightly updatedb, which calls find. The hex
string is "resi"; locate resi finds a file in a reiserfs file system,
/usr.

reiserfsck 3.6.11 afterwards fixed some
vpf-10680: The file [103048 103049] has the wrong block count in the
StatData (2) - corrected to (1)

But I doubt these are related. Are they?

Unable to handle kernel paging request at virtual address 72657369
 printing eip:
72657369
*pde = 
Oops: 
CPU:0
EIP:0010:[72657369]Not tainted
EFLAGS: 00010206
eax: f8bce0a0   ebx: 72657369   ecx: c1c1f13c   edx: f117dec0
esi: ec837f98   edi: 08060828   ebp: b258   esp: ec837f8c
ds: 0018   es: 0018   ss: 0018
Process find (pid: 7765, stackpage=ec837000)
Stack: c014ebf1 f117dec0 b530 f117dec0 f7edae80 1000 ebcc0be0 0001
   0008 0001 1000 ec836000 b530 c01090eb 08060831 b530
   080541cc b530 08060828 b258 00c4 002b 002b 00c4
Call Trace:[sys_lstat64+129/144] [system_call+51/56]
Call Trace:[c014ebf1] [c01090eb]
Code:  Bad EIP value.

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Re: 2.4.24 Oops in find (maybe reiserfs related)

2004-01-20 Thread Matthias Andree
On Tue, 20 Jan 2004, Matthias Andree wrote:

 This happened during the nightly updatedb, which calls find. The hex
 string is resi, locate resi finds a file in a reiserfs file system,
 /usr.
 
 reiserfsck 3.6.11 afterwards fixed some
 vpf-10680: The file [103048 103049] has the wrong block count in the
 StatData (2) - corrected to (1)

I have put the vmlinux, bzImage, modules and .config available, if
anyone cares to have a look, send me a mail off-list and I'll by happy
to return the URL. Marcello has been mailed the URL.

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Re: Linux 2.4 - 2.6 migration

2003-11-08 Thread Matthias Andree
Sebastian Kaps [EMAIL PROTECTED] writes:

 Is there something concerning ReiserFS I should know when migrating from
 Linux 2.4 to Linux 2.6?

No. Make sure the file system is fine before booting a new kernel; using
a CURRENT reiserfsprogs is essential. Namesys have made lots of fixes
recently.

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Re: reiserfsprogs-3.6.12-pre1 release

2003-11-01 Thread Matthias Andree
Vitaly Fertman [EMAIL PROTECTED] writes:

 The new pre release is available for downloading on
 ftp://ftp.namesys.com/pub/reiserfsprogs/pre/reiserfsprogs-3.6.12-pre1.tar.gz
...
 *reiserfsck can check ro mounted filesystems.

Does this use the "reboot Linux" exit codes that (e2)fsck uses?

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Re: r4 v. ext3, quick speed vs. cpu experiments

2003-08-14 Thread Matthias Andree
Szakacsits Szabolcs [EMAIL PROTECTED] writes:

 Yes, if you have enough CPU capacity (aka you don't run anything else, just
 bechmarking filesystems). Otherwise it seems to be slower. That's I was
 refering to.

This has been the situation with reiserfs 3.5/3.6 before, and it got
resolved, or so it appears. I don't have ext3-vs-reiserfs3.6 figures at
hand, but I'm not aware of CPU bottlenecks in the reiserfs3.6 code. Just
wait a couple of months until the reiserfs gurus have got their reiserfs4
beast stable and debugged and can focus on tuning.

To a previous post about code size and execution speed: it's not
generally true that larger code is also slower. It depends on how that
code is arranged. If you have many abstractions, then maybe it's slower.
If you have many specialized functions in an otherwise flat profile, it
can be a good deal faster than simpler (less complex) code.

-- 
Matthias Andree


Re: [reiserfs-list] Catastrophe with mailboxes on ReiserFS

2002-10-10 Thread Matthias Andree

On Thu, 10 Oct 2002, Newsmail wrote:

 What are those write barrier patches? and where can I find them? I was also 

I don't know. Some vendor kernels have them (SuSE), but I'm not sure
where they are available ATM.

 hurted by several 'cross linked' files, described in here. is there a way 
 to deactivate write caching for just one filesystem?

No. You can only disable it per drive.

 ps: please somebody let me know where I can find documentation about those 
 chris mason patches, or at least what is dara=ordered or data=journal 
 options. everybody talks about it, nobody explains. or did I miss something?

You can safely consult the ext3 documentation on these options; reiserfs
should behave the same with respect to them.

-- 
Matthias Andree



Re: [reiserfs-list] Catastrophe with mailboxes on ReiserFS

2002-10-09 Thread Matthias Andree

Oleg Drokin [EMAIL PROTECTED] writes:

 BTW, I just remembered that until you apply Chris' Mason data logging patches,
 there is a certain window where system crash would lead to deleted 
 data appearing at the end of files that were appended before the
 crash.

I'd like to see these patches merged. I hurt myself several times with
reiserfs when the computer locked up during heavy file system activity:
NUL blocks in files, files mixed up, and the like. I've never seen
this on any data=ordered or data=journal file system, whether ext3
or reiserfs.

Of course, write caches must be turned off, or the write barrier patches
applied, to be safe in case of a power blackout.

-- 
Matthias Andree



Re: [reiserfs-list] reiserfsck not fixing vs-5150/vs-13070

2002-08-30 Thread Matthias Andree

Hubert Mantel [EMAIL PROTECTED] writes:

 This is an unofficial kernel with lots of new features and work in
 progress. The README file explicitly warns not to use this one on a
 production system yet. Depending on the version you are using, there
 exists even a random memory corruption bug that only recently got fixed.

Thanks for the heads-up. I'm aware that your unofficial kernel is WIP,
and that it can inadvertently eat my lunch, burn my house and exhume my
grandmother -- and even memory corruptions /could/ serve to improve
fsck.*.

Still, it looks as though the problems really escaped reiserfsck v3.6.3
but no longer escape v3.6.4-pre1.

I'll check your kernel's changelog to get the fix though.

-- 
Matthias Andree



[reiserfs-list] reiserfsck not fixing vs-5150/vs-13070

2002-08-28 Thread Matthias Andree

-- 
Matthias Andree



Re: [reiserfs-list] 'let the hdd remap the bad blocks'

2002-08-20 Thread Matthias Andree

Hans Reiser [EMAIL PROTECTED] writes:

 Just taking a guess, many hard drives have difficult and time-consuming
 procedures that they can go through to read a troublesome block.  These
 can take 20-30 seconds.  Probably if they have to go through these
 procedures, once they finally succeed the smart vendors remap the block.

They should try to rewrite and write-verify the block before remapping
it, as there is only a finite number of spare sectors.

For SCSI drives, there's also Jörg Schilling's sformat tool that can
do the badblocks stuff directly in the drive rather than through all
the kernel buffers, and can also refresh or reassign bad blocks.

-- 
Matthias Andree



Re: [reiserfs-list] 'let the hdd remap the bad blocks'

2002-08-20 Thread Matthias Andree

On Tue, 20 Aug 2002, Hans Reiser wrote:

 Vitaly, take a look at that.  Part of a good user interface is letting 
 users know what tools are available.  Remember, most users will 
 encounter a failing drive and/or fsck on a journaling fs as a rare and 
 stressful event in their lives, so it is good to educate them with URLs 
 and other references at the time they run fsck.

A propos URL, here we go:

ftp://ftp.fokus.gmd.de/pub/unix/sformat/

-- 
Matthias Andree



Re: [reiserfs-list] 'let the hdd remap the bad blocks'

2002-08-19 Thread Matthias Andree

Oleg Drokin [EMAIL PROTECTED] writes:

Basically you'd better search for this on the HDD vendors' sites.
What's going on can simply be described this way:
You write some block to the HDD; if the HDD decides the block is bad for
some reason and remapping is allowed (usually by turning on SMART), the
block is written to a different on-platter location and the drive adds one
more entry to its remapped-blocks list. The next time you read this block,
the drive consults its remapped-blocks list and, if the block is remapped,
reads it from the new location with the correct content.
The described mechanism works for writing.
Actually I've seen something that looks like remapping on read, though
I have no meaningful explanation for that (except that they may have some
extra redundant info stored when you write data to disk, so that if a
sector cannot be read, its content is restored with that redundant
information and the sector is then remapped). And this process takes a
lot of time.

My Fujitsu MAH-3182MP drive (SCSI, actually) shipped with ARRE enabled
but ARWE disabled, for reasons I cannot tell, not even from the data book
(PDF). That's Automatic Remap on Read/Write Error. I'm not sure what it
really means, but if the drive really remaps on a read error, then a
block that was being written when power was lost will be remapped (and
thus leaked) the next time it is read. So I enabled ARWE as well. IDE
users are not too lucky unless their vendor provides them with a tool
(and not many ship raw floppy images; many have some multi-MB Windoze
tools just to write a few hundred kBytes to a floppy disk...)

-- 
Matthias Andree



Re: [reiserfs-list] Corruption: --fix-fixable results in all nlinkvalues = 0

2002-08-16 Thread Matthias Andree

Gerrit Hannaert [EMAIL PROTECTED] writes:

 Is there a difference in the way reiserfs formats as opposed to ext2/3?
 Your mentioning the defective blocks were never read before reminds me

Well, the long explanation is that the blocks may not have been used
for some time, or that they have gone bad recently; such things happen,
although you'd expect that from DTLA-3070xx drives sooner than from
others.

 of the fact that a Reiserfs format is much quicker than any other
 filesystem's format - does it mean anything?

The on-disk layout is different, but I'm not aware of the internals, and
I don't believe that the allocation pattern changes anything about the
facts.

 Perhaps it is good practice to run 'badblocks' before any initial
 format... if there is no option to format/scan or something.

Neither mke2fs nor mkreiserfs reads or writes all blocks when formatting;
they instead write some metadata, and that's it. Of course, running
badblocks prior to formatting is an option to find such defects earlier.

 This one is a MAXTOR 6L080J4. But I've seen these 'dma' issues with
 *all* my other drives as well (IBM-DTLA-307045, FUJITSU MPE3136AH) on
 different PCs with Intel and Via motherboards.

It's not DMA but the unrecoverable-error part that matters. The DMA error
is a consequence of the defective block (there is no data that could be
transferred); the DMA is *not* the cause of the bad block.

-- 
Matthias Andree



Re: [reiserfs-list] Corruption: --fix-fixable results in all nlinkvalues = 0

2002-08-16 Thread Matthias Andree

Stefan Fleiter [EMAIL PROTECTED] writes:

 Hi Vitaly!

 On Thu, 15 Aug 2002 Vitaly Fertman wrote:

 Ah, I guess I know what happened. I think you have some fatal corruptions and 
 rebuild-tree is required. In this case check and fix-fixable do not perform 
 semantic check.

 Then reiserfsck should not start in fix-fixable mode when rebuild-tree is
 required.
 People think that fix-fiable is less dangerous. You have shown in some
 situations it is the other way round...

 I propos a new reiserfsck version with only this fix included!

Hum, if reiserfsck can tell whether fix-fixable or rebuild-tree is the
right one, then it should also be able to abort the fix-fixable run and
tell the user to run rebuild-tree. Maybe such needs-fix-fixable and
needs-rebuild-tree flags should be stored in the super block, much
like ext2 stores the "filesystem has errors" condition.

-- 
Matthias Andree




[reiserfs-list] [OT] traffic magnet dot com crap

2002-06-05 Thread Matthias Andree

Christine Hall [EMAIL PROTECTED] writes:

 I visited http://namesys.com, and noticed that you're not listed on
 some search engines! I think we can offer you a service which can help
 you increase traffic and the number of visitors to your website.

Blacklist this site, trafficmagnet.net. They will come back.
(They keep coming back to haunt the university site that I administer; I
regularly see rejects in my mailer's log file.)

-- 
Matthias Andree



Re: [reiserfs-list] fsync() Performance Issue

2002-04-29 Thread Matthias Andree

Toby Dickenson [EMAIL PROTECTED] writes:

 write to file A
 write to file B
 write to file C
 sync

Be careful with this approach. Apart from syncing other processes' dirty
data, sync() does not make the same guarantees as fsync() does.

Barring write-cache effects, fsync() only returns after all blocks are
on disk. I'm not sure whether, and if so which, Linux file systems are
affected, but for portable applications, be aware that sync() may return
prematurely (and is allowed to!).

-- 
Matthias Andree



Re: [reiserfs-list] Bad blocks

2002-04-16 Thread Matthias Andree

Sam Vilain [EMAIL PROTECTED] writes:

 How can I deal with this?  If anyone knows of a tool to re-format just
 8 sectors (to let the disk re-map the blocks elsewhere), that also
 would be helpful.

Manufacturers may have these tools, but some do a full low-level
format. Usually, writing the bad blocks will make the drive remap them
if it has spare sectors left to remap to.

-- 
Matthias Andree



Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?

2002-03-04 Thread Matthias Andree

On Mon, 04 Mar 2002, Oleg Drokin wrote:

  But how much seeking is done on one 650 MB file that's been written onto
  an empty partition? I presume, not too much
 1625*2 seeks (that's right, 2 seeks per each 4M of data)
 This figure is for reading
 Writing is more complex due to journal

So let's assume we read 4 M in half a second, so we seek four times per
second; that gives us like 80 ms of penalty per second on a slow drive.
That causes a performance drop to like 92% of the raw transfer rate; even
when adding the same penalty again for bus arbitration, we should still
get 86% out.

However, I hope to be able to play with the settings later, to figure out
what's going on there.

-- 
Matthias Andree

GPG encrypted mail welcome, unless it's unsolicited commercial email



Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?

2002-03-03 Thread Matthias Andree

Chris Mason [EMAIL PROTECTED] writes:

 I would not say that speeds this bad are a known problem.  1.9MB/s is
 much too slow.  Is that FS very full?  Fragmentation is the only thing
 that should be causing this.

We can exclude that; the partition is empty except for that single file,
or maybe two files at several hundred MB each. CD images, Debian 2.2r5 in
my case.

-- 
Matthias Andree

GPG encrypted mail welcome, unless it's unsolicited commercial email.



Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?

2002-03-03 Thread Matthias Andree

Oleg Drokin [EMAIL PROTECTED] writes:

 Yes, it is slow, but overal disk throughput of 7M/sec suggests this is
 old drive. Old drives tend to have worse seeking speed than today's drives.

But how much seeking is done on one 650 MB file that's been written onto
an empty partition? I presume, not too much.

-- 
Matthias Andree

GPG encrypted mail welcome, unless it's unsolicited commercial email.



Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?

2002-03-03 Thread Matthias Andree

Anders Widman [EMAIL PROTECTED] writes:

 Even with 'heavy' fragmentation this is quite low. A quick benchmark
 of my 5400rpm 80GB disk gave me an average on 30MB/s. However, when
 simulating large fragmentation (10 000+ fragments on a 1GB file) I get
 about 2MB/s.

 Is DMA, unmask IRQ, read ahead and similar activated?

SCSI here, with the aic7xxx 5.x and 6.x drivers, no particular tuning in
place except that I told the AHA2940 to negotiate Ultra-Wide; it has
braindead default settings (negotiates 10 MXfers/s only, no Ultra). So
we can safely assume it did DMA.

-- 
Matthias Andree

GPG encrypted mail welcome, unless it's unsolicited commercial email.



Re: [reiserfs-list] Boot failure: msdos pushes in front of reiserfs

2002-01-28 Thread Matthias Andree

Hubert Mantel [EMAIL PROTECTED] writes:

 Installation time is after boot time. Use a Unix-style file system. Go
 for minix, that's small and will not get in the way.

 So the modules floppy would need to be minix also. We had that in the 
 past.

No need, you can load fat.o + vfat.o from initrd or romfs or something.

 People don't need to mount these floppies. Agreed. But guess what: They do
 it nevertheless. At least they try to do so. Go work in the support
 department of some Linux distributor for some weeks. This might change
 your view of some things drastically.

That particular person asked in de.comp.os.unix.linux.moderated rather
than bothering your installation support team -- so much for the changed
view.

 3. the kernel fails to keep rootfstype over an initrd. There should
be an additional parameter like initrdfstype or something to let
users override autodetection. Or use *BSD-style, support a file
system tag in the root= bootparam):  root=reiserfs:/dev/hda13 would
specify mounting hda13 as root and try reiserfs first, and
initrd=vfat:/some/braindead/location whatever.

 Ok, so you are no longer requesting a distributor fixing his bugs, but now
 you want distributors to provide a workaround for some kernel
 shortcomming.

 Quite different for me.

I was skiing for a week, that helps sometimes. :-)

Still, I think the distributor should first modprobe reiserfs and only
after that modprobe vfat. That way, reiserfs is tried BEFORE vfat and
all is fine. Robustness of a boot procedure is not a luxury, but a
requirement.

 1. bootloader loads kernel with ext2/minix driver and corrsponding initrd
 2. kernel bootstraps its initrd
 3. initrd does modprobe reiserfs jfs xfs (whatever native)

 Where do you get those modules from? They don't fit onto one single 
 floppy; there is already the kernel and initrd on the first floppy, so we 
 need some filesystem in order to access the second floppy.

Is there no room left for vfat.o inside the initrd when the initrd is
minix? I think it does not matter if you link vfat into the kernel or
drop it into the initrd, either is gzipped.

I feel very uncomfortable with a crippled file system (no devices!) to
bootstrap Linux. It seems to me like a dead end. Maybe there's a better
way than initrd (initramfs had been suggested, cramfs also), I didn't
look (and I usually boot my boxen without initrd, no RAID here).

 Since you don't have to deal with customers, it is very convenient for you 
 to demand we just use minix and answer all the newbies that report 
 defective floppies shipped by us.

In fact, I've yet to see a broken floppy shipped by SuSE. The worst
thing I came across was a CD of some 6.x version which the CD-ROM drive
read at 8x speed (and I didn't bother to have that exchanged).

-- 
Matthias Andree

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety. Benjamin Franklin



Re: [reiserfs-list] Boot failure: msdos pushes in front of reiserfs

2002-01-15 Thread Matthias Andree

On Mon, 14 Jan 2002, Oleg Drokin wrote:

 Looking at init/main.c and fs/super.c, rootfsflags parameter is never
 saved, moreover - it's original value is destroyed, once initrd fs is
 mounted.  And I only see not very nice ways of fixing this, so perhaps
 someone more exeprienced can come up with the solution?  (my crappy
 ides is not to do putname() on fs_names, if (real_root_dev !=
 ROOT_DEV), all of this is only when CONFIG_..._INITRD enabled)

Thanks for confirming a bug, so I understand that mounting an initrd
loses the rootfsflags, and as the actual root= parameter is kept over an
initrd boot, it should also be possible for rootfsflags= -- can the
rootfsflags maybe be saved along with the root= parameter?

  Yup, reiserfs is last in /proc/filesystems when loaded as module, but on
  my private machine (where it's linked into the kernel), it's right after
  ext2 and before vfat.
 Do you have vfat as a loadable module?

Hum, yes, but that's not the point. Someone turned up with a SuSE 7.3
default kernel .config, and it had ...MSDOS=y ...REISERFS=m -- that says
it all: msdos is higher in the list, reiserfs is then loaded from the
initrd, and thus ends up at the bottom of the list. Strangely enough,
SuSE compile MSDOS, which hardly anyone needs at boot time, into the
kernel, but not reiserfs (admittedly, reiserfs takes up some memory, but
then, it's a native file system and should be loaded before non-native
file systems such as msdos, vfat, ntfs, freevxfs or whatever). This one
is for the distributors to fix.

Had they left MSDOS as a module, things would have worked out: 1. ext2
in the kernel 2. initrd loads reiserfs 3. actual root (reiserfs) is
mounted 4. only now, msdos.o becomes available.

-- 
Matthias Andree

They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety. Benjamin Franklin



Re: [reiserfs-list] writeback caching okay?

2001-10-04 Thread Matthias Andree

Gregory Ade [EMAIL PROTECTED] writes:

 I was just wondering if there are any known problems using ReiserFS on
 disk drives that have writeback caching enabled?
 
 I realized that there is a possibility of the filesystem getting royally
 screwed if there is a sudden loss of power, but the system I'd do this on
 is on a UPS, so this shouldn't be an issue unless my cat figures out how
 to power off my UPS units.

Well, a piece of cardboard and some adhesive tape should fix /that/
problem. Pray the PSU in the machine itself doesn't fail. If your UPS
has a decent surge filter and is fast to ramp up the voltage should the
mains supply fail or show brownouts, then the risk may be small enough.

-- 
Matthias Andree

Those who give up essential liberties for temporary safety deserve
neither liberty nor safety. - Benjamin Franklin



[reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better

2001-09-25 Thread Matthias Andree

On Tue, 25 Sep 2001, Alex Bligh - linux-kernel wrote:

 Probably because sectors are so close together on the physical media.
 If you disable write caching, and are writing sectors 1001, 1002, 1003
 etc., you tell it to write sector 1001, and it doesn't complete until
 it's written it, you IRQ the PC, and it sends the write out for 1002,
 which completes a little later. However, by this time 1002 has
 flown past the drive head, as it wasn't immediately queued on the drive.
 If you had only one sector of writeahead, this effect would disappear
 (but is just as theoretically dangerous if there is no way to force
 a flush() of the write cache).

Which leads me to the question: which ATA standard brought in the
mandatory FLUSH CACHE command? I saw it's in the ATA-6 draft. How about
the standards used in drives that are sold today, ATA-4 or ATA-5? Do they
have the FLUSH CACHE command listed, possibly as mandatory? It might be
rather useful to issue after a synchronous write.

-- 
Matthias Andree

Those who give up essential liberties for temporary safety deserve
neither liberty nor safety. - Benjamin Franklin



[reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better

2001-09-24 Thread Matthias Andree

On Mon, 24 Sep 2001, Alan Cox wrote:

  Those drives should be blacklisted and rejected as soon as someone tries
  to mount those pieces rw. Either the drive can make guarantees when a
  write to permanent storage has COMPLETED (either by switching off the
  cache or by a flush operation) or it belongs ripped out of the boxes and
  stuffed down the throat of the idiot who built it.
 
 In which case you can choose between ancient ST-506 drives and SCSI

Sorry, a disk drive which makes no guarantees even after a flush, does
not belong in my boxen. I'd return it as broken the first day I figured
it did lazy write-back caching. No file system can be safe on such
disks.

-- 
Matthias Andree

Those who give up essential liberties for temporary safety deserve
neither liberty nor safety. - Benjamin Franklin



Re: [reiserfs-list] filesystem interfaces

2001-05-22 Thread Matthias Andree

On Mon, 21 May 2001, Xuan Baldauf wrote:

 Transaction are not only needed by mail servers, databases will like it, and
 file copying (ftp, scp, even nfs) do need filesystem transactions.

Well. Rsync emulates those, transferring files to temporary names and
renaming them atomically into place. :-)
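The rsync trick described above can be sketched as follows (the function
and file names are illustrative, not rsync's own). POSIX rename()
replaces the target atomically, so readers see either the complete old
file or the complete new one, never a partial write:

```python
import os

def atomic_replace(path: str, data: bytes) -> None:
    """Write data to a temporary name, then rename it into place."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)         # data must be on disk before it becomes visible
    finally:
        os.close(fd)
    os.rename(tmp, path)     # atomic switch-over to the new contents
    # fsync the containing directory too, so the rename itself
    # survives a crash, not just the file data
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Note the fsync() before the rename: without it, a crash could leave the
new name pointing at a file whose blocks never reached the platter.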

-- 
Matthias Andree



Re: [reiserfs-list] /etc/fstab

2001-05-19 Thread Matthias Andree

Krasi Zlatev [EMAIL PROTECTED] writes:

 I do not want to put in /etc/fstab, because the partition is mounted at
 start up.
 /dev/hdd5  /mnt5   reiserfs  notail,noatime0 0
 How to mount it manually what options should I give to mount?

The same :-)

You can add noauto to the fstab options to prevent mount on boot.

See man fstab.

-- 
Matthias Andree



Re: [reiserfs-list] filesystem interfaces

2001-05-19 Thread Matthias Andree

Ragnar Kjørstad [EMAIL PROTECTED] writes:

 1. fsync
 
 The current fsync interface lacks a couple of interesting features. I
 believe lazy-fsync has been discussed, but another useful feature is

Wait a minute. The major problem is that mail-servers, regardless of
their performance efforts, need to be safe in that they *NEVER* lose any
mail. Trading speed for lost mail over a crash is not a good deal.

An architecture which defers fsync() for a finite, bounded amount of
time to gather multiple fsync() requests into one batch of (possibly
sorted, sent into a tagged SCSI command queue) writes, and then
acknowledges them all at once, may be a good idea, provided you don't
run out of process table slots with all those waiting smtpds. A mail
server must not acknowledge receipt of a mail before all file and meta
data are on disk (and not merely in a fast write cache or the like).
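The batching idea above can be sketched as a toy (this is no real MTA's
code; the spool layout and "250 OK" strings are just illustrative):
append a whole batch of queued messages to the spool file, issue one
fsync() for all of them, and only then acknowledge every sender, so the
cost of one disk flush is amortized over the batch:

```python
import os

def deliver_batch(spool_path: str, messages: list) -> list:
    """Append all queued messages, flush once, then ack every sender."""
    fd = os.open(spool_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        for msg in messages:
            os.write(fd, msg + b"\n")
        os.fsync(fd)         # one flush covers every message in the batch
    finally:
        os.close(fd)
    # only now is it safe to say "250 OK" to each waiting smtpd
    return ["250 OK"] * len(messages)
```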

 The caller should be able to wait for fsync to complete by using poll in
 the case of asyn fsync.

It would require introducing a special syscall, I believe. However, in
that case, fsync should also sync all meta data related to files.

A different aspect is that many mailers expect BSD-style semantics,
where directory modifications are always synchronous. Revisit the
ReiserFS-and-qmail issue. On ext2, you can work around the problem with
chattr +S, but on ReiserFS there is no such thing.



 Now link and rename behaves differently with regards to replacing
 existing files, but what's the logic behind this?

What's the problem with that? 

To clobber, use rename.

To be careful, use link and if that succeeds, unlink.

How would you establish the atomicity of either operation (rename, or
link without unlink) in your approach?
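The difference between the two idioms can be demonstrated in a few lines
(the paths are throwaway temp files): rename() silently clobbers an
existing target, while link() fails with EEXIST, which is exactly what
makes link-then-unlink a safe "create only if not present" primitive:

```python
import errno
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
with open(a, "w") as f:
    f.write("new")
with open(b, "w") as f:
    f.write("old")

os.rename(a, b)              # atomic, and it clobbers b
clobbered = open(b).read()   # now "new", the old contents are gone

with open(a, "w") as f:
    f.write("again")
refused = False
try:
    os.link(a, b)            # the careful variant refuses to clobber
except OSError as e:
    refused = e.errno == errno.EEXIST
```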



 3. Ruby
 
 I just came across the ruby programming language today - the interesting
 thing is that this language has a concept of transactions! Does any
 other languages have this kind of features? Do anyone use them for real
 software? It would be really cool with a ruby-implementation that
 actually used filesystem-transactions to implement this instead of the
 library implementation that I assume ruby uses.

That would probably be most useful in things like NFS, where a link may
succeed but the success report get lost. If this were
transaction-oriented, the "check whether the file's st_nlink has
increased to 2; if so, the link succeeded" workaround could be avoided.

Not sure if Coda or AFS have concepts like these.

-- 
Matthias Andree