Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
[stripping Cc: list]

On Thu, 03 Aug 2006, Edward Shishkin wrote:

>> What kind of forward error correction would that be,
>
> Actually we use checksums, not ECC. If the checksum is wrong, then run
> fsck - it will remove the whole disk cluster, which represents 64K of
> data.

Well, that's quite a difference...

> The checksum is checked before unsafe decompression (trying to
> decompress incorrect data can lead to fatal things).

Is this sufficient? How about corruptions that lead to the same checksum
and can then confuse the decompressor? Is the decompressor safe in that
it does not scribble over memory it has not allocated?

-- Matthias Andree
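For illustration, a minimal user-space sketch of the check-before-
decompress pattern under discussion, using zlib; the layout assumed here
(a stored CRC alongside each compressed cluster) is made up for the
example and is not reiser4's actual on-disk format:

    /* Verify a checksum before feeding data to the decompressor. */
    #include <zlib.h>

    int read_cluster(const unsigned char *disk, uLong comp_len,
                     uLong stored_crc, unsigned char *out, uLongf out_len)
    {
        /* 1. Cheap integrity check first ... */
        if (crc32(0L, disk, comp_len) != stored_crc)
            return -1;  /* corrupt: never hand this to the decompressor */

        /* 2. ... only then decompress.  uncompress() itself also
         * bounds-checks the destination, so even a checksum collision
         * cannot make it scribble past out_len bytes. */
        if (uncompress(out, &out_len, disk, comp_len) != Z_OK)
            return -2;

        return 0;
    }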
e2fsck unfixable corruptions (was: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
(changing subject to catch Ted's attention)

Bodo Eggert wrote on 2006-08-05:

> I have an ext3 that can't be fixed by e2fsck (see below). fsck will fix
> some errors, trash some files and leave a fs waiting to throw the same
> error again. I'm fixing it using mkreiserfs now.

If such a bug persists with the latest released e2fsck version - you're
not showing e2fsck logs - I'm rather sure Ted Ts'o would like to have a
look at your file system metadata in order to teach e2fsck how to fix
this. I've seen enough releases of reiserfsck that couldn't fix certain
bugs, too, so trying with the latest version of the respective tools is
a must.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Tue, 01 Aug 2006, David Masover wrote:

>> RAID deals with the case where a device fails. RAID 1 with 2 disks
>> can in theory detect an internal inconsistency but cannot fix it.
>
> Still, if it does that, that should be enough. The scary part wasn't
> that there's an internal inconsistency, but that you wouldn't know.

You won't usually know unless you run a consistency check: RAID-1 will
only read from one of the two drives, for speed - except if you make the
system check consistency as it goes, which would imply waiting for both
disks at the same time. And in that case, you'd better look for drives
that allow synchronizing their platter stacks, to avoid the read access
penalty that waiting for two drives entails.

> And it can fix it if you can figure out which disk went.

If it's decent and detects a bad block, it'll log it, rewrite it with
data from the mirror, and let the drive do the remapping through ARWE.

>> Depending how far you propagate it. Some people working with huge
>> data sets already write and check user-level CRC values for this
>> reason (in fact BitKeeper does it, for one example). It should be
>> relatively cheap to get much of that benefit without doing
>> application to application, just as TCP gets most of its benefit
>> without going app to app.
>
> And yet, if you can do that, I'd suspect you can, should, must do it
> at a lower level than the FS. Again, FS robustness is good, but if the
> disk itself is going, what good is having your directory (mostly)
> intact if the files themselves have random corruptions?

Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).

-- Matthias Andree
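As a side note to the Berkeley DB remark, a minimal sketch of enabling
page checksums through BDB's C API; DB_CHKSUM is the flag spelling used
from the 4.2 era on (early releases called it DB_CHKSUM_SHA1), and the
database file name here is arbitrary:

    #include <stdio.h>
    #include <stdlib.h>
    #include <db.h>   /* Berkeley DB 4.x */

    int main(void)
    {
        DB *dbp;
        int ret;

        if ((ret = db_create(&dbp, NULL, 0)) != 0) {
            fprintf(stderr, "db_create: %s\n", db_strerror(ret));
            return EXIT_FAILURE;
        }

        /* Checksum every database page so torn or corrupted writes are
         * detected on read instead of silently returning junk. */
        if ((ret = dbp->set_flags(dbp, DB_CHKSUM)) != 0) {
            dbp->err(dbp, ret, "set_flags(DB_CHKSUM)");
            return EXIT_FAILURE;
        }

        if ((ret = dbp->open(dbp, NULL, "data.db", NULL,
                             DB_BTREE, DB_CREATE, 0644)) != 0) {
            dbp->err(dbp, ret, "open");
            return EXIT_FAILURE;
        }

        dbp->close(dbp, 0);
        return EXIT_SUCCESS;
    }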
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Tue, 01 Aug 2006, Ric Wheeler wrote:

> Mirroring a corrupt file system to a remote data center will mirror
> your corruption. Rolling back to a snapshot typically only happens
> when you notice a corruption, which can go undetected for quite a
> while, so even that will benefit from having reliability baked into
> the file system (i.e., it should grumble about corruption to let you
> know that you need to roll back or fsck or whatever). An even larger
> issue is that our tools, like fsck, which are used to uncover these
> silent corruptions, need to scale up to the point that they can
> uncover issues in minutes instead of days. A lot of the focus at the
> file system workshop was around how to dramatically reduce the repair
> time of file systems.

Which makes me wonder if backup systems shouldn't help with this. If
they are reading the whole file anyway, they can easily compute strong
checksums as they go, record them for later use, and check some
percentage of unchanged files every day to complain about corruptions.

-- Matthias Andree
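A sketch of that "checksum while you read for backup" idea, assuming
OpenSSL's SHA-256 routines; where the digests are stored and how the
daily re-checks are scheduled is left out:

    #include <stdio.h>
    #include <openssl/sha.h>

    /* Stream a file, computing a strong checksum on the side, as a
     * backup tool that reads the data anyway could do for free. */
    int checksum_file(const char *path,
                      unsigned char digest[SHA256_DIGEST_LENGTH])
    {
        unsigned char buf[65536];
        size_t n;
        int err;
        SHA256_CTX ctx;
        FILE *fp = fopen(path, "rb");

        if (!fp)
            return -1;
        SHA256_Init(&ctx);
        while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
            /* a backup program would also write buf to the archive here */
            SHA256_Update(&ctx, buf, n);
        }
        err = ferror(fp);
        fclose(fp);
        SHA256_Final(digest, &ctx);
        return err ? -1 : 0;
    }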
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Tue, 01 Aug 2006, Hans Reiser wrote:

> You will want to try our compression plugin, it has an ECC for every
> 64k

What kind of forward error correction would that be, and how much and
what failure patterns can it correct? A URL suffices.

-- Matthias Andree
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Adrian Ulrich wrote on 2006-08-01:

>> suspect, particularly with 7200/min (s)ATA crap.
>
> Quoting myself (again): "A quick'n'dirty
> ZFS-vs-UFS-vs-Reiser3-vs-Reiser4-vs-Ext3 'benchmark'"
>
> Yeah, the test ran on a single SATA harddisk (quick'n'dirty). I'm so
> sorry, but I don't have access to a $$$ RAID system at home.

I'm not asking you to perform testing on a RAID system with SCSI or SAS,
but I consider the obtained data (I am focusing on transactions per unit
of time) highly suspicious, and suspect write caches might have
contributed their share - I haven't seen a drive that shipped with its
write cache disabled in the past years.

sdparm --clear=WCE /dev/sda # please.

> How about using /dev/emcpower* for the next benchmark?

No, it is valid to run the test on commodity hardware, but if you (or
rather the benchmark) claim transactions, I tend to think ACID, and I
highly doubt any 200 GB SATA drive manages 3000 synchronous writes per
second without causing either serious fragmentation or background block
moving. This is a figure I'd expect for synchronous random access to RAM
disks that have no seek and rotational latencies (and research on hybrid
disks with flash or other nonvolatile fast random-access media to cache
the actual rotating magnetic platter accesses is going on elsewhere). I
didn't mean to say your particular drive were crap, but 200GB SATA
drives are low end, like it or not -- still, I have one in my home
computer because these Samsung SP2004C are so nicely quiet.

> I might be able to re-run it in a few weeks if people are interested
> and if I receive constructive suggestions (= Postmark parameters, mkfs
> options, etc..)

I don't know Postmark; I did suggest turning the write cache off. If
your system uses hdparm -W0 /dev/sda instead, go ahead. But you're right
to collect and evaluate suggestions first if you don't want to run a new
benchmark every day :)

-- Matthias Andree
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]
On Tue, 01 Aug 2006, Avi Kivity wrote:

> There's no reason to repack *all* of the data. Many workloads write
> and delete whole files, so file data should be contiguous. The
> repacker would only need to move metadata and small files.

Move small files? What for? And even if it is only moving metadata, that
is not different from what ext3 or xfs are doing today (rewriting
metadata from the intent log or block journal to the final location).
UFS+softupdates from the BSD world looks pretty good at avoiding
unnecessary writes (at the expense of a long-running but nice background
fsck after a crash, which is however easy on the I/O as of recent
FreeBSD versions). Which was their main point against logging/journaling
BTW, but they are porting XFS as well, to save those that need instant
complete recovery.

-- Matthias Andree
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Jan Engelhardt wrote on 2006-08-01:

>> I didn't mean to say your particular drive were crap, but 200GB SATA
>> drives are low end, like it or not --
>
> And you think an 18 GB SCSI disk just does it better because it's
> SCSI?

18 GB SCSI disks are 1999 gear, so who cares? Seagate didn't sell 200 GB
SATA drives at that time.

> Esp. in long sequential reads. You think SCSI drives aren't on par?

Right, they're ahead. 98 MB/s for the fastest SCSI drives vs. 88 MB/s
for the Raptor 150 GB SATA and 74 MB/s for the fastest other ATA drives.
(Figures obtained from StorageReview.com's Performance Database.)

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Adrian Ulrich wrote on 2006-07-31:

>>> And EXT3 imposes practical limits that ReiserFS doesn't as well. The
>>> big one being a fixed number of inodes that can't be adjusted on the
>>> fly,
>>
>> Right. Plan ahead.
>
> Ok: Assume that I've read the mke2fs manpage and added more inodes to
> my filesystem. So: What happens if I need to grow my filesystem by
> 200% after 1-2 years? Can I add more inodes to Ext3 on-the-fly?

Since you grow, you'll be using resize2fs (or growfs, or mkfs -G for
UFS). resize2fs and the other tools do exactly that: add inodes - and
you could easily have told this either from reading the resize2fs code
or just by trying it on a temp file:

-- create file system
dd if=/dev/zero of=/tmp/foo bs=1k count=5
/sbin/mke2fs -F -j /tmp/foo

-- check no. of inodes
/sbin/tune2fs -l /tmp/foo | grep -i inode | head -2
# Inode count: 12544
# Free inodes: 12533

-- resize
/sbin/e2fsck -f /tmp/foo
dd if=/dev/zero bs=1k count=5 >>/tmp/foo
/sbin/resize2fs /tmp/foo

-- check no. of inodes
/sbin/tune2fs -l /tmp/foo | grep -i inode
# Inode count: 23296
# Free inodes: 23285

Trying the same after mke2fs -b 1024 -i 1024 shows that the inode
density will continue to be respected. FreeBSD 6.1's growfs(8) increases
the number of inodes; this is documented to work since 4.4. Solaris 8's
mkfs -G also increases the number of inodes and apparently also works
for mounted file systems. This looks more like an education issue than a
technical limit.

> A filesystem with a fixed number of inodes (= not readjustable while
> mounted) is ehr.. somewhat unusable for a lot of people with big and
> *flexible* storage needs (talking about NetApp/EMC owners)

Which is untrue at least for Solaris, which allows resizing a live file
system. FreeBSD and Linux require an unmount.

> Why are a lot of Solaris people using (buying) VxFS? Maybe because UFS
> also has such silly limitations? (..and performs awkwardly with
> trillions of files..?..)

Well, "such silly limitations"... they look mostly like hot air spewed
by marketroids that need to justify people spending money on their new
filesystem. The only problem remains if you grossly overestimate the
average file size and with it underestimate the number of inodes needed.
But even then, I'd be interested to know if that's a real problem for
systems such as ZFS.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
(resending complete message to the list)

Adrian Ulrich wrote on 2006-07-31:

> Hello Matthias,
>
>> This looks more like an education issue than a technical limit.
>
> We aren't talking about the same issue: I was asking to do it
> on-the-fly. Unmounting the filesystem, running e2fsck and resize2fs is
> something different ;-)

There was work by Andreas Dilger to support online resizing of mounted
ext2 file systems. I never cared to look into it (does it support ext3,
does it work with current kernels, what's the merge status) since
offline resizing was always sufficient for me.

> A colleague of mine happened to create a ~300gb filesystem and started
> to migrate Mailboxes (Maildir-style format = many small files (1-3kb))
> to the new LUN. At about 70% the filesystem ran out of inodes;

Well - easy to fix: newfs again with a proper inode density (perhaps 1
per 2 kB) and redo the migration. Of course you're free to pay for a new
file system if your fellow admin can't be bothered to remember newfs's
-i option.

>> Well, such silly limitations... they look mostly like hot air spewed
>> by marketroids that need to justify people spending money on their
>> new filesystem.
>
> Have you ever seen VxFS or WAFL in action?

No I haven't. As long as they are commercial, it's not likely that I
will.

> Great to see that Sun ships a state-of-the-art Filesystem with
> Solaris... I think linux should do the same...

I think reallocating inodes for UFS and/or ext2/ext3 is possible, even
online, but someone needs to write, debug and field-test the code to do
that - possibly based on Andreas Dilger's earlier ext2 online resizing
work.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jan-Benedict Glaw wrote on 2006-07-31:

> Uh? Where did you face a problem there? With maildir, you shouldn't
> face any problems IMO. Even users with zillions of mails should work
> properly with the dir_index stuff:
>
>     tune2fs -O dir_index /dev/hdXX
>
> or alternatively (to start that for already existing directories):
>
>     e2fsck -fD /dev/hdXX

That is not "alternatively", but tune2fs first, then e2fsck -fD (which
can't happen on a RW-mounted FS, and which you should only try on your
root fs if you can reboot with magic sysrq or from a rescue system).

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jan-Benedict Glaw wrote on 2006-07-31:

> On Mon, 2006-07-31 18:44:33 +0200, Rudy Zijlstra wrote:
>> On Mon, 31 Jul 2006, Jan-Benedict Glaw wrote:
>>> On Mon, 2006-07-31 17:59:58 +0200, Adrian Ulrich wrote:
>>>> A colleague of mine happened to create a ~300gb filesystem and
>>>> started to migrate Mailboxes (Maildir-style format = many small
>>>> files (1-3kb)) to the new LUN. At about 70% the filesystem ran out
>>>> of inodes; Not a
>>> So preparation work wasn't done.
>> Of course you are right. Preparation work was not fully done. And
>> using ext1 would also have been possible. I suspect you are still
>> using ext1, cause with proper preparation it is perfectly usable.
>
> Oh, and before people start laughing at me, here are some personal or
> friends' experiences with different filesystems:
>
> * reiser3: A HDD containing a reiser3 filesystem was booted on a
>   machine that fucked up DMA writes. Fortunately, it crashed really
>   soon (right after going read-write). After rebooting the HDD on a
>   sane PeeCee, it refused to boot. Starting off some rescue system
>   showed an _empty_ root filesystem.

Massive hardware problems don't count. ext2/ext3 doesn't look much
better in such cases. I had a machine with RAM gone bad (no ECC - I
wonder what idiot ordered a machine without ECC for a server, but
anyway) and it fucked up every 64th bit - only in a certain region.
Guess what happened to the fs when it went into e2fsck after a reboot.
Boom. Same with a dead DPTA that lost every 16th block or so; the rescue
in the first case was swapping the RAM and amrecover, and in the second
swapping the drive and dsmc restore. OTOH, kernel panics on bad blocks
are a no-no, of course.

> * A friend's XFS data partition (portable USB/FireWire HDD) once
>   crashed due to being hot-unplugged off the USB. The in-kernel XFS
>   driver refused to mount that thing again, and the tools also refused
>   to fix any errors. (Don't ask, no details at my hands...)

Don't use write caches then. (Though I've seen NUL-filled blocks in new
files, or appended to files, after a crash in 2001 or 2002.)

> * JFS just always worked for me. Though I've never ever had a broken
>   HDD where it (or its tools) could have shown how well-done they
>   were, so from a crash-recovery point of view, it's untested.

SUSE removed JFS support from their installation tool, for technical
reasons they didn't specify in the release notes. Whatever. ext3 always
worked well for me, so why should I abandon it? Plus, it and its tools
are maintained.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Adrian Ulrich wrote on 2006-07-31:

> Ehr: Such a migration (on a very busy system) takes *some* time
> (weeks). Re-doing the whole thing (migrate users back / recreate the
> FS / start again) isn't really an option..

All the more important to think about FS requirements *before*
newfs-ing, if a quick day for rsync/star/dump+restore isn't available.
If you're hitting, for instance, the hash collision problem in reiser3,
you're as dead as with a FS that is out of inodes.

>>> Have you ever seen VxFS or WAFL in action?
>>
>> No I haven't. As long as they are commercial, it's not likely that I
>> will.
>
> Why?

I'm trying to shift my focus away from computer administration, and
better file systems than old-style non-journalling, non-softupdates UFS
are available today; more will follow.

Cc: list weeded out.

-- Matthias Andree
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Adrian Ulrich wrote:

> See also: http://spam.workaround.ch/dull/postmark.txt
> A quick'n'dirty ZFS-vs-UFS-vs-Reiser3-vs-Reiser4-vs-Ext3 'benchmark'

Whatever Postmark does, this looks pretty beside the point. Are these
actual transactions with the Durability guarantee? 3000/s doesn't look
much like synchronous I/O (else figures around 70/s, perhaps 100/s,
would be more adequate), and cache exercise is rather irrelevant for
databases that manage real (= valuable) data...

-- Matthias Andree
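To make that 70-100/s estimate concrete, a hypothetical mini-benchmark
that counts durable (fsync'd) writes per second; a disk reporting
thousands here is almost certainly acknowledging writes from its
volatile cache:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        const int count = 1000;
        char buf[512];
        struct timeval t0, t1;
        int fd = open("txtest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);

        if (fd < 0) { perror("open"); return EXIT_FAILURE; }
        memset(buf, 'x', sizeof buf);

        gettimeofday(&t0, NULL);
        for (int i = 0; i < count; i++) {
            if (pwrite(fd, buf, sizeof buf, 0) != sizeof buf ||
                fsync(fd) != 0) {   /* durability point of each "transaction" */
                perror("write/fsync");
                return EXIT_FAILURE;
            }
        }
        gettimeofday(&t1, NULL);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d synchronous writes in %.2f s = %.0f tx/s\n",
               count, secs, count / secs);
        close(fd);
        unlink("txtest.dat");
        return EXIT_SUCCESS;
    }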
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jan-Benedict Glaw wrote on 2006-07-31:

>> Massive hardware problems don't count. ext2/ext3 doesn't look much
>> better in such cases. I had a machine with RAM gone bad (no ECC - I
>> wonder what
>
> They do! Very much, actually. These happen In Real Life, so I have to
> pay attention to them. Once you're in setups with >1 machines,
> everything counts. At some point, you can even use HDDs' temperature
> sensors in old machines to diagnose dead fans. Everything that eases
> recovery for whatever reason is something you have to pay attention
> to. The simplicity of ext{2,3} is something I really fail to find
> proper words for. As well as the really good fsck. Once you've seen a
> SIGSEGV'ing fsck, you really don't want to go there.

The point is: if you've written data with broken hardware (RAM, bus,
controllers - loads of them, CPU), what is on your disks is
untrustworthy anyway, and fsck isn't going to repair your gzip file
where every 64th bit has become a 1, or when the battery-backed write
cache threw 60 MB down the drain... Of course, an fsck that crashes is
unbearable, but that doesn't apply to broken-hardware failures. You need
backups with a few generations to avoid massively losing data.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Theodore Tso wrote on 2006-07-31:

> With the latest e2fsprogs and 2.6 kernels, the online resizing support
> has been merged in, and as long as the filesystem was created with
> space reserved for growing the filesystem (which is now the default,
> or if the filesystem has had the off-line preparation step ext2prepare
> run on it), you can run resize2fs on a mounted filesystem and grow an
> ext2/3 filesystem on-line. And yes, you get more inodes as you add
> more disk blocks, using the original inode ratio that was established
> when the filesystem was created.

That's cool. The interesting part for some people would be, if I read
past postings correctly, to change the inode ratio in an existing
(perhaps even mounted) file system without losing data. (I'm not sure
how many blocks would have to be moved and/or changed for that purpose,
because I know too little about the on-disk ext2 layout, but since block
relocation is already in place for shrink support in the offline
resizer, some of the work appears to be done already.)

-- Matthias Andree
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
On Mon, 31 Jul 2006, Nate Diller wrote:

> this is only a limitation for filesystems which do in-place data and
> metadata updates. this is why i mentioned the similarities to log file
> systems (see rosenblum and ousterhout, 1991). they observed an
> order-of-magnitude increase in performance for such workloads on their
> system.

It's well known that transactions which would thrash on UFS or ext2fs
can get quieter access patterns with shorter strokes from logging, data
journaling, or whatever else turns seeks into serial writes. And then,
with wandering logs (to avoid double writes) and such, you start
wondering how much fragmentation you get as the price to pay for
avoiding seeks and double writes at the same time. TANSTAAFL; or: how
long can the system sustain such access patterns, particularly if it
gets under memory pressure and must move data? Even with lazy allocation
and other optimizations, I question the validity of 3000/s or faster
transaction frequencies. Even the 500 on ext3 are suspect, particularly
with 7200/min (s)ATA crap. This sounds pretty much like the drive doing
its best to shuffle blocks around in its 8 MB cache and lazily writing
back.

sdparm --clear=WCE /dev/sda # please.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Thu, 27 Jul 2006, Grzegorz Kulewski wrote:

> Sorry for my stupid question, but could you tell me why starting to
> make incompatible changes to reiserfs3 now (when reiserfs3 technology
> is rather old) and making reiserfs3 unstable (again), possibly for
> several months or even years, is better than fixing the big issues
> with reiser4 (if there are any really big ones left), merging it and
> trying to stabilize it? For the end user both ways will result in
> mkfs, so...

ext2fs and ext3fs, without plugins, added dir_index as a compatible
upgrade, with an e2fsck option (that implies: optional) to build indices
for directories without them. ext3fs is a compatible upgrade from
ext2fs; it's as simple as unmount, tune2fs -j, mount. reiserfs 3.6 could
deal with 3.5 file systems, and mount -o conv with a 3.6 driver would
convert a 3.5 file system to the 3.6 level (ISTR it had to do with large
file support and perhaps NFS exportability, but don't quote me on that).

I wonder what makes the hash overflow issue so complicated (other than
differing business plans, that is) that upgrading in place isn't
possible. Changes introduce instability, but Namesys were proud of their
regression testing - so how sustainable is their internal test suite?

Instead, we're told reiser4 will fix this (quite likely) and that we
should wait until it's ready (OK, we shouldn't be using experimental
stuff for production but rather for /tmp, but the file system will take
many months to mature after integration), and then it will be mkfs time
- so reiser4 had better be mature before we go that way, if there's no
way back short of amrecover, restore or tar -x.

Smashing out most of the Cc:s in order not to bore people.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Tue, 25 Jul 2006, Denis Vlasenko wrote:

> I, on the contrary, want software to impose as few limits on me as
> possible.

As long as it's choosing some limit, I'll pick the one with fewer
surprises.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Sun, 23 Jul 2006, Hans Reiser wrote:

> I want reiserfs to be the filesystem that professional system
> administrators view as the one with both the fastest technological
> pace, and the most conservative release management.

Well, I, with the administrator hat on, phased out all reiserfs file
systems and replaced them with ext3. This rid me of silent corruptions,
immature reiserfsprogs and hash collision chain limits.

> I apologize to users that the technology required a 5 year gap between
> releases. It just did; an outsider may not realize how deep the
> changes we made were. Things like per-node locking based on a whole
> new approach to tree locking that goes bottom up instead of the usual
> top down are big tasks. Dancing trees are a big change, getting rid of
> blobs is a big change, wandering logs. We did a lot of things like
> that, and got very fortunate with them. If we had tried to add such
> changes to V3, the code would have been unstable the whole 5 years,
> and would not have come out right.

And that is something that an administrator does not care the least
about. It must simply work, and the tools must simply work. Once I hit
issues like "xfs_check believes / is mounted R/W (not ignoring rootfs)
and refuses the R/O check", "reiserfsck can't fix a R/O file system" (I
believe this one got fixed before 3.6.19), or particularly silent
corruptions that show up later in a routine fsck --check after a kernel
update, the filesystem and its tools appear in a bad light. I've never
had such troubles with ext2fs or ext3fs or FreeBSD's or Solaris's ufs.

I'm not sure what patches Chris added to SUSE's reiserfs, nor do I care
any more. The father declared his child unsupported, and that's the end
of the story for me. There's nothing wrong with focusing on newer code,
but the old code needs to be cared for, too, to fix remaining issues
such as the "can only have N files with the same hash value" limit. (I
am well aware that this is exploiting worst-case behavior in a malicious
sense, but I simply cannot risk such nonsense on a 270 GB RAID5 if users
have shared work directories.)

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
On Mon, 24 Jul 2006, Hans Reiser wrote:

>> and that's the end of the story for me. There's nothing wrong with
>> focusing on newer code, but the old code needs to be cared for, too,
>> to fix remaining issues such as the "can only have N files with the
>> same hash value" limit.
>
> Requires a disk format change, in a filesystem without plugins, to fix
> it.

You see, I don't care one iota about plugins or other implementation
details. The bottom line is: reiserfs 3.6 imposes practical limits that
ext3fs doesn't, and that's reason enough for an administrator not to
install reiserfs 3.6. Sorry.

-- Matthias Andree
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Mike Benoit wrote:

> I've been bitten by running out of inodes on several occasions, and
> switching to ReiserFS saved one company I worked for over $250,000,
> because they didn't need to buy a totally new piece of software.

ext3fs's inode density is configurable; reiserfs's hash overflow chain
length is not, and it doesn't show in df -i either. If you need lots of
inodes, mkfs for lots. That's old Unix lore.
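For completeness, the df -i check in program form, via statvfs(3); note
that filesystems without preallocated inodes (reiserfs among them)
report zero counts here, which is exactly the "doesn't show in df -i"
point:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/statvfs.h>

    int main(int argc, char **argv)
    {
        struct statvfs sv;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
            return EXIT_FAILURE;
        }
        if (statvfs(argv[1], &sv) != 0) {
            perror("statvfs");
            return EXIT_FAILURE;
        }
        /* f_files/f_ffree come back as 0 on filesystems that allocate
         * inodes dynamically, so a "0 free" result needs care. */
        printf("%s: %llu inodes total, %llu free\n", argv[1],
               (unsigned long long)sv.f_files,
               (unsigned long long)sv.f_ffree);
        return EXIT_SUCCESS;
    }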
Re: Congratulations! we have got hash function screwed up
On Thu, 30 Dec 2004, Hans Reiser wrote:

>> A working undelete can either hog disk space or die the moment some
>> large write comes in. And if you're at that point, make it a
>> versioning file system
>
> Well, yes, it should be one. darpa is paying for views, add in a
> little versioning and...

If the "view" is something between a transactional view in an SQL
database and a device-mapper snapshot, then yes, it might be close
enough. There's however always the problem of capacity conflicts, and
there may need to be a switch that prefers "keep older versions" over
"discard older versions", so that the admin with - by your leave - idiot
users has a chance to save his users' a**e*.

>> - but then don't complain about space efficiency.
>
> This is an area where apple was smarter than Unix. Having a trash can
> is what real users need, more than they need performance..

Does Apple's trash can help against files that get overwritten in situ?
If not, it's insufficient to fix another common failure. My Mom is prone
(is that word applicable to human behavior?) to opening a file (say, a
half-year schedule of the local community) and editing it without saving
it under a new name - by the time she has completed her edit, she has
forgotten to rename it, and boom, old file dead. Next week she wants it
back...

> I would however auto-empty the trash can when space got low

That isn't desired. See above.

> Well, it hasn't been coded solely because we haven't gotten around to
> it, what with all else that needs doing and still needs doing. Remind
> me about this in a year.:)

Save this mail to a file and have atd mail it to you. Or use a calendar
:)

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
Hans Reiser [EMAIL PROTECTED] writes:

>> Again, this is a lame excuse for a bug. First you declare some
>> features of your filesystem; later, when it turns out they aren't
>> being delivered, you act as if this were a known condition.
>
> Well this is true, you are right. Reiser4 is the fix though.

No, it isn't. Reiser4 is an alternative beast. Or will it transparently
fix the collision problem in a 3.5 or 3.6 file system, in a way that is
backwards compatible with 3.6 drivers? If not, please fix reiser3.6.

Given that Reiser4 isn't proven in the field yet (for that, it would
have to be used as the default file system by at least one major
distributor for at least a year), it is certainly not an option for
servers _yet_. A file system that intransparently (i.e., not via inode
count or block count) refuses to create a new file doesn't belong on
_my_ production machines, which shall migrate away from reiserfs on the
next suitable occasion (such as upgrades). There's ext3fs, jfs, xfs, and
in 2006 or 2007 we'll talk about reiser4 again. Yes, I am conservative
WRT file systems and storage.

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
Yiannis Mavroukakis [EMAIL PROTECTED] writes:

> Your "proven" reasoning sounds a bit strange to me.. Microsoft (aka
> major distributor, at least in my books) has had her filesystems in
> the field for ages; does this prove any of them good (or bad, for that
> matter)?

My reasoning mentioned a /required/, not a /sufficient/ criterion. In
other words: not before it is proven in the field will I consider it for
production use. Remember the Linux 2.2 reiserfs 3.5 NFS woes? Remember
the early XFS-NFS woes? These are all reasons to avoid a shiny new file
system for serious work.

> I don't think I'd wait for a distributor to shove reiser4 down my
> throat just because the distributor seems to trust it, so the logical
> course would be for me to try it out. I'll grant you that I am not
> using it on the mission-critical server, because our hosting provider
> will not support it (ext3 addicts.. oh well)

For practical recovery reasons (error on the root FS after a crash),
ext3fs is easier to handle. You can fsck the (R/O) root partition just
fine (e2fsck then asks you to reboot right away); for reiserfs, you'll
have to boot into some emergency or rescue system...

> but I do have it on my development server, which does house critical
> code and receives all kinds of hammering from yours truly; and I use
> it at home.

I reformatted my last at-home reiserfs an hour ago and unloaded the
reiserfs kernel module, as the way Hans has responded to the error
report is unacceptable. Anyone is free to choose their file system, and
as the simple demonstration code posted earlier shows a serious flaw in
reiserfs and Hans's response was boldfaced, I ditched reiserfs3. End of
story.

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
Yiannis Mavroukakis [EMAIL PROTECTED] writes:

> I agree, but you're generalising; this is not xfs and reiser4 is not
> 3.5 ;) If you don't try out the shiny new filesystem yourself, how can
> you possibly dismiss it based on the past failures of other
> filesystems?

I doubt new software is bug-free. I don't expect NFS problems with
reiser4 though; these should be in the regression tests. :-)

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
Cal [EMAIL PROTECTED] writes:

> -- and then at Thu, 30 Dec 2004 13:40:53 +0100, it was written ...
>
>> ... Anyone is free to choose their file system, and as the simple
>> demonstration code posted earlier shows a serious flaw in reiserfs
>> and Hans's response was boldfaced, I ditched reiserfs3. End of story.
>
> Your policy and philosophy on file system selection are yours to enjoy
> as you see fit, but the anger and angst ... ? Phew!!

I have no interest in dealing with systems that have known and
reproducible failure cases which strike nondeterministically in
practical use. And Marc's documentation showed this is a real-world
problem, not an ivory-tower problem. The reiserfs story is over for me.
All private machines I deal with are reiserfs-free as of a few hours
ago. It was just one bug too many, and it was handled unprofessionally,
unlike many bugs before, which had usually been dealt with on short
notice, or at least accepted for looking into. I'll phase reiser3 out on
my work machines as I see fit.

I have seen too many bugs in reiserfs3. I do believe reiser4 fixes some
design flaws of reiser3, and when the implementation issues are all
shaken out in one or two years' time, it may be a good file system and I
will look at it - I trust the reiserfs team can learn from their
mistakes. I hope they learn that THIS handling of the error was wrong.
"Who cares, not us, for the past five years" is not a proper response to
a real-world problem.

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
On Thu, 30 Dec 2004, Hans Reiser wrote:

> Fixing hash collisions in V3 to do them the way V4 does them would
> create more bugs and user disruption than the current bug we have all
> lived with for 5 years until now. If someone thinks it is a small
> change to fix it, send me a patch. Better by far to fix bugs in V4,
> which is pretty stable these days.

Better to fix a known bug than to create a file system vacuum before V4
is really stable. Anyway, I don't care any more; I'm phasing out
ReiserFS v3 and have no plans to try V4 before 2006.

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
On Thu, 30 Dec 2004, Burnes, James wrote:

> (BTW: If Hans is a little tired of working on Reiser3, it's probably
> because he is currently stressed out making last-minute tweaks on
> Reiser4 and managing his team. Cut him some slack. Email conversations
> don't show a number of things we take for granted, like the fact that
> the person we're talking to looks really tired etc. Unlike ext3, XFS
> and JFS, Reiser isn't funded by someone with huge pockets.)

I'm willing to grant ANY time-out. If Hans had written "I have a pile of
deadline reiser4 contract work before I can deal with that", fine; but
he didn't - he said "use reiser4 instead". And that's inadequate. And I
say this without any emotions, red head, swelling veins and such.

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
On Thu, 30 Dec 2004, Esben Stien wrote:

> Sure, but you're not factoring in Murphy's law here. A tool to
> undelete would come in handy for many people, even those who have
> proper backup solutions.

You're asking for a versioned file system. If reiserfs v4 doesn't offer
such properties, find something else that does. Besides, a recovery is
not the same as a feature to undelete.

> Such a feature would maybe save a day's work, which is a lot in some
> circles.

Back up more often. Staged backup schemes (hourly, daily, weekly,
monthly) with varying levels of differential or complete backups, plus
off-site archives, are probably a good idea for these circles then.

-- Matthias Andree
Re: Congratulations! we have got hash function screwed up
Spam [EMAIL PROTECTED] writes:

> In any case, undelete has existed for ages on many platforms. It IS a
> useful feature. Accidents CAN happen for many reasons, and in some
> cases you may need to recover data. Besides, a deletion does not fully
> remove the data, but just unlinks it. In Reiser, where there is
> tailing etc. for small files, this can be a problem. Either the little
> file might not be recoverable (shouldn't the data still exist, even if
> it is tailed?), or the user needs to use a non-tailing policy?

A working undelete can either hog disk space or die the moment some
large write comes in. And if you're at that point, make it a versioning
file system - but then don't complain about space efficiency.

> well, overwritten data is not so easy to get back. But from what I
> understand in Linux, many applications actually write another file and
> then unlink the old file? If that is the case then it may even be
> possible to get back some overwritten files!

I see enough applications that just overwrite an output file. This whole
discussion doesn't belong here until someone talks about implementing a
whole versioning system for reiser4.

-- Matthias Andree
Re: I oppose Chris and Jeff's patch to add an unnecessary additional namespace to ReiserFS
On Mon, 26 Apr 2004, Chris Mason wrote:

> I hope v4 does improve the xattr api, and I hope it manages to do so
> for more than just reiser4. It is important that application writers
> are able to code to a single interface and get coverage across all the
> major linux filesystems.

Interesting point, given that SuSE were early adopters of alternative
file systems such as JFS, ReiserFS, and XFS (in lexicographical order
rather than order of appearance). These have always diversified the
semantics offered, not only adding features that other systems didn't
have, but also omitting features the other file systems did have -
chattr, for instance, or tail merging that confused boot loaders, for
another.

With respect to Hans's reasoning about name spaces: is there an official
standard that mandates a particular API for the ACL stuff (POSIX)? If
so, the whole discussion is about getting out of the frying pan and into
the fire. The traditional approach will then be standards-compliant but
out-of-band and outside of the file system name space; the new approach
will be outside of the standards, requiring application developers to
produce a Linux and a POSIX version. Or am I barking up the wrong tree?

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
Re: online fsck
On Thu, 22 Apr 2004, Jure Pečar wrote:

> Is it theoretically possible?

It is actually implemented in the BSD kernels, for UFS. Look for softdep
or softupdates. As for other file systems: crashing while the write
cache is enabled (and unfortunately it is, in FreeBSD for instance) can
royally screw up your file system. Beyond repair, to a "use your backup"
state.

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
reiserfs corruption through 2.6 crash?
Attachment: rebuild.log.gz (gzipped reiserfsck --rebuild-tree log)
Re: nfs with reiserfs and ufs on freebsd
Philippe Gramoullé [EMAIL PROTECTED] writes:

> Hello,
>
> If you don't use BSD flock() nor O_EXCL in a dotlocking scheme, there
> shouldn't be any problem with an NFS Linux server and a BSD client.

FreeBSD 4.8 as an NFS client doesn't do any locking at all. O_EXCL
doesn't work across NFS anyway; workaround: use mkstemp(2) and link(2).

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
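For reference, a minimal sketch of that mkstemp(2)+link(2) workaround
(the classic NFS-safe dotlock scheme, as also described in the open(2)
manual page); the lock-file naming is simplified for the example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Returns 0 if "lockfile" was acquired, -1 otherwise. */
    int dotlock(const char *lockfile)
    {
        char tmpl[1024];
        struct stat st;
        int fd, ok;

        snprintf(tmpl, sizeof tmpl, "%s.XXXXXX", lockfile);
        if ((fd = mkstemp(tmpl)) < 0)
            return -1;
        close(fd);

        /* link() either succeeds, or we re-stat the temp file: a link
         * count of 2 proves the lock was created even if the NFS reply
         * to link() got lost on the wire. */
        ok = (link(tmpl, lockfile) == 0) ||
             (stat(tmpl, &st) == 0 && st.st_nlink == 2);

        unlink(tmpl);           /* the temp name is no longer needed */
        return ok ? 0 : -1;
    }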
2.4.24 Oops in find (maybe reiserfs related)
This happened during the nightly updatedb, which calls find. The hex
string 0x72657369 is "resi"; locate resi finds a file in a reiserfs file
system, /usr. reiserfsck 3.6.11 afterwards fixed some

  vpf-10680: The file [103048 103049] has the wrong block count in the
  StatData (2) - corrected to (1)

but I doubt these are related. Are they?

Unable to handle kernel paging request at virtual address 72657369
 printing eip:
72657369
*pde =
Oops:
CPU:    0
EIP:    0010:[72657369]    Not tainted
EFLAGS: 00010206
eax: f8bce0a0   ebx: 72657369   ecx: c1c1f13c   edx: f117dec0
esi: ec837f98   edi: 08060828   ebp: b258   esp: ec837f8c
ds: 0018   es: 0018   ss: 0018
Process find (pid: 7765, stackpage=ec837000)
Stack: c014ebf1 f117dec0 b530 f117dec0 f7edae80 1000 ebcc0be0 0001
       0008 0001 1000 ec836000 b530 c01090eb 08060831 b530
       080541cc b530 08060828 b258 00c4 002b 002b 00c4
Call Trace: [sys_lstat64+129/144] [system_call+51/56]
Call Trace: [c014ebf1] [c01090eb]
Code: Bad EIP value.

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
Re: 2.4.24 Oops in find (maybe reiserfs related)
On Tue, 20 Jan 2004, Matthias Andree wrote:

> This happened during the nightly updatedb, which calls find. The hex
> string 0x72657369 is "resi"; locate resi finds a file in a reiserfs
> file system, /usr. reiserfsck 3.6.11 afterwards fixed some
>
>   vpf-10680: The file [103048 103049] has the wrong block count in the
>   StatData (2) - corrected to (1)

I have made the vmlinux, bzImage, modules and .config available; if
anyone cares to have a look, send me a mail off-list and I'll be happy
to return the URL. Marcelo has been mailed the URL.

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
Re: Linux 2.4 - 2.6 migration
Sebastian Kaps [EMAIL PROTECTED] writes:

> Is there something concerning ReiserFS I should know when migrating
> from Linux 2.4 to Linux 2.6?

No. Make sure the file system is fine before booting a new kernel; using
a CURRENT reiserfsprogs is essential. Namesys have made lots of fixes
recently.

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
Re: reiserfsprogs-3.6.12-pre1 release
Vitaly Fertman [EMAIL PROTECTED] writes:

> The new pre-release is available for download at
> ftp://ftp.namesys.com/pub/reiserfsprogs/pre/reiserfsprogs-3.6.12-pre1.tar.gz
> ...
> * reiserfsck can check ro mounted filesystems.

Does this use the "reboot Linux" exit codes that (e2)fsck uses?

-- Matthias Andree
Encrypt your mail: my GnuPG key ID is 0x052E7D95
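For context, the convention the question refers to: fsck(8) returns a
bit mask, and boot scripts reboot when bit 2 ("system should be
rebooted") is set, e.g. after the mounted-read-only root fs was
modified. A hypothetical wrapper illustrating the mask:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    int main(void)
    {
        /* Run any fsck.* tool honoring the fsck(8) exit convention. */
        int status = system("fsck -a /");
        if (status == -1 || !WIFEXITED(status))
            return EXIT_FAILURE;

        int code = WEXITSTATUS(status);
        if (code & 1) printf("errors were corrected\n");
        if (code & 2) printf("reboot required\n");  /* the "reboot" code */
        if (code & 4) printf("errors left uncorrected\n");
        return code;
    }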
Re: r4 v. ext3, quick speed vs. cpu experiments
Szakacsits Szabolcs [EMAIL PROTECTED] writes:

> Yes, if you have enough CPU capacity (aka you don't run anything else,
> just benchmarking filesystems). Otherwise it seems to be slower.
> That's what I was referring to.

This had been the situation with reiserfs 3.5/3.6 before, and it got
resolved, or so it appears. I don't have ext3-vs-reiserfs3.6 figures at
hand, but I'm not aware of CPU bottlenecks in the reiserfs3.6 code. Just
wait a couple of months until the reiserfs gurus get their reiserfs4
beast stable and debugged and can focus on tuning.

To a previous post about code size and execution speed: it's not
generally true that larger code is also slower. It depends how that code
is arranged. If you have many abstractions, then maybe it's slower. If
you have many specialized functions in an otherwise flat profile, it can
be a good deal faster than simpler (less complex) code.

-- Matthias Andree
Re: [reiserfs-list] Catastrophe with mailboxes on ReiserFS
On Thu, 10 Oct 2002, Newsmail wrote:

> What are those write barrier patches? And where can I find them? I was
> also

I don't know. Some vendor kernels have them (SuSE), but I'm not sure
where they are available ATM.

> hurt by several 'cross linked' files, described in here. Is there a
> way to deactivate write caching for just one filesystem?

No. You can only disable it per drive.

> ps: please somebody let me know where I can find documentation about
> those Chris Mason patches, or at least what the data=ordered or
> data=journal options are. Everybody talks about them, nobody explains.
> Or did I miss something?

You can safely consult the ext3 documentation on these options; reiserfs
should behave the same with respect to them.

-- Matthias Andree
Re: [reiserfs-list] Catastrophe with mailboxes on ReiserFS
Oleg Drokin [EMAIL PROTECTED] writes:

> BTW, I just remembered that until you apply Chris Mason's data logging
> patches, there is a certain window where a system crash would lead to
> deleted data appearing at the end of files that were appended to
> before the crash.

I'd like to see these patches merged. I hurt myself several times with
reiserfs, when the computer locked up during heavy file system activity,
with NUL blocks in files, files mixed up and the like. I've never seen
this on any data=ordered or data=journal file system, regardless of ext3
or reiserfs. Of course, write caches must be turned off, or the write
barrier patches applied, to be safe in case of a power blackout.

-- Matthias Andree
Re: [reiserfs-list] reiserfsck not fixing vs-5150/vs-13070
Hubert Mantel [EMAIL PROTECTED] writes:

> This is an unofficial kernel with lots of new features and work in
> progress. The README file explicitly warns not to use this one on a
> production system yet. Depending on the version you are using, there
> even exists a random memory corruption bug that only recently got
> fixed.

Thanks for the heads-up. I'm aware that your unofficial kernel is WIP,
and that it can undeliberately eat my lunch, burn my house and exhume my
grandmother -- and even memory corruptions /could/ serve to improve
fsck.*. Still, it looks as though the problems really escaped reiserfsck
v3.6.3 but no longer escape v3.6.4-pre1. I'll check your kernel's
changelog to get the fix though.

-- Matthias Andree
[reiserfs-list] reiserfsck not fixing vs-5150/vs-13070
-- Matthias Andree
Re: [reiserfs-list] 'let the hdd remap the bad blocks'
Hans Reiser [EMAIL PROTECTED] writes:

> Just taking a guess: many hard drives have difficult and
> time-consuming procedures that they can go through to read a
> troublesome block. These can take 20-30 seconds. Probably, if they
> have to go through these procedures, once they finally succeed the
> smart vendors remap the block.

They should try to rewrite and write-verify the block before remapping
it, as there is only a finite number of spares. For SCSI drives, there's
also Jörg Schilling's sformat tool that can do the badblocks stuff
directly in the drive rather than through all the kernel buffers, and
can also refresh or reassign bad blocks.

-- Matthias Andree
Re: [reiserfs-list] 'let the hdd remap the bad blocks'
On Tue, 20 Aug 2002, Hans Reiser wrote:

> Vitaly, take a look at that. Part of a good user interface is letting
> users know what tools are available. Remember, most users will
> encounter a failing drive and/or fsck on a journaling fs as a rare and
> stressful event in their lives, so it is good to educate them with
> URLs and other references at the time they run fsck.

Apropos URL, here we go: ftp://ftp.fokus.gmd.de/pub/unix/sformat/

-- Matthias Andree
Re: [reiserfs-list] 'let the hdd remap the bad blocks'
Oleg Drokin [EMAIL PROTECTED] writes:

> Basically you'd better search for this on the HDD vendors' sites.
> What's going on can simply be described this way: you write some block
> to the HDD; if the HDD decides the block is bad for some reason and
> remapping is allowed (usually by turning on SMART), the block is
> written to a different on-platter location and the drive adds one more
> entry to its remapped-blocks list. Next time you read this block, the
> drive consults its remapped-blocks list and, if the block is remapped,
> reads it from the new location with the correct content. The described
> mechanism works for writing. Actually, I've seen something that looks
> like remapping on read, though I have no meaningful explanation for
> that (except that they may have some extra redundant info stored when
> you write data to disk, so that if a sector cannot be read, its
> content is restored from that redundant information and the sector is
> then remapped). And this process takes a lot of time.

My Fujitsu MAH-3182MP drive (SCSI actually) had ARRE enabled as it
shipped, but ARWE disabled, for reasons I cannot tell, not even from the
data book (PDF). That's Automatic Remap on Read/Write Error. I'm not
sure what it really means, but if the drive really remaps on a read
error, it's going to leak a block the next time this block is read,
should power be lost while it is amidst a block write. So I switched
that drive to do ARWE. IDE users are not too lucky unless their vendor
provides them with a tool (and not many ship raw floppy images; many
have some multi-MB Windoze tools just to write some hundred kBytes to a
floppy disk...)

-- Matthias Andree
Re: [reiserfs-list] Corruption: --fix-fixable results in all nlinkvalues = 0
Gerrit Hannaert [EMAIL PROTECTED] writes:

> Is there a difference in the way reiserfs formats as opposed to
> ext2/3? Your mentioning that the defective blocks were never read
> before reminds me

Well, the long explanation is that the blocks may not have been used for
some time, or that they have gone bad recently; such things happen,
although you'd expect that from DTLA-3070xx drives earlier than from
others.

> of the fact that a Reiserfs format is much quicker than any other
> filesystem's format - does it mean anything?

The on-disk layout is different, but I'm not aware of the internals, and
I don't believe that the allocation pattern changes anything about the
facts.

> Perhaps it is good practice to run 'badblocks' before any initial
> format... if there is no option to format/scan or something.

Neither mke2fs nor mkreiserfs reads or writes all blocks when
formatting; they instead write some metadata, and that's it. Of course,
running badblocks prior to formatting is an option to find these
earlier.

> This one is a MAXTOR 6L080J4. But I've seen these 'dma' issues with
> *all* my other drives as well (IBM-DTLA-307045, FUJITSU MPE3136AH) on
> different PCs with Intel and Via motherboards.

It's not DMA but the unrecoverable error part that matters. The DMA
trips over as a consequence of the defective block (there is no data
that could be transferred); the DMA is *not* the cause of the bad block.

-- Matthias Andree
Re: [reiserfs-list] Corruption: --fix-fixable results in all nlinkvalues = 0
Stefan Fleiter [EMAIL PROTECTED] writes:

> Hi Vitaly!
>
> On Thu, 15 Aug 2002, Vitaly Fertman wrote:
>
>> Ah, I guess I know what happened. I think you have some fatal
>> corruptions and rebuild-tree is required. In this case check and
>> fix-fixable do not perform the semantic check.
>
> Then reiserfsck should not start in fix-fixable mode when rebuild-tree
> is required. People think that fix-fixable is less dangerous. You have
> shown that in some situations it is the other way round... I propose a
> new reiserfsck version with only this fix included!

Hum, if reiserfsck can tell whether fix-fixable or rebuild-tree is the
right one, then it should also be able to abort the fix-fixable run and
tell the user to run rebuild-tree. Maybe such needs-fix-fixable and
needs-rebuild-tree flags should be stored in the super block, much like
ext2 stores the "file system has errors" condition.

-- Matthias Andree
[reiserfs-list] [OT] traffic magnet dot com crap
Christine Hall [EMAIL PROTECTED] writes:

> I visited http://namesys.com, and noticed that you're not listed on
> some search engines! I think we can offer you a service which can help
> you increase traffic and the number of visitors to your website.

Blacklist this site, trafficmagnet.net. They will come back. (They keep
coming back to haunt the university site that I administer; I regularly
see rejects in my mailer's log file.)

-- Matthias Andree
Re: [reiserfs-list] fsync() Performance Issue
Toby Dickenson [EMAIL PROTECTED] writes:

> write to file A
> write to file B
> write to file C
> sync

Be careful with this approach. Apart from syncing other processes' dirty
data, sync() does not make the same guarantees that fsync() does.
Barring write cache effects, fsync() only returns after all blocks are
on disk. I'm not sure if, and if yes which, Linux file systems are
affected, but for portable applications, be aware that sync() may return
prematurely (and is allowed to!).

-- Matthias Andree
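A small sketch contrasting the two approaches: per-file fsync() gives
the durability guarantee portably, with no global sync() needed (write
cache effects aside):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void write_durably(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror(path); exit(EXIT_FAILURE); }
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            perror(path);       /* data is NOT known to be on disk */
            exit(EXIT_FAILURE);
        }
        close(fd);
    }

    int main(void)
    {
        write_durably("A", "a\n", 2);   /* each file is on disk ... */
        write_durably("B", "b\n", 2);   /* ... before the next one */
        write_durably("C", "c\n", 2);   /* is started */
        return 0;
    }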
Re: [reiserfs-list] Bad blocks
Sam Vilain [EMAIL PROTECTED] writes:

> How can I deal with this? If anyone knows of a tool to re-format just
> 8 sectors (to let the disk re-map the blocks elsewhere), that also
> would be helpful.

Manufacturers may have these tools, but some do a full low-level format.
Usually, writing the bad blocks will make the drive remap them, if it
has spare sectors left to remap to.

-- Matthias Andree
Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?
On Mon, 04 Mar 2002, Oleg Drokin wrote:

>> But how much seeking is done on one 650 MB file that's been written
>> onto an empty partition? I presume, not too much.
>
> 1625*2 seeks (that's right, 2 seeks per each 4M of data). This figure
> is for reading. Writing is more complex due to the journal.

So let's assume we read 4M in half a second; then we seek four times per
second, which gives us like an 80 ms penalty per second on a slow drive.
That causes a performance drop to like 92% of the raw transfer rate;
even when adding the same penalty for bus arbitration, we should still
get 86% out. However, I hope to be able to play with the settings later,
to figure out what's going on there.

-- Matthias Andree
GPG encrypted mail welcome, unless it's unsolicited commercial email
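The back-of-the-envelope arithmetic above, spelled out as a tiny
program; the 20 ms per seek is an assumed figure for a slow drive:

    #include <stdio.h>

    int main(void)
    {
        const double chunk_mb        = 4.0;   /* data between seeks   */
        const double raw_mb_s        = 8.0;   /* 4 MB per half second */
        const double seek_ms         = 20.0;  /* assumed seek time    */
        const double seeks_per_chunk = 2.0;

        double chunks_per_s = raw_mb_s / chunk_mb;             /* 2    */
        double seek_s = chunks_per_s * seeks_per_chunk
                      * seek_ms / 1000.0;                      /* 0.08 */
        double efficiency = 1.0 - seek_s;   /* fraction of each second
                                               spent transferring data */

        printf("seek penalty: %.0f ms/s, effective throughput: %.0f%%\n",
               seek_s * 1000.0, efficiency * 100.0);  /* 80 ms, 92% */
        return 0;
    }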
Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?
Chris Mason [EMAIL PROTECTED] writes:

> I would not say that speeds this bad are a known problem. 1.9MB/s is
> much too slow. Is that FS very full? Fragmentation is the only thing
> that should be causing this.

We can exclude that: the partition is empty except for that single file,
or maybe two files at several hundred MB each. CD images, Debian 2.2r5
in my case.

-- Matthias Andree
GPG encrypted mail welcome, unless it's unsolicited commercial email.
Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?
Oleg Drokin [EMAIL PROTECTED] writes:

> Yes, it is slow, but an overall disk throughput of 7M/sec suggests
> this is an old drive. Old drives tend to have worse seeking speed than
> today's drives.

But how much seeking is done on one 650 MB file that's been written onto
an empty partition? I presume, not too much.

-- Matthias Andree
GPG encrypted mail welcome, unless it's unsolicited commercial email.
Re: [reiserfs-list] reiserfs -o notail less throughput than ext3?
Anders Widman [EMAIL PROTECTED] writes:

> Even with 'heavy' fragmentation this is quite low. A quick benchmark
> of my 5400rpm 80GB disk gave me an average of 30MB/s. However, when
> simulating large fragmentation (10,000+ fragments on a 1GB file) I get
> about 2MB/s. Are DMA, unmask IRQ, read-ahead and similar activated?

SCSI here, with the aic7xxx 5.x and 6.x drivers; no particular tuning in
place except that I told the AHA2940 to negotiate Ultra-Wide - it has
braindead default settings (negotiates 10 MXfers/s only, no Ultra) - so
we can safely assume it did DMA.

-- Matthias Andree
GPG encrypted mail welcome, unless it's unsolicited commercial email.
Re: [reiserfs-list] Boot failure: msdos pushes in front of reiserfs
Hubert Mantel [EMAIL PROTECTED] writes:

>> Installation time is after boot time. Use a Unix-style file system.
>> Go for minix, that's small and will not get in the way.
>
> So the modules floppy would need to be minix also. We had that in the
> past.

No need; you can load fat.o + vfat.o from initrd or romfs or something.
People don't need to mount these floppies.

> Agreed. But guess what: they do it nevertheless. At least they try to
> do so. Go work in the support department of some Linux distributor for
> some weeks. This might change your view of some things drastically.

That particular person asked in de.comp.os.unix.linux.moderated rather
than bothering your installation support team -- so much for the
"changed view".

>> 3. the kernel fails to keep rootfstype over an initrd. There should
>> be an additional parameter like initrdfstype or something to let
>> users override autodetection. Or use *BSD style and support a file
>> system tag in the root= bootparam: root=reiserfs:/dev/hda13 would
>> specify mounting hda13 as root and trying reiserfs first, and
>> initrd=vfat:/some/braindead/location whatever.
>
> Ok, so you are no longer requesting that a distributor fix his bugs,
> but now you want distributors to provide a workaround for some kernel
> shortcoming. Quite different for me.

I was skiing for a week; that helps sometimes. :-) Still, I think the
distributor should first modprobe reiserfs and only after that modprobe
vfat. That way, reiserfs is tried BEFORE vfat and all is fine.
Robustness of a boot procedure is not a luxury, but a requirement.

>> 1. bootloader loads kernel with ext2/minix driver and corresponding
>>    initrd
>> 2. kernel bootstraps its initrd
>> 3. initrd does modprobe reiserfs jfs xfs (whatever native)
>
> Where do you get those modules from? They don't fit onto one single
> floppy; there is already the kernel and initrd on the first floppy, so
> we need some filesystem in order to access the second floppy.

Is there no room left for vfat.o inside the initrd when the initrd is
minix? I think it does not matter whether you link vfat into the kernel
or drop it into the initrd; either is gzipped. I feel very uncomfortable
with a crippled file system (no devices!) used to bootstrap Linux. It
seems to me like a dead end. Maybe there's a better way than initrd
(initramfs has been suggested, cramfs also); I didn't look (and I
usually boot my boxen without initrd, no RAID here).

> Since you don't have to deal with customers, it is very convenient for
> you to demand "just use minix" and have us answer all the newbies that
> report "defective" floppies shipped by us.

In fact, I've yet to see a broken floppy shipped by SuSE. The worst
thing I came across was a CD of some 6.x version which the CD-ROM drive
read at 8x speed only (and I didn't bother to have it exchanged).

-- Matthias Andree
They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety. Benjamin Franklin
Re: [reiserfs-list] Boot failure: msdos pushes in front of reiserfs
On Mon, 14 Jan 2002, Oleg Drokin wrote: Looking at init/main.c and fs/super.c, the rootfsflags parameter is never saved; moreover, its original value is destroyed once the initrd fs is mounted. And I only see not very nice ways of fixing this, so perhaps someone more experienced can come up with a solution? (my crappy idea is not to do putname() on fs_names if (real_root_dev != ROOT_DEV); all of this only when CONFIG_..._INITRD is enabled) Thanks for confirming a bug. So I understand that mounting an initrd loses the rootfsflags, and as the actual root= parameter is kept over an initrd boot, it should also be possible for rootfsflags= -- can the rootfsflags maybe be saved along with the root= parameter? Yup, reiserfs is last in /proc/filesystems when loaded as a module, but on my private machine (where it's linked into the kernel), it's right after ext2 and before vfat. Do you have vfat as a loadable module? Hum, yes, but that's not the point; someone turned up with a SuSE 7.3 default kernel .config, and it had ...MSDOS=y ...REISERFS=m -- that about says it all: msdos is higher in the list, reiserfs is then loaded from the initrd and thus ends up at the bottom of the list. Strangely enough, SuSE compiles MSDOS, which hardly anyone needs at boot time, into the kernel, but not reiserfs (admittedly, reiserfs takes up some memory, but then, it's a native file system and should be loaded before non-native file systems such as msdos, vfat, ntfs, freevxfs or whatever). This one is for the distributors to fix. Had they left MSDOS as a module, things would have worked out: 1. ext2 in the kernel 2. initrd loads reiserfs 3. actual root (reiserfs) is mounted 4. only now does msdos.o become available. -- Matthias Andree They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. Benjamin Franklin
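The ordering problem described here comes down to the kernel walking its list of registered file systems front to back when probing the root device, so whatever registered first wins. A boiled-down sketch of that shape (an illustration only, not the actual 2.4 fs/super.c code):

    /* Why /proc/filesystems order matters: built-ins register at boot,
     * modules append later, and the probe loop takes the list in order.
     * Simplified illustration, not the real kernel code. */
    #include <stdio.h>

    struct file_system_type {
        const char *name;
        struct file_system_type *next;
    };

    static struct file_system_type *file_systems;

    static void register_filesystem(struct file_system_type *fs)
    {
        struct file_system_type **p = &file_systems;
        while (*p)
            p = &(*p)->next;
        *p = fs;                    /* always appended at the tail */
    }

    int main(void)
    {
        static struct file_system_type msdos = { "msdos", NULL };
        static struct file_system_type reiserfs = { "reiserfs", NULL };

        register_filesystem(&msdos);    /* MSDOS=y: built in, registers early */
        register_filesystem(&reiserfs); /* REISERFS=m: loaded from the initrd */

        for (struct file_system_type *fs = file_systems; fs; fs = fs->next)
            printf("probe %s\n", fs->name); /* msdos is tried first */
        return 0;
    }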
Re: [reiserfs-list] writeback caching okay?
Gregory Ade [EMAIL PROTECTED] writes: I was just wondering if there are any known problems using ReiserFS on disk drives that have writeback caching enabled? I realize that there is a possibility of the filesystem getting royally screwed if there is a sudden loss of power, but the system I'd do this on is on a UPS, so this shouldn't be an issue unless my cat figures out how to power off my UPS units. Well, a piece of cardboard and some adhesive tape should fix /that/ problem. Pray the PSU in the machine itself doesn't fail. If your UPS has a decent surge filter and is fast to ramp up the voltage should the mains supply fail or show brownouts, then the risk may be small enough. -- Matthias Andree Those who give up essential liberties for temporary safety deserve neither liberty nor safety. - Benjamin Franklin
[reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
On Tue, 25 Sep 2001, Alex Bligh - linux-kernel wrote: Probably because sectors are so close together on the physical media. If you disable write caching, and are writing sectors 1001, 1002, 1003 etc., you tell it to write sector 1001, and it doesn't complete until it's written it, you IRQ the PC, and it sends the write out for 1002, which completes a little later. However, by this time 1002 has flown past the drive head, as it wasn't immediately queued on the drive. If you had only one sector of write-ahead, this effect would disappear (but it is just as theoretically dangerous if there is no way to force a flush() of the write cache). Which leads me to the question: which ATA standard introduced the mandatory FLUSH CACHE command? I saw it is in the ATA-6 draft. How about the standards used in drives that are sold today, ATA-4 and ATA-5? Do they list the FLUSH CACHE command, possibly as mandatory? That might be rather useful to issue after a synchronous write. -- Matthias Andree Those who give up essential liberties for temporary safety deserve neither liberty nor safety. - Benjamin Franklin
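As a concrete illustration of using such a command from userspace, here is a sketch that issues FLUSH CACHE (opcode 0xE7) through Linux's HDIO_DRIVE_CMD ioctl, roughly what hdparm does when asked to flush a drive's write cache. The device path is a placeholder, and the drive must actually implement the opcode:

    /* Sketch: ask an ATA disk to flush its write cache via HDIO_DRIVE_CMD.
     * Assumes Linux, root privileges and a drive that implements
     * FLUSH CACHE (0xE7); /dev/hda is a placeholder. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/hdreg.h>

    int main(void)
    {
        /* args[] layout for HDIO_DRIVE_CMD: command, sector, feature, count */
        unsigned char args[4] = { 0xE7, 0, 0, 0 };  /* 0xE7 = FLUSH CACHE */
        int fd = open("/dev/hda", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        if (ioctl(fd, HDIO_DRIVE_CMD, args) != 0)
            perror("FLUSH CACHE");  /* older drives may reject the opcode */
        close(fd);
        return 0;
    }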
[reiserfs-list] Re: [PATCH] 2.4.10 improved reiserfs a lot, but could still be better
On Mon, 24 Sep 2001, Alan Cox wrote: Those drives should be blacklisted and rejected as soon as someone tries to mount those pieces rw. Either the drive can make guarantees about when a write to permanent storage has COMPLETED (either by switching off the cache or by a flush operation), or it belongs ripped out of the boxes and stuffed down the throat of the idiot who built it. In which case you can choose between ancient ST-506 drives and SCSI Sorry, a disk drive which makes no guarantees even after a flush does not belong in my boxen. I'd return it as broken the first day I figured out it did lazy write-back caching. No file system can be safe on such disks. -- Matthias Andree Those who give up essential liberties for temporary safety deserve neither liberty nor safety. - Benjamin Franklin
Re: [reiserfs-list] filesystem interfaces
On Mon, 21 May 2001, Xuan Baldauf wrote: Transactions are not only needed by mail servers; databases will like them, and file copying (ftp, scp, even nfs) does need filesystem transactions. Well. Rsync emulates those, transferring files to temporary names and renaming them atomically into place. :-) -- Matthias Andree
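The pattern rsync uses generalizes to any careful writer: put the complete new contents under a temporary name, force them to disk, then swap the name in atomically. A minimal sketch, with file names invented for illustration:

    /* Write-temporary-then-rename: the old file is only replaced once the
     * new contents are complete and flushed; rename(2) is atomic.
     * File names are invented for illustration. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        const char *tmp = "data.tmp", *final = "data";
        const char buf[] = "new file contents\n";

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)
                || fsync(fd) != 0) {    /* data must hit the disk first */
            perror("write/fsync"); close(fd); unlink(tmp); return 1;
        }
        close(fd);

        if (rename(tmp, final) != 0) {  /* atomic replacement */
            perror("rename"); unlink(tmp); return 1;
        }
        return 0;
    }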
Re: [reiserfs-list] /etc/fstab
Krasi Zlatev [EMAIL PROTECTED] writes: I do not want to put it in /etc/fstab, because then the partition is mounted at start-up. /dev/hdd5 /mnt5 reiserfs notail,noatime 0 0 How do I mount it manually; what options should I give to mount? The same :-) You can add noauto to the fstab options to prevent mounting on boot. See man fstab. -- Matthias Andree
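Spelled out, with the device and mount point from the question and noauto added, the fstab line and the manual mount would look like this:

    /dev/hdd5  /mnt5  reiserfs  noauto,notail,noatime  0 0

    mount -t reiserfs -o notail,noatime /dev/hdd5 /mnt5

(Once the noauto entry exists, a plain 'mount /mnt5' picks the remaining options up from fstab; noauto only skips the mount at boot.)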
Re: [reiserfs-list] filesystem interfaces
Ragnar Kjørstad [EMAIL PROTECTED] writes: 1. fsync The current fsync interface lacks a couple of interesting features. I believe lazy-fsync has been discussed, but another useful feature is Wait a minute. The major problem is that mail servers, regardless of their performance efforts, need to be safe in that they *NEVER* lose any mail. Trading speed for lost mail after a crash is not a good deal. An architecture which defers fsync() for a finite, maximum amount of time to gather multiple fsync() requests into a batch of (possibly sorted, sent into a tagged SCSI command queue) writes and then acknowledges them all at once may be a good idea, provided you don't run out of process table slots with all those waiting smtpds; a mail server cannot acknowledge receipt of a mail before all file and meta data are on disk (and not in a fast write cache or something). The caller should be able to wait for fsync to complete by using poll in the case of async fsync. It would require introducing a special syscall, I believe. However, in that case, fsync should also sync all meta data related to files. A different aspect is that many mailers expect BSD-style semantics where directory modifications are always synchronous. Revisit the ReiserFS-and-qmail issue. On ext2, you can circumvent the problems with chattr +S, but on ReiserFS, there is no such thing. Now link and rename behave differently with regard to replacing existing files, but what's the logic behind this? What's the problem with that? To clobber, use rename. To be careful, use link, and if that succeeds, unlink. How would you establish the atomicity of either functionality (either rename, or link without unlink) in your approach? 3. Ruby I just came across the Ruby programming language today - the interesting thing is that this language has a concept of transactions! Do any other languages have this kind of feature? Does anyone use them for real software? It would be really cool with a Ruby implementation that actually used filesystem transactions to implement this instead of the library implementation that I assume Ruby uses. That would probably be most useful in things like NFS, where a link may succeed but the success report gets lost. If this were transaction-oriented, the check whether your file's st_nlink has increased to 2 (if so, your link has succeeded) could be avoided. Not sure if Coda or AFS have concepts like these. -- Matthias Andree
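To make the link-versus-rename distinction concrete, here is a sketch of the careful, no-clobber publish via link(2) plus unlink(2), including the classic st_nlink recheck for NFS mentioned above. File names are invented for illustration:

    /* Careful publish: link(2) refuses to clobber an existing target
     * (rename(2) would replace it atomically instead). Over NFS the
     * success reply to link can be lost, so on error the classic
     * recheck is whether the link count reached 2.
     * File names invented for illustration. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    static int publish_careful(const char *tmp, const char *final)
    {
        if (link(tmp, final) != 0) {
            struct stat st;
            if (stat(tmp, &st) != 0 || st.st_nlink != 2)
                return -1;      /* genuinely failed (e.g. target exists) */
            /* st_nlink == 2: the link succeeded, only the reply got lost */
        }
        return unlink(tmp);     /* drop the temporary name */
    }

    int main(void)
    {
        int fd = open("mail.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd >= 0) close(fd); /* make sure the temporary exists */
        if (publish_careful("mail.tmp", "mail") != 0)
            perror("publish_careful");
        return 0;
    }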