Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Ric Wheeler wrote:
David Masover wrote:
Peter wrote:
On Fri, 01 Sep 2006 17:27:20 -0500, David Masover wrote:
snip...
both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad blocks. We thought that should be enough.
It really should. Why bother with a patch? Just write a wrapper script that runs badblocks and passes in the list to mkfs.
It was just a thought from userland. My perspective was that a user, not a hard-boiled geek, might get lulled into a false sense of security but may not have the wherewithal to write a wrapper. If nothing else, when the final doc is written (did I say final?:)), it should include a notice about not running badblocks.
Well, let's see... Most hard drives come more thoroughly tested at the factory than anything badblocks would do. Also, it seems redundant to have every single mkfs have to implement a badblocks flag. I'd suggest a universal wrapper, then, or a modification to the mkfs frontend, so that this works the same way across all filesystems. Something like mkfs -B -t reiser4
I don't think that modern drives that fail writes are worth using for a brand new file system. While failing reads is quite common and can be caused by transient issues (dirt on the platter, a bad write, etc.), failed writes are almost always a sign that you have a serious issue. Almost all modern drives remap each failed write to a spare sector automatically. This action only fails if you have exhausted this remapping area (or have some really nasty issue like a bad cable, bad write head, etc.). Having mkfs ignore bad writes would seem to encourage users to create a new file system on a disk that is known to be bad and is most likely not going to function well. If a user ever has a golden opportunity to toss a drive in the trash, it is when they notice mkfs fails ;-) This option to mkfs sounds like an invitation to disaster.
Yes, you are right, the option should be to run badblocks and then fail if it finds any.
The other tools (debugreiserfs, reiserfsck, etc.) do need to be able to handle bad blocks as well as possible, since they are often needed during a salvage operation in order to recover data (which might need to be migrated to a new disk). It is not clear to me that passing a list of bad blocks helps them as much as robust general-purpose error recovery support would.
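The "run badblocks and then fail if it finds any" policy suggested above could be sketched roughly as follows. This is only an illustration, not code from reiserfsprogs: the function names count_bad_blocks and may_format are hypothetical, and it assumes the list is in the plain-text form that badblocks writes with -o (one decimal block number per line).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the block numbers in a bad-block list held in memory.
 * Assumption: the text looks like the output of `badblocks -o FILE`,
 * i.e. one decimal block number per line. */
size_t count_bad_blocks(const char *list)
{
    size_t count = 0;
    const char *p = list;

    while (*p != '\0') {
        char *end;

        /* Skip whitespace and blank lines between entries. */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        (void)strtoull(p, &end, 10);
        if (end == p) {
            /* Not a number: ignore the rest of this line. */
            while (*p != '\0' && *p != '\n')
                p++;
            continue;
        }
        count++;
        p = end;
    }
    return count;
}

/* Hypothetical policy helper: allow formatting only when the scan
 * reported no bad blocks at all. */
int may_format(const char *list)
{
    return count_bad_blocks(list) == 0;
}
```

A front end would read the badblocks output file into a buffer, call may_format() on it, and refuse to run mkfs when it returns 0.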
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Vladimir V. Saveliev wrote: Hello On Friday 01 September 2006 22:23, Peter wrote: Perhaps this has been mentioned before. If so, sorry. IMHO, it would be useful to integrate a call to badblocks in the fsck/mkfs.reiser* programs so that more thorough disk checking can be done at format time. Sort of like the option e2fsck -c. If this is added, the output could be fed immediately to the reiser format program and badblocks spared prior to filesystem use. JM$0.02 both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad blocks. We thought that should be enough. I'll take a patch;-)
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
David Masover wrote: Vladimir V. Saveliev wrote: Hello On Friday 01 September 2006 22:23, Peter wrote: Perhaps this has been mentioned before. If so, sorry. IMHO, it would be useful to integrate a call to badblocks in the fsck/mkfs.reiser* programs so that more thorough disk checking can be done at format time. Sort of like the option e2fsck -c. If this is added, the output could be fed immediately to the reiser format program and badblocks spared prior to filesystem use. JM$0.02 both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad blocks. We thought that should be enough. It really should. Why bother with a patch? Just write a wrapper script that runs badblocks and passes in the list to mkfs. For average sysadmins, it would be nice to just add a flag and it happens. I will take a patch, but I won't write it.;-)
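As a rough sketch of the wrapper idea discussed above, the two command lines such a script would run can be built like this. It is an illustration under assumptions, not an existing tool: the helper names are hypothetical, -B is the option named in this thread, -b and -o are the block-size and output-file options from the badblocks and mkfs.reiserfs man pages, and the paths are assumed to contain no characters that need shell quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "badblocks -b SIZE -o LIST DEVICE" into buf.
 * Returns 0 on success, -1 if an argument is empty or buf is too small. */
int build_badblocks_cmd(char *buf, size_t len, const char *device,
                        const char *list_path, unsigned block_size)
{
    int n;

    assert(buf != NULL && device != NULL && list_path != NULL);
    if (strlen(device) == 0 || strlen(list_path) == 0)
        return -1;
    n = snprintf(buf, len, "badblocks -b %u -o %s %s",
                 block_size, list_path, device);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Build "mkfs.reiserfs -b SIZE -B LIST DEVICE" into buf.
 * The same block size is passed to both tools so that the block
 * numbers in the list mean the same thing to mkfs. */
int build_mkfs_cmd(char *buf, size_t len, const char *device,
                   const char *list_path, unsigned block_size)
{
    int n;

    assert(buf != NULL && device != NULL && list_path != NULL);
    if (strlen(device) == 0 || strlen(list_path) == 0)
        return -1;
    n = snprintf(buf, len, "mkfs.reiserfs -b %u -B %s %s",
                 block_size, list_path, device);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

The sketch only formats the commands; actually running them (and checking the exit status of badblocks before calling mkfs) is left to the caller.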
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
David Masover wrote: Peter wrote: On Fri, 01 Sep 2006 17:27:20 -0500, David Masover wrote: snip... both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad blocks. We thought that should be enough. It really should. Why bother with a patch? Just write a wrapper script that runs badblocks and passes in the list to mkfs. It was just a thought from userland. My perspective was that a user, not a hard-boiled geek, might get lulled into a false sense of security but may not have the wherewithal to write a wrapper. If nothing else, when the final doc is written (did I say final?:)), it should include a notice about not running badblocks. Well, let's see... Most hard drives come more thoroughly tested at the factory than anything badblocks would do. They leave more tested, they arrive ;-) and you assume it is a new drive. Also, it seems redundant to have every single mkfs have to implement a badblocks flag.. I'd suggest a universal wrapper, then, or a modification to the mkfs frontend, so that this works the same way across all filesystems. Something like mkfs -B -t reiser4
Re: Reiser4 und LZO compression
Edward Shishkin wrote:
Clemens Eisserer wrote:
But speaking of single threadedness, more and more desktops are shipping with ridiculously more power than people need. Even a gamer really
Will the LZO compression code in reiser4 be able to use multi-processor systems? E.g. if I've a Turion-X2 in my laptop will it use 2 threads for compression/decompression making cpu throughput much better than what the disk could do?
Compression happens at flush time and there can be more than one flush thread that processes the same transaction atom. Decompression happens in the context of readpage/readpages. So if you mean per file, then yes for compression and no for decompression.
I don't think your explanation above is a good one. If there is more than one process reading a file, then you can have multiple decompressions at one time of the same file, yes? Just because there can be more than one flush thread per file does not mean it is likely there will be. CPU scheduling of compression/decompression is an area that could use work in the future. For now, just understand that what we do is better than doing nothing.;-/
Edward.
lg Clemens
2006/8/30, Hans Reiser [EMAIL PROTECTED]:
Edward Shishkin wrote:
(Plain) file is considered as a set of logical clusters (64K by default). Minimal unit occupied in memory by (plain) file is one page. Compressed logical cluster is stored on disk in so-called disk clusters. Disk cluster is a set of special items (aka ctails, or compressed bodies), so that one block can contain (compressed) data of many files and everything is packed tightly on disk.
So the compression unit is 64k for purposes of your benchmarks.
Re: bug: Unable to mount reiserfs from DVD+R
Xuân Baldauf wrote:
Hello, I created backup DVDs formatted using reiserfs. However, mounting them is not possible. If I try to mount such a DVD, I get the following results:
Aug 31 01:08:18 notebook2 kernel: ReiserFS: hdc: using ordered data mode
Aug 31 01:08:18 notebook2 kernel: ReiserFS: hdc: warning: sh-458: journal_init_dev: cannot init journal device 'unknown-block(22,0)': -30
Aug 31 01:08:18 notebook2 kernel: ReiserFS: hdc: warning: sh-462: unable to initialize jornal device
Aug 31 01:08:18 notebook2 kernel: ReiserFS: hdc: warning: sh-2022: reiserfs_fill_super: unable to initialize journal space
I get this error even when mounting using -o ro,nolog. I tracked this down to the method journal_init_dev, which opens the journal device with mode
blkdev_mode = FMODE_READ | FMODE_WRITE;
except when the journal device is marked read only. However, as it seems, my DVD burner drive is able to write DVD media (so it is not a read-only device, in general), but the particular DVD is only readable, not writable. That's why (I think) the journal device itself is not marked read only and yet
open_by_devnum(jdev, blkdev_mode);
fails. The method journal_init_dev should honor the mount options ro or nolog or both, which it currently does not (as of Linux 2.6.18-rc4). Alternatively, method journal_init_dev might try to open the journal device again (only using read only mode) in case the first try to open results in -EROFS.
Xuân Baldauf.
Sounds reasonable to me, Chris?
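The second suggestion above (retry read-only when the first open fails with -EROFS) can be illustrated with a userspace analogue. This is not the kernel code and not a proposed patch: journal_init_dev and open_by_devnum are in-kernel functions, whereas the sketch below uses the ordinary open(2) call, and the name open_with_ro_fallback is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>

/* Try to open a device read-write; if the medium turns out to be
 * read-only (EROFS), retry read-only instead of giving up.
 * On success returns the descriptor and sets *read_only to 0 or 1.
 * On failure returns -1, leaves errno set and *read_only untouched. */
int open_with_ro_fallback(const char *path, int *read_only)
{
    int fd = open(path, O_RDWR);

    if (fd >= 0) {
        *read_only = 0;
        return fd;
    }
    if (errno != EROFS)
        return -1;          /* some other error: do not mask it */

    fd = open(path, O_RDONLY);
    if (fd >= 0)
        *read_only = 1;     /* caller must not attempt journal writes */
    return fd;
}
```

A caller that gets *read_only == 1 back would have to treat the journal as unwritable, which is in the same spirit as honoring ro or nolog at mount time.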
Re: [patch] Re: assertion failed: can_hit_entd(ctx, s)
Is it already sent in? If not, can it go out today?
Hans
Alexander Zarochentsev wrote:
Hello,
On 30 August 2006 01:10, Andrew James Wade wrote:
Hello Alexander,
In addition to your patch, I've also applied the patch below. With these two patches the fs is much more stable for me.
That code was removed from reiser4 recently, the patch will be in the next -mm kernel.
I knew there was a bug somewhere :)
However, something is holding a d_ref across the calls to reiser4_writepage. It's not clear to me that this is allowed so my patch may not be a full fix.
Andrew Wade
signed-off-by: [EMAIL PROTECTED]
diff -rupN a/fs/reiser4/plugin/item/extent_file_ops.c b/fs/reiser4/plugin/item/extent_file_ops.c
--- a/fs/reiser4/plugin/item/extent_file_ops.c 2006-08-28 11:30:33.0 -0400
+++ b/fs/reiser4/plugin/item/extent_file_ops.c 2006-08-29 13:06:20.0 -0400
@@ -1320,20 +1320,22 @@ static int extent_readpage_filler(void *
 TWIG_LEVEL, CBK_UNIQUE, NULL);
 if (result != CBK_COORD_FOUND) {
 reiser4_unset_hint(hint);
-return result;
+goto out;
 }
 ext_coord->valid = 0;
 }
 if (zload(ext_coord->coord.node)) {
 reiser4_unset_hint(hint);
-return RETERR(-EIO);
+result = RETERR(-EIO);
+goto out;
 }
 if (!item_is_extent(&ext_coord->coord)) {
 /* tail conversion is running in parallel */
 zrelse(ext_coord->coord.node);
 reiser4_unset_hint(hint);
-return RETERR(-EIO);
+result = RETERR(-EIO);
+goto out;
 }
 if (ext_coord->valid == 0)
@@ -1358,6 +1360,10 @@ static int extent_readpage_filler(void *
 } else
 reiser4_unset_hint(hint);
 zrelse(ext_coord->coord.node);
+
+out:
+/* Calls to this function may be intermingled with VM writeback. */
+reiser4_txn_restart_current();
 return result;
 }
Thanks, Alex.
Re: Reiser4 und LZO compression
Edward Shishkin wrote: (Plain) file is considered as a set of logical clusters (64K by default). Minimal unit occupied in memory by (plain) file is one page. Compressed logical cluster is stored on disk in so-called disk clusters. Disk cluster is a set of special items (aka ctails, or compressed bodies), so that one block can contain (compressed) data of many files and everything is packed tightly on disk. So the compression unit is 64k for purposes of your benchmarks.
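To make the "64K logical cluster" unit concrete, here is a small arithmetic sketch of how a byte offset in a plain file maps onto logical clusters. It is not taken from the reiser4 source: the function names are hypothetical, the 64K value is the default stated above, and the 4096-byte page size is an assumption (typical for the i686 and x86-64 machines mentioned on this list).

```c
#include <stdint.h>

#define LOGICAL_CLUSTER_SIZE (64u * 1024u) /* default named in the thread */
#define ASSUMED_PAGE_SIZE    4096u         /* assumption, not from the thread */

/* Index of the logical cluster that holds byte `offset` of a file. */
uint64_t cluster_index(uint64_t offset)
{
    return offset / LOGICAL_CLUSTER_SIZE;
}

/* Position of byte `offset` inside its logical cluster. */
uint32_t offset_in_cluster(uint64_t offset)
{
    return (uint32_t)(offset % LOGICAL_CLUSTER_SIZE);
}

/* Number of logical clusters needed to cover a file of `size` bytes. */
uint64_t cluster_count(uint64_t size)
{
    return (size + LOGICAL_CLUSTER_SIZE - 1) / LOGICAL_CLUSTER_SIZE;
}

/* Pages spanned by one uncompressed logical cluster. */
unsigned pages_per_cluster(void)
{
    return LOGICAL_CLUSTER_SIZE / ASSUMED_PAGE_SIZE;
}
```

Under these assumptions one logical cluster spans 16 pages in memory, and each cluster is compressed and stored on disk as its own disk cluster.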
Re: Reiser4 und LZO compression
PFC wrote: I made a little benchmark on my own PC (Athlon64 3200+ in 64 bit gentoo) http://peufeu.free.fr/compression.html So, gzip could be used on PCs having very fast processors and very slow harddrives, like Core Duo laptops. However, lzo compresses nearly as much and is still a lot faster. I don't see a reason for gzip in a FS application. Anyone has a bench for lzf ? Yes, Edward did equivalent tests, and thus we selected LZO.
Re: Reiser4 und LZO compression
PFC, thanks for giving us some real data. May I post it to the lkml thread? In essence, LZO wins the benchmarks, and the code is hard to read. I guess I have to go with LZO, and encourage people to take a stab at dethroning it.
Hans
PFC wrote:
I have made a little openoffice spreadsheet with the results. You can have fun entering stuff and seeing the results. http://peufeu.free.fr/compression.ods
Basically, a laptop having the same processor as my PC and a crummy 15 MB/s drive (like most laptop drives) will get a 2.5x speedup using lzf, while using 40% CPU for compression and 15% CPU for decompression. I'd say it's a clear, huge win.
A desktop computer with a modern IDE drive doing 50 MB/s will still get nice speedups (1.8x on write, 2.5x on read) but of course, more CPU will be used because of the higher throughput. In this case it is CPU limited on compression and disk limited on decompression. However soon everyone will have dual core monsters so...
A big ass RAID will not get much benefit unless:
- the buffer cache stores compressed pages, so compression virtually doubles the RAM cache
- or the CPU is really fast
- or you put one of these neat FPGA modules in a free Opteron socket and upload a soft-hardware LZF in it with a few gigabytes/s throughput
Or you look the sysadmin in the eyes, and say, your file servers have more out of disk space problems than load problems, yes? ...
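The laptop and desktop figures quoted above fit a simple "whichever is slower, disk or codec" model, sketched below. The model and the function names are assumptions for illustration, not PFC's spreadsheet: the 2.5 compression ratio is inferred from the 2.5x laptop speedup (assuming the laptop is purely disk limited), and the full-CPU codec rates are back-calculated from the quoted CPU percentages (37.5 MB/s at 40% CPU gives about 93.75 MB/s for compression, 37.5 MB/s at 15% gives about 250 MB/s for decompression).

```c
/* Effective throughput of uncompressed data, in MB/s.
 * disk_mb_s  : raw disk transfer rate
 * ratio      : uncompressed size / compressed size
 * codec_mb_s : rate the CPU can (de)compress at when fully used,
 *              measured in uncompressed MB/s */
double effective_rate(double disk_mb_s, double ratio, double codec_mb_s)
{
    double disk_limited = disk_mb_s * ratio;

    return disk_limited < codec_mb_s ? disk_limited : codec_mb_s;
}

/* Speedup relative to reading or writing the data uncompressed. */
double speedup(double disk_mb_s, double ratio, double codec_mb_s)
{
    return effective_rate(disk_mb_s, ratio, codec_mb_s) / disk_mb_s;
}
```

With these inferred inputs, speedup(15, 2.5, 93.75) is 2.5 (disk limited), speedup(50, 2.5, 93.75) is 1.875 (CPU limited on compression, close to the quoted 1.8x on write) and speedup(50, 2.5, 250) is 2.5 (disk limited on decompression), which is at least consistent with the numbers in the message.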
Re: mutt tells me: Fetching message... Bus error
Thanks. Did you run an older version of reiser4 before this? If yes, then this may have been fixed but only showed up for you now. Zam?
Hans
Posern wrote:
Hi. I don't know if this can help you to improve reiser4:
On a:
Linux jolie 2.6.17.8-reiser4-r3-jolie #1 Tue Aug 15 00:14:45 CEST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.53GHz GNU/Linux
I was opening a maildir-email-file in mutt via IMAP-SSL running on the same machine. The home and root partition on this machine are reiser4. Mutt did NOT open the Email, but told me: Fetching message... Bus error in the status line. The kernel-log from this time told me:
Aug 29 09:55:26 [kernel] [17183751.088000] reiser4[mutt(8457)]: present_lw_sd (fs/reiser4/plugin/item/static_stat.c:277)[]:
Aug 29 09:55:26 [kernel] [17183751.088000] WARNING: partially converted file is encountered
Aug 29 09:55:26 [imapd-ssl] Unexpected SSL connection shutdown.
Aug 29 09:55:26 [imapd-ssl] DISCONNECTED, user=me, ip=[:::127.0.0.1], headers=16823, body=254592, time=3339, starttls=1
Aug 29 09:55:26 [kernel] [17183751.132000] reiser4[mutt(8457)]: extent2tail (fs/reiser4/plugin/file/tail_conversion.c:714)[nikita-2282]:
Aug 29 09:55:26 [kernel] [17183751.132000] WARNING: Partial conversion of 2413985: 2 of 3: -22
Aug 29 09:55:26 [kernel] [17183751.132000] reiser4[mutt(8457)]: release_unix_file (fs/reiser4/plugin/file/file.c:2268)[nikita-3233]:
Aug 29 09:55:26 [kernel] [17183751.132000] WARNING: Failed (-22) to convert in release_unix_file (2413985)
Aug 29 09:55:29 [imapd-ssl] Connection, ip=[:::127.0.0.1]
Aug 29 09:55:30 [mutt] unable to dlopen /usr/lib/sasl2/libsql.so: /usr/lib/sasl2/libsql.so: cannot open shared object file: No such file or directory
Aug 29 09:55:30 [imapd-ssl] LOGIN, user=me, ip=[:::127.0.0.1], protocol=IMAP
Aug 29 09:55:34 [kernel] [17183758.976000] reiser4[mutt(9207)]: extent2tail (fs/reiser4/plugin/file/tail_conversion.c:714)[nikita-2282]:
Aug 29 09:55:34 [kernel] [17183758.976000] WARNING: Partial conversion of 2413985: 2 of 3: -22
Aug 29 09:55:34 [kernel] [17183758.976000] reiser4[mutt(9207)]: release_unix_file (fs/reiser4/plugin/file/file.c:2268)[nikita-3233]:
Aug 29 09:55:34 [kernel] [17183758.976000] WARNING: Failed (-22) to convert in release_unix_file (2413985)
Aug 29 09:55:34 [imapd-ssl] Unexpected SSL connection shutdown.
Aug 29 09:55:34 [imapd-ssl] DISCONNECTED, user=me, ip=[:::127.0.0.1], headers=15902, body=3979, time=4, starttls=1
Aug 29 09:58:59 [imapd-ssl] Connection, ip=[:::10.10.10.28]
Aug 29 09:58:59 [imapd-ssl] LOGIN, user=me, ip=[:::10.10.10.28], protocol=IMAP
Even so, with thunderbird via IMAP-SSL from another computer I could open this message!
Afterwards I booted with the www.sysresccd.org v0.2.19 and did a fsck.reiser4 on this partition. It said I should run fsck.reiser4 --build-fs. Unfortunately I don't have the output of this call (next time :-). Then I booted into my gentoo linux again ... and now the mail opens in mutt without error message.
Greetings, K. Posern
Re: Reiser4 und LZO compression
David Masover wrote:
John Carmack is pretty much the only superstar programmer in video games, and his first fairly massive attempt to make Quake 3 have two threads (since he'd just gotten a dual-core machine to play with) actually resulted in the game running some 30-40% slower than it did with a single thread.
Do the two processors have separate caches, and thus being overly fine-grained makes you memory-transfer bound, or? Two processors tend to create a snappier user experience, in that big CPU processes get throttled nicely.
Hans
Re: Possible bug with FIBMAP
zam, please review this unless vs is back. What is the file size? Is there anything special about the file (holes, etc.)? Thanks for finding what I assume is a bug. (I wonder if this has been sporadically affecting use of reiser4 with bootloaders.)
Hans
Brice Arnould wrote:
Hi
Two users of a hack I wrote told me that http://vleu.net/shake/fb_r4.c (also attached with the mail) returned FIBMAP=-22, FIGETBSZ=4096 on some of their files on reiser4 filesystems. Does this value of -22 have a special meaning (would be strange), or is it a bug in Reiser4? I can ask them for more details, if you want.
Thanks
Brice
/*
 * Non released test software, distributed under GPL-2 licence by
 * Brice Arnould (c) 2006
 * You shouldn't use it.
 */
#include <stdio.h>
#include <assert.h>     // assert()
#include <errno.h>      // errno
#include <error.h>      // error()
#include <sys/ioctl.h>  // ioctl()
#include <linux/fs.h>   // FIBMAP, FIGETBSZ
#include <sys/types.h>  // open()
#include <sys/stat.h>   // open()
#include <fcntl.h>      // open()
#include <unistd.h>     // close()

int main (int argc, char **argv)
{
  /* block is the logical block asked for (0) and receives the physical one */
  int fd, blocksize, block = 0;
  if (2 != argc)
    error (1, 0, "usage : %s FILE", argv[0]);
  fd = open (argv[1], O_RDONLY);
  assert (0 <= fd);
  /* both ioctls take a pointer to an int */
  if (-1 == ioctl (fd, FIGETBSZ, &blocksize)
      || -1 == ioctl (fd, FIBMAP, &block))
    error (1, errno, "ioctl() failed, are you root ?");
  printf ("FIBMAP=%i, FIGETBSZ=%i\n", block, blocksize);
  close (fd);
  return 0;
}
Re: Reiser4 und LZO compression
Alexey Dobriyan wrote: Reiser4 developers, Andrew, The patch below is so-called reiser4 LZO compression plugin as extracted from 2.6.18-rc4-mm3. I think it is an unauditable piece of shit and thus should not enter mainline. Hmm. LZO is the best compression algorithm for the task as measured by the objectives of good compression effectiveness while still having very low CPU usage (the best of those written and GPL'd, there is a slightly better one which is proprietary and uses more CPU, LZRW if I remember right. The gzip code base uses too much CPU, though I think Edward made an option of it). Could you be kind enough to send me a plugin which is better at those two measures, I'd be quite grateful? By the way, could you tell me about this auditing stuff? Last I remember, when I mentioned that the US Defense community had coding practices worth adopting by the Kernel Community, I was pretty much disregarded. So, while I understand that the FSB has serious security issues what with all these Americans seeking to crack their Linux boxen, complaining to me about auditability seems a bit graceless.;-) Especially if there is no offer of replacement compression code. Oh, and this LZO code is not written by Namesys. You can tell by the utter lack of comments, assertions, etc. We are just seeking to reuse well known widely used code. I have in the past been capable of demanding that my programmers comment code not written by them before we use it, but this time I did not. I have mixed feeling about us adding our comments to code written by a compression specialist. If Andrew wants us to write our own compression code, or comment this code and fill it with asserts, we will grumble a bit and do it. It is not a task I am eager for, as compression code is a highly competitive field which gives me the surface impression that if you are not gripped by what you are sure is an inspiration you should stay out of it. 
Jorn wrote: I've had an identical argument with Linus about lib/zlib_*. He decided that he didn't care about diverging, I went ahead and changed the code. In the process, I merged a couple of outstanding bugfixes and reduced memory consumption by 25%. Looks like Linus was right on that one. Anyone sends myself or Edward a patch, that's great. Jorn, sounds like you did a good job on that one. Hans
Re: Reiser4 und LZO compression
Nigel Cunningham wrote:
For Suspend2, we ended up converting the LZF support to a cryptoapi plugin. Is there any chance that you could use cryptoapi modules? We could then have a hope of sharing the support.
It is in principle a good idea, and I hope we will be able to say yes. However, I have to see the numbers, as we are more performance sensitive than you folks probably are, and every 10% is a big deal for us.
Re: Need help retrieving data
I am very sorry to inform you that our fsck guy is on vacation, and we have no effective backup for him.
Hans
Brian Davis wrote:
Hello, I've paid the 25 dollars, but I haven't gotten a response yet so I'm trying this list.
I have a disk which has a single partition with reiserfs version 3 on it. One day I put the disk on a 3Ware 7506-4LP RAID controller card and tried to mount the partition through the normal mechanisms (mount -t xfs /dev/..). It was not part of a RAID array, it was just set up as a single disk on the controller. When I did this, mount hung for what seemed to be a long time and then seg faulted. Since then I have not been able to mount the partition on either the RAID controller or a normal IDE controller. I get the following error when trying to mount:
localhost ~ # mount -t reiserfs /dev/hde1 /stuff
mount: wrong fs type, bad option, bad superblock on /dev/hde1,
missing codepage or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Checking /var/log/messages reveals the following:
ReiserFS: hde1: found reiserfs format 3.6 with standard journal
ReiserFS: hde1: using ordered data mode
ReiserFS: hde1: journal params: device hde1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: hde1: checking transaction log (hde1)
ReiserFS: hde1: warning: vs-7000: search_by_entry_key: search_by_key returned item position == 0
I then run reiserfsck, which doesn't find any errors:
localhost ~ # reiserfsck /dev/hde1
reiserfsck 3.6.19 (2003 www.namesys.com)
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to reiserfs-list@namesys.com, **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
Will read-only check consistency of the filesystem on /dev/hde1
Will put log info to 'stdout'
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
### reiserfsck --check started at Tue Aug 22 19:21:46 2006 ###
Replaying journal..
Reiserfs journal '/dev/hde1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
Leaves 40825
Internal nodes 270
Directories 2351
Other files 34336
Data block pointers 34474385 (536372 of them are zero)
Safe links 0
### reiserfsck finished at Tue Aug 22 19:26:56 2006 ###
Trying to mount again after running reiserfsck still results in the same error as above. I'm at a loss for the next steps to take, I'm hoping you can help me get my data back from this drive/partition.
Thanks, Brian
Looking for Kernel hacking or storage systems design and code consulting work
Namesys wrote the Reiser4 filesystem. We are looking for persons who need any kind of kernel hacking or storage system code done. We have a complete team of kernel programmers available. Do you have race conditions you need found, spaghetti code from a predecessor that nobody can do anything with, a driver you want made to work, or a storage related feature you need? We can do it. Our guys are quite senior, and we have been doing Linux filesystem programming since 1993. Would you like an outsider to review a design, or someone for a technical advisory board? We can do it. Funds earned go to support the completion of the reiser4 filesystem, described at www.namesys.com.
Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching
Jeff Mahoney wrote: Also, I think the bigalloc behavior just ultimately ends up introducing even more fragmentation on an already fragmented file system. It'll keep contiguous chunks together, but those chunks can end up being spread all over the disk. -Jeff Yes, and almost as important, it makes it difficult to understand and predict the allocator, which means other optimizations become harder to do.
Re: Reiser4 stress test.
Thanks Andrew, please be patient and persistent with us at this time, as one programmer is on vacation, and the other is only able to work a few hours a day due to an illness. Hans
Meeting with akpm today
He indicated that he would threaten to include reiser4 in 2.6.19 once we got our fixes back to him (as a way of getting more comments), and that it would most likely go in in 2.6.20 as a result. I told him that most of the team was on vacation or sick, so we would probably be delayed in getting the fixes back to him. If by chance more of the work is done than I fear, it would be nice to get it to him soon...
We discussed dentry cache, and the fragmentation problems it has (where one dentry being active keeps a whole page around). He suggested that by using the reclaim code and ignoring the cases where dentries are referenced, they could be repacked without too much coding. I think we both think repacking is the right answer, as dcache is very inefficient in its pruning without it.
We discussed the library vs. control thread owning issues for the generic code. I think he agrees in principle, but I sense he would want to see a good generic code library approach implemented before he would accept it, which is rather reasonable. He prefers fixing generic code to needlessly duplicating it.
He said that inodes and dentries don't saw tooth in their usage. I had remembered the code as letting them grow to an ever larger percentage of RAM until there is a crisis shortage of memory, at which point the shrink_* code for them finally gets invoked. He assured me I was incorrect; I guess I need to go reread that code.
All in all, he is a very perceptive and reasonable fellow who knows what he is doing.
hans
Re: the 'official' point of view expressed by kernelnewbies.org
Ric Wheeler wrote: Hans Reiser wrote: I am skeptical that bitflip errors above the storage layer are as common as the ZFS authors say, and their statistics that I have seen somehow lack a lot of detail about how they were gathered. If, say, a device with 100 errors counts as 100 instances for their statistics. Well, it would be nice to know how they were gathered. Next time I meet them I must ask. I think that most big vendors have a lot of information about failure rates on drives, but cannot actually share the details in public (due to NDA's with the suppliers). One thing that we are trying to do is to get some of the more community oriented people at Seagate Research to come out and talk to the people about what are reasonable types of errors to code against. Current idea is to get everyone in the same place a couple of days before the next FAST conference (i.e., linux IO people or file system people and these vendors). (See the USENIX page for details on FAST at http://www.usenix.org/events/fast07/cfp/). I will say that media errors tend to be larger than single bit errors, i.e. you will lose a set of sectors instead of seeing a single bit flip on one sector (remember that the drive vendors do extensive ECC at their level). What their ECC will not fix is something like junk settling on the platter or a really bad error like a bad disk head. I think that integration of fs, fsck, and raid is the right solution for media errors. What I haven't seen data I trust on is what is bitflip error rate for the non-media sources. Since I haven't seen data I believe (where belief requires details being supplied), my inclination is to say plugins that users can choose to use if they want them are the right solution. I think that ECC would be overkill, I view it as an option that we make available to enterprise customers who want to feel good. 
It is not for me to tell them that they are wrong, for I lack the data, it is merely for me to supply it as a non-default option, and let the users tell me how often it actually gets triggered when they use it.
Re: some testing questions
Ingo Bormuth wrote: #df: /dev/hda8 6357768 3478716 2879052 55% /cache Before doing so, the partition was 90% full. The performance difference between 90% full and 55% full will be large on every filesystem. When we ship a repacker, that will be less true, because we will have large chunks of unused space after the repacker runs. Oddly enough, I don't know the statistics for reiser* filesystems, but I know that for FFS you should not let it become more than 85% full before buying a new disk (or cleaning your home directory) if you want good performance.
Re: the 'official' point of view expressed by kernelnewbies.org
Tom Reinhart wrote:
Anyone with serious need for data integrity already uses RAID, so why add brand new complexity for a solved problem?
RAID is great at recovering data, but not detecting errors. File system can detect errors with checksum. What is missing is an API between layers for filesystem to say this sector is bad, go rebuild it.
I agree that such an API is needed. I think there are a lot of systems on desktops that lack RAID though. Probably I should leave ECC for some future release, hopefully next year, though.
This seems like a much more simple and useful thing than adding ECC into the filesystem itself.
How about we switch to ecc, which would help with bit rot not sector loss?
Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased.
Re: some testing questions
David Masover wrote: that's 3-7% fragmentation. which is high enough to hurt performance. 50mb/s * 0.01 seconds = amount of transfer a seek costs. He needs a repacker. After we resolve code review issues from akpm.
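Worked out, the seek-cost estimate above (taking the quoted 50 MB/s transfer rate and 0.01 s per seek at face value) is:

```latex
\[
50\ \mathrm{MB/s} \times 0.01\ \mathrm{s} = 0.5\ \mathrm{MB}
\]
```

So under those figures each extra seek costs about as much time as transferring half a megabyte of contiguous data.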
Re: the 'official' point of view expressed by kernelnewbies.org
Edward Shishkin wrote:
Tom Reinhart wrote:
Anyone with serious need for data integrity already uses RAID, so why add brand new complexity for a solved problem?
RAID is great at recovering data, but not detecting errors. File system can detect errors with checksum. What is missing is an API between layers for filesystem to say this sector is bad, go rebuild it.
Actually we don't need a special API: kernel should warn and recommend running fsck, which scans the whole tree and handles blocks with bad checksums.
Yes, but our fsck knows nothing about RAID currently so
This seems like a much more simple and useful thing than adding ECC into the filesystem itself.
checksumming is _not_ much easier than ecc-ing from an implementation standpoint, however it would be nice if some part of the errors got fixed without massive surgery performed by fsck
How about we switch to ecc, which would help with bit rot not sector loss?
Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased.
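The "detect, don't repair" role of a checksum discussed above can be sketched as follows. This is a generic illustration, not reiser4's on-disk format or plugin interface: the function names are hypothetical, and Adler-32 is used here only because it is short and well known.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 over a block of data (chosen for brevity; an assumption,
 * not necessarily the checksum any reiser4 plugin would use). */
uint32_t block_checksum(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Returns 1 if the block still matches the checksum stored with it,
 * 0 if it has been corrupted. Detection only: recovering the data
 * needs redundancy elsewhere (RAID, ECC, or a backup). */
int block_is_intact(const unsigned char *data, size_t len, uint32_t stored)
{
    return block_checksum(data, len) == stored;
}
```

A checksum like this tells the filesystem that a block is bad, which is exactly the point where the thread's question arises: who is told, and who rebuilds the block.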
Re: the 'official' point of view expressed by kernelnewbies.org
I am skeptical that bitflip errors above the storage layer are as common as the ZFS authors say, and their statistics that I have seen somehow lack a lot of detail about how they were gathered. If, say, a device with 100 errors counts as 100 instances for their statistics. Well, it would be nice to know how they were gathered. Next time I meet them I must ask. That said, if users want it, there should be a plugin that checks the bits. I agree that stripe awareness and the need to signal the underlying raid that a block needs to be recovered is important. Checksumming at the fs level seems like a reasonable plugin. I have no opinion on the computational cost of ECC vs. checksums, I will trust that you are correct.
Re: Another article about Reiser4 on linux.com
Bernd Schubert wrote: An alternative might be a reiser4 fuse port. Fuse is not performance effective.
Re: article about Reiser4 on linux.com
Bruce, I read your article on Linus and GPL V3, and I understand that you are frustrated by his not going for V3. I suspect the main thing that sparked your concern in writing the article about reiser4 is that I am somehow doing something different that affects licensing, and not conforming on that issue. Plugins have nothing to do with licensing issues --- that was purely Linus getting confused. Plugins are compiled in. Bruce Byfield wrote: I believe that you make a reference to two people working on XFS, one of whom other people later identify as Steve Lord. They wrongly identify him. Unfortunately, it's not one of which editors approve. It too easily looks as though the writer is being influenced by the source. If I were to do so, I'd risk being banned from publication. Wow. I thought only the judiciary insulated itself from ever learning of its mistakes that well.:-/
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Pavel Machek wrote: Yes, I'm afraid redundancy/checksums kill write speed. They kill write speed to cache, but not to disk; our compression plugin is faster than the uncompressed plugin.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Edward Shishkin wrote: Hans Reiser wrote: Edward Shishkin wrote: How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. Would you prefer to do it as a node layout plugin instead, so as to get the metadata? Yes, it looks like a job for a node plugin, but AFAIK, you objected to such checks: Did I really? Well, I think that allowing users to choose whether to checksum or not is a reasonable thing to allow them. I personally would skip the checksum on my computer, but others It could be a useful mkfs option. Currently only bitmap nodes have protection (checksum); supporting ecc-signatures is more space/cpu expensive. Edward.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Pavel Machek wrote: On Wed 2006-08-09 02:37:45, Hans Reiser wrote: Pavel Machek wrote: Yes, I'm afraid redundancy/checksums kill write speed, they kill write speed to cache, but not to disk our compression plugin is faster than the uncompressed plugin. Yes, you can get clever. But your compression plugin also means that single bit error means whole block is lost, so there _is_ speed vs. stability-against-hw-problems. But you are right that compression will catch same class of errors checksums will, so that it is probably good thing w.r.t. stability. Pavel So we need to use ecc not checksums if we want to increase reliability. Edward, can you comment in more detail regarding your views and the performance issues for ecc that you see?
Re: article about Reiser4 on linux.com
Bruce Byfield wrote: Wow. I thought only the judiciary insulated itself from ever learning of its mistakes that well.:-/ Oh, we have ways of learning. As witness this email exchange :) You are right, it is only the judiciary.;-)
Re: Another article about Reiser4 on linux.com
TongKe Xue wrote: A really stupid question ... why not put Reiser4 in one of the BSDs? The cost to port to BSD is about $500k, and I am not possessed of a lot of money at this time. There is also a license issue, I don't want reiser4 to be BSD licensed, people who want proprietary additions to reiser4 should pay me for reiser4.
article about Reiser4 on linux.com
Bruce, regarding a longstanding convention of avoiding plugins in the kernel, considering that we are the first and only ones ever to have plugins, and considering the existence of binary kernel modules, I don't think your characterization is accurate. Perhaps there was some licensing controversy on lkml I am unaware of? Since plugins are (always) compiled in, unlike kernel modules, I don't see how there is a licensing issue. I think your characterization of plugins as something we impose on the VFS is unfair. Plugins exist entirely internally to reiser4 --- we impose nothing on anyone. I only wish to impose my views on Reiser4, not on VFS, because I really don't care for haggling with others about the code they wrote not having the features I want. I don't think you should characterize me as saying coding style conformity is the issue: that is how my opponents portray it, as some sort of 80 characters vs. 120 characters per line disagreement in which I refuse to go along. I think social connections prevailing over technical merits is the issue, and perhaps also whether VFS should intrude deeply into the independence of filesystem design based on the assumption that the VFS authors are more likely to get it right, or whether it should let each filesystem do what it wants to do based on the hope that if enough people try their own thing in a decentralized system, odds are that one of them will get it more right than the others and users will be able to choose that FS. I never made a reference to Steve Lord or Jim Lord, so suggesting that I got it wrong about him is inaccurate. Your characterization of the issue of whether Linux should double its filesystem performance as a philosophical point seems odd to me --- I think my writing failed to reach you as an audience on that point. Reiser4 was first proposed for inclusion in October 2002, and a casual reader of your article would think it was proposed in 2005. 
Many journalists will run my words past me to see if it fairly portrays my views --- perhaps you might choose to do that with those you write about in the future. In my experience it is a good journalistic practice. On a positive note, it seems like Andrew Morton will singlehandedly turn the Reiser4 review process into a technical not a political process, and I am much encouraged by that. He has made numerous useful technical remarks in his last email, and we are addressing them. Thanks for your time, Hans Tassilo Horn wrote: Hi, there's another Reiser4 article on linux.com [1], mainly about politics. Neither much of the technical discussion is covered, nor are the comments too positive, but it may be interesting anyway. One thing which bothers me most: Whenever such an article appears it's commented mostly by kiddies speaking after the mouth of several kernel developers, sophisticating most of their comments, ripping them out of context or spreading superficial knowledge. Of course those comments let it get to possible new users. Kind regards and keep up the great work, Tassilo (pleased Reiser4 user since 2.6.0_testX) Footnotes: [1] http://www.linux.com/article.pl?sid=06/07/31/1548201
Re: Article on LWN about recent discussions on reiser4 and inclusion
Jorgen Hermanrud Fjeld wrote: The recent discussions regarding reiser4 and possible inclusion have also caught the eye(s) of LWN. I have made the article available for you, non-lwn-subscribers, so that you may have a look at it here http://lwn.net/SubscriberLink/193663/9d2ac03195c775bc/. Jorgen, are you with lwn? Thanks Jorgen. It was a remarkably positive article, and the posters were also quite positive. Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Edward Shishkin wrote: How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. Would you prefer to do it as a node layout plugin instead, so as to get the metadata? Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Antonio Vargas wrote: On 8/4/06, Edward Shishkin [EMAIL PROTECTED] wrote: Hans Reiser wrote: Edward Shishkin wrote: Matthias Andree wrote: On Tue, 01 Aug 2006, Hans Reiser wrote: You will want to try our compression plugin, it has an ecc for every 64k What kind of forward error correction would that be, Actually we use checksums, not ECC. If checksum is wrong, then run fsck - it will remove the whole disk cluster, that represents 64K of data. How about we switch to ecc, which would help with bit rot not sector loss? Interesting aspect. Yes, we can implement ECC as a special crypto transform that inflates data. As I mentioned earlier, it is possible via translation of key offsets with scale factor 1. Of course, it is better than nothing, but anyway meta-data remains ecc-unprotected, and, hence, robustness is not increased. Edward. and how much and what failure patterns can it correct? URL suffices. Checksum is checked before unsafe decompression (when trying to decompress incorrect data can lead to fatal things). It can be broken because of many reasons. The main one is tree corruption (for example, when disk cluster became incomplete - ECC cannot help here). Perhaps such checksumming is also useful for other things, I didn't classify the patterns. Edward. Would the storage + plugin subsystem support storing 1 copies of the metadata tree? I suppose What would be nice would be to have a plugin that when a node fails its checksum/ecc it knows to get it from another mirror, and which generally handles faults with a graceful understanding of its ability to get copies from a mirror (or RAID parity calculation). I would happily accept such a patch (subject to usual reservation of right to complain about implementation details).
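[Editor's sketch] The plugin Hans describes above (a node fails its checksum, so fetch it from another mirror) could look roughly like the following. This is only an illustration under stated assumptions: `read_node`, the `mirrors` callables and `expected_crc` are hypothetical names, and CRC32 stands in for whatever checksum a real node layout plugin would use. Nothing here is reiser4 code.

```python
import zlib
from typing import Callable, Sequence


def read_node(block_no: int,
              mirrors: Sequence[Callable[[int], bytes]],
              expected_crc: int) -> bytes:
    """Return the first copy of a node whose CRC32 matches expected_crc.

    mirrors is a list of hypothetical per-device read functions;
    expected_crc is assumed to be stored somewhere outside the node itself.
    """
    for read_block in mirrors:
        data = read_block(block_no)
        if zlib.crc32(data) == expected_crc:
            # Good copy found; a real plugin might also rewrite the bad mirrors.
            return data
    raise OSError(f"node {block_no}: no mirror holds a copy with a valid checksum")
```

The caller never learns which mirror was bad; it only sees either a verified node or an error, which is the "graceful understanding" of faults asked for above.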
Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)
Marcel Hilzinger wrote: Am Donnerstag, 3. August 2006 10:55 schrieb Marcel Hilzinger: Am Dienstag, 1. August 2006 23:59 schrieb Sander Sweers: On Tue, 2006-08-01 at 23:12 +0200, Maciej Sołtysiak wrote: [...] Are there any on the list who know of rpm's for Suse/Redhat/Mandrake that include reiser4? One more idea: The next release of Ubuntu is a playground release. Hans, perhaps you should have a meeting with Mark Shuttleworth. Reiser4 inclusion in Linspire was a big step, but Linspire has not really a community. If you get Reiser4 included in the next Ubuntu release (and can make it rock stable then!), you do not have to bother about Fedora or Suse... It's quite late for inclusion in the next Ubuntu release, but who knows. Could you contact him for us, and ask? It is more convincing when users ask. Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Edward Shishkin wrote: Matthias Andree wrote: On Tue, 01 Aug 2006, Hans Reiser wrote: You will want to try our compression plugin, it has an ecc for every 64k What kind of forward error correction would that be, Actually we use checksums, not ECC. If checksum is wrong, then run fsck - it will remove the whole disk cluster, that represents 64K of data. How about we switch to ecc, which would help with bit rot not sector loss? and how much and what failure patterns can it correct? URL suffices. Checksum is checked before unsafe decompression (when trying to decompress incorrect data can lead to fatal things). It can be broken because of many reasons. The main one is tree corruption (for example, when disk cluster became incomplete - ECC cannot help here). Perhaps such checksumming is also useful for other things, I didn't classify the patterns. Edward.
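[Editor's sketch] Edward's point that the checksum is verified before the unsafe decompression step can be illustrated with a few lines of Python. The layout below (a 4-byte CRC32 in front of a zlib payload) and the names `pack_cluster` and `unpack_cluster` are assumptions made for the example; the thread does not describe the actual on-disk format of the compression plugin.

```python
import zlib

CLUSTER_SIZE = 64 * 1024  # 64K of data per disk cluster, as stated in the thread


def pack_cluster(data: bytes) -> bytes:
    """Compress one cluster and prepend a CRC32 of the compressed payload."""
    if len(data) > CLUSTER_SIZE:
        raise ValueError("cluster larger than 64K")
    payload = zlib.compress(data)
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack_cluster(blob: bytes) -> bytes:
    """Verify the checksum first; decompress only payloads that pass."""
    stored = int.from_bytes(blob[:4], "big")
    payload = blob[4:]
    if zlib.crc32(payload) != stored:
        # Detection only: a checksum cannot repair the data, so the whole
        # cluster is reported bad (the thread leaves the cleanup to fsck).
        raise OSError("checksum mismatch: cluster must be handled by fsck")
    return zlib.decompress(payload)
```

This also shows the limitation discussed above: a single flipped bit makes the whole 64K cluster unusable, which is why the follow-ups ask about ECC.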
Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)
Sander Sweers wrote: With the approval of Namesys I would like to add a new entry to the wiki frontpage. It would be much appreciated.
Re: reiser4: maybe just fix bugs?
Andrew Morton wrote: On Mon, 31 Jul 2006 10:26:55 +0100 Denis Vlasenko [EMAIL PROTECTED] wrote: The reiser4 thread seems to be longer than usual. Meanwhile here's poor old me trying to find another four hours to finish reviewing the thing. Thanks Andrew. The writeout code is ugly, although that's largely due to a mismatch between what reiser4 wants to do and what the VFS/MM expects it to do. I agree --- both with it being ugly, and that being part of why. If it works, we can live with it, although perhaps the VFS could be made smarter. I would be curious regarding any ideas on that. Next time I read through that code, I will keep in mind that you are open to making VFS changes if it improves things, and I will try to get clever somehow and send it by you. Our squalloc code, though, is, I must say, the most complicated and ugliest piece of code I ever worked on for which every cumulative ugliness had a substantive performance advantage requiring us to keep it. If you spare yourself from reading that, it is understandable to do so. I'd say that reiser4's major problem is the lack of xattrs, acls and direct-io. That's likely to significantly limit its vendor uptake. (As might the copyright assignment thing, but is that a kernel.org concern?) Thanks to you and the batch write code, direct io support will now be much easier to code, and it probably will get coded the soonest of those features. acls are on the todo list, but doing them right might require solving a few additional issues (finishing the inheritance code, etc.) The plugins appear to be wildly misnamed - they're just an internal abstraction layer which permits later feature additions to be added in a clean and safe manner. Certainly not worth all this fuss. Could I suggest that further technical critiques of reiser4 include a file-and-line reference? That should ease the load on vger. Thanks.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matthias Andree wrote: Have you ever seen VxFS or WAFL in action? No I haven't. As long as they are commercial, it's not likely that I will. WAFL was well done. It has several innovations that I admire, including quota trees, non-support of fragments for performance reasons, and the basic WAFL notion applied to an NFS RAID special (though important) case.
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Theodore Tso wrote: On Mon, Jul 31, 2006 at 09:41:02PM -0700, David Lang wrote: just because you have redundancy doesn't mean that your data is idle enough for you to run a repacker with your spare cycles. to run a repacker you need a time when the chunk of the filesystem that you are repacking is not being accessed or written to. it doesn't matter if that data lives on one disk or 9 disks all mirroring the same data, you can't just break off 1 of the copies and repack that because by the time you finish it won't match the live drives anymore. database servers have a repacker (vacuum), and they are under tremendous pressure from their users to avoid having to use it because of the performance hit that it generates. (the theory in the past is exactly what was presented in this thread, make things run faster most of the time and accept the performance hit when you repack). the trend seems to be for a repacker thread that runs continuously, causing a small impact all the time (that can be calculated into the capacity planning) instead of a large impact once in a while. Ah, but as soon as the repacker thread runs continuously, then you lose all or most of the claimed advantage of wandering logs. Wandering logs is a term specific to reiser4, and I think you are making a more general remark. You are missing the implications of the oft-cited statistic that 80% of files never or rarely move. You are also missing the implications of the repacker being able to do larger IOs than occur for a random tiny IO workload which is impacting a filesystem that is performing allocations on the fly. Specifically, the claim of the wandering log is that you don't have to write your data twice --- once to the log, and once to the final location on disk (whereas with ext3 you end up having to do double writes).
But if the repacker is running continuously, you end up doing double writes anyway, as the repacker moves things from a location that is convenient for the log, to a location which is efficient for reading. Worse yet, if the repacker is moving disk blocks or objects which are no longer in cache, it may end up having to read objects in before writing them to a final location on disk. So instead of a write-write overhead, you end up with a write-read-write overhead. But of course, people tend to disable the repacker when doing benchmarks because they're trying to play the my filesystem/database has bigger performance numbers than yours game When the repacker is done, we will just for you run one of our benchmarks the morning after the repacker is run (and reference this email);-) that was what you wanted us to do to address your concern, yes?;-) - Ted
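[Editor's sketch] The two positions in this exchange can be put side by side with a toy cost model. Everything below is an assumption made for illustration: the function name, the idea of counting device I/Os per data block, and the reading of Hans's "80% of files never or rarely move" as "20% of blocks get repacked". It ignores I/O size, which is Hans's second argument (the repacker can issue larger IOs), so it is not a benchmark of anything.

```python
def device_ios_per_block(scheme: str,
                         moved_fraction: float = 0.0,
                         uncached_fraction: float = 0.0) -> float:
    """Average device I/Os needed to get one data block to its final location.

    scheme            -- "journal" (write to log, then in place) or "wandering"
    moved_fraction    -- share of blocks the repacker later relocates
    uncached_fraction -- share of relocated blocks that must be read back first
    """
    if scheme == "journal":
        return 2.0  # double write: once to the log, once to the final location
    if scheme == "wandering":
        # One write always; relocated blocks cost one more write, plus a read
        # when they are no longer in cache (Ted's write-read-write case).
        return 1.0 + moved_fraction * (1.0 + uncached_fraction)
    raise ValueError(f"unknown scheme: {scheme}")
```

Under these assumptions, Ted's worst case (`moved_fraction=1.0, uncached_fraction=1.0`) costs three I/Os per block against two for the double write, while the 80% reading (`moved_fraction=0.2`) stays below the double-write cost even when every relocated block has to be read back in.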
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Alan, I have seen only anecdotal evidence against reiserfsck, and I have seen formal tests from Vitaly (which it seems a user has replicated) where our fsck did better than ext3's. Note that these tests are of the latest fsck from us: I am sure everyone understands that it takes time for an fsck to mature, and that our early fsck's were poor. I will also say that V4's fsck is more robust than V3's because we made disk format changes specifically to help fsck. Now I am not dismissing your anecdotes as I will never dismiss data I have not seen, and it sounds like you have seen more data than most people, but I must dismiss your explanation of them. Being able to throw away all of the tree but the leaves and twigs with extent pointers and rebuild all of it makes V4 very robust, more so than ext3. This business of inodes not moving, I don't see what the advantage is, we can lose the directory entry and rebuild just as well as ext3, probably better because we can at least figure out what directory it was in. Vitaly can say all of this more expertly than I. Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Ric Wheeler wrote: Alan Cox wrote: You do it turns out. It's becoming an issue more and more that the sheer amount of storage means that the undetected error rate from disks, hosts, memory, cables and everything else is rising. I agree with Alan. You will want to try our compression plugin, it has an ecc for every 64k. Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Gregory Maxwell wrote: This is why ZFS offers block checksums... it can then try all the permutations of raid regens to find a solution which gives the right checksum. ZFS performance is pretty bad in the only benchmark I have seen of it. Does anyone have serious benchmarks of it? I suspect that our compression plugin (with ecc) will outperform it.
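[Editor's sketch] Gregory's description (try the possible RAID regenerations until one yields the right checksum) can be sketched for the simplest case: single XOR parity and at most one silently corrupted data block. This is not ZFS code; the function names, the use of CRC32, and the single checksum over the whole stripe are assumptions made for the example.

```python
import zlib
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (single-parity RAID arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_stripe(blocks: List[bytes], parity: bytes,
                   stripe_crc: int) -> Optional[List[bytes]]:
    """Return a version of the stripe whose checksum matches, or None.

    The checksum covers the whole stripe, so it does not say which block is
    bad. Each block in turn is assumed bad and regenerated from the others
    plus parity; the first candidate with a matching checksum wins.
    """
    if zlib.crc32(b"".join(blocks)) == stripe_crc:
        return list(blocks)  # nothing to repair
    for i in range(len(blocks)):
        others = blocks[:i] + blocks[i + 1:]
        regenerated = xor_blocks(others + [parity])
        trial = blocks[:i] + [regenerated] + blocks[i + 1:]
        if zlib.crc32(b"".join(trial)) == stripe_crc:
            return trial
    return None  # more damage than single parity can undo
```

Plain RAID without the checksum would have no way to tell which of the candidate regenerations is the right one, which is the point of the quoted remark.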
Re: reiser4 can now bear with filled fs, looks stable to me...
I think that most of our problem is that we are too socially insulated from lkml. They are a herd, and decide things based on what thoughts echo most loudly. That none of the shy developers working for me actively post on lkml hurts us quite a bit. It might even be socially effective to shut down reiserfs-list until inclusion occurs. Hans
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
David Masover wrote: Nikita Danilov wrote: As you see, ext2 code already has multiple file plugins, with persistent plugin id (stored in i_mode field of on-disk struct ext2_inode). Aha! So here's another question: Is it fair to ask Reiser4 to make its plugins generic, or should we be asking ext2/3 first? Sh., ext* already made their plugins generic, job is done:)
Re: possible recursive locking detected - while running fs operations in loops - 2.6.18-rc2-git5
Jesper Juhl wrote: Thanks. That's a nice little test suite. Yes, it is quite useful, our developers have added it to the regression suite Hans
Re: possible recursive locking detected - while running fs operations in loops - 2.6.18-rc2-git5
Jesper Juhl wrote: On 30/07/06, Hans Reiser [EMAIL PROTECTED] wrote: Jesper Juhl wrote: Thanks. That's a nice little test suite. Yes, it is quite useful, our developers have added it to the regression suite That's nice. Now how about that lock validator message I managed to tease out? Akpm said ... the reiserfs locking appears to be unneeded - this inode is going down and nobody else can look it up, so what is to be locked against? - can you comment on that? Err, how about Zam handles all locking issues and this is Sunday with the family? I know, lame, but he'll answer you on Monday Russian time.;-)
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Maciej Sołtysiak wrote: Hmm, what about linspire / freespire ? Linspire is a proud reiser4 debugging sponsor as the website (http://www.namesys.com) says. Wouldn't they want to include reiser4 in their distro first? Not if the mainstream kernel is not going to add it. Hans
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
David Masover wrote: If indeed it can be changed easily at all. I think the burden is on you to prove that you can change it to be more generic, rather than saying Well, we could do it later, if people want us to... None of the filesystems other than reiser4 have any interest in using plugins, and this whole argument over how it should be in VFS is nonsensical because nobody but us has any interest in using the functionality. The burden is on the generic code authors to prove that they will ever ever do anything at all besides complain. Frankly, I don't think they will. I think they will never produce one line of code. Please cite one ext3 developer who is signed up to implement ext3 using plugins if they are supported by VFS. It also prevents users from getting advances they could be getting today, for no reason. It prevents users from doing nothing. Most users not only cannot patch a kernel, they don't know what a patch is. It most certainly does. Hans
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Jeff Garzik wrote: Using guilt as an argument in a technical discussion is a flashing red sign that says I have no technical rebuttal Wow, that is really nervy. Let's recap this all:
* reiser4 has a 2x performance advantage over the next fastest FS (ext3), and when compression ships in a month that will double again as well as save space. See http://www.namesys.com/benchmarks.html, and then ask the reiserfs-list@namesys.com whether those benchmarks are fair representations of their experiences. This is in a field where a 25% advantage is a hard won big deal.
* we described our plugin architecture in 2001. No other FS developers were interested, only the users were, and it was presented quite a lot.
* So we implemented plugins for ourselves, because no other FS developers would possibly have supported us touching their code. (I do not say that they erred in this.)
* No one has actually made a serious case for it being genericizable when you get to the details, it is all just handwaving. I'd be surprised if 10% of it was FS independent, and unsurprised if making that 10% FS independent made the code ossified and hard to maintain. I do not in any way claim that those who choose to implement Reiser4 plugins are not deeply affected by Reiser4 design choices. Most of the value of writing Reiser4 plugins comes from being able to reuse Reiser4 code as you choose to in the process, and if Reiser4 is not to your taste as a whole, then nobody should impose our plugins upon you. VFS is a bad enough straitjacket for FS developers, we don't need even more mandated design decisions for the FS developers to come who will be brighter than us. Actually, I would like to see Nate Diller implement a competing VFS layer, I think he would do a very good job of that.
* Here we are today, and Reiser4 plugins work. Now some say that because we did it for Reiser4 and not for every other FS, that we should be excluded from the kernel.
So we are supposed to re-implement it as generic code, which will involve years of time, and then finally something will be coded and nobody but us will use it, and then they will tell us that because nobody but us wants to use it it cannot go in. If you disagree, find one ext3 developer who wants to rewrite ext3 to use plugins and change its disk format to do it. And you have the nerve to say that this ever was a technical discussion? Our code measurably works the best. If folks want to imitate it, go ahead, but don't blame us for making our code work without first making those other folks's code work. The technical rebuttal you ask for is http://www.namesys.com/benchmarks.html. The only time this argument gets technical is when akpm is involved. He was right about what should technically be done about batch write, which, by the way, was greeted upon completion with an if only reiser4 uses it then it should not go in response. We are being penalized for thinking too differently, and this whole ping-pong between no we don't want to do it your way and you did it your way for only you, redo it for us even though we won't ever use it and oh, you redid it for us but none of us want to use it, so no it is an imposition and cannot go in is the Kafka-esque manifestation of that. If only reiser4 wants to use something, then just let us do it in our little corner without bothering anybody else. (Though any advice from akpm that he has time for giving us is always welcome.) David, we aren't asking to be in the band, we are asking to be in the jukebox. I think enough users want to go 2x as fast that the users would benefit from our being in the jukebox. Hans
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Mike and Lukasz, please post your email to not just reiserfs-list, where only the reiserfs team will read it, but also to lkml if you could, please? Thanks for your support, user opinions count for a lot on lkml.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Nikita Danilov wrote: Hans Reiser writes: David Masover wrote: If indeed it can be changed easily at all. I think the burden is on you to prove that you can change it to be more generic, rather than saying Well, we could do it later, if people want us to... None of the filesystems other than reiser4 have any interest in using plugins, and this whole argument over how it should be in VFS is nonsensical because nobody but us has any interest in using the functionality. The burden is on the generic code authors to prove that they will ever ever do anything at all besides complain. Frankly, I don't think they will. I think they will never produce one line of code. Please cite one ext3 developer who is signed up to implement ext3 using plugins if they are supported by VFS. In fact, they all do: struct inode_operations ext2_file_inode_operations; struct inode_operations ext2_dir_inode_operations; struct inode_operations ext2_special_inode_operations; struct inode_operations ext2_symlink_inode_operations; struct inode_operations ext2_fast_symlink_inode_operations; As you see, ext2 code already has multiple file plugins, with persistent plugin id (stored in i_mode field of on-disk struct ext2_inode). Hans Nikita. So the job is already done. Good. Reiser4 can be included then.:) Hans The Easily Agreeable Reiser
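[Editor's sketch] Nikita's observation is that ext2 already picks one of several operation tables per inode, keyed by the file type bits persisted in `i_mode`. The dispatch idea is shown below in Python for brevity. The table names are the ones quoted in the message; `OPS_BY_TYPE` and `select_ops` are hypothetical names for this illustration, and the strings merely stand in for the kernel's C structures. The fast-symlink table is left out because, as far as the message says, `i_mode` alone is what is being discussed, and the choice between the two symlink tables is made on other grounds.

```python
import stat

# Hypothetical stand-in: file type bits -> name of the operations table
# that Nikita lists for ext2.
OPS_BY_TYPE = {
    stat.S_IFREG: "ext2_file_inode_operations",
    stat.S_IFDIR: "ext2_dir_inode_operations",
    stat.S_IFLNK: "ext2_symlink_inode_operations",
}


def select_ops(i_mode: int) -> str:
    """Pick the operations table from the type bits of an on-disk i_mode."""
    file_type = stat.S_IFMT(i_mode)  # mask off the permission bits
    # Devices, FIFOs and sockets share the "special" table.
    return OPS_BY_TYPE.get(file_type, "ext2_special_inode_operations")
```

In this reading the type bits play the role of a persistent plugin id: they are stored on disk and decide which set of methods an object gets.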
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jeff Garzik wrote: Actually, there is reiser4 brokenness lurking in Hans' statement, too: Where! Someone tell me!;-) A filesystem WITH plugins must still handle the standard Linux compatibility stuff that other filesystems handle. Hmm, you mean we should first implement regular unix file plugins before implementing enhanced functionality ones? Are you aware that reiser4 plugins are per file, and thus if a user selects a plugin that is not the default, and which has user visible semantic differences, it means they said they want non-standard behavior? Plugins --do not-- mean that you can just change the filesystem format willy-nilly, with zero impact. Yes they do. Jeff
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matthias Andree wrote: I wonder what makes the hash overflow issue so complicated (other than differing business plans, that is) that upgrading in place isn't possible. Changes introduce instability, but namesys were proud of their regression testing - so how sustainable is their internal test suite? Never met a test suite the equal of a few million users. People have this image of Namesys as some large corporation that has large resources. We just barely are going to be able to ship reiser4, at the cost of a LOT of financial pain. We can't afford to go in two directions at once. We can add bugfixes to V3, but adding features, I have to tell you that we ain't got the staff for both that and shipping V4. Our whole corporation has a budget about what most corporations spend on two programmers. We have 5 developers, including me, and making little bits of money is a constant distraction from the main work. Hans
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Linus Torvalds wrote: In other words, if a filesystem wants to do something fancy, it needs to do so WITH THE VFS LAYER, not as some plugin architecture of its own. Where does VFS store the plugin ids that specify per file variations? /etc/fstab? Also, is (current) VFS the interface for specifying where the hash directory plugin goes (specifies what order directory entries are within the directory)? What about the node layout plugin? The disk format plugin? Etc.? Our approach is different, but it has reasons. We eliminated the layer of indirection that Hellwig objected to, in which VFS called the plugin which called its method. (Let us try to avoid arguments over whether if you extend VFS it is still called VFS or is called reiser4's plugin layer, agreed?) Regarding copyright, these plugins are compiled in. I have resisted dynamically loaded plugins for now, for reasons I will not go into here. You can either portray reiser4 as duplicating VFS, or you can portray it as taking it to the next level, in which files (objects with classes and methods) vary rather than solely filesystems. I would prefer the latter.;-) If you agree with taking it to the next level, then it is only to be expected that there are things that aren't well suited as they are, like parsing /etc/fstab when you have a trillion files. It is not very feasible to do it for all of the filesystems all at once given finite resources, it needs a prototype. We already have exactly the plugin interface we need, and it literally _is_ the VFS interfaces - you can plug in your own filesystems with register_filesystem(), which in turn indirectly allows you to plug in your per-file and per-directory operations for things like lookup etc. If that isn't enough, then the filesystem shouldn't make its own internal plug-in architecture that bypasses the VFS layer and exposes functionality that isn't necessarily sane. 
For example, reiser4 used to have (perhaps still does) these cool files that can be both directories and links, and I don't mind that at all, but I _do_ mind the fact that when Al Viro (long long ago) pointed out serious locking issues with them, those seemed to be totally brushed away. We disabled them, and we won't enable them until the bug is fixed. It is fixable, but not within this year's programmer resources to fix it. I thank him for pointing out the bug, and it is not trivial to fix it. I don't think I've ever had the cojones to argue with Al.. Linux needs all kinds of people, not just the kind that can audit locking and copy Plan 9 well (which was very valuable to do), but now that Linux is large in the market it also needs those who can take it where Plan 9 has not already been. Why should we remain technology trailers instead of moving into the role of leaders? We have finite resources. We can give you a working filesystem with roughly twice the IO performance of the next fastest you have that does not disturb other filesystems (4x once the compression plugin is fully debugged). It also fixes various V3 bugs without disturbing that code with deep fixes. We cannot take every advantage reiser4 has and port it to every other filesystem in the form of genericized code as a prerequisite for going in, we just don't have the finances. Without plugins our per file compression plugins and encryption plugins cannot work. We can however let other filesystems use our code, and cooperate as they extend it and genericize it for their needs. Imposing code on other development teams is not how one best leads in open source, one sets an example and sees if others copy it. That is what I propose to do with our plugins. If no one copies, then we have harmed no one. Reasonable? Hans
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Let me put it from my perspective and stop pretending to be unbiased, so others can see where I am coming from. No one was interested in our plugins. We put the design on a website, spoke at conferences, no one but users were interested. No one would have conceived of having plugins if not for us. Our plugins affect no one else. Our self-contained code should not be delayed because other people delayed getting interested in our ideas and now they don't want us to have an advantage from leading. If they want to some distant day implement generic plugins, for which they have written not one line of code to date, fine, we'll use it when it exists, but right now those who haven't coded should get out of the way of people with working code. It is not fair or just to do otherwise. It also prevents users from getting advances they could be getting today, for no reason. Our code will not be harder to change once it is in the kernel, it will be easier, because there will be more staff funded to work on it. As for this "we are all too grand to be bothered with money to feed our families" business, building a system in which those who contribute can find a way to be rewarded is what managers do. Free software programmers may be willing to live on less than others, but they cannot live on nothing, and code that does not ever ship means living on nothing. If reiser4 is delayed enough, for reasons that have nothing to do with its needs, and without it having encumbered anyone else, it won't be ahead of the other filesystems when it ships.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Hua Zhong wrote: I remember someone said something along the lines of Linux is evolution, not revolution. To me it seems unreasonable to put all the revolutionary VFS burden upon reiserfs team. It's not practical. Thanks for saying that Hua. We have a guy named Nate Diller, who probably could fix VFS up pretty nicely if Namesys did not have him earning the consulting income the rest of the team lives on (he is doing io scheduler work at the moment). That said, he would need a lot of time, and stopping reiser4 inclusion to await his work merely ensures his work will never happen.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
David Masover wrote: Although I should mention, Hans, that there is a really good reason to prefer the 15 minute patches. The patches that take a week are much harder to read during that week than any number of 15 minute incremental patches, because the incremental patches are already broken down into nice, small, readable, ordered chunks. And since development follows some sort of logical, orderly pattern, it can be much easier to read it that way than to try to consider the whole. No, I disagree, if the code is well commented, it is easier to read the whole thing at the end when it has its greatest coherence and refinement. A problem with Reiser4 is that its core algorithms are simply complex. We pushed the envelope in multiple areas all at once. Benchmarks don't always suggest simple algorithms are the ones that will be highest performance. Tree algorithms are notorious in the database industry for being simple on web pages but complex as code. Some people program in small increments, some program things that require big increments of change, both kinds of people are needed.
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
David Masover wrote: As a future MythTV user a bit late to this discussion, I'm curious -- was this Reiser3 or 4? Are there any known MythTV issues with v4? I say this because the box with my capture card is running on a Reiser4 root right now... I think you get to be the one to tell us
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
David Masover wrote: Hans Reiser wrote: to use as his default. Now that we paid the 5 year development price tag to get everything as plugins, we can now upgrade in littler pieces than any other FS. Hmm, I need a buzz phrase, its not extreme programming, maybe moderate programming. This phrase was a bit tongue-in-cheek. Does that sound exciting to
our 2.6.17 patch is not stable, please be warned
It crashed on me, and needed an fsck. At least our fsck works well though:-/, Vitaly, you did a great job of making the user interface informative. Hans
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matthias Andree wrote: The father declared his child unsupported, I never did that. and that's the end of the story for me. There's nothing wrong about focusing on newer code, but the old code needs to be cared for, too, to fix remaining issues such as the "can only have N files with the same hash value". Fixing it requires a disk format change, in a filesystem without plugins. (I am well aware this is exploiting worst-case behavior in a malicious sense but I simply cannot risk such nonsense on a 270 GB RAID5 if users have shared work directories.)
Re: portage tree (Was: Re: reiser4 status (correction))
Thanks Christian. You can go ahead and add something to our wiki pointing to it if you would like. This might help tide people over until the repacker ships. Hans
Re: future r4 maintenance question
Maciej Sołtysiak wrote: Hello Hans, Saturday, July 22, 2006, 8:03:28 PM, you wrote: We are going to give changing the paradigm a try. The difference between 4.1-beta and 4.0 is that different plugins are the default, and the experimental code is in the plugins you see when mounting with the mount option 4.1-beta. Let's see if it works in practice. I understand. This is good news. Hm, do you think that reiser4's pluggability is enough to have this single kernel tree (fs/reiser4) for a longer period of time? Yes, reiser4 will have a much longer lifetime, and improvements will come out in small pieces rather than complete rewrites. I think we can add the enhanced semantics one feature at a time. It was the storage layer that was the big thing that had to be right before the rest could proceed. Now, what remains are a whole lot of incremental improvements, I hope. If I can make enough money off the repacker or I get funding from France or some other government and we are able to afford to keep vs and zam working on the storage layer, and Nate working on VFS changes and Peter Foldiak and others at St. Andrews doing semantic enhancements and Gorazd doing user space browsers, things could get very interesting. We need to get into the kernel, and money will manifest, and programmers can get to work. I mean, can you predict a need of spawning something like reiser5 in the foreseeable future or would fs/reiser4 + plugins be enough to do away with the future vision and other future * stuff you've written about? E.g. I remember reading about very granular security ACLs like restricting a certain line in a file (like /etc/passwd). Yes, I still want to do that stuff. Hans
Re: reiser4 status (correction)
Mike Benoit wrote: On Sat, 2006-07-22 at 07:34 -0500, David Masover wrote: The compression will probably mostly be about speed. Remember, if we're talking about people who want to see tangible, visceral results, we're probably also talking about end-users. And trust me, the vast majority of most of my data (as an end-user) is not very compressible. Sure, mine too. Between the many gigs of MP3s and movies I have stored on my HD, only about 10-20GB is the OS, Email, and documents/source code. Even just compressing that small portion though I could probably save between 5-10GB. The difference is though I can do a df before, and a df after, and I can instantly see I got my money's worth. Same with encryption. I am looking forward to the first user email complaining that he compressed a file stored on reiser4 and it didn't save space (and then someday maybe even an email saying that he compressed a file and it took up more space but the user space compressor is sure that it saved space and he does not understand). :) With the repacker it is much more difficult (for average users) to time how long a program takes to load some file (or launch), before the repacker and after. I think you are confusing repacking and compression? Repacking, by removing seeks, will make it more predictable, not less. Especially since caching comes into play. Also, according to this little poll on one of the compressed FUSE sites you linked to, more people are looking to compression for space saving than for speed: http://parallel.vub.ac.be/~johan/compFUSEd/index.php?option=pollstask=resultspollid=31 No, mostly we're talking about things like office documents, the majority of which fit in less than a gigabyte, and multimedia (music, movies, games) which will gain very little from compression. If anything, the benefit would be mostly in compressing software. less tangible like fragmentation percentages and minor I/O throughput improvements. 
I used to work at a large, world wide web hosting company and I could see making a case to management for purchasing Reiser4 compression would be pretty easy for our shared servers. Instantly freeing up large amounts of disk space (where .html/.php files were the vast majority) would save huge amounts of money on disk drives, especially since most of the servers used RAID1 and adding new drives was a huge pain in the neck. Making a case to purchase a repacker would be much, much more difficult. Hmm, the problem is, if storage space is really the big deal, it's been done before, and some of these efforts are still usable and free: http://parallel.vub.ac.be/~johan/compFUSEd/ http://www.miio.net/fusecompress/ http://north.one.pl/~kazik/pub/LZOlayer/ http://apfs.humorgraficojr.com/apfs_ingles.html And while we're on the topic, here's an FS that does unpacking of archives, probably about the same way we imagined it in Reiser4 pseudofiles/magic: http://www.nongnu.org/unpackfs/ But regardless, as far as I can tell, the only real, tangible benefit of using Reiser4 compression instead of one of those four FUSE filesystems is speed. Reiser4 would compress/decompress when actually hitting the disk, not just the FS, and it would also probably use in-kernel compression, rather than calling out to userspace on every FS operation. I think that compressing only on flush is a big issue. It was a lot harder to code it, but it really makes a difference. You don't want a machine with a working set that fits into RAM to be compressing, that would be lethal to performance, and it is a very important case. Hans
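A minimal sketch of the compress-only-on-flush point above, written in Python rather than kernel C. The class and method names (WriteBackCache, write, read, flush) are hypothetical and not taken from reiser4, and zlib merely stands in for whatever compression plugin is in use. It only illustrates why a working set that fits in RAM should never pay for compression.

```python
import zlib


class WriteBackCache:
    """Toy cache: data stays uncompressed in memory and is compressed
    only when it is flushed to the (simulated) disk."""

    def __init__(self):
        self.dirty = {}          # name -> uncompressed bytes held in RAM
        self.disk = {}           # name -> compressed bytes "on disk"
        self.compress_calls = 0  # how often we actually ran the compressor

    def write(self, name, data):
        # Repeated writes touch only RAM; no compression work happens here.
        self.dirty[name] = data

    def read(self, name):
        # Serve from RAM when possible, so cached data is never decompressed.
        if name in self.dirty:
            return self.dirty[name]
        return zlib.decompress(self.disk[name])

    def flush(self):
        # Compression happens once per dirty object, at flush time.
        for name, data in self.dirty.items():
            self.disk[name] = zlib.compress(data)
            self.compress_calls += 1
        self.dirty.clear()


cache = WriteBackCache()
for i in range(1000):
    cache.write("log", b"x" * 4096 + str(i).encode())
cache.flush()
print(cache.compress_calls)  # number of compressions for 1000 writes
```

Compressing on every write instead would run the compressor a thousand times here for the same final on-disk result.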
Re: future r4 maintenance question
I am going to do the enhanced semantics first, so that somebody does not beat me to it. David's examples are good. There's another note to kernel developers -- if Reiser5, 6, and 7 are implemented as suites of plugins on top of Reiser4, then the Reiser4 code will be maintained for a very long time. Kind of like ext2 vs ext3, only moreso -- a Reiser5 FS may well be a Reiser4 FS mounted with additional mount options. There is definitely a lot that can be done to move Reiser4 (as it is today) closer to the Reiser4 whitepaper on the homepage. ACLs are one thing, files as a directory are another. The idea of v4 is to do away with many cases where a separate namespace is created for no good reason -- for instance, where is the data in an id3 tag? It's inside an mp3 file, and you can only get to it with tools written for id3 tags of mp3 files. The Reiser4 concept is to allow things like that to exist, but not require programs to know about libid3 or whatever. Want to know what the artist of a particular file is? foo.mp3/.../id3/artist Or maybe a more generic way: foo.mp3/.../song-info/artist That way, you could have tools which don't even have to know if the file has an id3 tag, or something entirely new, or if the metadata is being stored outside the main file. It'd be entirely possible to allow that file to be treated as a separate file entirely by the plugin, rather than something derived from foo.mp3. The advantages don't seem immediately obvious until you consider that the program which does this doesn't have to even know that it's dealing with song metadata. 
Consider some of the one-line shell scripts possible:

# Change the artist name for all songs in the directory:
for i in *; do echo 'Jimi Hendrix' > "$i/.../song-info/artist"; done

# Make a playlist of all files by Hendrix, mp3 or otherwise:
for i in `find`; do if [ "`cat $i/.../song-info/artist`" = 'Jimi Hendrix' ]; then echo "../$i" >> playlists/hendrix.m3u; fi; done

# Copy all files needed by said playlist to a USB device:
cp playlists/hendrix.m3u/.../files/* /mnt/usb
cp playlists/hendrix.m3u /mnt/usb

I'm sure others can think of much more interesting examples. All of that is planned for v4, eventually. It's very pluggable. Well, I think it is. I don't work here...
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jeff, I think that a large part of what is going on is that any patch that can be read in 15 minutes gets reviewed immediately, and any patch that is worked on for 5 years and then takes a week to read gets neglected. This is true even if line for line the 1 week to read patch is more valuable. What is more, people know this is irrational, but aren't able to cure it in themselves. Even I have a problem of paying too much attention to endless 5 minute emails when I know I should instead, say, read the compression plugin from beginning to end. There is nothing about small patches that makes them better code. There is no reason we should favor them, if the developers are willing to work on something for 5 years to escape a local optimum, that is often the RIGHT thing to do. It is important that we embrace our diversity, and be happy for the strength it gives us. Some of us are good at small patches that evolve, and some are good at escaping local optimums. We all have value, both trees and grass have their place in the world. With you in particular, you demonstrated NO interest in maintaining reiser3, once reiser4 began to make a splash. Linux kernel code exists for DECADES, and as such, long term maintenance is a CRITICAL aspect of development. You are rejecting the development model which is based on stable branches getting only bugfixes. V3 is a stable branch. It just had a feature added to it which added a bug that MythTV users are hitting. Some of them are responding to it by walking away from Reiser3, and no doubt muttering about what an unstable pile of shit our code is. On Monday one of my guys is stopping work on V4 to send in a bug fix for a feature that should have gone into V4 first, and then maybe gotten backported after it was proven in V4. So, given that Jeff and Chris can often be gotten to fix bugs, do I ask them to do it whenever there is a bug to fix and they will fix it? Oh yes! 
The dispiriting thing though is that there is usually another reason to let them fix it, which is that almost all v3 bugs are in features they have added to what ought to have been a stable branch, and since it is their code, they should be the ones to fix it. We might, maybe, get one bug report a year in code written by Namesys before I announced code freeze on V3. I just got an email from the programmer who wrote the MythTV bug saying that he is just too busy to bother fixing the bug in his code, so my response is that a Namesys programmer is going to fix it on Monday. All this talk about how you guys worry that code is going to be abandoned, you know, try policing the kids in their 20's who do it, not those who have been working since 1984 on developing the thing you somehow are worried they will abandon. I am not 20 something anymore, I am getting fat no matter how much I exercise, and I stick with things, and I only wish some things didn't stick so much with my middle Regardless of whatever new whiz-bang technology exists in reiser4, there is a very real worry that you will abandon reiser4 once it's in the tree for a few years, just like what happened with reiser3. And look at how Linus abandoned 2.4! Users of 2.4 needed so many features that were put into 2.6 instead, and they were just abandoned and neglected and Do you think he will abandon 2.6.18 also? The stable branch of code getting only bugfixes and the development branch getting all the new features model of development is something most release management professionals agree is the right way to do things. I worked with release management teams some, and I have to say that the dominant paradigm in the software industry is, in this case, the best one yet. 
Of course, I want to make it a little better, you know how I am, and as I was just discussing on the reiserfs-list, with plugins we can now move to a model in which if you mount reiser4 using the -o reiser4.1-beta mount option, it changes what the default plugin is, and that is how we do releases, we put our beta code in different plugins, and let the user choose whether to upgrade to a new release by just choosing what plugins to use as his default. Now that we paid the 5 year development price tag to get everything as plugins, we can now upgrade in littler pieces than any other FS. Hmm, I need a buzz phrase, its not extreme programming, maybe moderate programming. Does that sound exciting to others.;-) Seriously though, I am curious to see whether plugin based release management works out as pleasantly for users as I am hoping it will. Hans
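As a rough illustration of plugin-based release management as described above, here is a small Python sketch in which a mount option changes which plugins are the defaults. The plugin names (file_v1, file_v2_beta, hash_v1) and the table layout are purely hypothetical; only the reiser4.1-beta option name comes from the message, and none of this reflects how reiser4 actually parses mount options.

```python
# Hypothetical default-plugin tables, keyed by release name.
# The plugin identifiers are invented for illustration only.
RELEASES = {
    "reiser4.0": {"file": "file_v1", "hash": "hash_v1"},
    "reiser4.1-beta": {"file": "file_v2_beta"},
}


def parse_mount_options(options):
    """Split a comma-separated mount option string into a set."""
    return {opt.strip() for opt in options.split(",") if opt.strip()}


def select_defaults(options, releases=RELEASES, stable="reiser4.0"):
    """Start from the stable defaults, then overlay the plugins of any
    beta release the user explicitly asked for at mount time."""
    opts = parse_mount_options(options)
    chosen = dict(releases[stable])
    for release, plugins in releases.items():
        if release != stable and release in opts:
            chosen.update(plugins)
    return chosen


print(select_defaults("rw,noatime"))
print(select_defaults("rw,reiser4.1-beta"))
```

The point of the design is that a beta release overrides only the plugins it changes, so everything else keeps running the stable code.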
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jeff Mahoney wrote: That particular bug isn't in the bitmap scanning code, it's a side effect of the write batching higher up. Did you write the code that looks for a window of 32 blocks? If not, and if this code has been around for a long time, I apologize. I thought you did write it and added it in recent months. It's a pathological case when the file system is seriously fragmented. Most bugs are pathological cases. ;-) A quick fix would be to set a flag indicating that future writes shouldn't bother trying to find a window that large, There are lots of quick fixes.

1) The quickest is to not scan for the window at all.
2) The second quickest is to limit the number of bitmaps that will be scanned to some number like 3.
3) The not at all quickest is to track free extents like XFS does, which is not a hack, but it belongs in a development branch. I am not sure it is worth the complexity, but my mind is not closed.

On Monday we will do 1) or 2), probably 1). After the repacker is done, we should review all our block allocation algorithms. I have an idea for how to do things more optimally for streaming media that will avoid fragmentation over time, and when combined with the repacker may make 3) not worthwhile. I am grateful that you and Chris do bug fixes, but when you guys are too busy (and that can and will happen to any of us), the baton needs to get passed. V3 needs to be a zero defect product, and once we know it is a bug I don't want bugs in V3 to remain unfixed for more than a day plus the time it takes to fix it. If you do add code, I want any bugs that show up in the aftermath of mainstream merging to get jumped on. Hans
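To make quick fixes 1) and 2) concrete, here is a small Python sketch of a bounded window search. It is not the reiserfs allocator: the function names, the list-of-booleans bitmap representation, and the fallback policy are assumptions made for illustration. Only the window of 32 blocks and the cap of 3 bitmaps come from the message.

```python
WINDOW = 32       # contiguous free blocks wanted (figure from the thread)
MAX_BITMAPS = 3   # cap on bitmaps scanned, as in quick fix 2)


def find_window(bitmap, window):
    """Return the start of the first run of `window` free blocks
    (False entries) in one bitmap, or None if there is no such run."""
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0
        else:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == window:
                return run_start
    return None


def first_free(bitmaps):
    """Plain first-fit: the first free block in scan order."""
    for b, bitmap in enumerate(bitmaps):
        for i, used in enumerate(bitmap):
            if not used:
                return (b, i)
    return None


def allocate(bitmaps, window=WINDOW, max_bitmaps=MAX_BITMAPS):
    """Look for a window in at most `max_bitmaps` bitmaps, then fall
    back to first-fit instead of scanning every bitmap on the disk.
    Setting max_bitmaps=0 gives quick fix 1): no window scan at all."""
    for b, bitmap in enumerate(bitmaps[:max_bitmaps]):
        start = find_window(bitmap, window)
        if start is not None:
            return (b, start)
    return first_free(bitmaps)


# A badly fragmented disk: every other block is used in each bitmap,
# except bitmap 5, which is completely free.
fragmented = [[True, False] * 32 for _ in range(10)]
fragmented[5] = [False] * 64

print(allocate(fragmented))                  # bounded scan, first-fit fallback
print(allocate(fragmented, max_bitmaps=10))  # unbounded scan finds bitmap 5
```

The bounded version gives up on a perfect window after three bitmaps and takes the first free block, trading some contiguity for a scan whose cost no longer grows with the size of the file system.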
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jeff Mahoney wrote: Anyone up for it? :) There are changes I'd like to see in reiser3, particularly ones that address the severe problems observed in David Chinner's high bandwidth file system talk this year at OLS. Specifically, it ended up making very little progress and spending the majority of the time in the journal when the workload is streaming data at the disk at a very high rate on a very large file system. Yes, that is certainly XFS's sweet spot, but barely making progress at all is a bit more severe than poor performance. Perhaps mkreiserfs should be a bit saner about choosing journal sizes, since a 32 MB journal is not a good fit for all cases. Also, I'd like to see the usage of the BKL gone as it severely limits performance when more than one thread is writing to the file system, or even another reiserfs file system. It's not entirely low hanging fruit since the nested cases need to be audited, but it shouldn't be too hard to eliminate the inter-filesystem lock contention by replacing the BKL with a per-sb mutex. Getting rid of the BKL is a huge task that was done in V4 for a reason. You are talking about 6+ man-months, and years of shake-out to fully debug. Actually, it is a tribute to Zam's skill that V4's locking got debugged so fast: I gave him the task knowing it was going to be the hardest code to debug, and he did it very well. These things you discuss, except for the journal size, are not things to fix in a stable branch. My apologies that I thought this was a new bug. Let us be glad that a user gave us enough detail we saw it. I have some more things, but I have nowhere near the time to do them, and other file systems will perform fine.
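The per-sb mutex idea quoted above can be illustrated with a toy Python model. It only shows the lock scoping (one lock per mounted filesystem instead of one global lock) and says nothing about the nested-locking audit that makes the real change hard. SuperBlock, write_block, writer and run are hypothetical names, not kernel interfaces.

```python
import threading


class SuperBlock:
    """Stand-in for one mounted filesystem with its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # per-superblock lock, not a global one
        self.blocks_written = 0

    def write_block(self):
        # Only writers to *this* filesystem contend for this lock.
        with self.lock:
            self.blocks_written += 1


def writer(sb, count):
    for _ in range(count):
        sb.write_block()


def run(filesystems, threads_per_fs=2, count=1000):
    """Start several writer threads per filesystem and wait for them."""
    threads = [
        threading.Thread(target=writer, args=(sb, count))
        for sb in filesystems
        for _ in range(threads_per_fs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


home = SuperBlock("home")
media = SuperBlock("media")
run([home, media])
print(home.blocks_written, media.blocks_written)
```

With a single global lock, writers to "home" would also block writers to "media"; with one lock per superblock they only serialize against writers to the same filesystem.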
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matt Heler wrote: On Sunday 23 July 2006 12:20 am, Hans Reiser wrote: The way you wrote this, makes it sound like a userspace issue, and _not_ a problem with reiserfs. It was a problem with reiserfs. Code was added to search for the perfect spot to fit a file. If there is no perfect spot, it searches every bitmap for that spot before giving up. However, Jeff kindly gave us a little patch to fix this and made the whole issue moot. It also seems I was in error, and we actually have had this problem since 2002. Now some past remarks from users about fragmentation make more sense. What can I say, since I have no MP3s I never get anywhere near full on my personal hard drive. And look at how Linus abandoned 2.4! Users of 2.4 needed so many features that were put into 2.6 instead, and they were just abandoned and neglected and Do you think he will abandon 2.6.18 also? Not entirely true, he did not abandon the 2.4 kernel branch, he passed on maintainership to Marcelo. Similar to how he passed the torch on the 2.2 kernel branch to Alan Cox. Also on a side note, many new features ( and a ton of bug fixes !! ) were added to the 2.4 series _after_ Linus started working on the 2.5 branch. You missed the sarcasm in my voice, my apologies, it is the trouble I have with email. Just to balance everything with some nuance, let me add that when a development branch is first opened, there is usually a bit of gray as to whether particular small features should go into the development branch or the stable branch. As the stable branch gets more stable the incentive to not destabilize it increases, and as a development branch becomes usable, the delay to users due to putting features only there reduces. I want reiserfs to be the filesystem that professional system administrators view as the one with both the fastest technological pace, and the most conservative release management. I apologize to users that the technology required a 5 year gap between releases. 
It just did, an outsider may not realize how deep the changes we made were. Things like per node locking based on a whole new approach to tree locking that goes bottom up instead of the usual top down are big tasks. Dancing trees are a big change, getting rid of blobs is a big change, and so are wandering logs. We did a lot of things like that, and got very fortunate with them. If we had tried to add such changes to V3, the code would have been unstable the whole 5 years, and would not have come out right. Experienced writers know that often, if you want to fix a passage, even a passage that is quite good in some parts, sometimes it is better to write the whole passage again without looking at the text of the first draft of the old passage, because sometimes your muse just needs the freedom, and without the freedom the awkwardness of the old passage is incurable. Probably there is some very sophisticated neurological reason why that is. Code can be the same. Sometimes. I knew that reiser4 HAD to be written from scratch without reference to the old code if it was to come out right. If I cannot be a great artist, at least I can try to have the temperament of one, yes? :-) I sincerely hope that using mount options to select default plugins, and making development code go into new plugins means that releases after this can be roughly quarterly, and that we can start doing a whole bunch of quick little plugins. Technically, I think it is going to be downhill skiing from here, and some very visible bits of functionality will get added much more easily than this difficult infrastructure we just coded. Hans
Re: reiser4 status (correction)
David Masover wrote: And it's not just databases. Consider BitTorrent. The usual BitTorrent way of doing things is to create a sparse file, then fill it in randomly as you receive data. Only if you decide to allocate the whole file right away, instead of making it sparse, you gain nothing on Reiser4, since writes will be just as fragmented as if it was sparse. If you don't flush it before you fill it, then in reiser4 it does not matter if it starts out sparse. Personally, I'd rather leave it as sparse, but repack everything later. We have 3 levels of optimization: 1) at each modification, 2) at each flush, and 3) at each repack. Each of these operates on a different time scale, and all 3 are worthy of doing as right as we can. Now, the issue of where should airholes be? Why, that is the grand experiment that will start to happen in a few months. Nobody knows yet what defaults we should have, and whatever we choose, there will be some users who gain from explicit control of it. it must not be as trivial as I think it is. The problem is that there was a list of must dos, and this was just one of them. If reiser4 goes in, then fsync is the only thing in front of the repacker. The list has reduced in size a bunch. A much better approach in my opinion would be to have Reiser4 perform well in the majority of cases without the repacker, and sell the repacker to people who need that extra bit of performance. If I'm not mistaken this is actually Hans intent. Hans? Yes, that's the idea. Only sysadmins of large corps are likely to buy. We throw in service and support as well for those who purchase it. If I was making money, I would not do this, but I am not. I am not really willing to work a day job for the rest of my life supporting guys in Russia, it is only ok to do as a temporary measure. I am getting tired If Reiser4 does turn out to perform much worse over time, I would expect Hans would consider it a bug or design flaw and try to correct the problem however possible. 
I would want Reiser4 without a repacker to outperform all other filesystems. The problem with this fragmentation over time issue is that it is hard to tweak allocation, measure the effect, tweak again, etc. Not sure how to address it without a lot of work. Maybe we need to create some sort of condensed portage benchmark. Hans
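One possible way to put a number on "fragmentation over time" in a tweak-and-measure loop is to count extents per file. The sketch below assumes you can already obtain each file's block numbers in logical order from some tool; how to get that list is outside the sketch, and the function names are made up for illustration.

```python
def count_extents(blocks):
    """Count contiguous runs in a file's list of block numbers,
    given in logical (file offset) order."""
    if not blocks:
        return 0
    extents = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def fragmentation(blocks):
    """0.0 for a single extent, 1.0 when no two consecutive logical
    blocks are adjacent on disk."""
    if len(blocks) < 2:
        return 0.0
    return (count_extents(blocks) - 1) / (len(blocks) - 1)


print(count_extents([10, 11, 12, 40, 41, 7]))
print(fragmentation([1, 2, 3, 4]), fragmentation([1, 3, 5]))
```

Averaging this figure over all files before and after a workload (for example a scripted series of package installs and removals) would give one comparable score per allocator tweak.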
Re: Utilizing OSDL's STP for Reiser4 testing/benchmarks?
Nate Diller wrote: On 7/21/06, Mike Benoit [EMAIL PROTECTED] wrote: Has the Reiser4 team looked at utilizing the OSDL Scalable Test Platform (STP) service to benchmark and test Reiser4 patches? They seem to offer a wide variety of hardware to test on, and already have some file system benchmarks available to choose from. I noticed the Namesys benchmark page hasn't been updated in quite some time, understandably so, as benchmarking can be incredibly time consuming. Perhaps using the STP would make this much easier to do more frequently? It would be really nice to see how the batch_write/read(?) patches to the kernel affect things, and how Reiser4 performance progresses over time. I know Hans has mentioned CPU usage dropping in recent patches, but actual numbers from a benchmark are usually more compelling to see. the dev team has seen a number of benchmarks which Hans' statements are based on... maybe we should post some of them publicly? NATE Sure. I asked for that, actually.
Re: future r4 maintenance question
We are going to give changing the paradigm a try. The difference between 4.1-beta and 4.0 is that different plugins are the default, and the experimental code is in the plugins you see when mounting with the mount option 4.1-beta. Let's see if it works in practice. Hans Pysiak Satriani wrote: Hello, one of the concerns of LKML folks about r4 is that namesys after finishing their work with r4 will rush off to reiser5 leaving r4 as stable in a bugfix-mode only I understand that way of things - you don't mess with a stable branch software. But a question arises. Will namesys keep r4 in a bugfix-only mode (as with reiserfs v3) or will it have resources planned not only for bugfixes but for updates too ? I think that's a thing that needs to be very clear. Best regards, Maciej
Re: fsck.reiserfs --rebuild-tree out of disk space aborted
vs will try to help you.

Did Jeff fix the mythtv performance problem yet? If not, vs, please rip out the optimization which goes looking for the perfect length spot for files, and send both joel and akpm the patch. It is really not such a bad algorithm to just use the spots that are free in the order that they are found, and it can certainly be used until a well tested and benchmarked alternative algorithm is presented to me in detail.

Hans

Joel Heenan wrote:

Hi,

I came home about two weeks ago and found my media box locked up. I was able to discover that it had filled up its /dev/md2 partition (mounted on /home), which surprised me because it is 550 gigs. Perhaps mythtv went nuts and used it all up. It was only a temporary thing; I was going to move mythtv to another partition anyway, and boy I wish I had now :-P.

Anyway, I rebooted and the fsck said I had to run rebuild-tree. So I ran that and a few days later it said out of disk space, aborted. I can't mount the partition; it says No folder found, I believe. I tried it a few times with both the reiserfsprogs from etch:

ii reiserfsprogs 3.6.19-2 User-level tools for ReiserFS filesystems

and the latest ones downloaded from the website (3.6.19). I am currently trying it with the -S option. I'm running a custom 2.6.12 kernel which is basically the same as the default debian one except it includes some drivers for my dvico fusion tv tuner.

I read that the best way to fix this problem is to dd to a bigger partition, but there is really no easy way for me to do that - it will probably involve me purchasing 2x300gig partitions, raiding them, then performing the restore. Please let me know if there is a way I can fix this without going to a bigger partition.

Here is output of the fsck:

fsck.reiserfs --rebuild-tree /dev/md2
reiserfsck 3.6.19 (2003 www.namesys.com)

*
** Do not run the program with --rebuild-tree unless **
** something is broken and MAKE A BACKUP before using it. **
** If you have bad sectors on a drive it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive to the good one -- dd_rescue is a good tool for **
** that -- and only then run this program. **
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to reiserfs-list@namesys.com, **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*

Will rebuild the filesystem (/dev/md2) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

Replaying journal..
Reiserfs journal '/dev/md2' in blocks [18..8211]: 0 transactions replayed
###
reiserfsck --rebuild-tree started at Mon Jul 17 21:41:44 2006
###

Pass 0:
### Pass 0 ###
Loading on-disk bitmap .. ok, 116676032 blocks marked used
Skipping 11771 blocks (super block, journal, bitmaps) 116664261 blocks will be read
0%20%
block 85875092: The number of items (256) is incorrect, should be (1) - corrected
block 85875092: The free space (1) is incorrect, should be (208) - corrected
pass0: vpf-10110: block 85875092, item (0): Unknown item type found [33488896 65537 0x100 ??? (15)] - deleted
left 0, 15065 /sec
108413 directory entries were hashed with r5 hash.
r5 hash is selected
Flushing..finished
Read blocks (but not data blocks) 116664261
Leaves among those 87768646
- leaves all contents of which could not be saved and deleted 1
Objectids found 108417

Pass 1 (will try to insert 87768645 leaves):
### Pass 1 ###
Looking for allocable blocks .. finished
0%20%40%..Not enough allocable blocks, checking bitmap...there are 1 allocable blocks, btw
out of disk space
Aborted
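
For readers unfamiliar with the allocation trade-off Hans mentions at the top of this message, here is a minimal sketch contrasting "use the free spots in the order they are found" with "look for the perfect length spot". It is not the ReiserFS allocator: the extent list, the function names `first_fit` and `best_fit`, and the sample numbers are all assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (start block, length in blocks).
Extent = Tuple[int, int]


def first_fit(free: List[Extent], want: int) -> Optional[Extent]:
    """Take the first free extent, in the order given, that is big enough."""
    for start, length in free:
        if length >= want:
            return (start, want)
    return None


def best_fit(free: List[Extent], want: int) -> Optional[Extent]:
    """Scan every free extent and take the smallest one that is big enough."""
    candidates = [extent for extent in free if extent[1] >= want]
    if not candidates:
        return None
    start, _length = min(candidates, key=lambda extent: extent[1])
    return (start, want)


if __name__ == "__main__":
    free_extents = [(100, 8), (300, 3), (500, 4)]  # illustrative data only
    print(first_fit(free_extents, 3))  # (100, 3)
    print(best_fit(free_extents, 3))   # (300, 3)
```

The first-fit variant can stop at the first match, while the best-fit variant has to look at every candidate before deciding, which is one plausible reason a "perfect spot" search could cost more on a large, nearly full volume.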
Re: reiser4 status (correction)
David Masover wrote:

Hans Reiser wrote:

On a more positive note, Reiser4.1 is getting closer to release

Good news! But it's been a while since I've followed development, and the homepage seems out of date (as usual). Where can I find a list of changes since v4? By out of date, I mean things like this:

Reiser4.1 will modify the repacker to insert controlled air holes, as it is well known that insertion efficiency is harmed by overly tight packing.

Sigh, no, the repacker will probably be after 4.1.

The list of tasks for zam looks something like:

* fix bugs that arise
* debug read optimization code (CPU reduction only, has no effect on IO), 1 week est. (would be nice if it was less)
* review compression code 1 day per week until it ships.
* fix fsync performance (est. 1 week of time to make post-commit writes asynchronous, maybe 3 weeks to create fixed-reserve for write twice blocks, and make all fsync blocks write twice)
* write repacker (12 weeks).

I am not sure that putting the repacker after fsync is the right choice.

The task list for vs looks like:

* fix bugs as they arise.
* fix whatever lkml complains about that either seems reasonable, or that akpm agrees with.
* Help edward get the compression plugins out the door.
* Improve fsck's time performance.
* Fix any V3 bugs that Chris and Jeff don't fix for us.

Which reminds me, I need to check on whether the 90% full bug got fixed.
Re: Utilizing OSDL's STP for Reiser4 testing/benchmarks?
Elena, on Monday can you comment on this in detail? Thanks, Hans Mike Benoit wrote: Has the Reiser4 team looked at utilizing the OSDL Scalable Test Platform (STP) service to benchmark and test Reiser4 patches? They seem to offer a wide variety of hardware to test on, and already have some file system benchmarks available to choose from. I noticed the Namesys benchmark page hasn't been updated in quite some time, understandably so, as benchmarking can be incredibly time consuming. Perhaps using the STP would make this much easier to do more frequently? It would be really nice to see how the batch_write/read(?) patches to the kernel affect things, and how Reiser4 performance progresses over time. I know Hans has mentioned CPU usage dropping in recent patches, but actual numbers from a benchmark are usually more compelling to see. User Guide: http://www.osdl.org/lab_activities/kernel_testing/stp/guide.html/document_view Test Details: http://www.osdl.org/lab_activities/kernel_testing/stp/test_details.html/document_view
Re: storing images thumbnails as pseudo files?
David Masover wrote: Disclaimer: I don't speak for Namesys, and I don't work here. While I'm pretty confident I understand their vision, the final word on anything Reiser is always from Hans Reiser. David described my views pretty well, and saved me much typing.:)
Re: reiser4 status (correction)
Mike Benoit wrote:

On top of that, I don't see how a repacker would help these work loads much as the files usually have a high churn rate.

I think Reiserfs is used on a lot more than squid servers. For them, the usual industry statistic is that 80% of files don't move for long periods of time.
Re: somewhat OT query on journalling
David Masover wrote:

Andreas Schäfer wrote:

Don't get too excited -- the transactions probably aren't done yet. Without those, no filesystem that claims to journal data is really any better than a filesystem which only journals metadata. Even once they are implemented (or even if they are already), applications have to support them directly.

Actually, I think transactions in a filesystem context are a bit different from the transactions you know from databases. Generally

Yes, generally speaking, you're entirely right. But in the case of Reiser4, at least for a single file, you can perform a number of writes and declare them a single transaction.

If we finish that code you can. ;-)

One of the problems that we need to deal with is that we are shipping a product pared of all functionality not essential so that we can get it out the door, and the website still describes the whole vision. We will do the whole vision, but first we need to get some income flowing.
Re: storing images thumbnails as pseudo files?
Pysiak Satriani wrote:

Hello,

suppose pseudo files, file-as-directory are on my r4 partition and are usable. Does namesys' vision allow things like storing image thumbnails inside the file itself?

Example using jpgtn (jpgtn creates thumbnails of jpg files):

// create 150px thumbnail
$ jpgtn -s 150 -H test.jpg
// move the thumbnail into the file
$ mv tn_test test.jpg/thumbs/150px
// the same but a 100px thumbnail
$ jpgtn -s 100 -H test.jpg
$ mv tn_test test.jpg/thumbs/100px

Then I could write HTML code like this:

<a href="test.jpg"><img src="test.jpg/thumbs/150px" /></a>

This idea came to me because of a project that has to create many thumbnails for many files. That requires making up a naming scheme, parsing filenames, and not making a single mistake. If I could do things this way it would be much easier and more elegant to do. But that's just an idea from someone not really knowledgeable enough in the area.

Best regards,
Maciej Soltysiak

What you describe is the vision we are trying for.
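
To make the contrast in the question concrete, here is a small sketch of the two ways a program might locate a thumbnail. The `thumbs/<width>px` layout is the poster's hypothetical file-as-directory scheme and only works where the filesystem resolves such paths; the flat `tn_<name>_<width>` naming is an assumption invented here to stand in for "a naming scheme", not something jpgtn is known to produce.

```python
import posixpath


def thumb_path_in_file(image: str, width: int) -> str:
    """Path of a thumbnail stored 'inside' the image (file-as-directory idea)."""
    return posixpath.join(image, "thumbs", f"{width}px")


def thumb_path_flat(image: str, width: int) -> str:
    """Path of a thumbnail under a made-up flat naming scheme next to the image."""
    directory, name = posixpath.split(image)
    stem, ext = posixpath.splitext(name)
    return posixpath.join(directory, f"tn_{stem}_{width}{ext}")


if __name__ == "__main__":
    print(thumb_path_in_file("test.jpg", 150))  # test.jpg/thumbs/150px
    print(thumb_path_flat("pics/test.jpg", 150))  # pics/tn_test_150.jpg
```

With the first form nothing has to be parsed back out of a filename: the image path itself is the key, which is the simplification the poster is after.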
reiser4 status (correction)
Well, it seems we still aren't quite as stable as we were 6 months ago (the new reduced cpu usage code was extensive, as was the VFS change code), and we know of a bug we can reproduce using our standard tests. Also, it seems we can oops when a particular program is run to consume all memory (thanks Jate for finding it). Hopefully things will be more stable next week. We developers are using the new code on our workstations without problem though.

On a more positive note, Reiser4.1 is getting closer to release. It is working fine for the developer coding it, and we are scheduling code reviews for it and defining migration paths, etc. Hopefully in 2 months it will ship. The big issue with 4.1 is that we are having to deal with all the issues of REALLY allowing users to change default plugins, etc., and finding we missed details. We will say more later.

Thanks for your patience,

Hans
Re: somewhat OT query on journalling
Payal Rathod wrote:

Hi,

I was just reading about filesystems and my ideas are a bit confused. I read quite a few articles on the net but my basic doubts are still not completely clarified. I thought this would be the right place to ask, since many journalling gurus might be here. Can someone tell me whether journalling filesystems maintain a journal of the metadata or of all the data?

V3 defaults to metadata only; V4 does data also because we can do it without performance loss.

Also, is it true that nowadays there is no such thing as an inode block, since for faster access the inodes are kept near the data itself?

reiserfs does not use inodes at all. See our website for more.

How is the journal maintained? How is it prevented from becoming too big, and why are these fs not slower than traditional fs, since writing to a journal involves overhead?

See the website. There is overhead. For v4 it is not a lot though.

And lastly, don't journalling fs give a false sense of security to the user, saying that the data is written to disk when in reality only an entry is made in the journal and the data is still not committed to disk?

Someone else answered this.

Thanks a lot for the patience and eagerly waiting for any replies.

With warm regards,
-Payal
2.6.17 patch is in testing now
It contains 5 bug fixes. If testing goes well we will release it tomorrow. It is listed below in case you feel like helping to test; it works on vs's workstation.

ftp://ftp.namesys.com/pub/reiser4-for-2.6/2.6.17/reiser4-for-2.6.17-1.patch.gz

Apologies to our users that this took a while.

Hans
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeffrey Mahoney wrote:

Hans Reiser wrote:

Jeffrey Mahoney wrote:

This is not the desired interpretation, which is why we need to replace the pathname separator in the name. ReiserFS is the component that is choosing to use the block device name as a pathname component and is responsible for making any translation to that usage.

This makes no sense. I have the feeling you see trees and I see forest.

No, Hans. I see a problem that has been fixed elsewhere in an identical manner. The real solution is to eliminate / from block devices in the long run, not to start introducing mount points with different pathname interpretation rules. Those may have a place elsewhere, after a tough uphill battle, and are most certainly overkill for this problem.

-Jeff

I don't understand your patch and cannot support it as it is written. Perhaps you can call me and explain it on the phone.

Hans
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeff Mahoney wrote:

Hans Reiser wrote:

I don't understand your patch and cannot support it as it is written. Perhaps you can call me and explain it on the phone.

I seriously can't tell if you're deliberately trying to be difficult or not. It's a simple replace / with ! before sending the name to procfs.

Reiserfs requests that a procfs directory called /proc/fs/reiserfs/blockdev be created. Some block devices contain slashes, so with cciss/c123 it attempts to create a directory called /proc/fs/reiserfs/cciss/c123, but cciss/ doesn't exist, shouldn't, and never will.

Why not check to see if it does not exist, and create it if not, as needed, and skip the !'s?

In order to create a single path component, cciss/c123 becomes cciss!c123. This is consistent with how sysfs does it now. For a real example, change the - in device mapper block names to / and see what happens.

Regardless, it's already been checked into mainline as change 6fbe82a952790c634ea6035c223a01a81377daf1.

-Jeff
--
Jeff Mahoney
SUSE Labs
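
The translation described in this message can be sketched in a few lines. This is a userspace illustration of the idea only, not the kernel patch itself; the function name `procfs_component` is hypothetical, and the device names are examples in the style used in the thread.

```python
def procfs_component(devname: str) -> str:
    """Turn a block device name into a single path component.

    Follows the convention described in the message above: each '/'
    in the device name is replaced with '!'.
    """
    return devname.replace("/", "!")


if __name__ == "__main__":
    for name in ("sda1", "cciss/c123"):
        # Build the per-device procfs path a reader would expect to see.
        print(f"{name} -> /proc/fs/reiserfs/{procfs_component(name)}")
```

For `cciss/c123` this yields `/proc/fs/reiserfs/cciss!c123`, a directory directly under `/proc/fs/reiserfs/` with no intermediate `cciss/` parent, which is the point Jeff is making.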
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeff Mahoney wrote:

As stated before numerous times by both Andrew and myself, the correct solution is to eliminate / from block device names.

Why? It is elegant to have those /'s. Just create the directory; how is that hard?

This patch was a band-aid until that's done.

-Jeff
--
Jeff Mahoney
SUSE Labs
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
[EMAIL PROTECTED] wrote:

On Sun, 16 Jul 2006 20:02:27 PDT, Hans Reiser said:

Create a mountpoint which knows how to resolve a/b without using a directory.

And said mountpoint gets past the '/' interpretation in the VFS, how, exactly? fs/namei.c, do_path_lookup() does magic on a '/' on about the 3rd line. So you're going to get handed 'a'.

It does not need to be so complex, actually. Just create a plain old parent directory just like every other parent directory in procfs.
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeff Mahoney wrote:

[EMAIL PROTECTED] wrote:

On Sun, 16 Jul 2006 20:02:27 PDT, Hans Reiser said:

Create a mountpoint which knows how to resolve a/b without using a directory.

And said mountpoint gets past the '/' interpretation in the VFS, how, exactly? fs/namei.c, do_path_lookup() does magic on a '/' on about the 3rd line. So you're going to get handed 'a'.

That's where he started talking about how BSD gets namei() right by allowing each file system to deal with it how it chooses. Personally, I think it's insane. On occasion, I've started to port ReiserFS to BSD-like systems,

Porting V3 to anything is insane. Why would you even consider it?

and I get so fed up with how you have to reinvent the wheel for everything. There's something to be said for replaceable-anything semantics, but personally I like the Linux model and having an agreed-upon framework to work with.

Linux vs. BSD's namei is the difference between thinking you know how to do things and everyone should be forced into your mold, and thinking that someone will always be more clever, at the very least with regards to some special case you could never have anticipated.

I also think it's insane to come up with a reisermetafs to export procfs information when a simple s#/#!# _on a single directory name_ will do the job.

Or just create a parent directory and skip the metafs.

Look, I don't much care about the other details of coding it, but if you are changing !'s to /'s, as an architect my intuition says something is wrong and being papered over. /'s are just fine, and what the block devices do is elegant. You are doing a quick hack.

-Jeff
--
Jeff Mahoney
SUSE Labs
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeff Mahoney wrote: Hans Reiser wrote: Jeff Mahoney wrote: Hans Reiser wrote: I don't understand your patch and cannot support it as it is written. Perhaps you can call me and explain it on the phone. I seriously can't tell if you're deliberately trying to be difficult or not. It's a simple replace / with ! before sending the name to procfs. Reiserfs requests that a procfs directory called /proc/fs/reiserfs/blockdev be created. Some block devices contain slashes, so with cciss/c123 it attempts to create a directory called /proc/fs/reiserfs/cciss/c123, but cciss/ doesn't exist, shouldn't, and never will. Why not check to see if it does not exist, and create it if not, as needed, and skip the !'s? 1) Because then the behavior of /proc/fs/reiserfs/ would be inconsistent. Devices that contain slashes end up being one level deeper than other devices, which is silly and a userspace visible change. And you think translating / to ! is less work for user space? Tools that wish to parse the information would then need added complexity to traverse into the next level to reach that information. 2) The block-device-as-path-name-component behavior is already defined by sysfs (/sys/block), and it should be consistent. Translate that as, I won't recompile my brain no matter what you do to make me. You blindly copied how someone else in a hurry did it without a thought to whether it was done right, and now you don't want to change it. You should have asked me about it before coding it. Replace block-device-as-path-name-component with block-device-as-path-name-suffix, and everything is very consistent. And elegant. Jeff, you are a programmer, not an architect, and when you disregard architects we end up with things like the performance disaster that is V3 acls. Replacing / with ! is hideous. Someone added a nifty elegance to block device naming, and you are desecrating it. Hans
Re: data corruption with 2.4.25 and datalogging patches
It seems like bad memory is growing as a percentage of user filesystem problem sources. Do others have that feeling also?

Hans

Brad Dameron wrote:

On Mon, 2006-07-17 at 21:55 +0400, Vladimir V. Saveliev wrote:

Hello

On Mon, 2006-07-17 at 10:53 +0200, Francisco Javier Cabello wrote:

Hello Vladimir,

such corruptions used to be considered as hardware bugs. Memory failure, for instance. Did you ever run memtest on your systems?

Yes, we have run memtest on our system. It is very rare to find a running system with a hardware memory problem. When we find a memory problem the kernel doesn't boot. I am going to run memtest on some of the systems with the reiserfs corruption problem.

This is not true. There are certain memory issues that can still allow the system to boot and appear to run ok. I had a system that didn't show a memory error until the 4th pass on memtest. I just happened to let it run over the weekend. I have seen other issues with my larger systems that have 64GB of ram. There, memtest didn't detect anything after a week, but the kernel mcelog reported weird ECC memory issues. I replaced several DIMMs and the issue went away. But who knows what could have occurred had I not replaced the memory.

Brad Dameron
SeaTab Software
www.seatab.com
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeff Mahoney wrote: Hans Reiser wrote: Jeff Mahoney wrote: 1) Because then the behavior of /proc/fs/reiserfs/ would be inconsistent. Devices that contain slashes end up being one level deeper than other devices, which is silly and a userspace visible change. And you think translating / to ! is less work for user space? A one line s#/#!# to access devices they couldn't before versus now having to deal with going deeper into a tree for no real reason? Yes, I do. I am willing to bet that perl can tree iterate with one line of code Please read the Hideous Name by Rob Pike. You are making it more hideous. Jeff, you are a programmer, not an architect, and when you disregard architects we end up with things like the performance disaster that is V3 acls. This again? The original reiser3 implementation was, and still is, incomplete in comparison to the design document. The design touted the extensibility of the tree and item types. Try v4. My original xattr implementation added another item type, but oh -- wait -- it turns out that the file system isn't quite as extensible as claimed.. or, well, AT ALL. Adding another item results in an incompatible file system change that when mounted on another system, will panic the node. That's friendly! There's not even any way to identify which items are in use on a particular file system to issue a warning/error on mount. Outstanding job architecting there. Well, if you had an obsessive desire to not use V4, you could fix this in V3 instead. If I could go back and do it again, I would have forked a reiserfs v3.7 that actually incorporated a compatibility block to identify which items are in use on a particular file systems, so that the mount can succeed or fail based on that. Might be easier to use V4... so many bugs got added to an otherwise stable V3. Xattrs would have been another item type as expected, and the performance problems wouldn't be nearly as harsh as they are. 
Hmm, maybe it was all perfectly predictable to an architect Not that you wouldn't have been just as resistant to that change as well. The thing is, you have a history of ignoring what users want. What quality architect does not? Users are to be listened to with great care. Users are to be listened to with great discretion. Users wanted ACLs and xattrs on reiser3, but you said, wait for v4, it'll be out soon, and it'll have them. That was 4 years ago. Reiser4 still isn't completely stable It is more stable than V3 was when it went in, and surely it is more stable than ext4 It is getting there. Recent get ready for mainline changes added bugs. We need to get some patches out the door tomorrow, and then we should be back to being stable again. (or in mainline), not my doing;-) we wasted a lot of time shuffling code from one side of the room to the other for no measurable benefit. If only that time could have been spent on the things I know deserve work. and ACLs and xattrs still aren't implemented. Users that demanded ACLs certainly aren't waiting around for reiser4 to be released and have ACLs added. They've long since switched to a file system that actually does what they need. They wanted ACLs and xattrs added to the stable file system they were using and you refused in an attempt to get more support for your latest project. Further, reiser3 users remember what a long painful road it was to reiser3 stability so why did you take their stable branch away from them by working on more than bugfixes for V3? Jeff, working on v3 at this point is nuts. V4 blows it away
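
On the side question in this exchange of how much work a nested layout would be for userspace, here is a rough sketch of finding per-device directories when some sit one level deeper than others. The directory names are hypothetical and the function walks any directory tree; it does not describe the real contents of /proc/fs/reiserfs.

```python
import os
import tempfile
from typing import List


def leaf_dirs(root: str) -> List[str]:
    """Return directories under root that contain no subdirectories."""
    leaves = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if not dirnames and dirpath != root:
            leaves.append(os.path.relpath(dirpath, root))
    return sorted(leaves)


if __name__ == "__main__":
    # Build a throwaway tree that mimics the nested layout being debated.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sda1"))
        os.makedirs(os.path.join(root, "cciss", "c123"))
        print(leaf_dirs(root))
```

The walk itself is short, which is Hans's point; Jeff's objection is that every existing tool would still have to be changed to do it, whereas the flat layout keeps one directory per device at a fixed depth.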
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
So the Plan 9 and Unix way would be to let the driver parse the number part of the name after the last slash. What I don't understand is why reiserfs is getting involved here, rather than recognizing the driver as an extension of the namespace, seeing the driver as a mountpoint, and just passing number to the driver.

There must be something I don't grasp here, can you help me?

Hans

Jeff Mahoney wrote:

Bodo Eggert wrote:

Eric Dumazet [EMAIL PROTECTED] wrote:

On Wednesday 12 July 2006 18:42, Jeff Mahoney wrote:

On systems with block devices containing slashes (virtual dasd, cciss, etc), reiserfs will fail to initialize /proc/fs/reiserfs/dev due to it being interpreted as a subdirectory. The generic block device code changes the / to ! for use in the sysfs tree. This patch uses that convention. Tested by making dm devices use dm/number rather than dm-number.

Your patch handles at most one slash. But the description mentions 'slashes' (i.e. several slashes).

Besides that, there is no reason to prevent the user from using many slashes. OTOH, I'd prefer proper quoting, but having each driver do this would be insane.

The strings aren't user-supplied, they're kernel-internal names of block devices, supplied by the driver. At present there is no possibility of more than one slash in the name, and I doubt we'll see any new devices with one slash in them, never mind more than one.

-Jeff
--
Jeff Mahoney
SUSE Labs
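
Eric's objection above is about replacing only the first slash versus every slash. The difference is easy to see in a sketch; both function names are made up for illustration, and the three-component name is hypothetical, since Jeff notes that no current driver produces more than one slash.

```python
def replace_first_slash(devname: str) -> str:
    """Replace only the first '/' (the behaviour Eric is questioning)."""
    return devname.replace("/", "!", 1)


def replace_all_slashes(devname: str) -> str:
    """Replace every '/' so the result is always one path component."""
    return devname.replace("/", "!")


if __name__ == "__main__":
    for name in ("cciss/c123", "a/b/c"):  # "a/b/c" is a hypothetical name
        print(name, replace_first_slash(name), replace_all_slashes(name))
```

For names with a single slash the two agree, so the disagreement in the thread is about robustness against names that do not exist today.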
Re: short term task list for Reiser4
Jindrich Makovicka wrote:

On Fri, 14 Jul 2006 00:01:49 -0700 Hans Reiser [EMAIL PROTECTED] wrote:

rvalles wrote:

I believe those two are related. I'm having the pauses (of many minutes at times!) when writing to reiser4. It seems it is triggered mostly by the use of fsync(); NFS in sync mode manages to trigger it way often: I mount my old desktop's home from my new computer via synced NFS. The pauses consist of the application being frozen and the other applications being slowed down on their IO operations, while the disks write data continuously during an interval that usually lasts a minute or so, but may last many times that. It often happens with small files (like, when sending a mail, as it passes through the MTA), so I believe it probably (re)writes to disk lots of stuff that doesn't need to be written to disk at all. It only happens on reiser4 patches against 2.6.13 or newer; 2.6.12.x is fine.

So the pauses are experienced by a process waiting on fsync to finish? If yes, then the problem is a very different issue from what I thought.

I am pretty sure I triggered the same a couple of times with aMule, which was downloading and uploading about 20-30kB per second in each direction, and randomly accessing larger files. Doing a sync then suddenly caused the disk to seek like crazy and write about 500kB per second constantly. I waited about a minute, and then killed aMule.

Could it be possible that sync tries to sync the writes which arrive during the operation and cannot catch up?

zam, can you answer this, because he is right that if the answer is yes, then we have a bit of a problem we need to address. I worry that, assuming the answer is yes, one process doing a lot of fsync could muck up the performance of everyone its atoms fuse with if we expire the atom with every fsync, but if we don't then.

Jindrich, don't let us forget this issue before we fix it, ok?

Thanks,

Hans
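
For anyone trying to reproduce the pauses described in this thread, a rough way to put numbers on them is to time individual fsync() calls on small files. This is only a measurement sketch under assumptions made here: the function name `time_fsyncs`, the probe file names, and the default sizes are invented, and it says nothing about why a particular filesystem stalls.

```python
import os
import tempfile
import time
from typing import List


def time_fsyncs(directory: str, count: int = 20, size: int = 4096) -> List[float]:
    """Write `count` small files in `directory` and time each fsync() call.

    Returns the fsync durations in seconds, in the order they were measured.
    """
    payload = b"x" * size
    timings = []
    for i in range(count):
        path = os.path.join(directory, f"fsync_probe_{i}.tmp")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()  # push Python's buffer to the kernel before timing
            start = time.perf_counter()
            os.fsync(f.fileno())
            timings.append(time.perf_counter() - start)
        os.remove(path)
    return timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as probe_dir:
        results = time_fsyncs(probe_dir)
        print(f"max fsync: {max(results):.4f}s over {len(results)} calls")
```

Pointing the directory argument at a directory on the affected partition, ideally while other writers are active, would show whether individual fsync calls are what takes minutes.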
Re: [mythtv-users] Permission denied to ls for root?
Ryan Steffes wrote:

On 7/14/06, Michael T. Dean [EMAIL PROTECTED] wrote:

On 07/14/2006 09:31 PM, Carl Fongheiser wrote:

On 7/14/06, Ryan Steffes [EMAIL PROTECTED] wrote:

I've never seen ls do this before, at least, not for root. Any idea what could cause it?

# ls
ls: 1036_2006041723_20060417233000.nuv.png: Permission denied
ls: 1056_2006061013.mpg: Permission denied
ls: 1006_20050920213000_2005092022.nuv.png: Permission denied
ls: 1013_20060420201900_20060420203000.nuv: Permission denied
ls: 1003_2006041922_2006041923.nuv.png: Permission denied

Can't delete em, look at em, touch em, chmod em, or chown em, even as root. It's odd.

Are these files on a remote NFS share? If so, that's not unusual. Root is usually mapped to nobody on remote NFS servers.

Or, it could happen because of a stale NFS file handle (fixable by unmounting/remounting).

Mike

It appears the answer I may have to go with is some sort of corruption in the file system. I'm CCing this to the reiserfs list, since that's what the partition is. On reboot, the partition starts to load, and hangs up on Checking Internal Tree.. and gets no further.

Any suggestions on how to recover without losing all my data?

Ryan

Send lots more details to [EMAIL PROTECTED]
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
This sounds like it should be fixed in the driver, not in reiserfs. It sounds like the driver is violating Posix naming, and should be fixed to conform to it. Have the driver create an fs mountpoint, and then have the driver handle the number. I really don't get why reiserfs has any role in this problem.

Regarding "a separate name space that doesn't follow the same rules as the standard file system name space", Linux does not need those to be created, but what I don't understand is exactly in what respect the driver namespace does not conform. It has components separated by slashes.

Is this related to the difference between BSD's namei and Linux's? BSD is the one getting it right.

Hans

Jeffrey Mahoney wrote:

Hans Reiser wrote:

So the Plan 9 and Unix way would be to let the driver parse the number part of the name after the last slash. What I don't understand is why reiserfs is getting involved here, rather than recognizing the driver as an extension of the namespace, seeing the driver as a mountpoint, and just passing number to the driver. There must be something I don't grasp here, can you help me?

The name used in procfs isn't parsed anywhere, it could just as easily be fs0, fs1, fs2, etc, but that wouldn't be a very user friendly way of indicating which file system's statistics are described in that directory. It's just presented to the user as a pathname to identify a particular file system.

The problem is that reiserfs is attempting to use a name from a separate name space that doesn't follow the same rules as the standard file system name space. Block device names, initially, weren't intended for use as self-contained path components and aren't part of the file system name space. If we wish to use those names, we need to sanitize them to conform to the rules of the file system name space by removing/replacing the path separator character.

It's unfortunate that some drivers use a slash rather than sticking with the typeletter convention. I don't expect new drivers to be added with slashes in them. If at some point the existing drivers are changed to remove the slash, then this patch can be removed again.

-Jeff
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeff Mahoney wrote: Hans Reiser wrote: Hans, we're all in agreement that we'd prefer drivers not use names with slashes in them, there is nothing wrong with using names that have slashes. The thing that is wrong is somehow needing to translate them into names with !'s. and it would be nice to correct drivers currently using them. The problem is that when you change the name of a device, that's a userspace visible change. So don't. Why would user space care how you parse it and whether the driver or reiserfs does it? Scripts that currently expect, say, /proc/partitions to contain cciss/number will break between kernel versions. Sysfs wants to use the device name as a pathname component, and as such translates the / to a !, the same as this patch proposes. Reiserfs gets involved because it expects that name to be usable as a file system pathname component when it is not intended to be one without translating slashes into another character. The difference is that block device names are allowed to have slashes in them, while normal file system names are not. We should distinguish here between names and name components. The fact is that device driver names, when in /dev can use separate components, like /dev/cciss/0, but when used in the manner reiserfs wants them to be used, they can't. Also, I'm not talking about name spaces like struct namespace, I mean that the group of names that block device drivers use have different constraints than the group of names that are allowable as file names. The fact is that this change is required for users deploying devices that use slashes in their names to see the proc data for a reiserfs file system. You can point the finger all you want at the block drivers in the mean time, but it's still a reiserfs problem. I still do not grok why you need to change / to !. Something is wrong. Reiserfs is being asked to do something that somebody else should be doing. Hans
Re: [PATCH] reiserfs: fix handling of device names with /'s in them
Jeffrey Mahoney wrote: Hans Reiser wrote: Jeff Mahoney wrote: Hans Reiser wrote: Hans, we're all in agreement that we'd prefer drivers not use names with slashes in them, there is nothing wrong with using names that have slashes. The thing that is wrong is somehow needing to translate them into names with !'s. If using something with slashes in it as a file name component isn't problematic, then by all means create a single file system object named a/b where a doesn't refer to a parent directory and tell us all how. Create a mountpoint which knows how to resolve a/b without using a directory. and it would be nice to correct drivers currently using them. The problem is that when you change the name of a device, that's a userspace visible change. So don't. Why would user space care how you parse it and whether the driver or reiserfs does it? Huh? The block device's name is exported directly via /proc/partitions, but then also used as a file name component in sysfs, and also procfs via reiserfs. How do you propose fixing this without adding an additional field to genhd? Adding a helper function is essentially the same thing as this patch other than it being open coded, what is open coding? Something different from free software? and I'm not getting the impression that the open coding is your issue. Scripts that currently expect, say, /proc/partitions to contain cciss/number will break between kernel versions. Sysfs wants to use the device name as a pathname component, and as such translates the / to a !, the same as this patch proposes. Reiserfs gets involved because it expects that name to be usable as a file system pathname component when it is not intended to be one without translating slashes into another character. The difference is that block device names are allowed to have slashes in them, while normal file system names are not. We should distinguish here between names and name components. In terms of file system names, I have been making that distinction. 
In terms of block devices, the name the name refers to? consists of only one component. So fix that. Create a pseudo directory, and have a/b and a/c get resolved by a. That is cleaner than converting / to !. More below. The fact is that device driver names, when in /dev can use separate components, like /dev/cciss/0, but when used in the manner reiserfs wants them to be used, they can't. Also, I'm not talking about name spaces like struct namespace, I mean that the group of names that block device drivers use have different constraints than the group of names that are allowable as file names. The fact is that this change is required for users deploying devices that use slashes in their names to see the proc data for a reiserfs file system. You can point the finger all you want at the block drivers in the mean time, but it's still a reiserfs problem. I still do not grok why you need to change / to !. Something is wrong. Reiserfs is being asked to do something that somebody else should be doing. Splitting the block device names with / is applying file system path name rules to the block device name, when they don't. Don't what? The entire point of this is that cciss/whatever refers to a single object in the block layer, but when you apply file system rules, it becomes two. Uh, no, a/b in any POSIX filesystem refers to one object. Now maybe someday probably not what you meant This is not the desired interpretation, which is why we need to replace the pathname separator in the name. ReiserFS is the component that is choosing to use the block device name as a pathname component and is responsible for making any translation to that usage. This makes no sense. I have the feeling you see trees and I see forest. -Jeff