Re: Filesystem corruption
On Tuesday 29 May 2007 07:36:13 Toby Thain wrote:
> but you can't mention using reiserfs in mixed company without someone accusing you of throwing your data away. People who repeat this rarely have any direct experience of Reiser; they repeat what they've heard; like all myths and legends they are transmitted orally rather than based on scientific observation.

Well, there is one problem I vaguely remember that I don't think has been addressed; I think it was one of those let's-put-it-off-till-v4 things. There is a limited number of inodes (or keys, or whatever you call a unique file), and no way of knowing how many you have left, until one day your FS suddenly refuses to create another file. (For comparison, ext3 seems to support not only telling you how many inodes you have left, but tuning that on the fly.)

But I haven't run into that, and the only problem I've had lately has been Reiser4 losing data and crashing occasionally. I switched most of my data off Reiser4 and onto XFS for that reason. I've also been using ext3 in some places, and Reiser3 in others (one place in particular where space is limited, but I will have tons of small files). I later learned that XFS does out-of-order writes by default, making me think I should give up and invest in UPS hardware. But switching away from Reiser4 means I no longer see random files (including stuff in, for example, /sbin, that I hadn't touched in months) go up in smoke.

Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe... I do still follow the list, though, in case something interesting happens. It was fun while it lasted!
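The ext3 comparison above is easy to check from the shell; a minimal sketch (assuming an ext2/ext3-style filesystem is mounted at /, since reiserfs allocates inodes dynamically and reports no meaningful fixed budget):

```shell
# Show used/free inode counts per filesystem. On ext3 the "IFree"
# column is a real, fixed budget chosen at mkfs time; when it hits
# zero, file creation fails even with free space left.
df -i /
```

On reiserfs the same command runs, but the inode columns don't reflect a hard preallocated limit the way they do on ext3.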
Re: Filesystem corruption
On Wednesday 30 May 2007 11:42:01 Toby Thain wrote:
> But does it cause data loss? One usually sees claims that "reiserfs ate my data", or "I heard reiserfs ate somebody's data", but without supplying a root cause - bad memory? powerfail? bad disk? etc.

Power failure shouldn't kill a filesystem, and generally shouldn't eat data that was written to disk before the failure. (Although I could complain all day here about why corruption happens anyway when you do any kind of out-of-order operations... I am looking forward to that Reiser4 transaction API, so we can finally get rid of the tmpfile+rename hack.)

But in any case, there were some kernels -- 2.4.16, I think? -- in which reiserfs was unstable and did corrupt easily. I believe that was tracked down to kernel bugs outside of reiserfs.
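For readers who haven't seen it, the tmpfile+rename hack mentioned above looks roughly like this (filenames are hypothetical; a real program would fsync the temp file's descriptor before the rename, rather than calling a global sync):

```shell
# Write the new contents to a temporary file on the SAME filesystem,
# then rename over the original. rename(2) replaces the target
# atomically, so readers see either the old file or the new one --
# never a half-written mix.
target=config.txt
printf 'old contents\n' > "$target"

tmp=$(mktemp "$target.XXXXXX")
printf 'new contents\n' > "$tmp"
sync                      # stand-in for fsync(fd) on the temp file
mv -f "$tmp" "$target"
```

The complaint in the post is that without a transaction API, this dance (plus the fsync) is the only way applications get atomic updates.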
Re: Filesystem corruption
On Wednesday 30 May 2007 12:22:17 devsk wrote:
> I have used R4 for a year now, and I have had to reset my PC -- troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia -- so many times that it's not even funny! And R4 didn't give me any problems even once. It boots right up, without any files lost and a consistent FS, as a subsequent livecd boot and fsck proved every time.

That happened to me for maybe a year or so, I'm not sure. Then, slowly, I started to get problems. The machine would crash due to some nvidia bug -- or even a reiser-specific oops or something -- then I'd have to fsck it, which would take an hour or more, then I'd boot, and apparently no problems. Only, recently, these fsck-a-thons started happening more and more often, and I started to lose random files. They'd just be silently truncated to 0 bytes. And not files I was writing a lot -- I'm talking about things like /bin/mount.

Now, maybe it's an amd64-specific bug. Or (somehow) a dmraid-specific bug, or a dont_load_bitmap bug. (Who can blame me; without dont_load_bitmap, it takes at least 30 seconds, maybe a minute, to mount.) Could even be, somehow, a Gentoo-specific bug. Could be a 350-gig-partition bug, or even a bug of the it-hates-me variety. (My server ran Reiser4 for awhile longer, with no problems, but I wasn't about to take chances there.)

But, I switched a friend over to Ubuntu, and he had the same kind of problems. In fact, he had them first (I thought it was his computer, for awhile). Finally, we switched to stock Ubuntu kernels and XFS, me on dmraid, him on normal Linux raid5 (md), and we now have no problems. It's even faster -- the biggest gain for Reiser4 was /usr/portage, which doesn't exist on Ubuntu.

> If I did that to ext or xfs, I would have lost big time.

Well, I'm on XFS on my desktop now, and ext3 on my server. No problems at all so far. Also much faster, because my desktop now has a repacker (xfs_fsr).

> I hope people don't leave this good piece of code to rot!!
Me too, but you know, I can no longer afford to spend a few hours running fsck for no apparent reason. I no longer have a machine that can do anything but just work.

The killer feature of Reiser4, as implemented, is small-file performance that makes ReiserFS v3 weep, and v3 makes XFS weep. All the other stuff we were promised is either planned for a later release (repacker, pseudofiles, transaction API) or barely working (cryptocompress). And on just about any setup I work on today, small-file performance is a low enough priority that even the slightest hint of instability is a deal-breaker. Enough people feel the same way that ext3 is still widely used. And if it's ever really crucial, there's ReiserFS v3.

So, you can blame it on my hardware, or on not getting kernel inclusion, or anything you want, but the only place I still use Reiser4 is on the gameserver at our LAN party, and we're thinking of moving that to something like ext3 or XFS, just so we don't need custom kernels. And after all, it's a gameserver; it's not like the filesystem is the bottleneck anyway.
Re: Filesystem corruption
On Wednesday 30 May 2007 11:02:26 Vladimir V. Saveliev wrote:
>> Ordinarily I like to help debug things, but not at the risk of my data. Maybe I'll try again later, and see if I can reproduce it in a VM or somewhere safe...
> that would be great, thanks

Keep in mind, it's unlikely, given I don't have much resembling my original setup left around. And it was fairly random, under fairly normal usage patterns -- I'd suddenly notice my movie had stopped playing, and I'd hit ctrl+alt+f8 and find a bunch of reiser4 error messages.

Is it at all likely that this is an amd64 bug? (The only two places I've seen it are on my box and my friend's, both amd64 on some sort of RAID.) If you don't have enough testers or hardware for amd64, I can try (again) to set up a working x86_64 VM for you to test on.
Reiser4 crash (?)
Finally set up network logging: kernel -> syslog-ng -> TCP (crossover) -> syslog-ng (other box) -> log file. This time, I actually caught something from the crash. It may be hardware-related, but I thought I'd report it here this time because the crash was definitely in Reiser4 code. This may or may not be relevant:

Oct 6 02:15:27 elite irq event 217: bogus return value 8027d751
Oct 6 02:15:27 elite Call Trace: <IRQ> 8028f765{__report_bad_irq+53}
Oct 6 02:15:27 elite 8028f812{note_interrupt+82} 8028f1e9{__do_IRQ+169}
Oct 6 02:15:27 elite 802611e2{do_IRQ+66} 8025f290{default_idle+0}
Oct 6 02:15:27 elite 802582ac{ret_from_intr+0} <EOI> 8025a76f{thread_return+0}
Oct 6 02:15:27 elite 8025f2ba{default_idle+42} 8024592d{cpu_idle+61}
Oct 6 02:15:27 elite 8048384f{start_kernel+495} 80483255{_sinittext+597}
Oct 6 02:15:27 elite handlers:
Oct 6 02:15:27 elite [8033cea0] (usb_hcd_irq+0x0/0x60)
Oct 6 02:15:27 elite [880b55a0] (nv_nic_irq+0x0/0x180 [forcedeth])

This message repeated a couple of times beforehand.
The only log entries between that and the crash are my ntpd, apparently trying to set the clock, and apparently not bothering to actually do it:

Oct 6 02:16:03 elite ntpd[4996]: adjusting local clock by -2012.380177s
Oct 6 02:19:55 elite ntpd[4996]: adjusting local clock by -2012.350292s
Oct 6 02:20:25 elite ntpd[4996]: adjusting local clock by -2012.088300s
Oct 6 02:24:42 elite ntpd[4996]: adjusting local clock by -2011.895028s
Oct 6 02:27:58 elite ntpd[4996]: adjusting local clock by -2011.637006s
Oct 6 02:32:12 elite ntpd[4996]: adjusting local clock by -2011.575243s
Oct 6 02:33:15 elite ntpd[4996]: adjusting local clock by -2011.448711s
Oct 6 02:35:23 elite ntpd[4996]: adjusting local clock by -2011.293673s

I can't believe my clock is skewed that badly, especially when, if I manually restart it, I get log entries like this:

Oct 5 04:05:50 grunt ntpd[6601]: adjusting local clock by -0.146991s

I believe this is the crash, though:

Oct 6 02:38:36 elite Unable to handle kernel NULL pointer dereference at 0038 RIP:
Oct 6 02:38:36 elite 880808e7{:reiser4:search_one_bitmap_forward+135}
Oct 6 02:38:36 elite PGD 7552e067 PUD 7de47067 PMD 0
Oct 6 02:38:36 elite Oops: [1]
Oct 6 02:38:36 elite CPU 0
Oct 6 02:38:36 elite Modules linked in: xt_tcpudp xt_state ip_conntrack iptable_filter ip_tables nfnetlink_queue nfnetlink xt_NFQUEUE x_tables nfs nfsd exportfs lockd sunrpc snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore nvidia af_packet usb_storage joydev ide_cd cdrom amd74xx ide_core sk98lin forcedeth unix reiser4 dm_mod sd_mod sata_nv libata scsi_mod
Oct 6 02:38:36 elite Pid: 428, comm: ktxnmgrd:dm-4:r Tainted: P 2.6.17.13 #2
Oct 6 02:38:36 elite RIP: 0010:[880808e7] 880808e7{:reiser4:search_one_bitmap_forward+135}
Oct 6 02:38:36 elite RSP: 0018:81007f1f5888 EFLAGS: 00010246
Oct 6 02:38:36 elite RAX: RBX: 0001 RCX: 81007f1f591c
Oct 6 02:38:36 elite RDX: c204d660 RSI: 810001d8fc10 RDI: 810001d8fc10
Oct 6 02:38:36 elite RBP: 81007f1f591c R08: 81007f1f5778 R09: 0000
Oct 6 02:38:36 elite R10: 0010 R11: 81007f1f57d8 R12: 0001
Oct 6 02:38:36 elite R13: 7fe0 R14: 7fe0 R15: 81007f1f5970
Oct 6 02:38:36 elite FS: 2af33ea0f1b0() GS: 8047a000() knlGS:
Oct 6 02:38:36 elite CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
Oct 6 02:38:36 elite CR2: 0038 CR3: 7d4f1000 CR4: 06e0
Oct 6 02:38:36 elite Process ktxnmgrd:dm-4:r (pid: 428, threadinfo 81007f1f4000, task 81007e64a200)
Oct 6 02:38:36 elite Stack: 8000 00010001 81007f1f591c c204d660
Oct 6 02:38:36 elite 81003df44004 7f1f5ae8 00200020 81007f1f5968
Oct 6 02:38:36 elite 81007f1f591c 0001
Oct 6 02:38:36 elite Call Trace: 88080b11{:reiser4:bitmap_alloc_forward+145}
Oct 6 02:38:36 elite 88080d2f{:reiser4:alloc_blocks_bitmap+447}
Oct 6 02:38:36 elite 88060e53{:reiser4:plugin_by_unsafe_id+35}
Oct 6 02:38:36 elite 8807f6a1{:reiser4:item_length_by_coord+17}
Oct 6 02:38:36 elite 8804fa65{:reiser4:reiser4_alloc_blocks+165}
Oct 6 02:38:36 elite 88053db9{:reiser4:allocate_znode_update+265}
Oct 6 02:38:36 elite 8807f9d1{:reiser4:item_body_by_coord_hard+17}
Oct 6 02:38:36 elite 880776ef{:reiser4:internal_at+15} 88077709{:reiser4:pointer_at+9}
Oct 6 02:38:36 elite
BitTorrent+Reiser4: curiouser and curiouser
Azureus had a problem. Once it got up to a good clip downloading, it would thrash the disk. It would thrash the disk, and the system, so hard that even web browsing was difficult, due to disk access being many, many times slower than Internet access -- even an Internet being hogged by BitTorrent.

After changing Azureus' cache to 32 megs, and telling it not to write files immediately, I thought I had the problem solved -- no thrashing at all. Until the cache got full. Then: thrashing. Less frequent, but much more vigorous -- Azureus becomes extremely unresponsive for a few minutes. It shouldn't be touching the disk AT ALL when there's over a gig of FREE RAM (as in, neither buffer nor cache nor actually used yet), and the file I'm attempting to download is less than 200 megs.

I tried an strace, but as I am not at all skilled in the ways of debugging or reverse engineering, I got syscall spam -- a 200-meg log file -- and when I finally found a decent way to analyze it, I found most of Azureus' system-call wall time is spent in futex(). Huh? I looked up futex on Wikipedia, and I still have no clue how this makes any sense. Either futex was somehow thrashing the disk, or Azureus has somehow managed to fork completely out of strace's control. Or maybe it's something the kernel is doing on its own, which is somehow forcing Azureus to block, but somehow not tripping strace's timers while doing so.

This problem did not always happen with my Reiser4, but unfortunately, I can't pin down exactly when it started. It might have been a kernel upgrade, a Reiser4 upgrade, or an Azureus upgrade. Here's the catch, though -- when I finally tried another client (BitTornado, on the same file), I have had absolutely no thrashing yet. It's hardly touched the disk. I was thinking maybe Azureus synced somehow, and BitTornado didn't, but running sync on the command line took about 2 seconds.
Which means that, with BitTornado, everything works exactly the way it's supposed to. So I'm happy it works, but I'm still curious why Azureus thrashed so much, and BitTornado doesn't thrash at all. Maybe it's the apps? Or Python vs. Java? Or maybe it's something like Evolution and column resizing -- something so embarrassingly inefficient as flushing the column-width information to disk every couple of pixels, which went unnoticed for so long because fsync performs well enough on other filesystems. That's what it seems like to me, but one thing's sure -- it is neither fsync nor fdatasync. I've disabled those at the kernel level. I've still got no clue as to what it is, but I'll be glad to be rid of Azureus just as soon as I can actually find the features I like from it in other BitTorrent clients.
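For what it's worth, the "decent way to analyze" a giant strace log can be a one-liner; here's a sketch on a toy log (the log lines below are fabricated for illustration -- `strace -c` would compute the same tally, plus the time spent, directly):

```shell
# Toy strace -f log: with -f, each line is prefixed by the pid.
cat > trace.log <<'EOF'
4242  futex(0x2aaaab5c, FUTEX_WAIT, 2, NULL) = 0
4242  write(5, "piece", 16384)              = 16384
4243  futex(0x2aaaab5c, FUTEX_WAKE, 1)      = 1
EOF

# Count calls per syscall name: strip everything from '(' onward
# in field 2, then tally and sort by frequency.
awk '{ sub(/\(.*/, "", $2); n[$2]++ } END { for (s in n) print n[s], s }' trace.log | sort -rn
# prints: 2 futex / 1 write
```

Frequency isn't the same as wall time, though -- which is exactly the trap with futex() lines, as the next reply points out.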
Re: BitTorrent+Reiser4: curiouser and curiouser
Konstantin Münning wrote:
> David Masover wrote: (snip)
>> It shouldn't be touching the disk AT ALL when there's over a gig of FREE RAM [...] I found most of Azureus' system call wall time is spent in futex(). Huh? [...]
> Have you used -f or -ff with strace?

I used -f. What's the difference between that and -ff?
Re: BitTorrent+Reiser4: curiouser and curiouser
Alexander Zarochentsev wrote:
> I guess futex(..., FUTEX_WAIT, ...) calls can be ignored in this analysis. They just wait for another thread to call futex(..., FUTEX_WAKE, ...). It would be interesting to find that thread and look at what it was doing before the FUTEX_WAKE. Or does FUTEX_WAIT return ETIMEDOUT?

It probably would be interesting, but I'm a complete newbie at strace. You'll have to walk me through this one step by step...
Re: reiser4 resize
Alexey Polyakov wrote:
> On 9/20/06, Łukasz Mierzwa wrote:
>> It's been proven that flushes are doing much more work than they should. Not so long ago someone sent a trace of block-device I/O accesses during reiser4 work, and someone analyzed it and said that some files, or parts of files, were written over and over, 200 times or so.

Wow. I should go back and read that -- I assume this is being worked on?

>> A few-months-old filesystem that had been used often just shows a weak spot in reiser4: while downloading files with Azureus at only 64KB of data per second, I got the disk LED on almost all the time. Switching to rtorrent helped, as it does not seem to call fsync (I think I disabled fsync in azureus).

Hmm, strange. I am using Azureus, but I don't think it's fsync. I can try rtorrent, but there are several things I like about Azureus that nothing else seems to do yet. But also, Azureus didn't always do this. In fact, I used it for several months before I started having this problem.

> Ah, I see, if bittorrent calls fsync often, it's no wonder that reiser4 behaves badly. I had to preload libnosync for some of my programs that do fsync to avoid this.

Way ahead of you. I noticed how much fsync performance sucked when using vim, and I was sick of waiting 10 seconds every time I hit :w -- a LOT of stuff can pile up in 2 gigs of disk buffer, and at the time, Reiser4's fsync effectively just called sync. I didn't know about libnosync (or it didn't exist yet, or didn't work, I'm not entirely sure), so I was faced with either patching vim, which had just been patched to _add_ fsync'ing (not to mention all the other programs that might fsync too much); patching glibc (huge, I don't update it often, and I'd have no idea where to start); or patching the kernel.
I now keep backups, and I maintain and apply the following (STUPID, DON'T TRY THIS AT HOME) patch to my kernel:

--- linux/fs/buffer.c	2006-08-15 20:40:36.504608696 -0500
+++ linux/fs/buffer.c.new	2006-08-15 20:42:35.877461264 -0500
@@ -366,12 +366,12 @@

 asmlinkage long sys_fsync(unsigned int fd)
 {
-	return __do_fsync(fd, 0);
+	return 0;
 }

 asmlinkage long sys_fdatasync(unsigned int fd)
 {
-	return __do_fsync(fd, 1);
+	return 0;
 }

 /*
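A less invasive alternative to patching the kernel is the libnosync-style LD_PRELOAD shim mentioned earlier in the thread. A minimal sketch (the file names `nosync.c`/`libnosync.so` are mine, and gcc is assumed; this neuters durability only for the wrapped process, not system-wide):

```shell
# Build a tiny shared object whose fsync/fdatasync do nothing.
cat > nosync.c <<'EOF'
/* Override fsync/fdatasync: report success without syncing.
   Data still reaches disk eventually via normal writeback,
   but crashes can lose "synced" data -- same caveat as the
   kernel patch above, minus the system-wide blast radius. */
int fsync(int fd)     { (void)fd; return 0; }
int fdatasync(int fd) { (void)fd; return 0; }
EOF
gcc -shared -fPIC -o libnosync.so nosync.c

# Usage: e.g.  LD_PRELOAD=$PWD/libnosync.so vim somefile
LD_PRELOAD=$PWD/libnosync.so true
```

One caveat: LD_PRELOAD only intercepts calls that go through libc, so statically linked programs (or direct syscalls) still sync.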
Re: reiser4 resize
Alexey Polyakov wrote:
> On 9/19/06, David Masover wrote:
>> When I have over a gig of RAM free (not even buffer/cache, but _free_), and am trying to download anything over BitTorrent, even if it's less than 200 megs, the disk thrashes so badly that the system is really only usable for web and email. Even movies will occasionally stall when this is happening, and by occasionally, I mean every minute or so.
> Do you have this problem on plain vanilla + reiser4?

Yes. Well, no. My kernel is vanilla 2.6.17.13 on amd64, with these patches:

- sk98lin 8.36, latest from the manufacturer
- reiser4-for-2.6.17-3
- my own patch that disables fsync and fdatasync

and these external modules, installed via Portage:

- ALSA 1.0.11 driver, using snd_emu10k1 and all sorts of support stuff (OSS emulation, synth, etc.)
- nvidia driver, 1.0.8762

I've also been having some instability issues, but only very rarely do these seem at all FS-related. I'm overclocked a bit, and I can reliably crash my system by playing Neverball, Doom 3, or Quake 4 for several hours; I strongly suspect either my overclocking or the nvidia drivers here. However, I doubt anything I've done beyond vanilla+reiser4 is affecting this disk-access issue, and I'm pretty much rock solid when I'm not playing a game. I also have a close-to-identical machine nearby which is not overclocked -- same kernel, same modules, everything except the nvidia driver -- and it has been rock solid for a year, with no performance issues to speak of. The main difference, other than graphics, is that the stable machine is using 21 gigs out of 72, whereas the unstable one (the one that's sluggish for BitTorrent) is using 279 gigs out of 350, and has been up to 320 or 330 at least before I started cleaning things out.
So I think we're down to two possibilities: either an update to Azureus has found a way to sync that I'm not aware of, or this is the behavior someone described where Reiser4 attempts to find contiguous space to allocate, and keeps searching and re-searching the same areas of the disk on almost every write. To be honest, I hope it's about syncing, somehow, because I'd much rather believe my disk isn't horrendously fragmented...
Re: reiser4 resize
Vladimir V. Saveliev wrote:
> Hello
> On Tuesday 19 September 2006 05:12, Jack Byer wrote:
>> Short summary: Will a resize program for reiser4 be available within the next six months?
> Currently nobody works on that. So, I guess it is not very likely that reiser4.resize will be created within the next six months.

Not even an expand? I know a shrink depends on a working repacker (even an offline one), but I'd think expanding would be simple enough, so long as there's a big warning of "You cannot undo this (can't shrink)!"

>> When I first created the filesystem, there was a reiser4 resize program. This is no longer the case.
> that was not a working program.

Yes, I remember that; it was a stub.

> I think you should change to a filesystem which has resize.

Alternatively, how much would it cost to implement a basic resizefs.reiser4?

There are other reasons that make me wish I'd stayed away from reiser4 for awhile. Mainly, right now, I need a repacker, and the system seems to have become absurdly slow when it's fragmented. When I have over a gig of RAM free (not even buffer/cache, but _free_), and am trying to download anything over BitTorrent, even if it's less than 200 megs, the disk thrashes so badly that the system is really only usable for web and email. Even movies will occasionally stall when this is happening, and by occasionally, I mean every minute or so. I believe there was a patch to address the thrashing, so I'm eagerly awaiting 2.6.18, but the lack of a repacker bothers me.
Re: v3 rebuild-tree left system in unusable state because of space shortage
Vladimir V. Saveliev wrote:
> while there is no fix currently for this problem you can solve the problem by expanding the underlying device.

Just curious, could it also be fixed by mounting the FS, freeing up some space, then retrying the fsck? Or is the FS unusable?
Re: Relocating files for faster boot/start-up on reiser(fs/4)
Quinn Harris wrote: On Thursday 14 September 2006 23:15, Toby Thain wrote: On 14-Sep-06, at 6:23 PM, David Masover wrote: Quinn Harris wrote: On Thursday 14 September 2006 13:55, David Masover wrote: ... That is a good point. Recording the disk layout before and after to compare relative fragmentation would be a good idea, as well as randomizing the sequence as a sanity check. Also note that during boot I was using readahead on all 3885 files, so the kernel has a good opportunity to rearrange the reads. And the read sequence doesn't necessarily match the order it's needed (though I tried to get that). Speaking of which, did you parallelize the boot process at all? Just off the top of my head, wouldn't that make the access sequence asynchronous and thereby less predictable? (Although I'm sure it's a net win.) It could, but the kernel will try to reorder the outstanding block requests to reduce seeking; whether that is an overall win, I don't know. In addition, early in the boot, readahead-list or similar will tell the kernel to start reading most of the files needed for the complete boot, so they are already in memory when needed. Ubuntu does the readahead now, and all my tests were with readahead. That's interesting. I think either parallelizing or a very aggressive readahead will perform similarly, except in cases where you have a script blocking on something other than disk or CPU -- like, say, network. I'd estimate my system easily spent more than 50% of its boot time not touching the disk at all before I did that. Gentoo can do this; I'm not sure what else can, as it kind of needs your init system to understand dependencies. ... The current Ubuntu boot waits for hardware probing, DHCP, and other things, giving the disk readahead a chance to work. I think this reallocation might help a parallel boot more, as the data will be needed sooner. So I changed my mind: I think a parallel boot will highlight the reallocation advantage. Now I just need to test the hypothesis. Hmm. That's possible.
But again, even with the parallel boot, there was still a bit of time spent not touching the disk, so I wouldn't expect much more of a speedup than what you already have. Which also means, by the way, that I wouldn't use it much -- my system takes more like 20 seconds from GRUB to a login prompt, and from then on, the only things that take more than 5 seconds to load are games. Since I know Quake 4 uses zipfiles (probably compressed) for its storage, and I watched the HD LED while it loads, I don't think I can speed that up at all short of buying a faster CPU. Well, that and the Portage tree, but you say I shouldn't expect much from that. Maybe the Portage cache?

> Not sure if I would be better off trying initng or waiting for upstart (Ubuntu's new init) to get scripts that actually parallel boot. The code for upstart is very clean and it has the backing of a major distro, so I have high hopes.

Hmm. That sounds kind of cool, but I wonder how it compares to Gentoo's init scripts? I guess I'll have to wait till it hits the one Ubuntu box I have...

> Much like before, I was able to improve a 16.5s oowriter cold start to 14s with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using 2.0.3 before).

Wait -- cold start is 14s, but it's also 4.8s? Did you mean warm/hot start for that last number?

> I think Python will be the best language for this because it's become relatively universal and it's easy to understand for the uninitiated. This really isn't black magic so transparency is good. I personally prefer Ruby though.

Wait... Python is more universal than the Ruby of Ruby on Rails? Python is faster, anyway... I'm waiting for someone to do a decent implementation of Ruby on something like .NET before I start using it for anything I want to perform well.
Re: reiser4: mount -o remount,ro / causes error on reboot
Peter wrote:
> Using: gentoo kernel 2.6.17.11 with beyond patchset, reiser patch 2.6.17-3, reiserfsprogs 1.0.5. At the end of the gentoo shutdown script is a short function which remounts / as ro.

There's also one in the Gentoo startup script, which attempts to remount / ro, then remount it rw. I commented that out, because it was causing similar problems. I figure if it runs sync when it shuts down, that's good enough. Still, it's an annoying problem; I think it's a kernel oops. Namesys, what kind of information would be helpful?
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Ric Wheeler wrote:
> David Masover wrote:
>> Hans Reiser wrote:
>>> Ric Wheeler wrote:
>>>> Having mkfs ignore bad writes would seem to encourage users to create a new file system on a disk that is known to be bad; it's most likely not going to function well. If a user ever has a golden opportunity to toss a drive in the trash, it is when they notice mkfs fails ;-) This option to mkfs sounds like an invitation to disaster.
>>> Yes, you are right, the option should be to run badblocks and then fail if it finds any.
>> Unless it creates significantly more work for us, there should be an option to run badblocks, and if it finds any, it should prompt the user (with BIG FAT CAPSLOCK WARNINGS) whether they want to format anyway. Formatting anyway should work, and we should be able to have blocks marked bad.
> I think that you are missing the way modern drives behave. To give a typical example, on a 300 GB drive, we typically have 2000 or more extra sectors that are used for automatic remapping. These sectors are consumed only when the drive retries a failed write multiple times.

Oh, I'm not disputing that mkfs should discourage users from using broken drives. Presumably, smart admins wouldn't see this often, because they'd be monitoring SMART.

> We really, really do not need a list of bad blocks to avoid during writing a new file system image.

Why do you presume to make this decision for users? I don't think we need CONFIG_LEGACY_PTYS -- they're insecure, and almost never needed. But we should still leave them in. The burden is on us to show that it's taking real work to implement and maintain.

> I think that the more interesting case is handling bad blocks during recovery. It is not clear to me that fsck needs a list, but we have worked with Hans and Vladimir to get support for doing a reverse mapping (given a list of bad blocks, show the user what files, etc. got hit).

Yes, it seems like fsck would be much better off that way.
In this case, of course, I'd prefer to avoid hitting that problem -- use RAID, make regular backups, toss out the disk and restore. Being able to repair bad blocks would tend to encourage a user to keep using a bad disk, but I don't want to force my opinion on everyone when there's a reasonable way for all of us to be happy.
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Ric Wheeler wrote:
>> Why do you presume to make this decision for users?
> It's not a decision that I want to make for users, it is a decision that Hans and his team need to make about how best to spend their limited resources.

Agreed. It's not important if it takes more than, say, an hour.

> It will also give more users a bad experience with the file system, since users rarely have the in-depth knowledge required to make this kind of choice.

While it's true that most users just click through dialog boxes, I imagine this would be sufficient:

===WARNING===WARNING===WARNING===
THIS DISK IS BAD! If you continue with the format, we will not help you
when you lose data. When, not if. You are strongly encouraged to
THROW THIS DISK OUT!
ARE YOU ABSOLUTELY SURE YOU WANT TO CONTINUE? (yes/no):

And require an actual "yes" or "no" answer -- no y/n. Now, compare that to a filesystem which doesn't allow badblocks in mkfs at all. While it's rare, I suspect that would be a worse experience if you actually had a real need for it. If you've got a huge 300-gig drive with some bad blocks, you can always throw some stuff on it anyway -- for backup, or stuff you don't care about -- even knowing it'll fail soon. Again, probably not a high-priority item at all, but it certainly won't make the user experience worse. Any user who says yes to the above warning does not get to complain about their experience.

> Here we mostly agree. The need for enhanced tools is not to encourage people to keep using bad drives, rather to allow them to fsck a drive for data recovery. If fsck fails to repair enough to give you at least a readable file system, then you are in real trouble ;-) Also, unlike failing writes, disk read errors are quite often ephemeral and will be self-correcting on the next write (you might get an error from dust, etc. that gets swept clean on the next write).
Either one, I would personally feel quite a lot safer grabbing a disk image and doing the fsck once the image was on known good media. One thing that would be even better here, though, if you don't want to spend the time for a huge backup: A way to tell badblocks to only scan space which is actually being used, and then enough free space to make sure relocations work. If you're mounting readonly, you shouldn't care about marking every single bad sector in free space. I guess this would require a lot more intelligence from fsck, though -- it would have to be able to constantly check for bad blocks, as opposed to just running badblocks once and grabbing a list to avoid.
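The "no y/n" requirement from the warning above is trivial to enforce; a hypothetical sketch (the `confirm_format` name is mine, not mkfs's actual code):

```shell
# Require the user to type the full word "yes"; anything else aborts.
confirm_format() {
    printf '===WARNING=== THIS DISK IS BAD! Format anyway? (yes/no): '
    read answer
    [ "$answer" = "yes" ]    # return status: 0 only on a literal "yes"
}

# Example run, feeding the answer on stdin:
printf 'yes\n' | confirm_format && echo "formatting against advice..."
```

Requiring the whole word means a reflexive keystroke can't get past the prompt, which is the point of the BIG FAT WARNING.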
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Hans Reiser wrote:
>> Ric Wheeler wrote:
>>> Having mkfs ignore bad writes would seem to encourage users to create a new file system on a disk that is known to be bad; it's most likely not going to function well. If a user ever has a golden opportunity to toss a drive in the trash, it is when they notice mkfs fails ;-) This option to mkfs sounds like an invitation to disaster.
> Yes, you are right, the option should be to run badblocks and then fail if it finds any.

Unless it creates significantly more work for us, there should be an option to run badblocks, and if it finds any, it should prompt the user (with BIG FAT CAPSLOCK WARNINGS) whether they want to format anyway. Formatting anyway should work, and we should be able to have blocks marked bad. It would also be nice to be able to change this later -- to pass in a list of badblocks to, say, fsck (which I think is the original request). This is especially nice for recovery, if you don't have the luxury of copying a whole disk image to another drive before running fsck. That's not to say that we should automatically detect and relocate bad blocks during normal operation (while the FS is mounted), but deliberately removing functionality to protect you from yourself isn't the Linux Way. Linux has a long history of kernel config options that say things like "YOU WILL LOSE DATA. You have been warned."
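For reference, e2fsprogs already exposes roughly the workflow being requested here; a sketch against a scratch image file rather than a real disk (tool names are from e2fsprogs; point these at a real device only with great care):

```shell
# Make a small scratch "disk" and scan it for bad blocks, collecting
# any hits in a list file. badblocks' default mode is a non-destructive
# read test; a plain file backed by good media should report nothing.
dd if=/dev/zero of=disk.img bs=1M count=8 2>/dev/null
badblocks -o bad.txt disk.img
wc -l < bad.txt          # should be 0 on a healthy image

# mke2fs can consume that list so the filesystem never allocates those
# blocks:         mke2fs -l bad.txt disk.img
# and e2fsck accepts the same list at repair time:
#                 e2fsck -l bad.txt disk.img
```

This is essentially the mkfs/fsck integration the thread is asking reiser4progs to grow.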
Re: Reiser FS will not boot after crash
[EMAIL PROTECTED] wrote: On Mon, 04 Sep 2006 23:33:27 +0400, Vladimir V. Saveliev said: after unclean shutdown journal replay is necessary to return reiserfs to consistent state. Maybe GRUB did not do that? A case can be made that GRUB should be keeping its grubby little paws off the filesystem journal. It's a *bootloader*. Its only purpose in life is to load other code that can make intelligent decisions about things like how (or even whether) to replay a filesystem journal. But, unlike Lilo, Grub usually has to load that other code from a filesystem, which means it's already doing more than what bootloaders traditionally do. If it were up to me, we'd all be using LinuxBIOS and kexec, and it wouldn't be an issue.
Re: wrt: checking reiserfs/4 partitions on boot
Peter wrote: On the namesys.com FAQ page, it is recommended that 0 0 be placed at the end of the fstab lines for reiserfs partitions. I have two questions: 1) does this recommendation also apply for reiser4? 2) why is this recommendation made? Is it unnecessary to routinely check reiser partitions? I understand that in the event of an abnormal shutdown, fsck will be forced, correct? I think the idea is that in the event of an abnormal shutdown, you simply replay the journal. With Reiser4, the likelihood of having to run fsck should be even lower. It probably isn't yet, but it should be.
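For concreteness, the FAQ's recommendation corresponds to fstab lines like these (device names and mount points are hypothetical):

```
/dev/hda2  /      reiser4   defaults  0 0
/dev/hda3  /home  reiserfs  defaults  0 0
```

The fifth field (fs_freq) tells dump whether to back the filesystem up, and the sixth (fs_passno) controls whether and in what order fsck runs at boot; setting both to 0 disables them, leaving crash recovery to journal replay.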
Re: reiser4 corruption on initial copy
Peter wrote: On Fri, 01 Sep 2006 17:35:29 -0500, David Masover wrote: Peter wrote: 2) I did run badblocks on the dest, and it was clean. 3) I am using the patch from 2.6.17.3 and in my kernel, I have full preempt and cfq scheduling. What about the kernel on the livecd? Anticipatory Voluntary Yes, that should be fine, but I was wondering if it's the same version, if you built it yourself, etc etc. Plus, it is smp, so there are some additional options checked. Should I have preempt=none with reiser4? I'm not sure.
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Vladimir V. Saveliev wrote: Hello On Friday 01 September 2006 22:23, Peter wrote: Perhaps this has been mentioned before. If so, sorry. IMHO, it would be useful to integrate a call to badblocks in the fsck/mkfs.reiser* programs so that more thorough disk checking can be done at format time. Sort of like the option e2fsck -c. If this is added, the output could be fed immediately to the reiser format program and badblocks spared prior to filesystem use. JM$0.02 both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad blocks. We thought that should be enough. It really should. Why bother with a patch? Just write a wrapper script that runs badblocks and passes in the list to mkfs.
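Such a wrapper could be as small as this sketch (hypothetical and untested on real hardware; only run it against a device you intend to reformat):

```shell
#!/bin/sh
# Sketch of the suggested wrapper: run badblocks, then hand the
# resulting list to mkfs.reiserfs via its -B option.
# Usage: mkfs_with_badblocks /dev/sdXN  (hypothetical device name)
mkfs_with_badblocks() {
    dev="$1"
    list=$(mktemp) || return 1
    # badblocks writes one bad block number per line into $list
    badblocks -o "$list" "$dev" || { rm -f "$list"; return 1; }
    mkfs.reiserfs -B "$list" "$dev"
    rc=$?
    rm -f "$list"
    return $rc
}
```

The same shape works for fsck.reiserfs, which takes the same -B option.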
Re: reiser4 corruption on initial copy
Peter wrote: 2) I did run badblocks on the dest, and it was clean. 3) I am using the patch from 2.6.17.3 and in my kernel, I have full preempt and cfq scheduling. What about the kernel on the livecd?
Re: FEATURE Req: integrate badblocks check into fsck.reiser*
Peter wrote: On Fri, 01 Sep 2006 17:27:20 -0500, David Masover wrote: snip... both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad blocks. We thought that should be enough. It really should. Why bother with a patch? Just write a wrapper script that runs badblocks and passes in the list to mkfs. It was just a thought from userland. My perspective was that a user, not a hard-boiled geek, might get lulled into a false sense of security but may not have the wherewithal to write a wrapper. If nothing else, when the final doc is written (did I say final?:)), it should include a notice that badblocks is not run automatically. Well, let's see... Most hard drives come more thoroughly tested at the factory than anything badblocks would do. Also, it seems redundant to have every single mkfs implement its own badblocks flag. I'd suggest a universal wrapper, then, or a modification to the mkfs frontend, so that this works the same way across all filesystems. Something like mkfs -B -t reiser4
Re: Reiser4 und LZO compression
Clemens Eisserer wrote: But speaking of single threadedness, more and more desktops are shipping with ridiculously more power than people need. Even a gamer really Will the LZO compression code in reiser4 be able to use multi-processor systems? Good point, but it wasn't what I was talking about. I was talking about the compression happening on one CPU, meaning even if it takes most of the CPU to saturate disk throughput, your other CPU is still 100% available, meaning the typical desktop user won't notice their apps running slower, they'll just notice disk access being faster.
Re: Reiser4 und LZO compression
PFC wrote: Maybe, but Reiser4 is supposed to be a general purpose filesystem talking about its advantages/disadvantages wrt. gaming makes sense, I don't see a lot of gamers using Linux ;) There have to be some. Transgaming seems to still be making a successful business out of making games work out-of-the-box under Wine. While I don't imagine there are as many who attempt gaming on Linux, I'd guess a significant portion of Linux users, if not the majority, are at least casual gamers. Some will have given up on the PC as a gaming platform long ago, tired of its upgrade cycle, crashes, game patches, and install times. These people will have a console for games, probably a PS2 so they can watch DVDs, and use their computer for real work, with as much free software as they can manage. Others will compromise somewhat. I compromise by running the binary nVidia drivers, keeping a Windows partition around sometimes, and enjoying many old games which have released their source recently and now run under Linux -- as well as a few native Linux games, some Cedega games, and some under straight Wine. Basically, I'll play it on Linux if it works well; otherwise I boot Windows. I'm migrating away from that Windows dependency by making sure all my new game purchases work on Linux. Others will use some or all of the above -- stick to old games, use exclusively stuff that works on Linux (one way or the other), or give up on Linux gaming entirely and use a Windows partition. Anything Linux can do to become more game-friendly is one less reason for gamers to have to compromise. Not all gamers are willing to do that. I know at least two who ultimately decided that, with dual boot, they end up spending most of their time on Windows anyway. These are the people who would use Linux if they didn't have a good reason to use something else, but right now, they do.
This is not the fault of the filesystem, but taking the attitude of There aren't many Linux gamers anyway -- that's a self-fulfilling prophecy; gamers WILL leave because of it. Also, as you said, gamers (like many others) reinvent filesystems and generally use the Big Zip File paradigm, which is not that stupid for a read-only FS (if you cache all file offsets, reading can be pretty fast). However, when you start storing ogg-compressed sound and JPEG images inside a zip file, it starts to stink. I don't like it as a read-only FS, either. Take an MMO -- while most commercial ones load the entire game to disk from install DVDs, there are some smaller ones which only cache the data as you explore the world. Also, even with the bigger ones, the world is always changing with patches, and I've seen patches take several hours to install -- not download, install -- on a 2.4 ghz amd64 with 2 gigs of RAM, on a striped RAID. You can trust me when I say this was mostly disk-bound, which is absurd, because the game took less than half an hour to install in the first place. Even simple multiplayer games -- hell, even single-player games can get fairly massive updates relatively often. Half-Life 2 is one example -- they've now added HDR to the engine. In these cases, you still need access to the data to be as fast as possible (to cut down on load time), and it would be nice to save on space as well, but a zipfile starts to make less sense. And yet, I still see people using _cabinet_ files. Compression at the FS layer, plus efficient storing of small files, makes this much simpler. While you can make the zipfile-fs transparent to a game, even your mapping tools, it's still not efficient, and it's not transparent to your modeling package, Photoshop-alike, audio software, or gcc. But everything understands a filesystem. It depends, you have to consider several distinct scenarios. For instance, on a big Postgres database server, the rule is to have as many spindles as you can.
- If you are doing a lot of full table scans (like data mining etc), more spindles means reads can be parallelized ; of course this will mean more data will have to be decompressed. I don't see why more spindles means more data decompressed. If anything, I'd imagine it would be less reads, total, if there's any kind of data locality. But I'll leave this to the database experts, for now. - If you are doing a lot of little transactions (web sites), it means seeks can be distributed around the various disks. In this case compression would be a big win because there is free CPU to use ; Dangerous assumption. Three words: Ruby on Rails. There goes your free CPU. Suddenly, compression makes no sense at all. But then, Ruby makes no sense at all for any serious load, unless you really have that much money to spend, or until the Ruby.NET compiler is finished -- that should speed things up. besides, it would virtually double the RAM cache size. No it wouldn't, not the way Reiser4 does it.
Re: Reiser4 und LZO compression
Nigel Cunningham wrote: Hi. On Tue, 2006-08-29 at 06:05 +0200, Jan Engelhardt wrote: Hmm. LZO is the best compression algorithm for the task as measured by the objectives of good compression effectiveness while still having very low CPU usage (the best of those written and GPL'd; there is a slightly better one which is proprietary and uses more CPU, LZRW if I remember right). The gzip code base uses too much CPU, though I think Edward made I don't think that LZO beats LZF in both speed and compression ratio. LZF is also available under GPL (dual-licensed BSD) and was chosen in favor of LZO for the next-generation suspend-to-disk code of the Linux kernel. see: http://www.goof.com/pcg/marc/liblzf.html thanks for the info, we will compare them For Suspend2, we ended up converting the LZF support to a cryptoapi plugin. Is there any chance that you could use cryptoapi modules? We could then have a hope of sharing the support. I am throwing in gzip: would it be meaningful to use that instead? The decoder (inflate.c) is already there.

06:04 shanghai:~/liblzf-1.6 l configure*
-rwxr-xr-x  1 jengelh users 154894 Mar  3  2005 configure
-rwxr-xr-x  1 jengelh users  26810 Mar  3  2005 configure.bz2
-rw-r--r--  1 jengelh users  30611 Aug 28 20:32 configure.gz-z9
-rw-r--r--  1 jengelh users  30693 Aug 28 20:32 configure.gz-z6
-rw-r--r--  1 jengelh users  53077 Aug 28 20:32 configure.lzf

We used gzip when we first implemented compression support, and found it to be far too slow. Even with the fastest compression options, we were only getting a few megabytes per second. Perhaps I did something wrong in configuring it, but there's not that many things to get wrong! All that comes to mind is the speed/quality setting -- the number from 1 to 9. Recently, I backed up someone's hard drive using -1, and I believe I was still able to saturate... the _network_. Definitely try again if you haven't changed this, but I can't imagine I'm the first person to think of it.
From what I remember, gzip -1 wasn't faster than the disk. But at least for (very) repetitive data, I was wrong:

eve:~ sanity$ time bash -c 'dd if=/dev/zero of=test bs=10m count=10; sync'
10+0 records in
10+0 records out
104857600 bytes transferred in 3.261990 secs (32145287 bytes/sec)

real    0m3.746s
user    0m0.005s
sys     0m0.627s

eve:~ sanity$ time bash -c 'dd if=/dev/zero bs=10m count=10 | gzip -v1 > test; sync'
10+0 records in
10+0 records out
104857600 bytes transferred in 2.404093 secs (43616282 bytes/sec)
 99.5%

real    0m2.558s
user    0m1.554s
sys     0m0.680s

eve:~ sanity$

This was on OS X, but I think it's still valid -- this is a slightly older Powerbook, with a 5400 RPM drive, 1.6 ghz G4. -1 is still worlds better than nothing. The backup was over 15 gigs, down to about 6 -- loads of repetitive data, I'm sure, but that's where you win with compression anyway. Well, you use cryptoapi anyway, so it should be easy to just let the user pick a plugin, right?
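Anyone who wants to reproduce the comparison without my exact setup can use a sketch like this (assumes gzip is in PATH and /tmp is writable; it compares sizes rather than timings, since timings vary by machine -- the point above is that even -1 achieves large ratios on repetitive data while running several times faster than -9):

```shell
# Build ~8 MiB of highly repetitive data, then compress it with the
# fastest (-1) and best (-9) settings and compare the output sizes.
yes "repetitive data" | head -c 8388608 > /tmp/gzspeed.dat
gzip -c1 /tmp/gzspeed.dat > /tmp/gzspeed.1.gz
gzip -c9 /tmp/gzspeed.dat > /tmp/gzspeed.9.gz
# print the original and both compressed sizes side by side
wc -c /tmp/gzspeed.dat /tmp/gzspeed.1.gz /tmp/gzspeed.9.gz
```

Prefix the two gzip lines with `time` to see the speed difference as well.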
Re: Reiser4 und LZO compression
PFC wrote: Would it be, by any chance, possible to tweak the thing so that reiserfs plugins become kernel modules, so that the reiserfs core can be put in the kernel without the plugins slowing down its acceptance ? I don't see what this has to do with cryptoapi plugins -- those are not related to Reiser plugins. As for the plugins slowing down acceptance, it's actually the concept of plugins and the plugin API -- in other words, it's the fact that Reiser4 supports plugins -- that is slowing it down, if anything about plugins is still an issue at all. Making them modules would make it worse. Last I saw, Linus doesn't particularly like the idea of plugins because of a few misconceptions, like the possibility of proprietary (possibly GPL-violating) plugins distributed as modules -- basically, something like what nVidia and ATI do with their video drivers. As it is, a good argument in favor of plugins is that this kind of thing isn't possible -- we often put plugins in quotes because really, it's just a nice abstraction layer. They aren't any more plugins than iptables modules or cryptoapi plugins are. If anything, they're less, because they must be compiled into Reiser4, which means either one huge monolithic Reiser4 module (including all plugins), or everything compiled into the kernel image. (and updating plugins without rebooting would be a nice extra) It probably wouldn't be as nice as you think. Remember, if you're using a certain plugin in your root FS, it's part of the FS, so I don't think you'd be able to remove that plugin any more than you're able to remove reiser4.ko if that's your root FS. You'd have to unmount every FS that uses that plugin. At this point, you don't really gain much -- if you unmount every last Reiser4 filesystem, you can then remove reiser4.ko, recompile it, and load a new one with different plugins enabled. Also, these things would typically be part of a kernel update anyway, meaning a reboot anyway. 
But suppose you could remove a plugin, what then? What would that mean? Suppose half your files are compressed and you remove cryptocompress -- are those files uncompressed when the plugin goes away? Probably not. The only smart way to handle this that I can think of is to make those files unavailable, which is probably not what you want -- how do you update cryptocompress when the new reiser4_cryptocompress.ko is itself compressed? That may be an acceptable solution for some plugins, but you'd have to be extremely careful which ones you remove. The only safe way I can imagine doing this may not be possible, and if it is, it's extremely hackish -- load the plugin under another module name, so r4_cryptocompress would be r4_cryptocompress_init -- have the module, once loaded, do an atomic switch from the old one to the new one, effectively in-place. But that kind of solution is something I've never seen attempted, and only really heard of in strange environments like Erlang. It would probably require much more engineering than the Reiser team can handle right now, especially with their hands full with inclusion. The patch below is so-called reiser4 LZO compression plugin as extracted from 2.6.18-rc4-mm3. I think it is an unauditable piece of shit and thus should not enter mainline. Like lib/inflate.c (and this new code should arguably be in lib/). The problem is that if we clean this up, we've diverged very much from the upstream implementation. So taking in fixes and features from upstream becomes harder and more error-prone. I'd suspect that the maturity of these utilities is such that we could afford to turn them into kernel code in the expectation that any future changes will be small. But it's not a completely simple call. (iirc the inflate code had a buffer overrun a while back, which was found and fixed in the upstream version).
Re: Reiser4 und LZO compression
Gregory Maxwell wrote: On 8/29/06, David Masover [EMAIL PROTECTED] wrote: [snip] Conversely, compression does NOT make sense if:
- You spend a lot of time with the CPU busy and the disk idle.
- You have more than enough disk space.
- Disk space is cheaper than buying enough CPU to handle compression.
- You've tried compression, and the CPU requirements slowed you more than you saved in disk access.
[snip] It's also not always this simple ... if you have a single-threaded workload that doesn't overlap CPU and disk well, (de)compression may be free even if you're still CPU bound a lot, as the compression is using cpu cycles which would have been otherwise idle. Isn't that implied, though -- if the CPU is not busy (run top under a 2.6 kernel and you'll see an IO-Wait number), then the first condition isn't satisfied -- CPU is not busy, disk is not idle. But speaking of single-threadedness, more and more desktops are shipping with ridiculously more power than people need. Even a gamer really won't benefit that much from having a dual-core system, because multithreading is hard, and games haven't been doing it properly. John Carmack is pretty much the only superstar programmer in video games, and his first fairly massive attempt to make Quake 3 use two threads (since he'd just gotten a dual-core machine to play with) actually resulted in the game running some 30-40% slower than it did with a single thread. So, for the desktop, compression makes perfect sense. We don't have massive amounts of RAID. If we have newer machines, there's a good chance we'll have one CPU sitting mostly idle while playing games. Short of gaming, there are few desktop applications that will fully utilize even one reasonably fast CPU.
The reason gamers buy dual-core systems is that they're getting cheap enough to be worth it, and that one core sitting idle is a perfect place to do OS/system work not related to the game -- antivirus, automatic update checks, the inevitable background processes leeching a few percent off your available CPU. So for the typical new desktop with about 2 ghz of 64-bit processor sitting idle, compression is essentially free.
Re: Reiser4 und LZO compression
Hans Reiser wrote: David Masover wrote: John Carmack is pretty much the only superstar programmer in video games, and his first fairly massive attempt to make Quake 3 use two threads (since he'd just gotten a dual-core machine to play with) actually resulted in the game running some 30-40% slower than it did with a single thread. Do the two processors have separate caches, and thus being overly fine-grained makes you memory-transfer bound, or? It wasn't anything that intelligent. Let me see if I can find it... Taken from http://techreport.com/etc/2005q3/carmack-quakecon/index.x?pg=1 Graphics accelerators are a great example of parallelism working well, he noted, but game code is not similarly parallelizable. Carmack cited his Quake III Arena engine, whose renderer was multithreaded and achieved up to 40% performance increases on multiprocessor systems, as a good example of where games would have to go. (Q3A's SMP mode was notoriously crash-prone and fragile, working only with certain graphics driver revisions and the like.) Initial returns on multithreading, he projected, will be disappointing. Basically, it's hard enough to split what we currently do onto even 2 CPUs, and it definitely seems like we're about to hit a wall in CPU frequency just as multicore becomes a practical reality, so future CPUs may be measured in how many cores they have, not how fast each core is. There's also a question of what to use the extra power for. From the same presentation: Part of the problem with multithreading, argued Carmack, is knowing how to use the power of additional CPU cores to enhance the game experience. A.I. can be effective when very simple, as some of the first Doom logic was. It was less than a page of code, but players ascribed complex behaviors and motivations to the bad guys. However, more complex A.I. seems hard to improve to the point where it really changes the game.
More physics detail, meanwhile, threatens to make games too fragile as interactions in the game world become more complex. So, I humbly predict that physics cards (so-called PPUs) will fail, and be replaced by ever-increasing numbers of cores, which will, for a while, be one step ahead of what we can think of to fill them with. Thus, anything useful (like compression) that can be split off into a separate thread is going to be useful for games, and won't hurt performance on future mega-multicore monstrosities. The downside is, most game developers are working on Windows, for which FS compression has always sucked. Thus, they most often implement their own compression, often something horrible, like storing the whole game in CAB or ZIP files, and loading the entire level into RAM before play starts, making load times less relevant during gameplay. Reiser4's cryptocompress would be a marked improvement over that, but it would also not be used in many games.
Re: Reiser4 und LZO compression
Toby Thain wrote: Gamer systems, whether from coder's or player's p.o.v., would appear fairly irrelevant to reiserfs and this list. I'd trust Carmack's eye candy credentials but doubt he has much to say about filesystems or server threading... Maybe, but Reiser4 is supposed to be a general purpose filesystem, so talking about its advantages/disadvantages wrt. gaming makes sense, especially considering gamers are the most likely to tune their desktop for performance. That was a bit much, though. I apologize.
Re: Reiser4 und LZO compression
Andrew Morton wrote: On Sun, 27 Aug 2006 04:34:26 +0400 Alexey Dobriyan [EMAIL PROTECTED] wrote: The patch below is so-called reiser4 LZO compression plugin as extracted from 2.6.18-rc4-mm3. I think it is an unauditable piece of shit and thus should not enter mainline. Like lib/inflate.c (and this new code should arguably be in lib/). The problem is that if we clean this up, we've diverged very much from the upstream implementation. So taking in fixes and features from upstream becomes harder and more error-prone. Well, what kinds of changes have to happen? I doubt upstream would care about moving some of it to lib/ -- and anyway, reiserfs-list is on the CC. We are speaking of upstream in the third person in the presence of upstream, so... Maybe just ask upstream?
Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching
Jeff Mahoney wrote: When a file system becomes fragmented (using MythTV, for example), the bigalloc window searching ends up causing huge performance problems. In a file system presented by a user experiencing this bug, the file system was 90% free, but no 32-block free windows existed on the entire file system. This causes the allocator to scan the entire file system for each 128k write before backing down to searching for individual blocks. Question: Would it be better to take that performance hit once, then cache the result for a while? If we can't find enough consecutive space, such space isn't likely to appear until a lot of space is freed or a repacker is run. In the end, finding a contiguous window for all the blocks in a write is an advantageous special case, but one that can be found naturally when such a window exists anyway. Hmm. Ok, I don't understand how this works, so I'll shut up.
Re: problem with reiser3
Marcos Dione wrote: On Mon, Aug 21, 2006 at 08:23:30PM -0500, David Masover wrote: it would be better to create a backup on a spare bigger partition using dd_rescue (pad not recoverable zones with zeroes), then run fsck on the created image. unluckily I can't. it's a 160 GiB partition and I don't have spare space. How much spare space do you have? You may be able to do some tricks with dm_snapshot... right now, I have 45 MiB of space in my spare disk. I *could* (should?) make more space, but can't guarantee anything. That won't be enough. Worst case, decide whether the data on that 160 gig partition is worth buying a cheap 200 or 300 gig drive for this backup.
Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching
Jeff Mahoney wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 David Masover wrote: Jeff Mahoney wrote: When a file system becomes fragmented (using MythTV, for example), the bigalloc window searching ends up causing huge performance problems. In a file system presented by a user experiencing this bug, the file system was 90% free, but no 32-block free windows existed on the entire file system. This causes the allocator to scan the entire file system for each 128k write before backing down to searching for individual blocks. Question: Would it be better to take that performance hit once, then cache the result for a while? If we can't find enough consecutive space, such space isn't likely to appear until a lot of space is freed or a repacker is run. The problem is that finding the window isn't really a direct function of free space, it's a function of fragmentation. You could have a 50% full file system that still can't find a 32 block window by having every other block used. I know it's an extremely unlikely case, but it demonstrates the point perfectly. Maybe, but it's still not a counterpoint. No matter how fragmented a filesystem is, freeing space can open up contiguous space, whereas if space is not freed, you won't open up contiguous space. Thus, if your FS is 50% full and 100% fragmented, then you wait till space is freed, because if nothing happens, or if more space is filled in, you'll have the same problem at 60% as you did at 50%. If, however, you're at 60% full, and 10% of the space is freed, then it's fairly unlikely that you still don't have contiguous space, and it's worth it to scan once more at 50%, and again if it then drops to 40%. So, if your FS is 90% full and space is being freed, I'd think it would be worth it to scan again at 80%, 70%, and so on. I'd also imagine it would do little or nothing to constantly monitor an FS that stays mostly full -- maybe give it a certain amount of time, but if we're repacking anyway, just wait for a repacker run.
It seems very unlikely that between repacker runs, activity between 86% and 94% would open up contiguous space. It's still not a direct function of freed space (as opposed to free space), but it starts to look better. I'm not endorsing one way or the other without benchmarks, though. In the end, finding a contiguous window for all the blocks in a write is an advantageous special case, but one that can be found naturally when such a window exists anyway. Hmm. Ok, I don't understand how this works, so I'll shut up. If the space after the end of the file has 32 or more blocks free, even without the bigalloc behavior, those blocks will be used. For what behavior -- appending? Also, I think the bigalloc behavior just ultimately ends up introducing even more fragmentation on an already fragmented file system. It'll keep contiguous chunks together, but those chunks can end up being spread all over the disk. This sounds like the NTFS strategy, which was basically to allow all hell to break loose -- above a certain chunk size. Keep chunks of a certain size contiguous, and you limit the number of seeks by quite a lot.
Re: problem with reiser3
Marcos Dione wrote: it would be better to create a backup on a spare bigger partition using dd_rescue (pad not recoverable zones with zeroes), then run fsck on the created image. unluckily I can't. it's a 160 GiB partition and I don't have spare space. How much spare space do you have? You may be able to do some tricks with dm_snapshot...
Re: reiserfs and IDE write cache
Francisco Javier Cabello wrote: Hello, I have been 'googling' and I have found a lot of people warning about the problems with IDE write cache and journaling filesystems. These problems exist with ANY filesystem, journaling or not. They also exist with no filesystem at all. Should I disable write cache in my systems using reiserfs3+2.4.25? I'm not sure if it will help. At least with IDE drives, I often cannot get the write cache disabled -- it's as if it ignores hdparm. So, like Toby says, get a UPS. Or get a laptop instead, if that makes any sense.
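For reference, the knob in question is hdparm's -W flag. A minimal sketch (hypothetical device name; needs root, and as noted above some drives simply ignore the request, which is why the setting is read back afterwards):

```shell
#!/bin/sh
# Sketch: ask the drive to turn its write cache off, then query the
# setting to see whether it actually took effect.
# Usage: disable_write_cache /dev/hda  (hypothetical device)
disable_write_cache() {
    dev="$1"
    hdparm -W0 "$dev" || return 1  # -W0 requests write cache off
    hdparm -W "$dev"               # query; should report write-caching = 0
}
```

If the query still reports the cache as on, the drive is ignoring hdparm, and a UPS is the remaining option.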
Re: some testing questions
Hans Reiser wrote: Ingo Bormuth wrote: #df: /dev/hda8 6357768 3478716 2879052 55% /cache Before doing so, the partition was 90% full. The performance difference between 90% full and 55% full will be large on every filesystem. When we ship a repacker, that will be less true, because we will have large chunks of unused space after the repacker runs. Not always true. For one, doesn't Reiser4 arbitrarily reserve 5%? For another, look at his results -- unless I'm wrong, that's 3-7% fragmentation. If I'm wrong, it's more like .03-.07%. And lastly, at a certain point, percentages aren't really that accurate. I've got a 350 or 400 gig partition which is 95% full, according to df (which if I was right about that 5%, it's more like 90% full) and that still leaves a solid 10-20 gigs free. I mean, yes, performance will eventually start to suffer, but how much time and activity will it take to fragment 20 gigs of free space, especially with lazy allocation?
Re: the 'official' point of view expressed by kernelnewbies.org
Edward Shishkin wrote: Tom Reinhart wrote: Anyone with serious need for data integrity already uses RAID, so why add brand new complexity for a solved problem? RAID is great at recovering data, but not at detecting errors. A file system can detect errors with checksums. What is missing is an API between layers for the filesystem to say this sector is bad, go rebuild it. Actually we don't need a special API: the kernel should warn and recommend running fsck, which scans the whole tree and handles blocks with bad checksums. What does this have to do with RAID, though?
Re: the 'official' point of view expressed by kernelnewbies.org
Edward Shishkin wrote: David Masover wrote: Edward Shishkin wrote: Tom Reinhart wrote: Anyone with serious need for data integrity already uses RAID, so why add brand new complexity for a solved problem? RAID is great at recovering data, but not at detecting errors. A file system can detect errors with checksums. What is missing is an API between layers for the filesystem to say this sector is bad, go rebuild it. Actually we don't need a special API: the kernel should warn and recommend running fsck, which scans the whole tree and handles blocks with bad checksums. What does this have to do with RAID, though? I assumed we don't have RAID: reiser4 can support its own checksums/ECC signatures for (meta)data protection via a node plugin. We don't have a guaranteed RAID; however, it would be nice to do the right thing when there is RAID.
Re: The Infamous Reiser4-randomly-blocks-for-ages-and-writes-the-hd-continously-in-the-mean-while now with a btrace log! (hope it helps)
Vesa Kaihlavirta wrote: Incidentally, I've witnessed similar behaviour in various simple tasks, e.g. writing entries to an sqlite database, or receiving mail from pop3 in thunderbird. Sounds like fsync issues. That is being worked on.
Re: The Infamous Reiser4-randomly-blocks-for-ages-and-writes-the-hd-continously-in-the-mean-while now with a btrace log! (hope it helps)
Łukasz Mierzwa wrote: Dnia Thu, 10 Aug 2006 20:48:59 +0200, David Masover [EMAIL PROTECTED] napisał: Vesa Kaihlavirta wrote: Incidentally, I've witnessed similar behaviour in various simple tasks, e.g. writing entries to an sqlite database, or receiving mail from pop3 in thunderbird. Sounds like fsync issues. That is being worked on. I think it's writeout that's involved. I tried disabling fsync, and it helped for apps that call fsync to keep data integrity (like sqlite), but it also happens when I'm downloading files using rtorrent, which does not call fsync but generates many little writes. Hmm. Fragmentation, maybe? Is this easily reproducible with a freshly-formatted fs? I'm just guessing here...
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jan Engelhardt wrote: Yes, it looks like a business of the node plugin, but AFAIK, you objected against such checks: Did I really? Well, I think that allowing users to choose whether to checksum or not is a reasonable thing to allow them. I personally would skip the checksum on my computer, but others... It could be a useful mkfs option. It should preferably be a runtime-tunable variable, at best even per-superblock and (overriding the sb setting) per-file. Sounds almost exactly like a plugin. And yes, that would be the way to do it, especially considering some files will already have internal consistency checking -- just as we should allow direct disk IO to some files (no journaling) when the files in question are databases that do their own journaling.
Re: article abour Reiser4 on linux.com
Andreas Schäfer wrote: On 02:28 Wed 09 Aug , Hans Reiser wrote: Unfortunately, it's not one of which editors approve. It too easily looks as though the writer is being influenced by the source. If I were to do so, I'd risk being banned from publication. Uhm... interesting. It's not that I have so much experience with the press (just three interviews so far), but every time I got the article for review before publication. If you didn't trust the source in the first place, why should you bother to take information from it at all? If you do trust it, why not ask again? Hmm. Except in this case, they were summarizing a rather large debate, so it's not a question of trusting the source or not, it's a question of whether you want to fact-check with every person on reiserfs-list and lkml, until you've got the whole thing so debated and watered-down that it's meaningless. Then, too, sometimes it's better to check ahead of time than to get it wrong and have to correct later, because people won't always read the corrections.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Hans Reiser wrote: Pavel Machek wrote: Yes, I'm afraid redundancy/checksums kill write speed. They kill write speed to cache, but not to disk; our compression plugin is faster than the uncompressed plugin. Regarding cache, do we do any sort of consistency checking for RAM, or do we leave that to some of the stranger kernel patches -- or just an occasional memtest?
Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)
Christian Trefzer wrote: On Sun, Aug 06, 2006 at 04:23:16PM +0200, Maciej Sołtysiak wrote: There also is an issue with grub. The kernel alone is fine for creating partitions (or loop devices) but with grub not patched we can't install boot partitions. No biggy, I guess, but still a problem. Few people keep a 32MB ext2 for /boot purposes these days, so it really is imperative that grub can read kernel images off a reiser4 /. I think there are patches, but I do keep a 32 meg ext3 for /boot, because it seems like no matter what FS I choose, there's some sort of caveat involving Grub. I know when installing XFS as a root FS on Ubuntu, it talks about Grub problems... I mean, having Grub support everything would be nice, but if you're reformatting anyway, I don't think it's that imperative.
Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)
Maciej Sołtysiak wrote: Hello David, hi I have built today an r4-patched ubuntu kernel package (yes, debs!) Sounds good. I don't have an ubuntu to test with at the moment, though. Please note, that this is done all under virtualization (Microsoft Virtual PC). Not to nitpick, but isn't that emulation? Or have they actually done real virtualization yet?
Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)
Maciej Sołtysiak wrote: Hello David, Tuesday, August 8, 2006, 1:23:01 AM, you wrote: Sounds good. I don't have an ubuntu to test with at the moment, though. Well, both MS Virtual PC and VMWare are free of charge, so installing is a real snap. Under what, though? I don't want MS crap on my OS X (need that for work ATM), and I can't imagine they've ported it to Linux. I have no reason to boot Windows except for games, and if I was going to do that, I may as well shrink my Windows partition to make room for a native install. Which would be fine, but it's a lot of work when I don't run Ubuntu normally. I'd be willing to test on the one Ubuntu server I run, but it's across the country until next week, and also work-critical. Not to nitpick, but isn't that emulation? Or have they actually done real virtualization yet? I don't know the differences, can you shed some light? AFAICS M$ will be shipping Virtual PC with Vista to allow people run older software under virtual machines. (be it virtualized or emulated) Still hard to say. Virtualization splits up the real hardware. It's like a scheduler, only for OSes. Emulation is more like an interpreter -- it reads each instruction and then executes something that does the same thing. Emulation can work from any arch to any arch, so Rosetta (allowing PPC OS X apps to run on OS X86) is emulation. Emulation is usually at least 2x slower than native. Virtualization usually approaches native for CPU stuff, but at least disk IO and graphics usually have to be emulated -- so no 3D acceleration, so no games under a guest OS. If MS wanted to do the best possible thing for their consumers, they'd give you a free XP under VirtualPC with Vista, and actually do virtualization. 
If M$ wanted to make it even more likely for people to want to upgrade to Vista, they might deliberately make it cost tons of money and make it emulation, so that XP looks slower, and native Vista apps look so much faster that people complain until everything works on Vista. If Virtual PC is emulation, maybe Virtual Server 2005 R2 (also free of charge) is virtualization. I have no idea what Virtual Server is.
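The emulation-vs-virtualization distinction above can be made concrete with a toy interpreter (entirely illustrative; this is not how Virtual PC works). An emulator fetches each guest instruction and re-implements it in host code, paying a per-instruction tax; a virtualizer lets guest code run natively and only intervenes on privileged operations:

```python
# Toy "guest" instruction set, interpreted one instruction at a time.
# The fetch-decode-execute loop below is the overhead emulation always
# pays; a virtualizer would run the same arithmetic as native code.
def emulate(program, regs):
    for op, *args in program:
        if op == "mov":        # mov reg, constant
            regs[args[0]] = args[1]
        elif op == "add":      # add reg, reg
            regs[args[0]] += regs[args[1]]
    return regs

regs = emulate([("mov", "a", 2), ("mov", "b", 3), ("add", "a", "b")], {})
print(regs["a"])  # → 5
```

Three host-level dictionary lookups and branches per guest instruction is why emulation is "usually at least 2x slower than native", as the post says.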
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
[EMAIL PROTECTED] wrote: It seems that finding all the bits and pieces to do ext3 on-line expansion has been a study in obfuscation. Somewhat surprising since this feature is a must for enterprise class storage management. Not really. Having people who can dig through the obfuscation is also a must for enterprise class anything. The desktop is where it's really crucial to have good documentation and ease of use. The enterprise can afford to pay people who already knew it well, helped to develop it... Grandma probably got Linux because she couldn't afford a new OS, or computer. Of course, I won't go so far as to try to say Linux should focus on this. Linux should focus on whatever Linux developers feel like focusing on.
Re: Another article abour Reiser4 on linux.com
Lexington Luthor wrote: Bernd Schubert wrote: An alternative might be a reiser4 fuse port. Has some advantages: Please please no. The kernel people will use that as an argument for keeping it out of the kernel. They'll use anything as an argument for keeping it out of the kernel. This one is particularly shallow, especially if we still have the kernel version, because the performance difference will be significant. If it isn't, maybe it is time for things like FUSE to take us in the direction of microkernels... I want reiser4 to be popular enough to make my apps depend on it and not have the users complain about having to use an obscure fs. Well, an obscure program (FUSE) is probably a lot easier to convince users of than an obscure filesystem (reiser4 in-kernel). Besides, the only thing about reiser4 that interests me more than XFS or reiserfs is the speed. That's you. There are other reasons to like it. But I agree with you in that I don't think it's worth the resources to do a FUSE port, especially when there is (again) NO guarantee that anything we do will get us in the kernel, so better to do things that will either get us users anyway (like distro inclusion) or do things the kernel people specifically ask for.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Pavel Machek wrote: On Tue 01-08-06 11:57:10, David Masover wrote: Horst H. von Brand wrote: Bernd Schubert [EMAIL PROTECTED] wrote: While filesystem speed is nice, it also would be great if reiser4.x would be very robust against any kind of hardware failures. Can't have both. Why not? I mean, other than TANSTAAFL, is there a technical reason for them being mutually exclusive? I suspect it's more we haven't found a way yet... What does the acronym mean? There Ain't No Such Thing As A Free Lunch. Yes, I'm afraid redundancy/checksums kill write speed, and you need that for robustness... Not necessarily -- if you do it on flush, and store it near the data it relates to, you can expect a similar impact to compression, except that due to slow disks, the compression can actually speed things up 2x, whereas checksums should be some insignificant amount slower than 1x. Redundancy, sure, but checksums should be easy, and I don't see what robustness (abilities of fsck) has to do with it. You could have a filesystem that can be tuned for reliability and tuned for speed... but you can't have both in one filesystem instance. That's an example of TANSTAAFL, if it's true.
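The claim that checksums-on-flush cost little can be made concrete: a CRC over each dirty block is pure CPU work, microseconds per block, next to the milliseconds of seek and rotation for the disk write it accompanies. A sketch (the 4 KiB block size and CRC32 are my assumptions, not reiser4's actual design):

```python
import zlib

BLOCK = 4096  # assumed filesystem block size

def checksum(block: bytes) -> int:
    # One CRC32 per block at flush time: cheap CPU work alongside
    # an expensive disk write, hence "insignificantly slower than 1x".
    return zlib.crc32(block)

good = checksum(bytes(BLOCK))
# CRC32 detects all single-bit errors, so any bit flip is caught:
flipped = b"\x01" + bytes(BLOCK - 1)
print(checksum(flipped) != good)  # → True
```

Redundancy (storing a second copy) doubles the disk traffic, which is why the post concedes that point while arguing checksums are nearly free.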
Re: Another article abour Reiser4 on linux.com
Tassilo Horn wrote: [1] http://www.linux.com/article.pl?sid=06/07/31/1548201 From the article: To complicate matters, Reiser4's approach lands the filesystem in the middle of a longstanding convention of avoiding plugins in the kernel, mainly to avoid architectural complications, but also to discourage proprietary drivers that circumvent the kernel's release under the GNU General Public License. We should really find something better to call them than plugins, or we should come up with a standard copy'n'paste statement to refute this.
Re: Another article abour Reiser4 on linux.com
Clay Barnes wrote: I like using a term that is already in an accepted part of the kernel. Extensions might smack of plugins a bit much, and we're trying to avoid just doing a s/plugins/extensions/ of the arguments we're seeing now. We could do that with almost anything: Or just modules... netfilter has modules that allow us to write very cool and weird stuff (like unclean match once was) and nobody complains. Except that modules could also possibly remind people of proprietary modules, like the nvidia/ATI/vmware stuff. Still, if we allow netfilter, why not Reiser4 modules? Another word could be 'hooks' I don't think this would quite work. A hook describes more the place you connect to, whereas a module/plugin/whatever... Think of it this way -- the hook is what a plugin would plug in to. So it may not matter much what we name them, we're probably still going to need that cut'n'paste argument. Might be easier with modules, though.
Re: Another article abour Reiser4 on linux.com
Clay Barnes wrote: I think the core thing we have to have to win this argument is a) A word that isn't *instantly* associated with banned things. That'd be nice. b) The ability to point to the design and say: Look, it's *impossible* to use this design to put binary modules into the kernel. Even if it's as hard as ATI or nVidia modules to put it in, that'll be enough to put up a fight against inclusion. Why? Why does it have to be impossible to do binary things with the kernel? I mean, if Linus hates GPL3 because it limits what people do with the kernel... Besides, you can't make it impossible, you can only make it about as hard as it is now. The license is the issue here. The *only* way to win a political/personal fight is to remove any possible objection until resistance looks purely stupid and wholly unsubstantiated. I agree. That's why we not only need a new name, we also need a cut'n'paste argument that just makes this look stupid. And it has to be short enough that cut'n'paste isn't bad, because if we refer people to the FAQ, they won't read it. I was just saying to my roommate that I was losing hope for Reiser4 because I didn't see an end to the politics any time soon. Yes, it can look pretty hopeless. There's only one possible way I see to get in. You must ask for an absolute list of things that are objectionable. You should then ask *before you start work* about removal of any items that are either a) impossible, or b) illogical. Once you've gotten the official stamp of approval of the (possibly revised) absolute list of objections, you have to do it, completely and exactly. If they agreed that that is everything they find wrong and promised that they would include Reiser4 if those issues were resolved, then they really *have* to put it in then. The problem is, they don't. There have been some fairly definitive lists in the past, that were done, but maybe not quite the way they were expected.
The core of all this is that rather than leaving an open-ended task that can be expanded at will, they are given limits to how long the objections can be spread out. Problem is, dictators can do whatever they want, even if they said something else before. And that's all assuming you can get them to agree to such a list, and agree to abide by it. They either wouldn't go for it, or they would come up with a list that effectively kills Reiser4, turning it into ext3.
Re: Another article abour Reiser4 on linux.com
TongKe Xue wrote: A really stupid question ... why not put Reiser4 in one of the BSDs? And after it's got mainstream use, if it proves its worth, there'll be more pressure for Linux to adopt. It will likely take far more work to port it to BSD than it will to be included in Linux. And you're talking about probably even less chance of inclusion or of picking up a large community than in Linux.
Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Russell Leighton wrote: Is there a recovery mechanism, or do you just be happy you know there is a problem (and go to backup)? You probably go to backup anyway. The recovery mechanism just means you get to choose the downtime to restore from backup (if there is downtime), versus being suddenly down until you can restore.
Re: reiser4: maybe just fix bugs?
Theodore Tso wrote: On Tue, Aug 01, 2006 at 11:55:57AM -0500, David Masover wrote: If I understand it right, the original Reiser4 model of file metadata is the file-as-directory stuff that caused such a furor the last big push for inclusion (search for Silent semantic changes in Reiser4): The furor was caused by concerns Al Viro expressed about locking/deadlock issues that reiser4 introduced. Which, I believe, was about file-as-dir. Which also had problems with things like directory loops. That's sort of a disk space memory leak. The bigger issue with xattr support is two-fold. First of all, there are the programs that are expecting the existing extended attribute interface, Yeah... More important are the system-level extended attributes, such as those used by SELINUX, which by definition are not supposed to be visible to the user at all, I don't see why either of these are issues. The SELINUX stuff can be a plugin which doesn't necessarily have a user-level interface. Cryptocompress, for instance, exists independent of its user-level interface (probably the file-as-dir stuff), and will probably be implemented in some sort of stable form as a system-wide default for new files. So, certainly metadata (xattrs) as a plugin could be implemented with no UI at all, or any given UI. ... Anyway, I still see no reason why these cannot be implemented in Reiser4, other than the possibility that if it uses plugins, I guarantee that at least one or two people will hate the implementation for that reason alone. Not supporting xattrs means that those distros that use SELINUX by default (i.e., RHEL, Fedora, etc.) won't want to use reiser4, because SELINUX won't work on reiser4 filesystems. Right. So they will be implemented, eventually. Whether or not Hans cares about this is up to him. He does, or he should. Reiser4 needs every bit of acceptance it can get right now, as long as it can get it without compromising its goals or philosophy.
Extended attributes only compromise these because they provide less incentive to learn any other metadata interface that Reiser4 provides. But that's irrelevant: if Reiser4 doesn't gain enough acceptance due to lack of xattr support, anything else it has will be irrelevant anyway. So just as we provide the standard interface to Unix permissions (even though we intend to implement things like acls and views, and even though there was a file/.pseudo/rwx interface), we should provide the standard xattr interface, and the standard direct IO interface, and anything else that's practical. Be a good, standard filesystem first, and an innovative filesystem second.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Horst H. von Brand wrote: Vladimir V. Saveliev [EMAIL PROTECTED] wrote: On Tue, 2006-08-01 at 17:32 +0200, Łukasz Mierzwa wrote: What fancy (beside cryptocompress) does reiser4 do now? it is supposed to provide an ability to easily modify filesystem behaviour in various aspects without breaking compatibility. If it just modifies /behaviour/ it can't really do much. And what can be done here is more the job of the scheduler, not of the filesystem. Keep your hands off it! Say wha? There's a lot you can do with the _representation_ of the on-disk format without changing the _physical_ on-disk format. As a very simple example, a plugin could add a sysfs-like folder with information about that particular filesystem. Yes, I know there are better ways to do things, but there are things you can change about behavior without (I think) touching the scheduler. Or am I wrong about the scope of the scheduler? If it somehow modifies /on disk format/, it (by *definition*) isn't compatible. Ditto. Cryptocompress is compatible with kernels that have a working cryptocompress plugin. Other kernels will notice that they are meant to be read by cryptocompress, and (I hope) refuse to read files they won't be able to. Same would be true of any plugin that changes the disk format. But, the above comments about behavior still hold. There's a lot you can do with plugins without changing the on-disk format. If you want a working example, look to your own favorite filesystems that support quotas, xattrs, and acls -- is an on-disk FS format with those enabled compatible with a kernel that doesn't support them (has them turned off)? How about ext3, with its journaling -- is the journaling all in the scheduler? But isn't the ext3 disk format compatible with ext2? quota support xattrs and acls Without those, it is next to useless anyway. What is? The FS? I use neither on desktop machines, though I'd appreciate xattrs for Beagle. Or are you talking about the plugins? See above, then.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Alan Cox wrote: On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote: WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc.. you don't need your filesystem being super-robust against bad sectors and such stuff because: You do, it turns out. It's becoming an issue more and more that the sheer amount of storage means that the undetected error rate from disks, hosts, memory, cables and everything else is rising. Yikes. Undetected. Wait, what? Disks, at least, would be protected by RAID. Are you telling me RAID won't detect such an error? It just seems wholly alien to me that errors would go undetected, and we're OK with that, so long as our filesystems are robust enough. If it's an _undetected_ error, doesn't that cause way more problems (impossible problems) than FS corruption? Ok, your FS is fine -- but now your bank database shows $1k less on random accounts -- is that ok? There has been a great deal of discussion about this at the filesystem and kernel summits - and data is getting kicked the way of networking - end to end, not reliability in the middle. Sounds good, but I've never let discussions by people smarter than me prevent me from asking the stupid questions. The sort of changes this needs hit the block layer and every fs. Seems it would need to hit every application also...
Re: reiser4: maybe just fix bugs?
Vladimir V. Saveliev wrote: Do you think that if reiser4 supported xattrs - it would increase its chances on inclusion? Probably the opposite. If I understand it right, the original Reiser4 model of file metadata is the file-as-directory stuff that caused such a furor the last big push for inclusion (search for Silent semantic changes in Reiser4): foo.mp3/.../rwx# permissions foo.mp3/.../artist # part of the id3 tag So I suspect xattrs would just be a different interface to this stuff, maybe just a subset of it (to prevent namespace collisions): foo.mp3/.../xattr/ # contains files representing attributes Of course, you'd be able to use the standard interface for getting/setting these. The point is, I don't think Hans/Namesys wants to do this unless they're going to do it right, especially because they already have the file-as-dir stuff somewhat done. Note that these are neither mutually exclusive nor mutually dependent -- you don't have to enable file-as-dir to make xattrs work. I know it's not done yet, though. I can understand Hans dragging his feet here, because xattrs and traditional acls are examples of things Reiser4 is supposed to eventually replace. Anyway, if xattrs were done now, the only good that would come of it is building a userbase outside the vanilla kernel. I can't see it as doing anything but hurting inclusion by introducing more confusion about plugins. I could be entirely wrong, though. I speak for neither Hans/Namesys/reiserfs nor LKML. Talk amongst yourselves...
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Horst H. von Brand wrote: Bernd Schubert [EMAIL PROTECTED] wrote: While filesystem speed is nice, it also would be great if reiser4.x would be very robust against any kind of hardware failures. Can't have both. Why not? I mean, other than TANSTAAFL, is there a technical reason for them being mutually exclusive? I suspect it's more we haven't found a way yet...
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Christian Trefzer wrote: On Mon, Jul 31, 2006 at 10:57:35AM -0500, David Masover wrote: Wil Reichert wrote: Any idea how the fragmentation resulting from re-syncing the tree affects performance over time? Yes, it does affect it a lot. I have no idea how much, and I've never benchmarked it, but purely subjectively, my portage has gotten slower over time. Delayed allocation still performs a lot better here than the v3 immediate allocation. In addition, tree balancing operations are performed on flush as well, so what you get on disk is basically an almost-optimal tree. Of course, this will change a bit over time, but with v4 it takes a lot longer for that to happen than with v3 afaict. There _has_ been some worthwhile development in the meantime : ) Hmm. The thing is, I don't remember v3 slowing down much at all, whereas v4 slowed down pretty dramatically after the first few weeks. It does seem pretty stable now, though, and it doesn't seem to be getting any slower. I've had this particular FS since... hmm... Is there an FS tool to check mkfs time? I think it's a year now, but I'd like to be sure. If not, I'll just find the oldest file, but the clock on this machine isn't reliable (have to set it with NTP every boot)...
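Lacking a tool that reports mkfs time, the oldest mtime on the filesystem is a workable proxy, as the post suggests. A sketch using GNU find (you pass in the mount point; `-xdev` keeps the scan on one filesystem):

```shell
# Print the oldest regular file on the given filesystem (GNU find/sort).
# -xdev prevents the scan from crossing into other mounted filesystems.
oldest_file() {
    find "$1" -xdev -type f -printf '%T@ %p\n' | sort -n | head -n 1 | cut -d' ' -f2-
}
```

E.g. `oldest_file /home`. Note this underestimates the filesystem's age if the oldest files have ever been rewritten, and an unreliable clock (as described) can skew it either way.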
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]
Theodore Tso wrote: Ah, but as soon as the repacker thread runs continuously, then you lose all or most of the claimed advantage of wandering logs. [...] So instead of a write-write overhead, you end up with a write-read-write overhead. This would tend to suggest that the repacker should not run constantly, but also that while it's running, performance could be almost as good as ext3. But of course, people tend to disable the repacker when doing benchmarks because they're trying to play the my filesystem/database has bigger performance numbers than yours game. So you run your own benchmarks, I'll run mine... Benchmarks for everyone! I'd especially like to see what performance is like with the repacker not running, and during the repack. If performance during a repack is comparable to ext3, I think we win, although we have to amend that statement to My filesystem/database has the same or bigger performance numbers than yours.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Alan Cox wrote: On Tue, 2006-08-01 at 11:44 -0500, David Masover wrote: Yikes. Undetected. Wait, what? Disks, at least, would be protected by RAID. Are you telling me RAID won't detect such an error? Yes. RAID deals with the case where a device fails. RAID 1 with 2 disks can in theory detect an internal inconsistency but cannot fix it. Still, if it does that, that should be enough. The scary part wasn't that there's an internal inconsistency, but that you wouldn't know. And it can fix it if you can figure out which disk went. Or give it 3 disks and it should be entirely automatic -- admin gets paged, admin hotswaps in a new disk, done. we're OK with that, so long as our filesystems are robust enough. If it's an _undetected_ error, doesn't that cause way more problems (impossible problems) than FS corruption? Ok, your FS is fine -- but now your bank database shows $1k less on random accounts -- is that ok? Not really, no. Your bank is probably using a machine (hopefully using a machine) with ECC memory, ECC cache and the like. The UDMA and SATA storage subsystems use CRC checksums between the controller and the device. SCSI uses various similar systems - some older ones just use a parity bit so have only a 50/50 chance of noticing a bit error. Similarly the media itself is recorded with a lot of FEC (forward error correction) so will spot most changes. Unfortunately when you throw this lot together with astronomical amounts of data you get burned now and then, especially as most systems are not using ECC ram, do not have ECC on the CPU registers and may not even have ECC on the caches in the disks. It seems like this is the place to fix it, not the software. If the software can fix it easily, great. But I'd much rather rely on the hardware looking after itself, because when hardware goes bad, all bets are off. Specifically, it seems like you do mention lots of hardware solutions that just aren't always used.
It seems like storage itself is getting cheap enough that it's time to step back a year or two in Moore's Law to get the reliability. The sort of changes this needs hit the block layer and every fs. Seems it would need to hit every application also... Depending how far you propagate it. Some people working with huge data sets already write and check user-level CRC values for this reason (in fact bitkeeper does it, for one example). It should be relatively cheap to get much of that benefit without doing application to application, just as TCP gets most of its benefit without going app to app. And yet, if you can do that, I'd suspect you can, should, must do it at a lower level than the FS. Again, FS robustness is good, but if the disk itself is going, what good is having your directory (mostly) intact if the files themselves have random corruptions? If you can't trust the disk, you need more than just an FS which can mostly survive hardware failure. You also need the FS itself (or maybe the block layer) to support bad block relocation and all that good stuff, or you need your apps designed to do that job by themselves. It just doesn't make sense to me to do this at the FS level. You mention TCP -- ok, but if TCP is doing its job, I shouldn't also need to implement checksums and other robustness at the protocol layer (http, ftp, ssh), should I? Because in this analogy, it looks like TCP is the block layer and a protocol is the fs. As I understand it, TCP only lets the protocol/application know when something's seriously FUBARed and it has to drop the connection. Similarly, the FS (and the apps) shouldn't have to know about hardware problems until it really can't do anything about it anymore, at which point the right thing to do is for the FS and apps to go oh shit and drop what they're doing, and the admin replaces hardware and restores from backup. Or brings a backup server online, or...
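The user-level CRC approach mentioned (as bitkeeper does) is easy to sketch: the application stores a checksum with its own data and verifies it on read, so silent corruption anywhere below it (fs, block layer, cable, disk) is detected end to end. The file format here is purely illustrative:

```python
import zlib

def write_checked(path: str, payload: bytes) -> None:
    # Prepend a user-level CRC32. Any layer below may corrupt silently;
    # the reader will still notice.
    with open(path, "wb") as f:
        f.write(zlib.crc32(payload).to_bytes(4, "big") + payload)

def read_checked(path: str) -> bytes:
    with open(path, "rb") as f:
        raw = f.read()
    stored, payload = int.from_bytes(raw[:4], "big"), raw[4:]
    if zlib.crc32(payload) != stored:
        raise IOError("end-to-end checksum mismatch: data corrupted below us")
    return payload
```

This detects corruption but cannot repair it, which is the thread's point: detection tells you when to fall back to redundancy or backup.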
I guess my main point was that _undetected_ problems are serious, but if you can detect them, and you have at least a bit of redundancy, you should be good. For instance, if your RAID reports errors that it can't fix, you bring that server down and let the backup server run.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Gregory Maxwell wrote: On 8/1/06, David Masover [EMAIL PROTECTED] wrote: Yikes. Undetected. Wait, what? Disks, at least, would be protected by RAID. Are you telling me RAID won't detect such an error? Unless the disk ECC catches it raid won't know anything is wrong. This is why ZFS offers block checksums... it can then try all the permutations of raid regens to find a solution which gives the right checksum. Isn't there a way to do this at the block layer? Something in device-mapper? Every level of the system must be paranoid and take measure to avoid corruption if the system is to avoid it... it's a tough problem. It seems that the ZFS folks have addressed this challenge by building as much of what is classically separate layers into one part. Sounds like bad design to me, and I can point to the antipattern, but what do I know?
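The "try all the permutations of raid regens" idea can be sketched in a few lines. Assume a RAID-5-style stripe (XOR parity) and a known-good checksum of the stripe's data kept at a higher layer; regenerate each data block in turn from parity plus the others, and keep the combination whose checksum matches. This is a toy model of the ZFS behavior described, not its actual code:

```python
from functools import reduce
from zlib import crc32

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def recover(blocks, parity, good_crc):
    """Return a repaired copy of blocks, or None if one regen can't fix it."""
    if crc32(b"".join(blocks)) == good_crc:
        return list(blocks)                 # stripe already consistent
    for i in range(len(blocks)):
        # Rebuild block i from parity and the remaining blocks (RAID-5 rule),
        # then let the checksum judge whether that was the bad one.
        regen = reduce(xor, (b for j, b in enumerate(blocks) if j != i), parity)
        candidate = list(blocks)
        candidate[i] = regen
        if crc32(b"".join(candidate)) == good_crc:
            return candidate
    return None                             # >1 bad block: restore from backup

stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = reduce(xor, stripe)
good = crc32(b"".join(stripe))
```

Plain RAID-5 with one corrupt (not failed) disk cannot tell which block is wrong; the independent checksum is what turns undetectable corruption into a solvable puzzle.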
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Ric Wheeler wrote: Alan Cox wrote: On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote: WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc.. you don't need your filesystem being super-robust against bad sectors and such stuff because: You do, it turns out. It's becoming an issue more and more that the sheer amount of storage means that the undetected error rate from disks, hosts, memory, cables and everything else is rising. Most people use absolutely giant disks in laptops and desktop systems (300GB-500GB are common, 750GB on the way). File systems need to be as robust as possible for users of these systems, as people are commonly storing personal critical data like photos mostly on these unprotected drives. Their loss. A robust FS is good, but really, if you aren't doing backups, you are going to lose data. End of story. Even for the high end users, array based mirroring and so on can only do so much to protect you. Mirroring a corrupt file system to a remote data center will mirror your corruption. Assuming it's undetected. Why would it be undetected? Rolling back to a snapshot typically only happens when you notice a corruption, which can go undetected for quite a while, so even that will benefit from having reliability baked into the file system (i.e., it should grumble about corruption to let you know that you need to roll back or fsck or whatever). Yes, the filesystem should complain about corruption. So should the block layer -- if you don't trust the FS, use a checksum at the block layer. So should... There are just so many other, better places to do this than the FS. The FS should complain, yes, but if the disk is bad, there's going to be corruption. An even larger issue is that our tools, like fsck, which are used to uncover these silent corruptions, need to scale up to the point that they can uncover issues in minutes instead of days.
A lot of the focus at the file system workshop was around how to dramatically reduce the repair time of file systems. That would be interesting. I know from experience that fsck.reiser4 is amazing. Blew away my data with something akin to an rm -rf, and fsck fixed it. Tons of crashing/instability in the early days, but only once -- before they even had a version instead of a date, I think -- did I ever have a case where fsck couldn't fix it. So I guess the next step would be to make fsck faster. Someone mentioned a fsck that repairs the FS in the background? In a way, having super reliable storage hardware is only as good as the file system layer on top of it - reliability needs to be baked into the entire IO system stack... That bit makes no sense. If you have super reliable storage hardware (never dies), and your FS is also reliable (never dies unless hardware does, but may go bat-shit insane when hardware dies), then you've got a super reliable system. You're right, running Linux's HFS+ or NTFS write support is generally a bad idea, no matter how reliable your hardware is. But this discussion was not about whether an FS is stable, but how well an FS survives hardware corruption.
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Ian Stirling wrote: David Masover wrote: David Lang wrote: On Mon, 31 Jul 2006, David Masover wrote: Oh, I'm curious -- do hard drives ever carry enough battery/capacitance to cover their caches? It doesn't seem like it would be that hard/expensive, and if it is done that way, then I think it's valid to leave them on. You could just say that other filesystems aren't taking as much advantage of newer drive features as Reiser :P there are no drives that have the ability to flush their cache after they lose power. Aha, so back to the usual argument: UPS! It takes a fraction of a second to flush that cache. You probably don't actually want to flush the cache - but to write to a journal. 16M of cache - split into 32000 writes to single sectors spread over the disk could well take several minutes to write. Slapping it onto a journal would take well under 0.2 seconds. That's a non-trivial amount of storage though - 3J or so, [EMAIL PROTECTED] - a moderately large/expensive capacitor. Before we get ahead of ourselves, remember: ~$200 buys you a huge amount of battery storage. We're talking several minutes for several boxes, at the very least -- more like 10 minutes. But yes, a journal or a software suspend.
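The "3J or so" figure is easy to sanity-check with energy = power × time; the ~15 W drive power draw below is an assumed round number for illustration, not a measurement:

```python
# Back-of-the-envelope check of the "3J or so" capacitor estimate above.
# Assumed: the drive draws roughly 15 W while writing (a guess), and the
# journal-style flush takes the ~0.2 s estimated in the message.
power_watts = 15.0
flush_seconds = 0.2
energy_joules = power_watts * flush_seconds
print(energy_joules)  # 3.0 -- consistent with the "3J or so" estimate
```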
Re: reiser4: maybe just fix bugs?
Nate Diller wrote: On 8/1/06, David Masover [EMAIL PROTECTED] wrote: Vladimir V. Saveliev wrote: I could be entirely wrong, though. I speak for neither Hans/Namesys/reiserfs nor LKML. Talk amongst yourselves... i should clarify things a bit here. yes, hans' goal is for there to be no difference between the xattr namespace and the readdir one. unfortunately, this is not feasible with the current VFS, and some major work would have to be done to enable this without some pathological cases cropping up. some very smart people think that it cannot be done at all. But an xattr interface should work just fine, even if the rest of the system is inaccessible (no readdir interface) -- preventing all these pathological problems, except the one where Hans implements it the way I'm thinking, and kernel people hate it.
Re: reiser4 can now bear with filled fs, looks stable to me...
Hans Reiser wrote: I think that most of our problem is that we are too socially insulated from lkml. They are a herd, and decide things based on what thoughts echo most loudly. To be fair, it's not the whole lkml you have to convince, just the few people directly responsible for filesystems and 2.6 maintenance. But then, they probably do consider what the herd is saying... It might even be socially effective to shut down reiserfs-list until inclusion occurs. Maybe. It will be an inconvenience for me, if we have to. I'm not even on LKML, and I'd rather not be -- even this list can get noisy at times... But I will go with it if it's what works best.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Wil Reichert wrote: =) That was sorta the plan. Any idea how the fragmentation resulting from re-syncing the tree affects performance over time? Try to post replies at the bottom, or below the context. Yes, it does affect it a lot. I have no idea how much, and I've never benchmarked it, but purely subjectively, my portage has gotten slower over time.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jan-Benedict Glaw wrote: On Mon, 2006-07-31 17:59:58 +0200, Adrian Ulrich [EMAIL PROTECTED] wrote: A colleague of mine happened to create a ~300GB filesystem and started to migrate Mailboxes (Maildir-style format = many small files (1-3kb)) to the new LUN. At about 70% the filesystem ran out of inodes. So preparation work wasn't done. So what? Yes, you need to do preparation. But it is really nice if the filesystem can do that work for you. Let me put it this way -- You're back in college, and it's time to write a thesis. You have a choice of software packages: Package A: You have to specify how many pages, and how many words, you're likely to use before you start typing. Guess too high, and you'll print out a bunch of blank pages at the end. Guess too low, and you'll run out of space and have to start over, copy and paste your document back in, and hope it gets all the formatting right, which it probably won't. Package B: Your document grows as you type. When it's time to print, only the pages you've actually written something on -- but all of the pages you've actually written something on -- are printed. All other things being equal, which would you choose? Which one seems more modern? Look, I understand the argument against ReiserFS v3 -- it has another limitation that you don't even know about. That other limitation is scary -- that's like being able to type as many words as you want, but once you type enough pages (no way of knowing how many), pages start randomly disappearing from the middle of your document. But the argument that no one cares about inode limits? Really, stop kidding yourselves. It's 2006. The limits are starting to look ridiculous. Just because they're workable doesn't mean we should have to live with them.
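For what it's worth, the inode accounting being argued about is visible from userspace via statvfs; a short illustrative sketch:

```python
import os

def inode_headroom(path="/"):
    # statvfs reports f_files (total inodes) and f_ffree (free inodes)
    # for the filesystem containing path. On filesystems with a fixed
    # inode table (ext2/ext3) these are hard limits; filesystems that
    # allocate keys/inodes dynamically may report synthetic values here.
    st = os.statvfs(path)
    return st.f_files, st.f_ffree

total, free = inode_headroom("/")
print(f"{free} of {total} inodes free")
```

A monitoring script watching f_ffree is exactly the "preparation work" being debated: possible, but nicer when the filesystem makes it unnecessary.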
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matthias Andree wrote: Adrian Ulrich wrote on 2006-07-31: Why are a lot of Solaris-people using (buying) VxFS? Maybe because UFS also has such silly limitations? (..and performs awkwardly with trillions of files..?..) Well, such silly limitations... looks like they are mostly hot air spewed by marketroids that need to justify people spending money on their new filesystem. I think the limitations are silly, and I'm not paid to say this. Besides, we're talking about a filesystem that will be free (and libre), so I don't see the point of marketroids, certainly not in this context. But let's not stoop to name-calling.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jan-Benedict Glaw wrote: On Mon, 2006-07-31 12:17:12 -0700, Clay Barnes [EMAIL PROTECTED] wrote: On 20:43 Mon 31 Jul , Jan-Benedict Glaw wrote: On Mon, 2006-07-31 20:11:20 +0200, Matthias Andree [EMAIL PROTECTED] wrote: Jan-Benedict Glaw wrote on 2006-07-31: [Crippled DMA writes] Massive hardware problems don't count. ext2/ext3 doesn't look much better in such cases. I had a machine with RAM gone bad (no ECC - I wonder what They do! Very much, actually. These happen In Real Life, so I have to I think what he meant was that it is unfair to blame reiser3 for data loss in a massive failure situation as a case example by itself. What Crippling a few KB of metadata in the ext{2,3} case probably wouldn't FUBAR the filesystem... Probably. By the time a few KB of metadata are corrupted, I'm reaching for my backup. I don't care what filesystem it is or how easy it is to edit the on-disk structures. This isn't to say that having robust on-disk structures isn't a good thing. I have no idea how Reiser4 will hold up either way. But ultimately, what you want is the journaling (so power failure / crashes still leave you in an OK state), backups (so when blocks go bad, you don't care), and performance (so you can spend less money on hardware and more money on backup hardware).
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
David Lang wrote: On Mon, 31 Jul 2006, David Masover wrote: Probably. By the time a few KB of metadata are corrupted, I'm reaching for my backup. I don't care what filesystem it is or how easy it is to edit the on-disk structures. This isn't to say that having robust on-disk structures isn't a good thing. I have no idea how Reiser4 will hold up either way. But ultimately, what you want is the journaling (so power failure / crashes still leave you in an OK state), backups (so when blocks go bad, you don't care), and performance (so you can spend less money on hardware and more money on backup hardware). please read the discussion that took place at the filesystem summit a couple weeks ago (available on lwn.net) I think I will, but I don't have the time today, so... one of the things that they pointed out there is that as disks get larger the ratio of bad spots per Gig of storage is remaining about the same. As is the rate of failures per Gig of storage. As a result of this the idea of only running on perfect disks that never have any failures is becoming significantly less realistic, instead you need to take measures to survive in the face of minor corruption (including robust filesystems, raid, etc) RAID seems a much more viable solution to me. That and cheaper storage, so that you can actually afford to replace the disk when you find corruption, or have more redundancy so you don't have to. Because robust filesystems are nice in theory, but in practice, you really never know what will get hit. RAID, at least, is predictable. When it's not: Backups.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Alan Cox wrote: On Mon, 2006-07-31 at 17:00 -0400, Gregory Maxwell wrote: Are you sure that you aren't commenting on cases where Reiser3 alerts the user to a critical data condition (via a panic) which leads to a trouble report, while ext3 ignores the problem, which suppresses the trouble report from the user? man mount Ext3 is configurable, and has been for years, via the errors= option. Sure, but I think the suggestion is that the reason we generally see more ReiserFS complaints than ext3 complaints might be because of the default level of errors logged.
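For reference, the errors= behaviour Alan points at is chosen per mount (see mount(8)); a sketch of the relevant /etc/fstab line, with a hypothetical device name:

```
# ext3 behaviour when metadata errors are detected:
#   errors=continue    log the error and keep going (the quiet option)
#   errors=remount-ro  remount read-only so damage stops spreading
#   errors=panic       halt the machine, closer to reiserfs's loud default
/dev/sda1  /  ext3  defaults,errors=remount-ro  0  1
```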
Re: reiser4 can now bear with filled fs, looks stable to me...
Maciej Sołtysiak wrote: Hello David, - it is more expensive to: a) succeed at kernel inclusion b) argue c) waste time You must be new here... Options B and C are all that ever seems to happen when reiserfs-list and lkml collide. Is option A possible? The speed of a nonworking program is irrelevant. The cost-effectiveness of an impossible solution is irrelevant.
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Matthias Andree wrote: On Mon, 31 Jul 2006, Nate Diller wrote: this is only a limitation for filesystems which do in-place data and metadata updates. this is why i mentioned the similarities to log file systems (see rosenblum and ousterhout, 1991). they observed an order-of-magnitude increase in performance for such workloads on their system. It's well known that transactions that would thrash on UFS or ext2fs can benefit from logging, data journaling, or whatever else turns seeks into serial writes, giving quieter access patterns with shorter strokes. And then there's the other question with wandering logs (to avoid double writes) and such: you start wondering how much fragmentation you get as the price to pay for avoiding seeks and double writes at the same time. So you use a repacker. Nice thing about a repacker is, everyone has downtime. Better to plan to be a little sluggish when you'll have 1/10th or 1/50th of the users than be MUCH slower all the time. You're right, though, to ask the question: TANSTAAFL, or how long the system can sustain such access patterns, particularly if it gets under memory pressure and must move. Anyone care to run some very long benchmarks? Even with lazy allocation and other optimizations, I question the validity of 3000/s or faster transaction frequencies. Even the 500 on ext3 are suspect, particularly with 7200/min (s)ATA crap. This sounds pretty much like the drive doing its best to shuffle blocks around in its 8 MB cache and lazily writing back. Oh, I'm curious -- do hard drives ever carry enough battery/capacitance to cover their caches? It doesn't seem like it would be that hard/expensive, and if it is done that way, then I think it's valid to leave them on. You could just say that other filesystems aren't taking as much advantage of newer drive features as Reiser :P Anyway, remember that the primary tool of science is not logic. Logic is the primary tool of philosophy. The primary tool of science is observation.
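The skepticism about transaction rates can be made concrete with a rotational-latency bound (a rough model that ignores seeks and command queueing, which only make the bound worse):

```python
# Why 3000 (or even 500) synchronous transactions/s is suspect on a
# 7200 rpm drive: an ordered commit that really waits for the platter
# cannot complete faster than the disk rotates.
rpm = 7200
revs_per_second = rpm / 60           # 120 revolutions per second
max_sync_commits = revs_per_second   # ~120 ordered commits/s, tops
print(max_sync_commits)              # 120.0
# Anything much above this means the drive acknowledged the write
# from its volatile cache -- exactly the lazy write-back suspected above.
```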
Sorry, the only machines I could really run this on are about to be in remote only mode for a couple weeks. I'm hesitant to hit them too hard.
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Theodore Tso wrote: On Mon, Jul 31, 2006 at 08:31:32PM -0500, David Masover wrote: So you use a repacker. Nice thing about a repacker is, everyone has downtime. Better to plan to be a little sluggish when you'll have 1/10th or 1/50th of the users than be MUCH slower all the time. Actually, that's a problem with log-structured filesystems in general. There are quite a few real-life workloads where you *don't* have downtime. The thing is, in a global economy, you move from the London/European stock exchanges, to the New York/US exchanges, to the Asian exchanges, with little to no downtime available. Such systems must have redundancy, however. And if you have 2-3 servers hot in case one of them goes down, I can see switching between which is more active, and which is repacking. This repacker is online, hence a filesystem being repacked would have to be less active, not necessarily down. So repack the backup server, then make the backup server the active one and repack the main server. If the main server goes down while the backup is repacking, kill the repack process. I actually have a problem imagining a system where you don't have enough spare capacity (disk, CPU, spare servers) to run a repacker every now and then, but which also must have 100% uptime. What happens when a disk goes bad? Or when power to half the country goes out? Or... You get the idea. In addition, people have been getting more sophisticated with workload consolidation tricks so that you use your downtime for other applications (either to service other parts of the world, or to do daily summaries, 3-d frame rendering at animation companies, etc.) So the assumption that there will always be time to run the repacker is a dangerous one. 3D frame rendering in particular doesn't require much disk use, does it? Daily summaries, I guess, depends on what kind of summaries they are. And anyway, those applications are making the same dangerous assumption. 
And anyway, I suspect the repacker will work best once a week or so, but no one knows yet, as they haven't written it yet. The problem is that many benchmarks (such as tarring and untarring the kernel sources in reiser4 sort order) are overly simplistic, in that they don't really reflect how people use the filesystem in real life. That's true. I'd also like to see lots more benchmarks. If the benchmark doesn't take into account the need for a repacker, or if the repacker is disabled or fails to run during the benchmark, the filesystem is in effect cheating on the benchmark, because there is critical work which is necessary for the long-term health of the filesystem which is getting deferred until after the benchmark has finished measuring the performance of the system under test. In this case, the only fair test would be to run the benchmark 24/7 for a week, and run the repacker on a weekend. Or however you're planning to do it. It wouldn't be fair to run a 10-minute or 1-hour benchmark and then immediately run the repacker. But I'd also like to see more here, especially about fragmentation. If the repacker will cost money, the system should be reasonably good at avoiding fragmentation. I'm wondering if I should run a benchmark on my systems -- they're over a year old, and while they aren't under particularly heavy load, they should be getting somewhat fragmented by now.
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
Timothy Webster wrote: Different users have different needs. I'm having trouble thinking of users who need an FS that doesn't need a repacker. The disk error problem, though, you're right -- most users will have to get bitten by this, hard, at least once, or they'll never get the importance of it. But it'd be nice if it's not too hard, and we can actually recover most of their files. Still, I can see most people who are aware of this problem using RAID, backups, and not caring if their filesystem tolerates bad hardware. The problem I see is managing disk errors. I see this kind of the same way. If your disk has errors, you should be getting a new disk. If you can't do that, you can run a mirrored RAID -- even on SATA, you should be able to hotswap it. Even for a home/desktop user, disks are cheap, and getting cheaper all the time. All you have to do is run the mean time between failure numbers by them, and ask them if their backup is enough. And perhaps a really good clustering filesystem for markets that require NO downtime. Thing is, a cluster is about the only FS I can imagine that could reasonably require (and MAYBE provide) absolutely no downtime. Everything else, the more you say it requires no downtime, the more I say it requires redundancy. Am I missing any more obvious examples where you can't have enough redundancy, but you can't have downtime either?
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
David Lang wrote: On Mon, 31 Jul 2006, David Masover wrote: Oh, I'm curious -- do hard drives ever carry enough battery/capacitance to cover their caches? It doesn't seem like it would be that hard/expensive, and if it is done that way, then I think it's valid to leave them on. You could just say that other filesystems aren't taking as much advantage of newer drive features as Reiser :P there are no drives that have the ability to flush their cache after they lose power. Aha, so back to the usual argument: UPS! It takes a fraction of a second to flush that cache. now, that being said, /. had a story within the last couple of days about hard drive manufacturers adding flash to their hard drives. they may be aiming to add some non-volatile cache capability to their drives, although I didn't think that flash writes were that fast (needed if you dump the cache to flash when you lose power), or that easy on power (given that you would first lose power), and flash has limited write cycles (needed if you always use the cache). But, the point of flash was not to replace the RAM cache, but to be another level. That is, you have your Flash which may be as fast as the disk, maybe faster, maybe less, and you have maybe a gig worth of it. Even the bloatiest of OSes aren't really all that big -- my OS X came installed, with all kinds of apps I'll never use, in less than 10 gigs. And I think this story was a while ago (a dupe? Not surprising), and the point of the Flash is that as long as your read/write cache doesn't run out, and you're still in that 1 gig of Flash, you're a bit safer than the RAM cache, and you can also leave the disk off, as in, spun down. Parked. Very useful for a laptop -- I used to do this in Linux by using Reiser4, setting the disk to spin down, and letting lazy writes do their thing, but I didn't have enough RAM, and there's always the possibility of losing data.
But leaving the disk off is nice, because in the event of sudden motion, it's safer that way. Besides, most hardware gets designed for That Other OS, which doesn't support any kind of Laptop Mode, so it's nice to be able to enforce this at a hardware level, in a safe way. I've heard of too many fancy-sounding drive technologies that never hit the market; I'll wait until they are actually available before I start counting on them for anything (let alone design/run a filesystem that requires them :-) Or even remember their names. external battery backed cache is readily available, either on high-end raid controllers or as separate ram drives (and in raid array boxes), but nothing on individual drives. Ah. Curses. UPS, then. If you have enough time, you could even do a Software Suspend first -- that way, when power comes back on, you boot back up, and if it's done quickly enough, connections won't even be dropped...
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
David Lang wrote: On Mon, 31 Jul 2006, David Masover wrote: And perhaps a really good clustering filesystem for markets that require NO downtime. Thing is, a cluster is about the only FS I can imagine that could reasonably require (and MAYBE provide) absolutely no downtime. Everything else, the more you say it requires no downtime, the more I say it requires redundancy. Am I missing any more obvious examples where you can't have enough redundancy, but you can't have downtime either? just because you have redundancy doesn't mean that your data is idle enough for you to run a repacker with your spare cycles. Then you don't have redundancy, at least not for reliability. In that case, you have redundancy for speed. to run a repacker you need a time when the chunk of the filesystem that you are repacking is not being accessed or written to. Reasonably, yes. But it will be an online repacker, so it will be somewhat tolerant of this. it doesn't matter if that data lives on one disk or 9 disks all mirroring the same data, you can't just break off 1 of the copies and repack that because by the time you finish it won't match the live drives anymore. Aha. That really depends how you're doing the mirroring. If you're doing it at the block level, then no, it won't work. But if you're doing it at the filesystem level (a cluster-based FS, or something that layers on top of an FS), or (most likely) the database/application level, then when you come back up, the new data is just pulled in from the logs as if it had been written to the FS. The only example I can think of that I've actually used and seen working is MySQL tables, but that already covers a huge number of websites. database servers have a repacker (vacuum), and they are under tremendous pressure from their users to avoid having to use it because of the performance hit that it generates.
(the theory in the past is exactly what was presented in this thread, make things run faster most of the time and accept the performance hit when you repack). the trend seems to be for a repacker thread that runs continuously, causing a small impact all the time (that can be calculated into the capacity planning) instead of a large impact once in a while. Hmm, if that could be done right, it wouldn't be so bad -- if you get twice the performance but have to repack for 2 hrs at the end of the week, the repacker is better, right? So if you could spread the 2 hours out over the week, in theory, you'd still be pretty close to twice the performance. But that is fairly difficult to do, and may be more difficult to do well than to implement, say, a Reiser4 plugin that operates about on the level of rsync, but on every file modification. the other thing they are seeing as new people start using them is that the newbies don't realize they need to do something as archaic as running a repacker periodically, as a result they let things devolve down to where performance is really bad without understanding why. Yikes. But then, that may be a failure of distro maintainers for not throwing it in cron for them. I had a similar problem with MySQL. I turned on binary logging so I could do database replication, but I didn't realize I had to actually delete the logs. I now have a daily cron job that wipes out everything except the last day's logs. It could probably be modified pretty easily to run hourly, if I needed to. Moral of the story? Maybe there's something to this continuous repacker idea, but don't ruin a good thing for the rest of us because of newbies.
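The log-trimming cron job described above boils down to issuing one SQL statement with a cutoff date. A sketch: PURGE MASTER LOGS BEFORE is MySQL's real statement for this, but the helper below and its cron wiring are hypothetical, not the author's actual script.

```python
from datetime import datetime, timedelta

def purge_statement(keep_days=1, now=None):
    # Build the SQL that asks MySQL to drop binary logs older than
    # keep_days. Actually sending it to the server (mysql client,
    # cron entry, credentials) is site-specific and omitted here.
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    return f"PURGE MASTER LOGS BEFORE '{cutoff:%Y-%m-%d %H:%M:%S}';"

print(purge_statement(now=datetime(2006, 8, 2, 3, 0, 0)))
# PURGE MASTER LOGS BEFORE '2006-08-01 03:00:00';
```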
Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]
David Lang wrote: On Mon, 31 Jul 2006, David Masover wrote: Aha, so back to the usual argument: UPS! It takes a fraction of a second to flush that cache. which does absolutely no good if someone trips over the power cord, the fuse blows in the power supply, someone yanks the drive out of the hot-swap bay, etc. Power supply fuse... Yeah, it happens. Drives die, too. This seems fairly uncommon. And dear God, please tell me anyone smart enough to set up a UPS would also be smart enough to make tripping over the power cord rare or impossible. My box has a cable that runs down behind a desk, between the desk and the wall. Power strip is on the floor, where a UPS will be when I get around to buying one. If someone kicks any cable, it would be where the UPS hits the wall -- but that's also behind the same desk. as I understand it flash reads are fast (ram speeds), but writes are pretty slow (comparable or worse to spinning media) writing to a ram cache, but having a flash drive behind it doesn't gain you any protection. and I don't think you need it for reads Does gain you protection if you're not using the RAM cache, if you're that paranoid. I don't know if it's cheaper than RAM, but more read cache is always better. And losing power seems a lot less likely than crashing, especially on a Windows laptop, so it does make sense as a product. And a laptop, having a battery, will give you a good bit of warning before it dies. My Powerbook syncs and goes into Sleep mode when it runs low on power (~1%/5mins) external battery backed cache is readily available, either on high-end raid controllers or as separate ram drives (and in raid array boxes), but nothing on individual drives. Ah. Curses. UPS, then. If you have enough time, you could even do a Software Suspend first -- that way, when power comes back on, you boot back up, and if it's done quickly enough, connections won't even be dropped...
remember, it can take 90W of power to run your CPU, 100+ to run your video card, plus everything else. even a few seconds of power for this is a very significant amount of energy storage. Suspend2 can take about 10-20 seconds. It should be possible to work out the maximum amount of time it can take. Anyway, according to a quick Google search, my CPU is more like 70W. Video card isn't required on a server, but you may be right on mine. I haven't looked at UPSes lately, though. I need about 3 seconds for a sync, maybe 10 for a suspend, so to be safe I can say for sure I'd be down in about 30 seconds. So, another Google search, and while you can get a cheap UPS for anywhere from $10 to $100, the sweet spot seems to be a little over $200. $229, and it's 865W, supposedly for 3.7 minutes. Here's a review: This is a great product. It powers an AMD 64 3200+ with beefy (6800GT) graphics card, 21" CRT monitor, secondary 19" CRT, a linux server, a 15" CRT, Cisco 2800XL switch, Linksys WRTG54GS, cable modem, speakers, and many other things. The software says I will get 9 minutes runtime with all of that hooked up; realistically it's about 4 minutes. This was the lowest time reported. Most of the other reviews say at least 15 minutes, sometimes 30 minutes, with fairly high-end computers listed (and monitors, sometimes two computers/monitors), but nowhere near as much stuff as this guy has. I checked most of these for Linux support, and UPSes in general seem well supported. So yes, the box will shut off automatically. On a network, it shouldn't be too hard to get one box to shut off all the rest. It's a lot of money, even at the low end, but when you're already spending a pile of money on a new computer, keep power in mind. And really, even 11 minutes would be fine, but 40 minutes of power is quite a lot compared to less than a minute of time taken to shut down normally -- not even suspend, but a normal shut down.
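The reviewer's numbers can be sanity-checked with a quick calculation; the 300 W load below is hypothetical, and linear scaling is an optimistic assumption (lead-acid runtime curves are worse at high load):

```python
# Sanity-check on the $229 UPS quoted above: 865 W for 3.7 minutes is
# about 53 Wh of deliverable energy. Assuming linear scaling (optimistic)
# and a hypothetical 300 W desktop-plus-server load:
rated_watts, rated_minutes = 865, 3.7
usable_wh = rated_watts * rated_minutes / 60           # ~53.3 Wh
typical_load_watts = 300
runtime_minutes = usable_wh * 60 / typical_load_watts  # ~10.7 min
print(round(usable_wh, 1), round(runtime_minutes, 1))
# Plenty of margin over the ~30 s needed for a sync-plus-suspend.
```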
I'd be tempted to try to ride it out for the first 20 minutes, see if power comes back up... however, I did get a pointer recently to a company making super-high capacity caps, up to 2600F (F, not uF!) in a 138mm tall x 57mm dia cylinder, however it only handles 2.7v (they have modules that handle higher voltages available) http://www.maxwell.com/ultracapacitors/index.html however I don't see these as being standard equipment in systems or on drives anytime soon This seems to be a whole different approach -- more along the lines of in the drive, which would be cool...
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Łukasz Mierzwa wrote: On Sat, 29 Jul 2006 20:31:59 +0200, David Masover [EMAIL PROTECTED] wrote: Nikita Danilov wrote: As you see, ext2 code already has multiple file plugins, with persistent plugin id (stored in i_mode field of on-disk struct ext2_inode). Aha! So here's another question: Is it fair to ask Reiser4 to make its plugins generic, or should we be asking ext2/3 first? Doesn't iptables have plugins? Maybe we should make them generic so other packet filters can use them ;) Hey, yeah! I mean, not everyone wants to run the ipchains emulation on top of iptables! Some people really want to run ipchains with iptables plugins! /sarcasm It is REALLY time for this discussion to get technical again, and to go way, way over my head. And it's time for me to go build my MythTV box, and see if I can shake out some Reiser4 bugs.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Christian Trefzer wrote: On Sun, Jul 30, 2006 at 11:39:42PM +0200, Christian Trefzer wrote: In order to avoid having to pull the whole tree via rsync again, you might want to grab my script from the list and adapt it to your needs. Of course, you can tar it up manually instead. Silly me, but after approx. 9h of studying, little wonder ; ) In fact, the official install guide tells you to download a snapshot tarball first, then start syncing.
Re: reiser4 can now bear with filled fs, looks stable to me...
Christian Trefzer wrote: Hi, I booted 2.6.18-rc2-mm1 today and later filled up my /opt partition by accident, and guess what, reiser4 did not screw up : D Hmm, I'm curious, though... How does it react to a few billion files? Sorry, I can't test this, but I will be testing MythTV, if not now, then in a few weeks. Congratulations and thanks to the namesys developers! Hans, I can somewhat understand how you feel about your situation. Don't let frustration get in your way, your work is simply too great. You're an [...] screwing over society ; ) Sometimes you just have to swallow your pride instead of wasting your time by yelling at the rest of the world, and if humble work does not lead to success, there won't be any other way, I fear. Amen. I do not want to see Reiser4 not succeed because of politics, and it really looks like the only way to win the political war is not to play. The technical stuff is really the last way in, but neither side has said anything technical in a while. The most technical things that have happened lately are Hans pointing to benchmarks and LKML pointing to ext3 plugins. I suspect part of this is simply the word plugin coming around to bite us in the ass, but whatever. We're all tired of this fight. IMHO it would be best to deliver quality patches against all kinds of sources (distro kernels, vanilla -rcs maybe, etc.) Well, we have the patches against vanilla, which seem to work well with at least a few other patches I've tried. and the entire patched source tarball as well, for people to download and build. Next step would be to provide binary packages, and repos for people to add to their package manager's source list. Until distros pick up their respective patch, this is as far as support can go, I guess. That would actually be pretty good, for anyone making the conscious decision to use a filesystem. Still need official distro support to get the people who don't (think they) care. So, what do you all say? Sounds good.
I don't have any idea of the work required, either...
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Arjan van de Ven wrote: Most users not only cannot patch a kernel, they don't know what a patch is. It most certainly does. obviously you can provide complete kernels, including precompiled ones. Most distros have a yum or apt or similar tool to suck down packages, it's trivial for users to add a site to that, so you could provide packages if you want and make it easy for them. What's more, many distros patch their kernels extensively. They listen to their users, too. So if there are a lot of users wanting this to be in the kernel, let them complain -- loudly -- to their distro to patch for Reiser4. It could be made even easier than that -- if Reiser4 is really so self-contained, it could be a whole separate set of modules, distributed on its own. Most gamers have to be content with doing something similar with the nvidia drivers -- for different reasons (licensing) but with the same results. I know Gentoo handles this automatically (emerge nvidia-kernel). Hmm, maybe it makes it a pain to have it as a root filesystem, so that really needs distro support. And yet, we have a whole system designed specifically for being able to load modules and tweak settings before the root FS is available. It's called initrd, or more recently, initramfs. I use an old-style initrd on this box, because my root FS is on an nvidia RAID, so I have to run a program called dmraid before I mount my root FS -- it's really trivial for me to have Reiser4 as a module, and I do, despite it being a root FS. I suspect that, all technical, political, and mine is bigger arguments aside, being available as a root FS of a distro, especially a default FS, would go a long way towards inclusion in the kernel. So all you have to do is find a reasonably popular and friendly distro, with people who are (for the moment) easier to deal with than kernel people. Most people, if they even know what a filesystem or a kernel is, still won't bother compiling their own kernel, you're right. 
But that means they are more likely to be using a distro-patched kernel than a stock, vanilla one. Is this enough to be in the jukebox, Hans? Of course, it's odd that I mention Gentoo, since the Gentoo people (as a rule) hate ReiserFS, but there are far more distros than there are popular kernel forks. I'm sure someone will be interested. That's assuming that making further changes (putting stuff in the VFS) is out of the question (for now).
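The old-style initrd flow described above (load the root filesystem driver as a module, assemble the RAID, then mount) can be sketched as an /init script. This is purely illustrative: the module path, the dmraid device name, and the mount point are placeholders, not a recipe for any particular distro.

```shell
#!/bin/sh
# Illustrative initramfs /init: make the root FS driver available as
# a module before the real root is mounted, mirroring the dmraid
# example in the mail. All names below are placeholders.
mount -t proc none /proc
mount -t sysfs none /sys

insmod /lib/modules/reiser4.ko        # root FS driver as a module
dmraid -ay                            # assemble BIOS RAID sets, if any

mount -o ro /dev/mapper/nvidia_xxxxxxxx /newroot
exec switch_root /newroot /sbin/init  # hand off to the real init
```

The point being: nothing about having Reiser4 as a root filesystem forces it to be built into the kernel image.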
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Hans Reiser wrote: David Masover wrote: If indeed it can be changed easily at all. I think the burden is on you to prove that you can change it to be more generic, rather than saying Well, we could do it later, if people want us to... None of the filesystems other than reiser4 have any interest in using plugins, and this whole argument over how it should be in VFS is nonsensical because nobody but us has any interest in using the functionality. The burden is on the generic code authors to prove that they will ever ever do anything at all besides complain. Frankly, I don't think they will. I think they will never produce one line of code. I think it's fair to say that 5-10 years from now, with different ext3 maintainers, when the Reiser4 concept has proven itself, people will want plugins for ext3, and the ext3 developers will like the idea. ext* is one of those things that just refuses to die. I use ext3 for my /boot fs, so that I don't have to patch Grub for Reiser4, and so that at least I can mess with the bootloader from any rescue CD if something goes wrong. It's for kind of the same reason that Gentoo builds a 32-bit Grub, even though I'm booting a 64-bit OS -- just in case. I also use ext2 for my initrd. There are other monstrosities that will likely never die, also. ISO9660, UDF, and VFAT probably all have worse storage characteristics than Reiser4, in that as I understand it, they won't pack multiple files into a block. So Reiser4 might even make a good boot cd FS, storing things more efficiently -- but even if I'm right, those three filesystems will last forever, because they are currently well supported on every major OS, and I think one of ISO/UDF is required for DVDs. So for whatever reason someone's using another filesystem, even if all they need is the on-disk format (my reason for ext3 /boot and vfat on USB thumbdrives), I think it's reasonable to expect that they may one day want plugin functionality. 
People who like Reiser filesystems will do just fine running Reiser4 with a (udf|iso|vfat) storage driver, but people who don't will just want the higher level stuff. You're probably right, and this is years of work for something that may not be worth anything, but I think this is what is going through people's heads as they look at this plugin system. So see my comments about distro inclusion.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Nikita Danilov wrote: As you see, ext2 code already has multiple file plugins, with persistent plugin id (stored in i_mode field of on-disk struct ext2_inode). Aha! So here's another question: Is it fair to ask Reiser4 to make its plugins generic, or should we be asking ext2/3 first?
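Nikita's point — that ext2 already selects per-file behavior from the type bits persisted in i_mode — can be mimicked from userspace. A hedged sketch, not ext2's actual code: GNU stat's `%f` prints the raw st_mode in hex, and the leading digit carries the same "plugin id" the kernel dispatches on when it reads an inode.

```shell
# Dispatch on the type bits of st_mode, the way ext2 picks per-type
# operations from i_mode. stat -c %f prints st_mode in hex; the
# leading hex digit encodes the file type (4=dir, 8=regular, a=symlink).
ftype() {
    case $(stat -c %f "$1") in
        4*) echo "directory ops" ;;
        8*) echo "regular-file ops" ;;
        a*) echo "symlink ops" ;;
        *)  echo "special-file ops" ;;
    esac
}

ftype /   # -> directory ops
```

The handler strings stand in for the inode/file operation vectors the kernel would select.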
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Sarath Menon wrote: On Saturday 29 July 2006 23:41, David Masover wrote: I know Gentoo handles this automatically (emerge nvidia-kernel). I hate to say this again, but it's not automatic. It requires more My point is, there's a fairly large group of users who would be willing to do that, as they're willing to do that to get their video drivers working. Also, assuming a distro did choose to support it, the only reason nvidia-kernel isn't just distributed with a pre-built kernel (on pre-built OSes, anyway) is licensing. This isn't a problem for Reiser4, which is GPL'd. I suspect that, all technical, political, and mine is bigger arguments aside, being available as a root FS of a distro, especially a default FS, would go a long way towards inclusion in the kernel. So all you have to do is find a reasonably popular and friendly distro, with people who are (for the moment) easier to deal with than kernel people. It's actually a matter of hassle for the end user. That's where I would agree with Hans' comments from quite a bit earlier. Putting it in the kernel doesn't make it any more or less of a hassle for the end user than getting distro support. I remember downloading a different set of Debian floppies which supported XFS, before XFS was mainstream. In that sense, it's somewhat done already -- there is a Gentoo livecd that is kept patched for Reiser4. The problem with Gentoo, of course, is that if you're going to use Gentoo, you're going to be compiling your own kernel. So when it comes down to getting vanilla-sources or gentoo-sources, it wouldn't take much -- just a reiser4-sources, or a separate reiser4-module package. Most people, if they even know what a filesystem or a kernel is, still won't bother compiling their own kernel, you're right. But that means they are more likely to be using a distro-patched kernel than a stock, vanilla one. Well, that's different, and that's the main problem in the Linux empowerment that we see around ourselves. 
It finally revolves around the user, and as harsh as it may seem, it ultimately is the user who decides which fs is better. (Give or take, they don't know the difference between kernel and user-space, or for that matter far more basic things.) If I remember right, SuSE had ReiserFS as the default at one point. If even one moderately popular Linux had Reiser4 as the default FS, it would get a LOT more exposure than it would simply being included (as EXPERIMENTAL, at that) in the vanilla kernel. Of course, it's odd that I mention Gentoo, since the Gentoo people (as a rule) hate ReiserFS, but there are far more distros than there are popular kernel forks. I'm sure someone will be interested. I do, and that's partly due to the speed of /usr/portage on reiser4, and the ease of blowing everything away and starting from scratch :-) Yes, I love /var/lib/portage/world also. Is /usr/portage still faster on Reiser4? I know it was when I switched, but that was years ago...
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Horst H. von Brand wrote: Jeff Garzik [EMAIL PROTECTED] wrote: [...] It is then simple to follow that train of logic: why not make it easy to replace the directory algorithm [and associated metadata]? or the file data space management algorithms? or even the inode handling? why not allow customers to replace a stock algorithm with an exotic, site-specific one? IMVHO, such experiments should/must be done in userspace. And AFAICS, they can today. inode handling? Really? But what's wrong with people doing such experiments outside the kernel? AFAICS, exotic, site-specific one is not something that would be considered for inclusion.
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Hans Reiser wrote: Linus Torvalds wrote: In other words, if a filesystem wants to do something fancy, it needs to do so WITH THE VFS LAYER, not as some plugin architecture of its own. (Let us try to avoid arguments over whether if you extend VFS it is still called VFS or is called reiser4's plugin layer, agreed?) Ok, assuming you actually extend the VFS. The point is that if we want plugins, we don't have to implement them in ext3, but we have to put the plugin interface somewhere standard that is obviously not part of one filesystem (the VFS is the place) so that ext3 can implement a plugin system without having to read or touch a line of reiser4 code, and without compiling reiser4 into the kernel. It may ultimately not be any different, technically. This seems more like an organizational and political thing. But that doesn't make it less important or valid. Regarding copyright, these plugins are compiled in. I have resisted dynamically loaded plugins for now, for reasons I will not go into here. Good point, there's no GPL issue here. Plugins will either not be distributed (used internally) or distributed as GPL. If you agree with taking it to the next level, then it is only to be expected that there are things that aren't well suited as they are, like parsing /etc/fstab when you have a trillion files. It is not very feasible to do it for all of the filesystems all at once given finite resources, it needs a prototype. Doesn't have to be in fstab, I hope, but think of it this way: ext3 uses JBD for its journaling. As I understand it, any other filesystem can also use JBD, and ext3 is mostly ext2 + JBD. So, make the plugin interface generic enough that it complements the VFS, doesn't duplicate it, and doesn't exist as part of Reiser4 (and doesn't require Reiser4 to be present). This may be just a bunch of renaming or a lot of work, I don't know, but I suspect it would make a lot of people a lot happier. We have finite resources. 
We can give you a working filesystem with roughly twice the IO performance of the next fastest you have, that does not disturb other filesystems (4x once the compression plugin is fully debugged). It also fixes various V3 bugs without disturbing that code with deep fixes. We cannot take every advantage reiser4 has and port it to every other filesystem in the form of genericized code as a prerequisite for going in, we just don't have the finances. This is a very compelling argument to me, but that's preaching to the choir, I've been running Reiser4 since before it was released, and before it looked like it was going to be stable anytime soon. It may be bold of me to speak for the LKML, but I think the general consensus is: The speed of a nonworking program is irrelevant -- no one cares how fast it is if it breaks things, either now or in the future. Currently, the concern is that it breaks things in the future, like adding plugin support to other filesystems. And no one else cares what your finances are. Not out of compassion, but out of practicality. For instance, it would be a huge financial benefit to me if the kernel displayed, in big bold letters while booting, that DAVID MASOVER WROTE THIS! (I'm sure Linus knows what I'm talking about.) It would also be untrue in my case, and pointless for everyone else in the kernel, so I have to find another way to make money. This is because one way Linux stays ahead of the competition (technologically) is by having quality be a much greater motivation than money. Without plugins, our per-file compression and encryption plugins cannot work. We can however let other filesystems use our code, and cooperate as they extend it and genericize it for their needs. Imposing code on other development teams is not how one best leads in open source, one sets an example and sees if others copy it. That is what I propose to do with our plugins. If no one copies, then we have harmed no one. Reasonable? 
Someone still has to maintain the FS. Anyway, like I said, this is a very compelling argument for me, but code speaks louder than words. If you insist it's not in the VFS, then maybe use some insanely simple FS like RomFS to demonstrate another FS using plugins? Do that, and put it in the VFS. Maybe implement something like cramfs as a romfs plugin (another demo). Maybe even per-file -- implement zisofs as isofs + compression plugin. I think that would effectively kill any argument that plugins are bad because they are only in Reiser4. Beyond that is all marketing, I guess. The word plugin is not helping here, too many people remember plugins like Macromedia Flash...
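The organizational argument running through this thread — a plugin interface that belongs at the "VFS" level rather than inside reiser4 — can be dramatized in a few lines of shell. Everything here is invented for illustration (the registry, the plugin id, and `tr` standing in for a real transformation plugin): one filesystem contributes a plugin, another consumes it without touching the contributor's code.

```shell
# Hypothetical sketch: a plugin registry owned by neither filesystem.
registry_file=$(mktemp)

vfs_register_plugin() {  # vfs_register_plugin <id> <command>
    printf '%s %s\n' "$1" "$2" >> "$registry_file"
}
vfs_lookup_plugin() {    # vfs_lookup_plugin <id> -> registered command
    awk -v id="$1" '$1 == id { sub($1 FS, ""); print }' "$registry_file"
}

# "reiser4" contributes a transformation plugin (tr as a stand-in)...
vfs_register_plugin transform.upper "tr a-z A-Z"

# ...and "ext3" uses it, knowing only the registry id.
ext3_write() {
    $(vfs_lookup_plugin transform.upper)
}
```

The analogy to JBD in the mail above is the same shape: shared infrastructure at a neutral layer, used by whichever filesystem wants it.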
Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)
Hans Reiser wrote: plugins if not for us. Our plugins affect no one else. Our self-contained code should not be delayed because other people delayed And at the moment, I can still use Reiser4. If I ever make a distro, I will include Reiser4 support, probably as the default FS. That will help with getting into the kernel. So, why is it that it's urgent to get into the kernel? It will have to be bootstrapped one way or another -- either get it into the kernel so distros are more likely to include it, or get it into distros so the kernel is more likely to include it. But this is exactly the kind of thing that has happened before. With XFS, with Nvidia even -- clean it up, do it the way the kernel people want you to, because they're the ones who will have to maintain it for 20 years, and make sure it doesn't stop working or break anything else. advantage from leading. If they want to some distant day implement generic plugins, for which they have written not one line of code to date, fine, we'll use it when it exists, but right now those who haven't coded should get out of the way of people with working code. It is not fair or just to do otherwise. It also prevents users from getting advances they could be getting today, for no reason. It prevents users from doing nothing. Our code will not be harder to change once it is in the kernel, it will be easier, because there will be more staff funded to work on it. If indeed it can be changed easily at all. I think the burden is on you to prove that you can change it to be more generic, rather than saying Well, we could do it later, if people want us to... As for this we are all too grand to be bothered with money to feed our families business, building a system in which those who contribute can find a way to be rewarded is what managers do. Free software programmers may be willing to live on less than others, but they cannot live on nothing, and code that does not ever ship means living on nothing. 
Let me put this in perspective the best way I know how, with an inane analogy: Suppose there's a band. A good band, full of impossible superstars, led by a benevolent dictator -- for the sake of argument, let's call him Elvis. (the King -- dictator...) The band's doing really well, and Elvis' crew are getting paid fairly well just to share their music. (Ok, maybe Elvis didn't write anything, but bear with me...) Now, along comes a young Jimi Hendrix. He wants to be in the band, and Elvis says Sure, just come up with a song we like and we'll play it, and you can even play it with us! Sounds like a pretty good deal, so Jimi goes and tells all his friends, a couple of girls... Now, Jimi finishes his song, Elvis listens to it, and if you know anything about the music Elvis did and the music Hendrix did, you can imagine what happens next. Elvis says This song just isn't us. But if you change it here, and here, and maybe here, we'll play it. Jimi is devastated. He'd been counting on playing it with them that night, and if he doesn't, he won't have any groupies, all his friends will laugh at him, and his life will kind of suck. But, does anyone really think Elvis has any business singing Voodoo Child? Or Purple Haze? Is it really fair to ask Elvis to completely change his act and embarrass himself to help Jimi out? The answer is, Jimi shouldn't have staked so much on something that was never a guarantee. And what's more, the real-life Jimi Hendrix never played with Elvis, but had a very successful band of his own. And if Elvis was still alive, seeing Jimi play might make him change his mind, maybe -- but at least with his own band, Jimi's success isn't pinned on playing with Elvis. This analogy is flawed in many ways, aside from just being plain chronologically impossible, but while I'm sure Linus feels bad for you, I don't think it's his obligation to compromise his kernel to help you out with your financial situation. 
So it would help a lot if you wouldn't keep bringing it up in what should be a technical discussion. So, if you can't make it work with VFS, then I guess you can't, and you're stuck either creating another interface which is not tied to any one filesystem and isn't tied to the VFS either, or coming up with a better (more specific) idea of how to make the Reiser4 plugin system acceptable to kernel maintainers without having to eat Ramen for a few years. Understand that I'm putting on my devil's advocate hat right now. I'd love to see Reiser4 merged tomorrow, or a week from now, exactly as it's written today, but I just don't see it happening. I'd also love to get more technical, but I just don't know the Reiser4 internals well enough to understand the feasibility (or not) of any of my vague ideas.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Jeff Garzik wrote: Pavel Machek wrote: Hi! of the story for me. There's nothing wrong about focusing on newer code, but the old code needs to be cared for, too, to fix remaining issues such as the can only have N files with the same hash value. Requires a disk format change, in a filesystem without plugins, to fix it. A filesystem WITH plugins must still handle the standard Linux compatibility stuff that other filesystems handle. Plugins --do not-- mean that you can just change the filesystem format willy-nilly, with zero impact. They --do-- mean that you can change much of the filesystem behavior without requiring massive on-disk changes or massive interface changes. After all, this is how many FUSE plugins work -- standard FS interface, usually uses another standard FS as storage, but does crazy things like compression, encryption, and other transformations in between.
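The FUSE comparison can be made concrete with a toy transformation layer — a sketch that assumes nothing about real FUSE APIs, just the shape of the idea: a standard write/read interface with compression silently applied in between, the way zisofs layers compression over isofs. The "disk" is a scratch directory and the function names are invented.

```shell
# Toy compression layer: callers see plain put/get, gzip happens
# transparently in the middle, like a per-file compression plugin.
store=$(mktemp -d)

fs_write() {  # fs_write <name>  (data on stdin)
    gzip -c > "$store/$1.gz"
}
fs_read() {   # fs_read <name>   (data to stdout)
    gzip -dc "$store/$1.gz"
}

printf 'hello world' | fs_write greeting
fs_read greeting   # -> hello world
```

The on-"disk" representation is compressed, but nothing above the interface needs to know.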
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Maciej Sołtysiak wrote: Hello David, Thursday, July 27, 2006, 3:19:15 AM, you wrote: I'm not arguing for closed source, I'm just saying that once you open, there's no going back. Many times it's a good thing, but sometimes you A sidenote. Reiser4 is open and still we don't see people writing plugins like crazy. I believe there is one group that tried to be the first outside of namesys to write a plugin but still no success. Kernel inclusion would help a lot. Decent documentation would be better, though. Someone should look at what FUSE is doing right. Plugins fill a lot of the same niches, but with significantly better performance.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matthias Andree wrote: On Tue, 25 Jul 2006, David Masover wrote: Matthias Andree wrote: On Tue, 25 Jul 2006, Denis Vlasenko wrote: I, on the contrary, want software to impose as few limits on me as possible. As long as it's choosing some limit, I'll pick the one with fewer surprises. Running out of inodes would be pretty surprising for me. No offense: Then it was a surprise for you because you closed your eyes and didn't look at df -i or didn't have monitors in place. Or because my (hypothetical) business exploded before I had the chance. After all, you could make the same argument about bandwidth, until you get Slashdotted. Surprise! There is no way to ask how many files with particular hash values you can still stuff into a reiserfs 3.X. There, you're running into a brick wall that only your forehead will see when you touch it. That's true, so you may be correct about fewer surprises. So, it depends which is more valuable -- fewer surprises, or fewer limits? That's not a rhetorical question, and I don't really know. I can see both sides of this one. But I do hope that once Reiser4 is stable enough for you, it will be predictable enough. But the assertion that some backup was the cause for inode exhaustion on ext? is not very plausible since hard links do not take up inodes, symlinks are not backups and everything else requires disk blocks. So, Ok, where's the assertion that symlinks are not backups? Or not used in backup software? What about directories full of hardlinks -- the dirs themselves must use something, right? Anyway, it wasn't my project that hit this limit.
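The `df -i` monitoring Matthias alludes to is easy to script — a sketch, not a recommendation: the 90% threshold is an arbitrary choice, and the IUse% column is only meaningful on filesystems with a fixed inode table (reiserfs, which allocates keys dynamically, reports nothing useful here).

```shell
# Warn when a filesystem crosses an inode-usage threshold.
THRESHOLD=90
MOUNT=${1:-/}

# -P keeps each filesystem on one line; field 5 is the IUse% column.
pct=$(df -Pi "$MOUNT" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

case $pct in
    ''|*[!0-9]*)
        echo "no inode accounting on $MOUNT" ;;
    *)
        if [ "$pct" -ge "$THRESHOLD" ]; then
            echo "WARNING: $MOUNT at ${pct}% of inodes"
        else
            echo "OK: $MOUNT at ${pct}% of inodes"
        fi ;;
esac
```

Dropped into cron, something like this is the "monitors in place" that turns inode exhaustion from a surprise into a ticket.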
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Matthias Andree wrote: On Tue, 25 Jul 2006, Denis Vlasenko wrote: I, on the contrary, want software to impose as few limits on me as possible. As long as it's choosing some limit, I'll pick the one with fewer surprises. Running out of inodes would be pretty surprising for me. But then, I guess it's a good thing I don't admin for a living anymore.
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Russell Cattelan wrote: On Sun, 2006-07-23 at 01:20 -0600, Hans Reiser wrote: Jeff, I think that a large part of what is going on is that any patch that can be read in 15 minutes gets reviewed immediately, and any patch that is worked on for 5 years and then takes a week to read gets [...] It is important that we embrace our diversity, and be happy for the strength it gives us. Some of us are good at small patches that evolve, and some are good at escaping local optimums. We all have value, both trees and grass have their place in the world. Which is summed up quite well by: http://en.wikipedia.org/wiki/Color_of_the_bikeshed It seems to be a well-known tendency for people to want to be involved in some way, thus keeping too much of the development cycle internal tends to generate friction. No, I think Hans is right. Although I should mention, Hans, that there is a really good reason to prefer the 15 minute patches. The patches that take a week are much harder to read during that week than any number of 15 minute incremental patches, because the incremental patches are already broken down into nice, small, readable, ordered chunks. And since development follows some sort of logical, orderly pattern, it can be much easier to read it that way than to try to consider the whole. Think of it this way -- why are debuggers useful? One of the nicest things about a debugger, especially for newbies, is the ability to step through a program a line at a time. It's the same principle -- you can understand the program state at one point in time, and the impact of one line of code, much more easily than the overall model of the program state (and all of its edge cases), or the impact of several hundred (thousand? tens of thousands?) lines of code. So while I don't blame the Namesys team for putting off inclusion till it's done, I also can't really blame the kernel guys for not wanting to read it, especially if it's revolutionary. 
Revolutionary ideas are hard to grasp, and it's not their fault. I mean, if revolutionary ideas were easy, why didn't you write Reiser4 for a system like, say, Tunes? (http://tunes.org/) Say what you will, but there are ways of doing fast filesystems which don't require that said filesystems be written in kernel C. Consider this: http://www.cs.utah.edu/flux/oskit/ If I understand that right, it's a mechanism for writing kernel code in languages like Java, Lisp, Scheme, or ML... If we could all grasp every single good (best) idea from every corner of software engineering, and write completely new software (including the OS) using those ideas, we could potentially replace all existing software in something like 3-5 years with software which has none of the problems ours does now. We'd never have inflexibility, insecurity, instability, user interface issues... Never have to worry about getting software out the door (it'd be so fast to develop), but always have it designed the right way the first time, yet be able to rearrange it completely with only 5-10 line patches. So it's not always the computer hardware that's the limitation. Often it's our hardware as well. Human beings usually aren't equipped to be able to grok the whole universe all at once. If we were... see above.
Re: Viewing files as directories
Timothy Webster wrote: WARNING, a user's point of view ;) Everything is a file, including a directory. Being able to view files as directories is not just a nice-to-have thing. It is actually required if we are going to manage changesets of odf files. The lkml people will tell you that this isn't required at all, and it's ludicrous to say so. And they're somewhat right -- you could just patch SVN, and it might be easier than writing a Reiser4 plugin. The truth is most people aren't code developers, but document developers. odf files are a container. And it is XML inside. Come on, do you really expect people to read XML diffs? Even if you split the XML out into files/dirs based on elements, using SVN directly would be way too arcane to people who are used to what word processors already do -- it's something called Track Changes. Fire up OpenOffice and check out the Edit-Changes menu. Word has a similar feature. Not as powerful, maybe, but most people are not collaborative document developers, either. But just about every program or script would be better off seeing the odf as a compressed directory. Maybe, maybe not. Yes it would be really wonderful, if we could just see directories as files and files as directories. Which of course means a file and a directory are one and the same. Ever use OS X? It does this, to some extent, in Finder, which supports the lkml point that doing this in the filesystem, or anywhere in the kernel, is unnecessary and a bad idea. As things stand now the way forward seems to be per-application mime types. Simple, right? But it is not, because application tools like svn, bzr, There are two OS X file types that I know of, and probably quite a few more, which are actually stored on disk as folders, which is why most Mac software is distributed as disk images or zipfiles. One is the Application type (.app, though Finder hides the extension) and the other is the MPKG type (whatever it stands for, extension is .mpkg). 
Basically, they appear as ordinary files to Finder, which means that most of the time, you cannot see that there are files inside them. You double click on a .app, and it runs a script in a predefined relative location inside the folder. Double click on a .mpkg, and it launches their installer program. Drag them around and they behave like files in every way, except that you cannot email them, upload them to a web page, or use them anywhere other than your local Mac system which expects single files. But when you run into that, just zip them. But if you want, you can right-click on them (or control+click) and -- I forget which option it is, but you can browse inside the package. By the way, Hans, Apple has beaten you by quite a bit for at least some of the functionality we've discussed. You can do operations on Search Folders easily, which work by using Spotlight (an indexed fulltext local system search engine). You can have files-as-directories, to a point. There are generic ways of getting at metadata, and they are done as plugins -- Spotlight plugins, anyway. I'd much rather use the Reiser4 described in the whitepaper, of course, and I am getting sick of the lack of decent package management for my Mac, so I'll be adding a Linux boot. I'm curious to see if Reiser4 is stable on PowerPC -- this is a year-old G4, I missed the Intel cores by just a few short months...
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
Mike Benoit wrote: Thanks for all your hard work, I'm sure many other MythTV users will appreciate it. As a future MythTV user a bit late to this discussion, I'm curious -- was this Reiser3 or 4? Are there any known MythTV issues with v4? I say this because the box with my capture card is running on a Reiser4 root right now...
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Horst H. von Brand wrote: 18GiB = 18 million KiB, you do have a point there. But 40 million files on that, with some space to spare, just doesn't add up. Right, ok... Here's a quick check of my box. I've explicitly stated which root-level directories to search, to avoid nfs mounts, chrooted OSes, and virtual filesystems like /proc and /sys. elite ~ # find /bin/ /boot/ /dev/ /emul/ /etc/ /home /lib32 /lib64 /opt /root /sbin /tmp /usr /var -type f -size 1 | wc -l 246127 According to the find manpage: -size n[bckw] File uses n units of space. The units are 512-byte blocks by default or if `b' follows n, bytes if `c' follows n, kilobytes if `k' follows n, or 2-byte words if `w' follows n. The size does not count indirect blocks, but it does count blocks in sparse files that are not actually allocated. And I certainly didn't plan it that way. And this is my desktop box, and I'm just one user. Most of the space is taken up by movies. And yet, I have almost 250k files at the moment whose size is less than 512 bytes. And this is a normal usage pattern. It's not hard to imagine something prone to creating lots of tiny files, combined with thousands of users, easily hitting some 40 mil files -- and since none of them are movies, it could fit in 18 gigs. I mean, just for fun: elite ~ # find /bin/ /boot/ /dev/ /emul/ /etc/ /home /lib32 /lib64 /opt /root /sbin /tmp /usr /var | wc -l 866160 It may not be a good idea, but it's possible. And one of the larger reasons it's not a good idea is that most filesystems can't handle it. Kind of like how BitTorrent is a very bad idea on dialup, but a very good idea on broadband.
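The semantics of find's default `-size` unit, which the census above relies on, can be verified in a scratch directory — a sketch; the file names and sizes are arbitrary. With 512-byte units, `-size 1` matches files whose size rounds up to exactly one block, i.e. 1 to 512 bytes, while empty files are `-size 0`.

```shell
# Verify what "find -type f -size 1" actually counts.
tmp=$(mktemp -d)

printf 'tiny'         > "$tmp/small1"   # 4 bytes   -> one block
printf '%100s' ' '    > "$tmp/small2"   # 100 bytes -> one block
head -c 600 /dev/zero > "$tmp/big"      # 600 bytes -> two blocks
: > "$tmp/empty"                        # 0 bytes   -> -size 0

count=$(find "$tmp" -type f -size 1 | wc -l)
echo "files in 1..512 bytes: $count"    # expect 2

rm -r "$tmp"
```

So the 246127 figure is exactly the population of files in the 1..512 byte range, the class of files tail-packing filesystems store most efficiently.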
Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion
Hans Reiser wrote: to use as his default. Now that we paid the 5 year development price tag to get everything as plugins, we can now upgrade in littler pieces than any other FS. Hmm, I need a buzz phrase, it's not extreme programming, maybe moderate programming. Does that sound exciting to Hah! No, it doesn't sound exciting. Plugins don't work well either, not as a marketing concept. People have had so many bad experiences with plugins, and they're only ever visible when you have a bad experience. Think about it -- missing plugin (so you have to download it), On the other hand, it works for WordPress. My day job is work on a plugin for WordPress. Not including a link because I feel dirty for having to work with PHP... Fluid programming? If you build a solution from the bottom up with gravel or large rocks, you leave gaps that are hard to fill without ripping off the top layer and redoing it. But if you can do fluid programming, your program just flows around any obstacle, and into every crack / between every space (metaphor for new customer requirements)...