Re: Filesystem corruption

2007-05-30 Thread David Masover
On Tuesday 29 May 2007 07:36:13 Toby Thain wrote:

  but you can't mention using reiserfs in mixed company without someone
  accusing you of throwing your data away.

 People who repeat this rarely have any direct experience of Reiser;
 they repeat what they've heard; like all myths and legends they are
 transmitted orally rather than based on scientific observation.

Well, there is one problem I vaguely remember that I don't think has been 
addressed; I think it was one of those let's-put-it-off-till-v4 things. It's 
the fact that there is a limited number of inodes (or keys, or whatever you 
call a unique file), and no way of knowing how many you have left until, one 
day, your FS suddenly refuses to create another file.

(For comparison, ext3 seems to support not only telling you how many inodes 
you have left, but tuning that on the fly.)
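(For anyone curious what that looks like in practice, this is roughly it on 
ext3 -- strictly, the inode count is chosen at mkfs time and df -i just 
reports what's left; the mkfs lines below are illustrative examples with a 
made-up device, not commands to run blindly:)

```shell
# How many inodes are left on the filesystem holding / :
df -i /

# On ext2/ext3 the total is fixed when the filesystem is created,
# e.g. one inode per 4096 bytes of space, or an explicit count
# (/dev/sdb1 here is a made-up example device):
#   mkfs.ext3 -i 4096 /dev/sdb1
#   mkfs.ext3 -N 2000000 /dev/sdb1
```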

But, I haven't run into that, and the only problem I've had lately has been 
Reiser4 losing data, and crashing occasionally. I switched most of my data 
off of Reiser4 and onto XFS for that reason. I've also been using ext3 in 
some places, and Reiser3 in others (one place in particular where space is 
limited, but I will have tons of small files).

I later learned that XFS does out-of-order writes by default, making me think 
I should give up and invest in UPS hardware. But, switching away from Reiser4 
means I no longer see random files (including stuff in, for example, /sbin, 
that I hadn't touched in months) go up in smoke.

Ordinarily I like to help debug things, but not at the risk of my data. Maybe 
I'll try again later, and see if I can reproduce it in a VM or somewhere 
safe...

I do still follow the list, though, in case something interesting happens. It 
was fun while it lasted!




Re: Filesystem corruption

2007-05-30 Thread David Masover
On Wednesday 30 May 2007 11:42:01 Toby Thain wrote:

 But does it cause data loss? One usually sees claims that "reiserfs
 ate my data", or "I heard reiserfs ate somebody's data", but without
 supplying a root cause - bad memory? power failure? bad disk? etc.

Power failure shouldn't kill a filesystem, and generally shouldn't eat data 
that was written to disk before the failure. (Although I could complain all 
day here about why corruption happens anyway when you do any kind of 
out-of-order operations...  I am looking forward to that Reiser4 transaction 
API, so we can finally get rid of the tmpfile+rename hack.)
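The hack in question is the usual atomic-replace idiom. A minimal sketch 
(filenames made up), showing why it works and why it's still not a 
transaction:

```shell
# tmpfile+rename: write the complete new version to a temp file in the
# same directory, then rename it over the original.  rename(2) is atomic
# within one filesystem, so readers see either the whole old file or the
# whole new file, never a torn half-write.
dir=$(mktemp -d)
cd "$dir"

printf 'old contents\n' > config.txt

tmp=$(mktemp config.txt.XXXXXX)   # temp file on the same filesystem
printf 'new contents\n' > "$tmp"
mv -f "$tmp" config.txt           # mv within one filesystem is rename(2)

cat config.txt                    # now reads "new contents"
```

The catch is that without an fsync of the temp file before the rename, a 
crash can still leave the new name pointing at not-yet-written data under 
some filesystems and mount options -- exactly the sort of thing a real 
transaction API would make unnecessary.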

But in any case, there were some kernels -- 2.4.16, I think? -- in which 
reiserfs was unstable and did corrupt easily. I believe that was tracked down 
to kernel bugs outside of reiserfs.




Re: Filesystem corruption

2007-05-30 Thread David Masover
On Wednesday 30 May 2007 12:22:17 devsk wrote:

 I have used R4 for a year now and I have had to reset my PC,
 troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so
 many times that it's not even funny! And R4 didn't give me any problems
 even once. It boots right up, without any files lost and with a consistent
 FS, as a subsequent livecd boot and fsck proved every time.

That happened to me for maybe a year or so, I'm not sure. Then, slowly, I 
started to get problems. The machine crashing due to some nvidia bug -- or 
even a reiser-specific oops or something -- then I'd have to fsck it, which 
would take an hour or more, then I'd boot, and apparently no problems.

Only, recently, these fsck-a-thons started happening more and more often, and 
I started to lose random files. They'd just be silently truncated to 0 bytes. 
And not files I was writing a lot -- I'm talking about things 
like /bin/mount.

Now, maybe it's an amd64-specific bug. Or (somehow) a dmraid-specific bug, or 
a dont_load_bitmap bug. (Who can blame me; without dont_load_bitmap, it takes 
at least 30 seconds, maybe a minute to mount.) Could even be, somehow, a 
Gentoo-specific bug. Could be a 350-gig-partition bug, or even a bug of the 
it-hates-me variety. (My server ran Reiser4 for a while longer, with no 
problems, but I wasn't about to take chances there.)
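(For context, dont_load_bitmap is a reiser4 mount option that defers reading 
the block-allocation bitmaps until they're needed. A setup like the one 
described would carry it in fstab along these lines -- the device and mount 
point here are made up:)

```
# /etc/fstab fragment (illustrative)
/dev/mapper/raid-home  /home  reiser4  defaults,dont_load_bitmap  0 0
```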

But, I switched a friend over to Ubuntu, and he had the same kind of problems. 
In fact, he had them first (I thought it was his computer for a while).

Finally, we switched to stock Ubuntu kernels and XFS, me on dmraid, him on 
normal linux raid5 (md), and we now have no problems. It's even faster -- the 
biggest gain for Reiser4 was /usr/portage, which doesn't exist on Ubuntu.

 If I did that to ext 
 or xfs, I would have lost big time.

Well, I'm on XFS on my desktop now, and ext3 on my server. No problems at all 
so far. Also much faster, because my desktop now has a repacker (xfs_fsr).

 I hope people don't leave this good piece of code to rot!!

Me too, but you know, I can no longer afford to spend a few hours running fsck 
for no apparent reason. I no longer have a machine that can do anything but 
just work.

The killer feature of Reiser4, as implemented, is small file performance that 
makes ReiserFSv3 weep, and v3 makes XFS weep. All the other stuff we were 
promised is either planned for a later release (repacker, pseudofiles, 
transaction API) or barely working (cryptocompress).

And on just about any setup I work on today, small file performance is a small 
enough priority that even the slightest hint of instability is a 
deal-breaker. Enough people feel the same way that ext3 is still widely used. 
And if it's ever really crucial, there's reiserfs3.

So, you can blame it on my hardware, or on not getting kernel inclusion, or 
anything you want, but the only place I still use Reiser4 is on the 
gameserver at our LAN party, and we're thinking of moving that to something 
like ext3 or xfs, just so we don't need custom kernels. And after all, that's 
a gameserver, it's not like the filesystem is the bottleneck anyway.




Re: Filesystem corruption

2007-05-30 Thread David Masover
On Wednesday 30 May 2007 11:02:26 Vladimir V. Saveliev wrote:

  Ordinarily I like to help debug things, but not at the risk of my data.
  Maybe I'll try again later, and see if I can reproduce it in a VM or
  somewhere safe...

 that would be great, thanks

Keep in mind, it's unlikely, given I don't have much resembling my original 
setup left around. And it was fairly random, under fairly normal usage 
patterns -- I'd just suddenly notice my movie had stopped playing, and I'd 
hit ctrl+alt+f8 and find a bunch of reiser4 error messages.

Is it at all likely that this is an amd64 bug? (The only two places I've seen 
it are on my box and my friend's, both amd64 on some sort of RAID.) If you 
don't have enough testers or hardware for amd64, I can try (again) to set up a 
working x86_64 VM for you to test on.




Reiser4 crash (?)

2006-10-06 Thread David Masover
Finally set up network logging:  kernel -> syslog-ng -> TCP (crossover)
-> syslog-ng (other box) -> log file.  This time, I actually caught
something from the crash.  It may be hardware-related, but I thought I'd
report it here this time because the crash was definitely in Reiser4
code.  This may or may not be relevant:

Oct  6 02:15:27 elite irq event 217: bogus return value 8027d751
Oct  6 02:15:27 elite
Oct  6 02:15:27 elite Call Trace: IRQ 8028f765{__report_bad_irq+53}
Oct  6 02:15:27 elite 8028f812{note_interrupt+82} 8028f1e9{__do_IRQ+169}
Oct  6 02:15:27 elite 802611e2{do_IRQ+66} 8025f290{default_idle+0}
Oct  6 02:15:27 elite 802582ac{ret_from_intr+0} EOI 8025a76f{thread_return+0}
Oct  6 02:15:27 elite 8025f2ba{default_idle+42} 8024592d{cpu_idle+61}
Oct  6 02:15:27 elite 8048384f{start_kernel+495} 80483255{_sinittext+597}
Oct  6 02:15:27 elite handlers:
Oct  6 02:15:27 elite [8033cea0] (usb_hcd_irq+0x0/0x60)
Oct  6 02:15:27 elite [880b55a0] (nv_nic_irq+0x0/0x180 [forcedeth])

This message repeated a couple of times beforehand.  The only log
entries between that and the crash are my ntpd, apparently trying to set
the clock, apparently not bothering to actually do it:

Oct  6 02:16:03 elite ntpd[4996]: adjusting local clock by -2012.380177s
Oct  6 02:19:55 elite ntpd[4996]: adjusting local clock by -2012.350292s
Oct  6 02:20:25 elite ntpd[4996]: adjusting local clock by -2012.088300s
Oct  6 02:24:42 elite ntpd[4996]: adjusting local clock by -2011.895028s
Oct  6 02:27:58 elite ntpd[4996]: adjusting local clock by -2011.637006s
Oct  6 02:32:12 elite ntpd[4996]: adjusting local clock by -2011.575243s
Oct  6 02:33:15 elite ntpd[4996]: adjusting local clock by -2011.448711s
Oct  6 02:35:23 elite ntpd[4996]: adjusting local clock by -2011.293673s

I can't believe my clock is skewed that badly, especially since when I
manually restart ntpd, I get log entries like this:

Oct  5 04:05:50 grunt ntpd[6601]: adjusting local clock by -0.146991s

I believe this is the crash, though:

Oct  6 02:38:36 elite Unable to handle kernel NULL pointer dereference at 0038 RIP:
Oct  6 02:38:36 elite 880808e7{:reiser4:search_one_bitmap_forward+135}
Oct  6 02:38:36 elite PGD 7552e067 PUD 7de47067 PMD 0
Oct  6 02:38:36 elite Oops:  [1]
Oct  6 02:38:36 elite CPU 0
Oct  6 02:38:36 elite Modules linked in: xt_tcpudp xt_state ip_conntrack iptable_filter ip_tables nfnetlink_queue nfnetlink xt_NFQUEUE x_tables nfs nfsd exportfs lockd sunrpc snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore nvidia af_packet usb_storage joydev ide_cd cdrom amd74xx ide_core sk98lin forcedeth unix reiser4 dm_mod sd_mod sata_nv libata scsi_mod
Oct  6 02:38:36 elite Pid: 428, comm: ktxnmgrd:dm-4:r Tainted: P 2.6.17.13 #2
Oct  6 02:38:36 elite RIP: 0010:[880808e7] 880808e7{:reiser4:search_one_bitmap_forward+135}
Oct  6 02:38:36 elite RSP: 0018:81007f1f5888  EFLAGS: 00010246
Oct  6 02:38:36 elite RAX:  RBX: 0001 RCX: 81007f1f591c
Oct  6 02:38:36 elite RDX: c204d660 RSI: 810001d8fc10 RDI: 810001d8fc10
Oct  6 02:38:36 elite RBP: 81007f1f591c R08: 81007f1f5778 R09: 0000
Oct  6 02:38:36 elite R10: 0010 R11: 81007f1f57d8 R12: 0001
Oct  6 02:38:36 elite R13: 7fe0 R14: 7fe0 R15: 81007f1f5970
Oct  6 02:38:36 elite FS:  2af33ea0f1b0() GS:8047a000() knlGS:
Oct  6 02:38:36 elite CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Oct  6 02:38:36 elite CR2: 0038 CR3: 7d4f1000 CR4: 06e0
Oct  6 02:38:36 elite Process ktxnmgrd:dm-4:r (pid: 428, threadinfo 81007f1f4000, task 81007e64a200)
Oct  6 02:38:36 elite Stack: 8000 00010001 81007f1f591c c204d660
Oct  6 02:38:36 elite 81003df44004 7f1f5ae8 00200020 81007f1f5968
Oct  6 02:38:36 elite 81007f1f591c 0001
Oct  6 02:38:36 elite Call Trace: 88080b11{:reiser4:bitmap_alloc_forward+145}
Oct  6 02:38:36 elite 88080d2f{:reiser4:alloc_blocks_bitmap+447}
Oct  6 02:38:36 elite 88060e53{:reiser4:plugin_by_unsafe_id+35}
Oct  6 02:38:36 elite 8807f6a1{:reiser4:item_length_by_coord+17}
Oct  6 02:38:36 elite 8804fa65{:reiser4:reiser4_alloc_blocks+165}
Oct  6 02:38:36 elite 88053db9{:reiser4:allocate_znode_update+265}
Oct  6 02:38:36 elite 8807f9d1{:reiser4:item_body_by_coord_hard+17}
Oct  6 02:38:36 elite 880776ef{:reiser4:internal_at+15} 88077709{:reiser4:pointer_at+9}
Oct  6 02:38:36 elite 

BitTorrent+Reiser4: curiouser and curiouser

2006-09-22 Thread David Masover
Azureus had a problem.  Once it got up to a good clip downloading, it 
would thrash the disk.  It would thrash the disk, and the system, so 
hard that even web browsing was difficult, due to disk access being 
many, many times slower than Internet access, even an Internet which is 
being hogged by BitTorrent.


After changing Azureus' cache to 32 megs, and telling it not to write 
files immediately, I thought I had the problem solved -- no thrashing at 
all.  Until the cache got full.  Then:  Thrashing.  Less frequent, but 
much more vigorous -- Azureus becomes extremely unresponsive for a few 
minutes.


It shouldn't be touching the disk AT ALL when there's over a gig of FREE 
RAM (as in, neither buffer nor cache nor actually used yet), and the 
file I'm attempting to download is less than 200 megs.  I tried an 
strace, but as I am not at all skilled in the ways of debugging or 
reverse engineering, I got syscall spam -- a 200 meg log file, and when 
I finally found a decent way to analyze it, I found most of Azureus' 
system call wall time is spent in futex().  Huh?


Looked up futex on Wikipedia, and I still have no clue how this makes 
any sense.  Either futex was somehow thrashing the disk, or Azureus has 
somehow managed to fork completely out of strace's control.  Or maybe 
it's somehow something that the kernel is doing on its own, which is 
somehow forcing azureus to block, but somehow not tripping strace's 
timers while doing so.


This problem did not always happen with my Reiser4, but unfortunately, I 
can't pin down exactly when it started doing this.  It might have been a 
kernel upgrade, a Reiser4 upgrade, or an Azureus upgrade.


Here's the catch, though -- when I finally tried another client 
(BitTornado, on the same file), I have had absolutely no thrashing yet. 
 It's hardly touched the disk.  I was thinking maybe Azureus synced 
somehow, and BT didn't, but running "sync" on the command line took about 
2 seconds.  Which means that, with BitTornado, everything works exactly 
the way it's supposed to.


So I'm happy it works, but I'm still curious why Azureus thrashed so 
much, and BitTornado doesn't thrash at all.  Maybe it's the apps?  Or 
Python vs Java?  Or maybe it's something like Evolution and column 
resizing -- something so embarrassingly, absurdly inefficient as 
flushing the column width information to disk every couple of pixels, 
that went unnoticed for so long because fsync performs well enough on 
other filesystems.


That's what it seems like to me, but one thing's sure -- it is neither 
fsync nor fdatasync.  I've disabled those at the kernel level.  I've 
still got no clue as to what it is, but I'll be glad to be rid of 
Azureus just as soon as I can actually find the features I like from it 
in other BitTorrent clients.


Re: BitTorrent+Reiser4: curiouser and curiouser

2006-09-22 Thread David Masover

Konstantin Münning wrote:

David Masover wrote:

(snip)

It shouldn't be touching the disk AT ALL when there's over a gig of FREE
RAM (as in, neither buffer nor cache nor actually used yet), and the
file I'm attempting to download is less than 200 megs.  I tried an
strace, but as I am not at all skilled in the ways of debugging or
reverse engineering, I got syscall spam -- a 200 meg log file, and when
I finally found a decent way to analyze it, I found most of Azureus'
system call wall time is spent in futex().  Huh?

Looked up futex on Wikipedia, and I still have no clue how this makes
any sense.  Either futex was somehow thrashing the disk, or Azureus has
somehow managed to fork completely out of strace's control.  Or maybe
it's somehow something that the kernel is doing on its own, which is
somehow forcing azureus to block, but somehow not tripping strace's
timers while doing so.


Have you used -f or -ff with strace?


I used -f.  What's the difference between that and -ff?
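For reference, the short answer: -f follows forked children into one 
interleaved log (each line tagged with its pid), while -ff together with 
-o FILE writes a separate FILE.pid per process, which is far easier to read 
for a heavily threaded app like Azureus. A sketch, using true as a stand-in 
for the real program:

```shell
# Skip gracefully where strace isn't available or ptrace is blocked:
command -v strace >/dev/null 2>&1 || exit 0

# -f: follow children, one interleaved log
strace -f -o trace.log true 2>/dev/null || exit 0

# -ff (with -o): one file per process, named trace.log.<pid>
strace -ff -o trace.log true 2>/dev/null
ls trace.log.*

# -c: skip the giant log entirely; print a per-syscall count/time summary
strace -c -f true
```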


Re: BitTorrent+Reiser4: curiouser and curiouser

2006-09-22 Thread David Masover

Alexander Zarochentsev wrote:

I guess futex(, FUTEX_WAIT, ) calls can be ignored in this analysis. 
They just wait for another thread to call futex(, FUTEX_WAKE, ).  
It would be interesting to find that thread and look at what it was 
doing before the FUTEX_WAKE. Or does FUTEX_WAIT return ETIMEDOUT?


It probably would be interesting, but I'm a complete newbie at strace. 
You'll have to walk me through this one step by step...


Re: reiser4 resize

2006-09-21 Thread David Masover

Alexey Polyakov wrote:

On 9/20/06, Łukasz Mierzwa [EMAIL PROTECTED] wrote:


It's been proven that flushes are doing much more work than they should.
Not so long ago someone sent a trace of block device IO accesses during
reiser4 work, and someone analyzed it and said that some files or parts of
files were written over and over, 200 times or so.


Wow.  I should go back and read that -- I assume this is being worked on?


a few-months-old filesystem that had been used often just shows a weak
spot in reiser4: while downloading files with azureus at only 64KB of
data per second I got the disk LED on almost all the time; switching to
rtorrent helped, as it does not seem to call fsync (I think I disabled
fsync in azureus).


Hmm, strange.  I am using Azureus, but I don't think it's fsync.  I can 
try rtorrent, but there are several things I like about Azureus that 
nothing else seems to do yet.


But also, Azureus didn't always do this.  In fact, I used it for several 
months before I started having this problem.



Ah, I see, if bittorrent calls fsync often, it's no wonder that
reiser4 behaves badly. I had to preload libnosync for some of my
programs that do fsync to avoid this.


Way ahead of you.  I noticed how much fsync performance sucked when 
using vim, and I was sick of waiting 10 seconds every time I hit :w -- a 
LOT of stuff can pile up in 2 gigs of disk buffer, and at the time, 
Reiser4 fsync effectively just called sync.


I didn't know about libnosync (or it didn't exist yet, or didn't work, 
I'm not entirely sure), so I was faced with either patching vim, which 
had just been patched to _add_ fsync'ing, not to mention all the other 
programs that might fsync too much;  patching glibc (huge, I don't 
update it often, and I'd have no idea where to start);  or patching the 
kernel.


I now keep backups, and I maintain and apply the following (STUPID, 
DON'T TRY THIS AT HOME) patch to my kernel:


--- linux/fs/buffer.c   2006-08-15 20:40:36.504608696 -0500
+++ linux/fs/buffer.c.new   2006-08-15 20:42:35.877461264 -0500
@@ -366,12 +366,12 @@

 asmlinkage long sys_fsync(unsigned int fd)
 {
-   return __do_fsync(fd, 0);
+   return 0;
 }

 asmlinkage long sys_fdatasync(unsigned int fd)
 {
-   return __do_fsync(fd, 1);
+   return 0;
 }

 /*
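For completeness, the libnosync approach mentioned earlier is a much less 
drastic version of the same idea, and it only takes a few lines. This is an 
illustrative sketch (the real libnosync may differ): an LD_PRELOAD shim that 
turns fsync/fdatasync into successful no-ops for one chosen process instead 
of the whole system:

```shell
# Skip gracefully if no C compiler is available:
command -v cc >/dev/null 2>&1 || exit 0

cat > nosync.c <<'EOF'
/* Override fsync/fdatasync with successful no-ops.  Affects only
 * processes started with this library in LD_PRELOAD. */
int fsync(int fd)     { (void)fd; return 0; }
int fdatasync(int fd) { (void)fd; return 0; }
EOF
cc -shared -fPIC -o libnosync.so nosync.c

# Then, for one offending program only:
#   LD_PRELOAD=./libnosync.so vim somefile
```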


Re: reiser4 resize

2006-09-21 Thread David Masover

Alexey Polyakov wrote:

On 9/19/06, David Masover [EMAIL PROTECTED] wrote:


When I have over a
gig of RAM free (not even buffer/cache, but _free_), and am trying to
download anything over BitTorrent, even if it's less than 200 megs, the
disk thrashes so badly that the system is really only usable for web and
email.  Even movies will occasionally stall when this is happening, and
by occasionally, I mean every minute or so.


Do you have this problem on plain vanilla + reiser4?


Yes.

Well, no.  My kernel is:

vanilla 2.6.17.13 on amd64

patches:
sk98lin 8.36, latest from the manufacturer
reiser4-for-2.6.17-3
my own patch that disables fsync and fdatasync

external modules, installed via Portage:
ALSA 1.0.11 driver, using snd_emu10k1 and all sorts of support stuff 
(OSS emulation, synth, etc)

nvidia driver, 1.0.8762

I've also been having some instability issues, but only very rarely 
do these seem at all FS-related.  I'm overclocked a bit, and I can 
reliably crash my system by playing Neverball, Doom 3, or Quake 4 for 
several hours.  I strongly suspect this is either my overclocking or the 
nvidia drivers here.


However, I doubt anything I've done beyond vanilla+reiser4 is affecting 
this disk access issue, and I'm pretty much rock solid when I'm not 
playing a game.  I also have a close-to-identical machine nearby which 
is not overclocked, same kernel, same modules, everything except the 
nvidia driver, been rock solid for a year, no performance issues to 
speak of.  The main difference, other than graphics, is that the stable 
machine is using 21 gigs out of 72, whereas the unstable one (the one 
that's sluggish for BitTorrent) is using 279 gigs out of 350, and has 
been up to 320 or 330 at least before I started cleaning things out.


So I think we're down to two possibilities:  Either an update to Azureus 
has found a way to sync that I'm not aware of, or this is the behavior 
someone described where Reiser4 will attempt to find contiguous space to 
allocate, and continue searching and re-searching the same areas of the 
disk almost every write.  To be honest, I hope it's about syncing, 
somehow, because I'd much rather believe my disk isn't horrendously 
fragmented...


Re: reiser4 resize

2006-09-19 Thread David Masover

Vladimir V. Saveliev wrote:

Hello

On Tuesday 19 September 2006 05:12, Jack Byer wrote:

Short summary: Will a resize program for reiser4 be available within the
next six months?



Currently nobody works on that.  So, I guess it is not very likely that 
reiser4.resize will be created within the next six months.


Not even an expand?  I know a shrink depends on a working repacker (even 
an offline one), but I'd think expanding would be simple enough, so 
long as there's a big warning of "You cannot undo this (can't shrink)!"



When I first created the filesystem, there was a reiser4 resize program.
This is no longer the case.


that was not a working program.


Yes, I remember that, it was a stub.


I think you should change to a filesystem which has resize.


Alternately, how much would it cost to implement basic resizefs.reiser4?

There are other reasons that make me wish I'd stayed away from reiser4 
for a while.  Mainly, right now, I need a repacker, and the system seems 
to have become absurdly slow when it's fragmented.  When I have over a 
gig of RAM free (not even buffer/cache, but _free_), and am trying to 
download anything over BitTorrent, even if it's less than 200 megs, the 
disk thrashes so badly that the system is really only usable for web and 
email.  Even movies will occasionally stall when this is happening, and 
by occasionally, I mean every minute or so.


I believe there was a patch to address the thrashing, so I'm eagerly 
awaiting 2.6.18, but the lack of a repacker bothers me.


Re: v3 rebuild-tree left system in unusable state because of space shortage

2006-09-15 Thread David Masover

Vladimir V. Saveliev wrote:


while there is no fix currently for this problem, you can work around it by 
expanding the underlying device.


Just curious, could it also be fixed by mounting the FS, freeing up some 
space, then retrying the FSCK?  Or is the FS unusable?


Re: Relocating files for faster boot/start-up on reiser(fs/4)

2006-09-15 Thread David Masover

Quinn Harris wrote:

On Thursday 14 September 2006 23:15, Toby Thain wrote:

On 14-Sep-06, at 6:23 PM, David Masover wrote:

Quinn Harris wrote:

On Thursday 14 September 2006 13:55, David Masover wrote:

...

That is a good point.  Recording the disk layout before and after
to compare relative fragmentation would be a good idea.  As well
as randomizing the sequence as a sanity check.
Also note that during boot I was using readahead on all 3885
files.  So the kernel has a good opportunity to rearrange the
reads.  And the read sequence doesn't necessarily match the order
it's needed in (though I tried to get that).

Speaking of which, did you parallelize the boot process at all?

Just off the top of my head, wouldn't that make the access sequence
asynchronous & thereby less predictable? (Although I'm sure it's a
net win.)
It could, but the kernel will try to reorder the outstanding block requests to 
reduce seeking.  Whether that is an overall win, I don't know.  In addition, 
early in the boot, readahead-list or similar will tell the kernel to start 
reading most of the files needed for the complete boot, so they are already in 
memory when needed.  Ubuntu does the readahead now, and all my tests were with 
readahead.


That's interesting.  I think either parallelizing or a very aggressive 
readahead will perform similarly, except in cases where you have a 
script blocking on something other than disk or CPU, like, say, network.



I'd estimate my system easily spent more than 50% of its boot time
not touching the disk at all before I did that.  Gentoo can do
this, I'm not sure what else, as it kind of needs your init system
to understand dependencies.

...




The current Ubuntu boot waits for hardware probing, DHCP and other things 
giving the disk readahead a chance to work.  I think this reallocation might 
help a parallel boot more as the data will be needed sooner.  So I changed my 
mind, I think parallel boot will highlight the reallocate advantage.  Now I 
just need to test the hypothesis.


Hmm.  That's possible.  But again, even with the parallel boot, there 
was still a bit of time spent not touching the disk, so I wouldn't 
expect much more of a speedup than what you already have.  Which also 
means, by the way, that I wouldn't use it much -- my system takes more 
like 20 seconds from Grub to a login prompt, and from then on, the only 
things that take more than 5 seconds to load are games.  Since I know 
Quake 4 uses zipfiles (probably compressed) for its storage, and I 
watched the HD LED while it loads, I don't think I can speed that up at 
all short of buying a faster CPU.


Well, that and the Portage tree, but you say I shouldn't expect much 
from that.  Maybe the portage cache?


Not sure if I would be better off trying initng or waiting for upstart (Ubuntu's 
new init) to get scripts that actually boot in parallel.  The code for upstart 
is very clean and it has the backing of a major distro, so I have high hopes.


Hmm.  That sounds kind of cool, but I wonder how it compares to Gentoo's 
init scripts?  I guess I'll have to wait till it hits the one Ubuntu box 
I have...


Much like before, I was able to improve a 16.5s oowriter cold start to 14s 
with this reallocate script, with a cold start of 4.8s (OO 2.0.2, was using 
2.0.3 before).


Wait -- cold start is 14s, but it's also 4.8s?  Did you mean warm/hot 
start for that last number?


I think Python will be the best language for this because it's become 
relatively universal and it's easy to understand for the uninitiated.  This 
really isn't black magic, so transparency is good.  I personally prefer Ruby, 
though.


Wait...  Python is more universal than the Ruby of Ruby on Rails?

Python is faster, anyway...  I'm waiting for someone to do a decent 
implementation of Ruby on something like .NET before I start using it 
for anything I want to perform well.


Re: reiser4: mount -o remount,ro / causes error on reboot

2006-09-10 Thread David Masover

Peter wrote:

Using: gentoo
kernel 2.6.17.11 with beyond patchset
reiser patch 2.6.17-3
reiser4progs 1.0.5

At the end of the gentoo shutdown script is a short function which
remounts / as ro.


There's also one in the Gentoo startup script, which attempts to remount 
/ ro, then remount it rw.  I commented that out, because it was causing 
similar problems.  I figure if it runs sync when it shuts down, that's 
good enough.


Still, it's an annoying problem, I think it's a kernel oops.  Namesys, 
what kind of information would be helpful?


Re: FEATURE Req: integrate badblocks check into fsck.reiser*

2006-09-07 Thread David Masover

Ric Wheeler wrote:



David Masover wrote:


Hans Reiser wrote:


Ric Wheeler wrote:




Having mkfs ignore bad writes would seem to encourage users to create
a new file system on a disk that is known to be bad & most likely not
going to function well.  If a user ever has a golden opportunity to
toss a drive in the trash, it is when they notice mkfs fails ;-)  This
option to mkfs sounds like an invitation to disaster.


Yes, you are right, the option should be to run badblocks and then fail
if it finds any.



Unless it creates significantly more work for us, there should be an 
option to run badblocks, and if it finds any, it should prompt the 
user (with BIG FAT CAPSLOCK WARNINGS) whether they want to format 
anyway. Formatting anyway should work, and we should be able to have 
blocks marked bad.



I think that you are missing the way modern drives behave.  To give a 
typical example, on a 300 GB drive, we typically have 2000 or more extra 
sectors that are used for automatic remapping.  These sectors are 
consumed only when the drive retries a failed write multiple times.


Oh, I'm not disputing that mkfs should discourage users from using 
broken drives.  Presumably, smart admins wouldn't see this often, 
because they'd be monitoring SMART.


We really, really do not need a list of bad blocks to avoid during 
writing a new file system image.


Why do you presume to make this decision for users?

I don't think we need CONFIG_LEGACY_PTYS -- they're insecure, and almost 
never needed.  But we still leave them in; before removing such a feature, 
the burden is on showing that keeping it takes real work to implement and 
maintain.


I think that the more interesting case is handling bad blocks during 
recovery.  It is not clear to me that fsck needs a list, but we have 
worked with Hans and Vladimir to get support for doing a reverse mapping 
(given a list of bad blocks, show the user what files, etc got hit).


Yes, it seems like fsck would be much better off that way.  In this 
case, of course, I'd prefer to avoid hitting that problem -- use RAID, 
make regular backups, toss out the disk and restore.  Being able to 
repair bad blocks would tend to encourage a user to keep using a bad 
disk, but I don't want to force my opinion on everyone when there's a 
reasonable way for all of us to be happy.


Re: FEATURE Req: integrate badblocks check into fsck.reiser*

2006-09-07 Thread David Masover

Ric Wheeler wrote:

David Masover wrote:



Why do you presume to make this decision for users?


It's not a decision that I want to make for users, it is a decision that 
Hans and his team need to make about how best to spend their limited 
resources.


Agreed.  It's not important if it takes more than, say, an hour.

It will also give more users a bad experience with the file system, 
since users rarely have the in-depth knowledge required to make this 
kind of choice.


While it's true that most users just click through dialog boxes, I 
imagine this would be sufficient:


===WARNING===WARNING===WARNING===
-
THIS DISK IS BAD!
If you continue with the format,
we will not help you when you lose data.
When, not if.  You are strongly encouraged to
THROW THIS DISK OUT!
-
ARE YOU ABSOLUTELY SURE YOU WANT TO CONTINUE?
(yes/no):

And require an actual yes or no answer.  No y/n.

Now, compare that to a filesystem which doesn't allow badblocks in mkfs 
at all.  While it's rare, I suspect that would be a worse experience if 
you actually had a real need for it.  If you've got a huge 300 gig drive 
with some bad blocks, you can always throw some stuff on it anyway, for 
backup, or stuff you don't care about, even knowing it'll fail soon.


Again, probably not a high priority item at all, but it certainly won't 
make the user experience worse.  Any user who says yes to the above 
warning does not get to complain about their experience.


Here we mostly agree.  The need for enhanced tools is not to encourage 
people to keep using bad drives, rather to allow them to fsck & remount 
a drive for data recovery.  If you cannot mount & fsck fails to repair 
enough to give you at least a readable file system, then you are in real 
trouble ;-)


Also, unlike failing writes, disk read errors are quite often ephemeral 
and will be self-correcting on the next write (you might get an error 
from dust, etc., that gets swept clean on the next write).


Either one, I would personally feel quite a lot safer grabbing a disk 
image and doing the fsck once the image was on known good media.


One thing that would be even better here, though, if you don't want to 
spend the time for a huge backup:  A way to tell badblocks to only scan 
space which is actually being used, and then enough free space to make 
sure relocations work.  If you're mounting readonly, you shouldn't care 
about marking every single bad sector in free space.  I guess this would 
require a lot more intelligence from fsck, though -- it would have to be 
able to constantly check for bad blocks, as opposed to just running 
badblocks once and grabbing a list to avoid.


Re: FEATURE Req: integrate badblocks check into fsck.reiser*

2006-09-06 Thread David Masover

Hans Reiser wrote:

Ric Wheeler wrote:



Having mkfs ignore bad writes would seem to encourage users to create
a new file system on a disk that is known to be bad  most likely not
going to function well.  If a user ever has a golden opportunity to
toss a drive in the trash, it is when they notice mkfs fails ;-)  This
option to mkfs sounds like an invitation to disaster.

Yes, you are right, the option should be to run badblocks and then fail
if it finds any.


Unless it creates significantly more work for us, there should be an 
option to run badblocks, and if it finds any, it should prompt the user 
(with BIG FAT CAPSLOCK WARNINGS) whether they want to format anyway. 
Formatting anyway should work, and we should be able to have blocks 
marked bad.


It would also be nice to be able to change this later -- to pass in a 
list of badblocks to, say, fsck (which I think is the original request). 
 This is especially nice for recovery, if you don't have the luxury of 
copying a whole disk image to another drive before running fsck.


That's not to say that we should automatically detect and relocate bad 
blocks during normal operation (while the FS is mounted), but 
deliberately removing functionality to protect you from yourself isn't 
the Linux Way.  Linux has a long history of kernel config options that 
say things like YOU WILL LOSE DATA. You have been warned.


Re: Reiser FS will not boot after crash

2006-09-04 Thread David Masover

[EMAIL PROTECTED] wrote:

On Mon, 04 Sep 2006 23:33:27 +0400, Vladimir V. Saveliev said:

after unclean shutdown journal replay is necessary to return reiserfs to 
a consistent state. Maybe GRUB did not do that?


A case can be made that GRUB should be keeping its grubby little paws off
the filesystem journal.  It's a *bootloader*.  Its only purpose in life is
to load other code that can make intelligent decisions about things like
how (or even whether) to replay a filesystem journal.


But, unlike Lilo, Grub usually has to load that other code from a 
filesystem, which means it's already doing more than what bootloaders 
traditionally do.


If it was up to me, we'd all be using LinuxBIOS and kexec, and it 
wouldn't be an issue.


Re: wrt: checking reiserfs/4 partitions on boot

2006-09-02 Thread David Masover

Peter wrote:

On the namesys.com FAQ page, it is recommended that 0 0 be placed at the
end of the fstab lines for reiserfs partitions. I have two questions:

1) does this recommendation also apply for reiser4?
2) why is this recommendation made? Is it unnecessary to routinely check
reiser partitions? I understand that in the event of an abnormal shutdown,
fsck will be forced, correct?


I think the idea is that in the event of an abnormal shutdown, you 
simply replay the journal.  With Reiser4, the likelihood of having to 
run fsck should be even less.  Probably isn't now, but should be.


Re: reiser4 corruption on initial copy

2006-09-02 Thread David Masover

Peter wrote:

On Fri, 01 Sep 2006 17:35:29 -0500, David Masover wrote:


Peter wrote:


2) I did run badblocks on the dest, and it was clean. 3) I am using the
patch from 2.6.17.3 and in my kernel, I have full preempt and cfq
scheduling.

What about the kernel on the livecd?



Anticipatory
Voluntary


Yes, that should be fine, but I was wondering if it's the same version, 
if you built it yourself, etc etc.



Plus, it is smp, so there are some additional options checked. Should I
have preempt=none with reiser4?


I'm not sure.


Re: FEATURE Req: integrate badblocks check into fsck.reiser*

2006-09-01 Thread David Masover

Vladimir V. Saveliev wrote:

Hello

On Friday 01 September 2006 22:23, Peter wrote:

Perhaps this has been mentioned before. If so, sorry. IMHO, it would be
useful to integrate a call to badblocks in the fsck/mkfs.reiser* programs
so that more thorough disk checking can be done at format time. Sort of
like the option e2fsck -c. If this is added, the output could be fed
immediately to the reiser format program and badblocks spared prior to
filesystem use.

JM$0.02


both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of bad 
blocks. We thought that should be enough.


It really should.  Why bother with a patch?  Just write a wrapper script 
that runs badblocks and passes in the list to mkfs.
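Such a wrapper is short. A sketch, assuming the -B flag mentioned in this thread -- the device name is a placeholder, and the commands are echoed rather than executed so nothing gets formatted by accident:

```shell
# Sketch of the wrapper: scan with badblocks, hand the list to mkfs via
# the -B flag the reiserfsprogs maintainers mention.  /dev/sdXN is a
# placeholder; the commands are printed for review -- eval the variables
# (as root, on a disk you mean it) to run them for real.
DEV=${DEV:-/dev/sdXN}
LIST=${LIST:-/tmp/badblocks.txt}
# badblocks defaults to 1k blocks; -b 4096 matches the FS block size
SCAN="badblocks -b 4096 -o $LIST $DEV"
MKFS="mkfs.reiserfs -B $LIST $DEV"
echo "$SCAN"
echo "$MKFS"
```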


Re: reiser4 corruption on initial copy

2006-09-01 Thread David Masover

Peter wrote:


2) I did run badblocks on the dest, and it was clean. 3) I am using the
patch from 2.6.17.3 and in my kernel, I have full preempt and cfq
scheduling.


What about the kernel on the livecd?


Re: FEATURE Req: integrate badblocks check into fsck.reiser*

2006-09-01 Thread David Masover

Peter wrote:

On Fri, 01 Sep 2006 17:27:20 -0500, David Masover wrote:
snip...

both mkfs.reiserfs and fsck.reiserfs have -B option to accept list of
bad blocks. We thought that should be enough.

It really should.  Why bother with a patch?  Just write a wrapper script
that runs badblocks and passes in the list to mkfs.


It was just a thought from userland. My perspective was that a user, not a
hard-boiled geek, might get lulled into a false sense of security but may
not have the wherewithal to write a wrapper. If nothing else, when the
final doc is written (did I say final?:)), it should include a notice
about not running badblocks.


Well, let's see...  Most hard drives come more thoroughly tested at the 
factory than anything badblocks would do.  Also, it seems redundant to 
have every single mkfs implement its own badblocks flag.


I'd suggest a universal wrapper, then, or a modification to the mkfs 
frontend, so that this works the same way across all filesystems. 
Something like mkfs -B -t reiser4


Re: Reiser4 und LZO compression

2006-08-31 Thread David Masover

Clemens Eisserer wrote:

But speaking of single threadedness, more and more desktops are shipping
with ridiculously more power than people need.  Even a gamer really
[snip]

Will the LZO compression code in reiser4 be able to use multi-processor 
systems?


Good point, but it wasn't what I was talking about.  I was talking about 
the compression happening on one CPU: even if it takes most of that CPU 
to saturate disk throughput, your other CPU is still 100% available, so 
the typical desktop user won't notice their apps running slower -- 
they'll just notice disk access being faster.




Re: Reiser4 und LZO compression

2006-08-30 Thread David Masover

PFC wrote:



Maybe, but Reiser4 is supposed to be a general purpose filesystem
talking about its advantages/disadvantages wrt. gaming makes sense,


I don't see a lot of gamers using Linux ;)


There have to be some.  Transgaming seems to still be making a 
successful business out of making games work out-of-the-box under Wine. 
 While I don't imagine there are as many who attempt gaming on Linux, 
I'd guess a significant portion of Linux users, if not the majority, are 
at least casual gamers.


Some will have given up on the PC as a gaming platform long ago, tired 
of its upgrade cycle, crashes, game patches, and install times.  These 
people will have a console for games, probably a PS2 so they can watch 
DVDs, and use their computer for real work, with as much free software 
as they can manage.


Others will compromise somewhat.  I compromise by running the binary 
nVidia drivers, keeping a Windows partition around sometimes, and 
enjoying many old games which have released their source recently, and 
now run under Linux -- as well as a few native Linux games, some Cedega 
games, and some under straight Wine.


Basically, I'll play it on Linux if it works well, otherwise I boot 
Windows.  I'm migrating away from that Windows dependency by making sure 
all my new game purchases work on Linux.


Others will use some or all of the above -- stick to old games, use 
exclusively stuff that works on Linux (one way or the other), or give up 
on Linux gaming entirely and use a Windows partition.


Anything Linux can do to become more game-friendly is one less reason 
for gamers to have to compromise.  Not all gamers are willing to do 
that.  I know at least two who ultimately decided that, with dual boot, 
they end up spending most of their time on Windows anyway.  These are 
the people who would use Linux if they didn't have a good reason to use 
something else, but right now, they do.  This is not the fault of the 
filesystem, but taking the attitude of There aren't many Linux gamers 
anyway -- that's a self-fulfilling prophecy, gamers WILL leave because 
of it.


Also, as you said, gamers (like many others) reinvent filesystems 
and generally use the Big Zip File paradigm, which is not that stupid 
for a read only FS (if you cache all file offsets, reading can be pretty 
fast). However when you start storing ogg-compressed sound and JPEG 
images inside a zip file, it starts to stink.


I don't like it as a read-only FS, either.  Take an MMO -- while most 
commercial ones load the entire game to disk from install DVDs, there 
are some smaller ones which only cache the data as you explore the 
world.  Also, even with the bigger ones, the world is always changing 
with patches, and I've seen patches take several hours to install -- not 
download, install -- on a 2.4 ghz amd64 with 2 gigs of RAM, on a striped 
RAID.  You can trust me when I say this was mostly disk-bound, which is 
retarded, because it took less than half an hour to install in the first 
place.


Even simple multiplayer games -- hell, even single-player games can get 
fairly massive updates relatively often.  Half-Life 2 is one example -- 
they've now added HDR to the engine.


In these cases, you still need as fast access as possible to the data 
(to cut down on load time), and it would be nice to save on space as 
well, but a zipfile starts to make less sense.  And yet, I still see 
people using _cabinet_ files.


Compression at the FS layer, plus efficient storing of small files, 
makes this much simpler.  While you can make the zipfile-fs transparent 
to a game, even your mapping tools, it's still not efficient, and it's 
not transparent to your modeling package, Photoshop-alike, audio 
software, or gcc.


But everything understands a filesystem.


It depends, you have to consider several distinct scenarios.
For instance, on a big Postgres database server, the rule is to have 
as many spindles as you can.
- If you are doing a lot of full table scans (like data mining etc), 
more spindles means reads can be parallelized ; of course this will mean 
more data will have to be decompressed.


I don't see why more spindles means more data decompressed.  If 
anything, I'd imagine it would be less reads, total, if there's any kind 
of data locality.  But I'll leave this to the database experts, for now.


- If you are doing a lot of little transactions (web sites), it 
means seeks can be distributed around the various disks. In this case 
compression would be a big win because there is free CPU to use ; 


Dangerous assumption.  Three words:  Ruby on Rails.  There goes your 
free CPU.  Suddenly, compression makes no sense at all.


But then, Ruby makes no sense at all for any serious load, unless you 
really have that much money to spend, or until the Ruby.NET compiler is 
finished -- that should speed things up.



besides, it would virtually double the RAM cache size.


No it wouldn't, not the way Reiser4 does it.  

Re: Reiser4 und LZO compression

2006-08-29 Thread David Masover

Nigel Cunningham wrote:

Hi.

On Tue, 2006-08-29 at 06:05 +0200, Jan Engelhardt wrote:

Hmm.  LZO is the best compression algorithm for the task as measured by
the objectives of good compression effectiveness while still having very
low CPU usage (the best of those written and GPL'd, there is a slightly
better one which is proprietary and uses more CPU, LZRW if I remember
right.  The gzip code base uses too much CPU, though I think Edward made

I don't think that LZO beats LZF in both speed and compression ratio.

LZF is also available under the GPL (dual-licensed BSD) and was chosen in favor
of LZO for the next-generation suspend-to-disk code of the Linux kernel.

see: http://www.goof.com/pcg/marc/liblzf.html

thanks for the info, we will compare them

For Suspend2, we ended up converting the LZF support to a cryptoapi
plugin. Is there any chance that you could use cryptoapi modules? We
could then have a hope of sharing the support.
I am throwing in gzip: would it be meaningful to use that instead? The 
decoder (inflate.c) is already there.


06:04 shanghai:~/liblzf-1.6  l configure*
-rwxr-xr-x  1 jengelh users 154894 Mar  3  2005 configure
-rwxr-xr-x  1 jengelh users  26810 Mar  3  2005 configure.bz2
-rw-r--r--  1 jengelh users  30611 Aug 28 20:32 configure.gz-z9
-rw-r--r--  1 jengelh users  30693 Aug 28 20:32 configure.gz-z6
-rw-r--r--  1 jengelh users  53077 Aug 28 20:32 configure.lzf


We used gzip when we first implemented compression support, and found it
to be far too slow. Even with the fastest compression options, we were
only getting a few megabytes per second. Perhaps I did something wrong
in configuring it, but there's not that many things to get wrong!


All that comes to mind is the speed/quality setting -- the number from 1 
to 9.  Recently, I backed up someone's hard drive using -1, and I 
believe I was still able to saturate... the _network_.  Definitely try 
again if you haven't changed this, but I can't imagine I'm the first 
person to think of it.


From what I remember, gzip -1 wasn't faster than the disk.  But at 
least for (very) repetitive data, I was wrong:


eve:~ sanity$ time bash -c 'dd if=/dev/zero of=test bs=10m count=10; sync'
10+0 records in
10+0 records out
104857600 bytes transferred in 3.261990 secs (32145287 bytes/sec)

real0m3.746s
user0m0.005s
sys 0m0.627s
eve:~ sanity$ time bash -c 'dd if=/dev/zero bs=10m count=10 | gzip -v1 > 
test; sync'

10+0 records in
10+0 records out
104857600 bytes transferred in 2.404093 secs (43616282 bytes/sec)
 99.5%

real0m2.558s
user0m1.554s
sys 0m0.680s
eve:~ sanity$



This was on OS X, but I think it's still valid -- this is a slightly 
older Powerbook, with a 5400 RPM drive, 1.6 ghz G4.


-1 is still worlds better than nothing.  The backup was over 15 gigs, 
down to about 6 -- loads of repetitive data, I'm sure, but that's where 
you win with compression anyway.


Well, you use cryptoapi anyway, so it should be easy to just let the 
user pick a plugin, right?


Re: Reiser4 und LZO compression

2006-08-29 Thread David Masover

PFC wrote:



Would it be, by any chance, possible to tweak the thing so that 
reiserfs plugins become kernel modules, so that the reiserfs core can be 
put in the kernel without the plugins slowing down its acceptance ?


I don't see what this has to do with cryptoapi plugins -- those are not 
related to Reiser plugins.


As for the plugins slowing down acceptance, it's actually the concept of 
plugins and the plugin API -- in other words, it's the fact that Reiser4 
supports plugins -- that is slowing it down, if anything about plugins 
is still an issue at all.


Making them modules would make it worse.  Last I saw, Linus doesn't 
particularly like the idea of plugins because of a few misconceptions, 
like the possibility of proprietary (possibly GPL-violating) plugins 
distributed as modules -- basically, something like what nVidia and ATI 
do with their video drivers.


As it is, a good argument in favor of plugins is that this kind of thing 
isn't possible -- we often put "plugins" in quotes because really, it's 
just a nice abstraction layer.  They aren't any more plugins than 
iptables modules or cryptoapi plugins are.  If anything, they're less, 
because they must be compiled into Reiser4, which means either one huge 
monolithic Reiser4 module (including all plugins), or everything 
compiled into the kernel image.



(and updating plugins without rebooting would be a nice extra)


It probably wouldn't be as nice as you think.  Remember, if you're using 
a certain plugin in your root FS, it's part of the FS, so I don't think 
you'd be able to remove that plugin any more than you're able to remove 
reiser4.ko if that's your root FS.  You'd have to unmount every FS that 
uses that plugin.


At this point, you don't really gain much -- if you unmount every last 
Reiser4 filesystem, you can then remove reiser4.ko, recompile it, and 
load a new one with different plugins enabled.


Also, these things would typically be part of a kernel update anyway, 
meaning a reboot anyway.


But suppose you could remove a plugin, what then?  What would that mean? 
 Suppose half your files are compressed and you remove cryptocompress 
-- are those files uncompressed when the plugin goes away?  Probably 
not.  The only smart way to handle this that I can think of is to make 
those files unavailable, which is probably not what you want -- how do 
you update cryptocompress when the new reiser4_cryptocompress.ko is 
itself compressed?


That may be an acceptable solution for some plugins, but you'd have to 
be extremely careful which ones you remove.  The only safe way I can 
imagine doing this may not be possible, and if it is, it's extremely 
hackish -- load the plugin under another module name, so 
r4_cryptocompress would be r4_cryptocompress_init -- have the module, 
once loaded, do an atomic switch from the old one to the new one, 
effectively in-place.


But that kind of solution is something I've never seen attempted, and 
only really heard of in strange environments like Erlang.  It would 
probably require much more engineering than the Reiser team can handle 
right now, especially with their hands full with inclusion.



The patch below is so-called reiser4 LZO compression plugin as extracted
from 2.6.18-rc4-mm3.

I think it is an unauditable piece of shit and thus should not enter
mainline.


Like lib/inflate.c (and this new code should arguably be in lib/).

The problem is that if we clean this up, we've diverged very much from 
the

upstream implementation.  So taking in fixes and features from upstream
becomes harder and more error-prone.

I'd suspect that the maturity of these utilities is such that we could
afford to turn them into kernel code in the expectation that any future
changes will be small.  But it's not a completely simple call.

(iirc the inflate code had a buffer overrun a while back, which was found
and fixed in the upstream version).

Re: Reiser4 und LZO compression

2006-08-29 Thread David Masover

Gregory Maxwell wrote:

On 8/29/06, David Masover [EMAIL PROTECTED] wrote:
[snip]

Conversely, compression does NOT make sense if:
   - You spend a lot of time with the CPU busy and the disk idle.
   - You have more than enough disk space.
   - Disk space is cheaper than buying enough CPU to handle compression.
   - You've tried compression, and the CPU requirements slowed you more
than you saved in disk access.

[snip]

It's also not always this simple ... if you have a single-threaded
workload that doesn't overlap CPU and disk well, (de)compression may
be free even if you're still CPU-bound a lot, as the compression is
using CPU cycles which would have been otherwise idle.


Isn't that implied, though -- if the CPU is not busy (run top under a 
2.6 kernel and you'll see an IO-Wait number), then the first condition 
isn't satisfied -- CPU is not busy, disk is not idle.


But speaking of single threadedness, more and more desktops are shipping 
with ridiculously more power than people need.  Even a gamer really 
won't benefit that much from having a dual-core system, because 
multithreading is hard, and games haven't been doing it properly.  John 
Carmack is pretty much the only superstar programmer in video games, and 
his first fairly massive attempt to make Quake 3 have two threads 
(since he'd just gotten a dual-core machine to play with) actually 
resulted in the game running some 30-40% slower than it did with a 
single thread.


So, for the desktop, compression makes perfect sense.  We don't have 
massive amounts of RAID.  If we have newer machines, there's a good 
chance we'll have one CPU sitting mostly idle while playing games. 
Short of gaming, there are few desktop applications that will fully 
utilize even one reasonably fast CPU.  The reason gamers buy dual-core 
systems is they're getting cheap enough to be worth it, and that one 
core sitting idle is a perfect place to do OS/system work not related to 
the game -- antivirus, automatic update checks, the inevitable 
background processes leeching a few percent off your available CPU.


So for the typical new desktop with about 2 ghz of 64-bit processor 
sitting idle, compression is essentially free.


Re: Reiser4 und LZO compression

2006-08-29 Thread David Masover

Hans Reiser wrote:

David Masover wrote:

  John Carmack is pretty much the only superstar programmer in video
games, and after his first fairly massive attempt to make Quake 3 have
two threads (since he'd just gotten a dual-core machine to play with)
actually resulted in the game running some 30-40% slower than it did
with a single thread.

Do the two processors have separate caches, and thus being overly 
fine-grained makes you memory-transfer bound, or?


It wasn't anything that intelligent.  Let me see if I can find it...

Taken from
http://techreport.com/etc/2005q3/carmack-quakecon/index.x?pg=1

Graphics accelerators are a great example of parallelism working well, 
he noted, but game code is not similarly parallelizable. Carmack cited 
his Quake III Arena engine, whose renderer was multithreaded and 
achieved up to 40% performance increases on multiprocessor systems, as a 
good example of where games would have to go. (Q3A's SMP mode was 
notoriously crash-prone and fragile, working only with certain graphics 
driver revisions and the like.) Initial returns on multithreading, he 
projected, will be disappointing.


Basically, it's hard enough to split what we currently do onto even 2 
CPUs, and it definitely seems like we're about to hit a wall in CPU 
frequency just as multicore becomes a practical reality, so future CPUs 
may be measured in how many cores they have, not how fast each core is.


There's also a question of what to use the extra power for.  From the 
same presentation:


Part of the problem with multithreading, argued Carmack, is knowing how 
to use the power of additional CPU cores to enhance the game experience. 
A.I. can be effective when very simple, as some of the first Doom logic 
was. It was less than a page of code, but players ascribed complex 
behaviors and motivations to the bad guys. However, more complex A.I. 
seems hard to improve to the point where it really changes the game. 
More physics detail, meanwhile, threatens to make games too fragile as 
interactions in the game world become more complex.


So, I humbly predict that Physics cards (so-called PPUs) will fail, and 
be replaced by ever-increasing numbers of cores, which will, for awhile, 
be one step ahead of what we can think of to fill them with.  Thus, 
anything useful (like compression) that can be split off into a separate 
thread is going to be useful for games, and won't hurt performance on 
future mega-multicore monstrosities.


The downside is, most game developers are working on Windows, for which 
FS compression has always sucked.  Thus, they most often implement their 
own compression, often something horrible, like storing the whole game 
in CAB or ZIP files, and loading the entire level into RAM before play 
starts, making load times less relevant for gameplay.  Reiser4's 
cryptocompress would be a marked improvement over that, but it would 
also not be used in many games.


Re: Reiser4 und LZO compression

2006-08-29 Thread David Masover

Toby Thain wrote:

Gamer systems, whether from coder's or player's p.o.v., would appear 
fairly irrelevant to reiserfs and this list. I'd trust Carmack's eye 
candy credentials but doubt he has much to say about filesystems or 
server threading...


Maybe, but Reiser4 is supposed to be a general purpose filesystem, so 
talking about its advantages/disadvantages wrt. gaming makes sense, 
especially considering gamers are the most likely to tune their desktops 
for performance.


That was a bit much, though.  I apologize.


Re: Reiser4 und LZO compression

2006-08-27 Thread David Masover

Andrew Morton wrote:

On Sun, 27 Aug 2006 04:34:26 +0400
Alexey Dobriyan [EMAIL PROTECTED] wrote:


The patch below is so-called reiser4 LZO compression plugin as extracted
from 2.6.18-rc4-mm3.

I think it is an unauditable piece of shit and thus should not enter
mainline.


Like lib/inflate.c (and this new code should arguably be in lib/).

The problem is that if we clean this up, we've diverged very much from the
upstream implementation.  So taking in fixes and features from upstream
becomes harder and more error-prone.


Well, what kinds of changes have to happen?  I doubt upstream would care 
about moving some of it to lib/ -- and anyway, reiserfs-list is on the 
CC.  We are speaking of upstream in the third person in the presence of 
upstream, so...


Maybe just ask upstream?


Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching

2006-08-22 Thread David Masover

Jeff Mahoney wrote:

 When a file system becomes fragmented (using MythTV, for example), the
 bigalloc window searching ends up causing huge performance problems. In
 a file system presented by a user experiencing this bug, the file system
 was 90% free, but no 32-block free windows existed on the entire file system.
 This causes the allocator to scan the entire file system for each 128k write
 before backing down to searching for individual blocks.


Question:  Would it be better to take that performance hit once, then 
cache the result for awhile?  If we can't find enough consecutive space, 
such space isn't likely to appear until a lot of space is freed or a 
repacker is run.



 In the end, finding a contiguous window for all the blocks in a write is
 an advantageous special case, but one that can be found naturally when
 such a window exists anyway.


Hmm.  Ok, I don't understand how this works, so I'll shut up.


Re: problem with reiser3

2006-08-22 Thread David Masover

Marcos Dione wrote:

On Mon, Aug 21, 2006 at 08:23:30PM -0500, David Masover wrote:

it would be better to create a backup on a spare bigger partition
using dd_rescue (pad not recoverable zones with zeroes), then run
fsck on the created image.

   unluckly I can't. it's a 160 GiB partition and I don't have spare
space.
How much spare space do you have?  You may be able to do some tricks 
with dm_snapshot...


right now, I have 45 MiB of space in my spare disk. I *could*
(should?) make more space, but can't guarantee anything.


That won't be enough.  Worst case, decide whether the data on that 160 
gig partition is worth buying a cheap 200 or 300 gig drive for this backup.


Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching

2006-08-22 Thread David Masover

Jeff Mahoney wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

David Masover wrote:

Jeff Mahoney wrote:

 When a file system becomes fragmented (using MythTV, for example), the
 bigalloc window searching ends up causing huge performance problems. In
 a file system presented by a user experiencing this bug, the file system
 was 90% free, but no 32-block free windows existed on the entire file
system.
 This causes the allocator to scan the entire file system for each
128k write
 before backing down to searching for individual blocks.

Question:  Would it be better to take that performance hit once, then
cache the result for awhile?  If we can't find enough consecutive space,
such space isn't likely to appear until a lot of space is freed or a
repacker is run.


The problem is that finding the window isn't really a direct function of
free space, it's a function of fragmentation. You could have a 50% full
file system that still can't find a 32 block window by having every
other block used. I know it's an extremely unlikely case, but it
demonstrates the point perfectly.


Maybe, but it's still not a counterpoint.  No matter how fragmented a 
filesystem is, freeing space can open up contiguous space, whereas if 
space is not freed, you won't open up contiguous space.


Thus, if your FS is 50% full and 100% fragmented, then you wait till 
space is freed, because if nothing happens, or if more space is filled 
in, you'll have the same problem at 60% as you did at 50%.  If, 
however, you're at 60% full, and 10% of the space is freed, then it's 
fairly unlikely that you still don't have contiguous space, and it's 
worth it to scan once more at 50%, and again if it then drops to 40%.


So, if your FS is 90% full and space is being freed, I'd think it would 
be worth it to scan again at 80%, 70%, and so on.  I'd also imagine it 
would do little or nothing to constantly monitor an FS that stays mostly 
full -- maybe give it a certain amount of time, but if we're repacking 
anyway, just wait for a repacker run.  It seems very unlikely that 
between repacker runs, activity between 86% and 94% would open up 
contiguous space.


It's still not a direct function of freed space (as opposed to free 
space), but it starts to look better.


I'm not endorsing one way or the other without benchmarks, though.


 In the end, finding a contiguous window for all the blocks in a write is
 an advantageous special case, but one that can be found naturally when
 such a window exists anyway.

Hmm.  Ok, I don't understand how this works, so I'll shut up.


If the space after the end of the file has 32 or more blocks free, even
without the bigalloc behavior, those blocks will be used.


For what behavior -- appending?


Also, I think the bigalloc behavior just ultimately ends up introducing
even more fragmentation on an already fragmented file system. It'll keep
contiguous chunks together, but those chunks can end up being spread all
over the disk.


This sounds like the NTFS strategy, which was basically to allow all 
hell to break loose -- above a certain chunk size.  Keep chunks of a 
certain size contiguous, and you limit the number of seeks by quite a lot.


Re: problem with reiser3

2006-08-21 Thread David Masover

Marcos Dione wrote:


it would be better to create a backup on a spare bigger partition
using dd_rescue (pad not recoverable zones with zeroes), then run
fsck on the created image.


unluckly I can't. it's a 160 GiB partition and I don't have spare
space.


How much spare space do you have?  You may be able to do some tricks 
with dm_snapshot...


Re: reiserfs and IDE write cache

2006-08-18 Thread David Masover

Francisco Javier Cabello wrote:

Hello,
I have been 'googling' and I have found a lot of people warning about the 
problems with IDE write cache and journaling filesystems. 


These problems exist with ANY filesystem, journaling or not.  They also 
exist with no filesystem at all.



Should I disable write cache in my systems using reiserfs3+2.4.25?


I'm not sure if it will help.  At least with IDE drives, I often cannot 
get the write cache disabled -- it's as if it ignores hdparm.
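For reference, the invocations in question look roughly like this -- the device name is a placeholder and the flags are assumptions about your hdparm version (check its man page); echoed rather than run, since this needs root and a real IDE drive:

```shell
# The hdparm commands under discussion.  /dev/hda is a placeholder; -W0
# and -I are assumptions about the installed hdparm version.  Printed for
# review rather than executed.
DISK=${DISK:-/dev/hda}
echo "hdparm -W0 $DISK    # ask the drive to disable its write cache"
echo "hdparm -I $DISK     # then check whether the drive actually obeyed"
```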


So, like Toby says, get a UPS.  Or get a laptop instead, if that makes 
any sense.


Re: some testing questions

2006-08-15 Thread David Masover

Hans Reiser wrote:

Ingo Bormuth wrote:


#df:
/dev/hda8  6357768   3478716   2879052  55% /cache
 
Before doing so, the partition was 90% full. 

The performance difference between 90% full and 55% full will be large
on every filesystem.  When we ship a repacker, that will be less true,
because we will have large chunks of unused space after the repacker runs.


Not always true.  For one, doesn't Reiser4 arbitrarily reserve 5%?  For 
another, look at his results -- unless I'm wrong, that's 3-7% 
fragmentation.  If I'm wrong, it's more like .03-.07%.


And lastly, at a certain point, percentages aren't really that accurate. 
 I've got a 350 or 400 gig partition which is 95% full according to df 
(which, if I was right about that 5%, means it's more like 90% full), and 
that still leaves a solid 10-20 gigs free.
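Incidentally, that root-reserve is visible from userspace: statvfs() reports 
free blocks both with and without the reserve. A quick Python sketch (the 
~5% default is ext3's; whether Reiser4's reserve shows up the same way is 
my assumption):

```python
import os

def fs_usage(path):
    """Report sizes for the filesystem containing `path`, splitting
    free space into free-to-root (f_bfree) and free-to-ordinary-users
    (f_bavail); the gap is the reserve that df's percentage quietly hides."""
    st = os.statvfs(path)
    block = st.f_frsize
    total = st.f_blocks * block
    free_root = st.f_bfree * block    # free blocks, counting the reserve
    free_user = st.f_bavail * block   # free blocks a non-root user can use
    return total, free_root, free_user, free_root - free_user

total, free_root, free_user, reserved = fs_usage("/")
print("total %d MiB, usable free %d MiB, reserved %d MiB"
      % (total >> 20, free_user >> 20, reserved >> 20))
```

On a 400-gig partition a 5% reserve would be roughly 20 gigs, which lines 
up with the numbers above.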


I mean, yes, performance will eventually start to suffer, but how much 
time and activity will it take to fragment 20 gigs of free space, 
especially with lazy allocation?


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread David Masover

Edward Shishkin wrote:

Tom Reinhart wrote:
Anyone with serious need for data integrity already uses RAID, so why 
add brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  The file 
system can detect errors with checksums.  What is missing is an API 
between layers for the filesystem to say "this sector is bad, go rebuild it."




Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.


What does this have to do with RAID, though?


Re: the 'official' point of view expressed by kernelnewbies.org

2006-08-15 Thread David Masover

Edward Shishkin wrote:

David Masover wrote:

Edward Shishkin wrote:


Tom Reinhart wrote:

Anyone with serious need for data integrity already uses RAID, so 
why add brand new complexity for a solved problem?


RAID is great at recovering data, but not at detecting errors.  The file 
system can detect errors with checksums.  What is missing is an API 
between layers for the filesystem to say "this sector is bad, go rebuild 
it."




Actually we don't need a special API: the kernel should warn and recommend
running fsck, which scans the whole tree and handles blocks with bad
checksums.



What does this have to do with RAID, though?




I assumed we don't have RAID: reiser4 can support its own checksums/ECC
signatures for (meta)data protection via the node plugin


We can't guarantee RAID is there; however, it would be nice to do the 
right thing when it is.


Re: The Infamous Reiser4-randomly-blocks-for-ages-and-writes-the-hd-continously-in-the-mean-while now with a btrace log! (hope it helps)

2006-08-10 Thread David Masover

Vesa Kaihlavirta wrote:


Incidentally, I've witnessed similar behaviour in various simple tasks,
e.g. writing
entries to an sqlite database, or receiving mail from pop3 in thunderbird.


Sounds like fsync issues.  That is being worked on.


Re: The Infamous Reiser4-randomly-blocks-for-ages-and-writes-the-hd-continously-in-the-mean-while now with a btrace log! (hope it helps)

2006-08-10 Thread David Masover

Łukasz Mierzwa wrote:
On Thu, 10 Aug 2006 20:48:59 +0200, David Masover [EMAIL PROTECTED] 
wrote:



Vesa Kaihlavirta wrote:


Incidentally, I've witnessed similar behaviour in various simple tasks,
e.g. writing
entries to an sqlite database, or receiving mail from pop3 in 
thunderbird.


Sounds like fsync issues.  That is being worked on.


I think it's writeout that's involved. I tried disabling fsync, and it 
helped for apps that call fsync to keep data integrity (like sqlite), 
but it also happens when I'm downloading files using rtorrent, which 
does not call fsync but generates many little writes.


Hmm.  Fragmentation, maybe?  Is this easily reproducible with a 
freshly-formatted fs?


I'm just guessing here...


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-09 Thread David Masover

Jan Engelhardt wrote:

Yes, it looks like a business of node plugin, but AFAIK, you
objected against such checks:

Did I really?  Well, I think that allowing users to choose whether to
checksum or not is a reasonable thing to allow them.  I personally would
skip the checksum on my computer, but others

It could be a useful mkfs option


It should preferably be a runtime-tunable variable, at best even
per-superblock and (overriding the sb setting) per-file.


Sounds almost exactly like a plugin.  And yes, that would be the way to 
do it, especially considering some files will already have internal 
consistency checking -- just as we should allow direct disk IO to some 
files (no journaling) when the files in question are databases that do 
their own journaling.


Re: article abour Reiser4 on linux.com

2006-08-09 Thread David Masover

Andreas Schäfer wrote:

On 02:28 Wed 09 Aug , Hans Reiser wrote:

Unfortunately, it's not one of which editors approve. It too easily
looks as though the writer is being influenced by the source. 
 

If I were to do so, I'd risk being banned from publication. 


Uhm... interesting. It's not that I have so much experience with the
press (just three interviews so far), but every time, I got the article
for review before publication.



If you didn't trust the source in the first place, why should you
bother to take information from it at all? If you do trust it, why not
ask again?


Hmm.  Except in this case, they were summarizing a rather large debate, 
so it's not a question of trusting the source or not, it's a question of 
whether you want to fact-check with every person on reiserfs-list and 
lkml, until you've got the whole thing so debated and watered-down that 
it's meaningless.


Then, too, sometimes it's better to check ahead of time than to get it 
wrong and have to correct later, because people won't always read the 
corrections.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-09 Thread David Masover

Hans Reiser wrote:

Pavel Machek wrote:



Yes, I'm afraid redundancy/checksums kill write speed,


they kill write speed to cache, but not to disk -- our compression
plugin is faster than the uncompressed plugin.


Regarding cache, do we do any sort of consistency checking for RAM, or 
do we leave that to some of the stranger kernel patches -- or just an 
occasional memtest?


Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)

2006-08-07 Thread David Masover

Christian Trefzer wrote:

On Sun, Aug 06, 2006 at 04:23:16PM +0200, Maciej Sołtysiak wrote:



There also is an issue with grub. The kernel alone is fine for creating 
partitions (or loop devices), but with grub not patched we can't install 
boot partitions. No biggie, I guess, but still a problem.


Few people keep a 32MB ext2 for /boot purposes these days, so it really
is imperative that grub can read kernel images off a reiser4 /.


I think there are patches, but I do keep a 32 meg ext3 for /boot, 
because it seems like no matter what FS I choose, there's some sort of 
caveat involving Grub.  I know when installing XFS as a root FS on 
Ubuntu, it talks about Grub problems...


I mean, having Grub support everything would be nice, but if you're 
reformatting anyway, I don't think it's that imperative.


Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)

2006-08-07 Thread David Masover

Maciej Sołtysiak wrote:

Hello David,


hi


I have built today an r4-patched ubuntu kernel package (yes, debs!)


Sounds good.  I don't have an ubuntu to test with at the moment, though.


Please note, that this is done all under virtualization
(Microsoft Virtual PC).


Not to nitpick, but isn't that emulation?  Or have they actually done 
real virtualization yet?


Re: Ebuild/rpm/deb repo's (was Re: reiser4 can now bear with filled fs, looks stable to me...)

2006-08-07 Thread David Masover

Maciej Sołtysiak wrote:

Hello David,

Tuesday, August 8, 2006, 1:23:01 AM, you wrote:

Sounds good.  I don't have an ubuntu to test with at the moment, though.

Well, both MS Virtual PC and VMWare are free of charge, so installing
is a real snap.


Under what, though?  I don't want MS crap on my OS X (need that for work 
ATM), and I can't imagine they've ported it to Linux.  I have no reason 
to boot Windows except for games, and if I was going to do that, I may 
as well shrink my Windows partition to make room for a native install.


Which would be fine, but it's a lot of work when I don't run Ubuntu 
normally.


I'd be willing to test on the one Ubuntu server I run, but it's across 
the country until next week, and also work-critical.



Not to nitpick, but isn't that emulation?  Or have they actually done
real virtualization yet?

I don't know the differences -- can you shed some light? AFAICS M$ will
be shipping Virtual PC with Vista to allow people to run older software
under virtual machines (be it virtualized or emulated).


Still hard to say.

Virtualization splits up the real hardware.  It's like a scheduler, only 
for OSes.  Emulation is more like an interpreter -- it reads each 
instruction and then executes something that does the same thing. 
Emulation can work from any arch to any arch, so Rosetta (allowing PPC 
OS X apps to run on x86 OS X) is emulation.


Emulation is usually at least 2x slower than native.  Virtualization 
usually approaches native for CPU stuff, but at least disk IO and 
graphics usually have to be emulated -- so no 3D acceleration, so no 
games under a guest OS.
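For what it's worth, the emulation half can be sketched as a dispatch loop: 
each guest instruction is decoded and re-executed by the host, which is 
where the 2x-or-worse slowdown comes from. A toy Python sketch with made-up 
opcodes, nowhere near the scope of a real emulator:

```python
def emulate(program, regs):
    """Toy instruction interpreter: every guest instruction costs at
    least one dispatch plus some work in the host -- unlike
    virtualization, where guest code runs natively on the CPU."""
    for op, dst, src in program:
        if op == "mov":           # dst = src
            regs[dst] = regs[src]
        elif op == "add":         # dst += src
            regs[dst] += regs[src]
        else:
            raise ValueError("unknown opcode: %s" % op)
    return regs

regs = emulate([("mov", "r1", "r0"), ("add", "r0", "r1")],
               {"r0": 21, "r1": 0})
print(regs)  # r1 takes r0's 21, then r0 becomes 42
```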


If MS wanted to do the best possible thing for their consumers, they'd 
give you a free XP under VirtualPC with Vista, and actually do 
virtualization.  If M$ wanted to make it even more likely for people to 
want to upgrade to Vista, they might deliberately make it cost tons of 
money and make it emulation, so that XP looks slower, and native Vista 
apps look so much faster that people complain until everything works on 
Vista.



If Virtual PC is emulation, maybe Virtual Server 2005 R2 (also free of
charge) is virtualization.


I have no idea what Virtual Server is.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-07 Thread David Masover

[EMAIL PROTECTED] wrote:


It seems that finding all the bits and pieces to do ext3 on-line
expansion has been a study in obfuscation.  Somewhat surprising since
this feature is a must for enterprise class storage management.


Not really.  Having people who can dig through the obfuscation is also a 
must for enterprise class anything.


The desktop is where it's really crucial to have good documentation and 
ease of use.  The enterprise can afford to pay people who already knew 
it well, helped to develop it...  Grandma probably got Linux because she 
couldn't afford a new OS, or computer.


Of course, I won't go so far as to try to say Linux should focus on 
this.  Linux should focus on whatever Linux developers feel like 
focusing on.




Re: Another article abour Reiser4 on linux.com

2006-08-06 Thread David Masover

Lexington Luthor wrote:

Bernd Schubert wrote:

An alternative might be a reiser4 fuse port. Has some advantages:


Please please no. The kernel people will use that as an argument for 
keeping it out of the kernel.


They'll use anything as an argument for keeping it out of the kernel. 
This one is particularly shallow, especially if we still have the kernel 
version, because the performance difference will be significant.


If it isn't, maybe it is time for things like FUSE to take us in the 
direction of microkernels...


I want reiser4 to be popular enough to 
make my apps depend on it and not have the users complain about having 
to use an obscure fs.


Well, an obscure program (FUSE) is probably a lot easier to convince 
users of than an obscure filesystem (reiser4 in-kernel).


Besides, the only thing about reiser4 that interests me more than XFS or 
reiserfs is the speed.


That's you.  There are other reasons to like it.

But I agree with you in that I don't think it's worth the resources to 
do a FUSE port, especially when there is (again) NO guarantee that 
anything we do will get us in the kernel, so better to do things that 
will either get us users anyway (like distro inclusion) or do things the 
kernel people specifically ask for.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-06 Thread David Masover

Pavel Machek wrote:

On Tue 01-08-06 11:57:10, David Masover wrote:

Horst H. von Brand wrote:

Bernd Schubert [EMAIL PROTECTED] wrote:
While filesystem speed is nice, it also would be great 
if reiser4.x would be very robust against any kind of 
hardware failures.

Can't have both.
Why not?  I mean, other than TANSTAAFL, is there a 
technical reason for them being mutually exclusive?  I 
suspect it's more we haven't found a way yet...


What does the acronym mean?


There Ain't No Such Thing As A Free Lunch.


Yes, I'm afraid redundancy/checksums kill write speed, and you need
that for robustness...


Not necessarily -- if you do it on flush, and store it near the data it 
relates to, you can expect a similar impact to compression, except that 
due to slow disks, the compression can actually speed things up 2x, 
whereas checksums should be some insignificant amount slower than 1x.


Redundancy, sure, but checksums should be easy, and I don't see what 
robustness (abilities of fsck) has to do with it.
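To put a sketch behind "checksums should be easy": per-block CRCs cost a 
few table lookups per byte, trivial next to a seek. Rough Python 
illustration -- the 4K block size and the idea of storing sums alongside 
the data at flush time are my assumptions, not anything Namesys has 
committed to:

```python
import zlib

BLOCK = 4096  # assumed fs block size

def checksum_blocks(data):
    """Compute a CRC32 per block -- the kind of sum a node plugin
    could write next to the data when it flushes."""
    return [zlib.crc32(data[off:off + BLOCK]) & 0xffffffff
            for off in range(0, len(data), BLOCK)]

def verify_blocks(data, sums):
    """Return the indices of blocks whose stored checksum no longer
    matches.  Detection only -- repair is RAID's or backup's job."""
    return [i for i, off in enumerate(range(0, len(data), BLOCK))
            if (zlib.crc32(data[off:off + BLOCK]) & 0xffffffff) != sums[i]]

data = bytearray(b"x" * (4 * BLOCK))
sums = checksum_blocks(bytes(data))
data[BLOCK + 17] ^= 0xff                 # simulate a silent one-byte flip
print(verify_blocks(bytes(data), sums))  # the flip lands in block 1
```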



You could have filesystem that can be tuned for reliability and tuned
for speed... but you can't have both in one filesystem instance.


That's an example of TANSTAAFL, if it's true.


Re: Another article abour Reiser4 on linux.com

2006-08-05 Thread David Masover

Tassilo Horn wrote:


[1] http://www.linux.com/article.pl?sid=06/07/31/1548201


From the article:

To complicate matters, Reiser4's approach lands the filesystem in the 
middle of a longstanding convention of avoiding plugins in the kernel, 
mainly to avoid architectural complications, but also to discourage 
proprietary drivers that circumvent the kernel's release under the GNU 
General Public License.


We should really find something better to call them than plugins, or 
we should come up with a standard copy'n'paste statement to refute this.


Re: Another article abour Reiser4 on linux.com

2006-08-05 Thread David Masover

Clay Barnes wrote:

I like using a term that is already in an accepted part of the
kernel.  Extensions might smack of plugins a bit much, and we're
trying to avoid just doing a s/plugins/extensions/ of the
arguments we're seeing now.


We could do that with almost anything:


Or just modules... netfilter has modules that allow us to write
very cool and weird stuff (like unclean match once was) and
nobody complains.


Except that modules could also possibly remind people of proprietary 
modules, like the nvidia/ATI/vmware stuff.


Still, if we allow netfilter, why not Reiser4 modules?


Another word could be 'hooks'


I don't think this would quite work.  A hook describes more the place 
you connect to, whereas a module/plugin/whatever...


Think of it this way -- the hook is what a plugin would plug in to.

So it may not matter much what we name them, we're probably still going 
to need that cut'n'paste argument.  Might be easier with modules, though.




Re: Another article abour Reiser4 on linux.com

2006-08-05 Thread David Masover

Clay Barnes wrote:

I think the core thing we have to have to win this argument is
a) A word that isn't *instantly* associated with banned things.


That'd be nice.


b) The ability to point to the design and say, "Look, it's *impossible*
to use this design to put binary modules into the kernel."  Even if it's
as hard as ATI or nVidia modules to put one in, that'll be enough to put
up a fight against inclusion.


Why?

Why does it have to be impossible to do binary things with the kernel? 
I mean, if Linus hates GPL3 because it limits what people do with the 
kernel...


Besides, you can't make it impossible, you can only make it about as 
hard as it is now.  The license is the issue here.



The *only* way to win a political/personal fight
is to remove any possible objection until resistance looks purely
stupid and wholly unsubstantiated.


I agree.  That's why we not only need a new name, we also need a 
cut'n'paste argument that just makes this look stupid.


And it has to be short enough that cut'n'paste isn't bad, because if we 
refer people to the FAQ, they won't read it.



I was just saying to my roommate that I was losing hope for Reiser4
because I didn't see an end to the politics any time soon.


Yes, it can look pretty hopeless.


There's only one possible way I see to get in.  You must ask for an
absolute list of things that are objectionable.  You should then
ask *before you start work* about removal of any items that are
either a) impossible, or b) illogical.  Once you've gotten the
official stamp of approval on the (possibly revised) absolute list
of objections, you have to do it, completely and exactly.  If they
agreed that that is everything they find wrong and promised that
they would include Reiser4 if those issues were resolved, then they
really *have* to put it in then.


The problem is, they don't.  There have been some fairly definitive 
lists in the past, that were done, but maybe not quite the way they were 
expected.



The core of all this is that rather than leaving an open-ended task
that can be expanded at will, they are given limits to how long the
objections can be spread out.


Problem is, dictators can do whatever they want, even if they said 
something else before.


And that's all assuming you can get them to agree to such a list, and 
agree to abide by it.  They either wouldn't go for it, or they would 
come up with a list that effectively kills Reiser4, turning it into ext3.




Re: Another article abour Reiser4 on linux.com

2006-08-05 Thread David Masover

TongKe Xue wrote:

A really stupid question ... why not put Reiser4 in one of the BSDs?

And after it's got mainstream use, if it proves its worth, there'll be 
more pressure for Linux to adopt.


It will likely take far more work to port it to BSD than it will to be 
included in Linux.  And you're talking about probably even less chance 
of inclusion or of picking up a large community than in Linux.


Re: Checksumming blocks? [was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-04 Thread David Masover

Russell Leighton wrote:

Is there a recovery mechanism, or do you just be happy you know there is 
a problem (and go to backup)?


You probably go to backup anyway.  The recovery mechanism just means you 
get to choose the downtime to restore from backup (if there is 
downtime), versus being suddenly down until you can restore.


Re: reiser4: maybe just fix bugs?

2006-08-04 Thread David Masover

Theodore Tso wrote:

On Tue, Aug 01, 2006 at 11:55:57AM -0500, David Masover wrote:
If I understand it right, the original Reiser4 model of file metadata is 
the file-as-directory stuff that caused such a furor during the last big 
push for inclusion (search for "Silent semantic changes in Reiser4"):


The furor was caused by concerns Al Viro expressed about
locking/deadlock issues that reiser4 introduced.  


Which, I believe, was about file-as-dir.  Which also had problems with 
things like directory loops.  That's sort of a disk space memory leak.



The bigger issue with xattr support is two-fold.  First of all, there
are the programs that are expecting the existing extended attribute
interface,


Yeah...


More importantly are the system-level extended attributes, such as
those used by SELINUX, which by definition are not supposed to be
visible to the user at all,


I don't see why either of these are issues.  The SELINUX stuff can be a 
plugin which doesn't necessarily have a user-level interface. 
Cryptocompress, for instance, exists independent of its user-level 
interface (probably the file-as-dir stuff), and will probably be 
implemented in some sort of stable form as a system-wide default for new 
files.


So, certainly metadata (xattrs) as a plugin could be implemented with no 
UI at all, or any given UI.


... Anyway, I still see no reason why these cannot be implemented in 
Reiser4, other than the possibility that if it uses plugins, I 
guarantee that at least one or two people will hate the implementation 
for that reason alone.



Not supporting xattrs means that those distros that use SELINUX by
default (i.e., RHEL, Fedora, etc.) won't want to use reiser4, because
SELINUX won't work on reiser4 filesystems.


Right.  So they will be implemented, eventually.


Whether or not Hans cares about this is up to him


He does, or he should.  Reiser4 needs every bit of acceptance it can get 
right now, as long as it can get it without compromising its goals or 
philosophy.  Extended attributes only compromise these by providing less 
incentive to learn any other metadata interface that Reiser4 provides.  
But that's moot: if Reiser4 doesn't gain enough acceptance due to lack of 
xattr support, anything else it has will be irrelevant anyway.


So just as we provide the standard interface to Unix permissions (even 
though we intend to implement things like acls and views, and even 
though there was a file/.pseudo/rwx interface), we should provide the 
standard xattr interface, and the standard direct IO interface, and 
anything else that's practical.  Be a good, standard filesystem first, 
and an innovative filesystem second.


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-04 Thread David Masover

Horst H. von Brand wrote:

Vladimir V. Saveliev [EMAIL PROTECTED] wrote:

On Tue, 2006-08-01 at 17:32 +0200, Łukasz Mierzwa wrote:



What fancy (beside cryptocompress) does reiser4 do now?

it is supposed to provide the ability to easily modify filesystem
behaviour in various aspects without breaking compatibility.


If it just modifies /behaviour/ it can't really do much. And what can be
done here is more the job of the scheduler, not of the filesystem. Keep your
hands off it!


Say wha?

There's a lot you can do with the _representation_ of the on-disk format 
without changing the _physical_ on-disk format.  As a very simple 
example, a plugin could add a sysfs-like folder with information about 
that particular filesystem.  Yes, I know there are better ways to do 
things, but there are things you can change about behavior without (I 
think) touching the scheduler.


Or am I wrong about the scope of the scheduler?


If it somehow modifies /on disk format/, it (by *definition*) isn't
compatible. Ditto.


Cryptocompress is compatible with kernels that have a working 
cryptocompress plugin.  Other kernels will notice that they are meant to 
be read by cryptocompress, and (I hope) refuse to read files they won't 
be able to.


Same would be true of any plugin that changes the disk format.

But, the above comments about behavior still hold.  There's a lot you 
can do with plugins without changing the on-disk format.  If you want a 
working example, look to your own favorite filesystems that support 
quotas, xattrs, and acls -- is an on-disk FS format with those enabled 
compatible with a kernel that doesn't support them (has them turned 
off)?  How about ext3, with its journaling -- is the journaling all in 
the scheduler?  But isn't the ext3 disk format compatible with ext2?



quota support
xattrs and acls


Without those, it is next to useless anyway.


What is?  The FS?  I use neither on desktop machines, though I'd 
appreciate xattrs for Beagle.


Or are you talking about the plugins?  See above, then.



Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Alan Cox wrote:

On Tue, 2006-08-01 at 16:52 +0200, Adrian Ulrich wrote:

WriteCache, Mirroring between 2 Datacenters, snapshotting.. etc..
you don't need your filesystem being super-robust against bad sectors
and such stuff, because:


You do, it turns out.  It's becoming an issue more and more that the sheer
amount of storage means that the undetected error rate from disks,
hosts, memory, cables and everything else is rising.


Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you 
telling me RAID won't detect such an error?


It just seems wholly alien to me that errors would go undetected, and 
we're OK with that, so long as our filesystems are robust enough.  If 
it's an _undetected_ error, doesn't that cause way more problems 
(impossible problems) than FS corruption?  Ok, your FS is fine -- but 
now your bank database shows $1k less on random accounts -- is that ok?



There has been a great deal of discussion about this at the filesystem
and kernel summits - and data is getting kicked the way of networking -
end to end not reliability in the middle.


Sounds good, but I've never let discussions by people smarter than me 
prevent me from asking the stupid questions.



The sort of changes this needs hit the block layer and every fs.


Seems it would need to hit every application also...


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread David Masover

Vladimir V. Saveliev wrote:


Do you think that if reiser4 supported xattrs - it would increase its
chances on inclusion?


Probably the opposite.

If I understand it right, the original Reiser4 model of file metadata is 
the file-as-directory stuff that caused such a furor during the last big 
push for inclusion (search for "Silent semantic changes in Reiser4"):


foo.mp3/.../rwx# permissions
foo.mp3/.../artist # part of the id3 tag

So I suspect xattrs would just be a different interface to this stuff, 
maybe just a subset of it (to prevent namespace collisions):


foo.mp3/.../xattr/ # contains files representing attributes

Of course, you'd be able to use the standard interface for 
getting/setting these.  The point is, I don't think Hans/Namesys wants 
to do this unless they're going to do it right, especially because they 
already have the file-as-dir stuff somewhat done.  Note that these are 
neither mutually exclusive nor mutually dependent -- you don't have to 
enable file-as-dir to make xattrs work.
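In other words, the mapping from the standard interface into that namespace 
could be as dumb as this (hypothetical naming, just to illustrate the 
collision-avoidance point):

```python
def xattr_path(filename, attr):
    """Hypothetical translation of a standard xattr name onto
    Reiser4's file-as-directory namespace, confined to an xattr/
    subtree so attributes can't collide with other metadata files."""
    return "%s/.../xattr/%s" % (filename, attr)

print(xattr_path("foo.mp3", "artist"))  # foo.mp3/.../xattr/artist
```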


I know it's not done yet, though.  I can understand Hans dragging his 
feet here, because xattrs and traditional acls are examples of things 
Reiser4 is supposed to eventually replace.


Anyway, if xattrs were done now, the only good that would come of it is 
building a userbase outside the vanilla kernel.  I can't see it as doing 
anything but hurting inclusion by introducing more confusion about 
plugins.


I could be entirely wrong, though.  I speak for neither 
Hans/Namesys/reiserfs nor LKML.  Talk amongst yourselves...


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Horst H. von Brand wrote:

Bernd Schubert [EMAIL PROTECTED] wrote:


While filesystem speed is nice, it also would be great if reiser4.x would be 
very robust against any kind of hardware failures.


Can't have both.


Why not?  I mean, other than TANSTAAFL, is there a technical reason for 
them being mutually exclusive?  I suspect it's more we haven't found a 
way yet...


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-08-01 Thread David Masover

Christian Trefzer wrote:

On Mon, Jul 31, 2006 at 10:57:35AM -0500, David Masover wrote:

Wil Reichert wrote:


Any idea how the fragmentation resulting from re-syncing the tree
affects performance over time?
Yes, it does affect it a lot.  I have no idea how much, and I've never 
benchmarked it, but purely subjectively, my portage has gotten slower 
over time.


Delayed allocation still performs a lot better here than the v3
immediate allocation. In addition, tree balancing operations are
performed on flush as well, so what you get on disk is basically an
almost-optimal tree. Of course, this will change a bit over time, but
with v4 it takes a lot longer for that to happen than with v3 afaict.
There _has_ been some worthwhile development in the meantime : )


Hmm.  The thing is, I don't remember v3 slowing down much at all, 
whereas v4 slowed down pretty dramatically after the first few weeks. 
It does seem pretty stable now, though, and it doesn't seem to be 
getting any slower.


I've had this particular FS since...  hmm...  Is there an FS tool to 
check mkfs time?  I think it's a year now, but I'd like to be sure.


If not, I'll just find the oldest file, but the clock on this machine 
isn't reliable (have to set it with NTP every boot)...


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread David Masover

Theodore Tso wrote:


Ah, but as soon as the repacker thread runs continuously, then you
lose all or most of the claimed advantage of wandering logs.

[...]

So instead of a write-write overhead, you end up with a
write-read-write overhead.


This would tend to suggest that the repacker should not run constantly, 
but also that while it's running, performance could be almost as good as 
ext3.



But of course, people tend to disable the repacker when doing
benchmarks, because they're trying to play the "my filesystem/database
has bigger performance numbers than yours" game


So you run your own benchmarks, I'll run mine...  Benchmarks for 
everyone!  I'd especially like to see what performance is like with the 
repacker not running, and during the repack.  If performance during a 
repack is comparable to ext3, I think we win, although we have to amend 
that statement to "My filesystem/database has the same or bigger 
performance numbers than yours."


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Alan Cox wrote:

On Tue, 2006-08-01 at 11:44 -0500, David Masover wrote:

Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you 
telling me RAID won't detect such an error?


Yes.

RAID deals with the case where a device fails. RAID 1 with 2 disks can
in theory detect an internal inconsistency but cannot fix it.


Still, if it does that, that should be enough.  The scary part wasn't 
that there's an internal inconsistency, but that you wouldn't know.


And it can fix it if you can figure out which disk went.  Or give it 3 
disks and it should be entirely automatic -- admin gets paged, admin 
hotswaps in a new disk, done.
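The 2-vs-3 disk difference is just majority voting; a toy sketch (not how 
any real RAID implementation is written, just the arithmetic):

```python
from collections import Counter

def vote(copies):
    """Majority-vote across mirror copies of one block.  With two
    mirrors a mismatch is detectable but there is no majority to say
    which copy is right; with three, one bad disk gets outvoted."""
    value, n = Counter(copies).most_common(1)[0]
    if n <= len(copies) // 2:
        return None  # detected, but not correctable
    return value

good, bad = b"block data", b"block dat\x00"
print(vote([good, bad]))        # None: 2-disk RAID-1 can only detect
print(vote([good, good, bad]))  # the good copy wins 2-to-1
```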


we're OK with that, so long as our filesystems are robust enough.  If 
it's an _undetected_ error, doesn't that cause way more problems 
(impossible problems) than FS corruption?  Ok, your FS is fine -- but 
now your bank database shows $1k less on random accounts -- is that ok?


Not really, no. Your bank is probably using a machine (hopefully using a
machine) with ECC memory, ECC cache and the like. The UDMA and SATA
storage subsystems use CRC checksums between the controller and the
device. SCSI uses various similar systems - some older ones just use a
parity bit so have only a 50/50 chance of noticing a bit error.

Similarly the media itself is recorded with a lot of FEC (forward error
correction) so will spot most changes.

Unfortunately when you throw this lot together with astronomical amounts
of data you get burned now and then, especially as most systems are not
using ECC ram, do not have ECC on the CPU registers and may not even
have ECC on the caches in the disks.


It seems like this is the place to fix it, not the software.  If the 
software can fix it easily, great.  But I'd much rather rely on the 
hardware looking after itself, because when hardware goes bad, all bets 
are off.


Specifically, it seems like you do mention lots of hardware solutions, 
that just aren't always used.  It seems like storage itself is getting 
cheap enough that it's time to step back a year or two in Moore's Law to 
get the reliability.



The sort of changes this needs hit the block layer and every fs.

Seems it would need to hit every application also...


Depending how far you propagate it. Some people working with huge
data sets already write and check user-level CRC values for this reason
(in fact bitkeeper does it, for one example). It should be relatively
cheap to get much of that benefit without doing it application to
application, just as TCP gets most of its benefit without going app to
app.
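The user-level CRC approach mentioned above can be sketched in a few lines (the function names are made up; real tools keep the checksum in their own metadata rather than inline):

```python
import zlib

def seal(data: bytes) -> bytes:
    # Prepend a CRC32 of the payload; verify it on every read.
    return zlib.crc32(data).to_bytes(4, "big") + data

def verify(blob: bytes) -> bytes:
    crc, data = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(data) != crc:
        raise IOError("checksum mismatch: corrupted somewhere in the stack")
    return data

blob = seal(b"payroll record")
assert verify(blob) == b"payroll record"
corrupt = blob[:5] + b"X" + blob[6:]   # silently flip one payload byte
try:
    verify(corrupt)
except IOError:
    print("corruption detected")
```

The point is the same one made about TCP: the check sits above all the unreliable layers, so it catches errors no matter which layer introduced them.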


And yet, if you can do that, I'd suspect you can, should, must do it at 
a lower level than the FS.  Again, FS robustness is good, but if the 
disk itself is going, what good is having your directory (mostly) intact 
if the files themselves have random corruptions?


If you can't trust the disk, you need more than just an FS which can 
mostly survive hardware failure.  You also need the FS itself (or maybe 
the block layer) to support bad block relocation and all that good 
stuff, or you need your apps designed to do that job by themselves.


It just doesn't make sense to me to do this at the FS level.  You 
mention TCP -- ok, but if TCP is doing its job, I shouldn't also need to 
implement checksums and other robustness at the protocol layer (http, 
ftp, ssh), should I?  Because in this analogy, it looks like TCP is the 
block layer and a protocol is the fs.


As I understand it, TCP only lets the protocol/application know when 
something's seriously FUBARed and it has to drop the connection. 
Similarly, the FS (and the apps) shouldn't have to know about hardware 
problems until it really can't do anything about it anymore, at which 
point the right thing to do is for the FS and apps to go oh shit and 
drop what they're doing, and the admin replaces hardware and restores 
from backup.  Or brings a backup server online, or...




I guess my main point was that _undetected_ problems are serious, but if 
you can detect them, and you have at least a bit of redundancy, you 
should be good.  For instance, if your RAID reports errors that it can't 
fix, you bring that server down and let the backup server run.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Gregory Maxwell wrote:

On 8/1/06, David Masover [EMAIL PROTECTED] wrote:

Yikes.  Undetected.

Wait, what?  Disks, at least, would be protected by RAID.  Are you
telling me RAID won't detect such an error?


Unless the disk ECC catches it raid won't know anything is wrong.

This is why ZFS offers block checksums... it can then try all the
permutations of raid regens to find a solution which gives the right
checksum.


Isn't there a way to do this at the block layer?  Something in 
device-mapper?
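The ZFS-style recovery Gregory describes can be sketched independently of any layer: with a block checksum plus XOR parity, regenerate each disk's block in turn and keep the permutation whose result matches the checksum. (This is an illustrative toy, not device-mapper or ZFS code.)

```python
import zlib
from functools import reduce

def xor(blocks):
    # Byte-wise XOR across equal-length blocks (RAID-5-style parity).
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover(blocks, parity, want_crc):
    """blocks: data blocks as read; one may be silently wrong."""
    if zlib.crc32(b"".join(blocks)) == want_crc:
        return blocks                     # everything checks out
    for i in range(len(blocks)):
        # Assume disk i lied; rebuild its block from parity + the others.
        others = blocks[:i] + blocks[i + 1:]
        candidate = blocks[:i] + [xor(others + [parity])] + blocks[i + 1:]
        if zlib.crc32(b"".join(candidate)) == want_crc:
            return candidate
    raise IOError("unrecoverable: no regeneration matches the checksum")

good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor(good)
crc = zlib.crc32(b"".join(good))
bad = [b"AAAA", b"XXXX", b"CCCC"]   # disk 1 returned garbage, no I/O error
assert recover(bad, parity, crc) == good
```

Plain RAID alone can't do this, because without the checksum it has no way to tell which permutation is the truth.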



Every level of the system must be paranoid and take measures to avoid
corruption if the system is to avoid it... it's a tough problem. It
seems that the ZFS folks have addressed this challenge by building as
much of what are classically separate layers into one part.


Sounds like bad design to me, and I can point to the antipattern, but 
what do I know?


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-08-01 Thread David Masover

Ric Wheeler wrote:

Alan Cox wrote:

Ar Maw, 2006-08-01 am 16:52 +0200, ysgrifennodd Adrian Ulrich:


WriteCache, mirroring between 2 datacenters, snapshotting, etc. --
you don't need your filesystem to be super-robust against bad sectors
and such stuff, because:



You do, it turns out. It's becoming an issue more and more that the sheer
amount of storage means that the undetected error rate from disks,
hosts, memory, cables and everything else is rising.


Most people use absolutely giant disks in laptops and desktop systems 
(300 GB and 500 GB are common, 750 GB on the way). File systems need to be as 
robust as possible for users of these systems, as people are commonly 
storing personal, critical data like photos mostly on these unprotected 
drives.


Their loss.  Robust FS is good, but really, if you aren't doing backup, 
you are going to lose data.  End of story.


Even for the high end users, array based mirroring and so on can only do 
so much to protect you.


Mirroring a corrupt file system to a remote data center will mirror your 
corruption.


Assuming it's undetected.  Why would it be undetected?

Rolling back to a snapshot typically only happens when you notice a 
corruption which can go undetected for quite a while, so even that will 
benefit from having reliability baked into the file system (i.e., it 
should grumble about corruption to let you know that you need to roll 
back or fsck or whatever).


Yes, the filesystem should complain about corruption.  So should the 
block layer -- if you don't trust the FS, use a checksum at the block 
layer.  So should...


There are just so many other, better places to do this than the FS.  The 
FS should complain, yes, but if the disk is bad, there's going to be 
corruption.


An even larger issue is that our tools, like fsck, which are used to 
uncover these silent corruptions need to scale up to the point that they 
can uncover issues in minutes instead of days.  A lot of the focus at 
the file system workshop was around how to dramatically reduce the 
repair time of file systems.


That would be interesting.  I know from experience that fsck.reiser4 is 
amazing.  Blew away my data with something akin to an rm -rf, and fsck 
fixed it.  Tons of crashing/instability in the early days, but only once 
-- before they even had a version instead of a date, I think -- did I 
ever have a case where fsck couldn't fix it.


So I guess the next step would be to make fsck faster.  Someone 
mentioned a fsck that repairs the FS in the background?


In a way, having super reliable storage hardware is only as good as the 
file system layer on top of it - reliability needs to be baked into the 
entire IO system stack...


That bit makes no sense.  If you have super-reliable storage hardware 
(never dies), and your FS is also reliable (never dies unless the hardware 
does, but may go bat-shit insane when hardware dies), then you've got a 
super-reliable system.


You're right, running Linux's HFS+ or NTFS write support is generally a 
bad idea, no matter how reliable your hardware is.  But this discussion 
was not about whether an FS is stable, but how well an FS survives 
hardware corruption.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-08-01 Thread David Masover

Ian Stirling wrote:

David Masover wrote:

David Lang wrote:


On Mon, 31 Jul 2006, David Masover wrote:

Oh, I'm curious -- do hard drives ever carry enough 
battery/capacitance to cover their caches?  It doesn't seem like it 
would be that hard/expensive, and if it is done that way, then I 
think it's valid to leave them on.  You could just say that other 
filesystems aren't taking as much advantage of newer drive features 
as Reiser :P



there are no drives that have the ability to flush their cache after 
they lose power.



Aha, so back to the usual argument:  UPS!  It takes a fraction of a 
second to flush that cache.


You probably don't actually want to flush the cache - but to write
to a journal.
16M of cache - split into 32000 writes to single sectors spread over
the disk could well take several minutes to write. Slapping it onto
a journal would take well under .2 seconds.
That's a non-trivial amount of storage though - 3J or so, [EMAIL PROTECTED] -
a moderately large/expensive capacitor.


Before we get ahead of ourselves, remember:  ~$200 buys you a huge 
amount of battery storage.  We're talking several minutes for several 
boxes, at the very least -- more like 10 minutes.


But yes, a journal or a software suspend.
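Ian's scattered-writes-versus-journal figures are easy to sanity-check; the drive parameters below are illustrative assumptions, not numbers from the thread:

```python
cache_bytes = 16 * 2**20        # 16 MB of dirty write cache
sector = 512
seek_ms = 8.0                   # assumed average seek + rotational latency
scattered_s = (cache_bytes // sector) * seek_ms / 1000
seq_mb_s = 80.0                 # assumed sustained sequential write rate
journal_s = cache_bytes / (seq_mb_s * 2**20)
print(f"scattered: {scattered_s / 60:.1f} min, journal: {journal_s:.2f} s")
```

32,768 single-sector seeks take minutes; one sequential 16 MB journal write takes a fraction of a second, which is why a capacitor only has to be big enough for the latter.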


Re: reiser4: maybe just fix bugs?

2006-08-01 Thread David Masover

Nate Diller wrote:

On 8/1/06, David Masover [EMAIL PROTECTED] wrote:

Vladimir V. Saveliev wrote:



I could be entirely wrong, though.  I speak for neither
Hans/Namesys/reiserfs nor LKML.  Talk amongst yourselves...


i should clarify things a bit here.  yes, hans' goal is for there to
be no difference between the xattr namespace and the readdir one.
unfortunately, this is not feasible with the current VFS, and some
major work would have to be done to enable this without some
pathological cases cropping up.  some very smart people think that it
cannot be done at all.


But an xattr interface should work just fine, even if the rest of the 
system is inaccessible (no readdir interface) -- preventing all these 
pathological problems, except the one where Hans implements it the way 
I'm thinking, and kernel people hate it.


Re: reiser4 can now bear with filled fs, looks stable to me...

2006-07-31 Thread David Masover

Hans Reiser wrote:

I think that most of our problem is that we are too socially insulated
from lkml.  They are a herd, and decide things based on what thoughts
echo most loudly.


To be fair, it's not the whole lkml you have to convince, just the few 
people directly responsible for filesystems and 2.6 maintenance.  But 
then, they probably do consider what the herd is saying...



It might even be socially effective to shut down reiserfs-list until
inclusion occurs.


Maybe.  It will be an inconvenience for me, if we have to.  I'm not even 
on LKML, and I'd rather not be -- even this list can get noisy at times...


But I will go with it if it's what works best.


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-31 Thread David Masover

Wil Reichert wrote:

=)

That was sorta the plan.

Any idea how the fragmentation resulting from re-syncing the tree
affects performance over time?


Try to post replies at the bottom, or below the context.

Yes, it does affect it a lot.  I have no idea how much, and I've never 
benchmarked it, but purely subjectively, my portage has gotten slower 
over time.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread David Masover

Jan-Benedict Glaw wrote:

On Mon, 2006-07-31 17:59:58 +0200, Adrian Ulrich [EMAIL PROTECTED] wrote:

A colleague of mine happened to create a ~300gb filesystem and started
to migrate Mailboxes (Maildir-style format = many small files (1-3kb))
to the new LUN. At about 70% the filesystem ran out of inodes; Not a


So preparation work wasn't done.


So what?

Yes, you need to do preparation.  But it is really nice if the 
filesystem can do that work for you.
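Rough arithmetic for the maildir failure described above, assuming ext3-style static inode allocation with the common 16 KiB bytes-per-inode default (the actual mkfs parameters used are unknown; the point is that a static inode table sized for average files runs dry long before the disk is full when the files are tiny):

```python
fs_bytes = 300 * 10**9          # the ~300 GB filesystem from the thread
inode_ratio = 16384             # assumed default: one inode per 16 KiB
inodes = fs_bytes // inode_ratio
avg_mail = 2048                 # 1-3 KB maildir messages
frac_full = inodes * avg_mail / fs_bytes
print(f"{inodes:,} inodes; exhausted with the disk only {frac_full:.0%} full")
```

With their real mix of file sizes the wall came at 70%, but the shape of the failure is the same: inodes, not space, are the binding limit.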


Let me put it this way -- You're back in college, and it's time to write 
a thesis.  You have a choice of software packages:




Package A:  You have to specify how many pages, and how many words, 
you're likely to use before you start typing.  Guess too high, and 
you'll print out a bunch of blank pages at the end.  Guess too low, and 
you'll run out of space and have to start over, copy and paste your 
document back in, and hope it gets all the formatting right, which it 
probably won't.


Package B:  Your document grows as you type.  When it's time to print, 
only the pages you've actually written something on -- but all of the 
pages you've actually written something on -- are printed.




All other things being equal, which would you choose?  Which one seems 
more modern?


Look, I understand the argument against ReiserFS v3 -- it has another 
limitation that you don't even know about.  That other limitation is 
scary -- that's like being able to type as many words as you want, but 
once you type enough pages (no way of knowing how many), pages start 
randomly disappearing from the middle of your document.


But the argument that no one cares about inode limits?  Really, stop 
kidding yourselves.  It's 2006.  The limits are starting to look 
ridiculous.  Just because they're workable doesn't mean we should have 
to live with them.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread David Masover

Matthias Andree wrote:

Adrian Ulrich schrieb am 2006-07-31:



Why are a lot of Solaris people using (buying) VxFS? Maybe because UFS
also has such silly limitations? (..and performs awkwardly with trillions
of files..?..)


Well, such silly limitations... looks like they are mostly hot air
spewn by marketroids that need to justify people spending money on their
new filesystem.


I think the limitations are silly, and I'm not paid to say this. 
Besides, we're talking about a filesystem that will be free (and libre), 
so I don't see the point of marketroids, certainly not in this context.


But let's not stoop to name-calling.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread David Masover

Jan-Benedict Glaw wrote:

On Mon, 2006-07-31 12:17:12 -0700, Clay Barnes [EMAIL PROTECTED] wrote:

On 20:43 Mon 31 Jul , Jan-Benedict Glaw wrote:

On Mon, 2006-07-31 20:11:20 +0200, Matthias Andree [EMAIL PROTECTED] wrote:

Jan-Benedict Glaw schrieb am 2006-07-31:

[Crippled DMA writes]

Massive hardware problems don't count. ext2/ext3 doesn't look much better in
such cases. I had a machine with RAM gone bad (no ECC - I wonder what

They do! Very much, actually. These happen In Real Life, so I have to

I think what he meant was that it is unfair to blame reiser3 for data
loss in a massive failure situation as a case example by itself.  What


Crippling a few KB of metadata in the ext{2,3} case probably wouldn't
foobar the filesystem...


Probably.  By the time a few KB of metadata are corrupted, I'm reaching 
for my backup.  I don't care what filesystem it is or how easy it is to 
edit the on-disk structures.


This isn't to say that having robust on-disk structures isn't a good 
thing.  I have no idea how Reiser4 will hold up either way.  But 
ultimately, what you want is the journaling (so power failure / crashes 
still leave you in an OK state), backups (so when blocks go bad, you 
don't care), and performance (so you can spend less money on hardware 
and more money on backup hardware).


Re: the 'official' point of view expressed by kernelnewbies.orgregarding reiser4 inclusion

2006-07-31 Thread David Masover

David Lang wrote:

On Mon, 31 Jul 2006, David Masover wrote:

Probably.  By the time a few KB of metadata are corrupted, I'm 
reaching for my backup.  I don't care what filesystem it is or how 
easy it is to edit the on-disk structures.


This isn't to say that having robust on-disk structures isn't a good 
thing. I have no idea how Reiser4 will hold up either way.  But 
ultimately, what you want is the journaling (so power failure / 
crashes still leave you in an OK state), backups (so when blocks go 
bad, you don't care), and performance (so you can spend less money on 
hardware and more money on backup hardware).


please read the discussion that took place at the filesystem summit a 
couple weeks ago (available on lwn.net)


I think I will, but I don't have the time today, so...

one of the things that they pointed out there is that as disks get 
larger the ratio of bad spots per Gig of storage is remaining about the 
same. As is the rate of failures per Gig of storage.


As a result of this, the idea of only running on perfect disks that never 
have any failures is becoming significantly less realistic; instead you 
need to take measures to survive in the face of minor corruption 
(including robust filesystems, raid, etc)


RAID seems a much more viable solution to me.  That and cheaper storage, 
so that you can actually afford to replace the disk when you find 
corruption, or have more redundancy so you don't have to.


Because robust filesystems is nice in theory, but in practice, you 
really never know what will get hit.  RAID, at least, is predictable.


When it's not:  Backups.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-31 Thread David Masover

Alan Cox wrote:

Ar Llu, 2006-07-31 am 17:00 -0400, ysgrifennodd Gregory Maxwell:



Are you sure that you aren't commenting on cases where Reiser3 alerts
the user to a critical data condition (via a panic) which leads to a
trouble report while ext3 ignores the problem which suppresses the
trouble report from the user?


man mount

Ext3 is configurable, and has been for years via the errors= option.


Sure, but I think the suggestion is that the reason we generally see 
more ReiserFS complaints than ext3 complaints might be because of the 
default level of errors logged.


Re: reiser4 can now bear with filled fs, looks stable to me...

2006-07-31 Thread David Masover

Maciej Sołtysiak wrote:

Hello David,



- it is more expensive to:
  a) succeed at kernel inclusion
  b) argue
  c) waste time


You must be new here...

Options B and C are all that ever seems to happen when reiserfs-list and 
lkml collide.


Is option A possible?  The speed of a nonworking program is irrelevant. 
 The cost-effectiveness of an impossible solution is irrelevant.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread David Masover

Matthias Andree wrote:

On Mon, 31 Jul 2006, Nate Diller wrote:


this is only a limitation for filesystems which do in-place data and
metadata updates.  this is why i mentioned the similarities to log
file systems (see rosenblum and ousterhout, 1991).  they observed an
order-of-magnitude increase in performance for such workloads on their
system.


It's well known that transactions that would thrash on UFS or ext2fs may
have quieter access patterns with shorter strokes, and can benefit from
logging, data journaling, or whatever else turns seeks into serial writes.
And then, with wandering logs (to avoid double
writes) and such, you start wondering how much fragmentation you get as
the price to pay for avoiding seeks and double writes at the same time.


So you use a repacker.  Nice thing about a repacker is, everyone has 
downtime.  Better to plan to be a little sluggish when you'll have 
1/10th or 1/50th of the users than be MUCH slower all the time.


You're right, though, to ask the question:


TANSTAAFL, or how long the system can sustain such access patterns,
particularly if it gets under memory pressure and must move.


Anyone care to run some very long benchmarks?


Even with
lazy allocation and other optimizations, I question the validity of
3000/s or faster transaction frequencies. Even the 500 on ext3 are
suspect, particularly with 7200/min (s)ATA crap. This sounds pretty much
like the drive doing its best to shuffle blocks around in its 8 MB cache
and lazily writing back.


Oh, I'm curious -- do hard drives ever carry enough battery/capacitance 
to cover their caches?  It doesn't seem like it would be that 
hard/expensive, and if it is done that way, then I think it's valid to 
leave them on.  You could just say that other filesystems aren't taking 
as much advantage of newer drive features as Reiser :P


Anyway, remember that the primary tool of science is not logic.  Logic 
is the primary tool of philosophy.  The primary tool of science is 
observation.


Sorry, the only machines I could really run this on are about to be in 
remote only mode for a couple weeks.  I'm hesitant to hit them too hard.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread David Masover

Theodore Tso wrote:

On Mon, Jul 31, 2006 at 08:31:32PM -0500, David Masover wrote:
So you use a repacker.  Nice thing about a repacker is, everyone has 
downtime.  Better to plan to be a little sluggish when you'll have 
1/10th or 1/50th of the users than be MUCH slower all the time.


Actually, that's a problem with log-structured filesystems in general.
There are quite a few real-life workloads where you *don't* have
downtime.  The thing is, in a global economy, you move from the
London/European stock exchanges, to the New York/US exchanges, to the
Asian exchanges, with little to no downtime available.


Such systems must have redundancy, however.  And if you have 2-3 servers 
hot in case one of them goes down, I can see switching between which is 
more active, and which is repacking.


This repacker is online, hence a filesystem being repacked would have to 
be less active, not necessarily down.  So repack the backup server, then 
make the backup server the active one and repack the main server.  If 
the main server goes down while the backup is repacking, kill the repack 
process.


I actually have a problem imagining a system where you don't have enough 
spare capacity (disk, CPU, spare servers) to run a repacker every now 
and then, but which also must have 100% uptime.  What happens when a 
disk goes bad?  Or when power to half the country goes out?  Or...  You 
get the idea.



In addition,
people have been getting more sophisticated with workload
consolidation tricks so that you use your downtime for other
applications (either to service other parts of the world, or to do
daily summaries, 3-d frame rendering at animation companies, etc.)  So
the assumption that there will always be time to run the repacker is a
dangerous one.


3D frame rendering in particular doesn't require much disk use, does it? 
 Daily summaries, I guess, depends on what kind of summaries they are. 
 And anyway, those applications are making the same dangerous assumption.


And anyway, I suspect the repacker will work best once a week or so, but 
no one knows yet, as they haven't written it yet.



The problem is that many benchmarks (such as tarring and untarring the
kernel sources in reiser4 sort order) are overly simplistic, in that
they don't really reflect how people use the filesystem in real life.


That's true.  I'd also like to see lots more benchmarks.


If the benchmark doesn't take into account the need for a
repacker, or if the repacker is disabled or fails to run during the
benchmark, the filesystem is in effect cheating on the benchmark,
because there is critical work which is necessary for the long-term
health of the filesystem which is getting deferred until after the
benchmark has finished measuring the performance of the system under
test.


In this case, the only fair test would be to run the benchmark 24/7 for 
a week, and run the repacker on a weekend.  Or however you're planning 
to do it.  It wouldn't be fair to run a 10-minute or 1-hour benchmark 
and then immediately run the repacker.


But I'd also like to see more here, especially about fragmentation.  If 
the repacker will cost money, the system should be reasonably good at 
avoiding fragmentation.  I'm wondering if I should run a benchmark on my 
systems -- they're over a year old, and while they aren't under 
particularly heavy load, they should be getting somewhat fragmented by now.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread David Masover

Timothy Webster wrote:

Different users have different needs.


I'm having trouble thinking of users who need an FS that doesn't need a 
repacker.


The disk error problem, though, you're right -- most users will have to 
get bitten by this, hard, at least once, or they'll never get the 
importance of it.  But it'd be nice if it's not too hard, and we can 
actually recover most of their files.


Still, I can see most people who are aware of this problem using RAID, 
backups, and not caring if their filesystem tolerates bad hardware.



The problem I see is managing disk errors.


I see this kind of the same way.  If your disk has errors, you should be 
getting a new disk.  If you can't do that, you can run a mirrored RAID 
-- even on SATA, you should be able to hotswap it.


Even for a home/desktop user, disks are cheap, and getting cheaper all 
the time.  All you have to do is run the mean time between failure 
numbers by them, and ask them if their backup is enough.



And perhaps a
really good clustering filesystem for markets that
require NO downtime. 


Thing is, a cluster is about the only FS I can imagine that could 
reasonably require (and MAYBE provide) absolutely no downtime. 
Everything else, the more you say it requires no downtime, the more I 
say it requires redundancy.


Am I missing any more obvious examples where you can't have enough 
redundancy, but you can't have downtime either?


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread David Masover

David Lang wrote:

On Mon, 31 Jul 2006, David Masover wrote:

Oh, I'm curious -- do hard drives ever carry enough 
battery/capacitance to cover their caches?  It doesn't seem like it 
would be that hard/expensive, and if it is done that way, then I think 
it's valid to leave them on.  You could just say that other 
filesystems aren't taking as much advantage of newer drive features as 
Reiser :P


there are no drives that have the ability to flush their cache after 
they lose power.


Aha, so back to the usual argument:  UPS!  It takes a fraction of a 
second to flush that cache.


now, that being said, /. had a story within the last couple of days 
about hard drive manufacturers adding flash to their hard drives. they 
may be aiming to add some non-volatile cache capability to their drives, 
although I didn't think that flash writes were that fast (needed if you 
dump the cache to flash when you lose power), or that easy on power 
(given that you would first lose power), and flash has limited write 
cycles (needed if you always use the cache).


But, the point of flash was not to replace the RAM cache, but to be 
another level.  That is, you have your Flash which may be as fast as the 
disk, maybe faster, maybe less, and you have maybe a gig worth of it. 
Even the bloatiest of OSes aren't really all that big -- my OS X came 
installed, with all kinds of apps I'll never use, in less than 10 gigs.


And I think this story was a while ago (a dupe?  Not surprising), and the 
point of the Flash is that as long as your read/write cache doesn't run 
out, and you're still in that 1 gig of Flash, you're a bit safer than 
the RAM cache, and you can also leave the disk off, as in, spun down. 
 Parked.


Very useful for a laptop -- I used to do this in Linux by using Reiser4, 
setting the disk to spin down, and letting lazy writes do their thing, 
but I didn't have enough RAM, and there's always the possibility of 
losing data.  But leaving the disk off is nice, because in the event of 
sudden motion, it's safer that way.  Besides, most hardware gets 
designed for That Other OS, which doesn't support any kind of Laptop 
Mode, so it's nice to be able to enforce this at a hardware level, in a 
safe way.


I've heard of too many fancy-sounding drive technologies that never hit the 
market; I'll wait until they are actually available before I start 
counting on them for anything  (let alone design/run a filesystem that 
requires them :-)


Or even remember their names.

external battery backed cache is readily available, either on high-end 
raid controllers or as separate ram drives (and in raid array boxes), 
but nothing on individual drives.


Ah.  Curses.

UPS, then.  If you have enough time, you could even do a Software 
Suspend first -- that way, when power comes back on, you boot back up, 
and if it's done quickly enough, connections won't even be dropped...




Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressed by kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread David Masover

David Lang wrote:

On Mon, 31 Jul 2006, David Masover wrote:


And perhaps a
really good clustering filesystem for markets that
require NO downtime. 


Thing is, a cluster is about the only FS I can imagine that could 
reasonably require (and MAYBE provide) absolutely no downtime. 
Everything else, the more you say it requires no downtime, the more I 
say it requires redundancy.


Am I missing any more obvious examples where you can't have enough 
redundancy, but you can't have downtime either?


just because you have redundancy doesn't mean that your data is idle 
enough for you to run a repacker with your spare cycles.


Then you don't have redundancy, at least not for reliability.  In that 
case, you have redundancy for speed.


to run a 
repacker you need a time when the chunk of the filesystem that you are 
repacking is not being accessed or written to.


Reasonably, yes.  But it will be an online repacker, so it will be 
somewhat tolerant of this.


it doesn't matter if that 
data lives on one disk or 9 disks all mirroring the same data, you can't 
just break off 1 of the copies and repack that, because by the time you 
finish it won't match the live drives anymore.


Aha.  That really depends how you're doing the mirroring.

If you're doing it at the block level, then no, it won't work.  But if 
you're doing it at the filesystem level (a cluster-based FS, or 
something that layers on top of an FS), or (most likely) the 
database/application level, then when you come back up, the new data is 
just pulled in from the logs as if it had been written to the FS.


The only example I can think of that I've actually used and seen working 
is MySQL tables, but that already covers a huge number of websites.


database servers have a repacker (vacuum), and they are under tremendous 
pressure from their users to avoid having to use it because of the 
performance hit that it generates. (the theory in the past is exactly 
what was presented in this thread, make things run faster most of the 
time and accept the performance hit when you repack). the trend seems to 
be for a repacker thread that runs continuously, causing a small impact 
all the time (that can be calculated into the capacity planning) instead 
of a large impact once in a while.


Hmm, if that could be done right, it wouldn't be so bad -- if you get 
twice the performance but have to repack for 2 hrs at the end of the 
week, repacker is better, right?  So if you could spread the 2 hours out 
over the week, in theory, you'd still be pretty close to twice the 
performance.
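That back-of-envelope claim checks out; using the thread's own figures (2x speed, 2 hours of repacking per week):

```python
week_h = 7 * 24.0
baseline = 1.0                            # fs that never needs repacking
repacked = 2.0 * (week_h - 2) / week_h    # 2x speed, minus 2 h/week repack
print(f"effective throughput with weekly repack: {repacked:.2f}x baseline")
```

Whether the 2 hours land in one weekend window or are smeared across the week, the amortized cost is the same ~1%; the difference is purely in worst-case latency during the repack.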


But that is fairly difficult to do, and may be more difficult to do well 
than to implement, say, a Reiser4 plugin that operates about on the 
level of rsync, but on every file modification.


the other thing they are seeing as new people start using them is that 
the newbies don't realize they need to do something as archaic as running 
a repacker periodically, as a result they let things devolve down to where 
performance is really bad without understanding why.


Yikes.  But then, that may be a failure of distro maintainers for not 
throwing it in cron for them.


I had a similar problem with MySQL.  I turned on binary logging so I 
could do database replication, but I didn't realize I had to actually 
delete the logs.  I now have a daily cron job that wipes out everything 
except the last day's logs.  It could probably be modified pretty easily 
to run hourly, if I needed to.


Moral of the story?  Maybe there's something to this continuous 
repacker idea, but don't ruin a good thing for the rest of us because 
of newbies.


Re: Solaris ZFS on Linux [Was: Re: the 'official' point of viewexpressedby kernelnewbies.org regarding reiser4 inclusion]

2006-07-31 Thread David Masover

David Lang wrote:

On Mon, 31 Jul 2006, David Masover wrote:


Aha, so back to the usual argument:  UPS!  It takes a fraction of a 
second to flush that cache.


which does absolutely no good if someone trips over the power cord, the 
fuse blows in the power supply, someone yanks the drive out of the 
hot-swap bay, etc.


Power supply fuse...  Yeah, it happens.  Drives die, too.  This seems 
fairly uncommon.  And dear God, please tell me anyone smart enough to 
set up a UPS would also be smart enough to make tripping over the power 
cord rare or impossible.


My box has a cable that runs down behind a desk, between the desk and 
the wall.  Power strip is on the floor, where a UPS will be when I get 
around to buying one.  If someone kicks any cable, it would be where the 
UPS hits the wall -- but that's also behind the same desk.



as I understand it flash reads are fast (RAM speeds), but writes are 
pretty slow (comparable to or worse than spinning media)


writing to a RAM cache but having a flash drive behind it doesn't gain 
you any protection, and I don't think you need it for reads


Does gain you protection if you're not using the RAM cache, if you're 
that paranoid.  I don't know if it's cheaper than RAM, but more read 
cache is always better.  And losing power seems a lot less likely than 
crashing, especially on a Windows laptop, so it does make sense as a 
product.  And a laptop, having a battery, will give you a good bit of 
warning before it dies.  My Powerbook syncs and goes into Sleep mode 
when it runs low on power (~1% per 5 minutes).


external battery-backed cache is readily available, either on 
high-end raid controllers or as separate ram drives (and in raid 
array boxes), but nothing on individual drives.


Ah.  Curses.

UPS, then.  If you have enough time, you could even do a Software 
Suspend first -- that way, when power comes back on, you boot back up, 
and if it's done quickly enough, connections won't even be dropped...


remember, it can take 90W of power to run your CPU, 100+ to run your 
video card, plus everything else. even a few seconds of power for this 
is a very significant amount of energy storage.


Suspend2 can take about 10-20 seconds.  It should be possible to work 
out the maximum amount of time it can take.


Anyway, according to a quick Google search, my CPU is more like 70W. 
Video card isn't required on a server, but you may be right on mine.  I 
haven't looked at UPSes lately, though.  I need about 3 seconds for a 
sync, maybe 10 for a suspend, so to be safe I can say for sure I'd be 
down in about 30 seconds.


So, another Google search, and while you can get a cheap UPS for 
anywhere from $10 to $100, the sweet spot seems to be a little over $200.


$229, and it's 865W, supposedly for 3.7 minutes.  Here's a review:

"This is a great product. It powers an AMD 64 3200+ with beefy (6800GT) 
graphics card, 21" CRT monitor, secondary 19" CRT, a Linux server, a 15" 
CRT, Cisco 2800XL switch, Linksys WRTG54GS, cable modem, speakers, and 
many other things. The software says I will get 9 minutes runtime with 
all of that hooked up; realistically it's about 4 minutes."


This was the lowest time reported.  Most of the other reviews say at 
least 15 minutes, sometimes 30 minutes, with fairly high-end computers 
listed (and monitors, sometimes two computers/monitors), but nowhere 
near as much stuff as this guy has.
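
The spread between the spec sheet's 3.7 minutes and the reviewers' 
15-30 minutes follows from a crude energy estimate.  This is a sketch, 
assuming stored energy is roughly constant across loads (real batteries 
do a bit better at light loads, so treat it as a floor):

```python
# Crude UPS runtime model: rated watts x rated minutes gives a fixed
# energy budget; runtime at other loads scales inversely with load.
RATED_LOAD_W = 865       # full rated load, from the spec sheet
RATED_RUNTIME_MIN = 3.7  # runtime at that load

ENERGY_WMIN = RATED_LOAD_W * RATED_RUNTIME_MIN  # ~3200 watt-minutes

def runtime_minutes(load_w: float) -> float:
    """Estimated runtime (minutes) at a given sustained load."""
    return ENERGY_WMIN / load_w

print(round(runtime_minutes(865), 1))  # 3.7 -- full rated load
print(round(runtime_minutes(200), 1))  # 16.0 -- a modest server
```

So a box drawing around 200 W would see the ~16 minutes the reviews 
report, which is plenty for a sync and a suspend.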


I checked most of these for Linux support, and UPSes in general seem 
well supported.  So yes, the box will shut off automatically.  On a 
network, it shouldn't be too hard to get one box to shut off all the rest.


It's a lot of money, even at the low end, but when you're already 
spending a pile of money on a new computer, keep power in mind.  And 
really, even 11 minutes would be fine, but 40 minutes of power is quite 
a lot compared to less than a minute of time taken to shut down normally 
-- not even suspend, but a normal shut down.  I'd be tempted to try to 
ride it out for the first 20 minutes, see if power comes back up...


however, I did get a pointer recently to a company making 
super-high-capacity caps, up to 2600 F (farads, not µF!) in a 138 mm 
tall, 57 mm diameter cylinder; however, it only handles 2.7 V (they 
have modules that handle higher voltages available)

http://www.maxwell.com/ultracapacitors/index.html

however, I don't see these as being standard equipment in systems or on 
drives anytime soon


This seems to be a whole different approach -- more along the lines of 
"in the drive," which would be cool...


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-30 Thread David Masover

Łukasz Mierzwa wrote:
On Sat, 29 Jul 2006 20:31:59 +0200, David Masover [EMAIL PROTECTED] 
wrote:



Nikita Danilov wrote:


As you see, ext2 code already has multiple file plugins, with
persistent plugin id (stored in i_mode field of on-disk struct
ext2_inode).


Aha!  So here's another question:  Is it fair to ask Reiser4 to make its
plugins generic, or should we be asking ext2/3 first?



Doesn't iptables have plugins? Maybe we should make them generic so 
other packet filters can use them ;)


Hey, yeah!  I mean, not everyone wants to run the ipchains emulation on 
top of iptables!  Some people really want to run ipchains with iptables 
plugins!


/sarcasm

It is REALLY time for this discussion to get technical again, and to go 
way, way over my head.  And it's time for me to go build my MythTV box, 
and see if I can shake out some Reiser4 bugs.


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-30 Thread David Masover

Christian Trefzer wrote:

On Sun, Jul 30, 2006 at 11:39:42PM +0200, Christian Trefzer wrote:

In order to avoid having to pull the whole tree via rsync again, you
might want to grab my script from the list and adapt it to your needs.


Of course, you can tar it up manually instead. Silly me, but after
approx. 9h of studying, little wonder ;)


In fact, the official install guide tells you to download a snapshot 
tarball first, then start syncing.


Re: reiser4 can now bear with filled fs, looks stable to me...

2006-07-30 Thread David Masover

Christian Trefzer wrote:

Hi,

I booted 2.6.18-rc2-mm1 today and later filled up my /opt partition by
accident, and guess what, reiser4 did not screw up :D


Hmm, I'm curious, though...  How does it react to a few billion files? 
Sorry, I can't test this, but I will be testing MythTV, if not now, then 
in a few weeks.



Congratulations and thanks to the namesys developers! Hans, I can
somewhat understand how you feel about your situation. Don't let
frustration get in your way, your work is simply too great. You're an

[...]

screwing over society ;)  Sometimes you just have to swallow your pride
instead of wasting your time by yelling at the rest of the world, and if
humble work does not lead to success, there won't be any other way, I
fear.


Amen.  I do not want to see Reiser4 fail because of politics, and it 
really looks like the only way to win the political war is not to play. 
The technical stuff is really the last way in, but neither side has 
said anything technical in a while.  The most technical things that 
have happened lately are Hans pointing to benchmarks and LKML pointing 
to ext3 plugins.


I suspect part of this is simply the word "plugin" coming around to bite 
us in the ass, but whatever.  We're all tired of this fight.



IMHO it would be best to deliver quality patches against all kinds of
sources (distro kernels, vanilla -rcs maybe, etc.)


Well, we have the patches against vanilla, which seem to work well with 
at least a few other patches I've tried.



and the entire
patched source tarball as well, for people to download and build. Next
step would be to provide binary packages, and repos for people to add to
their package manager's source list. Until distros pick up their
respective patch, this is as far as support can go, I guess.


That would actually be pretty good, for anyone making the conscious 
decision to use a filesystem.  Still need official distro support to get 
the people who don't (think they) care.



So, what do you all say?


Sounds good.  I don't have any idea of the work required, either...


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-29 Thread David Masover
Arjan van de Ven wrote:
 Most users not only cannot patch a kernel, they don't know what a patch
 is.  It most certainly does. 
 
 
 obviously you can provide complete kernels, including precompiled ones.
 Most distros have a yum or apt or similar tool to suck down packages,
 it's trivial for users to add a site to that, so you could provide
 packages if you want and make it easy for them.

What's more, many distros patch their kernels extensively.  They listen
to their users, too.  So if there are a lot of users wanting this to be
in the kernel, let them complain -- loudly -- to their distro to patch
for Reiser4.

It could be made even easier than that -- if Reiser4 is really so
self-contained, it could be a whole separate set of modules, distributed
on its own.  Most gamers have to be content with doing something similar
with the nvidia drivers -- for different reasons (licensing) but with
the same results.  I know Gentoo handles this automatically (emerge
nvidia-kernel).

Hmm, maybe it makes it a pain to have it as a root filesystem, so that
really needs distro support.  And yet, we have a whole system designed
specifically for being able to load modules and tweak settings before
the root FS is available.  It's called initrd, or more recently,
initramfs.  I use an old-style initrd on this box, because my root FS is
on an nvidia RAID, so I have to run a program called dmraid before I
mount my root FS -- it's really trivial for me to have Reiser4 as a
module, and I do, despite it being a root FS.

I suspect that, all technical, political, and "mine is bigger" arguments
aside, being available as a root FS of a distro, especially a default
FS, would go a long way toward inclusion in the kernel.  So all you
have to do is find a reasonably popular and friendly distro, with people
who are (for the moment) easier to deal with than kernel people.

Most people, if they even know what a filesystem or a kernel is, still
won't bother compiling their own kernel, you're right.  But that means
they are more likely to be using a distro-patched kernel than a stock,
vanilla one.

Is this enough to be in the jukebox, Hans?

Of course, it's odd that I mention Gentoo, the Gentoo people (as a rule)
hate ReiserFS, but there are far more distros than there are popular
kernel forks.  I'm sure someone will be interested.

That's assuming that making further changes (putting stuff in the VFS)
is out of the question (for now).



signature.asc
Description: OpenPGP digital signature


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-29 Thread David Masover
Hans Reiser wrote:
 David Masover wrote:
 
 If indeed it can be changed easily at all.  I think the burden is on
 you to prove that you can change it to be more generic, rather than
 saying "Well, we could do it later, if people want us to..."
 
 None of the filesystems other than reiser4 have any interest in using
 plugins, and this whole argument over how it should be in VFS is
 nonsensical because nobody but us has any interest in using the
 functionality.  The burden is on the generic code authors to prove that
 they will ever ever do anything at all besides complain.  Frankly, I
 don't think they will.  I think they will never produce one line of code.

I think it's fair to say that 5-10 years from now, with different ext3
maintainers, when the Reiser4 concept has proven itself, people will
want plugins for ext3, and the ext3 developers will like the idea.

ext* is one of those things that just refuses to die.  I use ext3 for my
/boot fs, so that I don't have to patch Grub for Reiser4, and so that at
least I can mess with the bootloader from any rescue CD if something
goes wrong.  It's for kind of the same reason that Gentoo builds a
32-bit Grub, even though I'm booting a 64-bit OS -- just in case.

I also use ext2 for my initrd.

There are other monstrosities that will likely never die, also.
ISO9660, UDF, and VFAT probably all have worse storage characteristics
than Reiser4, in that as I understand it, they won't pack multiple files
into a block.  So Reiser4 might even make a good boot cd FS, storing
things more efficiently -- but even if I'm right, those three
filesystems will last forever, because they are currently well supported
on every major OS, and I think one of ISO/UDF is required for DVDs.

So for whatever reason someone's using another filesystem, even if all
they need is the on-disk format (my reason for ext3 /boot and vfat on
USB thumbdrives), I think it's reasonable to expect that they may one
day want plugin functionality.  People who like Reiser filesystems will
do just fine running Reiser4 with a (udf|iso|vfat) storage driver, but
people who don't will just want the higher level stuff.

You're probably right and this is years of work for something that may
not be worth anything, but I think this is what is going through
people's heads as they look at this plugin system.

So see my comments about distro inclusion.





Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-29 Thread David Masover
Nikita Danilov wrote:

 As you see, ext2 code already has multiple file plugins, with
 persistent plugin id (stored in i_mode field of on-disk struct
 ext2_inode).

Aha!  So here's another question:  Is it fair to ask Reiser4 to make its
plugins generic, or should we be asking ext2/3 first?
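
Nikita's point can be sketched in user space (Python standing in for 
kernel C, which is obviously not how the kernel does it): the file type 
stored in i_mode acts as a persistent "plugin id," and the system 
dispatches to different file operations based on it.

```python
# Dispatch on the on-disk mode bits, the way a kernel picks
# inode/file operations for a directory vs. a regular file.
import os
import stat
import tempfile

def file_ops_for(path: str) -> str:
    """Pick a handler ("plugin") based on the st_mode type bits."""
    mode = os.lstat(path).st_mode
    if stat.S_ISDIR(mode):
        return "directory-ops"
    if stat.S_ISLNK(mode):
        return "symlink-ops"
    if stat.S_ISREG(mode):
        return "regular-file-ops"
    return "special-file-ops"

d = tempfile.mkdtemp()
f = os.path.join(d, "file")
open(f, "w").close()
print(file_ops_for(d))  # directory-ops
print(file_ops_for(f))  # regular-file-ops
```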





Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-29 Thread David Masover

Sarath Menon wrote:

On Saturday 29 July 2006 23:41, David Masover wrote:

I know Gentoo handles this automatically (emerge  nvidia-kernel).


I hate to say this again, but it's not automatic.  It requires more 


My point is, there's a fairly large group of users who would be willing 
to do that, as they're willing to do that to get their video drivers 
working.  Also, assuming a distro did choose to support it, the only 
reason nvidia-kernel isn't just distributed with a pre-built kernel (on 
pre-built OSes, anyway) is licensing.  This isn't a problem for Reiser4, 
which is GPL'd.



I suspect that, all technical, political, and "mine is bigger" arguments
aside, being available as a root FS of a distro, especially a default
FS, would go a long way toward inclusion in the kernel.  So all you
have to do is find a reasonably popular and friendly distro, with people
who are (for the moment) easier to deal with than kernel people.


It's actually a matter of hassle for the end user. That's where I would 
agree with Hans' comments from earlier.


Putting it in the kernel doesn't make it any more or less of a hassle 
for the end-user than getting distro support.  I remember downloading a 
different set of Debian floppies which supported XFS, before XFS was 
mainstream.


In that sense, it's somewhat done already -- there is a Gentoo livecd 
that is kept patched for Reiser4.  The problem with Gentoo, of course, 
is that if you're going to use Gentoo, you're going to be compiling your 
own kernel.  So when it comes down to getting vanilla-sources or 
gentoo-sources, it wouldn't take much -- just a reiser4-sources, or a 
separate reiser4-module package.



Most people, if they even know what a filesystem or a kernel is, still
won't bother compiling their own kernel, you're right.  But that means
they are more likely to be using a distro-patched kernel than a stock,
vanilla one.


Well, that's different, and that's the main problem in the Linux 
empowerment that we see around ourselves.  It finally revolves around 
the user, and as harsh as it may seem, it ultimately is the user who 
decides which fs is better.  (Give or take, they don't know the 
difference between kernel and user-space, or for that matter far more 
basic things.)


If I remember right, SuSE had ReiserFS as the default at one point.  If 
even one moderately popular Linux distro had Reiser4 as the default FS, 
it would get a LOT more exposure than it would by simply being included 
(as EXPERIMENTAL, at that) in the vanilla kernel.



Of course, it's odd that I mention Gentoo, the Gentoo people (as a rule)
hate ReiserFS, but there are far more distros than there are popular
kernel forks.  I'm sure someone will be interested.


I do, and that's partly due to the speed of /usr/portage on reiser4, and 
the ease of blowing everything away and starting from scratch :-)


Yes, I love /var/lib/portage/world also.

Is /usr/portage still faster on Reiser4?  I know it was when I switched, 
but that was years ago...


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-28 Thread David Masover

Horst H. von Brand wrote:

Jeff Garzik [EMAIL PROTECTED] wrote:

[...]


It is then simple to follow that train of logic:  why not make it easy
to replace the directory algorithm [and associated metadata]?  or the
file data space management algorithms?  or even the inode handling?
why not allow customers to replace a stock algorithm with an exotic,
site-specific one?


IMVHO, such experiments should/must be done in userspace. And AFAICS, they
can today.


inode handling?  Really?

But what's wrong with people doing such experiments outside the kernel? 
 AFAICS, an "exotic, site-specific" one is not something that would be 
considered for inclusion.


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-28 Thread David Masover

Hans Reiser wrote:

Linus Torvalds wrote:



In other words, if a filesystem wants to do something fancy, it needs to 
do so WITH THE VFS LAYER, not as some plugin architecture of its own.





(Let us try to avoid arguments over whether if you extend VFS it is
still called VFS or is called reiser4's plugin layer, agreed?)


Ok, assuming you actually extend the VFS.  The point is that if we want 
plugins, we don't have to implement them in ext3, but we have to put the 
plugin interface somewhere standard that is obviously not part of one 
filesystem (the VFS is the place) so that ext3 can implement a plugin 
system without having to read or touch a line of reiser4 code, and 
without compiling reiser4 into the kernel.


It may ultimately not be any different, technically.  This seems more 
like an organizational and political thing.  But that doesn't make it 
less important or valid.



Regarding copyright, these plugins are compiled in.  I have resisted
dynamically loaded plugins for now, for reasons I will not go into here.


Good point, there's no GPL issue here.  Plugins will either not be 
distributed (used internally) or distributed as GPL.



If you agree with taking it to the next level, then it is only to be
expected that there are things that aren't well suited as they are, like
parsing /etc/fstab when you have a trillion files.  It is not very
feasible to do it for all of the filesystems all at once given finite
resources, it needs a prototype. 


Doesn't have to be in fstab, I hope, but think of it this way:  ext3 
uses JBD for its journaling.  As I understand it, any other filesystem 
can also use JBD, and ext3 is mostly ext2 + JBD.


So, make the plugin interface generic enough that it complements the 
VFS, doesn't duplicate it, and doesn't exist as part of Reiser4 (or 
require Reiser4 to be present).  This may be just a bunch of renaming 
or a lot of work, I don't know, but I suspect it would make a lot of 
people a lot happier.



We have finite resources.  We can give you a working filesystem with
roughly twice the IO performance of the next fastest you have that does
not disturb other filesystems.  (4x once the compression plugin is
fully debugged).  It also fixes various V3 bugs without disturbing that
code with deep fixes.  We cannot take every advantage reiser4 has and
port it to every other filesystem in the form of genericized code as a
prerequisite for going in, we just don't have the finances.


This is a very compelling argument to me, but that's preaching to the 
choir, I've been running Reiser4 since before it was released, and 
before it looked like it was going to be stable anytime soon.


It may be bold of me to speak for the LKML, but I think the general 
consensus is:


The speed of a nonworking program is irrelevant -- no one cares how fast 
it is if it breaks things, either now or in the future.  Currently, the 
concern is that it breaks things in the future, like adding plugin 
support to other filesystems.


And no one else cares what your finances are.  Not out of compassion, 
but out of practicality.  For instance, it would be a huge financial 
benefit to me if the kernel displayed, in big bold letters while 
booting, that DAVID MASOVER WROTE THIS!  (I'm sure Linus knows what I'm 
talking about.)  It would also be untrue in my case, and pointless for 
everyone else in the kernel, so I have to find another way to make money.


This is because one way Linux stays ahead of the competition 
(technologically) is by having quality be a much greater motivation than 
money.



Without
plugins our per file compression plugins and encryption plugins cannot
work.  We can however let other filesystems use our code, and cooperate
as they extend it and genericize it for their needs.  Imposing code on
other development teams is not how one best leads in open source, one
sets an example and sees if others copy it.  That is what I propose to
do with our plugins.  If no one copies, then we have harmed no one. 
Reasonable?


Someone still has to maintain the FS.  Anyway, like I said, this is a 
very compelling argument for me, but code speaks louder than words. 
If you insist it's not in the VFS, then maybe use some insanely simple 
FS like RomFS to demonstrate another FS using plugins?


Do that, and put it in the VFS.  Maybe implement something like cramfs 
as a romfs plugin (another demo).  Maybe even per-file -- implement 
zisofs as isofs + compression plugin.  I think that would effectively 
kill any argument that plugins are bad because they are only in Reiser4.


Beyond that is all marketing, I guess.  The word "plugin" is not helping 
here; too many people remember plugins like Macromedia Flash...


Re: metadata plugins (was Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion)

2006-07-28 Thread David Masover

Hans Reiser wrote:


plugins if not for us.  Our plugins affect no one else.  Our
self-contained code should not be delayed because other people delayed


And at the moment, I can still use Reiser4.  If I ever make a distro, I 
will include Reiser4 support, probably as the default FS.  That will 
help with getting into the kernel.


So, why is it that it's urgent to get into the kernel?  It will have to 
be bootstrapped one way or another -- either get it into the kernel so 
distros are more likely to include it, or get it into distros so the 
kernel is more likely to include it.


But this is exactly the kind of thing that has happened before.  With 
XFS, with Nvidia even -- clean it up, do it the way the kernel people 
want you to, because they're the ones who will have to maintain it for 
20 years, and make sure it doesn't stop working or break anything else.



advantage from leading.  If they want to some distant day implement
generic plugins, for which they have written not one line of code to
date, fine, we'll use it when it exists, but right now those who haven't
coded should get out of the way of people with working code.  It is not
fair or just to do otherwise.  It also prevents users from getting
advances they could be getting today, for no reason.


It prevents users from doing nothing.


Our code will not
be harder to change once it is in the kernel, it will be easier, because
there will be more staff funded to work on it.


If indeed it can be changed easily at all.  I think the burden is on you 
to prove that you can change it to be more generic, rather than saying 
"Well, we could do it later, if people want us to..."



As for this "we are all too grand to be bothered with money to feed our
families" business: building a system in which those who contribute can
find a way to be rewarded is what managers do.   Free software
programmers may be willing to live on less than others, but they cannot
live on nothing, and code that does not ever ship means living on nothing.


Let me put this in perspective the best way I know how, with an inane 
analogy:


Suppose there's a band.  A good band, full of impossible superstars, led 
by a benevolent dictator -- for the sake of argument, let's call him 
Elvis (the King -- dictator...).  The band's doing really well, and 
Elvis & crew are getting paid fairly well just to share their music.


(Ok, maybe Elvis didn't write anything, but bear with me...)

Now, along comes a young Jimi Hendrix.  He wants to be in the band, and 
Elvis says, "Sure, just come up with a song we like and we'll play it, 
and you can even play it with us!"  Sounds like a pretty good deal, so 
Jimi goes and tells all his friends, a couple of girls...


Now, Jimi finishes his song, Elvis listens to it, and if you know 
anything about the music Elvis did and the music Hendrix did, you can 
imagine what happens next.  Elvis says, "This song just isn't us.  But 
if you change it here, and here, and maybe here, we'll play it."


Jimi is devastated.  He'd been counting on playing it with them that 
night, and if he doesn't, he won't have any groupies, all his friends 
will laugh at him, and his life will kind of suck.


But, does anyone really think Elvis has any business singing "Voodoo 
Child"?  Or "Purple Haze"?  Is it really fair to ask Elvis to completely 
change his act and embarrass himself to help Jimi out?


The answer is, Jimi shouldn't have staked so much on something that was 
never a guarantee.  And what's more, the real-life Jimi Hendrix never 
played with Elvis, but had a very successful band of his own.  And if 
Elvis was still alive, seeing Jimi play might make him change his mind, 
maybe -- but at least with his own band, Jimi's success isn't pinned on 
playing with Elvis.


This analogy is flawed in many ways, aside from just being plain 
chronologically impossible, but while I'm sure Linus feels bad for you, 
I don't think it's his obligation to compromise his kernel to help you 
out with your financial situation.  So it would help a lot if you 
wouldn't keep bringing it up in what should be a technical discussion.




So, if you can't make it work with VFS, then I guess you can't, and 
you're stuck either creating another interface which is not tied to any 
one filesystem and isn't tied to the VFS either, or coming up with a 
better (more specific) idea of how to make the Reiser4 plugin system 
acceptable to kernel maintainers without having to eat Ramen for a few 
years.


Understand that I'm putting on my devil's advocate hat right now.  I'd 
love to see Reiser4 merged tomorrow, or a week from now, exactly as it's 
written today, but I just don't see it happening.  I'd also love to get 
more technical, but I just don't know the Reiser4 internals well enough 
to understand the feasibility (or not) of any of my vague ideas.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-27 Thread David Masover

Jeff Garzik wrote:

Pavel Machek wrote:

Hi!

of the story for me. There's nothing wrong about focusing on newer 
code,

but the old code needs to be cared for, too, to fix remaining issues 
such as the "can only have N files with the same hash value" limit, 
which requires a disk format change, in a filesystem without plugins, 
to fix.


A filesystem WITH plugins must still handle the standard Linux 
compatibility stuff that other filesystems handle.


Plugins --do not-- mean that you can just change the filesystem format 
willy-nilly, with zero impact.


They --do-- mean that you can change much of the filesystem behavior 
without requiring massive on-disk changes or massive interface changes.


After all, this is how many FUSE plugins work -- standard FS interface, 
usually uses another standard FS as storage, but does crazy things like 
compression, encryption, and other transformations in between.
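
The stacking idea is easy to sketch in user space: a standard read/write 
interface on top, an ordinary store underneath, and a transformation in 
between.  This toy (a sketch, not how any particular FUSE filesystem is 
implemented) uses a dict as the "lower filesystem" and zlib as the 
transformation:

```python
# A toy stacking layer: standard read/write on top, transparent
# zlib compression in between, a plain dict standing in for the
# underlying filesystem.
import zlib

class CompressedStore:
    def __init__(self):
        self._backing = {}  # stands in for the lower filesystem

    def write(self, name: str, data: bytes) -> None:
        self._backing[name] = zlib.compress(data)

    def read(self, name: str) -> bytes:
        return zlib.decompress(self._backing[name])

store = CompressedStore()
store.write("motd", b"hello " * 100)
print(store.read("motd") == b"hello " * 100)  # True: round-trips intact
print(len(store._backing["motd"]) < 600)      # True: smaller than raw
```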


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-27 Thread David Masover

Maciej Sołtysiak wrote:

Hello David,

Thursday, July 27, 2006, 3:19:15 AM, you wrote:

I'm not arguing for closed source, I'm just saying that once you open,
there's no going back.  Many times it's a good thing, but sometimes you

A sidenote.

Reiser4 is open and still we don't see people writing plugins like crazy.
I believe there is one group that tried to be the first outside of
Namesys to write a plugin, but still no success.


Kernel inclusion would help a lot.  Decent documentation would be 
better, though.  Someone should look at what FUSE is doing right. 
Plugins fill a lot of the same niches, but with significantly better 
performance.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-26 Thread David Masover

Matthias Andree wrote:

On Tue, 25 Jul 2006, David Masover wrote:


Matthias Andree wrote:

On Tue, 25 Jul 2006, Denis Vlasenko wrote:


I, on the contrary, want software to impose as few limits on me
as possible.

As long as it's choosing some limit, I'll pick the one with fewer
surprises.

Running out of inodes would be pretty surprising for me.


No offense: Then it was a surprise for you because you closed your eyes
and didn't look at df -i or didn't have monitors in place.


Or because my (hypothetical) business exploded before I had the chance.

After all, you could make the same argument about bandwidth, until you 
get Slashdotted.  Surprise!
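
To be fair, the monitoring Matthias means is cheap -- here's a sketch 
using os.statvfs (the 90% threshold is an arbitrary assumption), which 
reports the same numbers as df -i and could run from cron:

```python
# Warn before inode exhaustion: compute inode usage the same way
# "df -i" does, from the statvfs counters.
import os

def inode_usage_percent(path: str = "/") -> float:
    st = os.statvfs(path)
    if st.f_files == 0:  # filesystem doesn't expose inode counts
        return 0.0
    return 100.0 * (st.f_files - st.f_ffree) / st.f_files

usage = inode_usage_percent("/")
if usage >= 90.0:
    print("WARNING: root filesystem at %.1f%% inode usage" % usage)
```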



There is no way to ask how many files with particular hash values you
can still stuff into a reiserfs 3.X. There, you're running into a brick
wall that only your forehead will see when you touch it.


That's true, so you may be correct about fewer surprises.  So, it 
depends which is more valuable -- fewer surprises, or fewer limits?


That's not a rhetorical question, and I don't really know.  I can see 
both sides of this one.  But I do hope that once Reiser4 is stable 
enough for you, it will be predictable enough.
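
For context on that brick wall: v3 keys directory entries by a 32-bit 
name hash plus a small generation counter to disambiguate collisions, 
which is why only a limited number of same-hash names can coexist.  And 
collisions are routine at scale, by the birthday effect.  A sketch (the 
r5 function is reproduced from its commonly published form; the name 
set is synthetic):

```python
# Find two distinct filenames with the same r5 hash by brute force.
# With a 32-bit hash, ~half a million random names make a collision
# a statistical near-certainty.
import random

def r5(name: bytes) -> int:
    """reiserfs v3's default name hash, as commonly published."""
    a = 0
    for c in name:
        a += c << 4
        a += c >> 4
        a = (a * 11) & 0xFFFFFFFF
    return a

random.seed(1)
seen = {}
collision = None
for _ in range(500_000):
    name = ("f%07x" % random.getrandbits(28)).encode()
    h = r5(name)
    if h in seen and seen[h] != name:
        collision = (seen[h], name)
        break
    seen[h] = name

if collision:
    a, b = collision
    print(a, b, hex(r5(a)))  # two distinct names, one hash value
```

ext3's htree has an analogous collision case, but since ext3 doesn't 
key files by hash on disk, it doesn't hit a hard per-hash cap the same 
way.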



But the assertion that some backup was the cause for inode exhaustion on
ext? is not very plausible, since hard links do not take up inodes,
symlinks are not backups, and everything else requires disk blocks. So,

Ok, where's the assertion that symlinks are not backups?  Or not used in 
backup software?  What about directories full of hardlinks -- the dirs 
themselves must use something, right?


Anyway, it wasn't my project that hit this limit.


Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-25 Thread David Masover
Matthias Andree wrote:
 On Tue, 25 Jul 2006, Denis Vlasenko wrote:
 
 I, on the contrary, want software to impose as few limits on me
 as possible.
 
 As long as it's choosing some limit, I'll pick the one with fewer
 surprises.

Running out of inodes would be pretty surprising for me.

But then, I guess it's a good thing I don't admin for a living anymore.





Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-25 Thread David Masover
Russell Cattelan wrote:
 On Sun, 2006-07-23 at 01:20 -0600, Hans Reiser wrote:
 Jeff, I think that a large part of what is going on is that any patch
 that can be read in 15 minutes gets reviewed immediately, and any patch
 that is worked on for 5 years and then takes a week to read gets
[...]
 It is important that we embrace our diversity, and be happy for the
 strength it gives us.  Some of us are good at small patches that evolve,
 and some are good at escaping local optima.  We all have value; both
 trees and grass have their place in the world.

 Which is summed up quite well by:
 http://en.wikipedia.org/wiki/Color_of_the_bikeshed
 
 It seems to be a well-known tendency for people to want to
 be involved in some way, thus keeping too much of the development
 cycle internal tends to generate friction.

No, I think Hans is right.

Although I should mention, Hans, that there is a really good reason to
prefer the 15 minute patches.  The patches that take a week are much
harder to read during that week than any number of 15 minute incremental
patches, because the incremental patches are already broken down into
nice, small, readable, ordered chunks.  And since development follows
some sort of logical, orderly pattern, it can be much easier to read it
that way than to try to consider the whole.

Think of it this way -- why are debuggers useful?  One of the nicest
things about a debugger, especially for newbies, is the ability to step
through a program a line at a time.  It's the same principle -- you can
understand the program state at one point in time, and the impact of one
line of code, much more easily than the overall model of the program
state (and all of its edge cases), or the impact of several hundred
(thousand? tens of thousands?) lines of code.

So while I don't blame the Namesys team for putting off inclusion till
it's done, I also can't really blame the kernel guys for not wanting to
read it, especially if it's revolutionary.  Revolutionary ideas are hard
to grasp, and it's not their fault.

I mean, if revolutionary ideas were easy, why didn't you write Reiser4
for a system like, say, Tunes? (http://tunes.org/)  Say what you will,
but there are ways of doing fast filesystems which don't require that
said filesystems be written in kernel C.  Consider this:

http://www.cs.utah.edu/flux/oskit/

If I understand that right, it's a mechanism for writing kernel code in
languages like Java, Lisp, Scheme, or ML...

If we could all grasp every single good (best) idea from every corner of
software engineering, and write completely new software (including the
OS) using those ideas, we could potentially replace all existing
software in something like 3-5 years with software which has none of the
problems ours does now.  We'd never have inflexibility, insecurity,
instability, user interface issues...  Never have to worry about getting
software out the door (it'd be so fast to develop), but always have it
designed the right way the first time, yet be able to rearrange it
completely with only 5-10 line patches.

So it's not always the computer hardware that's the limitation.  Often
it's our hardware as well.  Human beings usually aren't equipped to be
able to grok the whole universe all at once.  If we were...  see above.





Re: Viewing files as directories

2006-07-25 Thread David Masover
Timothy Webster wrote:
 WARNING, a users point of view ;)
 Everything is a file, including a directory.
 
 Being able to view files as directories is not just a
 nice to have thing. It is actually required if we are
 going to manage changesets of odf files.

The lkml people will tell you that this isn't required at all, and it's
ludicrous to say so.  And they're somewhat right -- you could just patch
SVN, and it might be easier than writing a Reiser4 plugin.

 The truth is most people aren't code developers, but
 document developers. odf files are a container. And it
 is XML inside.

Come on, do you really expect people to read XML diffs?  Even if you
split the XML out into files/dirs based on elements, using SVN directly
would be way too arcane to people who are used to what word processors
already do -- it's something called Track Changes.

Fire up OpenOffice and check out the Edit-Changes menu.  Word has a
similar feature.  Not as powerful, maybe, but most people are not
collaborative document developers, either.

 But just about every program or script
 would be better off seeing the odf as a compressed
 directory.

Maybe, maybe not.
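The container claim itself is easy to verify, though: an OpenDocument file is
just a zip archive with XML members inside.  A minimal sketch, building a
stand-in .odt in memory rather than assuming a real document is on hand:

```python
import io
import zipfile

# Build a minimal stand-in for an .odt: OpenDocument is a zip archive
# whose members include content.xml (the document body).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    z.writestr("content.xml", "<office:document-content/>")

# Any script can treat the "file" as a compressed directory:
with zipfile.ZipFile(buf) as z:
    print(z.namelist())  # ['mimetype', 'content.xml']
```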

 Yes it would be really wonderful, if we could just see
 directories as file and files as directories. Which of
 course means a file and a directory are one in the
 same.

Ever use OS X?  It does this, to some extent, in Finder, which supports
the lkml point that doing this in the filesystem, or anywhere in the
kernel, is unnecessary and a bad idea.

 As things stand now the way forward seems to be per
 application program mime types. Simple right, but it
 is not because, applications tools like svn, brz,

There are two OS X file types that I know of, and probably quite a few
more, which are actually stored on disk as folders, which is why most
Mac software is distributed as disk images or zipfiles.  One is the
Application type (.app, though Finder hides the extension) and the other
is the MPKG type (whatever it stands for, extension is .mpkg).

Basically, they appear as ordinary files to Finder, which means that
most of the time, you cannot see that there are files inside them.  You
double click on a .app, and it runs a script in a predefined relative
location inside the folder.  Double click on a .mpkg, and it launches
their installer program.  Drag them around and they behave like files in
every way, except that you cannot email them, upload them to a web page,
or hand them to anything outside your local Mac that expects a single
file.  But when you run into that, just zip them.

But if you want, you can right-click on them (or control+click) and -- I
forget which option it is, but you can browse inside the package.
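To make the files-as-directories point concrete, here is a sketch of the
on-disk shape of such a bundle (the names are illustrative, not Apple's full
layout).  Finder presents the whole tree as one double-clickable icon, but
everything below Finder sees a plain directory:

```python
import os
import tempfile

# Minimal mock of a .app bundle's directory tree.
root = tempfile.mkdtemp()
app = os.path.join(root, "Demo.app")
exe_dir = os.path.join(app, "Contents", "MacOS")
os.makedirs(exe_dir)
with open(os.path.join(exe_dir, "Demo"), "w") as f:
    f.write("#!/bin/sh\necho hello\n")  # stand-in for the launched executable

print(os.path.isdir(app))                         # True -- a directory, not a file
print(os.listdir(os.path.join(app, "Contents")))  # ['MacOS']
```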



By the way, Hans, Apple has beaten you by quite a bit for at least some
of the functionality we've discussed.  You can do operations on Search
Folders easily, which work by using Spotlight (an indexed fulltext local
system search engine).  You can have files-as-directories, to a point.
There are generic ways of getting at metadata, and they are done as
plugins -- Spotlight plugins, anyway.

I'd much rather use the Reiser4 described in the whitepaper, of course,
and I am getting sick of the lack of decent package management for my
Mac, so I'll be adding a Linux boot.  I'm curious to see if Reiser4 is
stable on PowerPC -- this is a year-old G4, I missed the Intel cores by
just a few short months...





Re: ReiserFS v3 choking when free space falls below 10% - FIXED

2006-07-25 Thread David Masover
Mike Benoit wrote:

 Thanks for all your hard work, I'm sure many other MythTV users will
 appreciate it.

As a future MythTV user a bit late to this discussion, I'm curious --
was this Reiser3 or 4?  Are there any known MythTV issues with v4?  I
say this because the box with my capture card is running on a Reiser4
root right now...





Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-25 Thread David Masover
Horst H. von Brand wrote:

 18GiB = 18 million KiB, you do have a point there. But 40 million files on
 that, with some space to spare, just doesn't add up.

Right, ok...

Here's a quick check of my box.  I've explicitly stated which root-level
directories to search, to avoid nfs mounts, chrooted OSes, and virtual
filesystems like /proc and /sys.

elite ~ # find /bin/ /boot/ /dev/ /emul/ /etc/ /home /lib32 /lib64 /opt \
    /root /sbin /tmp /usr /var -type f -size 1 | wc -l
246127

According to the find manpage:

-size n[bckw]
  File uses n units of space.  The units are  512-byte  blocks  by
  default  or  if `b' follows n, bytes if `c' follows n, kilobytes
  if `k' follows n, or 2-byte words if `w' follows  n.   The  size
  does  not  count  indirect  blocks,  but it does count blocks in
  sparse files that are not actually allocated.


And I certainly didn't plan it that way.  And this is my desktop box,
and I'm just one user.  Most of the space is taken up by movies.

And yet, I have almost 250k files at the moment whose size is 512 bytes
or smaller.  And this is a normal usage pattern.  It's not hard to
imagine something prone to creating lots of tiny files, combined with
thousands of users, easily hitting some 40 mil files -- and since none
of them are movies, it could fit in 18 gigs.
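The same census translates directly into a script, without find's block-size
units getting in the way.  A rough sketch (the 512-byte cutoff mirrors what
-size 1 matches; the /etc example path is just an illustration):

```python
import os

def count_small_files(top, limit=512):
    """Count regular files of at most `limit` bytes under `top`."""
    n = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getsize(path) <= limit:
                    n += 1
            except OSError:
                pass  # unreadable or vanished mid-walk; skip it
    return n

# On most systems a surprising fraction of files fits in one 512-byte block:
print(count_small_files("/etc"))
```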

I mean, just for fun:

elite ~ # find /bin/ /boot/ /dev/ /emul/ /etc/ /home /lib32 /lib64 /opt \
    /root /sbin /tmp /usr /var | wc -l
866160

It may not be a good idea, but it's possible.  And one of the larger
reasons it's not a good idea is that most filesystems can't handle it.
Kind of like how BitTorrent is a very bad idea on dialup, but a very
good idea on broadband.





Re: the 'official' point of view expressed by kernelnewbies.org regarding reiser4 inclusion

2006-07-25 Thread David Masover
Hans Reiser wrote:

 to use as his default.  Now that we paid the 5 year development price
 tag to get everything as plugins, we can now upgrade in littler pieces
 than any other FS.  Hmm, I need a buzz phrase, it's not extreme
 programming, maybe moderate programming.  Does that sound exciting to

Hah!  No, it doesn't sound exciting.

Plugins don't work well either, not as a marketing concept.  People have
had so many bad experiences with plugins, and they're only ever visible
when you have a bad experience.  Think about it -- missing plugin (so
you have to download it),

On the other hand, it works for WordPress.  My day job is working on a
plugin for WordPress.  Not including a link because I feel dirty for
having to work with PHP...

Fluid programming?  If you build a solution from the bottom up with
gravel or large rocks, you leave gaps that are hard to fill without
ripping off the top layer and redoing it.  But if you can do fluid
programming, your program just flows around any obstacle, and into every
crack / between every space (metaphor for new customer requirements)...




