Re: Howto help hans?

2006-12-04 Thread Valdis . Kletnieks
On Mon, 04 Dec 2006 13:02:15 GMT, Danny Milosavljevic said:
 What does an attorney cost for, say, 100 hours over there?

Guesstimating $200/hour (minimum), you're looking at $20K and up.  Taking
a case to trial is going to take a lot more than 100 billable hours.






Re: Which version will be merged into mainline kernel?

2006-11-11 Thread Valdis . Kletnieks
On Sat, 11 Nov 2006 15:14:18 GMT, Danny Milosavljevic said:
 I've never understood this kind of attitude some MTAs have. Usually the
 hardware would make sure that stuff doesn't disappear (UPS, powered RAM,
 harddisk condenser) and not some weird software workaround that complicates
 and slows down everything.

I've never understood this kind of attitude some filesystems have. Usually
the hardware would make sure that stuff doesn't disappear, and not some
weird software workarounds like journalling or write barriers that
complicate and slow down everything.

Now as you were saying?





Re: reiser4 experimental patch

2006-11-10 Thread Valdis . Kletnieks
On Fri, 10 Nov 2006 10:59:30 -0200, Guilherme Covolo said:
 the difference between my and Johannes Hirte's patch is:
 *
 
 /fs/reiser4/super_ops.c
 
 290c290
  static int reiser4_statfs(struct dentry *dentry, struct kstatfs *statfs)
 ---
  static int reiser4_statfs(struct super_block *super, struct kstatfs *statfs

diff -c or diff -u please.  That way, if some unrelated thing moves the lines
up or down 1 or 2, it still applies.  Also, it's easier to look at a 'diff -u'
and understand what's going on, because you get to see 3-4 lines either side
of the changed lines.
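
To make the context argument concrete, here's a rough sketch using Python's
difflib as a stand-in for 'diff -u'.  The file contents are made up for the
example, and which prototype is "old" and which is "new" is an assumption.

```python
import difflib

# Hypothetical before/after contents, loosely modelled on the hunk quoted above.
old = [
    "/* statfs for reiser4 */\n",
    "static int reiser4_statfs(struct super_block *super, struct kstatfs *statfs)\n",
    "{\n",
    "\t/* body unchanged */\n",
    "}\n",
]
new = list(old)
new[1] = "static int reiser4_statfs(struct dentry *dentry, struct kstatfs *statfs)\n"

# n=3 asks for three lines of context either side, the same default as 'diff -u'.
patch = list(difflib.unified_diff(old, new,
                                  "a/fs/reiser4/super_ops.c",
                                  "b/fs/reiser4/super_ops.c", n=3))
print("".join(patch))
```

The unchanged lines around the edit travel with the patch, which is what lets
patch(1) find the right spot even after the line numbers have drifted.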

 i changed my super_ops.c but why did you alter the int to ssize_t in item.h?

ssize_t isn't an int on some architectures, it's a 'long'.  As a result if
you reference a 32 bit value where you should use 64, you'll certainly
end up with something unexpected (probably an oops).
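
A quick way to see the size mismatch from userspace is a small ctypes sketch.
This only reports what the C ABI of the machine running it looks like, and the
value 'big' is an arbitrary number picked for the illustration.

```python
import ctypes

int_bytes = ctypes.sizeof(ctypes.c_int)
ssize_bytes = ctypes.sizeof(ctypes.c_ssize_t)
print(f"int: {int_bytes} bytes, ssize_t: {ssize_bytes} bytes")

# A byte count that needs more than 32 bits to represent.
big = 2**32 + 5

# ctypes does no overflow checking, so squeezing the value through a
# C int silently keeps only the low-order bits, as a careless C cast would.
truncated = ctypes.c_int(big).value
print(big, "->", truncated)
```

On a typical 64-bit Linux box the two sizes differ, and the truncated value
bears no resemblance to the one that went in.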




Re: reiser4 experimental patch

2006-11-09 Thread Valdis . Kletnieks
On Thu, 09 Nov 2006 17:23:20 -0200, Guilherme Covolo said:
 hello guys,
 
 my experimental patch need modfications on fs/reiser4/context.c
 
 i need help ;)

You'll have to give us more info than that.  What happened?

Patch reject? It didn't compile? It didn't modprobe? The resulting kernel
didn't boot? The resulting kernel oopsed? Other? 





Re: Reiser FS will not boot after crash

2006-09-04 Thread Valdis . Kletnieks
On Mon, 04 Sep 2006 23:33:27 +0400, Vladimir V. Saveliev said:

 after unclean shutdown journal replay is necessary to return reiserfs to
 consistent state. Maybe GRUB did not do that?

A case can be made that GRUB should be keeping its grubby little paws off
the filesystem journal.  It's a *bootloader*.  Its only purpose in life is
to load other code that can make intelligent decisions about things like
how (or even whether) to replay a filesystem journal.




Re: [PATCH] reiserfs: fix handling of device names with /'s in them

2006-07-17 Thread Valdis . Kletnieks
On Sun, 16 Jul 2006 20:02:27 PDT, Hans Reiser said:

 Create a mountpoint which knows how to resolve a/b without using a
 directory.

And said mountpoint gets past the '/' interpretation in the VFS, how, exactly?

fs/namei.c, do_path_lookup() does magic on a '/' on about the 3rd line.
So you're going to get handed 'a'.
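
As a very simplified model (this is not the kernel's code, and 'lookup' is a
hypothetical per-filesystem callback), a component-at-a-time walk looks
roughly like this:

```python
def walk(path, lookup):
    """Resolve 'path' one component at a time, the way a VFS-style walker would.

    'lookup(parent, name)' is a hypothetical per-filesystem callback; it is
    only ever handed a single component, never a name containing '/'.
    """
    node = "<root>"
    for name in path.split("/"):
        if not name:          # skip empty components from leading or doubled '/'
            continue
        node = lookup(node, name)
    return node

seen = []

def recording_lookup(parent, name):
    seen.append(name)
    return f"{parent}/{name}"

walk("a/b", recording_lookup)
print(seen)
```

The callback sees 'a' and then 'b' as separate requests, so by the time the
filesystem is asked anything, the '/' has already been consumed.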




Re: [PATCH] reiserfs: fix handling of device names with /'s in them

2006-07-17 Thread Valdis . Kletnieks
On Mon, 17 Jul 2006 11:27:20 PDT, Hans Reiser said:
 [EMAIL PROTECTED] wrote:
 On Sun, 16 Jul 2006 20:02:27 PDT, Hans Reiser said:

 Create a mountpoint which knows how to resolve a/b without using a
 directory.

For wanting to resolve it *without* using a directory...

 And said mountpoint gets past the '/' interpretation in the VFS, how, 
 exactly?
 
 fs/namei.c, do_path_lookup() does magic on a '/' on about the 3rd line.
 So you're going to get handed 'a'.

 It does not need to be so complex actually,  Just create a plain old
 parent directory just like every other parent directory in procfs.

This smells a lot like using a directory to resolve it.




Re: any way to disable fsync?

2006-07-11 Thread Valdis . Kletnieks
On Tue, 11 Jul 2006 23:03:12 +0200, Łukasz Mierzwa said:
 I've got a problem with apps that call fsync; it makes my hard drive
 flush like mad and it slows things down quite a lot.

Several have posted how to bypass it.  I'll pose the opposite side:

Usually, applications call fsync() because they're pretty sure that if
the disk and in-memory copies aren't lined up, a crash at that point could
result in data loss and/or corruption.

So sqlite calls fsync() - probably because if it *doesn't*, and your
system crashes/reboots, you *will* lose that sqlite database.

Your data, your decision.
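
For reference, this is roughly the pattern such an application follows.  It's
a minimal sketch, not what sqlite itself does internally, and the file name is
made up.

```python
import os
import tempfile

def durable_write(path, data):
    """Write 'data' to 'path' and do not return until the kernel reports
    it has been pushed to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's userspace buffer into the page cache
        os.fsync(f.fileno())   # ask the kernel to flush the page cache to the device

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "record.db")   # hypothetical file name
    durable_write(target, b"committed row\n")
    with open(target, "rb") as f:
        stored = f.read()
```

Turn the os.fsync() into a no-op and the function returns sooner, but the
"committed" row may only exist in RAM when the power goes out.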




Re: any way to disable fsync?

2006-07-11 Thread Valdis . Kletnieks
On Tue, 11 Jul 2006 17:04:56 PDT, Hans Reiser said:
 There are legitimate applications where the value of data is low enough
 and the load is high enough, that losing the database upon crash is ok.

 I have mixed feelings about making it a mount option for reiser4 because
 many users will not know what they do.  In the end though, I should just
 sell the rope and advise but not control what people do with it.   If
 someone writes it I will take a mount option patch to disable fsync iff
 it comes with documentation that has a lot of warnings.

Two things to consider before writing code:

1) Should it be done at the VFS level instead of in the filesystem?
Architecturally, it might be better there, so it applies to ext3 and jfs
and others too... 

2) Alternatively, should it be done on a per-file basis (possibly
flagged with a chattr or similar)?  It can't be done as an open()
flag or ioctl(), because you're trying to override what the code does...
That way, you can mitigate any fsync() load caused by one file, and
still not leave yourself open to being screwed by some other application
that tries to fsync() in other directories on that filesystem.  It
would be Really Bad if /home/fred/db23.sqlite gets corrupted because
the filesystem was mounted -nofsync because of /home/george/moby.sqlite
overhead.




Re: Re: alman birası oettinger türkiyede

2006-05-15 Thread Valdis . Kletnieks
On Mon, 15 May 2006 13:50:49 +0200, Lars Grobe said:
 Ok, this is the first time of my life that I was really pleased by what I
 read in a spam mail. As a German living in Istanbul, I will translate: The
 cheapest beer available in Germany will be sold in Turkey now, too :-)

If it weren't for the fact that even *cheap* German beer beats most US beer,
I'd say you got the wrong criterion for being overjoyed... ;)




Re: bad bread

2006-05-09 Thread Valdis . Kletnieks
On Tue, 09 May 2006 00:18:32 +0200, PFC said:

   Linux RAID has a special option for that: you can trigger a check, which
 will re-read the entire disks and, if a read error occurs, re-write the
 failing sector with good data from the other drives in the RAID. The drive
 with the bad sector will then remap it to another sector.

If you have 2 mirrored disks, and are replacing one, you don't have a good
block to read it from.  The failure mode was a RAID controller that didn't
properly handle re-writing the bad block on the first disk, so when the
second disk got a bad block, you were screwed.





Re: bad bread

2006-05-08 Thread Valdis . Kletnieks
On Sun, 07 May 2006 10:35:44 +0200, PFC said:
 
  In the event of physical HD failure, the procedure goes like this:
 
   Get mail saying a HDD is dead. Replace harddisk, resynchronize RAID.
   Use Linux software RAID. Harddrives are cheaper than the time you'll lose
 trying to recover your data.

Remember to take backups *anyhow*.  That way, if the RAID controller dumps
cow manure on all the sectors, you won't be saying "Oh, SH*T".

Also, note that there exist buggy RAID controllers, where if you are doing
mirroring to 2 disks, and they develop bad blocks at different locations,
you can trash the mirror by resynchronizing (basically, you swap out one of
the bad disks, re-sync, it progresses as far as the bad block on the source
for the mirror, and dies).
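
A toy model of that failure mode, with each disk as a Python list and None
standing for an unreadable block.  The function names are invented for the
sketch and don't correspond to any real controller's firmware.

```python
class ReadError(Exception):
    pass

def read_block(disk, i):
    # None models an unreadable (bad) block on the toy disk.
    if disk[i] is None:
        raise ReadError(f"bad block {i}")
    return disk[i]

def naive_resync(source, nblocks):
    """Copy the whole source to a fresh disk; gives up at the first read error."""
    return [read_block(source, i) for i in range(nblocks)]

def merging_resync(a, b, nblocks):
    """Per block, take whichever mirror half can still be read."""
    out = []
    for i in range(nblocks):
        try:
            out.append(read_block(a, i))
        except ReadError:
            out.append(read_block(b, i))
    return out

disk_a = ["d0", "d1", None, "d3"]   # bad block at 2
disk_b = ["d0", None, "d2", "d3"]   # bad block at 1

try:
    naive_resync(disk_a, 4)
    naive_ok = True
except ReadError:
    naive_ok = False

merged = merging_resync(disk_a, disk_b, 4)
```

Between them the two halves still hold every block, but once disk_b has been
pulled and the naive resync from disk_a dies at block 2, that copy is gone.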





Re: Transparent Compression

2006-05-05 Thread Valdis . Kletnieks
On Fri, 05 May 2006 10:37:40 +0200, Jonathan Carter said:
 I've read about ReiserFS's built-in compression, and I've been excited
 about it for a long time, but haven't figured out how to activate it
 yet. I've googled and looked on the reiserfs website, but couldn't find
 any information on how to do it.
 
 Can anyone please tell me how, or point me to the appropriate
 documentation?

Step 1: Wait till the code is actually released.

I don't think Reiser-with-compression is a configuration that is
buildable by mere mortals at the current time.





Re: Reiser4 crash 2.6.16-mm1

2006-03-27 Thread Valdis . Kletnieks
On Mon, 27 Mar 2006 14:32:14 PST, Joe Feise said:

 Thanks for the suggestion. I haven't run a memtest, but I don't really think 
 that the memory is bad. The machine most likely would have had other issues 
 if that was the case.

You'd be *amazed*.  Intermittently weak memory (especially if it's just one bad
bit) can manifest in the most odd ways.

In fact, if you think about it, if it's bad memory, your trashed reiser4
partition could very well *be* the "would have had other issues" that you
said you'd see if it was bad memory. ;)




iosched (was Re: Full of surprises - A reiser4 story from userland)

2005-09-28 Thread Valdis . Kletnieks
On Wed, 28 Sep 2005 22:13:52 +0300, Islam Amer said:

 BTW, Previously I had amazing performance with anticipatory
 IO-scheduler ( even more so with genetic anticipatory ) any comments
 on this io-scheduler business, as it stirred up some commotion before.
 Is the performance boost an illusion or is it not.

The performance boost for any of the provided iosched schemes can be
positive, negative, imaginary, or complex(*), depending on the actual workload
of the system, and what reference patterns it generates.

There are 4 in-tree schedulers precisely because each of them has a clear-cut
advantage for some statistic (be it throughput, or latency, or CPU overhead, or
whatever) for some identified workload type.

(*) I suspect that (benchmarks being benchmarks) the chance that the boost
be totally real, with no imaginary component, is very slim.  And everybody
knows that most benchmark results are complex to interpret.. :)
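
If you want to see which scheduler a disk is actually using before arguing
about benchmarks, the kernel lists them in /sys/block/<dev>/queue/scheduler
with the active one in square brackets.  A small parser sketch; the sample
line is illustrative, not captured from a real machine.

```python
def parse_scheduler(line):
    """Split a line in the style of /sys/block/<dev>/queue/scheduler into
    (available, active); the active entry is the one in square brackets."""
    available, active = [], None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active

available, active = parse_scheduler("noop anticipatory deadline [cfq]")
print(available, active)
```

Writing one of the listed names back to the same file switches schedulers for
that device, which makes before/after runs of your own workload easy to do.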




Re: Will I need to re-format my partition for using the compression plugin?

2005-09-23 Thread Valdis . Kletnieks
On Thu, 22 Sep 2005 18:13:23 EDT, Gregory Maxwell said:

 It would normally seem silly to use RSA for disk encryption... but
 there might be applications, although you'd still never use RSA
 directly on user controlled data.  For example, RSA could be used on a
 multi user server to append mail to a mail file so that once written
 the data is only accessible once the user logs on.  The reiser4 crypto
 system will use the kernel keyring api, so it would be quite
 reasonable to tie encryption to user accounts. 'write only' files and
 'read only' files would be a simple logical extension, and would
 require asymmetric cryptography.

In fact, RSA would *still* be a poor choice there - the CPU costs go up
exponentially with the size of the object encrypted.  And if you have a 64K
file, that means if you use RSA directly, you get to do mathematics with
524,288 bit numbers.  Yep, multiply a 524,288 bit number by a 1024 bit number
and then compute the remainder when divided by another 1024 bit number. Lather,
rinse, repeat. ;)

You know how sites that do a lot of SSL buy special hardware accelerators?
The only *real* benefit they give you is offloading the CPU cost of doing
RSA over a 128-bit or so session key.

OK. Got that?  Doing RSA over a 16 byte file costs as much as opening a
standard 128-bit encryption SSL connection (because it's basically the same
thing).  And a 17 byte file costs you a lot more than 8 times as much.  And a
32 byte file isn't 16 times worse, it's *hundreds* of times worse.

That's why *nobody* uses RSA for anything other than securing a good-sized
symmetric session key.  So for this use, you'd use RSA to secure the file's
actual symmetric key (and possibly things like the initialization vectors).
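
The shape of that hybrid scheme, as a toy sketch only: textbook RSA with no
padding, a key built from two well-known Mersenne primes, and a SHA-256
counter keystream standing in for a real symmetric cipher.  None of it is fit
for real use and all the names are invented; the point is just that RSA
touches 16 bytes no matter how big the file is.

```python
import hashlib
import os

# Toy RSA key built from two Mersenne primes. Far too small and too
# structured for real use; it only has to be bigger than a 128-bit key.
P, Q = 2**127 - 1, 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def keystream_xor(key, data):
    """Stand-in symmetric cipher: XOR with a SHA-256 counter keystream.
    NOT a vetted cipher, and with no IV it must never reuse a key."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(c ^ p for c, p in zip(chunk, pad))
    return bytes(out)

def seal(data):
    session_key = os.urandom(16)                              # 128-bit symmetric key
    wrapped = pow(int.from_bytes(session_key, "big"), E, N)   # the only RSA step
    return wrapped, keystream_xor(session_key, data)

def unseal(wrapped, body):
    session_key = pow(wrapped, D, N).to_bytes(16, "big")
    return keystream_xor(session_key, body)

message = b"x" * 65536       # a 64K file
wrapped, body = seal(message)
restored = unseal(wrapped, body)
```

Whoever holds only the public half (N, E) can seal data but not read it back,
which is the 'write only' mail-spool case from the quoted text.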

(Note to designers - those pesky IV's are a *lot* trickier to get right
than you might think.  For instance, there's a known watermarking attack
against the current cryptoloop implementation in the kernel that allows
an attacker to prove the existence of data on the disk even without the
key - so a DRM scheme could find watermarked data even *after* encryption).

 Although for most compression algorithms not all inputs are valid
 outputs, so this may not work for you... It would be ideal (for disk
 encryption) if it were not possible to tell if you have the right key
 without decrypting an entire sector. This requires careful selection
 in compression and chaining mode.

In fact, Hamming distance considerations imply that usually you don't
need to decrypt more than 1 or 2 (*maybe* 3) blocks the size of the
symmetric cipher's blocksize.  For something like AES-256, you can probably
be sure in 32 bytes (1 block), very sure in 64 bytes, and totally sure in 128
bytes (unless the attacker has the misfortune to be trying to decrypt a file
that has actual structure on the same order as /dev/random output).

   Alternatively, it may be possible
 to develop a good large block cipher which while being much slower
 than a single block of a small-block cipher, is faster for a disk
 block.  For example, mercy is about 4x faster than AES on my system
 but is still 16x slower for the smallest unit of decryption than AES.
 Unfortunately mercy has security problems.

Tough design challenge there.

The problem is that if you have a cipher that can handle 512-*byte* input
blocks, it's going to probably stomp on a *lot* of L1 and L2 cache lines.
And you can't even rely on the usual pre-expansion tricks because that adds
even *more* to cache pressure.

Another desirable property of symmetric ciphers is that they tend to change
about half the output bits for a single-bit input change, and in an
unpredictable manner.  This ends up meaning that you'll probably need
O(log2 N) rounds, and more likely closer to O(N) rounds, to mix the pool.
Gonna be a *lot* of rounds for a 512-byte block. ;)
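
The "about half the bits" property is easy to eyeball.  Python's standard
library has no block cipher, so this sketch uses SHA-256 as a stand-in mixing
function; that substitution is an assumption of the example, not a claim
about any particular cipher.

```python
import hashlib

def bits_changed(a, b):
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

block = bytearray(512)          # a 512-byte all-zero "sector"
before = hashlib.sha256(block).digest()
block[0] ^= 0x01                # flip a single input bit
after = hashlib.sha256(block).digest()

changed = bits_changed(before, after)
print(f"{changed} of {len(before) * 8} output bits changed")
```

A large-block cipher has to get the same effect across all 4096 output bits
of the sector, which is where the round count comes from.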

  2) Even though most modern block ciphers are designed to be fast, it's still
  faster to apply a reasonably quick compression scheme to whomp 16K of data
  down to 5-6K and encrypt/decrypt 5-6k than it is to encrypt/decrypt 16K.
 
 Depends on the compression mode and the cipher. A good AES
 implementation is around the same speed as an aggressive gzip. In
 general this is correct.

That's why you don't use an *aggressive* gzip, but use 'gzip -3' instead. :)





Re: Will I need to re-format my partition for using the compression plugin?

2005-09-22 Thread Valdis . Kletnieks
On Fri, 23 Sep 2005 00:03:32 +0400, Edward Shishkin said:

 Checksuming means a low
 performance: in order to read some bytes of such file you will need 
 first to read the whole file
 to check a checksum (isnt it?).

No.  Almost all modern networking gear is *perfectly* able to do incremental
updates of the checksum.  See this RFC:

1141 Incremental updating of the Internet checksum. T. Mallory, A.
 Kullberg. Jan-01-1990. (Format: TXT=3587 bytes) (Updates RFC1071)
 (Updated by RFC1624) (Status: INFORMATIONAL)
http://www.ietf.org/rfc/rfc1141.txt

The method is trivially extensible to other CRC schemes - and in fact, the
triviality is the entire reason why cryptographically strong hashes like MD5 or
the SHA family are interesting at all.  (I've seen more than one definition of
"cryptographically strong hash" as being basically "a CRC function that does
*not* permit incremental updating".)
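
A small sketch of the incremental update for the 16-bit Internet checksum,
using the corrected formula from RFC 1624 (which updates the RFC above).  The
sample words are a made-up IPv4-style header with the checksum field zeroed,
and the change is a TTL decrement.

```python
def fold(s):
    """Fold carries back in (end-around carry) until the sum fits in 16 bits."""
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return s

def checksum(words):
    """Internet checksum over 16-bit words: one's complement of the
    one's complement sum."""
    return ~fold(sum(words)) & 0xFFFF

def update(old_cksum, old_word, new_word):
    """Recompute the checksum after one word changes, without touching the
    rest of the data (RFC 1624, equation 3: HC' = ~(~HC + ~m + m'))."""
    s = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(s) & 0xFFFF

header = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
          0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
original = checksum(header)

changed = list(header)
changed[4] = 0x3F11                      # TTL 0x40 -> 0x3F, protocol unchanged
incremental = update(original, header[4], changed[4])
full = checksum(changed)
```

The incremental path looks at three 16-bit values, however long the data is;
the full recomputation walks every word.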





Re: Will I need to re-format my partition for using the compression plugin?

2005-09-22 Thread Valdis . Kletnieks
On Thu, 22 Sep 2005 15:11:59 CDT, David Masover said:

  Because sometimes it is useful to compress data before encryption since
  compression destroys vulnerable regular structure of some special files
  (like *.html)
 
 Although I'd imagine some algorithms are fairly resistant against that 
 (RSA, maybe?), the main reason is simple -- encryption tends to 
 introduce randomness.  If the crypto is any good at all, you won't be 
 able to compress very well after you've encrypted.

1) RSA is useless for this - you really need a symmetric block cipher of some
sort.  Almost all block ciphers are best used with maximum-entropy input - if
the attacker can lop out a large part of the keyspace, a brute force attack
becomes a lot easier.  This is somewhat related to the concept of Hamming
Distance. If the attacker tries a brute force attack, and the first 8 bytes of
the output look like valid HTML, or English text, or anything else
recognizable, he's almost certainly found the correct key.  On the other
hand, well-compressed data has very high entropy - as a result, it becomes
harder to tell if a correct key has been found.  If it's English text, but
3 of the first 8 bytes have the high bit set, it's probably not a correct key.
If it's compressed, 3 flipped bits in the first 8 bytes will probably still
represent a valid compressed stream - just of something else wildly different.

2) Even though most modern block ciphers are designed to be fast, it's still
faster to apply a reasonably quick compression scheme to whomp 16K of data
down to 5-6K and encrypt/decrypt 5-6k than it is to encrypt/decrypt 16K.
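
For point 1, a rough way to see the entropy difference is to measure the
byte-value distribution before and after compression.  The "text" here is a
deterministic stand-in built from a small made-up vocabulary, not real
English, so treat the numbers as indicative only.

```python
import math
import random
import zlib
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

# Deterministic stand-in for English text: words drawn from a small vocabulary.
rng = random.Random(0)
vocab = ["the", "file", "system", "journal", "block", "key", "disk", "write",
         "read", "cache", "kernel", "patch", "mount", "crash", "data", "tree"]
text = " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")
packed = zlib.compress(text)

plain_entropy = entropy_bits_per_byte(text)
packed_entropy = entropy_bits_per_byte(packed)
high_bit_plain = sum(b >> 7 for b in text)   # bytes with the top bit set

print(f"plain:  {plain_entropy:.2f} bits/byte, {high_bit_plain} high-bit bytes")
print(f"packed: {packed_entropy:.2f} bits/byte")
```

The plain text never sets the high bit, which is exactly the kind of cheap
test an attacker can apply to a trial decryption; the compressed stream gives
him nothing that easy to check.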





Re: Will I need to re-format my partition for using the compression plugin?

2005-09-22 Thread Valdis . Kletnieks
On Thu, 22 Sep 2005 16:54:12 EDT, michael chang said:
 On 9/22/05, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  2) Even though most modern block ciphers are designed to be fast, it's still
  faster to apply a reasonably quick compression scheme to whomp 16K of data
  down to 5-6K and encrypt/decrypt 5-6k than it is to encrypt/decrypt 16K.
 
 Two questions.  One, does this mean that compression will usually be
 performed before encryption (which to me, sounds like it appears to be
 what would be the best method here)?

Yes, the general rule is "compress, then encrypt" and "decrypt, then decompress".

The corner cases where it should be the other way around are so few and far
between that it's not worth worrying about - they basically center around those
few times when compression causes a *larger* payload, and can be dealt with by
a simple "don't compress if output is bigger than input" rule.
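
That rule fits in a few lines.  A sketch using zlib, with a made-up one-byte
tag to record which path was taken; the encryption step would wrap the output
of pack() and be undone before unpack().

```python
import os
import zlib

RAW, DEFLATED = b"\x00", b"\x01"   # hypothetical one-byte format tags

def pack(data, level=3):
    """Compress, but fall back to storing the input as-is when
    compression would not make it smaller."""
    squeezed = zlib.compress(data, level)
    if len(squeezed) < len(data):
        return DEFLATED + squeezed
    return RAW + data

def unpack(blob):
    tag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if tag == DEFLATED else payload

text = b"abc" * 1000          # compresses well
noise = os.urandom(3000)      # effectively incompressible

packed_text = pack(text)
packed_noise = pack(noise)
```

The worst case is then one byte of overhead, rather than whatever the
compressor adds when it's handed data it can't shrink.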




Re: I request inclusion of reiser4 in the mainline kernel

2005-09-20 Thread Valdis . Kletnieks
On Tue, 20 Sep 2005 23:28:12 +0400, Roman I Khimov said:

 Maybe I'm doing something wrong here, but ext2 have failed on second check
 of first pass with
 
 Second check...
 e2fsck 1.34 (25-Jul-2003)
 Pass 1: Checking inodes, blocks, and sizes
 Pass 2: Checking directory structure

 fsck.damaged: * FILE SYSTEM WAS MODIFIED *
 fsck.damaged: 1345/25064 files (1.7% non-contiguous), 94063/10 blocks
 fsck lied about its success (result = 1)

What was the return value and output from the *first* fsck? 




Re: I request inclusion of reiser4 in the mainline kernel

2005-09-20 Thread Valdis . Kletnieks
On Tue, 20 Sep 2005 17:17:13 EDT, Theodore Ts'o said:

 An exit code of 1 means that filesystem errors were corrected
 (successfully).  

Right.  The problem is that this was a *second* check, after the first one
terminated with exit code 0, 1, or 2.  Thus, it *should* have exited with 0.

The *first* check lied - if there were unfixed errors, it should have exited
with exit 4.
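
For anybody scripting around this: the exit status is a bit mask, as
documented in fsck(8).  A sketch of decoding it and of the "second pass
should be clean" check; the function names are made up for the example.

```python
FSCK_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

def describe(status):
    """Turn an fsck exit status into a list of human-readable conditions."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_BITS.items() if status & bit]

def second_pass_is_suspicious(first, second):
    """After a first pass that claimed nothing was left unfixed (bit 4 clear),
    an immediate second pass should come back as 0."""
    return not (first & 4) and second != 0

print(describe(1), second_pass_is_suspicious(1, 1))
```

By that test, a second run that exits 1 right after a first run that exited
0, 1 or 2 means the first run's report can't be taken at face value.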




Re: I request inclusion of reiser4 in the mainline kernel

2005-09-18 Thread Valdis . Kletnieks
On Sun, 18 Sep 2005 13:22:27 EDT, michael chang said:

 Give Hans a chance; and please try to understand, even if he's hard to
 work with.  Discriminate him because he's not a developer you can talk
 with, and I believe that's like discriminating a guy in a wheelchair
 because he can't run with you when you jog in the morning.

There's nothing wrong with discriminating against the guy in the wheelchair
under some circumstances - for instance, when your track team needs a new
high jumper.

Similarly, when the goal is to build a set of developers that can actually
get work accomplished, poor interpersonal communication skills can be a
major problem.

If the problem is that Hans and the rest of the kernel developers don't get
along, perhaps the most expedient thing would be for Hans to step out of the
way and have somebody else from Namesys (or elsewhere even) act as the
interface.




Re: I request inclusion of reiser4 in the mainline kernel

2005-09-18 Thread Valdis . Kletnieks
On Sun, 18 Sep 2005 22:16:11 PDT, Hans Reiser said:

 Hellwig, people who write slow file systems should not lecture their
 measurably superiors on how to code.  Oh, and I should mention that
 other people besides me have measured reiser4, and concluded it is twice
 the speed of the other Linux filesystems, so don't go claiming it is
 just my benchmarks.   What you are doing is keeping me from doing a real
 code review myself by keeping my guys so busy that they don't have time
 to review the fixmes I inserted and would insert more of if I thought
 they had time for them.

Hans, unfortunately the most obvious reading of the above is "Reiser4 is so
damned fast because it doesn't bother doing sanity-checking".  If there's still
more fixmes to be inserted that *you* know of, and there are so many that
there's no time to fix them, why is this being submitted for inclusion?

On Sun, 18 Sep 2005 22:09:08 PDT, Hans Reiser said:
 Of course, the reiser4 code is not as stable as it was before the
 changes Christoph asked for.

This sort of claim requires proof - can you point at *specific* things that
were less stable after you fixed the code, including explaining why they're
less stable?




Re: we have got hash function screwed up

2005-09-11 Thread Valdis . Kletnieks
On Mon, 12 Sep 2005 00:49:54 +0200, evilninja said:
 [EMAIL PROTECTED] schrieb:
  Yes, I know there's a need to support borked legacy filesystems that were
  mkfs'ed before the problem was recognized.  That means fsck.reiserfs needs
  to know about it - but mkfs.reiserfs??  Seen in the Fedora Core devel tree
  as of tonight:
 
 yeah, this feature of mkfs.reiserfs could be removed, but since reiserfs
 is in bugfixes-only-mode i don't see it happen.

I'd consider a #if 0/#endif to remove known busticated code a bugfix. ;)




Re: we have got hash function screwed up

2005-09-10 Thread Valdis . Kletnieks
On Sat, 10 Sep 2005 17:36:49 +0200, evilninja said:
 Gabor HALASZ schrieb:
  Sep  5 12:30:24 sk8n kernel: ReiserFS: dm-10: checking transaction log (dm-10)
  Sep  5 12:30:24 sk8n kernel: ReiserFS: dm-10: Using rupasov hash to sort names
 
 why did you choose the rupasov hash?
 http://www.namesys.com/mount-options.html knows:
 
 rupasov: [...] Never use it, as it has high probability of hash
  collisions.

Why is it selectable then?

Yes, I know there's a need to support borked legacy filesystems that were mkfs'ed
before the problem was recognized.  That means fsck.reiserfs needs to know about
it - but mkfs.reiserfs??  Seen in the Fedora Core devel tree as of tonight:

% rpm -q reiserfs-utils
reiserfs-utils-3.6.19-2
% strings /sbin/mkfs.reiserfs  | grep -i rupasov
rupasov
  -h | --hash rupasov|tea|r5   hash function to use by default





Re: Reiser4 and ACLs

2005-08-14 Thread Valdis . Kletnieks
On Sun, 14 Aug 2005 05:38:40 PDT, Marc Perkel said:
 btw - is Reiser4 still going to get merged into 2.6.13?

It's not in 2.6.13-rc6, and I doubt Linus is going to blop *that* big
a chunk of code in this late - it's already well into the "is this 3-liner
too drastic" phase.

What happens when the 2.6.14 tree opens is up to Linus and Andrew.





Re: reiser4 on 2.6.13-rc6-realtime-preempt

2005-08-12 Thread Valdis . Kletnieks
On Fri, 12 Aug 2005 12:09:03 +0200, gimpel said:

 reiser4 again. Maybe the is to wait for stable 2.6.13 before doing
 tests with realtime-preempt as it gets updated twice a day.
 And i so much hope the kernel guys decide to merge reiser4.

Well, reiser4 can't possibly make it into 2.6.13, as we're at -rc6 already
and Linus asked for a quiet-down several -rc's ago.  What happens when the
tree opens for 2.6.14 is a different question that I can't answer.





Re: reiser4 performance

2005-08-09 Thread Valdis . Kletnieks
On Tue, 09 Aug 2005 13:52:49 EDT, michael chang said:

 Striped RAID only works if you have multiple disks and a decent bus. 
 I'm stuck on the lowest-end Dell Dimension 3000, with one of the
 slowest hard drives in history.  And I haven't gotten around to
 opening the case... yet.

Newbie. ;)

IBM 2314 disk drive for the S/360, late 60s. 10 14-inch platters, 3600RPM, 29M of
storage capacity, 650Kbytes/second transfer rate.  And that was a fast
mainframe drive for its day.

Now what was this about slow tiny drives? ;)

And if you think a seek hurts latency on modern disk drives, you should have
seen what an end-to-end seek did on a filesystem on a DECTape (yes, the tape
had addressable blocks, you could (and many people did) put a filesystem on it).




Re: Reiser4 + seekdir()

2005-06-29 Thread Valdis . Kletnieks
On Wed, 29 Jun 2005 14:22:05 +0400, Vladimir Saveliev said:
 
 Existence of various plugins assumes that user is able to choose
 whatever is suitable for him. Or create his own plugin if none of
 existing ones satisfies him.
 If user cares a lot about using telldir/seekdir he is supposed to choose
 SEEKABLE_HASHED_DIR_PLUGIN_ID.

Is that the user, or the person building the kernel?





Re: reiser4 plugins

2005-06-29 Thread Valdis . Kletnieks
On Wed, 29 Jun 2005 16:58:20 +0300, Markus Törnqvist said:
 What pisses me off is the fact that Gnome and friends implement
 their own incompatible-with-others VFS's and automounters and
 stuff.

The fact that things like Gnome, which are basically consumers of their own
dogfood, have incompatible versions says very loudly that there's no consensus
on the semantics.

 Surely supporting this in the kernel and extending the LSB
 to require this is the best step to take without infringing
 anyone's freedom as such.

First we need to decide *if* it's to be supported, then *what* to support.




Re: reiser4 plugins

2005-06-28 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 13:25:14 CDT, David Masover said:

 I was just trying to avoid the "people will never adopt a new archive
 format" argument by pointing out that a similar archive format was
 recently created and adopted.

Out of curiosity, adopted by popular acclaim, or because an 800 pound gorilla
said "This is the format we're shipping XYZ in, learn to deal with it"?
(I've seen both happen multiple times in the last quarter century, on many
different operating systems)

(For that matter, all of my production boxes are backed up by either Tivoli or
Legato, and I haven't a *clue* what format those tapes are in.  As a practical
matter, it doesn't really matter - after the first quarter petabyte or so of
backed up data, you're not going to do a restore without the software's help
anyhow.. ;)





Re: reiser4 plugins

2005-06-27 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 00:57:54 CDT, David Masover said:

 In one of three possible settings for the imaginary zipfile plugin, yes.
  But if we're talking about a kernel source tree, how many of us
 actually build zipfiles/tarballs of their kernel source trees, rather
 than unpack existing ones?

I dunno.  I'll often build a tarball of -mm plus local patches known to
be working at the moment, precisely so I can just untar that as a known good
base for the next kernel-hackfest, rather than untar Linus's tree, apply all
of the -mm patch, then all my local patches again...

And even if I'm not *that* ambitious, I'll at least tar up a clean -mm tree
to use as a base. :)

And even if I didn't do that, you *do* have to do something when the disk
gets backed up.  You *do* intend for sensible things to happen then, right? ;)





Re: reiser4 plugins

2005-06-27 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 00:54:17 CDT, David Masover said:

 There has been some mention of inheritance, but I've forgotten how
 that's supposed to work.  If there's some sort of inheritance where
 children inherit properties of their parent directory, and also inherit
 changes to those properties, than Hans probably wants that to be the
 prefered way of doing things?

Well, the 'chmod g+s dirname/' example *is* just "children inherit the
group of the directory", and somebody didn't like that.. ;)

  Now throw in multiple users and CPU limits.  User A enters that directory
  and references everything, causing the buffer cache to get filled up.
  While there, A makes changes, so the pages are dirty -
  for i in */*; do echo$i; done
  would do the job...  User B now does something that causes a writeback of
  one of those buffer cache pages.
  
  A) What process currently gets ticked for the CPU and I/O for the writeback?
  
  B) In your model, who will get ticked for the resources?
  
  C) Will the users riot? (Note that you can't win here - currently, the
  price of writing back A's and B's pages are about equal.  However, if A
  gets dinked for an expensive writeback due to B's process, A will get
  miffed.  If B gets charged for an expensive writeback of A's, B will get
  miffed. If you say screw it and bill it to a kernel thread, the bean
  counters will get miffed... ;)
 
 If I understand this correctly, this is somewhat like if user A creates
 a 50 meg file on a system with 100 megs free RAM, and user B runs
 sync.  Also similar to if B were to suddenly fill up 75 megs of RAM,
 forcing A's file to be flushed -- last I checked, in Reiser4, only a
 sync or memory pressure causes writes to flush.

Exactly the same sort of thing - traditionally it's been more or less ignored
in the system accounting, because A would usually average out to causing as
many I/Os as B did, and they were roughly equal in cost so it was a wash.
However, if one user has a much higher per-page cost than the other, the
imbalance can start to matter *very* quickly.

 Right?  This is tempting to comment on, but I want to make sure I grok
 it first...

For more fun, consider how you can write 1 megabyte of data to a file,
lseek to the beginning and start writing again - and you go over quota
on the *second* write even though you're over-writing already existing
data.  Can happen if you're compressing, and the second write doesn't
compress as well as the first. (To be fair, we already have similar
issues with sparse files - but at least 'tar --sparse' has an easy way
to deal with it compared to this. ;)
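
To make the over-quota-on-overwrite case concrete, here is a rough sketch.
It assumes a toy run-length encoding as a stand-in for whatever compression
the filesystem would really use (it is *not* reiser4's algorithm), and
rle_size() is a made-up helper that only reports how many bytes the encoded
form would occupy:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a compressing filesystem: returns the size in bytes
 * of a run-length encoding of buf, stored as (count, byte) pairs with
 * the count capped at 255.  Hypothetical helper, not reiser4 code. */
size_t rle_size(const unsigned char *buf, size_t len)
{
    size_t out = 0, i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t run = 1;

        /* Extend the run while the next byte repeats the current one. */
        while (i + run < len && run < 255 && buf[i + run] == buf[i])
            run++;
        out += 2;   /* one count byte plus one data byte */
        i += run;
    }
    return out;
}
```

With this toy encoding, 1024 zero bytes cost 10 bytes on disk, while
overwriting the same 1024-byte range with non-repeating data costs 2048.
The logical file size never changed, but the second write is the one that
can push the user over quota.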




Re: reiser4 plugins

2005-06-27 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 01:27:25 CDT, David Masover said:

 I back up with rsync, actually.

Doesn't matter what it is.  You still need to define sane semantics for
it.. ;)

 Speaking of backup, that's another nice place for a plugin.  Imagine a
 dump that didn't have to be of the entire FS, but rather an arbitrary
 tree...  That might be a nice new archive format.  I know Apple already
 uses something like this for their dmg packages.

Hmm.. you mean like 'tar' or 'cpio' or 'pax' or 'rsync'? :) 




Re: reiser4 plugins

2005-06-27 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 02:00:49 CDT, David Masover said:

 Speaking of backup, that's another nice place for a plugin.  Imagine a
 dump that didn't have to be of the entire FS, but rather an arbitrary
 tree...  That might be a nice new archive format.  I know Apple already
 uses something like this for their dmg packages.
  Hmm.. you mean like 'tar' or 'cpio' or 'pax' or 'rsync'? :) 
 No, a dmg is an OS X program installer.  It appears to be a disk image
 of sorts.  So this is the backup idea in reverse.

I was addressing the ability to deal with an arbitrary tree.  By that
definition, a dmg, being a disk image and not a tree image, is *not* what
you want.




Re: reiser4 plugins

2005-06-27 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 02:07:46 CDT, David Masover said:
  Exactly the same sort of thing - traditionally it's been more or less
  ignored in the system accounting, because A would usually average out to
  causing as many I/Os as B did, and they were roughly equal in cost so it
  was a wash.
 
 Even if A is doing A/V work and B is programming?

I said "traditionally" - it's been an "oh well, we can't do much about it"
problem for a *long* time (for instance, time spent in an interrupt handler
has usually been charged off against whoever's timeslice the interrupt handler
took a chunk out of).  It's only been tolerated so far because (a) the costs
for both users are about equal and (b) you rarely have a heavy I/O DB and a
number cruncher on the same box, or a user doing A/V work and a user doing
programming - if it's not a single-use machine, there's *multiple* number
crunchers, DBs, or programmers, and they tend to balance out.

Said tendency can disappear quite easily here.

 How do we get over quota errors, btw?  Can we get them from write()
 calls?  If so, I don't see a Problem(TM), just an annoyance.

One gotcha here is that it means that you can't do delayed allocation on
writes - you *have* to allocate disk space at each write and then update
the quotas. (And yes, I know that 'man 2 close' says that bad stuff can
happen to your data even after your program exits - that doesn't mean we
should go out of our way to make things worse.. ;)




Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 02:48:06 CDT, David Masover said:

 Lincoln Dale wrote:

  this is the WHOLE point of standardization .. i don't think its that
  Reiser4's EAs offer any more or less capabilities than standard EAs -
 
 They do.  Reiser4's EAs can look like any other object -- files,
 folders, symlinks, whatever.  This is important, especially for
 transparency.

No, you want them to look like the same objects that {get|set}xattr() manage
currently.  You don't want programs to have to guess what an EA looks like
this week, with this user's combination of plugins that's different from
everybody else's.

  lets take this a step further.  what about compression?  do we accept
  that each filesystem can implement its own proprietary compression via
  its own API - and now we need individual user-space tools to understand
 
 No, that's the beauty of these EAs in Reiser4.  The API is standard
 write(2) commands.  sys_reiser4 supposedly implements an interface to
 make this scale better, but otherwise have the same semantics.  And who
 said anything about proprietary compression?  I think we were planning
 on the kernel's zlib, though we might have been planning to make it a
 bit more seekable...
 
  each of these APIs?
 
 So, the API becomes something like:
 
 cat crypto/inflated/foo   # transparently decompressed
 cat crypto/raw/foo.gz # raw, gzip-compressed

And 'cat crypto/raw/foo' or 'crypto/inflated/foo.gz' gets you what, exactly?

Now throw some .bz2 and .zip files into the mix... ;)

 Another possibility, if you like file-as-a-directory:
 
 cat foo.gz# raw
 cat foo.gz/inflated   # decompressed
 
 One could easily imagine things like these two potentially equivalent
 commands:
 
 cp foo bar.zip/
 zip bar foo

Unless of course the user had done 'mkdir sorted.by.city.zip' to make
a directory of files containing data sorted by USPS Zip code.

And what happens if the user has a file 'bar' that's not a ZIP file,
and a directory 'bar.zip' isn't a view into 'bar'?

Most of the time, if I have a file 'linux-2.6.12.tar.bz2' and a
directory 'linux-2.6.12', what is under the directory is *NOT* the same
data as what's in the .bz2 - I've done 'make oldconfig' and a few builds
and some variable amount of patching, usually with rejects, and I *don't*
want that .bz2 being updated during all this (hint - what's my next command
after 'rm -rf linux-2.6.12' likely to be, and why, and  what expectations
do I have when I do it?)

You want to think this sort of thing through *really* thoroughly, because
there's a *lot* of things, both users and programs, that have expectations
about The Way Things Work.

 The whole point is to have less userland tools, not more.  I'm not
 saying we move zip into the kernel, just that the user now has one less
 command to remember.

But now instead of having to remember the one meme "I can manage any
compressed-archive format that's stored in a file, and put other files in it,
and all I need is the appropriate userspace tool", they have to remember "the
cp trick works for .zip and .tar, but I'll get a 'not a directory' error if I
try it with a .hqx file, and that other file format may or may not work,
because I can't remember if this kernel has a working out-of-tree module for
it".





Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 14:58:07 CDT, David Masover said:

 Plugins is a bad word.  This user's combination of plugins is most
 likely identical to other users', it's just which ones are enabled, and
 which aren't?  If they are all included, I assume they play nice.

Which ones are enabled. Exactly.

 And just because they are called plugins doesn't mean the EA looks
 different every week.

They do if the one enabled this week is "make EAs look like symlinks", and
last week's was "make EAs look like folders".

(Don't blame me, *you're* the one that said EAs can look like any other
object..)


  And 'cat crypto/raw/foo' or 'crypto/inflated/foo.gz' gets you what,
  exactly?
  
  Now throw some .bz2 and .zip files into the mix... ;)
 
 Interface is the same.  Only, zip files aren't just compression, so
 maybe the interface changes a little there.

Right. So please explain what crypto/raw/foo and crypto/inflated/foo.gz give
you.

 Point is, now you have a standard interface for any program to access
 any simple lossless compression, transparently.
 
 Another possibility, if you like file-as-a-directory:
 
 cat foo.gz  # raw
 cat foo.gz/inflated # decompressed
 
 One could easily imagine things like these two potentially equivalent
 commands:
 
 cp foo bar.zip/
 zip bar foo

  Unless of course the user had done 'mkdir sorted.by.city.zip' to make
  a directory of files containing data sorted by USPS Zip code.
 
 What's this got to do with anything?

It's got a *LOT* to do with it if I created a *DIRECTORY*, to use *AS A
DIRECTORY*, the way Unix-style systems have done for 3 decades, and suddenly
my system is running like a pig because the kernel decided that it's a .zip
file.

  And what happens if the user has a file 'bar' that's not a ZIP file,
  and a directory 'bar.zip' isn't a view into 'bar'?
 
 In file-as-a-directory (which is probably NOT happening soon), bar.zip
 is both the actual zipfile and the view inside, depending on whether you
 try to open() it directly or peek inside it as a directory.

Ahem.  'bar.zip' is a *DIRECTORY*. I said 'mkdir bar.zip' - why is it not
acting like a directory?
 
 However, let's not discuss this now.  I do NOT want to start another
 "silent semantic changes with reiser4" thread.  File-as-directory is not
 happening this time, so don't worry about it -- this time.

Fish or cut bait.  You are the one who started handwaving the
'file-as-directory'.  If you don't want it discussed, don't mention it.

  Most of the time, if I have a file 'linux-2.6.12.tar.bz2' and a
  directory 'linux-2.6.12', what is under the directory is *NOT* the same
  data as what's in the .bz2 - I've done 'make oldconfig' and a few builds
  and some variable amount of patching, usually with rejects, and I *don't*
  want that .bz2 being updated during all this (hint - what's my next command
  after 'rm -rf linux-2.6.12' likely to be, and why, and  what expectations
  do I have when I do it?)
 
 You're misunderstanding.  man zip.
 $ zip bar foo
 creates/modifies a file named bar.zip, not bar, which contains the
 file foo.

No. *YOU* are misunderstanding.  I have a directory 'linux-2.6.12', and
I have a file 'linux-2.6.12.tar.bz2', and I do *NOT* want directory operations
to be silently converted into "let's scribble into the middle of this tar file
and then compress it".  (Hint - work out how long a kernel 'make' would take
if you were doing it inside a .tar.bz2).

  You want to think this sort of thing through *really* thoroughly, because
  there's a *lot* of things, both users and programs, that have expectations
  about The Way Things Work.
 
 Or, I can avoid those issues altogether, and simply delegate this kind
 of stuff to user-created-but-magic directories.  For instance, I could
 have a directory called /foo which contains encrypted files, and
 /foo/decrypted which has transparently decrypted representations of them.

So rather than everything working in a funky manner, a program gets to guess
how funky, and in what direction, a given magical directory is.




Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 19:16:48 CDT, David Masover said:

 But, to avoid confusion, the inclusion of a cryptcompress plugin in a
 given kernel doesn't mean that all files accessed from that kernel are
 encrypted and compressed.  It just means that you can pick an individual
 file and set it to be transparently encrypted/compressed.
 
 That is what I meant by enabled.  Not per-user, but per-file.

Doing key management in a secure manner is going to be *fun*. :)




Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 17:35:48 CDT, David Masover said:

  Right. So please explain what crypto/raw/foo and crypto/inflated/foo.gz 
  give you.
 
 In that example (shouldn't have used the name crypto, but oh well), it
 should be crypto/raw/foo.gz and crypto/inflated/foo -- where foo.gz is
 the gzip'ed file and foo is the transparently compressed/decompressed
 file.  Basically, these are equivalent:
 
 $ zcat crypto/raw/foo.gz
 $ cat crypto/inflated/foo

I'm *quite* aware of what your preconceived notions think it *should* be.

Maybe the two examples I asked for have *real-world* meanings that you should
be allowing for.  Like, for instance, on a mail server, where the A/V software
may need to unzip a file 5 or 6 times to find out if there's malicious content.

Or seeing if it's a .zip bomb, where a small .zip will decompress to
gigabytes.

Or I'm testing a new compression algorithm, to see if multiple compressions
help (yes, I know that it *shouldn't* help - but I've seen real-world cases
where the algorithm could only look at a 4K or 8K window at a time, and if you
hit a *very* long run of duplicate 4K segments, a second compression would
compress all the identical or near-identical "this is a 4K chunk" tokens...)


  It's got a *LOT* to do with it if I created a *DIRECTORY*, to use *AS A
  DIRECTORY*, the way Unix-style systems have done for 3 decades, and
  suddenly my system is running like a pig because the kernel decided that
  it's a .zip file.
 
 The kernel does not decide that.  You do.  And it doesn't automatically
 decide that every time you create a file.  You have to use some
 interface to trigger the plugins.

Oh, I'm waiting for the fun the first time somebody deploys a plugin that
has similar semantics to 'chmod g+s dirname/' ;)

 I guess I need a new name for this approach.  That's three possible ways
 of doing this?

I *said* you need to think this through in detail, didn't I? ;)
 
 I remember discussing that, actually.  It wouldn't automatically do this
 if you didn't want it to, but it would be nice if, say, it was something
 truly seekable like linux-2.6.12.zip, and linux-2.6.12 was a
 user-created symlink to linux-2.6.12.zip/.../contents, and we had a nice
 caching system...

I think you're highly deluded as to just how much or little performance gain
this will get you. Model what happens with a kernel 'make' on a 256M machine
with and without all that zipping and compressing, and assume that a constant
48M is available for caching of the linux-2.6.12/ tree.

 This is nice because then you get exactly the same performance during
 make as you would with unzip && make, only better, because files you
 don't ever use (lots of arch, for instance) are not unpacked.

Go read http://www.tux.org/lkml/#s7-7 and ponder until enlightenment arrives.





Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 15:54:25 PDT, Hans Reiser said:
 [EMAIL PROTECTED] wrote:
 
  (Hint - work out how long a kernel 'make' would take
 if you were doing it inside a .tar.bz2).
   
 
 After the first time, not very long, if you had enough ram & the
 plugin would keep the data uncompressed until it flushed it to disk.

You're not allowed to use current existing stuff like the disk buffer cache
to weasel your way out on this one.  "if you had enough ram" has been true
for decades.  The trouble is that quite often you *don't* have enough ram.
 
 Performance might even improve since less would be written to disk.

I've worked with filesystems where performance improves due to compression
(AIX's JFS).  It's a lot harder to provide an improvement if you're writing
37 more bytes in between bytes 399457 and 399458 (I suppose by aligning
byte 399458 so it actually is on the start of a 4K block you can do that, but
then you're losing the advantages of the compression.. ;)





Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 21:37:48 CDT, David Masover said:

 Assume we can do on-disk caching, similar to fscache/cachefs for nfs.
 Now, benchmark:
 
 $ unzip linux-2.6.12.zip && make -C linux-2.6.12
 
 versus the hypothetical
 
 $ make -C linux-2.6.12.zip/.../contents
 
 This is an automatic performance gain, in theory, because the second
 command is identical to unzipping just the parts you need into
 linux-2.6.12, then running make.

Nope, they're not identical.  The first specifically unzips it into the file
system, leaving the zip file intact.  The second, you're having to take all
those .o files and other stuff that the 'make' generates and put them back
into the .zip file *on the fly* - when the 'make' is half done, the .zip should
reflect a directory tree that has had half the make execute.

(Think - after that hypothetical 'make' completes, where is 'vmlinux'? ;)




Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 21:37:48 CDT, David Masover said:

  Go read http://www.tux.org/lkml/#s7-7 and ponder until enlightenment 
  arrives.
 
 So what?  I don't intend to convince anyone based on how much
 slower/faster their kernel compiles.  It's meant to illustrate the
 principle of the thing.

No, you seemed convinced that you'd have a big win based on the fact that
big chunks don't get unpacked - when in fact it's not as much of a win as
you might think.

And at least in the real world, performance *does* matter - if doing it the
traditional way is 3 times faster, nobody's going to be interested.

 Besides, your point was that you could not run make inside of a kernel
 tarball/zipfile.  Nobody ever suggested that you would actually want to.

"Here's a new facility.  Don't bother trying to actually use it."

Is that the message you're trying to send?




Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Sun, 26 Jun 2005 23:10:43 EDT, Hubert Chan said:
 On Sun, 26 Jun 2005 20:40:29 -0400, [EMAIL PROTECTED] said:

  Oh, I'm waiting for the fun the first time somebody deploys a plugin
  that has similar semantics to 'chmod g+s dirname/' ;)

(You *did* notice it was set-GID of a *directory* not an executable file,
right?)

 Reiser4 plugins have to be compiled into the kernel.  (They're not
 plugins in the sense that most people use that word.)  And any admin who
 would compile that kind of plugin into the kernel needs to have his head

Oh?  You saying that it *won't* be permitted for a user to say:

mkdir $HOME/zipped
chattr "files under here are ZIP files" $HOME/zipped

and instead you have to do that chattr by hand for *every* *single* zip file?

Or "files on this filesystem are encrypted by default"?

I suspect that this sort of thing is going to be one of the *first* things
that will get created, and any admin who tries to sell this idea to the users
*without* that sort of functionality will be handed their head.

Or, if "that type of plugin.. needs to have their head examined", I suggest
that you go to your kernel source tree, find fs/ext3/ialloc.c, and this code
in ext3_new_inode():

    if (test_opt (sb, GRPID))
        inode->i_gid = dir->i_gid;
    else if (dir->i_mode & S_ISGID) {
        inode->i_gid = dir->i_gid;
        if (S_ISDIR(mode))
            mode |= S_ISGID;
    } else
        inode->i_gid = current->fsgid;

and #ifdef out all but the last line, and see if anything breaks. ;)

 examined.  Not to mention that plugins must first go through Hans and/or
 Linus before they can get included into the kernel.
 
 The kernel defines the set of plugins available to the user.  The user
 selects (to a certain degree) which plugins to use.

The point you missed was that plugins *will* have interactions, and as
the guys who are working on a stacker for LSM modules have found out the
hard way, trying to deal with the composition of functions is fiendishly
difficult.

And notice that it doesn't *have* to be quite so obvious - how about if a
user creates a directory $HOME/zipped/ and flags it as "anything under here
is a zipped file".

Now throw in multiple users and CPU limits.  User A enters that directory and
references everything, causing the buffer cache to get filled up.  While there,
A makes changes, so the pages are dirty - "for i in */*; do echo >> $i; done"
would do the job...  User B now does something that causes a writeback of one
of those buffer cache pages.

A) What process currently gets ticked for the CPU and I/O for the writeback?

B) In your model, who will get ticked for the resources?

C) Will the users riot? (Note that you can't win here - currently, the price
of writing back A's and B's pages is about equal.  However, if A gets dinked
for an expensive writeback due to B's process, A will get miffed.  If B gets
charged for an expensive writeback of A's, B will get miffed. If you say
"screw it" and bill it to a kernel thread, the bean counters will get
miffed... ;)




Re: reiser4 plugins

2005-06-26 Thread Valdis . Kletnieks
On Mon, 27 Jun 2005 00:31:46 CDT, David Masover said:

 *If* we decide that this must go both ways, *then* we either turn off
 write support inside the zipfile

Oh, *that* will do wonders for command symmetry.  And you just shot down
the whole 'mv foo bar' being equivalent to 'zip bar foo' concept. ;)

 and do make with a symlink farm (cp -as isn't hard), or (better) we can
 set things up so that only on access (most likely a read) of the original
 zipfile do we re-add all the changes.

Those chuckleheads who have filled up a disk by saying 'tar cvf foo.tar .' just
got a whole new way to fill the disk... ;)




Re: reiser4 plugins

2005-06-25 Thread Valdis . Kletnieks
On Fri, 24 Jun 2005 23:10:35 CDT, David Masover said:

 But Linux is better.  DOS ain't broke, but Linux is better.  So maybe
 VFS ain't broke, but plugins would be better.  I guess we'll only know
 if we let Reiser4 merge...

No, we'll only know if we merge something that does plugins at the VFS
level in a well-designed way.

 This was about a hypothetical ext3 format as a reiser4 storage plugin.
 I'm not sure how this ties into the VFS stuff.

Very poorly.  There's only two interpretations of ext3 as a reiser4 plugin
that make *any* sense.  The first is that reiser4 is totally violating the VFS
layer boundary, and the second is that reiser4 is trying to be an all-singing
all-dancing wankfest.  Later on, you say:

 A lot of what people like about ext3 is its stability and fairly
 universally accepted format.  A lot of what people like about XFS is its
 stability and speed, mainly with large files.  A lot of what people like
 about Reiser4 (as it is today) is its speed, with large and especially
 with small files.

Now *think* for a moment - how does a hypothetical Reiser4 using ext3 format
gain any speed advantage with small files, when the speed advantage is based
on using a format other than ext3?

As I said, either it's violating the VFS boundary, or it's busy wanking.

The Reiser4 proponents would be well served to disavow that particular
hypothetical example - I have yet to see *anything* that does more damage
to the Reiser4 cause.

 So, in this hypothetical situation where ext3 is a reiser4 plugin,
 suddenly all the ext3 developers are trying to improve the speed and
 reliability of reiser4, which benefits both ext3 and reiser4, instead of
 just ext3.

Or we can do what *should* be done, which is:

a) Put the crack pipe down.

b) Tell reiser4 to get its grubby little paws off the VFS if it ever intends
to have a chance of being merged in mainline.

c) Have a *separate* project to improve the speed/reliability/function of
the VFS layer, which is the only way that your vision of having the ext3 and
reiser developers cooperating will ever happen.

Yes, the VFS could probably use an overhaul.  But that *will* happen like this:

1) A patch is submitted and passes review to change the VFS.
2) If appropriate, a patch for reiser4 (if it gets merged) is also submitted
(possibly by the same people) to be the first user of the new API/functionality.

There's a *reason* why we see patch streams that look like:

Patch 1/3: Add moby_foo_init function to nautical core.
Patch 2/3: Modify white_whale driver to use moby_foo_init
Patch 3/3: Modify captain_ahab driver to use moby_foo_init

 Aside from what someone else already said about this, why not just have
 support for accessing, say, a .gpg file as transparently decrypted?  You
 don't even need file-as-directory, just create a file called foo which
 is really the decrypted version of foo.gpg.  No need to change the
 format, just the filesystem.

I don't think this is what they mean by "Linux gives you enough rope to
shoot yourself in the foot with"...

 Plus, as someone else said, it's much easier to do
 $ vim /some/encrypted/file
 than
 $ gpg --decrypt /some/encrypted/file > /some/decrypted/file
 $ vim /some/decrypted/file
 $ gpg --encrypt /some/decrypted/file > /some/encrypted/file
 $ shred /some/decrypted/file

You've totally failed to understand that the whole *point* of PGP is that 'vim
/some/encrypted/file' *isn't* easy to do.  A better example might be the various
crypto-loop-ish variants or Microsoft's EFS, where the key management model is
more tractable to this sort of automation.





Re: reiser4 plugins

2005-06-24 Thread Valdis . Kletnieks
On Fri, 24 Jun 2005 11:13:45 MDT, Perry Kundert said:

 In general, isn't it better to first include modules providing
 divergent but possibly interesting functionality (such as Reiser4) as
 an optional or experimental component, and then slowly re-factor
 desirable functionality into higher level facilities like the VFS?

The problem arises when the facility is something that is demonstrably
borked when done in an optional way in one filesystem, and really needs
to be done at the VFS level if it is to be done at all.

 I ask you -- if everyone in kernel-land is so convinced that you
 should always select varying on-disk formats via the VFS, then *why*
 hasn't ext2/ext3 been merged into a single filesystem?

Because the formats, although similar enough to be mostly compatible, are
still different enough that merging them is difficult.  There's some very
subtle second-order effects, where the ext3 driver can do things in different
orders or with different algorithms because it has a journal, when the ext2
code has to do things in a specific way because it has to *always* have things
in a consistent enough state that fsck.ext2 can clean things up.  So you end
up with code that looks like:

if (fs->journalled) {
    /* 500 lines of code for the ext3 case */
} else {
    /* 300 lines of different code for ext2 */
}

If you don't like that, then you can do this instead:

1) put ext2_do_whatever in ext2_whatever.c
2) put ext3_do_whatever in ext3_whatever.c

extern void ext2_do_whatever(void);
extern void ext3_do_whatever(void);

if (fs->journalled) {
    ext3_do_whatever();
} else {
    ext2_do_whatever();
}

In fact, I seem to remember Alan Cox answering this with "only about 10% of
the code *wouldn't* end up like this" or similar...
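
Once nearly every operation ends up split like that, the usual move is to
pick a per-filesystem table of function pointers once and call through it.
A minimal sketch of that idea follows; struct fs_ops, pick_ops() and
call_whatever() are invented names, and the return values are just
placeholders so the two paths can be told apart:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-filesystem operations table; all names invented. */
struct fs_ops {
    const char *name;
    int (*do_whatever)(void);
};

/* Placeholder bodies: 2 and 3 just identify which path ran. */
static int ext2_do_whatever(void) { return 2; }  /* fsck-safe ordering */
static int ext3_do_whatever(void) { return 3; }  /* journalled path */

static const struct fs_ops ext2_ops = { "ext2", ext2_do_whatever };
static const struct fs_ops ext3_ops = { "ext3", ext3_do_whatever };

/* Choose the table once, at mount time, instead of testing
 * fs->journalled inside every single operation. */
const struct fs_ops *pick_ops(int journalled)
{
    return journalled ? &ext3_ops : &ext2_ops;
}

/* Every caller goes through the table and never sees the branch. */
int call_whatever(const struct fs_ops *ops)
{
    assert(ops != NULL && ops->do_whatever != NULL);
    return ops->do_whatever();
}
```

At which point you effectively have two drivers that happen to share a
source directory, which is more or less the argument for keeping them
separate.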

 Surely the
 journalling plugin of this filesystem is a prime candidate for
 selection via the VFS?

To be doing journalling at the VFS level implies that a journal is something
that makes sense at the VFS level - that it's basically filesystem independent,
which is most certainly *not* true - the notations an XFS journal needs to make
to indicate which blocks were just removed from the free-block structure are
quite different from what ext3 needs to record.

Note that journalling is neither an attribute of the actual data, nor of
the user-visible metadata (inode contents, etc).   The only things that
care about the journalling format/etc are the filesystem driver, the mount
command, and the mkfs/fsck commands.   As such, it's a file system issue,
not a VFS issue.

For a good example of why this is so, go back and read the recent discussion
of what happens to flash memory filesystems mounted with 'sync' - this was
a case of the VFS doing journalling by flushing without consulting the
low-level drivers.




Re: reiser4 plugins

2005-06-24 Thread Valdis . Kletnieks
On Fri, 24 Jun 2005 16:20:45 MDT, Perry Kundert said:

 OK, fair enough.  The file-as-directory stuff, which introduces
 VFS-incompatible issues, was turned off.  It requires VFS changes.

Mind you, I still think that sounds *interesting*, but it *has* to happen
at the VFS level.  (And if *that* doesn't force a 2.7 fork to happen,
nothing will :)

 The remaining plugin architecture, as far as I understand, deals
 in the on-disk structure of the FS -- just like journals.  Encryption,
 Compression, and the like.
 
 So, what you are saying, is -- so long as the plugins do stuff
 that deals in how reiser4 slings bits back and forth to the disk,
 you're OK, right?

Right - once the VFS hands the call off to reiser4, you're on your own
as far as I'm concerned..

 So, what you are saying is: if reiser4 wants to provide
 variability in on-disk format, so long as it implements it using
 *multiple different filesystems* (eg reiser4-cryptcompress,
 reiser4-whatever) -- just like ext2 vs. ext3 -- then you are OK with
 it?  But, if they are implemented as plugins, so that the ONE
 reiser4 filesystem can modulate its behaviour based on what the
 on-disk format says, that you are NOT OK with it?
 
 And this makes sense, why?

You misread that - my point was that ext2 and ext3 may look similar, but
they're sufficiently divergent that trying to create one driver that handles
both results in an ugly driver, thus the split...

 Don't get me wrong -- I'm not saying that ext2 and ext3 shouldn't
 be separate file systems.  However, if they were designed from the
 start so that the ONE (say) ext23 filesystem could look at its on-disk
 format, notice that the data specified the journal plugin, and
 implement the correct behaviour -- that this would be bad?

Well, if they *had* been, it would be a different story.  And there's stuff
in ext3 (see the -O feature section of 'man tune2fs') that *does* do the
sort of thing that you're proposing.  It's just that the ext2 codebase doesn't
fit in well for historical reasons.
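
For those who haven't looked at how the 'tune2fs -O' features work: the
superblock carries three sets of feature bits (compat, read-only-compat and
incompat), and the driver compares them against what it understands.  The
sketch below models only that decision; the struct, the enum and
check_features() are made-up names, and the bit values are arbitrary rather
than the real ext2/ext3 constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature words modelled on the compat / ro_compat /
 * incompat split.  Names and bit values are invented for the example. */
struct sb_features {
    uint32_t compat;     /* unknown bits: safe to ignore */
    uint32_t ro_compat;  /* unknown bits: read-only mount is still safe */
    uint32_t incompat;   /* unknown bits: must refuse to mount */
};

enum mount_verdict { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSE };

/* Decide how a driver that understands 'supported' may mount a
 * filesystem whose superblock advertises 'on_disk'. */
enum mount_verdict check_features(const struct sb_features *on_disk,
                                  const struct sb_features *supported)
{
    assert(on_disk != NULL && supported != NULL);
    if (on_disk->incompat & ~supported->incompat)
        return MOUNT_REFUSE;
    if (on_disk->ro_compat & ~supported->ro_compat)
        return MOUNT_RO_ONLY;
    return MOUNT_RW;        /* unknown compat bits are simply ignored */
}
```

The point is that the on-disk format itself announces which behaviours it
needs, so an old driver can tell "safe to ignore" from "don't touch this".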

 Because I can envision an ext23 filesystem that is just like
 reiser4, that does exactly that -- implements its variable behaviour
 via a journal plugin.

 So, if it did so, would you be OK with it?  As long as it wasn't
 called reiser4?

No, I'd be perfectly happy with a reiser4 that had a
'tunereiser --enable-plugin=' that had the same sort of format-altering
semantics that 'tune2fs -O' has.

For bonus points, design a system that stores the plugin *in the file
system* (probably need to have a bytecode interpreter for this).  Then
you eliminate the "can't mount if the kernel can't insmod the plugin" issue ;)

 I really don't mean to sound sarcastic -- it just sounds like
 there are other issues at work here -- like "Hans is a Butt-Head, so
 I want to reject reiser4's plugin design for modulating its behaviour,
 no matter what".

I've never actually met Hans, so I don't know if he really *is* a butt-head
or not.  And I usually at least *try* to phrase it more like "this proposal is
a non-starter that only a butt-head could continue to support, because.." :)

Of course, I myself have been called a butt-head on numerous occasions, because
I'm convinced there's a right and wrong way to design something. ;)

 OK, so far it seems like we are actually agreeing -- the stuff
 that gets done via reiser4 plugins actually doesn't have anything to
 do with the VFS, and it shouldn't be there.  So long as reiser4
 presents a VFS-sensible, VFS-consistent hierarchy of stuff that looks
 like files and directories to the VFS, then we're OK with it?
 
 Whether reiser4 uses plugins, or ESP, or whatever to decide what
 behaviour it implements in order to produce this VFS-consistent
 interface, then that's OK, right?

Right - not that *my* opinion counts for tons on LKML, and any *other*
stylistic/design faults are a separate issue. :)




Re: 13000Gig partition badblock check is the same -- do a reiserfsck again ?

2005-06-02 Thread Valdis . Kletnieks
On Thu, 02 Jun 2005 09:28:50 CDT, Dan Oglesby said:

 latest versions.  Took two days to run, but it completed, and I ended up 
 only losing 2 files out of over 1.1 million files on a 1TB RAID-5 
 array.  That's not too bad, considering how many times the machine went 
 up and down due to bad power in the building.

Buy a UPS. Now.  Even if it's just a big battery that will only keep you
running for 10 mins - at least that will give you enough time to do a clean
shutdown -h rather than get stuff trashed.

If you can't get money for it, just point at the lost-productivity costs
the *next* time the terabyte takes 2 days to recover.. and remind the boss that
you could be down for 2 days every time the lights flicker.. ;)


pgp0bq8czXfZ1.pgp
Description: PGP signature


Re: File as a directory - VFS Changes

2005-05-31 Thread Valdis . Kletnieks
On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:

 A cycle may consist of more graph nodes than fit into memory.
 
 There are pathname length restrictions already in the kernel that should
 prevent that, yes?

The problem is that although a *single* pathname can't be longer than some
length, you can still create a cycle.  Consider for instance a pathname
restriction of 1024 chars.  Filenames A, B, and C are all 400 characters long.
A points at B, B points at C - and C points back to A.

Also, although the set of inodes *in the cycle* fits in memory, the set of
inodes *in the entire graph* that has to be searched to verify the presence of
a cycle may not (in general, you have to be ready to examine *all* the inodes
unless you can do some pruning (unallocated, provably un-cycleable, and so
on)).  This is the sort of thing that you can afford to do in userspace during
an fsck, but certainly can't do in the kernel on every syscall that might
create a cycle...
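
To make the A/B/C case concrete, here is a minimal Python sketch.  The `links`
dict is a hypothetical stand-in for link targets stored on disk, not anything
taken from the reiser4 code.  Note that the walker has to remember every node
it has visited so far, which is exactly the memory cost being argued about.

```python
def find_cycle(links, start):
    """Follow links from start; return the cycle as a list, or None."""
    path = []        # nodes in the order we visited them
    index_of = {}    # node -> position in path (grows with every node seen)
    node = start
    while node in links:
        if node in index_of:
            # We have been here before: everything from that point on loops.
            return path[index_of[node]:]
        index_of[node] = len(path)
        path.append(node)
        node = links[node]
    return None


# Three 400-character names, each well under a 1024-character limit.
a, b, c = "A" * 400, "B" * 400, "C" * 400
links = {a: b, b: c, c: a}
cycle = find_cycle(links, a)
```

Every individual name passes the length check, yet the walk still comes back
to where it started.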



pgpdt2U5lIsqK.pgp
Description: PGP signature


Re: Reiserfs 1300G partition on lvm problem ...

2005-05-30 Thread Valdis . Kletnieks
On Mon, 30 May 2005 08:17:00 +0200, Matthias Barremaecker said:

 I did a bad block check and I have 10 bad blocks of 4096bytes on 1300Gig 
 and ... that is the reason reiserfs will not work anymore.

 I guess this sux. I'd rather have the data on the bad blocks just be
 corrupted but the rest accessible.

It all depends on which 10 blocks go bad.  If it's a block that's allocated to
a file, you lose the 4K or whatever that's in that block.

If it's a block that an inode lives in, you're probably going to have the
entire file evaporate.

If it's a block that contains something even more important, you're going to
have large sections of the file system evaporate.

It's a tradeoff issue - how many times do you replicate metadata on the
filesystem, against how well the file system deals with errors.  The problem is
that if you just say "let's have 2 copies of everything, just in case", it
takes a lot more disk space to *store* 2 copies of the metadata.  Also, your
disk performance falls through the floor - most journalled filesystems have
enough trouble making sure that *one* copy of things like the free list is on
disk and consistent with the journal.  Making 2 copies is going to probably
triple your disk I/O and complicate matters a *lot* for fsck (if you crash and
the two copies aren't consistent, which one do you believe?)

That's why almost all filesystem designers just punt and assume that the media
actually works, and suggest if your media might not be 100% reliable, that you
use RAID or similar solutions.



pgpSnF34gSSOu.pgp
Description: PGP signature


Re: Problems with accessing directory

2005-05-29 Thread Valdis . Kletnieks
On Tue, 29 May 2001 18:14:28 +0200, Webservice said:

 When accessing a particular directory, the system hangs with a kernel
 panic (mapping memory).

This will be a lot easier to diagnose with:

The exact version of your kernel (uname -a), the version of reiserfsck,
and the actual panic traceback (set up a serial console to catch it, or
even take a picture with a digital camera if all else fails).

Have you run 'badblocks' on the 4 md devices, to rule out an actual bad
spot on the disk?


pgpFfAoRZDuOp.pgp
Description: PGP signature


Re: Problems with accessing directory

2005-05-29 Thread Valdis . Kletnieks
On Sun, 29 May 2005 18:44:02 +0200, Kurt Ghekiere said:

 May 29 17:28:51 mail3 kernel: Process hax0r (pid: 3738,
 stackpage=f121b000)
 May 29 17:28:51 mail3 kernel: Stack:  bfe5 1000 f63b6000
 bfffe7f0 f63b6000 b8e4 
 May 29 17:28:51 mail3 kernel:f78aaf22 0a3a 0020 f121a000
 c0108a93 000b bfffe7f0 f78a99a1
 May 29 17:28:51 mail3 kernel:   
    
 May 29 17:28:51 mail3 kernel: Call Trace:[c0108a93]
 May 29 17:28:51 mail3 kernel:
 May 29 17:28:51 mail3 kernel: Code: 8a 02 84 c0 75 ef e8 9c ec ff ff 89
 c2 80 3a 00 0f 84 bb 00

Interesting process name indeed. Hopefully you recognize it? ;)

I would suggest running the call trace through ksymoops, but it's so short that
we've quite obviously clobbered the stack to the point that ksymoops won't tell
us anything useful.

I'd investigate why you get all those insmod errors - why is the system trying
to load pciehp and hw_random if there's no device?  Alternatively, are other
modules getting loaded incorrectly and blocking those from starting? It's
possible that if your kernel and modules are out of sync, that Bad Things like
panics happen.

You probably should look at upgrading the userspace reiserfsck and MD/LVM tools
- your kernel seems unhappy with the old versions.

Other than that, I admit to not having any clear "AHA! THAT's their problem"
solution, sorry.


pgpTiOKMSdDkf.pgp
Description: PGP signature


Re: Reiserfs 1300G partition on lvm problem ...

2005-05-29 Thread Valdis . Kletnieks
On Sun, 29 May 2005 21:25:54 +0200, Matthias Barremaecker said:

 but that says it is a physical drive error

Physical drive errors.  Your hardware is broken.  Isn't much that Reiserfs
can do about it.

 What can I do.

1) Call whoever you get hardware support from.

2) Be ready to restore from backups.

3) If you didn't have RAID-5 (or similar) set up, or a good backup, consider
it a learning experience.

If your data is important enough that you'll care if you lose it, you should
take steps to make sure you won't lose it... It's that simple.

(Just for the record, if we have important info, it gets at least RAID5, a
backup to tape or other device, *and* a *second* backup off-site.  And my shop
is far from the most paranoid about such things.)



pgp2dVSpWxCvh.pgp
Description: PGP signature


Re: peak performance

2005-05-28 Thread Valdis . Kletnieks
On Fri, 27 May 2005 19:26:26 +1000, robby cunningham said:
 I've been using your product for 4 months now. I've increased my length from
 2 inches
 to nearly 6 inches. Your product has saved my sex life.-Matt, FL

I'm glad Reiserfs worked for him, but somehow I don't see Hans listing this
one on the Reiserfs success stories page. ;)


pgpajTaF2JBvl.pgp
Description: PGP signature


Re: File as a directory - Ordered Relations

2005-05-28 Thread Valdis . Kletnieks
On Fri, 27 May 2005 23:56:35 CDT, David Masover said:

 Hans, comment please?  Is this approaching v5 / v6 / Future Vision?  It
 does seem more than a little clunky when applied to v4...

I'm not Hans, but I *will* ask "How much of this is *rationally* doable
without some help from the VFS?"  At the very least, some of this stuff
will require the FS to tell the VFS to suspend its disbelief (for starters,
doing this without confusing the VFS's concepts of dentries/inodes/reference
counts is going to be interesting...) :)


pgppFAsfV0InP.pgp
Description: PGP signature


Re: Reiser4 O_DIRECT

2005-05-24 Thread Valdis . Kletnieks
On Tue, 24 May 2005 16:35:51 CDT, David Masover said:

 My feeling is that you create the standard as you create the test, not
 the other way around.  If the test works, then there are by definition
 few bugs if any in the system itself -- any other bugs are actually in
 the application, not the system.

That's even worse.  Then if somebody bodgers it all up with some corner case
that your test system didn't cover, you're by definition screwed, as the
standard won't say what it *SHOULD* do.

Consider the vast philosophical difference between "This is what the FS *should*
do" and "This is what we tested the FS for".  You want the standard to be the
first, not the second.


pgpv30U27h0em.pgp
Description: PGP signature


Re: Reiser4 O_DIRECT

2005-05-23 Thread Valdis . Kletnieks
On Mon, 23 May 2005 12:52:12 +0300, Markus Törnqvist said:
 On Sun, May 22, 2005 at 07:22:51PM -0500, David Masover wrote:
 Of course, I've worked on sufficiently few big projects that I'm still
 naive enough to believe that unit tests _can_ catch everything, if
 they're done right.  I'm sure I'll eventually be proven wrong...
 I'm sure a testing professional will happily prove you wrong ;)

It's *never* the testing professional that disproves "unit tests can catch
everything".

It's the guy with the creeping-horror Cobol/Python database that finds the stuff
that unit tests can't catch.. ;)



pgpYs0VhzoXNU.pgp
Description: PGP signature


Re: Re[2]: Reiser4 O_DIRECT

2005-05-22 Thread Valdis . Kletnieks
On Sat, 21 May 2005 23:49:00 +0200, Pysiak Satriani said:

 I remember Hans saying that r4 is so stable that the developers themselves
 can not find any more bugs.

Which in reality probably means "It *probably* won't eat your data".

Remember that the developers have a limited number of different hardware
configurations, and a limited number of test tools, and a limited number
of ways they use the file system.

So there's probably *plenty* of bugs still to be found - most of them the sort
that nobody will expect, and won't be found until some user's creeping-horror
database application that's written half in Cobol and half in Python does
something totally stupid but legal.

I've been on both ends - beta tester for software the developers weren't
finding any more bugs in (I filed over 300 bug reports against the product
anyhow), and had users find fatal bugs I couldn't find (my favorite had to be a
user who managed to crash software I wrote by entering a backspace character.
On an IBM 3270 terminal. Which doesn't *HAVE* a transmittable backspace
character - backspace is handled locally in the terminal).



pgppuKrhGbpvC.pgp
Description: PGP signature


Re: Reiser4 O_DIRECT

2005-05-22 Thread Valdis . Kletnieks
On Sun, 22 May 2005 19:22:51 CDT, David Masover said:

 This is exactly why it should be in the kernel once the developers can't
 find any more bugs.  Marked as experimental, mainly, but in the kernel
 where real users can throw cobol/Java/sql bastardizations at it and
 break it.

Oh, I agree there - it's at a point where it *should* be in a -mm kernel,
or a -linus wrapped with a Kconfig 'depends on EXPERIMENTAL'. (I'll let
Andrew and Linus make *THAT* decision ;)

I'm just worried about PHB managers lurking on the list and reading "so stable
even the *developers* can't break it" as "it's *really* good and solid" rather
than "we need *other* people to break all the stuff we forgot to break" ;)



pgpIVeDrSjLfn.pgp
Description: PGP signature


Re: disk runs full

2005-05-12 Thread Valdis . Kletnieks
On Thu, 12 May 2005 13:50:31 +0200, Alexander Gruber said:
 I checked it with du -sh * on the root partition and the result was much 
 smaller than the used space reported by df.

Note that temporary files are often creat()ed and then unlink()ed, leaving
the open file descriptor as the last reference.  You should probably run
'lsof' or similar tool.  On my laptop at the moment:

lsof -n | grep dele
cardmgr    2207   root    3u   CHR  254,0         5556 /dev/cm-2123-2 (deleted)
cardmgr    2207   root    4u   CHR  254,1         5559 /dev/cm-2123-5 (deleted)
cardmgr    2207   root    5u   CHR  254,2         5562 /dev/cm-2123-8 (deleted)
exmh       7142 valdis   10u   REG   0,16    0   74035 /tmp/tclfG25oV (deleted)
gconfd-2   7805 valdis   13wW  REG   0,16  641   44735 /tmp/gconfd-valdis/lock/0t1115905590ut151063u967p7805r252866408k3219173544 (deleted)
aspell     9481 valdis    2u   REG   0,16    0   74035 /tmp/tclfG25oV (deleted)

So there are 2 open but unlinked files on /tmp, and du and df will show
different
values. (Note that exmh did an open() of a file, unlinked it, and then passed
the open file descriptor to aspell as stdin - so that space will be reclaimed
once *both* of those processes have done a close() on the file descriptor).
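
The same effect is easy to reproduce.  A minimal Python sketch, assuming a
POSIX system (the function name is made up for illustration): the file is
created, unlinked, and its size is still reported through the open descriptor.

```python
import os
import tempfile


def unlinked_but_open(payload=b"x" * 4096):
    """Write a temp file, unlink it, and inspect it via the open descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                 # the name is gone, so du can't find it
        visible = os.path.exists(path)
        size = os.fstat(fd).st_size     # ...but the data is still there
    finally:
        os.close(fd)                    # only now can the space be reclaimed
    return visible, size
```

Until that final close(), df counts the space as used while du has no name
left to add up.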


pgpPdTaifLWuY.pgp
Description: PGP signature


Re: file as a directory

2005-05-10 Thread Valdis . Kletnieks
On Tue, 10 May 2005 10:39:23 BST, Peter Foldiak said:
 Back in November 2004, I suggested on the linux-kernel and reiserfs
 lists that the Reiser4 architecture could allow us to abolish the
 unnatural naming distinction between directories/files/parts-of-file
 (i.e. to unify naming within-file-system and within-file naming) in an
 efficient way.
 I suggested that one way of doing that would be to extend XPath-like
 selection syntax above the (XML) file level.

I believe the consensus was that this needs to happen at the VFS layer, not
the FS level.  The next step would be designing an API for this - what would
the VFS present to userspace, and in what way, and how would backward
compatibility be maintained?


pgpKIGxXIFQdb.pgp
Description: PGP signature


Re: Re[2]: When Reiser4 will be officially included in the kernel? ...

2005-05-04 Thread Valdis . Kletnieks
On Thu, 05 May 2005 00:38:54 +0200, Pysiak Satriani said:
  This is OK, however, what I am looking for is to download the Kernel
  from kernel.org, and found Reiser4 code inside. This means officially
  for me.
 FYI, kernel.org does have patches with r4, eg.
 http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc3/2.6.12-rc3-mm2/2.6.12-rc3-mm2.bz2
 
 When 2.6.12 comes up, you might want to take the next 2.6.12-mm1 or
 the next 2.6.12 reiser patch from namesys.com

Please do *NOT* run -mm kernels unless you are *sure* you know what you're
doing.  They are *literally* bleeding-edge test kernels, and *are* the most
likely things on the entire kernel.org server to hang, wedge, reformat your
disks and eat your data, and otherwise have a bad time with.

To repeat:  -mm are *TEST* kernels.  2.6.12-rc3-mm1 came out at 23:11 Friday.
2.6.12-rc3-mm2 came out at 16:43 the next day. Why? Umm.. let's just say
that it didn't even *compile* cleanly for uniprocessor X86... ;)  I've been
running -mm kernels on this laptop since 2.5.45-mm1 or so, and I'd estimate
that at *least* 1 in 4 hasn't even booted cleanly to multiuser without
additional patching and tweaking. In the last 31 -mms, I've needed 10 additional patches.
Think about that - since 2.6.9-rc2-mm1, there's been 31, and 10 needed patches.
Cool stuff if Test Pilot is part of your job description, but not what you
want to put on production boxes. ;)

If you're brave, you can pull the -broken-out variant and apply all the
reiser4-* patches in the order they're listed in the 'series' file - that
should work unless they depend on some other patch being applied first.

If you're willing to stress test new stuff, and are prepared to recover your
system from backups, go for it - Linus and Andrew want -rc and -mm kernels to
get more runtime.  But they're definitely *not* the "official" kernels that the
original poster wanted, which are usually called "mainline" or "Linus" kernels.

From where I'm sitting on the sidelines, Reiser4 *can't* make 2.6.12, is a long
shot for .13, and most likely will land somewhere around .14 to .16.  And I'm
going to predict there will be at least one more major bun fight on the lkml
list about pseudo-files before it gets in (sorry Hans, but I have to side with
the guys who said "Cool idea, but it really needs to be at the VFS level")...



pgp2pDwCcdvrX.pgp
Description: PGP signature


Re: Re[2]: reiser4.1

2005-03-03 Thread Valdis . Kletnieks
On Thu, 03 Mar 2005 09:55:20 +0100, Pysiak Satriani said:

 I remember Hans saying that nowadays CPUs are so fast, they compress
 faster than HDDs move the heads around and do the writes. So compression,
 if done properly, can be with no negative impact to speed.
 
 Can you say what level of compression with which processors would handle
 it without speedloss?

IBM's AIX 4.3 and later support LZ compression on their JFS file system.
I was able to measure a 10-15% speed-up by converting /usr to compressed
even on a 133MHz Power604e chip because even back then, it was faster to
read half as many blocks off a SCSI disk and decompress.

So it's been at least a potential win for a decade or so, assuming your
filesystem is able to deal well with fragments (you can't really win unless
you take a 4K or so logical block, compress it to some number of 512-byte
chunks, and then store the resulting chunks cheaply).  JFS does the
blocks-and-frags efficiently, so it's easy to win.  On the other hand,
it would be dreadful for Reiser3 - imagine having to do tail packing
for *every block* (which is what you end up doing, sort of...)
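
A rough Python sketch of the block-to-fragments arithmetic.  This is only an
illustration: zlib stands in for whatever LZ variant JFS actually uses, and
the 4096/512 sizes are the ones mentioned above.

```python
import zlib

BLOCK = 4096   # logical block size
FRAG = 512     # fragment size the compressed block is stored in


def fragments_needed(block):
    """How many 512-byte fragments one compressed logical block occupies."""
    compressed = zlib.compress(block)
    frags = -(-len(compressed) // FRAG)    # ceiling division
    # Never worse than storing the block raw.
    return min(frags, BLOCK // FRAG)
```

The win only shows up if the filesystem can allocate those fragments without
paying more in bookkeeping than it saves in I/O.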


pgp81X4PfC8ZW.pgp
Description: PGP signature


Re: where are reiser4 sources

2005-02-28 Thread Valdis . Kletnieks
On Mon, 28 Feb 2005 15:41:52 PST, Hans Reiser said:

 I am frankly skeptical that one should attempt to clone windows.

That explains why WINE exists, I guess.. ;)


pgpzJsM1Ca7pO.pgp
Description: PGP signature


Re: where are reiser4 sources

2005-02-22 Thread Valdis . Kletnieks
On Tue, 22 Feb 2005 10:41:25 PST, Hans Reiser said:

 That violates the license.

Umm.. from fs/reiser4/README:

Reiser4 is hereby licensed under the GNU General
Public License version 2.

Where in the GPL does it say he can't port to another OS?


pgppgfP4VjYKb.pgp
Description: PGP signature


Re: Plugin for corruption resistance?

2005-02-18 Thread Valdis . Kletnieks
On Fri, 18 Feb 2005 08:36:51 EST, Gregory Maxwell said:

 Tree hashes.
 Divide the file into blocks of N bytes. Compute size/N hashes. 
 Group hashes into pairs. Compute N/2 N' hashes, this is fast because
 hashes are small. Group N' hashes into pairs compute N'/2 N'' hashes
 etc.. Reduce to a single hash.

You get massively I/O bound real fast this way.  You may want to re-evaluate
whether this *really* buys you anything, especially if you're not using some
sort of guarantee that you know what's actually b0rked...

 In my initial suggestion I offered that hashes could be verified by a
 userspace daemon, or by fsck (since it's an expensive operation)...
 Such policy could be controlled in the daemon.
 In most cases I'd like it to make the file inaccessible until I go and
 fix it by hand.

You're still missing the point that in general, you don't have a way to tell
whether the block the file lived in went bad, or the block the hash lived in
went bad.
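
A small Python sketch of why the check alone can't say which side is broken.
It assumes SHA-256 and 4096-byte blocks purely for illustration (the proposal
names neither), and duplicates the last hash on odd levels, which is one
common convention rather than anything specified in this thread.

```python
import hashlib


def block_hashes(data, block_size=4096):
    """One hash per block of the file."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def tree_root(hashes):
    """Pair hashes up level by level until a single root hash remains."""
    level = list(hashes)
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])    # pad odd levels with the last hash
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def mismatch(data, stored_root):
    """True if the data and the stored root disagree."""
    return tree_root(block_hashes(data)) != stored_root
```

Flip one bit in the data, or one bit in the stored root, and mismatch()
returns True either way; nothing in the result says which of the two to trust.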

Sure, if the file *happens* to be ascii text, you can use Wetware 1.5 to scan
the file and tell which one went bad.  However, you'll need Wetware 2.0 to
do the same for your multi-gigabyte Oracle database... :)

(And yes, I *have* seen cases where Tripwire went completely and totally bananas
and claimed zillions of files were corrupted, when the *real* problem was that
the Tripwire database itself had gotten stomped on - so it's *not* a purely
theoretical issue).


pgpk0wA71b8oV.pgp
Description: PGP signature


Re: Plugin for corruption resistance?

2005-02-17 Thread Valdis . Kletnieks
On Thu, 17 Feb 2005 21:43:08 CST, David Masover said:

 This way is easier, though.  But I was thinking about accessing the
 file.  I don't know of any hashes that can be easily updated from part
 of the file, unless you're hashing only pieces of the file in the first
 place, but it'd be nice to not bother hashing at all until the hash is
 needed, especially if we are hashing the whole file.

There's plenty of checksum functions that are quite easily set up for an
incremental update (see RFCs 1141 and 1624 on how to do it for the 16-bit
checksum used for Internet IP packets, which strictly speaking is a
ones-complement sum rather than a CRC).  You'd of course not want to use that
16-bit checksum, but the same basic principle applies to CRC functions.
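
For reference, the RFC 1624 update (equation 3 there, HC' = ~(~HC + ~m + m'))
can be sketched in Python as below.  The Internet checksum is a 16-bit
ones-complement sum; the function names are made up for this example, and the
input is assumed to be a list of 16-bit words.

```python
def fold16(total):
    """Fold carries back in (ones-complement end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """Full computation over a sequence of 16-bit words."""
    return ~fold16(sum(words)) & 0xFFFF


def incremental_update(old_checksum, old_word, new_word):
    """RFC 1624 eqn 3: update the checksum after one word changes."""
    total = (~old_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold16(total) & 0xFFFF
```

The update touches only the old checksum and the one changed word, so its
cost doesn't depend on how much data the checksum covers.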

The problem is that most CRC functions aren't very much good at detecting
multi-bit errors, and when you're talking about hundreds of gigabytes of
disk on a modern RAID, the CRC functions are hardly bulletproof.

On the flip side, hash functions like MD5 or the SHA family are fairly
bulletproof, but are essentially impossible to develop an incremental update for
(if there existed a fast incremental update for the hash function, that would
imply a very low preimage resistance, rendering it useless as a cryptographic
hash).

Also, there's another issue - unlike standard ECC codes that can actually *fix*
the problem (for at least a small number of bit errors), it's unclear what you
should do if you find a mismatch between the hash of a block and the block
contents, as you don't know whether it's the actual data or the hash that's
corrupted.



pgppfOUk0kfEV.pgp
Description: PGP signature


Re: WELCOME to reiserfs-list@namesys.com

2005-02-07 Thread Valdis . Kletnieks
On Mon, 07 Feb 2005 13:28:26 EST, Rick Spillane said:
 Is reiserfs *completely* ACID compliant? Acid meaning Atomicity,
 Consistency, Isolation, and Durability? If not (which I would expect
 is true) then how far away are the offending parts from making
 reiserfs ACID compliant, and where are they in the source?

Do you mean this as "buzzword compliant", or do you have a pointer
to an actual specification or compliance suite?



pgpuuSXiM43ZH.pgp
Description: PGP signature


Re: reiser4 and apache (was: Re: Reiser4 and ZFS)

2004-12-27 Thread Valdis . Kletnieks
On Mon, 27 Dec 2004 13:28:15 +0100, Sander said:
 [EMAIL PROTECTED] wrote (ao):
  For many shops, it's quite likely that a ZFS with more scalability
  and administration is The Right Choice, especially if it does *NOT*
  include lots of odd new features and quirks that might break
  production code (remember the joys in getting Apache running on
  reiser4, until it was discovered that the 'file-as-directory' stuff
  broke programs that weren't expecting it?).
 
 I just got bitten by this. Is it possible to get apache to run on
 reiser4, and if so, how?

I think the archives have pointers to both a reiser4 patch that disables
the file as directory, and a patch to apache to deal with the situation...


pgpvyRJnczF90.pgp
Description: PGP signature


Re: Why is Reiser4 slower then ReiserFS v3

2004-12-27 Thread Valdis . Kletnieks
On Mon, 27 Dec 2004 13:38:12 MST, Dark Shadow said:

 I have three hard drives so I took a file from one and copied it to
 the others and timed it
 source drive /dev/hda Reiser3 Western Digital 40gb 7200rpm
 target1 drive /dev/hdb Reiser4 Western Digital 80gb 7200rpm
 target2 drive /dev/sda Reiser3 Seagate 160gb 7200rpm (SATA but still
 same rpm as rest so it should be the same)

You may wish to run 'hdparm -T -t' on each drive and see what the *raw* speed
is.  All drives are not created equal... ;)

 time cp ~/800mb.file /target1
 real    0m41.409s
 user    0m0.010s
 sys     0m4.364s

 time cp ~/800mb.file /target2
 real    0m38.318s
 user    0m0.017s
 sys     0m5.627s

Similarly, you should try each one 3-5 times and get an average (for
starters, if you have more than 800M of memory, the second time around it
may all still be in cache, so the second time gets a hot-cache boost). It
may be useful to run the command once and *ignore* its times, and then
re-run the command 3 times and average those results (so all 3 times you
actually *use* start from the same "previous command just finished" cache
state).
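
That procedure is short enough to script.  A minimal Python sketch; the
`timed_runs` helper and its parameters are made up for this example, and
`action` is whatever callable does the copy.

```python
import statistics
import time


def timed_runs(action, warmups=1, runs=3):
    """Run action once untimed, then time it `runs` times and average."""
    for _ in range(warmups):
        action()    # result ignored: this only sets the starting cache state
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), samples
```

For the copy test above, `action` could be something like
`lambda: shutil.copy(src, dst)` with `shutil` imported and `src`/`dst` set to
the file and target in question.  This measures wall-clock time only, i.e. the
'real' line from time(1).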


pgpXNoxZMWOWg.pgp
Description: PGP signature


Re: (reiser4)install on root

2004-11-26 Thread Valdis . Kletnieks
On Fri, 26 Nov 2004 00:58:28 PST, BLuEGoD said:

  Hi, i want to know how to install reiser4 on root with only 1 HD.. I use
 mkfs.reiser4 from the reiser4 utils after compile and install kernel  patches
 on a debian woody 2.2.. with a scsi HD, but it crashes (errors found doing
 mkfs.reiser4 on root device and on next boot I saw a kernel panic).. Note: I
 did it with the root mounted.. because i need to boot with that HD..

As several have noted, you can use a spare partition to build a new root
filesystem.  Another option is to use a rescue disk or a Knoppix disk
or other CD-based toolset to boot from, and use that to do your mkfs.reiser4.

It's a good idea to *always* have a rescue disk handy, because they enable you
to recover from many problems that will prevent you from booting all the
way to single-user off your production disk (trashed boot block,
missing/misnamed files in /boot, a need to fsck the boot partition, or
whatever...)


pgpTdyu9Am6zL.pgp
Description: PGP signature


Re: file as a directory

2004-11-22 Thread Valdis . Kletnieks
On Mon, 22 Nov 2004 19:24:36 +0530, Amit Gud said:

  A straight forward question. Wouldn't adding a file as a directory
 mechanism more logical in VFS itself,

There was quite the flame-fest on the lkml a while back regarding
how the semantics of file as a directory should operate.  There's
a number of really nasty corner cases that you need to deal with.

Go back and re-read the whole flame-fest, understand all the points
raised, and let us know when you have a workable proposal.

(Hint - file as directory broke a number of programs that didn't
expect that a file *could* be a directory, when run on a reiser4
filesystem...)


pgpHR2WwtexCi.pgp
Description: PGP signature


Re: Reiser4 kernel modules

2004-11-01 Thread Valdis . Kletnieks
On Mon, 01 Nov 2004 14:07:45 +0100, Lars Tobias Børsting said:

 And why does reiser4 need changes in the kernel code? Is it really a
 smart approach to require kernel changes for reiser4 to work?
 
 Why isn't it possible to build reiser4 as kernel modules?

It still requires the change to the kernel to add the proper code
in linux-2.6.9/fs/reiser4, add it to the proper Makefiles, the Kconfig
glue needed to build it, etc.  Adding in-tree code is a change to the
kernel, even if it ends up getting built as modules.

Or you *can* build it as an out-of-tree module - which still has some
rough edges in the 2.6 Kbuild infrastructure (most notably, if you
do a 'make modules_install', you have to remember to re-install all your
out-of-tree stuff as well...)


pgpHTEcqbsHgz.pgp
Description: PGP signature


Re: Directory updates in filesystems

2004-10-29 Thread Valdis . Kletnieks
On Fri, 29 Oct 2004 00:26:18 CDT, David Masover said:

 If this is about locking not working well with NFS, why not ensure that
 the directory itself is owned by root and read-only before attempting?
 Wait -- don't answer that...

No, this is a different problem.

Imagine a directory with 10K files called 0001, 0002, 0003, and so on.
You start a 'readdir()' loop, and get to 5497 or so.  At this point,
another process removes 1260 through 1265, and then another process renames 8534 to
1263, putting it in the slot just vacated - and you reach the end of
the readdir() loop never seeing that file.
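
A toy Python model of that race.  It assumes a hypothetical directory laid out
as a flat array of slots that get reused on rename; real filesystems differ,
and tracking "inodes" as plain integers is only there to tell the files apart.

```python
def scan_with_concurrent_rename(n_files=10000, pause_at=5497):
    """Simulate a readdir() loop racing with removes and a rename."""
    # Slot i holds (name, inode), or None once the entry is removed.
    slots = [(f"{i:04d}", i) for i in range(1, n_files + 1)]
    seen_inodes = set()
    for pos in range(len(slots)):
        if pos == pause_at:
            # Another process removes 1260 through 1265...
            for victim in range(1260, 1266):
                slots[victim - 1] = None
            # ...and a third renames 8534 to 1263, reusing a freed slot.
            _, inode = slots[8534 - 1]
            slots[8534 - 1] = None
            slots[1263 - 1] = ("1263", inode)
        entry = slots[pos]
        if entry is not None:
            seen_inodes.add(entry[1])
    live_inodes = {entry[1] for entry in slots if entry is not None}
    return seen_inodes, live_inodes
```

The file that started out as 8534 is alive in the directory the whole time,
but the scan finishes without ever having returned it.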

 | Are there any file systems that fully address this issue, or POSIX calls that
 | guaranteed to make an atomic readdir, without specific locking, or must a
 | lock be obtained on the directory to ensure that the read is consistent. I
 | think that locking is needed in the application if complete consistency is
 | required because the underlying behaviour of the OSes/filesystems is so
 | variable in this regard, but I'd be interested in understanding what
 | characteristics a filesystem would have to have to avoid this.
 
 Maybe an atomic readdir operation?  Does reiser4 do atomic reads?

Do you *REALLY* want to lock the *entire* dir (probably in memory, which
can hurt for directories with 10Ks or 100Ks entries, which is where the
problem is most evident)?  Even if it's not locked in memory, the mere
locking against updates can be *painful* performance-wise.

 I know reiser4 (or at least should by 4.1) have a sys_reiser4 api which
 does atomic write operations.  That is:  application starts the
 transaction, does a bunch of writes, ends the transaction.  If at any
 point there is a failure, filesystem tells application to roll back.

Atomic operations don't help you here, unless you're willing to take a
locking performance hit.  Remember that rename() is *already* atomic (at least
from other processes' viewpoint), and you have the "rename into a slot
you've passed" problem mentioned above...


 This allows read-only access, such as a web server, to operate on
 slightly stale snapshots as this would create.  When faced with a
 decision of:
 
 - serving a slightly stale page immediately
 - making users wait for a write of a newer version to complete
 - serving a half-written newer version
 
 I am sure most web admins would choose the first option, which is what
 they would get if the pages were being updated with vim.  The difference
 is that the filesystem solution works on larger units than single files.

The problem is that if you're a mail server, you probably *don't* want to
be sending a slightly stale version of the mail that just got queued.  There,
the only realistic option is your "make users wait" - which may be intolerable
when you're trying to do millions of transactions an hour...


pgpl3McrPt7Pi.pgp
Description: PGP signature


Re: Interesting deletion idea

2004-10-08 Thread Valdis . Kletnieks
On Fri, 08 Oct 2004 19:52:14 EDT, John Richard Moser said:

 I thought the DOD algorithm was 7 pass?

Citation please?  If you have a better reference than DoD 5220.22-M,
feel free to share it.

 If this is going on rapidly, there's no point in trying to completely
 destroy the disk for *every* logical operation; but buffering the
 operations and then only doing the most recent one, and destroying the
 area before that one exactly, would be OK.  The idea is that rapid
 overwrites from userspace get collapsed into a single overwrite; and
 then the kernel overwrites a bunch of times before flushing that data to
 disk to securely erase it.

The point is that you have no really good way to know beforehand that
the flurry of writes is over, and it's time to collapse the writes into
a single write.  

To demonstrate using your example:

FILE *a = fopen("/some/file.txt", "r+");  /* error checking omitted */
fseek(a, 0, SEEK_SET);
fputc('N', a);
fseek(a, 0, SEEK_SET);
fputc('D', a);
fseek(a, 0, SEEK_SET);
fputc('X', a);

At what point do you do the overwrite?  You place it just before the
fputc 'X' - but you can't really delay to that rather than at the
'N' or 'D' unless you *know* that the 'X' one will happen 'Soon Enough'.
There's also the point that fputc() is stdio and buffered by default,
unless you've called fflush() or setlinebuf() or similar.  Even if you
look at the read()/write() syscall level, the Linux kernel will almost
certainly automatically do most of the needed collapsing in the buffer
cache code (look at fs/buffer.c for the gory details) - in fact, most
of the time, you need to use fsync() or similar to *force* the data to
actually get to the disk (often, the data doesn't go out until long after
the process has actually exited - and then there's the different way
that the different I/O elevators schedule things, just to add another
layer of unpredictability into things).  The end result is that it's
a lot harder than it looks to get this right...
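
To show how many layers sit between "the program wrote a byte" and "the byte
is on the platter", here is a minimal Python sketch of the same three
overwrites with the forcing calls spelled out.  The function name is made up
for this example; flush() and os.fsync() are the standard calls.

```python
import os
import tempfile


def overwrite_and_sync(path, values=(b"N", b"D", b"X")):
    """Overwrite byte 0 repeatedly, then push the final state towards disk."""
    with open(path, "r+b") as f:
        for value in values:
            f.seek(0)
            f.write(value)       # goes into the userspace buffer first
        f.flush()                # userspace buffer -> kernel cache
        os.fsync(f.fileno())     # ask the kernel to write it to the device
```

Without the last two calls, the program has no say over when (or how many
times) that first byte actually gets written out.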

In addition, doing the overwrite at *THAT* point is *the wrong point* - as
you're about to overwrite the block at least once *anyhow*.  You *really* need to
be doing erasing in the handling for the unlink() and (f)truncate() syscalls,
because *that* is the point you're freeing the disk blocks - and the point of
erasing is to prohibit scavenging of old data off the disk.  This has the added
benefit of being something you *can* do basically at the filesystem's leisure,
subject to a requirement that you return blocks to the free list fast enough
to prevent disk space exhaustion (which is trickier than it looks - under heavy
file create/write/read/unlink loads, you need to be doing it as fast as possible
at exactly the time you have the least idle bandwidth - at worst case, a 3-pass
erase of all blocks will limit you to 25% of the effective write bandwidth in a
steady-state high-load situation).
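
The 25% figure is just a count of writes per block: one for the data itself
plus one per erase pass.  A trivial Python sketch, assuming every pass costs
the same as a normal write (the function name is made up here):

```python
def effective_write_fraction(erase_passes):
    """Share of raw write bandwidth left for real data in steady state."""
    # One write for the data itself, plus one write per erase pass.
    return 1 / (1 + erase_passes)
```

By the same arithmetic, a 7-pass erase would leave one write in eight for
real data.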

Also, you *really* need to be *very* careful regarding write barriers and the
like - look at the linux-kernel archives for the last few months, where there's
been a *long* series of threads about the problems on IDE.

Basically, if the drive has a write cache on it, you have to either disable
it or jump through some *real* hoops in order to get strictly correct write
barrier semantics (and on some drives, the situation is totally impossible).






Re: [PATCH] make fs/reiser4/search.c compile with gcc 4.0

2004-09-22 Thread Valdis . Kletnieks
On Wed, 22 Sep 2004 20:45:55 +0200, Grzegorz Jaśkiewicz said:

 I know gcc 4.0 is still in its alphas.
 The obvious solution is to move the function declared in another function
 upwards. Since it's static anyway, it won't make any difference.
 Please consider applying to repo.

I'm not sure it's a good idea to be trying to fix reiser4 code to compile
with an alpha compiler, at least without a *very* firm commitment from the gcc
crew that this *is* a real error in the reiser4 code and not a bug in gcc
causing a spurious message.

Why does this code compile cleanly with gcc 3.x and fail with a 4.0 alpha?
Without knowing that, it's *STUPID* to change the code to suit the alpha...




Re: [PATCH] make fs/reiser4/search.c compile with gcc 4.0

2004-09-22 Thread Valdis . Kletnieks
On Wed, 22 Sep 2004 21:03:45 +0200, Grzegorz Jaśkiewicz said:

 This code uses a stupid gcc extension that is not present in gcc 4.0.

OK, if it's using a GCC extension that's already officially deprecated in 3.X
and will be removed in 4.0, then that *is* a good reason to fix the code.




Re: The argument for fs assistance in handling archives

2004-09-02 Thread Valdis . Kletnieks
On Thu, 02 Sep 2004 19:43:34 CDT, David Masover said:

 And on apps.  Should I teach OpenOffice.org to do version control?
 Seems a lot easier to just do it in the kernel, and teach everything to
 do version control in one fell swoop.

Including files you didn't really want to keep version control of?

How many temp files does gcc create and unlink in the course of a kernel build?
(And remember, you can't say don't enable that on /tmp - gcc respects the
setting of $TMPDIR - so an 'export TMPDIR=~/tmp' confuses things quite
nicely...)
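
As a small illustration of why a path-based "don't version /tmp" rule leaks, here's a sketch using Python's tempfile module as a stand-in for gcc (the helper name temp_dir_for is made up).  Like gcc, tempfile honors $TMPDIR, so temp files land wherever the environment says, not where the rule expects.

```python
import os
import tempfile


def temp_dir_for(tmpdir_value):
    """Return the directory tempfile would use with TMPDIR set to tmpdir_value."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir_value
        tempfile.tempdir = None  # drop the cached choice so TMPDIR is re-read
        return tempfile.gettempdir()
    finally:
        # Put the process back the way we found it.
        tempfile.tempdir = saved_cache
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env
```

Pointing it at an existing, writable directory under your home directory returns that directory rather than /tmp, which is exactly the case that confuses a per-directory versioning policy.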

And it's hard for the kernel to know that an unlink() done by gcc should be
treated differently than the "recover the last version" you *want* it to be able
to do after you work on a source file for a long while, save it, and then
fumble-finger a 'rm * .o' - you can't even use a heuristic like "don't version
control it unless it's N seconds or more old".

(Note that the obvious solution of creating a chattr flag has its own
complexity issues - should versioning be turned on by default for some types
and not others, etc...)

There be dragons here - it's not as simple as drop in a plugin and be happy.





Re: The argument for fs assistance in handling archives

2004-09-02 Thread Valdis . Kletnieks
On Thu, 02 Sep 2004 20:11:13 CDT, David Masover said:

 It'd be like writing OpenGL entirely in software, before hardware
 accelerators work, and at the last minute have to change the library to
 use triangles instead of splines.

I expect that SGI did a software-only version of IrisGL first, so they could
figure out what the hardware accelerators needed to support.  And even then,
the API for IrisGL got modified when it became OpenGL.





Re: Was able to reproduce cp: cannot stat file.x: Input/output error

2004-08-10 Thread Valdis . Kletnieks
On Tue, 10 Aug 2004 01:31:17 PDT, Hans Reiser said:

 Thanks for explaining
 
 sync;sync;sync;halt
 
 I always felt I was failing to grok something.

As was the author who recommended it.  It started out as:

# sync   (this one schedules the I/O)
# sync   (just a time waster typing)
# sync   (just a time waster typing)
# halt   (and we finally actually shut down).

The disks on the old PDP and Vax 750 boxes were actually sluggish enough that
if you had a whole 1M or 2M in the buffer cache to flush out, it was actually not
difficult to enter sync, hit return, enter halt, hit return, and have the
halt happen before sync finished, doing the predictable to the non-journaled file
systems.  Empirical studies showed that even on the biggest-memory boxes,
sync could almost always finish with 2 time-wasters before the halt.. ;)





Re: Was able to reproduce cp: cannot stat file.x: Input/output error

2004-08-09 Thread Valdis . Kletnieks
On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:

 I think I have discovered the problem - unless there was a reason mongo was
 issuing mount/unmount commands at the start/end of a mongo 'run' as well as
 before/after _each phase_.

 Probably someone wanted to separate the measurement of the phases.  It 
 has been a while since I read mongo.

Note that an unmount/mount pair will force a flush of all dirtied pages in the
in-memory file cache, and *really* not return until it's really done and really
out on disk.  In addition, sync() will force stuff to disk, but *not* invalidate
in-cache pages - more drastic measures are needed if you want to benchmark
with a cold cache (which is almost a must if you're doing actual filesystem
benchmarking, as otherwise you're benching the in-core cache instead).
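
For a single file, one less drastic measure than unmount/mount is to flush it and then hint the kernel to drop its cached pages.  Here's a minimal sketch in Python; the helper name is made up, posix_fadvise() is not available on every platform, and the hint is purely advisory, so a serious benchmark should not rely on it alone.

```python
import os


def write_then_evict(path, data):
    """Write data durably, then hint the kernel to drop its cached pages."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # dirty pages must reach disk before they can be dropped
        if hasattr(os, "posix_fadvise"):
            # Advisory only: the kernel may keep some or all of the pages.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return written
```

The fsync() comes first because flushing and invalidating are separate steps: flushing alone leaves the pages in the cache, which is the point made above about sync().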

As an aside, although the Linux fs/buffer:do_sync() won't return until it's
all really done, there is no mandate that the sync() syscall wait (and in fact,
that is the source of the old "type 'sync' three times, then 'halt'" advice - the
second and third times you typed sync and hit return hopefully gave the I/O
scheduled by the *first* sync time to complete).  At least one 'Unix for Dummies'
book proved their lack of depth of understanding when they recommended:

# sync;sync;sync;halt

;)





Re: Fibration questions

2004-07-23 Thread Valdis . Kletnieks
On Fri, 23 Jul 2004 12:28:49 +0300, Markus Törnqvist said:

 I think desktops for all the Joe Q. Averages are pretty much a different
 scene from servers..

It's not as different as you might think.  Remember that in most corporations
that use Active Directory, all the infrastructure boxes (domain controllers, etc)
are Windows boxes too.  Quite recently, an amazing number of webservers got 0wned
because somebody browsed the net using IE while logged on at the server console.

 And how many times has the global RAM market been put under severe strain
 because the latest Windows upgrade needed more RAM, so everybody went out and
 bought more RAM, and more RAM, and more RAM...
 
 But Windows isn't the only thing that starts requiring more RAM, and
 if you can buy more for a lesser price, that's what you'll do, regardless.

No, the price of RAM went *UP*, dramatically, because demand was higher
than supply, so you were buying less for a higher price.

The point is that the manufacturers of RAM and systems had *no* incentive
to do anything to stop it.

Microsoft is expected to recommend that the average Longhorn PC feature a
dual-core CPU running at 4 to 6GHz; a minimum of 2 gigs of RAM; up to a
terabyte of storage; a 1 Gbit, built-in, Ethernet-wired port and an 802.11g
wireless link; and a graphics processor that runs three times faster than those
on the market today.

http://www.microsoft-watch.com/article2/0,1995,1581842,00.asp

Now *try* to convince me that Dell and HP saw this, and their first thought
was "Let's see if we can get it to run well on a single-core 3GHz with 1G of RAM" ;)

If that was their first thought, the second was "OK, I'm done laughing, now I need
to pick myself up off the floor".

 Maybe I'm the unexperienced obnoxious adolescent again, as I'm in only
 my second job so far, but I've noticed that both employers have the
 principle that if you can get anything, even the slightest guarantee,
 that something is faster and more stable at a somewhat higher cost, it's 
 worth it. Even if you'd be paying for a scapegoat-factor warranty.

Right. Which is why you end up *buying* that faster server at higher cost than
you might really need.

Most managers have a *really* hard time dealing with the concept "If you use this
alternative, totally free, no-cost, software, it will run faster and save you money".

 Tune even faster solution and get even more power, it'll last us
 all weekend, before it goes obsolete...

You'd be *amazed* at how many sites *don't* have somebody on the payroll
who can do tuning well.  Usually, it's whatever they remember from the MCSE
exam.  Just because my shop has people experienced in tuning everything from
old ferrite-core systems to top-10 supercomputers doesn't mean every shop does. ;)

 So a Dell 2650 could have handled what the 6600 did?

No, the 2650 would certainly have gotten swamped, the two boxes are doing
different things.  The point is that *DELL* didn't have any incentive to get
me to buy a 2650 instead of another 6600.

And if I had little clue, and actually talked to a Dell sales rep, they probably could
have convinced me I needed a 6600.  

 And they're still selling 6600s, how big an impact would Reiser4's speed
 advantage have really on them? But it seems I'm over over my head now :)

Trust me, it wouldn't have helped enough to get the 6600's workload to fit
on a 2650.

 Should I have said safe instead of secure? Maybe that would be the better
 English word for it.
 Like being safe at power failures.

Is it *demonstrably* better than ext3 with 'data=journal'?

 Then there's view security, which should be implemented.

Ahh.. but view security doesn't do you as much good as it could, mostly
because of the support at the VFS level issues.

 I make my meager living as a small-time administrator and writer of
 web (and similar) magick in Python, so I don't know why the xattrs 
 couldn't be mapped to Reiser4 calls, but shouldn't it be technically possible?

I'll refrain from saying anything except "read the list archives".
 
 But these are just ideas, I have absolutely zero marketing experience
 so this should not be taken as a presumptuous manual on how to do things :)

You'd have more luck not talking to the people who sell hardware or systems, but
to the people who *use* hardware and systems, or who sell consulting/maintenance.

For instance, Google has multiple large server farms, each of which has 15K to 20K
systems in it.  They're a Linux shop, and would probably be willing to part with a
fairly large sum of cash if it meant their hardware upgrade costs went down even 5%.

There's lots of places making money doing custom one-off solutions based on
Linux - for instance, most of IBM's Linux revenue comes from consulting/
support. A shop that's doing systems integration might well be willing to pay
$100K for another thing in their bag of tricks that lets them land 20 contracts
that make them $10K profit each, by being able to 

Re: ext3 - reiserfs conversion utility?

2004-06-18 Thread Valdis . Kletnieks
On Fri, 18 Jun 2004 23:00:35 CDT, David Masover said:

 Do backups.  Now.  You are an idiot and/or a cheapskate if you don't
 have backups, because one day something will happen -- probably
 something ridiculously stupid -- and you will need them.  I mean, go
 build a backup server and, if you can afford it, give it something like
 a terabyte raid5 hotplug array.  Do it now.

And if possible, don't rely on the fact that raid5 is redundant.  I know
somebody who worked at a dot-com, and a PHB bought a large 2-terabyte RAID5 in
the days when 2T was still pretty big.  Said PHB refused to buy a separate
backup, since it was hot-swap RAID5.  Friend voiced objections, and PHB gloated
the first 2 times a single disk died and the system automagically rebuilt onto
a hot spare.

Then poetic justice arrived - a plumbing problem on a floor above caused
multiple thousands of gallons to decide the fastest way to ground level was
through the RAID5. Everybody immediately started updating their resumes,
because they *knew* the company was doomed when all their data went away.
Except for the PHB - his resume used to be on the RAID5...;)





Re: snapshot, checkpoints

2004-06-04 Thread Valdis . Kletnieks
On Fri, 04 Jun 2004 13:10:05 +0200, Paul Wagland said:

 Can't the same functionality be created with device mapper though? At least
 under linux anyway?

You'd need 2 things:

1) *very* recent patched device mapper (I think patches for snapshot support
went by on LKML just day before yesterday or so).

2) You also need a suitable write-barrier interlock to the filesystem, to
basically force a flush-to-disk of all the incore data buffers, etc (basically,
you need to ensure that at the instant the snapshot is taken, the on-disk copy
is clean by fsck standards).

At that point, you can just have a utility that goes flush; snapshot; and go on
your way.





Re: Processes dying?

2004-06-04 Thread Valdis . Kletnieks
On Fri, 04 Jun 2004 23:39:25 +0300, [EMAIL PROTECTED] (Markus Törnqvist) said:
 Hello
 
 I just started using the latest auto-snapshot.
 
 I noticed weird behavior, that is, processes crash, so bad even C-c doesn't
 kill them. For some reason running strace behind them gives me C-c support.

My guess is that some *other* process got wedged in the kernel while holding
a kernel lock, causing other processes to block when they needed that lock.

Probably will need a SysRq-T output to figure out who's hung where...




Re: The situation at hand and in the future

2004-05-28 Thread Valdis . Kletnieks
On Fri, 28 May 2004 09:33:24 +0300, Markus Törnqvist said:

 Persistent over boots? I'd like the passphrase and key to survive
 a boot...

No you don't.

If the passphrase and key are persistent, then an attacker can get your data.

Think about it - the only reason an attacker doesn't have access to your
data is because they don't have the passphrase/key.  If you leave them around,
you've given away the keys to the kingdom.




Re: [PATCH] metas in reiserfs v4 snapshot 2004.03.26

2004-05-17 Thread Valdis . Kletnieks
On Sat, 15 May 2004 14:10:10 +0300, Markus Törnqvist said:

 This has been discussed. There is the mailer that uses an @-named
 symlink to the current message. Can't remember which one.

That would be MH, nmh, and exmh...




Re: reiser4 non-free?

2004-05-11 Thread Valdis . Kletnieks
On Tue, 11 May 2004 09:03:11 PDT, Hans Reiser said:
 [EMAIL PROTECTED] wrote:

 I pondered the whole credits question for a bit last night, and I realized
 that (a) I could account for at least the last 75 'mkfs' commands I had caused
 to run, and (b) of those 75, exactly *one* did *not* have all of its output
 swallowed by 'anaconda' during a RedHat or Fedora install.

 Yes, I believe that, and that is my concern. 

Unfortunately, this way lies madness - the IBM AIX install stuff *does* scroll
the copyrights for every single program as it installs.

It *sucks*.  Badly.

I've had system upgrades that have had 12 *thousand* lines of output from
copyright notices - and the way IBM makes sure that you see them is by
outputting them to the same place as the error messages, so you can't just
redirect the notices.

And yes, at least twice I've managed to render a system unbootable because I
missed something important scrolling by.

The problem isn't "What do we do if Hans does this?".  The problem is that
Fedora Core 2 is coming in at just under 1,700 RPMs, of which some 750 are
installed in a default workstation install.

What happens when the *other* 749 do it too?  You end up with a tragedy
of the commons - *none* of the 750 get read, and the installer becomes
unusable.





Re: reiser4 non-free?

2004-05-11 Thread Valdis . Kletnieks
On Tue, 11 May 2004 10:57:01 PDT, Hans Reiser said:

 Random credits are the elegant answer.  Displaying only the distro name 
 at boot time is morally wrong.

Would be nice - the RedHat/Fedora GUI installer already supports showing the
current install status in one pane, and scrolling through a bunch of blurbs
in another.  It might be possible to get (at least) the Fedora side of the fence
to include blurbs for the package contributors as well...

I'm uncomfortable with the very large leap between "a request for the distro
to do the morally right thing" and "required by license", however.  As the old
saying goes: "Don't let your mouth write no check your butt can't cash" - a
distro could very well be willing to accept something on a "good faith
best effort" basis, but be unwilling to commit to "required to under all
circumstances".




Re: reiser4 non-free?

2004-05-10 Thread Valdis . Kletnieks
On Mon, 10 May 2004 21:49:15 EDT, Walter Landry said:

 The question is rarely what Debian needs to do, but rather what Debian
 promises that the users will be able to do.  Suppose that someone
 wanted to use Reiser4 on a miniature burnable CD for elections.  The
 mini-CD holds the person's vote, but it has to be initialized before
 it can be used.
 
 So whenever the poll worker initializes the mini-CD, they have to
 mentally remove the credits before they can parse the results of the
 filesystem initialization.

If you're giving a poll worker (or any similar usage - a kiosk, a point-of-sale
terminal, etc) a machine, and they have to do that, you're doing something
Very Wrong.

If it requires any more intelligence than "turn it on; if a big RED circle
comes up, yell for help; if a big GREEN square with a check box comes up,
you're ready to go", you've failed your class in GUI Design 101. ;)









Re: Metas

2004-04-26 Thread Valdis . Kletnieks
On Sun, 25 Apr 2004 13:09:26 +0300, Markus =?UNKNOWN?Q?T=F6rnqvist?= said:
 On Sat, Apr 24, 2004 at 11:38:05PM -0500, David Masover wrote:

 scripts, it'd be different, but we're talking about something that I 
 know I'd find a _manual_ use for.  Even if you say that I can set it to 
 
 You check it once to see if the kernel has the default names for
 the metas directory. Oh, bugger, it's changed to METAS, then I guess
 I'll use cd METAS in the future
 
 As for scripts it's easy to say `cat`

For bonus points, try writing a bash script fragment that will Do The Right Thing
whether or not the file system in question is Reiser4.

It really sucks when your script does:

cd `cat /sys/names/parent_directory`

only to have it bomb out because cat can't open it.

It sucks even worse if it actually *works* because some b0rked script
had previously done:

mkdir `cat /sys/names/parent_directory`

Don't believe me? I've seen plenty of scripts that will do an 'rm $foo'
where $foo is a parameter passed - so somebody runs the script with

/usr/local/bin/foo -o /dev/null

and later on you're trying to figure out why /dev/null is a 49 megabyte
regular file.. ;)
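
The /dev/null accident is avoidable if the script checks what it's about to remove.  Here's a minimal sketch in Python (the function name is made up) that only removes ordinary files and leaves device nodes, directories, and symlinks alone.  There is still a small race between the check and the remove, so this is a guard against fumble-fingers, not against an attacker.

```python
import os
import stat


def remove_regular_file(path):
    """Remove path only if it is an ordinary file; return True if removed."""
    try:
        mode = os.lstat(path).st_mode  # lstat: don't follow symlinks
    except FileNotFoundError:
        return False
    if not stat.S_ISREG(mode):
        # Device nodes, directories, symlinks, FIFOs: leave them alone.
        return False
    os.remove(path)
    return True
```

Called on /dev/null, it returns False and the device node survives, so a later redirect doesn't quietly create a regular file in its place.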




Re: Do xml-like namespaces make sense for Reiser4? (re: metas thread)

2004-04-13 Thread Valdis . Kletnieks
On Tue, 13 Apr 2004 13:47:12 CDT, John D. Heintz said:
 foo/nsa:permissions - foo/nsa.gov/secure-linux/permissions
 
 This assumes a mapping from nsa - nsa.gov/security/.
 
 The characters up to the ':' would be looked up in a namespaces map, and 
 if found the substituion would occur before further name- object 
 resolution. I don't think this breaks the goals set out in the Future 
 Vision white paper, but I guess that is exactly what I am asking you ;-)

This needs to be done *very* carefully, as all sorts of ugliness can result
from a user being able to muck about with the mappings.

Anybody who's ever used a VMS system and fiddled with the values of the
various SYS$WHATEVER values knows what I mean - and yes, there were
security holes caused by software that made the rash assumption that the
SYS$FOO variables weren't altered by the user.





Re: [PATCH] metas in reiserfs v4 snapshot 2004.03.26

2004-04-06 Thread Valdis . Kletnieks
On Tue, 06 Apr 2004 23:05:45 +0400, Nikita Danilov said:
   Meath \Meath\, Meathe \Meathe\, n. [See {Mead}.]
  A sweet liquor; mead. [Obs.] --Chaucer. Milton.

On the other hand, both those Chaucer and Milton blokes have
been dust for quite some time, and the language has moved on.

Does anybody outside the Renaissance Fair circuit still even drink mead? ;)





Re: secure delete?

2004-03-24 Thread Valdis . Kletnieks
On Tue, 23 Mar 2004 09:22:58 +0300, Hans Reiser said:
 
 Secure delete doesn't work against people who have the necessary 
 equipment to scan the media and find remnants due to track misalignment.

Notice that the people who have the necessary equipment and assume that the
adversaries are equally well-stocked don't seem to agree:

Canadian RCMP TSSIT OPS-II says: Must first be checked for correct functioning
and then have all storage areas overwritten once with the binary digit ONE,
once with the binary digit ZERO and once with a single numeric, alphabetic or
special character,  (http://jya.com/rcmp2.htm)

American DoD 5220.22-M says: Overwriting all addressable locations with a
character, its complement, then a random character and verify.

That's our official government recommendation for what's sufficient when
we're throwing away stuff that the Other Guys might actually do this to.
I have to assume that if the DoD or RCMP thought this wasn't sufficient
to protect *our* secrets, they'd have a stricter standard.
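
As a rough sketch of what the "character, complement, random character, verify" sequence looks like for a single file, here is a Python version.  The function name and the default fill character are made up, and this is not a certified implementation of either standard.  It also assumes the filesystem rewrites blocks in place, which is not a given, and the verify step reads back through the kernel's cache rather than off the platter.

```python
import os


def overwrite_three_pass(path, char=b"\x55", chunk=65536):
    """Overwrite a file in place: char, its complement, a random char; verify.

    Returns True if the final pass reads back as written.
    """
    size = os.path.getsize(path)
    complement = bytes([char[0] ^ 0xFF])
    random_char = os.urandom(1)
    with open(path, "r+b") as f:
        for fill in (char, complement, random_char):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                f.write(fill * n)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push each pass out before starting the next
        # Verify the last pass by reading the whole file back.
        f.seek(0)
        remaining = size
        while remaining > 0:
            block = f.read(min(chunk, remaining))
            if not block or block != random_char * len(block):
                return False
            remaining -= len(block)
    return True
```

The fsync() between passes matters: without it, the kernel is free to collapse the three passes into one write of the final pattern.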




Re: ReConfigurable Directory Structure Agrregation of files according to semantic.

2004-03-22 Thread Valdis . Kletnieks
On Tue, 16 Mar 2004 23:10:04 EST, Hubert Chan [EMAIL PROTECTED]  said:
 And document files too.  I'm looking forward to being able to being able
 to scrap this strange hierarchy system that I'm currently using for all
 my documents.  Email, too, would do well with this system.  Just toss
 all the mail in a single folder, and have your MUA query the filesystem
 for mails from the ReiserFS list, or mails from friends, etc.

Ad-hoc query support in the file system (or even in user space) is always a
problematic issue, because there's so many corner cases that result in a DWIM
interface problem.

For example:

If you query your music filesystem for Eric Clapton, should it return While My
Guitar Gently Weeps by the Beatles?  If you ask it for songs written by the
artist Prince, should it return Manic Monday by the Bangles (the album
credit says Christopher)?  The music industry is *full* of that - and queries
like that Just Don't Work unless your metadata is accurate.

Bonus points for being able to handle music by Metallica before they heaved
Mustaine overboard and he went off to make Megadeth - what year did he leave,
and are the songs all *accurately* tagged for release dates?

If some idiot in Zanzibar says ooh shiny and clicks on an attachment they
shouldn't have, and starts spewing mail to you that has a friend's address in
the From: field, should mail from friends find it?

For that matter, how does my MUA know who friends are?  I have some people
that would count as friends who I correspond with on a much lower frequency
than some idiots that I'd rather never hear from again (but have to deal with
due to various obligations).  Equally problematic is when an old college
classmate drops me a note asking about our supercomputer, as an off-list reply
to something I said on a security mailing list (actually happened recently).
Is that a security, or friends, or supercomputing, or VT News, or all/other?
And how does it know, other than simple word-indexing schemes?  (I
already use 'glimpse', but even that gets painful when your e-mail archive goes
back 15 years and totals over a gigabyte - compound searches take *forever*.)

Semantic analysis is a royal pain - I can't expect the computer to be able to
figure out meanings in order to classify them, when *I* can't do it (I have at
least 10 or 15 pieces of mail that require a reply, but I haven't figured out
yet what the fleep the author was talking about..)





Re: Proposal for keying encrypted filesystem

2003-04-06 Thread Valdis . Kletnieks
On Sun, 06 Apr 2003 21:14:36 EDT, Pierre Abbat [EMAIL PROTECTED]  said:

 The tape monkey might could overwrite an encrypted file on disk with random 
 gibberish.

The problem we started discussing was that a backup system needs *read*
access to something isomorphic(*) to your data in order to back it up, but
you may not trust your tape monkey, backup software, or a myriad other things.

(*) The trick here is that we don't *actually* have to back up the data,
we need to back up something that will be a copy of the data once restored.
Properly done, the ciphertext of the data fulfills this requirement.

If your tape monkey has *write* access to the files, you're in trouble whether
or not it's encrypted.  Even in this case, encryption helps - if the tape
monkey manages to re-write a block with random gibberish, it will almost
certainly be noticed fairly quickly (Oracle complaining about corrupted
databases, a sudden 4K block of noise in the middle of a document, etc).
The tape monkey will almost certainly be blocked from doing something
subtle (like changing one block on an Oracle database so his salary goes
up 30%).





Re: Proposal for keying encrypted filesystem

2003-04-04 Thread Valdis . Kletnieks
On Fri, 04 Apr 2003 09:30:29 EST, Pierre Abbat [EMAIL PROTECTED]  said:

 But I'd also like to be able to have several encrypted directories on one 
 partition, with different keys, such that when I give the key any process 
 with the right UID can access them. I might have a cron job that needs access
 to encrypted data.

You need to apply least privilege - you don't give the key to any process
that doesn't need it.  In your example, you would make sure that any process
running under UID nnn gets given the key, so that other processes couldn't
do anything even if they *did* access them.

Properly applied, you can even leverage it further - for instance, if your
backup process doesn't have the key tokens, you can safely let it have access
to all the files - it can read the 127 meg of data to back it up in a bitwise
manner, but it can't actually DO anything with the data - this is something
that you can't do in the "give everything the token" model.




Re: Proposal for keying encrypted filesystem

2003-04-04 Thread Valdis . Kletnieks
On Fri, 04 Apr 2003 20:36:49 +0400, Edward Shushkin said:
 Pierre Abbat wrote:
  
  On Friday 04 April 2003 09:47, [EMAIL PROTECTED] wrote:
 
  If a process that has no key tokens attempts to read an encrypted file with
  the ordinary syscalls, does it get an error or the ciphertext?
 
 Error. Wanna backup - give a valid key, and backups will be cpu-expensive.. 

In this case, you want it to return the ciphertext, so the backup process can
run cheaply and securely.  Among other things, if somebody steals the backup
tapes, they can't restore your system image.

And yes, this is a major issue for some sites - you've got some near-minimum
wage tape monkey taking your corporate data to the offsite vault, and you
want to be sure that even if he leaves with the tapes, it doesn't hurt you.

Having the backup read the ciphertext is more secure (and faster) than having
it encrypt on the way to the tape - among other things, this prevents the
underpaid tape monkey from bribing the encryption key for backups from the
backup admin, because they don't HAVE a key...




Re: Snapshots a la NetApp?

2003-03-26 Thread Valdis . Kletnieks
On Wed, 26 Mar 2003 21:37:10 +0300, Hans Reiser said:
 Heinz-Josef Claes wrote:
 btw: The only handycap of the used method is that all the links to a
 file can have only *one* set of permissions (chmod, chown). Will it be
 possible to change this with a loadable module in reiser4?

 I am not sure if we will support that.  If someone wanted to write the 
 code to do it, then I could examine the problem more closely and it 
 could be done.

Please don't.

There are just *WAY* too many Unix/Linux semantics that depend on the
concept that an inode describes a file, with all it implies - including the fact
that all hard links have the same owner/perms.  (Hint - how would you store 2
inodes that had different owner/perm/mtime/ctime/etc but the same block list, and
still have it fsck correctly?  If you have a linked list of owners/perms, how do
you determine which one to use? etc etc..)  I'm not even going into aspects
like programs that creat() and then unlink() in order to get an anonymous temp
file that goes away at last close, etc etc.

You want different permissions, use ACLs.
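
If you want to see the "one inode, one set of permissions" behavior for yourself, here's a small Python sketch (the function and file names are made up).  It gives a file a second name with a hard link, changes the mode through the first name, and stats both.

```python
import os
import stat


def link_and_chmod(directory):
    """Create a file with two names, chmod one, and stat both."""
    first = os.path.join(directory, "first")
    second = os.path.join(directory, "second")
    with open(first, "w") as f:
        f.write("data\n")
    os.link(first, second)   # second name, same inode
    os.chmod(first, 0o600)   # change perms through the first name only
    a, b = os.stat(first), os.stat(second)
    return (a.st_ino == b.st_ino,
            stat.S_IMODE(a.st_mode),
            stat.S_IMODE(b.st_mode),
            b.st_nlink)
```

Both names report the same inode number and the same mode, because the permissions live in the inode and the directory entries are just names pointing at it.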



pgp0.pgp
Description: PGP signature

