Remount bug (?) in 2.6.21-rc4-mm1

2007-03-26 Thread Jonathan Briggs
This is a very odd problem:

First, I did fsck the Reiser4 partition; no errors were found.

The machine:
A Compaq R3000 laptop running 64-bit Gentoo.

Here's the problem:
I boot 2.6.21-rc4-mm1 into Gentoo single user mode.
I cat /etc/ld.so.preload.
It should contain /usr/local/lib64/libnosync.so
Instead it contains an equal number of zeros.
Weird, I say.

I copy a backup file into place, or mv one, it does not matter.
Reboot.
Boot single user, /etc/ld.so.preload is empty AGAIN.

So I add custom debugging to my init scripts.  The file is safe all
through shutdown and remount read-only seems OK.

On boot, the file seems OK, UNTIL the initscripts remount the file
system.

This does NOT happen on 2.6.20.

I could try to learn how to git bisect I guess.  Or maybe someone here
knows the problem already?

Thanks.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


signature.asc
Description: This is a digitally signed message part


Is as_ops.c releasepage patch still needed?

2006-08-21 Thread Jonathan Briggs
The following patch was posted to the Reiser4 list August 3 by zam.  Is
it still needed?  It solved many problems for me, making my systems able
to actually complete full Beagle indexing.

But I have not seen this patch show up in the last two mm kernel
releases.  Did something else fix it or is this patch still needed?


Index: linux-2.6-git/fs/reiser4/as_ops.c
===
--- linux-2.6-git.orig/fs/reiser4/as_ops.c
+++ linux-2.6-git/fs/reiser4/as_ops.c
@@ -350,6 +350,11 @@ int reiser4_releasepage(struct page *pag
 	if (PageDirty(page))
 		return 0;
 
+	/* extra page reference is used by reiser4 to protect
+	 * jnode<->page link from this ->releasepage(). */
+	if (page_count(page) > 3)
+		return 0;
+
 	/* releasable() needs jnode lock, because it looks at the jnode fields
 	 * and we need jload_lock here to avoid races with jload(). */
 	spin_lock_jnode(node);

-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: create very large file system

2006-07-20 Thread Jonathan Briggs
On Thu, 2006-07-20 at 13:17 +0200, Christian Iversen wrote:
> On Thursday 20 July 2006 08:26, Andreas Dilger wrote:
> > On Jul 19, 2006  16:57 +0400, Alexander Zarochentsev wrote:
> > > On Wednesday 19 July 2006 16:10, Mark F wrote:
> > > > I've tried to create a large 5TB file system using both reiserfs and
> > > > ext3 and both have failed.
> > >
> > > you might need to convert the partition table to GPT format for
> > > supporting 2TB+ partitions.  it can be done by the gnu parted tool.
> >
> > Or, for that matter, don't use a partition table at all, since this
> > adds an unhelpful offset to all the filesystem structures and can
> > hurt performance on RAID where the filesystem is trying to align IO
> > to RAID stripe boundaries.
> 
> Can linux still auto-detect raid volumes if there's no partition table?

You're not supposed to be doing it that way these days.  RAID autodetect
is getting tossed out of the kernel in the future (probably still many
versions away though), and RAID, DM, LVM, and maybe even regular
partition setup are going to be done in initramfs / initrd.

At least, that is what I read.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: another semantic storage system (in userspace)

2006-07-13 Thread Jonathan Briggs
On Thu, 2006-07-13 at 16:30 -0400, Hubert Chan wrote:
> On Thu, 13 Jul 2006 10:38:23 -0700, Hans Reiser <[EMAIL PROTECTED]> said:
> 
> > Clay Barnes wrote:
> 
> >> 1) Scope
> >> a) Should the semantic content of files be purely user-defined?
> 
> > Yes.
> 
> I guess this also raises the question of how multiple users on the same
> machine can define their own semantic content (e.g. if user A wants to
> index some new file format, but doesn't want to have to bug the
> administrator to add support for it).  Will the filesystem be talking to
> some userspace daemons?

I was thinking that the file system should only index its own meta-data
attributes.  A user-space daemon should read the file contents and
create these attributes.

Search directories would display selected parts of the indexes.  One of
these that would be highly useful for a user-space indexing daemon is a
timestamp search directory.  The indexer would begin with the timestamp
search set to (UID == my user and timestamp > 0).  After indexing a few
files it would update the search to (UID == my user and timestamp >
timestamp of last indexed file).  Or possibly, if Reiser4 has something
like a 64-bit monotonic update ID, it could use that instead of a
timestamp.

If the filesystem indexes are not going to be updated in real-time but
only at specific times, another search type that could list updated but
not yet indexed files would also be useful.
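
The timestamp-cursor idea above can be approximated in user space today.
This is only a rough sketch under stated assumptions: there is no search
directory, so it walks the tree with os.walk instead, and the names
files_newer_than and index_batch are made up for illustration.

```python
import os

def files_newer_than(root, uid, cursor_ns):
    """Return (mtime_ns, path) pairs owned by uid with mtime_ns > cursor_ns, oldest first."""
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if st.st_uid == uid and st.st_mtime_ns > cursor_ns:
                found.append((st.st_mtime_ns, path))
    return sorted(found)

def index_batch(root, uid, cursor_ns, index, batch_size=100):
    """Index up to batch_size changed files and return the advanced cursor."""
    for mtime_ns, path in files_newer_than(root, uid, cursor_ns)[:batch_size]:
        index[path] = mtime_ns      # stand-in for real content indexing
        cursor_ns = mtime_ns        # cursor = timestamp of last indexed file
    return cursor_ns
```

One weakness of a pure timestamp cursor shows up here: if two files share
the same timestamp and the batch ends between them, the second one is
skipped on the next pass.  A monotonic update ID would not have that
problem.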
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: any way to disable fsync?

2006-07-11 Thread Jonathan Briggs
On Tue, 2006-07-11 at 23:03 +0200, Łukasz Mierzwa wrote:
[snip]
> So my question is: is there any way to disable fsync for reiser4? (beside  
> patching it to fake fsync instead of doing them).

http://ftp.die.net/pub/qmail-tools/libnosync.c
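
Judging by its name, libnosync is presumably a preload shim that turns
fsync() into a no-op for every program that loads it; that reading is an
assumption, not something checked against the linked source.  The same
interposition idea, sketched at the Python level (fsync_disabled is a
hypothetical name, and this only affects callers that go through the os
module):

```python
import contextlib
import os

@contextlib.contextmanager
def fsync_disabled():
    """Temporarily replace os.fsync and os.fdatasync with no-ops."""
    names = [n for n in ("fsync", "fdatasync") if hasattr(os, n)]
    saved = {n: getattr(os, n) for n in names}
    for n in names:
        setattr(os, n, lambda fd: None)   # pretend the sync succeeded
    try:
        yield
    finally:
        for n, func in saved.items():
            setattr(os, n, func)          # always restore the real calls
```

As with the preload library, data "synced" inside the block is not
actually guaranteed to be on disk, so this trades safety for speed.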

-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: ReiserFS v3 choking when free space falls below 10%?

2006-07-06 Thread Jonathan Briggs
On Thu, 2006-07-06 at 08:43 -0700, Mike Benoit wrote:
[snip]
> My desktop machine (v2.6.16, same as my MythTV box) is running with 9%
> free space right now and it is not experiencing any slow down. I think
> the problem is caused by the usage pattern of MythTV and how it
> simultaneously streams one or more large files to the HD in relatively
> small chunks over a long period of time.

Hasn't someone patched MythTV to pre-allocate (zero-write) the video
files to the expected sizes?

I was sure I'd read about that somewhere...
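
For reference, zero-write pre-allocation is simple to do in user space.
A minimal sketch, assuming the recorder knows the expected size up front
(preallocate_zeroed is a made-up name, not MythTV code):

```python
import os

def preallocate_zeroed(path, size, chunk=1 << 20):
    """Create path and fill it with size zero bytes, written in large chunks."""
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())    # make sure the blocks are really allocated
```

The recorder would then open the file with mode "r+b" and overwrite it in
place.  Whether this actually reduces fragmentation depends on the file
system's allocator, so treat it as something to measure.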
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: ReiserFS slow, need help diagnosing

2006-06-05 Thread Jonathan Briggs
On Mon, 2006-06-05 at 09:22 -0700, Hans Reiser wrote:
> Grzegorz Kulewski wrote:
> 
> >
> >>
> >> Well, inode location in reiser4 changed compared to reiserfs. reiser4
> >> groups inodes of files of one directory together (reiserfs did not do
> >> that), but still allocates disk space for inodes dynamically, as
> >> reiserfs does.
> >> So, I guess that reiser4 will be better than reiserfs, but
> >> still worse than ext[23]. Would you verify this guess, please?
> >
> I would not assume this.  There is a huge difference with respect to
> this usage pattern between reiser4 and reiser3, it should dramatically
> improve.  I don't know if we will be better or worse than ext3, it could
> be either, best to measure it.

I also use Pan, and recently switched to using Reiser4 on my laptop.  I
can tell that Pan starts much more quickly than it used to using
Reiser3.  It isn't nearly as fast as starting it from a USB Flash drive
though.  I don't know how it compares to ext3, although the flash drive
was using an ext2 loopback on a FAT-32 filesystem.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Reiser4 crash 2.6.16-mm1

2006-03-28 Thread Jonathan Briggs
On Tue, 2006-03-28 at 12:08 -0800, Hans Reiser wrote:
> Jonathan Briggs wrote:
[...]
> >And if it's a production machine, it is using ECC RAM, I would hope.  If
> >it is, memory problems (unreported ones, anyway) are very, very
> >unlikely.
> >  
> >
> Jonathan, be merciful, ECC ram last I checked is twice the cost of
> regular and the motherboards cost more too.  (I am sure the cost to produce is <
> 15% more, which makes it a great pity Intel does not standardize on
> requiring it and force it to be cheap)  Some folks need to save money. 
> Yeah, I know, this time it may have cost him more in cost of his time
> but we are all just assuming it is memory.  Unfortunately, unless he
> checks it or we see an identical error message from another user with
> checked memory, or vs tells me he sees a flaw in the code, we need to
> assume it is memory.
[...]

Yes, I know. :)

But for a production machine that is "producing" something of value, the
extra cost should not be an issue.  RAM errors are so subtle and so hard
to find that ECC is of far more value than RAID.  It is obvious when
your disk fails.

An extra high bit in a credit transaction could cost you $16,384 and you
might not ever realize what happened. :)
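
The $16,384 figure is 2^14: one flipped bit at position 14 of an amount
stored in whole dollars.  A tiny worked example (flip_bit is just an
illustrative helper):

```python
def flip_bit(value, bit):
    """Return value with the given bit position inverted."""
    return value ^ (1 << bit)

amount = 25                       # a $25 transaction, in whole dollars
corrupted = flip_bit(amount, 14)  # a single-bit memory error in bit 14
difference = corrupted - amount   # 2**14 dollars
```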

Anyway, off topic, but ECC is highly recommended.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Reiser4 crash 2.6.16-mm1

2006-03-28 Thread Jonathan Briggs
On Tue, 2006-03-28 at 07:34 -0800, Joachim Feise wrote:
[...]
> This is a production machine that I can't take offline for too long.
> But yes, I have compiled the kernel on another reiser4 partition over night,
> without problems.
> If this was a memory problem, it would indeed manifest itself in other areas
> with more or less random errors. The fact that it does not indicates to me
> that this is a fs problem. So, at this point I am ruling out a memory issue.

And if it's a production machine, it is using ECC RAM, I would hope.  If
it is, memory problems (unreported ones, anyway) are very, very
unlikely.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: State of the Reiser4 FS

2006-03-15 Thread Jonathan Briggs
On Tue, 2006-03-14 at 23:14 -0800, Hans Reiser wrote:
[snip]
> They claim that if we don't use the ext3 code
> in our fs then they will be forced to shoulder an extra burden to
> maintain our code.  We are not allowed to specify that they should not
> maintain our code at all.  I need to read more Kafka I think, it is hard
> for me to understand it all.

Err, this actually does make a lot of sense Hans.

The mainline Linux Kernel code is maintained by everyone that can
convince Linus or a sub-maintainer to accept their patch.  In order to
produce an acceptable patch to core kernel code that provides
file-system services, a patch author must also change the file systems
that use it.  He or she cannot just leave the change lying around for
everyone else to fix.  (At least, not usually.)

The only code that doesn't have to be maintained by main-line patch
contributors is out of tree, which is where Reiser4 is now.  Code that
no one is interested in maintaining or can't be maintained gets kicked
out of tree.

Some of your reasons to get it into main-line kernel include: more trust
by end users that your code is stable, avoiding your team having to do
their own fixes for cross-kernel changes, and a wider user base through
ease of use (not having to apply extra kernel patches).

Right?

User trust comes through passing the code reviews, and users knowing
that if the Reiser4 team vaporizes, the code can still be maintained and
the file-system won't disappear.

Avoiding extra work for cross-kernel patches means that other people
have to be able to make changes to your code.

All of that means that accepting Reiser4 code into main-line does mean
they have to maintain it.

-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




SCSI caching and Reiser4

2006-03-07 Thread Jonathan Briggs
I thought I'd ask the list, since I've been wondering about it for a few
versions now, and I haven't been able to figure it out from the code
(although I haven't looked *that* hard).

My SCSI disks report that they have write back cache w/ FUA, which
should mean that write barriers work.  Am I right?

But Reiser4 complains that it is disabling write barriers.

/dev/md2 is a RAID-0 of the two SCSI devices.  I suspect that the MD
driver isn't supporting FUA properly but I haven't been able to figure
that out for sure.

Does anyone know for sure about MD block devices and write barriers?

Here are some of the interesting bits from dmesg:
Linux version 2.6.16-rc5-mm2 ([EMAIL PROTECTED]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 PREEMPT Fri Mar 3 14:06:12 MST 2006
...
SCSI device sda: 35843686 512-byte hdwr sectors (18352 MB)
sda: Write Protect is off
sda: Mode Sense: ab 00 10 08
SCSI device sda: drive cache: write back w/ FUA
SCSI device sda: 35843686 512-byte hdwr sectors (18352 MB)
sda: Write Protect is off
sda: Mode Sense: ab 00 10 08
SCSI device sda: drive cache: write back w/ FUA
 sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 35843686 512-byte hdwr sectors (18352 MB)
sdb: Write Protect is off
sdb: Mode Sense: ab 00 10 08
SCSI device sdb: drive cache: write back w/ FUA
SCSI device sdb: 35843686 512-byte hdwr sectors (18352 MB)
sdb: Write Protect is off
sdb: Mode Sense: ab 00 10 08
SCSI device sdb: drive cache: write back w/ FUA
 sdb: sdb1 sdb2 sdb3 sdb4
sd 0:0:1:0: Attached scsi disk sdb
...
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb4 ...
md:  adding sdb4 ...
md: sdb3 has different UUID to sdb4
md: sdb1 has different UUID to sdb4
md:  adding sda4 ...
md: sda3 has different UUID to sdb4
md: sda1 has different UUID to sdb4
md: created md2
...
<5>reiser4[ktxnmgrd:md2:ru(1368)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
NOTICE: md2 does not support write barriers, using synchronous write instead.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: external journal questions

2006-02-24 Thread Jonathan Briggs
On Fri, 2006-02-24 at 11:58 +0100, Jure Pečar wrote:
> On Thu, 23 Feb 2006 04:21:46 -0600
> David Masover <[EMAIL PROTECTED]> wrote:
> 
> > Where can I find the paper on why this makes sense?  Because offhand,
> > it doesn't, unless you're hoping that the majority of transactions
> > can be flushed on boot, rather than unrolled.
> 
> Can't point you to any specific paper, but you can imagine running a
> large mailserver for hundreds of thousands of users. Plenty of
> small, random io, almost as many writes as reads. That's where ssd for
> journal makes sense.
>  
> > I'm going to assume you aren't talking about v4, since this sounds
> > like a mission-critical production-style environment.  As I
> > understand it, v4 has a completely different way of doing journaling.
>  
> Right and right.
>  
> > I'm replying to you, not because I actually have an answer for you,
> > but because your case seems interesting, and I'm curious how Reiser4
> > handles it.
> 
> Check namesys.com on "wandering logs" :)
> 
> > Problem is, I see nowhere for this to fit in the current model of
> > Reiser4.  As I understand it, there is no concept of a separate
> > "journal" device, or of writing a file twice, because the vast
> > majority of writes are simply written out to disk in the new
> > location, and then the "commit" is updating the metadata to point to
> > the new location and free the old.
> 
> I suppose this "wandering logs" concept is going to be much better than
> "journal file/device" concept ext3 uses, but right now it sounds like
> it needs some more optimization work.
> 
> The cost here we all want to avoid is called seek time. Even today,
> it's still measured in milliseconds and that's a couple orders of
> magnitude more than gigahertz cpus tick at. Reiser4 is on a good way to
> decrease this cost by spending some more cpu ticks, but because I need
> a solution "yesterday" (welcome to the real world ... or how they
> say:), I'm looking for a more traditional approach, ssd journal.

It does sound like Reiser4 will have problems with the small email files
scenario.  Each email file will be written out and immediately sync'd,
so R4 will not have any opportunity to build up its usual delayed
allocation and wandering log.

With an external journal on SSD, data journaling and Ext3 or Reiser3,
the small email files become very efficient because the journal writes
move in a simple linear pattern across the journal.  fsync() can return
immediately once the data is in the journal and the file system can move
data from the journal to the actual disk in its own time.
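
The data-journaling pattern described above, as a toy user-space sketch.
It is not how ext3 or Reiser3 implement it, there is no crash-recovery
replay, and DataJournal is an invented name; it only shows why the sync
path becomes one sequential append.

```python
import os

class DataJournal:
    """Toy data journal: commits are sequential appends, files are written later."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.pending = []                    # (target_path, data) not yet checkpointed
        self.log = open(journal_path, "ab")  # append-only, so writes stay linear

    def commit(self, target_path, data):
        """Append one record and fsync the journal; durable once this returns."""
        header = f"{target_path}\t{len(data)}\n".encode()
        self.log.write(header + data)
        self.log.flush()
        os.fsync(self.log.fileno())
        self.pending.append((target_path, data))

    def checkpoint(self):
        """Move journaled data to its final location, then empty the journal."""
        for target_path, data in self.pending:
            with open(target_path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
        self.pending.clear()
        self.log.truncate(0)

    def close(self):
        self.log.close()
```

With the journal file on a fast device, commit() never seeks on the main
disk; the scattered small-file writes all happen in checkpoint().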
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Wtr.: Re: 2.6.15.1+reiser4 Kernel Panic

2006-01-30 Thread Jonathan Briggs
Clay Barnes wrote:
> To amplify what Hans (accurately) wrote, once you see bad sectors,
> that means that you already have many more than you see, and that your
> hard drive is all out of spare sectors to silently swap out for when
> it finds one.
>
> Basically, there's so much wrong with the surface that your hard drive
> can't hide any more flaws, and has given up trying.
>
> Hans is right:  If you see a single bad sector, the drive is
> considered unsafe and trash for even the tightest owner.
I don't believe that's quite true.  It's my understanding that a drive
may have a correctable read error.  The data may be too damaged to read
from that sector, but rewriting the sector can repair it or remap it to
a new undamaged area.  The drive cannot make up the lost data though, so
it must report the error.

The damage can come from a partial write during power down or from data
that was marginal to start with and hadn't been read in a long time.
(Drives will remap sectors even on successful reads, when they have to
use too much error correction to get the data, but don't generally do
full surface sweeps to check.  SMART can be told to do full surface
checks, however.)

Don't take the existence of a single bad sector as proof that the drive
is trash.  Instead, check the SMART status and look at the remap rates
and error rates.  SMART should have a nice summary for you too, either
OK, FAILING or FAILED (or similar phrases).


Re: Authoring a versioning plugin

2006-01-12 Thread Jonathan Briggs
On Wed, 2006-01-11 at 22:44 -0800, Hans Reiser wrote:
> Hans Reiser wrote:
> >  I am skeptical that having it occur with every
> >write is desirable actually.
> >  
> >
> Consider the case where you type cat file1 >> file2.  This will produce
> a version of file2 for every 4k that is in file1, because (well I didn't
> look at the bash source, but I would guess) it appends in 4k incremental
> writes rather than one big write.  Versioning on file close makes more
> sense
[snip]

Not that my opinion means anything. :-) But I agree with Hans that file
close is the place to create the new version.  The plugin should track
the writes (and mmap flushes) between file open and close, then on file
close it can process everything into a reverse binary diff to save
permanently.
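
A reverse diff keeps the newest version whole and stores, for each older
version, only what is needed to rebuild it from the newer one.  A minimal
sketch of that idea using Python's difflib on bytes; the function names
are made up, and a real plugin would use a proper binary delta format.

```python
from difflib import SequenceMatcher

def make_reverse_delta(new, old):
    """Describe old as copies out of new plus literal bytes found only in old."""
    delta = []
    matcher = SequenceMatcher(None, new, old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # same bytes exist in new
        elif j2 > j1:
            delta.append(("data", old[j1:j2]))   # replace/insert: keep old bytes
        # "delete": bytes exist only in new, nothing to store
    return delta

def apply_delta(new, delta):
    """Rebuild the older version from the newer one and the stored delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += new[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

On file close the plugin would compute make_reverse_delta(new, old), keep
the new contents as the live file, and save only the delta.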
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Slowdown is gone & apt-get works with updated reiser4. So nevermind...

2005-11-11 Thread Jonathan Briggs
On Fri, 2005-11-11 at 13:59 +, John Gilmore wrote:
[snip]
> 
> There is, btw, one main reason that I've decided that whatever trouble it may 
> cause, and whatever growing pangs I may experience along the way, root 
> reiser4 is worth it. Does anybody remember GoBack? It was a versioning system 
> for windows 95/98 that was incredibly flexible and useful.
[snip]

I have a "goback" utility for my Linux machines.  It's a 300 GB USB2
drive and rdiff-backup. :)

You could do a similar thing to GoBack with any recent Linux kernel and
inotify, actually.

Create a series of snapshot directories that contain copies of every
file.  Use inotify to watch all changes.  Every 15 minutes update the
snapshots with the changed files.

Or you might be able to hook it into rdiff-backup or something like it,
to update only the changed files every 15 minutes, without needing to do
full disk scans to find the changes.
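
The snapshot half of that scheme might look like the sketch below.  The
watcher is assumed: something (an inotify listener, which is not in the
Python standard library) collects the set of changed paths, and every 15
minutes hands it to a function like this hypothetical snapshot_changed.

```python
import os
import shutil

def snapshot_changed(changed_paths, src_root, snap_root, label):
    """Copy each changed file into snap_root/label, keeping relative paths."""
    dest_root = os.path.join(snap_root, label)
    copied = []
    for path in sorted(set(changed_paths)):
        if not os.path.isfile(path):
            continue                       # deleted meanwhile, or not a regular file
        rel = os.path.relpath(path, src_root)
        dest = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)           # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

Because only the reported paths are touched, no full disk scan is needed.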
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Our introduction to Reiser-list

2005-10-27 Thread Jonathan Briggs
On Thu, 2005-10-27 at 12:20 -0700, Peter van Hardenberg wrote:
> On October 27, 2005 04:17 am, David Masover wrote:
> > Peter van Hardenberg wrote:
> > > On October 26, 2005 10:02 am, John Gilmore wrote:
> > >
> > > And I thought the whole idea was to unify the namespace and make things
> > > like ID3 tags obsolete...
> >
> > The two are not mutually exclusive.  You unify the namespace, and use
> > that to access things like ID3 tags.  Of course, eventually ID3 tags
> > become obsolete, and the information is instead stored outside of the
> > file itself, as a separate stream (treated as a file).  You'd have a
> > standard way of serializing any given file and all its metadata, so that
> > "something like id3" doesn't have to be re-invented for every file type
> > that has metadata, and so that similar metadata can be accessed through
> > a standard mechanism -- searching for a particular artist should return
> > songs (using id3 tags) and music videos (using the mpeg equivalent) and
> > maybe even song lyrics (using separate metadata).
> 
> It's much easier, more extensible, and more secure to create a utility which 
> ties together a number of userspace metadata libraries to create static files 
> than to move them all down into kernel space. I feel plugins providing 
> pseudofiles should only be used when there is no viable alternative.

And with transactions, it can be safe against becoming out of sync
(which is one of the arguments for doing it in-kernel).

Scenario:
Reiser4 detects a file update about to happen.
Transaction opens.  Changes are now invisible outside
transaction.
File contents are updated.  
User space helper is called.
    Metadata files are updated.
User space helper exits.
Transaction closes.  File and metadata updates are now visible.
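
The visibility rule in that scenario, modelled with an in-memory store.
This is purely an illustration of the ordering; StagedStore and its
methods are invented here and say nothing about the real Reiser4
transaction interface.

```python
class StagedStore:
    """Toy store in which a transaction's writes become visible all at once."""

    def __init__(self):
        self.visible = {}

    def transaction(self):
        return _Transaction(self)

class _Transaction:
    def __init__(self, store):
        self.store = store
        self.staged = {}                   # changes invisible outside the transaction

    def write(self, name, data):
        self.staged[name] = data

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:               # commit only if the helper succeeded
            self.store.visible.update(self.staged)
        return False                       # never swallow the helper's exception
```

The file update and the user-space helper's metadata writes both go
through write(); readers of store.visible see either none of them or all
of them.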
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: iosched (was Re: Full of surprises - A reiser4 story from userland)

2005-09-28 Thread Jonathan Briggs
On Wed, 2005-09-28 at 17:33 -0400, studdugie wrote:
> On 9/28/05, Hans Reiser <[EMAIL PROTECTED]> wrote:
> >  Check out the latest cfq in the latest kernel, it is much better than
> > the others for most applications.  Anticipatory used to be the best, but
> > cfq-3 is better now.
> >
> When you say the best is that a general conclusion for both single
> disks and RAID?

I find (though I have no hard data to provide) that no-op or deadline seems to
work best when working with an intelligent RAID controller.  Just push
the queue to the controller and it'll sort out the best way to get
things.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: I request inclusion of reiser4 in the mainline kernel

2005-09-20 Thread Jonathan Briggs
On Tue, 2005-09-20 at 13:57 -0400, Theodore Ts'o wrote:
> On Tue, Sep 20, 2005 at 09:41:36AM -0500, David Masover wrote:
> > And personally, if it was my FS, I'd stop working on fsck after it was 
> > able to "check".  That's what it's for.  To fix an FS, you wipe it and 
> > restore from backups.
> 
> If that's Reiser4's philosophy, just make sure you tell all of your
> prospective users about it ahead of time.  Is it really an explicit goal
> that performance is so important beyond all other considerations that
> it's OK if the filesystem is extremely fragile and that if anything
> goes wrong, you have to wipe things and recover from backups?!?
[snip]

I think that you missed the bit "if it was my FS" that David wrote.  As
far as I know, Hans Reiser defines the Reiser philosophy.

I use Reiser3 and Reiser4 on all my systems and fsck has always worked
even if it has been much slower than I would like.  The only problems
I've experienced have been on the same level as when an ext2/3
filesystem fsck dumps several directories of unlabeled files into
lost+found.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Howto make reiser-3.6 a bit more standby friendly?

2005-08-23 Thread Jonathan Briggs
On Tue, 2005-08-23 at 14:39 +0200, Clemens Eisserer wrote:
> Hi there,
> 
> I am currently tinkering with an isdn-router with a harddrive which should
> mostly be in stand-by.
> However it seems the drive is rather unimpressed by a 5s timeout
> setting specified with hdparm and goes in standby very seldom although
> the computer is running 100% idle (even no isdn-services started or
> anything else).
> 
> Could this be because of some journal-writeback stuff or anything like
> that? Is there a way to tune ReiserFS for more standby-friendliness?

Check out a script called laptop-mode.  It adjusts various parameters
including /proc/sys/vm/laptop_mode.  I use Reiser 3 on my laptop and
laptop_mode seems to work pretty well.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Reiser4 and ACLs

2005-08-15 Thread Jonathan Briggs
On Tue, 2005-08-16 at 00:19 +0400, Nikita Danilov wrote:
> Hubert Chan writes:
>  > On Sun, 14 Aug 2005 17:24:17 +0400, Nikita Danilov <[EMAIL PROTECTED]> said:
>  > 
>  > > Not exactly. As a matter of fact, ACL and EA support was already
>  > > implemented in reiser4. But it used standard xattrs API to interface
>  > > to the user-land, and it was decided that reiser4 should go
>  > > sys_reiser4() route instead. So, it was reaped.
>  > 
>  > Does this mean that file-as-dir has also been abandoned?
> 
> Sorry, it seems I was too vague. It's exactly the opposite: standard
> xattr API was abandoned in favour of accessing EAs and ACLs through
> pseudo files. The latter method is not implemented yet, and I don't know
> how stable sys_reiser4() API is currently. Hans is the proper person to
> ask.

There is no reason not to support the standard xattr calls, whether it
is done in-kernel or in the C library.  The pseudo files would also be
good, but several programs already use the current xattr support.

Namesys may not want to add the support themselves but I hope they do
not reject contributed code for it.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Fastest way to "find / -mtime +7".....

2005-07-19 Thread Jonathan Briggs
On Tue, 2005-07-19 at 22:09 +0200, Ragnar Kjørstad wrote:
> On Tue, Jul 19, 2005 at 12:48:53PM -0600, Jonathan Briggs wrote:
> > > this is pretty slow on reiser, at least compared with ext2/3, and I  
> > > understand that it may be because the find command returns the names  
> > > in a non-optimal order (ie readdir order?).
> > 
> > I think Reiser3 is slow more because with mtime, find has to stat each
> > file. 
> 
> The two issues are related.
> 
> Readdir will return the filenames in hash order. Find will then go and
> stat each file, still in hash order.
> 
> Problem is, the inodes are not sorted in directory hash order on the
> disk. They tend to be in approximate creation order. So, the disk access
> pattern is nearly random access, meaning every stat is likely to touch a
> new block and readahead is completely useless.
[snip]
How about some kind of stat-data readahead logic?  If the first two or
three directory entries are stat'd, queue up the rest (or next
hundred/thousand) of them.  If the disk queue is given the whole pile of
stat requests at once instead of one at a time, it should be able to
sort them into a reasonable order.

This might even be a VFS thing to do instead of per-FS.
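
Until something like that exists in the kernel, a user-space program can
get part of the effect by reading the whole directory first and sorting
the entries by inode number before calling stat.  A sketch, assuming that
inode order is closer to on-disk order than hash order is (not verified
for Reiser3); files_older_than is a made-up name.

```python
import os
import time

def files_older_than(directory, days, now=None):
    """Return regular files in directory whose mtime is more than days old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    # scandir yields names and inode numbers without a stat() per entry
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.inode())
    old = []
    for entry in entries:                  # stat in inode order, not hash order
        if entry.is_file(follow_symlinks=False):
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                old.append(entry.path)
    return old
```

This handles one directory; a replacement for "find /mail -mtime +7"
would apply it to each directory in the tree.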
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: Fastest way to "find / -mtime +7".....

2005-07-19 Thread Jonathan Briggs
On Tue, 2005-07-19 at 10:55 -0700, Ed Walker wrote:
> I've got a lot of small maildir files stored on a reiser-fs  
> partition.  Currently we expire out the old stuff using
>  find /mail -mtime +7 -type f -print0 | xargs -0 rm -rf
> 
> this is pretty slow on reiser, at least compared with ext2/3, and I  
> understand that it may be because the find command returns the names  
> in a non-optimal order (ie readdir order?).
> 
> Is there something we can do to speed it up?  Any suggestions?
> 
> Thanks-
> 
> Ed

I think Reiser3 is slow more because with -mtime, find has to stat each
file.  I did a couple of ad-hoc tests and a directory listing with stat
seems to be about 40x slower than a plain directory listing.  I didn't
try ext2/3.

I believe the reiser3 directory order problems with maildir are related
to something else, like not finding new mail at the end of the
readdir()? 
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: reiser4 plugins

2005-07-06 Thread Jonathan Briggs
On Wed, 2005-07-06 at 15:53 -0500, David Masover wrote:
> Jonathan Briggs wrote:
[snip]
> > /a/b/c, /a/b/d, /a/b/e, /b
> > mv /a/b /b
> > Now you have to change the stored grand-parent inodes for /a/b/c, /a/b/d
> > and /a/b/e so that they reference /b/b instead of /a/b.
> 
> no, c, d, and e all would reference /a/b's *inode*, not *pathname*. 
> Then /a/b would reference /a.  So with the above mv, you just change the 
> reference in /a/b (now /b/b) to point to /b as parent.

I misunderstood the reference to parent(s).  It's a list of parent
inodes, one parent per hard link, not a list of all (grand)*parents up
to the root.

But I see it now.
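
The corrected picture as a small model: each node stores a reference to
its parent's inode, and path names are derived by walking upward.  Node,
path_of and move are illustrative names, and only a single parent is
modelled (with hard links it would be a list).

```python
class Node:
    def __init__(self, ino, name, parent=None):
        self.ino = ino
        self.name = name
        self.parent = parent      # reference to the parent inode, not a path

def path_of(node):
    """Derive the path by following parent references up to the root."""
    parts = []
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))

def move(node, new_parent):
    """A rename across directories touches only the moved node."""
    node.parent = new_parent

root = Node(1, "")
a = Node(2, "a", root)
b = Node(3, "b", root)
a_b = Node(4, "b", a)
c = Node(5, "c", a_b)
move(a_b, b)                      # mv /a/b /b
```

After the move, c still points at the same parent inode; only that
inode's own parent reference changed, so c's derived path is now /b/b/c.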
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: reiser4 plugins

2005-07-06 Thread Jonathan Briggs
On Wed, 2005-07-06 at 15:51 -0400, Hubert Chan wrote:
> On Wed, 06 Jul 2005 12:52:23 -0600, Jonathan Briggs <[EMAIL PROTECTED]> said:
[snip]
> > It still has the performance and locking problem of having to update
> > every child file when moving a directory tree to a new parent.  On the
> > other hand, maybe the benefit is worth the cost.
> 
> Every node should store the inode number(s) for its parent(s).  Not the
> path name.  So you don't need to do any updates, since when you move a
> tree, inode numbers don't change.

You do need the updates if you change what inode is the parent.

/a/b/c, /a/b/d, /a/b/e, /b
mv /a/b /b
Now you have to change the stored grand-parent inodes for /a/b/c, /a/b/d
and /a/b/e so that they reference /b/b instead of /a/b.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: reiser4 plugins

2005-07-06 Thread Jonathan Briggs
On Tue, 2005-07-05 at 23:44 -0700, Hans Reiser wrote:
> Hubert Chan wrote:
> >And a question: is it feasible to store, for each inode, its parent(s),
> >instead of just the hard link count?
> >  
> >
> Ooh, now that is an interesting old idea I haven't considered in 20
> years makes fsck more robust too

Hey, sounds like the idea I proposed a couple months ago of storing the
path names in each file, instead of in directories.  Only better, since
each path component isn't text but a link instead.

It still has the performance and locking problem of having to update
every child file when moving a directory tree to a new parent.  On the
other hand, maybe the benefit is worth the cost.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.




Re: reiser4 plugins

2005-07-05 Thread Jonathan Briggs
On Tue, 2005-07-05 at 17:46 +0200, Martin Waitz wrote:
[snip]
> Filesystems are there to store files.
> Everything else can be done in userspace.

You could do filesystems in userspace too and just use the kernel's
block layer.

In fact you can reduce the OS kernel to just interrupts, memory
management, context switches and message passing.

Everything else can be done in userspace, but that doesn't always make
it a good idea.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-06-02 Thread Jonathan Briggs
On Thu, 2005-06-02 at 14:38 +0400, Nikita Danilov wrote:
> Jonathan Briggs writes:
>  > On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
>  > [snip]
>  > > Frankly speaking, I suspect that name-as-attribute is going to limit
>  > > usability of file system significantly.

Usability as in features?  Or usability as in performance?

>  > > 
>  > > Note, that in the "real world", only names from quite limited class are
>  > > attributes of objects, viz. /proper names/ like "France", or "Jonathan
>  > > Briggs". Communication wouldn't get very far if only proper names were
>  > > allowed.
>  > > 
>  > > Nikita.
>  > 
>  > Bringing up /proper names/ from the real world agrees with my idea
>  > though! :-)
> 
> I don't understand why, if you are at liberty to design a new namespace model
> from scratch (it seems POSIX semantics are not binding in our case), you
> are going to faithfully replicate deficiencies of natural languages.
> 
> It is a common trait in both science and engineering that when two flavors
> of the same functionality (real names vs. indices) arise, an attempt is
> made to reduce one of them to another, simplifying the system as a
> result.

An index is an arrangement of information about the indexed items.  The
index contents *belong* to the items.  An index by name?  That name
belongs to the item.  An index by date?  Those dates are properties of
the item.  Anything that can be indexed about an item can be described
as a property of the item.

Only for efficiency reasons are index data not included with the item
data.

> 
> In our case, motivation to reduce one type of names to another is even
> more pressing, as these types are incompatible: in the presence of
> cycles or dynamic queries, namespace visible through the directory
> hierarchy is different from the namespace of real names.

Queries create indexes based on properties of the items.  This is no
different from directories, which are indexes based on names of the
items.

In the same way that you can descend a directory tree and copy the names
found into each item, you can check each item and copy the names found
into a directory tree.

> 
> Indices cannot be reduced to real names (as rename is impossible to
> implement efficiently), but real names can very well be reduced to
> indices as exemplified by each and every UNIX file system out there.
> 
> So, the question is: what real names buy one, that indices do not?

By storing the names in the items, cycles become solvable because you
can always look at the current directory's name(s) to see where you
really are.  Every name becomes absolutely connected to the top of the
namespace instead of depending on a parent pointer that may not ever
connect to the top.

If speeding up rename were very important, you could replace every pathname
component with an indirect reference instead of using simple strings.
Changing directory levels is still difficult.

-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-06-01 Thread Jonathan Briggs
On Wed, 2005-06-01 at 21:27 +0400, Nikita Danilov wrote:
[snip]
> Frankly speaking, I suspect that name-as-attribute is going to limit
> usability of file system significantly.
> 
> Note, that in the "real world", only names from quite limited class are
> attributes of objects, viz. /proper names/ like "France", or "Jonathan
> Briggs". Communication wouldn't get very far if only proper names were
> allowed.
> 
> Nikita.

Bringing up /proper names/ from the real world agrees with my idea
though! :-)

As a person, you have a list of "proper names" that you answer to and
that you prefer.  However, in some cases you will also answer to "Hey,
you over there!" or "Someone who left a white Honda in the parking lot,
please turn your lights off."

So a file could have a list of proper names, but it can also be referred
to in any other way and by any other name.  Proper names would be
preferred, though.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-06-01 Thread Jonathan Briggs
On Wed, 2005-06-01 at 18:42 +0400, Nikita Danilov wrote:
> Jonathan Briggs writes:
>  > On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote:
>  > > Nikita Danilov writes:
> 
> [...]
> 
>  > > 
>  > > That latter bit, about making them persistent, is where the trouble
>  > > begins: once queries acquire identity and a place in the file system
>  > > name-space, they logically become part of that very name-space they are
>  > > querying! This leads to various complications, and you are trying to work
>  > > around them by claiming that queries are not _always_ part of name-space
>  > > ("file1 [only] **appears** to be a child..."). This non-uniform behavior
>  > > is a big disadvantage.
>  > 
>  > In this scheme, query objects were always part of the name-space.
> 
> Then, paths visible through queries are inconsistent with names of
> underlying objects. Your querying system returns fake results
> ("/tmp/A/B/C/A/file1") that are not present in the database the queries
> are run against. This is *wrong*. Nobody is going to tolerate a DBMS that
> sometimes returns extra rows in a SELECT statement, right?

If you wished to enforce name-query directories always having a single
name and their query always being identical to their name, then that
wouldn't happen.

However, query directories (or "smart folders") will have this namespace
problem in every case and there is no avoiding it.  If the query is for
every file modified in the past day, the file path through the query
directory is not going to match any given name of the file.  Same for
keyword queries, ownership queries, or whatever.

In the traditional directory system, a file doesn't have an official
name, just links to it from directory entries.  Perhaps if you think of
the proposed "name" meta-data as a "preferred name" the idea would work
better for you?
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-06-01 Thread Jonathan Briggs
On Wed, 2005-06-01 at 14:43 +0400, Nikita Danilov wrote:
> Nikita Danilov writes:
> 
> [...]
> 
>  > 
>  >  > 
>  >  > Yes. :-)  It is radical, and the idea is taken from databases.  I
>  >  > thought that seemed to be the direction Reiser filesystems were moving.
>  >  > In this scheme a name is just another bit of metadata and not
>  >  > first-class important information.  The name-query directories would be
>  >  > there for traditional filesystem users and Unix compatibility.  They
>  >  > would probably be virtual and dynamic, only being created when needed
>  >  > and only being persistent if assigned meta-data (extra names (links),
>  >  > non-default permission bits, etc) or for performance reasons (faster to
>  >  > load from cache than searching every file).
>  > 
>  > That latter bit, about making them persistent, is where the tr
>  > 
> 
> [Hmm... grue ate my message.]
> 
> That latter bit, about making them persistent, is where the trouble
> begins: once queries acquire identity and a place in the file system
> name-space, they logically become part of that very name-space they are
> querying! This leads to various complications, and you are trying to work
> around them by claiming that queries are not _always_ part of name-space
> ("file1 [only] **appears** to be a child..."). This non-uniform behavior
> is a big disadvantage.

In this scheme, query objects were always part of the name-space.

None of the objects are really children of any of the others. They only
appear to be children when viewed through a set of name-query
directories.  In reality every object would be an equal in the true OID
name-space.  Only meta-data objects are children of their data objects.

You could also create a confusing query named /tmp/G that returned
results for /usr/lib/.  This is the same sort of abuse that creates
A->B->C->A loops: the query was deliberately set to have a misleading
name/name-query relationship.

The user is responsible for sensible naming.  Under normal use, a user
would hardly notice the difference between traditional directories and
this name-query system.

With persistent disk cache of queries and lookup tables for common
names, it does start to look like regular directory structures, but it
is still coming at the problem from the opposite direction.  Traditional
directories store information about a file (its name) outside the file,
and this system would store everything about a file with the file
itself.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
On Wed, 2005-06-01 at 02:36 +0400, Nikita Danilov wrote:
> Jonathan Briggs writes:
>  > On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
>  > > I should create an example.
>  > > 
>  > > Wherever I used True Name previously, use OID instead.  True Name was
>  > > simply another term for a unique object identifier.
>  > > 
>  > > Three files with OIDs of 1001, 1002, and 1003.
>  > > Object 1001:
>  > > name: /tmp/A/file1
>  > > name: /tmp/A/B/file1
>  > > name: /tmp/A/B/C/file1
>  > > 
>  > > Object 1002:
>  > > name: /tmp/A/file2
>  > > 
>  > > Object 1003:
>  > > name: /tmp/A/B/file3
>  > > 
>  > > Three query objects (directories) with OIDs of 1, 2, and 3.
>  > > Object 1:
>  > > name: /tmp/A
>  > > name: /tmp/A/B/C/A
>  > > query: name begins with /tmp/A/
>  > > query result cache: B->2, file1->1001, file2->1002
>  > > 
>  > > Object 2:
>  > > name: /tmp/A/B
>  > > query: name begins with /tmp/A/B/
>  > > query result cache: C->3, file1->1001, file3->1003
>  > > 
>  > > Object 3:
>  > > name: /tmp/A/B/C
>  > > query: name begins with /tmp/A/B/C/
>  > > query result cache: A->1, file1->1001
>  > > 
>  > > Now there is an A -> B -> C -> A directory loop.  But removing
>  > > name: /tmp/A/B/C/A from Object 1 fixes the loop.  Deleting Object 1 also
>  > > fixes the loop.  Deleting any of Object 1, 2 or 3 does not affect any
>  > > other object, because in this scheme, directory objects do not need to
>  > > actually exist: they are just queries that return objects with certain
>  > > names.
> 
> One problem with the above is that directory structure is inconsistent
> with lists of names associated with objects. For example, file1 is a
> child of /tmp/A/B/C/A, but Object 1001 doesn't list /tmp/A/B/C/A/file1
> among its names.

file1 *appears* to be a child because it is returned by the query for
its name /tmp/A/file1: the directory entry A under /tmp/A/B/C is
Object 1, whose query is for /tmp/A/.  If the shell were smart enough to
normalize its path by asking the directory for its name, it would know
that /tmp/A/B/C/A is /tmp/A.  But yes, a naive program could be confused
by the difference between names.
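
A minimal sketch of that normalization step, assuming a hypothetical in-memory table of OIDs and names taken from the example above. The helpers `lookup` and `walk` are invented for illustration, and treating the first listed name of a directory as its preferred name is an assumption.

```python
# Hypothetical model of the example: OID -> list of names.
names = {
    1001: ["/tmp/A/file1", "/tmp/A/B/file1", "/tmp/A/B/C/file1"],
    1: ["/tmp/A", "/tmp/A/B/C/A"],
    2: ["/tmp/A/B"],
    3: ["/tmp/A/B/C"],
}
# Each directory object's query prefix, as given in the example.
queries = {1: "/tmp/A/", 2: "/tmp/A/B/", 3: "/tmp/A/B/C/"}


def lookup(dir_oid, entry):
    """Return the OID of the object holding the name <query prefix><entry>."""
    target = queries[dir_oid] + entry
    for oid, object_names in names.items():
        if target in object_names:
            return oid
    raise FileNotFoundError(target)


def walk(start_oid, rel_path):
    """Follow rel_path from a directory; return (last directory OID, final OID)."""
    parts = rel_path.split("/")
    cur = start_oid
    for part in parts[:-1]:
        cur = lookup(cur, part)
    return cur, lookup(cur, parts[-1])


# Start in /tmp/A (Object 1) and go around the loop once.
last_dir, target = walk(1, "B/C/A/file1")

# Normalize by asking the final directory for its own (first) name.
canonical = names[last_dir][0] + "/" + "file1"
print(last_dir, target, canonical)
```

The walked path /tmp/A/B/C/A/file1 is not among Object 1001's names, but the normalized path is.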

> 
>  > 
>  > I forgot to address Nikita's point about reclaiming lost cycles.  In
>  > this case, let me create Object 4 for /tmp
>  > Object 4:
>  > name: /tmp
>  > query: name begins with /tmp/
>  > query result cache: A->1
>  > 
>  > Now, if we delete Object 4, are Objects 1,2,3 lost?  I would say not
>  > because they still have names.  When the shell calls chdir("/tmp") a new
>  > query object (directory) must be created dynamically, and Objects
>  > 1001,1002,1003 still have their names that start with /tmp and so they
>  > immediately appear again.  Their names still start with /, so the top
>  > level query will still find them and /tmp as well.
> 
> Object 4 is "/tmp". Once it was removed what does it _mean_ for, say,
> Object 1003 to have a name "/tmp/A/B/file3"? What is "/tmp" bit there?
> Just a string? If so, and your directories are but queries, what does it
> mean for directory to be removed? How mv /tmp/A /tmp/A1 is implemented?
> By scanning whole file system and updating leaf name-lists?

Well, the name doesn't mean anything. :-)  It is just convenient
metadata for describing where to find the file in a hierarchy, and for
Unix compatibility.

If a directory was removed by a standard rm -rf, it would work as
expected because it would descend the tree removing names (unlink) from
each object it found.

Moving an object with "mv" would change its name.  Moving a top-level
directory like /usr would require visiting every object starting
with /usr and doing an edit.  A compression scheme could be used where
the most-used top-level directory names were replaced with lookup
tables, then /usr could be renamed just once in the table.
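
A minimal sketch of the lookup-table idea, assuming a hypothetical layout in which each stored name is a tuple of component IDs and one shared table maps each ID to its text. The IDs, paths and function names are invented for illustration. An ID here stands for one particular directory, not for every directory that happens to share the same string.

```python
# Hypothetical layout: component ID -> text of that path component.
components = {0: "usr", 1: "lib", 2: "bin", 3: "libc.so", 4: "ls"}

# OID -> list of names, each name a tuple of component IDs.
names = {
    1001: [(0, 1, 3)],   # /usr/lib/libc.so
    1002: [(0, 2, 4)],   # /usr/bin/ls
}


def render(name):
    """Turn a tuple of component IDs back into a path string."""
    return "/" + "/".join(components[c] for c in name)


def all_paths():
    return sorted(render(n) for object_names in names.values() for n in object_names)


def rename_component(comp_id, new_text):
    """Rename a directory by editing one table entry; no object is touched."""
    components[comp_id] = new_text


before = all_paths()
rename_component(0, "usr2")      # mv /usr /usr2
after = all_paths()
print(before, after)
```

A rename in place is one table write in this model. Moving a subtree to a different depth would still change the tuple stored in every affected object.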

> It seems that what you are proposing is a radical departure from file
> system namespace as we know it. :-) In your scheme all structural
> information is encoded in leaves _only_, and directories just do some
> kind of pattern matching. This is closer to a relational database than
> to the current file-systems where directories are the only source of
> the structural inform

Yes. :-)  It is radical, and the idea is taken from databases.  I
thought that seemed to be the direction Reiser filesystems were moving.
In this scheme a name is just another bit of metadata and not
first-class important information.  The name-query directories would be
there for traditional filesystem users and Unix compatibility.  They
would probably be virtual and dynamic, only being created when needed
and only being persistent if assigned meta-data (extra names (links),
non-default permission bits, etc) or for performance reasons (faster to
load from cache than searching every file).
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
On Tue, 2005-05-31 at 15:01 -0600, Jonathan Briggs wrote:
> I should create an example.
> 
> Wherever I used True Name previously, use OID instead.  True Name was
> simply another term for a unique object identifier.
> 
> Three files with OIDs of 1001, 1002, and 1003.
> Object 1001:
> name: /tmp/A/file1
> name: /tmp/A/B/file1
> name: /tmp/A/B/C/file1
> 
> Object 1002:
> name: /tmp/A/file2
> 
> Object 1003:
> name: /tmp/A/B/file3
> 
> Three query objects (directories) with OIDs of 1, 2, and 3.
> Object 1:
> name: /tmp/A
> name: /tmp/A/B/C/A
> query: name begins with /tmp/A/
> query result cache: B->2, file1->1001, file2->1002
> 
> Object 2:
> name: /tmp/A/B
> query: name begins with /tmp/A/B/
> query result cache: C->3, file1->1001, file3->1003
> 
> Object 3:
> name: /tmp/A/B/C
> query: name begins with /tmp/A/B/C/
> query result cache: A->1, file1->1001
> 
> Now there is an A -> B -> C -> A directory loop.  But removing
> name: /tmp/A/B/C/A from Object 1 fixes the loop.  Deleting Object 1 also
> fixes the loop.  Deleting any of Object 1, 2 or 3 does not affect any
> other object, because in this scheme, directory objects do not need to
> actually exist: they are just queries that return objects with certain
> names.

I forgot to address Nikita's point about reclaiming lost cycles.  In
this case, let me create Object 4 for /tmp
Object 4:
name: /tmp
query: name begins with /tmp/
query result cache: A->1

Now, if we delete Object 4, are Objects 1,2,3 lost?  I would say not
because they still have names.  When the shell calls chdir("/tmp") a new
query object (directory) must be created dynamically, and Objects
1001,1002,1003 still have their names that start with /tmp and so they
immediately appear again.  Their names still start with /, so the top
level query will still find them and /tmp as well.

Therefore, the cycle is never detached and lost.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
I should create an example.

Wherever I used True Name previously, use OID instead.  True Name was
simply another term for a unique object identifier.

Three files with OIDs of 1001, 1002, and 1003.
Object 1001:
name: /tmp/A/file1
name: /tmp/A/B/file1
name: /tmp/A/B/C/file1

Object 1002:
name: /tmp/A/file2

Object 1003:
name: /tmp/A/B/file3

Three query objects (directories) with OIDs of 1, 2, and 3.
Object 1:
name: /tmp/A
name: /tmp/A/B/C/A
query: name begins with /tmp/A/
query result cache: B->2, file1->1001, file2->1002

Object 2:
name: /tmp/A/B
query: name begins with /tmp/A/B/
query result cache: C->3, file1->1001, file3->1003

Object 3:
name: /tmp/A/B/C
query: name begins with /tmp/A/B/C/
query result cache: A->1, file1->1001

Now there is an A -> B -> C -> A directory loop.  But removing
name: /tmp/A/B/C/A from Object 1 fixes the loop.  Deleting Object 1 also
fixes the loop.  Deleting any of Object 1, 2 or 3 does not affect any
other object, because in this scheme, directory objects do not need to
actually exist: they are just queries that return objects with certain
names.
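
The example above can be sketched as a small in-memory model. It assumes that a directory listing shows only names sitting directly under the query prefix, which is what the "query result cache" lines suggest. The dictionary layout and the `list_dir` helper are hypothetical.

```python
# Hypothetical in-memory model of the example: OID -> list of names.
objects = {
    1001: ["/tmp/A/file1", "/tmp/A/B/file1", "/tmp/A/B/C/file1"],
    1002: ["/tmp/A/file2"],
    1003: ["/tmp/A/B/file3"],
    1: ["/tmp/A", "/tmp/A/B/C/A"],
    2: ["/tmp/A/B"],
    3: ["/tmp/A/B/C"],
}


def list_dir(prefix):
    """Return {entry: OID} for every name that sits directly under prefix."""
    entries = {}
    for oid, object_names in objects.items():
        for name in object_names:
            rest = name[len(prefix):]
            if name.startswith(prefix) and rest and "/" not in rest:
                entries[rest] = oid
    return entries


a_listing = list_dir("/tmp/A/")          # what Object 1 would cache
c_before = list_dir("/tmp/A/B/C/")       # contains A -> 1, closing the loop

# Removing one name from Object 1 breaks the loop; no other object changes.
objects[1].remove("/tmp/A/B/C/A")
c_after = list_dir("/tmp/A/B/C/")

print(a_listing, c_before, c_after)
```

Since the listings are computed from the names, deleting a directory object in this model only discards a cached query result.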

One problem I already see with it is that there is no way to enforce the
Unix "x" permission without real directory traversal.  But I never liked
that anyway. :)

Are there other problems with it?  Did I explain it clearly?

On Tue, 2005-05-31 at 11:27 -0700, Hans Reiser wrote:
> Well, if you allow multiple true names, then you start to resemble
> something I suggested a few years ago, in which I outlined a taxonomy of
> links, and suggested that some links would count towards the reference
> count and some would not.
> 
> Of course, that does nothing for the cycle problem..
> 
> How are cycles handled for symlinks currently?
> 
> Hans
> 
> Jonathan Briggs wrote:
> 
> >Either that isn't allowed, or it immediately vanishes from all
> >directories.
> >
> >If deleting by OID isn't allowed, then every name property must be
> >removed in order to delete the file.
> >
> >Personally, I would allow deleting the OID.  It would be a convenient
> >way to be sure every instance of a file was deleted.
> >
> >On Tue, 2005-05-31 at 09:59 -0700, Hans Reiser wrote:
> >  
> >
> >>What happens when you unlink the True Name?
> >>
> >>Hans
> >>
> >>Jonathan Briggs wrote:
> >>
> >>
> >>
> >>>You can avoid cycles by redefining the problem.
> >>>
> >>>Every file or "data object" has one single True Name which is their
> >>>inode or OID.  Each data object then has one or more "names" as
> >>>properties.  Names are either single strings with slash separators for
> >>>directories, or each directory element is a unique object in an object
> >>>list.  Directories then become queries that return the set of objects
> >>>holding that directory name.  The query results are of course cached and
> >>>updated whenever a name property changes.
> >>>
> >>>Now there are no cycles, although a naive Unix "find" program could get
> >>>stuck in a loop.
> >>> 
> >>>
> >>>  
> >>>
> 
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
Either that isn't allowed, or it immediately vanishes from all
directories.

If deleting by OID isn't allowed, then every name property must be
removed in order to delete the file.

Personally, I would allow deleting the OID.  It would be a convenient
way to be sure every instance of a file was deleted.

On Tue, 2005-05-31 at 09:59 -0700, Hans Reiser wrote:
> What happens when you unlink the True Name?
> 
> Hans
> 
> Jonathan Briggs wrote:
> 
> >
> >You can avoid cycles by redefining the problem.
> >
> >Every file or "data object" has one single True Name which is their
> >inode or OID.  Each data object then has one or more "names" as
> >properties.  Names are either single strings with slash separators for
> >directories, or each directory element is a unique object in an object
> >list.  Directories then become queries that return the set of objects
> >holding that directory name.  The query results are of course cached and
> >updated whenever a name property changes.
> >
> >Now there are no cycles, although a naive Unix "find" program could get
> >stuck in a loop.
> >  
> >
> 
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - VFS Changes

2005-05-31 Thread Jonathan Briggs
On Tue, 2005-05-31 at 12:30 -0400, [EMAIL PROTECTED] wrote:
> On Tue, 31 May 2005 08:04:42 PDT, Hans Reiser said:
> 
> > >A cycle may consist of more graph nodes than fit into memory.
> > >
> > There are pathname length restrictions already in the kernel that should
> > prevent that, yes?
> 
> The problem is that although a *single* pathname can't be longer than some
> length, you can still create a cycle.  Consider for instance a pathname
> restriction of 1024 chars.  Filenames A, B, and C are all 400 characters
> long.  A points at B, B points at C - and C points back to A.
> 
> Also, although the set of inodes *in the cycle* fits in memory, the set of
> inodes *in the entire graph* that has to be searched to verify the presence of
> a cycle may not (in general, you have to be ready to examine *all* the inodes
> unless you can do some pruning (unallocated, provably un-cycleable, and so
> on)).  This is the sort of thing that you can afford to do in userspace during
> an fsck, but certainly can't do in the kernel on every syscall that might
> create a cycle...

You can avoid cycles by redefining the problem.

Every file or "data object" has one single True Name, which is its
inode or OID.  Each data object then has one or more "names" as
properties.  Names are either single strings with slash separators for
directories, or each directory element is a unique object in an object
list.  Directories then become queries that return the set of objects
holding that directory name.  The query results are of course cached and
updated whenever a name property changes.

Now there are no cycles, although a naive Unix "find" program could get
stuck in a loop.
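
A minimal sketch of how a tree walker could avoid that loop by remembering which OIDs it has already visited. The listing table is a hypothetical stand-in for cached query results, with directories A, B and C as OIDs 1, 2 and 3.

```python
def walk(listing, start):
    """Yield each reachable OID once, even if the directories form a loop."""
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue              # a naive walker would descend again here
        seen.add(oid)
        yield oid
        # Objects with no listing (plain files) contribute no children.
        stack.extend(listing.get(oid, {}).values())


# Hypothetical cached query results forming an A -> B -> C -> A loop.
listing = {
    1: {"B": 2, "file1": 1001},
    2: {"C": 3},
    3: {"A": 1},
}

visited = sorted(walk(listing, 1))
print(visited)
```

The walk terminates because identity is checked by OID rather than by path, so the second arrival at directory A is recognized.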
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: File as a directory - Ordered Relations

2005-05-31 Thread Jonathan Briggs
On Mon, 2005-05-30 at 01:19 -0700, Hans Reiser wrote:
> [EMAIL PROTECTED] wrote:
> 
> >On Fri, 27 May 2005 23:56:35 CDT, David Masover said:
> >
> >  
> >
> >>Hans, comment please?  Is this approaching v5 / v6 / Future Vision?  It
> >>does seem more than a little "clunky" when applied to v4...
> >>
> >>
> Well, if you read our whitepaper, we consider relational algebra to be a
> functional subset of what we will implement (which implies we think
> relational algebra should be possible in the filesystem naming.)
> 
> >
> >I'm not Hans, but I *will* ask "How much of this is *rationally* doable
> >without some help from the VFS?".
> >
> Think of VFS as a standards committee.  That means that 5-15 years after
> we implement it, they will copy it, break it, and then demand that we
> conform to their breakage. 
> 
> Anytime someone says it should go into VFS, what they really mean is,
> nobody should get ahead of them because it will increase their workload. ;-)
> 
> VFS is a baseline.  Once you support VFS, and your performance is good,
> you can start to innovate.  Next year we finally start to seriously
> innovate, after 10 years of groundwork.  The storage layer was never the
> interesting part of our plans, not to me.

Why innovate in the filesystem though, when it would work just as well
or better in the VFS layer?  Files as directories and meta-files would
work for all filesystems.  Ext3 with extended attributes could support
the same file structures as Reiser4.  Reiser4 would then be the most
efficient implementation of the general case.

From the last LKML discussion, it didn't look to me as if the kernel
maintainers are going to accept Reiser4's stranger features into the
mainline kernel, so if you're going to be implementing and maintaining
them separately anyway, why not do it in the implementation of all
namespaces, in the VFS code?
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: Reiser4 repackers

2005-05-05 Thread Jonathan Briggs
On Thu, 2005-05-05 at 17:10 -0500, David Masover wrote:
> This would, I think, involve creating a fully functional
> resizefs.reiser4 -- something I distinctly remember Hans telling me not
> to do, because my approach also created (most of) an online repacker,
> which is something Hans wants to do for money.

I am not certain this plan to make money from a repacker is going to be
a workable idea.

Reiser4 starts off being very nice and fast, but I have been using it
for a /home partition for almost a year now, and the performance has
dropped significantly.

Once this happens to enough people using Reiser4, one of them is going
to write a free repacker.

Perhaps it would be a good idea to release at least an offline repacker.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: where are reiser4 sources

2005-02-23 Thread Jonathan Briggs
On Wed, 2005-02-23 at 09:19 -0800, Hans Reiser wrote:
> Mark Junker wrote:
> 
> > Hans Reiser schrieb:
> >
> >> That violates the license.
> >
> >
> > GPL? This shouldn't be a problem because the sources aren't statically 
> > linked to Windows. The sources to create the driver can be published 
> > under GPL too so this shouldn't be a problem.
> >
> > Where do you see the license violation?
> >
> > Regards,
> > Mark
> >
> >
> >
> The GPL says nothing about static linking.  You are making a derivative 
> work.  No. 

I don't think that's right, Hans.  If his linking code is also GPL, the
GPL of the filesystem code is satisfied.  The only problem would be from
the Windows licensing rejecting GPL'd driver code, which I think it
does.

The dynamic linking problems can be got around by writing an
implementation that links dynamically against something else that is
okay under GPL.  If that code also happens to link dynamically against
non-GPL code, that is the end-user's problem.  So if there was, say, a
Linux or BSD implementation of a Windows driver interface like
NDISwrapper for filesystems, he could legally write against _that_.

Once there's a documented interface and more than one implementation,
an implementation of that interface is no longer a derived work.

In any case, he could always write a FUSE-like driver that thunked
filesystem calls in and out of a service.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: reiser4 for windows

2004-12-05 Thread Jonathan Briggs
On Sat, 2004-12-04 at 13:10 -0800, Job Bob wrote:
>   I know that this question has a fairly obvious
> answer, but is there or will there be a reiser4 driver
> for Windows? Microsoft will only make one if enough
> people use reiser4, otherwise hell will freeze before
> they do. The free software people seem to suffer from
> NIH syndrome, so they probably won't help either. That
> only leaves individuals or companies. Any ideas on
> when we might see a reiser4 driver for windows?
> 
>Yale


There are ext2/3 and reiserfs access programs for Windows.  They are not
implemented as Windows filesystem drivers.  I do not know if this is
because of technical difficulties or Windows vs. GPL licensing problems.

Most of them bring up the filesystem in a Windows Explorerish interface.
You can then copy files in or out.

I don't see any reason why someone willing to do the work couldn't
adapt one of those programs to interface to Reiser4.
-- 
Jonathan Briggs <[EMAIL PROTECTED]>
eSoft, Inc.


Re: REISER4 snapshot

2004-08-04 Thread Jonathan Briggs
On Wed, 2004-08-04 at 12:28, Vladimir V. Saveliev wrote:
> Hello
> 
> Namesys has issued a new reiser4 snapshot
> (http://thebsh.namesys.com/snapshots/2004.08.04).
> It is against 2.6.8-rc2-mm2.
> It is mostly bug fixes.
> Look at http://thebsh.namesys.com/snapshots/2004.08.04/READ.ME for install
> instructions.

So, is this what will be sent to the main kernel as ready-to-go?
Yay!
-- 
Jonathan Briggs
[EMAIL PROTECTED]


Re: Longhorn FS being removed to Blackcomb (end of decade)

2004-04-09 Thread Jonathan Briggs
On Fri, 2004-04-09 at 11:32, Burnes, James wrote:
> Just thought this might be interesting to some of you.  The
> database-like FS for Longhorn has just been delayed until the end of the
> decade.  That's great.  It means that Microsoft's attempt to copy the
> BeOS filesystem is not meeting with much success.
> 
> Which leads to my follow on...
> 
> I think I remember database-like file system capabilities in Reiser are
> scheduled for V5? Or could it be done with plugins?
> 
> It would be great to beat MS to the DB filesystem punch.  Maybe with
> object-oriented plugins in V4 that won't matter as much.
[snip]

I've been thinking about the filesystem database idea a lot recently.
I'm not a famous database or filesystem designer, but I've got opinions. :-)

I think the best way to do database features is with a fast,
robust file change notification feature, with the database in user
space.  Some filesystem features would be required, but I think metadata
and two-way hard links are all we need.

I came to the conclusion that doing the search and indexing in the
filesystem isn't necessary if a userspace daemon can track file and
metadata changes and update index/search directories with hard links. 
One of the Reiser4 pseudo files needs to be a list of hard links to the
file, so the daemon can find each reference and update or delete links
as needed.

The daemon could watch directory creation and look for query metadata on
directories.  When found, it would use index directories or full
filesystem search to fill in the query results.

I think we could do this in Reiser4.  Some of the security auditing work
being done in the kernel could be adapted for filesystem change
notification.  Then we need some Reiser4 plugins for tracking hard links
and metadata.  Then we need the daemon to watch file changes and create
index and query results.
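
A minimal sketch of the "index directory of hard links" part only, written as a one-shot rebuild in user space. It does not model change notification or the proposed Reiser4 pseudo file listing a file's hard links. Indexing by file extension, the function names and the directory layout are assumptions made for illustration.

```python
import os
import tempfile


def extension_key(path):
    """Index key for a file: its extension, or 'none' if it has no extension."""
    ext = os.path.splitext(path)[1].lstrip(".")
    return ext or "none"


def rebuild_index(data_dir, index_dir, key_of):
    """Populate index_dir/<key>/ with hard links to the files in data_dir."""
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if not os.path.isfile(src):
            continue
        bucket = os.path.join(index_dir, key_of(src))
        os.makedirs(bucket, exist_ok=True)
        dst = os.path.join(bucket, name)
        if not os.path.exists(dst):
            os.link(src, dst)     # hard link: a second name for the same inode


with tempfile.TemporaryDirectory() as top:
    data = os.path.join(top, "data")
    index = os.path.join(top, "index")
    os.makedirs(data)
    os.makedirs(index)
    for name in ("a.txt", "b.txt", "c.log"):
        with open(os.path.join(data, name), "w") as f:
            f.write(name)

    rebuild_index(data, index, extension_key)

    indexed = {
        key: sorted(os.listdir(os.path.join(index, key)))
        for key in sorted(os.listdir(index))
    }
    same_inode = os.path.samefile(
        os.path.join(data, "a.txt"), os.path.join(index, "txt", "a.txt")
    )
    print(indexed, same_inode)
```

A real daemon would run the same update in response to change events instead of rescanning, and would need the reverse link list to remove stale entries.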

Whee!  I make it sound so easy!

> (Hans in the background: You want more features, how about writing it
> yourself.  I mumble something about being in the middle of a move to
> Denver).

Hey, great!  I'm living near Boulder (close to Denver) and I love it
here.  I think you will too.
-- 
Jonathan Briggs
[EMAIL PROTECTED]



New Reiser4 snapshot slower?

2004-03-26 Thread Jonathan Briggs
This command:
# dd if=/dev/zero of=testfile bs=4k count=256k
seems to take longer than it did before.  The difference seems to be all
system time.

I will have to reboot to an old version later, to get accurate numbers.

I don't have debugging turned on, so that's not it.  Has something
changed that would have made this slower?
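
A minimal sketch of one way to separate user time from system time for this kind of write, assuming unbuffered 4 KiB writes approximate what dd does with bs=4k. The function name and the small demo count are invented. The full-size run from the command above would be 256k blocks of 4 KiB, which is 1 GiB.

```python
import os
import tempfile


def timed_zero_write(path, block_size=4096, count=1024):
    """Write count zero-filled blocks; return (user, system, wall) seconds."""
    block = bytes(block_size)
    start = os.times()
    # buffering=0 issues one write() call per block, like dd with bs=4k.
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            f.write(block)
    end = os.times()
    return (
        end.user - start.user,
        end.system - start.system,
        end.elapsed - start.elapsed,
    )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile")
    user_s, sys_s, wall_s = timed_zero_write(path, count=1024)   # 4 MiB demo
    size = os.path.getsize(path)

full_size = 4096 * 256 * 1024     # bytes written by bs=4k count=256k
print(size, full_size, user_s, sys_s, wall_s)
```

Running the same measurement on both kernel versions, on the same filesystem, would show whether the extra time really is system time.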
-- 
Jonathan Briggs
[EMAIL PROTECTED]


Re: [ANNOUNCE] new reiser4 snapshot released.

2004-03-26 Thread Jonathan Briggs
On Fri, 2004-03-26 at 09:45, Nikita Danilov wrote:
> Hello,
> 
> new reiser4 snapshot against 2.6.5-rc2 is available at
> 
> http://www.namesys.com/snapshots/2004.03.26/
> 
> It is mainly a bug-fixing release. See READ.ME for the list of fixes and
> caveats.

A definition of fibration:
http://mathworld.wolfram.com/Fibration.html

I'm going to have to study math for about a year before I understand all
that, I think.

It's a good thing we won't have to understand "fiber bundles",
"paracompact topological space" and the "homotopy lifting property" to
USE Reiser4.

*grin*

If I missed the discussion or a web page, I am sorry.  But could someone
post a quick explanation or pointer to one about this fibration plugin? 
What does it do and what effects will it have?

-- 
Jonathan Briggs
[EMAIL PROTECTED]



Reiser4 panic at fs/reiser4/lock.c:434

2004-03-24 Thread Jonathan Briggs
The system is running Linux kernel 2.6.5-rc2.  It's tainted by Nvidia
drivers.

The system was compiling the kernel, and Reiser4 panicked.
The Reiser4 panic starts at line 443 of the attached dmesg log.

The Reiser4 filesystem is on a MD RAID 0 device made of two SCSI disks.

I had some Reiser4 debugging options turned on.  I was trying to catch
it creating blocks of zeros, which is something I noticed it doing to
frequently appended files like .xsession-errors and my Evolution mbox
file.
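
For checking a file after the fact, a small Python scan along these lines would list the zeroed blocks. This is only a sketch: the name find_zero_blocks and the 4096-byte block size are assumptions, and a short all-zero tail at the end of the file is also reported.

```python
def find_zero_blocks(path, block_size=4096):
    """Return byte offsets of blocks in `path` that contain only zero bytes."""
    zero = bytes(block_size)
    offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            # compare against a zero buffer of the same length as the chunk
            if chunk == zero[:len(chunk)]:
                offsets.append(offset)
            offset += len(chunk)
    return offsets
```

An empty list for a text file such as .xsession-errors would mean no zero-filled blocks were found at that block size.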

After I rebooted the system, I copied the kernel source to another
filesystem and finished the compile.  The new kernel does not include
the debugging options (they made it way too slow!), so I am not running
with debugging now.  But if you want more information, let me know and
I will try to provide it.
-- 
Jonathan Briggs
[EMAIL PROTECTED]
Linux version 2.6.5-rc2 ([EMAIL PROTECTED]) (gcc version 3.3.2 20031022 (Red Hat Linux 3.3.2-1)) #5 Mon Mar 22 10:50:14 MST 2004
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000ec000 - 0010 (reserved)
 BIOS-e820: 0010 - 1fff (usable)
 BIOS-e820: 1fff - 1fff8000 (ACPI data)
 BIOS-e820: 1fff8000 - 2000 (ACPI NVS)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ffee - fff0 (reserved)
 BIOS-e820: fffc - 0001 (reserved)
511MB LOWMEM available.
On node 0 totalpages: 131056
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 126960 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
ACPI: RSDP (v000 AMI   ) @ 0x000fa2c0
ACPI: RSDT (v001 AMIINT SiS735XX 0x1000 MSFT 0x010b) @ 0x1fff
ACPI: FADT (v001 AMIINT SiS735XX 0x1000 MSFT 0x010b) @ 0x1fff0030
ACPI: DSDT (v001SiS  735 0x0100 MSFT 0x010d) @ 0x
Built 1 zonelists
Kernel command line: ro root=/dev/md1 rootfstype=reiserfs single
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 1659.672 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Memory: 512988k/524224k available (3016k kernel code, 10472k reserved, 1336k data, 156k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 3284.99 BogoMIPS
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 0383fbff c1c3fbff  
CPU: After vendor identify, caps: 0383fbff c1c3fbff  
CPU: CLK_CTL MSR was 6003d22f. Reprogramming to 2003d22f
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1c3fbff  0020
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Athlon(tm) XP 2000+ stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 
ESR value after enabling vector: 
Using local APIC timer interrupts.
calibrating APIC timer ...
. CPU clock speed is 1659.0295 MHz.
. host bus clock speed is 265.0487 MHz.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfdb01, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040311
 tbxface-0117 [03] acpi_load_tables  : ACPI Tables successfully acquired
Parsing all Control Methods:
Table [DSDT](id F004) - 413 Objects with 41 Devices 144 Methods 18 Regions
ACPI Namespace successfully loaded at root c0589ffc
ACPI: IRQ9 SCI: Edge set to Level Trigger.
evxfevnt-0093 [04] acpi_enable   : Transition to ACPI mode successful
evgpeblk-0747 [06] ev_create_gpe_block   : GPE 00 to 15 [_GPE] 2 regs at 0820 on int 9
evgpeblk-0747 [06] ev_create_gpe_block   : GPE 16 to 31 [_GPE] 2 regs at 0830 on int 9
Completing Region/Field/Buffer/Package initialization:
Initialized 18/18 Regions 5/5 Fields 35/35 Buffers 30/30 Packages (422 nodes)
Executing all Device _STA and_INI methods:...
43 Devices found containing: 43 _STA, 1 _INI methods
ACPI: Interpreter enab

Re: Bug report: Reiser4 freezes during file copy

2004-02-18 Thread Jonathan Briggs
On Wed, 2004-02-18 at 11:43, Alex Zarochentsev wrote:
> hi,
> 
> On Wed, Feb 18, 2004 at 11:13:14AM -0700, Jonathan Briggs wrote:
> > The software:
> > Linux kernel 2.6.3 with Reiser4 patch dated February 6th.
> > Compiled with gcc 3.4.0 20040129
> > (I know the GCC is still in testing, but Reiser3 works fine with it,
> > so...)
> > 
> > After I'd created a new Reiser4 filesystem, I was copying my backup
> > files onto it.  While I was doing that, it just stopped.
> 
> Did you create reiser4 on a loopback or RAID/LVM device?

I knew I should have included more details!  Sorry...
Yes, it was on a MD device on top of two SCSI drives.  RAID 0.

I will also attach my kernel configuration, which I meant to do earlier.
-- 
Jonathan Briggs
[EMAIL PROTECTED]


config.gz
Description: GNU Zip compressed data



Bug report: Reiser4 freezes during file copy

2004-02-18 Thread Jonathan Briggs
The software:
Linux kernel 2.6.3 with Reiser4 patch dated February 6th.
Compiled with gcc 3.4.0 20040129
(I know the GCC is still in testing, but Reiser3 works fine with it,
so...)

After I'd created a new Reiser4 filesystem, I was copying my backup
files onto it.  While I was doing that, it just stopped.

Here is the cp process from a process dump:
cp            D C83F583C     0 20232  20159 (NOTLB)
c83f57b4 00200082 dd505014 c83f583c c83f58ec 0002  0002
   0002 0002 00072f8f 972dc381 21ec d0ffad00 d0ffaec0 c83f5f0c
   00200292 c83f4000 d0ffad00 c010714b c83f5f14 0001 d0ffad00 c011c620
Call Trace:
 [] __down+0x7b/0x110
 [] default_wake_function+0x0/0x20
 [] extent_size+0x96/0xa0
 [] __down_failed+0x8/0xc
 [] .text.lock.lock+0xf/0x14
 [] atom_wait_event+0x57/0x80
 [] plugin_by_coord_node40+0x2f/0x40
 [] write_fq+0x18/0x1d0
 [] longterm_unlock_znode+0xa7/0x140
 [] write_prepped_nodes+0x50/0x60
 [] flush_current_atom+0x757/0xbc0
 [] reiser4_writepages+0x0/0xc0
 [] flush_some_atom+0x174/0x290
 [] reiser4_sync_inodes+0xb3/0x120
 [] sync_sb_inodes+0x16/0x30
 [] writeback_inodes+0x49/0x92
 [] balance_dirty_pages_ratelimited+0x8e/0x190
 [] item_balance_dirty_pages+0xbd/0x190
 [] write_extent+0x89d/0xf10
 [] hint_validate+0xab/0xd0
 [] equal_to_rdk+0x29/0x60
 [] plugin_by_coord_node40+0x2f/0x40
 [] write_flow+0x253/0x460
 [] write_extent+0x0/0xf10
 [] all_grabbed2free+0x36/0x40
 [] write_unix_file+0x178/0x2a0
 [] reiser4_write+0x8f/0x100
 [] reiser4_write+0x0/0x100
 [] vfs_write+0x10a/0x150
 [] sys_write+0x42/0x70
 [] sysenter_past_esp+0x52/0x71


-- 
Jonathan Briggs
[EMAIL PROTECTED]

