Re: Lazy block allocation and block_prepare_write?

2005-04-18 Thread Badari Pulavarty
Martin Jambor wrote:
> Hi all,
>
> I am a member of a group that implements a filesystem that allocates
> disk blocks to in-memory blocks lazily, that is, the decision is
> made just before the data are actually sent to disk. Moreover, when
> cached pages are modified, the data can be (and almost certainly will
> be) written to a different place from where it was read.
>
> I was wondering whether we could use the generic function
> block_prepare_write at all. The function checks every buffer of the
> page and if it is not mapped, it calls a fs supplied function that is
> supposed to map the buffer, i.e. assign it a block on the device and
> set its mapped flag.
>
> This is where we would like to give an error if there is not enough
> free disk space left but we cannot give a specific device block number
> yet. Can we make one up, such as -1? What would that do to such dark
> functions as unmap_underlying_metadata or any other? Would some other
> part of kernel break if there was a bunch of buffers assigned to the
> same spot on the disk?
>
> On the other hand, if I understand buffer flags correctly, I need to
> be able to emulate mapping of buffers to set them dirty, or am I
> wrong?
>
> Thanks for any insight or thoughts,

Yes. It's possible to do what you want. I am currently working on
adding "delayed allocation" support to ext3. As part of that, we
are modifying the generic helper routines to delay the allocation from
prepare time to actual writeout time (writepage).
Here is the basic idea:
===
The idea is to "reserve" a block at the prepare/commit write instead
of allocating the block. Do the actual allocation in writepage().
Sounds simple :)
Here are the issues:

1) Currently none of the generic helper routines can handle this.
   We need to add support for it, but still somehow keep the
   routines generic enough for everyone's use.

2) There is no easy way to find out correctly in writepage() whether
   we "reserved" a block or not. There are 2 paths to writepage():

       sys_write() -> prepare/commit()
                      and later sync() -> writepage()
       mmap()      -> touch a page
                      and later -> writepage()

   In order to do the correct accounting, we need to mark a page
   to indicate whether we reserved a block or not. One way to do this
   is to use page->private. But then all the generic
   routines will fail, since they assume that page->private represents
   bufferheads. So we need a better way to do this.

3) We need to add hooks into filesystem-specific calls from these
   generic routines to handle "journaling mode" requirements
   (for ext3 and maybe others).
So, what are your requirements?  I am looking for a common
way to combine all the requirements and come up with
saner "generic" routines to handle these.
Thanks,
Badari
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NFS4 mount problem

2005-04-18 Thread Trond Myklebust
On Mon, 18.04.2005 at 15:07 (-0700), Bryan Henderson wrote:

> >We're already up to version 6 of the binary interfaces for v2/v3, and if
> >you count NFSv4 too, then that makes 7. 
> 
> I don't know the NFS mount option format, but I'm having a hard time 
> imagining how a string-based format can take less code to parse and be 
> more forward compatible than a binary one.  People don't even use the term 
> "parse" for binary structures, because parsing typically means turning 
> strings into binary structures.

The string based parser (based, BTW, on the generic string parser in
lib/parser.c) for NFS mount options is already in the kernel, thanks to
NFSroot, and already needs to be maintained. As is the NFSv2/v3 "mount"
RPC code, and everything else that the kernel needs to take over that
duty.
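
For a sense of scale, here is a small string-option parser in plain libc C. It is a sketch only: it does not use the lib/parser.c interface mentioned above, toy_nfs_opts is a hypothetical structure, and only a handful of option names (rsize, wsize, proto, soft, hard) are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result structure; not the kernel's nfs_mount_data. */
struct toy_nfs_opts {
    unsigned long rsize;
    unsigned long wsize;
    int soft;            /* 0 = hard, 1 = soft */
    char proto[8];       /* e.g. "tcp" or "udp" */
};

/* True if the key of length klen starting at s equals name. */
static int key_is(const char *s, size_t klen, const char *name)
{
    return strlen(name) == klen && memcmp(s, name, klen) == 0;
}

/* Parse "key=value,flag,..." into *o without modifying the input.
 * Returns 0 on success, -1 on an unknown or malformed option. */
int toy_parse_opts(const char *s, struct toy_nfs_opts *o)
{
    assert(s != NULL && o != NULL);
    while (*s != '\0') {
        size_t len = strcspn(s, ",");          /* length of this token */
        const char *eq = memchr(s, '=', len);  /* '=' inside the token, if any */
        size_t klen = eq ? (size_t)(eq - s) : len;
        const char *val = eq ? eq + 1 : NULL;
        size_t vlen = eq ? len - klen - 1 : 0;

        if (len == 0) {
            /* empty token such as ",,": nothing to do */
        } else if (key_is(s, klen, "rsize") && vlen > 0) {
            o->rsize = strtoul(val, NULL, 10);
        } else if (key_is(s, klen, "wsize") && vlen > 0) {
            o->wsize = strtoul(val, NULL, 10);
        } else if (key_is(s, klen, "proto") && vlen > 0 &&
                   vlen < sizeof o->proto) {
            memcpy(o->proto, val, vlen);
            o->proto[vlen] = '\0';
        } else if (key_is(s, klen, "soft") && !eq) {
            o->soft = 1;
        } else if (key_is(s, klen, "hard") && !eq) {
            o->soft = 0;
        } else {
            return -1;
        }
        s += len;
        if (*s == ',')
            s++;                                /* step over the separator */
    }
    return 0;
}
```

Adding an option means adding one branch; nothing about the layout seen by older callers changes.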

The only extra information we need from userland is the DNS lookup of
the server hostname (NFSv2/v3/v4) and the client IP address (NFSv4 only).

> Having 6 separate formats isn't the only way to have an evolving binary
> interface.  People do make extensible binary formats.

I never said they were 6 _separate_ formats. The NFSv2/v3 stuff is one
constantly "extending" binary format. See include/linux/nfs_mount.h.

Note how 4 of those fields are currently entirely obsolete (fd,
old_root, namlen, bsize) and how one more cannot be extended to cope
with IPv6 and other new transports (addr), and how one more (root)
cannot be used for NFSv4 mounts, which had to add in at least 2 more
fields that are unused by NFSv2/v3...
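
A toy model of such a constantly extending structure, to make the maintenance cost concrete. This is not the layout in include/linux/nfs_mount.h: toy_mount_data, its field order and the default values are invented for illustration, and only the pattern (a version number selects how much of the struct is valid) is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented layout; fields are only ever appended. */
struct toy_mount_data {
    int version;
    int fd;        /* obsolete, but it cannot go: later offsets depend on it */
    int flags;
    int timeo;     /* added in version 2 */
    int acregmin;  /* added in version 3 */
};

/* How many bytes a caller of the given version is expected to pass. */
size_t toy_len_for_version(int version)
{
    switch (version) {
    case 1:  return offsetof(struct toy_mount_data, timeo);
    case 2:  return offsetof(struct toy_mount_data, acregmin);
    case 3:  return sizeof(struct toy_mount_data);
    default: return 0;   /* unknown version */
    }
}

/* Accept any known version; fields the caller did not know keep defaults. */
int toy_import(const void *blob, size_t len, struct toy_mount_data *out)
{
    int version;
    size_t need;

    assert(blob != NULL && out != NULL);
    if (len < sizeof version)
        return -1;
    memcpy(&version, blob, sizeof version);
    need = toy_len_for_version(version);
    if (need == 0 || len < need)
        return -1;

    memset(out, 0, sizeof *out);
    out->timeo = 7;      /* made-up default */
    out->acregmin = 3;   /* made-up default */
    memcpy(out, blob, need);
    return 0;
}
```

Every new version adds a case, dead fields stay forever, and the blob still depends on the caller's int size and byte order.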

Sure, we could indeed have developed more sensible binary formats if our
1992 crystal ball had told us all about NFSv3 (RFC dates from 1995) and
NFSv4 (RFC dates from 2003). Not to forget lockd, statd, nfsacl, IPv6,
etc...

> I personally almost never worry about the number of bytes of code, but I
> worry a lot about its simplicity.  User space code is less costly to
> develop and less risky to make a mistake in.  I would add,
>
> 3) Keeping the kernel parsing code simple.

No.
  3) Keeping the kernel parsing code _maintainable_

...and keeping around parsers for all these different formats and fields
and now extra 32-bit counterparts isn't my idea of code simplicity, code
compactness, or code maintainability.

Cheers,
  Trond
-- 
Trond Myklebust <[EMAIL PROTECTED]>



Re: NFS4 mount problem

2005-04-18 Thread Bryan Henderson
>My concern is that we are slowly but surely building up a bigger
>in-kernel library for parsing the binary structure than it would take to
>parse the naked mount option string.
>
>...
>If people really do need a fully documented NFS mount interface, then
>the only one that makes sense is a string interface. Looking back at the
>manpages, the string mount options are the only thing that have remained
>constant over the last 10 years.
>
>We're already up to version 6 of the binary interfaces for v2/v3, and if
>you count NFSv4 too, then that makes 7. 

I don't know the NFS mount option format, but I'm having a hard time 
imagining how a string-based format can take less code to parse and be 
more forward compatible than a binary one.  People don't even use the term 
"parse" for binary structures, because parsing typically means turning 
strings into binary structures.

Having 6 separate formats isn't the only way to have an evolving binary 
interface.  People do make extensible binary formats.

>There are only 2 reasons for doing
>that parsing in userland:
>
>  1) DNS lookups
>  2) Keeping the kernel parsing code small

I personally almost never worry about the number of bytes of code, but I 
worry a lot about its simplicity.  User space code is less costly to 
develop and less risky to make a mistake in.  I would add,
 
3) Keeping the kernel parsing code simple.

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems


Re: NFS4 mount problem

2005-04-18 Thread Bryan Henderson
>(1) The kernel is returning EFAULT to the 32-bit userspace; this implies that
> userspace is handing over a bad address. It isn't, the kernel is
> malfunctioning as it stands.
>...
>Either the kernel should return ENOSYS for any 32-bit mount on a 64-bit kernel
>or it must support it fully.

So this point is just the error code?  If so, where do you get ENOSYS?  A 
more usual errno for where a particular filesystem type can't be mounted 
is ENODEV.  Choosing errnos is a pretty whimsical thing anyway, since 
there are so many more kinds of errors than the authors of the errno space 
contemplated, but EFAULT and ENOSYS are two that have a pretty solid 
definition.  ENOSYS is for when an entire system call type is missing.

I'm not sure we can complain about EFAULT, though, because you really are 
supplying an invalid address.  You're doing it because you're using the 
wrong mount option format, so what you think of as 4 bytes of flags 
followed by 4 bytes of address is really 8 bytes of address.

I do understand the more important issue of there being a kernel that 
understands both mount option formats; but since you enumerated the errno 
issue, I wanted to comment on that one independently.

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems


[RFC][PATCH] (#2) file system auditing

2005-04-18 Thread Timothy R. Chavez
Hello,

The audit subsystem is currently incapable of auditing a file system object 
based on its location and name.  This is critical for auditing well-defined 
and security-relevant locations such as /etc/shadow, where the file is 
re-created on each transaction, and cannot rely on the (device, inode)-based 
filters to ensure persistence of auditing across transactions. This patch adds 
the necessary functionality to the audit subsystem and VFS to support file 
system auditing in which an object is audited based on its location and name.  
This work is being done to make the audit subsystem compliant with Common 
Criteria's Controlled Access Protection Profile (CAPP) specification.

This is the second (#2) RFC on linux-fsdevel.

I've made some updates to the file system auditing code.  The file system 
hooks remain unchanged, but I would still appreciate feedback.  Since the
audit patchset was temporarily removed from the mm tree, the patch, which 
appears at the bottom of this message, was diffed against linux-2.6.12-rc2-mm1
again.

The most notable updates/changes are as follows:

1.

Updated handling of watch data between user and kernel.  Introduced a new 
struct, audit_transport, which is used as a common means by the kernel and
user to send audit_watch information via netlink to each other.

2.

A master watchlist was added.  I've added a global linked list that keeps 
tabs on all the watches in the file system.  This list is currently used to 
send back to the user space a list of all the watches.  The path used to add 
the watch and the device the watch was added on are now stored with the 
watch.  Before sending the watch back to user space the path is walked and 
checked for the watch.  If the path can be walked from this [view of the] 
file system and the watch is found, the watch is returned as 'valid' 
otherwise it is returned as 'invalid'.

3.  

A bug was fixed by which one could create a hardlink to a file, watch the 
file, and access the hardlink first, subverting the audit subsystem.  Now, 
when a watch is inserted, the inode is updated ASAP.
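
As a rough userspace analogue of the master watchlist in item 2 (not the patch's code): each entry records the path and device it was added with, and a watch is reported valid only if the path still resolves and lives on the recorded device. The toy_watch names are hypothetical, and stat() stands in for the in-kernel path walk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One entry of the hypothetical master list. */
struct toy_watch {
    char *path;              /* path used when the watch was added */
    dev_t dev;               /* device the watch was added on */
    struct toy_watch *next;
};

/* Push a watch onto the list; returns the new head, or NULL on failure. */
struct toy_watch *toy_watch_add(struct toy_watch *head, const char *path,
                                dev_t dev)
{
    size_t n;
    struct toy_watch *w;

    assert(path != NULL);
    n = strlen(path) + 1;
    w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->path = malloc(n);
    if (w->path == NULL) {
        free(w);
        return NULL;
    }
    memcpy(w->path, path, n);
    w->dev = dev;
    w->next = head;
    return w;
}

/* 1 if the path can be walked from this view and is on the recorded device. */
int toy_watch_valid(const struct toy_watch *w)
{
    struct stat st;

    assert(w != NULL);
    if (stat(w->path, &st) != 0)
        return 0;
    return st.st_dev == w->dev;
}

/* Walk the whole list, as the reporting step would, counting valid entries. */
size_t toy_watch_count_valid(const struct toy_watch *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n += (size_t)toy_watch_valid(head);
    return n;
}
```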

-tim

diff -Nurp linux-2.6.12-rc2-mm1~orig/fs/dcache.c linux-2.6.12-rc2-mm1~audit/fs/dcache.c
--- linux-2.6.12-rc2-mm1~orig/fs/dcache.c   2005-04-11 14:14:36.0 +
+++ linux-2.6.12-rc2-mm1~audit/fs/dcache.c  2005-04-05 18:16:04.0 +
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* #define DCACHE_DEBUG 1 */
 
@@ -97,6 +98,7 @@ static inline void dentry_iput(struct de
 {
struct inode *inode = dentry->d_inode;
if (inode) {
+   audit_attach_watch(dentry, 1);
dentry->d_inode = NULL;
list_del_init(&dentry->d_alias);
spin_unlock(&dentry->d_lock);
@@ -802,6 +804,7 @@ void d_instantiate(struct dentry *entry,
if (inode)
list_add(&entry->d_alias, &inode->i_dentry);
entry->d_inode = inode;
+   audit_attach_watch(entry, 0);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
 }
@@ -978,6 +981,7 @@ struct dentry *d_splice_alias(struct ino
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+   audit_attach_watch(new, 0);
spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
@@ -987,6 +991,7 @@ struct dentry *d_splice_alias(struct ino
/* d_instantiate takes dcache_lock, so we do it by hand */
list_add(&dentry->d_alias, &inode->i_dentry);
dentry->d_inode = inode;
+   audit_attach_watch(dentry, 0);
spin_unlock(&dcache_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
@@ -1090,6 +1095,7 @@ struct dentry * __d_lookup(struct dentry
if (!d_unhashed(dentry)) {
atomic_inc(&dentry->d_count);
found = dentry;
+   audit_attach_watch(found, 0);
}
spin_unlock(&dentry->d_lock);
break;
@@ -1299,6 +1305,8 @@ void d_move(struct dentry * dentry, stru
spin_lock(&target->d_lock);
}
 
+   audit_attach_watch(dentry, 1);
+
/* Move the dentry to the target hash queue, if on different bucket */
if (dentry->d_flags & DCACHE_UNHASHED)
goto already_unhashed;
@@ -1332,6 +1340,7 @@ already_unhashed:
list_add(&target->d_child, &target->d_parent->d_subdirs);
}
 
+   audit_attach_watch(dentry, 0);
list_add(&dentry->d_child, &dentry->d_parent->d_subdirs);
spin_unlock(&target->d_lock);
spin_unlock(&dentry->d_lock);
diff -Nurp linux-2.6.12-rc2-mm1~orig/fs/inode.c 
linux-2.6.12-rc2-mm1~a

Re: NFS4 mount problem

2005-04-18 Thread David S. Miller
On Mon, 18 Apr 2005 11:36:25 +0100
David Howells <[EMAIL PROTECTED]> wrote:

> Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> 
> > I don't think we should encourage filesystem writers to do such stupid
> > things as ncpfs/smbfs do.  In fact I'm totally unhappy that nfs4 went
> > down that road.
> 
> The problem with NFS4, I think, is that the mount syscall sets a hard limit on
> the amount of mount data that's insufficiently large.

That's correct, it currently cannot support more than one page
of data.  Even worse, that makes the limit platform dependent.
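
The platform dependence is easy to observe from userspace. A small sketch, assuming the limit is exactly one page as stated above; mount_data_limit and fits_in_mount_page are made-up helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Size of one page on the running system, used here as the mount data limit. */
size_t mount_data_limit(void)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);   /* sysconf may return -1; treated as a bug in this sketch */
    return (size_t)page;
}

/* 1 if a mount data blob of len bytes would fit in a single page, else 0. */
int fits_in_mount_page(size_t len)
{
    return len <= mount_data_limit();
}
```

The same blob can therefore fit on a machine with large pages and be rejected on one with 4 KB pages.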




Re: NFS4 mount problem

2005-04-18 Thread Trond Myklebust
On Mon, 18.04.2005 at 10:17 (-0700), Bryan Henderson wrote:
> >mount() is not a documented syscall.  The binary formats for filesystems
> >like NFS are only documented inside the kernels to which they apply.
> 
> What  _is_ a documented system call?  Linux is famous for not having 
> documented interfaces (or, put another way, not distinguishing between an 
> interface you can read in an official document and one you discover by 
> reading kernel source code).  But of all interfaces in Linux, the system 
> call interface is probably the most accepted as one a user of the kernel 
> can rely on.
> 
> I don't think a filesystem driver designer should expect mount options to 
> be private to one particular user space program.  Especially one that 
> isn't even packaged with the driver.

If people really do need a fully documented NFS mount interface, then
the only one that makes sense is a string interface. Looking back at the
manpages, the string mount options are the only thing that have remained
constant over the last 10 years.

We're already up to version 6 of the binary interfaces for v2/v3, and if
you count NFSv4 too, then that makes 7. Choice of which binary interface
to use is entirely dependent on the kernel revision. Good luck fitting
all that (plus future revisions) into something like sash without
doubling its size...

Cheers,
  Trond
-- 
Trond Myklebust <[EMAIL PROTECTED]>



Re: NFS4 mount problem

2005-04-18 Thread Bryan Henderson
>> Architecture-dependent blob passed to mount(2) (aka nfs4_mount_data).
>> If you want it to be a blob, at least have a decency to use encoding
>> that would not depend on alignment rules and word size.  Hell, you
>> could use XDR - it's not that nfs would need something new to handle
>> it.  Or, better yet, use a normal string.
>
>Mount doesn't appear to permit a big enough blob though. It has a hard limit
>of PAGE_SIZE.

That seems to me to be orthogonal to Al's point.  You could make an 
architecture-independent format for that page that still contains 
addresses in user space of additional information.  Which would presumably 
also have an architecture-independent format.

But why is mount() special here?  It's ancient tradition for Linux system 
calls to take as parameters, and return as results, in-memory structures 
that are dependent on local word size and endianness.  Lots of them do.

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems


Re: NFS4 mount problem

2005-04-18 Thread Al Viro
On Mon, Apr 18, 2005 at 06:33:09PM +0100, David Howells wrote:
> Al Viro <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Architecture-dependent blob passed to mount(2) (aka nfs4_mount_data).
> > If you want it to be a blob, at least have a decency to use encoding
> > that would not depend on alignment rules and word size.  Hell, you
> > could use XDR - it's not that nfs would need something new to handle
> > it.  Or, better yet, use a normal string.
> 
> Mount doesn't appear to permit a big enough blob though. It has a hard limit
> of PAGE_SIZE.

Excuse me?  Would the use of fixed offsets, field sizes and endianness
make the blob bigger?  And as for the length of string representation
going past 4Kb...  that could be easily dealt with in sys_mount() if it
really becomes a problem.
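
A sketch of what fixed offsets, field sizes and endianness can look like, in the spirit of XDR but not using any XDR library. The three fields and their offsets are invented for illustration; this is not a proposal for the actual NFSv4 mount data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented wire layout: three 32-bit big-endian fields at fixed offsets. */
enum {
    TOY_OFF_VERSION = 0,
    TOY_OFF_FLAGS   = 4,
    TOY_OFF_RSIZE   = 8,
    TOY_BLOB_LEN    = 12
};

struct toy_opts { uint32_t version, flags, rsize; };

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Encode: the bytes are the same whatever the caller's word size or endianness. */
void toy_encode(const struct toy_opts *o, unsigned char out[TOY_BLOB_LEN])
{
    assert(o != NULL && out != NULL);
    put_be32(out + TOY_OFF_VERSION, o->version);
    put_be32(out + TOY_OFF_FLAGS, o->flags);
    put_be32(out + TOY_OFF_RSIZE, o->rsize);
}

/* Decode: returns 0 on success, -1 if the blob is too short. */
int toy_decode(const unsigned char *in, size_t len, struct toy_opts *o)
{
    assert(in != NULL && o != NULL);
    if (len < TOY_BLOB_LEN)
        return -1;
    o->version = get_be32(in + TOY_OFF_VERSION);
    o->flags   = get_be32(in + TOY_OFF_FLAGS);
    o->rsize   = get_be32(in + TOY_OFF_RSIZE);
    return 0;
}
```

The encoded blob is 12 bytes, no larger than three native 32-bit integers would be.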


Re: NFS4 mount problem

2005-04-18 Thread David Howells
Al Viro <[EMAIL PROTECTED]> wrote:

> 
> Architecture-dependent blob passed to mount(2) (aka nfs4_mount_data).
> If you want it to be a blob, at least have a decency to use encoding
> that would not depend on alignment rules and word size.  Hell, you
> could use XDR - it's not that nfs would need something new to handle
> it.  Or, better yet, use a normal string.

Mount doesn't appear to permit a big enough blob though. It has a hard limit
of PAGE_SIZE.

David


Re: Lilo requirements (Was: Re: Address space operations questions)

2005-04-18 Thread Bryan Henderson
>- unit of disk space allocation for the kernel image file is
> block. That is, optimizations like UFS fragments or reiserfs tails are
> not applied, and
>
>- blocks that kernel image is stored into are real disk blocks (i.e.,
> there is a way to disable "delayed allocation"), and
>
>- kernel image file is not relocated, i.e., data are not moved into
> other blocks on the fly.

It also has to implement the ioctl that tells you what blocks a file is in 
(that kind of implies much of the above).  Except if the LILO installer 
makes special provisions as for Reiserfs, of course.
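
The ioctl in question is presumably FIBMAP, which is what the LILO installer uses to build its block map; that identification is an assumption, since the message does not name it. A minimal sketch of a query for one block follows; file_block is a made-up helper, and the call normally needs root privileges.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FIBMAP */
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the filesystem block number backing logical block `index` of the
 * file at `path`, or -1 on any error (including lack of privilege or a
 * filesystem that does not implement the ioctl).  A result of 0 means the
 * logical block has no block on disk (a hole). */
long file_block(const char *path, int index)
{
    int fd;
    int blk = index;     /* in: logical block, out: physical block */

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, FIBMAP, &blk) < 0)
        blk = -1;
    close(fd);
    return blk;
}
```

A filesystem with delayed allocation or on-the-fly relocation would have to give stable, real answers here for the boot loader's map to stay correct.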

To be really exact, it's OK for the blocks to move, as long as it doesn't 
do so so subtly that the user doesn't know to rerun the LILO installer. 
E.g. you can move the blocks of the kernel file if someone overwrites it.

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems



Re: NFS4 mount problem

2005-04-18 Thread Bryan Henderson
>mount() is not a documented syscall.  The binary formats for filesystems
>like NFS are only documented inside the kernels to which they apply.

What  _is_ a documented system call?  Linux is famous for not having 
documented interfaces (or, put another way, not distinguishing between an 
interface you can read in an official document and one you discover by 
reading kernel source code).  But of all interfaces in Linux, the system 
call interface is probably the most accepted as one a user of the kernel 
can rely on.

I don't think a filesystem driver designer should expect mount options to 
be private to one particular user space program.  Especially one that 
isn't even packaged with the driver.

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems


Re: NFS4 mount problem

2005-04-18 Thread Al Viro
On Mon, Apr 18, 2005 at 10:07:14AM -0700, Bryan Henderson wrote:
> >On Fri, Apr 15, 2005 at 01:22:59PM -0700, David S. Miller wrote:
> >> 
> >> Make a ->compat_read_super() just like we have a ->compat_ioctl()
> >> method for files, if you want to suggest a solution like what
> >> you describe.
> >
> >I don't think we should encourage filesystem writers to do such stupid
> >things as ncpfs/smbfs do.  In fact I'm totally unhappy that nfs4 went
> >down that road.
> 
> Which road is that?

Architecture-dependent blob passed to mount(2) (aka nfs4_mount_data).
If you want it to be a blob, at least have a decency to use encoding
that would not depend on alignment rules and word size.  Hell, you
could use XDR - it's not that nfs would need something new to handle
it.  Or, better yet, use a normal string.


Re: NFS4 mount problem

2005-04-18 Thread Bryan Henderson
>On Fri, Apr 15, 2005 at 01:22:59PM -0700, David S. Miller wrote:
>> 
>> Make a ->compat_read_super() just like we have a ->compat_ioctl()
>> method for files, if you want to suggest a solution like what
>> you describe.
>
>I don't think we should encourage filesystem writers to do such stupid
>things as ncpfs/smbfs do.  In fact I'm totally unhappy that nfs4 went
>down that road.

Which road is that?

--
Bryan Henderson  IBM Almaden Research Center
San Jose CA  Filesystems



Re: NFS4 mount problem

2005-04-18 Thread Trond Myklebust
On Mon, 18.04.2005 at 16:23 (+0100), David Howells wrote:
> Trond Myklebust <[EMAIL PROTECTED]> wrote:
> 
> > Without such a library, it is pointless to contemplate "other callers".
> > With such a library, you will have a single point for switching between
> > 32bit and 64 bit.
> 
> "Other callers" include such as busybox, sash and uClinux. I'm not sure about
> such as Perl, but Perl is hardly in the same class as the other three.
> Admittedly, a library is probably the right way to do it - libmount or some
> such thing.

Hmm... Nope. None of the above have support for NFSv4 AFAICS. busybox
has support for NFSv2/v3, but not v4.

Note you are starting to convince me that the correct way to do this is
to bite the bullet, move all the binary stuff into a compat library that
we can drop at some later time, and then rebase on the NFSroot parser.

Cheers,
  Trond
-- 
Trond Myklebust <[EMAIL PROTECTED]>



Re: NFS4 mount problem

2005-04-18 Thread David Howells
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> Without such a library, it is pointless to contemplate "other callers".
> With such a library, you will have a single point for switching between
> 32bit and 64 bit.

"Other callers" include such as busybox, sash and uClinux. I'm not sure about
such as Perl, but Perl is hardly in the same class as the other three.
Admittedly, a library is probably the right way to do it - libmount or some
such thing.

> Then can we kill the PPC64 binary structure and substitute PPC32?

Whilst I might be happy to, I'm not sure I can speak for everyone. You can
also use i386 mount on x86_64 for instance; and possibly s390 on s390x,
sparc32 on sparc64 and mips32 on mips64.

David


Re: NFS4 mount problem

2005-04-18 Thread Trond Myklebust
On Mon, 18.04.2005 at 11:34 (+0100), David Howells wrote:
>  (1) The kernel is returning EFAULT to the 32-bit userspace; this implies that
>  userspace is handing over a bad address. It isn't, the kernel is
>  malfunctioning as it stands.
> 
>  (2) The kernel API does not prohibit 32-bit userspace calling mount() under a
>  64-bit kernel. All other filesystems cope with it (AFAIK), so NFS4 must
>  too.
> 
> Either the kernel should return ENOSYS for any 32-bit mount on a 64-bit kernel
> or it must support it fully. I think the latter is the right thing to do;
> despite what you'd prefer, there are other callers of the mount syscall out
> there.

No. If you want generalized support for mounting NFS filesystems, then
the right thing to do is to create a userland library that can translate
the mount options, set up the binary structure with sane defaults etc.

Without such a library, it is pointless to contemplate "other callers".
With such a library, you will have a single point for switching between
32bit and 64 bit.

My concern is that we are slowly but surely building up a bigger
in-kernel library for parsing the binary structure than it would take to
parse the naked mount option string. There are only 2 reasons for doing
that parsing in userland:

  1) DNS lookups
  2) Keeping the kernel parsing code small

> > There should therefore be exactly ONE instance of usage, and that is in
> > the "mount" program itself.
> 
> Exactly. That should then be the ppc32 mount; which should work equally well
> with a ppc32 or a ppc64 kernel.

Then can we kill the PPC64 binary structure and substitute PPC32?

Cheers,
  Trond

-- 
Trond Myklebust <[EMAIL PROTECTED]>



Re: NFS4 mount problem

2005-04-18 Thread David Howells
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> I don't think we should encourage filesystem writers to do such stupid
> things as ncpfs/smbfs do.  In fact I'm totally unhappy that nfs4 went
> down that road.

The problem with NFS4, I think, is that the mount syscall sets a hard limit on
the amount of mount data that's insufficiently large.

David


Re: NFS4 mount problem

2005-04-18 Thread David Howells

Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > We've come across an interesting problem with NFS4 mount on a PPC64 box. If
> > the mount program is compiled as PPC32, then the mount() syscall returns
> > EFAULT.
>
> So, why is this not a case of "Doctor it hurts..."?

Because:

 (1) The kernel is returning EFAULT to the 32-bit userspace; this implies that
 userspace is handing over a bad address. It isn't, the kernel is
 malfunctioning as it stands.

 (2) The kernel API does not prohibit 32-bit userspace calling mount() under a
 64-bit kernel. All other filesystems cope with it (AFAIK), so NFS4 must
 too.

Either the kernel should return ENOSYS for any 32-bit mount on a 64-bit kernel
or it must support it fully. I think the latter is the right thing to do;
despite what you'd prefer, there are other callers of the mount syscall out
there.

> There should therefore be exactly ONE instance of usage, and that is in
> the "mount" program itself.

Exactly. That should then be the ppc32 mount; which should work equally well
with a ppc32 or a ppc64 kernel.

David


[announce] mountlo 0.1 - loopback mounting in userspace

2005-04-18 Thread Miklos Szeredi
This program works similarly to "mount -o loop", but the filesystem
runs in userspace, making it possible for non-root users to safely
loopback mount filesystem images.

It works by starting a UML (User Mode Linux) instance, mounting the
image in there, and exporting the resulting data through FUSE. 

This is a first release and is really stupid: you can't even specify
the filesystem type or any mount options.  But for filesystems that
mount can recognize it works fine.

A binary compiled for i386 is available (2.1M) [1]. Requirements for
running the binary are:

  o FUSE-2.2 or greater, or kernel module from recent -mm kernel
  o Any Linux version supported by the above (>= 2.4.21 basically)

To compile from source, the following components are needed:

  o Linux 2.6.11 kernel source  (35M)  [2]
  o FUSE 2.3-pre4 source        (350k) [3]
  o mountlo 0.1 source          (15k)  [4]

Mount time is about 0.5 sec, which is ghastly compared to a native
kernel mount, but not so bad considering that a complete kernel boot
with initramfs unpacking, etc. is in there.  Other than this I haven't
done any performance measurements.

Comments, patches, offers to take over maintenance are welcome ;)

Miklos

[1] http://prdownloads.sourceforge.net/fuse/mountlo-i386-0.1.tar.gz
[2] http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.11.tar.bz2
[3] http://prdownloads.sourceforge.net/fuse/fuse-2.3-pre4.tar.gz
[4] http://prdownloads.sourceforge.net/fuse/mountlo-0.1.tar.gz