Re: FreeBSD 5/6/7 kernel emulator for NetBSD 2.x

2005-10-28 Thread Bill Studenmund
On Thu, Oct 27, 2005 at 04:53:35PM -0400, der Mouse wrote:
  - Implements FreeBSD's devfs on NetBSD.
  In the past, we (NetBSD folks) have talked about a devfs.
  [...persistence...]
  FreeBSD 5+ has /etc/devfs.conf and /etc/devfs.rules [...].  Is that
  what you're looking for?
 
 I didn't write what you're responding to, but, speaking personally:
 
 If making changes to /dev with chmod/chown/mv/etc results in those
 files getting rewritten to match, it's fine.  If not, it's not.

I think that having those changes propagated to the userland config files 
would be very good. I'd really want NetBSD to do it, and I bet it could be 
a useful feature for FreeBSD too.

Take care,

Bill


pgpPFSuyGvQ6e.pgp
Description: PGP signature


Re: FreeBSD 5/6/7 kernel emulator for NetBSD 2.x

2005-10-27 Thread Bill Studenmund
On Mon, Oct 24, 2005 at 10:35:47PM +0200, Hans Petter Selasky wrote:
 
 Main features:
 
 - Implements FreeBSD's devfs on NetBSD.

In the past, we (NetBSD folks) have talked about a devfs. One issue that 
has come up (I'll be honest, I've raised it a lot) is a desire to retain 
permission changes across boots, and to tie devices (when possible) to a 
device-specific attribute rather than a probe order.

Does FreeBSD's devfs support locators and persistent information? Are 
there plans to support something like that, if not?

Take care,

Bill


pgpr3OJEGJ3wV.pgp
Description: PGP signature


Re: kqueue, NOTE_EOF

2003-11-12 Thread Bill Studenmund
On Wed, Nov 12, 2003 at 09:58:15AM +0100, Jaromir Dolecek wrote:
 marius aamodt eriksen wrote:
  hi - 
  
  in order to be able to preserve consistent semantics across poll,
  select, and kqueue (EVFILT_READ), i propose the following change: on
  EVFILT_READ, add an fflag NOTE_EOF which will return when the file
  pointer *is* at the end of the file (effectively always returning on
  EVFILT_READ, but setting the NOTE_EOF flag when it is at the end).
  
  specifically, this allows libevent[1] to behave consistently across
  underlying polling infrastructures (this has become a practical
  issue).
 
 I'm not sure I understand what is the exact issue.

I'm only responding to the notes also.

 Why would this be necessary or what does this exactly solve? AFAIK
 poll() doesn't set any flags in this case neither, so I don't
 see how this is inconsistent.

I think the difference is in the default behavior. When you're at EOF, I 
know that poll() will give you a read-availability event, so you'll read 
the EOF. Will kqueue?

 BTW, shouldn't the EOF flag be cleared when the file is extended?

Probably.

Take care,

Bill


pgp0.pgp
Description: PGP signature


Re: Technical Differences of *BSD and Linux

2003-01-24 Thread Bill Studenmund
On Fri, 24 Jan 2003, arief_mulya wrote:

 Dear all,


 I apologize if this thread has existed before, and so if
 this is very offtopic and tiresome for most of you here.

 I'm a newbie, and just about to get my feet wet in the
 kernel code, having used (GNU/)Linux (or whatever the name
 is, I personally don't really care, I care most about
 technical excellence) for the last two years. I personally
 think it's a toupper(great); system.

 But after recently reviewing some BSD based systems, I began
 to wonder. And these are my questions (I'm trying to avoid
 flame and being a troll here, so if there's any of my
 questions is not on technical basis, or are being such a
 jerk troll please just trash filter my name and email address):

Evidently others opted to not pursue that option.

 1. In what technical area of the kernel are Linux and *BSD
 differ?
 2. How does it differ? What are the technical reasoning
 behind the decisions?

They differ in most technical areas, mainly because the *BSD kernels were
derived from 4.4-Lite, and Linux was derived, I believe, from Minix. The
differences grew since they were developed by differing groups of people.

Within the BSDs, the main focus of each one is different. To put it in
terms of sound bites, FreeBSD wants to make kick-ass servers, NetBSD wants
to support lots and lots of hardware, and OpenBSD is all about
security. That doesn't mean that the others ignore those areas; all three
are interested in security, and in being servers, and they all run on more
than just one platform.

There also is a lot of cross-pollination between the BSDs. Things will show
up in one and then get ported to another.

 3. Is there any group of developer from each project that
 review each other changes, and tries to make the best code
 out, or is the issues very system specific (something that
 work best on Linux might not be so on FreeBSD or NetBSD or
 OpenBSD)?

Sometimes changes will apply to all, and a comparable fix will happen to
each. This usually shows up in dealing with security advisories, but
happens in other places too. For the most part though, what the BSDs need
is different from what Linux needs, or at least the expertise doesn't
overlap.

 4. Any chance of merging the very best part of each kernel?
 5. Or is it possible to do so?

No, I don't foresee merging. der Mouse pointed out the GPL issue, which is
one where I think the BSD and Linux folks will just agree to disagree.

Take care,

Bill


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: Temperature

1999-12-30 Thread Bill Studenmund

On Wed, 29 Dec 1999, Ted Sikora wrote:

 It is HOTTER under FreeBSD. Immediately upon boot-up it's 26F
 hotter under FreeBSD than under Linux. Sometime after 3.4-RC and 
 now this started. (I follow the stable branch via CVSup.) Under
 3.3-STABLE the temperature was always the same as Linux...cool, averaging
 89F for the CPUs. Now it's over 113F under FreeBSD only. I know it's
 weird but the machine does not lie. Under Linux it's the same as before,
 87-89F.

The big question, of course, is: are you sure the machine's not lying?

I agree with you that it's unlikely to be a fundamental hardware
problem (like you're getting no air flow) if Linux still reports sane
temperatures. But it seems quite reasonable that temperature reading
somehow broke for your hardware when you upgraded.

Two easy ways to settle the issue: one is to get a thermocouple
thermometer, put the thermocouple in the case, and see exactly what
happens with the case temperature. Another is to compare the voltages
being returned by the temperature sensors, as opposed to the
reported temperature, under the two OS's. If the voltages are the same,
then it's definitely a reporting error. :-)

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Portable way to compare struct stat's?

1999-11-16 Thread Bill Studenmund

On Mon, 15 Nov 1999, Kelly Yancey wrote:

 
   Is there a portable method for determining if the contents of two struct
 stat's are identical? I believe there is not. The problem is that while

What exactly are you trying to do? i.e. why are you comparing the struct
stat's?

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Status of UMAPFS

1999-10-20 Thread Bill Studenmund

On Sat, 16 Oct 1999, Zhihui Zhang wrote:

 On Fri, 15 Oct 1999, Zhihui Zhang wrote:
 
  
  Is the UMAPFS working?  I add "options UMAPFS" to the configuration file
  of FreeBSD 3.3-Release and rebuilt the kernel.  I got the following
  errors: 
  
  loading kernel
  umap_vnops.o: In function `umap_lock':
  umap_vnops.o(.text+0x568): undefined reference to `null_bypass'
  umap_vnops.o: In function `umap_unlock':
  umap_vnops.o(.text+0x58e): undefined reference to `null_bypass'
  *** Error code 1
  
  Stop. 
  
 I find out that you must also include NULLFS in the kernel to compile. I
 have tested NULLFS and UMAPFS with some trivial commands.  Both works.

In NetBSD, we changed these two references to be to umap_bypass since it
is, after all, umapfs. :-)

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: mounting a partition more than once

1999-10-18 Thread Bill Studenmund

On Mon, 13 Sep 1999, Tony Finch wrote:

 Well, in the absence of any comments I hacked around a bit and ended
 up with the following patch (against 3.3-RC), which permits the same
 block device to be mounted read-only more than once. The motivation
 for this is to permit multiple chrooted environments to share the same
 /usr partition.

Wouldn't it be much cleaner to use nullfs? This application is what it's
good at.

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: The meaning of LK_INTERLOCK

1999-10-13 Thread Bill Studenmund

On Wed, 13 Oct 1999, Zhihui Zhang wrote:

 
 The comments say that the flag LK_INTERLOCK means "unlock passed simple
 lock after getting lk_interlock". Under what circumstances are we going to
 need two simple locks (release the first one after getting the second
 one)? I can not understand this easily from the source code. 
 
 Any help is appreciated.

The idea is that the other interlock protects something whose value
determines if we want to grab the lock.

For example, vn_lock() grabs the vnode interlock and looks at v_flag. If
VXLOCK is clear, we then call VOP_LOCK. By doing this interlock trick, no
one can get in and modify the flags before we've entered the lock manager.

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: The usage of MNT_RELOAD

1999-09-08 Thread Bill Studenmund

On Wed, 8 Sep 1999, Zhihui Zhang wrote:

 
 Does fsck have to run on a MOUNTED filesystem?  If so, your answer makes
 sense to me: if fsck modifies the on-disk copy of the superblock, it does
 not have to unmount and then remount the filesystem, it only need to
 reload the superlock for disk. 

I think it's more for the case where fsck has to run on a filesystem which
is mounted. It's better to fsck unmounted filesystems, but you don't
always have that option (say you want to fsck the fs that fsck itself
lives on :-)

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message




Re: BSD XFS Port BSD VFS Rewrite

1999-08-24 Thread Bill Studenmund

On Wed, 18 Aug 1999, Terry Lambert wrote:

  Right. That exported struct lock * makes locking down to the lowest-level
  file easy - you just feed it to the lock manager, and you're locking the
  same lock the lowest level fs uses. You then lock all vnodes stacked over
  this one at the same time. Otherwise, you just call VOP_LOCK below and
  then lock yourself.
 
 I think this defeats the purpose of the stacking architecture; I
 think that if you look at an unadulterated NULLFS, you'll see what I
 mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

 Intermediate FS's should not trap VOP's that are not applicable
 to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

 One of the purposes of doing a VOP_LOCK on intermediate vnodes
 that aren't backing objects is to deal with the global vnode
 pool management.  I'd really like FS's to own their vnode pools,
 but even without that, you don't need the locking, since you
 only need to flush data on vnodes that are backing objects.
 
 If we look at a stack of FS's with intermediate exposure into the
 namespace, then it's clear that the issue is really only applicable
 to objects that act as a backing store:
 
 
 FS              Exposed in hierarchy   Backing object
 --------------  ---------------------  --------------
 top             yes                    no
 intermediate_1  no                     no
 intermediate_2  no                     yes
 intermediate_3  yes                    no
 bottom          no                     yes
 
 So when we lock "top", we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.

  Actually isn't the only problem when you have vnode fan-in (union FS)? 
  i.e.  a plain compressing layer should not introduce vnode locking
  problems. 
 
 If it's a block compression layer, it will.  Also a translation layer;
 consider a pure Unicode system that wants to remotely mount an FS
 from a legacy system.  To do this, it needs to expand the pages from
 the legacy system [only it can, since the legacy system doesn't know
 about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
 on a file on such a system.  To propagate the lock, you have to do
 an arithmetic conversion at the translation layer.  This gets worse
 if the lower end FS is exposed in the namespace as well.

Wait. byte-range locking is different from vnode locking. I've been
talking about vnode locking, which is different from the byte-range
locking you're discussing above.

  Nope. The problem is that while stacking (null, umap, and overlay fs's)
  work, we don't have the coherency issues worked out so that upper layers
 can cache data. i.e. so that the lower fs knows it has to ask the upper
  layers to give pages back. :-) But multiple ls -lR's work fine. :-)
 
 With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

 You could actually think of it this way, as well: only FS's that
 contain vnodes that provide backing should implement VOP_GETPAGES
 and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-24 Thread Bill Studenmund

On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Yes, but we need subsecond in the filesystems.  Think about make(1) on
 a blinding fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point
suggesting having 128 bits to the left of the decimal point (128 bits
worth of seconds). I was trying to say that'd be a bit much. :-)

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message







Re: Need some advice regarding portable user IDs

1999-08-24 Thread Bill Studenmund
On Tue, 17 Aug 1999, Brian C. Grayson wrote:

 On Tue, Aug 17, 1999 at 07:17:45PM -0700, Wilfredo Sanchez wrote:
A group of us at Apple are trying to figure out how to handle  
  situations where a filesystem with foreign user ID's are present.   
 
   Have you looked at mount_umap(8)?  I (naively) think it would
 solve most of your concerns.

I don't think so. umap is for translating credentials between domains. I
think what Fred wants to do is different, and that is to ignore the
credentials on the system.

Fred, right now what happens in MacOS when I take a disk which has sharing
credentials set up, and hook it into another machine? How are the
credentials handled there?

Also, one of the problems which has been brought up in the thread is that
umap needs to know what credentials to translate to. For that, we'd need
to stash the credentials on the drive.

Take care,

Bill



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message










Re: Need some advice regarding portable user IDs

1999-08-18 Thread Bill Studenmund

On Wed, 18 Aug 1999, Wilfredo Sanchez wrote:

   I think Mac OS 8 will forget about the credentials.  I don't  
 actually know much about how sharing works.
 
   But the current file sharing behaviour is not entirely useful to  
 think about, because it doesn't affect the local permissions (much),  
 and the local permissions are what I'm worried about.  Exported  
 filesystems are another story, and I don't want to complicate things  
 too much by worrying about that right now.

My thought here was more that this was the closest thing to prior art that
MacOS has, and that that might be a good user experience to emulate. ;-)

Probably the thing to do is either have options to the mount call which
have the mounting user own everything, or to set up a umap which maps the
desired user to root for access on the filesystem.

Take care,

Bill



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 In message pine.sol.3.96.990816105106.27345h-100...@marcy.nas.nasa.gov, Bill Studenmund writes:
 On Sat, 14 Aug 1999, Terry Lambert wrote:
 
 Matt doesn't represent the FreeBSD project, and even if he rewrites
 the VFS subsystem so he can understand it, his rewrite would face
 considerable resistance on its way into FreeBSD.  I don't think
 there is reason to rewrite it, but there certainly are areas
 that need fixing.

Whew! That's reassuring. I agree there are things which need fixing. It'd
be nice if both NetBSD and FreeBSD could fix things in the same way.

 The use of the vfs_default to make unimplemented VOP's
 fall through to code which implements function, while well
 intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

If I understand the issue, it's that the leaf fs's (the bottom ones)
would use a default routine for non-error functionality. I think Terry's
point (which I agree with) was that a leaf fs's default routine should
only return errors.

  3. The filesystem itself is broken for Y2038
 One other suggestion I've heard is to split the 64 bits we have for time
 into 44 bits for seconds, and 20 bits for microseconds. That's more than
 enough modification resolution, and also pushes things to past year
 500,000 AD. Versioning the inode would cover this easily.
 
 This would be misguided, and given the current speed of evolution
 lead to other problems far before 2038.
 
 Both struct timespec and struct timeval are major mistakes, they
 make arithmetic on timestamps an expensive operation.  Timestamps
 should be stored as integers using an fix-point notations, for
 instance 64bits with 32bit fractional seconds (the NTP timestamp),
 or in the future 128/48.

I like that idea.

One thing I should probably mention is that I'm not suggesting we ever do
arithmetic on the 44/20 number, just that we store it that way. struct inode
would contain time fields in whatever format the host prefers, with the
44/20 stuff only being in struct dinode. Converting from 44/20 would only
happen on initial read. Math would happen on the host-format version. :-)

If time structures go to 64/32 fixed-point math, then my suggestion can be
re-phrased as storing 44.20 worth of that number in the on-disk inode.

 Extending from 64 to 128bits would be a cheap shift and increased
 precision and range could go hand in hand.

I doubt we need more than 64 bit times. 2^63 seconds works out to
292,279,025,208 years, or 292 (american) billion years. Current theories
put the age of the universe at I think 12 to 16 billion years. So 64-bit
signed times in seconds will cover from before the big bang to way past
any time we'll be caring about. :-)

Take care,

Bill



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 In message pine.sol.3.96.990818101005.14430b-100...@marcy.nas.nasa.gov, Bill Studenmund writes:
 
 Whew! That's reasuring. I agree there are things which need fixing. It'd
 be nice if both NetBSD and FreeBSD could fix things in the same way.
 
 Well, that still remains to be seen...

:-)

 I doubt we need more than 64 bit times. 2^63 seconds works out to
 292,279,025,208 years, or 292 (american) billion years. Current theories
 put the age of the universe at I think 12 to 16 billion years. So 64-bit
 signed times in seconds will cover from before the big bang to way past
 any time we'll be caring about. :-)

I was unclear. I was referring to the seconds side of things. Sub-second
resolution would need other bits.

Take care,

Bill



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Terry Lambert wrote:

  Right. That exported struct lock * makes locking down to the lowest-level
  file easy - you just feed it to the lock manager, and you're locking the
  same lock the lowest level fs uses. You then lock all vnodes stacked over
  this one at the same time. Otherwise, you just call VOP_LOCK below and
  then lock yourself.
 
 I think this defeats the purpose of the stacking architecture; I
 think that if you look at an unadulterated NULLFS, you'll see what I
 mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

 Intermediate FS's should not trap VOP's that are not applicable
 to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

 One of the purposes of doing a VOP_LOCK on intermediate vnodes
 that aren't backing objects is to deal with the global vnode
 pool management.  I'd really like FS's to own their vnode pools,
 but even without that, you don't need the locking, since you
 only need to flush data on vnodes that are backing objects.
 
 If we look at a stack of FS's with intermediate exposure into the
 namespace, then it's clear that the issue is really only applicable
 to objects that act as a backing store:
 
 
 ----  
 FSExposed in hierarchyBacking object
 ----  
 top   yes no
 intermediate_1no  no
 intermediate_2no  yes
 intermediate_3yes no
 bottomno  yes
 ----  
 
 So when we lock top, we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.

  Actually isn't the only problem when you have vnode fan-in (union FS)? 
  i.e.  a plain compressing layer should not introduce vnode locking
  problems. 
 
 If it's a block compression layer, it will.  Also a translation layer;
 consider a pure Unicode system that wants to remotely mount an FS
 from a legacy system.  To do this, it needs to expand the pages from
 the legacy system [only it can, since the legacy system doesn't know
 about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
 on a file on such a system.  To propagate the lock, you have to do
 an arithmetic conversion at the translation layer.  This gets worse
 if the lower end FS is exposed in the namespace as well.

Wait. byte-range locking is different from vnode locking. I've been
talking about vnode locking, which is different from the byte-range
locking you're discussing above.

  Nope. The problem is that while stacking (null, umap, and overlay fs's)
  work, we don't have the coherency issues worked out so that upper layers
 can cache data. i.e. so that the lower fs knows it has to ask the upper
  layers to give pages back. :-) But multiple ls -lR's work fine. :-)
 
 With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

 You could actually think of it this way, as well: only FS's that
 contain vnodes that provide backing should implement VOP_GETPAGES
 and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Yes, but we need subsecond in the filesystems.  Think about make(1) on
 a blinding fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point
suggesting having 128 bits to the left of the decimal point (128 bits
worth of seconds). I was trying to say that'd be a bit much. :-)

Take care,

Bill






Re: Need some advice regarding portable user IDs

1999-08-18 Thread Bill Studenmund
On Tue, 17 Aug 1999, Brian C. Grayson wrote:

 On Tue, Aug 17, 1999 at 07:17:45PM -0700, Wilfredo Sanchez wrote:
A group of us at Apple are trying to figure out how to handle  
  situations where a filesystem with foreign user IDs is present.
 
   Have you looked at mount_umap(8)?  I (naively) think it would
 solve most of your concerns.

I don't think so. umap is for translating credentials between domains. I
think what Fred wants to do is different, and that is to ignore the
credentials on the system.

Fred, right now what happens in MacOS when I take a disk which has sharing
credentials set up, and hook it into another machine? How are the
credentials handled there?

Also, one of the problems which has been brought up in the thread is that
umap needs to know what credentials to translate to. For that, we'd need
to stash the credentials on the drive.

Take care,

Bill






Re: Need some advice regarding portable user IDs

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Wilfredo Sanchez wrote:

   I think Mac OS 8 will forget about the credentials.  I don't  
 actually know much about how sharing works.
 
   But the current file sharing behaviour is not entirely useful to  
 think about, because it doesn't affect the local permissions (much),
 and the local permissions are what I'm worried about. Exported
 filesystems are another story, and I don't want to complicate things
 too much by worrying about that right now.

My thought here was more that this was the closest thing to prior art that
MacOS has, and that that might be a good user experience to emulate. ;-)

Probably the thing to do is either have options to the mount call which
have the mounting user own everything, or to set up a umap which maps the
desired user to root for access on the filesystem.

Take care,

Bill






Re: Need some advice regarding portable user IDs

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Chris Dillon wrote:

 I'm probably being extremely naive myself, but I just envisioned a
 scenario like this (pardon me if someone else has already suggested
 this):
 
 When a filesystem is mounted as foreign (HOW that is determined I
 won't talk about), every file in the filesystem has its credentials
 mapped to that of the mountpoint.  File mode bits are not remapped in
 any way.  New files gain the credentials of their _foreign_ parent.
 
 That's the skinny.  Now I'll give a (much longer) example to clarify.

Sounds fine, except I'd have the owner & group passed in in the initial
mount, rather than taken from the mount point. :-)

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund

On Tue, 17 Aug 1999, Terry Lambert wrote:

 2.Advisory locks are hung off private backing objects.
  I'm not sure. The struct lock * is only used by layered filesystems, so
  they can keep track both of the underlying vnode lock, and if needed their
  own vnode lock. For advisory locks, would we want to keep track both of
  locks on our layer and the layer below? Don't we want either one or the
  other? i.e. layers bypass to the one below, or deal with it all
  themselves.
 
 I think you want the lock on the intermediate layer: basically, on
 every vnode that has data associated with it that is unique to a
 layer.  Let's not forget, also, that you can expose a layer into
 the namespace in one place, and expose it covered under another
 layer, at another.  If you locked down to the backing object, then
 the only issue you would be left with is one or more intermediate
 backing objects.

Right. That exported struct lock * makes locking down to the lowest-level
file easy - you just feed it to the lock manager, and you're locking the
same lock the lowest level fs uses. You then lock all vnodes stacked over
this one at the same time. Otherwise, you just call VOP_LOCK below and
then lock yourself.

 For a layer with an intermediate backing object, I'm prepared to
 declare it "special", and proxy the operation down to any inferior
 backing object (e.g. a union FS that adds files from two FS's
 together, rather than just directory entry lists).  I think such
 layers are the exception, not the rule.

Actually isn't the only problem when you have vnode fan-in (union FS)? 
i.e.  a plain compressing layer should not introduce vnode locking
problems. 

 I think that export policies are the realm of /etc/exports.
 
 The problem with each FS implementing its own policy, is that this
 is another place that copyinstr() gets called, when it shouldn't.

Well, my thought was that, like with current code, most every fs would
just call vfs_export() when it's presented an export operation. But by
retaining the option of having the fs do its own thing, we can support
different export semantics if desired.

 Right.  The "covering" operation is not the same as the "marking as
 covered" operation.  Both need to be at the higher level.
 Not really.  Julian Elischer had code that mounted a /devfs under
 / automatically, before the user was ever allowed to see /.  As a
 result, the FS that you were left with was indistinguishable from
 what I describe.
 
 The only real difference is that, as a translucent mount over /devfs,
 the one I describe would be capable of implementing persistant changes
 to the /devfs, as whiteouts.  I don't think this is really that
 desirable, but some people won't accept a devfs that doesn't have
 traditional persistance semantics (e.g. "chmod" vs. modifying a
 well known kernel data structure as an administrative operation).

That wouldn't be hard to do. :-)

 I guess the other difference is that you don't have to worry about
 large minor numbers when you are bringing up a new platform via
 NFS from an old platform that can't support large minors in its FS
 at all.  ;-).

True. :-)

 I would resolve this by passing a standard option to the mount code
 in user space.  For root mounts, a vnode is passed down.  For other
 mounts, the vnode is parsed and passed if the option is specified.

Or maybe add a field to vfsops. This info says what the mount call will
expect (I want a block device, a regular file, a directory, etc), so it
fits. :-)

Also, if we leave it to userland, what happens if someone writes a
program which calls sys_mount with something the fs doesn't expect. :-)

 I think that you will only be able to find rare examples of FS's
 that don't take device names as arguments.  But for those, you
 don't specify the option, and it gets "NULL", and whatever local
 options you specify.

I agree I can't see a leaf fs not taking a device node. But layered fs's
certainly will want something else. :-)

 The point is that, for FS's that can be both root and sub-root,
 the mount code doesn't have to make the decision, it can be punted
 to higher level code, in one place, where the code can be centrally
 maintained and kept from getting "stale" when things change out
 from under it.

True.

And with good comments we can catch the times when the centrally located
code changes and breaks an assumption made by the fs. :-)

  Except for a minor buglet with device nodes, stacking works in NetBSD at
  present. :-)
 
 Have you tried Heidemann's student's stacking layers?  There is one
 encryption, and one per-file compression with namespace hiding, that
 I think it would be hard pressed to keep up with.  But I'll give it
 the benefit of the doubt.  8-).

Nope. The problem is that while stacking (null, umap, and overlay fs's)
work, we don't have the coherency issues worked out so that upper layers
can cache data. i.e. so that the lower fs knows it has to ask the upper
layers to give pages back. :-) But multiple 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund
On Tue, 17 Aug 1999, Michael Hancock wrote:

 As I recall most of FBSD's default routines are also error routines, if
 the exceptions were a problem it would would be trivial to fix.
 
 I think fixing resource allocation/deallocation for things like vnodes,
 cnbufs, and locks are a higher priority for now.  There are examples such
 as in detached threading where it might make sense for the detached child
 to be responsible for releasing resources allocated to it by the parent,
 but in stacking this model is very messy and unnatural.  This is why the
 purpose of VOP_ABORTOP appears to be to release cnbufs but this is really
 just an ugly side effect.  With stacking the code that allocates should be
  the code that deallocates. Substitute "code" with "layer" to be more
 correct. 
 
 I fixed a lot of the vnode and locking cases, unfortunately the ones that
 remain are probably ugly cases where you have to reacquire locks that had
 to be unlocked somewhere in the executing layer.  See VOP_RENAME for an
 example.  Compare the number of WILLRELEs in vnode_if.src in FreeBSD and
 NetBSD, ideally there'd be none.

I've compared the two, and making the NetBSD number match the FreeBSD
number is one of my goals. :-)

Any suggestions, or just plodfix?

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund
On Wed, 18 Aug 1999, Michael Hancock wrote:

 Interesting, have you read the Heidemann paper that outlines a solution
 that uses a cache manager?
 
 You can probably find it somewhere here,
 http://www.isi.edu/~johnh/SOFTWARE/UCLA_STACKING/

Nope. I've read his dissertation, and his discussion of the lock
management inspired the struct lock * work I did for NetBSD (we use the
address of the lock, not the vnode, but other than that it's the same).

Thanks for the ref!

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Bill Studenmund

On Sat, 14 Aug 1999, Terry Lambert wrote:

  I am currently conducting a thorough study of the VFS subsystem
  in preparation for an all-out effort to port SGI's XFS filesystem to
  FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
  has written in hackers- that the VFS subsystem is presently not
  well understood by any of the active kernel code contributors and
  that it will be rewritten later this year.  This is obviously of great
  concern to me in this port.
 
 It is of great concern to me that a rewrite, apparently because of
 non-understanding, is taking place at all.

That concerns me too. Many aspects of the 4.4 vnode interface were there  
for specific reasons. Even if they were hack solutions, to re-write them  
because of a lack of understanding is dangerous as the new code will
likely run into the same problems as before. :-)

Also, it behooves all the *BSD's to not get too divergent. Sharing code
between us all helps all. Given that I'm working on the kernel side of a
data migration file system using NetBSD, I can assure you there are things
which FreeBSD would get access to more easily the more similar the two VFS
interfaces are. :-)

 I would suggest that anyone planning on this rewrite should talk,
 in depth, with John Heidemann prior to engaging in such activity.
 John is very approachable, and is a deep thinker.  Any rewrite
 that does not meet his original design goals for his stacking
 architecture is, I think, a Very Bad Idea(tm).
 
 
  I greatly appreciate all assistance in answering the following
  questions:
  
  1)  What are the perceived problems with the current VFS?
  2)  What options are available to us as remedies?
  3)  To what extent will existing FS code require revision in order
   to be useful after the rewrite?
  4)  Will Chapters 6, 7, 8 & 9 of "The Design and Implementation of
   the 4.4BSD Operating System" still pertain after the rewrite?
  5)  How important are questions 3 & 4 in the design of the new
   VFS?
  
  I believe that the VFS is conceptually sound and that the existing
  semantics should be strictly retained in the new code.  Any new
  functionality should be added in the form of entirely new kernel 
  routines and system calls, or possibly by such means as
  converting the existing routines to the vararg format etc.
 
 Here some of the problems I'm aware of, and my suggested remedies:
 
 1.The interface is not reflexive, with regard to cn_pnbuf.
 
   Specifically, path buffers are allocated by the caller, but
   not freed by the caller, and various routines in each FS
   implementation are expected to deal with this.
 
   Each FS duplicates code, and such duplication is subject
   to error.  Not to mention that it makes your kernel fat.

Yep, that's not good.

 2.Advisory locks are hung off private backing objects.
 
   Advisory locks are passed into VOP_ADVLOCK in each FS
   instance, and then each FS applies this by hanging the
   locks off a list on a private backing object.  For FFS,
   this is the in core inode.
 
   A more correct approach would be to hang the lock off the
   vnode.  This effectively obviates the need for having a
   VOP_ADVLOCK at all, except for the NFS client FS, which
   will need to propagate lock requests across the net.  The
   most efficient mechanism for this would be to institute
   a pass/fail response for VOP_ADVLOCK calls, with a default
   of "pass", and an actual implementation of the operand only
   in the NFS client FS.

I agree that it's better for all fs's to share this functionality as much
as possible.

I'd vote against your implementation suggestion for VOP_ADVLOCK on an
efficiency concern. If we actually make a VOP call, that should be the
end of the story. I.e. either add a vnode flag to indicate pass/fail-ness,
or add a genfs/std call to handle the problem.

I'd actually vote for the latter. Hang the byte-range locking off of the
vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on
OS flavor) to handle the call. That way all fs's can share code, and the
callers need only call VOP_ADVLOCK() - no other logic.

NetBSD actually needs this to get unionfs to work. Do you want to talk
privately about it?

   Again, each FS must duplicate the advisory locking code,
   at present, and such duplication is subject to error.

Agreed.

 3.Object locks are implemented locally in many FS's.
 
   The VOP_LOCK interface is implemented via vop_stdlock()
   calls in many FS's.  This is done using the "vfs_default"
   mechanism.  In other FS's, it's implemented locally.
 
   The intent of the VOP_LOCK mechanism being implemented
   as a VOP at all was to allow it to be proxied to another
   machine over a network, using the original Heidemann
   design.  This is also the reason for the use of descriptors
   for all VOP arguments, since they can be opaquely 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Bill Studenmund

On Mon, 16 Aug 1999, Terry Lambert wrote:

   2.Advisory locks are hung off private backing objects.
  I'd vote against your implementation suggestion for VOP_ADVLOCK on an
  efficiency concern. If we actually make a VOP call, that should be the
  end of the story. I.e. either add a vnode flag to indicate pass/fail-ness,
  or add a genfs/std call to handle the problem.
  
  I'd actually vote for the latter. Hang the byte-range locking off of the
  vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on
  OS flavor) to handle the call. That way all fs's can share code, and
  the callers need only call VOP_ADVLOCK() - no other logic.
 
 OK.  Here's the problem with that:  NFS client locks in a stacked
 FS on top of the NFS client FS.

Ahh, but it'd be the fs's decision to map genfs_advlock()/vop_stdadvlock()
to its vop_advlock_desc entry or not. In this case, NFS wouldn't want to
do that.

Though it would mean growing the fs footprint.

 Specifically, you need to seperate the idea of asserting a lock
 against the local vnode, asserting the lock via NFS locking, and
 coalescing the local lock list, after both have succeeded, or
 reverting the local assertion, should the remote assertion fail.

Right. But my thought was that you'd be calling an NFS routine, so it
could do the right thing.

  NetBSD actually needs this to get unionfs to work. Do you want to talk
  privately about it?
 
 If you want.  FreeBSD needs it for unionfs and nullfs, so it's
 something that would be worth airing.
 
 I think you could say that no locking routine was an approval of
 the upper level lock.  This lets you bail on all FS's except NFS,
 where you have to deal with the approve/reject from the remote
 host.  The problem with this on FreeBSD is the VFS_default stuff,
 which puts a non-NULL interface on all FS's for all VOP's.

I'm not familiar with the VFS_default stuff. All the vop_default_desc
routines in NetBSD point to error routines.

 Yes, this NULL is the same NULL I suggested for advisory locks,
 above.

I'm not sure. The struct lock * is only used by layered filesystems, so
they can keep track both of the underlying vnode lock, and if needed their
own vnode lock. For advisory locks, would we want to keep track both of
locks on our layer and the layer below? Don't we want either one or the
other? i.e. layers bypass to the one below, or deal with it all
themselves.

   5.The idea of "root" vs. "non-root" mounts is inherently bad.
  You forgot:
  
  5)  Update export lists
  
  If you call the mount routine with no device name
  (args.fspec == 0) and with MNT_UPDATE, you get
  routed to the vfs_export routine
 
 This must be the job of the upper level code, so that there is
 a single control point for export information, instead of spreading
 it throughout each FS's mount entry point.

I agree it should be detangled, but think it should remain the fs's job to
choose to call vfs_export. Otherwise an fs can't impliment its own export
policies. :-)

  I thought it was? Admittedly the only reference code I have is the ntfs
  code in the NetBSD kernel. But given how full of #ifdef (__FreeBSD__)'s it
  is, I thought it'd be an ok reference.
 
 No.

We've lost the context, but what I was trying to say was that I thought
the marking-the-vnode-as-mounted-on bit was done in the mount syscall at
present. At least that's what
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.130
seems to be doing.

 Basically, what you would have is the equivalent of a variable
 length "mounted volume" table, from which mappings (and exports,
 based on the mappings) are externalized into the namespace.

Ahh, sounds like you're talking about a new formalism..

 Right.  It should just have a "mount" entry point, and the rest
 of the stuff moves to higher level code, called by the mount system
 call, and the mountroot stuff during boot, to externalize the root
 volume at the top of the hierarchy.
 
 An ideal world would mount a / that had a /dev under it, and then
 do transparent mounts over top of that.

That would be quite a different place than we have now. ;-)

 The conversion of the root device into a vnode pointer, or
 a path to a device into a vnode pointer, is the job of upper
 level code -- specifically, the mount system call, and the
 common code for booting.
  
  My one concern about this is you've assumed that the user is mounting a
  device onto a filesystem.
 
 No.  Vnode, not bdevvp.  The bdevvp stuff is for the boot time stuff
 in the upper level code, and only applies to the root volume.

Maybe I mis-parsed. I thought you were talking about parsing the first
mount option (in mount /dev/disk there, the /dev/disk option) into a
vnode. The concern below is that different fs's have different ideas as to
what that node should be. Some want it a device node which no one else is
using (most leaf fs's), while some others want 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Bill Studenmund
On Sat, 14 Aug 1999, Terry Lambert wrote:

  I am currently conducting a thorough study of the VFS subsystem
  in preparation for an all-out effort to port SGI's XFS filesystem to
  FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
  has written in hackers- that the VFS subsystem is presently not
  well understood by any of the active kernel code contributers and
  that it will be rewritten later this year.  This is obviously of great
  concern to me in this port.
 
 It is of great concern to me that a rewrite, apparently because of
 non-understanding, is taking place at all.

That concerns me too. Many aspects of the 4.4 vnode interface were there  
for specific reasons. Even if they were hack solutions, to re-write them  
because of a lack of understanding is dangerous as the new code will
likely run into the same problems as before. :-)

Also, it behooves all the *BSD's to not get too divergent. Sharing code
between us all helps all. Given that I'm working on the kernel side of a
data migration file system using NetBSD, I can assure you there are things
which FreeBSD would get access to more easily the more-similar the two VFS
interface are. :-)

 I would suggest that anyone planning on this rewrite should talk,
 in depth, with John Heidemann prior to engaging in such activity.
 John is very approachable, and is a deep thinker.  Any rewrite
 that does not meet his original design goals for his stacking
 architecture is, I think, a Very Bad Idea(tm).
 
 
  I greatly appreciate all assistance in answering the following
  questions:
  
  1)  What are the perceived problems with the current VFS?
  2)  What options are available to us as remedies?
  3)  To what extent will existing FS code require revision in order
   to be useful after the rewrite?
  4)  Will Chapters 6,7,8  9 of The Design and Implementation of
   the 4.4BSD Operating System still pertain after the rewrite?
  5)  How important are questions 3  4 in the design of the new
   VFS?
  
  I believe that the VFS is conceptually sound and that the existing
  semantics should be strictly retained in the new code.  Any new
  functionality should be added in the form of entirely new kernel 
  routines and system calls, or possibly by such means as
  converting the existing routines to the vararg format etc.
 
 Here some of the problems I'm aware of, and my suggested remedies:
 
 1.The interface is not reflexive, with regard to cn_pnbuf.
 
   Specifically, path buffers are allocated by the caller, but
   not freed by the caller, and various routines in each FS
   implementation are expected to deal with this.
 
   Each FS duplicates code, and such duplication is subject
   to error.  Not to mention that it makes your kernel fat.

Yep, that's not good.

 2.Advisory locks are hung off private backing objects.
 
   Advisory locks are passed into VOP_ADVLOCK in each FS
   instance, and then each FS applies this by hanging the
   locks off a list on a private backing object.  For FFS,
   this is the in core inode.
 
   A more correct approach would be to hang the lock off the
   vnode.  This effectively obviates the need for having a
   VOP_ADVLOCK at all, except for the NFS client FS, which
   will need to propagate lock requests across the net.  The
   most efficient mechanism for this would be to institute
   a pass/fail response for VOP_ADVLOCK calls, with a default
   of pass, and an actual implementation of the operand only
   in the NFS client FS.

I agree that it's better for all fs's to share this functionality as much
as possible.

I'd vote against your implementation suggestion for VOP_ADVLOCK on an
efficiency concern. If we actually make a VOP call, that should be the
end of the story. I.e. either add a vnode flag to indicate pass/fail-ness,
or add a genfs/std call to handle the problem.

I'd actually vote for the latter. Hang the byte-range locking off of the
vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on
OS flavor) to handle the call. That way all fs's can share code, and
the callers need only call VOP_ADVLOCK() - no other logic.
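As a rough userland sketch of that suggestion (genfs_advlock here is a stand-in name, not the real kernel routine): the lock list hangs off the vnode itself and one shared routine services it, so any local fs can point its advlock op at the same code, and only something like the NFS client needs its own version.

```c
#include <stddef.h>

/* Sketch only -- not the actual NetBSD/FreeBSD code.  Byte-range locks
 * live on a list hung off the vnode, and one shared routine services
 * that list, so every local fs can map its advlock op to the same
 * implementation. */

struct advlock {
    long start, end;
    struct advlock *next;
};

struct vnode {
    struct advlock *v_advlocks;         /* lock list on the vnode itself */
};

static int overlaps(const struct advlock *l, long start, long end)
{
    return !(end < l->start || start > l->end);
}

/* Shared implementation usable by any local fs: 0 = granted, 1 = conflict. */
int genfs_advlock(struct vnode *vp, struct advlock *nl)
{
    struct advlock *l;

    for (l = vp->v_advlocks; l != NULL; l = l->next)
        if (overlaps(l, nl->start, nl->end))
            return 1;                   /* conflicting lock: fail */
    nl->next = vp->v_advlocks;          /* no conflict: record it */
    vp->v_advlocks = nl;
    return 0;
}
```

A real version would also handle lock coalescing, unlock, and owner tracking, but the point is that none of that lives in per-fs code.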

NetBSD actually needs this to get unionfs to work. Do you want to talk
privately about it?

   Again, each FS must duplicate the advisory locking code,
   at present, and such duplication is subject to error.

Agreed.

 3.Object locks are implemented locally in many FS's.
 
   The VOP_LOCK interface is implemented via vop_stdlock()
   calls in many FS's.  This is done using the vfs_default
   mechanism.  In other FS's, it's implemented locally.
 
   The intent of the VOP_LOCK mechanism being implemented
   as a VOP at all was to allow it to be proxied to another
   machine over a network, using the original Heidemann
   design.  This is also the reason for the use of descriptors
   for all VOP arguments, since they can be opaquely proxied 

Re: BSD XFS Port & BSD VFS Rewrite

1999-08-16 Thread Bill Studenmund
On Mon, 16 Aug 1999, Terry Lambert wrote:

   2.Advisory locks are hung off private backing objects.
  I'd vote against your implementation suggestion for VOP_ADVLOCK on an 
  efficiency concern. If we actually make a VOP call, that should be the
  end of the story. I.e. either add a vnode flag to indicate pass/fail-ness,
  or add a genfs/std call to handle the problem.
  
  I'd actually vote for the latter. Hang the byte-range locking off of the
  vnode, and add a genfs_advlock() or vop_stdadvlock() routine (depending on
  OS flavor) to handle the call. That way all fs's can share code, and
  the callers need only call VOP_ADVLOCK() - no other logic.
 
 OK.  Here's the problem with that:  NFS client locks in a stacked
 FS on top of the NFS client FS.

Ahh, but it'd be the fs's decision to map genfs_advlock()/vop_stdadvlock()
to its vop_advlock_desc entry or not. In this case, NFS wouldn't want to
do that.

Though it would mean growing the fs footprint.

 Specifically, you need to separate the idea of asserting a lock
 against the local vnode, asserting the lock via NFS locking, and
 coalescing the local lock list, after both have succeeded, or
 reverting the local assertion, should the remote assertion fail.

Right. But my thought was that you'd be calling an NFS routine, so it
could do the right thing.

  NetBSD actually needs this to get unionfs to work. Do you want to talk
  privately about it?
 
 If you want.  FreeBSD needs it for unionfs and nullfs, so it's
 something that would be worth airing.
 
 I think you could say that no locking routine was an approval of
 the upper level lock.  This lets you bail on all FS's except NFS,
 where you have to deal with the approve/reject from the remote
 host.  The problem with this on FreeBSD is the VFS_default stuff,
 which puts a non-NULL interface on all FS's for all VOP's.

I'm not familiar with the VFS_default stuff. All the vop_default_desc
routines in NetBSD point to error routines.
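A toy model of that dispatch difference (the symbols are invented for illustration, not the real kernel ones): every slot in an fs's op table starts out pointing at an error routine, the NetBSD vop_default_desc style, and the fs then overrides only the ops it actually implements.

```c
#include <errno.h>

/* Toy model of default op dispatch; illustrative names only.  Slots
 * are first filled with an error routine, so a non-implemented VOP
 * fails cleanly, and the fs explicitly overrides what it provides.
 * A VFS_default-style scheme would instead fill unset slots with
 * non-NULL shared stubs. */

typedef int (*vop_t)(void);

static int vop_error(void)  { return EOPNOTSUPP; }  /* default slot */
static int fs_getattr(void) { return 0; }           /* fs-provided op */

enum { VOP_GETATTR, VOP_ADVLOCK, VOP_NOPS };

static vop_t fs_ops[VOP_NOPS];

void fs_init(void)
{
    int i;

    for (i = 0; i < VOP_NOPS; i++)
        fs_ops[i] = vop_error;          /* non-implemented ops error out */
    fs_ops[VOP_GETATTR] = fs_getattr;   /* explicit override */
}

int vcall(int op) { return fs_ops[op](); }
```

The policy question in the thread is exactly which function sits in the unfilled slots: an error routine, or a shared genfs/std implementation.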

 Yes, this NULL is the same NULL I suggested for advisory locks,
 above.

I'm not sure. The struct lock * is only used by layered filesystems, so
they can keep track both of the underlying vnode lock, and if needed their
own vnode lock. For advisory locks, would we want to keep track both of
locks on our layer and the layer below? Don't we want either one or the
other? i.e. layers bypass to the one below, or deal with it all
themselves.

   5.The idea of root vs. non-root mounts is inherently bad.
  You forgot:
  
  5)  Update export lists
  
  If you call the mount routine with no device name
  (args.fspec == 0) and with MNT_UPDATE, you get
  routed to the vfs_export routine
 
 This must be the job of the upper level code, so that there is
 a single control point for export information, instead of spreading
 it throughout each FS's mount entry point.

I agree it should be detangled, but think it should remain the fs's job to
choose to call vfs_export. Otherwise an fs can't implement its own export
policies. :-)

  I thought it was? Admittedly the only reference code I have is the ntfs
  code in the NetBSD kernel. But given how full of #ifdef (__FreeBSD__)'s it
  is, I thought it'd be an ok reference.
 
 No.

We've lost the context, but what I was trying to say was that I thought
the marking-the-vnode-as-mounted-on bit was done in the mount syscall at
present. At least that's what
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.130
seems to be doing.

 Basically, what you would have is the equivalent of a variable
 length mounted volume table, from which mappings (and exports,
 based on the mappings) are externalized into the namespace.

Ahh, sounds like you're talking about a new formalism..

 Right.  It should just have a mount entry point, and the rest
 of the stuff moves to higher level code, called by the mount system
 call, and the mountroot stuff during boot, to externalize the root
 volume at the top of the hierarchy.
 
 An ideal world would mount a / that had a /dev under it, and then
 do transparent mounts over top of that.

That would be quite a different place than we have now. ;-)

 The conversion of the root device into a vnode pointer, or
 a path to a device into a vnode pointer, is the job of upper
 level code -- specifically, the mount system call, and the
 common code for booting.
  
  My one concern about this is you've assumed that the user is mounting a
  device onto a filesystem.
 
 No.  Vnode, not bdevvp.  The bdevvp stuff is for the boot time stuff
 in the upper level code, and only applies to the root volume.

Maybe I mis-parsed. I thought you were talking about parsing the first
mount option (in mount /dev/disk there, the /dev/disk option) into a
vnode. The concern below is that different fs's have different ideas as to
what that node should be. Some want it a device node which no one else is
using (most leaf fs's), while some others want a 

Re: BSD XFS Port & BSD VFS Rewrite

1999-08-13 Thread Bill Studenmund
On Fri, 13 Aug 1999, Terry Lambert wrote:

 Has anyone mentioned to them that they will be unable to incorporate
 changes made to the GPL'ed version of XFS back into the IRIX version
 of XFS, without IRIX becoming GPL'ed?

Given that they say they're dropping IRIX and going with Linux, I don't
think it'll be a problem.

Take care,

Bill



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: Improving the Unix API

1999-06-28 Thread Bill Studenmund
On Mon, 28 Jun 1999, Francois-Rene Rideau wrote:

 On Sun, Jun 27, 1999 at 12:58:05PM -0400, der Mouse wrote:
  See NetBSD (and presumably other BSD) mount -o update,rdonly and/or
  umount -f.  (Last I tried, the latter didn't work as it should, but
  that's a matter of fixing bugs rather than introducing new features.)
 If you re-read the original message, the problem is what to do
 about processes with open file descriptors on the partition:
 stop them at once? stop them at first file access?
 block them instead? kill them? Will you do it atomically?
 How will you allow for such large table-walking to be compatible
 with real-time kernel response? [Hint: either use incremental
 data-structures, or don't be atomic and be interruptible instead.]

umount -f is more intended for oh-sh*t situations. So harshness is ok.

The way it's done is that all of the vnodes in that fs's vnode list get
either vgone'd or vcleaned (in the -f case). This will have the effect of
mapping them to deadfs vnodes, so all future access will either fail or do
nothing (close works, read returns an error). There aren't any big table
walks. :-)
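A toy model of that revocation (illustrative names, not the real vgone/vclean code): repointing the vnode's op vector at dead ops is all it takes, so close still succeeds while read returns an error, with no walk over the processes holding the file open.

```c
#include <errno.h>

/* Toy model of mapping a vnode to deadfs behavior; illustrative names
 * only.  Revoking the vnode just swaps its op vector for "dead" ops:
 * future reads fail, close works, and no per-user bookkeeping is
 * needed. */

struct vnode;
struct vnodeops {
    int (*vop_read)(struct vnode *);
    int (*vop_close)(struct vnode *);
};
struct vnode { const struct vnodeops *v_op; };

static int live_read(struct vnode *vp)  { (void)vp; return 0; }
static int live_close(struct vnode *vp) { (void)vp; return 0; }
static int dead_read(struct vnode *vp)  { (void)vp; return EIO; }
static int dead_close(struct vnode *vp) { (void)vp; return 0; }

const struct vnodeops live_ops = { live_read, live_close };
const struct vnodeops dead_ops = { dead_read, dead_close };

/* vgone(): map the vnode to dead ops; future access fails or no-ops. */
void vgone_sketch(struct vnode *vp) { vp->v_op = &dead_ops; }
```

Every open descriptor keeps working in the degraded sense described above, which is why the forced case can afford to be harsh.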

Take care,

Bill


