RE: BSD XFS Port & BSD VFS Rewrite

1999-11-01 Thread Alton, Matthew

I spent an hour on the phone with SGI's lead FS scientist Dan Koren discussing
the XFS situation, Margo Seltzer's LFS work, ships, sails, sealing wax...  The
code is not yet open.  It is being "disencumbered" and retrofitted to the Linux
kernel interfaces by a team of contractors and university people all under NDA.
So we're on hold for the time being.  Unless you want to sign an NDA and move to
Iowa for a year or so.

We BSDies really are going to have to come up with something in the way of a
modern storage subsystem.

> -----Original Message-----
> From: Andrzej Bialecki [SMTP:[EMAIL PROTECTED]]
> Sent: Saturday, October 30, 1999 10:56 AM
> To:   Alton, Matthew
> Subject:      Re: BSD XFS Port & BSD VFS Rewrite
>
> On Thu, 5 Aug 1999, Alton, Matthew wrote:
>
> > I am currently conducting a thorough study of the VFS subsystem
> > in preparation for an all-out effort to port SGI's XFS filesystem to
> > FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
>
> Is there anything that you might say on the progress status of this
> project? Thanks!
>
> Andrzej Bialecki
>
> //  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
> // ---
> // -- FreeBSD: The Power to Serve. http://www.freebsd.org
> // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/
 



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: BSD XFS Port & BSD VFS Rewrite

1999-11-01 Thread Russell Cattelan

"Alton, Matthew" wrote:

Hmm, interesting... Guess I need to read the hackers list more often.
So, anybody interested in what is going on right now?

Legal BS: the encumbrance work is progressing at the expected snail's pace.
The hardest question to answer at this point: what is encumbered and what isn't?
It isn't even clear what constitutes encumbrance: structures, function names, APIs?
Short summary of all this: nobody has any clear idea how long it will be before
the code can be released.

I do have a bit of good news: anybody truly interested in helping out with the
project can sign an NDA through the company that SGI has contracted to work on
the Linux port. This is basically to protect SGI until the code has been
officially cleaned and blessed.

Contact me directly if you are interested:
[EMAIL PROTECTED]

Where are we at with the Linux port...
Well, we can mount file systems, df, ls, and read files (not a complete
implementation).
I am currently working on the write path; this one is much more complicated and
will require additional work from other people to be completed first.

There are a lot, and I mean a lot, of issues involved with getting XFS to
interface with the buffer/memory management system of an OS. IRIX pulls a lot
of tricks with delayed allocation, holes, overlapping buffers, pinning, etc.

There is a lot of discussion amongst the Linux people about how to proceed with
upgrading Linux's buffer/page code.

I am currently trying to keep Linux-specific stuff out of the bowels of XFS.
In fact, one of our main goals is to change as little XFS code as possible, since
all current improvements and bug fixes are being done on the IRIX code base.

If people have ideas on how to keep this a "portable" file system, let me know.
It is easier for me to push things in certain directions now rather than later.


> I spent an hour on the phone with SGI's lead FS scientist Dan Koren discussing
> the XFS situation, Margo Seltzer's LFS work, ships, sails, sealing wax...  The
> code is not yet open.  It is being "disencumbered" and retrofitted to the Linux
> kernel interfaces by a team of contractors and university people all under NDA.
> So we're on hold for the time being.  Unless you want to sign an NDA and move to
> Iowa for a year or so.
>
> We BSDies really are going to have to come up with something in the way of a
> modern storage subsystem.
>
> > -----Original Message-----
> > From: Andrzej Bialecki [SMTP:[EMAIL PROTECTED]]
> > Sent: Saturday, October 30, 1999 10:56 AM
> > To:   Alton, Matthew
> > Subject:  Re: BSD XFS Port & BSD VFS Rewrite
> >
> > On Thu, 5 Aug 1999, Alton, Matthew wrote:
> >
> > > I am currently conducting a thorough study of the VFS subsystem
> > > in preparation for an all-out effort to port SGI's XFS filesystem to
> > > FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
> >
> > Is there anything that you might say on the progress status of this
> > project? Thanks!
> >
> > Andrzej Bialecki
> >
> > //  [EMAIL PROTECTED] WebGiro AB, Sweden (http://www.webgiro.com)
> > // ---
> > // -- FreeBSD: The Power to Serve. http://www.freebsd.org
> > // --- Small & Embedded FreeBSD: http://www.freebsd.org/~picobsd/

--
Russell Cattelan
[EMAIL PROTECTED]








Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Julian Elischer

The discussions between Kirk and Matt over a glass of beer/drink
at Kirk's party at USENIX and at the Bay Area User's Group.



On Wed, 18 Aug 1999, Nate Williams wrote:

> > > Matt doesn't represent the FreeBSD project, and even if he rewrites
> > > the VFS subsystem so he can understand it, his rewrite would face
> > > considerable resistance on its way into FreeBSD.  I don't think
> > > there is reason to rewrite it, but there certainly are areas
> > > that need fixing.
> >
> > You are misinformed as far as I know.. From discussions I saw, the
> > main architect of a VFS rewrite would be Kirk, and Matt would be acting as
> > Kirk's right-hand-man.
>
> Which discussions are these?  Are they archived somewhere?
>
>
> Nate
 






Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Julian Elischer



On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

> Matt doesn't represent the FreeBSD project, and even if he rewrites
> the VFS subsystem so he can understand it, his rewrite would face
> considerable resistance on its way into FreeBSD.  I don't think
> there is reason to rewrite it, but there certainly are areas
> that need fixing.

You are misinformed as far as I know.. From discussions I saw, the
main architect of a VFS rewrite would be Kirk, and Matt would be acting as
Kirk's right-hand-man.

>
> > The use of the "vfs_default" to make unimplemented VOP's
> > fall through to code which implements function, while well
> > intentioned, is misguided.
>
> I beg to differ.  The only difference is that we pass through
> multiple layers before we hit the bottom of the stack.  There is
> no loss of functionality but significant gain of clarity and
> modularity.

Well I believe that Kirk considers them misguided too, but he stated that
he wasn't going to remove them without serious thought about the alternatives.
 






Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Bill Studenmund

On Wed, 18 Aug 1999, Terry Lambert wrote:

> > Right. That exported struct lock * makes locking down to the lowest-level
> > file easy - you just feed it to the lock manager, and you're locking the
> > same lock the lowest level fs uses. You then lock all vnodes stacked over
> > this one at the same time. Otherwise, you just call VOP_LOCK below and
> > then lock yourself.
>
> I think this defeats the purpose of the stacking architecture; I
> think that if you look at an unadulterated NULLFS, you'll see what I
> mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

> Intermediate FS's should not trap VOP's that are not applicable
> to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

> One of the purposes of doing a VOP_LOCK on intermediate vnodes
> that aren't backing objects is to deal with the global vnode
> pool management.  I'd really like FS's to own their vnode pools,
> but even without that, you don't need the locking, since you
> only need to flush data on vnodes that are backing objects.
>
> If we look at a stack of FS's with intermediate exposure into the
> namespace, then it's clear that the issue is really only applicable
> to objects that act as a backing store:
>
>
> ----------------------  --------------------  --------------
> FS                      Exposed in hierarchy  Backing object
> ----------------------  --------------------  --------------
> top                     yes                   no
> intermediate_1          no                    no
> intermediate_2          no                    yes
> intermediate_3          yes                   no
> bottom                  no                    yes
> ----------------------  --------------------  --------------
>
> So when we lock "top", we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.
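
[Editorial sketch] The "one lock for the whole thing" idea can be illustrated
with a toy model in which every stacked vnode points at the lock exported by
the vnode below it, so taking the lock at any layer locks the whole stack in
one step. This is only a sketch under that assumption: Vnode, build_stack and
layers are hypothetical names, not kernel interfaces, and Python is used purely
for brevity.

```python
import threading


class Vnode:
    """Hypothetical stand-in for a vnode in a stack of layered filesystems."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        # A stacked vnode reuses the lock exported by the vnode below it;
        # only the bottom vnode allocates a lock of its own.
        self.lock = lower.lock if lower is not None else threading.Lock()


def build_stack(names):
    """Build a stack from the bottom up and return the top vnode."""
    vp = None
    for name in names:
        vp = Vnode(name, vp)
    return vp


def layers(top):
    """List the vnodes in a stack from top to bottom."""
    out = []
    while top is not None:
        out.append(top)
        top = top.lower
    return out
```

With a single shared lock there is no window in which one layer is locked and
another is not, which is the deadlock concern raised above. A layer that also
needs a private lock of its own is outside the scope of this sketch.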

> > Actually isn't the only problem when you have vnode fan-in (union FS)?
> > i.e.  a plain compressing layer should not introduce vnode locking
> > problems.
>
> If it's a block compression layer, it will.  Also a translation layer;
> consider a pure Unicode system that wants to remotely mount an FS
> from a legacy system.  To do this, it needs to expand the pages from
> the legacy system [only it can, since the legacy system doesn't know
> about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
> on a file on such a system.  To propagate the lock, you have to do
> an arithmetic conversion at the translation layer.  This gets worse
> if the lower end FS is exposed in the namespace as well.

Wait. byte-range locking is different from vnode locking. I've been
talking about vnode locking, which is different from the byte-range
locking you're discussing above.

> > Nope. The problem is that while stacking (null, umap, and overlay fs's)
> > work, we don't have the coherency issues worked out so that upper layers
> > can cache data. i.e. so that the lower fs knows it has to ask the upper
> > layers to give pages back. :-) But multiple ls -lR's work fine. :-)
>
> With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

> You could actually think of it this way, as well: only FS's that
> contain vnodes that provide backing should implement VOP_GETPAGES
> and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill






Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Bill Studenmund

On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

> Yes, but we need subsecond in the filesystems.  Think about make(1) on
> a blinding fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point
suggesting having 128 bits to the left of the decimal point (128 bits
worth of seconds). I was trying to say that'd be a bit much. :-)

Take care,

Bill






Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Terry Lambert

> > > > > I'm not familiar with the VFS_default stuff. All the vop_default_desc
> > > > > routines in NetBSD point to error routines.
> > > >
> > > > In FreeBSD, they now point to default routines that are *not* error
> > > > routines.  This is the problem.  I admit the change was very well
> > > > intentioned, since it made the code a hell of a lot more readable,
> > > > but choosing between readable and additional function, I take function
> > > > over form (I think the way I would have "fixed" the readability is by
> > > > making the operations that result in the descriptor set for a mounted
> > > > FS instance be both discrete, and named for their specific function).
> > >
> > > As I recall most of FBSD's default routines are also error routines; if
> > > the exceptions were a problem it would be trivial to fix.
> >
> > You would have to de-collapse several VOP lists that have been
> > pre-collapsed.
>
> You are talking gibberish here.  Please show code where this is
> a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

Some VOP's, like advisory locking, need both local assertion and
remote proxy of the VOP to avoid introducing race windows.

The result of this is that, if you rely on the vfs_default stuff,
then you can't proxy those VOP's into a different address space,
either on another machine, or to a user space VFS stacking layer
development environment.
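
[Editorial sketch] The concern can be modelled with a toy dispatcher in which
an operation missing from a layer's own table falls through to a local default
instead of being forwarded. Everything here is an assumption made for
illustration: Layer, VFS_DEFAULT, make_proxy and the "islocked" and "read"
operation names are hypothetical and do not reproduce the FreeBSD vnode
operations vector.

```python
def default_islocked(vnode):
    # Stand-in for a fallback that always runs in the caller's address space.
    return ("local-default", "islocked", vnode)


# Hypothetical table of fall-through handlers shared by all layers.
VFS_DEFAULT = {"islocked": default_islocked}


class Layer:
    """A layer with its own op table and a fall-through to the defaults."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = ops

    def call(self, op, vnode):
        handler = self.ops.get(op) or VFS_DEFAULT.get(op)
        if handler is None:
            raise NotImplementedError(op)  # no handler anywhere
        return handler(vnode)


def make_proxy(send):
    """Proxy layer that forwards only the ops it was written to know about."""
    return Layer("proxy", {"read": lambda vnode: send("read", vnode)})
```

In this model a "read" reaches the remote side, while "islocked" is silently
answered by the local default and never crosses the proxy. A proxy that
installs its own handler for every operation avoids the fall-through, at the
cost of having to know every operation in advance.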

This is the same problem that embedding VM references directly
into any FS causes, and that vm_object_t aliases would exacerbate.

John has, in the past, sent me a number of stacking layers done
by various people, with the requirement that I not redistribute
them, as they are not what he would consider to be properly
representative of finished work.

Since John himself did the network proxy, you could perhaps get
him to send you a copy, so you could have direct access to code
where this was a problem.

Make sure that the system you are talking to over the proxy is
not assumed to be a FreeBSD system (e.g. don't assume that the
vfs_default stuff exists on the other end of the proxy, or that
it would be functional).


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Terry Lambert writes:

> > > You would have to de-collapse several VOP lists that have been
> > > pre-collapsed.
> >
> > You are talking gibberish here.  Please show code where this is
> > a problem.
>
> When you write a proxy stacking layer, such as John Heidemann's
> network proxy stacking layer (an NFS alternative), VOP's which
> would normally be handled by vfs_default have to be handled on
> the other end of the proxy, instead, in the same way that they
> would be handled by the vfs_default stuff.

And what prevents you from taking over the default op ?

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Poul-Henning Kamp
In message pine.sol.3.96.990818104932.14430d-100...@marcy.nas.nasa.gov,
Bill Studenmund writes:

> > I doubt we need more than 64 bit times. 2^63 seconds works out to
> > 292,279,025,208 years, or 292 (American) billion years. Current theories
> > put the age of the universe at I think 12 to 16 billion years. So 64-bit
> > signed times in seconds will cover from before the big bang to way past
> > any time we'll be caring about. :-)
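
[Editorial sketch] The range figure quoted above can be checked with integer
arithmetic. The year length is an assumption: 365.24 days of 86,400 seconds is
the value that reproduces the quoted number exactly, and other year lengths
give slightly different results. years_representable is a hypothetical helper
name.

```python
# Assumed year length: 365.24 days of 86,400 seconds each.
SECONDS_PER_YEAR = 31_556_736


def years_representable(bits, signed=True):
    """Whole years covered by the positive range of a counter of seconds."""
    value_bits = bits - 1 if signed else bits
    return (1 << value_bits) // SECONDS_PER_YEAR


if __name__ == "__main__":
    for width in (32, 64):
        print(width, "bit signed seconds:", years_representable(width), "years")
```

Under the same assumption a signed 32-bit counter of seconds covers only 68
years, which is why the width of the seconds field matters at all.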

> I was unclear. I was referring to the seconds side of things. Sub-second
> resolution would need other bits.

Yes, but we need subsecond in the filesystems.  Think about make(1) on
a blinding fast machine...
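
[Editorial sketch] The make(1) problem can be shown with a simplified
"rebuild when the source is newer than the target" rule and timestamps that
the filesystem truncates to its own resolution. The rule is a simplification
of what make implementations actually do, and stored_mtime and needs_rebuild
are hypothetical helper names.

```python
NSEC_PER_SEC = 1_000_000_000


def stored_mtime(t_nsec, resolution_nsec):
    """Timestamp as a filesystem with the given resolution would record it."""
    return t_nsec - (t_nsec % resolution_nsec)


def needs_rebuild(source_mtime, target_mtime):
    """Simplified rule: rebuild only when the source is strictly newer."""
    return source_mtime > target_mtime


if __name__ == "__main__":
    built = 10 * NSEC_PER_SEC + 200_000_000   # target written at t = 10.2 s
    edited = 10 * NSEC_PER_SEC + 700_000_000  # source edited at t = 10.7 s
    for res in (NSEC_PER_SEC, 1):
        stale = needs_rebuild(stored_mtime(edited, res), stored_mtime(built, res))
        print("resolution", res, "ns -> rebuild:", stale)
```

With one-second resolution both files carry the same timestamp, so an edit
made half a second after the build goes unnoticed.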

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Poul-Henning Kamp
In message pine.bsf.3.95.990818105716.12306a-100...@current1.whistle.com,
Julian Elischer writes:
> On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:
>
> > Matt doesn't represent the FreeBSD project, and even if he rewrites
> > the VFS subsystem so he can understand it, his rewrite would face
> > considerable resistance on its way into FreeBSD.  I don't think
> > there is reason to rewrite it, but there certainly are areas
> > that need fixing.
>
> You are misinformed as far as I know.. From discussions I saw, the
> main architect of a VFS rewrite would be Kirk, and Matt would be acting as
> Kirk's right-hand-man.

I bet that Matt and Kirk use "rewrite" for two very different
concepts.  The resulting reviews will be equally different.

> > > The use of the "vfs_default" to make unimplemented VOP's
> > > fall through to code which implements function, while well
> > > intentioned, is misguided.
> >
> > I beg to differ.  The only difference is that we pass through
> > multiple layers before we hit the bottom of the stack.  There is
> > no loss of functionality but significant gain of clarity and
> > modularity.
>
> Well I believe that Kirk considers them misguided too, but he stated that
> he wasn't going to remove them without serious thought about the alternatives.

I'll be more than ready to discuss this with Kirk.

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Terry Lambert
> > > > 2.  Advisory locks are hung off private backing objects.
> > >
> > > I'm not sure. The struct lock * is only used by layered filesystems, so
> > > they can keep track both of the underlying vnode lock, and if needed their
> > > own vnode lock. For advisory locks, would we want to keep track both of
> > > locks on our layer and the layer below? Don't we want either one or the
> > > other? i.e. layers bypass to the one below, or deal with it all
> > > themselves.
> >
> > I think you want the lock on the intermediate layer: basically, on
> > every vnode that has data associated with it that is unique to a
> > layer.  Let's not forget, also, that you can expose a layer into
> > the namespace in one place, and expose it covered under another
> > layer, at another.  If you locked down to the backing object, then
> > the only issue you would be left with is one or more intermediate
> > backing objects.
>
> Right. That exported struct lock * makes locking down to the lowest-level
> file easy - you just feed it to the lock manager, and you're locking the
> same lock the lowest level fs uses. You then lock all vnodes stacked over
> this one at the same time. Otherwise, you just call VOP_LOCK below and
> then lock yourself.

I think this defeats the purpose of the stacking architecture; I
think that if you look at an unadulterated NULLFS, you'll see what I
mean.

Intermediate FS's should not trap VOP's that are not applicable
to them.

One of the purposes of doing a VOP_LOCK on intermediate vnodes
that aren't backing objects is to deal with the global vnode
pool management.  I'd really like FS's to own their vnode pools,
but even without that, you don't need the locking, since you
only need to flush data on vnodes that are backing objects.

If we look at a stack of FS's with intermediate exposure into the
namespace, then it's clear that the issue is really only applicable
to objects that act as a backing store:


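----------------------  --------------------  --------------
FS                      Exposed in hierarchy  Backing object
----------------------  --------------------  --------------
top                     yes                   no
intermediate_1          no                    no
intermediate_2          no                    yes
intermediate_3          yes                   no
bottom                  no                    yes
----------------------  --------------------  --------------

So when we lock "top", we only lock in intermediate_2 and in bottom.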

Then we attempt to lock in intermediate_3, but it fails: not because
there is a lock on the vnode in intermediate_3, but because there is
a lock in bottom.

It's unnecessary to lock the vnodes in the intermediate path, or
even at the exposure level, unless they are vnodes that have an
associated backing store.

The need to lock in intermediate_2 exists because it is a translation
layer or a namespace escape.  It deals with compression, or it deals
with file-as-a-directory folding, or it deals with file-hiding
(perhaps for a quota file), etc.  If it didn't, it wouldn't need
backing store (and therefore wouldn't need to be locked).
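
[Editorial sketch] The rule proposed here (lock only the layers at or below
the entry point that own a backing object) can be written out against the
table above. This models only this message's proposal, not any shipped
implementation; STACK, layers_to_lock and conflicts are hypothetical names.

```python
# (layer, exposed in hierarchy, has backing object), listed top first,
# taken from the table in the message.
STACK = [
    ("top", True, False),
    ("intermediate_1", False, False),
    ("intermediate_2", False, True),
    ("intermediate_3", True, False),
    ("bottom", False, True),
]


def layers_to_lock(stack, entry):
    """Layers at or below the entry point that own a backing object."""
    names = [name for name, _, _ in stack]
    start = names.index(entry)
    return [name for name, _, backing in stack[start:] if backing]


def conflicts(stack, a, b):
    """True if locking through entry points a and b touches a common layer."""
    return bool(set(layers_to_lock(stack, a)) & set(layers_to_lock(stack, b)))
```

Entering at "top" locks intermediate_2 and bottom; entering at intermediate_3
locks only bottom, so the second attempt collides with the first at bottom
rather than at intermediate_3 itself.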


> > For a layer with an intermediate backing object, I'm prepared to
> > declare it special, and proxy the operation down to any inferior
> > backing object (e.g. a union FS that adds files from two FS's
> > together, rather than just directory entry lists).  I think such
> > layers are the exception, not the rule.
>
> Actually isn't the only problem when you have vnode fan-in (union FS)?
> i.e.  a plain compressing layer should not introduce vnode locking
> problems.

If it's a block compression layer, it will.  Also a translation layer;
consider a pure Unicode system that wants to remotely mount an FS
from a legacy system.  To do this, it needs to expand the pages from
the legacy system [only it can, since the legacy system doesn't know
about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
on a file on such a system.  To propagate the lock, you have to do
an arithmetic conversion at the translation layer.  This gets worse
if the lower end FS is exposed in the namespace as well.

You could make the same arguments for other types of translation or
namespace escapes.
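
[Editorial sketch] The arithmetic conversion for a byte-range lock can be
illustrated under the assumption of a fixed 2:1 expansion, where every byte in
the legacy file appears as two bytes in the upper layer. EXPANSION and
upper_to_lower_range are hypothetical names; a real translation layer with
variable-width encodings would need more than a constant factor.

```python
# Assumed: each lower-layer byte is presented as two upper-layer bytes.
EXPANSION = 2


def upper_to_lower_range(offset, length):
    """Map an upper-layer byte range onto the lower-layer bytes backing it.

    The start is rounded down and the end is rounded up, so a lock that
    covers half of an expanded character still covers the byte behind it.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = offset // EXPANSION
    end = -(-(offset + length) // EXPANSION)  # ceiling division
    return start, end - start


if __name__ == "__main__":
    for rng in ((0, 4), (1, 2), (3, 1)):
        print(rng, "->", upper_to_lower_range(*rng))
```

Rounding outward means two upper-layer locks that do not overlap can still map
to overlapping lower-layer ranges, which is one way the conversion becomes
awkward when the lower filesystem is also visible in the namespace.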


> > I think that export policies are the realm of /etc/exports.
> >
> > The problem with each FS implementing its own policy, is that this
> > is another place that copyinstr() gets called, when it shouldn't.
>
> Well, my thought was that, like with current code, most every fs would
> just call vfs_export() when it's presented an export operation. But by
> retaining the option of having the fs do its own thing, we can support
> different export semantics if desired.

I think this bears down on whether the NFS server VFS consumer is
allowed access to the VFS stack at the particular intermediate
layer.  I think this is really an administrative policy decision,
and not an option for the VFS.

I think it would be bad if a given VFS could refuse to participate
in a stacking 

Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Matthew Dillon
:On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:
:
: Matt doesn't represent the FreeBSD project, and even if he rewrites
: the VFS subsystem so he can understand it, his rewrite would face
: considerable resistance on its way into FreeBSD.  I don't think
: there is reason to rewrite it, but there certainly are areas
: that need fixing.
:
:You are misinformed as far as I know.. From discussions I saw, the
:main architect of a VFS rewrite would be Kirk, and Matt would be acting as
:Kirk's right-hand-man.

Yes, this is correct.  Kirk is going to be the main architect.  I have
been heavily involved and will continue to be.

:The use of the vfs_default to make unimplemented VOP's
:
: I beg to differ.  The only difference is that we pass through
: multiple layers before we hit the bottom of the stack.  There is
:...
:Well I believe that Kirk considers them misguided too, but he stated that
:he wasn't going to remove them without serious thought about the alternatives.

The vfs op callout layering has not been on the radar screen.  There
are far too many other, more serious problems.  I really doubt that any
changes will be made to this piece any time in the next year or even two,
if at all.

The main items on the radar screen are related to buffer management
(struct buf stuff.  For example, preventing VM blockages due to pages
being wired by write I/O's), VFS locking and reference count issues 
(for example, namei lookups, blockages in the pager and syncer due to
vnode locks held by blocked processes, etc...), and interactions 
between VFS and VM (for example: moving away from VOP_READ/VOP_WRITE 
and moving more towards a getpages/putpages model).

None of the items have been set in stone yet.  We're waiting for Kirk
to get back from vacation and get back into the groove.

-Matt






Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Nate Williams
> > Matt doesn't represent the FreeBSD project, and even if he rewrites
> > the VFS subsystem so he can understand it, his rewrite would face
> > considerable resistance on its way into FreeBSD.  I don't think
> > there is reason to rewrite it, but there certainly are areas
> > that need fixing.
>
> You are misinformed as far as I know.. From discussions I saw, the
> main architect of a VFS rewrite would be Kirk, and Matt would be acting as
> Kirk's right-hand-man.

Which discussions are these?  Are they archived somewhere?


Nate





Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Poul-Henning Kamp
In message 199908181737.laa03...@mt.sri.com, Nate Williams writes:
> > > Both struct timespec and struct timeval are major mistakes, they
> > > make arithmetic on timestamps an expensive operation.  Timestamps
> > > should be stored as integers using a fixed-point notation, for
> > > instance 64bits with 32bit fractional seconds (the NTP timestamp),
> > > or in the future 128/48.
> > ...
> >
> > > Extending from 64 to 128bits would be a cheap shift and increased
> > > precision and range could go hand in hand.
> >
> > I doubt we need more than 64 bit times. 2^63 seconds works out to
> > 292,279,025,208 years, or 292 (American) billion years.
>
> I think Poul's point is that in the future seconds is probably way too
> coarse grained.  Computers are getting faster all the time, and in the
> future we may need 64 bits of seconds, plus an additional 64 bits for the
> fractions of a second, which will be necessary for accurate timekeeping.

No, 64bits of fractions will not be needed, at least until we start
using FreeBSD as an embedded computer in Heisenberg compensators.

I recall somebody saying that 100GHz was the highest realistic (or
lowest unrealistic) clock frequency using digital logic; the argument
was pretty convincing physically: light speed sets a size limit,
that prescribes some voltage gradients, which in turn produce EMC,
which in turn makes sure nothing works.  Also various tunnel effects
and the general Heisenbergisms took their toll.

State of the art time interval measuring equipment is into the
few-picosecond territory (http://www.timing.com/).

Based on that I would say that 40 to 48 bits will be OK for the
fraction.
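
[Editorial sketch] The 40 to 48 bit estimate can be checked by asking how many
binary fraction bits are needed before one step is no larger than a given
interval. Exact rationals avoid floating-point rounding; resolution and
bits_for are hypothetical helper names.

```python
from fractions import Fraction

PICOSECOND = Fraction(1, 10**12)


def resolution(frac_bits):
    """Smallest step, in seconds, of a fraction field with frac_bits bits."""
    return Fraction(1, 2**frac_bits)


def bits_for(interval):
    """Fewest fraction bits whose step is no larger than the interval."""
    bits = 0
    while resolution(bits) > interval:
        bits += 1
    return bits


if __name__ == "__main__":
    for bits in (32, 40, 48):
        print(bits, "bits ->", float(resolution(bits)), "s per step")
```

A 32-bit fraction steps in units of a little over 200 picoseconds, 40 bits is
the first width that gets below one picosecond, and 48 bits reaches the
femtosecond range.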

As a sidebar:  I had a kernel running which used 32i.32f timestamps
and converted to timeval and timespec as needed, and it actually made
a lot of code look a lot more sane.  I may go back and do it some
day.
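
[Editorial sketch] A 32i.32f timestamp keeps whole seconds in the upper 32
bits of one integer and a binary fraction in the lower 32, so sums and
differences are single integer operations with no separate carry step. The
code below is not the kernel mentioned above; from_timespec, to_timespec and
to_timeval are hypothetical helpers, and Python integers stand in for a 64-bit
word.

```python
FRAC_BITS = 32
FRAC_MASK = (1 << FRAC_BITS) - 1
NSEC_PER_SEC = 1_000_000_000
USEC_PER_SEC = 1_000_000


def from_timespec(sec, nsec):
    """Pack (seconds, nanoseconds) into one fixed-point integer."""
    return (sec << FRAC_BITS) | ((nsec << FRAC_BITS) // NSEC_PER_SEC)


def to_timespec(ts):
    """Unpack to (seconds, nanoseconds), truncating the fraction."""
    return ts >> FRAC_BITS, ((ts & FRAC_MASK) * NSEC_PER_SEC) >> FRAC_BITS


def to_timeval(ts):
    """Unpack to (seconds, microseconds), truncating the fraction."""
    return ts >> FRAC_BITS, ((ts & FRAC_MASK) * USEC_PER_SEC) >> FRAC_BITS


if __name__ == "__main__":
    a = from_timespec(1, 500_000_000)
    b = from_timespec(0, 750_000_000)
    print(to_timespec(a + b), to_timespec(a - b), to_timeval(a))
```

Adding 1.5 s and 0.75 s is a plain integer addition; the carry from the
fraction into the seconds happens for free, whereas a timespec needs an
explicit normalisation step after every add or subtract.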

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port & BSD VFS Rewrite

1999-08-24 Thread Bill Studenmund
On Wed, 18 Aug 1999, Terry Lambert wrote:

  Right. That exported struct lock * makes locking down to the lowest-level
  file easy - you just feed it to the lock manager, and you're locking the
  same lock the lowest level fs uses. You then lock all vnodes stacked over
  this one at the same time. Otherwise, you just call VOP_LOCK below and
  then lock yourself.
 
 I think this defeats the purpose of the stacking architecture; I
 think that if you look at an unadulterated NULLFS, you'll see what I
 mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

 Intermediate FS's should not trap VOP's that are not applicable
 to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

 One of the purposes of doing a VOP_LOCK on intermediate vnodes
 that aren't backing objects is to deal with the global vnode
 pool management.  I'd really like FS's to own their vnode pools,
 but even without that, you don't need the locking, since you
 only need to flush data on vnodes that are backing objects.
 
 If we look at a stack of FS's with intermediate exposure into the
 namespace, then it's clear that the issue is really only applicable
 to objects that act as a backing store:
 
 
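 --------------  --------------------  --------------
 FS              Exposed in hierarchy  Backing object
 --------------  --------------------  --------------
 top             yes                   no
 intermediate_1  no                    no
 intermediate_2  no                    yes
 intermediate_3  yes                   no
 bottom          no                    yes
 --------------  --------------------  --------------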
 
 So when we lock top, we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.
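
As a rough illustration of "one lock for the whole thing", the sketch
below models a vnode stack in Python. It is a toy under stated
assumptions, not NetBSD or FreeBSD code: the Vnode class and try_lock
helper are hypothetical, and a threading.Lock stands in for the
exported struct lock *.

```python
import threading


class Vnode:
    """Toy vnode: only the bottom of a stack allocates a lock."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        # A layered vnode borrows the lock exported by the vnode below it,
        # so every vnode in the stack refers to the same lock object.
        self.lock = lower.lock if lower is not None else threading.Lock()


def try_lock(vnode):
    """Return True if the vnode's lock is currently free (and leave it free)."""
    got = vnode.lock.acquire(blocking=False)
    if got:
        vnode.lock.release()
    return got


bottom = Vnode("bottom")
middle = Vnode("intermediate", lower=bottom)
top = Vnode("top", lower=middle)
```

Taking top.lock here also makes try_lock(bottom) fail, because the
whole stack is locked in a single step rather than layer by layer.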

  Actually isn't the only problem when you have vnode fan-in (union FS)? 
  i.e.  a plain compressing layer should not introduce vnode locking
  problems. 
 
 If it's a block compression layer, it will.  Also a translation layer;
 consider a pure Unicode system that wants to remotely mount an FS
 from a legacy system.  To do this, it needs to expand the pages from
 the legacy system [only it can, since the legacy system doesn't know
 about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
 on a file on such a system.  To propagate the lock, you have to do
 an arithmetic conversion at the translation layer.  This gets worse
 if the lower end FS is exposed in the namespace as well.

Wait. Byte-range locking is different from vnode locking. I've been
talking about vnode locking, not the byte-range locking you're
discussing above.

  Nope. The problem is that while stacking (null, umap, and overlay fs's)
  works, we don't have the coherency issues worked out so that upper layers
  can cache data, i.e. so that the lower fs knows it has to ask the upper
  layers to give pages back. :-) But multiple ls -lR's work fine. :-)
 
 With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

 You could actually think of it this way, as well: only FS's that
 contain vnodes that provide backing should implement VOP_GETPAGES
 and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-24 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Yes, but we need subsecond in the filesystems.  Think about make(1) on
 a blinding fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point
suggesting having 128 bits to the left of the decimal point (128 bits
worth of seconds). I was trying to say that'd be a bit much. :-)
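
The make(1) concern quoted above can be shown with a small Python
sketch. The needs_rebuild function and the timestamps are invented for
illustration; real make implementations differ in detail.

```python
def needs_rebuild(source_mtime, target_mtime):
    """make(1)-style check: rebuild when the source is newer than the target."""
    return source_mtime > target_mtime


# Hypothetical timestamps: the target is built, then the source is edited
# half a second later, within the same wall-clock second.
target_built = 935000000.20
source_edited = 935000000.70

# With subsecond timestamps the edit is noticed.
with_subsecond = needs_rebuild(source_edited, target_built)

# With whole-second timestamps both times truncate to the same value,
# so the stale target goes unnoticed.
whole_seconds = needs_rebuild(int(source_edited), int(target_built))
```

On a fast machine a compile and a subsequent edit can easily land in
the same second, which is the case the second comparison misses.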

Take care,

Bill



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-24 Thread Terry Lambert
I'm not familiar with the VFS_default stuff. All the vop_default_desc
routines in NetBSD point to error routines.
   
   In FreeBSD, they now point to default routines that are *not* error
   routines.  This is the problem.  I admit the change was very well
   intentioned, since it made the code a hell of a lot more readable,
   but choosing between readable and additional function, I take function
   over form (I think the way I would have fixed the readability is by
   making the operations that result in the descriptor set for a mounted
   FS instance be both discrete, and named for their specific function).
  
  As I recall most of FBSD's default routines are also error routines, if
  the exceptions were a problem it would be trivial to fix.
 
 You would have to de-collapse several VOP lists that have been
 pre-collapsed.
 
 You are talking gibberish here.  Please show code where this is
 a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

Some VOP's, like advisory locking, need both local assertion and
remote proxy of the VOP to avoid introducing race windows.

The result of this is that, if you rely on the vfs_default stuff,
then you can't proxy those VOP's into a different address space,
either on another machine, or to a user space VFS stacking layer
development environment.
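
The concern can be pictured with a toy dispatch table in Python. This
is only a model of the argument, not FreeBSD's vnode operation vectors;
every name in it is hypothetical.

```python
def dispatch(op_vector, op, *args):
    """Call the handler registered for op, or the vector's default entry."""
    handler = op_vector.get(op, op_vector["default"])
    return handler(op, *args)


def remote_end(op, *args):
    """Stand-in for the far side of the proxy (another host or a
    user space stacking layer)."""
    return ("handled-remotely", op)


def bypass(op, *args):
    """Default that forwards any unclaimed operation across the proxy."""
    return remote_end(op, *args)


def local_default(op, *args):
    """Default that answers locally; the far side never sees the operation."""
    return ("handled-locally", op)


# A proxy layer that claims no operations itself, with two different defaults.
proxy_with_bypass = {"default": bypass}
proxy_with_local_defaults = {"default": local_default}
```

In this model, dispatching an "advlock" operation through
proxy_with_bypass reaches the remote end, while the same call through
proxy_with_local_defaults is answered locally and never crosses the
proxy.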

This is the same problem that embedding VM references directly
into any FS causes, and that vm_object_t aliases would exacerbate.

John has, in the past, sent me a number of stacking layers done
by various people, with the requirement that I not redistribute
them, as they are not what he would consider to be properly
representative of finished work.

Since John himself did the network proxy, you could perhaps get
him to send you a copy, so you could have direct access to code
where this was a problem.

Make sure that the system you are talking to over the proxy is
not assumed to be a FreeBSD system (e.g. don't assume that the
vfs_default stuff exists on the other end of the proxy, or that
it would be functional).


Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-23 Thread Matthew Dillon

:On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:
:
: Matt doesn't represent the FreeBSD project, and even if he rewrites
: the VFS subsystem so he can understand it, his rewrite would face
: considerable resistance on its way into FreeBSD.  I don't think
: there is reason to rewrite it, but there certainly are areas
: that need fixing.
:
:You are misinformed as far as I know. From discussions I saw, the
:main architect of a VFS rewrite would be Kirk, and Matt would be acting as
:Kirk's right-hand-man.

Yes, this is correct.  Kirk is going to be the main architect.  I have
been heavily involved and will continue to be.

:The use of the "vfs_default" to make unimplemented VOP's
:
: I beg to differ.  The only difference is that we pass through
: multiple layers before we hit the bottom of the stack.  There is
:...
:Well I believe that Kirk considers them misguided too, but he stated that
:he wasn't going to remove them without serious thought about the alternatives.

The vfs op callout layering has not been on the radar screen.  There
are far too many other, more serious problems.  I really doubt that any
changes will be made to this piece any time in the next year or even two,
if at all.

The main items on the radar screen are related to buffer management
(struct buf stuff.  For example, preventing VM blockages due to pages
being wired by write I/O's), VFS locking and reference count issues 
(for example, namei lookups, blockages in the pager and syncer due to
vnode locks held by blocked processes, etc...), and interactions 
between VFS and VM (for example: moving away from VOP_READ/VOP_WRITE 
and moving more towards a getpages/putpages model).

None of the items have been set in stone yet.  We're waiting for Kirk
to get back from vacation and get back into the groove.

-Matt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-23 Thread Terry Lambert

  2.  Advisory locks are hung off private backing objects.
   I'm not sure. The struct lock * is only used by layered filesystems, so
   they can keep track both of the underlying vnode lock, and if needed their
   own vnode lock. For advisory locks, would we want to keep track both of
   locks on our layer and the layer below? Don't we want either one or the
   other? i.e. layers bypass to the one below, or deal with it all
   themselves.
  
  I think you want the lock on the intermediate layer: basically, on
  every vnode that has data associated with it that is unique to a
  layer.  Let's not forget, also, that you can expose a layer into
  the namespace in one place, and expose it covered under another
  layer, at another.  If you locked down to the backing object, then
  the only issue you would be left with is one or more intermediate
  backing objects.
 
 Right. That exported struct lock * makes locking down to the lowest-level
 file easy - you just feed it to the lock manager, and you're locking the
 same lock the lowest level fs uses. You then lock all vnodes stacked over
 this one at the same time. Otherwise, you just call VOP_LOCK below and
 then lock yourself.

I think this defeats the purpose of the stacking architecture; I
think that if you look at an unadulterated NULLFS, you'll see what I
mean.

Intermediate FS's should not trap VOP's that are not applicable
to them.

One of the purposes of doing a VOP_LOCK on intermediate vnodes
that aren't backing objects is to deal with the global vnode
pool management.  I'd really like FS's to own their vnode pools,
but even without that, you don't need the locking, since you
only need to flush data on vnodes that are backing objects.

If we look at a stack of FS's with intermediate exposure into the
namespace, then it's clear that the issue is really only applicable
to objects that act as a backing store:


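--------------  --------------------  --------------
FS              Exposed in hierarchy  Backing object
--------------  --------------------  --------------
top             yes                   no
intermediate_1  no                    no
intermediate_2  no                    yes
intermediate_3  yes                   no
bottom          no                    yes
--------------  --------------------  --------------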

So when we lock "top", we only lock in intermediate_2 and in bottom.

Then we attempt to lock in intermediate_3, but it fails: not because
there is a lock on the vnode in intermediate_3, but because there is
a lock in bottom.

It's unnecessary to lock the vnodes in the intermediate path, or
even at the exposure level, unless they are vnodes that have an
associated backing store.

The need to lock in intermediate_2 exists because it is a translation
layer or a namespace escape.  It deals with compression, or it deals
with file-as-a-directory folding, or it deals with file-hiding
(perhaps for a quota file), etc.  If it didn't, it wouldn't need
backing store (and therefore wouldn't need to be locked).
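
The rule argued for here (lock only the layers that have a backing
object) can be written down as a short Python sketch over the table
above. The function names are invented for illustration, and the sketch
models only this proposal, not any existing kernel behaviour.

```python
# (layer, exposed in hierarchy, has backing object), listed top to bottom.
STACK = [
    ("top", True, False),
    ("intermediate_1", False, False),
    ("intermediate_2", False, True),
    ("intermediate_3", True, False),
    ("bottom", False, True),
]


def layers_to_lock(stack, entry):
    """Layers at or below the entry point that hold a backing object."""
    names = [name for name, _exposed, _backing in stack]
    start = names.index(entry)
    return [name for name, _exposed, backing in stack[start:] if backing]


def conflicts(stack, entry_a, entry_b):
    """Two lock attempts collide if they need a common backing layer."""
    return bool(set(layers_to_lock(stack, entry_a)) &
                set(layers_to_lock(stack, entry_b)))
```

Entering at "top" yields intermediate_2 and bottom; entering at
"intermediate_3" yields only bottom, so the two attempts collide on
bottom even though intermediate_3 itself is never locked.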


  For a layer with an intermediate backing object, I'm prepared to
  declare it "special", and proxy the operation down to any inferior
  backing object (e.g. a union FS that adds files from two FS's
  together, rather than just directory entry lists).  I think such
  layers are the exception, not the rule.
 
 Actually isn't the only problem when you have vnode fan-in (union FS)? 
 i.e.  a plain compressing layer should not introduce vnode locking
 problems. 

If it's a block compression layer, it will.  Also a translation layer;
consider a pure Unicode system that wants to remotely mount an FS
from a legacy system.  To do this, it needs to expand the pages from
the legacy system [only it can, since the legacy system doesn't know
about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
on a file on such a system.  To propagate the lock, you have to do
an arithmetic conversion at the translation layer.  This gets worse
if the lower end FS is exposed in the namespace as well.

You could make the same arguments for other types of translation or
namespace escapes.
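
The arithmetic conversion mentioned above might look like the following
Python sketch. It assumes a fixed 2:1 expansion (two bytes above for
every byte below); the function name and the ratio parameter are
hypothetical.

```python
def lower_range(upper_offset, upper_length, ratio=2):
    """Map a byte range on the expanded (upper) file to the smallest
    range on the legacy (lower) file that covers it."""
    start = upper_offset // ratio
    # Ceiling division, so a range ending mid-character still covers
    # the whole character on the lower layer.
    end = -(-(upper_offset + upper_length) // ratio)
    return start, end - start
```

For example, lower_range(0, 10) gives (0, 5), while lower_range(3, 4)
gives (1, 3): a range that is not aligned to the expansion ratio has to
be widened on the lower layer, which is one reason the translation is
awkward.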


  I think that export policies are the realm of /etc/exports.
  
  The problem with each FS implementing its own policy, is that this
  is another place that copyinstr() gets called, when it shouldn't.
 
 Well, my thought was that, like with current code, most every fs would
 just call vfs_export() when it's presented an export operation. But by
 retaining the option of having the fs do its own thing, we can support
 different export semantics if desired.

I think this bears down on whether the NFS server VFS consumer is
allowed access to the VFS stack at the particular intermediate
layer.  I think this is really an administrative policy decision,
and not an option for the VFS.

I think it would be bad if a given VFS could refuse to participate
in a 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-23 Thread Nate Williams

  Matt doesn't represent the FreeBSD project, and even if he rewrites
  the VFS subsystem so he can understand it, his rewrite would face
  considerable resistance on its way into FreeBSD.  I don't think
  there is reason to rewrite it, but there certainly are areas
  that need fixing.
 
 You are misinformed as far as I know. From discussions I saw, the
 main architect of a VFS rewrite would be Kirk, and Matt would be acting as
 Kirk's right-hand-man.

Which discussions are these?  Are they archived somewhere?


Nate


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-21 Thread Daniel C. Sobral

"Daniel C. Sobral" wrote:
 
 Terry Lambert wrote:
 
  That's kind of the point.  No other VFS stacking system out there
  plays by FreeBSD's revamped rules.
 
 I look around and I see no standards. It is still time to be
 experimental.

Since someone complained of my meekness, let me restate that... :-)

1) BS. That was not your point. Your point, on which you spent many
paragraphs, was that the way FreeBSD presently does its stuff
cannot support passing a method through an intermediate host/fs that
does not know it.

If your "point" was the above, you could just have said "no one else
does it this way, so we won't be able to have non-FreeBSD
intermediate/frontend/backend hosts". Only that does not prove that
"our" way is not right.

2) There is *no* compatibility in the VFS out there. It's a jungle.
If we implemented something compatible with anyone else, it would be
a first. And given that everything out there has its problems, it
would be a huge mistake to adopt someone's standard just for the
sake of being compatible.

And if you disagree with point 2, feel free to argue against it. But
in no way will it justify that absurd comment you made.

Either that paragraph was trying to cover a flaw in your logic, or
you just lost your train of thought. It certainly detracted from the
content of the message. "You must assume that the intermediate host
doesn't play by your rules". Bah.

[not that I don't generally agree with you more often than it would
be prudent to let it be publicly known :-) ]

--
Daniel C. Sobral(8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

- Can I speak to your superior?
- There's some religious debate on that question.




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message






Re: BSD XFS Port BSD VFS Rewrite

1999-08-20 Thread Daniel C. Sobral
Terry Lambert wrote:
 
 That's kind of the point.  No other VFS stacking system out there
 plays by FreeBSD's revamped rules.

I look around and I see no standards. It is still time to be
experimental.

--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

- Can I speak to your superior?
- There's some religious debate on that question.


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Russell Cattelan

Glad to hear somebody is willing to dive into XFS.

Right now I am one of three people working on the XFS-to-Linux port, so I
have a pretty good view of what is currently happening.

When is it going to be ready?
Don't hold your breath. Officially SGI has said by the end of the year;
technically... whew, frankly I can't even guess. I would hope within a
month or so we will have the basics of a FS.

There are a lot of hurdles to overcome. XFS is a very, very complex file
system that relies on some of the more advanced features of IRIX. The
buffer cache and chunk cache (chunking buffers together to do large IO)
are two examples that come to mind. SGI is rewriting the buffer cache
(calling it the page cache) such that it will be able to support XFS.
Chunk cache... ? Not sure what we are going to do with that.

We have been having several discussions about the best way to
"interface". IRIX uses VFS, VNODE, BEHAVIOR, which is similar to the
BSDs' interface but of course very IRIX-specific. Linux's vfs/vnode is
different from either. Realizing this, a lot of our discussions have been
around how to go about making a new interface layer, or modifying an
existing one, that might be more "universal", i.e. not specific to IRIX,
Linux, BSD, etc.

In reading Terry's and Bill's comments, it seems there is a lot of room
for improvement.

Initially we are trying to make as few changes as possible to XFS to get
an initial implementation running on Linux. After we get things running
we will start to analyze where the problems exist, and decide what
direction to take in terms of interface at that time.

I would like any constructive input people have on this matter. I have a
pretty good chance of setting design direction.
Be warned: SGI at the moment is committed to Linux; development
directions will favor that platform. They are not against other OS's
being XFS'atized, but SGI is in the business of selling hardware and
solutions based on that hardware, and Linux is one of the OS's they have
decided to use for their Intel-based boxes.

Also, as far as the GPL issue goes, get over it! I understand the issues
and agree with many of the points.
My suggestion: let's find a way to work with the GPL (i.e. loadable
kernel module / softupdates model).
If somebody has a very, very good argument/solution to the licensing
debate, let me know; I can present it to the people dealing with the
lawyers. The license issue has slowed the release of the actual code more
than anything else, and will not be revisited again without great pain.


 I am currently conducting a thorough study of the VFS subsystem
 in preparation for an all-out effort to port SGI's XFS filesystem to
 FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
 has written in hackers- that the VFS subsystem is presently not
 well understood by any of the active kernel code contributors and
 that it will be rewritten later this year.  This is obviously of great
 concern to me in this port.  I greatly appreciate all assistance in
 answering the following questions:

 1)  What are the perceived problems with the current VFS?
 2)  What options are available to us as remedies?
 3)  To what extent will existing FS code require revision in order
  to be useful after the rewrite?
 4)  Will Chapters 6, 7, 8 & 9 of "The Design and Implementation of
  the 4.4BSD Operating System" still pertain after the rewrite?
 5)  How important are questions 3 & 4 in the design of the new
  VFS?

 I believe that the VFS is conceptually sound and that the existing
 semantics should be strictly retained in the new code.  Any new
 functionality should be added in the form of entirely new kernel
 routines and system calls, or possibly by such means as
 converting the existing routines to the vararg format etc.

 Does anyone know when SGI will release XFS?



--
Russell Cattelan
[EMAIL PROTECTED]





To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



RE: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Alton, Matthew

Do you have access to more of the code than is currently posted on SGI's
web page?  I am willing to sign an NDA in order to get access to all
relevant source.  I would like to assist in porting XFS to Linux also.  I would
very much like to see SGI succeed by using open source software in the 
commercial realm.  As for licensing issues, I am purely agnostic -- I trust that
any legal issues can be worked out after the fact by the proper people.







Re: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Terry Lambert

 Terry Lambert wrote:
  
  Make sure that the system you are talking to over the proxy is
  not assumed to be a FreeBSD system (e.g. don't assume that the
  vfs_default stuff exists on the other end of the proxy, or that
  it would be functional).
 
 Now, Terry, that is ridiculous. One has to assume that both ends
 play by the same rules. That is not only a reasonable expectation,
 it's a minimum requirement for any protocol to work.

That's kind of the point.  No other VFS stacking system out there
plays by FreeBSD's revamped rules.


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Chuck Silvers

On Wed, Aug 18, 1999 at 08:43:14PM +, Terry Lambert wrote:
Nope. The problem is that while stacking (null, umap, and overlay fs's)
work, we don't have the coherency issues worked out so that upper layers
can cache data. i.e. so that the lower fs knows it has to ask the upper
layers to give pages back. :-) But multiple ls -lR's work fine. :-)
   
   With UVM in NetBSD, this is (supposedly) not an issue.
  
  UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
  system.
 
 I was under the impression that the "U" in "UVM" was for "Unified".
 
 Does NetBSD not have a unified VM and buffer cache?  Is the "U" in
 "UVM" referring not to buffer cache unification, but to platform
 unification?
 
 It was my understanding from John Dyson, who had to work on NetBSD
 for NCI, that the new NetBSD stuff actually unified the VM and the
 buffer cache.
 
 If this isn't the case, then, yes, you will need to lock all the way
 up and down, and eat the copy overhead for the concurrency for the
 intermediate vnodes.  8-(.

netbsd w/UVM currently doesn't have unified caches.  that feature is
what I named UBC, for "unified buffer cache" (ala DEC's UBC).
the U in UVM doesn't actually stand for anything.  :-)


   You could actually think of it this way, as well: only FS's that
   contain vnodes that provide backing should implement VOP_GETPAGES
   and VOP_PUTPAGES, and all I/O should be done through paging.
  
  Right. That's part of UBC. :-)
 
 Yep.  Again, if NetBSD doesn't have this, it's really important
 that it obtain it.  8-(.

I'm workin' on it... it'll go in soon after the branch for the next release
is created (ie. it won't be in the next release, but the one after that).

-Chuck


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Russell Cattelan
Glad to hear somebody is willing to dive in to XFS.


Right now I am one of three people working on the XFS to linux port, so I
have
pretty good view of what is currently happening.

When is it going to be ready?
Don't hold your breath. Officially SGI has said by the end of the year,
technically... whew
frankly I can't even guess. I would hope within a month or so we will
have the basics of a FS.

There are a lot of hurtles to overcome. XFS is a very very complex file
system that relies on
some of the more advanced features of IRIX. The buffer cache and chunk
cache (chunking
buffers together to do large IO) are two  examples that come to mind. SGI
is rewriting
the buffer cache (calling it the page cache) such that is will be able to
support XFS.
chunk cache... ? not sure what we are going to do with that.

We have been having several discussions about the best way to
interface.
IRIX uses VFS,VNODE,BEHAVIOR which is similar to the BSD's interface
but of course very  IRIX specific. Linux's vfs/vnode is different from
either.
Realizing this, a lot of our discussions have been around how to go at
making a
new/modify existing interface layer that might be more universal
i.e. not irix not linux not bsd not etc specific.

In reading Terry's   Bill's comments seems there is a a lot of room for
improvement.

Initially we trying to make as few changes as possible to XFS to get an
initial implementation
running on linux. After we get things running we will start to analyze
where the problems exist,
and decide what direction in terms of interface to take at that time.

I would like any constructive input people have on this matter. I have a
pretty good
chance of setting design direction.
Be waned: SGI at the moment is committed to linux, development directions
will favor that platform.
They are not against other OS's being XFS'atized but SGI is in the
business of selling
hardware/solutions based on that hardware and linux one of the OS they
have decided to use for
their intel based boxes.

Also as far as the GPL issue goes, get over it! I understand the issues
and agree with many of the points.
My suggestion: let's find a way to work with the GPL (i.e. loadable kernel
module / softupdates model).
If somebody has a very, very good argument/solution to the licensing
debate let me know; I can present it to the people dealing with the
lawyers.
The license issue has slowed the release of the actual code more than
anything else, and will not be revisited again without great pain.


 I am currently conducting a thorough study of the VFS subsystem
 in preparation for an all-out effort to port SGI's XFS filesystem to
 FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
 has written in hackers- that the VFS subsystem is presently not
 well understood by any of the active kernel code contributors and
 that it will be rewritten later this year.  This is obviously of great
 concern to me in this port.  I greatly appreciate all assistance in
 answering the following questions:

 1)  What are the perceived problems with the current VFS?
 2)  What options are available to us as remedies?
 3)  To what extent will existing FS code require revision in order
  to be useful after the rewrite?
 4)  Will Chapters 6, 7, 8 & 9 of The Design and Implementation of
  the 4.4BSD Operating System still pertain after the rewrite?
 5)  How important are questions 3 & 4 in the design of the new
  VFS?

 I believe that the VFS is conceptually sound and that the existing
 semantics should be strictly retained in the new code.  Any new
 functionality should be added in the form of entirely new kernel
 routines and system calls, or possibly by such means as
 converting the existing routines to the vararg format etc.

 Does anyone know when SGI will release XFS?



--
Russell Cattelan
catte...@thebarn.com








RE: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Alton, Matthew
Do you have access to more of the code than is currently posted on SGI's
web page?  I am willing to sign an NDA in order to get access to all
relevant source.  I would like to assist in porting XFS to Linux also.  I would
very much like to see SGI succeed by using open source software in the 
commercial realm.  As for licensing issues, I am purely agnostic -- I trust that
any legal issues can be worked out after the fact by the proper people.

 -Original Message-
 From: Russell Cattelan [SMTP:catte...@thebarn.com]
 Sent: Thursday, August 19, 1999 12:41 AM
 To:   Alton, Matthew
 Cc:   'hack...@freebsd.org'; 'f...@freebsd.org'
 Subject:  Re: BSD XFS Port  BSD VFS Rewrite
 
 Glad to hear somebody is willing to dive in to XFS.
 
 
 Right now I am one of three people working on the XFS to linux port, so I
 have a pretty good view of what is currently happening.
 






Re: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Terry Lambert
 Terry Lambert wrote:
  
  Make sure that the system you are talking to over the proxy is
  not assumed to be a FreeBSD system (e.g. don't assume that the
  vfs_default stuff exists on the other end of the proxy, or that
  it would be functional).
 
 Now, Terry, that is ridiculous. One has to assume that both ends
 play by the same rules. That is not only a reasonable expectation,
 it's a minimum requirement for any protocol to work.

That's kind of the point.  No other VFS stacking system out there
plays by FreeBSD's revamped rules.


Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





RE: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Alfred Perlstein
On Thu, 19 Aug 1999, Alton, Matthew wrote:

 Do you have access to more of the code than is currently posted on SGI's
 web page?  I am willing to sign an NDA in order to get access to all
 relevant source.  I would like to assist in porting XFS to Linux also.  I
 would very much like to see SGI succeed by using open source software in
 the commercial realm.  As for licensing issues, I am purely agnostic -- I
 trust that any legal issues can be worked out after the fact by the proper
 people.

You mean like the USL lawsuit? :)

And why are we talking about Linux and crossposting it to
two separate FreeBSD mailing lists?

-Alfred






Re: BSD XFS Port BSD VFS Rewrite

1999-08-19 Thread Chuck Silvers
On Wed, Aug 18, 1999 at 08:43:14PM +, Terry Lambert wrote:
Nope. The problem is that while stacking (null, umap, and overlay fs's)
works, we don't have the coherency issues worked out so that upper layers
can cache data. i.e. so that the lower fs knows it has to ask the upper
layers to give pages back. :-) But multiple ls -lR's work fine. :-)
   
   With UVM in NetBSD, this is (supposedly) not an issue.
  
  UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
  system.
 
 I was under the impression that the U in UVM was for Unified.
 
 Does NetBSD not have a unified VM and buffer cache?  Is the U in
 UVM referring not to buffer cache unification, but to platform
 unification?
 
 It was my understanding from John Dyson, who had to work on NetBSD
 for NCI, that the new NetBSD stuff actually unified the VM and the
 buffer cache.
 
 If this isn't the case, then, yes, you will need to lock all the way
 up and down, and eat the copy overhead for the concurrency for the
 intermediate vnodes.  8-(.

netbsd w/UVM currently doesn't have unified caches.  that feature is
what I named UBC, for unified buffer cache (ala DEC's UBC).
the U in UVM doesn't actually stand for anything.  :-)


   You could actually think of it this way, as well: only FS's that
   contain vnodes that provide backing should implement VOP_GETPAGES
   and VOP_PUTPAGES, and all I/O should be done through paging.
  
  Right. That's part of UBC. :-)
 
 Yep.  Again, if NetBSD doesn't have this, it's really important
 that it obtain it.  8-(.

I'm workin' on it... it'll go in soon after the branch for the next release
is created (ie. it won't be in the next release, but the one after that).

-Chuck





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Terry Lambert writes:
   I'm not familiar with the VFS_default stuff. All the vop_default_desc
   routines in NetBSD point to error routines.
  
  In FreeBSD, they now point to default routines that are *not* error
  routines.  This is the problem.  I admit the change was very well
  intentioned, since it made the code a hell of a lot more readable,
  but choosing between readable and additional function, I take function
  over form (I think the way I would have "fixed" the readability is by
  making the operations that result in the descriptor set for a mounted
  FS instance be both discrete, and named for their specific function).
 
 As I recall most of FBSD's default routines are also error routines; if
 the exceptions were a problem it would be trivial to fix.

You would have to de-collapse several VOP lists that have been
pre-collapsed.

You are talking gibberish here.  Please show code where this is
a problem.

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Bill 
Studenmund writes:

Whew! That's reassuring. I agree there are things which need fixing. It'd
be nice if both NetBSD and FreeBSD could fix things in the same way.

Well, that still remains to be seen...

The use of the "vfs_default" to make unimplemented VOP's
fall through to code which implements function, while well
intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

If I understood the issue, it is that the leaf fs's (the bottom ones)
would use a default routine for non-error functionality. I think Terry's
point (which I agree with) was that a leaf fs's default routine should
only return errors.

I beg to differ.  It is far more likely, in my mind, that you will
want to handle a currently existing, unimplemented VOP than add a
new one.  Using the default for all unimplemented VOPs makes this
possible, using the same logic which makes adding a VOP possible.

Go back and review the diffs from when I did this, and my other
argument why this is a good idea should be obvious.

I doubt we need more than 64-bit times. 2^63 seconds works out to
292,279,025,208 years, or 292 (American) billion years. Current theories
put the age of the universe at, I think, 12 to 16 billion years. So 64-bit
signed times in seconds will cover from before the big bang to way past
any time we'll be caring about. :-)

But we cannot do time in seconds resolution; we need to resolve at least
the cpu clock frequency, which right now is approaching 1GHz (30 bits!)
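
The arithmetic in this exchange can be checked with a short sketch. It is
written in Python for brevity; the function names are made up for this
illustration, and the 365.24-day year is an assumption chosen because it
roughly reproduces the figure quoted above.

```python
import math

# Assumption: a 365.24-day year, which roughly reproduces the quoted figure.
SECONDS_PER_YEAR = 365.24 * 86400


def years_covered(seconds_bits):
    """Years representable by an unsigned field of `seconds_bits` bits of seconds."""
    return 2 ** seconds_bits / SECONDS_PER_YEAR


def bits_for_resolution(ticks_per_second):
    """Fraction bits needed to distinguish `ticks_per_second` steps per second."""
    return math.ceil(math.log2(ticks_per_second))


if __name__ == "__main__":
    frac_bits = bits_for_resolution(10 ** 9)  # a 1 GHz clock
    print("years covered by 63 bits of seconds:", int(years_covered(63)))
    print("bits needed to resolve a 1 GHz clock:", frac_bits)
    # What is left for seconds if the fraction must fit in the same signed 64-bit word.
    print("years left for the seconds part:", int(years_covered(63 - frac_bits)))
```

Under these assumptions, spending 30 bits on the fraction leaves 33 bits of
seconds in a signed 64-bit word, which is a few centuries rather than
billions of years.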

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Nate Williams

  Both struct timespec and struct timeval are major mistakes; they
  make arithmetic on timestamps an expensive operation.  Timestamps
  should be stored as integers using a fixed-point notation, for
  instance 64 bits with 32-bit fractional seconds (the NTP timestamp),
  or in the future 128/48.
...
 
  Extending from 64 to 128bits would be a cheap shift and increased
  precision and range could go hand in hand.
 
 I doubt we need more than 64-bit times. 2^63 seconds works out to
 292,279,025,208 years, or 292 (American) billion years.

I think Poul's point is that in the future seconds are probably way too
coarse grained.  Computers are getting faster all the time, and in the
future we may need 64 bits of seconds, plus an additional 64 bits for the
fractions of a second, which will be necessary for accurate timekeeping.
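
To make the fixed-point idea concrete, here is a minimal sketch in Python
(a kernel would do this in C on fixed-width integers). The helper names are
hypothetical; the 32.32 layout follows the "64 bits with 32-bit fractional
seconds" example quoted above, and the widening step illustrates the "cheap
shift" remark.

```python
FRAC_BITS = 32
NSEC_PER_SEC = 1_000_000_000


def timespec_sub(a, b):
    """Subtract two (sec, nsec) pairs; needs a borrow step to renormalize."""
    sec = a[0] - b[0]
    nsec = a[1] - b[1]
    if nsec < 0:
        sec -= 1
        nsec += NSEC_PER_SEC
    return sec, nsec


def to_fixed(sec, nsec):
    """Pack (sec, nsec) into one integer with FRAC_BITS of binary fraction."""
    return (sec << FRAC_BITS) | ((nsec << FRAC_BITS) // NSEC_PER_SEC)


def fixed_sub(a, b):
    """Fixed-point timestamps subtract as plain integers."""
    return a - b


def widen(ts, old_frac=32, new_frac=64):
    """Move to a wider format with more fraction bits: a single left shift."""
    return ts << (new_frac - old_frac)


if __name__ == "__main__":
    t1 = to_fixed(2, 0)
    t0 = to_fixed(1, 500_000_000)
    print(timespec_sub((2, 0), (1, 500_000_000)))
    print(hex(fixed_sub(t1, t0)))
    print(hex(widen(t0)))
```

The point of the comparison is that the (sec, nsec) form needs a conditional
borrow on every subtraction, while the fixed-point form is one integer
operation.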




Nate





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Julian Elischer



On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Matt doesn't represent the FreeBSD project, and even if he rewrites
 the VFS subsystem so he can understand it, his rewrite would face
 considerable resistance on its way into FreeBSD.  I don't think
 there is reason to rewrite it, but there certainly are areas
 that need fixing.

You are misinformed as far as I know. From discussions I saw, the
main architect of a VFS rewrite would be Kirk, and Matt would be acting as
Kirk's right-hand man.

 
 The use of the "vfs_default" to make unimplemented VOP's
 fall through to code which implements function, while well
 intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

Well I believe that Kirk considers them misguided too, but he stated that
he wasn't going to remove them without serious thought about the alternatives.
 






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert

  2.  Advisory locks are hung off private backing objects.
   I'm not sure. The struct lock * is only used by layered filesystems, so
   they can keep track both of the underlying vnode lock, and if needed their
   own vnode lock. For advisory locks, would we want to keep track both of
   locks on our layer and the layer below? Don't we want either one or the
   other? i.e. layers bypass to the one below, or deal with it all
   themselves.
  
  I think you want the lock on the intermediate layer: basically, on
  every vnode that has data associated with it that is unique to a
  layer.  Let's not forget, also, that you can expose a layer into
  the namespace in one place, and expose it covered under another
  layer, at another.  If you locked down to the backing object, then
  the only issue you would be left with is one or more intermediate
  backing objects.
 
 Right. That exported struct lock * makes locking down to the lowest-level
 file easy - you just feed it to the lock manager, and you're locking the
 same lock the lowest level fs uses. You then lock all vnodes stacked over
 this one at the same time. Otherwise, you just call VOP_LOCK below and
 then lock yourself.

I think this defeats the purpose of the stacking architecture; I
think that if you look at an unadulterated NULLFS, you'll see what I
mean.

Intermediate FS's should not trap VOP's that are not applicable
to them.

One of the purposes of doing a VOP_LOCK on intermediate vnodes
that aren't backing objects is to deal with the global vnode
pool management.  I'd really like FS's to own their vnode pools,
but even without that, you don't need the locking, since you
only need to flush data on vnodes that are backing objects.

If we look at a stack of FS's with intermediate exposure into the
namespace, then it's clear that the issue is really only applicable
to objects that act as a backing store:


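--------------  --------------------  --------------
FS              Exposed in hierarchy  Backing object
--------------  --------------------  --------------
top             yes                   no
intermediate_1  no                    no
intermediate_2  no                    yes
intermediate_3  yes                   no
bottom          no                    yes
--------------  --------------------  --------------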

So when we lock "top", we only lock in intermediate_2 and in bottom.

Then we attempt to lock in intermediate_3, but it fails: not because
there is a lock on the vnode in intermediate_3, but because there is
a lock in bottom.

It's unnecessary to lock the vnodes in the intermediate path, or
even at the exposure level, unless they are vnodes that have an
associated backing store.

The need to lock in intermediate_2 exists because it is a translation
layer or a namespace escape.  It deals with compression, or it deals
with file-as-a-directory folding, or it deals with file-hiding
(perhaps for a quota file), etc.  If it didn't, it wouldn't need
backing store (and therefore wouldn't need to be locked).
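
A toy model of the scheme described in this message, in Python, may help
make the table concrete. It only models the proposal as stated here (lock
the vnodes that have backing store, at or below the entry point); the data
layout and function names are invented for the illustration and are not
kernel interfaces.

```python
# (name, exposed in hierarchy, has backing object), listed top to bottom.
STACK = [
    ("top", True, False),
    ("intermediate_1", False, False),
    ("intermediate_2", False, True),
    ("intermediate_3", True, False),
    ("bottom", False, True),
]


def layers_to_lock(stack, entry):
    """Layers at or below `entry` that have backing store."""
    names = [name for name, _, _ in stack]
    start = names.index(entry)
    return [name for name, _, backing in stack[start:] if backing]


def try_lock(stack, entry, held):
    """Take all needed locks, or fail without taking any if one is held."""
    wanted = layers_to_lock(stack, entry)
    if any(name in held for name in wanted):
        return False
    held.update(wanted)
    return True


if __name__ == "__main__":
    held = set()
    print(layers_to_lock(STACK, "top"))
    print(try_lock(STACK, "top", held))
    # Fails because "bottom" is already held, not because of intermediate_3.
    print(try_lock(STACK, "intermediate_3", held))
```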


  For a layer with an intermediate backing object, I'm prepared to
  declare it "special", and proxy the operation down to any inferior
  backing object (e.g. a union FS that adds files from two FS's
  together, rather than just directory entry lists).  I think such
  layers are the exception, not the rule.
 
 Actually isn't the only problem when you have vnode fan-in (union FS)? 
 i.e.  a plain compressing layer should not introduce vnode locking
 problems. 

If it's a block compression layer, it will.  Also a translation layer;
consider a pure Unicode system that wants to remotely mount an FS
from a legacy system.  To do this, it needs to expand the pages from
the legacy system [only it can, since the legacy system doesn't know
about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
on a file on such a system.  To propagate the lock, you have to do
an arithmetic conversion at the translation layer.  This gets worse
if the lower end FS is exposed in the namespace as well.
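
The arithmetic conversion mentioned above might look something like the
following sketch (Python; the function name and the fixed 2:1 ratio are
assumptions for the example). The requested range is rounded outward so the
lower-layer lock always covers every upper-layer byte that was asked for.

```python
EXPANSION = 2  # assumed: each lower-layer byte appears as 2 bytes in the upper layer


def upper_to_lower_range(offset, length, ratio=EXPANSION):
    """Map an upper-layer byte range onto the lower layer, rounding outward."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = offset // ratio                # round the start down
    end = -(-(offset + length) // ratio)   # round the end up (ceiling division)
    return start, end - start


if __name__ == "__main__":
    print(upper_to_lower_range(100, 50))
    print(upper_to_lower_range(3, 4))  # unaligned request
```

A variable-width encoding would not have a fixed ratio, so the translation
layer would need more than this simple division.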

You could make the same arguments for other types of translation or
namespace escapes.


  I think that export policies are the realm of /etc/exports.
  
  The problem with each FS implementing its own policy, is that this
  is another place that copyinstr() gets called, when it shouldn't.
 
 Well, my thought was that, like with current code, most every fs would
 just call vfs_export() when it's presented an export operation. But by
 retaining the option of having the fs do its own thing, we can support
 different export semantics if desired.

I think this bears down on whether the NFS server VFS consumer is
allowed access to the VFS stack at the particular intermediate
layer.  I think this is really an administrative policy decision,
and not an option for the VFS.

I think it would be bad if a given VFS could refuse to participate
in a 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert

I'm not familiar with the VFS_default stuff. All the vop_default_desc
routines in NetBSD point to error routines.
   
   In FreeBSD, they now point to default routines that are *not* error
   routines.  This is the problem.  I admit the change was very well
   intentioned, since it made the code a hell of a lot more readable,
   but choosing between readable and additional function, I take function
   over form (I think the way I would have "fixed" the readability is by
   making the operations that result in the descriptor set for a mounted
   FS instance be both discrete, and named for their specific function).
  
  As I recall most of FBSD's default routines are also error routines; if
  the exceptions were a problem it would be trivial to fix.
 
 You would have to de-collapse several VOP lists that have been
 pre-collapsed.
 
 You are talking gibberish here.  Please show code where this is
 a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

Some VOP's, like advisory locking, need both local assertion and
remote proxy of the VOP to avoid introducing race windows.

The result of this is that, if you rely on the vfs_default stuff,
then you can't proxy those VOP's into a different address space,
either on another machine, or to a user space VFS stacking layer
development environment.

This is the same problem that embedding VM references directly
into any FS causes, and that vm_object_t aliases would exacerbate.

John has, in the past, sent me a number of stacking layers done
by various people, with the requirement that I not redistribute
them, as they are not what he would consider to be properly
representative of finished work.

Since John himself did the network proxy, you could perhaps get
him to send you a copy, so you could have direct access to code
where this was a problem.

Make sure that the system you are talking to over the proxy is
not assumed to be a FreeBSD system (e.g. don't assume that the
vfs_default stuff exists on the other end of the proxy, or that
it would be functional).


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], Terry Lambert writes:

 You would have to de-collapse several VOP lists that have been
 pre-collapsed.
 
 You are talking gibberish here.  Please show code where this is
 a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

And what prevents you from taking over the default op ?

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund

On Wed, 18 Aug 1999, Terry Lambert wrote:

  Right. That exported struct lock * makes locking down to the lowest-level
  file easy - you just feed it to the lock manager, and you're locking the
  same lock the lowest level fs uses. You then lock all vnodes stacked over
  this one at the same time. Otherwise, you just call VOP_LOCK below and
  then lock yourself.
 
 I think this defeats the purpose of the stacking architecture; I
 think that if you look at an unadulterated NULLFS, you'll see what I
 mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

 Intermediate FS's should not trap VOP's that are not applicable
 to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

 One of the purposes of doing a VOP_LOCK on intermediate vnodes
 that aren't backing objects is to deal with the global vnode
 pool management.  I'd really like FS's to own their vnode pools,
 but even without that, you don't need the locking, since you
 only need to flush data on vnodes that are backing objects.
 
 If we look at a stack of FS's with intermediate exposure into the
 namespace, then it's clear that the issue is really only applicable
 to objects that act as a backing store:
 
 
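 --------------  --------------------  --------------
 FS              Exposed in hierarchy  Backing object
 --------------  --------------------  --------------
 top             yes                   no
 intermediate_1  no                    no
 intermediate_2  no                    yes
 intermediate_3  yes                   no
 bottom          no                    yes
 --------------  --------------------  --------------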
 
 So when we lock "top", we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.
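
As a rough illustration of "one lock for the whole thing", here is a small
Python sketch. The Vnode class is hypothetical and only mimics the idea of a
lower layer exporting its lock so that every vnode stacked above it shares
the same lock object; it is not the NetBSD code.

```python
import threading


class Vnode:
    """Toy vnode: shares the lock of the vnode below it, if there is one."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        # A stacked vnode reuses the exported lock; a leaf vnode owns one.
        self.lock = lower.lock if lower is not None else threading.Lock()


def build_stack(names):
    """Build a stack from a bottom-to-top list of names; return it top first."""
    vnodes = []
    lower = None
    for name in names:
        lower = Vnode(name, lower)
        vnodes.append(lower)
    return list(reversed(vnodes))


if __name__ == "__main__":
    top, mid, bottom = build_stack(["bottom", "mid", "top"])
    print(top.lock is bottom.lock)
    top.lock.acquire()
    # The whole stack is now locked: mid cannot take "its" lock.
    print(mid.lock.acquire(blocking=False))
    top.lock.release()
```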

  Actually isn't the only problem when you have vnode fan-in (union FS)? 
  i.e.  a plain compressing layer should not introduce vnode locking
  problems. 
 
 If it's a block compression layer, it will.  Also a translation layer;
 consider a pure Unicode system that wants to remotely mount an FS
 from a legacy system.  To do this, it needs to expand the pages from
 the legacy system [only it can, since the legacy system doesn't know
 about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
 on a file on such a system.  To propagate the lock, you have to do
 an arithmetic conversion at the translation layer.  This gets worse
 if the lower end FS is exposed in the namespace as well.

Wait. byte-range locking is different from vnode locking. I've been
talking about vnode locking, which is different from the byte-range
locking you're discussing above.

  Nope. The problem is that while stacking (null, umap, and overlay fs's)
  works, we don't have the coherency issues worked out so that upper layers
  can cache data. i.e. so that the lower fs knows it has to ask the upper
  layers to give pages back. :-) But multiple ls -lR's work fine. :-)
 
 With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

 You could actually think of it this way, as well: only FS's that
 contain vnodes that provide backing should implement VOP_GETPAGES
 and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert

   Right. That exported struct lock * makes locking down to the lowest-level
   file easy - you just feed it to the lock manager, and you're locking the
   same lock the lowest level fs uses. You then lock all vnodes stacked over
   this one at the same time. Otherwise, you just call VOP_LOCK below and
   then lock yourself.
  
  I think this defeats the purpose of the stacking architecture; I
  think that if you look at an unadulterated NULLFS, you'll see what I
  mean.
 
 Please be more precise. I have looked at an unadulterated NULLFS, and
 found it lacking. I don't see how this change breaks stacking.


OK, there's the concept of "collapse" of stacking layer.  This was
first introduced in the Rosenthal stacking vnode architecture, out
of Sun Microsystems.

Rosenthal was concerned that, when you stack 500 putatively "null"
NULLFS's, that the amount of function call overhead not increase
proportionally.

To resolve this, he introduced the concept of a "collapsed" VFS
stack.  That is, the actual array of function vectors is a
one-dimensional projection of a two-dimensional stack, and the
visible entry for each VOP is the one from the first layer, on the
way down the stack, that implements that VOP.

We can visualize this like so:

                VOPs
Layer | VOP1    VOP2    VOP3    VOP4    VOP5    VOP6    ...
------+----------------------------------------------------
L1    | -       -       -       imp     -       -       ...
L2    | imp     -       -       imp     -       imp     ...
L3    | imp     -       -       imp     imp     -       ...
L4    | -       -       imp     -       -       -       ...
L5    | imp     imp     imp     imp     imp     imp     ...

The resulting "collapsed" array of entry vectors looks like so:

L2VOP1  L5VOP2  L4VOP3  L1VOP4  L3VOP5  L2VOP6  ...

There is an implicit assumption here that most stacks will not be
randomly staggered like this example.  The idea behind this
assumption is that additional layers will most frequently add
functionality, rather than replacing it.
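
The collapse itself is simple to express. The following Python sketch uses
the layer table from this message; the data structures and the collapse()
name are made up for illustration and do not correspond to any kernel API.

```python
VOPS = ["VOP1", "VOP2", "VOP3", "VOP4", "VOP5", "VOP6"]

# Layers listed top (L1) to bottom (L5), with the VOPs each one implements.
LAYERS = [
    ("L1", {"VOP4"}),
    ("L2", {"VOP1", "VOP4", "VOP6"}),
    ("L3", {"VOP1", "VOP4", "VOP5"}),
    ("L4", {"VOP3"}),
    ("L5", set(VOPS)),
]


def collapse(layers, vops):
    """For each VOP, pick the first layer from the top that implements it."""
    vector = []
    for vop in vops:
        owner = None  # stays None if no layer implements this VOP
        for name, implemented in layers:
            if vop in implemented:
                owner = name
                break
        vector.append(owner)
    return vector


if __name__ == "__main__":
    for vop, owner in zip(VOPS, collapse(LAYERS, VOPS)):
        print(f"{owner}{vop}")
```

A call through the collapsed vector then costs one indirect call regardless
of how many pass-through layers sit above the implementing one.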

Heidemann carried this idea over into his architecture, to be
employed at the point that a VFS stack is first instanced.

The BSD4.4 implementation of this is partially flawed.  There is
an implicit implementation of this for the UFS/FFS "stack" of
layers, in the VOP's descriptor array exported by the combination
of the two being hard coded as being a precollapsed stack.  This
is actually antithetical to the design.

The second place this flaw is apparent is in the inability to
add VOP's into an existing kernel, since the entry point vector
is a fixed size, and is not expanded implicitly by the act of
adding a VFS layer containing a new VOP.

For the use of non-error vfs_defaults, this is also flawed for
proxies, but not for the consumer of the VFS stack, only for the
producer end on the other side of the proxy, which although it
does not implement a particular VOP, needs to _NOT_ use the
local vfs_default for the VOP, but instead needs to proxy the
VOP over to the other side for remote processing.

The act of getting a vfs_default VOP after a collapse, instead
of having a NULL entry point that the descriptor call mechanism
treats as a call failure, damages the ability to proxy unknown
VOP's.
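
A toy dispatch routine shows the difference being argued about. This is
only a model of the argument, in Python, with hypothetical names: a missing
entry (None) lets the call mechanism hand the VOP to the proxy, while a
non-error default swallows it locally.

```python
def remote(vop):
    """Stand-in for forwarding a VOP across the proxy."""
    return f"{vop}: forwarded to remote side"


def local_default(vop):
    """Stand-in for a non-error vfs_default handler."""
    return f"{vop}: handled by local default"


def dispatch(ops, default, vop):
    """Call the layer's handler; with no handler and no default, go remote."""
    handler = ops.get(vop, default)
    if handler is None:
        return remote(vop)
    return handler(vop)


if __name__ == "__main__":
    ops = {}  # this layer implements nothing itself
    print(dispatch(ops, None, "VOP_NEW"))           # NULL entry: proxied
    print(dispatch(ops, local_default, "VOP_NEW"))  # default: never proxied
```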


  Intermediate FS's should not trap VOP's that are not applicable
  to them.
 
 True. But VOP_LOCK is applicable to layered fs's. :-)

Only for translation layers that require local backing store.  I'm
prepared to make an exception for them, and require that they
explicitly call the VOP in the underlying vnode over which they are
stacked.  This is the same compromise that both Rosenthal and
Heidemann consciously chose.


  One of the purposes of doing a VOP_LOCK on intermediate vnodes
  that aren't backing objects is to deal with the global vnode
  pool management.  I'd really like FS's to own their vnode pools,
  but even without that, you don't need the locking, since you
  only need to flush data on vnodes that are backing objects.
  
  If we look at a stack of FS's with intermediate exposure into the
  namespace, then it's clear that the issue is really only applicable
  to objects that act as a backing store:
  
  
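  --------------  --------------------  --------------
  FS              Exposed in hierarchy  Backing object
  --------------  --------------------  --------------
  top             yes                   no
  intermediate_1  no                    no
  intermediate_2  no                    yes
  intermediate_3  yes                   no
  bottom          no                    yes
  --------------  --------------------  --------------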
  
  So when we lock "top", we only lock in intermediate_2 and in bottom.
 
 No. One of the things Heidemann notes in his dissertation is that to
 prevent deadlock, you have to lock the whole stack of vnodes at 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp


Terry,

It is very fine with this example, but I'm not even going to bother
much with it for several reasons, most of which you can find codified
in the development rules for X11 which you can find in Scheifler's
book.

But for the record: your example would get even shorter on
the code we had before I started using the default op sensibly
because all the layers tended to shunt things they didn't 
understand to errno rather than pass them through, so in
fact my change took us closer to being able to handle the
rather lofty example you have here.

Once you show me an actual implementation which has a problem
with it, I will look at it again; until then, I think pretty
much everything else is more important (Scheifler's 1st rule :-)

Poul-Henning

 And what prevents you from taking over the default op ?

It needs to be NULL, not taken over.


machine 1           machine 2           machine 3

vfs consumer
upper proxy    -    lower proxy
                    vfs stacking layer
                    upper proxy    -    lower proxy
                                        vfs producer

How do I get a VOP, unknown to machine 2, from the vfs consumer
on machine 1 that does know about it, to the vfs producer on
machine 3 that also knows about it?

--
Poul-Henning Kamp FreeBSD coreteam member
[EMAIL PROTECTED]   "Real hackers run -current on their laptop."
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert

  You would have to de-collapse several VOP lists that have been
  pre-collapsed.
  
  You are talking gibberish here.  Please show code where this is
  a problem.
 
 When you write a proxy stacking layer, such as John Heidemann's
 network proxy stacking layer (an NFS alternative), VOP's which
 would normally be handled by vfs_default have to be handled on
 the other end of the proxy, instead, in the same way that they
 would be handled by the vfs_default stuff.
 
 And what prevents you from taking over the default op ?

It needs to be NULL, not taken over.


machine 1           machine 2           machine 3

vfs consumer
upper proxy    -    lower proxy
                    vfs stacking layer
                    upper proxy    -    lower proxy
                                        vfs producer

How do I get a VOP, unknown to machine 2, from the vfs consumer
on machine 1 that does know about it, to the vfs producer on
machine 3 that also knows about it?

My understanding is that it is very hard, given vfs_default:

On machine 1, since the upper proxy doesn't know from VOP's, it
wants to locally satisfy it from vfs_default on machine 1.  Taking
over the default op doesn't really help me; I have to do surgery
to the in core dispatch vector instance to do the job properly
(e.g. zapping it out, not taking it over).

On machine 2, it is out of range, but still needs to be passed
through the stacking layer, from the lower proxy to the upper
proxy (and the response, back).
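The difference between an empty slot and a pre-filled default can be sketched with a toy model. This is Python rather than kernel code, and every name in it (`Layer`, `vop_call`, `local_default`, the `vop_newthing` operation) is made up for illustration; it only assumes that an empty slot is treated as "pass it down" while a filled slot is called directly.

```python
# Toy model of a VOP dispatch vector; every name here is hypothetical.

class Layer:
    def __init__(self, name, ops, lower=None):
        self.name = name
        self.ops = ops          # dict: VOP name -> callable, or None for an empty slot
        self.lower = lower      # next layer down the stack (possibly on another machine)

def vop_call(layer, vop, *args):
    """Dispatch a VOP; an empty (None) slot is passed down instead of handled locally."""
    handler = layer.ops.get(vop)
    if handler is not None:
        return handler(*args)
    if layer.lower is not None:
        return vop_call(layer.lower, vop, *args)   # proxy/bypass to the layer below
    return "EOPNOTSUPP"

def local_default(*args):
    return "handled by local default"

def producer_newthing(*args):
    return "handled by producer on machine 3"

producer = Layer("producer", {"vop_newthing": producer_newthing})

# Slot left empty: the unknown VOP travels through machine 2 to the producer.
stack_null = Layer("upper proxy", {"vop_newthing": None},
                   Layer("stacking layer", {}, producer))

# Slot pre-filled with a default: the call is satisfied locally and never leaves machine 1.
stack_default = Layer("upper proxy", {"vop_newthing": local_default},
                      Layer("stacking layer", {}, producer))

result_null = vop_call(stack_null, "vop_newthing")
result_default = vop_call(stack_default, "vop_newthing")
```

In this model the only way to get the second stack to behave like the first is to clear the slot, which is the "zapping it out, not taking it over" step described above.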


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message pine.sol.3.96.990816105106.27345h-100...@marcy.nas.nasa.gov, Bill 
Studenmund writes:
On Sat, 14 Aug 1999, Terry Lambert wrote:

  I am currently conducting a thorough study of the VFS subsystem
  in preparation for an all-out effort to port SGI's XFS filesystem to
  FreeBSD 4.x at such time as SGI gives up the code.  Matt Dillon
  has written in hackers- that the VFS subsystem is presently not
  well understood by any of the active kernel code contributors and
  that it will be rewritten later this year.  This is obviously of great
  concern to me in this port.
 
 It is of great concern to me that a rewrite, apparently because of
 non-understanding, is taking place at all.

That concerns me too. Many aspects of the 4.4 vnode interface were there  
for specific reasons. Even if they were hack solutions, to re-write them  
because of a lack of understanding is dangerous as the new code will
likely run into the same problems as before. :-)

Matt doesn't represent the FreeBSD project, and even if he rewrites
the VFS subsystem so he can understand it, his rewrite would face
considerable resistance on its way into FreeBSD.  I don't think
there is reason to rewrite it, but there certainly are areas
that need fixing.

  The use of the vfs_default to make unimplemented VOP's
  fall through to code which implements function, while well
  intentioned, is misguided.

I beg to differ.  The only difference is that we pass through
multiple layers before we hit the bottom of the stack.  There is
no loss of functionality but significant gain of clarity and
modularity.

Adding a new VOP entails the same thing as it has always done.

 3.   The filesystem itself is broken for Y2038
 
  The space which was historically reserved for the Y2038 fix
  (a 64 bit time_t) was absconded with for subsecond resolution.
 
  This change should be reverted, and fsck modified to re-zero
  the values, given a specific argument.

That would break make(1) on contemporary machines.

One other suggestion I've heard is to split the 64 bits we have for time
into 44 bits for seconds, and 20 bits for microseconds. That's more than
enough modification resolution, and also pushes things to past year
500,000 AD. Versioning the inode would cover this easily.

This would be misguided, and given the current speed of evolution
would lead to other problems far before 2038.

Both struct timespec and struct timeval are major mistakes, they
make arithmetic on timestamps an expensive operation.  Timestamps
should be stored as integers using a fixed-point notation, for
instance 64bits with 32bit fractional seconds (the NTP timestamp),
or in the future 128/48.

Extending from 64 to 128bits would be a cheap shift and increased
precision and range could go hand in hand.
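A small sketch of why fixed-point timestamps make arithmetic cheap, assuming the 64-bit layout with a 32-bit fraction mentioned above. The helper names (`to_fixed`, `diff`, `widen`) are illustrative only, and Python integers stand in for the machine words a kernel would use.

```python
# Sketch of 32.32 fixed-point timestamps held in plain integers (hypothetical helpers).

FRAC_BITS = 32
MASK64 = (1 << 64) - 1

def to_fixed(seconds, nanoseconds):
    """Pack seconds plus a nanosecond remainder into a 32.32 fixed-point integer."""
    frac = (nanoseconds << FRAC_BITS) // 1_000_000_000
    return ((seconds << FRAC_BITS) | frac) & MASK64

def diff(a, b):
    """Timestamp arithmetic is one integer subtraction, with no carry/normalize step."""
    return a - b

def widen(ts64, extra_frac_bits):
    """Adding fraction bits is a left shift; the instant represented is unchanged."""
    return ts64 << extra_frac_bits

t1 = to_fixed(10, 500_000_000)   # 10.5 s
t2 = to_fixed(12, 250_000_000)   # 12.25 s
delta = diff(t2, t1)             # 1.75 s in 32.32
delta_seconds = delta / (1 << FRAC_BITS)

wide = widen(t2, 16)             # same instant with a 48-bit fraction
wide_seconds = wide / (1 << (FRAC_BITS + 16))
```

With a seconds/nanoseconds pair, the same subtraction needs a borrow check and a renormalization of the fractional field.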

If we don't want to extend the size of the timestamps before 2038,
(and we should not only look at filesystems here), then the correct
fix will be to move the epoch and use the inode version to mark
this fact.
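One way to picture the "move the epoch, mark it in the inode version" idea is sketched below. The version numbers, the size of the epoch shift, and the function names are all assumptions made for the example; nothing here reflects an actual on-disk format.

```python
# Hypothetical sketch: an inode version number selects the epoch for a 32-bit seconds field.

EPOCH_BY_VERSION = {
    1: 0,            # version 1: seconds since 1970-01-01 (the traditional epoch)
    2: 2**31,        # version 2: epoch moved forward by 2^31 seconds (an assumed choice)
}

def decode_time(inode_version, stored_seconds):
    """Return seconds since 1970 for a signed 32-bit on-disk value."""
    return EPOCH_BY_VERSION[inode_version] + stored_seconds

def encode_time(inode_version, unix_seconds):
    """Return the signed 32-bit on-disk value, or raise if it does not fit."""
    stored = unix_seconds - EPOCH_BY_VERSION[inode_version]
    if not -2**31 <= stored < 2**31:
        raise OverflowError("time not representable in this inode version")
    return stored

after_2038 = 2**31 + 1000           # just past the 32-bit signed limit
stored = encode_time(2, after_2038)
roundtrip = decode_time(2, stored)
```

The field width stays the same; only the interpretation changes, which is why the version marker is needed.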

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert
   I'm not familiar with the VFS_default stuff. All the vop_default_desc
   routines in NetBSD point to error routines.
  
  In FreeBSD, they now point to default routines that are *not* error
  routines.  This is the problem.  I admit the change was very well
  intentioned, since it made the code a hell of a lot more readable,
  but choosing between readable and additional function, I take function
  over form (I think the way I would have fixed the readability is by
  making the operations that result in the descriptor set for a mounted
  FS instance be both discrete, and named for their specific function).
 
 As I recall most of FBSD's default routines are also error routines, if
 the exceptions were a problem it would be trivial to fix.

You would have to de-collapse several VOP lists that have been
pre-collapsed.  The pre-collapse is also an issue for stacking,
since the collapse is supposed to be late bound to the stacking
operation itself.  This lets you revisit it later when you need
to add a new VOP into the system, so that there's a NULL pointer
in the VOP slot for older FS's, in case you stack on top of them.
This is particularly true of an FS stacked on an FS stacked on a
proxy layer.
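As a rough illustration of a late-bound collapse, the sketch below builds the per-stack vector at stacking time and rebuilds it when a new VOP is registered. It is a toy model with invented names (`collapse`, `oldfs`, `newlayer`, `vop_newthing`), and handlers are represented by strings rather than functions.

```python
# Hypothetical sketch: collapse a stack's VOP vectors at stacking time.

known_vops = ["vop_lookup", "vop_read", "vop_write"]

def collapse(stack, vops):
    """For each VOP, bind the topmost layer's handler; leave None if no layer has one."""
    vector = {}
    for vop in vops:
        vector[vop] = None
        for layer_name, layer_ops in stack:      # stack is ordered top to bottom
            if vop in layer_ops:
                vector[vop] = (layer_name, layer_ops[vop])
                break
    return vector

old_fs = ("oldfs", {"vop_lookup": "oldfs_lookup", "vop_read": "oldfs_read",
                    "vop_write": "oldfs_write"})
new_layer = ("newlayer", {"vop_read": "newlayer_read"})
stack = [new_layer, old_fs]

vector = collapse(stack, known_vops)

# A VOP added to the system later: re-collapsing gives older FS's an empty slot for it,
# which a stacking or proxy layer can detect, rather than a silently bound default.
known_vops.append("vop_newthing")
vector_after = collapse(stack, known_vops)
```

If the vectors had been collapsed ahead of time, there would be no point at which the new slot could be revisited for the older filesystem.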


 I think fixing resource allocation/deallocation for things like vnodes,
 cnbufs, and locks are a higher priority for now.  There are examples such
 as in detached threading where it might make sense for the detached child
 to be responsible for releasing resources allocated to it by the parent,
 but in stacking this model is very messy and unnatural.  This is why the
 purpose of VOP_ABORTOP appears to be to release cnbufs but this is really
 just an ugly side effect.  With stacking the code that allocates should be
 the code that deallocates. Substitute "code" with "layer" to be more
 correct. 

Yes.  That's actually maintenance, not rewrite, and I think it's
very important to address.  I'm rather pleased with the way the
NFS stuff has turned out (so far), and I was the one calling for
a return to first principles (i.e. a rewrite from the specification).


 I fixed a lot of the vnode and locking cases, unfortunately the ones that
 remain are probably ugly cases where you have to reacquire locks that had
 to be unlocked somewhere in the executing layer.  See VOP_RENAME for an
 example.  Compare the number of WILLRELEs in vnode_if.src in FreeBSD and
 NetBSD, ideally there'd be none.

The way I handled this in the rename case on my hacking box was by
adding a flag to the namei() call.  You could call this flag the
same as WILLRELE, but it had inverse semantics.

Really, this is another issue of reflexivity being absent from an
interface.  You really don't want asymmetric interfaces (VOP_LOCK
is an example, in many cases, based on internal use in the FFS).


Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message 199908181716.kaa12...@usr02.primenet.com, Terry Lambert writes:
   I'm not familiar with the VFS_default stuff. All the vop_default_desc
   routines in NetBSD point to error routines.
  
  In FreeBSD, they now point to default routines that are *not* error
  routines.  This is the problem.  I admit the change was very well
  intentioned, since it made the code a hell of a lot more readable,
  but choosing between readable and additional function, I take function
  over form (I think the way I would have fixed the readability is by
  making the operations that result in the descriptor set for a mounted
  FS instance be both discrete, and named for their specific function).
 
 As I recall most of FBSD's default routines are also error routines, if
 the exceptions were a problem it would be trivial to fix.

You would have to de-collapse several VOP lists that have been
pre-collapsed.

You are talking gibberish here.  Please show code where this is
a problem.

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 In message pine.sol.3.96.990816105106.27345h-100...@marcy.nas.nasa.gov, 
 Bill 
 Studenmund writes:
 On Sat, 14 Aug 1999, Terry Lambert wrote:
 
 Matt doesn't represent the FreeBSD project, and even if he rewrites
 the VFS subsystem so he can understand it, his rewrite would face
 considerable resistance on its way into FreeBSD.  I don't think
 there is reason to rewrite it, but there certainly are areas
 that need fixing.

Whew! That's reassuring. I agree there are things which need fixing. It'd
be nice if both NetBSD and FreeBSD could fix things in the same way.

 The use of the vfs_default to make unimplemented VOP's
 fall through to code which implements function, while well
 intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

If I understood the issue, it is that the leaf fs's (the bottom ones)
would use a default routine for non-error functionality. I think Terry's
point (which I agree with) was that a leaf fs's default routine should
only return errors.

  3. The filesystem itself is broken for Y2038
 One other suggestion I've heard is to split the 64 bits we have for time
 into 44 bits for seconds, and 20 bits for microseconds. That's more than
 enough modification resolution, and also pushes things to past year
 500,000 AD. Versioning the inode would cover this easily.
 
 This would be misguided, and given the current speed of evolution
 would lead to other problems far before 2038.
 
 Both struct timespec and struct timeval are major mistakes, they
 make arithmetic on timestamps an expensive operation.  Timestamps
 should be stored as integers using a fixed-point notation, for
 instance 64bits with 32bit fractional seconds (the NTP timestamp),
 or in the future 128/48.

I like that idea.

One thing I should probably mention is that I'm not suggesting we ever do
arithmetic on the 44/20 number, just that we store it that way. struct inode
would contain time fields in whatever format the host prefers, with the
44/20 stuff only being in struct dinode. Converting from 44/20 would only
happen on initial read. Math would happen on the host format version. :-)

If time structures go to 64/32 fixed-point math, then my suggestion can be
re-phrased as storing 44.20 worth of that number in the on-disk inode.
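The 44/20 split can be sketched as a pair of pack/unpack helpers. This assumes an unsigned seconds field and plain microseconds in the low 20 bits; the function names are invented for the example and the sample time value is arbitrary.

```python
# Sketch of the 44/20 on-disk split: 44 bits of seconds, 20 bits of microseconds.

SEC_BITS = 44
USEC_BITS = 20
USEC_MASK = (1 << USEC_BITS) - 1

def pack_44_20(seconds, microseconds):
    """Host format -> 64-bit on-disk value (done when the inode is written)."""
    if not 0 <= seconds < (1 << SEC_BITS):
        raise OverflowError("seconds do not fit in 44 bits")
    if not 0 <= microseconds < 1_000_000:
        raise ValueError("microseconds out of range")
    return (seconds << USEC_BITS) | microseconds

def unpack_44_20(ondisk):
    """64-bit on-disk value -> (seconds, microseconds), done once on initial read."""
    return ondisk >> USEC_BITS, ondisk & USEC_MASK

ondisk = pack_44_20(935_000_000, 123_456)
seconds, microseconds = unpack_44_20(ondisk)

# 20 bits hold values up to 1,048,575, enough for 0..999,999 microseconds.
# 44 bits of seconds span roughly half a million years (365-day years used here).
range_years = (1 << SEC_BITS) // (365 * 24 * 3600)
```

All arithmetic would still happen on the in-core host format; these helpers only run at the disk boundary.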

 Extending from 64 to 128bits would be a cheap shift and increased
 precision and range could go hand in hand.

I doubt we need more than 64 bit times. 2^63 seconds works out to
292,279,025,208 years, or 292 (american) billion years. Current theories
put the age of the universe at I think 12 to 16 billion years. So 64-bit
signed times in seconds will cover from before the big bang to way past
any time we'll be caring about. :-)
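The order of magnitude is easy to check. The sketch below assumes a mean Gregorian year of 365.2425 days; a different year length shifts the low digits, so it only confirms the "about 292 billion years" figure rather than the exact number quoted.

```python
# Rough check of the range of a signed 64-bit count of seconds.

SECONDS_PER_YEAR = 31_556_952          # mean Gregorian year (365.2425 days)

max_seconds = 2**63
range_years = max_seconds // SECONDS_PER_YEAR
range_billions = range_years / 1e9

# 16 billion years is the upper age estimate quoted above, not a value computed here.
covers_universe_age = range_years > 16_000_000_000
```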

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message pine.sol.3.96.990818101005.14430b-100...@marcy.nas.nasa.gov, Bill 
Studenmund writes:

Whew! That's reassuring. I agree there are things which need fixing. It'd
be nice if both NetBSD and FreeBSD could fix things in the same way.

Well, that still remains to be seen...

The use of the vfs_default to make unimplemented VOP's
fall through to code which implements function, while well
intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

If I understood the issue, it is that the leaf fs's (the bottom ones)
would use a default routine for non-error functionality. I think Terry's
point (which I agree with) was that a leaf fs's default routine should
only return errors.

I beg to differ.  It is far more likely, in my mind, that you will
want to handle a currently existing, unimplemented VOP than add a
new one.  Using the default for all unimplemented VOPs makes this
possible, using the same logic which makes adding a VOP possible.

Go back and review the diffs from when I did this, and my other
argument why this is a good idea should be obvious.

I doubt we need more than 64 bit times. 2^63 seconds works out to
292,279,025,208 years, or 292 (american) billion years. Current theories
put the age of the universe at I think 12 to 16 billion years. So 64-bit
signed times in seconds will cover from before the big bang to way past
any time we'll be caring about. :-)

But we cannot do time in seconds resolution, we need to resolve at least
the cpu clock frequency, which right now is approaching 1GHz (30bit!)

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Nate Williams
  Both struct timespec and struct timeval are major mistakes, they
  make arithmetic on timestamps an expensive operation.  Timestamps
  should be stored as integers using a fixed-point notation, for
  instance 64bits with 32bit fractional seconds (the NTP timestamp),
  or in the future 128/48.
...
 
  Extending from 64 to 128bits would be a cheap shift and increased
  precision and range could go hand in hand.
 
 I doubt we need more than 64 bit times. 2^63 seconds works out to
 292,279,025,208 years, or 292 (american) billion years.

I think Poul's point is that in the future seconds is probably way too
coarse grained.  Computers are getting faster all the time, and in the
future we may need 64 bits for seconds, plus an additional 64 bits for the
fractions of a second, which will be necessary for accurate timekeeping.




Nate





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 In message pine.sol.3.96.990818101005.14430b-100...@marcy.nas.nasa.gov, 
 Bill Studenmund writes:
 
 Whew! That's reassuring. I agree there are things which need fixing. It'd
 be nice if both NetBSD and FreeBSD could fix things in the same way.
 
 Well, that still remains to be seen...

:-)

 I doubt we need more than 64 bit times. 2^63 seconds works out to
 292,279,025,208 years, or 292 (american) billion years. Current theories
 put the age of the universe at I think 12 to 16 billion years. So 64-bit
 signed times in seconds will cover from before the big bang to way past
 any time we'll be caring about. :-)

I was unclear. I was referring to the seconds side of things. Sub-second
resolution would need other bits.

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message 199908181737.laa03...@mt.sri.com, Nate Williams writes:
  Both struct timespec and struct timeval are major mistakes, they
  make arithmetic on timestamps an expensive operation.  Timestamps
  should be stored as integers using a fixed-point notation, for
  instance 64bits with 32bit fractional seconds (the NTP timestamp),
  or in the future 128/48.
...
 
  Extending from 64 to 128bits would be a cheap shift and increased
  precision and range could go hand in hand.
 
 I doubt we need more than 64 bit times. 2^63 seconds works out to
 292,279,025,208 years, or 292 (american) billion years.

I think Poul's point is that in the future seconds is probably way too
coarse grained.  Computers are getting faster all the time, and in the
future we may need 64 bits for seconds, plus an additional 64 bits for the
fractions of a second, which will be necessary for accurate timekeeping.

No, 64bits of fractions will not be needed, at least until we start
using FreeBSD as an embedded computer in Heisenberg compensators.

I recall somebody saying that 100GHz was the highest realistic (or
lowest unrealistic) clock frequency using digital logic, the argument
was pretty convincing physically: light speed sets a size limit,
that prescribes some voltage gradients which in turn produce EMC
which in turn makes sure nothing works.  Also various tunnel effects
and the general heisenbergisms took their toll.

State-of-the-art time interval measuring equipment is into the
few-picosecond territory (http://www.timing.com/).

Based on that I would say that 40 to 48 bits will be OK for the
fraction.
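The bit counts in this exchange can be reproduced with a short calculation. The helper below is illustrative only; it asks how many fractional bits are needed so that one step of the fraction is no coarser than one tick at a given rate.

```python
# Sketch: fractional bits needed to resolve a given number of ticks per second.

def fraction_bits(ticks_per_second):
    """Smallest b with 2**b >= ticks_per_second, i.e. ceil(log2(ticks_per_second))."""
    return (ticks_per_second - 1).bit_length()

bits_1ghz = fraction_bits(10**9)      # 1 ns ticks, the "30bit" figure for a 1GHz clock
bits_100ghz = fraction_bits(10**11)   # 10 ps ticks, the 100GHz ceiling mentioned above
bits_1ps = fraction_bits(10**12)      # 1 ps ticks, picosecond-class measurement
```

Picosecond resolution lands at 40 bits, which is the low end of the 40 to 48 bit range suggested for the fraction.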

As a sidebar:  I had a kernel running which used 32i.32f timestamps
and converted to timeval and timespec as needed, and it actually made
a lot of code look a lot more sane.  I may go back and do it some
day.
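A conversion from a 32i.32f value to timeval-style and timespec-style pairs might look like the sketch below. The function names are invented, the pairs are plain tuples rather than C structs, and truncation (not rounding) of the fraction is an assumption.

```python
# Sketch: convert a 32.32 fixed-point timestamp to timeval/timespec-style pairs on demand.

FRAC_BITS = 32
FRAC_MASK = (1 << FRAC_BITS) - 1

def to_timespec(ts):
    """32.32 fixed point -> (seconds, nanoseconds), truncating the fraction."""
    sec = ts >> FRAC_BITS
    nsec = ((ts & FRAC_MASK) * 1_000_000_000) >> FRAC_BITS
    return sec, nsec

def to_timeval(ts):
    """32.32 fixed point -> (seconds, microseconds), truncating the fraction."""
    sec = ts >> FRAC_BITS
    usec = ((ts & FRAC_MASK) * 1_000_000) >> FRAC_BITS
    return sec, usec

ts = (5 << FRAC_BITS) | (3 << 30)    # 5.75 seconds
timespec = to_timespec(ts)
timeval = to_timeval(ts)
```

Each conversion is a multiply and a shift, so doing it only at the interfaces that require the pair formats keeps the internal arithmetic on single integers.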

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message pine.sol.3.96.990818104932.14430d-100...@marcy.nas.nasa.gov, Bill 
Studenmund writes:

 I doubt we need more than 64 bit times. 2^63 seconds works out to
 292,279,025,208 years, or 292 (american) billion years. Current theories
 put the age of the universe at I think 12 to 16 billion years. So 64-bit
 signed times in seconds will cover from before the big bang to way past
 any time we'll be caring about. :-)

I was unclear. I was referring to the seconds side of things. Sub-second
resolution would need other bits.

Yes, but we need subsecond in the filesystems.  Think about make(1) on
a blinding fast machine...

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Julian Elischer


On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Matt doesn't represent the FreeBSD project, and even if he rewrites
 the VFS subsystem so he can understand it, his rewrite would face
 considerable resistance on its way into FreeBSD.  I don't think
 there is reason to rewrite it, but there certainly are areas
 that need fixing.

You are misinformed as far as I know.. From discussions I saw, the
main architect of a VFS rewrite would be Kirk, and Matt would be acting as
Kirk's right-hand-man.

 
 The use of the vfs_default to make unimplemented VOP's
 fall through to code which implements function, while well
 intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

Well I believe that Kirk considers them misguided too, but he stated that
he wasn't going to remove them without serious thought about the alternatives.
 






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Nate Williams
  Matt doesn't represent the FreeBSD project, and even if he rewrites
  the VFS subsystem so he can understand it, his rewrite would face
  considerable resistance on its way into FreeBSD.  I don't think
  there is reason to rewrite it, but there certainly are areas
  that need fixing.
 
 You are misinformed as far as I know.. From discussions I saw, the
 main architect of a VFS rewrite would be Kirk, and Matt would be acting as
 Kirk's right-hand-man.

Which discussions are these?  Are they archived somewhere?


Nate





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message pine.bsf.3.95.990818105716.12306a-100...@current1.whistle.com, 
Julian Elischer writes:
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Matt doesn't represent the FreeBSD project, and even if he rewrites
 the VFS subsystem so he can understand it, his rewrite would face
 considerable resistance on its way into FreeBSD.  I don't think
 there is reason to rewrite it, but there certainly are areas
 that need fixing.

You are misinformed as far as I know.. From discussions I saw, the
main architect of a VFS rewrite would be Kirk, and Matt would be acting as
Kirk's right-hand-man.

I bet that Matt and Kirk use "rewrite" for two very different
concepts.  The resulting reviews will be equally different.

The use of the vfs_default to make unimplemented VOP's
fall through to code which implements function, while well
intentioned, is misguided.
 
 I beg to differ.  The only difference is that we pass through
 multiple layers before we hit the bottom of the stack.  There is
 no loss of functionality but significant gain of clarity and
 modularity.

Well I believe that Kirk considers them misguided too, but he stated that
he wasn't going to remove them without serious thought about the alternatives.

I'll be more than ready to discuss this with Kirk.

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert
  2.  Advisory locks are hung off private backing objects.
   I'm not sure. The struct lock * is only used by layered filesystems, so
   they can keep track both of the underlying vnode lock, and if needed their
   own vnode lock. For advisory locks, would we want to keep track both of
   locks on our layer and the layer below? Don't we want either one or the
   other? i.e. layers bypass to the one below, or deal with it all
   themselves.
  
  I think you want the lock on the intermediate layer: basically, on
  every vnode that has data associated with it that is unique to a
  layer.  Let's not forget, also, that you can expose a layer into
  the namespace in one place, and expose it covered under another
  layer, at another.  If you locked down to the backing object, then
  the only issue you would be left with is one or more intermediate
  backing objects.
 
 Right. That exported struct lock * makes locking down to the lowest-level
 file easy - you just feed it to the lock manager, and you're locking the
 same lock the lowest level fs uses. You then lock all vnodes stacked over
 this one at the same time. Otherwise, you just call VOP_LOCK below and
 then lock yourself.

I think this defeats the purpose of the stacking architecture; I
think that if you look at an unadulterated NULLFS, you'll see what I
mean.

Intermediate FS's should not trap VOP's that are not applicable
to them.

One of the purposes of doing a VOP_LOCK on intermediate vnodes
that aren't backing objects is to deal with the global vnode
pool management.  I'd really like FS's to own their vnode pools,
but even without that, you don't need the locking, since you
only need to flush data on vnodes that are backing objects.

If we look at a stack of FS's with intermediate exposure into the
namespace, then it's clear that the issue is really only applicable
to objects that act as a backing store:


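--------------  --------------------  --------------
FS              Exposed in hierarchy  Backing object
--------------  --------------------  --------------
top             yes                   no
intermediate_1  no                    no
intermediate_2  no                    yes
intermediate_3  yes                   no
bottom          no                    yes
--------------  --------------------  --------------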

So when we lock top, we only lock in intermediate_2 and in bottom.

Then we attempt to lock in intermediate_3, but it fails: not because
there is a lock on the vnode in intermediate_3, but because there is
a lock in bottom.

It's unnecessary to lock the vnodes in the intermediate path, or
even at the exposure level, unless they are vnodes that have an
associated backing store.

The need to lock in intermediate_2 exists because it is a translation
layer or a namespace escape.  It deals with compression, or it deals
with file-as-a-directory folding, or it deals with file-hiding
(perhaps for a quota file), etc..  If it didn't, it wouldn't need
backing store (and therefore wouldn't need to be locked).
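The locking rule described here can be modelled in a few lines. This is a toy sketch of the table above, not an implementation of VOP_LOCK: the `lock_from` helper, the `held` dictionary, and the process names are hypothetical, and locks are reduced to a name-to-owner map.

```python
# Toy model of the table above: only layers with a backing object take part in locking.

# (name, exposed_in_hierarchy, has_backing_object), ordered top to bottom
STACK = [
    ("top",            True,  False),
    ("intermediate_1", False, False),
    ("intermediate_2", False, True),
    ("intermediate_3", True,  False),
    ("bottom",         False, True),
]

held = {}   # layer name -> owner currently holding that layer's lock

def lock_from(entry, owner):
    """Lock every backing object at or below `entry`; fail if any is already held."""
    names = [name for name, _, _ in STACK]
    below = STACK[names.index(entry):]
    wanted = [name for name, _, backing in below if backing]
    blockers = [name for name in wanted if name in held and held[name] != owner]
    if blockers:
        return False, blockers
    for name in wanted:
        held[name] = owner
    return True, wanted

ok_top, locked_by_top = lock_from("top", "proc A")
ok_mid, blocked_on = lock_from("intermediate_3", "proc B")
```

The second attempt fails on `bottom` even though nothing was ever locked at `intermediate_3` itself, which matches the behaviour described in the text.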


  For a layer with an intermediate backing object, I'm prepared to
  declare it special, and proxy the operation down to any inferior
  backing object (e.g. a union FS that adds files from two FS's
  together, rather than just directory entry lists).  I think such
  layers are the exception, not the rule.
 
 Actually isn't the only problem when you have vnode fan-in (union FS)? 
 i.e.  a plain compressing layer should not introduce vnode locking
 problems. 

If it's a block compression layer, it will.  Also a translation layer;
consider a pure Unicode system that wants to remotely mount an FS
from a legacy system.  To do this, it needs to expand the pages from
the legacy system [only it can, since the legacy system doesn't know
about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
on a file on such a system.  To propagate the lock, you have to do
an arithmetic conversion at the translation layer.  This gets worse
if the lower end FS is exposed in the namespace as well.
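The arithmetic conversion for a byte-range lock can be sketched as follows. It assumes a fixed 2:1 expansion, where each 1-byte legacy character becomes exactly 2 bytes above the layer; the function name is made up, and a variable-width encoding would need a real mapping instead of a division.

```python
# Sketch of the arithmetic a 2:1 translation layer needs for byte-range locks.
# Assumes a fixed ratio: every 1-byte legacy character becomes exactly 2 bytes above.

RATIO = 2

def upper_to_lower_range(offset, length):
    """Map a byte range seen above the layer to the covering range in the legacy file."""
    start = offset // RATIO
    end = (offset + length + RATIO - 1) // RATIO   # round up so partial characters are covered
    return start, end - start

lower_aligned = upper_to_lower_range(100, 50)     # upper bytes 100..149
lower_unaligned = upper_to_lower_range(101, 50)   # upper bytes 101..150
```

An unaligned request ends up locking one more legacy byte than the aligned one, since a lock cannot cover half a character below the layer.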

You could make the same arguments for other types of translation or
namespace escapes.


  I think that export policies are the realm of /etc/exports.
  
  The problem with each FS implementing its own policy, is that this
  is another place that copyinstr() gets called, when it shouldn't.
 
 Well, my thought was that, like with current code, most every fs would
 just call vfs_export() when it's presented an export operation. But by
 retaining the option of having the fs do its own thing, we can support
 different export semantics if desired.

I think this bears down on whether the NFS server VFS consumer is
allowed access to the VFS stack at the particular intermediate
layer.  I think this is really an administrative policy decision,
and not an option for the VFS.

I think it would be bad if a given VFS could refuse to participate
in a stacking 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Matthew Dillon
:On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:
:
: Matt doesn't represent the FreeBSD project, and even if he rewrites
: the VFS subsystem so he can understand it, his rewrite would face
: considerable resistance on its way into FreeBSD.  I don't think
: there is reason to rewrite it, but there certainly are areas
: that need fixing.
:
:You are misinformed as far as I know.. From discussions I saw, the
:main architect of a VFS rewrite would be Kirk, and Matt would be acting as
:Kirk's right-hand-man.

Yes, this is correct.  Kirk is going to be the main architect.  I have
been heavily involved and will continue to be.

:The use of the vfs_default to make unimplemented VOP's
:
: I beg to differ.  The only difference is that we pass through
: multiple layers before we hit the bottom of the stack.  There is
:...
:Well I believe that Kirk considers them misguided too, but he stated that
:he wasn't going to remove them without serious thought about the alternatives.

The vfs op callout layering has not been on the radar screen.  There
are far too many other, more serious problems.  I really doubt that any
changes will be made to this piece any time in the next year or even two,
if at all.

The main items on the radar screen are related to buffer management
(struct buf stuff.  For example, preventing VM blockages due to pages
being wired by write I/O's), VFS locking and reference count issues 
(for example, namei lookups, blockages in the pager and syncer due to
vnode locks held by blocked processes, etc...), and interactions 
between VFS and VM (for example: moving away from VOP_READ/VOP_WRITE 
and moving more towards a getpages/putpages model).

None of the items have been set in stone yet.  We're waiting for Kirk
to get back from vacation and get back into the groove.

-Matt






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Julian Elischer
The discussions between Kirk and Matt over a glass of beer/drink
at Kirk's party at USENIX and at the Bay Area User's Group.



On Wed, 18 Aug 1999, Nate Williams wrote:

   Matt doesn't represent the FreeBSD project, and even if he rewrites
   the VFS subsystem so he can understand it, his rewrite would face
   considerable resistance on its way into FreeBSD.  I don't think
   there is reason to rewrite it, but there certainly are areas
   that need fixing.
  
  You are misinformed as far as I know.. From discussions I saw, the
  main architect of a VFS rewrite would be Kirk, and Matt would be acting as
  Kirk's right-hand-man.
 
 Which discussions are these?  Are they archived somewhere?
 
 
 Nate
 






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert
I'm not familiar with the VFS_default stuff. All the vop_default_desc
routines in NetBSD point to error routines.
   
   In FreeBSD, they now point to default routines that are *not* error
   routines.  This is the problem.  I admit the change was very well
   intentioned, since it made the code a hell of a lot more readable,
   but choosing between readable and additional function, I take function
   over form (I think the way I would have fixed the readability is by
   making the operations that result in the descriptor set for a mounted
   FS instance be both discrete, and named for their specific function).
  
  As I recall most of FBSD's default routines are also error routines, if
  the exceptions were a problem it would be trivial to fix.
 
 You would have to de-collapse several VOP lists that have been
 pre-collapsed.
 
 You are talking gibberish here.  Please show code where this is
 a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

Some VOP's, like advisory locking, need both local assertion and
remote proxy of the VOP to avoid introducing race windows.

The result of this is that, if you rely on the vfs_default stuff,
then you can't proxy those VOP's into a different address space,
either on another machine, or to a user space VFS stacking layer
development environment.

This is the same problem that embedding VM references directly
into any FS causes, and that vm_object_t aliases would exacerbate.

John has, in the past, sent me a number of stacking layers done
by various people, with the requirement that I not redistribute
them, as they are not what he would consider to be properly
representative of finished work.

Since John himself did the network proxy, you could perhaps get
him to send you a copy, so you could have direct access to code
where this was a problem.

Make sure that the system you are talking to over the proxy is
not assumed to be a FreeBSD system (e.g. don't assume that the
vfs_default stuff exists on the other end of the proxy, or that
it would be functional).


Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp
In message 199908181848.laa14...@usr02.primenet.com, Terry Lambert writes:

 You would have to de-collapse several VOP lists that have been
 pre-collapsed.
 
 You are talking gibberish here.  Please show code where this is
 a problem.

When you write a proxy stacking layer, such as John Heidemann's
network proxy stacking layer (an NFS alternative), VOP's which
would normally be handled by vfs_default have to be handled on
the other end of the proxy, instead, in the same way that they
would be handled by the vfs_default stuff.

And what prevents you from taking over the default op ?

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Terry Lambert wrote:

  Right. That exported struct lock * makes locking down to the lowest-level
  file easy - you just feed it to the lock manager, and you're locking the
  same lock the lowest level fs uses. You then lock all vnodes stacked over
  this one at the same time. Otherwise, you just call VOP_LOCK below and
  then lock yourself.
 
 I think this defeats the purpose of the stacking architecture; I
 think that if you look at an unadulterated NULLFS, you'll see what I
 mean.

Please be more precise. I have looked at an unadulterated NULLFS, and
found it lacking. I don't see how this change breaks stacking.

 Intermediate FS's should not trap VOP's that are not applicable
 to them.

True. But VOP_LOCK is applicable to layered fs's. :-)

 One of the purposes of doing a VOP_LOCK on intermediate vnodes
 that aren't backing objects is to deal with the global vnode
 pool management.  I'd really like FS's to own their vnode pools,
 but even without that, you don't need the locking, since you
 only need to flush data on vnodes that are backing objects.
 
 If we look at a stack of FS's with intermediate exposure into the
 namespace, then it's clear that the issue is really only applicable
 to objects that act as a backing store:
 
 
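 --------------   --------------------   --------------
 FS               Exposed in hierarchy   Backing object
 --------------   --------------------   --------------
 top              yes                    no
 intermediate_1   no                     no
 intermediate_2   no                     yes
 intermediate_3   yes                    no
 bottom           no                     yes
 --------------   --------------------   --------------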
 
 So when we lock top, we only lock in intermediate_2 and in bottom.

No. One of the things Heidemann notes in his dissertation is that to
prevent deadlock, you have to lock the whole stack of vnodes at once, not
bit by bit.

i.e. there is one lock for the whole thing.
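
A minimal sketch of the "one lock for the whole stack" idea, assuming each
layered vnode carries an exported pointer to the lock owned by the lowest
vnode.  The names stack_lock, vnode_sketch and the helper functions are
hypothetical, not the NetBSD or FreeBSD lock manager API, and the plain int
flag stands in for a real lock:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real lock; not safe for concurrent use. */
struct stack_lock {
    int held;                        /* 1 while some caller holds the stack */
};

struct vnode_sketch {
    struct stack_lock    own;        /* storage used only by the lowest vnode */
    struct stack_lock   *lockp;      /* exported pointer: the lock to take */
    struct vnode_sketch *lower;      /* vnode this one is stacked over, or NULL */
};

/* The lowest-level vnode exports a pointer to its own lock. */
static void vnode_init_leaf(struct vnode_sketch *vp)
{
    vp->own.held = 0;
    vp->lockp = &vp->own;
    vp->lower = NULL;
}

/* A layered vnode reuses the lock exported by the vnode below it. */
static void vnode_init_layer(struct vnode_sketch *vp, struct vnode_sketch *lower)
{
    assert(lower != NULL && lower->lockp != NULL);
    vp->own.held = 0;                /* unused: the shared lock is authoritative */
    vp->lockp = lower->lockp;
    vp->lower = lower;
}

/* Try-lock: returns 0 on success, -1 if the whole stack is already locked. */
static int vnode_stack_trylock(struct vnode_sketch *vp)
{
    if (vp->lockp->held)
        return -1;
    vp->lockp->held = 1;
    return 0;
}

static void vnode_stack_unlock(struct vnode_sketch *vp)
{
    vp->lockp->held = 0;
}
```

Because every vnode in the stack points at the same lock, taking it through
the top vnode also locks the bottom one in a single step, so there is no
window in which two callers each hold part of the stack.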

  Actually isn't the only problem when you have vnode fan-in (union FS)? 
  i.e.  a plain compressing layer should not introduce vnode locking
  problems. 
 
 If it's a block compression layer, it will.  Also a translation layer;
 consider a pure Unicode system that wants to remotely mount an FS
 from a legacy system.  To do this, it needs to expand the pages from
 the legacy system [only it can, since the legacy system doesn't know
 about Unicode] in a 2:1 ratio.  Now consider doing a byte-range lock
 on a file on such a system.  To propagate the lock, you have to do
 an arithmetic conversion at the translation layer.  This gets worse
 if the lower end FS is exposed in the namespace as well.

Wait. byte-range locking is different from vnode locking. I've been
talking about vnode locking, which is different from the byte-range
locking you're discussing above.

  Nope. The problem is that while stacking (null, umap, and overlay fs's)
  work, we don't have the coherency issues worked out so that upper layers
  can cache data. i.e. so that the lower fs knows it has to ask the upper
  layers to give pages back. :-) But multiple ls -lR's work fine. :-)
 
 With UVM in NetBSD, this is (supposedly) not an issue.

UBC. UVM is a new memory manager. UBC unifies the buffer cache with the VM
system.

 You could actually think of it this way, as well: only FS's that
 contain vnodes that provide backing should implement VOP_GETPAGES
 and VOP_PUTPAGES, and all I/O should be done through paging.

Right. That's part of UBC. :-)

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Bill Studenmund
On Wed, 18 Aug 1999, Poul-Henning Kamp wrote:

 Yes, but we need subsecond in the filesystems.  Think about make(1) on
 a blinding fast machine...

Oh yes, I realize that. :-) It's just that I thought you were at one point
suggesting having 128 bits to the left of the decimal point (128 bits
worth of seconds). I was trying to say that'd be a bit much. :-)

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert
   Right. That exported struct lock * makes locking down to the lowest-level
   file easy - you just feed it to the lock manager, and you're locking the
   same lock the lowest level fs uses. You then lock all vnodes stacked over
   this one at the same time. Otherwise, you just call VOP_LOCK below and
   then lock yourself.
  
  I think this defeats the purpose of the stacking architecture; I
  think that if you look at an unadulterated NULLFS, you'll see what I
  mean.
 
 Please be more precise. I have looked at an unadulterated NULLFS, and
 found it lacking. I don't see how this change breaks stacking.


OK, there's the concept of collapse of stacking layer.  This was
first introduced in the Rosenthal stacking vnode architecture, out
of Sun Microsystems.

Rosenthal was concerned that, when you stack 500 putatively null
NULLFS's, that the amount of function call overhead not increase
proportionally.

To resolve this, he introduced the concept of a collapsed VFS
stack.  That is, the actual array of function vectors is actually
a one dimensional projection of a two dimensional stack, and that
the visible portion is actually where the first layer on the way
down the stack that implements a VOP occurs.

We can visualize this like so:

        VOPs
Layer | VOP1    VOP2    VOP3    VOP4    VOP5    VOP6    ...
------+----------------------------------------------------
L1    | -       -       -       imp     -       -       ...
L2    | imp     -       -       imp     -       imp     ...
L3    | imp     -       -       imp     imp     -       ...
L4    | -       -       imp     -       -       -       ...
L5    | imp     imp     imp     imp     imp     imp     ...

The resulting collapsed array of entry vectors looks like so:

L2VOP1  L5VOP2  L4VOP3  L1VOP4  L3VOP5  L2VOP6  ...

There is an implicit assumption here that most stacks will not be
randomly staggered like this example.  The idea behind this
assumption is that additional layers will most frequently add
functionality, rather than replacing it.
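
The collapse described above can be sketched as a small routine that, for
each VOP, records the first layer from the top of the stack that implements
it.  This is only an illustration of the idea using the example table; the
impl array and collapse_stack() are made-up names, and a real implementation
would store function pointers rather than layer numbers:

```c
#include <assert.h>
#include <stddef.h>

#define NLAYERS 5
#define NVOPS   6

/*
 * impl[l][v] is nonzero when layer l (index 0 is L1, the top of the stack)
 * implements VOP v (index 0 is VOP1).  Illustrative data copied from the
 * example table, not a kernel structure.
 */
static const int impl[NLAYERS][NVOPS] = {
    /* VOP1 VOP2 VOP3 VOP4 VOP5 VOP6 */
    {  0,   0,   0,   1,   0,   0 },   /* L1 */
    {  1,   0,   0,   1,   0,   1 },   /* L2 */
    {  1,   0,   0,   1,   1,   0 },   /* L3 */
    {  0,   0,   1,   0,   0,   0 },   /* L4 */
    {  1,   1,   1,   1,   1,   1 },   /* L5 */
};

/*
 * Project the two dimensional stack onto one dimension: out[v] receives the
 * 1-based number of the first layer, walking down from the top, that
 * implements VOP v, or 0 if no layer implements it.
 */
static void collapse_stack(const int table[NLAYERS][NVOPS], int out[NVOPS])
{
    assert(table != NULL && out != NULL);
    for (int v = 0; v < NVOPS; v++) {
        out[v] = 0;
        for (int l = 0; l < NLAYERS; l++) {
            if (table[l][v]) {
                out[v] = l + 1;
                break;
            }
        }
    }
}
```

Calling collapse_stack(impl, out) on the example table yields layers
2, 5, 4, 1, 3, 2 for VOP1 through VOP6, which is the collapsed vector shown
above.  A call through the collapsed vector costs one indirect call no matter
how many layers sit in the stack.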

Heidemann carried this idea over into his architecture, to be
employed at the point that a VFS stack is first instanced.

The BSD4.4 implementation of this is partially flawed.  There is
an implicit implementation of this for the UFS/FFS stack of
layers, in the VOP's descriptor array exported by the combination
of the two being hard coded as being a precollapsed stack.  This
is actually antithetical to the design.

The second place this flaw is apparent is in the inability to
add VOP's into an existing kernel, since the entry point vector
is a fixed size, and is not expanded implicitly by the act of
adding a VFS layer containing a new VOP.

For the use of non-error vfs_defaults, this is also flawed for
proxies, but not for the consumer of the VFS stack, only for the
producer end on the other side of the proxy, which although it
does not implement a particular VOP, needs to _NOT_ use the
local vfs_default for the VOP, but instead needs to proxy the
VOP over to the other side for remote processing.

The act of getting a vfs_default VOP after a collapse, instead
of having a NULL entry point that the descriptor call mechanism
treats as a call failure, damages the ability to proxy unknown
VOP's.
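
To make the argument concrete, here is a rough sketch of a descriptor call
that treats a NULL entry as a call failure, and of a proxy that uses that
failure as its cue to forward the VOP.  All names (vop_dispatch, proxy_vop,
remote_forward, the return codes) are hypothetical and are not the FreeBSD
interfaces; remote_forward() merely stands in for shipping the call to the
other side:

```c
#include <assert.h>
#include <stddef.h>

#define VOP_NOT_IMPLEMENTED (-1)    /* hypothetical "call failure" code */
#define HANDLED_LOCALLY     0
#define HANDLED_REMOTELY    1

typedef int (*vop_fn)(void *args);

/* A non-error default: it quietly satisfies the call on this machine. */
static int local_default(void *args)
{
    (void)args;
    return HANDLED_LOCALLY;
}

/* Stand-in for marshalling the VOP to the far end of the proxy. */
static int remote_forward(void *args)
{
    (void)args;
    return HANDLED_REMOTELY;
}

/* Descriptor call: a NULL or out-of-range slot is reported as a failure. */
static int vop_dispatch(const vop_fn *vec, size_t nops, size_t op, void *args)
{
    if (op >= nops || vec[op] == NULL)
        return VOP_NOT_IMPLEMENTED;
    return vec[op](args);
}

/* Proxy layer: forward only what the local vector could not handle. */
static int proxy_vop(const vop_fn *vec, size_t nops, size_t op, void *args)
{
    assert(vec != NULL);
    int result = vop_dispatch(vec, nops, op, args);
    if (result == VOP_NOT_IMPLEMENTED)
        return remote_forward(args);
    return result;
}
```

With a NULL slot the proxy sees the failure and forwards the call.  Once the
collapse has filled that slot with local_default, the call succeeds locally
and the proxy never learns that it should have been sent across.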


  Intermediate FS's should not trap VOP's that are not applicable
  to them.
 
 True. But VOP_LOCK is applicable to layered fs's. :-)

Only for translation layers that require local backing store.  I'm
prepared to make an exception for them, and require that they
explicitly call the VOP in the underlying vnode over which they are
stacked.  This is the same compromise that both Rosenthal and
Heidemann consciously chose.


  One of the purposes of doing a VOP_LOCK on intermediate vnodes
  that aren't backing objects is to deal with the global vnode
  pool management.  I'd really like FS's to own their vnode pools,
  but even without that, you don't need the locking, since you
  only need to flush data on vnodes that are backing objects.
  
  If we look at a stack of FS's with intermediate exposure into the
  namespace, then it's clear that the issue is really only applicable
  to objects that act as a backing store:
  
  
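  --------------   --------------------   --------------
  FS               Exposed in hierarchy   Backing object
  --------------   --------------------   --------------
  top              yes                    no
  intermediate_1   no                     no
  intermediate_2   no                     yes
  intermediate_3   yes                    no
  bottom           no                     yes
  --------------   --------------------   --------------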
  
  So when we lock top, we only lock in intermediate_2 and in bottom.
 
 No. One of the things Heidemann notes in his dissertation is that to
 prevent deadlock, you have to lock the whole stack of vnodes at once, not
 bit by 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert
  You would have to de-collapse several VOP lists that have been
  pre-collapsed.
  
  You are talking gibberish here.  Please show code where this is
  a problem.
 
 When you write a proxy stacking layer, such as John Heidemann's
 network proxy stacking layer (an NFS alternative), VOP's which
 would normally be handled by vfs_default have to be handled on
 the other end of the proxy, instead, in the same way that they
 would be handled by the vfs_default stuff.
 
 And what prevents you from taking over the default op ?

It needs to be NULL, not taken over.


machine 1           machine 2               machine 3

vfs consumer
upper proxy  <->    lower proxy
                    vfs stacking layer
                    upper proxy  <->        lower proxy
                                            vfs producer

How do I get a VOP, unknown to machine 2, from the vfs consumer
on machine 1 that does know about it, to the vfs producer on
machine 3 that also knows about it?

My understanding is that it is very hard, given vfs_default:

On machine 1, since the upper proxy doesn't know from VOP's, it
wants to locally satisfy it from vfs_default on machine 1.  Taking
over the default op doesn't really help me; I have to do surgery
to the in core dispatch vector instance to do the job properly
(e.g. zapping it out, not taking it over).

On machine 2, it is out of range, but still needs to be passed
through the stacking layer, from the lower proxy to the upper
proxy (and the response, back).


Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Poul-Henning Kamp

Terry,

It is very fine with this example, but I'm not even going to bother
much with it for several reasons, most of which you can find codified
in the development rules for X11 which you can find in Scheifler's
book.

But for the record: your example would get even shorter on
the code we had before I started using the default op sensibly
because all the layers tended to shunt things they didn't 
understand to errno rather than pass them through, so in
fact my change took us closer to being able to handle the
rather lofty example you have here.

Once you show me an actual implementation which has a problem
with it, I will look at it again, until then, I think pretty
much everything else is more important (Scheifler's 1st rule :-)

Poul-Henning

 And what prevents you from taking over the default op ?

It needs to be NULL, not taken over.


machine 1           machine 2               machine 3

vfs consumer
upper proxy  <->    lower proxy
                    vfs stacking layer
                    upper proxy  <->        lower proxy
                                            vfs producer

How do I get a VOP, unknown to machine 2, from the vfs consumer
on machine 1 that does know about it, to the vfs producer on
machine 3 that also knows about it?

--
Poul-Henning Kamp FreeBSD coreteam member
p...@freebsd.org   Real hackers run -current on their laptop.
FreeBSD -- It will take a long time before progress goes too far!





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Terry Lambert
 Terry,
 
 It is very fine with this example, but I'm not even going to bother
 much with it for several reasons, most of which you can find codified
 in the development rules for X11 which you can find in Scheifler's
 book.
 
 But for the record: your example would get even shorter on
 the code we had before I started using the default op sensibly
 because all the layers tended to shunt things they didn't 
 understand to errno rather than pass them through, so in
 fact my change took us closer to being able to handle the
 rather lofty example you have here.
 
 Once you show me an actual implementation which has a problem
 with it, I will look at it again, until then, I think pretty
 much everything else is more important (Scheifler's 1st rule :-)
 
 Poul-Henning


That's a fair requirement.  I have some of Heidemann's code that
runs into the problem, but I don't have any that I can redistribute.

Would it be OK if I asked John to send you his code as well, if
you will abide with the non-redistribution requirement?

I understand the prioritization process, and FWIW, I agree with
it, in a resource-starved situation (e.g. FreeBSD).


Terry Lambert
te...@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-18 Thread Daniel C. Sobral
Terry Lambert wrote:
 
 Make sure that the system you are talking to over the proxy is
 not assumed to be a FreeBSD system (e.g. don't assume that the
 vfs_default stuff exists on the other end of the proxy, or that
 it would be functional).

Now, Terry, that is ridiculous. One has to assume that both ends
play by the same rules. That is not only a reasonable expectation,
it's a minimum requirement for any protocol to work.

--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

- Can I speak to your superior?
- There's some religious debate on that question.







Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Sheldon Hearn



On Tue, 17 Aug 1999 00:30:02 CST, Warner Losh wrote:

 Actually, the Nintendo 64 uses the Vr4300 series of chips from NEC.

!!!

I've been dethreading this subject line for a few days now, so I'm quite
relieved to see this, the one e-mail message which I happened to check in
on to make sure that I'm not missing anything.

Has anyone invoked Godwin's Law yet? And is anyone recycling the
bottle-rockets?

:-P

Ciao,
Sheldon.

PS: Yes, there's a comma missing from that first paragraph, but at
least there were no split infinitives.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Michael Hancock

  I'm not familiar with the VFS_default stuff. All the vop_default_desc
  routines in NetBSD point to error routines.
 
 In FreeBSD, they now point to default routines that are *not* error
 routines.  This is the problem.  I admit the change was very well
 intentioned, since it made the code a hell of a lot more readable,
 but choosing between readable and additional function, I take function
 over form (I think the way I would have "fixed" the readability is by
 making the operations that result in the descriptor set for a mounted
 FS instance be both discrete, and named for their specific function).

As I recall most of FBSD's default routines are also error routines, if
the exceptions were a problem it would be trivial to fix.

I think fixing resource allocation/deallocation for things like vnodes,
cnbufs, and locks are a higher priority for now.  There are examples such
as in detached threading where it might make sense for the detached child
to be responsible for releasing resources allocated to it by the parent,
but in stacking this model is very messy and unnatural.  This is why the
purpose of VOP_ABORTOP appears to be to release cnbufs but this is really
just an ugly side effect.  With stacking the code that allocates should be
the code that deallocates. Substitute, "code"  with "layer" to be more
correct. 
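
A small sketch of the "allocating layer deallocates" rule, using a name
buffer as the resource.  The functions namebuf_alloc(), namebuf_free(),
lower_lookup() and upper_lookup() are invented for illustration and do not
correspond to the real cnbuf or namei code; live_buffers exists only so the
example can show that nothing leaks:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_buffers;             /* outstanding allocations, for the demo */

static char *namebuf_alloc(const char *path)
{
    char *buf = malloc(strlen(path) + 1);
    if (buf != NULL) {
        strcpy(buf, path);
        live_buffers++;
    }
    return buf;
}

static void namebuf_free(char *buf)
{
    if (buf != NULL) {
        free(buf);
        live_buffers--;
    }
}

/* Lower layer: uses the buffer but never releases it, on any path. */
static int lower_lookup(const char *name)
{
    return name[0] == '/' ? 0 : -1;
}

/* Upper layer: allocates, calls down, and releases on success and failure. */
static int upper_lookup(const char *path)
{
    char *buf = namebuf_alloc(path);
    if (buf == NULL)
        return -1;
    int error = lower_lookup(buf);
    namebuf_free(buf);               /* same layer that allocated it */
    return error;
}
```

Since the lower layer never frees what it was handed, the caller does not
need to know which error paths below it released the buffer, which is the
bookkeeping that a WILLRELE-style interface forces on every layer.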

I fixed a lot of the vnode and locking cases, unfortunately the ones that
remain are probably ugly cases where you have to reacquire locks that had
to be unlocked somewhere in the executing layer.  See VOP_RENAME for an
example.  Compare the number of WILLRELEs in vnode_if.src in FreeBSD and
NetBSD, ideally there'd be none.

Regards,


Mike Hancock







Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Brian McGroarty

--- Warner Losh [EMAIL PROTECTED] wrote:
 In message
 [EMAIL PROTECTED]
 Brian McGroarty writes: 
 : So do the old and new Playstation models. The MIPS core is
 : being manufactured by several companies: IDT alone has
 : something like a dozen variants available with and without
 : MMU, FP, 5000 vs 1 core, etc. and is in far wider use
 : than in just PCs and gaming consoles. I doubt if SGI
 : machines abandoning MIPS processors would put much of a dent
 : in MIPS' profitability.
 
 NEC also makes about a dozen.  Although they aren't stupid
 enough to make any without MMUs :-)  I really don't like the
 IDT 4650, can you tell...


If you think that's bad, try writing 3D games on the MIPS
without a MMU, FPU or a data cache. Welcome to Playstation
programming. Admittedly it's got a direct-mapped scratchpad
where you can drop temporary data and some really esoteric
low-precision fixed-point matrix operators, but it's never
-quite- what you're looking for.

My last Playstation title saw upward of fifteen hundred lines of
tightly tuned assembly before all the special effects were up to
60fps again. The Lego Racers team have had an average of one and
a half programmers tuning their rendering engine for over a year
now. I think at this point they're assembly from the BSP
traversal on down.

Next project's Windows, Playstation 2 and FreeBSD/Linux
internally at the least. I'm finally a happy camper.








Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Michael Hancock

On Tue, 17 Aug 1999, Bill Studenmund wrote:

 I've compared the two, and making the NetBSD number match the FreeBSD
 number is one of my goals. :-)
 
 Any suggestions, or just plodfix?

It can be very cumbersome tracking down references being bumped by
vref/VREF and other operations.

Among the uncompleted operations are VOPs that pre-release the returned
vpp to the caller.  I think in VOP_MKNOD this was done as a convenience
and you might have to add code to handle device vp aliases correctly.

Just remember the rule, the allocating layer must be the layer that
deallocates.

Regards,


Mike






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund

On Tue, 17 Aug 1999, Terry Lambert wrote:

 2.Advisory locks are hung off private backing objects.
  I'm not sure. The struct lock * is only used by layered filesystems, so
  they can keep track both of the underlying vnode lock, and if needed their
  own vnode lock. For advisory locks, would we want to keep track both of
  locks on our layer and the layer below? Don't we want either one or the
  other? i.e. layers bypass to the one below, or deal with it all
  themselves.
 
 I think you want the lock on the intermediate layer: basically, on
 every vnode that has data associated with it that is unique to a
 layer.  Let's not forget, also, that you can expose a layer into
 the namespace in one place, and expose it covered under another
 layer, at another.  If you locked down to the backing object, then
 the only issue you would be left with is one or more intermediate
 backing objects.

Right. That exported struct lock * makes locking down to the lowest-level
file easy - you just feed it to the lock manager, and you're locking the
same lock the lowest level fs uses. You then lock all vnodes stacked over
this one at the same time. Otherwise, you just call VOP_LOCK below and
then lock yourself.

 For a layer with an intermediate backing object, I'm prepared to
 declare it "special", and proxy the operation down to any inferior
 backing object (e.g. a union FS that adds files from two FS's
 together, rather than just directory entry lists).  I think such
 layers are the exception, not the rule.

Actually isn't the only problem when you have vnode fan-in (union FS)? 
i.e.  a plain compressing layer should not introduce vnode locking
problems. 

 I think that export policies are the realm of /etc/exports.
 
 The problem with each FS implementing its own policy, is that this
 is another place that copyinstr() gets called, when it shouldn't.

Well, my thought was that, like with current code, most every fs would
just call vfs_export() when it's presented an export operation. But by
retaining the option of having the fs do its own thing, we can support
different export semantics if desired.

 Right.  The "covering" operation is not the same as the "marking as
 covered" operation.  Both need to be at the higher level.
 Not really.  Julian Elischer had code that mounted a /devfs under
 / automatically, before the user was ever allowed to see /.  As a
 result, the FS that you were left with was indistinguishable from
 what I describe.
 
 The only real difference is that, as a translucent mount over /devfs,
 the one I describe would be capable of implementing persistent changes
 to the /devfs, as whiteouts.  I don't think this is really that
 desirable, but some people won't accept a devfs that doesn't have
 traditional persistence semantics (e.g. "chmod" vs. modifying a
 well known kernel data structure as an administrative operation).

That wouldn't be hard to do. :-)

 I guess the other difference is that you don't have to worry about
 large minor numbers when you are bringing up a new platform via
 NFS from an old platform that can't support large minors in its FS
 at all.  ;-).

True. :-)

 I would resolve this by passing a standard option to the mount code
 in user space.  For root mounts, a vnode is passed down.  For other
 mounts, the vnode is parsed and passed if the option is specified.

Or maybe add a field to vfsops. This info says what the mount call will
expect (I want a block device, a regular file, a directory, etc), so it
fits. :-)

Also, if we leave it to userland, what happens if someone writes a
program which calls sys_mount with something the fs doesn't expect. :-)

 I think that you will only be able to find rare examples of FS's
 that don't take device names as arguments.  But for those, you
 don't specify the option, and it gets "NULL", and whatever local
 options you specify.

I agree I can't see a leaf fs not taking a device node. But layered fs's
certainly will want something else. :-)

 The point is that, for FS's that can be both root and sub-root,
 the mount code doesn't have to make the decision, it can be punted
 to higher level code, in one place, where the code can be centrally
 maintained and kept from getting "stale" when things change out
 from under it.

True.

And with good comments we can catch the times when the centrally located
code changes and breaks an assumption made by the fs. :-)

  Except for a minor buglet with device nodes, stacking works in NetBSD at
  present. :-)
 
 Have you tried Heidemann's student's stacking layers?  There is one
 encryption, and one per-file compression with namespace hiding, that
 I think it would be hard pressed to keep up with.  But I'll give it
 the benefit of the doubt.  8-).

Nope. The problem is that while stacking (null, umap, and overlay fs's)
work, we don't have the coherency issues worked out so that upper layers
can cache data. i.e. so that the lower fs knows it has to ask the upper
layers to give pages back. :-) But multiple 

Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Don Lewis

On Aug 16,  9:18pm, Terry Lambert wrote:
} Subject: Re: BSD XFS Port  BSD VFS Rewrite

}  I don't see how the namei recursion method prevents catching // as a
}  namespace escape.
} 
} 
} //apple-resource-fork/intermediate_dir/some_other_dir/file_with_fork
} 
} You can't inherit the fact that you are looking at the resource fork
} in the terminal component, ONLY.

I don't think this is a good example.  How would you access the resource
fork of a file relative to the current directory?  IMHO, the necessary
goop needs to go at the end of the path name.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Wes Peters

Don Lewis wrote:
 
 On Aug 16,  9:18pm, Terry Lambert wrote:
 } Subject: Re: BSD XFS Port  BSD VFS Rewrite
 
 }  I don't see how the namei recursion method prevents catching // as a
 }  namespace escape.
 }
 }
 } //apple-resource-fork/intermediate_dir/some_other_dir/file_with_fork
 }
 } You can't inherit the fact that you are looking at the resource fork
 } in the terminal component, ONLY.
 
 I don't think this is a good example.  How would you access the resource
 fork of a file relative to the current directory?  IMHO, the necessary
 goop needs to go at the end of the path name.

Pick a separator character that nobody in their right mind would use in
a file path.  "\" strikes me as a good candidate.  ;^)

-- 
"Where am I, and what am I doing in this handbasket?"

Wes Peters Softweyr LLC
http://softweyr.com/   [EMAIL PROTECTED]





Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Warner Losh
In message 19990816164048.28824.rocketm...@web1001.mail.yahoo.com
Brian McGroarty writes: 
: So do the old and new Playstation models. The MIPS core is being
: manufactured by several companies: IDT alone has something like
: a dozen variants available with and without MMU, FP, 5000 vs
: 1 core, etc. and is in far wider use than in just PCs and
: gaming consoles. I doubt if SGI machines abandoning MIPS
: processors would put much of a dent in MIPS' profitability.

NEC also makes about a dozen.  Although they aren't stupid enough to
make any without MMUs :-)  I really don't like the IDT 4650, can you
tell...

Warner





Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Warner Losh
In message pine.bsf.3.96.990816201515.19879o-100...@haldjas.folklore.ee Narvi 
writes:
:  Nintendo 64 uses MIPS.
:  
: 
: Which doesn't matter all that much. MIPS cpus for nintendo could be made
: by say MISP, not SGI (and SGI sold/is trying to sell MIPS).

Actually, the Nintendo 64 uses the Vr4300 series of chips from NEC.  I
think the new Nintendo will use a different (non-mips) processor, but
I'm not completely sure what the new one will be (when NEC announced
this, MIPS stock took a dive).  SGI has already spun out MIPS and has
been slowly reducing its stake in MIPS for some time now.

However, there is another gaming machine based on a 128bit MIPS design 
in the pipeline from, I think, Sony.

Warner





Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Michael Hancock
  I'm not familiar with the VFS_default stuff. All the vop_default_desc
  routines in NetBSD point to error routines.
 
 In FreeBSD, they now point to default routines that are *not* error
 routines.  This is the problem.  I admit the change was very well
 intentioned, since it made the code a hell of a lot more readable,
 but choosing between readable and additional function, I take function
 over form (I think the way I would have fixed the readability is by
 making the operations that result in the descriptor set for a mounted
 FS instance be both discrete, and named for their specific function).

As I recall, most of FBSD's default routines are also error routines.  If
the exceptions were a problem, it would be trivial to fix.
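
To make the distinction concrete, here is a minimal user-space sketch in C.
It is not FreeBSD or NetBSD kernel code, and every name in it (toy_op_t,
toy_vector_init, and so on) is hypothetical.  It shows an operations vector
whose empty slots fall back to a default entry, and how the result differs
when that default is an error routine versus one that quietly succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-filesystem VOP vector. */
enum { TOY_OP_LOOKUP, TOY_OP_READ, TOY_OP_ADVLOCK, TOY_OP_MAX };

typedef int (*toy_op_t)(void *arg);

/* Default that fails loudly: the operation is not supported. */
int toy_default_error(void *arg) { (void)arg; return EOPNOTSUPP; }

/* Default that silently succeeds: convenient, but hides missing ops. */
int toy_default_nop(void *arg) { (void)arg; return 0; }

/* An operation that the (imaginary) filesystem does implement. */
int toy_fs_read(void *arg) { (void)arg; return 0; }

/* Fill every slot the filesystem left empty with the chosen default. */
void toy_vector_init(toy_op_t vec[TOY_OP_MAX], toy_op_t dflt)
{
    for (size_t i = 0; i < TOY_OP_MAX; i++)
        if (vec[i] == NULL)
            vec[i] = dflt;
}

/* Dispatch one operation through the vector. */
int toy_call(toy_op_t vec[TOY_OP_MAX], int op, void *arg)
{
    assert(op >= 0 && op < TOY_OP_MAX);
    return vec[op](arg);
}
```

With toy_default_error, a caller of an unimplemented slot sees EOPNOTSUPP
and a stacking layer can tell that nothing below handled the request.  With
toy_default_nop the same call returns 0, which is the case being argued
about above.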

I think fixing resource allocation/deallocation for things like vnodes,
cnbufs, and locks is a higher priority for now.  There are examples, such
as detached threading, where it might make sense for the detached child
to be responsible for releasing resources allocated to it by the parent,
but in stacking this model is very messy and unnatural.  This is why the
purpose of VOP_ABORTOP appears to be to release cnbufs, but this is really
just an ugly side effect.  With stacking, the code that allocates should be
the code that deallocates.  Substitute "layer" for "code" to be more
correct.

I fixed a lot of the vnode and locking cases.  Unfortunately, the ones that
remain are probably ugly cases where you have to reacquire locks that had
to be unlocked somewhere in the executing layer.  See VOP_RENAME for an
example.  Compare the number of WILLRELEs in vnode_if.src in FreeBSD and
NetBSD; ideally there'd be none.

Regards,


Mike Hancock







Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Brian McGroarty
--- Warner Losh i...@village.org wrote:
 In message
 19990816164048.28824.rocketm...@web1001.mail.yahoo.com
 Brian McGroarty writes: 
 : So do the old and new Playstation models. The MIPS core is
 : being manufactured by several companies: IDT alone has
 : something like a dozen variants available with and without
 : MMU, FP, 5000 vs 1 core, etc. and is in far wider use
 : than in just PCs and gaming consoles. I doubt if SGI
 : machines abandoning MIPS processors would put much of a dent
 : in MIPS' profitability.
 
 NEC also makes about a dozen.  Although they aren't stupid
 enough to make any without MMUs :-)  I really don't like the
 IDT 4650, can you tell...


If you think that's bad, try writing 3D games on the MIPS
without an MMU, FPU or a data cache. Welcome to Playstation
programming. Admittedly it's got a direct-mapped scratchpad
where you can drop temporary data and some really esoteric
low-precision fixed-point matrix operators, but it's never
-quite- what you're looking for.

My last Playstation title saw upward of fifteen hundred lines of
tightly tuned assembly before all the special effects were up to
60fps again. The Lego Racers team have had an average of one and
a half programmers tuning their rendering engine for over a year
now. I think at this point they're assembly from the BSP
traversal on down.

Next project's Windows, Playstation 2 and FreeBSD/Linux
internally at the least. I'm finally a happy camper.








Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund
On Tue, 17 Aug 1999, Michael Hancock wrote:

 As I recall, most of FBSD's default routines are also error routines.  If
 the exceptions were a problem, it would be trivial to fix.
 
 I think fixing resource allocation/deallocation for things like vnodes,
 cnbufs, and locks is a higher priority for now.  There are examples, such
 as detached threading, where it might make sense for the detached child
 to be responsible for releasing resources allocated to it by the parent,
 but in stacking this model is very messy and unnatural.  This is why the
 purpose of VOP_ABORTOP appears to be to release cnbufs, but this is really
 just an ugly side effect.  With stacking, the code that allocates should be
 the code that deallocates.  Substitute "layer" for "code" to be more
 correct.
 
 I fixed a lot of the vnode and locking cases.  Unfortunately, the ones that
 remain are probably ugly cases where you have to reacquire locks that had
 to be unlocked somewhere in the executing layer.  See VOP_RENAME for an
 example.  Compare the number of WILLRELEs in vnode_if.src in FreeBSD and
 NetBSD; ideally there'd be none.

I've compared the two, and making the NetBSD number match the FreeBSD
number is one of my goals. :-)

Any suggestions, or just plodfix?

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Michael Hancock
On Tue, 17 Aug 1999, Bill Studenmund wrote:

 I've compared the two, and making the NetBSD number match the FreeBSD
 number is one of my goals. :-)
 
 Any suggestions, or just plodfix?

It can be very cumbersome tracking down references being bumped by
vref/VREF and other operations.

Among the uncompleted operations are VOPs that pre-release the returned
vpp to the caller.  I think in VOP_MKNOD this was done as a convenience
and you might have to add code to handle device vp aliases correctly.

Just remember the rule: the allocating layer must be the layer that
deallocates.
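
As a rough illustration of that rule, here is a small user-space sketch in
C.  The toy_vnode type and the toy_* functions are hypothetical and only
model a reference count; they are not the kernel's vref()/vput().  The
layer that takes the reference drops it on every return path, and the
layer below never releases something it did not acquire.

```c
#include <assert.h>

/* Hypothetical toy vnode: only a reference count. */
struct toy_vnode { int usecount; };

void toy_vref(struct toy_vnode *vp)  { vp->usecount++; }

void toy_vrele(struct toy_vnode *vp)
{
    assert(vp->usecount > 0);
    vp->usecount--;
}

/* Lower layer: uses dvp but neither takes nor drops the caller's reference. */
int toy_lower_op(struct toy_vnode *dvp, int fail)
{
    assert(dvp->usecount > 0);   /* the caller must hold a reference */
    return fail ? -1 : 0;
}

/* Upper layer: it took the reference, so it releases it, success or not. */
int toy_upper_op(struct toy_vnode *dvp, int fail)
{
    int error;

    toy_vref(dvp);
    error = toy_lower_op(dvp, fail);
    toy_vrele(dvp);
    return error;
}
```

Because toy_lower_op() has no release of its own, the error path and the
success path leave the count the same, and an intermediate layer can be
inserted between the two without having to know who frees what.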

Regards,


Mike






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund
On Tue, 17 Aug 1999, Terry Lambert wrote:

 2.Advisory locks are hung off private backing objects.
  I'm not sure. The struct lock * is only used by layered filesystems, so
  they can keep track both of the underlying vnode lock, and if needed their
  own vnode lock. For advisory locks, would we want to keep track both of
  locks on our layer and the layer below? Don't we want either one or the
  other? i.e. layers bypass to the one below, or deal with it all
  themselves.
 
 I think you want the lock on the intermediate layer: basically, on
 every vnode that has data associated with it that is unique to a
 layer.  Let's not forget, also, that you can expose a layer into
 the namespace in one place, and expose it covered under another
 layer, at another.  If you locked down to the backing object, then
 the only issue you would be left with is one or more intermediate
 backing objects.

Right. That exported struct lock * makes locking down to the lowest-level
file easy - you just feed it to the lock manager, and you're locking the
same lock the lowest level fs uses. You then lock all vnodes stacked over
this one at the same time. Otherwise, you just call VOP_LOCK below and
then lock yourself.
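
A minimal user-space sketch of the two cases, in C.  The structure and
field names (toy_vnode, exported_lock, and so on) are hypothetical and much
simpler than the real NetBSD code; the lock is just a flag.  If the lower
vnode exports its lock, every layer stacked on it points at the same lock
and one acquisition covers the whole stack.  If not, the layer locks the
vnode below and then itself.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lock { int held; };

struct toy_vnode {
    struct toy_lock   own_lock;       /* this layer's own lock storage */
    struct toy_lock  *exported_lock;  /* shared lock, or NULL if not exported */
    struct toy_vnode *lower;          /* vnode this one is stacked on, or NULL */
};

/* Leaf filesystem: export its own lock. */
void toy_leaf_init(struct toy_vnode *vp)
{
    vp->own_lock.held = 0;
    vp->exported_lock = &vp->own_lock;
    vp->lower = NULL;
}

/* Layer: reuse whatever lock the lower vnode exports (possibly none). */
void toy_layer_init(struct toy_vnode *vp, struct toy_vnode *lower)
{
    vp->own_lock.held = 0;
    vp->exported_lock = lower->exported_lock;
    vp->lower = lower;
}

void toy_lock_acquire(struct toy_lock *lk)
{
    assert(!lk->held);
    lk->held = 1;
}

/* Lock a vnode: one acquisition if the lock is shared down the stack,
 * otherwise lock the layer below first, then this layer. */
void toy_vn_lock(struct toy_vnode *vp)
{
    if (vp->exported_lock != NULL) {
        toy_lock_acquire(vp->exported_lock);
        return;
    }
    if (vp->lower != NULL)
        toy_vn_lock(vp->lower);
    toy_lock_acquire(&vp->own_lock);
}
```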

 For a layer with an intermediate backing object, I'm prepared to
 declare it special, and proxy the operation down to any inferior
 backing object (e.g. a union FS that adds files from two FS's
 together, rather than just directory entry lists).  I think such
 layers are the exception, not the rule.

Actually, isn't the only problem when you have vnode fan-in (union FS)?
I.e., a plain compressing layer should not introduce vnode locking
problems.

 I think that export policies are the realm of /etc/exports.
 
 The problem with each FS implementing its own policy, is that this
 is another place that copyinstr() gets called, when it shouldn't.

Well, my thought was that, like with current code, most every fs would
just call vfs_export() when it's presented an export operation. But by
retaining the option of having the fs do its own thing, we can support
different export semantics if desired.

 Right.  The covering operation is not the same as the marking as
 covered operation.  Both need to be at the higher level.
 Not really.  Julian Elischer had code that mounted a /devfs under
 / automatically, before the user was ever allowed to see /.  As a
 result, the FS that you were left with was indistinguishable from
 what I describe.
 
 The only real difference is that, as a translucent mount over /devfs,
 the one I describe would be capable of implementing persistent changes
 to the /devfs, as whiteouts.  I don't think this is really that
 desirable, but some people won't accept a devfs that doesn't have
 traditional persistence semantics (e.g. chmod vs. modifying a
 well known kernel data structure as an administrative operation).

That wouldn't be hard to do. :-)

 I guess the other difference is that you don't have to worry about
 large minor numbers when you are bringing up a new platform via
 NFS from an old platform that can't support large minors in its FS
 at all.  ;-).

True. :-)

 I would resolve this by passing a standard option to the mount code
 in user space.  For root mounts, a vnode is passed down.  For other
 mounts, the vnode is parsed and passed if the option is specified.

Or maybe add a field to vfsops. This info says what the mount call will
expect (I want a block device, a regular file, a directory, etc), so it
fits. :-)
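
Something like the following is what I have in mind, as a user-space sketch
in C.  The field and type names (vfs_wants, toy_mount_arg, and so on) are
made up for illustration and do not exist in any vfsops today.  Each
filesystem declares what kind of object it expects to be mounted from, and
one central routine checks it before the fs-specific mount code runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: the kinds of object a mount call may be handed. */
enum toy_mount_arg {
    TOY_ARG_NONE,
    TOY_ARG_BLOCKDEV,
    TOY_ARG_REGFILE,
    TOY_ARG_DIRECTORY
};

/* Hypothetical, cut-down vfsops with one extra descriptive field. */
struct toy_vfsops {
    const char        *vfs_name;
    enum toy_mount_arg vfs_wants;   /* what this fs expects to mount from */
};

/* Central check, done once before calling the fs-specific mount routine. */
int toy_mount_check(const struct toy_vfsops *ops, enum toy_mount_arg given)
{
    assert(ops != NULL);
    return (ops->vfs_wants == given) ? 0 : EINVAL;
}
```

A leaf fs would set vfs_wants to TOY_ARG_BLOCKDEV and a layered fs to
TOY_ARG_DIRECTORY, so a program that passes the wrong thing gets EINVAL
from one place instead of from each filesystem's own argument parsing.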

Also, if we leave it to userland, what happens if someone writes a
program which calls sys_mount with something the fs doesn't expect? :-)

 I think that you will only be able to find rare examples of FS's
 that don't take device names as arguments.  But for those, you
 don't specify the option, and it gets NULL, and whatever local
 options you specify.

I agree I can't see a leaf fs not taking a device node. But layered fs's
certainly will want something else. :-)

 The point is that, for FS's that can be both root and sub-root,
 the mount code doesn't have to make the decision, it can be punted
 to higher level code, in one place, where the code can be centrally
 maintained and kept from getting stale when things change out
 from under it.

True.

And with good comments we can catch the times when the centrally located
code changes and breaks an assumption made by the fs. :-)

  Except for a minor buglet with device nodes, stacking works in NetBSD at
  present. :-)
 
 Have you tried Heidemann's student's stacking layers?  There is one
 encryption, and one per-file compression with namespace hiding, that
 I think it would be hard pressed to keep up with.  But I'll give it
 the benefit of the doubt.  8-).

Nope. The problem is that while stacking (null, umap, and overlay fs's)
works, we don't have the coherency issues worked out so that upper layers
can cache data, i.e. so that the lower fs knows it has to ask the upper
layers to give pages back. :-) But multiple ls -lR's work fine. :-)

Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Michael Hancock
  Have you tried Heidemann's student's stacking layers?  There is one
  encryption, and one per-file compression with namespace hiding, that
  I think it would be hard pressed to keep up with.  But I'll give it
  the benefit of the doubt.  8-).
 
 Nope. The problem is that while stacking (null, umap, and overlay fs's)
 work, we don't have the coherency issues worked out so that upper layers
 can cache data. i.e. so that the lower fs knows it has to ask the upper
 layers to give pages back. :-) But multiple ls -lR's work fine. :-)

Interesting, have you read the Heidemann paper that outlines a solution
that uses a cache manager?

You can probably find it somewhere here,
http://www.isi.edu/~johnh/SOFTWARE/UCLA_STACKING/








Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Bill Studenmund
On Wed, 18 Aug 1999, Michael Hancock wrote:

 Interesting, have you read the Heidemann paper that outlines a solution
 that uses a cache manager?
 
 You can probably find it somewhere here,
 http://www.isi.edu/~johnh/SOFTWARE/UCLA_STACKING/

Nope. I've read his dissertation, and his discussion of the lock
management inspired the struct lock * work I did for NetBSD (we use the
address of the lock, not the vnode, but other than that it's the same).

Thanks for the ref!

Take care,

Bill






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Michael Hancock
I forgot I had some old diffs that may be of help,
http://www.freebsd.org/~mch/vop1a.diff

You'll notice that just about everywhere that I moved vput() to the
appropriate layer a path component buffer was also freed in the wrong
place.  John Dyson put these buffers in zones, so the free routine probably
looks very different than in NetBSD.

    zfree(namei_zone, cnp->cn_pnbuf);
-   vput(dvp);

Regards,


Mike






Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Don Lewis
On Aug 16,  9:18pm, Terry Lambert wrote:
} Subject: Re: BSD XFS Port  BSD VFS Rewrite

}  I don't see how the namei recursion method prevents catching // as a
}  namespace escape.
} 
} 
} //apple-resource-fork/intermediate_dir/some_other_dir/file_with_fork
} 
} You can't inherit the fact that you are looking at the resource fork
} in the terminal component, ONLY.

I don't think this is a good example.  How would you access the resource
fork of a file relative to the current directory?  IMHO, the necessary
goop needs to go at the end of the path name.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-17 Thread Wes Peters
Don Lewis wrote:
 
 On Aug 16,  9:18pm, Terry Lambert wrote:
 } Subject: Re: BSD XFS Port  BSD VFS Rewrite
 
 }  I don't see how the namei recursion method prevents catching // as a
 }  namespace escape.
 }
 }
 } //apple-resource-fork/intermediate_dir/some_other_dir/file_with_fork
 }
 } You can't inherit the fact that you are looking at the resource fork
 } in the terminal component, ONLY.
 
 I don't think this is a good example.  How would you access the resource
 fork of a file relative to the current directory?  IMHO, the necessary
 goop needs to go at the end of the path name.

Pick a separator character that nobody in their right mind would use in
a file path.  \ strikes me as a good candidate.  ;^)

-- 
Where am I, and what am I doing in this handbasket?

Wes Peters Softweyr LLC
http://softweyr.com/   w...@softweyr.com





Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Dominic Mitchell

On Sat, Aug 14, 1999 at 12:23:00PM -0400, Brian F. Feldman wrote:
 On Sat, 14 Aug 1999, James Howard wrote:
  I heard somewhere that Linux was released under a slightly modified GPL to
  permit the inclusion of BSD code.  I assumed they did this to steal the IP
  stack.
 
 Most likely.

Nope.  Linux has always had its own IP stack.  That's where the
"Linux's network performance is poor" arguments came from.  Of course,
it's a lot better these days.
-- 
Dom Mitchell -- Palmer & Harvey McLane -- Unix Systems Administrator

"Finally, when replying to messages only quote the parts of the message
 your will be discussing or that are relevant. Quoting whole messages
 and adding two lines at the top is not good etiquette." -- Elias Levy





Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Terry Lambert

 On Fri, 13 Aug 1999, Terry Lambert wrote:
 
  Has anyone mentioned to them that they will be unable to incorporate
  changes made to the GPL'ed version of XFS back into the IRIX version
  of XFS, without IRIX becoming GPL'ed?
 
 Given that they say they're dropping IRIX and going with Linux, I don't
 think it'll be a problem.

Can you please cite a reference for this, other than wishful
thinking by the Linux camp?

PS: I fired off a note to Dr. Forest Baskett at SGI (their senior VP
for R&D, and their CTO) about the license, and about the FS researchers in
the BSD community (e.g. Dr. Margo Seltzer, Dr. Kirk McKusick, John
Heidemann, et al.) who will be unable to participate in driving
their technology forward because of the license.

I will report the response (if any).


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Vince Vielhaber

On Mon, 16 Aug 1999, Terry Lambert wrote:

  On Fri, 13 Aug 1999, Terry Lambert wrote:
  
   Has anyone mentioned to them that they will be unable to incorporate
   changes made to the GPL'ed version of XFS back into the IRIX version
   of XFS, without IRIX becoming GPL'ed?
  
  Given that they say they're dropping IRIX and going with Linux, I don't
  think it'll be a problem.
 
 Can you please site a reference for this, other than wishful
 thinking by the Linux camp?

Here's one:
http://www.zdnet.com/pcweek/stories/news/0,4153,1015908,00.html

But just about every trade rag covered it.

Vince.
-- 
==
Vince Vielhaber -- KA8CSH   email: [EMAIL PROTECTED]   flame-mail: /dev/null
   # include std/disclaimers.h   TEAM-OS2
Online Campground Directoryhttp://www.camping-usa.com
   Online Giftshop Superstorehttp://www.cloudninegifts.com
==








Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Terry Lambert

   On Fri, 13 Aug 1999, Terry Lambert wrote:
   
    Has anyone mentioned to them that they will be unable to incorporate
    changes made to the GPL'ed version of XFS back into the IRIX version
    of XFS, without IRIX becoming GPL'ed?
   
   Given that they say they're dropping IRIX and going with Linux, I don't
   think it'll be a problem.
  
  Can you please site a reference for this, other than wishful
  thinking by the Linux camp?
 
 Here's one:
 http://www.zdnet.com/pcweek/stories/news/0,4153,1015908,00.html
 
 But just about every trade rag covered it.


Begging your pardon, but:


| --- With the help of Veritas Software Corp., SGI will work to add
| key features of its Irix operating system to the Linux platform.
| Currently, Irix runs on the MIPS platform. Once SGI switches
| entirely to Intel Corp.'s IA/64 platform, that will be the end of
| Irix. 
|
| SGI is also forming an alliance with NEC Corp. to increase its
| market share in Japan.

These paragraphs are contradictory.  It implies an end to MIPS.

Nintendo 64 uses MIPS.

It also seems a bit overzealous.


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Vince Vielhaber

On Mon, 16 Aug 1999, Terry Lambert wrote:

    On Fri, 13 Aug 1999, Terry Lambert wrote:
    
     Has anyone mentioned to them that they will be unable to incorporate
     changes made to the GPL'ed version of XFS back into the IRIX version
     of XFS, without IRIX becoming GPL'ed?
    
    Given that they say they're dropping IRIX and going with Linux, I don't
    think it'll be a problem.
   
   Can you please site a reference for this, other than wishful
   thinking by the Linux camp?
  
  Here's one:
  http://www.zdnet.com/pcweek/stories/news/0,4153,1015908,00.html
  
  But just about every trade rag covered it.
 
 
 Begging your pardon, but:
 
 
 | --- With the help of Veritas Software Corp., SGI will work to add
 | key features of its Irix operating system to the Linux platform.
 | Currently, Irix runs on the MIPS platform. Once SGI switches
 | entirely to Intel Corp.'s IA/64 platform, that will be the end of
 | Irix. 
 |
 | SGI is also forming an alliance with NEC Corp. to increase its
 | market share in Japan.
 
 These paragraphs are contradictory.  It implies an end to MIPS.
 
 Nintendo 64 uses MIPS.
 
 It also seems a bit overzealous.

No argument here.  Perhaps they're just trying to float a few trial
balloons in hopes of finding something popular/feasible.

Vince.
-- 
==
Vince Vielhaber -- KA8CSH   email: [EMAIL PROTECTED]   flame-mail: /dev/null
   # include std/disclaimers.h   TEAM-OS2
Online Campground Directoryhttp://www.camping-usa.com
   Online Giftshop Superstorehttp://www.cloudninegifts.com
==








Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Brian McGroarty

--- Terry Lambert [EMAIL PROTECTED] wrote:
On Fri, 13 Aug 1999, Terry Lambert wrote:

   Can you please site a reference for this, other than wishful
   thinking by the Linux camp?
  
  Here's one:
  http://www.zdnet.com/pcweek/stories/news/0,4153,1015908,00.html
  
  But just about every trade rag covered it.
 
 Begging your pardon, but:
 
 | --- With the help of Veritas Software Corp., SGI will work to add
 | key features of its Irix operating system to the Linux platform.
 | Currently, Irix runs on the MIPS platform. Once SGI switches
 | entirely to Intel Corp.'s IA/64 platform, that will be the end of
 | Irix.
 |
 | SGI is also forming an alliance with NEC Corp. to increase its
 | market share in Japan.
 
 These paragraphs are contradictory.  It implies an end to MIPS.

Contradictory how? NEC's a big PC manufacturer in Japan. If SGI
is moving toward more conventional off-the-shelf components,
they stand to gain tremendously by an alliance, both from
manufacturing and distribution standpoints.



 Nintendo 64 uses MIPS.
 
 It also seems a bit overzealous.

So do the old and new Playstation models. The MIPS core is being
manufactured by several companies: IDT alone has something like
a dozen variants available with and without MMU, FP, 5000 vs
1 core, etc. and is in far wider use than in just PCs and
gaming consoles. I doubt if SGI machines abandoning MIPS
processors would put much of a dent in MIPS' profitability.







Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Ronald G. Minnich


I lost track of the quotes. 

  | --- With the help of Veritas Software Corp., SGI will work to add
  | key features of its Irix operating system to the Linux platform.
  | Currently, Irix runs on the MIPS platform. Once SGI switches
  | entirely to Intel Corp.'s IA/64 platform, that will be the end of
  | Irix. 
  |
  | SGI is also forming an alliance with NEC Corp. to increase its
  | market share in Japan.
  These paragraphs are contradictory.  It implies an end to MIPS.

An end to Irix and an end to MIPS on desktop and server platforms has no
big effect on MIPS processors. The big volume for MIPS is embedded, or so
I am told.

For an interesting take on all this visit www.mipsabi.org

ron







Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Terry Lambert

 I lost track of the quotes. 
 
   | --- With the help of Veritas Software Corp., SGI will work to add
   | key features of its Irix operating system to the Linux platform.
   | Currently, Irix runs on the MIPS platform. Once SGI switches
   | entirely to Intel Corp.'s IA/64 platform, that will be the end of
   | Irix. 
   |
   | SGI is also forming an alliance with NEC Corp. to increase its
   | market share in Japan.
   These paragraphs are contradictory.  It implies an end to MIPS.
 
 an end to irix and an end to MIPS on desktop and server platforms has no
 big effect on MIPS processors. The big volume for MIPS is embedded, or so
 I am told. 
 
 For an interesting take on all this visit www.mipsabi.org

Uh, that site is dead, as of the end of this month.  See the
first link ("announcement").


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Terry Lambert

  These paragraphs are contradictory.  It implies an end to MIPS.
  
  Nintendo 64 uses MIPS.
  
  It also seems a bit overzealous.
 
 No argument here.  Perhaps they're just trying to float a few trial 
 baloons in hopes of finding something popular/feasable. 

That was my take on things, since no source was attributed.

Either that, or press zealotry.


Terry Lambert
[EMAIL PROTECTED]
---
Any opinions in this posting are my own and not those of my present
or previous employers.





Re: BSD XFS Port BSD VFS Rewrite

1999-08-16 Thread Narvi


On Mon, 16 Aug 1999, Terry Lambert wrote:

    On Fri, 13 Aug 1999, Terry Lambert wrote:
    
     Has anyone mentioned to them that they will be unable to incorporate
     changes made to the GPL'ed version of XFS back into the IRIX version
     of XFS, without IRIX becoming GPL'ed?
    
    Given that they say they're dropping IRIX and going with Linux, I don't
    think it'll be a problem.
   
   Can you please site a reference for this, other than wishful
   thinking by the Linux camp?
  
  Here's one:
  http://www.zdnet.com/pcweek/stories/news/0,4153,1015908,00.html
  
  But just about every trade rag covered it.
 
 
 Begging your pardon, but:
 
 
 | --- With the help of Veritas Software Corp., SGI will work to add
 | key features of its Irix operating system to the Linux platform.
 | Currently, Irix runs on the MIPS platform. Once SGI switches
 | entirely to Intel Corp.'s IA/64 platform, that will be the end of
 | Irix. 
 |

Why would a switch to IA/64 mean the end of IRIX? SGI has long planned to
switch to IA/64, but with IRIX. If SGI wants to continue building Origins,
especially the high CPU count ones, IRIX is to stay for a long time.

 | SGI is also forming an alliance with NEC Corp. to increase its
 | market share in Japan.
 
 These paragraphs are contradictory.  It implies an end to MIPS.
 

An end to high-end MIPS may come ... if Merced, etc. perform well enough.
As this is a topic beaten to death on comp.arch, everybody interested
should look there.

 Nintendo 64 uses MIPS.
 

Which doesn't matter all that much. MIPS CPUs for Nintendo could be made
by, say, MIPS, not SGI (and SGI sold/is trying to sell MIPS).

 It also seems a bit overzealous.
 

You bet. It seems it is too hard for the journalists to actually read the
press releases.

 
   Terry Lambert
   [EMAIL PROTECTED]
 ---
 Any opinions in this posting are my own and not those of my present
 or previous employers.
 

Sander

There is no love, no good, no happiness and no future -
all these are just illusions.






