Re: Anonymous vnodes?

2023-06-28 Thread Taylor R Campbell
On discussion with chs@, I realize that:

(a) you could just use uao_create(INT64_MAX - PAGE_SIZE), like tmpfs
does, and not bother with defining a uao_resize (just .pgo_put the
pages when you truncate and update the length under the lock);

but

(b) this is kind of duplicating the existing functionality of tmpfs,
and it's a shame to have to do that.

That said, I suspect it'll be easier to:

1. add .fo_truncate (which I think is reasonable to do anyway), and
2. duplicate the tmpfs functionality anyway just for memfd_create,

than to go through the effort of finding a way to incorporate the
Linux-style O_TMPFILE and linkat(..., AT_EMPTY_PATH) into NetBSD VFS
(although it might be worthwhile to do that in the long run).

It would also be technically possible to use vn_open("/var/shm/...")
with a randomly generated pathname and then unlink it immediately, but
that would likely introduce an exploitable race unless there's a
clever way I haven't thought of to avoid the race.
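
For illustration only, here is the create-then-unlink pattern as seen from userland. It is a minimal sketch, not the in-kernel vn_open path: anon_file and anon_file_nlink are hypothetical helper names, and mkstemp(3) stands in for "a randomly generated pathname". The window between creation and unlink is where the name is visible to others.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file under dir, unlink it immediately, and return the open
 * descriptor (or -1 on failure).  anon_file is a hypothetical helper.
 */
int
anon_file(const char *dir)
{
	char path[1024];
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/anonXXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = mkstemp(path);	/* creates a uniquely named file exclusively */
	if (fd == -1)
		return -1;

	/* Window: until unlink() returns, the file is reachable by name. */
	if (unlink(path) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Link count of an open descriptor, or -1 on error. */
long
anon_file_nlink(int fd)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return -1;
	return (long)st.st_nlink;
}
```

After anon_file() succeeds, the descriptor stays usable while the file has no remaining name, so its storage goes away on the last close.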


Re: Anonymous vnodes?

2023-06-27 Thread Taylor R Campbell
> Date: Tue, 27 Jun 2023 16:27:34 -0400
> From: Theodore Preduta 
> 
> On 2023-06-26 20:03, Taylor R Campbell wrote:
> > For a syscall, you should implement it in terms of uvm anonymous
> > objects:
> 
> Is there a preexisting way to resize a uvm_object?  Or do I need to
> write a function similar (but not really that similar) to uvm_vnp_setsize?

Hmm, yes, it looks like we will need a uao_setsize operation.

Here's a quick hack:

1. New flag UAO_FLAG_RESIZABLE.  In uao_create and others, always use
   a max-size u_swhash table when this flag is set, so we don't have
   to worry about resizing u_swslots array, rehashing u_swhash, or
   converting between array and hash table.  Not ideal but it'll do
   without much effort.

2. Define uao_resize to:
   (a) assert UAO_FLAG_RESIZABLE is set;
   (b) if truncating: uao_put the parts that will be discarded, from
   round_page(newsize) to oldsize, with PGO_FREE; and
   (c) update u_pages,
   all under a write-lock on uobj->vmobjlock.
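
The arithmetic in steps (a) through (c) can be sketched as a small userland model. This is not kernel code: struct model_aobj, model_round_page, model_resize and the 4096-byte page size are all assumptions made for illustration, and the model only records which byte range would be handed to the put operation instead of freeing any pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE UINT64_C(4096)	/* assumed page size for the model */

struct model_aobj {
	bool     resizable;	/* stands in for UAO_FLAG_RESIZABLE */
	uint64_t size;		/* current length in bytes */
	uint64_t npages;	/* stands in for u_pages */
	uint64_t freed_start;	/* range discarded by the last resize */
	uint64_t freed_end;	/* (both zero if nothing was discarded) */
};

/* Round x up to the next multiple of the model page size. */
uint64_t
model_round_page(uint64_t x)
{
	return (x + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

void
model_resize(struct model_aobj *o, uint64_t newsize)
{
	/* The real operation would hold uobj->vmobjlock for writing here. */
	assert(o->resizable);				/* step (a) */

	o->freed_start = o->freed_end = 0;
	if (newsize < o->size) {			/* step (b) */
		uint64_t start = model_round_page(newsize);

		/* Only whole pages past the new end are discarded. */
		if (start < o->size) {
			o->freed_start = start;
			o->freed_end = o->size;
		}
	}

	o->size = newsize;				/* step (c) */
	o->npages = model_round_page(newsize) / MODEL_PAGE_SIZE;
}
```

Shrinking within the last page discards nothing in this model, because round_page(newsize) is already at or past the old size.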

Then .fo_write and .fo_truncate can just do uao_resize -- or at
least, .fo_truncate could do that if it existed!  Currently
sys_ftruncate is vnode-specific, so we'll need to introduce a new
.fo_truncate fileops member like we've done for various other things
recently, such as .fo_seek and .fo_advlock:

https://mail-index.netbsd.org/source-changes-hg/2023/04/22/msg364655.html
https://mail-index.netbsd.org/source-changes-hg/2023/04/22/msg364656.html
https://mail-index.netbsd.org/source-changes-hg/2023/04/22/msg364657.html


P.S.  I wonder whether this procedure is racy -- uao_put can drop the
lock; what happens if a concurrent uao_get/put swoops in while the
lock is dropped?  But uvm_vnp_setsize must have the same issue, since
uvn_put can drop the lock too.  So if there is a problem here, it'll
probably be easier if the logic is essentially the same, so that it
can be fixed in both of them the same way.


Re: Anonymous vnodes?

2023-06-27 Thread Theodore Preduta
On 2023-06-26 20:03, Taylor R Campbell wrote:
>> Date: Mon, 26 Jun 2023 18:13:17 -0400
>> From: Theodore Preduta 
>>
>> Is it possible to create a vnode for a regular file in a file system
>> without linking the vnode to any directory, so that it disappears when
>> all open file descriptors to it are closed?  (As far as I can tell, this
>> isn't possible with any of the vn_* or VOP_* functions?)
>>
>> If this idea is indeed not possible, should/could I implement something
>> like this?  (If so, how?)
>>
>> For context, I'm currently working on implementing memfd_create(2), and
>> thought this might be a shortcut.  Otherwise, I'll have to implement it
>> in terms of uvm operations (which is fine, just more work).
> 
> For a syscall, you should implement it in terms of uvm anonymous
> objects:

Is there a preexisting way to resize a uvm_object?  Or do I need to
write a function similar (but not really that similar) to uvm_vnp_setsize?

Theo(dore)



Re: Anonymous vnodes?

2023-06-27 Thread Joerg Sonnenberger
On Mon, Jun 26, 2023 at 06:13:17PM -0400, Theodore Preduta wrote:
> Is it possible to create a vnode for a regular file in a file system
> without linking the vnode to any directory, so that it disappears when
> all open file descriptors to it are closed?  (As far as I can tell, this
> isn't possible with any of the vn_* or VOP_* functions?)

Linux has O_TMPFILE for this, but we don't support this extension so
far.

Joerg


Re: Anonymous vnodes?

2023-06-27 Thread Christoph Badura
On Tue, Jun 27, 2023 at 11:30:24AM -0400, Mouse wrote:
> It's a normal state to be in.  But, as I read it, the post was asking
> for a way to reach that state _without_ passing through a "has a name
> in some directory" state; it's not clear to me whether that's possible
> in general (ie, without doing something filesystem-specific).

In general that isn't possible.  Being able to create a directory entry is
a sort of access control.  It does prevent users without that ability from
allocating, possibly all remaining, space in the file system.

--chris


Re: Anonymous vnodes?

2023-06-27 Thread Martin Husemann
On Tue, Jun 27, 2023 at 05:20:50PM +0200, Reinoud Zandijk wrote:
> That's completely normal. If a file is created in a file system and it's
> unlinked, it's effectively in this state.

While that is true, a vnode is an internal representation of some entity
in some file system. What you really want here is unrelated to any file
system, so I think Taylor's suggestion makes a lot of sense.

Martin


Re: Anonymous vnodes?

2023-06-27 Thread Mouse
>> Is it possible to create a vnode for a regular file in a file system
>> without linking the vnode to any directory, so that it disappears
>> when all open file descriptors to it are closed?  (As far as I can
>> tell, this isn't possible with any of the vn_* or VOP_* functions?)
> That's completely normal.

It's a normal state to be in.  But, as I read it, the post was asking
for a way to reach that state _without_ passing through a "has a name
in some directory" state; it's not clear to me whether that's possible
in general (ie, without doing something filesystem-specific).

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Anonymous vnodes?

2023-06-27 Thread Reinoud Zandijk
Hi,

On Mon, Jun 26, 2023 at 06:13:17PM -0400, Theodore Preduta wrote:
> Is it possible to create a vnode for a regular file in a file system without
> linking the vnode to any directory, so that it disappears when all open file
> descriptors to it are closed?  (As far as I can tell, this isn't possible
> with any of the vn_* or VOP_* functions?)

That's completely normal. If a file is created in a file system and it's
unlinked, it's effectively in this state. If you want a code reference, look at
the system file handling of the UDF file system. A system file is loaded with
udf_get_node() (sys/fs/udf/udf_subr.c), which looks up the vnode and, if it's
not in the cache, calls udf_loadvnode() (through vfs_loadvnode), which loads a
newly created vnode representing a file that is not linked to any directory.
It's up to the VFS_NEWVNODE() implementation to accept a NULL dvp or not.

As long as the vnode is referenced (see UDF_SET_SYSTEMFILE()), it is kept
alive. When the last reference is dropped, the vnode is recycled and disappears.

With regards,
Reinoud



Re: Anonymous vnodes?

2023-06-26 Thread Taylor R Campbell
> Date: Mon, 26 Jun 2023 18:13:17 -0400
> From: Theodore Preduta 
> 
> Is it possible to create a vnode for a regular file in a file system
> without linking the vnode to any directory, so that it disappears when
> all open file descriptors to it are closed?  (As far as I can tell, this
> isn't possible with any of the vn_* or VOP_* functions?)
> 
> If this idea is indeed not possible, should/could I implement something
> like this?  (If so, how?)
> 
> For context, I'm currently working on implementing memfd_create(2), and
> thought this might be a shortcut.  Otherwise, I'll have to implement it
> in terms of uvm operations (which is fine, just more work).

For a syscall, you should implement it in terms of uvm anonymous
objects:

- memfd_create: fd_allocfile, uao_create
- .fo_close: uao_detach
- .fo_read/write: ubc_uiomove
  (tricky bit: if the offset pointer is &fp->f_offset, you must
  acquire and release fp->f_lock around it)
- .fo_mmap: just take a reference and return the object

Should be easy, and similar to what kern_ksyms.c already does.  No
need for vnodes to be involved at all.
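
The "tricky bit" about the offset pointer can be illustrated with a userland model. This is a sketch under assumptions, not kernel code: it reads the parenthetical as referring to the case where the read/write routine is handed a pointer to the file's own shared offset, and struct model_file, model_read and the lock counter are hypothetical names that only mark where fp->f_lock would be taken and released.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct model_file {
	off_t  f_offset;	/* shared file offset, like fp->f_offset */
	int    lock_count;	/* counts would-be f_lock acquisitions */
	char   data[64];	/* stands in for the anonymous object's pages */
	size_t size;		/* number of valid bytes in data */
};

/* Copy up to len bytes starting at *offset; returns bytes copied. */
size_t
model_read(struct model_file *fp, off_t *offset, void *buf, size_t len)
{
	int shared = (offset == &fp->f_offset);
	size_t n = 0;

	if (shared)
		fp->lock_count++;	/* kernel: acquire fp->f_lock */

	if (*offset >= 0 && (size_t)*offset < fp->size) {
		n = fp->size - (size_t)*offset;
		if (n > len)
			n = len;
		memcpy(buf, fp->data + *offset, n);
		*offset += (off_t)n;	/* advances whichever offset was passed */
	}

	/* kernel: release fp->f_lock here when shared */
	return n;
}
```

A pread-style caller passes the address of a local offset, so the shared offset is untouched and no lock is needed; a read-style caller passes &fp->f_offset, which is the case that needs the lock.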


Re: Anonymous vnodes?

2023-06-26 Thread RVP

On Mon, 26 Jun 2023, Theodore Preduta wrote:
> Is it possible to create a vnode for a regular file in a file system
> without linking the vnode to any directory, so that it disappears when
> all open file descriptors to it are closed?  (As far as I can tell, this
> isn't possible with any of the vn_* or VOP_* functions?)
>
> If this idea is indeed not possible, should/could I implement something
> like this?  (If so, how?)

You could extend what shm_open() currently does on NetBSD: create a
unique temp. file in /var/shm; immediately unlink it, return the fd.

-RVP



Re: Anonymous vnodes?

2023-06-26 Thread Jason Thorpe


> On Jun 26, 2023, at 3:13 PM, Theodore Preduta  wrote:
> 
> Is it possible to create a vnode for a regular file in a file system
> without linking the vnode to any directory, so that it disappears when
> all open file descriptors to it are closed?  (As far as I can tell, this
> isn't possible with any of the vn_* or VOP_* functions?)

There isn't a general way to do this.

> For context, I'm currently working on implementing memfd_create(2), and
> thought this might be a shortcut.  Otherwise, I'll have to implement it
> in terms of uvm operations (which is fine, just more work).

Seems like these objects should be implemented above the file system ... just 
create a new descriptor type and interface directly with UVM.

-- thorpej



Anonymous vnodes?

2023-06-26 Thread Theodore Preduta
Is it possible to create a vnode for a regular file in a file system
without linking the vnode to any directory, so that it disappears when
all open file descriptors to it are closed?  (As far as I can tell, this
isn't possible with any of the vn_* or VOP_* functions?)

If this idea is indeed not possible, should/could I implement something
like this?  (If so, how?)

For context, I'm currently working on implementing memfd_create(2), and
thought this might be a shortcut.  Otherwise, I'll have to implement it
in terms of uvm operations (which is fine, just more work).

Thanks,

Theo(dore)