from:"Christos Zoulas"

Re: GSoC 2024 : Emulating missing system calls

2024-03-13 Thread Christos Zoulas

In article ,
Arnav Gupta   wrote:
>-=-=-=-=-=-
>
>follow up
>
>On Fri, Mar 1, 2024, 02:05 Arnav Gupta  wrote:
>
>> Hey,
>>
>> I'm a third year undergraduate student at BITS Pilani. I work with async
>> io (mainly io uring) calls in my lab and am generally interested in
>> systems topics. I found the topic "Emulating missing system calls"
>> interesting while going through the list of potential projects for the
>> year 2024.
>>
>> If preferable, I would like to join the communication channel used for the
>> same.
>>
>> Looking forward to learning from everyone in the community and making some
>> meaningful contributions along the way. Any guidance or roadmaps would be
>> greatly appreciated while I spend some time with the code-base on my end.
>>
>> Thanks,
>> Arnav Gupta

Hi Arnav,

As part of GSoC last summer, Theodore did a lot of work adding
missing compat-linux system calls. I am not sure if there are many
popular ones missing. The way to find out is to install the latest
NetBSD version (10 or current) and then install a very recent compat
linux package (from pkgsrc, perhaps even getting a newer one to
work that is there already) and see how many binaries work, and
from the ones that fail, which system calls they need.

Best,

christos

Re: panic on mfii(4) vd removal

2023-09-23 Thread Christos Zoulas

In article ,
Edgar Fuß   wrote:
>-=-=-=-=-=-
>
>> I get a panic if I remove a virtual disk from an mfii(4) device.
>That's another blunder in mfii(4). Patch (including the last) attached.

Committed, thanks!

christos

Re: Adding a virtual disk to mpii(4)

2023-09-22 Thread Christos Zoulas

In article ,
Edgar Fuß   wrote:
>-=-=-=-=-=-
>
>After adding a virtual disk to an mfii(4) device (racadm createvirtualdisk, 
>in my case), you get a nice
>   mfii0: logical drive 2 added (target 2)
>message, but
>   scsictl scsibus0 scan 2 0
>doesn't find any drive.
>
>That's because sc_ld[i].ld_present is still unset from mfii_attach() and so 
>mfii_scsipi_request() will return early (with xs->error = XS_SELTIMEOUT).
>
>The attached fix seems pretty dammn obvious and appears to work.
>
>Shall I file a PR?

Committed, thanks!

christos

Re: [PATCH] Support INCOMPAT_64BIT on ext4

2023-08-28 Thread Christos Zoulas

I noticed that the two cgload calls are different:

  e2fs_cgload(bp->b_data,
>e2fs_gd[i * fs->e2fs_bsize / sizeof(struct ext2_gd)],
fs->e2fs_bsize, 1 << fs->e2fs_group_desc_shift);
  ^
int32_t sh = m_fs->e2fs_bsize >> m_fs->e2fs_group_desc_shift;
e2fs_cgload(bp->b_data, _fs->e2fs_gd[i * sh],
^
m_fs->e2fs_bsize, m_fs->e2fs_group_desc_shift);

e2fs_cgsave(>e2fs_gd[
i * fs->e2fs_bsize / sizeof(struct ext2_gd)],
bp->b_data, fs->e2fs_bsize, fs->e2fs_group_desc_shift);

shouldn't we consistently index as:
>e2fs_gd[i * fs->e2fs_bsize / sizeof(struct ext2_gd)]
and pass the shift as:
fs->e2fs_group_desc_shift

???

Thanks,

christos

> On Aug 26, 2023, at 7:43 PM, Vladimir 'phcoder' Serbinenko 
>  wrote:
> 
> 
> 
> 
> Le dim. 27 août 2023, 00:48, Christos Zoulas  <mailto:chris...@zoulas.com>> a écrit :
> 
>> I took care of it. One I think was mine, one was yours  :-)
>> 
> 
> 
> Thank you a lot. I was diagnosing it but you were faster. I'll see what I can 
> do for userspace to remove old cgload/cgsave
> 
>> 
>> christos
>> 
>> 
>> 
>>> On Aug 26, 2023, at 8:52 AM, Christos Zoulas >> <mailto:chris...@zoulas.com>> wrote:
>>> 
>>> 
>>> It could also be my fault because I refactored the code a bit. One of the 
>>> things that I changed
>>> 
>>> that looked like a bug to me was adding a cast to e2fs_cgload():
>>> 
>>> memset((char *)optr + 32, 0, sizeof(*optr) - 32);
>>> 
>>> 
>>> 
>>> since optr is struct ext2_gd *...
>>> 
>>> 
>>> 
>>> christos
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Aug 26, 2023, at 7:46 AM, Vladimir 'phcoder' Serbinenko 
>>>> mailto:phco...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> I'll have a look today or tomorrow
>>>> 
>>>> 
>>>> Le sam. 26 août 2023, 12:06, Taylor R Campbell 
>>>> >>> <mailto:campbell%2bnetbsd-tech-k...@mumble.net>> a écrit :
>>>> 
>>>>> > Date: Wed, 23 Aug 2023 04:54:34 +0200
>>>>> > From: "Vladimir 'phcoder' Serbinenko" >>>> > <mailto:phco...@gmail.com>>
>>>>> >
>>>>> > This patch adds support for incompat_64bit on ext4 filesystem. This 
>>>>> > feature
>>>>> > is enabled by default on new filesystems on Ubuntu and probably other
>>>>> > distros
>>>>> 
>>>>> Cool, thanks!  christos@ committed this.
>>>>> 
>>>>> It looks like it may have some issues, though -- a lot of ext2fs tests
>>>>> are failing now.  Can you please take a look at the failures?
>>>>> 
>>>>> https://releng.netbsd.org/b5reports/i386/commits-2023.08.html#build-2023.08.26.05.47.53
>>>>> 
>>>>>Termination reason
>>>>> 
>>>>>FAILED: create file: No space left on device
>>>>> 
>>>>>Standard output stream
>>>>> 
>>>>>[   1.000] entropy: ready
>>>>>[   1.0200050] uid 0 on /mnt: out of inodes
>>>>> 
>>>>> FYI, if you're running a new enough kernel already, you can run the
>>>>> tests yourself by doing a distribution build and then doing
>>>>> 
>>>>> # chroot /path/to/objdir/destdir.amd64
>>>>> (chroot)# (cd /dev && sh MAKEDEV all)
>>>>> (chroot)# mount -t ptyfs ptyfs /dev/pts
>>>>> (chroot)# mount -t tmpfs tmpfs /tmp
>>>>> (chroot)# cd /usr/tests/fs/vfs
>>>>> (chroot)# atf-run | atf-report
>>>>> 
>>>>> You can also do build.sh release, install into a VM, and run the tests
>>>>> the same way -- cd /usr/tests/fs/vfs && atf-run | atf-report.
>>>>> 
>>>>> This also broke the build of bootloaders and the newfs_ext2fs userland
>>>>> tool.  I put in a stop-gap measure to restore the old definitions of
>>>>> e2fs_cgload/cgsave outside the kernel, just to unbreak the build, but
>>>>> it might be appropriate to make the new definitions available to
>>>>> bootloaders and newfs_ext2fs too.
>>>>> 
>>>>> (I haven't looked into technical details to see whether it is
>>>>> appropriate.  Probably newfs_ext2s should avoid creating file systems
>>>>> this new feature for a while, but maybe have an option to do it so we
>>>>> can exercise the code paths in tests.)
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 



signature.asc
Description: Message signed with OpenPGP

Re: [PATCH] Support INCOMPAT_64BIT on ext4

2023-08-26 Thread Christos Zoulas

I took care of it. One I think was mine, one was yours  :-)

christos

> On Aug 26, 2023, at 8:52 AM, Christos Zoulas  wrote:
> 
> It could also be my fault because I refactored the code a bit. One of the 
> things that I changed
> that looked like a bug to me was adding a cast to e2fs_cgload():
>   memset((char *)optr + 32, 0, sizeof(*optr) - 32);
> 
> since optr is struct ext2_gd *...
> 
> christos
> 
> 
>> On Aug 26, 2023, at 7:46 AM, Vladimir 'phcoder' Serbinenko 
>>  wrote:
>> 
>> I'll have a look today or tomorrow
>> 
>> 
>> Le sam. 26 août 2023, 12:06, Taylor R Campbell 
>> > <mailto:campbell%2bnetbsd-tech-k...@mumble.net>> a écrit :
>> 
>>> > Date: Wed, 23 Aug 2023 04:54:34 +0200
>>> > From: "Vladimir 'phcoder' Serbinenko" >> > <mailto:phco...@gmail.com>>
>>> >
>>> > This patch adds support for incompat_64bit on ext4 filesystem. This 
>>> > feature
>>> > is enabled by default on new filesystems on Ubuntu and probably other
>>> > distros
>>> 
>>> Cool, thanks!  christos@ committed this.
>>> 
>>> It looks like it may have some issues, though -- a lot of ext2fs tests
>>> are failing now.  Can you please take a look at the failures?
>>> 
>>> https://releng.netbsd.org/b5reports/i386/commits-2023.08.html#build-2023.08.26.05.47.53
>>> 
>>>Termination reason
>>> 
>>>FAILED: create file: No space left on device
>>> 
>>>Standard output stream
>>> 
>>>[   1.000] entropy: ready
>>>[   1.0200050] uid 0 on /mnt: out of inodes
>>> 
>>> FYI, if you're running a new enough kernel already, you can run the
>>> tests yourself by doing a distribution build and then doing
>>> 
>>> # chroot /path/to/objdir/destdir.amd64
>>> (chroot)# (cd /dev && sh MAKEDEV all)
>>> (chroot)# mount -t ptyfs ptyfs /dev/pts
>>> (chroot)# mount -t tmpfs tmpfs /tmp
>>> (chroot)# cd /usr/tests/fs/vfs
>>> (chroot)# atf-run | atf-report
>>> 
>>> You can also do build.sh release, install into a VM, and run the tests
>>> the same way -- cd /usr/tests/fs/vfs && atf-run | atf-report.
>>> 
>>> This also broke the build of bootloaders and the newfs_ext2fs userland
>>> tool.  I put in a stop-gap measure to restore the old definitions of
>>> e2fs_cgload/cgsave outside the kernel, just to unbreak the build, but
>>> it might be appropriate to make the new definitions available to
>>> bootloaders and newfs_ext2fs too.
>>> 
>>> (I haven't looked into technical details to see whether it is
>>> appropriate.  Probably newfs_ext2s should avoid creating file systems
>>> this new feature for a while, but maybe have an option to do it so we
>>> can exercise the code paths in tests.)
>> 
>> 
> 



signature.asc
Description: Message signed with OpenPGP

Re: [PATCH] Fix for PS2 on Chromebook

2023-07-16 Thread Christos Zoulas

In article ,
Vladimir 'phcoder' Serbinenko  wrote:
>-=-=-=-=-=-
>-=-=-=-=-=-
>
>On at least some Chromebooks PS/2 reset command generates no response and
>we end up reading garbage and disabling keyboard support altogether even
>though that controller otherwise works fine. Linux issues reset but ignores
>the reply and relies on getid instead. So does Haiku. FreeBSD assumes that
>all coreboot systems have PS/2 which is wrong as it supports some MacBooks
>as well. No idea what windows does but keyboard works there.

committed, thanks!

christos

Re: Per-descriptor state

2023-05-03 Thread Christos Zoulas

In article <21465.1682869...@jacaranda.noi.kre.to>,
Robert Elz   wrote:
>Date:Sun, 30 Apr 2023 05:25:41 +
>From:David Holland 
>Message-ID:  
>
>  | Close-on-fork is apparently either coming or already here, not sure
>  | which, but it's also per-descriptor.
>
>We don't have it, but it will be in Posix-8.   Largely inspired by the
>needs of threaded programs (without lots of critical sections, one cannot
>otherwise open anything if another thread might fork, there's no
>way to avoid race conditions, hence O_CLOFORK on open ... not sure if
>anyone has thought of a way to add it to socket() - that doesn't look
>to be trivial, though it might be possible to abuse one of the params
>it has - probably domain - and add flags in upper bits ... while having
>it able to be set/reset via fcntl is useful, to work, it needs to be
>able to be set atomically with the operation that creates the fd, and
>having it default "on", which would work, would break almost all existing
>non-trivial code).

We already abuse "type":

 The following flags can be or'ed to the type to condition the returned
 file descriptor: The following flags are valid:

   SOCK_CLOEXEC Set the close on exec property.
   SOCK_NONBLOCK Sets non-blocking I/O.
   SOCK_NOSIGPIPE Return EPIPE instead of raising SIGPIPE.

christos

Re: Off by ones in wsemul_vt100_subr.c

2023-01-09 Thread Christos Zoulas

In article ,
Crystal Kolipe   wrote:
>The attached patch fixes a couple of off-by-ones in the wscons code.
>
>WSEMUL_VT_ID1 and WSEMUL_VT_ID2 are string constants, and sizeof returns the
>length of the string _plus_ the 0x00 terminator.  This size is then supplied
>to wsdisplay_emulinput, which expects to be passed just the number of
>printable characters.
>
>--- sys/dev/wscons/wsemul_vt100_subr.c.dist2018-12-05 22:42:20.0 
>-0300
>+++ sys/dev/wscons/wsemul_vt100_subr.c 2023-01-08 19:25:52.747978285 -0300
>@@ -195,7 +195,7 @@
>   switch (A3(edp->modif1, edp->modif2, c)) {
>   case A3('>', '\0', 'c'): /* DA secondary */
>   wsdisplay_emulinput(edp->cbcookie, WSEMUL_VT_ID2,
>-  sizeof(WSEMUL_VT_ID2));
>+  sizeof(WSEMUL_VT_ID2)-1);
>   break;
> 
>   case A3('\0', '\0', 'J'): /* ED selective erase in display */
>@@ -452,7 +452,7 @@
>   case 'c': /* DA primary */
>   if (ARG(edp, 0) == 0)
>   wsdisplay_emulinput(edp->cbcookie, WSEMUL_VT_ID1,
>-  sizeof(WSEMUL_VT_ID1));
>+  sizeof(WSEMUL_VT_ID1)-1);
>   break;
>   case 'g': /* TBC */
>   KASSERT(edp->tabs != 0);
>

Committed, thanks!

christos

Re: Limiting malloc to the low 2GB?

2022-11-28 Thread Christos Zoulas

In article ,
Valery Ushakov   wrote:
>Do we have a way to tell malloc on a 32-bit system to allocate memory
>only below the 2GB boundary (on i386, including when run under amd64)?
>I'm trying to port a(n old) program that wants to use the sign bit for
>its internal purposes.  I guess one option would be to prevent malloc
>from using mmap (and disable alsr?) so that only sbrk (in the low 2GB)
>is used.
>
>Suggestions are appreciated.

Perhaps this is also affected by uvm TOPDOWN. I think that this might
need kernel support...

christos

Re: Restructuring inpcb/in6pcb

2022-10-25 Thread Christos Zoulas

In article ,
Ryota Ozaki   wrote:
>Hi,
>
>Data structures of network protocol control blocks (PCBs)
>(struct inpcb, in6pcb and inpcb_hdr) are not organized well.
>Users of the data structures have to handle them separately
>and thus the code becomes duplicated and cluttered.
>
>The proposal restructures the data structures, which eliminates
>duplicate and cluttered code and makes further changes easy.
>
>A typical cleanup by the change looks like this:
>
>-   if (tp->t_inpcb)
>-   so = tp->t_inpcb->inp_socket;
>-#ifdef INET6
>-   else if (tp->t_in6pcb)
>-   so = tp->t_in6pcb->in6p_socket;
>-#endif
>+   so = tp->t_inpcb->inp_socket;
>
>Also some duplicated functions for IPv4 and IPv6 are integrated:
>tcp6_notify, tcp6_quench, etc.
>
>The change consists of two parts.  The first half of the series of
>commits tries to integrate all the data structures into one structure
>(struct inpcb).  The commits cleans up ugly code like above.
>However, because the structure includes both IPv4 and IPv6 stuff,
>the data size for IPv4 increases by 40 bytes (from 248 to 288).
>
>The second half of the series of commits addresses the increasement
>of the data size by separating the data structure again while keeping
>the code simple.  By the change, struct inpcb has only common data
>and newly introduced data structures, struct in4pcb and struct in6pcb
>inherited from struct inpcb, have dedicated data for each protocol.
>Even after the separation, users don't need to recognize the separation
>and just need to use some macros to access dedicated data.
>For example, inp->inp_laddr, which represents the local IPv4 address,
>is now accessed through in4p_laddr(inp).  Using these macros adds
>some code complexity, but this is a trade-off between data size and
>code complexity.
>
>The diffs are here:
>- https://www.netbsd.org/~ozaki-r/restructure-inpcb-1.patch
>- https://www.netbsd.org/~ozaki-r/restructure-inpcb-2.patch
>
>Also, you can see individual commits at github:
>  https://github.com/ozaki-r/netbsd-src/commits/restructure-inpcb
>
>
>We can adopt either of the whole changes or only changes of the first half.
>Which should we choose? Smaller data size or simpler code?
>
>By the way, I think the changes should be committed (if permitted)
>after branching netbsd-10.  When will the netbsd-10 branch come?
>If it doesn't come soon, is it ok to commit the changes before branching?

Thanks! I wanted to do this for a long time... I don't see why it should wait
for the branch...

Best,

christos

Re: Open master pty (/dev/ptmx) non blocking

2022-09-24 Thread Christos Zoulas

In article <25389.33419.131713.821...@gargle.gargle.howl>,
Anthony Mallet   wrote:
>-=-=-=-=-=-
>
>Hi,
>
>I have a piece of software that opens a master pty non-blocking:
>fd = open("/dev/ptmx", O_RDWR | O_NOCTTY | O_NONBLOCK);
>
>The intent is to make further read(2) on the master non blocking. But
>the O_NONBLOCK flag seems to be ignored. Attached is a minimal sample
>C program showing the issue.
>
>Several remarks:
>* open(2) manual does not mention a master pty as special regarding the
>  O_NONBLOCK flag, and even says "this flag also has the effect of making
>  all subsequent I/O on the open file non-blocking",
>* explicitly setting the file descriptor as non-blocking with fcntl(2)
>  works fine,
>* a slave pty has no such issue (O_NONBLOCK in open(2) is honoured),
>* FWIW, linux has no surprise here, the flag behaves as one would expect,
>* POSIX does not mention the flag as supported in posix_openpt(3) (it
>  does not says it's not supported either :).
>
>So, I'm not sure if something should be changed here, and if someone
>is willing to check that? I tried to track down where in the kernel
>this happens, maybe in pty_alloc_master in kern/tty_ptm.c? I really
>don't master those kernel aspects so I'm not confident in providing a
>patch.
>
>I guess the principle of least surprise would say that the flag should
>be supported (e.g. for maximum portability). OTOH, if the current
>behaviour is deemed correct or sufficient, maybe at least some manual
>(posix_openpt, open, ...) should mention this is not supported?
>
>Best,
>Anthony

Fixed in HEAD.

christos

Re: 9.99.100 fallout: file(1)

2022-09-22 Thread Christos Zoulas

In article ,
Michael van Elst  wrote:
>campbell+netbsd-tech-k...@mumble.net (Taylor R Campbell) writes:
>
>>We appear to have revived the old alphanumeric versioning scheme,
>>according to file(1)!  Someone needs to teach file(1) that this is
>>9.99.100, not 9.99A(.0).
>
>Index: external/bsd/file/dist/src/readelf.c
>===
>RCS file: /cvsroot/src/external/bsd/file/dist/src/readelf.c,v
>retrieving revision 1.25
>diff -p -u -r1.25 readelf.c
>--- external/bsd/file/dist/src/readelf.c   9 Apr 2021 19:11:42 -   
>1.25
>+++ external/bsd/file/dist/src/readelf.c   21 Sep 2022 19:32:32 -
>@@ -456,7 +456,11 @@ do_note_netbsd_version(struct magic_set 
> 
>   if (file_printf(ms, " %u.%u", ver_maj, ver_min) == -1)
>   return -1;
>-  if (ver_rel == 0 && ver_patch != 0) {
>+  if (ver_maj >= 9) {
>+  ver_patch += 100 * ver_rel;
>+  if (file_printf(ms, ".%u", ver_patch) == -1)
>+  return -1;
>+  } else if (ver_rel == 0 && ver_patch != 0) {
>   if (file_printf(ms, ".%u", ver_patch) == -1)
>   return -1;
>   } else if (ver_rel != 0) {
>
>% file /bin/ls
>/bin/ls: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV),
>dynamically linked, interpreter /libexec/ld.elf_so, for NetBSD 9.99.100,
>not stripped

Thanks, applied!

christos

Re: Emulating missing linux syscalls

2022-04-13 Thread Christos Zoulas

In article , Joerg Sonnenberger   wrote:
>Am Tue, Apr 12, 2022 at 04:56:05PM - schrieb Christos Zoulas:

>splice(2) as a concept is much older than the current Linux implementation.
>There is no reason why zero-copying for sockets should require a
>different system call for zero-copying from/to pipes. There are valid
>reasons for other combinations, too. Consider /bin/cp for example.

You don't need two system calls because the kernel knows the type of
the file descriptors and can dispatch to different implementations.
One of the questions is do you provide the means to pass an additional
header/trailer to the output data like FreeBSD does for its sendfile(2)
implementation?

int
splice(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len, 
const struct {
struct iov *head;
size_t headcnt;
struct iov *tail;
size_t tailcnt;
} *ht, int flags);

>I was saying that the Linux system call can be implemented without a
>kernel backend, because I don't consider zero copy a necessary part of
>the interface contract. It's a perfectly valid, if a bit slower
>implementation to do allocate a kernel buffer and do IO via that.

Of course, but how do you make an existing binary use it? LD_PRELOAD
a binary to override the symbol in the linux glibc? By that logic you
don't need an in kernel linux emulation, you can do it all in userland :-)

christos

Re: Emulating missing linux syscalls

2022-04-12 Thread Christos Zoulas

In article , Joerg Sonnenberger   wrote:
>Am Tue, Apr 12, 2022 at 12:29:21PM - schrieb Christos Zoulas:
>> In article
>,
>> Piyush Sachdeva   wrote:
>> >-=-=-=-=-=-
>> >
>> >Dear Stephen Borrill,
>> >My name is Piyush, and I was looking into the
>> >'Emulating missing Linux syscalls' project hoping to contribute
>> >to this year's GSoC.
>> >
>> >I wanted to be sure of a few basic things before I go ahead:
>> >- linux binaries are found in- src/sys/compat/linux
>> >- particular implementation in - src/sys/compat/linux/common
>> >- a few architecture-specific implementations in-
>> >  src/sys/compat/linux/arch/.
>> >- The src/sys/compat/linux/arch//linux_syscalls.c file
>> >   lists of system calls, and states if a particular syscall is present or
>> >not.
>> >
>> >I was planning to work on the 'sendfile()' syscall, which I believe
>> >is unimplemented for amd64 and a few other architectures as well.
>> >
>> >Considering the above points, I was hoping you could point me in
>> >the right direction for this project. Hope to hear from you soon.
>> 
>> I would look into porting the FreeBSD implementation of sendfile to NetBSD.
>
>sendfile(2) for Linux compat can be emulated in the kernel without
>backing. That said, a real splice(2) or even splicev(2) would be really
>nice to have. But that's a different project and arguable, a potentially
>more generally useful one, too.


Yes, splice is more general (as opposed to send a file to a socket), but I
think splice has limitations too (one of the fds needs to be a pipe).
Is that true only for linux?

christos

Re: Emulating missing linux syscalls

2022-04-12 Thread Christos Zoulas

In article 
,
Piyush Sachdeva   wrote:
>-=-=-=-=-=-
>
>Dear Stephen Borrill,
>My name is Piyush, and I was looking into the
>'Emulating missing Linux syscalls' project hoping to contribute
>to this year's GSoC.
>
>I wanted to be sure of a few basic things before I go ahead:
>- linux binaries are found in- src/sys/compat/linux
>- particular implementation in - src/sys/compat/linux/common
>- a few architecture-specific implementations in-
>  src/sys/compat/linux/arch/.
>- The src/sys/compat/linux/arch//linux_syscalls.c file
>   lists of system calls, and states if a particular syscall is present or
>not.
>
>I was planning to work on the 'sendfile()' syscall, which I believe
>is unimplemented for amd64 and a few other architectures as well.
>
>Considering the above points, I was hoping you could point me in
>the right direction for this project. Hope to hear from you soon.

I would look into porting the FreeBSD implementation of sendfile to NetBSD.

christos

Re: procfs files vs symlink

2022-01-14 Thread Christos Zoulas

In article <8ad9feaf-a513-d33d-c887-3ca8407c...@sdf.org>,
RVP   wrote:
>On Fri, 14 Jan 2022, Robert Elz wrote:
>
>> If your patch makes any difference to the way ls /proc/self/fd
>> works, that is just a fluke (relates to the way ls happens to
>> sequence its operations) and in no way any kind of general fix.
>>
>
>It does not, and wasn't meant to. I noticed that d_type was being
>set to VREG and attached a patch for that to my reply.
>
>> If procfs is not setting d_type correctly, that should be fixed.
>>
>
>I think this does it:

That looks nice, committed and forgot to give you credit, apologies.

christos

Re: procfs files vs symlink

2022-01-12 Thread Christos Zoulas



> On Jan 12, 2022, at 8:08 AM, Manuel Bouyer  wrote
>> 
>> Where do you get the reference to the original inode? Try it...
> 
> I don't understand. My patch doesn't change the procfs behavior for non-linux
> binaries so I can't see the problem.

Ah I did not understand that, sorry. Anyway it is fine then, although I would
like to understand first why linux binaries break... Do they do readlink(2)
unconditionally and then they don't handle the error condition?

christos



signature.asc
Description: Message signed with OpenPGP

Re: procfs files vs symlink

2022-01-12 Thread Christos Zoulas



> On Jan 12, 2022, at 7:54 AM, Manuel Bouyer  wrote:
> 
> you can still do this, as long as you're not using a linux ln(1) binary.

Where do you get the reference to the original inode? Try it...

christos


signature.asc
Description: Message signed with OpenPGP

Re: procfs files vs symlink

2022-01-12 Thread Christos Zoulas

In article ,
Manuel Bouyer   wrote:
>-=-=-=-=-=-
>
>On Fri, Jan 07, 2022 at 03:20:04PM +0100, Manuel Bouyer wrote:
>> Hello
>> I'm trying to get a linux binary to run on NetBSD, as stated in this thread
>> http://mail-index.netbsd.org/current-users/2022/01/06/msg041891.html
>> 
>> Now I hit an issue where the linux process does a readlink() on a procfs
>> file and gets EINVAL.
>> It seems that this is because, on linux all files in /proc//fd/ are
>> symlinks, while on NetBSD they are some kind of hard links.
>> E.g. on linux:
>> bip:/dsk/l1/misc/bouyer/HEAD/clean/src>ls -l /proc/$$/fd/
>> total 0
>> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 0 -> /dev/null
>> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 1 -> /dev/null
>> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 15 -> /dev/pts/11
>> lrwx-- 1 bouyer ita-iatos 64 Jan  7 14:13 16 -> /dev/pts/11
>> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 17 -> /dev/pts/11
>> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 18 -> /dev/pts/11
>> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 19 -> /dev/pts/11
>> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 2 -> /dev/null
>> 
>> On NetBSD:
>> armandeche:/local/armandeche1/bouyer>/emul/linux/bin/ls -l /proc/$$/fd/
>> total 0
>> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 15
>> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 16
>> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 17
>> 
>> Any idea on how to properly fix it ?
>
>The attached diff changes the procfs behavior to match the linux one, for
>linux processes:
>comore:/home/bouyer>ls -l /proc/self/fd/
>total 1
>crw--w  1 bouyer  tty5, 0 Jan 11 11:08 0
>crw--w  1 bouyer  tty5, 0 Jan 11 11:08 1
>crw--w  1 bouyer  tty5, 0 Jan 11 11:08 2
>lr-xr-xr-x  1 bouyer  staff   512 Jan 11 11:08 3 -> /home/bouyer
>
>ls: /proc/self/fd//4: Invalid argument
>lr-xr-xr-x  1 bouyer  staff 0 Jan 11 11:08 4
>comore:/home/bouyer>/emul/linux/bin/ls -l /proc/self/fd/
>total 0
>lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 0 -> /dev/ttyp0
>lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 1 -> /dev/ttyp0
>lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 2 -> /dev/ttyp0
>lr-xr-xr-x 1 bouyer staff 0 Jan 11 11:08 3 -> /
>
>and my linux binaries seems to work properly now
>
>would it be OK to commit ?

Err, no :-) The previous behavior uses the original inode from the filesystem.
This means that you can undelete files:

[7:39am] 1978>cat > removeme
23423234234324
^Z
[1]  + 21532 Suspended cat > removeme
[7:39am] 1979>bg
[1]cat > removeme &
[2]  + Suspended (tty input) cat > removeme
[7:39am] 1980>rm removeme 
[7:39am] 1981>cd /proc/21532/fd
[7:39am] 1982>ls -l
total 2
crw--w  1 christos  tty   5, 2 Jan 12 07:39 0
-rw-rw-r--  0 christos  christos15 Jan 12 07:39 1
crw--w  1 christos  tty   5, 2 Jan 12 07:39 2
[7:39am] 1983>ln 1 ~christos/unremove
[7:40am] 1984>fg
cat > removeme (wd: ~)
(wd now: /proc/21532/fd)
[7:40am] 1985>cat ~christos/unremove 
23423234234324

Yes, this is a stupid example, but consider doing the same to recover
the log from a long running daemon that was accidentally removed.

christos

Re: POSIX priority when SCHED_NONE is incorrect

2021-10-05 Thread Christos Zoulas

In article <7e349617-70c3-6eac-b865-41a378693...@irvise.xyz>,
Fernando Oleo Blanco   wrote:
>[[Resending from tech-userlevel as per recommendation from Reinoud]]
>
>
>Hello everybody,
>
>
>TL;DR: the min and max priority for SCHED_OTHER is -1 (both cases).
>POSIX allows for whatever priority the system wants to define as min and
>max when using SCHED_OTHER. In *NIX systems, when using SCHED_OTHER what
>matters is the "niceness" of the process. However,
>"sched_get_priority_{min,max}(SCHED_OTHER);" functions, which return the
>value of the priority (-1 as stated before for NetBSD) use -1 as an
>error code. So a min = max priority of -1 while "okay", is, by standard,
>considered an error if a user program asks for it. However, after
>further testing, I think that it can be left to -1 or at least accept
>the value 0 too.
>
>
>I am trying to update GNAT (gcc-ada/aux) in pkgsrc for NetBSD. I
>stumbled on a roadblock, all tests that used tasks (Ada jargon for
>threads) were failing with the same assertion error. After digging up a
>bit I found they were failing when calling the function
>"pthread_setschedparam". I looked at the inputs that were being sent and
>they all seemed acceptable: the task, okay; the SCHED_ODER scheduling
>system, ok?; and a priority of 15... ok? Long story short, I fixed a
>problem with the SCHED_OTHER (it was erroneously imported into Ada), and
>it still failed.
>
>Since I did not know whether a priority of 15 was correct I looked it
>up. I ended with this Stack Overflow question/answer [1]. I ran the code
>to see the maximum and minimum values allowed by NetBSD with SCHED_OTHER:
>
>- min: -1
>
>- max: -1
>
>
>Mmmm... Strange, those values, shouldn't they be positive, or at least
>not -1? Lets see what "man sched_get_priority_min" says: "If successful,
>the sched_get_priority_max() and sched_get_priority_min() functions
>shall return the appropriate maximum or minimum values, respectively. If
>unsuccessful, they shall return a value of -1 and set errno to indicate
>the error."
>
>Uf! Am I getting an error? Does NetBSD not allow the use of
>SCHED_OTHER??? I went to the IRC chat and asked. As it happens, those
>values are actually correct. NetBSD returns the value PRI_NONE, which is
>-1. I created a horrible patch for GNAT, recompiled and it no longer
>failed at that point! Yay!
>
>
>However, I looked at how GNAT deals with priorities in Linux and found
>out that it really does not care about what "pthread_setschedparam"
>returns, even if it is an error. Why? Because when a process is using
>SCHED_OTHER, since it is implementation defined, the process does not
>care what happens. So if "pthread_setschedparam" fails for whatever
>reason, it does not matter, it will keep on working.
>
>
>So I really do not know what to say with this email apart from bringing
>your attention to an indirect POSIX design violation. The issue is not
>really with what you use as valid priorities (-1), since POSIX says it
>is implementation defined. The problem is that if one gets the priority,
>it receives an error as defined by POSIX, even though it is no error,
>but the actual priority.

You can differentiate between (-1 + error), and (-1 + valid value). by checking
errno. The same is true for strtol("-1", NULL, 10). Yes, it is ugly because
you need to set errno to 0 before the call, but it is doable. In retrospect
we should have chosen a different value to return, but it is going to be
disruptive to change.

Best,

christos

Re: eventfd(2) and timerfd(2) APIs

2021-09-18 Thread Christos Zoulas

In article <986563ad-88c2-41b9-bf69-51b26240b...@me.com>,
Jason Thorpe   wrote:
>Last year, I wrote implementations of the Linux eventfd(2) and
>timerfd(2) interfaces for NetBSD, with the goal of improving our Linux
>emulation.  In order to be able to test them with ATF tests, I went
>ahead and made them native calls as well.
>
>Here are the man pages describing the interfaces:
>
>   https://www.netbsd.org/~thorpej/eventfd.2
>   https://www.netbsd.org/~thorpej/timerfd.2
>
>Any objections to adding these?

None from me.

christos

Re: General device properties API

2021-08-14 Thread Christos Zoulas

In article ,
Jason Thorpe   wrote:
>
>==> int device_getprop_string(device_t dev, const char *prop, void *buf,
>int buflen);

can we use ssize_t for returns and size_t for sizes?

christos

Re: vn_open, EDUPFD, EMOVEFD

2021-06-29 Thread Christos Zoulas

In article ,
David Holland   wrote:
>
>* Does anyone know why dev/kloader.c calls namei and then vn_open on
>the same path? I remember seeing this before and leaving it alone
>because nobody I could find was sure what the deal was, but it's
>unlikely to work as-is and increasingly likely to break over time...

Just fix it, and if something breaks we'll put it back.

>diff -r 4c2d0a182ef8 external/cddl/osnet/sys/sys/vnode.h
>--- a/external/cddl/osnet/sys/sys/vnode.h  Sun Jun 27 18:13:54 2021 -0400
>+++ b/external/cddl/osnet/sys/sys/vnode.h  Mon Jun 28 11:05:45 2021 -0400
>@@ -239,7 +239,6 @@
> vnode_t **vpp, enum create crwhy, mode_t umask)
> {
>   struct pathbuf *pb;
>-  struct nameidata nd;
>   int error;
> 
>   ASSERT(seg == UIO_SYSSPACE);
>@@ -248,11 +247,9 @@
>   ASSERT(umask == 0);
> 
>   pb = pathbuf_create(pnamep);
>-  NDINIT(, LOOKUP, NOFOLLOW, pb);
>-  error = vn_open(, filemode, createmode);
>+  error = vn_open(NULL, pb, 0, filemode, createmode, vpp, NULL, NULL);

This is the only NOFOLLOW NDINIT case, should that be 'createmode | O_NOFOLLOW'?


>-  NDINIT(, CREATE, LOCKPARENT, pb);
>   
>   /*
>* Since we do not hold ulfs_extattr_uepm_lock anymore,
>* another thread may race with us for backend creation,
>-   * but only one can succeed here thanks to O_EXCL
>+   * but only one can succeed here thanks to O_EXCL.
>+   *
>+   * backing_vp is the backing store. 
>*/
>-  error = vn_open(, O_CREAT|O_EXCL|O_RDWR, 0600);
>+  error = vn_open(NULL, pb, 0, O_CREAT|O_EXCL|O_RDWR, 0600,
>+  _vp, NULL, NULL);

I guess O_CREAT will do the LOCKPARENT later...
I would have preferred if EMOVEFD/EDUPFD were gc'ed as part of the patch,
because there is a lot of ugliness left.

christos

Re: posix_spawn/chdir

2021-06-15 Thread Christos Zoulas



> On Jun 15, 2021, at 4:22 AM, Piyush Sachdeva  
> wrote:
> 
>>> Progress Update 1 (Week 2)-
>>> Contrary to the timeline provided, I started working within the user-land.
>>> (src/lib/libc/gen/posix_spawn_fileactions.c, src/include/spawn.h,
>>> src/sys/sys/spawn.h
>>> The modifications made to the files can be found at:
>>> https://github.com/cosmologistPiyush/posix_spawn-chdir
>>> 
>>> 
>>> As most of the work in the user-land is done,
>>> it is my goal to work in the kernel-space for the next three weeks.
> 
>> Good progress. It is probably better to mimick the original NetBSD tree
>> structure in userland and commit your changes on top of a baseline that
>> comes unmodified from NetBSD so that you can produce a diff that directly
>> applies to the NetBSD tree.
> 
> Thank you!
> I will get my commits in order asap.
> 
>> You also need to write unit-tests for userland :-)
> 
> My initial plan was to write test cases during weeks 6&7,
> given I am not too fluent with unit testing.
> However, if the test cases are required right away,
> then I will start working on them as well.

No, you don't need to write the unit-tests right now. I was merely addressing
the comment that "most of the work in the user-land is done", because writing
proper unit-tests is more work than adding the APIs.

Regards,

christos


signature.asc
Description: Message signed with OpenPGP

Re: Introduction to posix_spawn/chdir

2021-06-14 Thread Christos Zoulas

In article ,
Piyush Sachdeva   wrote:
>On Fri, Jun 11, 2021 at 1:10 AM Piyush Sachdeva
> wrote:
>>
>> To whomever it may concern,
>> Hi, I am Piyush Sachdeva.
>> I am doing an "unofficial GSoC Project" to enhance the posix_spawn()
>system call.
>> Check out my blogpost for more details.
>>
>> I will ask questions here and report my progress during this project.
>
>
>Progress Update 1 (Week 2)-
>Contrary to the timeline provided, I started working within the user-land.
>(src/lib/libc/gen/posix_spawn_fileactions.c, src/include/spawn.h,
>src/sys/sys/spawn.h
>The modifications made to the files can be found at:
>https://github.com/cosmologistPiyush/posix_spawn-chdir
>
>
>As most of the work in the user-land is done,
>it is my goal to work in the kernel-space for the next three weeks.

Good progress. It is probably better to mimick the original NetBSD tree
structure in userland and commit your changes on top of a baseline that
comes unmodified from NetBSD so that you can produce a diff that directly
applies to the NetBSD tree. You also need to write unit-tests for userland :-)

Best,

christos

Re: Is there a command to change btime (creation time of files)?

2021-05-29 Thread Christos Zoulas


> On May 29, 2021, at 12:22 PM, Martin Husemann  wrote:
> 
> On Sat, May 29, 2021 at 02:03:42PM -, Christos Zoulas wrote:
>> The baroque procedure is described in the manual page of utimes(2) where
>> one would expect it :-)
> 
> We should also add a command for it to fsdb(8).
> 
> Martin

Done :-)

christos


signature.asc
Description: Message signed with OpenPGP

Re: Is there a command to change btime (creation time of files)?

2021-05-29 Thread Christos Zoulas

In article <2828.1622233...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Fri, 28 May 2021 16:49:26 +
>From:Kenny 
>Message-ID: 
>
>
>  | I am using NetBSD 9.2 (amd64) with ZFS as file system and I have
>  | not found a command to change btime for my files.
>
>Don't bother, the birth time is a total waste of space.  It is used
>by nothing that matters, and is useful for nothing at all.   It is
>mostly unsupported (in that the stat command is just about the only
>way to view it, and while it is possible to set it, but not arbitrarily,
>the procedure is baroque and not worth explaining).

The baroque procedure is described in the manual page of utimes(2) where
one would expect it :-)

christos

Re: Ext4 support

2021-04-29 Thread Christos Zoulas

In article ,
Vincent DEFERT  <20@defert.com> wrote:
>Hi,
>
>I use both Linux and NetBSD, so I would be interested in implementing 
>ext4 support.
>However, the project description page at 
>https://wiki.netbsd.org/projects/project/ext4fs/
>is 6 years old and I wonder whether its status has changed since then.

Some ext4 features were implemented as part of GSoC 2016 (extents, htrees).
I am sure that there are other unimplemented features. What are you looking
for?

christos

Re: umodeswitch

2021-04-12 Thread Christos Zoulas

In article <1p7infd.zofefgbhxppgm%m...@netbsd.org>,
Emmanuel Dreyfus  wrote:
>Emmanuel Dreyfus  wrote:
>
>> I had some success with ZTE MF112 by just telling umass to not attach it
>> using umass_quirks (patch below). I got a frienly OK when telling it ATZ
>> on /dev/ttyU2 at 230400 bps. I need to add a SIM to perform further
>> testing.
>> 
>> Is the approach sane?
>
>It is not. I got mislead by the fact that the device maintains state
>across reboots, and is only reset by a power cycle.
>
>Supporting ZTE MF112 only requires this change:
>
>--- u3g.c.orig
>+++ ./u3g.c
>@@ -253,4 +253,5 @@
>{ USB_VENDOR_ZTE, USB_PRODUCT_ZTE_MF628 },
>{ USB_VENDOR_ZTE, USB_PRODUCT_ZTE_MF820D },
>+   { USB_VENDOR_ZTE, USB_PRODUCT_ZTE_MF112 },
> 
>/* 4G Systems */
>
>But it does not work out of the box. It needs two device scan for u3g to
>attach, either booting twice after a power cycle, or
>detaching/rescanning after a single boot  from cold state (drvctl -d
>umodeswitch0 ; drvctl -a usbdevif -r uhub1).
>
>Once u3g attaches the device, it keep doing so across reboots, until the
>next power cycle. 
>
>Any hint on how that could be fixed?

See FreeBSD's u3g.c driver. Perhaps we can accommodate sending the 
initialization command in the umodeswitch.c driver?

christos

Re: ZFS: time to drop Big Scary Warning

2021-03-25 Thread Christos Zoulas

I think that is good enough. We should document the timing-related tests and 
try to fix them!

christos

> On Mar 25, 2021, at 2:06 PM, Greg Troxel  wrote:
> 
> Signed PGP part
> 
> chris...@astron.com (Christos Zoulas) writes:
> 
>> That's a good test, but how does zfs compare in for the same test with lets
>> say ffs or ext2fs (filesystems that offer persistence)?
> 
> With the same system, booted in the  same way, but with 3 different
> filesystems mounted on /tmp, I get similar numbers of failures:
> 
> tmpfs 12
> ffs2  13
> zfs   18
> 
> So tmpfs/ffs2 are ~equal and zfs has a few more failures (but it all
> looks a bit random and non-repeatable).So it's hard to sort out "zfs
> is buggy" vs "some tests fail in timing-related hard-to-understand ways
> and that seems provoked slightly more with /tmp on zfs".
> 
> Did you mean something else?
> 
> 



signature.asc
Description: Message signed with OpenPGP

Re: ZFS: time to drop Big Scary Warning

2021-03-23 Thread Christos Zoulas

In article ,
Greg Troxel   wrote:
>-=-=-=-=-=-
>
>which is also similar, but slightly different.
>
>So overal I conclude that there's nothing terrible going on, and that
>these results are in the same class of mostly passing but somewhat
>irregular as the base case.  So work to do, but it doesn't support "ZFS
>is scary".
>
>(Of course, the system stayed up through the tests and has no apparent
>trouble, or I would have said.)
>
>As an aside, it would be nice if atf-test used TMPDIR or had an argument
>to say what place to do tests.

That's a good test, but how does zfs compare in for the same test with lets
say ffs or ext2fs (filesystems that offer persistence)?

Best,

christos

Re: GSOC ZFS root support (bootloader and mount_root)

2021-03-06 Thread Christos Zoulas

In article ,
Victor Kukshiev   wrote:
>-=-=-=-=-=-
>
>Hello! I am Victor Kukshiev (cetjs2 in IRC), 2rd course student of PetrSU
>university
>This idea is interesting and possible for me, I think. I am interested
>in bsd systems and ZFS and I know C programming language.
>Could you tell us more about this project?
>How do I participate in this project? Please let me know.

See the NetBSD ZFS wiki page: https://wiki.netbsd.org/zfs/

christos

Re: POINTER_ALIGNED_P (was: Re: CVS commit: src/sys)

2021-02-17 Thread Christos Zoulas

In article <5912ca9e-b4e7-423d-a45d-f4693d1c9...@zoulas.com>,
Christos Zoulas   wrote:
>-=-=-=-=-=-

Here's the final changes

- Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
  correct because it works with non-primitive types and provides the ABI
  alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
  ALIGNED_POINTER, but returns that the pointer is always aligned if the
  CPU supports unaligned accesses.

Index: sys/mbuf.h
===
RCS file: /cvsroot/src/sys/sys/mbuf.h,v
retrieving revision 1.231
diff -u -u -r1.231 mbuf.h
--- sys/mbuf.h  17 Feb 2021 22:32:04 -  1.231
+++ sys/mbuf.h  17 Feb 2021 23:54:31 -
@@ -843,15 +843,21 @@
m->m_pkthdr.rcvif_index = n->m_pkthdr.rcvif_index;
 }
 
+#define M_GET_ALIGNED_HDR(m, type, linkhdr) \
+m_get_aligned_hdr((m), __alignof(type) - 1, sizeof(type), (linkhdr))
+
 static __inline int
-m_get_aligned_hdr(struct mbuf **m, int align, size_t hlen, bool linkhdr)
+m_get_aligned_hdr(struct mbuf **m, int mask, size_t hlen, bool linkhdr)
 {
-   if (POINTER_ALIGNED_P(mtod(*m, void *), align) == 0) {
-   --align;// Turn into mask
+#ifndef __NO_STRICT_ALIGNMENT
+   if (((uintptr_t)mtod(*m, void *) & mask) != 0)
*m = m_copyup(*m, hlen, 
- linkhdr ? (max_linkhdr + align) & ~align : 0);
-   } else if (__predict_false((size_t)(*m)->m_len < hlen))
+ linkhdr ? (max_linkhdr + mask) & ~mask : 0);
+   else
+#endif
+   if (__predict_false((size_t)(*m)->m_len < hlen))
*m = m_pullup(*m, hlen);
+
return *m == NULL;
 }
 
Index: sys/param.h
===
RCS file: /cvsroot/src/sys/sys/param.h,v
retrieving revision 1.689
diff -u -u -r1.689 param.h
--- sys/param.h 17 Feb 2021 22:32:04 -  1.689
+++ sys/param.h 17 Feb 2021 23:54:31 -
@@ -281,16 +281,24 @@
 #defineALIGN(p)(((uintptr_t)(p) + ALIGNBYTES) & 
~ALIGNBYTES)
 #endif
 #ifndef ALIGNED_POINTER
-#defineALIGNED_POINTER(p,t)uintptr_t)(p)) & (sizeof(t) - 1)) 
== 0)
+#defineALIGNED_POINTER(p,t)uintptr_t)(p)) & (__alignof(t) - 
1)) == 0)
 #endif
 #ifndef ALIGNED_POINTER_LOAD
 #defineALIGNED_POINTER_LOAD(q,p,t) (*(q) = *((const t *)(p)))
 #endif
 
+/*
+ * Return if the pointer p is accessible for type t. For primitive types
+ * this means that the pointer itself can be dereferenced; for structures
+ * and unions this means that any field can be dereferenced. On CPUs
+ * that allow unaligned pointer access, we always return that the pointer
+ * is accessible to prevent unnecessary copies, although this might not be
+ * necessarily faster.
+ */
 #ifdef __NO_STRICT_ALIGNMENT
-#definePOINTER_ALIGNED_P(p, a) 1
+#defineACCESSIBLE_POINTER(p, t)1
 #else
-#definePOINTER_ALIGNED_P(p, a) (((uintptr_t)(p) & ((a) - 1)) 
== 0)
+#defineACCESSIBLE_POINTER(p, t)ALIGNED_POINTER(p, t)
 #endif
 
 /*
Index: net/if_arp.h
===
RCS file: /cvsroot/src/sys/net/if_arp.h,v
retrieving revision 1.42
diff -u -u -r1.42 if_arp.h
--- net/if_arp.h17 Feb 2021 22:32:04 -  1.42
+++ net/if_arp.h17 Feb 2021 23:54:31 -
@@ -72,8 +72,6 @@
uint8_t  ar_tpa[];  /* target protocol address */
 #endif
 };
-#defineARP_HDR_ALIGNMENT   __alignof(struct arphdr)
-__CTASSERT(ARP_HDR_ALIGNMENT == 2);
 
 static __inline uint8_t *
 ar_data(struct arphdr *ap)
Index: net/if_bridge.c
===
RCS file: /cvsroot/src/sys/net/if_bridge.c,v
retrieving revision 1.178
diff -u -u -r1.178 if_bridge.c
--- net/if_bridge.c 14 Feb 2021 20:58:34 -  1.178
+++ net/if_bridge.c 17 Feb 2021 23:54:31 -
@@ -2806,7 +2806,7 @@
if (*mp == NULL)
return -1;
 
-   if (m_get_aligned_hdr(, IP_HDR_ALIGNMENT, sizeof(*ip), true) != 0) {
+   if (M_GET_ALIGNED_HDR(, struct ip, true) != 0) {
/* XXXJRT new stat, please */
ip_statinc(IP_STAT_TOOSMALL);
goto bad;
@@ -2900,7 +2900,7 @@
 * it.  Otherwise, if it is aligned, make sure the entire base
 * IPv6 header is in the first mbuf of the chain.
 */
-   if (m_get_aligned_hdr(, IP6_HDR_ALIGNMENT, sizeof(*ip6), true) != 0) {
+   if (M_GET_ALIGNED_HDR(, struct ip6_hdr, true) != 0) {
struct ifnet *inifp = m_get_rcvif_NOMPSAFE(m);
/* XXXJRT new stat, please */
ip6_statinc(IP6_STAT_TOOSMALL);
Index: netinet/if_arp.c
===

Re: MAXTSIZ removal?

2020-11-26 Thread Christos Zoulas

In article <20201125210311.7wofo3mtipvfb...@yt.nih.at>,
Thomas Klausner   wrote:
>There was a commit by christos that made MAXTSIZ optional, but
>at least the amd64 vmparam.h still defines it.
>
>Any reason not to remove it?
>
>(I still can't start emulators/mame with a GENERIC without that change)
> Thomas

Nope I'll remove it,

christos

Re: ktrace-ing a command that locks up the machine

2020-11-18 Thread Christos Zoulas

In article <20201118093741.gb17...@trav.math.uni-bonn.de>,
Edgar Fuß   wrote:
>So after fixing kern/53311 and kern/55745 on -8, I'm back to one nesting 
>level down my original task.
>
>I have a command that (when run the second time and with certain USB devices 
>connected) will irrevertibly (to me) partly (no console switching) lock up 
>the machine. I need to enter DDB and reboot.
>
>I would like to ktrace/ktruss the command to see which USB transfer exactly 
>is the one that hangs. However, even with ktrace -s, there is no trace file 
>after the re-boot (on FFS/WAPBL); I can't tell whether it exists before the 
>reboot. Using ktruss, the last trace output to the console is way behind the 
>execution.

ktrace over NFS.

christos

Re: [PATCH 0/2] Delete CIRCLEQ

2020-10-12 Thread Christos Zoulas

In article ,
Kamil Rytarowski   wrote:
>This removal is a part of a larger synchronization with other BSDs as
>we lack various features in sys/queue.h (like LIST_PREV()).
>
>CIRCLEQ was already deleted from the documentation and disabled in the
>kernel in NetBSD-7. If there are still any unaware users, they are
>certainly long broken.
>
>What's the benefit of keeping it around and having no users and
>documented deprecation plus being prone to miscompilation? The removal
>does not break libc or kernel ABIs. Most 3rd party users of these
>macros deliver a homegrown copy of sys/queue.h anyway.

We should follow suit with the other BSD's and remove it. We should
also look at FreeBSD's header, add the missing functions and the
comments. It is pointless keeping it around. We've already
discouraged its use and there is very little software using it. We
can easily convert whatever it is using it to TAILQ. 

christos

Re: autoloading compat43 on tty ioctls

2020-10-10 Thread Christos Zoulas

I don't think my change will make a difference, because there
was another copy of TIOCGSID in tty.c already. I don't think
anything is loading combat_43 on my machines...

christos

> On Oct 10, 2020, at 1:43 PM, nia  wrote:
> 
> On Sat, Oct 10, 2020 at 01:29:43PM -0400, Greg Troxel wrote:
>> 
>> chris...@astron.com (Christos Zoulas) writes:
>> 
>>> Aside for the TIOCGSID bug which I am about to fix (it is in tty_43.c
>>> and is used in libc tcgetsid(), all the compat tty ioctls are defined
>>> in /usr/src/sys/sys/ioctl_compat.h... We can empty that file and try
>>> to build the tree :-), but I am guessing things will break. Also a lot
>>> of pkgsrc will break too. It is not 4.3 applications that break it is
>>> applications that still use the 4.3 terminal api's.
>> 
>> If the API is still present in our source tree, then the implementation
>> probably does not belong under COMPAT_43.  As I see it COMPAT_43 is to
>> match an old ABI that one can no longer (on modern NetBSD) compile to.
>> What you are describing sounds like "we have an API still, and we've had
>> it since 4.3", which is not in my view COMPAT.
> 
> It seems that after christos's change only applications that
> #include  will require tty_43.c



signature.asc
Description: Message signed with OpenPGP

Re: autoloading compat43 on tty ioctls

2020-10-10 Thread Christos Zoulas

In article <20201010133342.ga13...@homeworld.netbsd.org>,
nia   wrote:
>syzkaller found a bug in tty_43.c which i noticed could lock up my
>system after i wrote a reproducer for it.
>
>i was curious why 43bsd compat is enabled by default on amd64 then
>noticed it gets loaded by tty if it detects an unrecognized ioctl
>
>https://github.com/NetBSD/src/blob/trunk/sys/kern/tty.c#L1420
>
>this seems to be a strange choice, especially on archs that never ran
>43bsd binaries. were applications relying on these ioctls more recently?
>
>i'm not familiar with the history of this code, but maybe someone here is

Aside for the TIOCGSID bug which I am about to fix (it is in tty_43.c
and is used in libc tcgetsid(), all the compat tty ioctls are defined
in /usr/src/sys/sys/ioctl_compat.h... We can empty that file and try
to build the tree :-), but I am guessing things will break. Also a lot
of pkgsrc will break too. It is not 4.3 applications that break it is
applications that still use the 4.3 terminal api's.

christos

Re: fsck updating but not fixing filesystem

2020-08-28 Thread Christos Zoulas

In article <20200827213416.ga25...@netbsd.org>,
David Holland   wrote:
>On Mon, Aug 24, 2020 at 12:01:27PM +0100, David Brownlee wrote:
> > > > I think the general consensus is that ffs can be inconsistent it ways
> > > > fsck is unable to detect.
> > >
> > > ...much less fix.  Yes.  When I was doing the program that eventually
> > > got massaged into resize_ffs, during development I had some filesystems
> > > that were definitely corrupted but that fsck was happy with.  (I rather
> > > wish I'd saved some of them as test cases, but I didn't.)
>
>I also once after abusing rename got a filesystem where fsck -y didn't
>converge; I had to newfs it.
>
> > Sounds like there is an in interesting fuzzing project in there for
> > someone - make a filesystem mage and the repeatedly damage it, then
> > see if fsck can fix it, then if you get a rump panic when moving
> > everything around, and then re-run fsck to see if it indicates any new
> > issues :)
>
>One can do that, but given that there are lots of edge cases and many
>of them will be hard to reach, formal verification might be more
>effective.
>

I think we should fix all filesystems to pass:

https://www.netbsd.org/~riastradh/tmp/dirconc.c

Then we can think about formal verification :-)

christos

Re: wait(2) and SIGCHLD

2020-08-16 Thread Christos Zoulas

In article <28808.1597602...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Sun, 16 Aug 2020 16:13:57 - (UTC)
>From:chris...@astron.com (Christos Zoulas)
>Message-ID:  
>
>  | They don't vanish, they get reparented to init(8) which then wakes up
>  | and reaps them.
>
>That probably would work, approximately, but isn't what's supposed to
>happen when a child's parent is ignoring SIGCHLD - the child should
>skip zombie state, and simply be cleaned up.
>
>The difference would be detectable if init were sent a SIGSTOP
>(assuming that isn't one which would cause a system panic)
>so it would stop reaping children (temporarily) - processes of
>the type in question should not be showing up as zombies.

FreeBSD does what we do (reparent to init). Linux has autoreap
which moves the state of the process to DEAD without going through
ZOMBIE and adds it to the dead queue.

christos

Re: wait(2) and SIGCHLD

2020-08-16 Thread Christos Zoulas

In article <5919.1597441...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Fri, 14 Aug 2020 20:01:18 +0200
>From:Edgar =?iso-8859-1?B?RnXf?= 
>Message-ID:  <20200814180117.gq61...@trav.math.uni-bonn.de>
>
>  | 3. I don't see where POSIX defines or allows this, but given 2., I'm surely
>  |missing something.
>
>It is specified to work this way in POSIX, though right now I don't
>have the time to go dig out exactly where.
>
>Setting SIGCHLD to SIG_IGN effectively means that you want to ignore
>your children - they then don't report any exit status to their parent,
>but simply vanish when they exit.   Thus when the parent does a wait()
>it has no children, and gets ECHLD.

They don't vanish, they get reparented to init(8) which then wakes up
and reaps them.

>Leave (or set) SIGCHLD to SIG_DFL and you don't get signals, but child
>processes do report status to their parent.   Catch SIGCHLD and you'll
>get signalled when a child exits (I'm not sure if NetBSD guarantees one
>signal delivery for each exited child or just a signal if there are
>some unspecified number of exited children).
>
>The actions on an ignored SIGCHLD is SysV inherited behaviour,
>Bell Labs (v7/32V) and CSRG BSD systems didn't act this way.

Yup, I edded this:
1.199(christos 30-Mar-05): #define  P_CLDSIGIGN 0x0008 /* 
Process is ignoring SIGCHLD */

christos

Re: pg_jobc going negative?

2020-07-10 Thread Christos Zoulas

In article <27763.1594388...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Tue, 9 Jun 2020 08:23:19 - (UTC)
>From:mlel...@serpens.de (Michael van Elst)
>Message-ID:  
>
>I have spent a little time looking at this now, and I think
>it is just all a mess.
>
>  | pg_jobc is not a reference counter.
>
>Maybe not technically a "reference" counter, as what it is counting isn't
>strictly references, but anything that has x++ and if (--x == 0) do_stuff()
>is essentially a reference counter.   What it is counting references to
>isn't clear (particularly here), but that is what it is doing, or trying
>to do (it has all the same issues as things which really are ref counters).
>
>  | The assertion probably stopped
>  | a bug in a different place by coincidence.
>
>I doubt that, this code is not at all good.   There is no question but
>that the counter does not count properly.
>
>As best I can work out, and someone correct me if I'm wrong,
>the whole purpose of pg_jobc is so that orphanpg() can be called
>when a process group is orphaned (no longer has a session leader).
>
>If it has any other use, I cannot see it.
>
>What's there now simply doesn't work for this purpose.   It was
>suggested that the FreeBSD code has been modified from what we
>have, and that simply adopting that might work.   I went to look
>at their code, but before I did that, I saw that a month ago
>(that is, just around the time of the original discussion here)
>they copied maxv's KASSERTs into their code.  A week ago they removed
>them again, as they were occasionally firing.   That tells me,
>even without looking at their code, that they (still) have similar
>bugs to what we do, and thus that just importing their code won't
>help us.
>
>I see 3 ways forward...   simply drop the KASSERT the way that FreeBSD
>have done, and let things return to the semi-broken but rarely bothersome
>code that was there before.   That's not really a good idea, as the
>sanitizers apparently find problems with the code the way it was (not
>too surprising, deleting the KASSERT won't fix the bugs, it just stops
>noticing them explicitly).
>
>Or, we could properly define what pg_jobc is counting, and then make sure
>that it counts whatever that is properly - is incremented in all the
>appropriate places, and decremented properly as well.   Currently
>the comment about it in proc.h is:
>/*
> * Number of processes qualifying
>* pgrp for job control 
> */
>which makes it clear that it is a reference counter (not necessarily
>counting the number of something which exists, so that something can be 
>deleted, but it is counting references to processes).   Unfortunately
>I have no idea what "qualifying pgrp for job control" is supposed to mean.
>
>That could be done, but it seems like a lot of work to me, and not easy.
>
>Another (more radical) approach would be to simply drop orphanpg()
>completely, and thus no longer need pg_jobc at all.   The system
>wouldn't be bothered by this at all - all orphanpg() does is locate
>stopped members of the process group, and send then SIGCONT (to restart)
>and SIGHUP (to probably kill them - or at least inform them they their
>environment has changed).   If the system wasn't doing this, users manually
>(or a script run from cron or something) could do it instead,  If not done
>at all, badly behaving session leaders (processes which don't clean up
>their stopped jobs before exiting - including ones with bugs that causes
>them to abort) would over time cause a growing number of stopped jobs to
>simply clutter the system, consuming various resources, but otherwise
>harmless (there is nothing special about the processes, they can be killed,
>or continued - it is just that the process which would normally do that
>is no longer around).
>
>Third, and the option I'd suggest, is to revert to more basic principles,
>remove the pg_jobc attempt to detect when a session leader has exited,
>or changed to a different process group, and instead at candidate events
>(any time a process leaves a process group, for any reason) check if that
>process was the session leader, and if it is, clean up any now orphaned
>stopped processes.   This is likely to be slower than the current attempt,
>but is much easier to make correct (and much less likely to cause problems,
>other than dangling orphaned stopped processes, if incorrect).
>
>As best I can tell, all the data needed exists already, all that will be
>needed is to modify the code.   We can even leave pg_jobc in the pgrp
>struct, to avoid needing a kernel version bump (and for reasons I cannot
>imagine, pg_jobc is copied into kinfo and einfo structs for sysctl and /proc
>access to the process data, so leaving it around avoids needing to version
>those interfaces as well ... the value would be a meaningless 0, always, but
>I really find it hard to believe that anything would ever care, or even 
>notice).
>
>Opinions on any of this before I start

Re: -.su file in kernel compile dir

2020-07-07 Thread Christos Zoulas

In article <20200707044652.c094d4e...@thoreau.thistledown.com.au>,
Simon Burge   wrote:
>Valery Ushakov wrote:
>
>> Recent changes to record stack usage cause a file named -.su to be
>> created (that refers assym.c).  It plays tricks with targets like
>> clean that refer to *.su
>
>This -.su file is created by genassym which calls GCC using stdin for
>the input file.  The attached patch feels a bit hackish, but works for
>me.  Any reason not to commit it now, or does anyone have a cleaner
>fix?  I certainly didn't want to add all the ugliness of checking if the
>current compiler is GCC and !vax !

Just add another -N-fstack-usage?

-   ${GENASSYM} -- ${CC} ${CFLAGS:N-Wa,*} ${CPPFLAGS} ${PROF} \
+   ${GENASSYM} -- ${CC} ${CFLAGS:N-Wa,*:N-fstack-usage} ${CPPFLAGS} \
+   ${PROF} \

christos

Re: makesyscalls (moving forward)

2020-06-15 Thread Christos Zoulas

In article <20200615120806.gb1...@diablo.13thmonkey.org>,
Reinoud Zandijk   wrote:
>Small addendum,
>
>On Mon, Jun 15, 2020 at 01:44:19PM +0200, Reinoud Zandijk wrote:
>> What about not installing it at all? Its only going to be used during
>> definition updates or fixes. Compare it to the pcidevs.h and pcidevs_data.h
>> creation only this time it creates the relevant kernel/rump/fuzzer files. The
>> program can optionally be compiled and linked in /tmp and then called from
>> there to create all the variants using the templates or just be created in
>> place and cleaned up later.  No need to install it in base. The resulting
>> files can then be committed as `regen' just like the pcidevs variants.
>
>LLVM code and its fuzzing tools should be in tree anyway so it can be created
>there on the fly too if requested.

How about strace, or any other program that wants access to the system call
table information? Everything in the tree?

We have an opportunity here to do this better than we have been.
But we are trying to answer "how" first, not "why". Let's set up
some goals and some properties that the new system call description
table should have. What kinds of output should we be able to generate?

christos

Re: [PATCH] Add support for RUMP_USE_LIBC_ALLOCATORS

2020-06-14 Thread Christos Zoulas

In article ,
Kamil Rytarowski   wrote:
>-=-=-=-=-=-
>-=-=-=-=-=-
>
>I propose to add a new option for building rumpkernel:
>RUMP_USE_LIBC_ALLOCATORS.
>
>This option wires the kernel allocators directly to the libc functions.
>This is useful for sanitizers with their fine-grained checks of
>allocated chunks.
>
>http://netbsd.org/~kamil/patch-00268-RUMP_USE_LIBC_ALLOCATORS.2.txt
>
>Does this code and approach look good? Is possible to improve this
>approach? There is a fallout with ATF test t_vm::uvmwait as it no longer
>passes as the memory limit is no longer respected.
>
>With this patch rumpkernel is now more sensitive to real memory access
>bugs. The immediate detected problem is with kthread_join() that
>attempts to join a thread that was freed.
>
>http://netbsd.org/~kamil/rump/heap-use-after-free.txt
>
>We use this patch during the ongoing GSoC so quick merge into src/ is
>wanted.

I think putting the libc allocator versions of the functions in their own
files is better, because this way has too many ifdefs. Perhaps subr_pool
needs to be split for that to be effective. Try that and see if it looks
better...

christos

Re: pg_jobc going negative?

2020-06-09 Thread Christos Zoulas

The FreeBSD refactoring LGTM. It also simplifies the code.

christos

> On Jun 9, 2020, at 2:11 PM, Robert Elz  wrote:
> 
>Date:Tue, 9 Jun 2020 17:04:54 +0200
>From:Kamil Rytarowski 
>Message-ID:  
> 
> 
>  | Yes... syzkaller had like 12 different ways to reproduce it.
> 
> OK, thanks.
> 
>  | There is still a race and we randomly go to negative pg_jobc.
> 
> I am not at all surprised...
> 
> I will look at it over the next couple of days.   No guarantees...
> 
> kre



signature.asc
Description: Message signed with OpenPGP

Re: kernel stack usage

2020-05-30 Thread Christos Zoulas

In article <20200530095218.gb28...@mail.duskware.de>,
Martin Husemann   wrote:
>Hey folks,
>
>triggered by some experiments simonb did on mips I wrote a script to find
>the functions using the bigest stack frame in my current sparc64 kernel.
>
>The top 15 list is:
>
>Frame/b Function
>4096pci_conf_print at pci_subr.c:4812
>4096dtv_demux_read at dtv_demux.c:493
>3536SHA3_Selftest at sha3.c:430
>3408genfb_calc_hsize at genfb.c:630
>3248radeonfb_pickres at radeonfb.c:4127
>2304radeonfb_set_cursor at radeonfb.c:3690
>2272gem_pci_attach at if_gem_pci.c:147
>2256twoway_memmem at memmem.c:84
>2240bwfm_rx_event_cb at bwfm.c:2099
>2240compat_60_ptmget_ioctl at tty_60.c:70
>2112db_stack_trace_print at db_trace.c:77
>1664wdcprobe_with_reset at wdc.c:491
>1424nfsrv_rename at nfs_serv.c:1906
>1408OF_mapintr at ofw_machdep.c:728
>1344sysctl_hw_firmware_path at firmload.c:81
>1280fw_bmr at firewire.c:2296
>1264cdioctl at cd.c:1204
>1248cpu_reset_fpustate at cpu.c:400
>1248aubtfwl_attach_hook at aubtfwl.c:273
>1248uvm_swap_stats at uvm_swap.c:726
>
>(left column is size of the frame on sparc64 in bytes)
>
>I think anything > 1k is dubious and should be checked.

I agree, here is the same for x86_64/GENERIC...

4408 8027af14:pci_conf_print+0xd
4128 80a8dca0:dtv_demux_read+0xb
3352 80b940bb:procfs_domounts+0xd
3272 80e36b4b:SHA3_Selftest+0xd
3264 80c677da:coredump_note_elf64+0xb
3240 80b537c6:genfb_calc_hsize.isra.0+0x5
2704 80c66a88:coredump_note_elf32+0xb
2408 80227a71:process_machdep_doxstate+0xd
2184 804381fd:linux_ioctl_termios+0xd
2168 80440b2d:linux32_ioctl_termios+0xd
2112 802c5579:gem_pci_attach+0xb
2104 80e465c3:twoway_memmem+0xd
2088 806b5c18:bwfm_rx_event_cb+0xd
2072 8097e221:compat_60_ptmget_ioctl+0xd
2064 8053ce72:db_stack_trace_print+0x11
1488 8064f943:wdcprobe_with_reset+0xb
1384 80d7ee2b:ipmi_match+0x9
1328 80467a95:usb_add_event+0x7
1304 80ba85bb:nfsrv_rename+0xd
1256 8053069f:acpicpu_md_pstate_sysctl_all+0xd
1256 8052cd13:acpicpu_start+0x9
1240 8043b162:linux_sys_rt_sigreturn+0x9
1192 80b9392a:procfs_do_pid_stat+0xd
1176 80d70810:sysctl_hw_firmware_path+0xd
1160 807b30a3:radeon_cs_ioctl+0xd
1128 8044610f:oss_ioctl_mixer+0xd
1128 8024dd3c:cdioctl+0xd
1112 80c633ca:uvm_swap_stats.part.1+0xd
1104 804fdf59:fw_bmr+0xb
1096 80c8c0ab:ktrwrite+0xd
1080 80573ca6:ahc_print_register+0xd
1080 80550de8:procfs_getonecpu+0xd
1080 8048d95e:aubtfwl_attach_hook+0x9
1064 80cf925d:proc_regio+0xd
1064 80ccb407:bufq_alloc+0xd
1064 80ae8090:ar5112SetPowerTable+0xd
1064 80582aa0:ahd_print_register+0xd
1064 80568a53:tpmread+0xd
1064 80382262:txp_attach+0xd
1064 802636da:ata_probe_caps+0xd
1048 80e06097:ar9003_paprd_tx_tone_done+0xd
1048 80d87e9b:sdl_print+0x9
1048 80c67604:coredump_getseghdrs_elf64+0xd

Re: Fix for slow run(4) configuration on OHCI/UHCI

2020-05-30 Thread Christos Zoulas

In article <24273.15131.928644.743...@guava.gson.org>,
Andreas Gustafsson   wrote:
>Hi all,
>
>When I connect a USB WiFi adapter based on a Ralink RT5370 chip to a
>USB port that uses an OHCI or UHCI host controller, running "ifconfig
>run0 up" takes a very long time, about 20-30 seconds.
>
>This is because the run(4) driver writes a large amount of data to the
>device just two bytes at a time using the WRITE_2 command, and each
>write takes two full USB frames of 1 millisecond each on UHCI, or
>three frames on OCHI.  With an EHCI or XHCI controller, the large
>number of transfers is not a problem as these controllers can perform
>multiple control transfers within a single frame.
>
>The driver already contains code to do the transfers in larger blocks
>using the WRITE_REGION_1 command, but it's #if'ed out with a comment
>saying it is "not stable on RT2860".  The FreeBSD driver is similar,
>except that its version of the #if'ed-out code limits the transfers
>to a maximum of 64 bytes at a time.
>
>I have verified that enabling the use of WRITE_REGION_1 with the
>64-bit limit from FreeBSD works with my RT5370 based adapter, and
>makes it configure about ten times faster on OHCI and UHCI.
>
>I'm now using the following patch, which enables the use of
>WRITE_REGION_1 using blocks of up to 64 bytes for the RT5370 only:
>
>  https://www.gson.org/netbsd/patches/run-faster.patch
>
>OK to commit?

Yes, let's do it but put a comment in the commit message that provides
a summary of this message, so next time someone encounters an issue,
they find the information in one place.

christos

Re: sys/idtype.h unused enumeration values

2020-05-18 Thread Christos Zoulas

The *used* enum values are already burned into existing programs.
Reordering/removing from the list breaks ABI. They are best left alone.

christos

> On May 18, 2020, at 4:55 PM, Kamil Rytarowski  wrote:
> 
> Signed PGP part
> On 18.05.2020 22:18, Christos Zoulas wrote:
>> 
>> 
>>> On May 18, 2020, at 3:40 PM, Kamil Rytarowski  wrote:
>>> 
>>> If I delete P_TASKID ... P_P_CPUID ones, P_SETID will be reordered (but
>>> we can force the number anyway). If I delete P_CID there is an inelegant
>>> hole. Naturally P_SETID -> P_CID can fill the gap.
>>> 
>>> This is in theory ABI change, but no users could use in a useful
>>> approach previously.
>>> 
>>> My intention isto g/c unused values and keep this clean and elegant (as
>>> this is still possible).
>>> 
>> 
>> Why don't you leave them alone for the same reason FreeBSD did (source 
>> compatibility)
>> and append the ones you want? If you look they #define P_ZONEID P_JAILID when
>> they made the change...
>> 
>> christos
>> 
> 
> My point is that there is no source code (at least in base) that we gain
> compatibility with, no ABI compatibility layer and these concepts do not
> match the current NetBSD kernel features. If there is anything in 3rd
> party pretending to use these values, it would not work anyway.
> 
> If we want to these compat defines, they already live in:
> 
> external/cddl/osnet/dist/uts/common/sys/procset.h
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170346 
> <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170346>
> 
> Regarding FreeBSD, I don't see rationale for inclusion of these values.
> It looks like it was copy-pasted (there were also Solaris-specific
> preprocessor guards in the initial version).
> 
> But if there is intention to keep these values around (as it might be in
> theory too late as they leaked somewhere), it's fine. Thanks.
> 
> 
> 



signature.asc
Description: Message signed with OpenPGP

Re: sys/idtype.h unused enumeration values

2020-05-18 Thread Christos Zoulas



> On May 18, 2020, at 3:40 PM, Kamil Rytarowski  wrote:
> 
> If I delete P_TASKID ... P_P_CPUID ones, P_SETID will be reordered (but
> we can force the number anyway). If I delete P_CID there is an inelegant
> hole. Naturally P_SETID -> P_CID can fill the gap.
> 
> This is in theory ABI change, but no users could use in a useful
> approach previously.
> 
> My intention isto g/c unused values and keep this clean and elegant (as
> this is still possible).
> 

Why don't you leave them alone for the same reason FreeBSD did (source 
compatibility)
and append the ones you want? If you look they #define P_ZONEID P_JAILID when
they made the change...

christos



signature.asc
Description: Message signed with OpenPGP

Re: sys/idtype.h unused enumeration values

2020-05-18 Thread Christos Zoulas

I copied these from FreeBSD who in turn copied them from solaris and changed
P_ZONEID to P_JAILID. The FreeBSD comment is:
http://bxr.su/FreeBSD/sys/sys/wait.h#100 

I decided to keep all the names too.

christos

> On May 18, 2020, at 1:45 PM, Kamil Rytarowski  wrote:
> 
> Signed PGP part
> POSIX notes:
> 
> "The type idtype_t shall be defined as an enumeration type whose
> possible values shall include at least the following: P_ALL P_PGID P_PID"
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_wait.h.html
> 
> For some reason we copied as-is solaris types into our public headers
> with fields that will possibly never apply to NetBSD, such as P_CTID,
> P_POOLID etc.
> 
> I have got a local use-case for another P_type (premature to discuss it
> in this thread) and I would rather recycle an unused value.
> 
> Do we plan to get Solaris feature-parity with all the types? I assume
> that the answer is NO. If so, can we delete the P_ values that are not
> applicable for NetBSD?
> 
> If something is used in some compat shim for sunos software (dtrace,
> zfs) we could easily add a compat wrapper in the 3rd party code.
> Defining the types does not make the features to work.
> 
> 
> 



signature.asc
Description: Message signed with OpenPGP

Re: ACL patch

2020-05-15 Thread Christos Zoulas



> On May 15, 2020, at 5:53 PM, Andrew Doran  wrote:
> 
> 
> One problem with the VV_HASACLS thing is that VOP_ACCESS() is called with an
> LK_SHARED lock and you can't modify v_vflag with that.
> 
> It looks like you have to explicitly enable ACLs before the file system is
> mounted with tunefs, and it's not the default?  In that's true then if either
> FS_ACLS or FS_POSIX1EACLS is true during mount you could instead not set
> IMNT_NCLOOKUP, which would disable the lookup-in-namecache fastpath and
> everything would go through VOP_ACCESS() instead.  It's kind of a big hammer
> but at least everything is always correct then.  If not I'll dig into it in
> more detail tomorrow morning.
> 

Thanks Andrew, I will go with the big hammer approach then.

christos


signature.asc
Description: Message signed with OpenPGP

Re: ACL patch

2020-05-14 Thread Christos Zoulas

In article <20200512235040.gd9...@homeworld.netbsd.org>,
Andrew Doran   wrote:
>On Tue, May 12, 2020 at 10:37:27PM +, Andrew Doran wrote:
>> On Sun, May 10, 2020 at 09:48:10AM -0400, Christos Zoulas wrote:
>> > 
>> > > Maybe I missed it but I didn't see a way for cache_lookup_linked() and
>> > > cache_have_id() to know a vnode has ACLs.  The presence of ACLs
>means those
>> > > routines can't do their imitation of VOP_ACCESS() and need to fail so 
>> > > that
>> > > the lookup is handled by VOP_LOOKUP().  To handle that on a
>per-vnode basis
>> > > you'd want to flag the "has ACLs" fact when an inode is loaded, and do 
>> > > the
>> > > same when an ACL is added to an in-core inode.  I'll look into it over 
>> > > the
>> > > next few days unless you want to take care of it.
>> > 
>> > So I looked into this a bit and genfs_can_access() is also used by 
>cache_revlookup.
>> > I am not sure what to do here. It is simple enough to add a flag to
>the vnode to indicate
>> > if it has ACLs, but is it the right thing to do?
>> 
>> I tried to pull down your patch again over the weekend to take a look but it
>> was gone.  The best place for a flag would be cache_enter_id() since it
>> deals with the necessary synchronisation.  I'll see about adding a flag now.
>> In addition to the places it's called now it would need to be called
>> whenever an inode gains an ACL.
>
>Ok done should just need to mark the cached ID as invalid when the inode has
>/ gains an ACL.

Ok, I made some changes using this in my latest acl patch. Can you please
check?  https://www.netbsd.org/~christos/acl.diff

Best,

christos

Re: ACL patch

2020-05-13 Thread Christos Zoulas

In article <20200512235040.gd9...@homeworld.netbsd.org>,
Andrew Doran   wrote:
>On Tue, May 12, 2020 at 10:37:27PM +, Andrew Doran wrote:
>> On Sun, May 10, 2020 at 09:48:10AM -0400, Christos Zoulas wrote:
>> > 
>> > > Maybe I missed it but I didn't see a way for cache_lookup_linked() and
>> > > cache_have_id() to know a vnode has ACLs.  The presence of ACLs
>means those
>> > > routines can't do their imitation of VOP_ACCESS() and need to fail so 
>> > > that
>> > > the lookup is handled by VOP_LOOKUP().  To handle that on a
>per-vnode basis
>> > > you'd want to flag the "has ACLs" fact when an inode is loaded, and do 
>> > > the
>> > > same when an ACL is added to an in-core inode.  I'll look into it over 
>> > > the
>> > > next few days unless you want to take care of it.
>> > 
>> > So I looked into this a bit and genfs_can_access() is also used by 
>cache_revlookup.
>> > I am not sure what to do here. It is simple enough to add a flag to
>the vnode to indicate
>> > if it has ACLs, but is it the right thing to do?
>> 
>> I tried to pull down your patch again over the weekend to take a look but it
>> was gone.  The best place for a flag would be cache_enter_id() since it
>> deals with the necessary synchronisation.  I'll see about adding a flag now.
>> In addition to the places it's called now it would need to be called
>> whenever an inode gains an ACL.
>
>Ok done should just need to mark the cached ID as invalid when the inode has
>/ gains an ACL.

Ok, thanks!

christos

Re: ACL patch

2020-05-10 Thread Christos Zoulas


> Maybe I missed it but I didn't see a way for cache_lookup_linked() and
> cache_have_id() to know a vnode has ACLs.  The presence of ACLs means those
> routines can't do their imitation of VOP_ACCESS() and need to fail so that
> the lookup is handled by VOP_LOOKUP().  To handle that on a per-vnode basis
> you'd want to flag the "has ACLs" fact when an inode is loaded, and do the
> same when an ACL is added to an in-core inode.  I'll look into it over the
> next few days unless you want to take care of it.

So I looked into this a bit and genfs_can_access() is also used by  
cache_revlookup.
I am not sure what to do here. It is simple enough to add a flag to the vnode 
to indicate
if it has ACLs, but is it the right thing to do?

Best,

christos


signature.asc
Description: Message signed with OpenPGP

Re: ACL patch

2020-05-05 Thread Christos Zoulas



> On May 4, 2020, at 7:51 PM, Andrew Doran  wrote:
>> 

Thanks for reviewing.

> 
>   +++ sbin/tunefs/tunefs.8
>   ...
>   +.It Fl n Cm enable | disable
>   +Turn on/off the administrative NFSv4 ACL enable flag.

Fixed, this was before when:
-a = posix alls
-n = nfs alls

Now we have
-a = nfs acls
-p = posix alls

Since the nfs acls are preferred.

> Is that handled?  I don't see it.
> 
>   --- share/mk/bsd.own.mk 27 Apr 2020 03:15:12 -  1.1187
>   +++ share/mk/bsd.own.mk 2 May 2020 23:06:42 -
>   @@ -164,7 +164,9 @@ EXTERNAL_GDB_SUBDIR=/does/not/exist
>   #
>.if ${MACHINE_ARCH} == "x86_64" || ${MACHINE_ARCH} == "i386" || \
>${MACHINE_ARCH} == "powerpc64" || ${MACHINE_ARCH} == "powerpc" || \
>   -${MACHINE_CPU} == "aarch64" || ${MACHINE_CPU} == "arm"
>   +${MACHINE_CPU} == "aarch64" || ${MACHINE_CPU} == "arm" || \
>   +${MACHINE_CPU} == "sparc64" || ${MACHINE_CPU} == "sparc" || \
>   +${MACHINE_CPU} == "mips"
> 
> Seems unrelated.

Yup unrelated.

> 
>   @@ -739,10 +1206,6 @@ genfs_can_chmod(enum vtype type, kauth_c
>{
>   int error;
> 
>   -   /* The user must own the file. */
>   -   if (kauth_cred_geteuid(cred) != cur_uid)
>   -   return (EPERM);
>   -
>   /*
>* Unprivileged users can't set the sticky bit on files.
>*/
>   @@ -762,6 +1225,12 @@ genfs_can_chmod(enum vtype type, kauth_c
>   return (EPERM);
>   }
> 
>   +   /*
>   +* Deny setting setuid if we are not the file owner.
>   +*/
>   +   if ((new_mode & S_ISUID) && cur_uid != kauth_cred_geteuid(cred))
>   +   return (EPERM);
>   +
> 
> Are you sure this is correct?

Indeed, there is a missing check here:

/*
 * To modify the permissions on a file, must possess VADMIN
 * for that file.
 */
if ((error = VOP_ACCESSX(vp, VWRITE_ACL, cred, td)) != 0)
return (error);

I have fixed it.

> Maybe I missed it but I didn't see a way for cache_lookup_linked() and
> cache_have_id() to know a vnode has ACLs.  The presence of ACLs means those
> routines can't do their imitation of VOP_ACCESS() and need to fail so that
> the lookup is handled by VOP_LOOKUP().  To handle that on a per-vnode basis
> you'd want to flag the "has ACLs" fact when an inode is loaded, and do the
> same when an ACL is added to an in-core inode.  I'll look into it over the
> next few days unless you want to take care of it.


Thanks, please let me know if you start working on this and I will do the 
same...

christos



signature.asc
Description: Message signed with OpenPGP

ACL patch

2020-05-02 Thread Christos Zoulas



Hi,

The following patch ports the FreeBSD FFS ACLS (both posix1e and nfs4) to
NetBSD. Comments? I will let this sit for a week or so and then commit it
if I don't hear screams.

[it is ~24K lines, so not posted inline]

https://www.netbsd.org/~christos/acl.diff

Best,


christos

Re: Symbol debugging support for kernel modules in crash dumps

2020-05-02 Thread Christos Zoulas

In article <20200501233413.291a717f...@rebar.astron.com>,
Christos Zoulas  wrote:
>
>Hi,
>
>I just added symbol debugging support for modules in kernel dumps.
>Things are not perfect because of what I call "current thread
>confusion" in the kvm target, but as you see in the following
>session it works just fine if you follow the right steps. First of
>all you need a build from HEAD that has the capability to build
>.debug files for kernel modules.  Once that's done, you are all
>set; see how it works (comments prefixed by )

I fixed the "current thread confusion" by setting the current
thread and reloading the symbol file in gdb, this works as
expected:

$ gdb netbsd.gdb
(gdb) target kvm netbsd.666.core
(gdb) source /usr/src/sys/gdbscripts/modload
(gdb) modload
(gdb) where

christos

Symbol debugging support for kernel modules in crash dumps

2020-05-01 Thread Christos Zoulas



Hi,

I just added symbol debugging support for modules in kernel dumps.
Things are not perfect because of what I call "current thread
confusion" in the kvm target, but as you see in the following
session it works just fine if you follow the right steps. First of
all you need a build from HEAD that has the capability to build
.debug files for kernel modules.  Once that's done, you are all
set; see how it works (comments prefixed by )

Enjoy,

christos

$ gdb /usr/src/sys/arch/amd64/compile/QUASAR/netbsd.gdb
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/src/sys/arch/amd64/compile/QUASAR/netbsd.gdb...
(gdb) target kvm netbsd.22.core
0x80224375 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at ../../../../arch/amd64/amd64/machdep.c:718
warning: Source file is more recent than executable.
718 if (s != IPL_NONE)

 Ok we got a stacktrace here, but we don't have a current thread...
 So we set it...

(gdb) info thread
  Id   Target Id Frame 
* 2.1   0x80224375 in cpu_reboot (
howto=howto@entry=260, bootstr=bootstr@entry=0x0)
at ../../../../arch/amd64/amd64/machdep.c:718

No selected thread.  See `help thread'.
(gdb) thread 2.1

[Switching to thread 2.1 ()]
#0  0x80224375 in ?? ()

 Note that here we lost all symbol table access when we switched threads
 let's load it again..

(gdb) add-symbol-file /usr/src/sys/arch/amd64/compile/QUASAR/netbsd.gdb
add symbol table from file "/usr/src/sys/arch/amd64/compile/QUASAR/netbsd.gdb"
(y or n) y
Reading symbols from /usr/src/sys/arch/amd64/compile/QUASAR/netbsd.gdb...

 OK, lets load our modules

(gdb) source /usr/src/sys/gdbscripts/modload 
(gdb) modload
add symbol table from file "/stand/amd64/9.99.59/modules/ping/ping.kmod" at
.text_addr = 0x8266e000
.data_addr = 0x8266b000
.rodata_addr = 0x8266c000
add symbol table from file 
"/stand/amd64/9.99.59/modules/nfsserver/nfsserver.kmod" at
.text_addr = 0x82a64000
.data_addr = 0x82669000
.rodata_addr = 0x8298e000
add symbol table from file 
"/stand/amd64/9.99.59/modules/npf_ext_log/npf_ext_log.kmod" at
.text_addr = 0x82668000
.data_addr = 0x82667000
.rodata_addr = 0x82969000
add symbol table from file 
"/stand/amd64/9.99.59/modules/npf_alg_icmp/npf_alg_icmp.kmod" at
.text_addr = 0x82666000
.data_addr = 0x82665000
.rodata_addr = 0x82952000
add symbol table from file "/stand/amd64/9.99.59/modules/bpfjit/bpfjit.kmod" at
.text_addr = 0x82661000
.data_addr = 0x0
.rodata_addr = 0x828dd000
add symbol table from file "/stand/amd64/9.99.59/modules/sljit/sljit.kmod" at
.text_addr = 0x82945000
.data_addr = 0x82664000
.rodata_addr = 0x828f9000
add symbol table from file 
"/stand/amd64/9.99.59/modules/if_npflog/if_npflog.kmod" at
.text_addr = 0x8266
.data_addr = 0x8265f000
.rodata_addr = 0x828ca000
add symbol table from file "/stand/amd64/9.99.59/modules/npf/npf.kmod" at
.text_addr = 0x82648000
.data_addr = 0x82647000
.rodata_addr = 0x826d6000
add symbol table from file "/stand/amd64/9.99.59/modules/bpf/bpf.kmod" at
.text_addr = 0x82622000
.data_addr = 0x82621000
.rodata_addr = 0x826a3000
add symbol table from file 
"/stand/amd64/9.99.59/modules/bpf_filter/bpf_filter.kmod" at
.text_addr = 0x8263c000
.data_addr = 0x0
.rodata_addr = 0x82627000
add symbol table from file 
"/stand/amd64/9.99.59/modules/scsiverbose/scsiverbose.kmod" at
.text_addr = 0x826a2000
.data_addr = 0x82686000
.rodata_addr = 0x82687000
add symbol table from file 
"/stand/amd64/9.99.59/modules/usbverbose/usbverbose.kmod" at
.text_addr = 0x82685000
.data_addr = 0x8267
.rodata_addr = 0x82671000
add symbol table from file 
"/stand/amd64/9.99.59/modules/miiverbose/miiverbose.kmod" at
.text_addr = 0x82646000

Re: fstat(1) and sysctl(3)

2020-04-16 Thread Christos Zoulas

In article <13adfa7f-0d25-9fa0-6559-706e693dc...@gmx.com>,
Kamil Rytarowski   wrote:
>Is there any good reason that fstat(1) needs kvm(3)? Is it viable to
>offer its functionality with sysctl(3), in particular in struct kinfo_file?
>
>I'm have got a use-case (in GDB (*)) where I would make use of more
>fields in struct kinfo_file, at least file path and socket information.
>
>Maybe it would be viable to switch fstat(1) to sysctl(3)?
>
>(*) FreeBSD uses:
>
>fbsd_info_proc_files_entry (kf->kf_type, kf->kf_fd, kf->kf_flags,
>kf->kf_offset, kf->kf_vnode_type,
>kf->kf_sock_domain, kf->kf_sock_type,
>kf->kf_sock_protocol, >kf_sa_local,
>>kf_sa_peer, kf->kf_path);
>

If you do it, make sure you provide info for all the file descriptor types.
(grep -i kvm_read /usr/src/usr.bin/fstat/*.c)

christos

Extended Attributes [take 2]

2020-04-10 Thread Christos Zoulas



Oops I included ufs/ufs_extattr.c (the FFSv1 ones) not ffs/ffs_extattr.c
which is the new file, so let's try again:


? o
? ffs/ffs_inode.c.debug
Index: files.ufs
===
RCS file: /cvsroot/src/sys/ufs/files.ufs,v
retrieving revision 1.45
diff -u -u -r1.45 files.ufs
--- files.ufs   17 Jun 2019 03:32:58 -  1.45
+++ files.ufs   11 Apr 2020 00:13:05 -
@@ -52,6 +52,7 @@
 file   ufs/ffs/ffs_alloc.c ffs
 file   ufs/ffs/ffs_balloc.cffs
 file   ufs/ffs/ffs_bswap.c (ffs | mfs) & ffs_ei
+file   ufs/ffs/ffs_extattr.c   ffs & ufs_extattr
 file   ufs/ffs/ffs_inode.c ffs
 file   ufs/ffs/ffs_snapshot.c  ffs
 file   ufs/ffs/ffs_subr.c  ffs
Index: ffs/ffs_alloc.c
===
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.166
diff -u -u -r1.166 ffs_alloc.c
--- ffs/ffs_alloc.c 23 Feb 2020 15:46:42 -  1.166
+++ ffs/ffs_alloc.c 11 Apr 2020 00:13:05 -
@@ -257,7 +257,10 @@
bno = ffs_hashalloc(ip, cg, bpref, size, 0, flags, ffs_alloccg);
if (bno > 0) {
DIP_ADD(ip, blocks, btodb(size));
-   ip->i_flag |= IN_CHANGE | IN_UPDATE;
+   if (flags & IO_EXT)
+   ip->i_flag |= IN_CHANGE;
+   else
+   ip->i_flag |= IN_CHANGE | IN_UPDATE;
*bnp = bno;
return (0);
}
@@ -300,14 +303,15 @@
  * => return with um_lock released
  */
 int
-ffs_realloccg(struct inode *ip, daddr_t lbprev, daddr_t bpref, int osize,
-int nsize, kauth_cred_t cred, struct buf **bpp, daddr_t *blknop)
+ffs_realloccg(struct inode *ip, daddr_t lbprev, daddr_t bprev, daddr_t bpref,
+int osize, int nsize, int flags, kauth_cred_t cred, struct buf **bpp,
+daddr_t *blknop)
 {
struct ufsmount *ump;
struct fs *fs;
struct buf *bp;
int cg, request, error;
-   daddr_t bprev, bno;
+   daddr_t bno;
 
fs = ip->i_fs;
ump = ip->i_ump;
@@ -368,10 +372,6 @@
mutex_exit(>um_lock);
goto nospace;
}
-   if (fs->fs_magic == FS_UFS2_MAGIC)
-   bprev = ufs_rw64(ip->i_ffs2_db[lbprev], UFS_FSNEEDSWAP(fs));
-   else
-   bprev = ufs_rw32(ip->i_ffs1_db[lbprev], UFS_FSNEEDSWAP(fs));
 
if (bprev == 0) {
panic("%s: bad bprev: dev = 0x%llx, bsize = %d, bprev = %"
@@ -403,7 +403,10 @@
mutex_enter(>um_lock);
if ((bno = ffs_fragextend(ip, cg, bprev, osize, nsize)) != 0) {
DIP_ADD(ip, blocks, btodb(nsize - osize));
-   ip->i_flag |= IN_CHANGE | IN_UPDATE;
+   if (flags & IO_EXT)
+   ip->i_flag |= IN_CHANGE;
+   else
+   ip->i_flag |= IN_CHANGE | IN_UPDATE;
 
if (bpp != NULL) {
if (bp->b_blkno != FFS_FSBTODB(fs, bno)) {
@@ -503,7 +506,10 @@
ip->i_number);
}
DIP_ADD(ip, blocks, btodb(nsize - osize));
-   ip->i_flag |= IN_CHANGE | IN_UPDATE;
+   if (flags & IO_EXT)
+   ip->i_flag |= IN_CHANGE;
+   else
+   ip->i_flag |= IN_CHANGE | IN_UPDATE;
if (bpp != NULL) {
bp->b_blkno = FFS_FSBTODB(fs, bno);
allocbuf(bp, nsize, 1);
Index: ffs/ffs_balloc.c
===
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_balloc.c,v
retrieving revision 1.63
diff -u -u -r1.63 ffs_balloc.c
--- ffs/ffs_balloc.c28 Oct 2017 00:37:13 -  1.63
+++ ffs/ffs_balloc.c11 Apr 2020 00:13:05 -
@@ -72,6 +72,12 @@
 static int ffs_balloc_ufs2(struct vnode *, off_t, int, kauth_cred_t, int,
 struct buf **);
 
+static daddr_t
+ffs_extb(struct fs *fs, struct ufs2_dinode *dp, daddr_t nb)
+{
+   return ufs_rw64(dp->di_extb[nb], UFS_FSNEEDSWAP(fs));
+}
+   
 /*
  * Balloc defines the structure of file system storage
  * by allocating the physical blocks on a device given
@@ -139,10 +145,11 @@
osize = ffs_blksize(fs, ip, nb);
if (osize < fs->fs_bsize && osize > 0) {
mutex_enter(>um_lock);
-   error = ffs_realloccg(ip, nb,
+   error = ffs_realloccg(ip, nb, ffs_getdb(fs, ip, nb),
ffs_blkpref_ufs1(ip, lastlbn, nb, flags,
>i_ffs1_db[0]),
-   osize, (int)fs->fs_bsize, cred, bpp, );
+   osize, (int)fs->fs_bsize, flags, cred, bpp,
+   );
if (error)
return (error);

Extended Attribute support for FFSv2

2020-04-10 Thread Christos Zoulas



Hi,

I am planning to add ACL support to FFS. As I was porting the
FreeBSD ACL code, I noticed that we currently lack extended attribute
support for FFSv2, so I took a detour and added that support by
copying the FreeBSD code (which you can compare against; I've left
it mostly intact by adding compatibility defines). I am not sure
if I got it all right, but I've written a simple unit-test and it
passes and does not corrupt the filesystem. Here are my changes
for review. If you think they are ok, I will commit them in a day
or so.

Thanks,

christos

Index: files.ufs
===
RCS file: /cvsroot/src/sys/ufs/files.ufs,v
retrieving revision 1.45
diff -u -u -r1.45 files.ufs
--- files.ufs   17 Jun 2019 03:32:58 -  1.45
+++ files.ufs   10 Apr 2020 22:15:21 -
@@ -52,6 +52,7 @@
 file   ufs/ffs/ffs_alloc.c ffs
 file   ufs/ffs/ffs_balloc.cffs
 file   ufs/ffs/ffs_bswap.c (ffs | mfs) & ffs_ei
+file   ufs/ffs/ffs_extattr.c   ffs & ufs_extattr
 file   ufs/ffs/ffs_inode.c ffs
 file   ufs/ffs/ffs_snapshot.c  ffs
 file   ufs/ffs/ffs_subr.c  ffs
Index: ffs/ffs_alloc.c
===
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.166
diff -u -u -r1.166 ffs_alloc.c
--- ffs/ffs_alloc.c 23 Feb 2020 15:46:42 -  1.166
+++ ffs/ffs_alloc.c 10 Apr 2020 22:15:22 -
@@ -257,7 +257,10 @@
bno = ffs_hashalloc(ip, cg, bpref, size, 0, flags, ffs_alloccg);
if (bno > 0) {
DIP_ADD(ip, blocks, btodb(size));
-   ip->i_flag |= IN_CHANGE | IN_UPDATE;
+   if (flags & IO_EXT)
+   ip->i_flag |= IN_CHANGE;
+   else
+   ip->i_flag |= IN_CHANGE | IN_UPDATE;
*bnp = bno;
return (0);
}
@@ -300,14 +303,15 @@
  * => return with um_lock released
  */
 int
-ffs_realloccg(struct inode *ip, daddr_t lbprev, daddr_t bpref, int osize,
-int nsize, kauth_cred_t cred, struct buf **bpp, daddr_t *blknop)
+ffs_realloccg(struct inode *ip, daddr_t lbprev, daddr_t bprev, daddr_t bpref,
+int osize, int nsize, int flags, kauth_cred_t cred, struct buf **bpp,
+daddr_t *blknop)
 {
struct ufsmount *ump;
struct fs *fs;
struct buf *bp;
int cg, request, error;
-   daddr_t bprev, bno;
+   daddr_t bno;
 
fs = ip->i_fs;
ump = ip->i_ump;
@@ -368,10 +372,6 @@
mutex_exit(>um_lock);
goto nospace;
}
-   if (fs->fs_magic == FS_UFS2_MAGIC)
-   bprev = ufs_rw64(ip->i_ffs2_db[lbprev], UFS_FSNEEDSWAP(fs));
-   else
-   bprev = ufs_rw32(ip->i_ffs1_db[lbprev], UFS_FSNEEDSWAP(fs));
 
if (bprev == 0) {
panic("%s: bad bprev: dev = 0x%llx, bsize = %d, bprev = %"
@@ -403,7 +403,10 @@
mutex_enter(>um_lock);
if ((bno = ffs_fragextend(ip, cg, bprev, osize, nsize)) != 0) {
DIP_ADD(ip, blocks, btodb(nsize - osize));
-   ip->i_flag |= IN_CHANGE | IN_UPDATE;
+   if (flags & IO_EXT)
+   ip->i_flag |= IN_CHANGE;
+   else
+   ip->i_flag |= IN_CHANGE | IN_UPDATE;
 
if (bpp != NULL) {
if (bp->b_blkno != FFS_FSBTODB(fs, bno)) {
@@ -503,7 +506,10 @@
ip->i_number);
}
DIP_ADD(ip, blocks, btodb(nsize - osize));
-   ip->i_flag |= IN_CHANGE | IN_UPDATE;
+   if (flags & IO_EXT)
+   ip->i_flag |= IN_CHANGE;
+   else
+   ip->i_flag |= IN_CHANGE | IN_UPDATE;
if (bpp != NULL) {
bp->b_blkno = FFS_FSBTODB(fs, bno);
allocbuf(bp, nsize, 1);
Index: ffs/ffs_balloc.c
===
RCS file: /cvsroot/src/sys/ufs/ffs/ffs_balloc.c,v
retrieving revision 1.63
diff -u -u -r1.63 ffs_balloc.c
--- ffs/ffs_balloc.c28 Oct 2017 00:37:13 -  1.63
+++ ffs/ffs_balloc.c10 Apr 2020 22:15:22 -
@@ -72,6 +72,12 @@
 static int ffs_balloc_ufs2(struct vnode *, off_t, int, kauth_cred_t, int,
 struct buf **);
 
+static daddr_t
+ffs_extb(struct fs *fs, struct ufs2_dinode *dp, daddr_t nb)
+{
+   return ufs_rw64(dp->di_extb[nb], UFS_FSNEEDSWAP(fs));
+}
+   
 /*
  * Balloc defines the structure of file system storage
  * by allocating the physical blocks on a device given
@@ -139,10 +145,11 @@
osize = ffs_blksize(fs, ip, nb);
if (osize < fs->fs_bsize && osize > 0) {
mutex_enter(>um_lock);
-   error = ffs_realloccg(ip, nb,
+   error = ffs_realloccg(ip, nb, ffs_getdb(fs, ip, nb),

Re: All (?) network tests failing

2020-04-04 Thread Christos Zoulas

In article <24200.56933.470930.730...@guava.gson.org>,
Andreas Gustafsson   wrote:
>Robert Elz wrote:
>> Not an idea, but a possibility - the change to route.c (1.167) was
>> unimportant - it doesn't really matter (to the tests) if it does
>> anything useful or not - it is possible that it just happened that the
>> fd that the setsockopt() was being performed on was a socket (a suitable
>> socket) prior to the openssl update, but after that, the rump fd's
>> shifted around, and what the setsockopt() was operating upon was no
>> longer a socket.
>> 
>> No idea if that is really what happened or not, but something like that
>> is at least plausible (even though it would seem that the changes of the
>> sys call having worked by accident seem to be not very high).
>
>I agree that this sounds plausible.  Also, the tests never failing for
>Christos might then be explained by him running them in an environment
>that has a different number of file descriptors already in use.

It could be due to tcsh doing its file descriptor dance differently...
What shell are you using?

christos

Re: All (?) network tests failing

2020-04-04 Thread Christos Zoulas



> On Apr 4, 2020, at 9:37 AM, Andreas Gustafsson  wrote:
> 
> Martin Husemann wrote:
>> I analyzed this particular one (202 steps back because rump.netstat dumps
>> core) - will fix it soon.
> 
> With martin's changes, the number of unexpected test failures
> went down from 413 to 6 on my bare metal testbed:
> 
>  
> http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.04.html#2020.04.03.16.20.52
> 
> The remaining 6 or so failures are unrelated.  Thanks to everyone who
> helped get this fixed.
> 
> Does anyone have an idea why the tests didn't start failing
> immediately when route.c 1.167 was committed, but only after the
> seemingly unrelated openssl update?
> 
I am still puzzled by this as the tests never failed on my machine!

christos



signature.asc
Description: Message signed with OpenPGP

Re: All (?) network tests failing

2020-04-02 Thread Christos Zoulas

In article <19747.1585851...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Mon, 30 Mar 2020 14:25:01 -0400
>From:    Christos Zoulas 
>Message-ID:  <3d3ac2b9-5e6e-400c-9a4b-10742c90c...@zoulas.com>
>
>  | All the tests are failing for you the same way:
>  | rump.route: SO_RERROR: Socket operation on non-socket
>
>Not all, but quite a few are.
>
>This one I think is due to src/sbin/route/rouyte.c 1.167
>
>sock = prog_socket(PF_ROUTE, SOCK_RAW, 0);
>if (setsockopt(sock, SOL_SOCKET, SO_RERROR,
>, sizeof(on)) == -1)
>warn("SO_RERROR");
>
>where that setcockopt() was added.   I think that needs to be a prog_*
>type call, so rump can do the right thing.   That will mean adding it
>to prog_opts, and right now I don't have time to work out what the correct
>magic is, but if no-one else does in the next day or so, I will take
>another look.
>
>That should take care of the failing network related tests that contain
>rump.route commands, but that's not all of the failing tests.

Thanks! I fixed that now. Let's see how many break after this...

christos

Re: New tools proposal: ioctlname and ioctldecode

2020-04-02 Thread Christos Zoulas

In article <2096.1585816...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Thu, 2 Apr 2020 04:11:17 +0200
>From:Kamil Rytarowski 
>Message-ID:  
>
>  | This is partially enforceable. As once we generate catchall switch like:
>  |
>  | case FOO_OP:
>  | ...
>  | case BAR_OP:
>  | ...
>  |
>  | a compiler will report error whenever FOO_OP = BAR_OP.
>
>That makes it easy to detect, not enforce.   Once detected that way,
>what happens next?  Neither will (or can really) change as they both
>have existing applications compiled that use them - in the worse case
>the conflicts come from attempting to implement compat mode for someone
>else's binaries, and support their existing ioctl's (which we obviously
>cannot alter) - and do that for two different systems at once.
>
>This is the same reason we cannot fix the few duplicates that exist in
>our code - we want to retain backward binary compatibility, which means
>supporting ancient binaries that happen to use those ioctl values.
>
>Avoiding conflicts is (always has been) the aim - it allows drivers to
>detect attempts to use an ioctl intended for some different device, rather
>than believing it was intended for them, but there simply isn't, and can't
>really be, any way to enforce that.
>
>kre
>
>ps: don't forget all the sockioctl()s which also need decoding.   Perhaps
>even more than device ioctls.

We already have a lot of dups. See the mkioctls script.

christos

Re: New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Christos Zoulas



> On Apr 1, 2020, at 9:27 PM, Kamil Rytarowski  wrote:
> 
> I've implemented:
> 
> ioctlprint [-f format] [-e emul] arg...
> 
> $ ./ioctlprint   2148554498 2148554498
> WSKBDIO_COMPLEXBELL _IOW('W',0x2,0x10) 0x80105702
> WSKBDIO_COMPLEXBELL _IOW('W',0x2,0x10) 0x80105702
> 
> $ ./ioctlprint -f "%o %d %d %i %x %e %n\n"  2148554498
> 020004053402 2148554498 2148554498 2148554498 0x80105702
> _IOW('W',0x2,0x10) WSKBDIO_COMPLEXBELL
> 
> %n - name
> %e - expression
> %x - print HEX number
> %o - print OCT number
> %d %i - print DEC number
> 
> http://netbsd.org/~kamil/patch-00245-kdump-ioctlname.3.txt


You are fast! I'd write a man page and commit it.

christos


signature.asc
Description: Message signed with OpenPGP

Re: New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Christos Zoulas

In article <0fd513f7-6f0c-6ed1-2119-6ce5313d4...@gmx.com>,
Kamil Rytarowski   wrote:
>I propose to add two new tools:
>
> - ioctlname
> - ioctldecode

I would call it ioctlprint and have:

ioctlprint [-f ] || arg

for the input arg can be:
name = TIOCFOO
expr = _IO?('x', y, z)
value = 0xfoobee

The format can be contain %e %n %v which expand to what you
think and defaults to:

"%n %e %v\n"

christos

Re: All (?) network tests failing

2020-03-30 Thread Christos Zoulas

Unfortunately they still work for me after a clean build. I am going to try to 
download
a standard build...

christos

> On Mar 30, 2020, at 3:35 PM, Christos Zoulas  wrote:
> 
> Signed PGP part
> Ok, let me start a clean build.
> 
> christos
> 
>> On Mar 30, 2020, at 2:36 PM, Andreas Gustafsson  wrote:
>> 
>> Christos Zoulas wrote:
>>> All the tests are failing for you the same way:
>>> 
>>> rump.route: SO_RERROR: Socket operation on non-socket
>>> 
>>> I doubt that my gif change affected that. This smells to me like the rump fd
>>> hijack is not
>>> working either because we have some new system call involved or something is
>>> messing
>>> up the file descriptors.  What is your build host?
>> 
>> The tests are failing on both the TNF testbed and my own.  The
>> respective OS versions are:
>> 
>> NetBSD babylon5.netbsd.org 8.1_STABLE NetBSD 8.1_STABLE (BABYLON5) #1: Fri 
>> Jan 24 21:50:18 UTC 2020  
>> s...@franklin.netbsd.org:/home/netbsd/8/amd64/obj/sys/arch/amd64/compile/BABYLON5
>>  amd64
>> NetBSD guido.araneus.fi 9.0 NetBSD 9.0 (GENERIC) #0: Fri Feb 14 00:06:28 UTC 
>> 2020  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC 
>> amd64
>> 
>> --
>> Andreas Gustafsson, g...@gson.org
> 
> 
> 



signature.asc
Description: Message signed with OpenPGP

Re: All (?) network tests failing

2020-03-30 Thread Christos Zoulas

Ok, let me start a clean build.

christos

> On Mar 30, 2020, at 2:36 PM, Andreas Gustafsson  wrote:
> 
> Christos Zoulas wrote:
>> All the tests are failing for you the same way:
>> 
>> rump.route: SO_RERROR: Socket operation on non-socket
>> 
>> I doubt that my gif change affected that. This smells to me like the rump fd
>> hijack is not
>> working either because we have some new system call involved or something is
>> messing
>> up the file descriptors.  What is your build host?
> 
> The tests are failing on both the TNF testbed and my own.  The
> respective OS versions are:
> 
>  NetBSD babylon5.netbsd.org 8.1_STABLE NetBSD 8.1_STABLE (BABYLON5) #1: Fri 
> Jan 24 21:50:18 UTC 2020  
> s...@franklin.netbsd.org:/home/netbsd/8/amd64/obj/sys/arch/amd64/compile/BABYLON5
>  amd64
>  NetBSD guido.araneus.fi 9.0 NetBSD 9.0 (GENERIC) #0: Fri Feb 14 00:06:28 UTC 
> 2020  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
> 
> --
> Andreas Gustafsson, g...@gson.org



signature.asc
Description: Message signed with OpenPGP

Re: All (?) network tests failing

2020-03-30 Thread Christos Zoulas


> 
>> 2. The gif related tests are failing because of a recent change to record 
>> mac addresses
>>I committed a fix for that.
> 
> Your fix didn't work; the gif tests are still failing with
> src/tests/net/net_common.sh 1.40:
> 
>  
> http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.03.30.13.01.39/test.html#net_if_gif_t_gif_gif_basic_ipv4overipv4
> 
>> 3. The rest of the tests (I've sampled 5 of them) don't fail for me.
> 
> If you do a full release build from scratch, install the release, and
> run the tests in the installed release like the testbeds do, I bet
> they will fail for you, too.

All the tests are failing for you the same way:
rump.route: SO_RERROR: Socket operation on non-socket
 <>I doubt that my gif change affected that. This smells to me like the rump fd 
hijack is not
working either because we have some new system call involved or something is 
messing
up the file descriptors.

What is your build host?
I am running the latest build I installed built from NetBSD/current to 
NetBSD/current.

christos



signature.asc
Description: Message signed with OpenPGP

Re: All (?) network tests failing

2020-03-30 Thread Christos Zoulas

I've been looking into this:
1. The libcrypto/bn test just needs more time
2. The gif related tests are failing because of a recent change to record mac 
addresses
I committed a fix for that.
3. The rest of the tests (I've sampled 5 of them) don't fail for me.

christos

> On Mar 30, 2020, at 8:50 AM, Martin Husemann  wrote:
> 
> On Mon, Mar 30, 2020 at 03:44:49PM +0300, Andreas Gustafsson wrote:
>> Martin Husemann wrote:
>>> -current just had a serious regression in test results, it seems like
>>> ~all networking tests are failing now:
>> 
>> Many (most?) of these have been failing for more than a week now, as
>> reported on current-users in
>> 
>>  http://mail-index.netbsd.org/current-users/2020/03/23/msg038127.html
>> 
>> which identified both the commits and the developer responsible for
>> the breakage.
> 
> Well, they did not fail for me, and I can't see how the openssl upgrade
> is related - there certainly is something strange ongoing (maybe a bug
> in the build system not triggering all rebuilds for update builds).
> 
> Martin



signature.asc
Description: Message signed with OpenPGP

Re: GSoC 2020 - The NetBSD Foundation - Emulating linux timer.. syscalls

2020-03-23 Thread Christos Zoulas

In article ,
Humayun Mulla   wrote:
>-=-=-=-=-=-
>
>Hi,
>
>My name is Humayun Mulla, I am a graduate student pursuing MS in Computer
>Science from State University of New York, Binghamton. I have a total work
>experience of 6 years of which 1 year as an Assistant Professor and 5 years
>in the software industry.
>My Github link - https://github.com/HumayunMulla & LinkedIn profile -
>https://www.linkedin.com/in/humayun-mulla-02719727/.
>
>I have taken an Operating System course in the previous semester and I have
>developed an interest in understanding how things work at such root level
>in the computer system. As a part of the course I have done different
>projects which vary from developing a shell that can execute all the
>commands given to it. In addition to that added some features to the
>program so that it can do I/O redirection and implemented inter-process
>communication. Also another assignment involved developing a kernel module
>that does 5-level page walk and printing the physical address of any given
>process ID. The last project that I did was to solve producer-consumer
>problem using a named pipe, i.e. by developing character device driver that
>can act as an interface between multiple producers & consumers.
>
>I know I am late to approach you but I taught of better to learn to walk
>first and then run. Therefore I spent the last couple of weeks in
>understanding and implementing Live VM Migration. And also understanding
>about how timer system calls work in Linux environment.
>
>I wanted to know if there are any pre-requisite tasks that I need to
>complete before submission of my application. I am looking forward to work
>under your mentor-ship on emulating linux timer.. syscalls using my
>knowledge and your guidance. Hoping to hear from you soon. Thank you for
>your time and consideration.

Hi Humayun,

I think that the best thing to do is to try installing NetBSD on either a VM
or real hardware and then become familiar with pkgsrc and install the linux
emulation packages (if you have not done so). ping me privately if you need
help with that.

Best,

christos

Routing socket code uses the wrong credentials for permissions checks

2020-03-12 Thread Christos Zoulas

Hi,

If I have a setuid process that opens a file for write that I don't have access 
to, then drop privileges and try to write to the file, I expect the write to 
succeed.
Similarly if I pass a file descriptor opened for write to another process that 
does not have access to the file, I expect that process to be able to write to 
the file.
This does not work properly with the routing socket because it uses the current 
lwp's credentials to do the checking. Here's the relevant code as a patch to fix
the issue (to use the socket credentials) and a test program to demonstrate the 
issue. I am planning to fix this soon, unless someone has a reason why not to.

Best,

christos

Index: rtsock_shared.c
===
RCS file: /cvsroot/src/sys/net/rtsock_shared.c,v
retrieving revision 1.16
diff -u -u -r1.16 rtsock_shared.c
--- rtsock_shared.c 12 Mar 2020 19:36:33 -  1.16
+++ rtsock_shared.c 12 Mar 2020 22:17:02 -
@@ -703,10 +703,10 @@
}

/*
-* Verify that the caller has the appropriate privilege; RTM_GET
+* Verify that the socket has the appropriate privilege; RTM_GET
 * is the only operation the non-superuser is allowed.
 */
-   if (kauth_authorize_network(curlwp->l_cred, KAUTH_NETWORK_ROUTE,
+   if (kauth_authorize_network(so->so_cred, KAUTH_NETWORK_ROUTE,
0, rtm, NULL, NULL) != 0)
senderr(EACCES);

[6:22pm] 10>cat ~/foo.c
#include 
#include 
#include 
#include 
#include 
#include 
#include 

int
checkrtmsg(int fd) {
/*
 * Send a routing message that is not supported to check for access
 */
static struct sockaddr_in sin = {
.sin_len = sizeof(sin),
.sin_family = AF_INET,
};
char buf[4096];
struct rt_msghdr *rtm = (void *)buf;
char *cp = (char *)(rtm + 1);
int l;

#define NEXTADDR(s) \
l = RT_ROUNDUP((s)->sin_len); memmove(cp, s, l); cp += l;
memset(buf, 0, sizeof(buf));
rtm->rtm_type = RTM_IFANNOUNCE;
rtm->rtm_flags = 0;
rtm->rtm_addrs = RTA_DST|RTA_GATEWAY;
rtm->rtm_version = RTM_VERSION;
rtm->rtm_seq = 666;
NEXTADDR();
NEXTADDR();
rtm->rtm_msglen = (char *)cp - (char *)rtm;
if (write(fd, rtm, rtm->rtm_msglen) == -1)
warn("write");
}

int main(void)
{
int fd = socket(PF_ROUTE, SOCK_RAW, 0);
setuid(getuid());
printf("running as: %u %u\n", (int)geteuid(), (int)getuid());
checkrtmsg(fd);
return 0;
}

With a fixed kernel:
# cc foo.c
# chmod u+s a.out
#^D
[6:24pm] 13>./a.out
running as: 10080 10080
a.out: write: Operation not supported

With a broken kernel:
[6:25pm] 25>./a.out
running as: 10080 10080
a.out: write: Permission denied




signature.asc
Description: Message signed with OpenPGP

Re: ALTQ Refactoring and NPF Integration Sponsorship

2020-02-25 Thread Christos Zoulas

On Feb 25,  1:36pm, sean.yeh...@gmail.com (Sean Yeh) wrote:
-- Subject: ALTQ Refactoring and NPF Integration Sponsorship

| Hi Christos,
| 
| I hope you are having a great week!
| 
| My name is Sean and I was hoping to contribute to the ALTQ Refactoring and
| NPF Integration project.
| 
| I graduated from University of California, San Diego (UCSD) as a
| Mathematics-Computer Science major in March 2019. The common thread tying
| my work experience together would be automating and optimizing processes,
| such as tutor notifications and shipping processes. I've attached my resume
| as well.
| 
| This seems like the perfect project as I recently finished reading and
| taking notes on TCP/IP Illustrated Vol. 1 and would like to further my
| networking knowledge.
| 
| I am currently able to contribute 40 hours a week. If you need anything
| else from me, please let me know. My phone number is (909)-610-0560.

Hi Sean,

I am glad you are interested in this project. I think that the first thing
to do is some research on the state of the art in traffic shaping software
and their integration with firewalls. Altq is an old piece of software and
it probably needs a complete re-write. Once you get a feel about what's
out there and what you would like to implement we can discuss it in this
forum and we can help you write a proposal.

Best,

christos

Re: NULL pointer arithmetic issues

2020-02-24 Thread Christos Zoulas

In article ,
Kamil Rytarowski   wrote:
>-=-=-=-=-=-
>-=-=-=-=-=-
>
>On 24.02.2020 13:41, Joerg Sonnenberger wrote:
>> On Mon, Feb 24, 2020 at 11:42:01AM +0100, Kamil Rytarowski wrote:
>>> Forbidding NULL pointer arithmetic is not just for C purists trolls. It
>>> is now in C++ mainstream and already in C2x draft.
>> 
>> This is not true. NULL pointer arithmetic and nullptr arithmetic are
>> *very* different things. Do not conflate them.
>> 
>> Joerg
>> 
>
>As noted, they are allowed to be practically the same in C++. The C
>proposal (n2394) NULL is marked as deprecated and NULL should be set to
>nullptr.

This is just a proposal; once it becomes part of the standard we
can worry about it. I agree with the rest of the people that we
should (for now) change these cases in the sanitizer to not produce
errors instead of making the code more complicated, to make the
sanitizer happy.

christos

Re: MAX_PAGE_SIZE for m68k (Re: CVS commit:src/sys/arch/arm/include/arm32)

2020-01-14 Thread Christos Zoulas

On Jan 14,  8:54am, thor...@me.com (Jason Thorpe) wrote:
-- Subject: Re: MAX_PAGE_SIZE for m68k (Re: CVS commit:src/sys/arch/arm/inclu

| 
| 
| > On Jan 14, 2020, at 8:41 AM, Izumi Tsutsui  wrot=
| e:
| >=20
| >> b) Modules should be built such that they can use a non-fixed PAGE_SIZE.=
| 
| >=20
| > No, this is not necessary, because modules are built for each $MACHINE
| > and (a) each $MACHINE has fixed PAGE_SIZE.
| 
| Yes, understood.  I think it would eventually be a nice idea to make module=
| s $MACHINE_ARCH-sharable, but it's not critical for now.

We do this to save space, but as you say, not important for now. Perhaps
the expedient solution is to define MALLOC_PAGE_SIZE...

christos

Re: MAX_PAGE_SIZE for m68k (Re: CVS commit: src/sys/arch/arm/include/arm32)

2020-01-13 Thread Christos Zoulas

On Jan 14,  1:15am, tsut...@ceres.dti.ne.jp (Izumi Tsutsui) wrote:
-- Subject: Re: MAX_PAGE_SIZE for m68k (Re: CVS commit: src/sys/arch/arm/incl

| christos@ wrote:
| 
| > >Now I get the following erro during local tests of
| > >"build.sh -U -m hp300 release" on NetBSD/i386 9.0_RC1 host:
| > >
| > >---
| > >#create  compat_util/compat_exec.d
|  :
| > >In file included from /s/cvs/src/sys/sys/param.h:149:0,
| > > from /s/cvs/src/sys/compat/common/compat_exec.c:41:
| > >./m68k/pmap_motorola.h:165:5: error: operator '*' has no left operand
| > > #if PAGE_SIZE == 8192 /* NBPG / (SG4_LEV1SIZE * sizeof(st_entry_t)) */
| > > ^
| > >nbmkdep: compile failed.
| > >*** [compat_exec.d] Error code 1
| > 
| > try cc -E?
| 
| It turns out the problem is more complicated.
| 
|  has the following definitions:
| 
| https://nxr.netbsd.org/xref/src/sys/uvm/uvm_param.h?r=1.38#135
| ---
| 135  * If MIN_PAGE_SIZE and MAX_PAGE_SIZE are not equal, then we must use
| 136  * non-constant PAGE_SIZE, et al for LKMs.
| 137  */
| 138 #if (MIN_PAGE_SIZE != MAX_PAGE_SIZE)
| 139 #define   __uvmexp_pagesize
| 140 #if defined(_LKM) || defined(_MODULE)
| 141 #undef PAGE_SIZE
| 142 #undef PAGE_MASK
| 143 #undef PAGE_SHIFT
| 144 #endif
| 145 #endif
| 146 
| 147 /*
| 148  * Now provide PAGE_SIZE, PAGE_MASK, and PAGE_SHIFT if we do not
| 149  * have ones that are compile-time constants.
| 150  */
| 151 #if !defined(PAGE_SIZE)
| 152 extern const int *const uvmexp_pagesize;
| 153 extern const int *const uvmexp_pagemask;
| 154 extern const int *const uvmexp_pageshift;
| 155 #define   PAGE_SIZE   (*uvmexp_pagesize)  /* size of page 
*/
| 156 #define   PAGE_MASK   (*uvmexp_pagemask)  /* size of page 
- 1 */
| 157 #define   PAGE_SHIFT  (*uvmexp_pageshift) /* bits to 
shift for pages */
| 158 #endif /* PAGE_SIZE */
| ---
| 
| I.e.  assumes PAGE_SIZE is not compile time constant
| for MODULEs if (MIN_PAGE_SIZE != MAX_PAGE_SIZE).
| 
| Probably this is the same reason of recent arm build failures:
|  https://releng.netbsd.org/builds/HEAD/202001130720Z/
|  https://releng.netbsd.org/builds/HEAD/202001130720Z/evbarm-earm.build.failed
| ---
| /tmp/genassym.Lq8h9d/assym.c:57:1: error: asm operand 0 probably doesn't 
match constraints [-Werror]
|  __asm("XYZZY VM_MIN_ADDRESS %0" : : "n" (VM_MIN_ADDRESS));
|  ^
| /tmp/genassym.Lq8h9d/assym.c:58:1: error: asm operand 0 probably doesn't 
match constraints [-Werror]
|  __asm("XYZZY VM_MAXUSER_ADDRESS %0" : : "n" (VM_MAXUSER_ADDRESS));
|  ^
| /tmp/genassym.Lq8h9d/assym.c:97:1: error: asm operand 0 probably doesn't 
match constraints [-Werror]
|  __asm("XYZZY PAGE_MASK %0" : : "n" (PAGE_MASK));
|  ^
| /tmp/genassym.Lq8h9d/assym.c:98:1: error: asm operand 0 probably doesn't 
match constraints [-Werror]
|  __asm("XYZZY PAGE_SIZE %0" : : "n" (PAGE_SIZE));
|  ^
| ---
| 
| Should we prepare independent constant for
| "possible pagesize value among different MACHINE with the same MACHINE_ARCH"
| for jemalloc(3)?

Ouch, this hurts. Sure, perhaps MALLOC_PAGE_SIZE?

christos

Re: [PATCH v5 3/4] Combine x86 register tests into unified test function

2019-12-24 Thread Christos Zoulas

In article <20191224114717.302420-3-mgo...@gentoo.org>,
MichaÅ  GÃ³rny   wrote:
>Reduce the code duplication and improve maintainability of x86 register
>tests by combining all of them to a single base function.
>---
> tests/lib/libc/sys/t_ptrace_amd64_wait.h |  406 +---
> tests/lib/libc/sys/t_ptrace_i386_wait.h  |  335 +--
> tests/lib/libc/sys/t_ptrace_x86_wait.h   | 2417 ++
> 3 files changed, 1103 insertions(+), 2055 deletions(-)

That's a good start, but I'd also start using loops for some of the
repetitive code for brevity.

christos

Re: [PATCH v4 2/4] Fix alignment when reading core notes

2019-12-23 Thread Christos Zoulas

In article <20191222164104.165346-2-mgo...@gentoo.org>,
MichaÅ  GÃ³rny   wrote:
>Both desc and note header needs to be aligned.  Therefore, we need
>to realign after skipping past desc as well.

Should probably be done with a standard alignment macro?

christos

>---
> tests/lib/libc/sys/t_ptrace_wait.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/tests/lib/libc/sys/t_ptrace_wait.c
>b/tests/lib/libc/sys/t_ptrace_wait.c
>index 6fc43572e015..1097a351fd92 100644
>--- a/tests/lib/libc/sys/t_ptrace_wait.c
>+++ b/tests/lib/libc/sys/t_ptrace_wait.c
>@@ -7603,6 +7603,9 @@ static ssize_t core_find_note(const char *core_path,
>   }
> 
>   offset += note_hdr.n_descsz;
>+  /* fix to alignment */
>+  offset = ((offset + core_hdr.p_align - 1)
>+  / core_hdr.p_align) * core_hdr.p_align;
>   }
>   }
> 
>-- 
>2.24.1
>
>

Re: [PATCH] i2c bus acquire / release cleanup, forbid i2c access in hard interrupt context

2019-12-19 Thread Christos Zoulas

In article ,
Jason Thorpe   wrote:
>-=-=-=-=-=-
>
>The i2c subsystem has a mechanism by which drivers for i2c-connected
>devices "acquire" and "release" the i2c bus for their own exclusive
>access while performing a set of operations.  Historically, it has been
>the responsibility of the back-end controller drivers to implement this
>mechanism, which generally speaking, has been implemented as a simple
>mutex.
>
>This has worked out fine, for the most part, but but one of the features
>of the i2c subsystem is that a driver that's using it can request that
>the operation run in a "polled" mode, which has historically meant
>"don't sleep, busy wait for completions, don't use interrupts".
>
>Unfortunately, the way this "polled" mode was implemented in some
>back-end drivers' acquire / release functions was wildly inconsistent,
>and in some cases, just plain broken.  During my audit, I found
>instances where drivers just didn't bother with the mutex at all... some
>would "trylock" ... some would busy-loop around a "trylock".
>
>In sort, it was kind of a hot mess.
>
>In one case (the "ihid" hid-over-i2c driver), a driver used "polled"
>mode in order to perform i2c bus access from a hard interrupt context. 
>When combined with some of the mistakes noted above, you've got recipe
>for deadlocks.
>
>My solution to this is two-fold:
>
>1- Fix the hid-over-i2c driver to not perform i2c access in hard
>interrupt context.  That required some additional capabilities from the
>ACPI interrupt API, which is chronicled here (note there are still some
>messages on that thread that are not on the web archive yet at the time
>I write this email, but they should land soon):
>
>   http://mail-index.netbsd.org/port-i386/2019/08/08/msg003804.html
>   (cross-posted to port-amd64 and port-i386)
>
>2- Centralize the logic for bus acquire / release.  This is now all
>performed in iic_acquire_bus() and iic_release_bus().  The I2C_F_POLL
>case is handled with a "trylock" operation, and EBUSY is returned if
>that fails.  The rationale is that basically NO ONE should be using
>polled mode.  It is used automatically by the i2c subsystem when the
>system is "cold", and honestly, I doubt anyone will be able to convince
>me that any other use of I2C_F_POLL is legitimate.  The acquire /
>release hooks for the back-end drivers remain in case they need to power
>on the controller or something like that, but the hook is now entirely
>optional just for those cases.
>
>Access to i2c bus in hard interrupt context is now forbidden.  Allowing
>it makes synchronization much harder and more expensive, and it's not
>something that can safely be done with all i2c controller back-ends
>(e.g. the type that might share registers with another type of device on
>the system).
>
>The end result -- there is a bunch of redundant (and incorrect) code
>that simply gets deleted.  And every driver has consistent behavior.
>
>Patch attached.  I've *at least* compile-tested these changes just about
>everywhere, and I have done pretty thorough testing of the MI i2c
>portions on a Raspberry Pi (where I also took the opportunity to rewrite
>the driver for the bcm2835 i2c controller to work as an interrupt-driven
>state machine rather than busy-waiting everywhere, which improved system
>responsiveness when transferring lots of data over i2c on a single-cpu
>Pi).  I also found some bugs in the Exynos and Tegra i2c controller
>drivers, but I don't have the hardware to test my fixes for those.
>

LGTM, thanks for doing this.

christos

Re: modules item #14 revisited

2019-12-07 Thread Christos Zoulas

On Dec 7,  8:55pm, a...@absd.org (David Brownlee) wrote:
-- Subject: Re: modules item #14 revisited

| Very much like this - would assume that modules.tgz goes away?

This is a good question. The problem is that if every kernel in a
distribution includes its own copy of modules, we'll end up bloating
the distribution a lot. For example on amd64 we distribute 4 kernels:

kern-GENERIC.tgz
kern-GENERIC_KASLR.tgz
kern-XEN3_DOM0.tgz  
kern-XEN3_DOMU.tgz  

Does each one gets a copy of the "appropriate" (since XEN needs different
modules)? Or do we have a modules.tgz and a modules.xen3.tgz to be unpacked
together with the kernel? And how is the unpacking done?

| Could logical extensions to this be:
| a) Allow including a miniroot as a separate file

I have not thought about that, but it should be doable.

| b) Use ustarfs to allow handling this layout of kernel, modules and/or
| miniroot as a (optionally compressed) tar file

Yes, I was thinking about booting from ustarfs... It will need more code
in boot...

christos

Re: modules item #14 revisited

2019-12-06 Thread Christos Zoulas

In article <20191207024224.1b0d417f...@rebar.astron.com>,
Christos Zoulas  wrote:
>
>Hi,
>
>This is a quick and dirty implementation of:
>
>http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html
>
>to use:
>$ echo KERNEL_DIR=yes >> /etc/mk.conf
># apply the enclosed patch
>$ mv /netbsd{,.old}
>$ mkdir -p /netbsd/modules

$ make && make install in src/share/mk

># build a new kernel and put it in /netbsd/kernel
>$ make && make install in src/sys/modules
>$ make && make install in sys/arch/i386/stand/boot

>$ cp /usr/mdec/boot /
>
>There are quite a few open issues:
>- All ports need to provide get_booted_kernel()
>  Are there ports where we can't get the booted kernel name? What do we do
>  there?
>- Make install in modules always overwrites the modules in /netbsd/
>- Does each kernel.tgz need to pack all the modules in the sets or
>  do we keep the modules.tgz that unpacks in /netbsd/modules?
>- How about the other kernels? do they unpack in /KERNEL-NAME/kernel?
>- Right now there is fallback code in the boot blocks for just the
>  kernel but not the old module path (this is intentional to keep things
>  simple). You can always boot a kernel by name from a file.
>
>christos
>
>Index: share/mk/bsd.kmodule.mk
>===
>RCS file: /cvsroot/src/share/mk/bsd.kmodule.mk,v
>retrieving revision 1.63
>diff -u -p -u -r1.63 bsd.kmodule.mk
>--- share/mk/bsd.kmodule.mk1 Dec 2019 20:24:47 -   1.63
>+++ share/mk/bsd.kmodule.mk7 Dec 2019 02:26:01 -
>@@ -172,7 +172,13 @@ ${PROG}: ${OBJS} ${DPADD} ${KMODSCRIPT}
> # Install rules
> .if !target(kmodinstall)
> .if !defined(KMODULEDIR)
>+.if ${KERNEL_DIR:Uno} == "yes"
>+KMODULEDIR=   ${DESTDIR}/netbsd/modules/${KMOD}
>+_INST_DIRS=   ${DESTDIR}/netbsd
>+_INST_DIRS+=  ${DESTDIR}/netbsd/modules
>+_INST_DIRS+=  ${DESTDIR}/netbsd/modules/${KMOD}
> _OSRELEASE!=  ${HOST_SH} $S/conf/osrelease.sh -k
>+.else
> # Ensure these are recorded properly in METALOG on unprived installes:
> KMODULEARCHDIR?= ${MACHINE}
> _INST_DIRS=   ${DESTDIR}/stand/${KMODULEARCHDIR}
>@@ -180,6 +186,7 @@ _INST_DIRS+=   ${DESTDIR}/stand/${KMODULEA
> _INST_DIRS+=  ${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules
> KMODULEDIR=   ${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules/${KMOD}
> .endif
>+.endif
> _PROG:=   ${KMODULEDIR}/${PROG} # installed path
> 
> .if ${MKUPDATE} == "no"
>Index: sys/arch/i386/stand/boot/Makefile.boot
>===
>RCS file: /cvsroot/src/sys/arch/i386/stand/boot/Makefile.boot,v
>retrieving revision 1.73
>diff -u -p -u -r1.73 Makefile.boot
>--- sys/arch/i386/stand/boot/Makefile.boot 13 Sep 2019 02:19:45 -  
>1.73
>+++ sys/arch/i386/stand/boot/Makefile.boot 7 Dec 2019 02:26:01 -
>@@ -53,6 +53,10 @@ CFLAGS+= -Wall -Wmissing-prototypes -Wst
> CPPFLAGS+= -nostdinc -D_STANDALONE
> CPPFLAGS+= -I$S
> 
>+.if ${KERNEL_DIR:Uno} == "yes"
>+CPPFLAGS+= -DKERNEL_DIR
>+.endif
>+
> CPPFLAGS+= -DSUPPORT_PS2
> CPPFLAGS+= -DDIRECT_SERIAL
> CPPFLAGS+= -DSUPPORT_SERIAL=boot_params.bp_consdev
>Index: sys/arch/i386/stand/boot/boot2.c
>===
>RCS file: /cvsroot/src/sys/arch/i386/stand/boot/boot2.c,v
>retrieving revision 1.72
>diff -u -p -u -r1.72 boot2.c
>--- sys/arch/i386/stand/boot/boot2.c   2 Sep 2019 06:10:24 -   1.72
>+++ sys/arch/i386/stand/boot/boot2.c   7 Dec 2019 02:26:01 -
>@@ -279,6 +279,12 @@ bootit(const char *filename, int howto)
>   if (howto & AB_VERBOSE)
>   printf("booting %s (howto 0x%x)\n", sprint_bootsel(filename),
>   howto);
>+#ifdef KERNEL_DIR
>+  char path[512];
>+  strcpy(path, filename);
>+  strcat(path, "/kernel");
>+  (void)exec_netbsd(path, 0, howto, boot_biosdev < 0x80, clearit);
>+#endif
> 
>   if (exec_netbsd(filename, 0, howto, boot_biosdev < 0x80, clearit) < 0)
>   printf("boot: %s: %s\n", sprint_bootsel(filename),
>Index: sys/arch/i386/stand/lib/exec.c
>===
>RCS file: /cvsroot/src/sys/arch/i386/stand/lib/exec.c,v
>retrieving revision 1.74
>diff -u -p -u -r1.74 exec.c
>--- sys/arch/i386/stand/lib/exec.c 13 Sep 2019 02:19:46 -  1.74
>+++ sys/arch/i386/stand/lib/exec.c 7 Dec 2019 02:26:01 -
>@@ -151,7 +151,7 @@ static voidmodule_add_common(const char
> static void   userconf_init(void);
> 
> static void   extract_device(

modules item #14 revisited

2019-12-06 Thread Christos Zoulas



Hi,

This is a quick and dirty implementation of:

http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html

to use:
$ echo KERNEL_DIR=yes >> /etc/mk.conf
# apply the enclosed patch
$ mv /netbsd{,.old}
$ mkdir -p /netbsd/modules
# build a new kernel and put it in /netbsd/kernel
$ make && make install in src/sys/modules
$ make && make install sys/arch/i386/stand/boot
$ cp /usr/mdec/boot /

There are quite a few open issues:
- All ports need to provide get_booted_kernel()
  Are there ports where we can't get the booted kernel name? What do we do
  there?
- Make install in modules always overwrites the modules in /netbsd/
- Does each kernel.tgz need to pack all the modules in the sets or
  do we keep the modules.tgz that unpacks in /netbsd/modules?
- How about the other kernels? do they unpack in /KERNEL-NAME/kernel?
- Right now there is fallback code in the boot blocks for just the
  kernel but not the old module path (this is intentional to keep things
  simple). You can always boot a kernel by name from a file.

christos

Index: share/mk/bsd.kmodule.mk
===
RCS file: /cvsroot/src/share/mk/bsd.kmodule.mk,v
retrieving revision 1.63
diff -u -p -u -r1.63 bsd.kmodule.mk
--- share/mk/bsd.kmodule.mk 1 Dec 2019 20:24:47 -   1.63
+++ share/mk/bsd.kmodule.mk 7 Dec 2019 02:26:01 -
@@ -172,7 +172,13 @@ ${PROG}: ${OBJS} ${DPADD} ${KMODSCRIPT}
 # Install rules
 .if !target(kmodinstall)
 .if !defined(KMODULEDIR)
+.if ${KERNEL_DIR:Uno} == "yes"
+KMODULEDIR=${DESTDIR}/netbsd/modules/${KMOD}
+_INST_DIRS=${DESTDIR}/netbsd
+_INST_DIRS+=   ${DESTDIR}/netbsd/modules
+_INST_DIRS+=   ${DESTDIR}/netbsd/modules/${KMOD}
 _OSRELEASE!=   ${HOST_SH} $S/conf/osrelease.sh -k
+.else
 # Ensure these are recorded properly in METALOG on unprived installes:
 KMODULEARCHDIR?= ${MACHINE}
 _INST_DIRS=${DESTDIR}/stand/${KMODULEARCHDIR}
@@ -180,6 +186,7 @@ _INST_DIRS+=${DESTDIR}/stand/${KMODULEA
 _INST_DIRS+=   ${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules
 KMODULEDIR=${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules/${KMOD}
 .endif
+.endif
 _PROG:=${KMODULEDIR}/${PROG} # installed path
 
 .if ${MKUPDATE} == "no"
Index: sys/arch/i386/stand/boot/Makefile.boot
===
RCS file: /cvsroot/src/sys/arch/i386/stand/boot/Makefile.boot,v
retrieving revision 1.73
diff -u -p -u -r1.73 Makefile.boot
--- sys/arch/i386/stand/boot/Makefile.boot  13 Sep 2019 02:19:45 -  
1.73
+++ sys/arch/i386/stand/boot/Makefile.boot  7 Dec 2019 02:26:01 -
@@ -53,6 +53,10 @@ CFLAGS+= -Wall -Wmissing-prototypes -Wst
 CPPFLAGS+= -nostdinc -D_STANDALONE
 CPPFLAGS+= -I$S
 
+.if ${KERNEL_DIR:Uno} == "yes"
+CPPFLAGS+= -DKERNEL_DIR
+.endif
+
 CPPFLAGS+= -DSUPPORT_PS2
 CPPFLAGS+= -DDIRECT_SERIAL
 CPPFLAGS+= -DSUPPORT_SERIAL=boot_params.bp_consdev
Index: sys/arch/i386/stand/boot/boot2.c
===
RCS file: /cvsroot/src/sys/arch/i386/stand/boot/boot2.c,v
retrieving revision 1.72
diff -u -p -u -r1.72 boot2.c
--- sys/arch/i386/stand/boot/boot2.c2 Sep 2019 06:10:24 -   1.72
+++ sys/arch/i386/stand/boot/boot2.c7 Dec 2019 02:26:01 -
@@ -279,6 +279,12 @@ bootit(const char *filename, int howto)
if (howto & AB_VERBOSE)
printf("booting %s (howto 0x%x)\n", sprint_bootsel(filename),
howto);
+#ifdef KERNEL_DIR
+   char path[512];
+   strcpy(path, filename);
+   strcat(path, "/kernel");
+   (void)exec_netbsd(path, 0, howto, boot_biosdev < 0x80, clearit);
+#endif
 
if (exec_netbsd(filename, 0, howto, boot_biosdev < 0x80, clearit) < 0)
printf("boot: %s: %s\n", sprint_bootsel(filename),
Index: sys/arch/i386/stand/lib/exec.c
===
RCS file: /cvsroot/src/sys/arch/i386/stand/lib/exec.c,v
retrieving revision 1.74
diff -u -p -u -r1.74 exec.c
--- sys/arch/i386/stand/lib/exec.c  13 Sep 2019 02:19:46 -  1.74
+++ sys/arch/i386/stand/lib/exec.c  7 Dec 2019 02:26:01 -
@@ -151,7 +151,7 @@ static void module_add_common(const char
 static voiduserconf_init(void);
 
 static voidextract_device(const char *, char *, size_t);
-static voidmodule_base_path(char *, size_t);
+static voidmodule_base_path(char *, size_t, const char *);
 static int module_open(boot_module_t *, int, const char *, const char *,
bool);
 
@@ -653,8 +653,18 @@ module_open(boot_module_t *bm, int mode,
 }
 
 static void
-module_base_path(char *buf, size_t bufsize)
+module_base_path(char *buf, size_t bufsize, const char *kernel_path)
 {
+#ifdef KERNEL_DIR
+   /* we cheat here, because %.* does not work with the mini printf */
+   char *ptr = strrchr(kernel_path, '/');
+   if (*ptr)
+   *ptr = '\0';
+
+

Re: usbhist support for urtwn

2019-11-25 Thread Christos Zoulas

In article <24027.34627.564698.872...@guava.gson.org>,
Andreas Gustafsson   wrote:
>Hi all,
>
>I have this patch to replace the debug printfs in sys/dev/usb/if_urtwn.c
>by usbhist calls, roughly modelled after the use of usbhist in if_axe.c:
>
>  https://www.gson.org/netbsd/patches/urtwn-usbhist.patch
>
>OK to commit?

I'd probably 0x%jx -> %#jx but otherwise LGTM.

christos

Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-23 Thread Christos Zoulas

In article ,
Christos Zoulas  wrote:
>In article <468095c0-b973-7f23-1cfa-3dc19e3b8...@gmail.com>,
>Rin Okuyama   wrote:
>>On 2019/11/22 10:56, Christos Zoulas wrote:
>>> In article <679493cf-3e85-f56d-85e4-dfaf7958a...@gmail.com>,
>>> Rin Okuyama   wrote:
>>...
>>>> This was my misunderstanding. These codes are used when tracer is 64-bit
>>>> and traced is 32-bit. Don't know whether this is useful though...
>>> 
>>> Yes, and someone broke them because all the ptrace 64->32 calls for
>>> registers seem to return 0 now. Was that code changed recently?
>>...
>>
>>I fixed it:
>>http://mail-index.netbsd.org/source-changes/2019/11/22/msg111068.html
>>
>>Now, gdb/amd64 seems to work for i386 binaries to some extent, at least.
>
>Thanks! I think that the gdb on head should working for i386 binaries
>on amd64. I am installing a new kernel and userland and I will test
>shortly.

And it works fine for both static and dynamic binaries. We could also add
kernel debugging support for i386 kernels, but that needs changes in
the i386 header files:

1. Everywhere s/#include

Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-21 Thread Christos Zoulas

In article <679493cf-3e85-f56d-85e4-dfaf7958a...@gmail.com>,
Rin Okuyama   wrote:
>On 2019/11/20 20:12, Rin Okuyama wrote:
>> On 2019/11/19 22:59, Kamil Rytarowski wrote:
>...
>>>
>>> We still miss compat32 support for PT_GETXMMREGS and PT_SETXMMREGS, at
>>> some point of time we shall add it for completeness.
>> 
>> Thank you!
>> 
>> With amd64/netbsd32_machdep.c rev 1.130, all tests in t_ptrace* pass in
>> COMPAT_NETBSD32 on amd64, except for that involved with XMM registers.
>> I will examine how to implement PT32_[GS]ETXMMREGS.
>
>I wrote a draft version of patch which adds PT32_[GS]ETXMMREGS support:
>
>http://www.netbsd.org/~rin/amd64-PT32_GSETXMMREGS-20191121.patch
>
>With this patch, XMM-related tests pass for COMPAT_NETBSD32 on amd64.
>
>Some remarks:
>
>(1) PT_[GS]ETXMMREGS ptrace(2) commands are added to .
> These are only used for COMPAT_NETBSD32, and not exposed to userland.
>
>(2) COMPAT_NETBSD32 codes are called from process_machdep.c via
> module_hook framework. This may be too much though...
>
>Comments?
>
>> Also, it seems that some COMPAT_NETBSD32 related codes for amd64 need to
>> be cleaned up. For example, there remain COMPAT_NETBSD32 codes in
>> amd64/process_machdep.c, that are no longer used:
>> 
>> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/process_machdep.c#129
>> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/process_machdep.c#183
>> ...
>> 
>> I will examine them too.
>
>This was my misunderstanding. These codes are used when tracer is 64-bit
>and traced is 32-bit. Don't know whether this is useful though...

Yes, and someone broke them because all the ptrace 64->32 calls for
registers seem to return 0 now. Was that code changed recently?

christos

[8:55pm] 1404>cat hello.c
#include 
#include 

int
main(void)
{
printf("hello world\n");
sleep(1);
return 0;
}
[8:56pm] 1405>cc -g -m32 hello.c
[8:56pm] 1406>gdb ./a.out
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...
(gdb) break main
Breakpoint 1 at 0x80488be: file hello.c, line 7.
(gdb) r
Starting program: /u/christos/a.out 

Program received signal SIGTRAP, Trace/breakpoint trap.
0x in ?? ()
(gdb)

Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-30 Thread Christos Zoulas

In article <201909301931.x8ujvvve028...@server.cornerstoneservice.ca>,
John Nemeth   wrote:
>On Sep 30,  1:06pm, Michael van Elst wrote:
>} On Mon, Sep 30, 2019 at 12:37:38AM -0700, John Nemeth wrote:
>} 
>} > BTW, modules.conf isn't read by the kernel, it's read by
>} > /etc/rc.d/modules.  Putting anything in there that would have a
>} > lasting effect (i.e. parameters for autoloaded modules) would
>} > require quite a bit of work.
>} 
>} You could just store the parameters in the kernel so that a future
>} autoload will use these instead of or merged with the plist.
>
> You could do this using the backend code for the proposed
>sysctl.  The question then is how do you know when the .plist is
>changed? Would you attach some kind of a kevent to it?  If so, then
>you need to track the source of the entry in the "blacklist".  If
>it came from the .plist, then upon receiving notification that it
>has changed, then you want to delete the entry.  However, if it
>came from userland via sysctl or is part of the default list, then
>you don't want to delete the entry just because the .plist changed.
>This is starting to get complicated with corner cases.

I am not sure what gets priority here. The command line arguments
in modules.conf or the plist entries (when both exist)? Also while
it is attractive to store the plist next to the module we currently
install 0 plists.  What is the expectation here? That one will use
modload to generate plists or that we are going to put the arguments
in modules.conf? I think that the plists were meant to be static
(as the arguments in modules.conf) and they would take effect only
during load time. The variables in the plists are also supposed to
be module-specific not global.

christos

Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-29 Thread Christos Zoulas

In article <20190929090053.g...@homeworld.netbsd.org>,
  wrote:
>On Sat, Sep 28, 2019 at 01:29:39AM -, Christos Zoulas wrote:
>> Here's what I've implemented:
>> 
>> kern.module.noautoload="compat_linux* compat_[0-4]?"
>> 
>> This disables autoload for all compat_linux modules as well
>> as compat_netbsd < NetBSD-5.0
>> 
>> Comments?
>> 
>> christos
>> 
>...
>> +"compat_linux",
>> +"compat_linux32",
>
>Kinda shocking that nowhere in this thread do we bring up the fact that
>compat_linux is not currently being auto-loaded (since a 2017 change by
>maxv).
>
>As for the actual change, I'd like to see it integrated through
>modules.conf, not via settings of default sysctl values. I think it's
>bad user experience.

modules.conf contains module names and their arguments. It is a configuration
file for each module. There are already sysctls in the kern.module. tree all
related to autoloading.

christos

Re: mstohz and hztoms

2019-09-28 Thread Christos Zoulas




> On Sep 28, 2019, at 6:53 PM, matthew green  wrote:
> 
>> Comments?
> 
> i like the clean up.  it's clearly a step forward.
> 
> i only don't understand why 32 bit platforms can't handle
> large values here but 64 bit ones can.  is it only so that the
> 32 bit platforms don't use 64 bit maths when it's not needed?

There was a comment about not using 64 bit math on 32
bit platforms, but I don't understand why. It is not like those
are being called too frequently. Of course I might be missing
something...

> it just seems wrong to me to limit 32 bit artificially here,
> and it's not like it's _that_ difficult to overflow 32 bit hz.
> i run with HZ=1000 on some systems, like alpha does by
> default.  that gives you 49 days.  even with standard HZ=100
> it's only 16 months or so.  (hmm, i wonder if these macros
> compile nothing with HZ=1000 kernels.  be nice to confirm or
> add a hack :-)
> 
> can we make the 32 bit version smarter about accepting small
> values with 32 bit maths, but large or non-constant values
> with 64 bit maths?
> 

Yes, now that they are inline functions (we can safely do more
complex things.

> perhaps an explicit mstohz64() could handle the cases if we
> know they will exist, since most probably _know_ they are
> dealing with short intervals.
> 
> there's just something about the artificial limit here that
> is bugging me...

I will take a look at all the uses and come back with a proposal.

christos

mstohz and hztoms

2019-09-28 Thread Christos Zoulas



Hi,

I was looking at the man page for hstohz and hztoms and their function
signatures say "int hztoms(int);" and "int mztohz(int);" where the
implementations use unsigned. We also have 64 bit specific implementations
only for sparc64 and amd64 and only for mstohz() and not for hztoms():

The following patch does the following:
1. documents the prototypes to take and return unsigned (should the 64
   bit ones be documented to return unsigned long or truncate)?
2. fixes the comparison to 0x2 to be unsigned like the other calculations
3. moves the 64 bit mstohz in  so all the 64 bit platforms
   can use it
4. adds an hztoms() 64 bit implementation
5. removes the port-specific mstohz() ones

Comments?

christos

Index: share/man/man9/mstohz.9
===
RCS file: /cvsroot/src/share/man/man9/mstohz.9,v
retrieving revision 1.11
diff -u -p -u -r1.11 mstohz.9
--- share/man/man9/mstohz.9 20 Oct 2011 10:36:42 -  1.11
+++ share/man/man9/mstohz.9 28 Sep 2019 12:47:21 -
@@ -24,7 +24,7 @@
 .\" SUCH DAMAGE.
 .\"
 .\"
-.Dd October 20, 2011
+.Dd September 28, 2019
 .Dt MSTOHZ 9
 .Os
 .Sh NAME
@@ -33,10 +33,10 @@
 .Nd convert between milliseconds and system clock ticks
 .Sh SYNOPSIS
 .In sys/param.h
-.Ft int
-.Fn mstohz "int ms"
-.Ft int
-.Fn hztoms "int hz"
+.Ft unsigned int
+.Fn mstohz "unsigned int ms"
+.Ft unsigned int
+.Fn hztoms "unsigned int hz"
 .Sh DESCRIPTION
 The
 .Fn mstohz
Index: sys/sys/param.h
===
RCS file: /cvsroot/src/sys/sys/param.h,v
retrieving revision 1.614
diff -u -p -u -r1.614 param.h
--- sys/sys/param.h 27 Sep 2019 00:32:03 -  1.614
+++ sys/sys/param.h 28 Sep 2019 12:47:21 -
@@ -488,21 +488,29 @@
 #ifdef _KERNEL
 /*
  * macro to convert from milliseconds to hz without integer overflow
- * Default version using only 32bits arithmetics.
- * 64bit port can define 64bit version in their 
- * 0x2 is safe for hz < 2
+ * The 32 bit version uses only 32bit arithmetic; 0x2 is safe for hz < 
2
+ * the 64 bit version does the computation directly.
  */
 #ifndef mstohz
-#define mstohz(ms) \
-   (__predict_false((ms) >= 0x2) ? \
-   ((ms +0u) / 1000u) * hz : \
-   ((ms +0u) * hz) / 1000u)
+# ifdef _LP64
+#  define mstohz(ms) ((ms + 0UL) * hz / 1000)
+# else
+#  define mstohz(ms) \
+   (__predict_false((ms + 0u) >= 0x2u) ? \
+   ((ms + 0u) / 1000u) * hz : \
+   ((ms + 0u) * hz) / 1000u)
+# endif
 #endif
+
 #ifndef hztoms
-#define hztoms(t) \
-   (__predict_false((t) >= 0x2) ? \
-   ((t +0u) / hz) * 1000u : \
-   ((t +0u) * 1000u) / hz)
+# ifdef _LP64
+#  define hztomz(t) (((t) + 0UL) * 1000u / hz)
+# else
+#  define hztoms(t) \
+   (__predict_false((t + 0u) >= 0x2u) ? \
+   ((t + 0u) / hz) * 1000u : \
+   ((t + 0u) * 1000u) / hz)
+# endif
 #endif
 
 #definehz2bintime(t)   (ms2bintime(hztoms(t)))
Index: sys/arch/amd64/include/param.h
===
RCS file: /cvsroot/src/sys/arch/amd64/include/param.h,v
retrieving revision 1.31
diff -u -p -u -r1.31 param.h
--- sys/arch/amd64/include/param.h  20 Aug 2019 12:33:04 -  1.31
+++ sys/arch/amd64/include/param.h  28 Sep 2019 12:47:21 -
@@ -126,8 +126,6 @@
 #define btop(x)x86_btop(x)
 #define ptob(x)x86_ptob(x)
 
-#define mstohz(ms) ((ms + 0UL) * hz / 1000)
-
 #else  /*  __x86_64__  */
 
 #include 
Index: sys/arch/sparc64/include/param.h
===
RCS file: /cvsroot/src/sys/arch/sparc64/include/param.h,v
retrieving revision 1.60
diff -u -p -u -r1.60 param.h
--- sys/arch/sparc64/include/param.h15 May 2019 16:59:10 -  1.60
+++ sys/arch/sparc64/include/param.h28 Sep 2019 12:47:21 -
@@ -229,11 +229,6 @@ extern voiddelay(unsigned int);
 #defineDELAY(n)delay(n)
 #endif /* __HIDE_DELAY */
 
-#ifdef __arch64__
-/* If we're using a 64-bit kernel use 64-bit math */
-#define mstohz(ms) ((ms + 0UL) * hz / 1000)
-#endif
-
 /* Keep this a const so compiler optimization is done */
 extern const int cputyp;

Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-27 Thread Christos Zoulas

In article <20190927125444.gb12...@pony.stderr.spb.ru>,
Valery Ushakov   wrote:
>
>May be we should take a look at how SNMP did tables in MIB, b/c we are
>trying to create just such a table indexed by module name.

I think it is simpler than that.

>
>Also, I'm not that sure about autoload of compat stuff especially
>since iirc it currently implies auto-unload too.  I vaguely remember
>when I was debugging something in sh3 kobj_machdep.c I had some
>printfs there that made the autoloads visibile and (iirc) each vi
>invocation would trigger an autoload of compat ioctl code (which
>wouldn't recognize the ioctl, and that would be auto-unloaded a few
>seconds later).
>
>-uwe


Here's what I've implemented:

kern.module.noautoload="compat_linux* compat_[0-4]?"

This disables autoload for all compat_linux modules as well
as compat_netbsd < NetBSD-5.0

Comments?

christos

Index: kern_exec.c
===
RCS file: /cvsroot/src/sys/kern/kern_exec.c,v
retrieving revision 1.481
diff -u -u -r1.481 kern_exec.c
--- kern_exec.c 17 Sep 2019 15:19:27 -  1.481
+++ kern_exec.c 28 Sep 2019 01:27:00 -
@@ -626,6 +626,8 @@
"exec_ecoff",
"compat_aoutm68k",
"compat_netbsd32",
+   "compat_linux",
+   "compat_linux32",
"compat_sunos",
"compat_sunos32",
"compat_ultrix",
Index: kern_module.c
===
RCS file: /cvsroot/src/sys/kern/kern_module.c,v
retrieving revision 1.138
diff -u -u -r1.138 kern_module.c
--- kern_module.c   8 Aug 2019 18:08:41 -   1.138
+++ kern_module.c   28 Sep 2019 01:27:00 -
@@ -53,10 +53,16 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
 
+#ifndef MODULE_NOAUTOLOAD
+// Disable compat_linux and compat_linux32 by default for now
+#define MODULE_NOAUTOLOAD "compat_linux*"
+#endif
+
 struct vm_map *module_map;
 const char *module_machine;
 char   module_base[MODULE_BASE_SIZE];
@@ -93,6 +99,7 @@
 u_int  module_gen = 1;
 static kcondvar_t module_thread_cv;
 static kmutex_t module_thread_lock;
+static kmutex_t module_noautoload_lock;
 static int module_thread_ticks;
 int (*module_load_vfs_vec)(const char *, int, bool, module_t *,
   prop_dictionary_t *) = (void *)eopnotsupp;
@@ -120,6 +127,8 @@
 static voidmodule_enqueue(module_t *);
 
 static boolmodule_merge_dicts(prop_dictionary_t, const prop_dictionary_t);
+static boolmodule_allow_autoload(const char *);
+static voidmodule_noautoload_update(const char *);
 
 static voidsysctl_module_setup(void);
 static int sysctl_module_autotime(SYSCTLFN_PROTO);
@@ -412,6 +421,7 @@
}
cv_init(_thread_cv, "mod_unld");
mutex_init(_thread_lock, MUTEX_DEFAULT, IPL_NONE);
+   mutex_init(_noautoload_lock, MUTEX_DEFAULT, IPL_NONE);
TAILQ_INIT();
 
 #ifdef MODULAR /* XXX */
@@ -444,6 +454,8 @@
module_netbsd = module_newmodule(MODULE_SOURCE_KERNEL);
module_netbsd->mod_refcnt = 1;
module_netbsd->mod_info = _netbsd_modinfo;
+
+   module_noautoload_update(MODULE_NOAUTOLOAD);
 }
 
 /*
@@ -503,6 +515,100 @@
return (0);
 }
 
+struct noautoload {
+   const char *name;
+   SLIST_ENTRY(noautoload) next;
+};
+
+static SLIST_HEAD(, noautoload) noautoload_list =
+SLIST_HEAD_INITIALIZER(noautoload_list);
+
+static char noautoload_buf[1024];
+static char noautoload_nbuf[sizeof(noautoload_buf)];
+
+static void
+module_noautoload_update(const char *cbuf)
+{
+   static const char SEP[] = " \t\n,";
+   struct noautoload *e, *te;
+   char buf[sizeof(noautoload_buf)];
+
+   strlcpy(buf, cbuf, sizeof(buf));
+
+   mutex_enter(_noautoload_lock);
+
+   SLIST_FOREACH_SAFE(e, _list, next, te) {
+   SLIST_REMOVE(_list, e, noautoload, next);
+   kmem_free(e, sizeof(*e));
+   }
+
+   noautoload_nbuf[0] = noautoload_buf[0] = '\0';
+
+   size_t pos = 0;
+   char *p, *str = buf;
+
+   while ((p = strsep(, SEP)) != NULL) {
+   size_t len = strlen(p);
+   if (len == 0)
+   break;
+   e = kmem_alloc(sizeof(*e), KM_SLEEP);
+   e->name = noautoload_nbuf + pos;
+   SLIST_INSERT_HEAD(_list, e, next);
+
+   memcpy(noautoload_buf + pos, p, len);
+   memcpy(noautoload_nbuf + pos, p, len);
+   pos += len;
+   noautoload_buf[pos] = ' ';
+   noautoload_nbuf[pos] = '\0';
+   pos++;
+   }
+
+   if (pos)
+   noautoload_buf[pos - 1] = '\0'; /* space to NUL */
+
+   mutex_exit(_noautoload_lock);
+}
+
+static int
+sysctl_module_noautoload(SYSCTLFN_ARGS)
+{
+   struct sysctlnode node;
+   int error;
+   char newbuf[sizeof(noautoload_buf)];
+
+

Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-27 Thread Christos Zoulas

In article <201909262031.x8qkvnpv021...@server.cornerstoneservice.ca>,
John Nemeth   wrote:
>On Sep 26,  7:40pm, Christos Zoulas wrote:
>} In article <390f4c81-bf1c-443f-f7a9-a379c46b7...@m00nbsd.net>,
>} Maxime Villard   wrote:
>} >I recently made a big set of changes to fix many bugs and vulnerabilities in
>} >compat_linux and compat_linux32, the majority of which have a security 
>impact
>} >bigger than the Intel CPU bugs we hear about so much. These compat layers 
>are
>} >enabled by default, so everybody is affected.
>} >
>} >Secteam is in a state where no one is willing to pull up all the changes to
>} >the stable branches, because of the effort. No one is willing to write a
>} >security advisory either. When I say "no one", it includes me.
>} >
>} >The proposal and discussion held in this 2017 thread still hold and are
>} >unchanged two years later:
>} >
>} >https://mail-index.netbsd.org/tech-kern/2017/07/31/msg022153.html
>} >
>} >The compat layers are largely untested, often broken, and are a security 
>risk
>} >for everybody. Keeping them enabled for the <1% users interested
>means keeping
>} >vulnerabilities for the >99% who don't use these features.
>} >
>} >In the conversation above, we hit the problem that there was 
>cross-dependency
>} >among compat modules, and we couldn't selectively disable specific layers.
>} >Today this is possible thanks to pgoyette's work. That is, it is possible to
>} >comment out "options COMPAT_LINUX" from GENERIC, and have a 
>compat_linux.kmod
>} >which will modload correctly and be able to run Linux binaries out of
>the box.
>} >Under this scheme, the feature would be only one root command away from 
>being
>} >enabled in the kernel.
>} >
>} >Therefore, I am making today the same proposal as Taylor in 2017, because 
>the
>} >problem is still there exactly as-is and we just hit it again; the solution
>} >however is more straightforward.
>} 
>} I propose something very slightly different that can preserve the current
>} functionality with user action:
>} 
>} 1. Remove them from standard kernels in architectures where modules are
>}supported. Users can add them back or just use modules.
>} 2. Disable autoloading, but provide a sysctl to enable autoloading
>}(1 global sysctl for all compat modules). Users can change the default
>}in /etc/sysctl.conf (adds sysctl to the proposal)
>
> You mean this (first line):
>
>i386devel: {31} sysctl kern.module
>kern.module.autoload = 0
>kern.module.verbose = 0
>kern.module.path = /stand/amd64-xen/8.99.26/modules
>kern.module.autotime = 10

Perhaps:

kern.module.autoload.disable = linux,linux32

christos

Re: fcntl(F_GETPATH) support

2019-09-17 Thread Christos Zoulas

In article ,
Jason Thorpe   wrote:
>
>
>> On Sep 17, 2019, at 1:39 PM, Christos Zoulas  wrote:
>> 
>> I am not sure though if we should change the current behavior just to make
>> F_GETPATH better? Opinions?
>
>It seems completely logical that we SHOULD fix this.

I concur. Specially because some filesystems already do it. These
are the only ones that need to be changed:

christos

Index: fs/msdosfs/msdosfs_vnops.c
===
RCS file: /cvsroot/src/sys/fs/msdosfs/msdosfs_vnops.c,v
retrieving revision 1.98
diff -u -u -r1.98 msdosfs_vnops.c
--- fs/msdosfs/msdosfs_vnops.c  26 Apr 2017 03:02:48 -  1.98
+++ fs/msdosfs/msdosfs_vnops.c  17 Sep 2019 22:40:01 -
@@ -153,6 +153,8 @@
goto bad;
VN_KNOTE(ap->a_dvp, NOTE_WRITE);
*ap->a_vpp = DETOV(dep);
+   cache_enter(ap->a_dvp, *ap->a_vpp, cnp->cn_nameptr, cnp->cn_namelen,
+   cnp->cn_flags);
return (0);
 
 bad:
Index: fs/tmpfs/tmpfs_subr.c
===
RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs_subr.c,v
retrieving revision 1.104
diff -u -u -r1.104 tmpfs_subr.c
--- fs/tmpfs/tmpfs_subr.c   1 Jan 2019 10:06:54 -   1.104
+++ fs/tmpfs/tmpfs_subr.c   17 Sep 2019 22:40:01 -
@@ -434,6 +434,7 @@
 
VOP_UNLOCK(*vpp);
 
+   cache_enter(dvp, *vpp, cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);
return 0;
 }
 
Index: fs/udf/udf_subr.c
===
RCS file: /cvsroot/src/sys/fs/udf/udf_subr.c,v
retrieving revision 1.146
diff -u -u -r1.146 udf_subr.c
--- fs/udf/udf_subr.c   3 Jun 2019 06:04:20 -   1.146
+++ fs/udf/udf_subr.c   17 Sep 2019 22:40:01 -
@@ -5963,6 +5963,7 @@
/* adjust file count */
udf_adjust_filecount(udf_node, 1);
 
+   cache_enter(dvp, *vpp, cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);
return 0;
 }
 
Index: ufs/chfs/chfs_vnode.c
===
RCS file: /cvsroot/src/sys/ufs/chfs/chfs_vnode.c,v
retrieving revision 1.15
diff -u -u -r1.15 chfs_vnode.c
--- ufs/chfs/chfs_vnode.c   1 Apr 2017 19:35:57 -   1.15
+++ ufs/chfs/chfs_vnode.c   17 Sep 2019 22:40:01 -
@@ -310,6 +310,8 @@
 
VOP_UNLOCK(vp);
*vpp = vp;
+   cache_enter(pdir, *vpp, cnp->cn_nameptr, cnp->cn_namelen,
+   cnp->cn_flags);
return (0);
 }
 
Index: ufs/ext2fs/ext2fs_vnops.c
===
RCS file: /cvsroot/src/sys/ufs/ext2fs/ext2fs_vnops.c,v
retrieving revision 1.129
diff -u -u -r1.129 ext2fs_vnops.c
--- ufs/ext2fs/ext2fs_vnops.c   1 Jan 2019 10:06:55 -   1.129
+++ ufs/ext2fs/ext2fs_vnops.c   17 Sep 2019 22:40:01 -
@@ -1045,6 +1045,7 @@
}
 
*vpp = tvp;
+   cache_enter(dvp, *vpp, cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);
return 0;
 
 bad:
Index: ufs/lfs/lfs_vnops.c
===
RCS file: /cvsroot/src/sys/ufs/lfs/lfs_vnops.c,v
retrieving revision 1.324
diff -u -u -r1.324 lfs_vnops.c
--- ufs/lfs/lfs_vnops.c 20 Jun 2019 00:49:11 -  1.324
+++ ufs/lfs/lfs_vnops.c 17 Sep 2019 22:40:01 -
@@ -405,6 +405,7 @@
if (error)
goto bad;
*vpp = tvp;
+   cache_enter(dvp, *vpp, cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);
KASSERT(VOP_ISLOCKED(*vpp) == LK_EXCLUSIVE);
return (0);
 
Index: ufs/ufs/ufs_vnops.c
===
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.247
diff -u -u -r1.247 ufs_vnops.c
--- ufs/ufs/ufs_vnops.c 1 Jul 2019 00:57:06 -   1.247
+++ ufs/ufs/ufs_vnops.c 17 Sep 2019 22:40:01 -
@@ -1909,6 +1909,7 @@
if (error)
goto bad;
*vpp = tvp;
+   cache_enter(dvp, *vpp, cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);
return (0);
 
  bad:

Re: fcntl(F_GETPATH) support

2019-09-17 Thread Christos Zoulas

In article <4e7e49e0-9c71-1f21-22fc-8ed54590a...@gmx.com>,
Kamil Rytarowski   wrote:
>-=-=-=-=-=-
>-=-=-=-=-=-
>
>On 15.09.2019 20:03, Christos Zoulas wrote:
>> I think it is quite reliable because all the file descriptors would be
>recently
>> opened and therefore be in the cache.  One would need to DoS the cache
>> cause eviction. If that turns out to be false, we can make the namecache
>> reliable, or withdraw it.
>
>As discussed with Mateusz Guzik offlist, the dentry approach looks
>reliable and as a way to go.
>
>It changes fd -> vnode -> inode to fd -> dentry -> vnode.
>
>We could switch from catching program name on exec to proper pathname
>resolution with KERN_PROC_PATHNAME.
>
>Certain programs require always correct path to be resolved from
>hardlinks, not the last one from the cache. This used to affect LLVM.
>
>There is also a hole in the current namecache implementation as it
>misses entry for newly created files (as informed by Mateusz). Example:
>
>#include 
>#include 
>#include 
>#include 
>#include 
>
>int
>main(void)
>{
>char buf[1024];
>int fd;
>
>fd = open("/tmp/test", O_RDWR|O_CREAT|O_EXCL,0600);
>if (fd == -1)
>err(1, "open");
>if (fcntl(fd, F_GETPATH, buf) < 0)
>err(1, "fcntl");
>printf("[%s]\n", buf);
>}
>
>For the time being, the current code is still an improvement over the
>past state.

Well, I did some experiments by adding a cache_enter call in ufs inode
creation. This fixes the above case, and makes the cache more logically
consistent i.e. we usually create files we are going to access, so we
might as well cache them (consider a build scenario and object files,
or even binaries that will get installed). This is a one line change
per filesystem:

Index: ufs_vnops.c
===
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_vnops.c,v
retrieving revision 1.247
diff -u -p -u -r1.247 ufs_vnops.c
--- ufs_vnops.c 1 Jul 2019 00:57:06 -   1.247
+++ ufs_vnops.c 17 Sep 2019 20:32:48 -
@@ -1909,6 +1909,7 @@ ufs_makeinode(struct vattr *vap, struct 
if (error)
goto bad;
*vpp = tvp;
+   cache_enter(dvp, *vpp, cnp->cn_nameptr, cnp->cn_namelen, cnp->cn_flags);
return (0);
 
  bad:

The above change does not seem to affect build times...

no cache
build.sh started:Mon Sep 16 13:11:04 EDT 2019
build.sh ended:  Mon Sep 16 15:27:41 EDT 2019
2:16:37

cache
build.sh started:Mon Sep 16 16:02:18 EDT 2019
build.sh ended:  Mon Sep 16 18:18:23 EDT 2019
2:16:15

I don't know what made the builds slower the second day... We should
investigate why builds on a freshly booted machines are faster!

cache
build.sh started:Tue Sep 17 11:05:35 EDT 2019
build.sh ended:  Tue Sep 17 13:37:33 EDT 2019
2:31:58

no cache
build.sh started:Tue Sep 17 14:01:14 EDT 2019
build.sh ended:  Tue Sep 17 16:29:17 EDT 2019
2:28:03

I am not sure though if we should change the current behavior just to make
F_GETPATH better? Opinions?

christos

Re: fcntl(F_GETPATH) support

2019-09-15 Thread Christos Zoulas




> On Sep 15, 2019, at 12:43 PM, Mateusz Guzik  wrote:
> 
> On 9/14/19, Christos Zoulas  wrote:
>> 
>> Comments?
>> 
>> +error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc);
> 
> What motivates this change?
> 
> I think it is a little problematic in that namecache resolution is not
> guaranteed to work. If the consumer does not check for failure or fallback
> to something else the user is going to experience "once in the blue moon"
> failures.
> 
> For example clang likes to get full paths to names. On Linux it can
> trivially do it thanks to dentry cache. On other systems there is a
> compilation check for F_GETPATH. If none of this is present it goes for
> realpath, which is slover but basically works. If F_GETPATH is present,
> it is always used and there is no fallback if it fails. Then you risk
> setting yourself up for a weird compilation failure, which may not even
> repeat itself after you retry.
> 
> All in all I don't think this should be added unless namecache becomes
> reliable. The above reasoning is why I did not add it to FreeBSD.
> 

I think it is quite reliable because all the file descriptors would be recently
opened and therefore be in the cache.  One would need to DoS the cache
cause eviction. If that turns out to be false, we can make the namecache
reliable, or withdraw it.

christos

Re: more fexecve questions

2019-09-14 Thread Christos Zoulas

In article <20190910145247.c52cc17f...@rebar.astron.com>,
Christos Zoulas  wrote:

1. So the consensus is to leave the file descriptor alone.
2. I've added the reverse cache lookup, and now script execution works.
3. Nobody suggested anything to improve security.

Here's the latest patch; if I don't hear objections soon I will commit it...

Index: compat/netbsd32/netbsd32_execve.c
===
RCS file: /cvsroot/src/sys/compat/netbsd32/netbsd32_execve.c,v
retrieving revision 1.39
diff -u -p -u -r1.39 netbsd32_execve.c
--- compat/netbsd32/netbsd32_execve.c   3 Sep 2018 16:29:29 -   1.39
+++ compat/netbsd32/netbsd32_execve.c   15 Sep 2019 00:14:10 -
@@ -71,9 +71,8 @@ netbsd32_execve(struct lwp *l, const str
syscallarg(netbsd32_charpp) argp;
syscallarg(netbsd32_charpp) envp;
} */
-   const char *path = SCARG_P32(uap, path);
 
-   return execve1(l, path, SCARG_P32(uap, argp),
+   return execve1(l, SCARG_P32(uap, path), -1, SCARG_P32(uap, argp),
SCARG_P32(uap, envp), netbsd32_execve_fetch_element);
 }
 
@@ -86,13 +85,9 @@ netbsd32_fexecve(struct lwp *l, const st
syscallarg(netbsd32_charpp) argp;
syscallarg(netbsd32_charpp) envp;
} */
-   struct sys_fexecve_args ua;
 
-   NETBSD32TO64_UAP(fd);
-   NETBSD32TOP_UAP(argp, char * const);
-   NETBSD32TOP_UAP(envp, char * const);
-
-   return sys_fexecve(l, , retval);
+   return execve1(l, NULL, SCARG(uap, fd), SCARG_P32(uap, argp),
+   SCARG_P32(uap, envp), netbsd32_execve_fetch_element);
 }
 
 static int
Index: kern/exec_elf.c
===
RCS file: /cvsroot/src/sys/kern/exec_elf.c,v
retrieving revision 1.98
diff -u -p -u -r1.98 exec_elf.c
--- kern/exec_elf.c 7 Jun 2019 23:35:52 -   1.98
+++ kern/exec_elf.c 15 Sep 2019 00:14:10 -
@@ -157,6 +157,7 @@ elf_populate_auxv(struct lwp *l, struct 
size_t len, vlen;
AuxInfo ai[ELF_AUX_ENTRIES], *a, *execname;
struct elf_args *ap;
+   char *path = l->l_proc->p_path;
int error;
 
a = ai;
@@ -224,9 +225,11 @@ elf_populate_auxv(struct lwp *l, struct 
a->a_v = l->l_proc->p_stackbase;
a++;
 
-   execname = a;
-   a->a_type = AT_SUN_EXECNAME;
-   a++;
+   if (path[0] == '/' && path[1] != '\0') {
+   execname = a;
+   a->a_type = AT_SUN_EXECNAME;
+   a++;
+   }
 
exec_free_emul_arg(pack);
} else {
@@ -242,7 +245,6 @@ elf_populate_auxv(struct lwp *l, struct 
KASSERT(vlen <= sizeof(ai));
 
if (execname) {
-   char *path = l->l_proc->p_path;
execname->a_v = (uintptr_t)(*stackp + vlen);
len = strlen(path) + 1;
if ((error = copyout(path, (*stackp + vlen), len)) != 0)
Index: kern/exec_script.c
===
RCS file: /cvsroot/src/sys/kern/exec_script.c,v
retrieving revision 1.79
diff -u -p -u -r1.79 exec_script.c
--- kern/exec_script.c  27 Jan 2019 02:08:43 -  1.79
+++ kern/exec_script.c  15 Sep 2019 00:14:10 -
@@ -290,7 +290,7 @@ check_shell:
/* try loading the interpreter */
if ((error = exec_makepathbuf(l, shellname, UIO_SYSSPACE,
_pathbuf, NULL)) == 0) {
-   error = check_exec(l, epp, shell_pathbuf);
+   error = check_exec(l, epp, shell_pathbuf, NULL);
pathbuf_destroy(shell_pathbuf);
}
 
Index: kern/kern_exec.c
===
RCS file: /cvsroot/src/sys/kern/kern_exec.c,v
retrieving revision 1.479
diff -u -p -u -r1.479 kern_exec.c
--- kern/kern_exec.c7 Sep 2019 15:34:44 -   1.479
+++ kern/kern_exec.c15 Sep 2019 00:14:10 -
@@ -257,7 +257,7 @@ struct execve_data {
struct ps_strings   ed_arginfo;
char*ed_argp;
const char  *ed_pathstring;
-   char*ed_resolvedpathbuf;
+   char*ed_resolvedname;
size_t  ed_ps_strings_sz;
int ed_szsigcode;
size_t  ed_argslen;
@@ -309,9 +309,32 @@ exec_path_free(struct execve_data *data)
 {  
pathbuf_stringcopy_put(data->ed_pathbuf, data->ed_pathstring);
pathbuf_destroy(data->ed_pathbuf);
-   PNBUF_PUT(data->ed_resolvedpathbuf);
+   if (data->ed_resolvedname)
+   PNBUF_PUT(data->ed_resolvedname);
 }
 
+static void
+exec_resolvename(struct lwp *l, struct exec_package *epp, struct vnode *vp,
+char **rpath)
+{
+   int error;
+   char *

Re: fcntl(F_GETPATH) support

2019-09-14 Thread Christos Zoulas

In article <2f29ca9a-0ae1-48d3-b3f4-1556912d4...@me.com>,
Jason Thorpe   wrote:
>
>
>> On Sep 14, 2019, at 2:52 PM, Kamil Rytarowski  wrote:
>> 
>> On 14.09.2019 23:34, Christos Zoulas wrote:
>>> Comments?
>>> 
>> 
>> Question. How does it handle symbolic/hardlinks?
>
>Symbolic links, of course, are simply continuations of the lookup, so
>there's "nothing to see here" with a symbolic link.

If you can get a file desriptor to a symlink, it will work; I don't think
that we have a way to do this now.

>I looked at cache_revlookup() and it looks to me like it simply picks
>the first hit it finds in the hash table for the given vnode.

That is correct.

>At least one platform that support this API uses "the last name the file
>was looked up with", and will fall back on "whatever the underlying file
>system considers to be the canonical name for the file", which is file
>system-defined, if for some reason there is no entry in the name cache
>(which could, for example, happen if an application is doing an
>"open-by-file-id" type operation and the file has not yet been opened by
>path name).

It will also fail if the entry has been evicted from the cache, but that
rarely happens with recently opened files.

christos

fcntl(F_GETPATH) support

2019-09-14 Thread Christos Zoulas



Comments?

christos


Index: kern/sys_descrip.c
===
RCS file: /cvsroot/src/sys/kern/sys_descrip.c,v
retrieving revision 1.34
diff -u -p -u -r1.34 sys_descrip.c
--- kern/sys_descrip.c  26 Aug 2019 10:19:08 -  1.34
+++ kern/sys_descrip.c  14 Sep 2019 21:33:29 -
@@ -315,6 +312,28 @@ do_fcntl_lock(int fd, int cmd, struct fl
return error;
 }
 
+static int
+do_fcntl_getpath(struct lwp *l, file_t *fp, char *upath)
+{
+   char *kpath;
+   int error;
+
+   if (fp->f_type != DTYPE_VNODE)
+   return EOPNOTSUPP;
+
+   kpath = PNBUF_GET();
+
+   error = vnode_to_path(kpath, MAXPATHLEN, fp->f_vnode, l, l->l_proc);
+   if (error)
+   goto out;
+
+   error = copyoutstr(kpath, upath, MAXPATHLEN, NULL);
+out:
+   PNBUF_PUT(kpath);
+
+   return error;
+}
+   
 /*
  * The file control system call.
  */
@@ -463,6 +482,10 @@ sys_fcntl(struct lwp *l, const struct sy
error = (*fp->f_ops->fo_ioctl)(fp, FIOSETOWN, );
break;
 
+   case F_GETPATH:
+   error = do_fcntl_getpath(l, fp, SCARG(uap, arg));
+   break;
+
default:
error = EINVAL;
}
Index: sys/fcntl.h
===
RCS file: /cvsroot/src/sys/sys/fcntl.h,v
retrieving revision 1.50
diff -u -p -u -r1.50 fcntl.h
--- sys/fcntl.h 20 Feb 2018 18:20:05 -  1.50
+++ sys/fcntl.h 14 Sep 2019 21:33:29 -
@@ -193,6 +195,7 @@
 #defineF_DUPFD_CLOEXEC 12  /* close on exec duplicated fd 
*/
 #defineF_GETNOSIGPIPE  13  /* get SIGPIPE disposition */
 #defineF_SETNOSIGPIPE  14  /* set SIGPIPE disposition */
+#defineF_GETPATH   15  /* get pathname assosiated with 
file */
 #endif
 
 /* file descriptor flags (F_GETFD, F_SETFD) */

Re: NCHNAMLEN vnode cache limitation removal

2019-09-13 Thread Christos Zoulas

In article <20190913180602.gb20...@netbsd.org>,
David Holland   wrote:
>On Wed, Sep 11, 2019 at 03:53:18PM +0700, Robert Elz wrote:
> > What's more interesting to me is to know just how many long names people
> > are seeing which are currently excluded from the cache, and would benefit
> > from the change - that is, what percentage of all lookups fail in the
> > cache for that reason, and do we have a histogram of the longer lengths
> > with their frequencies (that is, would simply making the cutoff a little
> > bigger improve things without requiring mem allocation to be added).
>
>The goal is to not arbitrarily exclude names from the cache at all,
>because fexecve is causing rumbles about doing significantly more
>reverse lookups.

I am going to run more tests with a bigger cache, but if you run:

cc https://www.netbsd.org/~christos/countlen.c
./a.out /usr/src/ /usr/obj/ /usr/xsrc/

you'll see that there are still lots of names now > 32, so perhaps
making NCHNAMLEN 40 makes more sense this days (rather than 31
which was chosen ~40 years ago). Seems that people like longer
names as time goes forward. Also the way the code is now we can
make NCHNAMLEN a variable and use sysctl to change it (flushing
the cache), for quicker experiments that don't need recompiling.

But now we are not excluding anything; it is just that names over
NCHNAMLEN use kmem_alloc instead of pooled storage...

Best,

christos

1 2 3 4 5 6 7 8 >

1 - 100 of 707 matches

Mail list logo