Re: bounty for virtio 1.0 (now with instructions!)

2020-11-03 Thread Kamil Rytarowski
On 03.11.2020 23:20, co...@sdf.org wrote:
> On Tue, Nov 03, 2020 at 10:42:27PM +0100, Martin Husemann wrote:
>> On Tue, Nov 03, 2020 at 10:23:30PM +0100, Reinoud Zandijk wrote:
>>> To be clear, do we want to (keep) supporting legacy devices? Its not 
>>> required
>>> in 1.0 and could clean up the code a lot!
>>
>> Yes, we need that still, as not all hosts offer the newer ones.
> 
> The QEMU people mentioned that even if "legacy virtio" IDs are used,
> there's a bit to show that it's compatible with newer virtio.
> Things that claim old virtio probably do both old & new.
> 

"Transitional virtio" supports both, 1.0 and 0.9. I had a draft work
with the upgrade of virtio, but incomplete..



signature.asc
Description: OpenPGP digital signature


Re: make COMPAT_LINUX match SYSV binaries

2020-10-21 Thread Kamil Rytarowski
On 21.10.2020 14:14, co...@sdf.org wrote:
> On Tue, Oct 20, 2020 at 07:11:05PM +, co...@sdf.org wrote:
>> hello,
>>
>> As a background, some Linux binaries don't claim to be targeting the
>> Linux OS, but instead are "SYSV".
>>
>> We have used some heuristics to still identify those binaries as being
>> Linux binaries, like looking into the symbols defined by the binary.
>>
>> it looks like we no longer have other forms of compat expected to use
>> SYSV ELF binaries. Perhaps we should drop this elaborate detection logic
>> in favour of detecting SYSV == Linux?
>>
>> As an added bonus, it allows detecting binaries built with a musl
>> toolchain as being Linux binaries.
>>
> 
> I feel compelled to explain further:
> any OS that doesn't rely on this tag is prone to spitting out binaries
> with the wrong tag. For example, Go spits out Solaris binaries with SYSV
> as well.
> 
> Our current solution to it is the kernel reading through the binary,
> checking if it contains certain known symbols that are common on Linux.
> 
> We support the following forms of compat:
> 
> ultrixnot ELF
> sunos not ELF (we support only oold stuff)
> freebsd   always correctly tagged, because the native OS
>   checks this, like we do.
> linux ELF, not always correctly tagged
> 
> 
> So, currently, we only support one OS that has this problem, which is
> linux. I am proposing we take advantage of it.
> 
> In the event someone adds support for another OS with this problem (say,
> modern Solaris), I don't expect this compat to be enabled by default,
> for security reasons. So the problem will only occur if a user enables
> both forms of compat at the same time.
> 
> Users already have to opt in to have Linux compat support. I think it is
> a lot to ask to have them tag every binary.
> 

I couldn't run musl binaries without either patching the kernel or ELF
files, so I'm for making this easier.

In my case, I had to add manually build-id tag to musl binaries. For
some reason someone in the kernel assumed that they are always present,
which is just a special case in some distros.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] Issue 64-bit versions of *XSAVE* for 64-bit amd64 programs

2020-10-17 Thread Kamil Rytarowski
The same bug was fixed in FreeBSD here:

"amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE."
https://github.com/freebsd/freebsd/commit/62a9018a8878533432500e5cb89f9bd07fd9ef14

Kamil Rytarowski
CTO, Moritz Systems
www.moritz.systems

pt., 16 paź 2020 o 15:26 Michał Górny  napisał(a):
>
> When calling FXSAVE, XSAVE, FXRSTOR, ... for 64-bit programs on amd64
> use the 64-suffixed variant in order to include the complete FIP/FDP
> registers in the x87 area.
>
> The difference between the two variants is that the FXSAVE64 (new)
> variant represents FIP/FDP as 64-bit fields (union fp_addr.fa_64),
> while the legacy FXSAVE variant uses split fields: 32-bit offset,
> 16-bit segment and 16-bit reserved field (union fp_addr.fa_32).
> The latter implies that the actual addresses are truncated to 32 bits
> which is insufficient in modern programs.
>
> The change is applied only to 64-bit programs on amd64.  Plain i386
> and compat32 continue using plain FXSAVE.  Similarly, NVMM is not
> changed as I am not familiar with that code.
>
> This is a potentially breaking change.  However, I don't think it likely
> to actually break anything because the data provided by the old variant
> were not meaningful (because of the truncated pointer).
> ---
>  sys/arch/x86/include/cpufunc.h | 76 ++
>  sys/arch/x86/include/fpu.h |  4 +-
>  sys/arch/x86/x86/fpu.c | 30 ++
>  sys/dev/nvmm/x86/nvmm_x86_svm.c|  6 +-
>  sys/dev/nvmm/x86/nvmm_x86_vmx.c|  6 +-
>  tests/lib/libc/sys/t_ptrace_x86_wait.h |  2 -
>  6 files changed, 105 insertions(+), 19 deletions(-)
>
> diff --git a/sys/arch/x86/include/cpufunc.h b/sys/arch/x86/include/cpufunc.h
> index 38534c305544..dd8b0c7dc375 100644
> --- a/sys/arch/x86/include/cpufunc.h
> +++ b/sys/arch/x86/include/cpufunc.h
> @@ -485,6 +485,82 @@ xrstor(const void *addr, uint64_t mask)
> );
>  }
>
> +#ifdef __x86_64__
> +static inline void
> +fxsave64(void *addr)
> +{
> +   uint8_t *area = addr;
> +
> +   __asm volatile (
> +   "fxsave64   %[area]"
> +   : [area] "=m" (*area)
> +   :
> +   : "memory"
> +   );
> +}
> +
> +static inline void
> +fxrstor64(const void *addr)
> +{
> +   const uint8_t *area = addr;
> +
> +   __asm volatile (
> +   "fxrstor64 %[area]"
> +   :
> +   : [area] "m" (*area)
> +   : "memory"
> +   );
> +}
> +
> +static inline void
> +xsave64(void *addr, uint64_t mask)
> +{
> +   uint8_t *area = addr;
> +   uint32_t low, high;
> +
> +   low = mask;
> +   high = mask >> 32;
> +   __asm volatile (
> +   "xsave64%[area]"
> +   : [area] "=m" (*area)
> +   : "a" (low), "d" (high)
> +   : "memory"
> +   );
> +}
> +
> +static inline void
> +xsaveopt64(void *addr, uint64_t mask)
> +{
> +   uint8_t *area = addr;
> +   uint32_t low, high;
> +
> +   low = mask;
> +   high = mask >> 32;
> +   __asm volatile (
> +   "xsaveopt64 %[area]"
> +   : [area] "=m" (*area)
> +   : "a" (low), "d" (high)
> +   : "memory"
> +   );
> +}
> +
> +static inline void
> +xrstor64(const void *addr, uint64_t mask)
> +{
> +   const uint8_t *area = addr;
> +   uint32_t low, high;
> +
> +   low = mask;
> +   high = mask >> 32;
> +   __asm volatile (
> +   "xrstor64 %[area]"
> +   :
> +   : [area] "m" (*area), "a" (low), "d" (high)
> +   : "memory"
> +   );
> +}
> +#endif
> +
>  /* 
> -- */
>
>  #ifdef XENPV
> diff --git a/sys/arch/x86/include/fpu.h b/sys/arch/x86/include/fpu.h
> index 3255a8ca39e0..bdd86abfdd39 100644
> --- a/sys/arch/x86/include/fpu.h
> +++ b/sys/arch/x86/include/fpu.h
> @@ -14,8 +14,8 @@ struct trapframe;
>  void fpuinit(struct cpu_info *);
>  void fpuinit_mxcsr_mask(void);
>
> -void fpu_area_save(void *, uint64_t);
> -void fpu_area_restore(const void *, uint64_t);
> +void fpu_area_save(void *, uint64_t, bool);
> +void fpu_area_restore(const void *, uint64_t, bool);
>
>  void fpu_save(void);
>
> diff --git a/sys/arch/x86/x86/fpu.c b/sys/arch/x86/x86/fpu.c

Re: CVS commit: src/external/gpl3/gcc/dist/gcc/config/aarch64

2020-10-15 Thread Kamil Rytarowski
On 15.10.2020 17:14, Rin Okuyama wrote:
> On 2020/10/15 16:12, matthew green wrote:
>> Martin Husemann writes:
>>> On Thu, Oct 15, 2020 at 05:28:12PM +1100, matthew green wrote:
 you could try reverting most of our changes to this file and
 making sure you run with /proc mounted -o linux.  ryo@ recently
 added additional /proc/cpuinfo support that should make this
 just work with the upstream version, but i haven't had chance
 to update and see if this is the case.
> 
> I've confirmed that -mtune=native works fine at least for A53,
> even if all the local changes to driver-aarch64.c is reverted.
> I will commit it soon.
> 
>>> If we go this route, we should make the relevant procfs nodes
>>> independent
>>> of -o linux.
>>
>> that would be fine by me.
> 
> Nowadays, -o linux is turned on by default (unless nolinux is
> specified explicitly). Still, native apps probably should not
> depend on it.
> 
> This needs MI changes to procfs, not MD to aarch64. Should we
> enable /proc/cpuinfo unconditionally?


I'm against the policy of restoring the /proc dependency for this corner
case in one application.

We need to upstream the NetBSD specific patches to mainline GCC.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/2] Delete CIRCLEQ

2020-10-12 Thread Kamil Rytarowski
This removal is a part of a larger synchronization with other BSDs as
we lack various features in sys/queue.h (like LIST_PREV()).

CIRCLEQ was already deleted from the documentation and disabled in the
kernel in NetBSD-7. If there are still any unaware users, they are
certainly long broken.

What's the benefit of keeping it around and having no users and
documented deprecation plus being prone to miscompilation? The removal
does not break libc or kernel ABIs. Most 3rd party users of these
macros deliver a homegrown copy of sys/queue.h anyway.

Kamil Rytarowski
CTO, Moritz Systems
www.moritz.systems

pon., 12 paź 2020 o 13:15 Mouse  napisał(a):
>
> >>> Remove the CIRCLEQ API completely from the system headers and
> >>> document this fact in the QUEUE(3) man-page.
>
> >> why?  queue.h may be used by more than src.
>
> >> i don't see any benefit except forcing working code to be changed,
> >> possibly introducing bugs.
>
> >> please leave it alone.
>
> > It's been deprecated since -7, we can remove it for -10.
>
> So?  I still agree with mrg: I too see no benefit to removing it (or
> for that matter to deprecating it).  What am I missing?  What benefit
> do you see?
>
> /~\ The ASCII Mouse
> \ / Ribbon Campaign
>  X  Against HTMLmo...@rodents-montreal.org
> / \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: [PATCH 0/2] Delete CIRCLEQ

2020-10-12 Thread Kamil Rytarowski
Everything relatively modern that uses sys/queue.h directly was
already switched a long time ago to TAILQ.

Christos Zoulas changed most users of CIRCLEQ in the src tree 6 years
ago. The last leftover is handled in this patchset.

I was able to find some 3rd projects using CIRCLEQ, but probably all
of them are both: 1. old 2. ship with a  local copy of sys/queue.h.
There is not much working code using CIRCLEQ left and NetBSD is pretty
much the last one to deliver it.

It's also broken for modern compilers and we need to launder pointers
to workaround the pointer aliasing design flaw.

Kamil Rytarowski
CTO, Moritz Systems
www.moritz.systems

pon., 12 paź 2020 o 04:23 matthew green  napisał(a):
>
> Kamil Rytarowski writes:
> > Switch the last user (ypserv) from CIRCLEQ to TAILQ.
> > This is inspired by analogous refactoring from OpenBSD:
> > https://github.com/openbsd/src/commit/d53c0cf4d32fdbd8b42debfba068f1b0efa423dc#diff-8d0a4fbb89658213ebf314ff188581d7
> >
> > Remove the CIRCLEQ API completely from the system headers and document
> > this fact in the QUEUE(3) man-page.
>
> why?  queue.h may be used by more than src.
>
> i don't see any benefit except forcing working code to
> be changed, possibly introducing bugs.
>
> please leave it alone.
>
>
> .mrg.


[PATCH 2/2] Remove the CIRCLEQ API

2020-10-11 Thread Kamil Rytarowski
It was marked deprecated in NetBSD 7 and already removed from
FreeBSD in 2000 and OpenBSD in 2015.
---
 share/man/man3/queue.3 |  10 +++
 sys/sys/queue.h| 196 -
 2 files changed, 10 insertions(+), 196 deletions(-)

diff --git a/share/man/man3/queue.3 b/share/man/man3/queue.3
index 4293090986d4..359c772e5439 100644
--- a/share/man/man3/queue.3
+++ b/share/man/man3/queue.3
@@ -1091,3 +1091,13 @@ and
 .Nm STAILQ
 functions first appeared in
 .Fx 2.1.5 .
+.Pp
+The
+.Nm CIRCLEQ
+functions first appeared in
+.Bx 4.4
+and were deprecated in
+.Nx 7
+and removed in
+.Nx 10
+due to the pointer aliasing violations.
diff --git a/sys/sys/queue.h b/sys/sys/queue.h
index 5764c7a98a53..ec686364126b 100644
--- a/sys/sys/queue.h
+++ b/sys/sys/queue.h
@@ -69,14 +69,6 @@
  * after an existing element, at the head of the list, or at the end of
  * the list. A tail queue may be traversed in either direction.
  *
- * A circle queue is headed by a pair of pointers, one to the head of the
- * list and the other to the tail of the list. The elements are doubly
- * linked so that an arbitrary element can be removed without a need to
- * traverse the list. New elements can be added to the list before or after
- * an existing element, at the head of the list, or at the end of the list.
- * A circle queue may be traversed in either direction, but has a more
- * complex end of list detection.
- *
  * For details on the use of these macros, see the queue(3) manual page.
  */
 
@@ -663,192 +655,4 @@ struct {  
\
((struct type *)(void *)\
((char *)((head)->stqh_last) - offsetof(struct type, field
 
-
-#ifndef _KERNEL
-/*
- * Circular queue definitions. Do not use. We still keep the macros
- * for compatibility but because of pointer aliasing issues their use
- * is discouraged!
- */
-
-/*
- * __launder_type():  We use this ugly hack to work around the compiler
- * noticing that two types may not alias each other and elide tests in code.
- * We hit this in the CIRCLEQ macros when comparing 'struct name *' and
- * 'struct type *' (see CIRCLEQ_HEAD()).  Modern compilers (such as GCC
- * 4.8) declare these comparisons as always false, causing the code to
- * not run as designed.
- *
- * This hack is only to be used for comparisons and thus can be fully const.
- * Do not use for assignment.
- *
- * If we ever choose to change the ABI of the CIRCLEQ macros, we could fix
- * this by changing the head/tail sentinal values, but see the note above
- * this one.
- */
-static __inline const void * __launder_type(const void *);
-static __inline const void *
-__launder_type(const void *__x)
-{
-   __asm __volatile("" : "+r" (__x));
-   return __x;
-}
-
-#if defined(QUEUEDEBUG)
-#define QUEUEDEBUG_CIRCLEQ_HEAD(head, field)   \
-   if ((head)->cqh_first != CIRCLEQ_ENDC(head) &&  \
-   (head)->cqh_first->field.cqe_prev != CIRCLEQ_ENDC(head))\
-   QUEUEDEBUG_ABORT("CIRCLEQ head forw %p %s:%d", (head),  \
- __FILE__, __LINE__);  \
-   if ((head)->cqh_last != CIRCLEQ_ENDC(head) &&   \
-   (head)->cqh_last->field.cqe_next != CIRCLEQ_ENDC(head)) \
-   QUEUEDEBUG_ABORT("CIRCLEQ head back %p %s:%d", (head),  \
- __FILE__, __LINE__);
-#define QUEUEDEBUG_CIRCLEQ_ELM(head, elm, field)   \
-   if ((elm)->field.cqe_next == CIRCLEQ_ENDC(head)) {  \
-   if ((head)->cqh_last != (elm))  \
-   QUEUEDEBUG_ABORT("CIRCLEQ elm last %p %s:%d",   \
-   (elm), __FILE__, __LINE__); \
-   } else {\
-   if ((elm)->field.cqe_next->field.cqe_prev != (elm)) \
-   QUEUEDEBUG_ABORT("CIRCLEQ elm forw %p %s:%d",   \
-   (elm), __FILE__, __LINE__); \
-   }   \
-   if ((elm)->field.cqe_prev == CIRCLEQ_ENDC(head)) {  \
-   if ((head)->cqh_first != (elm)) \
-   QUEUEDEBUG_ABORT("CIRCLEQ elm first %p %s:%d",  \
-   (elm), __FILE__, __LINE__); \
-   } else {\
-   if ((elm)->field.cqe_prev->field.cqe_next != (elm)) \
-   QUEUEDEBUG_ABORT("CIRCLEQ elm prev %p %s:%d",   \
-   (elm), __FILE__, __LINE__); \
-   }
-#define QUEUEDEBUG_CIRCLEQ_POSTREMOVE(elm, field)  \
-   (elm)->field.cqe_next = (void *)1L; \
-   

[PATCH 1/2] Convert the CIRCLEQ (from sys/queue.h) usage to TAILQ

2020-10-11 Thread Kamil Rytarowski
The CIRCLEQ API from sys/queue.h is deprecated since NetBSD 7 and
it will be removed soon. The CIRCLEQ API implementation is prone
to a miscompilation due to the pointer aliasing and is discouraged.
---
 usr.sbin/ypserv/ypserv/ypserv_db.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/usr.sbin/ypserv/ypserv/ypserv_db.c 
b/usr.sbin/ypserv/ypserv/ypserv_db.c
index d6f9eda39eaa..9edfabc4f675 100644
--- a/usr.sbin/ypserv/ypserv/ypserv_db.c
+++ b/usr.sbin/ypserv/ypserv/ypserv_db.c
@@ -65,7 +65,7 @@ __RCSID("$NetBSD: ypserv_db.c,v 1.22 2011/02/01 21:00:25 
chuck Exp $");
 
 LIST_HEAD(domainlist, opt_domain); /* LIST of domains */
 LIST_HEAD(maplist, opt_map);   /* LIST of maps (in a domain) */
-CIRCLEQ_HEAD(mapq, opt_map);   /* CIRCLEQ of maps (LRU) */
+TAILQ_HEAD(mapq, opt_map); /* TAILQ of maps (LRU) */
 
 struct opt_map {
char*map;   /* map name (malloc'd) */
@@ -76,7 +76,7 @@ struct opt_map {
dev_t   dbdev;  /* device db is on */
ino_t   dbino;  /* inode of db */
time_t  dbmtime;/* time of last db modification */
-   CIRCLEQ_ENTRY(opt_map) mapsq;   /* map queue pointers */
+   TAILQ_ENTRY(opt_map) mapsq; /* map queue pointers */
LIST_ENTRY(opt_map) mapsl;  /* map list pointers */
 };
 
@@ -106,7 +106,7 @@ ypdb_init(void)
 {
 
LIST_INIT();
-   CIRCLEQ_INIT();
+   TAILQ_INIT();
 }
 
 /*
@@ -161,7 +161,7 @@ yp_private(datum key, int ypprivate)
 void
 ypdb_close_map(struct opt_map *map)
 {
-   CIRCLEQ_REMOVE(, map, mapsq);  /* remove from LRU circleq */
+   TAILQ_REMOVE(, map, mapsq);/* remove from LRU tailq */
LIST_REMOVE(map, mapsl);/* remove from domain list */
 
 #ifdef DEBUG
@@ -326,8 +326,8 @@ ypdb_open_db(const char *domain, const char *map, u_int 
*status,
 */
if (finfo.st_dev == m->dbdev && finfo.st_ino == m->dbino &&
finfo.st_mtime == m->dbmtime) {
-   CIRCLEQ_REMOVE(, m, mapsq); /* adjust LRU queue */
-   CIRCLEQ_INSERT_HEAD(, m, mapsq);
+   TAILQ_REMOVE(, m, mapsq); /* adjust LRU queue */
+   TAILQ_INSERT_HEAD(, m, mapsq);
if (map_info)
*map_info = m;
return (m->db);
@@ -423,7 +423,7 @@ retryopen:
m->dbdev = finfo.st_dev;
m->dbino = finfo.st_ino;
m->dbmtime = finfo.st_mtime;
-   CIRCLEQ_INSERT_HEAD(, m, mapsq);
+   TAILQ_INSERT_HEAD(, m, mapsq);
LIST_INSERT_HEAD(>dmaps, m, mapsl);
if (strcmp(map, YP_HOSTNAME) == 0 || strcmp(map, YP_HOSTADDR) == 0) {
if (!usedns) {
-- 
2.28.0



[PATCH 0/2] Delete CIRCLEQ

2020-10-11 Thread Kamil Rytarowski
Switch the last user (ypserv) from CIRCLEQ to TAILQ.
This is inspired by analogous refactoring from OpenBSD:
https://github.com/openbsd/src/commit/d53c0cf4d32fdbd8b42debfba068f1b0efa423dc#diff-8d0a4fbb89658213ebf314ff188581d7

Remove the CIRCLEQ API completely from the system headers and document
this fact in the QUEUE(3) man-page.

Kamil Rytarowski (2):
  Convert the CIRCLEQ (from sys/queue.h) usage to TAILQ
  Remove the CIRCLEQ API

 share/man/man3/queue.3 |  10 ++
 sys/sys/queue.h| 196 -
 usr.sbin/ypserv/ypserv/ypserv_db.c |  14 +--
 3 files changed, 17 insertions(+), 203 deletions(-)

-- 
2.28.0


Re: SIGCHLD and sigaction()

2020-08-19 Thread Kamil Rytarowski
On 19.08.2020 20:02, Roy Marples wrote:
> On 18/08/2020 20:52, Mouse wrote:
 Perhaps it would need a new flavour of file descriptor, [...]
>>> Linux has apparently done this: pidfd (file descriptors representing
>>> a process).  The idea is that you can pass them to various system
>>> call variants that otherwise take pids, without the risk that the
>>> process has exited in the mean time and the pid re-used.
>>
>> I've been thinking about something like that myself, starting with
>> AF_PID sockets, then deciding they wouldn't/couldn't work (as think I
>> mentioned in this thread, the socket infrastructure really wants the
>> contents of a socket to be independent of who's accessing it).
>> Personally, I've wanted it as a way to provide an out-of-band channel
>> to userland programs (like a control command channel for various
>> daemons), but...hmm.
>>
>> Feels strange to find an idea I like coming from Linux.
> 
> You can relax.
> FreeBSD did it first for Capsicum.
> 
> https://www.freebsd.org/cgi/man.cgi?query=pdfork=2
> 
> Roy

Hi,

I have got a draft work on this.

It's composed of:

1. New id_t type in P_PIDFD.
2. sigsendset(2) + sigsend(3), picked from SVID (Solaris, etc) It's a
generalized version of kill(2). It allows specifying P_PIDFD.
3. waitid supporting new id type: P_PIDFD.

Linux is close to the above, except reinventing the wheel instead of
picking sigsendset(2) + integration with /proc.

FreeBSD went with a distinct direction and invented new semantics of
process file descriptors, that differs to references over pid_t. For
example whenever you close(2) a file descriptor, it kills the process.
You also must wait for process events in FreeBSD using kevent(2). This
introduced incompatibilities with the established UNIX semantics. Linux
went a different path and whenever a process dies, we get ESRCH, which
is sensible as it avoids e.g. killing a random process with recycled pid_t.

We want process file descriptors to reference process handles through
PID namespaces, for hopefully upcoming support of containers.



signature.asc
Description: OpenPGP digital signature


Proposal to enable WAPBL by default for 10.0

2020-07-22 Thread Kamil Rytarowski
I propose to enable WAPBL ("log" in fstab(5)) by default for 10.0 and newer.

WAPBL - Write Ahead Physical Block Logging file system journaling

More info on WAPBL in wapbl(4).

https://netbsd.gw.com/cgi-bin/man-cgi?wapbl++NetBSD-current

Rationale: the default filesystem (FFS) without WAPBL is more prone to
data loss.

A potential blocker:

http://gnats.netbsd.org/54504
panic: kernel diagnostic assertion "(wapbl_transaction_len(wl) <=
(wl->wl_circ_size - wl->wl_reserved_bytes))" failed: file
".../src/sys/kern/vfs_wapbl.c", line 1271 wapbl_end: current transaction
too big to flush



signature.asc
Description: OpenPGP digital signature


Re: kernel stack usage

2020-07-16 Thread Kamil Rytarowski
On 04.07.2020 15:51, Jaromír Doleček wrote:
> Le sam. 4 juil. 2020 à 15:30, Kamil Rytarowski  a écrit :
>>> Kamil - what's the difference in gcc between -Wframe-larger-than= and
>>> -Wstack-size= ?
>>>
>>
>> -Wstack-size doesn't exist?
> 
> Sorry, meant -Wstack-usage=
> 

It looks like WStack-usage is GCC specific (absent in Clang) and it
includes alloca + vla. Both are not welcome in the kernel anyway so both
should give in most cases the same result.

>>> I see according to gcc documentation -Wframe-larger-than doesn't count
>>> size for alloca() and variable-length arrays, which makes it much less
>>> useful than -Wstack-usage.
>>>
>>
>> It's not a problem.
>>
>> Whenever alloca or VLA is in use, it's already breaking the stack
>> protector. There are a few exceptions registered in sys/conf/ssp.mk.
>>
>> We could add -Walloca -Wvla for USE_SSP=yes builds to catch quickly
>> inappropriate usage (frequently violated by code optimizer).
> 
> It's already not used in our kernel code, I checked and I found it
> used only in some arch-specific stand/ code. So -Walloca should be
> safe to turn on unconditionally, regardless of SSP. Unless the
> compiler emits alloca() calls itself.
> 

Sounds good. I'm for adding -Walloca and ideally -Wvla wherever possible
in the kernel.

> Jaromir
> 




signature.asc
Description: OpenPGP digital signature


Re: pg_jobc going negative?

2020-07-10 Thread Kamil Rytarowski
On 10.07.2020 15:41, Robert Elz wrote:
> Unfortunately
> I have no idea what "qualifying pgrp for job control" is supposed to mean.

The Design and implementation of 4.4book phrases it as: number of
processes with parent controlling terminal.

Unfortunately the book does not explain whether pg_jobc can go bellow 0.



signature.asc
Description: OpenPGP digital signature


Re: kernel stack usage

2020-07-04 Thread Kamil Rytarowski
On 04.07.2020 14:00, Jaromír Doleček wrote:
> Can anybody using clang please confirm kernel build with
> -Wframe-larger-than=3584?
> 

NetBSD-current from today, amd64 GENERIC builds for me.

> Kamil - what's the difference in gcc between -Wframe-larger-than= and
> -Wstack-size= ?
> 

-Wstack-size doesn't exist?

> I see according to gcc documentation -Wframe-larger-than doesn't count
> size for alloca() and variable-length arrays, which makes it much less
> useful than -Wstack-usage.
> 

It's not a problem.

Whenever alloca or VLA is in use, it's already breaking the stack
protector. There are a few exceptions registered in sys/conf/ssp.mk.

We could add -Walloca -Wvla for USE_SSP=yes builds to catch quickly
inappropriate usage (frequently violated by code optimizer).


On 04.07.2020 14:10, Martin Husemann wrote:
> On Sat, Jul 04, 2020 at 02:00:04PM +0200, Jaromír Dole?ek wrote:
>> Can anybody using clang please confirm kernel build with
>> -Wframe-larger-than=3584?
>
> Side note: 3584 is inacceptably large, we need to trim it down to ~1k.
>

Setting it to 3584 is a good start, it can be lowered later to 1024.



signature.asc
Description: OpenPGP digital signature


Re: pg_jobc going negative?

2020-06-29 Thread Kamil Rytarowski
On 09.06.2020 20:11, Robert Elz wrote:
> Date:Tue, 9 Jun 2020 17:04:54 +0200
> From:    Kamil Rytarowski 
> Message-ID:  
> 
> 
>   | Yes... syzkaller had like 12 different ways to reproduce it.
> 
> OK, thanks.
> 
>   | There is still a race and we randomly go to negative pg_jobc.
> 
> I am not at all surprised...
> 
> I will look at it over the next couple of days.   No guarantees...
> 

Ping? This kernel crash is blocking GDB/etc and it is an instant crash.

> kre
> 



Re: makesyscalls (moving forward)

2020-06-15 Thread Kamil Rytarowski
On 15.06.2020 15:21, Johnny Billquist wrote:
> 
> Anyway. Who here does not modify their path at login anyway.

The path has to be readily available for pkgsrc users with unprepared
environment.

However if we install the utility into /usr/sys (similar to /usr/games),
we can use a full path to the program and it will be good enough (for
me). Are there other programs that would be moved to this directory?

I have got a feeling that too many programs already rely on specific
kernel internals so making a distinction would only confuse people and
impose unclear conditions what belongs where. fsdb(8) or crash(8) are
definitely not going to be very usable with mixed kernel and userland
versions.

Something we possibly agree upon is that makesyscalls(1) would not be a
tool for administer a computer/server, so /usr/sbin /sbin is not a good
place.



signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-15 Thread Kamil Rytarowski
On 15.06.2020 14:35, Reinoud Zandijk wrote:
> On Mon, Jun 15, 2020 at 02:06:00PM +0200, Kamil Rytarowski wrote:
>> On 15.06.2020 13:44, Reinoud Zandijk wrote:
>>>  No need to install it in base. The resulting files can then be committed
>>>  as `regen' just like the pcidevs variants.
>>
>> I disagree as we don't want to pull ${BSDSRCDIR} dependency for users, for
>> building an application.
> 
> Lets try to make it clear then: who are the users?
> 
> 1) Kernel syscall and compat (module) code; only when updating calls
> 
> 2) ktrace (and friends) system calls decode. That would greatly increase
> readability ! Esp. if passed arguments could be automatically dumped too.
> 

The above are good for TOOLDIR.

Below:

> 3) (llvm) fuzzers for testing; this is intree too so no big deal
> 

LLVM is an external project and only in a special case part of the
basesystem. While there, there is the same issue with GCC sanitizers. We
definitely don't want to request regular LLVM or GCC users building the
toolchain to depend on TOOLDIR / BSDSRCDIR.

> 4) some syscall bashing tools for testing etc. They are tailored anyway so
> using a $BSDSRCDIR specfic program that is not installed is not that relevant.
> 

I don't know what is syscall bashing tool for testing.

> But what else? There are IMHO no other valid users.
> 

As of today GDB, but other similar programs can/shall follow.

syscall tracers (I wrote picotrace, truss - both distributed in pkgsrc;
there is strace)

Language runtime, basically everything that is not using libc could use
it (go, rust, D, etc).

kernel fuzzers (syzkaller)

We work on rumpkernel syscall fuzzers during the ongoing GSoC.

>> This utility shall receive ATF testing and thus shall be part of $PATH.
> 
> ATF testing sounds like a good idea; but does an utility have to be installed
> to be able to test code?
> 

Yes.

That could be worked around for ATF, but generally it needs testing.

> Also the generated files need to be updated in the kernel source tree and are
> tightly coupled to the kernel code templates.
> 

Yes.

>> Putting it to /kern would be bad as we will gain another kernel ABI
>> dependency and this program won't be usable in TOOLDIR neither when working
>> with different target NetBSD release than the developer's computer.
>>
>> I personally think that the definition file shall be embedded directly into
>> the program to avoid any issues with incompatible script version vs
>> makesyscalls(1) program.
> 
> You got a point there, and embedding it would make sense yes; but i still
> wouldn't install the program or its definition files as its kernel source
> version dependent and when building tools etc. $BSDSRCDIR is obviously
> available anyway.
> 

For a developer it is fine to request BSDSRCDIR to be available, but for
users (of e.g. GDB) this is certainly an overkill. makesyscalls(1) will
be maybe up to 1MB. Just $BSDSRCDIR/sys takes around 450MB. If we want
to depend on BSDSRCDIR for programs in pkgsrc, this is IMO a blocker.

Prebuilt picotrace takes less than 100kb. Adding a hard dependency on
BSDSRCDIR would be severe overkill.

The intention of this tool is too export its functionality to regular
programs that can be built in pkgsrc and there are plenty of them.

> Reinoud
> 

I'm definitely going to be a user of this program (in default PATH,
without BSDSRCDIR) and whenever possible, I will wire pkgsrc packages to
depend on it as soon as possible. None of the pkgsrc programs will need
BSDSRCDIR.



signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-15 Thread Kamil Rytarowski
On 15.06.2020 14:16, Johnny Billquist wrote:
> On 2020-06-15 14:12, Kamil Rytarowski wrote:
>> On 15.06.2020 14:11, Johnny Billquist wrote:
>>>
>>> We should not clutter the directories that are in the normal users path
>>> with things that a normal user would never care about.
>>
>> I never used 90% of the programs from /usr/bin /usr/sbin /bin /sbin. but
>> I definitely would use makesyscall(1). If you have other argument that
>> "I don't use it" please speak up.
> 
> I'm not convinced you are particularly representative of "users".
> 

NetBSD is a my daily driver so I'm a user!

> But it would be interesting to hear how and when you are planning to use
> makesyscalls.
> 

I work with the syscall layer almost continuously in various projects
(debuggers, fuzzers, syscall tracers, sanitizers, non-libc language
runtimes etc). Reiterating over the same list 10 times just increases
the frustration and perception of lost time of repeating the same
process in an incompatible way for another program. The tool shall
centralize the whole knowledge about passed arguments, structs and
export it to users through a flexible code generation.

We already distribute to users /usr/include/sys/syscalls.h (and it is
used e.g. by GDB to parse the syscalls, as parsing syscalls.master in
that case was harder). makesyscalls(1) is intended to be a more
specialized and generic version of the same functionality as distributed
by this header.

With some sort of fanciness, we could generate these lists on the fly in
some projects (for e.g. GDB) and we would want the utility to be
available in place. If it is restricted to build-only phase of various
programs (that definitely shall be free from BSDSRCDIR dependency) it
will be good enough.

I'm for adding this program in PATH and I would be a user on a regular
basis. I basically need it for pretty everything (2 GSoC ongoing
projects are about covering the same syscalls in 2 different ways).
Asking me for a use-case is odd to me as it is an elementary program
that belongs to /usr/bin.

>   Johnny
> 




signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-15 Thread Kamil Rytarowski
On 15.06.2020 14:11, Johnny Billquist wrote:
> 
> We should not clutter the directories that are in the normal users path
> with things that a normal user would never care about.

I never used 90% of the programs from /usr/bin /usr/sbin /bin /sbin. but
I definitely would use makesyscall(1). If you have other argument that
"I don't use it" please speak up.



signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-15 Thread Kamil Rytarowski
On 15.06.2020 13:44, Reinoud Zandijk wrote:
>  No need to install it in base. The resulting
> files can then be committed as `regen' just like the pcidevs variants.

I disagree as we don't want to pull ${BSDSRCDIR} dependency for users,
for building an application.

This utility shall receive ATF testing and thus shall be part of $PATH.

On 15.06.2020 13:44, Reinoud Zandijk wrote:
> I wouldn't install the file in the FS; kernel and userland are often
out of
> sync, maybe even versions apart with say NetBSD-8 userland but
NetBSD-current
> kernel. If anywhere, request the data from the kernel by exposing it
in /kern.
> Exposing it one way or another might be an attack vector too ...


Putting it to /kern would be bad as we will gain another kernel ABI
dependency and this program won't be usable in TOOLDIR neither when
working with different target NetBSD release than the developer's computer.

I personally think that the definition file shall be embedded directly
into the program to avoid any issues with incompatible script version vs
makesyscalls(1) program.



signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-15 Thread Kamil Rytarowski
On 15.06.2020 00:57, Johnny Billquist wrote:
> On 2020-06-15 00:52, Kamil Rytarowski wrote:
>> On 15.06.2020 00:26, Johnny Billquist wrote:
>>> But that's just me. I'll leave the deciding to you guys...
>>>
>>
>> This is only me, but /sbin and /usr/sbin are for users with root
>> privileges, while /bin and /usr/bin for everybody. makesyscalls(1)
>> intends to be an end-user program that aids building software and this
>> is just another specialized program similar to flex(1) or yacc(1), just
>> a more domain specific code generator.
> 
> Is ping only for people with root privileges???
> 

ping needs setuid so yes.

>   Johnny
> 




signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-14 Thread Kamil Rytarowski
On 15.06.2020 00:26, Johnny Billquist wrote:
> But that's just me. I'll leave the deciding to you guys...
> 

This is only me, but /sbin and /usr/sbin are for users with root
privileges, while /bin and /usr/bin for everybody. makesyscalls(1)
intends to be an end-user program that aids building software and this
is just another specialized program similar to flex(1) or yacc(1), just
a more domain specific code generator.

I don't see any reason why to restrict makesyscalls(1) to root-only.
/usr/bin is a settled path for native programs and chaing it is not
worth it (and I personally see no reason). If I want to plug
makesyscalls(1) into LLVM or GDB or some fuzzers, it would be certainly
cumbersome to pass full path to some /sys/bin or similar.



signature.asc
Description: OpenPGP digital signature


Re: makesyscalls (moving forward)

2020-06-14 Thread Kamil Rytarowski
On 14.06.2020 23:59, Johnny Billquist wrote:
> On 2020-06-14 23:21, Paul Goyette wrote:
>> On Sun, 14 Jun 2020, David Holland wrote:
>>
>> 
>>
>>> This raises two points that need to be bikeshedded:
>>>
>>> (1) What's the new tool called, and where does it live in the tree?
>>> "usr.bin/makesyscalls" is fine with me but ymmv.
>>
>> "usr.bin/makesyscalls" sounds good to me.
> 
> Uh? usr.bin is where stuff for /usr/bin is located, right? Anything
> there should be pretty normal tools that any user might be interested
> in. Don't seem to me as makesyscalls would be a tool like that?
> 
> Possibly some sbin thing, but in all honestly, wouldn't this make more
> sense to have somewhere under sys? Don't we have some other tools and
> bits which are specific for kernel and library building?
> 

/usr/bin is appropriate and there are already similar tools (like
ioctlprint(1)). It's already in PATH and definitely in interest of some
end-users (like me) and I do want to have it.

makesyscalls(1) sounds like a good name.

/usr/share/sys/syscalls.de should be an internal detail for
makesyscalls(1). I actually think that syscalls.def should be builtin
into the program and we should avoid an external file dependency as it
is expected to be operational for only one kernel ABI release +
makesyscalls(1) version. There are expected no external consumers of
this .def file and all we need and want is to pass rules how to generate
syscall definitions.

makesyscalls(1) will likely quickly turn into a ./build.sh tool and
reducing management of an external file is especially a good idea.

There is already a prior art as ioctlprint(1) has a builtin database for
the ioctl codes and it works well.

>   Johnny
> 




signature.asc
Description: OpenPGP digital signature


[PATCH] Add support for RUMP_USE_LIBC_ALLOCATORS

2020-06-14 Thread Kamil Rytarowski
I propose to add a new option for building rumpkernel:
RUMP_USE_LIBC_ALLOCATORS.

This option wires the kernel allocators directly to the libc functions.
This is useful for sanitizers with their fine-grained checks of
allocated chunks.

http://netbsd.org/~kamil/patch-00268-RUMP_USE_LIBC_ALLOCATORS.2.txt

Does this code and approach look good? Is possible to improve this
approach? There is a fallout with ATF test t_vm::uvmwait as it no longer
passes as the memory limit is no longer respected.

With this patch rumpkernel is now more sensitive to real memory access
bugs. The immediate detected problem is with kthread_join() that
attempts to join a thread that was freed.

http://netbsd.org/~kamil/rump/heap-use-after-free.txt

We use this patch during the ongoing GSoC so quick merge into src/ is
wanted.



signature.asc
Description: OpenPGP digital signature


Re: makesyscalls

2020-06-09 Thread Kamil Rytarowski
On 10.06.2020 01:13, David Holland wrote:
> The question is: do we want the Python version in the tree now

For this, I would say "NO", at least as long Python is out of base and
IMO it shall not be there.

But it is fine to put into othersrc/.

On 10.06.2020 01:13, David Holland wrote:
> Rewriting in C is a possible future step. The code generator I have in
> mind going forward should not be done in Python. But again, more on
> that later.

I would like to have mksyscalls (and at some point makeioctls) much more
flexible and as a tool scriptable. I had to iterate a dozen of times
over all our syscalls in various fuzzers, sanitizers, debuggers, tracers
etc.

Something that is very needed is knowing the full serialized struct
passed as a pointer to each syscall. It's a lot of work to teach the
tool about it, but it could be finally centralized and time saved of
repeatedly teaching all the other programs about this property of syscalls.



signature.asc
Description: OpenPGP digital signature


Re: pg_jobc going negative?

2020-06-09 Thread Kamil Rytarowski
On 09.06.2020 16:35, Robert Elz wrote:
> Date:Tue, 9 Jun 2020 14:13:56 +0200
> From:    Kamil Rytarowski 
> Message-ID:  <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org>
> 
>   | Here is the simplest reproducer crashing the kernel on negative pg_jobc:
> 
> I have not looked at this closely yet, but this is likely because
> ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc
> more or less expect to be a constant.
> 
> Is there any known reproducer of this problem which does not involve ptrace() 
> ?
> 

Yes... syzkaller had like 12 different ways to reproduce it.

As far as I can tell, in the syzkaller case they all are about races.

One of them is here:

https://syzkaller.appspot.com/text?tag=ReproC=128060f610

After adding the asserts, all look similar to me: forking + setpgid(0, 0).

> At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I 
> haven't investigated enough to be able to spot a definite problem yet
> (possible ptrace() generated issue aside - and yes, those need to work as
> well).
> 
> I doubt very much that adding a new mutex will make a difference, all the
> manipulations are done with proc_lock held, which is kind of the "big lock"
> for process manipulation - adding finer grained locking might improve
> performance, by improving concurrency, but is unlikely (at this stage,
> nothing is impossible) to be a fix for this problem.
> 

There is still a race and we randomly go to negative pg_jobc.

> kre
> 



Re: pg_jobc going negative?

2020-06-09 Thread Kamil Rytarowski
On 09.06.2020 08:38, Maxime Villard wrote:
>> Should we allow pg_jobc going negative?
> 
> I don't think so, the code is not designed to expect negative values.

Here is the simplest reproducer crashing the kernel on negative pg_jobc:

http://netbsd.org/~kamil/ptrace/pg_jobc-crash.c

On 09.06.2020 08:38, Maxime Villard wrote:
> In short, (1) my understanding of it is that the code is not designed to
> expect negative values, and (2) since I added the KASSERTs 10/11 of the
> random bugs didn't trigger. Big signs the bug is indeed related to
> refcounting.
>
> It would be nice if someone with better understanding than me of the lwp
> group stuff could have a look, though.


Generally, pg_jobc looks like a ref counting mechanism. FreeBSD reworked
the code and added a dedicated pgrp mutex.

I don't know which path is the best for us, especially regarding the SMP
properties.

+ ad@


Re: pg_jobc going negative?

2020-06-09 Thread Kamil Rytarowski
On 09.06.2020 10:23, Michael van Elst wrote:
> m...@m00nbsd.net (Maxime Villard) writes:
> 
>> You can see they are all different, but all have to do with reading the
>> group pointer, which was either freed, overwritten, not initialized,
>> unmapped, or contained pure garbage. This is typical of refcounting bugs
>> where a resource disappears under your feet.
> 
> pg_jobc is not a reference counter. The assertion probably stopped
> a bug in a different place by coincidence.
> 

As the first step, is it fine to replace all pg_jobc == 0/ != 0 checks
with pg_jobc > 0 / <= 0?

This should not make anything worse than it is now.

The remaining code assumes that pg_jobc never goes below 0.

And then, follow up with the removal of the assert. We will check with
syzkaller whether the races/crashes are gone.



signature.asc
Description: OpenPGP digital signature


pg_jobc going negative?

2020-06-08 Thread Kamil Rytarowski
pg_jobc is a process group struct member counting the number of
processes with a parent capable of doing job control. Once reaching 0,
the process group is orphaned.

With the addition of asserts checking for pg_jobc > 0 (by maxv@), we
caught issues that pg_jobc can go negative. I have landed new ATF tests
that trigger this reliably with ptrace(2)
(src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was
originally triggered with GDB.

There are also other sources of these asserts due to races
The ptrace(2) ATF tests are reliable triggering a negative pg_jobc,
however there are racy tests doing the same as triggered by syzkaller
(mentioned at least in [1]).

Should we allow pg_jobc going negative? (Other BSDs allow this.)

Is going negative in the first place a real bug?

Are the races triggered by syzkaller real bugs?

[1] http://mail-index.netbsd.org/current-users/2020/05/04/msg038511.html

On 20.04.2020 18:32, Maxime Villard wrote:
> Module Name:  src
> Committed By: maxv
> Date: Mon Apr 20 16:32:03 UTC 2020
> 
> Modified Files:
>   src/sys/kern: kern_proc.c
> 
> Log Message:
> Add three KASSERTs, to detect refcount bugs.
> 
> This narrows down an unknown bug in some place near, that has manifested
> itself in various forms (use-after-frees, uninit accesses, page faults,
> segmentation faults), all pointed out by syzbot.
> 
> The first KASSERT in fixjobc() fires when the bug is encountered.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.244 -r1.245 src/sys/kern/kern_proc.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 
> 
> Modified files:
> 
> Index: src/sys/kern/kern_proc.c
> diff -u src/sys/kern/kern_proc.c:1.244 src/sys/kern/kern_proc.c:1.245
> --- src/sys/kern/kern_proc.c:1.244Sun Apr 19 20:31:59 2020
> +++ src/sys/kern/kern_proc.c  Mon Apr 20 16:32:03 2020
> @@ -1,4 +1,4 @@
> -/*   $NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $  */
> +/*   $NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $ */
>  
>  /*-
>   * Copyright (c) 1999, 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc.
> @@ -62,7 +62,7 @@
>   */
>  
>  #include 
> -__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej 
> Exp $");
> +__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp 
> $");
>  
>  #ifdef _KERNEL_OPT
>  #include "opt_kstack.h"
> @@ -554,6 +554,7 @@ proc_sessrele(struct session *ss)
>  {
>  
>   KASSERT(mutex_owned(proc_lock));
> + KASSERT(ss->s_count > 0);
>   /*
>* We keep the pgrp with the same id as the session in order to
>* stop a process being given the same pid.  Since the pgrp holds
> @@ -1181,8 +1182,11 @@ fixjobc(struct proc *p, struct pgrp *pgr
>   if (entering) {
>   pgrp->pg_jobc++;
>   p->p_lflag &= ~PL_ORPHANPG;
> - } else if (--pgrp->pg_jobc == 0)
> - orphanpg(pgrp);
> + } else {
> + KASSERT(pgrp->pg_jobc > 0);
> + if (--pgrp->pg_jobc == 0)
> + orphanpg(pgrp);
> + }
>   }
>  
>   /*
> @@ -1197,8 +1201,11 @@ fixjobc(struct proc *p, struct pgrp *pgr
>   if (entering) {
>   child->p_lflag &= ~PL_ORPHANPG;
>   hispgrp->pg_jobc++;
> - } else if (--hispgrp->pg_jobc == 0)
> - orphanpg(hispgrp);
> + } else {
> + KASSERT(hispgrp->pg_jobc > 0);
> + if (--hispgrp->pg_jobc == 0)
> + orphanpg(hispgrp);
> + }
>   }
>   }
>  }
> 




signature.asc
Description: OpenPGP digital signature


Re: UBSan: Undefined Behavior in lf_advlock

2020-06-05 Thread Kamil Rytarowski
On 06.06.2020 00:25, Jaromír Doleček wrote:
> Le ven. 5 juin 2020 à 21:49, syzbot
>  a écrit :
>> [  44.1699615] panic: UBSan: Undefined Behavior in 
>> /syzkaller/managers/netbsd-kubsan/kernel/sys/kern/vfs_lockf.c:843:16, signed 
>> integer overflow: 131072 + 9223372036854771712 cannot be represented in type 
>> 'long int'
>>
>> [  44.1931600] cpu0: Begin traceback...
>> [  44.1999503] vpanic() at netbsd:vpanic+0x287 sys/kern/subr_prf.c:290
>> [  44.2299515] isAlreadyReported() at netbsd:isAlreadyReported
>> [  44.2599494] HandleOverflow() at netbsd:HandleOverflow+0x1c9 
>> sys/../common/lib/libc/misc/ubsan.c:375
>> [  44.2899499] lf_advlock() at netbsd:lf_advlock+0x2124 
>> sys/kern/vfs_lockf.c:843
> 
> This happens in:
> if (fl->l_len > 0)
> end = start + fl->l_len - 1;
> else {
> 
> when call to fcntl() is arranged so that 'start' ends up 0x2 and
> fl->l_len 0x7000, overflowing the off_t.
> 
> I wonder, Is it important to fix cases like that?
> 
> Jaromir
> 

Generally we want to mute noise so signal is high on serious issues.

Overflowing off_t shall not happen as it is a signed integer. A compiler
is free to impose assumptions that the result is not overflowed and
miscompile. In this case it is probably safe, but better to silent it.



signature.asc
Description: OpenPGP digital signature


Re: kernel stack usage

2020-05-31 Thread Kamil Rytarowski
Can we adopt -Wframe-larger-than=1024 and mark it fatal?

This option is supported by GCC and Clang.

On 31.05.2020 15:55, Jaromír Doleček wrote:
> I think it would make sense to add -Wstack-usage=X to kernel builds.
> 
> Either 2KB or 1KB seems to be good limit, not too many offenders
> between 1KB and 2KB so maybe worth it to use 1KB and change them.
> 
> I'm sure there would be something similar for LLVM too.
> 
> Jaromir
> 
> Le dim. 31 mai 2020 à 15:39, Simon Burge  a écrit :
>>
>> matthew green wrote:
>>
>>> glad to see this effort and the clean up already!
>>>
>>> ideally, we can break the kernel build if large stack consumers
>>> are added to the kernel.  i'd be OK with it being default on,
>>> with of course a way to skip it, and if necessary it can have
>>> a whitelist of "OK large users."
>>
>> I started to look at -fstack-usage which gives a nice MI way of getting
>> the data instead of parsing MD objdump output, but didn't get any
>> further than the below patch.  The find(1) command referenced in the
>> patch gives the following
>>
>>   1008 nfs_serv.c:2181:1:nfsrv_link
>>   1056 procfs_linux.c:422:1:procfs_do_pid_stat
>>   1056 vfs_subr.c:1521:1:vfs_buf_print
>>   1072 dl_print.c:80:1:sdl_print
>>   1104 core_elf32.c:300:1:coredump_getseghdrs_elf32
>>   1104 core_elf32.c:300:1:coredump_getseghdrs_elf64
>>   1104 sys_ptrace_common.c:1595:1:proc_regio
>>   1120 subr_bufq.c:131:1:bufq_alloc
>>   1136 db_lwp.c:64:1:db_lwp_whatis
>>   1152 kern_ktrace.c:1269:1:ktrwrite
>>   1168 uvm_swap.c:768:1:uvm_swap_stats.part.1
>>   1312 nfs_serv.c:1905:1:nfsrv_rename
>>   2112 tty_60.c:68:1:compat_60_ptmget_ioctl
>>   2144 memmem.c:83:14:twoway_memmem
>>
>> Anyone else want to run with adding a bit more to this to do some build
>> failure as mrg@ suggests and documenting for options(4)?
>>
>> Cheers,
>> Simon.
>> --
>> Index: Makefile.kern.inc
>> ===
>> RCS file: /cvsroot/src/sys/conf/Makefile.kern.inc,v
>> retrieving revision 1.270
>> diff -d -p -u -r1.270 Makefile.kern.inc
>> --- Makefile.kern.inc   21 May 2020 18:44:19 -  1.270
>> +++ Makefile.kern.inc   31 May 2020 13:34:13 -
>> @@ -104,6 +104,11 @@ CFLAGS+=   -ffreestanding -fno-zero-initia
>>  CFLAGS+=   ${${ACTIVE_CC} == "gcc":? -fno-delete-null-pointer-checks :}
>>  CFLAGS+=   ${DEBUG} ${COPTS}
>>  AFLAGS+=   -D_LOCORE -Wa,--fatal-warnings
>> +.if defined(KERN_STACK_USAGE)
>> +# example usage to find largest stack users in kernel compile directory:
>> +#find . -name \*.su | xargs awk '{ printf "%6d %s\n", $2, $1 }' | sort 
>> +1n
>> +CFLAGS+=   -fstack-usage
>> +.endif
>>
>>  # XXX
>>  .if defined(HAVE_GCC) || defined(HAVE_LLVM)
>> @@ -338,8 +343,8 @@ ${_s:T:R}.o: ${_s}
>>  .if !target(__CLEANKERNEL)
>>  __CLEANKERNEL: .USE
>> ${_MKMSG} "${.TARGET}ing the kernel objects"
>> -   rm -f ${KERNELS} *.map eddep tags *.[io] *.ko *.ln [a-z]*.s vers.c \
>> -   [Ee]rrs linterrs makelinks assym.h.tmp assym.h \
>> +   rm -f ${KERNELS} *.map *.[io] *.ko *.ln [a-z]*.s *.su vers.c \
>> +   eddep tags [Ee]rrs linterrs makelinks assym.h.tmp assym.h \
>> ${EXTRA_KERNELS} ${EXTRA_CLEAN}
>>  .endif
>>




signature.asc
Description: OpenPGP digital signature


Re: sys/idtype.h unused enumeration values

2020-05-19 Thread Kamil Rytarowski
On 19.05.2020 15:06, Robert Elz wrote:
> Date:Tue, 19 May 2020 14:12:31 +0200
> From:    Kamil Rytarowski 
> Message-ID:  <6874bb63-5146-797f-98b7-b9c497677...@gmx.com>
> 
>   | Rationale for pointless?
> 
> There is no point.   What more can I say?
> 
>   | My points were:
>   |
>   |  - Clobbering OS that claims the goals of clean design and clean code
>   | with mutant alien bodies without counterparts in the native kernel,
>   | without request from any relevant standard body.
> 
> Having the extra entries is harmless, there is no point deleting them.
> 

I've abandoned the intention of changing these values (by adding
comments, renaming etc). Once I will have spare time I might look into
implementing the missing ID types, but I don't promise to do it soon.
P_PSETID is possibly the easiest one.

Thank you for the feedback.



signature.asc
Description: OpenPGP digital signature


Re: sys/idtype.h unused enumeration values

2020-05-19 Thread Kamil Rytarowski
On 19.05.2020 08:38, Robert Elz wrote:
> Date:Mon, 18 May 2020 21:11:36 +0200
> From:    Kamil Rytarowski 
> Message-ID:  <05255347-1c55-2762-aaf6-fec3caf48...@gmx.com>
> 
>   | Next, I can add my value at the end of list (and before _P_MAXIDTYPE).
> 
> Other than this, everything that you propose is pointless.   This one
> you can do (assuming of course that there is a good reason for your new
> entry, but there's no reason to doubt that now.)


Rationale for pointless?

My points were:

 - Clobbering OS that claims the goals of clean design and clean code
with mutant alien bodies without counterparts in the native kernel,
without request from any relevant standard body.

 - Collecting garbage in public headers that is unused, misleading and
can at best be dummy.

 - Not know any public users. Extremely unlikely to have any private ones.

But as there is insist on keeping that mutant clobber around as it
formally leaked into ABI (even if very unlikely to have any single user
ever), it's better to implement project identifiers and the other kernel
components so this can be meaningful.

> 
> kre
> 




signature.asc
Description: OpenPGP digital signature


Re: sys/idtype.h unused enumeration values

2020-05-18 Thread Kamil Rytarowski
On 18.05.2020 22:18, Christos Zoulas wrote:
> 
> 
>> On May 18, 2020, at 3:40 PM, Kamil Rytarowski  wrote:
>>
>> If I delete P_TASKID ... P_P_CPUID ones, P_SETID will be reordered (but
>> we can force the number anyway). If I delete P_CID there is an inelegant
>> hole. Naturally P_SETID -> P_CID can fill the gap.
>>
>> This is in theory ABI change, but no users could use in a useful
>> approach previously.
>>
>> My intention isto g/c unused values and keep this clean and elegant (as
>> this is still possible).
>>
> 
> Why don't you leave them alone for the same reason FreeBSD did (source 
> compatibility)
> and append the ones you want? If you look they #define P_ZONEID P_JAILID when
> they made the change...
> 
> christos
> 

My point is that there is no source code (at least in base) that we gain
compatibility with, no ABI compatibility layer and these concepts do not
match the current NetBSD kernel features. If there is anything in 3rd
party pretending to use these values, it would not work anyway.

If we want to these compat defines, they already live in:

external/cddl/osnet/dist/uts/common/sys/procset.h

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170346

Regarding FreeBSD, I don't see rationale for inclusion of these values.
It looks like it was copy-pasted (there were also Solaris-specific
preprocessor guards in the initial version).

But if there is intention to keep these values around (as it might be in
theory too late as they leaked somewhere), it's fine. Thanks.



signature.asc
Description: OpenPGP digital signature


Re: sys/idtype.h unused enumeration values

2020-05-18 Thread Kamil Rytarowski
On 18.05.2020 21:31, Taylor R Campbell wrote:
>> Date: Mon, 18 May 2020 21:11:36 +0200
>> From: Kamil Rytarowski 
>>
>> On 18.05.2020 20:24, Robert Elz wrote:
>>> Date:Mon, 18 May 2020 19:45:55 +0200
>>> From:Kamil Rytarowski 
>>> Message-ID:  
>>>
>>>   | I have got a local use-case for another P_type (premature to discuss it
>>>   | in this thread) and I would rather recycle an unused value.
>>>
>>> Don't do that, it is just a number, use one that hasn't been used
>>> for this purpose before.
>>>
>>>   | Do we plan to get Solaris feature-parity with all the types? I assume
>>>   | that the answer is NO. If so, can we delete the P_ values that are not
>>>   | applicable for NetBSD?
>>>
>>> I have no problem with that - just don't reuse the values for some
>>> different purpose, keep them reserved (assign them meaningless reserved
>>> names) just in case there's ever a need to implement one of those things
>>> (this is very very cheap insurance).
>>
>> I propose to delete P_CID and recycle its place for P_PSETID.
> 
> There are 2,147,483,630 different numbers you can choose from, if I
> counted right -- that's over two billion.  It's not that important to
> be economical with reuse when there's a scary comment above it
> exhorting you not to mess with the existing numbers...  Also, P_PSETID
> already appears to be assigned the number 15, so why would you want to
> change that?
> 

If I delete P_TASKID ... P_P_CPUID ones, P_SETID will be reordered (but
we can force the number anyway). If I delete P_CID there is an inelegant
hole. Naturally P_SETID -> P_CID can fill the gap.

This is in theory ABI change, but no users could use in a useful
approach previously.

My intention isto g/c unused values and keep this clean and elegant (as
this is still possible).



signature.asc
Description: OpenPGP digital signature


Re: sys/idtype.h unused enumeration values

2020-05-18 Thread Kamil Rytarowski
On 18.05.2020 20:24, Robert Elz wrote:
> Date:Mon, 18 May 2020 19:45:55 +0200
> From:    Kamil Rytarowski 
> Message-ID:  
> 
>   | I have got a local use-case for another P_type (premature to discuss it
>   | in this thread) and I would rather recycle an unused value.
> 
> Don't do that, it is just a number, use one that hasn't been used
> for this purpose before.
> 
>   | Do we plan to get Solaris feature-parity with all the types? I assume
>   | that the answer is NO. If so, can we delete the P_ values that are not
>   | applicable for NetBSD?
> 
> I have no problem with that - just don't reuse the values for some
> different purpose, keep them reserved (assign them meaningless reserved
> names) just in case there's ever a need to implement one of those things
> (this is very very cheap insurance).
> 
> kre
> 

Solaris ABI compat is something that has small chances of being useful
in future. Solaris is also not a reference for NetBSD here.

Linux uses the POSIX types: P_ALL, P_PID, P_GID + extension P_PIDFD.

https://github.com/torvalds/linux/blob/master/include/uapi/linux/wait.h#L16

FreeBSD already recycled the P_ZONEID value for P_JAILID.

Keeping these unused Solaris values in tree has no added value. If we
want it for the sake of defining it, push this to src/external/.

On 18.05.2020 20:19, Christos Zoulas wrote:
> I copied these from FreeBSD who in turn copied them from solaris and
changed
> P_ZONEID to P_JAILID. The FreeBSD comment is:
> http://bxr.su/FreeBSD/sys/sys/wait.h#100
> I decided to keep all the names too.
>

I propose to delete: P_TASKID, P_PROJID, P_POOLID, P_ZONEID, P_CTID as
they have no equivalent objects in NetBSD. (If they could have, these
objects could be designed differently).

P_CID could be in theory used, but has no users and could be deleted.

P_PSETID shall be used in pset_bind(3)/pset_unbind(3), but it looks like
it was not implemented (and thus, has no users today).

http://netbsd.org/~kamil/patch-00256-cleanup-waitids.txt

I propose to delete P_CID and recycle its place for P_PSETID.

If we will ever have a need for P_CTID or similar, we could readd them
to the enumeration.

Next, I can add my value at the end of list (and before _P_MAXIDTYPE).



signature.asc
Description: OpenPGP digital signature


sys/idtype.h unused enumeration values

2020-05-18 Thread Kamil Rytarowski
POSIX notes:

"The type idtype_t shall be defined as an enumeration type whose
possible values shall include at least the following: P_ALL P_PGID P_PID"

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_wait.h.html

For some reason we copied as-is solaris types into our public headers
with fields that will possibly never apply to NetBSD, such as P_CTID,
P_POOLID etc.

I have got a local use-case for another P_type (premature to discuss it
in this thread) and I would rather recycle an unused value.

Do we plan to get Solaris feature-parity with all the types? I assume
that the answer is NO. If so, can we delete the P_ values that are not
applicable for NetBSD?

If something is used in some compat shim for sunos software (dtrace,
zfs) we could easily add a compat wrapper in the 3rd party code.
Defining the types does not make the features to work.



signature.asc
Description: OpenPGP digital signature


Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-16 Thread Kamil Rytarowski
Is it possible to avoid negation in the name?

KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION

On 17.05.2020 00:51, Paul Goyette wrote:
> I'm not sure I like the name!
> 
> Can you call it KAUTH_SYSTEM_DISABLE_SWAPENCRYPT ?  That more
> closely describes the action which is being controlled.
> 
> 
> On Sat, 16 May 2020, Alexander Nasonov wrote:
> 
>> Attached patch adds KAUTH_SYSTEM_UNENCRYPTED_SWAP and
>> it forbids changing vm.swap_encrypt from 1 to 0 when
>> securelevel > 0.
>>
>> If there are no objections, I'm going to commit it tomorrow.
>>
>> -- 
>> Alex
>>
>>
>> !DSPAM:5ec06ddb24841398742664!
>>
> 
> ++--+---+
> | Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
> | (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
> | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
> ++--+---+




signature.asc
Description: OpenPGP digital signature


Re: vmspace refcnt refactor patch

2020-05-16 Thread Kamil Rytarowski
On 16.05.2020 18:52, Nick Hudson wrote:
> 
> On 16/05/2020 12:45, Kamil Rytarowski wrote:
>> On 16.05.2020 07:48, Nick Hudson wrote:
>>> On 15/05/2020 17:35, Kamil Rytarowski wrote:
>>>> I propose the following patch:
>>>>
>>>> http://netbsd.org/~kamil/patch-00253-refactor-vmspace-refcnt.txt
>>>>
>>>> Does it look good?
>>>>
>>>> Nick, does it fix the hppa locking problem?
>>>>
>>> Thanks for working on this.
>>>
>>> Unfortunately, it doesn't fix all the problems...
>>>
>> Please check this:
>>
>> http://netbsd.org/~kamil/patch-00254-remove-lwp_lock_in_sstep.txt
> 
> I can run the ptrace tests now without triggering LOCKDEBUG asserts.
> 
> 
> Thanks,
> 
> Nick
> 

OK. I'm waiting now for the feedback from Andew before committing.



signature.asc
Description: OpenPGP digital signature


Re: vmspace refcnt refactor patch

2020-05-16 Thread Kamil Rytarowski
On 16.05.2020 07:48, Nick Hudson wrote:
> On 15/05/2020 17:35, Kamil Rytarowski wrote:
>> I propose the following patch:
>>
>> http://netbsd.org/~kamil/patch-00253-refactor-vmspace-refcnt.txt
>>
>> Does it look good?
>>
>> Nick, does it fix the hppa locking problem?
>>
> 
> Thanks for working on this.
> 
> Unfortunately, it doesn't fix all the problems...
> 

Please check this:

http://netbsd.org/~kamil/patch-00254-remove-lwp_lock_in_sstep.txt

In general I don't like the locking model in ptrace(2), but this change
is not making anything worse.

> 
>     setstep1: [ 209.5497837] Mutex error: assert_sleepable,78: spin
> lock held
> 
> [ 209.5497837] lock address : 0x3ffccf00 type :  spin
> [ 209.5497837] initialized  : 0x007d6f78
> [ 209.5497837] shared holds :  0 exclusive: 1
> [ 209.5497837] shares wanted:  0 exclusive: 0
> [ 209.5497837] relevant cpu :  0 last held: 0
> [ 209.5497837] relevant lwp : 0x3fc36d00 last held:
> 0x3fc36d00
> [ 209.5497837] last locked* : 0x0084a760 unlocked :
> 0x007d5134
> [ 209.5497837] owner field  : 0xff10 wait/spin:   0/1
> 
> [ 209.5497837] panic: LOCKDEBUG: Mutex error: assert_sleepable,78: spin
> lock held
> [ 209.5497837] cpu0: Begin traceback...
> [ 209.5497837] vpanic() at netbsd:vpanic+0x1b8
> [ 209.5497837] panic() at netbsd:panic+0x38
> [ 209.5497837] lockdebug_abort1() at netbsd:lockdebug_abort1+0x170
> [ 209.5497837] lockdebug_barrier() at netbsd:lockdebug_barrier+0x114
> [ 209.5497837] assert_sleepable() at netbsd:assert_sleepable+0xa0
> [ 209.5497837] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x254
> [ 209.5497837] uvm_mapent_alloc() at netbsd:uvm_mapent_alloc+0x150
> [ 209.5497837] uvm_map_enter() at netbsd:uvm_map_enter+0x848
> [ 209.5497837] uvm_map() at netbsd:uvm_map+0xd4
> [ 209.5497837] uvm_map_reserve() at netbsd:uvm_map_reserve+0x240
> [ 209.5497837] uvm_map_extract() at netbsd:uvm_map_extract+0x6fc
> [ 209.5497837] uvm_io() at netbsd:uvm_io+0xf8
> [ 209.5497837] process_domem() at netbsd:process_domem+0x94
> [ 209.5497837] ss_get_value() at netbsd:ss_get_value+0x74
> [ 209.5497837] process_sstep() at netbsd:process_sstep+0x74
> [ 209.5497837] do_ptrace() at netbsd:do_ptrace+0xe00
> [ 209.5497837] sys_ptrace() at netbsd:sys_ptrace+0x4c
> [ 209.5497837] syscall() at netbsd:syscall+0x480
> [ 209.5497837] -- syscall #26(7, 141a, 1, 0, ...) (0xd3131000)
> [ 209.5497837] cpu0: End traceback...
> 
> nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$
> hppa--netbsd-addr2line -e netbsd.gdb  -f 0x007d6f78
> sched_cpuattach
> /netbsd/hppa/src/sys/kern/kern_runq.c:147
> nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$
> hppa--netbsd-addr2line -e netbsd.gdb  -f 0x0084a760
> lwp_lock
> /netbsd/hppa/src/sys/sys/lwp.h:412
> nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$
> hppa--netbsd-addr2line -e netbsd.gdb  -f 0x007d5134
> calcru
> /netbsd/hppa/src/sys/kern/kern_resource.c:506 (discriminator 2)
> nick@zoom:/netbsd/hppa/obj.hppa/sys/arch/hppa/compile/GENERIC$
> 
> Nick





signature.asc
Description: OpenPGP digital signature


vmspace refcnt refactor patch

2020-05-15 Thread Kamil Rytarowski
I propose the following patch:

http://netbsd.org/~kamil/patch-00253-refactor-vmspace-refcnt.txt

Does it look good?

Nick, does it fix the hppa locking problem?



signature.asc
Description: OpenPGP digital signature


fstat(1) and sysctl(3)

2020-04-12 Thread Kamil Rytarowski
Is there any good reason that fstat(1) needs kvm(3)? Is it viable to
offer its functionality with sysctl(3), in particular in struct kinfo_file?

I'm have got a use-case (in GDB (*)) where I would make use of more
fields in struct kinfo_file, at least file path and socket information.

Maybe it would be viable to switch fstat(1) to sysctl(3)?

(*) FreeBSD uses:

fbsd_info_proc_files_entry (kf->kf_type, kf->kf_fd, kf->kf_flags,
kf->kf_offset, kf->kf_vnode_type,
kf->kf_sock_domain, kf->kf_sock_type,
kf->kf_sock_protocol, >kf_sa_local,
>kf_sa_peer, kf->kf_path);

-- gdb/fbsd-nat.c


Re: New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Kamil Rytarowski
On 02.04.2020 04:06, Mouse wrote:
>> We should maintain a contract that all new ioctl operations are
>> system wide unique.
>
> That is, unfortunately, unenforceable in the presence of a user base
> that writes and shares code.  If I #define IOCNEWTHING _IO('?',7) and
> someone else #defines IOCOTHERTHING _IO('?',7), there really isn't any
> way to prevent that, nor to prevent them from conflicting when - and
> eventually it will happen - someone wants to run a system with both new
> thing and other thing.
>

This is partially enforceable. As once we generate catchall switch like:

case FOO_OP:
...
case BAR_OP:
...

a compiler will report error whenever FOO_OP = BAR_OP.



Re: New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Kamil Rytarowski
On 02.04.2020 03:33, Mouse wrote:
 $ ioctlname 2148554498
 WSKBDIO_COMPLEXBELL
>>> Where would you get the mapping between the ioctl value and the
>>> name?  [...]
>> Everything is already done in kdump and reused in other tools like
>> ktruss.
>
> So, the big switch() method.
>

Yes.

We should maintain a contract that all new ioctl operations are system
wide unique.

>>> What about collisions, two ioctls having the same numeric value?
>> There are some collisions, but not that many of them.  In these cases
>> we try to pick the more interesting device.
>
> For kdump, that makes some sense.  For this tool, I think it would make
> the most sense to report them all.  (Arguably, kdump should too...)
>
> Of course, that would mean changing things at least slightly from the
> current kdump approach.  I'm not sure that would necessarily be a bad
> thing, especially if we could somehow (major handwave here) tell which
> ioctls go together, in which case kdump could do heuristics along the
> lines of "this fd accepted FOO_IOC_A, so we're going to decode this one
> as FOO_IOC_B rather than BAR_IOC_C".
>

This would be ideal... however it's not that simple and I would
recommend to go with the path of removing the conflicts entirely for the
general benefit.

For the time being we just live with conflicts and ignore a subset of
operations.

From a quick look there are conflicts e.g. due to chio(1) a SCSI charger
added in NetBSD 1.3 for and.com. Are there still any users?


Re: New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Kamil Rytarowski
On 02.04.2020 02:14, Christos Zoulas wrote:
> In article <0fd513f7-6f0c-6ed1-2119-6ce5313d4...@gmx.com>,
> Kamil Rytarowski   wrote:
>> I propose to add two new tools:
>>
>> - ioctlname
>> - ioctldecode
>
> I would call it ioctlprint and have:
>
> ioctlprint [-f ] || arg
>
> for the input arg can be:
> name = TIOCFOO
> expr = _IO?('x', y, z)
> value = 0xfoobee
>
> The format can be contain %e %n %v which expand to what you
> think and defaults to:
>
> "%n %e %v\n"
>
> christos
>

I've implemented:

ioctlprint [-f format] [-e emul] arg...

$ ./ioctlprint   2148554498 2148554498
WSKBDIO_COMPLEXBELL _IOW('W',0x2,0x10) 0x80105702
WSKBDIO_COMPLEXBELL _IOW('W',0x2,0x10) 0x80105702

$ ./ioctlprint -f "%o %d %d %i %x %e %n\n"  2148554498
020004053402 2148554498 2148554498 2148554498 0x80105702
_IOW('W',0x2,0x10) WSKBDIO_COMPLEXBELL

%n - name
%e - expression
%x - print HEX number
%o - print OCT number
%d %i - print DEC number

http://netbsd.org/~kamil/patch-00245-kdump-ioctlname.3.txt


Re: New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Kamil Rytarowski
On 02.04.2020 02:56, Mouse wrote:
>> $ ioctldecode 2148554498
>> _IOW('W',0x2,0x10)
>
>> $ ioctlname 2148554498
>> WSKBDIO_COMPLEXBELL
>
> Where would you get the mapping between the ioctl value and the name?
> Would this be just a huge switch statement (or compiled-in table), or
> would it search /usr/include/sys at runtime, or what?
>
> In particular, would any special action need to be taken upon adding a
> new ioctl for it to be recognized?
>

Everything is already done in kdump and reused in other tools like
ktruss. Please check: src/usr.bin/kdump/Makefile.ioctl-c.

> What about collisions, two ioctls having the same numeric value?  (I'm
> not aware of any offhand, but I'd be astonished if it never happened.)
>

There are some collisions, but not that many of them. In these cases we
try to pick the more interesting device.

In sanitizers there are around 2000 individual ioctls and 51 collisions.
Some of them are gone as we remove old unmaintained device drivers.


New tools proposal: ioctlname and ioctldecode

2020-04-01 Thread Kamil Rytarowski
I propose to add two new tools:

 - ioctlname
 - ioctldecode

Both of them are a special mode embedded into kdump(1).

>From time to time there is need to decode IOCTL codes and there is no
(as far as I am aware) easy tool to do so.

ioctlname is already invented and it calls the internal function
ioctldecode().

commit f551b480cd03a35cec2a4927270c00cfaa508a27
Author: christos 
Date:   Mon Apr 13 14:39:23 2009 +

Allow kdump to be used as an ioctl decoder if invoked as ioctlname.



Today the internal/hidden program ioctlname produces something like:
"_IOW('W',0x2,0x10)".

I propose to rename this program to ioctldecode and change ioctlname to
print a descriptive operation name as received by the ioctlname() function.

$ ioctldecode 2148554498
_IOW('W',0x2,0x10)

$ ioctlname 2148554498
WSKBDIO_COMPLEXBELL

I propose to install both programs and add appropriate man-pages.

A functional patch is here:

http://netbsd.org/~kamil/patch-00245-kdump-ioctlname.2.txt


Re: Avoid UB in pslist.h (NULL + 0)

2020-03-24 Thread Kamil Rytarowski
On 24.03.2020 14:30, Kamil Rytarowski wrote:
> (3) Patch Clang to start optimizing on NULL + in C so we can return to
> points (1) and (2).
> 

I have received a feedback that the particular NULL + 0 issue is
intended to be reported to the C committee as a defect.

I appreciate this approach. If there is an intention to tune the common
interpretation of the C code, it's better to collaborate with the C
committee directly.



signature.asc
Description: OpenPGP digital signature


Re: Avoid UB in pslist.h (NULL + 0)

2020-03-24 Thread Kamil Rytarowski
On 24.03.2020 07:43, Taylor R Campbell wrote:
>> Date: Sun, 22 Mar 2020 03:30:56 +0100
>> From: Kamil Rytarowski 
>>
>> On 22.03.2020 01:50, Taylor R Campbell wrote:
>>> So far, after several weeks of discussion, nobody has presented a case
>>> that there is a credible thread of a compiler actually misbehaving in
>>> this scenario.
>>
>> There are no public declarations (that was requested on a local ML) but
>> according to my IRC talks, there were plans to start optimizing on NULL
>> + 0 operations, but is was/is considered as a chicken-egg issue. There
>> is a time span now to alert users and
> 
> Perhaps it is not a chicken-egg issue, but pointless abuse of leeway
> in the standard.  So far the public declarations are that Clang
> developers realized they _could_ abuse the leeway this way but they
> _chose not to_; that essentially everyone involved sees such
> `optimization' as abuse; that there are no public reasons stated for
> why the C standard diverges from the C++ standard on this note.
> 
> If you have contrary information, please cite it specifically.
> 

I have nothing to add. I'm personally agnostic to both sides.

>> Both languages can be synced here as the incompatibility was revealed
>> and discussed. Personally I wouldn't be surprised to see adoption of the
>> C behavior as nullptr + 0 is invalid in C++ (and it will be likely
>> invalid in future C).
> 
> The fragment nullptr + 0 is invalid for an unrelated reason: nullptr
> is a pointer to an incomplete type.  We've gone over this already.
> 
> It's not helpful to keep bringing it up: at best it's a distraction,
> and at worst it gives the impression you might not understand basic
> aspects of C and C++ -- an impression which, if true, would make it
> all the more important that you _not_ go around tweaking things to
> appease sanitizers before discussing and understanding them.
> 
>> So far in reply I get 'just noise from the tool' so maybe my messages
>> were not clear enough and reaching authorities (from the clang community
>> and C committee) not credible enough.
> 
> You successfully presented a case that it is technically undefined
> behaviour.  This is not in dispute -- a month ago I cited the specific
> place where the C standard neglects to define it[*] -- so I don't
> understand why you continue to argue that case. 
> 
> [*] https://mail-index.NetBSD.org/tech-kern/2020/02/25/msg026105.html
> 

I reported that these issues are triggered in userland rump (+ in the
pslist.h ATF tests). Rump has userland assumptions with compiler flags.

I already declarared that using assembly code is not planned to be touched.

>>> (b) Change how we invoke ubsan and the compiler by passing
>>> -fno-delete-null-pointer-checks to clang.  joerg objected to this
>>> but I don't recall the details off the top of my head; joerg, can
>>> you expand on your argument against this, and which alternative
>>> you would prefer?
>>
>> I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
>> was asked to revert... after first getting acknowledge from you that it
>> is a good idea...
> 
> This is an inaccurate representation of what happened.  What I said is
> (https://mail-index.NetBSD.org/tech-kern/2020/03/08/msg026125.html):
> 
>   `Adding -fno-delete-null-pointer-checks for clang too sounds
>sensible to me in general, but please check with joerg first.  It
>remains unclear to me that it's necessary here.'
> 
> I asked you to revert because:
> (a) the discussion was still ongoing without a conclusion,
> (b) it remained unclear whether the change was necessary,

I noted that it is necessary at least for memcpy(NULL, NULL, X)-like
usage in rump. Even if we ignore NULL + 0.

The kernel code is already prebuilt with
-fno-delete-null-pointer-checks, but not userland rump. This causes
slight incompatibilities.

I don't see what's unclear. I k

> (c) you failed to check with joerg like I asked, and

I checked earlier that joerg disagrees with GCC and he has its
interpretation of the standard that memcpy(NULL, NULL, 0)-like
optimization is wrong.

> (d) joerg objected.
> 

And GCC developers do not support this (me neither).

> I appreciate that sometimes it takes longer than you or I might like
> to get a clear explanation out of joerg, but it is also fatiguing to
> have to correct misrepresentations of positions in long meandering
> threads in order to ensure changes you are itching to make -- or have
> already made despite ongoing discussion -- are justified and correct.
> 
>>> (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
>>
>> UBSan is an i

Re: Avoid UB in pslist.h (NULL + 0)

2020-03-21 Thread Kamil Rytarowski
On 22.03.2020 01:50, Taylor R Campbell wrote:
>> Date: Sun, 22 Mar 2020 00:03:57 +0100
>> From: Kamil Rytarowski 
>>
>> I propose to change the fun(pointer + 0) logic with fun(pointer, 0).
> 
> I don't think this is a good approach -- it requires modifying code
> further and further away from the relevant part.
> 
> 
> But let's step back a moment.
> 
> So far, after several weeks of discussion, nobody has presented a case
> that there is a credible thread of a compiler actually misbehaving in
> this scenario.
> 

There are no public declarations (that was requested on a local ML) but
according to my IRC talks, there were plans to start optimizing on NULL
+ 0 operations, but is was/is considered as a chicken-egg issue. There
is a time span now to alert users and

It was also confirmed by two people that clang or a C compiler in
general is allowed to optimize the code.

> Yes, it is technically undefined behaviour -- nobody disputes that --
> but so is lots of other stuff that NetBSD relies on such as inline
> asm.  In C++, it is explicitly _not_ undefined behaviour; Clang
> specifically stopped short of exploiting it as undefined behaviour in
> C; nobody presented reasons why it should be undefined in C but
> defined in C++.
> 

Our kernel/rump code is mostly in C or at most a common subset of C and
C++. As the languages are not fully compatible in details, it's safe to
assume the C behavior that is valid for every C++ program, rather than
C++ one that is not valid for C.

Both languages can be synced here as the incompatibility was revealed
and discussed. Personally I wouldn't be surprised to see adoption of the
C behavior as nullptr + 0 is invalid in C++ (and it will be likely
invalid in future C).

> So this all serves to work around false alarms from ubsan -- not
> actual bugs, just noise from the tool.  Presumably the reason we use
> ubsan at all is that it helps find actual bugs -- true alarms.
> There's a few ways we might approach the false alarms:
> 

Out of 3 cases that I was requested, 3 of them were confirmed to be real
bugs (alignment, memcpy(NULL,NULL, 0), ptr + 0). Two of the cases can
generate crashing code now. One of them was researched by the clang
developers and C committee and confirmed that a compiler can impose
assumptions that would break misdesigned code.

I presented examples for the two UB issues that they are harmful now.

So far in reply I get 'just noise from the tool' so maybe my messages
were not clear enough and reaching authorities (from the clang community
and C committee) not credible enough.

> (a) Ignore them.  It's what we've been doing so far.
> 

This is not true. We already run our kernel with GCC with 0 UBSan
reports and perform syzbot fuzzing. We catch real problems there.

It already happened.

We are reaching to the point of running all ATF reports under UBSan.

If we cannot suppress noise from a tool (any kind of it), then the tool
is not much usable for signaling real problems.

> (b) Change how we invoke ubsan and the compiler by passing
> -fno-delete-null-pointer-checks to clang.  joerg objected to this
> but I don't recall the details off the top of my head; joerg, can
> you expand on your argument against this, and which alternative
> you would prefer?
> 

I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
was asked to revert... after first getting acknowledge from you that it
is a good idea...

> (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
> There's a cost to this: if we diverge from how we invoke the
> compiler, we might be disabling true alarms from ubsan that
> reflect how the compiler will actually behave.
> 

UBSan is an integral part of a compiler.

> (d) Patch ubsan to disable the false alarms.  This incurs a
> maintenance burden, but maybe leaves less of a sharp edge to trip
> on than setting compiler flags in the makefile -- updates that
> change the behaviour are more likely to lead to merge conflicts.
> 

I'm fine to patch the tool to disable false alarms or rather report them
upstream.

There are some indeed false positives sometimes when we depend on
assumptions that are imposed after compilation, during linking phase. In
these cases we disable sanitizing as there is no way to teach the tool.

> (e) Patch our own code to suppress the false alarms.  The cost to this
> churn is that it can introduce bugs of its own, and make the code
> harder to understand, and the complexity may become obsolete in
> the next version of the tool but will remain a Chesterton's fence
> (`why is there a dummy argument in all these functions? can we get
> rid of it?').
> 

Every single commit can introduce bugs on their own.

If we introduce code that relies on UB assumptions it is rather opp

Avoid UB in pslist.h (NULL + 0)

2020-03-21 Thread Kamil Rytarowski
I propose to change the fun(pointer + 0) logic with fun(pointer, 0).

We still maintain the sanity checks and we can optimize the code to
generate not worse code than before. We can pass the "0" argument only
with DIAGNOSTIC or DEBUG kernel.

http://netbsd.org/~kamil/patch-00242-pslist-avoid-null-pointer-arithmetic.txt

I confirm that with this change there are no longer any reports UBSan
reports.

I'm open to tune this patch to expected coding style.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-09 Thread Kamil Rytarowski
On 09.03.2020 07:05, Martin Husemann wrote:
> Also note that the getuid()/geteuid() example here is IMHO unrelated to the
> original issue that caused this discussion, so I am not even convinced this
> is NOT a ubsan bug.

We instruct a C compiler that pointer used in the pserialize macros is
never NULL, as the side effect of adding to it 0. As the pointer can be
NULL, this at least confuses the compiler and can result in a
miscompilation.

We workaround it today with -fno-delete-null-pointer-checks in RUMP. In
regular userland we shall avoid NULL pointer arithmetic.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-08 Thread Kamil Rytarowski
On 08.03.2020 19:42, Mouse wrote:
>>> The correct fix is not to disable the null-pointer-check option but
>>> to remove the broken automatic non-null arguments in GCC.
> 
>> The C standard and current usage (GNU) disagrees here.
> 
> GNU is not some kind of arbiter of "current usage" (whatever _that_
> means).
> 
>> memcpy (along with a number of other functions) must not accept NULL
>> arguments and compiler can optimize the code based on these
>> assumptions.
> 
> Then such functions - or the language in which they are embedded - is
> not suitable for writing kernels.
> 

Theoretical C is not suitable for writing kernels and we need extensions
for the freestanding environment. We require at least assembly stubs.

> But, is it "must not accept null arguments" or is it "may do anything
> they like when presented with null arguments"?

Actually both.

>  But the answer is to fix the problem, not to
> twist the code into a pretzel to work around the compiler's refusal to
> be suitable for the job.
> 

We use -fno-delete-null-pointer-checks to disable these optimizations.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-08 Thread Kamil Rytarowski
On 08.03.2020 19:30, Taylor R Campbell wrote:
>> Date: Sun, 8 Mar 2020 20:52:29 +0300
>> From: Roman Lebedev 
>>
>> so we are allowed to lower that in clang front-end as:
>>
>> int
>> statu(char *a)
>> {
>>   __builtin_assume(a != NULL);
>>   a += getuid() - geteuid();
>>   __builtin_assume(a != NULL);
> 
> Allowed, yes.
> 
> What I'm wondering is whether this is something Clang will actually do
> -- and whether it is for any serious reason other than to capriciously
> screw people who write software for real machines instead of for the
> pure C abstract machine to the letter of the standard (and if so,
> whether -fno-delete-null-pointer-checks is enough to disable it).
> 
> Evidently making that assumption _not_ allowed in C++, so presumably
> it's not important for performance purposes.  It's also not important
> for expressive purposes, because I could just as well have written
> assert(a != NULL).
> 
>>> I was told by Roman that it was checked during a C committee meeting and
>>> confirmed to be an intentional UB.
>> Correction: Aaron Ballman asked about this in non-public WG14 reflector
>> mailing list, it wasn't a committee meeting, but the point still stands.
> 
> What does `intentional' UB mean, exactly?  What is the intent behind
> having p + 0 for null p be undefined in C, while the C++ committee saw
> fit to define it?
> 
> Is it because there technically once existed C implementations on
> bizarre architectures with wacky arithmetic on pointers like Zeta-C,
> or is it because there are actual useful consequences to inferring
> that p must be nonnull if we evaluate p + 0?
> 
> I ask because in principle a conformant implementation could compile
> the NetBSD kernel into a useless blob that does nothing -- we rely on
> all sorts of behaviour relative to a real physical machine that is not
> defined by the letter of the standard, like inline asm, or converting
> integers from the VM system's virtual address allocator into pointers
> to objects.  But such an implementation would not be useful.
> 

Aaron (as he was mentioned by name), is a voting member in the C++
committee and actively working on harmonizing C and C++ standards. There
is a good chance that C and C++ will be synced here (they definitely
should).



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-08 Thread Kamil Rytarowski
On 08.03.2020 18:12, Joerg Sonnenberger wrote:
> On Sun, Mar 08, 2020 at 03:33:57PM +0100, Kamil Rytarowski wrote:
>> There was also a request to make a proof that memcpy(NULL,NULL,0) is UB
>> and can be miscompiled.
>>
>> Here is a reproducer:
>>
>> http://netbsd.org/~kamil/memcpy-ub.c
>>
>> 131 kamil@rugged /tmp $ gcc -O0 memcpy.c
>>
>> 132 kamil@rugged /tmp $ ./a.out
>>
>> 1
>>
>> 133 kamil@rugged /tmp $ gcc -O2 memcpy.c
>> 134 kamil@rugged /tmp $ ./a.out
>> 0
>>
>> A fallback for freestanding environment is to use
>> -fno-delete-null-pointer-check.
> 
> The correct fix is not to disable the null-pointer-check option but to
> remove the broken automatic non-null arguments in GCC.
> 
> Joerg
> 

The C standard and current usage (GNU) disagrees here. memcpy (along
with a number of other functions) must not accept NULL arguments and
compiler can optimize the code based on these assumptions.

-fno-delete-null-pointer-check is a fallback disabling this optimizations.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-08 Thread Kamil Rytarowski
On 08.03.2020 18:11, Joerg Sonnenberger wrote:
> On Sun, Mar 08, 2020 at 03:30:02PM +0100, Kamil Rytarowski wrote:
>> NULL+x is now miscompiled by Clang/LLVM after this commit:
>>
>> https://reviews.llvm.org/rL369789
>>
>> This broke various programs like:
>>
>> "Performing base + offset pointer arithmetic is only allowed when base
>> itself is not nullptr. In other words, the compiler is assumed to allow
>> that base + offset is always non-null, which an upcoming compiler
>> release will do in this case. The result is that CommandStream.cpp,
>> which calls this in a loop until the result is nullptr, will never
>> terminate (until it runs junk data and crashes)."
> 
> As you said, using a non-zero offset. Noone here argued that using
> non-zero offsets is or should be valid since that would obviously create
> a pointer outside the zero-sized object.
> 
> Joerg
> 

We catch NULL + x at least here:

Undefined Behavior in t_subr_prf.c:179:9, pointer expression with base 0
overflowed to 0x14
Undefined Behavior in t_subr_prf.c:179:9, pointer expression with base 0
overflowed to 0xa



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-08 Thread Kamil Rytarowski
On 08.03.2020 18:00, Taylor R Campbell wrote:
> Thanks for doing the research.
> 
>> Date: Sun, 8 Mar 2020 15:30:02 +0100
>> From: Kamil Rytarowski 
>>
>> NULL+0 was added to UBSan proactively as it is as of today not
>> miscompiled, but it is planned to so in not so distant time. It is a
>> chicken-egg problem.
> 
> If you say it is planned, can you please cite the plan?
> 

Adding Roman Levedev, we discussed about enabling NULL+0 optimization.

Teaching LLVM about this will allow more aggressive optimization, so
code like this:

http://netbsd.org/~kamil/null_plus_0-ub.c

will report different results depending on optimization levels.

> In C++, adding zero to a null pointer is explicitly allowed and
> guaranteed to return a null pointer.  See, for example, C++11 5.7
> `Additive operators', clause 7, p. 117: `If the value 0 is added to or
> subtracted from a pointer value, the result compares equal to the
> original pointer value.'  C++17 clarifies in 8.7 `Additive operators',
> clause 7, p. 132: `If the value 0 is added to or subtracted from a
> null pointer value, the result is a null pointer.'
> 
> So it would be a little surprising to me if compilers -- which tend to
> focus their efforts on C++ more than C these days -- went out of their
> way to act on the inference that evaluating p + 0 implies p is nonnull
> in C.
> 

I underlined the C language in my message. More elaborated answer here:

https://reviews.llvm.org/D67122#inline-612172

I was told by Roman that it was checked during a C committee meeting and
confirmed to be an intentional UB.

>> There is however a fallback. If we want to use NULL+0, we must use
>> -fno-delete-null-pointer-checks that avoids miscompilation and raising
>> UBSan reports. If we want to allow NULL+x we must enable -fwrapv.
> 
> Adding -fno-delete-null-pointer-checks for clang too sounds sensible
> to me in general, but please check with joerg first.  It remains
> unclear to me that it's necessary here.
> 

Today it will calm down LLVM UBSan. In future potentially avoid
miscompilation.

There are also memcpy(NULL,NULL,0)-like cases that need research why
they happen in the first place.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-03-08 Thread Kamil Rytarowski
On 22.02.2020 17:25, Kamil Rytarowski wrote:
> When running the ATF tests under MKLIBCSANITIZER [1], there are many
> NULL pointer arithmetic issues .
> 
> http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt
> 
> These issues are in macros like:
>  - IN_ADDRHASH_READER_FOREACH()
>  - IN_ADDRLIST_WRITER_INSERT_TAIL()
>  - IFADDR_READER_FOREACH()
>  - etc
> 
> These macros wrap internally pserialize-safe linked lists.
> 
> What's the proper approach to address this issue?
> 
> These reports are responsible for around half of all kinds of the
> remaining Undefined Behavior unique issues when executing ATF tests.
> 
> 
> [1] ./build.sh -N0 -U -V MAKECONF=/dev/null -V HAVE_LLVM=yes -V MKGCC=no
> -V MKLLVM=yes -V MKLIBCSANITIZER=yes -j8 -u -O /public/netbsd-llvm
> distribution
> 
> 
> 

NULL + 0 and NULL + x are both Undefined Behavior (as confirmed by a C
committee member, it was discussed and confirmed to be intentional).

NULL+x is now miscompiled by Clang/LLVM after this commit:

https://reviews.llvm.org/rL369789

This broke various programs like:

"Performing base + offset pointer arithmetic is only allowed when base
itself is not nullptr. In other words, the compiler is assumed to allow
that base + offset is always non-null, which an upcoming compiler
release will do in this case. The result is that CommandStream.cpp,
which calls this in a loop until the result is nullptr, will never
terminate (until it runs junk data and crashes)."

https://github.com/google/filament/pull/1566

LLVM middle-end uses those guarantees for transformations.


This pushed the LLVM devs to implement this NULL+x and NULL+0 check in
UBSan:

https://reviews.llvm.org/D67122

NULL+0 was added to UBSan proactively as it is as of today not
miscompiled, but it is planned to so in not so distant time. It is a
chicken-egg problem.

There is however a fallback. If we want to use NULL+0, we must use
-fno-delete-null-pointer-checks that avoids miscompilation and raising
UBSan reports. If we want to allow NULL+x we must enable -fwrapv.

This means that we can avoid pserialize reports by enabling these CFLAGS
for clang as well.

 22 # We are compiling the kernel code with
no-delete-null-pointer-checks,
 23 # and compiling without it, causes issues at least on sh3 by adding
 24 # aborts after kern_assert on NULL pointer checks.
 25 CFLAGS+=${${ACTIVE_CC} == "gcc":?
-fno-delete-null-pointer-checks :}
 26

https://nxr.netbsd.org/xref/src/sys/rump/Makefile.rump#25

I'm going to test it and switch -fno-delete-null-pointer-check on
Clang/LLVM.


For the historical note, a reproducer miscompiling NULL + x:

http://netbsd.org/~kamil/null_plus_x-ub.c

136 kamil@chieftec /tmp $ /usr/local/bin/clang -O0 null_plus_x-ub.c
137 kamil@chieftec /tmp $ ./a.out
1
138 kamil@chieftec /tmp $ /usr/local/bin/clang -O2 null_plus_x-ub.c
139 kamil@chieftec /tmp $ ./a.out
0
151 kamil@chieftec /tmp $ /usr/local/bin/clang -O3 -fwrapv null_plus_x-ub.c
152 kamil@chieftec /tmp $ ./a.out

1

NULL+0 is planned to follow up.



signature.asc
Description: OpenPGP digital signature


Re: Current status of "support jails-like features"?

2020-03-08 Thread Kamil Rytarowski
On 08.03.2020 03:55, メーリングリストNetBSD wrote:
> Hello,
> 
> I'm interested in "support jails-like features" [1] among The NetBSD
> projects.
> I investigated several current container systems [2], but couldn't find
> one providing kernel-level virtualization.
> I would like to know the current state of this project. Are there any
> NetBSD developers working on this project?
> 

There is initial work in this domain. Replied in a private mail.

> [1] http://wiki.netbsd.org/projects/project/kernel_components/
> [2] https://github.com/NetBSDfr/sailor 
>      https://github.com/jmmv/sandboxctl
> 
> ---
> wataru kasai




signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 23:35, Mouse wrote:
>> Unless I remember wrong, older C standards explicitly say that the
>> integer 0 can be converted to a pointer, and that will be the NULL
>> pointer, and a NULL pointer cast as an integer shall give the value
>> 0.
> 
> The only one I have anything close to a copy of is C99, for which I
> have a very late draft.
> 
> Based on that:
> 
> You are not quite correct.  Any integer may be converted to a pointer,
> and any pointer may be converted to an integer - but the mapping is
> entirely implementation-dependent, except in the integer->pointer
> direction when the integer is a "null pointer constant", defined as
> "[a]n integer constant expression with the value 0" (or such an
> expression cast to void *, though not if we're talking specifically
> about integers), in which case "the resulting pointer, called a null
> pointer, is guaranteed to compare unequal to a pointer to any object or
> function".  You could have meant that, but what you wrote could also be
> taken as applying to the _run-time_ integer value 0, which C99's
> promise does not apply to.  (Quotes are from 6.3.2.3.)
> 
> I don't think there is any promise that converting a null pointer of
> any type back to an integer will necessarily produce a zero integer.
> 
> /~\ The ASCII   Mouse
> \ / Ribbon Campaign
>  X  Against HTML  mo...@rodents-montreal.org
> / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
> 

$ cat test.cpp

#include 

int
main(int argc, char **argv)
{
if (((char *)0)[argc])
return 1;
else
return 0;
}

$ g++ test.cpp
$ ./a.out
Memory fault (core dumped)

And some variations:


$ g++ test.cpp
test.cpp: In function ‘int main(int, char**)’:
test.cpp:6:15: warning: converting NULL to non-pointer type
[-Wconversion-null]
  if (NULL[argc])
   ^
test.cpp:6:15: error: invalid types ‘long int[int]’ for array subscript


$ g++ test.cpp
test.cpp: In function ‘int main(int, char**)’:
test.cpp:6:18: error: invalid types ‘std::nullptr_t[int]’ for array
subscript
  if (nullptr[argc])
  ^

NULL in C is expected to be harmonized with nullptr from C++.

We still can store NULL/nullptr in variables as before and there is no
change in the produced code. The only change is on the syntax level as
we can catch more bugs earlier. Whenever a compiler will be smart enough
to deduce that the code is nullptr[0] it will raise an error.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 21:18, Mouse wrote:
>>> If we use 0x0, it can be a valid pointer.
> 
>>> If we use NULL, it's not expected to work and will eventually
>>> generate a syntax erro.
> 
> Then someone has severely broken compatability with older versions of
> C.  0x0 and (when one of the suitable #includes has been done) NULL
> have both historically been perfectly good null pointer constants.
> 
> Also...syntax error?  Really?  _Syntax_ error??  I'd really like to see
> what they've done to the grammar to lead to that; I'm having trouble
> imagining how that would be done.
> 

The process of evaluation of the NULL semantics is not a recent thing.

Not so long time, still in the NetBSD times, it was a general practice
to allow dereferencing the NULL pointer and expect zeroed bytes over there.

We still maintain compatibility with this behavior (originated as a hack
in PDP11) in older NetBSD releases (NetBSD-0.9 Franz Lisp binaries
depend on this).


> /~\ The ASCII   Mouse
> \ / Ribbon Campaign
>  X  Against HTML  mo...@rodents-montreal.org
> / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
> 




signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 15:35, Don Lee wrote:
> 
>> On Feb 24, 2020, at 8:05 AM, Mouse  wrote:
>>
> RUST is better defined that C and is indeed used in OS development
> these days
 ...so?  I don't see how this is related to the rest of the
 discussion.
>>> As C is considered as not suitable for OS development,
>>
>> Once again, there is no such language as C.  There is a family of
>> closely related languages collectively called C.
>>
>> But it's actually the compiler, not the language.
>>
>>> there is an escape plan, already with a successful story in this
>>> domain.
>>
>> There's another one, and one that doesn't require the complete rewrite
>> a switch as drastic as C->rust would: various compilers (including
>> older versions of the gcc family) that don't think it reasonable to
>> take clear code and language-lawyer it into broken executables.
>>
> We need to be mindful of the gargantuan body of code written in “C”, 
> expecting the “old” behavior, much of it no longer having any sort of support.
> 
> Software lives almost as long as government programs.
> 
> -dgl-
> 

While there, CHERI CPU can catch invalid intermediates (invalid pointer,
before dereferencing).

This is something that breaks a lot of old C code. tcpdump (that still
preserves ifdefs for MSDOS) received rewrite to remove these types of bugs.

https://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 15:04, Jason Thorpe wrote:
> 
>> On Feb 24, 2020, at 4:22 AM, Kamil Rytarowski  wrote:
>>
>> A compiler once being smart enough can introduce ILL/SEGV traps into
>> code that performs operations on NULL pointers. This already bitten us
>> when we were registering a handler at address 0x0 for the kernel code,
>> GCC changed the operation into a cpu trap. (IIRC it was in the sparc code.)
> 
> Nonsense, I think it's fair to classify that as a bug.  That sort of stuff is 
> *not* supposed to happen if -ffreestanding is passed to the compiler.
> 
> -- thorpej
> 

If we use 0x0, it can be a valid pointer.

If we use NULL, it's not expected to work and will eventually generate a
syntax erro.

UBSan as a runtime tool tries to indirectly catch the latter with the
former and is prone to some rare false positives (so far not reported).

If a compiler is too smart for 0x0 pointers, transforming them to abort
traps, it is a compiler bug. I noted that this already happens.

On 24.02.2020 15:05, Mouse wrote:
> (3) If you have reason to think the C committee would be interested in
> having me as a member, let me know whom to talk to.  I might or might
> not actually end up interested in joining, but I'd like more info.

http://www.open-std.org/jtc1/sc22/wg14/



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 14:03, Mouse wrote:
 It is now in C++ mainstream and already in C2x draft.
>>> Then those are not suitable languages for OS implementations.
>> This battle is lost for C
> 
> C is not a language.  C is a family of closely related languages.
> 

If we tread C as gnu89 gnu99 gnu11 k etc this is true.

> Some of them are suitable for OS implementation.  It appears some of
> the more recent ones are not, but this does not mean the older ones 
> also aren't.
> 

From my perception the trend is inversed. Things that were undefined or
unspecified in older revisions of C, are more clearly defined now.

> Undefined behaviour as a way of describing differences between
> implementations, things that it limits portability to depend on, is
> useful.  Undefined behaviour as a license-by-fiat for compilers to
> unnecessarily transform code in unexpected ways is not.  Software
> languages and their compilers exist to serve their users, not the other
> way around; it is not a compiler's place to take the position of "ha
> ha, the code you wrote is clear but I can find a way to lawyer it into
> formally undefined behaviour, so I'm going to transform it into
> something I know damn well you didn't expect".
> 

Please join the C committee as a voting member or at least submit papers
with language changes. Complaining here won't change anything.

(Out of people in the discussion, I am involved in wg14 discussions and
submit papers.)

>> RUST is better defined that C and is indeed used in OS development
>> these days
> 
> ...so?  I don't see how this is related to the rest of the discussion.
> 

As C is considered as not suitable for OS development, there is an
escape plan, already with a successful story in this domain.

> /~\ The ASCII   Mouse
> \ / Ribbon Campaign
>  X  Against HTML  mo...@rodents-montreal.org
> / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
> 




signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 13:41, Joerg Sonnenberger wrote:
> On Mon, Feb 24, 2020 at 11:42:01AM +0100, Kamil Rytarowski wrote:
>> Forbidding NULL pointer arithmetic is not just for C purists trolls. It
>> is now in C++ mainstream and already in C2x draft.
> 
> This is not true. NULL pointer arithmetic and nullptr arithmetic are
> *very* different things. Do not conflate them.
> 
> Joerg
> 

As noted, they are allowed to be practically the same in C++. The C
proposal (n2394) NULL is marked as deprecated and NULL should be set to
nullptr.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 12:14, Mouse wrote:
>> Forbidding NULL pointer arithmetic is not just for C purists trolls.
>> It is now in C++ mainstream and already in C2x draft.
> 
> Then those are not suitable languages for OS implementations.
> 
> I'm with campbell and mrg on this one.  It is not appropriate to twist
> NetBSD's code into a pretzel to work around "bugs" created by language
> committees deciding to give compilers new latitutde to "optimize"
> meaningful code into trash.
> 

This battle is lost for C and not be fought on a downstream user of a C
compiler (Matt Thomas insisted at some point to get the kernel buildable
with C++ and patched it for this..).

A compiler once being smart enough can introduce ILL/SEGV traps into
code that performs operations on NULL pointers. This already bitten us
when we were registering a handler at address 0x0 for the kernel code,
GCC changed the operation into a cpu trap. (IIRC it was in the sparc code.)

Looking at it from the proper perspective, the only rumpkernel reported
 NULL->0 arithmetic is performed by the pserialize macros. Once we will
patch them, the problem can go away. So claim about twisting the kernel
code or churn is exaggeration.


RUST is better defined that C and is indeed used in OS development these
days (there are startups doing OS development in RUST, e.g.
https://github.com/oxidecomputer).



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-24 Thread Kamil Rytarowski
On 24.02.2020 05:03, Taylor R Campbell wrote:
>> Date: Sun, 23 Feb 2020 22:51:08 +0100
>> From: Kamil Rytarowski 
>>
>> On 23.02.2020 20:08, Taylor R Campbell wrote:
>> Date: Sun, 23 Feb 2020 22:51:08 +0100
>> From: Kamil Rytarowski 
>>
>> On 23.02.2020 20:08, Taylor R Campbell wrote:
>>>> Date: Sat, 22 Feb 2020 17:25:42 +0100
>>>> From: Kamil Rytarowski 
>>>>
>>>> What's the proper approach to address this issue?
>>>
>>> What do these reports mean?
>>>
>>> UBSan: Undefined Behavior in 
>>> /usr/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:2351:2, pointer 
>>> expression with base 0 overflowed to 0
>>
>> We added 0 to a NULL pointer.
>>
>> They can be triggered by code like:
>>
>> char *p = NULL;
>> p += 0;
> 
> It seems to me the proper approach is to teach the tool to accept
> this, and to avoid cluttering the tree with churn to work around the
> tool's deficiency, unless there's actually a serious compelling
> argument -- beyond a language-lawyering troll -- that (char *)NULL + 0
> is meaningfully undefined.
> 
> We already assume, for example, that memset(...,0,...) is the same as
> initialization to null pointers where the object in question is a
> pointer or has pointers as subobjects.
> 

Forbidding NULL pointer arithmetic is not just for C purists trolls. It
is now in C++ mainstream and already in C2x draft.

The newer C standard will most likely (already accepted by the
committee) adopt nullptr on par with nullptr from C++. In C++ we can
"#define NULL nullptr" and possibly the same will be possible in C.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2394.pdf

This will change all arithmetic code operating on NULL into syntax error.

> I think we should treat memcpy(NULL,NULL,0) similarly and tell the
> tool `no, on NetBSD that really is defined and we're not interested in
> hearing about theoretical nasal demons from armchair language
> lawyers'.
> 

memcpy(3) and other string functions are different. It is undefined if
we just run it with memcpy(rand(), rand(), 0) and the first two
arguments point to invalid memory.

memcpy(0, 0, x) have another issue with overlapping memory that makes it
undefined.

In theory memcpy(x,y,z) where x or y are 0 is valid, whenever we map 0x0
in the address space, but that is so rare that GCC defines these
arguments as nonnull.



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-23 Thread Kamil Rytarowski
On 23.02.2020 20:08, Taylor R Campbell wrote:
>> Date: Sat, 22 Feb 2020 17:25:42 +0100
>> From: Kamil Rytarowski 
>>
>> When running the ATF tests under MKLIBCSANITIZER [1], there are many
>> NULL pointer arithmetic issues .
>>
>> http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt
>>
>> These issues are in macros like:
>>  - IN_ADDRHASH_READER_FOREACH()
>>  - IN_ADDRLIST_WRITER_INSERT_TAIL()
>>  - IFADDR_READER_FOREACH()
>>  - etc
>>
>> These macros wrap internally pserialize-safe linked lists.
>>
>> What's the proper approach to address this issue?
> 
> What do these reports mean?
> 
> UBSan: Undefined Behavior in 
> /usr/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:2351:2, pointer 
> expression with base 0 overflowed to 0
> 

We added 0 to a NULL pointer.

They can be triggered by code like:

char *p = NULL;
p += 0;

or

char *p = NULL;
if (p[0] == NULL) {}



signature.asc
Description: OpenPGP digital signature


Re: NULL pointer arithmetic issues

2020-02-22 Thread Kamil Rytarowski
On 22.02.2020 19:39, Joerg Sonnenberger wrote:
> On Sat, Feb 22, 2020 at 05:25:42PM +0100, Kamil Rytarowski wrote:
>> When running the ATF tests under MKLIBCSANITIZER [1], there are many
>> NULL pointer arithmetic issues .
> 
> Which flags are the sanitizers using? Because I wouldn't be surprised if
> they just hit _PSLIST_VALIDATE_PTRS and friends.
> 
> Joerg
> 

This patch did not help. I double checked that this branch is really taken.

Index: sys/sys/pslist.h
===
RCS file: /cvsroot/src/sys/sys/pslist.h,v
retrieving revision 1.7
diff -u -r1.7 pslist.h
--- sys/sys/pslist.h1 Dec 2019 15:28:19 -   1.7
+++ sys/sys/pslist.h22 Feb 2020 20:51:42 -
@@ -32,6 +32,7 @@
 #ifndef_SYS_PSLIST_H
 #define_SYS_PSLIST_H

+#include 
 #include 
 #include 

@@ -288,7 +289,9 @@
  * Type-safe macros for convenience.
  */

-#if defined(__COVERITY__) || defined(__LGTM_BOT__)
+#if defined(__COVERITY__) || defined(__LGTM_BOT__) || \
+   __has_feature(undefined_behavior_sanitizer) || \
+   defined(__SANITIZE_UNDEFINED__)
 #define_PSLIST_VALIDATE_PTRS(P, Q) 0
 #define_PSLIST_VALIDATE_CONTAINER(P, T, F) 0
 #else



signature.asc
Description: OpenPGP digital signature


NULL pointer arithmetic issues

2020-02-22 Thread Kamil Rytarowski
When running the ATF tests under MKLIBCSANITIZER [1], there are many
NULL pointer arithmetic issues .

http://netbsd.org/~kamil/mksanitizer-reports/ubsan-2020-02-22-null-pointer.txt

These issues are in macros like:
 - IN_ADDRHASH_READER_FOREACH()
 - IN_ADDRLIST_WRITER_INSERT_TAIL()
 - IFADDR_READER_FOREACH()
 - etc

These macros wrap internally pserialize-safe linked lists.

What's the proper approach to address this issue?

These reports are responsible for around half of all kinds of the
remaining Undefined Behavior unique issues when executing ATF tests.


[1] ./build.sh -N0 -U -V MAKECONF=/dev/null -V HAVE_LLVM=yes -V MKGCC=no
-V MKLLVM=yes -V MKLIBCSANITIZER=yes -j8 -u -O /public/netbsd-llvm
distribution





signature.asc
Description: OpenPGP digital signature


Re: fault(4)

2020-02-22 Thread Kamil Rytarowski
On 08.02.2020 11:47, Maxime Villard wrote:
>
> Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already
> gives
> an instant crash:
>
> kernel diagnostic assertion "radix_tree_empty_tree_p(>pm_pvtree)"
> failed: file ".../sys/arch/x86/x86/pmap.c"
>

There is a number of similar reports on syzbot.

> Looks like radixtree.c doesn't handle allocation failures very well
> somewhere.
>
> fault(4) seems like the kind of feature that would be useful for
> stress-testing
> and fuzzing. As you can see in the diff, its code is extremely simple.
>
> Maxime
>
> [1] https://m00nbsd.net/garbage/fault/fault.diff

This tool is a must have but I defer review to others.


SIGCHLD set to SIG_DFL on exec(3)

2020-02-08 Thread Kamil Rytarowski
On 06.02.2020 20:51, Robert Elz wrote:
> Module Name:  src
> Committed By: kre
> Date: Thu Feb  6 19:51:59 UTC 2020
> 
> Modified Files:
>   src/bin/sh: main.c
> 
> Log Message:
> If we are invoked with SIGCHLD ignored, we fail badly, as we assume
> that we can always wait(2) for our children, and an ignored SIGCHLD
> prevents that.   Recent versions of bash can be convinced (due to a
> bug most likely) to invoke us that way.   Always return SIGCHLD to
> SIG_DFL during init - we already prevent scripts from fiddling it.
> 
> All ash derived shells apparently have this problem (observed by
> Martijn Dekker, and notified on the bash-bug list).  Actual issue
> diagnosed by Harald van Dijk (same list).
> 


> + (void)signal(SIGCHLD, SIG_DFL);


We are allowed to fix this in the kernel for everybody:

"If the SIGCHLD signal is set to be ignored by the calling process
image, it is unspecified whether the SIGCHLD signal is set to be ignored
or to the default action in the new process image."


"This volume of POSIX.1-2017 specifies that signals set to SIG_IGN
remain set to SIG_IGN, and that the new process image inherits the
signal mask of the thread that called exec in the old process image.
This is consistent with historical implementations, and it permits some
useful functionality, such as the nohup command. However, it should be
noted that many existing applications wrongly assume that they start
with certain signals set to the default action and/or unblocked. In
particular, applications written with a simpler signal model that does
not include blocking of signals, such as the one in the ISO C standard,
may not behave properly if invoked with some signals blocked. Therefore,
it is best not to block or ignore signals across execs without explicit
reason to do so, and especially not to block signals across execs of
arbitrary (not closely cooperating) programs."

https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html



signature.asc
Description: OpenPGP digital signature


Kernel (9.99.44) responsiveness issues

2020-02-02 Thread Kamil Rytarowski
I keep observing responsiveness issues on NetBSD-current. This happened
in last 2 months.

Whenever I start building something with -j${CORES}, I have significant
delays of responsiveness in other applications.

load averages:  2.69,  5.56,  6.22;   up 0+01:32:42 12:12:34
71 processes: 69 sleeping, 2 on CPU
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.1% interrupt, 99.8%
idle
Memory: 19G Act, 9639M Inact, 416K Wired, 34M Exec, 19G File, 43M Free
Swap: 64G Total, 64G Free

  PID USERNAME PRI NICE   SIZE   RES STATE  TIME   WCPUCPU COMMAND
0 root   00 0K   87M CPU/7  0:40  0.00%  0.49% [system]
15823 root  85016M 2508K poll/1 0:01  0.00%  0.00% nbmake
25446 kamil 43028M 2452K CPU/0  0:00  0.00%  0.00% top
14117 root 114027M 3356K tstile/0   0:00  0.00%  0.00% ld
29088 root 114027M 3280K tstile/3   0:00  0.00%  0.00% ld
20839 root 114027M 3208K tstile/1   0:00  0.00%  0.00% ld
19550 root 114026M 3184K tstile/6   0:00  0.00%  0.00% ld
13716 root 114026M 3104K tstile/2   0:00  0.00%  0.00% ld
 8758 root 114026M 3048K tstile/7   0:00  0.00%  0.00% ld
  240 root 114026M 2580K tstile/0   0:00  0.00%  0.00% ld

I can see in top(1) that processes are locked in turnstiles and load
goes down.

$ uname -a
NetBSD chieftec 9.99.44 NetBSD 9.99.44 (GENERIC) #0: Fri Jan 31 19:26:07
CET 2020
root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64

135 kamil@chieftec /home/kamil $ cpuctl list

Num  HwId Unbound LWPs Interrupts Last change  #Intr
   --  -
00online   intr   Sun Feb  2 10:40:26 2020 13
12online   intr   Sun Feb  2 10:40:26 2020 0
24online   intr   Sun Feb  2 10:40:26 2020 0
36online   intr   Sun Feb  2 10:40:26 2020 0
41online   intr   Sun Feb  2 10:40:26 2020 0
53online   intr   Sun Feb  2 10:40:26 2020 0
65online   intr   Sun Feb  2 10:40:26 2020 0
77online   intr   Sun Feb  2 10:40:26 2020 0
136 kamil@chieftec /home/kamil $ cpuctl identify 0
Cannot bind to target CPU.  Output may not accurately describe the target.
Run as root to allow binding.

cpu0: highest basic info 000d
cpu0: highest extended info 8008
cpu0: "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz"
cpu0: Intel Xeon E3-1200v2 and 3rd gen core, Ivy Bridge (686-class),
3392.48 MHz
cpu0: family 0x6 model 0x3a stepping 0x9 (id 0x306a9)
cpu0: features
0xbfebfbff
cpu0: features
0xbfebfbff
cpu0: features 0xbfebfbff
cpu0: features1 0x7fbae3ff
cpu0: features1 0x7fbae3ff
cpu0: features1
0x7fbae3ff
cpu0: features2 0x28100800
cpu0: features3 0x1
cpu0: features5 0x281
cpu0: xsave features 0x7
cpu0: xsave instructions 0x1
cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
cpu0: enabled xsave 0x7
cpu0: I-cache 32KB 64B/line 8-way, D-cache 32KB 64B/line 8-way
cpu0: L2 cache 256KB 64B/line 8-way
cpu0: L3 cache 8MB 64B/line 16-way
cpu0: 64B prefetching
cpu0: ITLB 64 4KB entries 4-way, 2M/4M: 8 entries
cpu0: DTLB 64 4KB entries 4-way, 2M/4M: 32 entries (L0)
cpu0: L2 STLB 512 4KB entries 4-way
cpu0: Initial APIC ID 1
cpu0: Cluster/Package ID 0
cpu0: Core ID 0
cpu0: SMT ID 1
cpu0: MONITOR/MWAIT extensions 0x3
cpu0: monitor-line size 64
cpu0: C1 substates 2
cpu0: C2 substates 1
cpu0: C3 substates 1
cpu0: DSPM-eax 0x77
cpu0: DSPM-ecx 0x9
cpu0: SEF highest subleaf 
cpu0: Perfmon-eax 0x7300403
cpu0: Perfmon-eax 0x7300403
cpu0: Perfmon-edx 0x603
cpu0: microcode version 0x15, platform ID 1



signature.asc
Description: OpenPGP digital signature


Re: x86 bootstrap features

2020-01-20 Thread Kamil Rytarowski
On 20.01.2020 17:42, Emile `iMil' Heitor wrote:
> 
> Hi Kamil, Emmanuel & all,
> 
> On Tue, 24 Sep 2019, Kamil Rytarowski wrote:
> 
>> On 24.09.2019 14:26, Emmanuel Dreyfus wrote:
>>> On Tue, Sep 24, 2019 at 01:31:51PM +0200, Kamil Rytarowski wrote:
>>>> My use-case is "qemu-system-x86_64 -kernel ./netbsd". Last I tried
>>>> (with
>>>> multiboot2 patches merged) it still did not work.
>>>
>>> I did not commit anything in multiboot support in the NetBSD kernel,
>>> I only worked on bootstraps for now, hence the steady failure you
>>> experience should come at no suprise.
>>>
>>> For now our kernel has support code for multiboot 1 for i386 only.
>>
>> qemu-system-i386 works, but -x86_64 not.
>>
>> Are there plans to add it to the amd64 kernel?
> 
> Is there any news on this front? Being able to boot an amd64 kernel
> directly
> from kvm would give NetBSD the ability to be started by AWS
> Firecracker[1] out
> of the box which would be amazing.
> 
> [1]: https://github.com/firecracker-microvm/firecracker
> 

There was some work on multiboot but it was reverted.

I consider qemu -kernel NetBSD/amd64 booting as a high priority.

> 
> Emile `iMil' Heitor * 
>   _
>     | http://imil.net    | ASCII ribbon campaign ( )
>     | http://www.NetBSD.org  |  - against HTML email  X
>     | http://gcu.info    |  & vCards / \
> 
> 
> !DSPAM:5e25d89018011320695049!
> 




signature.asc
Description: OpenPGP digital signature


Re: GSoC Proposal - Defragmentation for FFS

2020-01-20 Thread Kamil Rytarowski
On 20.01.2020 04:53, Chen,Xizi wrote:
> Hi NetBSD Community,
> 

Welcome@

> I am currently a first year Master's Degree student studying Computer
> Science at Northwest Missouri State University. I'm a professional C/C++
> developer with about 4 years of work experience as a storage systems
> engineer. My work mostly revolved around storage stack and I/O path in
> general.
> 
> One of the issues that hits very close to home for me is file system
> aging. I'm interested in learning in detail how to tackle fragmentation.
> I believe "Defragmentation for FFS" project provides me an excellent
> opportunity to learn more.
> 
> I have experience using and writing software in Linux, including
> experience in writing and debugging kernel modules. I am interested in
> learning more about the project "Defragmentation for FFS" albeit my work
> mostly involved working with XFS, I am more than willing to learn and
> contribute to it.
> 
> Looking forward to learning more about the project and working with
> NetBSD community.
> 
> Cheers,
> /Xizi Chen,/
> /NWMSU/

Please elaborate your involvement in XFS.

There is an ongoing (but not formally GSoC) project by Maciej to port
XFS testsuite for the NetBSD filesystems. We probably could use some
help here.



signature.asc
Description: OpenPGP digital signature


Re: sys_ptrace_lwpstatus.c (Was: CVS commit: src/sys)

2019-12-27 Thread Kamil Rytarowski
On 26.12.2019 15:11, Valery Ushakov wrote:
> On Thu, Dec 26, 2019 at 08:52:39 +0000, Kamil Rytarowski wrote:
> 
>> Module Name: src
>> Committed By:kamil
>> Date:Thu Dec 26 08:52:39 UTC 2019
>>
>> Modified Files:
>>  src/sys/kern: files.kern sys_ptrace_common.c
>>  src/sys/sys: ptrace.h
>> Added Files:
>>  src/sys/kern: sys_ptrace_lwpstatus.c
>>
>> Log Message:
>> Put ptrace_read_lwpstatus() and process_read_lwpstatus() to a new file
>>
>> Fixes "no PTRACE" kernel build, in particular zaurus kernel=INSTALL_C700.
> 
> This is counterintuitive when a sys_ptrace* file with ptrace_*
> functions does not depend on options ptrace.  That seems to be a
> strong indication the functions and the file are misnamed.
> 
> filekern/sys_ptrace.c   ptrace
> filekern/sys_ptrace_common.cptrace
> filekern/sys_ptrace_lwpstatus.c kern
> 
> -uwe
> 

I will refactor this.



signature.asc
Description: OpenPGP digital signature


Introducing PT_LWPSTATUS + PT_LWPNEXT, obsoleting PT_LWPINFO

2019-12-21 Thread Kamil Rytarowski
PT_LWPINFO is a legacy ptrace(2) operation that was originally intended
to retrieve the thread (LWP) information inside a traced process.

At the end of the day, this call has been designed to work as an
iterator over threads and retrieve the LWP id + event information. The
event information is received in a raw format (PL_EVENT_NONE,
PL_EVENT_SIGNAL, PL_EVENT_SUSPENDED).

Problems:

1. PT_LWPINFO shares the operation name with PT_LWPINFO from FreeBSD
that works differently and is used for different purposes:

 - On FreeBSD PT_LWPINFO returns pieces of information for the suspended
thread, not the next thread in iteration.

 - FreeBSD uses a custom interface for iterating over threads (actually
retrieving the threads is done with PT_GETNUMLWPS + PT_GETLWPLIST).

 - There is almost no overlapping correct usage of PT_LPWINFO on NetBSD
and PL_LWPINFO FreeBSD and this causes confusion and misuse of the
interfaces (recently I fixed such misuse in the DTrace code).

2. pl_event can merely return whether a signal was emitted to all
threads or a single one. There is no information whether this is per-LWP
signal or per-PROC signal, no siginfo_t information attached etc.

3. Syncing our behavior with FreeBSD would mean complete breakage of our
PT_LWPINFO users and it is actually not needed, as we receive full
siginfo_t through Linux-like PT_GET_SIGINFO instead of reimplementing
siginfo_t inside ptrace_lwpinfo in a FreeBSD-style. (Actually FreeBSD
wanted to follow up after NetBSD and adopt some of our APIs in ptrace(2)
and signals.).

4. Our PT_LWPINFO is reduced in usability to list LWP ids in a traced
process.

5. The PT_LWPINFO semantics cannot be used in core files as-is (as our
PT_LPWINFO returns next LWP, not the prompted one) and pl_event is at
least redundant with netbsd_elfcore_procinfo.cpi_siglwp... and still
less powerful (as it cannot distinguish per-LWP and per-PROC signal in a
single-threaded application).

6. PT_LWPINFO is already documented in the BUGS section of ptrace(2)...
as it contains more flaws. This is basically the only weak part of our
ptrace(2) API.



Proposed solution:

1. Remove PT_LWPINFO from the public ptrace(2) API, keep it only as a
hidden namespaced symbol for legacy purposes.

2. Introduce PT_LWPSTATUS that is prompts the kernel about exact thread
and retrieves useful information about LWP.

3. Introduce PT_LWPNEXT with the iteration semantics from PT_LWPINFO,
namely return the next LWP.

4. Ship with per-LWP information in core(5) files as "PT_LWPSTATUS@nnn".

5. Fix flattening the signal context in netbsd_elfcore_procinfo in
core(5) files and move per-LWP signal information to per-LWP structure
"PT_LWPSTATUS@nnn".

6. Do not bother with FreeBSD like PT_GETNUMLWPS + PT_GETLWPLIST calls,
as this is a micro-optimization. We intend to retrieve the list of
threads once on attach/exec and later trace them through the LWP events
(PTRACE_LWP_CREATE, PTRACE_LWP_EXIT). It's more valuable to keep more
compat with current usage of PT_LWPINFO.

7. Keep the existing ATF tests for PT_LWPINFO to avoid rot.


PT_LWPSTATUS and PT_LWPNEXT operate over newly introduced "struct
ptrace_lwpstatus". This structure is inspired by:
 - SmartOS lwpstatus_t,
 - struct ptrace_lwpinfo from NetBSD,
 - struct ptrace_lwpinfo from FreeBSD

and their usage in real existing world-wide open-source software.


#define PL_LNAMELEN 20  /* extra 4 for alignment */

struct ptrace_lwpstatus {
lwpid_t pl_lwpid;   /* LWP described */
sigset_tpl_sigpend; /* LWP signals pending */
sigset_tpl_sigmask; /* LWP signal mask */
charpl_name[PL_LNAMELEN];   /* LWP name, may be empty */
void*pl_private;/* LWP private data */
/* Add fields at the end */
};


 - pt_lwpid is picked from PT_LWPINFO.

 - pl_event is removed entirely as useless, misleading and harmful.

 - pl_sigpend and pl_sigmask are mainly intended to untangle the
cpi_sig* fields from "struct ptrace_lwpstatus" (fix "XXX" in the kernel
code).

 - pl_name is a quick to use API to retrieve the LWP name, replacing
sysctl() prompting (previous algorithm: retrieving the number of LWPs,
retrieving all LWPs, iterating over them, finding matching id, copying
LWP name); pl_name will also ship with the missing LWP name information
in core(5) files

 - pl_private implements currently missing interface to read the TLS
base value.

In the end I have decided to avoid a write-mode version of PT_LWPSTATUS
that rewrites signals, name or private pointer. These options are
practically unused in real existing open-source software. There are 2
exceptions that I am familiar with but both are specific to kludges
overusing ptrace(2). Once these operations will be really needed, they
can be implemented without write-mode version of PT_LWPSTATUS, patching
guest's code.


Diff fixing the build against the in-sources GDB is as follows:

diff --git 

Re: [filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Kamil Rytarowski
On 17.12.2019 15:44, Andrew Doran wrote:
 Typically with a character device, the kmod can get unloaded while an ioctl
 is being executed on it.
>
> That's solvable.  Conceptually I think the main stumbling block is that
> there are two layers at play which need to reconciled: specfs and devsw.  It
> could also be an opportunity to lessen the distinction between block and
> character devices, there's no real need for cached access from userspace,
> that would simplify things too.
>
 When it comes to syscalls, I haven't looked
 closely, but the issue is likely the same.
>
> It's atomic and side effect free if done correctly.  We have pleasing
> examples of this.  This is hard to get right though, so naturally mistakes
> are made.
>

It would be nice to have at least an example of doing it right.

> Andrew
>



Re: [filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Kamil Rytarowski
On 17.12.2019 14:19, Maxime Villard wrote:
> Le 17/12/2019 à 12:34, Kamil Rytarowski a écrit :
>> On 17.12.2019 09:16, Maxime Villard wrote:
>>>> Module Name:    htdocs
>>>> Committed By:   christos
>>>> Date:   Tue Dec 17 01:03:49 UTC 2019
>>>>
>>>> Modified Files:
>>>>  htdocs/support/security: advisory.html index.html
>>>>
>>>> Log Message:
>>>> new advisory
>>>>
>>>>
>>>> To generate a diff of this commit:
>>>> cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html
>>>> cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html
>>>>
>>>> Please note that diffs are not public domain; they are subject to the
>>>> copyright notices on the relevant files.
>>>
>>> There is something I don't understand here. Why keep this totally
>>> useless
>>> misfeature, when we already have many tracing facilities that can do
>>> just
>>> about the same job without having security issues?
>>>
>>> The recommendation in the advisory is literally to remove the kernel
>>> module from the fs. I don't see what could possibly be the use of such a
>>> misfeature as filemon; I would remove it completely from the kernel
>>> source tree.
>>>
>>> Maxime
>>
>> From:
>> http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc
>>
>>
>> "Additionally the way filemon does filesystem interception is racy
>> and can lead to random crashes if the system calls are in use
>> while the module is unloaded."
>>
>> Is this issue fixable? Not speaking for filemon in particular, I find
>> this ability to rewrite the syscall table on the fly as a feature.
>> Keeping a functional module with this property (even if disabled by
>> default) seems useful to me.
>
> As far as I can tell, there are many races caused by autoloading.
>
> Typically with a character device, the kmod can get unloaded while an ioctl
> is being executed on it. When it comes to syscalls, I haven't looked
> closely, but the issue is likely the same.
>
> You can use tricks to "narrow down" the races; eg in NVMM, I use a global
> 'nmachines' variable, which prevents unloading in ~most cases. But I see
> no way to fix these races, except using atomic refcounts and mutexes on
> the ioctls and syscalls; obviously, this won't scale.
>
> I find this autoloading stuff to be a misfeature, too. If we want something
> on by default, then we should put it in GENERIC; if we want it disabled but
> accessible, then we should put it in a kmod that requires a manual modload
> from root.
>
> Putting stuff in kmods AND having the kernel load them automatically serves
> no purpose; it just adds bugs, and sometimes creates the wrong feeling that
> a driver is disabled while it isn't (like filemon).
>
> Other than that, I don't really understand your point; you can still do
> syscall interception with a custom kmod if you want, regardless of filemon.
>

Out of filemon, I am only interested in syscall_hijack
(filemon_wrapper_install() + filemon_wrapper_deinstall()). If that can
be reliable, I would keep it in src/sys/modules/example.

https://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon_wrapper.c#90

I have no personal interest on the rest of filemon. Switching this
make(1) feature to at least ptrace(2) should be straightforward.

> Maxime



Re: [filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Kamil Rytarowski
On 17.12.2019 09:16, Maxime Villard wrote:
>> Module Name:    htdocs
>> Committed By:   christos
>> Date:   Tue Dec 17 01:03:49 UTC 2019
>>
>> Modified Files:
>>     htdocs/support/security: advisory.html index.html
>>
>> Log Message:
>> new advisory
>>
>>
>> To generate a diff of this commit:
>> cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html
>> cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html
>>
>> Please note that diffs are not public domain; they are subject to the
>> copyright notices on the relevant files.
>
> There is something I don't understand here. Why keep this totally useless
> misfeature, when we already have many tracing facilities that can do just
> about the same job without having security issues?
>
> The recommendation in the advisory is literally to remove the kernel
> module from the fs. I don't see what could possibly be the use of such a
> misfeature as filemon; I would remove it completely from the kernel
> source tree.
>
> Maxime

From:
http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc

"Additionally the way filemon does filesystem interception is racy
and can lead to random crashes if the system calls are in use
while the module is unloaded."

Is this issue fixable? Not speaking for filemon in particular, I find
this ability to rewrite the syscall table on the fly as a feature.
Keeping a functional module with this property (even if disabled by
default) seems useful to me.


Re: [patch] PT_GET_LWP_PRIVATE and PT_SET_LWP_PRIVATE

2019-12-06 Thread Kamil Rytarowski
On 06.12.2019 06:17, Kamil Rytarowski wrote:
> I have implemented l_private accessor in ptrace(2).
> 
> By default this uses l_private from the lwp struct.
> 
> PT_SET_LWP_PRIVATE uses lwp_setprivate() directly and this call
> abstracts internally MI and MD code.
> 
> The PT_SET_LWP_PRIVATE operation uses by default l_private from the
> struct lwp, however whenever appropriate, picks MD logic that grabs this
> value from a MD filed (register or pcb).
> 
> MD code is implemented for:
>  - sparc, sparc64
>  - alpha
>  - hppa
>  - powerpc
>  - sh3
> 
> There is included compat32 support and ATF tests.
> 
> http://netbsd.org/~kamil/patch-00202-PT_GET_LWP_PRIVATE.txt
> 
> This ptrace operation is inspired by PTRACE_GET_THREAD_AREA /
> PTRACE_SET_THREAD_AREA from linux.
> 
> Please review.
> 
> After merging this, I will add an entry in man-page ptrace(2) and
> implement core(5) generation of PT_GET_LWP_PRIVATE for each LWP within a
> process.
> 

Actually, we can go better here and in one go remove one of the issues
in our current ptrace(2) API regarding legacy operation PT_LWPINFO (it
is obsoleted by PT_GET_SIGINFO and confusing with a different PT_LWPINFO
found in FreeBSD).

I will come up with another and better patch.



signature.asc
Description: OpenPGP digital signature


[patch] PT_GET_LWP_PRIVATE and PT_SET_LWP_PRIVATE

2019-12-05 Thread Kamil Rytarowski
I have implemented l_private accessor in ptrace(2).

By default this uses l_private from the lwp struct.

PT_SET_LWP_PRIVATE uses lwp_setprivate() directly and this call
abstracts internally MI and MD code.

The PT_SET_LWP_PRIVATE operation uses by default l_private from the
struct lwp, however whenever appropriate, picks MD logic that grabs this
value from a MD filed (register or pcb).

MD code is implemented for:
 - sparc, sparc64
 - alpha
 - hppa
 - powerpc
 - sh3

There is included compat32 support and ATF tests.

http://netbsd.org/~kamil/patch-00202-PT_GET_LWP_PRIVATE.txt

This ptrace operation is inspired by PTRACE_GET_THREAD_AREA /
PTRACE_SET_THREAD_AREA from linux.

Please review.

After merging this, I will add an entry in man-page ptrace(2) and
implement core(5) generation of PT_GET_LWP_PRIVATE for each LWP within a
process.



signature.asc
Description: OpenPGP digital signature


Re: ptrace(2) interface for TLSBASE

2019-12-04 Thread Kamil Rytarowski
On 04.12.2019 19:11, Jason Thorpe wrote:
> 
> 
>> On Dec 4, 2019, at 8:47 AM, Kamil Rytarowski  wrote:
>>
>> Today it's missing.. do we need it in core files?
> 
> Seems like you would absolutely need it in core files, otherwise the debugger 
> won't know what it is when performing a post-mortem.
> 
> If we add a ptrace accessor a'la Linux, then it fits nicely into the existing 
> model we have for ELF core files:
> 
>  * We also use ptrace(2) request numbers (the ones that exist in
>  * machine-dependent space) to identify register info notes.  The
>  * info in such notes is in the same format that ptrace(2) would
>  * export that information.
> 
> -- thorpej
> 

OK, I will add a core(5) file addition with TLS base
("PT_GET_THREAD_AREA@nnn").

While I will also include a thread name: PT_[SG]ET_THREAD_NAME.

We currently use sysctl to retrieve it from a living process, but miss
in core files. Today using sysctl is more complex than it needs to be
[1] and there is no direct translation to core(5) files.

[1]
https://github.com/llvm/llvm-project/blob/master/lldb/source/Plugins/Process/NetBSD/NativeThreadNetBSD.cpp#L158



signature.asc
Description: OpenPGP digital signature


Re: ptrace(2) interface for TLSBASE

2019-12-04 Thread Kamil Rytarowski
Today it's missing.. do we need it in core files?

On 04.12.2019 16:56, Andrew Cagney wrote:
> Where is it in a core file?
> 
> On Tue, 3 Dec 2019 at 11:18, Kamil Rytarowski  wrote:
>>
>> TLSBASE is stored on a selection of ports in a dedicated mcontext entry,
>> out of gpregs.
>>
>> In the amd64 case mcontext looks like this:
>>
>> typedef struct {
>> __gregset_t __gregs;
>> __greg_t_mc_tlsbase;
>> __fpregset_t__fpregs;
>> } mcontext_t;
>>
>> The same situation is for: i386, arm, m68k and mips.
>>
>> This patch implements the accessors for amd64:
>>
>> http://netbsd.org/~kamil/patch-00199-TLSBASE.txt
>>
>> Does it look correct? If so I will add the same code for i386,
>> i386-on-amd64, arm, m68k and mips.
>>
>> I'm not completely sure as l_private might be desynced with TLSBASE that
>> was updated without letting the kernel know.
>>
>> I intend to get this change merged into -9, before 9.0.
>>
>> This interface pending to be introduced in GDB/NetBSD.
>>
>> NB. FS/GS in x86 __gpregs is confusing as it is not TLS base.
>>




signature.asc
Description: OpenPGP digital signature


Re: ptrace(2) interface for TLSBASE

2019-12-03 Thread Kamil Rytarowski
On 04.12.2019 04:10, Kamil Rytarowski wrote:
> On 03.12.2019 17:55, Joerg Sonnenberger wrote:
>> On Tue, Dec 03, 2019 at 05:11:49PM +0100, Kamil Rytarowski wrote:
>>> TLSBASE is stored on a selection of ports in a dedicated mcontext entry,
>>> out of gpregs.
>>
>> That's an implementation detail and IMO something we shouldn't leak
>> outside the arch. Just provide an accessor for l_private. There are then
>> two possible options:
>> (1) An MD register is used directly for the TLS base. In that case, the
>> register should be used and l_private is normally irrelevant.
>> (2) No MD register is used (directly). In that case l_private is
>> correct.
>>
>> Joerg
>>
> 
> As long as I sympathize with the idea of making it uniform and hide the
> internals I think this is not the best approach here. We exactly want to
> reconstruct more or less mcontext (known as user-data) on per-CPU basis.
> This means that we want to leak the idea of an OS of the registers/state
> on certain CPU into an application through the ptrace(2) accessors.
> 
> The accessor is planned to be used in CPU-specific code in debuggers. We
> already preserved an LLDB field for TLS in amd64 and i386 (on your
> suggestion), reflecting the format of x86 mcontext.
> 
> I don't have users pending where such uniform API would be used. Today
> the only short term planned user is GDB (next, sooner or later LLDB will
> gain it). There are no other recognized open-source projects that I am
> aware that make use of it (except dosemu for DPMI/VM86 and recognition
> of strace for tracking syscalls)
> 
> I think it can be interesting to provide PT_[SG]ET_TLS() that manages
> TLS register inside mcontext/gpregs, but I don't see/have any users for
> it today, so I want to avoid it, at least until the day it can have 1
> real user.
> 
> A question is what to do with vax. As far as I can see, reading/writing
> l_private is still valid for it. On the other hand there are ideas to
> spare a register for TLS and break ABI with other changes, this is why I
> prefer to wait with it.
> 
> Our advantage of picking the PT_[GS]ETTLSBASE API over what we get
> inside FreeBSD/Linux is that it is uniformly named regardless of
> arch/compat. Linux and FreeBSD must detect arch and compat mode before
> prompting either FS or GS (on x86) [1].
> 
> [1]
> https://sources.debian.org/src/gdb/8.3.1-1/gdb/gdbserver/linux-x86-low.c/#L197
> 

Hm... I was too quick. I noted that there is already an interface in a
newer Linux kernel: PTRACE_GET_THREAD_AREA and PTRACE_SET_THREAD_AREA.
(However it was implemented only for few CPUs and not uniformly as
typically in the Linux ptrace()).

It has the same purpose as a generic API to access l_private and there
is a user in GDB for it with IPA [2].

[2] https://suchakra.wordpress.com/2016/06/29/fast-tracing-with-gdb/

PTRACE_GET_THREAD_AREA is used in few other smaller projects (edb,
custom debuggers).

PTRACE_SET_THREAD_AREA is used in usermode Linux.

Both interfaces (get and set) find uses. I will go for an API:
PT_SET_THREAD_AREA and PT_GET_THREAD_AREA and try to target all CPUs.



signature.asc
Description: OpenPGP digital signature


Re: ptrace(2) interface for TLSBASE

2019-12-03 Thread Kamil Rytarowski
On 03.12.2019 17:55, Joerg Sonnenberger wrote:
> On Tue, Dec 03, 2019 at 05:11:49PM +0100, Kamil Rytarowski wrote:
>> TLSBASE is stored on a selection of ports in a dedicated mcontext entry,
>> out of gpregs.
> 
> That's an implementation detail and IMO something we shouldn't leak
> outside the arch. Just provide an accessor for l_private. There are then
> two possible options:
> (1) An MD register is used directly for the TLS base. In that case, the
> register should be used and l_private is normally irrelevant.
> (2) No MD register is used (directly). In that case l_private is
> correct.
> 
> Joerg
> 

As long as I sympathize with the idea of making it uniform and hide the
internals I think this is not the best approach here. We exactly want to
reconstruct more or less mcontext (known as user-data) on per-CPU basis.
This means that we want to leak the idea of an OS of the registers/state
on certain CPU into an application through the ptrace(2) accessors.

The accessor is planned to be used in CPU-specific code in debuggers. We
already preserved an LLDB field for TLS in amd64 and i386 (on your
suggestion), reflecting the format of x86 mcontext.

I don't have users pending where such uniform API would be used. Today
the only short term planned user is GDB (next, sooner or later LLDB will
gain it). There are no other recognized open-source projects that I am
aware that make use of it (except dosemu for DPMI/VM86 and recognition
of strace for tracking syscalls)

I think it can be interesting to provide PT_[SG]ET_TLS() that manages
TLS register inside mcontext/gpregs, but I don't see/have any users for
it today, so I want to avoid it, at least until the day it can have 1
real user.

A question is what to do with vax. As far as I can see, reading/writing
l_private is still valid for it. On the other hand there are ideas to
spare a register for TLS and break ABI with other changes, this is why I
prefer to wait with it.

Our advantage of picking the PT_[GS]ETTLSBASE API over what we get
inside FreeBSD/Linux is that it is uniformly named regardless of
arch/compat. Linux and FreeBSD must detect arch and compat mode before
prompting either FS or GS (on x86) [1].

[1]
https://sources.debian.org/src/gdb/8.3.1-1/gdb/gdbserver/linux-x86-low.c/#L197



signature.asc
Description: OpenPGP digital signature


ptrace(2) interface for TLSBASE

2019-12-03 Thread Kamil Rytarowski
TLSBASE is stored on a selection of ports in a dedicated mcontext entry,
out of gpregs.

In the amd64 case mcontext looks like this:

typedef struct {
__gregset_t __gregs;
__greg_t_mc_tlsbase;
__fpregset_t__fpregs;
} mcontext_t;

The same situation is for: i386, arm, m68k and mips.

This patch implements the accessors for amd64:

http://netbsd.org/~kamil/patch-00199-TLSBASE.txt

Does it look correct? If so I will add the same code for i386,
i386-on-amd64, arm, m68k and mips.

I'm not completely sure as l_private might be desynced with TLSBASE that
was updated without letting the kernel know.

I intend to get this change merged into -9, before 9.0.

This interface pending to be introduced in GDB/NetBSD.

NB. FS/GS in x86 __gpregs is confusing as it is not TLS base.



signature.asc
Description: OpenPGP digital signature


Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-23 Thread Kamil Rytarowski
On 23.11.2019 17:29, Christos Zoulas wrote:
> In article ,
> Christos Zoulas  wrote:
>> In article <468095c0-b973-7f23-1cfa-3dc19e3b8...@gmail.com>,
>> Rin Okuyama   wrote:
>>> On 2019/11/22 10:56, Christos Zoulas wrote:
 In article <679493cf-3e85-f56d-85e4-dfaf7958a...@gmail.com>,
 Rin Okuyama   wrote:
>>> ...
> This was my misunderstanding. These codes are used when tracer is 64-bit
> and traced is 32-bit. Don't know whether this is useful though...

 Yes, and someone broke them because all the ptrace 64->32 calls for
 registers seem to return 0 now. Was that code changed recently?
>>> ...
>>>
>>> I fixed it:
>>> http://mail-index.netbsd.org/source-changes/2019/11/22/msg111068.html
>>>
>>> Now, gdb/amd64 seems to work for i386 binaries to some extent, at least.
>>
>> Thanks! I think that the gdb on head should working for i386 binaries
>> on amd64. I am installing a new kernel and userland and I will test
>> shortly.
> 
> And it works fine for both static and dynamic binaries. We could also add
> kernel debugging support for i386 kernels, but that needs changes in
> the i386 header files:
> 
> 1. Everywhere s/#include I think this is a good change anyway for all MD headers because they
>expect to load their own arch stuff. MI stuff needs to stay " 2. change __vaddr_t in segments.h ld_descriptor to uint32_t
>That should have no impact and it is a good change because it is a
>packed structure and needs to be fixed size.
> 3. Add some fixed field sizes in struct pcb instead of pointers with
>#ifdef __x86_64__ (or copy struct pcb into gdb like kamil did). This
>is just ugly, but I think it is better than copying the struct. I
>could add a comment explaining why.
> 
> Is it worth it?
> 

I consider fixing this as required.

> chritod
> 




signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-22 Thread Kamil Rytarowski
On 22.11.2019 02:42, Robert Elz wrote:
> Date:Fri, 22 Nov 2019 01:04:56 +0100
> From:    Kamil Rytarowski 
> Message-ID:  <1a9d9b40-42fe-be08-d7b3-e6ecead5b...@gmx.com>
> 
> 
>   | I think that picking C11 terminology is the way forward.
> 
> Use a name like that iff the intent is to also exactly match the
> semantics implied, otherwise it will only create more confusion.
> 
> kre
> 

I think this matches. We want to make operation in 'single operation'
(or looking like so) and 'atomic' is a settled name in English (there
are options like: 'integral', 'intrinsic', 'elemental', 'essential',
'fundamental' or 'single operation').

If there are stronger feelings against it, I would go for
'__write_singleop()', '__read_singleop()'.



signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-21 Thread Kamil Rytarowski
On 22.11.2019 00:53, Robert Elz wrote:
> Date:Thu, 21 Nov 2019 19:19:51 +0100
> From:Maxime Villard 
> Message-ID:  
> 
>   | So in the end which name do we use? Are people really unhappy with 
> _racy()?
>   | At least it has a general meaning, and does not imply atomicity or 
> ordering.
> 
> I dislike naming discussions, as in general, while there are often a
> bunch of obviously incorrect names (here for example, read_page_data() ...)
> there is very rarely one obviously right answer, and it all really just 
> becomes
> a matter of choice for whoever is doing it.
> 
> Nevertheless, perhaps something that says more what is actually happening,
> rather than mentions what doesn't matter (races here), so perhaps something
> like {read,write}_single_cycle() or {read,write}_1_bus_xact() or something
> along those lines ?
> 
> kre
> 

I think that picking C11 terminology is the way forward. It's settled
probably forever now and it will be simpler to find corresponding
documentation and specification of it.



signature.asc
Description: OpenPGP digital signature


Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-21 Thread Kamil Rytarowski
On 21.11.2019 10:49, Rin Okuyama wrote:
> On 2019/11/20 20:12, Rin Okuyama wrote:
>> On 2019/11/19 22:59, Kamil Rytarowski wrote:
> ...
>>>
>>> We still miss compat32 support for PT_GETXMMREGS and PT_SETXMMREGS, at
>>> some point of time we shall add it for completeness.
>>
>> Thank you!
>>
>> With amd64/netbsd32_machdep.c rev 1.130, all tests in t_ptrace* pass in
>> COMPAT_NETBSD32 on amd64, except for that involved with XMM registers.
>> I will examine how to implement PT32_[GS]ETXMMREGS.
> 
> I wrote a draft version of patch which adds PT32_[GS]ETXMMREGS support:
> 
> http://www.netbsd.org/~rin/amd64-PT32_GSETXMMREGS-20191121.patch
> 
> With this patch, XMM-related tests pass for COMPAT_NETBSD32 on amd64.
> 
> Some remarks:
> 
> (1) PT_[GS]ETXMMREGS ptrace(2) commands are added to .
>     These are only used for COMPAT_NETBSD32, and not exposed to userland.
> 

This is correct.
We don't want to export XMMREGS to amd64 users.

Pleae remove /* */ from this part:

+/*
+   "PT_GETXMMREGS", \
+   "PT_SETXMMREGS"
+ */

This will allow ktruss and related software to have meaningful form for
compat32 ptracing.

> (2) COMPAT_NETBSD32 codes are called from process_machdep.c via
>     module_hook framework. This may be too much though...
> 

I have no opinion here.

> Comments?
> 

I will leave this to be reviewed by mgorny@. Adding him to CC.

>> Also, it seems that some COMPAT_NETBSD32 related codes for amd64 need to
>> be cleaned up. For example, there remain COMPAT_NETBSD32 codes in
>> amd64/process_machdep.c, that are no longer used:
>>
>> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/process_machdep.c#129
>>
>> https://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/process_machdep.c#183
>>
>> ...
>>
>> I will examine them too.
> 
> This was my misunderstanding. These codes are used when tracer is 64-bit
> and traced is 32-bit. Don't know whether this is useful though...
> 
> Thanks,
> rin

Maxime added it. If you want to change it, please consult with maxv@.



signature.asc
Description: OpenPGP digital signature


Re: Proposal: validate FFS root inode during the mount.

2019-11-20 Thread Kamil Rytarowski
On 20.11.2019 18:14, Michael van Elst wrote:
> n...@gmx.com (Kamil Rytarowski) writes:
> 
>> =46rom a high level point of view, we want to reject early corrupted FS o=
>> n
>> a mount. Today we panic the kernel needlessly.
> 
> 
> Rejecting won't help much, there are so many other parts that may be corrupt
> that you cannot validate on mount.
> 

For start we want to stop the kernel from crashing on mount.

> The goal should be to gracefully handle corrupted data structures by returning
> errors instead of crashing the kernel.
> 

mbouyer@ wants to panic always, after a successful mount.

I have no strong opinion except handling the corrupted data either with
a panic or some error returned from the kernel.



signature.asc
Description: OpenPGP digital signature


Re: Proposal: validate FFS root inode during the mount.

2019-11-20 Thread Kamil Rytarowski
On 20.11.2019 16:11, Mouse wrote:
>> During the fuzzing of FFS filesystem, we had a couple of issues
>> caused by corrupted inode fields.  [...]
> 
>> To make sure that corrupted mount won't cause harm to the user, I
>> want to add function to validate root inode on mount step (after
>> superblock validation)
> 
> Don't you have more or less the same issue with every other non-free
> inode in the filesystem?  The only thing I can see that's special about
> the root inode in this regard is that it is the only inode that is used
> immediately upon mount.
> 

From a high level point of view, we want to reject early corrupted FS on
a mount. Today we panic the kernel needlessly.



signature.asc
Description: OpenPGP digital signature


Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-19 Thread Kamil Rytarowski
On 18.11.2019 13:08, Rin Okuyama wrote:
> On 2019/11/18 20:15, Kamil Rytarowski wrote:
>> I was thinking about something along these lines:
>>
>> http://netbsd.org/~kamil/patch-00196-siginfo_netbsd32_compat_uint64.txt
>>
>> In future some compat of i386 could use 8-byte alignment for 64-bit
>> types.
>>
>> Not build tested.
>>
> 
> Thank you for providing the patch. It looks fine for me. I committed
> it with changes below:
> 
> - Take into account of __arm__ && __ARM_EABI__ for compat with arm-oabi
> - Undefine NETBSD32_SIGINFO_UINT64_ALIGN for clarity
> 
> Thanks!
> rin

Great work!

Please try to run in compat32:

cd /usr/tests/lib/libc/sys
atf-run t_ptrace* | atf-report

We still miss compat32 support for PT_GETXMMREGS and PT_SETXMMREGS, at
some point of time we shall add it for completeness.



signature.asc
Description: OpenPGP digital signature


Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-18 Thread Kamil Rytarowski
On 18.11.2019 11:40, Rin Okuyama wrote:
> On 2019/11/18 16:52, Kamil Rytarowski wrote:
>> On 18.11.2019 05:38, Rin Okuyama wrote:
>>> Thank you for your comment!
>>>
>>> On 2019/11/17 22:42, Kamil Rytarowski wrote:
>>>> Please check it also with picotrace/truss:
>>>>
>>>> http://pkgsrc.se/devel/picotrace
>>>
>>> netbsd32_signal.c needed to catch up with kern_sig.c so that syscall
>>> information is provided with SIGTRAP TRAP_SCE/TRAP_SCX. I committed
>>> the fix and picotrace/truss works fine now on COMPAT_NETBSD32.
>>>
>>
>> Thanks! I have submitted a mail how to further improve it.
> 
> Thank you for your advice! I committed it!
> 
>>> On 2019/11/17 22:42, Kamil Rytarowski wrote:
>>>> On 17.11.2019 04:34, Rin Okuyama wrote:
>>>>> Hi,
>>>>>
>>>>> In order to distangle circular dependency between
>>>>>  v.s. ,
>>>>> I propose
>>>>>
>>>>> (1) Move NETBSD32_INT64_ALIGN from  to
>>>>> 
>>>>>
>>>>> (2) Move netbsd32_{,u}int64 from  to
>>>>> 
>>>>>
>>>>> See attached patch for example on amd64.
>>>>>
>>>>
>>>> What do you think about duplicating the defines of netbsd32_uint64
>>>> inside sys/compat/sys/siginfo.h + adding a comment about keeping it in
>>>> sync with netbsd32.h?
>>>>
>>>> I think that avoiding spaghetti dependencies is a benefit.
>>>>
>>>> We already duplicated there _ptrace_state, removing circular
>>>> dependencies between sys/ptrace.h and sys/siginfo.h.
>>>
>>> I don't think this is a good idea in this case. If we want to have
>>> duplicate define of netbsd32_uint64, and to avoid an "#ifdef __x86_64__"
>>> mess in , we need to move NETBSD32_INT64_ALIGN to
>>>  other than . If so,
>>> why not ?
>>>
>>
>> No need to move anything out of machine/netbsd32_machdep.h. It's
>> sufficient to define netbsd32_int64 with a custom non-conflicting name
>> or protect it with #ifdef before typedefing/defining twice.
>>
>>> Also, in my proposal, spaghetti dependencies are avoided in the end;
>>> everyone depends on , and not on each other.
>>>
>>
>> However I have no strong opinions here. I would personally avoid
>> compat32 definitions in sys/types.h.
>>
>> Compat code tends to need hacks, so it is sensible imho to restrict them
>> to compat headers (I am aware that it's not always followed).
> 
> I agree with you on that compat codes should not be put into sys headers
> as far as possible. However, I still do not understand what you mean.
> 
> (1) NETBSD32_INT64_ALIGN == __attribute__((__aligned__(4))) is in
>     , and
> 
> (2)  needs 
> 
> Therefore, we cannot use NETBSD32_INT64_ALIGN in .
> Of course, we can typedef _netbsd32_uint64 in  like:
> 
> typedef uint64_t _netbsd32_uint64
> #ifdef __x86_64__
>     __attribute__((__aligned__(4)))
> #endif
>     ;
> 
> Do you think it is OK? I guess that I miss some elementary things...
> Sorry in advance ;-).
> 
> Thanks,
> rin

I was thinking about something along these lines:

http://netbsd.org/~kamil/patch-00196-siginfo_netbsd32_compat_uint64.txt

In future some compat of i386 could use 8-byte alignment for 64-bit types.

Not build tested.



signature.asc
Description: OpenPGP digital signature


Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-17 Thread Kamil Rytarowski
On 18.11.2019 05:38, Rin Okuyama wrote:
> Thank you for your comment!
> 
> On 2019/11/17 22:42, Kamil Rytarowski wrote:
>> Please check it also with picotrace/truss:
>>
>> http://pkgsrc.se/devel/picotrace
> 
> netbsd32_signal.c needed to catch up with kern_sig.c so that syscall
> information is provided with SIGTRAP TRAP_SCE/TRAP_SCX. I committed
> the fix and picotrace/truss works fine now on COMPAT_NETBSD32.
> 

Thanks! I have submitted a mail how to further improve it.

> On 2019/11/17 22:42, Kamil Rytarowski wrote:
>> On 17.11.2019 04:34, Rin Okuyama wrote:
>>> Hi,
>>>
>>> In order to distangle circular dependency between
>>>  v.s. ,
>>> I propose
>>>
>>> (1) Move NETBSD32_INT64_ALIGN from  to
>>> 
>>>
>>> (2) Move netbsd32_{,u}int64 from  to
>>> 
>>>
>>> See attached patch for example on amd64.
>>>
>>
>> What do you think about duplicating the defines of netbsd32_uint64
>> inside sys/compat/sys/siginfo.h + adding a comment about keeping it in
>> sync with netbsd32.h?
>>
>> I think that avoiding spaghetti dependencies is a benefit.
>>
>> We already duplicated there _ptrace_state, removing circular
>> dependencies between sys/ptrace.h and sys/siginfo.h.
> 
> I don't think this is a good idea in this case. If we want to have
> duplicate define of netbsd32_uint64, and to avoid an "#ifdef __x86_64__"
> mess in , we need to move NETBSD32_INT64_ALIGN to
>  other than . If so,
> why not ?
> 

No need to move anything out of machine/netbsd32_machdep.h. It's
sufficient to define netbsd32_int64 with a custom non-conflicting name
or protect it with #ifdef before typedefing/defining twice.

> Also, in my proposal, spaghetti dependencies are avoided in the end;
> everyone depends on , and not on each other.
> 

However I have no strong opinions here. I would personally avoid
compat32 definitions in sys/types.h.

Compat code tends to need hacks, so it is sensible imho to restrict them
to compat headers (I am aware that it's not always followed).

> Thanks,
> rin




signature.asc
Description: OpenPGP digital signature


Re: netbsd32_{,u}int64 in sys/types.h for compat/sys/siginfo.h

2019-11-17 Thread Kamil Rytarowski
On 17.11.2019 04:34, Rin Okuyama wrote:
> Hi,
> 
> In order to distangle circular dependency between
>  v.s. ,
> I propose
> 
> (1) Move NETBSD32_INT64_ALIGN from  to
> 
> 
> (2) Move netbsd32_{,u}int64 from  to
> 
> 
> See attached patch for example on amd64.
> 

What do you think about duplicating the defines of netbsd32_uint64
inside sys/compat/sys/siginfo.h + adding a comment about keeping it in
sync with netbsd32.h?

I think that avoiding spaghetti dependencies is a benefit.

We already duplicated there _ptrace_state, removing circular
dependencies between sys/ptrace.h and sys/siginfo.h.

> Background is:
> 
> Now, gdb for i386 does not work again on amd64 (both on -current and
> netbsd-9) with:
> 
> ptrace: Invalid arguments.
> 
> This is because sizeof(struct netbsd32_ptrace_siginfo) is 128+4+4=136
> on amd64 whereas sizeof(struct ptrace_siginfo) is 128+4=132 on i386;
> netbsd32_ptrace_siginfo has uint64_t members via siginfo32_t via
> __ksiginfo32 since compat/sys/siginfo.h rev 1.5:
> 
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/compat/sys/siginfo.h#rev1.5
> 
> As a result, netbsd32_ptrace_siginfo requires 4-byte tail padding on
> amd64. However, tail padding does not appear on i386, since 64-bit
> objects only need 4-byte alignment on i386.
> 
> Actually, gdb/i386 becomes sane with this hack:
> 

Please check it also with picotrace/truss:

http://pkgsrc.se/devel/picotrace


> 
> Index: sys/compat/sys/siginfo.h
> ===
> RCS file: /home/netbsd/src/sys/compat/sys/siginfo.h,v
> retrieving revision 1.8
> diff -p -u -r1.8 siginfo.h
> --- sys/compat/sys/siginfo.h    30 Sep 2019 21:13:33 -    1.8
> +++ sys/compat/sys/siginfo.h    12 Nov 2019 11:04:58 -
> @@ -34,6 +34,12 @@
>  
>  #ifdef _KERNEL
>  
> +typedef uint64_t _args_t
> +#ifdef __x86_64__
> +    __attribute__((__aligned__(4)))
> +#endif
> +    ;
> +
>  typedef union sigval32 {
>  int sival_int;
>  uint32_t sival_ptr;
> @@ -73,7 +79,7 @@ struct __ksiginfo32 {
>  int    _sysnum;
>  int    _retval[2];
>  int    _error;
> -    uint64_t _args[8]; /* SYS_MAXSYSARGS */
> +    _args_t    _args[8]; /* SYS_MAXSYSARGS */
>  } _syscall;
>  
>  struct {
> 
> 
> We provide netbsd32_uint64 for this purpose:
> 
> typedef uint64_t netbsd32_uint64 NETBSD32_INT64_ALIGN;
> 
> and NETBSD32_INT64_ALIGN is __attribute__((__aligned__(4))) on amd64.
> However, unfortunately, NETBSD32_INT64_ALIGN is defined in
> , and  requires
> .
> 
> Thoughts? Any comments or objections?
> 
> Thanks,
> rin




signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-16 Thread Kamil Rytarowski
On 16.11.2019 15:31, Mindaugas Rasiukevicius wrote:
> Maxime Villard  wrote:
>> Alright so let's add the macros with volatile (my initial patch). Which
>> name do we use? I actually like __{read,write}_racy().
> 
> I suggest __atomic_load_relaxed()/__atomic_store_relaxed().  If these
> are to be provided as macros, then I also suggest to ensure that they
> provide compiler-level barrier.
> 

I'm in favor of this naming.



signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-08 Thread Kamil Rytarowski
On 08.11.2019 13:40, Mindaugas Rasiukevicius wrote:
>>> There is more code in the NetBSD kernel which needs fixing.  I would say
>>> pretty much all lock-free code should be audited.
>> I believe KCSAN can greatly help with that, since it automatically reports
>> concurrent accesses. Up to us then to switch to atomic, or other kinds of
>> markers like READ_ONCE.
> Is there a CSAN i.e. such sanitizer for userspace applications?

No.. there is TSan. If it would be useful to get CSan in userspace it
probably shouldn't be too difficult to write it. At least full TSan is
heavy and requires 64bit CPU.



signature.asc
Description: OpenPGP digital signature


Re: __{read,write}_once

2019-11-06 Thread Kamil Rytarowski
On 06.11.2019 20:38, Mindaugas Rasiukevicius wrote:
> Maxime Villard  wrote:
>> There are cases in the kernel where we read/write global memory
>> locklessly, and accept the races either because it is part of the design
>> (eg low-level scheduling) or we simply don't care (eg global stats).
>>
>> In these cases, we want to access the memory only once, and need to ensure
>> the compiler does not split that access in several pieces, some of which
>> may be changed by concurrent accesses. There is a Linux article [1] about
>> this, and also [2]. I'd like to introduce the following macros:
>>
>> <...>
> 
> A few comments for everybody here:
> 
> - There is a general need in the NetBSD kernel for atomic load/store
> semantics.  This is because plain accesses (loads/stores) are subject
> to _tearing_, _fusing_ and _invented_ loads/stores.  I created a new
> thread which might help to clarify these and various other aspects:
> 
> http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html
> 

Thank you.

> - In C11, this can be handled with atomic_{load,store}_explicit() and
> memory_order_relaxed (or stronger memory order).
> 
> - If we do not want to stick with the C11 API (its emulation), then I
> would suggest to use the similar terminology, e.g. atomic_load_relaxed()
> and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
> no reason invent new terminology; it might just add more confusion.
> 
> Thanks.
> 

I have no opinion here, just please do the right thing. Unless there are
any shortcomings it would be nice to follow closely C11.

If that information helps or not, we also need C11/C++14 libatomic-like
library in userland.



signature.asc
Description: OpenPGP digital signature


  1   2   3   4   >