kern_exit.c : 2 tiny buglets + a potential uvm leak

2020-02-29 Thread Amit Kulkarni
Hi,

1) pr->ps_ru is already NULL, so code can be shrunk.
2) missing timeout_del, we clear the 2 timeouts in struct process, but not from 
struct proc.
3) ps_mainproc is allocated in thread_new(), passed to process_new(), and then 
to process_initialize(). ps_mainproc never gets a call to uvm_uarea_free(). So 
to allow ps_mainproc to be freed when called finally in process_zap(), we 
shuffle the uvm_uarea_free() code into proc_free(). Is this analysis correct or 
wrong?

Thanks

Index: kern/kern_exit.c
===
RCS file: /cvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.184
diff -u -p -u -p -r1.184 kern_exit.c
--- kern/kern_exit.c28 Feb 2020 17:03:05 -  1.184
+++ kern/kern_exit.c29 Feb 2020 20:29:16 -
@@ -172,12 +172,7 @@ exit1(struct proc *p, int xexit, int xsi
rup = pr->ps_ru;
if (rup == NULL) {
rup = pool_get(_pool, PR_WAITOK | PR_ZERO);
-   if (pr->ps_ru == NULL) {
-   pr->ps_ru = rup;
-   } else {
-   pool_put(_pool, rup);
-   rup = pr->ps_ru;
-   }
+   pr->ps_ru = rup;
}
p->p_siglist = 0;
if ((p->p_flag & P_THREAD) == 0)
@@ -390,6 +385,14 @@ exit2(struct proc *p)
 void
 proc_free(struct proc *p)
 {
+   timeout_del(>p_sleep_to);
+   /*
+* Free the VM resources we're still holding on to.
+* We must do this from a valid thread because doing
+* so may block.
+*/
+   uvm_uarea_free(p);
+   p->p_vmspace = NULL;/* zap the thread's copy */
crfree(p->p_ucred);
pool_put(_pool, p);
nthreads--;
@@ -422,14 +425,6 @@ reaper(void *arg)
WITNESS_THREAD_EXIT(p);
 
KERNEL_LOCK();
-
-   /*
-* Free the VM resources we're still holding on to.
-* We must do this from a valid thread because doing
-* so may block.
-*/
-   uvm_uarea_free(p);
-   p->p_vmspace = NULL;/* zap the thread's copy */
 
if (p->p_flag & P_THREAD) {
/* Just a thread */



Re: Debugger, traced processes & exit status

2020-02-29 Thread Amit Kulkarni
Hi,

> @@ -303,6 +312,7 @@ struct process {
>  #define  PS_PLEDGE   0x0010  /* Has called pledge(2) */
>  #define  PS_WXNEEDED 0x0020  /* Process may violate W^X */
>  #define  PS_EXECPLEDGE   0x0040  /* Has exec pledges */
> +#define  PS_ORPHAN   0x0080  /* Process is on an orphan list 
> */
>  
>  #define  PS_BITS \
>  ("\20" "\01CONTROLT" "\02EXEC" "\03INEXEC" "\04EXITING" "\05SUGID" \

PS_ORPHAN entry needs to be added to PS_BITS.

Off topic to this diff: kern_exit.c is a weird mix of ISSET() and boolean 
comparison with 1) flag_field & FLAG 2) == 0. Is it ok to send a follow-up diff 
with ISSET() or !ISSET() as applicable throughout this file, so it reads easier?

Thanks



Re: Debugger, traced processes & exit status

2020-02-29 Thread Martin Pieuchot
On 28/02/20(Fri) 18:24, Martin Pieuchot wrote:
> Diff below fixes multiple issues with traced process, exposed by the
> regression test attached:
> 
> - When a debugging process exit, give back the traced process to its
>   original parent, if it exists, instead of init(8).
> 
> - When a traced process exit, make sure the original parent receives
>   the exit status only after the debugger has seen it.  This is done
>   by keeping a list of 'orphaned' children in the original parent and
>   looking in it in dowait4().
> 
> The logic and most of the code comes from FreeBSD as pointed out by
> guenther@.

Improved versions that:

 - Introduce process_clear_orphan() to remove a process from an orphan
   list.

 - Introduce process_untrace() to give back a process to its original
   parent or to init(8), used in both PT_DETACH & exit1(). 

 - Rename proc_reparent() into process_reparent() for coherency

Ok?

Index: kern/kern_exit.c
===
RCS file: /cvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.184
diff -u -p -r1.184 kern_exit.c
--- kern/kern_exit.c28 Feb 2020 17:03:05 -  1.184
+++ kern/kern_exit.c29 Feb 2020 18:50:09 -
@@ -76,6 +76,7 @@
 #endif
 
 void   proc_finish_wait(struct proc *, struct proc *);
+void   process_clear_orphan(struct process *);
 void   process_zap(struct process *);
 void   proc_free(struct proc *);
 void   unveil_destroy(struct process *ps);
@@ -253,21 +254,22 @@ exit1(struct proc *p, int xexit, int xsi
}
 
/*
-* Give orphaned children to init(8).
+* Reparent children to their original parent, in case
+* they were being traced, or to init(8).
 */
qr = LIST_FIRST(>ps_children);
if (qr) /* only need this if any child is S_ZOMB */
wakeup(initprocess);
for (; qr != 0; qr = nqr) {
nqr = LIST_NEXT(qr, ps_sibling);
-   proc_reparent(qr, initprocess);
/*
 * Traced processes are killed since their
 * existence means someone is screwing up.
 */
if (qr->ps_flags & PS_TRACED &&
!(qr->ps_flags & PS_EXITING)) {
-   atomic_clearbits_int(>ps_flags, PS_TRACED);
+   process_untrace(qr);
+
/*
 * If single threading is active,
 * direct the signal to the active
@@ -278,8 +280,19 @@ exit1(struct proc *p, int xexit, int xsi
STHREAD);
else
prsignal(qr, SIGKILL);
+   } else {
+   process_reparent(qr, initprocess);
}
}
+
+   /*
+* Make sure orphans won't remember the exiting process.
+*/
+   while ((qr = LIST_FIRST(>ps_orphans)) != NULL) {
+   KASSERT(qr->ps_oppid == pr->ps_pid);
+   qr->ps_oppid = 0;
+   process_clear_orphan(qr);
+   }
}
 
/* add thread's accumulated rusage into the process's total */
@@ -310,7 +323,7 @@ exit1(struct proc *p, int xexit, int xsi
 */
if (pr->ps_flags & PS_NOZOMBIE) {
struct process *ppr = pr->ps_pptr;
-   proc_reparent(pr, initprocess);
+   process_reparent(pr, initprocess);
wakeup(ppr);
}
 
@@ -562,6 +575,29 @@ loop:
return (0);
}
}
+   /*
+* Look in the orphans list too, to allow the parent to
+* collect it's child exit status even if child is being
+* debugged.
+*
+* Debugger detaches from the parent upon successful
+* switch-over from parent to child.  At this point due to
+* re-parenting the parent loses the child to debugger and a
+* wait4(2) call would report that it has no children to wait
+* for.  By maintaining a list of orphans we allow the parent
+* to successfully wait until the child becomes a zombie.
+*/
+   if (nfound == 0) {
+   LIST_FOREACH(pr, >p_p->ps_orphans, ps_orphan) {
+   if ((pr->ps_flags & PS_NOZOMBIE) ||
+   (pid != WAIT_ANY &&
+   pr->ps_pid != pid &&
+   pr->ps_pgid != -pid))
+   continue;
+   nfound++;
+   break;
+   }
+   }
if (nfound == 0)
   

Re: patch for dump for high percentages

2020-02-29 Thread Theo de Raadt
This is only hiding a problem in the estimation math, rather
than fixing it.

I don't like it.

Peter J. Philipp  wrote:

> Hi,
> 
> I have a patch for dump(8) if it is generally considered bad if percentage
> done is over 100.0%.  I checked the archives on marc.info for this and didn't
> see any discussion whether this was a topic before.
> 
> Here is the odd DUMP message I got on a host:
> 
> DUMP: 102.41% done, finished in 0:00
> 
> And here is the patch:
> 
> Index: optr.c
> ===
> RCS file: /cvs/src/sbin/dump/optr.c,v
> retrieving revision 1.40
> diff -u -p -u -r1.40 optr.c
> --- optr.c22 Jan 2019 16:16:26 -  1.40
> +++ optr.c29 Feb 2020 16:19:25 -
> @@ -202,6 +202,7 @@ void
>  timeest(void)
>  {
>   time_t  tnow, deltat;
> + float percent;
>  
>   (void) time();
>   if (tnow >= tschedule) {
> @@ -211,8 +212,9 @@ timeest(void)
>   deltat = tstart_writing - tnow +
>   (1.0 * (tnow - tstart_writing))
>   / blockswritten * tapesize;
> + percent = (blockswritten * 100.0) / tapesize;
>   msg("%3.2f%% done, finished in %lld:%02lld\n",
> - (blockswritten * 100.0) / tapesize,
> + (percent > 100.0) ? 100.0 : percent,
>   (long long)deltat / 3600,
>   ((long long)deltat % 3600) / 60);
>   }
> 
> 
> I tested this and it seems to report like before, only when it hits 100.0 it
> will not go higher, that particular codepath I didn't hit though.
> 
> beta# dump -0uaf - /var | gzip -c > /dev/null
>   DUMP: Date of this level 0 dump: Sat Feb 29 17:20:26 2020
>   DUMP: Date of last level 0 dump: the epoch
>   DUMP: Dumping /dev/rsd2e (/var) to standard output
>   DUMP: mapping (Pass I) [regular files]
>   DUMP: mapping (Pass II) [directories]
>   DUMP: estimated 23891439 tape blocks.
>   DUMP: Volume 1 started at: Sat Feb 29 17:20:37 2020
>   DUMP: dumping (Pass III) [directories]
>   DUMP: dumping (Pass IV) [regular files]
>   DUMP: 25.98% done, finished in 0:14
> 
> Let me know what you think,
> 
> Regards,
> -peter
> 



Fix use of WITNESS_UNLOCK() in rw_exit_{read,write}()

2020-02-29 Thread Visa Hankala
There is a bug in how rw_exit_read() and rw_exit_write() use
WITNESS_UNLOCK(). The fast paths call WITNESS_UNLOCK() after releasing
the actual lock. This leads to a use-after-free in the checker if the
lock is dynamically allocated and another thread happens to free it
too early.

The following diff splits rw_exit() into two separate functions, one
which is the rw_exit(9) frontend, and the other, rw_do_exit(), which is
common between rw_exit(9), rw_exit_read(9) and rw_exit_write(9). This
makes it possible to call WITNESS_UNLOCK() before the actual lock
release without turning the code into an #ifdef maze.

The diff additionally does the following tweaks:

* Invoke membar_exit_before_atomic() only once even if the slow path is
  taken. The initial barrier is enough for the release semantics.

* Read rwl_owner for rw_cas() as late as possible so that the CAS
  operation would be less likely to fail. Note that in rw_exit() the
  purpose of the read is different; it determines the type of the lock
  (read / write).

* Put the rw_cas() in rw_exit() (now rw_do_exit()) inside
  __predict_false(). This compacts the resulting machine code a bit.

The code could be simplified by removing the fast path from
rw_exit_read() and rw_exit_write(), and always calling rw_do_exit().
However, that would reduce performance a tiny amount. I saw about 0.5%
increase in kernel build time in a test setup. The patch below does
not seem to affect performance negatively.

OK?

Index: kern/kern_rwlock.c
===
RCS file: src/sys/kern/kern_rwlock.c,v
retrieving revision 1.44
diff -u -p -r1.44 kern_rwlock.c
--- kern/kern_rwlock.c  30 Nov 2019 11:19:17 -  1.44
+++ kern/kern_rwlock.c  29 Feb 2020 16:24:59 -
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+void   rw_do_exit(struct rwlock *, unsigned long);
+
 /* XXX - temporary measure until proc0 is properly aligned */
 #define RW_PROC(p) (((long)p) & ~RWLOCK_MASK)
 
@@ -129,31 +131,31 @@ rw_enter_write(struct rwlock *rwl)
 void
 rw_exit_read(struct rwlock *rwl)
 {
-   unsigned long owner = rwl->rwl_owner;
+   unsigned long owner;
 
rw_assert_rdlock(rwl);
+   WITNESS_UNLOCK(>rwl_lock_obj, 0);
 
membar_exit_before_atomic();
+   owner = rwl->rwl_owner;
if (__predict_false((owner & RWLOCK_WAIT) ||
rw_cas(>rwl_owner, owner, owner - RWLOCK_READ_INCR)))
-   rw_exit(rwl);
-   else
-   WITNESS_UNLOCK(>rwl_lock_obj, 0);
+   rw_do_exit(rwl, 0);
 }
 
 void
 rw_exit_write(struct rwlock *rwl)
 {
-   unsigned long owner = rwl->rwl_owner;
+   unsigned long owner;
 
rw_assert_wrlock(rwl);
+   WITNESS_UNLOCK(>rwl_lock_obj, LOP_EXCLUSIVE);
 
membar_exit_before_atomic();
+   owner = rwl->rwl_owner;
if (__predict_false((owner & RWLOCK_WAIT) ||
rw_cas(>rwl_owner, owner, 0)))
-   rw_exit(rwl);
-   else
-   WITNESS_UNLOCK(>rwl_lock_obj, LOP_EXCLUSIVE);
+   rw_do_exit(rwl, RWLOCK_WRLOCK);
 }
 
 #ifdef DIAGNOSTIC
@@ -314,22 +316,29 @@ retry:
 void
 rw_exit(struct rwlock *rwl)
 {
-   unsigned long owner = rwl->rwl_owner;
-   int wrlock = owner & RWLOCK_WRLOCK;
-   unsigned long set;
+   unsigned long wrlock;
 
/* Avoid deadlocks after panic or in DDB */
if (panicstr || db_active)
return;
 
+   wrlock = rwl->rwl_owner & RWLOCK_WRLOCK;
if (wrlock)
rw_assert_wrlock(rwl);
else
rw_assert_rdlock(rwl);
-
WITNESS_UNLOCK(>rwl_lock_obj, wrlock ? LOP_EXCLUSIVE : 0);
 
membar_exit_before_atomic();
+   rw_do_exit(rwl, wrlock);
+}
+
+/* membar_exit_before_atomic() has to precede call of this function. */
+void
+rw_do_exit(struct rwlock *rwl, unsigned long wrlock)
+{
+   unsigned long owner, set;
+
do {
owner = rwl->rwl_owner;
if (wrlock)
@@ -337,7 +346,7 @@ rw_exit(struct rwlock *rwl)
else
set = (owner - RWLOCK_READ_INCR) &
~(RWLOCK_WAIT|RWLOCK_WRWANT);
-   } while (rw_cas(>rwl_owner, owner, set));
+   } while (__predict_false(rw_cas(>rwl_owner, owner, set)));
 
if (owner & RWLOCK_WAIT)
wakeup(rwl);



patch for dump for high percentages

2020-02-29 Thread Peter J. Philipp
Hi,

I have a patch for dump(8) if it is generally considered bad if percentage
done is over 100.0%.  I checked the archives on marc.info for this and didn't
see any discussion whether this was a topic before.

Here is the odd DUMP message I got on a host:

DUMP: 102.41% done, finished in 0:00

And here is the patch:

Index: optr.c
===
RCS file: /cvs/src/sbin/dump/optr.c,v
retrieving revision 1.40
diff -u -p -u -r1.40 optr.c
--- optr.c  22 Jan 2019 16:16:26 -  1.40
+++ optr.c  29 Feb 2020 16:19:25 -
@@ -202,6 +202,7 @@ void
 timeest(void)
 {
time_t  tnow, deltat;
+   float percent;
 
(void) time();
if (tnow >= tschedule) {
@@ -211,8 +212,9 @@ timeest(void)
deltat = tstart_writing - tnow +
(1.0 * (tnow - tstart_writing))
/ blockswritten * tapesize;
+   percent = (blockswritten * 100.0) / tapesize;
msg("%3.2f%% done, finished in %lld:%02lld\n",
-   (blockswritten * 100.0) / tapesize,
+   (percent > 100.0) ? 100.0 : percent,
(long long)deltat / 3600,
((long long)deltat % 3600) / 60);
}


I tested this and it seems to report like before, only when it hits 100.0 it
will not go higher, that particular codepath I didn't hit though.

beta# dump -0uaf - /var | gzip -c > /dev/null
  DUMP: Date of this level 0 dump: Sat Feb 29 17:20:26 2020
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd2e (/var) to standard output
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 23891439 tape blocks.
  DUMP: Volume 1 started at: Sat Feb 29 17:20:37 2020
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 25.98% done, finished in 0:14

Let me know what you think,

Regards,
-peter



Re: iked(8): Recheck SA proposals when updating policy

2020-02-29 Thread Klemens Nanni
On Wed, Feb 26, 2020 at 11:41:49PM +0100, Tobias Heider wrote:
> Due to the design of the IKEv2 protocol, the receiver does not
> know which policy the initiator tries to negotiate an SA for
> until the second exchange (IKE_AUTH). The IKE_AUTH request contains
> the ID payload which the responder uses to match a policy (and lookup
> authentication keys).
> Until then, iked will fall back to the default policy and then update
> it later.
> The cryptographic proposal and key exchange on the other hand are
> negotiated in the first exchange (IKE_SA_INIT), when the responder
> does not know the initiators ID and thus the actual policy.
> 
> The attached diff adds an additional check to make sure that, when
> the policy changes in IKE_AUTH due to the ID belonging to a different
> policy, the negotiated SA's proposal is still compatible with the
> updated policy.
Yes, this makes sense to me.

Diff reads OK, although I'm no iked expert.

I've been running with this diff on sparc64 the last few days, all my
peers continue to work.