Re: make process.ps_start an uptime instead of a realtime

2018-06-22 Thread Scott Cheloha
On Tue, Jun 05, 2018 at 03:57:48PM -0600, Todd C. Miller wrote:
> On Tue, 05 Jun 2018 16:30:14 -0500, Scott Cheloha wrote:
> 
> > Because the system clock can jump around, process start times
> > should be recorded as offsets from boot and only converted to
> > a time of day on request.
> >
> > This keeps ps(1)'s etime accurate and ensures processes are
> > correctly sorted by age in ps(1) and pgrep(1).
> 
> Looks good.
> 
> > I don't think we're losing much by declining to preserve the
> > time of day when the process forked.
> >
> > FreeBSD has made this change, though NetBSD has not.
> >
> > The attached patch isn't perfect, as there remains a small window
> > after kvm_getprocs is called where the system clock can jump again.
> > To circumvent that we'd need to compute the elapsed real time for
> > the process in the kernel, or pass out ps_start alongside the derived
> > time of day and let userspace compute the elapsed realtime with
> > the CLOCK_BOOTTIME clock.  Unsure how people feel about extending
> > kinfo_proc, though.
> 
> I really don't this is worth worrying about.  In the current state
> of things, if the system clock jumps the process start time as
> reported by ps will be incorrect anyway.  This seems like an
> improvement.
>
> > This change doesn't require any userspace recompilation for inspecting
> > a live kernel: on reboot -lkvm binaries behave as expected.
> 
> Great.  I was worried that consumers of p_ustart_{sec,usec} would
> need to change but I can see that is not the case.

Update: here is a completed patch.  This now adds bits to libkvm that
derive the process starting time in the coredump case.  It doesn't take
subseconds into account: for that we'd need to expose the timehands
definition to userspace.  So I left a comment marking the way forward
for the intrepid.

For the coredump case to work obviously libkvm itself will need to be
recompiled, as will statically linked binaries like ps(1), otherwise
you'll get zeroes for your process starting times and ps(1) will claim
all the processes on your cored system started in ~1970.  The interface
to libkvm is entirely unchanged, so I'm pretty sure this doesn't
require a version bump.

ps(1)'s etime column will still be incorrect in the coredump case because
it computes the elapsed time using the active kernel's time.  There is a
similar problem with the uptime line in w(1).  The attached patch itself
doesn't break these outputs, though, so I'll fix them in a separate patch.

The only port using p_ustart_{sec,usec} in a way that might be broken by
this change is sudo.  I talked to todd and he said that this change wouldn't
cause any problems, though.

ok?

--
Scott Cheloha

Index: lib/libkvm/kvm_proc2.c
===
RCS file: /cvs/src/lib/libkvm/kvm_proc2.c,v
retrieving revision 1.27
diff -u -p -r1.27 kvm_proc2.c
--- lib/libkvm/kvm_proc2.c  7 Nov 2016 00:26:33 -   1.27
+++ lib/libkvm/kvm_proc2.c  23 Jun 2018 00:12:14 -
@@ -119,13 +119,35 @@ kvm_proclist(kvm_t *kd, int op, int arg,
struct sigacts sa, *sap;
struct vmspace vm, *vmp;
struct plimit limits, *limp;
+   struct nlist nl[3];
pid_t parent_pid, leader_pid;
-   int cnt = 0;
-   int dothreads = 0;
+   time_t time_second, time_uptime;
+   int cnt, dothreads, i;
 
+   cnt = 0;
dothreads = op & KERN_PROC_SHOW_THREADS;
op &= ~KERN_PROC_SHOW_THREADS;
 
+   /* Determine the victim's age and time of death. */
+   memset(, 0, sizeof(nl));
+   nl[0].n_name = "time_second";
+   nl[1].n_name = "time_uptime";
+   nl[2].n_name = NULL;
+   if (kvm_nlist(kd, nl) != 0) {
+   for (i = 0; nl[i].n_type != 0; i++)
+   continue;
+   _kvm_err(kd, kd->program, "%s: no such symbol", nl[i].n_name);
+   return (-1);
+   }
+   if (KREAD(kd, nl[0].n_value, _second)) {
+   _kvm_err(kd, kd->program, "can't read %s", nl[0].n_name);
+   return (-1);
+   }
+   if (KREAD(kd, nl[1].n_value, _uptime)) {
+   _kvm_err(kd, kd->program, "can't read %s", nl[1].n_name);
+   return (-1);
+   }
+
/*
 * Modelled on sysctl_doproc() in sys/kern/kern_sysctl.c
 */
@@ -288,6 +310,19 @@ kvm_proclist(kvm_t *kd, int op, int arg,
kp.p_tpgid = -1;
kp.p_tdev = NODEV;
}
+
+   /*
+* Derive the starting time for the process.
+*
+* XXX This ignores subseconds.  One solution would be
+* to expose the timehands definition to userspace and
+* grab th_offset/th_nanotime from the active timehands.
+* Pending that, just pretend every process forked at
+* the beginning of a nearby second.
+*/
+   kp.p_ustart_sec =
+ 

Re: make process.ps_start an uptime instead of a realtime

2018-06-05 Thread Todd C. Miller
On Tue, 05 Jun 2018 16:30:14 -0500, Scott Cheloha wrote:

> Because the system clock can jump around, process start times
> should be recorded as offsets from boot and only converted to
> a time of day on request.
>
> This keeps ps(1)'s etime accurate and ensures processes are
> correctly sorted by age in ps(1) and pgrep(1).

Looks good.

> I don't think we're losing much by declining to preserve the
> time of day when the process forked.
>
> FreeBSD has made this change, though NetBSD has not.
>
> The attached patch isn't perfect, as there remains a small window
> after kvm_getprocs is called where the system clock can jump again.
> To circumvent that we'd need to compute the elapsed real time for
> the process in the kernel, or pass out ps_start alongside the derived
> time of day and let userspace compute the elapsed realtime with
> the CLOCK_BOOTTIME clock.  Unsure how people feel about extending
> kinfo_proc, though.

I really don't this is worth worrying about.  In the current state
of things, if the system clock jumps the process start time as
reported by ps will be incorrect anyway.  This seems like an
improvement.

> This change doesn't require any userspace recompilation for inspecting
> a live kernel: on reboot -lkvm binaries behave as expected.

Great.  I was worried that consumers of p_ustart_{sec,usec} would
need to change but I can see that is not the case.

 - todd



make process.ps_start an uptime instead of a realtime

2018-06-05 Thread Scott Cheloha
Hi,

Because the system clock can jump around, process start times
should be recorded as offsets from boot and only converted to
a time of day on request.

This keeps ps(1)'s etime accurate and ensures processes are
correctly sorted by age in ps(1) and pgrep(1).

I don't think we're losing much by declining to preserve the
time of day when the process forked.

FreeBSD has made this change, though NetBSD has not.

The attached patch isn't perfect, as there remains a small window
after kvm_getprocs is called where the system clock can jump again.
To circumvent that we'd need to compute the elapsed real time for
the process in the kernel, or pass out ps_start alongside the derived
time of day and let userspace compute the elapsed realtime with
the CLOCK_BOOTTIME clock.  Unsure how people feel about extending
kinfo_proc, though.

This change doesn't require any userspace recompilation for inspecting
a live kernel: on reboot -lkvm binaries behave as expected.

I'm pretty sure this will require changes to libkvm to fix up the start
time when inspecting a dead kernel, but I can't figure out how to core my
own system so I can test said changes.

Thoughts?  Help creating a core for test?

--
Scott Cheloha

Index: sys/kern/init_main.c
===
RCS file: /cvs/src/sys/kern/init_main.c,v
retrieving revision 1.277
diff -u -p -r1.277 init_main.c
--- sys/kern/init_main.c28 Apr 2018 03:13:04 -  1.277
+++ sys/kern/init_main.c5 Jun 2018 16:43:08 -
@@ -512,7 +512,7 @@ main(void *framep)
 */
nanotime();
LIST_FOREACH(pr, , ps_list) {
-   pr->ps_start = boottime;
+   getnanouptime(>ps_start);
TAILQ_FOREACH(p, >ps_threads, p_thr_link) {
nanouptime(>p_cpu->ci_schedstate.spc_runtime);
timespecclear(>p_rtime);
Index: sys/kern/kern_acct.c
===
RCS file: /cvs/src/sys/kern/kern_acct.c,v
retrieving revision 1.36
diff -u -p -r1.36 kern_acct.c
--- sys/kern/kern_acct.c28 Apr 2018 03:13:04 -  1.36
+++ sys/kern/kern_acct.c5 Jun 2018 16:43:08 -
@@ -184,8 +184,8 @@ acct_process(struct proc *p)
acct.ac_stime = encode_comp_t(st.tv_sec, st.tv_nsec);
 
/* (3) The elapsed time the command ran (and its starting time) */
-   acct.ac_btime = pr->ps_start.tv_sec;
-   getnanotime();
+   getnanouptime();
+   acct.ac_btime = time_second - (tmp.tv_sec - pr->ps_start.tv_sec);
timespecsub(, >ps_start, );
acct.ac_etime = encode_comp_t(tmp.tv_sec, tmp.tv_nsec);
 
Index: sys/kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.202
diff -u -p -r1.202 kern_fork.c
--- sys/kern/kern_fork.c30 Dec 2017 20:47:00 -  1.202
+++ sys/kern/kern_fork.c5 Jun 2018 16:43:08 -
@@ -451,7 +451,7 @@ fork1(struct proc *curp, int flags, void
/*
 * For new processes, set accounting bits and mark as complete.
 */
-   getnanotime(>ps_start);
+   getnanouptime(>ps_start);
pr->ps_acflag = AFORK;
atomic_clearbits_int(>ps_flags, PS_EMBRYO);
 
Index: sys/kern/kern_sysctl.c
===
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.341
diff -u -p -r1.341 kern_sysctl.c
--- sys/kern/kern_sysctl.c  2 Jun 2018 16:38:21 -   1.341
+++ sys/kern/kern_sysctl.c  5 Jun 2018 16:43:08 -
@@ -1596,7 +1596,8 @@ fill_kproc(struct process *pr, struct ki
struct session *s = pr->ps_session;
struct tty *tp;
struct vmspace *vm = pr->ps_vmspace;
-   struct timespec ut, st;
+   struct timespec elapsed, now, started, uptime;
+   struct timespec st, ut;
int isthread;
 
isthread = p != NULL;
@@ -1627,6 +1628,12 @@ fill_kproc(struct process *pr, struct ki
if ((pr->ps_flags & PS_ZOMBIE) == 0) {
if ((pr->ps_flags & PS_EMBRYO) == 0 && vm != NULL)
ki->p_vm_rssize = vm_resident_count(vm);
+   getnanouptime();
+   getnanotime();
+   timespecsub(, >ps_start, );
+   timespecsub(, , );
+   ki->p_ustart_sec = started.tv_sec;
+   ki->p_ustart_usec = started.tv_nsec/1000;
calctsru(isthread ? >p_tu : >ps_tu, , , NULL);
ki->p_uutime_sec = ut.tv_sec;
ki->p_uutime_usec = ut.tv_nsec/1000;
Index: sys/sys/proc.h
===
RCS file: /cvs/src/sys/sys/proc.h,v
retrieving revision 1.248
diff -u -p -r1.248 proc.h
--- sys/sys/proc.h  12 Apr 2018 17:13:44 -  1.248
+++ sys/sys/proc.h  5 Jun 2018 16:43:08 -
@@ -228,7 +228,7 @@ struct process {
 #define