Re: unkillable firefox

2016-12-20 Thread Eric Badger
On 12/20/2016 15:29, Steve Kargl wrote:
> Anyone know how to kill firefox?
> 
> last pid: 69652;  load averages:  0.49,  0.27,  0.24  up 1+02:40:06  13:16:02
> 126 processes: 1 running, 121 sleeping, 4 stopped
> CPU:  0.8% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 2049M Active, 3739M Inact, 496M Laundry, 1365M Wired, 783M Buf, 239M Free
> Swap: 16G Total, 1772K Used, 16G Free
> 
>   PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> 63902 kargl       40    0  3157M  2302M STOP    1  10:50   0.00% firefox{firefox}
> 63902 kargl      -16    0  3157M  2302M STOP    2   5:46   0.00% firefox{Composit
> 16874 kargl       40    0   740M   330M STOP    1   0:07   0.00% firefox{firefox}
> 16874 kargl      -16    0   740M   330M STOP    1   0:00   0.00% firefox{Composit
> 
> It seems that firefox is wedged in the thread firefox{Compositor},
> and slowly eating up memory.  This is on an amd64 system at
> r310125 and latest firefox from ports.  procstat suggests that it's
> stuck in a vm sleep queue.
> 
> % procstat -k 63902
>   PID    TID COMM             TDNAME           KSTACK
> 63902 100504 firefox          -                mi_switch thread_suspend_switch
>  thread_single exit1 sigexit postsig ast Xfast_syscall
> 63902 101494 firefox          Compositor       mi_switch sleepq_wait _sleep
>  vm_page_busy_sleep vm_page_sleep_if_busy vm_fault_hold vm_fault
>  trap_pfault trap calltrap
> 

Do you have the output of procstat -k for all threads? I'd guess one thread
is busy dumping core.

Eric
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Kqueue races causing crashes

2016-06-14 Thread Eric Badger
Oops, I don't think my attachment worked. This should do the trick: 
https://drive.google.com/open?id=0B8Lj3D-GnaCcS0taVVNlQktQRkk




Kqueue races causing crashes

2016-06-14 Thread Eric Badger

Hi there,

There seems to be some racy code in kern_event.c which is causing me to 
run into some crashes. I’ve attached the test program used to generate 
these crashes (build it and run the “go” script). They were produced in 
a VM with 4 cores on 11 Alpha 3 (and originally 10.3). The crashes I’ve 
seen come in a few varieties:


1. “userret: returning with the following locks held”. This one is the 
easiest to hit (assuming witness is enabled).


userret: returning with the following locks held:
exclusive sleep mutex process lock (process lock) r = 0 (0xf80006956120) locked @ /usr/src/sys/kern/kern_event.c:2125

panic: witness_warn
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe39d8e0
vpanic() at vpanic+0x182/frame 0xfe39d960
kassert_panic() at kassert_panic+0x126/frame 0xfe39d9d0
witness_warn() at witness_warn+0x3c6/frame 0xfe39daa0
userret() at userret+0x9d/frame 0xfe39dae0
amd64_syscall() at amd64_syscall+0x406/frame 0xfe39dbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe39dbf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x800b8a0ba, rsp = 0x7fffea98, rbp = 0x7fffeae0 ---

KDB: enter: panic
[ thread pid 64855 tid 100106 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> show all locks
Process 64855 (watch) thread 0xf800066c3000 (100106)
exclusive sleep mutex process lock (process lock) r = 0 (0xf80006956120) locked @ /usr/src/sys/kern/kern_event.c:2125

Process 64855 (watch) thread 0xf8000696a500 (100244)
exclusive sleep mutex pmap (pmap) r = 0 (0xf800068c3138) locked @ /usr/src/sys/amd64/amd64/pmap.c:4067
exclusive sx vm map (user) (vm map (user)) r = 0 (0xf800068f6080) locked @ /usr/src/sys/vm/vm_map.c:3315
exclusive sx vm map (user) (vm map (user)) r = 0 (0xf800068c3080) locked @ /usr/src/sys/vm/vm_map.c:3311

db> ps
  pid  ppid  pgrp   uid   state   wmesg   wchan   cmd
64855   690   690     0  R+      (threaded)       watch
100106                   Run     CPU 2            main
100244                   Run     CPU 1            procmaker
100245                   Run     CPU 3            reaper

2. “Sleeping thread owns a non-sleepable lock”. This one first drew my 
attention by showing up in a real-world application at work.


Sleeping thread (tid 100101, pid 76857) owns a non-sleepable lock
KDB: stack backtrace of thread 100101:
sched_switch() at sched_switch+0x2a5/frame 0xfe257690
mi_switch() at mi_switch+0xe1/frame 0xfe2576d0
sleepq_catch_signals() at sleepq_catch_signals+0x16c/frame 0xfe257730
sleepq_timedwait_sig() at sleepq_timedwait_sig+0xf/frame 0xfe257760
_sleep() at _sleep+0x234/frame 0xfe2577e0
kern_kevent_fp() at kern_kevent_fp+0x38a/frame 0xfe2579d0
kern_kevent() at kern_kevent+0x9f/frame 0xfe257a30
sys_kevent() at sys_kevent+0x12a/frame 0xfe257ae0
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfe257bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe257bf0
--- syscall (363, FreeBSD ELF64, sys_kevent), rip = 0x800b6afea, rsp = 0x7fffea88, rbp = 0x7fffead0 ---

panic: sleeping thread
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe225590
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe225640
vpanic() at vpanic+0x126/frame 0xfe225680
panic() at panic+0x43/frame 0xfe2256e0
propagate_priority() at propagate_priority+0x166/frame 0xfe225710
turnstile_wait() at turnstile_wait+0x282/frame 0xfe225750
__mtx_lock_sleep() at __mtx_lock_sleep+0x26b/frame 0xfe2257d0
__mtx_lock_flags() at __mtx_lock_flags+0x5e/frame 0xfe2257f0
proc_to_reap() at proc_to_reap+0x46/frame 0xfe225840
kern_wait6() at kern_wait6+0x202/frame 0xfe2258f0
sys_wait4() at sys_wait4+0x72/frame 0xfe225ae0
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfe225bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe225bf0
--- syscall (7, FreeBSD ELF64, sys_wait4), rip = 0x800b209ba, rsp = 0x7fffdfdfcf48, rbp = 0x7fffdfdfcf80 ---

KDB: enter: panic
[ thread pid 76857 tid 100225 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db> show allchains
chain 1:
 thread 100225 (pid 76857, reaper) blocked on lock 0xf800413105f0 (sleep mutex) "process lock"
 thread 100101 (pid 76857, main) inhibited

(3./4.) There are a few others that I hit less frequently (“page fault 
while in kernel mode” and “Kernel page fault with the following 
non-sleepable locks held”). I don’t have a backtrace handy for these.


I believe they all have more or less the same cause. The crashes occur 
because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we 
call KN_LIST_UNLOCK, the knote’s knlist reference (kn->kn_knlist) has 
been cleared by another thread. Thus we are unable to unlock the 
previously acquired lock and hold it until 
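
To make the failure mode above concrete, here is a small, single-threaded C model of the pattern. It assumes that KN_LIST_LOCK and KN_LIST_UNLOCK both reach the lock through kn->kn_knlist at the moment they run and do nothing when that pointer is NULL. All model_* and list_* names are hypothetical stand-ins, not kern_event.c code, and the "lock" is just a flag so the interleaving can be replayed deterministically. It also shows one generic way to avoid this class of bug (remember which list was locked and unlock that one), offered as an illustration rather than as the fix the kernel should take.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct knlist / struct knote. */
struct model_knlist {
	int	locked;			/* 1 while the list lock is held */
};

struct model_knote {
	struct model_knlist *kn_knlist;	/* may be cleared by another thread */
};

static void
model_lock(struct model_knlist *kl)
{
	assert(!kl->locked);
	kl->locked = 1;
}

static void
model_unlock(struct model_knlist *kl)
{
	assert(kl->locked);
	kl->locked = 0;
}

/* Pattern in question: lock and unlock both go through kn->kn_knlist. */
static void
list_lock(struct model_knote *kn)
{
	if (kn->kn_knlist != NULL)
		model_lock(kn->kn_knlist);
}

static void
list_unlock(struct model_knote *kn)
{
	if (kn->kn_knlist != NULL)
		model_unlock(kn->kn_knlist);
	/* If kn_knlist was cleared in between, the lock stays held. */
}

/* Alternative: remember the list that was locked and unlock that one. */
static struct model_knlist *
list_lock_saved(struct model_knote *kn)
{
	struct model_knlist *kl = kn->kn_knlist;

	if (kl != NULL)
		model_lock(kl);
	return (kl);
}

static void
list_unlock_saved(struct model_knlist *kl)
{
	if (kl != NULL)
		model_unlock(kl);
}

/* Stands in for another thread detaching the knote from its list. */
static void
detach_from_other_thread(struct model_knote *kn)
{
	kn->kn_knlist = NULL;
}
```

In the real kernel the saved-pointer idea is only correct if the knlist itself cannot go away while its lock is held, which is something kern_event.c would have to guarantee.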

Re: Appending to message buffer while in ddb

2015-08-04 Thread Eric Badger


On 08/03/2015 03:21 PM, Marcel Moolenaar wrote:

On Aug 3, 2015, at 12:59 PM, Eric Badger eric_bad...@dell.com wrote:

Hi there,

Since r226435, output from kernel printf/log functions is not appended to the message 
buffer when in ddb. The commit message doesn't call this out specifically; instead, the change 
appears to have been made to address double printing to the console while in ddb. I noticed 
this because a ddb script which previously resulted in some things ending up in a 
textdump's msgbuf.txt no longer does so. It may be that the answer is to use db_printf 
in ddb, which is OK, but I thought I'd check anyway to see if the aforementioned 
change was indeed intentional, since I wasn't able to dig up any discussion about it.

It’s a direct consequence.



But is it a necessary consequence? For example, would the patch below 
also be acceptable (it's perhaps not the cleanest way to do it, but it 
gives the idea)? This way we'll print to the console (once) and, if 
TOLOG is also specified, append to the message buffer. If this is not 
acceptable, then I think all ddb commands not using db_printf (such as 
'show rtc') should be converted to use it (this might be a good idea 
either way), since their output cannot currently be captured in textdumps.


Thanks,
Eric

diff --git sys/kern/subr_prf.c sys/kern/subr_prf.c
index 4f35838..4739331 100644
--- sys/kern/subr_prf.c
+++ sys/kern/subr_prf.c
@@ -463,19 +463,28 @@ putchar(int c, void *arg)
 	struct putchar_arg *ap = (struct putchar_arg*) arg;
 	struct tty *tp = ap->tty;
 	int flags = ap->flags;
+	int putbuf_done = 0;
 
 	/* Don't use the tty code after a panic or while in ddb. */
 	if (kdb_active) {
 		if (c != '\0')
 			cnputc(c);
-		return;
-	}
-
-	if ((flags & TOTTY) && tp != NULL && panicstr == NULL)
-		tty_putchar(tp, c);
+		/* Prevent double printing. */
+		ap->flags &= ~(TOCONS);
+		flags = ap->flags;
+	} else {
+		if ((panicstr == NULL) && (flags & TOTTY) && (tp != NULL))
+			tty_putchar(tp, c);
 
-	if ((flags & (TOCONS | TOLOG)) && c != '\0')
-		putbuf(c, ap);
+		if (flags & TOCONS) {
+			putbuf(c, ap);
+			putbuf_done = 1;
+		}
+	}
+	if ((flags & TOLOG) && (putbuf_done == 0)) {
+		if (c != '\0')
+			putbuf(c, ap);
+	}
 }
 
 /*
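
The routing change in the patch can be checked in isolation with a small C model. The flag names and values (M_TOCONS, M_TOTTY, M_TOLOG) and the D_* destination bits are hypothetical stand-ins for the kernel's TOCONS/TOTTY/TOLOG and for the cnputc()/tty_putchar()/putbuf() calls; route_old() follows the current putchar() and route_new() follows the patch. One simplification: the patch clears TOCONS in ap->flags, which persists for the rest of that printf call, while the model only changes a local copy.

```c
#include <assert.h>

/* Hypothetical stand-ins for the subr_prf.c output flags. */
#define M_TOCONS	0x01
#define M_TOTTY		0x02
#define M_TOLOG		0x04

/* Which output calls putchar() would make for one character. */
#define D_CNPUTC	0x01	/* cnputc(): direct console, used in ddb */
#define D_TTY		0x02	/* tty_putchar() */
#define D_PUTBUF	0x04	/* putbuf(): message buffer path */

/* Model of the current putchar(): returns early while in ddb. */
static int
route_old(int flags, int kdb_active, int panicking, int have_tty, char c)
{
	int dest = 0;

	if (kdb_active) {
		if (c != '\0')
			dest |= D_CNPUTC;
		return (dest);
	}
	if ((flags & M_TOTTY) && have_tty && !panicking)
		dest |= D_TTY;
	if ((flags & (M_TOCONS | M_TOLOG)) && c != '\0')
		dest |= D_PUTBUF;
	return (dest);
}

/* Model of the patched putchar(). */
static int
route_new(int flags, int kdb_active, int panicking, int have_tty, char c)
{
	int dest = 0;
	int putbuf_done = 0;

	if (kdb_active) {
		if (c != '\0')
			dest |= D_CNPUTC;
		flags &= ~M_TOCONS;	/* console already written */
	} else {
		if (!panicking && (flags & M_TOTTY) && have_tty)
			dest |= D_TTY;
		if (flags & M_TOCONS) {
			dest |= D_PUTBUF;
			putbuf_done = 1;
		}
	}
	if ((flags & M_TOLOG) && !putbuf_done && c != '\0')
		dest |= D_PUTBUF;
	return (dest);
}
```

Outside ddb the two models agree for ordinary characters. They do differ for '\0' with TOCONS set, since the patched TOCONS branch calls putbuf() without the '\0' check; whether that matters depends on whether putchar() is ever handed '\0' in that path.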


Appending to message buffer while in ddb

2015-08-03 Thread Eric Badger

Hi there,

Since r226435, output from kernel printf/log functions is not appended 
to the message buffer when in ddb. The commit message doesn't call this 
out specifically; instead, the change appears to have been made to address 
double printing to the console while in ddb. I noticed this because a ddb 
script which previously resulted in some things ending up in a 
textdump's msgbuf.txt no longer does so. It may be that the answer is to 
use db_printf in ddb, which is OK, but I thought I'd check anyway to 
see if the aforementioned change was indeed intentional, since I wasn't 
able to dig up any discussion about it.


Thanks,
Eric


Re: PCI PF memory decode disable when sizing VF BARs

2015-05-06 Thread Eric Badger


On 05/06/15 14:54, Ryan Stone wrote:
On Wed, May 6, 2015 at 2:33 PM, John Baldwin j...@freebsd.org wrote:


Ah, sorry, I didn't know you did it in the caller already.  Perhaps
then something more like your previous patch, but using the test you
added here (PCIR_IS_IOV) instead of your previous check against BAR
values to decide when to frob the command register?

I think that I prefer the current version, as it keeps the interface 
consistent. It's redundant now, but the caller could evolve in the 
future. Given that this is just being run during initialization, a 
couple of extra register accesses are irrelevant anyway.


On Wed, May 6, 2015 at 2:58 PM, Eric Badger eric.bad...@compellent.com wrote:


Does the disabling of VF MSE in pci_iov_config actually protect
anything beyond what happens in pci_read_bar? A read-through
suggests not, but I might have missed something. I'm just thinking
that the code would be a bit hardier if it were done the same way
for both VFs and other devices.

Eric


I think that it inherently has to be done differently.  For real PCI 
devices the device might be important during the boot process (e.g. 
the video card), so it has to keep working.  For VFs the devices don't 
even exist until the VF Enable bit is set, so setting MSE before that 
point is irrelevant (it's allowed by the spec, but any access to a VF 
memory space with MSE set and VF Enable clear just gets an Unsupported 
Request response).


Sure; what I meant was to leave the disabling of VF MSE when sizing VF 
BARs in pci_read_bar (as in your second patch) for consistency and, if 
possible, not bother disabling VF MSE in pci_iov_config. But if it's not 
worth nixing the latter (or not possible), it's no big deal.


I've been testing out the second patch in my environment and it looks 
good. I might suggest something like the below (which I find more 
readable) as a cosmetic change:


@@ -2627,9 +2635,18 @@ pci_read_bar(device_t dev, int reg, pci_addr_t *mapp, pci_addr_t *testvalp,
 	 * determining the BAR's length since we will be placing it in
 	 * a weird state.
 	 */
-	cmd = pci_read_config(dev, PCIR_COMMAND, 2);
-	pci_write_config(dev, PCIR_COMMAND,
-	    cmd & ~(PCI_BAR_MEM(map) ? PCIM_CMD_MEMEN : PCIM_CMD_PORTEN), 2);
+#ifdef PCI_IOV
+	if (PCIR_IS_IOV(&dinfo->cfg, reg)) {
+		restore_reg = dinfo->cfg.iov->iov_pos + PCIR_SRIOV_CTL;
+		mask = PCIM_SRIOV_VF_MSE;
+	} else
+#endif
+	{
+		restore_reg = PCIR_COMMAND;
+		mask = PCI_BAR_MEM(map) ? PCIM_CMD_MEMEN : PCIM_CMD_PORTEN;
+	}
+	cmd = pci_read_config(dev, restore_reg, 2);
+	pci_write_config(dev, restore_reg, cmd & ~mask, 2);
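
The decision the cosmetic version makes can be modeled outside the kernel as follows. choose_decode_disable() and struct decode_disable are hypothetical, and the offsets and bits are my reading of the PCI and SR-IOV specifications (Command register at 0x04 with I/O Space Enable 0x1 and Memory Space Enable 0x2; SR-IOV Control at capability offset 0x08 with VF MSE 0x8; VF BAR0-5 at capability offsets 0x24-0x38), so they should be checked against pcireg.h. The range test is only meant to approximate what PCIR_IS_IOV() checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model constants; verify against pcireg.h. */
#define M_PCIR_COMMAND		0x04	/* Command register */
#define M_PCIM_CMD_PORTEN	0x0001	/* I/O Space Enable */
#define M_PCIM_CMD_MEMEN	0x0002	/* Memory Space Enable */
#define M_PCIR_SRIOV_CTL	0x08	/* SR-IOV Control, relative to cap */
#define M_PCIM_SRIOV_VF_MSE	0x0008	/* VF Memory Space Enable */
#define M_PCIR_SRIOV_BAR(x)	(0x24 + (x) * 4)	/* VF BARx, relative to cap */

struct decode_disable {
	int	 reg;	/* config register to read-modify-write */
	uint16_t mask;	/* bits to clear while the BAR is sized */
};

/*
 * iov_pos is the SR-IOV capability offset, or 0 if the device has none
 * (extended capabilities never start at offset 0).
 */
static struct decode_disable
choose_decode_disable(int reg, int iov_pos, int is_mem_bar)
{
	struct decode_disable d;

	if (iov_pos != 0 && reg >= iov_pos + M_PCIR_SRIOV_BAR(0) &&
	    reg <= iov_pos + M_PCIR_SRIOV_BAR(5)) {
		/* VF BAR: stop VF decoding only; the PF keeps running. */
		d.reg = iov_pos + M_PCIR_SRIOV_CTL;
		d.mask = M_PCIM_SRIOV_VF_MSE;
	} else {
		/* Ordinary BAR: stop decoding for this function. */
		d.reg = M_PCIR_COMMAND;
		d.mask = is_mem_bar ? M_PCIM_CMD_MEMEN : M_PCIM_CMD_PORTEN;
	}
	return (d);
}
```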


Thanks,
Eric


Re: PCI PF memory decode disable when sizing VF BARs

2015-05-06 Thread Eric Badger


On 05/06/15 13:33, John Baldwin wrote:

On Wednesday, May 06, 2015 02:24:24 PM Ryan Stone wrote:

On Wed, May 6, 2015 at 11:45 AM, John Baldwin j...@freebsd.org wrote:


There are some devices with BARs in non-standard locations. :( If there is
a flag to just disable the VF bar decoding, then ideally we should just be
doing that and leaving the global decoding flag alone while sizing the VF
BAR.


Disabling SR-IOV BAR decoding in this function is currently redundant, as
it's already done in pci_iov.c, but I guess to keep the interface sane it
makes sense to do it here too.  Something like this then?

Ah, sorry, I didn't know you did it in the caller already.  Perhaps
then something more like your previous patch, but using the test you
added here (PCIR_IS_IOV) instead of your previous check against BAR
values to decide when to frob the command register?



Does the disabling of VF MSE in pci_iov_config actually protect anything 
beyond what happens in pci_read_bar? A read-through suggests not, but I 
might have missed something. I'm just thinking that the code would be a 
bit hardier if it were done the same way for both VFs and other devices.


Eric


PCI PF memory decode disable when sizing VF BARs

2015-05-05 Thread Eric Badger
Hi Ryan and -current,

During IOV config, when setting up VF BARs, several calls are made to 
'pci_read_bar' (in sys/dev/pci/pci.c) in order to size VF BARs, which causes 
memory decoding to be turned off temporarily for the PF associated with those 
VFs. I'm finding that this can interfere with an already running PF. I have 
several thoughts about how this might be handled, but I'm not convinced I 
understand all of the consequences each of them entails, so any thoughts from 
others would be appreciated. Here are ideas I've considered:

1. Check the value of the 'reg' arg to 'pci_read_bar' and, if it is outside a 
standard BAR range, don't disable memory decoding. This is simple, but feels a 
little hackish and may have consequences I'm missing.
2. Pass some flag/context through such that pci_read_bar knows it is 
configuring VF BARs (we might instead disable VF MSE in this case, if it is 
enabled). It would be necessary to carry this flag/context through several 
function calls before reaching pci_read_bar, which might end up being ugly.
3. Rearrange the calls so that VF BARs are sized when the PF is not yet 
running, and that info saved until VFs are created. Probably it would be done 
when the PF BARs are sized for any device supporting IOV, even if that device 
never creates VFs.
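
For anyone unfamiliar with why pci_read_bar touches the command register at all, the sketch below models the usual BAR sizing sequence on a simulated device: save the command register, clear memory decode, write all ones to the BAR, read back the size mask, then restore the BAR and the command register. Everything here (struct model_dev, bar_write(), size_mem_bar()) is hypothetical and models a single 32-bit non-prefetchable memory BAR; CMD_MEMEN is the Memory Space Enable bit (0x2) of the PCI Command register. The part relevant to this thread is the window between clearing and restoring decode: on a PF that is already running, accesses to its memory space are not decoded during that window.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_MEMEN	0x0002	/* Memory Space Enable in the Command register */

/* Hypothetical device with one 32-bit, non-prefetchable memory BAR. */
struct model_dev {
	uint16_t cmd;			/* Command register */
	uint32_t bar;			/* current BAR contents */
	uint32_t bar_size;		/* decoded size, a power of two >= 16 */
	int	 writes_while_decoding;	/* BAR writes seen with MEMEN set */
};

/* The device only latches the address bits above the BAR's size. */
static void
bar_write(struct model_dev *d, uint32_t val)
{
	if (d->cmd & CMD_MEMEN)
		d->writes_while_decoding++;
	d->bar = val & ~(d->bar_size - 1);
}

static uint32_t
size_mem_bar(struct model_dev *d)
{
	uint16_t saved_cmd = d->cmd;
	uint32_t saved_bar = d->bar;
	uint32_t probe;

	/* 1. Turn off memory decode so the all-ones value is never decoded. */
	d->cmd = (uint16_t)(saved_cmd & ~CMD_MEMEN);
	/* 2. Write all ones and read back which bits stuck. */
	bar_write(d, 0xFFFFFFFFu);
	probe = d->bar;
	/* 3. Restore the BAR, then decoding. */
	bar_write(d, saved_bar);
	d->cmd = saved_cmd;

	/* Drop the 4 low flag bits; the size is the two's complement. */
	return (~(probe & ~0xFu) + 1u);
}
```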

Thanks,
Eric


Re: Early use of log() does not end up in kernel msg buffer

2015-04-06 Thread Eric Badger

On 04/06/2015 04:11 PM, Poul-Henning Kamp wrote:


In message 2033248.eu3rhs8...@ralph.baldwin.cx, John Baldwin writes:


I think phk@ broke this back in 70239.  Before that the log() function did
this:

log()
{

/* log to the msg buffer */
kvprintf(fmt, msglogchar, ...);

if (!log_open) {
/* log to console */
kvprintf(fmt, putchar, ...);
}
}

I think your patch is fine unless phk@ (cc'd) has a reason for not wanting to
do this.

The reason was systems not running syslog having slow serial consoles.



Correct me if I've misunderstood, but that doesn't seem to matter here; 
the proposed change adds logging to the message buffer but leaves 
logging to the console (when no syslog is listening) unchanged.


Eric


Early use of log() does not end up in kernel msg buffer

2015-03-26 Thread Eric Badger
Using log(9) when no process is reading the log results in the message 
going only to the console (contrast with printf(9), which goes to the 
console and to the kernel message buffer in this case). I believe it is 
truer to the semantics of logging for messages to *always* go to the 
message buffer (where they can eventually be collected and in fact put 
into a logfile). I therefore propose the attached patch, which sends 
log(9) to the message buffer always, and to the console only if no one 
has yet opened the log.


It may be more complete to log to the console only if the log level is 
greater than some (user-defined) value, but that seems like more than is 
necessary for this case.


Thoughts?

Eric
diff --git share/man/man9/printf.9 share/man/man9/printf.9
index 84ac822..505ea9b 100644
--- share/man/man9/printf.9
+++ share/man/man9/printf.9
@@ -67,7 +67,8 @@ The
 .Fn log
 function sends the message to the kernel logging facility, using
 the log level as indicated by
-.Fa pri .
+.Fa pri ,
+and to the console if no process is yet reading the log.
 .Pp
 Each of these related functions use the
 .Fa fmt
diff --git sys/kern/subr_prf.c sys/kern/subr_prf.c
index 7e6fd09..6509522 100644
--- sys/kern/subr_prf.c
+++ sys/kern/subr_prf.c
@@ -295,7 +295,7 @@ log(int level, const char *fmt, ...)
 	va_list ap;
 
 	va_start(ap, fmt);
-	(void)_vprintf(level, log_open ? TOLOG : TOCONS, fmt, ap);
+	(void)_vprintf(level, log_open ? TOLOG : TOCONS | TOLOG, fmt, ap);
 	va_end(ap);
 
 	msgbuftrigger = 1;
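
A detail of the one-line change that is easy to misread: in C, | binds more tightly than ?:, so log_open ? TOLOG : TOCONS | TOLOG already means log_open ? TOLOG : (TOCONS | TOLOG), which is the intended behavior. The small model below checks that, along with the before/after flag selection. M_TOCONS, M_TOLOG and the two helper functions are hypothetical stand-ins for the subr_prf.c flags; only the bit relationships matter.

```c
#include <assert.h>

#define M_TOCONS	0x01	/* stand-in for TOCONS */
#define M_TOLOG		0x04	/* stand-in for TOLOG */

/* Flags passed to _vprintf() by log() before the patch. */
static int
log_flags_before(int log_open)
{
	return (log_open ? M_TOLOG : M_TOCONS);
}

/* Same expression shape as the patch: no parentheses needed around |. */
static int
log_flags_after(int log_open)
{
	return (log_open ? M_TOLOG : M_TOCONS | M_TOLOG);
}
```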

Re: Filepaths in VM map for tmpfs files

2015-02-08 Thread Eric Badger


On 02/05/2015 07:25 AM, John Baldwin wrote:

On Thursday, February 05, 2015 10:37:55 AM Konstantin Belousov wrote:

On Wed, Feb 04, 2015 at 10:15:04AM -0500, John Baldwin wrote:

On Tuesday, February 03, 2015 10:33:36 PM Konstantin Belousov wrote:

On Mon, Feb 02, 2015 at 09:50:22PM -0600, Eric Badger wrote:

On 02/02/2015 03:30 AM, Konstantin Belousov wrote:

On Sun, Feb 01, 2015 at 08:38:29PM -0600, Eric Badger wrote:

On 01/31/2015 09:36 AM, Konstantin Belousov wrote:

First, shouldn't the kve_type be changed to KVME_TYPE_VNODE as well?

My thinking is no, because KVME_TYPE_SWAP is in fact the correct type;
I'd opine that it is better to be transparent than make it look like
there is an OBJT_VNODE object there. It may be that some programs would
be confused by VNODE info returned on a SWAP type mapping, though I know
that dtrace handles it OK.

kve_vn_* and kve_path fields are defined only for KVME_TYPE_VNODE kve_type.
So this is in fact a bug in whatever used the API to access kve_path
for KVME_TYPE_SWAP.

Hmm, is that documented anywhere? I think it's fair to assume that
kve_vn* applies only to the VNODE type, but I know there are several
in-tree users that reference kve_path regardless of type (ostensibly
relying on the default of an empty string). Maybe one could determine
the validity of the kve_vn* fields by inspecting the kve_vn_type (not
sure of all the consequences of that)? Or change it to KVME_TYPE_VNODE
and deal with the below problem...

There is no useful documentation for the kern.proc. sysctls.
My word (and statements from other involved developers) could be
considered as close to the truth as it can be.
Somebody taking the effort to document the stuff would make a very
valuable contribution.

I think that kve_path should be valid for all types (e.g. shm_open()
is not a vnode but has a pathname, and that should be fixed to display
if possible). In the equivalent for files (kinfo_file), the pathname
is type-independent and always valid.

Well, this means that it should be valid for vnodes and shm.  My point
is that kvme_vn_path should be used only after the check for type.
We can and do set it to nul string, but using the path unconditionally
is a bug in the user code.

The problem is that shm's can have different types (DEFAULT vs SWAP vs PHYS).
:)  For kinfo_file, tools like fstat always print kf_path regardless of type.
I do think it would be more consistent if the path in a kvme worked the same
way.  Then you don't have to update all the tools each time a type starts
populating the path.


Re: the kve_vn* fields, isn't setting kve_status = KF_ATTR_VALID the way 
to mark them as valid (irrespective of kve_type)? As for path name, I'd 
agree that there's no inherent need to restrict it by type. The field is 
somewhat self-validating (if something other than an empty string was 
returned in the path name field, this field is obviously valid).





That said, I think tmpfs nodes should be exposed as files. It is an
implementation detail of tmpfs that they are swap-backed, but from a
user's perspective these are files, and if you want to expose other
vnode-specific fields than just the path, KVME_TYPE_VNODE would be
more correct.

I agree, but doing it is not easy, since there might be no vnode
to get the required information from.  We do know that this swap
object is for tmpfs node, but currently we only store pointer to
object in the node, not pointer to node from the object.  When the
vnode exists, pointer to vnode is stored in the object.

To fix the issue, we should store pointer to node. Code was not done
this way, because VM code which handles special-case for OBJT_TMPFS,
would need to know tmpfs internals. Right now, code knows about vnodes
anyway, so object->vnode does not bring tmpfs internals into vm.

I'm more arguing in support of your original proposal.  Doing a best effort if
the vnode exists would certainly be an improvement over what we have now.



I'll make one more brief case for returning tmpfs vm objects as 
KVME_TYPE_SWAP. Isn't the purpose of this sysctl debugging, or helping 
a user understand what is going on internally? I can imagine 
scenarios where knowing that a mapped file is swap-backed is relevant 
information, and returning it as KVME_TYPE_VNODE would hide this 
information.


I'd put forth a vote for returning vnode info on a best-effort basis, at 
least for now.


Eric


Re: Filepaths in VM map for tmpfs files

2015-02-04 Thread Eric Badger


On 02/03/2015 02:33 PM, Konstantin Belousov wrote:

On Mon, Feb 02, 2015 at 09:50:22PM -0600, Eric Badger wrote:

On 02/02/2015 03:30 AM, Konstantin Belousov wrote:

On Sun, Feb 01, 2015 at 08:38:29PM -0600, Eric Badger wrote:

On 01/31/2015 09:36 AM, Konstantin Belousov wrote:

First, shouldn't the kve_type be changed to KVME_TYPE_VNODE as well?

My thinking is no, because KVME_TYPE_SWAP is in fact the correct type;
I'd opine that it is better to be transparent than make it look like
there is an OBJT_VNODE object there. It may be that some programs would
be confused by VNODE info returned on a SWAP type mapping, though I know
that dtrace handles it OK.

kve_vn_* and kve_path fields are defined only for KVME_TYPE_VNODE kve_type.
So this is in fact a bug in whatever used the API to access kve_path
for KVME_TYPE_SWAP.

Hmm, is that documented anywhere? I think it's fair to assume that
kve_vn* applies only to the VNODE type, but I know there are several
in-tree users that reference kve_path regardless of type (ostensibly
relying on the default of an empty string). Maybe one could determine
the validity of the kve_vn* fields by inspecting the kve_vn_type (not
sure of all the consequences of that)? Or change it to KVME_TYPE_VNODE
and deal with the below problem...

There is no useful documentation for the kern.proc. sysctls.
My word (and statements from other involved developers) could be
considered as close to the truth as it can be.
Somebody taking the effort to document the stuff would make a very
valuable contribution.


Ok. If I can get a solution figured, I'll plan to include some 
documentation updates.


This problem is somewhat important to me, so I'm going to do some 
additional digging and see if I can't come up with a solution that takes 
into account your notes.


Thanks for the help,
Eric


Re: Filepaths in VM map for tmpfs files

2015-02-02 Thread Eric Badger


On 02/02/2015 03:30 AM, Konstantin Belousov wrote:

On Sun, Feb 01, 2015 at 08:38:29PM -0600, Eric Badger wrote:

On 01/31/2015 09:36 AM, Konstantin Belousov wrote:

First, shouldn't the kve_type be changed to KVME_TYPE_VNODE as well?

My thinking is no, because KVME_TYPE_SWAP is in fact the correct type;
I'd opine that it is better to be transparent than make it look like
there is an OBJT_VNODE object there. It may be that some programs would
be confused by VNODE info returned on a SWAP type mapping, though I know
that dtrace handles it OK.

kve_vn_* and kve_path fields are defined only for KVME_TYPE_VNODE kve_type.
So this is in fact a bug in whatever used the API to access kve_path
for KVME_TYPE_SWAP.


Hmm, is that documented anywhere? I think it's fair to assume that 
kve_vn* applies only to the VNODE type, but I know there are several 
in-tree users that reference kve_path regardless of type (ostensibly 
relying on the default of an empty string). Maybe one could determine 
the validity of the kve_vn* fields by inspecting the kve_vn_type (not 
sure of all the consequences of that)? Or change it to KVME_TYPE_VNODE 
and deal with the below problem...




Second, note that it is possible that the vnode is recycled, so
OBJ_TMPFS flag is cleared for tmpfs swap object.  The OBJ_TMPFS_NODE
flag is still set then.  I am not sure what to do in this case,
should the type changed to KVME_TYPE_VNODE still, but kve_vn_*
fields left invalid ?

I think if we changed to KVME_TYPE_VNODE in some cases, it should be
done in all cases, even if the vnode has been recycled (but leave vp ==
NULL in that case). Though if it is left as KVME_TYPE_SWAP, then that
concern goes away on its own.

Concern is not vp == NULL, but the fact that kve_vn* cannot be filled,
there is simply no (easy) way to fetch this information.


Right; by leaving vp == NULL, I meant don't populate the kve_vn* 
fields, which admittedly isn't a great solution. But as you say, the 
information is not really available once the vnode has been reclaimed.


There is some inherent difficulty in the duality of the vm object here; 
it would be nice if it could be treated uniformly with other vnodes, 
but I think I lack the expertise to approach a more involved solution 
that would achieve this.

Incidentally, Konstantin, thanks for the feedback and advice.

Eric


Re: Filepaths in VM map for tmpfs files

2015-02-01 Thread Eric Badger


On 01/31/2015 09:36 AM, Konstantin Belousov wrote:

First, shouldn't the kve_type be changed to KVME_TYPE_VNODE as well?
My thinking is no, because KVME_TYPE_SWAP is in fact the correct type; 
I'd opine that it is better to be transparent than make it look like 
there is an OBJT_VNODE object there. It may be that some programs would 
be confused by VNODE info returned on a SWAP type mapping, though I know 
that dtrace handles it OK.



Second, note that it is possible that the vnode is recycled, so
OBJ_TMPFS flag is cleared for tmpfs swap object.  The OBJ_TMPFS_NODE
flag is still set then.  I am not sure what to do in this case,
should the type changed to KVME_TYPE_VNODE still, but kve_vn_*
fields left invalid ?
I think if we changed to KVME_TYPE_VNODE in some cases, it should be 
done in all cases, even if the vnode has been recycled (but leave vp == 
NULL in that case). Though if it is left as KVME_TYPE_SWAP, then that 
concern goes away on its own.
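
The three cases in this exchange (a tmpfs object whose vnode is live, a tmpfs object whose vnode has been recycled, and ordinary anonymous swap) can be modeled as below. The M_OBJ_TMPFS/M_OBJ_TMPFS_NODE bits, struct model_object and classify() are hypothetical; they only mirror the flag relationship Konstantin describes (OBJ_TMPFS cleared on vnode recycle while OBJ_TMPFS_NODE stays set). As a defensive choice, the model also checks the vnode pointer for NULL before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits mirroring OBJ_TMPFS_NODE / OBJ_TMPFS. */
#define M_OBJ_TMPFS_NODE	0x1	/* object backs a tmpfs node */
#define M_OBJ_TMPFS		0x2	/* node currently has a vnode */

struct model_vnode {
	const char *path;
};

struct model_object {
	int	 flags;
	struct model_vnode *swp_tmpfs;	/* meaningful only with M_OBJ_TMPFS */
};

enum model_report {
	REPORT_ANON_SWAP,	/* plain swap, no path */
	REPORT_TMPFS_VNODE,	/* tmpfs file, vnode info available */
	REPORT_TMPFS_NO_VNODE	/* tmpfs file, vnode recycled: no kve_vn_* data */
};

static enum model_report
classify(const struct model_object *obj, const char **pathp)
{
	*pathp = "";
	if ((obj->flags & M_OBJ_TMPFS) != 0 && obj->swp_tmpfs != NULL) {
		*pathp = obj->swp_tmpfs->path;
		return (REPORT_TMPFS_VNODE);
	}
	if ((obj->flags & M_OBJ_TMPFS_NODE) != 0)
		return (REPORT_TMPFS_NO_VNODE);
	return (REPORT_ANON_SWAP);
}
```

Whether REPORT_TMPFS_NO_VNODE should surface as KVME_TYPE_SWAP, or as KVME_TYPE_VNODE with empty kve_vn_* fields, is the open question here; the model deliberately keeps it as a separate case.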


Eric



Filepaths in VM map for tmpfs files

2015-01-31 Thread Eric Badger
In FreeBSD 9, examining the VM map of a process (with e.g. 'procstat 
-v') with a tmpfs file mapped showed a VNODE type and displayed the file 
path. From 10.0 up to CURRENT (I believe this started at r250030), SWAP 
is shown instead, without a file path.


This has some unfortunate consequences; I discovered this problem when 
trying to use dtrace's pid provider, which fails to find symbols for 
executables running from tmpfs.


I've attached a patch which will repair procstat/dtrace. There are a few 
other places where such a patch would be needed. I'm willing to put together 
such a patch, but would like to first hear some feedback on whether this seems 
like a reasonable approach, or if there's anything I've missed.


Thoughts?

Eric


Index: sys/kern/kern_proc.c
===================================================================
--- sys/kern/kern_proc.c	(revision 277957)
+++ sys/kern/kern_proc.c	(working copy)
@@ -2337,6 +2337,11 @@
 				break;
 			case OBJT_SWAP:
 				kve->kve_type = KVME_TYPE_SWAP;
+				if ((lobj->flags & OBJ_TMPFS) != 0)
+				{
+					vp = lobj->un_pager.swp.swp_tmpfs;
+					vref(vp);
+				}
 				break;
 			case OBJT_DEVICE:
 				kve->kve_type = KVME_TYPE_DEVICE;