Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Jason Thorpe


> On Aug 6, 2018, at 12:17 PM, Rhialto  wrote:
> 
>  21.96   24626  82447.93 kernel_lock ip_slowtimo+1a


The work that's happening here looks like a scalability nightmare, regardless 
of holding the kernel lock or not.  A couple of things:

1- It should do a better job of determining if there's any work that it 
actually has to do.  As it is, it's going to lock the kernel_lock and traverse 
all of the buckets in a hash table even if there aren't any entries in it.  
Even in the NET_MPSAFE scenario, that's pointless work.

2- The kernel_lock, by its nature, is the lock-of-last-resort, and code needs 
to be doing everything it can to defer taking that lock until it is deemed 
absolutely necessary.  Even in the not-NET_MPSAFE scenario, there should be a 
way to determine that IP fragment expiration processing needs to actually occur 
before taking the kernel_lock.
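
A minimal user-space sketch of the "check for work before locking" idea in the two points above. Everything here is an assumption made for illustration: `big_lock` is a C11 spin flag standing in for kernel_lock, and `frag_count`, `frag_enqueue()` and `frag_slowtimo()` are hypothetical names, not the actual ip_slowtimo code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for kernel_lock (assumption: a plain C11 spin flag). */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Hypothetical count of queued fragments, kept up to date on insert. */
static atomic_uint frag_count;

/* How often the big lock was really taken, for illustration only. */
static unsigned big_lock_acquisitions;

static void
big_lock_enter(void)
{
	while (atomic_flag_test_and_set(&big_lock))
		continue;	/* spin until the flag is released */
	big_lock_acquisitions++;
}

static void
big_lock_exit(void)
{
	atomic_flag_clear(&big_lock);
}

/* Hypothetical: called whenever a fragment is queued for reassembly. */
void
frag_enqueue(void)
{
	atomic_fetch_add(&frag_count, 1);
}

/*
 * Periodic timer.  Returns true if it had to take the big lock.
 * The cheap unlocked check comes first, so an empty queue costs
 * one atomic load instead of a lock plus a hash table walk.
 */
bool
frag_slowtimo(void)
{
	if (atomic_load(&frag_count) == 0)
		return false;

	big_lock_enter();
	/* The sketch "expires" everything; real code would walk the queue. */
	atomic_store(&frag_count, 0);
	big_lock_exit();
	return true;
}
```

The unlocked check can race with a concurrent enqueue, but for a periodic timer that only means the new entry is looked at on the next tick.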

-- thorpej



Re: ddb input via IPMI virtual console

2018-08-06 Thread Mouse
> The real keyboards are PS/2 and I can't change that because it runs
> on a wire physically passing a /real/ firewall, [...]

(a) Is it possible to run USB over the same conductors used by the PS/2
cable?  (This is a real question; I don't know enough about layer 1 of
either to answer it.)

(b) There exist devices that adapt PS/2 to USB in the
PS/2-keyboard-to-USB-host direction.  They are not as common as the
other direction (USB keyboard on PS/2 host, which I gather is usually
faked rather than truly adapting between the two), but I've seen them
often enough to know they exist.  Perhaps one could help?

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Here it is. 

And here is another flavor below

I am now convinced the problem came with NetBSD 8.0:
I found that two other domUs have crashed during the daily backup since
the NetBSD 8.0 upgrade, and the panic is also "biodone2 already".
I will start downgrading today.

uvm_fault(0xc06d4960, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip 0xc0360835 cs 0x9 eflags 0x210282 cr2 0x1c ilevel 0 esp 
0xc59010b4
curlwp 0xc56af000 pid 0 lid 39 lowest kstack 0xdd8bc2c0
panic: trap
cpu0: Begin traceback...
vpanic(c04da74f,dd8bde0c,dd8bde90,c010bb65,c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282)
 at netbsd:vpanic+0x12d
panic(c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282,1c,0,c59010b4,dd8bde9c) at 
netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
uvm_aio_aiodone_pages(dd8bdf20,8,0,0,0,0,0,0,0,0) at 
netbsd:uvm_aio_aiodone_pages+0x15
uvm_aio_aiodone(c59010b4,0,c52b0118,c52b0110,c52b0104,c52b00c0,c03ca660,c56af000,0,c0102031)
 at netbsd:uvm_aio_aiodone+0x9a
workqueue_worker(c52b00c0,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
netbsd:workqueue_worker+0xbe
cpu0: End traceback...

dumping to dev 142,1 offset 8
dump uvm_fault(0xc06d2be0, 0xfd894000, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0x2 eip 0xc010806d cs 0x9 eflags 0x210202 cr2 0xfd894fff 
ilevel 0x8 esp 0xc06ba580
curlwp 0xc56af000 pid 0 lid 39 lowest kstack 0xdd8bc2c0
Skipping crash dump on recursive panic
panic: trap
cpu0: Begin traceback...
vpanic(c04da74f,dd8bdc98,dd8bdd1c,c010bb65,c04da74f,dd8bdd28,dd8bdd28,27,dd8bc2c0,210202)
 at netbsd:vpanic+0x12d
panic(c04da74f,dd8bdd28,dd8bdd28,27,dd8bc2c0,210202,fd894fff,8,c06ba580,dd8bdd28)
 at netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
dodumpsys(dd8bde0c,0,104,c01091fa,8,0,fffa,104,c04da74f,dd8bddf0) at 
netbsd:dodumpsys+0x39d
dumpsys(104,0,c04da74f,dd8bde0c,5,c56af000,6,dd8bde00,c03c4e48,c04da74f) at 
netbsd:dumpsys+0x14
vpanic(c04da74f,dd8bde0c,dd8bde90,c010bb65,c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282)
 at netbsd:vpanic+0x13b
panic(c04da74f,dd8bde9c,dd8bde9c,27,dd8bc2c0,210282,1c,0,c59010b4,dd8bde9c) at 
netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
uvm_aio_aiodone_pages(dd8bdf20,8,0,0,0,0,0,0,0,0) at 
netbsd:uvm_aio_aiodone_pages+0x15
uvm_aio_aiodone(c59010b4,0,c52b0118,c52b0110,c52b0104,c52b00c0,c03ca660,c56af000,0,c0102031)
 at netbsd:uvm_aio_aiodone+0x9a
workqueue_worker(c52b00c0,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
netbsd:workqueue_worker+0xbe
cpu0: End traceback...



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: Proposed change to makesyscalls.sh

2018-08-06 Thread Paul Goyette

 | My initial pass at this was to maintain the bit vector at run-time
 | rather than having makesyscalls.sh calculate the value.  The cost to
 | set the bits in syscall_establish() is not very large.

Either way, but that makes the kernel fractionally bigger for no real
benefit - if it can be done at compile time, let it be.


Yeah, I knew that!

The attached patch calculates the bit vector at "regen" time.  So we
don't need to maintain the bit vector at run time, and we need only
test the bit vector in one place - syscall_disestablish().
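
As a rough illustration of the approach described above, here is a self-contained sketch of a regen-time bit vector that is consulted only when a syscall is disestablished. The table size, the `_sketch` names and the choice of slots 3 and 5 are assumptions for the example; only the member name e_nomodbits comes from this thread, and this is not the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSYSENT_SKETCH 8	/* hypothetical table size */

typedef int (*sy_call_t)(void);

static int sys_nosys(void)    { return -1; }	/* never implemented */
static int sys_nomodule(void) { return -2; }	/* module may provide it */
static int my_syscall(void)   { return 0; }	/* supplied by a module */

struct emul_sketch {
	sy_call_t	e_sysent[NSYSENT_SKETCH];
	const uint32_t	*e_nomodbits;	/* one bit per syscall number */
};

/* As if emitted at "regen" time: slots 3 and 5 are module-provided. */
static const uint32_t sketch_nomodbits[(NSYSENT_SKETCH + 31) / 32] = {
	(1u << 3) | (1u << 5),
};

static bool
is_nomodule_slot(const struct emul_sketch *em, unsigned n)
{
	return (em->e_nomodbits[n / 32] & (1u << (n % 32))) != 0;
}

/* Fill the table the way the generated sysent array would look. */
void
emul_init_sketch(struct emul_sketch *em, const uint32_t *bits)
{
	em->e_nomodbits = bits;
	for (unsigned n = 0; n < NSYSENT_SKETCH; n++)
		em->e_sysent[n] = is_nomodule_slot(em, n) ?
		    sys_nomodule : sys_nosys;
}

/* Establishing needs no bit vector work at all. */
void
syscall_establish_sketch(struct emul_sketch *em, unsigned n, sy_call_t fn)
{
	assert(n < NSYSENT_SKETCH);
	em->e_sysent[n] = fn;
}

/* The only place the bit vector is tested: restore the original entry. */
void
syscall_disestablish_sketch(struct emul_sketch *em, unsigned n)
{
	assert(n < NSYSENT_SKETCH);
	em->e_sysent[n] = is_nomodule_slot(em, n) ? sys_nomodule : sys_nosys;
}
```

Because the vector is const data produced at regen time, the run-time cost is a single bit test on the disestablish path.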


 | The only drawback here is to add a new entry to struct emul to point
 | at the bit vector,

Sure.

 | and to initialize it in every place that the entrypoint array is
 | initialized.

Whenever e_sysent is set, so shall be the new one.  How many places
can there be?


About 20 of them!  :)  Each of the compat- modules has its own
struct emul.  The attached patch includes all of the required changes.

Note that after making these changes, we'll need to regenerate the
various syscall files for each emul.  The attachment also lists these
files.


 | FWIW, what would you suggest as the name of a new struct emul member
 | (in sys/proc.h)?

Naming ... not my strong suit...   e_sc_nomodbits  ?


I ended up using e_nomodbits[] but it would be quite simple to change
if anyone has a strong preference.


 | I'm beginning to think that this change really isn't worthwhile.


I'm still not totally sure if this change is worthwhile.  However, it
does resolve the issue in kern/45781, and it removes the need for the
work-around as currently implemented in

src/sys/arch/usermode/modules/syscallemu/syscallemu.c

I really would appreciate feedback from others on whether or not this
change should be committed.


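+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+

Index: arch/i386/i386/linux_syscall.c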
===
RCS file: /cvsroot/src/sys/arch/i386/i386/linux_syscall.c,v
retrieving revision 1.53
diff -u -p -r1.53 linux_syscall.c
--- arch/i386/i386/linux_syscall.c  12 Aug 2017 07:21:57 -  1.53
+++ arch/i386/i386/linux_syscall.c  6 Aug 2018 12:02:16 -
@@ -53,6 +53,7 @@ __KERNEL_RCSID(0, "$NetBSD: linux_syscal
 
 static void linux_syscall(struct trapframe *);
 extern struct sysent linux_sysent[];
+extern const uint32_t linux_sysent_nomodbits[];
 
 void
 linux_syscall_intern(struct proc *p)
Index: compat/aoutm68k/aoutm68k_exec.c
===
RCS file: /cvsroot/src/sys/compat/aoutm68k/aoutm68k_exec.c,v
retrieving revision 1.29
diff -u -p -r1.29 aoutm68k_exec.c
--- compat/aoutm68k/aoutm68k_exec.c 6 May 2018 13:40:50 -   1.29
+++ compat/aoutm68k/aoutm68k_exec.c 6 Aug 2018 12:02:17 -
@@ -49,6 +49,7 @@ __KERNEL_RCSID(0, "$NetBSD: aoutm68k_exe
 #include 
 
 extern struct sysent aoutm68k_sysent[];
+extern const uint32_t aoutm68k_sysent_nomodbits[];
 extern char sigcode[], esigcode[];
 void aoutm68k_syscall_intern(struct proc *);
 
@@ -64,6 +65,7 @@ struct emul emul_netbsd_aoutm68k = {
.e_nsysent =AOUTM68K_SYS_NSYSENT,
 #endif
.e_sysent = aoutm68k_sysent,
+   .e_nomodbits =  aoutm68k_sysent_nomodbits,
 #ifdef SYSCALL_DEBUG
.e_syscallnames =   syscallnames,
 #endif
Index: compat/freebsd/freebsd_exec.c
===
RCS file: /cvsroot/src/sys/compat/freebsd/freebsd_exec.c,v
retrieving revision 1.41
diff -u -p -r1.41 freebsd_exec.c
--- compat/freebsd/freebsd_exec.c   6 May 2018 13:40:50 -   1.41
+++ compat/freebsd/freebsd_exec.c   6 Aug 2018 12:02:17 -
@@ -54,6 +54,7 @@ __KERNEL_RCSID(0, "$NetBSD: freebsd_exec
 #include 
 
 extern struct sysent freebsd_sysent[];
+extern const uint32_t freebsd_sysent_nomodbits[];
 extern const char * const freebsd_syscallnames[];
 
 struct uvm_object *emul_freebsd_object;
@@ -72,6 +73,7 @@ struct emul emul_freebsd = {
.e_nsysent =FREEBSD_SYS_NSYSENT,
 #endif
.e_sysent = freebsd_sysent,
+   .e_nomodbits =  freebsd_sysent_nomodbits,
 #ifdef SYSCALL_DEBUG
.e_syscallnames =   freebsd_syscallnames,
 #else
Index: compat/ibcs2/ibcs2_exec.c
===
RCS file: /cvsroot/src/sys/compat/ibcs2/ibcs2_exec.c,v
retrieving revision 1.78
diff -u -p -r1.78 ibcs2_exec.c
--- compat/ibcs2/ibcs2_exec.c   6 May 2018 13:40:51 -   1.78
+++ compat/ibcs2/ibcs2_exec.c   6 Aug 2018 12:02:17 -
@@ -64,6 +64,7 @@ __KERNEL_RCSID(0, "$NetBSD: 

Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Michael van Elst
rhia...@falu.nl (Rhialto) writes:

>From the caller list, I get the impression that the statistics include
>(potentially) all system activity, not just work being done by/on behalf
>of the specified program. Otherwise I can't explain the network related
>names. All distfiles should have been there already (it was a rebuild of
>existing packages but for 8.0).

Yes, lockstat data includes all system activity.

>  29.41   59522 110425.74 kernel_lock cdev_poll+72
>  21.96   24626  82447.93 kernel_lock ip_slowtimo+1a
>  20.04   23020  75230.65 kernel_lock frag6_slowtimo+1f
>  10.09   26272  37891.06 kernel_lock frag6_fasttimo+1a

The network stack isn't MPSAFE yet, so it naturally generates contention
on a busy system.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Mindaugas Rasiukevicius
Martin Husemann  wrote:
> So here is a more detailed analysis using flamegraphs:
> 
>   https://netbsd.org/~martin/bc-perf/
> 
> 
> All operations happen on tmpfs, and the locking there is a significant
> part of the game - however, lots of mostly idle time is wasted and it is
> not clear to me what is going on.

Just from a very quick look, it seems like a regression introduced with
the vcache changes: the MP-safe flag is set too late and not inherited
by the root vnode.

http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff
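
To make the ordering problem concrete, here is a small stand-alone sketch. It assumes (as an illustration, not as a description of the real vcache code) that a vnode copies the MP-safe property from its mount exactly once, at creation time; all `_sketch` names are hypothetical and this is not the linked diff.

```c
#include <assert.h>
#include <stdbool.h>

struct mount_sketch { bool mpsafe; };
struct vnode_sketch { bool mpsafe; };

/* Assumption: the flag is inherited once, when the vnode is created. */
static struct vnode_sketch
vnode_create_sketch(const struct mount_sketch *mp)
{
	return (struct vnode_sketch){ .mpsafe = mp->mpsafe };
}

/* Flag set too late: the root vnode was created before it. */
struct vnode_sketch
mount_late_flag_sketch(struct mount_sketch *mp)
{
	struct vnode_sketch root = vnode_create_sketch(mp);

	mp->mpsafe = true;
	return root;
}

/* Flag set first: the root vnode inherits it. */
struct vnode_sketch
mount_early_flag_sketch(struct mount_sketch *mp)
{
	mp->mpsafe = true;
	return vnode_create_sketch(mp);
}
```

Under that assumption, every operation that goes through the root vnode of the late-flag mount would still take the big lock, even though the file system itself is marked MP-safe.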

-- 
Mindaugas


Re: Proposed change to makesyscalls.sh

2018-08-06 Thread Rhialto
On Mon 06 Aug 2018 at 05:07:03 +0800, Paul Goyette wrote:
> Well, as I indicated before, it's not really essential to restore the
> original value.  If we blindly reset to sys_nomodule the only down-side
> is the attempt to find an auto-loadable module on subsequent calls for
> the "wrongly-tagged" syscall.

sys_nomodule() could maybe even put back sys_nosys in the table in that
case, for later callers, if it is certain that it can't be a module.
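
A toy sketch of that idea, with a four-entry table and a made-up `module_may_exist_sketch()` predicate (both assumptions; the real autoload lookup is not shown). When the lookup is certain to fail, the stub rewrites its own slot so later callers never reach it again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NSYSENT_SKETCH 4	/* hypothetical table size */

typedef int (*sy_call_t)(unsigned);

static int sys_nosys(unsigned);
static int sys_nomodule(unsigned);

/* Slots 1 and 2 start out tagged as "a module might provide this". */
static sy_call_t sysent_sketch[NSYSENT_SKETCH] = {
	sys_nosys, sys_nomodule, sys_nomodule, sys_nosys,
};

/* Counts how often the (expensive) module lookup was attempted. */
static unsigned autoload_attempts;

/* Hypothetical predicate: pretend only syscall 2 can ever be a module. */
static bool
module_may_exist_sketch(unsigned n)
{
	return n == 2;
}

static int
sys_nosys(unsigned n)
{
	(void)n;
	return ENOSYS;
}

static int
sys_nomodule(unsigned n)
{
	autoload_attempts++;
	/* Certain it can't be a module: put sys_nosys back in the table. */
	if (!module_may_exist_sketch(n))
		sysent_sketch[n] = sys_nosys;
	return ENOSYS;
}

int
do_syscall_sketch(unsigned n)
{
	assert(n < NSYSENT_SKETCH);
	return sysent_sketch[n](n);
}
```

Slot 1 pays for one failed lookup and is then demoted; slot 2 keeps its tag because a module could still show up.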

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- Wayland: Those who don't understand X
\X/ rhialto/at/falu.nl  -- are condemned to reinvent it. Poorly.




Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Rhialto
I happened to be rebuilding some packages on amd64/8.0 while I was
reading your article, so I thought I might as well measure something
myself.

My results are not directly comparable to yours. I build from local
rotating disk, not a ram disk. But I have 32 GB of RAM so there should
be a lot of caching.

From the caller list, I get the impression that the statistics include
(potentially) all system activity, not just work being done by/on behalf
of the specified program. Otherwise I can't explain the network related
names. All distfiles should have been there already (it was a rebuild of
existing packages but for 8.0).

NetBSD murthe.falu.nl 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

pkg_comp:default.conf# lockstat -L kernel_lock pkg_chk -ak
...
Elapsed time: 47954.90 seconds.

-- Kernel lock spin

Total%   Count   Time/ms Lock        Caller
------ ------- --------- ----------- ------------------------------
100.00  618791 375442.52 kernel_lock
 29.41   59522 110425.74 kernel_lock cdev_poll+72
 21.96   24626  82447.93 kernel_lock ip_slowtimo+1a
 20.04   23020  75230.65 kernel_lock frag6_slowtimo+1f
 10.09   26272  37891.06 kernel_lock frag6_fasttimo+1a
  7.50   58970  28150.68 kernel_lock VOP_POLL+93
  3.16   11940  11860.16 kernel_lock sleepq_block+1b5
  2.99   56296  11207.05 kernel_lock VOP_LOCK+b2
  1.69  104084   6338.54 kernel_lock bdev_strategy+7f
  0.88    3939   3311.60 kernel_lock fileassoc_file_delete+20
  0.72    5289   2700.97 kernel_lock VOP_IOCTL+65
  0.45     703   1680.33 kernel_lock nd6_timer_work+49
  0.40    3102   1500.71 kernel_lock VOP_LOOKUP+5e
  0.26   17808    967.32 kernel_lock VOP_UNLOCK+70
  0.14    6354    514.59 kernel_lock VOP_WRITE+61
  0.06     115    236.77 kernel_lock tcp_timer_keep+48
  0.04    7082    168.23 kernel_lock tcp_rcvd_wrapper+18
  0.04   85504    144.38 kernel_lock callout_softclock+2b0
  0.03    4775    103.15 kernel_lock kevent1+692
  0.02    3725     79.60 kernel_lock kqueue_register+419
  0.01   41908     56.07 kernel_lock intr_biglock_wrapper+16
  0.01   56054     50.86 kernel_lock biodone2+6d
  0.01      94     43.05 kernel_lock cdev_read+72
  0.01      23     40.65 kernel_lock udp_send_wrapper+2b
  0.01    6199     38.81 kernel_lock VOP_READ+61
  0.01      20     32.44 kernel_lock cdev_ioctl+88
  0.01     178     30.20 kernel_lock VOP_GETATTR+5e
  0.01    1050     27.36 kernel_lock knote_detach+134
  0.01      60     25.10 kernel_lock VFS_SYNC+65
  0.01      43     21.64 kernel_lock VOP_READDIR+6d
  0.01     635     19.13 kernel_lock tcp_send_wrapper+22
  0.00     101     16.98 kernel_lock cdev_open+be
  0.00       2     13.49 kernel_lock ip_drain+e
  0.00      23     11.60 kernel_lock udp6_send_wrapper+2b
  0.00    6764      9.57 kernel_lock softint_dispatch+dc
  0.00       9      9.09 kernel_lock tcp_timer_rexmt+52
  0.00       3      5.78 kernel_lock tcp_attach_wrapper+18
  0.00       2      5.37 kernel_lock mld_timeo+16
  0.00      21      5.10 kernel_lock tcp_detach_wrapper+16
  0.00      28      5.07 kernel_lock VOP_FCNTL+64
  0.00       1      5.03 kernel_lock tcp_drain+1d
  0.00      31      3.28 kernel_lock bdev_ioctl+88
  0.00     484      2.70 kernel_lock doifioctl+8f3
  0.00     107      1.22 kernel_lock VOP_ACCESS+5d
  0.00     809      0.91 kernel_lock ip6intr+1f
  0.00       1      0.65 kernel_lock rip6_send_wrapper+34
  0.00     101      0.47 kernel_lock udp6_detach_wrapper+14
  0.00      62      0.37 kernel_lock udp_attach_wrapper+16
  0.00      55      0.31 kernel_lock udp_detach_wrapper+16
  0.00     261      0.18 kernel_lock kqueue_register+25b
  0.00      37      0.13 kernel_lock udp6_attach_wrapper+1a
  0.00     144      0.13 kernel_lock ipintr+37
  0.00     216      0.11 kernel_lock VOP_SEEK+9c
  0.00      32      0.08 kernel_lock udp6_ctloutput_wrapper+1f
  0.00      12      0.03 kernel_lock udp_ctloutput_wrapper+1f
  0.00       5      0.02 kernel_lock tcp_ctloutput_wrapper+1f
  0.00      33      0.02 kernel_lock usb_transfer_complete+122
  0.00       6      0.02 kernel_lock tcp_peeraddr_wrapper+1b
  0.00      10      0.02 kernel_lock tcp_disconnect_wrapper+18
  0.00       5      0.02 kernel_lock udp6_bind_wrapper+1e
  0.00      10      0.01 kernel_lock

Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Martin Husemann  wrote:

> What driver is this?

xbd, this is a NetBSD-8.0/i386 Xen domU on top of a NetBSD-8.0/amd64
dom0 running on Xen 4.8.3.

In the dom0, the disk image is in a file in a FFSv2 filesystem on a
RAIDframe RAID 1, with two wd disks.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-06 Thread Martin Husemann
On Mon, Aug 06, 2018 at 08:37:56PM +0200, Emmanuel Dreyfus wrote:
> cpu0: Begin traceback...
> vpanic(c04da74f,dcbdfdd4,dcbdfe58,c010bb65,c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202)
>  at netbsd:vpanic+0x12d
> panic(c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202,fd893fff,8,c06ba580,ff491) 
> at netbsd:panic+0x18
> trap() at netbsd:trap+0xc75
> --- trap (number 6) ---
> dodumpsys(dcbdff48,0,104,c01091fa,8,0,fffb,104,c04f9e2c,dcbdff2c) at 
> netbsd:dodumpsys+0x39d
> dumpsys(104,0,c04f9e2c,dcbdff48,c5bf9950,c057fe00,dcc182cc,dcbdff3c,c03c4e48,c04f9e2c)
>  at netbsd:dumpsys+0x14
> vpanic(c04f9e2c,dcbdff48,dcbdff54,c03fdd2f,c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d)
>  at netbsd:vpanic+0x13b
> panic(c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d,0,0,dcc18004,dcbdff8c)
>  at netbsd:panic+0x18
> biodone2(0,0,dcc18004,dcbdff8c,c039e66d,0,dcbdff88,c52ae7e0,0,dcc18004) at 
> netbsd:biodone2+0xff
> biointr(0,dcbdff88,c52ae7e0,0,dcc18004,c039e610,c52ae7e0,0,c0102031,dcc18004) 
> at netbsd:biointr+0x2d
> softint_thread(dcc18004,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
> netbsd:softint_thread+0x5d
> cpu0: End traceback...

What driver is this?

Martin



Re: panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Jaromír Doleček  wrote:

> Can you give full backtrace?

Here it is. I wonder whether mounting without -o log would change things.

cpu0: Begin traceback...
vpanic(c04da74f,dcbdfdd4,dcbdfe58,c010bb65,c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202)
 at netbsd:vpanic+0x12d
panic(c04da74f,dcbdfe64,dcbdfe64,4,dcbde2c0,210202,fd893fff,8,c06ba580,ff491) 
at netbsd:panic+0x18
trap() at netbsd:trap+0xc75
--- trap (number 6) ---
dodumpsys(dcbdff48,0,104,c01091fa,8,0,fffb,104,c04f9e2c,dcbdff2c) at 
netbsd:dodumpsys+0x39d
dumpsys(104,0,c04f9e2c,dcbdff48,c5bf9950,c057fe00,dcc182cc,dcbdff3c,c03c4e48,c04f9e2c)
 at netbsd:dumpsys+0x14
vpanic(c04f9e2c,dcbdff48,dcbdff54,c03fdd2f,c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d)
 at netbsd:vpanic+0x13b
panic(c04f9e2c,c0114934,c5bf9950,c057fe00,dcbdff68,c03fdd5d,0,0,dcc18004,dcbdff8c)
 at netbsd:panic+0x18
biodone2(0,0,dcc18004,dcbdff8c,c039e66d,0,dcbdff88,c52ae7e0,0,dcc18004) at 
netbsd:biodone2+0xff
biointr(0,dcbdff88,c52ae7e0,0,dcc18004,c039e610,c52ae7e0,0,c0102031,dcc18004) 
at netbsd:biointr+0x2d
softint_thread(dcc18004,b6f000,c0595200,0,c0100075,0,0,0,0,0) at 
netbsd:softint_thread+0x5d
cpu0: End traceback...


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: panic: biodone2 already

2018-08-06 Thread Jaromír Doleček
This is always a bug: the driver processes the same buf twice. It can do
harm. If the buf is reused for some other I/O, the system can fail to store
data, or claim to have read data when it didn't.
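
A small user-space sketch of why simply ignoring the second completion is not safe. The `buf_sketch` structure and both functions are hypothetical stand-ins rather than the vfs_bio.c code; only the BO_DONE flag name is taken from the quoted source, and its value here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define BO_DONE 0x01	/* flag name from vfs_bio.c; value assumed */

struct buf_sketch {
	unsigned b_oflags;
	int	 b_io_id;	/* hypothetical: which I/O owns the buf */
};

/*
 * Completion handler that tolerates a double completion by returning
 * false instead of panicking.
 */
bool
biodone_sketch(struct buf_sketch *bp)
{
	if (bp->b_oflags & BO_DONE)
		return false;	/* same buf completed twice */
	bp->b_oflags |= BO_DONE;
	return true;
}

/* The buf is recycled for a new, still pending I/O. */
void
buf_reuse_sketch(struct buf_sketch *bp, int io_id)
{
	bp->b_oflags &= ~BO_DONE;
	bp->b_io_id = io_id;
}
```

If the driver's stray second completion arrives after the buf has been recycled, the check cannot see it: the new I/O is marked done although it never ran. That is the failure mode described above, and it is why the duplicate completion has to be fixed in the driver rather than hidden.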

Can you give full backtrace?

Jaromir

2018-08-06 17:56 GMT+02:00 Emmanuel Dreyfus :
> Hello
>
> I have a Xen domU that now frequently panics with "biodone2 already". I
> suspect it started after upgrading to NetBSD 8.0.
>
> The offending code is in src/sys/kern/vfs_bio.c (see below). I wonder if
> we could not just release the mutex and exit in that case. Could that do
> any harm?
>
> static void
> biodone2(buf_t *bp)
> {
>         void (*callout)(buf_t *);
>
>         SDT_PROBE1(io, kernel, ,done, bp);
>
>         BIOHIST_FUNC(__func__);
>         BIOHIST_CALLARGS(biohist, "bp=%#jx", (uintptr_t)bp, 0, 0, 0);
>
>         mutex_enter(bp->b_objlock);
>         /* Note that the transfer is done. */
>         if (ISSET(bp->b_oflags, BO_DONE))
>                 panic("biodone2 already");
>
> Cou
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org


Re: repeated panics in mutex_vector_enter (from unp_thread)

2018-08-06 Thread Brian Buhrow
Hello.  If you've not made any changes to the hardware or software and
things were stable, but now they're not, have you tried looking at the
hardware logs kept by the IPMI BMC board?  It could be there is a hardware
problem that's triggering this problem, rather than a specific software
bug.  Bad memory?  Failing cache controller?

-Brian

On Aug 6,  2:16pm, Edgar Fuß wrote:
} Subject: repeated panics in mutex_vector_enter (from unp_thread)
} For a few days now, I've been experiencing repeated panics in
} mutex_vector_enter. Nothing was changed on the server in question; it's
} probably experiencing more load/forks than before. The machine is still on
} 6.1, but I can't tell whether the problem is version specific.
} 
} The tracebacks look similar (third and fourth column in "show reg" output
} are from subsequent panics, missing values mean same as first one):
} 
} uvm_fault(0x8076d460, 0x0, 1) -> e
} fatal page fault in supervisor mode
} trap type 6 code 0 rip 8027663b cs 8 rflags 10286 cr2  8 cpl 0 rsp fe811d83bbc0
} kernel: page fault trap, code=0
} Stopped in pid 0.67 (system) at netbsd:mutex_vector_enter+0x32c:  movq 18(%rdx),%rax
} db{5}> bt
} mutex_vector_enter() at netbsd:mutex_vector_enter+0x32c
} unp_thread() at netbsd:unp_thread+0x2eb
} db{5}> show reg
} ds      64Y ?
} es      d2a06405?
} fs      269d563 563
} gs      0
} rdi     fe834689f040    fe83471f1700    fe83481f1700
} rsi     1000
} rbp     fe811d83bc20    fe811d811c20    fe811d811c20
} rbx     fe834689f040    fe83481f1700    fe83481f1700
} rdx     fff0
} rcx     fff0
} rax     fe811d7ed2a0
} r8      fe811d7ed2a0
} r9      0
} r10     0
} r11     2               1               1
} r12     0
} r13     0
} r14     fe811d7ed2a0
} r15     0
} rip     fff8027663b     mutex_vector_enter+0x32c
} cs      8
} rflags  10286
} rsp     fe811d83bbc0    fe811d811bc0    fe811d811bc0
} ss      0
} netbsd:mutex_vector_enter+0x32c:  movq 18(%rdx),%rax
} db{5}> reboot 4
} 
} The machine then hangs hard, I need to press reset.
>-- End of excerpt from Edgar Fuß




Re: repeated panics in mutex_vector_enter (from unp_thread)

2018-08-06 Thread Edgar Fuß
> If you've not made any changes to the hardware or software
Well, the machine may experience a greater fork rate than before.

> have you tried looking at the hardware logs kept by the IPMI BMC board?
There's nothing relevant in there besides when I disabled one of the PSUs 
to verify my monitoring would recognize that.

> Bad memory?
I only use ECC memory.

PS: The machine just panicked again.


Re: ddb input via IPMI virtual console

2018-08-06 Thread Brian Buhrow
hello.  Since you're using IPMI, how about using a serial console in
the kernel and then using ipmitool to talk to DDB when/if the machine goes
down?  You should be able to do this and still have a virtual VGA console,
but kernel messages won't go there.  This also has the advantage that you
can run script(1) and capture DDB output and/or panic messages for
posterity if you need them.

-Brian

On Aug 6,  2:27pm, Edgar Fuß wrote:
} Subject: ddb input via IPMI virtual console
} It looks like my IPMI implementation always emulates a USB keyboard on 
} the virtual console. The real keyboards are PS/2 and I can't change that 
} because it runs on a wire physically passing a /real/ firewall, i.e.
} a structural element of the building designed to confine a possible fire 
} in the server room. It's close to prohibitively expensive to install another 
} (USB) cable through that and I didn't think about it when I ordered power, 
} VGA and PS/2 cables to be routed through the firewall.
} 
} Can I have ddb input multiplexed from both PS/2 and USB?
>-- End of excerpt from Edgar Fuß




Re: ddb input via IPMI virtual console

2018-08-06 Thread Robert Swindells


Edgar Fuß 
>> Can I have ddb input multiplexed from both PS/2 and USB?
>To elaborate: I have

[snip]

>and during normal operation, the virtual console does work as expected.
>It's only when the machine panics that I can see on the virtual console what 
>I would see on the VGA monitor, I just can't enter ddb commands.

For a USB keyboard to work with DDB you would need to have a working USB
stack. If the kernel has panicked then this won't be true.


panic: biodone2 already

2018-08-06 Thread Emmanuel Dreyfus
Hello

I have a Xen domU that now frequently panics with "biodone2 already". I
suspect it started after upgrading to NetBSD 8.0.

The offending code is in src/sys/kern/vfs_bio.c (see below). I wonder if
we could not just release the mutex and exit in that case. Could that do
any harm?

static void
biodone2(buf_t *bp)
{
        void (*callout)(buf_t *);

        SDT_PROBE1(io, kernel, ,done, bp);

        BIOHIST_FUNC(__func__);
        BIOHIST_CALLARGS(biohist, "bp=%#jx", (uintptr_t)bp, 0, 0, 0);

        mutex_enter(bp->b_objlock);
        /* Note that the transfer is done. */
        if (ISSET(bp->b_oflags, BO_DONE))
                panic("biodone2 already");

Cou

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: ddb input via IPMI virtual console

2018-08-06 Thread Edgar Fuß
> Can I have ddb input multiplexed from both PS/2 and USB?
To elaborate: I have

uhidev0 at uhub5 port 2 configuration 1 interface 0
uhidev0: Winbond Electronics Corp Hermon USB hidmouse Device, rev 1.10/0.01, 
addr 2, iclass 3/1
ums0 at uhidev0: 3 buttons and Z dir
wsmouse0 at ums0 mux 0
uhidev1 at uhub5 port 2 configuration 1 interface 1
uhidev1: Winbond Electronics Corp Hermon USB hidmouse Device, rev 1.10/0.01, 
addr 2, iclass 3/1
ukbd0 at uhidev1
wskbd1 at ukbd0 mux 1
wskbd1: connecting to wsdisplay0

and during normal operation, the virtual console does work as expected.
It's only when the machine panics that I can see on the virtual console what 
I would see on the VGA monitor, I just can't enter ddb commands.


Re: ddb input via IPMI virtual console

2018-08-06 Thread Edgar Fuß
> IPMI is a serial console.
IPMI SOL is a serial console.
I was talking about the graphical virtual console.


Re: ddb input via IPMI virtual console

2018-08-06 Thread Michael van Elst
e...@math.uni-bonn.de (Edgar Fuß) writes:

>It looks like my IPMI implementation always emulates a USB keyboard on 
>the virtual console. The real keyboards are PS/2 and I can't change that 
>because it runs on a wire physically passing a /real/ firewall, i.e.
>a structural element of the building designed to confine a possible fire 
>in the server room. It's close to prohibitively expensive to install another 
>(USB) cable through that and I didn't think about it when I ordered power, 
>VGA and PS/2 cables to be routed through the firewall.

IPMI is a serial console. You could just tell the kernel to use it, and then
the console and DDB will be independent of the VGA graphics.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


ddb input via IPMI virtual console

2018-08-06 Thread Edgar Fuß
It looks like my IPMI implementation always emulates a USB keyboard on 
the virtual console. The real keyboards are PS/2 and I can't change that 
because it runs on a wire physically passing a /real/ firewall, i.e.
a structural element of the building designed to confine a possible fire 
in the server room. It's close to prohibitively expensive to install another 
(USB) cable through that and I didn't think about it when I ordered power, 
VGA and PS/2 cables to be routed through the firewall.

Can I have ddb input multiplexed from both PS/2 and USB?


Switching to first console on ddb entry

2018-08-06 Thread Edgar Fuß
Is there a way to make the console automatically switch to the first one 
(where ddb output goes) on ddb entry/panic?
On the one hand, it's often confusing not to see the green kernel output when 
looking at the screen; on the other hand, I'm able to see the VGA output 
remotely via IPMI but am currently unable to enter keystrokes, so I can't
"press" Ctrl-Alt-F1.


repeated panics in mutex_vector_enter (from unp_thread)

2018-08-06 Thread Edgar Fuß
For a few days now, I've been experiencing repeated panics in
mutex_vector_enter. Nothing was changed on the server in question; it's
probably experiencing more load/forks than before. The machine is still on
6.1, but I can't tell whether the problem is version specific.

The tracebacks look similar (third and fourth column in "show reg" output
are from subsequent panics, missing values mean same as first one):

uvm_fault(0x8076d460, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 8027663b cs 8 rflags 10286 cr2  8 cpl 0 rsp fe811d83bbc0
kernel: page fault trap, code=0
Stopped in pid 0.67 (system) at netbsd:mutex_vector_enter+0x32c:  movq 18(%rdx),%rax
db{5}> bt
mutex_vector_enter() at netbsd:mutex_vector_enter+0x32c
unp_thread() at netbsd:unp_thread+0x2eb
db{5}> show reg
ds      64Y ?
es      d2a06405?
fs      269d563 563
gs      0
rdi     fe834689f040    fe83471f1700    fe83481f1700
rsi     1000
rbp     fe811d83bc20    fe811d811c20    fe811d811c20
rbx     fe834689f040    fe83481f1700    fe83481f1700
rdx     fff0
rcx     fff0
rax     fe811d7ed2a0
r8      fe811d7ed2a0
r9      0
r10     0
r11     2               1               1
r12     0
r13     0
r14     fe811d7ed2a0
r15     0
rip     fff8027663b     mutex_vector_enter+0x32c
cs      8
rflags  10286
rsp     fe811d83bbc0    fe811d811bc0    fe811d811bc0
ss      0
netbsd:mutex_vector_enter+0x32c:  movq 18(%rdx),%rax
db{5}> reboot 4

The machine then hangs hard, I need to press reset.


Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Martin Husemann
So here is a more detailed analysis using flamegraphs:

https://netbsd.org/~martin/bc-perf/


All operations happen on tmpfs, and the locking there is a significant
part of the game - however, lots of mostly idle time is wasted and it is
not clear to me what is going on.

Initial tests hint at -current having the same issue.

Any ideas?

Martin


Re: Debugging DRM on a laptop?

2018-08-06 Thread Paul Goyette

On Mon, 6 Aug 2018, co...@sdf.org wrote:


I'm gonna try to add a watchdog driver, ichlpcib doesn't work out of the
box on the "Sunrise Point-H LPC controller" or "HM170 chipset LPC" but
it has 1771 pages worth of documents so maybe I can get it to work.


Sometimes, too much documentation is worse than not enough...  :)




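+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+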


Re: Debugging DRM on a laptop?

2018-08-06 Thread coypu
On Mon, Aug 06, 2018 at 11:02:45AM +1000, matthew green wrote:
> co...@sdf.org writes:
> > Hi folks,
> > 
> > we're working on a drmkms update.
> > I'm testing it on a Dell XPS 9550. It's a laptop with a Skylake CPU.
> > It has a power button. The power button powers off, not reboot.
> > This means the dmesg buffer gets wiped.
> > No serial console as far as I know.
> > 
> > Now it reaches a point where the screen changes, but my DDB_COMMANDENTER
> > containing "reboot" won't do the expected thing. It's possible it's
> > hanging. I can't seem to blindly enter ddb.
> > 
> > Any suggestions to make debugging less painful?
> > Something like arm a watchdog to panic after some time.
> > 
> > I'm gonna experiment with sprinkling panic("XXX"); and see if this works
> > out.
> 
> if you have ddb.onpanic=0 then it will write a crash dump and
> reboot.  that should give you something useful.

It's not reaching a panic. And I'm not sure it's reliable about where it
hangs (if at all).

This is after cndetach so maybe that explains why my keyboard doesn't
work to blindly enter ddb.

I'm gonna try to add a watchdog driver, ichlpcib doesn't work out of the
box on the "Sunrise Point-H LPC controller" or "HM170 chipset LPC" but
it has 1771 pages worth of documents so maybe I can get it to work.