Re: Kernel panic (possibly inteldrm related)

2017-07-29 Thread Mark Kettenis
> Date: Thu, 27 Jul 2017 18:18:34 +0200 (CEST)
> From: Mark Kettenis 
> 
> > The only "leak" I'm seeing is the 'drmreq' pool.  It grows until the
> > application is closed.  Note that with my fix the allocated size for
> > 'drmreq' is divided by 4.  So if that was the problem I might not be
> > able to reproduce it.
> 
> That might be it.  The pool item size was 584 bytes.  Because of the
> "size * 8" in the pool implementation we end up using "large" pool
> pages.  Since the pool doesn't have the PR_WAITOK flag, this ends up
> using the "interrupt safe" allocator, which allocates its VAs from
> kmem_map.
> 
> After the pool_init() fix, it'll now use "small" pool pages, which are
> directly mapped.  So if the problem disappears, we have a winner.  I'll
> take a look anyway.  The requests allocated shouldn't grow without
> bound.  At least I expect it to be roughly the same number as the
> number of graphics execution requests in flight, which shouldn't be
> more than a couple per process.

I found no evidence of a real memory leak, so the kmem_map exhaustion
probably happened during some heavy-duty rendering.  I suppose the
issue is properly fixed now.



Re: Kernel panic (possibly inteldrm related)

2017-07-28 Thread Laurence Tratt
On Thu, Jul 27, 2017 at 05:59:00PM +0200, Martin Pieuchot wrote:

Hello Martin,

> The only "leak" I'm seeing is the 'drmreq' pool.  It grows until the
> application is closed.  Note that with my fix the allocated size for
> 'drmreq' is divided by 4.  So if that was the problem I might not be able
> to reproduce it.

I've been running with this commit (admittedly with slightly lighter than
normal use) since yesterday evening and haven't experienced a crash yet, so
I'm hopeful this has fixed it.


Laurie
-- 
Personal http://tratt.net/laurie/
Software Development Team http://soft-dev.org/
   https://github.com/ltratt  http://twitter.com/laurencetratt



Re: Kernel panic (possibly inteldrm related)

2017-07-27 Thread Mark Kettenis
> Date: Thu, 27 Jul 2017 17:59:00 +0200
> From: Martin Pieuchot 
> 
> On 26/07/17(Wed) 13:13, Mark Kettenis wrote:
> > > Date: Wed, 26 Jul 2017 12:11:31 +0200
> > > From: Martin Pieuchot 
> > > 
> > > On 24/07/17(Mon) 23:41, Laurence Tratt wrote:
> > > > On Sun, Jul 23, 2017 at 11:32:06PM +0100, Laurence Tratt wrote:
> > > > 
> > > > > extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be causing
> > > > > the final panic, but given that it's just in a "wake every 60 seconds
> > > > > and see if new files have appeared in a directory" loop, I'm not sure
> > > > > why.
> > > > 
> > > > I've now triggered another crash, this time without extsmaild (or Iridium)
> > > > running. The trace is here:
> > > > 
> > > >   https://imagebin.ca/v/3UWOneXfuSWQ
> > > > 
> > > > The "culprit" process is now mutt, but the panic is still "out of space in
> > > > kmem_map" and the trace seems to be in ufs_readdir.
> > > 
> > > I have seen the same panic message while watching a movie fullscreen
> > > with mplayer yesterday.
> > > 
> > > However, as soon as CPU0 tried to enter DDB, after typing mach ddbcpu 0,
> > > the machine froze.
> > 
> > Sounds like something is leaking memory.  I don't really see any
> > evidence of this on my systems.  The main consumer of kmem_map "space"
> > (on amd64) is malloc(9).  Does vmstat -m give any clues about what is
> > consuming/leaking memory?
> 
> The only "leak" I'm seeing is the 'drmreq' pool.  It grows until the
> application is closed.  Note that with my fix the allocated size for
> 'drmreq' is divided by 4.  So if that was the problem I might not be
> able to reproduce it.

That might be it.  The pool item size was 584 bytes.  Because of the
"size * 8" in the pool implementation we end up using "large" pool
pages.  Since the pool doesn't have the PR_WAITOK flag, this ends up
using the "interrupt safe" allocator, which allocates its VAs from
kmem_map.

After the pool_init() fix, it'll now use "small" pool pages, which are
directly mapped.  So if the problem disappears, we have a winner.  I'll
take a look anyway.  The requests allocated shouldn't grow without
bound.  At least I expect it to be roughly the same number as the
number of graphics execution requests in flight, which shouldn't be
more than a couple per process.

Cheers,

Mark



Re: Kernel panic (possibly inteldrm related)

2017-07-27 Thread Martin Pieuchot
On 26/07/17(Wed) 13:13, Mark Kettenis wrote:
> > Date: Wed, 26 Jul 2017 12:11:31 +0200
> > From: Martin Pieuchot 
> > 
> > On 24/07/17(Mon) 23:41, Laurence Tratt wrote:
> > > On Sun, Jul 23, 2017 at 11:32:06PM +0100, Laurence Tratt wrote:
> > > 
> > > > extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be 
> > > > causing
> > > > the final panic, but given that it's just in a "wake every 60 
> > > > seconds
> > > > and see if new files have appeared in a directory" loop, I'm not 
> > > > sure
> > > > why.
> > > 
> > > I've now triggered another crash, this time without extsmaild (or Iridium)
> > > running. The trace is here:
> > > 
> > >   https://imagebin.ca/v/3UWOneXfuSWQ
> > > 
> > > The "culprit" process is now mutt, but the panic is still "out of space in
> > > kmem_map" and the trace seems to be in ufs_readdir.
> > 
> > I have seen the same panic message while watching a movie fullscreen
> > with mplayer yesterday.
> > 
> > However, as soon as CPU0 tried to enter DDB, after typing mach ddbcpu 0,
> > the machine froze.
> 
> Sounds like something is leaking memory.  I don't really see any
> evidence of this on my systems.  The main consumer of kmem_map "space"
> (on amd64) is malloc(9).  Does vmstat -m give any clues about what is
> consuming/leaking memory?

The only "leak" I'm seeing is the 'drmreq' pool.  It grows until the
application is closed.  Note that with my fix the allocated size for
'drmreq' is divided by 4.  So if that was the problem I might not be
able to reproduce it.



Re: Kernel panic (possibly inteldrm related)

2017-07-24 Thread Laurence Tratt
On Sun, Jul 23, 2017 at 11:32:06PM +0100, Laurence Tratt wrote:

> extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be causing
> the final panic, but given that it's just in a "wake every 60 seconds
> and see if new files have appeared in a directory" loop, I'm not sure
> why.

I've now triggered another crash, this time without extsmaild (or Iridium)
running. The trace is here:

  https://imagebin.ca/v/3UWOneXfuSWQ

The "culprit" process is now mutt, but the panic is still "out of space in
kmem_map" and the trace seems to be in ufs_readdir.


Laurie



Kernel panic (possibly inteldrm related)

2017-07-23 Thread Laurence Tratt
>Synopsis:      Kernel panic (possibly inteldrm related)
>Category:  kernel
>Environment:
System  : OpenBSD 6.1
Details : OpenBSD 6.1-current (GENERIC.MP) #0: Sun Jul 23 11:17:14 BST 2017
          ltr...@phase.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Since the inteldrm update on both my desktop (a Skylake machine) and
laptop (X1 Carbon 3rd gen) I have experienced random kernel panics.
I've now had a ddb trace from both machines (both panic with "malloc:
out of space in kmem_map"). The first ddb (from the desktop) is here
(from a kernel a few days old; limited information as my keyboard didn't
work at the ddb prompt):

  https://imagebin.ca/v/3UPGaXO2uK54

The second (from the laptop with snapshot from yesterday and a kernel
built today) is here:

  https://imagebin.ca/v/3UPI4KUtloXi

and then various output from ddb (tar file with several JPEGs inside):

  https://www.dropbox.com/s/xuhzpmftvz9vshj/ddb_output.tar?dl=0

extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be causing
the final panic, but given that it's just in a "wake every 60 seconds
and see if new files have appeared in a directory" loop, I'm not sure
why. I have also tried killing it, and still experienced at least 1 or 2
panics (albeit not ones that have ended up in ddb), so I suspect
extsmaild is a symptom but not the cause. Interestingly, if I "boot -c"
and "disable inteldrm" the panics go away on my desktop (I haven't yet
tried this on my laptop).

The dmesg below is from my laptop with a snapshot from yesterday and a
kernel built today.
>How-To-Repeat:
Happens intermittently (generally within an hour of light-to-medium
usage).
>Fix:
Unknown.

dmesg:
OpenBSD 6.1-current (GENERIC.MP) #0: Sun Jul 23 11:17:14 BST 2017
ltr...@phase.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
RTC BIOS diagnostic error 80
real mem = 8238284800 (7856MB)
avail mem = 7982817280 (7613MB)
User Kernel Config
UKC> quit
Continuing...
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xccbfd000 (66 entries)
bios0: vendor LENOVO version "N14ET35W (1.13 )" date 04/07/2016
bios0: LENOVO 20BTS05Q00
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT UEFI MSDM BATB FPDT UEFI DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpiec0 at acpi0
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.44 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: TSC frequency 2594442560 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,A