Re: Kernel panic (possibly inteldrm related)
> Date: Thu, 27 Jul 2017 18:18:34 +0200 (CEST)
> From: Mark Kettenis
>
> > The only "leak" I'm seeing is the 'drmreq' pool. It grows until the
> > application is closed. Note that with my fix the allocated size for
> > 'drmreq' is divided by 4. So if that was the problem I might not be
> > able to reproduce it.
>
> That might be it. The pool item size was 584 bytes. Because of the
> "size * 8" in the pool implementation we end up using "large" pool
> pages. Since the pool doesn't have the PR_WAITOK flag this ends up
> using the "interrupt safe" allocator which allocates its VAs from
> kmem_map.
>
> After the pool_init() fix, it'll now use "small" pool pages, which are
> directly mapped. So if the problem disappears we have a winner. I'll
> take a look anyway. The requests allocated shouldn't grow without
> bound. At least I expect it to be roughly the same number as the
> number of graphics execution requests in flight, which shouldn't be
> more than a couple per process.

I found no evidence for a real memory leak. So the kmem_map exhaustion
probably happened during some heavy-duty rendering. So I suppose the
issue is properly fixed now.
Re: Kernel panic (possibly inteldrm related)
On Thu, Jul 27, 2017 at 05:59:00PM +0200, Martin Pieuchot wrote:

Hello Martin,

> The only "leak" I'm seeing is the 'drmreq' pool. It grows until the
> application is closed. Note that with my fix the allocated size for
> 'drmreq' is divided by 4. So if that was the problem I might not be able
> to reproduce it.

I've been running with this commit (admittedly with slightly lighter than
normal use) since yesterday evening and haven't experienced a crash yet, so
I'm hopeful this has fixed it.

Laurie
-- 
Personal                     http://tratt.net/laurie/
Software Development Team    http://soft-dev.org/
https://github.com/ltratt    http://twitter.com/laurencetratt
Re: Kernel panic (possibly inteldrm related)
> Date: Thu, 27 Jul 2017 17:59:00 +0200
> From: Martin Pieuchot
>
> On 26/07/17(Wed) 13:13, Mark Kettenis wrote:
> > > Date: Wed, 26 Jul 2017 12:11:31 +0200
> > > From: Martin Pieuchot
> > >
> > > On 24/07/17(Mon) 23:41, Laurence Tratt wrote:
> > > > On Sun, Jul 23, 2017 at 11:32:06PM +0100, Laurence Tratt wrote:
> > > >
> > > > > extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be
> > > > > causing the final panic, but given that it's just in a "wake every
> > > > > 60 seconds and see if new files have appeared in a directory" loop,
> > > > > I'm not sure why.
> > > >
> > > > I've now triggered another crash, this time without extsmaild (or
> > > > Iridium) running. The trace is here:
> > > >
> > > > https://imagebin.ca/v/3UWOneXfuSWQ
> > > >
> > > > The "culprit" process is now mutt, but the panic is still "out of
> > > > space in kmem_map" and the trace seems to be in ufs_readdir.
> > >
> > > I have seen the same panic message while watching a movie fullscreen
> > > with mplayer yesterday.
> > >
> > > However as soon as CPU0 tried to enter DDB, after typing mach ddbcpu 0,
> > > the machine froze.
> >
> > Sounds like something is leaking memory. I don't really see any
> > evidence of this on my systems. The main consumer of kmem_map "space"
> > (on amd64) is malloc(9). Does vmstat -m give any clues about what is
> > consuming/leaking memory?
>
> The only "leak" I'm seeing is the 'drmreq' pool. It grows until the
> application is closed. Note that with my fix the allocated size for
> 'drmreq' is divided by 4. So if that was the problem I might not be
> able to reproduce it.

That might be it. The pool item size was 584 bytes. Because of the
"size * 8" in the pool implementation we end up using "large" pool
pages. Since the pool doesn't have the PR_WAITOK flag this ends up
using the "interrupt safe" allocator which allocates its VAs from
kmem_map.

After the pool_init() fix, it'll now use "small" pool pages, which are
directly mapped. So if the problem disappears we have a winner. I'll
take a look anyway. The requests allocated shouldn't grow without
bound. At least I expect it to be roughly the same number as the
number of graphics execution requests in flight, which shouldn't be
more than a couple per process.

Cheers,

Mark
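[Editorial illustration, not part of the original message.] The "size * 8"
arithmetic above can be sketched in a few lines of C. This is a simplified
model of the rule as Mark describes it, not the OpenBSD pool source. The
function names are hypothetical, the 4096-byte base page size is an
assumption for amd64, and the 146-byte figure used in the note below is only
one possible reading of "divided by 4" (584 / 4).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative model only, NOT the kernel implementation.
 * Start from the base page size and double it until a pool page can
 * hold at least eight items (the "size * 8" rule from the message).
 */
size_t
pool_page_size(size_t item_size, size_t base_page_size)
{
	size_t pgsize = base_page_size;

	assert(item_size > 0 && base_page_size > 0);
	while (item_size * 8 > pgsize)
		pgsize <<= 1;
	return pgsize;
}

/*
 * Returns 1 if the pool would need "large" (multi-page) pool pages,
 * 0 if a single base page is enough.
 */
int
pool_uses_large_pages(size_t item_size, size_t base_page_size)
{
	return pool_page_size(item_size, base_page_size) > base_page_size;
}
```

Under these assumptions, a 584-byte item needs 584 * 8 = 4672 bytes, which
exceeds 4096, so pool_page_size(584, 4096) returns 8192 and the pool falls
into the "large" page case. A hypothetical 146-byte item needs only 1168
bytes, so it stays on a single 4096-byte page.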
Re: Kernel panic (possibly inteldrm related)
On 26/07/17(Wed) 13:13, Mark Kettenis wrote:
> > Date: Wed, 26 Jul 2017 12:11:31 +0200
> > From: Martin Pieuchot
> >
> > On 24/07/17(Mon) 23:41, Laurence Tratt wrote:
> > > On Sun, Jul 23, 2017 at 11:32:06PM +0100, Laurence Tratt wrote:
> > >
> > > > extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be
> > > > causing the final panic, but given that it's just in a "wake every
> > > > 60 seconds and see if new files have appeared in a directory" loop,
> > > > I'm not sure why.
> > >
> > > I've now triggered another crash, this time without extsmaild (or Iridium)
> > > running. The trace is here:
> > >
> > > https://imagebin.ca/v/3UWOneXfuSWQ
> > >
> > > The "culprit" process is now mutt, but the panic is still "out of space in
> > > kmem_map" and the trace seems to be in ufs_readdir.
> >
> > I have seen the same panic message while watching a movie fullscreen
> > with mplayer yesterday.
> >
> > However as soon as CPU0 tried to enter DDB, after typing mach ddbcpu 0,
> > the machine froze.
>
> Sounds like something is leaking memory. I don't really see any
> evidence of this on my systems. The main consumer of kmem_map "space"
> (on amd64) is malloc(9). Does vmstat -m give any clues about what is
> consuming/leaking memory?

The only "leak" I'm seeing is the 'drmreq' pool. It grows until the
application is closed. Note that with my fix the allocated size for
'drmreq' is divided by 4. So if that was the problem I might not be able
to reproduce it.
Re: Kernel panic (possibly inteldrm related)
On Sun, Jul 23, 2017 at 11:32:06PM +0100, Laurence Tratt wrote:

> extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be causing
> the final panic, but given that it's just in a "wake every 60 seconds
> and see if new files have appeared in a directory" loop, I'm not sure
> why.

I've now triggered another crash, this time without extsmaild (or Iridium)
running. The trace is here:

https://imagebin.ca/v/3UWOneXfuSWQ

The "culprit" process is now mutt, but the panic is still "out of space in
kmem_map" and the trace seems to be in ufs_readdir.

Laurie
-- 
Personal                     http://tratt.net/laurie/
Software Development Team    http://soft-dev.org/
https://github.com/ltratt    http://twitter.com/laurencetratt
Kernel panic (possibly inteldrm related)
>Synopsis:	Kernel panic (possibly inteldrm related)
>Category:	kernel
>Environment:
	System      : OpenBSD 6.1
	Details     : OpenBSD 6.1-current (GENERIC.MP) #0: Sun Jul 23 11:17:14 BST 2017
		      ltr...@phase.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP

	Architecture: OpenBSD.amd64
	Machine     : amd64
>Description:
	Since the inteldrm update on both my desktop (a Skylake machine) and
	laptop (X1 Carbon 3rd gen) I have experienced random kernel panics.
	I've now had a ddb trace from both machines (both panic with "malloc:
	out of space in kmem_map").

	The first ddb (from the desktop) is here (from a kernel a few days
	old; limited information as my keyboard didn't work at the ddb
	prompt):

	https://imagebin.ca/v/3UPGaXO2uK54

	The second (from the laptop with snapshot from yesterday and a kernel
	built today) is here:

	https://imagebin.ca/v/3UPI4KUtloXi

	and then various output from ddb (tar file with several JPEGs inside):

	https://www.dropbox.com/s/xuhzpmftvz9vshj/ddb_output.tar?dl=0

	extsmaild (http://tratt.net/laurie/src/extsmail/) appears to be
	causing the final panic, but given that it's just in a "wake every 60
	seconds and see if new files have appeared in a directory" loop, I'm
	not sure why. I have also tried killing it, and still experienced at
	least 1 or 2 panics (albeit not ones that have ended up in ddb), so I
	suspect extsmaild is a symptom but not the cause.

	Interestingly, if I "boot -c" and "disable inteldrm" the panics go
	away on my desktop (I haven't yet tried this on my laptop).

	The dmesg below is from my laptop with a snapshot from yesterday and a
	kernel built today.
>How-To-Repeat:
	Happens intermittently (generally within an hour of light-to-medium
	usage).
>Fix:
	Unknown.

dmesg:
OpenBSD 6.1-current (GENERIC.MP) #0: Sun Jul 23 11:17:14 BST 2017
    ltr...@phase.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
RTC BIOS diagnostic error 80
real mem = 8238284800 (7856MB)
avail mem = 7982817280 (7613MB)
User Kernel Config
UKC> quit
Continuing...
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xccbfd000 (66 entries)
bios0: vendor LENOVO version "N14ET35W (1.13 )" date 04/07/2016
bios0: LENOVO 20BTS05Q00
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT UEFI MSDM BATB FPDT UEFI DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpiec0 at acpi0
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.44 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: TSC frequency 2594442560 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2594.00 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,A