[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #12 from Josh Gitlin  ---
(In reply to Andriy Gapon from comment #10)

> it's quite possible that ARC contributes to the problem but
> there is a bug in kmem_back / kmem_malloc.

This is what I felt as well when reading the source. I didn't see any specific
out of memory error, but rather a page fault which (to my untrained eye) looked
like the kernel trying to access a KVA page that did not exist. But I was very
unsure of my theory that it was a bug as opposed to a misconfiguration.

What I found odd was that we had crashes on production systems where the config
in place hadn't changed in years...

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #11 from rai...@ultra-secure.de ---
Hi,

I compiled a kernel myself with 
make buildkernel && make installkernel

I had thought the debug kernel lived next to the kernel in /boot/kernel...


(ewserv-log03-prod ) 0 # uname -a
FreeBSD ewserv-log03-prod.everyware.zone 11.2-RELEASE-p4 FreeBSD
11.2-RELEASE-p4 #0: Fri Sep 28 16:37:02 CEST 2018
r...@ewserv-log03-prod.everyware.zone:/usr/obj/usr/src/sys/GENERIC  amd64
(ewserv-log03-prod ) 0 # ll /usr/lib/debug/boot/kernel/kernel.debug 
-r-xr-xr-x  1 root  wheel  86179448 Sep 28 16:37
/usr/lib/debug/boot/kernel/kernel.debug
(ewserv-log03-prod ) 0 # ll /boot/kernel/kernel 
-r-xr-xr-x  1 root  wheel  27781528 Sep 28 16:37 /boot/kernel/kernel

I built it myself because I wasn't sure whether the default kernel package
contains a kernel with debug symbols.


What is the correct way to get a kernel with debug-symbols?

I can reboot and run my tests again without the ARC reduction, to make sure
this is the kernel that is producing the crashdump. It needed less than an hour
to lock up.

We would like to get this server back into production, but for now I can do
whatever is necessary to solve this problem (apart from allowing direct logins
- I'd have to wipe it)



[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #10 from Andriy Gapon  ---
(In reply to rainer from comment #9)
The problem might be similar but it is certainly different.
In the other bug they are getting a panic (unfortunately the panic message is
not shown), while you are getting a fatal trap / page fault.

Also, in your case there are no ARC calls in the stack trace.  It's straight
from the ZIO code to the VM code.  So, it's quite possible that ARC contributes
to the problem (e.g., by creating a memory pressure or some such), but there is
a bug in kmem_back / kmem_malloc.

Finally, in comment #3 the stack trace recorded by ddb and the stack trace
shown by kgdb do not match.  I suspect that is because you passed the wrong
kernel to kgdb or /usr/lib/debug/boot/kernel does not match /boot/kernel.
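
One way to rule out the mismatch described above is to hand kgdb the debug
image that belongs to the installed kernel. The sketch below is only an
illustration: debug_image is a hypothetical helper, and it assumes the
standalone debug files live under /usr/lib/debug, mirroring the paths shown
in comment #11.

```sh
#!/bin/sh
# debug_image KERNEL [DEBUG_ROOT]
# Hypothetical helper: print the image that kgdb should be given.
# Assumes debug files are stored as DEBUG_ROOT + KERNEL + ".debug".
debug_image() {
    kernel=$1
    debug_root=${2:-/usr/lib/debug}
    candidate="${debug_root}${kernel}.debug"
    if [ -f "$candidate" ]; then
        # A matching debug file exists, so prefer it.
        echo "$candidate"
    else
        # Fall back to the plain kernel image.
        echo "$kernel"
    fi
}
```

Something like `kgdb "$(debug_image /boot/kernel/kernel)" /var/crash/vmcore.1`
would then load the debug image when it exists. Comparing the timestamps of
both files first is a cheap sanity check.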



[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #9 from rai...@ultra-secure.de ---
Hi,

Firmware revision is 1.60 (from HPE website).

But it seems it is an ARC problem that just did not materialize on my other
servers because ARC was limited there already, but is actually pretty
widespread.

Also, one of the first panics we got had the driver-name in the backtrace
somewhere - but that was on the old firmware.

I was notified of this PR privately:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231794

which seems to describe a similar problem.



[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Deepak Ukey  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |deepak.u...@microsemi.com

--- Comment #8 from Deepak Ukey  ---
Hi,

Can you please tell me how to reproduce the issue, or which steps cause this
panic?

Also, can you please tell me which firmware version you are using for the
E208i-p SR Gen10 / P408i-a SR Gen10 cards, so that I can try to reproduce this
on my setup and help you resolve it?

Thanks.



[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #7 from rai...@ultra-secure.de ---
At least, it ran through the night.



[Bug 231296] smartpqi - kernel panics

2018-10-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #6 from rai...@ultra-secure.de ---
OK.
This is a setting that I have in my sysctl.conf.local but commented out by
default (because not all hosts use ZFS and I somehow thought that it's only
needed on hosts that do other stuff).

I stumbled upon this PR[1], too, a while ago and I have adjusted it on my ZFS
hosts.

Just not on this one because this one isn't supposed to run much else - other
hosts run mysql and/or apache+php+nginx etc.


[1]
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229764
or rather, I took my settings from this one:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=163461
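
For readers following along, a setting of this kind might look like the
sketch below. It is an assumption that the tunable in question is
vfs.zfs.arc_max (the comment does not name it), and the 8 GiB value is purely
hypothetical. gib_to_bytes is a helper invented for the example; it only
prints a sysctl.conf-style line and changes nothing on the system.

```sh
#!/bin/sh
# gib_to_bytes N: convert N GiB to bytes, since the ARC limit is given in bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical cap of 8 GiB; pick a value that suits the host.
limit=$(gib_to_bytes 8)

# Print the line instead of applying it (assumed tunable name).
echo "vfs.zfs.arc_max=${limit}"
```

The printed line could then be reviewed and placed in the local sysctl
configuration by hand.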



[Bug 231296] smartpqi - kernel panics

2018-10-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Josh Gitlin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jgitlin+freebsd@goboomtown.com

--- Comment #5 from Josh Gitlin  ---
I have experienced nearly the same issue, and requested help from the
freebsd-fs list as I thought it might have been related to a kernel change or
misconfiguration (even though the config we were using had not changed)

See: https://lists.freebsd.org/pipermail/freebsd-fs/2018-September/026725.html

The panic stack trace we saw was exactly the same; it happened under ZFS load
(but not unusually high load, not higher than we've seen in production before).



[Bug 231296] smartpqi - kernel panics

2018-10-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #4 from rai...@ultra-secure.de ---
BTW: I've been running memtest86 v7.5 (the free edition of the commercial
version that does UEFI) on this machine for 8 hours and it showed no errors.



[Bug 231296] smartpqi - kernel panics

2018-10-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #3 from rai...@ultra-secure.de ---
After updating the firmware, I still get panics.

The handbook should be clearer about the fact that you can't get a crashdump
from ZFS.

After adding an additional swap partition on a USB drive, I got this
crash-dump:

(ewserv-log03-prod ) 0 # kgdb /boot/kernel/kernel /var/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x5a
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80dff90d
stack pointer   = 0x28:0xfe084ed93f00
frame pointer   = 0x28:0xfe084ed93f40
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_10)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0x80b3d567 at kdb_backtrace+0x67
#1 0x80af6b07 at vpanic+0x177
#2 0x80af6983 at panic+0x43
#3 0x80f77fcf at trap_fatal+0x35f
#4 0x80f78029 at trap_pfault+0x49
#5 0x80f777f7 at trap+0x2c7
#6 0x80f57dac at calltrap+0x8
#7 0x80dee7e2 at kmem_back+0xf2
#8 0x80dee6c0 at kmem_malloc+0x60
#9 0x80de6172 at keg_alloc_slab+0xe2
#10 0x80de8b7e at keg_fetch_slab+0x14e
#11 0x80de83b4 at zone_fetch_slab+0x64
#12 0x80de848f at zone_import+0x3f
#13 0x80de4b99 at uma_zalloc_arg+0x3d9
#14 0x82351ab2 at zio_write_compress+0x1e2
#15 0x8235074c at zio_execute+0xac
#16 0x80b4ed74 at taskqueue_run_locked+0x154
#17 0x80b4fed8 at taskqueue_thread_loop+0x98
Uptime: 40m34s
Dumping 5489 out of 32379 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from
/usr/lib/debug//boot/kernel/geom_mirror.ko.debug...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/accf_data.ko...Reading symbols from
/usr/lib/debug//boot/kernel/accf_data.ko.debug...done.
done.
Loaded symbols for /boot/kernel/accf_data.ko
Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from
/usr/lib/debug//boot/kernel/accf_http.ko.debug...done.
done.
Loaded symbols for /boot/kernel/accf_http.ko
Reading symbols from /boot/kernel/cc_htcp.ko...Reading symbols from
/usr/lib/debug//boot/kernel/cc_htcp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/cc_htcp.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from
/usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from
/usr/lib/debug//boot/kernel/tmpfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/tmpfs.ko
#0  0x80af68fb in doadump (textdump=0) at
/usr/src/sys/kern/kern_shutdown.c:309
309 if (dumping)
(kgdb) bt
#0  0x80af68fb in doadump (textdump=0) at
/usr/src/sys/kern/kern_shutdown.c:309
#1  0x80af6925 in doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:315
#2  0x80af671b in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:382
#3  0x80af6b41 in vpanic (fmt=,
ap=0xfe084ed93c50) at /usr/src/sys/kern/kern_shutdown.c:769
#4  0x80af6983 in panic (fmt=0x0) at
/usr/src/sys/kern/kern_shutdown.c:706
#5  0x80f77fcf in trap_fatal (frame=0xfe084ed93e40, eva=90) at
/usr/src/sys/amd64/amd64/trap.c:875
#6  0x80f78029 in trap_pfault (frame=0xfe084ed93e40, usermode=0) at
/usr/src/sys/amd64/amd64/trap.c:712
#7  0x80f777f7 in trap (frame=0xfe084ed93e40) at
/usr/src/sys/amd64/amd64/trap.c:514
#8  0x80f57dac in Xtss_pti () at
/usr/src/sys/amd64/amd64/exception.S:159
#9  0x80dff90d in vm_page_rename (m=0x3ff,
new_object=0xf80018d8d000, new_pindex=) at
/usr/src/sys/vm/vm_page.c:1342
#10 0x80dee7e2 in kmem_suballoc (parent=0x262, min=0x14000,
max=0x81ebc558, size=874980, superpage_align=) at
/usr/src/sys/vm/vm_kern.c:290
#11 

[Bug 231296] smartpqi - kernel panics

2018-09-11 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Mark Linimon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |panic



[Bug 231296] smartpqi - kernel panics

2018-09-11 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

--- Comment #2 from rai...@ultra-secure.de ---
You are right, there is an update on HPE's website.

Unfortunately, it's not yet part of an SPP.
So I'll have to figure out a way to install it.

Thanks a lot.



[Bug 231296] smartpqi - kernel panics

2018-09-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Yuri Pankov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yur...@yuripv.net

--- Comment #1 from Yuri Pankov  ---
Just for the record (I have no idea if it's related or if there's a relevant
firmware update from HPE): the 1.34 firmware you seem to be running was
unstable for me as well with a Microsemi HBA 1100-8i; updating to 1.60 from
the Microsemi site solved it.



[Bug 231296] smartpqi - kernel panics

2018-09-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231296

Bug ID: 231296
   Summary: smartpqi - kernel panics
   Product: Base System
   Version: 11.2-RELEASE
  Hardware: amd64
OS: Any
Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: b...@freebsd.org
  Reporter: rai...@ultra-secure.de

Created attachment 197020
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197020&action=edit
pic of kernel panic

Hi,

This is an HPE DL380 Gen10 system.

smartpqi0:  port 0x4000-0x40ff mem 0xe280-0xe2807fff at
device 0.0 numa-domain 0 on pci4
smartpqi0: using MSI-X interrupts (16 vectors)
smartpqi1:  port 0xc000-0xc0ff mem 0xf380-0xf3807fff at
device 0.0 numa-domain 0 on pci9
smartpqi1: using MSI-X interrupts (16 vectors)


(server ) 0 # camcontrol devlist
at scbus0 target 64 lun 0 (pass0,da0)
at scbus0 target 66 lun 0 (pass1,da1)
   at scbus0 target 187 lun 0 (pass2,ses0)
at scbus0 target 1088 lun 0 (pass3)
at scbus1 target 64 lun 0 (pass4,da2)
at scbus1 target 65 lun 0 (pass5,da3)
at scbus1 target 66 lun 0 (pass6,da4)
at scbus1 target 67 lun 0 (pass7,da5)
at scbus1 target 68 lun 0 (pass8,da6)
at scbus1 target 69 lun 0 (pass9,da7)
at scbus1 target 70 lun 0 (pass10,da8)
at scbus1 target 71 lun 0 (pass11,da9)
   at scbus1 target 187 lun 0 (pass12,ses1)
at scbus1 target 1088 lun 0 (pass13)
  at scbus2 target 0 lun 0 (da10,pass14)
 at scbus3 target 0 lun 0 (da11,pass15)


We get very frequent kernel panics.


The server is receiving syslogs via syslog-ng314-3.14.1_1
