I have briefly touched on this panic condition before. However, since upgrading
to R151022ce, the previous mitigation of restricting the pool to 3 specific
flags no longer works. So here I am for some further discussion, and
hopefully more clues with which to chase the core issue down and kill it dead.

The initial condition is a pool under very heavy write I/O. In our case
it is performing an eager zero during the creation of a VMDK file by an
ESXi fibre client.

Then, while the eager zero is in progress, have a scheduled snapshot
destroy-and-create script run. At that point there is roughly a 20 percent
chance of generating a panic during the execution of that script.
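For anyone trying to reproduce this, the snapshot half of the steps above can be sketched roughly as follows. This is not the actual script we run; the dataset name, snapshot prefix, iteration count and the DRY_RUN switch are all placeholders invented for illustration. The heavy write load (the eager zero from the ESXi side) has to be generated separately while this runs.

```shell
#!/bin/sh
# Hypothetical reproduction sketch, NOT the script that was running at panic time.
# DATASET, SNAP_PREFIX, ITERATIONS and DRY_RUN are made-up placeholder names.
# DRY_RUN=1 (the default) only prints the zfs commands instead of running them.
DATASET="${DATASET:-tank/vmstore}"
SNAP_PREFIX="${SNAP_PREFIX:-cycle}"
ITERATIONS="${ITERATIONS:-3}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Destroy the previous snapshot, then create the next one.
snap_cycle() {
    prev="$1"
    next="$2"
    run zfs destroy "${DATASET}@${SNAP_PREFIX}-${prev}"
    run zfs snapshot "${DATASET}@${SNAP_PREFIX}-${next}"
}

# Seed one snapshot so the first destroy has something to remove,
# then cycle while the heavy write load is running elsewhere.
run zfs snapshot "${DATASET}@${SNAP_PREFIX}-1"
i=1
while [ "$i" -le "$ITERATIONS" ]; do
    snap_cycle "$i" "$((i + 1))"
    i=$((i + 1))
done
```

Left at its defaults, this only prints the commands it would run. Setting DRY_RUN=0 and pointing DATASET at a scratch dataset makes it actually destroy and create snapshots, so it should not be aimed at anything you care about.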
All the technical readouts I have been able to generate thus far follow.
> ::status
debugging crash dump vmcore.2 (64-bit) from DALSTOR1
operating system: 5.11 omnios-r151022-5e982daae6 (i86pc)
image uuid: febd682c-2ba5-69ea-b5c4-c8a30c88ffc4
panic message: hati_pte_map: flags & HAT_LOAD_REMAP
dump content: kernel pages only
> $C
d000f5d6f7b0 vpanic()
d000f5d6f850 hati_pte_map+0x3ab(d0341195c518, 46, d000af162cd8, 801756ea8007, 0, 0)
d000f5d6f8e0 hati_load_common+0x139(d0321f9a8908, 8046000, d000af162cd8, 40b, 0, 0, 1756ea8)
d000f5d6f960 hat_memload+0x75(d0321f9a8908, 8046000, d000af162cd8, b, 0)
d000f5d6fa80 segvn_faultpage+0x730(d0321f9a8908, d03252f19df0, 8046000, f000, 0, d000f5d6fb50, d034, d0310001, d002, 0001)
d000f5d6fc50 segvn_fault+0x8e6(d0321f9a8908, d03252f19df0, 8046000, 1000, 1, 2)
d000f5d6fd60 as_fault+0x312(d0321f9a8908, d0321fa22e20, 8046efc, 1, 1, 2)
d000f5d6fdf0 pagefault+0x96(8046efc, 1, 2, 0)
d000f5d6ff00 trap+0x30c(d000f5d6ff10, 8046efc, b)
d000f5d6ff10 0xfb8001d6()
> ::stack
vpanic()
hati_pte_map+0x3ab(d0341195c518, 46, d000af162cd8, 801756ea8007, 0, 0)
hati_load_common+0x139(d0321f9a8908, 8046000, d000af162cd8, 40b, 0, 0)
hat_memload+0x75(d0321f9a8908, 8046000, d000af162cd8, b, 0)
segvn_faultpage+0x730(d0321f9a8908, d03252f19df0, 8046000, f000, 0, d000f5d6fb50)
segvn_fault+0x8e6(d0321f9a8908, d03252f19df0, 8046000, 1000, 1, 2)
as_fault+0x312(d0321f9a8908, d0321fa22e20, 8046efc, 1, 1, 2)
pagefault+0x96(8046efc, 1, 2, 0)
trap+0x30c(d000f5d6ff10, 8046efc, b)
0xfb8001d6()
> ::cpuinfo
 ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
  0 fbc397a0      1f    0    0  39   no    no t-0    d031ff4cb060 DALV4_snapcycle.
  1 d031ea372b00  1f    2    0  -1   no    no t-0    d000f45fdc40 (idle)
  2 d031ea36f500  1f    0    0  -1   no    no t-0    d000f46f7c40 (idle)
  3 d031ea369540  1f    0    0  -1   no    no t-0    d000f47a9c40 (idle)
  4 d031ea368040  1f    0    0  -1   no    no t-0    d000f467bc40 (idle)
  5 d031ea4bdb00  1f    0    0  -1   no    no t-0    d000f4868c40 (idle)
  6 d031ea4b3a80  1f    1    0  -1   no    no t-0    d000f49afc40 (idle)
  7 d031ea4aa540  1f    2    0  -1   no    no t-0    d000f50a7c40 (idle)
  8 d031ea4a9040  1f    2    0  -1   no    no t-0    d000f51a1c40 (idle)
  9 d031ea60cb00  1f    2    0  -1   no    no t-0    d000f5253c40 (idle)
 10 d031ea602a80  1f    1    0  -1   no    no t-0    d000f5305c40 (idle)
 11 fbc440a0      1b    0    0  29   no    no t-0    d03220006b60 DALV4_snapcycle.
> d031ff4cb060::findstack -v
stack pointer for thread d031ff4cb060: d000f78fd720
d000f78fd910 page_trylock+1()
d000f78fd9c0 zfs_getpage+0x185(d031ffb33e40, e2000, 1000, d000f78fdbec, d000f78fdb50, 8000, d0347314fbf8, fef02000, 3, d0327ee27c80, 0)
d000f78fda80 fop_getpage+0x7e(d031ffb33e40, e2000, 1000, d000f78fdbec, d000f78fdb50, 8000, d0347314fbf8, fef02000, d003, d0327ee27c80, 0)
d000f78fdc50 segvn_fault+0xdfa(d0321f9a8598, d0347314fbf8, fef02000, 1000, 0, 3)
d000f78fdd60 as_fault+0x312(d0321f9a8598, d0321fa229c0, fef0276b, 1, 0, 3)
d000f78fddf0 pagefault+0x96(fef0276b, 0, 3, 0)
d000f78fdf00 trap+0x30c(d000f78fdf10, fef0276b, 0)
d000f78fdf10 0xfb8001d6()
> d03220006b60::findstack -v
stack pointer for thread d03220006b60: d000f5d6f740
d000f5d6f7b0 param_preset()
d000f5d6f850 hati_pte_map+0x3ab(d0341195c518, 46, d000af162cd8, 801756ea8007, 0, 0)
d000f5d6f8e0 hati_load_common+0x139(d0321f9a8908, 8046000, d000af162cd8, 40b, 0, 0, 1756ea8)
d000f5d6f960 hat_memload+0x75(d0321f9a8908, 8046000, d000af162cd8, b, 0)
d000f5d6fa80 segvn_faultpage+0x730(d0321f9a8908, d03252f19df0,
8046000,
f000, 0, d000f5d6fb50, d034, d0310001,
d002,