Re: [PATCH] skb_array: fix NULL-pointer exception
On 2017年12月05日 12:40, Michael S. Tsirkin wrote: On Tue, Dec 05, 2017 at 11:11:14AM +0800, Jason Wang wrote: On 2017年12月04日 22:24, George Cherian wrote: While running a multiple VM testscase with each VM running iperf traffic between others the following kernel NULL pointer exception was seen. Race appears when the tun driver instance of one VM calls skb_array_produce (from tun_net_xmit) and the the destined VM's skb_array_consume (from tun_ring_recv), which could run concurrently on another core. Due to which the sock_wfree gets called again from the tun_ring_recv context. OK, so is the implication that there are two concurrent callers for tun_ring_recv on the same device? I guess not since VM is used (unless there's some misconfiguration). The fix is to add write/read barrier calls to be sure that we get proper values in the tun_ring_recv context. Crash log [35321.580227] Unable to handle kernel NULL pointer dereference at virtual address 0060 [35321.596720] pgd = 809ee552f000 [35321.603723] [0060] *pgd=009f514ac003, *pud=009f54f7c003, *pmd= [35321.620588] Internal error: Oops: 9606 1 SMP [35321.630461] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_4 [35321.728501] CPU: 114 PID: 5560 Comm: qemu-system-aar Not tainted 4.11.8-4k-vhe-lse+ #3 [35321.744510] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 07/24/2017 [35321.758602] task: 80bed7fca880 task.stack: 80beb5128000 [35321.770600] PC is at sock_wfree+0x24/0x80 [35321.778746] LR is at skb_release_head_state+0x68/0xf8 [35321.788979] pc : [] lr : [] pstate: 40400149 [35321.803930] sp : 80beb512bc30 [35321.810648] x29: 80beb512bc30 x28: 80bed7fca880 [35321.821400] x27: 004e x26: [35321.832156] x25: 000c x24: [35321.842947] x23: 809ece3e4900 x22: 80beb512be00 [35321.853729] x21: 09138000 x20: 0144 [35321.864507] x19: x18: 0014 [35321.875276] x17: 9d9689a0 x16: 0828b3f8 [35321.886048] x15: 504d7b00 x14: e90ab50c48680a08 [35321.896824] x13: 010117773f52 x12: 1080d422c00e5db6 [35321.907595] x11: 68c322bd3930cf7a x10: a8c0d07aa8c0ad16 [35321.918352] x9 : 1da4ed90 x8 : b50c48680a080101 [35321.929099] x7 : 17770c521080 x6 : 1d6c13f2 [35321.939865] x5 : 1d6c13f2 x4 : 000e [35321.950619] x3 : 00085ff97d82 x2 : [35321.961376] x1 : 08a772d8 x0 : 0500 [35321.975193] Process qemu-system-aar (pid: 5560, stack limit = 0x80beb5128000) [35321.990347] Stack: (0x80beb512bc30 to 0x80beb512c000) [35322.001982] bc20: 80beb512bc50 08a79238 [35322.017817] bc40: 809e8fd7be00 004e 80beb512bc70 08a79488 [35322.033651] bc60: 809e8fd7be00 0881307c 80beb512bc90 08a79678 [35322.049489] bc80: 809e8fd7be00 80beb512be00 80beb512bcb0 08812f24 [35322.065321] bca0: 809e8fd7be00 004e 80beb512bd50 088133f0 [35322.081165] bcc0: 809ece3e4900 00011000 80beb512bdd8 80beb512be00 [35322.097001] bce0: 1d6c13a4 0015 0124 003f [35322.112866] bd00: 08bc2000 0847b5ac 0002 80be0008 [35322.128701] bd20: 00220001 80bed7fc0010 08100c38 [35322.144539] bd40: 00040b08 80beb512bd80 08288f80 [35322.160395] bd60: 09138000 809ee7cd3500 00011000 80beb512beb0 [35322.176255] bd80: 80beb512be30 0828a224 00011000 809ee7cd3500 [35322.192109] bda0: 1d6c13a4 80beb512beb0 00011000 [35322.207974] bdc0: 1d6c13a4 00011000 809ee7cd3500 [35322.223822] bde0: 004e [35322.239661] be00: 80be 004e 00010fb2 80beb512bdc8 [35322.255519] be20: 0001 00040b08 80beb512be70 0828b464 [35322.271392] be40: 09138000 809ee7cd3501 809ee7cd3500 1d6c13a4 [35322.287255] be60: 00011000 0015 080833f0 [35322.303090] be80: 80bef0071000 9d9689cc [35322.318951] bea0: 8000 80bef0071000 004e 00040b08 [35322.334771] bec0: 000e 1d6c13a4 00011000 9cc89108 [35322.350640] bee0: 0002 9cc89000 9cc896f0 [35322.366500] bf00: 003f 1da4ed90 a8c0d07aa8c0ad16 68c322bd3930cf7a [35322.382358] bf20: 1080d422c00e5db6 010117773f52 e90ab50c48680a08
Re: [PATCH] skb_array: fix NULL-pointer exception
On 2017年12月05日 12:40, Michael S. Tsirkin wrote: On Tue, Dec 05, 2017 at 11:11:14AM +0800, Jason Wang wrote: On 2017年12月04日 22:24, George Cherian wrote: While running a multiple VM testscase with each VM running iperf traffic between others the following kernel NULL pointer exception was seen. Race appears when the tun driver instance of one VM calls skb_array_produce (from tun_net_xmit) and the the destined VM's skb_array_consume (from tun_ring_recv), which could run concurrently on another core. Due to which the sock_wfree gets called again from the tun_ring_recv context. OK, so is the implication that there are two concurrent callers for tun_ring_recv on the same device? I guess not since VM is used (unless there's some misconfiguration). The fix is to add write/read barrier calls to be sure that we get proper values in the tun_ring_recv context. Crash log [35321.580227] Unable to handle kernel NULL pointer dereference at virtual address 0060 [35321.596720] pgd = 809ee552f000 [35321.603723] [0060] *pgd=009f514ac003, *pud=009f54f7c003, *pmd= [35321.620588] Internal error: Oops: 9606 1 SMP [35321.630461] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_4 [35321.728501] CPU: 114 PID: 5560 Comm: qemu-system-aar Not tainted 4.11.8-4k-vhe-lse+ #3 [35321.744510] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 07/24/2017 [35321.758602] task: 80bed7fca880 task.stack: 80beb5128000 [35321.770600] PC is at sock_wfree+0x24/0x80 [35321.778746] LR is at skb_release_head_state+0x68/0xf8 [35321.788979] pc : [] lr : [] pstate: 40400149 [35321.803930] sp : 80beb512bc30 [35321.810648] x29: 80beb512bc30 x28: 80bed7fca880 [35321.821400] x27: 004e x26: [35321.832156] x25: 000c x24: [35321.842947] x23: 809ece3e4900 x22: 80beb512be00 [35321.853729] x21: 09138000 x20: 0144 [35321.864507] x19: x18: 0014 [35321.875276] x17: 9d9689a0 x16: 0828b3f8 [35321.886048] x15: 504d7b00 x14: e90ab50c48680a08 [35321.896824] x13: 010117773f52 x12: 1080d422c00e5db6 [35321.907595] x11: 68c322bd3930cf7a x10: a8c0d07aa8c0ad16 [35321.918352] x9 : 1da4ed90 x8 : b50c48680a080101 [35321.929099] x7 : 17770c521080 x6 : 1d6c13f2 [35321.939865] x5 : 1d6c13f2 x4 : 000e [35321.950619] x3 : 00085ff97d82 x2 : [35321.961376] x1 : 08a772d8 x0 : 0500 [35321.975193] Process qemu-system-aar (pid: 5560, stack limit = 0x80beb5128000) [35321.990347] Stack: (0x80beb512bc30 to 0x80beb512c000) [35322.001982] bc20: 80beb512bc50 08a79238 [35322.017817] bc40: 809e8fd7be00 004e 80beb512bc70 08a79488 [35322.033651] bc60: 809e8fd7be00 0881307c 80beb512bc90 08a79678 [35322.049489] bc80: 809e8fd7be00 80beb512be00 80beb512bcb0 08812f24 [35322.065321] bca0: 809e8fd7be00 004e 80beb512bd50 088133f0 [35322.081165] bcc0: 809ece3e4900 00011000 80beb512bdd8 80beb512be00 [35322.097001] bce0: 1d6c13a4 0015 0124 003f [35322.112866] bd00: 08bc2000 0847b5ac 0002 80be0008 [35322.128701] bd20: 00220001 80bed7fc0010 08100c38 [35322.144539] bd40: 00040b08 80beb512bd80 08288f80 [35322.160395] bd60: 09138000 809ee7cd3500 00011000 80beb512beb0 [35322.176255] bd80: 80beb512be30 0828a224 00011000 809ee7cd3500 [35322.192109] bda0: 1d6c13a4 80beb512beb0 00011000 [35322.207974] bdc0: 1d6c13a4 00011000 809ee7cd3500 [35322.223822] bde0: 004e [35322.239661] be00: 80be 004e 00010fb2 80beb512bdc8 [35322.255519] be20: 0001 00040b08 80beb512be70 0828b464 [35322.271392] be40: 09138000 809ee7cd3501 809ee7cd3500 1d6c13a4 [35322.287255] be60: 00011000 0015 080833f0 [35322.303090] be80: 80bef0071000 9d9689cc [35322.318951] bea0: 8000 80bef0071000 004e 00040b08 [35322.334771] bec0: 000e 1d6c13a4 00011000 9cc89108 [35322.350640] bee0: 0002 9cc89000 9cc896f0 [35322.366500] bf00: 003f 1da4ed90 a8c0d07aa8c0ad16 68c322bd3930cf7a [35322.382358] bf20: 1080d422c00e5db6 010117773f52 e90ab50c48680a08
Re: [PATCH] skb_array: fix NULL-pointer exception
On Tue, Dec 05, 2017 at 11:11:14AM +0800, Jason Wang wrote: > > > On 2017年12月04日 22:24, George Cherian wrote: > > While running a multiple VM testscase with each VM running iperf > > traffic between others the following kernel NULL pointer exception > > was seen. > > > > Race appears when the tun driver instance of one VM calls skb_array_produce > > (from tun_net_xmit) and the the destined VM's skb_array_consume > > (from tun_ring_recv), which could run concurrently on another core. Due to > > which the sock_wfree gets called again > from the tun_ring_recv context. OK, so is the implication that there are two concurrent callers for tun_ring_recv on the same device? > > The fix is to add write/read barrier calls to be sure that we get proper > > values in the tun_ring_recv context. > > > > Crash log > > [35321.580227] Unable to handle kernel NULL pointer dereference at virtual > > address 0060 > > [35321.596720] pgd = 809ee552f000 > > [35321.603723] [0060] *pgd=009f514ac003, *pud=009f54f7c003, > > *pmd= > > [35321.620588] Internal error: Oops: 9606 1 SMP > > [35321.630461] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE > > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > > nf_defrag_ipv4 xt_4 > > [35321.728501] CPU: 114 PID: 5560 Comm: qemu-system-aar Not tainted > > 4.11.8-4k-vhe-lse+ #3 > > [35321.744510] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 > > 07/24/2017 > > [35321.758602] task: 80bed7fca880 task.stack: 80beb5128000 > > [35321.770600] PC is at sock_wfree+0x24/0x80 > > [35321.778746] LR is at skb_release_head_state+0x68/0xf8 > > [35321.788979] pc : [] lr : [] pstate: > > 40400149 > > [35321.803930] sp : 80beb512bc30 > > [35321.810648] x29: 80beb512bc30 x28: 80bed7fca880 > > [35321.821400] x27: 004e x26: > > [35321.832156] x25: 000c x24: > > [35321.842947] x23: 809ece3e4900 x22: 80beb512be00 > > [35321.853729] x21: 09138000 x20: 0144 > > [35321.864507] x19: x18: 0014 > > [35321.875276] x17: 9d9689a0 x16: 0828b3f8 > > [35321.886048] x15: 504d7b00 x14: e90ab50c48680a08 > > [35321.896824] x13: 010117773f52 x12: 1080d422c00e5db6 > > [35321.907595] x11: 68c322bd3930cf7a x10: a8c0d07aa8c0ad16 > > [35321.918352] x9 : 1da4ed90 x8 : b50c48680a080101 > > [35321.929099] x7 : 17770c521080 x6 : 1d6c13f2 > > [35321.939865] x5 : 1d6c13f2 x4 : 000e > > [35321.950619] x3 : 00085ff97d82 x2 : > > [35321.961376] x1 : 08a772d8 x0 : 0500 > > [35321.975193] Process qemu-system-aar (pid: 5560, stack limit = > > 0x80beb5128000) > > [35321.990347] Stack: (0x80beb512bc30 to 0x80beb512c000) > > [35322.001982] bc20: 80beb512bc50 08a79238 > > [35322.017817] bc40: 809e8fd7be00 004e 80beb512bc70 > > 08a79488 > > [35322.033651] bc60: 809e8fd7be00 0881307c 80beb512bc90 > > 08a79678 > > [35322.049489] bc80: 809e8fd7be00 80beb512be00 80beb512bcb0 > > 08812f24 > > [35322.065321] bca0: 809e8fd7be00 004e 80beb512bd50 > > 088133f0 > > [35322.081165] bcc0: 809ece3e4900 00011000 80beb512bdd8 > > 80beb512be00 > > [35322.097001] bce0: 1d6c13a4 0015 0124 > > 003f > > [35322.112866] bd00: 08bc2000 0847b5ac 0002 > > 80be0008 > > [35322.128701] bd20: 00220001 80bed7fc0010 08100c38 > > > > [35322.144539] bd40: 00040b08 80beb512bd80 > > 08288f80 > > [35322.160395] bd60: 09138000 809ee7cd3500 00011000 > > 80beb512beb0 > > [35322.176255] bd80: 80beb512be30 0828a224 00011000 > > 809ee7cd3500 > > [35322.192109] bda0: 1d6c13a4 80beb512beb0 00011000 > > > > [35322.207974] bdc0: 1d6c13a4 00011000 > > 809ee7cd3500 > > [35322.223822] bde0: 004e > > > > [35322.239661] be00: 80be 004e 00010fb2 > > 80beb512bdc8 > > [35322.255519] be20: 0001 00040b08 80beb512be70 > > 0828b464 > > [35322.271392] be40: 09138000 809ee7cd3501 809ee7cd3500 > > 1d6c13a4 > > [35322.287255] be60: 00011000 0015 > > 080833f0 > > [35322.303090] be80: 80bef0071000 > > 9d9689cc > > [35322.318951] bea0: 8000 80bef0071000 004e > > 00040b08 > > [35322.334771] bec0: 000e 1d6c13a4 00011000 > > 9cc89108 > >
Re: [PATCH] skb_array: fix NULL-pointer exception
On Tue, Dec 05, 2017 at 11:11:14AM +0800, Jason Wang wrote: > > > On 2017年12月04日 22:24, George Cherian wrote: > > While running a multiple VM testscase with each VM running iperf > > traffic between others the following kernel NULL pointer exception > > was seen. > > > > Race appears when the tun driver instance of one VM calls skb_array_produce > > (from tun_net_xmit) and the the destined VM's skb_array_consume > > (from tun_ring_recv), which could run concurrently on another core. Due to > > which the sock_wfree gets called again > from the tun_ring_recv context. OK, so is the implication that there are two concurrent callers for tun_ring_recv on the same device? > > The fix is to add write/read barrier calls to be sure that we get proper > > values in the tun_ring_recv context. > > > > Crash log > > [35321.580227] Unable to handle kernel NULL pointer dereference at virtual > > address 0060 > > [35321.596720] pgd = 809ee552f000 > > [35321.603723] [0060] *pgd=009f514ac003, *pud=009f54f7c003, > > *pmd= > > [35321.620588] Internal error: Oops: 9606 1 SMP > > [35321.630461] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE > > nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 > > nf_defrag_ipv4 xt_4 > > [35321.728501] CPU: 114 PID: 5560 Comm: qemu-system-aar Not tainted > > 4.11.8-4k-vhe-lse+ #3 > > [35321.744510] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 > > 07/24/2017 > > [35321.758602] task: 80bed7fca880 task.stack: 80beb5128000 > > [35321.770600] PC is at sock_wfree+0x24/0x80 > > [35321.778746] LR is at skb_release_head_state+0x68/0xf8 > > [35321.788979] pc : [] lr : [] pstate: > > 40400149 > > [35321.803930] sp : 80beb512bc30 > > [35321.810648] x29: 80beb512bc30 x28: 80bed7fca880 > > [35321.821400] x27: 004e x26: > > [35321.832156] x25: 000c x24: > > [35321.842947] x23: 809ece3e4900 x22: 80beb512be00 > > [35321.853729] x21: 09138000 x20: 0144 > > [35321.864507] x19: x18: 0014 > > [35321.875276] x17: 9d9689a0 x16: 0828b3f8 > > [35321.886048] x15: 504d7b00 x14: e90ab50c48680a08 > > [35321.896824] x13: 010117773f52 x12: 1080d422c00e5db6 > > [35321.907595] x11: 68c322bd3930cf7a x10: a8c0d07aa8c0ad16 > > [35321.918352] x9 : 1da4ed90 x8 : b50c48680a080101 > > [35321.929099] x7 : 17770c521080 x6 : 1d6c13f2 > > [35321.939865] x5 : 1d6c13f2 x4 : 000e > > [35321.950619] x3 : 00085ff97d82 x2 : > > [35321.961376] x1 : 08a772d8 x0 : 0500 > > [35321.975193] Process qemu-system-aar (pid: 5560, stack limit = > > 0x80beb5128000) > > [35321.990347] Stack: (0x80beb512bc30 to 0x80beb512c000) > > [35322.001982] bc20: 80beb512bc50 08a79238 > > [35322.017817] bc40: 809e8fd7be00 004e 80beb512bc70 > > 08a79488 > > [35322.033651] bc60: 809e8fd7be00 0881307c 80beb512bc90 > > 08a79678 > > [35322.049489] bc80: 809e8fd7be00 80beb512be00 80beb512bcb0 > > 08812f24 > > [35322.065321] bca0: 809e8fd7be00 004e 80beb512bd50 > > 088133f0 > > [35322.081165] bcc0: 809ece3e4900 00011000 80beb512bdd8 > > 80beb512be00 > > [35322.097001] bce0: 1d6c13a4 0015 0124 > > 003f > > [35322.112866] bd00: 08bc2000 0847b5ac 0002 > > 80be0008 > > [35322.128701] bd20: 00220001 80bed7fc0010 08100c38 > > > > [35322.144539] bd40: 00040b08 80beb512bd80 > > 08288f80 > > [35322.160395] bd60: 09138000 809ee7cd3500 00011000 > > 80beb512beb0 > > [35322.176255] bd80: 80beb512be30 0828a224 00011000 > > 809ee7cd3500 > > [35322.192109] bda0: 1d6c13a4 80beb512beb0 00011000 > > > > [35322.207974] bdc0: 1d6c13a4 00011000 > > 809ee7cd3500 > > [35322.223822] bde0: 004e > > > > [35322.239661] be00: 80be 004e 00010fb2 > > 80beb512bdc8 > > [35322.255519] be20: 0001 00040b08 80beb512be70 > > 0828b464 > > [35322.271392] be40: 09138000 809ee7cd3501 809ee7cd3500 > > 1d6c13a4 > > [35322.287255] be60: 00011000 0015 > > 080833f0 > > [35322.303090] be80: 80bef0071000 > > 9d9689cc > > [35322.318951] bea0: 8000 80bef0071000 004e > > 00040b08 > > [35322.334771] bec0: 000e 1d6c13a4 00011000 > > 9cc89108 > >
Re: [PATCH] skb_array: fix NULL-pointer exception
On 2017年12月04日 22:24, George Cherian wrote: While running a multiple VM testscase with each VM running iperf traffic between others the following kernel NULL pointer exception was seen. Race appears when the tun driver instance of one VM calls skb_array_produce (from tun_net_xmit) and the the destined VM's skb_array_consume (from tun_ring_recv), which could run concurrently on another core. Due to which the sock_wfree gets called again from the tun_ring_recv context. The fix is to add write/read barrier calls to be sure that we get proper values in the tun_ring_recv context. Crash log [35321.580227] Unable to handle kernel NULL pointer dereference at virtual address 0060 [35321.596720] pgd = 809ee552f000 [35321.603723] [0060] *pgd=009f514ac003, *pud=009f54f7c003, *pmd= [35321.620588] Internal error: Oops: 9606 1 SMP [35321.630461] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_4 [35321.728501] CPU: 114 PID: 5560 Comm: qemu-system-aar Not tainted 4.11.8-4k-vhe-lse+ #3 [35321.744510] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 07/24/2017 [35321.758602] task: 80bed7fca880 task.stack: 80beb5128000 [35321.770600] PC is at sock_wfree+0x24/0x80 [35321.778746] LR is at skb_release_head_state+0x68/0xf8 [35321.788979] pc : [] lr : [] pstate: 40400149 [35321.803930] sp : 80beb512bc30 [35321.810648] x29: 80beb512bc30 x28: 80bed7fca880 [35321.821400] x27: 004e x26: [35321.832156] x25: 000c x24: [35321.842947] x23: 809ece3e4900 x22: 80beb512be00 [35321.853729] x21: 09138000 x20: 0144 [35321.864507] x19: x18: 0014 [35321.875276] x17: 9d9689a0 x16: 0828b3f8 [35321.886048] x15: 504d7b00 x14: e90ab50c48680a08 [35321.896824] x13: 010117773f52 x12: 1080d422c00e5db6 [35321.907595] x11: 68c322bd3930cf7a x10: a8c0d07aa8c0ad16 [35321.918352] x9 : 1da4ed90 x8 : b50c48680a080101 [35321.929099] x7 : 17770c521080 x6 : 1d6c13f2 [35321.939865] x5 : 1d6c13f2 x4 : 000e [35321.950619] x3 : 00085ff97d82 x2 : [35321.961376] x1 : 08a772d8 x0 : 0500 [35321.975193] Process qemu-system-aar (pid: 5560, stack limit = 0x80beb5128000) [35321.990347] Stack: (0x80beb512bc30 to 0x80beb512c000) [35322.001982] bc20: 80beb512bc50 08a79238 [35322.017817] bc40: 809e8fd7be00 004e 80beb512bc70 08a79488 [35322.033651] bc60: 809e8fd7be00 0881307c 80beb512bc90 08a79678 [35322.049489] bc80: 809e8fd7be00 80beb512be00 80beb512bcb0 08812f24 [35322.065321] bca0: 809e8fd7be00 004e 80beb512bd50 088133f0 [35322.081165] bcc0: 809ece3e4900 00011000 80beb512bdd8 80beb512be00 [35322.097001] bce0: 1d6c13a4 0015 0124 003f [35322.112866] bd00: 08bc2000 0847b5ac 0002 80be0008 [35322.128701] bd20: 00220001 80bed7fc0010 08100c38 [35322.144539] bd40: 00040b08 80beb512bd80 08288f80 [35322.160395] bd60: 09138000 809ee7cd3500 00011000 80beb512beb0 [35322.176255] bd80: 80beb512be30 0828a224 00011000 809ee7cd3500 [35322.192109] bda0: 1d6c13a4 80beb512beb0 00011000 [35322.207974] bdc0: 1d6c13a4 00011000 809ee7cd3500 [35322.223822] bde0: 004e [35322.239661] be00: 80be 004e 00010fb2 80beb512bdc8 [35322.255519] be20: 0001 00040b08 80beb512be70 0828b464 [35322.271392] be40: 09138000 809ee7cd3501 809ee7cd3500 1d6c13a4 [35322.287255] be60: 00011000 0015 080833f0 [35322.303090] be80: 80bef0071000 9d9689cc [35322.318951] bea0: 8000 80bef0071000 004e 00040b08 [35322.334771] bec0: 000e 1d6c13a4 00011000 9cc89108 [35322.350640] bee0: 0002 9cc89000 9cc896f0 [35322.366500] bf00: 003f 1da4ed90 a8c0d07aa8c0ad16 68c322bd3930cf7a [35322.382358] bf20: 1080d422c00e5db6 010117773f52 e90ab50c48680a08 504d7b00 [35322.398209] bf40: 9d9689a0 0014 0030 [35322.414063] bf60: 1d6c13a4 1d6c0db0 1d6d1db0 006fbbc8 [35322.429901] bf80: 00011000 1d6ab3e8 1d6ab3a4 007ee4c8
Re: [PATCH] skb_array: fix NULL-pointer exception
On 2017年12月04日 22:24, George Cherian wrote: While running a multiple VM testscase with each VM running iperf traffic between others the following kernel NULL pointer exception was seen. Race appears when the tun driver instance of one VM calls skb_array_produce (from tun_net_xmit) and the the destined VM's skb_array_consume (from tun_ring_recv), which could run concurrently on another core. Due to which the sock_wfree gets called again from the tun_ring_recv context. The fix is to add write/read barrier calls to be sure that we get proper values in the tun_ring_recv context. Crash log [35321.580227] Unable to handle kernel NULL pointer dereference at virtual address 0060 [35321.596720] pgd = 809ee552f000 [35321.603723] [0060] *pgd=009f514ac003, *pud=009f54f7c003, *pmd= [35321.620588] Internal error: Oops: 9606 1 SMP [35321.630461] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_4 [35321.728501] CPU: 114 PID: 5560 Comm: qemu-system-aar Not tainted 4.11.8-4k-vhe-lse+ #3 [35321.744510] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 07/24/2017 [35321.758602] task: 80bed7fca880 task.stack: 80beb5128000 [35321.770600] PC is at sock_wfree+0x24/0x80 [35321.778746] LR is at skb_release_head_state+0x68/0xf8 [35321.788979] pc : [] lr : [] pstate: 40400149 [35321.803930] sp : 80beb512bc30 [35321.810648] x29: 80beb512bc30 x28: 80bed7fca880 [35321.821400] x27: 004e x26: [35321.832156] x25: 000c x24: [35321.842947] x23: 809ece3e4900 x22: 80beb512be00 [35321.853729] x21: 09138000 x20: 0144 [35321.864507] x19: x18: 0014 [35321.875276] x17: 9d9689a0 x16: 0828b3f8 [35321.886048] x15: 504d7b00 x14: e90ab50c48680a08 [35321.896824] x13: 010117773f52 x12: 1080d422c00e5db6 [35321.907595] x11: 68c322bd3930cf7a x10: a8c0d07aa8c0ad16 [35321.918352] x9 : 1da4ed90 x8 : b50c48680a080101 [35321.929099] x7 : 17770c521080 x6 : 1d6c13f2 [35321.939865] x5 : 1d6c13f2 x4 : 000e [35321.950619] x3 : 00085ff97d82 x2 : [35321.961376] x1 : 08a772d8 x0 : 0500 [35321.975193] Process qemu-system-aar (pid: 5560, stack limit = 0x80beb5128000) [35321.990347] Stack: (0x80beb512bc30 to 0x80beb512c000) [35322.001982] bc20: 80beb512bc50 08a79238 [35322.017817] bc40: 809e8fd7be00 004e 80beb512bc70 08a79488 [35322.033651] bc60: 809e8fd7be00 0881307c 80beb512bc90 08a79678 [35322.049489] bc80: 809e8fd7be00 80beb512be00 80beb512bcb0 08812f24 [35322.065321] bca0: 809e8fd7be00 004e 80beb512bd50 088133f0 [35322.081165] bcc0: 809ece3e4900 00011000 80beb512bdd8 80beb512be00 [35322.097001] bce0: 1d6c13a4 0015 0124 003f [35322.112866] bd00: 08bc2000 0847b5ac 0002 80be0008 [35322.128701] bd20: 00220001 80bed7fc0010 08100c38 [35322.144539] bd40: 00040b08 80beb512bd80 08288f80 [35322.160395] bd60: 09138000 809ee7cd3500 00011000 80beb512beb0 [35322.176255] bd80: 80beb512be30 0828a224 00011000 809ee7cd3500 [35322.192109] bda0: 1d6c13a4 80beb512beb0 00011000 [35322.207974] bdc0: 1d6c13a4 00011000 809ee7cd3500 [35322.223822] bde0: 004e [35322.239661] be00: 80be 004e 00010fb2 80beb512bdc8 [35322.255519] be20: 0001 00040b08 80beb512be70 0828b464 [35322.271392] be40: 09138000 809ee7cd3501 809ee7cd3500 1d6c13a4 [35322.287255] be60: 00011000 0015 080833f0 [35322.303090] be80: 80bef0071000 9d9689cc [35322.318951] bea0: 8000 80bef0071000 004e 00040b08 [35322.334771] bec0: 000e 1d6c13a4 00011000 9cc89108 [35322.350640] bee0: 0002 9cc89000 9cc896f0 [35322.366500] bf00: 003f 1da4ed90 a8c0d07aa8c0ad16 68c322bd3930cf7a [35322.382358] bf20: 1080d422c00e5db6 010117773f52 e90ab50c48680a08 504d7b00 [35322.398209] bf40: 9d9689a0 0014 0030 [35322.414063] bf60: 1d6c13a4 1d6c0db0 1d6d1db0 006fbbc8 [35322.429901] bf80: 00011000 1d6ab3e8 1d6ab3a4 007ee4c8