Re: [ovs-dev] [PATCH] datapath: Prevent panic

2018-04-17 Thread Gregory Rose

On 4/16/2018 11:32 PM, Pravin Shelar wrote:
> On Mon, Apr 16, 2018 at 10:58 AM, Greg Rose  wrote:
>> On RHEL 7.x kernels we observe a panic induced by a paging error
>> when the timer kicks off a job that subsequently accesses memory
>> that belonged to the openvswitch kernel module after the module
>> has been unloaded, hence the paging error.
>>
>> The panic can be induced on any RHEL 7.x kernel with the following test:
>>
>> while `true`
>> do
>> make check-kmod TESTSUITEFLAGS="-k \!gre"
>> done
>>
>> On the systems I've been testing on it generally takes anywhere from a
>> minute to 15 minutes or so to repro but never longer than that.  Similar
>> results have been seen by other testers.
>>
>> This patch does not fix the underlying bug, which does need to be
>> investigated and fixed, but it does prevent the panic from occurring.
>> We would like to prevent customer systems from panicking while we do
>> further investigation to find the root cause.
>>
> Can you add the stack trace to the commit?

Sure, I'll send a V2 in a bit.

Thanks,

- Greg
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] datapath: Prevent panic

2018-04-16 Thread Pravin Shelar
On Mon, Apr 16, 2018 at 10:58 AM, Greg Rose  wrote:
> On RHEL 7.x kernels we observe a panic induced by a paging error
> when the timer kicks off a job that subsequently accesses memory
> that belonged to the openvswitch kernel module after the module
> has been unloaded, hence the paging error.
>
> The panic can be induced on any RHEL 7.x kernel with the following test:
>
> while `true`
> do
> make check-kmod TESTSUITEFLAGS="-k \!gre"
> done
>
> On the systems I've been testing on it generally takes anywhere from a
> minute to 15 minutes or so to repro but never longer than that.  Similar
> results have been seen by other testers.
>
> This patch does not fix the underlying bug, which does need to be
> investigated and fixed, but it does prevent the panic from occurring.
> We would like to prevent customer systems from panicking while we do
> further investigation to find the root cause.
>
Can you add the stack trace to the commit?


[ovs-dev] [PATCH] datapath: Prevent panic

2018-04-16 Thread Greg Rose
On RHEL 7.x kernels we observe a panic induced by a paging error
when the timer kicks off a job that subsequently accesses memory
that belonged to the openvswitch kernel module after the module
has been unloaded, hence the paging error.

The panic can be induced on any RHEL 7.x kernel with the following test:

while `true`
do
make check-kmod TESTSUITEFLAGS="-k \!gre"
done

On the systems I've been testing on it generally takes anywhere from a
minute to 15 minutes or so to repro but never longer than that.  Similar
results have been seen by other testers.

This patch does not fix the underlying bug, which does need to be
investigated and fixed, but it does prevent the panic from occurring.
We would like to prevent customer systems from panicking while we do
further investigation to find the root cause.

Signed-off-by: Greg Rose 
---
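For readers unfamiliar with the ordering this workaround relies on, here
is a minimal userspace sketch of "drain deferred callbacks, then free".
It is not the datapath code. The names dp_state, defer(), drain() and
cleanup() are hypothetical; drain() only stands in for the role that
rcu_barrier() plays in dp_cleanup(), and nothing here models the timer
or the msleep() delay.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for module state that deferred work touches. */
struct dp_state {
	int flows;
};

#define MAX_PENDING 8

/* Hypothetical deferred-work queue, standing in for pending callbacks. */
struct pending {
	void (*fn)(struct dp_state *);
	struct dp_state *arg;
};

static struct pending queue[MAX_PENDING];
static int n_pending;	/* callbacks queued but not yet run */
static int n_ran;	/* callbacks that have run so far */

/* Queue a callback to run later; returns 0 on success, -1 if full. */
static int defer(void (*fn)(struct dp_state *), struct dp_state *arg)
{
	if (n_pending == MAX_PENDING)
		return -1;
	queue[n_pending].fn = fn;
	queue[n_pending].arg = arg;
	n_pending++;
	return 0;
}

/* Run every queued callback so none is left to fire after teardown. */
static void drain(void)
{
	int i;

	for (i = 0; i < n_pending; i++) {
		assert(queue[i].fn != NULL);
		queue[i].fn(queue[i].arg);
		n_ran++;
	}
	n_pending = 0;
}

/* Example callback: touches the state, so it must run before free(). */
static void drop_flow(struct dp_state *dp)
{
	dp->flows--;
}

/* Teardown: drain first, then free.  Returns flows left at free time. */
static int cleanup(struct dp_state *dp)
{
	int left;

	drain();
	left = dp->flows;
	free(dp);
	return left;
}
```

A caller would allocate a dp_state, queue a few drop_flow callbacks with
defer(), and then call cleanup().  Swapping the order inside cleanup()
(free before drain) would reproduce the class of bug described above,
with callbacks touching memory that is already gone.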
 datapath/datapath.c         | 10 ++++++++++
 tests/system-kmod-macros.at |  1 +
 utilities/ovs-lib.in        |  1 +
 3 files changed, 12 insertions(+)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index 3ea240a..43f0d74 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -2478,6 +2478,16 @@ error:
 
 static void dp_cleanup(void)
 {
+#if RHEL_RELEASE_CODE < RHEL_RELEASE_VERSION(8,0)
+	/* On RHEL 7.x kernels we hit a kernel paging error without
+	 * this barrier and subsequent hefty delay.  A process will
+	 * attempt to access openvswitch memory after it has been
+	 * unloaded.  Further debugging is needed on that but for
+	 * now let's not let customer machines panic.
+	 */
+	rcu_barrier();
+	msleep(3000);
+#endif
 	dp_unregister_genl(ARRAY_SIZE(dp_genl_families));
 	ovs_netdev_exit();
 	unregister_netdevice_notifier(&ovs_dp_device_notifier);
diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
index f23a406..2b9b691 100644
--- a/tests/system-kmod-macros.at
+++ b/tests/system-kmod-macros.at
@@ -23,6 +23,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
on_exit 'modprobe -q -r mod'
   ])
    on_exit 'ovs-dpctl del-dp ovs-system'
+   on_exit 'ovs-appctl dpctl/flush-conntrack'
    _OVS_VSWITCHD_START([])
    dnl Add bridges, ports, etc.
    AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index 4dc3151..4c3ad0f 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -616,6 +616,7 @@ force_reload_kmod () {
     for dp in `ovs-dpctl dump-dps`; do
         action "Removing datapath: $dp" ovs-dpctl del-dp "$dp"
     done
+    action "ovs-appctl dpctl/flush-conntrack"
 
     for vport in `awk '/^vport_/ { print $1 }' /proc/modules`; do
         action "Removing $vport module" rmmod $vport
-- 
1.8.3.1
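
A note on the version gate in dp_cleanup(): the sketch below shows how a
comparison of that shape selects 7.x builds only.  It assumes the release
code packs the major number above an 8-bit minor field; all SKETCH_*
names and needs_unload_delay() are hypothetical and are not taken from
the kernel headers.

```c
#include <assert.h>

/* Assumed encoding: major above an 8-bit minor field. */
#define SKETCH_RELEASE_VERSION(major, minor) (((major) << 8) + (minor))

/* Hypothetical: pretend this build targets a 7.4 kernel. */
#ifndef SKETCH_RELEASE_CODE
#define SKETCH_RELEASE_CODE SKETCH_RELEASE_VERSION(7, 4)
#endif

/* Compile-time gate with the same shape as the one in the patch. */
#if SKETCH_RELEASE_CODE < SKETCH_RELEASE_VERSION(8, 0)
#define SKETCH_NEEDS_UNLOAD_DELAY 1
#else
#define SKETCH_NEEDS_UNLOAD_DELAY 0
#endif

/* Runtime mirror of the gate, handy for checking boundary values. */
static int needs_unload_delay(int release_code)
{
	assert(release_code >= 0);
	return release_code < SKETCH_RELEASE_VERSION(8, 0);
}
```

Under that assumed encoding every 7.x code compares below 8.0, so the
barrier and delay are compiled in for 7.x and dropped from 8.0 onward.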
