Hi Greg,

The issue occurred on test 20 of check-kmod. (Sometimes I see a different
kernel problem instead: the kernel just hangs without a panic.) This is
OVS 2.6.0 on CentOS 7.2 with kernel 3.10.0-327.el7.x86_64. Some information
is below; I hope it is helpful.

datapath-sanity

  1: datapath - ping between two ports               ok
  2: datapath - http between two ports               ok
  3: datapath - ping between two ports on vlan       ok
  4: datapath - ping6 between two ports              ok
  5: datapath - ping6 between two ports on vlan      ok
  6: datapath - ping over vxlan tunnel               FAILED (system-traffic.at:159)
  7: datapath - ping over gre tunnel                 FAILED (system-traffic.at:199)
  8: datapath - ping over geneve tunnel              skipped (system-traffic.at:213)
  9: datapath - basic truncate action                ok
 10: datapath - truncate and output to gre tunnel    FAILED (system-traffic.at:445)
 11: conntrack - controller                          FAILED (system-traffic.at:522)
 12: conntrack - IPv4 HTTP                           ok
 13: conntrack - IPv6 HTTP                           ok
 14: conntrack - IPv4 ping                           ok
 15: conntrack - IPv6 ping                           ok
 16: conntrack - commit, recirc                      ok
 17: conntrack - preserve registers                  ok
 18: conntrack - invalid                             ok
 19: conntrack - zones                               ok
 20: conntrack - zones from field ....(system crash...)


[root@localhost vmcore-127.0.0.1-2017-06-25-23:17:12]# ls
analyzer      backtrace  count      last_occurrence  os_info     runlevel  type  username  vmcore
architecture  component  event_log  machineid        os_release  time      uid   uuid      vmcore-dmesg.txt
[root@localhost vmcore-127.0.0.1-2017-06-25-23:17:12]# cat backtrace

Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel paging request at ffffffffa0715ae8
IP: [<ffffffff8108e6a7>] get_next_timer_interrupt+0x97/0x270
PGD 194d067 PUD 194e063 PMD b746f067 PTE 0
Oops: 0000 [#1] SMP
Modules linked in: nf_nat_ftp nf_conntrack_ftp nf_conntrack_netlink nfnetlink
ip_gre ip_tunnel gre vxlan ip6_udp_tunnel udp_tunnel 8021q garp mrp veth
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter
ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter
vmw_vsock_vmci_transport vsock bnep dm_mirror dm_region_hash dm_log dm_mod
snd_seq_midi snd_seq_midi_event snd_ens1371 snd_rawmidi coretemp
snd_ac97_codec ac97_bus crc32_pclmul snd_seq ghash_clmulni_intel ppdev
snd_seq_device cryptd btusb snd_pcm bluetooth snd_timer snd soundcore sg
vmw_balloon rfkill pcspkr parport_pc parport i2c_piix4 vmw_vmci shpchp nfsd
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom
ata_generic sd_mod crc_t10dif crct10dif_generic pata_acpi crct10dif_pclmul
crct10dif_common crc32c_intel serio_raw vmwgfx drm_kms_helper ttm mptspi
scsi_transport_spi e1000 mptscsih mptbase drm i2c_core ata_piix libata
[last unloaded: openvswitch]
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE  ------------   3.10.0-327.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
task: ffff8800b9a81700 ti: ffff8800b9a8c000 task.ti: ffff8800b9a8c000
RIP: 0010:[<ffffffff8108e6a7>]  [<ffffffff8108e6a7>] get_next_timer_interrupt+0x97/0x270
RSP: 0018:ffff8800b9a8fdd8  EFLAGS: 00010012
RAX: ffffffffa0715ad0 RBX: 00000863b6f08300 RCX: ffff8800b95a8d08
RDX: 00000000000000ce RSI: 00000000000000ce RDI: 0000000100882cce
RBP: ffff8800b9a8fe30 R08: 0000000000000202 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000100882ccd
R13: 7fffffffffffffff R14: ffff8800b95a8000 R15: 0000000100882ccd
FS:  0000000000000000(0000) GS:ffff8800bb620000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0715ae8 CR3: 00000000b64d8000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8800b9f5e780 0000000000000000 ffff8800b9a8dfd8 ffff8800b9a8fe10
 ffff8800b9a8fe48 20cc1170855d3261 ffff8800bb62dbc0 00000863b6f08300
 0000000000000001 ffff8800bb62cf00 0000000100882ccd ffff8800b9a8fe88
Call Trace:
 [<ffffffff810e0978>] tick_nohz_stop_sched_tick+0x1e8/0x2e0
 [<ffffffff8101cd15>] ? native_sched_clock+0x35/0x80
 [<ffffffff810e0b0e>] __tick_nohz_idle_enter+0x9e/0x150
 [<ffffffff810e102d>] tick_nohz_idle_enter+0x3d/0x70
 [<ffffffff810d615e>] cpu_startup_entry+0x9e/0x290
 [<ffffffff810475fa>] start_secondary+0x1ba/0x230
Code: 18 49 8b 7e 10 48 39 cf 48 89 ca 78 5a 40 0f b6 d7 89 d6 48 63 c6 48 c1
e0 04 49 8d 0c 06 48 8b 41 28 48 83 c1 28 48 39 c8 74 0e <f6> 40 18 01 74 23
48 8b 00 48 39 c8 75 f2 83 c6 01 40 0f b6 f6
RIP  [<ffffffff8108e6a7>] get_next_timer_interrupt+0x97/0x270
 RSP <ffff8800b9a8fdd8>
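
The register dump above can be cross-checked with a few lines of Python. This
is only a minimal sketch: the module-space bounds are an assumption based on
the usual x86_64 layout for kernels of this era (they are not read from this
vmcore), and the helper names are illustrative. It shows that the faulting
address (CR2) is RAX + 0x18, which matches the `<f6> 40 18 01` bytes in the
Code line (a byte test at offset 0x18 from RAX), and that the address falls
inside module space. That is consistent with the "[last unloaded:
openvswitch]" note, but it does not by itself prove the cause.

```python
import re

# Assumption: x86_64 module mapping space for this kernel generation.
# These bounds are hard-coded for illustration, not taken from the vmcore.
MODULES_VADDR = 0xFFFFFFFFA0000000
MODULES_END = 0xFFFFFFFFFF000000


def parse_oops(text):
    """Extract the faulting address (CR2) and RAX from an oops dump."""
    cr2 = re.search(r"CR2:\s*([0-9a-f]{16})", text)
    rax = re.search(r"RAX:\s*([0-9a-f]{16})", text)
    return int(cr2.group(1), 16), int(rax.group(1), 16)


def in_module_space(addr):
    """True if addr lies in the assumed module mapping range."""
    return MODULES_VADDR <= addr < MODULES_END


# Two lines copied from the backtrace above.
oops = """
RAX: ffffffffa0715ad0 RBX: 00000863b6f08300 RCX: ffff8800b95a8d08
CR2: ffffffffa0715ae8 CR3: 00000000b64d8000 CR4: 00000000003407e0
"""

cr2, rax = parse_oops(oops)
print("offset from RAX:", hex(cr2 - rax))
print("in module space:", in_module_space(cr2))
```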


Wang Zhike

-----Original Message-----
From: Greg Rose [mailto:[email protected]] 
Sent: June 27, 2017 6:26
To: 王志克
Cc: [email protected]; Joe Stringer
Subject: Re: [ovs-dev] Re: Re: [PATCH] pkt reassemble: fix kernel panic for ovs 
reassemble

On 06/26/2017 04:56 AM, 王志克 wrote:
> Hi Joe,
>
> I will look into how to send the patch properly, probably tomorrow, since
> I am quite busy now.
>
> Regarding the crash, I can reproduce it even with official OVS, such as
> OVS 2.6.0 (I just run check-kmod in a loop until the kernel panics), so it
> is not related to the new fix.
>
> Br,
> Wang Zhike
I've been running 'make check-kmod' in a continuous loop on 3 virtual machines 
since this morning.  So far no kernel splats but plenty of errors:

This is on the Ubuntu machine running 4.0 kernel:

ERROR: 66 tests were run,
24 failed unexpectedly.
23 tests were skipped.
## -------------------------------------- ##
## system-kmod-testsuite.log was created. ##
## -------------------------------------- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

    To: <[email protected]>
    Subject: [openvswitch 2.7.90] system-kmod-testsuite: 16 17 35 57 58 59
    60 61 62 63 70 71 72 75 76 81 82 83 84 85 86 87 88 89 failed

CentOS 7.2 running 4.9.24 kernel:

## ------------- ##
## Test results. ##
## ------------- ##

ERROR: 76 tests were run,
34 failed unexpectedly.
13 tests were skipped.
## -------------------------------------- ##
## system-kmod-testsuite.log was created. ##
## -------------------------------------- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

    To: <[email protected]>
    Subject: [openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 23
    24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84
    85 86 87 failed

CentOS 7.2 running 4.10.17 kernel:

## ------------- ##
## Test results. ##
## ------------- ##

ERROR: 74 tests were run,
34 failed unexpectedly.
15 tests were skipped.
## -------------------------------------- ##
## system-kmod-testsuite.log was created. ##
## -------------------------------------- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

    To: <[email protected]>
    Subject: [openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 23
    24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84
    85 86 87 failed
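
As a rough aid for comparing the three runs above, here is a minimal Python
sketch that pulls the failed test numbers out of the Subject lines and
intersects them. The function name is illustrative, and the comparison
assumes that a given test number refers to the same test on every machine
(all three runs report openvswitch 2.7.90, but that is not verified here).

```python
import re


def failed_tests(subject):
    """Return the set of failed test numbers named in a testsuite Subject line."""
    match = re.search(r"system-kmod-testsuite:\s*((?:\d+\s+)+)failed", subject)
    if match is None:
        return set()
    return {int(n) for n in match.group(1).split()}


# Subject lines copied from the runs reported above.
ubuntu_4_0 = failed_tests(
    "[openvswitch 2.7.90] system-kmod-testsuite: 16 17 35 57 58 59 60 61 62 "
    "63 70 71 72 75 76 81 82 83 84 85 86 87 88 89 failed")
# The 4.9.24 and 4.10.17 CentOS runs reported the same list of failures.
centos = failed_tests(
    "[openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 23 24 25 "
    "26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 "
    "86 87 failed")

common = ubuntu_4_0 & centos
print("failed on all machines:", sorted(common))
print("Ubuntu only:", sorted(ubuntu_4_0 - centos))
print("CentOS only:", sorted(centos - ubuntu_4_0))
```

Tests that fail on every machine are the more likely candidates for a problem
in the testsuite or the module itself, while the machine-specific ones may
depend on the kernel version or the local setup.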

I confess to not spending a lot of time running check-kmod.  I certainly intend 
to in the future.

- Greg

>
> -----Original Message-----
> From: Joe Stringer [mailto:[email protected]]
> Sent: June 24, 2017 5:15
> To: 王志克
> Cc: [email protected]
> Subject: Re: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs 
> reassemble
>
> Hi Wang Zhike,
>
> I'd like it if others, such as Greg, could take a look as well, since this
> code is delicate. The more review it gets, the better. It seems that the
> copy of your email that reached the list did not include the attachment.
> Perhaps you could try sending the patch using git send-email, or put the
> patch on GitHub and link to it here.
>
> For what it's worth, I did run your patch for a while and it seemed 
> OK, but when I tried again today on an Ubuntu Trusty (Linux
> 3.13.0-119-generic) box, running make check-kmod, I saw an issue with
> get_next_timer_interrupt():
>
> [181250.892557] BUG: unable to handle kernel paging request at ffffffffa03317e0
> [181250.892557] IP: [<ffffffff81079606>] get_next_timer_interrupt+0x86/0x250
> [181250.892557] PGD 1c11067 PUD 1c12063 PMD 1381a2067 PTE 0
> [181250.892557] Oops: 0000 [#1] SMP
> [181250.892557] Modules linked in: nf_nat_ipv6 nf_nat_ipv4 nf_nat gre(-)
> nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4
> nf_conntrack_netlink nfnetlink nf_conntrack bonding 8021q garp stp mrp llc
> veth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt kvm_intel
> kvm serio_raw netconsole configfs crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
> ablk_helper cryptd psmouse floppy ahci libahci [last unloaded: libcrc32c]
> [181250.892557] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OX 3.13.0-119-generic #166-Ubuntu
> [181250.892557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [181250.892557] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
> [181250.892557] RIP: 0010:[<ffffffff81079606>]  [<ffffffff81079606>] get_next_timer_interrupt+0x86/0x250
> [181250.892557] RSP: 0018:ffffffff81c01e00  EFLAGS: 00010002
> [181250.892557] RAX: ffffffffa03317c8 RBX: 0000000102b245da RCX: 00000000000000db
> [181250.892557] RDX: ffffffff81ebac58 RSI: 00000000000000db RDI: 0000000102b245db
> [181250.892557] RBP: ffffffff81c01e48 R08: 0000000000c88c1c R09: 0000000000000000
> [181250.892557] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000142b245d9
> [181250.892557] R13: ffffffff81eb9e80 R14: 0000000102b245da R15: 0000000000cd63e8
> [181250.892557] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [181250.892557] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [181250.892557] CR2: ffffffffa03317e0 CR3: 000000003707f000 CR4: 00000000000006f0
> [181250.892557] Stack:
> [181250.892557]  0000000000000000 ffffffff81c01e30 ffffffff810a3af5 ffff88013fc13bc0
> [181250.892557]  ffff88013fc0dce0 0000000102b245da 0000000000000000 00000063ae154000
> [181250.892557]  0000000000cd63e8 ffffffff81c01ea8 ffffffff810da655 0000a4d8c2cb6200
> [181250.892557] Call Trace:
> [181250.892557]  [<ffffffff810a3af5>] ? set_next_entity+0x95/0xb0
> [181250.892557]  [<ffffffff810da655>] tick_nohz_stop_sched_tick+0x1e5/0x340
> [181250.892557]  [<ffffffff810da851>] __tick_nohz_idle_enter+0xa1/0x160
> [181250.892557]  [<ffffffff810dab4d>] tick_nohz_idle_enter+0x3d/0x70
> [181250.892557]  [<ffffffff810c2af7>] cpu_startup_entry+0x87/0x2b0
> [181250.892557]  [<ffffffff8171b387>] rest_init+0x77/0x80
> [181250.892557]  [<ffffffff81d34f6a>] start_kernel+0x432/0x43d
> [181250.892557]  [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
> [181250.892557]  [<ffffffff81d34120>] ? early_idt_handler_array+0x120/0x120
> [181250.892557]  [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
> [181250.892557]  [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
> [181250.892557] Code: 8b 7d 10 4d 8b 75 18 4c 39 f7 78 5c 40 0f b6 cf
> 89 ce 48 63 c6 48 c1 e0 04 49 8d 54 05 00 48 8b 42 28 48 83 c2 28 48
> 39 d0 74 0e <f6> 40 18 01 74 24 48 8b 00 48 39 d0 75 f2 83 c6 01 40 0f
> b6 f6
> [181250.892557] RIP  [<ffffffff81079606>] get_next_timer_interrupt+0x86/0x250
> [181250.892557]  RSP <ffffffff81c01e00>
> [181250.892557] CR2: ffffffffa03317e0
>
> It seems that a fragment timer registered by OVS may still be pending
> when the OVS module is unloaded, so the kernel may attempt to clean
> up an entry using OVS code after that code has been unloaded.
> This might be related to the IPv6 cvlan test, since that seems to be
> where my VM froze and went to 100% CPU, but I would think that the
> IPv6 fragmentation cleanup test is more likely to cause this, since it
> leaves fragments behind in the cache after the test finishes. I've only hit
> this when running all of the tests in make check-kmod.
>
> Cheers,
> Joe
>
> On 22 June 2017 at 17:53, 王志克 <[email protected]> wrote:
> > Hi Joe,
> >
> > Please check the attachment. Thanks.
> >
> > Br,
> > Wang Zhike
> >
> > -----Original Message-----
> > From: Joe Stringer [mailto:[email protected]]
> > Sent: June 23, 2017 8:20
> > To: 王志克
> > Cc: [email protected]
> > Subject: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs 
> > reassemble
> >
> > On 21 June 2017 at 18:54, 王志克 <[email protected]> wrote:
> >> OVS and the kernel stack add frag_queue entries to the same
> >> netns_frags list. As a result, OVS and the kernel may access a
> >> frag_queue without holding the correct lock. Also, struct ipq may
> >> differ on kernels older than 4.3, which leads to invalid pointer
> >> accesses.
> >>
> >> The fix creates a dedicated netns_frags for OVS.
> >>
> >> Signed-off-by: wangzhike <[email protected]>
> >> ---
> >
> > Hi,
> >
> > It looks like the whitespace has been corrupted in this version of the 
> > patch that you sent, so I cannot apply it. Probably your email client 
> > mangles it when sending the email out. A reliable way to send patches 
> > correctly via email is to use the command-line client 'git send-email'. This 
> > is the preferred method. If you are unable to set that up, consider 
> > attaching the patch to the email (or sending a pull request on GitHub).
> >
> > Cheers,
> > Joe
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
