On Sun, Jan 8, 2017 at 12:18 AM, Uri Foox <[email protected]> wrote:
> Hi Pravin,
>
> Thanks. Does this mean it is a confirmed bug?
>
Yes, there is at least one bug in the OVS vport GRE implementation.

> How would I be able to get the patch and install it into our environment?
>
The easiest way would be to compile OVS 2.5 or 2.6. I have also posted a
patch for the upstream kernel on the netdev mailing list; you can use that
for the 3.13 kernel:
https://patchwork.ozlabs.org/patch/712373/

Thanks,
Pravin.

> Thanks,
> Uri
>
>
> On Sat, Jan 7, 2017 at 1:01 PM, Pravin Shelar <[email protected]> wrote:
>
>> Thanks for all investigation.
>>
>> On Sat, Jan 7, 2017 at 12:57 AM, Joe Stringer <[email protected]> wrote:
>> >
>> >
>> > On 5 January 2017 at 19:24, Uri Foox <[email protected]> wrote:
>> >>
>> >> Hey Joe,
>> >>
>> >> Thank you so much for responding! After 10 days of trying to figure this
>> >> out I'm at a loss.
>> >>
>> >> root@node-8:~# modinfo openvswitch
>> >> filename: /lib/modules/3.13.0-106-generic/kernel/net/openvswitch/openvswitch.ko
>> >> license: GPL
>> >> description: Open vSwitch switching datapath
>> >> srcversion: 94294A72258BA583D666607
>> >> depends: libcrc32c,vxlan,gre
>> >> intree: Y
>> >
>> >
>> > ^ intree - that is, the version that comes with this kernel.
>> >
>> >>
>> >> vermagic: 3.13.0-106-generic SMP mod_unload modversions
>> >>
>> >>
>> >> Everything you've mentioned is what I've understood so far, including the
>> >> line of code that's triggered. That is what led me to upgrade the kernel to
>> >> 3.13.0-106, because it claims that the CHECKSUM problems are fixed, which I
>> >> thought might be related; guess not.
>> >
>> >
>> > I forgot to actually look through those before, but the call chain looks a
>> > bit different there so I thought it may be a different issue altogether.
>> >
>> >>
>> >> You're saying that skb_headlen is too short for the ethernet header. Do
>> >> you know what would cause this? This hardware configuration has been running
>> >> for 400+ days of uptime with no errors or problems and this suddenly started
>> >> to happen and no matter how many times we reboot things it doesn't go away.
>> >> I assume given your interpretation we should try to restart the switches
>> >> connected to the servers. Is there any way to log what packet is causing
>> >> this issue? Perhaps that would provide more insight?
>> >
>> >
>> > One thing is that it depends on the packets and how they arrive. I'm not too
>> > familiar with this code, but I could imagine a situation where the IP+GRE
>> > packet gets fragmented, causing a single inner frame to be split across
>> > multiple GRE packets. Then, when Linux receives the two separate packets,
>> > there would be some point in the stack responsible for stitching these
>> > packets back together; but it may not put them into a single contiguous
>> > buffer. If this is subsequently decapped for local delivery of the inner
>> > frame, then perhaps there is less than an ethernet header's worth of packet
>> > in the first of these buffers. It seems unlikely that packets would be
>> > deliberately fragmented like this, but if anyone had access to your
>> > underlying network then they could throw any kind of packet they want to
>> > your server.
>> >
>> > There may be another, more likely, explanation - CC Pravin in case he has
>> > any ideas.
>> >
>> >>
>> >> As far as 4.4/newer kernel - I wish. I tried to go that far up but Ubuntu
>> >> wouldn't even boot. The best I could do is 3.13.0-106. I'll try to report it
>> >> over there as well.
>> >
>> >
>> > That's too bad.
>> >
>> > FWIW, I see a check for pskb_may_pull() in the outer gre_rcv function, which
>> > would check on the whole GRE packet. This is then passed to gre_cisco_rcv()
>> > which does the decap and calls through to the OVS gre_rcv() function. At a
>> > glance, following the OVS' gre_rcv() I didn't see another pskb_may_pull()
>> > check for the inner packet. By the time it gets to ovs_flow_extract(),
>> > there's an expectation that this call was made but I'm really not sure who
>> > was supposed to make that check.
>> > Also, it should be ETH_HLEN, which is 14, not 12.
>> >
>> Right. OVS does expect the eth-header to already be in skb linear data.
>> It is done in iptunnel_pull_header() for tunnel packets. This function is
>> called for all packets received in the GRE module.
>>
>> http://lxr.free-electrons.com/source/net/ipv4/ip_tunnel_core.c?v=3.13#L96
>>
>> But the skb eth-header is only pulled for GRE-TAP packets, not for
>> IP-GRE. The change in network could have introduced these IP-GRE
>> packets that caused the crash.
>>
>> This bug does not exist in the out-of-tree kernel module that comes with
>> OVS 2.5 and newer. So upgrading the OVS kernel module to 2.5 should solve
>> the problem.
>>
>> I will send out a patch for the older OVS kernel module.
>>
>> > Outer gre_rcv():
>> > http://lxr.free-electrons.com/source/net/ipv4/gre_demux.c?v=3.13#L270
>> >
>> > Inner gre_rcv():
>> > http://lxr.free-electrons.com/source/net/openvswitch/vport-gre.c?v=3.13#L92
>> >
>
>
>
> --
> Uri Foox | Zoey | Founder
> http://www.zoey.com
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
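
The missing check discussed in this thread (making sure an Ethernet header's worth of the inner packet is in the linear buffer before parsing it) can be illustrated with a small userspace model. This is only a sketch and not the posted patch or kernel code: `struct toy_skb`, `toy_may_pull()` and `toy_parse_inner_eth()` are hypothetical names invented here, and the model only tracks lengths instead of moving real packet data the way the kernel's pskb_may_pull() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14 /* Ethernet header: 6 dst + 6 src + 2 ethertype bytes */

/* Hypothetical stand-in for struct sk_buff: only the lengths that matter. */
struct toy_skb {
    size_t len;     /* total packet bytes (linear area + fragments) */
    size_t headlen; /* bytes currently available in the linear area */
};

/*
 * Models the contract of a "may pull" check: succeed if at least `need`
 * bytes are linear, or can be made linear because the packet is long
 * enough. The real kernel helper copies data out of fragments; this toy
 * version only updates the counter.
 */
static bool toy_may_pull(struct toy_skb *skb, size_t need)
{
    if (need <= skb->headlen)
        return true;          /* already linear, nothing to do */
    if (need > skb->len)
        return false;         /* packet is simply too short */
    skb->headlen = need;      /* pretend we linearized the bytes */
    return true;
}

/*
 * Hypothetical receive step for a decapsulated inner frame: refuse to
 * parse the Ethernet header unless ETH_HLEN bytes are linear.
 * Returns 0 on success, -1 if the packet should be dropped.
 */
static int toy_parse_inner_eth(struct toy_skb *skb)
{
    if (!toy_may_pull(skb, ETH_HLEN))
        return -1;
    /* Safe to read the 14-byte Ethernet header from linear data here. */
    return 0;
}
```

In this model, a 60-byte packet with only 10 linear bytes passes the check after being "pulled" up to 14 bytes, while an 8-byte runt is dropped instead of being parsed past the end of its buffer.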
