On Sun, Jan 8, 2017 at 12:18 AM, Uri Foox <[email protected]> wrote:

> Hi Pravin,
>
> Thanks. Does this mean it is a confirmed bug?
>
>
Yes, there is at least one bug in the OVS vport GRE implementation.


> How would I be able to get the patch and install it into our environment?
>

The easiest way would be to compile OVS 2.5 or 2.6. But I have also posted a
patch on the netdev mailing list for the upstream kernel; you can use that for
the 3.13 kernel.

https://patchwork.ozlabs.org/patch/712373/
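
As a rough illustration of the linear-data expectation discussed in the quoted
thread below, here is a small userspace Python model. It is not kernel code and
it is not the patch linked above: ToySkb, may_pull() and parse_ethertype() are
hypothetical names that only mimic the idea behind skb_headlen() and
pskb_may_pull(), assuming a buffer made of a linear head plus trailing
fragments.

```python
ETH_HLEN = 14  # Ethernet header: 6-byte dst MAC + 6-byte src MAC + 2-byte EtherType


class ToySkb:
    """Hypothetical stand-in for a socket buffer: a linear head plus fragments."""

    def __init__(self, linear, frags=()):
        self.linear = bytearray(linear)
        self.frags = [bytearray(f) for f in frags]

    def headlen(self):
        # Number of bytes that can be read directly from the linear area.
        return len(self.linear)

    def total_len(self):
        return len(self.linear) + sum(len(f) for f in self.frags)

    def may_pull(self, n):
        """Make the first n bytes linear if the packet is long enough."""
        if n <= len(self.linear):
            return True
        if n > self.total_len():
            return False  # packet is genuinely too short
        need = n - len(self.linear)
        while need > 0:
            frag = self.frags[0]
            take = min(need, len(frag))
            self.linear += frag[:take]  # copy bytes into the linear head
            del frag[:take]
            if not frag:
                self.frags.pop(0)
            need -= take
        return True


def parse_ethertype(skb):
    """Read the EtherType only after checking the header is linear."""
    if not skb.may_pull(ETH_HLEN):
        return None  # drop instead of reading past the linear data
    return int.from_bytes(skb.linear[12:14], "big")


if __name__ == "__main__":
    # 10 linear bytes, the rest of the Ethernet header sits in a fragment.
    skb = ToySkb(bytes(10), [bytes(2) + b"\x08\x00" + bytes(20)])
    print(skb.headlen(), hex(parse_ethertype(skb)), skb.headlen())
```

In this toy model a parser that skipped the may_pull() step would see only 10
linear bytes, fewer than ETH_HLEN, which is the kind of situation described in
the thread.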

Thanks,
Pravin.


> Thanks,
> Uri
>
>
> On Sat, Jan 7, 2017 at 1:01 PM, Pravin Shelar <[email protected]> wrote:
>
>> Thanks for all the investigation.
>>
>> On Sat, Jan 7, 2017 at 12:57 AM, Joe Stringer <[email protected]> wrote:
>> >
>> >
>> > On 5 January 2017 at 19:24, Uri Foox <[email protected]> wrote:
>> >>
>> >> Hey Joe,
>> >>
>> >> Thank you so much for responding! After 10 days of trying to figure this
>> >> out I'm at a loss.
>> >>
>> >> root@node-8:~# modinfo openvswitch
>> >> filename:       /lib/modules/3.13.0-106-generic/kernel/net/openvswitch/openvswitch.ko
>> >> license:        GPL
>> >> description:    Open vSwitch switching datapath
>> >> srcversion:     94294A72258BA583D666607
>> >> depends:        libcrc32c,vxlan,gre
>> >> intree:         Y
>> >
>> >
>> > ^ intree - that is, the version that comes with this kernel.
>> >
>> >>
>> >> vermagic:       3.13.0-106-generic SMP mod_unload modversions
>> >>
>> >>
>> >> Everything you've mentioned is what I've understood so far, including the
>> >> line of code that's triggered. That is what led me to upgrade the kernel to
>> >> 3.13.0-106, because it claims that the CHECKSUM problems are fixed, which I
>> >> thought might be related. Guess not.
>> >
>> >
>> > I forgot to actually look through those before, but the call chain looks a
>> > bit different there so I thought it may be a different issue altogether.
>> >
>> >>
>> >> You're saying that skb_headlen is too short for the ethernet header. Do
>> >> you know what would cause this? This hardware configuration has been running
>> >> for 400+ days of uptime with no errors or problems, and this suddenly started
>> >> to happen, and no matter how many times we reboot things it doesn't go away.
>> >> I assume given your interpretation we should try to restart the switches
>> >> connected to the servers. Is there any way to log what packet is causing
>> >> this issue? Perhaps that would provide more insight?
>> >
>> >
>> > One thing is that it depends on the packets and how they arrive. I'm not too
>> > familiar with this code, but I could imagine a situation where the IP+GRE
>> > packet gets fragmented, causing a single inner frame to be split across
>> > multiple GRE packets. Then, when Linux receives the two separate packets,
>> > there would be some point in the stack responsible for stitching these
>> > packets back together; but it may not put them into a single contiguous
>> > buffer. If this is subsequently decapped for local delivery of the inner
>> > frame, then perhaps there is less than an ethernet header's worth of packet
>> > in the first of these buffers. It seems unlikely that packets would be
>> > deliberately fragmented like this, but if anyone had access to your
>> > underlying network then they could throw any kind of packet they want at
>> > your server.
>> >
>> > There may be another, more likely, explanation - CC Pravin in case he has
>> > any ideas.
>> >
>> >>
>> >> As far as 4.4/newer kernel - I wish. I tried to go that far up but Ubuntu
>> >> wouldn't even boot. The best I could do is 3.13.0-106. I'll try to report it
>> >> over there as well.
>> >
>> >
>> > That's too bad.
>> >
>> > FWIW, I see a check for pskb_may_pull() in the outer gre_rcv() function, which
>> > would check on the whole GRE packet. This is then passed to gre_cisco_rcv(),
>> > which does the decap and calls through to the OVS gre_rcv() function. At a
>> > glance, following the OVS gre_rcv(), I didn't see another pskb_may_pull()
>> > check for the inner packet. By the time it gets to ovs_flow_extract(),
>> > there's an expectation that this call was made, but I'm really not sure who
>> > was supposed to make that check. Also, it should be ETH_HLEN, which is 14,
>> > not 12.
>> >
>> Right. OVS does expect the eth-header to already be in the skb linear data.
>> For tunnel packets this is done in iptunnel_pull_header(), which is called
>> for all packets received in the GRE module.
>>
>> http://lxr.free-electrons.com/source/net/ipv4/ip_tunnel_core.c?v=3.13#L96
>>
>> But the skb eth-header is only pulled for GRE-TAP packets, not for
>> IP-GRE. A change in the network could have introduced these IP-GRE
>> packets that caused the crash.
>>
>> This bug does not exist in the out-of-tree kernel module that comes with
>> OVS 2.5 and newer. So upgrading the OVS kernel module to 2.5 should solve
>> the problem.
>>
>> I will send out a patch for the older OVS kernel module.
>>
>> > Outer gre_rcv():
>> > http://lxr.free-electrons.com/source/net/ipv4/gre_demux.c?v=3.13#L270
>> >
>> > Inner gre_rcv():
>> > http://lxr.free-electrons.com/source/net/openvswitch/vport-gre.c?v=3.13#L92
>>
>
>
>
> --
> Uri Foox | Zoey | Founder
> http://www.zoey.com
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
