On 7/2/24 08:04, Chris Mi wrote:
> On 6/27/2024 1:24 PM, Chris Mi wrote:
>> On 6/18/2024 2:27 PM, Chris Mi wrote:
>>> On 5/16/2024 5:09 AM, Ilya Maximets wrote:
>>>> On 3/26/24 03:30, Chris Mi wrote:
>>>>> This patch set adds offload support for sFlow.
>>>>>
>>>>> Psample is a genetlink channel for packet sampling. TC action 
>>>>> act_sample
>>>>> uses psample to send sampled packets to userspace.
>>>>>
>>>>> When offloading sample action to TC, userspace creates a unique ID to
>>>>> map sFlow action and tunnel info and passes this ID to kernel instead
>>>>> of the sFlow info. psample will send this ID and sampled packet to
>>>>> userspace. Using the ID, userspace can recover the sFlow info and send
>>>>> sampled packet to the right sFlow monitoring host.
>>>>
>>>> Hi, Chris, others.
>>>>
>>>> I've been looking through the set and a psample in general for the past
>>>> few days.  There are few fairly minor issues in the set like UBsan crash
>>>> because of incorrect OVS_RETURNS_NONNULL annotation and a few style and
>>>> formatting issues, which are not hard to fix.  However, I encountered
>>>> an issue with a way tunnel metadata is recovered that I'm not sure we
>>>> can actually fix.
>>>>
>>>> The algorithm of work in this patch set is described in the cover letter
>>>> above.  The key point is then packet-1 goes though MISS upcall we 
>>>> install
>>>> a new datapath flow into TC and we remember the tunnel metadata of the
>>>> packet-1 associated with a sample ID.  When the packet-2 hits the 
>>>> datapath
>>>> flow (tc actions), it gets sent to psample and ovs-vswitchd reads this
>>>> packet from psample netlink socket and presents it as an ACTION upcall.
>>>> Doing that it finds tunnel metadata using the sample ID and uses it as a
>>>> tunnel metadata for packet-2.  The key of the problem is that this is
>>>> metadata from packet-1 and it can be different from metadata of 
>>>> packet-2.
>>>>
>>>> An example of this will be having two separate TCP streams going through
>>>> the same VxLAN tunnel.  For load balancing reasons most UDP tunnels
>>>> choose source port of the outer UDP header based on RSS or 5-tuple hash
>>>> of the inner packet.  This means that two TCP streams going through the
>>>> same tunnel will have different UDP source port in the outer header.
>>>>
>>>> When a packet from a first stream hits a MISS upcall, datapath flow will
>>>> be installed and the sample ID will be associated with the metadata of
>>>> the first stream.  Chances are this datapath flow will not have exact
>>>> match on the source port of the tunnel metadata.  That means that a
>>>> packet from the second stream will successfully match on the installed
>>>> datapath flow and will go to psample with the ID of the first stream.
>>>> OVS will read it from the psample socket and send to sFlow collector
>>>> with metadata it found by the ID, i.e. with a tunnel metadata that
>>>> contains tunnel source port of the first TCP stream, which is incorrect.
>>>>
>>>> The test below reproduces the issue.  It's not a real test, it always 
>>>> fails
>>>> on purpose and not very pretty, but what it does is that it creates a
>>>> tunnel, then starts netcat server on one side and sends two files to it
>>>> from the other side.  These two transfers are separate TCP 
>>>> connections, so
>>>> they use different source ports, that is causing the tunnel to choose
>>>> different UDP source ports for them.  After the test failure we can 
>>>> see in
>>>> the sflow.log that all the values for 'tunnel4_in_src_port' are the same
>>>> for both streams and some exchanges prior to that.  However, if offload
>>>> is disabled, the values will be different as they should.
>>>>
>>>> So, unless we can ensure that packets from different streams do not 
>>>> match
>>>> on the same datapath flow, this schema doesn't work.
>>>>
>>>> The only way to ensure that packets from different streams within the 
>>>> same
>>>> tunnel do not match on the same datapath flow is to always have an exact
>>>> match on the whole tunnel metadata.  But I don't think that is a good 
>>>> idea,
>>>> because this way we'll have a separate datapath flow per TCP stream, 
>>>> which
>>>> will be very bad for performance.  And it will be bad for hardware 
>>>> offload
>>>> since hardware NICs normally have a more limited amount of available
>>>> resources.
>>>>
>>>> This leads us to conclusion that only non-tunneling traffic can be 
>>>> offloaded
>>>> this way.  For this to work we'll have to add an explicit match on 
>>>> tunnel
>>>> metadata being empty (can that be expressed in TC?)
>>>>
>>>> But at this point a question arises if this even worth implementing, 
>>>> taking
>>>> into account that it requires a lot of infrastructure changes 
>>>> throughout all
>>>> the layers of OVS code just for sampling of non-tunnel packets, i.e. 
>>>> a very
>>>> narrow use case.
>>>>
>>>> Also, my understanding was that there is a plan to offload other 
>>>> types of
>>>> userspace() action in the future, such as controller() action using 
>>>> the same
>>>> psample infrastructure.  But that will not be possible for the same 
>>>> reason.
>>>> In addition to tunnel metadata we will also have to provide 
>>>> connection tracking
>>>> information, which is even harder, because conntrack state is more 
>>>> dynamic and
>>>> is only actually known to the datapath.  psample can't supply this 
>>>> information
>>>> to the userspace.
>>>>
>>>> Thoughts?
>>>
>>> Hi Ilya,
>>>
>>> Sorry for the late reply.
>>>
>>> I applied your following diff and reproduced the issue you found.
>>> This seems like a limitation we can't resolve currently. But as you
>>> pointed out if customer is interested in the source port of the tunnel,
>>> maybe they can have a full match to have it. And maybe not every
>>> customer is interested in it.
>>>
>>> We have been working on this feature for several years. And the design
>>> is changed several times. We don't want to miss it for a limitation.
>>> Can we merge it with this limitation? What do you think?
>>>
>>> Thanks,
>>> Chris
>>
>> Hi Ilya,
>>
>> Could we know what's your plan for this patchset? Any suggestion?
>>
>> Thanks,
>> Chris
> 
> Hi Ilya,
> 
> I'm wondering if you have a chance to take a look?

Hi, Chris.

It's true that a lot of time and effort went into this patch set from
both development and review sides, and I wish we caught this issue
much earlier.  However, now that I understand all the shortcomings of
the design, I don't think we should add all this infrastructure just
for limited sampling offload.

The main idea of this patch set was that we'll be able to use psample
as more or less transparent replacement for normal OVS upcall mechanism.
Unfortunately, that's not the case.  Psample is not suitable for that
as it can't carry many different types of metadata that we need in OVS.
And it actually should not carry most of that information as it doesn't
fit the intended purpose of psample.

For the tunnel sampling, I don't think it's reasonable to add exact
match on all the tunnel metadata fields, mostly for performance reasons.
Such a setup will not be suitable for any reasonably high-traffic setup,
i.e. most likely not suitable for production use.  Getting changes as
they are and reporting incorrect values to controller is not a good
option either.

Turning off tunnel support for sampling leaves us with very narrow use
case for which the introduced infrastructure is way too big, IMO.
Especially since sampling can be achieved even without involving OVS
on the collection side.  If we can get packets to psample, they can be
taken from there by a separate application and sent to sFlow collector
or anywhere else.  I also found that you added tunnel metadata reporting
to psample itself a few years ago, so the tunnel information reported
there should be more accurate in contrast to what this patch set provides.

Adrian is currently working on local sampling support via psample [1].
We can add offloading of this action via act_sample, so this sampling
method can be offloaded if hardware supports it.  This might be a better
approach for sampling offload.  There are a few issues to solve here
like allow offloading of the no-op gact action, so we can keep UFID
information in there and free up act_sample's action cookie for sampling
information (obs IDs), but all that seems doable.

[1] https://lore.kernel.org/netdev/[email protected]/

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to