Hi Yuanhan

We discussed the RTE_FLOW_ACTION_TYPE_RSS action in the dpdk meeting.
There is related discussion, with details, in this thread.
It seems the RTE_FLOW_ACTION_TYPE_RSS action can be supported by the drivers.
If this is the case, then:
1/ We could program only one try with this action.
2/ Masked flows would be distributed to different PMDs, as is otherwise true today.
3/ Since multiple PMDs could see the same flow, the mark would be global across PMDs.
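If the drivers do support it, the rule for ‘1’ above would pair the mark with an RSS action instead of a queue action. Here is a minimal C sketch of the idea; it is illustrative only: the helper name is hypothetical, the layout of struct rte_flow_action_rss differs between DPDK releases, and filling it from the port's global RSS configuration is left schematic.

```c
#include <string.h>
#include <rte_flow.h>

/* Sketch only: request MARK together with RSS instead of MARK plus
 * QUEUE.  The layout of struct rte_flow_action_rss differs between
 * DPDK releases, so filling it from the port's global RSS
 * configuration is left schematic here. */
static struct rte_flow *
try_mark_with_rss(uint16_t port_id, const struct rte_flow_attr *attr,
                  const struct rte_flow_item pattern[],
                  uint32_t mark_id, struct rte_flow_error *error)
{
    struct rte_flow_action_mark mark = { .id = mark_id };
    struct rte_flow_action_rss rss;

    memset(&rss, 0, sizeof rss);
    /* ... fill 'rss' from the device's existing global RSS
     * configuration so the distribution is a NOP relative to what
     * normal RSS does today ... */

    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_RSS,  .conf = &rss  },
        { .type = RTE_FLOW_ACTION_TYPE_END,  .conf = NULL  },
    };

    return rte_flow_create(port_id, attr, pattern, actions, error);
}
```

The intent is that the RSS action mirrors what the device already does globally, so packet distribution is unchanged while the MARK action rides along.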

Can you check if this can work?

Thanks Darrell

On 9/15/17, 12:55 PM, "Finn Christensen" <f...@napatech.com> wrote:

    
    
        -----Original Message-----
        From: Darrell Ball [mailto:db...@vmware.com]
        Sent: 15. september 2017 20:09
        To: Finn Christensen <f...@napatech.com>
        Cc: d...@openvswitch.org
        Subject: Re: [ovs-dev] [PATCH v2 6/8] netdev-dpdk: retry with queue action
    
    
    
        On 9/15/17, 2:16 AM, "Finn Christensen" <f...@napatech.com> wrote:
    
    
    
                -----Original Message-----
                From: Darrell Ball [mailto:db...@vmware.com]
                Sent: 14. september 2017 21:35
                To: Finn Christensen <f...@napatech.com>
                Cc: d...@openvswitch.org
                Subject: Re: [ovs-dev] [PATCH v2 6/8] netdev-dpdk: retry with queue action
    
    
    
                On 9/14/17, 1:14 AM, "Finn Christensen" <f...@napatech.com> wrote:
    
    
    
    
    
                    -----Original Message-----
    
                    From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Darrell Ball
    
                    Sent: 13. september 2017 18:18
    
                    To: Simon Horman <simon.hor...@netronome.com>
    
                    Cc: d...@openvswitch.org
    
                    Subject: Re: [ovs-dev] [PATCH v2 6/8] netdev-dpdk: retry with queue action
    
    
    
    
    
    
    
                    On 9/13/17, 2:57 AM, "Simon Horman" <simon.hor...@netronome.com> wrote:
    
    
    
                        On Tue, Sep 12, 2017 at 08:36:19AM +0000, Darrell Ball wrote:
    
                        >
    
                        > On 9/10/17, 11:14 PM, "ovs-dev-boun...@openvswitch.org on behalf of Yuanhan Liu" <ovs-dev-boun...@openvswitch.org on behalf of y...@fridaylinux.org> wrote:
    
                        >
    
                        >     > On Fri, Sep 08, 2017 at 06:48:50PM +0200, Simon Horman wrote:
    
                        >     > > On Tue, Sep 05, 2017 at 05:22:59PM +0800, Yuanhan Liu wrote:
    
                        >     > > From: Finn Christensen <f...@napatech.com>
                        >     > >
                        >     > > AFAIK, most (if not all) NICs (including Mellanox and Intel) do
                        >     > > not support a pure MARK action.  It's required to be used
                        >     > > together with some other actions, like QUEUE.
                        >     > >
                        >     > > To work around it, retry with a queue action when the first try
                        >     > > fails.
                        >     > >
                        >     > > Moreover, some of Intel's NICs (say the XL710) need the QUEUE
                        >     > > action set before the MARK action.
                        >     > >
                        >     > > Co-authored-by: Yuanhan Liu <y...@fridaylinux.org>
                        >     > > Signed-off-by: Finn Christensen <f...@napatech.com>
                        >     > > Signed-off-by: Yuanhan Liu <y...@fridaylinux.org>
    
                        >     >
    
                        >     > This feels a bit like the tail wagging the dog.
                        >     > Is this the lowest level at which it makes sense to implement
                        >     > this logic?
                        >     >
                        >     > If so then I wonder if some sort of probing would be in order
                        >     > to avoid the cost of trying to add the flow twice to hardware
                        >     > where the queue is required.
    
                        >
    
                        >     Do you mean something like an rte_flow capability query, like
                        >     whether a queue action is needed for a mark action? If so, yes,
                        >     I do think we miss an interface like this.
                        >
                        >     Note that even in this solution, the flow won't be created twice
                        >     in the hardware, because the first try would fail.
    
                        >
    
                        > [Darrell]
                        >
                        >     Having an api to query capability and avoid the first try to HW
                        >     would be nice, but there are dependencies on RTE, drivers etc.,
                        >     and I don't know definitively what the api would be.
                        >
                        >     Also, as nics are added, this capability needs to be handled and
                        >     state needs to be kept in all cases.
                        >
                        >     It is an enhancement and, if done, should be reliable.
    
    
    
                        Agreed. Though I was more thinking of probing the hardware rather
                        than having a capability API - I expect this would remove several
                        of the dependencies you describe above.
    
    
    
    
    
                    [Darrell] I have been pondering the probing option as well. It is
                certainly a valid option; we use it in other cases, such as datapath
                probing. One of the aspects that worries me here is maintaining the
                correct per-interface state (essentially; although the attribute is
                per nic) across various events such as new ports being added,
                vswitchd restarts, and races with flow creation. It would be
                non-trivial, I guess, and probably appropriate for the next patch
                series, if done.
    
    
    
                    In this case, we have what seems like a clear distinction b/w
                Napatech, which does not need the queue action workaround, and
                everything else, which does.

                    Besides the non-Napatech behavior, which is worrisome,
                maintaining the difference for flow handling under the covers is
                concerning.
    
    
    
                    I wonder if we should be as upfront as possible here and just
                have a dpdk interface configuration – maybe something like “supports
                native HWOL mark action” – since the better behavior is the
                exception?

                    The interface config would be more robust than probing.

                    This would need documentation, of course.
    
    
    
                    [Finn]
                    I think the rte queue action should never be used here when
                using partial HWOL - not the way OVS handles multiple queues today.
                    Maybe a "default queue" could be used in the dpdk PMDs when no
                queue is specified in rte flow?

                [Darrell] This is the Napatech case, where no queue action is
                needed; you are suggesting programming a default queue in this case.
                I don’t follow how this would be helpful/desired?
    
            [Finn]
            I was trying to state my view on this, not particularly arguing the
            Napatech case.
            Here is what I was thinking:
            Taking the case of partial HWOL, we are trying to offload the flow
            classification to HW, like "pre-classify and mark". This mark is then
            used to accelerate OVS while finding the actions to execute. Since we
            leave all processing of the actions to OVS, there is no way for the
            partial HWOL to know, at rte flow creation time, where to send the
            pre-classified packets (which is strictly needed when seen from a DPDK
            rte flow point of view).
            When multiple queues are specified in OVS, a hash-splitting mechanism
            is used in the nic to support RSS. The nic is then responsible for
            selecting the right queue according to the configured algorithm for
            RSS.
            OVS only needs to know how many queues to service per port. No
            knowledge about the association b/w flow <-> rxq is used in OVS today.
            When looking at full HWOL, all flow actions will have to be
            interpreted, supported, and programmed to the NIC using rte flow -
            sent directly to the target port - and we do not have this issue.
            Have I missed something?
    
        [Darrell]
        I understand your points about the full offload you are working on.
        I just did not follow this one comment:

        “Maybe a "default queue" could be used in the dpdk PMDs when no
        queue is specified in rte flow?”

        which I thought was related to the partial offloads used here and the
        receive queue action we are discussing. I am concerned that I missed
        your point about this specific comment?
    
    
    [Finn]
    Sorry for my messy descriptions, and yes, the "default queue" idea was for
    the partial HWOL case. I just think that if the queue could be set to 0
    when 1 rxq is used, and the hash algorithm could select the queue when RSS
    is used, then we would not have to add QUEUE to the rte flow create.
    However, it was just an idea and I don't know if it can be done reliably
    in all DPDK PMDs.
    
    
                    Essentially, this is a mismatch between the rte flow
                functionality implemented in the PMD and the needs in OVS for flow
                classification, in a partial HWOL setup.

                [Darrell] Maybe this has been explored, but were additional
                workaround actions, like RTE_FLOW_ACTION_TYPE_RSS, considered?
                I guess we could make this essentially a NOP, but if the mark
                action could ride along with it, then that would be good.
    
            [Finn]
            No. We have not explored that.
            I'm not sure I understand how you would use RTE_FLOW_ACTION_TYPE_RSS
            in this situation.
    
    
        [Darrell]
        So there are two points here:
        1/ ‘IF’ RTE_FLOW_ACTION_TYPE_RSS can be used in lieu of
        RTE_FLOW_ACTION_TYPE_QUEUE combined with RTE_FLOW_ACTION_TYPE_MARK, as
        needed for the workaround for most nics, then this would be better for
        distributing packets across PMDs for the masked flows relevant here.
        This would help mitigate one of the concerns. We could use the RSS we
        use globally, essentially making the distribution a NOP compared with
        what we normally do today with RSS….

        2/ Assuming ‘1’, we might even use this in all cases as a first try for
        all nics, which, if it could be done, solves the queue action
        workaround capability/probing thing?
    
    [Finn] Thanks, I see what you mean. It makes very good sense if it is
    supported by the nic DPDK PMDs.
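    If that turns out to be supported, the retry order in netdev-dpdk could
    become "RSS first, QUEUE second" for every nic. A hedged C sketch of that
    control flow follows, assuming hypothetical helpers (not functions from
    this patch series) that build the respective rte flow action lists:

```c
#include <rte_flow.h>

/* Hypothetical helpers, not code from this patch: each one builds an
 * action list and calls rte_flow_create() - MARK + RSS in the first,
 * QUEUE + MARK (queue first, as e.g. the XL710 requires) in the second. */
static struct rte_flow *try_mark_with_rss(uint16_t port_id,
                                          const struct rte_flow_attr *attr,
                                          const struct rte_flow_item pattern[],
                                          uint32_t mark_id,
                                          struct rte_flow_error *error);
static struct rte_flow *try_mark_with_queue(uint16_t port_id,
                                            const struct rte_flow_attr *attr,
                                            const struct rte_flow_item pattern[],
                                            uint32_t mark_id,
                                            struct rte_flow_error *error);

/* Try MARK + RSS first on every nic; only fall back to the existing
 * MARK + QUEUE workaround when the PMD rejects the RSS variant. */
static struct rte_flow *
flow_create_with_fallback(uint16_t port_id,
                          const struct rte_flow_attr *attr,
                          const struct rte_flow_item pattern[],
                          uint32_t mark_id,
                          struct rte_flow_error *error)
{
    struct rte_flow *flow;

    flow = try_mark_with_rss(port_id, attr, pattern, mark_id, error);
    if (flow) {
        return flow;
    }
    return try_mark_with_queue(port_id, attr, pattern, mark_id, error);
}
```

    If the first try succeeds wherever it is supported, the queue-action
    capability/probing question largely goes away.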

    
    
                    However, we would not mind Darrell’s proposal; it makes sense,
                since most nic PMDs unfortunately need this, and it is only
                relevant for partial HWOL. We are mostly concerned with full HWOL
                and therefore see partial HWOL as a failover for when full HWOL
                could not be handled in the nic, if enabled.
    
    
    
    
    
                    I think anyways we need documentation describing the
                difference b/w nics in the dpdk documentation (howto part).
    
    
    
    
    
                        Assuming no such enhancement is appropriate at this time, I
                        would still like to ask if this is the best place for this
                        hardware-specific code?
    
    
    
                    [Darrell]
    
    
    
                    For OVS, the netdev-dpdk layer is the lowest layer.

                    This kind of workaround is hard to hide, since we are messing
                with the rxq, so I think OVS needs to know that it is in effect
                anyways. An alternative is to supply a mark and an ‘optional
                queue’ and let the driver decide if the queue is needed and
                report back whether it was. This would be hard to do across
                various drivers. Supporting it in the rte layer would require
                both rte and driver support, so even more support.
    
    
    
    
    
    
    
                        >     A separate comment is that we need to document which
                        >     nics need the queue action.
                        >
                        >     Also, I think we should check errno in the present
                        >     code.
    
    
    
    
    
                    _______________________________________________
                    dev mailing list
                    d...@openvswitch.org
                    https://mail.openvswitch.org/mailman/listinfo/ovs-dev
    
                    Disclaimer: This email and any files transmitted with it may
                contain confidential information intended for the addressee(s)
                only. The information is not to be surrendered or copied to
                unauthorized persons. If you have received this communication in
                error, please notify the sender immediately and delete this
                e-mail from your system.
    
    
    
    
    
    

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
